A Simulation study
In this Appendix, we conduct a simulation study to investigate the power performance when
the sample size is 10, in settings where the theoretical power (with type I error rate α = 0.05)
is above 0.8; or, equivalently, the required sample size to achieve 0.8 power is below 10.
Such situations are likely to occur if: (a) the total number of decision times T is relatively
large, due to either a large number of days or a large number of decision times per day, (b)
the provided average (standardized) proximal treatment effect is relatively large (say, 0.15),
or the parameterization of the treatment effect is relatively simple, e.g. constant or linear,
and (c) the average of expected availability throughout the study is relatively large. In the
following, we provide simulation results for the above scenarios under different generative
models.
First, we consider the case when the working assumptions made to obtain a tractable
sample size calculation are satisfied; see Liao et al. (2016) for details. Since neither the work-
ing assumptions nor the inputs to the sample size formula specify the error distribution, we
consider five distributions for the outcomes, including independent normal, correlated nor-
mal with different correlation structures, independent t-distribution with three degrees of
freedom (heavy tailed) and independent (centered) exponential distribution with rate
parameter equal to 1 (skewed). The simulation results are provided in Table A1. In general,
these results show that the power performance is still quite robust to different error
distributions when the sample size is equal to 10, as in the case of relatively large sample sizes
demonstrated in Liao et al. (2016).
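As an illustration of the error distributions listed above, the following is a minimal sketch of how one participant's error sequence might be drawn. It is not the code used for Table A1. The function name `draw_errors`, the AR(1) form of the correlated normal case, and the correlation value `rho = 0.5` are assumptions made for this example; the text only states that several correlation structures were considered. The t errors are left unscaled (variance 3), which is also an assumption.

```python
import math
import random


def draw_errors(kind, T, rng, rho=0.5):
    """Draw one participant's mean-zero error sequence of length T.

    `kind` selects the distribution; `rho` is only used by the AR(1) case.
    All names and defaults here are illustrative assumptions.
    """
    if kind == "normal":
        # Independent standard normal errors.
        return [rng.gauss(0.0, 1.0) for _ in range(T)]
    if kind == "ar1_normal":
        # Stationary AR(1): unit marginal variance, corr(e_s, e_t) = rho**|s - t|.
        errors = [rng.gauss(0.0, 1.0)]
        innovation_sd = math.sqrt(1.0 - rho ** 2)
        for _ in range(T - 1):
            errors.append(rho * errors[-1] + innovation_sd * rng.gauss(0.0, 1.0))
        return errors
    if kind == "t3":
        # t with 3 df as Z / sqrt(chi2_3 / 3); chi2_3 is Gamma(shape=1.5, scale=2).
        return [
            rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
            for _ in range(T)
        ]
    if kind == "centered_exp":
        # Exponential with rate 1 has mean 1, so subtracting 1 centers it.
        return [rng.expovariate(1.0) - 1.0 for _ in range(T)]
    raise ValueError(f"unknown error distribution: {kind}")


# Example: errors for N = 10 participants over T = 210 decision times
# (T = 210 is an arbitrary value chosen for the example).
rng = random.Random(2016)
sample = [draw_errors("t3", 210, rng) for _ in range(10)]
```

Passing an explicit `random.Random` instance keeps each simulated data set reproducible without touching the global random state.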
Secondly, we consider the case in which the working assumptions proposed in Liao et al.
(2016) are not satisfied. In particular, we consider two cases. In the first case, the time-
varying pattern of underlying true proximal effects is different from the user-provided pattern.
For example, the user might input a linear pattern for the treatment effect, but the true effects
are quadratic. In this case, the vector of standardized effects, $d$, used in the sample size formula
corresponds instead to the projection of $d(t)$, that is, $d = \left(\sum_{t=1}^{T} E[I_t] Z_t Z_t^{\top}\right)^{-1} \sum_{t=1}^{T} \left(E[I_t] Z_t d(t)\right)$; see
more details in Liao et al. (2016). We consider three different patterns of treatment effect
which cannot be represented in constant, linear, or quadratic form, but can be sufficiently
well approximated; see Figure A1. Results are provided in Table A2. As opposed to the case
when the sample size is relatively large, the simulated power when N = 10 is slightly
smaller than the power estimated by MRT-SS Calculator, with the difference roughly below 0.05.
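The projection of the true effects d(t) onto the user-specified pattern, described above, is a weighted least-squares fit with weights E[I_t]. The following is a minimal sketch under stated assumptions: the function names, the basis Z_t = (1, t), the constant availability of 0.7, and the quadratic d(t) are all hypothetical choices for illustration and are not taken from Liao et al. (2016). Only the standard library is used; a small Gaussian elimination routine stands in for a linear algebra package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def project_effect(true_effect, availability, basis):
    """Return d = (sum_t E[I_t] Z_t Z_t^T)^{-1} sum_t E[I_t] Z_t d(t).

    true_effect[k] is d(t), availability[k] is E[I_t], and basis[k] is Z_t
    for the k-th decision time.
    """
    p = len(basis[0])
    gram = [[0.0] * p for _ in range(p)]
    rhs = [0.0] * p
    for d_t, tau_t, z_t in zip(true_effect, availability, basis):
        for i in range(p):
            rhs[i] += tau_t * z_t[i] * d_t
            for j in range(p):
                gram[i][j] += tau_t * z_t[i] * z_t[j]
    return solve_linear_system(gram, rhs)


# Example: a quadratic true effect projected onto a linear pattern.
T = 42                                                       # hypothetical horizon
times = list(range(1, T + 1))
true_d = [0.2 - 0.0004 * (t - T / 2) ** 2 for t in times]   # assumed d(t)
tau = [0.7] * T                                              # assumed E[I_t]
Z = [[1.0, float(t)] for t in times]                         # assumed linear basis
d_projected = project_effect(true_d, tau, Z)
```

When d(t) already lies in the span of Z_t, the projection returns its coefficients exactly (up to rounding), which gives a simple sanity check for the routine.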
In the second case, we investigate the performance when the conditional variance of the
outcome at decision time $t$, e.g. $\mathrm{Var}[Y_{t+1} \mid I_t = 1, A_t] = A_t \sigma_{1t}^2 + (1 - A_t)\sigma_{0t}^2$, is time-varying
and depends on the treatment variable $A_t$. It was reported in Liao et al. (2016) that when
sample sizes are relatively large, power might decrease slightly depending on the choice of
$\sigma_{0t}$ and $\sigma_{1t}$. Here, we investigate whether robustness is maintained in small sample sizes
(e.g. N = 10). We consider three time-varying trends for the average conditional variance
$\bar{\sigma}_t^2 = \rho\sigma_{1t}^2 + (1 - \rho)\sigma_{0t}^2$, together with different ratios of $\sigma_{1t}$ to $\sigma_{0t}$; see Figure A2. The
simulation results are given in Table A3. It can be seen that the ratio factor $\sigma_{1t}/\sigma_{0t}$ has
almost no impact on the simulated power. Large variation in $\bar{\sigma}_t$, e.g. trend 3 in Figure
A2, reduces the power in all cases. This is also true when sample size is relatively large: