MRT-SS Calculator: An R Shiny Application for
Sample Size Calculation in Micro-Randomized Trials
Nicholas J. Seewald, Ji Sun, and Peng Liao
Department of Statistics, University of Michigan
Abstract
The micro-randomized trial (MRT) is a new experimental design which allows for
the investigation of the proximal effects of a “just-in-time” treatment, often provided
via a mobile device as part of a mobile health intervention. As with a traditional
randomized controlled trial, computing the minimum required sample size to achieve a
desired power is a crucial step in designing an MRT. We present MRT-SS Calculator, an
online sample-size calculator for micro-randomized trials, built with R Shiny. MRT-SS
Calculator requires specification of time-varying patterns for the proximal treatment
effect and expected treatment availability. We illustrate the implementation of MRT-
SS Calculator using a mobile health trial, HeartSteps. The application can be accessed
from https://pengliao.shinyapps.io/mrt-calculator.
1 Introduction
Due to recent advances in mobile technologies, including smartphones and sophisticated
wearable sensors, mobile health (mHealth) technologies are drawing much attention in the
behavioral health community. In “just-in-time” mobile interventions, treatments, provided
via a mobile or wearable device, are intended to help an individual in the moment. For
example, treatments might promote engaging in a healthy behavior when an opportunity
arises, or successfully coping with a stressful event. A challenge in developing just-in-time
mobile interventions is the limited experimental methodology available to support their development. Current experimental designs do not readily enable researchers to empirically
investigate whether a just-in-time treatment had the intended effect, or when and in which
context it is useful to deliver a treatment.
Recently, a new experimental design, called the micro-randomized trial (MRT), has been
developed to assess the effects of these interventions (Liao et al., 2016). In these trials,
participants are sequentially randomized between intervention options at each of many occasions (decision times) at which treatment might be provided. As with traditional randomized
controlled trials, determining sample size is an important part of the process of designing a
micro-randomized trial. More specifically, it is important for the scientist to justify the number of participants or experimental units needed in the study to address a specific scientific
question with a given power (Noordzij et al., 2010).
Here, we introduce MRT-SS Calculator, a user-friendly, web-based application designed
to facilitate determination of the minimal sample size needed to detect a given effect of a
just-in-time intervention. The application was built using R Shiny (R Core Team, 2016;
Chang et al., 2016). Commonly, tools used to compute sample sizes can be difficult to
use, particularly for non-statisticians. MRT-SS Calculator provides a clean, intuitive user
interface which elicits study parameters from the scientist in a thoughtful way, while still
offering enough flexibility to accommodate more complex trials. The calculator is based on
methodology reviewed in Section 2. In Section 3, we provide a detailed tour of MRT-SS
Calculator and describe its use. Finally, in Section 4, we introduce HeartSteps, a micro-
randomized trial investigating the use of mobile interventions to increase physical activity
in sedentary adults. With HeartSteps as an example, we illustrate the use of MRT-SS
Calculator in a “real-world” setting.
2 Review of methodology used in MRT-SS Calculator
MRT-SS Calculator implements the methodology developed in Liao et al. (2016). Here,
we present a brief review. A micro-randomized trial provides participants with randomly-
assigned treatments at each of $T$ decision times, indexed by $t \in [T] = \{1, \ldots, T\}$. Depending on the study, the number of decision times $T$ may be in the hundreds or thousands; in the HeartSteps example described in Section 4, $T = 210$. A simplified version of the longitudinal data collected from
each participant in an MRT can be written as
{S
0
, I
1
, A
1
, Y
2
, I
2
, A
2
, Y
3
, . . . , I
t
, A
t
, Y
t+1
, . . . , I
T
, A
T
, Y
T +1
},
where S
0
is a vector of the individual’s baseline information (e.g., age, gender). This calcula-
tor is developed for binary treatment, A
t
{0, 1}; that is, there are two intervention options
at decision time t. The proximal response to treatment A
t
is denoted by Y
t+1
. I
t
is an indi-
cator for the participant’s “availability” at time t. At some decision times, the participant
may be unavailable for treatment; that is, it may be unethical, scientifically inappropriate, or
infeasible to deliver a treatment. For example, if treatment involves delivering messages via
audible and visual cues on a smartphone, it would be considered unethical to deliver these
potentially distracting messages while the participant is driving a car. In such situations,
the participant is classified as unavailable, and I
t
= 0. During time periods when I
t
= 0,
participants are not randomized and A
t
is left undefined.
At decision time $t$, the proximal effect of a treatment, denoted $\beta(t)$, is defined as
\[
\beta(t) = E\left[Y_{t+1} \mid I_t = 1, A_t = 1\right] - E\left[Y_{t+1} \mid I_t = 1, A_t = 0\right],
\]
the difference in expected proximal response conditioned on the two treatment options. Note that $\beta(t)$ is defined only for those participants who are available at decision time $t$.
We are interested in testing the null hypothesis
\[
H_0: \beta(t) = 0, \quad t = 1, \ldots, T,
\]
against the alternative
\[
H_1: \beta(t) > 0 \text{ for some } t.
\]
We wish to find the minimal sample size needed to detect $H_1$ with desired power. To construct a test statistic and derive a sample size formula, we target alternatives in $H_1$ that are linear in a vector parameter $\beta \in \mathbb{R}^p$; in particular, we target $\beta(t)$ of the form
\[
\beta(t) = Z_t^\top \beta, \quad t = 1, \ldots, T,
\]
where $Z_t$ is a $p \times 1$ vector function of $t$ and covariates that are unaffected by treatment, such as gender, time of day, and day of the week. For example, consider a study in which $Z_t^\top \beta$ is a linear function of time in days; we refer to this as a “linear alternative.” If there are 5 decision times per day, then $\beta(t) = \beta_1 + \lfloor (t-1)/5 \rfloor\, \beta_2$, which can be written as $Z_t^\top \beta$ with $Z_t^\top = \bigl(1, \lfloor (t-1)/5 \rfloor\bigr)$ and $\beta = (\beta_1, \beta_2)^\top$. Note that $\lfloor (t-1)/5 \rfloor$ translates the index of each treatment occasion into the number of days that have elapsed since the outset of the study. In Section 3.3, we consider other treatment effect trends.
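To make the linear alternative concrete, the following R sketch (illustrative only, not part of the application) builds $Z_t$ for a 42-day study with 5 decision times per day and evaluates $\beta(t)$ for a hypothetical $\beta$:

# Illustrative sketch: the linear alternative beta(t) = beta_1 + floor((t-1)/5) * beta_2
days <- 42
per_day <- 5
T_total <- days * per_day                             # T = 210 decision times
day_index <- floor((seq_len(T_total) - 1) / per_day)  # elapsed days: 0, 0, ..., 41
Z <- cbind(1, day_index)                              # T x p design matrix, p = 2
beta <- c(0.05, 0.002)                                # hypothetical (beta_1, beta_2)
beta_t <- as.vector(Z %*% beta)                       # proximal effect at each decision time
range(beta_t)                                         # effect on day 1 versus day 42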
We construct a test statistic based on the least-squares estimator of $\beta$ from the following working model for $E[Y_{t+1} \mid I_t = 1, A_t]$:
\[
B_t^\top \alpha + (A_t - \rho_t)\, Z_t^\top \beta, \quad t = 1, \ldots, T, \qquad (1)
\]
where $B_t$ is a $q \times 1$ vector function of $t$ and covariates that are unaffected by treatment, such as gender, time of day, and day of the week, and $\rho_t = P[A_t = 1]$ is the randomization probability at decision time $t$.
probability at decision time t. The associated test statistic is given by
\[
N \hat{\beta}^\top \hat{\Sigma}_\beta^{-1} \hat{\beta},
\]
where $\hat{\beta}$ is the least-squares estimator that minimizes
\[
\mathbb{P}_N \left\{ \sum_{t=1}^{T} I_t \left( Y_{t+1} - B_t^\top \alpha - (A_t - \rho_t) Z_t^\top \beta \right)^2 \right\},
\]
where $\mathbb{P}_N$ denotes the average over the sample of size $N$, and $\hat{\Sigma}_\beta$ is an estimator of the asymptotic variance of $\sqrt{N}\,\hat{\beta}$ (Liao et al., 2016). The rejection region for the test is
\[
N \hat{\beta}^\top \hat{\Sigma}_\beta^{-1} \hat{\beta} \;>\; \frac{p\,(N - q - 1)}{N - q - p}\, F^{-1}_{p,\, N - q - p}(1 - \alpha_0),
\]
where $F_{p,\, N - q - p}$ is the distribution function of an $F$-distribution with degrees of freedom $d_1 = p$ and $d_2 = N - q - p$.
To derive a tractable sample size formula, Liao et al. (2016) made additional working assumptions. Under these assumptions, the minimum required sample size $N$ to detect the alternative with power $1 - \beta_0$ is found by solving
\[
F_{p,\, N - q - p;\, c_N}\!\left( \frac{p\,(N - q - 1)}{N - q - p}\, F^{-1}_{p,\, N - q - p}(1 - \alpha_0) \right) = \beta_0,
\]
where $F_{p,\, N - q - p;\, c_N}$ is the distribution function of a non-central $F$-distribution with $d_1 = p$, $d_2 = N - q - p$, and non-centrality parameter
\[
c_N = N\, d^\top \left( \sum_{t=1}^{T} E[I_t]\, \rho_t (1 - \rho_t)\, Z_t Z_t^\top \right) d,
\]
where $d$ is a $p$-dimensional vector of standardized treatment effects, i.e., $\beta(t)/\bar\sigma = Z_t^\top d$; see the definition of $\bar\sigma$ in Liao et al. (2016). We note that to calculate the sample size it is enough to know $q$, i.e., the length of $B_t$ that is used to model the average outcome in (1). In the MRT-SS Calculator presented below, we assume $q = 3$ for simplicity (e.g., when $B_t$ corresponds to a quadratic pattern). The reader can use the R package (“MRTSampleSize”) to obtain the sample size in the general setting.
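The sample size formula above can be evaluated directly with base R's $F$-distribution functions. The sketch below is a minimal illustration of that calculation and is not the MRTSampleSize package; the helper names mrt_power and mrt_min_n are ours, and the example inputs (a constant standardized effect of 0.1, constant availability 0.7, and constant randomization probability 0.4) are hypothetical:

# Minimal sketch of the power and sample-size calculation described above.
# Zt: T x p matrix with rows Z_t; d: p-vector of standardized effects;
# rho: length-T randomization probabilities; avail: length-T expected availabilities E[I_t].
mrt_power <- function(N, Zt, d, rho, avail, q = 3, alpha0 = 0.05) {
  p    <- length(d)
  M    <- t(Zt) %*% (Zt * (avail * rho * (1 - rho)))  # sum_t E[I_t] rho_t (1 - rho_t) Z_t Z_t'
  cN   <- N * drop(t(d) %*% M %*% d)                  # non-centrality parameter
  df2  <- N - q - p
  crit <- p * (N - q - 1) / df2 * qf(1 - alpha0, p, df2)
  1 - pf(crit, p, df2, ncp = cN)                      # power at sample size N
}
mrt_min_n <- function(Zt, d, rho, avail, q = 3, alpha0 = 0.05, power = 0.80) {
  N <- q + length(d) + 1                              # smallest N with positive df2
  while (mrt_power(N, Zt, d, rho, avail, q, alpha0) < power) N <- N + 1
  N
}
# Hypothetical example: 42 days x 5 decision times, constant standardized effect 0.1,
# constant availability 0.7, and constant randomization probability 0.4.
T_total <- 42 * 5
Zt <- matrix(1, nrow = T_total, ncol = 1)
mrt_min_n(Zt, d = 0.1, rho = rep(0.4, T_total), avail = rep(0.7, T_total))

For the linear and quadratic trend classes offered by the calculator, Zt would instead contain linear or quadratic functions of the day in study, as in the linear example of Section 2.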
3 Using MRT-SS Calculator
In this section, we will explain how to use MRT-SS Calculator. In brief, the user should
provide basic information about the study as well as the alternative hypothesis and expected
availability of participants over the course of the trial. MRT-SS Calculator will return either
the minimum required sample size to achieve a specified power, or the power achievable given
a specified number of participants. We explore each of these components in detail below.
3.1 Study setup
Scientists using MRT-SS Calculator are first prompted to provide specific details describing
their planned trial. These include the study’s duration in days, the daily number of decision
times at which treatment is randomly assigned, and the probability of being randomized to
receive treatment.
For example, Figure 1 describes a study which takes place over 42 days with up to 5 randomizations per day, where treatment is delivered with probability 0.4 at each decision time ($A_t = 1$ if treatment is delivered, $A_t = 0$ if not). Using the notation established in Section 2, $T = 42 \times 5 = 210$ and $\rho_t = P[A_t = 1] = 0.4$ for all $t \in [T]$. Given this randomization probability, participants in the study will be delivered an average of two treatments per day, provided they are available. Treatments are not provided when the participant is unavailable (and thus the participant is not randomized), for the reasons described above.
Some users may wish to vary the randomization probability over the course of the study.
MRT-SS Calculator is flexible enough to accommodate this. The user can upload a .csv
file containing randomization probabilities for each day of the study, or for every individual
decision time point (see Figure 2).
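As a simple illustration of preparing such a file in R, the sketch below writes day-level probabilities to a .csv; the column names are only a guess at the expected format, so users should start from the template provided in the application:

# Hypothetical sketch of building a day-level randomization-probability file.
# Column names are illustrative; follow the template provided by MRT-SS Calculator.
days <- 42
probs <- data.frame(
  day  = seq_len(days),
  prob = c(rep(0.5, 14), rep(0.4, 14), rep(0.3, 14))  # e.g., taper randomization over the study
)
write.csv(probs, "randomization_probabilities.csv", row.names = FALSE)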
3.2 Specifying patterns for expected availability
MRT-SS Calculator requires the specification of a time-varying expected availability $E[I_t]$ for each decision time $t \in [T]$. Recall from Section 2 that availability may vary according to
(a) User specification of study duration in days and the number of randomizations per day.
(b) User specification of a constant probability of randomization to treatment.
Figure 1: Study setup for a 42-day trial in which participants are randomized to receive treatment up to five times per day, conditional on $I_t = 1$ (a), each with probability 0.4 (b).
Figure 2: Time-varying randomization probabilities. The user specifies whether to provide
probabilities which change either daily or at each decision time, and uploads a .csv file
containing (index, probability) pairs. A template file is provided for ease of use, and a
preview of the uploaded file is shown for verification.
many factors, including, for example, whether the participant has turned off the intervention.
MRT-SS Calculator provides three classes of trends for expected availability: constant, linear, and quadratic (see Figure 3). These trends are specified at the day level; if there are multiple decision times per day, the trend describes the average expected availability across the decision times within each day.
The different classes of expected availability patterns correspond to a variety of scenarios.
For example, if it is believed that participants will be more likely to turn off the intervention
as the study goes on, then expected availability will decrease. This might occur if, for
example, participants find the interventions more burdensome as the study goes on.
MRT-SS Calculator requires the user to provide inputs which fully specify the pattern of
expected availability over the course of the study. For example, after selecting the quadratic
class of trends, the user is prompted to provide an estimate of participants' average availability throughout the study, the estimated availability for participants at the outset of the trial,
and the day of maximum or minimum availability (the “changing point”). See Figure 4 for
examples of how different values of these inputs change the pattern of expected availability.
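To make these inputs concrete, the sketch below constructs one possible day-level quadratic availability trend from an average, an initial value, and a changing-point day. The vertex-form parameterization is our assumption for illustration, not necessarily the exact curve the application fits, and the input values are hypothetical:

# Illustrative sketch (assumed parameterization): a quadratic day-level availability trend
# with its vertex at `change_day`, value `init` on day 1, and mean `avg` over the study.
quad_trend <- function(days, avg, init, change_day) {
  d  <- seq_len(days)
  s  <- (d - change_day)^2
  a  <- (avg - init) / (mean(s) - s[1])   # solves a * s[1] + c0 = init and a * mean(s) + c0 = avg
  c0 <- init - a * s[1]
  a * s + c0
}
avail <- quad_trend(days = 42, avg = 0.6, init = 0.8, change_day = 20)
round(range(avail), 2)   # availability falls to its minimum on day 20, then recovers
mean(avail)              # equals the specified average, 0.6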
3.3 Specifying the targeted alternative effect
MRT-SS Calculator requires the user to specify the desired detectable standardized treatment effect $d$. Recall from Section 2 that the targeted alternative effect is of the form $\beta(t) = Z_t^\top \beta$, where $Z_t$ is some function of time; here $d$ is a standardized $\beta$ (see Liao et al., 2016). MRT-SS Calculator allows the user to choose the form of $Z_t$ from constant, linear, or quadratic classes of trends (see Figure 5). In all classes the trends are averaged over each day, as with the pattern of expected availability (see Section 3.2).
Each of these classes corresponds to a different possible targeted alternative. A constant
trend is most useful if the user believes that the effect of the treatment will be relatively
stable over the duration of the study. A linear trend is useful if, for example, users believe
that the effect of the treatment may grow over time and is unlikely to dissipate at later
decision times. A quadratic trend would be useful if it is believed that the treatment effect
will grow initially but may dissipate with time as participants begin to ignore or disengage
from treatment.
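As with availability, a quadratic effect trend is pinned down by an initial value, a day of maximal effect, and an average. The sketch below builds such a day-level standardized effect under an assumed vertex-form parameterization (our assumption, not necessarily the application's exact curve) and expands it to decision times; the numerical inputs are hypothetical:

# Illustrative sketch (assumed parameterization): a quadratic day-level standardized effect
# with no effect on day 1, maximal effect on day `peak_day`, and mean `avg_effect`.
days <- 42; per_day <- 5
peak_day <- 28; avg_effect <- 0.1                # hypothetical inputs
d <- seq_len(days)
s <- (d - peak_day)^2
a <- (avg_effect - 0) / (mean(s) - s[1])         # the effect on day 1 is constrained to 0
effect_day <- a * (s - s[1])                     # day-level standardized effect beta(t) / sigma-bar
effect_t   <- rep(effect_day, each = per_day)    # constant within each day, T = 210 values
c(min(effect_day), max(effect_day))              # zero on day 1, maximum on day 28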
3.4 Output
MRT-SS Calculator allows the user to choose to compute either the minimum sample size
required to achieve a specified power or the achievable power given a specified sample size.
In both cases, the significance level of the test must be supplied. Figure 6 shows the use
of MRT-SS Calculator to obtain a minimum sample size for a 42-day study with 5 decision
times per day. All computed results from a session are saved, and users can view and/or
download all past output from their current session.
(a) Specification of a constant availability pattern over the course of the trial.
(b) Specification of a linearly-increasing availability pattern over the course of the trial. A linearly-
decreasing pattern can be specified by adjusting the initial value and average.
(c) Specification of a quadratic pattern of availability. In the pictured trend, availability decreases over the course of the trial. “Changing point” refers to the day of the trial at which expected availability is at its maximum or minimum.
Figure 3: Three examples of possible patterns of expected availability, each falling in one of
three classes: constant (a), linear (b), and quadratic (c).
(a) A concave quadratic pattern of expected availability. Availability increases until day 25, when
it is maximized, then decreases until the end of the trial.
(b) A convex quadratic pattern of expected availability. Availability decreases until day 20, when
it is minimized, then increases until the end of the trial.
Figure 4: Different patterns of expected availability in the quadratic class.
(a) Specification of a constant targeted alternative proximal treatment effect.
(b) Specification of a linearly-increasing targeted alternative proximal treatment effect. Linearly decreasing effects can be specified by changing the average and initial values.
(c) Specification of a quadratic targeted alternative proximal treatment effect.
Figure 5: Example inputs for the three available classes of trends for the standardized proximal treatment effect: constant (a), linear (b), and quadratic (c). Included in the plots are the null hypothesis ($\beta(t) = 0$ for all $t$) in blue, the specified average treatment effect in black, and the alternative $\beta(t) = Z_t^\top \beta$ in red.
(a) Selection of calculator output. To determine minimum-required sample size for a trial, the
user inputs the desired power to detect the targeted alternative proximal treatment effect, and the
significance level for the planned hypothesis test.
(b) Computed minimum-required sample sizes for all combinations of patterns in Figures 3 (avail-
ability) and 5 (proximal treatment effect) with study setup as in Figure 1, desired power 0.8 and
significance level 0.05.
Figure 6: Output specification and result history. After having provided all required inputs
described in Section 3, the user selects the type of outcome desired (a). The user can access
a record of past outputs from the current session (b).
Figure 7: Error handling for inputs which lead to negative proximal treatment effects. In a 42-day study, choosing a “Day of Maximal Proximal Effect” less than 22 when using the quadratic class with an initial effect of zero results in a negative proximal treatment effect on some days. The application will produce an error message.
3.5 Error handling
MRT-SS Calculator delivers warnings when inappropriate inputs are provided. For example, the warning shown in Figure 7 appears if the entered “Day of Maximal Proximal Effect” is less than 22 in a 42-day study when the initial effect is set to 0; in that case the resulting treatment effect is negative on some days, which should be avoided. When the calculated sample size is less than 10, MRT-SS Calculator returns 10 with a warning. In Appendix A, we conduct a simulation study to investigate different scenarios in which the required sample size is less than 10, or equivalently the estimated power for a sample size of 10 is larger than the desired power. In particular, we compare the power estimated by MRT-SS Calculator with the simulated power under different generative models. In general, the power under a variety of generative models with a sample size of 10 is slightly degraded relative to scenarios with larger sample sizes.
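The day-22 threshold quoted above can be reproduced numerically under the same assumed vertex-form quadratic used in the sketches of Sections 3.2 and 3.3 (an assumption on our part): with a zero initial effect, the trend is symmetric about the day of maximal effect $m$, so it returns to zero on day $2m - 1$ and is negative afterwards, and in a 42-day study this happens whenever $m < 22$.

# Sketch: check whether a quadratic effect trend with zero initial effect and maximum on
# day m goes negative within a 42-day study (assumed vertex-form parameterization).
check_peak <- function(m, days = 42, avg = 0.1) {
  s <- (seq_len(days) - m)^2
  a <- avg / (mean(s) - s[1])
  any(a * (s - s[1]) < -1e-12)          # TRUE if the effect dips below zero
}
check_peak(21)   # TRUE: the effect is negative on day 42, so the application raises an error
check_peak(22)   # FALSE: the effect stays nonnegative for the whole study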
4 An example: HeartSteps
4.1 Introduction to HeartSteps
HeartSteps is a mobile intervention study designed to increase physical activity for sedentary
adults (Klasnja et al., 2015). HeartSteps uses a mobile application to investigate the effects
of in-the-moment activity suggestions. These are messages which encourage the participant
to engage in physical activity and which appear on the lock screen of the participant’s mobile
phone. The suggestions may vary in content depending on a number of contextual factors
such as weather, the participant’s location, or the time of day. Participants are randomized
to either receive or not receive a suggestion at five pre-specified decision times throughout the
day. These time points correspond roughly to a morning commute, mid-day, mid-afternoon,
an evening commute, and after dinner. When a suggestion is delivered, it is displayed on
the lock screen of the phone, which then plays a notification sound, vibrates, or lights up.
In HeartSteps, data are collected both passively via sensors and actively through participant self-report. Each participant is provided a Jawbone UP activity tracker which monitors and records step count. Furthermore, sensors on the phone are used to collect a variety of information at each of the five decision times during the day. This information includes the participant's current location and activity status (e.g., walking, driving, etc.). If sensors indicate that the individual is likely walking or driving a car, activity suggestions are not delivered (the availability indicator $I_t$ is set to 0).
4.2 Illustration of MRT-SS Calculator with HeartSteps
HeartSteps is a 42-day trial with five decision times per day, so that $T = 210$. Suppose we wish to size the trial to detect a given proximal effect of the intervention on a participant's step count. Notice that this is a binary treatment: participants are randomized to be delivered a suggestion or not. Suggestions are delivered with constant probability $\rho = P[A_t = 1] = 0.4$ over the course of the study, so that a participant who is always available receives an average of two messages per day.
The proximal treatment effect may vary across time for a variety of reasons. In HeartSteps, the treatment effect might initially increase, as it is believed that participants will engage enthusiastically with the intervention at the outset. Then, as the study goes on, some participants may disengage or begin to ignore the activity suggestions due to habituation, so we expect the proximal treatment effect to decrease. Thus, a plausible targeted alternative effect would be quadratic in time.
For example, we might be interested in the sample size needed to achieve at least 80% power when there is no treatment effect on the first day and the maximal proximal effect occurs around day 28, or in the power achievable with a sample size of 40. Sample size and power calculations for HeartSteps are provided in Figure 8.
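To sketch how the pieces of Sections 2 and 3 combine for HeartSteps, the self-contained R snippet below evaluates the same non-central $F$ formula from Section 2 for a quadratic alternative with zero effect on day 1 and maximal effect on day 28. The average standardized effect (0.1), the constant expected availability (0.7), and the vertex-form trend are our placeholder assumptions, not the inputs behind Figure 8:

# Hypothetical HeartSteps-style calculation (illustrative only; not the application's code).
days <- 42; per_day <- 5; T_total <- days * per_day      # T = 210
rho <- 0.4; avail <- 0.7                                 # randomization probability and E[I_t]
q <- 3; alpha0 <- 0.05; target_power <- 0.80
# Quadratic standardized effect: zero on day 1, maximum on day 28, average 0.1 (assumed form).
day <- rep(seq_len(days), each = per_day)
s1  <- (1 - 28)^2
a   <- 0.1 / (mean((seq_len(days) - 28)^2) - s1)
d_t <- a * ((day - 28)^2 - s1)                           # beta(t) / sigma-bar at each decision time
# Because beta(t)/sigma-bar = Z_t' d, the non-centrality reduces to c_N = N * sum_t E[I_t] rho (1 - rho) d(t)^2.
info <- sum(avail * rho * (1 - rho) * d_t^2)
p <- 3                                                   # quadratic alternative: p = 3
power_at <- function(N) {
  df2  <- N - q - p
  crit <- p * (N - q - 1) / df2 * qf(1 - alpha0, p, df2)
  1 - pf(crit, p, df2, ncp = N * info)
}
N <- q + p + 1
while (power_at(N) < target_power) N <- N + 1
N              # minimum sample size for 80% power under these assumed inputs
power_at(40)   # power achievable with 40 participants under the same inputs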
5 Conclusion
MRT-SS Calculator is designed to conduct sample size and power calculations for micro-
randomized trials, a new experimental framework which allows for the investigation of the
effects of just-in-time mobile health interventions. Such interventions are of interest not only
in the realm of physical activity, as in the example in Section 4, but also in the study of smoking cessation, obesity, congestive heart failure, and healthy eating, among others. MRT-SS Calculator is accessible: it elicits trial information from the scientist through a clear, well-explained sequence of inputs and provides real-time visual feedback. The calculator is also flexible, capable of sizing trials which vary in complexity. MRT-SS Calculator can
be used to quickly and reliably determine the minimal sample size needed to achieve a given
power, or the power achievable given a sample size. MRT-SS Calculator can be used by
(a) Example sample size output for HeartSteps with a given target power of 80% and varying target
average proximal treatment effects.
(b) Example power output for HeartSteps with a given sample size of 40 and varying target average
proximal treatment effects. The application does not display power less than 50%.
Figure 8: Illustrative sample size (a) and power (b) calculations for HeartSteps. The result
is displayed in the first column, while the remaining columns are used to describe inputs
provided by the user.
scientists to ease the burden of designing micro-randomized trials to investigate proximal
treatment effects.
Acknowledgments
The development of this application, as well as the preparation of the manuscript, was undertaken with support from NIAAA R01 AA023187, NIDA P50 DA039838, NIBIB U54EB020404,
and NHLBI/NIA R01 HL125440. The authors wish to thank Prof. Susan A. Murphy for
her thoughtful advice and support throughout the course of this work.
References
Chang W, Cheng J, Allaire J, Xie Y, McPherson J (2016). Shiny: Web Application Framework for R. R package version 0.13.2, URL https://CRAN.R-project.org/package=shiny.
Klasnja P, Hekler EB, Shiffman S, Boruvka A, Almirall D, Tewari A, Murphy SA (2015).
“Microrandomized Trials: An Experimental Design for Developing Just-in-Time Adaptive
Interventions.” Health Psychology, 34, 1220–1228.
Liao P, Klasnja P, Tewari A, Murphy SA (2016). “Sample Size Calculations for Micro-
Randomized Trials in mHealth.” Statistics in Medicine, 35, 1944–1971.
Noordzij M, Tripepi G, Dekker FW, Zoccali C, Tanck MW, Jager KJ (2010). “Sample Size Calculations: Basic Principles and Common Pitfalls.” Nephrology Dialysis Transplantation, 25(5), 1388–1393.
R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
A Simulation study
In this Appendix, we conduct a simulation study to investigate the power when the sample size is 10 in settings where the theoretical power (with type I error rate α = 0.05) is above 0.8, or, equivalently, where the required sample size to achieve 0.8 power is below 10. Such situations are likely to occur if (a) the total number of decision times $T$ is relatively large, due to either a large number of days or a large number of decision times per day, (b) the provided average (standardized) proximal treatment effect is relatively large (say, 0.15) or the parameterization of the treatment effect is relatively simple (e.g., constant or linear), and (c) the average expected availability throughout the study is relatively large. In the following, we provide simulation results for these scenarios under different generative models.
First, we consider the case in which the working assumptions made to obtain a tractable sample size calculation are satisfied; see Liao et al. (2016) for details. Since neither the working assumptions nor the inputs to the sample size formula specify the error distribution, we consider five distributions for the outcomes: independent normal, correlated normal with two different correlation structures, independent $t$ with three degrees of freedom (heavy tailed), and independent (centered) exponential with rate parameter 1 (skewed). The simulation results are provided in Table A1. In general, these results show that the power remains quite robust to different error distributions when the sample size equals 10, as was demonstrated for relatively large sample sizes in Liao et al. (2016).
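For readers who wish to reproduce this kind of check, the sketch below estimates power by simulation for one simplified configuration: i.i.d. normal errors, constant availability 0.7, constant randomization probability 0.4, and a constant standardized effect of 0.12. The simplifications (no baseline covariate effect and $B_t = Z_t = 1$, so $q = p = 1$) are ours, so the numbers will not match Table A1 exactly; the sketch only illustrates the mechanics of generating data under working model (1), fitting the centered least-squares model, and applying the test of Section 2.

# Sketch of a Monte Carlo power estimate for a micro-randomized trial with N = 10.
set.seed(1)
simulate_mrt_test <- function(N = 10, D = 100, K = 5, rho = 0.4, tau = 0.7,
                              d = 0.12, alpha0 = 0.05) {
  Tt <- D * K; p <- 1; q <- 1
  XtX <- matrix(0, q + p, q + p); XtY <- rep(0, q + p)
  Xs <- vector("list", N); Ys <- vector("list", N); Is <- vector("list", N)
  for (i in 1:N) {
    I <- rbinom(Tt, 1, tau)                      # availability indicators
    A <- ifelse(I == 1, rbinom(Tt, 1, rho), 0)   # randomized treatment when available
    Y <- (A - rho) * d + rnorm(Tt)               # working model (1) with B_t = Z_t = 1, alpha = 0
    X <- cbind(1, A - rho)                       # columns: B_t and (A_t - rho_t) Z_t
    XtX <- XtX + t(X * I) %*% X
    XtY <- XtY + colSums(X * I * Y)
    Xs[[i]] <- X; Ys[[i]] <- Y; Is[[i]] <- I
  }
  theta <- solve(XtX, XtY)                       # pooled least-squares estimate of (alpha, beta)
  scores <- t(sapply(1:N, function(i) {
    e <- as.vector(Ys[[i]] - Xs[[i]] %*% theta)
    colSums(Xs[[i]] * (Is[[i]] * e))             # per-person score, used for the sandwich variance
  }))
  Q <- XtX / N
  V <- solve(Q) %*% (crossprod(scores) / N) %*% solve(Q)   # est. asymptotic var. of sqrt(N) * theta-hat
  stat <- N * theta[q + 1]^2 / V[q + 1, q + 1]             # N * beta-hat' Sigma-hat^{-1} beta-hat, p = 1
  crit <- p * (N - q - 1) / (N - q - p) * qf(1 - alpha0, p, N - q - p)
  stat > crit                                              # TRUE if H0 is rejected
}
mean(replicate(200, simulate_mrt_test()))                  # power estimate; use more replications in practice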
Secondly, we consider the case in which the working assumptions proposed in Liao et al. (2016) are not satisfied. In particular, we consider two cases. In the first case, the time-varying pattern of the underlying true proximal effects differs from the user-provided pattern; for example, the user might input a linear pattern for the treatment effect when the true effects are quadratic. The vector of standardized effects $d$ used in the sample size formula then corresponds to the projection of $d(t)$, that is,
\[
d = \left( \sum_{t=1}^{T} E[I_t]\, Z_t Z_t^\top \right)^{-1} \sum_{t=1}^{T} E[I_t]\, Z_t\, d(t);
\]
see Liao et al. (2016) for details. We consider three patterns of treatment effect which cannot be represented exactly in constant, linear, or quadratic form, but which can be approximated sufficiently well; see Figure A1. Results are provided in Table A2. In contrast to the case of relatively large sample sizes, the simulated power when $N = 10$ is slightly smaller than the power estimated by MRT-SS Calculator, by roughly 0.05 or less.
In the second case, we investigate the performance when the conditional variance of the outcome at decision time $t$,
\[
\mathrm{Var}[Y_{t+1} \mid I_t = 1, A_t] = A_t\, \sigma^2_{1t} + (1 - A_t)\, \sigma^2_{0t},
\]
is time-varying and depends on the treatment variable $A_t$. It was reported in Liao et al. (2016) that when sample sizes are relatively large, power might decrease slightly depending on the choice of $\sigma_{0t}$ and $\sigma_{1t}$. Here, we investigate whether this robustness is maintained at small sample sizes (e.g., $N = 10$). We consider three time-varying trends for the average conditional variance $\bar\sigma^2_t = \rho\, \sigma^2_{1t} + (1 - \rho)\, \sigma^2_{0t}$, together with different ratios of $\sigma_{1t}$ to $\sigma_{0t}$; see Figure A2. The simulation results are given in Table A3. The ratio $\sigma_{1t}/\sigma_{0t}$ has almost no impact on the simulated power. Large variation in $\bar\sigma_t$, e.g., trend 3 in Figure A2, reduces the power in all cases. This is also true when the sample size is relatively large:
D  K  Z  d̄  Est. power  i.i.d. Normal  i.i.d. t dist.  i.i.d. Exp. dist.  AR(-0.8)  AR(-0.5)  AR(0.5)  AR(0.8)  CSblock(0.5)  CSblock(0.8)
100 5 0 0.12 0.839 0.818 0.806 0.820 0.813 0.801 0.847 0.817 0.926 0.936
100 5 1 0.15 0.914 0.891 0.906 0.892 0.885 0.896 0.871 0.879 0.961 0.977
100 5 2 0.20 0.907 0.854 0.861 0.862 0.859 0.859 0.862 0.854 0.929 0.947
50 10 0 0.12 0.839 0.816 0.814 0.806 0.818 0.820 0.818 0.826 0.867 0.888
50 10 1 0.15 0.915 0.886 0.882 0.884 0.900 0.906 0.904 0.897 0.932 0.936
50 10 2 0.20 0.907 0.876 0.868 0.848 0.856 0.823 0.847 0.861 0.899 0.928
25 25 0 0.12 0.908 0.891 0.884 0.903 0.906 0.888 0.899 0.896 0.906 0.916
25 25 1 0.15 0.963 0.945 0.948 0.956 0.940 0.938 0.950 0.961 0.962 0.955
25 25 2 0.20 0.955 0.928 0.932 0.936 0.943 0.930 0.942 0.921 0.940 0.941
10 50 0 0.12 0.839 0.806 0.809 0.816 0.823 0.815 0.802 0.794 0.824 0.842
10 50 1 0.15 0.926 0.903 0.893 0.902 0.887 0.916 0.892 0.890 0.908 0.911
10 50 2 0.20 0.912 0.868 0.850 0.879 0.865 0.867 0.850 0.859 0.889 0.881
Table A1: Simulation results when the working assumptions are true. D = number of days. K = number of decision times per day. Z refers to the parameterization of the treatment effect in both the sample size model and the simulation: 0 = constant, 1 = linear, and 2 = quadratic. $\bar{d}$ is the average standardized treatment effect. In all cases, the initial effect is 0, the treatment effects are identical within the same day, and the maximal effect is reached midway through the study. The underlying true effects in the generative model are the same as in the sample size model. The expected availability is assumed constant throughout the study and equal to 0.7. For the error distributions: the $t$ distribution has 3 degrees of freedom; the exponential distribution has rate parameter 1; AR($\rho$) and CSblock($\rho$) are correlated normal distributions with correlation structure $\Sigma = (\Sigma_{ij})$ satisfying $\Sigma_{ij} = \rho^{|i-j|}$ and $\Sigma_{ij} = \rho$ for $i \neq j$ within the same day (0 otherwise), respectively. Results are based on 1,000 replications.
(a) Trend 1: maintained effect. (b) Trend 2: slightly degraded effect. (c) Trend 3: severely degraded effect.
Figure A1: Proximal treatment effects $\{\beta(t)\}_{t=1}^{T}$ representing maintained (a), slightly degraded (b), and severely degraded (c) time-varying treatment effects. The horizontal axis is the decision time point; the vertical axis is the standardized treatment effect.
the reduction in power is similar, on average 0.05. When treatment effects are constant, or quadratic with the maximal effect midway through the study, neither a decreasing nor an increasing $\bar\sigma_t$ affects power substantially. When treatment effects are linear, an increasing trend (Figure A2a) lowers the power, while a decreasing trend (Figure A2b) improves it.
D  K  Z  β(t)  d̄  Estimated power  Simulated power
100 5 2 (a) 0.20 0.905 0.828
100 5 2 (b) 0.20 0.901 0.833
100 5 2 (c) 0.20 0.931 0.897
50 10 2 (a) 0.20 0.906 0.867
50 10 2 (b) 0.20 0.903 0.851
50 10 2 (c) 0.20 0.933 0.898
25 25 2 (a) 0.20 0.957 0.927
25 25 2 (b) 0.20 0.960 0.942
25 25 2 (c) 0.20 0.977 0.972
10 50 2 (a) 0.20 0.920 0.875
10 50 2 (b) 0.20 0.917 0.872
10 50 2 (c) 0.20 0.952 0.933
100 5 1 (a) 0.15 0.871 0.822
100 5 1 (b) 0.15 0.841 0.787
100 5 1 (c) 0.15 0.820 0.758
50 10 1 (a) 0.15 0.873 0.835
50 10 1 (b) 0.15 0.842 0.808
50 10 1 (c) 0.15 0.820 0.755
25 25 1 (a) 0.15 0.923 0.896
25 25 1 (b) 0.15 0.904 0.857
25 25 1 (c) 0.15 0.896 0.868
10 50 1 (a) 0.15 0.889 0.877
10 50 1 (b) 0.15 0.853 0.814
10 50 1 (c) 0.15 0.822 0.768
100 5 0 (a) 0.12 0.839 0.825
100 5 0 (b) 0.12 0.839 0.818
100 5 0 (c) 0.12 0.839 0.828
50 10 0 (a) 0.12 0.839 0.792
50 10 0 (b) 0.12 0.839 0.821
50 10 0 (c) 0.12 0.839 0.774
25 25 0 (a) 0.12 0.908 0.890
25 25 0 (b) 0.12 0.908 0.886
25 25 0 (c) 0.12 0.908 0.893
10 50 0 (a) 0.12 0.839 0.819
10 50 0 (b) 0.12 0.839 0.833
10 50 0 (c) 0.12 0.839 0.826
Table A2: Simulation results when the treatment effect is misspecified. D = number of days. K = number of decision times per day. Z refers to the parameterization of the treatment effect in both the sample size model and the simulation: 0 = constant, 1 = linear, and 2 = quadratic. β(t) is the underlying true treatment effect in the generative model; entries correspond to the subfigures of Figure A1. $\bar{d}$ is the average standardized treatment effect. The expected availability is assumed constant throughout the study and equal to 0.7. The error distribution in the generative model is i.i.d. normal. Results are based on 1,000 replications.
                               ratio = 0.8                   ratio = 1.0                   ratio = 1.2
D  K  Z  d̄  Est. power  Trend 1  Trend 2  Trend 3  Trend 1  Trend 2  Trend 3  Trend 1  Trend 2  Trend 3
100 5 2 0.20 0.907 0.866 0.864 0.828 0.864 0.859 0.819 0.848 0.874 0.834
50 10 2 0.20 0.907 0.843 0.863 0.827 0.851 0.850 0.830 0.847 0.854 0.828
25 25 2 0.20 0.955 0.932 0.940 0.917 0.916 0.938 0.908 0.935 0.935 0.901
10 50 2 0.20 0.912 0.845 0.891 0.830 0.852 0.900 0.849 0.846 0.883 0.849
100 5 1 0.15 0.914 0.830 0.948 0.878 0.811 0.942 0.849 0.831 0.940 0.875
50 10 1 0.15 0.915 0.823 0.941 0.878 0.811 0.950 0.854 0.800 0.959 0.871
25 25 1 0.15 0.963 0.887 0.991 0.946 0.898 0.983 0.938 0.920 0.990 0.935
10 50 1 0.15 0.926 0.802 0.962 0.892 0.803 0.959 0.887 0.847 0.960 0.887
100 5 0 0.12 0.839 0.803 0.816 0.774 0.820 0.787 0.789 0.796 0.832 0.781
50 10 0 0.12 0.839 0.815 0.813 0.793 0.832 0.810 0.758 0.798 0.805 0.778
25 25 0 0.12 0.908 0.895 0.879 0.888 0.888 0.876 0.895 0.890 0.893 0.880
10 50 0 0.12 0.839 0.822 0.825 0.785 0.807 0.822 0.777 0.827 0.814 0.787
Table A3: Simulation results when the conditional variance of the outcome is time-varying and depends on the treatment variable. D = number of days. K = number of decision times per day. Z refers to the parameterization of the treatment effect in both the sample size model and the simulation: 0 = constant, 1 = linear, and 2 = quadratic. $\bar{d}$ is the average standardized treatment effect. In all cases, the initial effect is 0, the treatment effects are identical within the same day, and the maximal effect is attained midway through the study. The underlying true effects in the generative model are the same as in the sample size model. The expected availability is assumed constant throughout the study and equal to 0.7. The ratio is defined as $\sigma_{1t}/\sigma_{0t}$ and is assumed constant. The trend refers to the three time-varying patterns of the average conditional variance $\{\bar\sigma^2_t\}_{t=1}^{T}$; see Figure A2. Results are based on 1,000 replications.
(a) Trend 1: linearly increasing. (b) Trend 2: linearly decreasing. (c) Trend 3: jump discontinuity.
Figure A2: Trends of $\bar\sigma_t$. For all trends, $\bar\sigma^2_t$ is scaled so that $(1/T)\sum_{t=1}^{T} \bar\sigma^2_t = 1$. The horizontal axis is the decision time point; the vertical axis is the average conditional variance $\bar\sigma^2_t$.