Electronic Journal of Statistics
Vol. 17 (2023) 1492–1546
ISSN: 1935-7524
https://doi.org/10.1214/23-EJS2137
Pretest estimation in combining
probability and non-probability samples
Chenyin Gao and Shu Yang
Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
Abstract: Multiple heterogeneous data sources are becoming increasingly
available for statistical analyses in the era of big data. As an important
example in finite-population inference, we develop a unified framework of
the test-and-pool approach to general parameter estimation by combining
gold-standard probability and non-probability samples. We focus on the
case when the study variable is observed in both datasets for estimating
the target parameters, and each contains other auxiliary variables. Utilizing
the probability design, we conduct a pretest procedure to determine the
comparability of the non-probability data with the probability data and
decide whether or not to leverage the non-probability data in a pooled
analysis. When the probability and non-probability data are comparable,
our approach combines both data for efficient estimation. Otherwise, we
retain only the probability data for estimation. We also characterize the
asymptotic distribution of the proposed test-and-pool estimator under a local alternative and provide a data-adaptive procedure to select the critical tuning parameters that target the smallest mean squared error of the test-and-pool estimator. Lastly, to deal with the non-regularity of the test-and-pool estimator, we construct a robust confidence interval that has a good finite-sample coverage property.
MSC2020 subject classifications: Primary 62D05; secondary 62E20,
62F03, 62F35.
Keywords and phrases: Data integration, dynamic borrowing, non-regularity, pretest estimator.
Received October 2022.
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1493
2 Basic setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1495
   2.1 Notation: two data sources . . . . . . . . . . . . . . . . . . . . 1495
   2.2 Assumptions and separate estimators . . . . . . . . . . . . . . . 1496
   2.3 Efficient estimator . . . . . . . . . . . . . . . . . . . . . . . . . 1498
3 Test-and-pool estimator . . . . . . . . . . . . . . . . . . . . . . . . 1499
   3.1 Hypothesis and test . . . . . . . . . . . . . . . . . . . . . . . . 1499
   3.2 Data-driven pooling . . . . . . . . . . . . . . . . . . . . . . . . 1500
4 Asymptotic properties of the test-and-pool estimator . . . . . . . . . 1501
   4.1 Asymptotic distribution . . . . . . . . . . . . . . . . . . . . . . 1501
Yang's research is partially supported by NIH grants 1R01AG066883 and 1R01ES031651.
   4.2 Asymptotic bias and mean squared error . . . . . . . . . . . . . 1502
   4.3 Adaptive inference . . . . . . . . . . . . . . . . . . . . . . . . . 1503
5 Simulation study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1505
6 A real-data illustration . . . . . . . . . . . . . . . . . . . . . . . . . 1508
7 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 1510
A Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1510
   A.1 Regularity conditions . . . . . . . . . . . . . . . . . . . . . . . 1510
   A.2 Proof of Lemmas 2.1 and 3.1 . . . . . . . . . . . . . . . . . . . 1512
   A.3 Proof of Lemma 3.2 . . . . . . . . . . . . . . . . . . . . . . . . 1516
   A.4 Proof of Theorem 4.1 . . . . . . . . . . . . . . . . . . . . . . . 1517
   A.5 Proof of the bias and mean squared error of $n^{1/2}(\hat\mu_{\rm tap} - \mu_g)$ . . 1520
   A.6 Proof of the asymptotic distribution for $\mathcal{U}(a)$ . . . . . . . . . . 1522
   A.7 Proof of Theorem 4.2 . . . . . . . . . . . . . . . . . . . . . . . 1524
   A.8 Proof of Remark 4.1 . . . . . . . . . . . . . . . . . . . . . . . . 1525
   A.9 Proof of Lemma A.1 . . . . . . . . . . . . . . . . . . . . . . . . 1526
   A.10 Proof of Lemma A.2 . . . . . . . . . . . . . . . . . . . . . . . 1527
   A.11 Proof of Lemma A.3 . . . . . . . . . . . . . . . . . . . . . . . 1528
B Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1530
   B.1 A detailed illustration of simulation . . . . . . . . . . . . . . . . 1530
   B.2 A detailed illustration of bias and mean squared error . . . . . . 1536
   B.3 Additional simulation results . . . . . . . . . . . . . . . . . . . 1539
   B.4 Double-bootstrap procedure for $v_n$ selection . . . . . . . . . . . 1540
   B.5 Details of the Bayesian method . . . . . . . . . . . . . . . . . . 1541
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1542
1. Introduction
It has been widely accepted that probability sampling, where each selected sample is treated as representative of the target population, is the best vehicle for finite-population inference. Since the sampling mechanism is known by survey design, each weight-calibrated sample can be used to obtain consistent estimators for the target population; see [53], [15] and [24] for textbook discussions. However, complex and ambitious surveys face mounting hurdles and concerns, such as costly intervention strategies and declining participation rates. [2] address some of the current challenges in using probability samples for finite-population inference. On the other hand, the growing demand for small area estimation, among other factors, has led researchers to seek alternative data collection at lower cost [69, 28]. In particular, much attention has been drawn to the study of non-probability samples.
Non-probability samples are sets of selected objects whose sampling mechanism is unknown. First, non-probability samples are readily available from many data sources, such as satellite information [36], mobile sensor data [40], and web survey panels [62]. In addition, these non-representative samples are far more cost-effective than probability samples and have the potential to provide estimates in near real-time, unlike traditional inferences derived from probability samples [42]. Based on such big and easily accessible data, a wealth of literature has emerged that enunciates their bright future when such data are properly utilized (e.g., [18], [14], [61] and [41]).

However, the naive use of such data cannot ensure the statistical validity of the resulting estimators because such non-probability samples are often selected without sophisticated supervision. Therefore, the acquisition of large yet highly unrepresentative data is likely to produce erroneous conclusions. [17] and [23] present recent examples where non-probability samples can lead to estimates with significant selection biases. To overcome these challenges, it is essential to establish appropriate statistical tools to draw valid inferences when integrating data from probability and non-probability samples. Various data integration methods have been proposed in the literature to leverage the unique strengths of the probability and non-probability samples; see [73] for a review. The existing methods can be categorized into three types: inverse propensity score adjustment [50, 22], calibration weighting [19, 31], and mass imputation [46, 30, 74, 11].
Most existing works, however, assume that the non-probability sample is comparable to the probability sample in terms of estimating the finite-population parameters, which may not be satisfied in many applications due to the unknown sampling mechanism of the non-probability samples. Thus, non-probability samples with unknown sampling mechanisms may bias the estimators of the target parameters. To resolve this issue, [47] propose a pretest to gauge the statistical adequacy of integrating the probability and non-probability samples in an application. The pretesting procedure has been broadly practiced in econometrics and medicine, and its implications are of considerable interest (e.g., [68, 63, 3, 72]). Essentially, the final value of the estimator depends on the outcome of a random testing event and therefore is a stochastic mixture of two different estimators. Despite the long history of the application of the pretest, little literature investigates the theoretical properties of the underlying non-smooth distribution of pretest estimators.
In this paper, we establish a general statistical framework for the test-and-pool analysis of probability and non-probability samples by constructing a test to gauge the comparability of the non-probability data and decide whether or not to use the non-probability data in a pooled analysis. In addition, we consider the null, fixed, and local alternative hypotheses for the pretesting, representing different levels of comparability of the non-probability data with the probability data. In particular, the non-probability sample is perfectly comparable under the null hypothesis, whereas it is starkly incomparable under the fixed alternative. The fixed alternative therefore cannot adequately capture the finite-sample behavior of the pretest estimator, since under it the test statistic diverges to infinity as the sample size increases. Toward this end, we establish the asymptotic distribution of the proposed estimator under local alternatives, which provides a better approximation of the finite-sample behavior of the pretest estimator when the idealistic assumption required for the non-probability data is weakly violated. Also, we provide a data-adaptive procedure to select the optimal values of the tuning parameters achieving the smallest mean squared error of the pretest estimator. Lastly, we construct a robust confidence interval accounting for the non-regularity of the estimator, which has a valid coverage property.
The rest of the paper is organized as follows. Section 2 lays out the basic setup and presents an efficient estimator for combining the non-probability sample and the probability sample. Section 3 proposes a test statistic and the test-and-pool estimator. In Section 4, we present the asymptotic properties of the test-and-pool estimator, an adaptive inference procedure, and a data-adaptive selection scheme for the tuning parameters. Section 5 presents a simulation study to evaluate the performance of our test-and-pool estimator. Section 6 provides a real-data illustration. All proofs are given in the Appendix.
2. Basic setup
2.1. Notation: two data sources
Let $\mathcal{F}_N = \{V_i = (X_i, Y_i)^\top : i \in U\}$ with $U = \{1,\ldots,N\}$ denote a finite population of size $N$, where $X_i$ is a vector of covariates and $Y_i$ is the study variable. We assume that $\mathcal{F}_N$ is a random sample from a superpopulation model $\zeta$, and our objective is to estimate the finite-population parameter $\mu_g \in \mathbb{R}^l$, defined as the solution to
$$N^{-1}\sum_{i=1}^{N} S(V_i;\mu) = 0, \qquad (2.1)$$
where $S(V_i;\mu)$ is an $l$-dimensional estimating function. The class of parameters is fairly general. For example, if $S(V;\mu) = Y - \mu$, then $\mu_g = \bar{Y}_N = N^{-1}\sum_{i=1}^{N} Y_i$ is the population mean of $Y_i$. If $S(V;\mu) = 1(Y < c) - \mu$ for some constant $c$, where $1(\cdot)$ is an indicator function, then $\mu_g = N^{-1}\sum_{i=1}^{N} 1(Y_i < c)$ is the population proportion of $Y_i$ less than $c$. If $S(V;\mu) = X(Y - X^\top\mu)$, then $\mu_g = (\sum_{i=1}^{N} X_i X_i^\top)^{-1}(\sum_{i=1}^{N} X_i Y_i)$ is the coefficient of the finite-population regression projection of $Y_i$ onto $X_i$.
Suppose that there are two data sources, one from a probability sample, referred to as Sample A, and the other from a non-probability sample, referred to as Sample B. Assume Sample A to be independent of Sample B, and the observed units can be envisioned as being generated through two phases of sampling [12]. First, a superpopulation model $\zeta$ generates the finite population $\mathcal{F}_N$. Then, the probability (or non-probability) sample is selected from it using some known (or unknown) sampling scheme. Hence, the total variance of estimators considered here is based on the randomness induced by both the superpopulation model and the sampling mechanisms; see Table 1 for the notation for probability order, expectation and (co-)variance. For example, $E_p(\cdot \mid \mathcal{F}_N)$ is the average over all possible samples under the probability design for a particular finite population $\mathcal{F}_N$, and $E(\cdot)$ is the average over all possible samples from all possible finite populations.
1496 C. Gao and S. Yang
Table 1
Notation and definitions.

Randomness                order notation                          expectation                         (co-)variance
probability design        $o_p(1)$, $O_p(1)$                      $E_p(\cdot\mid\mathcal{F}_N)$       $\mathrm{var}_p(\cdot\mid\mathcal{F}_N)$, $\mathrm{cov}_p(\cdot\mid\mathcal{F}_N)$
non-probability design    $o_{np}(1)$, $O_{np}(1)$                $E_{np}(\cdot\mid\mathcal{F}_N)$    $\mathrm{var}_{np}(\cdot\mid\mathcal{F}_N)$, $\mathrm{cov}_{np}(\cdot\mid\mathcal{F}_N)$
$\zeta$ model             $o_\zeta(1)$, $O_\zeta(1)$              $E_\zeta(\cdot)$                    $\mathrm{var}_\zeta(\cdot)$, $\mathrm{cov}_\zeta(\cdot)$
total variance            $o_{\zeta\text{-}p\text{-}np}(1)$, $O_{\zeta\text{-}p\text{-}np}(1)$       $E(\cdot)$       $\mathrm{var}(\cdot)$, $\mathrm{cov}(\cdot)$
Thus far, our focus has been on the setting where the covariates $X$ and the study variable $Y$ are available in both the probability and non-probability samples, which has also been considered in [21] and [20]. The sampling indicators are denoted by $\delta_{A,i}$ and $\delta_{B,i}$, respectively; e.g., $\delta_{A,i} = 1$ if unit $i$ is selected into Sample A and zero otherwise. Sample A contains the observations $O_A = \{(d_i = \pi_{A,i}^{-1}, X_i, Y_i) : i \in \mathcal{A}\}$ with sample size $n_A$, where $\pi_{A,i}$ is the known first-order inclusion probability for Sample A, and Sample B contains the observations $O_B = \{(X_i, Y_i) : i \in \mathcal{B}\}$ with sample size $n_B$. The unknown propensity score for being selected into Sample B is denoted by $\pi_{B,i}$. Here, $\mathcal{A}$ and $\mathcal{B}$ denote the indexes of units in Samples A and B with total sample size $n = n_A + n_B$ and negligible sampling fractions, i.e., $n/N = o(1)$. Let the limits of the fractions of Samples A and B be $f_A = \lim_{n\to\infty} n_A/n$ and $f_B = \lim_{n\to\infty} n_B/n$ with $0 < f_A, f_B < 1$.
2.2. Assumptions and separate estimators
As observing $(X_i, Y_i)$ for all units $i$ in $U$ is usually not feasible in practice, we can estimate the population estimating equation (2.1) by its design-weighted sample analog under the probability sampling design,
$$N^{-1}\sum_{i=1}^{N}\frac{\delta_{A,i}}{\pi_{A,i}} S(V_i;\mu) = 0, \qquad (2.2)$$
yielding a design-weighted Z-estimator $\hat\mu_A$ [65]. When $S(V;\mu)$ is a score function, the resulting estimator is a pseudo maximum likelihood estimator [58]. For example, for estimating $\bar{Y}_N$, we have $S(V;\mu) = Y - \mu$, which leads to $\hat\mu_A = (\sum_{i=1}^{N}\delta_{A,i}\pi_{A,i}^{-1})^{-1}\sum_{i=1}^{N}\delta_{A,i}\pi_{A,i}^{-1} Y_i$. We now make the following assumption for the design-weighted Z-estimator.
Assumption 2.1 (Design consistency and central limit theorem). Let $\hat\mu_A$ be the corresponding design-weighted Z-estimator of $\mu_g$, which satisfies $\mathrm{var}_p(\hat\mu_A \mid \mathcal{F}_N) = O_\zeta(n_A^{-1})$ and $\{\mathrm{var}_p(\hat\mu_A)\}^{-1/2}(\hat\mu_A - \mu_g) \mid \mathcal{F}_N \to \mathcal{N}(0,1)$ in distribution as $n_A \to \infty$.
Under typical regularity conditions [24], Assumption 2.1 holds for many common sampling designs, such as probability-proportional-to-size sampling and stratified simple random sampling. Under Assumption 2.1, $\hat\mu_A$ is design-consistent and does not rely on any modeling assumptions. This explains why probability sampling has been the gold-standard approach for finite-population inference, and we make this assumption throughout this article.
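As a concrete illustration, the minimal sketch below (a hypothetical Poisson sampling design) solves the design-weighted estimating equation (2.2) for the population mean by root-finding; for $S(V;\mu) = Y - \mu$ the root coincides with the weighted mean given above.

```python
# A minimal sketch (hypothetical Poisson design) of the design-weighted
# Z-estimator solving (2.2).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
N = 10**5
Y = rng.normal(loc=3.0, size=N)
pi_A = np.clip(0.004 + 0.004 * rng.random(N), 0.0, 1.0)  # known inclusion probs
delta_A = rng.random(N) < pi_A                           # Poisson sampling

def ee(mu):
    # design-weighted sample analog of (2.1): N^{-1} sum_i delta_i/pi_i S(V_i; mu)
    return np.sum(delta_A / pi_A * (Y - mu)) / N

mu_A_hat = brentq(ee, Y.min(), Y.max())  # equals the weighted (Hajek) mean here
```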
Test-and-pool estimator 1497
Let $f(Y \mid X)$ be the conditional density function of $Y$ given $X$ in the superpopulation model $\zeta$, and let $f(X)$ and $f(X \mid \delta_B = 1)$ be the density functions of $X$ in the finite population and the non-probability sample, respectively. To correct for the selection bias of the non-probability sample, most of the existing literature considers the following assumptions [e.g., 46, 66, 12].

Assumption 2.2 (Common support and ignorability of sampling). (i) The vector of covariates $X$ has a compact and convex support, with its density bounded and bounded away from zero. Also, there exist positive constants $C_l$ and $C_u$ such that $C_l \le f(X)/f(X \mid \delta_B = 1) \le C_u$ almost surely. (ii) Conditional on $X$, the density of $Y$ in the non-probability sample follows the superpopulation model; i.e., $f(Y \mid X, \delta_B = 1) = f(Y \mid X)$. (iii) The sample inclusion indicators $\delta_{B,i}$ and $\delta_{B,j}$ are independent given $X_i$ and $X_j$ for $i \neq j$.
Assumption 2.2 (i) and (ii) constitute the strong sampling ignorability condition [50]. Assumption 2.2 (i) implies that the support of $X$ in the non-probability sample is the same as that in the finite population, and it can also be formulated as a positivity assumption that $P(\delta_B = 1 \mid X) > 0$ for all $X$. This assumption does not hold if certain units would never be included in the non-probability sample. Assumption 2.2 (ii) is equivalent to the ignorability of the sampling mechanism for the non-probability sample conditional on the covariates $X$, i.e., $P(\delta_B = 1 \mid X, Y) = P(\delta_B = 1 \mid X)$ [34]. This assumption holds if the set of covariates contains all the outcome predictors that affect the possibility of being selected into the non-probability sample. Assumption 2.2 (iii) is a critical condition for employing the weak law of large numbers under the non-probability sampling design [12]. Under Assumption 2.2, the non-probability sample can be used to produce consistent estimators. However, this assumption may be unrealistic if the non-probability data collection suffers from uncontrolled selection biases [6], measurement errors [17], or other error-prone issues. Thus, we regard Assumption 2.2 as an idealistic assumption, which may be violated and requires pretesting.
Under Assumptions 2.1 and 2.2, let $\Phi_A(V,\delta_A;\mu)$ and $\Phi_B(V,\delta_A,\delta_B;\mu)$ be two $l$-dimensional estimating functions for the target parameter $\mu_g$ when using the probability sample and the combined samples, respectively. In practice, $\Phi_A(\cdot)$ and $\Phi_B(\cdot)$ may depend on unknown nuisance functions, and solving $E\{\Phi_A(V,\delta_A;\mu)\} = 0$ and $E\{\Phi_B(V,\delta_A,\delta_B;\mu)\} = 0$ is not feasible. By replacing the nuisance functions with their estimated counterparts, and the expectations with the empirical averages, we obtain $\hat\mu_A$ and $\hat\mu_B$ by solving
$$N^{-1}\sum_{i=1}^{N}\hat\Phi_A(V_i,\delta_{A,i};\mu) = 0, \qquad N^{-1}\sum_{i=1}^{N}\hat\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\mu) = 0, \qquad (2.3)$$
respectively, where $\{\hat\Phi_A(\cdot), \hat\Phi_B(\cdot)\}$ are the estimated versions of $\{\Phi_A(\cdot), \Phi_B(\cdot)\}$.
Remark 2.1. For estimating the finite-population mean, that is, $\mu_g = \bar{Y}_N$, $\Phi_A(\cdot)$ and $\Phi_B(\cdot)$ are commonly chosen as
$$\Phi_A(V,\delta_A;\mu) = \frac{\delta_A}{\pi_A}(Y - \mu), \qquad (2.4)$$
$$\Phi_B(V,\delta_A,\delta_B;\mu) = \frac{\delta_B}{\pi_B(X)}\{Y - m(X)\} + \frac{\delta_A}{\pi_A}\,m(X) - \mu, \qquad (2.5)$$
where $\pi_B(X) = P(\delta_B = 1 \mid X)$ and $m(X) = E(Y \mid X, \delta_B = 1)$. To obtain the estimators $\hat\mu_A$ and $\hat\mu_B$, parametric models $\pi_B(X;\alpha)$ and $m(X;\beta)$ can be posited for the nuisance functions $\pi_B(X)$ and $m(X)$, respectively.
In addition, researchers might be interested in estimating individual-level outcomes rather than population-level outcomes. In this case, $\Phi_A(\cdot)$ and $\Phi_B(\cdot)$ can be specified for estimating the outcome model $m(X;\beta)$ as
$$\Phi_A(V,\delta_A;\beta) = \frac{\delta_A}{\pi_A}\,\frac{\partial m(X;\beta)}{\partial\beta}\{Y - m(X;\beta)\},$$
$$\Phi_B(V,\delta_A,\delta_B;\beta) = \left\{\frac{\delta_A}{\pi_A} + \frac{\delta_B}{\pi_B(X)}\right\}\frac{\partial m(X;\beta)}{\partial\beta}\{Y - m(X;\beta)\}.$$
Next, we adopt the model-design-based framework for inference, which incorporates the randomness over the two phases of sampling [27, 37, 7, 70]. The asymptotic properties of $\hat\mu_A$ and $\hat\mu_B$ can be derived using standard M-estimation theory under suitable moment conditions.
Lemma 2.1. Suppose Assumptions 2.1, 2.2 and the additional regularity conditions A.1 hold. Then, we have
$$n^{1/2}\begin{pmatrix}\hat\mu_A - \mu_g \\ \hat\mu_B - \mu_g\end{pmatrix} \to \mathcal{N}\left\{\begin{pmatrix}0_{l\times 1} \\ 0_{l\times 1}\end{pmatrix}, \begin{pmatrix}V_A & \Gamma \\ \Gamma^\top & V_B\end{pmatrix}\right\}, \qquad (2.6)$$
where $V_A$, $V_B$, and $\Gamma$ are defined explicitly in the Appendix.

In Lemma 2.1, we extend the conditional normality to the unconditional one as in [55], which implies that the asymptotic (co-)variance terms $V_A$, $V_B$ and $\Gamma$ refer to all the sources of uncertainty over the two phases.
2.3. Efficient estimator
Under Assumptions 2.1 and 2.2, both $\hat\mu_A$ and $\hat\mu_B$ are consistent, and it is appealing to combine $\hat\mu_A$ with $\hat\mu_B$ to achieve efficient estimation. We consider a class of linear combinations of the functions in (2.3):
$$\sum_{i=1}^{N}\{\hat\Phi_A(V_i,\delta_{A,i};\mu) + \Lambda\,\hat\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\mu)\} = 0, \qquad (2.7)$$
where $\Lambda$ is the linear coefficient that gauges how much information from the non-probability sample should be integrated with the probability sample. Equation (2.7) leads to a class of composite estimators, each a weighted average of $\hat\mu_A$ and $\hat\mu_B$ with $\Lambda$-indexed weights $\omega_A$ and $\omega_B$. When $\Lambda = 0$, (2.7) recovers the design-consistent estimator $\hat\mu_A$. The optimal choice $\Lambda_{\rm eff}$ can be empirically tuned to minimize the asymptotic variance of the composite estimator, leading to the efficient estimator $\hat\mu_{\rm eff}$. However, the major concern for $\hat\mu_{\rm eff}$ is the possible bias due to the violation of Assumption 2.2 (ii) for the non-probability sample. When it is violated, it is reasonable to choose $\Lambda = 0$ and prevent any bias associated with the non-probability sample.
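For intuition, in the scalar case the variance-minimizing combination has a closed form; the sketch below (with assumed variance inputs $V_A$, $V_B$, $\Gamma$) is a minimal illustration of such a composite estimator, not the paper's general matrix-valued $\Lambda_{\rm eff}$ given in the Appendix.

```python
# A minimal sketch (scalar case, assumed known variances) of a composite
# estimator in the spirit of (2.7): var{w*muA + (1-w)*muB} is minimized at
# w = (V_B - Gamma) / (V_A + V_B - 2*Gamma).
def efficient_combination(mu_A, mu_B, V_A, V_B, Gamma):
    w = (V_B - Gamma) / (V_A + V_B - 2 * Gamma)
    mu_eff = w * mu_A + (1 - w) * mu_B
    V_eff = w**2 * V_A + (1 - w)**2 * V_B + 2 * w * (1 - w) * Gamma
    return mu_eff, V_eff

mu_eff, V_eff = efficient_combination(3.02, 2.98, V_A=0.010, V_B=0.004,
                                      Gamma=0.001)
```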
3. Test-and-pool estimator
Motivated by the above reasoning, we develop a strategy that first pretests the comparability of the non-probability sample with the probability sample and then decides whether or not we should combine them for efficient estimation. We formulate the hypothesis test in Section 3.1 and construct the test-and-pool estimator in Section 3.2.
3.1. Hypothesis and test
We formalize the null hypothesis $H_0$ when Assumption 2.2 holds, and the fixed and local alternatives $H_a$ and $H_{a,n}$ when Assumption 2.2 is violated. To be specific, we consider
$$H_0: E\{\Phi_B(V,\delta_A,\delta_B;\mu_{g,0})\} = 0, \qquad (3.1)$$
$$H_a: E\{\Phi_B(V,\delta_A,\delta_B;\mu_{g,0})\} = \eta_{\rm fix}, \qquad (3.2)$$
$$H_{a,n}: E\{\Phi_B(V,\delta_A,\delta_B;\mu_{g,0})\} = n_B^{-1/2}\eta, \qquad (3.3)$$
where $\mu_{g,0} = E_\zeta(\mu_g)$, $\mu_g = \mu_{g,0} + O_\zeta(N^{-1/2})$, and $\eta_{\rm fix}$, $\eta$ are two fixed parameters. The fixed alternative $H_a$ is commonly considered in the standard hypothesis testing framework. However, it forces the bias of the estimating function $\Phi_B(\cdot)$ to be fixed and indicates a strong violation of Assumption 2.2, under which the test statistic $T$ diverges to infinity with the sample size. Moreover, inference under the fixed alternative cannot capture the finite-sample behavior of the test well and lacks uniform validity. On the contrary, the local alternative provides a useful tool to study the finite-sample distribution of non-regular estimators when the signal of violation is weak, i.e., in an $n_B^{-1/2}$ neighborhood of zero. In such cases, we allow the existence of a set of unmeasured covariates whose association with either the possibility of being selected into Sample B or the outcome is small. Also, the local alternative $H_{a,n}$ is more general in the sense that it reduces to $H_a$ with $\eta = \pm\infty$, and it has been widely employed to illustrate non-regularity settings, such as weak instrumental variables regression [59], regression estimators of weakly identified parameters [13] and test errors in classification [33]. We will mainly exploit the local alternative to show the inherent non-regularity of the pretest estimator.

Under the null hypothesis (3.1), $\hat\mu_B$ is consistent, and hence it is reasonable to combine $\hat\mu_A$ and $\hat\mu_B$ for efficient estimation. However, when the null hypothesis is violated as in (3.3), the efficient estimator is biased. Lemma 3.1 presents the asymptotic properties of the separate and efficient estimators under $H_{a,n}$.
Lemma 3.1. Suppose Assumptions 2.1, 2.2 (i) and (iii), and all the regularity conditions in Lemma 2.1 hold. Then, under the local alternative $H_{a,n}$, the asymptotic distributions of $\hat\mu_A$ and $\hat\mu_B$ are
$$n^{1/2}\begin{pmatrix}\hat\mu_A - \mu_g \\ \hat\mu_B - \mu_g\end{pmatrix} \to \mathcal{N}\left\{\begin{pmatrix}0_{l\times 1} \\ -f_B^{-1/2}\,E\{\partial\Phi_B(V,\delta_A,\delta_B;\mu_{g,0})/\partial\mu\}^{-1}\eta\end{pmatrix}, \begin{pmatrix}V_A & \Gamma \\ \Gamma^\top & V_B\end{pmatrix}\right\}. \qquad (3.4)$$
The asymptotic distribution of the efficient estimator $\hat\mu_{\rm eff}$ is
$$n^{1/2}(\hat\mu_{\rm eff} - \mu_g) \to \mathcal{N}\{b_{\rm eff}(\eta), V_{\rm eff}\},$$
where $b_{\rm eff}(\eta) = -f_B^{-1/2}\,\omega_B(\Lambda_{\rm eff})\,E\{\partial\Phi_B(V,\delta_A,\delta_B;\mu_{g,0})/\partial\mu\}^{-1}\eta$. The exact forms of $\omega_B(\Lambda_{\rm eff})$ and $V_{\rm eff}$ are presented in Lemma A.3.
By Lemma 3.1, among the three estimators $\hat\mu_A$, $\hat\mu_B$ and $\hat\mu_{\rm eff}$, when $H_0$ holds, $\hat\mu_{\rm eff}$ is optimal because it is consistent and the most efficient; when $H_0$ is violated, $\hat\mu_A$ is optimal because it is consistent while the other two estimators are not.
We now use pretesting to guide the choice among the estimators. To test $H_0$, the key insight is that $\hat\mu_A$ is always consistent for $\mu_g$ by Assumption 2.1, and if $H_0$ holds, $\hat\Phi_{B,n}(\hat\mu_A) = n_B^{1/2}N^{-1}\sum_{i=1}^{N}\hat\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\hat\mu_A)$ should behave as a mean-zero random vector asymptotically. Thus, we construct the test statistic $T$ as
$$T = \hat\Phi_{B,n}(\hat\mu_A)^\top\,\hat\Sigma_T^{-1}\,\hat\Phi_{B,n}(\hat\mu_A), \qquad (3.5)$$
where $\Sigma_T$ is the asymptotic variance of $\hat\Phi_{B,n}(\hat\mu_A,\hat\tau)$, and $\hat\Sigma_T$ is a consistent estimator of $\Sigma_T$. The exact form of $\Sigma_T$ in (A.15) involves $V_A$, $V_B$, and $\Gamma$. Thus, $\hat\Sigma_T$ can be obtained by replacing the unknown components in the expression of $\Sigma_T$ with their estimated counterparts, and the expectations with the empirical averages. In addition, we can consider the replication-based method for variance estimation in Algorithm B.1, adapted from [35].
Lemma 3.2 serves as the foundation for our data-driven pooling step in Section 3.2.

Lemma 3.2. Suppose Assumptions 2.1, 2.2 (i) and (iii), and all the regularity conditions in Lemma 2.1 hold. Under $H_0$, the test statistic $T \to \chi^2_l$, i.e., a chi-square distribution with $l$ degrees of freedom. Under $H_{a,n}$, $T \to \chi^2_l(\eta^\top\Sigma_T^{-1}\eta/2)$ with non-centrality parameter $\eta^\top\Sigma_T^{-1}\eta/2$ as $n \to \infty$.
3.2. Data-driven pooling
If $T$ is large, it indicates that $H_0$ may be violated, and thus it is desirable to retain only the probability sample for estimation. If $T$ is small, it indicates that $H_0$ may be accepted and suggests combining the probability and non-probability samples for efficient estimation. This strategy leads to the test-and-pool estimator $\hat\mu_{\rm tap}$ as the solution to
$$\sum_{i=1}^{N}\{\hat\Phi_A(V_i,\delta_{A,i};\mu) + 1(T < c_\gamma)\,\Lambda\,\hat\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\mu)\} = 0, \qquad (3.6)$$
where $c_\gamma$ is the $(1-\gamma)$ critical value of $\chi^2_l$. In (3.6), we can fix $\Lambda$ to be the optimal form $\Lambda_{\rm eff}$, leading to an efficient estimator under $H_0$ in Section 2.3. Alternatively, we view $c_\gamma$ and $\Lambda$ jointly as tuning parameters that determine how much information from the non-probability sample can be borrowed in pooling. Larger $c_\gamma$ and $\Lambda$ borrow more information from the non-probability sample, leading to more efficient but more error-prone estimators, and vice versa. We will use a data-adaptive rule to select $(\Lambda, c_\gamma)$ that minimizes the mean squared error of $\hat\mu_{\rm tap}$.
Remark 3.1. Compared with the t-test-based pooling estimator in [38], our proposed method is more general in the sense that (a) the auxiliary covariates are used to provide a more informative model of $\mu_g$; (b) our test statistic $T$ is motivated by the estimating function, which can be more robust to model misspecification; and (c) a data-adaptive selection of $(\Lambda, c_\gamma)$ is adopted to minimize the post-integration mean squared error.
4. Asymptotic properties of the test-and-pool estimator
In this section, we characterize the asymptotic properties of $\hat\mu_{\rm tap}$. Before proceeding further, we introduce more notation. Let $I_{l\times l}$ be the $l\times l$ identity matrix, $F_l(\cdot;\eta)$ be the cumulative distribution function of $\chi^2_l$ with non-centrality parameter $\eta$, and $F_l(\cdot) = F_l(\cdot;0)$. Denote $V_{A\text{-eff}} = V_A - V_{\rm eff}$ and $V_{B\text{-eff}} = V_B - V_{\rm eff}$, which are both positive-definite.
4.1. Asymptotic distribution
By construction, the estimator $\hat\mu_{\rm tap}$ is a pretest estimator that first constructs $T$ for pretesting $H_0$ and then forms the test-based weights for combining $\hat\mu_A$ and $\hat\mu_B$. It is challenging to derive the asymptotic distribution of $\hat\mu_{\rm tap}$ because it involves the test statistic $T$ and two asymptotically dependent components $\hat\mu_A$ and $\hat\mu_B$. To formally characterize the asymptotic distribution of $\hat\mu_{\rm tap}$, we decompose the asymptotic representation of $\hat\mu_{\rm tap}$ into two orthogonal components, one affected by the testing and the other not.

First, by Lemma 3.1, let $n^{1/2}(\hat\mu_A - \mu_g) \to Z_1$ and $n^{1/2}(\hat\mu_B - \mu_g) \to Z_2$, where $Z_1$ and $Z_2$ are multivariate normal random vectors as in (3.4).
Second, by Lemma 3.2, asymptotically we write $T$ as a quadratic form $W_2^\top W_2$ with $W_2 = f_B^{1/2}\Sigma_T^{-1/2}E\{\partial\Phi_B(\mu_{g,0},\tau_0)/\partial\mu\}(Z_1 - Z_2)$. We then find another standardized $l$-variate normal vector $W_1 = f_B^{1/2}\Sigma_S^{-1/2}\{(\Gamma - V_B)(\Gamma - V_A)^{-1}Z_1 + Z_2\}$ that is orthogonal to $W_2$, where $\mathrm{cov}(W_1, W_2) = 0_{l\times l}$, $E(W_1) = \mu_1$, $\mathrm{var}(W_1) = I_{l\times l}$, $E(W_2) = \mu_2$, $\mathrm{var}(W_2) = I_{l\times l}$, and $\Sigma_S$ is introduced for the purpose of standardization.

Third, $\hat\mu_{\rm tap}$ can be asymptotically represented by two components involving $W_1$ and $W_2$, respectively; one component is affected by the test constraint and the other is not. Following the above steps, Theorem 4.1 characterizes the asymptotic distribution of $\hat\mu_{\rm tap}$.
Theorem 4.1. Suppose the assumptions in Lemma 3.1 hold except that Assumption 2.2 (ii) may be violated as dictated by $H_{a,n}$ in (3.3). Let $W_1$ and $W_2$ be independent normal random vectors with means $\mu_1$ and $\mu_2$ (given below, which vary by hypothesis) and variance matrices $I_{l\times l}$. The test-and-pool estimator $\hat\mu_{\rm tap}$ has the asymptotic distribution
$$n^{1/2}(\hat\mu_{\rm tap} - \mu_g) \to \begin{cases} V_{\rm eff}^{1/2}W_1 + (\omega_A V_{A\text{-eff}}^{1/2} - \omega_B V_{B\text{-eff}}^{1/2})\,W^t_{[0,c_\gamma]} & \text{w.p. } \xi, \\ V_{\rm eff}^{1/2}W_1 + V_{A\text{-eff}}^{1/2}\,W^t_{[c_\gamma,\infty)} & \text{w.p. } 1 - \xi, \end{cases}$$
where $W^t_{[a,b]}$ is the truncated normal distribution $W_2 \mid (a \le W_2^\top W_2 \le b)$ and $\xi = F_l(c_\gamma;\mu_2^\top\mu_2/2)$.
(a) Under $H_0$, $\mu_1 = \mu_2 = 0$ and $\xi = F_l(c_\gamma;0) = 1 - \gamma$.
(b) Under $H_{a,n}$, $\mu_1 = -\Sigma_S^{-1/2}E\{\partial\Phi_B(\mu_{g,0},\tau_0)/\partial\mu\}^{-1}\eta$, $\mu_2 = \Sigma_T^{-1/2}\eta$ and $\xi = F_l(c_\gamma;\mu_2^\top\mu_2/2)$.
Theorem 4.1 reveals that the asymptotic distribution of $\hat\mu_{\rm tap}$ depends on the local parameter $\eta$ and thus characterizes the non-regularity of the pretest estimator. When $H_0$ is violated weakly (a small perturbation in the true data generating model), the asymptotic distribution of $\hat\mu_{\rm tap}$ can change abruptly depending on $\eta$. The non-regularity of $\hat\mu_{\rm tap}$ also poses challenges for inference, as shown in Section 4.3. Based on Theorem 4.1, we derive the asymptotic biases and mean squared errors of $\hat\mu_{\rm tap}$ under $H_0$ and $H_{a,n}$, which serve as the stepping stone to a data-driven procedure for selecting the tuning parameters $\Lambda$ and $c_\gamma$.
4.2. Asymptotic bias and mean squared error
Based on Theorem 4.1, the asymptotic distribution of $\hat\mu_{\rm tap}$ involves elliptically truncated normal distributions [60, 4]. To understand the asymptotic behavior of our proposed estimator, it is crucial to comprehend the essential properties of elliptically truncated multivariate normal distributions. We derive the moment generating function and subsequently the mean squared error of the estimator $\hat\mu_{\rm tap}$. The exact form of the mean squared error, given by $\mathrm{mse}(\Lambda, c_\gamma;\eta)$ in (B.13), albeit complicated, reveals that the amount of information borrowed from the non-probability sample (controlled by $\Lambda$ and $c_\gamma$) should be tailored to the strength of the violation of $H_0$ (dictated by the local parameter $\eta$). For illustration, we consider a toy example in the supplemental material.

We search for the optimal values $(\hat\Lambda, \hat c_\gamma)$ that minimize $\mathrm{mse}(\Lambda, c_\gamma;\hat\eta)$ using a standard numerical optimization algorithm [39], where $\hat\eta = \hat\Phi_{B,n}(\hat\mu_A,\hat\tau)$. Note that the decision of rejecting $H_0$ or not is subject to the hypothesis testing
errors, namely the Type I and Type II errors. That is, the test statistic $T$ can be larger than $c_\gamma$ even when $H_0$ holds; similarly, it can be small when $H_{a,n}$ holds. However, the data-adaptive tuning procedure aims at minimizing the mean squared error of the estimator $\hat\mu_{\rm tap}$, which implicitly restricts these two testing errors to be small.
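Operationally, the selection step is a low-dimensional numerical search. The sketch below treats the plug-in criterion $\mathrm{mse}(\Lambda, c_\gamma;\hat\eta)$ as an assumed callable (the expression (B.13) itself is in the supplement) and minimizes it over a grid, a simple stand-in for the numerical optimizer cited above; the toy surrogate surface is illustrative only.

```python
# A minimal sketch (assumed mse_hat callable) of the data-adaptive choice of
# (Lambda, c_gamma) by exhaustive grid search.
import itertools
import numpy as np

def select_tuning(mse_hat, Lambdas, c_gammas):
    grid = itertools.product(Lambdas, c_gammas)
    return min(grid, key=lambda pair: mse_hat(*pair))

# Toy surrogate for the estimated mse surface, for illustration only:
toy_mse = lambda lam, c: (lam - 0.5)**2 + 0.1 * (c - 3.0)**2
best_Lambda, best_c = select_tuning(toy_mse,
                                    Lambdas=np.linspace(0.0, 1.0, 21),
                                    c_gammas=np.linspace(0.5, 10.0, 20))
```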
4.3. Adaptive inference
Standard approaches to inference, e.g., the nonparametric bootstrap, require the estimators to be regular [56]. In non-regular settings, researchers have proposed alternative approaches such as the m-out-of-n bootstrap or subsampling. However, these approaches critically rely on a proper choice of $m$ or the subsample size; otherwise, the small-sample performance can be poor. The non-regularity is induced because the asymptotic distribution of the estimator $\hat\mu_{\rm tap}$ depends on the local parameter and thus does not converge uniformly over the parameter space. [33] propose adaptive confidence intervals for test errors in classification problems. Following this idea, we construct a bound-based adaptive confidence interval (BACI) for the estimator $\hat\mu_{\rm tap}$ that guarantees good coverage properties. To avoid the non-regularity, our general strategy is to derive two smooth functionals that bound the estimator $\hat\mu_{\rm tap}$. Because these two functionals are regular, standard approaches to inference can be adopted and valid confidence intervals follow.
To be concrete, we construct a bound-based adaptive confidence interval for $a^\top\mu_g$, where $a \in \mathbb{R}^l$ is fixed. By Theorem 4.1, we can reparametrize the asymptotic distribution of $a^\top n^{1/2}(\hat\mu_{\rm tap} - \mu_g)$ as
$$a^\top n^{1/2}(\hat\mu_{\rm tap} - \mu_g) \to R_n + a^\top\omega_B(V_{B\text{-eff}}^{1/2} + V_{A\text{-eff}}^{1/2})\,U_n, \qquad (4.1)$$
where
$$R_n = a^\top V_{\rm eff}^{1/2}W_1 + a^\top(\omega_A V_{A\text{-eff}}^{1/2} - \omega_B V_{B\text{-eff}}^{1/2})W_2 + a^\top\omega_B(V_{B\text{-eff}}^{1/2} + V_{A\text{-eff}}^{1/2})\,\mu^t_{[c_\gamma,\infty)},$$
$$U_n = W^t_{[c_\gamma,\infty)} - \mu^t_{[c_\gamma,\infty)},$$
and $\mu^t_{[c_\gamma,\infty)} = \mu_2\,1\{\mu_2^\top\mu_2 > c_\gamma\}$. By construction, $R_n$ is regular and asymptotically normal, but $U_n$ is nonsmooth. Nonsmoothness and nonregularity are interrelated. To illustrate, if $\mu_2 = 0$, $U_n$ follows a standard truncated normal distribution with truncation probability $P(W_2^\top W_2 \ge c_\gamma \mid \mu_2 = 0)$; whereas, if $|\mu_2| \to \infty$, $P(W_2^\top W_2 \le c_\gamma \mid \mu_2)$ diminishes to zero, implying that $U_n$ follows a standard normal distribution. Thus, the limiting distribution of $a^\top n^{1/2}(\hat\mu_{\rm tap} - \mu_g)$ is not uniform over the local parameter $\mu_2$ (or, equivalently, $\eta$).
Our goal is to form the least conservative smooth upper and lower bounds. An important observation is that if $|\mu_2|$ is sufficiently large, we may treat $U_n$ as regular. Thus, we define $B$ as the nonregular zone for $\mu_2^\top\mu_2$ such that $\max_{\mu_2^\top\mu_2 \in B} P(W_2^\top W_2 \ge c_\gamma \mid \mu_2) \le 1 - \epsilon$ for a small $\epsilon > 0$, and $B^c$ as the regular zone. When $\mu_2^\top\mu_2 \in B^c$, standard inference can apply, and bounds are only needed when $\mu_2^\top\mu_2 \in B$, to avoid making the inference procedure overly conservative.
Fig 1. Illustration of the nonregular zone $B$ (shaded) and two power functions: the solid and dashed lines are $P(W_2^\top W_2 > c_\gamma \mid \mu_2^\top\mu_2)$ and $P(T \ge v_n \mid \mu_2^\top\mu_2)$ as functions of $\mu_2^\top\mu_2$, respectively.
We then require another test procedure to test $\mu_2^\top\mu_2 \in B$ against $\mu_2^\top\mu_2 \in B^c$. Toward this end, we use $T \ge v_n$, where $v_n$ is chosen such that $\max_{\mu_2^\top\mu_2 \in B} P(T \ge v_n \mid \mu_2) = \tilde\alpha$ for a pre-specified $\tilde\alpha$. Figure 1 illustrates the regular and nonregular zones and the test. If $T \ge v_n$, we conclude that the estimator $\hat\mu_{\rm tap}$ is regular and construct a normal confidence interval; if $T < v_n$, we construct the least favorable confidence interval by taking the union over all $\mu_2 \in \mathbb{R}^l$. In practice, $v_n$ can be determined by the double bootstrap satisfying the regularity condition that $\lim_{n\to\infty} v_n/n = 0$; see Section B.4 of the supplemental material for more details.
Accordingly, $U_n$ can be decomposed into two components, $U_n = (W^t_{[c_\gamma,\infty)} - \mu^t_{[c_\gamma,\infty)})\,1\{T \ge v_n\} + (W^t_{[c_\gamma,\infty)} - \mu^t_{[c_\gamma,\infty)})\,1\{T < v_n\}$, and we only regularize (i.e., derive bounds for) the latter component. Continuing with (4.1), we can take the supremum over all $\mu_2$ in the nonregular zone to construct the upper bound $\mathcal{U}(a)$,
$$\mathcal{U}(a) = R_n + a^\top\omega_B(V_{B\text{-eff}}^{1/2} + V_{A\text{-eff}}^{1/2})(W^t_{[c_\gamma,\infty)} - \mu^t_{[c_\gamma,\infty)})\,1\{T \ge v_n\} + \sup_{\mu_2\in\mathbb{R}^l}\left\{a^\top\omega_B(V_{B\text{-eff}}^{1/2} + V_{A\text{-eff}}^{1/2})(W^t_{[c_\gamma,\infty)} - \mu^t_{[c_\gamma,\infty)})\right\}1\{T < v_n\}. \qquad (4.2)$$
The lower bound $\mathcal{L}(a)$ for $a^\top n^{1/2}(\hat\mu_{\rm tap} - \mu_g)$ can be computed in an analogous way by replacing $\sup$ with $\inf$ in (4.2). Taking the supremum and the infimum over $\mu_2 \in \mathbb{R}^l$ renders the two bounds $\mathcal{U}(a)$ and $\mathcal{L}(a)$ smooth and regular. The limiting distribution of $\mathcal{U}(a)$ is
$$\mathcal{U}(a) \to R + a^\top\omega_B(V_{B\text{-eff}}^{1/2} + V_{A\text{-eff}}^{1/2})(W^t_{[c_\gamma,\infty)} - \mu^t_{[c_\gamma,\infty)})\,1\{\mu_2^\top\mu_2 \in B^c\} + \sup_{\mu_2\in\mathbb{R}^l}\left\{a^\top\omega_B(V_{B\text{-eff}}^{1/2} + V_{A\text{-eff}}^{1/2})(W^t_{[c_\gamma,\infty)} - \mu^t_{[c_\gamma,\infty)})\right\}1\{\mu_2^\top\mu_2 \in B\}. \qquad (4.3)$$
Similarly, the limiting distribution of $\mathcal{L}(a)$ is (4.3) with $\sup$ replaced by $\inf$. Based on the limiting distributions of $\mathcal{U}(a)$ and $\mathcal{L}(a)$, if $P(\mu_2^\top\mu_2 \in B) = 0$, then $\mathcal{U}(a)$ and $\mathcal{L}(a)$ have approximately the same limiting distribution as $a^\top n^{1/2}(\hat\mu_{\rm tap} - \mu_g)$. However, if $P(\mu_2^\top\mu_2 \in B) \neq 0$, then $\mathcal{U}(a)$ is stochastically larger and $\mathcal{L}(a)$ is stochastically smaller than $a^\top n^{1/2}(\hat\mu_{\rm tap} - \mu_g)$.
Based on the regular bounds $\mathcal{U}(a)$ and $\mathcal{L}(a)$, we construct the $(1-\alpha)\times 100\%$ bound-based adaptive confidence interval of $a^\top\mu_g$ as
$$C^{\rm BACI}_{\mu_g,1-\alpha}(a) = \left[a^\top\hat\mu_{\rm tap} - \hat{\mathcal{U}}_{1-\alpha/2}(a)/\sqrt{n},\ a^\top\hat\mu_{\rm tap} - \hat{\mathcal{L}}_{\alpha/2}(a)/\sqrt{n}\right], \qquad (4.4)$$
where $\hat{\mathcal{U}}_d(a)$ and $\hat{\mathcal{L}}_d(a)$ approximate the $d$-th quantiles of the distributions of $\mathcal{U}(a)$ and $\mathcal{L}(a)$, respectively, which can be obtained by the nonparametric bootstrap method.
Theorem 4.2. Assume the conditions in Theorem 4.1 hold. Furthermore, assume the matrices $\Sigma_T$, $\Sigma_S$ in Lemma 3.1 and their consistent estimators $\hat\Sigma_T$, $\hat\Sigma_S$ are strictly positive-definite, and the sequence $v_n$ satisfies $v_n \to \infty$ and $v_n/n \to 0$ with probability one. The asymptotic coverage rate of (4.4) satisfies
$$P\left\{a^\top\mu_g \in C^{\rm BACI}_{\mu_g,1-\alpha}(a)\right\} \ge 1 - \alpha. \qquad (4.5)$$
In particular, if Assumption 2.2 is strongly violated with $P(\mu_2^\top\mu_2 \in B^c) = 1$, the inequality in (4.5) becomes an equality.
Remark 4.1. We discuss an alternative approach to construct valid confidence intervals for non-regular estimators using projection sets [48], referred to as projection-based adaptive confidence intervals (PACI), $C^{\rm PACI}_{\mu_g,1-\alpha}(a)$. The basic idea is as follows. For a given $\mu_2$, the limiting distribution of $\hat\mu_{\rm tap}$ is known, and a regular $(1-\tilde\alpha_1)\times 100\%$ confidence interval $C_{\mu_g,1-\tilde\alpha_1}(a;\mu_2)$ of $a^\top\mu_g$ can be formed through the standard procedure. Since $\mu_2$ is unknown, a $(1-\alpha)\times 100\%$ projection confidence interval of $\mu_g$ can be conservatively constructed as the union of all $C_{\mu_g,1-\tilde\alpha_1}(a;\mu_2)$ over $\mu_2$ in its $(1-\tilde\alpha_2)\times 100\%$ confidence region, where $\alpha = \tilde\alpha_1 + \tilde\alpha_2$. Such a strategy may be overly conservative, and the projection-based adaptive confidence interval therefore introduces a pretest to mitigate the conservatism. If the pretest rejects $H_0: \mu_2^\top\mu_2 \in B$, $C_{\mu_g,1-\tilde\alpha_1}(a;\hat\mu_2)$ is used; otherwise, the union of $C_{\mu_g,1-\tilde\alpha_1}(a;\mu_2)$ is used. The technical details for $C^{\rm PACI}_{\mu_g,1-\alpha}(a)$ are presented in the supplemental material. Our simulation study later shows that $C^{\rm PACI}_{\mu_g,1-\alpha}(a)$ is more conservative than the proposed $C^{\rm BACI}_{\mu_g,1-\alpha}(a)$.
5. Simulation study
In this section, we evaluate the finite-sample performance of the proposed estimator $\hat\mu_{\rm tap}$ and of $C^{\rm BACI}_{\mu_g,1-\alpha}(a)$. First, we generate the finite population $\mathcal{F}_N$ with size $N = 10^5$. For each subject $i$, we generate $X_i = (1, X_{1,i}, X_{2,i})^\top$, where $X_{1,i} \sim \mathcal{N}(0,1)$ and $X_{2,i} \sim \mathcal{N}(1,1)$, and generate $Y_i$ by $Y_i = 1 + X_{1,i} + X_{2,i} + u_i + u_i^2 + \varepsilon_i$, where $u_i \sim \mathcal{N}(0,1)$ and $\varepsilon_i \sim \mathcal{N}(0,1)$. We generate samples from the finite population $\mathcal{F}_N$ by Bernoulli sampling with the specified inclusion probabilities
$$\log\left(\frac{\pi_{A,i}}{1-\pi_{A,i}}\right) = \nu_A + 0.2X_{1,i} + 0.1X_{2,i},$$
$$\log\left(\frac{\pi_{B,i}}{1-\pi_{B,i}}\right) = \nu_B + 0.1X_{1,i} + 0.2X_{2,i} + 0.5\,n_B^{-1/2}\,b\,u_i,$$
where $\nu_A$ and $\nu_B$ are adaptively chosen to ensure the target sample sizes $n_A \approx 600$ and $n_B \approx 5000$. We assume that $(X_i, Y_i)$ are observed but $u_i$ is unobserved, and we vary $b$ in $\{0, 10, 100\}$ to represent the scenarios where $H_0$ holds, is slightly violated, or is strongly violated, respectively.
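A direct transcription of this data-generating design (with $\nu_A$, $\nu_B$ found by root-finding on the expected sample sizes; the bracketing interval for the root is an implementation choice) is sketched below.

```python
# A minimal sketch of the simulation design above; b = 10 gives the
# "slightly violated" scenario.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(6)
N = 10**5
X1, X2 = rng.normal(0, 1, N), rng.normal(1, 1, N)
u, eps = rng.normal(0, 1, N), rng.normal(0, 1, N)
Y = 1 + X1 + X2 + u + u**2 + eps
b, n_B_target = 10, 5000

expit = lambda z: 1 / (1 + np.exp(-z))
lin_A = 0.2 * X1 + 0.1 * X2
lin_B = 0.1 * X1 + 0.2 * X2 + 0.5 * n_B_target**-0.5 * b * u
nu_A = brentq(lambda v: expit(v + lin_A).sum() - 600, -20, 5)
nu_B = brentq(lambda v: expit(v + lin_B).sum() - n_B_target, -20, 5)

delta_A = rng.random(N) < expit(nu_A + lin_A)   # Bernoulli sampling
delta_B = rng.random(N) < expit(nu_B + lin_B)
```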
We compare the estimator $\hat\mu_{\rm tap}$ with the following estimators: (a) $\hat\mu_A$: the solution to $\sum_{i=1}^{N}\hat\Phi_A(V_i,\delta_{A,i};\mu) = 0$ with $\hat\Phi_A(V_i,\delta_{A,i};\mu)$ defined in (2.4). (b) $\hat\mu_B$: the naive sample mean $\hat\mu_B = (\sum_{i=1}^{N}\delta_{B,i})^{-1}\sum_{i=1}^{N}\delta_{B,i}Y_i$. (c) $\hat\mu_{\rm dr}$: the solution to $\sum_{i=1}^{N}\hat\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\mu,\hat\alpha,\hat\beta) = 0$ with $\hat\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\mu,\alpha,\beta)$ defined in (2.5), where $(\alpha,\beta)$ are estimated by the maximum pseudo-likelihood estimator $\hat\alpha$ and the ordinary least squares estimator $\hat\beta$ [26]; see Equations (B.2) and (B.4). (d) $\hat\mu_{\rm eff}$: the solution to (2.7) with the optimal choice $\Lambda_{\rm eff}$ specified in (A.11) and the consistent estimators $(\hat\alpha,\hat\beta)$ obtained from (c). (e) $\hat\mu_{\rm eff:B}$: $\hat\mu_{\rm eff}$, where $\alpha$ is estimated in the same manner as in (c) but $\beta$ is estimated solely based on the non-probability sample; see Equation (B.3). (f) $\hat\mu_{\rm eff:KH}$: $\hat\mu_{\rm eff}$, where $(\alpha,\beta)$ are estimated simultaneously by adopting the methods proposed by [29]; see Equations (B.5) and (B.6). (g) $\hat\mu_{\rm tap}$, $\hat\mu_{\rm tap:B}$, $\hat\mu_{\rm tap:KH}$: the solutions to (3.6), where $(\Lambda, c_\gamma)$ are chosen by our data-adaptive procedure with $(\hat\alpha,\hat\beta)$ obtained from (d), (e), (f), respectively. (h) $\hat\mu_{\rm Bayes:1}$, $\hat\mu_{\rm Bayes:2}$, $\hat\mu_{\rm Bayes:3}$: the Bayesian approaches for combining the non-probability sample with the probability sample assuming different informative priors [52].
For all estimators, we specify the model $\pi_B(X;\alpha)$ to be a logistic regression model in $X_i$ and the outcome mean model $m(X;\beta)$ to be a linear regression model in $X_i$. For the non-regular estimators $\hat\mu_{\rm tap}$, $\hat\mu_{\rm tap:B}$ and $\hat\mu_{\rm tap:KH}$, we construct the $C^{\rm BACI}_{\mu_g,1-\alpha}(a)$ in (4.4) with a data-adaptive choice of $v_n$, the $C^{\rm BACI}_{\mu_g,1-\alpha}(a)$ with a fixed $v_n = \log\log n$, denoted $C^{\rm BACI:F}_{\mu_g,1-\alpha}(a)$ ($\mathrm{BACI}_F$), and the $C^{\rm PACI}_{\mu_g,1-\alpha}(a)$. For any confidence intervals requiring the nonparametric bootstrap, the bootstrap size is 2000. For the Bayesian estimators, the point estimates are obtained by Markov chain Monte Carlo sampling with size 2000 after an additional 500 burn-in samples.
Table 2 reports the bias, variance and mean squared error of each estimator over 2000 simulated datasets. The benchmark estimator $\hat\mu_A$ has small biases across all scenarios, as guaranteed by the probability sampling design. On the other hand, the non-probability-only estimator $\hat\mu_B$ exhibits large biases in all cases, mainly due to selection bias. When the impact of the unmeasured confounder $b$ increases, the pooled estimators $\hat\mu_{\rm eff}$, $\hat\mu_{\rm eff:B}$ and $\hat\mu_{\rm eff:KH}$ become more biased.
Table 2
Simulation results for bias ($\times 10^3$), variance (var) ($\times 10^3$) and mean squared error (MSE) ($\times 10^3$) of $\hat\mu_A$, $\hat\mu_B$, $\hat\mu_{\rm dr}$, $\hat\mu_{\rm eff}$, $\hat\mu_{\rm Bayes}$ and $\hat\mu_{\rm tap}$ when $H_0$ holds, is slightly violated or strongly violated.

                        H_0 holds             slightly violated      strongly violated
                     bias   var    MSE      bias   var    MSE      bias    var     MSE
Regular  mu_A         4.1   10.4   10.4      4.1   10.4   10.4      4.1    10.4    10.4
         mu_B       284.1    1.2   81.9    355.3    1.2  127.4   1318.8     2.0  1741.4
         mu_dr        0.4    4.2    4.2     71.0    4.3    9.3   1048.0     5.0  1103.2
         mu_eff       0.9    4.1    4.1     62.3    4.2    8.1    851.5     6.6   731.7
         mu_eff:B     0.9    4.1    4.1     62.3    4.2    8.1    851.7     6.6   732.1
         mu_eff:KH    0.9    4.1    4.1     62.3    4.2    8.1    851.5     6.7   731.7
Bayes    mu_Bayes:1   3.7   14.1   14.1      1.0   14.0   14.0      4.3    14.1    14.1
         mu_Bayes:2   4.1   10.8   10.8     17.1   11.1   11.4      7.0    13.8    13.8
         mu_Bayes:3   2.4    8.9    8.9     51.2    9.0   11.6    614.0    10.8   387.9
TAP      mu_tap       4.8    7.6    7.6     10.1    9.3    9.4      4.1    10.4    10.4
         mu_tap:B     4.8    7.6    7.6     10.1    9.3    9.4      4.1    10.4    10.4
         mu_tap:KH    4.8    7.6    7.6     10.1    9.3    9.4      4.1    10.4    10.4
Table 3
Simulation results for coverage rates (CR) ($\times 10^2$) and widths ($\times 10^3$) of 95% confidence intervals when $H_0$ holds, is slightly violated or strongly violated.

                          H_0 holds        slightly violated   strongly violated
             CIs         CR    width        CR    width         CR    width
mu_A         Wald       95.2   404.1       95.3   404.1        95.2   404.0
mu_B         Wald        0.0   135.5        0.0   138.8         0.0   173.7
mu_dr        Wald       95.9   262.8       81.8   264.4         0.0   282.4
mu_eff       Wald       95.9   259.5       85.1   260.9         0.0   273.6
mu_Bayes:1   HPDI       98.3   463.0       97.5   461.5        97.3   462.8
mu_Bayes:2   HPDI       97.8   404.2       97.4   409.8        97.5   458.3
mu_Bayes:3   HPDI       99.3   368.2       97.4   370.6         0.0   407.0
mu_tap       PACI       98.4   558.7       98.4   535.7        99.2   541.0
mu_tap       BACI_F     94.7   399.1       95.9   402.3        94.7   402.6
mu_tap       BACI       92.1   363.1       93.3   367.2        94.8   402.8
Additionally, the Bayesian methods, particularly $\hat\mu_{\rm Bayes:2}$, perform reasonably well when $H_0$ holds or is slightly violated, but they tend to have large biases when $H_0$ is strongly violated. In contrast, the proposed estimators $\hat\mu_{\rm tap}$, $\hat\mu_{\rm tap:B}$ and $\hat\mu_{\rm tap:KH}$ have small biases regardless of the strength of the unmeasured confounding. When $H_0$ is slightly violated, our proposed estimators have slightly larger biases but smaller mean squared errors than $\hat\mu_A$ by integrating the non-probability sample. When $H_0$ is strongly violated, the proposed estimators perform similarly to $\hat\mu_A$ with the protection of pretesting.
Table 3 reports the properties of the 95% Wald confidence intervals for the regular estimators, the highest posterior density intervals (HPDIs) for the Bayesian estimators, and the various adaptive confidence intervals for the non-regular estimator $\hat\mu_{\rm tap}$, where the Bayesian credible intervals are constructed from the posterior samples after burn-in. Because the confidence intervals (and the point estimates; see Table 2) are not sensitive to the method of estimating the nuisance parameters $(\alpha,\beta)$, we only present the confidence intervals for $\hat\mu_{\rm eff:KH}$ and $\hat\mu_{\rm tap:KH}$ for simplicity.
Based on Table 3, $C^{\rm PACI}_{\mu_g,1-\alpha}$ tends to overestimate the uncertainty, leading to over-conservative confidence intervals. $C^{\rm BACI}_{\mu_g,1-\alpha}$ and $C^{\rm BACI:F}_{\mu_g,1-\alpha}$ are less conservative and alleviate the over-coverage issue; thus, their empirical coverage rates are close to the nominal level in all cases. Moreover, $C^{\rm BACI}_{\mu_g,1-\alpha}$ has narrower intervals than $C^{\rm BACI:F}_{\mu_g,1-\alpha}$ by using the double bootstrap procedure to select $v_n$, at the expense of computational burden. When $H_0$ holds, the $C^{\rm BACI}_{\mu_g,1-\alpha}$ intervals are narrower than the Wald intervals for the probability-only estimator $\hat\mu_A$, indicating the advantage of the test-and-pool strategy in these cases. When $H_0$ is slightly violated, the gain in interval width is less pronounced under similar coverage rates. When $H_0$ is strongly violated, the adaptive confidence interval $C^{\rm BACI}_{\mu_g,1-\alpha}$ reduces to the Wald confidence interval for $\hat\mu_A$. Lastly, the credible intervals for the Bayesian estimators do not have satisfactory coverage properties as model misspecification persists across scenarios, which is aligned with the Bernstein-von Mises theorem [65, Chapter 10.2].
6. A real-data illustration
To demonstrate its practical use, we apply the proposed method to a probability sample from the 2015 Current Population Survey (CPS) and a non-probability sample from the 2015 Behavioral Risk Factor Surveillance System (BRFSS) survey. Note that the BRFSS survey itself is a probability sample; we deliberately discard its sampling weights to recast it as a non-probability sample for illustrating our proposed method.

We use two-phase survey data with sizes $n_A = 1000$ and $n_B = 8459$. We focus on two outcome variables of interest: employment (percentages working and retired) and educational attainment (high school or less, h.s.o.l., and college or above, c.o.a.). Both datasets provide measurements on the outcomes of interest and some common covariates, including age, sex (female or not), race (white and black), origin (Hispanic or not), region (northeast, south, or west), and marital status (married or not). To illustrate the heterogeneity in the study populations, Table 4 contrasts the means of the variables in the CPS sample (design-weighted averages) and the BRFSS sample (simple averages). Based on Table 4, the BRFSS sample may not be representative of the target population, and pretesting before pooling is warranted.
Table 5 presents the results. For all estimators, we specify the propensity score model to be a logistic regression model with the covariates (all variables excluding the outcome variable) and the outcome mean model to be a logistic regression model with the covariates. The efficient estimator $\hat\mu_{\rm eff}$ gains efficiency over both $\hat\mu_A$ and $\hat\mu_{\rm dr}$ for all outcomes; however, it may be subject to biases if the non-probability sample does not satisfy the required assumptions.
Table 4
The covariate means in the two samples: CPS sample (a probability sample) and BRFSS sample (a hypothetical non-probability sample).

Data source    age   %sex   %white   %black   %hispanic   %northeast   %south
CPS           47.5   56.5    81.9     11.0      13.3         18.1       37.7
BRFSS         48.3   54.2    83.2      8.4       8.3         20.0       27.6

Data source   %west   %married   %working   %retired   %h.s.o.l.   %c.o.a.
CPS            24.1     52.5       58.7       13.6       39.4        30.3
BRFSS          29.5     50.8       52.2       24.5       21.2        41.9
Table 5
Estimated population means (est), standard errors (se) and confidence intervals of $\mu_g$ for the selected outcomes when combining the two datasets.

Outcome Y                %working       %retired       %h.s.o.l.      %c.o.a.
mu_A        est            58.7           13.6           39.4           30.3
            se             1.51           1.17           1.60           1.59
            Wald       (54.8, 62.3)   (11.6, 16.2)   (35.7, 43.0)   (27.2, 33.7)
mu_dr       est            56.5           20.0           25.8           32.3
            se             1.03           1.24           0.93           1.20
            Wald       (54.2, 58.8)   (17.9, 22.4)   (24.0, 27.5)   (30.3, 34.5)
mu_eff:KH   est            56.6           17.3           26.4           32.1
            se             0.80           0.19           0.87           0.62
            Wald       (54.3, 58.9)   (15.4, 19.6)   (24.6, 28.1)   (30.1, 34.3)
mu_Bayes:1  est            59.8           14.1           40.5           30.7
            se             1.97           1.37           2.00           1.84
            HPDI       (56.0, 63.6)   (11.4, 16.8)   (36.6, 44.4)   (27.2, 34.4)
mu_Bayes:2  est            59.8           14.0           40.3           30.9
            se             2.01           1.33           1.92           1.84
            HPDI       (56.1, 63.9)   (11.4, 16.4)   (36.4, 44.0)   (27.2, 34.5)
mu_Bayes:3  est            58.6           14.1           37.6           31.1
            se             1.94           1.30           1.92           1.76
            HPDI       (54.7, 62.4)   (11.6, 16.7)   (33.7, 41.4)   (27.7, 34.7)
mu_tap:KH   est            58.7           13.6           39.0           31.7
            se             1.51           1.17           1.55           0.64
            BACI       (54.9, 62.6)   (11.6, 15.8)   (35.8, 42.6)   (31.0, 33.6)
In the test-and-pool analysis, the pretesting rejects the use of the non-probability sample for the employment variables "working" and "retired" but accepts its use for the education variables "high school or less" and "college or above". Thus, for the employment variables, $\hat\mu_{\rm tap} = \hat\mu_A$, and for the educational attainment variables, $\hat\mu_{\rm tap}$ gains efficiency over $\hat\mu_A$. The Bayesian estimators with the informative priors 2 and 3 are more efficient than the one with prior 1. However, they still yield larger standard errors than the probability-only estimator $\hat\mu_A$, perhaps because the non-probability-based informative priors are biased for the model parameters of the probability sample. From the test-and-pool analysis, the employment rate and the retirement rate in 2015 are 58.7% and 13.6%, respectively; the percentage of the U.S. population with a high school education or less is 39.0%, and the percentage with a college education or above is 31.7%.
7. Concluding remarks
When utilizing non-probability samples, researchers often assume that the observed covariates contain all the information needed for recovering the sampling mechanism. However, this assumption may be violated, and hence the integration of the probability and non-probability samples is subject to biases. In this paper, we propose the test-and-pool estimator, which first scrutinizes the assumption required for combining via hypothesis testing and then carefully combines the probability and non-probability samples by a data-driven procedure to achieve the minimum mean squared error. In the theoretical development, we treat $(\Lambda, c_\gamma)$ jointly as two tuning parameters and establish the asymptotic distribution of the pretest estimator without taking their uncertainties into account. The non-regularity of the pretest estimator invalidates conventional methods for generating reliable inferences. To address this issue, the proposed adaptive confidence interval is designed to handle the non-smoothness of the pretest estimator and ensure the uniform validity of inference. It is important to note, however, that this approach may yield only a small gain in the precision of the confidence interval, even though the point estimator can achieve a significant gain in mean squared error compared to the estimator based only on the probability sample. Further research is required to develop a valid post-testing confidence interval that offers reduced conservatism.
Pretest estimation is the norm rather than the exception in applied research, so the theory that we have established is highly relevant to researchers who engage in applied work. The proposed framework can be extended in the following directions. First, in this work, we study the implications of pretesting on estimation and inference under one single pretest. In practice, researchers may engage in multiple pretesting. For example, in the data integration context, one can encounter multiple data sources [51, 71, 16], requiring pretesting of the comparability of each data source against the benchmark. Multiple pretesting alters the current asymptotic results and is an important future research topic. Second, our framework considers a fixed number of covariates; however, in reality, practitioners often collect a rich set of auxiliary variables, rendering variable selection imperative [75]. Developing a valid statistical framework to deal with issues arising from selective inference is a challenging but important topic for further investigation. Third, small area estimation has received a lot of attention in the data integration context [44, 28, 25]. The typical estimator in small area estimation is a weighted average of a design-based estimator and a model-based synthetic estimator. [5] discussed the trade-off between the efficiency gain from invoking model assumptions and the risk that these assumptions do not hold. Thus, pretesting can be potentially useful for small area estimation, which we will investigate in the future.
Appendix A: Proofs
A.1. Regularity conditions
Let $\mathcal{F}_N = \{V_i = (X_i, Y_i)^\top : i \in U\}$, and let $\Phi_A(V,\delta_A;\mu)$ and $\Phi_B(V,\delta_A,\delta_B;\mu,\tau)$ be $l$-dimensional estimating functions for the parameter $\mu_g \in \mathbb{R}^l$ when using the probability sample and the combined samples, respectively. Let $\Phi_\tau(V,\delta_A,\delta_B;\tau)$ be the $k$-dimensional estimating equation for the nuisance parameter $\tau_0 \in \mathbb{R}^k$. Then, we construct one stacked estimating equation system $\Phi(V,\delta_A,\delta_B;\theta)$ with $\theta = (\mu_A, \mu_B, \tau)^\top$ and $\dim(\theta) = 2l + k$. For establishing our stochastic statements, we require the following regularity conditions.
Assumption A.1. The following regularity conditions hold.
a) The parameter $\theta = (\mu_A, \mu_B, \tau)^\top$ belongs to a compact parameter space $\Theta$ in $\mathbb{R}^{2l+k}$.
b) There exists a unique solution $\theta_0 = (\mu_{A,0}, \mu_{B,0}, \tau_0)^\top$ lying in the interior of the compact space $\Theta$ such that
$$E\{\Phi_A(V,\delta_A;\theta_0)\} = E\{\Phi_B(V,\delta_A,\delta_B;\theta_0)\} = E\{\Phi_\tau(V,\delta_A,\delta_B;\theta_0)\} = 0.$$
c) $\Phi(V,\delta_A,\delta_B;\theta)$ is integrable with respect to the joint distribution of $(V, \delta_A, \delta_B)$ for all $\theta$ in a neighborhood of $\theta_0$.
d) The first two partial derivatives of $E\{\Phi(V,\delta_A,\delta_B;\theta)\}$ and their empirical estimators are invertible for all $\theta$ in a neighborhood of $\theta_0$.
e) For all $j, k, l \in \{1,\ldots,2l+k\}$, there is an integrable function $B(V,\delta_A,\delta_B)$ such that
$$|\partial^2\Phi_j(V,\delta_A,\delta_B;\theta)/\partial\theta_k\partial\theta_l| \le B(V,\delta_A,\delta_B), \qquad E\{B(V,\delta_A,\delta_B)\} < \infty,$$
for all $\theta$ in a neighborhood of $\theta_0$ almost surely.
f) $\{V_i : i \in U\}$ are a set of i.i.d. random variables such that $E\{|\Phi(V,\delta_A,\delta_B;\theta)|^{2+\delta}\}$ is uniformly bounded for $\theta$ in a neighborhood of $\theta_0$.
g) The sample sizes $n_A$ and $n_B$ are of the same order of magnitude, i.e., $n_A = O(n_B)$. The sampling fractions for both Samples A and B are negligible, i.e., $n/N = o(1)$, where $n = n_A + n_B$.
h) There exist $C_1$ and $C_2$ such that $0 < C_1 \le N\pi_{A,i}/n_A \le C_2$ and $0 < C_1 \le N\pi_{B,i}/n_B \le C_2$ for all $i \in U$.
Assumptions A.1 a)-e) are typical finite moment conditions that ensure the consistency of the solution to the estimating functions [49, Appendix B], [64, Section 3.2], [9, page 293] and [67, Appendix C]. Assumption A.1 f) is required for obtaining the asymptotic normality of $\hat\mu_g$ under the superpopulation. Assumption A.1 g) states that the sampling fractions are negligible, which is helpful for subsequent variance estimation and allows us to use $O(n_A^{-1/2})$, $O(n_B^{-1/2})$ and $O(n^{-1/2})$ interchangeably. Assumption A.1 h) implies that the inclusion probabilities for Samples A and B are of the order $n/N$, which is necessary to establish their root-$n$ consistency.
It is noteworthy that in Assumption 2.1, the asymptotic normality is ascertained for the design-weighted estimators given the finite population $\mathcal{F}_N$. Here, we extend the conditional normality to the unconditional one, which averages over all possible finite populations satisfying Assumption A.1 (f). The following lemma plays a key role in establishing the stochastic statements [24, Theorem 1.3.6].
Lemma A.1. Under Assumption 2.1 and Assumption A.1 (f), let $\{\mathcal{F}_N\}$ be a sequence of finite populations and $\mathcal{A}_N$ be a sample selected from the $N$th population by a PR design with size $n_N$. Assume that
$$\lim_{N\to\infty} n_N = \infty, \qquad \lim_{N\to\infty} (N - n_N) = \infty.$$
The distributions of the design-weighted estimator $\hat\mu_g$ and the finite-population estimator $\mu_g$ are then both asymptotically normal, i.e.,
$$\hat\mu_g \mid \mathcal{F}_N \,\overset{\cdot}{\sim}\, \mathcal{N}(\mu_g, V_1), \qquad \mu_g \,\overset{\cdot}{\sim}\, \mathcal{N}(\mu_{g,0}, V_2),$$
where $\overset{\cdot}{\sim}$ denotes the asymptotic distribution. Then, $\hat\mu_g - \mu_g$ is also asymptotically normal.
By Lemma A.1, the sampling fraction is negligible, and therefore the limiting variance of $\lim_{N\to\infty} n_N^{1/2}(\mu_g - \mu_{g,0})$ is 0, indicating that the intermediate step of producing the finite population is of little significance.
A.2. Proof of Lemmas 2.1 and 3.1
In the general case, we begin by investigating the statistical properties of
$$\hat\Phi_{A,n}(\hat\mu_A,\hat\tau) = n^{1/2}N^{-1}\sum_{i=1}^{N}\Phi_A(V_i,\delta_{A,i};\hat\mu_A,\hat\tau)$$
and
$$\hat\Phi_{B,n}(\hat\mu_B,\hat\tau) = n^{1/2}N^{-1}\sum_{i=1}^{N}\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\hat\mu_B,\hat\tau).$$
First, to simplify notation, let
$$\dot\Phi_A(V,\delta_A;\mu,\tau) = \partial\Phi_A(V,\delta_A;\mu,\tau)/\partial\mu, \qquad \dot\Phi_B(V,\delta_A,\delta_B;\mu,\tau) = \partial\Phi_B(V,\delta_A,\delta_B;\mu,\tau)/\partial\mu,$$
$$\phi_{B,\tau}(V,\delta_A,\delta_B;\mu,\tau) = \partial\Phi_B(V,\delta_A,\delta_B;\mu,\tau)/\partial\tau, \qquad \phi_\tau(V,\delta_A,\delta_B;\tau) = \partial\Phi_\tau(V,\delta_A,\delta_B;\tau)/\partial\tau.$$
By the Taylor expansion of $\hat\Phi_{B,n}(\hat\mu_B,\hat\tau)$ at $(\mu_g,\tau_0)$, we have
$$0 = \hat\Phi_{B,n}(\hat\mu_B,\hat\tau) = n^{1/2}N^{-1}\sum_{i=1}^{N}\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\mu_g,\tau_0) + n^{1/2}N^{-1}\sum_{i=1}^{N}\phi_{B,\tau}(V_i,\delta_{A,i},\delta_{B,i};\mu_B^*,\tau^*)(\hat\tau - \tau_0) + n^{1/2}N^{-1}\sum_{i=1}^{N}\dot\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\mu_B^*,\tau^*)(\hat\mu_B - \mu_g), \qquad (\mathrm{A.1})$$
for some $(\mu_B^*,\tau^*)$ lying between $(\hat\mu_B,\hat\tau)$ and $(\mu_g,\tau_0)$, which leads to
$$-n^{1/2}N^{-1}\sum_{i=1}^{N}\dot\Phi_B(V_i;\mu_B^*,\tau^*)(\hat\mu_B - \mu_g) = n^{1/2}N^{-1}\sum_{i=1}^{N}\Phi_B(V_i,\delta_{A,i},\delta_{B,i};\mu_g,\tau_0) + n^{1/2}N^{-1}\sum_{i=1}^{N}\phi_{B,\tau}(V_i,\delta_{A,i},\delta_{B,i};\mu_B^*,\tau^*)(\hat\tau - \tau_0). \qquad (\mathrm{A.2})$$
Also, under Assumption A.1 a), b) and c), by the Taylor expansion, we have
$$n^{1/2}(\hat\tau - \tau_0) = -\left\{N^{-1}\sum_{i=1}^{N}\phi_\tau(V_i;\tau_0)\right\}^{-1}\left\{n^{1/2}N^{-1}\sum_{i=1}^{N}\Phi_\tau(V_i,\delta_{A,i},\delta_{B,i};\tau_0)\right\} + o_{\zeta\text{-}p\text{-}np}(1), \qquad (\mathrm{A.3})$$
as $\hat\tau \to \tau_0$. Also, under Assumption A.1 (e), we know that
$$N^{-1}\sum_{i=1}^{N}\dot\Phi_A(V_i;\mu_A^*,\tau^*) \to E\{\dot\Phi_A(V;\mu_{g,0},\tau_0)\}, \qquad N^{-1}\sum_{i=1}^{N}\phi_\tau(V_i;\tau_0) \to E\{\phi_\tau(V;\tau_0)\},$$
$$N^{-1}\sum_{i=1}^{N}\dot\Phi_B(V_i;\mu_B^*,\tau^*) \to E\{\dot\Phi_B(V;\mu_{g,0},\tau_0)\}, \qquad N^{-1}\sum_{i=1}^{N}\phi_{B,\tau}(V_i;\mu_B^*,\tau^*) \to E\{\phi_{B,\tau}(V;\mu_{g,0},\tau_0)\}, \qquad (\mathrm{A.4})$$
where the first two probability convergences follow straightforwardly from the weak law of large numbers under Assumption A.1 f) and the continuous mapping theorem, as $\mu_g \to \mu_{g,0}$ and $(\hat\mu_A,\hat\tau) \to (\mu_{g,0},\tau_0)$ by design, and $(\mu_A^*,\tau^*)$ lies between $(\hat\mu_A,\hat\tau)$ and $(\mu_{g,0},\tau_0)$. As for the third and fourth probability convergences, we first prove that $\mu_{B,0} - \mu_{g,0} = o_{np\text{-}p\text{-}\zeta}(1)$ under the local alternative $E\{\Phi_B(V,\delta_A,\delta_B;\mu_{g,0},\tau_0)\} = n_B^{-1/2}\eta$ in Lemma A.2.
Lemma A.2. Under Assumptions 2.1, 2.2 (iii) and suitable moments condi-
tions in Assumption A.1, we have μ
B,0
μ
g,0
= O
np-p-ζ
(n
1/2
).
1514 C. Gao and S. Yang
Next, we have under Assumption A.1 e),
N
1
N
i=1
˙
Φ
B
(V
i
; μ
B
, τ
)
=
N
1
N
i=1
˙
Φ
B
(V
i
; μ
g,0
0
)+N
1
N
i=1
2
Φ
B
(V
i
; μ
#
B
0
)
∂μ∂μ
(μ
B
μ
g,0
)
=
N
1
N
i=1
˙
Φ
B
(V
i
; μ
g,0
0
)+O
ζ-p-np
{(μ
B
μ
B,0
)+(μ
B,0
μ
g,0
)}
= E{
˙
Φ
B
(V ; μ
g,0
0
)} + o
ζ-p-np
(1),
(A.5)
where A
n
=
B
n
means that A
n
= B
n
+ o
ζ-p-np
(1) and μ
#
B
lies between μ
B
and
μ
g,0
.Sinceμ
B
μ
B,0
g
μ
g,0
and μ
B
lies between μ
B
and μ
g
, we establish
the second approximation in (A.5)as
(μ
B
μ
B,0
)+(μ
B,0
μ
g,0
)=O
np
(n
1/2
B
)+O
ζ
(N
1/2
)=o
ζ-p-np
(1),
since n
B
/N = o(1). The probability convergence of N
1
N
i=1
φ
B,τ
(V
i
; μ
B
, τ
)
can be established similarly and hence we obtain the last two parts of (A.4). By
plugging (A.3)and(A.4)into(A.2), we obtain the influence function for μ
B
as
n
1/2
(μ
B
μ
g
)
=
E
˙
Φ
B
(V ; μ
g,0
0
)
1
×
n
1/2
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
g
0
)
E {φ
B,r
(V ; μ
g,0
0
)E {φ
τ
(V ; τ
0
)}
1
n
1/2
N
1
N
i=1
Φ
τ
(V
i
A,i
B,i
; τ
0
)

=
n
1/2
N
1
N
i=1
ψ
B
(V
i
; μ
g
0
), (A.6)
where ψ
B
(V
i
; μ, τ) is the influence function for estimation of μ
B
under H
0
.For
completeness, we define the influence function ψ
A
(V
i
; μ, τ) for estimator μ
A
in
an analogous way as
n
1/2
(μ
A
μ
g
)
=
n
1/2
N
1
N
i=1
N
1
N
i=1
˙
Φ
A
(V
i
A,i
; μ
A
, τ
)
1
×{Φ
A
(V
i
A,i
; μ
g
0
)+φ
A,τ
(V
i
; μ
A
, τ
) · (τ τ
0
)} (A.7)
=
E
˙
Φ
A
(V ; μ
g,0
0
)
1
×
n
1/2
N
1
N
i=1
Φ
A
(V
i
A,i
; μ
g
0
)
E {φ
A,τ
(V ; μ
g,0
0
)E {φ
τ
(V ; τ
0
)}
1
n
1/2
N
1
N
i=1
Φ
τ
(V
i
A,i
B,i
; τ
0
)

Test-and-pool estimator 1515
=
n
1/2
N
1
N
i=1
ψ
A
(V
i
; μ
g
0
), (A.8)
where φ
A,r
(V ; μ, τ)=Φ
A
(V,δ
A
; μ, τ)/∂τ . By Lemma A.1, the joint asymptotic
distribution for n
1/2
(μ
A
μ
g
)andn
1/2
(μ
B
μ
g
) would be
n
1/2
μ
A
μ
g
μ
B
μ
g
N

0
l×1
f
1/2
B
[E {Φ
B
(V
i
; μ
g,0
0
)/∂μ}]
1
η
,
V
A
Γ
Γ
V
B

,
where V
A
, ΓandV
B
are the total (co-)variance of two-phase design averaging
over the finite populations:
V
A
= nN
2
E
ζ
var
p
N
i=1
ψ
A
(V
i
; μ
g
0
) |F
N

+ nN
2
var
ζ
E
p
N
i=1
ψ
A
(V
i
; μ
g
0
) |F
N

,
V
B
= nN
2
E
ζ
var
p-np
N
i=1
ψ
B
(V
i
; μ
g
0
) |F
N

+ nN
2
var
ζ
E
p-np
N
i=1
ψ
B
(V
i
; μ
g
0
) |F
N

,
Γ=nN
2
E
ζ
cov
p-np
N
i=1
ψ
A
(V
i
; μ
g
0
),
N
i=1
ψ
B
(V
i
; μ
g
0
) |F
N

+ nN
2
× var
ζ
E
p
N
i=1
ψ
A
(V
i
; μ
g
0
) |F
N
, E
p-np
N
i=1
ψ
B
(V
i
; μ
g
0
) |F
N

,
where the first term is attributed to the randomness of probability (and non-
probability) sample designs, and the second term is attributed to the random-
ness of the superpopulation model. The rest of the proof is summarized in
Lemma A.3.
Lemma A.3. Under the Assumption A.1 and the asymptotic joint distribution
for μ
A
and μ
B
in Lemma 3.1, the form of μ
eff
which maximizes the variance
reduction under H
0
would be
n
1/2
(μ
eff
μ
0
)
=
n
1/2
{ω
A
eff
)(μ
A
μ
g
)+ω
B
eff
)(μ
B
μ
g
)},
where the weight functions are
ω
A
(Λ) = E
˙
Φ
A,B,n
g,0
0
)
1
E
˙
Φ
A
(V
i
A,i
; μ
g,0
0
)
, (A.9)
1516 C. Gao and S. Yang
ω
B
(Λ) = E
˙
Φ
A,B,n
g,0
0
)
1
ΛE
˙
Φ
B
(V
i
A,i
B,i
; μ
g,0
0
)
, (A.10)
where
˙
Φ
A,B,n
g,0
0
)=
˙
Φ
A
(V
i
A,i
; μ
g,0
0
)+Λ
˙
Φ
B
(V
i
A,i
B,i
; μ
g,0
0
).
The most efficient estimator μ
eff
with
Λ
eff
= E
˙
Φ
A
(V
i
; μ
g,0
0
)
(V
A
Γ)(V
B
Γ
)
1
E
˙
Φ
B
(V
i
; μ
g,0
0
)
1
(A.11)
has the asymptotic distribution under H
a,n
as
n
1/2
(μ
eff
μ
g
) →N{b
eff
(η),V
eff
},
where b
eff
(η)=f
1/2
B
ω
B
eff
) {EΦ
B
(μ
g,0
0
)/∂μ}
1
η and
V
eff
=
ω
A
eff
)
ω
B
eff
)
V
A
Γ
Γ
V
B

ω
A
eff
)
ω
B
eff
)
.
When μ
A
and μ
B
are both scalar, V
eff
would reduce to
V
eff
=(V
A
V
B
Γ
2
)(V
A
+ V
B
2Γ)
1
= V
A
V
Δ
,
where V
Δ
=(V
A
Γ)
2
(V
A
+ V
B
2Γ)
1
.
A.3. Proof of Lemma 3.2
By applying the Taylor expansion with Lagrange forms of remainder to the
asymptotic distribution for n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
A
, τ)in(3.5) could
be shown as
n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
A
, τ)=n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
g
0
)
+ n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
;μ
A
)
∂μ
Φ
B
(V
i
A,i
B,i
;μ
A
)
∂τ
μ
A
μ
g
τ τ
0
where (μ
A
τ
)
is the neighborhood of (μ
g,0
0
)
as plimμ
A
= μ
g,0
and
plimτ = τ
0
. Under the Assumption A.1 e), we have
n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
A
, τ)
= n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
g
0
) (A.12)
+ n
1/2
B
N
1
N
i=1
Φ
B,j
(V
i
A,i
B,i
; μ
A
)
∂μ
(μ
A
μ
g
)
+ n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
A
)
∂τ
(τ τ
0
)
Test-and-pool estimator 1517
= n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
g
0
)
+ E
Φ
B
(V,δ
A
B
; μ
g,0
0
)
∂τ
n
1/2
B
(τ τ
0
) (A.13)
+ E
Φ
B
(V,δ
A
B
; μ
g,0
0
)
∂μ
n
1/2
B
(μ
A
μ
g
)+o
ζ-p-np
(1).
Next, by replacing the first two term in Equation (A.13) with Equation (A.2),
we have
n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
A
, τ)
= E
Φ
B
(V,δ
A
B
; μ
g,0
0
)
∂μ
n
1/2
B
(μ
B
μ
g
)
+ E
Φ
B
(V,δ
A
B
; μ
g,0
0
)
∂μ
n
1/2
B
(μ
A
μ
g
)+o
ζ-p-np
(1)
= (n
B
/n)
1/2
· E
˙
Φ
B
(V
i
; μ
g,0
0
) · n
1/2
(μ
B
μ
g
)
+(n
B
/n)
1/2
· E
˙
Φ
B
(V
i
; μ
g,0
0
) · n
1/2
(μ
A
μ
g
)+o
ζ-p-np
(1),
provided by WLLN under Assumptions 2.1, 2.2 (iii) and Assumption A.1.By
the joint distribution of μ
A
and μ
B
in Lemma 3.1, the variance of Φ
B,n
(μ
A
, τ)
would be
Σ
T
= f
B
E
˙
Φ
B
(V
i
; μ
g,0
)
(V
A
+ V
B
Γ
Γ)
E
˙
Φ
B
(V
i
; μ
g,0
)
.
Thus, the asymptotic distribution for Φ
B,n
(V
i
A,i
B,i
; μ
A
, τ) would be
n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
A
, τ)
→N
η, f
B
E
˙
Φ
B
(V
i
; μ
g,0
)
(V
A
+ V
B
Γ
Γ)
E
˙
Φ
B
(V
i
; μ
g,0
)
.
A.4. Proof of Theorem 4.1
From Lemma 2.1 and 3.1, we know that the asymptotic joint distribution for
μ
A
and μ
B
would be
n
1/2
μ
A
μ
g
μ
B
μ
g
→N

0
l×1
f
1/2
B
[E {Φ
B
(μ
g,0
0
)/∂μ}]
1
η
,
V
A
Γ
Γ
V
B

.
For simplicity, we let n
1/2
(μ
A
μ
g
)andn
1/2
(μ
B
μ
g
) be asymptotically dis-
tributed as Z
1
and Z
2
, respectively. Then, Φ
B,n
(V
i
A,i
B,i
; μ
A
, τ) could be
1518 C. Gao and S. Yang
expressed as
n
1/2
B
N
1
N
i=1
Φ
B
(V
i
A,i
B,i
; μ
A
, τ)
=
n
1/2
B
N
1
N
i=1
˙
Φ
B
(V
i
A,i
B,i
; μ
g,0
0
)(μ
B
μ
g
)
+ n
1/2
B
N
1
N
i=1
˙
Φ
B
(V
i
A,i
B,i
; μ
g,0
0
)(μ
A
μ
g
)
f
1/2
B
E
˙
Φ
B
(V,δ
A
B
; μ
g,0
0
)
(Z
1
Z
2
).
Let U
2
= f
1/2
B
E
˙
Φ
B
(V,δ
A
B
; μ
g,0
0
)
(Z
1
Z
2
). Next step, we attempt to find
another linear combination of Z
1
and Z
2
which is orthogonal to U
2
. Observed
that when U
1
= f
1/2
B
{
V
B
)(Γ V
A
)
1
Z
1
+ Z
2
}, it is easy to verify that the
covariance of U
1
and U
2
is zero under H
0
.
cov(U
2
,U
1
)=f
B
E
˙
Φ
B
(V,δ
A
B
; μ
g,0
0
)
×
I
l×l
I
l×l
×
V
A
Γ
Γ
V
B
×
V
A
)
1
V
B
)
I
l×l
= f
B
E
˙
Φ
B
(V,δ
A
B
; μ
g,0
0
)
× (
V
A
Γ
Γ V
B
) ×
V
A
)
1
V
B
)
I
l×l
=0
l×l
.
Also, since U
1
and U
2
are both asymptotically normal distributions, which im-
plies that zero covariance leads to independency. After a few standardization
procedures, we have W
1
and W
2
as W
1
1/2
S
U
1
, W
2
1/2
T
U
2
with Σ
S
and Σ
T
defined as
Σ
S
=var(U
1
)=f
B
var{
V
B
)(Γ V
A
)
1
Z
1
+ Z
2
}, (A.14)
Σ
T
= f
B
E
˙
Φ
B
(μ
g,0
0
)
(V
A
+ V
B
Γ
Γ)
E
˙
Φ
B
(μ
g,0
0
)
. (A.15)
Therefore, we have the form for the standardized random variables W
1
and W
2
as
W
1
1/2
S
U
1
= f
1/2
B
Σ
1/2
S
{
V
B
)(Γ V
A
)
1
Z
1
+ Z
2
},
W
2
= Σ
1/2
T
U
2
= (V
A
+ V
B
Γ
Γ)
1/2
(Z
1
Z
2
).
Here we use Σ
1/2
T
to standardize U
2
for the sake of convenience later. There-
fore, under the local alternative H
a,n
: E{Φ
B
(V,δ
A
B
; μ
g,0
0
)} = n
1/2
B
η,
we have that E(Z
1
)=0, E(Z
2
)=f
1/2
B
E
˙
Φ
B
(μ
g,0
0
)
1
η. Combining the
above leads to
W
1
N(μ
1
,I
l×l
),W
2
N(μ
2
,I
l×l
),
Test-and-pool estimator 1519
where
μ
1
= Σ
1/2
S
E
˙
Φ
B
(μ
g,0
0
)
1
η,
μ
2
= f
1/2
B
(V
A
+ V
B
Γ
Γ)
1/2
E
˙
Φ
B
(μ
g,0
0
)
1
η = Σ
1/2
T
η,
and since W
1
W
2
, we could project out TAP estimator μ
tap
with the opti-
mal tuning parameter
,c
γ
) onto these two basis respectively. First, on the
condition that
T>c
γ
= {Φ
B,n
(μ
A
, τ)}
Σ
1
T
{Φ
B,n
(μ
A
, τ)} >c
γ
W
2
W
2
>c
γ
,
we have
n
1/2
(μ
tap
μ
g
) | T>c
γ
= n
1/2
(μ
A
μ
g
) | T>c
γ
Z
1
|W
2
W
2
>c
γ
→−f
1/2
B
V
A
)(V
A
+ V
B
Γ
Γ)
1
U
1
+ f
1/2
B
V
A
)(V
A
+ V
B
Γ
Γ)
1
E
˙
Φ
B
(μ
g,0
0
)
1
U
2
|W
2
W
2
>c
γ
→−f
1/2
B
(V
A
+ V
B
Γ
Γ)
1
Σ
1/2
S
W
1
+(Γ V
A
)(V
A
+ V
B
Γ Γ
)
1/2
W
2
|W
2
W
2
>c
γ
→−V
1/2
eff
W
1
+ V
1/2
A-eff
W
2
|W
2
W
2
>c
γ
.
Next, on the condition T = W
2
W
2
c
γ
,wehave
n
1/2
(μ
tap
μ
g
)ω
A
Z
1
+ ω
B
Z
2
|W
2
W
2
c
γ
→−f
1/2
B
V
A
)(V
A
+ V
B
Γ
Γ)
1
U
1
+ f
1/2
B
ω
A
V
A
)(V
A
+ V
B
Γ
Γ)
1
×
E
˙
Φ
B
(μ
g,0
0
)
1
U
2
| W
2
W
2
c
γ
f
1/2
B
ω
B
V
B
)(V
A
+ V
B
Γ
Γ)
1
×
E
˙
Φ
B
(μ
g,0
0
)
1
U
2
| W
2
W
2
c
γ
→−f
1/2
B
(V
A
+ V
B
Γ
Γ)
1
Σ
1/2
S
W
1
+ f
1/2
B
ω
A
V
A
)(V
A
+ V
B
Γ Γ
)
1/2
W
2
| W
2
W
2
c
γ
f
1/2
B
ω
B
V
B
)(V
A
+ V
B
Γ
Γ)
1/2
W
2
| W
2
W
2
c
γ
→−V
1/2
eff
W
1
+(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)W
2
|W
2
W
2
c
γ
,
where W
t
2
= W
2
|W
2
W
2
c
γ
,andω
A
B
are the new tuned weighted functions
defined in (A.9)and(A.10) with Λ = Λ
. In this way, we could fully characterize
the asymptotic distribution for the TAP estimator μ
tap
under the optimal tuning
parameter as,
n
1/2
(μ
tap
μ
g
)
V
1/2
eff
W
1
+(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)W
t
[0,c
γ
]
w.p. ξ,
V
1/2
eff
W
1
+ V
1/2
A-eff
W
t
[c
γ
,]
w.p. 1 ξ,
where ξ = P(W
2
W
2
<c
γ
).
1520 C. Gao and S. Yang
A.5. Proof of the bias and mean squared error of n
1/2
(
μ
tap
μ
g
)
For general case, given W
2
N
p
(μ
2
,I
p×p
), the MGF of truncated normal dis-
tribution W
2
|a W
2
W
2
b is [60]
αm(t)=E{exp(t
W
2
)}
=(2π)
p/2
C
exp(t
W
2
)exp
1
2
(W
2
μ
2
)
(W
2
μ
2
)
dW
2
=(2π)
p/2
exp(
1
2
t
t + μ
2
t)
×
C
exp
1
2
(W
2
μ
2
t)
(W
2
μ
2
t)
dW
2
=exp(
1
2
μ
2
μ
2
)
k=0
{F
p+2k
(b) F
p+2k
(a)}{(μ
2
+ t)
(μ
2
+ t)/2}
k
/k!,
where α = F
p
(b; μ
2
μ
2
/2) F
p
(a; μ
2
μ
2
/2) is the normalization constant and
F
p
(a; μ
2
μ
2
/2) is CDF of chi-square distribution at value a with non-central
parameter μ
2
μ
2
/2. The second and the third equality above are justified by
(2π)
p/2
C
exp
1
2
(W
2
μ
2
t)
(W
2
μ
2
t)
= P{a W
2
W
2
b | W
2
∼N(μ
2
+ t, I
p×p
)}
= F {b; k = p, λ =(μ
2
+ t)
(μ
2
+ t)}−F {a; k = p, λ =(μ
2
+ t)
(μ
2
+ t)}
=exp{−
1
2
(μ
2
+ t)
(μ
2
+ t)}
×
k=0
{F
p+2k
(b) F
p+2k
(a)}{(μ
2
+ t)
(μ
2
+ t)/2}
k
/k!.
To compute the first and second moment of this truncated normal distribution,
we take derivative of the MGF and evaluate the function at t =0
α
dm(t)
dt
t=0
=(μ
2
+ t)exp(
1
2
μ
2
μ
2
)
×
k=0
{F
p+2k+2
(b) F
p+2k+2
(a)}{(μ
2
+ t)
(μ
2
+ t)}/k!|
t=0
= μ
2
exp(
1
2
μ
2
μ
2
)
k=0
{F
p+2k+2
(b) F
p+2k+2
(a)}{μ
2
μ
2
/2}
k
/k!
= μ
2
{F
p+2
(b; μ
2
μ
2
/2) F
p+2
(a; μ
2
μ
2
/2)}.
By the nature of MGF, we obtain the expectation of the first moment of W
2
E(W
2
|a W
2
W
2
b)=μ
2
·
F
p+2
(b; μ
2
μ
2
/2) F
p+2
(a; μ
2
μ
2
/2)
F
p
(b; μ
2
μ
2
/2) F
p
(a; μ
2
μ
2
/2)
.
Test-and-pool estimator 1521
Then, taking the second derivative of the MGF follows by
α
d
2
m(t)
dtdt
t=0
=exp(
1
2
μ
2
μ
2
)
k=0
{F
p+2k+2
(b) F
p+2k+2
(a)}{(μ
2
+ t)
(μ
2
+ t)/2}
k
/k!|
t=0
+(μ
2
+ t)(μ
2
+ t)
×
k=0
{F
p+2k+4
(b) F
p+2k+4
(a)}{(μ
2
+ t)
(μ
2
+ t)/2}
k
/k!|
t=0
=exp(
1
2
μ
2
μ
2
)
k=0
{F
p+2k+2
(b) F
p+2k+2
(a)}{μ
2
μ
2
/2}
k
/k!
+μ
2
μ
2
k=0
{F
p+2k+4
(b) F
p+2k+4
(a)}{μ
2
μ
2
/2}
k
/k!
= I
p×p
(F
p+2
(b; μ
2
μ
2
/2) F
p+2
(a; μ
2
μ
2
/2))
+ μ
2
μ
2
(F
p+4
(b; μ
2
μ
2
/2) F
p+4
(a; μ
2
μ
2
/2)),
which leads to
E(W
2
W
T
2
|a W
T
2
W
2
b)=I
p×p
F
p+2
(b; μ
T
2
μ
2
/2) F
p+2
(a; μ
T
2
μ
2
/2)
F
p
(b; μ
T
2
μ
2
/2) F
p
(a; μ
T
2
μ
2
/2)
+ μ
2
μ
2
F
p+4
(b; μ
T
2
μ
2
/2) F
p+4
(a; μ
T
2
μ
2
/2)
F
p
(b; μ
T
2
μ
2
/2) F
p
(a; μ
T
2
μ
2
/2)
.
In our case,
p = l, μ
1
= Σ
1/2
S
[E {Φ
B
(μ
g,0
0
)/∂μ}]
1
η, μ
2
= Σ
1/2
T
η.
Recall, for T c
γ
,wehaven
1/2
(μ
tap
μ
g
)→−V
1/2
eff
W
1
+(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)W
2
|W
2
W
2
c
γ
with probability ξ = F
l
(c
γ
; μ
2
μ
2
), the bias would be
bias(λ, c
γ
; η)
T c
γ
= V
1/2
eff
μ
1
+(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
) · E(W
2
|W
2
W
2
c
γ
)
= V
1/2
eff
μ
1
+(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
) ·
F
l+2
(c
γ
; μ
T
2
μ
2
/2)μ
2
F
l
(c
γ
; μ
T
2
μ
2
/2)
.
The MSE can be derived based on the known formula mse(X + Y )=var(X +
Y )+{E(X + Y )}
2
= {var(X)+μ
2
X
} + {var(Y )+μ
2
Y
} +2μ
X
μ
Y
mse(λ, c
γ
; η)
T c
γ
= V
1/2
eff
(μ
1
μ
1
+ I
l×l
)V
1/2
eff
+(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)
× E(W
2
W
T
2
|W
2
W
2
c
γ
)(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)
2V
1/2
eff
μ
1
E(W
2
|W
2
W
2
c
γ
)(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)
= V
1/2
eff
(μ
1
μ
1
+ I
l×l
)V
1/2
eff
+(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)
1522 C. Gao and S. Yang
×
F
l+2
(c
γ
; μ
T
2
μ
2
/2)
F
l
(c
γ
; μ
T
2
μ
2
/2)
I
l×l
+
F
l+4
(c
γ
; μ
T
2
μ
2
/2)
F
l
(c
γ
; μ
T
2
μ
2
/2)
μ
2
μ
2
(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)
2F
l+2
(c
γ
; μ
T
2
μ
2
/2)
F
l
(c
γ
; μ
T
2
μ
2
/2)
V
1/2
eff
μ
1
μ
2
(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
).
For T>c
γ
,wehaven
1/2
(μ
tap
μ
g
)→−V
1/2
eff
W
1
+ V
1/2
A-eff
W
2
|W
2
W
2
>c
γ
with
probability 1 ξ =1F
l
(c
γ
; μ
2
μ
2
), the corresponding bias and MSE would be
bias(λ, c
γ
; η)
T>c
γ
= V
1/2
eff
μ
1
+ V
1/2
A-eff
· E(W
2
|W
2
W
2
>c
γ
)
= V
1/2
eff
μ
1
+ V
1/2
A-eff
·
1 F
l+2
(c
γ
; μ
T
2
μ
2
/2)μ
2
1 F
l
(c
γ
; μ
T
2
μ
2
/2)
,
and
mse(λ, c
γ
; η)
T>c
γ
= V
1/2
eff
(μ
1
μ
1
+ I
l×l
)V
1/2
eff
+ V
1/2
A-eff
E(W
2
W
T
2
|W
2
W
2
>c
γ
)V
1/2
A-eff
2V
1/2
eff
μ
1
E(W
2
|W
2
W
2
>c
γ
)V
1/2
A-eff
= V
1/2
eff
(μ
1
μ
1
+ I
l×l
)V
1/2
eff
+ V
1/2
A-eff
×
1 F
l+2
(c
γ
; μ
T
2
μ
2
/2)
1 F
l
(c
γ
; μ
T
2
μ
2
/2)
I
l×l
+
1 F
l+4
(c
γ
; μ
T
2
μ
2
/2)
1 F
l
(c
γ
; μ
T
2
μ
2
/2)
μ
2
μ
2
V
1/2
A-eff
2
1 F
3
(c
γ
; μ
T
2
μ
2
/2)
1 F
1
(c
γ
; μ
T
2
μ
2
/2)
V
1/2
eff
μ
1
μ
2
V
1/2
A-eff
.
Overall, the bias and mean squared error for n
1/2
(μ
tap
μ
g
) can be characterized
as
bias(λ, c
γ
; η)=ξ ·bias(λ, c
γ
; η)
T c
γ
+(1 ξ) · bias(λ, c
γ
; η)
T>c
γ
,
mse(λ, c
γ
; η)=ξ ·mse(λ, c
γ
; η)
T c
γ
+(1 ξ) · mse(λ, c
γ
; η)
T>c
γ
.
A.6. Proof of the asymptotic distribution for U (a)
Throughout the proof, we assume that the regularity conditions in Lemma 2.1
and assumptions in Theorem 4.2 hold, we prove that the coverage probability
for the adaptive projection sets is guaranteed to be larger than 1 α,whichis
P
a
μ
g
C
BACI
μ
g
,1α
(a)
1 α + o(1),
where C
BACI
μ
g
,1α
(a)=
a
μ
tap
U
1α/2
(a)/
n, a
μ
tap
L
α/2
(a)/
n
. As we al-
ready know that
a
n
1/2
(μ
tap
μ
g
) U(a),a
n
1/2
(μ
tap
μ
g
) L(a),
it is needed to show that
U(a) obtained by bootstrapping converges to the same
asymptotic distribution as U(a). Let D
p×p
denotes the space of p ×p symmetric
Test-and-pool estimator 1523
positive-definite matrices equipped with the spectral norm. We can rewrite U(a)
as
U(a)=a
V
1/2
eff
W
1
{Σ
S
,n
1/2
(μ
A
μ
g
),n
1/2
(μ
B
μ
g
)}
+ a
(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)W
2
{Σ
T
,n
1/2
(μ
A
μ
g
),n
1/2
(μ
B
μ
g
)}
+ a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)μ
t
[c
γ
,)
+ a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)
×
W
2
{Σ
T
,n
1/2
(μ
A
μ
g
),n
1/2
(μ
B
μ
g
)}
[c
γ
,)
μ
t
[c
γ
,)
1
T υ
n
+ a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)
× sup
μ
2
R
l
W
2
{Σ
T
,n
1/2
(μ
A
μ
g
),n
1/2
(μ
B
μ
g
)}
[c
γ
,)
μ
t
[c
γ
,)
1
T<υ
n
.
Next, we adopt the notation for the bootstrapping to express the upper bound
U(a)=U
(b)
(a)as
U
(b)
(a)=a
V
1/2
eff
W
1
{
Σ
S
,n
1/2
(μ
(b)
A
μ
A
),n
1/2
(μ
(b)
B
μ
A
), τ}
+ a
(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)W
2
{
Σ
T
,n
1/2
(μ
(b)
A
μ
A
),n
1/2
(μ
(b)
B
μ
A
), τ}
+ a
ω
B
(
V
1/2
B-eff
+
V
1/2
A-eff
)
¯
W
(b)
2
[c
γ
,)
+ a
ω
B
(
V
1/2
B-eff
+
V
1/2
A-eff
)
×
W
2
{
Σ
T
,n
1/2
(μ
(b)
A
μ
A
),n
1/2
(μ
(b)
B
μ
A
), τ}
[c
γ
,)
¯
W
(b)
2
[c
γ
,)
1
T υ
n
+ a
ω
B
(
V
1/2
B-eff
+
V
1/2
A-eff
)
× sup
μ
2
R
l
W
2
{
Σ
T
,n
1/2
(μ
(b)
A
μ
A
),n
1/2
(μ
(b)
B
μ
A
), τ}
[c
γ
,)
μ
t
[c
γ
,)
1
T<υ
n
,
where
¯
W
(b)
2
=(1/K)
K
b=1
W
2
{
Σ
T
,n
1/2
(μ
(b)
A
μ
A
),n
1/2
(μ
(b)
B
μ
A
), τ}.Next,we
define some functions to proceed our proof. w
11
: D
l×l
×D
l×l
×R
l
×R
l
×R
d
×R
R, w
12
: D
l×l
×R
l
×R
l
×R
d
×R
l
R and ρ : D
2l×2l
×D
l×l
×R
l
×R
l
×R
d
×R×R
l
R are functions defined as below
w
11
T
, Σ
S
, G
A
, G
B
2
)=a
V
1/2
eff
W
1
S
, G
A
, G
B
)
+ a
(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)W
2
T
, G
A
, G
B
)
+ a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)μ
t
[c
γ
,)
+ a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)
×
W
2
T
, G
A
, G
B
)
[c
γ
,)
μ
t
[c
γ
,)
1
μ
2
μ
2
B
,
w
12
T
, G
A
,G
B
2
)=a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)
×
W
2
T
, G
A
, G
B
)
[c
γ
,)
μ
t
[c
γ
,)
1
μ
2
μ
2
B,
ρ
11
T
, G
A
, G
B
2
)=a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)
1524 C. Gao and S. Yang
×
W
2
T
, G
A
, G
B
)
[c
γ
,)
μ
t
[c
γ
,)
(1
T υ
n
1
μ
2
μ
2
B
),
ρ
12
T
, G
A
, G
B
2
)=a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)
×
W
2
T
, G
A
, G
B
)
[c
γ
,)
μ
t
[c
γ
,)
(1
T<υ
n
1
μ
2
μ
2
B
),
where G
A
= n
1/2
(μ
A
μ
g
)andG
B
= n
1/2
(μ
B
μ
g
). Using the functions we
have defined, we could re-express the upper bound U (a)intermsof
U(a)=w
11
T
, Σ
S
, G
A
, G
B
2
)+ρ
11
T
, G
A
, G
B
2
)
+sup
μ
2
R
l
{w
12
T
, G
A
, G
B
2
)+ρ
12
T
, G
A
, G
B
2
)}.
Assume the conditions in Theorem 4.2, we can show that
1. w
11
is continuous at points in
T
, Σ
S
, R
l
, R
l
, R
d
2
)andw
12
is continuous
at points in
T
, R
l
, R
l
2
) uniformly in μ
2
.Thatis,forany
Σ
T
Σ
T
,
Σ
S
Σ
S
, G
(b)
A
= n
1/2
(μ
(b)
A
μ
A
) Z
1
, G
(b)
B
= n
1/2
(μ
(b)
B
μ
A
) Z
2
and
τ τ ,wehave
sup
μ
2
R
l
|w
11
(
Σ
T
,
Σ
S
, G
(b)
A
, G
(b)
B
, τ,μ
2
) w
11
T
, Σ
S
,Z
1
,Z
2
2
)|→0,
sup
μ
2
R
l
|w
12
(
Σ
T
, G
(b)
A
, G
(b)
B
2
) w
12
T
,Z
1
,Z
2
2
)|→0.
(A.16)
2. ρ
11
(
Σ
T
, G
(b)
A
, G
(b)
B
2
)andρ
12
(
Σ
T
, G
(b)
A
, G
(b)
B
2
) converge to zeros with
probability one as n →∞uniformly in μ
2
.Thatis,
sup
μ
2
R
l
|ρ
11
(
Σ
T
, G
(b)
A
, G
(b)
B
2
)|→0, max
μ
2
R
l
|ρ
12
(
Σ
T
, G
(b)
A
, G
(b)
B
2
)|→0.
(A.17)
See Lemma B.9. and Lemma B.11. in [32] for details.
By far, combine (A.16)and(A.17), U(a) is guaranteed to be continuous, and
the continuity of L(a) can be derived in the same way. Based on continuous
mapping theorem and Theorem 4.2 in [32], we can state that
sup
M
|E{L(a),U(a)}−E
M
{L
(b)
(a),U
(b)
(a)}|
converges to zero in probability, where E
M
(·) denotes the expectation taken
with respect to the bootstrap weights.
A.7. Proof of Theorem 4.2
Based on the established consistency of the bootstrapping bounds in Section A.6,
the proof can be decomposed into two parts. One part is for
P{a
n(μ
tap
μ
g
)
U
1α/2
(a)}≥P{U(a)
U
1α/2
(a)}
Test-and-pool estimator 1525
= G
U
{
U
1α/2
(a)}−
G
U
{
U
1α/2
(a)}
+
G
U
{
U
1α/2
(a)}
= o(1) + 1 α/2,
where G
U
(·) is the cumulative distribution function for U (a). Let
G
U
(·)bethe
empirical cumulative distribution function
U(a) estimated by bootstrapping.
Similarly, we can show that the other part of our proof as
P{a
n(μ
tap
μ
g
)
L
α/2
(a)}≤P{L(a)
L
α/2
(a)}
= G
L
{
L
α/2
(a)}−
G
L
{
L
α/2
(a)}
+
G
L
{
L
α/2
(a)}
= o(1) + α/2,
where G
L
(·) is the cumulative distribution function for L(a). Combine the re-
sults we have above, we can obtain that
P(
L
α/2
(a) a
n(μ
tap
μ
g
)
U
1α/2
(a))
= P{a
n(μ
tap
μ
g
)
U
1α/2
(a)}
P{a
n(μ
tap
μ
g
)
L
α/2
(a)}
1 α/2+o(1) α/2+o(1) = 1 α.
Thus, the proof is completed.
A.8. Proof of Remark 4.1
In this section, we construct a data-adaptive confidence interval based on the
projection sets proposed in [48]. Starting from the common projection sets, we
re-express the test-and-pool estimator
a
n
1/2
(μ
tap
μ
g
)=a
V
1/2
eff
W
1
+ a
(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)W
2
+ a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)W
t
[c
γ
,)
.
For given μ
2
,weknowthat
n
1/2
{μ
tap
(μ
2
) μ
g
} = a
V
1/2
eff
W
1
+ a
(ω
A
V
1/2
A-eff
ω
B
V
1/2
B-eff
)W
2
(μ
2
)
+ a
ω
B
(V
1/2
B-eff
+ V
1/2
A-eff
)W
t
[c
γ
,)
(μ
2
),
where the right hand side can be approximated by empirical sample distribu-
tion as
Q
n
(μ; a) and we could construct a (1 ˜α
1
) × 100% confidence interval
B
μ
g
,1 ˜α
1
(a; μ
2
)ofμ
g
given μ
2
by the empirical quantile confidence interval as
B
μ
g
,1 ˜α
1
(a; μ
2
)
1526 C. Gao and S. Yang
=
μ
g
R
l
: μ
tap
(μ
2
)
Q
1
n
(1 α/2; a)
n
μ
g
μ
tap
(μ
2
)
Q
1
n
(α/2; a)
n
,
where
Q
1
n
(d; a)isthed-th sample quantiles based on our empirical distribution.
However, the value of μ
2
is unknown, a useful approach is to form a (1
˜α
2
)×100% confidence region B
μ
2,
1 ˜α
2
for μ
2
, and thus the projection confidence
interval for μ
g
is the union of B
μ
g
,1 ˜α
1
(a; μ
2
) over all μ
2
B
μ
2,
1 ˜α
2
. Here, the
confidence bounds for μ
2
can be constructed as B
μ
2
1 ˜α
2
= μ
2
±Φ
1
(1 ˜α
2
/2)
where
μ
2
= n
1/2
f
1/2
B
Σ
1/2
T
N
1
N
i=1
˙
Φ
B
(V
i
A,i
B,i
; μ
A
, τ)
1
(μ
A
μ
B
),
Φ
1
(·) is the inverse cdf for a standard normal distribution. Thus, let α α
1
α
2
and the union would be the data-adaptive projection (1 α) ×100% confidence
interval for μ
g
C
PCI
μ
g
,1α
(a)=
μ
2
B
μ
2,
1 ˜α
2
B
μ
g
,1 ˜α
1
(a; μ
2
). (A.18)
To limit conservatism, a pretest procedure is carried out while we construct
the projection adaptive confidence intervals C
PACI
μ
g
,1α
(a), and we would use the
C
PCI
μ
g
,1α
(a) if we cannot reject the H
0
: μ
μ B.Toprovethecoverageforthe
projection adaptive confidence interval, denote for α (0, 1),wehavethat
P
a
μ
g
/C
PACI
μ
g
,1α
(a)
= P
a
μ
g
/C
PACI
μ
g
,1α
(a) | T v
n
P(T v
n
)
+ P
a
μ
g
/B
μ
g
,1α
(a; μ
2
)|T>v
n
P(T>v
n
)
= P(a
μ
g
/C
PCI
μ
g
,1α
(a)
2
B
μ
2,
1 ˜α
2
| T v
n
)P(T v
n
)
+ P(a
μ
g
/C
PCI
μ
g
,1α
(a)
2
/B
μ
2,
1 ˜α
2
| T v
n
)P(T v
n
)
+ {˜α
1
+ o(1)}P(T>v
n
)
P{a
μ
g
/B
μ
g
,1 ˜α
1
(a; μ
2
)
2
B
μ
2,
1 ˜α
2
| T v
n
}P(T v
n
)
+ P(μ
2
/B
μ
2,
1 ˜α
2
| T v
n
)P(T v
n
)+αP(T>v
n
)
α
1
α
2
)P(T v
n
)+αP(T>v
n
)
= α,
where we know that P{a
μ
g
/ B
μ
g
,1 ˜α
1
(a; μ
2
)
2
B
μ
2,
1 ˜α
2
}≤˜α
1
holds for
any value μ
2
.
A.9. Proof of Lemma A.1
Following the similar arguments in [55], let F (·)andG(·) be the cumulative
distribution function (c.d.f.) of N(μ
g
,V
1
)andN(μ
g,0
,V
2
). Let Φ(t)bethe
convolution of G(·)andF (·)a(·)=(G F )(·), then we have
|P{(μ
g
μ
g
) t}−Φ(t)|
Test-and-pool estimator 1527
E
ζ
sup
x
P(μ
g
x |F
N
) F (x)
+ |E
ζ
{F (s) Φ(t)}|,
where s = t + μ
g
= t (μ
g
). By Lemma 3.2 in [45], |P(μ
g
x |F
N
) F (x)|
converges to 0 uniformly in x. For the first term, we have
lim
N→∞
E
ζ
sup
x
P(μ
g
x |F
N
) F (x)
E
ζ
lim
N→∞
sup
x
P(μ
g
x |F
N
) F (x)
0.
Since F (·)andG(·) are both bounded and continuous, by the dominated con-
vergence theorem, the second term is
lim
N→∞
E
ζ
{F(s)}−Φ(t)=E
ζ
lim
N→∞
F(t (μ
g
))
Φ(t)
=
x
G(x)F (t x)dx Φ(t),
which also converges to 0 [55, Lemma 1]. Hence, the asymptotic c.d.f of μ
g
μ
g
is Φ(·) and the result follows as the convolution of Gaussians is still Gaussian
[1, 8].
A.10. Proof of Lemma A.2
Under Assumptions 2.1, 2.2 (iii) and Assumption A.1 f), we have
0=N
1
N
i=1
E
np-p
{Φ
B
(V
i
A,i
B,i
; μ
B,0
0
) |F
N
}
=N
1
N
i=1
E
np-p
{Φ
B
(V
i
A,i
B,i
; μ
g,0
0
) |F
N
}
+ N
1
N
i=1
E
np-p
˙
Φ
B
(V
i
A,i
B,i
; μ
B
0
) |F
N
(μ
B,0
μ
g,0
)
=E{Φ
B
(V
i
A,i
B,i
; μ
g,0
0
)}
+ E
˙
Φ
B
(V
i
; μ
B
0
)
(μ
B,0
μ
g,0
)+O
np-p-ζ
(n
1/2
),
for some μ
B
between μ
B,0
and μ
g,0
,where
N
1
N
i=1
E
np-p
{Φ
B
(V
i
A,i
B,i
; μ
g,0
0
) |F
N
}
= E
ζ
[E
np-p
{Φ
B
(V
i
A,i
B,i
; μ
g
0
) |F
N
}]+O
np-p-ζ
(n
1/2
) (A.19)
= E{Φ
B
(V
i
A,i
B,i
; μ
g,0
0
)} + O
np-p-ζ
(N
1/2
)+O
np-p-ζ
(n
1/2
), (A.20)
1528 C. Gao and S. Yang
where for (A.19), the first approximation E
np-p
(·|F
N
) is based on the design
consistency and the non-probability sample-based Weak Law of Large Numbers
under Assumption 2.2 (iii), and the second approximation E
ζ
(·) is justified un-
der Assumption A.1 f); For (A.20), it can be obtained by continuous mapping
theorem as μ
g
= μ
g,0
+ O
ζ
(N
1/2
) under Assumption A.1 f). By rearranging
the terms under the local alternative, it follows that
μ
B,0
μ
g,0
=
E
˙
Φ
B
(V ; μ
B
0
)

1
E{Φ
B
(V
i
A,i
B,i
; μ
g,0
0
)} + O
np-p-ζ
(n
1/2
)
= O(1) × n
1/2
B
η + O
np-p-ζ
(n
1/2
)=o
np-p-ζ
(1).
A.11. Proof of Lemma A.3
First, we show that the composite estimator μ
pool
is essentially the solution to
N
i=1
{Φ
A
(V
i
A,i
; μ, τ)+ΛΦ
B
(V
i
A,i
B,i
; μ, τ)} =0.
Next, under the Assumption A.1 a)-d), we apply the Taylor expansion at point
(μ
g
0
) which leads to
0=
N
i=1
{Φ
A
(V
i
A,i
; μ
pool
, τ)+ΛΦ
B
(V
i
A,i
B,i
; μ
pool
, τ)}
=
N
i=1
{Φ
A
(V
i
A,i
; μ
g
0
)+ΛΦ
B
(V
i
A,i
B,i
; μ
g
0
)}
+
N
i=1
Φ
A
(V
i
A,i
; μ
pool
, τ
)
∂μ
Φ
B
(V
i
A,i
B,i
; μ
pool
, τ
)
∂μ
(μ
pool
μ
g
)
+
N
i=1
Φ
A
(V
i
A,i
; μ
pool
, τ
)
∂τ
Φ
B
(V
i
A,i
B,i
; μ
pool
, τ
)
∂τ
(τ τ
0
),
for some (μ
pool
, τ
) between (μ
pool
, τ)and(μ
g
0
). Given the asymptotic joint
distribution for μ
A
and μ
B
in Lemma 3.1, we obtain
n
1/2
(μ
pool
μ
g
)
= n
1/2
N
i=1
˙
Φ
A
(V
i
A,i
; μ
pool
, τ
)+Λ
˙
Φ
B
(V
i
A,i
B,i
; μ
pool
, τ
)
1
×
N
i=1
{Φ
A
(V
i
A,i
; μ
g
0
)+ΛΦ
B
(V
i
A,i
B,i
; μ
g
0
)}
+
N
i=1
Φ
A
(V
i
A,i
; μ
pool
, τ
)/∂τ
Test-and-pool estimator 1529
Φ
B
(V
i
A,i
B,i
; μ
pool
, τ
)/∂τ
(τ τ
0
)
=
E
˙
Φ
A,B,n
pool
0
)
1
×
n
1/2
E
˙
Φ
A
(V ; μ
g,0
0
)
(μ
A
μ
g
)+n
1/2
ΛE
˙
Φ
B
(V ; μ
B
0
)
(μ
B
μ
g
)
,
(A.21)
for some intermittent value μ
pool
between plimμ
pool
and μ
g,0
, where Equation
(A.21) is obtained by using Equation (A.7)and(A.2) collectively. By Assump-
tions 2.1, 2.2 (iii) and suitable moments condition in Assumption A.1, under
the local alternative, n
1/2
(μ
pool
μ
g
) would follow the normal distribution with
mean and variance as
E
n
1/2
(μ
pool
μ
g
)
= f
1/2
B
E
˙
Φ
A,B,n
g,0
0
)
1
Λη,
var
n
1/2
(μ
pool
μ
g
)
= E
˙
Φ
A,B,n
g,0
0
)
1
×

E
˙
Φ
A
(V ; μ
g,0
0
)
ΛE
˙
Φ
B
(V ; μ
g,0
0
)

V
A
Γ
Γ
V
B

E
˙
Φ
A
(V ; μ
g,0
0
)
ΛE
˙
Φ
B
(V ; μ
g,0
0
)
×
E
˙
Φ
A,B,n
g,0
0
)
1
,
obtained by the similar arguments in (A.5). Plugging (A.11) into Equation
(A.21), the asymptotic distribution of the most efficient estimator μ
eff
follows
n
1/2
(μ
eff
μ
g
)
=
E
˙
Φ
A,B,n
eff
g,0
0
)
1
×
E
˙
Φ
A
(V ; μ
g,0
0
) · n
1/2
(μ
A
μ
g
)+Λ
eff
E
˙
Φ
B
(V ; μ
g,0
0
) · n
1/2
(μ
B
μ
g
)
=
n
1/2
{ω
A
eff
)(μ
A
μ
g
)+ω
B
eff
)(μ
B
μ
g
)}.
It yields a similar efficient estimator as derived in [71]
n
1/2
(μ
eff
μ
g
)
=
n
1/2
{ω
A
eff
)μ
A
+ ω
B
eff
)μ
B
μ
g
}, (A.22)
with
ω
A
(Λ) = E
˙
Φ
A,B,n
g,0
0
)
1
E
˙
Φ
A
(V
i
A,i
; μ
g,0
0
)
,
ω
B
(Λ) = E
˙
Φ
A,B,n
g,0
0
)
1
ΛE
˙
Φ
B
(V
i
A,i
B,i
; μ
g,0
0
)
,
where it is easy to show that ω
A
+ ω
B
= I
l×l
. So that the asymptotic variance
V
eff
of this efficient estimator will become
V
eff
=
ω
A
eff
)
ω
B
eff
)
V
A
Γ
Γ
V
B

ω
A
eff
)
ω
B
eff
)
.
The expression of V
eff
can be complicated when the dimension of the parameters
of interest is greater than 1. Here, we provide the form of V
eff
when estimating
equations are (2.4)and(2.5):
ω
A
eff
)=E
˙
Φ
A,B,n
eff
g,0
0
)
1
E
˙
Φ
A
(V,δ
A
; μ
g,0
0
)
,
1530 C. Gao and S. Yang
= {I
l×l
+(V
A
Γ)(V
B
Γ
)
1
}
1
=(V
B
Γ
)(V
A
+ V
B
Γ Γ
)
1
,
ω
B
eff
)=E
˙
Φ
A,B,n
eff
g,0
0
)
1
Λ
eff
E
˙
Φ
B
(V,δ
A
B
; μ
g,0
0
)
,
= {I
l×l
+(V
A
Γ)(V
B
Γ
)
1
}
1
(V
A
Γ)(V
B
Γ
)
1
=(V
B
Γ
)(V
A
+ V
B
Γ Γ
)
1
(V
A
Γ)(V
B
Γ
)
1
,
and
V
eff
=

1
1

V
A
Γ
Γ
V
B

1
1

2
×
V
B
Γ
V
A
Γ
V
A
Γ
Γ
V
B

V
B
Γ
V
A
Γ
=(V
A
+ V
B
Γ
Γ)
2
{(V
B
Γ
)
2
V
A
+(V
A
Γ)
2
V
B
(V
B
Γ
)(V
A
Γ
)+Γ
(V
A
Γ)(V
B
Γ)}
=(V
A
V
B
Γ
2
)(V
A
+ V
B
2Γ)
1
= V
A
V
Δ
,
with V
Δ
=(V
A
Γ)
2
(V
A
+ V
B
2Γ)
1
guaranteed to be non-negative definite,
i.e., non-negative quantity. By Cauchy-Schwarz inequality, we have
E{(μ
A
μ
g
)
2
E{(μ
B
μ
g
)
2
}≥E{(μ
A
μ
g
)(μ
B
μ
g
)},
which leads to
V
A
V
B
Γ, and therefore
V
A
+ V
B
2{|V
A
V
B
|
1/2
Γ}≥0,
where the two sides are equal if and only if V
A
= V
B
= Γ. The asymptotic vari-
ance of the efficient estimator for other multi-dimensional estimating equations
can be obtained in an analogous way but with much heavier notations.
Appendix B: Simulation
B.1. A detailed illustration of simulation
Here, we will provide detailed proof for estimating the finite-population param-
eter μ
y
= μ
g
= N
1
N
i=1
Y
i
and μ
0
= E
ζ
(Y ). First, we know the following
expectation that
E
np
(δ
B,i
| X
i
,Y
i
)=π
B
(X
i
,Y
i
), E
np
(Y
i
| X
i
)=m(X
i
).
To obtain the asymptotic joint distribution μ
A
and μ
B
, the stacked estimating
equation system Φ(V,δ
A
B
; θ) is constructed with θ =(μ
A
B
)
where
Φ(V,δ
A
B
; θ)={Φ
A
(V,δ
A
; θ)
, Φ
B
(V,δ
A
B
; θ)
, Φ
τ
(V,δ
A
B
; θ)
}
, (B.1)
Test-and-pool estimator 1531
where we use μ
A
and μ
B
to distinguish between estimators yielded by Φ
A
(V,δ
A
;
μ
A
)an
B
(V,δ
A
B
; μ
B
). By positing a logistic regression model π
B
(X
i
; α)=
exp(X
i
α)/{1+exp(X
i
α)} and a linear model m(X
i
; β)=X
i
β, one common
choices for Φ
A
(V,δ
A
; μ
A
)an
B
(V,δ
A
B
; μ
B
)are
Φ
A
(V,δ
A
; μ
A
)=δ
A
π
1
A
(Y μ
A
),
Φ
B
(V,δ
A
B
; μ
B
)=
δ
B
π
B
(X; α)
{Y m (X; β)} +
δ
A
π
A
m (X; β) μ
B
,
where τ =(α, β)andπ
A
is the known sample weights under probability samples
accounting for sample design. There are various ways to construct the estimating
functions Φ
τ
(V
i
; α, β)for(α, β). One standard approach is to use the pseudo
maximum likelihood estimator α and the ordinary least square estimator
β
[54, 26]. In usual, the maximum likelihood estimator of α can be computed by
maximizing the log-likelihood function l(α)
α =argmax
α
N
i=1
[δ
B,i
log π
B
(X
i
; α)+(1 δ
B,i
) log{1 π
B
(X
i
; α)}]
=argmax
α
N
i=1
δ
B,i
log
π
B
(X
i
; α)
1 π
B
(X
i
; α)
+
N
i=1
log{1 π
B
(X
i
; α)}.
Since we do not have the X
i
for all units in the finite population, we then instead
construct the following pseudo log-likelihood function l
(α)
l
(α)=
N
i=1
δ
B,i
log
π
B
(X
i
; α)
1 π
B
(X
i
; α)
+
N
i=1
δ
A,i
π
1
A,i
log{1 π
B
(X
i
; α)}
=
N
i=1
δ
B,i
X
i
α δ
A,i
π
1
A,i
log{1+exp(X
i
α)}
,
where the second equality is derived under the logistic regression model for
π
B
(X
i
; α). By taking derivative of l
(α) with respect to α, the estimating func-
tions for (α, β) can be constructed as follows:
Φ
τ,1
(V,δ
A
B
; α, β)=δ
B
X δ
A
π
1
A
π
B
(X; α)X, (B.2)
Φ
τ,2
(V,δ
A
B
; α, β)=δ
B
X{Y m(X; β)}, (B.3)
with Φ
τ
(V,δ
A
B
; α, β)=(Φ
τ,1
(V,δ
A
B
; α, β)
Φ
τ,2
(V,δ
A
B
; α, β)
)
.Un-
der our setup, both Sample A and Sample B provide information on X and
Y , thus we can also consider the estimating equation based on the combined
samples for β:
Φ
τ,1
(V,δ
A
B
; α, β)=δ
B
X δ
A
π
1
A
π
B
(X; α)X,
Φ
τ,2
(V,δ
A
B
; α, β)=(δ
A
+ δ
B
)X{Y m(X; β)}. (B.4)
1532 C. Gao and S. Yang
In addition, [29] propose a new set of estimating functions, in which (α,
β)are
obtained by jointly solve the following estimating functions:
Φ
KH
τ,1
(V,δ
A
B
; α, β)=
δ
B
π
1
B
(X; α) δ
A
π
1
A
X, (B.5)
Φ
KH
τ,2
(V,δ
A
B
; α, β)=δ
B
{π
1
B
(X; α) 1}X{Y m(X; β)}. (B.6)
Denote the solution to
N
i=1
Φ(V
i
A,i
B,i
; θ)=0as
θ =(μ
A
, μ
B
, τ
)
. Under
Assumption A.1 a)-e), we could apply the Taylor expansion to around θ
y
=
(μ
y
y
0
)
and obtain
0=
N
i=1
Φ(V
i
A,i
B,i
;
θ)
=
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)+
N
i=1
Φ(V
i
A,i
B,i
;
θ
)
∂θ
(
θ θ
y
), (B.7)
for some
θ
=(μ
A
, μ
B
, τ
)
lying between
θ and θ
y
. Under Assumption 2.1,
the consistency of μ
A
for μ
y
can be established, i.e., μ
A
= μ
y
+ O
p
(n
1/2
).
Moreover, under Assumption A.1 f), we have μ
y
= μ
0
+ O
ζ
(N
1/2
) and hence
plimμ
A
= μ
0
, i.e., μ
A
converges to μ
0
in probability. Under Assumption A.1
b), μ
B
is consistent to μ
B,0
,andμ
B,0
= μ
0
+ O
ζ-p-np
(n
1/2
) under the local
alternative. Denote θ
0
=(μ
0
0
0
)
, and the following uniform convergence
can be established under Assumption A.1 (a)-(c) and (e)
N
1
N
i=1
Φ(V
i
A,i
B,i
;
θ
)
∂θ
= E
Φ(V
i
A,i
B,i
; θ
0
)
∂θ
+ O
ζ-p-np
(n
1/2
)+O
ζ
(N
1/2
),
and by Assumption A.1 (d), we have
N
1
N
i=1
Φ(V
i
A,i
B,i
;
θ
)
∂θ
1
=
E
Φ(V
i
A,i
B,i
; θ
0
)
∂θ

+ o
ζ-p-np
(1).
Rearrange the terms of (B.7), we then have
n
1/2
(
θ θ
y
)
=
N
1
N
i=1
φ(
θ
)
1
n
1/2
N
1
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
+ o
ζ-p-np
(1)
= −{Eφ(θ
0
)}
1
n
1/2
N
1
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
+ o
ζ-p-np
(1),
Test-and-pool estimator 1533
where φ(θ)=Φ(V,δ
A
B
; θ)/∂θ
. For the simplicity of notation, we denote
π
B
(X
i
; α)=π
B,i
,m(X
i
; β)=m
i
, ˙m
i
= ∂m(X
i
; β) /∂β, and its expectation is
given by
E {φ(θ)}
= diag
11π
B
(X
i
; α){1 π
B
(X
i
; α)}X
i
X
i
(π
B,i
d
1
i
)X
i
X
i
,
(B.8)
where Ω = 0 if Φ
τ
(V,δ
A
B
; α, β) is constructed by (B.2)and(B.3), and Ω = 1
if Φ
τ
(V,δ
A
B
; α, β) is constructed by (B.2)and(B.4); π
B,i
= P(δ
B,i
=1| X
i
)
is the true probability. In addition, if (B.5)and(B.6) are used to estimate τ,it
gives us
E {φ
KH
(θ)}
=
E(δ
A,i
d
i
)0 0 0
01 0 0
00E
δ
B,i
(1π
B,i
)X
i
X
i
π
B,i
0
00E
δ
B,i
(1π
B,i
)(Y
i
m
i
)X
i
X
i
π
B,i
E
δ
B,i
(1π
B,i
)X
i
X
i
π
B,i
= diag
11(1 π
B,i
)X
i
X
i
(1 π
B,i
)X
i
X
i
. (B.9)
Below, we focus on the asymptotic properties of n
1/2
(
θ θ
y
) under (B.8), and
the asymptotics under under (B.9) can be obtained in an analogous way. First,
the inverse of E {φ(θ)} is
[E {φ(θ)}]
1
= diag
11π
B
(X
i
; α){1 π
B
(X
i
; α)}X
i
X
i
(π
B,i
d
1
i
)X
i
X
i
1
.
As shown in [12] under Assumption A.1 g), the asymptotic variance of μ
B
will not be affected by the estimated
β.Letπ
B,i,0
= π
B
(X
i
; α
0
)andm
i,0
=
m(X
i
; β
0
) be the correct working model evaluated the true parameter value
(α
0
0
). Therefore, the
N
i=1
Φ(V
i
A,i
B,i
; θ
y
) can be found by using the de-
composition
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
=
0
N (h
N
μ
y
)+
N
i=1
δ
B,i
π
1
B,i,0
(Y
i
m
i,0
h
N
) b
X
i
N
i=1
δ
B,i
X
i
N
i=1
π
B,i,0
X
i
N
i=1
δ
B,i
(Y
i
X
i
β
0
)X
i
+
N
i=1
δ
A,i
d
i
(Y
i
μ
y
)
N
i=1
δ
A,i
d
i
t
i
N
i=1
π
B,i,0
X
i
N
i=1
δ
A,i
d
i
π
B,i,0
X
i
0
,
1534 C. Gao and S. Yang
where
h
N
= N
1
N
i=1
(Y
i
m
i,0
) ,
b
=[(1 π
B,i,0
){Y
i
m
i,0
h
N
}X
i
] {N
1
N
i=1
π
B,i,0
(1 π
B,i,0
)X
i
X
i
}
1
,
t
i
= π
A,i
X
i
b + m
i,0
N
1
N
i=1
m
i,0
.
Since the probability sample is assumed to be independent of the non-probability
sample [12], we could express the variance for
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)astwo
components V
1
and V
2
under Assumption 2.1 and 2.2 (iii)
var
n
1/2
N
1
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
= V
1
+ V
2
= nN
2
N
i=1
π
B,i,0
(1 π
B,i,0
)
× E
ζ
00 0 0
2
ΔX
i
ΔY
i
X
i
X
i
X
i
X
i
Y
i
X
i
X
i
Y
i
X
i
Y
i
X
i
X
i
(Y
i
X
i
β
0
)
2
X
i
X
i
(B.10)
+ nN
2
E
ζ
D
11
D
12
D
13
0
D
12
D
22
D
23
0
D
13
D
23
D
33
0
0000
+ o(1), (B.11)
where
V
1
=var
ζ-np
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
,
V
2
=var
ζ-p
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
and
Δ=π
1
B,i,0
{y
i
m
i,0
h
N
}−b
x
i
.
By the law of total variance, we have
var
ζ-np
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
= E
ζ
var
np
N
i=1
Φ(V
i
A,i
B,i
; θ
y
) |F
N

+var
ζ
E
np
N
i=1
Φ(V
i
A,i
B,i
; θ
y
) |F
N

,
Test-and-pool estimator 1535
Algorithm B.1: Replication-based method for estimating variance of
μ
A
and μ
B
Input: the probability sample {(V
i
A,i
):i ∈A}, the non-probability sample
{(V
i
B,i
):i ∈B}and the number of bootstrap K.
for b =1, ··· ,K do
Sample n
A
units from the probability sample with replacement as A
(b)
.
Sample n
B
units from the non-probability sample with replacement as B
(b)
.
Compute the bootstrap replicates μ
(b)
A
and μ
(b)
B
by solving
i∈A
(b)
Φ
A
(V
i
A,i
; μ)=0,
i∈A
(b)
∪B
(b)
Φ
B
(V
i
A,i
B,i
; μ, τ )=0.
Calculate the variance estimator
V
A
,
Γand
V
B
Γ=n(K 1)
1
K
b=1
(μ
(b)
A
μ
A
)(μ
(b)
B
μ
B
)
,
V
D
= n(K 1)
1
K
b=1
(μ
(b)
D
μ
D
)(μ
(b)
D
μ
D
)
,D= A, B,
where
μ
D
= K
1
K
b=1
μ
(b)
D
for D = A, B.
where the second term will be negligible under Assumption A.1 g) and h).
Similar arguments hold for var
ζ-p
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
, therefore, (B.10)
and (B.11) follow. The sub-matrices D
kl
,k =1, ···, 3,l =1, ···, 3 are all design-
based variance-covariance matrices under the probability sampling design, and
can be obtained using standard plug-in approach.
Alternatively, a with-replacement bootstrap variance estimation can also be
used here [43]. To illustrate, we consider a single-stage probability proportional
to size sampling with negligible sampling ratios. Following [57], the bootstrap
procedures in Algorithm B.1 are conducted.
Under Assumptions 2.1 and A.1,
θ θ
y
|F
N
and θ
y
are both approximately
normal, which leads to the asymptotic normality of the unconditional distribu-
tion over all the finite populations by Lemma A.1:
n
1/2
(
θ θ
y
)
N
*
θ
, {Eφ(θ
0
)}
1
var
n
1/2
N
1
N
i=1
Φ(V
i
A,i
B,i
; θ
y
)
{Eφ(θ
0
)
}
1
+
,
where θ
=(0 f
1/2
B
[E{Φ
B
(μ
0
0
)/∂μ}]
1
η 0)
. Thus, the asymptotic
variance for the joint distribution n
1/2
(μ
A
μ
y
, μ
B
μ
y
)
is obtain by the 2 ×2
submatrix corresponding as
var{n
1/2
(μ
A
μ
y
, μ
B
μ
y
)
}
= nN
2
D
11
D
12
D
21
N
i=1
(1 π
B,i,0
)π
B,i,0
Δ
2
+ D
22
+ o(1)
1536 C. Gao and S. Yang
=
V
A
Γ
Γ
V
B
+ o(1),
and
n
1/2
μ
A
μ
y
μ
B
μ
y
→N

0
f
1/2
B
[E{Φ
B
(μ
0
0
)/∂μ}]
1
η
,
V
A
Γ
Γ
V
B

→N

0
f
1/2
B
η
,
V
A
Γ
Γ
V
B

,
where E{Φ
B
(μ
0
0
)/∂μ} = 1.
B.2. A detailed illustration of bias and mean squared error
Here,wetak
A
(V,δ
A
; μ) as Equation (2.4)an
B
(V,δ
A
B
; μ, τ) as Equation
(2.5) for an illustration. For T c
γ
,wehave
n
1/2
(μ
tap
μ
g
)
=
V
A
V
B
Γ
2
V
A
+ V
B
1/2
W
1
+
V
A
) λ V
B
)
(1 + λ)(V
A
+ V
B
2Γ)
1/2
W
2
|W
2
2
c
γ
,
with probability ξ = F
1
(c
γ
; μ
2
2
), which leads to
bias(λ, c
γ
; η)
T c
γ
=
V
A
V
B
Γ
2
V
A
+ V
B
1/2
μ
1
+
V
A
) λ V
B
)
(1 + λ)(V
A
+ V
B
2Γ)
1/2
· E(W
2
|W
2
2
c
γ
)
=
V
A
V
B
Γ
2
V
A
+ V
B
1/2
μ
1
+
V
A
) λ V
B
)
(1 + λ)(V
A
+ V
B
2Γ)
1/2
· μ
2
F
3
(c
γ
; μ
T
2
μ
2
/2)
F
1
(c
γ
; μ
T
2
μ
2
/2)
=
ηf
1/2
B
V
A
)
V
A
+ V
B
+
ηf
1/2
B
{ V
A
) λ V
B
)}
(1 + λ)(V
A
+ V
B
2Γ)
F
3
(c
γ
; μ
T
2
μ
2
/2)
F
1
(c
γ
; μ
T
2
μ
2
/2)
,
and
mse(λ, c
γ
; η)
T c
γ
=
V
A
V
B
Γ
2
V
A
+ V
B
· (μ
2
1
+1)+
V
A
) λ V
B
)
(1 + λ)(V
A
+ V
B
2Γ)
1/2
2
× E(W
2
2
|W
2
2
c
γ
)
2
V
A
V
B
Γ
2
1/2
{ V
A
) λ V
B
)}
(1 + λ)(V
A
+ V
B
2Γ)
μ
1
· μ
2
F
3
(c
γ
; μ
T
2
μ
2
/2)
F
1
(c
γ
; μ
T
2
μ
2
/2)
Test-and-pool estimator 1537
=
V
A
V
B
Γ
2
V
A
+ V
B
· (μ
2
1
+1)+
λ V
B
) V
A
)
(1 + λ)(V
A
+ V
B
2Γ)
1/2
2
×
F
3
(c
γ
; μ
2
2
/2)
F
1
(c
γ
; μ
2
2
/2)
+ μ
2
2
F
5
(c
γ
; μ
2
2
/2)
F
1
(c
γ
; μ
2
2
/2)
2
V
A
V
B
Γ
2
1/2
{ V
A
) λ V
B
)}
(1 + λ)(V
A
+ V
B
2Γ)
μ
1
· μ
2
F
3
(c
γ
; μ
T
2
μ
2
/2)
F
1
(c
γ
; μ
T
2
μ
2
/2)
.
For T>c
γ
,wehave
n
1/2
(μ
tap
μ
g
)=
V
A
V
B
Γ
2
V
A
+ V
B
1/2
W
1
+
V
A
)
(V
A
+ V
B
2Γ)
1/2
W
2
|W
2
2
>c
γ
,
with probability 1ξ =1F
1
(c
γ
; μ
2
2
), the corresponding bias and mean squared
error would be
bias(λ, c
γ
; η)
T>c
γ
=
V
A
V
B
Γ
2
V
A
+ V
B
1/2
μ
1
+
V
A
)
(V
A
+ V
B
2Γ)
1/2
· μ
2
1 F
3
(c
γ
; μ
T
2
μ
2
/2)
1 F
1
(c
γ
; μ
T
2
μ
2
/2)
=
ηf
1/2
B
V
A
)
V
A
+ V
B
+
ηf
1/2
B
V
A
)
V
A
+ V
B
1 F
3
(c
γ
; μ
T
2
μ
2
/2)
1 F
1
(c
γ
; μ
T
2
μ
2
/2)
,
and
mse(λ, c
γ
; η)
T>c
γ
=
V
A
V
B
Γ
2
V
A
+ V
B
· (μ
2
1
+1)+
V
A
)
2
V
A
+ V
B
× E(W
2
2
|W
2
2
>c
γ
)
2
V
A
V
B
Γ
2
1/2
V
A
)
V
A
+ V
B
μ
1
· μ
2
1 F
3
(c
γ
; μ
T
2
μ
2
/2)
1 F
1
(c
γ
; μ
T
2
μ
2
/2)
=
V
A
V
B
Γ
2
V
A
+ V
B
+
V
A
)
2
V + V
B
×
1 F
3
(c
γ
; μ
2
2
/2)
1 F
1
(c
γ
; μ
2
2
/2)
+ μ
2
2
1 F
5
(c
γ
; μ
2
2
/2)
1 F
1
(c
γ
; μ
2
2
/2)
2
V
A
V
B
Γ
2
1/2
V
A
)
V
A
+ V
B
μ
1
· μ
2
1 F
3
(c
γ
; μ
T
2
μ
2
/2)
1 F
1
(c
γ
; μ
T
2
μ
2
/2)
.
Then, the bias and mean squared error for n
1/2
(μ
tap
μ
g
) would be
bias(λ, c
γ
; η) = bias(λ, c
γ
; η)
T c
γ
· ξ + bias(λ, c
γ
; η)
T>c
γ
· (1 ξ)
=
ηf
1/2
B
V
A
)
V
A
+ V
B
+
ηf
1/2
B
{−λ V
B
)+(Γ V
A
)}
(1 + λ)(V
A
+ V
B
2Γ)
F
3
(c
γ
; μ
T
2
μ
2
/2)
+
ηf
1/2
B
V
A
)
V
A
+ V
B
1 F
3
(c
γ
; μ
T
2
μ
2
/2)
1538 C. Gao and S. Yang
=
ληf
1/2
B
1+λ
Γ V
B
V
A
+ V
B
+
Γ V
A
V
A
+ V
B
F
3
(c
γ
; μ
T
2
μ
2
/2)
= ηd
0
, (B.12)
with
d
0
= λf
1/2
B
(1 + λ)
1
(ω
A
+ ω
B
)F
3
(c
γ
; μ
T
2
μ
2
/2),
and
mse(λ, c
γ
; η)=
V
A
V
B
Γ
2
V
A
+ V
B
· (μ
2
1
+1)
+
{λ V
B
) V
A
)}
2
(1 + λ)
2
(V
A
+ V
B
2Γ)
×
F
3
(c
γ
; μ
2
2
/2) + μ
2
2
F
5
(c
γ
; μ
2
2
/2)
+
V
A
)
2
V
A
+ V
B
×
1 F
3
(c
γ
; μ
2
2
/2) + μ
2
2
μ
2
2
F
5
(c
γ
; μ
2
2
/2)
2
V
A
V
B
Γ
2
1/2
V
A
)
V
A
+ V
B
1 F
3
(c
γ
; μ
T
2
μ
2
/2)
μ
1
μ
2
2
V
A
V
B
Γ
2
1/2
{ V
A
) λ V
B
)}
(1 + λ)(V
A
+ V
B
2Γ)
F
3
(c
γ
; μ
T
2
μ
2
/2)μ
1
μ
2
= V
eff
d
1
+ V
B-eff
d
2
+ V
A-eff
d
3
+ V
1/2
eff
(V
1/2
B-eff
d
4
+ V
1/2
A-eff
d
5
), (B.13)
with
d
1
= μ
2
1
+1,
d
2
= λ(1 + λ)
2
F
3
(c
γ
; μ
2
2
/2) + μ
2
2
F
5
(c
γ
; μ
2
2
/2)
{λ 2ω
B
A
},
d
3
=1F
3
(c
γ
; μ
2
2
/2) + μ
2
2
1 F
5
(c
γ
; μ
2
2
/2)
+(1 + λ)
2
F
3
(c
γ
; μ
2
2
/2) + μ
2
2
F
5
(c
γ
; μ
2
2
/2)
,
d
4
=2λ(1 + λ)
1
μ
1
μ
2
F
3
(c
γ
; μ
2
2
/2),
d
5
= 2μ
1
μ
2
1 F
3
(c
γ
; μ
2
2
/2) + F
3
(c
γ
; μ
2
2
/2)(1 + λ)
1
.
Let V
A
=2,V
B
=1, Γ=0.5, and η =0, 0.5and1.5 (encoding zero, weak,
and strong violation of H
0
)in(B.12)and(B.13). Figure B.1 shows three mean
squared error surfaces as functions of ,c
γ
) with three values of η.
a) In the leftmost plot, where H
0
holds, for a given Λ, the mean squared
error decreases drastically and then flattens out as c
γ
increases. Moreover,
for a given c
γ
, there exists a minimizer Λ
such that the mean squared
error achieves the minimum. These observations justify our strategy by
viewing Λ and c
γ
jointly as tuning parameters since both of them are
playing important roles when searching for the minimum value of mean
squared error.
b) In the middle plot, where H
0
is weakly violated, the pattern of the mean
squared error retains the similar features for c
γ
asshownin(A).Inaddi-
tion, the optimal choice Λ
leads to a sharp decline of the mean squared
Test-and-pool estimator 1539
Fig B.1. The plots for the mean squared errors in a synthetic example. Leftmost (A) plots
the mean square error mse,c
γ
; η) of n
1/2
(μ
tap
μ
g
) as function of Λ and c
γ
when the null
hypothesis H
0
holds true (η =0); Middle (B) plots mse,c
γ
; η) when the null hypothesis H
0
is weakly violated (η =0.5); Rightmost (C) plots mse,c
γ
; η) when the null hypothesis H
0
is strongly violated (η =1.5).
error compared to other choices of Λ. These findings imply that despite the
bias due to accepting the non-probability sample, the impact would be less
compared to the increased variance due to rejecting the non-probability
sample. But care is needed to determine the amount of information bor-
rowed from the non-probability sample since a small deviation from the
optimal value Λ
can lead to a non-ignorable increase of the mean squared
error. Once the optimal mean squared error is reached at (Λ
,c
γ
), the fur-
ther increment of c
γ
will not be influential.
c) In the rightmost plot, where H
0
is strongly violated, the mean squared
error behaves differently as in (A) and (B). It is advisable to choose both Λ
and c
γ
close to zero (the low probability of combining the non-probability
sample with the probability sample) to minimize the mean squared error.
As above, keeping increasing c
γ
after the mean squared error flattens out
is of no importance.
B.3. Additional simulation results
Table B.1 provides the Monte Carlo averages and standard errors of the data-
adaptive tuned parameters (Λ,c
γ
) and the Monte Carlo proportion of combining
the probability and non-probability samples. Figure B.2 presents the plots of
Monte Carlo biases, variances and mean squared errors of the μ
A
, μ
dr
, μ
eff
, μ
tap
and μ
tap:fix
based on 2000 replicated datasets. For the fixed threshold strategy
μ
tap:fix
, the threshold c
γ
is held fixed to be the 95th quantile of a χ
2
1
distribution
(i.e., 3.84) and the tuning parameter Λ is selected by minimizing the asymptotic
mean square error at the fixed c
γ
.
In Table B.1, we find that the adaptive procedure tends to select smaller
values of Λ and c
γ
as b increases. As a result, the Monte Carlo proportions of
combining the probability and non-probability samples together are decreasing,
which is desired for down-weighting the biased non-probability sample. More-
over, we compare the adaptive tuning strategy of c
γ
with a fixed thresholding
strategy, and Figure B.2 shows that the strategy with pre-defined cutoff cannot
1540 C. Gao and S. Yang
Table B.1
Simulation results of Monte Carlo averages of the tuning parameters ,c
γ
) and the
proportion P(comb) of combining the probability and non-probability samples.
H
0
Λ c
γ
P(comb)
est se est se est se
holds μ
tap
3.02 4.26 35.06 9.45 0.95 0.22
μ
tap:B
3.05 4.62 35.06 9.44 0.95 0.22
μ
tap:KH
3.06 4.66 35.06 9.44 0.95 0.22
slightly violated μ
tap
2.21 3.39 31.60 13.76 0.86 0.35
μ
tap:B
2.22 3.47 31.60 13.75 0.86 0.35
μ
tap:KH
2.23 3.60 31.60 13.75 0.86 0.35
strongly violated μ
tap
0.16 0.28 1.40 1.97 0.00 0.06
μ
tap:B
0.16 0.28 1.40 1.97 0.00 0.06
μ
tap:KH
0.16 0.28 1.40 1.98 0.00 0.06
Fig B.2. Summary statistics plots of estimators of μ
y
with respect to the strength of violation,
labeled by b. Each column of the plots corresponds to a different metric: “bias” for bias, “var”
for variance, “MSE” for mean square error.
satisfactorily control the mean squared error when H
0
is slightly or strongly
violated.
B.4. Double-bootstrap procedure for v
n
selection
Following the algorithm mentioned by [10], where optimal v
n
is selected to en-
sure the coverage probability, we need to retain the K bootstrapped samples,
called V
(1)
,V
(2)
, ···,V
(K)
where V
(b)
= {V
i
=(X
(b)
i
,Y
(b)
i
)
: i 1, ···,n},b=
1, ···,K with n = n
A
+n
B
. The reason it is called double bootstrap is that each
bootstrap sample spawns itself to a set of K
second-order bootstrap samples.
Next, we set up the candidates for v
n
. Under the assumption (A2), we let v
n
be the form of κ log log n with κ ∈{2, 4, 10, 20, 30}, and construct the bound-
based adaptive confidence intervals for each given κ at 1 α confidence level,
denoted as C
PACI
μ
g
,1α
(a). Given each κ, we compute the coverage probability for
the associated adaptive confidence intervals regarding these K
second-ordered
simulated datasets. Then, choose the smallest κ that ensures the actual cov-
Test-and-pool estimator 1541
erage probability larger than 1 α. Specifically, we use the estimator μ
(b)
A
for
μ
A
in each bootstrapped dataset as the ground truth and count the number of
datasets in which the adaptive confidence interval covers the ground truth, say
c(κ)=
K
b=1
1{μ
(b)
A
C
PACI,κ,(b)
μ
g
,1α
(a)} and therefore the v
n
can be determined
by using v
n
=inf{κ : c(κ)/K
> 1 αlog log n. In our simulation, K
is set
to be 100.
B.5. Details of the Bayesian method
In this section, we provide the details of the Bayesian approaches proposed by
[52] to combine the probability and non-probability samples as follows.
1. Solve the score function for β by using the non-probability sample:
β
NPR
=argmin
β
N
i=1
δ
B,i
X
i
(Y
i
X
i
β)=0.
2. Construct the informative prior with three choices:
Prior 1: Choose a weakly informative parameterization of the prior as
β ∼N(0, 10
6
),
which can be treated as a reference for comparison.
Prior 2: Let
β
PR
be the solution to the score function based on the probability
sample
β
PR
=argmin
β
N
i=1
δ
A,i
X
i
(Y
i
X
i
β)=0.
Then consider the squared Euclidean distance between
β
PR
and
β
NPR
as the hyper-parameter σ
2
β
for the variance of β:
β ∼N
β
NPR
, diag(
β
PR
β
NPR
2
2
)
.
Prior 3: In lieu of using the squared distance to extract information on σ
2
β
,
a nonparametric with-replacement bootstrap procedure can be im-
plemented (B = 1000). After estimating the coefficient in each of
them, denoted by
β
(i)
NPR
, one replication-based variance estimator can
be obtained, σ
2
β
NPR
=
B
i=1
(
β
(i)
NPR
¯
β
NPR
)
2
/(B 1) with
¯
β
NPR
=
1/B
B
i=1
β
(i)
NPR
. Then, the informative prior can be constructed
β ∼N(
β
NPR
,I
p×p
· σ
2
β
NPR
).
3. Assume that the model for the observed probability sample is
Y
i
| δ
A,i
=1∼N(X
i
β,σ
2
).
1542 C. Gao and S. Yang
By imposing an informative non-probability-based prior, the resulting pos-
terior estimates are expected to be more efficient. Specifically, these priors
are:
β ∼N(β
0
2
β
)
2
Γ(r, m),r= m =10
3
,
where
Prior 1: β
0
=0
2
β
=10
6
,
Prior 2: β
0
=
β
NPR
2
β
= diag(
β
PR
β
NPR
2
2
),
Prior 3: β
0
=
β
NPR
2
β
= I
p×p
· σ
2
β
NPR
.
The posterior Markov chain Monte Carlo (MCMC) samples of β and Y
i
are ob-
tained by drawing 2000 samples from the posterior distributions and discarding
the first 500 samples as the burn-in procedures. The Bayesian estimator is
μ
Bayes
=1/
N
n
A
i=1
d
i
¯
Y
i
with
N =
n
A
i=1
d
i
,
where
¯
Y
i
is the posterior mean calculated by
¯
Y
i
=1/(2000 500)
2000
k=501
Y
i,k
.
Borrowed from Bayes’ Theorem, its variance and 95% highest posterior density
intervals can be estimated via the MCMC posterior samples. Denote μ
Bayes,k
=
1/
N
n
A
i=1
d
i
Y
i,k
,k = 501, ···, 2000. Then, we have
var(μ
Bayes
)=
1
2000 500 1
2000
k=501
(μ
Bayes,k
μ
Bayes
)
2
,
HPDI = {Q(μ
Bayes,k
; α/2),Q(μ
Bayes,k
;1 α/2)},
where Q(μ
Bayes,k
; α
0
)representstheα
0
-th sample quantile of the posterior sam-
ples μ
Bayes,k
, k = 501, ···, 2000 after burn-in.
References
[1] Abramowitz, M., Stegun, I. A. and Romer, R. H. (1988). Handbook
of mathematical functions with formulas, graphs, and mathematical tables.
[2] Baker, R., Brick, J. M., Bates, N. A., Battaglia, M.,
Couper, M. P., Dever, J. A., Gile, K. J. and Tourangeau, R. (2013).
Summary report of the AAPOR task force on non-probability sampling.
Journal of Survey Statistics and Methodology 1 90–143.
[3] Baltagi, B. H., Bresson, G. and Pirotte, A. (2003). Fixed effects, ran-
dom effects or Hausman–Taylor?: A pretest estimator. Economics Letters
79 361–369.
[4] Barr, D. R. and Sherrill, E. T. (1999). Mean and variance of truncated
normal distributions. The American Statistician 53 357–361.
Test-and-pool estimator 1543
[5] Beaumont, J.-F. (2020). Are probability surveys bound to disappear for
the production of official statistics? Survey Methodology 46 1–28.
[6] Bethlehem, J. (2016). Solving the nonresponse problem with sample
matching? Social Science Computer Review 34 59–77.
[7] Binder, D. A. and Roberts, G. R. (2003). Design-based and model-
based methods for estimating model parameters. Analysis of Survey Data
29 33–54. MR1978842
[8] Boas, M. L. (2006). Mathematical Methods in the Physical Sciences.John
Wiley & Sons.
[9] Boos, D. D. and Stefanski, L. A. (2013). Essential Statistical Inference:
Theory and Methods 591. Springer. MR3024617
[10] Chakraborty, B., Laber, E. B. and Zhao, Y. (2013). Inference for
optimal dynamic treatment regimes using an adaptive m-out-of-n bootstrap
scheme. Biometrics 69 714–723. MR3106599
[11] Chen, S., Yang, S. and Kim, J. K. (2022). Nonparametric mass imputa-
tion for data integration. Journal of survey statistics and methodology 10
1–24.
[12] Chen, Y., Li, P. and Wu, C. (2019). Doubly Robust Inference With Non-
probability Survey Samples. Journal of the American Statistical Association
115 2011–2021. MR4189773
[13] Cheng, X. (2008). Robust confidence intervals in nonlinear regression un-
der weak identification. Manuscript, Department of Economics, Yale Uni-
versity.
[14] Citro, C. F. (2014). From multiple modes for surveys to multiple data
sources for estimates. Survey Methodology 40 137–161.
[15] Cochran, W. G. (2007). Sampling Techniques, 3 ed. New York: John
Wiley & Sons, Inc. MR0054199
[16] Colnet, B., Mayer, I.,
Chen, G., Dieng, A., Li, R., Varoquaux, G.,
Vert, J.-P., Josse, J. and Yang, S. (2020). Causal inference methods
for combining randomized trials and observational studies: a review. arXiv
preprint arXiv:2011.08047.
[17] Couper, M. P. (2000). Web surveys: A review of issues and approaches.
The Public Opinion Quarterly 64 464–494.
[18] Couper, M. P. (2013). Is the sky falling? New technology, changing media,
and the future of surveys. Survey Research Methods 7 145–156.
[19] Deville, J.-C. and Särndal, C.-E. (1992). Calibration estimators in sur-
vey sampling. Journal of the American Statistical Association 87 376–382.
MR1173804
[20] Elliot,M.R.(2009). Combining data from probability and non-
probability samples using pseudo-weights. Survey Practice 2 2982.
[21] Elliott, M. N. and Haviland, A. (2007). Use of a web-based conve-
nience sample to supplement a probability sample. Survey Methodology 33
211–215.
[22] Elliott, M. R. (2007). Bayesian weight trimming for generalized linear
regression models. Survey Methodology 33 23–34.
[23] Elliott, M. R., Valliant, R. et al. (2017). Inference for nonprobability
1544 C. Gao and S. Yang
samples. Statistical Science 32 249–264. MR3648958
[24] Fuller, W. A. (2009). Sampling Statistics. Wiley, Hoboken, NJ.
[25] Gao, C., Yang, S. and Kim, J. K. (2023). Soft calibration
for selection bias problems under mixed-effects models. Biometrika
doi.org/10.1093/biomet/asad016.
[26] Haziza, D. and Rao, J. N. (2006). A nonresponse model approach to
inference under imputation for missing survey data. Survey Methodology
32 53–64. MR2193025
[27] Kalton, G. (1983). Models in the practice of survey sampling. Interna-
tional Statistical Review/Revue Internationale de Statistique 51 175–188.
[28] Kalton, G. (2019). Developments in survey research over the past 60
years: A personal perspective. International Statistical Review 87 S10–S30.
MR3957341
[29] Kim, J. K. and Haziza, D. (2014). Doubly robust inference with missing
data in survey sampling. Statistica Sinica 24 375–394. MR3183689
[30] Kim, J. K. and Wang, Z. (2019). Sampling techniques for big data anal-
ysis. International Statistical Review 87 S177–S191. MR3957350
[31] Kott, P. S. (2006). Using calibration weighting to adjust for nonresponse
and coverage errors. Survey Methodology 32 133–142.
[32] Laber, E. B., Lizotte,D.J., Qian, M., Pelham,W.E.and Mur-
phy, S. A. (2014). Dynamic treatment regimes: Technical challenges and
applications. Electronic Journal of Statistics 8 1225–1272. MR3263118
[33] Laber, E. B. and Murphy, S. A. (2011). Adaptive confidence intervals
for the test error in classification. Journal of the American Statistical As-
sociation 106 904–913. MR2894746
[34] Little, R. J. (1982). Models for nonresponse in sample surveys. Journal
of the American statistical Association 77 237–250. MR0664675
[35] Mashreghi, Z., Léger, C. and Haziza, D. (2014). Bootstrap methods
for imputed data from regression, ratio and hot-deck imputation. Canadian
Journal of Statistics 42 142–167. MR3181587
[36] McRoberts, R. E., Tomppo,E.O.and Næsset, E. (2010). Advances
and emerging issues in national forest inventories. Scandinavian Journal of
Forest Research 25 368–381.
[37] Molina, E., Smith, T. and Sugden, R. (2001). Modelling overdispersion
for complex survey data. International Statistical Review 69 373–384.
[38] Mosteller, F. (1948). On pooling data. Journal of the American Statis-
tical Association 43 231–242.
[39] Nelder,J.A.and Mead, R. (1965). A simplex method for function
minimization. The Computer Journal 7 308–313. MR3363409
[40] Palmer, J. R., Espenshade, T. J., Bartumeus, F., Chung, C. Y.,
Ozgencil,N.E.and Li, K. (2013). New approaches to human mobility:
Using mobile phones for demographic research. Demography 50 1105–1128.
[41] Pfeffermann, D., Eltinge, J. L., Brown, L. D. and Pfeffer-
mann, D. (2015). Methodological issues and challenges in the production
of official statistics: 24th Annual Morris Hansen Lecture. Journal of Survey
Statistics and Methodology 3 425–483.
Test-and-pool estimator 1545
[42] Rao, J. (2020). On making valid inferences by integrating data from sur-
veys and other sources. Sankhya B 83 242–272. MR4256318
[43] Rao, J., Wu, C. and Yue, K. (1992). Some recent work on resampling
methods for complex surveys. Survey Methodology 18 209–217.
[44] Rao, J. N. (2014). Small-area estimation. Wiley StatsRef: Statistics Ref-
erence Online. MR1953089
[45] Rao, R. R. (1962). Relations between weak and uniform convergence
of measures with applications. The Annals of Mathematical Statistics 33
659–680. MR0137809
[46] Rivers, D. (2007). Sample Matching for Web Surveys: Theory and Appli-
cation. In Joint Statistical Meetings.
[47] Robbins, M. W., Ghosh-Dastidar, B. and Ramchand, R. (2021).
Blending of Probability and Non-Probability Samples: Applications to a
Survey of Military Caregivers. Journal of Survey Statistics and Methodol-
ogy 9 1114–1145. MR4417203
[48] Robins, J. M. (2004). Optimal structural nested models for optimal se-
quential decisions. In Proceedings of the Second Seattle Symposium in Bio-
statistics 179 189–326. Springer. MR2129402
[49] Robins,J.M., Rotnitzky, A. and Zhao,L.P.(1994). Estimation
of regression coefficients when some regressors are not always observed.
Journal of the American Statistical Association 89 846–866. MR1294730
[50] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the
propensity score in observational studies for causal effects. Biometrika 70
41–55. MR0742974
[51] Rothwell, P. M. (2005). Subgroup analysis in randomised controlled tri-
als: importance, indications, and interpretation. The Lancet 365 176–186.
[52] Sakshaug, J. W., Wiśniowski, A., Ruiz,D.A.P.and Blom,A.G.
(2019). Supplementing Small Probability Samples with Nonprobability
Samples: A Bayesian Approach. Journal of Official Statistics 35 653–681.
[53] Särndal, C.-E., Swensson, B. and Wretman, J. (2003). Model Assisted
Survey Sampling. New York: Springer-Verlag. MR1140409
[54] Scharfstein, D. O., Rotnitzky, A. and Robins,J.M.(1999). Adjust-
ing for nonignorable drop-out using semiparametric nonresponse models.
Journal of the American Statistical Association 94 1096–1120. MR1731478
[55] Schenker, N. and Welsh, A. (1988). Asymptotic results for multiple
imputation. Annals of Statistics 16 1550–1566. MR0964938
[56] Shao, J. (1994). Bootstrap sample size in nonregular cases. Proceedings of
the American Mathematical Society 122 1251–1262. MR1227529
[57] Shao, J. and Tu, D. (2012). The Jackknife and Bootstrap. Springer, New
York. MR1351010
[58] Skinner, C. et al. (1992). Pseudo-likelihood and quasi-likelihood estima-
tion for complex sampling schemes. Computational Statistics & Data Anal-
ysis 13 395–405. MR1173330
[59] Staiger, D. and Stock, J. H. (1997). Instrumental variables regression
with weak instruments. Econometrica 65 557–586. MR1445622
[60] Tallis, G. (1963). Elliptical and radial truncation in normal populations.
1546 C. Gao and S. Yang
The Annals of Mathematical Statistics 34 940–944. MR0152081
[61] Tam, S.-M. and Clarke, F. (2015). Big data, official statistics and some
initiatives by the Australian Bureau of Statistics. International Statistical
Review 83 436–448.
[62] Tourangeau, R., Conrad, F. G. and Couper,M.P.(2013). The
Science of Web Surveys. Oxford University Press: New York.
[63] Toyoda, T. and Wallace, T. D. (1979). Pre-testing on part of the data.
Journal of Econometrics 10 119–123. MR0567944
[64] Tsiatis, A. (2006). Semiparametric Theory and Missing Data. Springer,
New York. MR2233926
[65] van der Vaart (2000). Asymptotic Statistics 3. Cambridge university
press, Cambridge: Cambridge University Press. MR1652247
[66] Vavr ec k, L. and Rivers, D. (2008). The 2006 cooperative congres-
sional election study. Journal of Elections, Public Opinion and Parties 18
355–366.
[67] Vermeulen, K. and Vansteelandt, S. (2015). Bias-reduced doubly
robust estimation. Journal of the American Statistical Association 110
1024–1036. MR3420681
[68] Wallace, T. D. (1977). Pretest estimation in regression: A survey. Amer-
ican Journal of Agricultural Economics 59 431–443.
[69] Williams, D. and Brick, J. M. (2018). Trends in US face-to-face house-
hold survey nonresponse and level of effort. Journal of Survey Statistics
and Methodology 6 186–211.
[70] Xu, C., Chen, J. and Harold, M. (2013). Pseudo-likelihood-based
Bayesian information criterion for variable selection in survey data. Survey
Methodology 39 303–322.
[71] Yang, S. and Ding, P. (2020). Combining multiple observational data
sources to estimate causal effects. Journal of the American Statistical As-
sociation 115 1540–1554. MR4143484
[72] Yang, S., Gao, C., Zeng, D. and Wang, X. (2022). Elastic integrative
analysis of randomized trial and real-world data for treatment heterogeneity
estimation. Journal of the Royal Statistical Society: Series B (Statistical
Methodology), In press.
[73] Yang, S. and Kim, J. K. (2020). Statistical data integration in survey
sampling: A review. Japanese Journal of Statistics and Data Science 3
625–650. MR4181993
[74] Yang, S., Kim, J. K. and Hwang, Y. (2021). Integration of survey data
and big observational data for finite population inference using mass im-
putation. Survey Methodology 47 29–58.
[75] Yang, S., Kim, J. K. and Song, R. (2020). Doubly robust inference when
combining probability and non-probability samples with high dimensional
data. Journal of the Royal Statistical Society: Series B (Statistical Method-
ology) 82 445–465. MR4084171