The Sister Study is a prospective cohort study of environmental and genetic risk factors for breast
cancer and other diseases among 50,884 sisters of women who have had breast cancer. Because such
sisters have about twice the risk of developing breast cancer as other women, about 300
new cases of breast cancer are expected to be diagnosed in the cohort each year. Study enrollment opened
nationally in October 2004 and closed in July 2009. Eligible women were 35 to 74 years of age,
lived in the United States, including Puerto Rico, and had a sister diagnosed with breast cancer
but did not have breast cancer themselves. Multiple recruitment strategies were used to enroll a
diverse cohort of women with a wide range of life experiences and exposures. Baseline data
on potential risk factors and current health status were collected in telephone interviews and
mailed questionnaires. Blood, urine, and environmental samples were collected during a baseline
home visit and banked for future use in nested case-cohort or case-control studies of breast
cancer or other diseases. Stored samples include whole blood, cryopreserved whole blood or
lymphocytes (12% random sample), plasma, serum, urine, toenail clippings, and household dust
collected with alcohol wipes. The cohort is being followed prospectively. Contact information and
major changes to health are updated annually. Comprehensive triennial questionnaires update
medical history and changes in exposures. Medical records, pathology reports, and tumor tissue
blocks are sought for women who develop breast and (recently) ovarian cancer. For other
cancers, pathology reports are requested. Other self-reported health outcomes are validated for
special studies. Analyses assess the effects of environmental and lifestyle exposures and genetic
factors on breast cancer risk and risk for other diseases (e.g., heart disease, osteoporosis, other
hormonal cancers, and autoimmune diseases). Future studies of environmental and genetic
influences on breast cancer prognosis are made possible by continuing to follow women in the
cohort who develop breast cancer.
Background and Rationale
Breast cancer is the most common (non-skin) cancer among women, with approximately 282,000
diagnoses and 44,000 deaths per year in the United States. Known risk factors
explain less than 50% of variation in breast cancer risk and known breast cancer genes account
for fewer than 10% of cases. The Sister Study was designed to study environmental and genetic
risk factors for breast cancer and other women’s health conditions. The study creates a framework
for addressing current and future hypotheses as science advances over the follow-up period,
including studies related to biological mechanisms. Studying sisters of women diagnosed with
breast cancer is advantageous because it allows for a smaller cohort size and shorter follow-up
than needed to study women in general. These sisters have about twice the risk of breast cancer
as other women, and the frequency of relevant genes and shared risk factors is also higher,
increasing the statistical power of the study. This enhances the ability to assess the interplay of
genes and environment in breast cancer risk and to identify potentially modifiable risk factors. In
addition, sisters are often highly motivated to participate in long-term breast cancer research
because a family member has experienced the disease, so response rates and compliance
are high. The prospective design allows exposures to be assessed before the onset of disease,
thereby avoiding biases, such as recall bias, that are common in retrospective studies.
Recruitment and Enrollment
After a four-city vanguard phase in 2003, nationwide enrollment took place from October 2004 through
March 2009. Eligible women were 35 to 74 years of age, lived in the United States, including
Puerto Rico, and had a sister diagnosed with breast cancer but did not have breast cancer
themselves. Efforts were made to maximize the inclusion of women who are often
underrepresented in research, such as women from minoritized racial and ethnic groups, those with less education,
and those aged 65 years and older, and to target women with possibly relevant exposures
because of their place of residence or occupation. Recruitment activities included outreach to
volunteers and breast cancer organizations, networking with communities, direct mailings to
specific lists, national media campaigns, and the endorsement of the Sister Study by high-profile
celebrity supporters. Study materials were made available in Spanish in 2005. Additional details
can be found in Sandler et al. 2017 (PMID: 29373861).
Study Population
A total of 50,884 women completed required baseline activities and were fully enrolled in the
study, including 8,311 women (16%) who self-identified as Hispanic/Latina or non-White, 8,874
women aged 65 years and older (17%), and 7,805 women with a high school education or less
(15%). A smaller group of women who completed some but not all study requirements (n=3,066)
is being followed passively through record linkage (vital statistics and possibly cancer registries)
to assess differences in outcomes for those who did and did not fully enroll. This latter group
includes a larger percentage of minority women (36%) and of women with fewer years of schooling
(18%), groups that were the focus of intensive recruitment efforts toward the end of the
recruitment period. Reflecting the volunteer nature of the cohort and the recruitment of sisters of
women with breast cancer, Sister Study participants have higher education levels and a higher
prevalence of known breast cancer risk factors, including a stronger family history of breast cancer (Table 1).
Additional information about the cohort can be found on the study’s website.
Table 1. Baseline characteristics of women in the Sister Study: race/ethnicity (non-Hispanic White; non-Hispanic Black/African American; other*), education (high school or less; some college; associate’s or technical degree; bachelor’s degree; master’s or doctoral degree), alcohol consumption (including current, < 1 drink/day and current, 1 drink/day), body mass index, number of sisters (full or half) with breast cancer, and mother diagnosed with breast cancer.
*Includes non-Hispanic Asian/Pacific Islanders, non-Hispanic American Indians, and non-Hispanic Other; women who self-identified as Black/African American and another race were included as Black/African American.
Data Collection
Computer-assisted telephone interviews (CATI): Telephone interviews were scheduled in two
one-hour sessions to collect information on a broad range of exposures and lifestyle
characteristics. Supporting materials, including a list of relevant medications and a chronological
life calendar, were provided to help women prepare for the interviews. Topics included
demographic and socioeconomic factors, lifestyle and environmental exposures, residential
history, medical and medication-use history, reproductive history and hormone use, breast
conditions and surgeries, occupational history, and physical activity. Questions focused
specifically on early life (before puberty), the reproductive years, and the time of enrollment.
Self-administered questionnaires: Participants filled out three self-administered questionnaires:
use of personal care products; prenatal (in utero) exposures and family medical history; and
current diet (Block 98 food frequency questionnaire), with supplementary questions on
complementary and alternative medicines, childhood diet, special diets, and eating patterns.
Home Visit: Trained female examiners from a national in-home phlebotomy service (EMSI) visited
participants’ homes (or a mutually agreed-upon alternate site, such as a doctor’s office) to draw
blood; measure blood pressure, height, weight, and hip and waist circumference; and retrieve consent forms,
self-administered questionnaires, and self-collected toenails, dust, and urine. Participants filled out
a brief questionnaire on the day of the visit to report information on their diet, medication use, and
activities over the past 24 hours. Examiners packed and shipped study samples and forms to the
Sister Study Laboratory by FedEx Priority Overnight on the same day as the visit.
Biological Specimen Collection: Nearly all participants provided biological samples. Details of
collection and processing can be found on the Sister Study website.
Toenails: Participants collected toenail clippings from each toe unless they had a medical or
physical condition (e.g., diabetes) that would prohibit collection. Samples are stored at room temperature.
Dust: Participants collected dust samples from three rooms of their home using pre-packaged
alcohol wipes. Wipes are stored in -20 °C freezers.
Urine: Participants collected a clean-catch, midstream, first-morning urine specimen on the day of
the home visit and kept it refrigerated until pickup by the examiner.
Blood: Participants were instructed to fast for at least eight hours prior to their blood draw.
Examiners collected approximately 45 ml of blood using six BD Vacutainer® (Becton, Dickinson
and Company) tubes, including two EDTA tubes (BD#s 367855 and 366643), two serum tubes
(BD# 367820) and two ACD-B tubes (BD# 364816). In the rare event that a blood sample could
not be collected due to an unsuccessful phlebotomy, participants were asked to provide a saliva
sample using an Oragene™ DNA self-collection saliva kit (DNA Genotek, Ottawa, Canada).
DNA: DNA was extracted from whole blood (90%), clot, or saliva. We initially extracted DNA for
~2,400 breast cancer cases, 140 ovarian cancer cases, a random sample of the cohort (n=2,350),
and additional premenopausal women to maximize studies related to breast cancer in
premenopausal women. These cases and non-cases have been included in large-scale genome-wide association
studies (GWAS) and methylation studies. Since then we have added more recently diagnosed breast and ovarian
cancer cases, women diagnosed with other cancers, additional premenopausal women, and
another random sample of the cohort. The other cancers were selected because they were likely to be
hormonally related, had sample sizes large enough to support candidate-SNP analyses or
pooling efforts, and either had high rates of medical record confirmation or were cancers for which
self-reports are likely to be valid. DNA is also available for cases and non-cases included in early
pilot efforts. In all, DNA is available for 19,000 women in the cohort.
The Sister Study cohort is followed prospectively to identify incident breast cancer and other
health outcomes. Participants can report a diagnosis of breast cancer or other conditions at
annual updates (selected outcomes) or follow-up questionnaires, or they can contact a study
helpdesk by telephone, mail, or email. Annual update forms and biennial/triennial follow-up
questionnaires are available in English and Spanish. Since 2010, all study materials have
been available on the web, and women have the option of completing follow-up questionnaires
online, by mail, or over the phone. Annual updates and questionnaires are administered in
“waves” representing groupings by enrollment date. Over time, waves have been combined to
condense the time it takes to complete a single follow-up activity from 5 years to just over 2 years
(see schematic below).
Women reporting a diagnosis of lobular carcinoma in situ (LCIS), ductal carcinoma in situ (DCIS), invasive breast cancer, or ovarian cancer are asked
to provide information on their diagnosis and treatment and provide authorization for medical
record and tumor tissue sample retrieval. Women with other cancers are asked for pathology
reports or permission to retrieve them from medical providers. Protocols for validating other
incident conditions reported during follow up are developed as needed.
Annual updates: Women are contacted annually for a brief update or a scheduled detailed follow-up
questionnaire. The annual update form collects changes in contact information and allows
participants to report major changes in health, including breast cancer. Response rates for the
annual updates have ranged from 91% to 96% throughout follow-up. Through the
end of July 2021, 2,967 (5.8%) of Sister Study participants were known to be deceased.
Detailed questionnaires: More in-depth questionnaires collect information on medical diagnoses
and symptoms, changes in environmental exposures and lifestyle, and special topics of interest.
The first detailed (biennial) follow-up, completed in July 2012, consisted of three questionnaires:
Health and Medical History, Lifestyle, and the special topic, Stress and Coping. Responses were
obtained from 48,090 women for an overall response rate of 95%. For the next round of detailed
follow-up, the study shifted to triennial administration to reduce participant burden and simplify
workflow. The special topic for the 2nd detailed follow-up (completed April 2014) was Quality of
Life and other related topics. The 3rd detailed follow-up introduced a streamlined reproductive
section for participants older than 60 years of age; this follow-up was completed in August 2016. With this
questionnaire, we introduced an advocacy program, which provides more personal attention to
participants at higher risk of dropping out of the study. As a result, response rates remained high
(91%), with an approximately 2% increase in response rate among minority women. A 4th
follow-up was completed in 2019 (response rate 85%), and a 5th is currently in the field (started in
fall 2020; response rate of 67% through July 2021, on track with prior surveys). Additionally, a
detailed family history questionnaire was distributed in 2017-2018 to collect data on the history of
cancer in first- and second-degree relatives (response rate 83%), and a special COVID-19
survey was distributed in the fall of 2020 (response rate of 74% through July 2021).
Breast and Ovarian Cancer Follow-up
The breast cancer follow-up protocol has been streamlined over time to reduce participant burden
and maximize response rates. We now also collect more detailed follow-up information from
ovarian cancer cases. Women are now contacted 6 months after diagnosis, closer to the end of
their treatment. They are mailed a packet that includes instructions, breast cancer definitions, a
self-administered questionnaire and authorization forms for requesting medical records and tumor
tissue samples. The questionnaire was streamlined to focus on information only the woman can
provide herself, such as how the tumor was detected, her health insurance status, and quality of
life after diagnosis. It also covers basic information on tumor pathology and treatment in case the
medical record is not obtained. We ask women to send us a copy of their pathology report if they
have one. Medical providers are asked to complete a form about the breast cancer diagnosis and
treatment and/or provide relevant pages from the medical record. They are also asked to send
pathology reports, blocks of breast (or ovarian) carcinoma and normal breast tissue, and diagnostic
hematoxylin and eosin (H&E) slides.
As of Sister Study Data Release 9.0, 3,999 women had reported a diagnosis of
incident DCIS or invasive breast cancer. Of the incident breast cancer events reported by that
time, 3,269 (81.7%) were confirmed with medical records or pathology reports, and 2,487
tissue samples had been retrieved. Among women for whom pathology reports or medical
records were obtained, the positive predictive value (PPV) of a self-reported
breast cancer is 99.4%.
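The two validation figures above rest on different denominators. As a worked illustration (the term labels below are descriptive wording chosen here, not study-defined variable names), the 81.7% figure divides confirmed reports by all self-reported incident breast cancers, whereas the 99.4% PPV is computed only among self-reports for which records were actually obtained:

```latex
\begin{aligned}
\text{confirmed fraction} &= \frac{\text{self-reports confirmed by records}}{\text{all self-reported incident breast cancers}}
  = \frac{3{,}269}{3{,}999} \approx 0.817 \\[4pt]
\text{PPV} &= \frac{\text{self-reports confirmed by records}}{\text{self-reports for which records were obtained}}
\end{aligned}
```

Read this way, the 81.7% mainly reflects how often records could be retrieved, while the PPV reflects how often a retrieved record agreed with the woman’s self-report.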
Breast Tissue Microarrays (TMAs) and Tissue Cores: Tumor and normal tissue blocks are being
used to create TMAs for immunohistochemical staining and to obtain tissue core biopsies. TMAs
are prepared at the UNC Translational Pathology Laboratory. Mark Sherman, formerly at NCI and
now at Mayo Clinic, Jacksonville, Florida, serves as study pathologist. He oversees the review of
slides, documentation of tumor features, and selection of tissue for TMAs and for cores,
which are extracted and placed into individual tubes. When available, TMAs and cores include
invasive cancer tissue, co-occurring DCIS, and adjacent normal tissue. Initial
immunohistochemical staining will include ER, PR, HER2, Ki67, EGFR, and CK5/6 to allow for
comprehensive classification of breast cancer subtypes.
Other incident conditions
In 2010 we began validating other (non-breast) cancers, prioritizing cancer types by their
relevance to hormone-related hypotheses, frequency of diagnosis, and opportunities for
collaboration within consortia. Women are asked to mail a copy of the pathology report to us if available
and to sign an authorization form allowing us to request it from their medical provider.
Second Specimen Collection
In 2014-2015 we carried out the Sisters Changing Lives initiative, in which we invited 3,800 women
to complete a second home visit for sample collection. Procedures were identical to those at the
enrollment visit except that a Tempus RNA tube was substituted for one of the baseline whole blood
tubes. Women diagnosed with breast cancer by the time of the initiative and a random sample of
the cohort were targeted. Samples were obtained for 61% of invited breast cancer cases and 65%
of non-cases. A total of 2,434 women participated, of whom 1,227 were cases. This resource
allows for studies of changes in biomarkers over time and of changes in biomarkers due to a
breast cancer diagnosis.
Special COVID Survey
In response to the 2020-2021 COVID-19 pandemic, we designed a special questionnaire to
collect data on coronavirus infections, testing, and COVID-related health behaviors. We also
included questions about screening or treatment delays, mental health (including stress, anxiety,
and sleep health), and many other factors. The survey was sent to all active participants in
November 2020, with a 74% response rate through July 2021. Several COVID-related questions
(e.g. ever infected, vaccine status) were also added to the annual follow-up questionnaire and 5th
detailed follow-up questionnaire. Additionally, we joined a large collaborative group collecting
COVID data from large cohorts in the US and UK via the Zoe app.
Data Management and Processing
Over the course of the study, data files were released for analysis for the first 10,000 (2006),
20,000 (2007), and 30,000 (2008) participants and for the final baseline cohort of 50,884 (2011). To
create continuity across analyses and papers, we now use numbered data releases. The first data
release was created in January 2013. Data releases are issued approximately once per year to
incorporate new follow-up data, including updated mortality data from the National Death Index,
and any changes due to data cleaning. The most recent data release (DR 9.1) was in June 2021.
Data Release 10.0 is expected in early 2022.
Data Sharing and Collaboration
In the interest of promoting scientific research on the environmental and genetic risk factors for
breast cancer and other diseases, the Sister Study welcomes proposals for collaborative studies
from within NIEHS and the wider scientific community. Proposals are reviewed to ensure scientific
merit and to protect the integrity of the study and the confidentiality of participants. Acceptable
study topics will take advantage of the unique characteristics of the Sister Study cohort and may
involve the analysis of routinely collected data or specimens or involve new data collection.
Information on available data, instructions on how to submit a research topic and proposal, and
guidelines regarding the use of study data and specimens can be found on the Sister Study data
portal. This portal tracks study proposals, data requests, specimen
use, and manuscripts.
Consortia and data pooling
The Sister Study participates in the National Cancer Institute’s Cohort Consortium, a group that
facilitates the pooling of data from individual cohort studies to create high-quality databases
large enough to investigate risk factors for rare cancers or to study low-penetrance genetic
variants and other factors with small effects in relation to breast and other common cancers.
More than 20 international cohorts are included.
Sister Study investigators (Dr. Sandler and former fellow Hazel Nichols, now at the University of
North Carolina) are leading a Cohort Consortium project on premenopausal breast cancer in
collaboration with investigators from the Institute of Cancer Research, London. Initial efforts aim
to understand pregnancy-related breast cancer risk factors and other exposures that may
differentially affect premenopausal versus postmenopausal breast cancer, such as obesity,
physical activity, and hormone therapy use. Drs. O’Brien and Sandler have also led projects in
the Ovarian Cancer Cohort Consortium (OC3), including studies of genital powder use and an
upcoming project on hormone therapy.
The Sister Study has also contributed data to Cohort Consortium studies on head and neck,
gallbladder, and thyroid cancers as well as large GWAS and sequencing-based studies of
breast and ovarian cancer. This allows us to contribute to research on cancers for which we lack
sufficient power on our own or to contribute to the large efforts needed for gene identification
and risk prediction. These include:
The Confluence Project, NCI
Biomarkers and Breast Cancer Risk Prediction in Younger Women, NCI
Diet and Cancer Consortium, NCI
Breast Cancer Association Consortium (BCAC)
Cancer Risk Estimates Related to Susceptibility Genes (CARRIERS) Consortium, Mayo
Breast CAncer STratification (B-CAST) Consortium, Netherlands Cancer Institute
Ovarian Cancer Association Consortium (OCAC)
Biliary Tract Cancers Pooling Project, Epidemiology and Genomics Research, NCI
Reproductive and Hormonal Factors and Thyroid Cancers Risk Pooling Project, NCI
AMH and Breast Cancer Pooling Project (NYU and the NCI Cohort Consortium)
Collaborative Study to Find Genetic Variants Associated with Variation in Anti-Müllerian Hormone Levels, Exeter University (UK)
The Sister Study also contributes to other meta-analyses and data pooling efforts outside the
Cohort Consortium:
Trauma and ovarian cancer, Moffitt Cancer Center
Circulating hormones in premenopausal women and subsequent breast cancer risk, NCI
Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA), Cambridge, UK
Through an agreement with the principal investigators of the Breakthrough Generations Cohort, we
will be receiving a nearly complete copy of their study data (enrollment and follow-up
questionnaires and selected cancer outcomes). This will allow data from the two studies to be
pooled to evaluate new hypotheses that require a larger sample or to validate findings published
from either cohort alone.
Finally, the Sister Study participates in research led by extramural collaborators and funded by NIH
or other grants. For example, the Sister Study is one of several prospective cohorts included in
a study (NIH R01, Joel Kaufman, PI) of ambient air pollution and incident cardiovascular disease. As
part of that effort, self-reported cases of cardiovascular disease in the Sister Study were
validated with medical records, and probability-based algorithms were created to classify those
with no available records. A spin-off study will include some of the same cohorts in a pooled
study of air pollution and breast cancer.
Geocoding studies
Sister Study enrollment, longest-lived, and childhood residences have been geocoded, allowing
linkage to air pollution, census, and other data. Follow-up addresses were recently geocoded as
part of the cardiovascular disease collaboration with Dr. Kaufman. A new effort will attempt to
identify Sister Study participants’ interim addresses (post-1980) using the LexisNexis database.
Ancillary Studies
Two Sister Study
The Two Sister Study, which completed enrollment in December 2010, is a family-based study of
genetic and environmental risk factors for young-onset (before age 50) breast cancer. The study
recruited the affected sister of Sister Study participants (the sister with breast cancer who was
not in the Sister Study) if her diagnosis was before age 50 and within four years of screening for
eligibility. Case-sisters completed the same computer-assisted telephone interviews as their
sisters did when joining the Sister Study, provided saliva, dust, and toenail samples, provided
detailed information about their breast cancer diagnosis and treatment, and were asked to
authorize retrieval of medical records and tumor samples. In addition, participants invited their
parents to provide a saliva sample as a source of DNA for genetic analyses. Over 1,400 young-onset
sisters enrolled in the study by completing questionnaires and/or providing saliva samples
for DNA. These index cases are the sisters of ~1,700 women in the Sister Study (who also
provided questionnaires and samples for DNA). Of their parents, 1,438 provided a saliva sample.
About 1,300 of the sisters with young-onset breast cancer completed all study requirements (all
questionnaires and saliva sample) and are now being followed prospectively along with Sister
Study participants who developed breast cancer after joining the study.
CDC Special Survey and Survivorship Survey
In response to a CDC mandate under the Young Women's Breast Health Awareness and Support
of Young Women Diagnosed with Breast Cancer Act (the 2010 EARLY Act), the Sister Study
teamed with researchers from the Epidemiology and Applied Research Branch in the Division of
Cancer Prevention and Control at the Centers for Disease Control and Prevention (CDC) to conduct two surveys.
1) A special survey asked breast cancer-free Sister Study participants about breast cancer screening practices,
family communication about cancer, and the effect of having a sister with breast cancer on
participants and their families. About 18,000 women participated in 2012.
2) A survivorship survey asked women diagnosed with breast cancer about topics of interest to younger women, such as body image,
work-life balance, relationships and intimacy, and fertility, as well as the impact of cancer on the lives
of breast cancer survivors and their families, survivors’ quality of life, physical and emotional
health, changes in lifestyle and environment, and coordination of cancer treatment and follow-up
care. This survey was completed (2012-2013) by 2,537 women with breast cancer in the Sister
Study and the Two Sister Study.
Validation of Early Life Factors
Data collection for a validation study of self-reported early life factors was completed in 2012. The
aim was to evaluate how accurately women reported the information on early life collected at
baseline, including information on their mother's pregnancy. A total of 1,802 of the participants’
mothers completed a questionnaire after receiving an invitation from their daughters.
Mammography Initiative
In collaboration with investigators from Columbia University, the Sister Study attempted to retrieve
digital and film mammography images from a case-control sample of participants younger than 55, with a
goal of studying factors related to breast density changes over time. Approximately 60% of women
contacted for the study provided medical release forms, allowing for the collection of 10,000+
mammograms from more than 1,500 women.
Olfactory Impairment
Dr. Honglei Chen of Michigan State University received grant funding (Department of Defense
and Parkinson’s Foundation) to study airborne pollutants, the olfactory system, and Parkinson’s
disease. To support this research, a sample of 3,406 participants aged 50-79 completed a special
survey about olfaction status and a Brief Smell Identification Test.
Faustine Williams, PhD, MPH, MS (National Institute of Minority Health and Health Disparities)
Angeline Andrew, MD (Dartmouth College)
Olga Basso, PhD (McGill University)
Kimberly Bertrand, ScD (Boston University)
Deborah Bookwalter, PhD (Westat, Inc.)
Leah Hawkins Bressler, MD (University of North Carolina)
Timothy Buckley, PhD (US Environmental Protection Agency)
Andrew Chan, MD, MPH (Harvard University)
Honglei Chen, MD, PhD (Michigan State University)
Fergus Couch, PhD (Mayo Clinic, Rochester)
Sandra Deming-Halverson (Social & Scientific Systems, Inc.)
Lisa DeRoo, PhD, MPH (University of Bergen, Norway)
Renee Fortner, PhD (German Cancer Research Center)
Holly Harris, ScD (University of Washington)
M. Elizabeth Hodgson, PhD (Social & Scientific Systems, Inc.)
Laura Hooper, MD (University of Washington, Seattle)
Brian Jackson, PhD (Dartmouth, Department of Earth Sciences)
Margaret Karagas, PhD (Dartmouth, Geisel School of Medicine)
Joel D. Kaufman, MPH, MD (University of Washington, Seattle)
Alexander Keil, PhD (University of North Carolina)
Joshua Keller, PhD (University of Washington / Johns Hopkins / Colorado State)
Cynthia Kleeberger, MS (Social & Scientific Systems, Inc.)
Jenna Lilyquist, PhD (Social & Scientific Systems, Inc.)
Erin Linnenbringer, PhD (Washington University in St. Louis)
Julie Palmer, ScD (Boston University)
Lucy Peipins, PhD (Centers for Disease Control and Prevention)
Juan Rodriguez, MPH (Centers for Disease Control and Prevention)
Minouk Schoemaker, PhD (Institute of Cancer Research, UK)
Mark Sherman, MD (Mayo Clinic, Jacksonville, FL)
Amanda Simanek, PhD (University of Wisconsin-Milwaukee)
Anthony Swerdlow, MD, PhD (Institute of Cancer Research, UK)
Adam A. Szpiro, PhD (University of Washington, Seattle)
Parisa Tehranifar, PhD (Columbia University)
Mary Beth Terry, PhD (Columbia University)
Melissa Troester, PhD, MPH (University of North Carolina, Chapel Hill)
Wei-Lun Tsai, PhD (US Environmental Protection Agency)
Shelley Tworoger, PhD (Moffitt Cancer Center)
Paul Villeneuve, PhD (Carleton University, Canada)
Mary C. White, ScD (Centers for Disease Control and Prevention)
Lauren Wright (Institute of Cancer Research, UK)
