REPORTTOTHEPRESIDENT
BIGDATAANDPRIVACY:
ATECHNOLOGICAL
PERSPECTIVE
ExecutiveOfficeofthePresident
President’sCouncilofAdvisorson
ScienceandTechnology
May2014
REPORTTOTHEPRESIDENT
BIGDATAANDPRIVACY:
ATECHNOLOGICALPERSPECTIVE
ExecutiveOfficeofthePresident
President’sCouncilofAdvisorson
ScienceandTechnology
May2014
About the President’s Council of Advisors on Science and Technology
The President’s Council of Advisors on Science and Technology (PCAST) is an advisory group of the Nation’s leading scientists and engineers, appointed by the President to augment the science and technology advice available to him from inside the White House and from cabinet departments and other Federal agencies. PCAST is consulted about, and often makes policy recommendations concerning, the full range of issues where understandings from the domains of science, technology, and innovation bear potentially on the policy choices before the President.
For more information about PCAST, see www.whitehouse.gov/ostp/pcast
The President’s Council of Advisors on Science and Technology

Co-Chairs

John P. Holdren
Assistant to the President for Science and Technology
Director, Office of Science and Technology Policy

Eric S. Lander
President
Broad Institute of Harvard and MIT

Vice Chairs

William Press
Raymer Professor in Computer Science and Integrative Biology
University of Texas at Austin

Maxine Savitz
Vice President
National Academy of Engineering

Members

Rosina Bierbaum
Dean, School of Natural Resources and Environment
University of Michigan

Christine Cassel
President and CEO
National Quality Forum

Christopher Chyba
Professor, Astrophysical Sciences and International Affairs
Director, Program on Science and Global Security
Princeton University

S. James Gates, Jr.
John S. Toll Professor of Physics
Director, Center for String and Particle Theory
University of Maryland, College Park

Mark Gorenberg
Managing Member
Zetta Venture Partners

Susan L. Graham
Pehong Chen Distinguished Professor Emerita in Electrical Engineering and Computer Science
University of California, Berkeley

Shirley Ann Jackson
President
Rensselaer Polytechnic Institute

Richard C. Levin (through mid-April 2014)
President Emeritus
Frederick William Beinecke Professor of Economics
Yale University

Michael McQuade
Senior Vice President for Science and Technology
United Technologies Corporation

Chad Mirkin
George B. Rathmann Professor of Chemistry
Director, International Institute for Nanotechnology
Northwestern University

Mario Molina
Distinguished Professor, Chemistry and Biochemistry
University of California, San Diego
Professor, Center for Atmospheric Sciences at the Scripps Institution of Oceanography

Craig Mundie
Senior Advisor to the CEO
Microsoft Corporation

Ed Penhoet
Director, Alta Partners
Professor Emeritus, Biochemistry and Public Health
University of California, Berkeley

Barbara Schaal
Mary-Dell Chilton Distinguished Professor of Biology
Washington University, St. Louis

Eric Schmidt
Executive Chairman
Google, Inc.

Daniel Schrag
Sturgis Hooper Professor of Geology
Professor, Environmental Science and Engineering
Director, Harvard University Center for Environment
Harvard University

Staff

Marjory S. Blumenthal
Executive Director

Ashley Predith
Assistant Executive Director

Knatokie Ford
AAAS Science & Technology Policy Fellow
PCAST Big Data and Privacy Working Group

Working Group Co-Chairs

Susan L. Graham
Pehong Chen Distinguished Professor Emerita in Electrical Engineering and Computer Science
University of California, Berkeley

William Press
Raymer Professor in Computer Science and Integrative Biology
University of Texas at Austin

Working Group Members

S. James Gates, Jr.
John S. Toll Professor of Physics
Director, Center for String and Particle Theory
University of Maryland, College Park

Mark Gorenberg
Managing Member
Zetta Venture Partners

John P. Holdren
Assistant to the President for Science and Technology
Director, Office of Science and Technology Policy

Eric S. Lander
President
Broad Institute of Harvard and MIT

Craig Mundie
Senior Advisor to the CEO
Microsoft Corporation

Maxine Savitz
Vice President
National Academy of Engineering

Eric Schmidt
Executive Chairman
Google, Inc.

Working Group Staff

Marjory S. Blumenthal
Executive Director
President’s Council of Advisors on Science and Technology

Michael Johnson
Assistant Director
National Security and International Affairs
EXECUTIVE OFFICE OF THE PRESIDENT
PRESIDENT’S COUNCIL OF ADVISORS ON SCIENCE AND TECHNOLOGY
WASHINGTON, D.C. 20502
President Barack Obama
The White House
Washington, DC 20502
Dear Mr. President,
We are pleased to send you this report, Big Data and Privacy: A Technological Perspective, prepared for you by the
President’s Council of Advisors on Science and Technology (PCAST). It was developed to complement and inform
the analysis of big-data implications for policy led by your Counselor, John Podesta, in response to your requests of
January 17, 2014. PCAST examined the nature of current technologies for managing and analyzing big data and for
preserving privacy, it considered how those technologies are evolving, and it explained what the technological
capabilities and trends imply for the design and enforcement of public policy intended to protect privacy in big-data
contexts.
Big data drives big benefits, from innovative businesses to new ways to treat diseases. The challenges to privacy
arise because technologies collect so much data (e.g., from sensors in everything from phones to parking lots) and
analyze them so efficiently (e.g., through data mining and other kinds of analytics) that it is possible to learn far more
than most people had anticipated or can anticipate given continuing progress. These challenges are compounded by
limitations on traditional technologies used to protect privacy (such as de-identification). PCAST concludes that
technology alone cannot protect privacy, and policy intended to protect privacy needs to reflect what is (and is not)
technologically feasible.
In light of the continuing proliferation of ways to collect and use information about people, PCAST recommends that
policy focus primarily on whether specific uses of information about people affect privacy adversely. It also
recommends that policy focus on outcomes, on the “what” rather than the “how,” to avoid becoming obsolete as
technology advances. The policy framework should accelerate the development and commercialization of
technologies that can help to contain adverse impacts on privacy, including research into new technological options.
By using technology more effectively, the Nation can lead internationally in making the most of big data’s benefits
while limiting the concerns it poses for privacy. Finally, PCAST calls for efforts to assure that there is enough talent
available with the expertise needed to develop and use big data in a privacy-sensitive way.
PCAST is grateful for the opportunity to serve you and the country in this way and hopes that you and others who read
this report find our analysis useful.
Best regards,
John P. Holdren
Co-chair, PCAST
Eric S. Lander
Co-chair, PCAST

BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE

BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
vii
Table of Contents
The President’s Council of Advisors on Science and Technology
PCAST Big Data and Privacy Working Group
Table of Contents
Executive Summary
1. Introduction
  1.1 Context and outline of this report
  1.2 Technology has long driven the meaning of privacy
  1.3 What is different today?
  1.4 Values, harms, and rights
2. Examples and Scenarios
  2.1 Things happening today or very soon
  2.2 Scenarios of the near future in healthcare and education
    2.2.1 Healthcare: personalized medicine
    2.2.2 Healthcare: detection of symptoms by mobile devices
    2.2.3 Education
  2.3 Challenges to the home’s special status
  2.4 Tradeoffs among privacy, security, and convenience
3. Collection, Analytics, and Supporting Infrastructure
  3.1 Electronic sources of personal data
    3.1.1 “Born digital” data
    3.1.2 Data from sensors
  3.2 Big data analytics
    3.2.1 Data mining
    3.2.2 Data fusion and information integration
    3.2.3 Image and speech recognition
    3.2.4 Social-network analysis
  3.3 The infrastructure behind big data
    3.3.1 Data centers
    3.3.2 The cloud
4. Technologies and Strategies for Privacy Protection
  4.1 The relationship between cybersecurity and privacy
  4.2 Cryptography and encryption
    4.2.1 Well-Established encryption technology
    4.2.2 Encryption frontiers
  4.3 Notice and consent
  4.4 Other strategies and techniques
    4.4.1 Anonymization or de-identification
    4.4.2 Deletion and non-retention
  4.5 Robust technologies going forward
    4.5.1 A Successor to Notice and Consent
    4.5.2 Context and Use
    4.5.3 Enforcement and deterrence
    4.5.4 Operationalizing the Consumer Privacy Bill of Rights
5. PCAST Perspectives and Conclusions
  5.1 Technical feasibility of policy interventions
  5.2 Recommendations
  5.4 Final Remarks
Appendix A. Additional Experts Providing Input
Special Acknowledgment

BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
ix
Executive Summary
The ubiquity of computing and electronic communication technologies has led to the exponential growth of data from both digital and analog sources. New capabilities to gather, analyze, disseminate, and preserve vast quantities of data raise new concerns about the nature of privacy and the means by which individual privacy might be compromised or protected.
After providing an overview of this report and its origins, Chapter 1 describes the changing nature of privacy as computing technology has advanced and big data has come to the fore. The term privacy encompasses not only the famous “right to be left alone,” or keeping one’s personal matters and relationships secret, but also the ability to share information selectively but not publicly. Anonymity overlaps with privacy, but the two are not identical. Likewise, the ability to make intimate personal decisions without government interference is considered to be a privacy right, as is protection from discrimination on the basis of certain personal characteristics (such as race, gender, or genome). Privacy is not just about secrets.
Conflicts between privacy and new technology have occurred throughout American history. Concern with the rise of mass media such as newspapers in the 19th century led to legal protections against the harms or adverse consequences of “intrusion upon seclusion,” public disclosure of private facts, and unauthorized use of name or likeness in commerce. Wire and radio communications led to 20th century laws against wiretapping and the interception of private communications, laws that, PCAST notes, have not always kept pace with the technological realities of today’s digital communications.
Past conflicts between privacy and new technology have generally related to what is now termed “small data,” the collection and use of data sets by private- and public-sector organizations where the data are disseminated in their original form or analyzed by conventional statistical methods. Today’s concerns about big data reflect both the substantial increases in the amount of data being collected and associated changes, both actual and potential, in how they are used.
Big data is big in two different senses. It is big in the quantity and variety of data that are available to be processed. And, it is big in the scale of analysis (termed “analytics”) that can be applied to those data, ultimately to make inferences and draw conclusions. By data mining and other kinds of analytics, non-obvious and sometimes private information can be derived from data that, at the time of their collection, seemed to raise no, or only manageable, privacy issues. Such new information, used appropriately, may often bring benefits to individuals and society; Chapter 2 of this report gives many such examples, and additional examples are scattered throughout the rest of the text. Even in principle, however, one can never know what information may later be extracted from any particular collection of big data, both because that information may result only from the combination of seemingly unrelated data sets, and because the algorithm for revealing the new information may not even have been invented at the time of collection.
The same data and analytics that provide benefits to individuals and society if used appropriately can also create potential harms: threats to individual privacy according to privacy norms both widely shared and personal. For example, large-scale analysis of research on disease, together with health data from electronic medical records and genomic information, might lead to better and timelier treatment for individuals but also to inappropriate disqualification for insurance or jobs. GPS tracking of individuals might lead to better community-based public transportation facilities, but also to inappropriate use of the whereabouts of individuals. A list of the kinds of adverse consequences or harms from which individuals should be protected is proposed in Section 1.4. PCAST believes strongly that the positive benefits of big data technology are (or can be) greater than any new harms.
Chapter 3 of the report describes the many new ways in which personal data are acquired, both from original sources, and through subsequent processing. Today, although they may not be aware of it, individuals constantly emit into the environment information whose use or misuse may be a source of privacy concerns. Physically, these information emanations are of two types, which can be called “born digital” and “born analog.”
When information is “born digital,” it is created, by us or by a computer surrogate, specifically for use by a computer or data processing system. When data are born digital, privacy concerns can arise from over-collection. Over-collection occurs when a program’s design intentionally, and sometimes clandestinely, collects information unrelated to its stated purpose. Over-collection can, in principle, be recognized at the time of collection.
When information is “born analog,” it arises from the characteristics of the physical world. Such information becomes accessible electronically when it impinges on a sensor such as a camera, microphone, or other engineered device. When data are born analog, they are likely to contain more information than the minimum necessary for their immediate purpose, and for valid reasons. One reason is for robustness of the desired “signal” in the presence of variable “noise.” Another is technological convergence, the increasing use of standardized components (e.g., cell-phone cameras) in new products (e.g., home alarm systems capable of responding to gesture).
Data fusion occurs when data from different sources are brought into contact and new facts emerge (see Section 3.2.2). Individually, each data source may have a specific, limited purpose. Their combination, however, may uncover new meanings. In particular, data fusion can result in the identification of individual people, the creation of profiles of an individual, and the tracking of an individual’s activities. More broadly, data analytics discovers patterns and correlations in large corpuses of data, using increasingly powerful statistical algorithms. If those data include personal data, the inferences flowing from data analytics may then be mapped back to inferences, both certain and uncertain, about individuals.
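The way data fusion can identify individuals may be easier to see in a small sketch. The example below assumes two invented data sets that share three attributes: a health table with names removed and a public roll that carries names. Every record, field name, and the function `fuse` are hypothetical illustrations and are not drawn from this report.

```python
from collections import defaultdict

# Hypothetical "de-identified" data: names removed, other attributes kept.
health_records = [
    {"zip": "02139", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "94704", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public data set that carries names (all invented).
public_roll = [
    {"name": "A. Example", "zip": "02139", "birth_year": 1975, "sex": "F"},
    {"name": "B. Sample", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "C. Placeholder", "zip": "10001", "birth_year": 1990, "sex": "M"},
]

SHARED_ATTRIBUTES = ("zip", "birth_year", "sex")


def fuse(anonymous, named, keys=SHARED_ATTRIBUTES):
    """Join two data sets on shared attributes.

    Returns only the anonymous records that match exactly one named person,
    since a unique match is what re-associates a record with a name.
    """
    index = defaultdict(list)
    for person in named:
        index[tuple(person[k] for k in keys)].append(person["name"])

    linked = []
    for record in anonymous:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:
            linked.append({"name": names[0], **record})
    return linked


linked = fuse(health_records, public_roll)
for row in linked:
    print(row["name"], "->", row["diagnosis"])
```

Neither data set alone links a name to a diagnosis; the link appears only when the two are combined, which is the sense in which each source can have a limited purpose while their combination does not.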
Because of data fusion, privacy concerns may not necessarily be recognizable in born-digital data when they are collected. Because of signal-processing robustness and standardization, the same is true of born-analog data, even data from a single source (e.g., a single security camera). Born-digital and born-analog data can both be combined with data fusion, and new kinds of data can be generated from data analytics. The beneficial uses of near-ubiquitous data collection are large, and they fuel an increasingly important set of economic activities. Taken together, these considerations suggest that a policy focus on limiting data collection will not be a broadly applicable or scalable strategy, nor one likely to achieve the right balance between beneficial results and unintended negative consequences (such as inhibiting economic growth).
If collection cannot, in most cases, be limited practically, then what? Chapter 4 discusses in detail a number of technologies that have been used in the past for privacy protection, and others that may, to a greater or lesser extent, serve as technology building blocks for future policies.
Some technology building blocks (for example, cybersecurity standards, technologies related to encryption, and formal systems of auditable access control) are already being utilized and need to be encouraged in the marketplace. On the other hand, some techniques for privacy protection that have seemed encouraging in the past are useful as supplementary ways to reduce privacy risk, but do not now seem sufficiently robust to be a dependable basis for privacy protection where big data is concerned. For a variety of reasons, PCAST judges anonymization, data deletion, and distinguishing data from metadata (defined below) to be in this category. The framework of notice and consent is also becoming unworkable as a useful foundation for policy.
Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re-identify individuals (that is, re-associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating.
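One rough way to see why more diverse data makes re-identification more likely is to count how many records become unique as attributes are added. The sketch below is only an illustration under stated assumptions: the six records are invented, and `unique_fraction` is a hypothetical helper, not a measure defined in this report.

```python
from collections import Counter

# Invented records with no names attached.
records = [
    {"zip": "02139", "birth_year": 1975, "sex": "F"},
    {"zip": "02139", "birth_year": 1975, "sex": "M"},
    {"zip": "02139", "birth_year": 1982, "sex": "M"},
    {"zip": "94704", "birth_year": 1975, "sex": "F"},
    {"zip": "94704", "birth_year": 1982, "sex": "F"},
    {"zip": "94704", "birth_year": 1982, "sex": "F"},
]


def unique_fraction(rows, attributes):
    """Fraction of rows whose combination of the given attributes is unique."""
    def key(row):
        return tuple(row[a] for a in attributes)

    counts = Counter(key(r) for r in rows)
    unique = sum(1 for r in rows if counts[key(r)] == 1)
    return unique / len(rows)


# Add one attribute at a time and watch the unique share change.
for attrs in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
    print(attrs, round(unique_fraction(records, attrs), 2))
```

In this toy data the share of unique records rises from none, to a third, to two thirds as attributes are added. A record that is unique on a set of attributes can be re-associated with a name by anyone holding a named data set with the same attributes.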
While it is good business practice that data of all kinds should be deleted when they are no longer of value, economic or social value often can be obtained from applying big data techniques to masses of data that were otherwise considered to be worthless. Similarly, archival data may also be important to future historians, or for later longitudinal analysis by academic researchers and others. As described above, many sources of data contain latent information about individuals, information that can be known only if the holder expends analytic resources, or that may become knowable only in the future with the development of new data-mining algorithms. In such cases it is practically impossible for the data holder even to surface “all the data about an individual,” much less delete it on any specified schedule or in response to an individual’s request. Today, given the distributed and redundant nature of data storage, it is not even clear that data, even small data, can be destroyed with any high degree of assurance.
As data sets become more complex, so do the attached metadata. Metadata are ancillary data that describe properties of the data such as the time the data were created, the device on which they were created, or the destination of a message. Included in the data or metadata may be identifying information of many kinds. It cannot today generally be asserted that metadata raise fewer privacy concerns than data.
Notice and consent is the practice of requiring individuals to give positive consent to the personal data collection practices of each individual app, program, or web service. Only in some fantasy world do users actually read these notices and understand their implications before clicking to indicate their consent.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
xii
The conceptual problem with notice and consent is that it fundamentally places the burden of privacy protection on the individual. Notice and consent creates a non-level playing field in the implicit privacy negotiation between provider and user. The provider offers a complex, take-it-or-leave-it set of terms, while the user, in practice, can allocate only a few seconds to evaluating the offer. This is a kind of market failure.
PCAST believes that the responsibility for using personal data in accordance with the user’s preferences should rest with the provider rather than with the user. As a practical matter, in the private sector, third parties chosen by the consumer (e.g., consumer-protection organizations, or large app stores) could intermediate: A consumer might choose one of several “privacy protection profiles” offered by the intermediary, which in turn would vet apps against these profiles. By vetting apps, the intermediaries would create a marketplace for the negotiation of community standards for privacy. The Federal government could encourage the development of standards for electronic interfaces between the intermediaries and the app developers and vendors.
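As a rough illustration of how an intermediary might vet apps against a consumer’s chosen profile, the sketch below treats a profile as a set of permitted uses and an app’s declaration as a set of requested uses. The class names, the use labels such as `location:advertising`, and the function `vet` are all hypothetical; the report does not specify any such interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyProfile:
    """A profile a consumer picks from the intermediary (hypothetical)."""
    name: str
    allowed_uses: frozenset


@dataclass(frozen=True)
class AppDeclaration:
    """What an app says it will do with personal data (hypothetical)."""
    app: str
    declared_uses: frozenset


def vet(app, profile):
    """Return the declared uses the profile does not permit; empty means pass."""
    return app.declared_uses - profile.allowed_uses


cautious = PrivacyProfile("cautious", frozenset({"location:navigation"}))
permissive = PrivacyProfile(
    "permissive",
    frozenset({"location:navigation", "location:advertising"}),
)
maps_app = AppDeclaration(
    "example-maps",
    frozenset({"location:navigation", "location:advertising"}),
)

for profile in (cautious, permissive):
    violations = vet(maps_app, profile)
    verdict = "passes" if not violations else "fails on " + ", ".join(sorted(violations))
    print(f"{maps_app.app} under '{profile.name}': {verdict}")
```

The consumer makes one choice (a profile) rather than evaluating each app’s terms, and the intermediary performs the comparison. A shared vocabulary of uses is what the standards for electronic interfaces mentioned above would need to supply.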
After data are collected, data analytics come into play and may generate an increasing fraction of privacy issues. Analysis, per se, does not directly touch the individual (it is neither collection nor, without additional action, use) and may have no external visibility. By contrast, it is the use of a product of analysis, whether in commerce, by government, by the press, or by individuals, that can cause adverse consequences to individuals.
More broadly, PCAST believes that it is the use of data (including born-digital or born-analog data and the products of data fusion and analysis) that is the locus where consequences are produced. This locus is the technically most feasible place to protect privacy. Technologies are emerging, both in the research community and in the commercial world, to describe privacy policies, to record the origins (provenance) of data, their access, and their further use by programs, including analytics, and to determine whether those uses conform to privacy policies. Some approaches are already in practical use.
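A minimal sketch of the idea follows: data carry their provenance and a policy of permitted uses, and every attempted use is checked against the policy and recorded. The names `TaggedData`, `use`, and `UseNotPermitted`, and the example purposes, are assumptions made for illustration and do not describe any particular product or research system.

```python
from dataclasses import dataclass, field


class UseNotPermitted(Exception):
    """Raised when a program requests a use the data's policy does not allow."""


@dataclass
class TaggedData:
    value: object
    provenance: str            # where the data originated
    permitted_uses: frozenset  # the privacy policy attached to the data
    audit_log: list = field(default_factory=list)


def use(data, program, purpose):
    """Check a requested use against the attached policy and record the attempt."""
    allowed = purpose in data.permitted_uses
    data.audit_log.append((program, purpose, "allowed" if allowed else "denied"))
    if not allowed:
        raise UseNotPermitted(f"{program} may not use this data for {purpose}")
    return data.value


reading = TaggedData(
    value=72,
    provenance="wearable heart-rate sensor",
    permitted_uses=frozenset({"fitness-coaching"}),
)

print(use(reading, "coach_app", "fitness-coaching"))
try:
    use(reading, "insurer_app", "insurance-underwriting")
except UseNotPermitted as err:
    print("blocked:", err)
print(reading.audit_log)
```

The check happens at the moment of use rather than at collection, and the log records both permitted and refused attempts, which is what makes later auditing possible.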
Given the statistical nature of data analytics, there is uncertainty that discovered properties of groups apply to a particular individual in the group. Making incorrect conclusions about individuals may have adverse consequences for them and may affect members of certain groups disproportionately (e.g., the poor, the elderly, or minorities). Among the technical mechanisms that can be incorporated in a use-based approach are methods for imposing standards for data accuracy and integrity and policies for incorporating useable interfaces that allow an individual to correct the record with voluntary additional information.
PCAST’s charge for this study did not ask it to recommend specific privacy policies, but rather to make a relative assessment of the technical feasibilities of different broad policy approaches. Chapter 5, accordingly, discusses the implications of current and emerging technologies for government policies for privacy protection. The use of technical measures for enforcing privacy can be stimulated by reputational pressure, but such measures are most effective when there are regulations and laws with civil or criminal penalties. Rules and regulations provide both deterrence of harmful actions and incentives to deploy privacy-protecting technologies. Privacy protection cannot be achieved by technical measures alone.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
xiii
This discussion leads to five recommendations.
Recommendation 1. Policy attention should focus more on the actual uses of big data and less on its collection and analysis. By actual uses, we mean the specific events where something happens that can cause an adverse consequence or harm to an individual or class of individuals. In the context of big data, these events (“uses”) are almost always actions of a computer program or app interacting either with the raw data or with the fruits of analysis of those data. In this formulation, it is not the data themselves that cause the harm, nor the program itself (absent any data), but the confluence of the two. These “use” events (in commerce, by government, or by individuals) embody the necessary specificity to be the subject of regulation. By contrast, PCAST judges that policies focused on the regulation of data collection, storage, retention, a priori limitations on applications, and analysis (absent identifiable actual uses of the data or products of analysis) are unlikely to yield effective strategies for improving privacy. Such policies would be unlikely to be scalable over time, or to be enforceable by other than severe and economically damaging measures.
Recommendation 2. Policies and regulation, at all levels of government, should not embed particular technological solutions, but rather should be stated in terms of intended outcomes.
To avoid falling behind the technology, it is essential that policy concerning privacy protection should address the purpose (the “what”) rather than prescribing the mechanism (the “how”).
Recommendation 3. With coordination and encouragement from OSTP,[1] the NITRD agencies[2] should strengthen U.S. research in privacy-related technologies and in the relevant areas of social science that inform the successful application of those technologies.
Some of the technology for controlling uses already exists. However, research (and funding for it) is needed in the technologies that help to protect privacy, in the social mechanisms that influence privacy-preserving behavior, and in the legal options that are robust to changes in technology and create appropriate balance among economic opportunity, national priorities, and privacy protection.
Recommendation 4. OSTP, together with the appropriate educational institutions and professional societies, should encourage increased education and training opportunities concerning privacy protection, including career paths for professionals.
Programs that provide education leading to privacy expertise (akin to what is being done for security expertise) are essential and need encouragement. One might envision careers for digital-privacy experts both on the software development side and on the technical management side.

[1] The White House Office of Science and Technology Policy.
[2] NITRD refers to the Networking and Information Technology Research and Development program, whose participating Federal agencies support unclassified research in advanced information technologies such as computing, networking, and software and include both research- and mission-focused agencies such as NSF, NIH, NIST, DARPA, NOAA, DOE’s Office of Science, and the DoD military-service laboratories (see http://www.nitrd.gov/SUBCOMMITTEE/nitrd_agencies/index.aspx).
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
xiv
Recommendation 5. The United States should take the lead both in the international arena and at home by adopting policies that stimulate the use of practical privacy-protecting technologies that exist today. It can exhibit leadership both by its convening power (for instance, by promoting the creation and adoption of standards) and also by its own procurement practices (such as its own use of privacy-preserving cloud services).
PCAST is not aware of more effective innovation or strategies being developed abroad; rather, some countries seem inclined to pursue what PCAST believes to be blind alleys. This circumstance offers an opportunity for U.S. technical leadership in privacy in the international arena, an opportunity that should be taken.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
1
1. Introduction
In a widely noted speech on January 17, 2014, President Barack Obama charged his Counselor, John Podesta, with leading a comprehensive review of big data and privacy, one that would “reach out to privacy experts, technologists, and business leaders and look at how the challenges inherent in big data are being confronted by both the public and private sectors; whether we can forge international norms on how to manage this data; and how we can continue to promote the free flow of information in ways that are consistent with both privacy and security.”[3] The President and Counselor Podesta asked the President’s Council of Advisors on Science and Technology (PCAST) to assist with the technology dimensions of the review.
For this task PCAST’s statement of work reads, in part,
    PCAST will study the technological aspects of the intersection of big data with individual privacy, in relation to both the current state and possible future states of the relevant technological capabilities and associated privacy concerns.
    Relevant big data include data and metadata collected, or potentially collectable, from or about individuals by entities that include the government, the private sector, and other individuals. It includes both proprietary and open data, and also data about individuals collected incidentally or accidentally in the course of other activities (e.g., environmental monitoring or the “Internet of Things”).
This is a tall order, especially on the ambitious time scale requested by the President. The literature and public discussion of big data and privacy are vast, with new ideas and insights generated daily from a variety of constituencies: technologists in industry and academia, privacy and consumer advocates, legal scholars, and journalists (among others). Independently of PCAST, but informing this report, the Podesta study sponsored three public workshops at universities across the country. Limiting this report’s charge to technological, not policy, aspects of the problem narrows PCAST’s mandate somewhat, but this is a subject where technology and policy are difficult to separate. In any case, it is the nature of the subject that this report must be regarded as based on a momentary snapshot of the technology, although we believe the key conclusions and recommendations have lasting value.
1.1 Context and outline of this report
The ubiquity of computing and electronic communication technologies has led to the exponential growth of online data, from both digital and analog sources. New technological capabilities to create, analyze, and disseminate vast quantities of data raise new concerns about the nature of privacy and the means by which individual privacy might be compromised or protected.
This report discusses present and future technologies concerning this so-called “big data” as it relates to privacy concerns. It is not a complete summary of the technology concerning big data, nor a complete summary of the ways in which technology affects privacy, but focuses on the ways in which big data and privacy interact. As an example, if Leslie confides a secret to Chris and Chris broadcasts that secret by email or texting, that might be a privacy-infringing use of information technology, but it is not a big data issue. As another example, if oceanographic data are collected in large quantities by remote sensing, that is big data, but not, in the first instance, a privacy concern. Some data are more privacy-sensitive than others, for example, personal medical data, as distinct from personal data publicly shared by the same individual. Different technologies and policies will apply to different classes of data.

[3] “Remarks by the President on Review of Signals Intelligence,” January 17, 2014. http://www.whitehouse.gov/the-press-office/2014/01/17/remarks-president-review-signals-intelligence

The notions of big data and the notions of individual privacy used in this report are intentionally broad and
inclusive. Business consultants Gartner, Inc. define big data as “high-volume, high-velocity and high-variety
information assets that demand cost-effective, innovative forms of information processing for enhanced insight
and decision making,”
4
while computer scientists reviewing multiple definitions offer the more technical, “a
term describing the storage and analysis of large and/or complex data sets using a series of techniques
including, but not limited to, NoSQL, MapReduce, and machine learning.”
5
(See Sections 3.2.1 and 3.3.1 for
discussion of these technical terms.) In a privacy context, the term “big data” typically means data about one or
a group of individuals, or that might be analyzed to make inferences about individuals. It might include data or
metadata collected by government, by the private sector, or by individuals. The data and metadata might be
proprietary or open, they might be collected intentionally or incidentally or accidentally. They might be text,
audio, video, sensor-based, or some combination. They might be data collected directly from some source, or
data derived by some process of analysis. They might be saved for a long period of time, or they might be
analyzed and discarded as they are streamed. In this report, PCAST usually does not distinguish between “data”
and “information.”
The term “privacy” encompasses not only avoiding observation, or keeping one’s personal matters and
relationships secret, but also the ability to share information selectively but not publicly. Anonymity overlaps
with privacy, but the two are not identical. Voting is recognized as private, but not anonymous, while
authorship of a political tract may be anonymous, but it is not private. Likewise, the ability to make intimate
personal decisions without government interference is considered to be a privacy right, as is protection from
discrimination on the basis of certain personal characteristics (such as an individual’s race, gender, or genome).
So, privacy is not just about secrets.
The promise of big data collection and analysis is that the derived data can be used for purposes that benefit
both individuals and society. Threats to privacy stem from the deliberate or inadvertent disclosure of collected
or derived individual data, the misuse of the data, and the fact that derived data may be inaccurate or false. The
technologies that address the confluence of these issues are the subject of this report.
6
The remainder of this introductory chapter gives further context in the form of a summary of how the legal
concept of privacy developed historically in the United States. Interestingly, and relevant to this report, privacy
rights and the development of new technologies have long been intertwined. Today’s issues are no
exception.
Chapter 2 of this report is devoted to scenarios and examples, some from today, but most anticipating a near
tomorrow. Yogi Berra’s much-quoted remark “It’s tough to make predictions, especially about the future” is
germane. But it is equally true for this subject that policies based on out-of-date examples and scenarios are
doomed to failure. Big data technologies are advancing so rapidly that predictions about the future, however
imperfect, must guide today’s policy development.

4
Gartner, Inc., “IT Glossary.” https://www.gartner.com/itglossary/bigdata/
5
Barker, Adam and Jonathan Stuart Ward, “Undefined By Data: A Survey of Big Data Definitions,” arXiv:1309.5821.
http://arxiv.org/abs/1309.5821
6
PCAST acknowledges gratefully the assistance of several contributors at the National Science Foundation, who helped to
identify and distill key insights from the technical literature and research community, as well as other technical experts in
academia and industry that it consulted during this project. See Appendix A.
Chapter 3 examines the technology dimensions of the two great pillars of big data: collection and analysis. In a
certain sense big data is exactly the confluence of these two: big collection meets big analysis (often termed
“analytics”). The technical infrastructure of large-scale networking and computing that enables “big” is also
discussed.
Chapter 4 looks at technologies and strategies for the protection of privacy. Although technology may be part of
the problem, it must also be part of the solution. Many current and foreseeable technologies can enhance
privacy, and there are many additional promising avenues of research.
Chapter 5, drawing on the previous chapters, contains PCAST’s perspectives and conclusions. While it is not
within this report’s charge to recommend specific policies, it is clear that certain kinds of policies are technically
more feasible and less likely to be rendered irrelevant or unworkable by new technologies than others. These
approaches are highlighted, along with comments on the technical deficiencies of some other approaches. This
chapter also contains PCAST’s recommendations in areas that lie within our charge, that is, other than policy.
1.2 Technology has long driven the meaning of privacy
The conflict between privacy and new technology is not new, except perhaps now in its greater scope, degree of
intimacy, and pervasiveness. For more than two centuries, values and expectations relating to privacy have
been continually reinterpreted and rearticulated in light of the impact of new technologies.
The nationwide postal system advocated by Benjamin Franklin and established in 1775 was a new technology
designed to promote interstate commerce. But mail was routinely and opportunistically opened in transit until
Congress made this action illegal in 1782. While the Constitution’s Fourth Amendment codified the heightened
privacy protection afforded to people in their homes or on their persons (previously principles of British
common law), it took another century of technological challenges to expand the concept of privacy rights into
more abstract spaces, including the electronic. The invention of the telegraph and, later, telephone created new
tensions that were slow to be resolved. A bill to protect the privacy of telegrams, introduced in Congress in
1880, was never passed.
7
It was not telecommunications, however, but the invention of the portable, consumer-operable camera (soon
known as the Kodak) that gave impetus to Warren and Brandeis’s 1890 article “The Right to Privacy,”
8
then a
controversial title, but now viewed as the foundational document for modern privacy law. In the article, Warren
and Brandeis gave voice to the concern that “[i]nstantaneous photographs and newspaper enterprise have
invaded the sacred precincts of private and domestic life; and numerous mechanical devices threaten to
make good the prediction that ‘what is whispered in the closet shall be proclaimed from the house-tops,’”
further noting that “[f]or years there has been a feeling that the law must afford some remedy for the
unauthorized circulation of portraits of private persons …”
9

7
Seipp, David J., The Right to Privacy in American History, Harvard University, Program on Information Resources Policy,
Cambridge, MA, 1978.
8
Warren, Samuel D. and Louis D. Brandeis, "The Right to Privacy." Harvard Law Review 4:5, 193, December 15, 1890.
9
Id. at 195.
Warren and Brandeis sought to articulate the right of privacy between individuals (whose foundation lies in civil
tort law). Today, many states recognize a number of privacy-related harms as causes for civil or criminal legal
action (further discussed in Section 1.4).
10
From Warren and Brandeis’ “right to privacy,” it took another 75 years for the Supreme Court to find, in
Griswold v. Connecticut
11
(1965), a right to privacy in the "penumbras" and "emanations" of other constitutional
protections (as Justice William O. Douglas put it, writing for the majority).
12
With a broad perspective, scholars
today recognize a number of different legal meanings for “privacy.” Five of these seem particularly relevant to
this PCAST report:
(1) The individual’s right to keep secrets or seek seclusion (the famous “right to be let alone” of Brandeis’
1928 dissenting opinion in Olmstead v. United States).
13
(2) The right to anonymous expression, especially (but not only) in political speech (as in McIntyre v. Ohio
Elections Commission
14
).
(3) The ability to control access by others to personal information after it leaves one’s exclusive possession
(for example, as articulated in the FTC’s Fair Information Practice Principles).
15
(4) The barring of some kinds of negative consequences from the use of an individual’s personal
information (for example, job discrimination on the basis of personal DNA, forbidden in 2008 by the
Genetic Information Nondiscrimination Act
16
).
(5) The right of the individual to make intimate decisions without government interference, as in the
domains of health, reproduction, and sexuality (as in Griswold).
These are asserted, not absolute, rights. All are supported, but also circumscribed, by both statute and case law.
With the exception of number 5 on the list (a right of “decisional privacy” as distinct from “informational
privacy”), all are applicable in varying degrees both to citizen-government interactions and to citizen-citizen
interactions. Collisions between new technologies and privacy rights have occurred in all five. A patchwork of
state and federal laws has addressed concerns in many sectors, but to date there has not been comprehensive
legislation to handle these issues. Collisions between new technologies and privacy rights should be expected to
continue to occur.

10
Digital Media Law Project, “Publishing Personal and Private Information.” http://www.dmlp.org/legalguide/publishing
personalandprivateinformation
11
Griswold v. Connecticut, 381 U.S. 479 (1965).
12
Id. at 483-84.
13
Olmstead v. United States, 277 U.S. 438 (1928).
14
McIntyre v. Ohio Elections Commission, 514 U.S. 334, 340-41 (1995). The decision reads in part, “Protections for
anonymous speech are vital to democratic discourse. Allowing dissenters to shield their identities frees them to express
critical minority views... Anonymity is a shield from the tyranny of the majority.... It thus exemplifies the purpose behind
the Bill of Rights and of the First Amendment in particular: to protect unpopular individuals from retaliation... at the hand
of an intolerant society.”
15
Federal Trade Commission, “Privacy Online: Fair Information Practices in the Electronic Marketplace,” May 2000.
16
Genetic Information Nondiscrimination Act of 2008, PL 110–233, May 21, 2008, 122 Stat 881.
1.3 What is different today?
New collisions between technologies and privacy have become evident, as new technological capabilities have
emerged at a rapid pace. It is no longer clear that the five privacy concerns raised above, or their current legal
interpretations, are sufficient in the court of public opinion.
Much of the public’s concern is with the harm done by the use of personal data, both in isolation and in
combination. Controlling access to personal data after they leave one’s exclusive possession has been seen
historically as a means of controlling potential harm. But today, personal data may never be, or have been,
within one’s possession; for instance, they may be acquired passively from external sources such as public
cameras and sensors, or without one’s knowledge from public electronic disclosures by others using social
media. In addition, personal data may be derived from powerful data analyses (see Section 3.2) whose use and
output are unknown to the individual. Those analyses sometimes yield valid conclusions that the individual would
not want disclosed. Worse yet, the analyses can produce false positives or false negatives: information that is
a consequence of the analysis but is not true or correct. Furthermore, to a much greater extent than before, the
same personal data have both beneficial and harmful uses, depending on the purposes for which and the
contexts in which they are used. Information supplied by the individual might be used only to derive other
information such as identity or a correlation, after which it is not needed. The derived data, which were never
under the individual’s control, might then be used either for good or ill.
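
The remark about false positives can be made concrete with a short calculation. The sketch below is illustrative only: the prevalence, sensitivity, and specificity values are assumptions chosen for the example, not figures from this report, and the function names are hypothetical. It shows how an analysis that is accurate in aggregate can still be wrong about most of the individuals it flags when the inferred trait is rare.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of flagged individuals for whom the inferred fact is actually true."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected outcome counts when the analysis is applied to a whole population."""
    have_trait = population * prevalence
    lack_trait = population - have_trait
    return {
        "true_positives": have_trait * sensitivity,
        "false_negatives": have_trait * (1.0 - sensitivity),
        "false_positives": lack_trait * (1.0 - specificity),
        "true_negatives": lack_trait * specificity,
    }


# Assumed, illustrative inputs: a trait held by 1% of people, inferred by an
# analysis that is right 95% of the time for both holders and non-holders.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.95, specificity=0.95)
counts = expected_counts(1_000_000, prevalence=0.01, sensitivity=0.95, specificity=0.95)

print(f"Share of flagged individuals who truly have the trait: {ppv:.1%}")
for outcome, n in counts.items():
    print(f"{outcome}: {n:,.0f}")
```

Under these assumed inputs, roughly one flagged person in six actually has the trait, and false positives outnumber true positives about five to one. Different inputs give different ratios; the point is only that derived data about an individual can be false even when the analysis is statistically sound.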
In the current discourse, some assert that the issues concerning privacy protection are collective as well as
individual, particularly in the domain of civil rights: for example, identification of certain individuals at a
gathering using facial recognition from videos, and the inference that other individuals at the same gathering,
also identified from videos, have similar opinions or behaviors.
Current circumstances also raise issues of how the right to privacy extends to the public square, or to
quasi-private gatherings such as parties or classrooms. If the observers in these venues are not just people, but also
both visible and invisible recording devices with enormous fidelity and easy paths to electronic promulgation
and analysis, does that change the rules?
Also rapidly changing are the distinctions between government and the private sector as potential threats to
individual privacy. Government is not just a “giant corporation.” It has a monopoly in the use of force; it has no
direct competitors who seek market advantage over it and may thus motivate it to correct missteps.
Governments have checks and balances, which can contribute to self-imposed limits on what they may do with
people’s information. Companies decide how they will use such information in the context of such factors as
competitive advantages and risks, government regulation, and perceived threats and consequences of lawsuits.
It is thus appropriate that there are different sets of constraints on the public and private sectors. But
government has a set of authorities, particularly in the areas of law enforcement and national security, that
place it in a uniquely powerful position, and therefore the restraints placed on its collection and use of data
deserve special attention. Indeed, the need for such attention is heightened because of the increasingly blurry
line between public and private data.
While these differences are real, big data is to some extent a leveler of the differences between government and
companies. Both governments and companies have potential access to the same sources of data and the same
analytic tools. Current rules may allow government to purchase or otherwise obtain data from the private
sector that, in some cases, it could not legally collect itself,
17
or to outsource to the private sector analyses it
could not itself legally perform.
18
The possibility of government exercising, without proper safeguards, its own
monopoly powers and also having unfettered access to the private information marketplace is unsettling.
What kinds of actions should be forbidden both to government (Federal, state, and local, and including law
enforcement) and to the private sector? What kinds should be forbidden to one but not the other? It is unclear
whether current legal frameworks are sufficiently robust for today’s challenges.
1.4 Values, harms, and rights
As was seen in Sections 1.2 and 1.3, new privacy rights usually do not come into being as academic abstractions.
Rather, they arise when technology encroaches on widely shared values. Where there is consensus on values,
there can also be consensus on what kinds of harms to individuals may be an affront to those values. Not all
such harms may be preventable or remediable by government actions, but, conversely, it is unlikely that
government actions will be welcome or effective if they are not grounded to some degree in values that are
widely shared.
In the realm of privacy, Warren and Brandeis in 1890
19
(see Section 1.2) began a dialogue about privacy that led
to the evolution of the right in academia and the courts, later crystalized by William Prosser as four distinct
harms that had come to earn legal protection.
20
A direct result is that, today, many states recognize as causes
for legal action the four harms that Prosser enumerated,
21
and which have become (though varying from state
to state
22
) privacy “rights.” The harms are:
Intrusion upon seclusion. A person who intentionally intrudes, physically or otherwise (now including
electronically), upon the solitude or seclusion of another person or her private affairs or concerns, can
be subject to liability for the invasion of her privacy, but only if the intrusion would be highly offensive to
a reasonable person.
Public disclosure of private facts. Similarly, a person can be sued for publishing private facts about
another person, even if those facts are true. Private facts are those about someone’s personal life that
have not previously been made public, that are not of legitimate public concern, and that would be
offensive to a reasonable person.

17
One Hundred Tenth Congress, “Privacy: The use of commercial information resellers by federal agencies,” Hearing before
the Subcommittee on Information Policy, Census, and National Archives of the Committee on Oversight and Government
Reform, House of Representatives, March 11, 2008.
18
For example, Experian provides much of Healthcare.gov’s identity verification component using consumer credit
information not available to the government. See Consumer Reports, “Having trouble proving your identity to
HealthCare.gov? Here's how the process works,” December 18, 2013.
http://www.consumerreports.org/cro/news/2013/12/howtoproveyouridentityonhealthcare
gov/index.htm?loginMethod=auto
19
Warren, Samuel D. and Louis D. Brandeis, "The Right to Privacy." Harvard Law Review 4:5, 193, December 15, 1890.
20
Prosser, William L., “Privacy,” California Law Review 48:383, 389, 1960.
21
Id.
22
(1) Digital Media Law Project, “Publishing Personal and Private Information.” http://www.dmlp.org/legal
guide/publishingpersonalandprivateinformation. (2) Id., “Elements of an Intrusion Claim.” http://www.dmlp.org/legal
guide/elementsintrusionclaim
“False light” or publicity. Closely related to defamation, this harm results when false facts are widely
published about an individual. In some states, false light includes untrue implications, not just untrue
facts as such.
Misappropriation of name or likeness. Individuals have a “right of publicity” to control the use of their
name or likeness in commercial settings.
It seems likely that most Americans today continue to share the values implicit in these harms, even if the legal
language (by now refined in thousands of court decisions) strikes one as archaic and quaint. However, new
technological insults to privacy, actual or prospective, and a century’s evolution of social values (for example,
today’s greater recognition of the rights of minorities, and of rights associated with gender), may require a
longer list than sufficed in 1960.
Although PCAST’s engagement with this subject is centered on technology, not law, any report on the subject of
privacy, including PCAST’s, should be grounded in the values of its day. As a starting point for discussion, albeit
only a snapshot of the views of one set of technologically minded Americans, PCAST offers some possible
augmentations to the established list of harms, each of which suggests a possible underlying right in the age of
big data.
PCAST also believes strongly that the positive benefits of technology are (or can be) greater than any new
harms. Almost every new harm is related to or “adjacent to” beneficial uses of the same technology.
23
To
emphasize this point, for each suggested new harm, we describe a related beneficial use.
Invasion of private communications. Digital communications technologies make social networking
possible across the boundaries of geography, and enable social and political participation on previously
unimaginable scales. An individual’s right to private communication, secured for written mail and
wireline telephone in part by the isolation of their delivery infrastructure, may need reaffirmation in the
digital era, however, where all kinds of “bits” share the same pipelines, and the barriers to interception
are often much lower. (In this context, we discuss the use and limitations of encryption in Section 4.2.)
Invasion of privacy in a person’s virtual home. The Fourth Amendment gives special protection against
government intrusion into the home, for example the protection of private records within the home;
tort law offers protection against similar non-government intrusion. The new “virtual home” includes
the Internet, cloud storage, and other services. Personal data in the cloud can be accessible and
organized. Photographs and records in the cloud can be shared with family and friends, and can be
passed down to future generations. The underlying social value, the “home as one’s castle,” should
logically extend to one’s “castle in the cloud,” but this protection has not been preserved in the new
virtual home. (We discuss this subject further in Section 2.3.)
Public disclosure of inferred private facts. Powerful data analytics may infer personal facts from
seemingly harmless input data. Sometimes the inferences are beneficial. At its best, targeted
advertising directs consumers to products that they actually want or need. Inferences about people’s
health can lead to better and timelier treatments and longer lives. But before the advent of big data, it
could be assumed that there was a clear distinction between public and private information: either a
fact was “out there” (and could be pointed to), or it was not. Today, analytics may discover facts that
are no less private than yesterday’s purely private sphere of life. Examples include inferring sexual
preference from purchasing patterns, or early Alzheimer’s disease from key-click streams. In the latter
case, the private fact may not even be known to the individual in question. (Section 3.2 discusses the
technology behind the data analytics that makes such inferences possible.) The public disclosure of such
information (and possibly also some non-public commercial uses) seems offensive to widely shared
values.

23
One perspective informed by new technologies and technology-mediated communication suggests that privacy is about
the “continual management of boundaries between different spheres of action and degrees of disclosure within those
spheres,” with privacy and one’s public face being balanced in different ways at different times. See: Leysia Palen and Paul
Dourish, “Unpacking ‘Privacy’ for a Networked World,” Proceedings of CHI 2003, Association for Computing Machinery,
April 5-10, 2003.
Tracking, stalking, and violations of locational privacy. Today’s technologies easily determine an
individual’s current or prior location. Useful location-based services include navigation, suggesting
better commuter routes, finding nearby friends, avoiding natural hazards, and advertising the
availability of nearby goods and services. Sighting an individual in a public place can hardly be a private
fact. When big data allows such sightings, or other kinds of passive or active data collection, to be
assembled into the continuous locational track of an individual’s private life, however, many Americans
(including Supreme Court Justice Sotomayor, for example
24
) perceive a potential affront to a widely
accepted “reasonable expectation of privacy.”
Harm arising from false conclusions about individuals, based on personal profiles from big data
analytics. The power of big data, and therefore its benefit, is often correlational. In many cases the
“harms” from statistical errors are small, for example the incorrect inference of a movie preference; or
the suggestion that a health issue be discussed with a physician, following from analyses that may, on
average, be beneficial, even when a particular instance turns out to be a false alarm. Even when
predictions are statistically valid, moreover, they may be untrue about particular individuals and
mistaken conclusions may cause harm. Society may not be willing to excuse harms caused by the
uncertainties inherent in statistically valid algorithms. These harms may unfairly burden particular
classes of individuals, for example, racial minorities or the elderly.
Foreclosure of individual autonomy or self-determination. Data analyses about large populations can
discover special cases that apply to individuals within that population. For example, by identifying
differences in “learning styles,” big data may make it possible to personalize education in ways that
recognize every individual’s potential and optimize that individual’s achievement. But the projection of
population factors onto individuals can be misused. It is widely accepted that individuals should be able
to make their own choices and pursue opportunities that are not necessarily typical, and that no one
should be denied the chance to achieve more than some statistical expectation of themselves. It would
offend our values if a child’s choices in video games were later used for educational tracking (for
example, college admissions). Similarly offensive would be a future, akin to Philip K. Dick’s science
fiction short story adapted by Steven Spielberg in the film Minority Report, where “pre-crime” is
statistically identified and punished.
25
Loss of anonymity and private association. Anonymity is not acceptable as an enabler of committing
fraud, or bullying, or cyber-stalking, or improper interactions with children. Apart from wrongful
behavior, however, the individual’s right to choose to be anonymous is a long-held American value (as,
for example, the anonymous authorship of the Federalist papers). Using data to (re)identify an
individual who wishes to be anonymous (except in the case of legitimate governmental functions, such
as law enforcement) is regarded as a harm. Similarly, individuals have a right of private association with
groups or other individuals, and the identification of such associations may be a harm.
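
To make the (re)identification harm in the last item concrete, the following minimal sketch shows one common route to it: linking a data set from which names have been removed to a public roster through shared attributes. Everything here is an assumption for illustration: the records are fabricated, the field names (`zip`, `birth_year`, `sex`) and the function `link_records` are hypothetical, and real linkage attacks and defenses are considerably more elaborate.

```python
from collections import defaultdict

# Attributes that appear in both data sets; the choice is illustrative.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(deidentified, public_roster, keys=QUASI_IDENTIFIERS):
    """Return {record_id: name} for records matching exactly one roster entry."""
    # Index the named roster by its combination of quasi-identifiers.
    index = defaultdict(list)
    for person in public_roster:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for record in deidentified:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[record["record_id"]] = candidates[0]
    return matches


# Fabricated example data: no names in the "de-identified" set.
deidentified = [
    {"record_id": "r1", "zip": "02139", "birth_year": 1950, "sex": "F"},
    {"record_id": "r2", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"record_id": "r3", "zip": "20500", "birth_year": 1975, "sex": "F"},
]
public_roster = [
    {"name": "Alex Example", "zip": "02139", "birth_year": 1950, "sex": "F"},
    {"name": "Blake Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Casey Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
]

reidentified = link_records(deidentified, public_roster)
print(reidentified)
```

In this toy data only record `r1` is re-identified: `r2` matches two roster entries and `r3` matches none. The sketch suggests why removing names alone may not preserve anonymity when a combination of ordinary attributes is unique to one person.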

24
“I would ask whether people reasonably expect that their movements will be recorded and aggregated in a manner that
enables the Government to ascertain, more or less at will, their political and religious beliefs, sexual habits, and so on.”
United States v. Jones (10-1259), Sotomayor concurrence at http://www.supremecourt.gov/opinions/11pdf/101259.pdf.

25
Dick, Philip K., “The Minority Report,” first published in Fantastic Universe (1956) and reprinted in Selected Stories of
Philip K. Dick, New York: Pantheon, 2002.
While in no sense is the above list intended to be complete, it does have a few intentional omissions. For
example, individuals may want big data to be used “fairly,” in the sense of treating people equally, but (apart
from the small number of protected classes already defined by law) it seems impossible to turn this into a right
that is specific enough to be meaningful. Likewise, individuals may want the ability to know what others know
about them; but that is surely not a right from the pre-digital age; and, in the current era of statistical analysis, it
is not so easy to define what “know” means. This important issue is discussed in Section 3.1.2, and again taken
up in Chapter 5, where the attempt is to focus on actual harms done by the use of information, not by a concept
as technically ambiguous as whether information is known.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
10

BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
11
2. Examples and Scenarios
This chapter seeks to make Chapter 1’s introductory discussion more concrete by sketching some examples and
scenarios. While some of these applications of technology are in use today, others comprise PCAST’s
technological prognostications about the near future, up to perhaps 10 years from today. Taken together the
examples and scenarios are intended to illustrate both the enormous benefits that big data can provide and also
the privacy challenges that may accompany these benefits.
In the following three sections, it will be useful to develop some scenarios more completely than others, moving
from very brief examples of things happening today to more fully developed scenarios set in the future.
2.1 Things happening today or very soon
Here are some relevant examples:
Pioneered more than a decade ago, devices mounted on utility poles are able to sense the radio stations
being listened to by passing drivers, with the results sold to advertisers.
26

In 2011, automatic license plate readers were in use by three quarters of local police departments
surveyed. Within 5 years, 25% of departments expect to have them installed on all patrol cars, alerting
police when a vehicle associated with an outstanding warrant is in view.
27
Meanwhile, civilian uses of
license plate readers are emerging, leveraging cloud platforms and promising multiple ways of using the
information collected.
28
Experts at the Massachusetts Institute of Technology and the Cambridge Police Department have used a
machine-learning algorithm to identify which burglaries likely were committed by the same offender,
thus aiding police investigators.
29

Differential pricing (offering different prices to different customers for essentially the same goods) has
become familiar in domains such as airline tickets and college costs. Big data may increase the power
and prevalence of this practice and may also decrease even further its transparency.
30

26
ElBoghdady, Dina, “Advertisers Tune In to New Radio Gauge,” The Washington Post, October 25, 2004.
http://www.washingtonpost.com/wpdyn/articles/A600132004Oct24.html
27
American Civil Liberties Union, “You Are Being Tracked: How License Plate Readers Are Being Used To Record Americans’
Movements,” July, 2013. https://www.aclu.org/files/assets/071613aclualprreportoptv05.pdf
28
Hardy, Quentin, “How Urban Anonymity Disappears When All Data Is Tracked,” The New York Times, April 19, 2014.
29
Rudin, Cynthia, “Predictive policing: Using Machine Learning to Detect Patterns of Crime,” Wired, August 22, 2013.
http://www.wired.com/insights/2013/08/predictivepolicingusingmachinelearningtodetectpatternsof
crime/.
30
(1) Shiller, Benjamin, “First Degree Price Discrimination Using Big Data,” Jan. 30, 2014, Brandeis University.
http://benjaminshiller.com/images/First_Degree_PD_Using_Big_Data_Jan_27,_2014.pdf and
http://www.forbes.com/sites/modeledbehavior/2013/09/01/willbigdatabringmorepricediscrimination/ (2) Fisher,
William W. “When Should We Permit Differential Pricing of Information?” UCLA Law Review 55:1, 2007.
The UK firm FeatureSpace offers machine-learning algorithms to the gaming industry that may detect
early signs of gambling addiction or other aberrant behavior among online players.
31

Retailers like CVS and AutoZone analyze their customers’ shopping patterns to improve the layout of
their stores and stock the products their customers want in a particular location.
32
By tracking cell
phones, RetailNext offers bricks-and-mortar retailers the chance to recognize returning customers, just
as cookies allow them to be recognized by online merchants.
33
Similar WiFi tracking technology could
detect how many people are in a closed room (and in some cases their identities).
The retailer Target inferred that a teenage customer was pregnant and, by mailing her coupons
intended to be useful, unintentionally disclosed this fact to her father.
34
The author of an anonymous book, magazine article, or web posting is frequently “outed” by informal
crowdsourcing, fueled by the natural curiosity of many unrelated individuals.
35
Social media and public sources of records make it easy for anyone to infer the network of friends and
associates of most people who are active on the web, and many who are not.
36
Marist College in Poughkeepsie, New York, uses predictive modeling to identify college students who are
at risk of dropping out, allowing it to target additional support to those in need.
37

The Durkheim Project, funded by the U.S. Department of Defense, analyzes social media behavior to
detect early signs of suicidal thoughts among veterans.
38
LendUp, a California-based startup, sought to use nontraditional data sources such as social media to
provide credit to underserved individuals. Because of the challenges in ensuring accuracy and fairness,
however, they have been unable to proceed.
39,40

31
Burn-Murdoch, John, “UK technology firm uses machine learning to combat gambling addiction,” The Guardian, August 1,
2013. http://www.theguardian.com/news/datablog/2013/aug/01/ukfirmusesmachinelearningfightgamblingaddiction
32
Clifford, Stephanie, “Using Data to Stage-Manage Paths to the Prescription Counter,” The New York Times, June 19, 2013.
http://bits.blogs.nytimes.com/2013/06/19/usingdatatostagemanagepathstotheprescriptioncounter/
33
Clifford, Stephanie, “Attention, Shoppers: Store Is Tracking Your Cell,” The New York Times, July 14, 2013.
34
Duhigg, Charles, “How Companies Learn Your Secrets,” The New York Times Magazine, February 12, 2012.
http://www.nytimes.com/2012/02/19/magazine/shoppinghabits.html?pagewanted=all&_r=0
35
Volokh, Eugene, “Outing Anonymous Bloggers,” June 8, 2009. http://www.volokh.com/2009/06/08/outinganonymous
bloggers/; A. Narayanan et al., “On the Feasibility of Internet-Scale Author Identification,” IEEE Symposium on Security and
Privacy, May 2012. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6234420
36
Facebook’s “The Graph API” (at https://developers.facebook.com/docs/graphapi/) describes how to write computer
programs that can access the Facebook friends’ data.
37
One of four big data applications honored by the trade journal, Computerworld, in 2013. King, Julia, “UN tackles
socio-economic crises with big data,” Computerworld, June 3, 2013.
http://www.computerworld.com/s/article/print/9239643/UN_tackles_socio_economic_crises_with_big_data
38
Ungerleider, Neal, “This May Be The Most Vital Use Of “Big Data” We’ve Ever Seen,” Fast Company, July 12, 2013.
http://www.fastcolabs.com/3014191/thismaybethemostvitaluseofbigdataweveeverseen.
39
Center for Data Innovations, 100 Data Innovations, Information Technology and Innovation Foundation, Washington, DC,
January 2014. http://www2.datainnovation.org/2014100datainnovations.pdf
40
Waters, Richard, “Data open doors to financial innovation,” Financial Times, December 13, 2013.
http://www.ft.com/intl/cms/s/2/3c59d58a43fb11e2844c00144feabdc0.html
Insight into the spread of hospital-acquired infections has been gained through the use of large amounts
of patient data together with personal information about uninfected patients and clinical staff.
41
Individuals’ heart rates can be inferred from the subtle changes in their facial coloration that occur with
each beat, enabling inferences about their health and emotional state.
42
2.2 Scenarios of the near future in healthcare and education
Here are a few examples of the kinds of scenarios that can readily be constructed.
2.2.1 Healthcare: personalized medicine
Not all patients who have a particular disease are alike, nor do they respond identically to treatment.
Researchers will soon be able to draw on millions of health records (including analog data such as scans in
addition to digital data), vast amounts of genomic information, extensive data on successful and unsuccessful
clinical trials, hospital records, and so forth. In some cases they will be able to discern that among the diverse
manifestations of the disease, a subset of the patients have a collection of traits that together form a variant
that responds to a particular treatment regime.
Since the result of the analysis could lead to better outcomes for particular patients, it is desirable to identify
those individuals in the cohort, contact them, treat their disease in a novel way, and use their experiences in
advancing the research. Their data may have been gathered only anonymously, however, or it may have been
de-identified.
Solutions may be provided by specific new technologies for the protection of database privacy. These may
create a protected query mechanism so individuals can find out whether they are in the cohort, or provide an
alert mechanism based on the cohort characteristics so that, when a medical professional sees a patient in the
cohort, a notice is generated.
2.2.2Healthcare:detectionofsymptomsbymobiledevices
ManybabyboomerswonderhowtheymightdetectAlzheimer 'sdiseaseinthemselves.Whatwouldbebetter
toobservetheirbehaviorthanthemobiledevicethatconnectsthemtoapersonalassistantinthecloud(e.g.,
SiriorOKGoogle),helpsthemnavigate,remindsthemwhatwordsmean,rememberstodothings,
recalls
conversations,measuresgait,andotherwiseisinapositiontodetectgradualdeclinesontraditionalandnovel
medicalindicatorsthatmightbeimperceptibleeventotheirspouses?
Atthesametime,anyleakofsuchinformationwouldbeadamagingbetrayaloftrust.Whatareindividuals’
protectionsagainstsuch
risks?Cantheinferredinformationaboutindividuals’healthbesold,without
additionalconsent,tothirdparties(e.g.,pharmaceuticalcompanies)?Whatifthisisastatedconditionofuseof

41
(1) Wiens, Jenna, John Guttag, and Eric Horvitz, “A Study in Transfer Learning: Leveraging Data from Multiple Hospitals to Enhance Hospital-Specific Predictions,” Journal of the American Medical Informatics Association, January 2014. (2) Weitzner, Daniel J., et al., “Consumer Privacy Bill of Rights and Big Data: Response to White House Office of Science and Technology Policy Request for Information,” April 4, 2014.
42
Frazer, Bryant, “MIT Computer Program Reveals Invisible Motion in Video,” The New York Times video, February 27, 2013.
https://www.youtube.com/watch?v=3rWycBEHn3s
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
14
theapp?Shouldinformationgotoindividuals’personalphysicians withtheirinitialconsentbutnota
subsequentconfirmation?
2.2.3 Education
Drawing on millions of logs of online courses, including both massive open online courses (MOOCs) and smaller classes, it will soon be possible to create and maintain longitudinal data about the abilities and learning styles of millions of students. This will include not just broad aggregate information like grades, but fine-grained profiles of how individual students respond to multiple new kinds of teaching techniques, how much help they need to master concepts at various levels of abstraction, what their attention span is in various contexts, and so forth. A MOOC platform can record how long a student watches a particular video; how often a segment is repeated, sped up, or skipped; how well a student does on a quiz; how many times he or she misses a particular problem; and how the student balances watching content against reading a text. As the ability to present different material to different students materializes in the platforms, the possibility of blind, randomized A/B testing enables the gold standard of experimental science to be implemented at large scale in these environments.
43
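As a rough illustration of the randomized A/B testing mentioned above, the following Python sketch assigns students at random to one of two presentations of the same material and compares mean quiz scores. The student identifiers, the variant labels "A" and "B", the scores, and both function names are hypothetical; a real platform would also need blinding, significance testing, and consent handling, none of which is shown.

```python
import random
from statistics import mean


def assign_variants(student_ids, seed=0):
    """Randomly split students into two equal-sized groups, 'A' and 'B'.

    A seeded generator makes the assignment reproducible for auditing.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {sid: "A" for sid in shuffled[:half]}
    assignment.update({sid: "B" for sid in shuffled[half:]})
    return assignment


def mean_score_by_variant(assignment, quiz_scores):
    """Return the mean quiz score observed for each variant."""
    by_variant = {"A": [], "B": []}
    for sid, variant in assignment.items():
        if sid in quiz_scores:
            by_variant[variant].append(quiz_scores[sid])
    return {v: mean(scores) for v, scores in by_variant.items() if scores}


if __name__ == "__main__":
    students = [f"student-{i}" for i in range(10)]  # hypothetical IDs
    assignment = assign_variants(students, seed=42)
    # Hypothetical quiz scores, invented purely for illustration.
    scores = {sid: 70 + (5 if assignment[sid] == "B" else 0) for sid in students}
    print(mean_score_by_variant(assignment, scores))
```

Because assignment is random rather than chosen by the student or instructor, a difference in mean scores can more plausibly be attributed to the material itself. The same per-student logs that make this possible are what raise the privacy questions discussed in this section.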
Similar data are also becoming available for residential classes, as learning management systems (such as Canvas, Blackboard, or Desire2Learn) expand their roles to support innovative pedagogy. In many courses one can now get moment-by-moment tracking of the student's engagement with the course materials and correlate that engagement with the desired learning outcomes.
With this information, it will be possible not only to greatly improve education, but also to discover what skills, taught to which individuals at which points in childhood, lead to better adult performance in certain tasks, or to adult personal and economic success. While these data could revolutionize educational research, the privacy issues are complex.
44
There are many privacy challenges in this vision of the future of education. Knowledge of early performance can create implicit biases
45
that color later instruction and counseling. There is great potential for misuse, ostensibly for the social good, in the massive ability to direct students into high- or low-potential tracks. Parents and others have access to sensitive information about children, but mechanisms rarely exist to change those permissions when the child reaches majority.
2.3 Challenges to the home’s special status
The home has special significance as a sanctuary of individual privacy. The Fourth Amendment’s list, “persons, houses, papers, and effects,” puts only the physical body in the rhetorically more prominent position; and a house is often the physical container for the other three, a boundary inside of which enhanced privacy rights apply.

43
For an overview of MOOCs and associated analytics opportunities, see PCAST’s December 2013 letter to the President.
http://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_edit_dec2013.pdf
44
There is also uncertainty about how to interpret applicable laws, such as the Family Educational Rights and Privacy Act (FERPA). Recent Federal guidance is intended to help clarify the situation. See: U.S. Department of Education, “Protecting Student Privacy While Using Online Educational Services: Requirements and Best Practices,” February 2014.
http://ptac.ed.gov/sites/default/files/Student%20Privacy%20and%20Online%20Educational%20Services%20%28February%
202014%29.pdf
45
Cukier, Kenneth, and Viktor Mayer-Schoenberger, "How Big Data Will Haunt You Forever," Quartz, March 11, 2014.
http://qz.com/185252/howbigdatawillhauntyouforeveryourhighschooltranscript/
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
15
Existing interpretations of the Fourth Amendment are inadequate for the present world, however. We, along with the “papers and effects” contemplated by the Fourth Amendment, live increasingly in cyberspace, where the physical boundary of the home has little relevance. In 1980, a family’s financial records were paper documents, located perhaps in a desk drawer inside the house. By 2000, they were migrating to the hard drive of the home computer, but still within the house. By 2020, it is likely that most such records will be in the cloud, not just outside the house, but likely replicated in multiple legal jurisdictions because cloud storage typically uses location diversity to achieve reliability. The picture is the same if one substitutes for financial records something like “political books we purchase,” or “love letters that we receive,” or “erotic videos that we watch.” Absent different policy, legislative, and judicial approaches, the physical sanctity of the home’s papers and effects is rapidly becoming an empty legal vessel.
The home is also the central locus of Brandeis’ “right to be left alone.” This right is also increasingly fragile, however. Increasingly, people bring sensors into their homes whose immediate purpose is to provide convenience, safety, and security. Smoke and carbon monoxide alarms are common, and often required by safety codes.
46
Radon detectors are usual in some parts of the country. Integrated air monitors that can detect and identify many different kinds of pollutants and allergens are readily foreseeable. Refrigerators may soon be able to “sniff” for gases released from spoiled food, or, as another possible path, may be able to “read” food expiration dates from radio-frequency identification (RFID) tags in the food’s packaging. Rather than today’s annoying cacophony of beeps, tomorrow’s sensors (as some already do today) will interface to a family through integrated apps on mobile devices or display screens. The data will have been processed and interpreted. Most likely that processing will occur in the cloud. So, to deliver services the consumer wants, much data will need to have left the home.
Environmental sensors that enable new food and air safety may also be able to detect and characterize tobacco or marijuana smoke. Health care or health insurance providers may want assurance that self-declared non-smokers are telling the truth. Might they, as a condition of lower premiums, require the homeowner’s consent for tapping into the environmental monitors’ data? If the monitor detects heroin smoking, is an insurance company obligated to report this to the police? Can the insurer cancel the homeowner’s property insurance?
To some, it seems far-fetched that the typical home will foreseeably acquire cameras and microphones in every room, but that appears to be a likely trend. What can your cell phone (already equipped with front and back cameras) hear or see when it is on the nightstand next to your bed? Tablets, laptops, and many desktop computers have cameras and microphones. Motion detector technology for home intrusion alarms will likely move from ultrasound and infrared to imaging cameras, with the benefit of fewer false alarms and the ability to distinguish pets from people. Facial-recognition technology will allow further security and convenience. For the safety of the elderly, cameras and microphones will be able to detect falls or collapses, or calls for help, and be networked to summon aid.
People naturally communicate by voice and gesture. It is inevitable that people will communicate with their electronic servants in both such modes (necessitating that they have access to cameras and microphones).

46
Nest, acquired by Google, attracted attention early for its design and its use of big data to adapt to consumer behavior. See: Aoki, Kenji, "Nest Gives the Lowly Smoke Detector a Brain," Wired, October, 2013.
http://www.wired.com/2013/10/nestsmokedetector/all/
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
16
Companies such as PrimeSense, an Israeli firm recently bought by Apple,
47
are developing sophisticated computer-vision software for gesture reading, already a key feature in the consumer computer-game console market (e.g., Microsoft Kinect). Consumer televisions are already among the first “appliances” to respond to gesture; already, devices such as the Nest smoke detector respond to gestures.
48
The consumer who taps his temple to signal a spoken command to Google Glass
49
may want to use the same gesture for the television, or for that matter for the thermostat or light switch, in any room at home. This implies omnipresent audio and video collection within the home.
All of these audio, video, and sensor data will be generated within the supposed sanctuary of the home. But they are no more likely to stay in the home than the “papers and effects” already discussed. Electronic devices in the home already invisibly communicate to the outside world via multiple separate infrastructures: The cable industry’s hardwired connection to the home provides multiple types of two-way communication, including broadband Internet. Wireline phone is still used by some home-intrusion alarms and satellite TV receivers, and as the physical layer for DSL broadband subscribers. Some home devices use the cell-phone wireless infrastructure. Many others piggyback on the home Wi-Fi network that is increasingly a necessity of modern life. Today’s smart home-entertainment system knows what a person records on a DVR, what she actually watches, and when she watches it. Like personal financial records in 2000, this information today is in part localized inside the home, on the hard drive inside the DVR. As with financial information today, however, it is on track to move into the cloud. Today, Netflix or Amazon can offer entertainment suggestions based on customers’ past key-click streams and viewing history on their platforms. Tomorrow, even better suggestions may be enabled by interpreting their minute-by-minute facial expressions as seen by the gesture-reading camera in the television.
These collections of data are benign, in the sense that they are necessary for products and services that consumers will knowingly demand. Their challenges to privacy arise both from the fact that their analog sensors necessarily collect more information than is minimally necessary for their function (see Section 3.1.2), and also because their data practically cry out for secondary uses ranging from innovative new products to marketing bonanzas to criminal exploits. As in many other kinds of big data, there is ambiguity as to data ownership, data rights, and allowed data use. Computer-vision software is likely already able to read the brand labels on products in its field of view; this is a much easier technology than facial recognition. If the camera in your television knows what brand of beer you are drinking while watching a football game, and knows whether you opened the bottle before or after the beer ad, who (if anyone) is allowed to sell this information to the beer company, or to its competitors? Is the camera allowed to read brand names when the television set is supposedly off? Can it watch for magazines or political leaflets? If the RFID tag sensor in your refrigerator usefully detects out-of-date food, can it also report your brand choices to vendors? Is this creepy and strange, or a consumer financial benefit when every supermarket can offer you relevant coupons?
50
Or (the dilemma of

47
Reuters, “Apple acquires Israeli 3D chip developer PrimeSense,” November 25, 2013.
http://www.reuters.com/article/2013/11/25/usprimesenseofferappleidUSBRE9AO04C20131125
48
Id.
49
Google, “Glass gestures.” https://support.google.com/glass/answer/3064184?hl=en
50
Tene, Omer, and Jules Polonetsky, "A Theory of Creepy: Technology, Privacy and Shifting Social Norms," Yale Journal of Law and Technology 16:59, 2013, pp. 59-100.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
17
differential pricing
51
) is it any different if the data are used to offer others a better deal while you pay full price because your brand loyalty is known to be strong?
About one-third of Americans rent, rather than own, their residences. This number may increase with time as a result of long-term effects of the 2007 financial crisis, as well as aging of the U.S. population. Today and foreseeably, renters are less affluent, on average, than homeowners. The law demarcates a fine line between the property rights of landlords and the privacy rights of tenants. Landlords have the right to enter their property under various conditions, generally including where the tenant has violated health or safety codes, or to make repairs. As more data are collected within the home, the rights of tenant and landlord may need new adjustment. If environmental monitors are fixtures of the landlord’s property, does she have an unconditional right to their data? Can she sell those data? If the lease so provides, can she evict the tenant if the monitor repeatedly detects cigarette smoke, or a camera sensor is able to distinguish a prohibited pet?
If a third party offers facial-recognition services for landlords (no doubt with all kinds of cryptographic safeguards!), can the landlord use these data to enforce lease provisions against subletting or additional residents? Can she require such monitoring as a condition of the lease? What if the landlord’s cameras are outside the doors, but keep track of everyone who enters or leaves her property? How is this different from the case of a security camera across the street that is owned by the local police?
2.4 Tradeoffs among privacy, security, and convenience
Notions of privacy change generationally. One sees today marked differences between the younger generation of “digital natives” and their parents or grandparents. In turn, the children of today’s digital natives will likely have still different attitudes about the flow of their personal information. Raised in a world with digital assistants who know everything about them, and (one may hope) with wise policies in force to govern use of the data, future generations may see little threat in scenarios that individuals today would find threatening, if not Orwellian. PCAST’s final scenario, perhaps at the outer limit of its ability to prognosticate, is constructed to illustrate this point.
Taylor Rodriguez prepares for a short business trip. She packed a bag the night before and put it outside the front door of her home for pickup. No worries that it will be stolen: The camera on the streetlight was watching it; and, in any case, almost every item in it has a tiny RFID tag. Any would-be thief would be tracked and arrested within minutes. Nor is there any need to give explicit instructions to the delivery company, because the cloud knows Taylor’s itinerary and plans; the bag is picked up overnight and will be in Taylor’s destination hotel room by the time of her arrival.
Taylor finishes breakfast and steps out the front door. Knowing the schedule, the cloud has provided a self-driving car, waiting at the curb. At the airport, Taylor walks directly to the gate; no need to go through any security. Nor are there any formalities at the gate: A twenty-minute “open door” interval is provided for passengers to stroll onto the plane and take their seats (which each sees individually highlighted in his or her wearable optical device). There are no boarding passes and no organized lines. Why bother, when Taylor’s identity (as for everyone else who enters the airport) has been tracked and is known absolutely? When her known information emanations (phone, RFID tags in clothes, facial recognition, gait, emotional state) are known to the cloud, vetted, and essentially unforgeable? When, in the unlikely event that Taylor has become deranged and dangerous, many detectable signs would already have been tracked, detected, and acted on?

51
See references at footnote 30.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
18
Indeed, everything that Taylor carries has been screened far more effectively than any rushed airport search today. Friendly cameras in every LED lighting fixture in Taylor’s house have watched her dress and pack, as they do every day. Normally these data would be used only by Taylor’s personal digital assistants, perhaps to offer reminders or fashion advice. As a condition of using the airport transit system, however, Taylor has authorized the use of the data for ensuring airport security and public safety.
Taylor’s world seems creepy to us. Taylor has accepted a different balance among the public goods of convenience, privacy, and security than would most people today. Taylor acts in the unconscious belief (whether justified or not, depending on the nature and effectiveness of policies in force) that the cloud and its robotic servants are trustworthy in matters of personal privacy. In such a world, major improvements in the convenience and security of everyday life become possible.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
19
3. Collection, Analytics, and Supporting Infrastructure
Big data is big in two different senses. It is big in the quantity and variety of data that are available to be processed. And, it is big in the scale of analysis (“analytics”) that can be applied to those data, ultimately to make inferences. Both kinds of “big” depend on the existence of a massive and widely available computational infrastructure, one that is increasingly being provided by cloud services. This chapter expands on these basic concepts.
3.1 Electronic sources of personal data
Since early in the computer age, public and private entities have been assembling digital information about people. Databases of personal information were created during the days of “batch processing.”
52
Indeed, early descriptions of database technology often talk about personnel records used for payroll applications. As computing power increased, more and more business applications moved to digital form. There now are digital telephone-call records, credit-card transaction records, bank-account records, email repositories, and so on. As interactive computing has advanced, individuals have entered more and more data about themselves, both for self-identification to an online service and for productivity tools such as financial-management systems.
These digital data are normally accompanied by “metadata” or ancillary data that explain the layout and meaning of the data they describe. Databases have schemas and email has headers,
53
as do network packets.
54

As data sets become more complex, so do the attached metadata. Included in the data or metadata may be identifying information such as account numbers, login names, and passwords. There is no reason to believe that metadata raise fewer privacy concerns than the data they describe.
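To make the point about email headers concrete, the short Python sketch below reads only the headers of a message, using the standard library's `email` package, and never touches the body. The sample message, its addresses, host names, and timestamps, and the function name `extract_metadata` are all invented for illustration.

```python
from email import message_from_string

# A made-up message used only for illustration; every address,
# host name, and timestamp below is hypothetical.
RAW_MESSAGE = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.example.com; Mon, 5 May 2014 09:15:02 -0400
From: alice@example.org
To: bob@example.com
Subject: Lunch on Friday?
Date: Mon, 5 May 2014 09:15:00 -0400

(body deliberately omitted)
"""


def extract_metadata(raw_message):
    """Return selected header fields without reading the message body."""
    msg = message_from_string(raw_message)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "date": msg["Date"],
        "relay_hops": len(msg.get_all("Received", [])),
    }


if __name__ == "__main__":
    for field, value in extract_metadata(RAW_MESSAGE).items():
        print(f"{field}: {value}")
```

Even in this toy case the headers alone reveal who wrote to whom, when, and through which relay, which is why metadata cannot be assumed to be less sensitive than content.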
In recent times, the kinds of electronic data available about people have increased substantially, in part because of the emergence of social media and in part because of the growth in mobile devices, surveillance devices, and a diversity of networked sensors. Today, although they may not be aware of it, individuals constantly emit into the environment information whose use or misuse may be a source of privacy concerns. Physically, these information emanations are of two types, which can be called “born digital” or “born analog.”
3.1.1 “Born digital” data
When information is “born digital,” it is created, by us or by a computer surrogate, specifically for digital use, that is, for use by a computer or data-processing system. Examples of data that are born digital include:
email and text messaging
input via mouse-clicks, taps, swipes, or keystrokes on a phone, tablet, computer, or video game; that is, data that people intentionally enter into a device

52
Such databases endure and form the basis of continuing concern among privacy advocates.
53
Schemas are formal definitions of the configuration of a database: its tables, relations, and indices. Headers are the sometimes-invisible prefaces to email messages that contain information about the sending and destination addresses and sometimes the routing of the path between them.
54
In the Internet and similar networks, information is broken up into chunks called packets, which may travel independently and depend on metadata to be reassembled properly at the destination of the transmission.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
20
GPS location data
metadata associated with phone calls: the numbers dialed from or to, the time and duration of calls
data associated with most commercial transactions: credit-card swipes, bar-code reads, reads of RFID tags (as used for anti-theft and inventory control)
data associated with portal access (key card or ID badge reads) and toll-road access (remote reads of RFID tags)
metadata that our mobile devices use to stay connected to the network, including device location and status
increasingly, data from cars, televisions, appliances: the “Internet of Things”
Consumer tracking data provide an example of born-digital data that has become economically important. It is generally possible for companies to aggregate large amounts of data and then use those data for marketing, advertising, or many other activities. The traditional mechanism has been to use cookies, small data files that a browser can leave on a user’s computer (pioneered by Netscape two decades ago). The technique is to leave a cookie when a user first visits a site and then be able to correlate that visit with a subsequent event. This information is very valuable to retailers and forms the basis of many of the advertising businesses of the last decade. There has been a variety of proposals to regulate such tracking,
55
and many countries require opt-in permission before this tracking is done. Cookies involve relatively simple pieces of information that proponents represent as unlikely to be abused. Although not always aware of the process, people accept such tracking in return for a free or subsidized service.
56
At the same time, cookie-free alternatives are sometimes available.
57
Even without cookies, so-called “fingerprinting” techniques can often identify a user’s computer or mobile device uniquely by the information that it exposes publicly, such as the size of its screen, its installed fonts, and other features.
58
Most technologists believe that applications will move away from cookies, that cookies are too simple an idea, and that there are better analytics coming and better approaches being invented. The economic incentives for consumer tracking will remain, however, and big data will allow for more precise responses.
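The two tracking mechanisms described in this passage can be sketched in a few lines of Python. The first part is a toy model of cookie-based correlation: a site issues a random identifier on the first visit and recognizes it on a later one. The second derives an identifier from exposed device attributes with no stored cookie at all. The class `TrackingSite`, the function `device_fingerprint`, and every attribute value are hypothetical; real browsers and fingerprinting scripts use many more signals, and this is not any vendor's actual implementation.

```python
import hashlib
import uuid


class TrackingSite:
    """Toy model of a site that recognizes returning browsers by cookie."""

    def __init__(self):
        self.visits = {}  # cookie value -> list of pages seen

    def handle_request(self, page, cookie=None):
        """Record a page view; issue a new cookie on the first visit.

        Returns the cookie value the browser should store and send back.
        """
        if cookie is None or cookie not in self.visits:
            cookie = uuid.uuid4().hex  # random identifier, no personal data
            self.visits[cookie] = []
        self.visits[cookie].append(page)
        return cookie


def device_fingerprint(attributes):
    """Hash publicly exposed device attributes into a stable identifier.

    No cookie is stored: the same attributes always give the same digest.
    """
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    site = TrackingSite()
    cookie = site.handle_request("/shoes")            # first visit
    site.handle_request("/checkout", cookie=cookie)   # later event, same browser
    print(site.visits[cookie])                        # ['/shoes', '/checkout']

    # Hypothetical attribute values, chosen only for illustration.
    device = {"screen": "1920x1080", "fonts": "Arial,Helvetica,Times", "timezone": "UTC-5"}
    print(device_fingerprint(device)[:16])
```

Deleting the cookie breaks the first mechanism but not the second, which is one reason fingerprinting is harder for a user to notice or control.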
Tracking is also the enabling technology of some more nefarious uses. Unfortunately, many social-networking apps begin by taking a person’s contact list and spamming all the recipients with advertising for the app. This technique is often abused, especially by small start-ups who may assess the value gained by reaching new customers as being greater than the value lost to their reputation for honoring privacy.

55
Federal Trade Commission, “FTC Staff Revises Online Behavioral Advertising Principles,” Press Release, February 12, 2009.
http://www.ftc.gov/newsevents/pressreleases/2009/02/ftcstaffrevisesonlinebehavioraladvertisingprinciples
56
(1) Cf. The Wall Street Journal’s “What they know” series (http://online.wsj.com/public/page/whattheyknowdigitalprivacy.html). (2) Turow, Joseph, The Daily You: How the Advertising Industry is Defining your Identity and Your Worth, Yale University Press, 2012. http://yalepress.yale.edu/book.asp?isbn=9780300165012
57
DuckDuckGo is a non-tracking search engine that, while perhaps yielding fewer results than leading search engines, is used by those looking for less tracking. See: https://duckduckgo.com/
58
(1) Tanner, Adam, “The Web Cookie Is Dying. Here's The Creepier Technology That Comes Next,” Forbes, June 17, 2013. http://www.forbes.com/sites/adamtanner/2013/06/17/thewebcookieisdyingheresthecreepiertechnologythatcomesnext/ (2) Acar, G. et al., “FPDetective: Dusting the Web for Fingerprinters,” 2013. http://www.cosic.esat.kuleuven.be/publications/article2334.pdf
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
21
All information that is born digital shares certain characteristics. It is created in identifiable units for particular purposes. These units are in most cases “data packets” of one or another standard type. Since they are created by intent, the information that they contain is usually limited, for reasons of efficiency and good engineering design, to support the immediate purpose for which they are collected.
When data are born digital, privacy concerns can arise in two different modes, one obvious (“over-collection”), the other more recent and subtle (“data fusion”). Over-collection occurs when an engineering design intentionally, and sometimes clandestinely, collects information unrelated to its stated purpose. While your smartphone could easily photograph and transmit to a third party your facial expression as you type every keystroke of a text message, or could capture all keystrokes, thereby recording text that you had deleted, these would be inefficient and unreasonable software design choices for the default text-messaging app. In that context they would be instances of over-collection.
A recent example of over-collection was the Brightest Flashlight Free phone app, downloaded by more than 50 million users, which passed back to its vendor its location every time the flashlight was used. Not only is location information unnecessary for the illumination function of a flashlight, but it also discloses personal information that the user might wish to keep private. The Federal Trade Commission issued a complaint because the fine print on the notice-and-consent screen (see Section 4.3) had neglected to disclose that location information, whose collection was disclosed, would be sold to third parties, such as advertisers.
59,60
One sees in this example the limitations of the notice-and-consent framework: A more detailed initial fine-print disclosure by Brightest Flashlight Free, which almost no one would have actually read, would likely have forestalled any FTC action without much affecting the number of downloads.
In contrast to over-collection, data fusion occurs when data from different sources are brought into contact and new, often unexpected, phenomena emerge (see Section 3.1). Individually, each data source may have been designed for a specific, limited purpose. But when multiple sources are processed by techniques of modern statistical data mining, pattern recognition, and the combining of records from diverse sources by virtue of common identifying data, new meanings can be found. In particular, data fusion frequently results in the identification of individual people (that is, the association of events with unique personal identities), the creation of data-rich profiles of an individual, and the tracking of an individual’s activities over days, months, or years.
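The "combining of records from diverse sources by virtue of common identifying data" can be illustrated with a deliberately small Python sketch. Three hypothetical sources, each innocuous on its own, are joined on a shared phone number and license plate to produce a per-person timeline. All records, field names, and the function `fuse` are invented for this example; real record linkage typically involves fuzzy matching and far larger data sets.

```python
from collections import defaultdict

# Three hypothetical sources, each collected for its own limited purpose.
# All values are invented for illustration.
LOYALTY_CARD = [
    {"phone": "555-0100", "item": "nicotine patches", "day": "Mon"},
    {"phone": "555-0199", "item": "bread", "day": "Mon"},
]
TOLL_ROAD = [
    {"plate": "ABC123", "gate": "Exit 12", "day": "Tue"},
]
VEHICLE_REGISTRY = [
    {"plate": "ABC123", "phone": "555-0100", "name": "J. Doe"},
]


def fuse(loyalty, tolls, registry):
    """Build per-person profiles by joining records on shared identifiers."""
    name_by_phone = {r["phone"]: r["name"] for r in registry}
    name_by_plate = {r["plate"]: r["name"] for r in registry}
    profiles = defaultdict(list)
    for rec in loyalty:
        name = name_by_phone.get(rec["phone"])
        if name:
            profiles[name].append((rec["day"], "bought " + rec["item"]))
    for rec in tolls:
        name = name_by_plate.get(rec["plate"])
        if name:
            profiles[name].append((rec["day"], "passed " + rec["gate"]))
    return dict(profiles)


if __name__ == "__main__":
    for person, events in fuse(LOYALTY_CARD, TOLL_ROAD, VEHICLE_REGISTRY).items():
        print(person, events)
```

Neither the purchase log nor the toll log names anyone; the identification emerges only when the registry supplies the common keys.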
By definition, the privacy challenges from data fusion do not lie in the individual data streams, each of whose collection, real-time processing, and retention may be wholly necessary and appropriate for its overt, immediate purpose. Rather, the privacy challenges are emergent properties of our increasing ability to bring into analytical juxtaposition large, diverse data sets and to process them with new kinds of mathematical algorithms.

59
Federal Trade Commission, “Android Flashlight App Developer Settles FTC Charges It Deceived Consumers,” Press Release, December 5, 2013. http://www.ftc.gov/newsevents/pressreleases/2013/12/androidflashlightappdevelopersettlesftcchargesitdeceived
60
(1) FTC File No. 1323087 Decision and order. http://www.ftc.gov/system/files/documents/cases/140409goldenshoresdo.pdf (2) “FTC Approves Final Order Settling Charges Against Flashlight App Creator.” http://www.ftc.gov/newsevents/pressreleases/2014/04/ftcapprovesfinalordersettlingchargesagainstflashlightapp
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
22
3.1.2 Data from sensors
Turn now to the second broad class of information emanations. One can say that information is “born analog” when it arises from the characteristics of the physical world. Such information does not become accessible electronically until it impinges on a “sensor,” an engineered device that observes physical effects and converts them to digital form. The most common sensors are cameras, including video, which sense visible electromagnetic radiation; and microphones, which sense sound and vibration. There are many other kinds of sensors, however. Today, cell phones routinely contain not only cameras, microphones, and radios but also analog sensors for magnetic fields (3-D compass) and motion (acceleration). Other kinds of sensors include those for thermal infrared (IR) radiation; air quality, including the identification of chemical pollutants; barometric pressure (and altitude); low-level gamma radiation; and many other phenomena.
Examples of born-analog data providing personal information and in use today include:
the voice and/or video content of a phone call, born analog but immediately converted to digital by the phone’s microphone and camera
personal health data such as heartbeat, respiration, and gait, as sensed by special-purpose devices (Fitbit has been a leading provider
61
) or cell-phone apps
cameras/sensors in televisions and video games that interpret gestures by the user
video from security surveillance cameras, mobile phones, or overhead drones
imaging infrared video that can see in what people perceive as total darkness (and also see evanescent traces of past events, so-called heat scars)
microphone networks in cities, used to detect and locate gunshots and for public safety
cameras/microphones in classrooms and other meeting rooms
ultrasonic motion detectors
medical imaging, CT, and MRI scans, ultrasonic imaging
opportunistically collected chemical or biological samples, notably trace DNA (today requiring slow, off-line analysis, but foreseeably more nimble)
synthetic aperture radar (SAR), which can image through clouds and, under some conditions, see inside of non-metallic structures
unintended radiofrequency emissions from electrical and electronic devices
When data are born analog, they are likely to contain more information than the minimum necessary for their immediate purpose, for several valid reasons. One is that the desired information (“signal”) must be sensed in the presence of unwanted extraneous information (“noise”). The technologies typically work by sensing the environment (“signal plus noise”) with high precision, so that mathematical techniques can then be applied that will separate the two even in the worst anticipated case when the signal is smallest or the noise is largest.
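One of the simplest such mathematical techniques is averaging many readings, sketched below in Python. The sketch assumes a constant signal and independent, zero-mean Gaussian noise, which real sensors only approximate; the numeric values and the function names `sense` and `estimate_signal` are hypothetical.

```python
import random
from statistics import mean


def sense(signal, noise_sigma, n_samples, seed=0):
    """Simulate a sensor: each reading is the true signal plus Gaussian noise."""
    rng = random.Random(seed)
    return [signal + rng.gauss(0.0, noise_sigma) for _ in range(n_samples)]


def estimate_signal(readings):
    """Estimate a constant signal by averaging; zero-mean noise tends to cancel."""
    return mean(readings)


if __name__ == "__main__":
    true_signal = 0.1   # small signal (hypothetical units)
    noise_sigma = 1.0   # noise ten times larger than the signal
    few = sense(true_signal, noise_sigma, n_samples=10)
    many = sense(true_signal, noise_sigma, n_samples=40_000)
    print("10 readings:    ", round(estimate_signal(few), 3))
    print("40,000 readings:", round(estimate_signal(many), 3))
```

Under these assumptions the error of the average shrinks roughly as one over the square root of the number of readings, so recovering a weak signal requires collecting far more raw data than the signal itself contains. That surplus is the source of the privacy concern.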
Another reason is technological convergence. For example, as the cameras in cell phones become smaller and cheaper, the use of identical components in other products becomes a favored design choice, even when full images are not needed. Where a big-screen television today has separate sensors for its IR remote control, room brightness, and motion detection (a feature that turns off the picture when no one is in the room), plus a true video camera in the add-on game console, tomorrow’s model may integrate all of these functions in a single, cheap, high-resolution, IR-sensitive camera, a few millimeters in size.

61
See: http://www.fitbit.com/
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
23
In addition to the information available from digital and analog sources consciously intended to provide information about people, inadvertent disclosure abounds from the emerging “Internet of Things,” an amalgamation of sensors whose primary purpose is enhanced by “smart” network-connected computational capabilities. Examples include “smart” thermostats that detect human presence and adjust air temperatures accordingly, “smart” automobile ignition systems, and locking systems that are biometrically triggered.
The privacy challenges of born-analog data are somewhat different from those of born-digital data. Where over-collection (as was defined above) is an irrational design choice for the principled digital designer (and therefore an identifiable red flag for privacy issues), over-collection in the analog domain can be a robust and economical design choice. A consequence is that born-analog data will often contain information that was not originally expected. Unexpected information could in many cases lead to unanticipated beneficial products and services, but it could also give opportunities for unanticipated misuse.
As a concrete example, one might consider three key parameters of video imaging: resolution (how many pixels in the image), contrast ratio (how well can the image see into dark regions), and photometric precision (how accurate is the image in brightness and color). All three parameters have improved by orders of magnitude and are likely to keep improving. Today, with special cameras, one can image a cityscape from a high rooftop and see clearly into every facing house and apartment window within several miles.
62
Or, already mentioned, the ability exists to sense remotely the pulse of an individual, giving information on health status and emotional state.
63
It is foreseeable, perhaps inevitable, that these capabilities will be present in every cell phone and security surveillance camera, or every wearable computer device. (Imagine the process of negotiating the price for a car, or negotiating an international trade agreement, when every participant’s Google Glass (or security camera or TV camera) is able to monitor and interpret the autonomic physiological state of every other participant, in real time.) It is unforeseeable what other unexpected information also lies in signals from the same sensors.
Once they enter the digital world, born-analog data can be fused and mined along with born-digital data. For example, facial-recognition algorithms, which might be error-prone in isolation, may yield nearly perfect identity tracking when they can be combined with born-digital data from cell phones (including unintended emanations), point-of-sale transactions, RFID tags, and so forth; and also with other born-analog data such as vehicle tracking (e.g., from overhead drones) and automated license-plate reading. Biometric data can provide identity information that enhances the profile of an individual even more, and data on behavior (as from social networks) are being used to analyze attitudes or emotions (“sentiment analysis,” for individuals or groups[64]). In short, more and more information can be captured and put in a quantified format so it can be tabulated and analyzed.[65]

[62] Koonin, Steven E., Gregory Dobler and Jonathan S. Wurtele, “Urban Physics,” American Physical Society News, March, 2014. http://www.aps.org/publications/apsnews/201403/urban.cfm
[63] Durand, Fredo, et al., “MIT Computer Program Reveals Invisible Motion in Video,” The New York Times, video, February 27, 2013. https://www.youtube.com/watch?v=3rWycBEHn3s
[64] Feldman, Ronen, “Techniques and Applications for Sentiment Analysis,” Communications of the ACM, 56:4, pp. 82-89.
[65] Mayer-Schönberger, Viktor and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think, Boston, NY: Houghton Mifflin Harcourt, 2013.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
24
3.2 Big data analytics
Analytics is what makes big data come alive. Without analytics, big datasets could be stored, and they could be retrieved, wholly or selectively. But what comes out would be exactly what went in. Analytics, comprising a number of different computational technologies, is what fuels the big-data revolution.[66] Analytics is what creates the new value in big datasets, vastly more than the sum of the values of the parts.[67]
3.2.1 Data mining
Data mining, sometimes loosely equated to analytics but actually only a subset of it, refers to a computational process that discovers patterns in large data sets. It is a convergence of many fields of academic research in both applied mathematics and computer science, including statistics, databases, artificial intelligence, and machine learning. Like other technologies, advances in data mining have a research and development stage, in which new algorithms and computer programs are developed, and they have subsequent phases of commercialization and application.
Data-mining algorithms can be trained to find patterns either by supervised learning, so-called because the algorithm is seeded with manually curated examples of the pattern to be recognized, or by unsupervised learning, where the algorithm tries to find related pieces of data without prior seeding. A recent success of unsupervised learning algorithms was a program that, searching millions of images on the web, figured out on its own that “cat” was a much-posted category.[68]
The desired output of data mining can take several forms, each with its own specialized algorithms.[69]
• Classification algorithms attempt to assign objects or events to known categories. For example, a hospital might want to classify discharged patients as high, medium, or low risk for readmission.
• Clustering algorithms group objects or events into categories by similarity, as in the “cat” example above.
• Regression algorithms (also called numerical prediction algorithms) try to predict numerical quantities. For example, a bank may want to predict, from the details in a loan application, the probability of a default.
• Association techniques try to find relationships between items in their data set. Amazon’s suggested products and Netflix’s suggested movies are examples.
• Anomaly detection algorithms look for untypical examples within a data set, for example, detecting fraudulent transactions on a credit card account.
• Summarization techniques attempt to find and present salient features in data. Examples include both simple statistical summaries (e.g., average student test scores by school and teacher), and higher-level analysis (e.g., a list of key facts about an individual as gleaned from all web postings that mention her).
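The anomaly-detection item in the list above can be made concrete with a minimal sketch. It flags unusual values with a modified z-score built on the median absolute deviation, which is only one of many possible heuristics. The function name `flag_anomalies`, the threshold of 3.5, and the sample card charges are illustrative assumptions, not drawn from any real fraud-detection system.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread at all, so nothing can be called unusual
    return [i for i, x in enumerate(amounts)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical charges on one credit card account.
charges = [12.5, 9.99, 14.2, 11.0, 13.75, 950.0, 10.4]
print(flag_anomalies(charges))  # [5], the 950.00 charge
```

A flagged value is only a candidate for review. Whether it is fraud or a legitimate large purchase is not something the statistic can decide.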

[66] National Research Council, Frontiers in Massive Data Analysis, National Academies Press, 2013.
[67] (1) Thill, Brent and Nicole Hayashi, Big Data = Big Disruption: One of the Most Transformative IT Trends Over the Next Decade, UBS Securities LLC, October 2013. (2) McKinsey Global Institute, Center for Government, and Business Technology Office, Open data: Unlocking innovation and performance with liquid information, McKinsey & Company, October 2013.
[68] Le, Q.V. et al., “Building High-level Features Using Large Scale Unsupervised Learning,” http://static.googleusercontent.com/media/research.google.com/en/us/archive/unsupervised_icml2012.pdf
[69] Bramer, M., “Principles of Data Mining,” Springer, 2013.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
25
Data mining is sometimes confused with machine learning, the latter a broad subfield of computer science in academic and industrial research.[70] Data mining makes use of machine learning, as well as other disciplines, while machine learning has applications to fields other than data mining, for example, robotics.
There are limitations, both practical and theoretical, to what data mining can accomplish, as well as limits to how accurate it can be. It may reveal patterns and relationships, but it usually cannot tell the user the value or significance of these patterns. For example, supervised learning based on the characteristics of known terrorists might find similar persons, but they might or might not be terrorists; and it would miss different classes of terrorists who don’t fit the profile.
Data mining can identify relationships between behaviors and/or variables, but these relationships do not always indicate causality. If people who live under high-voltage power lines have higher morbidity, it might mean that power lines are a hazard to public health; or it might mean that people who live under power lines tend to be poor and have inadequate access to health care. The policy implications are quite different. While so-called confounding variables (in this example, income) can be corrected for when they are known and understood, there is no sure way to know whether all of them have been identified. Imputing true causality in big data is a research field in its infancy.[71]
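The power-line example can be sketched numerically. The records below are entirely made up for illustration: they are constructed so that illness depends only on income, while low-income households are more likely to live near a power line. The helper names `illness_rate` and `rate` are hypothetical.

```python
# Synthetic, made-up records: (near_power_line, low_income, is_ill)
records = (
    [(True, True, True)] * 30 + [(True, True, False)] * 70
    + [(True, False, True)] * 2 + [(True, False, False)] * 18
    + [(False, True, True)] * 6 + [(False, True, False)] * 14
    + [(False, False, True)] * 10 + [(False, False, False)] * 90
)

def illness_rate(rows):
    """Fraction of the given records in which the person is ill."""
    rows = list(rows)
    return sum(1 for _, _, ill in rows if ill) / len(rows)

def rate(near, low_income=None):
    """Illness rate by power-line exposure, optionally within one income stratum."""
    return illness_rate(
        r for r in records
        if r[0] == near and (low_income is None or r[1] == low_income)
    )

# Unstratified, living near a power line appears to double the illness rate.
print("overall:    ", round(rate(True), 3), "vs", round(rate(False), 3))
# Within each income group the apparent effect vanishes.
print("low income: ", rate(True, True), "vs", rate(False, True))
print("high income:", rate(True, False), "vs", rate(False, False))
```

Stratifying works here only because income was recorded and known to matter. A confounder that was never measured could not be corrected for in this way.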
Many data analyses yield correlations that might or might not reflect causation. Some data analyses develop imperfect information, either because of limitations of the algorithms, or by the use of biased sampling. Indiscriminate use of these analyses may cause discrimination against individuals or a lack of fairness because of incorrect association with a particular group.[72] In using data analyses, particular care must be taken to protect the privacy of children and other protected groups.
Real-world data are incomplete and noisy. These data-quality issues lower the performance of data-mining algorithms and obscure outputs. When economics allow, careful screening and preparation of the input data can improve the quality of results, but this data preparation is often labor intensive and expensive. Users, especially in the commercial sector, must trade off cost and accuracy, sometimes with negative consequences for the individual represented in the data. Additionally, real-world data can contain extreme events or outliers. Outliers may be real events that, by chance, are overrepresented in the data; or they may be the result of data-entry or data-transmission errors. In both cases they can skew the model and degrade performance. The study of outliers is an important research area of statistics.
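A small sketch shows how a single outlier can skew a simple model of the data. The income figures and the data-entry error (two extra zeros) are hypothetical, and the comparison of mean against median is just one illustration of a statistic that is sensitive to outliers next to one that is robust.

```python
from statistics import mean, median

# Hypothetical household incomes.
incomes = [42_000, 48_500, 51_000, 39_750, 45_250]

# The same data with one assumed data-entry error: 45,250 typed as 4,525,000.
with_error = incomes + [4_525_000]

print(mean(incomes), median(incomes))               # the two summaries agree closely
print(round(mean(with_error)), median(with_error))  # the mean is dragged far upward
```

Any downstream use of the mean, such as a regression fit or a per-customer profile, would inherit the distortion unless the record is screened out or a robust estimator is used.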
3.2.2 Data fusion and information integration
Data fusion is the merging of multiple heterogeneous data sets into one homogeneous representation so that they can be better processed for data mining and management. Data fusion is used in a number of technical domains such as sensor networks, video/image processing, robotics and intelligent systems, and elsewhere.

[70] Mitchell, Tom M., “The Discipline of Machine Learning,” Technical Report CMU-ML-06-108, Carnegie Mellon University, July 2006.
[71] DARPA, for example, has a project involving machine learning and other technologies to build medical causal models from analysis of cancer literature, leveraging the greater capacity of a computer than a person to process information from a large number of sources. See description at http://www.darpa.mil/Our_Work/I2O/Programs/Big_Mechanism.aspx
[72] “Data mining breaks the basic intuition that identity is the greatest source of potential harm because it substitutes inference for identifying information as a bridge to get at additional facts.” Barocas, Solon and Helen Nissenbaum, “Big Data’s End Run Around Anonymity and Consent,” Chapter II, in Lane, Julia, et al., Privacy, Big Data, and the Public Good, Cambridge University Press, 2014.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
26
Data integration is differentiated from data fusion in that integration more broadly combines data sets and retains the larger set of information. In data fusion, there is usually a reduction or replacement technique. Data fusion is facilitated by data interoperability, the ability for two systems to communicate and exchange data.
Data fusion and data integration are key techniques for business intelligence. Retailers are integrating their online, in-store, and catalog sales databases to create more complete pictures of their customers. Williams-Sonoma, for example, has integrated customer databases with information on 60 million households. Variables including household income, housing values, and number of children are tracked. It is claimed that targeted emails based on this information yield ten to 18 times the response rate of emails that are not targeted.[73] This is a simple illustration of how more information can lead to better inferences. Techniques that can help to preserve privacy are emerging.[74]
There is a great amount of interest today in multi-sensor data fusion.[75] The biggest technical challenges being tackled today, generally through development of new and better algorithms, relate to data precision/resolution, outliers and spurious data, conflicting data, modality (both heterogeneous and homogeneous data) and dimensionality, data correlation, data alignment, association within data, centralized vs. decentralized processing, operational timing, and the ability to handle dynamic vs. static phenomena. Privacy concerns may arise from sensor fidelity and precision as well as correlation of data from multiple sensors. A single sensor’s output might not be sensitive, but the combination from two or more may raise privacy concerns.
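The point that two individually innocuous sensors can become sensitive in combination can be sketched as follows. The two readings (a thermostat presence sensor and a smart door lock), their hourly timestamps, and the function name `likely_empty` are all hypothetical; real multi-sensor fusion must also handle the alignment, conflict, and timing problems listed above.

```python
# Hypothetical hourly readings from two separate "smart" devices.
motion = {"08:00": True, "09:00": False, "10:00": False}            # thermostat presence sensor
door = {"08:00": "unlocked", "09:00": "locked", "10:00": "locked"}  # smart door lock

def likely_empty(motion, door):
    """Fuse both streams on their shared timestamps and return the times
    at which no presence is detected and the door is locked."""
    shared = motion.keys() & door.keys()
    return sorted(t for t in shared if not motion[t] and door[t] == "locked")

print(likely_empty(motion, door))  # ['09:00', '10:00']
```

Neither stream alone says much: a still room may hold a sleeping occupant, and a locked door may have someone behind it. Joined on time, they suggest when the home is unoccupied.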
3.2.3 Image and speech recognition
Image- and speech-recognition technologies are able to extract information, in some limited cases approaching human understanding, from massive corpuses of still images, videos, and recorded or broadcast speech.
Urban-scene extraction can be accomplished using a variety of data sources from photos and videos to ground-based LiDAR (a remote-sensing technique using lasers).[76] In the government sector, city models are becoming vital for urban planning and visualization. They are equally important for a broad range of academic disciplines including history, archeology, geography, and computer-graphics research. Digital city models are also central to popular consumer mapping and visualization applications such as Google Earth and Bing Maps, as well as GPS-enabled navigation systems.[77] Scene extraction is an example of the inadvertent capture of personal information and can be used for data fusion that reveals personal information.
Facial-recognition technologies are beginning to be practical in commercial and law-enforcement applications.[78] They are able to acquire, normalize, and recognize moving faces in dynamic scenes. Real-time video surveillance with single-camera systems (and some with multi-camera systems, which can both recognize objects and analyze activity) has a wide variety of applications in both public and private environments, such as homeland security, crime prevention, traffic control, accident prediction and detection, and monitoring patients, the elderly, and children at home.[79] Depending on the application, use of video surveillance is at varying levels of deployment.[80]

[73] Manyika, J. et al., “Big Data: The next frontier for innovation, competition, and productivity,” McKinsey Global Institute, 2011.
[74] Navarro-Arribas, G. and V. Torra, “Information fusion in data privacy: A survey,” Information Fusion, 13:4, 2012, pp. 235-244.
[75] Khaleghi, B. et al., “Multisensor data fusion: A review of the state-of-the-art,” Information Fusion, 14:1, 2013, pp. 28-44.
[76] Lam, J., et al., “Urban scene extraction from mobile ground based lidar data,” Proceedings of 3DPVT, 2010.
[77] Agarwal, S., et al., “Building Rome in a day,” Communications of the ACM, 54:10, 2011, pp. 105-112.
[78] Workshop on Frontiers in Image and Video Analysis, National Science Foundation, Federal Bureau of Investigation, Defense Advanced Research Projects Agency, and University of Maryland Institute for Advanced Computer Studies, January 28-29, 2014. http://www.umiacs.umd.edu/conferences/fiva/
Additional capabilities of image recognition include:
• Video summarization and scene-change detection (that is, picking the small number of images that summarize a period of time)
• Precise geolocation in imagery from satellites or drones
• Image-based biometrics
• Human-in-the-loop surveillance systems
• Re-identification of persons and vehicles, that is, tracking the same person or vehicle as it moves from sensor to sensor
• Human-activity recognition of various kinds
• Semantic summarization (that is, converting pictures into text summaries)
Although systems are expected to become able to track objects across camera views and detect unusual activities in a large area by combining information from multiple sources, re-identification of objects remains hard to do (a challenge for inter-camera tracking), as is video surveillance in crowded environments.
Although the data they use are often captured in public areas, scene-extraction technologies like Google Street View have triggered privacy concerns. Photos captured for use in Street View may contain sensitive information about people who are unaware they are being observed and photographed.[81]
Social-media data can be used as an input source for scene-extraction techniques. When these data are posted, however, users are unlikely to know that their data would be used in these aggregated ways and that their social-media information (although public) might appear synthesized in new forms.[82]
Automated speech recognition has existed since at least the 1950s,[83] but developments over the last 10 years have allowed for novel capabilities. Spoken text (e.g., news broadcasters reading part of a document) can today be recognized with accuracy higher than 95 percent using state-of-the-art techniques. Spontaneous speech is much harder to recognize accurately. In recent years there has been a dramatic increase in the corpuses of spontaneous speech data available to researchers, which has allowed for improved accuracy.

[79] For example, Newark Airport recently installed a system of 171 LED lights (from Sensity [http://www.sensity.com/]) that contain special chips to connect to sensors and cameras over a wireless system. These systems allow for advanced automatic lighting to improve security in places like parking garages, and in doing so capture a large range of information.
[80] This was discussed at the workshop cited in footnote 78.
[81] Such concerns are likely to grow as commercial satellite-imagery systems such as Skybox (http://skybox.com/) provide the basis for more services.
[82] Billitteri, Thomas J., et al. “Social Media Explosion: Do social networking sites threaten privacy rights?” CQ Researcher, January 25, 2013, 23:84-104.
[83] Juang, B.H. and Lawrence R. Rabiner, “Automated Speech Recognition - A Brief History of the Technology Development,” October 8, 2004. http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/354_LALIASRHistoryfinal108.pdf
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
28
Over the next few years speech-recognition interfaces will be in many more places. For example, multiple companies are exploring speech recognition to control televisions and cars, to find a show on TV, or to schedule a DVR recording. Researchers at Nuance say they are actively planning how speech technology would have to be designed to be available on wearable computers.[84] Google has already implemented some of this basic functionality in its Google Glass product, and Microsoft’s Xbox One system already integrates machine vision and multi-microphone audio input for controlling system functions.
3.2.4 Social-network analysis
Social-network analysis refers to the extraction of information from a variety of interconnecting units under the assumption that their relationships are important and that the units do not behave autonomously.[85] Social networks often emerge in an online context. The most obvious examples are dedicated online social-media platforms, such as Facebook, LinkedIn and Twitter, which provide new access to social interaction by allowing users to connect directly with each other over the Internet to communicate and share information. Offline human social networks may also leave analyzable digital traces, such as in phone-call metadata records that record which phones have exchanged calls or texts, and for how long. Analysis of social networks is increasingly enabled by the rising collection of digital data that links people together, especially when it is correlated to other data or metadata about the individual.[86] Tools for such analysis are being developed and made available,[87] motivated in part by the growing amount of social-network content accessible through open application programming interfaces to online social-media platforms. This sort of analysis is an active arena for research.
Social-network analysis complements analysis of conventional databases, and some of the techniques used (e.g., clustering in association networks) can be used in either context. Social-network analysis can be more powerful because of the easy association of diverse kinds of information (i.e., considerable data fusion is possible). It lends itself to visualization of the results, which aids in interpreting the results of the analysis. It can be used to learn about people through their association with others, in a context of people’s tendency to associate with others who have some similarities to themselves.[88]
[84] “Where Speech Recognition is Going,” Technology Review, May 29, 2012. http://www.kurzweilai.net/wherespeechrecognitionisgoing
[85] Wasserman, S. “Social network analysis: Methods and applications,” Cambridge University Press, 8, 1994.
[86] See, for example: (1) Backstrom, Lars, et al., “Inferring Social Ties from Geographic Coincidences,” Proceedings of the National Academy of Sciences, 2010. (2) Backstrom, Lars, et al., “Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography,” International World Wide Web Conference 2007, Alberta, Canada, May 12, 2007.
[87] A variety of tools exist for managing, analyzing, visualizing and manipulating network (graph) datasets, such as Allegrograph, GraphVis, R, visone and Wolfram Alpha. Some, such as Cytoscape, Gephi and Netviz, are open source.
[88] (1) Getoor, L. and E. Zheleva, “Preserving the privacy of sensitive relationships in graph data,” Privacy, security, and trust in KDD, 153-171, 2008. (2) Mislove, A., et al., “An analysis of social-based network Sybil defenses,” ACM SIGCOMM Computer Communication Review, 2011. (3) Backstrom, Lars, et al., “Find Me If You Can: Improving Geographic Prediction with Social and Spatial Proximity,” Proceedings of the 19th international conference on World Wide Web, 2010. (4) Backstrom, L. and J. Kleinberg, “Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook,” Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW), 2014.

Social-network analysis is yielding results that may surprise people. In particular, unique identification of an individual is easier than from database analysis alone. Moreover, it is achieved through more diverse kinds of data than many people may understand, contributing to the erosion of anonymity.[89] The structure of an individual’s network is unique and itself serves as an identifier; co-occurrence in time and space is a significant means of identification; and, as discussed elsewhere in this report, different kinds of data can be combined to foster identification.[90]
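The claim that network structure can itself act as an identifier can be illustrated with a toy sketch. It assumes a released graph whose names have been replaced by pseudonyms, and an observer who knows only how many contacts a target has and how many contacts each of those contacts has. The edge list, the choice of signature, and the function names are hypothetical; the de-anonymization attacks in the cited literature are considerably more sophisticated.

```python
from collections import Counter

# Hypothetical "anonymized" graph: pseudonymous nodes, real structure.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e"), ("e", "f")]

def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def signatures(edges):
    """Structural signature per node: (own degree, sorted neighbor degrees)."""
    adj = adjacency(edges)
    return {n: (len(adj[n]), tuple(sorted(len(adj[m]) for m in adj[n])))
            for n in adj}

def uniquely_identifiable(edges):
    """Nodes whose signature is shared by no other node in the graph."""
    sigs = signatures(edges)
    counts = Counter(sigs.values())
    return sorted(n for n, s in sigs.items() if counts[s] == 1)

print(uniquely_identifiable(edges))  # ['a', 'e', 'f']
```

In this toy graph, half of the pseudonymous nodes can be singled out from purely structural knowledge, with no names or attributes involved.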
Social-network analysis is used in criminal forensic investigations to understand the links, means, and motives of those who may have committed crimes. In particular, social-network analysis has been used to better understand covert terrorist networks, whose dynamics may be different from those of overt networks.[91]
In the realm of commerce, it is well understood that what a person’s friends like or buy can influence what he or she might buy. For example, in 2010, it was reported that having one iPhone-owning friend makes a person three times more likely to own an iPhone than otherwise. A person with two iPhone-owning friends was five times more likely to have one.[92] Such correlations emerge in social-network analysis and can be used to help predict product trends, tailor marketing campaigns towards products an individual may be more likely to want, and target customers (said to have higher “network value”) with a central role (and a large amount of influence) in a social network.[93]
Because disease is commonly spread via direct contact between individuals (humans or animals), understanding social networks through whatever proxies are available can suggest possible direct contacts and thereby assist in monitoring and stemming the outbreak of disease.
A recent study by researchers at Facebook analyzed the relationship between geographic location of individual users and that of their friends. From this analysis, they were able to create an algorithm to predict the location of an individual user based upon the locations of a small number of friends in their network, with higher accuracy than simply looking at the user’s IP address.[94]
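The general idea of inferring a user’s location from friends’ locations can be sketched with a simple majority vote. This is a deliberately crude stand-in and not the method of the cited study; the function name `predict_location`, the friend lists, and the city labels are hypothetical.

```python
from collections import Counter

def predict_location(user, friends, known_locations):
    """Guess a user's location as the most common known location among
    that user's friends; return None if no friend has a known location."""
    votes = Counter(
        known_locations[f]
        for f in friends.get(user, ())
        if f in known_locations
    )
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical data: user "u" has disclosed no location, but some friends have.
friends = {"u": ["a", "b", "c", "d"]}
known_locations = {"a": "Austin", "b": "Austin", "c": "Boston"}

print(predict_location("u", friends, known_locations))  # Austin
```

The privacy-relevant point is that the user never supplies a location. The inference rests entirely on what other people have disclosed.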
There are many commercial “social listening” services, such as Radian6/Salesforce Cloud, Collective Intellect, Lithium, and others, that mine data from social-networking feeds for use in business intelligence.[95] Coupled with social-network analysis, this information can be used to evaluate changing influences and the spread of trends between individuals and communities to inform marketing strategies.

[89] (1) Narayanan, A. and V. Shmatikov, “De-anonymizing social networks,” 30th IEEE Symposium on Security and Privacy, 173-187, 2009. (2) Crandall, David J., et al., “Inferring social ties from geographic coincidences,” Proceedings of the National Academy of Sciences, 107:52, 2010. (3) Backstrom, L, C. Dwork and J. Kleinberg, “Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography,” Proceedings of the 16th Intl. World Wide Web Conference, 2007. (4) Saramäki, Jari, et al., “Persistence of social signatures in human communication,” Proceedings of the National Academy of Sciences, 111.3:942-947, 2014.
[90] Fienberg, S.E., “Is the Privacy of Network Data an Oxymoron?” Journal of Privacy and Confidentiality, 4:2, 2013.
[91] Krebs, V.E., “Mapping networks of terrorist cells,” Connections, 24.3:43-52, 2002.
[92] Sundsøy, P.R., et al., “Product adoption networks and their growth in a large mobile phone network,” Advances in Social Networks Analysis and Mining (ASONAM), 2010.
[93] Hodgson, Bob, “A Vital New Marketing Metric: The Network Value of a Customer,” Predictive Marketing: Optimize Your ROI With Analytics. http://predictivemarketing.com/index.php/avitalnewmarketingmetricthenetworkvalueofacustomer/
[94] Backstrom, Lars et al, “Find me if you can: improving geographical prediction with social and spatial proximity,” Proceedings of the 19th international conference on World Wide Web, 2010.
[95] “Top 20 social media monitoring vendors for business,” Socialmedia.biz, http://socialmedia.biz/2011/01/12/top20socialmediamonitoringvendorsforbusiness/
3.3 The infrastructure behind big data
Big-data analytics requires not just algorithms and data, but also physical platforms where the data are stored and analyzed. The related security services used for personal data (see Sections 4.1 and 4.2) are also an essential component of the infrastructure. Once available only to large organizations, this class of infrastructure is now available through “the cloud” to small businesses and to individuals. To the extent that the software infrastructure is widely shared, privacy-preserving infrastructure services can also be more readily used.
3.3.1 Data centers
One way to think about big-data platforms is in physical units of “data centers.” In recent years, data centers have become almost standard commodities. A typical data center is a large, warehouse-like building on a concrete slab the size of a few football fields. It is located with good access to cheap electric power and to a fiber-optic, Internet-backbone connection, usually in a rural or isolated area. The typical center consumes 20-40 megawatts of power (the equivalent of a city with 20,000-40,000 residents) and today houses some tens of thousands of servers and hard-disk drives, totaling some tens of petabytes.[96] Worldwide, there are roughly 6000 data centers of this scale, about half in the United States.[97]
Data centers are the physical locus of big data in all its forms. Large data collections are often replicated in multiple data centers to improve both performance and robustness. There is a growing marketplace in selling data-center services.
Specialized software technology allows the data in multiple data centers (and spread across tens of thousands of processors and hard-disk drives) to cooperate in performing the tasks of data analytics, thereby providing both scaling and better performance. For example, MapReduce (originally a proprietary technology of Google, but now a term used generically) is a programming model for parallel operations across a practically unlimited number of processors; Hadoop is a popular open-source programming platform and program library based on the same ideas; NoSQL (the name derived from “not Structured Query Language”) is a set of database technologies that relaxes many of the restrictions of traditional, “relational” databases and allows for better scalability across the many processors in one or more data centers. Contemporary research is aimed at the next generation beyond Hadoop. One path is represented by Accumulo, initiated by the National Security Agency and transitioned to the open-source Apache community.[98] Another is the Berkeley Data Analytics Stack, an open-source platform that outperforms Hadoop by a factor of 100 for memory-intensive data analytics and is being used by such companies as Foursquare, Conviva, Klout, Quantifind, Yahoo, and Amazon Web Services.[99] Sometimes termed “NoHadoop” (to parallel the movement from SQL to NoSQL), technologies that fit this trend include Google’s Dremel, MPI (typically used in supercomputing), Pregel (for graphs), and Cloudscale (for real-time analytics).
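The MapReduce programming model mentioned above can be sketched in a few lines. This is a single-process illustration of the map, shuffle, and reduce steps using the customary word-count example; it uses no Hadoop or Google API, the function names are hypothetical, and a real deployment would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different processor; here it is a loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Each key could likewise be reduced on a different processor.
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data", "big analytics", "data data"])
print(counts)  # {'big': 2, 'data': 3, 'analytics': 1}
```

The model scales because each map call sees only its own document and each reduce call sees only its own key, so neither step needs shared state.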

[96] A petabyte is 10^15 bytes. One petabyte could store the individual genomes of the entire U.S. population. The human brain has been estimated to have a capacity of 2.5 petabytes.
[97] McLellan, Charles, “The 21st Century Data Center: An Overview,” ZDNet, April 2, 2013. http://www.zdnet.com/the21stcenturydatacenteranoverview7000012996/
[98] See: http://accumulo.apache.org/
[99] See: https://amplab.cs.berkeley.edu/software/
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
31
3.3.2 The cloud
The “cloud” is not just the world inventory of data centers (although much of the public may think of it as such). Rather, one way of understanding the cloud is as a set of platforms and services made possible by the physical commoditization of data centers. When one says that data are “in the cloud,” one refers not just to the physical hard-disk drives that exist (somewhere!) with the data, but also to the complex infrastructure of application programs, middleware, networking protocols, and (not least) business models that allow that data to be ingested, accessed, and utilized, all with costs that are competitively allocated. The commercial entities that, in aggregate, provision the cloud exist in an ecosystem that has many hierarchical levels and many different coexisting models of value added. There may be several handoffs of responsibility between the end user and the physical data center.
Today’s cloud providers offer some security benefits (and through that, privacy benefits) as compared to yesterday’s conventional corporate data centers or small-business computers.[100] These services may include better physical protection and monitoring, as well as centralized support staffing, training, and oversight. Cloud services also pose new challenges for security, a subject of current research. Both benefits and risks come from the centralization of resources: More data are held by a given entity (albeit distributed across multiple servers or sites), and a cloud provider can perform better than separately held data centers by applying high standards to recruiting and managing people and systems.
Usage of the cloud and individual interactions with it (whether witting or not) are expected to increase dramatically in coming years. The rise of both mobile apps,[101] reinforcing the use of cell phones and tablets as platforms, and broadly distributed sensors is associated with the growing use of cloud systems for storing, processing, and otherwise acting on information contributed by dispersed devices. Although progress in the mobile environment improves the usability of mobile cloud applications, it may be detrimental to privacy to the extent that it more effectively hides information exchange from the user. As more core mobile functionality is transitioned to the cloud, larger amounts of information will be exchanged, and users may be surprised by the nature of the information that no longer remains localized to their cell phone. For example, cloud-based screen rendering (or “virtualized screens”) for cell phones would mean that the images shown on a cell-phone screen will actually be calculated on the cloud and transmitted to the mobile device. This means all the images on the screen of the mobile device can be accessed and manipulated from the cloud.
Cloud architectures are also being used increasingly to support big-data analytics, both by large enterprises (e.g., Google, Amazon, eBay) and by small entities or individuals who make ad hoc or routine use of public cloud platforms (e.g., Amazon Web Services, Google Cloud Platform, Microsoft Azure) in lieu of acquiring their own infrastructure. Social-media services such as Facebook and Twitter are deployed and analyzed by their providers using cloud systems. These uses represent a kind of democratization of analytics, with the potential to facilitate new businesses and more. Prospects for the future include exploration of options for federating or interconnecting cloud applications and for reducing some of the heterogeneity in application programming interfaces for cloud applications.[102]

[100] Cloud Security Alliance, “Big Data Working Group: Comment on Big Data and the Future of Privacy,” March 2014. https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Comment_on_Big_Data_Future_of_Privacy.pdf
[101] Qi, H. and A. Gani, “Research on mobile cloud computing: Review, trend and perspectives,” Digital Information and Communication Technology and it's Applications (DICTAP), 2012 Second International Conference on, 2012.
[102] Jeffery, K. et al., “A vision for better cloud applications,” Proceedings of the 2013 International Workshop on Multi-Cloud Applications and Federated Clouds, Prague, Czech Republic, MODAClouds, ACM Digital Library, April 22-23, 2013.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
33
4. Technologies and Strategies for Privacy Protection
Data come into existence, are collected, and are possibly processed immediately (including adding “metadata”), possibly communicated, possibly stored (locally, remotely, or both), possibly copied, possibly analyzed, possibly communicated to users, possibly archived, possibly discarded. Technology at any of these stages can affect privacy positively or negatively.
This chapter focuses on the positive and assesses some of the key technologies that can be used in service of the protection of privacy. It seeks to clarify the important distinctions between privacy and (cyber-)security, as well as the vital, but yet limited, role that encryption technology can play. Some older techniques, such as anonymization, while valuable in the past, are seen as having only limited future potential. Newer technologies, some entering the marketplace and some requiring further research, are summarized.
4.1 The relationship between cybersecurity and privacy
Cybersecurity is a discipline, or set of technologies, that seeks to enforce policies relating to several different aspects of computer use and electronic communication.[103] A typical list of such aspects would be:
• identity and authentication: Are you who you say you are?
• authorization: What are you allowed to do?
• availability: Can attackers interfere with authorized functions?
• confidentiality: Can data or communications be (passively) copied by someone not authorized to do so?
• integrity: Can data or communications be (actively) changed or manipulated by someone not authorized?
• non-repudiation, auditability: Can actions (payments may provide the best example) later be shown to have occurred?
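As one concrete instance of the aspects listed above, the integrity question can be sketched with a keyed message authentication code from the Python standard library. The key, the payment message, and the function names `sign` and `verify` are hypothetical. Note that a shared-key code of this kind addresses integrity between parties who hold the key; it does not by itself provide non-repudiation, for which digital signatures are typically used.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Return True only if the tag matches the message under this key."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign(key, message), tag)

key = b"demo-shared-secret"  # hypothetical key, for illustration only
payment = b"pay 100 to account 42"
tag = sign(key, payment)

print(verify(key, payment, tag))                   # True: message unchanged
print(verify(key, b"pay 900 to account 42", tag))  # False: message was altered
```

The policy being enforced here is precise: a message either carries a valid tag or it does not. That precision is exactly what is much harder to achieve for privacy preferences.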
Good cybersecurity enforces policies that are precise and unambiguous. Indeed, such clarity of policy, expressible in mathematical terms, is a necessary prerequisite for the Holy Grail of cybersecurity, “provably secure” systems. At present, provable security exists only in very limited domains, for example, for certain functions on some kinds of computer chips. It is a goal of cybersecurity research to extend the scope of provably secure systems to larger and larger domains. Meanwhile, practical cybersecurity draws on the emerging principles of such research, but it is guided even more by practical lessons learned from known failures of cybersecurity. The realistic goal is that the practice of cybersecurity should be continuously improving so as to be, in most places and most of the time, ahead of the evolving threat.
Poorcybersecurityisclearlyathreattoprivacy.Privacycanbebr eachedbyfailuretoenforceconfidentialityof
data,byfailureofidentityandauthenticationprocesses,orbymorecomplexscenariossuchasthose
compromisingavailability.

[103] PCAST has addressed issues in cybersecurity, both in reviewing the NITRD programs and directly
in a 2013 report, Immediate Opportunities for Strengthening the Nation’s Cybersecurity.
http://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_cybersecurity_nov2013.pdf
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
34
Security and privacy share a focus on malice. The security of data can be compromised by
inadvertence or accident, but it can also be compromised because some party acted knowingly to
achieve the compromise (in the language of security, committed an attack). Substituting the words
“breach” or “invasion” for “compromise” or “attack,” the same concepts apply to privacy.
Even if there were perfect cybersecurity, however, privacy would remain at risk. Violations of
privacy are possible even when there is no failure in computer security. If an authorized individual
chooses to misuse (e.g., disclose) data, what is violated is privacy policy, not security policy.
Or, as we have discussed (see Section 3.1.1), privacy may be violated by the fusion of data even if
performed by authorized individuals on secure computer systems.[104]
Privacy is different from security in other respects. For one thing, it is harder to codify privacy
policies precisely. Arguably this is because the presuppositions and preferences of human beings
have greater diversity than the useful scope of assertions about computer security. Indeed, how to
codify human privacy preferences is an important, nascent area of research.[105]
When people provide assurance (at some level) that a computer system is secure, they are saying
something about applications that are not yet invented: They are asserting that technological design
features already in the machine today will prevent such application programs from violating
pertinent security policies in that machine, even tomorrow.[106] Assurances about privacy are much
more precarious. Since not-yet-invented applications will have access to not-yet-imagined new
sources of data, as well as to not-yet-discovered powerful algorithms, it is much harder to provide,
today, technological safeguards against a new route to violation of privacy tomorrow. Security deals
with tomorrow’s threats against today’s platforms. That is hard enough. But privacy deals with
tomorrow’s threats against tomorrow’s platforms, since those “platforms” comprise not just hardware
and software, but also new kinds of data and new algorithms.
Computer scientists often work from the basis of a formal policy for security, just as engineers aim
to describe something explicitly so that they can design specific ways to deal with it by purely
technical means. As more computer scientists begin to think about privacy, there is increasing
attention to formal articulation of privacy policy.[107] To caricature, you have to know what you
are doing to know whether what you are doing is doing the right thing.[108] Research addressing the
challenges of aligning regulations and policies with software specifications includes formal
languages to express policies and system requirements; tools to reason about conflicts,
inconsistencies, and ambiguities within and among policies and software specifications; methods to
enable requirements engineers, business analysts, and software developers to analyze and refine
policy into measurable system specifications that can be monitored over time; formalizing and
enforcing privacy through auditing and accountability systems; privacy compliance in big-data
systems; and formalizing and enforcing purpose restrictions.

[104] There are also choices in the design and implementation of security mechanisms that affect
privacy. In particular, authentication or the attempt to demonstrate identity at some level can be
done with varying degrees of disclosure. See, for example: Computer Science and Telecommunications
Board, Who Goes There: Authentication Through the Lens of Privacy, National Academies Press, 2003.
[105] Such research can inform efforts to automate the checking of compliance with policies and/or
associated auditing.
[106] This future-proofing remains hard to achieve; PCAST’s cybersecurity report advocated
approaches that would be more durable than the kinds of checklists that are easily rendered
obsolete. See:
http://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_cybersecurity_nov2013.pdf
[107] See, for example: (1) Breaux, Travis D., and Ashwini Rao, “Formal Analysis of Privacy
Requirements Specifications for Multi-Tier Applications,” 21st IEEE Requirements Engineering
Conference (RE 2013), Rio de Janeiro, Brazil, July 2013.
http://www.cs.cmu.edu/~agrao/paper/Analysis_of_Privacy_Requirements_Facebook_Google_Zynga.pdf
(2) Feigenbaum, Joan, et al., “Towards a Formal Model of Accountability,” New Security Paradigms
Workshop 2011, Marin County, CA, September 12-15, 2011.
http://www.nspw.org/papers/2011/nspw2011feigenbaum.pdf
[108] Landwehr, Carl, “Engineered Controls for Dealing with Big Data,” Chapter 10, in Lane, Julia,
et al., Privacy, Big Data, and the Public Good, Cambridge University Press, 2014.
4.2 Cryptography and encryption
Cryptography comprises a set of algorithms and system design principles, some well developed and
others nascent, for protecting data. Cryptography is a field of knowledge whose products are
encryption technology. With well-designed protocols, encryption technology is an inhibitor to
compromising privacy, but it is not a “silver bullet.”[109]

4.2.1 Well-established encryption technology
Using cryptography, readable data of any kind, termed plaintext, are transformed into what are, for
all intents and purposes, incomprehensible strings of provably random bits, so-called cryptotext.
Cryptotext requires no security protection of any kind. It can be stored in the cloud or sent
anywhere that is convenient. It can be sent promiscuously to both the NSA and Russian FSB. If they
have only cryptotext, and if it was properly generated in a precise mathematical sense, it is
useless to them. They can neither read the data nor compute with it.
What is needed to decrypt, to turn cryptotext back into the original plaintext, is a “key,” which is
in practice a string of bits that is supposed to be known to (or computable by) only authorized
users. Only with the key can encrypted data be used, i.e., their value read.
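The plaintext, cryptotext, and key relationship described above can be illustrated with a minimal sketch. It uses a one-time pad (XOR with a random key as long as the message), chosen here only because it fits in a few lines of standard-library Python; it is an assumption of this example, not the cipher used in deployed systems, and the function names are hypothetical.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Return `length` random bytes to serve as a single-use key."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with an equal-length key; the same call encrypts and decrypts."""
    if len(data) != len(key):
        raise ValueError("a one-time pad key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = "Patient X: blood type O+".encode("utf-8")
key = generate_key(len(plaintext))        # must be kept secret and never reused
cryptotext = xor_bytes(plaintext, key)    # safe to store or send anywhere
recovered = xor_bytes(cryptotext, key)    # only a key holder can do this
```

Without `key`, the bytes in `cryptotext` carry no usable information about the plaintext; with it, decryption is the same XOR applied again.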
In the context of protecting privacy, it is primarily not the cryptography that is of concern.[110]
Rather, compromises of data will occur in one of two main ways:
• Data can be stolen, or mistakenly shared, before they have been encrypted or after they have been
  decrypted. Many attacks on supposedly encrypted data are actually attacks on machines that
  contain, however briefly, unencrypted plaintext. For example, in Target’s 2013 breach of one
  hundred million debit card numbers and personal identification numbers (PINs), the PINs were
  present in unencrypted form only ephemerally. They were stolen nonetheless.[111]
• Keys must be authorized, generated, distributed, and used. At every stage of a key’s life, it is
  potentially open to compromise or misuse that can ultimately compromise the data that the key was
  intended to protect. No system based on encryption is secure, of course, if persons with access to
  private keys can be coerced into sharing them.

[109] The use of this term in computing originated with what is now viewed as a classic article:
Brooks, Fred P., “No Silver Bullet: Essence and Accidents of Software Engineering,” IEEE Computer
20:4, April 1987, pp. 10-19.
[110] Attacks that compromise the hardware or software that does the encrypting (for example, the
promulgation of intentionally weak cryptography standards) can be considered to be a variant of
attacks that reveal plaintext.
[111] “Krebs on Security, collected posts on Target data breach,” 2014.
http://krebsonsecurity.com/tag/targetdatabreach/
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
36
Until the 1970s, keys were distributed physically, on paper or computer media, protected by
registered mail, armed guards, or anything in between. The invention of “public-key
cryptography”[112] changed everything. Public-key cryptography, as the name implies, allows
individuals to broadcast publicly their personal key. But this public key is only an encryption key,
useful for turning plaintext into cryptotext that is meaningless to others. Its corresponding
“private key,” used to transform cryptotext to plaintext, is still kept secret by the recipient.
Public-key cryptography thus turns the problem of key distribution into a problem of identity
determination. Alice’s messages (encrypted data transmissions) to Bob are completely protected by
Bob’s public key, but only if Alice is certain that it is really Bob’s public key that she is using,
and not the public key of someone merely masquerading as Bob.
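The split between a broadcast encryption key and a secret decryption key can be sketched with textbook RSA. This is an illustration under stated assumptions only: the primes are two well-known Mersenne primes that are far too small for real use, there is no padding, and the names `BOB_PUBLIC_KEY` and `BOB_PRIVATE_KEY` are hypothetical.

```python
# Toy key generation. Real keys use much larger, randomly chosen primes.
P = 2**31 - 1                    # Mersenne prime
Q = 2**61 - 1                    # Mersenne prime
N = P * Q                        # public modulus
E = 65537                        # public exponent
PHI = (P - 1) * (Q - 1)
D = pow(E, -1, PHI)              # private exponent (modular inverse, Python 3.8+)

BOB_PUBLIC_KEY = (E, N)          # Bob may broadcast this to anyone
BOB_PRIVATE_KEY = (D, N)         # Bob keeps this secret


def encrypt(message: int, public_key: tuple) -> int:
    """Anyone holding the public key can turn a number below N into cryptotext."""
    exponent, modulus = public_key
    if not 0 <= message < modulus:
        raise ValueError("message must be a non-negative integer below the modulus")
    return pow(message, exponent, modulus)


def decrypt(cryptotext: int, private_key: tuple) -> int:
    """Only the private-key holder can recover the original number."""
    exponent, modulus = private_key
    return pow(cryptotext, exponent, modulus)


# Alice encodes a short message as an integer and encrypts it for Bob.
message = int.from_bytes(b"hi Bob", "big")
cryptotext = encrypt(message, BOB_PUBLIC_KEY)
recovered = decrypt(cryptotext, BOB_PRIVATE_KEY).to_bytes(6, "big")
```

Nothing in the sketch tells Alice that `BOB_PUBLIC_KEY` really belongs to Bob, which is the identity problem the text goes on to discuss.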
Luckily, public-key cryptography also provides some techniques for helping to establish identity,
namely the electronic “signing” of messages to document their authenticity. Electronic signatures,
in turn, enable messages of the form “I, a person of authority known as X, certify that the
following is really the public key of subordinate person Y. (Signed) X.” Messages like this are
termed certificates. Certificates can be cascaded, with A certifying the identity of B, who
certifies C, and so on. Certificates essentially transform the identity problem from one of
validating the identity of millions of possible Y’s to validating the identity of a much smaller
number of top-level certificate authorities (CAs). Yet it is a matter of concern that more than 100
top-level CAs are widely recognized (e.g., accepted by almost all web browsers), because there may
be several intermediate steps in the hierarchy of certificates from a CA to a user, and at every
step a private key must be protected by some signer on some computer. The compromise of this private
key potentially compromises the privacy of all users lower down the chain because forged
certificates of identity can now be created. Such exploits have been seen. For example, the 2011
apparent theft of a Dutch CA’s private key compromised the privacy of potentially all government
records in the Netherlands.[113, 114]

Many major companies have recently introduced or strengthened their use of encryption to transmit
data.[115] Some are now using “(perfect) forward secrecy,” a variant of public-key cryptography that
ensures that the compromise of an individual’s private key can compromise only messages that he
receives subsequently, while the confidentiality of past conversations is maintained, even if their
cryptotext was previously recorded by the same eavesdropper now in possession of the purloined
private key.[116]
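One common way to obtain forward secrecy is an ephemeral Diffie-Hellman exchange: each conversation derives its key from fresh, throwaway secrets, so a long-term private key stolen later reveals nothing about recorded sessions. The sketch below assumes a toy group (the Mersenne prime 2^127 - 1 with base 3) and omits the signing of the ephemeral values with long-term keys that a real protocol needs for authentication; all names are hypothetical.

```python
import hashlib
import secrets

PRIME = 2**127 - 1      # toy modulus; real systems use standardized groups or curves
GENERATOR = 3


def ephemeral_keypair() -> tuple:
    """Create a fresh (secret, public) pair intended for one session only."""
    secret = secrets.randbelow(PRIME - 3) + 2          # value in [2, PRIME - 2]
    public = pow(GENERATOR, secret, PRIME)
    return secret, public


def session_key(my_secret: int, their_public: int) -> bytes:
    """Derive a symmetric session key from the shared Diffie-Hellman value."""
    shared = pow(their_public, my_secret, PRIME)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def run_session() -> tuple:
    """Simulate one conversation; return the key each side derives."""
    alice_secret, alice_public = ephemeral_keypair()
    bob_secret, bob_public = ephemeral_keypair()
    alice_key = session_key(alice_secret, bob_public)   # only public values are exchanged
    bob_key = session_key(bob_secret, alice_public)
    # The ephemeral secrets go out of scope here and are never stored.
    return alice_key, bob_key


first_alice_key, first_bob_key = run_session()
second_alice_key, second_bob_key = run_session()
```

Because every call to `run_session` draws new secrets, recovering one session's key (or any long-term key) gives an eavesdropper no handle on the others.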
[112] Public-key encryption originated through the secret work of British mathematicians at the
U.K.’s Government Communications Headquarters (GCHQ), an organization roughly analogous to the NSA,
and received broader attention through the independent work by researchers including Whitfield
Diffie and Martin Hellman in the United States.
[113] Fisher, Dennis, “Final Report on DigiNotar Hack Shows Total Compromise of CA Servers,”
ThreatPost, October 31, 2012.
http://threatpost.com/finalreportdiginotarhackshowstotalcompromisecaservers103112/77170.
[114] It is not publicly known whether or not the earlier 2010 compromise of servers belonging to
VeriSign, a much larger CA, led to compromises of certificates or signing authorities. Bradley,
Tony, “VeriSign Hacked: What We Don't Know Might Hurt Us,” PC World, February 2, 2012.
http://www.pcworld.com/article/249242/verisign_hacked_what_we_dont_know_might_hurt_us.html
[115] A sample report card:
https://www.eff.org/deeplinks/2013/11/encryptwebreportwhosdoingwhat#cryptochart
[116] Diffie, Whitfield, et al., “Authentication and Authenticated Key Exchanges,” Designs, Codes
and Cryptography 2:2, June 1992, pp. 107-125.

4.2.2 Encryption frontiers
The technologies thus far mentioned enable the protection of data both in storage and in transit,
allowing those data to be fully decrypted by users who either (i) have the right key already (as
might be the case for persons storing data for their own later use), or (ii) are authorized by the
data owner and have identities certified by a CA that is itself trusted by the data owner. A
frontier of cryptography research, with some inventions now starting to make it into practice, is
how to create different kinds of keys, ones which give only limited access of various kinds, or
which allow messages to be sent to classes of individuals without knowing in advance exactly who
they may be.
For example, “identity-based encryption” and “attribute-based encryption” are ways of sending a
message, or protecting a file of data, for the exclusive use of “a person named Ramona Q. Doe who
was born on May 23, 1980,” or for “anyone with the job title ombudsman, ombudsperson, or consumer
advocate.” These techniques require a trusted third party (essentially a certificate authority), but
the messages themselves do not need to pass through the hands of that third party. These tools are
in early stages of adoption.
“Zero-knowledge” systems allow encrypted data to be queried for certain higher-level abstractions
without revealing the low-level data. For example, a website operator could verify that a user is
over age 21 without learning the user’s actual birth date. What is remarkable is that this can be
done in a way that proves mathematically that the user is not lying about his age: The operator
learns with mathematical certainty that a certificate (signed by some CA of course!) attests to the
user’s birth date, without ever actually seeing that certificate. Zero-knowledge systems are just
beginning to be commercialized in simple cases. They are not foreseeably extendable to complex and
unstructured situations, such as what might be needed for the research mining of health-record data
from nonconsenting patients.
In some simpler domains, for example location privacy, practical cryptographic protection is closer
to reality. The typical case might be that a group of friends want to know when they are close to
one another, but without sharing their actual locations with any third party. Applications like this
are, of course, much simpler if there is a trusted third party, as is de facto the case for most
such commercial applications today.
Homomorphic encryption is a research area that goes beyond the mere querying of encrypted databases
to actual computations (e.g., the collection of statistics) using encrypted data without ever
decrypting it. These techniques are far from being practical, and they are unlikely to provide
policy options on the time scale relevant to this report.
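To make "computing on encrypted data" concrete, the sketch below uses the Paillier cryptosystem, which supports only addition of encrypted values and is therefore a much narrower (and older) technique than the general homomorphic schemes the paragraph refers to. The tiny Mersenne-prime key and all function names are assumptions of this example and are not secure parameters.

```python
import math
import secrets

# Toy key material; real deployments use primes of 1024 bits or more.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                                              # public key
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private: lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)                                # private: inverse of LAMBDA mod N


def encrypt(value: int) -> int:
    """Encrypt an integer in [0, N) under the public key, using generator N + 1."""
    if not 0 <= value < N:
        raise ValueError("value out of range for this key")
    while True:
        blinding = secrets.randbelow(N - 1) + 1
        if math.gcd(blinding, N) == 1:
            break
    return ((1 + value * N) * pow(blinding, N, N_SQ)) % N_SQ


def decrypt(cryptotext: int) -> int:
    """Recover the integer using the private values LAMBDA and MU."""
    x = pow(cryptotext, LAMBDA, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying two cryptotexts yields an encryption of the sum of their values."""
    return (c1 * c2) % N_SQ


# An untrusted party totals three encrypted salaries without seeing any of them.
encrypted_salaries = [encrypt(s) for s in (52000, 61000, 48000)]
encrypted_total = encrypted_salaries[0]
for c in encrypted_salaries[1:]:
    encrypted_total = add_encrypted(encrypted_total, c)
total = decrypt(encrypted_total)       # only the key holder learns the statistic
```

The party doing the arithmetic handles only cryptotext; the plaintext total appears only when the private-key holder decrypts it.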
In secure multiparty computation, which is related to homomorphic encryption and is of particular
interest in the financial sector, computation may be done on distributed data stores that are
encrypted. Although individual data are kept private using “collusion-robust” encryption algorithms,
data can be used to calculate general statistics. Parties that each know some private data use a
protocol that generates useful results based on both information they know and information they do
not know, without revealing to them data they do not already know.
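A simple instance of this idea is a secure sum built on additive secret sharing, sketched below. It assumes honest parties who follow the protocol and simulates all of them in one process; the modulus and function names are hypothetical choices for illustration rather than a description of any deployed system.

```python
import secrets

MODULUS = 2**61 - 1     # all arithmetic is done modulo this public value


def make_shares(value: int, n_parties: int) -> list:
    """Split a private value into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: list) -> int:
    """Compute the total of all parties' values without exposing any single value."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [make_shares(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


# Three institutions learn their combined exposure, but not each other's figures.
combined = secure_sum([120, 45, 300])
```

Any single share, and any published partial sum, is a uniformly random number on its own; a party's value can be reconstructed only by combining every share of it.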
Differential privacy, a comparatively new development related to but different from encryption,
aims to maximize the accuracy of database queries or computations while minimizing the
identifiability of individuals with records in the database, typically via obfuscation of query
results (for example, by the addition of spurious information or “noise”).[117] As with other
obfuscation approaches, there is a tradeoff between data anonymity and the accuracy and utility of
the query outputs. These ideas are far from practical application, except insofar as they may enable
the risks of allowing any queries at all to be better assessed.

[117] (1) Dwork, Cynthia, “Differential Privacy,” 33rd International Colloquium on Automata,
Languages and Programming, 2006. (2) Dwork, Cynthia, “A Firm Foundation for Private Data Analysis,”
Communications of the ACM, 54.1, 2011.
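The "addition of noise" to query results can be sketched with the Laplace mechanism applied to a counting query. This is a minimal illustration, assuming a single query and a privacy parameter `epsilon` chosen by the data holder; the function names and sample records are hypothetical, and a real system would also have to track the cumulative privacy cost of repeated queries.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer "how many records satisfy predicate?" with Laplace noise added."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so the noise scale is 1 / epsilon. Smaller epsilon: more privacy, less accuracy.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up records used only to exercise the function.
patients = [{"age": age} for age in (23, 35, 41, 52, 58, 64, 70, 77)]
rng = random.Random()
noisy_answer = private_count(patients, lambda p: p["age"] >= 50, epsilon=0.5, rng=rng)
```

The tradeoff the text describes shows up directly in `epsilon`: lowering it widens the noise, protecting individuals better while making each answer less useful.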
4.3 Notice and consent
Notice and consent is, today, the most widely used strategy for protecting consumer privacy. When
the user downloads a new app to his or her mobile device, or when he or she creates an account for a
web service, a notice is displayed, to which the user must positively indicate consent before using
the app or service. In some fantasy world, users actually read these notices, understand their legal
implications (consulting their attorneys if necessary), negotiate with other providers of similar
services to get better privacy treatment, and only then click to indicate their consent. Reality is
different.[118]

Notice and consent fundamentally places the burden of privacy protection on the individual, exactly
the opposite of what is usually meant by a “right.” Worse yet, if it is hidden in such a notice that
the provider has the right to share personal data, the user normally does not get any notice from
the next company, much less the opportunity to consent, even though use of the data may be
different. Furthermore, if the provider changes its privacy notice for the worse, the user is
typically not notified in a useful way.
As a useful policy tool, notice and consent is defeated by exactly the positive benefits that big
data enables: new, non-obvious, unexpectedly powerful uses of data. It is simply too complicated for
the individual to make fine-grained choices for every new situation or app. Nevertheless, since
notice and consent is so deeply rooted in current practice, some exploration of how its usefulness
might be extended seems warranted.
One way to view the problem with notice and consent is that it creates a non-level playing field in
the implicit privacy negotiation between provider and user. The provider offers a complex
take-it-or-leave-it set of terms, backed by a lot of legal firepower, while the user, in practice,
allocates only a few seconds of mental effort to evaluating the offer, since acceptance is needed to
complete the transaction that was the user’s purpose, and since the terms are typically difficult to
comprehend quickly. This is a kind of market failure. In other contexts, market failures like this
can be mitigated by the intervention of third parties who are able to represent significant numbers
of users and negotiate on their behalf. Section 4.5.1 below suggests how such intervention might be
accomplished.
[118] Gindin, Susan E., “Nobody Reads Your Privacy Policy or Online Contract: Lessons Learned and
Questions Raised by the FTC's Action against Sears,” Northwestern Journal of Technology and
Intellectual Property 1:8, 2009-2010.

4.4 Other strategies and techniques
4.4.1 Anonymization or de-identification
Long used in health-care research and other research areas involving human subjects, anonymization
(also termed de-identification) applies when the data, standing alone and without an association to
a specific person, do not violate privacy norms. For example, you may not mind if your medical
record is used in research as long as you are identified only as Patient X and your actual name and
patient identifier are stripped from that record.
Anonymization of a data record might seem easy to implement. Unfortunately, it is increasingly easy
to defeat anonymization by the very techniques that are being developed for many legitimate
applications of big data. In general, as the size and diversity of available data grows, the
likelihood of being able to re-identify individuals (that is, re-associate their records with their
names) grows substantially.[119]
One compelling example comes from Sweeney, Abu, and Winn.[120] They showed in a recent paper that,
by fusing public Personal Genome Project profiles containing zip code, birth date, and gender with
public voter rolls, and mining for names hidden in attached documents, 84 to 97 percent of the
profiles for which names were provided were correctly identified.
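The fusion step in such a re-identification can be sketched as a join on the shared quasi-identifiers (zip code, birth date, gender). The records below are invented for illustration and the function is a hypothetical simplification; it is not the method or the data of the study cited above.

```python
from collections import defaultdict


def reidentify(profiles: list, voter_roll: list) -> dict:
    """Link "anonymous" profiles to names when the quasi-identifiers match uniquely."""
    index = defaultdict(list)
    for voter in voter_roll:
        key = (voter["zip"], voter["birthdate"], voter["gender"])
        index[key].append(voter["name"])

    matches = {}
    for profile in profiles:
        key = (profile["zip"], profile["birthdate"], profile["gender"])
        names = index.get(key, [])
        if len(names) == 1:              # exactly one voter fits: re-identified
            matches[profile["profile_id"]] = names[0]
    return matches


# Made-up records: the profiles contain no names, yet two can still be linked.
profiles = [
    {"profile_id": "A1", "zip": "02139", "birthdate": "1980-05-23", "gender": "F"},
    {"profile_id": "B2", "zip": "02139", "birthdate": "1975-01-02", "gender": "M"},
    {"profile_id": "C3", "zip": "94110", "birthdate": "1990-09-09", "gender": "F"},
]
voter_roll = [
    {"name": "Ramona Q. Doe", "zip": "02139", "birthdate": "1980-05-23", "gender": "F"},
    {"name": "John Roe", "zip": "02139", "birthdate": "1975-01-02", "gender": "M"},
    {"name": "Ann Poe", "zip": "94110", "birthdate": "1990-09-09", "gender": "F"},
    {"name": "Eve Moe", "zip": "94110", "birthdate": "1990-09-09", "gender": "F"},
]
linked = reidentify(profiles, voter_roll)
```

Profile C3 stays unlinked only because two voters share its attributes; as more auxiliary data sets become available, such ambiguity tends to disappear.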
Anonymization remains somewhat useful as an added safeguard, but it is not robust against near-term
future re-identification methods. PCAST does not see it as being a useful basis for policy.
Unfortunately, anonymization is already rooted in the law, sometimes giving a false expectation of
privacy where data lacking certain identifiers are deemed not to be personally identifiable
information and therefore not covered by such laws as the Family Educational Rights and Privacy Act
(FERPA).
4.4.2 Deletion and non-retention
It is an evident good business practice that data of all kinds should be deleted when they are no
longer of value. Indeed, well-run companies often mandate the destruction of some kinds of records
(both paper and electronic) after specified periods of time, often because they see little benefit
in keeping the records as well as potential cost in producing them. For example, employee emails,
which may be subject to legal process by (e.g.) divorce lawyers, are often seen as having negative
retention value.
Counter to this practice is the new observation that big data is frequently able to find economic or
social value in masses of data that were otherwise considered to be worthless. As the physical cost
of retention continues to decrease exponentially with time (especially in the cloud), there will be
a tendency in both government and the private sector to hold more data for longer, with obvious
privacy implications. Archival data may also be important to future historians, or for later
longitudinal analysis by academic researchers.
Only policy interventions will counter this trend. Government can mandate retention policies for
itself. To affect the private sector, government may mandate policies where it has regulatory
authorities (as for consumer protection, for example). But it can also encourage the development of
stricter liability standards for companies whose data, including archived data, cause harm to
individuals. A rational response by the private sector would then be to hold fewer data or to
protect their use.
The above holds true for privacy-sensitive data about individuals that are held overtly, that is,
the holder knows that he has the data and to whom they relate. As was discussed in Section 3.1.2,
however, sources of data increasingly contain latent information about individuals, information that
becomes known only if the holder expends analytic resources (beyond what may be economically
feasible), or that may become knowable only in the future with the development of new data-mining
algorithms. In such cases it is practically impossible for the data holder even to surface “all the
data about an individual,” much less delete those data on any specified schedule.

[119] De-identification can also be seen as a spectrum, rather than a single approach. See:
“Response to Request for Information Filed by U.S. Public Policy Council of the Association for
Computing Machinery,” March 2014.
[120] Sweeney, et al., “Identifying Participants in the Personal Genome Project by Name,” Harvard
University Data Privacy Lab. White Paper 1021-1, April 24, 2013.
http://dataprivacylab.org/projects/pgp/
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
40
The concepts of ephemerality (keeping data only on the fly or for a brief period), and transparency
(enabling the individual to know what data about him or her are held) are closely related, and with
the same practical limitations. While data that are only streamed, and not archived, may have lower
risk of future use, there is no guarantee that a violator will play by the supposed rules, as in
Target’s loss of 100 million debit card PINs, each present only ephemerally (see Section 4.2.1).
Today, given the distributed and redundant nature of data storage, it is not even clear that data
can be destroyed with any useful degree of assurance. Although research on data destruction is
ongoing, it is a fundamental fact that at the moment that data are displayed (in “analog”) to a
user’s eyeballs or ears, they can also be copied (“re-digitized”) without any technical protections.
The same holds if data are ever made available in unencrypted form to a rogue computer program, one
designed to circumvent technical safeguards. Some misinformed public discussion notwithstanding,
there is no such thing as automatically self-deleting data, other than in a fully controlled and
rule-abiding environment.
As a current example, SnapChat provides the service of delivering ephemeral snapshots (images),
visible for only a few seconds, to a designated recipient’s mobile device. SnapChat promises to
delete past-date snaps from their servers, but it is only a promise. And, they are careful not to
promise that the intended recipient may not contrive to make an uncontrolled and non-expiring copy.
Indeed, the success of SnapChat incentivizes the development of just such copying applications.[121]
From a policymaking perspective, the only viable assumption today, and for the foreseeable future,
is that data, once created, are permanent. While their use may be regulated, their continued
existence is best considered conservatively as unalterable fact.
4.5 Robust technologies going forward
4.5.1 A Successor to Notice and Consent
The purpose of notice and consent is that the user assents to the collection and use of personal
data for a stated purpose that is acceptable to that individual. Given the large number of programs
and Internet-available devices, both visible and not, that collect and use personal data, this
framework is increasingly unworkable and ineffective. PCAST believes that the responsibility for
using personal data in accordance with the user’s preferences should rest with the provider,
possibly assisted by a mutually accepted intermediary, rather than with the user.
How might that be accomplished? Individuals might be encouraged to associate themselves with one of
a standard set of privacy preference profiles (that is, settings or choices) voluntarily offered by
third parties. For example, Jane might choose to associate with a profile offered by the American
Civil Liberties Union that gives particular weight to individual rights, while John might associate
with one offered by Consumer Reports that gives weight to economic value for the consumer. Large app
stores (such as Apple App Store, Google Play, Microsoft Store) for whom reputational value is
important, or large commercial sectors such as finance, might choose to offer competing privacy
preference profiles.

[121] See, for example: Ryan Whitwam, “SnapSave for iPhone Defeats the Purpose of Snapchat, Saves
Everything Forever,” PC Magazine, August 12, 2013.
http://appscout.pcmag.com/appleiosiphoneipadipod/314653snapsaveforiphonedefeatsthepurposeofsnapchatsaveseverythingforever
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
41
In the first instance, an organization offering profiles would vet new apps as acceptable or not
acceptable within each of their profiles. Basically, they would do the close reading of the
provider’s notice that the user should, but does not, do. This is not as onerous as it may sound:
While there are millions of apps, the most popular downloads are relatively few and are concentrated
in a relatively small number of portals. The “long tail” of apps with few customers each might
initially be left as “unrated.”
Simply by vetting apps, the third-party organizations would automatically create a marketplace for
the negotiation of community standards for privacy. To attract market share, providers (especially
smaller ones) could seek to qualify their offerings in as many privacy-preference profiles, offered
by as many different third parties, as they deem feasible. The Federal government (e.g., through the
National Institute of Standards and Technology) could encourage the development of standard,
machine-readable interfaces for the communication of privacy implications and settings between
providers and assessors.
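What a machine-readable exchange between a provider and an assessor might look like can be sketched as follows. Everything here is hypothetical: the report does not define a format, so the `AppDeclaration` and `PreferenceProfile` structures, the (data type, purpose) vocabulary, and the "unrated" convention for apps with no declaration are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppDeclaration:
    """A provider's machine-readable statement of what its app does with data."""
    name: str
    data_uses: frozenset        # set of (data_type, purpose) pairs


@dataclass(frozen=True)
class PreferenceProfile:
    """A third party's profile listing the uses its subscribers accept."""
    publisher: str
    allowed_uses: frozenset     # set of (data_type, purpose) pairs


def vet(app: Optional[AppDeclaration], profile: PreferenceProfile) -> tuple:
    """Return a rating and the list of declared uses the profile does not allow."""
    if app is None:
        return "unrated", []                     # the "long tail" case
    violations = sorted(app.data_uses - profile.allowed_uses)
    rating = "acceptable" if not violations else "not acceptable"
    return rating, violations


profile = PreferenceProfile(
    publisher="Example Consumer Group",
    allowed_uses=frozenset({("location", "navigation"), ("email", "account login")}),
)
maps_app = AppDeclaration(
    name="ExampleMaps",
    data_uses=frozenset({("location", "navigation")}),
)
game_app = AppDeclaration(
    name="ExampleGame",
    data_uses=frozenset({("location", "advertising"), ("email", "account login")}),
)
maps_result = vet(maps_app, profile)
game_result = vet(game_app, profile)
```

A provider could run the same declaration against many publishers' profiles, which is the marketplace effect the text anticipates.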
Although human professionals could do the vetting today using policies expressed in natural
language, it would be desirable in the future to automate that process. To do that, it would be
necessary to have formalisms to specify privacy policies and tools to analyze software to determine
conformance to those policies. But that is only part of the challenge. A greater challenge is to
make sure the policy language is sufficiently expressive, the policies are sufficiently rich, and
conformance tests are sufficiently powerful. Those requirements lead to a consideration of context
and use.
4.5.2 Context and Use
The previous discussion, particularly that of Sections 3.1 and 3.2, illustrates PCAST’s belief that
a focus on the collection, storage, and retention of electronic personal data will not provide a
technologically robust foundation on which to base future policy. Among the many authors that have
touched on these issues, Kagal and Abelson explain why access control does not suffice to protect
privacy.[122] Mundie gives a cogent and more complete explanation of this issue and advocates that
privacy protection is better served by controlling the use of personal data, broadly construed,
including metadata and data derived from analytics, than by controlling collection.[123] In a
complementary vein, Nissenbaum explains that both the context of usage and the prevailing social
norms contribute to acceptable use.[124]

To implement in a meaningful way the application of privacy policies to the use of personal data for
a particular purpose (i.e., in context), those policies need to be associated both with data and
with the code that operates on the data. For example, it must be possible to ensure that only apps
with particular properties can be applied to certain data. The policies might be expressed in what
computer scientists call natural language (plain English or the equivalent) and the association done
by the user, or the policies might be stated formally and their association and enforcement done
automatically. In either case, there must also be policies associated with the outputs of the
computation, since they are data as well. The privacy policies of the output data must be computed
from the policies associated with the inputs, the policies associated with the code, and the
intended use of the outputs (i.e., the context). These privacy properties are a kind of metadata. To
achieve a reasonable level of reliability, their implementation must be tamper-proof and “sticky”
when data are copied.
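The idea that output policies are computed from input policies, code policies, and context can be sketched in a few lines. The rule used here (the output may be used only for purposes permitted by every input and by the code) is one plausible assumption, not a rule stated in the report, and the `Tagged` type and `run_with_policy` function are hypothetical. The sketch also does nothing to make the tags tamper-proof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A data value that carries its usage policy along with it ("sticky" metadata)."""
    value: object
    allowed_purposes: frozenset


def run_with_policy(func, code_purposes, context: str, *inputs: Tagged) -> Tagged:
    """Apply func to tagged inputs if the context is permitted; tag the output."""
    allowed = frozenset(code_purposes)           # purposes the code is approved for
    for item in inputs:
        allowed &= item.allowed_purposes         # every input must also permit them
    if context not in allowed:
        raise PermissionError(f"use for '{context}' is not permitted by policy")
    result = func(*(item.value for item in inputs))
    return Tagged(result, allowed)               # the output inherits the joint policy


heart_rates = Tagged([72, 88, 95], frozenset({"treatment", "research"}))
step_counts = Tagged([4000, 9000, 12000], frozenset({"research", "marketing"}))


def correlate(rates, steps):
    """Stand-in analysis: pair up the two measurement series."""
    return list(zip(rates, steps))


paired = run_with_policy(
    correlate, {"research", "treatment"}, "research", heart_rates, step_counts
)
```

Here `paired.allowed_purposes` contains only "research", so a later attempt to use the derived data for marketing would be refused in the same way.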

[122] Abelson, Hal and Lalana Kagal, “Access Control is an Inadequate Framework for Privacy
Protection,” W3C Workshop on Privacy for Advanced Web APIs 12/13, July 2010, London.
http://www.w3.org/2010/api-privacy-ws/papers.html
[123] Mundie, Craig, “Privacy Pragmatism: Focus on Data Use, Not Data Collection,” Foreign Affairs,
March/April, 2014.
[124] Nissenbaum, H., “Privacy in Context: Technology, Policy, and the Integrity of Social Life,”
Stanford Law Books, 2009.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
42
There has been considerable research in areas that would contribute to such a capability, some of
which is beginning to be commercialized. There is a history of using metadata (“tags” or
“attributes”) in database systems to control use. While the formalization of privacy policies and
their synthesis is a research topic,[125] manual interpretation of such policies and the human
determination of usage tags can be found in recent products. Identity management systems (to
authenticate users and their roles, i.e., their context) are also evident both in research[126] and
in practice.[127]
Commercial privacy systems for implementing use control exist today under the name of Trusted Data
Format (TDF) implementations, developed principally for the United States intelligence
community.[128] TDF operates at the file level. The systems are primarily being implemented on a
custom basis by large consulting firms, often assembled from open-source software components.
Customers today are primarily government agencies, such as Federal intelligence agencies or local
government criminal intelligence units, or large commercial companies in vertically integrated
industries like financial services and pharmaceutical companies looking to improve their
accountability and auditing capabilities. Consulting services that have expertise in building such
systems include, for example, Booz Allen, Ernst & Young, IBM, Northrop Grumman, and Lockheed;
product-based companies like Palantir and new startups pioneering internal usage auditing, policy
analytics, and policy reasoning engines have such expertise, as well. With sufficient market demand,
more widespread market penetration could happen in the next five years. Market penetration would be
further accelerated if the leading cloud-platform providers like Amazon, Google, and Microsoft
implemented usage-controlled system technologies in their offerings. Wider-scale use through the
government would help motivate the creation of off-the-shelf standard software.
4.5.3Enforcementanddeterrence
Privacypoliciesandthecontrol ofuseincontextareonlyeffectiv etotheextentthattheyarerealizedand
enforced.Technicalmeasuresthatincre asetheprobability thataviolatoriscaughtcanbeeffectiveonlywhen
thereareregulationsandlawswithcivilorcriminalpenaltiestodetertheviolators.
Thenthereisboth
deterrenceofharmfulactionsandincentivetodeployprivacyprotectingtechnologies.
[125] See references at footnote 107 and also: (1) Weitzner, D.J., et al., “Information Accountability,”
Communications of the ACM, June 2008, pp. 82-87. (2) Tschantz, Michael Carl, Anupam Datta, and Jeannette M. Wing,
“Formalizing and Enforcing Purpose Restrictions in Privacy Policies.”
http://www.andrew.cmu.edu/user/danupam/TschantzDattaWing12.pdf
[126] For example, at Carnegie Mellon University, Lorrie Cranor directs the CyLab Usable Privacy and Security
Laboratory (http://cups.cs.cmu.edu/). Also, see 2nd International Workshop on Accountability: Science, Technology
and Policy, MIT Computer Science and Artificial Intelligence Laboratory, January 29-30, 2014.
http://dig.csail.mit.edu/2014/AccountableSystems2014/
[127] Oracle’s eXtensible Access Control Markup Language (XACML) has been used to implement attribute-based access
controls for identity management systems. (Personal communication, Mark Gorenberg and Peter Guerra of Booz Allen)
[128] Office of the Director of National Intelligence, “IC CIO Enterprise Integration & Architecture: Trusted Data
Format.” http://www.dni.gov/index.php/about/organization/chiefinformationofficer/trusteddataformat

It is today straightforward technically to associate metadata with data, with varying degrees of granularity
ranging from an individual datum, to a record, to an entire collection. These metadata can record a wealth of
auditable information, for example, provenance, detailed access and use policies, authorizations, logs of actual
access and use, and destruction dates. Extending such metadata to derived or shared data (secondary use)
together with privacy-aware logging can facilitate auditing. Although the state of the art is still somewhat ad
hoc, and auditing is often not automated, so-called accountable systems are beginning to be deployed (Section
4.5.2). The ability to detect violations of privacy policies, particularly if the auditing is automated and
continuous, can be used both to deter privacy violations and to ensure that violators are punished.
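The idea of attaching auditable metadata to data, and carrying it along to derived data, can be made concrete
with a small sketch. The Python below is only an illustration under stated assumptions: the class TaggedRecord,
its field names, and the audit function are hypothetical and are not taken from this report or from any deployed
accountable system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TaggedRecord:
    """A datum plus auditable metadata (all field names are illustrative assumptions)."""
    value: str
    provenance: str              # where the datum came from
    allowed_purposes: frozenset  # use policy attached to the datum
    destroy_after: date          # destruction date
    access_log: list = field(default_factory=list)

    def derive(self, new_value: str, step: str) -> "TaggedRecord":
        # Derived data (secondary use) inherits the policy and extends the provenance.
        return TaggedRecord(
            value=new_value,
            provenance=f"{self.provenance} -> {step}",
            allowed_purposes=self.allowed_purposes,
            destroy_after=self.destroy_after,
        )

    def use(self, actor: str, purpose: str, on: date) -> str:
        # Every access is logged, permitted or not, so that a later audit can find it.
        self.access_log.append((on, actor, purpose))
        return self.value


def audit(record: TaggedRecord) -> list:
    """Return the logged accesses that violate the policy attached to the record."""
    violations = []
    for on, actor, purpose in record.access_log:
        if purpose not in record.allowed_purposes:
            violations.append((actor, purpose, "purpose not permitted"))
        elif on > record.destroy_after:
            violations.append((actor, purpose, "used after destruction date"))
    return violations


# Invented example data: a sensor reading, a derived summary, and two uses of the summary.
reading = TaggedRecord("72 bpm", "wearable sensor", frozenset({"treatment"}), date(2015, 1, 1))
summary = reading.derive("resting heart rate normal", "analytics")
summary.use("clinic", "treatment", date(2014, 6, 1))
summary.use("ad broker", "marketing", date(2014, 6, 2))
violations = audit(summary)
```

In this sketch the audit is a separate pass over the log rather than a gate at access time, which mirrors the
text's emphasis on detection and deterrence; a real system might do both.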
In the next five years, with regulation or market-driven encouragement, the large cloud-based infrastructure
systems (e.g., Google, Amazon, Microsoft, Rackspace) could, as one example, incorporate the data-provenance
and usage-compliance aspects of accountable systems into their cloud application-programming interfaces
(APIs) and additionally provide APIs for policy awareness. These capabilities could then readily be included in
open-source-based systems like OpenStack (associated with Rackspace)[129] and other provider platforms.
Applications intended to run on such cloud-based systems could be built with privacy concepts “baked into
them,” even when they are developed by small enterprises or individual developers.
4.5.4 Operationalizing the Consumer Privacy Bill of Rights
In February 2012, the Administration issued a report setting forth a Consumer Privacy Bill of Rights (CPBR). The
CPBR addresses commercial (not public sector) uses of personal data and is a strong statement of American
privacy values.
For purposes of this discussion, the principles embodied in CPBR can be divided into two categories. First, there
are obligations for data holders, analyzers, or commercial users. These are passive from the consumer’s
standpoint: the obligations should be met whether or not the consumer knows, cares, or acts. Second, and
different, there are consumer empowerments, things that the consumer should be empowered to initiate
actively. It is useful here to rearrange the CPBR’s principles by category.
In the category of obligations are these elements:
Respect for Context: Consumers have a right to expect that companies will collect, use, and disclose
personal data in ways that are consistent with the context in which consumers provide the data.
Focused Collection: Consumers have a right to reasonable limits on the personal data that companies
collect and retain.
Security: Consumers have a right to secure and responsible handling of personal data.
Accountability: Consumers have a right to have personal data handled by companies with appropriate
measures in place to assure they adhere to the Consumer Privacy Bill of Rights.
In the category of consumer empowerments are these elements:
Individual Control: Consumers have a right to exercise control over what personal data companies
collect from them and how they use it.
Transparency: Consumers have a right to easily understandable and accessible information about
privacy and security practices.
Access and Accuracy: Consumers have a right to access and correct personal data in usable formats, in a
manner that is appropriate to the sensitivity of the data and the risk of adverse consequences to
consumers if the data are inaccurate.
[129] See: http://www.openstack.org/

PCAST endorses as sound the principles underlying CPBR. Because of the rapidly changing technologies
associated with big data, however, effective operationalization of CPBR is at risk. Up to now, debate over how
to operationalize CPBR has focused on the collection, storage, and retention of data, with an emphasis on the
“small-data” contexts that motivated CPBR development. But, as discussed at multiple places in this report
(e.g., Sections 3.1.2, 4.4 and 4.5.2), PCAST believes that such a focus will not provide a technologically robust
foundation on which to base future policy that also applies to big data. Further, the increasing complexity of
applications and uses of data undermines even a simple concept like “notice and consent.”
PCAST believes that the principles of CPBR can readily be adapted to a more robust regime based on recognizing
and controlling harmful uses of the data. Some specific suggestions follow.
Turn first to the rights classified above as obligations on the data holder.
The principle of Respect for Context needs augmentation. As this report has repeatedly discussed, there are
instances in which personal data are not provided by the customer. Such data may emerge as a product of
analysis well after the data were collected and after they may have passed through several hands. While the
intent of the right is appropriate, namely that data be used for legitimate purposes that do not produce certain
adverse consequences or harms to individuals, the CPBR’s articulation in which “consumers provide the data” is
too limited. This right needs to state in some way that data about an individual, however acquired, not be
used so as to cause certain adverse consequences or harms to that individual. (See Section 1.4 for a possible list
of adverse consequences and harms that might be subject to some regulation.)
As initially conceived, the right to Focused Collection was to be achieved by techniques like de-identification and
data deletion. As discussed in Section 4.4.1, however, de-identification (anonymization) is not a robust
technology for big data in the face of data fusion; in some instances, there may be compelling reasons to retain
data for beneficial purposes. This right should be about use rather than collection. It should emphasize utilizing
best practices to prevent inappropriate use of data during the data’s whole life cycle, rather than depending on
de-identification. It should not depend on a company’s being able itself to recognize “all” the data about a
consumer that it holds, which is increasingly technically infeasible.
The principles underlying CPBR’s Security and Accountability remain valid in a use-based regime. They need to
be applied throughout the value chain that includes data collection, analysis, and use.
Turn next to the rights here classified as consumer empowerments.
Where consumer empowerments have become practically impossible for the consumer to exercise
meaningfully, they need to be recast as obligations of the commercial entity that actually uses the data or
products of data analysis. This applies to the CPBR’s principles of Individual Control and of Transparency.
Section 4.3 explained how the non-obvious nature of big data’s products of analysis makes it all but impossible
for an individual to make fine-grained privacy choices for every new situation or app. For the principle of
Individual Control to have meaning, PCAST believes that the burden should no longer fall on the consumer to
manage privacy for each company with which the consumer interacts by a framework like “notice and consent.”
Rather, each company should take responsibility for conforming its uses of personal data to a personal privacy
profile designated by the consumer and made available to that company (including from a third party
designated by the consumer). Section 4.5.1 proposed a mechanism for this change in responsibility.
Transparency (in the sense of disclosure of privacy practices) suffers from many of the same problems. Today,
the consumer receives an unhelpful blizzard of privacy-policy notifications, many of which say, in essence, “we
providers can do anything we want.”[130] As with Individual Control, the burden of conforming to a consumer’s
stated personal privacy profile should fall on the company, with notification to the consumers by a company if
their profile precludes that company’s accepting their business. Since companies do not like to lose business, a
positive market dynamic for competing privacy practices would thus be created.
For the right of Access and Accuracy to be meaningful, personal data must include the fruits of data analytics,
not just collection. However, as this report has already explained (Section 4.4.2), it is not always possible for a
company to “know what it knows” about a consumer, since that information may be unrecognized in the data;
or it may become identifiable only in the future, when data sets are combined using new algorithms. When,
however, the personal character of data is apparent to a company by virtue of its use of the data, its obligation
to provide means for the correction of errors should be triggered. Consumers should have an expectation that
companies will validate and correct data stemming from analysis and, since not all errors will be corrected, will
also take steps to minimize the risk of adverse consequences to consumers from the use of inaccurate data.
Again, the primary burden must fall on the commercial user of big data and not on the consumer.

[130] Lawyers may encourage companies to use over-inclusive language to cover the unpredictable evolution of
possibilities described elsewhere in this report, even in the absence of specific plans to use specific capabilities.
BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
46

BIGDATAANDPRIVACY:ATECHNOLOGICALPERSPECTIVE
47
5. PCAST Perspectives and Conclusions
Breaches of privacy can cause harm to individuals and groups. It is a role of government to prevent such harm
where possible, and to facilitate means of redress when the harm occurs. Technical enhancements of privacy
can be effective only when accompanied by regulations or laws because, unless some penalties are enforced,
there is no end to the escalation of the measures-countermeasures “game” between violators and protectors.
Rules and regulations provide both deterrence of harmful actions and incentives to deploy privacy-protecting
software technologies.
From everything already said, it should be obvious that new sources of big data are abundant; that they will
continue to grow; and that they can bring enormous economic and social benefits. Similarly, and of comparable
importance, new algorithms, software, and hardware technologies will continue to increase the power of data
analytics in unexpected ways. Given these new capabilities of data aggregation and processing, there is
inevitably new potential both for the unintentional leaking of bulk and fine-grained data about individuals
and for new systematic attacks on privacy by those so minded.
Cameras, sensors, and other observational or mobile technologies raise new privacy concerns. Individuals often
do not knowingly consent to providing data. These devices naturally pull in data unrelated to their primary
purpose. Their data collection is often invisible. Analysis technology (such as facial, scene, speech, and voice
recognition technology) is improving rapidly. Mobile devices provide location information that might not be
otherwise volunteered. The combination of data from those sources can yield privacy-threatening information
unbeknownst to the affected individuals.
It is also true, however, that privacy-sensitive data cannot always be reliably recognized when they are first
collected, because the privacy-sensitive elements may be only latent in the data, made visible only by analytics
(including those not yet invented), or by fusion with other data sources (including those not yet known).
Suppressing the collection of privacy-sensitive data would thus be increasingly difficult, and it would also be
increasingly counterproductive, frustrating the development of big data’s socially important and economic
benefits.
Nor would it be desirable to suppress the combining of multiple sources and kinds of data: Much of the power of
big data stems from this kind of data fusion. That said, it remains a matter of concern that considerable
amounts of personal data may be derived from data fusion. In other words, such data can be obtained or
inferred without intentional personal disclosure.
It is an unavoidable fact that particular collections of big data and particular kinds of analysis will often have
both beneficial and privacy-inappropriate uses. The appropriate use of both the data and the analyses is
highly contextual.
Any specific harm or adverse consequence is the result of data, or their analytical product, passing through the
control of three distinguishable classes of actor in the value chain:
First, there are data collectors, who control the interfaces to individuals or to the environment. Data collectors
may collect data from clearly private realms (e.g., a health questionnaire or wearable sensor), from ambiguous
situations (e.g., cell-phone pictures or Google Glass videos taken at a party or cameras and microphones placed
in a classroom for remote broadcast), or, increasing in both quantity and quality, data from the “public
square,” where privacy-sensitive data may be latent and initially unrecognizable.
Second, there are data analyzers. This is where the “big” in big data becomes important. Analyzers may
aggregate data from many sources, and they may share data with other analyzers. Analyzers, as distinct from
collectors, create uses (“products of analysis”) by bringing together algorithms and data sets in a large-scale
computational environment. Importantly, analyzers are the locus where individuals may be profiled by data
fusion or statistical inference.
Third, there are users of the analyzed data: business, government, or individual. Users will generally have a
commercial relationship with analyzers; they will be purchasers or licensees (etc.) of the analyzer’s products of
analysis. It is the user who creates desirable economic and social outcomes. But, it is also the user who is the
locus of producing actual adverse consequences or harms, when such occur.
5.1 Technical feasibility of policy interventions
Policy, as created by new legislation or within existing regulatory authorities, can, in principle, intervene at
various stages in the value chain described above. Not all such interventions are equally feasible from a
technical perspective, or equally desirable if the societal and economic benefits of big data are to be realized.
As indicated in Chapter 4, basing policy on the control of collection is unlikely to succeed, except in very limited
circumstances where there is an explicitly private context (e.g., measurement or disclosure of health data) and
the possibility of meaningful explicit or implicit notice and consent (e.g., by privacy preference profiles, see
Sections 4.3 and 4.5.1), which does not exist today.
There is little technical likelihood that “a right to forget” or similar limits on retention could be meaningfully
defined or enforced (see Section 4.4.2). Increasingly, it will not be technically possible to surface “all” of the data
about an individual. Policy based on protection by anonymization is futile, because the feasibility of
re-identification increases rapidly with the amount of additional data (see Section 4.4.1). There is little, and
decreasing, meaningful distinction between data and metadata. The capabilities of data fusion, data mining,
and re-identification render metadata not much less problematic than data (see Section 3.1).
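The claim that re-identification becomes easier as more data are combined can be illustrated with a toy example.
The Python sketch below is an assumption-laden illustration, not an analysis from this report: the records, the
attribute names, and the outside record are all invented. It measures how many "anonymized" rows become unique
as more attributes are considered, and then links one row to a named outside record.

```python
from collections import Counter

# Toy "anonymized" records: names removed, quasi-identifiers kept (all values invented).
records = [
    {"zip": "02139", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1970, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1982, "sex": "F", "diagnosis": "flu"},
    {"zip": "02140", "birth_year": 1970, "sex": "F", "diagnosis": "migraine"},
    {"zip": "02140", "birth_year": 1982, "sex": "M", "diagnosis": "asthma"},
    {"zip": "02140", "birth_year": 1982, "sex": "M", "diagnosis": "flu"},
]


def unique_fraction(rows, attributes):
    """Fraction of rows whose combination of the given attributes occurs exactly once."""
    keys = [tuple(row[a] for a in attributes) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


def link(rows, outside_record, attributes):
    """Return the rows that match an outside, identified record on the shared attributes."""
    return [row for row in rows if all(row[a] == outside_record[a] for a in attributes)]


# Uniqueness grows as attributes are added: 0 of 6 rows, then 2 of 6, then 4 of 6.
by_zip = unique_fraction(records, ["zip"])
by_zip_year = unique_fraction(records, ["zip", "birth_year"])
by_zip_year_sex = unique_fraction(records, ["zip", "birth_year", "sex"])

# A hypothetical identified record from some other source (for example, a public roll).
outside = {"name": "Jane Doe", "zip": "02139", "birth_year": 1982, "sex": "F"}
matches = link(records, outside, ["zip", "birth_year", "sex"])
```

With only six rows the numbers are not meaningful in themselves; the point is the direction of the trend, and
that a single match ties a name to a diagnosis even though the name never appeared in the original records.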
Even if direct controls on collection are in most cases infeasible, however, attention to collection practices may
help to reduce risk in some circumstances. Such best practices as tracking provenance, auditing access and use,
and continuous monitoring and control (see Sections 4.5.2 and 4.5.3) could be driven by partnerships between
government and industry (the carrot) and also by clarifying tort law and defining what might constitute
negligence (the stick).
Turn next to data analyzers. On the one hand, it may be difficult to regulate them, because their actions do not
directly touch the individual (it is neither collection nor use) and may have no external visibility. Mere inference
about an individual, absent its publication or use, may not be a feasible target of regulation. On the other hand,
an increasing fraction of privacy issues will surface only with the application of data analytics. Many privacy
challenges will arise from the analysis of data collected unintentionally that was not, at the time of collection,
targeted at any particular individual or even group of individuals. This is because combining data from many
sources will become more and more powerful.
It might be feasible to introduce regulation at the “moment of particularization” of data about an individual, or
when this is done for some minimum number of individuals concurrently. To be effective such regulation would
need to be accompanied by requirements for tracking provenance, auditing access and use, and using security
measures (e.g., robust encryption infrastructure) at all stages of the evolution of data, and for providing
transparency, and/or notification, at the moment of particularization.
Big data’s “products of analysis” are created by computer programs that bring together algorithms and data so
as to produce something of value. It might be feasible to recognize such programs, or their products, in a legal
sense and to regulate their commerce. For example, they might not be allowed to be used in commerce (sold,
leased, licensed, and so on) unless they are consistent with individuals’ privacy elections or other expressions of
community values (see Sections 4.3 and 4.5.1). Requirements might be imposed on conformity to appropriate
standards of provenance, auditability, accuracy, and so on, in the data they use and produce; or that they
meaningfully identify who (licensor vs. licensee) is responsible for correcting errors and liable for various types
of harm or adverse consequence caused by the product.
It is not, however, the mere development of a product of analysis that can cause adverse consequences. Those
occur only with its actual use, whether in commerce, by government, by the press, or by individuals. This seems
the most technically feasible place to apply regulation going forward, focusing at the locus where harm can be
produced, not far upstream from where it may barely (if at all) be identifiable.
When products of analysis produce imperfect information that may misclassify individuals in ways that produce
adverse consequences, one might require that they meet standards for data accuracy and integrity; that there
are useable interfaces that allow an individual to correct the record with voluntary additional information; and
that there exist streamlined options for redress, including financial redress, when adverse consequences reach a
certain level.
Some harms may affect groups (e.g., the poor or minorities) rather than identifiable individuals. Mechanisms for
redress in such cases need to be developed.
There is a need to clarify standards for liability in case of adverse consequences from privacy violations.
Currently there is a patchwork of out-of-date state laws and legal precedents. One could encourage the drafting
of technologically savvy model legislation on cyber-torts for consideration by the states.
Finally, government may be forbidden from certain classes of uses, despite their being available in the private
sector.
5.2 Recommendations
PCAST’s charge for this study does not ask it to make recommendations on privacy policies, but rather to make a
relative assessment of the technical feasibility of different broad policy approaches. PCAST’s overall conclusions
about that question are embodied in the first two of our recommendations:
Recommendation 1. Policy attention should focus more on the actual uses of big data and less on its
collection and analysis.
By actual uses, we mean the specific events where something happens that can cause an adverse consequence
or harm to an individual or class of individuals. In the context of big data, these events (“uses”) are almost
always actions of a computer program or app interacting either with the raw data or with the fruits of analysis of
those data. In this formulation, it is not the data themselves that cause the harm, nor the program itself (absent
any data), but the confluence of the two. These “use events” (in commerce, by government, or by individuals)
embody the necessary specificity to be the subject of regulation. Since the purpose of bringing program and
data together is to accomplish some identifiable desired task, use events also capture some notion of intent, in a
way that data collection by itself or program development by itself may not. The policy question of what kinds
of adverse consequences or harms rise to the level of needing regulation is outside of PCAST’s charge, but an
illustrative set that seems grounded in common American values was provided in Section 1.4.
PCAST judges that alternative big-data policies that focus on the regulation of data collection, storage, retention,
a priori limitations on applications, and analysis (absent identifiable actual uses of big data or its products of
analysis) are unlikely to yield effective strategies for improving privacy. Such policies are unlikely to be scalable
over time as it becomes increasingly difficult to ascertain, about any particular data set, what personal
information may be latent in it or in its possible fusion with every other possible data set, present or future.
The related issue is that policies limiting collection and retention are increasingly unlikely to be enforceable by
other than severe and economically damaging measures. While there are certain definable classes of data so
repugnant to society that their mere possession is criminalized,[131] the information in big data that may raise
privacy concerns is increasingly inseparable from a vast volume of the data of ordinary commerce, or
government function, or collection in the public square. This dual-use character of information, too, argues for
the regulation of use rather than collection.
Recommendation 2. Policies and regulation, at all levels of government, should not embed particular
technological solutions, but rather should be stated in terms of intended outcomes.
To avoid falling behind the technology, it is essential that policy concerning privacy protection should address
the purpose (the “what”) rather than the mechanism (the “how”). For example, regulating disclosure of health
information by regulating the use of anonymization fails to capture the power of data fusion; regulating the
protection of information about minors by controlling inspection of student records held by schools fails to
anticipate the capture of student information by online learning technologies. Regulating control of the
inappropriate disclosure of health information or student performance, no matter how the data are acquired, is
more robust.
PCAST further responds to its charge with the following recommendations, intended to advance the agenda of
strong privacy values and the technological tools needed to support them:
Recommendation 3. With coordination and encouragement from OSTP, the NITRD agencies[132] should
strengthen U.S. research in privacy-related technologies and in the relevant areas of social science that inform
the successful application of those technologies.
[131] Child pornography is the most universally recognized example.
[132] NITRD refers to the Networking and Information Technology Research and Development program, whose
participating Federal agencies support unclassified research in advanced information technologies such as
computing, networking, and software and include both research- and mission-focused agencies such as NSF, NIH,
NIST, DARPA, NOAA, DOE’s Office of Science, and the DoD military-service laboratories (see
http://www.nitrd.gov/SUBCOMMITTEE/nitrd_agencies/index.aspx). There is research coordination between NITRD
and Federal agencies conducting or supporting corresponding classified research.

Some of the technology for controlling uses already exists. Research (and funding for it) is needed, however, in
the technologies that help to protect privacy, in the social mechanisms that influence privacy-preserving
behavior, and in the legal options that are robust to changes in technology and create appropriate balance
among economic opportunity, other national priorities, and privacy protection.
Following up on recommendations from PCAST for increased privacy-related research,[133] a 2013-2014 internal
government review of privacy-focused research across Federal agencies supporting research on information
technologies suggests that about $80 million supports either research with an explicit focus on enhancing
privacy or research that addresses privacy protection ancillary to some other goal (typically cybersecurity).[134]
The funded research addresses such topics as an individual’s control over his or her information, transparency,
access and accuracy, and accountability. It is typically of a general nature, except for research focusing on the
health domain or (relatively new) consumer energy usage. The broadest and most varied support for privacy
research, in the form of grants to individuals and centers, comes from the National Science Foundation (NSF),
engaging social science as well as computer science and engineering.[135,136]
Research into privacy as an extension or complement to security is supported by a variety of Department of
Defense agencies (Air Force Research Laboratory, the Army’s Telemedicine and Advanced Technology Research
Center, Defense Advanced Research Projects Agency, National Security Agency, and Office of Naval Research)
and the Intelligence Advanced Research Projects Activity (IARPA) within the Intelligence Community. IARPA, for
example, has hosted the Security and Privacy Assurance Research[137] program, which has explored a variety of
encryption techniques. Research at the National Institute of Standards and Technology (NIST) focuses on the
development of cryptography and biometric technology to enhance privacy as well as support for federal
standards and programs for identity management.[138]
[133] Designing a Digital Future: Federally Funded Research and Development in Networking and Information
Technology (http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcastnitrd2013.pdf [2012] and
http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcastnitrdreport2010.pdf [2010]).
[134] Federal Networking and Information Technology Research and Development Program, “Report on Privacy
Research Within NITRD [Networking and Information Technology Research and Development],” National
Coordination Office for NITRD, April 23, 2014.
http://www.nitrd.gov/Pubs/Report_on_Privacy_Research_within_NITRD.pdf
[135] The Secure and Trustworthy Cyberspace program is the largest funder of relevant research. See:
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504709
[136] In December 2013, the NSF directorates supporting computer and social science joined in soliciting proposals
for privacy-related research. http://www.nsf.gov/pubs/2014/nsf14021/nsf14021.jsp.
[137] See: http://www.iarpa.gov/index.php/researchprograms/spar
[138] NIST is responsible for advancing the National Strategy for Trusted Identities in Cyberspace (NSTIC), which
is intended to facilitate secure transactions within and across public and private sectors. See:
http://www.nist.gov/nstic/
[139] Pike, W.A. et al., “PNNL [Pacific Northwest National Laboratory] Response to OSTP Big Data RFI,” March 2014.

Looking to the future, continued investment is needed not only in privacy topics ancillary to security, but also in
automating privacy protection for the broadest aspects of use of data from all sources. Relevant topics include
cryptography, privacy-preserving data mining (including analysis of streaming as well as stored data),[139]
formalization of privacy policies, tools for automating conformance of software to personal privacy policy and to
legal policy, methods for auditing use in context and identifying violations of policy, and research on enhancing
people’s ability to make sense of the results of various big-data analyses. Development of technologies that
support both quality analytics and privacy preservation on distributed data, such as secure multiparty
computation, will become even more important, given the expectation that people will draw increasingly from
data stored in multiple locations. The creation of tools that analyze the panoply of National, state, regional, and
international rules and regulations for inconsistencies and differences will be helpful for the definition of new
rules and regulations, as well as for those software developers that need to customize their services for
different markets.
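Secure multiparty computation, named above as one research direction for analytics on distributed data, can be
illustrated in its simplest form by additive secret sharing. The Python sketch below is a single-process
simulation under strong assumptions (honest parties that do not collude, a public modulus chosen here for
convenience); the function names and the example counts are hypothetical, and this is not a production protocol.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus (an assumption); inputs and their sum must be smaller


def share(value, n_parties):
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Compute the total of the inputs while combining only per-party partial sums."""
    n = len(private_inputs)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [share(x, n) for x in private_inputs]
    # Party j adds the shares it received; on its own this partial sum looks random.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published and added together.
    return sum(partial_sums) % MODULUS


# Invented per-site counts, each held at a different location.
site_counts = [120, 45, 310]
total = secure_sum(site_counts)
```

The design choice illustrated is that each site learns the aggregate but never sees another site's raw count;
stronger guarantees, such as tolerating dishonest parties, require considerably more machinery than shown here.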
Recommendation4.OSTP,togetherwiththeappropriateeducationalinstitutionsandprofessionalsocieties,
shouldencourageincreasededucationandtrainingopportunitiesconcerningprivacyprotection,including
professionalcareerpaths.
Programsthatprovideeducationleadingtoprivacyexpertise(akintowhatisbeingdoneforsecurityexpertise)
areessentialandneedencouragement.
Onemightenvisioncareersfordigitalprivacyexpertsbothonthe
softwaredevelopmentsideandonthetechnicalmanagementside.Employmentopportunitiesshouldexistnot
onlyinindustry(andgovernmentatalllevels),wherejobsfocusedonprivacy(includingbutnotlimitedtoChief
PrivacyOfficers)havebeengrowing,butalso
forconsumerandcitizenadvocacyandsupport,perhapsoffering
“annualprivacycheckups”forindividuals.Justaseducationandtrainingaboutcybersecurityhasadvancedover
thepast20yearswithinthetechnicalcommunity,thereisnowopportunitytoeducateandtrainstudentsabout
privacyimplicationsandprivacyenhancements,beyondthepresent
smallnicheareaoccupiedbythisfocus
withincomputerscienceprograms.
140
Privacyisalsoanimportantcomponentofethicseducationfor
technologyprofessionals.
Recommendation 5. The United States should take the lead both in the international arena and at home by
adopting policies that stimulate the use of practical privacy-protecting technologies that exist today. This
country can exhibit leadership both by its convening power (for instance, by promoting the creation and
adoption of standards) and also by its own procurement practices (such as its own use of privacy-preserving
cloud services).
Section 4.5.2 described a set of privacy-enhancing best practices that already exist today in U.S. markets. PCAST
is not aware of any more effective innovation or strategies being developed abroad; rather, some countries
seem inclined to pursue what PCAST believes to be blind alleys. This circumstance offers an opportunity for U.S.
technical leadership in privacy in the international arena, an opportunity that should be seized. Public policy can
help to nurture the budding commercial potential of privacy-enhancing technologies, both through U.S.
government procurement and through the larger policy framework that motivates private-sector technology
engagement.
As it does for security, cloud computing offers positive new opportunities for privacy. By requiring
privacy-enhancing services from cloud-service providers contracting with the U.S. government, the government
should encourage those providers to make available sophisticated privacy-enhancing technologies to small
businesses and their customers, beyond what the small business might be able to do on its own.[141]

[140] A basis can be found in the newest version of the curriculum guidance of the Association for Computing
Machinery (http://www.acm.org/education/CS2013finalreport.pdf). Given all of the pressures on curriculum,
progress, as with cybersecurity, may hinge on growth in privacy-related research, business opportunities, and
occupations.
[141] A beginning can be found in the Federal Government’s FedRAMP program for certifying cloud services.
Initiated to address Federal agency security concerns, FedRAMP already builds in attention to privacy in the form
of a required Privacy Threshold Analysis and in some situations a Privacy Impact Analysis. The office of the U.S.
Chief Information Officer provides guidance on Federal uses of information technology that addresses privacy
along with security (see http://cloud.cio.gov/). It provides specific guidance on the cloud and FedRAMP
(http://cloud.cio.gov/fedramp), including privacy protection
(http://cloud.cio.gov/document/privacythresholdanalysisandprivacyimpactassessment).

5.4 Final Remarks
Privacy is an important human value. The advance of technology both threatens personal privacy and provides
opportunities to enhance its protection. The challenge for the U.S. Government and the larger community, both
within this country and globally, is to understand what the nature of privacy is in the modern world and to find
those technological, educational, and policy avenues that will preserve and protect it.
Appendix A. Additional Experts Providing Input
Yochai Benkler, Harvard
Eleanor Birrell, Cornell University
Courtney Bowman, Palantir
Christopher Clifton, Purdue University
James Costa, Sandia National Laboratory
Lorrie Faith Cranor, Carnegie Mellon University
Deborah Estrin, Cornell NYC
William W. (Terry) Fisher, Harvard Law School
Stephanie Forrest, University of New Mexico
Dan Geer, In-Q-Tel
Deborah K. Gracio, Pacific Northwest National Laboratory
Eric Grosse, Google
Peter Guerra, Booz Allen
Michael Jordan, University of California, Berkeley
Philip Kegelmeyer, Sandia National Laboratory
Angelos Keromytis, Columbia University
Thomas Kalil, OSTP
Jon Kleinberg, Cornell University
Julia Lane, American Institutes for Research
Carl Landwehr, George Washington University
David Moon, Ernst & Young
Keith Marzullo, National Science Foundation
Martha Minow, Harvard Law School
Tom Mitchell, Carnegie Mellon University
Deirdre Mulligan, University of California, Berkeley
Leonard Napolitano, Sandia National Laboratory
Charles Nelson, OSTP
Chris Oehmen, Pacific Northwest National Laboratory
Alex “Sandy” Pentland, Massachusetts Institute of Technology
Rene Peralta, National Institute of Standards and Technology
Anthony Philippakis, GenomeBridge
Timothy Polk, OSTP
Fred B. Schneider, Cornell University
Greg Shipley, In-Q-Tel
Lauren Smith, OSTP
Francis Sullivan, Institute for Defense Analysis
Thomas Vagoun, NITRD National Coordination Office
Konrad Vesey, Intelligence Advanced Research Activity
James Waldo, Harvard
Peter Weinberger, Google, Inc.
Daniel J. Weitzner, Massachusetts Institute of Technology
Nicole Wong, OSTP
Jonathan Zittrain, Harvard Law School
Special Acknowledgment
PCAST is especially grateful for the rapid and comprehensive assistance provided by an ad hoc group of
staff at the National Science Foundation (NSF), Computer and Information Science and Engineering
Directorate. This team was led by Fen Zhao and Emily Grumbling, who were enlisted by Suzanne
Iacono. Drs. Zhao and Grumbling worked tirelessly to review the technical literature, elicit
perspectives and feedback from a range of NSF colleagues, and iterate on descriptions of numerous
technologies relevant to big data and privacy and how those technologies were evolving.
NSF Technology Team Leaders
Fen Zhao, AAAS Fellow, CISE
Emily Grumbling, AAAS Fellow, Office of Cyberinfrastructure
Additional NSF Contributors
Robert Chadduck, Program Director
Almadena Y. Chtchelkanova, Program Director
David Corman, Program Director
James Donlon, Program Director
Jeremy Epstein, Program Director
Joseph B. Lyles, Program Director
Dmitry Maslov, Program Director
Mimi McClure, Associate Program Director
Anita Nikolich, Expert
Amy Walton, Program Director
Ralph Wachter, Program Director
President’s Council of Advisors on Science and
Technology (PCAST)
www.whitehouse.gov/ostp/pcast