Pei LI et al. Linking temporal records 19
[19]. In [18] the authors study behavior-based linkage, which
leverages the periodic behavior patterns of each entity in
linking pairs of records and learns such patterns from
transaction logs. Their behavior pattern differs from the
decay in our techniques in that decay learns the probability of
value changes over time for all entities. In addition, we do
not require a fixed and repeated value change pattern of
particular entities, and we apply decay in a global fashion (rather
than just between pairs of records) such that we can handle
value evolution over time. Burdick et al. [19] apply
domain-dependent rules to leverage temporal information in linking
records, while we are the first to present a theoretical model
that can be applied generally.
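As a rough illustration of what learning the probability of value changes over time can look like, the sketch below estimates, for a given time difference, the fraction of observed value spans that ended within that time. It is a simplification under stated assumptions: the function name and the sample span lengths are hypothetical, spans are assumed to be fully observed, and this is not the learning procedure defined earlier in the article.

```python
from bisect import bisect_right

def empirical_change_probability(span_lengths, delta_t):
    """Fraction of observed value spans that ended within delta_t time units.

    Assumption: every span is fully observed (no value still current
    at the end of the observation period).
    """
    if not span_lengths:
        raise ValueError("need at least one observed span")
    ordered = sorted(span_lengths)
    # bisect_right counts the spans whose length is <= delta_t.
    return bisect_right(ordered, delta_t) / len(ordered)

# Hypothetical lengths (in years) for which entities kept one attribute value.
spans = [1, 2, 2, 3, 5, 8, 10, 10]
curve = {dt: empirical_change_probability(spans, dt) for dt in (1, 2, 5, 10)}
```

The resulting curve is non-decreasing in the time difference and is pooled over all entities, rather than tied to a repeated pattern of any particular entity.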
Temporal data A suite of temporal data models [20], tem-
poral knowledge discovery paradigms [21] and data currency
models [6] have been proposed in the past; however, we are
not aware of any work focusing on linking temporal records.
The notion of decay has recently been proposed in the con-
text of data warehouses and streaming data [22,23]. They use
decay to reduce the effect of older tuples on data analysis. Of
them, backward decay [22] measures time difference back-
ward from the latest time and forward decay [23] measures
time difference forward from a fixed landmark. Their decay
function is either binary or a fixed (exponential or polyno-
mial) function. We differ in that 1) we consider time differ-
ence between two records rather than from a fixed point, and
2) we learn the decay curves purely from the data rather than
using a fixed function.
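To make the contrast concrete, the following is a minimal sketch of the three ways of measuring time difference discussed above. The polynomial forms, parameter names, and function names are illustrative assumptions chosen for this sketch; they are not definitions taken from [22,23] or from our techniques.

```python
def backward_decay_weight(t_item, t_now, alpha=1.0):
    """Backward decay: age is measured back from the latest time t_now.

    Assumed polynomial form: (age + 1) ** -alpha.
    """
    age = t_now - t_item
    return (age + 1.0) ** -alpha

def forward_decay_weight(t_item, t_now, landmark, beta=2.0):
    """Forward decay: time is measured forward from a fixed landmark.

    Assumed polynomial form g(n) = n ** beta, normalized by g at t_now.
    Requires t_now > landmark.
    """
    return ((t_item - landmark) / (t_now - landmark)) ** beta

def record_time_gap(t_a, t_b):
    """Time difference between two records, with no fixed reference point."""
    return abs(t_a - t_b)

w_back = backward_decay_weight(t_item=8, t_now=10)
w_fwd = forward_decay_weight(t_item=8, t_now=10, landmark=0)
gap = record_time_gap(2000, 2005)
```

Both weights depend on a reference point (the latest time or the landmark), whereas the gap between two records depends only on the records themselves, which is the quantity a learned decay curve would take as input.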
7 Conclusions and future work
This article studied linking records with temporal informa-
tion. We apply decay in record-similarity computation and
consider the time order of records in clustering; thus, our
linkage technique is tolerant of entity evolution over time and
can glean evidence globally for decision making. Future work
includes combining temporal information with other dimensions
of information, such as spatial information, to achieve
better results; considering erroneous data, especially
erroneous time stamps; and combining our work with recent work
on inferring temporal ordering of records [6] for linkage.
References
1. Elmagarmid A, Ipeirotis P, Verykios V. Duplicate record detection: A
survey. IEEE Transactions on Knowledge and Data Engineering, 2007,
19(1): 1–16
2. Koudas N, Sarawagi S, Srivastava D. Record linkage: similarity
measures and algorithms. In: Proceedings of the 25th ACM SIGMOD
International Conference on Management of Data. 2006, 802–803
3. Weikum G, Ntarmos N, Spaniol M, Triantafillou P, Benczúr A,
Kirkpatrick S, Rigaux P, Williamson M. Longitudinal analytics on web
archive data: It's about time! In: Proceedings of the Biennial
Conference on Innovative Data Systems Research. 2011, 199–202
4. McCallum A, Nigam K, Ungar L. Efficient clustering of
high-dimensional data sets with application to reference matching. In:
Proceedings of the 6th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. 2000, 169–178
5. Li P, Dong X, Maurino A, Srivastava D. Linking temporal records.
Proceedings of the VLDB Endowment, 2011, 4(7): 956–967
6. Fan W, Geerts F, Wijsen J. Determining the currency of data. In: Pro-
ceedings of the 30th Symposium on Principles of Database Systems of
Data. 2011, 71–82
7. Hassanzadeh O, Chiang F, Lee H, Miller R. Framework for evaluating
clustering algorithms in duplicate detection. Proceedings of the VLDB
Endowment, 2009, 2(1): 1282–1293
8. Fellegi I P, Sunter A B. A theory for record linkage. Journal of the
American Statistical Association, 1969, 64(328): 1183–1210
9. Dey D. Entity matching in heterogeneous databases: A logistic regres-
sion approach. Decision Support Systems, 2008, 44(3): 740–747
10. Hernández M, Stolfo S. Real-world data is dirty: Data cleansing
and the merge/purge problem. Data Mining and Knowledge Discov-
ery, 1998, 2(1): 9–37
11. Domingos P. Multi-relational record linkage. In: Proceedings of the
KDD-2004 Workshop on Multi-Relational Data Mining. 2004, 31–48
12. Winkler W. Methods for record linkage and Bayesian networks.
Technical report, Statistical Research Division, US Census Bureau,
Washington, DC, 2002
13. Ananthakrishna R, Chaudhuri S, Ganti V. Eliminating fuzzy duplicates
in data warehouses. In: Proceedings of the 28th International
Conference on Very Large Data Bases. 2002, 586–597
14. Chen Z, Kalashnikov D, Mehrotra S. Exploiting relationships for
object consolidation. In: Proceedings of the 2nd International Workshop
on Information Quality in Information Systems. 2005, 47–58
15. On B, Koudas N, Lee D, Srivastava D. Group linkage. In: Proceedings
of the IEEE 23rd International Conference on Data Engineering. 2007,
496–505
16. Wijaya D, Bressan S. Ricochet: A family of unconstrained algorithms
for graph clustering. In: Database Systems for Advanced Applications.
2009, 153–167
17. Flake G, Tarjan R, Tsioutsiouliklis K. Graph clustering and minimum
cut trees. Internet Mathematics, 2004, 1(4): 385–408
18. Yakout M, Elmagarmid A, Elmeleegy H, Ouzzani M, Qi A. Behav-
ior based record linkage. Proceedings of the VLDB Endowment, 2010,
3(1-2): 439–448
19. Burdick D, Hernández M A, Ho H, Koutrika G, Krishnamurthy R,
Popa L, Stanoi I, Vaithyanathan S, Das S R. Extracting, linking and in-
tegrating data from public sources: A financial case study. IEEE Data
Engineering, 2011, 34(3): 60–67
20. Ozsoyoglu G, Snodgrass R. Temporal and real-time databases: A
survey. IEEE Transactions on Knowledge and Data Engineering, 1995,