advertisement performance numbers on large datasets is a very data-intensive operation,
and the scalability and cost advantages of Hadoop and Hive can really help in
computing these numbers in a reasonable time frame and at a reasonable cost.
Many ad networks provide standardized CPC- and CPM-based ad units to advertisers.
CPC ads are cost-per-click ads: the advertiser pays the ad network an amount
that depends on the number of clicks that the particular ad gets from the users
visiting the site. CPM ads (short for cost per mille, that is, the cost per thousand
impressions), on the other hand, bill the advertiser an amount proportional to
the number of times the ad is shown to users on the site. Apart from these standardized
ad units, ads with more dynamic content that is tailored to each individual user
have also become common in the online advertisement industry in the last few years. Yahoo!
does this through SmartAds, whereas Facebook provides its advertisers with Social Ads.
The latter allows advertisers to embed information from a user’s network of friends;
for example, a Nike ad may refer to a friend of the user who recently fanned Nike and
shared that information with friends on Facebook. Facebook also provides
Engagement Ad units, with which users can interact more effectively with the ad,
be it by commenting on it or by playing embedded videos. In
general, online ad networks offer advertisers a wide variety of ads,
and this variety adds yet another dimension to the kinds of performance
numbers that advertisers want about their campaigns.
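To make the two billing models concrete, here is a minimal Python sketch of the arithmetic behind each. The function names and the rates are hypothetical, chosen only for illustration, and the sketch assumes that CPM billing counts impressions; it is not a description of how any particular ad network computes its charges.

```python
def cpc_charge(clicks: int, price_per_click: float) -> float:
    """Cost per click: the charge grows with the number of clicks."""
    return clicks * price_per_click


def cpm_charge(impressions: int, price_per_thousand: float) -> float:
    """Cost per mille: the charge grows with impressions, priced per 1,000."""
    return impressions / 1000 * price_per_thousand


# Hypothetical figures: 250 clicks at 0.50 each, and
# 100,000 impressions at 2.00 per thousand.
click_bill = cpc_charge(250, 0.50)
impression_bill = cpm_charge(100_000, 2.00)
print(click_bill, impression_bill)  # prints: 125.0 200.0
```

Under CPC the advertiser pays nothing for an ad that is shown but never clicked, whereas under CPM the same ad is billed for every thousand impressions regardless of clicks.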
At the most basic level, advertisers are interested in knowing the total number of
views and clicks an ad has received and the number of unique users behind them. For more dynamic ads, they
may even be interested in getting the breakdown of these aggregated numbers by the
kind of dynamic information shown in the ad unit or the kind of engagement action
undertaken by the users on the ad. For example, a particular advertisement may have
been shown 100,000 times to 30,000 unique users. Similarly, a video embedded inside
an Engagement Ad may have been watched by 100,000 unique users. In addition, these
performance numbers are typically reported for each ad, campaign, and account. An
account may have multiple campaigns, with each campaign running multiple ads on
the network. Finally, ad networks typically report these numbers for different time
durations. Typical durations are daily, rolling week, month to date, rolling
month, and sometimes even the entire lifetime of the campaign. Moreover, advertisers
also look at the geographic breakdown of these numbers, among other ways of
slicing and dicing this data; for example, what percentage of the total viewers or clickers of
a particular ad are in the Asia Pacific region.
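The difference between total and unique counts, reported at the ad, campaign, and account levels, can be sketched in a few lines of Python. This is an in-memory illustration only, not the Hive pipeline the case study describes; the event layout (account, campaign, ad, user, action), the identifiers, and the `summarize` function are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical event log: (account, campaign, ad, user_id, action).
events = [
    ("acct1", "campA", "ad1", "u1", "impression"),
    ("acct1", "campA", "ad1", "u1", "impression"),
    ("acct1", "campA", "ad1", "u2", "impression"),
    ("acct1", "campA", "ad2", "u1", "impression"),
    ("acct1", "campA", "ad1", "u2", "click"),
]


def summarize(events):
    """Return {(level_key, action): (total_events, unique_users)}."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for account, campaign, ad, user_id, action in events:
        # Each event counts toward its ad, its campaign, and its account.
        level_keys = (
            ("account", account),
            ("campaign", account, campaign),
            ("ad", account, campaign, ad),
        )
        for level_key in level_keys:
            totals[(level_key, action)] += 1
            users[(level_key, action)].add(user_id)
    return {key: (totals[key], len(users[key])) for key in totals}


report = summarize(events)
print(report[(("ad", "acct1", "campA", "ad1"), "impression")])    # (3, 2)
print(report[(("campaign", "acct1", "campA"), "impression")])     # (4, 2)
```

Totals add up across levels (3 + 1 = 4 impressions for the campaign), but unique counts do not: the campaign has 2 unique viewers, not 2 + 1, because user u1 saw both ads. Unique numbers therefore have to be computed from user-level data at every reporting level rather than rolled up from per-ad results.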
As is evident, there are four predominant dimension hierarchies: the account, campaign,
and ad dimension; the time period; the type of interaction; and the user dimension.
The last of these is used to report unique numbers, whereas the other three are
the reporting dimensions. The user dimension is also used to create aggregated geographic
profiles for the viewers and clickers of ads. Taken together, all this information allows
the advertisers to tune their campaigns to improve their effectiveness on any given ad
network. Aside from the multidimensional nature of this set of pipelines, the volumes