In my article in Monday’s Times, “To Aim Ads, Web Is Keeping Closer Eye on What You Click,” I worked with comScore to develop a new measure for Web companies: how much data they can collect from users.
On the Internet, companies are typically ranked by how many different people visit their sites in a given month. And when Microsoft announced its $41 billion bid for Yahoo, comScore and Nielsen Online promptly put out estimates counting how many people would be in the merged company’s total audience.
But audience size is not everything in the online world. Advertisers increasingly want media companies to find their most likely customers and show their ads only to those people, rather than to the site’s entire audience.
Such targeted advertising requires data, so there’s a good argument to be made that we can spot the companies that will lead the pack in online advertising by looking at the depth of data that large media companies can collect about each of their Web visitors. Here is some more detail about the methodology comScore and I came up with:
The comScore study tallied five types of “data collection events” on the Internet for 15 large media companies. Four of these events are actions that occur on the sites the media companies run: pages displayed, search queries entered, videos played, and advertising displayed. Each time one of those four things occurs, there is a conversation between the user’s computer and the server of the company that owns the site or serves the ad.
The fifth area that comScore looked at was ads served on pages anywhere on the Web by advertising networks owned by the media companies. These include text ads provided by Google’s AdSense network, for example, and display ads from AOL’s Advertising.com unit. Ad networks give these companies the ability to note where you are on other Web sites when they serve you an ad. Google, for example, can note that your Internet Protocol address is on Kelley Blue Book if it serves you an AdSense ad there.
So each time one of these five things occurs, it counts as a “data collection event.” The data that is transferred varies for each. Typically, the Web company receives information about the type of page the user is looking at, the user’s I.P. address (which sometimes offers clues to the user’s location) and, for advertising, the content of the ad. Most Web sites and advertising networks place cookies on users’ browsers, allowing them to recognize each time they interact with that user in the future. Cookies themselves don’t identify users by name, but if users register with a Web site, their identities can be linked to their cookies.
When all these data collection events are combined for users in the United States in December 2007, Yahoo had the potential to gather data through 400 billion events in the month. Time Warner, which includes AOL, was second, with about 100 billion events. Google was not far behind, with 91 billion.
Interestingly, Microsoft, with 51 billion events in December, is far behind not only the other big Internet companies but also the News Corporation’s Fox Interactive Media, which owns MySpace.
Below is a view of this data. Here is an image that shows the data behind the graphic, as well as a version of the data that shows the average number of data collection events for each company’s users.
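For readers who want to reproduce the arithmetic, here is a minimal sketch of how the tally works, assuming (as described above) that each of the five event types counts equally and that the per-user figure is simply total events divided by unique visitors. The class and field names are hypothetical labels of my own, not comScore’s, and the numbers in the example are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MonthlyEvents:
    """One company's monthly counts for the five event types (hypothetical field names)."""

    pages_displayed: float
    search_queries: float
    videos_played: float
    ads_on_own_sites: float
    ad_network_pages: float  # potential, not definite, events for most networks

    def total(self) -> float:
        # Each of the five interactions counts as one "data collection event",
        # so the monthly figure is the plain sum of the five categories.
        return (
            self.pages_displayed
            + self.search_queries
            + self.videos_played
            + self.ads_on_own_sites
            + self.ad_network_pages
        )

    def per_user(self, unique_users: float) -> float:
        # Average events per user: total events divided by unique visitors.
        if unique_users <= 0:
            raise ValueError("unique_users must be positive")
        return self.total() / unique_users


# Made-up numbers, for illustration only.
example = MonthlyEvents(
    pages_displayed=10, search_queries=5, videos_played=2,
    ads_on_own_sites=20, ad_network_pages=3,
)
print(example.total())      # 40
print(example.per_user(4))  # 10.0
```

Because every event type is weighted equally, a company can rank high simply by serving many pages or ads; the sketch does not try to judge how useful any one event is.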
What is important here is not the precise numbers, but the overall picture that the biggest Internet companies are accumulating many different ways to collect data about users. Many caveats are needed: Not all of this data is useful; not all of it is retained by the companies with access to it; much of it cannot be traced back to individuals.
Moreover, this method often identifies several data collection events on a single Web page. That is because one page can contain search results, video players, and ads from several sources, each of which can send different data in a different direction.
Another caveat: comScore’s method of measuring advertising networks has limitations that make it difficult to compare one network to another. For the networks run by Yahoo, Microsoft and AOL, comScore doesn’t count how many ads they actually display, but how many pages their ads could appear on. This substantially overcounts the networks’ data collection because some Web sites have several networks that compete to place ads on their pages. ComScore counts the page views on those pages without knowing whether that network in fact served an ad on each page view. So the ad network tallies for these companies represent potential data collection events rather than definite ones.
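To see why counting pages rather than ads inflates the totals, consider a rough sketch. It assumes a hypothetical site with a single ad slot per page for which two networks compete; “NetworkA” and “NetworkB” are placeholder names, and the data structure is my own invention, not comScore’s.

```python
from collections import Counter


def network_tallies(page_views):
    """Tally ad-network data collection events two ways.

    page_views: iterable of (eligible_networks, served_network) pairs, where
    eligible_networks is the set of networks competing for the page's ad slot
    and served_network is the one that actually won it (or None).
    Returns (potential, definite) Counters keyed by network name.
    """
    potential = Counter()
    definite = Counter()
    for eligible, served in page_views:
        # Page-based counting: every eligible network is credited with the view.
        potential.update(eligible)
        # Ad-based counting: only the network that served the ad is credited.
        if served is not None:
            definite[served] += 1
    return potential, definite


# Hypothetical site: two networks compete for one ad slot on three page views.
views = [
    ({"NetworkA", "NetworkB"}, "NetworkA"),
    ({"NetworkA", "NetworkB"}, "NetworkB"),
    ({"NetworkA", "NetworkB"}, "NetworkA"),
]
potential, definite = network_tallies(views)
print(sum(potential.values()), sum(definite.values()))  # 6 3
```

With two competing networks, the page-based tally is double the number of ads actually served; the more networks compete on a site, the larger the overcount.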
For Google, comScore can actually identify when ads from its AdSense network are loaded on a Web page. But this measure could overstate Google’s potential to collect data. That’s because Google may display several short text ads on one page, and comScore counts each of those text ads separately. To compensate in this study, comScore tried to estimate how many pages Google ads are loaded on. It divided its count of ads displayed by 4.17, its estimate of the average number of AdSense ads that appear together on a page.
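The adjustment for Google is a single division. Here is a minimal sketch of it; the function and constant names are hypothetical, and the only figure taken from comScore is the 4.17 ads-per-page estimate.

```python
# comScore's estimate of the average number of AdSense ads shown together on a page.
ADSENSE_ADS_PER_PAGE = 4.17


def adsense_pages(ads_displayed: float, ads_per_page: float = ADSENSE_ADS_PER_PAGE) -> float:
    """Convert a raw count of AdSense ads displayed into an estimated count of pages."""
    if ads_per_page <= 0:
        raise ValueError("ads_per_page must be positive")
    # Several text ads usually share one page, so counting ads would credit
    # Google with one event per ad; dividing approximates one event per page.
    return ads_displayed / ads_per_page


pages = adsense_pages(4.17e9)
print(round(pages))  # 1000000000
```

In other words, 4.17 billion AdSense ads displayed would be treated as roughly one billion page-level data collection events. Because 4.17 is an average, the estimate will be too high for pages that carry more ads and too low for pages that carry fewer.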
ComScore’s December 2007 figures for AOL, moreover, do not include the reach of Tacoda, the behavioral targeting firm AOL just bought.
I do not suggest using the ad network figures to make comparisons between the Internet giants. Instead, you should look at them as potential expansions of these companies’ reach. They do collect significant data from their ad networks, but possibly not as much as these figures suggest.
These comScore figures, though eye-popping, provide only a minimum count of data collection events. There are other ways these companies obtain data that comScore was unable to capture. The two largest left out here are ad-serving data (from the likes of Microsoft’s Atlas and Google’s desired partner DoubleClick) and user-volunteered data. By the latter, I mean the information that users enter when they register for sites or e-mail accounts, as well as all the juicy details they post on social networking pages.
Arnie Gullov-Singh, vice president of advertising technology at Fox Interactive Media, the owner of MySpace, likes to call this sort of information “hand-raiser data,” since people choose to type it in.
I hope what I’ve done here will start a conversation. It would be fascinating to see someone try to quantify the aspects of data collection left out of this analysis. Atlas serves 6 billion ads per day, for example, which could be added in.
It is also well worth watching whether most of the data proves lucrative. Perhaps there will be diminishing returns at some point, though Mike Galgon, chief advertising strategist at Microsoft (and co-founder of aQuantive), told me he didn’t think there would be.
Consumers get all kinds of free services and content on the Web because they are shown ads, and media companies are increasingly showing them ads based on data they have collected about them. So, in a sense, consumers “pay” for free content and features like e-mail by letting companies collect this data about them.
When regulators evaluate mergers from a consumer protection standpoint, they consider whether mergers would end up raising the prices that consumers pay for those companies’ products. Since people “pay” with information about themselves on the Internet, rather than with dollars, regulators should consider consumer data when they consider mergers.
If Yahoo merges with Microsoft or any other company, the merged company will have significantly more data about consumers. Will consumers get more, or better, free services in exchange?
A famous New Yorker cartoon from 1993 showed two dogs at a computer, with one saying to the other, "On the Internet, nobody knows you're a dog."
That may no longer be true.
A new analysis of online consumer data shows that large Web companies are learning more than ever before about the gritty details of what people search for and do on the Internet, gathering clues about the tastes and preferences of a typical user several hundred times a month.
These companies use that information to predict which content and ads people are most likely to want to see. They can charge steep prices for these carefully tailored ads because of their high response rates.
The analysis, conducted for The New York Times by the research firm comScore, provides what advertising executives say is the first broad estimate of the amount of consumer data transmitted to Internet companies every day.
The analysis indicates that Web companies are, in effect, taking the trail of crumbs people leave behind as they move around the Internet and analyzing it to anticipate people's next steps. So anybody who searches for information on such disparate topics as iron supplements, airline tickets, hotels and soft drinks may see ads for those products and services later on.
Consumers have not complained to any great extent about data collection online. But privacy experts say that is because the collection is invisible to them.
"When you start to get into the details, it's scarier than you might suspect," said Marc Rotenberg, executive director of the Electronic Privacy Information Center, a privacy-rights group. "We're recording preferences, hopes, worries and fears."
But executives from the largest Web companies say that privacy fears are misplaced, and that they have policies in place to protect consumers' names and other personal information from advertisers. Moreover, they say, the data is a boon to consumers, because it makes the ads they see more relevant.
Detail producing payoffs
The rich troves of data at the fingertips of the biggest Internet companies also are creating a new kind of digital divide within the industry. Traditional media companies, which collect far less data about visitors to their sites, are increasingly at a disadvantage when they compete for ad dollars.
The major television networks and magazine and newspaper companies "aren't even in the same league," said Linda Abraham, an executive vice president at comScore.
Yahoo's New Appeal to Women
Given how the media world has changed in recent years, it's fitting that what's essentially the next major women's magazine will come courtesy of Yahoo! (YHOO).
Against the turbulent backdrop of Microsoft's (MSFT) bid to buy the Web portal and reports of its own maneuvers to find alternate deal partners, Yahoo is quietly putting the finishing touches on a major new content site aimed at women between the ages of 25 and 54. Much like a general-interest women's magazine, the site will focus on familiar content categories: fashion and beauty, entertainment, health, astrology, home, food, parenting, relationships, and work and money. It's not yet clear what the site will be called, but one name in contention is Shine.
Despite Yahoo's well-documented stock price ills, comScore (SCOR) data still show the portal as the most-trafficked site among U.S. Web users. Its key category sites in the areas of finance, sports, and news are the most-trafficked spots of their ilk as well (although Yahoo Sports has been tussling with ESPN.com for the top slot of late). The company has similar aims for its women's site. "At Yahoo, we have to publish in categories that have super-mass scale," says Scott Moore, a Yahoo senior vice-president and head of media. Accordingly, he adds, "Our ambitions are very big."
Leading the Pack
Top players among what comScore identifies as "women's community" sites are NBC Universal's iVillage.com (GE), AOL Living (TWX), and fast-rising newcomer Everyday Health. Those three sites respectively notched 17.8 million, 16.9 million, and 14.4 million unique U.S. visitors in January.
Yahoo executives argue that women are broadly underserved online, a claim the established players in the space would dispute. But other recent developments hint at increased activity at female-aimed online plays. A quintet of high-powered media professionals (adwoman Mary Wells, gossip columnist Liz Smith, pundit Peggy Noonan, journalist Lesley Stahl, and former top publishing executive Joni Evans) joined up to launch wowowow.com, a Web site aimed at women over 40. And InterActiveCorp's (IACI) search engine, Ask.com, is expected to sharpen its focus on its predominantly female audience.
The new site's editor-in-chief will be Brandon Holley, who came to Yahoo in November 2007 after stints as top editor of the now-defunct magazines Elle Girl and Jane. Holley is overseeing an editorial staff of about 12, which, she notes, is around a quarter of the size of the staffs she headed at those magazines.
A One-Stop Shop
The reigning ethos of Web media, and especially of the digital cognoscenti, would appear to argue against a site that acts as a giant, edited aggregator of sorts. Yahoo-ites involved with the project reject this, pointing to the portal's success in attracting Web surfers to broad sites like Yahoo News. They also claim that Yahoo's extensive data on Web users (particularly from the roughly 40 million women between the ages of 25 and 54 who visit Yahoo each month) gives them a unique leg up in designing such an ambitious site. "She would like to come to one place," says Holley of the target consumer. "A one-stop shop.… They are just not satisfied with what they are finding in their various pockets."
The appearance of the site will differ somewhat from Yahoo's extant category plays. Mock-up pages spotlight many short items; having multiple options to click on gives the pages more of what Moore terms a "blogroll" feel. Users will be able to create their own blogs, some of which may be spotlighted by the editors. They will also be able to create their own home pages, and be able to "clip" certain articles and put them on such pages, although that technology is not expected to be in place at the launch in early spring.
Yahoo executives say they are in advanced discussions with several established publishers over content partnerships, although no deals are finalized. Yahoo is also expected to link out to other sites' offerings. Yahoo executives have scheduled group meetings in three cities to reach out to key female bloggers and other women they have identified as influential. The first such meeting will be Mar. 8 in San Francisco.