comScore answers a few of my questions about their recent report
As I mentioned a few times in the Yahoo! group, I have been talking to the folks at comScore about their recent report on cookie deletion. I got an email back from Andrew Lipsman with some more information and partial answers to my questions and to a few passed along to me by other cookie-savvy folk.
According to Andrew, comScore will be publishing a more complete document describing their research methodology in the next few weeks. Until then, they’re giving me the scoop, so here you have it, direct from comScore (my questions in bold type, comScore’s answers follow in normal type):
(Andrew provided this preamble to his answers …)
The reason we have done this study for two cookies is to ensure that we are very familiar with the cookie structure, the different value pairs (e.g. GUID=1234) and their purpose. We are in particular interested in the ID value pair that identifies a user over time, and does not change when the cookie gets refreshed.
How did they identify the unique values of the cookies? Using the Set-Cookie response header, the Cookie request header, or the actual storage (cookie) file?
We are reading the cookie request call and parsing out specific ID value pairs. Over time we will observe a time series for each panelist for the value of this identifier corresponding to each cookie request. Cookie reset events are based on qualified value changes for a targeted ID value-pair.
How did they take into account non-persistence and/or cookie expiration settings?
The cookie domain value-pairs were chosen to represent passively assigned unique identifiers designed to be persistent over time. Cookies of this nature should only expire in the event that the visitor never returned within a relatively long expiration window.
How did they identify First vs. Third party cookies?
We are reading specific value-pairs for specific domain cookies. The first party cookie is the cookie used by the Portal site. The third party cookie was used by the ad serving company. All information is directly observed from metered panel activity. Recall information was not a source of determining preservation rates.
How was cookie blocking treated or accounted for?
The analysis is based on a sample for which at least one cookie value was observed.
What were the domains they examined? If not the domains, what was the nature of the first-party site?
We will not disclose the names of the sites used for the analysis. First party site is a major internet portal. The third-party site is a major ad server.
What were the survey questions asked? How many people were asked and how were they selected?
All deletion/retention figures were derived from direct panel observations, not from a recall-based survey. Only qualitative information came from the survey.
Obviously some of the answers provided are lacking, but I’m willing to admit that this may be more a function of my incomplete knowledge of what the comScore panel application is able to capture.
One particularly good question from a reader asked, in essence, whether P3P-instigated cookie blocking could be artificially running up cookie reset counts by counting each page request as a new cookie. comScore answered that the study only included panel members for which “at least one cookie value was observed.” (Plus, P3P is less likely to affect the first-party cookies that I’m more interested in …)
The encouraging news is that comScore is now officially on the record as willing to produce additional documentation about the study within the next week or so. I conveyed to Andrew some of the skepticism about the results they report, skepticism I told them they would hear, and pointed him to the ongoing conversation so hopefully the community’s concerns will be directly addressed in their methodology document.
Suffice it to say, if some major flaw appears in their research, the company will have major egg on their face as they approach their announced IPO. Conversely, if the research proves sound under examination, regardless of whether you’re a data purist looking for “perfection” or willing to manage based on trends however flawed the underlying data might actually be, we all have something to consider the next time someone asks us “how many visitors come to your web site?”
Perhaps the only true and precise answer is, “It depends!”
What do you think about the answers that comScore provided? As always, your comments are greatly appreciated!