How Google and Adobe Identify Your Web Visitors
A few weeks ago I wrote about cookies and how they are used in web analytics. I also wrote about the browser feature called local storage, and why it’s unlikely to replace cookies as the primary way analytics tools identify visitors. Those two concepts really set the stage for something that is likely to be far more interesting to the average analyst: how tools like Google Analytics and Adobe Analytics uniquely identify website visitors. So let’s take a look at each, starting with Google.
Google Analytics
Classic GA
The classic Google Analytics tool uses a series of cookies to identify visitors. Each of these cookies is set and maintained by GA’s JavaScript tracking library (ga.js), and has a name that starts with __utm (a remnant from the days before Google acquired Urchin and rebranded its product). GA also allows you to specify the scope of the cookie, but by default it is scoped to your site’s top-level domain (think example.com rather than www.example.com), meaning the same cookie will be used on all subdomains of your site as well.
- __utma identifies a visitor and a visit. It has a 2-year expiration that will be updated on every request to GA.
- __utmb determines new sessions and visits. It has a 30-minute expiration (the same as the standard amount of time before a visit “times out” in GA) that will be updated on every request to GA.
- __utmz stores all GA traffic source information (i.e. how the visitor found your site). If you look closely at its value, you’ll be able to spot campaign query parameters or search engine referring domains, or at the very least the identifier of a “direct” visit. It has an expiration of 6 months that is updated on every request to GA.
- __utmv stores GA’s custom variable data (visitor-level only). It has an expiration of 2 years that is updated on every request to GA.
That was a mouthful – you might want to read through it again to make sure you didn’t miss anything! There are even a few cookies I didn’t list: GA sets them, but they don’t contribute at all to visitor identification. If that looks like a lot of data sitting in cookies to you, you’re exactly right – and it helps explain why Classic GA offers a much smaller set of reports than some of the other tools on the market. While I’m sure GA does a lot of work on the back-end, with all those cookies storing traffic source and custom variable data, there’s definitely a lot more burden being placed on the browser to keep a visitor’s “profile” up-to-date than with other analytics tools I’ve used. Understanding how Classic GA used cookies is important to understanding just what an advancement Google’s Universal Analytics product really is.
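To make the traffic source cookie a little more concrete, here is a rough Python sketch of pulling the source and medium out of a __utmz-style value. The layout it assumes (four dot-separated numeric fields followed by pipe-separated utmcsr/utmccn/utmcmd pairs) is my reading of what these cookies typically look like, not something GA guarantees, and the function name parse_utmz and the sample value are made up for illustration.

```python
def parse_utmz(value: str) -> dict:
    """Split a __utmz-style value into its traffic source fields."""
    # Assumed layout: domainHash.timestamp.sessionCount.campaignCount.<campaign data>
    # maxsplit=4 keeps any dots inside the campaign data (e.g. referring domains) intact.
    parts = value.split(".", 4)
    if len(parts) != 5:
        raise ValueError("unexpected __utmz layout")
    domain_hash, timestamp, session_count, campaign_count, campaign = parts

    # The campaign data is assumed to be key=value pairs separated by "|".
    fields = {}
    for pair in campaign.split("|"):
        key, _, val = pair.partition("=")
        fields[key] = val

    return {
        "domain_hash": domain_hash,
        "timestamp": int(timestamp),
        "session_count": int(session_count),
        "campaign_count": int(campaign_count),
        "source": fields.get("utmcsr"),
        "campaign": fields.get("utmccn"),
        "medium": fields.get("utmcmd"),
    }


# Illustrative value for a "direct" visit (not taken from a real browser).
direct = parse_utmz(
    "123456789.1400000000.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"
)
print(direct["source"], direct["medium"])  # prints: (direct) (none)
```

A search engine visit would show up the same way, with the referring engine in the source field and a different medium.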
Universal Analytics
Of all the improvements Google Universal Analytics has introduced, perhaps none is as important as the way it identifies visitors to your website. Now, instead of using a set of 4 cookies to identify visitors, maintain visit state, and store traffic source and custom variable data, GA uses just one, called _ga, with a 2-year expiration, and the same default scope as with Classic GA (top-level domain). That single cookie is set by the Universal Analytics JavaScript library (analytics.js) and used to uniquely identify a visitor. It contains a value that is relatively short compared to everything Classic GA packed into its 4 cookies. Universal Analytics then uses that one ID to maintain both visitor and visit state inside its own system, rather than in the browser. This reduces the number of cookies being stored on the visitor’s computer, and opens up all kinds of new possibilities in reporting.
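For comparison, here is a small Python sketch of reading the visitor ID out of a _ga value. It assumes the commonly seen layout of a version marker, a domain depth, a random number, and a timestamp, with the last two fields together acting as the ID. That layout is an assumption on my part, and client_id_from_ga is just an illustrative name.

```python
def client_id_from_ga(value: str) -> str:
    """Return the visitor ID portion of a _ga-style cookie value."""
    # Assumed layout: "GA1.<domain depth>.<random number>.<timestamp>"
    parts = value.split(".")
    if len(parts) != 4 or not parts[0].startswith("GA"):
        raise ValueError("unexpected _ga layout")
    # The last two fields together identify the visitor.
    return ".".join(parts[2:])


# Illustrative value, not taken from a real browser.
print(client_id_from_ga("GA1.2.1234567890.1400000000"))  # prints: 1234567890.1400000000
```

Notice how little there is to parse: no traffic source, no custom variables, just an ID.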
One side note about GA’s cookies, and this applies to both Classic and Universal: there is code that can be used to pass cookie values from one domain to another. For cases where your site spans multiple domains, this code passes GA’s cookie values through the query string onto the next page, allowing you to preserve your visitor identification across sites. I won’t get into the details of that code here, but it’s useful to know that feature exists.
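Without touching GA’s actual linking code, the general idea of handing an ID from one domain to another through the query string can be sketched in a few lines of Python. Everything here is hypothetical: the parameter name visitor_id, the helper names, and the example URL are all invented for illustration, and the real feature does more than this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PARAM = "visitor_id"  # hypothetical parameter name, for illustration only


def decorate_link(url: str, cookie_value: str) -> str:
    """Append the visitor cookie value to an outbound link's query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((PARAM, cookie_value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_link(url: str):
    """On the destination domain, recover the value passed in the link."""
    return dict(parse_qsl(urlsplit(url).query)).get(PARAM)


link = decorate_link("https://shop.example.org/cart?item=42", "1234567890.1400000000")
print(link)
print(read_link(link))
```

The destination site would then write the recovered value into its own cookie, so both domains end up with the same visitor ID.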
Many of the new features introduced with Universal Analytics – including additional custom dimensions (formerly variables) and metrics, enhanced e-commerce tracking, attribution, etc. – are either dependent upon or made much easier by that simpler approach to cookies. And the ability to identify your own visitors with your own unique identifier – part of the new “Measurement Protocol” introduced with Universal Analytics – would have fallen somewhere between downright impossible and horribly painful with Classic GA.
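To give a feel for what supplying your own identifier looks like, here is a hedged Python sketch that builds (but does not send) a Measurement Protocol pageview payload. The parameter names v, tid, cid, t, and dp are the ones I understand the protocol to use. The tracking ID and visitor ID below are placeholders, and build_pageview_hit is an illustrative helper, not part of any Google library.

```python
from urllib.parse import urlencode


def build_pageview_hit(tracking_id: str, client_id: str, page_path: str) -> str:
    """Build the URL-encoded body of a pageview hit without sending it."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the GA property the hit belongs to
        "cid": client_id,    # the visitor ID, which you can supply yourself
        "t": "pageview",     # hit type
        "dp": page_path,     # page path
    }
    return urlencode(params)


# Placeholder values for illustration only.
payload = build_pageview_hit("UA-XXXXX-Y", "my-crm-visitor-0001", "/pricing")
print(payload)
```

Because the visitor ID is just another parameter on the hit, any system that can make an HTTP request can identify a visitor, not only a browser holding a cookie.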
This one change to visitor identification put GA on a much more level playing field with its competitors – one of whom we’re about to cover next.
Adobe Analytics
Over the 8 years or so that I’ve been implementing Adobe Analytics (and its Omniture SiteCatalyst predecessor), Adobe’s best-practices approach to visitor identification has changed many times. We’ll look at 4 different iterations – but note that with each one, Adobe has always used a single ID to identify visitors, and then maintained visitor and visit information on its servers (like GA now does with Universal Analytics).
Third-party cookie (s_vi)
Originally, all Adobe customers implemented a third-party cookie. This is because rather than creating its visitor identifier in JavaScript, Adobe has historically created this identifier on its own servers. Setting the cookie server-side allows Adobe to offer additional security and a greater guarantee of uniqueness. Because the cookie is set on Adobe’s server, and not on your server or in the browser, it is scoped to an Adobe subdomain, usually something like companyname.112.2o7.net or companyname.dc1.omtrdc.net, and is third-party to your site.
This cookie, called s_vi, has an expiration of 2 years, and is made up of 2 hexadecimal values, surrounded by [CS] and [CE]. On Adobe’s servers, these 2 values are converted to more familiar base-10 numbers, but storing them as hexadecimal keeps the cookie value shorter.
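Here is a quick Python sketch of that hexadecimal-to-decimal conversion. It assumes the two hex values are separated by a dash (as with s_fid) and allows for an optional version prefix such as v1| after [CS], which I have seen in real cookies but which is an assumption here. The sample value and the name parse_s_vi are made up, and how Adobe actually stores the result is not something this sketch claims to know.

```python
import re

# [CS], an optional "v<digit>|" prefix (assumed), two hex values joined by a dash, then [CE].
S_VI_PATTERN = re.compile(r"^\[CS\](?:v\d+\|)?([0-9A-Fa-f]+)-([0-9A-Fa-f]+)\[CE\]$")


def parse_s_vi(value: str) -> tuple[int, int]:
    """Return the two halves of an s_vi-style value as base-10 integers."""
    match = S_VI_PATTERN.match(value)
    if match is None:
        raise ValueError("unexpected s_vi layout")
    high, low = match.groups()
    return int(high, 16), int(low, 16)


# Illustrative value chosen so the conversion is easy to check by hand.
print(parse_s_vi("[CS]v1|00000000000000FF-0000000000000010[CE]"))  # prints: (255, 16)
```

Real values use all 16 hex digits in each half, so the decimal versions run to around 20 digits each, which is exactly why the cookie sticks with hexadecimal.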
First-party cookie (s_vi)
You may remember from an earlier post that third-party cookies have a less-than-glowing reputation, and almost all the reasons for this are valid. Because third-party cookies are much more likely to be blocked, several years ago Adobe started offering customers the ability to create a first-party cookie instead. The cookie is still set on Adobe’s servers – but using this approach, you actually allow Adobe to manage a subdomain of your site (usually metrics.companyname.com) for you. All Adobe requests are sent to this subdomain, which looks like part of your site – but it actually still just belongs to Adobe. It’s a little sneaky, but it gets the job done, and allows your Adobe tracking cookie to be first-party.
First-party cookie (s_fid)
In most cases, using the standard cookie (either first- or third-party) works just fine. But what if you’re using a third-party cookie and you find that a lot of your visitors have browser settings that reject it? Or what if you’re using a first-party cookie, but you have multiple websites on completely different domains? Do you have to set up subdomains for first-party cookies for every single one of them? What a hassle!
To solve this problem, where companies are worried about third-party cookies but can’t set up a first-party cookie for all their different websites, a few years ago Adobe began offering yet another alternative. This approach still uses the standard cookie, but offers a fallback when that cookie gets rejected. The fallback cookie is called s_fid, and it is set with JavaScript and has a 2-year expiration. Whenever the traditional s_vi cookie cannot be set (either because it’s the basic Adobe third-party cookie and the browser rejects it, or because you have multiple domains and don’t have first-party cookies set up for all of them), Adobe will use s_fid to identify your visitors. Note that the value (2 hexadecimal values separated by a dash) looks very similar to the value you’d find in s_vi. It’s a nice approach for companies that just can’t set up first-party cookies for every website they own.
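The fallback order described above can be sketched in Python like this. It only illustrates the precedence (s_vi when it exists, s_fid otherwise). It is not Adobe’s code, and the function name and sample cookie values are invented.

```python
def visitor_id(cookies: dict):
    """Return (cookie name, value) using s_vi when present, otherwise s_fid."""
    for name in ("s_vi", "s_fid"):
        value = cookies.get(name)
        if value:
            return name, value
    # Neither cookie could be read.
    return None


# A browser that rejected the s_vi cookie but accepted the JavaScript-set s_fid.
print(visitor_id({"s_fid": "0A1B2C3D4E5F6A7B-1A2B3C4D5E6F7A8B"}))

# A browser that accepted both: s_vi takes precedence.
print(visitor_id({
    "s_vi": "[CS]v1|00000000000000FF-0000000000000010[CE]",
    "s_fid": "0A1B2C3D4E5F6A7B-1A2B3C4D5E6F7A8B",
}))
```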
Adobe Marketing Cloud ID
The current iteration of Adobe’s visitor identification is a brand-new ID that allows for a single ID across Adobe’s entire suite of products (called the “Marketing Cloud”). That means if you use Adobe Analytics and Adobe Target, they can now both identify your visitors the exact same way. It must sound crazy that Adobe has owned both tools for over 6 years and that functionality is only now built right into the product – but it’s true!
This new Marketing Cloud ID works a little differently than any approach we’ve looked at so far. A request will be made to Adobe’s server, but the cookie won’t be set there. Instead, an ID is created and returned to the page as a snippet of JavaScript code. Adobe’s JavaScript library then uses that code to write the ID to a first-party cookie. That cookie is named AMCV_ followed by your company’s unique organization ID at Adobe, and it has an expiration of 2 years. The value is much more complex than with either s_vi or s_fid, but I’ll save more details about the Marketing Cloud ID until next time. It offers a lot of new functionality and has some unique quirks that probably deserve their own post. We’ve covered a lot of ground already – so check back soon and we’ll take a much more in-depth look at Adobe’s Marketing Cloud!