All Web Analytics Tools Are the Same (when it comes to data capture)
I started to write a post on using web analytics tools — Google Analytics, specifically, but with a nod to Webtrends as well — to track traffic to custom tabs and interactive elements on Facebook pages. But, as I started thinking through that content, I realized that I needed to back up and make sure I had a good, clean explanation of a key aspect of the mechanics of page tag-based web analytics tools. I poked around on the interweb a bit and found some quick explanations that were accurate, but that really weren’t as detailed as I was hoping to find.
Regardless of whether you’re trying to track Facebook or not, it’s worth having a good, solid understanding of these underlying mechanics:
- If you’re a web analyst, understanding this is like understanding gravity if you’re a human being — there are some immutable laws of the internet, and knowing how those laws drive the data you are seeing will open up new possibilities for capturing activity on your site
- If you’re a developer, then this will be a quick read, but understanding it will make you the hero to both your web analysts and (assuming they’re not glory hogs) the people they support with their analysis, because you will be able to suggest some clever ways to capture useful information
By the end of this post, you should understand both the title and why the URLs I listed below are what make it so:
- Google Analytics = http://www.google-analytics.com/__utm.gif
- Webtrends = http://statse.webtrendslive.com/<ID>/dcs.gif
- SiteCatalyst = https://<custom domain>/b/ss/<account name>/1/<code version>/<random ID>
- Coremetrics = http://<custom domain>/cm or http://<custom domain>/eluminate
I’ve been deep under the hood with both Google Analytics and Webtrends for this, but the same principles apply to all tools (because they’re all bounded by the Physics of the Internet). I’m going to talk about Google Analytics the most in-depth, because it has the largest market share (measured by number of sites tagged with it), and I’ll try to call out key differences when appropriate.
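Let’s start with a simple picture of how all of these tools work. When a visitor comes to a page on your site, the following sequence of events happens: (1) Javascript figures out stuff about the visitor, (2) Javascript packages that info into a single string of information, (3) Javascript makes an image request with that string tacked on the end, (4) the web analytics tool reads the string and puts the information into a database, and (5) a web analyst queries the database for insights.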
Steps 2 and 3 are really the crux of the biscuit, but we need to make sure we’re all clear on the first step, too, before getting to the fun there.
1 – Javascript figures out stuff about the visitor
We all know what Javascript is, right? It’s one of the key languages that can be interpreted by a web browser so that web pages aren’t just static text and images: dropdown menus, mouseovers, and such. But, Javascript also enables some things to go on behind the scenes. The basic data capture method for any tag-based web analytics tool is to run Javascript to determine what page the visitor is on, what relevant cookies are set on the user’s machine, whether the visitor has been to the site before, what browser the visitor is using, what language encoding is set for the browser, the user’s screen resolution, and a slew of other fairly innocuous details. This happens every time a visitor views a page running the page tag. So, great — a visitor has viewed a page, and the Javascript has figured out a bunch of details about the visitor and the page. Now what? It’s on to step 2!
(I realize I’m saying “Javascript” here, and most tools also have Actionscript support for tracking activity within Flash — for the purposes of this post, I’m just going to stick with Javascript, but I’ll get back to Actionscript in my next post!)
2 – Javascript packages that info into a single string of information
The next step is pretty simple, but it’s where the magic starts to happen. Let’s say the Javascript in step 1 had figured out the following information about a visitor to a page:
- Site = www.gilliganondata.com
- Page title = The Fun of Facebook Measurement
- Page URL = /index.php/2010/01/11/the-fun-of-facebook-measurement/
- Browser language = en-us
Converting that info into a single string is pretty straightforward. Let’s start by pretending we’re going to put it into a single row in a pipe-delimited file. It would look like this:
Site (hostname) = www.gilliganondata.com | Page name = The Fun of Facebook Measurement | Page URL = /index.php/2010/01/11/the-fun-of-facebook-measurement/ | Browser language = en-us
Now, rather than using the pretty, readable names for each of the four characteristics of the page view, let’s use some variable names (these are the Google Analytics variable names, but the documentation for any web analytics tool will provide their specific variable names for these same things):
- Site (hostname) -> utmhn
- Page title -> utmdt
- Page URL -> utmp
- Browser language -> utmul
So, now our string looks like:
utmhn = www.gilliganondata.com | utmdt = The Fun of Facebook Measurement | utmp = /index.php/2010/01/11/the-fun-of-facebook-measurement/ | utmul = en-us
We used pipes to separate out the different variables, but there’s nothing really wrong with using something different, is there? Let’s go with using “&” instead and eliminate the spaces around equal signs and the delimiters. The single string now looks like this:
utmhn=www.gilliganondata.com&utmdt=The Fun of Facebook Measurement&utmp=/index.php/2010/01/11/the-fun-of-facebook-measurement/&utmul=en-us
Now, we’ve still got some “special” characters that aren’t going to play nice in Step 3, namely spaces and “/”s, so let’s replace those characters with the appropriate URL encoding (%20 for the spaces and %2F for the “/”s):
utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us
It looks a little messy, but it’s a single, portable string that has the exact information that was listed in the four bullets that started this section. While it might be painful to reverse-engineer this string into a more reader-friendly format by hand, it’s a snap to do programmatically (which is exactly what web analytics tools do…as we’ll discuss in step 4) or in Excel.
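As a rough sketch of step 2, here is the same packaging done in Python (a real page tag does this in Javascript; Python is used here only because it is easy to run). The function name `build_tracking_string` is made up for illustration, and only the `utm*` keys come from Google Analytics.

```python
from urllib.parse import quote


def build_tracking_string(values):
    """Join key-value pairs with "&", URL-encoding each value.

    Hypothetical helper for illustration; not part of any analytics library.
    """
    # safe="" makes quote() encode "/" as %2F too (by default it leaves "/" alone).
    return "&".join(f"{key}={quote(value, safe='')}" for key, value in values.items())


# The four details from the bullets at the start of this section.
page_view = {
    "utmhn": "www.gilliganondata.com",
    "utmdt": "The Fun of Facebook Measurement",
    "utmp": "/index.php/2010/01/11/the-fun-of-facebook-measurement/",
    "utmul": "en-us",
}

tracking_string = build_tracking_string(page_view)
print(tracking_string)
```

The printed result should match the encoded string above, character for character.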
Before we move on, let’s tack one more parameter onto our string. This is something that is actually hard-coded into the Javascript, and it identifies which web analytics account this traffic needs to go to. In the case of this blog, that account ID is “UA-2629617-3” and the variable Google Analytics uses to identify the account parameter is “utmac.” I’ll just tack that on the end of our string, which now looks like:
utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3
A subtle point: what we’ve really done above is to combine all the information into a single string with a series of “key-value pairs.” In the case of the first variable, the “key” is “utmhn” and the “value” is “www.gilliganondata.com.” Notice that both the key AND the value are included in the string. If you’ve worked with comma-delimited or tab-delimited files, then you might be wondering why the key is included. Why can’t the Javascript always pass in the variables in the same order, and the web analytics server would know that the first value is the hostname, the second value is the title, and so on? There are at least three reasons for this:
- It just generally makes the process more robust because it reaffirms to the server exactly what each value means at the point the server receives the information; the internet is messy, so hiccups can happen
- Most “advanced” features when it comes to capturing web analytics data rely on tacking on additional parameters to the master string — by including both the key and the value for every parameter, that fanciness doesn’t have to worry about the order the parameters are passed in, AND it means the custom parameters get viewed/processed exactly the same way that the basic parameters do
- The “key-value pairs separated by the & sign” are standard on the internet. Go to any online retail site and poke around, and you will see them in the URL. It’s kind of a standard way to transmit a series of variables onto the back end of a web page or image request, and that’s really all that’s going to happen in step 3
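To make the ordering point concrete, here is a small Python sketch (Python is just standing in for whatever the server side really runs). Two strings carry the same pairs in a different order, and one has an extra parameter tacked on. The `custom_thing` key is invented for this example.

```python
from urllib.parse import parse_qsl

original = "utmhn=www.gilliganondata.com&utmul=en-us&utmac=UA-2629617-3"
# Same pairs, shuffled, plus a made-up custom parameter tacked on the end.
shuffled = "utmac=UA-2629617-3&utmhn=www.gilliganondata.com&utmul=en-us&custom_thing=42"

first = dict(parse_qsl(original))
second = dict(parse_qsl(shuffled))

# Every key from the first string maps to the same value in the second,
# regardless of where it appeared in the string.
same_basics = all(second[key] == value for key, value in first.items())
print(same_basics)

# The extra parameter is read exactly the same way as the basic ones.
print(second["custom_thing"])
```

Because each value travels with its key, neither the reordering nor the extra parameter disturbs how the basic values are read.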
We’ve got our string, so now let’s do something with it!
3 – Javascript makes an image request with that string tacked on the end
Somehow, we need to pass that string back to the web analytics server. We do that by making an image call. In the case of Google Analytics that image request is always, always, always exactly the same, no matter the site using Google Analytics:
http://www.google-analytics.com/__utm.gif
Just like we covered in the “online retail site” URL structure discussion at the end of the last section, we’re going to tack some parameters on the end of the __utm.gif request. The standard way to take a base URL and tack on parameters is to add a “?” followed by one or more key-value pairs that are separated by an “&” sign. Lucky for us, the “&” sign is what we used when we were building our string in the last section! So:
http://www.google-analytics.com/__utm.gif
+
?
+
utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3
=
http://www.google-analytics.com/__utm.gif?utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3
Wow, that looks messy, but it just looks messy — it’s actually quite clean! In reality, there are way more than five parameters tacked onto the image request. As a matter of fact, the request above would really look more like this:
http://www.google-analytics.com/__utm.gif?utmwv=4.6.5&utmn=1516518290&utmhn=www.gilliganondata.com&utmcs=UTF-8&utmsr=1920x1080&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r45&utmdt=The%20Fun%20of%20Facebook%20Measurement%20%7C%20Gilligan%20on%20Data%20by%20Tim%20Wilson&utmhid=1640286085&utmr=http%3A%2F%2Fgilliganondata.com%2F&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&utmac=UA-2629617-3&utmcc=__utma%3D116252048.1573621408.1267294551.1267294551.1267299933.2%3B%2B__utmz%3D116252048.1267294551.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B&gaq=1
You can get a complete list of the Google Analytics tracking variables from Google. (If you’re really into this, check out the utmcc value: it is a single parameter that includes multiple sub-parameters, which are separated by “%3B” (a URL-encoded semicolon) instead of an “&”. These are the user cookie values, which you can find towards the end of the long string above if you look for them.) You can inspect the specific calls using any number of tools. I like to use the Firebug plugin for Firefox, but Fiddler is another free tool, and Charles is the standard tool used at my company. And, there’s always WASP to provide the “clean” view of the parameters (I use WASP heavily…unless I’m trying to reverse-engineer the specific calls being made for some reason).
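If you want to pick the utmcc value apart yourself, here is a small Python sketch using the value from the long request above. It assumes the cookies are simply separated by semicolons once the value is decoded, and `split_cookie_param` is a name invented here, not a Google Analytics function.

```python
from urllib.parse import unquote

# The utmcc value copied from the long request above (still URL-encoded).
utmcc = (
    "__utma%3D116252048.1573621408.1267294551.1267294551.1267299933.2%3B%2B"
    "__utmz%3D116252048.1267294551.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)"
    "%7Cutmcmd%3D(none)%3B"
)


def split_cookie_param(encoded):
    """Decode a utmcc-style value and split it into {cookie name: cookie value}."""
    cookies = {}
    for chunk in unquote(encoded).split(";"):
        # In this value a "+" follows each ";", so trim it off the front.
        chunk = chunk.strip("+ ")
        if chunk:
            # Split on the first "=" only; the rest may contain more "=" signs.
            name, _, value = chunk.partition("=")
            cookies[name] = value
    return cookies


cookies = split_cookie_param(utmcc)
for name, value in cookies.items():
    print(name, "->", value)
```

Run against the value above, this yields two cookies, `__utma` and `__utmz`, with the campaign details sitting inside the `__utmz` value.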
The Javascript makes a request for that URL. This is the infamous “1×1 image.” Just to sharpen the edges a little bit on some common misconceptions about that image request:
- The request for the image is what matters. While the 1×1 image will get delivered back, by the time www.google-analytics.com actually sends out the image, the page view has already been counted. As a matter of fact, if there were no __utm.gif image, the traffic would still get counted simply by virtue of the fact that the Google Analytics server received the image request. As it happens, some other little user experience hiccups can happen if there’s no actual image, but the existence of the file matters nary a bit from a data capture perspective!
- Yes, you can actually just request the image directly from your browser. Go ahead — here’s the URL as a hyperlink: http://www.google-analytics.com/__utm.gif (yeah, it’s something of a letdown, but now you can say you’ve done it)
- The image being 1×1 has nothing to do with keeping it small and unnoticed by the user. If Google got a wild hair to replace the __utm.gif image with a 520×756 pixel image of a psychedelic interpretation of the Mona Lisa…no one would ever see the change (unless they were doing something silly like calling the image directly from their browser as described in the previous bullet). The image gets requested by the Javascript, but it never gets displayed to the user. It’s sort of like a Javascript dropdown menu: the text for the dropdown gets loaded into the browser memory so that, if you mouse over the menu, the text is already there and can be displayed immediately. The __utm.gif request is the same way…except there’s nothing in the Javascript that ever actually tries to render the image to the user
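Here is a toy Python sketch of the first bullet: a collector that records the hit as soon as the request arrives, before it even thinks about what to send back. Everything in it (the `collect` function, the `hits` list, the `image_exists` flag) is hypothetical and only mimics the idea. It says nothing about how Google’s servers are actually built.

```python
from urllib.parse import parse_qsl, urlsplit

hits = []  # stand-in for the analytics database


def collect(request_path, image_exists=True):
    """Record the hit first, then decide what status to send back."""
    # The data capture: this happens no matter what follows.
    hits.append(dict(parse_qsl(urlsplit(request_path).query)))
    # The response: a missing image changes the status code, nothing more.
    return 200 if image_exists else 404


status_with_image = collect("/__utm.gif?utmhn=www.gilliganondata.com&utmul=en-us")
status_without_image = collect(
    "/__utm.gif?utmhn=www.gilliganondata.com&utmul=en-us", image_exists=False
)

print(status_with_image, status_without_image)
print(len(hits))
```

Both requests end up in `hits`, including the one where the pretend image is missing.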
And one more point: While we’ve been talking about “image requests” here, it doesn’t have to be an image request per se. In the case of Google Analytics, it is. In the case of Webtrends, it is, too (the image is called dcs.gif). In the case of other web analytics packages, it’s not necessarily an image request, but it is a request to the web analytics server. What matters is understanding that there are a bunch of key-value pairs tacked on after a “?” in the request, and that’s where all of the fun information about the visit to the page gets recorded and passed.
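Pulling steps 2 and 3 together, here is a minimal Python sketch of assembling the request URL with the standard library’s `urlencode` (again, the real page tag does this in Javascript, and `build_request_url` is a made-up name). It only builds the URL and doesn’t send anything.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://www.google-analytics.com/__utm.gif"


def build_request_url(base_url, params):
    """Base URL + "?" + key=value pairs joined by "&"."""
    # quote_via=quote encodes spaces as %20 (the default would use "+"),
    # and safe="" makes sure "/" becomes %2F as well.
    return f"{base_url}?{urlencode(params, quote_via=quote, safe='')}"


request_url = build_request_url(BASE_URL, {
    "utmhn": "www.gilliganondata.com",
    "utmdt": "The Fun of Facebook Measurement",
    "utmp": "/index.php/2010/01/11/the-fun-of-facebook-measurement/",
    "utmul": "en-us",
    "utmac": "UA-2629617-3",
})
print(request_url)
```

Swapping `BASE_URL` for another tool’s collection URL and the `utm*` keys for that tool’s variable names would be the same exercise, which is really the point of this post’s title.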
4 – Web analytics tool reads the string and puts the information into a database
So, the web analytics server has been getting bombarded with the requests from Step 3. Can you see how straightforward it is for software to take those requests and split them back out into their component parts? That’s the easy part. Where the tools really differentiate themselves is how exactly they store all of that data — the design of their database and then how that data is made available for queries and reports by analysts.
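As a minimal Python sketch of the “split them back out” part, here is the five-parameter request from step 3 being taken apart again. What a real tool does next (the database design) is where the tools differ, so this stops at a plain dictionary.

```python
from urllib.parse import parse_qsl, urlsplit

# The five-parameter image request from step 3.
request_url = (
    "http://www.google-analytics.com/__utm.gif?utmhn=www.gilliganondata.com"
    "&utmdt=The%20Fun%20of%20Facebook%20Measurement"
    "&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F"
    "&utmul=en-us&utmac=UA-2629617-3"
)

# Everything after the "?" is the query string. parse_qsl splits it on "&"
# and "=" and undoes the URL encoding (%20 becomes a space, %2F becomes "/").
hit = dict(parse_qsl(urlsplit(request_url).query))

for key, value in hit.items():
    print(f"{key}: {value}")
```

Out come the same readable values we started with in step 2, keyed by the Google Analytics variable names.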
Back in the day (and I assume it’s still an option), Webtrends would make the raw log files available to their customers as an add-on service. That was handy — once we understood the basics of this post and the Webtrends query parameters, we were able to sift through for some juicy nuggets to supplement our “traditional” web analytics (these were in the days before Webtrends had their “warehouse” solution, which would have made the same information available).
5 – Web analyst queries the database for insights
Like step 4, this is an area where web analytics tools really differentiate themselves. In the case of Google Analytics, there is the web-based tool and the API. In the case of paid, enterprise-class tools, there are similar tools plus true data warehouse environments that allow much more granular detail, as well as two-way integration with other systems.
Why Understanding This Matters
You’re still reading, so maybe I should have made this case earlier. But this matters because, once you understand these mechanics, you can start to do some fun things to handle unique situations. For instance, what do you do if you have Google Analytics, and you want to track activity somewhere where Javascript won’t run (like…um…your Facebook fan page, which will be my next post)? Or, more generally, if you’re Googling around looking for ways to address some sort of one-off tracking need, you’ll understand the explanations that you’re finding, since these solutions invariably involve twiddling around within the framework described here.
As I read back through this post before publishing it, I was struck by how far into the tactical mechanics of web analytics it is. The overwhelming majority of web analytics blog posts focus on step 5 and beyond — how to use the data to be an analysis ninja rather than a report monkey. Understanding the mechanics described here is a foundational step that will support all of that analysis work. I was incredibly fortunate, early in my web analytics career, to have an opportunity to run the migration from a log-based web analytics package to a tag-based solution. I was triply fortunate that I worked on that migration with two brilliant and patient IT folk: Ernest Mueller as the web admin supporting the effort, and Ryan Rutan, the developer supporting the effort — he was hacking the Webtrends page tag before the consultant who we had on-site to help implement it had finished his first day. Ernest drew countless whiteboard diagrams to explain to me “how the internet works” (those “immutable laws” I mentioned early in this post), while Ryan repeated himself again and again until I understood this whole “image request with parameters” paradigm.
If you’re a web analyst, seek out these types of people in IT. A hearty collaboration of cross-discipline skills can yield powerful results and be a lot of fun. I had similar collaborations when I worked at Bulldog Solutions, and the last two weeks saw the same thing happening at my current gig at Resource Interactive. Those are pretty energizing experiences that leave me scratching my head as to why so many companies wind up with an adversarial relationship between “the business” and “IT.” But THAT is a topic for a whoooollllle other post that I may never write…