Analytics Strategy

Web Analytics Platforms Are Fundamentally Broken

Farris Khan, Analytics Lead at ProQuest and Chevy Volt ponderer extraordinaire, tweeted the question that we bandy about over cocktails in hotel bars the world over at any analytics gathering:

His tweet came on the heels of the latest Beyond Web Analytics podcast (Episode 48), in which hosts Rudi Shumpert and Adam Greco chatted with Jenn Kunz about “implementation tips.” Although not intended as such, the podcast skewed heavily (95%) towards Adobe/Omniture Sitecatalyst implementations. Since Sitecatalyst is the dominant enterprise web analytics package these days, that meant the episode was chock full of useful information, but I found myself getting irritated with Omniture just from listening to the discussion.

My immediate reply to Farris’s tweet, having recently listened to the podcast, reflected that irritation:

Sitecatalyst throws its “making it much harder than it should be” talent on the implementation side of things, and I say that as someone who genuinely likes the platform (I’m not a homer for any web analytics platform — I’ve been equally tickled pink and wildly frustrated with Google Analytics, Sitecatalyst, and Webtrends in different situations). I’m also not criticizing Sitecatalyst because I “just don’t understand the tool.” I no longer get confused by the distinction between eVars, sProps, and events. I’ve (appropriately) used the Products variable for something totally separate from product information. I’ve used scView for an event that has nothing to do with a shopping cart. I’ve set up SAINT classifications. I’ve developed specs for dynamically triggering effectively named custom links. I’ve never done a stint as an Adobiture implementation engineer, but I get around the tool pretty well.

In addition to that hands-on experience, I’ve also worked with a range of clients who have Sitecatalyst employed on their sites. As such, I’ve rolled my eyes and gnashed my teeth at the utter botched-ness of multiple clients’ implementations, and, yes, I’ve caught myself making the same type of critical statements that were rattled off during the podcast about companies’ implementations:

  • Failure to put adequate up-front planning into their Sitecatalyst implementation
  • Failure to sufficiently document the implementation
  • Failure to maintain the implementation on an ongoing basis
  • Failure to invest in the people to actually maintain the implementation and use the data (Avinash has been fretting about this issue publicly for over 5 years)

In the case of the podcast, though, I wasn’t participating in the conversation — I was simply listening to others talk. The problem was that I heard myself chiming in. I jumped right on the “it’s the client’s fault” train, nodding my head as the panel described eroded and underutilized implementations. But, then a funny thing happened. As I stepped back and listened to what “I” would have been saying, I got a bit unsettled. I realized I’d been seduced by the vendor. Through my own geeky pride at having cracked the nut of their inner machinations, I’d crossed over to vendor-land and started unfairly blaming the customer for technology shortcomings:

If the overwhelming majority of companies that use a given platform use it poorly…shouldn’t we shine a critical light on the platform rather than blaming the users?

I love digital analytics. I enjoy figuring out new platforms, and it’s fun to develop and implement something elegantly and then let the usable data come pouring in that I can feed into reports and use for analysis. But:

  • I’ve been doing this for a decade — hands-on experience with a half-dozen different tools
  • It’s what I’m most interested in doing with my career — it beats out strategy development, creative concepting, campaign ideation, and any and every other possible marketing role
  • I’m a sharp and motivated guy

In short…I’m uniquely suited to the space. I’m neither the only person who is really wired to do this stuff nor even in the 90th percentile of people who fit that bill. But the number of people who are truly equipped to drive a stellar Sitecatalyst implementation is, best case, in the low thousands, and, worst case, in the low hundreds. At the same time, demand for these skills is exploding. Training and evangelization are not going to close the gap! The Analysis Exchange is a fantastic concept, but that’s not going to close the gap, either.

There is simply too much breadth of knowledge and thought required to effectively work in the world of digital analytics for a tool to have a steep learning curve with undue complexity for implementation and maintenance. The Physics of the Internet means there are a relatively finite number of types of user actions that can be captured. Sitecatalyst has set up a paradigm that requires so much client-side configuration/planning/customization/maintenance/incantations/prayer that the majority of implementations are doomed to take longer than expected (much longer than promised by the sales team) and then further doomed to be inadequately maintained.

The signals that Adobe is slowly taking steps to collapse the distinction between eVars and sProps are an indication that they realize there are cases where the backend architecture needlessly drives implementation complexity. But, just as the iPhone shattered the expectations we had for smartphones, and the iPad ushered in an era of tablet computing that will garner mass adoption, Adobe runs a very real risk of Sitecatalyst becoming the Blackberry of web analytics. Sitecatalyst 15, for all of the excitement Adobe has tried to gin up, is a laundry list of incremental fixes to functional shortcomings that the industry has simply complained about for years (or, in the case of the introduction of segmentation, a diluted attempt to provide “me, too” functionality based on what a competitor provides).

The vendors have to take some responsibility for simplifying things. The fact that I can pull Visits for an eVar and Visits for an sProp and get two completely different numbers (or do the same thing for instances and page views) is a shortcoming of the tool. We’ve got to get out of the mode of simply accepting that this will happen, accepting that a deep and nuanced understanding of the platform is required to understand the difference, and then gnashing our teeth when more marketers don’t have the interest and/or time to develop that deep understanding of the minutiae of the tool.

<pause>

Although I’ve focused on Sitecatalyst here, that doesn’t mean other platforms are beyond reproach:

  • Webtrends — Why do I have to employ black magic to get my analysis and report limits set such that I don’t miss data? Why do I have to employ Gestapo-like processes to prevent profile explosion (and confusion)? Why do I have to fall back on weeks-long reprocessing of the logs when someone comes up with a clever hypothesis that needs to be tested?
  • Google Analytics — Why can’t I do any sort of real pathing? Why do I start bumping up against sampled data that makes me leery…just when I’m about to get to something really cool I want to hang my hat on? Why is cross-domain and cross-subdomain tracking such a nightmare to really get to perform as I want it to?

My point here is that the first platform that gets a Jobs-like visionary in place who is prepared to totally destroy the current paradigm is going to have a real shot at dominating over the long haul. There are scads of upstarts in the space, but most of them are focused on excelling at one functional niche or another. Is there the possibility of a tool (or one of the current big players) really dramatically lowering the implementation/maintenance complexity bar (while also, of course, handling the proliferation of digital channels well beyond the traditional web site) so that the skills we need to develop can be the ones required to use the data rather than capture it?

Such a paradigm shift is sorely needed.

Update: Eric Peterson started a thread on Google+ spawned by this post, and the lengthy discussion that ensued is worth checking out.

Analytics Strategy

Web Analytics Tools Comparison — Columbus WAW Recap Part 2

[Update: After getting some feedback from a Coremetrics expert and kicking around the content with a few other people, I rounded out the presentation a bit.]

In my last post, I recapped and posted the content from Bryan Cristina’s 10-minute presentation and discussion of campaign measurement planning at February’s Columbus Web Analytics Wednesday. For my part of the event, I tackled a comparison of the major web analytics platforms: Google Analytics, Adobe/Omniture Sitecatalyst, Webtrends, and, to a certain extent, Coremetrics. I only had five minutes to present, so I focused on just the base tools — not the various “warehouse” add-ons, not the A/B and MVT testing tools, etc.

Which Tool Is Best?

This question gets asked all the time. And, anyone who has been in the industry for more than six nanoseconds knows the answer: “It depends.” That’s not a very satisfying answer, but it’s true. Unfortunately, it’s also an easy answer — someone who knows Google Analytics inside and out but has never seen the letters “DCS,” never referenced the funkily spelled “eluminate” tag, and never bristled at Microsoft usurping the word “Vista” for use with a crappy OS can still confidently answer the “Which tool is best?” question with, “It depends.”

And You’re Different?

The challenge is that very, very few people are truly fluent in more than a couple of web analytics tools. I’ve heard that a sign of fluency in a language is that you actually think in the language. Most of us in web analytics, I suspect, are not able to immediately slip into translated thought when it comes to a tool. So, here’s my self-evaluation of my web analytics tool fluency (with regards to the base tools offered — excluding add-ons for this assessment; since the add-ons bring a lot of power, that’s an important limitation to note):

  • Basic page tag data capture mechanics — 95th percentile — this is actually something pretty important to have a good handle on when it comes to understanding one of the key differences between Sitecatalyst and other tools
  • Google Analytics — 95th percentile — I’m not Brian Clifton or John Henson, but I’ve crafted some pretty slick implementations in some pretty tricky situations
  • Adobe-iture Sitecatalyst — 80th percentile — I’m more recent to the Sitecatalyst world, but I’ve now gotten some implementations under my belt that leverage props, evars, correlations, subrelations, classifications, and even a crafty usage of the products variable
  • Webtrends — 80th percentile — I cut my teeth on Webtrends and would have put myself in the 95th percentile five years ago, but my use of the tool has been limited of late; I’m actually surprised at how little some of the fundamentals change, but maybe I shouldn’t be
  • Coremetrics — 25th percentile — I can navigate the interface, I’ve dived into the mechanics of the different tags, and I’ve done some basic implementation work; it’s just the nature of the client work I’ve done — my agency has Coremetrics expertise, and I’m hoping to rely on that to refine the presentation over time

So, there’s my full disclosure. I consider myself to be pretty impartial when it comes to tools (I don’t have much patience for people who claim impartiality and then exhibit a clear bias towards “their” tool — the one tool they know really well), but, who knows? It’s a fine line between “lack of bias” and “waffler.”

Any More Caveats Before You Get to the Content?

My goal with this exercise was to sink my teeth in a bit and see what I could clearly capture and explain as the differences. Ideally, this would also get to the “So what?” question. What I’ve found, though, is that answering that question gets circular in a hurry: “If <something one tool shines at> is important to you, then you really should go with <that tool>.” Two examples:

  • If enabling users to quickly segment traffic and view any number of reports by those segments is important, then you should consider Google Analytics (…or buying the “warehouse” add-on and plenty of seats for whatever other tool you go with)
  • If being able to view clickpaths through content aggregated different ways is important, then you should consider Sitecatalyst

These are more “features”-oriented assessments, and they rely on a level of expertise with web analytics in order to assess their importance in a given situation. That makes it tough.

Any tool is only as good as its implementation and the analysts using it (see Avinash’s 10/90 rule!). Some tools are much trickier to implement and maintain than others — that trickiness brings a lot of analytics flexibility, so the implementation challenges have an upside. In the end, I’ll take any tool properly implemented and maintained over a tool I get to choose that is going to be poorly implemented.

Finally! The Comparison

I expect to continue to revisit this subject, but the presentation below is the first cut. You might want to click through to view it on SlideShare and click the “Speaker Notes” tab under the main slide area — I added those in after I presented to try to catch the highlights of what I spoke to on each slide.

Do you see anything I missed or with which you violently disagree? Let me know!


Analytics Strategy, Reporting, Social Media

Monish Datta Learns All about Facebook Measurement

Columbus Web Analytics Wednesday was last week — sponsored by Omniture, an Adobe company, and the topic wound up being “Facebook Measurement” (deck at the end of this post).

For some reason, Monish Datta cropped up — prominently — in half of the pictures I took while floating around the room. In my never-ending quest to dominate SEO for searches for Monish, this was well-timed, as I’m falling in the rankings on that front. You’d think I’d be able to get some sort of cross-link from http://www.monishdatta.com/, but maybe that’s not to be.

Columbus Web Analytics Wednesday -- May 2010

We had another great turnout at the event, AND we had a first for a Columbus WAW: a door prize. Omniture provided a Flip video camera and a copy of Adobe Premiere Elements 8 as the prize. WAW co-organizer Dave Culbertson presented it to the lucky winner, Matt King of Quest Software:

Columbus Web Analytics Wednesday -- May 2010

Due to an unavoidable last-minute schedule change, I wound up pinch-hitting as the speaker and talked about Facebook measurement. It’s something I’ve spent a good chunk of time exploring and thinking about over the past six months, and it was a topic I was slated to speak on the following night in Toronto at an Omniture user group, so it wound up being a nice dry run in front of a live but friendly crowd.

I made some subsequent updates to the deck (improvements!), but below is substantially the material I presented:

In June, Columbus Web Analytics Wednesday is actually going to happen in Cincinnati — we’re planning a road trip down and back for the event. We’re hoping for a good showing!

Analytics Strategy

Columbus WAW Recap: Don't "Antisappoint" Visitors

We had a fantastic Web Analytics Wednesday last week in Columbus, sponsored by (Adobe) Omniture, with just under 50 attendees! Darren “DJ” Johnson was the presenter, and he spoke about web site optimization (kicking off with a riff on how “optimization” is an over-used word!). I, unfortunately, forgot my “good” camera, which means my photojournalism duties were poorly, poorly performed (DJ is neither 8′ tall, nor was he ignoring his entire audience):

Columbus Web Analytics Wednesday -- March 2010

One of the anecdotes that stuck with me was when DJ explained a personal experience he had clicking through on a banner ad (“I NEVER click on banner ads!” he exclaimed) and then having the landing page experience totally under-deliver on the promise of the ad. He used the term “antisappointment” (or “anticappointment?”) to describe the experience. It’s a handy word that works better orally than written down, but I’ll be shocked with myself if I don’t start using it!

I’ve been spending more and more time thinking about and working on optimization strategies of late, and DJ’s presentation really brought it all together. This post isn’t going to be a lengthy explanation of optimization and testing…because I’m really not qualified to expound on the subject (yet). But, I will drop down a few takeaways from DJ’s presentation that hit home the most with me:

  • Testing (and targeting) doesn’t typically deliver dramatic step function improvements, so don’t expect it to — it delivers incremental improvements over time that can add up to significant gains
  • (Because of the above) Testing isn’t a project; it’s a process — it’s not enough to plan out a test, run it, and evaluate the results; rather, it’s important to develop the organizational capabilities to always be testing
  • “Testing” without “targeting” is going to deliver limited results — while initial tests may be on “all visitors to the site,” it’s important to start segmenting traffic and testing different content at the segment level as quickly as possible

Good stuff.

In other news, I’ve got a few additional bullet points:

  • Our next Web Analytics Wednesday is tentatively slated to be a happy hour only (unsponsored or with a limited sponsor) on a Tuesday. If you don’t already get e-mail reminders and you’d like to, just drop me a note and I’ll add you to our list (tim at this domain)
  • The Ohio Interactive Awards are fast approaching! The event, started up by Teambuilder Search, huber+co. interactive, and 247Interactive, is shaping up to be a great one on April 29th at the Arena Grand Movie Theater (Resource Interactive is sponsoring the event happy hour)
  • The TechLife Columbus meetup.com group continues to grow and thrive, with over 1,500 members now — it’s free, and it’s a great way to find meetups and people who are involved in high tech and digital in central Ohio

It’s been a lot of fun to watch social media get put to use in central Ohio and make it so easy to find interesting people with shared interests. I’ve certainly gotten to know some great people over the past couple of years with a relatively low investment of my time and energy, and I’m a better person for it!

Analytics Strategy

All Web Analytics Tools Are the Same (when it comes to data capture)

I started to write a post on using web analytics tools — Google Analytics, specifically, but with a nod to Webtrends as well — to track traffic to custom tabs and interactive elements on Facebook pages. But, as I started thinking through that content, I realized that I needed to back up and make sure I had a good, clean explanation of a key aspect of the mechanics of page tag-based web analytics tools. I poked around on the interweb a bit and found some quick explanations that were accurate, but that really weren’t as detailed as I was hoping to find.

Regardless of whether you’re trying to track Facebook or not, it’s worth having a good, solid understanding of these underlying mechanics:

  • If you’re a web analyst, understanding this is like understanding gravity if you’re a human being — there are some immutable laws of the internet, and knowing how those laws drive the data you are seeing will open up new possibilities for capturing activity on your site
  • If you’re a developer, then this will be a quick read, but understanding it will make you the hero to both your web analysts and (assuming they’re not glory hogs) the people they support with their analysis, because you will be able to suggest some clever ways to capture useful information

By the end of this post, you should understand both the title and why the image-request URLs discussed below are what make it so.

I’ve been deep under the hood with both Google Analytics and Webtrends for this, but the same principles apply to all tools (because they’re all bounded by the Physics of the Internet). I’m going to talk about Google Analytics the most in-depth, because it has the largest market share (measured by number of sites tagged with it), and I’ll try to call out key differences when appropriate.

Let’s start with a simple picture of how all of these tools work. When a visitor comes to a page on your site, the following sequence of events happens:

  1. Javascript figures out stuff about the visitor
  2. Javascript packages that info into a single string of information
  3. Javascript makes an image request with that string tacked on the end
  4. The web analytics tool reads the string and puts the information into a database
  5. The web analyst queries the database for insights

Steps 2 and 3 are really the crux of the biscuit, but we need to make sure we’re all clear on the first step, too, before getting to the fun there.

1 – Javascript figures out stuff about the visitor

We all know what Javascript is, right? It’s one of the key languages that can be interpreted by a web browser so that web pages aren’t just static text and images: dropdown menus, mouseovers, and such. But, Javascript also enables some things to go on behind the scenes. The basic data capture method for any tag-based web analytics tool is to run Javascript to determine what page the visitor is on, what relevant cookies are set on the user’s machine, whether the visitor has been to the site before, what browser the visitor is using, what language encoding is set for the browser, the user’s screen resolution, and a slew of other fairly innocuous details. This happens every time a visitor views a page running the page tag. So, great — a visitor has viewed a page, and the Javascript has figured out a bunch of details about the visitor and the page. Now what? It’s on to step 2!

(I realize I’m saying “Javascript” here, and most tools also have Actionscript support for tracking activity within Flash — for the purposes of this post, I’m just going to stick with Javascript, but I’ll get back to Actionscript in my next post!)
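
To make step 1 a little more concrete, here is a rough sketch of that detail-gathering in Javascript. It is purely illustrative — it uses standard browser properties rather than the actual ga.js or Webtrends tag source — but it is the same class of information those tags collect:

function collectVisitorDetails() {
  // All of this comes from standard browser APIs available to any page tag
  return {
    hostname: document.location.hostname,        // e.g., "www.gilliganondata.com"
    pageTitle: document.title,                   // e.g., "The Fun of Facebook Measurement"
    pagePath: document.location.pathname,        // e.g., "/index.php/2010/01/11/..."
    referrer: document.referrer,                 // where the visitor came from
    language: navigator.language,                // e.g., "en-us"
    screenResolution: screen.width + "x" + screen.height,
    colorDepth: screen.colorDepth + "-bit",
    cookies: document.cookie                     // the tool's own cookies (visitor/session IDs) live here
  };
}

var visitorDetails = collectVisitorDetails();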

2 – Javascript packages that info into a single string of information

The next step is pretty simple, but it’s where the magic starts to happen. Let’s say the Javascript in step 1 had figured out the following information about a visitor to a page:

  • Site = http://www.gilliganondata.com
  • Page title = The Fun of Facebook Measurement
  • Page URL = /index.php/2010/01/11/the-fun-of-facebook-measurement/
  • Browser language = en-us

Converting that info into a single string is pretty straightforward. Let’s start by pretending we’re going to put it into a single row in a pipe-delimited file. It would look like this:

Site (hostname) = http://www.gilliganondata.com | Page name = The Fun of Facebook Measurement | Page URL = /index.php/2010/01/11/the-fun-of-facebook-measurement/ | Browser language = en-us

Now, rather than using the pretty, readable names for each of the four characteristics of the page view, let’s use some variable names (these are the Google Analytics variable names, but the documentation for any web analytics tool will provide their specific variable names for these same things):

  • Site (hostname) –> utmhn
  • Page title –> utmdt
  • Page URL –> utmp
  • Browser language –> utmul

So, now our string looks like:

utmhn = http://www.gilliganondata.com | utmdt = The Fun of Facebook Measurement | utmp = /index.php/2010/01/11/the-fun-of-facebook-measurement/ | utmul = en-us

We used pipes to separate out the different variables, but there’s nothing really wrong with using something different, is there? Let’s go with using “&” instead and eliminate the spaces around equal signs and the delimiters. The single string now looks like this:

utmhn=www.gilliganondata.com&utmdt=The Fun of Facebook Measurement&utmp=/index.php/2010/01/11/the-fun-of-facebook-measurement/&utmul=en-us

Now, we’ve still got some “special” characters that aren’t going to play nice in Step 3 — namely spaces and “/”s — so let’s replace those characters with the appropriate URL encoding (%20 for the spaces and %2F for the “/”s):

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20
Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-
facebook-measurement%2F&utmul=en-us

It looks a little messy, but it’s a single, portable string that has the exact information that was listed in the four bullets that started this section. While it might be painful to reverse-engineer this string into a more reader-friendly format by hand, it’s a snap to do programmatically (which is exactly what web analytics tools do…as we’ll discuss in step 4) or in Excel.
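
If you want to see just how little code that takes, here is a sketch in Javascript. The utm* names are Google Analytics’ documented parameter names, but the function itself is illustrative — it is not how ga.js is actually written:

function buildQueryString(params) {
  var pairs = [];
  for (var key in params) {
    if (params.hasOwnProperty(key)) {
      // encodeURIComponent handles the spaces (%20), slashes (%2F), etc.
      pairs.push(key + "=" + encodeURIComponent(params[key]));
    }
  }
  return pairs.join("&");
}

var hitString = buildQueryString({
  utmhn: "www.gilliganondata.com",
  utmdt: "The Fun of Facebook Measurement",
  utmp: "/index.php/2010/01/11/the-fun-of-facebook-measurement/",
  utmul: "en-us"
});
// hitString is now the same single, portable string shown above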

Before we move on, let’s tack one more parameter onto our string. This is something that is actually hard-coded into the Javascript, and it identifies which web analytics account this traffic needs to go to. In the case of this blog, that account ID is “UA-2629617-3” and the variable Google Analytics uses to identify the account parameter is “utmac.” I’ll just tack that on the end of our string, which now looks like:

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20
Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-
facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3

A subtle point: what we’ve really done above is to combine all the information into a single string made up of a series of “key-value pairs.” In the case of the first variable, the “key” is “utmhn” and the “value” is “www.gilliganondata.com.” Notice that both the key AND the value are included in the string. If you’ve worked with comma-delimited or tab-delimited files, then you might be wondering why the key is included. Why couldn’t the Javascript always pass the variables in the same order, so that the web analytics server would know that the first value is the hostname, the second value is the title, and so on? There are a few reasons for this:

  • It just generally makes the process more robust because it reaffirms to the server exactly what each value means at the point the server receives the information; the internet is messy, so hiccups can happen
  • Most “advanced” features when it comes to capturing web analytics data rely on tacking on additional parameters to the master string — by including both the key and the value for every parameter, that fanciness doesn’t have to worry about the order the parameters are passed in, AND it means the custom parameters get viewed/processed exactly the same way that the basic parameters do
  • The “key-value pairs separated by the & sign” are standard on the internet. Go to any online retail site and poke around, and you will see them in the URL. It’s kind of a standard way to transmit a series of variables onto the back end of a web page or image request, and that’s really all that’s going to happen in step 3

We’ve got our string, so now let’s do something with it!

3 – Javascript makes an image request with that string tacked on the end

Somehow, we need to pass that string back to the web analytics server. We do that by making an image call. In the case of Google Analytics that image request is always, always, always exactly the same, no matter the site using Google Analytics:

http://www.google-analytics.com/__utm.gif

Just like we covered in the “online retail site” URL structure discussion at the end of the last section, we’re going to tack some parameters on the end of the __utm.gif request. The standard way to take a base URL and tack on parameters is to add a “?” followed by one or more key-value pairs that are separated by an “&” sign. Lucky for us, the “&” sign is what we used when we were building our string in the last section! So:

http://www.google-analytics.com/__utm.gif

+

?

+

utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20
Measurement&utmp=%2Findex.php%2F2010%2F01%2F11%2F
the-fun-of-facebook-measurement%2F&utmul=en-us&utmac=UA-2629617-3

=

http://www.google-analytics.com/__utm.gif?utmhn=www.gilliganondata.com&
utmdt=The%20Fun%20of%20Facebook%20Measurement&utmp=%2F
index.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F&
utmul=en-us&utmac=UA-2629617-3

Wow, that looks messy, but it just looks messy — it’s actually quite clean! In reality, there are way more than five parameters tacked onto the image request. As a matter of fact, the request above would really look more like this:

http://www.google-analytics.com/__utm.gif?utmwv=4.6.5&utmn=1516518290&
utmhn=www.gilliganondata.com&utmcs=UTF-8&utmsr=1920x1080&utmsc=24-
bit&utmul=en-us&utmje=1&utmfl=10.0%20r45&utmdt=The%20Fun%20of%20
Facebook%20Measurement%20%7C%20Gilligan%20on%20Data%20by%20Tim
%20Wilson&utmhid=1640286085&utmr=http%3A%2F%2Fgilliganondata.com
%2F&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-
measurement%2F&utmac=UA-2629617-3&utmcc=__utma%3D116252048.
1573621408.1267294551.1267294551.1267299933.2%3B%2B__utmz%3D
116252048.1267294551.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7C
utmcmd%3D(none)%3B&gaq=1

You can get a complete list of the Google Analytics tracking variables from Google (if you’re really into this, check out the utmcc value — that actually is a single parameter that includes multiple sub-parameters, which are separated by “%3B” — a URL-encoded semicolon — instead of an “&”; these are the user cookie values, which you can find towards the end of the long string above if you look for them). You can inspect the specific calls using any number of tools. I like to use the Firebug plugin for Firefox, but Fiddler is another free tool, and Charles is the standard tool used at my company. And, there’s always WASP to provide the “clean” view of the parameters (I use WASP heavily…unless I’m trying to reverse-engineer the specific calls being made for some reason).

The Javascript makes a request for that URL. This is the infamous “1×1 image.” Just to sharpen the edges a little bit on some common misconceptions about that image request:

  • The request for the image is what matters — while the 1×1 image will get delivered back, by the time http://www.google-analytics.com actually sends out the image, the page view has already been counted. As a matter of fact, if there was no __utm.gif image, the traffic would still get counted simply by virtue of the fact that the Google Analytics server received the image request. As it happens, some other little user experience hiccups can happen if there’s no actual image, but the existence of the file matters ‘nary at all from a data capture perspective!
  • Yes, you can actually just request the image directly from your browser. Go ahead — here’s the URL as a hyperlink: http://www.google-analytics.com/__utm.gif (yeah, it’s something of a letdown, but now you can say you’ve done it)
  • The image isn’t a 1×1 pixel image so that it’s small and not noticed by the user. If Google got a wild hair to replace the __utm.gif image with a 520×756 pixel image of a psychedelic interpretation of the Mona Lisa…no one would ever see the change (unless they were doing something silly like calling the image directly from their browser as described in the previous bullet). The image gets requested by the Javascript, but it never gets displayed to the user. It’s sort of like a Javascript dropdown menu — the text for the dropdown gets loaded into the browser memory so that, if you mouse over the menu, the text is already there and can be displayed immediately. The __utm.gif request is the same way…except there’s nothing in the Javascript that ever actually tries to render the image to the user

And one more point: While we’ve been talking about “image requests” here, it doesn’t have to be an image request per se. In the case of Google Analytics, it is. In the case of Webtrends, it is, too (the image is called dcs.gif). In the case of other web analytics packages, it’s not necessarily an image request, but it is a request to the web analytics server. What matters is understanding that there are a bunch of key-value pairs tacked on after a “?” in the request, and that’s where all of the fun information about the visit to the page gets recorded and passed.
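
Putting steps 2 and 3 together, here is a minimal sketch of the classic image-beacon technique. Again, this illustrates the general mechanism, not the actual ga.js code:

function sendBeacon(baseUrl, queryString) {
  // Assigning a URL to an Image object's src makes the browser request it
  // immediately; the image is never added to the page, so the user sees nothing.
  var beacon = new Image(1, 1);
  beacon.src = baseUrl + "?" + queryString;  // the request itself is what gets counted
  return beacon;
}

sendBeacon(
  "http://www.google-analytics.com/__utm.gif",
  "utmhn=www.gilliganondata.com&utmdt=The%20Fun%20of%20Facebook%20Measurement" +
  "&utmp=%2Findex.php%2F2010%2F01%2F11%2Fthe-fun-of-facebook-measurement%2F" +
  "&utmul=en-us&utmac=UA-2629617-3"
);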

4 – Web analytics tool reads the string and puts the information into a database

So, the web analytics server has been getting bombarded with the requests from Step 3. Can you see how straightforward it is for software to take those requests and split them back out into their component parts? That’s the easy part. Where the tools really differentiate themselves is how exactly they store all of that data — the design of their database and then how that data is made available for queries and reports by analysts.
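
To show just how easy that first part is, here is a quick sketch (plain Javascript — you could run it in a browser console or Node) that reverses step 2 and splits a beacon request back into readable key-value pairs. The real collection servers obviously do this at enormous scale and write straight to their databases, but the parsing logic is the same idea:

function parseBeaconRequest(requestUrl) {
  var query = requestUrl.split("?")[1] || "";
  var parsed = {};
  query.split("&").forEach(function (pair) {
    var splitAt = pair.indexOf("=");
    // Decode each value back into its human-readable form
    parsed[pair.slice(0, splitAt)] = decodeURIComponent(pair.slice(splitAt + 1));
  });
  return parsed;
}

var hit = parseBeaconRequest(
  "http://www.google-analytics.com/__utm.gif?utmhn=www.gilliganondata.com" +
  "&utmdt=The%20Fun%20of%20Facebook%20Measurement&utmul=en-us&utmac=UA-2629617-3"
);
// hit.utmdt === "The Fun of Facebook Measurement"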

Back in the day (and I assume it’s still an option), Webtrends would make the raw log files available to their customers as an add-on service. That was handy — once we understood the basics of this post and the Webtrends query parameters, we were able to sift through for some juicy nuggets to supplement our “traditional” web analytics (these were in the days before Webtrends had their “warehouse” solution, which would have made the same information available).

5 – Web analyst queries the database for insights

Like step 4, this is an area where web analytics tools really differentiate themselves. In the case of Google Analytics, there is the web-based tool and the API. In the case of paid, enterprise-class tools, there are similar tools plus true data warehouse environments that allow much more granular detail, as well as two-way integration with other systems.

Why Understanding This Matters

You’re still reading, so maybe I should have made this case earlier. But, the reason this matters is because, once you understand these mechanics, you can start to do some fun things to handle unique situations. For instance, what do you do if you have Google Analytics, and you want to track activity somewhere where Javascript won’t run (like…um…your Facebook fan page — that’ll be my next post!). Or, more generally, if you’re Googling around looking for ways to address some sort of one-off tracking need, you’ll understand the explanations that you’re finding — these solutions invariably involve twiddling around within the framework described here.

As I read back through this post before publishing it, I was struck by how far into the tactical mechanics of web analytics it is. The overwhelming majority of web analytics blog posts focus on step 5 and beyond — how to use the data to be an analysis ninja rather than a report monkey. Understanding the mechanics described here is a foundational step that will support all of that analysis work. I was incredibly fortunate, early in my web analytics career, to have an opportunity to run the migration from a log-based web analytics package to a tag-based solution. I was triply fortunate that I worked on that migration with two brilliant and patient IT folk: Ernest Mueller as the web admin supporting the effort, and Ryan Rutan, the developer supporting the effort — he was hacking the Webtrends page tag before the consultant who we had on-site to help implement it had finished his first day. Ernest drew countless whiteboard diagrams to explain to me “how the internet works” (those “immutable laws” I mentioned early in this post), while Ryan repeated himself again and again until I understood this whole “image request with parameters” paradigm.

If you’re a web analyst, seek out these types of people in IT. A hearty collaboration of cross-discipline skills can yield powerful results and be a lot of fun. I had similar collaborations when I worked at Bulldog Solutions, and the last two weeks saw the same thing happening at my current gig at Resource Interactive. Those are pretty energizing experiences that leave me scratching my head as to why so many companies wind up with an adversarial relationship between “the business” and “IT.” But THAT is a topic for a whoooollllle other post that I may never write…

Analysis, Analytics Strategy, Reporting, Social Media

The Most Meaningful Insights Will Not Come from Web Analytics Alone

Judah Phillips wrote a post last week laying out why the answer to the question, “Is web analytics hard or easy?” is a resounding “it depends.” It depends, he wrote, on what tools are being used, on how the site being analyzed is built, on the company’s requirements/expectations for analytics, on the skillset of the team doing the analytics, and, finally, on the robustness of the data management processes in place.

One of the comments on the blog came from John Grono of GAP Research, who, while agreeing with the post, pointed out:

You refer to this as “web analytics”. I also know that this is what the common parlance is, but truth be known it is actually “website analytics”. “web” is a truncation of “world wide web” which is the aggregation of billions of websites. These tools do not analyse the “web”, but merely individual nominated “websites” that collectively make up the “web”. I know this is semantics … but we as an industry should get it right.

It’s a valid point. Traditionally, “web analytics” has referred to the analysis of activity that occurs on a company’s web site, rather than on the web as a whole. Increasingly, though, companies are realizing that this is an unduly narrow view:

  • Search engine marketers (SEO and SEM) have, for years, used various keyword research tools to try to determine what words their target customers are using explicitly off-site in a search engine (although the goal of this research has been to use that information to bring these potential customers onto the company’s site)
  • Integration with a company’s CRM and/or marketing automation system — to combine information about a customer’s on-site activity with information about their offline interactions with the company — has been kicked around as a must-do for several years; the major web analytics vendors have made substantial headway in this area over the past few years
  • Of late, analysts and vendors have started looking into the impact of social media and how actions that customers and prospects take online, but not on the company’s web site, play a role in the buying process and generate analyzable data in the process

The “traditional” web analytics vendors (Omniture, Webtrends, and the like) were, I think, a little late realizing that social media monitoring and measurement was going to turn into a big deal. To their credit, they were just getting to the point where their platforms were opening up enough that CRM and data warehouse integration was practical. I don’t have inside information, but my speculation is that they viewed social media monitoring more as an extension of traditional marketing and media research companies than as an adjacency to their core business that they should consider exploring themselves. In some sense, they were right, as Nielsen, J.D. Power and Associates (through acquisition), Dow Jones, and TNS Media Group all rolled out social media monitoring platforms or services fairly early on. But, the door was also opened for a number of upstarts: Biz360, Radian6, Alterian/Techrigy/SM2, Crimson Hexagon, and others whom I’m sure I’ve left off this quick list. The traditional web analytics vendors have since come to the party through partnerships — leveraging the same integration APIs and capabilities that they developed to integrate with their customers’ internal systems to integrate with these so-called listening platforms.

Somewhat fortuitously, a minor hashtag snafu hit Twitter in late July when #wa, which had settled in as the hashtag of choice for web analytics tweets, was overrun by a spate of tweets about Washington state. Eric Peterson started a thread to kick around alternatives, and the community settled on #measure, which Eric documented on his blog. I like the change for two reasons (notwithstanding those five precious characters that were lost in the process):

  1. As Eric pointed out, measurement is the foundation of analysis — I agree!
  2. “Web analytics,” which really means “website analytics,” is too narrow for what analysts need to be doing

I had a brief chat with a co-worker on the subject last week, and he told me that he has increasingly been thinking of his work as “digital analytics” rather than “web analytics,” which I liked as well.

It occurred to me that we’re really now facing two fundamental dimensions when it comes to where our customers (and potential customers) are interacting with our brand:

  • Online or offline — our website, our competitors’ websites, Facebook, blogs, and Twitter are all examples of where relevant digital (online) activities occur, while phone calls, tradeshows, user conferences, and peer discussions are all examples of analog (offline) activities
  • On-site or off-site — this is a bit of a misnomer, but I haven’t figured out the right words yet; what it really means is that customers can interact with the company directly, or they can have interactions with the company’s brand through non-company channels

Pictorially, it looks something like this:
Online / Offline vs. Onsite / Offsite

I’ve filled in the boxes with broad descriptions of what sort of tools/systems actually collect the data from interactions that happen in each space. My claim is that any analyst who is expecting to deliver meaningful insight for his company needs to understand all four of these quadrants and know how to detect relevant signals that are occurring in them.

What do you think?