Analytics Strategy

Working Around Sampled Search Data in Google Analytics

I got into a discussion of sampling in Google Analytics with SEO expert and Web PieRat Jill Kocher earlier this year, which led to some profile/filter noodling that seemed worth sharing. Specifically, Jill and I were discussing how, in the world of search engine optimization, where the long tail can be a handy thing to analyze, sampling in Google Analytics can be a real nuisance.

That got me thinking that a partial solution would be to have a Google Analytics profile that only includes organic search traffic. This isn’t a profile that you would use for cross-session analytics, but it’s one that would allow simplified segmentation, reduced cases of sampling, and, perhaps, a more complete data set.

As it turns out, it was pretty simple to set up, and it seems to do the trick.

Step 1: Make a New Profile

Create a new profile under the same web property that you’re using for your site and name it Organic Search Traffic Only:

There’s nothing magic about this. The key is that this is a profile that uses the same web property ID as the profile where you’re running into sampling issues with your SEO analysis. We’re just going to take that same feed of data coming in as visitors visit your site and carve out the subset of that data that is traffic from organic search referrals.

Step 2: Apply an Organic Search Filter

The next (and final) step is to create a filter and apply it to the profile such that only organic search traffic is included.

In the new profile you just created, select the Filters tab and then click New Filter:

From there:

  1. Give the filter a name like “Organic Search Referrals”
  2. Select Custom Filter as the Filter Type
  3. Set the filter as an Include filter
  4. Set the Filter Field to Campaign Medium
  5. Set the Filter Pattern to “organic”
  6. Save the filter
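
To make the effect of that Include filter concrete, here is a minimal sketch of what it does to incoming hits. It is a simulation, not Google Analytics code: the hit dictionaries, the field name "medium", and the include_filter helper are all hypothetical stand-ins for illustration. The one real behavior it leans on is that custom filter patterns are treated as regular expressions, so a bare pattern like organic matches anywhere in the field value.

```python
import re

def include_filter(hits, field, pattern):
    """Keep only the hits whose field value matches the regex pattern.

    Hypothetical helper: it mimics an Include filter by using a partial
    (unanchored) regex match against one field of each hit.
    """
    regex = re.compile(pattern)
    return [hit for hit in hits if regex.search(hit.get(field, ""))]

# Made-up sample hits; "medium" stands in for the Campaign Medium field.
hits = [
    {"medium": "organic", "source": "google"},
    {"medium": "cpc", "source": "google"},
    {"medium": "referral", "source": "example.com"},
    {"medium": "(none)", "source": "(direct)"},
    {"medium": "organic", "source": "bing"},
    {"medium": "not-organic", "source": "custom-campaign"},
]

# The pattern from step 5: matches any medium containing "organic".
loose = include_filter(hits, "medium", "organic")

# An anchored variant that only accepts the exact value "organic".
strict = include_filter(hits, "medium", "^organic$")

print(len(loose), len(strict))
```

In practice the medium is almost always exactly "organic" for search traffic, so the bare pattern works fine. If you use custom campaign mediums that happen to contain that word, the anchored form ^organic$ is the tighter choice.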

The screen below shows the filter settings:

Step 3: Sit Back and Let the Data Roll In

The profile is only going to include data from the point you set it up going forward. But, it will accurately reflect (to the extent that any web analytics package can accurately reflect this) new versus returning visitors for all time (well, since you initially implemented Google Analytics), because it’s getting that data from the cookie that already exists on users’ machines.

Initially, I saw some odd data on the unique visitors front, which I can semi-intuitively understand…but not quite explain.

Suffice it to say that, once you have the profile up and running for a week or so, you can select the Non-paid Search Traffic segment in your main profile and compare it to the All Visits segment in your new profile, and the numbers will be virtually identical. But, you can now do SEO analysis with a base set of data that only includes search traffic.
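
If you want to make that comparison a little less eyeball-driven, a rough sketch along these lines would do it. Everything here is an assumption for illustration: the metric names, the numbers, and the 2% tolerance are made up, and you would paste in your own figures from the two profiles.

```python
def relative_difference(a, b):
    """Return the difference between a and b as a fraction of the larger one."""
    if a == b:
        return 0.0
    return abs(a - b) / max(abs(a), abs(b))

def find_mismatches(main_segment, organic_profile, tolerance=0.02):
    """Return the metrics whose values differ by more than the tolerance.

    Only metrics present in both dictionaries are compared. The default
    tolerance of 2% is an arbitrary choice, not a Google Analytics rule.
    """
    mismatches = {}
    for metric in sorted(set(main_segment) & set(organic_profile)):
        diff = relative_difference(main_segment[metric], organic_profile[metric])
        if diff > tolerance:
            mismatches[metric] = diff
    return mismatches

# Illustrative, made-up numbers: the Non-paid Search Traffic segment in the
# main profile versus All Visits in the organic-only profile.
main_segment = {"visits": 12480, "pageviews": 40310}
organic_profile = {"visits": 12455, "pageviews": 40198}

print(find_mismatches(main_segment, organic_profile))
```

An empty result means the two profiles agree within the tolerance for every metric you supplied.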

Is that handy?

Analytics Strategy

SEO Tips and Thoughts at Web Analytics Wednesday

Last week’s Columbus Web Analytics Wednesday had something of an odd vibe, but it was also one of the most tactically informative ones that we’ve had to date! The crowd was smaller than usual (18 attendees) due to a confluence of factors ranging from the influenza virus (not H1N1, as far as I know, but appropriate precautionary non-attendance by several people), to business travel, to residential water line leaks, to touching-if-inconveniently-timed spousal romantic gestures! The silver lining is that, to a person, there was genuine regret about not being able to attend the event, which is a strong indication that our informal community of local analysts really has solidified. (Monish Datta was in attendance, so I am able to gratuitously make a reference to him; ask him or me at the next WAW what that is all about, if you don’t already know!)

As for the event itself, we welcomed a new sponsor — Resource Interactive. The topic for the event was search engine optimization (SEO) with a little bit of search engine marketing (SEM). It wasn’t the first time that we relied on Dave Culbertson of Lightbulb Interactive to present, and it likely will not be the last, as his knowledge and enthusiasm about SEO, SEM, and web analytics is both entertaining and informative!

Dave Culbertson at Web Analytics Wednesday

Dave attended SMX East in New York the week before WAW, and he agreed to pull together the highlights of the sessions that he attended. One of my favorite tweets from Dave while he was at the conference was this one:

“Ended up leading a lunchtime discussion on web analytics at #smxeast. Web analytics and SEO – like peanut butter and chocolate!”

Partly because Dave is one of the organizers of Columbus Web Analytics Wednesday, and partly because, well, SEO/SEM and web analytics really should be integrated, “search” is a frequent cornerstone of our WAW topics. Dave’s presentation was titled SMX East 2009: The Spinal Tap Wrap-up. At least half of us (myself included) didn’t get the reference, while a solid quarter of the attendees immediately got it and thought it was quite clever and amusing. There were 11 slides in the deck, so:

The presentation focussed primarily on SEO tips, although there was some SEM here and there. An incomplete list of the nuggets/surprises that jumped out the most to me included:

  • PageRank sculpting: this is when you try to gently influence the Google PageRank for pages you control by making subtle, behind-the-scenes tweaks to both that page and other pages that you control that link to that page. Apparently, a somewhat common way to do this has been through the use of the nofollow attribute (rel=”nofollow”) on links. While this may have worked at one point, Google has since changed how it handles the attribute, so it is no longer much use for sculpting PageRank.
  • rel=”canonical”: this is a biggie, especially when it comes to web analytics and campaign tracking. It is a tag that can be added to a page to specify the exact “preferred” URL for the page. It’s important because many pages get linked to or arrived at with one or many extraneous parameters tacked on to the end of the URL: campaign tracking parameters for the web analytics tool, link tracking information for the e-mail engine from which a user may access the page, session ID or user ID information for the application that is rendering the page to enable it to make subtle tweaks in the content, etc. The full adoption of this tag by Google, Yahoo! Search, and Bing should go a long way towards removing the tension that exists between the SEO person pushing for the removal of these parameters in links (to avoid link dilution) and the web analyst who pushes to add them (to improve tracking capabilities). Google put together a nice write-up and video on the canonical tag after SMX West.
  • keywords: this is “keywords for SEO,” rather than the SEM usage of the term. A lot of information was presented about studies of where the appearance of a keyword has the most and least impact. Having the keyword in the domain name itself is great, but, of course, you’re not going to be able to do that for too many keywords! (I couldn’t help but think of Clearsaleing’s http://www.attributionmanagement.com/ site, though!) Even better is to have the keyword in both the domain and the directory path (e.g., http://www.keyword.com/keyword). Having the keyword in a subdomain (http://keyword.company.com) is apparently not very effective. There was a quick side discussion about an online shoe retailer that tried creating a subdomain for every type of shoe they sold, which then helped trigger Google to make this ineffective; I can’t remember which retailer it was and, ironically, can’t seem to put together the right Google search to figure it out, so I’m fuzzy on the specifics, obviously! Another point here is that there are two factors at work: what the search engine algorithm puts weight on keyword-wise, and how user behavior (which links users follow) is affected by keywords showing up in subdomains, domains, query parameters, etc. It’s hard to tease out which is which, so the studies have focussed more on “what actually happens” than on “why it happens.”
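
To illustrate the rel=”canonical” point from the list above, here is a minimal sketch of how a page might work out its own preferred URL by dropping tracking parameters and then emit the tag. The utm_* names are the standard Google Analytics campaign parameters; "sessionid" and the example.com URL are hypothetical, and which parameters are safe to drop depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that do not change the page content. The utm_* names are the
# Google Analytics campaign parameters; "sessionid" is a made-up example.
IGNORED_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "sessionid",
}

def canonical_url(url, ignored=IGNORED_PARAMS):
    """Return the URL with ignored query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for the page's <head>."""
    return '<link rel="canonical" href="{}">'.format(
        escape(canonical_url(url), quote=True)
    )

landing = (
    "http://www.example.com/shoes"
    "?utm_source=newsletter&utm_medium=email&color=red&sessionid=abc123"
)
print(canonical_link_tag(landing))
```

The tracked link still carries its campaign parameters for the analytics tool, while the search engines are pointed at one clean URL, which is the truce between the SEO person and the web analyst described above.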

At the end of the day, search engine optimization still comes down to providing great content in a way that users can easily navigate to it and consume it. Google’s algorithms are geared around making the same recommendations that a human being with an infinite knowledge of what content was where on the web would recommend in response to a question from another human being. SEO efforts need to focus on helping that theoretical human out — not trying to fool him/her!

I also distributed copies of the deck that Laura Thieme of Bizresearch presented at SMX East. That presentation was primarily SEM-focussed, but it also had some great nuggets in it. Unfortunately, Laura wasn’t able to attend WAW (see the first paragraph of this post!) this month. Laura presented at WAW back in July and really knows her way around SEM, so we missed having her there!

All in all, it was a good event!

Analysis, Analytics Strategy, Reporting, Social Media

The Most Meaningful Insights Will Not Come from Web Analytics Alone

Judah Phillips wrote a post last week laying out why the answer to the question, “Is web analytics hard or easy?” is a resounding “it depends.” It depends, he wrote, on what tools are being used, on how the site being analyzed is built, on the company’s requirements/expectations for analytics, on the skillset of the team doing the analytics, and, finally, on the robustness of the data management processes in place.

One of the comments on the blog came from John Grono of GAP Research, who, while agreeing with the post, pointed out:

You refer to this as “web analytics”. I also know that this is what the common parlance is, but truth be known it is actually “website analytics”. “web” is a truncation of “world wide web” which is the aggregation of billions of websites. These tools do not analyse the “web”, but merely individual nominated “websites” that collectively make up the “web”. I know this is semantics … but we as an industry should get it right.

It’s a valid point. Traditionally, “web analytics” has referred to the analysis of activity that occurs on a company’s web site, rather than on the web as a whole. Increasingly, though, companies are realizing that this is an unduly narrow view:

  • Search engine marketers (SEO and SEM) have, for years, used various keyword research tools to try to determine what words their target customers are using explicitly off-site in a search engine (although the goal of this research has been to use that information to bring these potential customers onto the company’s site)
  • Integration with a company’s CRM and/or marketing automation system — to combine information about a customer’s on-site activity with information about their offline interactions with the company — has been kicked around as a must-do for several years; the major web analytics vendors have made substantial headway in this area over the past few years
  • Of late, analysts and vendors have started looking into the impact of social media and how actions that customers and prospects take online, but not on the company’s web site, play a role in the buying process and generate analyzable data in the process

The “traditional” web analytics vendors (Omniture, Webtrends, and the like) were, I think, a little late realizing that social media monitoring and measurement was going to turn into a big deal. To their credit, they were just getting to the point where their platforms were opening up enough that CRM and data warehouse integration was practical. I don’t have inside information, but my speculation is that they viewed social media monitoring more as an extension of traditional marketing and media research companies than as an adjacency to their core business that they should consider exploring themselves. In some sense, they were right, as Nielsen, J.D. Power and Associates (through acquisition), Dow Jones, and TNS Media Group all rolled out social media monitoring platforms or services fairly early on. But, the door was also opened for a number of upstarts: Biz360, Radian6, Alterian/Techrigy/SM2, Crimson Hexagon, and others whom I’m sure I’ve left off this quick list. The traditional web analytics vendors have since come to the party through partnerships, using the same integration APIs and capabilities that they developed for their customers’ internal systems to integrate with these so-called listening platforms.

Somewhat fortuitously, a minor hashtag snafu hit Twitter in late July when #wa, which had settled in as the hashtag of choice for web analytics tweets, was overrun by a spate of tweets about Washington state. Eric Peterson started a thread to kick around alternatives, and the community settled on #measure, which Eric documented on his blog. I like the change for two reasons (notwithstanding those five precious characters that were lost in the process):

  1. As Eric pointed out, measurement is the foundation of analysis — I agree!
  2. “Web analytics,” which really means “website analytics,” is too narrow for what analysts need to be doing

I had a brief chat with a co-worker on the subject last week, and he told me that he has increasingly been thinking of his work as “digital analytics” rather than “web analytics,” which I liked as well.

It occurred to me that we’re really now facing two fundamental dimensions when it comes to where our customers (and potential customers) are interacting with our brand:

  • Online or offline — our website, our competitors’ websites, Facebook, blogs, and Twitter are all examples of where relevant digital (online) activities occur, while phone calls, tradeshows, user conferences, and peer discussions are all examples of analog (offline) activities
  • On-site or off-site — this is a bit of a misnomer, but I haven’t figured out the right words yet. But, it really means that customers can interact with the company directly, or, they can have interactions with the company’s brand through non-company channels

Pictorially, it looks something like this:
Online / Offline vs. Onsite / Offsite

I’ve filled in the boxes with broad descriptions of what sort of tools/systems actually collect the data from interactions that happen in each space. My claim is that any analyst who is expecting to deliver meaningful insight for his company needs to understand all four of these quadrants and know how to detect relevant signals that are occurring in them.

What do you think?