R and Adobe Analytics: Two Dimensions, Many Metrics – Part 1 of 3
This is the first of three posts that all use the same base set of configuration to answer three different questions:
- How do my key metrics break out across two different dimensions?
- Did any of these metrics change significantly over the past week (overall)?
- Which of these metrics changed significantly over the past week within specific combinations of those two different dimensions?
Answering the second question looks something like this (one chart for each metric):
Answering the third question — which uses the visualization from the first question and the logic from the second question — looks like this:
These were all created using R, and the code that was used to create them is available on GitHub. It’s one overall code set, but it’s set up so that any of these questions can be answered independently. They just share enough common ground on the configuration front that it made sense to build them in the same project (we’ll get to that in a bit).
This post goes into detail on the first question. The next one goes into detail on the second question. And, I own a T-shirt that says, “There are two types of people in this world: those who know how to extrapolate from incomplete information.” So, I’ll let you guess what the third post will cover.
The remainder of this post is almost certainly TL;DR for many folks. It gets into the details of the what, wherefore, and why of the actual rationale and methods employed. Bail now if you’re not interested!
Key Metrics? Two Dimensions?
Raise your hand if you’ve ever been asked a question like, “How does our traffic break down by channel? Oh…and how does it break down by device type?” That question-that-is-really-two-questions is easy enough to answer, right? But, when I get asked it, I often feel like it’s really one question, and answering it as two questions is actually a missed opportunity.
Recently, while working with a client, a version of this question came up regarding their last touch channels and their customer segments. So, that’s what the examples shown here are built around. But, it could just as easily have been device category and last touch channel, or device category and customer segment, or new/returning and device category, or… you get the idea.
When it comes to which metrics were of interest, it’s an eCommerce site, and revenue is the #1 metric. But, of course, revenue can be decomposed into its component parts:
[Visits] x [Conversion Rate] x [Average Order Value]
Or, since there are multiple lines per order, AOV can actually be broken down:
[Visits] x [Conversion Rate] x [Lines per Order] x [Revenue per Line]
Again, the specific metrics can and should vary based on the business, but I got to a pretty handy list in my example case simply by breaking down revenue into the sub-metrics that, mathematically, drive it.
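To make the decomposition concrete, here is a small worked example in R. The totals are made-up numbers for illustration only (they are not from the client data), and the variable names are my own.

```r
# Hypothetical totals for one period (illustrative numbers only)
visits  <- 10000
orders  <- 250
lines   <- 600     # order lines across all orders
revenue <- 30000

# The sub-metrics that mathematically drive revenue
conversion_rate  <- orders / visits    # orders per visit
avg_order_value  <- revenue / orders
lines_per_order  <- lines / orders
revenue_per_line <- revenue / lines

# Both decompositions multiply back to total revenue
three_factor <- visits * conversion_rate * avg_order_value
four_factor  <- visits * conversion_rate * lines_per_order * revenue_per_line

all.equal(three_factor, revenue)
all.equal(four_factor, revenue)
```

Because average order value is just lines per order times revenue per line, the four-factor version carries the same information with one more lever to inspect.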
The Flexibility of Scripting the Answer
Certainly, one way to tackle answering the question would be to use Ad Hoc Analysis or Analysis Workspace. But, the former doesn’t visualize heatmaps at all, and the latter…doesn’t visualize this sort of heatmap all that well. Report Builder was another option, and probably would have been the route I went…except there were other questions I wanted to explore along this two-dimensional construct that are not available through Report Builder.
So, I built “the answer” using R. That means I can continue to extend the basic work as needed:
- Exploring additional metrics
- Exploring different dimensions
- Using the basic approach with other sites (or with specific segments for the current site — such as “just mobile traffic”)
- Extending the code to do other explorations of the data itself (which I’ll get into with the next two posts)
- Extending the approach to work with Google Analytics data
Key Aspects of R Put to Use
The first key to doing this work, of course, is to get the data out. This is done using the RSiteCatalyst package.
The second key was to break up the code into a handful of different files. Ultimately, the output was generated using RMarkdown, but I didn’t put all of the code in a single file. Rather, I had one script (.R) that was just for configurations (this is what you will do most of the work in if you download the code and put it to use for your own purposes), one script (.R) that had a few functions that were used in answering multiple questions, and then one actual RMarkdown file (.Rmd) for each question. The .Rmd files use read_chunk() to selectively pull in the configuration settings and functions needed. So, the actual individual files break down something like this:
This probably still isn’t as clean as it could be, but it gave me the flexibility (and, perhaps more importantly, the extensibility) that I was looking for, and it allowed me to universally tweak the style and formatting of the multi-slide presentations that each question generated.
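For anyone who hasn't used read_chunk() before, here is a minimal sketch of the mechanism. The file name, chunk labels, and settings are hypothetical stand-ins, not the contents of the actual scripts. The example writes a tiny config script to a temporary directory so that it can run on its own.

```r
# Write a tiny stand-in for a config script. The "## ---- label ----" comment
# lines are how knitr marks named chunks inside a plain .R file.
config_path <- file.path(tempdir(), "config_example.R")
writeLines(c(
  "## ---- dates ----",
  "date_start <- as.Date('2016-10-02')",
  "date_end   <- as.Date('2016-10-08')",
  "",
  "## ---- metrics ----",
  "metric_ids <- c('visits', 'orders', 'revenue')"
), config_path)

# In an .Rmd file, a setup chunk would call read_chunk() on the script.
# Later chunks with matching labels ("dates", "metrics") and empty bodies
# then pull in just that piece of code.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::read_chunk(config_path)
}
```

The appeal of this pattern is that each .Rmd file can pick only the labeled chunks it needs while the settings themselves live in one place.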
The .Renviron file is a very simple text file with my credentials for Adobe Analytics. It’s handy, in that it only sits on my local machine; it never gets uploaded to GitHub.
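As a rough sketch of how that works: R reads .Renviron at startup, and the script picks the values up with Sys.getenv(). The variable names ADOBE_KEY and ADOBE_SECRET are hypothetical names chosen for this example, and the Sys.setenv() call is there only so the sketch runs without a real .Renviron file.

```r
# A .Renviron file holds one NAME=value pair per line, for example:
#   ADOBE_KEY=your-api-username
#   ADOBE_SECRET=your-shared-secret
# (ADOBE_KEY and ADOBE_SECRET are hypothetical names for this sketch.)

# Stand-in values so the example runs without a real .Renviron file
Sys.setenv(ADOBE_KEY = "demo-user", ADOBE_SECRET = "not-a-real-secret")

# Sys.getenv() reads the values at run time, so they never appear in the script
adobe_key    <- Sys.getenv("ADOBE_KEY")
adobe_secret <- Sys.getenv("ADOBE_SECRET")

# Fail early with a clear message if either value is missing
if (!nzchar(adobe_key) || !nzchar(adobe_secret)) {
  stop("Adobe Analytics credentials not found in the environment")
}

# With real credentials, authentication would then be along the lines of:
# RSiteCatalyst::SCAuth(adobe_key, adobe_secret)
```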
How It Works (How You Can Put It to Use)
There is a moderate level of configuration required to run this, but I’ve done my best to thoroughly document the settings in the scripts themselves (primarily in config.R). But, summarizing those:
- Date Range — you need to specify the start and end date. This can be statically defined, or it can be dynamically defined to be “the most recent full week,” for instance. The one wrinkle on the date range is that I don’t think the script will work well if the start and end date cross a year boundary. The reason is documented in the script comments, so I won’t go into that here.
- Metrics — for each metric you want to include, you need to include the metric ID. This can be something like “revenue” for the standard metrics or “event32” for events, but it can also be something like “cm300000270_56cb944821d4775bd8841bdb” if it’s a calculated metric; you may have to use the GetMetrics() function to get the specific values here. Then, so that the visualization comes out nicely, you have to give each metric a label (a “pretty name”), specify the type of metric it is (simple number, currency, percentage), and specify how many places after the decimal should be included (visits is a simple number that needs 0 places after the decimal, but “Lines per Order” may be a simple number where 2 places after the decimal make sense).
- One or more “master segments” — it seems reasonably common, in my experience, that there are one or two segments that almost always get applied to a site (excluding some ‘bad’ data that crept in, excluding a particular sub-site, etc.), and the script accommodates this. This can also be used to introduce a third layer to the results. If, for instance, you wanted to look at last touch channel and device category just for new visitors, then you can apply a master segment for new visitors, and that will then be applied to the entire report.
- One Segment for Each Dimension Value — I went back and forth on this and, ultimately, went with the segments approach. In the example above, this was 13 total segments (one each for the seven channels, which included the “All Others” channel, and one each for each of the six customer segments, which was five customer segment values plus one “none specified” customer segment). I could have also simply pulled the “Top X” values for specific dimensions (which would have had me using a different RSiteCatalyst function), but this didn’t give me as much control as I wanted to ensure I was covering all of the traffic and was able to make an “All Others” catch-all for the low-volume noise areas (which I made with an Exclude segment). And, these were very simple segments (in this case, although many use cases would likely be equally simple). Using segments meant that each “cell” in the heatmap was a separate query to the Adobe Analytics API. On the one hand, that meant the script can take a while to run (~20 minutes for this site, which has a pretty high volume of traffic). But, it also means the queries are much less likely to time out. Below is what one of these segments looks like. Very simple, right?
- Segment Meta Data — each segment needs to have a label (a “pretty name”) specified, just like the metrics. That’s a “feature!” It let me easily obfuscate the data in these examples a bit by renaming the segments “Channel #1,” “Channel #2,” etc. and “Segment A,” “Segment B,” etc. before generating the examples included here!
- A logo — this isn’t in the configuration, but, rather, just means replacing the logo.png file in the images subdirectory.
Getting the segment IDs is a mild hassle, too, in that you likely will need to use the GetSegments() function to get the specific values.
This may seem like a lot of setup overall, but it’s largely a one-time deal (until you want to go back in and use other segments or other metrics, at which point you’re just doing minor adjustments).
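To give a feel for the date range and metric settings described above, here is a hedged sketch. It is not the actual config.R. The function names, the Sunday-to-Saturday week definition, the metric IDs for the calculated metrics, and the "$" currency symbol are all assumptions made for illustration.

```r
# Most recent full Sunday-to-Saturday week before a reference date.
# (The week definition is an assumption for this sketch.)
last_full_week <- function(today = Sys.Date()) {
  dow <- as.integer(format(today, "%w"))  # 0 = Sunday ... 6 = Saturday
  end <- today - dow - 1                  # most recent Saturday before today
  list(start = end - 6, end = end)
}

# One row per metric: ID, "pretty name", type, and decimal places.
# The two "cm_example_..." IDs are placeholders, not real calculated metric IDs.
metrics <- data.frame(
  id       = c("visits", "revenue",
               "cm_example_conversion_rate", "cm_example_lines_per_order"),
  label    = c("Visits", "Revenue", "Conversion Rate", "Lines per Order"),
  type     = c("number", "currency", "percentage", "number"),
  decimals = c(0, 0, 2, 2),
  stringsAsFactors = FALSE
)

# Turn a raw value into display text based on the metric's type and decimals
format_metric <- function(value, type, decimals) {
  switch(type,
    number     = formatC(value, format = "f", digits = decimals, big.mark = ","),
    currency   = paste0("$", formatC(value, format = "f", digits = decimals,
                                     big.mark = ",")),
    percentage = paste0(formatC(value * 100, format = "f", digits = decimals), "%"),
    stop("Unknown metric type: ", type)
  )
}

week <- last_full_week(as.Date("2016-10-12"))  # a Wednesday
format_metric(12345, "number", 0)
format_metric(0.025, "percentage", 2)
```

Keeping the label, type, and decimals next to each metric ID means the plotting code never needs to know anything metric-specific.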
Once this setup is done, the script just:
- Cycles through each combination of the segments from each of the segment lists and pulls the totals for each of the specified metrics
- For each [segment 1] + [segment 2] + [metric] combination, adds a row to a data frame. This results in a “tidy” data frame with all of the data needed for all of the heatmaps
- For each metric, generates a heatmap using
- Generates an ioslides presentation that can then be shared as is or PDF’d for email distribution
Easy as pie, right?
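The steps above can be sketched roughly as follows. This is not the code from the repository. The segment IDs, labels, and the get_totals() helper are hypothetical, and get_totals() returns random numbers in place of a real Adobe Analytics query so that the example runs on its own. The heatmap uses ggplot2, which is my assumption for the plotting step, and it is skipped if that package isn't installed.

```r
# Hypothetical segment lists: segment IDs mapped to display labels
channels   <- c(s_ch1 = "Channel #1", s_ch2 = "Channel #2", s_ch3 = "Channel #3")
cust_segs  <- c(s_a = "Segment A", s_b = "Segment B")
metric_ids <- c("visits", "revenue")

# Stand-in for the API call. In a real script this would be one query per
# cell, with both segment IDs applied; here it just returns random totals.
get_totals <- function(seg1_id, seg2_id, metric_ids) {
  stats::setNames(round(runif(length(metric_ids), 100, 1000)), metric_ids)
}

set.seed(42)

# Every combination of a channel segment and a customer segment
combos <- expand.grid(seg1 = names(channels), seg2 = names(cust_segs),
                      stringsAsFactors = FALSE)

# One row per [segment 1] + [segment 2] + [metric] combination
rows <- lapply(seq_len(nrow(combos)), function(i) {
  totals <- get_totals(combos$seg1[i], combos$seg2[i], metric_ids)
  data.frame(
    channel          = channels[[combos$seg1[i]]],
    customer_segment = cust_segs[[combos$seg2[i]]],
    metric           = names(totals),
    value            = unname(totals),
    stringsAsFactors = FALSE
  )
})
tidy <- do.call(rbind, rows)

# One heatmap per metric; shown here for visits only
if (requireNamespace("ggplot2", quietly = TRUE)) {
  visits_data <- tidy[tidy$metric == "visits", ]
  heatmap_plot <- ggplot2::ggplot(
    visits_data,
    ggplot2::aes(x = customer_segment, y = channel, fill = value)
  ) +
    ggplot2::geom_tile(color = "white") +
    ggplot2::geom_text(ggplot2::aes(label = value)) +
    ggplot2::scale_fill_gradient(low = "white", high = "steelblue") +
    ggplot2::labs(title = "Visits", x = NULL, y = NULL) +
    ggplot2::theme_minimal()
}
```

Because the data frame is tidy, producing the heatmap for another metric is just a matter of filtering on a different value of the metric column.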
What about Google Analytics?
This code would be fairly straightforward to repurpose to use googleAnalyticsR rather than RSiteCatalyst. That’s not the case when it comes to answering the questions covered in the next two posts (although it’s still absolutely doable for those, too — I just took a pretty big shortcut that I’ll get into in the next two posts). And, I may actually do that next. Leave a comment if you’d find that useful, and I’ll bump it up my list (it may happen anyway based on my client work).
The Rest of the Series
If you’re feeling ambitious and want to go ahead and dive into the rest of the series: