R and Adobe Analytics: Did the Metric Move Significantly? Part 3 of 3
This is the third post in a three-post series. The earlier posts build up to this one, so you may want to go back and check them out before diving in here if you haven’t been following along:
- Part 1 of 3: The overall approach, and a visualization of metrics in a heatmap format across two dimensions
- Part 2 of 3: Recreating — and refining — the use of Adobe’s anomaly detection to get an at-a-glance view of which metrics moved “significantly” recently
The R scripts used for both of those posts, as well as for what’s covered in this post, are posted on GitHub and available for download and re-use (open source FTW!).
Let’s Mash Parts 1 and 2 Together!
This final episode in the series answers the question:
Which of the metrics changed significantly over the past week within specific combinations of two different dimensions?
The visualization I used to answer this question is this one:
This, clearly, is not a business stakeholder-facing visualization. And, it’s not a color-blind friendly visualization (although the script can easily be updated to use a non-red/green palette).
Hopefully, even without reading the detailed description, the visualization above jumps out as saying, “Wow. Something pretty good looks to have happened for Segment E overall last week, and, specifically, Segment E traffic arriving from Channel #4.” That would be an accurate interpretation.
But, What Does It Really Mean?
If you followed the explanation in the last post, then, hopefully, this part is really simple. The example I showed there was this:
This example had three “good anomalies” (the three dots that are outside — and above — the prediction interval) in the last week. And, it had two “bad anomalies” (the two dots at the beginning of the week that are outside — and below — the prediction interval).
In addition to counting and showing “good” and “bad” anomalies, I can do one more simple calculation to get “net positive anomalies:”
[Good Anomalies] – [Bad Anomalies] = [Net Positive Anomalies]
In the example above, this would be:
[3 Good Anomalies] – [2 Bad Anomalies] = [1 Net Positive Anomaly]
If the script is set to look at the previous week, and if weekends are ignored (which is a configuration within the script), then that means the total possible range for net positive anomalies is -5 to +5. That’s a nice range to provide a spectrum for a heatmap!
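To make the arithmetic concrete, here is a minimal R sketch of that counting logic. The numbers and column names (`actual`, `lower_bound`, `upper_bound`) are made up for illustration and are not taken from the posted script; they simply mirror the 3-good / 2-bad example above:

```r
# Hypothetical data: one row per weekday in the evaluation period, with the
# actual metric value and the prediction interval bounds from the forecast.
last_week <- data.frame(
  actual      = c(80, 95, 210, 230, 250),
  lower_bound = c(100, 100, 105, 110, 115),
  upper_bound = c(200, 205, 205, 210, 215)
)

good_anomalies <- sum(last_week$actual > last_week$upper_bound)  # above the interval: 3
bad_anomalies  <- sum(last_week$actual < last_week$lower_bound)  # below the interval: 2
net_positive   <- good_anomalies - bad_anomalies                 # net: 1 (range is -5 to +5)
```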
A Heatmap, Though?
This is where the first two posts really get mashed together:
- The heatmap structure lets me visualize results across two different dimensions (plus an overall filter to the data set, if desired)
- The anomaly detection — the “outside the prediction interval of the forecast of the past” — lets me get a count of how many times in the period a metric looked “not as expected”
The heatmap represents the two dimensions pretty obviously. For each cell — each intersection of a value from each of the two dimensions — there are three pieces of information:
- The number of good anomalies in the period (the top number)
- The number of bad anomalies in the period (the bottom number)
- The number of net positive anomalies (the color of the cell)
You can think of each cell as having its own trendline, with a forecast and prediction interval for the last period, but actually displaying all of those charts would be a lot of charts! With the heatmap shown above, 42 different slices of the data are represented for each metric (with one heatmap per metric), and it’s quick to interpret the results once you know what they’re showing.
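As a rough illustration of the structure (not the actual script, which is on GitHub), a heatmap like this could be assembled with ggplot2 along these lines. The segment/channel labels and the good/bad counts below are made up, laid out as a 6-segment by 7-channel grid to match the 42 slices described above:

```r
library(ggplot2)

# Hypothetical results: one row per segment/channel combination
cell_results <- expand.grid(
  segment = paste("Segment", LETTERS[1:6]),
  channel = paste("Channel", 1:7)
)
set.seed(42)
cell_results$good <- sample(0:5, nrow(cell_results), replace = TRUE)
cell_results$bad  <- sample(0:5, nrow(cell_results), replace = TRUE)
cell_results$net  <- cell_results$good - cell_results$bad

ggplot(cell_results, aes(x = channel, y = segment, fill = net)) +
  geom_tile(color = "white") +
  geom_text(aes(label = good), vjust = -0.6) +   # top number: good anomalies
  geom_text(aes(label = bad), vjust = 1.4) +     # bottom number: bad anomalies
  # swap in a color-blind-friendly palette here if red/green is a problem
  scale_fill_gradient2(low = "red", mid = "white", high = "green",
                       midpoint = 0, limits = c(-5, 5)) +
  theme_minimal()
```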
What Do You Think?
This whole exercise grew out of some very specific questions that I was finding myself asking each time I reviewed a weekly performance measurement dashboard. I realize that “counting anomalies by day” is somewhat arbitrary. But, by putting some degree of rigor behind identifying anomalies (which, so far, relies heavily on Adobe to do the heavy lifting; as covered in the second post, though, I have a pretty good understanding of how they’re doing that lifting, and it seems fairly replicable directly in R), this approach seems useful to me. If and when a specific channel, customer segment, or channel/segment combination takes a big spike or dip in a metric, I should be able to home in on it with very little manual effort. And, I can then start asking, “Why? And, is this something we can or should act on?”
Almost as importantly, I think the building blocks I’ve put in place provide a foundation that I (or anyone) can springboard off of to extend these capabilities in a number of different directions.
What do you think?