
The Trouble (My Troubles) with Statistics

Okay. I admit it. That’s a linkbait-y title. In my defense, though, the only audience that would be successfully baited by it, I think, are digital analysts, statisticians, and data scientists. And, that’s who I’m targeting, albeit for different reasons:

  • Digital analysts — if you’re reading this then, hopefully, it may help you get over an initial hump on the topic that I’ve been struggling mightily to clear myself.
  • Statisticians and data scientists — if you’re reading this, then, hopefully, it will help you understand why you often run into blank stares when trying to explain a t-test to a digital analyst.

If you are comfortably bridging both worlds, then you are a rare bird, and I beg you to weigh in in the comments as to whether what I describe rings true.

The Premise

I took a college-level class in statistics in 2001 and another one in 2010. Neither class was particularly difficult. They both covered similar ground. And, yet, I wasn’t able to apply a lick of content from either one to my work as a web/digital analyst.

Since early last year, as I’ve been learning R, I’ve also been trying to “become more data science-y,” and that’s involved taking another run at the world of statistics. That. Has. Been. HARD!

From many, many discussions with others in the field — on both the digital analytics side of things and the more data science and statistics side of things — I think I’ve started to identify why and where it’s easy to get tripped up. This post is an enumeration of those items!

As an aside, my eldest child, when applying for college, was told that the fact that he “didn’t take any math” his junior year in high school might raise a small red flag in the admissions department of the engineering school he’d applied to. He’d taken statistics that year (because the differential equations class he’d intended to take had fallen through). THAT was the first time I learned that, in most circles, statistics is not considered “math.” See how little I knew?!

Terminology: Dimensions and Metrics? Meet Variables!

Historically, web analysts have lived in a world of dimensions. We combine multiple dimensions (channel + device type, for instance) and then put one or more metrics against those dimensions (visits, page views, orders, revenue, etc.).

Statistical methods, on the other hand, work with “variables.” What is a variable? I’m not being facetious. It turns out it can be a bit of a mind-bender if you come at it from a web analytics perspective:

  • Is device type a variable?
  • Or, is the number of visits by device type a variable?
  • OR, is the number of visits from mobile devices a variable?

The answer… is “Yes.” Depending on what question you are asking and what statistical method is being applied, defining what your variable(s) are, well, varies. Statisticians think of variables as having different types of scales: nominal, ordinal, interval, or ratio. And, in a related way, they think of data as being either “metric data” or “nonmetric data.” There’s a good write-up on the different types — with a digital analytics slant — in this post on dartistics.com.
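
To make that concrete, here is a minimal sketch in R — with entirely made-up numbers — showing how each of the three framings above could legitimately be “the variable”:

```r
# A hypothetical summary of visits by device type (made-up numbers)
visits_by_device <- data.frame(
  device_type = factor(c("desktop", "mobile", "tablet")), # nominal variable
  visits      = c(5200, 3100, 700)                        # ratio-scaled ("metric") variable
)

# "Device type" as a variable: a nominal factor with three levels
visits_by_device$device_type

# "The number of visits by device type" as a variable: one count per level
visits_by_device$visits

# "The number of visits from mobile devices" as a variable: a single number
visits_by_device$visits[visits_by_device$device_type == "mobile"]
```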

It may seem like semantic navel-gazing, but it really isn’t: different statistical methods work with specific types of variables, so data has to be transformed appropriately before statistical operations are performed. Some day, I’ll write that magical post that provides a perfect link between these two fundamentally different lenses through which we think about our data… but today is not that day.

Atomic Data vs. Aggregated Counts

In R, when using ggplot2 to create a bar chart from data that is already summarized — the way it would typically look in Excel — I have to include the argument stat="identity". As it turns out, that requirement is a symptom of the next mental jump required to move from the world of digital analytics to the world of statistics.

To illustrate, let’s think about how we view traffic by channel:

  • In web analytics, we think: “this is how many (a count) visitors to the site came from each of referring sites, paid search, organic search, etc.”
  • In statistics, typically, the framing would be: “here is a list (a row) for each visitor to the site, and each visitor is identified as having visited from referring sites, paid search, organic search, etc.” (or, possibly, “each visitor is flagged as yes/no for each of: referring sites, paid search, organic search, etc.”… but that’s back to the discussion of “variables” covered above).

So, in my bar chart example above, R defaults to thinking that it’s making a bar chart out of a sea of data, where it’s aggregating a bunch of atomic observations into a summarized set of bars. The stat="identity" argument has to be included to tell R, “No, no. Not this time. I’ve already counted up the totals for you. I’m telling you the height of each bar with the data I’m sending you!”
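
A minimal illustration of the two behaviors, with made-up data:

```r
library(ggplot2)

# Atomic data: one (hypothetical) row per visit, with the channel for each visit
visits <- data.frame(
  channel = c("organic", "organic", "paid", "referral", "organic", "paid")
)

# Default behavior: geom_bar() counts the rows within each channel for us
ggplot(visits, aes(x = channel)) +
  geom_bar()

# Aggregated data: the visits per channel have already been counted
visit_counts <- data.frame(
  channel = c("organic", "paid", "referral"),
  visits  = c(3, 2, 1)
)

# stat = "identity" tells ggplot to use our counts as the bar heights, as-is
ggplot(visit_counts, aes(x = channel, y = visits)) +
  geom_bar(stat = "identity")
```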

When researching statistical methods, this comes up time and time again: statistical techniques often expect a data set to be a collection of atomic observations. Web analysts typically work with aggregated counts. Two things to call out on this front:

  • There are statistical methods that work with aggregated counts (a cross-tabulation with a chi-square test for independence is one good example — see the sketch after this list). I realize that. But, there are many more that actually expect greater fidelity in the data.
  • Both Adobe Analytics (via data feeds and, to a clunkier extent, Data Warehouse) and Google Analytics (via the GA360 integration with Google BigQuery) offer much more atomic data than they have historically provided through their primary interfaces; this is one reason data scientists are starting to dig into digital analytics data more!
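
Here’s what the aggregated-counts case can look like in practice — a minimal sketch, with made-up numbers, of a chi-square test for independence run directly on a cross-tabulation:

```r
# Hypothetical aggregated counts: visits by channel and device type
visit_counts <- matrix(
  c(1200, 800,   # organic:  desktop, mobile
     600, 900,   # paid:     desktop, mobile
     300, 200),  # referral: desktop, mobile
  nrow = 3, byrow = TRUE,
  dimnames = list(
    channel = c("organic", "paid", "referral"),
    device  = c("desktop", "mobile")
  )
)

# The test of independence works directly on the aggregated table
chisq.test(visit_counts)
```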

The big “Aha!” for me in this area is that we often want to introduce pseudo-granularity into our data. For instance, if we look at orders by channel for the last quarter, we may have 8-10 rows of data. But, if we pull orders by day for the last quarter, we have a much larger set of data. And, by introducing granularity, we can start looking at the variability of orders within each channel. That is useful! When performing a 1-way ANOVA, for instance, we need to compare the variability within channels to the variability across channels to draw conclusions about where the “real” differences are.
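
As a rough sketch of what that looks like in R (with simulated daily orders standing in for real data):

```r
# Simulated daily orders for one quarter (90 days) across three channels
set.seed(42)
orders_by_day <- data.frame(
  channel = rep(c("organic", "paid", "referral"), each = 90),
  orders  = c(rpois(90, 40), rpois(90, 35), rpois(90, 12))
)

# One-way ANOVA: is the variability ACROSS channels large relative to the
# day-to-day variability WITHIN each channel?
summary(aov(orders ~ channel, data = orders_by_day))
```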

This actually starts to get a bit messy. We can’t just add dimensions to our data willy-nilly to artificially introduce granularity. That can be dangerous! But, in the absence of truly atomic data, some degree of added dimensionality is required to apply some types of statistical methods. <sigh>

Samples vs. Populations

The first definition for “statistics” I get from Google (emphasis added) is:

“the practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.”

Web analysts often work with “the whole” — unless, that is, we consider historical data to be the sample and “the whole” to include future traffic as well. But, if we view the world that way — by using time to determine our “sample” — then we’re not exactly getting a random (independent) sample!

We’ve also been conditioned to believe that sampling is bad! For years, Adobe/Omniture was able to beat up on Google Analytics because of GA’s “sampled data” conditions. And, Google has made any number of changes and product offerings (GA Premium -> GA 360) to allow their customers to avoid sampling. So, Google, too, has conditioned us to treat the word “sampled” as having a negative connotation.

To be clear: GA’s sampling is an issue. But, it turns out that working with “the entire population” with statistics can be an issue, too. If you’ve ever heard of the dangers of “overfitting the model,” or if you’ve heard, “if you have enough traffic, you’ll always find statistical significance,” then you’re at least vaguely aware of this!

So, on the one hand, we tend to drool over how much data we have (thank you, digital!). But, as web analysts, we’re conditioned to think, “always use all the data!” Statisticians, when presented with a sufficiently large data set, like to pull a sample of that data, build a model, and then test the model with another sample of the data. As far as I know, neither Adobe nor Google has an “export a sample of the data” option available natively. And, frankly, I have yet to come across a data scientist working with digital analytics data who is doing this, either. But, several people have acknowledged it is something that should be done in some cases.
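
Once the data is sitting in R, though, the splitting itself is simple. A minimal sketch of the build-on-one-sample/test-on-another idea, with entirely fabricated visit-level data and a hypothetical orders-from-pages model:

```r
# Fabricated visit-level data: pages viewed and whether an order occurred
set.seed(42)
visits <- data.frame(
  pages  = rpois(10000, 4),
  orders = rbinom(10000, 1, 0.05)
)

# Pull a random 70% sample to build the model on...
train_rows <- sample(nrow(visits), size = 0.7 * nrow(visits))
train <- visits[train_rows, ]

# ...and hold out the remaining 30% to test the model against
test <- visits[-train_rows, ]

# Fit a simple logistic regression on the training sample...
model <- glm(orders ~ pages, data = train, family = binomial)

# ...and then see how it does on data it has never seen
predictions <- predict(model, newdata = test, type = "response")
```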

I think this is going to have to get addressed at some point. Maybe it already has been, and I just haven’t crossed paths with the folks who have done it!

Decision Under Uncertainty

I’ve saved the messiest (I think) for last. Everything on my list to this point has been, to some extent, mechanical. We should be able to just “figure it out” — make a few cheat sheets, draw a few diagrams, reach a conclusion, and be done with it.

But, this one… is different. This is an issue of fundamental understanding — a fundamental perspective on both data and the role of the analyst.

Several statistically-savvy analysts I have chatted with have said something along the lines of, “You know, really, to ‘get’ statistics, you have to start with probability theory.” One published illustration of this stance can be found in The Cartoon Guide to Statistics, which devotes an early chapter to the subject. It actually goes all the way back to the 1600s and an exchange between Blaise Pascal and Pierre de Fermat and proceeds to walk through a dice-throwing example of probability theory. Alas! This is where the book lost me (although I still have it and may give it another go).
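
To give a taste of the sort of thing that chapter covers (this isn’t necessarily the book’s exact example, but it’s the same flavor of question Pascal and Fermat were trading letters about): what’s the probability of rolling at least one six in four throws of a die?

```r
# Exact calculation: 1 minus the probability of NO six in four throws
1 - (5 / 6)^4  # ~0.518

# The same answer, approximated by brute-force simulation
set.seed(42)
rolls <- replicate(100000, any(sample(1:6, 4, replace = TRUE) == 6))
mean(rolls)  # also ~0.518
```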

Possibly related — although quite different — is something that Matt Gershoff of Conductrics and I have chatted about on multiple occasions across multiple continents. Matt posits that one of the biggest challenges traditional digital analysts face when they try to dive into a more statistically-oriented mindset is understanding the scope (and limits!) of their role. As he put it to me once in a series of direct messages, it really boils down to:

  1. It’s about decision-making under uncertainty
  2. It’s about assessing how much uncertainty is reduced with additional data
  3. It must consider, “What is the value in that reduction of uncertainty?”
  4. And it must consider, “Is that value greater than the cost of the data/time/opportunity costs?”

The list looks pretty simple, but I think there is a deeper mindset/mentality-shift that it points to. And, it gets to a related challenge: even if the digital analyst views her role through this lens, do her stakeholders think this way? Methinks…almost certainly not! So, it opens up a whole new world of communication/education/relationship-management between the analyst and stakeholders!

For this area, I’ll just leave it at, “There are some deeper fundamentals that either are critical to understand up front or can be kicked down the road a bit.” I don’t know which it is!

What Do You Think?

It’s taken me over a year to slowly recognize that this list exists. Hopefully, whether you’re a digital analyst dipping your toe more deeply into statistics or a data scientist who is wondering why you garner blank stares from your digital analytics colleagues, there is a point or two in this post that made you think, “Ohhhhh! Yeah. THAT’s where the confusion is.”

If you’ve been trying to bridge this divide in some way yourself, I’d love to hear what of this post resonates, what doesn’t, and, perhaps, what’s missing!


Performance Measurement vs. Analysis

I’ve picked up some new terminology over the course of the past few weeks thanks to an intermediate statistics class I’m taking. Specifically — and this is what inspired this post — I’ve learned the distinction between two types of statistical studies, as defined by one of the fathers of statistical process control, W. Edwards Deming. There’s a Wikipedia entry that defines both the terms themselves and the point of making the distinction quite well:

  • Enumerative study: A statistical study in which action will be taken on the material in the frame being studied.
  • Analytic study: A statistical study in which action will be taken on the process or cause-system that produced the frame being studied. The aim being to improve practice in the future.

…In other words, an enumerative study is a statistical study in which the focus is on judgment of results, and an analytic study is one in which the focus is on improvement of the process or system which created the results being evaluated and which will continue creating results in the future. A statistical study can be enumerative or analytic, but it cannot be both.

I’ve now been at three different schools in three different states where one of the favorite examples used for processes and process control is a process for producing plastic yogurt cups. I don’t know if Yoplait just pumps an insane amount of funding into academia-based research, or if there is some other reason, but I’ll go ahead and perpetuate it by using the same example here:

  • Enumerative study — imagine that the yogurt cup manufacturer is contractually bound to provide shipments where less than 0.1% of the cups are defective. Imagine, also, that fully testing a cup requires destroying it in the process. Using statistics, the manufacturer can pull a sample from each shipment, test those cups, and, if the sampling is set up properly, predict with reasonable confidence the proportion of defective cups in the entire shipment. If the prediction exceeds 0.1%, then the entire shipment can be scrapped rather than risking a contract breach. The same test would be conducted on each shipment (see the sketch after this list).
  • Analytic study — now, suppose the yogurt cup manufacturer finds that he is scrapping one shipment in five based on the process described in the enumerative study. That isn’t a financially viable way to continue, so he decides to conduct a study to determine what factors in his process are causing cups to come out defective. In this case, he may set up a very different study — isolating as many factors in the process as he can to see if he can identify the trouble spots in the process itself and fix them.
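
For the enumerative case, a minimal sketch of what that shipment test might look like in R (the sample size and defect count are made up):

```r
# Hypothetical shipment check: destructively test 2,000 sampled cups
# and count how many are defective
tested  <- 2000
defects <- 4

# Is the shipment's defect rate credibly above the 0.1% contract limit?
binom.test(defects, tested, p = 0.001, alternative = "greater")
```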

It’s not an either/or scenario. Even if an analytic study (or series of studies) enables him to improve the process, he will likely still need to continue the enumerative studies to identify bad batches when they do occur.

In the class, we have talked about how, in marketing, we are much more often faced with analytic situations rather than enumerative ones. I don’t think this is the case. As I’ve mulled it over, it seems like enumerative studies are typically about performance measurement, while analytic studies are about diagnostics and continuous improvement. See if the following table makes sense:

Enumerative                   | Analytic
----------------------------- | ------------------------------------
Performance measurement       | Analysis for continuous improvement
How did we do in the past?    | How can we do better in the future?
Report                        | Analysis

Achievement tests administered to schoolchildren are more enumerative than analytic — they are not geared towards determining which teaching techniques work better or worse, or even to provide the student with information about what to focus on and how going forward. They are merely an assessment of the student’s knowledge. In aggregate, they can be used as an assessment of a teacher’s effectiveness, or a school’s, or a school district’s, or even a state’s.

“But…wait!” you cry! “If an achievement test can be used to identify which teachers are performing better than others, then your so-called ‘process’ can be improved by simply getting rid of the lowest performing teachers, and that’s inherently an analytic outcome!” Maybe so…but I don’t think so. It simply assumes that each teacher is either good, bad, or somewhere in between. Achievement tests do nothing to indicate why a bad teacher is a bad teacher and a good teacher is a good teacher. Now, if the results of the achievement tests are used to identify a sample of good and bad teachers, and then they are observed and studied, then we’re back to an analytic scenario.

Let’s look at a marketing campaign. All too often, we throw out that we want to “measure the results of the campaign.” My claim is that there are two very distinct purposes for doing so…and both the measurement methods and the type of action to be taken are very different:

  • Enumerative/performance measurement — Did the campaign perform as it was planned? Did we achieve the results we expected? Did the people who planned and executed the campaign deliver on what was expected of them?
  • Analytic/analysis — What aspects of the campaign were the most/least effective? What learnings can we take forward to the next campaign so that we will achieve better results the next time?

In practice, you will want to do both. And, you will have to do both at the same time. I would argue that you need to think about the two different types and purposes as separate animals, though, rather than expecting to “measure the results” and muddle them together.


"The Axiom of Research" and "The Axiom of Action"

I attended a one-day seminar today on “The Role of Statistical Concepts and Methods in Research” taught by Dr. Tom Bishop of The Ohio State University. Dr. Bishop heads up a pretty cool collaboration between Nationwide (all areas of the company, including car insurance) and OSU, and this seminar was one of the minor artifacts of that collaboration.

Dr. Bishop had me on page 5 of the seminar materials when he introduced “The Fundamental Axioms of Research,” which he stated are twofold:

  • The Axiom of Variation — all research data used for inference and decision making are subject to uncertainty and variation
  • The Axiom of Action — in research, theory is developed, experiments are conducted, and data are collected and analyzed to generate knowledge to form a rational basis for action

The rest of the seminar was a healthy mix of theory and application, with all of the “theory” being tied directly to how it should be applied correctly. Dr. Bishop is a statistician by training with years of industry experience, so it was pretty cool to hear him emphasize again and again and again that you can get a lot of value from the data without running all sorts of complex, Greek-letter-rich, statistical analyses. The key is to apply a high degree of rigor in understanding and articulating the problem and the approach.

Lots of good stuff!