Analysis, Reporting

What is "Analysis?"

Stephen Few had a recent post, Can Computers Analyze Data?, that started: “Since ‘business analytics’ has come into vogue, like all newly popular technologies, everyone is talking about it but few are defining what it is.” Few’s post was largely a riff on an article by Merv Adrian on the BeyeNETWORK: Today’s ‘Analytic Applications’ — Misnamed and Mistargeted. Few takes issue (rightly so) with Adrian’s implied definition of the terms “analysis” and “analytics.” Adrian outlines some fair criticisms of BI tool vendors, but Few’s beef regarding his definitions is justified.

Few defines data analysis as “what we do to make sense of data.” I actually think that is a bit too broad, but I agree with him that analysis, by definition, requires human beings.

With data “coming into vogue,” it’s hard to walk through a Marketing department without hearing references to “data mining” and “analytics.” Given the marketing departments I tend to walk through, and given what I know of their overall data maturity, this is often analogous to someone filling the ice cube trays in their freezer with water and speaking about it in terms of the third law of thermodynamics.

I’ve got a 3-year-old daughter, and it’s through her that I’ve discovered the Fancy Nancy series of books, in which the main character likes to be elegant and sophisticated well beyond her single-digit age. She regularly uses a word and then qualifies it as “that’s a fancy way to say…” a simpler word. For instance, she notes that “perplexed” is a fancy word for “mixed up.”

“Analytics” is a Fancy Nancy word. “Web analytics” is a wild misnomer. Most web analysts will tell you there’s a lot of work to do with just basic web site measurement. And, that work is seldom what I would consider “analytics.” As cliché as it is, you can think about data usage as a pyramid, with metrics forming the foundation and analysis (and analytics) being built on top of them.

[Image: Metrics Analysis Pyramid]

There are two main types of data usage:

  • Metrics / Reporting — this is the foundation of using data effectively; it’s the way you assess whether you are meeting your objectives and achieving meaningful outcomes. Key Performance Indicators (KPIs) live squarely in the world of metrics (KPIs are a fancy way to say “meaningful metrics”). Avinash Kaushik defines KPIs brilliantly: “Measures that help you understand how you are doing against your objectives.” Metrics are backward-looking. They answer the question: “Did I achieve what I set out to do?” They are assessed against targets that were set long before the latest report was pulled. Without metrics, analysis is meaningless.
  • Analysis — analysis is all about hypothesis testing. The key with analysis is that you must have a clear objective, you must have clearly articulated hypotheses, and, unless you are simply looking to throw time and money away, you must validate that the analysis will lead to different future actions based on different possible outcomes. Analysis tends to be backward-looking as well, asking questions like “Why did that happen?”, but with the expectation that, once you understand why something happened, you will take different future actions using that knowledge.

So, what about “analytics?” I asked that question of the manager of a very successful business intelligence department some years back. Her take has always resonated with me: “analytics” are forward-looking and are explicitly intended to be predictive. So, in my pyramid view, analytics is at the top of the structure — it’s “advanced analysis,” in many ways. While analysis may be performed by anyone with a spreadsheet, and hypotheses can be tested using basic charts and graphs, analytics gets into a more rigorous statistical world: more complex analysis that requires more sophisticated techniques, often using larger data sets and looking for results that are much more subtle. AND, using those results, in many cases, to build a predictive model that is truly forward-looking.

The key is that the foundation of your business (whether it’s the entire company, or just your department, or even just your own individual role) is your vision. From your vision comes your strategy. From your strategy come your objectives and your tactics. If you’re looking to use data, the best place to start is with those objectives — how can you measure whether you are meeting them, and, with the measures you settle on, what is the threshold whereby you would consider that you achieved your objective? Attempting to do any analysis (much less analytics!) before really nailing down a solid foundation of objectives-oriented metrics is like trying to build a pyramid from the top down. It won’t work.

Reporting

Baseball Stats and BI Musings Part I: Good Metrics?

It’s late spring, and my 9-year-old’s baseball season is getting rolling. Due to my gross lack of eye-hand coordination, I volunteered to do the scoring for the team.

There are two basic reasons to score a baseball game:

  • Capture enough information on a single page (two pages, actually) that would allow you to entirely recreate the game, play by play, after the fact
  • Capture information required to compile game/season statistics for individual players — things like batting average on offense, fielding effectiveness on defense, and ERA for pitchers (also technically a defensive thing)

This means you need to capture a lot of information. Every pitch typically gets recorded in some fashion, and any time a batter finishes at the plate (through a hit, a walk, being hit by a pitch, etc.), additional information has to be recorded. The more detailed the information, the more fun statistics you can pull from the data. And, generally, it’s good to capture a bit more data than you expect to use. For instance, with the system I’m using now, I actually capture the sequence of pitches for any batter: ball, then strike, then strike, then ball, then hit, for instance. That detail, in theory, would allow me to report how a batter fares when he is “behind in the count” (more strikes than balls) vs. “ahead in the count” (more balls than strikes). I’m not going there at all at this point.
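For anyone who does want to go there, here is a rough sketch of how a recorded pitch sequence could be turned into a count label. The function name and the "ball" / "strike" / "foul" labels are assumptions made for illustration, not the notation of any particular scoring system. It relies on one scoring rule: a foul ball counts as a strike unless the batter already has two strikes.

```python
from typing import List


def count_state(pitches: List[str]) -> str:
    """Classify the count the batter faced on the final pitch of a plate appearance.

    `pitches` is the recorded sequence, ending with the outcome (for example
    "hit"). Only the pitches before the last one contribute to the count.
    """
    balls = 0
    strikes = 0
    for pitch in pitches[:-1]:
        if pitch == "ball":
            balls += 1
        elif pitch == "strike":
            strikes += 1
        elif pitch == "foul" and strikes < 2:
            # A foul is a strike, but it cannot be the third strike.
            strikes += 1

    if strikes > balls:
        return "behind"
    if balls > strikes:
        return "ahead"
    return "even"


# The sequence from the paragraph above: the hit came on a 2-2 count.
print(count_state(["ball", "strike", "strike", "ball", "hit"]))  # even
```

Grouping plate appearances by this label and then tallying hits per group would give the "behind vs. ahead" comparison described above.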

At my son’s age, we really just want to make sure we get the final score right. But, the statistics are awfully alluring, so I’ve been logging the information in a spreadsheet so I can do some crunching and see what it tells me. We’re only four games in, and I’m no baseball sophisticate, so I started with the two most popular stats in baseball: earned run average (ERA) and batting average. I regularly mount my “a metric that isn’t tied to a clear objective is not a good metric” soapbox, and it turns out ERA is a pretty great metric. A pitcher’s objective is pretty clear: allow as few runs to score as possible. But, you can’t simply look at the total runs scored on a pitcher for two reasons:

  1. A great pitcher who has an infield that regularly flubs plays is going to have more runs scored on him than a similar pitcher who has Derek Jeter and Alex Rodriguez shagging grounders
  2. The more innings a pitcher pitches, the more runs he’s going to have scored on him

The “earned run” part of the ERA addresses the first issue by trying to isolate how many runs would have been scored if the other 8 players on the field played perfectly. The “average” part of ERA addresses the second issue by normalizing the metric to a 9-inning average (or a 6-inning average in my son’s case, as their games are only 6 innings long).
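As a minimal sketch of that normalization (the function and parameter names are my own, and the numbers in the example are made up), ERA is earned runs divided by innings pitched, scaled up to a full game. Innings are tracked here as outs recorded so that partial innings work cleanly.

```python
def earned_run_average(earned_runs: int, outs_recorded: int,
                       innings_per_game: int = 9) -> float:
    """Earned runs allowed, normalized to one full game.

    `innings_per_game` is 9 for the majors; a youth league playing
    6-inning games would pass 6 instead.
    """
    if outs_recorded <= 0:
        raise ValueError("ERA is undefined until the pitcher records an out")
    innings_pitched = outs_recorded / 3  # three outs per inning
    return earned_runs * innings_per_game / innings_pitched


# Hypothetical pitcher: 4 earned runs over 12 innings (36 outs).
print(earned_run_average(4, 36))                      # 3.0
print(earned_run_deriv := earned_run_average(4, 36, innings_per_game=6))  # 2.0
```

The same pitcher looks better on a 6-inning scale only because the yardstick is shorter, so ERAs from leagues with different game lengths should not be compared directly.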

What about setting a target? The Gospel According to Gilligan clearly states “Thou shalt not consider a metric worthy if it does not have a preset target.” In the majors, an ERA below 3.00 is considered to be pretty darn good. It’s a “benchmark” of sorts. Or, the other way to look at the metric is to say the target is a 0.00, which is unattainable, but a worthy stretch goal.

So, what about batting average? This seems pretty simple. The batting average is the percent of a player’s at bats where he gets a hit. It’s actually represented as a three-place decimal rather than a percentage (a .347 batting average means the player gets a hit on 34.7% of his at bats). The stat has been around as long as ERA and has long been considered the single best measure of a player’s offensive output. There are a couple of problems with the metric, though. First off, what is a batter’s primary objective? Ultimately, it’s to score runs…but there are too many other factors at play to use that as a metric. And, as it turns out, it’s not to get hits as much as it is to get on base. And hits are only one way of doing that. When you peel back the batting average calculation a bit, you find that a walk is not considered an official at bat, so it doesn’t go into the numerator or the denominator of the equation. The reasoning is that the batter got on base because the pitcher screwed up. That’s giving the pitcher a bit too much credit, as a batter who has “plate discipline” is a batter who doesn’t swing at balls — he gets more walks, and when he swings, he’s more likely to be swinging at a hittable ball. (Sacrifices also don’t count as an at bat, but I’m okay with that, as the batter’s objective in that case is to move the baserunner(s) up, so he’s not really trying to get on base himself. A fielder’s choice where the hitter winds up on base doesn’t count as a hit, which makes sense. And, if a batter puts a ball in play and then reaches base on an error, that’s still not considered a hit, because that was more a defensive goof than an offensive success, so it goes into the denominator as an at bat but not in the numerator as a hit. Oh…MAN…can I digress on this subject…!)

Whether it’s true or not, or whether it’s a gross oversimplification, Billy Beane, the general manager of the Oakland A’s, gets credited with this epiphany. The story of how Billy used data to go against baseball’s conventional wisdom to make the Oakland A’s a consistent contender despite their minuscule payroll (by MLB standards) is the basic premise of Moneyball: The Art of Winning an Unfair Game. One of the metrics that Billy and his number-crunching assistant started focussing on was on-base percentage (OBP), which includes walks in the numerator and denominator of the calculation. OBP gets a lot closer to a batter’s objective than batting average does. And, Beane started picking up college players who walked a lot but didn’t have a great batting average. And it worked.

Theo Epstein, the general manager of the Boston Red Sox, followed in Beane’s footsteps (he actually worked for Beane for all of 12 hours during Beane’s one-day stint as GM of the Red Sox). And the Red Sox finally won another World Series.

So, as I’ve started tallying the stats for my son’s team, I’ve calculated both batting average and OBP, and, lo and behold, we’ve got a couple of kids who are in the lowest third of the team based on batting average…but move up considerably when it comes to OBP. None of this is to be shared with the kids — at this point, they’re having a good time, they’re trying hard, and they’re learning to support each other, so introducing a hierarchy of “who’s better” is wildly counter-productive.
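To make the difference between the two stats concrete, here is a small sketch with two invented batting lines (not real players, and not numbers from my scoresheets). The class and field names are my own. Note that the full OBP formula also counts hit-by-pitch in the numerator and denominator and sacrifice flies in the denominator; those default to zero here so the walk effect stands out.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int = 0
    sacrifice_flies: int = 0

    def batting_average(self) -> float:
        # Walks are not at bats, so they appear nowhere in this ratio.
        return self.hits / self.at_bats if self.at_bats else 0.0

    def on_base_percentage(self) -> float:
        # Walks (and hit-by-pitch) count on both the top and the bottom.
        times_on_base = self.hits + self.walks + self.hit_by_pitch
        chances = (self.at_bats + self.walks
                   + self.hit_by_pitch + self.sacrifice_flies)
        return times_on_base / chances if chances else 0.0


# Hypothetical players: one patient, one who swings at everything.
patient = BattingLine(at_bats=10, hits=2, walks=6)
free_swinger = BattingLine(at_bats=16, hits=5, walks=0)

for name, line in [("patient", patient), ("free swinger", free_swinger)]:
    print(f"{name}: AVG {line.batting_average():.3f}, "
          f"OBP {line.on_base_percentage():.3f}")
```

In this made-up case the free swinger has the better batting average, while the patient hitter reaches base in half of his plate appearances, which is the kind of reshuffling described above.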

In the end, I’ve violated my core tenet — I’m looking at metrics that are not, in the end, actionable at all! But I’m having fun, and it’s got me thinking about data in some new ways. This post was about metrics. I’ll explore data quality in the next post. Stay tuned!

Reporting, Social Media

Social Media Success Metrics. Or…at Least Objectives.

Jeremiah Owyang has a post on his Web Strategist blog titled Why Your Social Media Plan should have Success Metrics. Based on the URL of the post, it looks like Owyang initially titled the entry “Why Your Social Media Plan should Indicate What Does Success Look Like.” Admittedly, the original title is a bit clunky. But, in the cleanup, he actually oversimplified the main point of his post, which is that it’s important to have some clear idea of why you’re tackling social media and some idea what you’re hoping to get out of it. He includes some examples:

A few examples of what success could look like for you:

  • We were able to learn something about customers we’ve never know before
  • We were able to tell our story to customers and they shared it with others
  • A blogging program where there are more customers talking back in comments than posts
  • An online community where customers are self-supporting each other and costs are reduced
  • We learn a lot from this experimental program, and pave the way for future projects, that could still be a success metric
  • We gain experience with a new way of two-way communication
  • We connect with a handful of customers like never before as they talk back and we listen
  • We learned something from customers that we didn’t know before

One of the commenters correctly pointed out that none of these examples were “metrics” per se. I say, “Cool!” Owyang’s point is spot on — be clear on why you’re tackling social media. And, you know what? If it’s, “Because I don’t understand it and don’t ‘get’ it and figure the best way to learn is to dive in and do it,” then that’s okay! Of course, if that is the only reason you are dipping your phalanges into social media, then you should also set a target date for when you’re going to evaluate whether you are going to continue — with more focussed objectives — or whether you are going to reduce your focus on it.

The metrics will come. Sometimes, they’re not crisp, clean, perfect metrics. That’s okay. I’m a fan of proxy measures, as well as the occasional use of subjective measures. Quantitative measures that aren’t tied to clear objectives, on the other hand, drive me bonkers.

So, what are my objectives with this part of my personal social media experimenting? Very simply, they’re as follows:

  • See if I can “do” it — post with some level of substance on a sustained basis
  • Give myself an outlet for expressing my opinions and frustrations about data usage (when it’s not appropriate to express them directly to the person who triggered the need for an outlet)
  • Learn about blogging technologies

The jury is still a bit out on the first objective, but it’s looking like the answer is, “I can.”

I am clearly hitting the second objective (and will continue to do so).

I’ve become intimate with both Blogger and WordPress, as well as dabbled with Technorati, Feedburner, Yahoo! Pipes, and any number of social networking and social bookmarking platforms, so I’d say I’m well on my way to the third.

I’m not feeling the need to reset my objectives just yet.