Analysis

When There's Just Not Enough Good Data to Draw a Conclusion

I ran into an interesting, but not uncommon scenario this week. I had someone who was trying to do a back-of-the-napkin calculation to assess the relationship between leads and revenue for a company. This is probably the most common relationship that marketers want to find. In the seven years I’ve been in Marketing, I have never seen a company actually pull this off as a really tight correlation, but I’ve seen many, many people try.

There are lots of reasons for this: long and variable sales cycles, many marketing-driven interactions with a lead before Sales even starts working it, changing sales processes, changing product offerings, changing marketing processes, and, yes, revenue that comes in from sources that did not originate with Marketing.

So, obviously, this is a futile exercise, right?

Not exactly.

Back-of-the-napkin analyses, where one person takes macro numbers, makes reasonable assumptions, and then comes up with a result, do have value, even if they give a data purist heartburn. They do a couple of things:

  1. They force that one person to step back and think about the moving parts in the process. This can and should spark discussions and questions. Some of those questions are answerable (e.g., “How many times does Marketing touch our prospects that ultimately turn into opportunities for Sales?” “How long does it take for a new prospect who ultimately becomes a Sales opportunity to make that conversion?”) and, as long as they’re validated to make sure the answers are in some way actionable, these questions are worth trying to answer.
  2. If multiple people come at the same question with their own napkins, making their own assumptions based on their own experience and expertise, then all of those napkins can be laid on the same conference room table and discussed. That’s a Wisdom of Crowds-type opportunity. Everyone is all but guaranteed to come up with different answers, but the answers that land close together are interesting, because they were arrived at through different approaches. The outliers can drive a useful discussion as to what assumptions and approaches were taken, which will broaden the perspective of all of the napkin analysts and drive some agreement and consensus as to what is going on with the business.
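
To make that concrete, here is a minimal sketch of what one of these napkin models might look like in code. Every number in it (lead volume, conversion rates, deal size, sales-cycle lag) is a made-up assumption for illustration, not data from the situation above; the whole point is that each napkin analyst would plug in their own.

```python
# Back-of-the-napkin leads-to-revenue model. Every input is an assumption the
# analyst fills in from experience; none of these numbers are real.
monthly_leads = 2000          # assumed leads handed to Sales per month
lead_to_opp_rate = 0.08       # assumed share of leads that become opportunities
opp_win_rate = 0.25           # assumed share of opportunities that close
avg_deal_size = 15_000        # assumed average revenue per closed deal
sales_cycle_months = 6        # assumed lag between lead creation and revenue

monthly_marketing_revenue = (
    monthly_leads * lead_to_opp_rate * opp_win_rate * avg_deal_size
)

print(f"Napkin estimate: ${monthly_marketing_revenue:,.0f} in marketing-sourced "
      f"revenue per month, landing roughly {sales_cycle_months} months out")
```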

Are these analyses valid from a statistically rigorous perspective? Probably not. But, if they drive agreement on what a valid analysis should be, and if they highlight trouble spots that are preventing that analysis from happening, then they can still drive positive change.

In the case that spawned this post, we stumbled across one risk of these types of analyses of which you need to be wary. The person who did the analysis didn’t think the result he got could possibly be “right.” So, he wanted to dive deeper into the data to try to get an accurate answer. He listed the assumptions he’d made, and he requested, basically, that we go through and validate those assumptions. For each one, though, we could unequivocally say that his instincts and experience — which drove his assumptions — were going to be much more valid than the data for various reasons. So, instead, I did my own back-of-the-napkin analysis with my own assumptions and my own experience…and came up with a pretty similar result to what he came up with.

That seemed to work.

Analysis, Reporting

Reporting vs. Analysis — not just me making the distinction

A good friend of mine from my youth (and still today) read my first real post on this blog, and it really resonated with him — he’s commented offline about it a couple of times over the past couple of months. He pinged me today asking for some resources that would elaborate on the subject. My kneejerk reaction was: there ain’t none — it’s not a distinction people make.

But, I did a quick Google search, anyway. Immediately, I turned up a post by a highly credible person — Jim Novo. Jim’s been absent from the Yahoo! webanalytics forum of late, but that’s where I first read him, and he’s got experience and considerable smarts.

Turns out, he had a post from February of this year (2007) on the exact subject of Reporting vs. Analysis. The first thing I noticed was that he referenced a post from Avinash Kaushik that touches on the subject as well. This really is a small world, apparently (see my last entry in this blog…where I referenced an Avinash post!).

When I skimmed Jim’s article, my initial reaction was that he was indeed using “reporting” and “analysis” in a different way than I define them. But, on a closer read, I don’t think that’s the case. He (and, by extension, Eric Peterson) really is calling a report something that you monitor to see if you’re doing what you want (maintaining the status quo or driving improvements), whereas analysis is about trying to understand what’s going on. Interesting stuff. I feel validated. 😉

Analysis, Analytics Strategy

A GIANT in web analytics says, "Don't get your hopes up…"

Avinash Kaushik has a great blog post about trying to do predictive analytics with web data:
“Data Mining And Predictive Analytics on Web Data Works? Nyet!”

Avinash is one of the truly brilliant minds in web analytics, so it’s great to see him put his brainpower behind explaining this assertion. And, it’s timely, in light of the new book by Ian Ayres, a Yale Law School professor and econometrician. I really need to order the book and read it, as I’ve got preconceived notions based on watching an interview with Ayres. Fortunately, I’ve got a B&N gift card and we’re B&N members, so we get an additional discount. Hmmm…my link above is to Amazon…yet I’m going to buy through B&N. Why is that? Topic for another post…on someone else’s blog, I suspect!

For now, I’m going to view these two sources as representing two schools of thought / approaches to data. On the one hand, we have, oddly enough, highly trained statisticians, academics, AND casual users of business data. This is a group that, to oversimplify, believes “more data is better,” albeit for different reasons. The statisticians want more data so they can find increasingly subtle correlations in the data. The casual users want more data because, like revenue, more data is better, right?

The other school of thought is a bit more grounded in reality. As I’m prone to do on this blog, I’m once again blatantly showing where I fit. This school of thought recognizes that more data brings along the need for more discipline — more *business* discipline — to actually get actionable information from the data.

Now, I’ve got to go and see if I can figure out how to use a B&N gift card AND get my discount through their online site.

Analysis, Reporting

"You can make the data say whatever you want it to."

Rack that up as one of those popular, throwaway cliches, stated with a ho-hum air as if to say, “It’s so factual and irrefutable that I can’t believe I’m wasting my body’s energy pumping carbon dioxide converted from oxygen into the atmosphere to say it.”

Drives me nuts.

My personal fantasy? Anyone who makes this statement is banned from accessing or using any data for a year.

Why? Because, as stated this way, it fairly directly implies that any sort of data analysis is just a way to drive someone’s agenda or spin the results of an initiative. And, data can absolutely be used to do this. But it doesn’t have to be.

While I may sound like a broken record, I hope that I sound more like a well-produced album, with a selection of tunes that, while all on a similar theme, approach that theme from various angles.

So, two ways to come at the “whatever you want it to” comment.

Situation 1
Someone makes this statement when discussing data they have looked at or the results of an analysis that was undertaken, directly or indirectly, at their behest. I had this happen today, and the fellow’s initial beef was that the analysis that we had done did not aggressively and vividly back up his own strongly held beliefs about a certain business situation. What he was looking for was an analysis that simply supported his assertion as to the current state of affairs. What temperature does blood boil at? When he dropped the, “You can make data…” comment, it was an intellectual slap in the face of sorts (not that he saw it that way — this is not a fellow who is particularly self-aware as to the impact of his words, and he was by no means trying to insult…um…my chosen career). I pushed back (rather calmly; please hold while I pat myself on the back) that, if he already knew what he wanted the data to say, and if he was just going to push for multiple iterations on the analysis until it said that, then what’s the point? Analysis should be all about answering questions. In this sort of situation, the only question is, “Can I dip my advocacy brush into a bucket of data and paint the picture that already exists in my mind’s eye?”

Situation 2
This is the case when someone presents data, and someone who is seeing it presented doesn’t buy into what it supposedly supports. Ironically, the person who points out the spinnability of data is often the same person who would spin it for his/her own purposes (see Situation 1). This is, quite simply, sad. It’s a waste of a company’s money to pay for someone to take this on. More often than not, though, what happens in this situation is that the user/presenter of the data simply didn’t know how to effectively use the data. It’s sooooo tempting to start with the data. If you do that, you have an infinite number of ways to pivot it, plot it, and print it. And, after making several dozen charts, and realizing you’ve got a real snoozer in the works if you present all of them, you narrow down to 2 or 3 that seem relevant. And, human nature says, the more they seem to show something positive, the more of a relevancy boost they’ll get. The way to avoid this is to have the pre-data discipline to articulate what you’re trying to get at. Publicize those questions (or, at least, write them down for yourself — it will keep you on the right track!). Doggonit! Seems like half of the songs in this blog-album wind up referring back to one of my first posts. It’s a “how to avoid Situation 2” prescription.

Yes, you can make the data say whatever you want. But, that’s an awfully jaded view of the world. And, with some proper up-front discipline, you won’t be wasting your time trying to make it say what you want, and you won’t be providing much of an opportunity for the people who are receiving the analysis or report to have that cliche pop into their heads.

Analysis, Reporting

When a Data Geek Hits the Road

I’ve been offline for a while, primarily because I’ve been in the process of relocating my family from Austin, TX to Dublin, OH (Columbus, OH, basically). The last part of that relocation was me driving the entire trip in one shot with our two labs. My wife and our three kids had left two weeks earlier, taken a meandering trip up, then handled the closing on our Dublin house and the movers showing up and getting everything moved in.

Things did work out well for me there.

But, to the data geek front:

I decided that I’d go for a one-day trip, with a planned “out” to get a hotel if things got unsafe fatigue-wise. According to Google, the door-to-door trip was 1,272 miles and would take about 19 hours and 43 minutes. I brought along the driving directions, which I followed to a T, and logged actual elapsed times and mileage at various waypoints along the journey.

| Waypoint | Google Miles | Actual Miles | % Difference |
|---|---|---|---|
| Depart Dripping Springs, TX | 0.0 | 0.0 | |
| Exit I-35E onto I-30 E in Dallas, TX | 215.5 | 203.0 | -5.8% |
| Exit I-30 onto I-440 E in Little Rock, AR | 531.5 | 505.4 | -4.9% |
| Exit I-440 onto I-40 E in Little Rock, AR | 541.5 | 515.0 | -4.9% |
| Stay on I-40 E in Memphis, TN | 681.4 | 649.0 | -4.8% |
| Stay on I-65 N as leaving Nashville, TN | 880.5 | 840.0 | -4.6% |
| Take I-71 N in Louisville, KY | 1056.8 | 1008.0 | -4.6% |
| Stay on I-71 N in Cincinnati, OH | 1128.4 | 1095.2 | -2.9% |
| Arrive Dublin, OH | 1272.5 | 1215.3 | -4.5% |

Not a bad variance, actually. And, who knows? Maybe the odometer on my truck is off. The fact is, I didn’t particularly care how far I drove — the critical factor was how long I drove. So, how did Google do on the timing front? Same waypoints, but cumulative elapsed times: Google’s estimate versus the digital clock in my truck:

| Waypoint | Google Cum. Hours | Actual Cum. Hours | % Difference |
|---|---|---|---|
| Depart Dripping Springs, TX | 0.0 | 0.0 | |
| Exit I-35E onto I-30 E in Dallas, TX | 3.5 | 3.1 | -9.6% |
| Exit I-30 onto I-440 E in Little Rock, AR | 8.2 | 8.0 | -1.8% |
| Exit I-440 onto I-40 E in Little Rock, AR | 8.3 | 8.2 | -2.0% |
| Stay on I-40 E in Memphis, TN | 10.4 | 10.1 | -2.9% |
| Stay on I-65 N as leaving Nashville, TN | 13.3 | 13.2 | -1.1% |
| Take I-71 N in Louisville, KY | 16.1 | 15.9 | -1.4% |
| Stay on I-71 N in Cincinnati, OH | 17.2 | 17.2 | 0.0% |
| Arrive Dublin, OH | 19.6 | 19.5 | -0.8% |

Holy. COW! I arrived 10 minutes earlier than Google predicted after driving over 1,200 miles!

Now, to be a good data analyst, I’m going to have to assume that, if I made this same drive 100 times, I wouldn’t hit it within 10 minutes all that often. There are simply too many variables. What would the standard deviation of my total trip times be, though, I wonder? At a minimum, we’ll have to wait until gas prices come down considerably and until I slip into some sort of mild dementia to find that out. 19 hours of driving…solo…in one day was fairly brutal, and not the safest of things to do!
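
For anyone who wants to check my arithmetic, the % difference column is just the signed gap between the logged value and Google’s estimate. Here is a tiny sketch using the final-waypoint mileage from the table above.

```python
def pct_diff(estimate, actual):
    """Signed percent difference of the actual value versus the estimate."""
    return (actual - estimate) / estimate * 100

# Final waypoint, miles: Google estimated 1,272.5 and the odometer read 1,215.3.
print(f"{pct_diff(1272.5, 1215.3):.1f}%")   # -4.5%
```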

Analysis, Reporting

"If you can't measure it, don't do it"

I heard this again today. It’s a mini-mantra in my current company, and I couldn’t disagree more.

There’s a fairly famous Albert Einstein quote: “Not everything that can be counted counts, and not everything that counts can be counted.” (It’s also sometimes quoted as: “Everything that counts cannot be counted and everything that can be counted does not count.” Same difference, and I have no idea which, if either, is precisely correct). Supposedly, this hung in his office. It’s hung in my office — prominently — for several years.

All too often, it seems like we shy away from setting objectives if we can’t think of a way to easily measure them. Then, in practice, we try to achieve those objectives, anyway, because we just know it’s the right thing to do. I’m a firm, firm, firm believer in having a clear (and clearly articulated) vision, then developing a strategy for achieving that vision. Let that strategy be the guiding principle — not the measurability of your day-to-day actions.

Could I be more vague?

I love that Ben and Jerry’s has held firm to their quirky, environmentally conscientious vision for the entire life of their company. Sure, they measure the return on specific flavors, and they measure the effectiveness of their marketing campaigns. Do they try to tightly measure the effectiveness of their brand? I don’t know…but I doubt it. Their brand is driven by who they are and who they will always be. That counts, but is very, very hard to measure. Yet, it is at the core of their operations year in and year out.

At a more tactical level, we’ve been doing a lot of work around building nurturing programs. There absolutely has to be a core belief that these programs directly add value for the prospect who is being nurtured. If they do that well, then they will drive more business to the company that is doing the nurturing. It’s darn near impossible to measure that value-add to the prospect, so there can be a drift, over time, toward just focusing on the business that the nurturing is driving. Proceed with caution! If the unmeasurable — what can’t be counted — aspects of the program do not remain core to every decision, then what can be counted may start to suffer, as “nurturing” starts evolving into “spam.”

Analysis, Analytics Strategy, Reporting

One more reason why you CAN'T just start with the data

My boss mentioned Parkinson’s Law to me this morning in reference to a discussion we were having about sales and marketing process efficiency. I was familiar with the concept, but not with the actual law. If you didn’t follow the link, and you don’t know what it is, it’s the principle that “work expands so as to fill the time available for its completion.” This is so true in the business world that it’s, well, kinda sad.

The part of the write-up that jumped out at me, though, was the statement that, “It has been observed over the last 10 years that the memory usage of evolving systems tends to double roughly once every 18 months.” Poor form on the passive voice usage, but that’s a tangent that is not related to this post (or this blog at all, for that matter). I need to do some digging to find the source of this stat. It sounds right, but I did some digging several years ago for this sort of information, and I didn’t find this. What I did find were two different studies by Gartner — performed several years apart — that predicted that there would be a 30x increase in the total volume of enterprise data in the next seven years (I think the studies were done five years apart, and both had a similar projection). I have a clipping somewhere with one of the studies, but it’s in a box en route to Ohio, so I can’t nail the specifics.

These two estimates are so eerily similar that they sort of smell like they came from the same study. Doubling every 18 months would mean you had a 32x increase (2^5) in 7.5 years.
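
The compounding math behind that claim fits in a couple of lines; the only inputs are the 18-month doubling period and the roughly 7.5-year horizon.

```python
# Growth factor if data volume doubles every 18 months over 7.5 years.
months = 7.5 * 12          # 90 months
doublings = months / 18    # 5 doubling periods
print(2 ** doublings)      # 32.0, i.e., in the neighborhood of Gartner's "30x"
```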

As usual, I’m spending way too damn long on the preamble and not getting to the point, which is this:

Rewind seven years and let’s come up with a hypothetical situation whereby you have just started in a new position. In order to get the lay of the land and figure out what you should do first, you ask for a dump of all data that could possibly be related to your domain of responsibility. For chuckles, let’s say that came out to 3 pages of raw data (not realistic, but making it ridiculously small still supports my point). So, you could take that data, print it out, spread it out on your desk, and pore over it for a couple of hours. Make it a day. You could become so intimate with that data that you would feel like plopping back on a big fluffy pillow and smoking a cigarette. If you did plop back on a pillow and take a drag on a smoke, you could then stare up at the ceiling and wait for your brain to work its magic. If there were any interesting, useful insights in that data, your brain would likely find them (assuming your boss doesn’t interrupt your thoughts and want to know: 1) what you’re doing with a big fluffy pillow in your office, or 2) why you’re smoking). That’s one of those really cool things about the brain.

So, in that case, you could start with the data: “Give me the data, I’ll ‘analyze’ it, and then I’ll figure out what action I should take.”

Fast forward seven years. Same situation. Except, there’s been a 30x increase in what you get when you ask for “all the data that could possibly be relevant.” That’s 90 pages of data. Your brain isn’t going to be able to work its magic with that. You could spend 3 weeks looking at the data without feeling like you truly had your head wrapped around it. What most people would do with 90 pages of data would be to start charting it. A picture is worth 1,000 words, right? That’s one way to get 90 pages of data summarized into something that the brain might be able to handle. Of course, with 90 pages of data, you could produce 900 pages of graphs. Obviously, you would have to pick and choose what you would graph and how. Then, you would keep generating one graph at a time until you saw something that showed either an “interesting” trend or a spike somewhere. At that point, you would be so relieved that you had found something, that you would quickly copy the chart and paste it into PowerPoint so you could show it to a group in a meeting and prove that you were, by golly, doing stuff (um…see Parkinson’s Law!).

If asked by an anal BI-oriented stickler, “Did you take action on the data?” you would respond, “Absolutely! I charted it, put it in PowerPoint, and showed it in a meeting, where everyone agreed that it was interesting!”

EGAD!

Point made?

Analysis, Analytics Strategy

Are you data-oriented or process-oriented?

Yet another entry on a topic that’s been kicked around in my head for several years, but which seems to come up every week or two. It’s the recognition that there are two different types of people: data-oriented people and process-oriented people. Okay, that’s a gross oversimplification. There are loads of people who are neither. And, there are people who can manage to approach a problem from both perspectives, but those people tend to have a bias one way or the other.

This isn’t an idea I came up with. A business analyst at National Instruments named Drake Botello actually first explained the concept to me. Drake started his career at National Instruments working on various internal applications. When I got to know him well, he had long since switched over to supporting the company’s data warehouse. On the app side, he had to be more process-focused. On the data warehouse side, he had to be more data-oriented, and what he found was that he often ran into challenges when working with business analysts that supported transactional apps that generated data that wound up in the DW. Drake formed the “data vs. process” theory from those experiences.

So, what does it actually mean?

Well, a process-oriented person expects any system, first and foremost, to support the process it was designed to support in the most efficient manner possible. He views data as a natural, incidental byproduct of the process — it doesn’t need to be thought of beyond what specific reporting capabilities are needed to support the process: all of the data comes from the process, so if the process is good, then the data will be good.

A data-oriented person, on the other hand, sees data design, data capture, data integrity, and data analysis as a critical part of any process. He feels it is perfectly acceptable to incorporate additional steps in a process as required to ensure the integrity of the data that gets generated.

Both of these are perfectly valid perspectives. While I don’t have any way to prove it, my sense is the process-oriented perspective outnumbers the data-oriented perspective by 5-to-1 or so.

This split in perspectives can and does cause problems. My earlier post about operational reporting (not the title, but that’s the meat of the content) comes at this issue from a different angle by pointing out that marketing automation and sales force automation tools, which are geared towards driving very efficient, repeatable processes, have a bias towards operational reporting. A data warehouse is just the opposite — it’s geared toward metrics reporting and analysis.

These different perspectives are spawned from different legitimate needs. Still, you don’t want a data-oriented person designing a transactional system any more than you want a process-oriented person designing a data warehouse — they’re both liable to see their assignments as relatively straightforward…and then thoroughly muck them up!

I’ll give one simple example of how two people might approach a contact management system. Contacts’ job titles change over time, right? So, when that happens, how should the system handle it? A process-oriented person may say the job title in the system should be overwritten with the newest job title. He may even pat himself on the back by recording a timestamp as to when the title was last updated (“for analysis purposes.”) But, he will likely say there is no need to retain the previous job title. What use is that? A contact management system is for managing contacts and communication with someone. Since you only really care about the current job title — that’s what you would want to use in any correspondence — the old data is old and not of any use.

A data-oriented person, on the other hand, would demand that a historical record of the person’s job title over time must be maintained and accessible. What if you want to analyze which job titles you are most effective selling to? If you sold to the contact before his promotion, but you don’t know what his title was at that point, your analysis will be severely hampered!
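
As a minimal sketch (this is not any particular CRM’s data model, and the field names are invented for illustration), the two design philosophies might look something like this:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Process-oriented design: overwrite the title; at most, stamp when it changed.
@dataclass
class ContactOverwrite:
    name: str
    job_title: str
    title_updated_on: Optional[date] = None

    def change_title(self, new_title: str, when: date) -> None:
        self.job_title = new_title        # the previous title is gone for good
        self.title_updated_on = when

# Data-oriented design: keep the full title history so past deals can be
# analyzed against the title the contact actually held at the time.
@dataclass
class ContactWithHistory:
    name: str
    title_history: list = field(default_factory=list)  # list of (date, title)

    def change_title(self, new_title: str, when: date) -> None:
        self.title_history.append((when, new_title))

    def title_as_of(self, when: date) -> Optional[str]:
        past = [(d, t) for d, t in self.title_history if d <= when]
        return max(past)[1] if past else None
```

The second version costs a little more discipline every time a title changes, which is exactly the trade-off described above.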

I’ve got a definite data bias, as you can probably tell from my tone. But, I do realize that data requirements can add some very real complexities and difficulties to transactional processes. The best situation is to have both perspectives at the table and collaborating.

Of course, there are also lots of people who don’t really think “data” or “process” at all. This is probably the overwhelming majority of the human population. But if you’re still reading this entry, you’re not one of them!

Analysis, Reporting

"Time Span" vs. "Time Range"…reporting

I spent all week in partner training for Eloqua, which is one of the premier marketing automation companies. My company has used them for ourselves as well as for our clients for several years with great results. We’ve recently deepened our partnership so that we can actually set up and manage instances of the system for our clients, which is a pretty exciting proposition.

Over the course of the week, we spent 3-4 hours on various aspects of reporting in the tool. I’d done minimal poking around in the various Eloqua reports prior to the training (my focus to this point has been more on learning the guts of Salesforce.com and our internal systems and how they apply to our processes), and, to be honest, I’d been pretty frustrated.

One of the most annoying aspects of their reporting was that the majority of the reports I looked at were based on selecting a given “time span:” last day, last 2 days, last week, last month, etc. I was usually looking for trend data — how many visits to our Web site by week over the past few months, how many e-mail opens or clickthroughs, etc. Occasionally, I’d find myself buried in the interface at a point where I could specify a time range, but it always seemed like the available reports were pretty limited.

After the training, I have a better understanding of their (wildly confusing) user interface…so I think I can get to many more time range reports now.

More importantly, I had an epiphany this afternoon.

First and foremost, products like Eloqua and Salesforce.com (and Oracle applications, and Siebel, and SAP, and you-get-the-idea) are process automation/optimization tools. This wasn’t the epiphany. It’s just a fact. But, while it’s true that processes generate data, it’s wildly naive to think that a perfectly efficient, perfectly functioning process produces perfectly accessible data. As a matter of fact, there seems to be almost a negative correlation between these two areas. But that’s really a topic for another post.

In an earlier post, I wrote about the difference between metrics and analysis. I spent a couple of paragraphs on what I call “operational reporting” (I didn’t coin the term…but I use it!). The definition of an operational report is that it is part of a defined process: an invoice is part of a billing process, for example. Eloqua’s reporting is really oriented towards operational reporting. For instance, “I sent out an invite to a webinar yesterday, and I need the detail of who opened it, who clicked through on it, who registered for the event, and who didn’t.” It makes sense that “time span” reporting would be used here.

I, however, had been looking more at possible metrics reports and data for analysis. For that kind of use, time range reports become much more useful — even more so if time increments can be a dimension in the report. I don’t know for sure, but I suspect time range reporting was not initially available in Eloqua. As phenomenal as the tool is…it’s got some definite shortcomings on the data-for-metrics and data-for-analysis front, IMHO.
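
Here is a rough sketch of the distinction, assuming you can get at raw event timestamps at all (which is exactly the kind of access a process-automation tool may not make easy); the event data is invented.

```python
from collections import Counter
from datetime import datetime, timedelta

# Invented event log: one timestamp per email clickthrough.
events = [datetime(2007, 9, 1) + timedelta(hours=7 * i) for i in range(300)]

# "Time span" report: activity over the last 7 days (operational flavor).
now = datetime(2007, 9, 30)
last_week = sum(1 for t in events if now - timedelta(days=7) <= t <= now)
print("Clickthroughs, last 7 days:", last_week)

# "Time range" report: an explicit date range with the week as a dimension,
# which is what trend-oriented metrics and analysis actually need.
start, end = datetime(2007, 9, 1), datetime(2007, 9, 30)
by_week = Counter(t.isocalendar()[1] for t in events if start <= t <= end)
for week, count in sorted(by_week.items()):
    print(f"ISO week {week}: {count} clickthroughs")
```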

But, at least I now have a framework to work within when I’m trying to get at that data in a meaningful way.

Analysis

Sometimes, it doesn't make sense to look at the data

(YET AGAIN aiming for a briefer entry)

We got into a discussion today at work about how we were going to optimize certain aspects of our marketing efforts. Specifically, we were discussing shifting some aspects of some of our campaigns to make the leads from them more segmented in a way that could trigger different sales processes for following up. As one of the data folk, and the only one in the meeting, all eyes turned to me with that, “Well, Tim, what can you do to tell us how we should tweak and tune these aspects?”

I’m all for using data wherever possible, but this was a classic case of where the data was a rathole waiting to happen. In this case, we were discussing aspects of the campaigns that have only been formalized for the past few months. And, we’re a relatively small company. While we’ve dramatically increased the number of leads we pass to Sales each month, we’re still playing well in the 4-digit range for total volume, and well in the single digits for the number of discrete campaigns we execute each month. That’s a paucity of data when it comes to, in hindsight, trying to glean insights about multi-dimensioned aspects of the data.

In this case, it makes a lot more sense to tackle two approaches simultaneously. For the very-near-term changes, we need to talk to the experienced Sales and Marketing personnel at the company — use their experience as a data source. At the same time, we need to start some pretty rudimentary A/B testing — apply some design of experiments to the process.

Now, an academic or a theorist may say the first part of this is a horrible idea, because those experienced personnel may well have deeply-ingrained myths as to what works and what doesn’t. And, there’s a chance — although a small chance — that this is true. But, there’s a twofold benefit to getting their opinions and running with them: 1) they’re more likely right than they are wrong, and 2) that gets their buy-in and shows them that you value their input. That’s the “human nature” element of things — build the relationship and trust first, so that when your A/B testing turns up some interesting results, they might actually agree to help implement them. Even better, pull those same people in to help you pick what tests to do when, AND pull them in to help interpret the results.

It’s just darn naive to view the world as a place where data is so abundant, so clean, and so clear-cut that it will spit out results that are so irrefutable that there’s no need to apply any soft skills to manage change.

Analysis, Analytics Strategy

In Search of the Mythical Step Function

One thing I’ve learned over the years is that the real world of data is a lot messier than a whole range of information outlets imply. Whether it’s web data, CRM data, or ERP data, it is very, very seldom that there is something going on that, once discovered, can have an immediate and dramatic positive impact with little effort. The reality is, it takes some up front discipline to prepare to conduct an analysis, then, often, a not insignificant effort to get the data needed for the analysis pulled and prepped. And, at the end of the day, the best results typically do little more than clear the hurdle of “statistically significant.” What that means is that a set of variables may be found that have a slight-but-real impact on something you care about. Now, hopefully, those are variables that you can influence, and that you can influence without too much investment.

A classic example with web data is what I call “the myth that people are cows.” Anyone who has ever been in a pasture with a herd of cattle knows that it takes the average bovine somewhere between 1 and 3 nanoseconds to settle into an unwavering pattern. Grove of trees to watering hole at 8:07 AM. Watering hole to grassy knoll at 1:53 PM. Grassy knoll back to grove of trees at 6:32 PM. More than that, the entire herd follows the exact same 12-inch wide path unerringly from point to point. It takes almost no time for that path to be a well-worn, dirt, 12-inch wide trail.

(For an absolutely wonderful poem on the subject of such paths, and how Sam Walter Foss imagined such a path driving urban development of a major city, check out “The Calf-Path” at Public Radio International’s The Writer’s Almanac.)

The problem is that, all too often, Marketeers assume that people are like a herd of cattle. They know that, if they can just find the most common paths through their web site, they can take advantage of it in huge and profitable ways! “We’ll know exactly where to put a billboard for Maisy’s Magnificent Udder Moisturizer that will attract the most eyeballs!”

Unfortunately, visitors to web sites are not cows. Not even close. Try some simple math. How many unique links do you have on your home page? 10? 20? 100? I’d bet good money that, if you count them, you’ll realize it’s more than you thought. That’s the beauty of drop-down menus: a clean and simple design can still present the visitor with a lot of options. For chuckles, let’s say there are only 10 links on each page on your site, including the home page. Let’s also say that the site does not have a search box that persists on every page (poor form, that). And, let’s go ahead and say visitors’ browsers don’t have a Back button. Given all of those unrealistic constraints, the number of unique possible paths from the main page of the site five levels deep is 10^5, or 100,000. With 100,000 options, the most popular path is going to be, at best, a percentage point or two of the overall traffic. Now, factor in a Back button and a search box…and the math gets wayyyyy too complicated for this blogger.
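
The arithmetic behind that path count, for anyone who wants to poke at it (same simplifying assumptions: exactly 10 links per page, no Back button, no search):

```python
# Distinct 5-click paths from the home page, assuming every page exposes
# exactly 10 links and ignoring the Back button and site search entirely.
links_per_page = 10
path_depth = 5
print(links_per_page ** path_depth)   # 100,000 possible paths
```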

The point? It’s a waste of time to try to home in on the “most popular paths from the main page of our site.” At best, it makes sense to look one or two levels deep…and the most likely insight you will get there is that there is a noticeable chunk of people who are clicking on a link on your main page and then clicking the Back button, which should make you question whether the page they clicked to is delivering what the link implies it will.

Web analytics vendors don’t really help things. Clickstreams are so popular among the underinformed that they have to bake clickstream functionality into their tools, and they need to do so in a way that makes for a slick demo. Of course, in the case of the demo, they have control over the data, so they can show a cow-like clickstream! Reality…just isn’t that simple!

I really can’t seem to write a short entry. I’ll try again next time!

Analysis, Reporting

Reporting vs. Analysis

In my mind, all too often, we erroneously equate “reporting” with “analysis.” This can lead to a lot of cycles of spinning confusedly through reams of data or, worse, the belief that we “took action from the data” just because we converted a spreadsheet into a chart.

A former colleague of mine, Shane Stephens, and I sat down a few years ago and decided that there are really three different ways to use data:

Operational Reporting

This is when data is being reported at a high frequency and, often, at a very granular level with a discretely defined role in a given process. A daily report of all bookings from the prior day for a given salesperson’s territory is one example (it’s only a good example if his/her process includes reviewing that list each day and following up with any customers that he needs to check in on once they have placed an order). A call center report that breaks down wait times by different controllable factors is another example — used for adjusting staffing throughout the week, for instance. We even included an invoice-“printing” system as an operational report — it’s a highly detailed, highly structured report that gets sent to a customer letting him/her know what payment is due.

That’s all I’ll write about operational reporting — in a lot of ways, it’s pretty simple. Trouble arises, though, when someone starts to repurpose such a report: “I get a daily detail report of bookings for my region, so I’m just going to combine all of those into a spreadsheet to see what my bookings to date for the quarter are.” Or, same compilation, but, “…so I can analyze sales in my territory.” This winds up being darn cumbersome and can create all sorts of issues with data interpretation and application. Maybe I’ll come back to that later.

Metrics Reporting

Metrics reporting typically has aggregated data: total bookings for a territory, total bookings for the company, lead-to-sales conversion rate, etc. Key Performance Indicators (KPIs) are always metrics, but all metrics aren’t necessarily KPIs. Metrics are very different from operational reports. And, they’re a lot easier to turn into vortexes of wasted energy.

There are at least two ways that I’ve seen people get into a death spiral with metrics:

  1. Confusing metrics with analysis. They’re wildly different…which should become evident by the end of this post.
  2. Starting with the data when determining metrics instead of starting with objectives

I’ll tackle the latter first. The “easy” way to get data quickly is to start out by asking what data is easily available, and then choosing your metrics from that list. This is just wrong, wrong, WRONG! It’s tempting to do, and even experienced analysts who know what a slippery slope this is can easily fall into the trap. But, it’s still WRONG! (not that I have strong opinions here…)

In the long run, the right place to start when determining metrics is with what you’re trying to accomplish in business terms rather than data terms: “We’re trying to improve the effectiveness of our direct marketing efforts,” “We’re trying to grow the company,” “We’re trying to make the company more profitable,” “We’re trying to improve the user experience on our Web site.” A couple of these teeter on the edge of sounding like being in “data terms.” For instance, isn’t “grow the company” the same as “increase revenue?” Maybe. Maybe not.

The next step is to tighten up the definitions of what your objectives are. Still, stay away from thinking about the data. Think about how you would explain to your spouse, a friend, or a peer in another department what it is you are really trying to accomplish.

Random aside: A lot of business articles and books claim that, to establish good metrics, you have to start at the very top level of the company and then drill down to more detailed/granular metrics. That’s one of the fundamental premises for the Balanced Scorecard, I think. I like a lot about the balanced scorecard approach…but not this piece of it. It may work for some companies, but, in my experience, it’s just too much to try to start at the highest level and then drill all the way down to get any metrics. Rather, if the top levels of an organization have clearly articulated the company’s vision, strategy, and high level tactics, I’m all for empowering individual departments to figure out what they should be achieving and then starting there with the metrics. This does mean there needs to be some validation of metrics once they’re settled upon in order to ensure alignment. But, I’ve never seen a department (or even a project — project metrics should follow the same approach) that doesn’t get 85% of the way there by just knowing the company and understanding their role and then working through their own proposed metrics.

Back to the main point on metrics. Once you have really, really clear, tight objectives, you can sit back and brainstorm on how to measure your progress towards them. What you’ll find is that there are some objectives that can be measured very easily with a single metric. But, with other objectives, there will not be a perfect metric. In those cases, you can shoot for one or more proxies for the objective. This is actually a good sign — it means you’ve got some clear objectives that are hard to measure. That’s a damn sight better than having clear metrics with objectives that are hard to articulate!

You’re not quite done at this point with metrics. It’s absolutely critical that you set targets for each metric. It’s really tempting to want to “just measure it for a while, because we don’t even have a baseline.” Resist the temptation! If you can get some set of historical data within the next day or two, fine: wait for it. I won’t be that persnickety. But, if not, set a target anyway! Come up with a number that is so high/good that you don’t need any historical data to know you’d be thrilled to hit it. Come up with the opposite — a number that would be so low/bad that you would know there’s a problem. Start working from both directions to see how much you can close the gap before being in “no idea”-land. Then, split the difference. Sure, you may be WAY off, but it’s going to be a much more useful discussion once you have actual data to put against it.
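
To put toy numbers on that bracketing exercise (these are pure placeholders, not recommendations for any real metric):

```python
# Setting a provisional target with no baseline: bracket from both ends, then
# split the difference. All numbers below are placeholders.
thrilled_at = 400   # e.g., leads per month we'd celebrate without needing history
problem_at = 100    # a level we'd immediately recognize as a problem

# Suppose working inward from both ends narrows the "no idea" zone to 180-280.
no_idea_low, no_idea_high = 180, 280

provisional_target = (no_idea_low + no_idea_high) / 2
print(f"Provisional target: {provisional_target:.0f} leads per month")
```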

The last step is a validation step, really. For each metric, ask yourself what you would do if you missed the target. Are there actions that you (or your department) can and would take if you missed the target? Or, would you simply go and tell another department that they have a problem (Oopsy! That means it’s a metric for them — not for you; it’s their call as to whether they should use it!). Would you cross your fingers and wait for another month and hope the number looks better then? If that’s the case, you’re admitting that you either don’t know how to actually impact the number or you can’t impact the number. It’s not a valid metric.

Enough on that (starting to think I bit off more than I should’ve with my first real entry here).

Moving on to…

Analysis

Analysis is very different from metrics reporting. While metrics reporting is all about measuring the performance of a person, a department, a process, a project, or a company…and knowing what corrective action to take if there is a performance issue…analysis is about trying to figure out what’s going on with something.

The best way to approach analysis is to start with a hypothesis. If you don’t have a clear hypothesis, you’ll find yourself going in even worse circles than if you started with the data when identifying your metrics. Put simply:

  1. Start with a clear hypothesis
  2. Ask yourself what action you will take if the hypothesis is disproven or not disproven. If there are not clearly different actions…then you’re wasting your time. It might be a fun analysis, but it’s not going to be particularly worthwhile (contrary to popular belief, data mining, which is one form of analysis, is not simply a case of, “dump all the data into a fancy tool and see what it spits back out that you can use” — you need hypotheses for data mining!)
  3. Develop an approach that would enable you to disprove the hypothesis with as little data as possible
  4. Get that data…and only that data
  5. Perform the analysis

It’s tempting to pull extra data just so it’s there. And, that’s okay, as long as you don’t expand the scope of the data-pulling dramatically. Generally, just remember that it is a lot easier to sequence together a series of small analyses (if we disprove hypothesis X then we will test sub-hypothesis Y) than trying to do it all at once in one fell swoop.
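
As a deliberately tiny illustration of steps 3 through 5, here is a sketch that tests a made-up hypothesis (“the new landing page converts leads at a different rate than the old one”) with a plain two-proportion z-test. The counts are invented; the point is that the hypothesis dictates exactly which four numbers you need, and nothing more.

```python
from math import sqrt, erfc

# Hypothesis: the new landing page's lead conversion rate differs from the old one's.
# The only data required: conversions and totals for each version (invented numbers).
old_conversions, old_total = 48, 1000
new_conversions, new_total = 74, 1000

p_old = old_conversions / old_total
p_new = new_conversions / new_total

# Two-proportion z-test with a pooled standard error.
p_pool = (old_conversions + new_conversions) / (old_total + new_total)
se = sqrt(p_pool * (1 - p_pool) * (1 / old_total + 1 / new_total))
z = (p_new - p_old) / se
p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value

print(f"old: {p_old:.1%}  new: {p_new:.1%}  z = {z:.2f}  p = {p_value:.3f}")
# A small p-value disproves the "no difference" hypothesis and triggers one of the
# pre-agreed actions from step 2; a large one triggers the other. Either way, the
# action was decided before the data was pulled.
```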

That’s all for now!