Presentation

Why Data Visualization Matters: It's Funnel Optimization

One of the reasons I like to give presentations at conferences is that it forces me to really, really, really crystallize my thoughts. When I’m writing a blog post, I’m generally just trying to get an idea into some sort of coherent form, but conference presentations, for me, have a much higher bar for clarity and concision.

Part of my presentation at the Austin DAA Symposium earlier this month focused on data visualization. It didn’t go very deep into the mechanics of effective data visualization, but I did try to make a strong case that the topic really matters.

Driving Action

As analysts, our ultimate goal is to drive action that delivers business value. Stop and consider what is involved in “driving action:”

A person who is empowered to act must make a decision to act.

So, really, what we’re talking about here is impacting a decision by a human being, and, if we consider that:

A decision is made based on thoughts and ideas in the brain.

That means that, as analysts, it behooves us to understand a little bit about how the brain works.

Neuroscience says…

Two guys who have had a strong professional influence on me are Stephen Few and John Medina — Few through his data visualization books and Medina through his book on how the brain works:

Both books provide descriptions of the different types of memory, and both provide various tips for getting information to long-term memory, which is where information needs to be in order for a person to decide to act (and then follow through on that decision).

Taking those concepts and morphing them a bit cheekily into the marketing vernacular of “the funnel,” we’re talking about memory looking like this:

The Memory Funnel

Ultimately, if we don’t get the key points of our analysis into long-term memory, then there is little hope of action being taken. Just as eCommerce sites have to optimize their purchase funnel, as analysts, we need to optimize the memory funnel when presenting results.

In the case of the memory funnel, the steps are actually much more distinct than the awareness/consideration/preference/etc. steps in the marketing funnel. They’re distinct…but they come with some unpleasant realities:

  • Iconic memory — this is also called the “visual sensory register,” and it’s where “preattentive cognitive processing” occurs. We are constantly bombarded with information, and our iconic memory is the first point at which we are aware — subconsciously aware — of every bit of information in our field of view. Instantaneously, we are making decisions as to what information we should actually pay attention to. This means that, instantaneously, we are discarding most of what we see! If a chart is unclear, our iconic memory may very well shift focus to the clock on the wall or the ugly tie being worn by the fellow sitting next to the analyst. Iconic memory is fickle and fleeting!
  • Short-term memory — this is where we actually focus and “think about what we’re seeing.” It’s that thought that is going to decide whether or not the information gets passed along to long-term memory. But, here’s the real kicker when it comes to short-term memory: it can only hold 3 to 9 pieces of visual information at once. It’s our RAM…but it’s RAM circa 1992, in that it has very limited capacity. The more extraneous information we include in our analysis results, the more we risk a buffer overrun. And, if short-term memory can’t fully make sense of the information, then it’s going to fall out of the funnel then and there.

“Sight” is the sense that we are forced to heavily rely on to communicate the results of our analyses. There is a lot of visual clutter occurring in our audiences’ worlds that we can’t control, and we’re competing with that visual clutter any time we deliver the results of our work. It behooves us to compete as effectively as we possibly can by effectively visualizing the information we are communicating.

Excel Tips, Presentation

Small Charts in Excel: Beyond Sparklines, Still Economical

I’m a fan of Stephen Few; pretty much, always have been, and, pretty much, always will be. When developing dashboards, reports, and analysis results, it’s not uncommon at all for me to consciously consider some Few-oriented data visualization principles.

One of those principles is “maximize the data-pixel ratio,” which is a derivation of Edward R. Tufte’s “data-ink ratio.” The concept is pretty simple: devote as much of the non-white space as possible to actually representing data and as little as possible to decoration and structure. It’s a brilliant concept, and I’m closing in on five years since I dedicated an entire blog post to it.

Another Tufte-inspired technique that Few is a big fan of is the “sparkline.” Simply put, a sparkline is a chart that is nothing but the line of data:

Small Charts: Sparkline

In Few’s words (from his book, Information Dashboard Design: The Effective Visual Communication of Data):

Sparklines are not meant to provide the quantitative precision of a normal line graph. Their whole purpose is to provide a quick sense of historical context to enrich the meaning of the measure.

When Few designs (or critiques) a dashboard, he is a fan of sparklines. He believes (rightly) that dashboards need to fit on a single screen (for cognitive processing realities that are beyond the scope of this post), and sparklines are a great way to provide additional context about a metric in a very economical space.

Wow! Sparklines ROCK!

But, still…sparklines are easy to criticize. In different situations, the lack of the following aspects of “context” can be pretty limiting:

  • What is the timeframe covered by the sparkline? Generally, a dashboard will cover a set time period that is displayed elsewhere on the dashboard. But, it can be unclear whether the sparkline shows the variation of the metric within the report period (the last two weeks, for instance) or, rather, a much longer period so that the user has greater historical context.
  • What is the granularity of the data? In other words, is each point on the sparkline a day? A week? A month?
  • How much is the metric really varying over time? The full vertical range of a sparkline tends to be from the smallest number to the largest number in the included data. That means a metric that is varying +/-50% from the average value can have a sparkline that looks almost identical to one that is varying +/-2% (a quick sketch of this effect follows this list).
  • How has the metric compared to the target over time? The latest value for the metric may be separately shown as a fixed number with a comparison to a prior period.  But, the sparkline doesn’t show how the metric has been trending relative to the target (Have we been consistently below target? Consistently above target? Inconsistent relative to target?).
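
To see just how misleading that auto-scaled vertical range can be (the sketch promised in the third point above), here’s a minimal illustration — in Python/matplotlib rather than Excel, and with made-up numbers — that draws two sparklines: one for a metric swinging roughly +/-50% around its average and one swinging roughly +/-2%. Because each sparkline is scaled to its own min and max, the two look essentially identical.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(14)
wild = 100 * (1 + rng.uniform(-0.5, 0.5, size=14))  # swings roughly +/-50% around 100
calm = 100 + (wild - 100) * 0.04                     # same shape, but swings only about +/-2%

fig, axes = plt.subplots(1, 2, figsize=(4, 0.8))
for ax, series in zip(axes, [wild, calm]):
    ax.plot(days, series, color="black", linewidth=1)
    ax.axis("off")  # sparkline style: no axes, so each line fills its own vertical range
fig.savefig("sparkline_compare.png", dpi=150, bbox_inches="tight")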

So, sparklines aren’t a magic bullet.

So, What’s an Alternative?

While I do use sparklines, I’ve found myself also using “small charts” more often, especially when it comes to KPIs. A small chart, developed with a healthy layer of data-pixel ratio awareness, can be both data-rich and space-economical.

Let’s take the following fictitious data set, showing a site’s conversion rate by day over a two-week period, as well as the conversion rate for the two weeks prior:

Small Charts: Sample Data

If we just plot the data with Excel’s (utterly horrid) default line chart, it looks like this:

Small Charts: Default Excel

Right off the bat, we can make the chart smaller without losing any data clarity by moving the legend to the top, dropping the “.00” that is on every number in the y-axis, and removing the outer border:

Small Charts: Smaller Step 1

The chart above still has an awful lot of “decoration” and not enough weight for the core data, so let’s drop the font size and color for the axis labels, remove the tick marks from both axes and the line itself from the y-axis, and lighten up the gridlines. And, to make it more clear which is the “main” data, and to make the chart more color-blind friendly in the process, let’s change the “2 Weeks Prior” line to be thinner and gray:

Small Charts: Smaller Step 2

Now, if the fact that the dates are diagonal isn’t bugging you, you’re just not paying attention. Did you realize that your head is cocked ever so slightly to the left as you’re reading this post?

We could simply remove the dates entirely:

Small Charts: Smaller Step 3 (Too Far)

That certainly removes the diagonal text, and it lets us shrink the chart further, but it’s a bit extreme — we’ve lost our ability to determine the time range covered by the data, and, in the process, we’ve lost an easy way to tell the granularity of the data.

What if, instead, we simply provide the first and last date in the range? We get this:

Small Charts: Smaller Final

Voila!

In this example, I’ve reduced the area of the chart by 60% and (I claim) improved the readability of the data! The “actual value” — either for the last data point or for the entire range — should also be included in the display (next to or above the chart). And, if a convention of the heavy line as the metric and the lighter gray line as the compare is used across the dashboard or the report, then the legend can be removed and the chart size can be further reduced.
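
For what it’s worth, the same “small chart” treatment is easy to reproduce outside of Excel, too. Below is a rough sketch in Python/matplotlib — with made-up conversion rate numbers, since the real data isn’t the point — that applies the same steps: a heavy line for the current period, a thin gray line for the prior period, no chart border or tick marks, light gridlines, and only the first and last dates labeled on the x-axis.

import matplotlib.pyplot as plt
import numpy as np
from datetime import date, timedelta

# Made-up data: daily conversion rate (%) for the current two weeks and the two weeks prior
start = date(2012, 6, 18)
days = [start + timedelta(days=i) for i in range(14)]
rng = np.random.default_rng(1)
current = 3.0 + rng.normal(0, 0.3, 14)
prior = 2.8 + rng.normal(0, 0.3, 14)

fig, ax = plt.subplots(figsize=(3, 1.6))
ax.plot(days, prior, color="0.7", linewidth=1, label="2 Weeks Prior")
ax.plot(days, current, color="black", linewidth=2, label="Conversion Rate")

# Strip the decoration: no border, no tick marks, small gray labels, light gridlines
for spine in ax.spines.values():
    spine.set_visible(False)
ax.tick_params(length=0, labelsize=7, labelcolor="0.4")
ax.yaxis.grid(True, color="0.9")
ax.set_axisbelow(True)

# Label only the first and last dates, horizontally
ax.set_xticks([days[0], days[-1]])
ax.set_xticklabels([d.strftime("%m/%d") for d in (days[0], days[-1])])

ax.legend(loc="lower center", bbox_to_anchor=(0.5, 1.0), ncol=2, frameon=False, fontsize=7)
fig.savefig("small_chart.png", dpi=150, bbox_inches="tight")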

That’s Cool, but How Did You Do Just the First and Last Dates?

Excel doesn’t natively provide a “first and last date only” capability, but it’s still pretty easy to make the chart show up that way.

In this example, I simply added a “Chart Date” column and used the new column for the x-axis labels:

Small Charts: Sample Data with First and Last Date Column

The real-world case that inspired this post actually allows the user to change the start and end date for the report, so the number of rows in the underlying data varies. So, rather than simply copying the dates over to that column, I put the following formula in cell D3 and then dragged it down to autofill a number of additional rows. That way, Excel automatically figured out where the “last date” value should be displayed:

=IF(AND(A3<>"",A4=""),A3,"")

What that formula does is look in the main date column, and, if the current row has a date and the next row has no date, then the current row must be the last row, so the date is displayed. Otherwise, the cell is left blank.
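
The same “first and last only” logic is easy to express outside of a worksheet formula, too. As a quick illustration — plain Python, with a made-up date list standing in for the main date column — this builds the equivalent of the “Chart Date” column by mirroring the formula: carry over the first date, then show a date only on a row that has a date while the next row does not.

# Made-up dates; the trailing blanks mimic the report whose date range can shrink
dates = ["6/18", "6/19", "6/20", "6/21", "6/22", "", ""]

chart_dates = [dates[0]]  # first row: just carry the first date over
for i in range(1, len(dates)):
    current = dates[i]
    next_value = dates[i + 1] if i + 1 < len(dates) else ""
    # Mirror of the formula: show the date only if this row has one and the next row doesn't
    chart_dates.append(current if current != "" and next_value == "" else "")

print(chart_dates)  # ['6/18', '', '', '', '6/22', '', '']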

Neither a Sparkline Nor a Full Chart Replacement

To be clear, I’m not proposing that a small chart is a replacement for either sparklines or full-on charts. Even these small charts take up much more screen real estate than a sparkline, and small charts aren’t great for showing more than a couple of metrics at once or for including data labels with the actual data values.

But, they’re a nice in-between option that is reasonably high on information content while remaining reasonably tight on screen real estate.

Presentation

Dashboard Development and Unleashing Creative Juices

Ryan Goodman of Centigon Solutions wrote up his take on a recent discussion on LinkedIn that centered on the tension between data visualization that is “flashy” versus data visualization that rigorously adheres to the teachings of Tufte and Few.

The third point in Goodman’s take is worth quoting almost in its entirety, as it is both spot-on and eloquent:

Everyone has a creative side, but someone who has never picked up a design book with an emphasis on data visualization should not implement dashboards for their own company and certainly not as a consultant. Dashboard development is not the forum to unleash creative juices when the intent is to monitor business performance. Working with clients who have educated themselves have[sic] definitely facilitated more productive engagements. Reading a book does not make you an expert, but it does allow for more constructive discussions and a smoother delivery of a dashboard.

“The book” of choice (in my mind, and, I suspect, in Goodman’s) is Few’s Information Dashboard Design: The Effective Visual Communication of Data (which I’ve written about before). Data visualization is one of those areas where spending just an hour or two understanding some best practices, and, more importantly, why those are best practices, can drive a permanent and positive change in behavior, both for analytical-types with little visual design aptitude and for visual design-types with little analytical background.

Goodman goes on in his post to be somewhat ambivalent about tool vendors’ responsibility and culpability when it comes to data visualization misfires. On the one hand, he feels like Few is overly harsh when it comes to criticizing vendors whose demos illustrate worst practice visualizations (I agree with Few on this one). But, he also acknowledges that vendors need to “put their best foot forward to prove that their technology can deliver adequate dashboard execution as well as marketing sizzle.” I agree there, too.

Presentation

Recovery.gov Needs Some Few and Some Tufte

I caught an NPR story about recovery.gov last week, and it sounded really promising. Depending on where you fall on the political spectrum, the various rounds of stimulus and bailout funding that have come through over the past six months fall somewhere between “throwing money away,” “ready, fire, aim,” and “point in what seems like it might be a good direction, pull the trigger, and shoot.” No one can stand up and say, with 100% certainty, that we’re not going to look back on this approach in a decade or two and say, “Um…oops?”

It’s hard to imagine anyone taking issue with the proclaimed intent of recovery.gov, though — make the process as transparent as possible, including how much money is going where, when it’s going, and what ultimately comes of it. It was a day or two before I found myself at a computer with time to check out the site…and I was disappointed. In the NPR interview, the interviewer commented how the site was slick and clean. Reality is “not so much.”

Now, I did once take a run at downloading the federal budget to try to scratch a curiosity itch regarding, at a macro level, where the federal government allocates its funds. On the one hand, I was pleased that I was able to find a .csv file with a sea of data that I could easily download and open with Excel. On the other hand, the budget is incredibly complex, and it takes someone with a deeper understanding of our government to really translate that sea of data into the answers I was looking for. Really, though, that wasn’t a surprise:

The data is ALWAYS more complex than you would like…when you’re trying to answer a specific question.

To the credit of recovery.gov, they clearly intended to show some high-level charts that would answer some of the more common questions citizens are asking. Unfortunately, it looks like they turned over the exercise to a web designer who had no experience in data visualization.

Examples from the featured area on the home page:

recovery.gov Funds Distribution Reported by Week

The overall dark/inverse style itself I won’t knock too much (although it bothers me). And, the fact that the gridlines are kept to a minimum is definitely a good thing. My main beef is admittedly a bit ticky-tack. There was an earlier version that had a $30 B gridline, which has since been removed — that gridline clearly showed the “30.5 B” point sitting below the midway point between 20 B and 40 B. Clearly, someone would have to really be scrutinizing the graph to identify this hiccup, but someone will.

When presenting data to an audience, the data as it stands alone needs to be rock solid. If it contradicts itself, even in a minor way, it risks having its overall credibility questioned.

So, moving on to some more egregious examples:

recovery.gov Relief for America's Working Families

We get a triple-whammy with this one:

  • Pie charts are inherently difficult for the human brain to interpret accurately
  • Pie charts are even worse when they are “tilted” to give a 3D effect — the wedges on the right and left get “shrunk” while wedges on the top or bottom get “stretched”
  • Exploding a pie chart and then providing a pie chart of just the wedge…just ain’t good

Two questions this visualization might have been trying to answer:

  • How much of the stimulus plan is devoted to tax benefits?
  • How much of the stimulus plan is going to the “Making Work Pay” tax credit?

Without doing any math, can you estimate either one of these? For the first question, you’re estimating the size of the small wedge on the left pie chart. It looks like it’s ~ 1/4 of the pie, doesn’t it? In reality, it’s 37%! For the second question, you have to combine your first estimate with an estimate of the lavender wedge in the right pie chart…and that’s way more work than it’s worth. If you do the math, you’ll get that the lavender wedge works out to ~7% of the entire left pie. A simple table or a bar graph would be more effective.
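
As a quick illustration of that last point, here’s a rough sketch (Python/matplotlib, using only the two percentages discussed above — the 37% and the ~7%) of how a plain bar chart answers both questions at a glance, no mental geometry required.

import matplotlib.pyplot as plt

labels = ["Tax benefits (share of total plan)", '"Making Work Pay" credit (share of total plan)']
values = [37, 7]  # the ~37% and ~7% figures estimated above

fig, ax = plt.subplots(figsize=(5, 1.5))
ax.barh(labels, values, color="0.3")
ax.set_xlim(0, 100)
ax.set_xlabel("Percent of total stimulus plan")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.savefig("stimulus_bars.png", dpi=150, bbox_inches="tight")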

And, finally, the estimated distribution of Highway Infrastructure Funds:

recovery.gov Distribution of Highway Infrastructure Funding

Well, that’s just silly. There is NO value in making these bars come flying out of the graph. Really.

Now, to the site’s credit, it takes all of 3 clicks to get from the home page to downloading .csv files with department-specific data and weekly updates (which include human-entered context as to major activities during the prior week). That’s good (assuming it’s not unduly cumbersome to maintain)! And, I’m sure the site will continue to evolve. But, I’d love to see them bring in some data visualization expertise. The model for the visualization should be pretty simple:

  1. Identify the questions that citizens are asking about the stimulus money
  2. Present the data in the way that answers those questions most effectively
  3. Link to the underlying data — the aggregate and the detail — directly from each visualization

As it turns out, Edward Tufte has already been engaged (thanks to Peter Couvares for that tip via Twitter), and is doing some pro bono work. But, it’s not clear that he’s focusing on the high-level stuff. I would love to see Stephen Few get involved as well — pro bono or not! Or, hell, I’d offer my services…but might as well get the Top Dog for something like this.

Starting today, the site is hosting a weeklong online dialogue to engage the public, potential recipients, solution providers, and state, local and tribal partners about how to make Recovery.gov better. I’ve submitted a couple of ideas already!


Excel Tips, Presentation

Data Visualization — March Madness Style

I got an e-mail last week just a few hours into Round 1 of this year’s NCAA men’s basketball tournament. The subject of the e-mail was simply “dumb graph,” and the key line in the note was:

The “game flow” graph…how in the WORLD is that telling me anything? That the score goes up as the game goes on? Really? Ya think?

My friend was referring to the diagrams that ESPN.com is providing for every game in the tournament. The concept of these graphs is pretty simple: plot the score for each team over the course of the game. For instance, the “Game Flow” graph for the Oklahoma vs. Morgan State game looks like this (you can see the actual graph on the game recap page — just scroll down a bit and it’s on the right):

Oklahoma vs. Morgan State

This isn’t an exact replication, but it’s pretty close — best I could manage in Excel 2007 (the raw data is courtesy of the ESPN.com play-by-play page for the game). ESPN’s graph is a Flash-based chart, so it’s got some interactivity that the image above does not (we’ll get to that in a bit).

The graph shows that the game was tight for the first 4-5 minutes, then Oklahoma pulled away, Morgan State made it really close mid-way through the first half, and then Oklahoma pulled away and never looked back. My friend had a point, though —  the dominant feature of the graph is that both lines trend up and to the right…and any chart of a basketball game is going to exhibit that pattern (actually, the play-by-play for that game has a couple of hiccups such that, when I originally pulled the data, I had a couple places where the score went down due to out-of-sequence free throw placement…but I noticed the issue and fixed it). In business, we’re pretty well conditioned to see “up and to the right” as a good thing…but it’s meaningless in the case of a basketball game.

Compare that graph to a game that was much closer — the Clemson vs. Michigan game (the graph on ESPN’s site is on the recap page, and the raw data is on the play-by-play page):

Clemson vs. Michigan

This was a tighter game all through the first half. Clemson led for the first 7-8 minutes, Michigan pulled substantially ahead early in the second half, and then things got tight in the last few minutes of the game. But, again, both lines moved up and to the right.

These charts are not difficult to interpret:

  • The line on top is the team that is leading
  • The distance between the lines is the size of the lead
  • The lines crossing signifies a lead change

But, could we do better? Well, my wife and kids are out-of-town for the week (spring break), I have the social life you’d expect from someone who blogs about data and data visualization, and the fridge is well-stocked with beer. Party. ON!

At best, my level of basketball fan-ness hovers right around “casual.” Still, I follow it enough to know the key factors of a game update or game upset (Think: “Hey, Joe. What’s the score?”). Basically:

  • Who’s winning?
  • By how much?

(If there’s time for a third data point, the actual score is an indication of whether it’s a high scoring shootout or a low scoring defense-oriented game.)

Given these two factors as the key measures of a game, take another look at the graphs above. When the game is tight, you have to look closely to assess who is winning. And, determining how much they’re winning by requires some mental exertion (try it yourself: look back at the last graph and ask yourself how much Michigan was winning by halfway through the second half).

This is just begging for a Stephen Few-style exercise to see if I can do better.

First, the Oklahoma/Morgan State game:

Oklahoma vs. Morgan State

Rather than plotting both teams’ scores, with the total score on the Y-axis, this chart plots a single line with the size of the lead — whichever side of the “0” line the plot is on indicates which team is winning. The team on the top is the higher seed, and the team on the bottom is the lower seed. I added the actual score at halftime and the end of the game, as well as each team’s seed. Compare that chart to the much closer Clemson/Michigan game:

Clemson vs. Michigan

The chart looks very different — it focuses on the information fans really want and presents it directly, rather than presenting the data in a way that requires mental exertion to derive what the fan is really interested in: who’s winning, and by how much? While the graphs on ESPN’s site allow you to mouse over any point in the game and see the exact score and the exact amount of time remaining, it’s hard to imagine who would actually care to do that — better to come up with an information-rich and easy-to-interpret static chart than to get fancy with unnecessary interactivity.
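
If you want to play with the idea yourself, here’s a rough sketch of the approach in Python/matplotlib — with made-up score data, since I haven’t reproduced the actual play-by-play numbers here: compute the higher seed’s lead at each point in the game and plot that single series against a zero line.

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Made-up example data: each team's running score sampled every half-minute
minutes = np.linspace(0, 40, 81)
rng = np.random.default_rng(2)
top_seed = np.cumsum(rng.poisson(0.9, size=81))      # higher seed's cumulative score
bottom_seed = np.cumsum(rng.poisson(0.75, size=81))  # lower seed's cumulative score

margin = top_seed - bottom_seed  # positive = higher seed leads, negative = lower seed leads

fig, ax = plt.subplots(figsize=(4, 2.5))
ax.plot(minutes, margin, color="black", linewidth=2)
ax.axhline(0, color="0.6", linewidth=1)  # the "tied" line

# Show the size of the lead, not a signed number, on the y-axis
# (in Excel, I handled this with custom number formatting, as described in the list below)
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"{abs(value):g}"))
ax.set_xlabel("Game time (minutes)")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.savefig("game_margin.png", dpi=150, bbox_inches="tight")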

A few other subtle changes to the alternative representation:

  • I tried to dramatically increase the “data-pixel ratio” (Few’s principle that the ratio of actual data to decoration should be maximized) — this is a little unfair to ESPN, as their site is working with an overall style and palette for the site, but it’s still worth keeping in mind
  • I used color on the Y-axis to show which team’s lead is above/below the mid-line. The numbers below the middle horizontal line are actually negative numbers, but, with a little Excel trickery, I was able to remove the “-” and change the color of the labels (all done through custom number formatting — something along the lines of a “[Blue]0;[Red]0” format, which omits the minus sign from the negative section and colors each section)
  • By putting the top seed on the top, looking at a full page of these charts would quickly highlight the games that were upsets

I’m my own worst critic, so here are two things I don’t like about the alternate charts above:

  • The overall palette still feels a little clunky — the main data plot doesn’t seem to “pop” as much as it should, even though it’s black, and the shaded heading doesn’t feel right
  • While the interpretation of the data requires less mental effort once you understand what the chart is showing, it does seem like this approach requires another half-second of interpretation up front that the original charts don’t require

What do you think? What else could I try to improve the representation?

Presentation

Data Visualization — Few's Examples

I attended a United Way meeting last week that was hosted at an overburdened county government agency site in south Columbus. The gist of the meeting was discussing the bleakness of the economy and what that could or should mean to the work of the committee. The head of the government agency did a brief presentation on what the agency does and what they are seeing, and the presentation included the distribution of a packet of charts with data the agency tracks.

I was struck by how absolutely horridly the information was presented. A note at the bottom of each chart indicated that the same staff member had compiled each chart. Yet, there was absolutely no consistency from one chart to the next: the color palette changed from chart to chart (and none of the palettes were particularly good), a 3-D effect was used on some charts and not others (3-D effects are always bad, so I suppose I’d rather have inconsistency than 3-D effects on every chart), and totally different chart types were used to present similar information. On several of the bar charts, each bar was a different color, which made for an extremely distracting visualization of the information.

I glanced around the room and saw that most of the other committee members had furrowed brows as they studied the information. It occurred to me that an undue amount of mental exertion was going into simply understanding what was being presented — effort that would have been better spent thinking through the implications of the information.

Ineffective presentation of data can significantly mute the value of a fundamentally useful report or analysis.

Later that evening, I found myself popping around the web — ordering my own copy of Stephen Few’s Show Me the Numbers, and, later, poking around on Few’s site. Specifically, I spent some time on his Examples page, browsing through the myriad before/after examples that clearly illustrate how the same information, presented with the same amount of effort but using some basic common principles, requires dramatically less mental effort to understand.

It’s a fascinating collection of examples. And Show Me the Numbers is a seminal book on the topic.

Analysis, Analytics Strategy, Excel Tips, General, Presentation, Reporting

The Best Little Book on Data

How’s that for a book title? Would it pique your interest? Would you download it and read it? Do you have friends or co-workers who would be interested in it?

Why am I asking?

Because it doesn’t exist. Yet. Call it a working title for a project I’ve been kicking around in my head for a couple of years. In a lot of ways, this blog has been and continues to be a way for me to jot down and try out ideas to include in the book. This is my first stab at trying to capture a real structure, though.

The Best Little Book on Data

In my mind, the book will be a quick, easy read — as entertaining as a greased pig loose at a black-tie political fundraiser — but will really hammer home some key concepts around how to use data effectively. If I’m lucky, I’ll talk a cartoonist into some pen-and-ink, one-panel chucklers to sprinkle throughout it. I’ll come up with some sort of theme that will tie the chapter titles together — “myths” would be good…except that means every title is basically a negative of the subject; “Commandments” could work…but I’m too inherently politically correct to really be comfortable with biblical overtones; an “…In which our hero…” style (the “hero” being the reader, I guess?). Obviously, I need to work that out.

First cut at the structure:

  • Introduction — who this book is for; in a nutshell, it’s targeted at anyone in business who knows they have a lot of data, who knows they need to be using that data…but who wants some practical tips and concepts as to how to actually go about doing just that.
  • Chapter 1: Start with the Data…If You Want to Guarantee Failure — it’s tempting to think that, to use data effectively, the first thing you should do is go out and query/pull the data that you’re interested in. That’s a great way to get lost in spreadsheets and emerge hours (or days!) later with some charts that are, at best, interesting but not actionable, and, at worst, not even interesting.
  • Chapter 2: Metrics vs. Analysis — providing some real clarity regarding the fundamentally different ways to “use data.” Metrics are for performance measurement and monitoring — they are all about the “what” and are tied to objectives and targets. Analysis is all about the “why” — it’s exploratory and needs to be hypothesis driven. Operational data is a third way, but not really covered in the book, so probably described here just to complete the framework.
  • Chapter 3: Objective Clarity — a deeper dive into setting up metrics/performance measurement, and how to start with being clear as to the objectives for what’s being measured, going from there to identifying metrics (direct measures combined with proxy measures), establishing targets for the metrics (and why “I can’t set one until I’ve tracked it for a while” is a total copout), and validating the framework
  • Chapter 4: When “The Metric Went Up” Doesn’t Mean a Gosh Darn Thing — another chapter on metrics/performance measurement: a discussion of the temptation to over-interpret time-based performance metrics. If a key metric is higher this month than last month…it doesn’t necessarily mean things are improving. This includes a high-level discussion of “signal vs. noise,” an illustration of how easy it is to get lulled into believing something is “good” or “bad” when it’s really “inconclusive,” and some techniques for avoiding this pitfall (such as using simple, rudimentary control limits to frame trend data).
  • Chapter 5: Remember the Scientific Method? — a deeper dive on analysis and how it needs to be hypothesis-driven…but with the twist that you should validate that the results will be actionable just by assessing the hypothesis before actually pulling data and conducting the analysis
  • Chapter 6: Data Visualization Matters — largely, a summary/highlights of the stellar work that Stephen Few has done (and, since he built on Tufte’s work, I’m sure there would be some level of homage to him as well). This will include a discussion of how graphic designers tend to not be wired to think about data and analysis, while highly data-oriented people tend to fall short when it comes to visual talent. Yet…to really deliver useful information, these have to come together. And, of course, illustrative before/after examples.
  • Chapter 7: Microsoft Excel…and Why BI Vendors Hate It — the BI industry has tried to equate MS Excel with “spreadmarts” and, by extension, deride any company that is relying heavily on Excel for reporting and/or analysis as being wildly early on the maturity curve when it comes to using data. This chapter will blow some holes in that…while also providing guidance on when/where/how BI tools are needed (I don’t know where data warehousing will fit in — this chapter, a new chapter, or not at all). This chapter would also reference some freely downloadable spreadsheets with examples, macros, and instructions for customizing an Excel implementation to do some of the data visualization work that Excel can do…but doesn’t default to. Hmmm… JT? Miriam? I’m seeing myself snooping for some help from the experts on these!
  • Chapter 8: Your Data is Dirty. Get Over It. — CRM data, ERP data, web analytics data, it doesn’t matter what kind of data. It’s always dirtier than the people who haven’t really drilled down into it assume. It’s really easy to get hung up on this when you start digging into it…and that’s a good way to waste a lot of effort. Which isn’t to say that some understanding of data gaps and shortcomings isn’t important.
  • Chapter 9: Web Analytics — I’m not sure exactly where this fits, but it feels like it would be a mistake to not provide at least a basic overview of web analytics, pitfalls (which really go to not applying the core concepts already covered, but web analytics tools make it easy to forget them), and maybe even providing some thoughts on social media measurement.
  • Chapter 10: A Collection of Data Cliches and Myths — This may actually be more of an appendix, but it’s worth sharing the cliches that are wrong and myths that are worth filing away, I think: “the myth of the step function” (unrealistic expectations), “the myth that people are cows” (might put this in the web analytics section), “if you can’t measure it, don’t do it” (and why that’s just plain silliness)
  • Chapter 11: Bringing It All Together — I assume there will be such a chapter, but I’m going to have to rely on nailing the theme and the overall structure before I know how it will shake out.

What do you think? What’s missing? Which of these remind you of anecdotes in your own experience (haven’t you always dreamed of being included in the Acknowledgments section of a book? Even if it’s a free eBook?)? What topic(s) are you most interested in? Back to the questions I opened this post with — would you be interested in reading this book, and do you have friends or co-workers who would be interested? Or, am I just imagining that this would fill a gap that many businesses are struggling with?

Presentation

Test Your Data Visualization IQ

Data visualization has really been on my mind of late. Partly because I’ve personally been struggling to produce some effectively-presented information, and, even more so, because one of my co-workers has been spending even more time overhauling the way we communicate data-driven information to our clients. He’s making a lot more headway on his work than I am on mine, unfortunately (for me).

Last week, he pinged me with a link to a 10-question Graph Design IQ test on the Perceptual Edge web site. He and I both patted ourselves on the back for scoring 10 out of 10…but, then again, he and I had both recently read, and are working to apply, Stephen Few’s Information Dashboard Design. And, Perceptual Edge = Stephen Few. So, let’s face it, it would be downright embarrassing if we’d just read the book…and then not aced the test. Still, it’s a good exercise — almost a memory-jogger as to the concepts Few lays out and the rationale behind them.

Take the test and see how you do. It’s all of 10 questions long, and each question only has two answers, so it’s a 2-minute exercise. If you get a question wrong, a little box comes up and gives a very quick/brief explanation as to why. It’s interesting.

On a related note, thanks to a couple of people on Twitter, I got pointed to an interesting post on creating bullet graphs using Google Spreadsheets. One of these days, doggonit, I’m going to lock myself away and play around with Google Docs. I managed to spend 15 minutes trying out a revamped scorecard using Google Docs today. But, that was 15 minutes in five 3-minute chunks, so I didn’t make much headway. Stay tuned, though, and I’ll let you know if I ever manage to produce anything there!

Excel Tips, Presentation

Stephen Few's Derivation of Tufte: The Data-Pixel Ratio

I’ve glanced through various folks’ copies of Stephen Few’s Information Dashboard Design: The Effective Visual Communication of Data on several occasions over the past few years. And, it was a heavy influence on the work that an ad hoc team in the BI department at National Instruments undertook a couple of years ago to standardize/professionalize the work they were putting out.

I finally got around to reading a good chunk of the book as I was flying a three-legged trip out to British Columbia last week…and it is good! One section that particularly struck me started on page 100:

Edward R. Tufte introduced a concept in his 1983 classic The Visual Display of Quantitative Information that he calls the “data-ink ratio.” When quantitative data is displayed in printed form, some of the ink that appears on the page presents data, and some presents visual content that is not data.

…

He then applies it as a principle of design: “Maximize the data-ink ratio, within reason. Every bit of ink on a graphic requires a reason. And nearly always that reason should be that the ink presents new information.”

…

This principle applies perfectly to the design of dashboards, with one simple revision: because dashboards are always displayed on computer screens, I’ve changed the word “ink” to “pixels.”

I’ll actually go further and say that “dashboards” can be replaced with “spreadsheets” and this maxim holds true. Taking some sample data straight from Few’s book, and working with a simple table, below is how at least 50% of Excel users would format a simple table with bookings by geographic region:

Look familiar? The light gray gridlines in the background are turned on in Excel by default. And, there’s the failure to resist the urge to put a “thin” grid around the entire data set.

Contrast that with how Few represents the same data:

Do you agree? This is clearly an improvement, and all Few really did was remove the unnecessary non-data pixels.

So, how would I have actually formatted the table? It’s tough to resist the urge to add color, and I am a fan of alternating shaded rows, which I can add with a single button click based on a macro that adds conditional formatting (“=MOD(ROW()+1,2)=0” for shaded and “=MOD(ROW(),2)=0” for not shaded):

In this case…I’d actually vote for Few’s approach. But, even Few gives the okay to lightly shaded alternating rows later in the same chapter, when some sort of visual aid is needed to follow a row across a large set of data. That’s really not necessary in this case. And, does bolding the totals really add anything? I don’t know that it does.
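
Incidentally, the banding trick doesn’t have to live in a recorded macro. Here’s a minimal sketch of the same conditional-formatting rule applied programmatically — in Python with the openpyxl library, which is my substitution, not the macro described above — using made-up bookings numbers standing in for the sample data.

from openpyxl import Workbook
from openpyxl.styles import PatternFill
from openpyxl.formatting.rule import FormulaRule

wb = Workbook()
ws = wb.active

# Made-up bookings-by-region rows standing in for the sample table
ws.append(["Region", "Bookings"])
for row in (("Americas", 1979), ("Europe", 1209), ("Asia", 911)):
    ws.append(row)

# Same formula as the shading macro above: band alternating rows with a light gray fill
shade = PatternFill(start_color="F2F2F2", end_color="F2F2F2", fill_type="solid")
ws.conditional_formatting.add("A2:B4", FormulaRule(formula=["MOD(ROW()+1,2)=0"], fill=shade))

wb.save("bookings.xlsx")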

The book is a great read. It’s easy to dismiss the topic as inconsequential — the data is the data, and as long as it’s presented accurately, does it really matter if it’s presented effectively? In my book, it absolutely does matter. The more effectively the data is presented, the less work the consumer of the data needs to do to understand it. The human brain, while a wondrously effective computer, has its limits, and presenting data effectively allows the brain to spend the bulk of its effort on assessing the information rather than trying to understand the data.