Excel Tips, Presentation

Small Charts in Excel: Beyond Sparklines, Still Economical

I’m a fan of Stephen Few; pretty much always have been, and pretty much always will be. When developing dashboards, reports, and analysis results, it’s not uncommon at all for me to consciously consider some Few-oriented data visualization principles.

One of those principles is “maximize the data-pixel ratio,” which is a derivation of Edward R. Tufte’s “data-ink ratio.” The concept is pretty simple: devote as much of the non-white space to actually representing data and as little as possible to decoration and structure. It’s a brilliant concept, and I’m closing in on five years since I dedicated an entire blog post to it.

Another Tufte-inspired technique that Few is a big fan of is the “sparkline.” Simply put, a sparkline is a chart that is nothing but the line of data:

Small Charts: Sparkline

In Few’s words (from his book, Information Dashboard Design: The Effective Visual Communication of Data):

Sparklines are not meant to provide the quantitative precision of a normal line graph. Their whole purpose is to provide a quick sense of historical context to enrich the meaning of the measure.

When Few designs (or critiques) a dashboard, he is a fan of sparklines. He believes (rightly) that dashboards need to fit on a single screen (for cognitive processing realities that are beyond the scope of this post), and sparklines are a great way to provide additional context about a metric in a very economical space.

Wow! Sparklines ROCK!

But, still…sparklines are easy to criticize. In different situations, the lack of the following aspects of “context” can be pretty limiting:

  • What is the timeframe covered by the sparkline? Generally, a dashboard will cover a set time period that is displayed elsewhere on the dashboard. But, it can be unclear as to whether the sparkline is the variation of the metric within the report period (the last two weeks, for instance) or, rather, if it shows a much longer period so that the user has greater historical context.
  • What is the granularity of the data? In other words, is each point on the sparkline a day? A week? A month?
  • How much is the metric really varying over time? The full vertical range of a sparkline tends to be from the smallest number to the largest number in the included data. That means a metric that is varying +/-50% from the average value can have a sparkline that looks almost identical to one that is varying +/-2%.
  • How has the metric compared to the target over time? The latest value for the metric may be separately shown as a fixed number with a comparison to a prior period.  But, the sparkline doesn’t show how the metric has been trending relative to the target (Have we been consistently below target? Consistently above target? Inconsistent relative to target?).
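
The scaling issue in the third bullet is easy to demonstrate. Here’s a quick Python sketch (mine, not from the original post) showing that two series with wildly different volatility produce identical sparkline shapes once each is scaled to its own min/max range, which is effectively what a sparkline’s y-axis does:

```python
def minmax_scale(values):
    """Scale a series to [0, 1] using its own min and max,
    which is effectively what a sparkline's vertical axis does."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

volatile = [50, 150, 50, 150]   # varies +/-50% around a mean of 100
flat     = [98, 102, 98, 102]   # varies +/-2% around a mean of 100

# Both series collapse to the same shape once scaled:
print(minmax_scale(volatile))  # [0.0, 1.0, 0.0, 1.0]
print(minmax_scale(flat))      # [0.0, 1.0, 0.0, 1.0]
```

The two plotted lines would be pixel-for-pixel identical, even though one metric is 25 times as volatile as the other.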

So, sparklines aren’t a magic bullet.

So, What’s an Alternative?

While I do use sparklines, I’ve found myself also using “small charts” more often, especially when it comes to KPIs. A small chart, developed with a healthy layer of data-pixel ratio awareness, can be both data-rich and space-economical.

Let’s take the following data set, which is a fictitious set of data showing a site’s conversion rate by day over a two-week period, as well as the conversion rate for the two weeks prior:

Small Charts: Sample Data

If we just plot the data with Excel’s (utterly horrid) default line chart, it looks like this:

Small Charts: Default Excel

Right off the bat, we can make the chart smaller without losing any data clarity by moving the legend to the top, dropping the “.00” that is on every number in the y-axis, and removing the outer border:

Small Charts: Smaller Step 1

The chart above still has an awful lot of “decoration” and not enough weight for the core data, so let’s drop the font size and color for the axis labels, remove the tick marks from both axes and the line itself from the y-axis, and lighten up the gridlines. And, to make it more clear which is the “main” data, and to make the chart more color-blind friendly in the process, let’s change the “2 Weeks Prior” line to be thinner and gray:

Small Charts: Smaller Step 2

Now, if the fact that the dates are diagonal isn’t bugging you, you’re just not paying attention. Did you realize that your head is cocked ever so slightly to the left as you’re reading this post?

We could simply remove the dates entirely:

Small Charts: Smaller Step 3 (Too Far)

That certainly removes the diagonal text, and it lets us shrink the chart further, but it’s a bit extreme — we’ve lost our ability to determine the time range covered by the data, and, in the process, we’ve lost an easy way to tell the granularity of the data.

What if, instead, we simply provide the first and last date in the range? We get this:

Small Charts: Smaller Final

Voila!

In this example, I’ve reduced the area of the chart by 60% and (I claim) improved the readability of the data! The “actual value” — either for the last data point or for the entire range — should also be included in the display (next to or above the chart). And, if a convention of the heavy line as the metric and the lighter gray line as the compare is used across the dashboard or the report, then the legend can be removed and the chart size can be further reduced.

That’s Cool, but How Did You Do Just the First and Last Dates?

Excel doesn’t natively provide a “first and last date only” capability, but it’s still pretty easy to make the chart show up that way.

In this example, I simply added a “Chart Date” column and used the new column for the x-axis labels:

Small Charts: Sample Data with First and Last Date Column

The real-world case that inspired this post actually allows the user to change the start and end date for the report, so the number of rows in the underlying data varies. So, rather than simply copying the dates over to that column, I put the following formula in cell D3 and then dragged it down to autofill a number of additional rows. That way, Excel automatically figures out where the “last date” value should be displayed:

=IF(AND(A3<>"",A4=""),A3,"")

What that formula does is look in the main date column, and, if the current row has a date and the next row has no date, then the current row must be the last row, so the date is displayed. Otherwise, the cell is left blank.
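
For illustration, here’s a rough Python translation of that logic (the function name and sample dates are mine, not from the workbook). It labels the first date and the last non-blank date, leaving everything in between empty:

```python
def chart_date_labels(dates):
    """Given the main date column (with trailing blanks as ''),
    return the x-axis label column: first date, last non-blank
    date, and empty strings everywhere else."""
    labels = []
    for i, d in enumerate(dates):
        nxt = dates[i + 1] if i + 1 < len(dates) else ""
        if i == 0 and d != "":
            labels.append(d)      # first date: always labeled
        elif d != "" and nxt == "":
            labels.append(d)      # this row has a date, the next is blank:
                                  # must be the last row, so label it
        else:
            labels.append("")     # everything in between stays blank
    return labels

print(chart_date_labels(["3/1", "3/2", "3/3", "3/4", "", ""]))
# ['3/1', '', '', '3/4', '', '']
```

Note that the article’s formula only handles the “last date” case (the first date was presumably labeled directly); the sketch above covers both ends for completeness.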

Neither a Sparkline Nor a Full Chart Replacement

To be clear, I’m not proposing that a small chart is a replacement for either sparklines or full-on charts. Even these small charts take up much more screen real estate than a sparkline, and small charts aren’t great for showing more than a couple of metrics at once or for including data labels with the actual data values.

But, they’re a nice in-between option that is reasonably high on information content while remaining reasonably tight on screen real estate.

Presentation

Dashboard Development and Unleashing Creative Juices

Ryan Goodman of Centigon Solutions wrote up his take on a recent discussion on LinkedIn that centered on the tension between data visualization that is “flashy” versus data visualization that rigorously adheres to the teachings of Tufte and Few.

The third point in Goodman’s take is worth quoting almost in its entirety, as it is both spot-on and eloquent:

Everyone has a creative side, but someone who has never picked up a design book with an emphasis on data visualization should not implement dashboards for their own company and certainly not as a consultant. Dashboard development is not the forum to unleash creative juices when the intent is to monitor business performance. Working with clients who have educated themselves have[sic] definitely facilitated more productive engagements. Reading a book does not make you an expert, but it does allow for more constructive discussions and a smoother delivery of a dashboard.

“The book” of choice (in my mind, and, I suspect, in Goodman’s) is Few’s Information Dashboard Design: The Effective Visual Communication of Data (which I’ve written about before). Data visualization is one of those areas where spending just an hour or two understanding some best practices, and, more importantly, why those are best practices, can drive a permanent and positive change in behavior, both for analytical-types with little visual design aptitude and for visual design-types with little analytical background.

Goodman goes on in his post to be somewhat ambivalent about tool vendors’ responsibility and culpability when it comes to data visualization misfires. On the one hand, he feels like Few is overly harsh when it comes to criticizing vendors whose demos illustrate worst practice visualizations (I agree with Few on this one). But, he also acknowledges that vendors need to “put their best foot forward to prove that their technology can deliver adequate dashboard execution as well as marketing sizzle.” I agree there, too.

Presentation

Recovery.gov Needs Some Few and Some Tufte

I caught an NPR story about recovery.gov last week, and it sounded really promising. Depending on where you fall on the political spectrum, the various rounds of stimulus and bailout funding that have come through over the past six months fall somewhere between “throwing money away,” “ready, fire, aim,” and “point in what seems like it might be a good direction, pull the finger, and shoot.” No one can stand up and say, with 100% certainty, that we’re not going to look back on this approach in a decade or two and say, “Um…oops?”

It’s hard to imagine anyone taking issue with the proclaimed intent of recovery.gov, though — make the process as transparent as possible, including how much money is going where, when it’s going, and what ultimately comes of it. It was a day or two before I found myself at a computer with time to check out the site…and I was disappointed. In the NPR interview, the interviewer commented how the site was slick and clean. Reality is “not so much.”

Now, I did once take a run at downloading the federal budget to try to scratch a curiosity itch regarding, at a macro level, where the federal government allocates its funds. On the one hand, I was pleased that I was able to find a .csv file with a sea of data that I could easily download and open with Excel. On the other hand, the budget is incredibly complex, and it takes someone with a deeper understanding of our government to really translate that sea of data into the answers I was looking for. Really, though, that wasn’t a surprise:

The data is ALWAYS more complex than you would like…when you’re trying to answer a specific question.

To the credit of recovery.gov, they clearly intended to show some high-level charts that would answer some of the more common questions citizens are asking. Unfortunately, it looks like they turned over the exercise to a web designer who had no experience in data visualization.

Examples from the featured area on the home page:

recovery.gov Funds Distribution Reported by Week

The overall dark/inverse style itself I won’t knock too much (although it bothers me). And, the fact that the gridlines are kept to a minimum is definitely a good thing. My main beef is admittedly a bit ticky-tack. There was an earlier version that had a $30 B gridline, which has since been removed — that gridline clearly showed the “30.5 B” point sitting below the midway point between 20 B and 40 B. Clearly, someone would have to really be scrutinizing the graph to identify this hiccup, but someone will.

When presenting data to an audience, the data as it stands alone needs to be rock solid. If it contradicts itself, even in a minor way, it risks having its overall credibility questioned.

So, moving on to some more egregious examples:

recovery.gov Relief for America's Working Families

We get a triple-whammy with this one:

  • Pie charts are inherently difficult for the human brain to interpret accurately
  • Pie charts are even worse when they are “tilted” to give a 3D effect — the wedges on the right and left get “shrunk” while wedges on the top or bottom get “stretched”
  • Exploding a pie chart and then providing a pie chart of just the wedge…just ain’t good

Two questions this visualization might have been trying to answer:

  • How much of the stimulus plan is devoted to tax benefits?
  • How much of the stimulus plan is going to the “Making Work Pay” tax credit?

Without doing any math, can you estimate either one of these? For the first question, you’re estimating the size of the small wedge on the left pie chart. It looks like it’s ~ 1/4 of the pie, doesn’t it? In reality, it’s 37%! For the second question, you have to combine your first estimate with an estimate of the lavender wedge in the right pie chart…and that’s way more work than it’s worth. If you do the math, you’ll get that the lavender wedge works out to ~7% of the entire left pie. A simple table or a bar graph would be more effective.
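
To make the two-step estimate concrete, here’s the arithmetic as a quick Python check, using the approximate figures above (tax benefits at 37% of the plan, “Making Work Pay” at ~7% of the plan); the ~19% result is my own back-of-the-envelope number:

```python
# Back-of-the-envelope check of the pie chart estimates above.
# Dividing the two shares tells you what fraction of the right-hand
# (exploded) pie the lavender "Making Work Pay" wedge covers.

tax_benefits_share = 0.37      # tax benefits, as a share of the whole plan
making_work_pay_share = 0.07   # Making Work Pay, as a share of the whole plan

wedge_of_right_pie = making_work_pay_share / tax_benefits_share
print(round(wedge_of_right_pie, 2))  # ~0.19, i.e. about a fifth of that pie
```

That’s the math the visualization forces every reader to do in their head: estimate one wedge, estimate a second wedge, then multiply.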

And, finally, the estimated distribution of Highway Infrastructure Funds:

recovery.gov Distribution of Highway Infrastructure Funding

Well, that’s just silly. There is NO value in making these bars come flying out of the graph. Really.

Now, to the site’s credit, it takes all of 3 clicks to get from the home page to downloading .csv files with department-specific data and weekly updates (which include human-entered context as to major activities during the prior week). That’s good (assuming it’s not unduly cumbersome to maintain)! And, I’m sure the site will continue to evolve. But, I’d love to see them bring in some data visualization expertise. The model for the visualization should be pretty simple:

  1. Identify the questions that citizens are asking about the stimulus money
  2. Present the data in the way that answers those questions most effectively
  3. Link to the underlying data — the aggregate and the detail — directly from each visualization

As it turns out, Edward Tufte has already been engaged (thanks to Peter Couvares for that tip via Twitter), and is doing some pro bono work. But, it’s not clear that he’s focusing on the high-level stuff. I would love to see Stephen Few get involved as well — pro bono or not! Or, hell, I’d offer my services…but might as well get the Top Dog for something like this.

Starting today, the site is hosting a weeklong online dialogue to engage the public, potential recipients, solution providers, and state, local and tribal partners about how to make Recovery.gov better. I’ve submitted a couple of ideas already!


General

PowerPoint the Application vs. the Application of PowerPoint

Slightly off-topic for this blog, and a little dated, but worth sharing nonetheless.

During a discussion with a couple of my co-workers today, I made an observation about how my current company, as well as one of the major consulting firms we use, seem to really be in love with PowerPoint as the documentation/presentation/communication/general-purpose tool of choice. This prompted an immediate and emphatic response from one of those co-workers, who insisted that he “loved PowerPoint.” 

The exchange reminded me of the news last year that Katsuaki Watanabe, the President and CEO of Toyota, had decreed that employees stop using PowerPoint for the creation of documents. Garr Reynolds (aka Presentation Zen…master) had a great take on the news. A couple of the highlights:

  • To be clear, Watanabe did not “ban PowerPoint use,” as was mis-circulated at the time
  • Watanabe did severely discourage the use of PowerPoint as a documentation tool — Reynolds calls these “slideuments” (slides + documents), which is a wickedly apt designation (and the core of this post)
  • “…visuals projected on a screen in support of a live talk are very different from material that is to be printed and read and/or analyzed.”

And, a longer excerpt that is also key:

…there is often no distinction made between documents (slideuments made in PowerPoint) and presentation slides prepared for projection. They are often interchangeable. Sounds efficient, right? And it would be funny if it was not so inefficient, wasteful, and unproductive. The slideuments produced in Japan make understanding and precision harder when printed, and when used for projected slides in a darkened conference room, they are the country’s number one cure for insomnia. 

This was fundamentally the distinction that I was trying to get my co-worker to understand…without much luck. He’s clearly got some PowerPoint chops — he kept pulling up different slides he had done that had intricate builds and snazzy palettes and templates. But, the slides he was most proud of were heavily laden with annotations and text — they were standalone, comprehensive pictorial representations of complex concepts or systems.

Once he let loose with, “The point of PowerPoint is not the retention of the information — it’s the ‘wow’ factor,” I admitted defeat.

The title of this post is really the gist of my thesis here: PowerPoint the application is not the same thing as the application of PowerPoint. All too often, we don’t make that distinction. As Reynolds puts it, “Slideware is not a method, it’s simply a kind of tool.”

Think of a sledgehammer. It’s a tool — an application, if you will. But, it can be applied for vastly different purposes:

  • Used with a wedge to split firewood
  • Used to drive a metal fencepost into the ground
  • Used to prop open a door that keeps blowing shut

These are very different applications of the tool, and you would be clear as to what it was you were trying to accomplish when you hiked it over your shoulder and headed off to the task at hand.

It's not what the software does...
[Cartoon by Hugh MacLeod — see oodles more at gapingvoid.com]

The same holds true for PowerPoint. It has several different distinct possible uses…and it’s worth being clear as to which one you are tackling:

  • A live, in-person demonstration — think simple, minimalist visual backup that supports an engaging presenter without distracting from what he/she is saying; think Steve Jobs (and, if you’re not familiar, check out one of the Presentation Zen posts on that subject)
  • A live, online presentation via a webinar or web conferencing solution — this is a stickier wicket, in a lot of ways; it’s tempting to hedge against technology quirks by distributing the .ppt/.pptx file to all of the attendees via e-mail so they can simply pull up the deck and follow along, but this can be problematic, as the audience can then jump ahead and jump back. Generally, this sort of presentation is “the best alternative we have” when, ideally, you’d be doing a live, in-person demonstration. I would think this means the same minimalist approach described in the prior bullet would apply.
  • Documentation never intended to be presented — slideuments — these really are problematic and should be avoided. 

All too often, there is a blurring of all three of these: a live presentation for some people, while other people are participating remotely, and the “presenter” has distributed the presentation as a handout that has all of the detail that he/she is going to present. That leaves the participants cognitively vacillating between listening to the presenter’s words and reading through the detail in the presentation that is either being projected or is printed in front of them. It’s just not effective. Make the presentation a presentation. If there is supplemental detail or review material, put that in a document and distribute it separately — before, during, or after the presentation. Let the presentation be truly visual and let it support the concepts and information being presented, with an emphasis on the concepts.

Aside: To bridge back to the topic of this site, I’ve even seen PowerPoint used as a poor man’s BI presentation tool: PowerPoint 2007 linked to Excel 2007, which was in turn linked to Access 2007, which was in turn hooked into a SQL Server database. On the one hand…<shudder>. On the other hand, when it came to a portable (once the link to Excel was removed), shareable report, it wasn’t half bad! (Our intent was for it to also be a prototype that we could iterate on quickly as we developed requirements for a true BI tool…but that didn’t pan out for other reasons.)

So, that’s my mini-rant. It’s a problem. A clear problem. But, not one that I intend to solve. If you’re interested in thinking more about the topic, check out:

  • Presentation Zen (obviously)
  • Laura Fitton / Pistachio Consulting — you can just look at her posts that have the presentation tag
  • For the militant/extreme death-to-PowerPoint take, there’s the inimitable Edward Tufte…but he really does go a bit overboard 
Excel Tips, Presentation

Stephen Few's Derivation of Tufte: The Data-Pixel Ratio

I’ve glanced through various folks’ copies of Stephen Few’s Information Dashboard Design: The Effective Visual Communication of Data on several occasions over the past few years. And, it was a heavy influence on the work that an ad hoc team in the BI department at National Instruments undertook a couple of years ago to standardize/professionalize the work they were putting out.

I finally got around to reading a good chunk of the book as I was flying a three-legged trip out to British Columbia last week…and it is good! One section that particularly struck me started on page 100:

Edward R. Tufte introduced a concept in his 1983 classic The Visual Display of Quantitative Information that he calls the “data-ink ratio.” When quantitative data is displayed in printed form, some of the ink that appears on the page presents data, and some presents visual content that is not data.
…

He then applies it as a principle of design: “Maximize the data-ink ratio, within reason. Every bit of ink on a graphic requires a reason. And nearly always that reason should be that the ink presents new information.”
…
This principle applies perfectly to the design of dashboards, with one simple revision: because dashboards are always displayed on computer screens, I’ve changed the word “ink” to “pixels.”

I’ll actually go further and say that “dashboards” can be replaced with “spreadsheets” and this maxim holds true. Taking some sample data straight from Few’s book, and working with a simple table, below is how at least 50% of Excel users would format a simple table with bookings by geographic region:

Look familiar? The light gray gridlines in the background are turned on in Excel by default. And, a failure to resist the urge to put a “thin” grid around the entire data set.

Contrast that with how Few represents the same data:

Do you agree? This is clearly an improvement, and all Few really did was remove the unnecessary non-data pixels.

So, how would I have actually formatted the table? It’s tough to resist the urge to add color, and I am a fan of alternating shaded rows, which I can add with a single button click based on a macro that adds conditional formatting (“=MOD(ROW()+1,2)=0” for shaded and “=MOD(ROW(),2)=0” for not shaded):
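
As a side note, the two conditional formatting formulas work because Excel’s ROW() is 1-based. Here’s a quick Python sketch of the shading test (mine, purely illustrative):

```python
# Illustrative sketch of the =MOD(ROW()+1,2)=0 shading test, using
# 1-based row numbers like Excel's ROW(). Odd rows get shaded; even
# rows satisfy =MOD(ROW(),2)=0 and stay unshaded. (In a real sheet,
# where the table starts determines which rows end up shaded.)

def is_shaded(row):
    """Mirror of =MOD(ROW()+1,2)=0 for a 1-based row number."""
    return (row + 1) % 2 == 0

print([is_shaded(r) for r in range(1, 6)])
# [True, False, True, False, True]
```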

In this case…I’d actually vote for Few’s approach. But, even Few gives the okay to lightly shaded alternating rows later in the same chapter, when some sort of visual aid is needed to follow a row across a large set of data. That’s really not necessary in this case. And, does bolding the totals really add anything? I don’t know that it does.

The book is a great read. It’s easy to dismiss the topic as inconsequential — the data is the data, and as long as it’s presented accurately, does it really matter if it’s presented effectively? In my book, it absolutely does matter. The more effectively the data is presented, the less work the consumer of the data needs to do to understand it. The human brain, while a wondrously effective computer, has its limits, and presenting data effectively allows the brain to spend the bulk of its effort on assessing the information rather than trying to understand the data.