Excel Tips, Presentation

Big Book of Key Performance Indicators

I have received a bunch of emails over the last few days from folks who have been directed to my Big Book of Key Performance Indicators and the companion spreadsheet.  Since we changed the web site (albeit years ago), I figured it was easier to just put them here in a blog post:

I hope you enjoy the work!

Analysis, Presentation

10 Tips for Presenting Data

Big data. Analytics. Data science. Businesses are clamoring to use data to get a competitive edge, but all the data in the world won’t help if your stakeholders can’t understand it, or if their eyes glaze over as you present your incredibly insightful analysis. This post outlines my top ten tips for presenting data.

It’s worth noting that these tips are tool agnostic—whether you use Data Studio, Domo, Tableau or another data viz tool, the principles are the same. However, don’t assume your vendors are in lock-step with data visualization best practices! Vendor defaults frequently violate key principles of data visualization, so it’s up to the analyst to put these principles in practice.

Tip #1: Recognize That Presentation Matters

The first step to presenting data is to understand that how you present data matters. It’s common for analysts to feel they’re not being heard by stakeholders, or that their analysis or recommendations never generate action. The problem is, if you’re not communicating data clearly to business users, it’s really easy for them to tune out.

Analysts may ask, “But I’m so busy with the actual work of putting together these reports. Why should I take the time to ‘make it pretty’?”

Because it’s not about “making things pretty.” It’s about making your data understandable.

My very first boss in Analytics told me, “As an analyst, you are an information architect.” It’s so true. Our job is to take a mass of information and architect it in such a way that people can easily comprehend it.

… Keep reading on ObservePoint’s blog …

Analysis, Featured, General, Presentation

Foundational Social Psychology Experiments (And Why Analysts Should Know Them) – Part 5 of 5

Digital Analytics is a relatively new field, and as such, we can learn a lot from other disciplines. This post continues exploring classic studies from social psychology, and what we analysts can learn from them.

False Consensus

Experiments have revealed that we tend to believe in a false consensus: that others would respond similarly to the way that we would. For example, Ross, Greene & House (1977) provided participants with a scenario, with two different possible ways of responding. Participants were asked to explain which option they would choose, and guess what other people would choose. Regardless of which option they actually chose, participants believed that other people would choose the same one.

Why this matters for analysts: As you are analyzing data, you are looking at the behaviour of real people. It’s easy to make assumptions about how they will react, or why they did what they did, based on what you would do. But our analysis will be far more valuable if we can be aware of those assumptions, and actively seek to understand why our actual customers did these things – without relying on assumptions.

Homogeneity of the Outgroup

There is a related effect here: the Homogeneity of the Outgroup. (Quattrone & Jones, 1980.) In short, we tend to view those who are different to us (the “outgroup”) as all being very similar, while those who are like us (the “ingroup”) are more diverse. For example, all women are chatty, but some men are talkative, some are quiet, some are stoic, some are more emotional, some are cautious, others are more risky… etc.

Why this matters for analysts: Similar to the False Consensus Effect, where we may analyse user behaviour assuming everyone thinks as we do, the Homogeneity of the Outgroup suggests that we may oversimplify the behaviour of customers who are different to us, and fail to fully appreciate the nuance of varied behaviour. This may seriously bias our analyses! For example, if we are a large global company, an analysis of customers in another region may be seriously flawed if we are assuming customers in the region are “all the same.” To overcome this tendency, we might consider leveraging local teams or local analysts to conduct or vet such analyses.

The Hawthorne Effect

In 1955, Henry Landsberger analyzed several studies conducted between 1924 and 1932 at the Hawthorne Works factory. These studies examined factors related to worker productivity, including whether the level of light within a building changed the productivity of workers. They found that, while changes in the level of light appeared to be related to increased productivity, it was actually the fact that something changed that mattered. (For example, they saw an increase in productivity even in low-light conditions, which should make work more difficult…)

However, this study has been the source of much criticism, and was referred to by Dr. Richard Nisbett as a “glorified anecdote.” Alternative explanations include that Orne’s “Demand Characteristics” were in fact at work (that the changes were due to the workers knowing they were a part of the experiment), or the fact that the changes were always made on a Sunday, and Mondays normally show increased productivity, due to employees having a day off. (Levitt & List, 2011.)

Why this matters for analysts: “Demand Characteristics” could mean that your data is subject to influence, if people know they are being observed. For example, in user testing, participants are very aware they are being studied, and may act differently. Your digital analytics data, however, may be less impacted. (While people may technically know their website activity is being tracked, it may not be “top of mind” enough during the browsing experience to trigger this effect.) The Sunday vs. Monday explanation reminds us to consider other explanations or variables that may be at play, and to be aware of when we are not fully in control of all the variables influencing our data, or our A/B test. However, the Hawthorne studies are also a good example of where interpretations of the data may vary! There may be multiple explanations for what you’re seeing in the data, so it’s important to vet your findings with others.

Conclusion

What are your thoughts? Do these pivotal social psychology experiments help to explain some of the challenges you face with analyzing and presenting data? Are there any interesting studies you have heard of, that hold important lessons for analysts? Please share them in the comments!

Analysis, Featured, General, Presentation

Foundational Social Psychology Experiments (And Why Analysts Should Know Them) – Part 4 of 5

Digital Analytics is a relatively new field, and as such, we can learn a lot from other disciplines. This post continues exploring classic studies from social psychology, and what we analysts can learn from them.

The Bystander Effect (or “Diffusion of Responsibility”)

In 1964 in New York City, a woman named Kitty Genovese was murdered. A newspaper report at the time claimed that 38 people had witnessed the attack (which lasted an hour) yet no one called the police. (Later reports suggested this was an exaggeration – that there had been fewer witnesses, and that some had, in fact, called the police.)

However, this event fascinated psychologists, and triggered several experiments. Darley & Latane (1968) manufactured a medical emergency, in which one participant was allegedly having an epileptic seizure, and measured how long it took for participants to help. They found that the more participants present, the longer it took for anyone to respond to the emergency.

This became known as the “Bystander Effect”, which proposes that the more bystanders that are present, the less likely it is that an individual will step in and help. (Based on this research, CPR training started instructing participants to tell a specific individual, “You! Go call 911” – because if they generally tell a group to call 911, there’s a good chance no one will do it.)

Why this matters for analysts: Think about how you present your analyses and recommendations. If you offer them to a large group, without specific responsibility to any individual to act upon them, you decrease the likelihood of any action being taken at all. So when you make a recommendation, be specific. Who should be taking action on this? If your recommendation is a generic “we should do X”, it’s far less likely to happen.

Selective Attention

Before you read the next part, watch this video and follow the instructions. Go ahead – I’ll wait here.

In 1999, Simons and Chabris conducted an experiment on awareness at Harvard University. Participants were asked to watch a video of basketball players, where one team was wearing white shirts and the other team was wearing black shirts, and each team was passing a ball among its own players. Participants were asked to count the number of passes between players on the white team. During the video, a man dressed as a gorilla walked into the middle of the court, faced the camera and thumped his chest, then left (spending a total of nine seconds on screen). Amazingly? Half of the participants missed the gorilla entirely! This has since been termed “the Invisible Gorilla” experiment.

Why this matters for analysts: As you are analyzing data, there can be huge, gaping issues that you may not even notice. When we focus on a particular task (for example, counting passes by the white-shirt players only, or analyzing one subset of our customers) we may overlook something significant. Take time before you finalize or present your analysis to think of what other possible explanations or variables there could be (what could you be missing?) or invite a colleague to poke holes in your work.

Stay tuned

More to come!

What are your thoughts? Do these pivotal social psychology experiments help to explain some of the challenges you face with analyzing and presenting data?

Analysis, Featured, General, Presentation

Foundational Social Psychology Experiments (And Why Analysts Should Know Them) – Part 3 of 5

Digital Analytics is a relatively new field, and as such, we can learn a lot from other disciplines. This post continues exploring classic studies from social psychology, and what we analysts can learn from them.

Primacy and Recency Effects

The serial position effect (so named by Ebbinghaus in 1913) finds that we are most likely to recall the first and last items in a list, and least likely to recall those in the middle. For example, let’s say you are asked to recall apple, orange, banana, watermelon and pear. The serial position effect suggests that individuals are more likely to remember apple (the first item; primacy effect) and pear (the final item; recency effect) and less likely to remember orange, banana and watermelon.

The explanation cited is that the first item/s in a list are the most likely to have made it to long-term memory, and benefit from being repeated multiple times. (For example, we may think to ourselves, “Okay, remember apple. Now, apple and orange. Now, apple, orange and banana.”) The primacy effect is reduced when items are presented in quick succession (probably because we don’t have time to do that rehearsal!) and is more prominent when items are presented more slowly. Longer lists tend to see a decrease in the primacy effect (Murdock, 1962.)

The recency effect – that we’re more likely to remember the last items – is explained by the fact that the most recent item(s) are still contained within our short-term memory (remember, 7 +/- 2!) The items in the middle of the list, however, benefit from neither long-term nor short-term memory, and are therefore forgotten.

This doesn’t just affect your recall of random lists of items. When participants are given a list of attributes of a person, their order appears to matter. For example, Asch (1946) found participants told “Steve is smart, diligent, critical, impulsive, and jealous” had a positive evaluation of Steve, whereas participants told “Steve is jealous, impulsive, critical, diligent, and smart” had a negative evaluation of Steve. Even though the adjectives are exactly the same – only the order is different!

Why this matters for analysts: When you present information, your audience is unlikely to remember everything you tell them. So choose wisely. What do you lead with? What do you end with? And what do you prioritize lower, and save for the middle?

These findings may also affect the amount of information you provide at one time, and the cadence with which you do so. If you want more retained, you may wish to present smaller amounts of data more slowly, rather than rapid-firing with constant information. For example, rather than presenting twelve different “optimisation opportunities” at once, focusing on one may increase the likelihood that action is taken.

This is also an excellent argument against a 50-slide PowerPoint presentation – while you may have mentioned something in it, if it was 22 slides ago, the chance of your audience remembering it is slim.

The Halo Effect

Psychologists have found that our positive impressions in one area (for example, looks) can “bleed over” to our perceptions in another, unrelated area (for example, intelligence.) This has been termed the “halo effect.”

In 1977, Nisbett and Wilson conducted an experiment with university students. Two groups of students watched a video of the same lecturer delivering the same material, but one group saw a warm and friendly “version” of the lecturer, while the other saw the lecturer present in a cold and distant way. The group who saw the friendly version rated the lecturer as more attractive and likeable.

There are plenty of other examples of this. For example, “physically attractive” students have been found to receive higher grades and/or test scores than “unattractive” students at a variety of ages, including elementary school (Salvia, Algozzine, & Sheare, 1977; Zahr, 1985), high school (Felson, 1980) and college (Singer, 1964.) Thorndike (1920) found similar effects within the military, where a perception of a subordinate’s intelligence tended to lead to a perception of other positive characteristics such as loyalty or bravery.

Why this matters for analysts: The appearance of your reports/dashboards/analyses, the way you present to a group, your presentation style, even your appearance may affect how others judge your credibility and intelligence.

The Halo Effect can also influence the data you are analysing! It is common with surveys (especially in the case of lengthy surveys) that happy customers will simply respond “10/10” for everything, and unhappy customers will rate “1/10” for everything – even if parts of the experience differed from their overall perception. For example, if a customer had a poor shipping experience, they may extend that negative feeling about the interaction with the brand to all aspects of the interaction – even if only the last part was bad! (And note here: There’s a definite interplay between the Halo Effect and the Recency Effect!)

Stay tuned

More to come soon!

What are your thoughts? Do these pivotal social psychology experiments help to explain some of the challenges you face with analyzing and presenting data?

Analysis, Conferences/Community, Presentation, Reporting

Ten Tips For Presenting Data from MeasureCamp SF #1

Yesterday I got to attend my first MeasureCamp in San Francisco. The “Unconference” format was a lot of fun, and there were some fantastic presentations and discussions.

For those who requested it, my presentation on Data Visualization is now up on SlideShare. Please leave any questions or comments below! Thanks to those who attended.

Analysis, Featured, Presentation

Foundational Social Psychology Experiments (And Why Analysts Should Know Them) – Part 2 of 5

Digital Analytics is a relatively new field, and as such, we can learn a lot from other disciplines. This post continues exploring classic studies from social psychology, and what we analysts can learn from them.

Confirmation Bias

We know now that “the facts” may not persuade us, even when brought to our attention. However, Confirmation Bias tells us that we actively seek out information that reinforces our existing beliefs, rather than searching for all evidence and fully evaluating the possible explanations.

Wason (1960) conducted a study where participants were presented with a math problem: find the pattern in a series of numbers, such as “2-4-6.” Participants could create three subsequent sets of numbers to “test” their theory, and the researcher would confirm whether each set followed the pattern or not. Rather than collecting a list of possible patterns, and using their three “guesses” to prove or disprove each possible pattern, Wason found that participants would come up with a single hypothesis, then seek to prove it. (For example, they might hypothesize that “the pattern is even numbers” and check whether “8-10-12”, “6-8-10” and “20-30-40” correctly matched the pattern. When it was confirmed their guesses matched the pattern, they simply stopped. However, the actual pattern was “increasing numbers” – their hypothesis was not correct at all!)
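The logic of Wason’s task is easy to sketch in code. This is an illustrative simulation (not code from the study, and the triples are hypothetical examples): the hidden rule is simply “increasing numbers,” and a participant who only tests triples that fit their “even numbers” hypothesis never discovers it.

```python
# Illustrative sketch of Wason's 2-4-6 task (hypothetical guesses, not study data).

def hidden_rule(triple):
    """The experimenter's actual rule: each number is larger than the last."""
    a, b, c = triple
    return a < b < c

# Confirmatory strategy: only test triples that FIT the "even numbers" hypothesis.
confirming_guesses = [(8, 10, 12), (6, 8, 10), (20, 30, 40)]

# Falsifying strategy: also test triples that would BREAK that hypothesis.
falsifying_guesses = [(1, 2, 3), (5, 4, 3), (2, 2, 2)]

for triple in confirming_guesses:
    # All of these fit the hidden rule, so the (wrong) hypothesis is "confirmed"
    print(triple, "fits the rule:", hidden_rule(triple))

for triple in falsifying_guesses:
    # (1, 2, 3) also fits the rule, falsifying "even numbers" immediately
    print(triple, "fits the rule:", hidden_rule(triple))
```

Only the falsifying guesses reveal that evenness was never part of the rule – which is exactly the step Wason’s participants skipped.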

Why this matters for analysts: When you start analyzing data, where do you start? With a hunch, that you seek to prove, then stop your analysis there? (For example, “I think our website traffic is down because our paid search spend decreased.”) Or with multiple hypotheses, which you seek to disprove one by one? A great approach used in government, and outlined by Moe Kiss for its applicability to digital analytics, is the Analysis of Competing Hypotheses.
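A minimal sketch of the Analysis of Competing Hypotheses idea, using made-up hypotheses and evidence scores (all names and numbers here are hypothetical, not from the original post): score every piece of evidence against every hypothesis, then rank hypotheses by how little evidence contradicts them, rather than by how much confirms them.

```python
# Illustrative ACH sketch: hypothetical traffic-drop investigation.
hypotheses = [
    "Paid search spend decreased",
    "Seasonal dip",
    "Tracking broke on key pages",
]

# Each piece of evidence scored against each hypothesis (same order as above):
# +1 = consistent, 0 = neutral, -1 = inconsistent.
evidence = {
    "Traffic fell across all channels, not just paid": [-1, +1,  0],
    "A similar dip occurred this month last year":     [ 0, +1,  0],
    "Ad spend reports show spend was unchanged":       [-1,  0,  0],
    "A tag audit shows tracking firing correctly":     [ 0,  0, -1],
}

# Count how many evidence items contradict each hypothesis.
inconsistency = {h: 0 for h in hypotheses}
for scores in evidence.values():
    for h, s in zip(hypotheses, scores):
        if s < 0:
            inconsistency[h] += 1

# The strongest hypothesis is the LEAST contradicted one.
for h in sorted(hypotheses, key=lambda h: inconsistency[h]):
    print(f"{inconsistency[h]} inconsistent item(s): {h}")
```

In this toy example the “seasonal dip” hypothesis survives with zero contradictions, while the initial hunch about paid search is contradicted twice – the opposite of what a confirmation-driven analysis would conclude.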

Conformity to the Norm

In 1951, Asch found that we conform to the views of others, even when they are flat-out wrong, surprisingly often! He conducted an experiment where participants were seated in a group of eight others who were “in” on the experiment (“confederates.”) Participants were asked to judge which of three comparison lines was most similar in length to a target line. The task was not particularly “grey area” – there was an obvious right and wrong answer.

Each person in the group gave their answer verbally, in turn. The confederates were instructed to give the incorrect answer, and the participant was the sixth of the group to answer.

Asch was surprised to find that 76% of people conformed to others’ (incorrect) conclusions at least once. 5% always conformed to the incorrect answer. Only 25% never once agreed with the group’s incorrect answers. (The overall conformity rate was 33%.)

In follow up experiments, Asch found that if participants wrote down their answers, instead of saying them aloud, the conformity rate was only 12.5%. However, Deutsch and Gerard (1955) found a 23% conformity rate, even in situations of anonymity.

Why this matters for analysts: As mentioned previously, if new findings contradict existing beliefs, it may take more than just presenting new data. However, these conformity studies suggest that efforts to do so may be further hampered if you are presenting information to a group. It is less likely that people will stand up for your new findings against the norm of the group. In this case, you may be better off discussing your findings with individuals one-on-one, and avoiding putting people on the spot to agree/disagree within a group setting. Similarly, this argues against jumping straight to a “group brainstorming” session. Once in a group, Asch demonstrated that 76% of us will agree with the group (even if they’re wrong!) so we stand the best chance of getting more varied ideas and minimising “group think” by allowing for individual, uninhibited brainstorming and collection of all ideas first.

Stay tuned!

More to come next week. 

What are your thoughts? Do these pivotal social psychology experiments help to explain some of the challenges you face with analyzing and presenting data?

Analysis, Featured, Presentation

Foundational Social Psychology Experiments (And Why Analysts Should Know Them) – Part 1 of 5

Digital Analytics is a relatively new field, and as such, we can learn a lot from other disciplines. This series of posts looks at some classic studies from social psychology, and what we analysts can learn from them.

The Magic Number 7 (or, 7 +/- 2)

In 1956, George A. Miller conducted an experiment that found that the number of items a person can hold in working memory is seven, plus or minus two. However, all “items” are not created equal – our brain is able to “chunk” information to retain more. For example, if asked to remember seven words or even seven quotes, we can do so (we’re not limited to seven letters) because each word is an individual item or “chunk” of information. Similarly, we may be able to remember seven two-digit numbers, because each digit is not considered its own item.

Why this matters for analysts: This is critical to keep in mind as we are presenting data. Stephen Few argues that a dashboard must be confined to one page or screen. This is due to this limitation of working memory. You can’t expect people to look at a dashboard and draw conclusions about relationships between separate charts, tables, or numbers, while flipping back and forth constantly between pages, because this requires they retain too much information in working memory. Similarly, expecting stakeholders to recall and connect the dots between what you presented eleven slides ago is putting too great a pressure on working memory. We must work with people’s natural capabilities, and not against them.

When The Facts Don’t Matter

In 1957, Leon Festinger studied a Doomsday cult who believed that aliens would rescue them from a coming flood. Unsurprisingly, no flood (nor aliens) eventuated. In their book, When Prophecy Fails, Festinger et al commented, “A man with a conviction is a hard man to change. Tell him you disagree and he turns away. Show him facts or figures and he questions your sources. Appeal to logic and he fails to see your point … Suppose that he is presented with evidence, unequivocal and undeniable evidence, that his belief is wrong: what will happen? The individual will frequently emerge, not only unshaken, but even more convinced of the truth of his beliefs than ever before.”

In a 1967 study by Brock & Balloun, subjects listened to several messages played over a staticky recording, but could press a button to clear up the static. They found that people selectively chose to clear the static for messages that affirmed their existing beliefs. For example, smokers chose to listen more closely when the content disputed a smoking-cancer link.

However, Chanel, Luchini, Massoni, Vergnaud (2010) found that if we are given an opportunity to discuss the evidence and exchange arguments with someone (rather than just reading the evidence and pondering it alone) we are more likely to change our minds in the face of opposing facts.

Why this matters for analysts: Even if your data seems self-evident, if it goes against what the business has known, thought, or believed for some time, you may need more data to support your contrary viewpoint. You may also want to allow for plenty of time for discussion, rather than simply sending out your findings, as those discussions are critical to getting buy-in for this new viewpoint.

Stay tuned!

More to come tomorrow.

What are your thoughts? Do these pivotal social psychology experiments help to explain some of the challenges you face with analyzing and presenting data?

Adobe Analytics, Conferences/Community, Featured, Presentation, Testing and Optimization

Get Your Analytics Training On – Down Under!

Analytics Demystified is looking at potentially holding Analytics training in Sydney, in November of this year. We’re looking to gauge interest (given it’s a pretty long trip!)

Proposed sessions:

Adobe Analytics Top Gun with Adam Greco

Adobe Analytics, while being an extremely powerful web analytics tool, can be challenging to master. It is not uncommon for organisations using Adobe Analytics to only take advantage of 30%-40% of its functionality. If you would like your organisation to get the most out of its investment in Adobe Analytics, this “Top Gun” training class is for you. Unlike other training classes that cover the basics about how to configure Adobe Analytics, this one-day advanced class digs deeper into features you already know, and also covers many features that you may not have used. (Read more about Top Gun here.)

Cost: $1,200AUD
Date: Mon 6/11/17 (8 hours)

Data Visualisation and Expert Presentation with Michele Kiss

The best digital analysis in the world is ineffective without successful communication of the results. In this half-day workshop, Analytics Demystified Senior Partner Michele Kiss will share her advice for successfully presenting data to all audiences, including communication of numbers, data visualisation, dashboard best practices and effective storytelling and presentation. Want feedback on something you’re working on? Bring it along!

Cost: $600 AUD
Date: Fri 3/11/17 (4 hours)

Adobe Target and Optimization Best Practices with Brian Hawkins

Adobe Target has been going through considerable changes over the last year. A4T, at.js, Auto-Target, Auto-Allocate, and significant changes to Automated Personalisation. This half day session will dive into these concepts, as well as some heavy focus on the power of the Adobe Target profile and how it can be used as a key tool to advance personalisation efforts. Time will also be set aside to dive into proven organisational best practices that have helped organisations democratise test intake, work flow, dissemination of learnings and automating test learnings.

Cost: $600 AUD
Date: Fri 3/11/17 (4 hours)

[MeasureCamp Sydney is being proposed to be held on the Saturday, giving you a great reason to stay and hang out in Sydney over the weekend]

If you plan to attend, we need you to sign up here bit.ly/demystified-downunder so we can understand if there’s sufficient interest.

These trainings have not come to Australia before (and likely never will again!), so it’s an awesome opportunity to get a great training experience at a far lower cost than flying to the US!

This is not confirmed yet, so please do not book any travel (or anything else non-refundable) until you hear from us. Hope to see you all soon!!

* I’m allowed to say that, because I was born and raised in Australia (though I may no longer sound like it.) From the booming metropolis of Geelong! 

Presentation

What Analysts Can Learn About Presentation From Group Fitness Instructors

In my (vast amounts of) free time, I am a Les Mills group fitness instructor for BodyPump (weight training), RPM (indoor cycling) and CXWORX (functional core training.) While analytics and group fitness might seem very different, there are some surprising similarities between teaching a group fitness class, and presenting on analytics. Both involve sharing complicated information (exercise science, digital data) and trying to make this accessible and easily understood.

When we are trained as instructors, part of our training involves education on how different people learn, and how to teach to all of them. This is directly applicable to analytics!

There are three types of learners:

Visual learners need to see it to understand.

  • In group fitness, these participants need you to demonstrate a move, not explain it. You can talk till you’re blue in the face – it won’t make sense for them until you preview it.  
  • In analytics, this may mean visually displaying data, using diagrams, graphs and flow charts instead of data tables – and perhaps even hitting up the whiteboard from time to time.

Auditory learners need to hear it to process it.

  • In group fitness, they require a verbal explanation of exactly what you’re doing. Visual demonstrations may be lost on them.
  • In analytics, these are the people who need to hear your commentary and explanation of the analysis. Their eyes may glaze over at your slide deck, or charts, or reports, but if you talk to them, they can connect with that.

Kinesthetic learners need to feel it to understand, to experience what you’re talking about.

  • In group fitness, they need to be guided in how a move should feel. (For example, a squat is “like you’re reaching your bottom back to sit in a chair, that keeps getting pulled away.”) Analogies work well with this group.
  • In analytics, these are the people that need to be led through your logic, or guided through the user’s experience. It’s not enough to show them your findings, and to display the final results. They need a narrative, to walk through the experience, to understand it. (A good example: Have you ever attended a tool or technology training that went in one ear, and out the other, until you actually got some hands-on practice? That’s kinesthetic learning – you need to do it to retain it.)

Note: People are not necessarily 100% one style – they may be visual and kinesthetic, for example.

Now, here’s where it gets trickier. Whether you are teaching a group fitness class, or presenting an analysis, your audience won’t all be the same type of learner.

This means you need to explain the same thing in multiple ways, to ensure that your information resonates with every type of learner. For example, you might:

  • Use a graph, flowchart or whiteboard to appeal to your visual learners;
  • Talk through the explanation for your auditory learners; and
  • Provide a “sample user experience” for your kinesthetic learners. (“Imagine Debbie arrives on our site. She navigates to the Product page, clicks on the Videos button, and encounters Error X.”)

Keep in mind that you, too, have your own learning style. Your analysis and presentation style will likely lean towards your personal learning style, because that’s what makes sense to you. (If you are a visual learner, a visual presentation will come easily to you.) Therefore, you need to make a conscious effort to incorporate the learning styles you do not share.

Review your presentation through the lens of all three learners.

  • If you didn’t say a word, could visual learners follow your report/slides to understand your point?
  • If you had no slides, could your auditory learners follow you, just from your narrative?
  • Does your story (or examples) guide kinesthetic learners well through the problem?

By appealing to all three learning styles, you stand the best chance of your analysis resonating, and driving action.


What do you think? How do you learn best? Leave your thoughts in the comments! 

Republished based on a post from October 3, 2010

Analysis, Conferences/Community, google analytics, Presentation

Advanced Training for the Digital Analyst

In today’s competitive business environments, the expectations placed on digital analysts are extremely high. Not only do they need to be masters of the web analytics tools necessary for slicing data, creating segments, and extracting insights from fragmented bits of information…but they’re also expected to have fabulous relationships with their business stakeholders; to interpret poorly articulated business needs; to become expert storytellers; and to use the latest data visualization techniques to communicate complex data in simple business terms. It’s a tall order, and most businesses are challenged to find staff with the broad set of skills required to deliver insights and recommendations at the speed of business today.

In response to these challenges, Analytics Demystified has developed specific training courses and workshops designed to educate and inform the digital analyst on how to manage the high expectations placed on their job roles. Starting with Requirements Gathering the Demystified Way, we’ll teach you how to work with business stakeholders to establish measurement plans that answer burning business questions with clear and actionable data. Then in Advanced Google Analytics & Google Tag Manager, we’ll teach you or your teams how to get the most from your digital analytics tools. And finally in our workshops for digital analysts, attendees can learn about Data Visualization and Expert Presentation to put all their skills together and communicate data in a visually compelling way. Each of these courses is offered in our two day training session on October 13th & 14th. If any of these courses are of interest…read on:

 

Requirements Gathering the Demystified Way

Every business with a website goes through changes. Sometimes it’s a wholesale website redesign, other times a new microsite emerges, or maybe it’s small tweaks to navigation; features change, and sites are always evolving. This workshop, led by Analytics Demystified Senior Partner John Lovett, will teach you how to strategically measure new efforts coming from your digital teams. The workshop helps analysts collaborate with stakeholders, agencies, and other partners using our proven method to understand the goals and objectives of any new initiative. Once we understand the purpose, audience, and intent, we teach analysts how to develop a measurement plan capable of quantifying success. Backed with process and documentation templates, analysts will learn how to translate business questions into events and variables that produce data. But we don’t stop there…gaining user acceptance is critical to our methodology so that requirements are done right. During this workshop, we’ll not only teach analysts how to collect requirements and what to expect from stakeholders; we also have exercises to jumpstart the process and send analysts back to their desks with a gameplan for improving the requirements gathering process.

 

Advanced Google Analytics & Google Tag Manager

Getting the most out of Google Analytics isn’t just about a quick copy-paste of JavaScript. In this half-day training, you will learn how to leverage Google Analytics as a powerful enterprise tool. This session sets the foundation with basic implementation, but delves deeper into more advanced features in both Google Analytics and Google Tag Manager. We will also cover reporting and analysis capabilities and new features, including discussion of some exclusive Premium features. This session is suitable for users of both Classic and Universal Analytics, both Standard and Premium.

 

Data Visualization and Expert Presentation

The best digital analysis in the world is ineffective without successful communication of the results. In this half-day class, Web Analytics Demystified Senior Partners Michele Kiss and Tim Wilson share their advice for successfully presenting data to all audiences, including communication of numbers, data visualization, dashboard best practices and effective storytelling and presentation.

 

At Analytics Demystified we believe that people are the single most valuable asset in any digital analytics program. While process and technology are essential ingredients in the mix as well, without people your program will not function. This is why we encourage our clients, colleagues, and peers to invest in digital analytics education. We believe that the program we’re offering will help any Digital Analyst become a more valuable member of their team. Reach out to us at partners@analyticsdemystified.com to learn more, or if we’ve already convinced you, sign up to attend this year’s training on October 13th & 14th in San Francisco today!

Presentation

Making Tables of Numbers Comprehensible

I’m always amazed (read: dismayed) when I see the results of an analysis presented with a key set of the results delivered as a raw table of numbers. It is impossible to instantly comprehend a data table that has more than 3 or 4 rows and 3 or 4 columns. And, “instant comprehension” should be the goal of any presentation of information — it’s the hook that gets your audience’s brain wrapped around the material and ready to ponder it more deeply. Below are two different ways to use conditional formatting to convey information rather than data.

Heatmaps

An industry analyst report was released recently that summarized the scores from the analyst’s evaluation of multiple platforms across a number of dimensions. You may recognize the table below (although I’ve doctored the results enough that I’m not giving away any of the analyst’s intellectual property):

Original Comparison Table

This is a barely-dressed-up Excel spreadsheet. It takes some real staring at the table, including scanning and re-scanning the numbers, to realize that “Jupiter” is rated as the strongest offering. “Neptune” — the fourth vendor listed — appears to be the second strongest. To be fair, a high-level summary of these results is presented in a separate chart, but that chart is really high level.

Some things that, when I tried to wrap my own head around the table, seemed extraneous:

  • 2 decimal places — these scores were the roll-ups of dozens and dozens of individual scores that were, inherently, somewhat subjective. Two decimal places implies a precision that simply does not exist.
  • The actual component scores — in an evaluation like this, the reader really primarily cares about relative strength across each row rather than the absolute scores.
  • The weightings — the weighting matters, but it’s just a factor in the formulas that produce the overall scores; it’s not actually part of the results.

With that in mind, just to get a better understanding of the data myself, I grabbed the data and reformatted it using a heatmap. Each row is graduated separately based on the high/low values in that row:

Comparison Table Heatmap
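The per-row graduation that Excel’s conditional formatting applies can be sketched outside of Excel, too. Below is a minimal Python sketch of the same idea, using made-up vendor scores (not the actual report data): each row is scaled independently against its own minimum and maximum.

```python
# Hypothetical sketch: graduate one row of a comparison table,
# mapping scores to 0-1 intensities relative to that row's min/max
# (the same per-row scaling a heatmap's conditional formatting uses).

def row_heat(scores):
    """Return 0-1 color intensities for one row of scores."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                      # flat row: no variation to show
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Made-up scores for one dimension across five fictitious vendors
row = {"Jupiter": 4.2, "Mercury": 2.1, "Mars": 3.0, "Neptune": 3.9, "Saturn": 1.5}
heat = row_heat(list(row.values()))   # Jupiter maps to 1.0, Saturn to 0.0
```

Because each row is scaled on its own, the strongest vendor in a row always gets the deepest shade, regardless of how that row’s absolute scores compare to other rows’.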

I eliminated the actual display of the numbers for the components for each group, and I shifted the weighting off to the right (and lightened it and made it a smaller font). When I looked at this, I realized that the order of the vendors was simply alphabetical. While that is a logical order, and it may seem like a good way to “let the data speak for itself,” alphabetization is entirely arbitrary in this context. Why not arrange the platforms from overall strongest to weakest?

Comparison Table Heatmap

As it turns out, “Jupiter” happened to be first alphabetically and had the highest overall score. But, with this arrangement, we can quickly see where the different platforms stand out. For instance, “Mercury” is rated relatively lower on all dimensions except for “Employees,” where they were scored high relative to the other platforms. “Saturn” has only 3 areas where they are not the absolute weakest of the entire group.

Now, is the heatmap approach above the only way to present it? Do I think you have to omit the numbers from all of the cells? Of course not! But, it hardly seems arguable that the heatmap is much easier to digest than the raw data table.

Chartless Bars

A different type of data table is shown below. This one is a case where multiple metrics are shown across a single dimension (in this case, traffic source):

Table of Metrics

This is a reasonably formatted table of numbers, but reading and interpreting a bunch of numbers at a glance isn’t something we do well. Interestingly, the first “read” of this data actually comes from the number of digits in each metric rather than the numbers themselves. For instance, the higher conversion rate for email jumps out because it’s a longer number, rather than because it is numerically larger. As a matter of fact, the first three metrics actually each have a bit of a bar chart nature to them just because they have varying numbers of digits.

Understanding that length is a much faster/easier visual input than numerical digits, we can use conditional formatting to capitalize on that fact:

Table of Numbers with Conditional Bars

Or, if you have the room to allow a slightly wider chart, add a column for each metric so that the value and the bars don’t overlap:

Table of Numbers - Conditional Bars

In either case, it then becomes much easier to grasp which metrics for which rows are anomalous. For instance, the conversion rate for email — noted earlier — but also the visits from paid search.

Always More Than “Just The Numbers”

Hopefully, the examples here, rather than prescribing exactly how you must avoid raw tables of numbers like the plague, show how much more readily comprehensible tables of numbers can be with some quick visualizations of the data. What do you think? Do you have techniques you use to make data tables more readily digestible?

Presentation

Funnel Visualizations That Make Sense

It happened last week: reviewing a client’s weekly report that I inherited, and I just couldn’t take it any more:

funnel_sourceTweet

That prompted some Twitter back-and-forth with Todd Belcher and Alyson Murphy, which led to me popping off that I’d write a post showing some of my preferred alternatives. And…this is that post.

First, a Word on Funnel Atrociousness

Funnels, as a concept, make some sense (although someone once made a good argument that they make no sense, since, when the concept is applied by marketers, the funnel is really more a “very, very leaky funnel,” which would be a worthless funnel — real-world funnels get all of a liquid from a wide opening through a smaller spout; but, let’s not quibble). Major web analytics platforms, though, simply use a static image to represent the concept of a funnel, which, as it turns out, is a pretty horrible thing to do:

  1. Consumers of funnel visualizations are humans
  2. Humans process graphical elements much more readily than they process raw numbers
  3. Ergo, a statically proportioned funnel with real numbers on it obfuscates the actual information, Q.E.D.

So as not to pick on specific web analytics vendors (but ones that start with “A” and “G” are both guilty of this), below is a generic version of a “typical” funnel visualization:

Typical Awful Funnel

For some reason, analysts like to build this sort of thing in Excel. They also tend to do a fairly lousy job of availing themselves of the various alignment and spacing capabilities of Excel, so the funnel segments don’t quite line up. And, sometimes, they even introduce a 3D effect by adding some additional ovals and curved lines. The multi-coloring is just silly…but it happens.

With the image above, what is your eye drawn to? The labels of each step? Nope! The numbers inside each step? Again…nope! It’s drawn to the big (and static) trapezoids that make up the funnel. Now, imagine if you removed the labels and the numbers and only had the funnel (since we just determined that was the strongest visual element in the image). Would it tell you anything of use? No!!!

A Better Way: A Horizontal Bar Chart

I’ve got a host of alternate ways to represent this information. The most elaborate — but still a “build once and then just update the data” option — is to use a horizontal bar chart with two series in it:

funnel_chart

This is a “stairstep funnel,” but it’s still clearly a funnel, and the lengths of each bar are accurate representations of the values in each step. I’ve created this sort of visualization using data from “static” funnels in the past and immediately seen crazy/weird data that required a lot of scrutiny when just looking at the raw numbers on a static image. In the example above, I took a page from Adobe Analytics fallout reports (many users treat Adobe’s conversion funnels as fallout reports…although they’re quite different) and added “conversion from start” and “conversion from previous step” values. I also added in-cell conditional formatting for the “conversion from previous step” — it jumps out that the biggest falloff points are from viewing cart to starting checkout, and from entering payment info to completing the order. These values could also use color-coded conditional formatting, and those same steps would show as “red.”

To build the bar chart itself, I added two columns of data to split the total into a “negative half” and a “positive half.” These additional calculations can be somewhere off the printable area and hidden, but are shown right next to the base data below for illustrative purposes:

Data Table

It’s simply a matter of plotting two series on a horizontal bar chart (“-Half” and “+Half”), changing the series properties (for either series) to an “Overlap” of 100%, and setting both series to use the same fill color.
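As a rough illustration (in Python, with made-up step values rather than the actual spreadsheet), the “-Half”/“+Half” split amounts to this:

```python
# Sketch of the split-value technique: divide each funnel step's value
# into a negative half and a positive half. Plotted as two series on a
# horizontal bar chart with 100% overlap and the same fill color, each
# pair renders as one centered bar, producing a symmetrical funnel.
# Step names and counts below are hypothetical.

steps = [
    ("Viewed Product", 151131),
    ("Added to Cart", 40000),
    ("Started Checkout", 18000),
    ("Completed Order", 9500),
]

halves = [(label, -v / 2, v / 2) for label, v in steps]
```

The total bar length still equals the step’s value; splitting it around zero is purely a trick to center the bars on the chart’s axis.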

Compact Options: Conditional Formatting

The examples above are for situations when the funnel is a Big Deal in the overall set of data being presented. That is sometimes the case, but it’s not always the case. One way to provide a more compact view — a mini-funnel — is to use in-cell conditional formatting. I sometimes only show a “half funnel” this way:

To get the bars at the right, I added a simple formula to those cells to set the value equal to the value for that step. Then, I added conditional formatting with bars and selected the option to “Show data bar only.”

That, of course, doesn’t have to be a one-sided funnel. It’s not a perfect funnel, but using the same “split values” option described in the previous section, this can also be a symmetrical funnel:

If the white gap down the middle offends your OCD, you can always manually add boxes to fill that in.

Kick it Old School with REPT

If you’re using an old version of Excel — a version that doesn’t have the conditional formatting options introduced in Excel 2010 — then the REPT function is a handy way to get a similar effect. I used to do this all the time:

funnel_reptOneside

All this does is repeat the “|” symbol an appropriate number of times. The one wrinkle is that you have to divide your base number by some constant. Otherwise, you’d be trying to repeat the “|” 151,131 times for the first step, which wouldn’t work! In the example above, “151,131” is in cell C5. The formula in D5 is:  =REPT(“|”,C5/10000). Make sense?
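In Python terms, the REPT trick is just string repetition. This sketch mirrors the formula above, including the way Excel truncates a non-integer repeat count (the 10,000 divisor is simply a scaling constant chosen to keep the bars short):

```python
# Equivalent of =REPT("|", value/10000): repeat "|" value/divisor times,
# truncating the count to an integer as Excel's REPT does.

def rept_bar(value, divisor=10000):
    return "|" * int(value / divisor)

bar = rept_bar(151131)   # the first funnel step from the example: 15 bars
```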

To do a symmetrical funnel, you can actually use the identical format from above, but put a right-justified column next to a left-justified one:

funnel_reptTwoside

I like to think of this approach as the hipster option — the intentionally less crisp and clean way to achieve something that modern Excel already provides.

Are You Convinced?

These are just a few options. They’re all based on the same premise: humans process relative lengths much more easily than they process raw numbers. And, effective data visualization is all about removing friction from the cognitive funnel!

Excel Tips, General, Presentation

Excel: Charting Averages without Adding Columns

I was recently building out a pretty involved dashboard where, ultimately, I had about 50 different metrics that were available through various drilldowns in Excel. Beyond just the number of metrics (from multiple data sources), I wanted users of the dashboard to be able to select the report timeframe, whether to display the data trended weekly or monthly, and how many periods they wanted in the historical trend of the data. So, there was already some pretty serious dynamic named range action going on. But, I realized it would also be useful to include an average line on the metric charts to illustrate the mean (a target line is a related use case for this — that’s equally applicable and addressed at the end of the post). Basically, getting to a chart like this:

Chart Average

Now, the classic way to do this is to add a new column to the underlying data, put a formula in that column to calculate the average and repeat it in every cell. Then, simply add that data to the chart (a clustered column chart), select the average column and change the chart series type to be a line and “Voila!” there is the chart.

Plotting an Average - The Usual Way

But…50 metrics…built on multiple tabs of underlying data from different sources…that were relying on pivot tables and clever formula-age to change the timeframe, data granularity, and trend length… and my head started spinning. That was going to get messy! So, I figured out a way to accomplish the same thing without taking up any additional cells in the spreadsheet.

In a nutshell, there are just three steps to pull this off:

  1. Make the core data that is being plotted a named range (I was doing this already)
  2. Make a new named range that calculates the average of that named range and repeats it as many times as the original named range has values
  3. Add that new named range to the chart as a line

It’s the second step that is either a brilliant piece of baling wire or a shiny piece of duct tape, but no amount of Googling turned up a better approach, so I ran with it. If you know a better way, please comment!

Let’s break it down to a bit more detail.

Make the Data a Named Range

Okay, this is the easy part, and, in this example, it’s just a dumb, static range. But, more often than not, this would be something slicker — at least a column of a table or a dynamic named range of one flavor or another. But, that’s not really the point of this post, so let’s go with a simple named range called WidgetsSold:

Static Named Range

Make a New Named Range that Is the Average Line

Now, here’s where the fun happens. I made a second named range called “WidgetsSold_AverageLine” that looks like this:

Chart Average Named Range

 

See what that does? Let’s break it down:

  • WidgetsSold*0 — since WidgetsSold is a multicell range, it’s, essentially, an array. Multiplying that range by 0 makes an array of the same length with zeros for all of the values (whether it’s really an array in Excel-land, I don’t know — I tried to actually insert array formulas in the definition of the named range with no luck). Think of it as being an array that looks like this: {0,0,0,0,0,0,0,0,0,0,0,0}
  • +AVERAGE(WidgetsSold) — this actually takes the average of the WidgetsSold range and adds that to each of the zero values, so now we have a list/array/range where each value is the average of the original named range: {15493,15493,15493,15493,15493,15493,15493,15493,15493,15493,15493,15493}

Make sense? Cool, right?
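If it helps, here’s the same logic sketched in plain Python, with a made-up list standing in for the WidgetsSold range:

```python
# The named-formula trick in Python terms: multiply the range by zero to
# get a same-length array of zeros, then add the average to every
# element, yielding a constant series that plots as a flat line.
# The widgets_sold values are hypothetical.

widgets_sold = [12000, 15000, 18000, 14000, 16465]

avg = sum(widgets_sold) / len(widgets_sold)
average_line = [v * 0 + avg for v in widgets_sold]  # = WidgetsSold*0 + AVERAGE(WidgetsSold)
```

The `v * 0` step looks pointless on its own; its only job (as in the Excel named formula) is to produce an array the same length as the source data so the average repeats once per plotted point.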

Add that Line to the Chart

Now, it’s just a matter of adding a new data series to the chart referencing that named range. Remember that you have to include the name of your workbook in the Series values box:

Adding the Average Line to the Chart

And, there you have it!

A Few More Notes about This Approach

This post didn’t cover the step-by-step details on how to actually get the chart to play nice, but there are scads of posts that go into that. Heck, there are scads of posts on Jon Peltier’s site alone (like this one). But, here are a couple of other thoughts on this approach:

  • Because the average line named range is based solely off of the named range for the chart itself, it’s pretty robust — no matter how complex and dynamic you make the base named range, the formula for the average line named range stays exactly the same.
  • Having said that, in my dashboard, I actually made the formula a bit more complex, because I didn’t want to include the last period in the charted range in average (e.g., if I was viewing data for October and had data trended from June to October, I only wanted the average to be for June through September). That’s a pretty straightforward adjustment, but this post is already long enough!
  • This example was for the average, but, what if, instead, you wanted to plot a target line, where the target for the data was a fixed number? The same approach applies, and you’re not stuck duplicating your target data across multiple cells.

What do you think? Do you have a simpler way?

[Update] And…a (Brief) Case Cautioning Against this Approach

Jon Peltier pointed out that, while named ranges, when used to refer to ranges of data, make a lot of sense, named formulas like the one described in this post have some downsides. Compiling the multi-part tweet where he described these:

You can use named formulas (“Names”) in Excel worksheets and charts. Named formulas are clever, dynamic, and flexible. Names are also hidden, “magical,” and hard to create, modify, understand, and maintain. In 6 months, try to recall how your Name works. Or someone else’s. Try to explain Names to the Sarbox auditors. Using worksheet space (“helper” columns) is cheap, fast, visible, traceable, easy to work with. Whenever possible, limit use of Names to those that reference regions of the worksheet.

Excellent points!

 

Presentation

#eMetrics Reflection: Data Visualization (Still!) Matters

I’m chunking up my reflections on last week’s eMetrics conference in San Francisco into several posts. I’ve got a list of eight possible topics, but I seriously doubt I’ll manage to cover all of them.

On Tuesday, I attended Ian Lurie’s presentation: “Data That Persuades: How to Prove Your Point.” This session was a “fist pumper” for me, as Ian is as frustrated by crappy data visualization as I am (he led off the presentation by showing a mouth guard, sharing that he wears one at night because he grinds his teeth, and then noting that poorly presented data was a big source of the stress driving that grinding!).

One of the ways Ian illustrated the importance of putting care into the way data gets presented was with this image:

Read, React, Respond

I think it’s fair to say this is a representation of the three types of memory:

  • The “lizard brain” represents iconic memory — the “visual sensory register.” It’s where preattentive cognitive processing occurs. If we don’t put something forth that is clear and instantaneously perceptible, then the information won’t get past the lizard brain.
  • The “ape brain” represents short-term memory — where conscious thought and basic processing occurs. The initial, “Do I care?” question gets asked and answered.
  • The “human brain” represents longer-term memory — where we actually need to digest the information and develop and implement a response.

Ian also spent a lot of time on Tufte’s data-ink ratio — imploring the audience to be heavily reductionist in the visualization of data by removing extraneous words, lines, tick marks, etc. so that “the data” really comes through.

Otherwise, the recipients of the data will be like screaming goats:

Screaming Goat

Analytics Strategy, Presentation

#AdobeSummit Takeaways: My Favorite Tips

I’ve written several posts with different reflections on my Adobe Summit 2013 experience. You can see a list of all of them by going to my Adobe Summit tag.

This post isn’t long, but I picked up a few real nuggets of brilliance that were very tactical tips that I’ll be exploring further in my day job in the next week or two.

Finding Questions in Site Search

Nancy Koons might be the nicest person on the planet (feel free to leave a comment if you think you know someone nicer) and also is the source of two of my favorite tips (neither of which is at all Adobe-specific).

I’m a fan of site search data (I even wrote a Practical eCommerce article on the subject last year). Nancy set up the tip by explaining why site search analytics makes sense, but then she gave this tip:

“Filter your site search terms report by the words: who, what, why, where, and how.”

Literally. Filter for those 5 words. What this will give you is a list of results that are full questions people typed into your search box. These are all going to be unique — they’ll be wayyyy out on the long tail of the report. But they’re also context-rich. They tell you exactly what the visitor was trying to do.
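Outside of any particular analytics tool, the tip boils down to a simple word-match filter. A hypothetical Python sketch, with made-up search terms:

```python
# Filter a site-search terms report down to full questions by keeping
# only terms containing one of the five question words. Terms are
# fabricated for illustration.

QUESTION_WORDS = {"who", "what", "why", "where", "how"}

def question_searches(terms):
    return [t for t in terms
            if QUESTION_WORDS & set(t.lower().split())]

terms = [
    "running shoes",
    "how do i return an item",
    "gift cards",
    "where is my order",
]
questions = question_searches(terms)   # keeps only the full-question searches
```

Matching on whole words (via `split()`) rather than substrings avoids false hits like “showroom” matching “how.”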

Cool, huh?

A Poster of Insights

This next tip is also completely to Nancy’s credit. The entire panel touched on the need to not just do analysis, but to effectively communicate the results. Nancy shared a situation where her team was doing a “year in review” and had a number of useful insights that they had turned up over the course of the year. The challenge: how to communicate them in a way that they wouldn’t be forgotten at the point when they would be most useful to apply in the coming year?

The solution: a printed poster that captured the insights that would be most applicable in the coming year. The poster was heavily designed — almost infographic-level detail. The posters were good-sized — they looked to be 24-30″ wide and maybe 15″ tall — and were distributed to the marketers to put up in their offices. Brilliant! A constant reminder/reference of the most useful lessons from the prior year!

Report Builder…

There were several tips that were geared towards “don’t present the data directly from within SiteCatalyst,” which meant Report Builder and Excel got some real love. Report Builder is a great way to get automated data updates into Excel, where the richer visualization options for the platform can be put to full use.

If you want to hone your Report Builder and Excel chops, consider Kevin Willeitner’s class this fall in Columbus (and stick around for #ACCELERATE).

Context Variables in SiteCatalyst

I’m not proud. I’ll admit that I totally missed context variables in the v15 release…until Ben Gaines explained them in his “10 tips” session. Basically, they remove developer confusion over the difference between props, eVars, and events.

Did You Pick Up a Favorite Tip?

I got a number of other little nuggets and ideas, but these were the ones I most felt like I’d be putting to use almost immediately. What did you take back from Salt Lake City that you’ll be putting into action soon?


Presentation

Why Data Visualization Matters: It's Funnel Optimization

One of the reasons I like to give presentations at conferences is because it forces me to really, really, really, crystallize my thoughts. When I’m writing a blog post, I’m generally just trying to get an idea into some sort of coherent form, but conference presentations, for me, have a much higher bar for clarity and concision.

Part of my presentation at the Austin DAA Symposium earlier this month focused on data visualization. It didn’t go very deep into the mechanics of effective data visualization, but I did try to make a strong case that the topic really matters.

Driving Action

As analysts, our ultimate goal is to drive action that delivers business value. Stop and consider what is involved in “driving action:”

A person who is empowered to act must make a decision to act.

So, really, what we’re talking about here is impacting a decision by a human being, and, if we consider that:

A decision is made based on thoughts and ideas in the brain.

That means that, as analysts, it behooves us to understand a little bit about how the brain works.

Neuroscience says…

Two guys who have had a strong professional influence on me are Stephen Few and John Medina:

Both books provide descriptions of the different types of memory, and both provide various tips for getting information to long-term memory, which is where information needs to be in order for a person to decide to act (and then follow through on that decision).

Taking those concepts and morphing them a bit cheekily into the marketing vernacular of “the funnel,” we’re talking about memory looking like this:

The Memory Funnel

Ultimately, if we don’t get the key points of our analysis into long-term memory, then there is little hope of action being taken. Just as eCommerce sites have to optimize their purchase funnel, as analysts, we need to optimize the memory funnel when presenting results.

In the case of the memory funnel, the steps are actually much more distinct than the awareness/consideration/preference/etc. steps in the marketing funnel. They’re distinct…but they have some unpleasant realities:

  • Iconic memory — this is also called the “visual sensory register,” and it’s where “preattentive cognitive processing” occurs. We are constantly bombarded with information, and our iconic memory is the first point at which we are aware — subconsciously aware — of every bit of information in our field of view. Instantaneously, we are deciding which information actually merits our attention. This means that, instantaneously, we are discarding most of what we see! If a chart is unclear, our iconic memory may very well shift focus to the clock on the wall or the ugly tie being worn by the fellow sitting next to the analyst. Iconic memory is fickle and fleeting!
  • Short-term memory — this is where we actually focus and “think about what we’re seeing.” It’s that thought that is going to decide whether or not the information gets passed along to long-term memory. But, here’s the real kicker when it comes to short-term memory: it can only hold 3 to 9 pieces of visual information at once. It’s our RAM…but it’s RAM circa 1992, in that it has very limited capacity. The more extraneous information we include in our analysis results, the more we risk a buffer overrun. And, if short-term memory can’t fully make sense of the information, then it’s going to fall out of the funnel then and there.

“Sight” is the sense that we are forced to heavily rely on to communicate the results of our analyses. There is a lot of visual clutter occurring in our audiences’ worlds that we can’t control, and we’re competing with that visual clutter any time we deliver the results of our work. It behooves us to compete as effectively as we possibly can by effectively visualizing the information we are communicating.

Analysis, Analytics Strategy, Presentation

Effectively Communicating Analysis Results

I was fortunate enough to not only get to attend the Austin DAA Symposium this week, but to get to deliver one of the keynotes. The event itself was fantastic — a half day that seemed to end pretty much as soon as it started, but in which I felt like I had a number of great conversations, learned a few things, and got to catch up with some great people whom I haven’t seen in a while.

The topic of my keynote was “Effectively Communicating Analysis Results,” and, as sometimes tends to happen between the writing of the description and the actual creation of the content, the scope morphed a bit by the time the symposium arrived.

My theme, ultimately, was that, as analysts, we have to play a lot of roles that aren’t “the person who does analysis” if we really want to be effective. I illustrated why that is the case…in a pie chart (I compensated by explaining that pie charts are evil later in the presentation). The pie chart was showing, figuratively, a breakdown of all of the factors that actually contribute to an analysis driving meaningful and positive action by the business:

What Goes Into Effective Analysis

 

The roles? Well:

  • Translator
  • Cartographer
  • Process Manager
  • Communicator
  • Neuroscientist
  • Knowledge Manager

I recorded one of my dry runs, which is available as a 38-minute video, and the slides themselves are available as well, over on the Clearhead blog.

It was a fun presentation to develop and deliver, and a fantastic event!

Excel Tips, Presentation

Small Charts in Excel: Beyond Sparklines, Still Economical

I’m a fan of Stephen Few; pretty much, always have been, and, pretty much, always will be. When developing dashboards, reports, and analysis results, it’s not uncommon at all for me to consciously consider some Few-oriented data visualization principles.

One of those principles is “maximize the data-pixel ratio,” which is a derivation of Edward R. Tufte’s “data-ink ratio.” The concept is pretty simple: devote as much of the non-white space to actually representing data and as little as possible to decoration and structure. It’s a brilliant concept, and I’m closing in on five years since I dedicated an entire blog post to it.

Another Tufte-inspired technique that Few is a big fan of is the “sparkline.” Simply put, a sparkline is a chart that is nothing but the line of data:

Small Charts: Sparkline

In Few’s words (from his book, Information Dashboard Design: The Effective Visual Communication of Data):

Sparklines are not meant to provide the quantitative precision of a normal line graph. Their whole purpose is to provide a quick sense of historical context to enrich the meaning of the measure.

When Few designs (or critiques) a dashboard, he is a fan of sparklines. He believes (rightly), that dashboards need to fit on a single screen (for cognitive processing realities that are beyond the scope of this post), and sparklines are a great way to provide additional context about a metric in a very economical space.

Wow! Sparklines ROCK!

But, still…sparklines are easy to criticize. In different situations, the lack of the following aspects of “context” can be pretty limiting:

  • What is the timeframe covered by the sparkline? Generally, a dashboard will cover a set time period that is displayed elsewhere on the dashboard. But, it can be unclear whether the sparkline shows the variation of the metric within the report period (the last two weeks, for instance) or, rather, a much longer period so that the user has greater historical context.
  • What is the granularity of the data? In other words, is each point on the sparkline a day? A week? A month?
  • How much is the metric really varying over time? The full vertical range of a sparkline tends to be from the smallest number to the largest number in the included data. That means a metric that is varying +/-50% from the average value can have a sparkline that looks almost identical to one that is varying +/-2%.
  • How has the metric compared to the target over time? The latest value for the metric may be separately shown as a fixed number with a comparison to a prior period.  But, the sparkline doesn’t show how the metric has been trending relative to the target (Have we been consistently below target? Consistently above target? Inconsistent relative to target?).
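
The third point above is easy to demonstrate. Because a sparkline's vertical range typically runs from its own minimum to its own maximum, a metric swinging ±50% around its average and one swinging ±2% can produce identical shapes once scaled. A quick illustration in Python, with made-up numbers:

```python
def minmax_scale(values):
    """Scale values to the 0-1 range, which is effectively what a
    sparkline's auto-fitted y-axis does."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

pattern = [0.0, 1.0, -0.5, 0.5, -1.0, 0.25]      # one underlying shape
avg = 1000
wild = [avg * (1 + 0.50 * p) for p in pattern]   # varies +/-50% around the average
calm = [avg * (1 + 0.02 * p) for p in pattern]   # varies +/-2% around the average

# Scaled to their own ranges, the two sparklines are visually identical.
scaled_wild = minmax_scale(wild)
scaled_calm = minmax_scale(calm)
print(all(abs(a - b) < 1e-9 for a, b in zip(scaled_wild, scaled_calm)))  # → True
```

Same shape, wildly different magnitudes, which is exactly why a sparkline alone can mislead.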

So, sparklines aren’t a magic bullet.

So, What’s an Alternative?

While I do use sparklines, I’ve found myself also using “small charts” more often, especially when it comes to KPIs. A small chart, developed with a healthy layer of data-pixel ratio awareness, can be both data-rich and space-economical.

Let’s take the following data set, which is a fictitious set of data showing a site’s conversion rate by day over a two-week period, as well as the conversion rate for the two weeks prior:

Small Charts: Sample Data

If we just plot the data with Excel’s (utterly horrid) default line chart, it looks like this:

Small Charts: Default Excel

Right off the bat, we can make the chart smaller without losing any data clarity by moving the legend to the top, dropping the “.00” that is on every number in the y-axis, and removing the outer border:

Small Charts: Smaller Step 1

The chart above still has an awful lot of “decoration” and not enough weight for the core data, so let’s drop the font size and color for the axis labels, remove the tick marks from both axes and the line itself from the y-axis, and lighten up the gridlines. And, to make it more clear which is the “main” data, and to make the chart more color-blind friendly in the process, let’s change the “2 Weeks Prior” line to be thinner and gray:

Small Charts: Smaller Step 2

Now, if the fact that the dates are diagonal isn’t bugging you, you’re just not paying attention. Did you realize that your head is cocked ever so slightly to the left as you’re reading this post?

We could simply remove the dates entirely:

Small Charts: Smaller Step 3 (Too Far)

That certainly removes the diagonal text, and it lets us shrink the chart further, but it’s a bit extreme — we’ve lost our ability to determine the time range covered by the data, and, in the process, we’ve lost an easy way to tell the granularity of the data.

What if, instead, we simply provide the first and last date in the range? We get this:

Small Charts: Smaller Final

Voila!

In this example, I’ve reduced the area of the chart by 60% and (I claim) improved the readability of the data! The “actual value” — either for the last data point or for the entire range — should also be included in the display (next to or above the chart). And, if a convention of the heavy line as the metric and the lighter gray line as the compare is used across the dashboard or the report, then the legend can be removed and the chart size can be further reduced.

That’s Cool, but How Did You Do Just the First and Last Dates?

Excel doesn’t natively provide a “first and last date only” capability, but it’s still pretty easy to make the chart show up that way.

In this example, I simply added a “Chart Date” column and used the new column for the x-axis labels:

Small Charts: Sample Data with First and Last Date Column

The real-world case that inspired this post allows the user to change the start and end date for the report, so the number of rows in the underlying data varies. So, rather than simply copying the dates over to that column, I put the following formula in cell D3 and then dragged it down to autofill a number of additional rows. That way, Excel automatically figured out where the “last date” value should be displayed:

=IF(AND(A3<>"",A4=""),A3,"")

What that formula does is look in the main date column, and, if the current row has a date and the next row has no date, then the current row must be the last row, so the date is displayed. Otherwise, the cell is left blank.
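
For readers building these labels outside of Excel, the same “show only the first and last date” idea is easy to sketch in Python (a hypothetical helper, not part of the original workbook):

```python
def chart_dates(dates):
    """Return axis labels where only the first and last dates are shown,
    mirroring the worksheet approach above: every other label is blanked
    so the chart can shrink without losing the time range."""
    return [d if i == 0 or i == len(dates) - 1 else ""
            for i, d in enumerate(dates)]

print(chart_dates(["6/1", "6/2", "6/3", "6/4", "6/5"]))
# → ['6/1', '', '', '', '6/5']
```

Any charting tool that accepts a list of tick labels can consume the result directly.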

Neither a Sparkline Nor a Full Chart Replacement

To be clear, I’m not proposing that a small chart is a replacement for either sparklines or full-on charts. Even these small charts take up much more screen real estate than a sparkline, and small charts aren’t great for showing more than a couple of metrics at once or for including data labels with the actual data values.

But, they’re a nice in-between option that are reasonably high on information content while remaining reasonably tight on screen real estate.

Presentation

How Communicating Analytics Is Like New York City

I joined a new company, Clearhead, at the beginning of September, and it’s been a fun-tiring-exciting ride thus far (unfortunately, it hasn’t been a particularly prolific one with respect to this site!). One of the core tenets of the company – something that we are all passionate about because we have all seen it go horribly awry – is that we will always deliver information in a way that is clear, concise, (as) simple (as possible), and elegant. There’s a reason I have Data Visualization as one of the categories for this blog – I’ve long believed that there is an inordinate amount of incomprehensible charts, graphs, and tables being emailed and presented by analysts and marketers. We. Need. To. Do. Better! (as an industry).

Same Idea, but From Another Angle

So…shift to my recent trip to New York City. I was with the co-founders of the company, Matt and Ryan, who were both long-time New York City residents before moving to Austin (Ryan is a San Antonio native and has an affinity for The Big Apple that I find both baffling and moderately traitorous…but I’m sure that is a phase that will pass in due time as he gets re-acquainted with Austin). The last apartment Ryan lived in before returning to the Lone Star State was at 21st and 1st.

That’s important, so I’m going to write it again (I could make it really big and bold, but it’s not that kind of important, so we’ll just go with italics): Ryan lived in an apartment at 21st and 1st.

Here’s what’s interesting about that statement: you just read it, and you will be able to place yourself in one – and only one – of the two following groups:

  • You immediately knew that he lived in Midtown Manhattan (Midtown East, even) and had a mental image, if not of the exact intersection, then of a street/building/intersection reasonably nearby
  • You registered the location mentally as “somewhere in New York City.”

I’ve now been to that exact apartment – and to that intersection – several times…and I still fall in the latter category. That’s not because I’m particularly slow or non-observant. It’s because I’ve never lived in New York, have never spent more than 4 consecutive days there, and have only rarely needed to get around the city on my own, rather than simply tagging along with a local.

In short, I don’t speak “areas of Manhattan” with any degree of fluency. I cognitively know that the “Lower East Side” is generally towards the bottom and to the right of a north-oriented map of the island. But, I can’t tell you the vibe and character of that area. I can’t tell you what the main landmarks are there. I can’t tell you what the main thoroughfares are that bisect the area.

Now, you have read the past couple of paragraphs and thought one of two things:

  • “Seriously? He knows nothing about the Lower East Side other than what the three words ‘lower,’ ‘east,’ and ‘side’ describe?”
  • “Why is Tim belaboring this? Obviously – he hasn’t spent a lot of time in the city, so he doesn’t really intuitively know what is where.”

What’s interesting (borderline fascinating, really, if you’re into brain stuff) is that one of the statements above made total sense to you, and the other one seemed totally foreign. It’s like listening to a couple of people having an animated conversation in a foreign (to you) language. They are clearly communicating without any effort whatsoever, and, yet, it is insanely difficult to actually imagine how what sounds like fast-paced gibberish to you could possibly be clearly transmitting very real information and ideas.

The key, in both cases, is that everyone’s brain is wired differently, and the synaptic paths that have been traversed hundreds of times with different visual and experiential reinforcement (the Lower East Side, daily conversation in German, etc.) by one person have barely been traveled at all by others.

And, Yes, I Have a Point to All of This

As analysts, when we discuss, visualize, or present data, we are often the equivalent of a native New Yorker coordinating a visit with someone raised in Sour Lake, Texas (such as yours truly). Just as Matt and Ryan quickly learned that they could not skip any steps in guiding me from JFK to 23rd and 3rd, as analysts, we have to work really hard to speak in the visual language of the people to whom we’re delivering information. We have to minimize “the data” that gets presented and maximize “the meaning.”

The next time you get a blank look from someone to whom you are delivering the results of an analysis, stop and ask yourself if it’s because you’re a native New Yorker talking to someone who only visits occasionally. It’s not a knock against that person at all – the onus is on the native to be a good host and to figure out the best way to present the information in a way that it can be quickly and simply received.

Analytics Strategy, Excel Tips, Presentation

Data Visualization Tips and Concepts (Monish Datta calls it "stellar")*

Columbus Web Analytics Wednesday was sponsored by Resource Interactive last week, and it was, as usual, a fun and engaging event:

Web Analytics Wednesday -- Attendees settling in

We tried a new venue — the Winking Lizard on Bethel Road — and were pretty pleased with the accommodations (private room, private bar, very reasonable prices), so I expect we’ll be back.

Relatively new dad Bryan Cristina had a child care conflict with his wife…so he brought along Isabella (who was phenomenally calm and well-behaved, and is cute as a button!):

Bryan and Isabella

I presented on a topic I’m fairly passionate about — data visualization. The presentation was well-received (Monish Datta really did tweet that it was “stellar”)  and generated a lot of good discussion. I had several requests for copies of the presentation, so I’ve modified it slightly to make it more Slideshare-friendly and posted it. If you click through on the embedded version below, you can see the notes for each slide by clicking on the “Notes on Slide X” tab underneath the slideshow, or you can download the file itself (PowerPoint 2007), which includes notes with each slide (I think you might have to create/login to a Slideshare account, which it looks like you can do quickly using Facebook Connect).

I had fun putting the presentation together, as this is definitely a topic that I’m passionate about!

* The “Monish Datta” reference in the title of this post, while accurate, is driven by my never-ending quest to dominate search rankings for searches for Monish. I’m doing okay, but not exactly dominating.


Presentation

How Succinctly Can I Explain Why Pie Charts Are Evil?

I’m right at three months into my new gig, and, around the office, probably the most commonly known fact is, “He hates pie charts.” It’s not that I’ve exactly been standing at the elevator handing out leaflets explaining why pie charts are evil, but I have, apparently, chosen a couple of particularly public venues to make a mild statement or two. And, the quasi-preplanned visceral groan when some co-workers put up a pie chart might’ve contributed just a teensy bit.

I’ve been put on the spot since then a couple of times to do one of two things:

  • Explain why pie charts are evil, or
  • Agree that one or another particular usage of a pie chart is appropriate

After catching up on some blog reading yesterday morning and seeing an excellent example of pie chart alternatives from Jon Peltier, and then watching seven presentations yesterday, six of which used the same basic presentation template, and five of which stuck with a pie chart for the sole non-text slide in the presentation, how could I not write another post?! Let’s see how succinct I can make it (don’t hold your breath that you could read the whole thing before exhaling!).

Yes, There is ONE Thing That a Pie Chart Does Well

This kills me, because there’s one way, in a very narrow set of circumstances, that pie charts do marginally better than alternatives. All THREE of the following criteria have to be met for this to be the case:

  • Exactly 2 or 3 categories that make up the “whole”
  • A fairly significant difference in % makeup for each of the categories
  • Plenty of space available to present the information

99 times out of 100 when pie charts get used, these criteria are not all met. But, there, I’ve admitted that there is a situation where pie charts are appropriate.

Of course, mullets are an appropriate hairstyle if you are prone to both warm ears and spontaneous hair donations…but that doesn’t mean I’m going to sport one!

Of Course, We Must Start with a Before/After Example

With only the category names changed, below is one of the pie charts I saw yesterday:

Pie Chart Example

In my experience, a simple horizontal bar chart is a better option (among a variety of better options):

Bar Chart Example

Why is this a better option? Oh, let me count the ways…

1. Rainbows Are Good in Princess Tales — Not in Data Visualization

When it comes to data visualization, a chart that doesn’t rely on multiple colors always trumps a chart that does. Four reasons:

  • If you use subtle/muted colors, you can’t get past 4 or 5 categories before you are asking the person reading the chart to work hard to distinguish between subtle shading differences
  • If you use bright/high-contrast colors, you’re asking your user to put on sunglasses to keep from wincing at the visual overkill
  • Roughly 10% of men suffer from some form of color-blindness — it’s darn tricky to nail a palette with more than a small handful of colors that works across the various types of the condition (of course, if you’ve got a secret agenda to have women take over the world, this is one way to contribute, as color blindness is exceedingly rare in women)
  • Maybe you’re presenting your chart in glorious, projected color…but are you sure no one is going to try to print it in black-and-white?

These are all issues with any pie chart that has more than 3 categories. None of these are an issue with a horizontal bar chart.

2. Labels, Labels, Labels

If you’ve ever constructed a pie chart in Excel, you’ve run into the challenge of trying to get all of the wedges labeled right there on the chart. Excel continues to make odd choices as to where to wrap text in pie charts, and the circular nature of the whole layout means some wedges have plenty of horizontal labeling room, while others have almost none. You’ve tried some (or all) of the following:

  • Using leader lines for some of the wedges so you can label the most troubling wedges somewhere more spacious
  • Abbreviating the category names
  • Strategically rotating the chart so that the labeling all happens to work (it never does)
  • Rearranging the underlying data so that the pie wedges occur in a different order (which also never works)

After fiddling with the above, you finally break down and yank the labels from the chart and just use a legend. This is bad, bad, BAD! Scroll back up to the pie chart example above and pretend you’re actually trying to interpret the data, but pay attention to how many times you look back and forth between the legend and the pie. This is putting a totally unnecessary strain on your brain! Take a look at the horizontal bar chart — no jumping back and forth needed!

With a horizontal bar chart, the label sits right next to the data, and it doesn’t need to be abbreviated to do so (this is one reason that I find horizontal bar charts to be better than vertical column charts in many cases — with a horizontal orientation, the labels have more width with which to work).

3. Those Pesky Near-Zero Values

Pie charts suck at the small percentages. Small percentage categories wreak havoc on the labeling issue, for sure, but they’re also nearly impossible to compare to each other. In the example above, the smallest percentage is 3%, and that’s almost manageable. But, heaven forbid you have a couple of pesky sub-one-percent categories, and you’re looking at wedges that look suspiciously like the lines between wedges.

4. Seeing Small Differences

Fundoogles & Flibbers came in at 3%, while Dracula’s Mickety Micks came in at 5%. Do the wedge sizes really look different? That’s a fundamental challenge with pie charts — we don’t do a very good job of comparing the areas of these odd sorta-triangular-but-with-one-curved-side shapes. In the case of the bar chart, all you have to compare is lengths — much easier.

5. Economy (of Space) Is a Virtue

Check out the overall size of the charts. While they have the same font size, the same text displayed, and the same width, the bar chart is 20% shorter…and it could have been shorter still! Bar charts are more efficient space-wise. With pie charts, and largely because of the other issues listed above, it’s often necessary to make the chart larger and larger to make it readable.

Of Course, This Example Was At Least Flat

This post would be twice as long if I went into the additional issues of using the “3D effect” version of the pie chart.

[Update] Always Room for Improvement

Of course, the danger of posting a “here’s a better way” is that you leave yourself open for suggestions as to how the better way can be improved! See Naomi’s comment below. She raises a good point — basically, that I didn’t do a great job of heeding the data-pixel ratio with my bar chart! So, below is a revised version.

Bar Chart Example

In a subsequent email exchange, Naomi made the case for keeping the x-axis and the numbers, but simply removing the “%” signs entirely and putting the word “Percent” in the axis label:

Bar Chart Example

Her main point is that numbers can be read more easily if they are not cluttered with symbols like dollar signs and percent signs. And, her case for keeping the gridlines and labeled axis is that it helps show that the bars are drawn to scale — there hasn’t been any incorrect or misleading scaling (intentional or not — in the same spate of presentations that spurred this post, there was a bar chart with an accompanying table of data…and one of the bars was clearly not accurate).

I’m partial to the version with all of the lines removed, but, at this point, the debate is at a much healthier level than “pie vs. bar,” so I’m happy!

Excel Tips, Presentation

An Excel Dashboard Widget

As I wrote in my last post, I’ve been spending a lot of time building out Excel-based dashboard structures and processes of late. I also wrote a few weeks ago about calculating trend indicators. A natural follow-on to both of those posts is a look at the “metric widget” that I use as a basis for much of the information that goes on a dashboard. Below is an example of part of a web site dashboard (not with real data):

Sparkline Widgets

I’ll walk through some of the components here in detail, but, first, a handful of key points:

  • There is no redundant information — it’s not uncommon to see dashboards (or reports in general) where there is a table of data, and that table of data gets charted, and the values for each point on the chart then get included as data labels. This is wasteful and unnecessary.
  • Hopefully, your eyes are drawn to the bold red elements (and these highlights should still pop out for users with the most common forms of colorblindness — I haven’t formally tested that yet, though) — this is really the practical application of the vision I laid out in my Perfect Dashboard post.
  • I have yet to produce a dashboard solely comprised of these widgets — there are always a few KPIs that need to be given more prominent treatment, and there are other metrics that don’t make sense in this sparkline/trend/current format
  • I do mix up the specific measures on a dashboard-by-dashboard basis. In the example above, showing the past two years of trends by month, and then providing quarterly totals and comparisons, makes the most sense based on the planning cycle for the client. But, that certainly is not a structure that makes sense in all situations.

And now onto the explanation of the what and why of each element, working our way from left to right.

Metric Name

This one hardly warrants an explanation, but I’ll point out that I didn’t label that column. That was a conscious decision — the fact that these are the names of the metric is totally obvious, and Edward Tufte’s data-ink ratio dictates that, if it doesn’t add value, don’t include it!

Past 12 Months Sparkline

The sparkline is another Tufte invention, and it’s one that has really taken off in the data visualization space. That’s good, because sparklines are darn handy, and the more people get used to seeing them, the less there will need to be any “training” of dashboard users to interpret them. Google Analytics has been using sparklines for a while, even, so we’re well on our way to mass adoption!

Google Analytics Sparkline

One tweak on the sparkline front that I came up with (although I’m sure others have done something similar): I add a second, gray sparkline for either the target or the prior reporting period. I like that this gives a quick, easily interpretable view of the metric’s history over a longer period — has it been tracking to target consistently, consistently above or below the target, or bouncing back and forth? Is there inherent seasonality in the metric (signified by both the black and gray sparklines having similar spike/dip periods)?

One limitation of sparklines is that they don’t represent magnitude very well. If, for instance, a particular metric is barely fluctuating over time, then, depending on how the y-axis is set up, the sparkline can still show what looks like a wildly varying value. It’s a minor limitation, though, so I’ll live with it.

4-Month Trend Arrow

The 4-month trend is the single icon that results from a conceptually simple (but a little hairy to calculate) assessment of the most recent four data points. That was the punchline of an earlier post on calculating trend indicators. Whether the basis of the trend is months, weeks, or days can vary (not within one dashboard, generally, but as a standard for the dashboard overall), as can whether it’s 4, 5, 6, or more data points. It’s a judgment call for both, driven by the underlying business need that the dashboard supports.

I promise, promise, promise to make a simplified example of this arrow calculation and post it in a future post — check the Comments section for this post to see if a linkback exists (I’ll come back and update this entry as well once it’s done).

Current

Typically, when sparklines are used, the exact value of the last point in the sparkline is included. In the example above, I’ve done something a little different, in that I actually provide the sum of the last three data points. This is a quarterly dashboard, but the sparkline has a monthly basis to it to show intra-quarter trends. If the current value is sufficiently below the target threshold, then the value is automatically displayed as bold and red.
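
As a sketch, the “Current” logic described above might look like this in Python. The 95%-of-target alert ratio is a hypothetical stand-in for whatever threshold a given dashboard actually uses:

```python
def current_value(monthly, target, alert_ratio=0.95):
    """Sum the last three monthly data points into a quarterly 'Current'
    value, and flag it (rendered bold red on the dashboard) when it falls
    below alert_ratio * target. The 0.95 default is illustrative only."""
    current = sum(monthly[-3:])
    return current, current < target * alert_ratio

# Six months of (fictitious) data; the current quarter is the last three.
print(current_value([120, 130, 110, 125, 90, 95], target=350))
# → (310, True)   # 310 < 332.5, so the value would render bold red
```

The same function works unchanged if “Current” should instead be the last point alone: pass `monthly[-1:]`.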

There are certainly situations where “Current” would actually be the last point on the sparkline. Like the trend arrow calculations, it’s a judgment call based on the business need that the dashboard supports.

YOY

In the example above, there is a comparison to the prior year. But, this could be a comparison to the target instead. Target-based comparison is even better — straight period-over-period comparisons tend to feel like something of a cop out, as prior periods really are more “benchmarks” than true “targets.” Now, setting a target as something like “15% growth over the prior year” has some validity! That would then impact the gray sparkline, the “when does Current go bold red” logic, and this %-based calculation.

28 Data Points

In the version of the widget above, there are 28 unique pieces of data presented for each metric: the metric name (1), the black sparkline (12), the gray sparkline (12), the trend indicator (1), the current value (1), and the year-over-year growth percentage (1). And that’s not counting the conditional formatting that highlights values as bold and red when certain criteria are met. That density is a key aspect of the widget design: 28 sounds like a lot of data to represent for a single metric, yet it all seems pretty digestible in this format, doesn’t it?

Let me know what you think. Does this work? What doesn’t work?

Presentation, Reporting

Calculating Trend Indicators

Put this down as one of my more tactical posts, brought on by a fit of lingering annoyance with the use (and by “use” I mean “grotesque misuse”) of trend indicators on reports and dashboards. The trouble is that trends are a trickier business than they seem at first blush, and, at the same time, there are a number of quick and easy ways to calculate them…that are all problematic.

With the well-warranted increasing use of sparklines, which are inherently trend-y representations of data, I like to be able to put a meaningful trend indicator that complements the sparkline. Throughout this post, I will illustrate trendlines, but I’m really focused on trend indicators, which are symbols that indicate whether the trend in the data is upward, downward, or flat. Although there are a few minor tweaks I’d love to make once Excel 2010 is released and allows the customization of icon sets, I’m reasonably happy with Excel’s 5-arrow set of trend indicators:

Trend Icons

They’re clean and clear, and they work in both color and in black and white. And, with conditional formatting, they can be automatically updated as new data gets added to a dashboard or report. While I won’t show these indicators again in this post, the trendlines I do show are the behind-the-scenes constructs that would manifest themselves as the appropriate indicator next to a sparkline or numerically reported measure.

I’ll use a simple 12-period data set throughout this post to illustrate some thoughts (not as a sparkline, but the principles all still apply):

Sample Data

Trends are slippery beasts for several reasons:

  • Noise, noise, noise — all data is noisy, which means it’s easy to over-read into the data and spot a trend that is not really there
  • The aircraft carrier vs. the speedboat conundrum — the more data points you use, the more stable your trend, but the longer it takes to collect enough data to identify a trend, or, worse, to determine if you’ve truly impacted the trend going forward

Let’s start this exploration by walking through some of the common ways that “trend” judgments get made and point out why they’re troubling. I will then show an alternative that, while only marginally more complex to implement, works better when it comes to specifying trend-age.

Trending Approaches of which I’m Leery

Trending Based on the Change Over the Previous Period

The most common way I see trends reported is on a “change since the previous period” basis.

Prior Period

In this example, the trend would be an “up” because the data went up from the prior period to the current period. The problem with this is that, if you look at the longer pattern of data, you see that the data is pretty noisy, and it’s entirely possible that this “trend” is entirely a case of noise masking the true signal.

Trending Over an Extended Period

Another way to trend your data, which Excel makes very simple, is to add a trendline using Excel’s built-in trending capabilities (converting this trendline to an indicator would require some use of a couple of Excel functions that I’ll go into a bit in my recommended approach later in the post).

Trendline Example

With this method, the trend would be indicated as “slightly up.” While this may be a valid representation of the overall trend…it seldom seems quite right to use it. The trend gets impacted heavily by any sort of big spike (or dip) in the data, which can keep the same upward or downward trend in place for a very long time. I had a blog post during March Madness one year that wound up driving a big spike in traffic to my site. While it was legitimate for that spike to show an upward trend when I looked at my traffic that week or month, that spike has now wreaked havoc on the macro trend indicator that Google Analytics has shown ever since — for several months that spike kept my overall trend up, and, then, once that spike passed the fulcrum of the tool’s trend calculation, it caused the reporting of a downward trend for several subsequent months. Through the whole period, I had to mentally discount what the trend indicator showed.

Year-Over-Year Trending

Because seasonality wreaks havoc with trendlines, it’s not uncommon to see trend indicators based on year-over-year results — if the current reporting period is a higher number than the same period a year ago, then the trend is up. For trending purposes, this combines the worst of the two prior examples — it takes a very small number of data points (subjecting the assessment to noise) and it uses ancient history data in the equation.

This isn’t to say that comparisons to the same period in the prior year (or even the same period in the prior quarter, since many companies see an intra-quarter pattern) are bad. But, the question those comparisons answer differs from a trend: a trend should be an indication of “where we are heading of late such that, if we continue on the current course, we can estimate whether we will be doing better or worse next week/next month,” while a year-over-year comparison is more a measure of “did we move positively from where we were last year at this time?”

Trending Approaches I Feel Better About

I’ve spent an embarrassing amount of time thinking about trending over the past four or five years, but I’ve finally settled on an approach that meets all of these criteria:

  • It balances the number of data points available for the trend with the sluggishness/timeliness of the results
  • It’s reasonably intuitive to explain
  • It passes the “sniff test” — while a trend indicator may initially be a little surprising, on closer inspection, the user will realize it’s legit

The last bullet point is really a combination/result of the first two.

My Failed Exploration: Single Point Moving Range (mR)

Because of the criteria above, I’ve discarded what I thought was my most promising approach — using the single point moving range (mR). A light bulb went off last spring when I took an intermediate stats class, and, although the professor glossed over the moving range formulas, I thought it was going to be the answer that would allow me to solve my trendline quandary — it would look at the “change over previous period” and determine if that change was sufficiently large to warrant reporting a measurable trend. After noodling with it quite a bit… I don’t think that it works for the purposes of trend indicators. For chuckles, a moving range chart for the example in this post looks like the following:

Moving Range

If you want to read more about moving ranges, the best explanation I found was on the Quality Magazine web site. I’ll just stop there, though. We’ve already lost on the “reasonably intuitive” front, and I haven’t even calculated the control limits yet!
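For what it’s worth, the mR calculation itself is simple, even if interpreting the chart isn’t. Here’s a minimal sketch in Python with made-up weekly values (not the post’s actual data); the 3.267 multiplier is the standard D4 constant for subgroups of size 2, used to set the upper control limit on an mR chart:

```python
# Moving range (mR): the absolute change from each period to the next.
# The values below are illustrative; they are not the post's actual data.
values = [102, 98, 110, 105, 99, 112, 108, 115, 104, 118, 111, 120]

moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]

# Conventional mR-chart upper control limit: UCL = 3.267 * mean(mR),
# where 3.267 is the standard D4 constant for subgroups of size 2.
mr_bar = sum(moving_ranges) / len(moving_ranges)
ucl = 3.267 * mr_bar

print(moving_ranges)   # the points plotted on the mR chart
print(round(ucl, 2))   # only changes above this line count as "signal"
```

Anything below the UCL is, statistically, just noise, which is exactly why a period-over-period change rarely justifies a trend arrow on its own.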

And Another Failed Exploration: the Moving Average

There’s also the “moving average” approach, which smooths things out quite a bit:

Moving Average

I always feel like the moving average is some sort of narcotic applied to the data — it makes things fuzzy by having a single data point factored into multiple points represented on the chart. But, I’ll grudgingly admit that it does have its merits in some cases.
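The smoothing itself is trivial to reproduce. A minimal sketch, with an assumed 4-period trailing window and made-up data:

```python
# Trailing moving average: each plotted point is the mean of the last
# `window` raw values, so every raw value influences `window` points.
def moving_average(values, window=4):
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

weekly = [102, 98, 110, 105, 99, 112, 108, 115, 104, 118, 111, 120]
smoothed = moving_average(weekly)  # 9 plotted points from 12 raw values
```

This makes the “narcotic” effect concrete: a single spike in the raw data gets spread across four plotted points, which is precisely the fuzziness being described.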

My Approach to Trending (At Last!!!)

There are two key elements to my trending approach, and neither is particularly earth-shattering:

  1. Break the data into smaller components than the reporting cycle
  2. Trend only over recent data, rather than over the entire reported timeframe

Going back to the original example here, let’s say that I update a dashboard once a month, and that the dashboard primarily looks at data for the prior 3 months. In that case, the 12 data points each represent (roughly) one week. If I simply reported the data on a monthly basis, then the chart would look like this:

Trending Example

That shows a clear upward trend, regardless of whether I look at the last month or the last two months of data. It would be hard not to put an upward trend indicator on this plot. But, we’re relying on all of three data points, and we’re going back three full reporting periods to draw that conclusion. Both of these are a bit concerning. Invariably, we’d want to go back farther in time to get more data points to see if this trend was real…and then we’re falling into the aircraft carrier dilemma.

Instead, though, I can keep the granularity of the reporting at a week, but only trend over the last four periods:

Trendline Proposed Approach

I don’t actually plot the trendline shown in the chart above. Rather, I calculate the formula for the line using the SLOPE and INTERCEPT Excel functions. I then calculate the value of the 4-weeks-ago endpoint of the line and the most-recent-week endpoint of the line and look at the percentage change from one to the other. I set some named cells in my workbook to specify how many periods I report over (so I can vary from 4 to 6 or something else universally) as well as what the different thresholds are for a strong up, weak up, no change, weak down, or strong down trend.
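For anyone who wants to port the idea out of Excel, here is a sketch of the same logic in Python. The least-squares math is what SLOPE and INTERCEPT compute; the 2% and 10% thresholds are placeholder assumptions, since my actual threshold values live in named cells in the workbook:

```python
# Trend indicator: fit a least-squares line to the last `periods` points
# (equivalent to Excel's SLOPE/INTERCEPT), then bucket the % change
# between the line's first and last fitted values.
# The weak/strong thresholds are illustrative placeholders.
def trend_indicator(values, periods=4, weak=0.02, strong=0.10):
    y = values[-periods:]
    x = range(periods)
    x_bar = (periods - 1) / 2
    y_bar = sum(y) / periods
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    # Percentage change between the fitted line's two endpoints.
    start = intercept
    end = intercept + slope * (periods - 1)
    change = (end - start) / start
    if change >= strong:
        return "strong up"
    if change >= weak:
        return "weak up"
    if change <= -strong:
        return "strong down"
    if change <= -weak:
        return "weak down"
    return "no change"
```

Because only the last few points feed the fit, older data can’t drag the indicator around, which is the whole point of the approach.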

In the example in this post, the change is a 16% drop, which usually would garner a “strong down” trend — very different from all the upward trends in the earlier examples! It’s even somewhat counter-intuitive, as the most recent change was actually an “up.” But that’s the point: the entire range has been trending upward, as shown by the 3-point plot as well as by a close inspection of the raw data (think of it as a sparkline), so you already have that information available as the longer-term trend; of late, however, the trend is somewhat downward.

A Note of Caution

This post has gone through what works for me as a general rule. As I read back over it, I realize I’m setting myself up for a case of, “Yeah, you CAN make the data say whatever you want.”

I’m less concerned with prescribing a universally effective approach to trend calculation than I am with sounding a note of caution about the various “obvious” ways to calculate a trend. The sniff test is important — does the trend work for your specific situation when you actually apply it? Or have you adopted a simplistic, formulaic approach that can badly misrepresent the data?

And…a Nod to Efficiency and Automation

The prospect of introducing SLOPE and INTERCEPT functions may seem a little intimidating from a maintenance and updating perspective, but it really doesn’t need to be. By using built-in Excel functionality, these can be set up once and then dynamically updated as new data comes in. I like to build spreadsheets with a data selector so that the dashboard is a poor man’s BI tool that allows exploring how the data has changed over time. The key is to use some of Excel’s most powerful, yet under-adopted, features:

  • Conditional formatting — especially in Excel 2007 where conditional formatting can make use of customized icon sets
  • Named cells and named ranges — these are handy for establishing constants used throughout the workbook (thresholds, for instance) that you may want to adjust
  • Data validation — using a cell as your “date range selector” that references a named range of the column that lists the dates for which you record the data
  • VLOOKUP — because you used data validation, you can then use VLOOKUP to find the current data based on what is selected by the user
  • Dynamic charts — these actually aren’t a “feature” of Excel so much as the clever combination of several different features; Jon Peltier has an excellent write-up of how to do this

If set up properly, a little investment up front can make for an easily updated report delivery tool…with meaningful trend indicators!

Presentation

Dashboard Development and Unleashing Creative Juices

Ryan Goodman of Centigon Solutions wrote up his take on a recent discussion on LinkedIn that centered on the tension between data visualization that is “flashy” versus data visualization that rigorously adheres to the teachings of Tufte and Few.

The third point in Goodman’s take is worth quoting almost in its entirety, as it is both spot-on and eloquent:

Everyone has a creative side, but someone who has never picked up a design book with an emphasis on data visualization should not implement dashboards for their own company and certainly not as a consultant. Dashboard development is not the forum to unleash creative juices when the intent is to monitor business performance. Working with clients who have educated themselves have[sic] definitely facilitated more productive engagements. Reading a book does not make you an expert, but it does allow for more constructive discussions and a smoother delivery of a dashboard.

“The book” of choice (in my mind, and, I suspect, in Goodman’s) is Few’s Information Dashboard Design: The Effective Visual Communication of Data (which I’ve written about before). Data visualization is one of those areas where spending just an hour or two understanding some best practices, and, more importantly, why those are best practices, can drive a permanent and positive change in behavior, both for analytical-types with little visual design aptitude and for visual design-types with little analytical background.

Goodman goes on in his post to be somewhat ambivalent about tool vendors’ responsibility and culpability when it comes to data visualization misfires. On the one hand, he feels like Few is overly harsh when it comes to criticizing vendors whose demos illustrate worst practice visualizations (I agree with Few on this one). But, he also acknowledges that vendors need to “put their best foot forward to prove that their technology can deliver adequate dashboard execution as well as marketing sizzle.” I agree there, too.

Excel Tips, Presentation

Data Visualization that Is Colorblind-Friendly — Excel 2007?

Wow. This post started out not as a post, but as what I thought was going to be a 5-minute exercise with Google to download a colorblind-friendly palette for Excel charts. That was two weeks ago, and this post is just scratching the surface.

Several weeks ago, one of the presenters in a meeting showed some data as a map overlay. As soon as she projected the first map, someone in the meeting quipped, “Good luck understanding this one, Jim!” Jim, you see, is colorblind. And, apparently, most of the people in the meeting knew it. Approximately 8% of men have some form of color blindness (it’s much more rare in women — only 1 in 200). And the overlays on the map were color-coded very subtly. Jim commented that it was hopeless!

As it happened, I was exploring a fresh set of data that same week, as we’d recently rolled out some new customer data capture capabilities. As I worked through how best to present the results, I decided to grab a colorblind-friendly palette from the web and use it in the visualization of the information. I’d hoped to find a site with one or more Excel files that I could download with such a palette, but, worst case, I was prepared to snag a palette and manually update my Excel file (for future sharing on this blog, of course!).

No. Such. Luck!

What I did find was a slew of information on the different types of color blindness (which I’ll touch on briefly in a bit), as well as a bevy of almost-useful tools and palettes:

  • How to make figures and presentations that are friendly to Colorblind people — ultimately, I used the palette that is ~2/3 of the way down this page for my spreadsheet (the figure labeled “Set of colors that is unambiguous both to colorblinds and non-colorblinds”).  Mr. Excel actually references this palette and provides a macro that will update a workbook’s palette with this palette. The downside of this palette is that, while it may be plenty functional, I can’t say I’m wild about it from an aesthetic viewpoint. But, I’d spent the 30 minutes I’d given myself to dig, so I ran with it.
  • Colorjack — a nifty tool for finding a color palette. Unfortunately…there’s no way to test how colorblind-friendly any of the palettes are
  • Colorblind Web Page Filter — there were a number of tools for sale that would simulate how content would appear to people with different forms of colorblindness, but this is the (free) online tool I wound up using for the exercise below. It couldn’t be easier to use — you just provide a URL and what form of color blindness you’re interested in, and it renders it

So, aside from the one palette that was solely focused on functionality and not at all on aesthetics, I struck out. As I pondered this over the next few days, it occurred to me that perhaps Excel’s default colors always seemed so gosh-awful because they were actually developed explicitly with colorblindness in mind. I could not find any documentation to support the theory…so I turned left and headed down that rathole to see if I could figure it out myself.

The exercise was pretty simple. I created a 10-color bar chart using the Excel 2007 default palette. Note: This was created purely for palette-testing — this chart itself is a great example of using more color than is needed! Here’s the chart:

Excel 2007 Default Chart Colors
Excel 2007 Default Chart Colors

Like the one colorblind-friendly palette I found online, I really don’t like the aesthetics of this palette. It’s been toned down a bit from the Excel 2003 (and earlier) versions, which is good, but it still seems rather harsh. Could that be for colorblind compatibility? I think so! I took the chart above and ran it through the Colorblind Web Page Filter mentioned above for the four most common types of color blindness (as described in a Pearson report by Betsy J. Case):

Excel 2007 Default Chart Colors -- Deuteranomaly (Affects 4.9% of Males)
Deuteranomaly (Affects 4.9% of Men)
Excel 2007 Default Chart Colors -- Deuteranopia (Affects 1.1% of Men)
Deuteranopia (Affects 1.1% of Men)
Excel 2007 Default Chart Colors -- Protanopia (Affects 1% of Men)
Protanopia (Affects 1% of Men)
Excel 2007 Default Chart Colors -- Protanomaly (Affects 1% of Men)
Protanomaly (Affects 1% of Men)

Overall, the palette seems workable in all four situations. The first three colors absolutely work. Colors 4 and 5 start to lose a little contrast against color 1, but they still seem manageable. Colors 5, 7, and 10 get a little problematic in some cases, but, if you’re going beyond four colors in a single chart, you might need to reconsider your chart type anyway. Right?

Now, one final test: for achromatopsia. On the one hand, this is extremely rare. On the other hand…it’s common when your office has a lot of black-and-white printers:

Excel 2007 Default Chart Colors -- Achromatopsia
Achromatopsia (Extremely Rare)

Apparently, checking whether a palette works in grayscale is a quick way to test for compatibility with all forms of colorblindness. It’s also…a best practice. Interestingly, the Excel 2007 palette really lays an egg here, in that colors 1, 2, and 4 are all barely distinguishable!
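If you want to run the grayscale check without a simulator, converting each color to its luminance is a quick proxy. A sketch, using hex values that are my approximation of the Excel 2007 default accent palette (an assumption — the exact values depend on the workbook theme):

```python
# Grayscale sanity check for a chart palette: convert each color to its
# luminance and flag adjacent pairs that land too close together.
# The hex values approximate Excel 2007's default accent colors.
palette = ["#4F81BD", "#C0504D", "#9BBB59", "#8064A2", "#4BACC6", "#F79646"]

def luminance(hex_color):
    """Rec. 601 luma of an sRGB color, on a 0-255 scale."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.299 * r + 0.587 * g + 0.114 * b

lums = sorted(luminance(c) for c in palette)
# Pairs within ~20 luminance units will look nearly identical in grayscale.
too_close = [(round(a), round(b)) for a, b in zip(lums, lums[1:]) if b - a < 20]
print(too_close)
```

With these assumed values, three pairs get flagged, and the tightest cluster is the blue, red, and purple accents — consistent with colors 1, 2, and 4 being barely distinguishable in the achromatopsia rendering.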

Clearly, there is an opportunity here to test a variety of functional, attractive palettes for grayscale printability and the top four forms of colorblindness and develop something better than the Excel defaults. But, that’s an exercise for another time. I think I’ll aim for the first four colors of the palette being “highly distinguishable” in all scenarios and the next four being “functionally distinguishable.” What do you think? Would this be useful? What else should I take into consideration?

Presentation

Recovery.gov Needs Some Few and Some Tufte

I caught an NPR story about recovery.gov last week, and it sounded really promising. Depending on where you fall on the political spectrum, the various rounds of stimulus and bailout funding that have come through over the past six months fall somewhere between “throwing money away,” “ready, fire, aim,” and “point in what seems like it might be a good direction, pull the finger, and shoot.” No one can stand up and say, with 100% certainty, that we’re not going to look back on this approach in a decade or two and say, “Um…oops?”

It’s hard to imagine anyone taking issue with the proclaimed intent of recovery.gov, though — make the process as transparent as possible, including how much money is going where, when it’s going, and what ultimately comes of it. It was a day or two before I found myself at a computer with time to check out the site…and I was disappointed. In the NPR interview, the interviewer commented how the site was slick and clean. Reality is “not so much.”

Now, I did once take a run at downloading the federal budget to try to scratch a curiosity itch regarding, at a macro level, where the federal government allocates its funds. On the one hand, I was pleased that I was able to find a .csv file with a sea of data that I could easily download and open with Excel. On the other hand, the budget is incredibly complex, and it takes someone with a deeper understanding of our government to really translate that sea of data into the answers I was looking for. Really, though, that wasn’t a surprise:

The data is ALWAYS more complex than you would like…when you’re trying to answer a specific question.

To the credit of recovery.gov, they clearly intended to show some high-level charts that would answer some of the more common questions citizens are asking. Unfortunately, it looks like they turned over the exercise to a web designer who had no experience in data visualization.

Examples from the featured area on the home page:

recovery.gov Funds Distribution Reported by Week

The overall dark/inverse style itself I won’t knock too much (although it bothers me). And, the fact that the gridlines are kept to a minimum is definitely a good thing. My main beef is admittedly a bit ticky-tack. There was an earlier version where there was a $30 B gridline, and that has since been removed — that gridline clearly showed the “30.5 B point” being below the midway point between 20 B and 40 B. Clearly, someone would have to really be scrutinizing the graph to identify this hiccup, but someone will.

When presenting data to an audience, the data as it stands alone needs to be rock solid. If it contradicts itself, even in a minor way, it risks having its overall credibility questioned.

So, moving on to some more egregious examples:

recover.gov Relief for America's Working Families

We get a triple-whammy with this one:

  • Pie charts are inherently difficult for the human brain to interpret accurately
  • Pie charts are even worse when they are “tilted” to give a 3D effect — the wedges on the right and left get “shrunk” while wedges on the top or bottom get “stretched”
  • Exploding a pie chart and then providing a pie chart of just the wedge…just ain’t good

Two questions this visualization might have been trying to answer:

  • How much of the stimulus plan is devoted to tax benefits?
  • How much of the stimulus plan is going to the “Making Work Pay” tax credit?

Without doing any math, can you estimate either one of these? For the first question, you’re estimating the size of the small wedge on the left pie chart. It looks like it’s ~ 1/4 of the pie, doesn’t it? In reality, it’s 37%! For the second question, you have to combine your first estimate with an estimate of the lavender wedge in the right pie chart…and that’s way more work than it’s worth. If you do the math, you’ll get that the lavender wedge works out to ~7% of the entire left pie. A simple table or a bar graph would be more effective.

And, finally, the estimated distribution of Highway Infrastructure Funds:

recovery.gov Distribution of Highway Infrastructure Funding

Well, that’s just silly. There is NO value in making these bars come flying out of the graph. Really.

Now, to the site’s credit, it takes all of 3 clicks to get from the home page to downloading .csv files with department-specific data and weekly updates (which includes human-entered context as to major activities during the prior week). That’s good (assuming it’s not unduly cumbersome to maintain)! And, I’m sure the site will continue to evolve. But, I’d love to see them bring in some data visualization expertise. The model for the visualization should be pretty simple:

  1. Identify the questions that citizens are asking about the stimulus money
  2. Present the data in the way that answers those questions most effectively
  3. Link to the underlying data — the aggregate and the detail — directly from each visualization

As it turns out, Edward Tufte has already been engaged (thanks to Peter Couvares for that tip via Twitter), and is doing some pro bono work. But, it’s not clear that he’s focusing on the high-level stuff. I would love to see Stephen Few get involved as well — pro bono or not! Or, hell, I’d offer my services…but might as well get the Top Dog for something like this.

Starting today, the site is hosting a weeklong online dialogue to engage the public, potential recipients, solution providers, and state, local and tribal partners about how to make Recovery.gov better. I’ve submitted a couple of ideas already!


General, Presentation

PowerPoint / Presentations / Data Visualization

I wrote a post last week about PowerPoint and how easy it is to use it carelessly — to just open it up and start dumping in a bunch of thoughts and then rearranging the slides. That post wound up being, largely, a big, fat nod to Garr Reynolds / Presentation Zen. Since then, I’ve been getting hit right and left with schtuff that’s had me thinking more broadly about effective communication of information in a business environment:

Put all of those together, and I’ve got a mental convergence of PowerPoint usage, presenting effectively (which goes well beyond “the deck”), and data visualization. These are all components of “effective communication” — the story, the content, how the content is displayed, how the content is talked through. In one of Reynolds’s sets of sample slides, you can clearly see the convergence of data visualization and PowerPoint. And, even he admits that this is a tricky thing to post…because it removes overall context for the content and it removes the presenter. Clearly, there are lots of resources out there that lay out fundamental best practices for effectively communicating in a presentation-style format. Three interrelated challenges, though:

  • The importance of learning these fundamentals is wildly undervalued — it sounds like Abela’s book tries to quantify this value through tangible examples…but it’s a niche book that, I suspect, will not get widely read by the people who would most benefit from reading it
  • “I need to put together a presentation for <tomorrow>/<Friday>/<next week>” — we’re living under enormous time pressure, and it’s incredibly easy to get caught up in “delivering a substantive deliverable” rather than “effectively communicating the information.” When I think about the number of presentations that I’ve developed and delivered over the past 15 years, the percentage that were truly effective, compelling, and engaging is abysmally small. And that’s a waste.
  • Culture/expectations — every company has its own culture and norms. For many companies, the norms regarding presentations are that they are linear, slide-heavy, logically compiled, and mechanically delivered affairs. For recurring meetings, there is often the “template we use every month” whereby the structure is pre-defined, and each subsequent presentation is an update to the skeleton from the prior meeting. Walk into one of those meetings and deliver a truly rich, meaningful presentation…and you’re liable to be shuttled off for a mandatory drug test, followed by a dressing down about “lack of proper preparation” because the slides were not sufficiently text/fact/content-heavy. <sigh>

What’s interesting to me is that I have spent a lot of time and energy boning up on my data visualization skills over the past few years. And, even if it takes me an extra 5-10 minutes in Excel, I never send out something that doesn’t have data viz best practices applied to some extent. As you would expect, applying those best practices is getting easier and faster with repetition and practice. So, can I do the same for presentations? And, again, that’s presentations-the-whole-enchilada, rather than presentations-the-PowerPoint-deck. Can I balance that with cultural norms — gently pushing the envelope rather than making a radical break? Can you? Should you?

Excel Tips, Presentation

Data Visualization — March Madness Style

I got an e-mail last week just a few hours into Round 1 of this year’s NCAA men’s basketball tournament. The subject of the email was simply “dumb graph,” and the key line in the note was:

The “game flow” graph…how in the WORLD is that telling me anything? That the score goes up as the game goes on? Really? Ya think?

My friend was referring to the diagrams that ESPN.com is providing for every game in the tournament. The concept of these graphs is pretty simple: plot the score for each team over the course of the game. For instance, the “Game Flow” graph for the Oklahoma vs. Morgan State game looks like this (you can see the actual graph on the game recap page — just scroll down a bit and it’s on the right):

Oklahoma vs. Morgan State

This isn’t an exact replication, but it’s pretty close — best I could manage in Excel 2007 (the raw data is courtesy of the ESPN.com play-by-play page for the game). ESPN’s graph is a Flash-based chart, so it’s got some interactivity that the image above does not (we’ll get to that in a bit).

The graph shows that the game was tight for the first 4-5 minutes, then Oklahoma pulled away, Morgan State made it really close mid-way through the first half, and then Oklahoma pulled away and never looked back. My friend had a point, though —  the dominant feature of the graph is that both lines trend up and to the right…and any chart of a basketball game is going to exhibit that pattern (actually, the play-by-play for that game has a couple of hiccups such that, when I originally pulled the data, I had a couple places where the score went down due to out-of-sequence free throw placement…but I noticed the issue and fixed it). In business, we’re pretty well conditioned to see “up and to the right” as a good thing…but it’s meaningless in the case of a basketball game.

Compare that graph to a game that was much closer — the Clemson vs. Michigan game (the graph on ESPN’s site is on the recap page, and the raw data is on the play-by-play page):

Clemson vs. Michigan

This was a tighter game all through the first half. Clemson led for the first 7-8 minutes, Michigan pulled substantially ahead early in the second half, and then things got tight in the last few minutes of the game. But, again, both lines moved up and to the right.

These charts are not difficult to interpret:

  • The line on top is the team that is leading
  • The distance between the lines is the size of the lead
  • The lines crossing signifies a lead change

But, could we do better? Well, my wife and kids are out-of-town for the week (spring break), I have the social life you’d expect from someone who blogs about data and data visualization, and the fridge is well-stocked with beer. Party. ON!

At best, my level of basketball fan-ness hovers right around “casual.” Still, I follow it enough to know the key factors of a game update or game upset (Think: “Hey, Joe. What’s the score?”). Basically:

  • Who’s winning?
  • By how much?

(If there’s time for a third data point, the actual score is an indication of whether it’s a high scoring shootout or a low scoring defense-oriented game.)

Given these two factors as the key measures of a game, take another look at the graphs above. When the game is tight, you have to look closely to assess who is winning. And, determining how much they’re winning by requires some mental exertion (try it yourself: look back at the last graph and ask yourself how much Michigan was winning by halfway through the second half).

This is just begging for a Stephen Few-style exercise to see if I can do better.

First, the Oklahoma/Morgan State game:

Oklahoma vs. Morgan State

Rather than plotting both teams’ scores, with the total score on the Y-axis, this chart plots a single line with the size of the lead — whichever side of the “0” line the plot is on is the team that is winning. The team on the top is the higher seed, and the team on the bottom is the lower seed. I added the actual score at halftime and the end of the game, as well as each team’s seed. Compare that chart to the much closer Clemson/Michigan game:

Clemson vs. Michigan

The chart looks very different: it focuses on the information fans really want (who’s winning, and by how much?) and presents it directly, rather than presenting the data in a way that requires mental exertion to derive the answer. While the graphs on ESPN’s site allow you to mouse over any point in the game and see the exact score and the exact amount of time remaining, it’s hard to imagine who would actually care to do that — better to come up with an information-rich and easy-to-interpret static chart than to get fancy with unnecessary interactivity.

A few other subtle changes to the alternative representation:

  • I tried to dramatically increase the “data-pixel ratio” (Few’s principle that the ratio of actual data to decoration should be maximized) — this is a little unfair to ESPN, as their site is working with an overall style and palette for the site, but it’s still worth keeping in mind
  • I used color on the Y-axis to show which team’s lead is above/below the mid-line. The numbers below the middle horizontal line are actually negative numbers, but with a little Excel trickery, I was able to remove the “-” and change the color of the labels (all done through Custom number formatting)
  • By putting the top seed on the top, looking at a full page of these charts would quickly highlight the games that were upsets
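The transform behind these charts is nothing more than a point-by-point difference, with the higher seed’s lead as the positive direction. A sketch with made-up scores (the real data came from ESPN’s play-by-play pages):

```python
# Single "lead" series: positive while the higher seed leads, negative
# while the lower seed leads; a sign flip marks a lead change.
# Scores below are illustrative, not the actual play-by-play data.
higher_seed = [10, 18, 25, 33, 41, 52, 60, 68]
lower_seed  = [12, 15, 22, 35, 38, 45, 55, 61]

lead = [h - l for h, l in zip(higher_seed, lower_seed)]

# A sign change between consecutive points is a lead change.
lead_changes = sum(1 for a, b in zip(lead, lead[1:]) if a * b < 0)
```

As for the Excel trickery on the axis labels: a custom number format along the lines of 0;[Red]0 displays negative values in a second color without the minus sign, since the format’s second section governs negatives.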

I’m my own worst critic, so here are two things I don’t like about the alternate charts above:

  • The overall palette still feels a little clunky — the main data plot doesn’t seem to “pop” as much as it should, even though it’s black, and the shaded heading doesn’t feel right
  • While the interpretation of the data requires less mental effort once you understand what the chart is showing, it does seem like this approach requires another half-second of interpretation up front that the original charts don’t require

What do you think? What else could I try to improve the representation?

Presentation

Data Visualization — Few's Examples

I attended a United Way meeting last week that was hosted at an overburdened county government agency site in south Columbus. The gist of the meeting was discussing the bleakness of the economy and what that could or should mean to the work of the committee. The head of the government agency did a brief presentation on what the agency does and what they are seeing, and the presentation included the distribution of a packet of charts with data the agency tracks.

I was struck by how absolutely horridly the information was presented. A note at the bottom of each chart indicated that the same staff member had compiled each chart. Yet, there was absolutely no consistency from one chart to the next: the color palette changed from chart to chart (and none of the palettes were particularly good), a 3-D effect was used on some charts and not others (3-D effects are always bad, so I suppose I’d rather have inconsistency than 3-D effects on every chart), and totally different chart types were used to present similar information. On several of the bar charts, each bar was a different color, which made for an extremely distracting visualization of the information.

I glanced around the room and saw that most of the other committee members had furrowed brows as they studied the information. It occurred to me that there was an undue amount of mental exertion going on to understand what was being presented that would have been better spent thinking through the implications of the information.

Ineffective presentation of data can significantly mute the value of a fundamentally useful report or analysis.

Show Me the Numbers

Later that evening, I found myself popping around the web — ordering my own copy of Stephen Few’s Show Me the Numbers, and, later, poking around on Few’s site. Specifically, I spent some time on his Examples page, browsing through the myriad before/after examples that clearly illustrate how the same information, presented with the same amount of effort but using some basic common principles, requires dramatically less mental effort to understand.

It’s a fascinating collection of examples. And Show Me the Numbers is a seminal book on the topic.

Analysis, Analytics Strategy, Excel Tips, General, Presentation, Reporting

The Best Little Book on Data

How’s that for a book title? Would it pique your interest? Would you download it and read it? Do you have friends or co-workers who would be interested in it?

Why am I asking?

Because it doesn’t exist. Yet. Call it a working title for a project I’ve been kicking around in my head for a couple of years. In a lot of ways, this blog has been and continues to be a way for me to jot down and try out ideas to include in the book. This is my first stab at trying to capture a real structure, though.

The Best Little Book on Data

In my mind, the book will be a quick, easy read — as entertaining as a greased pig loose at a black-tie political fundraiser — but will really hammer home some key concepts around how to use data effectively. If I’m lucky, I’ll talk a cartoonist into some pen-and-ink, one-panel chucklers to sprinkle throughout it. I’ll come up with some sort of theme that will tie the chapter titles together — “myths” would be good…except that means every title is basically a negative of the subject; “Commandments” could work…but I’m too inherently politically correct to really be comfortable with biblical overtones; an “…In which our hero…” style (the “hero” being the reader, I guess?). Obviously, I need to work that out.

First cut at the structure:

  • Introduction — who this book is for; in a nutshell, it’s targeted at anyone in business who knows they have a lot of data, who knows they need to be using that data…but who wants some practical tips and concepts as to how to actually go about doing just that.
  • Chapter 1: Start with the Data…If You Want to Guarantee Failure — it’s tempting to think that, to use data effectively, the first thing you should do is go out and query/pull the data that you’re interested in. That’s a great way to get lost in spreadsheets and emerge hours (or days!) later with some charts that are, at best, interesting but not actionable, and, at worst, not even interesting.
  • Chapter 2: Metrics vs. Analysis — providing some real clarity regarding the fundamentally different ways to “use data.” Metrics are for performance measurement and monitoring — they are all about the “what” and are tied to objectives and targets. Analysis is all about the “why” — it’s exploratory and needs to be hypothesis driven. Operational data is a third way, but not really covered in the book, so probably described here just to complete the framework.
  • Chapter 3: Objective Clarity — a deeper dive into setting up metrics/performance measurement, and how to start with being clear as to the objectives for what’s being measured, going from there to identifying metrics (direct measures combined with proxy measures), establishing targets for the metrics (and why, “I can’t set one until I’ve tracked it for a while” is a total copout), and validating the framework
  • Chapter 4: When “The Metric Went Up” Doesn’t Mean a Gosh Darn Thing — another chapter on metrics/performance measurement. A discussion of the temptation to over-interpret time-based performance metrics. If a key metric is higher this month than last month…it doesn’t necessarily mean things are improving. This includes a high-level discussion of “signal vs. noise,” an illustration of how easy it is to get lulled into believing something is “good” or “bad” when it’s really “inconclusive,” and some techniques for avoiding this pitfall (such as using simple, rudimentary control limits to frame trend data).
  • Chapter 5: Remember the Scientific Method? — a deeper dive on analysis and how it needs to be hypothesis-driven…but with the twist that you should validate that the results will be actionable just by assessing the hypothesis before actually pulling data and conducting the analysis
  • Chapter 6: Data Visualization Matters — largely, a summary/highlights of the stellar work that Stephen Few has done (and, since he built on Tufte’s work, I’m sure there would be some level of homage to him as well). This will include a discussion of how graphic designers tend to not be wired to think about data and analysis, while highly data-oriented people tend to fall short when it comes to visual talent. Yet…to really deliver useful information, these have to come together. And, of course, illustrative before/after examples.
  • Chapter 7: Microsoft Excel…and Why BI Vendors Hate It — the BI industry has tried to equate MS Excel with “spreadmarts” and, by extension, deride any company that is relying heavily on Excel for reporting and/or analysis as being wildly early on the maturity curve when it comes to using data. This chapter will blow some holes in that…while also providing guidance on when/where/how BI tools are needed (I don’t know where data warehousing will fit in — this chapter, a new chapter, or not at all). This chapter would also reference some freely downloadable spreadsheets with examples, macros, and instructions for customizing an Excel implementation to do some of the data visualization work that Excel can do…but doesn’t default to. Hmmm… JT? Miriam? I’m seeing myself snooping for some help from the experts on these!
  • Chapter 8: Your Data is Dirty. Get Over It. — CRM data, ERP data, web analytics data, it doesn’t matter what kind of data. It’s always dirtier than the people who haven’t really drilled down into it assume. It’s really easy to get hung up on this when you start digging into it…and that’s a good way to waste a lot of effort. Which isn’t to say that some understanding of data gaps and shortcomings isn’t important.
  • Chapter 9: Web Analytics — I’m not sure exactly where this fits, but it feels like it would be a mistake to not provide at least a basic overview of web analytics, pitfalls (which really go to not applying the core concepts already covered, but web analytics tools make it easy to forget them), and maybe even providing some thoughts on social media measurement.
  • Chapter 10: A Collection of Data Cliches and Myths — This may actually be more of an appendix, but it’s worth sharing the cliches that are wrong and myths that are worth filing away, I think: “the myth of the step function” (unrealistic expectations), “the myth that people are cows” (might put this in the web analytics section), “if you can’t measure it, don’t do it” (and why that’s just plain silliness)
  • Chapter 11: Bringing It All Together — I assume there will be such a chapter, but I’m going to have to rely on nailing the theme and the overall structure before I know how it will shake out.
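
The “rudimentary control limits” idea floated for Chapter 4 can be sketched in a few lines. This is a minimal illustration, not anything from the book-to-be: the 3-sigma multiplier and the sample history are my assumptions, chosen only to show how a month that “went up” can still be inconclusive.

```python
# Minimal sketch of framing a trend with rudimentary control limits so that
# normal noise isn't over-interpreted. The 3-sigma multiplier and the sample
# data below are illustrative assumptions.
from statistics import mean, stdev

def control_limits(history, sigmas=3):
    """Return (lower, upper) limits from a metric's monthly history."""
    m = mean(history)
    s = stdev(history)  # sample standard deviation
    return m - sigmas * s, m + sigmas * s

history = [102, 98, 105, 97, 101, 99, 103, 100]  # prior months (made up)
lo, hi = control_limits(history)

this_month = 108  # higher than any prior month...
if this_month > hi:
    verdict = "signal: genuinely up"
elif this_month < lo:
    verdict = "signal: genuinely down"
else:
    verdict = "inconclusive: within normal noise"
print(verdict)
```

Even though 108 beats every prior month, it still falls inside the control limits here — exactly the “it’s really ‘inconclusive’” trap the chapter outline describes.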

What do you think? What’s missing? Which of these remind you of anecdotes in your own experience (haven’t you always dreamed of being included in the Acknowledgments section of a book? Even if it’s a free eBook?)? What topic(s) are you most interested in? Back to the questions I opened this post with — would you be interested in reading this book, and do you have friends or co-workers who would be interested? Or, am I just imagining that this would fill a gap that many businesses are struggling with?

Presentation

Harvey Balls: A Good Way to Ramp Back Up

As you may have noticed, my blogging here over the past couple of months has been pretty sparse. That was largely because I was winding down one job, taking a “break” between jobs, and then ramping up in my new job. The first was mentally exhausting, the second was physically exhausting, and the third was back to mentally exhausting. Plus, I’ve gone from a company with 35 employees to a company with 38,000, and I wanted to get my feet wet and see if there were guidelines of any sort regarding this sort of blogging by employees. What I’ve found out is that they’re really embracing social media as it should be embraced, so I’m excited that I’ll get to start blogging on data management (specifically address management). But…that’s not this post!

I’ve been sitting in on some vendor selection meetings in my new role, and, today, I was part of a “summary and recommendations” presentation review that had a really nice use of “Harvey Balls” on the key summary slide. Harvey balls? These things, which you might be familiar with from Consumer Reports:

Wikipedia has a brief, yet interesting, history on the subject, including some practical tips for actually putting them to use.

I was struck by how effective they were at providing summary information. In today’s case, they were used to iconically represent the roll-up of ratings for groups of vendor requirements. It worked. Obviously, there is a loss of granularity here, but, in the situation today, we were looking at 8 different groups of requirements across three different alternatives, so we were still looking at 24 summary data points on a single, clean slide (there was also some subtle background shading that prioritized the groups of requirements, which almost worked…but we decided that sorting the groups from highest to lowest would probably do the trick as well).

There’s a part of me that sees a circle and thinks “pie chart” and cringes. But these really aren’t pie charts. Well…actually…they sorta’ are…but I’m still okay with them. Now, if someone started trying to use 100 different Harvey Balls where the granularity of each “wedge” was deeper than just a quadrant, I’d have a problem with it.
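
Because Harvey balls map cleanly onto a five-step scale, they are easy to generate programmatically from Unicode’s geometric-shape glyphs. The sketch below is purely illustrative — the ratings and group names are made up, not the actual vendor slide from the meeting:

```python
# Hypothetical sketch: render a vendor-requirements summary with Unicode
# Harvey balls (U+25CB, U+25D4, U+25D1, U+25D5, U+25CF). The ratings and
# group names below are illustration data, not the slide from the post.
HARVEY = "○◔◑◕●"  # 0 through 4 quarters filled

def harvey_ball(rating, max_rating=4):
    """Map a numeric rating onto a quadrant-granularity Harvey ball."""
    filled = round(4 * rating / max_rating)
    return HARVEY[max(0, min(4, filled))]

ratings = {"Reporting": 4, "Integration": 2, "Support": 3}
# Sort groups from highest to lowest, as we decided to do on the slide
for group, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{harvey_ball(score)}  {group}")
```

Note that the deliberate quadrant granularity is what keeps these honest — anything finer-grained and you are back to squinting at pie wedges.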

Overall, it’s one more tool for my (and your) data visualization toolkit!

Analysis, Presentation, Reporting

Techrigy — New Kid on the Social Media Measurement Block

When Connie Bensen posted that she had formalized a relationship with Techrigy to work on their community, I had to take a look! She gave me a demo of their SM2 product today, and it is very cool. SM2 is pretty clearly competing with radian6, in that their tool is geared around measuring and monitoring a brand/person/company/product’s presence in the world of Web 2.0. I’m not an expert on this space by any means, although I have caught myself describing these sorts of tools as “clip services” for social media. But, hey, I’m not a PR person, either, so I barely know what clip services do!

I started out by stating how little I know about this area for a reason. It’s because this post is my take on the tool from something of a business intelligence purist perspective. Take it for what it’s worth.

What I Liked

The things that impressed me about SM2 — either enough to stick in my head through the rest of the day or because I jotted them down:

  • They brought a community expert (Connie) on board early; on the one hand, Connie is there to help them “build their community,” which, in and of itself, is a pretty brilliant move. But, what they’ve gotten at the same time is someone who is going to use their product heavily to support herself in the role, which means they’ll be eating their own dogfood and getting a lot of great feedback about what does/does not work from a true thought leader in the space. More on that in the “Opportunities for Maturity” section below…
  • The tool keeps data for all time — it doesn’t truncate after 30 days or, as I understand it, aggregate data over a certain age so that there is less granularity. I’m not entirely sure, but it sort of sounds like the tool is sitting on a Teradata warehouse. If that’s the case, then they’re starting off with some real data storage and retrieval horsepower — it’s likely to scale well
    UPDATE: I got clarification from Techrigy, and it’s not Teradata (too expensive) as the data store. It’s “a massively parallel array of commodity databases/hardware.” That sounds like fun!
  • Users can actually add data and notes in various ways to the tool; a major hurdle for many BI tools is that they are built to allow users to query, report, slice, dice, and, generally pull data…but don’t provide users with a way to annotate the data; I would claim this is one of the reasons that Excel remains so popular — users need to make notes on the data as they’re evaluating it. Some of the ways SM2 allows this sort of thing:
    • On some of their core trending charts, the user can enter “events” — providing color around a spike or dip by noting a particular promotion, related news event, a crisis of some sort, etc. That is cool:
    • The tool allows drilling down all the way to specific blog authors — there is a “Notes” section where the user can actually comment about the author: “tried to contact three times and never heard back,” “is very interested in what we’re doing,” etc. This is by no means a robust workflow, but it seemed like it would have some useful applications.
    • The user could override some of the assessments that the tool made — if it included references from “high authority” sources that really weren’t…the user could change the rating of the reference
  • Integration at some level with Technorati, Alexa, and compete.com — it’s great to see third-party data sources included out of the box (although it’s not entirely clear how deep that integration goes); all three of these have their own shortcomings, but they all also have a wealth of data and are good at what they do; SM2 actually has an “SM2 Popularity” calculation that is analogous to Technorati Authority (or Google PageRank, to extend it a bit farther)
  • The overall interface is very clean — much more Google Analytics‘y than WebTrends-y (sorry, WebTrends)

Overall, the tool looks very promising! But, it’s still got a little growing up to do, from what I could see.

Opportunities for Maturity

I need to put in another disclaimer: I got an hour-long demo of the tool. I saw it, but haven’t used it.

With that said, there were a few things that jumped out at me as, “Whoa there, Nellie!” issues. All are fixable and, I suspect, fixable rather easily:

  • I said the interface overall was really clean, and the screen capture above is a good example — Stephen Few would be proud, for the most part. Unfortunately, there are some pretty big no-no’s buried in the application as well from a data visualization perspective:
    • The 3D effect on a bar chart is pointless and evil
    • The tool uses pie charts periodically, which are generally a bad idea; worse, though, is that they frequently represent data where there is a significant “Unknown” percentage — the tool consistently seems to put “Unknown: <number>” under the graph. The problem is that pie charts are deeply rooted in our brains to represent “the whole” — not “the whole…except for the 90% that we’re excluding”

    The good news on this is that whatever tool SM2 is running under the hood to do the visualization clearly has the flexibility to present the data just about any way they want (see the screen capture earlier in this post); it should be an easy fix.

  • The “flexibility” of the tool is currently taken to a bit of an extreme. This is really a bit of an add-on to the prior point — it doesn’t look like any capabilities of the underlying visual display tool have been turned off. There are charting and graphing options that make the data completely nonsensical. This is actually fairly common in technology-driven companies (especially software companies): make the tool infinitely flexible so that the user “can” do anything he wants. The problem? Most of the users are going to simply stick with the defaults…and even more so if clicking on any of the buttons to tweak the defaults brings on a tidal wave of flexibility. Can you say…Microsoft Word?
  • There is some language/labeling inconsistency in the tool, which they’re clearly working to clean up. But, the tool has the concept of “Categories,” which, as far as I could tell, was a flat list of taggability. That meant that a “category” could be “Blogs.” Another category could be “Blogger,” which is a subset of Blogs…presumably. Another category could be “mobile healthcare,” which is really more of a keyword. In some places, these different types of tags/categories were split out, but the “Categories” area, which can be used for filtering and slicing the data, seemed to invite apples-and-oranges comparison. This one, definitely, may just be me not fully understanding the tool

Overall, Though, I’d Give It a “Strong Buy”

The company and the product seem to have a really solid foundation — strategy, approach, infrastructure, and so on. There are some little things that jumped out at me as clear areas for improvement…but they’re small and agile, so I suspect they’ll take feedback and incorporate it quickly. And, most of the things I noticed are the same traps that the enterprise BI vendors stumble into release after release after release.

Mostly, I’m interested to see what Connie comes up with as she gets in and actually road tests the tool for herself and for Techrigy. In one sense, SM2 is “just” an efficiency tool — it’s pulling together and reporting data that is available already through Google Alerts, Twitter Search, Twemes, Technorati, and so on. And, with many of these tools providing information through customized RSS feeds, a little work with Yahoo! Pipes can aggregate that information nicely. The problem is that it takes a lot of digging to get that set up, and the end result is still going to be clunky. SM2 is set up to do a really nice job of knocking out that legwork and presenting the information in a way that is useful and actionable.

Fun stuff!

Presentation, Reporting

Dashboard Design Part 3 of 3: An Iterative Tale

On Monday, we covered the first chapter of this bedtime tale of dashboard creation: a cutesy approach that made the dashboard into a straight-up reflection of our sales funnel. Last night, we followed that up with the next performance management tracking beast — a scorecard that had lots (too much) detail and too much equality across the various metrics. Tonight’s tale is where we find a happy ending, so snuggle in, kids, and I’ll tell you about…

Version 3 – Hey…Windows Was a Total POS until 3.1…So I’m Not Feeling Too Bad!

(What’s “POS?” Um…go ask your mother. But don’t tell her you heard the term from me!)

As it turned out, versions 1 and 2, combined with some of the process evolution the business had undergone and some data visualization research and experimentation, meant that I was a week’s worth of evenings and a decent chunk of one weekend away from something that actually works:

Some of the keys that make this work:

  • Heavy focus on Few’s Tufte-derived “data-pixel ratio” — asking the question for everything on the dashboard: “If it’s not white space, does it have a real purpose for being on the dashboard?” And, only including elements where the answer is, “Yes.”
  • Recognition that all metrics aren’t equal — I seriously beefed up the most critical, end-of-the-day metrics (almost too much — there’s a plan for the one bar chart to be scaled down in the future once a couple other metrics are available)
  • The exact number of what we did six months ago isn’t important — I added sparklines (with targets when available) so that the only specific number shown is the month-to-date value for the metric; the sparkline shows how the metric has been trending relative to target
  • Pro-rating the targets — it made for formulas that were a bit hairier, but each target line now assumes linear growth over the course of the month; the target on Day 5 of a 30-day month is 1/6 of the total target for the month
  • Simplification of alerts — instead of red/yellow/green…we went to red/not red; this really makes the trouble spots jump out
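
The pro-rating and red/not-red logic above reduces to two tiny functions. This is a sketch of the idea rather than the actual Excel formulas; the function names and sample numbers are mine:

```python
# Sketch of the pro-rated target and red/not-red alert logic described above.
# Assumes linear growth over the month; names and sample numbers are
# illustrative, not from the actual dashboard.
def prorated_target(month_target, day, days_in_month):
    """Linear pro-rating: on day 5 of a 30-day month, 5/30 of the target."""
    return month_target * day / days_in_month

def is_red(actual, month_target, day, days_in_month):
    """Red/not-red: flag only when month-to-date actual trails the pro-rated target."""
    return actual < prorated_target(month_target, day, days_in_month)

# Day 5 of a 30-day month with a monthly target of 600 leads:
target_so_far = prorated_target(600, 5, 30)  # 100.0 — 1/6 of 600
print(target_so_far, is_red(80, 600, 5, 30), is_red(120, 600, 5, 30))
```

Comparing actuals to the pro-rated target, rather than the full monthly target, is what keeps the first half of the month from showing up as a wall of red.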

Even as I was developing the dashboard, a couple of things clued me in that I was on a good track:

  • I saw data that was important…but that was out of whack or out of date; this spawned some investigations that yielded good results
  • As I circulated the approach for feedback, I started getting questions about specific peaks/valleys/alerts on the dashboard – people wound up skipping the feedback about the dashboard design itself and jumping right to using the data

It took a couple of weeks to get all of the details ironed out, and I took the opportunity to start a new Access database. The one I had been building on for the past year still works and I still use it, but I’d inadvertently built in clunkiness and overhead along the way. Starting “from scratch” was essentially a minor re-architecting of the platform…but in a way that was quick, clean and manageable.

My Takeaways

Looking back, and telling you this story, has given me a chance to reflect on what the key learnings are from this experience. In some cases, the learning has been a reinforcement of what I already knew. In others, the ideas were new (to me):

  • Don’t Stop after Version 1 — obviously, this is a key takeaway from this story, but it’s worth noting. In college, I studied to be an architect, and a problem that I always had over the course of a semester-long design project was that, while some of my peers (many of whom are now successful practicing architects) wound up with designs in the final review that looked radically different from what they started with, I spent most of the semester simply tweaking and tuning whatever I’d come up with in the first version of my design. At the same time, these peers could demonstrate that their core vision for their projects was apparent in all designs, even if it manifested itself very differently from start to finish. This is a useful analogy for dashboard design — don’t treat the dashboard as “done” just because it’s produced and automated, and don’t consider a “win” simply because it delivered value. It’s got to deliver the value you intended, and deliver it well to truly be finished…and then the business can and will evolve, which will drive further modifications.
  • Democratizing Data Visualization Is a “Punt” — in both of the first two dashboards, I had a single visualization approach and I applied that to all of the data. This meant that the data was shoe-horned into whatever that paradigm was, regardless of whether it was data that mattered more as a trend vs. data that mattered more as a snapshot, whether it was data that was a leading indicator  vs. data that was a direct reflection of this month’s results, or whether the data was a metric that tied directly to the business plan vs. data that was “interesting” but not necessarily core to our planning. The third iteration finally broke out of this framework, and the results were startlingly positive.
  • Be Selective about Detailed Data — especially in the second version of the scorecard, we included too much granularity, which made the report overwhelming. To make it useful, the consumers of the dashboard needed to actually take the data and chart it. One of the worst things a data analyst can do is provide a report that requires additional manipulation to draw any conclusions.
  • Targets Matter(!!!) — I’ve mounted various targets-oriented soapboxes in the past, but this experience did nothing if it didn’t shore up that soapbox. The second and third iterations of the dashboard/scorecard included targets for many of the metrics, and this was useful. In some cases, we missed the targets so badly that we had to go back and re-set them. That’s okay. It forced a discussion about whether our assumptions about our business model were valid. We didn’t simply adjust the targets to make them easier to hit — we revisited the underlying business plan based on the realities of our business. This spawned a number of real and needed initiatives.

Will There Be Another Book in the Series?

Even though I am pleased with where the dashboard is today, the story is not finished. Specifically:

  • As I’ve alluded to, there is some missing data here, and there are some process changes in our business that, once completed, will drive some changes to the dashboard; overall, they will make the dashboard more useful
  • As much of a fan as I am of our Excel/Access solution…it has its limitations. I’ve said from the beginning that I was doing functional prototyping. It’s built well enough with Access as a poor man’s operational data store and Excel as the data visualization engine that we can use this for a while…but I also view it as being the basis of requirements for an enterprise BI tool (in this regard, it jibes with a parallel initiative that is client-facing for us). Currently, the dashboard gets updated with current data when either the Director of Finance or I check it out of Sharepoint and click a button. It’s not really a web-based dashboard, it doesn’t allow drilling down to detailed data, and it doesn’t have automated “push” capabilities. These are all improvements that I can’t deliver with the current platform.
  • I don’t know what I don’t know. Do you see any areas of concern or flaws with the iteration described in this post? Have you seen something like this fail…or can you identify why it would fail in your organization?

I don’t know when this next book will be written, but you’ll read it here first!

I hope you’ve enjoyed this tale. Or, if nothing else, it’s done that which is critical for any good bedtime story: it’s put you to sleep!  🙂

Presentation, Reporting

Dashboard Design Part 2 of 3: An Iterative Tale

Yesterday, I described my first shot at developing a weekly corporate dashboard for my current company. It was based on the concept of the sales funnel and, while a lot of good came out of the exercise…it was of no use as a corporate performance management tool.

Tonight’s bedtime story will be chapter 2, where the initial beast was slain and a new beast was created in its place. Gather around, kids, and we’ll explore the new and improved beast…

Version 2: A Partner in Crime and a Christmas Tree Scorecard

Several months after the initial dashboard had died an abrupt and appropriate death, we found ourselves backing into looking at monthly trends on a regular basis for a variety of areas of the business. I was involved, as was our Director of Finance. I honestly don’t remember exactly how it happened, but a soft decree hit both of us that we needed to be circulating that data amongst the management team on a weekly basis.

Now, several very positive things had happened by this point that made the task doable:

  • We’d rolled into a new year, and the budgeting and planning that led up to the new year led to a business plan with more specific targets being set around key areas of the business
  • We had cleaned up our processes — the reality of them rather than simply the theory; they were still far from perfect, but they had moved in the right direction to at least have some level of consistency
  • We had achieved greater agreement/buy-in/understanding that there was underlying and necessary complexity in our business, both our business model and our business processes

Although I would still say we failed, we at least failed forward.

As I recall, the Director of Finance took a first cut at the new scorecard, as he was much more in the thick of things when it came to providing the monthly data to the executive team. I then spent a few evenings filling in some holes and doing some formatting and macro work so that we had a one-page scorecard that showed rolling month-to-month results for a number of metrics. These metrics still flowed loosely from the top to the bottom of a marketing and sales funnel:

Some things we did right:

  • Our IT organization had been very receptive to my “this is a nuisance”-type requests over the preceding months and had taken a number of steps to make much of the data more accessible to me much more efficiently (my “data update” routine dropped from taking my computer over an hour to complete to taking under 5 minutes); “my” data for the scorecard was still pulled from the same underlying Access database, but it was pulled using a whole new set of queries
  • We incorporated a more comprehensive set of metrics — going beyond simply Sales and Marketing metrics to capture some key Operations data
  • We accepted that we needed to pull some data from the ERP system — the Director of Finance would handle this and had it down to a 5-minute exercise on his end
  • Because we had targets for many of the metrics, we were able to use conditional formatting to highlight what was on track and what wasn’t. And, we added a macro that would show/hide the targets to make it easy to reduce the clutter on the scorecard (although it was still cluttered even with the targets hidden)
  • We reported historical data — the totals for each past month, as well as the color-coding of where that month ended up relative to its target.
  • We allowed a few metrics that did not have targets set — offending my purist sensibilities, and, honestly, this was the least useful data, but it was appropriate to include in some cases.

We even included limited “drilldown” capability — hyperlinks next to different rows in the scorecard (not shown in the image above) that, when clicked, jumped to another worksheet that had more granular detail.

But the scorecard was still a failure.

We found ourselves updating it once a week and pulling it up for review in a management meeting…and increasingly not discussing it at all. As a matter of fact, just how abstract-but-not-useful this weekly exercise had become only really became clear when we got to version 3…and we quickly realized how much of the data we had let lapse when it came to updates.

So, what was wrong with it? Several things:

  • Too much detailed data — because we had forsaken graphical elements almost entirely, we were able to cram a lot of data into a tabular grid. We found ourselves including some metrics to make the scorecard “complete” simply because we could — for instance, if we included total leads and, as a separate metric, leads who were entirely new to the company, then, for the sake of symmetry, we included the number of leads for the month who were already in our database: new + existing = total. This was redundant and unnecessary
  • We treated all of the metrics the same — everything was represented as a monthly total, be it the number of leads generated, the number of opportunities closed, the amount of revenue booked, or the headcount for the company; we didn’t think about what really made sense — we just presented it all equally
  • No pro-rating of the targets — we had a simple red/yellow/green scheme for the conditional formatting alerts; but, we compared the actuals for each metric to the total targets for the month; this meant that, for the first half of the month, virtually every metric was in the red

Pretty quickly, I saw that version 2 represented some improvements from version 1, but, somehow, wasn’t really any better at helping us assess the business.

At that point, we fell into a pretty common trap of data analysts: once a report has stabilized, we find a way to streamline its production and automate it as much as possible simply to remove the tedium of the creation. I’ve got countless examples from my own experience where a BI or web analytics tool has the ability to automate the creation and e-mailing of reports. Once it’s automated, the cost to produce it each day/week/month goes virtually to zero, so there is no motivation to go back and ask, “Is this of any real value?” Avinash Kaushik calls this being a “reporting squirrel” (see Rule #3 on his post: Six Rules for Creating A Data-Driven Boss) or a “data puke” (see Filter #1 in his post: Consultants, Analysts: Present Impactful Analysis, Insightful Reports), and it’s one of the worst places to find yourself.

Even though I was semi-aware of what had happened, the truth is that we would likely still be cruising along producing this weekly scorecard save for two things:

  • What was acceptable for internal consumption was not acceptable for the reports we provided to our clients. The other almost-full-time analyst in the company and I had embarked on some aggressive self-education when it came to data visualization best practices; we started trolling The Dashboard Spy site, we read some Stephen Few, we poked around in the new visualization features of Excel 2007, and generally started a vigorous internal effort to overhaul the reporting we were providing to our clients (and to ourselves as our own clients)
  • The weekly meeting where the managers reviewed the scorecard got replaced with an “as-needed” meeting, with the decision that the scorecard would still be prepared and presented weekly…to the entire company

So, what really happened was that fear of being humiliated internally spurred another hasty revision of the scorecard…and its evolution into more of a dashboard.

And that, kids, will be the subject of tomorrow’s bedtime tale. But, as you snuggle under your comforter and burrow your head into your pillow, think about the approach I’ve described here. Do you use something similar that actually works? If so, why? What problems do you see with this approach? What do you like?

Presentation, Reporting

Dashboard Design Part 1 of 3: An Iterative Tale

One of my responsibilities when I joined my current company was to institute some level of corporate performance management through the use of KPIs and a scorecard or dashboard. It’s a small company, and it was a fun task. In the end, it took me over a year to get to something that really seems to work. On the one hand, that’s embarrassing. On the other hand, it was a side project that never got a big chunk of my bandwidth. And, like many small companies, we have been fairly dynamic when it comes to nailing down and articulating the strategies we are using to drive the company.

Looking back, there have been three very distinct versions of the corporate scorecard/dashboard. What drove them, what worked about them, and what didn’t work about them, makes for an interesting story. So gather around, children, and I will regale you with the tale of this sordid adventure. Actually, we don’t have time to go through the whole story tonight, so we’ll hit one chapter a day for the next three days.

If you want to click on your flashlight and pull the covers over your head and do a little extra reading after I turn off the light, Avinash Kaushik has a recent post that was timely for me to read as I worked up this bedtime tale: Consultants, Analysts: Present Impactful Analysis, Insightful Reports. The post has the seven “filters” Avinash developed as he judged a WAA competition, and it’s a bit skewed towards web analytics reporting…but, as usual, it’s pretty easy to extrapolate his thoughts to a broader arena. The first iteration of our corporate dashboard would have gotten hammered by most of his filters. Where we are today (which we’ll get to in due time), isn’t perfect, but it’s much, much better when assessed against these filters.

One key piece of background here is that the technology I’ve had available to me throughout this whole process does not include any of the big “enterprise BI” tools. All three of the iterations were delivered using Excel 2003 and Access 2003, with some hooks into several different backend systems.

That was fine with me for a couple of reasons:

  • It allowed me to produce and iterate on the design quickly and independently – I didn’t need to pull in IT resources for drawn-out development work
  • It was cheap – I didn’t need to invest in any technology beyond what was already on my computer

So, let’s dive in, shall we?

Version 1: The “Clever” Approach As I Learned the Data and the Business

I rolled out the first iteration of a corporate dashboard within a month of starting the job. I took a lot of what I was told about our strategy and objectives at face value and looked at the exercise as being a way to cut my teeth on the company’s data, as well as a way to show that I could produce.

The dashboard I came up with was based on the sales funnel paradigm. We had clearly defined and deployed stages (or so I thought) in the progression of a prospect from the point of being simply a lead all the way through being an opportunity and becoming revenue. We believed that what we needed to keep an eye on week to week was pretty simple:

  • How many people were in each stage
  • How many had moved from one stage to another

We had a well-defined…theoretical…sales funnel. We had Marketing feeding leads into that funnel. Sure, the data in our CRM wasn’t perfect, but by reporting off of it, we would drive improvements in the data integrity by highlighting the occasional wart and inconsistency. Right…?

I crafted the report below. Simply put, the numbers in each box represented the number of leads/opportunities at that stage of our funnel, and the number in each arrow between a box represented the number who had moved from one box to another over the prior week.

High fives all around!

Except…

It became apparent almost immediately that the report was next to useless when it came to its intended purpose:

  • It turned out, our theoretical funnel really didn’t match reality – our funnel had all sorts of opportunities entering and exiting mid-funnel…and there was generally a reasonable explanation each time that happened.
  • There were no targets for any of these numbers – I’d quietly raised this point up front, but was rebuffed with the even-then familiar refrain: “We can’t set a target until we look at the data for a while.” But…no targets were ever set. Partly because…
  • “Time” was poorly represented – the arrows represented a snapshot of movement over the prior week…but no trending information was available
  • Much of the data didn’t “match” the data in the CRM – while the data was coming from the underlying database tables in the CRM, I had to do some cleanup and massaging to make it truly fit the funnel paradigm. Between that and the fact that I was only refreshing my data once/week, a comparison of a report in the CRM to my weekly report invariably invited questions as to why the numbers were different. I could always explain why, and I was always “right,” and it wasn’t exactly that people didn’t trust my report…but it just made them question the overall point a little bit more.
  • I had access to the data in some of our systems…but not all of them. Most importantly, our ERP system’s data was not readily accessible, either through scheduled report exports or an ODBC connection, and, at the end of the day, that’s where several of our KPIs (in reality, if not in name) lived. And, back to my first point, there were theoretical ways to get financial data out of our CRM, but, in practice, there was often a wide gulf between the CRM’s numbers and the ERP’s.

As I labored to address some of these issues, I wound up with several versions of the report that, tactically, did a decent job…but made the report more confusing.

The sorts of things I tried included:

  • Adding arrows and numbers that would conditionally appear/disappear in light gray that showed non-standard entries/exits from the funnel
  • Adding information within each box to indicate how it compared to the prior week (still not a “trend,” but at least a week-over-week comparison)
  • Adding moving averages for many of the numbers
  • Adding a total for the prior 12 weeks for many of the numbers
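The smoothing tactics in that list are simple arithmetic. As a rough sketch (in Python rather than the Excel/Access stack the post describes, and with made-up weekly lead counts), a trailing moving average looks like this:

```python
from collections import deque

def moving_average(values, window=4):
    """Trailing moving average over the last `window` points;
    early points average over however many values exist so far."""
    out, buf = [], deque(maxlen=window)
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

# Hypothetical weekly lead counts, just for illustration
weekly_leads = [40, 52, 38, 45, 60, 41]
ma = moving_average(weekly_leads)
```

The 12-week trailing total mentioned above is the same idea with `sum(buf)` in place of the average and `window=12`.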

All told, I had five different iterations on this concept — each time taking feedback as to what it was lacking or where it was confusing and trying to address it.

To no avail.

Even as I look back on the different iterations now, it’s clear that each iteration introduced as many new issues as it addressed existing ones.

Still, some real good had come of the exercise:

  • I understood the data and our processes quite well -– tracking down why certain opportunities behaved a certain way gave me a firehose sip of knowledge about our internal sales processes
  • With next to zero out-of-pocket technology investment, I’d built a semi-automated process for aggregating and reporting the data –- I had to run a macro in MS Access that took ~1 hour to run (it was pulling data across the Internet from our SaaS CRM) and then do a “Refresh All” in Excel; I still had a little bit of manual work each week, so it took me ~30 minutes each time I produced the report
  • I’d built some credibility and trust with IT –- as I dove in to try to understand the data and processes, I was quickly asking intelligent questions and, on occasion, uncovering minor system bugs

Unfortunately, none of these were really the primary intended goal of the dashboard. The report really just wasn’t of much use to anyone. This came to a head one afternoon after I’d been dutifully producing it each week (and scratching my head as to what it was telling me) when the CEO, in a fit of polite but real pique, popped off, “You know…nobody actually looks at this report! It doesn’t tell us anything useful!” To which I replied, “I couldn’t agree more!” And stopped producing it.

A few months passed, and I focused more of my efforts on helping clean up our processes and doing various ad hoc analyses –- using the knowledge and technology I had picked up through the initial dashboard development, most assuredly…but the idea of a dashboard/scorecard migrated to the back burner.

Tomorrow, kiddies, as I tuck you in at night, I’ll tell the tale of Version 2 — a scorecard with targets! As you drift off to sleep though, ponder this version. What would you have done differently? What problems with it do you see? Is there anything that looks like it holds promise?

Presentation

Test Your Data Visualization IQ

Data visualization has really been on my mind of late. Partly because I’ve personally been struggling to produce some effectively presented information, and, even more so, because one of my co-workers has been spending even more time overhauling the way we communicate data-driven information to our clients. He’s making a lot more headway on his work than I am on mine, unfortunately (for me).

Last week, he pinged me with a link to a 10-question Graph Design IQ test on the Perceptual Edge web site. He and I both patted ourselves on the back for scoring 10 out of 10…but, then again, he and I had both recently read, and are working to apply, Stephen Few’s Information Dashboard Design. And, Perceptual Edge = Stephen Few. So, let’s face it, it would be downright embarrassing if we’d just read the book…and then not aced the test. Still, it’s a good exercise — almost a memory-jogger as to the concepts Few lays out and the rationale behind them.

Take the test and see how you do. It’s all of 10 questions long, and each question only has two answers, so it’s a 2-minute exercise. If you get a question wrong, a little box comes up and gives a very quick/brief explanation as to why. It’s interesting.

On a related note, thanks to a couple of people on Twitter, I got pointed to an interesting post on creating bullet graphs using Google Spreadsheets. One of these days, doggonit, I’m going to lock myself away and play around with Google Docs. I managed to spend 15 minutes trying out a revamped scorecard using Google Docs today. But, that was 15 minutes in five 3-minute chunks, so I didn’t make much headway. Stay tuned, though, and I’ll let you know if I ever manage to produce anything there!

Excel Tips, Presentation

Stephen Few's Derivation of Tufte: The Data-Pixel Ratio

I’ve glanced through various folks’ copies of Stephen Few’s Information Dashboard Design: The Effective Visual Communication of Data on several occasions over the past few years. And, it was a heavy influence on the work that an ad hoc team in the BI department at National Instruments undertook a couple of years ago to standardize/professionalize the work they were putting out.

I finally got around to reading a good chunk of the book as I was flying a three-legged trip out to British Columbia last week…and it is good! One section that particularly struck me started on page 100:

Edward R. Tufte introduced a concept in his 1983 classic The Visual Display of Quantitative Information that he calls the “data-ink ratio.” When quantitative data is displayed in printed form, some of the ink that appears on the page presents data, and some presents visual content that is not data.

He then applies it as a principle of design: “Maximize the data-ink ratio, within reason. Every bit of ink on a graphic requires a reason. And nearly always that reason should be that the ink presents new information.”

This principle applies perfectly to the design of dashboards, with one simple revision: because dashboards are always displayed on computer screens, I’ve changed the word “ink” to “pixels.”

I’ll actually go farther and say that “dashboards” can be replaced with “spreadsheets” and this maxim holds true. Taking some sample data straight from Few’s book, and working with a simple table, below is how at least 50% of Excel users would format a simple table with bookings by geographic region:

Look familiar? The light gray gridlines in the background are turned on by default in Excel. And, there’s the failure to resist the urge to put a “thin” grid around the entire data set.

Contrast that with how Few represents the same data:

Do you agree? This is clearly an improvement, and all Few really did was remove the unnecessary non-data pixels.

So, how would I have actually formatted the table? It’s tough to resist the urge to add color, and I am a fan of alternating shaded rows, which I can add with a single button click based on a macro that adds conditional formatting (“=MOD(ROW()+1,2)=0” for shaded and “=MOD(ROW(),2)=0” for not shaded):

In this case…I’d actually vote for Few’s approach. But, even Few gives the okay to lightly shaded alternating rows later in the same chapter, when some sort of visual aid is needed to follow a row across a large set of data. That’s really not necessary in this case. And, does bolding the totals really add anything? I don’t know that it does.
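For what it’s worth, the row-banding logic behind those two conditional-formatting formulas is easy to sanity-check outside of Excel. Here’s a minimal sketch in Python (not part of the original macro — just a mirror of the `=MOD(ROW()+1,2)=0` rule, remembering that Excel rows are 1-indexed):

```python
def is_shaded(row):
    """Mirror of the Excel rule =MOD(ROW()+1,2)=0: shade odd-numbered rows."""
    return (row + 1) % 2 == 0

def is_unshaded(row):
    """Mirror of the complementary rule =MOD(ROW(),2)=0."""
    return row % 2 == 0

# Excel rows are 1-indexed, so rows 1, 3, 5, 7 get the band.
banded = [r for r in range(1, 9) if is_shaded(r)]
print(banded)  # [1, 3, 5, 7]
```

Every row matches exactly one of the two rules, which is why the two formulas together cover the whole table.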

The book is a great read. It’s easy to dismiss the topic as inconsequential — the data is the data, and as long as it’s presented accurately, does it really matter if it’s presented effectively? In my book, it absolutely does matter. The more effectively the data is presented, the less work the consumer of the data needs to do to understand it. The human brain, while a wondrously effective computer, has its limits, and presenting data effectively allows the brain to spend the bulk of its effort on assessing the information rather than trying to understand the data.

Analysis, Presentation, Reporting

The "Action Dashboard" — Avinash Mounts My Favorite Soapbox

Avinash Kaushik has a great post today titled The “Action Dashboard” (An Alternative to Crappy Dashboards). As usual, Avinash is spot-on with his observations about how to make data truly useful. He provides a pretty interesting 4-quadrant dashboard framework (as a transitional step to an even more powerful dashboard). I’ve gotten red in the face more times than I care to count when it comes to trying to get some of the concepts he presents across. It’s a slow process that requires quite a bit of patience. For a more complete take on my thoughts check out my post over on the Bulldog Solutions blog.

And, yes, I’m posting here and pointing to another post that I wrote on a completely different blog. We’ve recently re-launched the Bulldog Solutions blog — new platform, and, we hope, with a more focussed purpose and strategy. What I haven’t fully worked out yet is how to determine when to post here and when to post there…and when to post here AND there (like this post).

It may be that we find out that we’re not quite as ready to be as transparent as we ought to be over on the corporate blog, in which case this blog may get some posts that are more “my fringe opinion” than will fly on the corporate blog. I don’t know. We’ll see. I know I’m not the first person to face the challenge of contributing to multiple blogs (I’ve also got my wife’s and my personal blog…but that one’s pretty easy to carve off).

Presentation

Oh. So THAT's What Hans Rosling Is Doing at Google…

Yep. I’m living under a rock.

I’d re-stumbled across Hans Rosling and Trendalyzer a couple of months ago. I made a comment wondering when Trendalyzer would hit the business world. Well, in a way, it sort of has. I’m almost sure it’s hanging around in some fashion under the hood of Google’s Visualization API.

Must. Find. Time. To. Play. With. Google Spreadsheets and visualization gadgets.

Analysis, Presentation

Sometimes, the Data DOES Paint a Clear Picture

I’ll admit right up front that this is the least value-add post on this blog to date. Part of me sincerely hopes that it holds that distinction indefinitely. But, I know me better than that, so no promises.

We all have them. Those moments where someone says something — in person, in an e-mail, in an instant message — that triggers a completely random, but oddly inspired, response.

What happened: One of my pet peeves is the cliche, “If you can’t measure it, don’t do it.” It sounds good, but I challenge any company to fully apply this overly simplistic maxim and survive. I’m all for having a bias towards measurement, but I get nervous when people speak in absolutes like this.

Earlier this week, I fired off an internal e-mail proposing an initiative that was extremely low cost that seemed like a good idea to me. It really wasn’t an initiative where it made sense to try to quantify the benefits, though. I made a comment as such in the e-mail — that, despite it not being practical to measure the results, I still thought it was a good idea. (I was having one of the 15-20 snarky moments I have throughout any given day.) Two of the five people on the distribution list immediately responded with demands for an ROI estimate.

FLASH!

10 minutes later, and I’d fashioned the following chart in Excel and responded to the group with my analysis:

The Bird

Everyone had a good chuckle.

Here’s the spreadsheet file itself. It’s as clean as clean can be, so feel free to snag it and put it to your own use. If you put it to use with entertaining results, I’d appreciate a quick comment with the tale. Or, if you make modifications to enhance the end result, I’d love to get a copy.

Enjoy.

Analysis, Presentation, Reporting

Depth vs. Breadth, Data Presentation vs. Absorption, Frank and Bernanke

For anyone who knows me or follows this blog, it will be no surprise that I can get a bit…er…animated when it comes to data visualization. Partly, this may be from my background in Art and Design. I got out of that world as quickly as possible, when I realized that I lacked the underlying wiring to really do visual design well.

As a professional data practitioner, I also see effective data visualization as being a way to manage the paradox of business data: the world of business is increasingly complex, yet the human brain is only able to comprehend a finite level of complexity. And, while I love to bury myself up to my elbows in complex systems and processes, I’m the first person to admit that my eyes glaze over when I’m presented with a detailed balance sheet (sorry, Andy). A picture is worth a thousand words. A chart is worth a thousand data points. That’s how we interpret data most effectively — by aggregating and summarizing it in a picture.

So, it’s pretty important that the picture be “drawn” effectively. I had a boss for a year or two who flat-out was much closer to Stephen Hawking-ish than he was to Homer Simpson when it came to raw brainpower. He took over the management of a 50-person group, and promptly called the whole group together and presented slide after slide of data that “clearly showed”…something or other. The presentation has become semi-legendary for those of us who witnessed it. The fellow was facing a room of blank-confused-bored-bewildered gazes by the time he hit his third slide. Now, to his credit, he learned from the experience. He still looks at fairly raw data…but he’s careful as to how and where he shares it.

All that is a lengthy preamble to a Presentation Zen post I read this evening about Depth vs. Breadth of presentations. It’s a simple concept (meaning I can understand it), with some pretty good, rich examples to back it up. The fundamental point is that none of us spend very much time thinking about what to cut from our presentations. I would extend that to say we don’t spend very much time thinking about what data not to share or show. It’s easy to see this as a case for “make the data support what you want it to,” which it is not. At all! Really, it’s more about focussing on showing the data — and only the data — that directly relates to the objectives you are measuring or the hypotheses that you are testing.

Then, focus on presenting that data in a way that makes it clear as to what story it is telling. You do the hard work of interpreting the data. Then, highlight what is coming out of that interpretation. If there is ambiguity, highlight that, too. If there is a clear story, and your audience gets it, and you then introduce an anomaly, you’re much more likely to have a fruitful, engaging discussion about it. You will learn, and your audience will retain!

In the end, this is a riff on a bit of a tangent, I realize. Robert Frank presents some fairly alarming evidence of college professors aiming for broad and deep…and not gaining any better retention than the slide-happy, chart-crazy PowerPoint users provide in the business setting. He goes on to talk about how, in his teaching, he makes a point, repeats it, comes at it from a different angle, makes the students think about it, and then repeats it again. He goes for deep. His students, I’m sure, leave his introductory economics class with a thoroughly embedded (and accurate) understanding of “opportunity cost” (having seen the term misapplied more than once in my day, and having had to struggle, barely in time, to get to the correct answer during his presentation, I applaud that!).

I’m not arguing for simplicity for simplicity’s sake. I’m arguing for going deep, understanding the complexity, and then distilling it down to a narrative, cleanly presented, that leaves your audience with takeaways that are accurate and absorbed.

And…on that note, have any of you read The Economic Naturalist? It sounds like it would be right up my alley. It’s just a bonus that, if I ever actually attended something that could be labeled a “cocktail party,” I could talk about how I’d “read some of Bernanke’s work!”

Presentation

Guy Kawasaki (Almost) Says 3-D Graphs Are Evil

Guy Kawasaki posted Ten Questions with Garr Reynolds (author of Presentation Zen: Simple Ideas on Presentation Design and Delivery). Question number 10 (which, as it turns out, was not the last question, as, in an apparent nod to Douglas Adams, Kawasaki actually included 13 questions): “Why do you think 2-D graphs are better than 3-D graphs?”

Answer: 3D charts and graphs are very popular with consumers, but in almost every case it is preferable to use 2-D graphics to display 2-D data. Charts with 3-D depth and distortion usually make things harder to see, not easier. Some of the precision is lost. There is beauty in the simple display of the data itself, there is no need to decorate with distorted perspectives. If the graphic is just for showing the roughest of general trends, then there is nothing really wrong with a 3-D chart I suppose, but when you are trying to show a true visual representation of the data in the clearest way possible, a simple chart without 3-D adornment is usually better.

<sniffle>Pardon me while I wipe a tear from my eye.

I’ve ranted about 3-D charts and graphs before. And since. And a third time.

It’s not just me!

The only issue I have is with Reynolds’ supposition that 3-D charts are okay for showing the roughest of general trends. I’d call that the same as saying it’s okay to unload your shotgun at some quail with a friend (or at least a large donor) within range of the spray of pellets. It’s not okay. It’s just not. Unless you are super-duper qualified (meaning you make a good living as a professional graphic designer or artist), don’t do it!

Presentation

What happens if you combine 3D WITH fading axes?

I really hope this blog doesn’t become just a continuous series of rants around my pet peeves regarding data visualization. But…I stumbled across the following in a white paper about measuring visitor engagement on a web site.

This chart was generated from Unica’s Affinium NetInsight product. NetInsight is a great product, from everything I’ve heard. And, Unica seems to have a sharp acquisition strategy, based on their buying Sane (with NetTracker) several years ago.

But…this chart is pretty awful. It highlights how downright silly a gratuitous third dimension is. But, the “gradient to white” move on the Y-axis was a new one on me. Search engines 1 through 5 all actually have a value of 3. I did manage to figure this out without looking at the data itself in the report…but it was 3 beats more of mental exertion than it should have been.

Excel Tips, Presentation

Vitriolic Rant Redux — 3D Pie Charts

Pie charts are generally bad enough. Mainly, because they take a lot of real estate to provide pretty limited information. But, they do have their place. That place is showing the relative relationship of the parts of a whole when there is no time dimension.

3D pie charts, though, are simply horrid! They actually misrepresent the data and remove whatever instantaneous clarity that a flat pie chart provides.

In the pie chart above, which product has the greatest portion of the whole?

Product B. That’s not too hard.

Which is greater, Product A or Product D?

Trick question. They’re the same. And, you probably figured that out. But, in order to do so, your brain had to undo the 3D effect, since when it comes to raw area shown, Product A is larger.

When asked a direct question like, “Which is greater, Product A or Product D,” this isn’t too hard to do. But, that’s not usually the approach of interpreting visual displays of data. Rather, the viewer looks at it and says, “What does this chart tell me?” In a 3D pie chart, your brain has to spend extra cycles doing the A vs. D comparison for every wedge in the pie. And it gets pretty hairy when you’ve got, say, 10 or more wedges. What’s happening is your brain has to go through a (subconscious, but real) effort to remove the 3D effect. That’s an effect that somebody else wasted brain cycles and effort on adding in the first place.
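One source of that extra mental work can even be quantified. A 3-D pie is typically drawn by squashing the circle vertically into an ellipse, which means wedges with identical data shares end up with very different apparent angular spans depending on where they sit. A rough illustration (not from the post — a sketch assuming a vertical squash factor of 0.5, a common tilt):

```python
import math

def apparent_span(start_deg, end_deg, squash=0.5):
    """Apparent angular extent of a wedge after the pie's circle is
    squashed vertically by `squash` to fake a 3-D tilt."""
    def project(deg):
        rad = math.radians(deg)
        # Where the wedge boundary lands on the squashed ellipse
        return math.degrees(math.atan2(squash * math.sin(rad), math.cos(rad)))
    return (project(end_deg) - project(start_deg)) % 360

# Two wedges, each an identical 25% (90 degrees) of the data:
front = apparent_span(-45, 45)   # straddles the near edge of the pie
top = apparent_span(45, 135)     # sits at the top/back
print(front, top)  # roughly 53 vs. 127 degrees
```

Same share of the data, wildly different apparent spans — and the reader’s brain has to undo that (plus the thick front rim) for every wedge.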

This is the sort of inefficiency that process improvement folk salivate over finding in a manufacturing environment: “Person A unwraps a widgetlet and then screws it on to a doohickey and sends it to the next station. Person B then unscrews the widgetlet, inserts a washer, and then screws it back on in the exact same spot.” Obviously, if Person A didn’t screw the widgetlet on in the first place, then the process would have two steps removed: Person A’s screwing on of the widgetlet and Person B’s unscrewing of it.

It’s the same deal with 3D pie charts.

Excel Tips, Presentation

Vitriolic Rant about "3D" Charts

This is my second week in a row in training — just today and tomorrow this week, thankfully. This week, the product is HardMetrics which, frankly, is a pretty damn cool tool. The trainer is the VP of Product Development, who has been architecting and developing in the reporting and analysis space for seven years or so.

HardMetrics actually OEMs a product from Visual Mining for their visual displays. And, there’s a lot of flexibility and power between the two products. However, I asked right off the bat if there was a way to disable the 3D effect from bar charts (there is). I’m not talking about three dimensions of data — just those damn annoying drop-shadows on two dimensions of data.

Below is an example of the 3D effect in a bar chart — quickly and effortlessly generated using Excel 2007.

These sorts of graphs make a southbound feller’s neck hair point due north. Hard! Unfortunately, Microsoft Excel makes it ridiculously easy to spit these charts out. And, as every BI vendor and data visualization tool maker has scrambled to be able to back up their Marketing departments’ claims that their tool will do “everything that you can do in Excel…and then some!” they’ve all gone right along and rolled out the same dastardly functionality.

The problem is: drop shadow adds NO value…AND can make the data harder to read! The reality is, dropping a shadow on a 2D picture is a fairly straightforward transformation. As a matter of fact, something very similar to that is one of the few homework exercises I remember from my C programming class in college (early 1990s).
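To make the “straightforward transformation” point concrete: a basic drop shadow is just a darkened copy of the shape, offset and painted wherever the shape itself isn’t. A toy sketch (in Python rather than the C of that old homework assignment, with a made-up 0/1 pixel grid where 2 marks shadow):

```python
def add_drop_shadow(pixels, dx=1, dy=1):
    """Copy a 0/1 pixel grid, then paint a shadow (2) offset by (dx, dy)
    anywhere the original shape doesn't already cover."""
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]          # the shape itself, unchanged
    for y in range(h):
        for x in range(w):
            if pixels[y][x]:
                sy, sx = y + dy, x + dx       # offset position for the shadow
                if 0 <= sy < h and 0 <= sx < w and out[sy][sx] == 0:
                    out[sy][sx] = 2
    return out

bar = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
shadowed = add_drop_shadow(bar)
```

That’s the whole trick — a few lines of copying and offsetting, which is exactly why it’s pure decoration rather than information.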

The HardMetrics VP who is doing the training admitted that drop shadows are “eye candy” and help sell the product. That’s. Just. Ridiculous! Unfortunately, it’s also got the faint tingling jingle of truth.

Clearly illustrating data…and making it easy to clearly illustrate data…is what should sell products. If the supposed glitz of a drop shadow is the tipping point with a decision maker at a company, that’s a customer who is focussed on the wrong, wrong, WRONG thing, and, chances are, he/she is going to make other non-value-adding demands of the product. Of course, the real world requires generating revenue, and if there are potential customers who have cash in the bank that matches their misperceptions about best practices, then, well….

Check out the chart above. Quickly…estimate the revenue for Product A in July. Do it. Don’t keep reading here. Scroll back up and make an estimate!

It looks to be a little more than $10 million, right? Wrong! It’s $10.8 million! Not only is this a 3D chart…but the bars are plopped down right in the middle of no-man’s land — check out the base of the graph. So, you actually have to project the shadow line back (diagonally and up) and then follow the plot over to the values on the left. That’s just silly. But, oh so easy to do in Excel. Thank you, Microsoft!

What absolutely kills me is that, in Excel, you at least have to consciously decide to add this obfuscating crap to a chart. Which too, too many dunderheads decide to do (all too often, I suspect, because they have a bunch of data and don’t know how to interpret it, so they spend extra time and energy making the chart fancier). The killer is that, in all too many analytics programs, this is the default! And, in some cases, it can’t be changed! This was the case with WebTrends OnDemand (sorry, WebTrends — I hate to pick on a solid tool that has a lot of positive features, but, in this regard, it talks, walks, and craps like a duck, so I’ve got to give it a quack out). These are people who should know better. It was a case of perception becoming reality — Marketing decided this was a “cool” feature that everyone else had, and nobody stood up to keep it from becoming the de facto standard. Ugh!!!

Now, fortunately, as I was poking around on the ‘net for some good examples of this abomination…I hit some of the really big players in the BI space…and they had limited use of 3D in the screen caps they showed. I don’t know if that’s because the likes of Stephen Few, Edward Tufte, and the many good folk over at TDWI finally got it through their heads…or if I just hit the wrong pages. (DISCLAIMER: I have never seen Few’s or Tufte’s feelings on this specific subject — they’re two of the most brilliant minds in the visual presentation of data world, so I feel like I’m pretty safe by guessing that they’re not big fans.) It was an encouraging sign.

Some time…when I don’t have four baskets of laundry to fold, I’ll tell you what I really think about pie charts and — the horror! — 3D pie charts! Stay tuned.