Data Visualization — March Madness Style
I got an e-mail last week just a few hours into Round 1 of this year’s NCAA men’s basketball tournament. The subject of the email was simply “dumb graph,” and the key line in the note was:
The “game flow” graph…how in the WORLD is that telling me anything? That the score goes up as the game goes on? Really? Ya think?
My friend was referring to the diagrams that ESPN.com is providing for every game in the tournament. The concept of these graphs is pretty simple: plot the score for each team over the course of the game. For instance, the “Game Flow” graph for the Oklahoma vs. Morgan State game looks like this (you can see the actual graph on the game recap page — just scroll down a bit and it’s on the right):
This isn’t an exact replication, but it’s pretty close — best I could manage in Excel 2007 (the raw data is courtesy of the ESPN.com play-by-play page for the game). ESPN’s graph is a Flash-based chart, so it’s got some interactivity that the image above does not (we’ll get to that in a bit).
The graph shows that the game was tight for the first 4-5 minutes, then Oklahoma pulled away, Morgan State made it really close mid-way through the first half, and then Oklahoma pulled away and never looked back. My friend had a point, though — the dominant feature of the graph is that both lines trend up and to the right…and any chart of a basketball game is going to exhibit that pattern (actually, the play-by-play for that game has a couple of hiccups such that, when I originally pulled the data, I had a couple places where the score went down due to out-of-sequence free throw placement…but I noticed the issue and fixed it). In business, we’re pretty well conditioned to see “up and to the right” as a good thing…but it’s meaningless in the case of a basketball game.
Compare that graph to a game that was much closer — the Clemson vs. Michigan game (the graph on ESPN’s site is on the recap page, and the raw data is on the play-by-play page):
This was a tighter game all through the first half. Clemson led for the first 7-8 minutes, Michigan pulled substantially ahead early in the second half, and then things got tight in the last few minutes of the game. But, again, both lines moved up and to the right.
These charts are not difficult to interpret:
- The line on top is the team that is leading
- The distance between the lines is the size of the lead
- The lines crossing signifies a lead change
But, could we do better? Well, my wife and kids are out-of-town for the week (spring break), I have the social life you’d expect from someone who blogs about data and data visualization, and the fridge is well-stocked with beer. Party. ON!
At best, my level of basketball fan-ness hovers right around “casual.” Still, I follow it enough to know the key factors of a game update or game upset (Think: “Hey, Joe. What’s the score?”). Basically:
- Who’s winning?
- By how much?
(If there’s time for a third data point, the actual score is an indication of whether it’s a high scoring shootout or a low scoring defense-oriented game.)
Given these two factors as the key measures of a game, take another look at the graphs above. When the game is tight, you have to look closely to assess who is winning. And, determining how much they’re winning by requires some mental exertion (try it yourself: look back at the last graph and ask yourself how much Michigan was winning by halfway through the second half).
This is just begging for a Stephen Few-style exercise to see if I can do better.
First, the Oklahoma/Morgan State game:
Rather than plotting both team’s scores, with the total score on the Y-axis, this chart plots a single line with the size of the lead — whichever side of the “0” line the plot is on is the team that is winning. The team on the top is the higher seed, and the team on the bottom is the lower seed. I added the actual score at halftime and the end of the game, as well as each team’s seed. Compare that chart to the much closer Clemson/Michigan game:
The chart looks very different — focussing on what information fans really want and presenting it directly, rather than presenting the data in a way that requires mental exertion to derive what the fan is really interested in: who’s winning and by how much? While the graphs on ESPN’s site allow you to mouse over any point in the game and see the exact score and the exact amount of time remaining, it’s hard to imagine who would actually care to do that — better to come up with an information-rich and easy-to-interpret static chart than to get fancy with unnecessary interactivity.
A few other subtle changes to the alternative representation:
- I tried to dramatically increase the “data-pixel ratio” (Few’s principle that the ratio of actual data to decoration should be maximized) — this is a little unfair to ESPN, as their site is working with an overall style and palette for the site, but it’s still worth keeping in mind
- I used color on the Y-axis to show which team’s lead is above/below the mid-line. The numbers below the middle horizontal line are actually negative numbers, but with a little Excel trickery, I was able to remove the “-” and change the color of the labels (all done through Custom number formatting)
- By putting the top seed on the top, looking at a full page of these charts would quickly highlight the games that were upsets
I’m my own worst critic, so here are two things I don’t like about the alternate charts above:
- The overall palette still feels a little clunky — the main data plot doesn’t seem to “pop” as much as it should, even though it’s black, and the shaded heading doesn’t feel right
- While the interpretation of the data requires less mental effort once you understand what the chart is showing, it does seem like this approach requires another half-second of interpretation upr front that the original charts don’t require
What do you think? What else could I try to improve the representation?