Quotable Quotes from Nate Silver
It’s hard to be an analyst and not be a fan of Nate Silver. Actually, I think it might be the law — one of those “natural law” things, like gravity (“Obey gravity! It’s the law!”), rather than one of those legislated ones.
Not too long ago, I wrote a post that gave my take on one aspect of the post-election commentary about Silver’s work. In some of the Twitter exchanges around that post, Jim Cain suggested that I really should read Silver’s book, as the content of the post lined up well with some of the topics Silver covered. I’d planned to read the book over the holidays, anyway, but his nudge convinced me to go ahead and buy the Kindle edition and bump it up to the top of my list.
THAT was a great move (thank you, Twitter and Jim!). I’ve still got some digesting (and rereading) to do, but I thought I’d throw out some of my favorite quotes from the book as a blog post. In order of appearance…
The most elegant description of why having massive amounts of data at your disposal lets you tell whatever story you want:
The instinctual shortcut that we take when we have “too much information” is to engage with it selectively, picking out the parts we like and ignoring the remainder, making allies with those who have made the same choices and enemies with the rest.
On the role of the analyst and the necessity for thought behind the data:
The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning…Data-driven predictions can succeed–and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.
But…also recognizing that people are not machines (he spends a lot of time breaking down the evolution of chess-playing computers to articulate the strengths and weaknesses of computers and humans when it comes to prediction):
We can never make perfectly objective predictions. They will always be tainted by our subjective point of view.
Silver actually quotes John P. A. Ioannidis, author of a paper called “Why Most Published Research Findings Are False,” at length and then explains in simple terms the mathematical realities of digging into Big Data:
“In the last twenty years, with the exponential growth in the availability of information, genomics, and other technologies, we can measure millions and millions of potentially interesting variables,” Ioannidis told me. “The expectation is that we can use that information to make predictions work for us. I’m not saying that we haven’t made any progress. Taking into account that there are a couple of million papers, it would be a shame if there wasn’t. But there are obviously not a couple of million discoveries. Most are not really contributing much to generating knowledge.”
This is why our predictions may be more prone to failure in the era of Big Data. As there is an exponential increase in the amount of available information, there is likewise an exponential increase in the number of hypotheses to investigate. For instance, the U.S. government now publishes data on about 45,000 economic statistics. If you want to test for relationships between all combinations of two pairs of these statistics–is there a causal relationship between the bank prime loan rate and the unemployment rate in Alabama?–that gives you literally one billion hypotheses to test.
But the number of meaningful relationships in the data–those that speak to causality rather than correlation and testify to how the world really works–is orders of magnitude smaller. Nor is it likely to be increasing at nearly so fast a rate as the information itself; there isn’t any more truth in the world than there was before the internet or the printing press. Most of the data is just noise, as most of the universe is filled with empty space.
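As a quick sanity check of that “one billion” figure (my own back-of-the-envelope math, not something from the book), pairing up 45,000 series two at a time does indeed work out to just over a billion combinations:

```python
from math import comb

# Number of economic statistics Silver cites
n_statistics = 45_000

# Unordered pairs of statistics: "n choose 2"
n_pairs = comb(n_statistics, 2)

print(f"{n_pairs:,}")  # 1,012,477,500 -- a bit over one billion hypotheses
```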
I have a whole slew of reading and understanding-deepening to do around Bayesian reasoning, Fisher’s statistical method, frequentists, and all sorts of other data science-y topics spawned by the middle part of the book (so, my original plan to simply read this book over the holidays has now been replaced by a plan to dig into Matt Gershoff’s list of data science and machine learning resources). Silver is a strong believer in what he calls “The Bayesian Path to Less Wrongness:”
…I’m of the view that we can never achieve perfect objectivity, rationality, or accuracy in our beliefs. Instead, we can strive to be less subjective, less irrational, and less wrong. Making predictions based on our beliefs is the best (and perhaps even only) way to test ourselves. If objectivity is the concern for a greater truth beyond our personal circumstances, and prediction is the best way to examine how closely aligned our personal perceptions are with that greater truth, the most objective among us are those who make the most accurate predictions.
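For anyone else just starting down the Bayesian path, here’s a minimal sketch (my own illustration, not an example from the book) of the kind of belief updating Silver has in mind: a prior probability gets revised by evidence via Bayes’ theorem, moving us toward “less wrong” rather than toward certainty.

```python
def bayes_update(prior, p_evidence_given_true, p_evidence_given_false):
    """Return the posterior probability of a hypothesis after one piece of evidence."""
    numerator = p_evidence_given_true * prior
    denominator = numerator + p_evidence_given_false * (1 - prior)
    return numerator / denominator

# Hypothetical numbers for illustration only: start 30% confident in a claim,
# then observe evidence that is twice as likely if the claim is true (60% vs. 30%).
posterior = bayes_update(prior=0.30, p_evidence_given_true=0.60, p_evidence_given_false=0.30)
print(round(posterior, 3))  # 0.462 -- more confident than before, but still far from certain
```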
And, more on the “art and science” of analytics — the need to not simply expect the numbers to give the right answer on their own:
It would be nice if we could just plug data into a statistical model, crunch the numbers, and take for granted that it was a good representation of the real world. Under some conditions, especially in data-rich fields like baseball, that assumption is fairly close to being correct. In many other cases, a failure to think carefully about causality will lead us up blind alleys.
As analysts, how often are we faced with stakeholders who have unrealistic expectations of getting a black-and-white answer? The reality:
In science, one rarely sees all the data point toward one precise conclusion. Real data is noisy–even if the theory is perfect, the strength of the signal will vary.
And, finally, a conclusion that wraps with a clever turn on Reinhold Niebuhr’s Serenity Prayer:
Prediction is difficult for us for the same reason that it is so important: it is where objective and subjective reality intersect. Distinguishing the signal from the noise requires both scientific knowledge and self-knowledge: the serenity to accept the things we cannot predict, the courage to predict the things we can, and the wisdom to know the difference.
These are a sample of the passages I highlighted throughout the book. They capture some of the concepts and ideas the book covers. They don’t — at all — cover the deeply researched examples that Silver uses to illustrate those ideas. From poker to election prediction to weather forecasting (which has gotten much better in the past few decades) to earthquake and financial market prediction (which have barely improved at all by comparison) to predicting terrorism, the depth and breadth of his research is impressive!
I suspect I will be returning to specific aspects of his book in greater detail in the future, but this was a fun re-skim to remind me that the writing and ideas were both outstanding!