Now batting: randomness. Part 1: The coin toss

MLB’s World Series is under way. Soon the field of 30 pro baseball teams will have been whittled down to one. Trophies will be engraved, champagne popped, rings forged, parades thrown, and banners hung. It’s an exciting time of year for baseball fans: MLB’s postseason is arguably the most entertaining of the major sports’. It is rife with upsets and comebacks. In recent memory, one team has come back from a 3-0 series deficit to win the American League Championship (Boston Red Sox, 2004), and another scored two runs in the bottom of the ninth and two more in the bottom of the tenth to stave off elimination in game 6 of the World Series (St. Louis Cardinals, 2011).

It just seems like anything can happen.

And it’s true: in baseball, more so than in any other major sport, anything can happen. Because in the big leagues, randomness rules.

***

We’ll get back to baseball in a bit, but first here’s a little experiment: Let’s take 30 fair coins. Now let’s toss each of those 30 coins 162 times and keep track of the percentage of tosses that each individual coin comes up heads. We would expect most coins to come up heads around 81 times, or 50% of the time. But we’d also expect some to come up heads more often and some to come up heads less often. If we run this experiment, and we plot each coin at the percentage that it came up heads, we might get something that looks like this:

[Figure: each of the 30 coins plotted at the percentage of its 162 tosses that came up heads]
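If you’d like to run the coin experiment yourself, here is a minimal sketch in Python. It is not the code from my notebook; the function names (`heads_fraction`, `run_experiment`) and the seed are placeholders I made up for illustration, and it uses only the standard library.

```python
import random

N_COINS = 30      # one coin per MLB team
N_TOSSES = 162    # one toss per regular-season game


def heads_fraction(n_tosses, rng):
    """Toss one fair coin n_tosses times and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


def run_experiment(n_coins=N_COINS, n_tosses=N_TOSSES, seed=None):
    """Return the heads fraction for each of n_coins fair coins."""
    rng = random.Random(seed)  # a private generator, so runs are repeatable
    return [heads_fraction(n_tosses, rng) for _ in range(n_coins)]


if __name__ == "__main__":
    fractions = run_experiment(seed=2013)  # arbitrary seed, chosen for illustration
    for i, f in enumerate(sorted(fractions), start=1):
        print(f"coin {i:2d}: {f:.1%} heads")
```

Change the seed (or drop it) and you get a different set of 30 results each time, which is the whole point: the coins are identical, and only luck separates them.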

Most of the coins are clustered around the 50% mark, but there is a certain amount of spread away from 50%. A measure of this spread is called variance. A high variance means the coins’ results are spread far from the 50% mark, and a low variance means they sit close to it. We can also plot the results like this:

[Figure: distribution curve of the coins’ heads percentages, peaking around 50%]

The curve is highest around 50% because that’s how often most of the coins came up heads. The curve is lower around 40% and 60% because fewer coins came up heads in around those percentages of their 162 tosses.
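The spread of the coin results can also be put into numbers. For a fair coin tossed n times, the variance of the heads fraction is 0.5 × 0.5 / n, and the standard deviation is the square root of that. The sketch below compares that textbook value with the spread of simulated coins; the helper names (`simulate_fractions`, `theoretical_sd`) and the sample sizes are my own assumptions, not something taken from the notebook.

```python
import math
import random
import statistics

N_TOSSES = 162


def simulate_fractions(n_coins, n_tosses=N_TOSSES, seed=None):
    """Heads fraction for each of n_coins fair coins tossed n_tosses times."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < 0.5 for _ in range(n_tosses)) / n_tosses
        for _ in range(n_coins)
    ]


def theoretical_sd(n_tosses=N_TOSSES, p=0.5):
    """Standard deviation of the heads fraction: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n_tosses)


if __name__ == "__main__":
    small = simulate_fractions(30, seed=0)       # one "season" of 30 coins
    large = simulate_fractions(10_000, seed=0)   # many coins, to smooth out noise
    print(f"theory:       sd = {theoretical_sd():.4f}")
    print(f"30 coins:     sd = {statistics.pstdev(small):.4f}")
    print(f"10,000 coins: sd = {statistics.pstdev(large):.4f}")
```

With only 30 coins the measured spread bounces around from run to run; with thousands of coins it settles close to the theoretical value.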

Looking at this plot, you might think that some coins were better at coming up heads than others. You might see a coin that comes up heads 60% of the time. It must be good at coming up heads! But remember, no coin is actually good at coming up heads. The variance (the spread of results away from or close to 50%) is attributable entirely to randomness (luck).
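How surprising is a 60% coin, really? One way to check is to compute the exact binomial probability that a fair coin comes up heads at least 98 times in 162 tosses (98 is the smallest whole number of heads that reaches 60%), and then the chance that at least one of 30 independent fair coins manages it. This is a sketch under those assumptions; `prob_at_least` is a helper name I made up.

```python
from math import ceil, comb

N_TOSSES = 162
N_COINS = 30


def prob_at_least(k, n, p=0.5):
    """Exact probability of k or more heads in n tosses with heads probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Smallest number of heads that counts as "60% or better" over 162 tosses.
min_heads = ceil(0.60 * N_TOSSES)

p_one_coin = prob_at_least(min_heads, N_TOSSES)
# Chance that at least one of 30 independent fair coins reaches the mark.
p_any_coin = 1 - (1 - p_one_coin) ** N_COINS

if __name__ == "__main__":
    print(f"need at least {min_heads} heads")
    print(f"one fair coin:   {p_one_coin:.4%}")
    print(f"any of 30 coins: {p_any_coin:.2%}")
```

The chance is small for any single coin but noticeably larger across 30 of them, so seeing one "hot" coin in a batch of 30 is not evidence that the coin is special.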

Another example: Let’s take 30 MLB teams. Each team plays 162 games in a season; let’s keep track of the percentage of games that each individual team wins. Now let’s plot it just like we did with the coins:

[Figure: distribution curve of the 30 MLB teams’ winning percentages over a 162-game season]

As with the coins, most teams win around 81 games, or 50% of the games they play. The variance is higher, though. Let’s put the coin plot and the baseball plot together so we can compare them.

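[Figure: the coin-toss distribution (blue line) and the MLB winning-percentage distribution (green line) plotted together]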

If we randomly determine the winner of baseball games by flipping a coin, we get the blue line. If we determine the winner of baseball games by playing nine innings of baseball, we get the green line. They’re different, but they have a lot in common.
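One simple way to put a number on "different, but a lot in common" is to divide the spread of a season’s winning percentages by the spread you would expect from pure coin flips. The sketch below is one possible way to do that, not necessarily what my notebook does; `spread_ratio` and `coin_flip_sd` are hypothetical helper names, and you would need to supply real win totals from a standings table yourself.

```python
import math
import statistics


def coin_flip_sd(games=162):
    """Standard deviation of winning percentage if every game were a fair coin flip."""
    return math.sqrt(0.25 / games)


def spread_ratio(wins, games=162):
    """Observed spread of winning percentages divided by the coin-flip spread.

    wins: one win total per team for a single season.
    A ratio near 1 looks like coin flipping; a larger ratio means more spread
    than luck alone would be expected to produce.
    """
    fractions = [w / games for w in wins]
    return statistics.pstdev(fractions) / coin_flip_sd(games)


if __name__ == "__main__":
    # Sanity check only: a league where every team finishes 81-81 has no spread.
    print(spread_ratio([81] * 30))
```

Feed it the 30 win totals from any season and you get a single number to compare against the coins.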

We know that the ‘best’ coins just happen to come up heads more than the other coins. Do the ‘best’ baseball teams just happen to win more games than the other teams?

In my next post I’ll dig deeper and compare the role of randomness in pro baseball to the role of randomness in pro hockey, football, and basketball to see in which sport randomness plays the biggest role.

For more details about what I said in this post, check out this document. It’s an IPython notebook: a visually appealing, readable way to share code and figures. If you’re interested in changing some variables or otherwise playing around with the code, you can download it and open it up as your own IPython notebook.
