In the last post, I compared two different plots: one showing the distribution of heads percentages after coins are tossed 162 times, and one showing the distribution of win percentages after teams play 162 MLB games. They looked sort of similar, but had different variances.
Tom Tango, who has done a lot of work with baseball statistics, suggests that if we subtract the variance due to randomness/luck (the coin tossing variance) from the actual variance observed in MLB winning percentages, what’s left is the variance due to skill. With that value, we can find the ratio of skill variance to luck variance. The higher this ratio, the more we can attribute the results of a season of baseball to skill as opposed to randomness/luck.
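The subtraction described above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the helper names `luck_variance` and `skill_luck_ratio` are made up for this sketch, the win fractions are invented rather than real MLB data, and luck is modelled as a fair coin, whose heads fraction over n tosses has variance 0.25/n.

```python
from statistics import pvariance


def luck_variance(n_games, p=0.5):
    """Variance of the win fraction if every game were a coin toss (binomial model)."""
    return p * (1 - p) / n_games


def skill_luck_ratio(win_fractions, n_games):
    """Split the observed variance into luck and skill parts and return their ratio."""
    observed = pvariance(win_fractions)  # variance actually seen across teams
    luck = luck_variance(n_games)        # variance expected from chance alone
    skill = observed - luck              # what is left over is attributed to skill
    return skill, luck, skill / luck


# Invented win fractions for a hypothetical six-team league (NOT real data).
example = [0.40, 0.45, 0.48, 0.52, 0.55, 0.60]
skill, luck, ratio = skill_luck_ratio(example, n_games=162)
print(f"skill variance: {skill:.5f}, luck variance: {luck:.5f}, ratio: {ratio:.2f}")
```

With real standings you would pass in each team's actual win fraction and the number of games played in that season.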
Each major professional sports league in North America has a different skill/luck ratio. It’s important to note that this ratio goes up with the number of games played: the more games that are played, the more we can attribute the final standings to skill rather than luck (see the supplementary document for this post for the reason why). So to compare the sports to each other, let’s look at how many games you would have to play in the MLB, NBA, NHL, and NFL to reach a 1:1 skill/luck ratio in each league.
What this plot tells us is that after all MLB teams play 70 games, you will know as much about who the best teams in the league are as you would after all NFL teams play only 12 games. The bigger the bar in this plot, the bigger the role of randomness, and the smaller the role of skill in determining how many games a team wins.
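Here is a rough sketch of how a games-to-1:1 number relates to skill variance, assuming the same coin-toss model for luck (variance 0.25/n after n games) and a skill variance that does not change with the number of games. The function names are hypothetical, and the 70 and 12 are simply the MLB and NFL figures quoted above; the implied variances follow from this simplified model and are not taken from the notebook.

```python
def games_for_even_ratio(skill_var, p=0.5):
    """Games needed before luck variance p(1-p)/n has shrunk to equal skill variance."""
    return p * (1 - p) / skill_var


def implied_skill_variance(n_even, p=0.5):
    """Inverse of the above: the skill variance implied by a given games-to-1:1 figure."""
    return p * (1 - p) / n_even


# Games-to-1:1 figures quoted in this post for two of the four leagues.
GAMES_FOR_EVEN_RATIO = {"MLB": 70, "NFL": 12}

for league, n_even in GAMES_FOR_EVEN_RATIO.items():
    skill_var = implied_skill_variance(n_even)
    print(f"{league}: implied skill variance ~ {skill_var:.4f} "
          f"(std dev ~ {skill_var ** 0.5:.3f})")
```

Under these assumptions, a league that needs fewer games to reach 1:1 has a wider spread of underlying team skill.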
Why is the role of randomness higher in certain leagues? I have some guesses. One, I think that games with few scoring opportunities are more subject to randomness. In a hockey game, random deflections could cause every single score; in a basketball game, they cause only a very low percentage of scores. Two, I think that games with hard-to-measure variables are more subject to randomness. A goaltender has a hard time measuring puck position and speed when he is being screened. A batter has a hard time measuring pitch position and speed because they change so rapidly. A jump shooter has a much easier time measuring the position of the basket.
OK, so randomness plays a huge role in the MLB. But each team in the MLB also plays a lot more games than teams in the other leagues (162 games vs 82 for the NHL and the NBA, and 16 for the NFL). And as I mentioned earlier, the skill/luck ratio goes up the more games you play. So what if we plot the skill/luck ratio after seasons of 162 MLB games, 82 NHL and NBA games, and 16 NFL games?
What this plot tells us is that after an NBA season, we know far more about who the best teams in the league are than we do after a season in any other league. Luck plays the smallest role in NFL football (as shown by the first plot), but because NFL teams play so few games, it’s hard to know from the final standings whether the teams at the top were the most skilled or just the luckiest.
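As a back-of-the-envelope check on the end-of-season comparison: if luck variance shrinks as 0.25/n while skill variance stays fixed, the skill/luck ratio grows in proportion to the number of games, so the ratio after a full season is roughly the season length divided by the games needed for 1:1. The sketch below assumes exactly that simplified model, uses a hypothetical `season_ratio` helper, and only covers the two leagues whose 1:1 figures are quoted in the text (70 for the MLB, 12 for the NFL), so its output may not match the plotted values exactly.

```python
# Season lengths from this post.
SEASON_GAMES = {"MLB": 162, "NFL": 16}

# Games needed for a 1:1 skill/luck ratio, as quoted in this post.
GAMES_FOR_EVEN_RATIO = {"MLB": 70, "NFL": 12}


def season_ratio(league):
    """Approximate skill/luck ratio after a full season, assuming it scales linearly with games."""
    return SEASON_GAMES[league] / GAMES_FOR_EVEN_RATIO[league]


for league in SEASON_GAMES:
    print(f"{league}: skill/luck ratio after a full season ~ {season_ratio(league):.2f}")
```

Even though luck matters more per game in the MLB, the much longer season pushes its end-of-season ratio above the NFL's in this rough calculation.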
The first plot tells us in which sport randomness plays the biggest role.
The second plot tells us which league is the best at determining who the best team is at the end of the season.
In the next post I’ll expand on Tom Tango’s work described here and look at how randomness/luck affects the chances that a less skilled team will beat a more skilled team. I’ll use that to comment on the playoffs in each league.
The accompanying IPython notebook explains how I came up with the numbers and figures in this post, including the details of the analysis that I left out here. There are also some bonus plots in there, so check it out! And feel free to download the notebook and play around with it.