End of an ERA: The Superiority of Sabermetric Pitching Statistics

When the BBWAA announced that Felix Hernandez had won the 2010 American League Cy Young Award in November, there was widespread excitement in the sabermetric community. That the voters had overlooked New York Yankees ace CC Sabathia’s superior win-loss record and proclaimed a starter who earned just 13 victories for the lowly Seattle Mariners to be the best pitcher in the league was a sign of progress—10 years ago, King Felix wouldn’t have stood a chance, despite beating Sabathia in virtually every other major statistic.

But while Sabathia’s snubbing was cause for celebration, the statheads’ endorsement of Hernandez is puzzling, as the newfangled numbers showed that Cliff Lee was clearly superior. Sure, Felix had the edge in baseball-card stats like strikeouts (232 to 185) and innings pitched (249.2 to 212.1), but Lee came out on top in more important places, like control (0.76 BB/9 and 10.28 K:BB to 2.52 BB/9 and 3.31 K:BB) and Wins Above Replacement (7.1 to 6.2).

In addition, Lee (and, for that matter, Francisco Liriano and Justin Verlander) had a superior mark in Fielding Independent Pitching—a statistic that looks like Earned Run Average but is based solely on walks, strikeouts, and home runs allowed. But that didn’t matter, because Hernandez had a lower ERA. The little debate there was over Lee’s candidacy was framed in the false dichotomy of “what should have happened” (FIP) versus “what really happened” (ERA). When put that way, of course FIP sounds stupid.
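
The standard FIP formula weights those three outcomes and adds a constant (around 3.10, recalibrated each season so league-average FIP matches league-average ERA). A minimal sketch of the calculation—note that the full formula also folds hit batsmen into the walk term, omitted here for simplicity, and the sample line is hypothetical:

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Fielding Independent Pitching: weights home runs, walks, and
    strikeouts while ignoring balls in play. The constant is set each
    season so league FIP equals league ERA; ~3.10 is a typical value."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant

# Hypothetical season line: 15 HR, 20 BB, 200 K in 200 innings
value = fip(hr=15, bb=20, k=200, ip=200.0)
```

A pitcher with that line would carry a FIP in the high 2.30s—roughly Lee/Hernandez territory.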

But FIP isn’t about creating imaginary realities where pitchers give up different numbers of runs—it’s an estimate of a pitcher’s true talent, stripped of the outside factors, such as defense and batted-ball luck, that mar ERA. Therefore, it should be a more accurate predictor of a hurler’s future ERA than his current ERA is.

There were 98 pitchers who threw at least 100 MLB innings in both 2009 and 2010, from Aaron Cook to Zack Greinke. For each qualifying pitcher, I calculated the absolute values of the differences between his 2010 ERA and both his 2009 ERA and his 2010 FIP. For example, Washington Nationals starter John Lannan had a 4.65 ERA in 2010 after posting a 3.88 ERA and a 4.70 FIP in 2009, so the ERA estimator was .77 off, while the FIP estimator missed by just .05.
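
The Lannan comparison generalizes to a simple loop: for each pitcher, score each candidate estimator by how far it lands from the realized 2010 ERA. A sketch—only Lannan’s line below comes from the article:

```python
def prediction_errors(pitchers):
    """For each (name, prior ERA, prior FIP, current ERA) tuple, return
    the absolute miss of each prior-year stat as a prediction of the
    current-year ERA."""
    results = []
    for name, prior_era, prior_fip, current_era in pitchers:
        results.append((name,
                        abs(prior_era - current_era),   # ERA-as-predictor miss
                        abs(prior_fip - current_era)))  # FIP-as-predictor miss
    return results

# John Lannan, per the article: 3.88 ERA and 4.70 FIP in 2009, 4.65 ERA in 2010
sample = prediction_errors([("John Lannan", 3.88, 4.70, 4.65)])
```

Running the full 98-pitcher sample through this loop produces the two error distributions compared below.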

The absolute difference between 2009 ERA and 2010 ERA had mean µ = .808 with standard deviation σ = .384. Meanwhile, the absolute difference between 2009 FIP and 2010 ERA had mean µ = .722 and standard deviation σ = .321.

A paired t-test shows that the previous season’s FIP is a more accurate predictor of future ERA than the past year’s ERA is, with a p-value of .044. In other words, if ERA really predicted future performance as well as FIP does, a gap this large would show up by chance fewer than one time in 20. So yes, you can still argue that ERA is a better way to determine the best pitchers in baseball, but the numbers are roughly 22-to-1 against you.
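
The test itself operates on the per-pitcher gap between the two absolute errors. A stdlib-only sketch—with 98 pairs the normal approximation to the t-distribution is close enough for a p-value, and the six-pitcher sample below is made up for illustration:

```python
import math

def paired_t_test(xs, ys):
    """One-sided paired t-test for mean(xs) > mean(ys).
    Returns (t statistic, approximate p-value); uses the normal
    approximation to the t-distribution, fine for large samples."""
    n = len(xs)
    diffs = [x - y for x, y in zip(xs, ys)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    p = 0.5 * (1 - math.erf(t / math.sqrt(2)))  # one-sided tail
    return t, p

# Hypothetical per-pitcher absolute misses (ERA-based vs FIP-based)
era_err = [0.9, 0.7, 1.1, 0.5, 0.8, 1.0]
fip_err = [0.6, 0.8, 0.9, 0.4, 0.7, 0.6]
t, p = paired_t_test(era_err, fip_err)
```

Pairing matters: each pitcher serves as his own control, so pitcher-to-pitcher variation in volatility drops out of the comparison.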

Nor does ERA gain any ground against other, similar statistics. The absolute difference between 2009 tERA (like FIP, but expanded to include the types of batted balls pitchers allow) and 2010 ERA had mean µ = .741 and standard deviation σ = .293—a smaller average miss than ERA’s, though the edge falls just short of conventional significance (p = .092).

And when 2009 xFIP (like FIP, but with home runs allowed replaced by “expected” home runs allowed, based on the pitcher’s flyball rate and the league-average HR/FB rate) is plugged in, the average miss is just µ = .720 with standard deviation σ = .272—a significant improvement over past ERA (p = .032).
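
xFIP swaps actual home runs for an expected total: fly balls allowed times the league-average HR/FB rate. A sketch reusing the FIP weights—the 10.5 percent rate and 3.10 constant are ballpark values that vary by season, not 2010’s exact figures, and the sample line is hypothetical:

```python
def xfip(fb, bb, k, ip, lg_hr_per_fb=0.105, constant=3.10):
    """Expected FIP: like FIP, but home runs allowed are replaced by
    (fly balls allowed * league HR/FB rate), smoothing out home-run
    luck. League rate and constant are approximate, season-dependent."""
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * bb - 2 * k) / ip + constant

# Hypothetical line: 150 fly balls, 20 BB, 200 K in 200 innings
value = xfip(fb=150, bb=20, k=200, ip=200.0)
```

Because HR/FB rates fluctuate wildly from year to year, regressing them to the league average is exactly why xFIP edges out plain FIP as a predictor here.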

Of all four choices, the past year’s ERA bore the least resemblance to 2010 ERA. Unless the nature of pitching prowess is truly this inconsistent, to award accolades based on ERA is simply absurd.

Felix, it’s time to return your crown.
