The Paradox of DIPS: Can Relievers Like Mariano Rivera Induce Weak Contact?

Ever since the fateful day when Chicago paralegal Voros McCracken discovered that pitchers have little, if any, control over what happens to the baseball once the batter makes contact, DIPS (Defense Independent Pitching Statistics) theory has been a central tenet of the sabermetric movement. And with good reason. It has been proven that stats like FIP (Fielding Independent Pitching, an estimator of ERA based solely on walks, strikeouts, and home runs allowed), xFIP (like FIP, but replacing HRs with a pitcher’s flyball rate), and tERA (which uses a pitcher’s batted-ball profile) are all better predictors of future ERA than is current ERA, meaning that they are superior measures of pitching talent.
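For readers who want to see what "based solely on walks, strikeouts, and home runs" means in practice, here is a minimal sketch of the commonly published FIP formula. The inclusion of hit batters, the league constant of 3.10 (it shifts a little every season), and the sample stat line are all assumptions for illustration, not figures from this article.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching from the standard public formula.

    `constant` scales FIP to league ERA; 3.10 is an assumed, typical value.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical reliever season: 2 HR, 11 BB, 1 HBP, 45 K in 60 innings.
example_fip = fip(hr=2, bb=11, hbp=1, k=45, ip=60.0)
print(f"FIP: {example_fip:.2f}")
```

Nothing about hits allowed or balls in play enters the calculation, which is the whole point of a DIPS stat.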

But DIPS theory doesn’t hold for every pitcher. Take Mariano Rivera. As I was perusing his stats the other day, I was struck by the major discrepancy between his 2.23 career ERA and his 2.79 FIP. That’s not a small sample, either: it covers more than 1,150 innings pitched. Rivera has outperformed his FIP by more than a full run two years in a row, and in 2010 his xFIP (3.65) was more than double his ERA (1.80). He has beaten his DIPS stats in eight of the last nine seasons, and in 12 of the last 14.

Mo’s longevity makes him the best example of a pitcher who outperforms his peripherals, but he’s far from the only one. Indians closer Chris Perez (1.71 ERA/3.54 FIP/4.30 xFIP in 2010) came to mind quickly. Andrew Bailey (1.47/2.96/3.80), Rafael Soriano (1.73/2.81/3.81), and Darren O’Day (2.03/3.50/4.06) are other dramatic examples. All these names have one thing in common: they’re relievers.

I was struck by an idea that sounded crazy: what if DIPS theory doesn’t apply to relievers?

Seventy-four pitchers threw at least 50 innings of MLB relief in both 2009 and 2010. For each pitcher, I took the absolute value of the difference between his 2010 ERA and both his 2009 ERA (henceforth known as the “ERA estimator”) and his 2009 FIP (the “FIP estimator”). For example, Rivera posted a 1.80 ERA in 2010 after posting a 1.76 ERA and 2.89 FIP in 2009, so his ERA estimator was just 0.04 off while his FIP estimator was 1.09 off. If the FIP estimator is generally closer to the mark than the ERA estimator, it means that FIP is a better method of predicting future ERAs, and it stands to reason that the FIP estimator is a better indicator of pitching talent.
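The calculation is simple enough to sketch in a few lines. The function and variable names below are mine, chosen for illustration; the only data used are Rivera's numbers quoted above.

```python
def estimator_error(stat_2009: float, era_2010: float) -> float:
    """Absolute gap between a 2009 stat used as a forecast and the actual 2010 ERA."""
    return abs(era_2010 - stat_2009)


# Rivera's figures as quoted in the text.
rivera = {"era_2009": 1.76, "fip_2009": 2.89, "era_2010": 1.80}

era_error = estimator_error(rivera["era_2009"], rivera["era_2010"])
fip_error = estimator_error(rivera["fip_2009"], rivera["era_2010"])

print(f"ERA estimator error: {era_error:.2f}")
print(f"FIP estimator error: {fip_error:.2f}")
```

Repeating this for every pitcher in the sample gives one column of errors per estimator, and those columns are what get compared.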

The absolute value of the difference between 2009 ERA and 2010 ERA had mean µ = .933 with standard deviation σ ~ .661. Meanwhile, the difference between 2009 FIP and 2010 ERA had mean µ = 1.02 and standard deviation σ ~ .572. Given that information, a simple t-test is all it takes to make the comparison.
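For anyone who wants to run a comparison like this from the summary numbers alone, here is a rough sketch. It is not necessarily the test behind the figures in this article: it assumes Welch's unequal-variance form, treats the two sets of errors as independent samples, and approximates the p-value with a normal curve, which is reasonable at this sample size. A paired test on the per-pitcher data would be a different calculation and could give different percentages.

```python
import math
from statistics import NormalDist


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Welch's t-test computed from summary statistics."""
    var_a, var_b = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    t = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


N = 74  # relievers in the sample
# ERA estimator errors (a) versus FIP estimator errors (b), from the text.
t_stat, df = welch_t(0.933, 0.661, N, 1.02, 0.572, N)

# One-sided p-value for "the ERA estimator has the smaller average error",
# using a normal approximation (an assumption; df is large here).
p_one_sided = 1 - NormalDist().cdf(t_stat)

print(f"t = {t_stat:.3f}, df = {df:.1f}, one-sided p = {p_one_sided:.3f}")
```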

Surprisingly, the results actually reinforce my harebrained idea. According to this data, there is an 84.9 percent chance that ERA is a better predictor than FIP for relievers, twenty times as likely as it is for starting pitchers. The .151 p-value is too high to make this test conclusive, but it’s still a remarkable difference from what we saw with starters.

Could it be that relief pitchers actually have the skill to induce weak contact (or the inability to avoid allowing well-hit balls)? The xFIP estimator data (µ = 1.044, σ ~ .658) makes this seem even more plausible. The ERA estimator has an even more impressive 88.1 percent chance of beating the xFIP estimator, suggesting that relievers are consistent in their abilities (or lack thereof) to keep the ball in the park. This is the opposite of what we saw from starters, for whom the xFIP estimator was more accurate than the FIP estimator.

Still not convinced? Here’s where it gets weird.

There is one sabermetric stat that predicts ERA better than the real thing: tERA. Taking the types of contact allowed into account made the estimator less accurate for starters, but for relievers the tERA estimator (µ = .894, σ ~ .535) beats the ERA estimator with a 67.4 percent confidence level. More relevant to the question at hand, it has a 95.1 percent chance of being more accurate than the FIP estimator, and 96.1 percent odds (a .049 p-value) of being superior to the xFIP model. In other words, the estimator that gives the most weight to how balls are hit is the most accurate.

One thing is clear from the data: whether by luck, intangible late-inning factors, or true talent, relief pitchers have an ability to affect the way the ball comes off the bat that their peers in the rotation do not share. Yes, the fact that Rivera’s 2010 ERA barely budged while his K/9 rate dropped by three is probably luck. But, crazy as it may sound, he and dozens of other relievers manage to outperform their DIPS stats because they know how to induce weak contact.