The Paradox of DIPS: Can Relievers Like Mariano Rivera Induce Weak Contact?

Ever since the fateful day when Chicago paralegal Voros McCracken discovered that pitchers have little, if any, control over what happens to the baseball once the batter makes contact, DIPS (Defense Independent Pitching Statistics) theory has been a central tenet of the sabermetric movement. And with good reason. Stats like FIP (Fielding Independent Pitching, an estimator of ERA based solely on walks, strikeouts, and home runs allowed), xFIP (like FIP, but replacing a pitcher’s actual home runs with the number expected from his flyball rate), and tERA (which uses a pitcher’s batted-ball profile) have all been shown to predict future ERA better than current ERA does, which makes them superior measures of pitching talent.
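
For readers who have not seen FIP written out, here is a minimal sketch of the commonly published formula. The function name, the default league constant of 3.10 (it varies by season), and the stat line are assumptions for illustration only; hit batsmen are included alongside walks, as most public versions do.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching in its commonly published form.

    `ip` is decimal innings (60.333..., not the box-score 60.1).
    `constant` is an assumed league constant that puts FIP on the ERA scale.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical reliever season, not a real pitcher's stat line.
example = fip(hr=2, bb=11, hbp=1, k=45, ip=60.0)
print(round(example, 2))
```

Nothing in the formula depends on what happens to balls in play other than home runs, which is the whole point of DIPS.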

But DIPS theory doesn’t hold for every pitcher. Take Mariano Rivera. As I was perusing his stats the other day, I was struck by the major discrepancy between his 2.23 career ERA and his 2.79 FIP, and that’s no small sample: it covers more than 1,150 innings pitched. Rivera has outperformed his FIP by more than a full run two years in a row, and in 2010 his xFIP (3.65) was more than double his ERA (1.80). He has beaten his DIPS stats in eight of the last nine seasons and in 12 of the last 14.

Mo’s longevity makes him the best example of a pitcher who outperforms his peripherals, but he’s far from the only one. Indians closer Chris Perez (1.71 ERA/3.54 FIP/4.30 xFIP in 2010) came to mind quickly. Andrew Bailey (1.47/2.96/3.80), Rafael Soriano (1.73/2.81/3.81), and Darren O’Day (2.03/3.50/4.06) are other dramatic examples. All these names have one thing in common: they’re relievers.

I was struck by an idea that sounded crazy: what if DIPS theory doesn’t apply to relievers?

Seventy-four pitchers threw at least 50 innings of MLB relief in both 2009 and 2010. For each pitcher, I took the absolute value of the difference between his 2010 ERA and his 2009 ERA (henceforth the “ERA estimator”), and likewise between his 2010 ERA and his 2009 FIP (the “FIP estimator”). For example, Rivera posted a 1.80 ERA in 2010 after posting a 1.76 ERA and 2.89 FIP in 2009, so his ERA estimator was just 0.04 off while his FIP estimator was 1.09 off. If the FIP estimator is generally closer to the mark than the ERA estimator, then FIP is the better predictor of future ERA, and it stands to reason that it is also the better indicator of pitching talent.
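
The Rivera example can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the function and variable names are mine, and the three inputs are the figures quoted in the paragraph.

```python
def estimator_error(prior_stat, next_era):
    """Absolute miss when last year's stat is used as a guess at next year's ERA."""
    return abs(next_era - prior_stat)


# Rivera's figures as quoted above.
era_2009, fip_2009, era_2010 = 1.76, 2.89, 1.80

era_err = estimator_error(era_2009, era_2010)  # ERA estimator miss
fip_err = estimator_error(fip_2009, era_2010)  # FIP estimator miss

print(f"ERA estimator off by {era_err:.2f}, FIP estimator off by {fip_err:.2f}")
```

Repeating this for all 74 relievers gives one list of misses per estimator, and those lists are what the comparison that follows is built on.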

The absolute value of the difference between 2009 ERA and 2010 ERA had mean µ = .933 with standard deviation σ ~ .661. Meanwhile, the absolute value of the difference between 2009 FIP and 2010 ERA had mean µ = 1.02 and standard deviation σ ~ .572. Given that information, a simple t-test is all it takes to make the comparison.
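
As a rough illustration of what such a test looks like, here is a sketch that works only from the summary numbers above. It assumes an unpaired (Welch) comparison of the two means with n = 74 in each group, and it uses a normal approximation for the p-value to stay within the standard library. The article does not say which t-test variant was run, and a paired test would need the per-pitcher misses, so this should not be expected to reproduce the published p-value exactly.

```python
from math import sqrt
from statistics import NormalDist


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic for mean_b - mean_a, computed from summary statistics."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


N = 74  # relievers in the sample

# Group a: ERA estimator misses. Group b: FIP estimator misses.
t_stat = welch_t(0.933, 0.661, N, 1.02, 0.572, N)

# One-sided p-value for "the FIP estimator misses by more than the ERA estimator",
# using the normal distribution as an approximation to the t distribution
# (an assumption that is reasonable at this sample size).
p_one_sided = 1 - NormalDist().cdf(t_stat)

print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.3f}")
```

Because both estimators are measured on the same 74 pitchers, a paired test on the per-pitcher differences would be the more natural choice if the raw data were at hand.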

Surprisingly, the results actually reinforce my harebrained idea. According to these data, there is an 84.9 percent chance that ERA is a better predictor than FIP for relievers, twenty times as likely as it is for starting pitchers. The .151 p-value is too high to make this test conclusive, but it’s still a remarkable difference from what we saw with starters.

Could it be that relief pitchers actually have the skill to induce weak contact (or the inability to avoid allowing well-hit balls)? The xFIP estimator data (µ = 1.044, σ ~ .658) makes this seem even more plausible. The ERA estimator has an even more impressive 88.1 percent chance of beating the xFIP estimator, suggesting that relievers are consistent in their abilities (or lack thereof) to keep the ball in the park. This is the opposite of what we saw from starters—the xFIP estimator was more accurate than FIP.

Still not convinced? Here’s where it gets weird.

There is one sabermetric stat that predicts ERA better than the real thing—tERA. Taking the types of contact allowed into account made the estimator less accurate for starters, but the tERA estimator (µ = .894, σ ~ .535) beats the ERA estimator with a 67.4 percent confidence level. More relevantly to the question at hand, it has a 95.1 percent chance of being more accurate than the FIP estimator, and 96.1 percent odds—a .049 p-value—of being superior to the xFIP model. In other words, the estimator that gives the most weight to how balls are hit is the most accurate.

One thing is clear from the data: whether by luck, intangible late-inning factors, or true talent, relief pitchers have an ability to affect the way the ball moves off the bat that their peers in the rotation do not share. Yes, the fact that Rivera’s 2010 ERA barely budged while his K/9 rate dropped by three is probably luck. But, crazy as it may sound, the fact that he and dozens of other relievers manage to outperform their DIPS stats is because they know how to induce weak contact.

  • David

    I think this is because starting pitchers pitch to contact, whereas relievers try to miss contact as much as possible. The reason starters want contact is to limit their pitch count. It would be interesting to see if relievers average more pitches per inning than starters.

  • David

    To continue, I would say almost every late inning reliever is throwing the ball as hard as they can EVERY pitch; they can do this because they only have to get three or four outs. Starters can’t go all out because they would be gassed by the 4th, hence more solid contact vs. starters most of the time.

    • Lewie Pollis

      I agree, it makes sense that relievers would be better at inducing weaker contact (see Tom Tango’s “Rule of 17”), but the more interesting aspect of this to me is the consistency. I suppose less contact would mean less opportunity for variation, but I’m not sure that fully explains the discrepancy.

  • Steve RYan

    Two questions:

    1 – How does a reliever entering the game in the middle of an inning and therefore just needing one or two outs to complete the frame affect the ERA, FIP, xFIP & tERA?

    2 – Could this instead be a test to determine which pitchers should be starters or relievers?

    • Lewie Pollis

      Those are both good questions to which I do not know the answers. I’d be very interested to see some research on that.

  • studes

    You might be interested in this, from a few years ago:

    • studes

      (the key related point is at the end of the article)

    • Lewie Pollis

      Very interesting read. Thanks for the heads up on that.

      There’s a key difference in our findings, though: your conclusion was that “great relievers” are anti-DIPS, but my data suggest that relievers in general are more consistent in the type of contact they allow. What do you think caused the discrepancy?