Characterizing The Sustainability Of Left On Base Percentage

In my last article, I found that mainstream pitcher WAR calculations (fWAR, rWAR, and WARP) disagree on how to value two pitching statistics that reflect both skill and luck: BABIP and LOB%. BABIP has been covered in many articles (including this excellent one), so I thought I would focus on LOB% and on how we can characterize which pitchers tend to be skilled at stranding runners on the basepaths.
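The article never spells out how LOB% itself is calculated. For reference, FanGraphs (one of the sources used here) publishes it as (H + BB + HBP − R) / (H + BB + HBP − 1.4 × HR). Below is a minimal sketch of that published formula; the function name and the sample season line are hypothetical.

```python
def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Left-on-base percentage using FanGraphs' published formula:

    LOB% = (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)
    """
    baserunners = h + bb + hbp
    denominator = baserunners - 1.4 * hr
    if denominator <= 0:
        raise ValueError("not enough baserunners to compute LOB%")
    return (baserunners - r) / denominator


if __name__ == "__main__":
    # Hypothetical season: 180 H, 50 BB, 5 HBP, 80 R, 20 HR.
    print(round(lob_pct(180, 50, 5, 80, 20), 3))  # 0.749
```

In that hypothetical line, 155 of the adjusted 207 baserunners were stranded, for a LOB% of about 74.9%.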

Intuitively, there are a few factors that could cause a pitcher to consistently produce an excellent left-on-base rate. One, whose very existence is much debated in the sabermetrics community, is the ability to pitch in “clutch” situations. However, I wanted to look beyond this, as very few players who are “clutch” one year continue to be “clutch” in future years. Let’s consider, instead, the types of outcomes that would greatly affect a pitcher’s LOB%, and then see whether those outcomes are sustainable for specific kinds of pitchers.

Outcomes that are likely to help a pitcher’s LOB% need to meet two criteria: create out(s), and prevent existing baserunners from advancing. For instance, a double play adds two outs in the inning, and not only prevents a runner from advancing but sends a runner back to the bench. On the other hand, a sacrifice fly lets the runners advance, even though the pitcher gets a batter out in the process. For an example of a non-out situation that does not let existing runners advance, think of an infield single or a walk with a man on second. The man on second usually does not run in these situations, so his likelihood of scoring does not change much, but nothing productive was done in terms of ending the inning.

So, with these criteria in mind, I listed all of the outcomes according to whether they meet neither, one, or both of the criteria, as follows (note that I did not include errors, catcher interference, or defensive indifference, since the pitcher has little, if any, effect on those):

[Table: possible situations, classifying each outcome by which of the two criteria it meets]

In the above table, the best outcomes are the ones with red boxes closest to the bottom. Since a play most helps a pitcher’s LOB% if it has a checkbox in one of the last two rows, and least helps it if it sits in the first row, I hypothesize that pitchers who induce double plays, caught stealings, strikeouts, and fielder’s choices will have the highest LOB% on average, and that pitchers who give up hits, stolen bases, wild pitches, balks, walks, and hit batters will tend to have a low LOB%.

To test this hypothesis, I ran a regression analysis that expresses LOB% in terms of other rate statistics. This shows us two things: which outcomes tend to affect LOB% the most, and how much each of those outcomes contributes to a pitcher’s LOB%. I gathered a lot of data (such as the rate versions of the statistics in the above table, batted-ball statistics, PITCHf/x data, and plate discipline data) in order to estimate a variety of regression equations, and plugged away for a few days. In the end, three variables, CS%, BB%, and (XBH-HR)%, seemed to contribute most to LOB%. I realize that (XBH-HR)% is not commonly used; all it means is the percentage of batters faced by a pitcher who hit a double or triple. The results are a bit surprising and partially go against my hypothesis, though they do not completely contradict it. To start, here are the statistics that measure how accurately the regression predicts LOB%:

[Table: regression statistics]

The results of the regression are encouraging, though they do not perfectly predict LOB%. As I mentioned earlier, LOB% combines skill and luck, so reducing it to skill-based statistics necessarily creates an imperfect model, one that omits factors largely out of the pitcher’s control. Because of this, the R² value of 0.34 is likely quite good, since we should not expect all of LOB% to be predicted by skill.
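For readers who want to reproduce fit statistics like these, R² and the overall F statistic (whose tail probability is reported as “Significance F” in spreadsheet regression output) both come from the same sums of squares. Here is a minimal sketch of both; it is not the author’s workflow, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of the variance in `actual` explained by `predicted`: 1 - SS_res / SS_tot."""
    mean = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def overall_f_stat(r2: float, n: int, k: int) -> float:
    """Overall regression F statistic for k predictors and n observations.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). "Significance F" is the
    upper-tail probability of this value under an F(k, n - k - 1) distribution.
    """
    if n <= k + 1:
        raise ValueError("need more observations than fitted parameters")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))
```

The article does not report how many pitchers were in the sample, so any n is an assumption: with a hypothetical n of 100 and the three predictors above, overall_f_stat(0.34, 100, 3) is about 16.5. If SciPy is installed, scipy.stats.f.sf(F, 3, 96) turns that into the corresponding Significance F.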

A more telling statistic for whether the model represents LOB% well is the Significance F, the p-value of the regression’s overall F-test: the probability of seeing a fit at least this strong if none of the variables were actually related to LOB%. In other words, there is less than a 0.001% chance that a fit this strong between the model and LOB% arose purely by chance. Now let’s turn to the actual variable data for this regression:

[Table: regression coefficients]

From this model, we can approximate LOB% = 0.864 + 0.595*CS% - 0.631*BB% - 1.973*(XBH-HR)%. All of these variables are significant in the model, as shown by their low p-values (which play the same role as the Significance F, but for each individual variable in a multivariate regression). Additionally, the Lower 95% and Upper 95% bounds do not change sign for any of the variables, which means that these variables are most likely not only significant but also reflect the correct direction of their relationship with LOB%.
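Labels like “Significance F” and “Lower 95%” suggest a spreadsheet regression tool, so the sketch below is not the author’s method. It is a plain ordinary least squares fit via the normal equations, checked on synthetic, noise-free data generated from the equation above, plus a helper that plugs a pitcher’s rates into that equation. The article does not say what denominator CS% uses; the example values assume, as with (XBH-HR)%, that each rate is a share of batters faced. Function names are hypothetical.

```python
from typing import List, Sequence


def ols_fit(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: predictor values for each pitcher (no intercept column).
    Returns [intercept, b1, b2, ...].
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(p):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predicted_lob_pct(cs_rate: float, bb_rate: float, xbh_minus_hr_rate: float) -> float:
    """The article's fitted equation; inputs and output are proportions (0.08 = 8%)."""
    return 0.864 + 0.595 * cs_rate - 0.631 * bb_rate - 1.973 * xbh_minus_hr_rate


if __name__ == "__main__":
    # Synthetic data generated from the article's equation (NOT real pitchers),
    # used only to check that the fit recovers the coefficients.
    grid = [(cs, bb, xbh)
            for cs in (0.002, 0.006, 0.010)
            for bb in (0.05, 0.08, 0.11)
            for xbh in (0.03, 0.05, 0.07)]
    lob = [predicted_lob_pct(*g) for g in grid]
    print([round(b, 3) for b in ols_fit(grid, lob)])  # [0.864, 0.595, -0.631, -1.973]
```

Under those assumptions, a hypothetical pitcher with 0.5% CS per batter faced, an 8% BB%, and a 5% (XBH-HR)% gets predicted_lob_pct(0.005, 0.08, 0.05) ≈ 0.718, a predicted LOB% of about 71.8%. Each extra percentage point of (XBH-HR)% lowers the prediction by about two points.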

This model supports my hypothesis that pitchers with a high CS% will have a higher LOB%, and that pitchers who give up a lot of extra-base hits and walks tend to have a worse LOB%. The parts of my hypothesis that the model does not support are that a high GIDP% and SO% lead to a high LOB%, and that pitchers who give up home runs and singles tend to have a worse LOB%.

GIDP% is likely not a significant factor in determining LOB% because double play chances are infrequent in a game. For a ground-ball double play, there need to be fewer than two outs and a runner on a force-out base. Additionally, a hard-hit ground ball to the shortstop or second baseman, the likely source of a double play, could just as easily be a hit up the middle if it were a few feet to one side. It could very well be that pitchers who induce more double plays also give up more hits up the middle, so the two may offset each other in predicting LOB%. These are also likely reasons why FC% is not significant enough to be part of the model.

I interpret the fact that singles and home runs are not significant in the model as follows. A solo home run adds a run to the board, but it does not affect a pitcher’s LOB%. According to this article, about 58% of home runs in a season are solo home runs, so around 58% of home runs would not affect LOB%. So, while pitchers who give up more home runs should certainly be expected to have a lower LOB%, the home run’s effect on LOB% should not be as great as that of doubles or triples. At the other end of the spectrum, singles are probably less important to LOB% than extra-base hits for two straightforward reasons. First, an extra-base hit is more likely than a single to drive in existing runners, since extra-base hits tend to be harder-hit balls that take longer to field. Second, after a single, the batter needs to advance three more bases to score, which is much less likely than advancing the one or two bases needed after a double or triple. It’s not that home runs and singles do not contribute to LOB%; it’s that doubles and triples contribute more.

The most surprising part of the model for me was that strikeouts did not have a great enough effect on LOB% to appear in it. However, after looking at a correlation matrix of the statistics in the multivariate regression, SO%, and LOB%, I found a satisfactory answer:

[Table: correlation matrix of CS%, BB%, (XBH-HR)%, SO%, and LOB%]

In the correlation matrix above, all four variables have a significant correlation with LOB%, though SO%’s is the weakest. However, the more important part of this matrix is that the other variables barely relate to each other, with the exception of SO% and (XBH-HR)%. In fact, the correlation between SO% and (XBH-HR)% is the strongest! Since SO% and (XBH-HR)% are interrelated, including both of them in this model would introduce multicollinearity. Strikeouts definitely have an effect on LOB%, but a weaker one than doubles and triples, so the model’s coefficients are more reliable if we include (XBH-HR)% instead of SO%.
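To make the multicollinearity point concrete: one common diagnostic is the variance inflation factor (VIF). In a model with exactly two predictors, regressing one on the other gives R² = r², so each coefficient’s variance is inflated by VIF = 1 / (1 − r²). The sketch below computes a Pearson correlation (the statistic in a standard correlation matrix) and that two-predictor VIF; the function names are hypothetical and no real data is used.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, the entry in a standard correlation matrix."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(r: float) -> float:
    """Variance inflation factor for each predictor in a two-predictor model.

    With only two predictors, regressing one on the other gives R^2 = r^2,
    so VIF = 1 / (1 - r^2). A VIF of 1 means no inflation at all.
    """
    if abs(r) >= 1.0:
        raise ValueError("perfectly correlated predictors: VIF is unbounded")
    return 1.0 / (1.0 - r * r)
```

The article does not quote the SO%/(XBH-HR)% correlation in the text, so no real value is plugged in here; with a hypothetical r of 0.5, vif_two_predictors(0.5) is about 1.33, meaning each coefficient’s variance would be a third larger than with uncorrelated predictors.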

Enough with the bad in my hypothesis; now let’s turn to the good! The three variables that best predict LOB% are CS%, BB%, and (XBH-HR)%. To me, all of these make sense as contributors to LOB%, as they were all in either the first or last row of the outcomes table at the beginning of the article (i.e., all three statistics sit at the extreme ends of out production and existing baserunner advancement). For the last portion of the article, I want to look at the sustainability of each of these three factors, since my initial goal for this article was to see how sustainable an above-average LOB% is over time.

Max Weinstein of fangraphs.com has recently done some great research on how pitcher-catcher batteries affect stolen bases and caught stealings, and has even found that, contrary to common baseball wisdom, pitchers tend to have a greater effect on the result of a stolen base attempt than catchers do. I strongly urge you to read all of his work if you have the time. In his research on CS% for pitchers, he found that pitchers with quicker releases tend to have much higher CS%. Since pitchers try to keep their mechanics as consistent as possible, CS% should be sustainable over time: pitchers who have quicker releases one year should continue to have quicker releases in future years.

If you are reading this article, you probably know about fielding-independent pitching statistics, and how they are more sustainable over time than statistics that depend on other fielders (if you don’t, here’s a good place to get started). Since walks are one of the big three fielding-independent statistics, pitcher BB% tends to be sustainable over time, though certainly some pitchers improve and some decline.

The last statistic, (XBH-HR)%, is not typically used in analysis. However, its two components, XBH% and HR%, have been. XBH% was actually a topic in Moneyball, in the assessment of Chad Bradford. When pitchers allow doubles and triples, they are usually allowing good contact that is much more a result of their pitching than of their supporting cast’s fielding. For instance, a line-drive double hit deep into the outfield will be a hit no matter how athletic the fielder is, but a blooper into the outfield could be an out depending on the fielder. So, while it is not a defense-independent statistic, XBH% tends to behave like one, since extra-base hits depend less on fielding than other contact does, and it is thus sustainable. Home runs depend on fielding even less than other extra-base hits, which makes HR% even more sustainable for pitchers. Since both XBH% and HR% are sustainable, the difference between the two should be as well.
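(XBH-HR)% can be built directly from the two more familiar rates above. Assuming, as in the article’s definition, that rates are per batter faced, XBH% minus HR% is simply doubles plus triples per batter faced. A minimal sketch; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    """Hypothetical container for one pitcher's season counting stats."""
    tbf: int       # total batters faced
    doubles: int
    triples: int
    hr: int

    def xbh_pct(self) -> float:
        """All extra-base hits (2B + 3B + HR) per batter faced."""
        return (self.doubles + self.triples + self.hr) / self.tbf

    def hr_pct(self) -> float:
        """Home runs per batter faced."""
        return self.hr / self.tbf

    def xbh_minus_hr_pct(self) -> float:
        """(XBH-HR)%: XBH% minus HR%, i.e. doubles and triples per batter faced."""
        return self.xbh_pct() - self.hr_pct()
```

For a hypothetical season of 800 batters faced with 35 doubles, 5 triples, and 20 home runs, XBH% is 7.5%, HR% is 2.5%, and (XBH-HR)% is 5%.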

So, while it is hard to completely reduce LOB% to skill-based statistics, since factors outside the pitcher’s control are involved, a large portion of it is based on pitcher skills, specifically CS%, BB%, and (XBH-HR)%. It is important to remember that even when a statistic involves randomness or luck, we can often still learn a great deal about it, since no play in baseball is completely a result of luck. For LOB%, we will see in the coming years whether pitchers like Hisashi Iwakuma can continue to lead the league or completely fall off the map. For now, though, we’ll wait.

All statistics from this article are courtesy of Fangraphs, Baseball-Reference, and ESPN.

This article was originally published on Batting Leadoff. For more information please visit us at www.battingleadoff.com or follow us on Twitter @Batting_Leadoff.

 