September 16, 2005

Mythbusting

There has been one constant for Red Sox fans during this season of injury, inconsistency, and controversy.  The one thing we can be sure will happen when we sit down for a Sox game.  The one absolute truth that cannot be denied, for we have Seen It With Our Own Eyes.

If he's a rookie, we will lose to him.

Here's the problem:  It's not actually true.

Want proof?  Click the little button.

Posted by 12eight at 16:58:02 | Permanent Link | Comments (5) |

September 15, 2005

MVP: MCA

Or, 'Most Controversial Award'.

What is an MVP?  There are a lot of definitions.  Statheads like to do away with the accoutrements of the award, simply proclaiming it the domain of the best player in the league.  Pitchers, DH's, 3B's on last-place teams, whoever.  If they have the best numbers - if their numbers contributed more to their team's success than any other players' numbers - well that's an MVP.

Another definition centers on the word 'Valuable'.  This argument states that a 'valuable' player has to get his team somewhere they wouldn't be without him - and 4th rather than 5th place doesn't count.  In other words, the MVP should be on a team that makes the postseason, or at least one that comes really really close.

The final definition is less a definition than a practice; the late-season surger, or the headline maker.  This also extends to players who broke key historic records.  This one is pragmatic; reminiscent of Oscar buzz, it is what results from a player gaining backers in the press slowly over the course of the final month of the season.  It goes to the player who makes the headlines when the headlines are most desperate.

You can look at all three of these definitions and come up with someone who fits the bill over the last 3 or 4 seasons.  What I'd like to do is take a look back over several, identify the winner in each definitional category, and see if there's a favorite.

Posted by 12eight at 17:14:13 | Permanent Link | Comments (0) |

September 07, 2005

Notes for Further Study

After last night's post, Reb made an interesting comment that I think is worthy of further study.  The comment was as follows:

As for the LOB... it is somewhat depressing. What concerns me about it is that our boys don't seem to be getting the hits off good pitching like they were last year. I don't mind leaving 17 on if we get some others to actually score.

That leads me to a few questions.  First, how are the Sox batters vs. the best pitchers?  The good hitting vs. good pitching mantra applies here; are the Sox a team that is only great offensively vs. the mediocre pitchers, but fold for the good ones?  Or can we dish it out on the best as well?  Also, how can we compare several things: the Sox vs. good pitchers and the Sox vs. all pitchers, the Sox vs. good pitchers this year and the Sox vs. good pitchers last year, and the relationship between our hitting, good pitching, and average pitching over both years.  I spent a little while today running some numbers from this year, and some of the things I'm seeing are fairly interesting.  Tomorrow, I'll do the same for '04, and see what comparisons can be made.

Posted by 12eight at 17:17:47 | Permanent Link | Comments (10) |

August 08, 2005

100th pitch

Apparently (and I've never seen this, having only infrequent access to NESN) there's been a contest running where, if a Sox player homers on the 100th pitch of any game, one lucky fan wins a chance to win a Chevy Cobalt.  Empyreal Environs has been mentioning this contest.

In her last post, a commenter wondered what the odds of such an event actually occurring were.  Me being two things - a geek and a procrastinator - I decided to take a few moments out of my paper writing to try to figure out that exact thing.  Here's the comment (slightly edited for posting) that I just put up over there with my findings:

"Figuring out the odds of that happening means a two part problem. First, we have to figure out the odds of hitting a homer on any given pitch, and then the odds of that occurring on any specific pitch.

The first part is the more complicated one. Easy to do on a league scale - just dividing the total number of homers hit in the AL by the total number of pitches thrown in the AL but, it's complicated by the fact that we're talking about a specific team; the Red Sox. So, we have to figure out, approximately, how many pitches have been thrown against the Sox this year. Using ESPN stats, and multiplying each player's total Plate Appearances by their average # of pitches seen per PA, we get a total of 16,734 pitches thrown against the Sox this year. Of course, some of those have been to Sox pitchers, so subtracting that total gives us 16,650. Sox batters have hit 130 homers this year, none of them by pitchers; this means that a Sox player, on average, hits a homer every 128 or so pitches.

Now comes the easy part; figuring out the chances of it happening on any specific pitch. The average number of pitches thrown in any game vs the Sox, we can figure out by dividing the number of pitches by the total number of games: they've played 110 games, which means that on average, 167 pitches are thrown to Sox batters per game.

So, putting those two numbers together, the chances that a Sox batter hits a homer on the 100th pitch of a ballgame are 1:(167*128), or 1:21,376.

That also means that, mathematically, this should happen once every 132 seasons (21,376/162).

So, not very good."

This also means, by the way, that it's eminently possible that a Sox player has never hit a homer on the 100th pitch of any given ballgame, which seems unlikely, and yet there it is.

Update:  Yeah, that's totally wrong.

The actual answer would be 1:128, being the chance that a Sox player would hit a homer on any given pitch.  It would only be 1:21,376 if the contest were something along the lines of "the Sox' only homer of the game comes on the 100th pitch".
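For anyone who wants to check the 1:128 figure, here is a minimal Python sketch that redoes the division using only the totals quoted in the comment above.  The variable names are mine, and the inputs are the ESPN-derived estimates from the comment, not official pitch counts.

```python
# Figures quoted in the comment above (ESPN stats, 2005 season to date).
total_pitches = 16_734      # estimated pitches seen by all Sox hitters, pitchers included
position_pitches = 16_650   # the same estimate with the pitchers' plate appearances removed
home_runs = 130             # Sox homers so far, none of them hit by pitchers

# How many pitches go by, on average, between Sox homers.
pitches_per_homer = position_pitches / home_runs

# The same thing expressed as the chance of a homer on any single pitch.
homer_chance_per_pitch = home_runs / position_pitches

print(f"pitches seen by Sox pitchers: {total_pitches - position_pitches}")
print(f"pitches per homer: {pitches_per_homer:.1f}")
print(f"chance of a homer on any one pitch: {homer_chance_per_pitch:.2%}")
```

The per-homer figure rounds to the 128 used above, which works out to a bit under a 1% chance on any one pitch.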

So, gasp, I suck at math.  Nothing at least three of my high school teachers couldn't have told you.

Posted by 12eight at 16:06:09 | Permanent Link | Comments (33) |

July 21, 2005

James Doohan, 1920-2005

I meant to put this up earlier, and of course it has nothing to do with baseball, but warp speed to James Doohan, Star Trek's Scotty, who passed away yesterday at the age of 85.

Here's to you, laddie.

Posted by 12eight at 21:56:09 | Permanent Link | Comments (4) |

July 16, 2005

Is Rafael Palmeiro a Hall of Famer?

Last night, Rafael Palmeiro doubled into the left field corner, scoring a run.  It was the 583rd double of his 20 year career; the single he added in the next inning was the 1,814th that's left his bat.  Together, they were the 3,000th and 3,001st  base hits of Palmeiro's career, making him only the 4th player in major league history with 3,000 hits and 500 homeruns.  The others on that list?  Hank Aaron, Willie Mays, and Eddie Murray.

So, is Rafael Palmeiro a Hall of Famer?

My gut reaction has always been to say no.  Admittedly, no one really asks me, and no one would care if they did, but now I have a blog, so no.  I don't think that Rafael Palmeiro is a Hall of Famer.  But a lot of people seem to, so I want to line up the arguments for and against, and see how they stack up.

Posted by 12eight at 18:03:03 | Permanent Link | Comments (9) |

June 30, 2005

Pythagorean Differentials: Possible Causes

It's here!  I'm sure most of you have been hitting refresh obsessively, waiting for this one.

 

For the last few weeks, I've been looking at Pythagorean Win % differentials, trying to figure out exactly what causes them - whether they're predictable or random, whether they are related more to quantifiable team performance or to chance and luck.  I wanted to take a look at this after watching the Washington Nationals continue to lead their division, despite having allowed more runs than they've scored.  Is there something we can point to historically that might suggest why the Nats have been able to keep a winning record despite being outscored by their opponents?  That's sort of the inspiration for this post.
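As a rough illustration of what a Pythagorean differential is, here is a minimal Python sketch.  It assumes the classic form of the formula with an exponent of 2 (other exponents are in common use, and which one applies here is not stated above), and the team record in the example is made up purely for illustration - it is not the Nationals' actual line.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_differential(wins: int, losses: int,
                             runs_scored: float, runs_allowed: float,
                             exponent: float = 2.0) -> float:
    """Actual winning percentage minus the Pythagorean expectation."""
    actual = wins / (wins + losses)
    return actual - pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team: a winning record despite being outscored.
diff = pythagorean_differential(wins=45, losses=32, runs_scored=320, runs_allowed=330)
print(f"differential: {diff:+.3f}")
```

A positive differential means a team is winning more often than its run totals say it should; a negative one means the opposite.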

 

 Now, this is a very long study, with a lot of charts, data points, etc.  So, for those of you that might not want to read through it all, let me just sum up the findings below:

 

Good pitching only assures that you lose as many games as you "should".  Beyond that, luck or chance takes over; only luck and chance put a team over its predicted winning percentage.  So, the only thing a team has control over is whether it plays to the level that it should, and the only way to control that is through quality pitching, both from starters and from relievers.

 

The data I used to come to this conclusion, along with more info on the Pythagorean Win Theorem, my own methodology, and a bunch of pretty Excel charts, can be seen by pressing that little green button down there.  Enjoy.

Posted by 12eight at 22:37:26 | Permanent Link | Comments (13) |

June 15, 2005

Regarding Bellhorn

There has been, as per usual, a great deal of back and forth about the performance of one Mark Christian Bellhorn - on message boards, blogs, and (I'm presuming) around water coolers New England-wide.  Most of this discussion is centered on strikeouts: Are they bad? Are they not bad? Are they awesome?  Does he do it too much?  Does he put the ball in play too little?  What would happen if he didn't? Should he be shot?  If he tried to shoot himself, would he miss?

I've discussed the issue of how damaging a strikeout really is elsewhere, and I don't want to do it again here.  What I do want to do, though, is talk about Bellhorn's performance this year more intensively.  Regardless of his K rate, Bellhorn's performance has been down noticeably; as of the end of the game yesterday, there was a .100 difference between his 2004 OPS - the highest among 2B's in the AL - and his 2005 OPS, which ranks him 7th on the same list.  In other words, something this season has made Mark Bellhorn, in 2004 one of the absolute top-tier offensive 2B's in the game, mediocre.  Why?

To try and figure this out, I took Bellhorn's stats from 2004 and 2005 and broke them down, result by result.  When a batter steps to the plate, at least generally, you can break down the possible results in a number of ways.  The first is the most simple: getting on base vs. not getting on base.  The second is whether or not a player makes contact.  These two things can be subdivided; what happened when he made contact (hit? out? extra-bases? grounder? fly?), or by which method did he not make contact (K? BB? HBP?).  We can also look at things like what types of hits he gets when he makes contact - single, double, triple, homer.  There are also about 9 trillion other things you can do with a stat sheet and Excel, but I didn't do those.  I did the above, and what they told me gives me an idea of what things have and what things have not changed for Mark Bellhorn this season.

As far as the first - on base vs. not on base - obviously Bellhorn's OBP is down somewhat.  In 2004, he had an OBP of .373; in '05 it is down to .353, which means that over the course of this season, as compared to last year's rate, Bellhorn has gotten on base about 5 fewer times.  Yes, I said 5.  As in five.  As in this many: * * * * *.  Not a lot, huh?  And we can break that down a little further to find that Bellhorn has actually walked 2 more times than he would have at last year's rate, but has gotten hit by a pitch 2 fewer times.  He has collected 5 fewer base hits.  In fact, to equal his rates from last year, Bellhorn would have only to collect 5 more base hits, exchange 2 walks for HBP's, and strike out 6 fewer times.  So, that's not a stunning difference.

So let's take the second set of splits.  First, contact and non-contact.  In 2004, Mark Bellhorn made contact in 56.5% of his plate appearances, or around 350.  This year so far, Bellhorn has made contact in 53.9%; a definite difference (though it amounts to merely 6 PA's over the course of this year).  We'll look more deeply into his contact numbers in a moment, but first, I want to address his non-contact numbers.  Of the 43.5% of his PA's in which he didn't make contact last year, 32.6% resulted in walks, 65.6% in strikeouts, and 1.9% in HBP's.  This year, in the 46.1% of his PA's in which he didn't make contact, 32.7% resulted in walks - virtually identical to his rate last year.  He has seen a slightly higher rate of K's at 67.3%, and an equal decline in his HBP rate (Bellhorn has not been hit by a pitch this year).  So, yes, there's a slight uptick in K's.  Very slight.

Now for the bread and butter.  When Bellhorn made contact in 2004, he got on base (obviously not counting errors or fielder's choices) at a .394 clip; this year he is doing so at a .376 clip.  To break that down further, let's look at the kind of hits he's getting. When making contact, Bellhorn is hitting singles at approximately the same rate (23.1%/23.2%), and has doubled somewhat more frequently.  His triple rate is different, but triples are rare enough so that this is a flukey stat.  HR, on the other hand, are not.

In 2004, Bellhorn homered in about 2.7% of his total plate appearances, and in 5% of his contacts. In 2005? Bellhorn has homered in 1.3% of his plate appearances, and in 2.5% of his contacts.  In other words, Bellhorn's homerun rate is approximately half of what it had been.  It is the most consistent drop between his contact totals and his PA totals, and suggests a legitimate power problem.  Let's assume, then, that we equalize Bellhorn's contact stats for 2005.  We will give him 5 additional base hits, and make 3 of them homers, and one of them a triple (approximating last year's rate).  The results?  Bellhorn's OBP rises to .375, 2 points higher than the one he posted in 2004.  His Slugging rises to .444 - identical to last year's.  So, that's all it took to make Mark Bellhorn 2005 into Mark Bellhorn 2004.  17 total bases.
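The per-contact and per-PA homer rates are tied together by the contact rate, and a quick Python sketch makes the halving easy to see.  It uses only the rounded percentages quoted in this post (56.5% and 53.9% contact, 5% and 2.5% homers per contact), so the outputs only approximate the per-PA figures; the function name is just for illustration.

```python
def homers_per_pa(contact_rate: float, homers_per_contact: float) -> float:
    """Share of all plate appearances that end in a homer."""
    return contact_rate * homers_per_contact


# Rounded rates quoted in the post.
rate_2004 = homers_per_pa(contact_rate=0.565, homers_per_contact=0.05)
rate_2005 = homers_per_pa(contact_rate=0.539, homers_per_contact=0.025)

print(f"2004 homers per PA: {rate_2004:.1%}")
print(f"2005 homers per PA: {rate_2005:.1%}")
print(f"2005 rate as a share of 2004: {rate_2005 / rate_2004:.0%}")
```

Because the contact rate itself barely moved, nearly all of the drop comes from what happens after the bat meets the ball.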

So what conclusions can we reach here?  First, yes.  Mark Bellhorn is making contact slightly less frequently than he did last year.  Second, the drop in OPS can be whittled down to exactly 5 plate appearances: one in which he should have tripled, 3 in which he should have homered, and one in which he should have singled.

On the surface, that doesn't sound like a lot, and it might not be.  However, it could also be indicative of a power drop.  The point is, at this point in the season, it is extremely difficult to tell which.  If Bellhorn were to go 5/10 over the next two days, with two homers, all of a sudden the gap in his OPS nearly halves itself.  And even the most strident Bellhorn haters can't say it's impossible for Bellhorn to do that.  That's the thing about baseball stats.  We have to look at them; they're on the screen every AB, they're in the newspaper in the morning, jackasses like me plaster them all over the internet... but slight changes - changes that could be as explainable by chance as they could be by an actual problem - can alter those stats dramatically. 5 hits, and 17 total bases; that is all that is separating this year's Mark Bellhorn from last year's.  The fact that Bellhorn makes contact in a relatively small number of PA's, and has a remarkably stable OBP when he doesn't, means that contact results become more glaring.  As poster 'teddykgb' wrote over at Royal Rooters,

Basically, Bellhorn walks the most marvelous and tenuous line we've ever seen an MLB player do. When you think about how perfect he has to be at everything he does as a hitter to be a high quality 2b in spite of his shortcomings, it is astounding.

He's got no margin for error. I worry for his BABIP. He can't afford to suffer a dip in BABIP, or we get the results we saw earlier in the year or worse, along with the traditional "Bellhorn sucks and i've always said he sucks because he strikes out too much" garbage we're forced to read every month or so.

And he's exactly right.  That dip in BABIP - or, more appropriately, his BA when making contact (BABIP removes HR from the equation, and as we've seen HR are a very important aspect of Bellhorn's production) - is what we've seen this year, along with a more pronounced drop in his SLG on contact.  These are the things that make Bellhorn 2005 a league average 2B as opposed to one of the very best in the American League.

So, there.  Can we shut up about Mark Bellhorn now?

Posted by 12eight at 21:31:55 | Permanent Link | Comments (16) |

May 18, 2005

Further LOB Thoughts

Well, blog.com just ate my follow-up post to the earlier one on runners left on base, so I'll try to sum it up again.

In the previous post, I looked at the relationship between runs scored, runners left on base, and runners lost - that is, runners who were ruled out on the basepaths during an inning.  I decided to look at those last a little more carefully, projecting them into theoretical runs lost.  Imagine that runners are never out on base - anyone who gets on base either scores or is left on base at the end of the inning.  The percentage of runners that score given that situation can be calculated by taking the number of runs scored and dividing it by the sum of RS and LOB.  Applying the number of erased runners to those percentages gives us an idea of how many lost runners would have scored had they not been called out, a number I'll call Runs Lost, or RL.  Of course, this early in the season, those numbers are fairly similar, as well as dependent on the number of runs scored by team.  To correct for that, I've prorated both the number of runners lost and the number of runs scored over a full 162-game season, for each team.  Then, looking at RL162 as an expression of RS162 - by dividing the one into the other - shows us the impact of RL162 on a team.  The lower the result, which I'll call the Runs Lost Rate (RLR), the lower the impact lost runners have had on a team's offensive capability.
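For the curious, the RL and RLR arithmetic can be sketched in a few lines of Python.  Two assumptions to flag: erased runners are taken here as BR minus RS minus LOB (using St. Louis's BR, RS, and LOB from the previous post), and the 38 games played is my own guess at the Cardinals' schedule to that point, not a figure from either post.

```python
def runs_lost(runs_scored: float, left_on_base: float, runners_lost: float) -> float:
    """Erased runners who would have scored at the team's RS / (RS + LOB) rate."""
    score_rate = runs_scored / (runs_scored + left_on_base)
    return runners_lost * score_rate


def prorate(value: float, games_played: int, season_games: int = 162) -> float:
    """Scale a season-to-date total up to a full season."""
    return value * season_games / games_played


# St. Louis, from the previous post's table: BR 502, RS 201, LOB 271.
br, rs, lob = 502, 201, 271
erased = br - rs - lob      # assumption: every other baserunner was erased on the bases
games_played = 38           # assumption: not stated in either post

rl = runs_lost(rs, lob, erased)
rl162 = prorate(rl, games_played)
rs162 = prorate(rs, games_played)
rlr = rl162 / rs162

print(f"RL162 {rl162:.1f}  RS162 {rs162:.0f}  RLR {rlr:.2%}")
```

One thing the sketch makes obvious: because RL and RS are prorated by the same factor, the games-played guess cancels out of RLR entirely, so the rate does not depend on it.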

So, here are the 5 teams with the lowest and highest RLRs, along with their RL162, RS162, RE%, and RIE% (from the last post):

 

Rank  Team  RL162  RS162  RLR     RE%    RIE%
1.    STL   54.5   857    6.36%   40.0%  54.0%
2.    NYY   59.8   927    6.45%   40.8%  53.1%
3.    BOS   58.4   901    6.48%   38.8%  55.1%
4.    CHC   49.7   716    6.94%   36.4%  57.1%
5.    COL   57.2   824    6.95%   38.4%  55.1%


 

Rank  Team  RL162  RS162  RLR     RE%    RIE%
26.   SF    75.2   742    10.13%  34.8%  56.0%
27.   CWS   76.0   731    10.39%  38.7%  51.9%
28.   WAS   73.8   694    10.63%  34.2%  56.2%
29.   KC    68.8   631    10.82%  35.3%  54.9%
30.   MIN   84.2   775    10.86%  36.1%  54.1%

A pretty interesting list.  Notice, first, that RLR doesn't necessarily correspond to W-L records - as you can see, COL and CHC show up in the top 5 while CWS and MIN show up in the bottom 5.  It would look even more interesting next to a list of extra-base hits.  Note second that, while RE% does appear to be related to RLR, at least somewhat, there is no apparent connection between RIE% and RLR.  So, again, we can conclude from this that LOB is essentially the least important of the three possible outcomes for a baserunner, and should be better understood as a remainder when RS and runners lost are removed from the total number of baserunners.  In fact, assuming a reasonable RE%, a team could conceivably be better off having a higher LOB rate, as it suggests that fewer runners are getting killed on the basepaths.

Posted by 12eight at 21:20:55 | Permanent Link | Comments (2) |

Stranded on Second

There's been a lot of talk about runners left on base of late, and I wanted to take a closer look at the numbers to figure out whether, in fact, the Red Sox were leaving tremendous numbers of men on base this year - as well as whether that is quantifiably a 'bad thing'.  I started by establishing what the range was in terms of LOB.  With ESPN's handy team stats page, I compiled data on all the factors that lead to baserunners (except for things like reaching on errors, dropped third strikes, etc.), and then all the factors that lead to them getting off-base (scoring, getting caught stealing, getting killed in a DP).  Though all the various factors weren't accounted for, the biggest ones were, and the rest can be considered flukes anyway.  So, the numbers we're left with are: runs scored (RS), total baserunners (BR), and men left on base (LOB).  I wanted to get a sense of how RS and LOB related to a team's total BR, so I divided each by BR, giving me what I'll call the Runner Efficiency % and the Runner Inefficiency %.
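Both rates are plain division, and a small Python sketch shows the whole calculation.  The function name is mine; the Baltimore and Boston inputs are the BR, RS, and LOB totals used in this post.

```python
def runner_rates(baserunners: int, runs_scored: int, left_on_base: int) -> tuple[float, float]:
    """Return (RE%, RIE%) as fractions of total baserunners."""
    re_pct = runs_scored / baserunners
    rie_pct = left_on_base / baserunners
    return re_pct, rie_pct


# (BR, RS, LOB) as compiled for this post.
teams = {
    "BAL": (510, 213, 256),
    "BOS": (559, 217, 308),
}

for team, (br, rs, lob) in teams.items():
    re_pct, rie_pct = runner_rates(br, rs, lob)
    print(f"{team}: RE% {re_pct:.1%}  RIE% {rie_pct:.1%}")
```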

Here's a look at the top 5 in RE%, with all numbers included:

 

Rank  Team  BR   RS   LOB  RE%    RIE%
1.    BAL   510  213  256  41.8%  50.2%
2.    TEX   501  207  260  41.3%  51.9%
3.    NYY   561  229  298  40.8%  53.1%
4.    ATL   469  188  244  40.1%  52.0%
5.    STL   502  201  271  40.0%  54.0%

And the bottom 5:

 

Rank  Team  BR   RS   LOB  RE%    RIE%
26.   ARI   510  173  298  33.9%  58.4%
27.   PHI   507  170  301  33.5%  59.4%
28.   HOU   439  146  253  33.3%  57.6%
29.   PIT   424  139  249  32.8%  58.7%
30.   OAK   462  145  282  31.4%  61.0%

And now here are Boston's numbers:

 

Rank  Team  BR   RS   LOB  RE%    RIE%
6.    BOS   559  217  308  38.8%  55.1%


So, as we can see here, Boston is actually quite near the top tier in all of baseball in terms of RE%, falling just 1.2% shy of 5th place St. Louis.  We can also see that, in terms of percentage, there's not a huge gap between the top and the bottom: right around 10%.

Now, let's take a look at RIE%, in the same order (top 5, bottom 5, Boston).

 

Rank  Team  BR   RS   LOB  RE%    RIE%
1.    BAL   510  213  256  41.8%  50.2%
2.    CWS   478  185  248  38.7%  51.9%
3.    TEX   501  207  260  41.3%  51.9%
4.    ATL   469  188  244  40.1%  52.0%
5.    TOR   507  194  268  38.3%  52.9%

And the bottom 5:

 

Rank  Team  BR   RS   LOB  RE%    RIE%
26.   SD    538  185  312  34.4%  58.0%
27.   ARI   510  173  298  33.9%  58.4%
28.   PIT   424  139  249  32.8%  58.7%
29.   PHI   507  170  301  33.5%  59.4%
30.   OAK   462  145  282  31.4%  61.0%

And now Boston:

 

Rank  Team  BR   RS   LOB  RE%    RIE%
15.   BOS   559  217  308  38.8%  55.1%

So, once again, we see that the gap between the best and the worst RIE% is just a shade under 11%.  Boston in terms of RIE% is pretty much dead in the middle of the pack at 15th place, which is interesting.  Why so wide a gap between RE% and RIE% for Boston?

This is where our peripherals come into play.  RE% is effectively the percentage of baserunners that a team is able to score, while RIE% is the percentage of baserunners that a team leaves on base.  However, these numbers do not add up to 100%.  Instead, there is a percentage of runners that neither scores nor is left on base.  So, by taking the sum of RE% and RIE%, and subtracting it from 100% (DIFF), we get the percentage of runners that neither score nor are left on base.  Now, of course, there are a tremendously large number of ways that a runner can be called out at a base.  The two most prevalent, however, are the ones that concern me here:  caught stealing, and grounded into double play.  These numbers, combined, form a type of signpost towards total number of runners lost (BRL), and there is a distinct correlation in the numbers between the DIFF and BRL - the higher the BRL the higher the DIFF.
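DIFF can be computed straight from the raw totals rather than from the rounded percentages, as in this short Python sketch.  The function name is mine; the St. Louis and Boston inputs are the BR, RS, and LOB totals from the tables above.

```python
def diff_pct(baserunners: int, runs_scored: int, left_on_base: int) -> float:
    """Share of baserunners who neither scored nor were left on base."""
    return (baserunners - runs_scored - left_on_base) / baserunners


# (BR, RS, LOB) from the tables above.
stl = diff_pct(502, 201, 271)
bos = diff_pct(559, 217, 308)

print(f"STL DIFF {stl:.1%}")
print(f"BOS DIFF {bos:.1%}")
```

This is the same thing as 100% minus RE% minus RIE%, just without the rounding that creeps in when you subtract percentages that have already been rounded to one decimal.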

Here's a look at the top and bottom 5 in DIFF, with some select peripheral stats.

 

Rank  Team  DIFF  HR  XBH  SACB  CS
1.    STL   6.0%  46  128  11    7
2.    NYY   6.1%  52  129  8     9
3.    BOS   6.1%  43  130  2     1
4.    CHC   6.5%  48  124  20    10
5.    COL   6.5%  34  115  18    5


 

Rank  Team  DIFF  HR  XBH  SACB  CS
26.   SF    9.2%  29  108  16    5
27.   CWS   9.4%  42  102  15    19
28.   WAS   9.6%  33  118  16    12
29.   KC    9.8%  36  106  14    18
30.   MIN   9.8%  35  100  7     10

The numbers that jump out here are the differences between the top and bottom 5 in terms of XBH and HR.  While the top 5's averages in each are 125.2 XBH and 44.6 HR, the bottom 5's are 106.8 XBH and 35 HR.  Moreover, the differences in SACB and CS are definite: 11.8 SACB and 6.4 CS in the top 5, and 13.6 SACB, 12.8 CS for the bottom 5.  From this, and from other peripherals, we can see that small-ball tactics- bunting, sacrificing men over, stealing bases- increase the number of runners lost, while larger quantities of XBH and HR lower that number.
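Those group averages are easy to reproduce from the two tables above; here is a minimal Python sketch that does so.  The dictionary layout and names are mine, and the values are copied from the tables.

```python
from statistics import mean

# (HR, XBH, SACB, CS) for each team, copied from the two tables above.
top5 = {
    "STL": (46, 128, 11, 7),
    "NYY": (52, 129, 8, 9),
    "BOS": (43, 130, 2, 1),
    "CHC": (48, 124, 20, 10),
    "COL": (34, 115, 18, 5),
}
bottom5 = {
    "SF": (29, 108, 16, 5),
    "CWS": (42, 102, 15, 19),
    "WAS": (33, 118, 16, 12),
    "KC": (36, 106, 14, 18),
    "MIN": (35, 100, 7, 10),
}

STATS = ("HR", "XBH", "SACB", "CS")


def group_averages(group: dict) -> dict:
    """Average each stat column across the teams in a group."""
    columns = zip(*group.values())
    return {stat: mean(col) for stat, col in zip(STATS, columns)}


top_avg = group_averages(top5)
bottom_avg = group_averages(bottom5)

for stat in STATS:
    print(f"{stat}: top 5 {top_avg[stat]:.1f}, bottom 5 {bottom_avg[stat]:.1f}")
```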

So let's make a supposition.  The truly important number does not have any relation to the number of men left on base, but rather the number of men on base that score.  This only makes sense; if a larger number of your baserunners are scoring than those on other teams, who really cares what specifically happens to the ones that don't?  Either they are stranded or they are removed from base, but either way, they are non-scoring entities.  Now, as shown, Boston has been the sixth most efficient team in the majors in bringing their runners around to score, but only a middlingly successful team in terms of leaving men on base.  Interestingly, in most cases, these numbers are connected; teams with a better rate of scoring their runners also have a better LOB rate.  What are some of the explanations for this?

For this we have to look again at DIFF, that is - essentially - the percentage of baserunners that neither score nor are left on base.  Boston has to this point had one of the three smallest DIFFs in the majors, and the reasons for this are clear.  Teams who do not erase baserunners, through GIDP, CS, etc., will have a lower DIFF than those that do.

So, in order to understand how important LOB really is, we have to take these items together.  First, and obviously, a team that scores a larger percentage of its baserunners than other teams is far more likely to be successful offensively.  Second, and interestingly, teams with lower DIFFs tend to have more baserunners - 510 for the lowest third in DIFF, vs. 471 for the highest third - more extra-base hits - 118 vs. 107 - fewer Sacrifices and CS - 21/7 vs. 26/13 - and more runs scored - 191 vs. 170.  There is no discernable difference in either RE% or RIE% in terms of DIFF.

So, in essence, of the three possible outcomes for a man on base (scoring, being stranded, or being removed) LOB is in fact the least important for a team.  A lineup's job is to put men on base and make sure as many of them are allowed to score as is possible.  Therefore, teams that are efficient in scoring their runners, and teams that avoid having baserunners removed, are more successful.

Men being left on base is annoying, especially in a loss.  However, it turns out to not be that meaningful.  Instead, we should be concerned more with the number of runs we score in relation to baserunners, and the number of baserunners that are erased during an inning.  Teams that get a lot of men on base are guaranteed to leave more of them there - not every inning can end in a homerun.  But we can try to make ourselves more efficient in terms of the runners that reach.  The evidence here - and elsewhere - suggests that the way to do so is not by moving runners along and stealing bases, but instead through extra-base hits and bringing runners around in large innings.  Which the Red Sox have done a very good job of doing to this point.

Posted by 12eight at 16:00:30 | Permanent Link | Comments (2) |