8/2/2010: A
couple of days ago (which is to say in the summer of 2010), the
Colorado Rockies enjoyed a run of eleven consecutive hits against
the Chicago Cubs. An 11-hit streak has never happened before,
it says right here on the sports page.
Today's baseball season consists of 30 teams playing
162 games. Previous seasons contained fewer games. I'm not enough
of a fan (or historian) to know if there were more or fewer teams
in the early years, but a reasonable (enough) guess is that there
have been on the order of 2,000 games played during every year
of the modern era (since 1900). 2,000 games times 110 years is
220,000 games. Every game contains at least 54 at bats (roughly
speaking: rain-shortened games have fewer, extra-inning
games have more, games with the home team leading going into the
bottom of the ninth have three fewer, games with lots of base
runners have more, and so on). Let's call it 70 at bats per typical game. Given
the open-ended nature of baseball, every single at bat is an
opportunity for an 11-hit streak to begin (Colorado's began with
two outs in the eighth, playing at home, with a one-run lead).
In the history of the modern game, there have been 70 * 220,000
chances to produce an 11-hit streak. One did.
The observed frequency of 11-hit streaks is 1
in about 15,400,000 opportunities.
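The tally above is simple enough to check in a few lines. A minimal sketch, using only the round figures guessed at in this post (the variable names are mine, and every input is an estimate rather than a researched number):

```python
# Rough count of at bats in the modern era, using the post's round figures.
GAMES_PER_YEAR = 2_000      # guessed average number of games per season
YEARS = 110                 # 1900 through 2010
AT_BATS_PER_GAME = 70       # "typical game" estimate, both teams combined

games = GAMES_PER_YEAR * YEARS
opportunities = games * AT_BATS_PER_GAME

print(f"{games:,} games, {opportunities:,} at bats")
# prints: 220,000 games, 15,400,000 at bats
```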
Let b be the mean batting average of
all the players in a lineup. If the likelihood of a batter getting
a hit at any given at bat were independent of every other at
bat (which it is not, but which is a useful approximation that
ignores 90% of any manager's reason for living), then the likelihood
of any particular at bat beginning an 11-hit streak would be b
* b * b * b * b * b * b... well, anyway, b raised
to the 11th power. The reciprocal (1 / (b ^ 11)) is the
average number of at bats per 11-hit streak, which is roughly
how many at bats go by before one is likely to occur.
Of course, going through the lineup, b varies from player
to player over a range of something like 0.15 to 0.40 (taking b to
correspond to a player's full season's batting average). Some
outliers are below 0.15 and others are above 0.40 (where have
you gone, Joe DiMaggio?). People with access to enough data could
do Monte Carlo sims using actual lineups and actual batting averages,
but adjusting for hot streaks and dry spells within the season
for individual players might still become intractable. Let's
keep it much simpler and look only at yearly averages for lineups
as a whole. In fact, let's turn it around and ask what yearly
average for all players for all time corresponds to the observed
frequency of one 11-hit streak in the modern era. We know what
the general range of plausibility is: no entire lineup will average
.400 or more and no entire lineup will average .150 or less (if
God is kind). The number we want must lie somewhere between those
extremes.
By now one has to wonder whether anything about
a game as statistically rich and thoroughly storied as baseball
can be captured in so few numbers. So crank the wheels and see
what falls out. (Yes, I know we could solve for b directly,
but it's very late, and who really wants to mess with logarithms,
and isn't it easier to just type algebraic expressions into the
Google calculator and see what happens?)
Take b to be 0.222
1 / (0.222 ^ 11) = 15,492,349
Which is just about right.
If b is taken to be 0.209, an 11-hit
streak is only half as likely to have occurred in the modern
era. (Then the expectation would be 1 in 30,087,829 at bats,
so we might reasonably think we'd have to watch through the 2120
season to see one.) If b is assumed to be 0.237, then
the calculated likelihood of an 11-hit streak would be twice
the observed frequency (1 in 7,546,893 so we might expect it
to have happened twice in fifteen million at bats).
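For anyone who does want to mess with logarithms after all, the direct solution is just an 11th root: setting 1 / (b ^ 11) equal to the number of opportunities N gives b = N ^ (-1/11). A small sketch under the same independence assumption (the function names are mine, not from any library):

```python
STREAK = 11
OPPORTUNITIES = 70 * 220_000   # 15,400,000 at bats, from the estimate above


def at_bats_per_streak(b, streak=STREAK):
    """Average at bats per streak if every at bat is a hit with probability b."""
    return 1.0 / b ** streak


def average_for(at_bats, streak=STREAK):
    """Batting average b that makes 1 / b**streak equal to at_bats."""
    return at_bats ** (-1.0 / streak)


# Solve for b directly in the three cases discussed above.
for label, n in [("observed", OPPORTUNITIES),
                 ("half as likely", 2 * OPPORTUNITIES),
                 ("twice as likely", OPPORTUNITIES // 2)]:
    print(f"{label:>15}: b = {average_for(n):.3f}")

# Go the other way: from b to "1 in so many at bats".
for b in (0.209, 0.222, 0.237):
    print(f"b = {b:.3f}: 1 in {at_bats_per_streak(b):,.0f} at bats")
```

The first loop should land on roughly the same three averages as the trial-and-error approach, which is reassuring more than surprising.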
My untutored impression of that simple computation
is that it's not too bad. Those seem reasonable estimates for
the day-in, day-out, season-in, season-out batting average for all
players for all time.
If so, then an 11-hit streak is flukey as hell
on any given day, or in any given year, but if batters win their
duels with pitchers between one fifth and one fourth of the time,
you'd expect this kind of thing to happen about once every century
and change.
Given all this, only one thing was 100% certain:
if it ever happened, it had to happen to the Cubs.
Comments?
Send your quibbles
and bits.
Given that at every at-bat the probability that
the batter gets a hit is b, what is the probability p(b) that
in n consecutive at-bats there will be at least one run of at
least r hits in a row (for fixed n and r)?
Let's first compute the probability that such
a run never happens; then the asked-for probability will be 1
minus this no-such-run probability.
The notation "P{...}" means "the
probability that ...", and ^ denotes exponentiation (= raising
to a power).
P{no such run}
  = P{not getting r hits in a row}^(number of independent(?)
    opportunities to get r hits in a row)
  = (1 - P{r hits in a row})^(n-r+1)
    [there are n-r+1 sets of r consecutive at-bats in n at-bats]
  = (1 - b^r)^(n-r+1)
So the asked-for probability is
1 - P{no such hitting streak} = 1 - (1 - b^r)^(n-r+1). In other
words,
p(b) = 1 - (1 - b^r)^(n-r+1) . [1]
Setting d=b^r and m=n-r+1 for simplicity, we have
p(b) = 1 - (1 - d)^m [2]
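Formula [1] is easy to evaluate on any machine. A minimal Python sketch (the function name is mine; log1p and expm1 are used only to avoid rounding trouble when b^r is tiny, and the result inherits the same independence assumption flagged with "(?)" above):

```python
import math


def run_probability(b, n, r):
    """Formula [1]: p(b) = 1 - (1 - b^r)^(n - r + 1)."""
    if n < r:
        return 0.0          # not enough at-bats for any run of length r
    d = b ** r              # chance that one given set of r at-bats is all hits
    m = n - r + 1           # number of sets of r consecutive at-bats
    if d <= 0.0:
        return 0.0
    if d >= 1.0:
        return 1.0
    # 1 - (1 - d)^m, written as -expm1(m * log1p(-d)) for numerical accuracy
    return -math.expm1(m * math.log1p(-d))


for b in (0.200, 0.215, 0.222, 0.240):
    print(f"{b:.3f}  {run_probability(b, 15_400_000, 11):.5f}")
```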
Here is a program for the Texas Instruments TI-84
programmable calculator that for a given N and R repeatedly takes
in an "average batting average" B, and displays the
probability P that in N at-bats there will be at least one run
of at least R hits in a row.
PROGRAM: HITS
11 into R
15400000 into N
N-R+1 into M
Lbl B:Prompt B ; ask for B
B^R into D
1-(1-D)^M into P
Disp P ; show the run-of-hit probability
Goto B ; try another B
With this program, you can produce a table like
b p(b)
.100 .00015
.125 .00179
.130 .00276
.140 .00622
.150 .01323
.160 .02673
.170 .05141
.180 .09423
.190 .16422
.200 .27050
.204 .32440
.205 .33888
.210 .41692
.211 .43355
.212 .45048
.213 .46768
.214 .48514
.215 .50281
.216 .52068
.220 .59337
.221 .61166
.222 .62992
.223 .64810
.224 .66616 (about 2/3)
.225 .68406
.226 .70174
.227 .71917
.228 .73629
.229 .75307 (about 3/4)
.230 .76946
.232 .80090 (about 4/5)
.240 .90400
.250 .97457
.260 .99649
.270 .99981
.280 .99999716
.290 .9999999931
.300 1 (indistinguishable from certainty)
.400 1 (What does the graph look like?)
This is a very satisfying result. It says that
the probability of getting a result at least as good as the
11-hit run reported is 50% with an average batting average b
of .215, 63% if b=.222, 75% if b=.229, 90% if b=.240, and virtually
100% if b>.270.
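Formula [1] can also be turned around to ask which b yields a chosen probability p, instead of hunting through the table. Solving [1] for b gives b = (1 - (1 - p)^(1/m))^(1/r), with m = n-r+1. A rough Python sketch (the function name is mine, and it rests on the same independence assumption as the formula itself):

```python
import math


def average_for_probability(p, n, r):
    """Invert formula [1]: the b for which 1 - (1 - b^r)^(n - r + 1) equals p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    m = n - r + 1
    # d = 1 - (1 - p)^(1/m), computed via log1p/expm1 because d is tiny
    d = -math.expm1(math.log1p(-p) / m)
    return d ** (1.0 / r)


for p in (0.50, 0.63, 0.75, 0.90):
    b = average_for_probability(p, 15_400_000, 11)
    print(f"p = {p:.2f}  ->  b = {b:.4f}")
```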
Problem solved.
-- Mark Spahn (West Seneca, NY)
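The "(?)" in the derivation marks its one soft spot: overlapping sets of r at-bats share hits, so they are not really independent. One hedged way to gauge how much that matters is to compare formula [1] with an exact recurrence. If q(k) is the chance of no run of r hits in k at-bats, then q(k) = 1 for k < r, q(r) = 1 - b^r, and q(k) = q(k-1) - (1-b) * b^r * q(k-r-1) after that, because the first run can end at at-bat k only if a miss is followed by r straight hits with no run before the miss. A sketch (function names are mine; each at-bat is still assumed to be an independent hit with probability b):

```python
from collections import deque


def approx_run_probability(b, n, r):
    """Formula [1], which treats the n - r + 1 overlapping sets as independent."""
    if n < r:
        return 0.0
    return 1.0 - (1.0 - b ** r) ** (n - r + 1)


def exact_run_probability(b, n, r):
    """Exact chance of at least one run of r hits in n independent at-bats."""
    if n < r:
        return 0.0
    # Chance that a miss is immediately followed by r straight hits.
    first_run_weight = (1.0 - b) * b ** r
    # Sliding window of the last r + 1 values of q; starts as q(0) .. q(r).
    window = deque([1.0] * r + [1.0 - b ** r], maxlen=r + 1)
    for _ in range(r + 1, n + 1):
        # q(k) = q(k-1) - (1-b) * b^r * q(k-r-1)
        window.append(window[-1] - first_run_weight * window[0])
    return 1.0 - window[-1]


# Small case that can be checked by hand: 6 coin flips, run of 2 heads.
print(exact_run_probability(0.5, 6, 2))    # 0.671875  (43 of 64 sequences)
print(approx_run_probability(0.5, 6, 2))   # 0.7626953125
```

The exact figure should never come out higher than formula [1], since streak opportunities come in clumps rather than spread evenly. Running `exact_run_probability(0.222, 15_400_000, 11)` is a plain loop over 15.4 million steps, so it takes a little while in pure Python.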