Twenty years ago, Bill James published a formula he had worked out for predicting, with 70% accuracy, the outcome of postseason series. Though I remembered the general idea (long-cycle offenses [2B, BB, 1B, SF] are even less successful than short-cycle [3B, HR] in postseason), I had forgotten the details. I posted an inquiry on SoSH, and Captain Fishtail was kind enough to give me the details of the formula (ta, Captain!).
I’ve spent every free second for the past week figuring out its success since the new playoff format started in ’95. The inputs are sometimes very counter-intuitive and, if it still works 20 years later, might suggest some courses of action for teams such as the Red Sox that are now primarily interested in postseason success.
Or, as we shall see, perhaps not! Some very interesting (at least, to me!) items emerged. In a nutshell, the formula still, in my opinion, works, although with some fascinating exceptions.
First, here is James’ formula (per the Captain):
1. Compare the W/L records of the two teams. Give 1 point per half-game difference to the team with the better record.
2. 3 points to the team that scored more runs.
3. 14 pts to the team that hit fewer doubles.
4. 12 pts to the team that hit more triples.
5. 10 pts to the team that hit more homers.
6. 8 pts to the team with the lower batting average.
7. 8 pts to the team that made fewer errors.
8. 7 pts to the team that turned more double plays.
9. 7 pts to the team whose pitchers walked more men.
10. 19 pts to the team that threw more shutouts.
11. 15 pts to the team whose ERA is further below the league ERA.
12. 12 pts to the team that has been in postseason play more recently. If they've been there equally recently, award the points to the team that was more successful at that time.
13. (Playoffs only) 12 points to team that led in head-to-head competition.
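For anyone who wants to run the numbers themselves, here is a rough Python sketch of the thirteen items above. The names (TeamSeason, james_points, award) are made up for illustration, as are the recent_edge and head_to_head_edge arguments, which leave the judgment calls in items 12 and 13 to the caller. The formula as quoted doesn't say what happens when two teams tie in a category, so the sketch assumes nobody gets those points.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Regular-season totals for one team (hypothetical container)."""
    wins: int
    losses: int
    runs: int
    doubles: int
    triples: int
    homers: int
    batting_avg: float
    errors: int
    double_plays: int
    walks_allowed: int
    shutouts: int
    era: float
    league_era: float


def award(points, a_val, b_val, higher_wins=True):
    """Return (a_points, b_points). Assumption: a tie awards nothing."""
    if a_val == b_val:
        return 0, 0
    a_gets_it = (a_val > b_val) == higher_wins
    return (points, 0) if a_gets_it else (0, points)


def james_points(a, b, recent_edge=0, head_to_head_edge=0):
    """Score teams a and b on the 13 items.

    recent_edge: +1 if a earns item 12, -1 if b does, 0 if neither.
    head_to_head_edge: same convention for item 13; pass 0 for the
    World Series, since item 13 is marked "Playoffs only".
    """
    score_a = score_b = 0

    # Item 1: one point per half-game of difference in record.
    half_games = (a.wins - b.wins) + (b.losses - a.losses)
    if half_games > 0:
        score_a += half_games
    else:
        score_b += -half_games

    # Items 2-11: (points, a's value, b's value, does the higher value win?)
    comparisons = [
        (3, a.runs, b.runs, True),
        (14, a.doubles, b.doubles, False),
        (12, a.triples, b.triples, True),
        (10, a.homers, b.homers, True),
        (8, a.batting_avg, b.batting_avg, False),
        (8, a.errors, b.errors, False),
        (7, a.double_plays, b.double_plays, True),
        (7, a.walks_allowed, b.walks_allowed, True),
        (19, a.shutouts, b.shutouts, True),
        (15, a.league_era - a.era, b.league_era - b.era, True),
    ]
    for points, a_val, b_val, higher_wins in comparisons:
        pa, pb = award(points, a_val, b_val, higher_wins)
        score_a += pa
        score_b += pb

    # Items 12 and 13: 12 points each, decided by the caller.
    for edge in (recent_edge, head_to_head_edge):
        if edge > 0:
            score_a += 12
        elif edge < 0:
            score_b += 12

    return score_a, score_b
```

Feed it two TeamSeason records and it returns the two point totals; whichever is higher is the Jamesian favorite.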
Here are the year-by-year results:
’95 7-0
’96 4-3
’97 3-4
’98 5-2
’99 5-2
’00 4-3
’01 5-2
Total thru ’01: 33-16 or 67.3%
’02 1.5-5.5 (The formula revealed a tie when the Angels beat the MFY.)
’03 3-4
Total for past two years: 4.5-9.5 or 32.1%
Grand total for 9 years: 37.5-25.5 or 59.5%
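The tallies are easy to recheck with a few lines of Python. This is just a sketch: the records dictionary copies the year-by-year lines above (counting the ’02 tie as half a win and half a loss, as the post does), and the tally helper is a made-up name.

```python
# Year-by-year formula records (wins, losses), copied from the list above.
records = {
    1995: (7, 0), 1996: (4, 3), 1997: (3, 4), 1998: (5, 2), 1999: (5, 2),
    2000: (4, 3), 2001: (5, 2), 2002: (1.5, 5.5), 2003: (3, 4),
}


def tally(years):
    """Return (wins, losses, winning percentage) over the given years."""
    wins = sum(records[y][0] for y in years)
    losses = sum(records[y][1] for y in years)
    return wins, losses, 100 * wins / (wins + losses)


spans = [
    ("thru '01", range(1995, 2002)),
    ("'02-'03", range(2002, 2004)),
    ("all nine years", range(1995, 2004)),
]
for label, years in spans:
    w, l, pct = tally(years)
    print(f"{label}: {w:g}-{l:g} ({pct:.1f}%)")
```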
So, does it work at about a two-in-three rate, with the past two years simply a statistical aberration? Has something specific changed in the past two years, making the formula now useless? Or will the formula settle in at about a 60% success rate? Even the latter, I would submit, is a fairly good result. It would seem to me to suggest three things:
(1) That the postseason results are statistically predictable, and thus something more than random luck.
(2) That if James’ inputs are justified, many ideas about postseason success are not at all intuitive. (“Hey, let’s put together a team that smacks the fewest doubles, hits for the lowest average, and gives up the most walks!” [Okay, I understand that’s a distortion, but still…])
(3) That this represents a sabermetrician’s opportunity to improve the formula. (Has anyone other than James attempted this?)
Here are some more interesting details from my analysis:
1. Prior to ’02, the formula was phenomenally successful for the NL playoffs, picking 15 out of 21 winners (71.4%) over the previous seven years. Since ’02, 0 for 6.
2. World Series (best 4 of 7) results are even better than playoffs: 7-2 overall, wrong only in ’96 (MFY 56 points beat ATL 98 points) and ’02 (ANA 52 points beat SFG 70 points). But it predicted what many considered upsets in ’03 (FLA 71 over MFY 64), ’01 (ARI 97 over MFY 35), ’97 (FLA 93 over CLE 33), and ’95 (ATL 73 over CLE 62).
3. The primary reason the formula is not more successful is the playoff presence of the A’s and the MFY. The MFY are 10-1 when favored. They are 0-1 in the only Jamesian tie (vs. the Angels as mentioned above). But they are also a most annoying 6-3 when Jamesian underdogs.
4. When people say that Billy Beane’s Athletics aren’t designed for the postseason, they could hardly be more inaccurate! On the basis of James’ successful prediction formula, Oakland should have beaten the MFY in ’00 (73-63), and again in ’01 (88-70). Oakland should have moiderized MIN in ’02 (136-8!!) and crushed BOS in ’03 (104-25). Yet they are 0-4. Why? (I suspect luck, but hey…)
5. In those series where one team gets three or more times the Jamesian points of the other, the favorite is 10-3, with two of the exceptions being the ’02 and ’03 Oakland teams mentioned above. The only other exception was in ’97, when FLA (34) beat ATL (111).
So to use the formula for betting purposes, perhaps a corollary needs to be suggested: ALWAYS bet against the A’s and NEVER bet when the MFY are underdogs. That would take the overall success of the formula to a spectacular 38.5 out of 54, or a very nice 71.3%, even including the past two years. I await speculation as to why the James formula doesn’t work for those two teams…
Edit: Corrected arithmetic error (hey, I was an English major!).
"Why'd you leave him with me then, if you didn't want me to kill him?"
Mouse (Don Cheadle) in "Devil in a Blue Dress"
