It's no secret that statistics have become an increasingly important aspect of baseball evaluation. Today, with the growth of advanced metrics and scholarly research related to sabermetrics, nearly every performance on a baseball field is quantifiable. Recognizing the vast array of data available for comparative purposes, I set out with the goal of creating a metric that easily compares baseball clubs across eras in an effort to rank every team in major league history.
by PATRICK GORDON | Managing Editor | May 23 2014, 11:00am EDT | @Philabaseball
Illustration created by Patrick Gordon
(Editorial Note: I ran a version of this article last year; this copy is streamlined and includes new information pertaining to the formula for the Team Greatness Number, or TGN.)
Let me start off by admitting I never was a huge fan of numbers. I dreaded math class in school and loved the notion that as a journalist I'd rarely have to deal with numbers and statistics. Ironically, I now find myself crunching data and using algorithms all the time - all because I'm chasing an answer to the question of what baseball team truly was the best in major league history.
My original research question, before I got involved in trying to determine the best team in baseball history, was which statistics correlate most strongly with winning.
After looking at data from Fangraphs on every team since 1900, I found OPS, Run Differential, and WHIP to be the three statistics that correlated most strongly with team success. [You can read that article here]
I used wins as the dependent variable in the previous analysis because I simply wanted to see which metrics correlated most strongly with winning. There is, however, a major drawback to using wins as a means of comparing clubs across eras. Why? Think of the many ways the sport has changed over the last century, specifically with regard to the number of teams in each league and scheduling. MLB introduced the 162-game schedule in the early 1960s, but prior to that the schedule was 154 games and before that it was 140 games. Win totals may be greater now because clubs have more games to play, meaning a team with more wins isn't necessarily better than a team with fewer wins. The equalizer is winning percentage.
Winning percentage is found by dividing the number of games a club has won by the total number of games played. For example, say Team A won 90 games in 1928 (140-game schedule) and Team B won 92 games in 1955 (154-game schedule). Looking purely at the win total, some fans may say Team B was the better club, but is that true? Divide 90 wins by 140 games and you get a winning percentage of .643; divide 92 by 154 and you get .597. The example demonstrates that, when comparing clubs across eras, more wins does not necessarily mean one club is better than another.
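The arithmetic in the Team A and Team B example can be checked with a few lines of Python. This is only a sketch; the function name `winning_percentage` is made up for illustration, and ties are ignored.

```python
def winning_percentage(wins, games_played):
    """Return wins divided by total games played."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return wins / games_played


# The two hypothetical clubs from the example above.
team_a = winning_percentage(90, 140)  # 140-game schedule
team_b = winning_percentage(92, 154)  # 154-game schedule

print(f"Team A: {team_a:.3f}")  # Team A: 0.643
print(f"Team B: {team_b:.3f}")  # Team B: 0.597
```

Team B has two more wins, but Team A has the higher winning percentage.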
To find the coefficients for the model, I ran a regression analysis with winning percentage as the dependent variable and WHIP, OPS, and RDiff as the independent variables. Remember, these coefficients come from data on more than 2,300 teams from 1900 through 2013. Because every era is represented, the model is not tilted toward clubs that played in a specific era.
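For readers curious what such a regression involves, here is a minimal sketch of an ordinary least squares fit in plain Python. It assumes the analysis was a standard multiple linear regression, which the article does not spell out. The six teams below are made up, and their winning percentages are generated from the first model's published coefficients (intercept .5038, OPS .0946, RDiff .0006, WHIP -.0525), so the fit should simply hand those numbers back. It is not the Fangraphs data set, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b].
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(predictor_rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    X = [[1.0] + list(row) for row in predictor_rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up teams: (OPS, season run differential, WHIP). Not real clubs.
teams = [
    (0.650, 80, 1.50),
    (0.800, -60, 1.20),
    (0.700, 150, 1.10),
    (0.750, -120, 1.40),
    (0.720, 0, 1.30),
    (0.780, 100, 1.45),
]
# Winning percentages generated from the article's first model,
# so this is noise-free synthetic data.
win_pcts = [0.5038 + 0.0946 * ops + 0.0006 * rd - 0.0525 * whip
            for ops, rd, whip in teams]

coefficients = fit_ols(teams, win_pcts)
print([round(c, 4) for c in coefficients])
```

With real team data the fit would not be exact, and a statistics package would also report standard errors and goodness of fit.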
The model: .5038 + (OPS * .0946) + (RDiff * .0006) + (WHIP * -.0525).
In early 2013 I realized the formula could be improved by using Run Differential per Game instead of simple Run Differential. This slight modification removes the bias that comes from the number of games played varying from year to year.
The updated model: .5040 + (OPS * .0897) + (RDiffPG * .0957) + (WHIP * -.0500).
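As a quick sketch, the updated model translates directly into a small Python function. The coefficients are the ones given above; the function names and the sample club (800 runs scored, 646 allowed over 154 games, a .750 OPS and a 1.25 WHIP) are invented purely for illustration.

```python
def run_diff_per_game(runs_scored, runs_allowed, games_played):
    """Season run differential divided by games played (RDiffPG)."""
    return (runs_scored - runs_allowed) / games_played


def team_greatness_number(ops, rdiff_pg, whip):
    """Updated model: .5040 + OPS*.0897 + RDiffPG*.0957 + WHIP*-.0500."""
    return 0.5040 + 0.0897 * ops + 0.0957 * rdiff_pg - 0.0500 * whip


# Hypothetical club, not a real team.
rdiff_pg = run_diff_per_game(800, 646, 154)  # 154 / 154 = 1.0 run per game
tgn = team_greatness_number(0.750, rdiff_pg, 1.25)
print(f"{tgn:.3f}")  # 0.604
```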
The result of the equation, which I'm calling the Team Greatness Number (TGN), is a figure on the same scale as winning percentage. It normalizes teams across eras by focusing specifically on skills (OPS, WHIP, RDiffPG) and not simply wins. Once you run the equation, you can take the results and fairly compare clubs across eras.
The Team Greatness Number has a 94.6% correlation with winning percentage (per a correlation analysis).
The Team Greatness Number compares favorably to Bill James' Pythagorean expectation (often called baseball's Pythagorean theorem), as both have a 95% (rounded) correlation with actual winning percentage. I would argue, though, that the Team Greatness Number is the better tool for comparing teams across eras because it draws on more than simply runs scored and runs allowed.
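For reference, the classic Pythagorean formula estimates winning percentage as runs scored squared divided by the sum of runs scored squared and runs allowed squared. The sketch below assumes that classic exponent of 2 (the article does not say which variant was used) and pairs it with a plain Pearson correlation, the kind of calculation behind a figure like 94.6%. The function names and the 800-run, 700-run club are hypothetical.

```python
import math


def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical club that scored 800 runs and allowed 700.
expected_pct = pythagorean_expectation(800, 700)
print(f"{expected_pct:.3f}")  # 0.566
```

Feeding `pearson_r` one list of model outputs and one list of actual winning percentages, one entry per team, would give the correlation for either metric.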
Top 10 Major League Baseball teams of all-time via Team Greatness Number
The full rankings via the Team Greatness Number will be released on June 1. The rankings will include all 2,371 major league clubs from 1900 - 2013.
- The Philadelphia Baseball Review is the top baseball news source in Philadelphia, providing news coverage and analysis of the 2014 Phillies and baseball in the Philadelphia region.