The Greatness Number: Using mathematics and regression to compare clubs across eras

PBR - Through the previous regression analysis, which used wins as the dependent variable, we now know that WHIP, OPS, and Run Differential all have a strong relationship to success. Using these three metrics, I believe a mathematical formula can be constructed to definitively determine the best club in baseball history.

I used wins as the dependent variable in the previous analysis because I simply wanted to see which metrics correlated most strongly with winning. There is, however, a major drawback to using wins as a means of comparing clubs across eras. Why? Think of the several ways the sport has changed over the last century, specifically with regard to the number of teams in each league and scheduling. MLB introduced the 162-game schedule in the early 1960s, but prior to that the schedule was 154 games and before that it was 140 games. Win totals may be greater now because clubs have more games to play, meaning a team with more wins isn't necessarily better than a team with fewer wins. The equalizer is winning percentage.

Winning percentage is found by dividing the games a club has won by the total number of games played. For example, say Team A won 90 games in 1928 (140-game schedule) and Team B won 92 games in 1955 (154-game schedule). Looking purely at the win total, some fans may say Team B was the better club, but is that true? If you divide 90 wins by 140 you get a winning percentage of .643; if you divide 92 by 154 you get a winning percentage of .597. This example demonstrates why, when comparing clubs across eras, more wins does not necessarily mean a better club.
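To make the arithmetic concrete, here is a minimal Python sketch of the calculation above. The function name `winning_percentage` and the helper `as_baseball_pct` are my own illustrative choices, not part of the original analysis.

```python
def winning_percentage(wins, games_played):
    """Return wins divided by games played, as a fraction between 0 and 1."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return wins / games_played


def as_baseball_pct(value):
    """Format a fraction the way box scores do, e.g. 0.643 -> '.643'."""
    return f"{value:.3f}".lstrip("0")


# The two hypothetical clubs from the example above.
team_a = winning_percentage(90, 140)
team_b = winning_percentage(92, 154)

print("Team A:", as_baseball_pct(team_a))  # .643
print("Team B:", as_baseball_pct(team_b))  # .597
```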

To find the coefficients for the model, I ran a regression analysis with winning percentage as the dependent variable and WHIP, OPS, and RDiff as the independent variables. Remember, these coefficients come from data on more than 2,300 teams from 1900 through 2011, so the model is not tilted toward clubs that played in a specific era.
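For readers who want to see what a regression like this does under the hood, the following is a rough sketch of ordinary least squares in plain Python. It is an assumption that the fit was ordinary least squares; the article does not name the software or method used, and the function name `fit_ols` is hypothetical. No team data is included here.

```python
def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    rows    -- list of predictor tuples, e.g. (ops, rdiff, whip) per team
    targets -- list of dependent values, e.g. winning percentage per team
    Returns [intercept, b1, b2, ...].
    """
    if len(rows) != len(targets) or not rows:
        raise ValueError("rows and targets must be non-empty and equal length")

    # Design matrix with a leading 1.0 column for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Augmented normal equations: [X'X | X'y].
    A = []
    for i in range(k):
        line = [sum(x[i] * x[j] for x in X) for j in range(k)]
        line.append(sum(x[i] * y for x, y in zip(X, targets)))
        A.append(line)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]

    # The last column now holds the fitted coefficients.
    return [A[i][k] for i in range(k)]
```

Called with one `(OPS, RDiff, WHIP)` tuple per team and the matching list of winning percentages, it would return the intercept followed by one coefficient per metric, in that order.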

The model: .503 + (OPS * .097) + (RDiff * .006) + (WHIP * -.053).

The result of the equation, which I'll call a Greatness Number, is on a scale similar to winning percentage. It normalizes teams across eras by focusing specifically on skills (OPS, WHIP, RDiff) and not simply wins. Once you run the equation, you can take the results and fairly compare clubs across eras.

As an example, let's compare the 1935 Detroit Tigers and the 1955 Brooklyn Dodgers. For the Tigers, the equation is .503 + (.801 * .097) + (254 * .006) + (1.44 * -.053) = .660. For the Dodgers, the equation is .503 + (.804 * .097) + (207 * .006) + (1.29 * -.053) = .640. Given the results, the '35 Tigers have a higher Greatness Number than the '55 Dodgers, despite having fewer wins and a lower winning percentage. This means the '35 Tigers can be considered better than the '55 Dodgers.
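The comparison above can be sketched in a few lines of Python. One caveat, stated plainly: with the RDiff coefficient as printed (.006), a season run differential of 254 would push the total above 1, so this sketch assumes a value of .0006 instead. That is my assumption, not a figure from the article. It yields roughly .657 and .637, close to the published .660 and .640; the small gap may come from rounding in the printed coefficients. The function name `greatness_number` is also just an illustrative choice.

```python
# Coefficients as printed in the model, except RDIFF_COEF.
INTERCEPT = 0.503
OPS_COEF = 0.097
WHIP_COEF = -0.053
# ASSUMPTION: the article prints .006; .0006 is used here because it is
# the value that lands near the published results for both clubs.
RDIFF_COEF = 0.0006


def greatness_number(ops, rdiff, whip, rdiff_coef=RDIFF_COEF):
    """Apply the linear model to one club's season totals."""
    return INTERCEPT + ops * OPS_COEF + rdiff * rdiff_coef + whip * WHIP_COEF


tigers_1935 = greatness_number(ops=0.801, rdiff=254, whip=1.44)
dodgers_1955 = greatness_number(ops=0.804, rdiff=207, whip=1.29)

print(f"1935 Tigers:  {tigers_1935:.3f}")   # 0.657
print(f"1955 Dodgers: {dodgers_1955:.3f}")  # 0.637
print("Tigers rank higher:", tigers_1935 > dodgers_1955)  # True
```

The `rdiff_coef` parameter is left overridable so the calculation can be rerun with a different coefficient if the exact fitted value is known.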

The correlation between the Greatness Number and winning percentage is 94.6%, meaning the two measures move together very closely across the data set.
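For anyone who wants to run a similar check, here is a small sketch of a correlation calculation. It assumes the Pearson correlation coefficient; the article does not say which measure was used, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Passing a list of Greatness Numbers and the matching list of actual winning percentages would return a value between -1 and 1, where values near 1 indicate the two lists rise and fall together.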

Top 10 clubs in history ranked by Greatness Number

Top 10 clubs in history ranked by Winning Percentage

Over the next few days I'll unveil more details from my analysis of over 2,300 clubs, specifically information related to Philadelphia teams.

In the meantime, the 1939 Yankees carry the crown of the greatest baseball team ever to have played the game.

If we treat the Greatness Number as a winning percentage and assume the '39 Yankees and '27 Yankees each played a 162-game season, we can predict the '39 Yanks would finish two games ahead of the '27 Yanks.

Formula: Wins = GN*162, rounded to the nearest whole number; Losses = 162 - Wins.

For the '39 Yankees: .765*162 = 123.93, or 124 wins and 38 losses.
For the '27 Yankees: .750*162 = 121.5, or 122 wins and 40 losses.
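The projection can be sketched in Python as follows. The function name `project_record` is my own, and rounding half-wins up (so 121.5 becomes 122) is an assumption chosen to match the figures above.

```python
import math


def project_record(greatness_number, games=162):
    """Treat a Greatness Number as a winning percentage over `games` games.

    Returns (wins, losses). Wins are rounded to the nearest whole game,
    with exact halves rounded up (an assumption, see the note above).
    """
    wins = math.floor(greatness_number * games + 0.5)
    return wins, games - wins


yankees_1939 = project_record(0.765)
yankees_1927 = project_record(0.750)

print("1939 Yankees:", yankees_1939)  # (124, 38)
print("1927 Yankees:", yankees_1927)  # (122, 40)
print("Games ahead:", yankees_1939[0] - yankees_1927[0])  # 2
```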
- Patrick Gordon is the editor of the Philadelphia Baseball Review. Contact him at pgordon@philadelphiabaseballreview.com or @Philabaseball on Twitter.
