Friday, May 23, 2014

The Greatness Number: A metric to determine the best team ever

It's no secret that statistics have become an increasingly important aspect of baseball evaluation. Today, with the growth of advanced metrics and scholarly research related to sabermetrics, nearly every performance on a baseball field is quantifiable. Recognizing the vast array of data available for comparative purposes, I set out with the goal of creating a metric that easily compares baseball clubs across eras in an effort to rank every team in major league history.


by PATRICK GORDON | Managing Editor | May 23 2014, 11:00am EDT | @Philabaseball

          Illustration created by Patrick Gordon

(Editorial note: I ran a version of this article last year. This copy is streamlined and includes new information about the formula for the Team Greatness Number, or TGN.)

Let me start off by admitting I never was a huge fan of numbers. I dreaded math class in school and loved the notion that as a journalist I'd rarely have to deal with numbers and statistics. Ironically, I now find myself crunching data and using algorithms all the time - all because I'm chasing an answer to the question of what baseball team truly was the best in major league history.

My original research question, before I got involved in trying to determine the best team in baseball history, was which statistics correlate most strongly with winning.

After looking at data from FanGraphs on every team since 1900, I found OPS, run differential, and WHIP to be the three statistics that correlated most strongly with team success. (I covered that analysis in an earlier article.)

I used wins as the dependent variable in the previous analysis because I simply wanted to see which metrics correlated most strongly with winning. There is, however, a major drawback to using wins as a means of comparing clubs across eras. Why? Think of the ways the sport has changed over the last century, specifically with regard to the number of teams in each league and scheduling. MLB introduced the 162-game schedule in the early 1960s, but prior to that the schedule was 154 games and before that it was 140 games. Win totals may be greater now because clubs have more games to play, meaning a team with more wins isn't necessarily better than a team with fewer wins. The equalizer is winning percentage.

Winning percentage is found by dividing the games a club has won by the total number of games played. For example, say Team A won 90 games in 1928 (140-game schedule) and Team B won 92 games in 1955 (154-game schedule). Looking purely at the win total, some fans may say Team B was the better club, but is that true? If you divide 90 wins by 140 you get a winning percentage of .643, and if you divide 92 by 154 you get a winning percentage of .597. This example demonstrates that when comparing clubs across eras, more wins do not necessarily mean one club is better than another.
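The arithmetic in the Team A and Team B example can be checked with a few lines of Python. This is only a minimal sketch; the function name `winning_percentage` is chosen here for illustration.

```python
def winning_percentage(wins: int, games: int) -> float:
    """Return wins divided by total games played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return wins / games


# The two hypothetical clubs from the example above.
team_a = winning_percentage(90, 140)  # 90 wins, 140-game schedule
team_b = winning_percentage(92, 154)  # 92 wins, 154-game schedule

print(f"Team A: {team_a:.3f}")  # Team A: 0.643
print(f"Team B: {team_b:.3f}")  # Team B: 0.597
```

Team B has two more wins, but Team A has the higher winning percentage.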

To find the coefficients for the model, I ran a regression analysis with winning percentage as the dependent variable and WHIP, OPS, and RDiff as the independent variables. Remember, these coefficients come from data on more than 2,300 teams dating from 1900 through 2013, so the model is not fit to clubs from any one specific era.
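For readers curious about what a regression like this does mechanically, here is a minimal sketch in Python using only the standard library. It is not the analysis behind the published coefficients: the 50 "teams" are randomly generated, the values in `TRUE_COEFS` are made up, and the helper names `solve_linear_system` and `fit_ols` are illustrative. The toy winning percentages are built directly from the made-up coefficients, so a correct least-squares fit should simply recover them.

```python
import random


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations; intercept comes first."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up coefficients (intercept, OPS, RDiffPG, WHIP), used only for toy data.
TRUE_COEFS = [0.5, 0.1, 0.1, -0.05]

random.seed(0)
teams = [
    (
        random.uniform(0.600, 0.850),  # OPS
        random.uniform(-1.5, 1.5),     # run differential per game
        random.uniform(1.00, 1.60),    # WHIP
    )
    for _ in range(50)
]
win_pcts = [
    TRUE_COEFS[0] + TRUE_COEFS[1] * ops + TRUE_COEFS[2] * rdpg + TRUE_COEFS[3] * whip
    for ops, rdpg, whip in teams
]

fitted = fit_ols(teams, win_pcts)
print([round(c, 4) for c in fitted])
```

Real team data is noisy and the three inputs are related to one another, so an actual fit would not reproduce any set of coefficients this cleanly.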

The model: .5038 + (OPS * .0946) + (RDiff * .0006) + (WHIP * -.0525).

In early 2013 I realized the formula could be improved by using run differential per game (RDiffPG) instead of raw run differential. This slight modification removes the bias introduced by seasons of different lengths, since a longer schedule gives a club more games in which to build up its run differential.

The updated model: .5040 + (OPS * .0897) + (RDiffPG * .0957) + (WHIP * -.0500).
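The updated model is easy to express as a small Python function. The sketch below uses the coefficients exactly as given above. The function names and the sample club (an .800 OPS, 810 runs scored, 648 runs allowed over 162 games, and a 1.20 WHIP) are hypothetical and chosen only to keep the arithmetic simple.

```python
def run_diff_per_game(runs_scored: int, runs_allowed: int, games: int) -> float:
    """Run differential divided by games played (RDiffPG)."""
    return (runs_scored - runs_allowed) / games


def team_greatness_number(ops: float, rdiff_pg: float, whip: float) -> float:
    """Updated model: .5040 + OPS*.0897 + RDiffPG*.0957 + WHIP*-.0500."""
    return 0.5040 + (ops * 0.0897) + (rdiff_pg * 0.0957) + (whip * -0.0500)


# Hypothetical club, not a real team.
rdiff_pg = run_diff_per_game(810, 648, 162)  # 162 / 162 = 1.0 run per game
tgn = team_greatness_number(ops=0.800, rdiff_pg=rdiff_pg, whip=1.20)

print(f"RDiffPG: {rdiff_pg:.2f}")  # RDiffPG: 1.00
print(f"TGN: {tgn:.3f}")           # TGN: 0.611
```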

The result of the equation, which I'm deeming the Team Greatness Number, is a number similar to winning percentage and a figure that normalizes teams across eras by focusing specifically on skills (OPS, WHIP, RDiffPG) and not simply wins. Once you run the equation you can take the results and fairly compare clubs across eras. 

The correlation between the Team Greatness Number and actual winning percentage is 94.6%.

The Team Greatness Number compares favorably to the Pythagorean expectation, as both have a 95% (rounded) correlation with actual winning percentage. I claim the Team Greatness Number is the better tool for comparing teams across eras, though, because it deals with more than simply runs scored and runs allowed.
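For reference, the Pythagorean formula mentioned above, in its classic form with an exponent of 2, estimates winning percentage from runs scored and runs allowed alone. Here is a minimal Python sketch; the function name and the sample run totals (810 scored, 648 allowed) are hypothetical.

```python
def pythagorean_expectation(runs_scored: float, runs_allowed: float,
                            exponent: float = 2.0) -> float:
    """Classic Pythagorean estimate: RS^x / (RS^x + RA^x), with x = 2 by default."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical club, not a real team.
expected = pythagorean_expectation(810, 648)
print(f"Pythagorean winning percentage: {expected:.3f}")  # 0.610
```

Because only two inputs are involved, two clubs with the same runs scored and runs allowed always get the same estimate, no matter how they arrived at those totals.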

Top 10 Major League Baseball teams of all-time via Team Greatness Number
The full rankings via the Team Greatness Number will be released on June 1. The rankings will include all 2,371 major league clubs from 1900 through 2013.

- The Philadelphia Baseball Review is the top baseball news source in Philadelphia, providing news coverage and analysis of the 2014 Phillies and baseball in the Philadelphia region. 
