April 12, 2012

The Golf Genome Project: A Look At How Probability Shapes The Game

No charts or graphs to help you understand all this writing below, just a picture of some random golf course.  Gee, thanks, author guy.
     Football, baseball, basketball, tennis, soccer, boxing, and just about all other sports are made up of a lot of statistics, but they all boil down to one thing:  two athletes or teams, each with a certain level of skill, facing off against each other directly.  You can come up with a probability that one side will win, and if it doesn't, then the other side wins.

     And then there's golf.  Up to 156 golfers, all playing their own rounds yet being scored against each other.  That format makes it hard to understand just how good the best players should be in any given tournament and how strange it is for guys like Zach Johnson and Y.E. Yang to win majors.

     So I decided to collect data from all 5722 rounds played in the 14 PGA Tour events prior to the Masters.  For each one, I noted the player who shot the round, what score he shot, what number round it was, and which tournament it was shot at.  I wanted to see how much each factor really affected a golfer's score, how much was just random occurrence, and eventually come up with a model for a simulated golf tournament.

     First, I looked at the three factors together, and all three do have a significant effect on a golfer's score.  The fact that round number was important surprised me, so I wanted to see what its effect actually was.  The average Tour player would shoot 72.1 in round 1, 71.9 in round 2, 72.4 in round 3, and 72.8 in round 4.  So the round number can change a score by about half a stroke either way, an effect that is too confusing and too small to worry about in the model.

     After ignoring round number like I wanted to anyway, I crunched the numbers again using just the player and the tournament as factors.  Again, both had a significant impact, yet only 24% of the variation was explained by player and tournament effects.  It turns out that 76% of what makes up a pro golfer's score is just random error.

     So what's the best way to model a round of PGA Tour golf?  The average golfer in the average tournament will shoot an average of about 72.25.  (Though because better players are in more tournaments and play more rounds, the average round among those studied was 71.29.)

     Next, we add in the player's effect on the score.  We can consider player effects to be normally distributed with a standard deviation of 1.05.  (Note:  In a normal distribution, about 68% of players will be within one standard deviation of the average, about 95% will be within 2 deviations, and about 99.7% will be within 3 deviations.)  Rory McIlroy rated as the best player at 4.9 strokes under the average player.  Tiger Woods was second at 4.3 strokes below average.
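     For anyone who wants to check the normal-distribution rule of thumb against the player-effect spread, here is a minimal sketch using Python's standard library.  The names `PLAYER_SD` and `share_within` are my own labels for illustration, not part of the original analysis.

```python
from statistics import NormalDist

# Standard deviation of player effects, in strokes (figure taken from the post).
PLAYER_SD = 1.05

# Player effects modeled as normal around 0 (0 = the average Tour player).
player_effects = NormalDist(mu=0.0, sigma=PLAYER_SD)


def share_within(k):
    """Fraction of players whose effect lies within k standard deviations of average."""
    return player_effects.cdf(k * PLAYER_SD) - player_effects.cdf(-k * PLAYER_SD)


for k in (1, 2, 3):
    strokes = k * PLAYER_SD
    print(f"within {k} SD (+/- {strokes:.2f} strokes): {share_within(k):.1%}")
```

     The printed shares are properties of any normal distribution; the only golf-specific part is the conversion of each band into strokes.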

     The next step is the effect of the tournament/course, which is normally distributed with a standard deviation of 1.03.

     Finally, we add the random error, which is normal with a standard deviation of 2.93, almost three times as much as the player and course effects.  So if the average player shoots 200 rounds, he'll probably shoot one round of 63 or lower and one round of 82 or over.
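     Putting the pieces together, a single round can be sketched as the base score plus a player effect, a course effect, and random error.  This is only an illustration built from the numbers above, not the code used for the study.  The function names, the fixed seed, and the rounding to whole strokes are all assumptions of mine.

```python
import random
import statistics

# Figures taken from the post.
BASE_SCORE = 72.25   # average golfer in the average tournament
PLAYER_SD = 1.05     # spread of player effects
COURSE_SD = 1.03     # spread of tournament/course effects
ERROR_SD = 2.93      # spread of the round-to-round random error


def simulate_round(rng, player_effect, course_effect):
    """One unrounded score: base + player effect + course effect + random error."""
    return BASE_SCORE + player_effect + course_effect + rng.gauss(0.0, ERROR_SD)


def simulate_average_player(rng, n_rounds):
    """Whole-stroke scores for an average player (effect 0) on an average course (effect 0).

    Rounding to whole strokes is an assumption made for this sketch.
    """
    return [round(simulate_round(rng, 0.0, 0.0)) for _ in range(n_rounds)]


rng = random.Random(2012)  # arbitrary seed, only so the run is repeatable
scores = simulate_average_player(rng, 200)
print("best round:", min(scores))
print("worst round:", max(scores))
print("average:", round(statistics.mean(scores), 2))
```

     Running this with a few different seeds gives a feel for how far 200 rounds from the same average player can spread on random error alone.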

     By using only the top 38.6% of normally generated golfers (I used the standard 144 per tournament divided by the 373 from the study; it just happened to work best) and random course and error effects, the simulated rounds very closely matched the actual rounds, with about 98% of the variation being explained by the model.  The other 2% is mostly found at the high end of the scores, where real golf allows for higher scores than the model.  Deleting the lower 61% or so of golfers matches the fact that the low-level pros rarely make it into tournaments.
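     The field-selection step can be sketched like this: generate 373 player effects, keep the best 144 (the lowest, since lower means fewer strokes), draw one course effect for the week, and play four rounds.  Several things here are my own simplifying assumptions rather than anything from the study: a single course effect shared by every round of the tournament, no 36-hole cut, scores rounded to whole strokes, and all of the function names.

```python
import random

# Figures taken from the post.
BASE_SCORE = 72.25
PLAYER_SD = 1.05
COURSE_SD = 1.03
ERROR_SD = 2.93
POOL_SIZE = 373    # golfers in the study
FIELD_SIZE = 144   # standard tournament field (144 / 373 is about 38.6%)


def draw_field(rng):
    """Generate POOL_SIZE player effects and keep the best FIELD_SIZE (lowest values)."""
    pool = sorted(rng.gauss(0.0, PLAYER_SD) for _ in range(POOL_SIZE))
    return pool[:FIELD_SIZE]


def simulate_tournament(rng, n_rounds=4):
    """Return (total score, player effect) pairs sorted from winner to last place.

    Assumptions for this sketch: one course effect for the whole event, no cut.
    """
    field = draw_field(rng)
    course_effect = rng.gauss(0.0, COURSE_SD)
    results = []
    for player_effect in field:
        total = sum(
            round(BASE_SCORE + player_effect + course_effect + rng.gauss(0.0, ERROR_SD))
            for _ in range(n_rounds)
        )
        results.append((total, player_effect))
    results.sort()
    return results


rng = random.Random(2012)  # arbitrary seed, only so the run is repeatable
results = simulate_tournament(rng)
winning_total, winner_effect = results[0]
print("winning total:", winning_total)
print("winner's player effect (strokes vs. average):", round(winner_effect, 2))
```

     Ties at the top are broken arbitrarily by the sort here; a real simulation would need a playoff rule.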

     I hope some of that information left an impression on you, or at the very least made sense to you.

    


     In the second part of this project, I plan to figure out how often the best golfers in the world should actually be winning, and how bad a tournament winner can really be.
