(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)
Jeff Petersmeyer writes:
I coach the jumpers here at Boise State, and as a fan of the book Moneyball by Michael Lewis (the book that first got my brain wired to look further than just recruiting the “best” jumpers out of high school, as listed by Track and Field News, etc.), I have tried to delve a lot deeper. While coaching at the Olympics this summer in London I began reading, a lot. I read close to 30 books during my six weeks there, including Outliers; Thinking, Fast and Slow (amazing); Judgment in Managerial Decision Making (Bazerman); The Power of Habit; Start with Why; Switch; Talent is Overrated; The Talent Code; Freakonomics; and The House Advantage, among others, and more recently Nate Silver’s The Signal and the Noise. I have been collecting data from past years of NCAA championships in the long and the triple jump, finding out where the All Americans have come from (not too surprising: Texas, Louisiana, N. Carolina, Virginia, California, Florida, etc.; warm states = more opportunity to practice the technical nature of the jumps (10,000-hour rule?)). I’ve collected information on what it takes to become an All American (distances), how tall these jumpers were on average, and what their personal bests were coming out of high school.
For example, a future All American might look like this in the men’s long jump: High school best of 24’1″, 6’1″ in height, and he’s got a 67% chance of coming from the states listed above. However, when looking at the Top 10 rankings over the last 13 years, only 12 of those jumpers became All Americans. The average high school best among those who became All Americans is just over 24′, but clearing that mark certainly doesn’t guarantee success!
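To make the Top 10 figure above concrete, here is a minimal sketch of the base-rate arithmetic. It assumes each yearly Top 10 list contributes 10 distinct jumpers, so 13 years gives 130 ranked jumpers; the letter does not say whether any names repeat, so treat the denominator as an assumption. The function and variable names are illustrative only.

```python
def base_rate(successes: int, total: int) -> float:
    """Fraction of a group that achieved the outcome of interest."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


TOP_N = 10          # jumpers per yearly ranking (from the letter)
YEARS = 13          # years of rankings examined (from the letter)
ALL_AMERICANS = 12  # ranked jumpers who became All Americans (from the letter)

# Assumption: no jumper appears in more than one year's list.
ranked_slots = TOP_N * YEARS

rate = base_rate(ALL_AMERICANS, ranked_slots)
print(f"{ALL_AMERICANS} of {ranked_slots} ranked jumpers: {rate:.1%}")
# prints "12 of 130 ranked jumpers: 9.2%"
```

Under that assumption, fewer than one in ten Top 10 high school jumpers went on to become an All American, which is the base rate any predictive model would need to beat.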
After reading Silver’s book and learning of Bayes’ Theorem (as I’ve seen you discuss it in your blog and in a review of Taleb’s Fooled by Randomness), I started pondering whether there was a way for me to make a rudimentary predictive model of high school recruits (long and triple jumpers). I could do what Kahneman prescribed for hiring an employee: pick six attributes, score them, and always take the person with the highest score, removing any potential bias. I’ve thought of those traits as potentially: best three jumps, performance at the state championship, speed, test score or GPA, height (not always easy to find), etc. There are several biases coaches fall victim to in recruiting, judgments based on intuition indeed: going to watch an athlete perform in practice or a competition in a year when I HAVE to sign a good jumper, let’s say, and we “think” he’s going to be good, based not on fact but on our faulty intuition, because we NEED him to be good and he’s interested in our program. Also, we get calls from coaches who claim their athlete is going to be good, or has high potential.
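The Kahneman-style procedure described above (score a fixed set of traits, add them up, take the highest total) can be sketched in a few lines. This is only an illustration under stated assumptions: the 1-to-5 scale, the equal weighting, the trait names, and the three recruits with their scores are all made up for the example and are not data from the letter. The letter lists five candidate traits plus "etc.", so the sketch uses those five.

```python
from dataclasses import dataclass

# Traits taken from the letter's list; the 1-5 scale is an assumption.
TRAITS = (
    "best_three_jumps",
    "state_championship",
    "speed",
    "academics",
    "height",
)


@dataclass
class Recruit:
    name: str
    scores: dict  # trait name -> integer score from 1 (weak) to 5 (strong)


def total_score(recruit: Recruit) -> int:
    """Sum the trait scores, insisting every trait was scored on the 1-5 scale."""
    for trait in TRAITS:
        value = recruit.scores.get(trait)
        if value is None:
            raise ValueError(f"{recruit.name} is missing a score for {trait}")
        if not 1 <= value <= 5:
            raise ValueError(f"{recruit.name}: {trait} score must be 1-5")
    return sum(recruit.scores[trait] for trait in TRAITS)


def rank(recruits: list) -> list:
    """Order recruits from highest total score to lowest."""
    return sorted(recruits, key=total_score, reverse=True)


# Hypothetical recruits with invented scores, for illustration only.
recruits = [
    Recruit("Recruit A", dict(zip(TRAITS, (5, 3, 4, 2, 3)))),
    Recruit("Recruit B", dict(zip(TRAITS, (4, 4, 4, 4, 4)))),
    Recruit("Recruit C", dict(zip(TRAITS, (5, 2, 3, 3, 3)))),
]

for r in rank(recruits):
    print(r.name, total_score(r))
```

The point of the design is that the scores are committed to before the totals are compared, so a single standout jump (Recruit A's or C's top mark here) cannot override a weaker overall profile.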
I’ve collected over 350 of the best jumps from the 2007 high school class (among tons of other data) so that I can look at them without hindsight bias (not throwing anyone out: Julio Jones plays for the Atlanta Falcons and Jeremy Kerley for the Jets, but I’m keeping them in any potential rankings I devise). So now I’m getting to the question you can already see coming: Do you think there’s a way, using a regression model or Bayes’ Theorem, or any direct or indirect correlations (Bill James is a hero of mine :)), that I could come up with something to weed out one-jump wonders or find diamonds in the rough?
Perhaps I’m wasting my time, but I feel that the more we can select student-athletes based on factual information rather than faulty intuition sabotaged by some sort of bias, the better served we will be. My background is in political science as an undergrad, but unfortunately they let me escape college without taking statistics!
The jumpers at Boise State . . . cool! I love these sports examples. I don’t know enough about jumping to offer any great ideas right now, but I thought that if I post this, maybe some of you will have useful thoughts? (I’m looking at you, Phil.)