Friday, 13 May 2016

How much more valuable are first division runs?

England announced their squad to play Sri Lanka this week, with Hampshire's James Vince getting the nod to take up the middle order slot unfortunately vacated by James Taylor. Nick Compton, meanwhile, keeps his place at number 3, at least for the time being. Essex's Tom Westley, who has had a productive start to the season and has been much talked up, was left out (I was hoping he would be picked, but not for any cricketing reason; I just wanted the opportunity to make some Princess Bride jokes).

As England squad selections draw near, with places up for grabs, attention often turns to the county championship averages. One of the few things everyone seems to agree on at this point is that runs made in the first division of the championship should be valued more highly, being made against higher quality attacks. This seems eminently reasonable, but raises a question: how much more valuable are they? Can we make the comparison quantitative?

I'm going to have a go.

What we want is to take a sample of batsmen who played in both divisions in successive seasons and ask how much, on average, their run output rose or fell on switching divisions. Such a sample is provided to us by the championship's promotion and relegation system.

What I've done is go through the county averages for all the completed seasons since 2010, looking at the performance of players in teams that were relegated or promoted and then comparing their season's batting average before and after the change of divisions. (So, for example, I took the batsmen who played for Kent in division 1 in 2010 and compared each batsman's average to what they managed in division 2 in 2011).

I only included batsmen who played at least 10 matches in both seasons. The results are depicted in the graph below. The batting average in division 2 for each batsman in the sample is on the x-axis, with division 1 on the y-axis. Players in relegated teams are in red, promoted teams in blue. Points below the black line averaged higher in division 2 than division 1, and those above it vice versa. The green line is the best linear fit to the data.
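For anyone wanting to reproduce that green line, the fit is just ordinary least squares on the (division 2 average, division 1 average) pairs. A minimal sketch, using a handful of made-up pairs rather than the real sample of batsmen:

```python
# Ordinary least-squares line through (div 2 average, div 1 average) pairs.
# The pairs below are hypothetical placeholders, NOT the real sample.
pairs = [(22.0, 18.5), (35.0, 31.0), (41.0, 33.5), (50.0, 36.0), (28.0, 30.0)]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n  # mean division 2 average
mean_y = sum(y for _, y in pairs) / n  # mean division 1 average

# slope = covariance(x, y) / variance(x); intercept follows from the means
slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / \
        sum((x - mean_x) ** 2 for x, _ in pairs)
intercept = mean_y - slope * mean_x

print(f"div 1 average = {intercept:.1f} + {slope:.2f} * (div 2 average)")
```

With real data you would more likely reach for numpy.polyfit or scipy.stats.linregress, but the arithmetic is the same.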

Of the 81 players in the sample, 52 averaged higher in division 1 and 29 averaged lower. So, the intuition that runs are harder to get in division 1 seems solid, as expected. But how big is the difference?

Well, on average the relegated players in the sample increased their averages by 4.98 runs on going from division 1 to division 2. The promoted players saw their averages drop by an average of 7.12 runs on going from division 2 to division 1. So based on those numbers the difference is moderate but noticeable: enough to turn a "very good" set of numbers into merely "good" ones, and "good" into merely "acceptable".
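The calculation behind those two headline numbers is simple enough to sketch. Again the pairs here are hypothetical placeholders, not the real sample:

```python
# Sketch of the summary statistics: mean shift in batting average for each
# group. All numbers are hypothetical placeholders, NOT the real sample.
relegated = [(24.5, 31.0), (36.0, 40.0), (29.0, 27.5)]  # (div 1 avg, div 2 avg next season)
promoted = [(38.0, 29.5), (45.0, 40.0), (27.5, 26.0)]   # (div 2 avg, div 1 avg next season)

# Mean rise in average on dropping from division 1 to division 2
mean_rise = sum(d2 - d1 for d1, d2 in relegated) / len(relegated)
# Mean drop in average on stepping up from division 2 to division 1
mean_drop = sum(d2 - d1 for d2, d1 in promoted) / len(promoted)

print(f"relegated batsmen gained {mean_rise:.2f} runs of average in div 2")
print(f"promoted batsmen lost {mean_drop:.2f} runs of average in div 1")
```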

The linear fit which I attempted (which should be taken with absolute ladlefuls of salt) gives:

average in div 1 = 28.2 + 0.12 * (average in div 2)

so it would predict a player who averages 50 in division 2 to average only 34.2 in division 1. (As I say, don't take this equation too seriously, and possibly not seriously at all, not least since it predicts that players averaging less than 32 in div 2 should be expected to do better in div 1).
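Both the prediction and the caveat can be checked directly from the equation; only the two fitted coefficients below are taken from the text above:

```python
# Evaluate the (very rough) fitted line quoted above, and find its
# break-even point: the div 2 average at which it predicts no change.
def predicted_div1_average(div2_average):
    return 28.2 + 0.12 * div2_average

print(f"{predicted_div1_average(50.0):.1f}")  # 34.2, as quoted in the text

# The line crosses y = x where 28.2 + 0.12x = x, i.e. x = 28.2 / 0.88
break_even = 28.2 / (1 - 0.12)
print(f"{break_even:.1f}")  # about 32: below this, the fit predicts improvement in div 1
```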

There is a chance that the difference between divisions is exaggerated in this data by a selection bias. Specifically, looking at players who were promoted from div 2 or relegated from div 1 may bias the sample towards players who under-performed their "true" ability when in div 1 or over-performed in div 2. In this case the shift in batting averages may in part be a case of regression to the mean, on top of the real change in the difficulty of run-getting.

This caveat notwithstanding, the difference in divisions seems quite considerable, and division 1 runs are worthy of their additional praise.

1 comment:

  1. Interesting data. Your linear regression should go through zero, which would give you a better sense of the difference. In this case, as they are both on the same axis, you can just take the average of div1_runs/div2_runs to get the comparative value.

    When I can get around to posting stats (rarely) I prefer to plot batting averages on hyperbolic axes, as it gets rid of the weird outliers (averages of 70+). If you think about an innings as a sequence of binomial probabilities of getting one more run (p=1/average), then it makes more sense, and should give an accurate linear fit (in this case, your data runs from 0.016 to 0.05).

    Notwithstanding all that, it is probably true that players averaging less than 32 will do better in division 1. There is a lot of luck in an average, so a player who averaged < 32 in division 2 will most likely gain more from regression to the mean than their performance will lose in division 1. If you could adjust for expected regression (compare to career average, or trajectory, because it changes with age) it would give a better number. (And age trajectories are a hugely important topic as yet uncovered in cricket stats. Hit me up at idlesummers if you want to discuss it).