We need to look at the data: May 2016

Thursday, 26 May 2016

Charting the evolving role of the wicketkeeper

Last week's test between England and Sri Lanka belonged to Jonny Bairstow. A century on his home ground and a match-winning one at that- rescuing England from 83-5 and dragging them to a total out of the reach of Sri Lanka's callow batting line-up. Behind the stumps in his role as wicketkeeper he took 9 catches, making it an all-round good 3 days at the office.

Bairstow is an example of what would seem to have become a pretty established pattern for the modern test match side: picking your wicketkeeper with a heavy emphasis on their willow-wielding ability, and a lesser focus on their glovemanship than might have been seen in previous generations. I don't think I'm going too far out on a limb to suggest that Bairstow is not the best pure wicketkeeper available to England, but out of the plausible keeping options he's the best of the batsmen, at least for the longer format.

This has made me wonder: how much has the wicketkeeper's role evolved over time? How much more are teams relying on their keepers to score runs? And has an increased emphasis on the batting prowess of keepers had a measurable cost in their performance behind the stumps?

The simplest thing to think would be that picking keepers based on their batting would come at a price in catches and stumpings. But can this be seen in the data?

I particularly enjoyed researching this post, not least because answering those questions will take not one, not two, not three but four graphs.

First of all, the run scoring. The graph below shows the run scoring output of designated wicketkeepers, as a percentage of total runs scored by batsmen in tests from 1946-2015. The red points are the year by year data and the blue line is the decade by decade average. The decade by decade averages give you a better sense of the long term trends.

This data shows a clear evolution towards a greater dependence on wicket keepers to provide runs. Wicket keepers provided only 6% of runs in the immediate post-war period, but they now provide nearly 10%. This is, of course, very much in line with conventional wisdom. One thing that struck me, however, is how steady this increase has been. I had expected to see a rather more dramatic increase in the 90s and early 2000s after Adam Gilchrist made the swashbuckling batsman-keeper cool, but the importance of the wicketkeeper's runs had been rising steadily for a while (with a bit of a dip in the 1980s).
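For anyone who wants to reproduce this kind of graph, here is a minimal Python sketch of the two series described above: the year by year percentage and the decade by decade average. The function names and the input layout are my own illustrative choices, the numbers are invented rather than the real totals, and I'm assuming the decade figure is a plain average of the yearly percentages (it could equally be pooled over all runs in the decade).

```python
from collections import defaultdict

def keeper_run_share(yearly):
    """yearly maps year -> (keeper_runs, total_batsman_runs); returns year -> percentage."""
    return {year: 100.0 * keeper / total for year, (keeper, total) in yearly.items()}

def decade_averages(yearly_pct):
    """Average the yearly percentages within each decade (assumed convention)."""
    buckets = defaultdict(list)
    for year, pct in yearly_pct.items():
        buckets[(year // 10) * 10].append(pct)
    return {decade: sum(vals) / len(vals) for decade, vals in sorted(buckets.items())}

# Invented totals for illustration only, NOT the data behind the graph.
example = {
    1948: (600, 10000),
    1949: (700, 10000),
    2014: (950, 10000),
    2015: (1000, 10000),
}
shares = keeper_run_share(example)
decades = decade_averages(shares)
print(decades)
```

The same two functions would work unchanged for the catches and stumpings graphs, feeding in keeper catches (or stumpings) and total wickets instead of runs.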

But what of their performance behind the stumps? If teams' enthusiasm for batsman-keepers is leading to a lower standard of keeping, one might expect that to be reflected in how wickets are taken. If keepers are worse than they used to be then perhaps modes of dismissal which depend on them- catches behind and stumpings- will decrease relative to other, non-keeper dependent, modes of dismissal.

The next graph shows the percentage of total wickets that were catches by the keeper in tests from 1946-2015. (Again, red points=year by year, blue line=decade by decade)

Far from decreasing, the reliance on wicketkeeper catches to provide wickets increases steadily post 1946- over the same period that keeper run scoring was on the rise- before hitting a plateau around the 1990s. Modern wicketkeepers provide about 19% of the total wickets through catches, and that figure has not shown any noticeable downward shift since keepers have been expected to provide more runs. It may well be that what this graph is telling us has most to do with the evolution of wicket keeping and bowling styles rather than keeping quality, but in any case it's true that modern teams rely on wicket keepers both for more runs and for more catches than teams 70 years ago. As the batting responsibility of keepers has increased, their responsibility as glovemen has not diminished at all.

Wicket keepers can also contribute to dismissals via stumpings. This is a much rarer mode of dismissal than caught behind but, some may argue, it's a truer test of wicket keeping skill. The graph below shows the percentage of wickets that were stumpings over the same period as the graphs above.

The contribution of stumpings to the total wickets decreases in the post war years- over the same period that the contribution of catches increases (perhaps reflective of a decrease in standing up to the stumps? I'm not sure). But it's held steady between 1.3% and 1.9% for the last 50 years. So, wicket keepers continue to hold up their end in whipping off the bails.

If we can't see any strong changes in wicket keeping contributions to wickets, what about other ways of measuring wicket keeping quality? Byes, for instance. The graph below shows the number of byes conceded per 1000 deliveries in test cricket from 1946-2015.

The rate of conceding byes has hardly changed in 70 years. Looking at the decade by decade trends you could argue that it was on a steady decrease up to the 90s before taking an uptick, but these changes are minuscule- corresponding to maybe 1 extra bye conceded in 1000 deliveries.
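The byes measure is just a rate scaled to 1000 deliveries. As a small sketch (the helper name and the counts are made up for illustration, not taken from the real data):

```python
def per_thousand(events, deliveries):
    """Rate of events (e.g. byes) per 1000 deliveries."""
    if deliveries <= 0:
        raise ValueError("deliveries must be positive")
    return 1000.0 * events / deliveries

# Invented example: 140 byes conceded across 20,000 deliveries.
byes_rate = per_thousand(140, 20000)
print(byes_rate)
```

Scaling to a fixed number of deliveries is what lets years with very different amounts of test cricket sit on the same axis.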

So, while it's clear that more runs are indeed required of the modern keeper, the expectations behind the stumps have not shifted that much. Keepers contribute a consistent ~19% of wickets through catches with an additional ~1.5% through stumpings. They concede about 7 byes per 1000 balls and have barely budged from that for 70 years. Considering that the expectations on their batting have increased, while they have remained steady in other aspects of the game, keepers arguably have more on their plate than ever before.

Monday, 16 May 2016

Reverse Swept Radio

This week I had the pleasure of being interviewed by Andy Ryan on the excellent Reverse Swept Radio podcast. If you would like to hear me talk about cricket, stats and this blog, the link is here:

http://reversesweptradio.podbean.com/e/rsr-81-a-cricket-podcast/

Friday, 13 May 2016

How much more valuable are first division runs?

England announced their squad to play Sri Lanka this week, with Hampshire's James Vince getting the nod to take up the middle order slot unfortunately vacated by James Taylor. Nick Compton, meanwhile, keeps his place at number 3, at least for the time being. Essex's Tom Westley, who has had a productive start to the season and has been much talked up, was left out (I was hoping he would be picked, but not for any cricketing reason- I just wanted the opportunity to make some Princess Bride jokes).

As England squad selections draw near, with places up for grabs, attention often turns to the county championship averages. One of the few things everyone seems to agree on at this point is that runs made in the first division of the championship should be valued more highly, being made against higher quality attacks. This seems eminently reasonable, but raises a question: how much more valuable are they? Can we make the comparison quantitative?

I'm going to have a go.

What we want is to take a sample of batsmen who played in both divisions in successive seasons and ask, on average, how much did their run output drop/rise on switching divisions. Such a sample is provided to us by the championship's promotion and relegation system.

What I've done is go through the county averages for all the completed seasons since 2010, looking at the performance of players in teams that were relegated or promoted and then comparing their season's batting average before and after the change of divisions. (So, for example, I took the batsmen who played for Kent in division 1 in 2010 and compared each batsman's average to what they managed in division 2 in 2011).

I only included batsmen who played at least 10 matches in both seasons. The results are depicted in the graph below. The batting average in division 2 for each batsman in the sample is on the x-axis, with division 1 on the y-axis. Players in relegated teams are in red, promoted teams in blue. Points below the black line averaged higher in division 2 than division 1, and above vice versa. The green line is the best linear fit to the data.

Of the 81 players in the sample, 52 averaged higher in division 2 and 29 averaged higher in division 1. So, the intuition that runs are harder to get in division 1 seems solid, as expected. But how big is the difference?

Well, on average the relegated players in the sample increased their averages by 4.98 runs on going from division 1 to division 2. The promoted players saw their averages drop by an average of 7.12 runs on going from division 2 to division 1. So based on those numbers the difference is moderate but noticeable- able to turn a "very good" set of numbers into merely "good" ones and "good" into merely "acceptable".
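The bookkeeping behind those numbers can be sketched in a few lines of Python. Everything here is illustrative: the record layout, the function names and the four batsmen are invented, and only the 10-match threshold comes from the method described above.

```python
from dataclasses import dataclass
from statistics import mean

MIN_MATCHES = 10  # threshold used in the post

@dataclass
class SeasonPair:
    name: str
    moved: str          # "relegated" or "promoted"
    div1_avg: float
    div1_matches: int
    div2_avg: float
    div2_matches: int

def qualifying(pairs):
    """Keep batsmen with at least MIN_MATCHES in both seasons."""
    return [p for p in pairs
            if p.div1_matches >= MIN_MATCHES and p.div2_matches >= MIN_MATCHES]

def summarise(pairs):
    """Count who averaged higher in division 2 and the mean div 2 minus div 1 gap."""
    diffs = [p.div2_avg - p.div1_avg for p in qualifying(pairs)]
    return {
        "n": len(diffs),
        "higher_in_div2": sum(d > 0 for d in diffs),
        "mean_div2_minus_div1": mean(diffs) if diffs else None,
    }

def mean_shift_by_group(pairs):
    """Mean div 2 minus div 1 gap, split into relegated and promoted players."""
    groups = {}
    for p in qualifying(pairs):
        groups.setdefault(p.moved, []).append(p.div2_avg - p.div1_avg)
    return {group: mean(diffs) for group, diffs in groups.items()}

# Invented batsmen for illustration only.
sample = [
    SeasonPair("Batsman A", "relegated", 30.0, 14, 36.0, 15),
    SeasonPair("Batsman B", "promoted", 41.0, 12, 47.0, 16),
    SeasonPair("Batsman C", "promoted", 38.0, 11, 35.0, 13),
    SeasonPair("Batsman D", "relegated", 25.0, 6, 40.0, 14),  # too few div 1 matches
]
summary = summarise(sample)
shifts = mean_shift_by_group(sample)
print(summary, shifts)
```

Expressing every gap as division 2 minus division 1 keeps the sign the same for both groups, so a positive number always means runs came easier in division 2.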

The linear fit which I attempted (which should be taken with absolute ladlefuls of salt) gives:

average in div 1=28.2 + 0.12 * (average in div 2)

so it would predict a player who averages 50 in division 2 to average only 34.2 in division 1. (As I say, don't take this equation too seriously, and possibly not seriously at all, not least since it predicts that players averaging less than 32 in div 2 should be expected to do better in div 1).
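To make the arithmetic explicit, here is the fitted line as a tiny Python sketch. The coefficients are the ones quoted above; the function and variable names are just mine. The crossover point is where the predicted division 1 average equals the division 2 average, i.e. x = 28.2 + 0.12x, so x = 28.2 / (1 - 0.12).

```python
INTERCEPT = 28.2  # from the fitted line quoted in the post
SLOPE = 0.12      # from the fitted line quoted in the post

def predict_div1(div2_avg):
    """Predicted division 1 average for a given division 2 average."""
    return INTERCEPT + SLOPE * div2_avg

# Division 2 average at which the fit predicts no change between divisions.
break_even = INTERCEPT / (1.0 - SLOPE)

print(round(predict_div1(50), 1))  # roughly 34.2
print(round(break_even, 2))        # roughly 32.05
```

Below that crossover the line predicts an improvement on promotion, which is the oddity mentioned above and a good reason not to lean on the fit too hard.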

There is a chance that the difference between divisions is exaggerated in this data by a selection bias. Specifically, looking at players who were promoted from div 2 or relegated from div 1 may bias the sample towards players who under-performed their "true" ability when in div 1 or over-performed in div 2. In this case the shift in batting averages may in part be a case of regression to the mean, on top of the real change in the difficulty of run-getting.
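A toy simulation shows how regression to the mean alone could produce a drop, even with no change in difficulty at all. This is purely a sketch under invented assumptions: each player has a fixed "true" average, each season's observed average is that plus random noise, and I pick players on their own season 1 numbers (whereas real promotion is decided at team level). None of the parameter values are estimated from the county data.

```python
import random
from statistics import mean

def simulate(n_players=2000, noise_sd=8.0, top_fraction=0.25, seed=0):
    """Return (season 1 mean, season 2 mean) for players picked on season 1 form."""
    rng = random.Random(seed)
    true_avg = [rng.gauss(35.0, 5.0) for _ in range(n_players)]
    # Both seasons are equally hard: same true ability, independent noise.
    season1 = [t + rng.gauss(0.0, noise_sd) for t in true_avg]
    season2 = [t + rng.gauss(0.0, noise_sd) for t in true_avg]
    # Pick the top slice of season 1 performers, standing in for "promoted" players.
    cutoff = sorted(season1, reverse=True)[int(n_players * top_fraction) - 1]
    picked = [i for i in range(n_players) if season1[i] >= cutoff]
    before = mean([season1[i] for i in picked])
    after = mean([season2[i] for i in picked])
    return before, after

before, after = simulate()
print(round(before, 1), round(after, 1))
```

The picked group's second season average comes out lower than its first, despite the two seasons being identical in difficulty, which is exactly the effect that could be inflating the apparent gap between divisions.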

This caveat notwithstanding, the difference in divisions seems quite considerable, and division 1 runs are worthy of their additional praise.

Thursday, 5 May 2016

The candidates

Despite its title, this is not a surprise post about the extraordinary political wranglings currently in full swing in the land of baseball and chilli-dogs. No, this will be about the far weightier matter of whether certain batsmen are especially susceptible to being pinned LBW, and who those current players are.

In cricket commentary, it's common for players whose technique looks somehow prone to leave them trapped in front of their stumps to be described as "lbw candidates". This terminology seems to be applied specially to that particular means of dismissal- batsmen are rarely described as "caught behind candidates".

The questions I want to investigate in today's post stem from this.

Firstly, is "lbw candidate" a worthwhile category- is there a substantial subgroup of modern test batsmen who are markedly more lbw prone than their peers?

Secondly, who are these prime candidates in the post-Shane Watson era? I've often heard Alastair Cook described as a "candidate". Does he deserve the title?

We'll also be touching on where in the world lbws are most prevalent.

To tackle this, I took a sample of 45 current test match players, representing all the test nations apart from Zimbabwe, who haven't had much opportunity to play recently. The sample was obtained by taking the most recent test for each nation and including all the batsmen in the top 7 who had played at least 15 tests and who weren't obvious night-watchmen. For each player I looked up the total number of LBW dismissals in their test career and divided it by the number of dismissals overall. This is what is on the x-axis of the graph below, with the batting average of each player on the y-axis. The colour/shape of each point indicates the country for which the batsman plays.

The black dashed line is the sample median (0.155) and the red dashed lines either side are the upper (0.187) and lower (0.125) quartiles. As you can see, the data is quite clustered horizontally suggesting only a fairly small degree of variation in vulnerability to LBW amongst current test batsmen. There's also no significant correlation between the LBWs/dismissal and the batting average, suggesting that having a high proportion of dismissals be LBW doesn't indicate much either way for a batsman's run scoring ability.
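Here is a rough Python sketch of the calculation described above: the LBW rate per player, the median and quartiles of those rates, and a correlation with batting average. The players and numbers are invented, the names are mine, and the quartile convention (the "inclusive" method of Python's statistics module) is an assumption, since other conventions give slightly different values.

```python
from math import sqrt
from statistics import mean, median, quantiles

def lbw_rate(lbws, dismissals):
    """Fraction of a batsman's dismissals that were LBW."""
    return lbws / dismissals

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented (lbws, dismissals, batting average) rows, NOT the real sample.
players = [
    (10, 100, 45.0), (12, 100, 38.0), (14, 100, 52.0), (16, 100, 33.0),
    (18, 100, 41.0), (20, 100, 47.0), (25, 100, 36.0),
]
rates = [lbw_rate(lbws, outs) for lbws, outs, _ in players]
averages = [avg for _, _, avg in players]

q1, q2, q3 = quantiles(rates, n=4, method="inclusive")
print(median(rates), q1, q3)
print(pearson(rates, averages))
```

A correlation close to zero on the real sample is what would back up the claim that LBW rate says little about run scoring ability.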

There are, however, a few noticeable outliers, far removed from the central cluster, to whom we now come:

The Shane Watson memorial award for excellence in attracting LBW decisions (I like the idea of this award- we could call it the "iron pad" and award it annually) goes to South Africa's JP Duminy, who is way off to the right of the graph with 39% of his dismissals being LBW. (A lot of these were against spin bowlers).
There's a select trio of players to the left of the graph who hardly ever get pinned LBW, namely Pakistan's Sarfraz Ahmed (0 lbws/28 dismissals), England's Ben Stokes (1/41) and Bangladesh's Tamim Iqbal (2/79). It may not be significant but these are all quite aggressive batsmen, so perhaps more than being good at avoiding LBWs, they're finding other, more exciting, ways to get out first.
There's a foursome of Pakistan players separated from the main cluster, at around 0.25 LBWs/dismissal. These are: Younis Khan, Misbah ul Haq, Asad Shafiq and Mohammed Hafeez. It's tempting to wonder whether this might be because they play a lot of tests in the UAE, where the low, slow pitches are thought to be favourable for LBWs. Indeed, in the graph below you can see that the UAE does have the highest rate of LBWs per dismissal of top 6 batsmen amongst test match hosts since 2010. However, this probably doesn't fully account for it- if we exclude tests in the UAE for these four players only Hafeez sees his percentage of LBWs drop significantly.

Overall, modern test batsmen don't vary too much in how frequently they're pinned leg before, with a small number of exceptions. For what it's worth, Alastair Cook falls close to the central cluster of data points in our first graph, albeit slightly on the high side, with a rate of 0.19 LBWs/dismissal. And with Pakistan's apparently quite LBW prone top order coming to England this summer, it could be quite a good season for the thump of ball on pad, and the slowly raised finger. Maybe.