After a bit of an unplanned break, I'm happy to return to looking at cricket's beliefs and the evidence behind them.
When a batsman reaches a reasonable score, somewhere around 20 or 30, we'll often find ourselves declaring that they've got themselves 'in' - that they've got used to the conditions and the nature of the bowling, they're seeing the ball well, and so on. The corollary of this is that right at the beginning of their innings we expect the batsman to be more vulnerable. This belief is reflected in staples of commentary like "one brings two", or the oft-repeated assertion that a batsman who gets out for 35 will be much more disappointed than one who gets out for 10 because they'd already "done the hard work".
For this post, I want to look at how a batsman's vulnerability to getting out changes as a function of how many runs they already have. This clearly bears on what I just discussed above, but also on another veteran of the cliché circuit: the idea of the "nervous nineties" - that a batsman's performance will change, and perhaps drop, as they approach the emotive figure of 100. I mention that one because it's the first piece of cricketing received wisdom that I remember having serious doubts about when I was a young, geeky cricket lover. Anyway, enough of the origin story; on to the data.
The graph below plots the probability that a test match opening batsman will get out before scoring 5 more runs against how many runs they already have. I chose to look at opening batsmen to start with because they all begin their innings in circumstances which are to some extent comparable - i.e. against the new ball, with no runs on the board. The "error bars" are estimates of the uncertainty in the probability at this sample size, based on the assumption that the count of batsmen getting out in a given interval obeys a Poisson distribution. (At this point I should admit that I'm not a statistician by profession; I'm a physicist, so I justified that assumption to myself by analogy with radioactive decay.) The main point of the error bars is to give a rough guide to how seriously you should take small wiggles in the data.
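To make the calculation concrete, here's a minimal sketch of how such a curve could be computed, assuming the innings data comes as two arrays: each innings' final score, and whether it ended in a dismissal. This isn't the exact code behind the graphs here, and it treats a not-out that falls inside an interval as having survived it, which is a small simplification:

```python
import numpy as np

def hazard_by_interval(scores, dismissed, width=5, max_runs=150):
    """For each interval [lo, lo + width), estimate the probability that a
    batsman who reaches lo runs is dismissed before reaching lo + width."""
    scores = np.asarray(scores)
    dismissed = np.asarray(dismissed, dtype=bool)
    lows = np.arange(0, max_runs, width)
    prob, err = [], []
    for lo in lows:
        at_risk = scores >= lo  # innings that reached this interval at all
        out_here = at_risk & dismissed & (scores < lo + width)
        n_risk, n_out = at_risk.sum(), out_here.sum()
        if n_risk == 0:
            prob.append(np.nan)
            err.append(np.nan)
            continue
        prob.append(n_out / n_risk)
        # Poisson-style uncertainty on the dismissal count, as described above
        err.append(np.sqrt(max(n_out, 1)) / n_risk)
    return lows, np.array(prob), np.array(err)
```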
As you can see there is quite a big drop in vulnerability going from 0 to 5 runs. After that the drop is much more gradual, so it seems that opening batsmen do most of their 'playing themselves in' during the first 5 runs. After that their susceptibility to getting out changes very little, hovering around a 10% chance of getting out before scoring the next 5 runs from there on. Looking towards the right hand end of the line, there isn't much support for the idea that batsmen are more vulnerable in the nineties. Indeed, if there's any increase in the likelihood of getting out around the 100 run mark it seems to come just after 100, not just before. Maybe celebrating the ton is actually a serious distraction. We could call it the "hacky hundreds" or "hubristic hundreds". Or something.
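As a quick back-of-envelope check of my own (not something taken directly from the data): if the chance of getting out in each 5-run block really were a flat 10% once a batsman is in, the number of blocks survived would be geometrically distributed, and the expected number of further runs works out at around 45:

```python
# Back-of-envelope under a constant-hazard assumption (mine, not the post's data)
p = 0.10                   # per-5-run-block dismissal probability, read off the graph
mean_blocks = (1 - p) / p  # mean completed blocks before dismissal (geometric)
print(5 * mean_blocks)     # -> 45.0 further runs expected once 'in'
```

That's at least in the same ballpark as typical top-order test averages, though it ignores not-outs and the steeper hazard right at the start, so treat it as a sanity check rather than a model.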
The picture is similar for middle order batsmen (positions 3-5 in the order), as we see below.
There is a steep decline in vulnerability over the first five runs, followed by something more gradual. Again, the rise in vulnerability around 100 seems to occur after 100, not before.
So it seems, averaging over all batsmen and over all of test history, that:
1) Batsmen seem to get 'in' quite quickly: most of the decline in vulnerability comes in the first five runs.
2) Batsmen are more vulnerable immediately after scoring a hundred than immediately before.
Just to finish off, I thought it would be interesting to see how this looks for some individual players. Obviously, sample size is going to be a problem here, so this exercise only makes sense for players who've played a lot of tests. I chose to look at two current England veterans, Alastair Cook and Ian Bell; a legend they'll be facing soon, Younis Khan; and the recently retired Sri Lankan hero Kumar Sangakkara. To further mitigate the sample size problem I've looked at the data in blocks of 10 runs rather than blocks of 5, so the graphs below aren't directly comparable with the ones above.
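For what it's worth, re-binning is a one-line change with the hypothetical helper sketched earlier; the arrays here are placeholders standing in for a player's innings-by-innings record, not real figures:

```python
import numpy as np

# Placeholder innings data (final score, dismissed?) - not Cook's real record
cook_scores = np.array([12, 0, 95, 37, 105, 7, 61, 24, 148, 3])
cook_dismissed = np.array([True, True, True, True, False, True, True, True, True, True])

# Same helper as above, just with wider 10-run blocks to tame the noise
runs, p_out, err = hazard_by_interval(cook_scores, cook_dismissed, width=10)
```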
First, Cook and Bell:
The data's pretty noisy, so it's hard to say too much. It is interesting to note, though, that while Bell really is a lot more vulnerable early on (binary Bell and all that), Cook's vulnerability shows very little systematic dependence on how long he's been in at all.
On to the legends:
Again the data's noisy, but it's broadly consistent with what we saw above: an initial decrease in vulnerability followed by very little systematic dependence, and again more suggestion of a rise in vulnerability immediately after 100 than before it.
Go easy on the bat waving, centurions.