Friday, 25 December 2015

Festive Tidings and the Exceptional AB de Villiers

Today, I bring you a festive look at the data! I mean, not that there's anything particularly Christmassy about the content of this blog post- but hey, it's Christmas, there are mince pies in the oven and I'm writing about the batting statistics of wicket keepers. To me, that's festive.

This time around, the piece of cricketing received wisdom coming under the microscope is the belief that when a batsman who's able to keep wicket has to do so, it impacts negatively on their run scoring ability. The most famous (and, not coincidentally, also the most extreme) example of this is Kumar Sangakkara, who averaged an acceptable 40.48 when playing as a wicket keeper and a stellar 66.78 when playing as a specialist batsman. It seems reasonable to believe that the physical and mental strain of long periods of wicket keeping would make run scoring harder, but the same could be said of the pressure of the captaincy- and we saw in the last post that the captaincy generally seems to make little difference to run scoring output.

I actually prepared the research for this post a while ago, but didn't write a post on it because- as you'll see below- there isn't so much to work with in this case, and I worried that there wasn't enough numerical meat to make a satisfying analysis. However, the issue came up on the superb Switch Hit podcast this week- in the context of AB de Villiers' stewardship of the keeper's gloves for South Africa- and I thought that since it's an interesting question, I might as well write it up. Decide for yourselves whether the data justifies the conclusions.

So, what we want to do is take some test match players who've played a decent number of tests both as wicket keeper and as a specialist batsman, and compare their batting averages in those two sets of games. The problem is that there are very few players who fit that description. Specifically, I could find only seven players who played both at least 10 tests as the designated wicket keeper and at least 10 not as the wicket keeper. That rather select club is listed in the table below.

In the graph below, I've plotted the batting average when playing as keeper against the average when not playing as keeper for each player. Players falling below the blue line have worse averages when playing as wicket keeper and those above have better batting averages when granted the gloves.

Seven players isn't much to draw a conclusion from, but nevertheless the evidence in this case weighs in favour of the received wisdom- it does seem that having to keep wicket depresses a batsman's average. Of our seven players, two have better averages when playing as keeper and five do worse. That in itself could easily just be chance, but what's more notable is that the players who do worse as keeper tend to do rather a lot worse, suggesting there is potentially quite a strong effect at play. The average difference between averages when keeping and when not keeping in our sample was -10.19 runs- less extreme than Sanga's -26.3, but a pretty big difference all the same.

Which makes AB de Villiers' bucking of the trend all the more special. He averages fully 8.83 runs higher when keeping. Of course, this won't necessarily last. It's quite possible - maybe even likely - that if he stays as South Africa's first choice gloveman for a couple more years his average as keeper will regress back in line with his average when not keeping - or even below. Or perhaps - as he has in many other ways - de Villiers will prove to be exceptional in the truest sense of the word.

I want to finish this post by thanking you all for reading and to particularly thank Chris of the excellent blog "Declaration Game" for kindly promoting my blogging over the last 6 months. I was honoured to be included in his "Select XI" blog posts of the year, which if you haven't seen it yet is well worth a look- providing a very broad cross section of some extremely interesting cricket writing.

Merry Christmas!

Sunday, 15 November 2015

Batsmen and the burden of the captaincy

It must be tough being a test match captain. The potential for days in the field, a mind full of bowling changes and fielding positions. Commentators and fans analysing your every move. Are you being too funky or not funky enough? Then, after all that, you have to go out and bat. As well as being held responsible for the collective success or failure of your team, you have your personal performance to take care of. The burden is heavy. Surely you're exhausted. Something must give, mustn't it?

It seems to be a fairly commonly held belief that the cost of doing what test teams usually do, in making one of their best batsmen the captain, will often come in the form of reduced run output from the player in question. In discussions of Joe Root, England's presumptive captain-in-waiting, I have certainly heard it raised that making him captain will dent his prolific run scoring.

It seems a reasonable enough worry to have. The captaincy certainly carries a lot of pressure with it, and a lot of extra responsibility which one would have thought would make it harder to focus on one's batting. But what does the evidence say? How does the captaincy affect a batsman's performance?

The graph below plots the batting average when playing as captain against the batting average when not playing as captain for all the test captains who have led their side at least 30 times.

Points below the blue line represent players whose batting average was lower when captaining, and points above the line represent players who were more prolific when skippering. There are two things to notice here:
1) Most points fall fairly close to the blue line- i.e. for most of the players in our sample their batting averages with or without the captaincy only differ a little bit.
2) There are more points above the line than below it (26 vs 17 to be precise)- i.e. it's more common for a player's average to improve with the captaincy than to decrease.

On average, the players in this sample increased their batting average by 3.76 runs when carrying the captaincy burden. I wouldn't read too much into that positive shift as it is much smaller than the sample standard deviation. The main take home message is that for most players the captaincy doesn't seem to make much difference to their average, and only for very few does their average significantly decrease.

I have heard it said that the England captaincy may carry peculiar pressures- perhaps due to the often slightly tempestuous relationship between players, media and fans in English sport. So one may wonder about the England captains of recent vintage in our sample. Of those, only Michael Vaughan (36.02 with captaincy vs 50.98 without) shows a big negative shift. Alastair Cook (49.94 vs 46.36), Andrew Strauss (40.76 vs 41.04) and Nasser Hussain (36.04 vs 38.10) all have pretty similar numbers for the two cases. Mike Atherton shows a slightly bigger shift, but in the positive direction (40.58 vs 35.25).

So there's really not much compelling evidence to make us think that the captaincy depresses the run scoring of batsmen. But why, then, is this believed? I don't know, but my personal theory is this: when a player is given the captaincy they're usually coming off the back of a pretty good run- since you generally don't want to give the captaincy to a player unsure of their place. But all good runs must end eventually, for all players, captaincy or no. Whenever that does happen, this will be widely attributed to the pressures of captaincy catching up with them and the belief is perpetuated.

Test match captains are made of stern stuff- despite the pressure, they'll just carry on batting.

Tuesday, 10 November 2015

Pakistan's spinners have mastered the UAE where others have failed

For this post, I was reflecting on England's recent performance against Pakistan in the UAE. The consensus, after England's 2-0 defeat, seems to be that they performed fairly well but just came up against a side better suited to the conditions.

There's certainly a lot of truth in that. Despite the fact that Pakistan haven't been able to play tests in their home country in recent years, their success in the UAE- where they've played in lieu of home matches- rivals some of the strongest home teams in test cricket. The graph below illustrates the 'home' record of each test match side since November 2010 (the period over which Pakistan have been playing regularly in the UAE). The 'x' axis shows the percentage of wins achieved by the 'home' side and the 'y' axis shows their net average (batting average minus bowling average) at home in that period.

By these measures Pakistan's record in the UAE is very close to England's in England- not bad considering they don't actually get to play at home.

The most obvious difference between the two sides was the performance of their respective spin bowlers. While England's batsmen floundered against the legspin of Yasir Shah, England's own spinners- Adil Rashid, Moeen Ali and Samit Patel- neither took regular wickets nor kept the runs down. I think it's fair to say that, in the main, they rightly haven't been criticised over-harshly, but I also think there's an air of disappointment surrounding the fact that the best spinners England could muster simply didn't seem to cut the mustard.

I would like to offer one point in mitigation of this- the UAE is actually quite a difficult place to be a non-Pakistani spinner. The graph below plots the bowling averages of 'home' and 'away' spin bowlers in each test match hosting nation since November 2010.

There's a (fairly weak) trend in the direction you'd expect- that in places where "home" spinners perform well, so do "away" spinners, at least relatively. But the performance of Pakistan's spinners in the UAE is far, far better than the overall performance of spinners for the touring sides they've been playing against. Pakistan's spinners average 29.65 in the UAE since November 2010, as compared with 44.69 for spinners from other test nations in the UAE. The difference between those two figures is the second highest for any of the test hosting nations. The largest difference between home and away spinners is in Australia, where baggy green spinners have been taking wickets at 41.63, as against 57.49 for touring sides.

So it seems that Pakistan's spinners have been finding a way to succeed in the UAE, where the best spin bowlers of touring sides have generally been struggling. Whether this is because of the pitches, because Pakistan's batsmen are really good at playing spin, something else, or a combination- I don't know. I do think it sets the performance of England's spinners in some context- their collective average of 59.85 certainly remains a disappointment, but they were always unlikely to be England's match winners in that series (although Rashid nearly was in the first test, but for the bad light). I also think it illustrates that there was never likely to be much tactical value in picking a third spinner for the third test, but perhaps a discussion of how to balance a bowling attack is one for another day.

Saturday, 31 October 2015

How long before a batting average means something?

Since I last posted, England have battled through two thirds of a test series against Pakistan, acquitting themselves much better than at least I imagined they would, but still coming out behind. The struggles of England's middle order look set to lead to a test comeback for James Taylor.

In recent years, England's selectors have been praised for giving players a decent run in the side when called up- giving them more than one or two chances to show what they can do. I assume, and hope, that the same treatment will be extended to Taylor and that, barring injury, he'll also play in the South Africa tour.

These ruminations lead me on to today's question: if we judge a batsman by their batting average, how many matches will it actually take before that average fairly reflects their ability?

I think most of us understand that quoting someone's batting average after two games isn't going to provide terribly strong evidence either way about how good they'll be in the long term. But how long should we wait before we can suppose that their average gives a strong clue as to their underlying run scoring prowess? In my experience, the conventional wisdom might place this number somewhere around 10 matches or a little more, depending on who you talk to.

To try and answer this question, I've attempted something a little different to my previous posts. Instead of using data from past test matches, I wrote a computer simulation to simulate the run scoring output of two (fictional) batsmen of known ability and looked at the distribution of their averages as a function of the number of innings played. The reason for doing this is that it allows me to make a controlled 'experiment' in which I know how good the players in my simulation 'should' be, and can see the degree to which statistical fluctuations obscure that in a finite sample of innings.

In my previous post, I argued that a player's vulnerability to getting out is only weakly dependent on how many runs they already have- being slightly elevated right at the very beginning of their innings (and maybe also a little elevated immediately after reaching 100).

I simulated the output of two players:

Player A had a 12% chance of getting out before reaching 5 and an 8% chance of getting out before scoring the next five runs thereafter. To put these numbers in context, this is very good- in the long run Player A could expect to average around 55.

Player B had a 16% chance of getting out before reaching 5 and a 12% chance of getting out before scoring the next five runs thereafter. This is rather more mediocre- in the long run Player B could expect to average around 35.
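The internals of my simulation aren't shown in the post, but the model described above can be sketched in a few lines of Python. This is a minimal reconstruction under one assumption not stated explicitly: a batsman scores exactly 5 runs for every 5-run block survived (which reproduces the long-run averages, since the expected average is then 5 × (1 − p_first) / p_next: 55 for Player A and 35 for Player B).

```python
import random

def sim_innings(p_first, p_next, rng):
    """One innings: chance p_first of falling before 5, p_next per 5-run block after."""
    score, p = 0, p_first
    while rng.random() >= p:   # batsman survives this 5-run block
        score += 5
        p = p_next
    return score

def average_after(n_innings, p_first, p_next, rng):
    # No not-outs in this toy model, so average = total runs / innings
    return sum(sim_innings(p_first, p_next, rng) for _ in range(n_innings)) / n_innings

rng = random.Random(42)
trials = 10000
# Player A: 12% out before 5, 8% per block thereafter -> long-run average ~55
avgs = sorted(average_after(50, 0.12, 0.08, rng) for _ in range(trials))
p10, p50, p90 = (avgs[int(q * trials)] for q in (0.10, 0.50, 0.90))
print(p10, p50, p90)  # the spread is still wide even after 50 innings
```

Swapping in Player B's parameters (0.16, 0.12) gives the second distribution.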

The two graphs below illustrate the probability distribution of batting averages for each player as a function of the number of innings they were given in the simulation. The green points represent their median average after that number of innings and the red and blue points are the 10th and 90th percentile respectively. The region between the blue and red points reflects their likely range of batting averages after a given number of innings.

What's striking is that even after 50 innings the distributions are still quite broad- particularly for the better player (Player A). After 50 innings Player A has a 10% chance of averaging more than 66- making him look like a potential legend- and also a 10% chance of averaging lower than 45, making him look much more run of the mill.

Player B meanwhile has a 10% chance of averaging higher than 42 or lower than 28- the difference between fairly good and pretty poor.

These averages are converging to a fair reflection of the players' abilities but they are doing so rather slowly- a hint that even after a fairly decent number of tests we need to base our judgements of players on more than their bare batting average.

Imagine if you were a selector, who brought these two imaginary players into your imaginary team and after a fixed number of tests had to choose between these two (perhaps you have a star player about to come back from injury and have to drop someone to fit him in). Would their averages be likely to guide you to the right decision?

The graph below shows the probability that the very good player A has a better average than the pretty mediocre player B after a given number of tests.

After 10 innings there's around an 80% chance that the averages will correctly reflect that player A is better than player B. Which sounds kind of okay, until one reflects that selection decisions are often- necessarily- based on fewer innings than that and that these two players are really not evenly matched at all- in the long run one would average a full 20 runs higher than the other.
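The selection thought-experiment above is easy to replicate. Here's a hedged sketch using the same toy scoring model as before (score taken as 5 runs per 5-run block survived- an assumption of mine, not spelled out in the post), estimating how often the genuinely better player also has the better average after 10 innings each.

```python
import random

def innings(p_first, p_next, rng):
    score, p = 0, p_first
    while rng.random() >= p:  # survives another 5-run block
        score += 5
        p = p_next
    return score

def avg(n, p_first, p_next, rng):
    return sum(innings(p_first, p_next, rng) for _ in range(n)) / n

rng = random.Random(0)
trials = 20000
n = 10  # innings each player gets before the selection decision
# Player A should average ~55 in the long run, Player B ~35
a_better = sum(avg(n, 0.12, 0.08, rng) > avg(n, 0.16, 0.12, rng)
               for _ in range(trials))
print(a_better / trials)  # roughly 0.8: a real chance of backing the wrong player
```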

Of course, in reality selectors have a lot more information available to them than just batting averages. Anyone can look up a player's average, but selectors must exercise their judgement on a player's technique, temperament and suchlike using what they've seen in both matches and training. They have to do so because they don't have the luxury of letting a player play 20 test matches before making a decision about whether they're good enough- which is probably the minimum they would need to justify a decision based on batting average alone. To look at Gary Ballance's batting average of 47.76 after 27 innings, it's hard to avoid the conclusion he's been hard done by to not be in the team right now. And maybe he is- but one can't be sure of that from just his average.

It may well be the case that one could find a better way of estimating a batsman's ability from their stats after a small number of tests, which would converge on something fair a bit faster than simple batting average. On the other hand, fans like me should perhaps give selectors a break sometimes- they have rather complicated decisions to make, with rather limited and noisy information.

Sunday, 27 September 2015

When is a batsman 'in'?

After a bit of an unplanned break, I'm happy to return to looking at cricket's beliefs and the evidence behind them.

When a batsman reaches a reasonable score, somewhere around 20 or 30, we'll often find ourselves declaring that they've got themselves 'in'- that they've got used to the conditions and the nature of the bowling, they're seeing the ball well etc etc. The corollary of this is that right at the beginning of their innings we expect the batsman to be more vulnerable. This belief is reflected in staples of commentary like "one brings two" or the oft-repeated assertion that a batsman who gets out for 35 will be much more disappointed than one who gets out for 10 because they'd already "done the hard work".

For this post, I want to look at how a batsman's vulnerability to getting out changes as a function of how many runs they already have. This clearly impacts on what I just discussed above but also on another veteran of the cliché circuit: the idea of the "nervous nineties"- that a batsman's performance will change, and perhaps drop, as they approach the emotive figure of 100. I mention that one because it's the first piece of cricketing received wisdom that I remember having serious doubts about when I was a young, geeky, cricket lover. Anyway, enough of the origin story, on to the data.

The graph below plots the probability that a test match opening batsman will get out before scoring 5 more runs against how many runs they already have. I chose to look at opening batsmen to start with because they all start their innings in circumstances which are to some extent comparable- i.e. against the new ball, with no runs on the board. The "error bars" are estimates of the uncertainty in the estimate of the probability with this sample size, based on the assumption that the count of batsmen getting out in a given interval obeys a Poisson distribution. (At this point I want to admit to not being a statistician by profession- I'm a physicist- so I justified that assumption in my head by analogy with radioactive decay.) The main point of the error bars is as a rough guide to how seriously you may want to take small wiggles in the data.
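For the curious, the error bar recipe just described amounts to very little code: the estimated probability is the dismissal count divided by the number of batsmen who reached that bracket, and the Poisson assumption puts the error on the count at its square root. The counts below are made-up illustrative numbers, not taken from my dataset.

```python
import math

def hazard_with_error(dismissed, reached):
    """Probability of getting out within a run bracket, with a Poisson error bar.

    dismissed: number of batsmen out within this 5-run bracket
    reached:   number of batsmen who reached the start of the bracket
    """
    p = dismissed / reached
    # Treat the dismissal count as Poisson-distributed: its std dev is sqrt(count)
    err = math.sqrt(dismissed) / reached
    return p, err

# Illustrative numbers only
p, err = hazard_with_error(dismissed=120, reached=1000)
print(f"{p:.3f} +/- {err:.3f}")  # 0.120 +/- 0.011
```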

As you can see, there is quite a big drop in vulnerability going from 0 to 5 runs. After that the drop is much more gradual, so it seems that opening batsmen do most of their playing themselves in during the first 5 runs. After that their susceptibility to getting out changes very little, hovering around a 10% chance of getting out before scoring the next 5 runs from there on. Looking towards the right hand end of the line, there isn't much support for the idea that batsmen are more vulnerable in the nineties. Indeed, if there's any increase in the likelihood of getting out around the 100 run mark it seems to be just after 100, not just before. Maybe celebrating the ton is actually a serious distraction. We could call it the "hacky hundreds" or "hubristic hundreds". Or something.

The picture is similar for middle order batsmen (positions 3-5 in the order), as we see below.

There is a steep decline in vulnerability over the first five runs followed by something more gradual. Again, the rise in vulnerability around 100 seems to occur after 100, not before.

So it seems, averaging over all batsmen, over all test history that:
1) Batsmen seem to get 'in' quite quickly: most of the decline in vulnerability comes in the first five runs
2) Batsmen are more vulnerable immediately after scoring a hundred than immediately before.

Just to finish off, I thought it would be interesting to see how this looks for some individual players. Obviously, sample size is going to be a problem here, so this exercise can only make sense for players who've played a lot of tests. I chose to look at two current England veterans: Alastair Cook and Ian Bell, a legend they'll be facing soon: Younis Khan and recently retired Sri Lankan hero Kumar Sangakkara. To try to further mitigate the sample size thing I've looked at the data in blocks of 10 runs rather than blocks of 5, so the graphs below aren't directly comparable with the graphs above.

First, Cook and Bell:
The data's pretty noisy so it's hard to say too much. It is interesting to note though that while Bell really is a lot more vulnerable early on (binary Bell and all that), Cook's vulnerability shows very little systematic dependence on how long he's been in at all. But as you can see, the data's pretty messy.

On to the legends:

Again the data's noisy, but broadly consistent with what we saw above: an initial decrease in vulnerability followed by very little systematic dependence, and more suggestion of a rise in vulnerability immediately after a 100 than before.

Go easy on the bat waving, centurions.

Friday, 28 August 2015

The value of the toss

Mike Selvey has written an interesting piece in the Guardian arguing that the toss makes very little difference to the outcome of test matches. This idea interests me, since pre-match analysis and the chat amongst fans often seems to imply enormous tactical significance to winning the toss. Could it be that it's all just pageantry?

Selvey backs up his point with statistics from England and Australia's recent history, but the subject intrigues me and I feel like there's room to chip in with my own two statistical penneth.

The table below shows the number of wins and losses for teams winning the toss in various formats. Taking my lead from the suggestion that uncovered pitches made a big difference to the value of the toss I've divided the Test data into pre- and post- 1970. (Honestly, I'm a bit young to remember uncovered pitches but this article suggests they began to be phased out in the 60s, which is why I picked 1970). I've split the data for limited overs formats into day, day-night and night since there's good reason to think that might be important.

I've also made an attempt at evaluating whether the difference between wins and losses is "statistically significant" in each case. More on that below.

The data is consistent with the uncovered pitches idea: the win/loss ratio for teams winning the toss is much better pre-1970 than post-1970. The post-1970 data, meanwhile, is pretty consistent with Selvey's assertion that the toss doesn't matter too much. The ODI data seems consistent with the idea that the toss doesn't matter much in Day games, but plays a role in Day-Night games. Counter-intuitively, the T20 data seems to reverse that trend.

So, what of this statistical significance?

One way of estimating whether a statistical finding is significant for a given sample size is to calculate the probability that you could have got a result at least equally extreme from that sample size, assuming that the "null hypothesis" is true. In our case "the null hypothesis" would be "the toss makes no difference". If this were true then in any given game (excluding draws) there would be a 50/50 chance that the toss would just happen to fall for the team that was going to win the game anyway, and from that assumption we can work out how likely it is that we would get a result as extreme as the one we did, from the sample size we have, if the toss really made no difference. This number is the "p-value" in the second-last column of the table.

Conventionally, if this number is less than 0.05 we call our result "statistically significant". There are a lot of issues with this approach to things, not least that the 0.05 threshold is completely arbitrary, but it at least gives us a rough starting point for deciding how much importance we should attach to a finding.
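For concreteness, here's a sketch of the calculation just described: an exact two-sided binomial test against the 50/50 null, applied to a made-up win/loss record (not one of the rows in my table).

```python
from math import comb

def two_sided_p(wins, losses):
    """Exact two-sided binomial test against a 50/50 null (draws excluded)."""
    n = wins + losses
    pmf = [comb(n, k) * 0.5**n for k in range(n + 1)]
    observed = pmf[wins]
    # Sum probabilities of all outcomes at least as unlikely as the observed one
    return sum(x for x in pmf if x <= observed + 1e-12)

# Hypothetical record: toss winners win 60 and lose 40 of the decided games
pval = two_sided_p(60, 40)
print(round(pval, 3))  # just misses the conventional 0.05 threshold
```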

In this case, our results that the toss makes a (positive) difference in Day-Night ODIs, Daytime T20Is and that it used to make a difference (pre-1970) in Tests all pass our significance threshold of 0.05. The other cases fail to pass the threshold, which certainly isn't the same as saying we've proved the toss makes no difference in those cases, but does mean that we can't be sure it does with the data available.

As far as modern test matches go, it seems Mike Selvey is right- for the most part we probably do over-analyse the toss. Even if the difference it makes is real, it seems to be tiny. So, next time I think Alastair Cook's made the wrong post coin-flip decision, I'll try to remember to give the guy a break- it may well not make any difference.

Tuesday, 25 August 2015

Ian Bell's 'easy' runs

Today's post features something totally new for this blog: a reader's request. Specifically, Simon Mills asks:

"Can you test the hypothesis that Ian Bell only scored easy runs?"

Right now seems like a good time to reflect a bit on Ian Bell and his reputation as many are suggesting his international career could or should be about to draw to a close, after an Ashes series where he struggled to cut sufficient mustard. So, here goes.


Clearly, how we answer the question of whether Ian Bell deserves to be labelled a scorer of 'easy runs' is going to depend on how we define 'easy'. One, very simple, way we could try to define it is to say
"easy runs are those scored against test cricket's weakest attacks"

The graph below plots Ian Bell's batting average against each test side against the overall batting average of middle order batsmen (positions 3-6 in the order) against each opponent over the period of Bell's career (Aug 2004-present). Points which fall above the red line represent nations against whom he has outperformed his peers and those below the red line represent the nations against whom he has fared worse than the 'average' middle order batsman of his era.

His batting average against each nation for the most part follows that of the average middle order batsman. Against Australia, South Africa, Pakistan, India and West Indies his average is within a few runs either way of the average performance for middle order batsmen against those opponents. There are a couple of large exceptions in Sri Lanka (against whom he has over-performed compared to his peers) and New Zealand (against whom his record is poor). Then there's a giant honking outlier in Bangladesh, against whom he has really cashed in, averaging 158 compared to the 'average average' of 63.

Bangladesh aside, there isn't a very strong trend for Bell to strongly over-perform against weak attacks or under-perform against strong ones. He does score more runs against weaker bowling sides, but only to a degree comparable to the average middle order batsman of his time.

Match situation

Another way of looking at Simon's question is to define 'easy' and 'difficult' not by the opposition but by the match situation, saying something like:

'easy runs are scored when you come in with your team already in a good position'

To look at whether Ian Bell primarily scores runs when he comes in with England in a good position, I've compared his performance when coming in with less than 20 runs per wicket on the board (i.e. at scores worse than 20-1, 40-2 etc) against his performance coming in with more than 80 runs per wicket scored (scores better than 80-1, 160-2 etc). The results are in the table below. I've also broken it up into time periods pre- and post- his being dropped after the West Indies tour of 2009, an event many people consider a watershed in his career.
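The situation split described above is easy to express in code. A small sketch- the thresholds are the ones I used, but the function name and labels are just for illustration:

```python
def entry_situation(score, wickets):
    """Classify the platform a batsman walks in to, by runs per wicket lost."""
    if wickets == 0:
        return "opening"            # openers start at 0-0; not part of this split
    runs_per_wicket = score / wickets
    if runs_per_wicket < 20:
        return "dicey"              # worse than 20-1, 40-2, ...
    if runs_per_wicket > 80:
        return "comfortable"        # better than 80-1, 160-2, ...
    return "in between"

print(entry_situation(35, 2))    # 17.5 runs per wicket -> dicey
print(entry_situation(170, 2))   # 85 runs per wicket -> comfortable
```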

The result is that yes, he does score more runs coming in with a good platform laid than coming in to a dicey situation. This is even more true of post-2009 Bell than pre-2009 Bell. Post-2009 Bell seems to be quite the master of putting the boot in from a well-laid platform. On the other hand, one shouldn't take it that he's useless when coming in in a tough situation- his average when coming in with fewer than 20 runs/wicket on the board is fairly close to his overall, pretty respectable, average.

Clearly, it's unfair to accuse Bell of being only a scorer of easy runs. The 2013 Ashes stand out amongst Bell's achievements but he's starred on many other occasions too. However, it is true to say he is more prolific at scoring against weaker attacks and in more comfortable match situations. That shouldn't surprise us too much- easy runs are, after all, easier.

Tuesday, 18 August 2015

Winning away tests isn't getting harder, but not losing them is

As the Ashes locomotive rumbles into its final stop at The Oval, the destination of the series is decided- another home win, the sixth in the last seven Ashes. Meanwhile, in Sri Lanka, the home team have taken a 1-0 series lead against India, after a borderline miraculous comeback.

Winning tests away from home is hard. It was ever thus, but some are worried that it's harder than ever, that with packed international schedules players simply don't have the will or the time to acclimatise to foreign conditions and that away wins are becoming an endangered species.

With this in mind, I wanted to see how much more unusual winning away test matches has become. The graph below shows the percentage of home wins, away wins and draws in Tests (excluding Tests at neutral venues) in 5 year chunks since 1946.

The first thing we can notice is that the proportion of home wins has indeed increased steadily and markedly since the late 80s. What's interesting to me here is that there has been very little change in the rate of away wins over the same period. Instead, the 'extra' home wins are coming out of the proportion of drawn games- the rise in home wins almost exactly tracks a decline in the number of draws.

I think many cricket fans are aware both that home wins are becoming more frequent and that draws are becoming less so but offer separate explanations for these observations.

A discussion about the rise of home wins will tend to centre around modern scheduling and inadequate preparation by away sides. Meanwhile, if you asked a random cricket fan to explain the decline in the number of draws, I suspect they would talk about the influence of the limited overs forms of cricket on test matches- making the game go faster and producing results. They might also make a mention of improved drainage systems. These factors may all be contributing, but they take no account of the fact that the rate of away wins has remained fairly stable, nor do they explain why home teams are benefitting disproportionately from the decline of the Test match draw. I can't claim to properly explain these things either, but I hope to come back to it. Perhaps it's that the increasing availability of wins for away sides (because fewer games are drawn) is roughly cancelling out what would otherwise be a decrease in away wins from increasing home advantage.

It's not the winning that away teams are getting worse at, it's the not losing.

Saturday, 8 August 2015

Umpires really are calling fewer no-balls under DRS

Amidst the confusion and excitement of the last two days of crazy test cricket at Trent Bridge, there's been a bit of chat about no-balls. I know- of all the wonderful, brilliant things that have happened, I want to show you a graph about no-balls. Sad, right?

Yesterday, on two occasions, England thought they'd taken a wicket only for the batsman to be called back because the umpires checked for an overstepping front foot on the video replay and belatedly called a no-ball. Instances like this seem to happen quite a lot in modern test cricket. Many commentators are accusing the umpires of not even looking for no-balls in normal play because they know they can always check if a wicket falls. Therefore, the argument goes, the bowler isn't getting any warning that he's overstepping until it costs him a wicket.

To hear some commentators hold forth on the subject you'd think that no-balls are now only ever called when there's a dismissal at stake (which, just to be clear, isn't true) and that this is a serious moral disservice to the bowler (which I find pretty doubtful, but I guess is a matter of opinion).

So, are umpires really calling fewer no-balls in the DRS era?

Yes. A lot fewer.

The graph shows the number of no-balls called in test cricket per legal delivery, by year, since 2000. There's a fairly steep drop following the introduction of DRS in 2009, which now seems to have levelled off. No-balls are now being called at less than half the rate they were in 2009. Looking at the trend since 2005, it looks like the rate of no-ball calling was already decreasing before DRS. I think that's kind of interesting and I don't know why it would be true. It's possible that it's a false impression created by the way the noise in the data fell. The decrease since 2009 is undeniable though.
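For what it's worth, the quantity plotted is simply no-balls called divided by legal deliveries, year by year. A quick sketch- the yearly totals below are made up for illustration, only the method is real:

```python
# Hypothetical yearly totals: {year: (no_balls_called, legal_deliveries)}.
# Real values would come from ball-by-ball Test data.
yearly = {2005: (900, 90000), 2009: (800, 95000), 2014: (350, 92000)}

def no_ball_rate(no_balls, legal_deliveries):
    """No-balls called per legal delivery bowled."""
    return no_balls / legal_deliveries

rates = {year: no_ball_rate(nb, legal) for year, (nb, legal) in yearly.items()}
```

Plotting `rates` against year gives the graph in question.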

Personally, I think there are two ways cricket can go from here:

1) Technology ends up being used to check no-balls for every delivery, not just wicket-taking ones- although whether this can be done without either confusion or incredibly dull disruption remains to be seen.

2) Bowlers, commentators and fans learn to accept "the new normal". Front foot no-balls just aren't going to be called very often unless there's a wicket. If bowlers are really relying on the umpire to warn when they're overstepping then they'll have to get over that. They'll either have to get better at measuring their run up or have one of their fielders watch out for when they're pushing the line and warn them. Or they can just accept that overstepping will cost them a wicket every now and then.

Sure, no-balls aren't as exciting as England being on the brink of Ashes victory. But they still make for a nice graph.

**ADDITION (9/8/15)**
Someone left a comment on reddit asking whether the drop in no-ball calling could be down to bowlers simply bowling fewer no-balls and have nothing particularly to do with DRS. This is a fair question, and of course, as a general rule, we must be careful not to conflate correlation with causation. I'd be quite surprised if this was the true explanation, as I'm not sure why such marked, global improvement in the avoidance of overstepping should occur, but I suppose it's possible.

I can't definitively rule it out, but I did have a quick look at the rate at which no-balls have been called on (would-be) wicket-taking deliveries in tests in the 2015 season. I found 11 examples of wickets being chalked off for no-balls, and the 2015 season has seen 572 wickets from legal deliveries. 11/572 = 0.019, which is much closer to the overall rate of no-ball calling 10 years ago than to the current rate. Indeed, it's higher even than the rate ten years ago (please beware the small sample though), which isn't too surprising, since they probably were also missing some no-balls in the pre-DRS era.


This:

1) Suggests that umpires are indeed missing significant numbers of no-balls in 'normal' play, which they're catching on wicket-taking deliveries.
2) Is consistent with the notion that the rate at which bowlers are actually overstepping hasn't changed too much. I don't think there's any way to make this definitive, since there's no way of knowing how many no-balls umpires were missing pre-DRS.

I do think the more likely explanation is that umpires are catching fewer no-balls rather than that bowlers are simply overstepping less, but the only way I can think of to prove that would be if some poor intern at Sky went through 10 years of archive footage looking for missed no-balls. On balance, it's probably good that that isn't going to happen.

Tuesday, 4 August 2015

Thrashings are the norm in test match cricket

As the cricket watching community picked through the smouldering debris left behind by Steven Finn at Edgbaston, I noticed one theme come up several times: isn't it odd that these two teams keep thrashing one another?

A 2-1 series scoreline suggests two teams who are fairly evenly matched, and yet none of the individual matches has been even a tiny bit close. The margins of victory to date are: 169 runs (England), 405 runs (Australia) and 8 wickets (England). Shouldn't evenly matched sides produce evenly matched games?

And yet, test cricket confounds that expectation rather often. Australia's tour of South Africa last year  or the 2009 Ashes spring to mind as examples of test series which somehow contrived to be close overall, without producing a single close game.

So, to place this year's sequence of shoeings in some context, I've had a look at the distribution of margins of victory in non-drawn test matches since 2005, to see just how prevalent hammerings are in modern test cricket and, conversely, just how rare and priceless a jewel a close test match is.

Our first chart shows the proportion of various margins of victory in all test matches since 2005. As you can see, nearly 50% of the pie is taken up by really big wins: innings victories and those by more than 200 runs or by 9 or 10 wickets. By contrast, truly down-to-the-wire games (I believe the technical term is "arse nippers") form only a tiny proportion of test matches. Wins by fewer than 50 runs or by 3 or fewer wickets make up only 7.6% of the total.

You may at this point be thinking that the stats are skewed by the inclusion of matches involving relatively weak teams like Bangladesh and Zimbabwe. You might also be thinking of the shoddy away performances by the likes of England's last Ashes tour party or the Indian side that toured England in 2011. It's no surprise that those teams got thrashed, you might say.

In the chart below, to isolate the proportions in games between relatively evenly matched teams, I've restricted the sample to include only matches in series where both teams won at least one game- thus proving themselves at least capable of beating their opposition in those conditions. 
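Here's roughly how that filter works, sketched in Python. The series names below are real examples mentioned in this post, but the match results attached to them are placeholders, not the actual scorelines:

```python
from collections import defaultdict

# Hypothetical match list: (series_id, winner), with winner None for a draw.
matches = [
    ("Ashes 2009", "ENG"), ("Ashes 2009", "AUS"), ("Ashes 2009", None),
    ("Ashes 2009", "ENG"),
    ("SA v AUS 2014", "AUS"), ("SA v AUS 2014", "SA"), ("SA v AUS 2014", "AUS"),
    ("ENG v IND 2011", "ENG"), ("ENG v IND 2011", "ENG"),
]

def competitive_matches(matches):
    """Keep only matches from series in which both sides won at least once."""
    winners = defaultdict(set)
    for series, winner in matches:
        if winner is not None:
            winners[series].add(winner)
    return [(s, w) for s, w in matches if len(winners[s]) >= 2]
```

One-sided series like the invented "ENG v IND 2011" above drop out entirely; the margin-of-victory proportions are then recomputed on what's left.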

As you'd expect, this restriction does alter the proportions. But not by very much.

The region of the pie chart given to the biggest victories (innings, >200 runs, 9-10 wickets) shrinks a little, but still takes up more than 40%. Nevertheless, games with margins smaller than 100 runs or 7 wickets still occur less than a quarter of the time, and very close games (<50 runs, 1-3 wickets) only 8% of the time- even between relatively evenly matched sides.

In this light, it doesn't seem anomalous at all that we haven't seen any close games in the Ashes yet. With only two games to go, it wouldn't really be surprising if we don't see any at all.

To be honest, I think anyone who's followed test cricket for a few years knows that games with really narrow margins of victory are quite rare. What looking at these numbers did for me is throw into sharp relief how much we as cricket fans over-react to both heavy defeats and large victories. Even if your team has been absolutely walloped in one game, it doesn't necessarily mean they won't be a match for the opposition in the next. On the other hand, those of us with triumphalist tendencies, when rejoicing in a big win should remember that even that hapless Aussie team of 2010-11 absolutely smashed England at Perth.

I don't think I know of any other sport in which it is so common for games between fairly evenly matched teams to end in very large margins of victory.

"It's not unusual to get thrashed at any time,
Even when you're evenly matched and in your prime"

as Tom Jones sang in his hymn to the strange fluctuations of cricketing fortunes, before his manager persuaded him to make the song about love or some nonsense like that. I think it's pretty hard to pick a winner for the Trent Bridge test. But I doubt it will be close.

Saturday, 1 August 2015

Home and Away with James Anderson (and 15 other fast bowlers)

The mood of the English cricketing public has lurched once more from despondent to cheerful, as England continued their extended experiment in demonstrating why 'momentum' is nonsense with victory in the third test. It was a test with lots of great moments and sub-plots- Ian Bell's success after being moved up to three, Adam Voges' magic jumper and Steve Finn's heartening resurgence to name but a few. England's victory was set up, however, by James Anderson's first day efforts- taking 6-47 as Australia staggered their way to 136 all out. Worryingly for England, we're quite likely not to see Anderson for the rest of the Ashes as he went off in the second innings with a side injury.

Anderson's critics have always maintained that he is a home track bully, that he only produces his best in favourable home conditions- and that therefore his reputation as one of his generation's leading bowlers is undeserved. Anderson's 6-47 on a seaming pitch at Edgbaston can hardly dispel that impression.

Although it's certainly unfair to suggest that he never produces away from home, it is probably fair enough to say his home record is much more impressive than his away one. As far as his average goes, he takes his wickets at 26.80 at home and 34.04 away.

What I want to do in this post is put that difference in some context. Yes, Jimmy Anderson takes his wickets more cheaply at home- but is he extraordinary in that regard or is he just an example of a very common phenomenon? Is his reputation more dependent on home performances than that of his peers in the fast bowling fraternity?

The graph above shows the difference between the home and away bowling averages for the world's 16 leading fast bowlers, as defined by the current ICC rankings. I should note that in the case of Pakistan's Junaid Khan I have counted his games in the UAE as being at home, since that's where Pakistan have played their 'home' games during his career.

Almost all of these 16 pace merchants have better averages at home than away- and of course it's no surprise that bowlers prefer conditions they're familiar with, where they learned their trade and gained their reputation. The three buckers of this trend are Josh Hazlewood, Tim Southee and Junaid Khan. As discussed above, Junaid Khan has never played a 'true' home test, and for that reason I was in two minds about including him. Josh Hazlewood has played only 8 tests (3 at home, 5 away) and so his startling stats should probably be taken with a pinch of salt, a squeeze of lime and a shot of tequila.

James Anderson, meanwhile, sits in the midst of a cluster of bowlers who average between 5 and 8 runs higher away than at home. The gap between his home and away performances is thus a little on the high side but by no means extraordinary amongst his fast bowling peers. 

You can say Jimmy is a home track bully if you want, but in that case test cricket's home track playground is full of them.  Bowlers who consistently excel to the same high level in all countries and conditions are rare beasts indeed.

Thursday, 23 July 2015

Does momentum matter?

If there's one thing that we can surely all agree on after the carnage of the second Ashes test, it's that Australia have "the momentum".

Cricket commentators, and in fact sports commentators in general, really like to talk about momentum. The idea seems to be that if a team scores a win- particularly a big win- in a series they can carry those good vibes with them into the next game, making them materially more likely to win the next one. The defeated team meanwhile, like a mouse transfixed in the path of a charging rhinoceros, better brace themselves for all that momentum coming their way.

I think the frequency with which "momentum" is wheeled out bothers me because it's extremely easy to invoke in hindsight and thereby argue that a particular run of victories or defeats was somehow inevitable, but is very quickly forgotten about when events take a different turn.

So the question for today is: does having the momentum make a difference to the result of test matches?

Specifically, I've looked at test series of more than two matches which were poised at 1-1 after two tests. I've chosen to do this because I want to see if the effect exists in games between teams who are fairly evenly matched- otherwise it's not an interesting effect at all, it's just another way of saying "Team A is a lot better than Team B and will still be a lot better than them when they play again next week." Moreover, it's the kind of situation where momentum gets brought up a lot.

In a test series poised at 1-1 after two tests, the team which won the second test will be widely proclaimed to have the momentum- having impressively wrestled that momentum from their opponents who won the first test. If momentum helps you win test matches then the team who won the second test should be more likely, on average, to win the third.

So does it work out like this?


After a bit of trawling through the archives I was able to find 59 test series of more than two matches which were poised at 1-1 after two games. The third test of those series went "with the momentum" 19 times (i.e. the team who won the second test also won the third) but went "against the momentum" 22 times. There were 18 draws. Basically, momentum makes no difference.
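The tally itself is simple enough to sketch in Python. The sample of series below is invented, standing in for the 59 real ones:

```python
from collections import Counter

def third_test_vs_momentum(second_test_winner, third_test_result):
    """Classify the third Test of a 1-1 series relative to 'momentum':
    third_test_result is the winning team, or None for a draw."""
    if third_test_result is None:
        return "draw"
    return "with" if third_test_result == second_test_winner else "against"

# Invented sample: (winner of 2nd Test, winner of 3rd Test or None) per series.
series = [("AUS", "AUS"), ("AUS", "ENG"), ("IND", None), ("SA", "SA")]
tally = Counter(third_test_vs_momentum(w2, w3) for w2, w3 in series)
```

Run over the real 59 series, `tally` would come out as 19 with, 22 against and 18 draws.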

If you just look at the Ashes you have 6 with the momentum, 3 against the momentum and 5 draws- which looks better for the momentum camp but is quite likely to be a fluke from the small sample. If you just look at recent test series (last 10 years) things go in the opposite direction, favouring wins against the momentum.

"But hang on," you might say, "Australia didn't just beat England in the second test, they absolutely hammered them- surely that means something momentum-wise?"

Actually, no not really. For the final bit I just looked at those series where the second test was a "thrashing", to see if in those cases the momentum carried over into the third test. Obviously the cut-off for what defines a thrashing is a little arbitrary, but the definition I went with was:

-any innings victory
-any 'runs' victory by greater than 200 runs
-any 'wickets' victory by 9 or 10 wickets

Restricting to that subset the result is: 11 wins with the momentum, 9 wins against the momentum and 4 draws, so still no noticeable effect.
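The thrashing cut-off above translates directly into code. A minimal sketch:

```python
def is_thrashing(kind, margin):
    """The cut-off used in this post: any innings win, a runs win by more
    than 200, or a wickets win by 9 or 10. kind is "innings", "runs" or
    "wickets"; margin is ignored for innings victories."""
    if kind == "innings":
        return True
    if kind == "runs":
        return margin > 200
    if kind == "wickets":
        return margin >= 9
    raise ValueError(f"unknown victory type: {kind}")
```

So Australia's 405-run win at Lord's counts, while England's 169-run and 8-wicket wins don't.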

All of this is not to say that Australia won't win the third test. As an England fan I'd like to think they might not, but truthfully, I think they probably will. Before the series began, most observers were saying that Australia had the better side and some people probably revised that conclusion a bit too hastily in the wake of the Cardiff test. If Australia do win at Edgbaston, however, it won't be the momentum that does it for them.

Sunday, 19 July 2015

Does Mitchell Johnson's bowling "feed off" his batting?

One of the subplots as England put the finishing touches on victory in Cardiff was a defiant innings of 77 by Mitchell Johnson. At the time, more than one commentator asserted that this innings would help his bowling, by imparting that most sought after of abstract sporting commodities: "confidence".

Sure enough, in England's first innings at Lord's Mitch caused considerable damage to England, despite the placid pitch, taking 3-53.

So, is this a general pattern in Johnson's career?

I must admit I was sceptical that there would be any evidence for this, but it turns out that there is a little bit- though it seems to be a short-term effect. By that I mean that when Johnson has a good innings with the bat, he often does do well with the ball in the next innings. However, when averaged over the course of a series, there is no particular relationship between him batting well and bowling well.

The short term effect:

Mitchell Johnson has scored 1 century and 11 half-centuries in test cricket. I looked at his performance in the bowling innings immediately following these performances. I excluded his 123* against South Africa, which came right at the end of a series and so had no immediately following bowling innings- that leaves the 11 half-centuries.

In these 11 bowling innings following half centuries Mitch has taken 36 wickets at an average 21.33.

Comparing this to his career bowling average of 27.90 it seems that his bowling performance does pick up slightly in the immediate aftermath of a good batting performance.
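The comparison is just the usual bowling-average arithmetic: runs conceded divided by wickets taken. Note that the ~768 runs below is back-calculated from the quoted figures (36 × 21.33), not taken from the scorecards:

```python
def bowling_average(runs_conceded, wickets):
    """Standard bowling average: runs conceded per wicket taken."""
    return runs_conceded / wickets

# In the 11 bowling innings following a fifty: 36 wickets at 21.33,
# implying roughly 768 runs conceded (an inference, not a sourced figure).
post_fifty_avg = bowling_average(768, 36)
career_avg = 27.90  # Johnson's career bowling average as quoted in the post
improvement = career_avg - post_fifty_avg
```

A lower average is better for a bowler, so a positive `improvement` is the "pick up" described above.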

Over the course of a series:

The graph below plots Mitchell Johnson's bowling average against his batting average for all the completed test series he has played in.
Looking at it by eye there's no clear pattern to associate a good batting average in a series with a good bowling average, suggesting that when averaged over the course of a series Johnson's batting and bowling averages behave more or less independently.

Getting a bit more mathsy about it, I evaluated Spearman's rank correlation coefficient, which is a way of measuring how strongly two datasets are correlated. The answer was pretty close to zero (0.09 if you want to know), suggesting once again that Johnson's series batting and bowling averages are pretty much uncorrelated.
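For anyone playing along at home: Spearman's coefficient is just the Pearson correlation of the ranks, and with no tied values it reduces to a neat closed form. A small sketch (assuming no ties, which real averages can violate- a proper implementation would average tied ranks; the per-series figures below are invented):

```python
def rank(xs):
    """Rank positions 1..n of each value (assumes no tied values)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos
    return ranks

def spearman(x, y):
    """Spearman's rank correlation via the tie-free shortcut:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical per-series (batting average, bowling average) pairs:
batting = [25.0, 10.5, 40.2, 17.8]
bowling = [30.1, 22.4, 28.0, 35.6]
rho = spearman(batting, bowling)
```

Values near +1 or -1 indicate a strong monotonic relationship; values near zero, like the 0.09 quoted above, indicate essentially none.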

So there you have it: the commentators were right- a good innings with the bat does seem to help Mitchell Johnson bowl well next time out. However, over the course of a series his batting and bowling averages have little to do with each other.

Monday, 13 July 2015

Shane Watson: LBW magnet, DRS fiend

For my first quick look at the facts behind cricket's stories, I'm going a little bit topical. 

Amongst the fallout of England's surprise victory in the first Ashes test has been an awful lot of giggling at poor old Shane Watson, his propensity for LBWs and his poor use of the decision review system. Even as he walked to the wicket in the second innings it seemed everyone knew how he would be out and, of course, that he would review it. And so it came to pass.

It occurred to me to wonder if Watto really deserves his reputation as LBW magnet, DRS fiend. Is he really out LBW so very much? And is his use of DRS really so outrageously bad compared to others?

As you can see from the first column in the table, the answer to the first question is a resounding yes. He gets out LBW an awful lot compared to his teammates in Australia's top seven (Adam Voges isn't included because he's yet to be out LBW in his fledgling test career). Shane Watson has been dismissed in this way a whopping 29 times in his test career- 27% of all his dismissals in tests. A more typical proportion of LBW dismissals seems to be just above 10%, so Shane's 27% really is loads.

No surprises there then. Opposition bowlers aren't idiots, they've been bowling to pin him LBW for a reason and it's been working.

The second column of the table shows, for each player, how many of their LBWs they could in principle have reviewed, by which I mean that

1) the dismissal occurred in a match where DRS was available to review LBW decisions
2) Australia had some reviews left at the time
3) the dismissal wasn't the result of the opposition reviewing a not out decision.

The third column shows how many they actually reviewed and the fourth shows how many they reviewed as a percentage of the 'reviewable' dismissals.

What we see is that while Shane does like a review, his proportion of LBWs that were failed reviews isn't wildly out of kilter with other batsmen in his team. Indeed he's relatively more frugal with reviews than either Michael Clarke or Brad Haddin- who has reviewed every one of his eventual LBWs where he had the option.

In fairness to Haddin, he may reasonably argue that he bats in the last recognised batsman's position at 7 and so he might as well use a review if there's one left to be used. On the other hand, Watson could have made exactly the same argument on Saturday, but it didn't stop everyone finding it very funny.

So, it seems Watson definitely deserves his LBW candidate reputation and that has probably fed in to the DRS fiend narrative- he reviews a lot of LBW decisions because he has a lot of opportunity to! Nevertheless, even if he is a bit trigger happy with the review, he's not the only baggy green to be that way.

Friday, 10 July 2015

The point of this blog

I think it's fair to say that Peter Moores isn't held in very high esteem by most England cricket fans these days. No surprises there- the world cup was a masterclass in sporting incompetence, even by English standards.

As England crashed out of the world cup, poor old Mooresy was widely lampooned for one particular comment, which now seems destined to become his cricketing epitaph:

"We need to look at the data".

This is all very unfair, of course, because apparently he didn't actually say that.

Which is a shame, in my opinion, because people should look at data. Data is great. It helps us to see through the fog of our biases, of the received wisdom of experts and the hyperbole of journalists, to get a glimpse of the world as it actually is.

There's an awful lot of received wisdom in cricket. Just listen to an hour of Test Match Special (just to be clear, I love TMS). You'll be told that one player has a weakness against left-arm spin and another scores too many pretty thirties; that a batsman has been 'found out' because he's gone three games without a fifty and that a bowler is going to bowl better because he scored runs when batting. You'll hear A LOT about momentum. You won't hear much backing up of claims with evidence.

Some of these things might be true. But I defy anyone who says that they know that, without having looked at some data. Without hard data all of our observations of the world are refracted through the lens of our preconceived ideas. We make our own narrative, but it may have nothing to do with what's really going on.

So I'm going to try and look into these things for myself. I want to take those things which commentators and fans assert as self-evident facts and see if they stand up to the evidence. I'm sure some will pass the test. I'm sure some won't.

Cricket has its own beauty, which goes beyond what numbers can tell you. But looking at the numbers can give you a fuller picture and a deeper understanding. And in their own way, numbers can be beautiful too.

This is just a fun project for me to learn more about cricket, writing and maths- all of which are things I like. I'll be posting my findings on this blog- we'll see how it goes.