Friday 25 August 2017

Over too quickly

The English test summer has been a summer of thrashings. Not only was the day-night affair at Edgbaston last week one-sided, but the South Africa series also finished without producing a single game which you would really call close.

This is not something terribly unusual. Large margins of victory are the norm in test match cricket, even if the two teams competing are fairly evenly matched, as has been noted previously on this blog.

There has been another common feature of these games- related to, but separate from, the comfortable victory margins: the outcome was highly predictable once both teams had batted once. The first innings leads in England's home games of 2017 have been: 97, 130, 178, 136 and a whopping 346.

With these leads duly established, the side ahead has remained more-or-less in control for the remainder of the game.

This observation prompts today's blog topic: how frequently are modern test matches virtually decided by the time both teams have batted once?

The pie chart below shows the distribution of first innings leads in tests since 2013. This amounts to a sample of 194 tests (I discounted a few which didn't get as far as two complete innings).
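For anyone curious about the mechanics, here is a minimal sketch of the sort of binning behind that pie chart, in Python. It assumes the first innings leads have already been gathered into a list, and the bands are illustrative rather than an exact match for the chart.

```python
from collections import Counter

def lead_band(lead, width=50, cap=200):
    """Place an absolute first-innings lead into a 50-run band, capping at 200+."""
    lead = abs(lead)
    if lead >= cap:
        return "200+"
    lower = (lead // width) * width
    return f"{lower}-{lower + width - 1}"

def lead_distribution(leads):
    """Percentage of matches in each band, e.g. leads = [97, 130, 178, 136, 346, ...]."""
    counts = Counter(lead_band(l) for l in leads)
    return {band: round(100 * n / len(leads), 1) for band, n in counts.items()}
```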

As you can see, more than a third of games feature first innings leads greater than 200 runs, and around half of all games feature leads greater than 150 runs. As you would guess, a lead of 150 runs is pretty determinative. The chart below shows the outcome of those games since 2013 with a first innings difference above 150: the first innings leader won 86% of the time, with all the rest being draws apart from one - Sri Lanka's miraculous victory against India at Galle in 2015.
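The outcome breakdown itself is just a tally. Something like the sketch below would do it, assuming each match record carries the lead, the first innings leader and the winner (the field names are made up for illustration).

```python
def outcomes_with_big_lead(matches, threshold=150):
    """Results of tests with a first-innings lead above `threshold`, seen from
    the point of view of the side that led on first innings."""
    tally = {"leader won": 0, "draw": 0, "leader lost": 0}
    for m in matches:
        if abs(m["lead"]) <= threshold:
            continue
        if m["winner"] is None:            # drawn match
            tally["draw"] += 1
        elif m["winner"] == m["first_innings_leader"]:
            tally["leader won"] += 1
        else:
            tally["leader lost"] += 1
    return tally
```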


So around half of modern test matches really are basically over once each team has batted once - meaning any real doubt about the destination of the game is gone by the early to mid stages of day 3. You may wonder whether the qualifier "modern" was really necessary in that last sentence. Perhaps it was ever thus. People often like to tell you that the past was better, but people are often wrong.

I haven't attempted an analysis of first innings leads over the entire history of test cricket, which is what I would have liked to do. Unfortunately, I wasn't clever enough to find a time-efficient way of gathering the data on first innings leads (statsguru doesn't have a button for that). But to provide a bit of historical context for the data above, I homed in on the data for a five-year period in the late 90s, by way of comparison with the modern day. I chose this period for no better reason than that it was the time I first got into cricket and I feel nostalgic about it.

The data from 1995-1999 support the theory that tests were not quite so frequently decided early in those days, as you can see below.



Only around a third of games saw first innings leads above 150 runs, and nearly a quarter were within 50 runs on first innings (as an aside, games with sub-50-run leads are basically toss-ups, both in the 90s and now - dividing roughly evenly between the leading side, the trailing side and the draw).

In games which did feature a big difference on first innings, the distribution of outcomes was basically the same as it is now - so teams weren't necessarily better at responding to a large deficit, but large deficits weren't quite so frequent.

Test cricket is a wonderful sport, but I think it must be admitted that one of its weaknesses is that it very easily produces one-sided games which are over long before they are over. This has probably always been the case to some extent, but it would appear to have become exacerbated of late. Proposed explanations and solutions for this may vary. When a truly nail-biting test match comes along, uncertain to the end, treasure it for the precious jewel it is.




Sunday 4 June 2017

Approaching the milestone

During England's eventually successful chase of 306 to beat Bangladesh in the opening game of the Champions Trophy, Alex Hales was on his way to a hundred. He'd just started to take a fancy to the bowling of Sabbir Rahman and biffed a couple of boundaries to move to 94. Swinging for the fence once more, he was caught at deep midwicket.

Of course, this was a cause of exasperation for some English observers, but George Dobell - one of my favourite cricket writers - took a different view, tweeting:

Dobell is suggesting - and for what it's worth I broadly agree - that whatever you think of the selection and execution of the shot, Hales' attitude was admirable. Rather than play steadily through the 90s to try and guarantee himself the milestone and associated plaudits, he judged that it was better for the team if he carried on accelerating, and was willing to risk the personal achievement of notching another ODI hundred for the good of the team.

The tweet also seems to allude to a converse attitude among Hales' peers - that many of them do slow down as they approach 100, for the sake of trying to make sure of getting there. In today's blogpost I want to examine that idea amongst modern ODI batsmen.

Is it really common for ODI batsmen to noticeably slow down as they approach 100?

If so, how much do they slow down? How much innings momentum is lost to milestone hunting?

In an attempt to answer these questions I have had a look at the ball-by-ball data for all of the ODI centuries scored between the beginning of 2016 and the England v Bangladesh match the other day (so the data doesn't include Kane Williamson and Hashim Amla's centuries in the last two days).

This adds up to 108 centuries. I divided each of them into windows of 10 runs (0-9, 10-19, etc.) and asked how many balls each batsman spent with their score in each window. If batsmen are tending to slow down as they approach 100, we should see that they spend more balls in the 90s than in the 70s or 80s.
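In code terms the windowing is straightforward. Here's a rough Python sketch of the counting step, assuming you already have, for each century, the batsman's score at the moment each ball was faced (which is essentially what the ball-by-ball data gives you):

```python
from statistics import mean, median

def balls_per_window(scores_when_facing, window=10, max_score=100):
    """Count how many balls the batsman faced while his score sat in each
    10-run window: 0-9, 10-19, ..., 90-99."""
    counts = [0] * (max_score // window)
    for score in scores_when_facing:
        if score >= max_score:
            break
        counts[score // window] += 1
    return counts

def average_over_centuries(all_windows):
    """Mean and median balls per window, taken over every century in the sample."""
    n_windows = len(all_windows[0])
    means = [mean(c[i] for c in all_windows) for i in range(n_windows)]
    medians = [median(c[i] for c in all_windows) for i in range(n_windows)]
    return means, medians
```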

The graph below shows the average result for each run window, averaged over the 108 centuries in the sample. The red line is the mean, the blue line is the median.

One feature of the graph which I like - which is basically irrelevant to today's question but I'll mention it all the same - is that it gives a nice visualisation of the batsmen playing themselves in. The first 10 runs really are noticeably slow compared to the rest of the innings, taking an average of around 14 balls. Thereafter, the average ODI centurion stays close to the run-a-ball mark, with gentle acceleration over the course of the innings. The average number of balls taken to get through a 10-run window goes from 10.09 in the 20s down to 8.27 in the 80s.

The brakes do seem to go on just a little bit in the 90s however, with the average balls taken for those 10 runs ticking back up to 9.23. (The median goes up to 9 from 8).

So, the data is consistent with a weak slowing down as batsmen get near the milestone. But it's a very tiny effect. Indeed the size of the effect is comparable to the degree of statistical noise in the data, so I'm not even 100% sure it's real. But, if we take it at face value, batsmen are on average spending about 1 ball longer in the 90s than they are in the 80s, possibly influenced by the impending glory of an ODI hundred.

To look at the data a different way: 56% of the centurions were slower through the 90s than they were through the 80s. By way of comparison, only 38% were slower through the 80s than the 70s. Again this is consistent with the expected steady acceleration through the innings, which is very slightly waylaid by a nervous nineties slowdown. The distributions for the number of balls taken to get through the 80s and the 90s are plotted below as histograms. Comparing them you can see a slight rightward shift of the distribution as you go from the 80s to the 90s.
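That comparison drops straight out of the same per-window counts. Continuing the sketch above (so still assuming the output of balls_per_window for each of the 108 centuries):

```python
def fraction_slower(all_windows, later=9, earlier=8):
    """Fraction of centurions who took more balls in the `later` window than the
    `earlier` one; window 9 is the 90s, window 8 the 80s."""
    slower = sum(1 for c in all_windows if c[later] > c[earlier])
    return slower / len(all_windows)

# fraction_slower(all_windows, 9, 8) -> roughly 0.56 in this sample
# fraction_slower(all_windows, 8, 7) -> roughly 0.38
```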




An extra ball to score 10 runs is very small potatoes, so you probably don't need to be too worried that this is going to cost your team a match. Of course, for some individuals the effect may be stronger. Still, it's interesting to reflect that even the top pros may be affected by the arbitrary milestones made for them, even if just by a tiny bit.

Sunday 5 February 2017

Double the score at 30 overs

At some point during the display of pure swashbuckling, boundary-smashing batsmanship that was the recent India vs England ODI series, something caught my eye. Well, several things did, but only one directly inspired this blog post. It was someone on Twitter estimating what the batting team's final total would be, using the rule-of-thumb that it will be roughly double the score at 30 overs.

In my head, my reaction went something like this: "Really? People are still using the 'double the score at 30 overs' rule? Surely that's way out of date now, if ever it was true."

I then continued thinking: "I bet modern batsmen, with their skills honed to play aggressively, playing on flat pitches with short boundaries probably consistently beat that mark these days".

Well, the data shows my snide internal monologue was wrong, as you'll see below. To be honest, a moment's further thinking would have revealed my prediction to be hopelessly naive (not to mention cliché-ridden) - if teams are now scoring faster in the later overs, they're also scoring faster in the early overs. So, how it ends up working out for the "double the score at 30 overs" rule isn't immediately obvious.

In fact, in recent ODIs, if you estimate a team's total by doubling their 30 over score, you will be quite consistently too generous.

The graph below plots the ratio of the final total to the score at 30 overs achieved by teams batting first in (non-weather-affected) ODIs since the beginning of 2016, against the number of wickets fallen at 30 overs. Points above the green dashed line represent innings which beat the benchmark of twice the 30-over score; points below it fell short. The red line is the median ratio as a function of the number of wickets down at 30 overs, just to give a sense of how much it depends on how many batsmen are back in the pavilion.
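If you want to reproduce something like this, the per-innings calculation is just a ratio. A rough Python sketch, with made-up field names standing in for whatever your scraped data actually looks like:

```python
def doubling_check(innings_list):
    """For each completed first innings, compare the final total with twice the
    30-over score, and note how many fell short of that benchmark."""
    points = []
    short_of_double = 0
    for inn in innings_list:
        ratio = inn["final_total"] / inn["score_at_30"]
        points.append((inn["wickets_at_30"], ratio))
        if ratio < 2.0:
            short_of_double += 1
    return points, short_of_double / len(innings_list)   # ~0.74 in the 2016+ sample
```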


As you can see, there are many more points below the green dashed line than above it. To be precise, sides batting first fall short of that mark 74% of the time. Even sides which are only one or two wickets down at 30 overs fall short of doubling their score more often than not.

The "double the score" heuristic is still not too bad as a ballpark figure, which I guess is all its meant for- the average ratio between the 30 over score and the final score was 1.81. Nevertheless, it is a fairly consistent overestimate. If you want a rule-of-thumb which is still somewhat simple, but a bit less over-generous "double the score and subtract 10%", might be better.

One might then wonder how this plays out for teams batting second. Is a chasing side that's only halfway to the target at 30 overs likely to win?

The answer is: no, they are not. Chasing teams that are around halfway to their target at 30 overs usually lose.

The graph below plots, for each ODI chase since the beginning of 2016 (again, in non-weather-affected matches), the fraction of the target achieved at 30 overs against the number of wickets fallen at that point. Red points represent winning chases, blue points represent losing chases.
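The corresponding calculation for chases is equally simple - split them by whether the side had reached half the target at 30 overs and compare win rates. Again, the field names here are illustrative:

```python
def win_rate_by_halfway(chases):
    """Win rate for chasing sides, split by whether they had reached at least
    half the target at the 30-over mark."""
    groups = {"halfway or better": [0, 0], "behind halfway": [0, 0]}  # [wins, chases]
    for c in chases:
        key = "halfway or better" if 2 * c["score_at_30"] >= c["target"] else "behind halfway"
        groups[key][1] += 1
        if c["won"]:
            groups[key][0] += 1
    return {k: (wins / total if total else None) for k, (wins, total) in groups.items()}
```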



The dashed green line is the threshold of being halfway to the target at 30 overs. You'll notice that teams on the edge of this threshold rarely win, and that teams below it never do. (I say never - of course I just mean within the sample I studied - I'm sure there would be plenty of examples had I gone back further.)

The bottom line is that a chasing team which is only halfway there at 30 overs is very much livin' on a prayer.