Bradman head and shoulders above the rest with Geometric Mean nearly equal to Arithmetic Mean of next best

Boring Introduction

My Jun 2012 post Don Bradman & Geometric Mean was an attempt to look beyond traditional batting averages. The key points from that article:

  • Geometric mean G is computed by taking the product of all the numbers and finding the nth root of the result.
  • Geometric Mean tends to dampen the effect of a few very high values.
  • Geometric Mean applies to positive numbers only.

Consider two values, 5 and 125, plotted on a scale. Their Arithmetic Mean is 65, which is the same distance (60) from both points, while their Geometric Mean is 25, which stands in the same ratio (5) to both. An average value represents a central or typical value. In cricket, the batting average is the average number of runs scored per dismissal. A clear distinction is maintained between the number of innings played and the number of dismissals because at least one batsman remains not out at the end of every innings. In the earlier article, calculations for G were based purely on runs scored, with no distinction made between not outs and dismissals.
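
The 5 and 125 example can be checked in a few lines of Python. This is only an illustrative sketch; the variable names are mine and not part of the original post.

```python
import math

values = [5, 125]

# Arithmetic Mean: the point at equal distance (60) from both values.
arithmetic_mean = sum(values) / len(values)

# Geometric Mean: nth root of the product, the point at equal ratio (5).
geometric_mean = math.prod(values) ** (1 / len(values))

print(arithmetic_mean)           # 65.0
print(round(geometric_mean, 6))  # 25.0
```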

How un-original?

Batting Average ignores Not Outs altogether whereas RpI (Runs per Innings) treats even 0* as a completed innings, both unfair in my opinion. In Mar 2013, Anantha Narayanan proposed two measures, ExtAvge & RpFI, in his article the vexed question of ‘not outs’  to distinguish not out scores from dismissals. I have made some ornamental amendments to RpFI to compute revised Geometric Mean. But more about it later.

Let us continue with key ideas from previous article. Geometric Mean does not apply to negative or ZERO values. Batting scores are non-negative. I proposed that we can deal with zero scores by adding a unit value to all the data points, calculating G and subtracting that unit from the result as an acceptable compromise because practically everyone gets out for 0 over the course of an individual career.
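
A minimal sketch of that add-a-unit compromise, assuming plain Python and a made-up list of scores (the function name `gm_add_one` and the sample data are hypothetical, not taken from the database):

```python
import math

def gm_add_one(scores):
    """Add 1 to every score, take the Geometric Mean, then subtract the 1 again."""
    shifted = [s + 1 for s in scores]
    log_mean = sum(math.log(s) for s in shifted) / len(shifted)
    return math.exp(log_mean) - 1

sample = [0, 5, 125]  # invented scores, only to show that a zero no longer breaks G
print(round(gm_add_one(sample), 2))  # 8.11
```

Working with logarithms instead of a raw product avoids very large intermediate numbers over a long career.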

Grandiloquence

Frequency of each score from 0 to 400

I have now built a personal database with entire test match data and this hypothesis can be tested easily. I found that 790 out of 2746 test players never lost their wicket for a duck. 103 of those 790 have batted in 10 or more innings and 30 players have scored more than 500 runs.

Most Inns without a duck

Reggie Duff played 40 innings, the most by any player in this club, with two 100s and four 50s, scoring 1317 runs without a zero dismissal. Cheteshwar Pujara has played 32 innings until Test #2122 (played in Mar 2014) for 1650 runs, the most runs scored in this club. See below the list of more than 30 batsmen who played 20 innings or scored 500 runs without getting out for ZERO. The relatively long list is a deliberate choice. It highlights that while getting out without scoring is common, it is not unusual for players to score in each appearance. Besides, the list throws up some interesting names often unseen due to filters like 3000 or more runs etc.

First of several ho hum tables

Name Mat Inns NO Runs 100 50 LS Ave µ G
Cheteshwar Pujara (Ind) 19 32 4 1650 6 4 1 58.9 53.1 25.8
Dave Houghton (Zim) 22 36 2 1464 4 4 1 43.1 40.7 20.5
Herbie Collins (Aus) 19 31 1 1352 4 6 1 45.1 44.9 23.4
Reggie Duff (Aus) 22 40 3 1317 2 6 1 35.6 33.2 19.3
Jim Burke (Aus) 24 44 7 1280 3 5 0* 34.6 30.8 15.0
Brendan Nash (Win) 21 33 0 1103 2 8 1 33.4 33.4 15.6
Waqar Hasan (Pak) 21 35 1 1071 1 6 1* 31.5 31.5 19.5
Chris Rogers (Aus) 14 27 0 1030 4 5 1 38.1 38.1 18.4
Faf du Plessis (Saf) 14 22 3 996 3 4 1 52.4 46.5 29.6
Raman Subba Row (Eng) 13 22 1 984 3 4 2 46.9 46.1 30.8
Billy Zulch (Saf) 16 32 2 983 2 4 1 32.8 31.5 16.9
Brijesh Patel (Ind) 21 38 5 972 1 5 1 29.5 25.8 13.1
Robert Christiani (Win) 22 37 3 896 1 4 1 26.4 25.4 14.6
Jack Robertson (Eng) 11 21 2 881 2 6 2 46.4 43.8 30.6
Bernard Julien (Win) 24 34 6 866 2 3 1 30.9 27.2 13.5
Walter Read (Eng) 18 27 1 720 1 5 0* 27.7 27.7 15.0
Billy Bates (Eng) 15 26 2 656 0 5 1 27.3 25.2 14.8
Matthew Wade (Aus) 12 22 4 623 2 3 1 34.6 28.5 15.5
Upul Chandana (Slk) 16 24 1 616 0 2 1* 26.8 26.7 21.0
Tommy Andrews (Aus) 16 23 1 592 0 4 1 26.9 25.7 12.5
Maurice Foster (Win) 14 24 5 580 1 1 3 30.5 24.9 14.8
Geoff Rabone (Nzl) 12 20 2 562 1 2 1 31.2 29.3 21.1
Kumar Duleepsinhji (Eng) 12 19 2 995 3 5 1 58.5 53.4 35.2
Cyril Walters (Eng) 11 18 3 784 1 7 1 52.3 48.0 35.4
Mominul Haque (Bng) 7 13 3 755 3 3 8 75.5 60.7 46.5
Rusi Modi (Ind) 10 17 1 736 1 6 1 46.0 43.3 25.3
Stewie Dempster (Nzl) 10 15 4 723 2 5 8 65.7 50.5 35.9
David Lloyd (Eng) 9 15 2 552 1 0 4 42.5 38.1 22.0
Barry Richards (Saf) 4 7 0 508 2 2 29 72.6 72.6 60.6
Brad Hodge (Aus) 6 11 2 503 1 2 6 55.9 46.2 28.5

Out of 65107 dismissals, 8040 i.e. 12.35% or 1 in every 8 dismissals is for the lowest score possible. But my assumption that 90% or more batsmen would fall in this category was off target. More than 1 in 4 players (790 of 2746) never got out for a ZERO score. This indicates that a suitable alternative to deal with 0 values is necessary.

Oh my goodness – a formula

7 inns by Barry Richards are within 3 scoring bands

Instead of focussing solely on cricket scores, let us try to find a way to calculate G for any data range. This range may include both negative and decimal values. Adding a unit value will not be a viable solution in those cases. But it is possible to calculate 3 different Geometric Means, one for each category. Let us assume that the N values in the data can be divided into N1 positive values, N2 zero values and N3 negative values. We can calculate G1 for the N1 positive values by taking the product of all those numbers and finding its N1-th root. G2 for all zero values will no doubt be zero. G3 for the N3 negative values can be calculated using absolute values only and multiplying the result by -1. Then we can determine G for the entire data by suitably applying the weights:

G = (G1 * N1/N) + (G2 * N2/N) + (G3 * N3/N)
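
The formula can be sketched in Python as below. The helper names are my own, and the example uses Vijay Merchant's 18 scores (quoted later in this article), all of which were dismissals, so no not out adjustment is involved; the second data set is invented purely to exercise the negative branch.

```python
import math

def _gm(values):
    """Plain Geometric Mean of strictly positive numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def weighted_gm(data):
    """G = G1*N1/N + G2*N2/N + G3*N3/N for positive, zero and negative values."""
    positives = [x for x in data if x > 0]
    negatives = [-x for x in data if x < 0]  # absolute values of the negatives
    g1 = _gm(positives)
    g3 = -_gm(negatives)
    # G2 is always 0, so the zero group only enlarges the divisor N.
    return (g1 * len(positives) + g3 * len(negatives)) / len(data)

merchant = [23, 30, 54, 17, 26, 28, 35, 0, 33, 114, 52, 48, 12, 27, 78, 0, 128, 154]
print(round(weighted_gm(merchant), 1))             # 36.8
print(round(weighted_gm([5, 125, 0, -4, -9]), 2))  # 7.6
```

For Merchant the Geometric Mean of the 16 non-zero scores is scaled by 16/18, which reproduces the G of 36.8 shown in the tables.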

David Lloyd scored a double hundred but not a duck

For cricket scores we need to determine only G1 & G2, where we know that G2 is always going to be 0. Effectively we calculate GM for all positive scores and scale the value down in proportion to the share of zero values. Players like David Lloyd, Barry Richards, Rusi Modi & Faf du Plessis from the above list scored in each innings. So our derived G for these players will be identical to the traditional Geometric Mean. The method proposed in the earlier article adds a unit value to all these non-zero scores even though it is possible to calculate the Geometric Mean directly. Its derived G does not match the conventional Geometric Mean, hence that method is discarded despite its merits.

Mr. Cricket

Keeping aside the treatment for not out scores for a moment, the Geometric Mean of Bradman’s 73 non-zero scores is 53.5, which is scaled down to a G of 48.82 to account for his 7 zero scores in 80 innings (about 9%). Compare this value with 38.59 calculated using the add unit value method. Is 39 a better representative value than 49? If Bradman had scored a solitary run instead of each of his seven zero scores, the Geometric Mean would have fallen to 37.77. This thought experiment shows that the revised method ‘rewards’ utter failure as long as the remaining scores are very high.
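
Both figures can be reproduced from the quoted numbers alone. The sketch below assumes the rounded Geometric Mean of 53.5, so the results are approximate; the variable names are mine.

```python
gm_nonzero = 53.5  # Geometric Mean of the 73 non-zero scores, as quoted above
innings = 80
nonzero = 73

# Revised method: scale the GM of non-zero scores by their share of all innings.
g_scaled = gm_nonzero * nonzero / innings

# Thought experiment: every zero becomes a score of 1. Multiplying by 1 leaves
# the product unchanged, but the root taken is now the 80th instead of the 73rd.
g_with_ones = gm_nonzero ** (nonzero / innings)

print(round(g_scaled, 2))     # 48.82
print(round(g_with_ones, 2))  # 37.77
```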

But it is better to retain integrity of data and not alter one or more actual observations. New method for calculating G does not change actual scores. Hence it is the preferred solution. A number of lists will be presented in this article after applying the revised formula uniformly to all players.

We can get to those values only after we identify a uniform treatment to deal with the other issue – not outs. In earlier version every score was treated like a dismissal. Let us begin with the boundary condition of an unbeaten zero (0*). This is the worst case scenario if not-outs are ignored completely. Nothing gets added to the numerator even though player was not dismissed but the denominator is incremented anyhow. A 0* can be treated fairly by specifically adding a zero value to the denominator as well. This means that average will remain unchanged. So we are still calculating RpI but it is adjusted for a specific condition.

No double hundreds yet the best ever barring Bradman!

What about 1*? Learning from 0* we can add a tiny fraction to the denominator instead of the maximum value of 1. How do we calculate this fraction represented as n/d? No doubt the numerator n has to be runs scored. We could use d=400, the most runs scored in an innings ever, but most of the scores, both dismissals and not outs, are for very low values. Choice of 400 is unfair as the n/d fraction will be too low. We could also use the GM of all dismissed scores for all players, or the GM of the player’s country. We can also divide the playing history into different eras and calculate a mean value for each period. Instead I find that the average of all dismissed scores for the concerned player is the most suitable value for this denominator d. This value is 49.25 for Bradman (a Geometric Mean, not to be confused with the arithmetic average of 83.83 for his dismissed innings). He has scores of 30* & 37* under this value. These two innings are treated as 61% and 75% fulfilled. The remaining not out scores of 56*, 57*, 102*, 103*, 127*, 144*, 173*, 299* are all treated as completed innings. Bradman’s G increases to 50.11 from 48.82. He is the only player with a G over 50 amongst those with 10 or more innings. Surely those double and triple hundreds helped despite a string of low scores.

Barry Richards has a G of 60.6, which is 20% higher than Bradman, but he played only 7 innings in 4 tests scoring 29, 32, 140, 65, 35, 81 & 126, all non-zero dismissals. His lowest score of 29 is exceptional and is the highest amongst those who played 5 innings or more. Gary Stead (16), Frank Duce (15), Kaushal Lokuarachchi (15), Brian Bolus (14), Lee Irvine (13), Dilawar Hussain (13), John Nicolson (13), Don Taylor (11) are the others with a double digit lowest score.
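
The n/d fraction and Barry Richards's figure can be checked with a short sketch. The function name `fulfilled_fraction` is my own label, and the code only covers the two facts quoted above; it does not attempt to reproduce Bradman's final 50.11, since his full list of scores is not given here.

```python
import math

def fulfilled_fraction(score, d):
    """Share of an innings credited to a not out score n, i.e. n/d capped at 1."""
    return min(1.0, score / d)

d = 49.25  # Bradman's denominator as quoted above
not_outs = [30, 37, 56, 57, 102, 103, 127, 144, 173, 299]
print([round(100 * fulfilled_fraction(s, d)) for s in not_outs])
# [61, 75, 100, 100, 100, 100, 100, 100, 100, 100]

# Richards was dismissed for a non-zero score every time, so his G is the
# plain Geometric Mean of his seven scores.
richards = [29, 32, 140, 65, 35, 81, 126]
g_richards = math.exp(sum(math.log(s) for s in richards) / len(richards))
print(round(g_richards, 1))  # 60.6
```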

Solving a problem that does not exist

In cricket the batting average is the average number of runs scored per dismissal. This means using 30*, 37*, 56*, 57*, 102*, 103*, 127*, 144*, 173*, 299* in the computation of Bradman’s G but using only 70 (not 80) as the divisor. That is the conventional method of dealing with not-outs which is not used here. Please read the comments section of the vexed question of ‘not outs’ , the article mentioned at the very beginning, for a variety of opinions supporting the traditional method.

This man scored 10 hundreds in 40 innings

I do not believe that a not out score is a failed chance to increase personal tally. A run is scored for the team and it is deemed complete only when both not out players have crossed and made good their ground from end to end. The rules of cricket are very clear that once a side is bowled out, the lone un-dismissed batsman must return to the pavilion. Sometimes two players remain unbeaten when the captain declares the innings. This is a case of effectively forfeiting all further scoring opportunities when a team decides to pursue victory within available time. Another instance when two players remain not out occurs when a match is won with wickets in hand. Here too the victorious team is not allowed to bat until the end of scheduled time to score additional runs. Players also remain unbeaten when a match remains unfinished due to bad weather. In each scenario speculation about how many more runs could be added is irrelevant. The score posted by the not out batsman is the only value that should be used to calculate the mean value hence measures like ExtAvge were not considered.

He was as prolific as Sutcliffe. Not equally consistent though.

Yet a distinction between a completed innings and an unbeaten innings should be made. We have looked at the 0* situation already. It is a fact that most of the unbeaten scores are very low. The mode (or the most common score) for the nearly 10K not outs is 0. This occurs in more than 13% of cases or roughly 1 in every 8. The Median value is 12 which means 50% of scores are below this value. Q3 (the third quartile) is 36 i.e. 75% of scores are below this value. Nearly 81% of scores are 50 or less. 90% of scores are below 78 and 99% of scores are below 189. The highest score is 400*. A tall score, a rare occasion, rightfully boosts the player average. This method avoids further boosting that value. All such high scores are treated as fulfilled. On the other hand the 1 in 8 scores of 0* do not decrease the player average. Proportional values are used for the in between scores. More about this a little later.

Halves and Quarters

Median is the middle value that separates the higher half from the lower half of the data set and is denoted by Q2 in this post. In general Q2 score will be less than the batting average because a player inflates this value with the help of a few very high scores and some not outs. Here is a list of players where that statement does not hold true. 129 players fulfil this criterion with top 15 listed below:

Name Mat Inns NO Runs 100 50 Q2 HS Ave µ G
David Steele (Eng) 8 16 0 673 1 5 43 106 42.1 42.1 29.2
Giff Vivian (Nzl) 7 10 0 421 1 5 50.5 100 42.1 42.1 25.8
Michael Carberry (Eng) 6 12 0 345 0 1 32.5 60 28.8 28.8 26.1
Herby Wade (Saf) 10 18 2 327 0 0 20.5 40* 20.4 18.2 14.1
Fred Susskind (Saf) 5 8 0 268 0 4 37 65 33.5 33.5 18.8
Dilawar Hussain (Ind) 3 6 0 254 0 3 45 59 42.3 42.3 37.9
James Marshall (Nzl) 7 11 0 218 0 1 24 52 19.8 19.8 14.4
Bert Vance (Nzl) 4 7 0 207 0 1 31 68 29.6 29.6 18.3
Harry Moses (Aus) 6 10 0 198 0 0 23.5 33 19.8 19.8 13.7
Aftab Gul (Pak) 6 8 0 182 0 0 27.5 33 22.8 22.8 19.8
Leonard Moon (Eng) 4 8 0 182 0 0 29 36 22.8 22.8 20.2
Frank Hearne (Eng/Saf) 6 10 0 168 0 0 21.5 30 16.8 16.8 15.1
Pananmal Punjabi (Ind) 5 10 0 164 0 0 17 33 16.4 16.4 11.2
Charles Studd (Eng) 5 9 1 160 0 0 21 48 20.0 20.0 18.6
Andy Pycroft (Zim) 3 5 0 152 0 1 39 60 30.4 30.4 16.1
This Kumar is second behind Don - half the time?!

Notice the absence or a very low count of not outs and the highest score of 106. Everyone here bows to the tyranny of low scores. No one could raise his batting average above Q2 by scoring a few big hundreds or adding substantial runs without getting dismissed. The highest dismissed score by Herby Wade is 39 and his two not out scores are 32* & 40*, both of which will be treated as fulfilled while calculating G. The sole unfinished innings by Charles Studd is a 0*, so he gains no advantage from the non-dismissal. The above exception list confirms the hypothesis that batting average is higher than Q2 because a player inflates the average with the help of a few very high scores and some not outs.

A bid to discredit Imran, Steve Waugh, Sobers …. And Bradman

Let us identify players who will ‘suffer’ the most by this adjustment made for not-outs. Next a list of players with more than 3000 runs and highest frequency of Not Outs:

Name Mat Inns NO Runs 100 50 Q2 HS Ave µ G
Shaun Pollock (Saf) 108 156 39 3781 2 16 17.5 111 32.3 25.3 15.7
Chaminda Vaas (Slk) 111 162 35 3089 1 13 12 100* 24.3 19.9 12.0
Imran Khan (Pak) 88 126 25 3807 6 18 23 136 37.7 31.4 18.6
Hashan Tillakaratne (Slk) 83 131 25 4545 11 20 20 204* 42.9 36.3 21.5
Jimmy Adams (Win) 54 90 17 3012 6 14 18 208* 41.3 34.4 18.6
Steve Waugh (Aus) 168 260 46 10927 32 50 25.5 200 51.1 43.5 24.7
Misbah-ul-Haq (Pak) 46 80 14 3218 5 25 32 161* 48.8 41.6 26.2
Shiv Chanderpaul (Win) 153 261 45 11219 29 62 31 203* 51.9 43.7 26.0
Matt Prior (Eng) 75 116 20 3920 7 27 21 131* 40.8 35.7 21.2
Andy Flower (Zim) 63 112 19 4794 12 27 29 232* 51.5 44.2 26.3
Allan Border (Aus) 156 265 44 11174 27 63 31 205 50.6 43.1 25.7
Graham Thorpe (Eng) 100 179 28 6744 16 39 23 200* 44.7 39.0 23.4
Ashwell Prince (Saf) 66 104 16 3665 11 11 21 162* 41.6 36.0 18.8
Thilan Samaraweera (Slk) 81 132 20 5462 14 30 23.5 231 48.8 43.1 24.3
VVS Laxman (Ind) 134 225 34 8781 17 56 27 281 46.0 40.3 23.8
Adam Gilchrist (Aus) 96 137 20 5570 17 26 26 204* 47.6 42.1 22.5
Keith Fletcher (Eng) 59 96 14 3272 7 19 18 216 39.9 35.1 18.2
Jacques Kallis (Saf/ICC) 166 280 40 13289 45 58 31.5 224 55.4 48.5 27.9
Saleem Malik (Pak) 103 154 22 5768 15 29 25 237 43.7 38.7 24.0
Richard Hadlee (Nzl) 86 134 19 3124 2 15 15 151* 27.2 24.2 14.7
Robin Smith (Eng) 62 112 15 4236 9 28 25 175 43.7 39.5 24.4
Daniel Vettori (Nzl/ICC) 112 173 23 4516 6 23 15 140 30.1 27.6 15.5
Garry Sobers (Win) 93 160 21 8032 26 30 33.5 365* 57.8 51.3 31.4
Ian Bell (Eng) 98 170 22 6722 20 39 24.5 235 45.4 40.4 21.4
Damien Martyn (Aus) 67 109 14 4406 13 23 30 165 46.4 41.9 24.8
Ian Healy (Aus) 119 182 23 4356 4 22 14 161* 27.4 24.5 14.6
Greg Chappell (Aus) 87 151 19 7110 24 31 31 247* 53.9 49.2 29.5
Don Bradman (Aus) 52 80 10 6996 29 13 56.5 334 99.9 89.5 50.1
Desmond Haynes (Win) 116 202 25 7487 18 39 24 184 42.3 39.0 23.4
Dean Jones (Aus) 52 89 11 3631 11 14 24 216 46.6 41.1 23.1
The other Kumar is perhaps the best modern batsman

That was another deliberate attempt to tabulate too many names. This list includes the very best, like Bradman, who were difficult to dislodge. Then we have members of a strong squad who may have batted second time only to overhaul a small target. It also includes those who played the role of last specialist batsman in a team. And the rest are bowling all-rounders.

Trivia

Don Bradman (50.1), Garry Sobers (31.4), Greg Chappell (29.5) & Jacques Kallis (27.9) from the above list are frequently mentioned. So a little detour to peruse a list of players who will fail to make any batting list simply because they never came out to bat.

Name Mat Balls Runs Wkts Cts Sts
Jack MacBryan (Eng) 1
Fred Root (Eng) 3 642 194 8 1
Johnnie Clay (Eng) 1 192 75 1
Hopper Read (Eng) 1 270 200 6
Colin Snedden (Nzl) 1 96 46
Lance Pierre (Win) 1 42 28
Bal Dani (Ind) 1 60 19 1 1
Vijay Rajindernath (Ind) 1 4
Narain Swamy (Ind) 1 108 45
Jack Wilson (Aus) 1 216 64 1
Peter Allan (Aus) 1 192 83 2
Tony Howard (Win) 1 372 140 2
Farrukh Zaman (Pak) 1 80 15
Rakesh Shukla (Ind) 1 294 152 2
Shahid Mahboob (Pak) 1 294 131 2
Ali Hussain Rizvi (Pak) 1 111 72 2
Charl Willoughby (Saf) 2 300 125 1
Dan Cullen (Aus) 1 84 54 1
Amjad Khan (Eng) 1 174 122 1

Jack MacBryan appears in eleven quirky debuts for his solitary appearance in Eng-Saf Jul 1924 test which was washed out after Saf scored 116/4. Vijay Rajindernath kept wickets for India in Nov 1952 test against Pak where he made four stumpings without taking a catch. Rest are bowlers who were not required to bat. These are the only players who are not assigned any G because it is not applicable.

The oldest player to score a test hundred was as good as his partner Herbert Sutcliffe

Batsman not dismissed – so what?

Batting average is infinite (or undefined) for 62 players who were never dismissed. A total of 505 runs were scored between them. The eight players who scored 20 or more runs are listed below; each is assigned a G nevertheless:

Name Mat Inns NO Runs 100 50 Q2 HS Ave µ G
Afaq Hussain (Pak) 2 4 4 66 0 0 11.5 35* - 22.9 25.5
Stuart Law (Aus) 1 1 1 54 0 1 54 54* - 54.0 54.0
Clive Halse (Saf) 3 3 3 30 0 0 10 19* - 14.3 11.2
Fred Freer (Aus) 1 1 1 28 0 0 28 28* - 28.0 28.0
Ian Callen (Aus) 1 2 2 26 0 0 13 22* - 19.9 23.1
Len Johnson (Aus) 1 1 1 25 0 0 25 25* - 25.0 25.0
Audley Miller (Eng) 1 2 2 24 0 0 12 20* - 18.0 20.7
Bill Bell (Nzl) 2 3 3 21 0 0 0 21* - 21.0 21.0

Afaq Hussain batted in all 4 innings of his 2 tests scoring 10*, 35*, 8* & 13*. Stuart Law is well known for his non-test achievements. He also appears in One-Test wonder list for his sole innings of 54* but others may be less known.

Shouldn’t all those who achieved the rare feat of playing a test match be celebrated? I have listed every single one of them in a single table here.

Getting anywhere?

 "I've only had nine balls all week!"

The top players and their top performances are remembered because they overcome the tyranny of low scores more often than others. One last table celebrating the sub-3000 club. Some of these players have scored less than 1000 runs but managed to maintain G north of 30:

Name Mat Inns NO Runs 100 50 0 Q1 Q2 Q3 HS Ave µ G
Sidney Barnes (Aus) 13 19 2 1072 3 5 1 30.5 41.0 63.0 234 63.1 59.5 45.0
Vijay Merchant (Ind) 10 18 0 859 3 3 2 23.8 31.5 53.5 154 47.7 47.7 36.8
Stewie Dempster (Nzl) 10 15 4 723 2 5 0 13.0 27.0 72.0 136 65.7 50.5 35.9
Cyril Walters (Eng) 11 18 3 784 1 7 0 20.0 46.0 57.3 102 52.3 48.0 35.4
Kumar Duleepsinhji (Eng) 12 19 2 995 3 5 0 29.0 48.0 59.5 173 58.5 53.4 35.2
Alan Melville (Saf) 11 19 2 894 4 3 2 9.0 24.0 72.5 189 52.6 49.4 33.4
Ernest Tyldesley (Eng) 14 20 2 990 3 6 2 13.3 40.5 78.8 122 55.0 49.5 33.2
Eddie Paynter (Eng) 20 31 5 1540 4 7 3 9.0 33.0 74.5 243 59.2 54.2 31.6
Colin Bland (Saf) 21 39 5 1669 3 9 2 22.0 33.0 53.0 144* 49.1 44.1 31.6
Raman Subba Row (Eng) 13 22 1 984 3 4 0 13.8 31.0 58.3 137 46.9 46.1 30.8
Jack Robertson (Eng) 11 21 2 881 2 6 0 21.0 28.0 56.0 133 46.4 43.8 30.6
Charlie Davis (Win) 15 29 5 1301 4 4 1 19.0 29.0 71.0 183 54.2 46.1 30.3
Sidney, the batsman,  from Australia - not Sydney the bowler from England.

Batting average for Vijay Merchant is equal to his RpI value because he was dismissed in each innings. That 47.7 appears less than impressive compared to the batting average of many recent players with a 50+ value. For a sub-50 batting average, note his remarkable G of 36.8 which he achieved through scores of 23, 30, 54, 17, 26, 28, 35, 0, 33, 114, 52, 48, 12, 27, 78, 0, 128 & 154 in his 18 outings.

Let us rearrange these in blocks of 4 & 5:
0, 0, 12, 17, 23,
26, 27, 28, 30,
33, 35, 48, 52,
54, 78, 114, 128, 154.
Q2 or median marks the middle of the data. His Q2 of 31.5 is between 30 & 33. We can continue to find the middle of the top and bottom halves. A quarter of values are below 23.75, denoted by Q1, and three quarters of values are below 53.5 (Q3), which falls between 52 & 54. He lost his wicket for no score twice but there are no other single digit dismissals. His ability to reach a double digit score may appear normal but it is in fact highly uncommon.
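
These quartiles can be reproduced with Python's standard library. I am assuming linear interpolation between the sorted scores (the "inclusive" method of `statistics.quantiles`), which happens to match the three values quoted for Merchant; the post does not say which quartile convention was actually used.

```python
import statistics

merchant = [23, 30, 54, 17, 26, 28, 35, 0, 33, 114, 52, 48, 12, 27, 78, 0, 128, 154]

# n=4 asks for the three cut points that split the sorted scores into quarters.
q1, q2, q3 = statistics.quantiles(merchant, n=4, method="inclusive")
print(q1, q2, q3)  # 23.75 31.5 53.5
```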

A quarter of all batting efforts fetch 4 runs or less. Q1 for New Zealand, Zimbabwe and Bangladesh is even lower at 3. 11 players have scored more than 10,000 runs. 5 of those 11 score less than 10 in a quarter of their innings. Only Sidney Barnes & Kumar Duleepsinhji (both feature in the above table) register a significant Q1 of 30.5 & 29 respectively (amongst those who have played 15 or more innings). Sutcliffe is 7th, Hobbs is 9th and Bradman is 12th on this list, each with 80 or more innings.

The original Wicketkeeper-Batsman?

It is common to represent a large data set by a single value. Using 3, 4 or even 5 values will improve the representation. The first of our selected 5 values is Q1. A Q1 of 8 i.e. only a quarter of innings below 8 runs represents a consistent start.

Merchant has a weak second quarter and hence a low Q2 of 31.5 despite a healthy Q1. Duleepsinhji scored 48 or more runs in half his innings. Bradman catches up well in this quarter with an impressive 56.5, placing Kumar 2nd in this elite table.

In the above table only Ernest Tyldesley & Eddie Paynter continue building their innings after a decent start with very good Q3 values of 78.75 & 74.5 which means 1 in 4 of their innings are above these values. HS or the highest score is obviously the topmost value in each set.

Cyril Walters scores 35.4 on G despite the highest value of 102 which is the sign of an extremely consistent player. His scores in 4 blocks:

1, 2, 2*, 14*, 17,

29, 44, 45, 46,

46, 50*, 51, 52,

59, 64, 78, 82 & 102

Merchant (36.8) scored 859 runs in 18 innings. Walters (35.4) scored 784 runs in the same number of innings but with the aid of 3 not outs. Hence batting average of Merchant (47.7) is lower than Walters (52.3) despite scoring 75 more runs. But both have a very healthy and comparable G which treats the not out situations relatively fairly.

He scored 7 double hundreds, 1 triple and the only quadruple. He also scored 17 ducks.

Q1, Q2, Q3 & G are four of the five indicators discussed so far. The last value is µ (pronounced as mu). It is commonly used as the symbol for co-efficient of friction. It is also a symbol for the mean in normal distribution. Actually distribution for batting scores is skewed. Yet I have chosen this symbol as my preferred central batting score to clearly distinguish it from batting average. Both G & µ will be between Q1 & Q3 where G is the (lower) damped Geometric Mean whereas (higher) µ is the Arithmetic Mean.

Back to square

Time to revisit RpFI, proposed by Anantha Narayanan

Average and extended average v Runs per fulfilled Innings and Runs per Innings

It is best explained by this quote and above image from his article the vexed question of ‘not outs’ :

A cut-off point at 50% of the “Average for dismissed innings”. Here are couple of examples. Don Bradman’s average for dismissed innings is 83.83 and any not out innings below 42 will be considered as a “real not out”. Ken Barrington’s average for dismissed innings is 50.37 and any not out innings below 25 will be considered as a “real not out”. Any other not out innings would be considered as a fulfilled innings.

This method does not add Notional runs and addresses some of those not-outs. µ is nothing but RpFI with minor improvements. Every not out is represented by a value between ZERO & ONE. 0* indicates the lowest value of ZERO while all not out innings above a player’s average for dismissed innings will equal ONE. This means that 6 of Bradman’s 10 not out scores – 102*, 103*, 127*, 144*, 173*, 299* – are treated as fulfilled. Remaining not out scores of 30*, 37*, 56* & 57* are deemed to be 36%, 44%, 67% & 68% complete. In the above image, yellow shapes are ‘accounted’. RpFI is not fully yellow but µ is.
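
As a check on those percentages, here is a small sketch that rebuilds Bradman's µ from his career totals and the ten not out scores listed above. The variable names are mine; the method is simply runs divided by fulfilled innings, as described in this section.

```python
runs = 6996
innings = 80
not_outs = [30, 37, 56, 57, 102, 103, 127, 144, 173, 299]

dismissals = innings - len(not_outs)                 # 70
dismissed_avg = (runs - sum(not_outs)) / dismissals  # average for dismissed innings

# Each not out counts as score / dismissed_avg of an innings, capped at ONE.
fractions = [min(1.0, s / dismissed_avg) for s in not_outs]
fulfilled_innings = dismissals + sum(fractions)
mu = runs / fulfilled_innings

print(round(dismissed_avg, 2))                  # 83.83
print([round(100 * f) for f in fractions[:4]])  # [36, 44, 67, 68]
print(round(mu, 1))                             # 89.5
```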

Batting average, RpI and µ for Vijay Merchant will be identical because he was dismissed in all his 18 innings. Walters also played 18 innings but remained unbeaten thrice scoring 75 fewer runs. 8 top scores by Walters are 7 fifties and 1 hundred. Corresponding 8 for Merchant are 35, 48, and 3 fifties with 3 hundreds. Merchant has better scores in first quarter and last quarter. Walters is better in second and third.

Name Mat Inns NO Runs 100 50 0 Q1 Q2 Q3 HS Ave µ G
Vijay Merchant (Ind) 10 18 0 859 3 3 2 23.8 31.5 53.5 154 47.7 47.7 36.8
Cyril Walters (Eng) 11 18 3 784 1 7 0 20.0 46.0 57.3 102 52.3 48.0 35.4

Comparable values of G & µ illustrate how 75 extra runs are nearly nullified by 3 non-dismissals.

Any list that includes Hayden is…

Sidney Barnes, Vijay Merchant, Stewie Dempster, Cyril Walters, Kumar Duleepsinhji, Alan Melville, Ernest Tyldesley, Eddie Paynter, Colin Bland, Raman Subba Row, Jack Robertson & Charlie Davis batted less than 40 times. Each one was very consistent but their careers were shorter and hence those names need not be familiar. But those who played over 70 innings for a 30+ value of G can be recognised by their last name alone – Bradman, Sutcliffe, Hobbs, Barrington, Walcott, Sobers, Hutton, Hayden & Weekes. These and a few more in the top 20 of G list having played 70 innings or more are:

Name Mat Inns NO Runs 100 50 0 Q1 Q2 Q3 HS Ave µ G
Don Bradman (Aus) 52 80 10 6996 29 13 7 17.5 56.5 133.5 334 99.9 89.5 50.1
Herbert Sutcliffe (Eng) 54 84 9 4555 16 23 2 20.8 38.0 77.5 194 60.7 56.2 38.1
Jack Hobbs (Eng) 61 102 7 5410 15 28 4 19.0 40.0 74.0 211 56.9 54.6 37.1
Ken Barrington (Eng) 82 131 15 6806 20 35 5 13.5 46.0 80.0 256 58.7 53.6 31.5
Clyde Walcott (Win) 44 74 7 3798 15 14 1 13.3 34.5 80.5 220 56.7 53.2 31.4
Garry Sobers (Win) 93 160 21 8032 26 30 12 13.0 33.5 66.3 365* 57.8 51.3 31.4
Len Hutton (Eng) 79 138 15 6971 19 33 5 12.0 32.0 72.8 364 56.7 51.9 30.8
Matthew Hayden (Aus) 103 184 14 8625 30 29 14 12.0 31.0 67.3 380 50.7 48.6 30.6
Everton Weekes (Win) 48 81 5 4455 15 19 6 13.0 36.0 86.0 207 58.6 55.7 30.4
Rohan Kanhai (Win) 79 137 6 6227 15 28 7 14.0 32.0 62.0 256 47.5 46.6 29.9
Wally Hammond (Eng) 85 140 16 7249 22 24 4 14.8 32.0 63.0 336* 58.5 53.1 29.9
Kumar Sangakkara (Slk) 122 209 17 11151 35 45 9 11.0 32.0 70.0 319 58.1 54.7 29.7
Greg Chappell (Aus) 87 151 19 7110 24 31 12 8.0 31.0 63.5 247* 53.9 49.2 29.5
Mohammad Yousuf (Pak) 90 156 12 7530 24 33 11 11.0 28.5 69.0 223 52.3 49.0 28.8
Michael Hussey (Aus) 79 137 16 6235 19 29 12 12.0 31.0 67.0 195 51.5 46.4 28.3
Jacques Kallis (Saf/ICC) 166 280 40 13289 45 58 16 10.0 31.5 68.3 224 55.4 48.5 27.9
Denis Compton (Eng) 78 131 15 5807 17 28 10 11.0 28.0 64.5 278 50.1 46.2 27.9
Alvin Kallicharran (Win) 66 109 10 4399 12 21 10 9.0 26.0 57.0 187 44.4 41.7 27.7
Ted Dexter (Eng) 62 102 8 4502 9 27 6 11.0 29.0 62.0 205 47.9 45.1 27.5
Sachin Tendulkar (Ind) 200 329 33 15921 51 68 14 10.0 32.0 73.0 248* 53.8 49.5 27.3

A higher value of G indicates better consistency in getting starts. A higher value of µ illustrates ability to significantly consolidate those starts. Let us see the changes to above table for the Top 20 sorted by µ:

Name Mat Inns NO Runs 100 50 0 Q1 Q2 Q3 HS Ave µ G
Don Bradman (Aus) 52 80 10 6996 29 13 7 17.5 56.5 133.5 334 99.9 89.5 50.1
Herbert Sutcliffe (Eng) 54 84 9 4555 16 23 2 20.8 38.0 77.5 194 60.7 56.2 38.1
Everton Weekes (Win) 48 81 5 4455 15 19 6 13.0 36.0 86.0 207 58.6 55.7 30.4
Kumar Sangakkara (Slk) 122 209 17 11151 35 45 9 11.0 32.0 70.0 319 58.1 54.7 29.7
Jack Hobbs (Eng) 61 102 7 5410 15 28 4 19.0 40.0 74.0 211 56.9 54.6 37.1
Ken Barrington (Eng) 82 131 15 6806 20 35 5 13.5 46.0 80.0 256 58.7 53.6 31.5
Clyde Walcott (Win) 44 74 7 3798 15 14 1 13.3 34.5 80.5 220 56.7 53.2 31.4
Wally Hammond (Eng) 85 140 16 7249 22 24 4 14.8 32.0 63.0 336* 58.5 53.1 29.9
Len Hutton (Eng) 79 138 15 6971 19 33 5 12.0 32.0 72.8 364 56.7 51.9 30.8
Brian Lara (Win/ICC) 131 232 6 11953 34 48 17 8.0 33.5 72.3 400* 52.9 51.9 26.3
Garry Sobers (Win) 93 160 21 8032 26 30 12 13.0 33.5 66.3 365* 57.8 51.3 31.4
Sachin Tendulkar (Ind) 200 329 33 15921 51 68 14 10.0 32.0 73.0 248* 53.8 49.5 27.3
Greg Chappell (Aus) 87 151 19 7110 24 31 12 8.0 31.0 63.5 247* 53.9 49.2 29.5
Mohammad Yousuf (Pak) 90 156 12 7530 24 33 11 11.0 28.5 69.0 223 52.3 49.0 28.8
Matthew Hayden (Aus) 103 184 14 8625 30 29 14 12.0 31.0 67.3 380 50.7 48.6 30.6
Jacques Kallis (Saf/ICC) 166 280 40 13289 45 58 16 10.0 31.5 68.3 224 55.4 48.5 27.9
Virender Sehwag (Ind/ICC) 104 180 6 8586 23 32 16 9.8 31.0 60.3 319 49.3 48.4 27.0
Sunil Gavaskar (Ind) 125 214 16 10122 34 45 12 8.0 29.0 67.8 236* 51.1 48.2 25.4
Mahela Jayawardene (Slk) 143 240 16 11319 33 46 14 9.8 30.0 61.3 374 50.5 48.2 25.1
Younis Khan (Pak) 89 158 14 7399 23 28 15 8.0 30.5 66.5 313 51.4 48.1 25.9

Rohan Kanhai, Michael Hussey, Denis Compton, Alvin Kallicharran & Ted Dexter move out to make place for Brian Lara, Virender Sehwag, Sunil Gavaskar, Mahela Jayawardene & Younis Khan.

Huh, another ranking exercise

Who is the second best after Bradman? Both tables point at Herbert Sutcliffe as the clear second. After that we have a number of very good players. Each one of us can have an opinion based on arguments about quality of opposition, era, longevity, ability to score in difficult conditions, ability to not get into difficult conditions etc. This article is not about ranking players. It is about using G & µ along with Quartiles to better understand a player’s career. So let us look at Sobers v Kallis and then Lara v Tendulkar chosen for the high recall factor of their career.

Name µ G
Brian Lara (Win/ICC) 51.9 26.3
Garry Sobers (Win) 51.3 31.4
Sachin Tendulkar (Ind) 49.5 27.3
Jacques Kallis (Saf/ICC) 48.5 27.9

Sobers fares better than Kallis. His Arithmetic mean is higher by 2.87 runs and Geometric Mean by 3.44. The comparison is both relative and trivial because both belong to a very special club. This is an attempt to understand the utility of these measures and not an exercise to determine relative worthiness of stalwarts.

The best all-round player ever.

May be the best all-round player ever.

The margin between Lara and Tendulkar is a little smaller. Lara gives up 1 run on the consistency factor (G) in exchange for 2.36 runs on the ability to score big (µ). I guess that summarises the role of G & µ!

He scored more than others.

He didn’t!

Sangakkara performs better than both Lara and Tendulkar. Amongst contemporaries, G for Lara is also lower than that of Mohammad Yousuf, Hayden, Kallis and Sehwag.

µ is a fancy name for RpFI

There we go. It is time to understand µ with the help of examples. Joseph Vine scored 36, 4* & 6* in 3 innings. His average for dismissed innings is 36. Next we have 4*, a score below that average. µ remains unchanged at 36 because we divide the (36+4) runs scored by (1 + 4/36) fulfilled innings. In other words, when we add 4 not out runs, which are 1/9th of the dismissed average, we also increment the number of innings played by 1/9, which ensures that the ratio remains unchanged. So 4* is treated exactly like 0*: the calculated average is neither reduced to 20 (RpI) nor increased to 40 (batting average). Exactly the same thing happens when we process the third innings of 6*. When we add 6 runs (which are 6/36 or 1/6 of the dismissed average) we also increment fulfilled innings by only 1/6, so µ remains 36.

Jeff Moss batted twice, scoring 22 & 38*. His dismissed average is 22. The not out score is higher than 22, which means that µ must increase. This innings will be treated as fulfilled, so the 60 runs he scored are divided by 2. Hence µ increases to 30, which is equal to RpI in this case but is half the batting average of 60.
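The two examples above can be reproduced with a few lines of Python. This is only a sketch of the calculation as described in this post: the function name `fulfilled_average` and its arguments are made up for illustration, and it assumes that every not out score is compared with the average of the dismissed innings only (so the order of the innings does not matter).

```python
def fulfilled_average(dismissed_runs, dismissals, not_outs):
    """Return µ: total runs divided by fulfilled innings.

    Assumption: each not out score is compared with the average of the
    dismissed innings only, not with a running value of µ.
    """
    if dismissals <= 0:
        raise ValueError("need at least one dismissal to set the baseline")

    base = dismissed_runs / dismissals      # average for dismissed innings
    runs = float(dismissed_runs)
    innings = float(dismissals)

    for score in not_outs:
        runs += score
        if score >= base:
            innings += 1.0                  # counts as a fulfilled innings
        else:
            innings += score / base         # counts as a fraction of an innings

    return runs / innings


# Joseph Vine: 36, 4* & 6*
print(round(fulfilled_average(36, 1, [4, 6]), 2))   # 36.0
# Jeff Moss: 22 & 38*
print(round(fulfilled_average(22, 1, [38]), 2))     # 30.0
```

A not out below the dismissed average adds runs and innings in the same proportion, which is why it leaves µ untouched; only a not out at or above that average can move µ.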

South Paw, Player of the Century. No, Not Lara.

Now we examine a familiar case. Brian Lara scored 11245 runs in 226 dismissed innings for an average of 49.76. His first 3 not out scores of 14*, 48* & 13* are below that average. None of these scores will change µ but the next not out score of 153* lifts it to 50.21. Another score of 80* results in an even higher value of 50.34. And the highest score ever of 400* ensures that Lara remains part of a very exclusive 50+ club with a career µ of 51.86. That exclusive club features only 13 players (amongst those who batted in 40 or more innings). Headley who batted in 40 innings is behind Bradman. Sutcliffe is in 3rd place. Graeme Pollock (41 innings) is slightly behind Weekes in 5th spot. Lara is 12th and Sobers 13th in this list. Tendulkar at 14th place with a career µ of only 49.5 misses out.
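Lara's progression can be traced step by step with a short Python sketch. The helper name `running_mu` is hypothetical, and the code assumes that each not out is measured against the dismissed average of 49.76 rather than against the running value of µ; with Lara's scores both readings give the same figures.

```python
def running_mu(dismissed_runs, dismissals, not_outs):
    """Return µ after each not out score is added, rounded to 2 decimals.

    Assumption: the baseline is the average of the dismissed innings.
    """
    base = dismissed_runs / dismissals
    runs = float(dismissed_runs)
    innings = float(dismissals)
    steps = []
    for score in not_outs:
        runs += score
        # a full innings at or above the baseline, a fraction below it
        innings += 1.0 if score >= base else score / base
        steps.append(round(runs / innings, 2))
    return steps


# Brian Lara: 11245 runs in 226 dismissed innings, then six not outs
print(running_mu(11245, 226, [14, 48, 13, 153, 80, 400]))
# [49.76, 49.76, 49.76, 50.21, 50.34, 51.86]
```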

“Too Long, Didn’t Read”

Two top-20 tables of G & µ published earlier excluded Headley & Pollock, who batted fewer than 70 times in their careers. So let us now look at a series of comparison charts to find top players, with higher G & µ cut-offs for shorter careers and relatively lower cut-offs for longer careers.

The left arm of England’s first professional captain was two inches shorter than the other

Player careers will be compared using a CandleVolume (or BoxWhisker) chart, which is generally used to depict the traded volume of a stock along with its OHLC (Open, High, Low & Close) prices on a given day. Volumes are plotted on the primary axis and prices on the secondary. Open and Close values form the body of the candlestick (box) while High and Low appear as upper and lower shadows (or whiskers).

Only 12 players managed to end their career with µ > 50 & G > 30. It is a very stiff standard to maintain so no wonder that 4 of those 12 had very short careers. In other words, the remaining 8 were the most prolific as well as the most consistent over a long period.

12 players with µ > 50 and G > 30 over 15+ innings

Primary Axis depicts career runs scored by a player, scaled between 2000 and 16000. Sidney Barnes, Eddie Paynter, Kumar Duleepsinhji & Stewie Dempster scored fewer than 2000 runs and hence their green bars depicting runs are not visible. Secondary Axis is used to show four key averages. Q1 & Q3 are the Low and High values whereas G & µ values form the edges of the dark green box. Bradman’s Q3 or 75% scores are between 130 & 140 as seen by the High Whisker of the first element. Dempster’s µ is around 50, shown as the Top of Box in the last element. Q1 for Sidney Barnes & Duleepsinhji roughly equals G for Weekes, which is nearly 30. This can be seen by matching the level of the Bottom of the 4th Box with the lowest point of the 2nd item.

15 players with µ > 48 and G > 28 over 40 or more innings

The high standards are lowered a tad here to allow 7 more players to join the club. We also part ways with the 4 players who played fewer than 40 innings. Q3 for George Headley majestically rises over 90, underlining why he is appreciated nearly as much as Bradman despite a short career. Kumar Sangakkara, taking the 5th spot, is the highest run scorer in this club with over 11,000 runs. Hammond, with an under-30 value of G, sneaks in between Walcott and Hutton. Sobers moves from the edge of the box towards the centre so that 4 new players can enter the fray: Greg Chappell, Dudley Nourse, Mohammad Yousuf & Matthew Hayden.

18 players with µ > 47 and G > 27 over 60 or more innings

The net is cast wider to include 18 players with µ > 47 & G > 27. Headley completes his cameo and we don’t see any change in the line-up until we get to the middle of the chart, where Tendulkar fills the light green bar with the weight of his nearly 16000 runs. His Arithmetic Mean is just under 50, higher than that of the 7 players next to him, but his consistency, with G < 30, is lower than that of Chappell, Nourse, Yousuf and Hayden. We welcome three other entrants – Jacques Kallis, Virender Sehwag and Javed Miandad.

25 players with µ > 46 and G > 26 over 70 or more innings

This is the last chart, with 25 players, where the players from the 10,000+ club are fairly represented. Lara with µ > 50 takes his place in the left half of the chart before Sobers. Nourse has left the fray, having played fewer than 70 innings. de Villiers and Dravid find a place above Miandad. The last 5 entrants to this exclusive club are – Ponting, Kanhai, Hussey, Graeme Smith & Compton.

Those missing from the 10k club are Jayawardene, Chanderpaul, Border, Waugh & Gavaskar.

You will find the top 50 run scorers, those with 6800+ runs, in the first chart at the top of this post. All charts can be clicked to view the full size image. Top players are compared in the attached video using individual player charts sprinkled throughout this article.
