Let us find the average of a set of values. A rectangle has sides 3 & 5 so its perimeter is 3+5+3+5 = 16. An equivalent square with the same perimeter will have each side equalling 4 which is the Arithmetic Mean of 3 & 5.
Here we add the two numbers and divide the sum by the number of data points, which is 2. So (3+5)/2 = 4. For n values a1, a2, …, an, the Arithmetic Mean A is defined via the equation:

$$A = \frac{1}{n}\sum_{i=1}^{n} a_i$$
Here Σ is the summation operator and the above notation is shorthand for A = (a1 + a2 + a3 + a4 + … + ai + … + an) / n.
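As a minimal sketch of the definition above, the Arithmetic Mean can be computed in a couple of lines of Python. The function name `arithmetic_mean` is only an illustrative choice.

```python
def arithmetic_mean(values):
    """Add up the values and divide by how many there are."""
    return sum(values) / len(values)

# Sides of the rectangle from the example above.
print(arithmetic_mean([3, 5]))  # 4.0
```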
Now let us consider another rectangle with sides 2 & 8 which has a total area of 16. We can find an equivalent square of side 4 with the same area. In this case 4 is the Geometric Mean of 2 & 8.
Here we multiply the two numbers and then take the square root of the resulting product. The product 2 × 8 is 16 and its square root is 4, which is written as √(2 × 8) = 4.
So what is the Geometric Mean of the sides of a cuboid measuring 2, 4 & 8? For three values it is the cube root of the product (2 × 4 × 8 = 64), which turns out to be 4. In other words, a cube of side 4 has the same volume as the cuboid.
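We can extend this logic to any number of data points. The Geometric Mean G of n numbers is computed by taking the product of all the numbers and finding the nth root of the result. It is defined via the equation:

$$G = \sqrt[n]{a_1 a_2 \cdots a_n} = \left(\prod_{i=1}^{n} a_i\right)^{1/n}$$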
Here Π (capital pi) is the product operator.
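The definition translates directly into code. The following is a rough sketch, assuming Python 3.8 or later for `math.prod`; the function name `geometric_mean` is illustrative and the check for non-positive values is an added safeguard.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is only defined here for positive values")
    return math.prod(values) ** (1 / len(values))

# Rectangle with sides 2 & 8, then cuboid with sides 2, 4 & 8.
# Both results are 4, up to floating-point rounding.
print(geometric_mean([2, 8]))
print(geometric_mean([2, 4, 8]))
```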
Logarithms are a means to simplify calculations. This common technique involves transforming each number to a new domain, carrying out simpler arithmetic there and transforming the result back to get the required answer. It is often used for products, quotients, powers and roots (which become addition, subtraction, multiplication and division in the log domain). Using logarithms, the above equation for the Geometric Mean can also be written as:

$$\log G = \frac{1}{n}\sum_{i=1}^{n} \log a_i$$
A quick comparison with the equation for Arithmetic Mean reveals that ‘log of the Geometric Mean(G) is the Arithmetic Mean(A) of the logs of the numbers.’
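This relationship suggests another way to compute the Geometric Mean: average the logs, then transform back. The sketch below uses natural logarithms, though any base works as long as the same base is used in both directions. The function name is again only illustrative.

```python
import math

def geometric_mean_via_logs(values):
    """Average the logs of the values, then exponentiate the result."""
    mean_of_logs = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_of_logs)

# Same cuboid as before: the result is 4, up to floating-point rounding.
print(geometric_mean_via_logs([2, 4, 8]))
```

Working in the log domain also avoids overflow when many large numbers would otherwise be multiplied together.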
We have understood the Geometric Mean in terms of geometry. Since a side cannot have a negative or zero length, the Geometric Mean applies to positive numbers only. Also, the logarithm of zero or of a negative number is not defined for real numbers, and a single zero value in the data set will make the whole product zero.
In practice, the Geometric Mean is used to determine average investment returns. Consider a portfolio with 5 annual returns of, say, 0.1, 0, -0.1, -0.2 & 0.2. The Arithmetic Mean of these values is zero.
But a notional investment of 100 will be worth 110 at the end of the first year, still 110 at the end of year 2, 99 at the end of year 3, 79.2 at the end of year 4 and finally 95.04 at the end of year 5. So the portfolio loses about 5% (0.05) of its value over 5 years, or roughly 1% (0.01) per year.
An Arithmetic Mean of zero is not relevant in this case. Since the Geometric Mean applies to positive numbers only, we add 1 to each value, which turns each return into a growth factor. The product 1.1 × 1 × 0.9 × 0.8 × 1.2 is 0.9504 and its fifth root, rounded to 2 decimals, is 0.99. Subtracting 1 gives -0.01, which is the average annual return for our portfolio.
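The portfolio arithmetic can be checked with a short script. This is only a sketch of the steps described above, with illustrative variable names.

```python
import math

returns = [0.1, 0.0, -0.1, -0.2, 0.2]

# Adding 1 turns each annual return into a growth factor.
growth_factors = [1 + r for r in returns]
overall_growth = math.prod(growth_factors)

final_value = 100 * overall_growth
average_return = overall_growth ** (1 / len(returns)) - 1

print(round(sum(returns) / len(returns), 2))  # arithmetic mean of the returns, 0 after rounding
print(round(final_value, 2))                  # 95.04
print(round(average_return, 2))               # -0.01
```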
The 2008 book by Lawrence Weinstein and John A. Adams called ‘Guesstimation: Solving the World’s Problems on the Back of a Cocktail Napkin’ suggests that if you have an intuition about the upper and lower bounds for an unknown quantity, then your best guess should be the Geometric Mean instead of the commonly used Arithmetic Mean. The book asks, ‘How many clowns can fit in a VW Bug?’ and suggests that the answer could be anything between 1 and 100, so go for √(1 × 100), i.e. 10. If you tried 50, your guess would be 50 times the lower bound (1) but only half the upper bound (100), whereas the Geometric Mean of 10 is ten times the lower bound and a tenth of the upper bound.
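The clown estimate can be sketched in a few lines. The helper name `best_guess` is hypothetical and not taken from the book.

```python
import math

def best_guess(lower, upper):
    """Geometric mean of a positive lower bound and upper bound."""
    return math.sqrt(lower * upper)

guess = best_guess(1, 100)
# The guess sits the same factor above the lower bound as below the upper bound.
print(guess, guess / 1, 100 / guess)  # 10.0 10.0 10.0
```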
The Geometric Mean tends to dampen the effect of a few very high values. What is the mean value of a house on a street where most of the properties are similarly priced but one of them happens to be a mansion? Suppose there are ten properties on a street valued at 100K, 80K, 50K, 95K, 120K, 70K, 105K, 1 Million, 60K & 90K. The Arithmetic Mean of 177K is higher than the price of each of the other 9 properties, but the Geometric Mean of about 106K represents a fairer average.
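Both averages for the street can be reproduced as follows. This is a minimal sketch that works in thousands and uses the log form of the Geometric Mean.

```python
import math

# Property values in thousands; the mansion is the 1000 entry.
prices_k = [100, 80, 50, 95, 120, 70, 105, 1000, 60, 90]

arithmetic = sum(prices_k) / len(prices_k)
geometric = math.exp(sum(math.log(p) for p in prices_k) / len(prices_k))

print(arithmetic)        # 177.0
print(round(geometric))  # about 106
```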
Let us take a look at the 80 scores by Don Bradman in four batches of 20:
0,0,0,0,0,0,0,1,1,2,4,7,8,8,12,13,13,13,14,16
18,18,24,25,25,26,29,30*,30,33,36,37*,38,38,40,43,48,49,51,56*
57*,58,63,66,71,76,77,79,79,82,89,102*,103*,103,112,112,123,127*,131,132
138,144*,152,167,169,173*,185,187,201,212,223,226,232,234,244,254,270,299*,304,334
The sum of his 80 scores is 6996. He remained not out in 10 of those innings, which are marked with an asterisk (299* etc.) in the above list. A cricket batting average is computed by adding all the individual scores but dividing by the number of completed innings only, which is 70 in this case. So Don Bradman’s cricket batting average is the iconic 6996/70 = 99.94, but the conventional Arithmetic Mean (or Runs per Innings, RpI) is 6996/80 = 87.45.
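The two figures can be reproduced from the list of scores. The sketch below assumes the scores are typed in exactly as listed, with a trailing asterisk marking a not-out innings; the variable names are illustrative.

```python
# Bradman's 80 scores as listed above; "*" marks a not-out innings.
RAW_SCORES = (
    "0,0,0,0,0,0,0,1,1,2,4,7,8,8,12,13,13,13,14,16,"
    "18,18,24,25,25,26,29,30*,30,33,36,37*,38,38,40,43,48,49,51,56*,"
    "57*,58,63,66,71,76,77,79,79,82,89,102*,103*,103,112,112,123,127*,131,132,"
    "138,144*,152,167,169,173*,185,187,201,212,223,226,232,234,244,254,270,299*,304,334"
)

tokens = RAW_SCORES.split(",")
scores = [int(t.rstrip("*")) for t in tokens]
not_outs = sum(t.endswith("*") for t in tokens)

total_runs = sum(scores)
completed_innings = len(scores) - not_outs

# Cricket average divides by completed innings; RpI divides by all innings.
batting_average = total_runs / completed_innings
runs_per_innings = total_runs / len(scores)

print(total_runs, len(scores), not_outs)  # 6996 80 10
print(round(batting_average, 2))          # 99.94
print(round(runs_per_innings, 2))         # 87.45
```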
Many sets of numbers, such as heights or weights, are non-zero with plenty of values around the mean and a few extreme values at either end. Such a set of numbers is said to be approximately normally distributed. But the above 80 scores are not. In 20 of those innings, Bradman failed to reach 18. At the other end, he scored more than 132 in each of his top 20 innings. There are 14 single-digit scores, which include 7 zeroes, and 12 scores of 200 or more. His lower scores are very close to one another, e.g. 0, 1, 2, …, 7, 8, …, 12, 13, …, 24, 25, 26, …, 37, 38, 40, …, 56, 57, 58 etc., but his top 10 scores of 223, 226, 232, 234, 244, 254, 270, 299*, 304 and 334 are spread further apart. With relatively few high values, most of the distribution is concentrated on the lower side. Such a distribution is said to be right-skewed.
For substantially positive skewness (with zero values), a logarithmic data transformation is used. This handout suggests deriving transformed values using the SPSS command NEWX = LG10(X + C), where C is a constant added to each score so that the smallest score becomes 1.
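For readers without SPSS, the same transformation can be sketched in Python. This is an assumed equivalent of the command above, not something taken from the handout; the function name and the sample values are made up for illustration.

```python
import math

def log10_transform(values):
    """Shift the data so the smallest value becomes 1, then take log base 10."""
    c = 1 - min(values)  # plays the role of the constant C in the SPSS command
    return [math.log10(v + c) for v in values]

# Hypothetical sample that includes a zero; here C works out to 1.
sample = [0, 9, 99, 334]
print(log10_transform(sample))  # first three values are 0.0, 1.0 and 2.0
```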
It is time to combine the two topics of this post. We would like to calculate the GM for batting scores, which include zero values. As with the investment returns, adding a unit value to all the data points, calculating the GM and subtracting the unit from the result is an acceptable compromise. The GM is generally used for a set of numbers whose values are meant to be multiplied together or are exponential in nature, not for batting scores. But we have seen the value of dampening the effect of a few high scores in a right-skewed distribution. It is interesting to note that this type of data calls for a logarithmic transformation, and the GM is based on the AM of the logs of the numbers.
A dampened Geometric Mean is a fair estimate of the central tendency of an individual's batting scores. Without giving any special treatment to not-outs, the Geometric Mean for Don Bradman's 80 scores is calculated as 38.59. The table below lists the GM, Cricket Average, RpI, Median and Mode for selected players.
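The 38.59 figure can be checked with the add-one approach described earlier: add 1 to every score, take the Geometric Mean via logs, then subtract 1. The sketch below treats not-outs as ordinary scores; the function name `shifted_geometric_mean` is illustrative.

```python
import math

# Bradman's 80 scores with the not-out markers dropped.
SCORES = [
    0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 4, 7, 8, 8, 12, 13, 13, 13, 14, 16,
    18, 18, 24, 25, 25, 26, 29, 30, 30, 33, 36, 37, 38, 38, 40, 43, 48, 49, 51, 56,
    57, 58, 63, 66, 71, 76, 77, 79, 79, 82, 89, 102, 103, 103, 112, 112, 123, 127, 131, 132,
    138, 144, 152, 167, 169, 173, 185, 187, 201, 212, 223, 226, 232, 234, 244, 254, 270, 299, 304, 334,
]

def shifted_geometric_mean(values, shift=1):
    """Geometric mean of (value + shift), with the shift removed afterwards."""
    mean_of_logs = sum(math.log(v + shift) for v in values) / len(values)
    return math.exp(mean_of_logs) - shift

# Should come out at about 38.59, the figure quoted above.
print(round(shifted_geometric_mean(SCORES), 2))
```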
| Geometric Mean | Cricket Average | Mean (RpI) | Median | Mode |