CHAPTER
Measures of Central Tendency

Studying this chapter should enable you to:
• understand the need for summarising a set of data by one single number;
• recognise and distinguish between the different types of averages;
• learn to compute different types of averages;
• draw meaningful conclusions from a set of data;
• develop an understanding of which type of average would be most useful in a particular situation.

1. INTRODUCTION

In the previous chapter, you read about the tabular and graphic representation of data. In this chapter, you will study the measures of central tendency, which are a numerical method of explaining the data in brief. You can see examples of summarising a large set of data in day-to-day life, such as the average marks obtained by students of a class in a test, average rainfall in an area, average production in a factory, or average income of persons living in a locality or working in a firm.

Baiju is a farmer. He grows food grains on his land in a village called Balapur in Buxar district of Bihar. The village consists of 50 small farmers. Baiju has 1 acre of land. You are interested in knowing the economic condition of small farmers of Balapur. You want to compare the economic condition of Baiju with that of the other farmers in Balapur village. For this, you may have to evaluate the size of his land holding by comparing it with the size of land holdings of other farmers of Balapur. You may like to see if the land owned by Baiju is:
1. above average in the ordinary sense (see the Arithmetic Mean below)
2. above the size of what half the farmers own (see the Median below)
3. above what most of the farmers own (see the Mode below)

In order to evaluate Baiju’s relative economic condition, you will have to summarise the whole set of data of land holdings of the farmers of Balapur. This can be done by use of central tendency, which summarises the data in a single value in such a way that this single value can represent the entire data.
The measuring of central tendency is a way of summarising the data in the form of a typical or representative value. There are several statistical measures of central tendency or “averages”. The three most commonly used averages are:
• Arithmetic Mean
• Median
• Mode

You should note that there are two more types of averages, i.e. Geometric Mean and Harmonic Mean, which are suitable in certain situations. However, the present discussion will be limited to the three types of averages mentioned above.

2. ARITHMETIC MEAN

Suppose the monthly income (in Rs) of six families is given as: 1600, 1500, 1400, 1525, 1625, 1630. The mean family income is obtained by adding up the incomes and dividing by the number of families.

$$\frac{1600 + 1500 + 1400 + 1525 + 1625 + 1630}{6} = \text{Rs } 1{,}547$$

It implies that on an average, a family earns Rs 1,547.

Arithmetic mean is the most commonly used measure of central tendency. It is defined as the sum of the values of all observations divided by the number of observations and is usually denoted by $\bar{X}$. In general, if there are N observations as $X_1, X_2, X_3, \dots, X_N$, then the Arithmetic Mean is given by

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\Sigma X}{N}$$

Where, $\Sigma X$ = sum of all observations and N = total number of observations.

How Arithmetic Mean is Calculated

The calculation of arithmetic mean can be studied under two broad categories:
1. Arithmetic Mean for Ungrouped Data.
2. Arithmetic Mean for Grouped Data.

STATISTICS FOR ECONOMICS

Arithmetic Mean for Series of Ungrouped Data

Direct Method

Arithmetic mean by direct method is the sum of all observations in a series divided by the total number of observations.

Example 1
Calculate Arithmetic Mean from the data showing marks of students in a class in an economics test: 40, 50, 55, 78, 58.

$$\bar{X} = \frac{\Sigma X}{N} = \frac{40 + 50 + 55 + 78 + 58}{5} = 56.2$$

The average marks of students in the economics test are 56.2.
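The direct method can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` is an illustrative choice, not something defined in the chapter, and the data are the six family incomes and the five marks quoted above.

```python
def arithmetic_mean(values):
    """Direct method: sum of all observations divided by their number."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


# Monthly incomes (in Rs) of the six families
incomes = [1600, 1500, 1400, 1525, 1625, 1630]
# Marks from Example 1
marks = [40, 50, 55, 78, 58]

income_mean = arithmetic_mean(incomes)
marks_mean = arithmetic_mean(marks)

print(round(income_mean, 2))  # 1546.67, which the text rounds to Rs 1,547
print(marks_mean)             # 56.2
```

Python's standard library also provides `statistics.mean`, which should give the same results for these lists.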
Assumed Mean Method

If the number of observations in the data is large and/or the figures are large, it is difficult to compute the arithmetic mean by the direct method. The computation can be made easier, and calculation time saved, by using the assumed mean method. Here you assume a particular figure in the data as the arithmetic mean on the basis of logic/experience. Then you take the deviation of each observation from the said assumed mean. You can then take the summation of these deviations and divide it by the number of observations in the data. The actual arithmetic mean is estimated by taking the sum of the assumed mean and the ratio of the sum of deviations to the number of observations. Symbolically,

Let,
A = assumed mean
X = individual observations
N = total number of observations
d = deviation of an individual observation from the assumed mean, i.e. d = X – A

Then the sum of all deviations is taken as $\Sigma d = \Sigma (X - A)$.
Then find $\frac{\Sigma d}{N}$.
Then add A and $\frac{\Sigma d}{N}$ to get $\bar{X}$.

Therefore,

$$\bar{X} = A + \frac{\Sigma d}{N}$$

You should remember that any value, whether existing in the data or not, can be taken as assumed mean. However, in order to simplify the calculation, a centrally located value in the data can be selected as assumed mean.

Example 2
The following data shows the weekly income of 10 families.

Family:                  A    B    C    D    E     F    G    H     I    J
Weekly Income (in Rs):   850  700  100  750  5000  80   420  2500  400  360

Compute mean family income.

TABLE 5.1
Computation of Arithmetic Mean by Assumed Mean Method

Families   Income (X)   d = X – 850   d' = (X – 850)/10
A             850            0               0
B             700         –150             –15
C             100         –750             –75
D             750         –100             –10
E            5000        +4150            +415
F              80         –770             –77
G             420         –430             –43
H            2500        +1650            +165
I             400         –450             –45
J             360         –490             –49
Total       11160        +2660            +266

Arithmetic Mean using assumed mean method

$$\bar{X} = A + \frac{\Sigma d}{N} = 850 + \frac{2{,}660}{10} = \text{Rs } 1{,}116$$

Thus, the average weekly income of a family by both methods is Rs 1,116. You can check this by using the direct method.

Step Deviation Method

The calculations can be further simplified by dividing all the deviations taken from the assumed mean by the common factor ‘c’. The objective is to avoid large numerical figures, i.e., if d = X – A is very large, then find d'. This can be done as follows:

$$d' = \frac{d}{c} = \frac{X - A}{c}$$

The formula is given below:

$$\bar{X} = A + \frac{\Sigma d'}{N} \times c$$

Where d' = (X – A)/c, c = common factor, N = number of observations, A = assumed mean.

Thus, you can calculate the arithmetic mean in Example 2 by the step deviation method:

$$\bar{X} = 850 + \frac{266}{10} \times 10 = \text{Rs } 1{,}116$$

Calculation of arithmetic mean for Grouped data

Discrete Series

Direct Method

In case of discrete series, the frequency against each of the observations is multiplied by the value of the observation. The values so obtained are summed up and divided by the total number of frequencies. Symbolically,

$$\bar{X} = \frac{\Sigma fX}{\Sigma f}$$

Where, $\Sigma fX$ = sum of the products of variables and frequencies, $\Sigma f$ = sum of frequencies.

Example 3
Calculate mean farm size of cultivating households in a village for the following data.

Farm Size (in acres):            64  63  62  61  60  59
No. of Cultivating Households:    8  18  12   9   7   6

TABLE 5.2
Computation of Arithmetic Mean by Direct Method

Farm Size (X)   No. of cultivating   fX        d          fd
in acres        households (f)       (1 × 2)   (X – 62)   (2 × 4)
(1)             (2)                  (3)       (4)        (5)
64               8                    512      +2         +16
63              18                   1134      +1         +18
62              12                    744       0           0
61               9                    549      –1          –9
60               7                    420      –2         –14
59               6                    354      –3         –18
Total           60                   3713      –3          –7

Arithmetic mean using direct method,

$$\bar{X} = \frac{\Sigma fX}{\Sigma f} = \frac{3713}{60} = 61.88 \text{ acres}$$

Therefore, the mean farm size in the village is 61.88 acres.

Assumed Mean Method

As in case of individual series, the calculations can be simplified by using the assumed mean method, as described earlier, with a simple modification. Since the frequency (f) of each item is given here, we multiply each deviation (d) by the frequency to get fd. Then we get $\Sigma fd$. The next step is to get the total of all frequencies, i.e. $\Sigma f$. Then find out $\Sigma fd / \Sigma f$. Finally the arithmetic mean is calculated by

$$\bar{X} = A + \frac{\Sigma fd}{\Sigma f}$$

using the assumed mean method.

Step Deviation Method

In this case the deviations are divided by the common factor ‘c’, which simplifies the calculation. Here we estimate

$$d' = \frac{d}{c} = \frac{X - A}{c}$$

in order to reduce the size of numerical figures for easier calculation. Then get fd' and $\Sigma fd'$. Finally the formula for the step deviation method is given as,

$$\bar{X} = A + \frac{\Sigma fd'}{\Sigma f} \times c$$

Activity
• Find the mean farm size for the data given in Example 3, by using step deviation and assumed mean methods.

Continuous Series

Here, class intervals are given. The process of calculating arithmetic mean in case of continuous series is the same as that of a discrete series. The only difference is that the mid-points of the various class intervals are taken. You should note that class intervals may be exclusive or inclusive or of unequal size. Example of exclusive class interval is, say, 0–10, 10–20 and so on. Example of inclusive class interval is, say, 0–9, 10–19 and so on. Example of unequal class interval is, say, 0–20, 20–50 and so on. In all these cases, calculation of arithmetic mean is done in a similar way.

Example 4
Calculate average marks of the following students using (a) Direct method (b) Step deviation method.

Marks:             0–10  10–20  20–30  30–40  40–50  50–60  60–70
No. of Students:   5     12     15     25     8      3      2

TABLE 5.3
Computation of Average Marks for Exclusive Class Interval by Direct Method

Marks (x)   No. of students (f)   Mid value (m)   fm (2) × (3)   d' = (m – 35)/10   fd'
(1)         (2)                   (3)             (4)            (5)                (6)
0–10         5                     5                25           –3                 –15
10–20       12                    15               180           –2                 –24
20–30       15                    25               375           –1                 –15
30–40       25                    35               875            0                   0
40–50        8                    45               360            1                   8
50–60        3                    55               165            2                   6
60–70        2                    65               130            3                   6
Total       70                                    2110                              –34

Direct Method

Steps:
1. Obtain mid values for each class, denoted by m.
2. Obtain $\Sigma fm$ and apply the direct method formula:

$$\bar{X} = \frac{\Sigma fm}{\Sigma f} = \frac{2110}{70} = 30.14 \text{ marks}$$

Step deviation method

1. Obtain $d' = \frac{m - A}{c}$
2. Take A = 35 (any arbitrary figure), c = common factor.

$$\bar{X} = A + \frac{\Sigma fd'}{\Sigma f} \times c = 35 + \frac{(-34)}{70} \times 10 = 30.14 \text{ marks}$$

An interesting property of A.M.

It is interesting to know, and useful for checking your calculation, that the sum of deviations of items about the arithmetic mean is always equal to zero. Symbolically, $\Sigma (X - \bar{X}) = 0$. However, arithmetic mean is affected by extreme values. Any large value, on either end, can push it up or down.

Weighted Arithmetic Mean

Sometimes it is important to assign weights to various items according to their importance when you calculate the arithmetic mean. For example, there are two commodities, mangoes and potatoes. You are interested in finding the average price of mangoes ($p_1$) and potatoes ($p_2$). The arithmetic mean will be $\frac{p_1 + p_2}{2}$. However, you might want to give more importance to the rise in price of potatoes ($p_2$). To do this, you may use as ‘weights’ the quantity of mangoes ($q_1$) and the quantity of potatoes ($q_2$). Now the arithmetic mean weighted by the quantities would be

$$\frac{q_1 p_1 + q_2 p_2}{q_1 + q_2}$$

In general the weighted arithmetic mean is given by,

$$\frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\Sigma wx}{\Sigma w}$$

When the prices rise, you may be interested in the rise in the price of the commodities that are more important to you. You will read more about it in the discussion of Index Numbers in Chapter 8.
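The weighted mean $\Sigma wx / \Sigma w$ has the same form as the grouped-data formula $\Sigma fX / \Sigma f$, with the frequencies acting as weights. The Python sketch below relies on that observation to recompute Examples 2, 3 and 4. The function names `weighted_mean` and `step_deviation_mean` are illustrative assumptions, not names used in the chapter.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w); with frequencies as weights this is sum(fX) / sum(f)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


def step_deviation_mean(values, freqs, assumed_mean, c=1):
    """A + (sum(f * d') / sum(f)) * c, where d' = (X - A) / c.

    With c = 1 this reduces to the assumed mean method.
    """
    scaled_deviations = [(x - assumed_mean) / c for x in values]
    return assumed_mean + weighted_mean(scaled_deviations, freqs) * c


# Example 2: ungrouped data, so every frequency is 1 (A = 850, c = 10)
incomes = [850, 700, 100, 750, 5000, 80, 420, 2500, 400, 360]
income_mean = step_deviation_mean(incomes, [1] * len(incomes), 850, 10)

# Example 3: discrete series (farm size in acres, number of households)
farm_sizes = [64, 63, 62, 61, 60, 59]
households = [8, 18, 12, 9, 7, 6]
farm_direct = weighted_mean(farm_sizes, households)
farm_assumed = step_deviation_mean(farm_sizes, households, 62)

# Example 4: continuous series, using class mid-points (A = 35, c = 10)
midpoints = [5, 15, 25, 35, 45, 55, 65]
students = [5, 12, 15, 25, 8, 3, 2]
marks_direct = weighted_mean(midpoints, students)
marks_step = step_deviation_mean(midpoints, students, 35, 10)

print(round(income_mean, 2))                          # 1116.0
print(round(farm_direct, 2), round(farm_assumed, 2))  # 61.88 61.88
print(round(marks_direct, 2), round(marks_step, 2))   # 30.14 30.14
```

All three methods agree because subtracting A and dividing by c are undone exactly by the final "add A, multiply by c" step; the shortcut methods only make hand calculation lighter.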
Activities
• Check this property of the arithmetic mean for the following example:
  X: 4 6 8 10 12
• In the above example, if the mean is increased by 2, what happens to the individual observations, if all are equally affected?
• If the first three items increase by 2, what should be the values of the last two items, so that the mean remains the same?
• Replace the value 12 by 96. What happens to the arithmetic mean? Comment.

3. MEDIAN

The arithmetic mean is affected by the presence of extreme values in the data. If you take a measure of central tendency which is based on the middle position of the data, it is not affected by extreme items. Median is that positional value of the variable which divides the distribution into two equal parts: one part comprises all values greater than or equal to the median value and the other comprises all values less than or equal to it. The Median is the “middle” element when the data set is arranged in order of magnitude.

Computation of median

The median can be easily computed by sorting the data from smallest to largest and counting the middle value.

Example 5
Suppose we have the following observations in a data set: 5, 7, 6, 1, 8, 10, 12, 4, and 3.
Arranging the data in ascending order you have: 1, 3, 4, 5, 6, 7, 8, 10, 12.
The “middle score” is 6, so the median is 6. Half of the scores are larger than 6 and half of the scores are smaller.

If there is an even number of observations in the data, there will be two observations which fall in the middle. The median in this case is computed as the arithmetic mean of the two middle values.

Example 6
The following data provides marks of 20 students. You are required to calculate the median marks.
25, 72, 28, 65, 29, 60, 30, 54, 32, 53, 33, 52, 35, 51, 42, 48, 45, 47, 46, 33.

Arranging the data in an ascending order, you get
25, 28, 29, 30, 32, 33, 33, 35, 42, 45, 46, 47, 48, 51, 52, 53, 54, 60, 65, 72.

You can see that there are two observations in the middle, namely 45 and 46. The median can be obtained by taking the mean of the two observations:

$$\text{Median} = \frac{45 + 46}{2} = 45.5 \text{ marks}$$

In order to calculate the median it is important to know the position of the median, i.e. the item/items at which the median lies. The position of the median can be calculated by the following formula:

$$\text{Position of median} = \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ item}$$

Where N = number of items.

You may note that the above formula gives you the position of the median in an ordered array, not the median itself. Median is computed by the formula:

$$\text{Median} = \text{size of } \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ item}$$

Discrete Series

In case of discrete series the position of the median, i.e. the (N+1)/2th item, can be located through cumulative frequency. The corresponding value at this position is the value of the median.

Example 7
The frequency distribution of the number of persons and their respective incomes (in Rs) are given below. Calculate the median income.

Income (in Rs):       10  20  30  40
Number of persons:     2   4  10   4

In order to calculate the median income, you may prepare the frequency distribution as given below.

TABLE 5.4
Computation of Median for Discrete Series

Income (in Rs)   No. of persons (f)   Cumulative frequency (cf)
10                2                    2
20                4                    6
30               10                   16
40                4                   20

The median is located in the (N+1)/2 = (20+1)/2 = 10.5th observation. This can be easily located through cumulative frequency. The 10.5th observation lies in the c.f. of 16. The income corresponding to this is Rs 30, so the median income is Rs 30.

Continuous Series

In case of continuous series you have to locate the median class where the N/2th item [not the (N+1)/2th item] lies. The median can then be obtained as follows:

$$\text{Median} = L + \frac{(N/2 - \text{c.f.})}{f} \times h$$

Where,
L = lower limit of the median class,
c.f. = cumulative frequency of the class preceding the median class,
f = frequency of the median class,
h = magnitude of the median class interval.

No adjustment is required if frequency is of unequal size or magnitude.
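The three procedures for the median can be sketched in Python as follows. The function names are illustrative assumptions. The first two data sets come from Examples 5 to 7; the wage classes for the continuous case are taken from Example 8, which follows, written in ascending order.

```python
from itertools import accumulate


def median_ungrouped(values):
    """Median of raw observations using the (N + 1) / 2 position rule."""
    if not values:
        raise ValueError("at least one observation is required")
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2   # 1-based position of the median
    lower = int(position)               # whole part of the position
    fraction = position - lower         # 0.5 when N is even, otherwise 0
    if fraction == 0:
        return ordered[lower - 1]
    # The position falls between two items: move part-way from one to the next.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def median_discrete(values, freqs):
    """Median of a discrete series, located through cumulative frequency."""
    pairs = sorted(zip(values, freqs))
    position = (sum(freqs) + 1) / 2
    cumulative = accumulate(f for _, f in pairs)
    for (value, _), cf in zip(pairs, cumulative):
        if cf >= position:
            return value
    raise ValueError("frequencies must sum to a positive number")


def median_continuous(classes, freqs):
    """Median = L + (N/2 - c.f.) / f * h, for (lower, upper) classes in ascending order."""
    half = sum(freqs) / 2
    if half <= 0:
        raise ValueError("frequencies must sum to a positive number")
    cf_before = 0                       # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= half:       # this is the median class
            return lower + (half - cf_before) / f * (upper - lower)
        cf_before += f


scores = [5, 7, 6, 1, 8, 10, 12, 4, 3]                       # Example 5
marks = [25, 72, 28, 65, 29, 60, 30, 54, 32, 53,
         33, 52, 35, 51, 42, 48, 45, 47, 46, 33]             # Example 6
wage_classes = [(20, 25), (25, 30), (30, 35), (35, 40),
                (40, 45), (45, 50), (50, 55), (55, 60)]      # Example 8
workers = [14, 28, 33, 30, 20, 15, 13, 7]

print(median_ungrouped(scores))                                 # 6
print(median_ungrouped(marks))                                  # 45.5
print(median_discrete([10, 20, 30, 40], [2, 4, 10, 4]))         # 30
print(round(median_continuous(wage_classes, workers), 2))       # 35.83
```

For raw data, `statistics.median` from the standard library should return the same values as `median_ungrouped` here.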
Example 8
Following data relates to daily wages of persons working in a factory. Compute the median daily wage.

Daily wages (in Rs):   55–60  50–55  45–50  40–45  35–40  30–35  25–30  20–25
Number of workers:     7      13     15     20     30     33     28     14

The data is arranged in ascending order here.

TABLE 5.5
Computation of Median for Continuous Series

Daily wages (in Rs)   No. of Workers (f)   Cumulative Frequency
20–25                 14                    14
25–30                 28                    42
30–35                 33                    75
35–40                 30                   105
40–45                 20                   125
45–50                 15                   140
50–55                 13                   153
55–60                  7                   160

In the above illustration the median class is the value of the (N/2)th item (i.e. 160/2) = 80th item of the series, which lies in the 35–40 class interval. Applying the formula of the median as:

$$\text{Median} = L + \frac{(N/2 - \text{c.f.})}{f} \times h = 35 + \frac{(80 - 75)}{30} \times (40 - 35) = \text{Rs } 35.83$$

Thus, the median daily wage is Rs 35.83. This means that 50% of the workers are getting less than or equal to Rs 35.83 and 50% of the workers are getting more than or equal to this wage.

You should remember that median, as a measure of central tendency, is not sensitive to all the values in the series. It concentrates on the values of the central items of the data.

Activities
• Find mean and median for all four values of the series. What do you observe?

TABLE 5.6
Mean and Median of different series

Series   X (Variable Values)   Mean   Median
A        1, 2, 3               ?      ?
B        1, 2, 30              ?      ?
C        1, 2, 300             ?      ?
D        1, 2, 3000            ?      ?

• Is median affected by extreme values? What are outliers?
• Is median a better method than mean?

Quartiles

Quartiles are the measures which divide the data into four equal parts, each portion containing an equal number of observations. Thus, there are three quartiles. The first Quartile (denoted by Q1) or lower quartile has 25% of the items of the distribution below it and 75% of the items are greater than it. The second Quartile (denoted by Q2) or median has 50% of items below it and 50% of the observations above it. The third Quartile (denoted by Q3) or upper Quartile has 75% of the items of the distribution below it and 25% of the items above it. Thus, Q1 and Q3 denote the two limits within which the central 50% of the data lies.

Percentiles

Percentiles divide the distribution into hundred equal parts, so you can get 99 dividing positions denoted by P1, P2, P3, ..., P99. P50 is the median value. If you have secured 82 percentile in a management entrance examination, it means that your position is below 18 per cent of the total candidates who appeared in the examination. If a total of one lakh students appeared, where do you stand?

Calculation of Quartiles

The method for locating the Quartile is the same as that of the median in case of individual and discrete series. The value of Q1 and Q3 of an ordered series can be obtained by the following formula, where N is the number of observations.

$$Q_1 = \text{size of } \left(\frac{N + 1}{4}\right)^{\text{th}} \text{ item}$$

$$Q_3 = \text{size of } \left(\frac{3(N + 1)}{4}\right)^{\text{th}} \text{ item}$$

Example 9
Calculate the value of the lower quartile from the data of the marks obtained by ten students in an examination.
22, 26, 14, 30, 18, 11, 35, 41, 12, 32.

Arranging the data in an ascending order,
11, 12, 14, 18, 22, 26, 30, 32, 35, 41.

$$Q_1 = \text{size of } \left(\frac{N + 1}{4}\right)^{\text{th}} \text{ item} = \text{size of } \left(\frac{10 + 1}{4}\right)^{\text{th}} \text{ item} = \text{size of } 2.75^{\text{th}} \text{ item}$$

= 2nd item + .75 (3rd item – 2nd item) = 12 + .75 (14 – 12) = 13.5 marks.

Activity
• Find out Q3 yourself.

5. MODE

Sometimes, you may be interested in knowing the most typical value of a series or the value around which maximum concentration of items occurs. For example, a manufacturer would like to know the size of shoes that has maximum demand or the style of shirt that is more frequently demanded. Here, Mode is the most appropriate measure. Mode is the most frequently observed data value. It is denoted by Mo.
The word mode has been derived from the French word “la Mode” which signifies the most fashionable values of a distribution, because it is repeated the highest number of times in the series.

Computation of Mode

Discrete Series

Consider the data set 1, 2, 3, 4, 4, 5. The mode for this data is 4 because 4 occurs most frequently (twice) in the data.

Example 10
Look at the following discrete series:

Variable:    10  20  30  40  50
Frequency:    2   8  20  10   5

Here, as you can see, the maximum frequency is 20, so the value of mode is 30. In this case, as there is a unique value of mode, the data is unimodal. But the mode is not necessarily unique, unlike arithmetic mean and median. You can have data with two modes (bi-modal) or more than two modes (multi-modal). It may be possible that there may be no mode if no value appears more frequently than any other value in the distribution. For example, in a series 1, 1, 2, 2, 3, 3, 4, 4, there is no mode.

[Figures: Unimodal Data; Bimodal Data]

Continuous Series

In case of continuous frequency distribution, the modal class is the class with the largest frequency. Mode can be calculated by using the formula:

$$M_O = L + \frac{D_1}{D_1 + D_2} \times h$$

Where
L = lower limit of the modal class,
D1 = difference between the frequency of the modal class and the frequency of the class preceding the modal class (ignoring signs),
D2 = difference between the frequency of the modal class and the frequency of the class succeeding the modal class (ignoring signs),
h = class interval of the distribution.

You may note that in case of continuous series, class intervals should be equal and the series should be exclusive to calculate the mode. If mid points are given, class intervals are to be obtained.
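The two ways of finding the mode described above can be sketched in Python. The function names are illustrative assumptions. The continuous-series data are the class frequencies derived in Example 11, which follows, written in ascending order, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch rather than a rule stated in the chapter.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value; an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        raise ValueError("at least one observation is required")
    highest = max(counts.values())
    # If several distinct values all occur equally often, no value stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def mode_continuous(lower_limits, freqs, h):
    """Mo = L + D1 / (D1 + D2) * h for equal, exclusive classes in ascending order."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    # Assumption: a class with no neighbour is compared against frequency 0.
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = abs(freqs[k] - before)
    d2 = abs(freqs[k] - after)
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower_limits[k] + d1 / (d1 + d2) * h


# Individual observations from the text
print(modes([1, 2, 3, 4, 4, 5]))        # [4]
print(modes([1, 1, 2, 2, 3, 3, 4, 4]))  # [] (no mode)

# Example 10: the value with the largest frequency
series = dict(zip([10, 20, 30, 40, 50], [2, 8, 20, 10, 5]))
discrete_mode = max(series, key=series.get)
print(discrete_mode)                    # 30

# Example 11: income classes (in '000 Rs) 10-15, 15-20, ..., 45-50
lower_limits = [10, 15, 20, 25, 30, 35, 40, 45]
families = [4, 8, 18, 30, 20, 10, 5, 2]
continuous_mode = mode_continuous(lower_limits, families, 5)
print(round(continuous_mode, 2))        # 27.27
```

This sketch picks the modal class by inspection of the largest frequency. It does not build the grouping and analysis tables, so it should be read as a check on the formula only.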
Example 11
Calculate the value of modal worker family’s monthly income from the following data:

Income per month (in ’000 Rs)   Number of families
Below 50                        97
Below 45                        95
Below 40                        90
Below 35                        80
Below 30                        60
Below 25                        30
Below 20                        12
Below 15                         4

As you can see, this is a case of cumulative frequency distribution. In order to calculate the mode, you will have to convert it into an exclusive series with equal class intervals. In this example, the series is in the descending order. Grouping and Analysis tables would be made to determine the modal class.

TABLE 5.7
Grouping Table
(each group total is shown against the first class of its group)

Income           Group Frequency
(in ’000 Rs)     I                  II    III   IV    V     VI
45–50            97 – 95 = 2         7          17
40–45            95 – 90 = 5              15          35
35–40            90 – 80 = 10       30                      60
30–35            80 – 60 = 20             50    68
25–30            60 – 30 = 30       48                56
20–25            30 – 12 = 18             26                30
15–20            12 – 4 = 8         12
10–15            4

TABLE 5.8
Analysis Table

           Class Intervals
Columns    45–50  40–45  35–40  30–35  25–30  20–25  15–20  10–15
I                                        ×
II                                       ×      ×
III                               ×      ×
IV                                ×      ×      ×
V                                        ×      ×      ×
VI                        ×       ×      ×
Total       –      –      1       3      6      3      1      –

The value of the mode lies in the 25–30 class interval. By inspection also, it can be seen that this is the modal class.

Now L = 25, D1 = (30 – 18) = 12, D2 = (30 – 20) = 10, h = 5

Using the formula, you can obtain the value of the mode as:

$$M_O \text{ (in ’000 Rs)} = L + \frac{D_1}{D_1 + D_2} \times h = 25 + \frac{12}{10 + 12} \times 5 = \text{Rs } 27{,}273$$

Thus the modal worker family’s monthly income is Rs 27,273.

Activities
• A shoe company, making shoes for adults only, wants to know the most popular size of shoes. Which average will be most appropriate for it?
• Take a small survey in your class to know the students’ preference for Chinese food using an appropriate measure of central tendency.
• Can mode be located graphically?

6. RELATIVE POSITION OF ARITHMETIC MEAN, MEDIAN AND MODE

Suppose we express,
Arithmetic Mean = Me
Median = Mi
Mode = Mo
so that e, i and o are the suffixes. The relative magnitude of the three are Me > Mi > Mo or Me