When working with a large numerical data set, it is useful to represent the whole data set with a single number, for example the arithmetic mean. Besides the mean, other measures of center of a data set are the mode and the median.
Mode
The mode, denoted by $M_{o}$, is the value of a variable with the highest frequency, that is, the value that appears most often.
Example 1: The mode of the data set $\{1, 2, 3, 4, 2, 2, 5, 2\}$ is $2$, because $2$ occurs $4$ times, which is more often than any other element.
Example 2: The data set $\{2.5, 2.6, 2.45, 3, 3.1\}$ has no mode, since every element occurs exactly once.
Note: A data set can have more than one mode. A data set with two modes is called bimodal, one with three modes trimodal, and so on.
Example 3: The data set $\{8, 8, 11, 8, 24, 13, 11, 11\}$ is bimodal: $8$ and $11$ each appear three times, and no other element appears as often.
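Counting how often each value occurs makes the definition easy to check mechanically. The sketch below is a minimal Python illustration, not a definitive implementation; the function name `modes` is hypothetical, and it assumes the convention of Example 2 that a data set in which every value occurs exactly once has no mode.

```python
from collections import Counter


def modes(data):
    """Return the sorted list of modes of `data`, or [] if there is no mode.

    Assumption (Example 2): if every value occurs exactly once, there is no mode.
    """
    counts = Counter(data)          # value -> frequency
    if not counts:
        return []                   # empty data: no mode
    highest = max(counts.values())  # highest frequency
    if highest == 1:
        return []                   # all values distinct: no mode
    # Every value that reaches the highest frequency is a mode.
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 3, 4, 2, 2, 5, 2]))       # Example 1: [2]
print(modes([2.5, 2.6, 2.45, 3, 3.1]))       # Example 2: []
print(modes([8, 8, 11, 8, 24, 13, 11, 11]))  # Example 3: [8, 11]
```

Returning a list rather than a single number keeps bimodal and trimodal data sets, such as Example 3, representable. Python's standard `statistics.multimode` also returns all most common values, but for data in which every value is distinct it returns all of them instead of an empty list, so it follows a different convention from Example 2.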
Mode of grouped data
If the values of a numeric variable are grouped into classes, the modal class is the class with the highest frequency. The mode can then be estimated with the following formula:
$$M_{o} = L_{1} + \frac{b - a}{(b - a) + (b - c)}\,l,$$
where $L_{1}$ is the lower class boundary of the modal class, $l$ its width, $a$ the frequency of the class before the modal class, $b$ the frequency of the modal class, and $c$ the frequency of the class after the modal class.
Example 4: Tom wrote the results of the sprint race for $21$ competitors and grouped them in the following table. Calculate the mode.
Solution:
The modal class is $60-64$, the class with the highest frequency, which means that
$$L_{1} = 59.5, a = 7, b = 8, c = 4, l = 5.$$
Therefore, the mode is
$$M_{o} = 59.5 + \frac{8 - 7}{(8 - 7) + (8 - 4)}\cdot 5 = 59.5 + \frac{1}{1 + 4}\cdot 5 = 59.5 + 1 = 60.5.$$
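The grouped-data formula is a single expression, so a small function reproduces Example 4. This is a minimal Python sketch; `grouped_mode` and its parameter names are hypothetical, and the caller is assumed to have already identified the modal class and the frequencies of its neighbouring classes.

```python
def grouped_mode(lower_boundary, width, a, b, c):
    """Estimate the mode of grouped data.

    lower_boundary -- L1, lower class boundary of the modal class
    width          -- l, width of the modal class
    a              -- frequency of the class before the modal class
    b              -- frequency of the modal class
    c              -- frequency of the class after the modal class
    """
    denominator = (b - a) + (b - c)
    if denominator == 0:
        # With b the highest frequency, this happens only when b == a == c.
        raise ValueError("modal class frequency equals both neighbours")
    return lower_boundary + (b - a) / denominator * width


# Example 4: modal class 60-64, with L1 = 59.5, a = 7, b = 8, c = 4, l = 5
print(grouped_mode(59.5, 5, a=7, b=8, c=4))  # 60.5
```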
Median
The median is the middle value of a data set: half of the data points are less than or equal to the median and half are greater than or equal to it. More formally:
Let $y_{1}, \ldots, y_{N}$ be an ordered statistical sequence, that is, the values of a numeric variable arranged so that $y_{1} \leq \cdots \leq y_{N}$.
Furthermore, let $r = \mathrm{Int}\left(\frac{N}{2}\right) + 1$, where $\mathrm{Int}$ denotes the integer part of a number, obtained by dropping its decimals (e.g. $\mathrm{Int}(7.9) = 7$).
First case: $N$ is odd
The median is the value of the $r$-th member of the sequence, $M_{e} = y_{r}$.
Second case: $N$ is even
The median is the average of the $(r-1)$-th and $r$-th members: $$M_{e} = \frac{y_{r-1}+y_{r}}{2}.$$
In practice, to find the median we first sort the data points from smallest to largest. If $N$ is odd, the median is the middle data point of the sorted list. If $N$ is even, the median is the average of the two middle data points.
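The definition based on $r = \mathrm{Int}\left(\frac{N}{2}\right) + 1$ translates directly into code. The following is a minimal Python sketch with the hypothetical function name `median`; because Python lists are indexed from 0, the 1-based member $y_{r}$ is `y[r - 1]`.

```python
def median(data):
    """Median of ungrouped data, following the definition with r = Int(N/2) + 1."""
    y = sorted(data)                 # y_1 <= ... <= y_N
    n = len(y)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    r = n // 2 + 1                   # Int(N/2) + 1, a 1-based index
    if n % 2 == 1:
        return y[r - 1]              # N odd: M_e = y_r
    return (y[r - 2] + y[r - 1]) / 2  # N even: M_e = (y_{r-1} + y_r) / 2


print(median([2, 5, 8, 13, 18]))                       # Example 5: 8
print(median([1, 3, 3, 2, 5, 3, 7, 7, 8, 8, 10, 11]))  # Example 6: 6.0
```

For non-negative $N$, $\mathrm{Int}\left(\frac{N}{2}\right)$ equals Python's integer division `n // 2`. The standard library's `statistics.median` uses the same convention for even $N$ (the mean of the two middle values), so it can serve as a cross-check.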
Example 5: Find the median of the following data: $2, 5, 8, 13, 18$.
Solution:
$$r = \mathrm{Int}\left(\frac{5}{2}\right) + 1 = 3 \Rightarrow M_{e} = y_{r} = y_{3} = 8.$$
Example 6: Find the mode and the median of the following data: $1, 3, 3, 2, 5, 3, 7, 7, 8, 8, 10, 11$.
Solution:
First, we sort the data from smallest to largest:
$$1, 2, 3, 3, 3, 5, 7, 7, 8, 8, 10, 11$$
$M_{o} = 3$, since the number $3$ occurs $3$ times and no other value occurs as often.
$N = 12$ is even and $r = \mathrm{Int}\left(\frac{12}{2}\right) + 1 = 7$, so $M_{e} = \frac{y_{6} + y_{7}}{2} = \frac{5 + 7}{2} = 6$.
This example shows that the median does not have to be one of the given data values.
Median of grouped data
If the values of a numeric variable are grouped into classes, the median class is the first class $[L_{1}, L_{2}]$ whose cumulative frequency is greater than or equal to $\frac{N}{2}$. The median can then be estimated with the following formula:
$$M_{e} = L_{1} + \frac{\frac{N}{2} - F(L_{1})}{f_{med}}\,l,$$
where $f_{med}$ is the frequency of the median class, $l = L_{2} - L_{1}$ its width, and $F(L_{1})$ the cumulative frequency of all classes preceding the median class.
Example 7: The following table shows the number of unemployed people in Croatia in $1999$. Calculate the median.
Solution:
$$N = 341730, \frac{N}{2} = 170865$$
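The median class is $[25, 30]$: the cumulative frequency of the classes below it is $F(25) = 115652 < 170865$, while adding its frequency $119819$ gives $115652 + 119819 = 235471 \geq 170865$. Therefore,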
$$L_{1} = 25, f_{med} = 119819, l = 5, F(L_{1}) = 115652.$$
Finally, the median is
$$M_{e} = L_{1} + \frac{\frac{N}{2} - F(25)}{f_{med}}\,l = 25 + \frac{170865 - 115652}{119819}\cdot 5 \approx 27.304.$$
In conclusion, about half of the unemployed people were younger than roughly $27.3$ years and the other half were older.
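Both steps of the grouped-data procedure, locating the first class whose cumulative frequency reaches $\frac{N}{2}$ and then interpolating inside it, fit in a few lines. This is a minimal Python sketch; `median_formula` and `grouped_median` are hypothetical names, and `grouped_median` assumes classes of equal width listed in increasing order. The usage line plugs the values from Example 7 directly into the formula.

```python
def median_formula(lower_boundary, width, n, cum_freq, f_med):
    """M_e = L1 + (N/2 - F(L1)) / f_med * l."""
    return lower_boundary + (n / 2 - cum_freq) / f_med * width


def grouped_median(lower_boundaries, width, freqs):
    """Find the median class of a frequency table, then apply the formula.

    Assumes classes of equal width, listed in increasing order, where
    lower_boundaries[i] is the lower boundary of the class with freqs[i].
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    cumulative = 0  # F(L1): total frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n / 2:  # first class whose cumulative frequency reaches N/2
            return median_formula(lower, width, n, cumulative, f)
        cumulative += f
    raise ValueError("frequencies and class boundaries do not match")


# Example 7: L1 = 25, l = 5, N = 341730, F(25) = 115652, f_med = 119819
print(round(median_formula(25, 5, 341730, 115652, 119819), 3))  # 27.304
```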
Note: Geometrically, the mode is the point on the base (horizontal axis) of the frequency polygon at which the polygon reaches its highest value, and the median is the point on the base at which a perpendicular line divides the area under the polygon into two parts of equal area.