A box and whisker plot (or box plot) is a graph that displays the distribution of a data set using five numbers: the minimum, the first (lower) quartile, the median, the third (upper) quartile and the maximum. Recall that we defined the median in the lesson Mode and median, and quartiles in the lesson Quantiles.
Interpreting box and whisker plots
Example 1: Find the range, the interquartile range and the median of the data in the box plot below.
Solution:
Recall that we defined range and interquartile range in the lesson Other measures of dispersion.
Since the minimum value of the given data is $5$ and the maximum is $50$, the range is $R_{X} = 50 - 5 = 45$.
The lower quartile is $15$ and the upper quartile is $35$. Therefore, the interquartile range is $I_{Q} = 35 - 15 = 20$. Graphically, the interquartile range is the length of the box.
The median, marked by the line inside the box, is $25$.
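The arithmetic above can be checked with a few lines of Python. This is only a sketch: the five values are the ones read off the plot in the solution, and the variable names are chosen here for illustration.

```python
# Five-number summary read off the box plot in Example 1.
minimum, q1, median, q3, maximum = 5, 15, 25, 35, 50

data_range = maximum - minimum  # distance between the ends of the whiskers
iqr = q3 - q1                   # length of the box

print(data_range, iqr, median)  # 45 20 25
```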
Example 2: The following data represent the number of items a shop sold in one hour, recorded on each of the seven days of one week:
$$7, 13, 5, 17, 26, 20, 10.$$
Which of the box plots below represents the given data?
a)
b)
c)
Solution:
First we need to order the data points from smallest to largest:
$$5, 7, 10, 13, 17, 20, 26.$$
The minimum value is $5$ and the maximum is $26$. Therefore, a) is certainly not the answer, since the maximum value in box plot a) is $20$. Let's calculate the median of the given data.
Since the number of data points is odd ($7$), the median is the middle, i.e. the fourth, value of the ordered data:
$$M_{e} = 13.$$
As we can see, the median in box plot b) is $M_{e} = 13$, whereas the median in box plot c) is $M_{e} = 15$.
In conclusion, the correct answer is b).
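The elimination argument can be sketched in Python. The dictionary `candidates` is a hypothetical encoding that records only the plot features quoted in the solution (the maximum of a) and the medians of b) and c)); everything else about the three plots is left out because the text does not state it.

```python
from statistics import median

sales = [7, 13, 5, 17, 26, 20, 10]

# Features of the data that a matching box plot must reproduce.
observed = {"minimum": min(sales), "median": median(sales), "maximum": max(sales)}

# Only the features quoted in the solution; other values are not given in the text.
candidates = {
    "a": {"maximum": 20},
    "b": {"median": 13},
    "c": {"median": 15},
}

# Keep the plots whose known features all agree with the data.
consistent = [
    name
    for name, features in candidates.items()
    if all(observed[key] == value for key, value in features.items())
]

print(observed)    # {'minimum': 5, 'median': 13, 'maximum': 26}
print(consistent)  # ['b']
```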
Creating box and whisker plots
Example 3: Martha rolled a die $20$ times and got these results:
$$6 \ 3 \ 3 \ 6 \ 3 \ 5 \ 6 \ 1 \ 4 \ 6 $$
$$3 \ 5 \ 5 \ 2 \ 2 \ 2 \ 2 \ 3 \ 2 \ 3.$$
Draw a box plot.
Solution:
The first thing we need to do is to order the data from smallest to largest:
$$1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 6, 6.$$
Next, we need to calculate the median. Since the number of data points is even, the median is the mean of the two middle values:
$$M_{e} = \frac{x_{10}+x_{11}}{2} = \frac{3 + 3}{2} = 3.$$
After that we have to calculate the quartiles.
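As we mentioned in the lesson Quantiles, the lower quartile is the middle value of the data points to the left of the median, and the upper quartile is the middle value of the data points to the right of the median. Here each half contains $10$ data points ($x_{1}, \ldots, x_{10}$ and $x_{11}, \ldots, x_{20}$), so each quartile is the mean of the two middle values of its half. In other words, the lower quartile is
$$Q_{1} = \frac{x_{5}+x_{6}}{2} = \frac{2 + 2}{2} = 2,$$
while the upper quartile is
$$Q_{3} = \frac{x_{15}+x_{16}}{2} = \frac{5 + 5}{2} = 5.$$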
The minimum is the smallest data point, i.e. $1$. The maximum is the largest data point, i.e. $6$.
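The five numbers can be computed with a short Python function. This is a minimal sketch of the median-of-halves rule described above; the function name `five_number_summary` is made up for illustration. Leaving the middle value out of both halves when the number of data points is odd is an assumption, since the text only works through even-sized data sets. The function needs at least two data points. Statistical libraries often use interpolation rules for quartiles and may return slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]   # data points to the left of the median
    upper = data[-half:]  # data points to the right of the median
    return data[0], median(lower), median(data), median(upper), data[-1]

rolls = [6, 3, 3, 6, 3, 5, 6, 1, 4, 6,
         3, 5, 5, 2, 2, 2, 2, 3, 2, 3]

print(five_number_summary(rolls))  # (1, 2.0, 3.0, 5.0, 6)
```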
The next step is to draw an axis with a scale that fits the five numbers we obtained.
Then we need to draw a box from $Q_{1}$ to $Q_{3}$ and put a vertical line through the median. Finally, we have to draw the "whiskers". Those are the lines that extend from the box, parallel to the scale. In other words, one whisker goes from $Q_{1}$ to the minimum and the other from $Q_{3}$ to the maximum.
Finally, our box plot is:
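As a rough stand-in for the drawing steps, the plot can be sketched in plain text. The function `ascii_box_plot` and its `scale` parameter (characters per unit of the axis) are hypothetical helpers written for this illustration, not a standard plotting routine.

```python
def ascii_box_plot(minimum, q1, med, q3, maximum, scale=4):
    """Render a one-line text box plot from a five-number summary."""
    def col(x):
        # Column of value x, measured from the minimum.
        return round((x - minimum) * scale)

    line = ["-"] * (col(maximum) + 1)      # whiskers span the whole range
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                      # box from Q1 to Q3
    line[col(minimum)] = "|"               # end of the left whisker
    line[col(maximum)] = "|"               # end of the right whisker
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(med)] = "|"                   # median line inside the box
    return "".join(line)

print(ascii_box_plot(1, 2, 3, 5, 6))
# |---[===|=======]---|
```

The median sits closer to the left edge of the box because $M_{e} - Q_{1} = 1$ while $Q_{3} - M_{e} = 2$.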
Example 4: The following data are the weights (in kg) of $20$ students:
$$64, 50, 53, 89, 54, 55, 57, 75, 57, 92, 58, 61, 63, 66, 67, 70, 76, 85, 88, 95.$$
Draw a box plot.
What percentage of students weigh more than $80$ kg? What percentage weigh less than $65$ kg?
Solution:
By ordering the given set, we obtain
$$50, 53, 54, 55, 57, 57, 58, 61, 63, 64, 66, 67, 70, 75, 76, 85, 88, 89, 92, 95.$$
The minimum value is $50$, while the maximum value is $95$.
Now we can calculate the median:
$$M_{e} = \frac{64 + 66}{2} = 65.$$
Furthermore,
$$Q_{1} = \frac{57 + 57}{2} = 57, \quad Q_{3} = \frac{76 + 85}{2} = 80.5.$$
The box plot is:
$80$ kg is approximately the upper quartile $Q_{3} = 80.5$, and no student weighs between $80$ and $80.5$ kg, so the students who weigh more than $80$ kg are exactly those in the section of the box plot above $Q_{3}$.
Since each section contains $25 \%$ of the given data, we conclude that $25\%$ of the students weigh more than $80$ kg.
Similarly, $65$ kg is the median of the given data. Therefore, $50\%$ of the students weigh less than $65$ kg.
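Because the raw data are available in this example, the two percentages can also be verified by counting directly. The following sketch does so; the variable names are chosen here for illustration.

```python
weights = [64, 50, 53, 89, 54, 55, 57, 75, 57, 92,
           58, 61, 63, 66, 67, 70, 76, 85, 88, 95]

n = len(weights)

# Count the students on each side of the thresholds and convert to percent.
above_80 = 100 * sum(w > 80 for w in weights) / n
below_65 = 100 * sum(w < 65 for w in weights) / n

print(above_80, below_65)  # 25.0 50.0
```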
Comparing box and whisker plots
Example 5: The box plots below show the amount of time that men and women spend reading per day.
Using the box plots, answer the questions.
Time – men:
Time – women:
a) Approximate the interquartile range for the given box plots.
b) What percentage of men spend more than $2.5$ hours per day reading? Similarly, what percentage of women spend more than $2.5$ hours per day reading?
Solution:
a) For men:
The lower quartile is approximately $Q_{1} = 1.25$ hours per day.
The upper quartile is $Q_{3} = 2.5$ hours per day.
Therefore, the interquartile range is $I_{Q} = 2.5 - 1.25 = 1.25$ hours per day.
For women:
The lower quartile is approximately $Q_{1} = 1.25$ hours per day.
The upper quartile is $Q_{3} = 3$ hours per day.
Therefore, the interquartile range is $I_{Q} = 3 - 1.25 = 1.75$ hours per day.
b) For men:
For men, $2.5$ hours is the upper quartile $Q_{3}$, so the men who read more than $2.5$ hours per day make up the section of the box plot above $Q_{3}$. Since each section contains $25 \%$ of the given data, we conclude that $25\%$ of men spend more than $2.5$ hours per day reading.
For women:
For women, $2.5$ hours is the median $M_{e}$ of the given data. Since half of the data lies above the median, we conclude that $50\%$ of women spend more than $2.5$ hours per day reading.
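The comparison can be summarised in a short Python sketch. The quartile values are the approximate readings quoted in the solution; the men's median is not stated in the text and is therefore left out. The helper names `iqr` and `share_above` are hypothetical.

```python
# Approximate values read from the two box plots (hours per day).
men = {"q1": 1.25, "q3": 2.5}
women = {"q1": 1.25, "median": 2.5, "q3": 3.0}

# Approximate share of the data lying above each marker of a box plot.
SHARE_ABOVE = {"q1": 75, "median": 50, "q3": 25}

def iqr(summary):
    """Length of the box: upper quartile minus lower quartile."""
    return summary["q3"] - summary["q1"]

def share_above(summary, threshold):
    """Percent of data above `threshold` when it coincides with a marker."""
    for name, value in summary.items():
        if value == threshold:
            return SHARE_ABOVE[name]
    return None  # between markers, the plot alone gives no exact answer

print(iqr(men), iqr(women))                            # 1.25 1.75
print(share_above(men, 2.5), share_above(women, 2.5))  # 25 50
```

The same threshold of $2.5$ hours gives different percentages because it falls on a different marker in each plot.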