Standard deviation and variance


Let $X: S \rightarrow X(S) = \{x_{1}, x_{2}, \cdots, x_{k}\} \subseteq \mathbf{R}$ be a numeric variable defined on a finite population $S$ with $N$ elements. Furthermore, let $\{(x_{i}, f_{i}), i = 1, \cdots, k\}$ be its frequency distribution, where $f_{i}$ is the number of elements whose value is $x_{i}$.

Definition: The variance $\sigma ^{2}$ of a numeric variable on a finite population is the frequency-weighted mean of the squared deviations from the arithmetic mean $\mu$: each squared deviation $(x_{i} - \mu)^{2}$ is multiplied by the frequency $f_{i}$ of the value $x_{i}$, the products are summed, and the sum is divided by the number of elements of the population. In other words, $\sigma ^{2}$ is given by the formula:

$$\sigma ^{2} = \frac{\sum_{i = 1}^{k}f_{i}(x_{i} - \mu)^{2}}{\sum_{i = 1}^{k}f_{i}}.$$

If a variable is given as a list of individual observations $y_{1}, \cdots, y_{N}$ (a statistical sequence), each observation counts once and the variance becomes

$$\sigma ^{2} (y_{1}, \cdots, y_{N}) = \frac{\sum_{i = 1}^{N}(y_{i}- \mu (y_{1}, \cdots, y_{N}))^{2}}{N}.$$
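The two formulas above can be checked against each other numerically. The following is a minimal sketch in Python; the function names and the data set are hypothetical and chosen only for illustration, not taken from the lesson.

```python
from collections import Counter


def variance_from_frequencies(values, freqs):
    """Population variance of a frequency distribution {(x_i, f_i)}."""
    n = sum(freqs)  # number of elements, N = sum of f_i
    mu = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mu) ** 2 for x, f in zip(values, freqs)) / n


def variance_from_list(data):
    """Population variance of a raw list y_1, ..., y_N."""
    n = len(data)
    mu = sum(data) / n
    return sum((y - mu) ** 2 for y in data) / n


# Hypothetical raw data (an assumption for this sketch).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Build the frequency distribution from the raw list.
counts = Counter(data)  # maps value x_i -> frequency f_i
values = list(counts.keys())
freqs = list(counts.values())

print(variance_from_list(data))                  # 4.0
print(variance_from_frequencies(values, freqs))  # 4.0
```

Both functions divide by $N$, matching the population formulas in this lesson; a sample variance that divides by $N - 1$ is a different quantity.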


Standard deviation


The standard deviation is useful because it gives information about how far away the data is from the arithmetic mean.

Definition: Standard deviation is equal to the square root of the variance, i.e.

$$\sigma = \sqrt{\frac{\sum_{i = 1}^{k}f_{i}(x_{i} - \mu)^{2}}{\sum_{i = 1}^{k}f_{i}}}.$$


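Therefore, $\sigma$ can be read as a typical deviation of the values of the numeric variable from its arithmetic mean (more precisely, the square root of the mean squared deviation), expressed in the same units as the variable.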

It should always be interpreted together with the arithmetic mean $\mu$, or relative to it through the ratio $V = \frac{\sigma}{\mu} \cdot 100 \%$ (defined for $\mu \neq 0$). $V$ is called the coefficient of variation.
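As an illustration, the standard deviation and the coefficient of variation can be computed with Python's standard `statistics` module, whose `pstdev` function divides by $N$ as in the definition above. The helper name `coefficient_of_variation` and the data set are assumptions made for this sketch.

```python
import statistics


def coefficient_of_variation(mu, sigma):
    """Return V = sigma / mu * 100, the coefficient of variation in percent."""
    if mu == 0:
        raise ValueError("the coefficient of variation is undefined for mu = 0")
    return sigma / mu * 100


# Hypothetical data (an assumption for this sketch).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.fmean(data)      # arithmetic mean
sigma = statistics.pstdev(data)  # population standard deviation
v = coefficient_of_variation(mu, sigma)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, V = {v:.2f} %")
# mu = 5.00, sigma = 2.00, V = 40.00 %
```

Because $V$ is a ratio, it has no units, which makes it suitable for comparing the spread of variables measured on different scales.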


Chebyshev’s theorem: Let $\mu$ and $\sigma$ be the arithmetic mean and standard deviation of a numeric variable $X:S\rightarrow \mathbf{R}$. Furthermore, let $k \in \mathbf{R}, k > 1$.

Then at least $\left(1 - \frac{1}{k^{2}}\right) \cdot 100 \%$ of the elements have a value in the interval $\left<\mu - k\sigma, \mu + k\sigma\right>$. In other words, the sum of the relative frequencies of all values that belong to that interval is at least $1 - \frac{1}{k^{2}}$.




A consequence of Chebyshev’s theorem is that at least $75 \%$ of the elements of the population have a value in the interval $\left<\mu - 2 \sigma, \mu + 2 \sigma\right>$ (the case $k = 2$). Moreover, at least $\frac{8}{9} \approx 88.9 \%$ of the elements lie in the interval $\left<\mu - 3\sigma, \mu + 3\sigma\right>$ and at least $\frac{15}{16} = 93.75 \%$ of the elements lie in the interval $\left<\mu - 4 \sigma, \mu + 4 \sigma \right>$.
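The bound can also be compared with the observed share of values in the interval. The sketch below is a minimal Python illustration; the function names and the data set are hypothetical, and the interval is treated as open, following the notation used above.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem is stated for k > 1")
    return 1 - 1 / k**2


def share_within(data, k):
    """Observed share of values in the open interval (mu - k*sigma, mu + k*sigma)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    inside = sum(1 for y in data if mu - k * sigma < y < mu + k * sigma)
    return inside / len(data)


# Hypothetical data (an assumption for this sketch).
data = [2, 4, 4, 4, 5, 5, 7, 9]

for k in (2, 3, 4):
    bound = chebyshev_lower_bound(k)
    observed = share_within(data, k)
    print(f"k = {k}: bound = {bound:.4f}, observed = {observed:.4f}")
```

For any data set the observed share should be at least the bound; the bound is often far from tight because it makes no assumption about the shape of the distribution.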




Example 1: If the mean weight of a box of sugar is $750\ \mathrm{g}$ and the standard deviation is $5\ \mathrm{g}$, then the coefficient of variation is $V = \frac{5}{750} \cdot 100 \% \approx 0.67 \%$, i.e. the standard deviation is about $0.67 \%$ of the mean weight.

Example 2:  The age of the population of some country is given in the following table. Calculate $\mu$ and $\sigma$ and interpret the results.

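[Table not reproduced: age distribution in seven classes, with class midpoints $x_{i}$ from $2.5$ to $82.5$ and frequencies $f_{i}$ from $280.1$ to $216.9$, summing to $4712.3$.]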



$$\mu = \frac{\sum_{i = 1}^{7}f_{i}x_{i}}{\sum_{i = 1}^{7}f_{i}} = \frac{280.1 \cdot 2.5 + \cdots + 216.9 \cdot 82.5}{4712.3} = \frac{175708}{4712.3} \approx 37.29$$

$$\sigma = \sqrt{\frac{\sum_{i = 1}^{7}f_{i}(x_{i} - \mu)^{2}}{4712.3}} = \sqrt{\frac{2224367.07}{4712.3}} \approx 21.7$$

$\mu$ is the average age of the population, and $\sigma$ measures how far individual ages typically deviate from that average.


Example 3: (Example 3 from the lesson Arithmetic mean) Calculate the variance and standard deviation.


The variance is

$$\sigma ^{2} = \frac{\sum_{i = 1}^{k}f_{i}(x_{i} - \mu)^{2}}{\sum_{i = 1}^{k}f_{i}} = \frac{\sum_{i = 1}^{k}f_{i}(x_{i} - 22.5)^{2}}{12} = \frac{1 \cdot (10 - 22.5)^{2} + 2 \cdot (15 - 22.5)^{2} + \cdots + 1 \cdot (40 - 22.5)^{2}}{12}$$

$$= \frac{775}{12} \approx 64.58.$$

The standard deviation is

$$\sigma = \sqrt{64.58} \approx 8.04.$$

Notice that all the employees except one, i.e. $\frac{11}{12} \approx 91.67 \%$ of the employees, have a value in the interval $\left<\mu - 2 \sigma, \mu + 2 \sigma\right> = \left<6.42, 38.58\right>$.

This agrees with Chebyshev’s theorem, which guarantees that at least $75 \%$ of the data lie in that interval.