Random variables

Motivation

Random variables are very important in probability and statistics. Although they are often confused with traditional variables, they are conceptually different from any mathematical variable we have seen before.

A random variable is a variable whose numerical value is determined by the outcome of a random procedure. Formally, it is a function from the sample space (the set of all possible outcomes of a random experiment) to the set of real numbers. It assigns a number to each outcome, and the probabilities of the outcomes then carry over to those numbers.

Random variables are widely used in practice and in many branches of research, including biology, economics, physics and psychology. They provide a structure for making inferences about the world, especially when it is impossible to measure things comprehensively.

Definition

A function $X : \Omega \rightarrow \mathbb{R}$, where $\Omega$ is the set of all possible outcomes (the sample space), is called a random variable. We want to know the probability that $X$ takes a certain value. Specifically, for $a \in \mathbb{R}$ we look at the event

$$\{X=a\}=X^{-1}(\{a\})=\{\omega \in \Omega : X(\omega)=a\}$$

More generally, for any $B \subseteq \mathbb{R}$ we want to know the probability of the event

$$\{X \in B\}=X^{-1}(B)=\{\omega \in \Omega : X(\omega) \in B\}$$
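When $\Omega$ is finite, the preimage $X^{-1}(B)$ can be computed directly by keeping exactly those outcomes $\omega$ with $X(\omega) \in B$. Below is a minimal Python sketch of that definition, assuming an illustrative sample space of two coin flips with $X$ the number of heads; the function name `preimage` is hypothetical, and $B$ is passed as a membership test so that infinite sets such as intervals also work.

```python
from typing import Callable, Iterable, Set

def preimage(omega: Iterable[str], X: Callable[[str], float],
             in_B: Callable[[float], bool]) -> Set[str]:
    """Return X^{-1}(B) = {w in omega : X(w) in B}; B is given as a membership test."""
    return {w for w in omega if in_B(X(w))}

# Illustrative sample space: two coin flips, X(w) = number of heads.
omega = {"HH", "HT", "TH", "TT"}

def X(w: str) -> int:
    return w.count("H")

print(sorted(preimage(omega, X, lambda x: x == 1)))       # ['HT', 'TH']
print(sorted(preimage(omega, X, lambda x: x in {1, 2})))  # ['HH', 'HT', 'TH']
```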

Suppose a random variable $X$ can take $n \in \mathbb{N}$ different values $a_{1}, \dots, a_{n}$, and denote $P(X = a_{j}) = p_{j}$ for $j\in\{1,\dots,n\}$. As we already know from previous lessons, the probabilities $p_{j}$ must satisfy

1. $0 \leq p_{j} \leq 1$, for each $j$

2. $p_{1}+p_{2}+\dots+p_{n}=1$
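The two conditions are easy to check mechanically. Here is a small Python sketch, assuming the probabilities are given as a list; it uses `fractions.Fraction` for exact arithmetic, since floating-point sums can be off by a tiny rounding error. The name `is_valid_pmf` is hypothetical.

```python
from fractions import Fraction

def is_valid_pmf(probs):
    """Check both conditions: 0 <= p_j <= 1 for each j, and p_1 + ... + p_n = 1."""
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

print(is_valid_pmf([Fraction(1, 2), Fraction(1, 2)]))   # True
print(is_valid_pmf([Fraction(1, 2), Fraction(1, 3)]))   # False: the sum is 5/6
print(is_valid_pmf([Fraction(3, 2), Fraction(-1, 2)]))  # False: 3/2 > 1
```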

Let’s use a coin flip as an example once more. The possible outcomes of flipping a coin are heads and tails. One possible random variable would be:

$\bullet $ $X=1$ if the flip of the coin is a head
$\bullet $ $X=0$ if the flip of the coin is a tail

You may wonder how we came up with the numbers $0$ and $1$. The choice is arbitrary; we could just as well write

$\bullet $ $X=35$ if the flip of the coin is a head
$\bullet $ $X=88$ if the flip of the coin is a tail

and $X$ would still be a random variable describing the same experiment. However, the first choice is more intuitive.
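To see that the choice of numbers changes only the labels and not the probabilities, here is a minimal Python sketch that stores each random variable as a mapping from outcomes to values and collects $P(X = a)$ for every value $a$. It assumes a fair coin; the names `distribution` and `X_alt` and the dictionary representation are illustrative choices, not notation from this lesson.

```python
from fractions import Fraction

# Assumption: a fair coin, so each outcome has probability 1/2.
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Two different random variables on the same sample space {H, T}.
X = {"H": 1, "T": 0}
X_alt = {"H": 35, "T": 88}

def distribution(rv, P):
    """Collect P(X = a) for every value a that the random variable takes."""
    dist = {}
    for outcome, value in rv.items():
        dist[value] = dist.get(value, 0) + P[outcome]
    return dist

print(distribution(X, P))      # {1: Fraction(1, 2), 0: Fraction(1, 2)}
print(distribution(X_alt, P))  # {35: Fraction(1, 2), 88: Fraction(1, 2)}
```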

Let’s flip a coin 4 times. A random variable $X$ counting the number of heads could look like this:

$\bullet $ $X=0$ if no heads
$\bullet $ $X=1$ if 1 head
$\bullet $ $X=2$ if 2 heads
$\bullet $ $X=3$ if 3 heads
$\bullet $ $X=4$ if 4 heads

In short, we assigned a numerical value to each outcome of the experiment. The set of values that the random variable $X$ can take is $\{0,1,2,3,4\}$.
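The example can be made concrete by listing all $2^4 = 16$ possible sequences of four flips and applying $X$ to each one. The Python sketch below assumes a fair coin, so that every sequence is equally likely (the lesson itself does not fix the coin’s bias), and counts how many sequences lead to each value of $X$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2**4 = 16 sequences of four flips, e.g. ('H', 'T', 'T', 'H').
omega = list(product("HT", repeat=4))

def X(outcome):
    """Random variable: the number of heads in the sequence."""
    return outcome.count("H")

# Assumption: fair coin, so every sequence has probability 1/16.
counts = Counter(X(w) for w in omega)
pmf = {k: Fraction(counts[k], len(omega)) for k in sorted(counts)}
print(pmf)
# {0: Fraction(1, 16), 1: Fraction(1, 4), 2: Fraction(3, 8), 3: Fraction(1, 4), 4: Fraction(1, 16)}
```

Note that several outcomes map to the same value (for instance, four different sequences give $X=1$), which is exactly why $X$ is a function on outcomes rather than a relabelling of them.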

Difference between traditional and random variables

A variable is a symbol that represents a quantity. Variables are useful in mathematics because they let us prove statements without fixing a specific value. As a result, we can make a general statement over a whole range of values of that variable.
For example, in the equation $3x+2=14$, the traditional variable $x$ equals $4$, and that is the only value $x$ can take.

A random variable, on the other hand, takes values according to some probability distribution. In other words, its value is subject to randomness or chance. In the example above, the random variable $X$ can take any of the values $0,1,2,3,4$, and which one it takes is determined by chance. Moreover, each value can have a different probability.

Discrete or continuous variables

Random variables come in two varieties: discrete or continuous.

A discrete random variable is one that can take only a countable number of distinct (separate) values. In particular, if a random variable can take only finitely many distinct values, it is discrete. Discrete random variables are usually counts.

A continuous random variable, on the other hand, can take any value in a given interval. Its possible values form a continuum, so there are uncountably many of them.

Example. $X=$ the year a student was born

The possible values are years, for example 1987, 1995, 2005 and so on. Years are separate values that we can count. Therefore, $X$ is a discrete random variable.

Other examples are the number of heads when we toss a coin 4 times and the number of children in a family.

Example. $X=$ the weight of a randomly chosen animal in a zoo

The possible values are all numbers in the interval from the lowest to the highest possible weight, for example from the weight of an ant to the weight of an elephant. The weight of a random animal could be, say, $123.456$, but it could just as well be $123.456789$. Between any two possible weights there are infinitely many others, so we can’t count all the possible values. Therefore, $X$ isn’t a discrete random variable; it is continuous.

Other examples are height, temperature and the amount of sugar in a particular fruit.

Probability of a random variable

When solving probability problems about the 4 coin flips above, we would ask questions like

1. What is the probability of getting exactly 1 head?
2. What is the probability of getting more than 3 heads?
3. What is the probability of getting fewer than 2 heads?

Instead of defining the event $A=\{$getting exactly 1 head$\}$, writing $P(A)$ and so on, we write:

1. $P(X=1)$
2. $P(X>3)$
3. $P(X<2)$

Random variables let us ask questions about a random process in a concise mathematical way.
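Each of these questions is the probability of an event $\{X \in B\}$, so all three can be answered with one helper that counts the favourable outcomes. The Python sketch below assumes 4 flips of a fair coin (all 16 sequences equally likely); the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption: 4 flips of a fair coin, so all 16 sequences are equally likely.
omega = list(product("HT", repeat=4))
X = {w: w.count("H") for w in omega}  # X(w) = number of heads in w

def prob(in_B):
    """P(X in B), where B is given as a test on the value of X."""
    favourable = sum(1 for w in omega if in_B(X[w]))
    return Fraction(favourable, len(omega))

print(prob(lambda x: x == 1))  # P(X = 1) = 1/4
print(prob(lambda x: x > 3))   # P(X > 3) = 1/16
print(prob(lambda x: x < 2))   # P(X < 2) = 5/16
```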

In general, the probability that a random variable takes a particular value is written as

$P(X = $value$) =$ probability of that value

Proposition. 

(a) Let $g:\mathbb{R} \rightarrow \mathbb{R}$ be a function and $X$ a random variable. Their composition $g\circ X$ (written $g(X)$ for short) is also a random variable, since $g\circ X: \Omega \rightarrow \mathbb{R}$, $\omega \mapsto g(X(\omega))$.

(b) Let $h:\mathbb{R}^{2} \rightarrow \mathbb{R}$ be a function and $X, Y$ random variables on the same $\Omega$. Their composition $h\circ (X,Y)$ (written $h(X,Y)$ for short) is also a random variable, since $h\circ (X,Y): \Omega \rightarrow \mathbb{R}$, $\omega \mapsto h(X(\omega), Y(\omega))$.

For example, $7X$, $X^{2}$, $3X+12Y$, $XY^{2}$ and $\sin X\cdot e^{X+Y}$ are all random variables.
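The proposition says that composing with $g$ or $h$ again gives a function from $\Omega$ to $\mathbb{R}$. Here is a minimal Python sketch of that idea, assuming a hypothetical sample space of two dice rolls with $X$ the first die and $Y$ the second; the helpers `compose` and `compose2` are illustrative names, not a standard API.

```python
import math

# Hypothetical sample space: ordered results (a, b) of rolling two dice.
omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def X(w):
    return w[0]  # value of the first die

def Y(w):
    return w[1]  # value of the second die

def compose(g, X):
    """g(X): the random variable w -> g(X(w))."""
    return lambda w: g(X(w))

def compose2(h, X, Y):
    """h(X, Y): the random variable w -> h(X(w), Y(w))."""
    return lambda w: h(X(w), Y(w))

seven_X = compose(lambda x: 7 * x, X)                                  # 7X
Z = compose2(lambda x, y: 3 * x + 12 * y, X, Y)                        # 3X + 12Y
W = compose2(lambda x, y: math.sin(x) * math.exp(x + y), X, Y)         # sin X * e^(X+Y)

w = (2, 5)
print(seven_X(w), Z(w))  # 14 66
```

Each composed object is again an ordinary function of the outcome $\omega$, so it can be evaluated on every element of `omega` just like $X$ and $Y$.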

Another important part of random variables is their distribution, which we’ll cover in the next lesson.

Example

There are 4 defective light bulbs in a package of 10. Two are randomly selected without replacement.

Let $X$ be the number of defective bulbs among the two selected. Since we are picking two, there are 3 possibilities: neither is defective, exactly one is defective, or both are defective. Consequently, $X$ takes values in $\{0,1,2\}$. Now let’s calculate the probability of each value.

$X=0$ means that the first bulb we picked was one of the 6 non-defective ones, which happens with probability $\frac{6}{10}$. For the second pick, 5 non-defective bulbs were left among the remaining 9. Therefore,

$\displaystyle{P(X=0)= \frac{6}{10} \cdot \frac{5}{9}=\frac{30}{90}=\frac{1}{3}}$

Similarly, for $X=2$ both picks must be defective, which gives us

$\displaystyle{P(X=2)= \frac{4}{10} \cdot \frac{3}{9}=\frac{12}{90}=\frac{2}{15}}$

For $X=1$, however, there are two possibilities: either the first bulb we picked wasn’t defective and the second one was, or the first one was defective and the second one wasn’t. We write this as:

$\displaystyle{P(X=1)= \frac{6}{10} \cdot \frac{4}{9} + \frac{4}{10} \cdot \frac{6}{9}=\frac{48}{90}=\frac{8}{15}}$

Alternatively, we could have used the complement to find $P(X=1)$, since picking exactly one defective bulb is the complement of “picking none or picking two”:

$$P(X=1)=1-P(X=0)-P(X=2)=1-\frac{1}{3}-\frac{2}{15}=\frac{8}{15}$$
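As a check on the whole example, we can enumerate every ordered way of drawing two distinct bulbs (there are $10 \cdot 9 = 90$) and count the defective ones in each draw. In this Python sketch the bulbs are labelled $0$ to $9$ and, as an arbitrary assumption, bulbs $0$ to $3$ are the defective ones; any other choice of 4 labels gives the same result.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Label bulbs 0..9; bulbs 0-3 are the 4 defective ones (the labels are arbitrary).
bulbs = range(10)
defective = {0, 1, 2, 3}

# Ordered selections of 2 distinct bulbs (without replacement): 10 * 9 = 90 outcomes.
omega = list(permutations(bulbs, 2))
counts = Counter(sum(b in defective for b in pair) for pair in omega)

pmf = {k: Fraction(counts[k], len(omega)) for k in sorted(counts)}
print(pmf)  # {0: Fraction(1, 3), 1: Fraction(8, 15), 2: Fraction(2, 15)}
assert pmf[1] == 1 - pmf[0] - pmf[2]  # the complement rule holds
```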