\( \newcommand{\E}{\mathbb{E}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\given}{\,\vert\,} \newcommand{\indic}[1]{\mathbf{1}\!\left\{#1\right\}} \newcommand{\pmf}{\text{p.m.f.}} \newcommand{\pdf}{\text{p.d.f.}} \newcommand{\cdf}{\text{c.d.f.}} \)

2  Random Variables & Distributions

Reading. Hill, Griffiths & Lim (5th ed.), Probability Primer, P.1–P.2.

A dataset is a sample drawn from a larger population. To learn about the population from the sample, we first need a language for uncertainty: a way to talk about outcomes before we have seen them. That language is the random variable, and this chapter builds it from scratch: what a random variable is, the two flavors it comes in (discrete and continuous), and the three functions we use to describe one (the pmf, the pdf, and the cdf).

This is the first of three chapters that assemble the probability toolkit we need for inference. Here we set up distributions; the next chapter summarizes a distribution with a single number (expectation); the one after introduces the Normal distribution and the Central Limit Theorem.

2.1 Random variables

Random variable

A random variable is a variable whose value is unknown until it is observed; that is, a numerical outcome that is not perfectly predictable.

Everyday examples are everywhere: the score you will get on the next exam, tomorrow’s value of a stock-market index, the number of games the football team wins next season, the wage of a randomly selected worker. None of these is known in advance, yet each is a number we can reason about.

Notation

We write random variables with uppercase letters (\(X, Y, W\)) and the particular values they take with lowercase letters (\(x, y, w\)). So “\(X = x\)” reads: the random variable \(X\) takes the value \(x\).

Why economists care

Think of the population of California adults. Pick one person at random and record their education level. The outcome is not deterministic (different people have different levels of education), so education is a random variable. Its distribution tells us the probability that a randomly drawn person falls in each category, for example \[ \Prob(\text{bachelor's degree}) \approx 0.225 . \]

But what is a probability? The probability of an outcome is its long-run relative frequency. Saying \(\Prob(\text{bachelor's}) \approx 0.225\) means that across many random draws, about \(22.5\%\) of those drawn hold a bachelor’s degree.
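To see what "long-run relative frequency" means in practice, here is a minimal R sketch. It assumes the population share is exactly 0.225 (the figure quoted above) and simulates independent random draws; the seed, the number of draws, and the variable names are illustrative choices, not part of the reading.

```r
# Simulate random draws from a population in which 22.5% of people hold a
# bachelor's degree, and track the relative frequency as draws accumulate.
set.seed(1)          # arbitrary seed, only so the run is reproducible
p_true  <- 0.225     # assumed population share
n_draws <- 100000    # illustrative number of draws

has_ba <- rbinom(n_draws, size = 1, prob = p_true)  # 1 = bachelor's, 0 = not

# Relative frequency of bachelor's degrees among the first n draws, for all n
running_freq <- cumsum(has_ba) / seq_len(n_draws)

# Compare the relative frequency after 10, 100, 1,000 and 100,000 draws
running_freq[c(10, 100, 1000, 100000)]
```

With few draws the relative frequency can be far from 0.225; as the number of draws grows it tends to settle close to it.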

The econometric problem in one sentence

We rarely know the true distribution. Econometrics uses a random sample to make inferences about the underlying distribution.

2.2 Discrete vs. continuous

Every random variable comes with an outcome space \(\mathcal{O}_X\): the set of all values it can take. The single most important distinction in this chapter is whether that set is countable or not.

Discrete vs. continuous
  • A random variable is discrete if its outcome space \(\mathcal{O}_X\) is countable: think of a list, possibly infinite.
  • It is continuous if \(\mathcal{O}_X\) is uncountable: a whole interval of values.

Outcome spaces for discrete and continuous variables.

| Discrete | Continuous |
|---|---|
| Coin flip: \(\{H, T\}\) | Sprint time (s): \([9.5,\,10.5]\) |
| Die roll: \(\{1,2,3,4,5,6\}\) | Income: \([0, \infty)\) |
| Number of doctor visits: \(\{0,1,2,\dots\}\) | Interest rate, GDP, \(\dots\) |

Indicator variables

A yes/no answer (“college graduate?”) is a special discrete variable taking only the values \(\{0, 1\}\). We will use these constantly to encode qualitative traits, and they return in force when we study dummy variables.

We describe discrete and continuous variables with different tools (a mass function for the discrete case and a density function for the continuous case), so we take them in turn.

2.3 Discrete distributions: the pmf

For a discrete random variable, the distribution is captured by the probability mass function.

Probability mass function (pmf)

The pmf of a discrete random variable \(X\) assigns to each possible value \(x\) the probability that \(X\) equals exactly that value: \[ f_X(x) \;=\; \Prob(X = x). \]

Two rules every pmf obeys

\[ \text{(1)}\quad 0 \le f_X(x) \le 1 \qquad\qquad \text{(2)}\quad \sum_{x \in \mathcal{O}_X} f_X(x) = 1 . \]

To get the probability of a set of outcomes \(A\), just add up the masses: \[ \Prob(X \in A) = \sum_{x \in A} f_X(x). \]

Example: a fair die

Let \(X\) be the result of a fair die roll. Its pmf is \[ f_X(x) = \begin{cases} \tfrac{1}{6} & x \in \{1,2,3,4,5,6\}\\[2pt] 0 & \text{otherwise.} \end{cases} \]

What is the probability of an even roll, \(A = \{2,4,6\}\)? We follow the rule for the probability of a set: \[ \Prob(X \in \{2,4,6\}) = f_X(2)+f_X(4)+f_X(6) = \tfrac{1}{6}+\tfrac{1}{6}+\tfrac{1}{6} = \tfrac{1}{2}. \]

The answer is obvious here, but the procedure is what matters. With a loaded die we would follow exactly the same steps. Plotting the pmf, every bar has the same height \(\tfrac{1}{6}\), and each bar’s height is a probability (Figure 2.1).

Show the R code
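# Assumes ggplot2 is loaded and `ucla` is a list of colors (with entries
# blue, darkblue, gray) defined in setup code that is not shown here.
die <- data.frame(x = 1:6, p = 1/6)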
ggplot(die, aes(x, p)) +
  geom_col(fill = ucla$blue, color = ucla$darkblue, width = 0.6) +
  scale_x_continuous(breaks = 1:6) +
  scale_y_continuous(
    limits = c(0, 0.25),
    breaks = c(0, 1/6),
    labels = c("0", "1/6")
  ) +
  labs(x = "x", y = expression(f[X](x)))
Figure 2.1: The pmf of a fair die. Each bar’s height is a probability.
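The same steps carry over to a die that is not fair. The sketch below uses a made-up loaded die in which a six is three times as likely as any other face (these masses are an assumption chosen for illustration), checks the two pmf rules, and adds up masses to get the probability of an even roll. The helper names `is_valid_pmf` and `prob_in` are hypothetical.

```r
x        <- 1:6
f_fair   <- rep(1/6, 6)                 # the fair die from the example
f_loaded <- c(1, 1, 1, 1, 1, 3) / 8     # hypothetical loaded die

# Rule (1): every mass lies in [0, 1]. Rule (2): the masses sum to one.
is_valid_pmf <- function(f) {
  all(f >= 0 & f <= 1) && isTRUE(all.equal(sum(f), 1))
}

# P(X in A): add up the masses of the outcomes that belong to A.
prob_in <- function(A, f) sum(f[x %in% A])

is_valid_pmf(f_fair)
is_valid_pmf(f_loaded)

even <- c(2, 4, 6)
p_even_fair   <- prob_in(even, f_fair)    # 1/6 + 1/6 + 1/6 = 1/2
p_even_loaded <- prob_in(even, f_loaded)  # 1/8 + 1/8 + 3/8 = 5/8
```

Only the vector of masses changes between the two dice; the procedure is identical.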

The pmf as a table

A discrete distribution is often easiest to read as a table. Consider \(X\) with \[ f_X(1)=0.1,\quad f_X(2)=0.2,\quad f_X(3)=0.3,\quad f_X(4)=0.4 . \] The probabilities are non-negative and sum to one, so this is a valid pmf.

Show the R code
pmf_tab <- data.frame(x = c(1, 2, 3, 4, "sum"),
                      fx = c(0.1, 0.2, 0.3, 0.4, 1.0))
knitr::kable(pmf_tab, col.names = c("$x$", "$f_X(x)$"), align = "cc")
Table 2.1: A discrete distribution, written as a table.

| \(x\) | \(f_X(x)\) |
|:---:|:---:|
| 1 | 0.1 |
| 2 | 0.2 |
| 3 | 0.3 |
| 4 | 0.4 |
| sum | 1.0 |

Show the R code
d <- data.frame(x = factor(1:4), p = c(0.1, 0.2, 0.3, 0.4))
ggplot(d, aes(x, p)) +
  geom_col(fill = ucla$blue, color = ucla$darkblue, width = 0.6) +
  scale_y_continuous(limits = c(0, 0.5)) +
  labs(x = "x", y = expression(f[X](x)))
Figure 2.2: The same distribution as a bar chart.

We return to this \(X\) below when we build its cdf.

A special case: the indicator (Bernoulli) variable

The most important discrete variable in this course takes only two values, \(0\) and \(1\). It is called an indicator (or dummy, or Bernoulli) variable, and it encodes a yes/no trait.

Bernoulli(p) distribution

Let \(D = 1\) if a randomly drawn person is a college graduate and \(D = 0\) if not. With \(p = \Prob(D = 1)\), the pmf is \[ f_D(d)= \begin{cases} p & d = 1\\[2pt] 1-p & d = 0\\[2pt] 0 & \text{otherwise.} \end{cases} \] A single number, \(p\), says everything.

Indicators encode qualitative traits (sex, race, treatment status, whether a policy is in place), which is why they are so useful in applied work.

A preview

The mean of a \(0/1\) variable is just the proportion of ones: \(\E[D] = p\). We show this in the next chapter; it is why regressions on indicators recover group shares and treatment effects (see dummy variables and treatment effects).

Show the R code
p <- 0.3
bern <- data.frame(d = factor(c(0, 1)), prob = c(1 - p, p))
ggplot(bern, aes(d, prob)) +
  geom_col(fill = ucla$blue, color = ucla$darkblue, width = 0.5) +
  scale_y_continuous(limits = c(0, 1)) +
  labs(x = "d", y = expression(f[D](d)))
Figure 2.3: A Bernoulli(0.3) variable: all the mass sits on 0 and 1.
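The preview above claims that the mean of a \(0/1\) variable is the share of ones. A quick simulation makes this plausible before we prove it. This is only a sketch: it reuses \(p = 0.3\) from the figure, and the seed and sample size are arbitrary illustrative choices.

```r
set.seed(2)      # arbitrary seed, for reproducibility only
p <- 0.3         # P(D = 1), as in Figure 2.3
n <- 50000       # illustrative number of simulated people

# Each draw is 1 (college graduate) with probability p and 0 otherwise.
D <- rbinom(n, size = 1, prob = p)

# The sample mean of a 0/1 variable is exactly the share of ones ...
share_ones <- sum(D == 1) / n
mean_D     <- mean(D)

# ... and with many draws that share should be close to p.
c(share_ones = share_ones, mean_D = mean_D, p = p)
```

The first two numbers coincide by construction; both should be near 0.3 because the sample is large.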

2.4 Continuous distributions: the pdf

For a continuous random variable we cannot use a pmf. Why not?

The key fact

A continuous variable can take uncountably many values, so the probability of any single exact value is zero: \[ \Prob(X = x) = 0 \quad\text{for every } x. \]

Instead we describe the distribution with a probability density function. Probabilities become areas under the density.

Probability density function (pdf)

The pdf \(f_X(x)\) of a continuous random variable gives probabilities as areas: \[ \Prob(a \le X \le b) \;=\; \int_a^b f_X(x)\,dx . \]

Notation note. Following HGL we write \(f_X\) for both the discrete pmf and the continuous pdf. Same symbol, different meaning: for a discrete variable \(f_X(x)\) is a probability, while for a continuous variable it is a density, and only the area under it is a probability.

Density is not probability

A density \(f_X(x)\) can exceed \(1\), because it is not a probability. Only the area under it is. Figure 2.4 shows a probability as the shaded area under a density curve between two points \(a\) and \(b\).

What makes \(f_X\) a valid pdf

\[ f_X(x) \ge 0 \qquad\text{and}\qquad \int_{-\infty}^{\infty} f_X(x)\,dx = 1 . \] The total area under any density is one: the continuous analog of “the masses sum to one.”

Because single points carry zero probability, endpoints don’t matter: \[ \Prob(a \le X \le b) = \Prob(a < X < b). \]

Show the R code
xs  <- seq(-3.5, 3.5, length.out = 400)
dat <- data.frame(x = xs, y = dnorm(xs))
sh  <- subset(dat, x >= -1 & x <= 1.5)
ggplot(dat, aes(x, y)) +
  geom_area(data = sh, aes(x, y), fill = ucla$blue, alpha = 0.30) +
  geom_line(color = ucla$blue, linewidth = 1) +
  geom_segment(aes(x = -1,  xend = -1,  y = 0, yend = dnorm(-1)),
               linetype = "dashed", color = ucla$gray) +
  geom_segment(aes(x = 1.5, xend = 1.5, y = 0, yend = dnorm(1.5)),
               linetype = "dashed", color = ucla$gray) +
  annotate("text", x = 0.25, y = 0.16,
           label = "P(a <= X <= b)", color = ucla$darkblue, size = 3.4) +
  scale_x_continuous(breaks = c(-1, 1.5), labels = c("a", "b")) +
  scale_y_continuous(limits = c(0, 0.45)) +
  labs(x = "x", y = expression(f[X](x)))
Figure 2.4: For a continuous variable, probability is the area under the density between \(a\) and \(b\).
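These claims can be checked numerically. The sketch below reuses the curve from the figure's code (the standard Normal density, with \(a = -1\) and \(b = 1.5\)) and base R's numerical integrator `integrate()`. The Uniform on \([0, 0.5]\) is an extra illustration, not taken from the text above, of a density whose height exceeds \(1\) even though its total area is \(1\).

```r
# Probability as area: integrate the standard Normal density from a to b.
a <- -1
b <- 1.5
area_ab <- integrate(dnorm, lower = a, upper = b)$value   # about 0.77

# A valid pdf has total area one.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Density is not probability: the Uniform[0, 0.5] density has height 2 ...
height <- dunif(0.25, min = 0, max = 0.5)

# ... yet the area under it is still one (width 0.5 times height 2).
area_unif <- integrate(dunif, lower = 0, upper = 0.5,
                       min = 0, max = 0.5)$value

c(area_ab = area_ab, total_area = total_area,
  height = height, area_unif = area_unif)
```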

Example: the Uniform\([0,1]\) distribution

Let \(X\) be uniform on \([0,1]\), with density \[ f_X(x) = \begin{cases} 1 & 0 \le x \le 1\\ 0 & \text{otherwise.} \end{cases} \]

What is \(\Prob(0 \le X \le 0.5)\)? We integrate the density over the interval: \[ \Prob(0 \le X \le 0.5) = \int_{0}^{0.5} f_X(x)\,dx = \int_{0}^{0.5} 1 \, dx = 0.5 . \] The area is just a rectangle: width \(0.5\) times height \(1\) equals \(0.5\). Half the probability sits in the left half of the interval, which is exactly what “uniform” means (Figure 2.5).

Show the R code
xs    <- seq(-0.4, 1.4, length.out = 300)
dens  <- ifelse(xs >= 0 & xs <= 1, 1, 0)
curve_df <- data.frame(x = xs, f = dens)
shade <- data.frame(x = c(0, 0, 0.5, 0.5), y = c(0, 1, 1, 0))
ggplot() +
  geom_polygon(data = shade, aes(x, y), fill = ucla$blue, alpha = 0.30) +
  geom_line(data = curve_df, aes(x, f), color = ucla$blue, linewidth = 1) +
  annotate("text", x = 0.25, y = 0.5, label = "0.5", color = ucla$darkblue) +
  scale_x_continuous(breaks = c(0, 0.5, 1)) +
  scale_y_continuous(limits = c(0, 1.3), breaks = 1) +
  labs(x = "x", y = expression(f[X](x)))
Figure 2.5: The Uniform[0,1] density. The shaded rectangle has area 0.5.
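As a rough check on the rectangle calculation, we can compare the integral with a simulation. In this sketch `dunif()` and `runif()` default to the Uniform\([0,1]\); the seed and the number of draws are arbitrary illustrative choices.

```r
# Exact answer: area under the Uniform[0,1] density between 0 and 0.5.
p_exact <- integrate(dunif, lower = 0, upper = 0.5)$value

# Simulated answer: share of uniform draws that land in [0, 0.5].
set.seed(3)                 # arbitrary seed, for reproducibility only
u     <- runif(100000)
p_sim <- mean(u >= 0 & u <= 0.5)

c(p_exact = p_exact, p_sim = p_sim)
```

The simulated share should be close to, but generally not exactly equal to, the integral.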

2.5 The cdf: the common language

Both discrete and continuous variables share one common summary: the cumulative distribution function, which accumulates probability from \(-\infty\) up to \(x\).

Cumulative distribution function (cdf)

\[ F_X(x) \;=\; \Prob(X \le x). \]

  • Discrete: \(\displaystyle F_X(x) = \sum_{t \le x} f_X(t)\)
  • Continuous: \(\displaystyle F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt\)

Properties of any cdf

  • \(F_X\) is non-decreasing, with \(\displaystyle\lim_{x\to-\infty}F_X(x)=0\) and \(\displaystyle\lim_{x\to+\infty}F_X(x)=1\).
  • \(0 \le F_X(x) \le 1\).

Why the cdf is so useful

The cdf turns “probability of an interval” into simple subtraction.

The interval and complement rules

\[ \Prob(a < X \le b) \;=\; F_X(b) - F_X(a), \qquad \Prob(X > a) \;=\; 1 - F_X(a). \]

This is exactly how we will read probabilities off statistical tables and software later in the course (Normal and \(t\) probabilities, for instance). We almost never integrate by hand; we look up or compute cdf values.
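In R, cdf values come from the `p`-prefixed functions: `pnorm()` for the Normal and `pt()` for the \(t\) distribution. The sketch below applies the interval and complement rules; the cutoffs and the 10 degrees of freedom are arbitrary values chosen for illustration.

```r
# Standard Normal: P(a < Z <= b) = F(b) - F(a)
a <- -1
b <- 1.5
p_interval <- pnorm(b) - pnorm(a)

# Complement rule: P(Z > a) = 1 - F(a)
p_upper <- 1 - pnorm(a)

# R can also return the upper tail directly.
p_upper_direct <- pnorm(a, lower.tail = FALSE)

# The same subtraction works for any cdf, e.g. a t distribution with 10 df.
p_t <- pt(2, df = 10) - pt(-2, df = 10)

c(p_interval = p_interval, p_upper = p_upper,
  p_upper_direct = p_upper_direct, p_t = p_t)
```

No integral is computed by hand here: each probability is a difference of two cdf values.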

The cdf of a discrete variable: a step function

Take the table from before, with masses \(0.1, 0.2, 0.3, 0.4\) on the values \(x = 1, 2, 3, 4\). Accumulating, \[ F_X(1)=0.1,\quad F_X(2)=0.3,\quad F_X(3)=0.6,\quad F_X(4)=1.0 . \]

Reading the cdf

\[ \Prob(X \le 2) = F_X(2) = 0.1 + 0.2 = 0.3. \] The cdf is defined even at a value \(X\) cannot take: \(F_X(2.5) = \Prob(X \le 2.5) = 0.3\). The complement rule gives \(\Prob(X > 2) = 1 - F_X(2) = 0.7\).

The discrete cdf jumps at each possible value, and the size of the jump at \(x\) equals \(f_X(x)\) (Figure 2.6). The closed dots show the value attained at each jump; the open dots show the limit from the left.

Show the R code
seg <- data.frame(
  x    = c(-0.2, 1,   2,   3,   4),
  xend = c(1,    2,   3,   4,   5),
  y    = c(0,    0.1, 0.3, 0.6, 1.0)
)
closed <- data.frame(x = 1:4, y = c(0.1, 0.3, 0.6, 1.0))
open   <- data.frame(x = 1:4, y = c(0.0, 0.1, 0.3, 0.6))
ggplot() +
  geom_segment(data = seg, aes(x = x, xend = xend, y = y, yend = y),
               color = ucla$blue, linewidth = 1) +
  geom_point(data = closed, aes(x, y), color = ucla$blue, size = 2.4) +
  geom_point(data = open, aes(x, y), shape = 21, fill = "white",
             color = ucla$blue, size = 2.4, stroke = 1) +
  scale_x_continuous(breaks = 1:4, limits = c(-0.2, 5)) +
  scale_y_continuous(breaks = c(0, 0.1, 0.3, 0.6, 1), limits = c(0, 1.05)) +
  labs(x = "x", y = expression(F[X](x)))
Figure 2.6: The cdf of a discrete variable is a step function; each jump equals \(f_X(x)\).
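The accumulation can be reproduced in a few lines of R. This is a minimal sketch for the table distribution above; the function name `F_X` is our own choice, written directly from the definition \(F_X(t) = \sum_{x \le t} f_X(x)\).

```r
x <- 1:4
f <- c(0.1, 0.2, 0.3, 0.4)

# cdf at the possible values: running sum of the masses
F_vals <- cumsum(f)

# cdf at any point t: add the masses of all values x that are <= t
F_X <- function(t) sapply(t, function(s) sum(f[x <= s]))

F_X(2)        # P(X <= 2)
F_X(2.5)      # same value: no mass lies between 2 and 2.5
1 - F_X(2)    # complement rule, P(X > 2)

# The jump sizes of the cdf recover the pmf.
jumps <- diff(c(0, F_vals))
```

Evaluating `F_X` below 1 gives 0 and at or above 4 gives 1, matching the flat ends of the step function.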

The cdf of a continuous variable: a smooth curve

For the Uniform\([0,1]\), accumulate the area from the left: \[ F_X(x) = \begin{cases} 0 & x < 0\\ x & 0 \le x \le 1\\ 1 & x > 1. \end{cases} \]

Let’s check the interval rule: \[ \Prob(0.2 < X \le 0.7) = F_X(0.7) - F_X(0.2) = 0.7 - 0.2 = 0.5. \] The cdf of a continuous variable is itself continuous: it has no jumps, because single points carry no probability, so there is nothing to jump by. Its slope is the density, \(F_X'(x) = f_X(x)\), wherever the derivative exists (Figure 2.7).

Show the R code
xs <- seq(-0.4, 1.4, length.out = 300)
cdf_df <- data.frame(x = xs, F = pmin(pmax(xs, 0), 1))
ggplot(cdf_df, aes(x, F)) +
  geom_line(color = ucla$blue, linewidth = 1) +
  geom_segment(aes(x = 0.2, xend = 0.2, y = 0,   yend = 0.2),
               linetype = "dashed", color = ucla$gray) +
  geom_segment(aes(x = 0,   xend = 0.2, y = 0.2, yend = 0.2),
               linetype = "dashed", color = ucla$gray) +
  geom_segment(aes(x = 0.7, xend = 0.7, y = 0,   yend = 0.7),
               linetype = "dashed", color = ucla$gray) +
  geom_segment(aes(x = 0,   xend = 0.7, y = 0.7, yend = 0.7),
               linetype = "dashed", color = ucla$gray) +
  scale_x_continuous(breaks = c(0, 0.2, 0.7, 1)) +
  scale_y_continuous(breaks = c(0, 0.2, 0.7, 1), limits = c(0, 1.05)) +
  labs(x = "x", y = expression(F[X](x)))
Figure 2.7: The cdf of the Uniform[0,1] rises smoothly from 0 to 1.
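Both facts can be checked with base R's `punif()` (the Uniform cdf) and `dunif()` (the Uniform density), which default to the interval \([0,1]\). In this sketch the evaluation point `x0` and step size `h` are arbitrary illustrative choices; the slope is approximated by a central difference rather than computed exactly.

```r
# Interval rule with the Uniform[0,1] cdf
a <- 0.2
b <- 0.7
p_interval <- punif(b) - punif(a)   # F(0.7) - F(0.2)

# Slope of the cdf at an interior point, by a central finite difference
x0 <- 0.4
h  <- 1e-6
slope <- (punif(x0 + h) - punif(x0 - h)) / (2 * h)

# The density at the same point
dens <- dunif(x0)

c(p_interval = p_interval, slope = slope, density = dens)
```

At the kinks \(x = 0\) and \(x = 1\) the cdf is continuous but not differentiable, so the slope comparison is only meaningful at interior points.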

2.6 Recap

A random variable is a numerical outcome that is unknown until observed, described by its distribution. The distinction between discrete and continuous drives which tool we use:

Discrete vs. continuous distributions at a glance.

| | Discrete (countable \(\mathcal{O}_X\)) | Continuous (interval \(\mathcal{O}_X\)) |
|---|---|---|
| Describe with | pmf: \(f_X(x) = \Prob(X = x)\) | pdf: area under the curve gives probability |
| Normalization | \(\sum_x f_X(x) = 1\) | \(\int_{-\infty}^{\infty} f_X(x)\,dx = 1\) |
| Probabilities | \(\Prob(X \in A) = \sum_{x \in A} f_X(x)\) | \(\Prob(a \le X \le b) = \int_a^b f_X(x)\,dx\) |

And both share the cdf as a common language: \[ F_X(x) = \Prob(X \le x), \qquad \Prob(a < X \le b) = F_X(b) - F_X(a). \]

Next time: summarizing a distribution with a single number, starting with expectation, then variance and covariance.