data(andy)
m_andy <- lm(sales ~ price + advert + I(advert^2), andy)
round(coef(m_andy), 3)
#> (Intercept) price advert I(advert^2)
#> 109.719 -7.640 12.151 -2.768
16 Interaction Terms
Reading. SW 8.2–8.3, HGL 5.6
Every multiple-regression coefficient so far has been a constant partial effect. In Big Andy’s burger model, \(\beta_3\) was the effect of advertising on sales, the same at every level of advertising.
This chapter lets marginal effects vary, using two devices that are still ordinary OLS. Polynomials (\(x^2\)) make an effect depend on its own level; interactions (\(x_2 \times x_3\)) make an effect depend on another variable. Once marginal effects can vary, we can finally do genuine economic optimization.
This chapter sits in the multiple-regression sequence. It builds directly on multiple regression and the functional forms we met in simple regression; the next chapter gives us a way to test whether a whole block of these curvature and interaction terms is worth keeping.
16.1 Polynomials in multiple regression
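A linear model forces a constant advertising effect \(\beta_3\). Adding the square of advertising relaxes that restriction: \[ \text{SALES} = \beta_1 + \beta_2\,\text{PRICE} + \beta_3\,\text{ADVERT} + \beta_4\,\text{ADVERT}^2 + e, \qquad \frac{\partial\,\E(\text{SALES})}{\partial\,\text{ADVERT}} = \beta_3 + 2\beta_4\,\text{ADVERT} . \] With \(\beta_3 > 0\) and \(\beta_4 < 0\), each extra dollar of advertising adds less to sales than the one before.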
Crucially, this is a multiple regression: ADVERT and ADVERT\(^2\) are two distinct regressors. In simple regression we could only fit one of them at a time; here both enter the same equation.
Big Andy’s, with diminishing returns
Estimating the quadratic model on Big Andy’s data gives \[
\widehat{\text{SALES}} = 109.72 - 7.640\,\text{PRICE}
+ \underset{(3.556)}{12.151}\,\text{ADVERT}
- \underset{(0.941)}{2.768}\,\text{ADVERT}^2 ,
\] with standard errors in parentheses below the advertising coefficients. Both signs come out as expected, and the ADVERT\(^2\) term is statistically significant (\(t = -2.768/0.941 \approx -2.94\)).
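To put numbers on the diminishing returns, the marginal effect of advertising, \(b_3 + 2 b_4\,\text{ADVERT}\), can be evaluated at a low and a high advertising level. The following is a minimal sketch that types in the rounded estimates reported above rather than refitting the model; the helper name `me_advert` and the two evaluation points (0.5 and 2.0) are choices made here for illustration.

```r
# Rounded estimates from the fitted quadratic model (typed in by hand)
b_advert  <- 12.151   # coefficient on ADVERT
b_advert2 <- -2.768   # coefficient on ADVERT^2

# Marginal effect of advertising on sales at advertising level a ($000)
me_advert <- function(a) b_advert + 2 * b_advert2 * a

me_low  <- me_advert(0.5)   # low advertising
me_high <- me_advert(2.0)   # high advertising
round(c(low = me_low, high = me_high), 2)
```

The effect of an extra unit of advertising is about 9.38 at ADVERT = 0.5 but only about 1.08 at ADVERT = 2, which is the flattening that the quadratic term builds in.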
library(ggplot2)  # `ucla` (a list of colours) is assumed to come from the book's setup code

b <- coef(m_andy)
price_bar <- mean(andy$price)

# Fitted sales at advert = a, holding price at its sample mean
sales_hat <- function(a) b[1] + b[2] * price_bar + b[3] * a + b[4] * a^2
me <- function(a) b[3] + 2 * b[4] * a  # marginal effect at advert = a

curve_df <- data.frame(a = seq(0.2, 3, length.out = 200))
curve_df$s <- sales_hat(curve_df$a)

# Tangent line to the fitted curve at a0, drawn between lo and hi
tangent <- function(a0, lo, hi) {
  aa <- seq(lo, hi, length.out = 2)
  data.frame(a = aa, s = sales_hat(a0) + me(a0) * (aa - a0))
}
t_steep <- tangent(0.5, 0.2, 0.95)
t_flat  <- tangent(2.0, 1.55, 2.45)

ggplot(curve_df, aes(a, s)) +
  geom_line(color = ucla$blue, linewidth = 1) +
  geom_line(data = t_steep, aes(a, s),
            linetype = "dashed", color = ucla$red) +
  geom_line(data = t_flat, aes(a, s),
            linetype = "dashed", color = ucla$red) +
  annotate("text", x = 1.05, y = sales_hat(0.5),
           label = "steep", color = ucla$red, size = 3.4, hjust = 0) +
  annotate("text", x = 2.0, y = sales_hat(2.0) + 1.3,
           label = "flat", color = ucla$red, size = 3.4) +
  scale_x_continuous(breaks = c(0.5, 2)) +
  labs(x = "ADVERT ($000)", y = "SALES")
Polynomials are everywhere in economics
Cost and product curves are inherently curved, and polynomials capture that curvature while staying linear in the parameters.
A U-shaped average cost curve is a quadratic, \[ \text{AC} = \beta_1 + \beta_2 Q + \beta_3 Q^2 + e, \qquad \text{slope } \beta_2 + 2\beta_3 Q , \] where we expect \(\beta_2 < 0\) (cost falls at first) and \(\beta_3 > 0\) (cost eventually rises). An S-shaped total cost curve is a cubic, \[ \text{TC} = \alpha_1 + \alpha_2 Q + \alpha_3 Q^2 + \alpha_4 Q^3 + e, \qquad \text{marginal cost } = \alpha_2 + 2\alpha_3 Q + 3\alpha_4 Q^2 . \]
A polynomial coefficient is not a slope. Always report the marginal effect \(dy/dx\) evaluated at chosen values of \(x\).
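As an illustration of reporting slopes rather than raw coefficients, the sketch below fits a quadratic average cost curve and evaluates its slope at three output levels. The data are simulated: the "true" coefficients (50, -4, 0.2), the output grid, and the names `m_ac`, `slope_ac`, and `q_min` are all hypothetical and not taken from the text.

```r
set.seed(1)

# Hypothetical data: a U-shaped average cost curve plus noise
Q  <- seq(1, 20, length.out = 200)
AC <- 50 - 4 * Q + 0.2 * Q^2 + rnorm(length(Q), sd = 1)

# Quadratic in Q, still linear in the parameters, so plain OLS works
m_ac <- lm(AC ~ Q + I(Q^2))
b_ac <- coef(m_ac)

# Slope of AC at output q: beta2 + 2 * beta3 * q
slope_ac <- function(q) unname(b_ac["Q"] + 2 * b_ac["I(Q^2)"] * q)
slopes <- sapply(c(5, 10, 15), slope_ac)

# Output level where the slope is zero (minimum average cost)
q_min <- unname(-b_ac["Q"] / (2 * b_ac["I(Q^2)"]))
round(c(slopes, q_min = q_min), 2)
```

With these simulated data the slope is negative at low output, close to zero near \(Q = 10\), and positive beyond it. Neither coefficient shows that pattern on its own.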
One practical wrinkle: \(x\) and \(x^2\) can be highly correlated, which sometimes inflates their standard errors. This is the collinearity problem in disguise.
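A quick way to see the problem is to compute the correlation between a regressor and its square. The sketch below uses a hypothetical, evenly spaced grid of advertising levels (not the Big Andy’s data). It also shows centering, one common remedy that this chapter does not itself prescribe.

```r
# Hypothetical advertising levels on an evenly spaced grid
x <- seq(0.5, 3, length.out = 50)

# The raw level and its square move almost in lockstep
cor_raw <- cor(x, x^2)

# Centering x before squaring breaks that link
xc <- x - mean(x)
cor_centered <- cor(xc, xc^2)

round(c(raw = cor_raw, centered = cor_centered), 3)
```

On a symmetric grid like this one the centered correlation is essentially zero; with real, skewed data it typically shrinks without vanishing. Centering leaves the fitted curve unchanged, but the coefficient on the linear term then measures the slope at the mean of \(x\) rather than at \(x = 0\).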
16.2 Interaction terms
A polynomial lets an effect depend on its own level. An interaction lets it depend on another variable. We build one by including the product of two regressors: \[ y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4\,(x_2 \times x_3) + e . \] Differentiating, each variable’s marginal effect now slides with the other: \[ \frac{\partial\,\E(y)}{\partial x_2} = \beta_2 + \beta_4\,x_3, \qquad \frac{\partial\,\E(y)}{\partial x_3} = \beta_3 + \beta_4\,x_2 . \]
\(\beta_4\) is the effect of raising both \(x_2\) and \(x_3\), above and beyond the sum of their separate effects. If \(\beta_4 = 0\), the two effects are additive and separable. If not, they either reinforce each other (\(\beta_4 > 0\)) or offset each other (\(\beta_4 < 0\)).
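The sliding marginal effect can be sketched on simulated data. Everything in the block below is hypothetical: the sample size, the "true" coefficients (2, 1, and -0.15), and the names `m_int` and `me_x2`. The one R detail worth knowing is that `x2 * x3` in a formula expands to both main effects plus the product term, whose coefficient is labelled `x2:x3`.

```r
set.seed(42)
n  <- 500
x2 <- runif(n, 0, 10)
x3 <- runif(n, 0, 10)

# Hypothetical truth: beta2 = 2, beta3 = 1, beta4 = -0.15
y <- 5 + 2 * x2 + 1 * x3 - 0.15 * x2 * x3 + rnorm(n)

# x2 * x3 expands to x2 + x3 + x2:x3
m_int <- lm(y ~ x2 * x3)
b_int <- coef(m_int)

# Marginal effect of x2 at a chosen value v of x3: beta2 + beta4 * v
me_x2 <- function(v) unname(b_int["x2"] + b_int["x2:x3"] * v)
round(sapply(c(low = 2, mid = 5, high = 8), me_x2), 2)
```

Because the simulated \(\beta_4\) is negative, the estimated effect of \(x_2\) shrinks as \(x_3\) rises, the offsetting case described above.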
The motivating case: age × income → pizza
Does the effect of income on pizza spending depend on age? Write the model as \[
\text{PIZZA} = \beta_1 + \beta_2\,\text{AGE} + \beta_3\,\text{INCOME}
+ \beta_4\,(\text{AGE}\times\text{INCOME}) + e .
\] The marginal effect of income is then \[
\frac{\partial\,\E(\text{PIZZA})}{\partial\,\text{INCOME}}
= \beta_3 + \beta_4\,\text{AGE} .
\] If \(\beta_4 < 0\), an extra dollar of income raises pizza spending less for older people.
With the interaction present, \(\beta_3\) alone is the income effect only at \(\text{AGE} = 0\).
A worked interaction: education and experience
Do education and experience reinforce each other in the labor market? Interact them in a wage equation: \[ \text{WAGE} = \beta_1 + \beta_2\,\text{EDUC} + \beta_3\,\text{EXPER} + \beta_4\,(\text{EDUC}\times\text{EXPER}) + e . \] OLS on the CPS data gives
data(cps5_small)
m_wage <- lm(wage ~ educ + exper + I(educ * exper), cps5_small)
round(coef(m_wage), 6)
#> (Intercept) educ exper I(educ * exper)
#> -18.759265 2.655739 0.238374 -0.002747
so that \[
\widehat{\text{WAGE}} = -18.76 + 2.656\,\text{EDUC} + 0.2384\,\text{EXPER}
- 0.002747\,(\text{EDUC}\times\text{EXPER}) .
\] The return to an extra year of experience is \[
\frac{\partial\text{WAGE}}{\partial\text{EXPER}}
= 0.2384 - 0.002747\,\text{EDUC},
\] which is about $0.22/hr at \(\text{EDUC} = 8\) and $0.19/hr at \(\text{EDUC} = 16\). The small (and here statistically insignificant) negative estimate of \(\beta_4\) hints that more schooling makes an extra year of experience slightly less valuable.
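The same interaction coefficient also governs the other marginal effect, the return to schooling, \(\partial\text{WAGE}/\partial\text{EDUC} = b_2 + b_4\,\text{EXPER}\). The sketch below evaluates it from the estimates printed above, typed in by hand; the helper name `me_educ` and the experience levels (0, 10, 30 years) are chosen here for illustration.

```r
# Estimates reported above (typed in by hand, not refitted)
b_educ  <- 2.655739    # coefficient on EDUC
b_inter <- -0.002747   # coefficient on EDUC x EXPER

# Return to one more year of schooling at a given level of experience
me_educ <- function(exper) b_educ + b_inter * exper

exper_grid <- c(0, 10, 30)
returns <- me_educ(exper_grid)
round(setNames(returns, paste0("exper_", exper_grid)), 3)
```

The return to a year of schooling slides from about $2.66/hr with no experience to about $2.57/hr at 30 years, a small change that matches the small interaction estimate.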
bw <- coef(m_wage)

# Return to a year of experience as a function of education
me_df <- data.frame(educ = seq(0, 21, length.out = 100))
me_df$me <- bw["exper"] + bw["I(educ * exper)"] * me_df$educ
pts <- data.frame(educ = c(8, 16))
pts$me <- bw["exper"] + bw["I(educ * exper)"] * pts$educ

ggplot(me_df, aes(educ, me)) +
  geom_line(color = ucla$blue, linewidth = 1) +
  geom_point(data = pts, aes(educ, me), color = ucla$darkblue, size = 2.4) +
  geom_segment(data = pts,
               aes(x = educ, xend = educ, y = 0, yend = me),
               linetype = "dashed", color = ucla$gray) +
  scale_x_continuous(breaks = c(0, 8, 16, 21)) +
  labs(x = "EDUC (years)",
       y = "return to a year of EXPER ($/hr)")
Binary interactions: a preview
Interactions are even more common when one of the variables is a 0/1 indicator. Interacting a dummy with a continuous \(x\) gives the two groups different slopes, while a dummy on its own merely shifts the intercept.
| Model | Effect |
|---|---|
| \(y = \beta_1 + \beta_2 x + \beta_3 D\) | different intercepts, same slope |
| \(y = \beta_1 + \beta_2 x + \beta_3 D + \beta_4 (x \times D)\) | different intercepts and slopes |
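The two rows of the table can be sketched with simulated data. All names and numbers in the block below are hypothetical: a 0/1 indicator `D`, a slope of 1.0 for the `D = 0` group, and a slope of 1.8 for the `D = 1` group.

```r
set.seed(7)
n <- 400
x <- runif(n, 0, 10)
D <- rbinom(n, 1, 0.5)   # hypothetical 0/1 group indicator

# Hypothetical truth: slope 1.0 when D = 0, slope 1.0 + 0.8 when D = 1
y <- 2 + 1.0 * x + 3 * D + 0.8 * x * D + rnorm(n)

m_shift <- lm(y ~ x + D)   # different intercepts, same slope
m_slope <- lm(y ~ x * D)   # different intercepts and slopes

# Group-specific slopes implied by the interaction model
b_d <- coef(m_slope)
slope_d0 <- unname(b_d["x"])
slope_d1 <- unname(b_d["x"] + b_d["x:D"])
round(c(D0 = slope_d0, D1 = slope_d1), 2)
```

The first model can only shift the line up or down for the `D = 1` group. The second recovers two distinct slopes, with the coefficient on `x:D` measuring the gap between them.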
For example, does the return to the student
Indicator variables get their own chapter
16.3 Economic optimization
A constant slope can never have an interior optimum.
Big Andy’s optimal advertising
From the quadratic ADVERT model, the marginal revenue of $1 more advertising is \(\beta_3 + 2\beta_4\,\text{ADVERT}\). The marginal cost of $1 of advertising is exactly $1. Setting them equal and solving for the optimal advertising level, \[ \beta_3 + 2\beta_4\,\text{ADVERT}_0 = 1 \quad\Longrightarrow\quad \text{ADVERT}_0 = \frac{1 - \beta_3}{2\beta_4} . \] Plugging in the estimates, \[ \widehat{\text{ADVERT}}_0 = \frac{1 - 12.151}{2(-2.768)} = 2.014 \;\Rightarrow\; \text{optimal} \approx \$2{,}014/\text{month}. \]
b3 <- coef(m_andy)["advert"]
b4 <- coef(m_andy)["I(advert^2)"]
advert0 <- (1 - b3) / (2 * b4)
round(advert0, 3)
#> advert
#> 2.014
The optimum is a nonlinear function of the coefficients
Notice that \(\widehat{\text{ADVERT}}_0 = (1 - b_3)/(2 b_4)\) divides one estimator by another. That makes it a nonlinear function of the coefficients, so the tidy variance rule for linear combinations no longer applies exactly.
The delta method approximates the standard error of a smooth function \(g(b_3, b_4)\) using its derivatives and the estimated variances and covariance of \(b_3\) and \(b_4\).
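For the advertising optimum, \(g(b_3, b_4) = (1 - b_3)/(2 b_4)\), the calculation can be sketched by hand. The point estimates and standard errors below are the rounded values reported earlier. The covariance between \(b_3\) and \(b_4\) (\(-3.2887\)) and the 71 residual degrees of freedom are not shown in this chapter and are assumptions here; in practice both would be read from the fitted model with `vcov(m_andy)` and `df.residual(m_andy)`.

```r
# Point estimates reported above
b3 <- 12.151
b4 <- -2.768

# Variances are the squared standard errors reported above.
# The covariance is an ASSUMED value (not shown in this chapter);
# in practice take all three from vcov(m_andy).
v33 <- 3.556^2
v44 <- 0.941^2
v34 <- -3.2887

g  <- (1 - b3) / (2 * b4)        # estimated optimum
d3 <- -1 / (2 * b4)              # derivative of g with respect to b3
d4 <- -(1 - b3) / (2 * b4^2)     # derivative of g with respect to b4

# Delta method: var(g) is approximately d3^2 v33 + d4^2 v44 + 2 d3 d4 v34
se_g <- sqrt(d3^2 * v33 + d4^2 * v44 + 2 * d3 * d4 * v34)

# 95% interval, ASSUMING 71 residual degrees of freedom
ci <- g + c(-1, 1) * qt(0.975, df = 71) * se_g
round(c(estimate = g, se = se_g, lower = ci[1], upper = ci[2]), 3)
```

Under these assumptions the standard error comes out at roughly 0.13 (in $000), giving an interval of roughly $1,750 to $2,275. The three terms under the square root nearly cancel, so the last digits are sensitive to rounding in the inputs. That sensitivity is a reason to work from the unrounded `vcov()` output.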
The same idea answers the question “how many years of experience maximize wages?”
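As a sketch of how that question would be set up, suppose wages are quadratic in experience. This specification and the symbols \(\gamma_j\) are assumptions made for illustration, not a model estimated in this chapter: \[ \text{WAGE} = \gamma_1 + \gamma_2\,\text{EDUC} + \gamma_3\,\text{EXPER} + \gamma_4\,\text{EXPER}^2 + e . \] Setting the marginal effect of experience to zero gives \[ \gamma_3 + 2\gamma_4\,\text{EXPER}_0 = 0 \quad\Longrightarrow\quad \text{EXPER}_0 = \frac{-\gamma_3}{2\gamma_4} , \] which is a maximum when \(\gamma_4 < 0\). Like \(\text{ADVERT}_0\), this is a ratio of coefficients, so its standard error would again come from the delta method, now with derivatives \[ \frac{\partial\,\text{EXPER}_0}{\partial \gamma_3} = \frac{-1}{2\gamma_4}, \qquad \frac{\partial\,\text{EXPER}_0}{\partial \gamma_4} = \frac{\gamma_3}{2\gamma_4^2} . \]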
16.4 Recap
This chapter let marginal effects vary while staying inside ordinary OLS.
- Polynomials. Adding \(x^2\) makes the marginal effect \(\beta_3 + 2\beta_4 x\), which varies with \(x\)’s own level. For Big Andy’s, the advertising effect fell from \(9.38\) to \(1.08\) as ADVERT rose from 0.5 to 2: diminishing returns. Cost and product curves are natural polynomials; always report the slope at chosen values of \(x\).
- Interactions. Adding \(x_2 \times x_3\) makes \(\partial y / \partial x_2 = \beta_2 + \beta_4 x_3\), so one variable’s effect slides with another (age × income → pizza, educ × exper → wage). Never read a “main effect” in isolation.
- Optimization. Setting marginal benefit equal to marginal cost, \(\beta_3 + 2\beta_4\,\text{ADVERT}_0 = 1\), gives Big Andy’s optimum of $2,014 with a 95% interval of \([\$1{,}757,\ \$2{,}271]\). Because the optimum is a nonlinear function of the coefficients, its standard error comes from the delta method.
- Binary interactions. A dummy interacted with a continuous variable gives groups different slopes, which is the topic of dummy variables.
Next time: is the whole curvature-or-interaction block worth keeping? Testing several coefficients at once (\(\beta_4 = 0\) and \(\beta_5 = 0\)) needs the \(F\)-test.