15 Hypothesis Testing in Multiple Regression
Reading. SW 7.1, 7.3; HGL 5.4–5.5
We now have Big Andy’s coefficients and their standard errors: \[ \widehat{\text{SALES}} = \underset{(6.35)}{118.91} \ \underset{(1.096)}{-\,7.908}\,\text{PRICE} \ \underset{(0.683)}{+\,1.863}\,\text{ADVERT}. \]
The good news is that the inference machinery is exactly what we built for simple regression in confidence intervals and hypothesis testing.
The new power that multiple regression gives us is the ability to test and estimate combinations of several coefficients at once.
15.1 \(t\)-tests on a single coefficient
Under the multiple-regression assumptions (MR1 onward, including normally distributed errors), the \(t\)-statistic for a single coefficient satisfies
\[
t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-K)}
\qquad\text{under } H_0:\beta_k = c .
\] The only difference from simple regression is that the degrees of freedom are \(N-K\), where \(K\) counts every estimated coefficient, including the intercept. For Big Andy, \(N = 75\) and \(K = 3\), so \(N-K = 72\).
Everything else carries over unchanged: the rejection region is determined by the direction of the alternative \(H_1\) (\(>,<,\neq\)); the \(p\)-value rule is still “reject if \(p \le \alpha\)”; and we still say “reject” or “fail to reject,” never “accept.”
Tests of significance
The default test reported for each coefficient is \(H_0:\beta_k = 0\) against \(H_1:\beta_k\neq0\). At \(\alpha = 0.05\) with 72 degrees of freedom, the two-tail critical value is \(t_{(0.975,\,72)} = 1.993\).
For price, \[
t = \frac{-7.908}{1.096} = -7.22,
\] and since \(|{-7.22}| > 1.993\) (with \(p < 0.001\)) we reject \(H_0\): sales depend on price. For advertising, \[
t = \frac{1.863}{0.683} = 2.73,
\] and \(2.73 > 1.993\) (with \(p = 0.008\)), so we again reject: sales depend on advertising. Both regressors are statistically significant. The \(t\)’s and \(p\)’s printed in the regression table are exactly these tests, computed automatically. Running the regression in R makes this concrete: the t value and Pr(>|t|) columns are the significance tests.
library(PoEdata)  # companion data package for HGL; provides the andy data set
data(andy)
fit <- lm(sales ~ price + advert, data = andy)
summary(fit)$coefficients
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 118.913610 6.3516375 18.721725 2.214293e-29
#> price -7.907854 1.0959930 -7.215241 4.423997e-10
#> advert 1.862584 0.6831955 2.726283 8.038182e-03
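The table's significance tests can also be reproduced by hand. The sketch below is a minimal base-R illustration that uses only the rounded estimates and standard errors quoted above together with \(N-K = 72\); it does not need the data, and because of rounding its \(t\)-values differ from the table in the third decimal.

```r
# Rounded estimates and standard errors quoted in the text
b   <- c(price = -7.908, advert = 1.863)
se  <- c(price = 1.096,  advert = 0.683)
dof <- 72                                  # N - K = 75 - 3

t_stat <- b / se                           # H0: beta_k = 0
t_crit <- qt(0.975, dof)                   # two-tail 5% critical value, about 1.993
p_two  <- 2 * pt(-abs(t_stat), dof)        # two-tail p-values

print(round(cbind(t = t_stat, p = p_two), 4))
print(t_crit)
print(abs(t_stat) > t_crit)                # TRUE means reject H0 for that regressor
```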
One-tail tests of economic hypotheses
The interesting questions are usually directional, and the rule of thumb is to put the conjecture you want to establish in the alternative \(H_1\). Two of Big Andy’s questions illustrate the point, and they use the one-tail critical value \(t_{(0.95,\,72)} = 1.666\).
The first asks whether demand is price-elastic. Because SALES measures revenue, a price cut raises revenue exactly when \(\beta_2 < 0\), which is what elastic demand means here. We set \(H_0:\beta_2 \ge 0\) against \(H_1:\beta_2 < 0\) and compute \[ t = \frac{-7.908}{1.096} = -7.22 < -1.666 . \] We reject: demand is price-elastic, so a price cut raises revenue.
The second asks whether advertising is profitable, meaning each extra dollar of advertising returns more than a dollar in sales, \(\beta_3 > 1\). We set \(H_0:\beta_3 \le 1\) against \(H_1:\beta_3 > 1\) and compute \[ t = \frac{1.863 - 1}{0.683} = 1.26 < 1.666 . \] Here we fail to reject: the data cannot prove that advertising returns more than it costs.
The advertising case is the cautionary one. The point estimate \(b_3 = 1.86 > 1\) looks profitable, but once we account for its imprecision (\(\mathrm{se} = 0.68\)), the data cannot rule out \(\beta_3 \le 1\). Statistical significance is not the same as economic decisiveness, and a point estimate on its own is not evidence for a sharp claim.
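The two one-tail tests differ only in the hypothesized value and the tail in which we reject. A minimal base-R sketch, again using the rounded numbers from the text, makes both explicit; the one-tail \(p\)-values come from `pt()`, with `lower.tail = FALSE` for the right-tail test.

```r
dof    <- 72
t_crit <- qt(0.95, dof)                         # one-tail 5% critical value, about 1.666

# Elastic demand: H0: beta2 >= 0 vs H1: beta2 < 0 (reject in the left tail)
t_price <- (-7.908 - 0) / 1.096
p_price <- pt(t_price, dof)                     # left-tail p-value

# Profitable advertising: H0: beta3 <= 1 vs H1: beta3 > 1 (reject in the right tail)
t_adv <- (1.863 - 1) / 0.683
p_adv <- pt(t_adv, dof, lower.tail = FALSE)     # right-tail p-value

data.frame(
  test   = c("beta2 < 0", "beta3 > 1"),
  t      = round(c(t_price, t_adv), 3),
  p      = signif(c(p_price, p_adv), 3),
  reject = c(t_price < -t_crit, t_adv > t_crit)
)
```

The advertising row shows the cautionary case numerically: a positive \(t\) of about 1.26, but a right-tail \(p\)-value well above 0.05.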
15.2 Confidence intervals in multiple regression
Interval estimation is identical to what we did in confidence intervals, with the only change again being the degrees of freedom. For a 95% interval, \[ b_k \pm t_c\,\mathrm{se}(b_k), \qquad t_c = t_{(1-\alpha/2,\,N-K)} = t_{(0.975,\,72)} = 1.993 . \]
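The interval formula is simple enough to apply directly to the quoted estimates. This base-R sketch is an illustration using the rounded coefficients and standard errors, not the regression object; the name `ci_hand` is chosen here only to keep it distinct from the plotting data frame used later.

```r
b   <- c(price = -7.908, advert = 1.863)   # rounded estimates from the text
se  <- c(price = 1.096,  advert = 0.683)   # rounded standard errors
t_c <- qt(0.975, df = 72)                  # about 1.993

# 95% intervals b_k +/- t_c * se(b_k), one row per coefficient
ci_hand <- cbind(lower = b - t_c * se, upper = b + t_c * se)
round(ci_hand, 2)
```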
For the price coefficient \(\beta_2\), \[
-7.908 \pm 1.993\,(1.096) = [-10.09,\ -5.72].
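\] In dollar terms, a $1 price cut raises mean revenue by between $5,723 and $10,093 (or, for a dime, by between $572 and $1,009). For the advertising coefficient \(\beta_3\), \[ 1.863 \pm 1.993\,(0.683) = [0.50,\ 3.22]. \] In R, confint() reproduces both intervals: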
confint(fit)
#> 2.5 % 97.5 %
#> (Intercept) 106.251852 131.575368
#> price -10.092676 -5.723032
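#> advert 0.500659 3.224510
Figure 15.1 shows the two intervals side by side, with a dashed line at 0 and a dotted line at 1. The width is the story: relative to its point estimate, the advertising interval is much wider than the price interval (its width, 2.72, exceeds \(b_3\) itself, while the price interval's width, 4.37, is only about half of \(|b_2|\)), and it straddles the break-even value 1. That imprecision is the same one that made the “\(\beta_3 > 1\)” test inconclusive.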
library(ggplot2)
# `ucla` (a named list of colors) is assumed to be defined in the notes' setup code
ci <- data.frame(
  coef = c("price", "advert"),
  est  = c(-7.908, 1.863),
  lo   = c(-10.09, 0.50),
  hi   = c(-5.72, 3.22)
)
ci$coef <- factor(ci$coef, levels = c("price", "advert"))
ggplot(ci, aes(est, coef)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = ucla$gray) +  # no effect
  geom_vline(xintercept = 1, linetype = "dotted", color = ucla$red) +   # break-even for advertising
  geom_errorbarh(aes(xmin = lo, xmax = hi), height = 0.18,
                 color = ucla$blue, linewidth = 1) +
  geom_point(color = ucla$darkblue, size = 2.6) +
  labs(x = "coefficient value", y = NULL)
The remedy. The wide interval for \(\beta_3\) is a precision problem, and the fix is the one discussed under variance and collinearity in multiple regression: more data, and more independent variation in advertising.
15.3 Linear combinations of coefficients
Real decisions move several regressors at once, so the natural quantity to estimate is a weighted sum of coefficients. Big Andy plans to drop price 40 cents and raise advertising $800 at the same time. The resulting change in expected sales is a linear combination of the coefficients: \[ \lambda = -0.4\,\beta_2 + 0.8\,\beta_3 . \]
Because the OLS estimator is BLUE, the corresponding combination of estimates, \(\hat\lambda = \sum_k c_k b_k\), is itself the best linear unbiased estimator of \(\lambda\), so the point estimate is immediate: \[ \hat\lambda = -0.4(-7.908) + 0.8(1.863) = 4.65 . \] Expected sales rise by about $4,653 from the combined strategy.
The standard error needs the covariances
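To attach a standard error to \(\hat\lambda\) we need its variance, and the variance of a combination of several estimators carries a covariance term for every pair of estimates: \[ \mathrm{var}\Bigl(\sum_k c_k b_k\Bigr) = \sum_k c_k^2\,\mathrm{var}(b_k) + 2\sum_{j<k} c_j c_k\,\mathrm{cov}(b_j, b_k) . \]
For \(\hat\lambda = -0.4\,b_2 + 0.8\,b_3\), using Big Andy’s estimated variance-covariance matrix of the coefficients, \[ \widehat{\mathrm{var}}(\hat\lambda) = (-0.4)^2\,\widehat{\mathrm{var}}(b_2) + (0.8)^2\,\widehat{\mathrm{var}}(b_3) + 2(-0.4)(0.8)\,\widehat{\mathrm{cov}}(b_2, b_3) \approx 0.5036, \] so \(\mathrm{se}(\hat\lambda) = \sqrt{0.5036} \approx 0.71\).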
A 90% interval uses \(t_c = t_{(0.95,\,72)} = 1.666\): \[ 4.65 \pm 1.666\,(0.71) = [\,3.47,\ 5.84\,] \;\Rightarrow\; \$3{,}471 \text{ to } \$5{,}835 . \]
If you ignore the covariance term and add only the scaled variances, you get the standard error wrong. Here the covariance is negative and its weight \(2(-0.4)(0.8)\) is also negative, so the covariance term adds to the variance; omitting it would understate the spread of \(\hat\lambda\) (a standard error of about 0.70 instead of 0.71).
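To see how much the covariance term matters without loading the data, the sketch below rebuilds a 2×2 variance-covariance block for \((b_2, b_3)\) from the two reported standard errors. The covariance value is an assumption: it is not printed in this section, and \(-0.019742\) is used because it reproduces the reported \(\mathrm{se}(\hat\lambda) = 0.7096\); with the data loaded, `vcov(fit)` supplies the actual value. The code checks that the matrix quadratic form matches the term-by-term formula and compares the result with the covariance dropped.

```r
# Weights: a 40-cent price cut (-0.4) and $800 more advertising (+0.8, ADVERT in $1000s)
w <- c(price = -0.4, advert = 0.8)
b <- c(price = -7.907854, advert = 1.862584)   # estimates from the regression table

v2  <- 1.0959930^2     # var(b2) = se(b2)^2
v3  <- 0.6831955^2     # var(b3) = se(b3)^2
c23 <- -0.019742       # ASSUMED cov(b2, b3); not shown in the text
V <- matrix(c(v2, c23, c23, v3), nrow = 2,
            dimnames = list(names(w), names(w)))

lam_hat   <- sum(w * b)                          # point estimate of lambda
var_quad  <- drop(t(w) %*% V %*% w)              # quadratic form w' V w
var_long  <- w[["price"]]^2 * v2 + w[["advert"]]^2 * v3 +
  2 * w[["price"]] * w[["advert"]] * c23         # same thing, term by term
var_nocov <- w[["price"]]^2 * v2 + w[["advert"]]^2 * v3  # covariance wrongly dropped

round(c(lambda = lam_hat, se = sqrt(var_quad),
        se_without_cov = sqrt(var_nocov)), 4)
```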
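In R, the whole computation is a quadratic form in the estimated variance-covariance matrix, \(\mathrm{se}(\hat\lambda) = \sqrt{c^\top \widehat{V} c}\), where \(c\) stacks the weights and \(\widehat{V}\) is vcov(fit):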
cvec <- c(0, -0.4, 0.8) # weights on (intercept, price, advert)
lam <- sum(cvec * coef(fit)) # point estimate
vlam <- as.numeric(t(cvec) %*% vcov(fit) %*% cvec)
se <- sqrt(vlam)
tc <- qt(0.95, df = df.residual(fit))
c(estimate = lam, se = se, lo90 = lam - tc * se, hi90 = lam + tc * se)
#> estimate se lo90 hi90
#> 4.6532091 0.7096133 3.4707851 5.8356332
Figure 15.2 places this interval on the dollar scale. The whole interval lies well above zero, so at the 90% level we can be confident that the combined strategy raises expected sales.
library(ggplot2)
# `ucla` (a named list of colors) is assumed to be defined in the notes' setup code
lc <- data.frame(est = 4.653, lo = 3.471, hi = 5.835, y = 1)
ggplot(lc, aes(est, y)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = ucla$gray) +  # no change in sales
  geom_errorbarh(aes(xmin = lo, xmax = hi), height = 0.12,
                 color = ucla$blue, linewidth = 1) +
  geom_point(color = ucla$darkblue, size = 3) +
  annotate("text", x = 4.653, y = 1.18, label = "$4,653",
           color = ucla$darkblue, size = 3.4) +
  scale_x_continuous(limits = c(0, 6.5)) +
  scale_y_continuous(limits = c(0.7, 1.3), breaks = NULL) +
  labs(x = expression(hat(lambda)), y = NULL)
Testing a linear combination
To test a claim about a combination, use the same \(t\)-statistic with the combination’s standard error: \[ t = \frac{\sum_k c_k b_k - c_0}{\mathrm{se}\bigl(\sum_k c_k b_k\bigr)} \sim t_{(N-K)} . \]
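As an illustration (the hypothesis is chosen here, not posed in the text), consider whether Big Andy's combined plan raises expected sales at all: \(H_0:\lambda \le 0\) against \(H_1:\lambda > 0\). The sketch plugs in \(\hat\lambda\) and \(\mathrm{se}(\hat\lambda)\) from the R output in the previous subsection; `lincom_t` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: t-statistic for H0: sum_k c_k beta_k = c0
lincom_t <- function(est, se, c0 = 0) (est - c0) / se

t_lam   <- lincom_t(est = 4.6532091, se = 0.7096133, c0 = 0)
t_crit  <- qt(0.95, df = 72)                       # one-tail 5% critical value
p_right <- pt(t_lam, df = 72, lower.tail = FALSE)  # H1: lambda > 0

c(t = t_lam, crit = t_crit, p = p_right)
```

The statistic, about 6.56, is far beyond the critical value, which matches the 90% interval lying entirely above zero.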
There is also a slick algebraic shortcut that turns certain combination tests into ordinary single-coefficient tests.
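To test, say, \(H_0:\beta_2 = \beta_3\) (two regressors have equal effects), rewrite the restriction as a single new coefficient \(\gamma = \beta_2 - \beta_3\) and re-specify the regression so that \(\gamma\) appears directly. Substituting \(\beta_2 = \gamma + \beta_3\) into \(y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + e\) gives \[ y = \beta_1 + \gamma\,x_2 + \beta_3\,(x_2 + x_3) + e, \] so regressing \(y\) on \(x_2\) and the constructed variable \(x_2 + x_3\) makes the coefficient on \(x_2\) equal to \(\hat\gamma\). The test is then an ordinary single-coefficient \(t\)-test on \(\hat\gamma\), and \(\hat\gamma \pm t_c\,\mathrm{se}(\hat\gamma)\), with \(t_c = t_{(0.975,\,N-K)}\), is a 95% confidence interval for the difference.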
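Either way, the answer is the same: the reparameterized regression reproduces the estimate of the combination and its standard error exactly, so the direct formula and the shortcut give identical \(t\)-statistics.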
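The equivalence can be checked numerically. The sketch below uses simulated data (not Big Andy's; the data-generating values are arbitrary) for the model \(y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + e\) and \(\gamma = \beta_2 - \beta_3\). It estimates \(\gamma\) once by the direct quadratic-form route and once by regressing \(y\) on \(x_2\) and \(x_2 + x_3\), then compares estimates and standard errors.

```r
set.seed(1)
n  <- 75
x2 <- rnorm(n, mean = 5.7, sd = 0.5)     # simulated regressor
x3 <- runif(n, min = 0.5, max = 3.1)     # simulated regressor
y  <- 118 - 8 * x2 + 2 * x3 + rnorm(n, sd = 5)

# Direct route: gamma-hat = b2 - b3 with se from the quadratic form
m1 <- lm(y ~ x2 + x3)
w  <- c(0, 1, -1)                                  # weights on (intercept, x2, x3)
g_direct  <- sum(w * coef(m1))
se_direct <- sqrt(drop(t(w) %*% vcov(m1) %*% w))

# Reparameterized route: y = beta1 + gamma * x2 + beta3 * (x2 + x3) + e
m2 <- lm(y ~ x2 + I(x2 + x3))
g_reparam  <- coef(m2)[["x2"]]
se_reparam <- summary(m2)$coefficients["x2", "Std. Error"]

c(direct = g_direct, reparam = g_reparam,
  se_direct = se_direct, se_reparam = se_reparam)
```

The two routes agree to machine precision because the new regressors are an invertible linear recombination of the old ones, so the fitted values and residual variance are unchanged.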
15.4 Single vs. joint restrictions
Everything in this chapter tests a single restriction: one equation about the \(\beta\)’s, even when that equation mixes several of them (as in \(-0.4\,\beta_2 + 0.8\,\beta_3\)). A single restriction is always a \(t\)-test.
A \(t\)-test cannot test several restrictions at once, such as \(H_0:\beta_2 = 0\) and \(\beta_3 = 0\) holding together.
That joint machinery is the \(F\)-test, which needs its own treatment.
15.5 Recap
Inference on a single coefficient in multiple regression is identical to simple regression, with the one change that the degrees of freedom are \(N-K\): \[ t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-K)} . \] For Big Andy (\(N-K = 72\)), both price (\(t = -7.2\)) and advertising (\(t = 2.7\)) are significant; the one-tail tests show demand is elastic but cannot prove advertising is profitable (\(t = 1.26\)); and the confidence intervals are \(\beta_2 \in [-10.1,\,-5.7]\) and \(\beta_3 \in [0.50,\,3.22]\) (wide).
Multiple regression also lets us estimate and test linear combinations of coefficients. The estimator \(\hat\lambda = \sum_k c_k b_k\) is BLUE, its variance includes all the covariances between the estimates, and a single restriction, even one involving several coefficients, is tested with a \(t\)-test on \(N-K\) degrees of freedom.
| Restriction | What it tests | Tool |
|---|---|---|
| Single restriction | one equation in the \(\beta\)’s (even across several) | \(t\)-test, \(\text{df}=N-K\) |
| Joint restriction | several equations at once | \(F\)-test |
To test several restrictions at once, we need the \(F\)-test, not a string of separate \(t\)-tests whose error rates compound.
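A quick calculation shows the direction of the problem. Under the simplifying assumption that \(m\) separate tests are independent and every null is true, the chance of at least one false rejection is \(1 - (1-\alpha)^m\). In a regression the \(t\)-statistics are usually correlated, so this sketch only illustrates how the overall error rate drifts above \(\alpha\); it is not the exact rate and not how the \(F\)-test works.

```r
alpha <- 0.05
m     <- 1:5                       # number of separate tests
# P(at least one false rejection), assuming independent tests and true nulls
fwer  <- 1 - (1 - alpha)^m
round(setNames(fwer, paste0(m, " tests")), 4)
```

With just two tests the chance of a spurious rejection is already about 0.0975, nearly double the nominal 5%.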
Next time: interaction terms. So far each coefficient is a constant partial effect; with interactions (\(x_2 \times x_3\)) the effect of one variable depends on the level of another.