\( \newcommand{\E}{\mathbb{E}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\given}{\,\vert\,} \newcommand{\indic}[1]{\mathbf{1}\!\left\{#1\right\}} \newcommand{\pmf}{\text{p.m.f.}} \newcommand{\pdf}{\text{p.d.f.}} \newcommand{\cdf}{\text{c.d.f.}} \)

15  Hypothesis Testing in Multiple Regression

Reading. SW 7.1, 7.3; HGL 5.4–5.5

We now have Big Andy’s coefficients and their standard errors: \[ \widehat{\text{SALES}} = \underset{(6.35)}{118.91} \ \underset{(1.096)}{-\,7.908}\,\text{PRICE} \ \underset{(0.683)}{+\,1.863}\,\text{ADVERT}. \]

The good news is that the inference machinery is exactly what we built for simple regression in confidence intervals and hypothesis testing: the same \(t\)-statistic, the same logic, the same rejection rules. There is only one change: the degrees of freedom are now \(N-K\) instead of \(N-2\).

The new power that multiple regression gives us is the ability to test and estimate combinations of several coefficients at once, which is the kind of question a manager actually asks (“what happens if I cut price and raise advertising?”). This chapter covers \(t\)-tests and confidence intervals on a single coefficient (with \(\text{df}=N-K\)), estimating and testing linear combinations of coefficients, and the distinction between single and joint restrictions, a teaser for the \(F\)-test.

15.1 \(t\)-tests on a single coefficient

Under the multiple-regression assumptions MR1–MR6, each coefficient has a \(t\)-statistic with the same form as in simple regression.

The same \(t\)-statistic, new degrees of freedom

\[ t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-K)} \qquad\text{under } H_0:\beta_k = c . \] The only difference from simple regression is that the degrees of freedom are \(N-K\) (subtract one per estimated coefficient) rather than \(N-2\). For Big Andy’s model, \(N-K = 75 - 3 = 72\).

Everything else carries over unchanged: the rejection region is determined by the direction of the alternative \(H_1\) (\(>,<,\neq\)); the \(p\)-value rule is still “reject if \(p \le \alpha\)”; and we still say “reject” or “fail to reject,” never “accept.”

Tests of significance

The default test reported for each coefficient is \(H_0:\beta_k = 0\) against \(H_1:\beta_k\neq0\), which asks “does this variable matter, holding the others fixed?” With a two-tail critical value of \(t_c = t_{(0.975,\,72)} = 1.993\), we can run both significance tests for Big Andy directly from the printed output.

For price, \[ t = \frac{-7.908}{1.096} = -7.22, \] and since \(|{-7.22}| > 1.993\) (with \(p < 0.001\)) we reject \(H_0\): sales depend on price. For advertising, \[ t = \frac{1.863}{0.683} = 2.73, \] and \(2.73 > 1.993\) (with \(p = 0.008\)), so we again reject: sales depend on advertising. Both regressors are statistically significant. The \(t\)’s and \(p\)’s printed in the regression table are exactly these tests, computed automatically. Running the regression in R makes this concrete; the t value and Pr(>|t|) columns are the significance tests:

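# Assumes the package that provides the `andy` data (Big Andy's sales data) is already loaded
data(andy)
fit <- lm(sales ~ price + advert, data = andy)
summary(fit)$coefficients   # estimates, standard errors, t values, two-tail p-values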
#>               Estimate Std. Error   t value     Pr(>|t|)
#> (Intercept) 118.913610  6.3516375 18.721725 2.214293e-29
#> price        -7.907854  1.0959930 -7.215241 4.423997e-10
#> advert        1.862584  0.6831955  2.726283 8.038182e-03
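The same tests can be rebuilt by hand from the printed estimates and standard errors. The following is a minimal sketch that uses the rounded values quoted above rather than the fitted model, so the last digits can differ slightly from the R output; the variable names are illustrative only.

```r
# Rounded estimates and standard errors copied from the regression output above
b  <- c(price = -7.908, advert = 1.863)
se <- c(price =  1.096, advert = 0.683)
df <- 75 - 3                          # N - K for Big Andy's model

t_stat <- b / se                      # t-statistics for H0: beta_k = 0
t_crit <- qt(0.975, df)               # two-tail 5% critical value
p_val  <- 2 * pt(-abs(t_stat), df)    # two-tail p-values

round(cbind(t_stat, p_val), 4)
abs(t_stat) > t_crit                  # TRUE means reject H0 at the 5% level
```

Both comparisons come out TRUE, matching the rejections reported in the text.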

One-tail tests of economic hypotheses

The interesting questions are usually directional, and the rule of thumb is to put the conjecture you want to establish in the alternative \(H_1\). Two of Big Andy’s questions illustrate the point, and they use the one-tail critical value \(t_{(0.95,\,72)} = 1.666\).

The first asks whether demand is price-elastic. Because SALES is revenue, demand is elastic exactly when a price cut raises revenue, that is, when \(\beta_2 < 0\). We set \(H_0:\beta_2 \ge 0\) against \(H_1:\beta_2 < 0\) and compute \[ t = \frac{-7.908}{1.096} = -7.22 < -1.666 . \] We reject \(H_0\): the data support price-elastic demand, so a price cut raises revenue.

The second asks whether advertising is profitable, meaning each extra dollar of advertising returns more than a dollar in sales, \(\beta_3 > 1\). We set \(H_0:\beta_3 \le 1\) against \(H_1:\beta_3 > 1\) and compute \[ t = \frac{1.863 - 1}{0.683} = 1.26 < 1.666 . \] Here we fail to reject: the data cannot prove that advertising returns more than it costs.
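Both one-tail tests can be checked with a few lines of base R. This is a sketch built from the rounded estimates in the text, not from the fitted model object, and the variable names are illustrative.

```r
df     <- 75 - 3
t_crit <- qt(0.95, df)                       # one-tail 5% critical value, about 1.666

# Elasticity: H0: beta2 >= 0 vs H1: beta2 < 0 (reject in the left tail)
t_price <- (-7.908 - 0) / 1.096
p_price <- pt(t_price, df)                   # left-tail p-value

# Profitability: H0: beta3 <= 1 vs H1: beta3 > 1 (reject in the right tail)
t_adv <- (1.863 - 1) / 0.683
p_adv <- pt(t_adv, df, lower.tail = FALSE)   # right-tail p-value

t_price < -t_crit   # TRUE: reject H0, demand is elastic
t_adv   >  t_crit   # FALSE: fail to reject H0
c(p_price = p_price, p_adv = p_adv)
```

Note that the null value is subtracted in the numerator: the profitability test compares \(b_3\) with 1, not with 0.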

A point estimate is not proof

The advertising case is the cautionary one. The point estimate \(b_3 = 1.86 > 1\) looks profitable, but once we account for its imprecision (\(\mathrm{se} = 0.68\)), the data cannot rule out \(\beta_3 \le 1\). Statistical significance is not the same as economic decisiveness, and a point estimate on its own is not evidence for a sharp claim.

15.2 Confidence intervals in multiple regression

Interval estimation is identical to what we did in confidence intervals, with the only change again being the degrees of freedom: \[ b_k \pm t_c\,\mathrm{se}(b_k), \qquad t_c = t_{(1-\alpha/2,\,N-K)} = t_{(0.975,\,72)} = 1.993 . \]

For the price coefficient \(\beta_2\), \[ -7.908 \pm 1.993\,(1.096) = [-10.09,\ -5.72]. \] In dollar terms, a $1 price cut raises mean revenue by between $5,723 and $10,093 (or, for a dime, $572 to $1,009). This is a tight, useful interval. For the advertising coefficient \(\beta_3\), \[ 1.863 \pm 1.993\,(0.683) = [0.50,\ 3.22]. \] This interval is wide: $1,000 of advertising could return as little as $501 (a loss) or as much as $3,225. The width reflects the large standard error \(\mathrm{se}(b_3)\). R’s confint() reproduces both intervals:

confint(fit)
#>                  2.5 %     97.5 %
#> (Intercept) 106.251852 131.575368
#> price       -10.092676  -5.723032
#> advert        0.500659   3.224510
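For comparison, here is a minimal by-hand version of the same intervals, assuming only the rounded estimates and standard errors from the text (so the third decimal can differ from confint()). The name `ci_hand` is illustrative.

```r
b  <- c(price = -7.908, advert = 1.863)
se <- c(price =  1.096, advert = 0.683)
tc <- qt(0.975, df = 75 - 3)          # two-tail 95% critical value

# b_k +/- t_c * se(b_k), one row per coefficient
ci_hand <- cbind(lower = b - tc * se, upper = b + tc * se)
round(ci_hand, 2)

# Does the advertising interval contain the break-even value beta3 = 1?
ci_hand["advert", "lower"] < 1 && 1 < ci_hand["advert", "upper"]
```

The final line returns TRUE: the break-even value lies inside the advertising interval, which is consistent with the inconclusive profitability test.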

Figure 15.1 shows the two intervals side by side. The width is the story: relative to the size of its estimate, the advertising interval is much wider than the price interval (in absolute terms it is actually the narrower of the two), and that relative imprecision is what made the “\(\beta_3 > 1\)” test inconclusive.

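library(ggplot2)
# `ucla` is assumed to be a named list of colours defined in the course setup (not shown here)
ci <- data.frame(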
  coef = c("price", "advert"),
  est  = c(-7.908, 1.863),
  lo   = c(-10.09, 0.50),
  hi   = c(-5.72, 3.22)
)
ci$coef <- factor(ci$coef, levels = c("price", "advert"))
ggplot(ci, aes(est, coef)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = ucla$gray) +
  geom_vline(xintercept = 1, linetype = "dotted", color = ucla$red) +
  geom_errorbarh(aes(xmin = lo, xmax = hi), height = 0.18,
                 color = ucla$blue, linewidth = 1) +
  geom_point(color = ucla$darkblue, size = 2.6) +
  labs(x = "coefficient value", y = NULL)
Figure 15.1: 95% confidence intervals for the price and advertising coefficients. Relative to its estimate, the advertising interval is far wider; the dashed line marks zero and the dotted line marks the profitability threshold \(\beta_3 = 1\).

The remedy. The wide interval for \(\beta_3\) is a precision problem, and the fix is the one from variance and collinearity in multiple regression: more data, and more independent variation in advertising.

15.3 Linear combinations of coefficients

Real decisions move several regressors at once, so the natural quantity to estimate is a weighted sum of coefficients. Big Andy plans to cut price by 40 cents and raise advertising by $800 at the same time. With PRICE measured in dollars and ADVERT in thousands of dollars, the resulting change in expected sales is a linear combination of the coefficients: \[ \lambda = -0.4\,\beta_2 + 0.8\,\beta_3 . \]

Because the OLS estimator is BLUE, the corresponding combination of estimates, \(\hat\lambda = \sum_k c_k b_k\), is itself the best linear unbiased estimator of \(\lambda\), so the point estimate is immediate: \[ \hat\lambda = -0.4(-7.908) + 0.8(1.863) = 4.65 . \] Expected sales rise by about $4,653 from the combined strategy.

The standard error needs the covariances

To attach a standard error to \(\hat\lambda\) we need its variance, and the variance of a combination of several estimators carries a covariance term for every pair. This is the variance-of-a-sum rule from expectation, generalized: \[ \Var\!\Bigl(\sum_k c_k b_k\Bigr) = \sum_k c_k^2\,\Var(b_k) + 2\sum_{j<k} c_j c_k\,\Cov(b_j,b_k). \]

For \(\hat\lambda = -0.4\,b_2 + 0.8\,b_3\), using Big Andy’s estimated variance–covariance matrix (\(\widehat{\Var}(b_2) = 1.201\), \(\widehat{\Var}(b_3) = 0.467\), \(\widehat{\Cov}(b_2,b_3) = -0.020\)), \[ \widehat{\Var}(\hat\lambda) = (-0.4)^2(1.201) + (0.8)^2(0.467) + 2(-0.4)(0.8)(-0.020) = 0.504, \] so that \[ \mathrm{se}(\hat\lambda) = \sqrt{0.504} = 0.71 . \]

A 90% interval uses \(t_c = t_{(0.95,\,72)} = 1.666\): \[ 4.65 \pm 1.666\,(0.71) = [\,3.47,\ 5.84\,] \;\Rightarrow\; \$3{,}471 \text{ to } \$5{,}835 . \]

Don't drop the covariance term

If you ignore the covariance term and add only the scaled variances, you get the standard error wrong. Here the covariance is negative and the product of the weights, \((-0.4)(0.8)\), is also negative, so the covariance term \(2(-0.4)(0.8)(-0.020) \approx +0.013\) is positive; omitting it would understate the spread of \(\hat\lambda\).
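The size of the mistake is easy to check. The sketch below recomputes the variance with and without the covariance term, using only the rounded variances and covariance quoted above; the variable names are illustrative.

```r
cw    <- c(price = -0.4, advert = 0.8)      # weights on b2 and b3
v     <- c(price = 1.201, advert = 0.467)   # estimated variances of b2 and b3
cov23 <- -0.020                             # estimated Cov(b2, b3)

var_no_cov <- sum(cw^2 * v)                 # scaled variances only
var_full   <- var_no_cov + 2 * cw[["price"]] * cw[["advert"]] * cov23

# Standard errors without and with the covariance term
sqrt(c(without_cov = var_no_cov, with_cov = var_full))
```

In this example the covariance is small, so the two standard errors are close; with strongly correlated estimates the gap can be much larger, and its sign depends on the signs of both the weights and the covariance.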

In R, the whole computation is a quadratic form in the estimated variance–covariance matrix, \(\widehat{\Var}(\hat\lambda) = c'\,\widehat{\Var}(b)\,c\):

cvec <- c(0, -0.4, 0.8)          # weights on (intercept, price, advert)
lam  <- sum(cvec * coef(fit))    # point estimate
vlam <- as.numeric(t(cvec) %*% vcov(fit) %*% cvec)
se   <- sqrt(vlam)
tc   <- qt(0.95, df = df.residual(fit))
c(estimate = lam, se = se, lo90 = lam - tc * se, hi90 = lam + tc * se)
#>  estimate        se      lo90      hi90 
#> 4.6532091 0.7096133 3.4707851 5.8356332

Figure 15.2 places this interval on the dollar scale. The whole interval lies well above zero, so the combined strategy is very likely to raise sales, and even the pessimistic end, about $3,471, is a substantial gain.

# Needs ggplot2 loaded and the `ucla` colour list (assumed to be defined in the course setup)
lc <- data.frame(est = 4.653, lo = 3.471, hi = 5.835, y = 1)
ggplot(lc, aes(est, y)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = ucla$gray) +
  geom_errorbarh(aes(xmin = lo, xmax = hi), height = 0.12,
                 color = ucla$blue, linewidth = 1) +
  geom_point(color = ucla$darkblue, size = 3) +
  annotate("text", x = 4.653, y = 1.18, label = "$4,653",
           color = ucla$darkblue, size = 3.4) +
  scale_x_continuous(limits = c(0, 6.5)) +
  scale_y_continuous(limits = c(0.7, 1.3), breaks = NULL) +
  labs(x = expression(hat(lambda)), y = NULL)
Figure 15.2: Point estimate and 90% confidence interval for Big Andy’s combined strategy (\(-0.4\beta_2 + 0.8\beta_3\)), in thousands of dollars of sales.

Testing a linear combination

To test a claim about a combination, \(H_0:\sum_k c_k\beta_k = c_0\), use the same \(t\)-statistic with the combination’s standard error: \[ t = \frac{\sum_k c_k b_k - c_0}{\mathrm{se}\bigl(\sum_k c_k b_k\bigr)} \sim t_{(N-K)} . \]
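As an illustration, suppose Big Andy wants evidence that the combined strategy raises sales by more than some threshold \(c_0\). The threshold of 3 (that is, $3,000) used below is hypothetical and chosen only to show the mechanics; the estimate and standard error are the rounded values from the R output above.

```r
lam_hat <- 4.6532     # point estimate of lambda from the output above
se_lam  <- 0.7096     # its standard error
c0      <- 3          # hypothetical threshold, for illustration only
df      <- 75 - 3

# H0: lambda <= c0 vs H1: lambda > c0 (right-tail test)
t_stat <- (lam_hat - c0) / se_lam
p_val  <- pt(t_stat, df, lower.tail = FALSE)

c(t = t_stat, t_crit = qt(0.95, df), p = p_val)
t_stat > qt(0.95, df)   # TRUE: reject H0 at the 5% level for this threshold
```

The only new ingredient relative to a single-coefficient test is the standard error, which must come from the full variance–covariance matrix.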

There is also a slick algebraic shortcut that turns certain combination tests into ordinary single-coefficient tests.

Stock & Watson's reparametrization trick

To test, say, \(H_0:\beta_1 = \beta_2\) (two regressors have equal effects), rewrite the restriction as a single new coefficient \(\gamma = \beta_1 - \beta_2\) and re-specify the regression so that \(\gamma\) appears directly. The test is then an ordinary single-coefficient \(t\)-test of \(H_0:\gamma = 0\) using \(\hat\gamma\), and \(\hat\gamma \pm 1.96\,\mathrm{se}(\hat\gamma)\) is a large-sample 95% confidence interval for the difference (replace 1.96 with \(t_c\) from \(t_{(N-K)}\) in the small-sample framework used here).
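One way to see how the re-specification works is to add and subtract \(\beta_2 x_1\): \[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e = \beta_0 + (\beta_1 - \beta_2)\,x_1 + \beta_2\,(x_1 + x_2) + e , \] so regressing \(y\) on \(x_1\) and the new variable \(w = x_1 + x_2\) puts \(\gamma\) on \(x_1\). The sketch below checks this on simulated data (the data, the seed, and all variable names are made up for illustration and have nothing to do with Big Andy's data set) and compares it with the direct quadratic-form route.

```r
# Simulated data, for illustration only
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n)

# Original parametrization: y = b0 + b1*x1 + b2*x2 + e
fit_orig <- lm(y ~ x1 + x2)

# Reparametrized: y = b0 + gamma*x1 + b2*w + e, with w = x1 + x2
w       <- x1 + x2
fit_rep <- lm(y ~ x1 + w)

gamma_hat <- coef(fit_rep)[["x1"]]              # estimate of beta1 - beta2
se_gamma  <- sqrt(vcov(fit_rep)["x1", "x1"])    # its standard error

# Direct route: weights (0, 1, -1) on (intercept, x1, x2)
cvec     <- c(0, 1, -1)
diff_hat <- sum(cvec * coef(fit_orig))
se_diff  <- sqrt(as.numeric(t(cvec) %*% vcov(fit_orig) %*% cvec))

# The two routes agree up to floating-point error
all.equal(gamma_hat, diff_hat)
all.equal(se_gamma, se_diff)

# The t value and p-value for H0: gamma = 0 are in the x1 row
summary(fit_rep)$coefficients["x1", ]
```

The reparametrized regression has the same fit as the original; only the labelling of the coefficients changes, which is why the software's standard output delivers the test directly.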

Either way (computing the combination’s standard error directly, or reparametrizing so the combination becomes one coefficient), a single restriction, even one that spans several coefficients, is always a \(t\)-test.

15.4 Single vs. joint restrictions

Everything in this chapter tests a single restriction: one equation about the \(\beta\)’s, even when that equation mixes several of them (as in \(-0.4\,\beta_2 + 0.8\,\beta_3\)). A single restriction is always a \(t\)-test.

What a \(t\)-test cannot do

A \(t\)-test cannot test several restrictions at once, that is, a joint hypothesis such as \[ H_0:\ \beta_2 = 0 \ \text{ and } \ \beta_3 = 0 . \] Checking each restriction with its own \(t\)-test “one at a time” is unreliable, because the individual error rates compound and the joint significance level is no longer what you think. Testing joint hypotheses needs a new statistic: the \(F\)-test.
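A rough calculation shows the size of the problem. Suppose, purely for illustration, that the two \(t\)-statistics were independent and each test used \(\alpha = 0.05\). If the joint null is true and we reject it whenever either individual test rejects, then \[ \Prob(\text{at least one false rejection}) = 1 - (1 - 0.05)^2 = 0.0975 , \] nearly double the nominal 5%. In practice the two statistics are correlated, so the true rate is some other number, but there is no reason for it to equal 0.05.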

That joint machinery (overall significance of the regression, comparisons of nested models, and tests of economic restrictions like constant returns to scale) is the subject of \(F\)-tests.

15.5 Recap

Inference on a single coefficient in multiple regression is identical to simple regression, with the one change that the degrees of freedom are \(N-K\): \[ t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-K)} . \] For Big Andy (\(N-K = 72\)), both price (\(t = -7.2\)) and advertising (\(t = 2.7\)) are significant; the one-tail tests show demand is elastic but cannot prove advertising is profitable (\(t = 1.26\)); and the confidence intervals are \(\beta_2 \in [-10.1,\,-5.7]\) and \(\beta_3 \in [0.50,\,3.22]\) (wide).

Multiple regression also lets us estimate and test linear combinations of coefficients. The estimator \(\hat\lambda = \sum_k c_k b_k\) is BLUE, its variance includes all the covariances between the estimates, and a single restriction, even one across several coefficients, is still a \(t\)-test. Big Andy’s combined price-and-advertising strategy is worth an estimated $4,653, with a 90% interval of \([\$3{,}471,\ \$5{,}835]\).

Single vs. joint restrictions.
| Restriction | What it tests | Tool |
|---|---|---|
| Single restriction | one equation in the \(\beta\)’s (even across several) | \(t\)-test, \(\text{df}=N-K\) |
| Joint restriction | several equations at once | \(F\)-test |

To test several restrictions at once, we need the \(F\)-test, not a string of separate \(t\)-tests whose error rates compound.

Next time: interaction terms. So far each coefficient is a constant partial effect; with interactions (\(x_2 \times x_3\)) the effect of one variable depends on the level of another, and the marginal effects that result, like Andy’s optimal advertising, are exactly the combinations of coefficients we now know how to handle.