Contrasts and Linear Hypothesis Tests

In my time running the UCD Ecology Statistical Support Group and at the Davis R Users Group, one of the most common questions I get is “how do my treatment groups differ from one another?” This sort of question, usually referred to as a “contrast” falls under a more general envelope of “linear hypothesis test”. Here, I’ve put together a brief overview of hypothesis tests, contrasts, and the use of the emmeans package to test them.

Linear hypothesis tests

Linear hypothesis tests involve a linear combination of the components of a statistical model, usually to generate the estimated mean value of our response given some set of predictors.

In most cases, these estimated means will be:

• The response of a group in an ANOVA
• The response at some value of a predictor in regression

What is meant by linear combination?

Mathematically speaking, linear operations used in hypothesis testing are either:

• Scalar multiplication (cX)

Contrasts

In particular, we are often interested in contrasts, which are comparisons between population means. By definition, a contrast is defined as a linear combination of means or regression coefficients where the values of the linear combination sum to zero:

$c_1\mu_1 + c_2\mu_2 + ... + c_p\mu_p$

A comparison of two means, $$\mu_1$$ and $$\mu_2$$ can be expressed as $$c_1\mu_1 + c_2\mu_2$$, where $$c_1 = 1$$ and $$c_2 = -1$$.

However, tests under the broad envelope of “linear hypothesis tests” are not just limited to contrasts.

Constructing Linear Hypothesis Tests

To understand how we construct hypothesis tests using these components, we need to think about how most models are constructed.

Let’s consider a one-way ANOVA, comparing the estimated responses of three groups. I’ll use the iris dataset here, for convenience. In this analysis, we might be interested in whether a certain feature of the irises, Sepal.Width, differs across the three Species.

The ANOVA model used in this case could be written as lm(Sepal.Length ~ Species, data = iris)

Given how R uses factor coding, the fitted model will consist of three coefficients:

• $$\beta_0$$, the estimated Sepal Length of the first group, “setosa”
• $$\beta_1$$, the estimated difference in Sepal Length between the second group, “versicolor”, and the first
• $$\beta_2$$, the estimated difference in Sepal Length between the third group, “virginica”, and the first

Or, written mathematically:

$$(\hat{\mu}_{setosa}) = \beta_0$$
$$(\hat{\mu}_{versicolor}) = \beta_0 + \beta_1$$
$$(\hat{\mu}_{virginica}) = \beta_0 + \beta_2$$

What tests can we run with this data?

The most natural test to start with would be asking whether the groups differ from one another. Mathematically, this statement could be expressed as whether $$\hat \mu_{species 1} - \hat \mu_{species 2} = 0$$

Based on how this model is constructed, we can express this difference using coefficients:

$$(\hat{\mu}_{setosa}) - (\hat{\mu}_{versicolor}) = (\beta_0) - (\beta_0 + \beta_1) = - \beta_1$$
$$(\hat{\mu}_{setosa}) - (\hat{\mu}_{virginica}) = (\beta_0) - (\beta_0 + \beta_2) = - \beta_2$$
$$(\hat{\mu}_{versicolor}) - (\hat{\mu}_{virginica}) = (\beta_0) - (\beta_0 + \beta_1) = \beta_1 - \beta_2$$

If we were to express these $$\beta$$ coefficients as a vector $$\vec{\beta} = (\beta_0, \beta_1, \beta_2)$$, we could write these tests as the following:

$$(\hat {\mu}_{setosa}) - (\hat {\mu}_{versicolor}) = (0,1,0) * \vec{\beta}$$
$$(\hat {\mu}_{setosa}) - (\hat {\mu}_{virginica}) = (0,0,1) * \vec{\beta}$$
$$(\hat {\mu}_{versicolor}) - (\hat {\mu}_{virginica} = (0,1,-1) * \vec{\beta}$$

Where each hypothesis is a linear combination of coefficients, either added or subtrated from one another with a constant multiplier (1).

How do I run these tests?

By design, some tests are baked in to lm when using factor variables. The summary object will offer T-tests of whether each coefficient differs from zero, which can be used to determine whether the sepal width of the first group, setosa, differs from 0, and whether the other two groups differ from this first factor level.

iris_anova <- lm(Sepal.Width ~ Species, data = iris)
summary(iris_anova)
##
## Call:
## lm(formula = Sepal.Width ~ Species, data = iris)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -1.128 -0.228  0.026  0.226  0.972
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        3.42800    0.04804  71.359  < 2e-16 ***
## Speciesversicolor -0.65800    0.06794  -9.685  < 2e-16 ***
## Speciesvirginica  -0.45400    0.06794  -6.683 4.54e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3397 on 147 degrees of freedom
## Multiple R-squared:  0.4008, Adjusted R-squared:  0.3926
## F-statistic: 49.16 on 2 and 147 DF,  p-value: < 2.2e-16

emmeans

But how do we test the third hypothesis, whether the second and third groups differ from one another? Enter emmeans – a useful package to generate “estimated marginal means” that can be used to test hypotheses.

By default, the pairs() function quickly compares means between groups (identified by the specs argument) in all possible combinations.

# Call emmeans on the model object, with one or more sets of columns to generate means for
iris_emm <- emmeans(iris_anova, specs = "Species")

# All pairwise contrasts of the "Species" column
pairs(iris_emm)
##  contrast               estimate     SE  df t.ratio p.value
##  setosa - versicolor       0.658 0.0679 147  9.685  <.0001
##  setosa - virginica        0.454 0.0679 147  6.683  <.0001
##  versicolor - virginica   -0.204 0.0679 147 -3.003  0.0088
##
## P value adjustment: tukey method for comparing a family of 3 estimates

But we could also do this in a more defined way, by specifying linear combinations of these estimated means, as shown before. The one twist is that emmeans would like to define the linear combinations by estimated means, so the contrasts will take a “1” or “-1” for each mean that is being compared:

test1 <- c(1,-1,0) # mean of species 1 minus species 2
test2 <- c(1,0,-1) # mean of species 1 minus species 3
test3 <- c(0,1,-1) # mean of species 2 minus species 3

contrast(iris_emm,
method = list("1. Se - Ve" = test1,
"2. Se - Vi" = test2,
"3. Ve - Vi" = test3))
##  contrast   estimate     SE  df t.ratio p.value
##  1. Se - Ve    0.658 0.0679 147  9.685  <.0001
##  2. Se - Vi    0.454 0.0679 147  6.683  <.0001
##  3. Ve - Vi   -0.204 0.0679 147 -3.003  0.0031

More complicated tests

At times, we might have a specific biological hypothesis for how estimated means differ from one another, such as whether one mean is more than twice the value of another, or whether the difference between two groups is equal to a particular value.

In our fourth test, we can manipulate this vector to check whether the twice the mean sepal width of setosa differs from the sepal width of virginica.

test4 <- c(2, 0, -1)

contrast(iris_emm,
method = list("4. 2xSe - Vi" = test4))
##  contrast     estimate    SE  df t.ratio p.value
##  4. 2xSe - Vi     3.88 0.107 147 36.139  <.0001

In the fifth, we can test whether means differ by a specific value using the offset argument. Here, the test asks whether setosa and virginica differ by a value of 3.5.

test5 <- c(0, 1, -1)

contrast(iris_emm,
method = list("5. Se - Vi" = test4),
offset = - 3.5)
##  contrast   estimate    SE  df t.ratio p.value
##  5. Se - Vi    0.382 0.107 147 3.556   0.0005

P-value control

When running multiple hypothesis tests, remember that accounting for inflating family-wise error rate can be important. By default, the pairs function in emmeans will apply a Tukey contrast to account for all simultaneous pairwise comparisons.

pairs(iris_emm)
##  contrast               estimate     SE  df t.ratio p.value
##  setosa - versicolor       0.658 0.0679 147  9.685  <.0001
##  setosa - virginica        0.454 0.0679 147  6.683  <.0001
##  versicolor - virginica   -0.204 0.0679 147 -3.003  0.0088
##
## P value adjustment: tukey method for comparing a family of 3 estimates

Note that we can change this, if desired. The adjust argument can apply multiple corrections, such as the Bonferroni correction.

contrast(iris_emm,
method = list("1. Se - Ve" = test1,
"2. Se - Vi" = test2,
"3. Ve - Vi" = test3),
adjust = "bonferroni")
##  contrast   estimate     SE  df t.ratio p.value
##  1. Se - Ve    0.658 0.0679 147  9.685  <.0001
##  2. Se - Vi    0.454 0.0679 147  6.683  <.0001
##  3. Ve - Vi   -0.204 0.0679 147 -3.003  0.0094
##
## P value adjustment: bonferroni method for 3 tests

A worked example - Frog Occupancy in Mountain Streams

emmeans can handle far more complex model types than the one presented previously, including random effects models, generalized linear models, and even MCMC (Bayesian) methods.

Model parameters that consist of a mixture of numeric and categorical predictors can also be accounted for, using the same general syntax presented above (with some additional complexity). Here, we show how emmeans might be used to test a number of hypotheses about frog occupancy across different mountain streams.

frogs <- read.csv("frogexample.csv",
stringsAsFactors = TRUE)

Fitting a logistic GLMER

Terms:

• WaterDepth - Water Depth
• velo1 - Water Velocity
• substraten2 - Stream Substrate
• hab - Habitat Type
• site - Study Site (Random Effect)
occ_model <- glmer(occ ~ WaterDepth * velo1 + velo1 * substraten2 + hab + (1 | site),
data = frogs,
family = "binomial")

coef(occ_model)
## \$site
##         (Intercept) WaterDepth    velo1 substraten2boul substraten2grav
## stream1  -2.6007168  -3.821035 7.286018       0.5542787       0.2701652
## stream2  -0.4757093  -3.821035 7.286018       0.5542787       0.2701652
##         substraten2silt   habPool  habRiff WaterDepth:velo1
## stream1        1.430042 0.3435592 1.732065         1.039696
## stream2        1.430042 0.3435592 1.732065         1.039696
##         velo1:substraten2boul velo1:substraten2grav velo1:substraten2silt
## stream1             -7.871831             -10.69519             -25.31216
## stream2             -7.871831             -10.69519             -25.31216
##
## attr(,"class")
## [1] "coef.mer"

Mathematically:

$log( { \frac{y}{1-y}}) \sim \beta_0 + \beta_1WtrDepth + \beta_2Velo + \beta_{3-5}Substrate + \beta_{6-7}Hab + \beta_8(WaterDepth ~ * ~Velo) + \beta_{9-11}(Velo ~ * ~ Substrate) + \epsilon$

1. Pairwise tests of a single factor - Do frogs prefer pools, cascades, or riffles?

$$\bar {Y}_{Pool} = \bar {Y}_{Cascade} = \bar {Y}_{Riffle}$$

$$\beta_0 = \beta_0 + \beta_{Cascade} = \beta_0 + \beta_{Riffle}$$

While we have multiple factors included in the model, individual pairwise tests of each factor individually are possible.

Note that emmeans can provide results on the model scale (logit) or at the response scale (probability).

pairs(emmeans(occ_model,
specs = "hab"))
##  contrast    estimate    SE  df z.ratio p.value
##  Casc - Pool   -0.344 0.189 Inf -1.822  0.1625
##  Casc - Riff   -1.732 0.316 Inf -5.487  <.0001
##  Pool - Riff   -1.389 0.281 Inf -4.937  <.0001
##
## Results are averaged over the levels of: substraten2
## Results are given on the log odds ratio (not the response) scale.
## P value adjustment: tukey method for comparing a family of 3 estimates
pairs(emmeans(occ_model,
specs = "hab",
type = "response"))
##  contrast    odds.ratio     SE  df z.ratio p.value
##  Casc / Pool      0.709 0.1338 Inf -1.822  0.1625
##  Casc / Riff      0.177 0.0558 Inf -5.487  <.0001
##  Pool / Riff      0.249 0.0702 Inf -4.937  <.0001
##
## Results are averaged over the levels of: substraten2
## P value adjustment: tukey method for comparing a family of 3 estimates
## Tests are performed on the log odds ratio scale

3. Multiple interactions – Substrate | Water Depth and Velocity

Tests between:

$$\beta_0 + \beta_1 * WtrDepth + \beta_2*Velo + \beta_{3-5}Substrate + \beta_8(WtrDepth ~ * ~ Velo) + \beta_{9-11}(Velo * Substrate)$$

For different substrates and values of Water Depth and Velocity

br()

In this model, we have multiple quantitative predictors that interact with categorical ones. What if we would like to compare occupancy across habitat types for a couple different stream velocities and water depths? Using the condition operator, |, and the at = list() assignment, we can generate estimated means and contrasts across a combination of these predictors.

water_comp <- emmeans(occ_model,
specs = ~ substraten2 | WaterDepth + velo1, # Alternative formulation
at = list(velo1 = c(0, 1),
WaterDepth = c(0, .2)),
type = "response")

pairs(water_comp)
## WaterDepth = 0, velo1 = 0:
##  contrast    odds.ratio       SE  df z.ratio p.value
##  bedr / boul   1.00e+00 0.00e+00 Inf -2.740  0.0312
##  bedr / grav   1.00e+00 0.00e+00 Inf -1.000  0.7495
##  bedr / silt   0.00e+00 0.00e+00 Inf -4.622  <.0001
##  boul / grav   1.00e+00 0.00e+00 Inf  1.260  0.5882
##  boul / silt   0.00e+00 0.00e+00 Inf -3.191  0.0077
##  grav / silt   0.00e+00 0.00e+00 Inf -4.209  0.0002
##
## WaterDepth = 0.2, velo1 = 0:
##  contrast    odds.ratio       SE  df z.ratio p.value
##  bedr / boul   1.00e+00 0.00e+00 Inf -2.740  0.0312
##  bedr / grav   1.00e+00 0.00e+00 Inf -1.000  0.7495
##  bedr / silt   0.00e+00 0.00e+00 Inf -4.622  <.0001
##  boul / grav   1.00e+00 0.00e+00 Inf  1.260  0.5882
##  boul / silt   0.00e+00 0.00e+00 Inf -3.191  0.0077
##  grav / silt   0.00e+00 0.00e+00 Inf -4.209  0.0002
##
## WaterDepth = 0, velo1 = 1:
##  contrast    odds.ratio       SE  df z.ratio p.value
##  bedr / boul   1.51e+03 3.94e+03 Inf  2.797  0.0265
##  bedr / grav   3.37e+04 9.22e+04 Inf  3.810  0.0008
##  bedr / silt   2.35e+10 1.86e+11 Inf  3.023  0.0134
##  boul / grav   2.20e+01 3.40e+01 Inf  2.042  0.1727
##  boul / silt   1.56e+07 1.18e+08 Inf  2.189  0.1262
##  grav / silt   6.99e+05 5.32e+06 Inf  1.769  0.2883
##
## WaterDepth = 0.2, velo1 = 1:
##  contrast    odds.ratio       SE  df z.ratio p.value
##  bedr / boul   1.51e+03 3.94e+03 Inf  2.797  0.0265
##  bedr / grav   3.37e+04 9.22e+04 Inf  3.810  0.0008
##  bedr / silt   2.35e+10 1.86e+11 Inf  3.023  0.0134
##  boul / grav   2.20e+01 3.40e+01 Inf  2.042  0.1727
##  boul / silt   1.56e+07 1.18e+08 Inf  2.189  0.1262
##  grav / silt   6.99e+05 5.32e+06 Inf  1.769  0.2883
##
## Results are averaged over the levels of: hab
## P value adjustment: tukey method for comparing a family of 4 estimates
## Tests are performed on the log odds ratio scale

Default plotting of this object can also be useful:

plot(water_comp, CIs = TRUE)