Chapter 5 Data with Three or More Levels

Chapter 5 Learning Outcomes

  1. Identify the levels of a three-level experiment and the observational units and variables at each level.

  2. Interpret fixed and random effects and associated parameter estimates in models with three or more levels.

  3. Write equations for linear mixed effects models in mathematical notation, including assumptions about distributions of random effects and error terms for models with three or more levels.

These notes accompany Chapter 10 in Beyond Multiple Linear Regression by Roback and Legler.

# Packages required for Chapter 10
library(MASS)
library(gridExtra)  
library(mnormt) 
library(lme4) 
library(lmerTest)
library(knitr) 
library(kableExtra)
library(tidyverse)
library(Hmisc)
library(nlme)

5.1 Three-Level Plants Experiment

5.1.1 Experimental Design

Researchers are interested in comparing heights of plants of two different genotypes (A and B), with two different fertilizers (A and B), and also in assessing whether being close to window affects growth. Scientists obtained four trays and placed four pots in each tray. Two of the trays were placed close to windows and the other two were placed away from windows. Within each tray, two of the pots received fertilizer A and two received fertilizer B. Two plants of each phenotype were then planted in each plot. After 4 weeks, researchers recorded the height of each plant.

Potted Plant Experimental Design

Figure 5.1: Potted Plant Experimental Design

5.1.2 The Data

The first 20 rows of the dataset are shown below.

head(plants, 20)
##    plantid tray pot window fertA phenoA height
## 1        1    1   1      1     1      1   5.59
## 2        2    1   1      1     1      0   5.33
## 3        3    1   1      1     1      1   5.70
## 4        4    1   1      1     1      0   5.77
## 5        5    1   2      1     0      1   2.93
## 6        6    1   2      1     0      0   1.79
## 7        7    1   2      1     0      1   2.57
## 8        8    1   2      1     0      0   2.12
## 9        9    1   3      1     1      1   4.86
## 10      10    1   3      1     1      0   2.13
## 11      11    1   3      1     1      1   3.24
## 12      12    1   3      1     1      0   2.10
## 13      13    1   4      1     0      1   1.10
## 14      14    1   4      1     0      0   2.35
## 15      15    1   4      1     0      1   1.79
## 16      16    1   4      1     0      0   4.88
## 17      17    2   5      0     1      1   1.51
## 18      18    2   5      0     1      0   3.71
## 19      19    2   5      0     1      1   2.30
## 20      20    2   5      0     1      0   2.30

5.1.3 Questions

  1. Between plants B, C, and D, which would you expect to be most highly correlated with plant A? Which would you expect to be least highly correlated with plant A?
  2. What are the observational units at each level? What are the treatments or variables?

5.1.4 Three Level Model

Let \(Y_{ijk}\) represent the height of plant \(k\) in pot \(j\) in tray \(i\). A possible model is:

\[ \begin{align*} Y_{ijk} & = [\alpha_{0}+\alpha_{1}\textrm{Window}_{i}+\beta_{0}\textrm{FertA}_{ij}+\gamma_{0}\textrm{PhenoA}_{ijk}] \\ & \textrm{} + [t_{i}+ p_{ij} + \epsilon_{ijk}], \end{align*} \]

where \(t_i\sim N(0, \sigma^2_t)\), \(p_{ij}\sim N(0, \sigma^2_p)\), and \(\epsilon_{ijk}\sim N(0, \sigma^2)\)

This model includes fixed effects for window, fertilizer, and phenotype, and random effects for tray and pot. We assume there are no interactions between the fixed effects.

The model includes 3 random effects:

  • \(t_i\) - deviations in average plant heights between trays, after accounting for the three fixed effects.
  • \(p_{ij}\) - deviations in average plant height between pots in the same tray, after accounting for the three fixed effects.
  • \(\epsilon_{ijk}\) - deviations between individual plants in the same tray and pot after accounting for the three fixed effects.

The model does not include any random slope effects. This means we are assuming the effect of phenotype is the same in each pot, and the effect of fertilizer is the same in each tray.

5.1.5 R Implementation of Three Level Model

M <- lmer(data=plants, height ~ window + fertA + phenoA  + (1|pot) + (1|tray))
summary(M)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: height ~ window + fertA + phenoA + (1 | pot) + (1 | tray)
##    Data: plants
## 
## REML criterion at convergence: 201.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.35859 -0.61695 -0.06346  0.56342  2.27449 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pot      (Intercept) 0.4784   0.6917  
##  tray     (Intercept) 0.1336   0.3655  
##  Residual             1.0775   1.0380  
## Number of obs: 64, groups:  pot, 16; tray, 4
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)   2.1178     0.4731  3.8400   4.476   0.0121 *
## window        1.1416     0.5662  2.0000   2.016   0.1813  
## fertA         0.7528     0.4324 11.0000   1.741   0.1095  
## phenoA        0.5997     0.2595 47.0000   2.311   0.0253 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) window fertA 
## window -0.598              
## fertA  -0.457  0.000       
## phenoA -0.274  0.000  0.000

\(\hat{\sigma}_p=0.6917\), \(\hat{\sigma}_t=0.3655\), \(\hat{\sigma}=1.0380\).

After accounting for phenotype, fertilizer, and window, the greatest amount of remaining variability occurs between individual plants in the same pot, followed by average heights among pots in the same tray, followed by average heights between trays.

5.1.6 Inappropriate LLSR Model

We compare this to an inappropriately fit LLSR model.

M_LLSR <- lm(data=plants, height ~ window + fertA + phenoA  )
summary(M_LLSR)
## 
## Call:
## lm(formula = height ~ window + fertA + phenoA, data = plants)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7591 -0.9762  0.2752  0.8677  2.1381 
## 
## Coefficients:
##             Estimate Std. Error t value      Pr(>|t|)    
## (Intercept)   2.1178     0.3126   6.775 0.00000000604 ***
## window        1.1416     0.3126   3.652      0.000548 ***
## fertA         0.7528     0.3126   2.408      0.019115 *  
## phenoA        0.5997     0.3126   1.918      0.059814 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.25 on 60 degrees of freedom
## Multiple R-squared:  0.2755, Adjusted R-squared:  0.2393 
## F-statistic: 7.606 on 3 and 60 DF,  p-value: 0.0002161
  • the LLSR model inappropriately assumes we have 64 independent observations

  • Since the phenotype variable pertains to the 64 individual plants, the “effective” sample size for inference about phenotype is still 64. Since we are able to randomly assign phenotypes within pots, the multilevel model accounts for variability associated with pot and tray, taking \(\sigma^2_t\) and \(\sigma^2_p\) out of the calculation of \(\sigma^2\) in the LSSR model, resulting in a smaller standard error associated with phenotype.

  • Since fertilizer was assigned to pots, rather than plants, the “effective” sample size is 16. This smaller sample size leads to higher standard error associated with the effect of fertilizer.

  • Since window was assigned to trays, the “effective” sample size is 4, resulting in the highest standard errors being associated with trays.

5.2 Three-Level Model with Random Slopes

5.2.1 Random Slope for PhenoA

We could allow for the possibility that the effect of phenotype is different in different pots, by adding a random slope effect for phenotype.

\[ \begin{align*} Y_{ijk} & = [\alpha_{0}+\alpha_{1}\textrm{Window}_{i}+\beta_{0}\textrm{FertA}_{ij}+\gamma_{0}\textrm{PhenoA}_{ijk}] \\ & \textrm{} + [t_i + p_{ij} + q_{ij}\textrm{PhenoA}_{ijk} + \epsilon_{ijk}], \end{align*} \]

where \(t_i\sim N(0, \sigma^2_t)\), and \(\epsilon_{ijk}\sim N(0, \sigma^2)\)

\[ \left[ \begin{array}{c} p_{ij} \\q_{ij} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{p}^{2} & \rho_{pq}\sigma_{p}\sigma_q \\ \rho_{pq}\sigma_{p}\sigma_q & \sigma_{q}^{2} \end{array} \right] \right) \]

We assume random error terms on different levels are independent of each other. That is, \(t_i\) is independent of \(p_{ij}\) and \(q_{ij}\), and all of these are independent of \(\epsilon_{ijk}\).

5.2.2 R Output for Random Slope for PhenoA

M3 <- lmer(data=plants, height ~ window + fertA + phenoA  + (phenoA|pot) + (1|tray))
summary(M3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: height ~ window + fertA + phenoA + (phenoA | pot) + (1 | tray)
##    Data: plants
## 
## REML criterion at convergence: 200.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.16998 -0.65343 -0.06843  0.67860  1.97927 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pot      (Intercept) 0.6371   0.7982        
##           phenoA      0.5754   0.7586   -0.42
##  tray     (Intercept) 0.1221   0.3494        
##  Residual             0.8938   0.9454        
## Number of obs: 64, groups:  pot, 16; tray, 4
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)   2.1268     0.4719  3.5371   4.506   0.0142 *
## window        1.1380     0.5567  1.8497   2.044   0.1879  
## fertA         0.7384     0.4334 10.8398   1.704   0.1169  
## phenoA        0.5997     0.3030 15.0002   1.979   0.0665 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) window fertA 
## window -0.590              
## fertA  -0.459  0.000       
## phenoA -0.305  0.000  0.000

The negative estimate of correlation \(\rho_{pq}\) indicates that differences between phenotypes are smaller in pots when plants of phenotype B are taller.

5.2.3 Random Slope for Fert

What if we want to allow for the possibility that the effects of fertilizers varies between trays?

The R code for such a model is shown below.

M4 <- lmer(data=plants, height ~ window + fertA + phenoA  + (1|pot) + (fertA|tray))

How would we write this model mathematically? Include any distributional assumptions about the random effects and error terms.

5.2.4 R Output for Random Slope for Fert

M4 <- lmer(data=plants, height ~ window + fertA + phenoA + (1|pot) + (fertA|tray))
summary(M4)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: height ~ window + fertA + phenoA + (1 | pot) + (fertA | tray)
##    Data: plants
## 
## REML criterion at convergence: 201.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.34479 -0.62506 -0.01316  0.55385  2.38204 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pot      (Intercept) 0.3905   0.6249        
##  tray     (Intercept) 0.4759   0.6899        
##           fertA       0.2731   0.5226   -1.00
##  Residual             1.0775   1.0380        
## Number of obs: 64, groups:  pot, 16; tray, 4
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)   1.9829     0.5404  3.0907   3.669   0.0333 *
## window        1.4115     0.5430  2.7821   2.599   0.0869 .
## fertA         0.7528     0.4830  3.8691   1.559   0.1964  
## phenoA        0.5997     0.2595 47.0000   2.311   0.0253 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) window fertA 
## window -0.502              
## fertA  -0.661  0.000       
## phenoA -0.240  0.000  0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

5.2.5 Boundary Constraints

Recall that \(-1\leq\rho\leq1\), where \(\rho\) represents the correlation between random effects at the same level. When \(\hat{\rho}=\pm1\), this indicates that the maximimum likelihood estimates are occuring along a boundary of this range, usually making them unreliable.

In these situations, we should not draw conclusions basde on \(\rho\) and should consider dropping this random effect from the model.

We could also add interaction or nonlinear terms for fixed effects if we had reason to do so.

5.3 Longitudinal Component to Plants Data

5.3.1 Repeated Measures over Time

Now suppose that in addition to this first measurement, the heights of the same plants were recorded again once per week for 4 weeks.

head(plants, 20)
##    plantid time tray pot window fertA phenoA   height
## 1        1    1    1   1      1     1      1 5.593006
## 2        1    2    1   1      1     1      1 6.588438
## 3        1    3    1   1      1     1      1 7.483192
## 4        1    4    1   1      1     1      1 8.889305
## 5        2    1    1   1      1     1      0 5.330448
## 6        2    2    1   1      1     1      0 6.350843
## 7        2    3    1   1      1     1      0 7.380016
## 8        2    4    1   1      1     1      0 8.442597
## 9        3    1    1   1      1     1      1 5.700156
## 10       3    2    1   1      1     1      1 6.549357
## 11       3    3    1   1      1     1      1 7.742034
## 12       3    4    1   1      1     1      1 8.892204
## 13       4    1    1   1      1     1      0 5.770243
## 14       4    2    1   1      1     1      0 6.550047
## 15       4    3    1   1      1     1      0 7.632318
## 16       4    4    1   1      1     1      0 8.796274
## 17       5    1    1   2      1     0      1 2.925716
## 18       5    2    1   2      1     0      1 3.626215
## 19       5    3    1   2      1     0      1 4.687552
## 20       5    4    1   2      1     0      1 5.433369

5.3.2 Levels and Variables

Now, this is a 4-level dataset.

Level Observational Unit Treatment or Variable
Level 1 time points time
Level 2 plants phenotype
Level 3 pots fertilizer
Level 4 trays window

5.3.3 Model for Longitudinal Plants Study

Let \(Y_{ijkl}\) represent the height of plant \(k\) in pot \(j\) in tray \(i\) at time \(l\). A possible model is:

\[ \begin{align*} Y_{ijkl} & = [\alpha_{0}+\alpha_{1}\textrm{Window}_{i}+\beta_{0}\textrm{FertA}_{ij}+\gamma_{0}\textrm{PhenoA}_{ijk} + \delta_{0}\textrm{Time}_{ijkl}] \\ & \textrm{} + [t_{i}+ p_{ij} + f_{ijk} + \epsilon_{ijkl}], \end{align*} \]

where \(t_i\sim N(0, \sigma^2_t)\), \(p_{ij}\sim N(0, \sigma^2_p)\), \(f_{ijk}\sim N(0, \sigma^2_f)\),and \(\epsilon_{ijkl}\sim N(0, \sigma^2)\)

This model involves random intercepts for each tray(\(t_i\)), pot(\(p_{ij}\)), plant (\(f_{ijk}\)), in addition to a random error term \(\epsilon{ijkl}\) capturing random differences between measurements on the same plant over time.

In addition to the assumptions of model 1, it assumes each plant grows at the same rate over time, after accounting for the fixed effects in the model.

5.3.4 R Output for Longitudinal Model

M5 <- lmer(data=plants, height ~ window + fertA + phenoA + time + (1|tray) + (1|pot) + (1|plantid))
summary(M5)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: height ~ window + fertA + phenoA + time + (1 | tray) + (1 | pot) +  
##     (1 | plantid)
##    Data: plants
## 
## REML criterion at convergence: 362.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.81419 -0.56459 -0.00598  0.56245  2.67726 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  plantid  (Intercept) 1.10406  1.0507  
##  pot      (Intercept) 0.77702  0.8815  
##  tray     (Intercept) 0.07314  0.2705  
##  Residual             0.08029  0.2834  
## Number of obs: 256, groups:  plantid, 64; pot, 16; tray, 4
## 
## Fixed effects:
##              Estimate Std. Error        df t value            Pr(>|t|)    
## (Intercept)   1.21075    0.50407   4.40523   2.402              0.0683 .  
## window        1.47080    0.58108   2.00001   2.531              0.1270    
## fertA         0.90342    0.51431  11.00003   1.757              0.1068    
## phenoA        0.58651    0.26506  47.00038   2.213              0.0318 *  
## time          0.66370    0.01584 190.99990  41.899 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) window fertA  phenoA
## window -0.576                     
## fertA  -0.510  0.000              
## phenoA -0.263  0.000  0.000       
## time   -0.079  0.000  0.000  0.000

We see that after accounting for time, phenotype, fertilizer, and window location, there is more variability between average heights of different plants in the same pot (\(\sigma_f=1.05\)), and in average heights in different pots in the same tray (\(\sigma_p=0.88\)), than between average heights in different trays(\(\sigma_t=0.27\)), or between heights of the same plant at different times(\(\sigma=0.28\)).

There is evidence that plants grow over time, and evidence of differences between the phenotypes.

5.4 Four-Level Longitudinal Model with Random Slopes

5.4.1 Random Slope for Time

We could add a random error term associated with the slope on time, to allow for the possibility that individual plants grow at different rates, even after accounting for the fixed effects in the model

\[ \begin{align*} Y_{ijkl} & = [\alpha_{0}+\alpha_{1}\textrm{Window}_{i}+\beta_{0}\textrm{FertA}_{ij}+\gamma_{0}\textrm{PhenoA}_{ijk} + \delta_{0}\textrm{Time}_{ijkl}] \\ & \textrm{} + [t_{i}+ p_{ij} + f_{ijk} + g_{ijk}\textrm{Time}_{ijkl} + \epsilon_{ijkl}], \end{align*} \]

where \(t_i\sim N(0, \sigma^2_t)\), \(p_{ij}\sim N(0, \sigma^2_p)\), \(\epsilon_{ijkl}\sim N(0, \sigma^2)\), and

\[ \begin{equation*} \left[ \begin{array}{c} f_{ijk} \\g_{ijk} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{f}^{2} & \rho_{fg}\sigma_{f}\sigma_g \\ \rho_{fg}\sigma_{f}\sigma_g & \sigma_{g}^{2} \end{array} \right] \right) \end{equation*} \]

M5 <- lmer(data=plants, height ~ window + fertA + phenoA + time+  (time|plantid) + (1|pot) + (1|tray))
summary(M5)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: height ~ window + fertA + phenoA + time + (time | plantid) +  
##     (1 | pot) + (1 | tray)
##    Data: plants
## 
## REML criterion at convergence: 211.8
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.10148 -0.52521 -0.00405  0.46057  2.19928 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  plantid  (Intercept) 1.15701  1.0756       
##           time        0.03871  0.1967   0.22
##  pot      (Intercept) 0.17876  0.4228       
##  tray     (Intercept) 0.20608  0.4540       
##  Residual             0.01646  0.1283       
## Number of obs: 256, groups:  plantid, 64; pot, 16; tray, 4
## 
## Fixed effects:
##             Estimate Std. Error       df t value            Pr(>|t|)    
## (Intercept)  1.81814    0.45694  3.32508   3.979              0.0234 *  
## window       0.73079    0.56786  1.99920   1.287              0.3270    
## fertA        0.47511    0.34115  7.96885   1.393              0.2013    
## phenoA       0.54007    0.26775 43.09443   2.017              0.0499 *  
## time         0.66370    0.02562 63.00065  25.909 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) window fertA  phenoA
## window -0.621                     
## fertA  -0.373  0.000              
## phenoA -0.293  0.000  0.000       
## time    0.051  0.000  0.000  0.000

The positive correlation between \(s_{ijk}\) and \(r_{ijk}\) tells us that taller plants tend to grow faster on average, even after accounting for fixed effects in the model.

5.4.2 More Random Slope Terms

We could also allow the effect of phenotype to vary between pots and the effect of fertilizer to vary between trays. This gives a model of the form:

\[ \begin{align*} Y_{ijk} & = [\beta_{0}+\beta_{1}\textrm{Window}_{i}+\beta_{2}\textrm{FertA}_{ij}+\beta_{3}\textrm{PhenoA}_{ijk} + \beta_{3}\textrm{Time}_{ijkl}] \\ & \textrm{} + [t_{i}+ s_i\textrm{fertA}_i + p_{ij} + q_{ij}\textrm{phenoA}_{ij} + f_{ijk} + g_{ijk}\textrm{Time}_{ijkl} + \epsilon_{ijkl}], \end{align*} \]

We again assume that random effects at different levels are independent. Thus, \(\epsilon_{ijkl}\sim\mathcal{N}(0, \sigma^2)\)

\[ \begin{equation*} \left[ \begin{array}{c} t_{i} \\s_{i} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{t}^{2} & \rho_{ts}\sigma_{t}\sigma_s \\ \rho_{ts}\sigma_{t}\sigma_s & \sigma_{s}^{2} \end{array} \right] \right) \end{equation*} \]

\[ \begin{equation*} \left[ \begin{array}{c} p_{ij} \\q_{ij} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{p}^{2} & \rho_{pq}\sigma_{p}\sigma_q \\ \rho_{pq}\sigma_{p}\sigma_q & \sigma_{q}^{2} \end{array} \right] \right) \end{equation*} \]

\[ \begin{equation*} \left[ \begin{array}{c} f_{ijk} \\g_{ijk} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{f}^{2} & \rho_{fg}\sigma_{f}\sigma_g \\ \rho_{fg}\sigma_{f}\sigma_g & \sigma_{g}^{2} \end{array} \right] \right) \end{equation*} \]

M5 <- lmer(data=plants, height ~ window + fertA + phenoA + time+  (time|plantid) + (phenoA|pot) + (fertA|tray))
summary(M5)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: height ~ window + fertA + phenoA + time + (time | plantid) +  
##     (phenoA | pot) + (fertA | tray)
##    Data: plants
## 
## REML criterion at convergence: 210.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.09915 -0.51800  0.00304  0.44894  2.19798 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  plantid  (Intercept) 0.96878  0.9843        
##           time        0.03871  0.1967   0.22 
##  pot      (Intercept) 0.37653  0.6136        
##           phenoA      0.55487  0.7449   -0.70
##  tray     (Intercept) 0.35597  0.5966        
##           fertA       0.13997  0.3741   -0.69
##  Residual             0.01646  0.1283        
## Number of obs: 256, groups:  plantid, 64; pot, 16; tray, 4
## 
## Fixed effects:
##             Estimate Std. Error       df t value            Pr(>|t|)    
## (Intercept)  1.71536    0.50375  1.46343   3.405               0.116    
## window       0.90369    0.57817  1.87488   1.563               0.266    
## fertA        0.50662    0.37849  2.69858   1.339               0.282    
## phenoA       0.54121    0.30820 14.50247   1.756               0.100    
## time         0.66370    0.02562 63.00080  25.909 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) window fertA  phenoA
## window -0.574                     
## fertA  -0.484  0.000              
## phenoA -0.324  0.000  0.000       
## time    0.042  0.000  0.000  0.000

After accounting for window, fertilizer, phenotype, and time,

  • plants that are taller initially tend to grow faster than those that are shorter initially (\(\hat{\rho}_{fg} =0.22\))

  • difference between phenotypes is smaller in pots where plants of phenotype B are taller

  • difference between fertilizers is smaller in trays where plants receiving fertilizer B are taller

5.4.3 Additional Random Slopes

In theory, we could add a random slope term that also allows plants in the same pot to change at different rates over time.

We would write this model as:

\[ \begin{align*} Y_{ij} & = [\beta_{0}+\beta_{1}\textrm{Window}_{i}+\beta_{2}\textrm{FertA}_{ij}+\beta_{3}\textrm{PhenoA}_{ijk} + \beta_{3}\textrm{Time}_{ijkl}] \\ & \textrm{} + [t_{i}+ s_i\textrm{fertA}_{ij} + p_{ij} + q_{ij}\textrm{phenoA}_{ijk} +r_{ij}\textrm{Time}_{ijkl}+ f_{ijk} + g_{ijk}\textrm{Time}_{ijkl} + \epsilon_{ijkl}], \end{align*} \]

We again assume that random effects at different levels are independent. Thus, \(\epsilon_{ijkl}\sim\mathcal{N}(0, \sigma^2)\)

\[ \begin{equation*} \left[ \begin{array}{c} t_{i} \\s_{i} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{t}^{2} & \rho_{ts}\sigma_{t}\sigma_s \\ \rho_{ts}\sigma_{t}\sigma_s & \sigma_{s}^{2} \end{array} \right] \right) \end{equation*} \]

\[ \begin{equation*} \left[ \begin{array}{c} p_{ij} \\q_{ij} \\ r_{ij} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{p}^{2} & \rho_{pq}\sigma_{p}\sigma_q & \rho_{pr}\sigma_{p}\sigma_r\\ \rho_{pq}\sigma_{p}\sigma_q & \sigma_{q}^{2}& \rho_{qr}\sigma_{q}\sigma_r\\ \rho_{pr}\sigma_{p}\sigma_r & \rho_{qr}\sigma_{q}\sigma_r & \sigma_r^2 \\ \end{array} \right] \right) \end{equation*} \]

\[ \begin{equation*} \left[ \begin{array}{c} f_{ijk} \\g_{ijk} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{f}^{2} & \rho_{fg}\sigma_{f}\sigma_g \\ \rho_{fg}\sigma_{f}\sigma_g & \sigma_{g}^{2} \end{array} \right] \right) \end{equation*} \]

5.4.4 R - Model with Additional Random Slopes

M6 <- lmer(data=plants, height ~ window + fertA + phenoA + time+  (time|plantid) + (time|pot) + (phenoA|pot) + (fertA|tray))
summary(M6)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: height ~ window + fertA + phenoA + time + (time | plantid) +  
##     (time | pot) + (phenoA | pot) + (fertA | tray)
##    Data: plants
## 
## REML criterion at convergence: 172.5
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.09118 -0.52573  0.03103  0.47582  2.20602 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  plantid  (Intercept) 0.84573  0.9196        
##           time        0.01243  0.1115   -0.03
##  pot      (Intercept) 0.61375  0.7834        
##           time        0.02758  0.1661   1.00 
##  pot.1    (Intercept) 0.17401  0.4171        
##           phenoA      0.58649  0.7658   -1.00
##  tray     (Intercept) 0.05352  0.2313        
##           fertA       0.29478  0.5429   1.00 
##  Residual             0.01646  0.1283        
## Number of obs: 256, groups:  plantid, 64; pot, 16; tray, 4
## 
## Fixed effects:
##             Estimate Std. Error       df t value       Pr(>|t|)    
## (Intercept)  2.51486    0.40257  6.85950   6.247       0.000461 ***
## window      -0.30184    0.45850  3.01923  -0.658       0.557062    
## fertA        0.09256    0.39106  3.17981   0.237       0.827342    
## phenoA       0.56182    0.30098 14.99441   1.867       0.081635 .  
## time         0.66370    0.04438 15.04525  14.955 0.000000000194 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) window fertA  phenoA
## window -0.569                     
## fertA  -0.052  0.000              
## phenoA -0.387  0.000  0.000       
## time    0.446  0.000  0.000  0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

This model, however, leads to boundary contraint issues.

5.4.5 When Not to Add Random Slopes

We can only add random slopes for variables measured at a lower level than the observational unit. Otherwise, the variable will not change within the observational unit.

For example, we can’t add a random slope for effect of fertilizer (a level 3 variable) between pots (our level 3 observational units). All plants in the same pot have the same type of fertilizer, so we can’t estimate difference of fertilizer effects between pots.

M7 <- lmer(data=plants, height ~ window + fertA + phenoA + time+  (1|plantid) + (1|pot) + (fertA|pot) + (1|tray))
summary(M7)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: height ~ window + fertA + phenoA + time + (1 | plantid) + (1 |  
##     pot) + (fertA | pot) + (1 | tray)
##    Data: plants
## 
## REML criterion at convergence: 359.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.79058 -0.55734 -0.00857  0.57002  2.67329 
## 
## Random effects:
##  Groups   Name        Variance       Std.Dev.   Corr
##  plantid  (Intercept) 1.071505907081 1.03513570     
##  pot      (Intercept) 0.000000001668 0.00004084     
##  pot.1    (Intercept) 0.000000000000 0.00000000     
##           fertA       1.529967713440 1.23691864  NaN
##  tray     (Intercept) 0.637103820226 0.79818784     
##  Residual             0.080293948587 0.28336187     
## Number of obs: 256, groups:  plantid, 64; pot, 16; tray, 4
## 
## Fixed effects:
##              Estimate Std. Error        df t value            Pr(>|t|)    
## (Intercept)   1.43324    0.63318   2.21882   2.264              0.1393    
## window        1.02583    0.86926   1.96543   1.180              0.3611    
## fertA         0.90342    0.50938   8.36789   1.774              0.1124    
## phenoA        0.58651    0.26120  50.13334   2.245              0.0292 *  
## time          0.66370    0.01584 191.00029  41.899 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) window fertA  phenoA
## window -0.686                     
## fertA  -0.106  0.000              
## phenoA -0.206  0.000  0.000       
## time   -0.063  0.000  0.000  0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

In this section, we have focused on random effects, but we can, of course, explore fixed effects (such as interactions) as we’ve done before.

5.5 Practice Problems

Researchers are interested in studying the activity levels of two different species of fish. They are also interested in assessing the effect of the size of the fish tank and water temperature on activity level. Fish in three different aquariums with different water temperatures (one cold, one medium, and one warm) were studied. In each aquarium, there were two different tanks, one large, and one small. Two fish of each species were placed in each tank.

The dataset contains the following variables.

Id - fish ID number
Aquarium - numbered 1-3
Tank - numbered 1-6
Species - A or B
SizeLarge - was the tank large? (0=No, 1=Yes)
Temp - Temperature of the water (cold, medium, warm)
Activity - a measure of the fish’s activity level

The numbers under each fish show that fish’s activity level. Average activity levels for each tank and each aquarium are also given.

Fish in Aquarium Study

Figure 5.2: Fish in Aquarium Study

The first 10 rows of the dataset are shown below.

Fish <- 1:24
Tank <- factor(c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4), rep(5, 4), rep(6, 4)))
Aquarium <- c(rep(1, 8), rep(2, 8), rep(3, 8))
Species <- c("A","A","B","B","A","A","B","B","A","A","B","B","A","A","B","B","A","A","B",'B',"A","A","B","B")
Temp <- c(rep("Cold", 8), rep("Medium", 8), rep("Warm", 8))
Size <- c(rep(c("Large", "Large", "Small", "Small"),3)) 
Activity <- c(55, 45, 45, 55, 50, 50, 40, 60, 80, 40, 45, 75, 50, 66, 45, 71, 75, 35, 60, 50, 40, 70, 60, 50)
Activity <- c(60, 50, 40, 40, 50, 50, 40, 60, 80, 40, 45, 75, 50, 66, 45, 71, 80, 40, 65, 55, 35, 65, 55, 45)
FishData <- data.frame(Fish, Tank, Aquarium, Species, Temp, Size, Activity)
head(FishData, 10)
##    Fish Tank Aquarium Species   Temp  Size Activity
## 1     1    1        1       A   Cold Large       60
## 2     2    1        1       A   Cold Large       50
## 3     3    1        1       B   Cold Small       40
## 4     4    1        1       B   Cold Small       40
## 5     5    2        1       A   Cold Large       50
## 6     6    2        1       A   Cold Large       50
## 7     7    2        1       B   Cold Small       40
## 8     8    2        1       B   Cold Small       60
## 9     9    3        2       A Medium Large       80
## 10   10    3        2       A Medium Large       40

28) Identify the levels of a three-level experiment and the observational units and variables at each level.

Identify the levels of the data and list the observational unit and variable(s) measured at each level of the data.

29) Interpret fixed and random effects and associated parameter estimates in models with three or more levels.

For parts (a)-(b), let \(Y_{ijk}\) represent the activity level of a fish, where \(i\) indexes aquarium, \(j\) indexes tank, and \(k\) indexes fish. Suppose we fit a model of the form.

\(Y_{ijk} = \beta_{0} + a_{i}+t_{ij}+\epsilon_{ijk}\), where \(a_{i}\sim N(0,\sigma^2_a)\), \(t_{ij}\sim N(0,\sigma^2_t)\), and \(\epsilon_{ijk}\sim N(0,\sigma^2)\).

  1. Explain in words what \(\beta_0\), \(\sigma_a\), \(\sigma_t\), and \(\sigma\) represent in this context.

  2. Based on the activity levels shown in the illustration, which of \(\sigma_a\), \(\sigma_t\), and \(\sigma\), is the largest? Which is the smallest? Explain your answers.

  3. Now suppose we fit a model of the form

\(Y_{ijk} = \beta_{0} +\beta_1\textrm{SpeciesB}_{ijk} + a_{i}+t_{ij}+\epsilon_{ijk}\), where \(a_{i}\sim N(0,\sigma^2_a)\), \(t_{ij}\sim N(0,\sigma^2_t)\), and \(\epsilon_{ijk}\sim N(0,\sigma^2)\).

Explain in words what \(\beta_0\), \(\sigma_a\), \(\sigma_t\), and \(\sigma\) represent in this context. In your answers state clearly how these interpretations are different than for the model in (a).

30) Write equations for linear mixed effects models in mathematical notation, including assumptions about distributions of random effects and error terms for models with three or more levels.

Write the mathematical notation of each of the following models, specified in R. Include appropriate subscripts and state the distributions of all random effects and error terms.

M1 <- lmer(data=Fish, activity ~ species + size + (1|tank) + (1|aquarium))
M2 <- lmer(data=Fish, activity ~ species + size + (species|tank) + (1|aquarium))
M2 <- lmer(data=Fish, activity ~ species + size + (1|tank) + (size|aquarium))