Introduction: Linear mixed-effects models (LMMs), also known as hierarchical models, are another extension of simple linear models used when there is clustering (i.e., nested data structures) or non-independence (i.e., repeated measurements) among observations. These models are called “mixed-effects” because they incorporate both fixed and random effects. Fixed effects are variables with a constant effect on the response variable, while random effects are variables whose values or levels are assumed to be drawn randomly from a larger population of levels. Given the natural clustering in biological data (e.g., genetic groups, geographic locations), as well as the longitudinal monitoring of the same individuals over time, LMMs represent another essential tool in modern population biology.
In this module, you will extend your linear modeling skills (Module M.4) to LMMs while testing associations between Cayo Santiago rhesus macaque social cognition and age.
Upon completion of this module, you will be able to:
References:
Extra training:
Associated literature:
Notation:
Functions:
Base R:
ggeffects:
lmerTest:
In this module, you will use a subset of Cayo Santiago rhesus macaque adult demographic and cognitive data from Rosati et al. 2018 (cayo_demo_cog, Table 1) to explore ontogenic changes in social cognition using linear mixed-effects models. This is authentic demographic and cognition data. The demographic data is collected through daily visual censuses performed by the staff of the Caribbean Primate Research Center (CPRC) of the University of Puerto Rico-Medical Sciences Campus. The cognition data is based on cognitive tasks conducted in the field by a team of researchers from the University of Michigan, University of Pennsylvania, and Yale University. Here, social cognition is indexed as monkeys’ social attention towards different visual stimuli (Fig 1; Rosati et al. 2018).
Before coding, explore the data in Table 1.
Metadata of cayo_demo_cog: Demographic and cognition data of Cayo Santiago rhesus macaques.
Before any analysis, you should check the data and understand its attributes. Refer to Module M.1. Practice your coding first before clicking on the answer!
Guiding questions:
Variable classes include ‘numeric’ and ‘character’.
Biological data often presents nested structures. This occurs when observations belong to a discrete set of groups, and you suspect or have evidence that group membership has an important effect on your response variable. As group membership may explain a portion of the observed variation in the response variable, it is very important to account for it! This can ultimately improve model fit. Similarly, non-independence among individual observations due to repeated measurements must be addressed.
Guiding questions:
Yes! For example, each individual monkey belongs to one of a discrete number of Mom categories (IDs). Thus, Subject is nested or clustered within units of Mom.
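A quick way to check for this kind of nesting is to count group memberships. The sketch below uses a small made-up data frame, not cayo_demo_cog; the column names Subject and Mom are assumed to match the names used in the text, and the IDs are hypothetical.

```r
# Hypothetical toy data: 6 subjects born to 3 mothers (all IDs are made up)
toy <- data.frame(
  Subject = c("s1", "s2", "s3", "s4", "s5", "s6"),
  Mom     = c("m1", "m1", "m2", "m2", "m2", "m3")
)

# Number of distinct subjects recorded for each mother
subjects_per_mom <- tapply(toy$Subject, toy$Mom, function(x) length(unique(x)))
subjects_per_mom

# Number of distinct mothers recorded for each subject;
# if every value is 1, Subject is nested within Mom
moms_per_subject <- tapply(toy$Mom, toy$Subject, function(x) length(unique(x)))
all(moms_per_subject == 1)
```

If several subjects share a mother, their observations may not be independent, which is the situation a random effect for Mom is meant to address.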
Recall that linear models are defined by a fixed intercept and fixed slopes corresponding to the effect of each explanatory variable. Similarly, random effects can be added as random intercepts and/or random slopes for nesting variables.
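To make the idea of a random intercept concrete, the following base R sketch simulates data in which every mother shares the same age slope but has her own intercept. All numbers (number of mothers, intercept, slope, standard deviations) are made-up values chosen for illustration and are not estimates from the Cayo Santiago data.

```r
set.seed(1)  # for reproducibility

n_mom <- 5     # hypothetical number of mothers
n_per <- 20    # hypothetical number of observations per mother
beta0 <- 6     # fixed intercept (made-up value)
beta1 <- -0.2  # fixed slope for age (made-up value)

# One random intercept per mother, drawn from a normal distribution
u <- rnorm(n_mom, mean = 0, sd = 1.5)

sim <- data.frame(
  mom = factor(rep(seq_len(n_mom), each = n_per)),
  age = runif(n_mom * n_per, min = 5, max = 25)
)

# Response = fixed part + mother-specific shift + residual noise
sim$looking_time <- beta0 + beta1 * sim$age +
  u[as.integer(sim$mom)] + rnorm(nrow(sim), sd = 0.5)

# Same slope for every mother, but a different intercept (beta0 + u)
data.frame(mom = levels(sim$mom), intercept = beta0 + u, slope = beta1)
```

A random slope would add a second mother-specific term that multiplies age, so that the lines for different mothers would no longer be parallel.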
Below, you will fit an LMM to the rhesus macaque data to test whether social cognition is a function of age. To address potential maternal effects, you will add a random intercept for Mom ID. For this, you will use the R package lmerTest and its function lmer(), whose first argument is a formula with the response variable on the left of “~” and the explanatory variable on the right. For lmer(), you also need to specify the structure of the random effects within the formula as (random term | grouping variable); for example, (1 | mom) fits a random intercept for each Mom ID. To see the model output, you will use summary().
# installing lmerTest
install.packages("lmerTest",repos="http://cran.us.r-project.org")
# loading lmerTest
library(lmerTest)
# linear mixed-effects model including Mom as a random intercept
lmm1 <- lmer(Looking_Time ~ age + (1|mom), data = cayo_demo_cog) # (1|mom) to only include a random intercept for mom
# model output
summary(lmm1)
Model output interpretation. The model output summary now has an extra component: random effects. This section reports the variance and standard deviation of the random intercept fit to the model (i.e., Mom ID). This variance tells you how much of the variability in looking time is attributable to differences among moms. In other words, it tells you how much mothers differ from each other in terms of their offspring's looking time. The residual variance is the variability within each Mom ID, and thus remains unexplained by the model.
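One common way to relate these two variance components is the intraclass correlation: the random-intercept variance divided by the sum of the random-intercept and residual variances. The sketch below assumes lmerTest is installed and fits the same model structure to simulated stand-in data (every value is made up), so the resulting number says nothing about the real macaques.

```r
library(lmerTest)  # also attaches lme4, which provides VarCorr()

set.seed(42)
# Simulated stand-in for cayo_demo_cog (all values are made up)
n_mom <- 9
n_per <- 12
sim <- data.frame(
  mom = factor(rep(seq_len(n_mom), each = n_per)),
  age = runif(n_mom * n_per, min = 5, max = 25)
)
mom_shift <- rnorm(n_mom, sd = 1.5)
sim$Looking_Time <- 6 - 0.2 * sim$age + mom_shift[as.integer(sim$mom)] +
  rnorm(nrow(sim), sd = 1)

fit <- lmer(Looking_Time ~ age + (1 | mom), data = sim)

# Variance components as a data frame: one row for mom, one for the residual
vc <- as.data.frame(lme4::VarCorr(fit))
var_mom   <- vc$vcov[vc$grp == "mom"]
var_resid <- vc$vcov[vc$grp == "Residual"]

# Share of the variance not explained by age that is attributable to mothers
icc <- var_mom / (var_mom + var_resid)
icc
```

A value near 0 would suggest that mothers barely differ from one another, while a value near 1 would suggest that most of the remaining variation lies between mothers.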
Guiding questions:
Formula: the model formula defined.
Scaled residuals: minimum, 1st quartile, median, 3rd quartile, and maximum value of the scaled residuals.
Fixed effects: model parameters (y-intercept and slope) and statistics for each parameter (SE, t statistic, p-value). According to the model, the predicted mean looking time at age 0 (the y-intercept) is 5.557 s. As age increases by 1 year, looking time decreases by 0.237 s.
Correlation of Fixed Effects: correlation matrix of the fixed-effect estimates (here, the intercept and the age slope), not of the explanatory variables themselves.
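The fixed-effect estimates quoted above can be turned into a population-level prediction by hand. The short sketch below uses only the intercept and slope reported in the answer; the example ages are arbitrary values chosen for illustration.

```r
b0 <- 5.557   # y-intercept reported above (s)
b1 <- -0.237  # slope for age reported above (s per year)

ages <- c(0, 10, 20)         # example ages, chosen for illustration
predicted <- b0 + b1 * ages  # fixed-effects (population-level) prediction

data.frame(age = ages, looking_time = predicted)
```

These predictions ignore the random intercepts, so they describe an average mother rather than any particular Mom ID.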
You can use R package sjPlot to plot predictions from LMMs. However, many extra tools are available. Below you will learn how to plot mean model predictions, as well as random intercepts, using R package ggeffects. This package also allows you to incorporate aesthetics using functions from ggplot2.
# installing ggeffects
install.packages("ggeffects",repos="http://cran.us.r-project.org")
# loading ggeffects and ggplot2
library(ggeffects)
library(ggplot2)
# plotting mean model prediction
p1 <- ggpredict(lmm1, terms = c("age"), type = "fixed") # extracting prediction
plot(p1) +
  ylab("Looking time (s)") + # using ggplot functions
  xlab("Age (years)")
# plotting model prediction across groups (random intercepts for Mom)
p2 <- ggpredict(lmm1, terms = c("age", "mom"), type = "random")
plot(p2, ci=FALSE) +
  ylab("Looking time (s)") +
  xlab("Age (years)")
Guiding questions:
Out of the 9 mothers, Mom 8 appears to have a higher y-intercept than the other 8 mothers.
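A visual impression like this can also be checked numerically by extracting the group-specific coefficients from the fitted model. The sketch below assumes lmerTest is installed and uses simulated stand-in data in which mother 8 is deliberately given a large positive shift; none of the values come from cayo_demo_cog.

```r
library(lmerTest)  # also attaches lme4

set.seed(7)
# Simulated stand-in data; mother 8 gets a deliberately large shift (made up)
shift <- c(rnorm(7, sd = 0.5), 3, rnorm(1, sd = 0.5))
sim <- data.frame(
  mom = factor(rep(1:9, each = 12)),
  age = runif(108, min = 5, max = 25)
)
sim$Looking_Time <- 6 - 0.2 * sim$age + shift[as.integer(sim$mom)] +
  rnorm(108, sd = 1)

fit <- lmer(Looking_Time ~ age + (1 | mom), data = sim)

# Mother-specific coefficients: intercepts vary, the age slope is shared
per_mom <- coef(fit)$mom
per_mom[order(per_mom[["(Intercept)"]], decreasing = TRUE), ]

# Level of mom with the highest estimated intercept
top_mom <- rownames(per_mom)[which.max(per_mom[["(Intercept)"]])]
top_mom
```

With a model fitted to the real data, the same coef() call applied to lmm1 would list one intercept per Mom ID, which can be compared against the plot.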
Challenge!
Discussion questions: Compare lmm1 to your new model; which model would you choose and why?
Acknowledgements: The creation of this module was funded by the National Science Foundation DBI BRC-BIO award 2217812. Cayo Santiago is supported by the Office of Research Infrastructure Programs (ORIP) of the National Institutes of Health, grant 2 P40 OD012217, and the University of Puerto Rico (UPR), Medical Sciences Campus. Additional support was provided by the National Science Foundation Graduate Research Fellowship Program to Alexis A. Diaz, award 2141410. We acknowledge the use of BioRender.com to create Fig 1.
Authors: Raisa Hernández-Pacheco, Alexandra L. Bland, Alexis A. Diaz, Alexandra G. Rosati, California State University, Long Beach, University of Michigan