OVERVIEW. Environments play a major role in evolution. They affect the survival and reproduction of organisms, ultimately shaping phenotypes within populations. The effects of environments can be fast or slow, constant or acute, but the many limitations we face in data collection (time, logistics, funding) often impede our ability to quantify such effects on evolution. On the other hand, some climatic events are so extreme that their effects on populations are immediate, opening opportunities to quantify selection. Today, we will test hypotheses about the effects of major hurricanes on morphological adaptations of lizards.



Student learning outcomes:


References:


Materials:



1. Morphological changes following extreme events


A. Import the data to RStudio and explore it.

Let’s import and explore the data from Donihue et al. (2018), available from Dryad. For this, save the data file to your computer as a .csv file. Note: All lengths are measured in mm and areas are measured in \(mm^2\).

# changing the dataframe's name
anol <- Donihue_2018_01_00672_Hurricanes_Data

# previewing "anol"
View(anol)
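To see how many variables “anol” has and of what types, the functions str() and dim() are handy. Below is a minimal sketch using a tiny made-up data frame in place of the real data (the values are illustrative only; with the real data you would run str(anol) and dim(anol)):

```r
# a tiny made-up data frame standing in for "anol" (values are illustrative only)
demo <- data.frame(
  Hurricane = c("Before", "Before", "After", "After"),  # categorical
  SVL       = c(45.1, 52.3, 48.7, 50.2),                # continuous, mm
  Femur     = c(9.8, 11.5, 10.4, 10.9)                  # continuous, mm
)

# structure: one line per variable, showing its type
str(demo)

# dimensions: number of rows (lizards) and columns (variables)
dim(demo)
```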


Questions:

  1. How many variables does “anol” have, and what types of variables (e.g., categorical, continuous) are they?
  2. Is there morphological data before and after an extreme event?


B. Generate a hypothesis about morphological changes after a hurricane.

Let’s generate a hypothesis, including the null hypothesis, about changes in the association between one morphological trait and the size of the lizard (snout-to-vent length; SVL) after a hurricane. Snout-to-vent length (mm) is a numerical variable, and thus we will be looking at associations between one morphological trait of your choice and SVL before and after a hurricane.


Questions:

  1. What are the variables of interest for your hypothesis?
  2. Given the variables involved in your hypothesis, what type of visualization should you carry out?


C. Plot the data.

Let’s plot the data. Below is an example using “Femur”. Choose another variable for your hypothesis.

We can use the argument color= inside aes() (the aesthetics) to differentiate between groups.

# loading the ggplot2 package
library(ggplot2)

# plotting femur versus svl 
p1 <- ggplot(anol,aes(SVL,Femur,color=Hurricane)) +
  geom_point() +
  ylab("Femur length (mm)") +
  xlab("Snout-to-vent length (mm)")+
  theme_classic(15) 

p1

Questions:

  1. Is there any pattern in your data?
  2. What is the most appropriate statistical test for your hypothesis?


D. Test your hypothesis.

Simple linear regression is useful for modeling associations between two continuous variables. This is particularly true if we are interested in predicting the values of one variable from the values of the other and in determining the rate of change of one variable given the other. We fit a linear regression by finding the line for which the total prediction error is as small as possible. Here, error is defined as the distance (green) between each data point (red) and the regression line (blue), as in the figure below.


In linear regression, the observations (red) are assumed to be the result of random deviations (green) from an underlying relationship (blue) between a response variable and an explanatory variable (https://commons.wikimedia.org/w/index.php?curid=4099808)



A linear regression - in contrast to a correlation - gives a slope defining the change in the response variable per unit change in the explanatory variable. The intercept in a linear regression gives the value of the response variable when the explanatory variable is set to zero. Finally, a linear regression can be used to predict values of the response variable.


Linear regressions are defined by a linear equation: \[ \begin{aligned} y=bx+a, \end{aligned} \] where \(y\) is the response variable, \(x\) is the explanatory variable, \(b\) is the slope and \(a\) is the \(y\)-intercept.


Let’s fit a linear regression to our data. The function lm() in R is used to fit linear models. As with a t-test, the response variable comes first and the explanatory variable second, with the two separated by a tilde (~). Below is an example using “Femur”.

# linear regression between Femur and SVL 
my_lm <- lm(anol$Femur~anol$SVL)

# model summary
summary(my_lm)
## 
## Call:
## lm(formula = anol$Femur ~ anol$SVL)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.39563 -0.48642 -0.03581  0.42777  2.54593 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.72157    0.41753  -4.123 5.95e-05 ***
## anol$SVL     0.25509    0.00836  30.513  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6685 on 162 degrees of freedom
## Multiple R-squared:  0.8518, Adjusted R-squared:  0.8509 
## F-statistic: 931.1 on 1 and 162 DF,  p-value: < 2.2e-16


Interpreting (some of) the R output

Call: The first item shown in the output is the formula R used to fit the data (the formula you just wrote).

Residuals are the differences between the actual observed values of the response variable (femur length) and the values that the model predicted (the line). The Residuals section summarizes them with 5 descriptive statistics (minimum, first quartile, median, third quartile, maximum). These statistics give you an idea of how symmetric (normally distributed) the residuals are.

The Coefficients table contains two rows; the first is the y-intercept of the line (a). In our example, the intercept is the expected femur length (-1.72 mm) when SVL = 0 mm. The second row is the slope (b). The slope term in our model says that for every 1 mm increase in SVL, femur length increases by about 0.26 mm (a positive relationship between these two variables).

Pr(>|t|) represents the p-value for each parameter.

Multiple R-squared and Adjusted R-squared provide a measure of how well the linear model fits the data. R-squared always lies between 0 and 1. In our example, the R-squared is 0.85; thus, 85% of the variance in the response variable (femur length) can be explained by the predictor variable (SVL). That’s a good fit!
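The fitted coefficients can also be used directly for prediction. For example, plugging an arbitrary SVL of 50 mm into the linear equation, with the estimates taken from the summary() output above:

```r
# coefficients from the summary() output above
a <- -1.72157   # intercept
b <-  0.25509   # slope

# predicted femur length (mm) for an example lizard with SVL = 50 mm
svl <- 50
b * svl + a
# [1] 11.03293
```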


Questions:

  1. Test your hypothesis by fitting a linear regression to your data.
  2. Write down the linear equation.
  3. What is the predicted change in the response variable as SVL increases?
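One way (not the only one) to ask whether the trait-SVL relationship changed after the hurricane within a single model is to include an interaction term, which lets the slope differ between the before and after groups. Below is a minimal sketch using simulated data as a stand-in for the anole data set (the values are made up for illustration; with the real data you would write lm(Femur ~ SVL * Hurricane, data = anol)):

```r
set.seed(1)

# simulated stand-in for the anole data (values are made up for illustration)
sim <- data.frame(
  SVL       = runif(100, 40, 60),
  Hurricane = rep(c("Before", "After"), each = 50)
)
# femur scales with SVL; the slope is made slightly shallower "After"
sim$Femur <- ifelse(sim$Hurricane == "Before",
                    -1.7 + 0.26 * sim$SVL,
                    -1.7 + 0.22 * sim$SVL) + rnorm(100, sd = 0.5)

# interaction model: the SVL:Hurricane row of the summary tests
# whether the slope differs between the two groups
int_lm <- lm(Femur ~ SVL * Hurricane, data = sim)
summary(int_lm)
```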


E. Add the regression line to the figure (example below).

Finally, we can add a fitted line to the figure in ggplot2 using the function stat_smooth(method = "lm"). Because Hurricane is mapped to color in p1, a separate line is fit for each group.

# adding linear regressions to p1
p1 + stat_smooth(method = "lm")


As published in their paper, Donihue and colleagues (2018) found a shift in the relationship between femur length and SVL of anoles following the hurricane. This shift pointed toward selection favoring lizards with shorter legs after the extreme event.


Question:

  1. Does the analysis support your hypothesis? Explain.


Great Work!