2: Hands-On Example

For this example, imagine we are working in a common-garden experiment in North Florida, where deciduous and evergreen trees were planted under the same soil, climate, and management conditions. Our research question is: Do evergreen species grow more or less than deciduous species, after accounting for tree height and species-level variation?

2.1 Research Question, Objectives and Hypotheses

Overall Objective:
To demonstrate how Linear Mixed-Effects Models (LMMs) can be used to evaluate growth differences between plant functional types while accounting for hierarchical structure in ecological data.
Specific Objectives:
- Explore the dataset visually (EDA).
- Fit a linear mixed-effects model (LMM) with species as a random effect.
- Perform diagnostics to assess model assumptions.
- Interpret fixed effects, random effects, and R².
- Visualize predictions and group-specific differences.
Hypotheses:

H₀ (null): No difference in growth between deciduous and evergreen species, after accounting for height and species.

H₁ (alternative): Growth differs between plant functional types, even after accounting for height and species.

2.2: Files for This Workshop

tree_dataset.csv → Dataset in CSV format (semicolon-delimited), containing tree growth, height, species, and plant functional type.
Download the data: tree_dataset.csv
R script
Download the R script: LMM_example.R
Attributes:
growth → tree growth in diameter (mm/year).
height → tree height (m).
pft → plant functional type (deciduous vs evergreen).
sp → species identifier (e.g., sp01, sp02…).

2.3: Install required libraries

# List of required packages 
  
required_packages <- c(
  "lme4",        
  "lmerTest",    
  "ggplot2",    
  "car",         
  "performance", 
  "viridis",     
  "patchwork"    
)

# Install any that are missing
installed <- required_packages %in% rownames(installed.packages())
if (any(!installed)) {
  install.packages(required_packages[!installed])
}

# Load all libraries
for (pkg in required_packages) {
  print(pkg)
  library(pkg, character.only = TRUE)
}

## [1] "lme4"
## [1] "lmerTest"
## [1] "ggplot2"
## [1] "car"
## [1] "performance"
## [1] "viridis"
## [1] "patchwork"

2.4: Import and Explore the Data

# Read the dataset from a CSV file
# sep = ";"   → because the file is semicolon-delimited
# header = TRUE → first row contains column names
dataset <- read.table("Z:\\06_Carine\\04_Statistics\\05_Workshop\\script_R\\tree_dataset.csv", sep = ";", header = TRUE)


# Show the first 6 rows of the dataset to check structure
head(dataset)

##      growth   sp       pft height
## 1 0.6418726 sp01 deciduous  14.14
## 2 0.7516172 sp01 deciduous  17.53
## 3 1.0683803 sp01 deciduous  25.66
## 4 0.8559022 sp01 deciduous  18.63
## 5 0.9978498 sp01 deciduous  23.33
## 6 0.9342241 sp01 deciduous  21.34

# List all unique species codes in the dataset
unique(dataset$sp)

##  [1] "sp01" "sp02" "sp03" "sp04" "sp05" "sp06" "sp07" "sp08" "sp09" "sp10"

# List the unique plant functional types (deciduous vs evergreen)
unique(dataset$pft)

## [1] "deciduous" "evergreen"

# Summary statistics of tree growth and height (min, median, mean, max, quartiles)
summary(dataset)

##      growth            sp                pft                height     
##  Min.   :0.2979   Length:100         Length:100         Min.   :10.18  
##  1st Qu.:0.7955   Class :character   Class :character   1st Qu.:13.53  
##  Median :1.0879   Mode  :character   Mode  :character   Median :16.76  
##  Mean   :1.2094                                         Mean   :17.46  
##  3rd Qu.:1.4451                                         3rd Qu.:21.49  
##  Max.   :2.6150                                         Max.   :25.83

# Exploratory Plots
# Boxplot: Tree growth by plant functional type (PFT)
# Boxes are filled with green (evergreen) and brown (deciduous)
p1 <- ggplot(dataset, aes(x = pft, y = growth, fill = pft)) +
  geom_boxplot(alpha = 0.8, color = "black") +              # black outline, colored fill
  scale_fill_manual(values = c("evergreen" = "darkgreen", 
                               "deciduous" = "saddlebrown")) +  # custom colors
  theme_minimal() +                                         # clean minimal theme
  labs(title = "a) Tree Growth by Plant Functional Type",      # plot title
       x = "Plant Functional Type (PFT)",                   # x-axis label
       y = "Growth (mm/year)")                              # y-axis label

# Boxplot: Distribution of growth by species
# Boxes are filled by plant functional type (pft)
p2<-ggplot(dataset, aes(x = sp, y = growth, fill = pft)) +
  geom_boxplot() +                                          # draw boxplots
  scale_fill_manual(values = c("evergreen" = "darkgreen", 
                               "deciduous" = "saddlebrown")) +  # custom colors
  theme_minimal() +                                         # clean minimal theme
  labs(title = "d) Tree Growth Distribution by Species",       # plot title
       x = "Species", y = "Growth (mm/year)")               # axis labels

# Scatterplot: Tree growth vs. height
# Points are colored by plant functional type (pft)
p3<-ggplot(dataset, aes(x = height, y = growth, color = pft)) +
  geom_point(size = 2, alpha = 0.7) +                       # add points (semi-transparent)
  scale_color_manual(values = c("evergreen" = "darkgreen", 
                               "deciduous" = "saddlebrown")) +  # custom colors
  theme_minimal() +                                         # clean minimal theme
  labs(title = "c) Tree Growth by Height",            # plot title
       x = "Height (m)", y = "Growth (mm/year)")            # axis labels


# Scatterplot: Tree growth vs. height
# Points are colored by species type (sp)
p4 <- ggplot(dataset, aes(x = height, y = growth, color = sp)) +
  geom_point(size = 2, alpha = 0.7) +                          # species points
  theme_minimal() +
  labs(title = "d) Tree Growth by Height (by Species)",         # title
       x = "Height (m)", y = "Growth (mm/year)",                # axis labels
       color = "Species")                                       # legend title


# Combine diagnostic plots in a 2x2 grid
(p1 | p2) / (p3 | p4) #

📝 Check Your Understanding

From the boxplots (a, b), which PFT appears to grow faster?
In panel (c), what is the overall relationship between height and growth?
In panel (c), do all species show the same height–growth pattern, or does it vary by species?

2.5: Fit a Linear Mixed-Effects Model (LMM)

#################################################################
# Step 1: Full mixed model (random slope + intercept for species)
#################################################################
  

# --------------------------------------------------------------
# Fixed effects:
#   - height (continuous predictor of growth, in meters)
#   - pft (plant functional type: deciduous vs evergreen)
#   - height:pft interaction (tests whether the slope of growth vs. height
#     differs between deciduous and evergreen species)
#
# Random effects:
#   - (height | sp): allows each species (sp) to have:
#       * its own intercept (baseline growth)
#       * its own slope with respect to height
#   This structure captures both baseline differences among species
#   and species-specific responses to tree height.
#
# Model formula:
lmm_full <- lmer(growth ~ height + pft + height:pft + (height | sp), data = dataset)
summary(lmm_full)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: growth ~ height + pft + height:pft + (height | sp)
##    Data: dataset

## 
## REML criterion at convergence: -121.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.59205 -0.57506  0.00258  0.62308  3.01476 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr
##  sp       (Intercept) 0.0409523 0.20237      
##           height      0.0006224 0.02495  0.03
##  Residual             0.0073655 0.08582      
## Number of obs: 100, groups:  sp, 10
## 
## Fixed effects:
##                     Estimate Std. Error       df t value Pr(>|t|)   
## (Intercept)          0.11721    0.10767  8.37205   1.089  0.30666   
## height               0.05699    0.01166  8.35778   4.886  0.00107 **
## pftevergreen         0.37304    0.14890  7.82085   2.505  0.03728 * 
## height:pftevergreen -0.01109    0.01636  8.12116  -0.678  0.51669   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) height pftvrg
## height      -0.130              
## pftevergren -0.723  0.094       
## hght:pftvrg  0.093 -0.713 -0.107

# --------------------------------------------------------------
# Notes on fixed effects coding:
# --------------------------------------------------------------
# Factor coding (treatment/dummy):
# - R encodes factors with dummy variables; the first level (alphabetical by default) is the reference.
# - Here, pft has two levels: "deciduous" (reference) and "evergreen".
# - "pftevergreen" = difference for evergreen relative to deciduous.

# Coefficient interpretation for:
#   growth ~ height + pft + height:pft + (height | sp)
# - (Intercept): expected growth for deciduous at height = 0 (baseline; note this may be extrapolative).
# - height: slope of growth vs. height for deciduous.
# - pftevergreen: intercept shift (evergreen − deciduous).
# - height:pftevergreen: slope shift (evergreen − deciduous).

# Fitted values (REML):
# - (Intercept) = 0.117
# - height = +0.05699  (p = 0.0011)
# - pftevergreen = +0.37304  (p = 0.037)
# - height:pftevergreen = −0.01109  (p = 0.517)

# Fitted lines:
# - Deciduous:  y = 0.11721 + 0.05699 * height
# - Evergreen:  y = (0.11721 + 0.37304) + (0.05699 − 0.01109) * height
#              ≈ 0.49025 + 0.04590 * height

📝 Check Your Understanding

What are the fixed and random effects in the equation below, and how do we interpret them?

\[ \text{Growth}_{ij} = \beta_0 + \beta_1 \cdot \text{Height}_{ij} + \beta_2 \cdot \text{PFT}_j + \beta_3 \cdot (\text{Height}_{ij} \times \text{PFT}_j) + b_{0j} + b_{1j}\text{Height}_{ij} + \epsilon_{ij} \]

Fixed effects (β₀, β₁, β₂, β₃): population-level effects common to all species.
Random effects (by species sp) (b_0j, b_1j): species-specific deviations (random intercept and random slope for height).
Residual (ε_ij): within-species unexplained error for individual trees.

Interpretation of terms

(Intercept) β₀: baseline expected growth for the reference PFT when height = 0.
height β₁: change in growth per unit height for the reference PFT (slope).
pftevergreen β₂: shift in baseline growth for evergreen vs. the reference PFT (difference at height = 0).
height:pftevergreen β₃: change in the height–growth slope for evergreen vs. the reference PFT.

Factor coding note

pft is dummy-coded: the first level (alphabetical) is the reference (here, “deciduous”); “pftevergreen” compares evergreen to that reference.

#################################################################
# Step 2: Random intercept model (simpler random structure)
#################################################################

# Fixed effects:
#   - height (continuous predictor of growth)
#   - pft (plant functional type: deciduous vs evergreen)
#   - height:pft interaction (does the effect of height differ by PFT?)
# Random effects:
#   - (1 | sp): only the intercept varies by species,
#     meaning species differ in baseline growth but share the same slope.
# Notes:
#   This is simpler than the random slope model, because it assumes
#   all species respond to height in the same way (parallel lines),
#   but they may start from different baseline levels of growth.
lmm_ri <- lmer(growth ~ height + pft + height:pft + (1 | sp), data = dataset)   # fit LMM with random intercept only
summary(lmm_ri)                                                    # show summary of fixed + random effects

## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: growth ~ height + pft + height:pft + (1 | sp)
##    Data: dataset
## 
## REML criterion at convergence: -52.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9139 -0.5964  0.0008  0.5473  2.2598 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  sp       (Intercept) 0.24249  0.4924  
##  Residual             0.01803  0.1343  
## Number of obs: 100, groups:  sp, 10
## 
## Fixed effects:
##                      Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)          0.123663   0.234932 10.186083   0.526   0.6099    
## height               0.057576   0.004610 88.260450  12.489   <2e-16 ***
## pftevergreen         0.385739   0.330776 10.011122   1.166   0.2706    
## height:pftevergreen -0.012718   0.006202 88.167459  -2.051   0.0433 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) height pftvrg
## height      -0.339              
## pftevergren -0.710  0.241       
## hght:pftvrg  0.252 -0.743 -0.327

# Compare random slope vs random intercept
anova(lmm_full, lmm_ri, refit = FALSE)                             # likelihood ratio test: slope vs intercept

## Data: dataset
## Models:
## lmm_ri: growth ~ height + pft + height:pft + (1 | sp)
## lmm_full: growth ~ height + pft + height:pft + (height | sp)
##          npar      AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)    
## lmm_ri      6  -40.302 -24.671 26.151  -52.302                         
## lmm_full    8 -105.239 -84.398 60.619 -121.239 68.937  2  1.073e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## If p < 0.05 → keep random slopes
## If p > 0.05 → random intercept is sufficient

#################################################################
# Step 3: Remove interaction (height:pft)
#################################################################

lmm_no_interaction <- lmer(growth ~ height + pft + (height | sp), data = dataset)  # drop interaction term
anova(lmm_full, lmm_no_interaction)                                                # test if interaction is significant

## Data: dataset
## Models:
## lmm_no_interaction: growth ~ height + pft + (height | sp)
## lmm_full: growth ~ height + pft + height:pft + (height | sp)
##                    npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)
## lmm_no_interaction    7 -126.69 -108.46 70.346  -140.69                     
## lmm_full              8 -125.21 -104.37 70.605  -141.21 0.5173  1      0.472
## If p > 0.05 → interaction not needed, keep simpler model

#################################################################  
# Step 4: Remove plant functional type effect (pft)
#################################################################

lmm_no_pft <- lmer(growth ~ height + (height | sp), data = dataset)   # drop pft (evergreen/deciduous) effect
anova(lmm_no_pft, lmm_no_interaction)                                 # compare with model including pft

## Data: dataset
## Models:
## lmm_no_pft: growth ~ height + (height | sp)
## lmm_no_interaction: growth ~ height + pft + (height | sp)
##                    npar     AIC     BIC logLik deviance  Chisq Df Pr(>Chisq)  
## lmm_no_pft            6 -123.11 -107.47 67.553  -135.11                       
## lmm_no_interaction    7 -126.69 -108.46 70.346  -140.69 5.5863  1     0.0181 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## If p < 0.05 → pft effect matters, keep in model

#################################################################
# Step 5: Remove height effect
#################################################################

lmm_no_height <- lmer(growth ~ pft + (height | sp), data = dataset)   # drop height effect
anova(lmm_no_interaction, lmm_no_height)                              # compare with model including height

## Data: dataset
## Models:
## lmm_no_height: growth ~ pft + (height | sp)
## lmm_no_interaction: growth ~ height + pft + (height | sp)
##                    npar     AIC      BIC logLik deviance  Chisq Df Pr(>Chisq)    
## lmm_no_height         6 -111.43  -95.798 61.715  -123.43                         
## lmm_no_interaction    7 -126.69 -108.456 70.346  -140.69 17.263  1  3.254e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# If p < 0.05 → height matters, keep in model

#################################################################
# Final chosen model
#################################################################

final_model <- lmm_no_interaction                                      # assign the selected best model
summary(final_model)                                                   # show results (coefficients, SE, p-values)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: growth ~ height + pft + (height | sp)
##    Data: dataset
## 
## REML criterion at convergence: -127.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5742 -0.5519  0.0201  0.6350  3.0028 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr
##  sp       (Intercept) 0.0399444 0.19986      
##           height      0.0005846 0.02418  0.04
##  Residual             0.0073747 0.08588      
## Number of obs: 100, groups:  sp, 10
## 
## Fixed effects:
##              Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  0.124179   0.106269 8.454012   1.169 0.274486    
## height       0.051339   0.007946 9.177103   6.461 0.000107 ***
## pftevergreen 0.362315   0.146754 7.841833   2.469 0.039356 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) height
## height      -0.089       
## pftevergren -0.721  0.026

fixef(final_model)                                                     # extract fixed effects only

##  (Intercept)       height pftevergreen 
##    0.1241793    0.0513392    0.3623146

# Interpretation of fixed effects:
#   
# Intercept (0.124)
# This is the baseline growth rate (in mm/year) for the reference group (deciduous species) when tree height = 0.
# (Of course, height = 0 is not realistic, but it anchors the model.)
# 
# height (0.0513)
# For every 1 meter increase in tree height, tree growth increases by about 0.051 mm/year (on average, across species).
# This slope applies to both deciduous and evergreen species.
# 
# pftevergreen (0.3623)
# Evergreen species grow, on average, 0.36 mm/year more than deciduous species (the reference group), after accounting for tree height.
# This shows the functional type effect.
# 
# So, the model equation can be written as:
#   
#   Deciduous species:
#   Growth = 0.124 + 0.051 × Height
# 
# Evergreen species:
#   Growth = (0.124 + 0.362) + 0.051 × Height
# Growth = 0.486 + 0.051 × Height

ranef(final_model)                                       # extract random effects only

## $sp
##      (Intercept)       height
## sp01 -0.01564030 -0.012968867
## sp02  0.02179330  0.023128275
## sp03  0.07673630  0.039909255
## sp04  0.14020490 -0.017147029
## sp05 -0.21427916 -0.007292709
## sp06  0.08649606  0.036282413
## sp07 -0.11122789 -0.017932154
## sp08  0.24076283 -0.019122809
## sp09 -0.29214778 -0.012231066
## sp10  0.06730175 -0.012625309
## with conditional variances for “sp”


# Interpretation of species-level random effects:
# Each row shows species-specific deviations (BLUPs) from the population fixed effects:
#   (Intercept): -> how the baseline (at Height = 0) for species s differs from the global intercept
#   height    :  -> how the Height slope for species s differs from the global height slope
# Positive values = above-average (higher intercept or steeper slope); negative = below-average.
# If your model includes pftevergreen as a fixed effect, it shifts the INTERCEPT only (no pft:height term),
# so random slope differences are unrelated to PFT unless you add that interaction.
# Quick read of the output:
# sp01:  intercept -0.0156 (slightly lower baseline), slope -0.0130 (slightly flatter than average)
# sp02:  intercept +0.0218 (slightly higher baseline), slope +0.0231 (steeper than average)
# sp03:  intercept +0.0767 (higher baseline),        slope +0.0399 (notably steeper than average)
# sp04:  intercept +0.1402 (higher baseline),        slope -0.0171 (somewhat flatter)
# sp05:  intercept -0.2143 (much lower baseline),    slope -0.0073 (slightly flatter)
# sp06:  intercept +0.0865 (higher baseline),        slope +0.0363 (steeper than average)
# sp07:  intercept -0.1112 (lower baseline),         slope -0.0179 (flatter)
# sp08:  intercept +0.2408 (highest baseline),       slope -0.0191 (flatter; height effect reduced)
# sp09:  intercept -0.2921 (lowest baseline),        slope -0.0122 (slightly flatter)
# sp10:  intercept +0.0673 (slightly higher),        slope -0.0126 (slightly flatter)

2.6: Model Diagnostics

###############################################################
# 4) Model Diagnostics
###############################################################

# Extract residuals (observed - fitted values)
res <- resid(final_model)

# --------------------------------------------------------------
# Shapiro-Wilk normality test
# --------------------------------------------------------------
# Tests whether residuals are normally distributed.
# Hypotheses:
# H0: Residuals follow a normal distribution.
# H1: Residuals deviate from normality.
# If p > 0.05 → assumption of normality is satisfied.
shapiro.test(res)

## 
##  Shapiro-Wilk normality test
## 
## data:  res
## W = 0.99304, p-value = 0.8905

# --------------------------------------------------------------
# Levene’s test for homogeneity of variance
# --------------------------------------------------------------
# Tests if residual variance is equal across groups (here, PFT).
# Hypotheses:
# H0: Variances are equal across groups (homoscedasticity).
# H1: Variances differ between groups (heteroscedasticity).
# If p > 0.05 → assumption of homogeneity is satisfied.
leveneTest(res ~ as.factor(dataset$pft))

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  2.7352 0.1014
##       98

# --------------------------------------------------------------
# R² for mixed model
# --------------------------------------------------------------
# Computes marginal and conditional R² values.
# - Marginal R² = variance explained by fixed effects only.
# - Conditional R² = variance explained by both fixed + random effects.
r2(final_model)

## # R2 for Mixed Models
## 
##   Conditional R2: 0.978
##      Marginal R2: 0.275

# --------------------------------------------------------------
# Prepare data for plots
# --------------------------------------------------------------
# Build a data frame with fitted values and residuals for visualization.
residual_df <- data.frame(Fitted = fitted(final_model), Residuals = res)

# --------------------------------------------------------------
# Residuals vs Fitted Plot
# --------------------------------------------------------------
# Purpose: check for systematic patterns in residuals.  
# - Ideally: residuals form a random "cloud" around 0 with no trend.  
# - Patterns (curves, funnels) may indicate non-linearity or heteroscedasticity.  
p5 <- ggplot(residual_df, aes(x = Fitted, y = Residuals, color = abs(Residuals))) +  # x = fitted values, y = residuals, color by residual magnitude
  geom_point(size = 2, alpha = 0.8) +                                                # scatterplot of residuals (semi-transparent points)
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +                 # dashed reference line at 0 (ideal mean of residuals)
  scale_color_viridis(option = "C", direction = 1) +                                 # viridis scale: blue = small residuals, yellow = large
  theme_minimal() +                                                                  # clean background
  labs(title = "a) Residuals vs Fitted",                                                # plot title
       x = "Fitted Values [Growth (mm/year)]",                                       # x-axis = predicted growth
       y = "Residuals [Growth (mm/year)]",                                           # y-axis = observed - predicted
       color = "|Residuals|")                                                        # legend explains color = |residuals|


# --------------------------------------------------------------
# Histogram of Residuals with Normal Curve
# --------------------------------------------------------------
# Purpose: check approximate normality of residuals.  
# - A bell-shaped histogram suggests residuals are close to normal.  
# - Skewed or multi-modal shapes indicate potential violations.  
p6 <- ggplot(residual_df, aes(x = Residuals)) +  
  geom_histogram(aes(y = after_stat(density), fill = after_stat(x)),                 # histogram of residuals (scaled to density)
                 bins = 20, color = "white", alpha = 0.9) +                          # 20 bins, white borders, semi-transparent fill
  
  stat_function(fun = dnorm,                                                         # overlay theoretical normal density curve
                args = list(mean = mean(res), sd = sd(res)),                         # parameters: mean and sd of residuals
                aes(color = "Theoretical Normal"),                                   # label for legend
                linewidth = 1, linetype = "dashed") +                                # dashed black curve
  
  scale_color_manual(name = "Distribution",                                          # legend for curve
                     values = c("Theoretical Normal" = "black")) +                   # black dashed line
  
  scale_fill_viridis(option = "C", direction = 1) +                                  # viridis color scale for residuals
  
  theme_minimal() +                                                                  # clean background
  
  labs(title = "b) Histogram of Residuals with Normal Curve",                           # plot title
       x = "Residuals [Growth (mm/year)]",                                           # x-axis label
       y = "Density",                                                                # y-axis label
       fill = "Residuals")                                                           # legend for histogram fill


# --------------------------------------------------------------
# Combine diagnostic plots side by side
# --------------------------------------------------------------
(p5 | p6)

Shapiro–Wilk test: Are residuals normal?
Levene’s test: Equal variance across PFTs?
R²: How much variance is explained by fixed vs random effects?

2.7:Visualization of models


# Predictions from the final model
dataset$pred_full  <- predict(final_model)              # predictions with fixed + random effects (species-specific)
dataset$pred_fixed <- predict(final_model, re.form=NA)  # predictions with fixed effects only (ignores species variation)

p4<-ggplot() +
  geom_point(data = dataset,                            # plot observed values
             aes(x = height, y = growth, color = sp),   # x = height, y = growth, color = species
             size = 2, alpha = 0.7) +                   # semi-transparent points
  
  geom_line(data = dataset,                             # plot mixed-effects predictions
            aes(x = height, y = pred_full, group = sp,  # predictions vary by species (grouped)
                color = sp,                             # color lines by species
                linetype = "Mixed (Fixed + Random)"),   # labeled in legend
            size = 0.8) +                               # thinner line for mixed effects
  
  geom_line(data = dataset,                             # plot fixed-effects predictions
            aes(x = height, y = pred_fixed, group = pft,# predictions by plant functional type (pft)
                color = pft,                            # color lines by deciduous/evergreen
                linetype = "Fixed Only"),               # labeled in legend
            size = 1.3) +                               # thicker line for fixed effects
  
  scale_linetype_manual(values = c("Fixed Only" = "solid", 
                                   "Mixed (Fixed + Random)" = "dashed")) + # define line styles
  
  scale_color_manual(                                   # set color palettes
    values = c(
      setNames(turbo(length(unique(dataset$sp))), unique(dataset$sp)), # turbo palette for species
      deciduous = "brown", evergreen = "darkgreen"                     # custom colors for PFT
    )
  ) +
  
  theme_minimal() +                                     # clean theme
  labs(title = "Height vs Growth by Species", # plot title
       y = "Growth (mm/year)", x = "Height (m)",         # axis labels
       linetype = "Prediction Type",                     # legend for line types
       color = "Species / PFT") +                        # legend for species and PFT
  
  # Annotate fixed-effect equations directly on the plot
  annotate("text", x = 10,          # position text at right side
           y = 2.4,                # place near bottom
           label = "Deciduous: Growth = 0.124 + 0.051 × Height", # regression equation for deciduous
           color = "brown", hjust = 0, size = 4) +       # brown text for deciduous
  
  annotate("text", x = 10,          # position text at right side
           y = 2.5,                # place near top
           label = "Evergreen: Growth = (0.124 + 0.362) + 0.051 × Height", # regression equation for evergreen
           color = "darkgreen", hjust = 0, size = 4)     # green text for evergreen

print(p4) # plot the graphic

Workshop – Introduction to Linear Mixed Models (LMMs)

2025-09-14

Welcome to the Workshop

Workshop Structure

1.Introduction

1.1: Objectives

1.2: Linear Mixed-Effects Models (LMM)

2: Hands-On Example

2.1 Research Question, Objectives and Hypotheses

2.2: Files for This Workshop

2.3: Install required libraries

2.4: Import and Explore the Data

2.5: Fit a Linear Mixed-Effects Model (LMM)

2.6: Model Diagnostics

2.7:Visualization of models

3: Shiny App

What you can do

How it works (by tab)

Practical tips

Under the hood

4: Wrap Up

5. References and Evaluation Survey

References

Evaluation Survey