Lecture 10: Mixed Models for Multivariate Regression

Jihong Zhang*, Ph.D

Educational Statistics and Research Methods (ESRM) Program*

University of Arkansas

2024-10-09

Today's Class

  • Multivaraite regression via mixed models
  • Comparing and contrasting path analysis with mixed models
    • Differences in model fit measures
    • Differences in software estimation methods
    • Model comparisons via multivariate Wald tests (instead of LRTs)
    • How to compute R-square

R Setup

library(ESRM64503)
library(kableExtra)
library(tidyverse)
library(DescTools) # Desc() allows you to quick screen data
library(lavaan) # Desc() allows you to quick screen data
head(dataMath)
  id hsl cc use msc mas mse perf female
1  1  NA  9  44  55  39  NA   14      1
2  2   3  2  77  70  42  71   12      0
3  3  NA 12  64  52  31  NA   NA      1
4  4   6 20  71  65  39  84   19      0
5  5   2 15  48  19   2  60   12      0
6  6  NA 15  61  62  42  87   18      0
dim(dataMath)
[1] 350   9

Correction about fixed.x argument in previous lecture

  • If TRUE, the exogenous x covariates are considered fixed variables and the means, variances and covariances of these variables are fixed to their sample values.

  • If FALSE, they are considered random, and the means, variances and covariances are free parameters. Typically, called latent variable

  • If “default”, the value is set depending on the mimic option.

Thus, we considered the distributions of exogenous variables as known parameters.

What is Mixed Models

  • A mixed model, mixed-effects model or Linear mixed models (LMMs) is a statistical model containing both fixed effects and random effects. These models are useful in a wide variety of disciplines in the physical, biological and social sciences.

  • Mixed model can answer similar research questions as Path Analysis (or structural equation model):

    • Relationships among multiple endogenous variables

Fixed effects vs. Random effects

  1. Definition: Fixed effects are constant across individuals, and random effects vary.

Assume a person is measured t times (repeated measure design or longitudinal design), thus we have t points of x and y for each individuals

\[ y_{it} = \beta_{0i} + \beta_1 x_{it} \]

Here, \(\beta_{0i}\) is random intercept that varies across individuals. \(\beta_{1}\) is the fixed slope that shared acorss individuals.

  1. Alternative definition: Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population.

\[ y_i = \beta_0 + \beta_1 x_i +e_i \] Thus, \(\sigma^e\) is random effect, \(\beta_0\) and \(\beta_1\) are fixed effects.

Properties of Mixed Models

  1. Mixed models are used for many types of analyses:
    • Analogous to MANOVA and M-Regression (so repeated measures analyses)
    • Multilevel models for clustered, longitudinal, and crossed-effects data
  2. The biggest difference between mixed models and path analysis software is the assumed distribution of the exogenous variables
    • Mixed models: no distribution assumed
    • Path analysis: most software assumes multivariate normal distribution
    • This affects how missing data are managed – mixed models cannot have any missing IVs
  3. Mixed models also do not allow endogenous variables to predict other endogenous variables
    • No indirect effects are possible from a single analysis (multiple analyses needed)
  4. Mixed models software also often needs variables to be stored in so-called “stacked” or long-format data (one row per DV)
    • We used wide-format data for lavaan (one row per person)

Wide to Long Data Transformation

  • Original wide-format data (all DVs for a person on one row)
dat <- dataMath
dat$cc10 <- dat$cc - 10
dat_wide <- dat |> select(id, perf, use, female, cc10)
head(dat_wide) # show first 6 lines
  id perf use female cc10
1  1   14  44      1   -1
2  2   12  77      0   -8
3  3   NA  64      1    2
4  4   19  71      0   10
5  5   12  48      0    5
6  6   18  61      0    5
  • Reshape with pivot_longer() function and Resulting data:
dat_long <- dat_wide |> 
1  pivot_longer(cols = c(perf, use),
2               names_to = "DV",
3               values_to = "score") |>
  mutate(dPerf = ifelse(DV == 'perf', 0, 1)) # convert DVs into indicator variable - dperf
head(dat_long)
1
cols: Columns to pivot into longer format. Put multiple column names (no quote) into c()
2
names_to: A character vector specifying the new column to create from the information stored in the column names
3
values_to: A string specifying the name of the column to create from the data stored in cell values.
# A tibble: 6 × 6
     id female  cc10 DV    score dPerf
  <int>  <int> <dbl> <chr> <int> <dbl>
1     1      1    -1 perf     14     0
2     1      1    -1 use      44     1
3     2      0    -8 perf     12     0
4     2      0    -8 use      77     1
5     3      1     2 perf     NA     0
6     3      1     2 use      64     1

Execise 1: Wide to Long Transform

  • Turn msc, mas, mse into long-form
dat_wide <- dat |> select(id, msc, mas, mse)
## You turn
id DV Score
1 msc 55
1 mas 39
1 mse NA
2 msc 70

Statistical Form: Bridge Path Model w/t Mixed Model

  • Before we dive into mixed models, we will begin with a multivariate regression model:
    • Predicting mathematics performance (PERF) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)
    • Predicting perceived usefulness (USE) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)

\[ PERF_i = \beta_{0,PERF} + \beta_{F,PERF} F_i + \beta_{CC,PERF}CC_i + \beta_{F*CC,PERF}F_i*CC_i + e_{i,PERF} \qquad(1)\] \[ USE_i = \beta_{0,USE} + \beta_{F,USE} F_i + \beta_{CC,USE}CC_i + \beta_{F*CC,USE}F_i*CC_i + e_{i,USE} \qquad(2)\]

  • Mixed Model: Here I use the symbol \(\delta\) to represent each fixed effect in the multivariate model from the mixed model perspective. dPERF is the DV indicator: 1 - Perf and 0 - Use

\[ Score_i = (\delta_{0,PERF} + \delta_{0, dPERF}) + (\delta_{F,PERF} + \delta_{F, dPERF}) F_i + (\delta_{CC,PERF} \\ + \delta_{CC, dPERF}) CC_i + (\delta_{F*CC,PERF} + \delta_{F*CC,dPERF}) F_i*CC_i + e_{i,PERF} + e_{i,dPERF} \qquad(3)\]

\[ Score_i = \delta_{0,PERF} + \delta_{0, dPERF}dPERF_i + \delta_{F,PERF} F_i + \delta_{F, dPERF}dPERF_i * F_i \\ + \delta_{CC,PERF} CC_i + \delta_{CC, dPERF} dPERF_i * CC_i \\ + \delta_{F*CC,PERF} F_i*CC_i + \delta_{F*CC,dPERF} dPERF_i * F_i*CC_i + e_{i,PERF} + e_{i,dPERF} \qquad(4)\]

Build the Empty Model: Not So Empty

Statistical Form of empty mixed model

  • For illustration, let’s start from the empty model

  • A multivariate model using mixed model software uses the dummy code for DV to make all effects conditional on the specific DV in the model

    • I will compare/contrast these with the symbols \(\beta\) from the fixed effects in path analysis
  • For instance, our empty model is thus:

    • Where Score is condition on the value of dPerf:
      • When \(dPerf = 0\) \(\rightarrow\) DV = “Use” \(\rightarrow\) \(Score_i = Use_i = \delta_0 + e_{i, Use}\)
      • When \(dPerf = 1\) \(\rightarrow\) DV = “Perf” \(\rightarrow\) \(Score_i = Perf_i = \delta_0 + \delta_1 + e_{i, perf}\)

\[ Score_{i, DV} = \delta_0 + \delta_1 dPerf_i + e_{i, DV} \]

Estimating the Empty Model

  • From the nlme library, we will use the gls() function

    • Be sure the library is installed and loaded before trying this!
library(nlme) # install.packages("nlme")

dat_long <- dat_long[complete.cases(dat_long), ]
# create empty model using REML estimation to attempt to mirror initial analysis:
1model01_mixed = gls(model = score ~ 1 + dPerf,
                    data = dat_long,
2                    method = "REML",
3                    correlation = corSymm(form = ~1|id),
4                    weights = varIdent(form = ~1|DV))
summary(model01_mixed)
1
score ~ 1 + dPerf: \(Score_{i, DV} = \delta_0 + \delta_1 dPerf_i + e_{i, DV}\)
2
“REML”: Residual Maximum Likelihood Estimation
3
corSymm: General Correlation Structure; Provides estimates of all unique correlations; Needs id variable name after | for program to know which data comes from which person; ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups.
4
varIdent: a constant variance function structure; Estimates a different (residual) variance for each DV; With correlation line ensures an unstructured model is estimated

Estimating the Empty Model

Generalized least squares fit by REML
  Model: score ~ 1 + dPerf 
  Data: dat_long 
       AIC      BIC    logLik
  3774.313 3795.871 -1882.156

Correlation Structure: General
 Formula: ~1 | id 
 Parameter estimate(s):
 Correlation: 
  1    
2 0.136
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | DV 
 Parameter estimates:
    perf      use 
1.000000 5.337397 

Coefficients:
               Value Std.Error  t-value p-value
(Intercept) 13.94085 0.1868631 74.60461       0
dPerf       38.46901 0.9399708 40.92575       0

 Correlation: 
      (Intr)
dPerf -0.079

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-3.24789265 -0.64196261  0.01956532  0.68109325  2.99644101 

Residual standard error: 3.023304 
Degrees of freedom: 553 total; 551 residual

Empty Model Results I: Covariance Matrix of DVs

  • The covariance matrix of DVs comes from the getVarCov() function
getVarCov(model01_mixed)
Marginal variance covariance matrix
       [,1]     [,2]
[1,] 9.1404   6.6311
[2,] 6.6311 260.3900
  Standard Deviations: 3.0233 16.137 
  • Estimated variance-covariance matrix of PERF and USE scores.

Mapping Multivariate Mixed Models onto Path Models

  • To compare this result with the path analyses we conducted previously, we’ll have to use this data set
    • Omit the same observations
  • So, we’ll need to take our long-format data and reshape it into wide-format:
head(dat_wide)
  id perf use female cc10
1  1   14  44      1   -1
2  2   12  77      0   -8
3  3   NA  64      1    2
4  4   19  71      0   10
5  5   12  48      0    5
6  6   18  61      0    5
library(lavaan)
model01_mirror.syntax = "
# Means:
perf ~ 1
use ~ 1

# Variances:
perf ~~ perf
use ~~ use

# Covariance:
perf ~~ use
"

model01_path_noNA.fit = sem(model01_mirror.syntax, data = dat_wide,
                            fixed.x = TRUE, 
                            mimic = "MPLUS", 
                            estimator = "MLR")
summary(model01_path_noNA.fit, 
        fit.measures = TRUE,
        standardized = TRUE)

Mapping Multivariate Mixed Models onto Path Models

lavaan 0.6.17 ended normally after 29 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

                                                  Used       Total
  Number of observations                           348         350
  Number of missing patterns                         3            

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                 0.000       0.000
  Degrees of freedom                                 0           0

Model Test Baseline Model:

  Test statistic                                 6.064       5.573
  Degrees of freedom                                 1           1
  P-value                                        0.014       0.018
  Scaling correction factor                                  1.088

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000       1.000
  Tucker-Lewis Index (TLI)                       1.000       1.000
                                                                  
  Robust Comparative Fit Index (CFI)                         1.000
  Robust Tucker-Lewis Index (TLI)                            1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2085.032   -2085.032
  Loglikelihood unrestricted model (H1)      -2085.032   -2085.032
                                                                  
  Akaike (AIC)                                4180.064    4180.064
  Bayesian (BIC)                              4199.325    4199.325
  Sample-size adjusted Bayesian (SABIC)       4183.464    4183.464

Root Mean Square Error of Approximation:

  RMSEA                                          0.000          NA
  90 Percent confidence interval - lower         0.000          NA
  90 Percent confidence interval - upper         0.000          NA
  P-value H_0: RMSEA <= 0.050                       NA          NA
  P-value H_0: RMSEA >= 0.080                       NA          NA
                                                                  
  Robust RMSEA                                               0.000
  90 Percent confidence interval - lower                     0.000
  90 Percent confidence interval - upper                     0.000
  P-value H_0: Robust RMSEA <= 0.050                            NA
  P-value H_0: Robust RMSEA >= 0.080                            NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000       0.000

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  perf ~~                                                               
    use               6.847    2.850    2.403    0.016    6.847    0.147

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    perf             13.959    0.174   80.442    0.000   13.959    4.721
    use              52.440    0.872   60.140    0.000   52.440    3.322

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    perf              8.742    0.754   11.596    0.000    8.742    1.000
    use             249.245   19.212   12.973    0.000  249.245    1.000

Comparing and Contrasting Results: Intercept (fixed effect)

  • \(\beta_{0,Perf}\) and \(\beta_{0,Use}\):
parameterestimates(model01_path_noNA.fit) |> filter(op == "~1")
   lhs op rhs    est    se      z pvalue ci.lower ci.upper
1 perf ~1     13.959 0.174 80.442      0   13.619   14.299
2  use ~1     52.440 0.872 60.140      0   50.731   54.149
  • \(\delta_{0,DV}\) and \(\delta_{1,DV}\):
summary(model01_mixed)$tTable
               Value Std.Error  t-value       p-value
(Intercept) 13.94085 0.1868631 74.60461 3.552065e-290
dPerf       38.46901 0.9399708 40.92575 3.477058e-169

\(\delta_{0,DV} + \delta_{1,DV}\): 52.40986 is close to \(\beta_{0,Use}\)

Comparing and Contrasting Results: Residual variance coviarance

parameterestimates(model01_path_noNA.fit) |> 
  filter(op == "~~") |> 
  select(lhs, rhs, est) |> 
  pivot_wider(names_from = rhs, values_from = est)
# A tibble: 2 × 3
  lhs    perf    use
  <chr> <dbl>  <dbl>
1 perf   8.74   6.85
2 use   NA    249.  

In mixed model, we cannot get the z-value (significance testing)

getVarCov(model01_mixed)
Marginal variance covariance matrix
       [,1]     [,2]
[1,] 9.1404   6.6311
[2,] 6.6311 260.3900
  Standard Deviations: 3.0233 16.137 

Comparing and Contrasting Results: Correlation

standardizedsolution(model01_path_noNA.fit) |> filter(op == "~~", lhs != rhs)
   lhs op rhs est.std   se     z pvalue ci.lower ci.upper
1 perf ~~ use   0.147 0.06 2.437  0.015    0.029    0.265
model01_mixed$modelStruct$corStruct
Correlation structure of class corSymm representing
 Correlation: 
  1    
2 0.136

REML: Residual Maximum Likelihood Estimation

  • The ML estimator is nice, but the variance estimate is downward biased (too small)
    • Remember – it divides by N for the residual covariance matrix
  • In small samples, this is likely to lead to biased estimates and incorrect p-values
    • The variance goes into the SE, which goes into the Wald test, which dictates the p-value for the beta
  • Instead, another maximum likelihood technique has been developed: Residual Maximum Likelihood (REML)
    • Maximizes the likelihood of the residuals rather than the data
    • Has unbiased estimates of the residual covariance matrix
    • Is the default method of estimation for most mixed model estimation packages
  • There is one catch to REML: you cannot use a LRT to compare nested models with differing fixed effects
    • Because the algorithm uses residuals not data likelihood, if the residuals change, the likelihood changes
    • Residuals come from the fixed effects \(\rightarrow\) if fixed effects are different, then residuals change, causing the likelihood to change
    • Can use multivariate Wald test for fixed effects
  • Don’t mix ML and REML for the same analysis

Adding Predictors To The Model

Adding more predictors: female and cc10

  • Adding predictors to the model is similar to adding predictors in regular regression models

  • By using REML we cannot compare models using likelihood ratio tests

    • REML LRTs must have same fixed effects
    • Adding predictors adds new fixed effects to the empty model
  • We are predicting each DV with female, cc10, and female*cc10

Model with Predictors: Syntax

# Model 02: all predictors included
model02_formula = as.formula("score ~ 1 + dPerf + female + dPerf*female + cc10 + dPerf*cc10 + female*cc10 + dPerf*female*cc10")
model02_mixed <- gls(model = model02_formula, method = "REML",
                     data = dat_long, 
                     correlation = corSymm(form = ~1|id),
                     weights = varIdent(form = ~1|DV))
summary(model02_mixed)
getVarCov(model02_mixed)
Marginal variance covariance matrix
       [,1]     [,2]
[1,] 8.5491   5.0582
[2,] 5.0582 259.5200
  Standard Deviations: 2.9239 16.11 
summary(model02_mixed)$tTable |> round(3)
                   Value Std.Error t-value p-value
(Intercept)       13.689     0.224  61.183   0.000
dPerf             38.110     1.175  32.429   0.000
female             0.658     0.384   1.715   0.087
cc10               0.099     0.035   2.786   0.006
dPerf:female       1.177     2.007   0.587   0.558
dPerf:cc10         0.097     0.198   0.488   0.626
female:cc10        0.094     0.067   1.396   0.163
dPerf:female:cc10  0.166     0.353   0.472   0.637
  1. \(\beta_0 = 13.689, p < .001\)
  2. \(\beta_{dPerf} = 38.110, p < .001\)
  3. \(\beta_{female} = 0.658, p = 0.087\)
  4. \(\beta_{cc10} = 0.099, p = 0.006\)
  5. \(\beta_{dPerf*female} = 1.177, p = 0.558\)
  6. \(\beta_{dPerf*cc10} = 0.097, p = 0.626\)
  7. \(\beta_{female*cc10} = 0.094, p = 0.163\)
  8. \(\beta_{dPerf*female*cc10} = 0.166, p = 0.637\)

Model with Predictors: Syntax

Generalized least squares fit by REML
  Model: model02_formula 
  Data: dat_long 
       AIC      BIC    logLik
  3771.308 3818.617 -1874.654

Correlation Structure: General
 Formula: ~1 | id 
 Parameter estimate(s):
 Correlation: 
  1    
2 0.107
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | DV 
 Parameter estimates:
   perf     use 
1.00000 5.50965 

Coefficients:
                     Value Std.Error  t-value p-value
(Intercept)       13.68949 0.2237452 61.18340  0.0000
dPerf             38.10983 1.1751688 32.42924  0.0000
female             0.65832 0.3837798  1.71536  0.0868
cc10               0.09871 0.0354295  2.78617  0.0055
dPerf:female       1.17738 2.0068176  0.58669  0.5577
dPerf:cc10         0.09653 0.1979034  0.48778  0.6259
female:cc10        0.09377 0.0671627  1.39609  0.1633
dPerf:female:cc10  0.16641 0.3529074  0.47155  0.6374

 Correlation: 
                  (Intr) dPerf  female cc10   dPrf:f dPr:10 fml:10
dPerf             -0.096                                          
female            -0.583  0.056                                   
cc10              -0.121  0.011  0.070                            
dPerf:female       0.056 -0.586 -0.098 -0.006                     
dPerf:cc10         0.010 -0.147 -0.006 -0.081  0.086              
female:cc10        0.064 -0.006 -0.001 -0.528  0.003  0.043       
dPerf:female:cc10 -0.006  0.083  0.004  0.046  0.043 -0.561 -0.098

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-3.17907470 -0.67331699 -0.01178306  0.70547907  2.97992869 

Residual standard error: 2.923886 
Degrees of freedom: 553 total; 545 residual

First Question: Which Model “Fits” Better?

  • After adding the predictors (estimating their betas) to the model, we must first ask which model fits better

  • A likelihood ratio test (LRT) cannot be performed as we are using REML

  • Use multivariate wald test

library(multcomp)
model2_model_matrix <- diag(rep(1, 8))
rownames(model2_model_matrix) <- c(
  "Intercept",
  "dPerf",
  "female",
  "cc10",
  "dPerf:female",
  "dPerf:cc10",
  "female:cc10",
  "dPerf:female:cc10"
)
effects <- glht(model = model02_mixed, linfct = model2_model_matrix)
summary(effects)

     Simultaneous Tests for General Linear Hypotheses

Fit: gls(model = model02_formula, data = dat_long, correlation = corSymm(form = ~1 | 
    id), weights = varIdent(form = ~1 | DV), method = "REML")

Linear Hypotheses:
                       Estimate Std. Error z value Pr(>|z|)    
Intercept == 0         13.68949    0.22375  61.183   <1e-04 ***
dPerf == 0             38.10983    1.17517  32.429   <1e-04 ***
female == 0             0.65832    0.38378   1.715   0.4735    
cc10 == 0               0.09871    0.03543   2.786   0.0397 *  
dPerf:female == 0       1.17738    2.00682   0.587   0.9973    
dPerf:cc10 == 0         0.09653    0.19790   0.488   0.9992    
female:cc10 == 0        0.09377    0.06716   1.396   0.7130    
dPerf:female:cc10 == 0  0.16641    0.35291   0.472   0.9994    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
summary(effects, test = Ftest())

     General Linear Hypotheses

Linear Hypotheses:
                       Estimate
Intercept == 0         13.68949
dPerf == 0             38.10983
female == 0             0.65832
cc10 == 0               0.09871
dPerf:female == 0       1.17738
dPerf:cc10 == 0         0.09653
female:cc10 == 0        0.09377
dPerf:female:cc10 == 0  0.16641

Global Test:
  Chisq DF Pr(>Chisq)
1  8328  8          0
  • Note: R’s nlme function doesn’t do a good job with df.residual and provides a Chi-square test

  • Also note there are 6 degrees of freedom (one for each additional beta weight in the model)

Questions that can be answered

  • What is the effect of college experience on usefulness for males?
  • What is the effect of college experience on usefulness for females?
  • What is the difference between males and females ratings of usefulness when college experience = 10?
  • How did the difference between males and females ratings change for each additional hour of college experience?
  • What is the effect of college experience on performance for males?
  • What is the effect of college experience on performance for females?
  • What is the difference between males and females performance when college experience = 10?
  • How did the difference between males and females performance change for each additional hour of college experience?

Model R-squared

To determine the model R-squared, we have to compare the variance/covariance matrix from model01 and model02 and make the statistics ourselves:

Vmodel01 = getVarCov(model01_mixed)
Vmodel02 = getVarCov(model02_mixed)

## Rsquare for Performance and Usefulness
(diag(Vmodel01) - diag(Vmodel02)) / diag(Vmodel01)
[1] 0.064686451 0.003341706
  • 6.47% variance of performance was explained by added predictors.
  • 0.33% variance of usefulness was explained by added predictors.

Exercise 2: Model mixed model with mse, mas and msc

  • Model 1: A path analysis with mse, mas, and msc as outcomes

  • Model 2: A empty mixed model with mse, mas, and msc are repeated measures nested in each individual

  • Compare two models: intercepts and correlations

Wrapping up

  • Things we get directly from path models that we do not get directly in mixed models:

    • Tests for approximate model fit

    • Scaled Chi-square for some types of non-normal data

    • Standardized parameter coefficients

    • Tests for indirect effects

    • R-squared statistics

  • Things we get directly in mixed models that we do not get in path models:

    • REML (unbiased estimates of variances/covariances)
  • In this lecture we discussed the basics of mixed model analyses for multivariate models

    • Model specification/identification

    • Model estimation

    • Model modification and re-estimation

    • Final model parameter interpretation

  • There is a lot to the analysis

    • but what is important to remember is the over-arching principal of multivariate analyses: covariance between variables is important

    • Mixed models imply very specific covariance structures

    • The validity of the results still hinge upon accurately finding an approximation to the covariance matrix