Lecture 10: Mixed Models for Multivariate Regression

Author

Affiliation

Jihong Zhang*, Ph.D

Educational Statistics and Research Methods (ESRM) Program*

University of Arkansas

Published

October 9, 2024

Modified

October 11, 2024

0.1

Today's Class

Multivaraite regression via mixed models
Comparing and contrasting path analysis with mixed models

Differences in model fit measures
Differences in software estimation methods
Model comparisons via multivariate Wald tests (instead of LRTs)
How to compute R-square

0.2 R Setup

library(ESRM64503)
library(kableExtra)
library(tidyverse)
library(DescTools) # Desc() allows you to quick screen data
library(lavaan) # Desc() allows you to quick screen data
head(dataMath)

  id hsl cc use msc mas mse perf female
1  1  NA  9  44  55  39  NA   14      1
2  2   3  2  77  70  42  71   12      0
3  3  NA 12  64  52  31  NA   NA      1
4  4   6 20  71  65  39  84   19      0
5  5   2 15  48  19   2  60   12      0
6  6  NA 15  61  62  42  87   18      0

dim(dataMath)

[1] 350   9

0.3 Correction about `fixed.x` argument in previous lecture

If TRUE, the exogenous x covariates are considered fixed variables and the means, variances and covariances of these variables are fixed to their sample values.
If FALSE, they are considered random, and the means, variances and covariances are free parameters. Typically, called latent variable
If “default”, the value is set depending on the mimic option.

Thus, we considered the distributions of exogenous variables as known parameters.

0.4 What is Mixed Models

A mixed model, mixed-effects model or Linear mixed models (LMMs) is a statistical model containing both fixed effects and random effects. These models are useful in a wide variety of disciplines in the physical, biological and social sciences.
Mixed model can answer similar research questions as Path Analysis (or structural equation model):
- Relationships among multiple endogenous variables

0.5 Fixed effects vs. Random effects

Definition: Fixed effects are constant across individuals, and random effects vary.

Assume a person is measured t times (repeated measure design or longitudinal design), thus we have t points of x and y for each individuals

y_{it} = \beta_{0i} + \beta_1 x_{it}

Here, \beta_{0i} is random intercept that varies across individuals. \beta_{1} is the fixed slope that shared acorss individuals.

Alternative definition: Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population.

y_i = \beta_0 + \beta_1 x_i +e_i Thus, \sigma^e is random effect, \beta_0 and \beta_1 are fixed effects.

0.6 Properties of Mixed Models

Mixed models are used for many types of analyses:
- Analogous to MANOVA and M-Regression (so repeated measures analyses)
- Multilevel models for clustered, longitudinal, and crossed-effects data
The biggest difference between mixed models and path analysis software is the assumed distribution of the exogenous variables：
- Mixed models: no distribution assumed
- Path analysis: most software assumes multivariate normal distribution
- This affects how missing data are managed – mixed models cannot have any missing IVs
Mixed models also do not allow endogenous variables to predict other endogenous variables
- No indirect effects are possible from a single analysis (multiple analyses needed)
Mixed models software also often needs variables to be stored in so-called “stacked” or long-format data (one row per DV)
- We used wide-format data for lavaan (one row per person)

0.7 Wide to Long Data Transformation

Original wide-format data (all DVs for a person on one row)

dat <- dataMath
dat$cc10 <- dat$cc - 10
dat_wide <- dat |> select(id, perf, use, female, cc10)
head(dat_wide) # show first 6 lines

  id perf use female cc10
1  1   14  44      1   -1
2  2   12  77      0   -8
3  3   NA  64      1    2
4  4   19  71      0   10
5  5   12  48      0    5
6  6   18  61      0    5

Reshape with pivot_longer() function and Resulting data:

dat_long <- dat_wide |> 
1  pivot_longer(cols = c(perf, use),
2               names_to = "DV",
3               values_to = "score") |>
  mutate(dPerf = ifelse(DV == 'perf', 0, 1)) # convert DVs into indicator variable - dperf
head(dat_long)

1: cols: Columns to pivot into longer format. Put multiple column names (no quote) into c()
2: names_to: A character vector specifying the new column to create from the information stored in the column names
3: values_to: A string specifying the name of the column to create from the data stored in cell values.

# A tibble: 6 × 6
     id female  cc10 DV    score dPerf
  <int>  <int> <dbl> <chr> <int> <dbl>
1     1      1    -1 perf     14     0
2     1      1    -1 use      44     1
3     2      0    -8 perf     12     0
4     2      0    -8 use      77     1
5     3      1     2 perf     NA     0
6     3      1     2 use      64     1

0.8 Execise 1: Wide to Long Transform

Turn msc, mas, mse into long-form

dat_wide <- dat |> select(id, msc, mas, mse)
## You turn

id	DV	Score
1	msc	55
1	mas	39
1	mse	NA
2	msc	70

0.9 Statistical Form: Bridge Path Model w/t Mixed Model

Before we dive into mixed models, we will begin with a multivariate regression model:
- Predicting mathematics performance (PERF) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)
- Predicting perceived usefulness (USE) with female (F), college math experience (CC), and the interaction between female and college math experience (FxCC)

PERF_i = \beta_{0,PERF} + \beta_{F,PERF} F_i + \beta_{CC,PERF}CC_i + \beta_{F*CC,PERF}F_i*CC_i + e_{i,PERF} \tag{1} USE_i = \beta_{0,USE} + \beta_{F,USE} F_i + \beta_{CC,USE}CC_i + \beta_{F*CC,USE}F_i*CC_i + e_{i,USE} \tag{2}

Mixed Model: Here I use the symbol \delta to represent each fixed effect in the multivariate model from the mixed model perspective. dPERF is the DV indicator: 1 - Perf and 0 - Use

Score_i = (\delta_{0,PERF} + \delta_{0, dPERF}) + (\delta_{F,PERF} + \delta_{F, dPERF}) F_i + (\delta_{CC,PERF} \\ + \delta_{CC, dPERF}) CC_i + (\delta_{F*CC,PERF} + \delta_{F*CC,dPERF}) F_i*CC_i + e_{i,PERF} + e_{i,dPERF} \tag{3}

Score_i = \delta_{0,PERF} + \delta_{0, dPERF}dPERF_i + \delta_{F,PERF} F_i + \delta_{F, dPERF}dPERF_i * F_i \\ + \delta_{CC,PERF} CC_i + \delta_{CC, dPERF} dPERF_i * CC_i \\ + \delta_{F*CC,PERF} F_i*CC_i + \delta_{F*CC,dPERF} dPERF_i * F_i*CC_i + e_{i,PERF} + e_{i,dPERF} \tag{4}

1 Build the Empty Model: Not So Empty

1.1 Statistical Form of empty mixed model

For illustration, let’s start from the empty model
A multivariate model using mixed model software uses the dummy code for DV to make all effects conditional on the specific DV in the model
- I will compare/contrast these with the symbols \beta from the fixed effects in path analysis
For instance, our empty model is thus:
- Where Score is condition on the value of dPerf:
  - When dPerf = 0 \rightarrow DV = “Use” \rightarrow Score_i = Use_i = \delta_0 + e_{i, Use}
  - When dPerf = 1 \rightarrow DV = “Perf” \rightarrow Score_i = Perf_i = \delta_0 + \delta_1 + e_{i, perf}

Score_{i, DV} = \delta_0 + \delta_1 dPerf_i + e_{i, DV}

1.2 Estimating the Empty Model

From the nlme library, we will use the gls() function
- Be sure the library is installed and loaded before trying this!

library(nlme) # install.packages("nlme")

dat_long <- dat_long[complete.cases(dat_long), ]
# create empty model using REML estimation to attempt to mirror initial analysis:
1model01_mixed = gls(model = score ~ 1 + dPerf,
                    data = dat_long,
2                    method = "REML",
3                    correlation = corSymm(form = ~1|id),
4                    weights = varIdent(form = ~1|DV))
summary(model01_mixed)

1: score ~ 1 + dPerf: Score_{i, DV} = \delta_0 + \delta_1 dPerf_i + e_{i, DV}
2: “REML”: Residual Maximum Likelihood Estimation
3: corSymm: General Correlation Structure; Provides estimates of all unique correlations; Needs id variable name after | for program to know which data comes from which person; ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups.
4: varIdent: a constant variance function structure; Estimates a different (residual) variance for each DV; With correlation line ensures an unstructured model is estimated

Generalized least squares fit by REML
  Model: score ~ 1 + dPerf 
  Data: dat_long 
       AIC      BIC    logLik
  3774.313 3795.871 -1882.156

Correlation Structure: General
 Formula: ~1 | id 
 Parameter estimate(s):
 Correlation: 
  1    
2 0.136
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | DV 
 Parameter estimates:
    perf      use 
1.000000 5.337397 

Coefficients:
               Value Std.Error  t-value p-value
(Intercept) 13.94085 0.1868631 74.60461       0
dPerf       38.46901 0.9399708 40.92575       0

 Correlation: 
      (Intr)
dPerf -0.079

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-3.24789265 -0.64196261  0.01956532  0.68109325  2.99644101 

Residual standard error: 3.023304 
Degrees of freedom: 553 total; 551 residual

1.3 Empty Model Results I: Covariance Matrix of DVs

The covariance matrix of DVs comes from the getVarCov() function

getVarCov(model01_mixed)

Marginal variance covariance matrix
       [,1]     [,2]
[1,] 9.1404   6.6311
[2,] 6.6311 260.3900
  Standard Deviations: 3.0233 16.137

Estimated variance-covariance matrix of PERF and USE scores.

1.4 Mapping Multivariate Mixed Models onto Path Models

To compare this result with the path analyses we conducted previously, we’ll have to use this data set
- Omit the same observations
So, we’ll need to take our long-format data and reshape it into wide-format:

head(dat_wide)

  id perf use female cc10
1  1   14  44      1   -1
2  2   12  77      0   -8
3  3   NA  64      1    2
4  4   19  71      0   10
5  5   12  48      0    5
6  6   18  61      0    5

library(lavaan)
model01_mirror.syntax = "
# Means:
perf ~ 1
use ~ 1

# Variances:
perf ~~ perf
use ~~ use

# Covariance:
perf ~~ use
"

model01_path_noNA.fit = sem(model01_mirror.syntax, data = dat_wide,
                            fixed.x = TRUE, 
                            mimic = "MPLUS", 
                            estimator = "MLR")
summary(model01_path_noNA.fit, 
        fit.measures = TRUE,
        standardized = TRUE)

lavaan 0.6.17 ended normally after 29 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

                                                  Used       Total
  Number of observations                           348         350
  Number of missing patterns                         3            

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                 0.000       0.000
  Degrees of freedom                                 0           0

Model Test Baseline Model:

  Test statistic                                 6.064       5.573
  Degrees of freedom                                 1           1
  P-value                                        0.014       0.018
  Scaling correction factor                                  1.088

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000       1.000
  Tucker-Lewis Index (TLI)                       1.000       1.000
                                                                  
  Robust Comparative Fit Index (CFI)                         1.000
  Robust Tucker-Lewis Index (TLI)                            1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2085.032   -2085.032
  Loglikelihood unrestricted model (H1)      -2085.032   -2085.032
                                                                  
  Akaike (AIC)                                4180.064    4180.064
  Bayesian (BIC)                              4199.325    4199.325
  Sample-size adjusted Bayesian (SABIC)       4183.464    4183.464

Root Mean Square Error of Approximation:

  RMSEA                                          0.000          NA
  90 Percent confidence interval - lower         0.000          NA
  90 Percent confidence interval - upper         0.000          NA
  P-value H_0: RMSEA <= 0.050                       NA          NA
  P-value H_0: RMSEA >= 0.080                       NA          NA
                                                                  
  Robust RMSEA                                               0.000
  90 Percent confidence interval - lower                     0.000
  90 Percent confidence interval - upper                     0.000
  P-value H_0: Robust RMSEA <= 0.050                            NA
  P-value H_0: Robust RMSEA >= 0.080                            NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000       0.000

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  perf ~~                                                               
    use               6.847    2.850    2.403    0.016    6.847    0.147

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    perf             13.959    0.174   80.442    0.000   13.959    4.721
    use              52.440    0.872   60.140    0.000   52.440    3.322

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    perf              8.742    0.754   11.596    0.000    8.742    1.000
    use             249.245   19.212   12.973    0.000  249.245    1.000

1.5 Comparing and Contrasting Results: Intercept (fixed effect)

\beta_{0,Perf} and \beta_{0,Use}:

parameterestimates(model01_path_noNA.fit) |> filter(op == "~1")

   lhs op rhs    est    se      z pvalue ci.lower ci.upper
1 perf ~1     13.959 0.174 80.442      0   13.619   14.299
2  use ~1     52.440 0.872 60.140      0   50.731   54.149

\delta_{0,DV} and \delta_{1,DV}:

summary(model01_mixed)$tTable

               Value Std.Error  t-value       p-value
(Intercept) 13.94085 0.1868631 74.60461 3.552065e-290
dPerf       38.46901 0.9399708 40.92575 3.477058e-169

\delta_{0,DV} + \delta_{1,DV}: 52.40986 is close to \beta_{0,Use}

1.6 Comparing and Contrasting Results: Residual variance coviarance

parameterestimates(model01_path_noNA.fit) |> 
  filter(op == "~~") |> 
  select(lhs, rhs, est) |> 
  pivot_wider(names_from = rhs, values_from = est)

# A tibble: 2 × 3
  lhs    perf    use
  <chr> <dbl>  <dbl>
1 perf   8.74   6.85
2 use   NA    249.

In mixed model, we cannot get the z-value (significance testing)

getVarCov(model01_mixed)

Marginal variance covariance matrix
       [,1]     [,2]
[1,] 9.1404   6.6311
[2,] 6.6311 260.3900
  Standard Deviations: 3.0233 16.137

1.7 Comparing and Contrasting Results: Correlation

standardizedsolution(model01_path_noNA.fit) |> filter(op == "~~", lhs != rhs)

   lhs op rhs est.std   se     z pvalue ci.lower ci.upper
1 perf ~~ use   0.147 0.06 2.437  0.015    0.029    0.265

model01_mixed$modelStruct$corStruct

Correlation structure of class corSymm representing
 Correlation: 
  1    
2 0.136

1.8 REML: Residual Maximum Likelihood Estimation

The ML estimator is nice, but the variance estimate is downward biased (too small)
- Remember – it divides by N for the residual covariance matrix
In small samples, this is likely to lead to biased estimates and incorrect p-values
- The variance goes into the SE, which goes into the Wald test, which dictates the p-value for the beta
Instead, another maximum likelihood technique has been developed: Residual Maximum Likelihood (REML)
- Maximizes the likelihood of the residuals rather than the data
- Has unbiased estimates of the residual covariance matrix
- Is the default method of estimation for most mixed model estimation packages

There is one catch to REML: you cannot use a LRT to compare nested models with differing fixed effects
- Because the algorithm uses residuals not data likelihood, if the residuals change, the likelihood changes
- Residuals come from the fixed effects \rightarrow if fixed effects are different, then residuals change, causing the likelihood to change
- Can use multivariate Wald test for fixed effects
Don’t mix ML and REML for the same analysis

2 Adding Predictors To The Model

2.1 Adding more predictors: female and cc10

Adding predictors to the model is similar to adding predictors in regular regression models
By using REML we cannot compare models using likelihood ratio tests
- REML LRTs must have same fixed effects
- Adding predictors adds new fixed effects to the empty model
We are predicting each DV with female, cc10, and female*cc10

2.2 Model with Predictors: Syntax

# Model 02: all predictors included
model02_formula = as.formula("score ~ 1 + dPerf + female + dPerf*female + cc10 + dPerf*cc10 + female*cc10 + dPerf*female*cc10")
model02_mixed <- gls(model = model02_formula, method = "REML",
                     data = dat_long, 
                     correlation = corSymm(form = ~1|id),
                     weights = varIdent(form = ~1|DV))
summary(model02_mixed)

Generalized least squares fit by REML
  Model: model02_formula 
  Data: dat_long 
       AIC      BIC    logLik
  3771.308 3818.617 -1874.654

Correlation Structure: General
 Formula: ~1 | id 
 Parameter estimate(s):
 Correlation: 
  1    
2 0.107
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | DV 
 Parameter estimates:
   perf     use 
1.00000 5.50965 

Coefficients:
                     Value Std.Error  t-value p-value
(Intercept)       13.68949 0.2237452 61.18340  0.0000
dPerf             38.10983 1.1751688 32.42924  0.0000
female             0.65832 0.3837798  1.71536  0.0868
cc10               0.09871 0.0354295  2.78617  0.0055
dPerf:female       1.17738 2.0068176  0.58669  0.5577
dPerf:cc10         0.09653 0.1979034  0.48778  0.6259
female:cc10        0.09377 0.0671627  1.39609  0.1633
dPerf:female:cc10  0.16641 0.3529074  0.47155  0.6374

 Correlation: 
                  (Intr) dPerf  female cc10   dPrf:f dPr:10 fml:10
dPerf             -0.096                                          
female            -0.583  0.056                                   
cc10              -0.121  0.011  0.070                            
dPerf:female       0.056 -0.586 -0.098 -0.006                     
dPerf:cc10         0.010 -0.147 -0.006 -0.081  0.086              
female:cc10        0.064 -0.006 -0.001 -0.528  0.003  0.043       
dPerf:female:cc10 -0.006  0.083  0.004  0.046  0.043 -0.561 -0.098

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-3.17907470 -0.67331699 -0.01178306  0.70547907  2.97992869 

Residual standard error: 2.923886 
Degrees of freedom: 553 total; 545 residual

getVarCov(model02_mixed)

Marginal variance covariance matrix
       [,1]     [,2]
[1,] 8.5491   5.0582
[2,] 5.0582 259.5200
  Standard Deviations: 2.9239 16.11

summary(model02_mixed)$tTable |> round(3)

                   Value Std.Error t-value p-value
(Intercept)       13.689     0.224  61.183   0.000
dPerf             38.110     1.175  32.429   0.000
female             0.658     0.384   1.715   0.087
cc10               0.099     0.035   2.786   0.006
dPerf:female       1.177     2.007   0.587   0.558
dPerf:cc10         0.097     0.198   0.488   0.626
female:cc10        0.094     0.067   1.396   0.163
dPerf:female:cc10  0.166     0.353   0.472   0.637

\beta_0 = 13.689, p < .001
\beta_{dPerf} = 38.110, p < .001
\beta_{female} = 0.658, p = 0.087
\beta_{cc10} = 0.099, p = 0.006
\beta_{dPerf*female} = 1.177, p = 0.558
\beta_{dPerf*cc10} = 0.097, p = 0.626
\beta_{female*cc10} = 0.094, p = 0.163
\beta_{dPerf*female*cc10} = 0.166, p = 0.637

2.3 First Question: Which Model “Fits” Better?

After adding the predictors (estimating their betas) to the model, we must first ask which model fits better
A likelihood ratio test (LRT) cannot be performed as we are using REML
Use multivariate wald test

library(multcomp)
model2_model_matrix <- diag(rep(1, 8))
rownames(model2_model_matrix) <- c(
  "Intercept",
  "dPerf",
  "female",
  "cc10",
  "dPerf:female",
  "dPerf:cc10",
  "female:cc10",
  "dPerf:female:cc10"
)
effects <- glht(model = model02_mixed, linfct = model2_model_matrix)
summary(effects)


     Simultaneous Tests for General Linear Hypotheses

Fit: gls(model = model02_formula, data = dat_long, correlation = corSymm(form = ~1 | 
    id), weights = varIdent(form = ~1 | DV), method = "REML")

Linear Hypotheses:
                       Estimate Std. Error z value Pr(>|z|)    
Intercept == 0         13.68949    0.22375  61.183   <1e-04 ***
dPerf == 0             38.10983    1.17517  32.429   <1e-04 ***
female == 0             0.65832    0.38378   1.715   0.4735    
cc10 == 0               0.09871    0.03543   2.786   0.0397 *  
dPerf:female == 0       1.17738    2.00682   0.587   0.9973    
dPerf:cc10 == 0         0.09653    0.19790   0.488   0.9992    
female:cc10 == 0        0.09377    0.06716   1.396   0.7130    
dPerf:female:cc10 == 0  0.16641    0.35291   0.472   0.9994    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

summary(effects, test = Ftest())


     General Linear Hypotheses

Linear Hypotheses:
                       Estimate
Intercept == 0         13.68949
dPerf == 0             38.10983
female == 0             0.65832
cc10 == 0               0.09871
dPerf:female == 0       1.17738
dPerf:cc10 == 0         0.09653
female:cc10 == 0        0.09377
dPerf:female:cc10 == 0  0.16641

Global Test:
  Chisq DF Pr(>Chisq)
1  8328  8          0

Note: R’s nlme function doesn’t do a good job with df.residual and provides a Chi-square test
Also note there are 6 degrees of freedom (one for each additional beta weight in the model)

2.4 Questions that can be answered

What is the effect of college experience on usefulness for males?
What is the effect of college experience on usefulness for females?
What is the difference between males and females ratings of usefulness when college experience = 10?
How did the difference between males and females ratings change for each additional hour of college experience?
What is the effect of college experience on performance for males?
What is the effect of college experience on performance for females?
What is the difference between males and females performance when college experience = 10?
How did the difference between males and females performance change for each additional hour of college experience?

2.5 Model R-squared

To determine the model R-squared, we have to compare the variance/covariance matrix from model01 and model02 and make the statistics ourselves:

Vmodel01 = getVarCov(model01_mixed)
Vmodel02 = getVarCov(model02_mixed)

## Rsquare for Performance and Usefulness
(diag(Vmodel01) - diag(Vmodel02)) / diag(Vmodel01)

[1] 0.064686451 0.003341706

6.47% variance of performance was explained by added predictors.
0.33% variance of usefulness was explained by added predictors.

2.6 Exercise 2: Model mixed model with mse, mas and msc

Model 1: A path analysis with mse, mas, and msc as outcomes
Model 2: A empty mixed model with mse, mas, and msc are repeated measures nested in each individual
Compare two models: intercepts and correlations

2.7 Wrapping up

Things we get directly from path models that we do not get directly in mixed models:
- Tests for approximate model fit
- Scaled Chi-square for some types of non-normal data
- Standardized parameter coefficients
- Tests for indirect effects
- R-squared statistics
Things we get directly in mixed models that we do not get in path models:
- REML (unbiased estimates of variances/covariances)
In this lecture we discussed the basics of mixed model analyses for multivariate models
- Model specification/identification
- Model estimation
- Model modification and re-estimation
- Final model parameter interpretation
There is a lot to the analysis
- but what is important to remember is the over-arching principal of multivariate analyses: covariance between variables is important
- Mixed models imply very specific covariance structures
- The validity of the results still hinge upon accurately finding an approximation to the covariance matrix

0.1