Lecture 03: Simple, Marginal, and Interaction Effects
More about general linear model
Jihong Zhang, Ph.D.
Educational Statistics and Research Methods (ESRM) Program
University of Arkansas
2024-09-09
Presentation Outline
Homework 1 has been posted online
Centering and Coding Predictors
Interpreting Parameters in the Model for the Means
Main Effects Within Interactions
GLM Example 1: “Regression” vs. “ANOVA”
Today’s Plan
Unit 1: Review dummy coding and centering + A short In-Class quiz
Unit 2: Illustrate the GLM with an interaction using one example
Unit 3: Practice R code + Q&A
Unit 1: Dummy Coding and Centering
Today’s Example:
Study examining the effect of a new instruction method (New: 0=Old, 1=New) on test performance (Test: % correct) in college freshmen vs. seniors (Senior: 0=Freshman, 1=Senior). Crossing New by Senior leads to 4 groups with n = 25 per group (Group: 1=Ctl, 2=T1, 3=T2, 4=T3).
Note that Equation 1 (single group variable model) and Equation 2 (group interaction model) are equivalent models with exactly the same set of parameters.
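(The equation displays are not reproduced here; based on the example variables, they are presumably of the following form, where Equation 1 uses the three dummy codes \(d1\)–\(d3\) defined later and Equation 2 uses Senior, New, and their product.)

\[ \text{Equation 1: } Test_p = \beta_0 + \beta_1 d1_p + \beta_2 d2_p + \beta_3 d3_p + e_p \]

\[ \text{Equation 2: } Test_p = \beta_0 + \beta_1 Senior_p + \beta_2 New_p + \beta_3 (Senior_p \times New_p) + e_p \]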
Each person’s expected (predicted) outcome is a function of his/her values on x and z (and their interaction), each measured once per person
Estimated parameters are called fixed effects (here, \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\beta_3\)); although they have a sampling distribution, they are not random variables
The number of fixed effects will show up in formulas as k (so k = 4 here)
Model for the Variance
\(e_p \sim N(0, \sigma^2_e) \rightarrow\) ONE residual (unexplained) deviation
\(e_p\) has a mean of 0 with some estimated constant variance \(\sigma^2_e\), is normally distributed, is unrelated across people
Estimated parameter is the residual variance only (in the model above)
General Form of Two Models
Model for the Means (Predicted Values): The Best Guess of Estimates
Variance-Covariance Matrix of Regression Coefficients (Diagonal: variances of the betas, i.e., their squared standard errors; Off-diagonal: sampling covariances between the betas)
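As a quick check of this claim, the following minimal sketch (assuming the dataTestExperiment data introduced for this example) shows that the square roots of the diagonal of vcov() reproduce the standard errors that summary() prints:

fit <- lm(Test ~ Group, data = dataTestExperiment)  # any fitted lm object works the same way
vcov(fit)               # variance-covariance matrix of the estimated betas
sqrt(diag(vcov(fit)))   # equals the "Std. Error" column of summary(fit)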
From now on, we will think carefully about how the predictor variables enter into the model for the means rather than the scales of predictors
Why don’t people always care about the scale of predictors (centering):
Does NOT affect the amount of outcome variance accounted for (\(R^2\))
Does NOT affect the outcomes values predicted by the model for the means (so long as the same predictor fixed effects are included)
Why should this matter to us?
Because the Intercept = expected outcome value when X = 0
Can end up with a nonsense value for the intercept if X = 0 isn’t in the data
We will almost always need to deliberately adjust the scale of the predictor variables so that they have 0 values that could be observed in our data
This is a much bigger deal in models with random effects (MLM) or in the GLM once interactions are included
Adjusting the Scale of Predictor Variables
For continuous (quantitative) predictors, we will make the intercept interpretable by centering:
Centering = subtract a constant from each person’s variable value so that the 0 value falls within the range of the new centered predictor variable
Typical \(\rightarrow\) Center around predictor’s mean: \(X_1^\prime = X_1 - \bar{X_1}\)
Better \(\rightarrow\) Center around a meaningful constant C: \(X_1^\prime = X_1 - C\) (see the sketch after the code below)
x <- rnorm(100, mean = 1, sd = 1)
x_c <- x - mean(x)
mean(x)
mean(x_c)
[1] 1.019229
[1] 2.664535e-17
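The code above illustrates mean-centering; centering around a chosen constant C works the same way (a sketch continuing the example above, with C = 1 as an arbitrary illustrative choice):

C <- 1          # a meaningful constant chosen by the researcher
x_c2 <- x - C   # 0 on x_c2 now corresponds to x = 1 on the original scale
mean(x_c2)      # near 0 here only because x was simulated with mean 1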
For categorical (grouping) predictors, either we or the program will make the intercept interpretable by creating a reference group:
The reference group is given a 0 value on all predictor variables created from the original group variable, such that the intercept is the expected outcome for that reference group specifically
Accomplished via “dummy coding” or “reference group coding”
For categorical predictors with more than two groups
We need to dummy code the group variable, e.g., using factor() in R (see the sketch below)
For example, I dummy coded the group variable with group 1 (Control) as the reference:
d1 = GroupT1 (0, 1, 0, 0) \(\rightarrow\)\(\beta_1\) = mean difference between Treatment1 vs. Control
d2 = GroupT2 (0, 0, 1, 0) \(\rightarrow\)\(\beta_2\) = mean difference between Treatment2 vs. Control
d3 = GroupT3 (0, 0, 0, 1) \(\rightarrow\)\(\beta_3\) = mean difference between Treatment3 vs. Control
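A minimal sketch of this setup in R (assuming Group is stored as the numeric codes 1–4 described in the example; the class data may already store it as a labeled factor):

# Convert the numeric Group codes (1 = Ctl, 2 = T1, 3 = T2, 4 = T3) into a factor;
# the first level (Control) becomes the reference group that gets 0 on all dummy codes
dataTestExperiment$Group <- factor(dataTestExperiment$Group,
                                   levels = c(1, 2, 3, 4),
                                   labels = c("Control", "T1", "T2", "T3"))
# lm() then builds the dummy codes d1-d3 automatically; inspect them with:
head(model.matrix(~ Group, data = dataTestExperiment))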
How does the model give us all possible group differences?
| Control Mean | Treatment 1 Mean | Treatment 2 Mean | Treatment 3 Mean |
|---|---|---|---|
| \(\beta_0\) | \(\beta_0+\beta_1\) | \(\beta_0+\beta_2\) | \(\beta_0+\beta_3\) |
The model (coefficients with dummy coding) directly provides 3 differences (control vs. each treatment), and indirectly provides another 3 differences (differences between treatments)
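As a check on this mapping (a sketch that assumes the mod_e0 model fitted in the R code later in these slides), the observed group means should reproduce the coefficient combinations in the table above:

# Observed group means of Test
aggregate(Test ~ Group, data = dataTestExperiment, FUN = mean)
# Model-implied group means from the dummy-coded model
b <- unname(coef(mod_e0))
c(Control = b[1], T1 = b[1] + b[2], T2 = b[1] + b[3], T3 = b[1] + b[4])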
Group differences from Dummy Codes
Set \(J = 4\) as number of groups:
The total number of group differences is \(J * (J-1) / 2 = 4*3/2 = 6\)
| Control Mean | Treatment 1 Mean | Treatment 2 Mean | Treatment 3 Mean |
|---|---|---|---|
| \(\beta_0\) | \(\beta_0+\beta_1\) | \(\beta_0+\beta_2\) | \(\beta_0+\beta_3\) |
All group differences:

| Comparison | Alt Group | Ref Group | Difference |
|---|---|---|---|
| Control vs. T1 | \((\beta_0+\beta_1)\) | \(-\ \beta_0\) | \(= \beta_1\) |
| Control vs. T2 | \((\beta_0+\beta_2)\) | \(-\ \beta_0\) | \(= \beta_2\) |
| Control vs. T3 | \((\beta_0+\beta_3)\) | \(-\ \beta_0\) | \(= \beta_3\) |
| T1 vs. T2 | \((\beta_0+\beta_2)\) | \(-\ (\beta_0+\beta_1)\) | \(= \beta_2-\beta_1\) |
| T1 vs. T3 | \((\beta_0+\beta_3)\) | \(-\ (\beta_0+\beta_1)\) | \(= \beta_3-\beta_1\) |
| T2 vs. T3 | \((\beta_0+\beta_3)\) | \(-\ (\beta_0+\beta_2)\) | \(= \beta_3-\beta_2\) |
R Code: Estimating All Group Differences in R
library(multcomp) # install.packages("multcomp")
mod_e0 <- lm(Test ~ Group, data = dataTestExperiment)
summary(mod_e0)
contrast_model_matrix = matrix(c(
  0,  1, 0, 0,  # Control vs. T1
  0,  0, 1, 0,  # Control vs. T2
  0,  0, 0, 1,  # Control vs. T3
  0, -1, 1, 0,  # T1 vs. T2
  0, -1, 0, 1,  # T1 vs. T3
  0,  0,-1, 1   # T2 vs. T3
), nrow = 6, byrow = T)
rownames(contrast_model_matrix) <- c(
  "Control vs. T1",
  "Control vs. T2",
  "Control vs. T3",
  "T1 vs. T2",
  "T1 vs. T3",
  "T2 vs. T3"
)
contrast_value <- glht(mod_e0, linfct = contrast_model_matrix)
summary(contrast_value)
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = Test ~ Group, data = dataTestExperiment)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
Control vs. T1 == 0 7.7600 0.7586 10.229 <0.001 ***
Control vs. T2 == 0 2.1600 0.7586 2.847 0.0275 *
Control vs. T3 == 0 6.8800 0.7586 9.069 <0.001 ***
T1 vs. T2 == 0 -5.6000 0.7586 -7.382 <0.001 ***
T1 vs. T3 == 0 -0.8800 0.7586 -1.160 0.6534
T2 vs. T3 == 0 4.7200 0.7586 6.222 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
What the intercept should mean to you
The model for the means will describe what happens to the predicted outcome Y “as X increases” or “as Z increases” and so forth
But you won’t know what Y is actually supposed to be unless you know where the predictor variables are starting from!
Therefore, the intercept is the “YOU ARE HERE” sign in the map of your data… so it should be somewhere in the map!
Main Effects Within Interactions
Why Interaction Effect
The interaction (moderation) effect allows researchers to investigate more complex situations: the effect of A on C is adjusted by B.
Interaction Effects
Interaction = Moderation: the effect of a predictor depends on the value of the interacting predictor
Either predictor can be “the moderator” (interpretive distinction only)
Interactions can always be evaluated for any combination of categorical and continuous predictors, although the defaults differ across analysis traditions:
In “ANOVA”: By default, all possible interactions are estimated
Software does this for you; oddly enough, nonsignificant interactions are usually still kept in the model (even if only significant interactions are interpreted)
In “ANCOVA”: Continuous predictors (“covariates”) do not get to be part of interaction ➡️ make the “homogeneity of regression” assumption
There is no reason to assume this – it is a testable hypothesis!
In “Regression”: No default – effects of predictors are as you specify them
Requires the most thought, but gets annoying because in regression programs you usually have to manually create the interaction as an observed variable:
e.g., XZ_interaction = centered_X * centered_Z (see the sketch below)
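A minimal sketch of what this looks like in R (the data frame dat and the variable names Y, X, and Z here are purely illustrative):

# Hypothetical data frame 'dat' with outcome Y and continuous predictors X and Z
dat$X_c  <- dat$X - mean(dat$X)   # center X
dat$Z_c  <- dat$Z - mean(dat$Z)   # center Z
dat$XZ_c <- dat$X_c * dat$Z_c     # build the interaction term by hand
fit_int  <- lm(Y ~ X_c + Z_c + XZ_c, data = dat)
# Equivalent shortcut: lm(Y ~ X_c * Z_c, data = dat) expands to the same three terms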
Simple / Conditional Main Effects in GLM with Interactions
Note
Main effects of predictors within interactions should remain in the model regardless of whether or not they are significant
Reason: the role of two-way interaction is to adjust its main effects
However, the original idea of a “main effect” no longer applies… each main effect is now conditional on the interacting predictor being 0 (so that \(X_1X_2=0\))
Conditional Main Effect = what it is + what modifies it
Simple Main Effect = the conditional main effect evaluated at a specific value (e.g., 0) of the modifier
A “conditional” main effect is the general form of a “simple” main effect:
\(\beta_1\) is the “simple” main effect of X1 when X2 = 0
\(\beta_1 + \beta_3 X_2\) is the “conditional” main effect of X1 depending on X2 values
\(\beta_2\) is the “simple” main effect of X2 when X1 = 0
\(\beta_2 + \beta_3 X_1\) is the “conditional” main effect of X2 depending on X1 values
Model-Implied Simple Main Effects
To quickly compute simple main effects:
Tip
The trick is keeping track of what 0 means for every interacting predictor, which depends on the way each predictor is being represented, as determined by you, or by the software without you!
where the new conditional main effect is \(\beta_2^{new} = \beta_2 + \beta_3 T\), with \(T\) the value at which the interacting predictor is held
Quiz
Testing the Significance of Model-Implied Fixed Effects
We now know how to calculate any conditional main effect:
Effect of interest (“conditional” main effect) = what it is + what modifies it
Output effect (“simple” main effect) = what it is, with what modifies it set to 0
But if we want to test whether that new effect is \(\neq 0\), we also need its standard error (SE needed to get Wald test T-value ➡️p-value)
Even if the conditional main effect is not directly given by the model, its estimate and SE are still implied by the model
3 options to get the new conditional main effect estimates and SE
Method 1: Ask the software to give it to you using your original model
e.g., the glht function in R package multcomp, ESTIMATE in SAS, TEST in SPSS, NEW in Mplus
Method 2: Re-center your predictors at the interacting value of interest (e.g., make Exam = 3 the new 0 for \(\text{Exam}_C\)) and re-estimate your model; repeat as needed for each value of interest (see the sketch at the end of this section)
Method 3: Hand calculations (what the program is doing for you in option #1)
Method 3 for example: Effect of Motiv = \(\beta_1 + \beta_3 * \text{Exam}\)
We can calculate the sampling error variance of a “conditional” main effect with the following formula:
\(SE^2_{\beta_1 + \beta_3\text{Exam}} = Var(\beta_1) + Var(\beta_3)\,\text{Exam}^2 + 2\,Cov(\beta_1,\beta_3)\,\text{Exam}\)
Values come from “asymptotic (sampling) covariance matrix”
Variance of a sum of terms always includes covariance among them
Here, this is because what each main effect estimate could be is related to what the other main effect estimates could be
Note that if a main effect is unconditional, its \(SE^2 = Var(\beta)\) only
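As a sketch of Method 2 (the data frame dat and the variables Y, Motiv, and Exam are hypothetical, following the Motiv-by-Exam example above):

# Re-center Exam so that Exam = 3 becomes the new 0, then re-estimate
dat$Exam3 <- dat$Exam - 3
fit_exam3 <- lm(Y ~ Motiv * Exam3, data = dat)
summary(fit_exam3)  # the Motiv row is now the simple main effect of Motiv at Exam = 3, with its SE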
Unit 2: GLM Example 1
GLM via Dummy-Coding in “Regression”
# Model 1: 2 X 2 predictors with 0/1 coding
model1 <- lm(Test ~ Senior + New + Senior * New, data = dataTestExperiment)
# Alternative
formular_mod1 <- as.formula("Test ~ Senior + New + Senior * New")
model1 <- lm(formular_mod1, data = dataTestExperiment)
summary(model1)
Call:
lm(formula = formular_mod1, data = dataTestExperiment)
Residuals:
Min 1Q Median 3Q Max
-6.36 -2.08 0.04 1.83 6.64
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 80.2000 0.5364 149.513 < 2e-16 ***
Senior 2.1600 0.7586 2.847 0.00539 **
New 7.7600 0.7586 10.229 < 2e-16 ***
Senior:New -3.0400 1.0728 -2.834 0.00561 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.682 on 96 degrees of freedom
Multiple R-squared: 0.6013, Adjusted R-squared: 0.5888
F-statistic: 48.26 on 3 and 96 DF, p-value: < 2.2e-16
anova(model1)
Analysis of Variance Table
Response: Test
Df Sum Sq Mean Sq F value Pr(>F)
Senior 1 10.24 10.24 1.4235 0.235762
New 1 973.44 973.44 135.3253 < 2.2e-16 ***
Senior:New 1 57.76 57.76 8.0297 0.005609 **
Residuals 96 690.56 7.19
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note
This ANOVA table displays marginal tests (F-tests) for the main effects. Marginal tests are for each main effect alone and are not conditional on any interacting variables.
Getting Group Means as a Contrast
We can get the group means of test scores for each Senior/Freshman by New/Old combination
glht() requests predicted outcomes from the model for the means:
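A minimal sketch of that request for two of the cell means (each row of the matrix multiplies the fixed effects of model1; the rows shown here are illustrative):

group_means_matrix <- matrix(c(
  1, 0, 0, 0,   # Freshman-Old mean (Senior = 0, New = 0): beta0
  1, 0, 1, 0    # Freshman-New mean (Senior = 0, New = 1): beta0 + beta2
), nrow = 2, byrow = TRUE)
rownames(group_means_matrix) <- c("Freshman-Old", "Freshman-New")
summary(glht(model1, linfct = group_means_matrix))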
# Standard error of the freshman-old group mean (Intercept: Senior = 0, New = 0)
Var_Test_1 <- Var_beta_0 <- vcov(model1)[1,1]
(SE_Test_1 <- sqrt(Var_Test_1))
[1] 0.5364078
# Standard error of the freshman-new group mean (Senior = 0, New = 1)
Var_beta_0 <- vcov(model1)[1,1]
Var_beta_2 <- vcov(model1)[3,3]
Cov_beta_0_2 <- vcov(model1)[1,3]
Var_Test_2 <- Var_beta_0 + Var_beta_2 + 2*Cov_beta_0_2
(SE_Test_2 <- sqrt(Var_Test_2))
[1] 0.5364078
Your Homework 1 - Question 1
Copy and paste your R syntax and output that calculates the standard error of the Senior-Old group mean (use the data and model we used in class). For example, our class syntax for Freshman-New is like:
We can use glht() to automatically calculate conditional main effects and their standard errors. Or, we can calculate them by hand using the vcov(model) function, which returns the variance-covariance matrix of the fixed-effect estimates.
Method 1: glht method
effect_model1_matrix = matrix(c(
  0, 1, 0, 0,  # Senior vs. Freshmen | Old
  0, 1, 0, 1,  # Senior vs. Freshmen | New
  0, 0, 1, 0,  # New vs. Old | Freshmen
  0, 0, 1, 1   # New vs. Old | Senior
), nrow = 4, byrow = T)
rownames(effect_model1_matrix) <- c(
  "beta_Senior_New0",  # Conditional Main Effect of Senior given New = 0
  "beta_Senior_New1",  # Conditional Main Effect of Senior given New = 1
  "beta_New_Senior0",  # Conditional Main Effect of New given Senior = 0
  "beta_New_Senior1"   # Conditional Main Effect of New given Senior = 1
)
effect_model1 <- glht(model1, linfct = effect_model1_matrix)
summary(effect_model1)
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = formular_mod1, data = dataTestExperiment)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
beta_Senior_New0 == 0 2.1600 0.7586 2.847 0.0196 *
beta_Senior_New1 == 0 -0.8800 0.7586 -1.160 0.5939
beta_New_Senior0 == 0 7.7600 0.7586 10.229 <0.001 ***
beta_New_Senior1 == 0 4.7200 0.7586 6.222 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
Method 2: vcov method
Example: the conditional main effect of Senior \(\mathbf{\beta_1^{Senior}}\) (Senior vs. Freshmen) when using the new method (New = 1)
Var_beta_1 = vcov(model1)[2,2]    # squared SE of beta1
Var_beta_3 = vcov(model1)[4,4]    # squared SE of beta3
Cov_beta_1_3 = vcov(model1)[4,2]  # sampling covariance of beta1 and beta3
Var_beta_1_new <- Var_beta_1 + Var_beta_3 * 1 + 2 * Cov_beta_1_3 * 1
(SE_beta_1_new <- sqrt(Var_beta_1_new))
[1] 0.7585952
Your Homework 1 - Question 2
Copy and paste your R syntax and output that calculates the standard error of the conditional main effect of New (New vs. Old) when Senior = 1. For example, the standard error of the conditional main effect of Senior when New = 1 is calculated like: