Generalized Measurement Models: Modeling Observed Dichotomous Data
Jihong Zhang
Educational Statistics and Research Methods
Previous Class
Dive deep into factor scoring
Show how different initial values affect Bayesian model estimation
Show how parameterization differs for standardized latent variables vs. marker item scale identification
Today’s Lecture Objectives
Show how to estimate unidimensional latent variable models with dichotomous data
Also known as Item response theory (IRT) or Item factor analysis (IFA)
Show how to estimate different parameterizations of IRT/IFA models
Describe how to obtain IRT/IFA auxiliary statistics from Markov Chains
Show variations of dichotomous-data models.
Example Data: Conspiracy Theories
Today’s example is from a bootstrap resample of 177 undergraduate students at a large state university in the Midwest.
The survey consisted of 10 questions about their beliefs in various conspiracy theories that were circulating on the internet in the early 2010s
All item responses were on a 5-point Likert scale with:
Strongly Disagree \(\rightarrow\) 1
Disagree \(\rightarrow\) 2
Neither Agree nor Disagree \(\rightarrow\) 3
Agree \(\rightarrow\) 4
Strongly Agree \(\rightarrow\) 5
The purpose of this survey was to study individual beliefs regarding conspiracies.
Our purpose in using this instrument is to provide a context that we all may find relevant as many of these conspiracies are still prevalent.
Make Our Data Dichotomous (not a good idea in practice)
To show dichotomous-data models with our data, we will arbitrarily dichotomize our item responses:
{0}: Response is Strongly Disagree, Disagree, or Neither (1-3)
{1}: Response is Agree or Strongly Agree (4-5)
Now, we could argue that a 1 represents someone who agrees with a statement and 0 represents someone who disagrees or is neutral.
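The recode above can be sketched as follows (a minimal illustration with hypothetical response values; the course materials themselves use R, so this Python version is only for clarity):

```python
# Hypothetical 5-point Likert responses (rows = persons, columns = items)
responses = [
    [1, 3, 4, 5, 2],
    [2, 2, 5, 4, 1],
]

# Dichotomize: 1-3 (disagree or neutral) -> 0, 4-5 (agree) -> 1
Y = [[1 if r >= 4 else 0 for r in person] for person in responses]

print(Y)  # [[0, 0, 1, 1, 0], [0, 0, 1, 1, 0]]
```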
Note that this is only for illustrative purposes; such dichotomization shouldn’t be done in practice because:
There are distributions for multinomial (polytomous) categories
The results will partly reflect our arbitrary choice of where to split 0/1
But we first learn dichotomous-data models before we get to models for polytomous data.
Examining Dichotomous Data
Note
These items have a relatively low proportion of people agreeing with each conspiracy statement
Highest mean: .69
Lowest mean: .034
Dichotomous Data Distribution: Bernoulli
The Bernoulli distribution is a one-trial version of the Binomial distribution
Sample space (support): \(Y \in \{0,1\}\)
The probability mass function:
\[
P(Y=y)=\pi^y(1-\pi)^{1-y}
\]
The Bernoulli distribution has only one parameter: \(\pi\) (typically known as the probability of success, \(P(Y=1)\))
Mean of the distribution: \(E(Y)=\pi\)
Variance of the distribution: \(Var(Y)=\pi(1-\pi)\)
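These formulas can be checked numerically (a minimal sketch; \(\pi = 0.3\) is an arbitrary example value):

```python
pi = 0.3  # probability of success (arbitrary example value)

def bernoulli_pmf(y, pi):
    """P(Y = y) = pi^y * (1 - pi)^(1 - y), for y in {0, 1}."""
    return pi**y * (1 - pi)**(1 - y)

mean = pi                 # E(Y) = pi
variance = pi * (1 - pi)  # Var(Y) = pi(1 - pi)

print(bernoulli_pmf(1, pi), bernoulli_pmf(0, pi), mean, variance)
```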
Definition: Dichotomous vs. Binary
Note the definitions of some of the words for data with two values:
Dichotomous: Taking two values (without numbers attached)
Binary: either zero or one
Therefore:
Not all dichotomous variables are binary, e.g., a variable taking values {2, 7} is dichotomous but not binary
All binary variables are dichotomous
Finally:
Bernoulli distributions are for binary variables
Most dichotomous variables can be recoded as binary variables without loss of model effects
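For instance, a {2, 7} variable can be recoded to {0, 1} (a minimal sketch with made-up values):

```python
values = [2, 7, 7, 2]                          # dichotomous but not binary
binary = [1 if v == 7 else 0 for v in values]  # recode 7 -> 1, 2 -> 0

print(binary)  # [0, 1, 1, 0]
```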
Models with Bernoulli Distributions
Generalized linear models using Bernoulli distributions put a linear model onto a transformation of the mean
Link function maps the mean \(E(Y)\) from its original range of [0,1] to (-\(\infty\), \(\infty\));
For an unconditional (empty) model, this is shown here:
\[
f(E(Y)) =f(\pi)
\]
Link Functions for Bernoulli Distributions
Common choices for the link function in latent variable modeling:
Logit (or log odds):
\[
f(\pi)=\log\left(\frac{\pi}{1-\pi}\right)
\]
Probit:
\[
f(\pi)=\Phi^{-1}(\pi)
\]
Where \(\Phi\) is the cumulative distribution function of a standard normal distribution, so \(\Phi^{-1}\) maps a probability to a z-score
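Both link functions can be evaluated numerically (a sketch using only the Python standard library; \(\pi = 0.75\) is an arbitrary probability):

```python
import math
from statistics import NormalDist

pi = 0.75  # an arbitrary probability

# Logit link: log-odds, maps (0, 1) to (-inf, inf)
logit = math.log(pi / (1 - pi))

# Probit link: standard-normal quantile function (inverse CDF)
probit = NormalDist().inv_cdf(pi)

print(logit, probit)
```

Applying the standard-normal CDF to the probit value recovers \(\pi\), which is a handy sanity check.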
\(\theta_p\) is the latent variable for examinee \(p\), representing the examinee’s proficiency such that higher values indicate more proficiency
\(a_i\), \(d_i\), \(c_i\) are item parameters:
\(a_i\): the capability of the item to discriminate between examinees with lower and higher values along the latent variable;
\(d_i\): item “easiness”
\(b_i\): item “difficulty”, \(b_i=d_i/(-a_i)\)
\(c_i\): “pseudo-guessing” parameter – examinees with low proficiency may have a nonzero probability of a correct response due to guessing
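For reference, the parameters above correspond to the standard three-parameter logistic model in slope/intercept parameterization (the two-parameter model is the special case \(c_i = 0\)); the equation itself does not appear in the text above, so it is restated here from the standard formulation:

\[
P(Y_{pi} = 1 \mid \theta_p) = c_i + (1 - c_i)\,\frac{\exp(a_i \theta_p + d_i)}{1 + \exp(a_i \theta_p + d_i)}
\]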
Model Family Names
Depending on your field, the model from the previous slide can be called:
The two-parameter logistic (2PL) model with slope/intercept parameterization
An item factor model
These names reflect the terms given to the model in different literatures:
2PL: Education measurement
Birnbaum, A. (1968). Some Latent Trait Models and Their Use in Inferring an Examinee’s Ability. In F. M. Lord & M. R. Novick (Eds.), Statistical Theories of Mental Test Scores (pp. 397-424). Reading, MA: Addison-Wesley.
```stan
model {
  lambda ~ multi_normal(meanLambda, covLambda); // Prior for item discrimination/factor loadings
  mu ~ multi_normal(meanMu, covMu);             // Prior for item intercepts
  theta ~ normal(0, 1);                         // Prior for latent variable (with mean/sd specified)
  for (item in 1:nItems){
    Y[item] ~ bernoulli_logit(mu[item] + lambda[item]*theta);
  }
}
```
For logit models without lower / upper asymptote parameters, Stan has a convenient bernoulli_logit function
Automatically has the link function embedded
The catch: The data has to be defined as an integer
Also, note that there are a few differences from the model with normal outcomes (CFA)
No \(\psi\) parameters
Stan’s parameters Block
```stan
parameters {
  vector[nObs] theta;     // the latent variables (one for each person)
  vector[nItems] mu;      // the item intercepts (one for each item)
  vector[nItems] lambda;  // the factor loadings/item discriminations (one for each item)
}
```
Only change from normal outcomes (CFA) model:
No \(\psi\) (psi) parameters
Stan’s data{} Block
```stan
data {
  int<lower=0> nObs;                            // number of observations
  int<lower=0> nItems;                          // number of items
  array[nItems, nObs] int<lower=0, upper=1> Y;  // item responses in an array
  vector[nItems] meanMu;                        // prior mean vector for intercept parameters
  matrix[nItems, nItems] covMu;                 // prior covariance matrix for intercept parameters
  vector[nItems] meanLambda;                    // prior mean vector for discrimination parameters
  matrix[nItems, nItems] covLambda;             // prior covariance matrix for discrimination parameters
}
```
One difference from normal outcome model:
array[nItems, nObs] int<lower=0, upper=1> Y;
Arrays generalize matrices (more than two dimensions are possible)
Allows for different types of data (here Y are integers)
Integer-valued variables needed for bernoulli_logit() function
Arrays are row-major (meaning order of items and persons is switched)
Can define differently
Change to Data List for Stan Import
The switch of items and observations in the array statement means the data imported have to be transposed:
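The transposition step can be sketched as follows (an illustrative Python version with made-up values; the actual course workflow builds the Stan data list in R):

```python
# Hypothetical dichotomous responses: rows = persons (nObs), columns = items (nItems)
Y_wide = [
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
]

# Stan declares Y as array[nItems, nObs] (items by persons),
# so transpose the persons-by-items matrix before building the Stan data list
Y_stan = [list(row) for row in zip(*Y_wide)]

print(len(Y_stan), len(Y_stan[0]))  # 3 4
```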