TITLE: A novel method for model selection in Bayesian Diagnostic Classification Modeling
University of Iowa
2022-10-24
Background (5 minutes)
Performance measures (5 minutes)
Simulation study (15 minutes)
Empirical study (10 minutes)
Conclusion (5 minutes)
Discussion (5 minutes)
Q-matrix is usually determined by expert judgement, so there can be uncertainty about some of its elements. Model selection methods are necessary to select the model with the “correct” Q-matrix.
Previous model selection methods, such as information criteria and Bayes factors, are not flexible enough to check specific aspects of the data.
Advantages of PPC
Drawbacks of PPC
PPC is not fully Bayesian since it doesn’t take the uncertainty of observed data into account
PPC uses data twice
To construct a novel PPMC method using limited-information model fit indices in Bayesian LCDM
Simulation study: to determine the performance of the proposed method under different conditions and compare it to previous model checking methods
Empirical study: to investigate the utility of PPMC with limited-information model fit indices in real settings
Is the proposed method appropriate for detecting model–data misfit under varied degrees of Q-matrix misspecification?
Compared to information criteria, does the proposed approach have a higher true positive rate (TPR) when selecting the correct model?
How does the overall discrimination power, indicated by the Cognitive Diagnostic Index (CDI), affect the performance of the proposed method in selecting the model with the best Q-matrix?
Generate simulated data sets under the LCDM framework with two main factors:
30 items and 5 attributes
Latent attributes: each individual's mastery status is determined by cutting attribute scores; item parameters are randomly sampled.
Based on the attribute correlation, continuous attribute scores are first generated for each examinee.
The continuous attribute scores are then dichotomized at the cut scores.
Finally, observed item responses are generated from the attribute statuses and the corresponding item parameters.
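The three generation steps above can be sketched as follows. This is a minimal illustration, not the study's generating code: the cut score of 0, the toy Q-matrix density, and the item-parameter ranges are assumptions, and only attribute main effects are included.

```python
import numpy as np

rng = np.random.default_rng(42)
n_persons, n_attrs, n_items = 1000, 5, 30
rho = 0.25  # attribute correlation (one of the simulation factors)

# Step 1: continuous attribute scores from a multivariate normal
# with a common correlation rho among the 5 attributes
cov = np.full((n_attrs, n_attrs), rho)
np.fill_diagonal(cov, 1.0)
scores = rng.multivariate_normal(np.zeros(n_attrs), cov, size=n_persons)

# Step 2: dichotomize at a cut score (0 here, i.e., 50% mastery per attribute)
alpha = (scores > 0).astype(int)  # mastery status, n_persons x n_attrs

# Step 3: generate responses from a main-effects LCDM:
# logit P(X = 1) = intercept + main effects of required, mastered attributes
Q = (rng.random((n_items, n_attrs)) < 0.3).astype(int)  # toy Q-matrix
intercepts = rng.uniform(-2.0, -1.0, n_items)
main_effects = rng.uniform(1.0, 3.0, (n_items, n_attrs)) * Q
logits = intercepts + alpha @ main_effects.T
responses = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)
```

The Q-matrix conditions in the study would then be created by perturbing `Q` (underspecifying or misspecifying entries for 10% or 20% of items) before fitting.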
Q-matrix
| Sample size | BayesNet | Correct¹ | Underspecified Q-matrix: 10% | Underspecified Q-matrix: 20% | Misspecified Q-matrix: 10% | Misspecified Q-matrix: 20% |
|---|---|---|---|---|---|---|
| **Skill correlation = .25** | | | | | | |
| [1000, 1250) | 23.23 (1.94) | 21.49 (2.08) | 28.92 (3.19) | 33.51 (3.97) | 27.92 (3.36) | 30.02 (3.49) |
| [1250, 1500) | 30.48 (3.13) | 29.03 (2.90) | 40.88 (6.60) | 48.82 (8.19) | 39.66 (5.47) | 42.73 (5.95) |
| [1500, 1750) | 39.36 (3.82) | 38.30 (4.15) | 57.56 (8.70) | 69.60 (10.70) | 53.99 (8.13) | 58.41 (9.03) |
| [1750, 2000) | 48.96 (3.09) | 49.55 (3.65) | 75.40 (10.02) | 90.90 (11.19) | 68.45 (7.70) | 75.26 (9.15) |
| **Skill correlation = .50** | | | | | | |
| [1000, 1250) | 24.83 (2.97) | 24.07 (2.95) | 32.25 (5.36) | 36.69 (5.78) | 32.59 (5.58) | 34.44 (5.43) |
| [1250, 1500) | 34.24 (3.70) | 33.72 (3.58) | 47.70 (6.60) | 54.91 (7.26) | 47.04 (6.00) | 49.96 (5.94) |
| [1500, 1750) | 43.38 (3.11) | 45.76 (3.96) | 63.14 (7.27) | 74.70 (9.73) | 65.10 (7.17) | 69.42 (8.05) |
| [1750, 2000) | 55.54 (3.48) | 60.55 (4.74) | 94.06 (10.59) | 111.40 (12.80) | 90.52 (9.36) | 97.18 (10.01) |

¹ Bold font: the model with the smallest average value of PP-M2. Cells show average PP-M2 (SD in parentheses).
The correct model and the BayesNet model have the lowest PP-M2 (best fit).
As sample size increases, the differences in PP-M2 among models grow larger; in other words, PP-M2 has asymptotically more power to detect misfit.
The BayesNet model has the least uncertainty in predictive accuracy, in terms of the variation of average PP-M2.
As more items have misspecified or underspecified attributes in the Q-matrix, PP-M2 increases.
| Specification | Model | DIC¹,² | WAIC | AIC | BIC | KS-PP-M2 |
|---|---|---|---|---|---|---|
| **Skill correlation = .25** | | | | | | |
| Correct | Model 1 | 44,301 (8,816) | 44,303 (8,817) | 44,387 (8,812) | 45,291 (8,846) | 0.18 (0.04) |
| Underspecified | Model 2: 10% | 44,869 (8,931) | 44,872 (8,932) | 44,961 (8,927) | 45,865 (8,961) | 0.66 (0.14) |
| Underspecified | Model 3: 20% | 45,400 (9,028) | 45,406 (9,030) | 45,500 (9,024) | 46,404 (9,058) | 0.83 (0.10) |
| Misspecified | Model 4: 10% | 44,842 (8,905) | 44,845 (8,907) | 44,932 (8,902) | 45,805 (8,934) | 0.60 (0.13) |
| Misspecified | Model 5: 20% | 45,110 (8,952) | 45,113 (8,953) | 45,188 (8,949) | 45,997 (8,979) | 0.70 (0.12) |
| **Skill correlation = .50** | | | | | | |
| Correct | Model 1 | 45,607 (9,219) | 45,612 (9,221) | 45,699 (9,215) | 46,614 (9,250) | 0.33 (0.11) |
| Underspecified | Model 2: 10% | 46,152 (9,341) | 46,159 (9,342) | 46,250 (9,337) | 47,165 (9,372) | 0.75 (0.13) |
| Underspecified | Model 3: 20% | 46,659 (9,448) | 46,668 (9,450) | 46,763 (9,444) | 47,677 (9,479) | 0.86 (0.10) |
| Misspecified | Model 4: 10% | 46,172 (9,326) | 46,178 (9,327) | 46,268 (9,322) | 47,150 (9,356) | 0.75 (0.13) |
| Misspecified | Model 5: 20% | 46,403 (9,373) | 46,409 (9,374) | 46,485 (9,369) | 47,304 (9,401) | 0.80 (0.10) |

¹ Bold: the model with the smallest average value of each model selection index.
² Information criteria & KS-PP-M2: lower values indicate better model fit.
Posterior predictive M2 statistics showed that the BayesNet model and the correct model have the best model fit.
Similar to information criteria, KS-PP-M2 can select the data-generating model from among models with Q-matrix misspecification.
The greater the Q-matrix misspecification, the higher the KS-PP-M2 value, suggesting worse model fit.
Compared to the other methods, KS-PP-M2 has the same power to select the better model and to detect Q-matrix misspecification under all conditions.
CDI (test-level discrimination power) has no significant effect on the fit statistics of the proposed method.
The Examination for Certificate of Proficiency in English (ECPE) data was used as the example data.
One reference model and two analysis models: (1) three-dimensional model (the best fitted model in Templin & Hoffman, 2013); (2) two-dimensional model with randomly generated Q-matrix.
Measures: (1) absolute fit: PP-M2, (2) relative fit: KS-PP-M2, DIC and WAIC
According to the graphical check of PP-M2, the BayesNet model is the best-fitting model, followed by the three-dimensional model; the two-dimensional model has the worst fit.
DIC, WAIC, and KS-PP-M2 all suggested that the three-dimensional model is better than the two-dimensional model.
KS-PP-M2 suggested that neither the three-dimensional model nor the two-dimensional model has model fit close to that of the BayesNet model.
DIC / WAIC
DIC is a somewhat Bayesian version of AIC that makes two changes: replacing the maximum likelihood estimate \(\hat{\theta}\) with the posterior mean, and replacing \(k\) with a data-based bias correction.
WAIC is a more fully Bayesian approach to estimating the out-of-sample expectation: it starts with the computed log pointwise posterior predictive density and adds a correction for the effective number of parameters to adjust for overfitting.
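The WAIC recipe above can be sketched directly from a matrix of pointwise log-likelihoods. This is a minimal implementation on the deviance scale (lower is better); using `ddof=1` for the posterior variance is an implementation choice, not something the slides specify.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an S x N matrix of pointwise log-likelihoods
    (S posterior draws, N observations), on the deviance scale."""
    # log pointwise predictive density, computed with a max-shift for stability
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.mean(np.exp(log_lik - m), axis=0)))
    # effective number of parameters: posterior variance of the log-likelihood
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
```

With no posterior variation in the log-likelihood, `p_waic` is zero and WAIC reduces to the deviance at the fixed parameter values.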
Simulating replicated data under the fitted model and then comparing these to the observed data (Gelman & Hill, 2006, p. 158)
Aims:
check local and global model-fit for some aspects of data they’re interested in
provide graphical evidence about model fit
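The mechanics of such a check can be sketched as follows. The Bernoulli "model" and beta posterior draws below are toy stand-ins for a fitted DCM; the discrepancy statistic (item proportion correct) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy observed binary response matrix and posterior draws of item probabilities
# (hypothetical stand-ins for real data and fitted-model output)
y_obs = rng.binomial(1, 0.6, size=(200, 10))
post_p = rng.beta(60, 40, size=(500, 10))  # 500 draws of 10 item probabilities

# Simulate a replicated data set per posterior draw, compute the same
# discrepancy statistic (item proportion correct) for each replication
obs_stat = y_obs.mean(axis=0)
rep_stats = np.stack([
    rng.binomial(1, p, size=(200, 10)).mean(axis=0) for p in post_p
])

# Posterior predictive p-value per item: share of replications at or above
# the observed value; values near 0 or 1 flag misfit for that item
ppp = (rep_stats >= obs_stat).mean(axis=0)
```

Plotting `rep_stats` as a histogram with `obs_stat` overlaid gives the kind of graphical evidence mentioned above.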
One critique of posterior predictive checks is that they use the data twice (Blei, 2011): the data are used both to estimate the model and to check whether the model fits.
Validate the model on external data
multiple alternative models exist
uncertainty of dimensionality
Q-matrix misspecification
Sample size
Discrimination information
Q-matrix
Model structure
M2 is a limited-information statistic computed from the first- and second-order marginal probabilities of item responses.
M2 is more robust than full-information fit statistics in small samples.
PP-M2 is the M2 statistic conditioned on posterior information. Lower average values suggest better model fit.
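The exact construction of KS-PP-M2 is not spelled out on these slides; one plausible reading, sketched below, is a two-sample Kolmogorov–Smirnov distance between realized and replicated M2 values. The normal draws are placeholders for actual M2 computations.

```python
import numpy as np

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum
    distance between the two empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

# Hypothetical M2 draws: realized (observed data scored at each posterior
# draw) vs. replicated (posterior predictive data sets)
rng = np.random.default_rng(1)
m2_realized = rng.normal(45.0, 4.0, size=500)
m2_replicated = rng.normal(40.0, 4.0, size=500)
ks_stat = ks_two_sample(m2_realized, m2_replicated)  # larger => worse fit
```

Under this reading, a well-fitting model yields realized and replicated M2 distributions that overlap, so the KS distance is near 0; misfit pushes it toward 1, matching the table values above.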
Thesis Defence 2022