NEET Analysis
1 Analysis Plan
- We can compute descriptive statistics for the three NEET groups to examine group differences in demographic information, STG proportions, and CLD outcomes.
- We can use odds ratio (OR) significance tests to assess group differences in STGs. This tests the hypothesis that specific STGs are associated with long-term NEET or long-term exit.
- To test the difference between a specific pair of groups (such as long-term NEET vs. long-term exit), we can fit a multinomial logistic regression, where the regression coefficient ($b_1$) is the estimated increase in the log odds of belonging to one group rather than the other for a one-unit increase in the predictor.
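For reference, the baseline-category form of a multinomial logit model can be written as below. This is a sketch under assumptions: long-term NEET is taken as the reference group, $x$ stands for a single predictor, and the symbols $b_{0k}$ and $b_{1k}$ are notation introduced here for illustration.

```latex
\log \frac{P(Y = k \mid x)}{P(Y = \text{long-term NEET} \mid x)} = b_{0k} + b_{1k}\, x,
\qquad k \in \{\text{long-term exit},\ \text{temporary NEET}\}
```

Under this form, a one-unit increase in $x$ changes the log odds of group $k$ versus the reference group by $b_{1k}$, which multiplies the corresponding odds by $\exp(b_{1k})$.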
2 Data Analysis
YM = Young mothers; HY = Hidden Youth; SD = School dropouts; PSD = Potential school dropouts; YASB = Youth with anti-social behavior(s); EMY = Ethnic minority youth; YRCS = Youth living in residential care settings; YO = Youth offender; YCR = Youth with criminal record(s); SEN = Youth with special education needs;
2.1 Descriptive statistics
The missing rate for education is relatively high because many respondents chose option 7, “none of the above”. These cases are removed from the subsequent OR analysis and multinomial regression analysis.
Var | N | (NA) | Mean | SD | Median | Min | Max | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|---|---|
Male | 443 | | 1.50 | 0.50 | 2 | 1 | 2 | 0.00 | -2.00 |
Age | 443 | | 18.73 | 3.26 | 18 | 14 | 29 | 0.72 | -0.05 |
EduF | 170 | 273 | 2.56 | 1.31 | 2 | 1 | 6 | 0.88 | 0.11 |
EduM | 184 | 259 | 2.44 | 1.17 | 2 | 1 | 6 | 0.84 | 0.40 |
EduS | 439 | 4 | 2.91 | 0.85 | 3 | 1 | 6 | 1.24 | 2.39 |
Residence | 443 | | 17.60 | 4.48 | 17 | 1 | 29 | -0.70 | 1.98 |
Assistance | 438 | 5 | 1.84 | 0.37 | 2 | 1 | 2 | -1.80 | 1.26 |
Remove cases with missing values on EduS (4 cases) and Assistance (5 cases). The final sample is N = 434.
Var | Levels | N | % |
---|---|---|---|
Male | 1 | 216 | 49.8 |
Male | 2 | 218 | 50.2 |
Education | 1 | 2 | 0.5 |
Education | 2 | 130 | 30.0 |
Education | 3 | 240 | 55.3 |
Education | 4 | 35 | 8.1 |
Education | 5 | 19 | 4.4 |
Education | 6 | 8 | 1.8 |
Assistance | 1 | 70 | 16.1 |
Assistance | 2 | 364 | 83.9 |
Subgroup | N | Prop(%) |
---|---|---|
Long-Term NEET | 153 | 35.25 |
Long-Term Exit | 50 | 11.52 |
Temporary NEET | 231 | 53.23 |
Missing | 1709 | 79.96 |
Table 3: the last column, Prop(%), means different things for the NEET subgroups and for the Missing row. For the Missing row, it is the share of missing cases in the whole sample (N = 2143). For the NEET subgroups, it is each group's share of the NEET sample (N = 434).
R Code
library(tidyverse)   # dplyr, tidyr, tibble, purrr
library(kableExtra)  # cell_spec(), row_spec()
# mykbl() and softcolors are not defined in this section.

# Keep the analysis variables and drop cases with missing EduS or Assistance
datNeetWide <- NEET_Subgroup_Cleaned_basic |>
  dplyr::select(Male, Age, EduS,
                Residence, Assistance,
                YM:SEN, CE1:CLDH1, Subgroup) |>
  filter(!is.na(EduS), !is.na(Assistance))

# Long format: one row per respondent and variable
datNeetLong <- datNeetWide |>
  rownames_to_column('id') |>
  pivot_longer(-c(id, Subgroup), names_to = "Vars",
               values_to = "Resp", values_transform = as.numeric)

# Mean of each variable within each subgroup, one column per subgroup
datNeetLongTbl <- datNeetLong |>
  group_by(Subgroup, Vars) |>
  summarise(Mean = mean(Resp, na.rm = TRUE)) |>
  ungroup() |>
  mutate(Subgroup = factor(Subgroup, labels = c("Long-Term NEET",
                                                "Long-Term Exit",
                                                "Temporary NEET",
                                                "NotTracked NEET"))) |>
  pivot_wider(names_from = Subgroup, values_from = Mean) |>
  mutate(Vars = factor(Vars, levels = unique(datNeetLong$Vars))) |>
  arrange(Vars) |>
  select(-`NotTracked NEET`)

# Largest rounded mean in each row (one value per variable)
max_values_byRow <- datNeetLongTbl |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  select(-Vars) |>
  apply(1, max) |>
  as.numeric()

# Highlight the maximum of each row and shade the three variable blocks.
# Comparing row by row avoids highlighting a value that only matches
# the maximum of a different row.
datNeetLongTbl |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  mutate(across(-Vars, \(x) if_else(x == max_values_byRow,
    cell_spec(x,
      format = "html",
      color = "#B4464B", bold = TRUE
    ), as.character(x)
  ))) |>
  mykbl(escape = FALSE, booktabs = TRUE) |>
  row_spec(1:5, background = softcolors[2]) |>    # demographics
  row_spec(6:15, background = softcolors[4]) |>   # YM to SEN
  row_spec(16:20, background = softcolors[6])     # CE1 to CLDH1
Vars | Long-Term NEET | Long-Term Exit | Temporary NEET |
---|---|---|---|
Male | 1.503 | 1.48 | 1.506 |
Age | 17.882 | 20.26 | 18.935 |
EduS | 2.908 | 3.08 | 2.883 |
Residence | 16.735 | 19.14 | 17.771 |
Assistance | 1.85 | 1.88 | 1.823 |
YM | 0.092 | 0.04 | 0.087 |
HY | 0.033 | 0.02 | 0.03 |
SD | 0.131 | 0.3 | 0.203 |
PSD | 0.386 | 0 | 0.268 |
YASB | 0.144 | 0.04 | 0.121 |
EMY | 0.033 | 0.04 | 0.108 |
YRCS | 0.02 | 0.12 | 0.013 |
YO | 0.039 | 0 | 0.039 |
YCR | 0.046 | 0.1 | 0.03 |
SEN | 0.105 | 0.2 | 0.082 |
CE1 | 1.912 | 1.39 | 1.877 |
SC1 | 1.951 | 1.624 | 2.015 |
SI1 | 3.134 | 2.748 | 3.158 |
YCDC1 | 3.035 | 2.725 | 3.07 |
CLDH1 | 2.723 | 2.27 | 2.083 |
2.2 Odds ratio tests with one-tailed Fisher's exact test
The results in the table below suggest that:
- PSD (Potential school dropouts) is associated with higher odds of being long-term NEET;
- SD, YRCS (Youth living in residential care settings), and SEN are associated with higher odds of long-term exit;
- EMY (Ethnic minority youth) is associated with higher odds of being temporary NEET.
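The test behind each row of the table can be sketched on a single 2x2 table. The R code in this section calls `fisher.test()` with `alternative = 'greater'`, which asks whether the odds ratio exceeds 1. The counts below are hypothetical and chosen only for illustration; they are not the study data.

```r
# Hypothetical 2x2 counts (illustration only, not the study data):
# rows = membership in the target NEET subgroup, columns = STG indicator.
tab <- matrix(
  c(60,  90,   # in subgroup:     STG = yes, STG = no
    60, 220),  # not in subgroup: STG = yes, STG = no
  nrow = 2, byrow = TRUE,
  dimnames = list(subgroup = c("yes", "no"), stg = c("yes", "no"))
)

# Sample (cross-product) odds ratio: (a * d) / (b * c)
sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# One-sided Fisher's exact test, H1: odds ratio > 1
res <- fisher.test(tab, alternative = "greater")

sample_or      # cross-product odds ratio
res$estimate   # conditional maximum likelihood estimate of the odds ratio
res$p.value    # one-sided p-value
res$conf.int   # one-sided interval, so the upper bound is Inf
```

Because the test is one-sided, the confidence interval has only a lower bound. This is why every interval in the output table ends in `Inf`, and why subgroup/STG pairs with an odds ratio below 1 show p-values close to 1.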
R Code
library(fastDummies)  # dummy_cols()

# Drop Subgroup 4 (labelled "NotTracked NEET" above), add 0/1 indicators
# for each remaining subgroup, and return to wide format
datOR <- datNeetLong |>
  filter(Subgroup != 4) |>
  dummy_cols(select_columns = "Subgroup") |>
  pivot_wider(names_from = Vars, values_from = Resp)

# Every subgroup indicator crossed with every STG indicator
compr <- expand_grid(
  y = paste0("Subgroup_", 1:3),
  x = c("YM", "HY", "SD", "PSD", "YASB", "EMY", "YRCS", "YO", "YCR", "SEN")
)

# One-sided Fisher's exact test (H1: odds ratio > 1) for one subgroup/STG pair
OR_onepair <- function(y, x) {
  OR_list <- fisher.test(datOR[[y]], datOR[[x]], alternative = 'greater')
  res <- data.frame(
    y = y,
    x = x,
    OR = round(unname(OR_list$estimate), 4), # conditional MLE of the odds ratio
    p.value = round(OR_list$p.value, 4),
    CI = paste0("[", paste0(round(as.numeric(OR_list$conf.int), 2), collapse = ", "), "]")
  )
  res
}

# Run the test for every pair and stack the one-row results
OR_output <- Reduce("rbind", map2(compr$y, compr$x, OR_onepair))
OR_output <- OR_output |>
  mutate(y = factor(y, labels = c("Long-Term NEET", "Long-Term Exit", "Temporary NEET"),
                    levels = paste0("Subgroup_", 1:3)
  ))

# Shade the rows with one-sided p < 0.05
kbl(OR_output, row.names = FALSE) |>
  kable_styling(bootstrap_options = c("condensed")) |>
  row_spec(which(OR_output$p.value < 0.05),
           background = softcolors[2])
y | x | OR | p.value | CI |
---|---|---|---|---|
Long-Term NEET | YM | 1.1853 | 0.3791 | [0.61, Inf] |
Long-Term NEET | HY | 1.1525 | 0.5081 | [0.36, Inf] |
Long-Term NEET | SD | 0.5319 | 0.9932 | [0.32, Inf] |
Long-Term NEET | PSD | 2.2127 | 0.0002 | [1.51, Inf] |
Long-Term NEET | YASB | 1.4039 | 0.1633 | [0.82, Inf] |
Long-Term NEET | EMY | 0.3185 | 0.9972 | [0.11, Inf] |
Long-Term NEET | YRCS | 0.6051 | 0.8565 | [0.14, Inf] |
Long-Term NEET | YO | 1.2329 | 0.4437 | [0.43, Inf] |
Long-Term NEET | YCR | 1.0746 | 0.5297 | [0.42, Inf] |
Long-Term NEET | SEN | 1.0148 | 0.5426 | [0.55, Inf] |
Long-Term Exit | YM | 0.4296 | 0.9386 | [0.07, Inf] |
Long-Term Exit | HY | 0.6332 | 0.8011 | [0.03, Inf] |
Long-Term Exit | SD | 2.0239 | 0.0304 | [1.09, Inf] |
Long-Term Exit | PSD | 0.0000 | 1.0000 | [0, Inf] |
Long-Term Exit | YASB | 0.2789 | 0.9902 | [0.05, Inf] |
Long-Term Exit | EMY | 0.4923 | 0.9059 | [0.08, Inf] |
Long-Term Exit | YRCS | 8.5091 | 0.0009 | [2.66, Inf] |
Long-Term Exit | YO | 0.0000 | 1.0000 | [0, Inf] |
Long-Term Exit | YCR | 2.9262 | 0.0551 | [0.97, Inf] |
Long-Term Exit | SEN | 2.4861 | 0.0226 | [1.17, Inf] |
Temporary NEET | YM | 1.1075 | 0.4543 | [0.59, Inf] |
Temporary NEET | HY | 1.0260 | 0.5951 | [0.35, Inf] |
Temporary NEET | SD | 1.2255 | 0.2418 | [0.79, Inf] |
Temporary NEET | PSD | 0.8956 | 0.7335 | [0.62, Inf] |
Temporary NEET | YASB | 1.0287 | 0.5219 | [0.61, Inf] |
Temporary NEET | EMY | 3.3896 | 0.0024 | [1.56, Inf] |
Temporary NEET | YRCS | 0.2844 | 0.9899 | [0.07, Inf] |
Temporary NEET | YO | 1.3302 | 0.3954 | [0.49, Inf] |
Temporary NEET | YCR | 0.4982 | 0.9557 | [0.19, Inf] |
Temporary NEET | SEN | 0.6108 | 0.9573 | [0.34, Inf] |
2.3 Multinomial logistic regression
R Code
# Recode the binary predictors to 0/1 and shift EduS so that it starts at 0
dat <- datNeetWide |>
  mutate(
    Male = factor(Male - 1, levels = 0:1),
    EduS = EduS - 1,
    Assistance = factor(Assistance - 1, levels = 0:1)
  ) |>
  mutate(Subgroup = factor(Subgroup, labels = c("Long-Term NEET",
                                                "Long-Term Exit",
                                                "Temporary NEET",
                                                "Missing"))) |>
  # Drop the STG indicators; they are not used as predictors here
  select(-c(YM, HY, SD, PSD, YASB, EMY, YRCS, YO, YCR, SEN))

# coefMultinom() is not defined in this section
coef_tbl <- coefMultinom(dat = dat)
colnames(coef_tbl) <- c("Predictor", "b (Long-Term Exit)",
                        "b (Temporary NEET)")
kable(coef_tbl, row.names = TRUE)
No. | Predictor | b (Long-Term Exit) | b (Temporary NEET) |
---|---|---|---|
1 | Male1 | -0.161 | -0.048 |
2 | Age | 0.212*** | 0.125** |
3 | EduS | -0.026 | -0.216 |
4 | Residence | 0.004 | 0.004 |
5 | Assistance1 | 0.239 | -0.165 |
6 | CE1 | -0.33* | -0.31 |
7 | SC1 | -0.085 | 0.351 |
8 | SI1 | -0.062 | 0.066 |
9 | YCDC1 | -0.003 | 0.008 |
10 | CLDH1 | -0.037 | -0.036 |
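The coefficients above are on the log-odds scale. A minimal sketch of converting a few of them to odds ratios follows. It assumes that long-term NEET is the reference group (inferred from the two coefficient columns, not stated in the source) and that Age is measured in years; the object names are introduced here for illustration.

```r
# Coefficients copied from the table above (log-odds scale).
# Assumption: long-term NEET is the reference category.
b_exit <- c(Age = 0.212, CE1 = -0.33)  # long-term exit vs. reference
b_temp <- c(Age = 0.125)               # temporary NEET vs. reference

# exp(b) is the multiplicative change in the odds per one-unit increase
or_exit <- exp(b_exit)
or_temp <- exp(b_temp)

round(or_exit, 3)  # Age: 1.236, CE1: 0.719
round(or_temp, 3)  # Age: 1.133
```

Under these assumptions, each additional year of age multiplies the odds of long-term exit relative to long-term NEET by about 1.24, and the odds of temporary NEET relative to long-term NEET by about 1.13.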