NEET Analysis

Author

Jihong Zhang

1 Analysis Plan

  1. We can compute descriptive statistics for the three NEET groups to examine group differences in demographic information, STG proportions, and CLD outcomes.
  2. We can use odds ratio (OR) significance tests to test whether group differences in STGs are significant. This lets us test the hypotheses that specific STGs are associated with long-term NEET or long-term exit.
  3. To test the difference between a specific pair of groups (such as long-term NEET vs. long-term exit), we can fit a multinomial logistic regression, in which the regression coefficient (b1) is the estimated increase in the log odds of belonging to one group rather than the other for a one-unit increase in the predictor.
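
The model in step 3 can be written out explicitly. The sketch below assumes a single predictor x and takes Long-Term NEET as the reference category, which is what the coefficient columns reported in Section 2.3 suggest; the subscript k is notation introduced here for illustration.

```latex
\log \frac{P(Y = k \mid x)}{P(Y = \text{Long-Term NEET} \mid x)} = b_{0k} + b_{1k}\, x,
\qquad k \in \{\text{Long-Term Exit},\ \text{Temporary NEET}\}
```

Under this form, exp(b_{1k}) is the factor by which the odds of being in group k rather than Long-Term NEET change for a one-unit increase in x.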

2 Data Analysis

Abbreviations: YM = Young mothers; HY = Hidden Youth; SD = School dropouts; PSD = Potential school dropouts; YASB = Youth with anti-social behavior(s); EMY = Ethnic minority youth; YRCS = Youth living in residential care settings; YO = Youth offender; YCR = Youth with criminal record(s); SEN = Youth with special education needs.

R Code
source("Code/data_Preparation_NEET0609.R")

2.1 Descriptive statistics

The education variables, EduF and EduM in particular (highlighted in Table 1), have a relatively high missing rate because many respondents chose option 7, “none of the above”. These cases will be excluded from the subsequent OR analysis and multinomial regression analysis.

R Code
kable(res$desc, digits = 2) |> 
  row_spec(which(rownames(res$desc) %in% c("EduF", "EduM")), background = softcolors[3]) |> 
  kable_styling()
Table 1: Sample size and descriptive statistics for demographic variables
Variable      N    (NA)   Mean    SD    |  Median  Min  Max  Skewness  Kurtosis
Male          443  0      1.50    0.50  |  2       1    2    0.00      -2.00
Age           443  0      18.73   3.26  |  18      14   29   0.72      -0.05
EduF          170  273    2.56    1.31  |  2       1    6    0.88      0.11
EduM          184  259    2.44    1.17  |  2       1    6    0.84      0.40
EduS          439  4      2.91    0.85  |  3       1    6    1.24      2.39
Residence     443  0      17.60   4.48  |  17      1    29   -0.70     1.98
Assistance    438  5      1.84    0.37  |  2       1    2    -1.80     1.26

After removing cases with missing values on EduS and Assistance (4 and 5 cases, respectively), the final sample is N = 434.

Table 2: Frequency table for demographic variables
Var          Level   N     %
Male         1       216   49.8
Male         2       218   50.2
Education    1       2     0.5
Education    2       130   30.0
Education    3       240   55.3
Education    4       35    8.1
Education    5       19    4.4
Education    6       8     1.8
Assistance   1       70    16.1
Assistance   2       364   83.9

Table 3: Sample size by groups
Subgroup          N      Prop(%)
Long-Term NEET    153    35.25
Long-Term Exit    50     11.52
Temporary NEET    231    53.23
Missing           1709   79.96

In Table 3, the last column, Prop(%), uses a different denominator for the NEET subgroups and for the Missing row. For the three NEET subgroups, it is the group size as a percentage of the NEET group (N = 434). For the Missing row, it is the percentage of missing cases in the whole sample (N = 2143).
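
The subgroup percentages in Table 3 can be reproduced from the group sizes alone. This is a minimal base R sketch with the counts typed in by hand from the table rather than read from the data; the object names are illustrative and do not come from the analysis script.

```r
# Group sizes copied by hand from Table 3 (NEET subgroups only)
n_group <- c("Long-Term NEET" = 153, "Long-Term Exit" = 50, "Temporary NEET" = 231)

# The denominator is the NEET group, not the whole sample
n_neet <- sum(n_group)

# Percentage of each subgroup within the NEET group, rounded as in Table 3
prop_neet <- round(100 * n_group / n_neet, 2)
prop_neet
```

The three values match the first three rows of the Prop(%) column; the Missing row is computed against the whole sample instead.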

R Code
datNeetWide <- NEET_Subgroup_Cleaned_basic |> 
  dplyr::select(Male, Age, EduS, 
                Residence, Assistance,
                YM:SEN, CE1:CLDH1, Subgroup) |> 
  filter(!is.na(EduS), !is.na(Assistance))
datNeetLong <-  datNeetWide |> 
  rownames_to_column('id') |> 
  pivot_longer(-c(id, Subgroup), names_to = "Vars", 
               values_to = "Resp", values_transform = as.numeric)

datNeetLongTbl <- datNeetLong |> 
  group_by(Subgroup, Vars) |> 
  summarise(Mean = mean(Resp, na.rm = TRUE)) |> 
  ungroup() |> 
  mutate(Subgroup = factor(Subgroup, labels = c("Long-Term NEET",
                                                "Long-Term Exit",
                                                "Temporary NEET",
                                                "NotTracked NEET"))) |> 
  pivot_wider(names_from = Subgroup, values_from = Mean) |> 
  mutate(Vars = factor(Vars, levels = unique(datNeetLong$Vars))) |> 
  arrange(Vars) |> 
  select(-`NotTracked NEET`)

# Row-wise maximum across the three subgroup columns (rounded as displayed)
max_values_byRow <- datNeetLongTbl |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  select(-Vars) |>
  apply(1, max) |>
  as.numeric()

datNeetLongTbl |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  # Highlight a cell only when it equals the maximum of its own row
  mutate(across(-Vars, \(x) if_else(x == max_values_byRow,
    cell_spec(x,
      format = "html",
      color = "#B4464B", bold = TRUE
    ), as.character(x)
  ))) |>
  mykbl(escape = FALSE, booktabs = TRUE) |>
  row_spec(1:5, background = softcolors[2]) |>   # Male to Assistance
  row_spec(6:15, background = softcolors[4]) |>  # YM to SEN
  row_spec(16:20, background = softcolors[6])    # CE1 to CLDH1
Table 4: Descriptive statistics for groups
Vars Long-Term NEET Long-Term Exit Temporary NEET
Male 1.503 1.48 1.506
Age 17.882 20.26 18.935
EduS 2.908 3.08 2.883
Residence 16.735 19.14 17.771
Assistance 1.85 1.88 1.823
YM 0.092 0.04 0.087
HY 0.033 0.02 0.03
SD 0.131 0.3 0.203
PSD 0.386 0 0.268
YASB 0.144 0.04 0.121
EMY 0.033 0.04 0.108
YRCS 0.02 0.12 0.013
YO 0.039 0 0.039
YCR 0.046 0.1 0.03
SEN 0.105 0.2 0.082
CE1 1.912 1.39 1.877
SC1 1.951 1.624 2.015
SI1 3.134 2.748 3.158
YCDC1 3.035 2.725 3.07
CLDH1 2.723 2.27 2.083

2.2 Odds ratio tests with one-tailed Fisher's exact test

The results in Table 5 suggest that

  • PSD (Potential school dropouts) is associated with a higher chance of being long-term NEET (OR = 2.21);
  • SD (OR = 2.02), YRCS (Youth living in residential care settings; OR = 8.51), and SEN (OR = 2.49) are associated with a higher chance of long-term exit;
  • EMY (Ethnic minority youth) is associated with a higher likelihood of temporary NEET (OR = 3.39).

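
To make the test concrete, the PSD result for Long-Term NEET can be approximated by hand. The sketch below rebuilds a 2×2 table from the rounded proportions in Table 4 and the group sizes in Table 3. The counts are therefore a reconstruction (an assumption), not the original data, and the estimates should only roughly match Table 5.

```r
# Counts reconstructed from rounded proportions (assumption, not raw data):
# 0.386 * 153 is about 59 PSD cases in Long-Term NEET,
# 0.268 * 231 is about 62 in Temporary NEET, and none in Long-Term Exit.
tab <- matrix(c(59, 94,    # Long-Term NEET: PSD yes, PSD no
                62, 219),  # other groups:   PSD yes, PSD no
              nrow = 2,
              dimnames = list(PSD = c("yes", "no"),
                              Group = c("Long-Term NEET", "Other")))

# Sample (unconditional) odds ratio: (59 * 219) / (94 * 62)
or_sample <- (tab[1, 1] * tab[2, 2]) / (tab[2, 1] * tab[1, 2])

# One-sided Fisher's exact test: the alternative is that the odds ratio exceeds 1
ft <- fisher.test(tab, alternative = "greater")

round(or_sample, 3)
ft$estimate  # conditional MLE, usually close to the sample odds ratio
ft$p.value
ft$conf.int  # one-sided interval, so the upper bound is Inf
```

Because the alternative is one-sided, a large p-value here does not rule out a negative association; it only says there is no evidence that the odds ratio is greater than 1.
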
R Code
datOR <- datNeetLong |> 
  filter(Subgroup != 4) |> 
  dummy_cols(select_columns = "Subgroup") |> 
  pivot_wider(names_from = Vars, values_from = Resp)

compr <- expand_grid(
  y = paste0("Subgroup_", 1:3),
  x = c("YM", "HY", "SD", "PSD", "YASB", "EMY", "YRCS", "YO", "YCR", "SEN")
)

# One-sided Fisher's exact test for a single subgroup-by-STG pair.
# alternative = "greater" tests whether the odds ratio exceeds 1.
OR_onepair <- function(y, x) {
  OR_list <- fisher.test(datOR[[y]], datOR[[x]], alternative = "greater")

  data.frame(
    y = y,
    x = x,
    OR = round(OR_list$estimate, 4), # conditional MLE of the odds ratio
    p.value = round(OR_list$p.value, 4),
    CI = paste0("[", paste0(round(as.numeric(OR_list$conf.int), 2), collapse = ", "), "]")
  )
}

# Run the test for every subgroup-by-STG pair and stack the results
OR_output <- Reduce("rbind", map2(compr$y, compr$x, \(y, x) OR_onepair(y, x)))
OR_output <- OR_output |> 
  mutate(y = factor(y, labels = c("Long-Term NEET","Long-Term Exit","Temporary NEET"),
                    levels = paste0("Subgroup_", 1:3)
                    ))


kbl(OR_output, row.names = FALSE) |> 
  kable_styling(bootstrap_options = c("condensed")) |> 
  row_spec(which(OR_output$p.value < 0.05),
           background = softcolors[2])
Table 5: Odds ratio results
y x OR p.value CI
Long-Term NEET YM 1.1853 0.3791 [0.61, Inf]
Long-Term NEET HY 1.1525 0.5081 [0.36, Inf]
Long-Term NEET SD 0.5319 0.9932 [0.32, Inf]
Long-Term NEET PSD 2.2127 0.0002 [1.51, Inf]
Long-Term NEET YASB 1.4039 0.1633 [0.82, Inf]
Long-Term NEET EMY 0.3185 0.9972 [0.11, Inf]
Long-Term NEET YRCS 0.6051 0.8565 [0.14, Inf]
Long-Term NEET YO 1.2329 0.4437 [0.43, Inf]
Long-Term NEET YCR 1.0746 0.5297 [0.42, Inf]
Long-Term NEET SEN 1.0148 0.5426 [0.55, Inf]
Long-Term Exit YM 0.4296 0.9386 [0.07, Inf]
Long-Term Exit HY 0.6332 0.8011 [0.03, Inf]
Long-Term Exit SD 2.0239 0.0304 [1.09, Inf]
Long-Term Exit PSD 0.0000 1.0000 [0, Inf]
Long-Term Exit YASB 0.2789 0.9902 [0.05, Inf]
Long-Term Exit EMY 0.4923 0.9059 [0.08, Inf]
Long-Term Exit YRCS 8.5091 0.0009 [2.66, Inf]
Long-Term Exit YO 0.0000 1.0000 [0, Inf]
Long-Term Exit YCR 2.9262 0.0551 [0.97, Inf]
Long-Term Exit SEN 2.4861 0.0226 [1.17, Inf]
Temporary NEET YM 1.1075 0.4543 [0.59, Inf]
Temporary NEET HY 1.0260 0.5951 [0.35, Inf]
Temporary NEET SD 1.2255 0.2418 [0.79, Inf]
Temporary NEET PSD 0.8956 0.7335 [0.62, Inf]
Temporary NEET YASB 1.0287 0.5219 [0.61, Inf]
Temporary NEET EMY 3.3896 0.0024 [1.56, Inf]
Temporary NEET YRCS 0.2844 0.9899 [0.07, Inf]
Temporary NEET YO 1.3302 0.3954 [0.49, Inf]
Temporary NEET YCR 0.4982 0.9557 [0.19, Inf]
Temporary NEET SEN 0.6108 0.9573 [0.34, Inf]

2.3 Multinomial logistic regression

R Code
dat <- datNeetWide |> 
  mutate(
    Male = factor(Male-1, levels = 0:1),
    EduS = EduS-1,
    Assistance = factor(Assistance-1, levels = 0:1)
  ) |> 
  mutate(Subgroup = factor(Subgroup, labels = c("Long-Term NEET",
                                                "Long-Term Exit",
                                                "Temporary NEET",
                                                "Missing"))) |> 
  select(-c(YM, HY, SD, PSD, YASB, EMY, YRCS, YO, YCR, SEN))

coef_tbl <- coefMultinom(dat = dat)
colnames(coef_tbl) <- c("Predictor", "b (Long-Term Exit)",
                        "b (Temporary NEET)")
kable(coef_tbl, row.names = TRUE)
Predictor b (Long-Term Exit) b (Temporary NEET)
1 Male1 -0.161 -0.048
2 Age 0.212*** 0.125**
3 EduS -0.026 -0.216
4 Residence 0.004 0.004
5 Assistance1 0.239 -0.165
6 CE1 -0.33* -0.31
7 SC1 -0.085 0.351
8 SI1 -0.062 0.066
9 YCDC1 -0.003 0.008
10 CLDH1 -0.037 -0.036
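
The helper coefMultinom() is defined outside this section, so its internals are not shown here. As a rough illustration of the kind of fit that produces a table like the one above, the sketch below uses nnet::multinom() on simulated stand-in data. The choice of multinom(), the toy data frame, and the two predictors are all assumptions made for illustration; they are not the author's implementation or the NEET sample.

```r
library(nnet)  # recommended package that ships with R

set.seed(1)
n <- 300
group_levels <- c("Long-Term NEET", "Long-Term Exit", "Temporary NEET")

# Simulated stand-in data; NOT the NEET sample
toy <- data.frame(
  Subgroup = factor(sample(group_levels, n, replace = TRUE), levels = group_levels),
  Age = round(runif(n, 14, 29)),
  CE1 = rnorm(n, mean = 1.9, sd = 0.5)
)

# The first factor level (Long-Term NEET) serves as the reference category
fit <- multinom(Subgroup ~ Age + CE1, data = toy, trace = FALSE)

b  <- coef(fit)                     # one row per non-reference group
se <- summary(fit)$standard.errors  # same layout as b
z  <- b / se                        # Wald z statistics
p  <- 2 * pnorm(abs(z), lower.tail = FALSE)

round(b, 3)       # log-odds coefficients relative to Long-Term NEET
round(exp(b), 3)  # the same coefficients on the odds scale
round(p, 3)       # two-sided Wald p-values
```

Each row of b corresponds to one column of the coefficient table: a coefficient is the change in the log odds of that group versus Long-Term NEET for a one-unit increase in the predictor.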