NEET Analysis

Author

Jihong Zhang

1 Analysis Plan

We compute descriptive statistics for the three NEET groups to examine group differences in demographic information, STG proportions, and CLD outcomes.
We use odds ratio (OR) significance tests to test whether the groups differ with respect to the STGs. This lets us test the hypothesis that specific STGs are associated with long-term NEET or long-term exit.
To test the difference between a specific pair of groups (such as long-term NEET vs. long-term exit), we fit a multinomial logistic regression, in which a regression coefficient (b1) is the estimated increase in the log odds of belonging to one group rather than the reference group for a one-unit increase in the predictor.
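
As a minimal sketch of this interpretation, the base R example below uses made-up data (the variables `stg` and `long_term_exit` and all counts are hypothetical, not the project data). It restricts attention to two groups, where the multinomial model reduces to an ordinary binary logistic regression, and illustrates that for a single binary predictor the coefficient b1 equals the log of the odds ratio from the corresponding 2 x 2 table.

```r
# Hypothetical data: 100 respondents, binary STG indicator and binary outcome.
# stg = 1: 30 exits, 20 non-exits; stg = 0: 10 exits, 40 non-exits.
toy <- data.frame(
  stg            = rep(c(1, 1, 0, 0), times = c(30, 20, 10, 40)),
  long_term_exit = rep(c(1, 0, 1, 0), times = c(30, 20, 10, 40))
)

# Binary logistic regression: log odds of exit as a function of the STG
fit <- glm(long_term_exit ~ stg, data = toy, family = binomial)
b1 <- unname(coef(fit)["stg"])

# Cross-product odds ratio from the 2 x 2 table
tab <- table(toy$stg, toy$long_term_exit)
or_table <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])

# b1 and log(OR) agree; exp(b1) recovers the odds ratio
c(b1 = b1, log_OR = log(or_table), OR = exp(b1))
```

Exponentiating a coefficient from the regression table therefore gives an odds ratio relative to the reference group, holding the other predictors constant.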

2 Data Analysis

YM = Young mothers; HY = Hidden Youth; SD = School dropouts; PSD = Potential school dropouts; YASB = Youth with anti-social behavior(s); EMY = Ethnic minority youth; YRCS = Youth living in residential care settings; YO = Youth offender; YCR = Youth with criminal record(s); SEN = Youth with special education needs;

R Code

source("Code/data_Preparation_NEET0609.R")

2.1 Descriptive statistics

The missing rate is relatively high for the education variables (EduF and EduM, highlighted in Table 1), because many respondents chose option 7, “none of the above”. Those cases will be removed before the OR analysis and the multinomial regression analysis.
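
A minimal sketch of how such responses could be treated as missing is shown below. The vector `edu_raw` and its values are hypothetical; the actual recoding is done in the data preparation script, which is not shown here.

```r
# Hypothetical raw education responses; 7 = "none of the above"
edu_raw <- c(2, 3, 7, 1, 7, 4, 7, 2)

# Treat option 7 as missing
edu <- replace(edu_raw, edu_raw == 7, NA)

# Valid and missing counts, analogous to the N and (NA) columns of Table 1
n_valid   <- sum(!is.na(edu))
n_missing <- sum(is.na(edu))
c(N = n_valid, `(NA)` = n_missing)
```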

R Code

kable(res$desc, digits = 2) |> 
  row_spec(which(rownames(res$desc) %in% c("EduF", "EduM")), background = softcolors[3]) |> 
  kable_styling()

Table 1: Sample Size for demographic variables
	N	(NA)	Mean	SD	|	Median	Min	Max	Skewness	Kurtosis
Male	443		1.50	0.50	|	2	1	2	0.00	-2.00
Age	443		18.73	3.26	|	18	14	29	0.72	-0.05
EduF	170	273	2.56	1.31	|	2	1	6	0.88	0.11
EduM	184	259	2.44	1.17	|	2	1	6	0.84	0.40
EduS	439	4	2.91	0.85	|	3	1	6	1.24	2.39
Residence	443		17.60	4.48	|	17	1	29	-0.70	1.98
Assistance	438	5	1.84	0.37	|	2	1	2	-1.80	1.26

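Cases with missing values on EduS (4 cases) or Assistance (5 cases) are removed, matching the filter on EduS and Assistance in the code below. The final sample is N = 434 (443 - 4 - 5).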

Table 2: Frequency table for demographic variables
Var	Levels	N	%
Male	1	216	49.8
	2	218	50.2
Education	1	2	0.5
	2	130	30.0
	3	240	55.3
	4	35	8.1
	5	19	4.4
	6	8	1.8
Assistance	1	70	16.1
	2	364	83.9

Table 3: Sample size by groups
Subgroup	N	Prop(%)
Long-Term NEET	153	35.25
Long-Term Exit	50	11.52
Temporary NEET	231	53.23
Missing	1709	79.96

Note on Table 3: the last column, Prop(%), means different things for the NEET subgroups and for the Missing row. For the Missing row, it is the proportion of missing cases out of the whole sample (N = 2143). For the NEET subgroups, it is each group's share of the NEET sample (N = 434).
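
The subgroup percentages can be checked directly from the counts in Table 3. The short calculation below is only a sketch of that arithmetic; the object names are illustrative.

```r
# Subgroup counts copied from Table 3
n_sub <- c("Long-Term NEET" = 153, "Long-Term Exit" = 50, "Temporary NEET" = 231)

# The denominator for the NEET subgroups is the NEET sample itself
n_neet <- sum(n_sub)

# Share of each subgroup within the NEET sample, in percent
prop_sub <- round(100 * n_sub / n_neet, 2)
prop_sub
```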

R Code

datNeetWide <- NEET_Subgroup_Cleaned_basic |> 
  dplyr::select(Male, Age, EduS, 
                Residence, Assistance,
                YM:SEN, CE1:CLDH1, Subgroup) |> 
  filter(!is.na(EduS), !is.na(Assistance))
datNeetLong <-  datNeetWide |> 
  rownames_to_column('id') |> 
  pivot_longer(-c(id, Subgroup), names_to = "Vars", 
               values_to = "Resp", values_transform = as.numeric)

datNeetLongTbl <- datNeetLong |> 
  group_by(Subgroup, Vars) |> 
  summarise(Mean = mean(Resp, na.rm = TRUE)) |> 
  ungroup() |> 
  mutate(Subgroup = factor(Subgroup, labels = c("Long-Term NEET",
                                                "Long-Term Exit",
                                                "Temporary NEET",
                                                "NotTracked NEET"))) |> 
  pivot_wider(names_from = Subgroup, values_from = Mean) |> 
  mutate(Vars = factor(Vars, levels = unique(datNeetLong$Vars))) |> 
  arrange(Vars) |> 
  select(-`NotTracked NEET`)

max_values_byRow <- datNeetLongTbl |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  select(-Vars) |>
  apply(1, max) |>
  as.numeric() 
 
# Highlight the largest group mean in each row of the table
datNeetLongTbl |>
  mutate(across(where(is.numeric), \(x) round(x, 3))) |>
  # compare each cell with the maximum of its own row only
  mutate(across(-Vars, \(x) if_else(x == max_values_byRow,
    cell_spec(x,
      format = "html",
      color = "#B4464B", bold = TRUE
    ), as.character(x)
  ))) |>
  mykbl(escape = FALSE, booktabs = TRUE) |> 
  row_spec(1:5, background = softcolors[2]) |>    # demographic variables
  row_spec(6:15, background = softcolors[4]) |>   # STG indicators
  row_spec(16:20, background = softcolors[6])     # CLD outcomes

Table 4: Descriptive statistics for groups
Vars	Long-Term NEET	Long-Term Exit	Temporary NEET
Male	1.503	1.48	1.506
Age	17.882	20.26	18.935
EduS	2.908	3.08	2.883
Residence	16.735	19.14	17.771
Assistance	1.85	1.88	1.823
YM	0.092	0.04	0.087
HY	0.033	0.02	0.03
SD	0.131	0.3	0.203
PSD	0.386	0	0.268
YASB	0.144	0.04	0.121
EMY	0.033	0.04	0.108
YRCS	0.02	0.12	0.013
YO	0.039	0	0.039
YCR	0.046	0.1	0.03
SEN	0.105	0.2	0.082
CE1	1.912	1.39	1.877
SC1	1.951	1.624	2.015
SI1	3.134	2.748	3.158
YCDC1	3.035	2.725	3.07
CLDH1	2.723	2.27	2.083

2.2 Odds ratio tests with one-tailed Fisher's exact test

The results in Table 5 suggest that:

PSD (potential school dropouts) is associated with a higher chance of being long-term NEET;
SD (school dropouts), YRCS (youth living in residential care settings), and SEN (youth with special education needs) are associated with a higher chance of long-term exit;
EMY (ethnic minority youth) is associated with a higher likelihood of being temporary NEET.
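
For reference, a one-sided test of this kind can be sketched in base R as follows. The 2 x 2 counts are made up for illustration and do not come from the NEET data. With `alternative = "greater"`, `fisher.test()` tests whether the odds ratio exceeds 1 and reports a one-sided confidence interval whose upper bound is infinite, which is why the intervals in Table 5 end in `Inf`.

```r
# Hypothetical 2 x 2 table: rows = group membership (yes/no),
# columns = STG membership (yes/no)
tab <- matrix(c(30, 20, 10, 40), nrow = 2,
              dimnames = list(group = c("yes", "no"), stg = c("yes", "no")))

# One-sided Fisher's exact test: H1 is that the odds ratio is greater than 1
ft <- fisher.test(tab, alternative = "greater")

ft$estimate   # conditional maximum likelihood estimate of the odds ratio
ft$p.value    # one-sided p-value
ft$conf.int   # lower bound, with Inf as the upper bound
```

The reported estimate is a conditional maximum likelihood estimate, so it can differ slightly from the simple cross-product ratio of the table. An odds ratio below 1 in a one-sided test of this form will always give a large p-value, so such rows are uninformative about associations in the opposite direction.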

R Code

datOR <- datNeetLong |> 
  filter(Subgroup != 4) |> 
  dummy_cols(select_columns = "Subgroup") |> 
  pivot_wider(names_from = Vars, values_from = Resp)

compr <- expand_grid(
  y = paste0("Subgroup_", 1:3),
  x = c("YM", "HY", "SD", "PSD", "YASB", "EMY", "YRCS", "YO", "YCR", "SEN")
)

OR_onepair <- function(y, x) {
  OR_list = fisher.test(datOR[[y]], datOR[[x]], alternative = 'greater')
  
  res <- data.frame(
    y = y,
    x = x,
    OR = round(OR_list$estimate, 4), # an estimate of the odds ratio
    p.value = round(OR_list$p.value, 4),
    CI = paste0("[",paste0(round(as.numeric(OR_list$conf.int), 2), collapse = ", "), "]")
  )
  
  res
}

# Run the test for every (group, STG) pair and stack the one-row results
OR_output <- Reduce(rbind, map2(compr$y, compr$x, OR_onepair))
OR_output <- OR_output |> 
  mutate(y = factor(y, labels = c("Long-Term NEET","Long-Term Exit","Temporary NEET"),
                    levels = paste0("Subgroup_", 1:3)
                    ))


kbl(OR_output, row.names = FALSE) |> 
  kable_styling(bootstrap_options = c("condensed")) |> 
  row_spec(which(OR_output$p.value < 0.05),
           background = softcolors[2])

Table 5: Odds ratio results
y	x	OR	p.value	CI
Long-Term NEET	YM	1.1853	0.3791	[0.61, Inf]
Long-Term NEET	HY	1.1525	0.5081	[0.36, Inf]
Long-Term NEET	SD	0.5319	0.9932	[0.32, Inf]
Long-Term NEET	PSD	2.2127	0.0002	[1.51, Inf]
Long-Term NEET	YASB	1.4039	0.1633	[0.82, Inf]
Long-Term NEET	EMY	0.3185	0.9972	[0.11, Inf]
Long-Term NEET	YRCS	0.6051	0.8565	[0.14, Inf]
Long-Term NEET	YO	1.2329	0.4437	[0.43, Inf]
Long-Term NEET	YCR	1.0746	0.5297	[0.42, Inf]
Long-Term NEET	SEN	1.0148	0.5426	[0.55, Inf]
Long-Term Exit	YM	0.4296	0.9386	[0.07, Inf]
Long-Term Exit	HY	0.6332	0.8011	[0.03, Inf]
Long-Term Exit	SD	2.0239	0.0304	[1.09, Inf]
Long-Term Exit	PSD	0.0000	1.0000	[0, Inf]
Long-Term Exit	YASB	0.2789	0.9902	[0.05, Inf]
Long-Term Exit	EMY	0.4923	0.9059	[0.08, Inf]
Long-Term Exit	YRCS	8.5091	0.0009	[2.66, Inf]
Long-Term Exit	YO	0.0000	1.0000	[0, Inf]
Long-Term Exit	YCR	2.9262	0.0551	[0.97, Inf]
Long-Term Exit	SEN	2.4861	0.0226	[1.17, Inf]
Temporary NEET	YM	1.1075	0.4543	[0.59, Inf]
Temporary NEET	HY	1.0260	0.5951	[0.35, Inf]
Temporary NEET	SD	1.2255	0.2418	[0.79, Inf]
Temporary NEET	PSD	0.8956	0.7335	[0.62, Inf]
Temporary NEET	YASB	1.0287	0.5219	[0.61, Inf]
Temporary NEET	EMY	3.3896	0.0024	[1.56, Inf]
Temporary NEET	YRCS	0.2844	0.9899	[0.07, Inf]
Temporary NEET	YO	1.3302	0.3954	[0.49, Inf]
Temporary NEET	YCR	0.4982	0.9557	[0.19, Inf]
Temporary NEET	SEN	0.6108	0.9573	[0.34, Inf]

2.3 Multinomial logistic regression

R Code

dat <- datNeetWide |> 
  mutate(
    Male = factor(Male-1, levels = 0:1),
    EduS = EduS-1,
    Assistance = factor(Assistance-1, levels = 0:1)
  ) |> 
  mutate(Subgroup = factor(Subgroup, labels = c("Long-Term NEET",
                                                "Long-Term Exit",
                                                "Temporary NEET",
                                                "Missing"))) |> 
  select(-c(YM, HY, SD, PSD, YASB, EMY, YRCS, YO, YCR, SEN))

coef_tbl <- coefMultinom(dat = dat)
colnames(coef_tbl) <- c("Predictor", "b (Long-Term Exit)",
                        "b (Temporary NEET)")
kable(coef_tbl, row.names = TRUE)

	Predictor	b (Long-Term Exit)	b (Temporary NEET)
1	Male1	-0.161	-0.048
2	Age	0.212***	0.125**
3	EduS	-0.026	-0.216
4	Residence	0.004	0.004
5	Assistance1	0.239	-0.165
6	CE1	-0.33*	-0.31
7	SC1	-0.085	0.351
8	SI1	-0.062	0.066
9	YCDC1	-0.003	0.008
10	CLDH1	-0.037	-0.036