This section compares several prediction models to explore why secondary school enrollment is low. The same models can later be extended to other outcomes, e.g. endline PW participation.
The predictions focus on the key variables of research interest, namely secondary school enrollment and income.

| Variable | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| empw_dmnbus_never | 0.837 | 0.304 | 2.755 | 0.006 |
| assets_hholdietms_19 | 0.311 | 0.109 | 2.849 | 0.004 |
| empw_cares | -0.267 | 0.101 | -2.643 | 0.008 |
| dummy_electricity | 0.241 | 0.070 | 3.436 | 0.001 |
| empw_dmnbus | 0.221 | 0.082 | 2.698 | 0.007 |
| empw_cagrp | 0.181 | 0.065 | 2.788 | 0.005 |
| adult_equiv | -0.154 | 0.039 | -3.941 | 0.000 |
| exp_educ_ind_day_wspil3 | -0.075 | 0.025 | -2.953 | 0.003 |
| exp_educ_ind_day_wspil2 | 0.050 | 0.018 | 2.855 | 0.004 |
| cons_restrict | -0.021 | 0.007 | -3.054 | 0.002 |
| exp_festivities_ind_day_wdescr | -0.014 | 0.006 | -2.583 | 0.010 |
| personal_hygien_cost_wdescr | 0.001 | 0.000 | 2.909 | 0.004 |
| publ_trans_amt_month_wdescr | -0.000 | 0.000 | -2.639 | 0.008 |
| pssn_monthly_transfer | 0.000 | 0.000 | 22.044 | 0.000 |
| farm_sales_value_ind_wtargt | -0.000 | 0.000 | -2.863 | 0.004 |
| child_health_exp_month | -0.000 | 0.000 | -2.623 | 0.009 |
| exp_hhitems_month_raw | 0.000 | 0.000 | 3.087 | 0.002 |
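
Every row in the table above has a p-value of at most 0.010, and the rows are ordered by the absolute size of the estimate, so the table appears to be a filtered and sorted view of a larger `lm` coefficient table. The code that produced it is not shown; the following is a minimal sketch of how such a view could be built, using simulated data and hypothetical variable names (`x1`, `x2`, `x3`, `y`) and an assumed 1% cutoff.

```r
# Simulated stand-in data; variable names and effect sizes are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 0.8 * dat$x1 - 0.3 * dat$x2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ ., data = dat)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- as.data.frame(summary(fit)$coefficients)
coefs <- coefs[rownames(coefs) != "(Intercept)", ]

# Keep terms significant at the (assumed) 1% level,
# then sort by absolute estimate, largest first.
sig <- coefs[coefs[["Pr(>|t|)"]] <= 0.01, ]
sig <- sig[order(-abs(sig$Estimate)), ]

print(round(sig, 3))
```

Estimates that print as 0.000 next to a large t value, such as `pssn_monthly_transfer`, most likely reflect the measurement scale of the predictor. Printing more digits or rescaling such variables before fitting would make those rows readable.
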

```
##
## Call:
## randomForest(formula = secondary_enrollment_rate ~ ., data = train_clean_sampled)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 447
##
## Mean of squared residuals: 0.1834125
## % Var explained: 24
```


```
##
## Call:
## svm(formula = secondary_enrollment_rate ~ ., data = train)
##
##
## Parameters:
## SVM-Type: eps-regression
## SVM-Kernel: radial
## cost: 1
## gamma: 0.0007451565
## epsilon: 0.1
##
##
## Number of Support Vectors: 1862
```


```
## # weights: 821
## initial value 465.235010
## iter 10 value 308.197086
## iter 20 value 253.858466
## iter 30 value 212.714160
## iter 40 value 185.495524
## iter 50 value 175.391266
## iter 60 value 166.903255
## iter 70 value 163.396769
## iter 80 value 159.540519
## iter 90 value 158.313103
## iter 100 value 157.442203
## final value 157.442203
## stopped after 100 iterations
## a 80-10-1 network with 821 weights
## inputs: PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18 PC19 PC20 PC21 PC22 PC23 PC24 PC25 PC26 PC27 PC28 PC29 PC30 PC31 PC32 PC33 PC34 PC35 PC36 PC37 PC38 PC39 PC40 PC41 PC42 PC43 PC44 PC45 PC46 PC47 PC48 PC49 PC50 PC51 PC52 PC53 PC54 PC55 PC56 PC57 PC58 PC59 PC60 PC61 PC62 PC63 PC64 PC65 PC66 PC67 PC68 PC69 PC70 PC71 PC72 PC73 PC74 PC75 PC76 PC77 PC78 PC79 PC80
## output(s): secondary_enrollment_rate
## options were -
```

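
The weight count in the neural network output follows from the 80-10-1 architecture: with one bias unit feeding each layer, (80 + 1) × 10 + (10 + 1) × 1 = 821. The line `stopped after 100 iterations` indicates that the optimizer reached the default `maxit = 100` limit rather than converging, so a larger `maxit` may be worth trying. The preprocessing code is not shown, so the sketch below is only an assumption about the pipeline: it reduces simulated predictors to principal components with `prcomp` and fits a small `nnet` model. The data, the number of components `k`, and the hidden layer size are all hypothetical.

```r
library(nnet)  # recommended package included with standard R installations

# Weights in a single-hidden-layer network with bias terms:
# (inputs + 1) * hidden + (hidden + 1) * outputs
n_weights <- function(inputs, hidden, outputs = 1) {
  (inputs + 1) * hidden + (hidden + 1) * outputs
}
n_weights(80, 10, 1)  # 821, as reported for the 80-10-1 network

# Simulated stand-in data (hypothetical): 300 rows, 20 predictors,
# outcome bounded in [0, 1] like an enrollment rate.
set.seed(1)
X <- matrix(rnorm(300 * 20), nrow = 300)
y <- as.numeric(X[, 1] - X[, 2] > 0)

# Principal components of the centered and scaled predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)
k <- 5  # assumed number of components to keep
pcs <- as.data.frame(pca$x[, 1:k])
pcs$y <- y

# Small network on the components; maxit raised above the default of 100
fit_nn <- nnet(y ~ ., data = pcs, size = 3, maxit = 500, trace = FALSE)

# The fitted weight vector has the length the formula predicts
length(fit_nn$wts) == n_weights(k, 3)
```

When predicting on new data, the same `pca` object would need to be applied to the new predictors (for example with `predict(pca, newdata = ...)`) so that the components are computed on the training rotation.
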
| Linear Regression | Random Forest | Support Vector Machine | Neural Network |
|---|---|---|---|
| 0.6347446 | 0.1781839 | 0.2396441 | 0.39855 |
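
The table above does not name its metric. If the values are mean squared errors on a held-out test set, a reference point helps in reading them: the random forest output reports a mean of squared residuals of 0.183 at 24% variance explained, which implies an outcome variance of roughly 0.24, so a value of 0.63 would be worse than always predicting the mean. The sketch below shows one way such a comparison could be computed, under that assumption about the metric. It uses simulated data, hypothetical variable names, and only two of the four model types, with a mean-only baseline added for reference.

```r
library(nnet)  # recommended package included with standard R installations

# Mean squared error between observed and predicted values
mse <- function(actual, predicted) mean((actual - predicted)^2)

# Simulated stand-in data; names and the data-generating process are hypothetical.
set.seed(42)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- plogis(3 * dat$x1 - 2 * dat$x2) + rnorm(n, sd = 0.05)

# 80/20 train/test split (assumed; the original split is not shown)
idx <- sample(seq_len(n), size = 0.8 * n)
train <- dat[idx, ]
test <- dat[-idx, ]

fit_lm <- lm(y ~ ., data = train)
fit_nn <- nnet(y ~ ., data = train, size = 3, linout = TRUE,
               maxit = 200, trace = FALSE)

# One row per model, all evaluated on the same held-out rows
results <- data.frame(
  model = c("Mean baseline", "Linear Regression", "Neural Network"),
  test_mse = c(
    mse(test$y, mean(train$y)),
    mse(test$y, predict(fit_lm, newdata = test)),
    mse(test$y, as.numeric(predict(fit_nn, newdata = test)))
  )
)
print(results)
```

Evaluating every model on the same test rows, and listing the baseline next to them, makes it easier to tell a weak model from one that is actively overfitting.
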