Overview

This analysis compares several prediction models to gain insight into why secondary education enrollment is low. The models can later be extended to other outcomes, e.g. endline PW participation.

Prediction Models

This section explores predictions for the key variables of research interest, namely secondary school enrollment and income.

Secondary Enrollment Rate

Linear Regression

Sorted Significant Linear Regression Coefficients
Variable                        Estimate  Std. Error  t value  Pr(>|t|)
empw_dmnbus_never                  0.837       0.304    2.755     0.006
assets_hholdietms_19               0.311       0.109    2.849     0.004
empw_cares                        -0.267       0.101   -2.643     0.008
dummy_electricity                  0.241       0.070    3.436     0.001
empw_dmnbus                        0.221       0.082    2.698     0.007
empw_cagrp                         0.181       0.065    2.788     0.005
adult_equiv                       -0.154       0.039   -3.941     0.000
exp_educ_ind_day_wspil3           -0.075       0.025   -2.953     0.003
exp_educ_ind_day_wspil2            0.050       0.018    2.855     0.004
cons_restrict                     -0.021       0.007   -3.054     0.002
exp_festivities_ind_day_wdescr    -0.014       0.006   -2.583     0.010
personal_hygien_cost_wdescr        0.001       0.000    2.909     0.004
publ_trans_amt_month_wdescr       -0.000       0.000   -2.639     0.008
pssn_monthly_transfer              0.000       0.000   22.044     0.000
farm_sales_value_ind_wtargt       -0.000       0.000   -2.863     0.004
child_health_exp_month            -0.000       0.000   -2.623     0.009
exp_hhitems_month_raw              0.000       0.000    3.087     0.002
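
A table of this kind can be built from a fitted `lm` object by filtering its coefficient matrix on the p-value and ordering the rows by absolute estimate. The following is a minimal sketch on simulated data. The data frame `toy`, its columns, the 0.01 cutoff, and the choice to drop the intercept are all assumptions for illustration, not the report's actual data or settings.

```r
# Simulated stand-in data (hypothetical; not the survey data used in the report).
set.seed(1)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 0.8 * toy$x1 - 0.3 * toy$x2 + rnorm(n)

fit <- lm(y ~ ., data = toy)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(fit)$coefficients
coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]

# Keep coefficients below the assumed significance cutoff,
# then sort by absolute estimate, largest first.
alpha <- 0.01
sig <- coefs[coefs[, "Pr(>|t|)"] < alpha, , drop = FALSE]
sig <- sig[order(abs(sig[, "Estimate"]), decreasing = TRUE), , drop = FALSE]

print(round(sig, 3))
```

Because the predictors in the report are on very different scales, estimates that round to 0.000 are not necessarily negligible; sorting by t value, or standardizing predictors first, may give a more comparable ranking.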

Random Forest

## 
## Call:
##  randomForest(formula = secondary_enrollment_rate ~ ., data = train_clean_sampled) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 447
## 
##           Mean of squared residuals: 0.1834125
##                     % Var explained: 24

Support Vector Machine (SVM)

## 
## Call:
## svm(formula = secondary_enrollment_rate ~ ., data = train)
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.0007451565 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  1862
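
The `gamma` value above is consistent with the `e1071::svm` default of one over the number of predictor columns, and the random forest's 447 variables per split matches the `randomForest` regression default of one third of the predictors, rounded down. Assuming both models were fitted with default settings (the report does not state this), the arithmetic can be checked as follows. The predictor count `p` is inferred from the printed output, not taken from the report.

```r
# Inferred number of predictor columns (assumption: gamma was left at 1 / p).
p <- round(1 / 0.0007451565)

# Default gamma in e1071::svm is 1 / (number of predictor columns).
gamma_default <- 1 / p

# Default mtry for regression in randomForest is max(floor(p / 3), 1).
mtry_default <- max(floor(p / 3), 1)

print(c(p = p, gamma = gamma_default, mtry = mtry_default))
```

If the defaults were indeed used, neither model has been tuned, so the reported errors may understate what these methods can achieve on this data.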

Neural Network

## # weights:  821
## initial  value 465.235010 
## iter  10 value 308.197086
## iter  20 value 253.858466
## iter  30 value 212.714160
## iter  40 value 185.495524
## iter  50 value 175.391266
## iter  60 value 166.903255
## iter  70 value 163.396769
## iter  80 value 159.540519
## iter  90 value 158.313103
## iter 100 value 157.442203
## final  value 157.442203 
## stopped after 100 iterations
## a 80-10-1 network with 821 weights
## inputs: PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18 PC19 PC20 PC21 PC22 PC23 PC24 PC25 PC26 PC27 PC28 PC29 PC30 PC31 PC32 PC33 PC34 PC35 PC36 PC37 PC38 PC39 PC40 PC41 PC42 PC43 PC44 PC45 PC46 PC47 PC48 PC49 PC50 PC51 PC52 PC53 PC54 PC55 PC56 PC57 PC58 PC59 PC60 PC61 PC62 PC63 PC64 PC65 PC66 PC67 PC68 PC69 PC70 PC71 PC72 PC73 PC74 PC75 PC76 PC77 PC78 PC79 PC80 
## output(s): secondary_enrollment_rate 
## options were -
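
The reported weight count follows from the network shape. In an 80-10-1 network, each of the 10 hidden units has one weight per input plus a bias, and the output unit has one weight per hidden unit plus a bias. A quick check, using a helper named `count_weights` that is introduced here only for illustration:

```r
# Number of weights in a single-hidden-layer network with bias terms
# and no skip-layer connections.
count_weights <- function(n_in, n_hidden, n_out) {
  (n_in + 1) * n_hidden + (n_hidden + 1) * n_out
}

count_weights(80, 10, 1)  # 821, matching the "# weights:  821" line
```

The log ends at iteration 100, which is the default `maxit` in `nnet`, so the optimizer probably hit the iteration cap rather than converging. Raising `maxit` may lower the fitted error further.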

Compare and evaluate these prediction models

Performance (MSE) of different prediction models
Model                    MSE
Linear Regression        0.6347446
Random Forest            0.1781839
Support Vector Machine   0.2396441
Neural Network           0.39855
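
For reference, mean squared error is the average squared difference between observed and predicted values, so lower is better. The sketch below defines a small helper and ranks the four reported values. The helper name `mse` and the toy vectors are illustrative; the MSE figures are copied from the table above.

```r
# Mean squared error between observed and predicted values.
mse <- function(actual, predicted) mean((actual - predicted)^2)

# Toy check of the helper (hypothetical numbers).
mse(c(0, 1, 1), c(0.5, 1, 0))

# MSE values as reported in the comparison table.
reported <- c(
  "Linear Regression"      = 0.6347446,
  "Random Forest"          = 0.1781839,
  "Support Vector Machine" = 0.2396441,
  "Neural Network"         = 0.39855
)

sort(reported)              # lowest (best) MSE first
names(which.min(reported))  # "Random Forest"
```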