Overview

This analysis explores the baseline data as a complement to the descriptive report, with:
- Non-parametric clustering analysis of households
- Machine-learning (ML) prediction models for key variables of interest: secondary school enrollment and IPV

Both parts are supported by visualizations throughout.

Clustering Analysis

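The non-parametric analysis below uses PAM (Partitioning Around Medoids) to identify similarities between households in the PSSN-II baseline dataset. PAM is used instead of the more common K-means clustering because many variables of interest, such as income, are expected to contain outliers and substantial variance. PAM represents each cluster by an actual observation (the medoid) rather than by a mean, which makes it less sensitive to extreme values.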
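To illustrate why a medoid is less sensitive to outliers than a mean, the sketch below compares the two on a short, hypothetical list of household incomes containing one extreme value. The numbers are invented for illustration and are not PSSN-II data; the helper names `mean` and `medoid_1d` are hypothetical.

```python
def mean(values):
    """Arithmetic mean, the cluster centre K-means relies on."""
    return sum(values) / len(values)


def medoid_1d(values):
    """Observation minimising the total absolute distance to all other observations."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


# Hypothetical incomes; the last value is a deliberate outlier.
incomes = [120, 135, 140, 150, 160, 170, 5000]

print(round(mean(incomes), 1))  # pulled far above every typical value
print(medoid_1d(incomes))       # 150, an actual, typical observation
```

A single extreme household drags the mean above all six typical incomes, while the medoid remains one of the observed values near the middle of the data.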

Silhouette Analysis for the Optimal Number of Clusters
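Silhouette analysis scores how well each observation fits its assigned cluster. For observation i, let a(i) be its mean distance to the other members of its own cluster and b(i) its smallest mean distance to the members of any other cluster; the silhouette width is s(i) = (b(i) - a(i)) / max(a(i), b(i)), which ranges from -1 to 1. A common rule is to compute the average width for each candidate number of clusters and keep the one with the largest average. The pure-Python sketch below assumes Euclidean distance and uses hypothetical toy points and labels; the function names are illustrative only.

```python
import math


def silhouette_scores(points, labels, dist=math.dist):
    """Return the silhouette width s(i) of every point."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        # a(i): mean distance to the other members of point i's own cluster.
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == own and j != i]
        if not same:
            scores.append(0.0)  # convention: a point alone in its cluster gets s(i) = 0
            continue
        a = sum(same) / len(same)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def average_silhouette(points, labels):
    """Mean silhouette width over all points."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

print(average_silhouette(points, good_labels))  # close to 1
print(average_silhouette(points, bad_labels))   # much lower
```

In an analysis like the one described here, the labels would come from running PAM with each candidate number of clusters, and the resulting averages would be compared.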

Clustering with PAM
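A minimal pure-Python sketch of PAM follows, assuming Euclidean distance on already-scaled numeric features; the original does not state which dissimilarity measure was used. It applies a greedy BUILD step (add the medoid that most reduces total cost) followed by a simplified first-improvement SWAP step (accept any medoid/non-medoid swap that lowers the cost). It is an illustration of the idea, not a replacement for an established implementation such as `pam()` in R's `cluster` package. The function names and toy points are hypothetical.

```python
import itertools
import math


def total_cost(dist, medoids):
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(points, k, dist_fn=math.dist):
    """Return (medoid indices, cluster label per point, total cost)."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    dist = [[dist_fn(p, q) for q in points] for p in points]

    # BUILD: greedily add the point that most reduces the total cost.
    medoids = []
    for _ in range(k):
        best = min(
            (c for c in range(n) if c not in medoids),
            key=lambda c: total_cost(dist, medoids + [c]),
        )
        medoids.append(best)

    # SWAP: replace a medoid with a non-medoid whenever that lowers the cost.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for mi, h in itertools.product(range(k), range(n)):
            if h in medoids:
                continue
            candidate = medoids[:mi] + [h] + medoids[mi + 1:]
            new_cost = total_cost(dist, candidate)
            if new_cost < cost:
                medoids, cost = candidate, new_cost
                improved = True

    # Assign every point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels, cost = pam(points, k=2)
print(medoids, labels, cost)
```

Because a swap is accepted only when it strictly lowers the total distance to the nearest medoid, the loop always terminates. Recomputing the full cost for every candidate swap is quadratic in the number of points, so this sketch suits small examples only.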

Prediction Models

This section explores predictive models for key variables of research interest, namely secondary school enrollment and IPV.