The Correlation between Poverty and Various Internet Infrastructure Indicators
Team 2, School of Professional Studies, CUNY
5/24/2019
Question 1: What economic indicators (race, occupation, community poverty rate) are most strongly correlated with internet access rates?
Question 2: Can we build a model that accurately predicts said rates?
Question 3: Are internet access rates a stronger predictor of poverty rates than other forms of social investment (i.e., roads, schools, hospitals)?
Question 4: Do these effects extend across internet technologies (cell phones and broadband internet)? If not, which type of infrastructure investment is better?
Motivation, Literature Review, Methodology, and Hypothesis
Correlation between Various Technology Indicators and Poverty Rates
Data Initialization, Plots, and Models (SVM, Neural Networks, and GLM)
Finance Analysis
Conclusion and Next Steps
| Model | RMSE.train | RSquared.train | RMSE.test | RSquared.test |
|---|---|---|---|---|
| Support Vector Machine | 2.8493 | 0.8711891 | 1.5309 | 0.5936714 |
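The RMSE and R-squared columns above can be reproduced from a model's predictions on each split. The following is a minimal sketch, not the project's actual code: the function names and the toy values are illustrative assumptions, and R-squared is taken here as 1 - SS_res/SS_tot. Some toolkits report the squared correlation between predictions and observations instead, so their figures may differ slightly from this definition.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1 - ss_res / ss_tot


# Toy values for illustration only (hypothetical, not from the study).
observed = [10, 12, 14, 16]
predicted = [11, 12, 13, 17]

print(f"RMSE: {rmse(observed, predicted):.4f}")
print(f"R-squared: {r_squared(observed, predicted):.4f}")
```

Calling both functions once on the training split and once on the test split yields the four metric columns for a given model.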
Variable Importance Plot for Social Indicators
We successfully evaluated the efficacy of several different models.
The non-parametric models did not perform as well as the parametric one, perhaps because we had reduced our dimensionality during pre-processing.
Additionally, they both suffer from a large gap between training and test performance.
The linear model performed very well, but its strongest indicators were race-based, which seems to indicate geography as a confounding factor.
The random forest method performed only slightly worse than the generalized linear model, but, due to the iterative nature of the algorithm, it had a smaller gap between training and test performance, indicating better generalizability.
Because the GLM and random forest have human-readable coefficients/factors and performed the best, we will continue using them moving forward.
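One way to make the train/test comparison in these conclusions concrete is to express it as the drop in R-squared from the training split to the test split. The sketch below assumes that reading; the helper name `generalization_gap` is hypothetical, and the only figures used are the Support Vector Machine values from the results table.

```python
def generalization_gap(train_score, test_score):
    """Return the absolute and relative drop from training to test score."""
    if train_score == 0:
        raise ValueError("training score must be non-zero for a relative drop")
    drop = train_score - test_score
    return drop, drop / train_score


# R-squared values from the Support Vector Machine row of the results table.
svm_abs, svm_rel = generalization_gap(0.8711891, 0.5936714)
print(f"SVM R-squared drop: {svm_abs:.4f} ({svm_rel:.1%} of the training value)")
```

Applying the same helper to each model's row would let the models be ranked by how much performance they lose on unseen data, which is the sense in which a smaller gap suggests better generalizability.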