Posit Cloud: https://posit.cloud/spaces/449020/content/7089697
Has the overall wealth/poverty divide in the world changed for better or worse in the past ~30 years? How have certain factors affected the change?
Each case is a single poverty report for a country or region in a given year, recording the share and number of people living below several poverty lines. There are 4877 cases.
The data was obtained from a public Kaggle dataset: https://www.kaggle.com/datasets/eishkaran/world-poverty-data
This is an observational study.
The data was obtained from a public Kaggle dataset (https://www.kaggle.com/datasets/eishkaran/world-poverty-data), which was sourced from the World Bank: https://www.worldbank.org/en/news/factsheet/2022/05/02/fact-sheet-an-adjustment-to-global-poverty-lines#2
The response variables are headcount_ratio_international_povline, headcount_ratio_lower_mid_income_povline, and headcount_ratio_upper_mid_income_povline, which are the headcount ratios: the percentage of the population living below each poverty line.
Additionally, we looked at the totals, which are headcount_international_povline, headcount_lower_mid_income_povline, and headcount_upper_mid_income_povline, along with the mean income (the mean column).
The explanatory variables are country, year, reporting_level, survey_year, and total population (which we computed).
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
#Possibly needed libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(shiny)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.3 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
#Abstract & Introduction
For our final project, we decided to analyze historical data relating to global levels of poverty and income for the prior 30 years, which was collected by the World Bank in an effort by the United Nations to end poverty by 2030.
Our motivation for this analysis was to see how poverty measures have changed in individual countries, and whether factors such as mean income, total population, and calculation methods drive changes in the number and share of people below each poverty line.
This data, which ranges from 1967 to 2021, compares values using Purchasing Power Parity (PPP), a metric used to convert different currencies into a common, internationally comparable unit. It accounts for price differences, cost of living, and inflation across countries and regions. The poverty lines come in three levels: the international (base) poverty line, the lower-middle-income line, and the upper-middle-income line.
We used country, year, and PPP version as our independent variables, and the headcounts and ratios at each poverty level, along with mean income and total population, as our dependent variables. The data itself is rather tidy: it is segmented by country, year, reporting level, and PPP version, and includes the totals and ratios of the population living below each poverty line.
After our analysis, we found that at the world level poverty is shrinking and income is rising, but at different rates across countries and regions, and that this trend is consistent with the relationships we found between poverty, population, and mean income. This analysis matters because its findings can point to potential causes of poverty and to actionable insights for reducing it worldwide, at least on a broad scale.
#Data
#Load data and make into frame
poverty_raw <- read.csv(url("https://raw.githubusercontent.com/RonBalaban/CUNY-SPS-R/main/pip_dataset.csv"))
poverty_frame <- as.data.frame(poverty_raw)
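# Adding total population based on ratios.
# Note: headcount_ratio_* is a percentage (0-100), so this quotient is the
# population in hundreds of people, not a raw head count. Rows where both the
# headcount and the ratio are 0 evaluate to 0/0 = NaN.
poverty_frame <- poverty_frame %>%
  mutate(Total_Population = headcount_upper_mid_income_povline / headcount_ratio_upper_mid_income_povline)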
head(poverty_frame)
## country year reporting_level welfare_type ppp_version survey_year
## 1 Albania 1996 national consumption 2011 1996
## 2 Albania 2002 national consumption 2011 2002
## 3 Albania 2005 national consumption 2011 2005
## 4 Albania 2008 national consumption 2011 2008
## 5 Albania 2012 national consumption 2011 2012
## 6 Albania 2014 national consumption 2011 2014
## survey_comparability headcount_ratio_international_povline
## 1 0 0.9206690
## 2 1 1.5708434
## 3 1 0.8605271
## 4 1 0.3136496
## 5 1 0.8497544
## 6 2 1.5808972
## headcount_ratio_lower_mid_income_povline
## 1 11.174149
## 2 14.132118
## 3 8.715685
## 4 5.250542
## 5 6.182414
## 6 11.615621
## headcount_ratio_upper_mid_income_povline headcount_ratio_100
## 1 44.61842 0.05741581
## 2 49.66964 0.04637062
## 3 38.54525 0.02762022
## 4 31.11034 0.00000000
## 5 34.52891 0.11132100
## 6 37.03384 0.00000000
## headcount_ratio_1000 headcount_ratio_2000 headcount_ratio_3000
## 1 86.17521 99.63208 99.91247
## 2 85.33832 98.08006 99.63839
## 3 79.98713 97.57660 99.43446
## 4 75.85102 97.09978 99.01586
## 5 77.06984 97.77263 99.61830
## 6 71.58569 95.18214 99.12649
## headcount_ratio_4000 headcount_ratio_40_median headcount_ratio_50_median
## 1 99.95166 2.768821 7.548123
## 2 99.84308 3.212283 8.406373
## 3 99.75174 4.095002 8.991533
## 4 99.67825 2.118535 7.227714
## 5 99.88598 3.492413 7.753112
## 6 99.81976 7.514618 14.375828
## headcount_ratio_60_median headcount_international_povline
## 1 14.80251 29167
## 2 16.02055 47927
## 3 16.67577 25915
## 4 14.13715 9244
## 5 14.84282 24646
## 6 22.21602 45674
## headcount_lower_mid_income_povline headcount_upper_mid_income_povline
## 1 354001 1413526
## 2 431172 1515426
## 3 262472 1160785
## 4 154750 916920
## 5 179315 1001477
## 6 335587 1069946
## headcount_100 headcount_1000 headcount_2000 headcount_3000 headcount_4000
## 1 1819 2730059 3156377 3165260 3166501
## 2 1415 2603681 2992432 3039977 3046222
## 3 832 2408802 2938507 2994456 3004011
## 4 0 2235568 2861836 2918308 2937831
## 5 3229 2235334 2835798 2889330 2897094
## 6 0 2068185 2749911 2863867 2883897
## headcount_40_median headcount_50_median headcount_60_median
## 1 87717 239127 468949
## 2 98007 256479 488789
## 3 123320 270779 502189
## 4 62440 213023 416666
## 5 101294 224871 430501
## 6 217105 415333 641844
## avg_shortfall_international_povline avg_shortfall_lower_mid_income_povline
## 1 0.2890260 0.5999076
## 2 0.3166081 0.6466790
## 3 0.2899562 0.6467713
## 4 0.3104121 0.4577278
## 5 0.3781038 0.6586981
## 6 0.2546938 0.7016316
## avg_shortfall_upper_mid_income_povline avg_shortfall_100 avg_shortfall_1000
## 1 1.583098 0.20422681 4.459419
## 2 1.671541 0.53508932 4.719031
## 3 1.514365 0.00832818 4.228461
## 4 1.307345 NA 3.881222
## 5 1.377123 0.10741674 4.030469
## 6 1.742768 NA 4.431036
## avg_shortfall_2000 avg_shortfall_3000 avg_shortfall_4000
## 1 13.50430 23.45916 33.44882
## 2 13.70162 23.42240 33.36826
## 3 12.91856 22.61159 32.52479
## 4 12.36353 22.06975 31.89560
## 5 12.54608 22.24632 32.17559
## 6 12.47453 21.84794 31.67118
## avg_shortfall_40_median avg_shortfall_50_median avg_shortfall_60_median
## 1 0.3446150 0.4981147 0.6845159
## 2 0.3854980 0.5177960 0.6859527
## 3 0.4360586 0.6565995 0.8516646
## 4 0.4295745 0.5748611 0.8030727
## 5 0.5437887 0.7146906 0.8884043
## 6 0.5253690 0.7801093 1.0620848
## total_shortfall_international_povline
## 1 8430.020
## 2 15174.078
## 3 7514.215
## 4 2869.449
## 5 9318.746
## 6 11632.883
## total_shortfall_lower_mid_income_povline
## 1 212367.88
## 2 278829.87
## 3 169759.35
## 4 70833.38
## 5 118114.46
## 6 235458.46
## total_shortfall_upper_mid_income_povline total_shortfall_100
## 1 2237750 371.488565
## 2 2533096 757.151389
## 3 1757852 6.929046
## 4 1198731 0.000000
## 5 1379157 346.848655
## 6 1864668 0.000000
## total_shortfall_1000 total_shortfall_2000 total_shortfall_3000
## 1 12174477 42624656 74254340
## 2 12286852 41001158 71203557
## 3 10185526 37961293 67709400
## 4 8676735 35382386 64406320
## 5 9009444 35578150 64276958
## 6 9164201 34303852 62569586
## total_shortfall_4000 total_shortfall_40_median total_shortfall_50_median
## 1 105915713 30228.59 119112.7
## 2 101647115 37781.50 132803.8
## 3 97704835 53774.75 177793.4
## 4 93703870 26822.63 122458.6
## 5 93215717 55082.53 160713.2
## 6 91336415 114060.23 324005.1
## total_shortfall_60_median income_gap_ratio_international_povline
## 1 321003.0 15.21189
## 2 335286.1 16.66359
## 3 427696.6 15.26085
## 4 334613.1 16.33748
## 5 382458.9 19.90020
## 6 681692.8 13.40493
## income_gap_ratio_lower_mid_income_povline
## 1 18.74711
## 2 20.20872
## 3 20.21160
## 4 14.30399
## 5 20.58432
## 6 21.92599
## income_gap_ratio_upper_mid_income_povline income_gap_ratio_100
## 1 28.78360 20.422681
## 2 30.39165 53.508932
## 3 27.53390 0.832818
## 4 23.76991 NA
## 5 25.03859 10.741674
## 6 31.68669 NA
## income_gap_ratio_1000 income_gap_ratio_2000 income_gap_ratio_3000
## 1 44.59419 67.52149 78.19720
## 2 47.19031 68.50809 78.07467
## 3 42.28461 64.59282 75.37195
## 4 38.81222 61.81763 73.56582
## 5 40.30469 62.73040 74.15440
## 6 44.31036 62.37266 72.82646
## income_gap_ratio_4000 income_gap_ratio_40_median income_gap_ratio_50_median
## 1 83.62204 14.91890 17.25131
## 2 83.42064 17.39735 18.69432
## 3 81.31198 16.87440 20.32703
## 4 79.73899 15.43531 16.52455
## 5 80.43898 19.91816 20.94243
## 6 79.17794 19.11590 22.70784
## income_gap_ratio_60_median poverty_gap_index_international_povline
## 1 19.75582 0.14005071
## 2 20.63782 0.26176109
## 3 21.97156 0.13132549
## 4 19.23714 0.05124112
## 5 21.69394 0.16910086
## 6 25.76311 0.21191933
## poverty_gap_index_lower_mid_income_povline
## 1 2.0948318
## 2 2.8559177
## 3 1.7615814
## 4 0.7510374
## 5 1.2726091
## 6 2.5468370
## poverty_gap_index_upper_mid_income_povline poverty_gap_index_100
## 1 12.842785 0.0117261583
## 2 15.095426 0.0248164178
## 3 10.613009 0.0002300872
## 4 7.394906 0.0000000000
## 5 8.645555 0.0119586449
## 6 11.734796 0.0000000000
## poverty_gap_index_1000 poverty_gap_index_2000 poverty_gap_index_3000
## 1 38.42913 67.27306 78.12875
## 2 40.27143 67.19276 77.79233
## 3 33.82225 63.02749 74.94570
## 4 29.43947 60.02480 72.84183
## 5 31.06275 61.33316 73.87135
## 6 31.71987 59.36763 72.19030
## poverty_gap_index_4000 mean median decile1_avg decile2_avg decile3_avg
## 1 83.58161 6.570821 5.774805 2.538496 3.475535 4.191319
## 2 83.28973 6.715828 5.539607 2.346511 3.263519 3.923426
## 3 81.11013 7.591930 6.460357 2.643917 3.735312 4.537911
## 4 79.48243 8.314345 6.957659 3.104265 4.270897 5.062166
## 5 80.34727 7.882867 6.825289 2.884756 4.093648 4.842953
## 6 79.03524 8.399775 6.870837 2.396459 3.505870 4.387129
## decile4_avg decile5_avg decile6_avg decile7_avg decile8_avg decile9_avg
## 1 4.811182 5.506126 6.146939 7.111657 8.196356 9.790226
## 2 4.525262 5.139692 5.936429 6.869341 8.048536 10.023425
## 3 5.254513 6.064554 6.860879 7.874459 9.206284 11.261826
## 4 5.806393 6.578375 7.419832 8.562724 9.918245 12.085071
## 5 5.542227 6.372377 7.297124 8.289193 9.690645 11.736908
## 6 5.301601 6.310224 7.503401 8.898472 10.688325 13.549610
## decile10_avg decile1_share decile2_share decile3_share decile4_share
## 1 13.94037 3.863286 5.289347 6.378683 7.322042
## 2 17.08214 3.494002 4.859444 5.842059 6.738204
## 3 18.47964 3.482536 4.920109 5.977283 6.921183
## 4 20.33548 3.733625 5.136781 6.088472 6.983584
## 5 18.07883 3.659527 5.193095 6.143644 7.030726
## 6 21.45666 2.853004 4.173767 5.222913 6.311599
## decile5_share decile6_share decile7_share decile8_share decile9_share
## 1 8.379662 9.354903 10.82309 12.47387 14.89955
## 2 7.653102 8.839459 10.22859 11.98443 14.92508
## 3 7.988158 9.037069 10.37214 12.12641 14.83394
## 4 7.912079 8.924133 10.29873 11.92908 14.53520
## 5 8.083833 9.256943 10.51546 12.29330 14.88914
## 6 7.512373 8.932860 10.59370 12.72454 16.13092
## decile10_share decile1_thr decile2_thr decile3_thr decile4_thr decile6_thr
## 1 21.21557 3.06 3.88 4.48 5.16 6.66
## 2 25.43564 2.91 3.62 4.22 4.85 6.35
## 3 24.34117 3.30 4.18 4.93 5.63 7.32
## 4 24.45831 3.81 4.70 5.43 6.20 7.93
## 5 22.93434 3.65 4.49 5.17 5.93 7.74
## 6 25.54432 3.05 3.96 4.82 5.82 8.14
## decile7_thr decile8_thr decile9_thr gini mld polarization
## 1 7.61 8.85 10.92 0.2701034 0.1191043 0.2412933
## 2 7.38 8.83 11.58 0.3173898 0.1648116 0.2689816
## 3 8.51 10.02 12.78 0.3059566 0.1544128 0.2545287
## 4 9.24 10.74 13.62 0.2998467 0.1488934 0.2473111
## 5 8.91 10.52 13.26 0.2896048 0.1384171 0.2499879
## 6 9.71 11.74 15.78 0.3459890 0.1986616 0.3243097
## palma_ratio s80_s20_ratio p90_p10_ratio p90_p50_ratio p50_p10_ratio
## 1 0.9283351 3.945872 3.568627 1.889273 1.888889
## 2 1.2150564 4.831625 3.979381 2.090253 1.903780
## 3 1.1427183 4.662236 3.872727 1.978328 1.957576
## 4 1.1146566 4.395911 3.574803 1.956897 1.826772
## 5 1.0411926 4.272573 3.632877 1.941435 1.871233
## 6 1.3762154 5.930924 5.173770 2.296943 2.252459
## Total_Population
## 1 31680.33
## 2 30510.11
## 3 30114.86
## 4 29473.15
## 5 29004.02
## 6 28891.03
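The Total_Population values printed above are small for a country of Albania's size because the ratio columns are percentages. The following minimal sketch recomputes the first row by hand from the values in the head() output; the variable names are illustrative only and are not part of the original analysis.

```r
# First Albania row (1996), copied from the head() output above
headcount_upper <- 1413526    # headcount_upper_mid_income_povline
ratio_upper_pct <- 44.61842   # headcount_ratio_upper_mid_income_povline, in percent

# The same quotient used for Total_Population: population in hundreds of people
pop_hundreds <- headcount_upper / ratio_upper_pct

# Multiplying by 100 converts it to an approximate head count
pop_people <- 100 * pop_hundreds

# A zero headcount divided by a zero ratio is NaN, which summary() counts as NA
zero_case <- 0 / 0

round(pop_hundreds, 2)
round(pop_people)
is.nan(zero_case)
```

If a raw head count is preferred, the mutate() call could multiply the quotient by 100. That would rescale the Total_Population column and any regression coefficient on it by a constant factor, so the outputs shown in this report would need to be regenerated.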
#Simple summary statistics
summary(poverty_frame)
## country year reporting_level welfare_type
## Length:4877 Min. :1967 Length:4877 Length:4877
## Class :character 1st Qu.:2000 Class :character Class :character
## Mode :character Median :2007 Mode :character Mode :character
## Mean :2006
## 3rd Qu.:2013
## Max. :2021
##
## ppp_version survey_year survey_comparability
## Min. :2011 Min. :1967 Min. :0.000
## 1st Qu.:2011 1st Qu.:2000 1st Qu.:1.000
## Median :2011 Median :2007 Median :1.000
## Mean :2014 Mean :2006 Mean :1.639
## 3rd Qu.:2017 3rd Qu.:2014 3rd Qu.:2.000
## Max. :2017 Max. :2021 Max. :6.000
## NA's :466 NA's :466
## headcount_ratio_international_povline headcount_ratio_lower_mid_income_povline
## Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 0.2784 1st Qu.: 0.8079
## Median : 2.0414 Median : 9.1867
## Mean :11.0816 Mean : 21.6727
## 3rd Qu.:13.3910 3rd Qu.: 34.1667
## Max. :96.8714 Max. : 99.9990
##
## headcount_ratio_upper_mid_income_povline headcount_ratio_100
## Min. : 0.000 Min. : 0.00000
## 1st Qu.: 3.032 1st Qu.: 0.04477
## Median : 28.583 Median : 0.37176
## Mean : 36.539 Mean : 3.16472
## 3rd Qu.: 64.865 3rd Qu.: 2.15848
## Max. : 99.999 Max. :79.53262
##
## headcount_ratio_1000 headcount_ratio_2000 headcount_ratio_3000
## Min. : 0.00 Min. : 0.9202 Min. : 6.083
## 1st Qu.: 10.33 1st Qu.: 44.5759 1st Qu.: 71.017
## Median : 55.34 Median : 84.9163 Median : 93.503
## Mean : 50.35 Mean : 69.1060 Mean : 79.081
## 3rd Qu.: 85.13 3rd Qu.: 97.1794 3rd Qu.: 99.113
## Max. :100.00 Max. :100.0000 Max. :100.000
##
## headcount_ratio_4000 headcount_ratio_40_median headcount_ratio_50_median
## Min. : 14.10 Min. : 0.000 Min. : 0.8776
## 1st Qu.: 84.25 1st Qu.: 3.973 1st Qu.: 8.8660
## Median : 96.67 Median : 6.596 Median :12.6409
## Mean : 85.72 Mean : 8.015 Mean :13.4216
## 3rd Qu.: 99.61 3rd Qu.:11.559 3rd Qu.:17.9596
## Max. :100.00 Max. :36.163 Max. :38.9103
## NA's :466 NA's :466
## headcount_ratio_60_median headcount_international_povline
## Min. : 4.892 Min. :0.000e+00
## 1st Qu.:15.583 1st Qu.:2.121e+04
## Median :19.610 Median :3.615e+05
## Mean :20.071 Mean :4.456e+07
## 3rd Qu.:24.748 3rd Qu.:5.160e+06
## Max. :41.442 Max. :2.005e+09
## NA's :466
## headcount_lower_mid_income_povline headcount_upper_mid_income_povline
## Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:8.558e+04 1st Qu.:3.214e+05
## Median :1.027e+06 Median :2.863e+06
## Mean :8.632e+07 Mean :1.264e+08
## 3rd Qu.:1.099e+07 3rd Qu.:2.006e+07
## Max. :3.156e+09 Max. :4.244e+09
##
## headcount_100 headcount_1000 headcount_2000
## Min. : 0 Min. :0.000e+00 Min. :3.736e+03
## 1st Qu.: 2694 1st Qu.:8.784e+05 1st Qu.:2.499e+06
## Median : 63838 Median :4.951e+06 Median :8.313e+06
## Mean : 9676924 Mean :1.522e+08 Mean :1.787e+08
## 3rd Qu.: 895900 3rd Qu.:3.226e+07 3rd Qu.:5.391e+07
## Max. :569231470 Max. :4.833e+09 Max. :6.010e+09
##
## headcount_3000 headcount_4000 headcount_40_median
## Min. :9.913e+03 Min. :1.006e+04 Min. : 0
## 1st Qu.:3.069e+06 1st Qu.:3.674e+06 1st Qu.: 259948
## Median :1.051e+07 Median :1.216e+07 Median : 859900
## Mean :1.907e+08 Mean :1.987e+08 Mean : 4501881
## 3rd Qu.:6.368e+07 3rd Qu.:6.902e+07 3rd Qu.: 3269602
## Max. :6.544e+09 Max. :6.868e+09 Max. :176375997
## NA's :466
## headcount_50_median headcount_60_median avg_shortfall_international_povline
## Min. : 1195 Min. : 1847 Min. :0.00443
## 1st Qu.: 497663 1st Qu.: 787206 1st Qu.:0.45363
## Median : 1489600 Median : 2233892 Median :0.69507
## Mean : 8382880 Mean : 13254819 Mean :0.77819
## 3rd Qu.: 6392031 3rd Qu.: 10529900 3rd Qu.:1.00221
## Max. :268177618 Max. :358853136 Max. :2.15893
## NA's :466 NA's :466 NA's :291
## avg_shortfall_lower_mid_income_povline avg_shortfall_upper_mid_income_povline
## Min. :0.00523 Min. :0.05913
## 1st Qu.:0.92154 1st Qu.:1.96546
## Median :1.28584 Median :2.59216
## Mean :1.34847 Mean :2.67004
## 3rd Qu.:1.68015 3rd Qu.:3.30440
## Max. :3.64996 Max. :6.84993
## NA's :100 NA's :12
## avg_shortfall_100 avg_shortfall_1000 avg_shortfall_2000 avg_shortfall_3000
## Min. :0.0013 Min. : 1.290 Min. : 1.860 Min. : 4.758
## 1st Qu.:0.2183 1st Qu.: 3.507 1st Qu.: 7.188 1st Qu.:12.650
## Median :0.3542 Median : 4.541 Median :11.288 Median :19.992
## Mean :0.4186 Mean : 4.756 Mean :11.127 Mean :18.652
## 3rd Qu.:0.6007 3rd Qu.: 5.790 3rd Qu.:14.530 3rd Qu.:24.134
## Max. :1.0101 Max. :10.000 Max. :20.000 Max. :30.000
## NA's :750 NA's :6
## avg_shortfall_4000 avg_shortfall_40_median avg_shortfall_50_median
## Min. : 8.731 Min. :0.0419 Min. : 0.0498
## 1st Qu.:19.908 1st Qu.:0.4396 1st Qu.: 0.6074
## Median :29.330 Median :1.0867 Median : 1.4246
## Mean :26.875 Mean :1.8868 Mean : 2.3040
## 3rd Qu.:33.972 3rd Qu.:2.7650 3rd Qu.: 3.4442
## Max. :40.000 Max. :9.2963 Max. :11.3676
## NA's :468 NA's :466
## avg_shortfall_60_median total_shortfall_international_povline
## Min. : 0.0839 Min. :0.000e+00
## 1st Qu.: 0.8007 1st Qu.:1.521e+04
## Median : 1.8008 Median :2.314e+05
## Mean : 2.8122 Mean :2.948e+07
## 3rd Qu.: 4.1505 3rd Qu.:3.632e+06
## Max. :14.2995 Max. :1.583e+09
## NA's :466
## total_shortfall_lower_mid_income_povline
## Min. :0.000e+00
## 1st Qu.:9.872e+04
## Median :1.252e+06
## Mean :1.239e+08
## 3rd Qu.:1.598e+07
## Max. :5.480e+09
##
## total_shortfall_upper_mid_income_povline total_shortfall_100
## Min. :0.000e+00 Min. : 0
## 1st Qu.:6.856e+05 1st Qu.: 906
## Median :6.607e+06 Median : 22392
## Mean :4.265e+08 Mean : 2757883
## 3rd Qu.:6.025e+07 3rd Qu.: 324459
## Max. :1.730e+10 Max. :158089241
##
## total_shortfall_1000 total_shortfall_2000 total_shortfall_3000
## Min. :0.000e+00 Min. :9.589e+03 Min. :1.653e+05
## 1st Qu.:2.993e+06 1st Qu.:2.209e+07 1st Qu.:5.547e+07
## Median :2.332e+07 Median :9.120e+07 Median :1.914e+08
## Mean :9.689e+08 Mean :2.660e+09 Mean :4.529e+09
## 3rd Qu.:1.570e+08 3rd Qu.:6.277e+08 3rd Qu.:1.194e+09
## Max. :3.257e+10 Max. :8.363e+10 Max. :1.440e+11
##
## total_shortfall_4000 total_shortfall_40_median total_shortfall_50_median
## Min. :3.207e+05 Min. : 0 Min. : 758
## 1st Qu.:9.264e+07 1st Qu.: 213643 1st Qu.: 598908
## Median :3.330e+08 Median : 967355 Median : 2133861
## Mean :6.495e+09 Mean : 7901541 Mean : 16305022
## 3rd Qu.:1.846e+09 3rd Qu.: 3581223 3rd Qu.: 7914232
## Max. :2.112e+11 Max. :366229625 Max. :690458017
## NA's :466 NA's :466
## total_shortfall_60_median income_gap_ratio_international_povline
## Min. :1.654e+03 Min. : 0.2058
## 1st Qu.:1.220e+06 1st Qu.: 22.4091
## Median :4.266e+06 Median : 34.4059
## Mean :2.952e+07 Mean : 38.4236
## 3rd Qu.:1.553e+07 3rd Qu.: 49.8513
## Max. :1.156e+09 Max. :100.4155
## NA's :466 NA's :291
## income_gap_ratio_lower_mid_income_povline
## Min. : 0.1633
## 1st Qu.: 26.8856
## Median : 37.5422
## Mean : 39.3608
## 3rd Qu.: 49.0408
## Max. : 99.9990
## NA's :100
## income_gap_ratio_upper_mid_income_povline income_gap_ratio_100
## Min. : 1.075 Min. : 0.1339
## 1st Qu.: 32.666 1st Qu.: 21.8340
## Median : 42.025 Median : 35.4210
## Mean : 43.188 Mean : 41.8600
## 3rd Qu.: 53.368 3rd Qu.: 60.0720
## Max. : 99.999 Max. :101.0080
## NA's :12 NA's :750
## income_gap_ratio_1000 income_gap_ratio_2000 income_gap_ratio_3000
## Min. : 12.90 Min. : 9.30 Min. : 15.86
## 1st Qu.: 35.07 1st Qu.: 35.94 1st Qu.: 42.17
## Median : 45.41 Median : 56.44 Median : 66.64
## Mean : 47.56 Mean : 55.63 Mean : 62.17
## 3rd Qu.: 57.90 3rd Qu.: 72.65 3rd Qu.: 80.45
## Max. :100.00 Max. :100.00 Max. :100.00
## NA's :6
## income_gap_ratio_4000 income_gap_ratio_40_median income_gap_ratio_50_median
## Min. : 21.83 Min. : 6.377 Min. : 7.987
## 1st Qu.: 49.77 1st Qu.:19.779 1st Qu.:21.340
## Median : 73.32 Median :26.196 Median :26.726
## Mean : 67.19 Mean :27.359 Mean :27.901
## 3rd Qu.: 84.93 3rd Qu.:34.351 3rd Qu.:33.838
## Max. :100.00 Max. :74.986 Max. :67.338
## NA's :468 NA's :466
## income_gap_ratio_60_median poverty_gap_index_international_povline
## Min. :10.00 Min. : 0.0000
## 1st Qu.:23.17 1st Qu.: 0.1245
## Median :27.65 Median : 0.6422
## Mean :29.09 Mean : 4.0624
## 3rd Qu.:34.74 3rd Qu.: 4.0977
## Max. :68.85 Max. :64.0970
## NA's :466
## poverty_gap_index_lower_mid_income_povline
## Min. : 0.0000
## 1st Qu.: 0.3387
## Median : 2.5326
## Mean : 9.1830
## 3rd Qu.: 12.4506
## Max. : 99.9980
##
## poverty_gap_index_upper_mid_income_povline poverty_gap_index_100
## Min. : 0.000 Min. : 0.00000
## 1st Qu.: 1.055 1st Qu.: 0.01365
## Median : 9.779 Median : 0.14990
## Mean : 18.266 Mean : 1.13075
## 3rd Qu.: 29.319 3rd Qu.: 0.72813
## Max. : 99.998 Max. :41.98930
##
## poverty_gap_index_1000 poverty_gap_index_2000 poverty_gap_index_3000
## Min. : 0.000 Min. : 0.1239 Min. : 1.265
## 1st Qu.: 3.154 1st Qu.: 15.0396 1st Qu.: 30.222
## Median : 22.685 Median : 48.6546 Median : 62.493
## Mean : 28.252 Mean : 44.7445 Mean : 54.781
## 3rd Qu.: 46.917 3rd Qu.: 69.3674 3rd Qu.: 78.971
## Max. : 99.998 Max. : 99.9980 Max. : 99.998
##
## poverty_gap_index_4000 mean median decile1_avg
## Min. : 3.656 Min. : 0.743 Min. : 0.5282 Min. : 0.000
## 1st Qu.: 42.380 1st Qu.: 6.580 1st Qu.: 4.6592 1st Qu.: 1.391
## Median : 70.682 Median :12.846 Median : 9.1076 Median : 2.959
## Mean : 61.831 Mean :19.749 Mean :15.8058 Mean : 5.671
## 3rd Qu.: 84.150 3rd Qu.:26.632 3rd Qu.:21.6472 3rd Qu.: 7.463
## Max. : 99.998 Max. :85.875 Max. :69.4836 Max. :26.906
## NA's :482
## decile2_avg decile3_avg decile4_avg decile5_avg
## Min. : 0.1529 Min. : 0.2382 Min. : 0.3429 Min. : 0.4755
## 1st Qu.: 2.2762 1st Qu.: 2.9703 1st Qu.: 3.6998 1st Qu.: 4.5085
## Median : 4.7560 Median : 6.1963 Median : 7.5307 Median : 8.9686
## Mean : 8.9950 Mean :11.1980 Mean :13.2337 Mean :15.3432
## 3rd Qu.:12.4957 3rd Qu.:15.6662 3rd Qu.:18.4650 3rd Qu.:21.2789
## Max. :40.5614 Max. :49.1261 Max. :56.0456 Max. :64.6635
## NA's :482 NA's :482 NA's :482 NA's :482
## decile6_avg decile7_avg decile8_avg decile9_avg
## Min. : 0.5807 Min. : 0.7132 Min. : 0.8907 Min. : 1.196
## 1st Qu.: 5.3870 1st Qu.: 6.4734 1st Qu.: 7.9534 1st Qu.: 10.491
## Median :10.7406 Median :12.7560 Median : 15.7568 Median : 21.003
## Mean :17.7250 Mean :20.6274 Mean : 24.5551 Mean : 30.903
## 3rd Qu.:24.6651 3rd Qu.:28.7787 3rd Qu.: 34.2894 3rd Qu.: 43.572
## Max. :74.9081 Max. :88.7781 Max. :106.7388 Max. :139.220
## NA's :482 NA's :482 NA's :482 NA's :482
## decile10_avg decile1_share decile2_share decile3_share
## Min. : 1.995 Min. :0.000 Min. :0.760 Min. :1.587
## 1st Qu.: 20.327 1st Qu.:1.877 1st Qu.:3.339 1st Qu.:4.379
## Median : 42.430 Median :2.718 Median :4.253 Median :5.369
## Mean : 55.409 Mean :2.631 Mean :4.096 Mean :5.134
## 3rd Qu.: 79.494 3rd Qu.:3.394 3rd Qu.:4.971 3rd Qu.:6.018
## Max. :264.254 Max. :5.494 Max. :6.876 Max. :7.690
## NA's :482 NA's :482 NA's :482 NA's :482
## decile4_share decile5_share decile6_share decile7_share
## Min. :2.549 Min. :3.310 Min. : 4.204 Min. : 5.248
## 1st Qu.:5.414 1st Qu.:6.530 1st Qu.: 7.836 1st Qu.: 9.449
## Median :6.388 Median :7.450 Median : 8.628 Median :10.085
## Mean :6.121 Mean :7.165 Mean : 8.361 Mean : 9.844
## 3rd Qu.:6.961 3rd Qu.:7.963 3rd Qu.: 9.101 3rd Qu.:10.423
## Max. :8.420 Max. :9.142 Max. :10.044 Max. :11.636
## NA's :482 NA's :482 NA's :482 NA's :482
## decile8_share decile9_share decile10_share decile1_thr
## Min. : 6.501 Min. : 8.778 Min. :16.99 Min. : 0.01
## 1st Qu.:11.614 1st Qu.:14.672 1st Qu.:24.48 1st Qu.: 1.79
## Median :11.998 Median :15.287 Median :27.49 Median : 3.57
## Mean :11.891 Mean :15.322 Mean :29.43 Mean : 7.34
## 3rd Qu.:12.248 3rd Qu.:15.918 3rd Qu.:32.90 3rd Qu.: 9.85
## Max. :13.843 Max. :18.950 Max. :61.49 Max. :35.50
## NA's :482 NA's :482 NA's :482
## decile2_thr decile3_thr decile4_thr decile6_thr
## Min. : 0.010 Min. : 0.15 Min. : 0.39 Min. : 0.64
## 1st Qu.: 2.480 1st Qu.: 3.13 1st Qu.: 3.85 1st Qu.: 5.61
## Median : 5.010 Median : 6.30 Median : 7.62 Median :10.96
## Mean : 9.677 Mean :11.67 Mean :13.65 Mean :18.33
## 3rd Qu.:13.300 3rd Qu.:15.88 3rd Qu.:18.64 3rd Qu.:25.12
## Max. :45.400 Max. :52.45 Max. :60.15 Max. :81.50
##
## decile7_thr decile8_thr decile9_thr gini
## Min. : 0.79 Min. : 1.01 Min. : 1.46 Min. :0.1779
## 1st Qu.: 6.82 1st Qu.: 8.68 1st Qu.: 11.96 1st Qu.:0.3087
## Median :13.32 Median : 16.98 Median : 24.56 Median :0.3556
## Mean :21.56 Mean : 26.27 Mean : 35.11 Mean :0.3756
## 3rd Qu.:29.48 3rd Qu.: 35.55 3rd Qu.: 48.35 3rd Qu.:0.4277
## Max. :96.85 Max. :120.10 Max. :164.70 Max. :0.6576
## NA's :476
## mld polarization palma_ratio s80_s20_ratio
## Min. :0.0536 Min. :0.1466 Min. :0.5964 Min. : 2.430
## 1st Qu.:0.1631 1st Qu.:0.2524 1st Qu.:1.1541 1st Qu.: 4.721
## Median :0.2210 Median :0.3004 Median :1.4658 Median : 6.172
## Mean :0.2644 Mean :0.3274 Mean :1.8865 Mean : 8.220
## 3rd Qu.:0.3209 3rd Qu.:0.3802 3rd Qu.:2.1521 3rd Qu.: 9.043
## Max. :0.9370 Max. :0.8157 Max. :8.3436 Max. :72.682
## NA's :476 NA's :476 NA's :482 NA's :482
## p90_p10_ratio p90_p50_ratio p50_p10_ratio Total_Population
## Min. : 2.191 Min. : 1.475 Min. : 1.485 Min. : 101
## 1st Qu.: 3.938 1st Qu.: 2.018 1st Qu.: 1.942 1st Qu.: 51047
## Median : 5.150 Median : 2.290 Median : 2.208 Median : 148589
## Mean : 7.425 Mean : 2.532 Mean : 2.660 Mean : 2192931
## 3rd Qu.: 7.458 3rd Qu.: 2.781 3rd Qu.: 2.740 3rd Qu.: 816780
## Max. :2892.000 Max. :11.492 Max. :809.000 Max. :76833723
## NA's :12
#-------------------------------------------------------------------------------
# The variables we're most interested in
describe(poverty_frame$headcount_ratio_international_povline)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 4877 11.08 18.19 2.04 6.62 3 0 96.87 96.87 2.12 3.94 0.26
describe(poverty_frame$headcount_ratio_lower_mid_income_povline)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 4877 21.67 27.1 9.19 16.77 13.25 0 100 100 1.25 0.32 0.39
describe(poverty_frame$headcount_ratio_upper_mid_income_povline)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 4877 36.54 33.67 28.58 33.76 40.06 0 100 100 0.48 -1.2 0.48
# Further summary
summary(poverty_frame$headcount_ratio_international_povline)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2784 2.0414 11.0816 13.3910 96.8714
summary(poverty_frame$headcount_ratio_lower_mid_income_povline)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.8079 9.1867 21.6727 34.1667 99.9990
summary(poverty_frame$headcount_ratio_upper_mid_income_povline)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.032 28.583 36.539 64.865 99.999
summary(poverty_frame$headcount_international_povline)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 2.121e+04 3.615e+05 4.456e+07 5.160e+06 2.005e+09
summary(poverty_frame$headcount_lower_mid_income_povline)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 8.558e+04 1.027e+06 8.632e+07 1.099e+07 3.156e+09
summary(poverty_frame$headcount_upper_mid_income_povline)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 3.214e+05 2.863e+06 1.264e+08 2.006e+07 4.244e+09
summary(poverty_frame$mean)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.743 6.580 12.846 19.749 26.632 85.875
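The repeated summary() calls above can also be collapsed into one table. The sketch below is one possible way to do that in base R; it uses a small toy frame built from the first three Albania rows shown earlier so that it runs on its own, and the names toy, vars_of_interest, and stats_table are assumptions made for illustration.

```r
# Toy frame: first three Albania rows from the head() output
toy <- data.frame(
  headcount_ratio_international_povline    = c(0.9206690, 1.5708434, 0.8605271),
  headcount_ratio_lower_mid_income_povline = c(11.174149, 14.132118, 8.715685),
  headcount_ratio_upper_mid_income_povline = c(44.61842, 49.66964, 38.54525)
)

vars_of_interest <- c(
  "headcount_ratio_international_povline",
  "headcount_ratio_lower_mid_income_povline",
  "headcount_ratio_upper_mid_income_povline"
)

# One column of statistics per variable; NA values are ignored
stats_table <- sapply(toy[vars_of_interest], function(x) {
  c(
    n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE)
  )
})

# Transpose so each variable is a row, which is easier to read
round(t(stats_table), 2)
```

With the real data, replacing toy with poverty_frame and extending vars_of_interest with the headcount and mean columns should give the same kind of table for every variable of interest.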
#Let’s break the dataset into 3 categories
# Just countries
country_frame <- poverty_frame %>%
  filter(reporting_level == "national")
#-------------------------------------------------------------------------------
# Just regions
region_frame <- poverty_frame %>%
  filter(country != "High income countries", country != "World") %>%
  filter(reporting_level == "")
#-------------------------------------------------------------------------------
# The world
world_frame <- poverty_frame %>%
  filter(country == "World")
#Exploratory data analysis
#Poverty ratio changes
# Messy, hard to read. Limitation in readability
country_frame %>%
  filter(ppp_version == "2011") %>%
  ggplot(aes(x = year, y = headcount_international_povline)) +
  geom_line() +
  labs(title = "Poverty in Countries") +
  facet_wrap(~country, scales = "free_y")
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
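The geom_line() message appears once for each facet whose country has a single observation, since a line needs at least two points. One possible way to avoid it, sketched here on hypothetical toy data (the countries "A", "B", "C" and their values are made up), is to drop single-observation countries before plotting.

```r
library(dplyr)

# Hypothetical toy data: country "B" has only one survey year
toy_countries <- data.frame(
  country = c("A", "A", "A", "B", "C", "C"),
  year    = c(2000, 2005, 2010, 2003, 2001, 2011),
  headcount_international_povline = c(50, 40, 30, 12, 8, 5)
)

# Keep only countries with at least two observations
plottable <- toy_countries %>%
  group_by(country) %>%
  filter(n() > 1) %>%
  ungroup()

unique(plottable$country)
```

Piping the filtered frame into the same ggplot() call should remove the message. Adding geom_point() instead would keep the single-observation countries visible as isolated points.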
#-------------------------------------------------------------------------------
#Just the 6 country regions
region_frame %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = headcount_ratio_international_povline, color = "red")) +
  geom_line(aes(y = headcount_ratio_lower_mid_income_povline, color = "green")) +
  geom_line(aes(y = headcount_ratio_upper_mid_income_povline, color = "blue")) +
  facet_wrap(~ country + ppp_version, scales = "free_y") +
  labs(title = "Poverty Ratio in Regions", x = "Year", y = "Headcount Ratio", color = "Poverty Line") +
  scale_color_manual(labels = c("High-Mid", "Mid-Low", "Low"), values = c("blue", "green", "red"))
#-------------------------------------------------------------------------------
# The whole world
world_frame %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = headcount_ratio_international_povline, color = "red")) +
  geom_line(aes(y = headcount_ratio_lower_mid_income_povline, color = "green")) +
  geom_line(aes(y = headcount_ratio_upper_mid_income_povline, color = "blue")) +
  facet_wrap(~ppp_version, scales = "free_y") +
  labs(title = "Poverty Ratio in the World", x = "Year", y = "Headcount Ratio", color = "Poverty Line") +
  scale_color_manual(labels = c("High-Mid", "Mid-Low", "Low"), values = c("blue", "green", "red"))
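The plots above map color to constant strings inside aes() and then rely on scale_color_manual() matching labels to the alphabetical order of those strings, which works but is easy to break. A more common alternative, sketched below, reshapes the three ratio columns into long form so the legend comes from the data. The numbers in toy_world are made up, and the names ratio_cols, line_labels, and world_long are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up values, for illustration only
toy_world <- data.frame(
  year = c(2000, 2010, 2020),
  headcount_ratio_international_povline    = c(30, 15, 9),
  headcount_ratio_lower_mid_income_povline = c(55, 40, 25),
  headcount_ratio_upper_mid_income_povline = c(70, 60, 47)
)

# Name the columns explicitly: the real data has other headcount_ratio_* columns
ratio_cols <- c(
  "headcount_ratio_international_povline",
  "headcount_ratio_lower_mid_income_povline",
  "headcount_ratio_upper_mid_income_povline"
)

# Readable legend labels, keyed by column name
line_labels <- c(
  headcount_ratio_international_povline    = "International",
  headcount_ratio_lower_mid_income_povline = "Lower-middle income",
  headcount_ratio_upper_mid_income_povline = "Upper-middle income"
)

# One row per year and poverty line
world_long <- toy_world %>%
  pivot_longer(
    cols = all_of(ratio_cols),
    names_to = "poverty_line",
    values_to = "headcount_ratio"
  ) %>%
  mutate(poverty_line = unname(line_labels[poverty_line]))

# Color is now driven by the poverty_line column
p <- ggplot(world_long, aes(x = year, y = headcount_ratio, color = poverty_line)) +
  geom_line() +
  labs(title = "Poverty Ratio (toy data)", x = "Year", y = "Headcount Ratio", color = "Poverty Line")
```

Applied to world_frame or region_frame, the same reshaping would let facet_wrap() be added unchanged, since the facet variables are carried through pivot_longer().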
#Total headcounts of those in poverty
#-------------------------------------------------------------------------------
# Just the 6 country regions
region_frame %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = headcount_international_povline, color = "red")) +
  geom_line(aes(y = headcount_lower_mid_income_povline, color = "green")) +
  geom_line(aes(y = headcount_upper_mid_income_povline, color = "blue")) +
  facet_wrap(~ country + ppp_version, scales = "free_y") +
  labs(title = "Poverty in Regions", x = "Year", y = "Headcount", color = "Poverty Line") +
  scale_color_manual(labels = c("High-Mid", "Mid-Low", "Low"), values = c("blue", "green", "red"))
#-------------------------------------------------------------------------------
# The whole world
world_frame %>%
ggplot (aes(x = year)) +
geom_line(aes(y = headcount_international_povline, color = "red")) +
geom_line(aes(y = headcount_lower_mid_income_povline, color = "green")) +
geom_line(aes(y = headcount_upper_mid_income_povline, color = "blue")) +
facet_wrap(~ppp_version) +
labs(title = "Poverty in the World", x = "Year", y = "Headcount", color = 'Poverty Line') +
scale_color_manual(labels = c("High-Mid", "Mid-Low","Low"), values = c("blue", "green","red"))
#Let’s break up the region frame into each of the 6 regions
reg_EastAsia_Pacific <- region_frame %>%
filter(country == "East Asia and Pacific")
reg_Europe_CentralAsia <- region_frame %>%
filter(country == "Europe and Central Asia")
reg_LatinAmerica_Caribbean <- region_frame %>%
filter(country == "Latin America and the Caribbean")
reg_MidEast_NorthAfrica <- region_frame %>%
filter(country == "Middle East and North Africa")
reg_South_Asia <- region_frame %>%
filter(country == "South Asia")
reg_Subsaharan_Africa <- region_frame %>%
filter(country == "Sub-Saharan Africa")
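The six filters above could also be written as a single split. This is a minimal sketch on a made-up stand-in for region_frame (the toy_regions values are hypothetical, not taken from the dataset), using base R's split() to produce one data frame per region.

```r
# Toy stand-in for region_frame (hypothetical values, not the real data)
toy_regions <- data.frame(
  country = rep(c("South Asia", "Sub-Saharan Africa"), each = 3),
  year = rep(c(1990, 2000, 2010), times = 2),
  headcount_international_povline = c(5e8, 4e8, 3e8, 2.8e8, 3.5e8, 4e8)
)

# split() returns a named list with one data frame per region
region_list <- split(toy_regions, toy_regions$country)

# Each region is then available by name
names(region_list)
nrow(region_list[["South Asia"]])
```

A named list keeps the regions together, which makes it easier to apply the same model to each of them later.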
#Inference
#Linear regression to see how base poverty is affected by total population
povlm_EastAsia_Pacific <- lm(headcount_international_povline ~ Total_Population, data = reg_EastAsia_Pacific)
summary(povlm_EastAsia_Pacific)
##
## Call:
## lm(formula = headcount_international_povline ~ Total_Population,
## data = reg_EastAsia_Pacific)
##
## Residuals:
## Min 1Q Median 3Q Max
## -108905333 -43326922 -517867 34169463 134318545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.743e+09 9.524e+07 49.80 <2e-16 ***
## Total_Population -2.278e+02 5.061e+00 -45.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56770000 on 58 degrees of freedom
## Multiple R-squared: 0.9722, Adjusted R-squared: 0.9717
## F-statistic: 2027 on 1 and 58 DF, p-value: < 2.2e-16
# R-Squared = 0.9722
cor(reg_EastAsia_Pacific$headcount_international_povline, reg_EastAsia_Pacific$Total_Population)
## [1] -0.9859914
# Correlation = -0.9859914. squared = 0.972179
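The match between R-squared and the squared correlation noted above holds for any simple linear regression with one predictor. A small sketch on simulated data (the x and y values are invented for illustration):

```r
set.seed(1)

# Simulated predictor and response with a negative linear relationship
x <- 1:20
y <- 100 - 3 * x + rnorm(20)

fit <- lm(y ~ x)

# R-squared from the model summary and the Pearson correlation
r_squared <- summary(fit)$r.squared
r <- cor(x, y)

# For one predictor, R-squared equals the squared correlation
all.equal(r_squared, r^2)
```

The sign of the correlation carries the direction of the relationship, which R-squared alone does not show.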
#-------------------------------------------------------------------------------
povlm_Europe_CentralAsia <- lm(headcount_international_povline ~ Total_Population, data = reg_Europe_CentralAsia)
summary(povlm_Europe_CentralAsia)
##
## Call:
## lm(formula = headcount_international_povline ~ Total_Population,
## data = reg_Europe_CentralAsia)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20387209 -4691624 -126251 5297826 20507374
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.945e+08 6.092e+07 6.476 2.21e-08 ***
## Total_Population -7.847e+01 1.284e+01 -6.111 8.93e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8586000 on 58 degrees of freedom
## Multiple R-squared: 0.3917, Adjusted R-squared: 0.3812
## F-statistic: 37.35 on 1 and 58 DF, p-value: 8.928e-08
# R-Squared = 0.3917
cor(reg_Europe_CentralAsia$headcount_international_povline, reg_Europe_CentralAsia$Total_Population)
## [1] -0.6258561
# Correlation = -0.6258561. squared = 0.3916959
#-------------------------------------------------------------------------------
povlm_LatinAmerica_Caribbean<- lm(headcount_international_povline ~ Total_Population, data = reg_LatinAmerica_Caribbean)
summary(povlm_LatinAmerica_Caribbean)
##
## Call:
## lm(formula = headcount_international_povline ~ Total_Population,
## data = reg_LatinAmerica_Caribbean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13925885 -5325697 -1378315 6264715 13068583
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.058e+08 8.319e+06 24.73 <2e-16 ***
## Total_Population -2.835e+01 1.516e+00 -18.70 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7166000 on 58 degrees of freedom
## Multiple R-squared: 0.8577, Adjusted R-squared: 0.8552
## F-statistic: 349.6 on 1 and 58 DF, p-value: < 2.2e-16
# R-Squared = 0.8577
cor(reg_LatinAmerica_Caribbean$headcount_international_povline, reg_LatinAmerica_Caribbean$Total_Population)
## [1] -0.9261177
# Correlation = -0.9261177. squared = 0.857694
#-------------------------------------------------------------------------------
povlm_MidEast_NorthAfrica <- lm(headcount_international_povline ~ Total_Population, data = reg_MidEast_NorthAfrica)
summary(povlm_MidEast_NorthAfrica)
##
## Call:
## lm(formula = headcount_international_povline ~ Total_Population,
## data = reg_MidEast_NorthAfrica)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6821008 -4070487 -1599699 3359836 15641536
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.395e+06 4.793e+06 1.960 0.0551 .
## Total_Population 1.049e+00 1.552e+00 0.676 0.5021
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5505000 on 54 degrees of freedom
## Multiple R-squared: 0.008386, Adjusted R-squared: -0.009977
## F-statistic: 0.4567 on 1 and 54 DF, p-value: 0.5021
# R-Squared = 0.008386
cor(reg_MidEast_NorthAfrica$headcount_international_povline, reg_MidEast_NorthAfrica$Total_Population)
## [1] 0.09157423
# correlation = 0.09157423. squared = 0.008385839
#-------------------------------------------------------------------------------
povlm_South_Asia <- lm(headcount_international_povline ~ Total_Population, data = reg_South_Asia)
summary(povlm_South_Asia)
##
## Call:
## lm(formula = headcount_international_povline ~ Total_Population,
## data = reg_South_Asia)
##
## Residuals:
## Min 1Q Median 3Q Max
## -121699098 -45663910 -22716894 82054096 100896183
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.232e+09 6.905e+07 17.85 < 2e-16 ***
## Total_Population -5.207e+01 4.484e+00 -11.61 1.52e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70390000 on 48 degrees of freedom
## Multiple R-squared: 0.7375, Adjusted R-squared: 0.732
## F-statistic: 134.9 on 1 and 48 DF, p-value: 1.519e-15
# R-Squared = 0.7375
cor(reg_South_Asia$headcount_international_povline, reg_South_Asia$Total_Population)
## [1] -0.8587743
# Correlation = -0.8587743. squared = 0.7374933
#-------------------------------------------------------------------------------
povlm_Subsaharan_Africa <- lm(headcount_international_povline ~ Total_Population, data = reg_Subsaharan_Africa)
summary(povlm_Subsaharan_Africa)
##
## Call:
## lm(formula = headcount_international_povline ~ Total_Population,
## data = reg_Subsaharan_Africa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -58941860 -17735788 6548817 14893987 35815276
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.561e+08 1.256e+07 20.383 < 2e-16 ***
## Total_Population 1.459e+01 1.588e+00 9.189 6.49e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21850000 on 58 degrees of freedom
## Multiple R-squared: 0.5928, Adjusted R-squared: 0.5858
## F-statistic: 84.43 on 1 and 58 DF, p-value: 6.486e-13
# R-Squared = 0.5928
cor(reg_Subsaharan_Africa$headcount_international_povline, reg_Subsaharan_Africa$Total_Population)
## [1] 0.769925
# Correlation = 0.769925. squared = 0.5927845
#-------------------------------------------------------------------------------
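# R-squared is the proportion of variance in headcount_international_povline explained by Total_Population. Across the six regions it ranges from about 0.8% (Middle East and North Africa) to about 97% (East Asia and Pacific).
# summary(povlm_EastAsia_Pacific)$r.squared returns the R-squared value directly.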
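The six regressions above repeat the same steps, so the slope and R-squared could be collected in one table. This is a sketch under assumptions: toy_regions and fit_one() are hypothetical names, and the data are simulated rather than taken from region_frame.

```r
set.seed(42)

# Simulated stand-in for region_frame with two made-up regions
toy_regions <- data.frame(
  country = rep(c("Region A", "Region B"), each = 10),
  Total_Population = rep(seq(100, 190, by = 10), times = 2)
)
toy_regions$headcount_international_povline <- ifelse(
  toy_regions$country == "Region A",
  500 - 2 * toy_regions$Total_Population,  # falling headcount
  50 + 1 * toy_regions$Total_Population    # rising headcount
) + rnorm(20, sd = 5)

# Hypothetical helper: fit one region and return its slope and R-squared
fit_one <- function(df) {
  fit <- lm(headcount_international_povline ~ Total_Population, data = df)
  data.frame(
    country = df$country[1],
    slope = unname(coef(fit)["Total_Population"]),
    r_squared = summary(fit)$r.squared
  )
}

# Fit every region and stack the results into one data frame
region_summary <- do.call(
  rbind,
  lapply(split(toy_regions, toy_regions$country), fit_one)
)
print(region_summary)
```

With the real data, replacing toy_regions by region_frame should give one row per region, assuming the column names match those used above.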
#Mapping the slopes of the total population vs poverty
reg_EastAsia_Pacific %>%
ggplot(aes(x = Total_Population, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Poverty (East Asia & Pacific)", x = "Total Population", y = "Poverty") +
annotate("text",x=2e7,y=9e8,label=(paste0("Slope==",coef(lm(reg_EastAsia_Pacific$headcount_international_povline~reg_EastAsia_Pacific$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted headcount_international_povline decreases by about 227.8.
# The intercept (4.743 x 10^9) is the fitted headcount at Total_Population = 0. That is an extrapolation far outside the observed data, so it has no practical meaning and is not the 1990 value.
# Fitted line: y = -2.278e2 * x + 4.743e9
#-------------------------------------------------------------------------------
reg_Europe_CentralAsia %>%
ggplot(aes(x = Total_Population, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Poverty (Europe and Central Asia)", x = "Total Population", y = "Poverty") +
annotate("text",x=4.9e6,y=5e7,label=(paste0("Slope==",coef(lm(reg_Europe_CentralAsia$headcount_international_povline~reg_Europe_CentralAsia$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted headcount_international_povline decreases by about 78.47.
# The intercept (3.945 x 10^8) is the fitted headcount at Total_Population = 0. That is an extrapolation far outside the observed data, so it has no practical meaning and is not the 1990 value.
# Fitted line: y = -7.847e+01 * x + 3.945e+08
#-------------------------------------------------------------------------------
reg_LatinAmerica_Caribbean %>%
ggplot(aes(x = Total_Population, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Poverty (Latin America and the Caribbean)", x = "Total Population", y = "Poverty") +
annotate("text",x=4.9e6,y=5e7,label=(paste0("Slope==",coef(lm(reg_LatinAmerica_Caribbean$headcount_international_povline~reg_LatinAmerica_Caribbean$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted headcount_international_povline decreases by about 28.35.
# The intercept (2.058 x 10^8) is the fitted headcount at Total_Population = 0. That is an extrapolation far outside the observed data, so it has no practical meaning and is not the 1990 value.
# Fitted line: y = -2.835e+01 * x + 2.058e+08
#-------------------------------------------------------------------------------
reg_MidEast_NorthAfrica %>%
ggplot(aes(x = Total_Population, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Poverty (Middle East and North Africa)", x = "Total Population", y = "Poverty") +
annotate("text",x=3e6,y=2.5e7,label=(paste0("Slope==",coef(lm(reg_MidEast_NorthAfrica$headcount_international_povline~reg_MidEast_NorthAfrica$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted headcount_international_povline increases by about 1.049, but this slope is not statistically significant (p = 0.5021).
# The intercept (9.395 x 10^6) is the fitted headcount at Total_Population = 0. That is an extrapolation far outside the observed data, so it has no practical meaning and is not the 1990 value.
# Fitted line: y = 1.049e0 * x + 9.395e+06
#-------------------------------------------------------------------------------
reg_South_Asia %>%
ggplot(aes(x = Total_Population, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Poverty (South Asia)", x = "Total Population", y = "Poverty") +
annotate("text",x=1.7e7,y=6e8,label=(paste0("Slope==",coef(lm(reg_South_Asia$headcount_international_povline~reg_South_Asia$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted headcount_international_povline decreases by about 52.07.
# The intercept (1.232 x 10^9) is the fitted headcount at Total_Population = 0. That is an extrapolation far outside the observed data, so it has no practical meaning and is not the 1990 value.
# Fitted line: y = -5.207e+01 * x + 1.232e+09
#-------------------------------------------------------------------------------
reg_Subsaharan_Africa %>%
ggplot(aes(x = Total_Population, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Poverty (Sub-Saharan Africa)", x = "Total Population", y = "Poverty") +
annotate("text",x=8e6,y=3e8,label=(paste0("Slope==",coef(lm(reg_Subsaharan_Africa$headcount_international_povline~reg_Subsaharan_Africa$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted headcount_international_povline increases by about 14.59.
# The intercept (2.561 x 10^8) is the fitted headcount at Total_Population = 0. That is an extrapolation far outside the observed data, so it has no practical meaning and is not the 1990 value.
# Fitted line: y = 1.459e+01 * x + 2.561e+08
#-------------------------------------------------------------------------------
# Now for all 6 regions wrapped
region_frame %>%
ggplot(aes(x = Total_Population, y = headcount_international_povline)) +
#geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~country, scales = "free") +
labs(title = "Total Population vs Poverty", x = "Total Population", y = "Poverty")
## `geom_smooth()` using formula = 'y ~ x'
#Mean Income in countries
#-------------------------------------------------------------------------------
# Just the 6 country regions
region_frame %>%
ggplot(aes(x = year, y = mean)) +
geom_line(color = "darkred") + # set the colour outside aes() so the line is actually drawn in dark red
facet_wrap(~country+ppp_version, scales = "free_y") +
labs(title = "Mean Income in Regions", x = "Year", y = "Mean Income") +
theme(legend.position = "none")
#-------------------------------------------------------------------------------
# The whole world
world_frame %>%
ggplot(aes(x = year, y = mean)) +
geom_line(color = "darkred") + # set the colour outside aes() so the line is actually drawn in dark red
facet_wrap(~ppp_version, scales = "free_y") +
labs(title = "Mean Income in the World", x = "Year", y = "Mean Income") +
theme(legend.position = "none")
#Linear regression to see how mean income is affected by total population
meanlm_EastAsia_Pacific <- lm(mean ~ Total_Population, data = reg_EastAsia_Pacific)
summary(meanlm_EastAsia_Pacific)
##
## Call:
## lm(formula = mean ~ Total_Population, data = reg_EastAsia_Pacific)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4087 -1.0683 -0.1965 0.8965 2.3379
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.303e+01 1.886e+00 -17.51 <2e-16 ***
## Total_Population 2.079e-06 1.002e-07 20.75 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.124 on 58 degrees of freedom
## Multiple R-squared: 0.8812, Adjusted R-squared: 0.8792
## F-statistic: 430.4 on 1 and 58 DF, p-value: < 2.2e-16
# R-Squared = 0.8812
cor(reg_EastAsia_Pacific$mean, reg_EastAsia_Pacific$Total_Population)
## [1] 0.9387421
# Correlation = 0.9387421, squared = 0.8812367
#-------------------------------------------------------------------------------
meanlm_Europe_CentralAsia <- lm(mean ~ Total_Population, data = reg_Europe_CentralAsia)
summary(meanlm_Europe_CentralAsia)
##
## Call:
## lm(formula = mean ~ Total_Population, data = reg_Europe_CentralAsia)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9666 -1.8633 -0.0714 1.6492 4.7491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.341e+02 1.541e+01 -8.702 4.13e-12 ***
## Total_Population 3.113e-05 3.248e-06 9.585 1.46e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.172 on 58 degrees of freedom
## Multiple R-squared: 0.613, Adjusted R-squared: 0.6063
## F-statistic: 91.86 on 1 and 58 DF, p-value: 1.46e-13
# R-Squared = 0.613
cor(reg_Europe_CentralAsia$mean, reg_Europe_CentralAsia$Total_Population)
## [1] 0.7829322
# Correlation = 0.7829322, squared = 0.6129829
#-------------------------------------------------------------------------------
meanlm_LatinAmerica_Caribbean<- lm(mean ~ Total_Population, data = reg_LatinAmerica_Caribbean)
summary(meanlm_LatinAmerica_Caribbean)
##
## Call:
## lm(formula = mean ~ Total_Population, data = reg_LatinAmerica_Caribbean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0006 -0.6437 -0.1413 0.6927 1.7736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.778e+00 1.120e+00 -7.836 1.15e-10 ***
## Total_Population 4.063e-06 2.042e-07 19.900 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.965 on 58 degrees of freedom
## Multiple R-squared: 0.8722, Adjusted R-squared: 0.87
## F-statistic: 396 on 1 and 58 DF, p-value: < 2.2e-16
# R-Squared = 0.8722
cor(reg_LatinAmerica_Caribbean$mean, reg_LatinAmerica_Caribbean$Total_Population)
## [1] 0.9339402
# Correlation = 0.9339402, squared = 0.8722443
#-------------------------------------------------------------------------------
meanlm_MidEast_NorthAfrica <- lm(mean ~ Total_Population, data = reg_MidEast_NorthAfrica)
summary(meanlm_MidEast_NorthAfrica)
##
## Call:
## lm(formula = mean ~ Total_Population, data = reg_MidEast_NorthAfrica)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.68709 -0.54125 -0.03051 0.46787 1.44601
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.396e+00 6.853e-01 2.037 0.0465 *
## Total_Population 2.423e-06 2.219e-07 10.919 2.77e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7871 on 54 degrees of freedom
## Multiple R-squared: 0.6883, Adjusted R-squared: 0.6825
## F-statistic: 119.2 on 1 and 54 DF, p-value: 2.775e-15
# R-Squared = 0.6883
cor(reg_MidEast_NorthAfrica$mean, reg_MidEast_NorthAfrica$Total_Population)
## [1] 0.8296239
# Correlation = 0.8296239, squared = 0.6882758
#-------------------------------------------------------------------------------
meanlm_South_Asia <- lm(mean ~ Total_Population, data = reg_South_Asia)
summary(meanlm_South_Asia)
##
## Call:
## lm(formula = mean ~ Total_Population, data = reg_South_Asia)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.48381 -0.23729 -0.04637 0.24660 0.86907
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.112e+00 3.106e-01 -3.581 0.000797 ***
## Total_Population 2.994e-07 2.017e-08 14.842 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3166 on 48 degrees of freedom
## Multiple R-squared: 0.8211, Adjusted R-squared: 0.8174
## F-statistic: 220.3 on 1 and 48 DF, p-value: < 2.2e-16
# R-Squared = 0.8211
cor(reg_South_Asia$mean, reg_South_Asia$Total_Population)
## [1] 0.9061418
# Correlation = 0.9061418, squared = 0.8210929
#-------------------------------------------------------------------------------
meanlm_Subsaharan_Africa <- lm(mean ~ Total_Population, data = reg_Subsaharan_Africa)
summary(meanlm_Subsaharan_Africa)
##
## Call:
## lm(formula = mean ~ Total_Population, data = reg_Subsaharan_Africa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50012 -0.27080 -0.01294 0.29466 0.75187
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.807e+00 1.827e-01 9.890 4.68e-14 ***
## Total_Population 2.035e-07 2.309e-08 8.814 2.69e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3177 on 58 degrees of freedom
## Multiple R-squared: 0.5725, Adjusted R-squared: 0.5652
## F-statistic: 77.69 on 1 and 58 DF, p-value: 2.691e-12
# R-Squared = 0.5725
cor(reg_Subsaharan_Africa$mean, reg_Subsaharan_Africa$Total_Population)
## [1] 0.7566695
# Correlation = 0.7566695, squared = 0.5725488
#Mapping the slopes of the total population vs mean income
reg_EastAsia_Pacific %>%
ggplot(aes(x = Total_Population, y = mean)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Mean Income (East Asia & Pacific)", x = "Total Population", y = "Mean Income") +
annotate("text",x=1.7e7,y=10,label=(paste0("Slope==",coef(lm(reg_EastAsia_Pacific$mean~reg_EastAsia_Pacific$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted mean income increases by 2.079e-06.
# Fitted line: y = 2.079e-06 * x - 3.303e+01
#-------------------------------------------------------------------------------
reg_Europe_CentralAsia %>%
ggplot(aes(x = Total_Population, y = mean)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Mean Income (Europe and Central Asia)", x = "Total Population", y = "Mean Income") +
annotate("text",x=4.7e6,y=20,label=(paste0("Slope==",coef(lm(reg_Europe_CentralAsia$mean~reg_Europe_CentralAsia$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted mean income increases by 3.113e-05.
# Fitted line: y = 3.113e-05 * x - 1.341e+02
#-------------------------------------------------------------------------------
reg_LatinAmerica_Caribbean %>%
ggplot(aes(x = Total_Population, y = mean)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Mean Income (Latin America and the Caribbean)", x = "Total Population", y = "Mean Income") +
annotate("text",x=4.9e6,y=20,label=(paste0("Slope==",coef(lm(reg_LatinAmerica_Caribbean$mean~reg_LatinAmerica_Caribbean$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted mean income increases by 4.063e-06.
# Fitted line: y = 4.063e-06 * x - 8.778e+00
#-------------------------------------------------------------------------------
reg_MidEast_NorthAfrica %>%
ggplot(aes(x = Total_Population, y = mean)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Mean Income (Middle East and North Africa)", x = "Total Population", y = "Mean Income") +
annotate("text",x=3e6,y=12,label=(paste0("Slope==",coef(lm(reg_MidEast_NorthAfrica$mean~reg_MidEast_NorthAfrica$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted mean income increases by 2.423e-06.
# Fitted line: y = 2.423e-06 * x + 1.396e+00
#-------------------------------------------------------------------------------
reg_South_Asia %>%
ggplot(aes(x = Total_Population, y = mean)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Mean Income (South Asia)", x = "Total Population", y = "Mean Income") +
annotate("text",x=1.3e7,y=5,label=(paste0("Slope==",coef(lm(reg_South_Asia$mean~reg_South_Asia$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted mean income increases by 2.994e-07.
# Fitted line: y = 2.994e-07 * x - 1.112e+00
#-------------------------------------------------------------------------------
reg_Subsaharan_Africa %>%
ggplot(aes(x = Total_Population, y = mean)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Total Population vs Mean Income (Sub-Saharan Africa)", x = "Total Population", y = "Mean Income") +
annotate("text",x=6e6,y=4,label=(paste0("Slope==",coef(lm(reg_Subsaharan_Africa$mean~reg_Subsaharan_Africa$Total_Population))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in Total_Population, the fitted mean income increases by 2.035e-07.
# Fitted line: y = 2.035e-07 * x + 1.807e+00
#-------------------------------------------------------------------------------
# Now for all 6 regions wrapped
region_frame %>%
ggplot(aes(x = Total_Population, y = mean)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~country, scales = "free") +
labs(title = "Total Population vs Mean Income", x = "Total Population", y = "Mean Income")
## `geom_smooth()` using formula = 'y ~ x'
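# Overall, mean income rises with total population in all six regions: every slope is positive and statistically significant, with R-squared between 0.57 and 0.88.
# The slopes only look small because Total_Population values are in the millions while mean income is in single or double digits.
# Both variables also grow over time, so this association should not be read as a causal effect of population on income.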
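The size of a slope depends on the units of the predictor, so a slope such as 2e-06 is not small in itself. The sketch below uses simulated data (all values are invented) to show that rescaling the predictor to millions changes the slope but leaves the fit unchanged.

```r
set.seed(7)

# Simulated population (raw counts) and mean income
population <- seq(1e6, 2e6, length.out = 30)
mean_income <- 2 + 3e-06 * population + rnorm(30, sd = 0.2)

# Same model fitted on raw counts and on millions
fit_raw <- lm(mean_income ~ population)
population_millions <- population / 1e6
fit_millions <- lm(mean_income ~ population_millions)

slope_raw <- unname(coef(fit_raw)[2])
slope_millions <- unname(coef(fit_millions)[2])

# The slope scales by 1e6; R-squared is identical in both fits
all.equal(slope_millions, slope_raw * 1e6)
all.equal(summary(fit_raw)$r.squared, summary(fit_millions)$r.squared)
```

R-squared and the p-value of the slope are better guides to the strength of the relationship than the raw size of the coefficient.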
#Linear regression to see how mean income and poverty affect each other
pov_mean_lm_EastAsia_Pacific <- lm(headcount_international_povline ~ mean, data = reg_EastAsia_Pacific)
summary(pov_mean_lm_EastAsia_Pacific)
##
## Call:
## lm(formula = headcount_international_povline ~ mean, data = reg_EastAsia_Pacific)
##
## Residuals:
## Min 1Q Median 3Q Max
## -206902854 -97304163 -18928122 95850485 241044247
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1048788635 34133804 30.73 <2e-16 ***
## mean -97036317 5027676 -19.30 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 124900000 on 58 degrees of freedom
## Multiple R-squared: 0.8653, Adjusted R-squared: 0.863
## F-statistic: 372.5 on 1 and 58 DF, p-value: < 2.2e-16
# R-Squared = 0.8653
cor(reg_EastAsia_Pacific$headcount_international_povline, reg_EastAsia_Pacific$mean)
## [1] -0.9302016
# Correlation = -0.9302016. squared = 0.865275
#-------------------------------------------------------------------------------
pov_mean_lm_Europe_CentralAsia <- lm(headcount_international_povline ~ mean, data = reg_Europe_CentralAsia)
summary(pov_mean_lm_Europe_CentralAsia)
##
## Call:
## lm(formula = headcount_international_povline ~ mean, data = reg_Europe_CentralAsia)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12319026 -5665948 -277055 5906187 12372888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57081885 3378940 16.89 < 2e-16 ***
## mean -2562711 241189 -10.62 3.12e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6413000 on 58 degrees of freedom
## Multiple R-squared: 0.6606, Adjusted R-squared: 0.6548
## F-statistic: 112.9 on 1 and 58 DF, p-value: 3.125e-15
# R-Squared = 0.6606
cor(reg_Europe_CentralAsia$headcount_international_povline, reg_Europe_CentralAsia$mean)
## [1] -0.8127826
# Correlation = -0.8127826. squared = 0.6606156
#-------------------------------------------------------------------------------
pov_mean_lm_LatinAmerica_Caribbean<- lm(headcount_international_povline ~ mean, data = reg_LatinAmerica_Caribbean)
summary(pov_mean_lm_LatinAmerica_Caribbean)
##
## Call:
## lm(formula = headcount_international_povline ~ mean, data = reg_LatinAmerica_Caribbean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11332063 -6939208 1580198 5192511 15047306
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 137753624 4946396 27.85 <2e-16 ***
## mean -6471368 362731 -17.84 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7458000 on 58 degrees of freedom
## Multiple R-squared: 0.8459, Adjusted R-squared: 0.8432
## F-statistic: 318.3 on 1 and 58 DF, p-value: < 2.2e-16
# R-Squared = 0.8459
cor(reg_LatinAmerica_Caribbean$headcount_international_povline, reg_LatinAmerica_Caribbean$mean)
## [1] -0.9197083
# Correlation = -0.9197083. squared = 0.8458634
#-------------------------------------------------------------------------------
pov_mean_lm_MidEast_NorthAfrica <- lm(headcount_international_povline ~ mean, data = reg_MidEast_NorthAfrica)
summary(pov_mean_lm_MidEast_NorthAfrica)
##
## Call:
## lm(formula = headcount_international_povline ~ mean, data = reg_MidEast_NorthAfrica)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5073249 -3326425 -2088447 988000 18108124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20526828 4621014 4.442 4.45e-05 ***
## mean -902286 519294 -1.738 0.088 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5380000 on 54 degrees of freedom
## Multiple R-squared: 0.05295, Adjusted R-squared: 0.03541
## F-statistic: 3.019 on 1 and 54 DF, p-value: 0.088
# R-Squared = 0.05295
cor(reg_MidEast_NorthAfrica$headcount_international_povline, reg_MidEast_NorthAfrica$mean)
## [1] -0.2301024
# Correlation = -0.2301024. squared = 0.05294711
#-------------------------------------------------------------------------------
pov_mean_lm_South_Asia <- lm(headcount_international_povline ~ mean, data = reg_South_Asia)
summary(pov_mean_lm_South_Asia)
##
## Call:
## lm(formula = headcount_international_povline ~ mean, data = reg_South_Asia)
##
## Residuals:
## Min 1Q Median 3Q Max
## -85218995 -45747976 8725679 22102802 87751160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1027510411 34434467 29.84 <2e-16 ***
## mean -170603542 9763216 -17.47 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 50630000 on 48 degrees of freedom
## Multiple R-squared: 0.8642, Adjusted R-squared: 0.8613
## F-statistic: 305.3 on 1 and 48 DF, p-value: < 2.2e-16
# R-Squared = 0.8642
cor(reg_South_Asia$headcount_international_povline, reg_South_Asia$mean)
## [1] -0.9295995
# Correlation = -0.9295995. squared = 0.8641553
#-------------------------------------------------------------------------------
pov_mean_lm_Subsaharan_Africa <- lm(headcount_international_povline ~ mean, data = reg_Subsaharan_Africa)
summary(pov_mean_lm_Subsaharan_Africa)
##
## Call:
## lm(formula = headcount_international_povline ~ mean, data = reg_Subsaharan_Africa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -100769085 -12731600 1405095 24486542 49588508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 312296204 30648031 10.190 1.54e-14 ***
## mean 16677322 8988959 1.855 0.0686 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33270000 on 58 degrees of freedom
## Multiple R-squared: 0.05602, Adjusted R-squared: 0.03975
## F-statistic: 3.442 on 1 and 58 DF, p-value: 0.06864
# R-Squared = 0.05602
cor(reg_Subsaharan_Africa$headcount_international_povline, reg_Subsaharan_Africa$mean)
## [1] 0.236692
# Correlation = 0.236692. squared = 0.0560231
#Mapping poverty line vs mean income
reg_EastAsia_Pacific %>%
ggplot(aes(x = mean, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Poverty Line vs Mean Income (East Asia & Pacific)", x = "Mean Income", y = "Poverty Line") +
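annotate("text",x=10,y=8e8,label=(paste0("Slope==",coef(lm(reg_EastAsia_Pacific$headcount_international_povline~reg_EastAsia_Pacific$mean))[2])),parse=TRUE) # slope of headcount on mean income, matching the plotted line
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in mean income, the fitted headcount below the international poverty line decreases by about 97,036,317.
# Fitted line: y = -97036317 * x + 1048788635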
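The slope shown on a plot should come from the same regression as the plotted line, because regressing y on x and x on y give different slopes. A short sketch on simulated data (the mean_income and headcount values here are invented):

```r
set.seed(11)

# Simulated mean income and headcount with a negative relationship
mean_income <- seq(2, 12, length.out = 25)
headcount <- 1000 - 80 * mean_income + rnorm(25, sd = 30)

# Slope of headcount on mean income (the line geom_smooth draws
# when headcount is on the y axis)
slope_y_on_x <- unname(coef(lm(headcount ~ mean_income))[2])

# Slope of the reversed regression
slope_x_on_y <- unname(coef(lm(mean_income ~ headcount))[2])

# The two slopes differ; their product equals the squared correlation
r <- cor(mean_income, headcount)
all.equal(slope_y_on_x * slope_x_on_y, r^2)
```

Only when the correlation is exactly 1 or -1 is one slope the reciprocal of the other.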
#-------------------------------------------------------------------------------
reg_Europe_CentralAsia %>%
ggplot(aes(x = mean, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Poverty Line vs Mean Income (Europe and Central Asia)", x = "Mean Income", y = "Poverty Line") +
annotate("text",x=18,y=4e7,label=(paste0("Slope==",coef(lm(reg_Europe_CentralAsia$headcount_international_povline~reg_Europe_CentralAsia$mean))[2])),parse=TRUE) # slope of headcount on mean income, matching the plotted line
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in mean income, the fitted headcount below the international poverty line decreases by about 2,562,711.
# Fitted line: y = -2562711 * x + 57081885
#-------------------------------------------------------------------------------
reg_LatinAmerica_Caribbean %>%
ggplot(aes(x = mean, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Poverty Line vs Mean Income (Latin America and the Caribbean)", x = "Mean Income", y = "Poverty Line") +
annotate("text",x=17,y=7e7,label=(paste0("Slope==",coef(lm(reg_LatinAmerica_Caribbean$headcount_international_povline~reg_LatinAmerica_Caribbean$mean))[2])),parse=TRUE) # slope of headcount on mean income, matching the plotted line
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in mean income, the fitted headcount below the international poverty line decreases by about 6,471,368.
# Fitted line: y = -6471368 * x + 137753624
#-------------------------------------------------------------------------------
reg_MidEast_NorthAfrica %>%
ggplot(aes(x = mean, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Poverty Line vs Mean Income (Middle East and North Africa)", x = "Mean Income", y = "Poverty Line") +
annotate("text",x=8,y=2.5e7,label=(paste0("Slope==",coef(lm(headcount_international_povline~mean,data=reg_MidEast_NorthAfrica))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in mean income, the poverty headcount changes by the slope of the fitted line.
# Line: y= mx + b =
#-------------------------------------------------------------------------------
reg_South_Asia %>%
ggplot(aes(x = mean, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Poverty Line vs Mean Income (South Asia)", x = "Mean Income", y = "Poverty Line") +
annotate("text",x=5,y=5e8,label=(paste0("Slope==",coef(lm(headcount_international_povline~mean,data=reg_South_Asia))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in mean income, the poverty headcount changes by the slope of the fitted line.
# Line: y= mx + b =
#-------------------------------------------------------------------------------
reg_Subsaharan_Africa %>%
ggplot(aes(x = mean, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Poverty Line vs Mean Income (Sub-Saharan Africa)", x = "Mean Income", y = "Poverty Line") +
annotate("text",x=4,y=5e8,label=(paste0("Slope==",coef(lm(headcount_international_povline~mean,data=reg_Subsaharan_Africa))[2])),parse=TRUE)
## `geom_smooth()` using formula = 'y ~ x'
# For each 1-unit increase in mean income, the poverty headcount changes by the slope of the fitted line.
# Line: y= mx + b =
#-------------------------------------------------------------------------------
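The "Line: y = mx + b" comments can be filled in from a fitted model. What follows is a minimal sketch, not the analysis itself: `toy` is a made-up data frame with hypothetical values, and `line_for()` is a hypothetical helper. It regresses the headcount on mean income, matching the plot axes, and returns the intercept (b) and slope (m) for each region. It also shows that reversing the formula gives a different slope.

```r
# Toy data (hypothetical values, for illustration only; not from the dataset)
toy <- data.frame(
  region = rep(c("Region A", "Region B"), each = 4),
  mean = c(2, 4, 6, 8, 1, 2, 3, 4),
  headcount_international_povline = c(90, 70, 50, 30, 10, 12, 14, 16)
)

# Hypothetical helper: intercept (b) and slope (m) of headcount ~ mean
line_for <- function(df) {
  fit <- lm(headcount_international_povline ~ mean, data = df)
  c(b = unname(coef(fit)[1]), m = unname(coef(fit)[2]))
}

# One row per region, columns b and m
region_lines <- t(sapply(split(toy, toy$region), line_for))
print(region_lines)

# Reversing the formula (mean ~ headcount) answers a different question
# and yields a different slope than the line drawn by geom_smooth()
region_a <- toy[toy$region == "Region A", ]
reversed_slope <- unname(
  coef(lm(mean ~ headcount_international_povline, data = region_a))[2]
)
print(reversed_slope)
```

Applied to the real data, the same pattern would be run on `region_frame` split by its region column, assuming the column names match those used in the plots.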
# Now for all 6 regions wrapped
region_frame %>%
ggplot(aes(x = mean, y = headcount_international_povline)) +
geom_jitter() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~country, scales = "free") +
labs(title = "Poverty Line vs Mean Income", x = "Mean Income", y = "Poverty Line")
## `geom_smooth()` using formula = 'y ~ x'
# Overall, higher mean income is associated with fewer people living below the international poverty line, with the exception of Sub-Saharan Africa.
#Shiny app for interactive mapping
country_frame2 <- country_frame
colnames(country_frame2)[which(names(country_frame2) == "country")] <- "region"
world_map <- map_data("world")
# Each country has many survey rows and many map polygon points, so a many-to-many join is expected here
country_frame2_join <- left_join(country_frame2, world_map, by = "region", relationship = "many-to-many")
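The join is many-to-many by design: each country appears once per survey year in `country_frame2` and once per polygon vertex in `world_map`, so every survey row is paired with every vertex of that country. A small sketch with made-up tables (`poverty` and `outline` are hypothetical names and values), using base R `merge()` as a stand-in for `left_join()` so it runs without extra packages:

```r
# Hypothetical miniature of the poverty table: two survey years for one region
poverty <- data.frame(
  region = c("X", "X"),
  year = c(2000, 2010),
  ratio = c(40, 25)
)

# Hypothetical miniature of map_data("world"): three polygon vertices
outline <- data.frame(
  region = "X",
  long = c(0, 1, 1),
  lat = c(0, 0, 1)
)

# Left join on region: every survey row is paired with every vertex
joined <- merge(poverty, outline, by = "region", all.x = TRUE)
print(nrow(joined))
```

The row count is the product of the two per-region counts, which is why the joined map table is much larger than either input. In dplyr 1.1.0 and later, passing `relationship = "many-to-many"` to `left_join()` declares that this is intended.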
#Using pivot longer to allow for type of poverty line to be selected
country_frame2_join_long <- country_frame2_join %>%
pivot_longer(
cols = c("headcount_ratio_international_povline",
"headcount_ratio_lower_mid_income_povline",
"headcount_ratio_upper_mid_income_povline"),
names_to = "Poverty_Line",
values_to = "Ratio"
)
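To see what this reshaping does, here is a sketch on a two-row made-up table (hypothetical values), written in base R so it runs without tidyr. Each ratio column becomes a block of rows, with the column name stored in `Poverty_Line` and the value in `Ratio`. That layout is what lets a single input select the poverty line. Row order differs from `pivot_longer()`, but the contents are the same.

```r
# Hypothetical wide table: one row per region, one column per poverty line
wide <- data.frame(
  region = c("X", "Y"),
  headcount_ratio_international_povline = c(10, 20),
  headcount_ratio_lower_mid_income_povline = c(30, 40),
  headcount_ratio_upper_mid_income_povline = c(50, 60)
)

# Columns to stack
ratio_cols <- grep("^headcount_ratio_", names(wide), value = TRUE)

# Stack each ratio column into (region, Poverty_Line, Ratio) rows
long <- do.call(rbind, lapply(ratio_cols, function(col) {
  data.frame(region = wide$region, Poverty_Line = col, Ratio = wide[[col]])
}))
print(long)
```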
#-------------------------------------------------------------------------------
sorted_year <- unique(sort(country_frame2_join_long$year))
ui <- fluidPage(
# shinybrowser::detect(),
# "Window size:",
# textOutput("size"),
titlePanel("Poverty By Country"),
sidebarLayout(
sidebarPanel(
radioButtons("ppp_version",
"Select PPP Version:",
choices = unique(country_frame2_join_long$ppp_version),
selected = unique(country_frame2_join_long$ppp_version)[1]),
radioButtons("poverty_line",
"Select Poverty Line Ratio:",
choices = unique(country_frame2_join_long$Poverty_Line),
selected = unique(country_frame2_join_long$Poverty_Line)[1]),
selectInput("year",
"Select Year:",
choices = sorted_year,
selected = sorted_year[1])
),
mainPanel(
plotOutput("worldMap")
)
)
)
server <- function(input, output, session){
# output$size <- renderText({
# paste(get_width(), "x", get_height())
# })
filteredCountry <- reactive({
filter(country_frame2_join_long,
year == input$year & ppp_version == input$ppp_version & Poverty_Line == input$poverty_line)
})
output$worldMap <- renderPlot({
country_label_data <- filteredCountry() %>%
group_by(region) %>%
summarise(long = mean(long), lat = mean(lat))
ggplot(filteredCountry(), aes(long, lat)) +
geom_polygon(aes(fill = Ratio, group = group), color = "white") +
geom_text(aes(label = region), data = country_label_data, size = 3, hjust = 0.5) +
scale_fill_viridis_c(option = "C", name = "Poverty Ratio") +
labs(fill = "Poverty Ratio")
})
}
shinyApp(ui = ui, server = server)
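The reactive filter can be checked outside Shiny by pulling its logic into a plain function. This is a sketch only: `filter_map_data()` is a hypothetical helper and `toy_long` holds made-up rows. `selectInput()` and `radioButtons()` return their values as character strings, so the year is converted explicitly here instead of relying on implicit coercion in `==`.

```r
# Hypothetical helper mirroring the filter inside the reactive expression
filter_map_data <- function(df, year, ppp_version, poverty_line) {
  keep <- df$year == as.numeric(year) &
    as.character(df$ppp_version) == as.character(ppp_version) &
    df$Poverty_Line == poverty_line
  df[keep, , drop = FALSE]
}

# Made-up long-format rows (hypothetical values)
toy_long <- data.frame(
  region = c("X", "X", "Y", "Y"),
  year = c(1990, 2000, 1990, 1990),
  ppp_version = c(2017, 2017, 2017, 2011),
  Poverty_Line = "headcount_ratio_international_povline",
  Ratio = c(40, 30, 15, 18)
)

# Inputs arrive as strings, as they would from the Shiny UI
picked <- filter_map_data(
  toy_long,
  year = "1990",
  ppp_version = "2017",
  poverty_line = "headcount_ratio_international_povline"
)
print(picked)
```

Keeping the filter in a standalone function makes it possible to confirm that a given year, PPP version, and poverty line return the expected rows before wiring it into `reactive()`.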
# Messy, difficult to read. Limited viability
country_frame %>%
filter(ppp_version == "2017") %>%
ggplot(aes(x = year, y = gini)) +
geom_line() +
labs(title = "Gini Coefficient of Inequality") +
facet_wrap(~country, scales = "free_y")
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
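The `geom_line()` message most likely appears because a line needs at least two points and some countries have only one survey in the filtered data. A sketch with made-up rows (`toy_gini` and its values are hypothetical) that separates countries with enough observations for a line from those better shown as single points:

```r
# Made-up Gini observations (hypothetical values)
toy_gini <- data.frame(
  country = c("A", "A", "A", "B", "C", "C"),
  year = c(1995, 2005, 2015, 2010, 2000, 2018),
  gini = c(0.41, 0.39, 0.37, 0.52, 0.33, 0.35)
)

# Count observations per country
n_obs <- table(toy_gini$country)

# Countries with at least two observations can be drawn as lines
multi <- names(n_obs)[n_obs >= 2]
line_data <- toy_gini[toy_gini$country %in% multi, ]

# Countries with a single observation would need points instead
single_data <- toy_gini[!toy_gini$country %in% multi, ]

print(multi)
print(single_data)
```

With the real data, `line_data` could be passed to `geom_line()` and `single_data` to `geom_point()`, which would also cut down the number of facets.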
#Conclusion
Our findings show that poverty is shrinking worldwide overall, but at different rates in each country and region. In recent years, however, the number of people living in poverty has actually increased in the Middle East & North Africa and in Sub-Saharan Africa.
Growth in total population appears to have had a minimal effect on poverty. We believe poverty is shrinking because of external factors that are not available in the dataset, such as advances in medical technology and food production.
Mean income has risen only slightly since 1990 and shows virtually no change as total population grows. Interestingly, the same holds for the relationship between mean income and baseline poverty, although mean income is likely an effect here, not the cause.
Unfortunately, this data is rather surface-level and gives no insight into multidimensional poverty factors such as access to clean water, electricity, education, and shelter. This analysis only skims the surface of the complex issue of worldwide poverty, and it cannot be exhaustive because not all of the population can realistically be surveyed.
#References/Resources
https://github.com/owid/poverty-data/blob/main/datasets/pip_codebook.csv
https://ourworldindata.org/from-1-90-to-2-15-a-day-the-updated-international-poverty-line
https://www.worldvision.org/sponsorship-news-stories/global-poverty-facts#:~:text=According%20to%20the%20World%20Bank,in%20poverty%20as%20of%202021.
https://www.worldbank.org/en/news/feature/2013/06/17/high-frequency-data-collection-new-breed-household-surveys
https://blogs.worldbank.org/developmenttalk/half-global-population-lives-less-us685-person-day
https://blogs.worldbank.org/opendata/march-2023-global-poverty-update-world-bank-challenge-estimating-poverty-pandemic
https://ourworldindata.org/sdgs/no-poverty#1.1