A correlation analysis was conducted on the GapMinder dataset to understand how 14 explanatory variables are associated with life expectancy. The explanatory variables are income per person, alcohol consumption, armed forces rate, breast cancer per 100,000, CO2 emissions, female employment rate, HIV rate, internet use rate, oil per person, polity score, residential electricity per person, suicide per 100,000, employment rate, and urbanization rate.
After removing the observations with missing values, the Pearson correlation coefficient of each variable with life expectancy was computed. As the results below show, internetuserate has the strongest positive correlation with life expectancy (r = 0.77), followed by incomeperperson (r = 0.73). hivrate is the variable most negatively associated with life expectancy (r = -0.54), and armedforcesrate has the weakest correlation (r = 0.02). The corresponding p-values (under the null hypothesis that the variables are not correlated) are also reported; a value shown as 0.000000 rounds to zero at six decimal places. At the 5% level, the correlations are statistically significant for all variables except alcconsumption, employrate, co2emissions, armedforcesrate, and suicideper100th.
| variable | correlation | p.value |
|---|---|---|
| lifeexpectancy | 1.000000 | 0.000000 |
| internetuserate | 0.769160 | 0.000000 |
| incomeperperson | 0.732452 | 0.000000 |
| breastcancerper100th | 0.580247 | 0.000003 |
| urbanrate | 0.552084 | 0.000010 |
| relectricperperson | 0.551581 | 0.000011 |
| oilperperson | 0.422911 | 0.001165 |
| polityscore | 0.344843 | 0.009248 |
| femaleemployrate | 0.268129 | 0.045718 |
| alcconsumption | 0.218541 | 0.105630 |
| employrate | 0.210334 | 0.119719 |
| co2emissions | 0.103990 | 0.445635 |
| armedforcesrate | 0.023648 | 0.862654 |
| suicideper100th | -0.218335 | 0.105966 |
| hivrate | -0.542506 | 0.000016 |
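A table like the one above could be produced along the following lines. This is only a minimal sketch, not the code used for the analysis: it assumes the data sits in a pandas DataFrame with one column per variable, that rows with any missing value are dropped before computing (listwise deletion), and that `scipy.stats.pearsonr` supplies the coefficient and two-sided p-value. The helper name `correlation_table` and the synthetic demo data are hypothetical and are not GapMinder values.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def correlation_table(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Pearson r and two-sided p-value of every column against `target`."""
    # Listwise deletion: keep only rows with no missing value in any column.
    complete = df.dropna()
    rows = {}
    for col in complete.columns:
        r, p = pearsonr(complete[col], complete[target])
        rows[col] = {"correlation": r, "p.value": p}
    # Sort from the most positive to the most negative correlation.
    return pd.DataFrame.from_dict(rows, orient="index").sort_values(
        "correlation", ascending=False
    )


# Synthetic stand-in data (NOT GapMinder values), for illustration only.
rng = np.random.default_rng(0)
n = 60
life = rng.normal(70, 8, n)
demo = pd.DataFrame({
    "lifeexpectancy": life,
    "internetuserate": 2.0 * life + rng.normal(0, 8, n),
    "hivrate": -0.3 * life + rng.normal(0, 2, n),
    "armedforcesrate": rng.normal(1.5, 1.0, n),
})
demo.loc[0, "hivrate"] = np.nan  # one incomplete row to exercise dropna()

result = correlation_table(demo, "lifeexpectancy")
# Variables whose correlation is significant at the 5% level.
significant = result[result["p.value"] < 0.05].index.tolist()
print(result)
print("Significant at 5%:", significant)
```

Filtering on `p.value < 0.05`, as in the last lines, is a quick way to check which variables clear the 5% threshold instead of reading them off the table by eye.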