Xiyue Shu s3705474, Shan Jiang s3592369, Anna Krinochkina s3712761
Last updated: 19 October, 2018
The rationale of the investigation:
A high infant mortality rate (IMR) generally indicates unmet human needs in medical care, nutrition, sanitation and similar areas. Many studies suggest that higher income at the country level is closely correlated with better health status for that country's population. IMR and GDP per capita are therefore expected to have a negative relationship.
Understanding the relationship between these two quantitative variables (IMR and GDP per capita) is a prerequisite for making accurate predictions. The question of interest is whether a country's GDP per capita can be used to predict its infant mortality rate.
The countries dataset used in this presentation was obtained from Kaggle.com: https://www.kaggle.com/fernandol/countries-of-the-world
The countries dataset contains information on population, region, area size, infant mortality, population density, coast/area ratio, net migration, GDP, literacy, phones per 1000, arable (%), crops (%), climate, death rate, birth rate, industry, agriculture, service and other (%) for 2010.
library(readr)    # read_csv()
library(dplyr)    # select() and the %>% pipe
library(outliers) # scores(), used for the z-score outlier check
countries <- read_csv("countries.csv")
countries <- countries %>% select(`Infant mortality (per 1000 births)`, `GDP ($ per capita)`)
Summary statistics for the variables Infant mortality (per 1000 births) and GDP ($ per capita) are as follows:
summary(countries$`Infant mortality (per 1000 births)`, na.rm = TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    2.29    8.15   21.00   35.51   55.70  191.19       3
sd(countries$`Infant mortality (per 1000 births)`, na.rm = TRUE)
## [1] 35.3899
summary(countries$`GDP ($ per capita)`, na.rm = TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##     500    1900    5550    9690   15700   55100       1
sd(countries$`GDP ($ per capita)`, na.rm = TRUE)
## [1] 10049.14
which(is.na(countries))
## [1]  48 222 224 451
countries <- na.omit(countries)
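The positions returned by which(is.na(countries)) above are linear (column-major) indices over the whole table rather than row numbers, so more than one position can belong to the same row. A minimal sketch on a made-up data frame (the name toy and its values are hypothetical, not taken from the countries data) shows how such indices map to rows and columns, and how many rows na.omit() then drops:

```r
# Hypothetical toy data: 5 rows, 2 columns, three missing cells in two rows
toy <- data.frame(
  imr = c(10, NA, 30, NA, 50),
  gdp = c(500, 900, 1500, NA, 4000)
)

# Linear, column-major positions of the missing cells
na_linear <- which(is.na(toy))

# The same cells expressed as (row, col) pairs
na_cells <- which(is.na(toy), arr.ind = TRUE)

# Rows containing at least one missing value; these are the rows na.omit() drops
na_rows <- sort(unique(na_cells[, "row"]))
toy_complete <- na.omit(toy)

print(na_linear)
print(na_cells)
print(na_rows)
print(nrow(toy_complete))
```

In this toy case three missing cells fall in only two rows, so three of the five rows remain after na.omit().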
z.score <- countries$`Infant mortality (per 1000 births)` %>% scores(type = "z")
which(abs(z.score) > 3)
## [1]   1   6 183
z.scores <- countries$`GDP ($ per capita)` %>% scores(type = "z")
which(abs(z.scores) > 3)
## [1] 121
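The outlier check above relies on standardised scores. Assuming scores(type = "z") uses the usual sample mean and sample standard deviation, the rule applied to each variable can be written as:

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}, \qquad
\text{observation } i \text{ is flagged when } |z_i| > 3 .
```

With the summary statistics reported earlier, the smallest values of both variables sit less than one standard deviation below their means (approximately), so under this rule only unusually large values can be flagged.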
countries <- countries[-c(1, 6, 121, 183), ]
A histogram of Infant mortality (per 1000 births) was produced and shows that the variable is right-skewed; the log-transformed variable is plotted alongside it.
par(mfrow = c(1, 2))
hist(countries$`Infant mortality (per 1000 births)`, main = 'Infant Mortality', xlab = 'Infant Mortality', col = "lightblue")
log(countries$`Infant mortality (per 1000 births)`) %>% hist(main = "log(Infant Mortality)", col = "lightblue")
A histogram of GDP ($ per capita) was produced and shows that this variable is also right-skewed.
par(mfrow = c(1, 2))
hist(countries$`GDP ($ per capita)`, main = 'GDP', xlab = 'GDP ($ per capita)', col = "grey")
log(countries$`GDP ($ per capita)`) %>% hist(main = "log(GDP)", ylim = c(0, 35), col = "grey")
A scatter plot of the log-transformed variables gives an overview of the relationship between the two variables:
plot(log(`Infant mortality (per 1000 births)`) ~ log(`GDP ($ per capita)`), data = countries)
Hypotheses for the overall regression model:
H0: the countries data does not fit the linear regression model.
HA: the countries data fits the linear regression model.
model1 <- lm(log(`Infant mortality (per 1000 births)`) ~ log(`GDP ($ per capita)`), data = countries)
model1 %>% summary()
##
## Call:
## lm(formula = log(`Infant mortality (per 1000 births)`) ~ log(`GDP ($ per capita)`),
## data = countries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.64617 -0.32677 -0.01089 0.35710 1.62962
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.5800 0.2677 35.79 <2e-16 ***
## log(`GDP ($ per capita)`) -0.7637 0.0309 -24.72 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5409 on 218 degrees of freedom
## Multiple R-squared: 0.737, Adjusted R-squared: 0.7358
## F-statistic: 610.9 on 1 and 218 DF, p-value: < 2.2e-16
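To read the fitted coefficients on the original scale, the log-log model can be back-transformed. The following is a sketch using the rounded estimates from the summary above and assuming natural logarithms (the default of log() in R); the GDP value of 5550 is simply the sample median reported earlier, used here as an illustration:

```latex
\begin{aligned}
\log(\widehat{\mathrm{IMR}}) &= 9.5800 - 0.7637\,\log(\mathrm{GDP}) \\
\widehat{\mathrm{IMR}} &= e^{9.5800}\,\mathrm{GDP}^{-0.7637} \\[4pt]
\mathrm{GDP} = 5550:\quad \log(\widehat{\mathrm{IMR}}) &\approx 9.5800 - 0.7637 \times 8.6216 \approx 2.996,
\qquad \widehat{\mathrm{IMR}} \approx e^{2.996} \approx 20 \text{ per 1000 births} \\[4pt]
\text{doubling GDP}:\quad \frac{\widehat{\mathrm{IMR}}(2\,\mathrm{GDP})}{\widehat{\mathrm{IMR}}(\mathrm{GDP})} &= 2^{-0.7637} \approx 0.59
\end{aligned}
```

Under these assumptions the slope acts as an elasticity: a 1% higher GDP per capita is associated with roughly a 0.76% lower IMR, and a doubling of GDP per capita with roughly a 41% lower IMR. The back-transformed value is best read as a typical (median-type) prediction rather than a mean.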
par(mfrow=c(2,2))
model1 %>% plot(which = 1)
model1 %>% plot(which = 2)
model1 %>% plot(which = 3)
model1 %>% plot(which = 5)
r <- cor(log(countries$`Infant mortality (per 1000 births)`), log(countries$`GDP ($ per capita)`), use = 'complete.obs')
r
## [1] -0.8584828
library(psychometric)
CIr(r, n = 220, level = .95)
## [1] -0.8897237 -0.8192381
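The interval above can be cross-checked in base R. The sketch below assumes the interval is built with the Fisher z-transformation and a standard error of 1/sqrt(n - 3); the helper name fisher_ci is hypothetical and not part of psychometric:

```r
# Hypothetical helper: confidence interval for a correlation via Fisher's z
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                     # Fisher z-transform of r
  se   <- 1 / sqrt(n - 3)              # approximate standard error of z
  crit <- qnorm(1 - (1 - level) / 2)   # two-sided normal critical value
  tanh(z + c(-1, 1) * crit * se)       # back-transform both bounds to the r scale
}

# r and n are taken from the output above
ci <- fisher_ci(r = -0.8584828, n = 220)
print(round(ci, 3))
```

With r = -0.8584828 and n = 220 this gives approximately -0.890 and -0.819, in line with the CIr output.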
detach('package:psychometric', unload = T)
Based on the investigation, there was a statistically significant negative linear relationship between log GDP per capita and log IMR across countries (F(1, 218) = 610.9, p < .001, R-squared = 0.737; r = -0.86, 95% CI [-0.89, -0.82]). The estimated linear regression model could therefore be used to predict a country's IMR from its GDP per capita.