The attached who.csv dataset contains real-world data from 2008. The variables included are as follows.

Country: name of the country

LifeExp: average life expectancy for the country in years

InfantSurvival: proportion of those surviving to one year or more

Under5Survival: proportion of those surviving to five years or more

TBFree: proportion of the population without TB.

PropMD: proportion of the population who are MDs

PropRN: proportion of the population who are RNs

PersExp: mean personal expenditures on healthcare in US dollars at average exchange rate

GovtExp: mean government expenditures per capita on healthcare, US dollars at average exchange rate

TotExp: sum of personal and government expenditures.

```
library(knitr)
whodf <- read.csv(file="who.csv", header=TRUE, sep=",")
# one alignment entry per column (the data frame has 10 columns)
kable(head(whodf), digits = 2, align = c(rep("l", 4), rep("c", 4), rep("r", 2)))
```

Country | LifeExp | InfantSurvival | Under5Survival | TBFree | PropMD | PropRN | PersExp | GovtExp | TotExp |
---|---|---|---|---|---|---|---|---|---|
Afghanistan | 42 | 0.84 | 0.74 | 1 | 0 | 0 | 20 | 92 | 112 |
Albania | 71 | 0.98 | 0.98 | 1 | 0 | 0 | 169 | 3128 | 3297 |
Algeria | 71 | 0.97 | 0.96 | 1 | 0 | 0 | 108 | 5184 | 5292 |
Andorra | 82 | 1.00 | 1.00 | 1 | 0 | 0 | 2589 | 169725 | 172314 |
Angola | 41 | 0.85 | 0.74 | 1 | 0 | 0 | 36 | 1620 | 1656 |
Antigua and Barbuda | 73 | 0.99 | 0.99 | 1 | 0 | 0 | 503 | 12543 | 13046 |

There are 10 columns in our dataset and 190 rows of data.
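The variable definitions can be sanity-checked against the rows printed above. The sketch below is only an illustration: it rebuilds the expenditure columns of those six rows by hand (the name `head6` is made up here, not part of the original analysis) so that it runs without who.csv, then checks that TotExp equals PersExp plus GovtExp and reports the shape of the data frame.

```r
# The six rows printed by head(whodf) above, expenditure columns only.
# `head6` is a hypothetical name used just for this illustration.
head6 <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria", "Andorra", "Angola",
              "Antigua and Barbuda"),
  PersExp = c(20, 169, 108, 2589, 36, 503),
  GovtExp = c(92, 3128, 5184, 169725, 1620, 12543),
  TotExp  = c(112, 3297, 5292, 172314, 1656, 13046)
)

# TotExp is described as the sum of personal and government expenditures.
totexp_ok <- head6$PersExp + head6$GovtExp == head6$TotExp
all(totexp_ok)

# dim() returns the number of rows, then the number of columns.
dim(head6)
```

On the full data the same calls would be `dim(whodf)` and `all(whodf$PersExp + whodf$GovtExp == whodf$TotExp)`, assuming the file has no missing values in those columns.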

Let's examine the relationship between the LifeExp and TotExp variables. Let's also add a regression line.

```
plot(whodf$LifeExp ~ whodf$TotExp, main = "LifeExp vs TotExp", xlab = "Pers and gov expenditures", ylab = "Average life expectancy")
abline(lm(whodf$LifeExp ~ whodf$TotExp), col="red") # regression line (y~x)
```

```
m1 <- lm(LifeExp ~ TotExp, data = whodf)
summary(m1)
```

```
##
## Call:
## lm(formula = LifeExp ~ TotExp, data = whodf)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -24.764  -4.778   3.154   7.116  13.292
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.475e+01  7.535e-01  85.933  < 2e-16 ***
## TotExp      6.297e-05  7.795e-06   8.079 7.71e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared: 0.2577, Adjusted R-squared: 0.2537
## F-statistic: 65.26 on 1 and 188 DF, p-value: 7.714e-14
```

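The F-statistic is 65.26 on 1 and 188 degrees of freedom and its p-value is close to 0, so the relationship between TotExp and LifeExp is statistically significant. However, the R^2 value of 0.2577 means that the model explains only about 26% of the variation in life expectancy. The standard error of the TotExp coefficient (7.795e-06) is small relative to the estimate (6.297e-05), roughly one-eighth of it, while the residual standard error is 9.371 years. Whether the assumptions of simple linear regression are met cannot be read off this summary alone. The residual quantiles are asymmetric (median 3.154, minimum -24.764, maximum 13.292), which hints at non-normal residuals, so the Q-Q plot of the residuals is examined next.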
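As a cross-check on the summary output, in a regression with a single predictor the F-statistic can be recovered from R^2 and the residual degrees of freedom, and it also equals the square of the slope's t value. The sketch below uses only numbers copied from the output above; the variable names are illustrative.

```r
# Values copied from summary(m1) above.
r2       <- 0.2577   # Multiple R-squared
df_resid <- 188      # residual degrees of freedom
t_slope  <- 8.079    # t value for the TotExp coefficient

# With one predictor: F = (R^2 / 1) / ((1 - R^2) / df_resid)
f_from_r2 <- (r2 / 1) / ((1 - r2) / df_resid)

# With one predictor the F-statistic is the squared t value of the slope.
f_from_t <- t_slope^2

# Upper-tail probability of the F distribution on 1 and 188 df.
p_value <- pf(f_from_r2, df1 = 1, df2 = df_resid, lower.tail = FALSE)

round(c(f_from_r2 = f_from_r2, f_from_t = f_from_t), 2)
p_value
```

Both routes give a value of about 65.3, which agrees with the reported 65.26 up to the rounding of the inputs.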

```
qqnorm(m1$residuals)
qqline(m1$residuals)
```
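The Q-Q plot addresses only the normality of the residuals. A fuller check would also look at residuals against fitted values (for linearity and constant variance) and could add a formal normality test. The sketch below wraps these in a hypothetical helper, `residual_checks()`, which is not part of the original analysis. So that it runs without who.csv, it is demonstrated on the six rows printed by `head(whodf)`; six points are far too few to draw any conclusion from, so the demo only shows the mechanics.

```r
# Hypothetical helper: basic residual diagnostics for a fitted lm object.
residual_checks <- function(model, show_plot = interactive()) {
  res <- residuals(model)
  fit <- fitted(model)
  if (show_plot) {
    # A curved or funnel-shaped pattern suggests non-linearity or
    # non-constant variance.
    plot(fit, res, xlab = "Fitted values", ylab = "Residuals",
         main = "Residuals vs fitted")
    abline(h = 0, lty = 2)
  }
  list(
    quantiles = quantile(res),                # symmetry of the residuals
    shapiro_p = shapiro.test(res)$p.value     # Shapiro-Wilk normality test
  )
}

# Demo data: LifeExp and TotExp from the six rows shown by head(whodf).
demo <- data.frame(
  LifeExp = c(42, 71, 71, 82, 41, 73),
  TotExp  = c(112, 3297, 5292, 172314, 1656, 13046)
)
demo_fit <- lm(LifeExp ~ TotExp, data = demo)
checks <- residual_checks(demo_fit)
checks$quantiles
```

With the full dataset loaded, the same helper would be called as `residual_checks(m1)`.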