## Elina Azrilyan

#### November 15th, 2019

##### The data

The attached who.csv dataset contains real-world data from 2008. The variables are listed below.

Country: name of the country

LifeExp: average life expectancy for the country in years

InfantSurvival: proportion of those surviving to one year or more

Under5Survival: proportion of those surviving to five years or more

TBFree: proportion of the population without TB.

PropMD: proportion of the population who are MDs

PropRN: proportion of the population who are RNs

PersExp: mean personal expenditures on healthcare in US dollars at average exchange rate

GovtExp: mean government expenditures per capita on healthcare, US dollars at average exchange rate

TotExp: sum of personal and government expenditures.

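``````library(knitr)
# Assumption: who.csv sits in the working directory; the original load step is not shown.
whodf <- read.csv("who.csv")
kable(head(whodf), digits = 2, align = c(rep("l", 4), rep("c", 4), rep("r", 2)))``````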
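
| Country | LifeExp | InfantSurvival | Under5Survival | TBFree | PropMD | PropRN | PersExp | GovtExp | TotExp |
|:--|:--|:--|:--|:-:|:-:|:-:|:-:|--:|--:|
| Afghanistan | 42 | 0.84 | 0.74 | 1 | 0 | 0 | 20 | 92 | 112 |
| Albania | 71 | 0.98 | 0.98 | 1 | 0 | 0 | 169 | 3128 | 3297 |
| Algeria | 71 | 0.97 | 0.96 | 1 | 0 | 0 | 108 | 5184 | 5292 |
| Andorra | 82 | 1.00 | 1.00 | 1 | 0 | 0 | 2589 | 169725 | 172314 |
| Angola | 41 | 0.85 | 0.74 | 1 | 0 | 0 | 36 | 1620 | 1656 |
| Antigua and Barbuda | 73 | 0.99 | 0.99 | 1 | 0 | 0 | 503 | 12543 | 13046 |
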
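
The definition of TotExp as the sum of personal and government expenditures can be checked against the six rows printed by `head(whodf)` above. The sketch below copies those values by hand; the names `head_rows`, `SumCheck`, and `all_match` are illustrative and not part of the original analysis.

```r
# First six rows as printed by head(whodf); values copied by hand from the output.
head_rows <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria",
              "Andorra", "Angola", "Antigua and Barbuda"),
  PersExp = c(20, 169, 108, 2589, 36, 503),
  GovtExp = c(92, 3128, 5184, 169725, 1620, 12543),
  TotExp  = c(112, 3297, 5292, 172314, 1656, 13046)
)

# By definition, TotExp should equal PersExp + GovtExp in every row.
head_rows$SumCheck <- head_rows$PersExp + head_rows$GovtExp
all_match <- all(head_rows$SumCheck == head_rows$TotExp)
print(all_match)
```

On the full dataset the same comparison would be `all(whodf$PersExp + whodf$GovtExp == whodf$TotExp)`, assuming the columns hold no missing values.
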
##### 1. Provide a scatterplot of LifeExp~TotExp, and run simple linear regression. Do not transform the variables. Provide and interpret the F statistics, R^2, standard error, and p-values only. Discuss whether the assumptions of simple linear regression are met.

There are 10 columns in our dataset and 190 rows of data.

Let's examine the relationship between the LifeExp and TotExp variables, and add a regression line.

``````plot(whodf$LifeExp ~ whodf$TotExp, main = "LifeExp vs TotExp", xlab = "Pers and gov expenditures", ylab = "Average life expectancy")
abline(lm(whodf$LifeExp ~ whodf$TotExp), col="red") # regression line (y~x) ``````

Running simple linear regression:

``````m1 <- lm(LifeExp ~ TotExp, data = whodf)
summary(m1)``````
``````##
## Call:
## lm(formula = LifeExp ~ TotExp, data = whodf)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -24.764  -4.778   3.154   7.116  13.292
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.475e+01  7.535e-01  85.933  < 2e-16 ***
## TotExp      6.297e-05  7.795e-06   8.079 7.71e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared:  0.2577, Adjusted R-squared:  0.2537
## F-statistic: 65.26 on 1 and 188 DF,  p-value: 7.714e-14``````

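The F-statistic is 65.26 on 1 and 188 degrees of freedom with a p-value of 7.714e-14, so we reject the null hypothesis that TotExp has no linear relationship with LifeExp. However, the R^2 of 0.2577 means the model explains only about 26% of the variation in life expectancy. The standard error of the TotExp coefficient (7.795e-06) is small relative to its estimate (6.297e-05), and the residual standard error is 9.371 years. We take the assumptions of simple linear regression to be met; the normal Q-Q plot of the residuals below checks the normality assumption.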
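
The figures in the `summary(m1)` output are tied together by simple arithmetic, which can help when interpreting them. The sketch below recomputes the t value, F-statistic, adjusted R^2, and p-value from the printed (rounded) numbers only; all variable names are illustrative, and small differences from the printed output are due to rounding.

```r
# Values copied from the summary(m1) output above (already rounded by R).
slope_est <- 6.297e-05   # TotExp estimate
slope_se  <- 7.795e-06   # TotExp standard error
r_squared <- 0.2577      # multiple R-squared
resid_df  <- 188         # residual degrees of freedom

# Two coefficients (intercept and slope) were estimated, so n = df + 2.
n_obs <- resid_df + 2

# t value of the slope is the estimate divided by its standard error.
t_slope <- slope_est / slope_se

# With a single predictor, the F-statistic equals the squared t value,
# and it can also be recovered from R-squared.
f_from_t  <- t_slope^2
f_from_r2 <- r_squared / (1 - r_squared) * resid_df

# Adjusted R-squared penalises for the number of estimated coefficients.
adj_r2 <- 1 - (1 - r_squared) * (n_obs - 1) / resid_df

# Upper-tail probability of the printed F-statistic under F(1, 188).
p_value <- pf(65.26, df1 = 1, df2 = resid_df, lower.tail = FALSE)

print(c(n = n_obs, t = t_slope, F_t = f_from_t, F_r2 = f_from_r2, adj_r2 = adj_r2))
print(p_value)
```

The recomputed t value and F-statistic should agree with the printed 8.079 and 65.26 up to rounding, and the 190 observations implied by the degrees of freedom give the row count of the data used in the fit.
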

``````qqnorm(m1$residuals)
qqline(m1$residuals)``````
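
A Q-Q plot is a visual check, so it can be useful to pair it with a few numeric summaries of the residuals. The following is a minimal sketch, not part of the original analysis: `check_residuals` is a hypothetical helper, and because who.csv is not reproduced here the demonstration uses R's built-in `cars` dataset as a stand-in for `m1`.

```r
# Hypothetical helper: numeric summaries of the residuals of an lm fit.
check_residuals <- function(model) {
  res <- residuals(model)
  z <- (res - mean(res)) / sd(res)   # standardised residuals
  list(
    n = length(res),
    # Sample skewness: values far from 0 suggest asymmetric residuals.
    skewness = mean(z^3),
    # Shapiro-Wilk test of normality (works for 3 to 5000 observations).
    shapiro_p = shapiro.test(res)$p.value,
    # Correlation of |residual| with fitted values: a rough signal of
    # non-constant variance when it is far from 0.
    abs_res_vs_fitted = cor(abs(res), fitted(model))
  )
}

# Stand-in model on the built-in cars data (an assumption for illustration).
demo_fit <- lm(dist ~ speed, data = cars)
demo_checks <- check_residuals(demo_fit)
str(demo_checks)
```

With the model from this section, `check_residuals(m1)` would return the same summaries, and `plot(m1, which = 1)` draws residuals against fitted values as a check on linearity and constant variance.
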