We will now begin testing whether the differences we have seen are “real” or not. We start with a t-test for a difference between two means. First, we load the mosaic library and our data.
library(mosaic)
## Loading required package: car
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Loading required package: lattice
## Loading required package: ggplot2
##
## Attaching package: 'mosaic'
##
## The following objects are masked from 'package:dplyr':
##
## do, tally
##
## The following object is masked from 'package:car':
##
## logit
##
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cov, D, fivenum, IQR, median, prop.test, sd,
## t.test, var
##
## The following objects are masked from 'package:base':
##
## max, mean, min, print, prod, range, sample, sum
library(RCurl)
## Loading required package: bitops
library(knitr)
url<-"https://raw.githubusercontent.com/coreysparks/data/master/PRB2013_new.csv"
prbdata<-getURL(url)
prbdata<-read.csv(textConnection(prbdata), header=T, dec=",")
Next, we recode a variable using the ifelse() function:
prbdata$Africa<-ifelse(prbdata$Continent=="Africa",yes= "Africa",no= "Not Africa")
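As a small illustration of how ifelse() behaves in a recode like this one, the sketch below uses a made-up vector (the name `continent` and its values are hypothetical, not taken from the PRB data). It shows that a missing value in the input stays missing in the output, which is worth checking before computing group summaries.

```r
# Made-up values standing in for prbdata$Continent (assumption, not real data)
continent <- c("Africa", "Asia", NA, "Europe", "Africa")

# Same recode pattern as above: test, value if TRUE, value if FALSE
africa <- ifelse(continent == "Africa", yes = "Africa", no = "Not Africa")
africa

# NA in the test stays NA in the result; tabulate to see how many there are
table(africa, useNA = "ifany")
```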
Now we can use our new variable to do some descriptive analysis:
mean(e0Total~Africa, data=prbdata, na.rm=T)
## Africa Not Africa
## 59.60 74.49
sd(e0Total~Africa, data=prbdata, na.rm=T)
## Africa Not Africa
## 8.608 5.467
bwplot(e0Total~Africa, prbdata)
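The formula versions of mean() and sd() used above come from the mosaic package. For readers without mosaic, a rough base R equivalent might look like the sketch below. The data frame `d` and its columns are made up for illustration.

```r
# Toy data (hypothetical): a grouping variable and an outcome with one NA
d <- data.frame(grp = c("A", "A", "B", "B", "B"),
                y   = c(1, 3, 10, NA, 14))

# Group means, dropping missing values, similar to mean(y ~ grp, na.rm = TRUE)
means <- tapply(d$y, d$grp, mean, na.rm = TRUE)
means

# Group standard deviations; the formula method of aggregate() omits NA rows by default
sds <- aggregate(y ~ grp, data = d, FUN = sd)
sds
```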
Here is our test of whether average life expectancy is the same in African and non-African countries. To do this, we construct a linear model for the difference in means between the two groups, which looks like:
\( e0Total_i = a + b \cdot Africa_i + e_i \)
Where a is the mean e0Total in Africa, and b is the difference between the mean of the non-African countries and the mean of the African countries (the Africa variable is coded 0 for African countries and 1 for all others). e contains all the variation in e0Total that the difference between groups doesn’t explain, and is called the residual.
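To see how R turns a two-level grouping variable into the 0/1 term in a model like this, one can inspect the design matrix. The sketch below uses a made-up vector `grp` (hypothetical values, not the PRB data). The first level in alphabetical order, "Africa", becomes the reference group absorbed by the intercept.

```r
# Made-up grouping variable standing in for prbdata$Africa (assumption)
grp <- c("Africa", "Not Africa", "Not Africa", "Africa")

# model.matrix() shows the columns lm() would actually fit
X <- model.matrix(~ grp)
X

# The second column is 1 for "Not Africa" rows and 0 for "Africa" rows
colnames(X)
```

This is also why the slope coefficient in the fitted model is labelled with the variable name followed by the non-reference level.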
test1<-lm(e0Total~Africa, data=prbdata)
kable(summary(test1)$coef, digits=3)
|                 | Estimate| Std. Error| t value| Pr(>\|t\|)|
|:----------------|--------:|----------:|-------:|----------:|
|(Intercept)      |   59.600|      0.870|  68.536|      0.000|
|AfricaNot Africa |   14.890|      1.016|  14.660|      0.000|

It most certainly is not the same: the p-value for the test that the Not Africa parameter equals 0 is very, very small, 0 to at least three decimal places.
The mean e0Total for Africa is 59.6 (the intercept), and the mean for the non-African countries is 59.6 + 14.89 = 74.49, which is exactly what we saw in:
mean(e0Total~Africa, data=prbdata, na.rm=T)
## Africa Not Africa
## 59.60 74.49
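The link between this linear model and the classical two-sample t-test can be checked directly. The sketch below uses simulated data (the data frame `d`, the group labels, and the means and standard deviations are all made up, not the PRB data) and compares the slope's t value from lm() with the pooled-variance t-test from stats::t.test(). Note that `var.equal = TRUE` is needed for the match; the default in t.test() is the Welch test, which does not assume equal variances.

```r
set.seed(1)

# Simulated two-group data (hypothetical values chosen only for illustration)
d <- data.frame(
  grp = rep(c("A", "B"), times = c(30, 50)),
  y   = c(rnorm(30, mean = 60, sd = 8), rnorm(50, mean = 75, sd = 6))
)

# Linear model: intercept = mean of group A, slope = mean(B) - mean(A)
fit  <- lm(y ~ grp, data = d)
lm_t <- coef(summary(fit))["grpB", "t value"]

# Pooled-variance two-sample t-test; stats:: avoids the version masked by mosaic
tt  <- stats::t.test(y ~ grp, data = d, var.equal = TRUE)
tt_t <- unname(tt$statistic)

# Same magnitude; the sign differs because t.test() computes A minus B
c(lm = lm_t, t.test = tt_t)
```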
But what about the assumptions of our model? Are the residuals normal? We can do a graphical check using a Q-Q plot. These plots compare the observed data’s quantiles (here, those of the model’s studentized residuals) to those expected from a normal distribution.
qqnorm(rstudent(test1), main="Q-Q Plot for Model Residuals")
qqline(rstudent(test1), col="red")
This looks pretty good: the points stay close to the reference line.
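For readers curious about what a normal Q-Q plot actually draws, here is a minimal sketch using simulated values in place of rstudent(test1) (the vector `r` is made up). It builds the two sets of quantiles by hand and checks them against what qqnorm() returns.

```r
set.seed(42)

# Simulated stand-in for the studentized residuals (assumption, not model output)
r <- rnorm(50)

# Observed quantiles: the sorted values
obs <- sort(r)

# Theoretical quantiles: normal quantiles at evenly spread probabilities
theo <- qnorm(ppoints(length(r)))

# qqnorm() returns the same pairs (in the original data order) when not plotting
qq <- qqnorm(r, plot.it = FALSE)
all.equal(sort(qq$x), theo)
all.equal(sort(qq$y), obs)
```

If the values are roughly normal, plotting `obs` against `theo` gives points near a straight line, which is what the red qqline() reference helps judge.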