Hossein FaridNasr
First, we install and load the packages required for the analysis:
install.packages("data.table", repos = "http://cran.us.r-project.org")
install.packages("ggplot2", repos = "http://cran.us.r-project.org")
library(data.table)
library(ggplot2)
After loading the required packages, we use the function fread() to read the data and save it in the variable ‘heights’.
heights <- fread("datafile.csv")
All the data is now in the heights variable. We subset it so that each question gets only the rows it needs and columns 5 to 8, which include `Mean height`. For the first question we will use these variables:
boys_16_heights_2019 <- heights[Sex=="Boys"&
`Age group`==16&
Year==2019,5:8]
boys_16_heights_1990 <- heights[Sex=="Boys"&
`Age group`==16&
Year==1990,5:8]
For the second question we will use these variables:
boys_10_heights_Iran <- heights[Sex=="Boys"&
`Age group`==10&
Country=="Iran",5:8]
girls_10_heights_Iran <- heights[Sex=="Girls"&
`Age group`==10&
Country=="Iran",5:8]
And for the third question we will use these variables:
boys_15_heights_Germany <- heights[Sex=="Boys"&
`Age group`==15&
Country=="Germany",5:8]
boys_15_heights_Egypt <- heights[Sex=="Boys"&
`Age group`==15&
Country=="Egypt",5:8]
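The six subsets above repeat one filter pattern and select columns by position (5:8). As a minimal sketch, the same filtering can be wrapped in a helper that selects `Mean height` by name. The helper name mean_heights and the toy table are hypothetical, and the values are made up. Base-R indexing is used so that the function should behave the same on a data.frame and on a data.table.

```r
# Toy stand-in for the real table; all values are made up for illustration.
toy_heights <- data.frame(
  Country       = c("Iran", "Iran", "Germany", "Germany"),
  Sex           = c("Boys", "Girls", "Boys", "Boys"),
  Year          = c(2019, 2019, 1990, 2019),
  `Age group`   = c(10, 10, 15, 15),
  `Mean height` = c(138.2, 138.9, 172.4, 173.6),
  check.names   = FALSE
)

# Hypothetical helper: return the `Mean height` values for the rows matching
# the given sex and age, optionally restricted to one country and/or one year.
mean_heights <- function(data, sex, age, country = NULL, year = NULL) {
  keep <- data[["Sex"]] == sex & data[["Age group"]] == age
  if (!is.null(country)) keep <- keep & data[["Country"]] == country
  if (!is.null(year))    keep <- keep & data[["Year"]] == year
  data[["Mean height"]][keep]
}

germany_boys_15 <- mean_heights(toy_heights, "Boys", 15, country = "Germany")
print(germany_boys_15)
```

Because the helper returns a plain numeric vector, its result can be passed straight to t.test() without any further column selection.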
Finally, we can begin the hypothesis testing.
Question 1
For the first question, we perform a two-sample t-test comparing the mean heights of 16-year-old boys in 2019 and in 1990. Since t.test() is two-sided by default, the hypotheses being tested are:
H0: mean_2019 = mean_1990
H1: mean_2019 != mean_1990
t.test(boys_16_heights_2019[,`Mean height`],
boys_16_heights_1990[,`Mean height`],var.equal=TRUE)
##
## Two Sample t-test
##
## data: boys_16_heights_2019[, `Mean height`] and boys_16_heights_1990[, `Mean height`]
## t = 4.1452, df = 398, p-value = 4.151e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.123185 3.149662
## sample estimates:
## mean of x mean of y
## 169.8801 167.7437
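The p-value (4.151e-05) is far below 0.05 and the 95 percent confidence interval (1.12 to 3.15) lies entirely above zero, so we reject H0. We can conclude that the mean height of 16-year-old boys was higher in 2019 than in 1990, with sample means of 169.88 and 167.74 respectively.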
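As the "alternative hypothesis" line in the output shows, t.test() runs a two-sided test unless told otherwise. If a directional question is intended (for example, whether heights in 2019 are greater than in 1990), the `alternative` argument of t.test() can be set. The sketch below uses made-up numbers, not the values in datafile.csv, and only illustrates how the two calls differ.

```r
# Made-up heights for illustration only; not taken from datafile.csv.
heights_2019 <- c(170.1, 171.3, 169.8, 172.0, 168.9, 170.6)
heights_1990 <- c(167.2, 168.5, 166.9, 169.1, 167.8, 168.0)

# Default: two-sided alternative (difference in means is not equal to 0).
two_sided <- t.test(heights_2019, heights_1990, var.equal = TRUE)

# Directional alternative: mean of the first sample is greater than the second.
one_sided <- t.test(heights_2019, heights_1990, var.equal = TRUE,
                    alternative = "greater")

print(two_sided$p.value)
print(one_sided$p.value)
```

When the observed difference points in the direction of the alternative, the one-sided p-value is half of the two-sided one, so the choice of alternative should be made before looking at the data.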
Question 2
For the second question, we perform a two-sample t-test between 10-year-old boys and girls in Iran across all years in the data. As before, the test is two-sided:
H0: mean_boys_10_Iran = mean_girls_10_Iran
H1: mean_boys_10_Iran != mean_girls_10_Iran
t.test(boys_10_heights_Iran,
girls_10_heights_Iran,var.equal = TRUE)
##
## Two Sample t-test
##
## data: boys_10_heights_Iran and girls_10_heights_Iran
## t = -0.05926, df = 278, p-value = 0.9528
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -14.23166 13.39984
## sample estimates:
## mean of x mean of y
## 101.4902 101.9061
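The p-value (0.9528) is far above 0.05 and the confidence interval contains zero, so we cannot reject H0: this test shows no significant difference between boys and girls at the age of 10 in Iran. The sample estimate is in fact marginally higher for girls (101.91) than for boys (101.49). Note that the call passes all four selected columns rather than only `Mean height`, so the estimates of about 101 appear to pool the mean heights with the other three columns and should not be read as heights. Running the test on `Mean height` alone, as in Question 1, would give the intended comparison.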
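The call above passes boys_10_heights_Iran and girls_10_heights_Iran whole, that is, with all four columns selected by 5:8. The sketch below illustrates, on a toy table, why pooling every column changes the sample mean. The values and the three column names other than `Mean height` are assumptions made for illustration.

```r
# Toy stand-in for boys_10_heights_Iran; the values and the three column names
# other than `Mean height` are assumptions made for illustration.
toy_subset <- data.frame(
  `Mean height`    = c(138.0, 138.4, 138.8),
  `Lower bound`    = c(136.0, 136.5, 137.0),
  `Upper bound`    = c(140.0, 140.3, 140.6),
  `Standard error` = c(1.0, 0.9, 0.9),
  check.names      = FALSE
)

# Flattening every column mixes heights with interval bounds and standard errors.
pooled <- unlist(toy_subset, use.names = FALSE)

# Selecting one column keeps only the quantity of interest.
mean_only <- toy_subset[["Mean height"]]

print(length(pooled))     # 12 values: 3 rows x 4 columns
print(length(mean_only))  # 3 values
print(mean(pooled))
print(mean(mean_only))
```

For the real data, the analogous call would mirror Question 1: t.test(boys_10_heights_Iran[, `Mean height`], girls_10_heights_Iran[, `Mean height`], var.equal = TRUE). Its output is not shown here because it has not been run against datafile.csv.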
Question 3
For the third question, we perform a two-sample t-test between 15-year-old boys in Germany and in Egypt across all years in the data. As before, the test is two-sided:
H0: mean_boys_15_Germany = mean_boys_15_Egypt
H1: mean_boys_15_Germany != mean_boys_15_Egypt
t.test(boys_15_heights_Germany,
boys_15_heights_Egypt,var.equal = TRUE)
##
## Two Sample t-test
##
## data: boys_15_heights_Germany and boys_15_heights_Egypt
## t = 1.1834, df = 278, p-value = 0.2377
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.800299 27.298957
## sample estimates:
## mean of x mean of y
## 130.7986 120.5492
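The sample estimate is higher for Germany (130.80) than for Egypt (120.55), but the p-value (0.2377) is above 0.05 and the confidence interval contains zero, so this test does not let us reject H0. As in Question 2, the call passes all four selected columns rather than only `Mean height`, which appears to pool the mean heights with the other columns and inflates the variance. Running the test on `Mean height` alone would be needed before concluding that boys in Germany are in general taller than boys in Egypt.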
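The decision for Question 3 can be checked directly from the numbers printed in the output above. The sketch below assumes the conventional significance level of 0.05, which the report does not state explicitly.

```r
# Values copied from the t.test output printed above for Question 3.
p_value  <- 0.2377
conf_int <- c(-6.800299, 27.298957)
alpha    <- 0.05  # assumed significance level

# Reject H0 only when the p-value falls below alpha.
reject_null <- p_value < alpha

# For a two-sided test, this agrees with checking whether the interval covers 0.
interval_contains_zero <- conf_int[1] <= 0 && conf_int[2] >= 0

print(reject_null)             # FALSE
print(interval_contains_zero)  # TRUE
```

Both checks point the same way: at the 0.05 level, the printed output does not support a significant difference.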