summary(GPA)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 3.300 3.244 3.500 4.000
hist(GPA)
boxplot(GPA)
From this output, the sample GPAs center around 3.2 to 3.3 (mean 3.244, median 3.3), and the middle half of students falls between 3.0 and 3.5. The boxplot also flags outliers at the low end of the distribution; the lowest GPA in the sample is 2.0.
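As a check on where the boxplot's outlier cutoffs fall, the sketch below applies the 1.5 × IQR rule that boxplot() uses by default (its `range = 1.5` argument). It plugs in the quartiles printed by summary(GPA). This assumes a sample of n = 253 students, inferred from df = 252 in the t-test below. For that sample size, the hinges boxplot() computes coincide with these quartiles.

```r
# Quartiles copied from the summary(GPA) output above.
# Assumption: n = 253, for which boxplot()'s hinges equal these quartiles.
q1 <- 3.0
q3 <- 3.5
iqr <- q3 - q1

# boxplot() flags points more than 1.5 * IQR beyond the box (default range = 1.5)
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
print(c(lower = lower_fence, upper = upper_fence))
```

The fences are 2.25 and 4.25. The maximum GPA of 4.0 is inside the upper fence, so every flagged point lies below 2.25, at the low end of the distribution.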
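We run a one-sample t-test of the null hypothesis that the true mean GPA is 3.1. The output reports the t statistic, the p-value, and a 95% confidence interval for the mean. We first use a two-sided alternative, then the one-sided alternative that the mean is greater than 3.1.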
t.test(GPA, mu=3.1)
##
## One Sample t-test
##
## data: GPA
## t = 5.6574, df = 252, p-value = 4.164e-08
## alternative hypothesis: true mean is not equal to 3.1
## 95 percent confidence interval:
## 3.193737 3.293852
## sample estimates:
## mean of x
## 3.243794
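To show where these numbers come from, the sketch below recomputes the 95% confidence interval and the two-sided p-value from the reported mean, t statistic, and degrees of freedom. The raw data are not shown here, so it backs the standard error out of t = (x̄ - μ0) / SE instead. The results therefore match only up to the rounding in the printed output.

```r
# Values copied from the two-sided t.test(GPA, mu = 3.1) output above
x_bar  <- 3.243794   # sample mean
mu0    <- 3.1        # hypothesized mean under H0
t_stat <- 5.6574     # reported t statistic
df     <- 252        # degrees of freedom, n - 1

# t = (x_bar - mu0) / SE, so the standard error can be backed out
se <- (x_bar - mu0) / t_stat

# Two-sided 95% confidence interval: x_bar +/- t_{0.975, df} * SE
ci <- x_bar + c(-1, 1) * qt(0.975, df) * se

# Two-sided p-value: probability of |T| >= |t| when H0 is true
p_two <- 2 * pt(-abs(t_stat), df)

print(ci)
print(p_two)
```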
t.test(GPA, mu=3.1, alternative ="greater")
##
## One Sample t-test
##
## data: GPA
## t = 5.6574, df = 252, p-value = 2.082e-08
## alternative hypothesis: true mean is greater than 3.1
## 95 percent confidence interval:
## 3.201833 Inf
## sample estimates:
## mean of x
## 3.243794
With a p-value of 2.082e-08 (effectively 0), far below the 0.05 significance level, we reject the null hypothesis. The data provide strong evidence that the mean GPA at college X is greater than 3.1.
From the two-sided test, we can be 95% confident that the true mean GPA is between 3.194 and 3.294.
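The one-sided output relates to the two-sided one in a simple way. When t > 0, the "greater" p-value is half the two-sided p-value. The one-sided interval keeps only a lower bound, built from the 0.95 quantile instead of the 0.975 quantile. The sketch below reproduces both from the printed values, again backing the standard error out of the t statistic. It is an illustration under that assumption, not a rerun on the raw data.

```r
# Values copied from the t.test(GPA, mu = 3.1, alternative = "greater") output
x_bar  <- 3.243794
mu0    <- 3.1
t_stat <- 5.6574
df     <- 252
se     <- (x_bar - mu0) / t_stat   # standard error backed out of t

# One-sided p-value: P(T >= t) when H0 is true
p_one <- pt(t_stat, df, lower.tail = FALSE)

# One-sided 95% lower confidence bound: x_bar - t_{0.95, df} * SE
lower_bound <- x_bar - qt(0.95, df) * se

# For t > 0 the two-sided p-value is exactly twice the one-sided one
p_two <- 2 * pt(-abs(t_stat), df)

print(c(p_one = p_one, p_two = p_two, lower_bound = lower_bound))
```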
Does the sample provide evidence of a significant difference between the proportions of male and female students who report high stress?
plot(table(Gender, Stress))
Based on this plot, a larger share of female students than male students appear to report high stress. This is only an initial observation, so we run a two-sample test for equality of proportions to check whether the difference could plausibly be due to chance.
prop.test(table(Gender, Stress))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: table(Gender, Stress)
## X-squared = 1.5843, df = 1, p-value = 0.2081
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.03434101 0.18471110
## sample estimates:
## prop 1 prop 2
## 0.2516556 0.1764706
The p-value of 0.208 is higher than our 0.05 significance level, so we fail to reject the null hypothesis. There is not enough evidence to conclude that the proportion of high-stress students differs between men and women at college X.
We can be 95% confident that the true difference in the proportion of high-stress students between the two genders at college X is between -0.034 and 0.185. Zero is inside this interval, so it is plausible that there is no difference at all.
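The reported proportions pin down the underlying counts: 0.2516556 = 38/151 and 0.1764706 = 18/102. The group sizes add up to the 253 students in the GPA test. These counts are our reconstruction, not figures stated in the original, and which gender is row 1 depends on how Gender is coded. The sketch reruns prop.test() on them. It then computes the same continuity-corrected (Yates) chi-square statistic by hand for the 2x2 table.

```r
# Counts reconstructed from the reported proportions (our inference):
# 0.2516556 = 38/151 and 0.1764706 = 18/102, with 151 + 102 = 253 students.
high_stress <- c(38, 18)     # high-stress students in rows 1 and 2 of table(Gender, Stress)
group_size  <- c(151, 102)   # total students in each row

# Two-sample test for equality of proportions (continuity correction is the default)
res <- prop.test(high_stress, group_size)
print(res)

# Same statistic by hand: Yates-corrected chi-square for the 2x2 table
tab <- cbind(high = high_stress, normal = group_size - high_stress)
n <- sum(tab)
chi_sq <- n * (abs(tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) - n / 2)^2 /
  prod(rowSums(tab), colSums(tab))
print(chi_sq)
```

The confidence interval prop.test() reports is for prop 1 minus prop 2, which is why it is centered on 0.252 - 0.176 ≈ 0.075.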
Does the sample provide evidence of a significant difference between the average GPA of normal-stress students and the average GPA of high-stress students at college X?
boxplot(GPA~Stress)
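The boxplot suggests that high-stress students tend to have higher GPAs than normal-stress students: both the median GPA and the interquartile range (IQR) box sit higher for the high-stress group.
We run a two-sample t-test for the difference in means at the 0.05 significance level. By default, R's t.test() uses Welch's version, which does not assume the two groups have equal variances.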
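To see what Welch's test computes, the sketch below calculates the Welch t statistic and the Welch-Satterthwaite degrees of freedom by hand and checks them against t.test(). This approximation is why the output reports a non-integer df (102.11). The raw GPA data are not shown here, so it uses simulated, hypothetical GPA-like values and group sizes, not the college X sample.

```r
set.seed(1)
# Hypothetical data: simulated GPAs for two groups, not the college X sample
high   <- round(runif(50, 2.5, 4.0), 2)
normal <- round(runif(200, 2.0, 4.0), 2)

welch <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  list(t = t_stat, df = dof, p = 2 * pt(-abs(t_stat), dof))
}

manual   <- welch(high, normal)
built_in <- t.test(high, normal)   # Welch by default (var.equal = FALSE)
print(unlist(manual))
print(built_in)
```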
t.test(GPA~Stress)
##
## Welch Two Sample t-test
##
## data: GPA by Stress
## t = 2.8397, df = 102.11, p-value = 0.005451
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
## 0.04741781 0.26711265
## sample estimates:
## mean in group high mean in group normal
## 3.366250 3.208985
The p-value of 0.005 is lower than the 0.05 significance level, so we reject the null hypothesis that the mean GPA is the same for high-stress and normal-stress students at college X.
We can be 95% confident that the true difference in mean GPA (high-stress minus normal-stress) is between 0.047 and 0.267. Zero is not in this interval, so it is not plausible that there is no difference.
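The sketch below reconstructs the Welch confidence interval from the printed group means, t statistic, and df. It backs the standard error out of t = (difference in means) / SE, an assumption that reproduces the output only up to its printed rounding.

```r
# Values copied from the t.test(GPA ~ Stress) output above
mean_high   <- 3.366250
mean_normal <- 3.208985
t_stat      <- 2.8397
df          <- 102.11   # Welch-Satterthwaite degrees of freedom

diff <- mean_high - mean_normal   # estimate of mu_high - mu_normal
se   <- diff / t_stat             # standard error backed out of t = diff / SE

# 95% confidence interval and two-sided p-value
ci    <- diff + c(-1, 1) * qt(0.975, df) * se
p_two <- 2 * pt(-abs(t_stat), df)

print(ci)
print(p_two)
```

Both endpoints are positive, which is the same conclusion as the p-value below 0.05.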
Our first test shows that the mean GPA at college X is greater than 3.1. The second test found no significant difference between the proportions of male and female students who report high stress at college X. The third test shows that high-stress students have a significantly higher mean GPA than normal-stress students, indicating an association between stress level and GPA.