College students at a large state university completed a survey about their academic and personal life. Questions ranged from “How many credits are you registered for this semester?” to “Would you define yourself as a vegetarian?” Four sections of an introductory statistics course were chosen at random from all the sections of introductory statistics courses offered at the university in the semester when the survey was conducted, and the 312 students who completed the survey were students registered in one of the four chosen sections.
Each row of the data represents one state university student enrolled in one of the four selected introductory statistics sections. Each variable records that student's answer to one of the survey questions.
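rm(list = ls())              # start from a clean workspace
load("cell_phones.RData")    # loads the survey data frame `data`
x <- data
x <- na.omit(x)              # keep only students with complete responses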
head(x)
## Math Verbal Credits Year Exer Sleep Veg. Cell
## 1 640 470 15 1 60 7.0 no yes
## 2 660 650 14 1 20 7.5 no yes
## 3 550 580 15 2 0 9.0 no no
## 4 560 660 16 1 30 7.0 no yes
## 5 600 790 15 4 45 6.5 some no
## 6 560 640 16 2 75 4.5 yes no
hist(x$Verbal, main="Distribution of sample verbal SAT scores", xlab="Verbal SAT Score")
summary(x$Verbal)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 400.0 542.5 590.0 596.7 650.0 800.0
sd(x$Verbal)
## [1] 77.69806
The sampled students’ verbal SAT scores are approximately normally distributed, with a mean of about 597 and a standard deviation of about 78. We now test whether the mean verbal score of these students differs from that of the general university population.
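Written out, the test uses the population values set in the code (a hypothesized mean of 580 and a known standard deviation of 111). The sample size n = 270 is not printed in this part of the report; it is inferred here from the 269 degrees of freedom reported for the t test on the same cleaned data, so treat it as an assumption.

$$
H_0: \mu = 580 \qquad \text{versus} \qquad H_a: \mu \neq 580
$$

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{596.7 - 580}{111 / \sqrt{270}} \approx \frac{16.7}{6.755} \approx 2.47
$$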
sigma = 111
u0 = 580
z = (mean(x$Verbal) - u0) / ( (sigma) / (sqrt(length(x$Verbal))))
paste("Z = ", round(z,2))
## [1] "Z = 2.47"
paste("p = ", round(2*pnorm(abs(z), lower.tail=FALSE),2))
## [1] "p = 0.01"
We performed a one-sample Z test and obtained Z = 2.47 with a two-sided p-value of 0.01.
Therefore, we reject the null hypothesis and conclude that the mean verbal SAT score of the statistics students is significantly different from the mean score of the university's entire student body.
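As a cross-check, the same decision can be read off a confidence interval. The sketch below works only from the summary values printed above rather than from the data file. The sample size n = 270 is an assumption inferred from the t test's 269 degrees of freedom, and the mean is the rounded value shown by summary().

```r
# Summary values copied from the output above. n = 270 is inferred from the
# t test's df = 269 on the same cleaned data frame (an assumption).
xbar  <- 596.7   # sample mean verbal score, as rounded by summary()
sigma <- 111     # population standard deviation used in the test
u0    <- 580     # hypothesized population mean
n     <- 270

se <- sigma / sqrt(n)                          # standard error of the mean
z  <- (xbar - u0) / se                         # same statistic as in the test
p  <- 2 * pnorm(abs(z), lower.tail = FALSE)    # two-sided p-value

# 95% confidence interval for the mean when sigma is treated as known
ci <- xbar + c(-1, 1) * qnorm(0.975) * se

round(c(z = z, p = p), 3)
round(ci, 1)
```

If the interval lies entirely above 580, the two-sided test at the 5% level rejects the null hypothesis, which matches the conclusion above.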
We start by calculating the relative frequency of cell phone ownership among the sampled students.
tbl = table(x$Cell)
tbl2 = round(100*tbl/sum(tbl),2)
y = c(tbl2[1], tbl2[2])
names(y) <- c("% Doesn't Own", "% Owns")
paste("Phone ownership amongst sample")
## [1] "Phone ownership amongst sample"
y
## % Doesn't Own % Owns
## 21.85 78.15
Then, we can visualize these results.
pie(tbl2, labels=c(paste(tbl2[1], "% Do not own a phone"),
paste(tbl2[2], "% Own a phone")),
main="Phone Ownership Amongst Students")
Now we can compare these results with the cell phone ownership rate among university students in the US, taken here to be 80%, to see whether this university’s students own phones at a lower rate.
p0 = .8
n = length(x$Cell)
np = sum(x$Cell == "yes")   # number of students who own a cell phone
prop.test(np, n, p0, alternative="less", correct=FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: np out of n, null probability p0
## X-squared = 0.5787, df = 1, p-value = 0.2234
## alternative hypothesis: true p is less than 0.8
## 95 percent confidence interval:
## 0.0000000 0.8199443
## sample estimates:
## p
## 0.7814815
z = (0.7814815 - p0)/sqrt(p0*(1-p0)/n)
z
## [1] -0.760725
We performed a one-sample Z test for a proportion and obtained z = -0.76 with a one-sided p-value of 0.2234.
Given this high p-value, we fail to reject the null hypothesis. Although the sample proportion (0.78) is lower than 0.8, the difference is not large enough to be considered statistically significant.
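The link between the hand-computed z and the X-squared value reported by prop.test() can be checked directly. This is a minimal sketch that uses counts reconstructed from the printed output rather than the data file: n = 270 and np = 211 are assumptions, chosen because 211/270 reproduces the printed estimate 0.7814815.

```r
# Counts reconstructed from the printed output (assumptions, not read from data)
n  <- 270    # sample size, inferred from df = 269 in the t test
np <- 211    # phone owners; 211 / 270 prints as 0.7814815
p0 <- 0.8    # hypothesized ownership proportion

p_hat <- np / n
se0   <- sqrt(p0 * (1 - p0) / n)    # standard error under the null hypothesis
z     <- (p_hat - p0) / se0
p_val <- pnorm(z)                   # lower tail, matching alternative = "less"

# Without continuity correction, prop.test() reports z squared as X-squared
fit <- prop.test(np, n, p0, alternative = "less", correct = FALSE)

round(c(z = z, z_squared = z^2, p_value = p_val), 4)
unname(fit$statistic)
fit$p.value
```

The squared z statistic and the one-sided p-value should agree with the prop.test() output, so the two approaches are the same test expressed in different forms.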
hist(x$Sleep, main="Distribution of sample hours slept",
xlab="Hours slept in a typical day")
summary(x$Sleep)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 6.500 7.000 7.239 8.000 15.000
sd(x$Sleep)
## [1] 1.434884
u0 = 7
t.test(x$Sleep, alternative="two.sided", mu=u0)
##
## One Sample t-test
##
## data: x$Sleep
## t = 2.7357, df = 269, p-value = 0.00664
## alternative hypothesis: true mean is not equal to 7
## 95 percent confidence interval:
## 7.066963 7.410815
## sample estimates:
## mean of x
## 7.238889
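We performed a one-sample t test and obtained t = 2.7357 with 269 degrees of freedom and a p-value of 0.00664.
Given this small p-value, we reject the null hypothesis and conclude that the mean number of hours slept by statistics students at this university is significantly different from the 7 hours used as the reference value for US adults.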
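The t.test() output can be reproduced from the summary statistics alone, which makes the calculation behind it explicit. The sketch below uses the printed mean and standard deviation; n = 270 is taken as the reported degrees of freedom plus one. Because the inputs are rounded printed values, small differences in the last digits are expected.

```r
# Summary statistics copied from the output above
xbar <- 7.238889   # sample mean hours of sleep
s    <- 1.434884   # sample standard deviation
n    <- 270        # df + 1, from the t.test() output
u0   <- 7          # hypothesized mean

se     <- s / sqrt(n)                        # estimated standard error
t_stat <- (xbar - u0) / se
p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# 95% confidence interval based on the t distribution
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se

round(c(t = t_stat, p = p_val), 4)
round(ci, 3)
```

Unlike the verbal score test, this one estimates the standard deviation from the sample, which is why it uses the t distribution with n - 1 degrees of freedom rather than the normal distribution.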