Background

College students at a large state university completed a survey about their academic and personal life. Questions ranged from “How many credits are you registered for this semester?” to “Would you define yourself as a vegetarian?” Four sections of an introductory statistics course were chosen at random from all the sections of introductory statistics courses offered at the university in the semester when the survey was conducted, and the 312 students who completed the survey were students registered in one of the four chosen sections.

Raw Data

Each row of the data is one student enrolled in one of the four selected introductory statistics sections. Each variable records that student's answer to one of the survey questions. Rows with any missing answer are dropped before analysis.

rm(list=ls())
load("cell_phones.RData")
x<-data
x<-na.omit(x)
head(x)
##   Math Verbal Credits Year Exer Sleep Veg. Cell
## 1  640    470      15    1   60   7.0   no  yes
## 2  660    650      14    1   20   7.5   no  yes
## 3  550    580      15    2    0   9.0   no   no
## 4  560    660      16    1   30   7.0   no  yes
## 5  600    790      15    4   45   6.5 some   no
## 6  560    640      16    2   75   4.5  yes   no

Q1. The mean verbal SAT score of all the students in this university is 580. Is this also the case for all stat students at this university? Note that verbal SAT scores in the U.S. have a standard deviation of 111.

hist(x$Verbal, main="Distribution of sample verbal SAT scores", xlab="Verbal SAT Score")

summary(x$Verbal)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   400.0   542.5   590.0   596.7   650.0   800.0
sd(x$Verbal)
## [1] 77.69806

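The sampled students’ verbal SAT scores have a mean of about 597 and a standard deviation of about 78. The median (590) is close to the mean, which is consistent with a roughly symmetric distribution. We will proceed to test whether the mean verbal score of stat students differs from that of the general university population, using the stated population standard deviation of 111.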

sigma = 111
u0 = 580

z = (mean(x$Verbal) - u0) / ( (sigma) / (sqrt(length(x$Verbal))))

paste("Z = ", round(z,2))
## [1] "Z =  2.47"
paste("p = ", round(2*pnorm(abs(z), lower.tail=FALSE),2))
## [1] "p =  0.01"

We performed a one-sample Z test, with the following results:

  • Null Hypothesis: mean verbal SAT score of all stat students == 580
  • Alternative Hypothesis: mean verbal SAT score of all stat students =/= 580
  • Test Statistic: z = 2.47
  • P Value: p = .01

Therefore, at the 5% significance level we reject the null hypothesis and conclude that the mean verbal SAT score of stat students is significantly different from the mean score for the entire student body at this university.
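The Z test above can also be reproduced from the summary statistics alone. The following is a minimal sketch, assuming a sample size of n = 270 (inferred from the df = 269 reported by the t-test on the same cleaned data, not printed directly) and the rounded sample mean 596.7 from summary(). The helper z_test_known_sigma is a hypothetical name introduced for illustration and is not part of the original analysis. It also reports a 95% confidence interval for the mean under the known sigma.

```r
# Hypothetical helper: one-sample Z test with a known population sd.
z_test_known_sigma <- function(xbar, mu0, sigma, n, conf = 0.95) {
  se   <- sigma / sqrt(n)                         # standard error of the mean
  z    <- (xbar - mu0) / se                       # test statistic
  p    <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value
  half <- qnorm(1 - (1 - conf) / 2) * se          # half-width of the interval
  list(z = z, p.value = p, conf.int = c(xbar - half, xbar + half))
}

# Assumed inputs: xbar is rounded from summary(); n is inferred, not printed.
res <- z_test_known_sigma(xbar = 596.7, mu0 = 580, sigma = 111, n = 270)
round(res$z, 2)   # agrees with the z = 2.47 reported above
res$conf.int      # 95% interval for the mean verbal score
```

Under these assumptions the interval lies entirely above 580, which is consistent with rejecting the null hypothesis in the two-sided test.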

Q2. Based on a recent study, roughly 80% of college students in the U.S. own a cell phone. Do the data provide evidence that the proportion of students who own cell phones in this university is lower than the national figure?

We start by calculating the relative frequency of phone ownership among the sampled students:

tbl = table(x$Cell)
tbl2 = round(100*tbl/sum(tbl),2)
y = c(tbl2[1], tbl2[2])
names(y) <- c("% Doesn't Own", "% Owns")
paste("Phone ownership amongst sample")
## [1] "Phone ownership amongst sample"
y
## % Doesn't Own        % Owns 
##         21.85         78.15

Then, we can visualize these results.

pie(tbl2, labels=c(paste(tbl2[1], "%  Do not own a phone"), 
                  paste(tbl2[2], "%  Own a phone")),
    main="Phone Ownership Amongst Students")

Now we can compare these results with the ownership proportion among college students in the U.S., to see whether this university’s students own phones at a lower rate.

p0 = .8
n = length(x$Cell)
np = length(x$Cell[x$Cell == "yes"])
prop.test(np, n, p0, alternative="less", correct=FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  np out of n, null probability p0
## X-squared = 0.5787, df = 1, p-value = 0.2234
## alternative hypothesis: true p is less than 0.8
## 95 percent confidence interval:
##  0.0000000 0.8199443
## sample estimates:
##         p 
## 0.7814815
z = (0.7814815 - p0)/sqrt(p0*(1-p0)/n)
z
## [1] -0.760725

We performed a one-sample Z test for proportions, with the following results:

  • Null Hypothesis: proportion of phone owners among this university's students == .8
  • Alternative Hypothesis: proportion of phone owners among this university's students < .8
  • Test Statistic: z = -.76
  • P Value: p = .22

Given the large p-value, we fail to reject the null hypothesis. Although the sample proportion (.78) is lower than .8, the difference is not large enough to be statistically significant.
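The link between the X-squared statistic reported by prop.test and the hand-computed z can be checked directly. This is a rough sketch, assuming counts of 211 owners out of 270 students; both numbers are inferred from the printed output (the sample proportion 0.7814815 and the df = 269 in the sleep t-test) rather than read from the data. It also checks the usual large-sample condition for the normal approximation.

```r
# Assumed counts, inferred from the printed output rather than read from the data.
n  <- 270    # students remaining after na.omit
np <- 211    # students answering "yes" to Cell
p0 <- 0.8    # national proportion under the null hypothesis

# Large-sample condition: both expected counts should be at least 10.
expected <- c(n * p0, n * (1 - p0))
all(expected >= 10)

# z statistic and one-sided (lower-tail) p-value.
z      <- (np / n - p0) / sqrt(p0 * (1 - p0) / n)
p_less <- pnorm(z)

# Without continuity correction, prop.test's X-squared equals z^2.
pt_res <- prop.test(np, n, p = p0, alternative = "less", correct = FALSE)
c(z = z, z_squared = z^2, X_squared = unname(pt_res$statistic),
  p_from_z = p_less, p_from_prop_test = pt_res$p.value)
```

Because the statistic is squared, X-squared alone hides the direction of the difference; the sign of z shows that the sample proportion falls below .8.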

Q3. Adults in the U.S. average 7 hours of sleep a night. Is this also the mean for all stat students at this university?

hist(x$Sleep, main="Distribution of sample hours slept", 
     xlab="Hours slept in a typical day")

summary(x$Sleep)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   6.500   7.000   7.239   8.000  15.000
sd(x$Sleep)
## [1] 1.434884
u0 = 7
t.test(x$Sleep, alternative="two.sided", mu=u0)
## 
##  One Sample t-test
## 
## data:  x$Sleep
## t = 2.7357, df = 269, p-value = 0.00664
## alternative hypothesis: true mean is not equal to 7
## 95 percent confidence interval:
##  7.066963 7.410815
## sample estimates:
## mean of x 
##  7.238889

We performed a one-sample t test:

  • Null Hypothesis: mean hours slept by all stat students == 7
  • Alternative Hypothesis: mean hours slept by all stat students =/= 7
  • Test Statistic: t = 2.74
  • P Value: p = .007

Given the small p-value, we reject the null hypothesis and conclude that the mean number of hours stat students at this university sleep per night is significantly different from the 7-hour average for U.S. adults.
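The t statistic and confidence interval reported by t.test can be reconstructed from the printed summary statistics. This is a minimal sketch, assuming n = 270 (taken as df + 1, not printed directly) and using the mean and standard deviation shown in the output above; the variable names are illustrative.

```r
xbar <- 7.238889   # sample mean, from the t.test output
s    <- 1.434884   # sample standard deviation, from sd(x$Sleep)
n    <- 270        # assumed: df + 1
mu0  <- 7          # hypothesized mean (U.S. adult average)

se     <- s / sqrt(n)                                          # standard error
t_stat <- (xbar - mu0) / se                                    # t statistic
p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)  # two-sided p-value
ci     <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se         # 95% interval

round(c(t = t_stat, p = p_val, lower = ci[1], upper = ci[2]), 4)
```

The estimated difference is about 0.24 hours, or roughly 14 minutes per night, so the result is statistically significant but modest in size.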