head(`SleepStudy.(1)`)
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
attach(`SleepStudy.(1)`)
hist(DASScore)
summary(DASScore)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 7.00 16.00 20.04 28.00 82.00
# We estimate the population mean DASScore, $\mu$
# 1- by point estimate
xbar <- mean(DASScore)
xbar
## [1] 20.03953
#2- by confidence intervals
s <- sd(DASScore)
n <- length(DASScore)
s
## [1] 16.54187
n
## [1] 253
2 - Verify the conditions that ensure the sampling distribution of \(\bar x\) is approximately normal:
#check sample size and skewness
n>30
## [1] TRUE
# The distribution is right-skewed
hist(DASScore)
# Compute the standard error
standard_error <- function(s, n) {
  # s: sample standard deviation
  # n: sample size
  return(s/sqrt(n))
}
se <- standard_error(16.54187, 253)
se
## [1] 1.039978
# Compute the t* value for 95% confidence interval.
tstar <- qt(0.975,df = n-1)
tstar
## [1] 1.969422
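The 0.975 quantile is used because a 95% interval leaves 2.5% in each tail. As a small sketch (the name `zstar` is introduced here only for illustration), the t* value can be compared with its normal counterpart; with 252 degrees of freedom the two are close, and t* is slightly larger:

```r
n <- 253                          # sample size reported above
tstar <- qt(0.975, df = n - 1)    # upper 2.5% cutoff of the t distribution, 252 df
zstar <- qnorm(0.975)             # upper 2.5% cutoff of the standard normal
c(tstar = tstar, zstar = zstar)
```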
# Compute the confidence interval (can use the function above!)
ci_function <- function(xbar, s, n, tstar) {
  # xbar: sample mean
  # s: sample standard deviation
  # n: sample size
  # tstar: t* computed using qt() for confidence interval
  lower <- xbar - tstar * standard_error(s, n)
  upper <- xbar + tstar * standard_error(s, n)
  return(c(lower, upper))
}
xbar <- 20.03953
s <- 16.54187
n <- 253
tstar <- 1.969422
ci_function(xbar, s, n, tstar)
## [1] 17.99137 22.08769
With 95% confidence, we estimate that the average combined depression, anxiety, and stress score (DASScore) for this population of college students is between 17.99 and 22.09.
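For reference, the arithmetic behind this interval can be written out from the values printed above (\(\bar x = 20.03953\), \(s = 16.54187\), \(n = 253\), \(t^* = 1.969422\)); the symbol \(SE\) stands for the standard error:

\[
SE = \frac{s}{\sqrt{n}} = \frac{16.54187}{\sqrt{253}} \approx 1.039978
\]

\[
\bar x \pm t^* \cdot SE = 20.03953 \pm 1.969422 \times 1.039978 \approx 20.03953 \pm 2.04816 = (17.99137,\ 22.08769)
\]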
#Check missing data
is.na(DASScore)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE
sum(is.na(DASScore)) #no missing data
## [1] 0
#Point estimate and sample size
n <- length(DASScore)
x.bar <- mean(DASScore)
#Bootstrap
#10000 simulations
boot.xbars <- c() #initializing vector
for (b in 1:10000){
boot.samp <- sample(DASScore,n,replace=TRUE)
boot.xbar <- mean(boot.samp)
boot.xbars <- c(boot.xbar,boot.xbars)
}
se <- sd(boot.xbars)
#95% confidence interval for the mean DASScore of college students
#Chop 2.5% on each end
CI.lower <- sort(boot.xbars)[250] #2.5%
CI.upper <- sort(boot.xbars)[9750] #97.5%
#CI
c(CI.lower, CI.upper)
## [1] 18.03953 22.08696
With 95% confidence, we estimate that the average combined depression, anxiety, and stress score (DASScore) for this population of college students is between 18.04 and 22.09.
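The percentile bootstrap above can also be wrapped in a reusable function. The following is only a sketch under stated assumptions: `boot_mean_ci` is a hypothetical helper name, and the demo runs on synthetic right-skewed scores, not the SleepStudy data. It uses `quantile()` instead of indexing the sorted vector, so its endpoints can differ slightly from `sort(boot.xbars)[250]` because `quantile()` interpolates by default.

```r
# Percentile bootstrap CI for a mean (hypothetical helper, for illustration)
boot_mean_ci <- function(x, B = 10000, level = 0.95) {
  n <- length(x)
  # Resample with replacement B times and record each resample's mean
  boot.xbars <- replicate(B, mean(sample(x, n, replace = TRUE)))
  alpha <- 1 - level
  # Cut alpha/2 from each tail of the bootstrap distribution
  ci <- quantile(boot.xbars, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(se = sd(boot.xbars), ci = ci)
}

# Synthetic right-skewed scores for illustration only (NOT the SleepStudy data)
set.seed(1)
fake.scores <- rexp(253, rate = 1 / 20)
res <- boot_mean_ci(fake.scores, B = 2000)
res$ci
```

With the real data attached, the call would be `boot_mean_ci(DASScore)`; because resampling is random, the endpoints change a little from run to run unless a seed is set.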
An estimate for the proportion of college students who pulled all-nighters is \(\hat p = 0.13\).

\(H_0:\) The majority of college students in this population pull all-nighters (\(p \ge 0.5\), tested at the null value \(p_0 = 0.5\)).

\(H_A:\) The majority of college students in this population do not pull all-nighters (\(p < 0.5\)).

Significance level: \(\alpha = 0.05\).
```r
attach(`SleepStudy.(1)`)
```
```
## The following objects are masked from SleepStudy.(1) (pos = 3):
##
## AlcoholUse, AllNighter, AnxietyScore, AnxietyStatus, AverageSleep,
## ClassesMissed, ClassYear, CognitionZscore, DASScore,
## DepressionScore, DepressionStatus, Drinks, EarlyClass, Gender, GPA,
## Happiness, LarkOwl, NumEarlyClass, PoorSleepQuality, Stress,
## StressScore, WeekdayBed, WeekdayRise, WeekdaySleep, WeekendBed,
## WeekendRise, WeekendSleep
```
```r
table(AllNighter)
```
```
## AllNighter
## 0 1
## 219 34
```
```r
prop.table(table(AllNighter))
```
```
## AllNighter
## 0 1
## 0.8656126 0.1343874
```
```r
n<- length(AllNighter)
n
```
```
## [1] 253
```
```r
k <- sum(AllNighter=="1")
k
```
```
## [1] 34
```
```r
prop.test(k,n)
```
```
##
## 1-sample proportions test with continuity correction
##
## data: k out of n, null probability 0.5
## X-squared = 133.82, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.09609494 0.18412247
## sample estimates:
## p
## 0.1343874
```
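To see where the reported statistic comes from, here is a minimal sketch that rebuilds the continuity-corrected X-squared from the counts in the table above (34 of 253). The names `p0`, `x.squared`, and `one.sided` are introduced only for this illustration. It also runs the one-sided version of the test, since the default `prop.test()` call is two-sided.

```r
k <- 34     # students who pulled an all-nighter, from table(AllNighter)
n <- 253    # sample size
p0 <- 0.5   # null value: the boundary of "a majority"

# Continuity-corrected chi-square statistic, as reported by prop.test()
x.squared <- (abs(k - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
round(x.squared, 2)
## [1] 133.82

# One-sided test of H_A: p < 0.5
one.sided <- prop.test(k, n, p = p0, alternative = "less")
one.sided$p.value
```

The hand-computed value agrees with the X-squared of 133.82 in the output above, and the one-sided p-value is likewise far below 0.05.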
```r
# Compute the standard error, test statistic, and p-value by hand.
# (Left commented out; prop.test() above already reports the test.)
# Define a function to calculate the test statistic
#test_statistic <- function(point_estimate, null_value, se){
#  ans <- (point_estimate - null_value)/se
#  return(ans)
#}
# Compute the test statistic and p-value for the null value p = 0.5
#p.hat <- k/n                     # 34 of 253 students pulled an all-nighter
#se.p <- sqrt(0.5*(1 - 0.5)/n)    # standard error under the null
#z.stat <- test_statistic(point_estimate = p.hat, null_value = 0.5, se = se.p)
#z.stat
#2*pnorm(abs(z.stat), lower.tail = FALSE)
# Distribution of average sleep
hist(AverageSleep)
```
<img src="Homework10_files/figure-html/unnamed-chunk-5-1.png" width="672" />
```r
AllNighter1 <- factor(AllNighter,levels = c(0,1),labels = c("Not Pulled","pulled"))
barplot(table(AllNighter1),main = "college students Prop.", col="cyan")
```
<img src="Homework10_files/figure-html/unnamed-chunk-5-2.png" width="672" />
About 87% of the students in the sample (219 of 253) did not pull an all-nighter, so the majority of students in this sample did not pull all-nighters.
About 13% (34 of 253) pulled an all-nighter, which is less than a quarter of the sample, as the bar plot shows.
We have strong evidence that the majority of college students in this population did not pull all-nighters: the p-value (< 2.2e-16) is far below the 0.05 significance level, so we reject the null hypothesis, and the 95% confidence interval for the proportion (0.096 to 0.184) lies entirely below 0.5.