Below are the answers and commands that correspond to the handout from class.
Load the data:
sleep<-read.file("/home/emesekennedy/Data/Ch9/sleep.txt")
## Reading data with read.table()
Require the reshape2 package:
require(reshape2)
## Loading required package: reshape2
Create a two-way table:
acast(sleep, EnoughSleep~Exercise)
## Using Count as value column: use value.var to override.
##     High Low
## No   148 242
## Yes  151 115
Save the two-way table as sleeptable:
sleeptable<-acast(sleep, EnoughSleep~Exercise)
## Using Count as value column: use value.var to override.
Add the optional arguments fun.aggregate and margins to display the row and column sums:
acast(sleep, EnoughSleep~Exercise,fun.aggregate=sum, margins=T)
## Using Count as value column: use value.var to override.
##       High Low (all)
## No     148 242   390
## Yes    151 115   266
## (all)  299 357   656
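As an alternative sketch, base R can add the same margins with addmargins(). The counts are typed in by hand here (copied from the table above) so that the block runs without the data file; the name counts is only an illustration and is not part of the handout.

```r
# Counts copied from the two-way table above (entered by hand for illustration)
counts <- matrix(c(148, 151, 242, 115), nrow = 2,
                 dimnames = list(EnoughSleep = c("No", "Yes"),
                                 Exercise = c("High", "Low")))

# addmargins() appends a "Sum" row and a "Sum" column to the table
with_margins <- addmargins(counts)
with_margins
```

The row and column labelled Sum play the same role as the (all) margins produced by acast.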
Look at the joint distribution:
prop.table(sleeptable)
##          High       Low
## No  0.2256098 0.3689024
## Yes 0.2301829 0.1753049
Look at the conditional distributions:
prop.table(sleeptable, 1)
##          High       Low
## No  0.3794872 0.6205128
## Yes 0.5676692 0.4323308
prop.table(sleeptable, 2)
##          High       Low
## No  0.4949833 0.6778711
## Yes 0.5050167 0.3221289
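To see what the second argument of prop.table does, the same proportions can be sketched by hand: margin 1 divides each row by its row total, and margin 2 divides each column by its column total. The matrix counts below is typed in from the two-way table and is only an illustration.

```r
# Counts copied from the two-way table above (entered by hand for illustration)
counts <- matrix(c(148, 151, 242, 115), nrow = 2,
                 dimnames = list(EnoughSleep = c("No", "Yes"),
                                 Exercise = c("High", "Low")))

# Conditional on EnoughSleep: divide each row by its row total (rows sum to 1)
row_cond <- counts / rowSums(counts)

# Conditional on Exercise: divide each column by its column total (columns sum to 1)
col_cond <- sweep(counts, 2, colSums(counts), "/")

row_cond
col_cond
```

In the first table each row adds up to 1, and in the second each column adds up to 1, which is a quick way to check which variable was conditioned on.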
Create a bargraph where you group by the variable Exercise:
barchart(Count~EnoughSleep, groups=Exercise, data=sleep, auto.key=T)
Create a bargraph where you group by the variable EnoughSleep:
barchart(Count~Exercise, groups=EnoughSleep, data=sleep, auto.key=T)
After looking at the two-way tables and barcharts, it appears that there is an association between adequate sleep and exercise. People who don’t get enough sleep tend to exercise less (about 62% of them are in the Low exercise group), while people who get enough sleep tend to exercise more (about 57% of them are in the High exercise group).
Compute expected cell counts:
Yes-High:
266*299/656
## [1] 121.2409
Yes-Low:
266*357/656
## [1] 144.7591
No-High:
390*299/656
## [1] 177.7591
No-Low:
390*357/656
## [1] 212.2409
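Each expected count is (row total × column total) / grand total. As a sketch, all four can be computed at once with outer(); the matrix counts is typed in from the two-way table and the variable names are only illustrative.

```r
# Counts copied from the two-way table above (entered by hand for illustration)
counts <- matrix(c(148, 151, 242, 115), nrow = 2,
                 dimnames = list(EnoughSleep = c("No", "Yes"),
                                 Exercise = c("High", "Low")))

# outer() multiplies every row total by every column total;
# dividing by the grand total gives the expected count for each cell
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
expected
```

The expected counts have the same row and column totals as the observed table, so they should sum to 656.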
Compute the \(\chi^2\) value:
(151-121.24)^2/121.24+(115-144.76)^2/144.76+(148-177.76)^2/177.76+(242-212.24)^2/212.24
## [1] 22.57833
The \(\chi^2\) value is approximately 22.58.
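The hand calculation uses expected counts rounded to two decimals. As a sketch, the statistic \(\chi^2=\sum (O-E)^2/E\) can also be computed from the unrounded expected counts; the matrix counts is typed in from the two-way table and the variable names are only illustrative.

```r
# Counts copied from the two-way table above (entered by hand for illustration)
counts <- matrix(c(148, 151, 242, 115), nrow = 2,
                 dimnames = list(EnoughSleep = c("No", "Yes"),
                                 Exercise = c("High", "Low")))

# Expected counts: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
chisq_stat <- sum((counts - expected)^2 / expected)
chisq_stat
```

This should round to the same 22.58; any difference from the hand calculation comes only from rounding the expected counts.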
Plot the distribution of the \(\chi^2\) statistic:
plotDist("chisq", df=1)
The plot doesn’t look like much. Let’s plot it as a histogram:
plotDist("chisq", df=1, kind="histogram")
Compute the \(P\)-value as the area to the right of 22.58:
1-xpchisq(22.58, df=1)
## [1] 2.015721e-06
So the \(P\)-value is approximately \(2\times10^{-6}=0.000002\).
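The same upper-tail area can be sketched in base R with pchisq(), which needs no extra package. Setting lower.tail = FALSE asks for the area to the right directly instead of subtracting from 1.

```r
# Upper-tail probability of a chi-square distribution with 1 degree of freedom
p_value <- pchisq(22.58, df = 1, lower.tail = FALSE)
p_value
```

Asking for the upper tail directly avoids subtracting two numbers that are very close to each other, which matters more when the \(P\)-value is extremely small.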
Carry out the significance test using the command chisq.test:
chisq.test(sleeptable)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: sleeptable
## X-squared = 21.825, df = 1, p-value = 2.987e-06
Notice that the values are a little different from what we computed. That is because, by default, R’s chisq.test applies Yates’ continuity correction to 2×2 tables, which shrinks the \(\chi^2\) statistic slightly and gives a somewhat larger \(P\)-value. We can turn this off by using the option correct=F:
chisq.test(sleeptable, correct=F)
##
## Pearson's Chi-squared test
##
## data: sleeptable
## X-squared = 22.577, df = 1, p-value = 2.019e-06
This matches the values we computed by hand, up to rounding (we used rounded expected counts and a rounded \(\chi^2\) value).
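For reference, the continuity-corrected statistic reported by the default chisq.test call can be sketched by hand: Yates’ correction subtracts 0.5 from each \(|O-E|\) before squaring. The matrix counts is typed in from the two-way table and the variable names are only illustrative.

```r
# Counts copied from the two-way table above (entered by hand for illustration)
counts <- matrix(c(148, 151, 242, 115), nrow = 2,
                 dimnames = list(EnoughSleep = c("No", "Yes"),
                                 Exercise = c("High", "Low")))

# Expected counts: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Yates' correction: shrink each |observed - expected| by 0.5 before squaring
yates_stat <- sum((abs(counts - expected) - 0.5)^2 / expected)

# Upper-tail P-value with 1 degree of freedom
yates_p <- pchisq(yates_stat, df = 1, lower.tail = FALSE)

yates_stat
yates_p
```

These should agree with the X-squared and p-value printed by chisq.test(sleeptable) when the correction is left on.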
The \(P\)-value is very small (about 0.000002), so we reject the null hypothesis that there is no association. We have very strong evidence that there is an association between adequate sleep and exercise.