HW5

This is week 7 of your BIO205 class. Previously, we learned how to compare a dependent variable between two samples using a two-sample t-test. This week, we will explore the grip data we collected in lab, as well as other datasets, to better understand how to compare two or more groups. When we have two groups, we use a t-test. When we have more than two groups, we use a test called Analysis of Variance (ANOVA).

Learning outcomes

Part 1. Analyzing the grip data

Assign your file to an object

To work with datasets, we need to read them in, usually from a csv file. To do this, we use the read.csv() function. The file (“Grip_part1.csv”) is loaded into your RStudio Cloud workspace. Create an object (grip1) and assign the file to it.

grip1 <- read.csv("Grip_part1.csv")

Let’s look at the file.

str(grip1) 
## 'data.frame':    17 obs. of  19 variables:
##  $ name                              : chr  "Student49" "Student59" "Student51" "Student47" ...
##  $ semester                          : chr  "2022_spring" "2022_Spring" "2022_Spring" "2022_Spring" ...
##  $ dom.hand                          : chr  "Right" "Right" "Left" "Left" ...
##  $ height.in                         : num  62 64 66 65 64 70 65 65 65 68 ...
##  $ arm.circumference.in              : num  12.5 9.6 9.25 9.4 9.1 9.75 10.5 9.3 13.8 9.4 ...
##  $ dom.max.grip.lbs                  : num  42.8 54.8 73 52.5 67 75 43.8 79.6 82.8 64 ...
##  $ non.dom.max.grip.lbs              : num  47 62 56 57.8 53 65 32.4 43.2 59.6 72 ...
##  $ dom.fatigue.secs                  : num  2.5 29.4 44.7 21.3 22.3 ...
##  $ non.dom.fatigue.secs              : num  NA 21.2 28.1 16.2 42.5 ...
##  $ athlete.status                    : chr  "No" "Yes" "Yes" "Yes" ...
##  $ shoe.size.US                      : num  9 7 8.5 7.5 9 9 6 8.5 10.5 9 ...
##  $ Avg.sleep.per.night.hrs           : int  6 6 7 7 7 7 7 7 7 7 ...
##  $ age.yrs                           : int  19 20 19 20 19 19 19 20 20 20 ...
##  $ birth.month                       : chr  "June" "December" "July" "February" ...
##  $ step.length.heeltoheel.in         : num  24.1 24.8 24.8 24.7 26.3 33.3 22.1 27.3 29.8 24.1 ...
##  $ step.length.heeltoheel.backward.in: num  15.3 19.2 23.2 14.6 24.7 26.2 16 19.3 24.5 19.5 ...
##  $ drinkscaffiene                    : chr  "No" "Yes" "No" "Yes" ...
##  $ head.circumference.in             : num  23.2 22.8 22.5 22.9 23.5 ...
##  $ horoscopesign                     : chr  "Gemini" "" "Cancer" "Aquarius" ...
head(grip1)
##        name    semester dom.hand height.in arm.circumference.in
## 1 Student49 2022_spring    Right        62                12.50
## 2 Student59 2022_Spring    Right        64                 9.60
## 3 Student51 2022_Spring     Left        66                 9.25
## 4 Student47 2022_Spring     Left        65                 9.40
## 5 Student50 2022_Spring    Right        64                 9.10
## 6 Student52 2022_spring    Right        70                 9.75
##   dom.max.grip.lbs non.dom.max.grip.lbs dom.fatigue.secs non.dom.fatigue.secs
## 1             42.8                 47.0             2.50                   NA
## 2             54.8                 62.0            29.35                21.25
## 3             73.0                 56.0            44.70                28.07
## 4             52.5                 57.8            21.33                16.25
## 5             67.0                 53.0            22.31                42.55
## 6             75.0                 65.0            26.98                39.83
##   athlete.status shoe.size.US Avg.sleep.per.night.hrs age.yrs birth.month
## 1             No          9.0                       6      19        June
## 2            Yes          7.0                       6      20    December
## 3            Yes          8.5                       7      19        July
## 4            Yes          7.5                       7      20    February
## 5             No          9.0                       7      19        June
## 6             No          9.0                       7      19         May
##   step.length.heeltoheel.in step.length.heeltoheel.backward.in drinkscaffiene
## 1                      24.1                               15.3             No
## 2                      24.8                               19.2            Yes
## 3                      24.8                               23.2             No
## 4                      24.7                               14.6            Yes
## 5                      26.3                               24.7            Yes
## 6                      33.3                               26.2            Yes
##   head.circumference.in horoscopesign
## 1                 23.25        Gemini
## 2                 22.80              
## 3                 22.50        Cancer
## 4                 22.90      Aquarius
## 5                 23.50        Gemini
## 6                 23.00        Taurus
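One detail in the output above is easy to miss: the second student’s horoscopesign is blank rather than NA. The sketch below shows why, using a tiny made-up file (the file contents and the names tmp, toy, and toy2 are hypothetical, not the class data). By default read.csv() reads a blank text cell as an empty string, and the na.strings argument can turn blanks into NA.

```r
# Hypothetical miniature of a class file, written to a temporary path
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,dom.hand,horoscopesign,dom.fatigue.secs",
             "StudentA,Right,Gemini,2.5",
             "StudentB,Left,,29.35",
             "StudentC,Right,Cancer,NA"), tmp)

# Default behavior: the blank text cell becomes "" (an empty string)
toy <- read.csv(tmp)
toy$horoscopesign

# Treat both the text NA and blank cells as missing values
toy2 <- read.csv(tmp, na.strings = c("NA", ""))
is.na(toy2$horoscopesign)
```

This matters later, because functions such as na.omit() only react to true NA values, not to empty strings.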

Let’s review the last part of Tuesday’s (Twosday 2/22) worksheet

We wanted to compare dominant hand fatigue among groups with different amounts of sleep.

What was your biological null hypothesis?
Maybe it was something like: The amount of rest has no effect on muscle fatigue.

To test this, maybe you collected data on people’s grip fatigue times, and also polled them on how many hours they sleep. Nice, we have that data.

What is the statistical null?
Maybe it is something like: Students with 6, 7, or 8 hours of sleep will have the same mean fatigue time on a grip strength test.

For this, sleep is your independent variable and dominant hand fatigue is your measurement, or dependent variable. Sleep is stored as numeric information in the table, but because it is your independent (grouping) variable, we want to change it to nominal (factor in R) information. Use the as.factor() function.

grip1$Avg.sleep.per.night.hrs <- as.factor(grip1$Avg.sleep.per.night.hrs)
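To see what as.factor() does, here is a minimal sketch on a made-up vector (the names sleep and sleep.f and the values are hypothetical). It also shows a common pitfall when converting a factor back to numbers.

```r
# Hypothetical sleep values, including one missing answer
sleep <- c(6, 7, 7, 8, NA)
sleep.f <- as.factor(sleep)

levels(sleep.f)   # the distinct groups R will compare
table(sleep.f)    # how many observations fall in each group

# Pitfall: as.numeric() on a factor returns the level codes, not the hours
as.numeric(sleep.f)

# To get the original numbers back, convert to character first
as.numeric(as.character(sleep.f))
```

Once the column is a factor, functions like lm() and TukeyHSD() treat each level as its own group instead of fitting a slope through the numbers.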

If we want to see summary stats, we can use shortcuts instead of subsetting functions. For example, let’s use the summarySE() function.

library(Rmisc)
## Loading required package: lattice
## Loading required package: plyr
gripSum <- summarySE(data = grip1, 
                     measurevar = "dom.fatigue.secs",
                     groupvars = "Avg.sleep.per.night.hrs")
## Warning in qt(conf.interval/2 + 0.5, datac$N - 1): NaNs produced
gripSum
##   Avg.sleep.per.night.hrs N dom.fatigue.secs       sd        se         ci
## 1                       6 2         15.92500 18.98582 13.425000 170.580799
## 2                       7 9         20.37333 11.82596  3.941987   9.090237
## 3                       8 5         30.55800 27.14187 12.138215  33.701088
## 4                    <NA> 1          6.50000       NA        NA        NaN

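Notice the extra row (<NA>): one student did not report hours of sleep. That group has N = 1, so sd, se, and ci cannot be computed, which is also what triggers the NaN warning. Let’s get rid of that row.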

gripSum <- gripSum[1:3,]
gripSum
##   Avg.sleep.per.night.hrs N dom.fatigue.secs       sd        se         ci
## 1                       6 2         15.92500 18.98582 13.425000 170.580799
## 2                       7 9         20.37333 11.82596  3.941987   9.090237
## 3                       8 5         30.55800 27.14187 12.138215  33.701088
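It can help to check where the se and ci columns come from. The sketch below recomputes them from the N and sd values printed in the table above, assuming the usual formulas: se is sd divided by the square root of N, and ci is the t quantile (as in the qt() call shown in the warning) times se. The object names N, s, se, and ci are only for illustration.

```r
# N and sd copied from the summarySE() output above
N <- c(2, 9, 5)
s <- c(18.98582, 11.82596, 27.14187)

se <- s / sqrt(N)                 # standard error of the mean
ci <- qt(0.975, df = N - 1) * se  # half-width of the 95% confidence interval

round(se, 3)
round(ci, 3)

# With a single observation there are 0 degrees of freedom, so qt() returns NaN
suppressWarnings(qt(0.975, df = 0))
```

The 6-hour group has only two students, which is why its confidence interval is so wide compared with the others.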

Now, we can run the one factor ANOVA. Recall from lecture, this works off the lm() function, which fits a linear model. Then, we pass the model to the anova() function to get the ANOVA table.

grip1.lm <- lm(grip1$dom.fatigue.secs ~ grip1$Avg.sleep.per.night.hrs)
anova(grip1.lm)
## Analysis of Variance Table
## 
## Response: grip1$dom.fatigue.secs
##                               Df Sum Sq Mean Sq F value Pr(>F)
## grip1$Avg.sleep.per.night.hrs  2  447.8  223.91  0.6577 0.5345
## Residuals                     13 4426.0  340.46
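To connect the table to the lecture, here is a short sketch that rebuilds the F value and p value by hand from the sums of squares and degrees of freedom printed above. The object names are only for illustration, and because the printed sums of squares are rounded, the results match the table only up to rounding.

```r
# Values copied from the ANOVA table above
ss.between <- 447.8
df.between <- 2
ss.within  <- 4426.0
df.within  <- 13

ms.between <- ss.between / df.between   # Mean Sq for the sleep groups
ms.within  <- ss.within / df.within     # Mean Sq for the residuals
F.value    <- ms.between / ms.within    # ratio of the two mean squares

# p value: area of the F distribution to the right of the observed F
p.value <- pf(F.value, df.between, df.within, lower.tail = FALSE)

round(c(F = F.value, p = p.value), 4)
```

An F value below 1 means the variation among the group means is smaller than the variation within groups, so a large p value is expected here.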

Are there differences among groups?

The ANOVA test only allows us to examine whether the group means differ overall. It doesn’t tell us whether those who slept 6 hours are different from those who slept 8. For this, we need post-hoc tests. The most common post-hoc test is the Tukey HSD. This does pairwise comparisons (similar to what we did for chi square tests before) of all groups, and gives us an adjusted p value for each pair. Remember, the independent variable needs to be nominal data.

TukeyHSD(aov(grip1$dom.fatigue.secs ~ grip1$Avg.sleep.per.night.hrs))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = grip1$dom.fatigue.secs ~ grip1$Avg.sleep.per.night.hrs)
## 
## $`grip1$Avg.sleep.per.night.hrs`
##          diff       lwr      upr     p adj
## 7-6  4.448333 -33.63813 42.53479 0.9491276
## 8-6 14.633000 -26.12938 55.39538 0.6209803
## 8-7 10.184667 -16.99025 37.35959 0.5959667

This gives us all pairwise comparisons. If we look for the one I just mentioned (6 vs 8), it is the second row of the list (8-6). The adjusted p value is 0.62, which suggests these two groups are not significantly different. Looking at the data, that is not much of a surprise.
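When there are many groups, scanning the list by eye gets tedious. The sketch below, on made-up data (the data frame toy and its values are hypothetical), shows one way to pull a single comparison out of the TukeyHSD() result by its row name. It also uses the data = argument, which keeps the formula and the row labels shorter.

```r
# Hypothetical fatigue times (secs) for three sleep groups
toy <- data.frame(
  sleep   = factor(c(6, 6, 6, 7, 7, 7, 8, 8, 8)),
  fatigue = c(12, 18, 15, 20, 25, 22, 30, 28, 35)
)

fit <- aov(fatigue ~ sleep, data = toy)
tuk <- TukeyHSD(fit)

tuk$sleep                   # matrix with one row per pair of groups
tuk$sleep["8-6", ]          # pick one comparison by its row name
tuk$sleep["8-6", "p adj"]   # just the adjusted p value for 8 vs 6
```

The row names follow the pattern higher level minus lower level, so the 6 vs 8 comparison is stored as "8-6".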

Part 2. Graphing

OK. Graphing works the same way for two groups or for multiple groups. Let’s first try bar graphs, starting with a basic graph of the sleep vs fatigue example above.

For bar graphs in ggplot, we can use the result of the summarySE function.

library(ggplot2)
gripSum # this is from above, we can use the same table
##   Avg.sleep.per.night.hrs N dom.fatigue.secs       sd        se         ci
## 1                       6 2         15.92500 18.98582 13.425000 170.580799
## 2                       7 9         20.37333 11.82596  3.941987   9.090237
## 3                       8 5         30.55800 27.14187 12.138215  33.701088
gripbase <- ggplot(data = gripSum,
               aes(x=Avg.sleep.per.night.hrs,
                       y=dom.fatigue.secs)) # this is the first layer of mapping
gripbase

Notice that this only draws a basic graph background (axes and grid). We have not added any bars yet.

gripbase + 
  geom_bar(stat = "identity")

Let’s clean it up now.

gripbase + 
  geom_bar(stat = "identity", 
           fill = c(rainbow(3))) + # this adds color to the bars
  geom_errorbar(aes(ymin = dom.fatigue.secs-se,
                    ymax = dom.fatigue.secs+se), 
                width=0.5) # this adds your error bars, using another element

We can also change titles. Each thing we are adding with the + sign is another element to the graph.

gripbase + 
  geom_bar(stat = "identity", 
           fill = c(rainbow(3))) + # this adds color to the bars
  geom_errorbar(aes(ymin = dom.fatigue.secs-se,
                    ymax = dom.fatigue.secs+se), 
                width=0.5) +
  xlab("hours of sleep") + 
  ylab("fatigue time of dominant hand (secs)")+
  ggtitle("dominant hand fatigue by hours of sleep")

Voilà! Much nicer than our old graphs.
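A possible variation, sketched under the assumption that ggplot2 is installed: instead of passing exactly three colors with rainbow(3), map fill to the grouping variable so that the number of colors always matches the number of groups. The data frame sleepSum and its column names are hypothetical; the numbers are copied from the gripSum table above.

```r
library(ggplot2)

# Group means and standard errors copied from the gripSum table above
sleepSum <- data.frame(
  sleep = factor(c(6, 7, 8)),
  mean  = c(15.92500, 20.37333, 30.55800),
  se    = c(13.425000, 3.941987, 12.138215)
)

p <- ggplot(sleepSum, aes(x = sleep, y = mean, fill = sleep)) +
  geom_col() +                                   # same as geom_bar(stat = "identity")
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = 0.5) +                   # error bars: mean plus or minus se
  labs(x = "hours of sleep",
       y = "fatigue time of dominant hand (secs)") +
  theme(legend.position = "none")                # the x axis already names the groups
p
```

Because the colors come from the data, this version keeps working if a group is added or dropped, whereas a hard-coded rainbow(3) would then have the wrong length.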

Let’s just quickly do a boxplot version, this time of maximum dominant hand grip strength by hours of sleep. Remember that boxplots use all the data points, since they sort the raw values into quartiles. Thus, we don’t need a summarySE table.

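# we need a new base
# note: na.omit() drops every row that has an NA in any column
grip2 <- na.omit(grip1)
# sleep was already converted to a factor above; repeating it is harmless
grip2$Avg.sleep.per.night.hrs <- as.factor(grip2$Avg.sleep.per.night.hrs)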

boxgrip <- ggplot(data = grip2, 
                  aes(x=Avg.sleep.per.night.hrs, 
                      y=dom.max.grip.lbs)) # first layer
boxgrip + 
  geom_boxplot(fill=c(rainbow(3))) # add boxplot, add some color

Many times, we want to show the real data points. It shows transparency in science. Let’s see how you do this.

boxgrip + 
  geom_boxplot(fill=c(rainbow(3))) + 
  geom_jitter(shape=16, position=position_jitter(0.2)) # add some points
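One thing to keep in mind about na.omit(): it removes a row if any column has an NA, even a column you are not plotting. The sketch below, on a made-up data frame (toy and its values are hypothetical), compares it with keeping every row that is complete for just the two columns being used, via the base R function complete.cases().

```r
# Hypothetical data: row 1 is missing a column we do not plot,
# row 5 is missing the grouping variable
toy <- data.frame(
  sleep           = c(6, 7, 7, 8, NA),
  grip            = c(42.8, 54.8, 73.0, 52.5, 67.0),
  non.dom.fatigue = c(NA, 21.25, 28.07, 16.25, 42.55)
)

all.complete <- na.omit(toy)   # drops rows 1 and 5
needed.only  <- toy[complete.cases(toy[, c("sleep", "grip")]), ]  # drops only row 5

nrow(all.complete)
nrow(needed.only)
```

With a small class dataset, losing rows for an unrelated missing value can noticeably shrink the groups, so it is worth checking nrow() before and after.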

Part 3. Assignment

On your own, choose any other question, using our dataset, where you can measure 1 dependent variable comparing only 2 groups. Note, please try to come up with your own question, and not the exact same one as someone else. Name your objects differently. Then share with others after.

For these questions, use the gripclean.csv datafile. It is already in your working directory. This dataset includes last year’s information, which makes the dataset bigger and more interesting to work with.

For two groups,
* write the biological null
* write the statistical null
* run the test
* graph your data using a bargraph AND boxplot. Add points to your boxplot!
* explain your results
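As a reminder of the mechanics for the two-group test, here is a minimal sketch using only the six rows shown in the head(grip1) output earlier (far too few for a real analysis, and your own question should use gripclean.csv). The object names six and res are only for illustration.

```r
# First six rows of the head(grip1) output above
six <- data.frame(
  athlete.status   = c("No", "Yes", "Yes", "Yes", "No", "No"),
  dom.max.grip.lbs = c(42.8, 54.8, 73.0, 52.5, 67.0, 75.0)
)

# Two-sample t-test: dependent variable ~ grouping variable
# (R runs the Welch version by default, which does not assume equal variances)
res <- t.test(dom.max.grip.lbs ~ athlete.status, data = six)

res$estimate   # the two group means
res$p.value    # p value for the difference between the means
```

The formula has the same shape as in lm(): the measurement goes on the left of the ~ and the group on the right.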

Then, try to find a question that would examine 1 dependent variable comparing more than 2 groups.

For more than two groups,
* write the biological null
* write the statistical null
* run the test
* graph your data using a bargraph AND boxplot. Add points to your boxplot!
* explain your results