Markdown Author: Jessie Bell, 2023
Libraries Used: ggplot2
Answers: blue
#load the snow depth and flower growth data used in the problems below
depthData <- read.csv("Depth.csv")
flowerData <- read.csv("Growth.csv")
A professor hypothesized that there was no difference in coffee consumption between students majoring in Environmental Science (Esci) and students majoring in Biology. She randomly sampled 25 Esci and 25 Biology students. The mean coffee consumption for Esci students was 40 oz per day with a standard deviation of 5; for Biology students it was 30 oz per day, also with a standard deviation of 5. If she used a two-tailed t test to compare the mean coffee consumption between Esci and Biology students:
Statistical Hypotheses
H\(_0\): \(\mu_{esci} = \mu_{bio}\)
H\(_A\): \(\mu_{esci} \neq \mu_{bio}\)
# Set the seed for reproducibility
set.seed(123)
# Generate 25 samples per group: Esci with mean 40, Bio with mean 30, both with sd 5
esci.data <- rnorm(25, mean = 40, sd = 5)
bio.data <- rnorm(25, mean = 30, sd = 5)
#just perform a t-test (save it under a name that doesn't mask the t.test function)
coffee.ttest <- t.test(esci.data, bio.data, var.equal = TRUE)
#look at the output for the t-value
coffee.ttest
##
## Two Sample t-test
##
## data: esci.data and bio.data
## t = 7.0662, df = 48, p-value = 5.821e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.669972 11.975351
## sample estimates:
## mean of x mean of y
## 39.83335 30.51069
#t-stat = 7.0662, df = 48, p-value < 0.001
# Degrees of freedom
df <- length(esci.data) + length(bio.data) - 2
#should be 48! Just like the t-test output. You can check your work here.
# Significance level
alpha <- 0.05
# Calculate t-critical value for a two-tailed test
t_critical <- qt(1 - alpha/2, df)
t_critical
## [1] 2.010635
#2.010635 is t-critical
#t.calculated (7.07) > t.critical (2.01), so we can reject our null hypothesis in favor of the alternative.
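#optional sanity check (my addition, using the coffee.ttest object saved above,
#which stores the t statistic): make R do the comparison for us
abs(coffee.ttest$statistic) > t_critical #TRUE, so we reject the null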
#this is all in the above t-test output, but below I will show you the long way to calculate it
# Calculate mean and standard deviation for each group
mean_esci <- mean(esci.data)
mean_bio <- mean(bio.data)
sd_esci <- sd(esci.data)
sd_bio <- sd(bio.data)
# Sample sizes
n_esci <- length(esci.data)
n_bio <- length(bio.data)
# Degrees of freedom
df <- n_esci + n_bio - 2
# Calculate the standard error of the difference in means
se_diff <- sqrt((sd_esci^2 / n_esci) + (sd_bio^2 / n_bio))
# Calculate the t-statistic
t_stat <- (mean_esci - mean_bio) / se_diff
# Calculate the p-value for a two-tailed test
p_value <- 2 * pt(-abs(t_stat), df)
p_value #VOILA! 5.82e-09 matches the p-value from the t.test output. Things are looking good.
## [1] 5.821082e-09
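One note on the standard error formula above: because the two groups have equal sample sizes, it happens to equal the pooled-variance standard error that the equal-variance t-test actually uses. With unequal sample sizes you would pool the variances explicitly; here is a sketch of that calculation using the objects already defined above:
# Pooled variance weights each group's variance by its degrees of freedom
sp2 <- ((n_esci - 1) * sd_esci^2 + (n_bio - 1) * sd_bio^2) / df
se_pooled <- sqrt(sp2 * (1 / n_esci + 1 / n_bio))
t_pooled <- (mean_esci - mean_bio) / se_pooled #matches t_stat here because the group sizes are equal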
Based on the results of the two-sample t-test comparing mean coffee consumption between Esci and Biology students, we found a statistically significant difference (t ≈ 7.07, df = 48, p < 0.001). The mean coffee consumption for Esci students (sample mean ≈ 39.8 oz per day) was significantly higher than that for Biology students (≈ 30.5 oz per day).
Therefore, we reject the null hypothesis of no difference in coffee consumption between the two groups. This is evidence that Environmental Science and Biology students differ in their coffee consumption habits. It is worth considering these findings in the context of lifestyle or academic factors that might contribute to the observed difference.
Students in the WWU Ski and Snowboard Club were interested in assessing the past few years of snowpack at the Mt. Baker ski area in November. They want to compare this year to previous years to see if they can predict how good the ski season will be this year. They were particularly interested in whether the average snowfall for any given day in November 2017 was different from the average for November 2018, because snow conditions are so variable in the fall that it is hard to tell whether the averages could be considered different. The data can be found in the snowdepth tab of the accompanying spreadsheet.
Continuous snow-depth data measured in two categorical years (Nov. 2017 and Nov. 2018)
#there are huge outliers in the data. We should be careful about removing these data, but if we have detailed notes and we know that a value was recorded by mistake, we can remove it. I will leave them in for now and just zoom in on the boxplots below using the ylim argument.
boxplot(depthData$Nov.17, ylim = c(0, 75), col = "tomato") #ylim sets the range shown on the y axis
boxplot(depthData$Nov.18, ylim = c(0, 20), col = "plum") #dang, look how wild these data are.
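If you want to see exactly which points the boxplots flag as outliers before deciding whether any were recording errors, base R's boxplot.stats() lists them; a quick sketch:
boxplot.stats(depthData$Nov.17)$out #values beyond the 1.5*IQR whiskers for Nov. 17
boxplot.stats(depthData$Nov.18)$out #and the same for Nov. 18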
Because there are two categories (Nov. 17 and Nov. 18) and numerical data (snow depth), we should compare the means using a t-test.
H\(_0\): The mean snow pack for Nov. 17 is equal to the mean snow pack for Nov. 18.
H\(_A\): The mean snow pack for Nov. 17 is NOT equal to the mean snow pack for Nov. 18.
#check out some summary stats and plot for normality
mean_2018 <- mean(depthData$Nov.18) #13.38
mean_2017 <- mean(depthData$Nov.17) #40.33
hist(depthData$Nov.17) #not exactly normal!
hist(depthData$Nov.18) #oh boy. This strongly violates our assumption of normality. At this point we probably should not proceed, but for the sake of showing the steps, I will continue.
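The histograms are the informal check; if you want a formal one to back them up, a Shapiro-Wilk test is one option (assuming each column has between 3 and 5000 non-missing values, which shapiro.test requires):
shapiro.test(depthData$Nov.17) #a small p-value is evidence against normality
shapiro.test(depthData$Nov.18)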
#Assessing equal variance (look at the sd of both)
nov18.sd <- sd(depthData$Nov.18) #39
nov17.sd <- sd(depthData$Nov.17) #33
#those are approximately similar. We can continue assuming equal variance
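A formal alternative to eyeballing the standard deviations is an F test of equal variances, though note that var.test itself assumes normality, which is already questionable here:
var.test(depthData$Nov.17, depthData$Nov.18) #tests H0 that the ratio of the variances is 1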
t.test(depthData$Nov.17, depthData$Nov.18, var.equal = TRUE)
##
## Two Sample t-test
##
## data: depthData$Nov.17 and depthData$Nov.18
## t = 13.905, df = 1438, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 23.14712 30.75069
## sample estimates:
## mean of x mean of y
## 40.33098 13.38208
The difference in snowpack between 2017 and 2018 was statistically significant (t ≈ 13.9, df = 1438, p < 0.001). The average snowpack for 2017 was much larger (mean = 40.33) than the average snowpack for 2018 (mean = 13.38).
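Given how badly the histograms violated normality, it is worth sketching the nonparametric alternative: a Wilcoxon rank-sum test compares the two Novembers without assuming normal data.
wilcox.test(depthData$Nov.17, depthData$Nov.18) #nonparametric analogue of the two-sample t-test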
We are going to talk about snow again… but this time in terms of biological effects. You are interested in testing whether the amount of snow affects the growth of an early spring annual flower. You perform an experiment where you manipulate snow levels. The experimental variable is the amount of snow with 3 levels: low (1 inch), medium (3 inches), and high (6 inches). The response variable is plant height. The experiment is properly designed so that the data are independent and samples are randomly collected. You can find the data in the snowgrowth tab of the accompanying spreadsheet.
#the way the data are set up will make ggplot useful
library(ggplot2) #the columns are even named x and y! so you can just use those names below
ggplot(flowerData, aes(x, y, color = x)) +
  geom_jitter(width = 0.2) +
  labs(title = "Checking out the data",
       y = "Flower growth",
       x = "Treatment") #BTW you should have labelled all of your charts and graphs for full credit
Here we have categorical treatment data (with 3 levels: low, med, and high) and numerical data for each of those. We will need to use an ANOVA because we have 3 levels instead of just two (i.e., there were only two categories in the depth data from the last question, 2017 and 2018, so we could just use a t-test).
There is 1 factor (treatment) with 3 levels (low/med/high)
H\(_0\): \(\mu_1 = \mu_2 = \mu_3\)
H\(_A\): At least one of those means is different, and we will run a Tukey HSD to find out which one it is.
ggplot(flowerData, aes(y, fill = x)) +
  geom_histogram(alpha = 0.7, bins = 30) #medium does not look very normal
#we will proceed assuming that if we sampled more, our data would look more bell-shaped
tapply(flowerData$y, flowerData$x, mean) #if any means are statistically diff it will be between low and high
## High Low Med
## 45.86812 40.58250 43.74562
tapply(flowerData$y, flowerData$x, sd) #the sds are similar enough that we will proceed assuming equal variance
## High Low Med
## 3.457865 5.387842 7.132181
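The ANOVA call itself is not shown in this write-up, but the F statistic quoted below would come from a one-way ANOVA on these columns; a minimal sketch of the calls that produce it and the post-hoc comparisons:
flower.aov <- aov(y ~ x, data = flowerData) #one factor (x) with 3 levels
summary(flower.aov) #gives the F statistic and p-value quoted below
TukeyHSD(flower.aov) #post-hoc pairwise comparisons among low/med/high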
The test statistic for an ANOVA is F! So our test statistic and p-value are: F = 3.697; p = 0.03262
We can reject our null hypothesis in favor of our alternative. After conducting a post-hoc Tukey HSD test, we found that high levels of snowpack produced higher flower growth than low levels (F\(_{2,45}\) = 3.697; p = 0.033).
Now we want to work with the same data as in #8, but we want to consider how each of the three depths may be different.
low <- subset(flowerData, x =="Low")
med <- subset(flowerData, x =="Med")
high <- subset(flowerData, x == "High")
low.t <- t.test(low$y) #note: with a single argument, these are one-sample t-tests of each group mean against mu = 0
med.t <- t.test(med$y)
high.t <- t.test(high$y)
p_values <- c(low.t$p.value, med.t$p.value, high.t$p.value)
p.adjust(p_values, method = "bonferroni")
## [1] 2.341333e-14 4.813041e-13 5.208736e-18
These p-values do not match the ANOVA results, and it is not just rounding error: each t.test call above is a one-sample test of whether a group's mean differs from zero, not a comparison between the groups. The anova and aov functions (with a post-hoc test) are the right tools for comparing the depths to one another.
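For the pairwise question those one-sample tests were reaching for, base R's pairwise.t.test runs every two-group comparison with a multiple-comparison correction built in; a minimal sketch:
pairwise.t.test(flowerData$y, flowerData$x, p.adjust.method = "bonferroni") #all pairwise group comparisons, Bonferroni-adjusted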
Tukey HSD tells you which pairs of means are different. This will be important when the groups have no obvious ordering and you are trying to decide which of several group means differ from one another. In this example, though, the answer was obvious from the group means.