Markdown Author: Jessie Bell, 2023

Libraries Used: ggplot2, ggpubr

Answers: sea green

Part I:

Setting up t-test scenarios

1

Take home from this exercise?

Answer:

  1. Look at the data. (It is informative to visually examine data prior to running an NHST.) Is it normal? Was the sample random?

  2. State the statistical hypotheses

    • Null hypothesis (depends on type of t-test)
    • Alternative (all other possibilities)

  3. Determine the degrees of freedom

    • df=n−1 for 1-sample t-test
    • df=n1+n2−2 for 2-sample t-test with equal variances (Welch's t-test uses an adjusted, usually non-integer df)
    • df=Number of pairs - 1 for a paired t-test
  4. Choose criterion to reject/fail to reject H0

    For instance, set α=0.05. If p≤0.05 then reject H0, else fail to reject H0

  5. Calculate t

    The test statistic for all t-tests is t_calc (or t_stat). However, we calculate this value slightly differently for one-sample, paired, and two-sample t-tests.

  6. Make a decision about H0

    • Two options: ‘reject H0’ or ‘fail to reject H0’

    • ‘If the p is low, the null must go’

  7. Write a conclusion sentence

    Compose a summary sentence that answers the initial question. For example: The mean length of salmon differed between Chuckanut and Squalicum Creek (two-sample t-test, t\(_{2, 36}\) = 3.12, p = 0.004).

The format is: t\(_{\text{tails}, \text{df}}\) = t-test statistic, p = calculated p-value
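For reference in step 5, the standard textbook forms of the test statistic are sketched here. The symbols are notation assumed for this sketch and do not come from the exercise: \(\bar{x}\) is a sample mean, \(s\) a sample standard deviation, \(\mu_0\) the hypothesized mean, \(\bar{d}\) and \(s_d\) the mean and standard deviation of the paired differences, and \(s_p\) the pooled standard deviation.

```latex
\begin{align*}
\text{one-sample:} \quad & t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, & df &= n - 1 \\
\text{paired:} \quad & t = \frac{\bar{d}}{s_d / \sqrt{n_{\text{pairs}}}}, & df &= n_{\text{pairs}} - 1 \\
\text{two-sample (pooled):} \quad & t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, & df &= n_1 + n_2 - 2 \\
\text{two-sample (Welch):} \quad & t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, & df &\text{ from the Welch-Satterthwaite approximation}
\end{align*}

s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```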

2

Describe how the assumptions of the t-test are reflected in the way we use the t.test function.
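One way to approach this question is to look at which arguments of t.test() encode an assumption. The sketch below uses made-up samples (`a` and `b` are hypothetical, not data from this exercise) and only shows how the arguments change the test that is run. Normality and random sampling are not arguments at all, so they have to be checked separately.

```r
# Hypothetical samples, made up purely for illustration
set.seed(1)
a <- rnorm(12, mean = 34, sd = 5)
b <- rnorm(12, mean = 35, sd = 5)

# One-sample test: mu carries the hypothesized population mean from H0
one <- t.test(a, mu = 34, alternative = "two.sided")

# Two independent samples: var.equal = TRUE assumes equal population variances (pooled test);
# the default, var.equal = FALSE, drops that assumption (Welch's test)
pooled <- t.test(a, b, var.equal = TRUE)
welch  <- t.test(a, b)

# paired = TRUE assumes a[i] and b[i] are measurements on the same unit
paired <- t.test(a, b, paired = TRUE)

# The degrees of freedom show which test was actually run
c(one = unname(one$parameter), pooled = unname(pooled$parameter),
  welch = unname(welch$parameter), paired = unname(paired$parameter))
```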

Part II:

The t-distribution

3

Plot the normal distribution and 2 t-distributions over it. Indicate which is which.

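#plot the standard normal first, then overlay the t-distributions on the same axes
curve(dnorm(x, 0, 1), from=-4, to=4, col='black', ylab='Density')
curve(dt(x, df=6), from=-4, to=4, col='steelblue', add=TRUE)
curve(dt(x, df=10), from=-4, to=4, col='pink', add=TRUE)
curve(dt(x, df=30), from=-4, to=4, col='red', add=TRUE)

#add legend
legend(-4, .3, legend=c("normal", "df=6", "df=10", "df=30"),
       col=c("black", "steelblue", "pink", "red"), lty=1, cex=1.2)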
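The plot shows the t-distribution's heavier tails visually. A small numeric sketch of the same idea compares two-tailed critical values at α=0.05 for the three df values plotted above with the normal critical value. The variable names are chosen here for illustration.

```r
dfs <- c(6, 10, 30)

t_crit <- qt(0.975, df = dfs)  # two-tailed critical values at alpha = 0.05
z_crit <- qnorm(0.975)         # corresponding critical value for the standard normal

# t-critical shrinks toward the normal value as df grows
round(rbind(df = dfs, t_crit = t_crit, z_crit = z_crit), 3)
```

Heavier tails mean a larger critical value, so small samples need a larger t_calc to reject H0.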

Part III:

1 Sample t-test, 2 tails

The average heart rate of mice is 135 beats per minute under normal conditions, and you want to see if a chemical changes the heart rate of mice.

Statistical Hypotheses

H0: \(\mu=\) 135

HA: \(\mu\neq\) 135

mice <- c(145, 132, 164, 139, 174, 122, 136, 141, 151, 130)

meanMice <- mean(mice)
meanMice #larger than 135, which is the mean heartrate of mice without the chemical
## [1] 143.4
sdMice <- sd(mice)
sdMice
## [1] 15.87591
n <- length(mice)
n
## [1] 10
df <- n-1 #for 1 sample t-test, df=9

tvalueMice <- (meanMice-135)/(sdMice/sqrt(n))

tvalueMice
## [1] 1.673173
# determine t-critical for a two-tailed test at alpha = 0.05 (0.025 in each tail)
qt(0.975, df)
## [1] 2.262157
#comparing t-critical (2.26) to tvalueMice (1.67): |tvalueMice| < t-critical, so we fail to reject H0.

#We should still calculate our p-value because we will want to use it in our summary statement.

2 * pt(abs(tvalueMice), df, lower.tail = FALSE) #two-tailed p-value; abs() keeps this correct for a negative t
## [1] 0.1286214
# our p-value is greater than 0.05 and we fail to reject our null.

4

Mice Conclusion Sentence: The mean heart rate of mice after receiving the chemical was not statistically different from the average heart rate of mice without the chemical (135 beats per minute). We fail to reject H\(_0\): there is insufficient evidence (one-sample t-test, t\(_{2, 9}\) = 1.673; two-tailed p = 0.129) that the mean heart rate \(\mu\) of mice receiving the chemical differs from that of mice without the chemical.
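A 95% confidence interval leads to the same decision as the two-tailed test at α=0.05. The sketch below builds the interval by hand from the mouse data, with variable names chosen here for illustration. It should reproduce the interval that t.test() reports for this sample.

```r
mice <- c(145, 132, 164, 139, 174, 122, 136, 141, 151, 130)

n <- length(mice)
se <- sd(mice) / sqrt(n)          # standard error of the mean
t_crit <- qt(0.975, df = n - 1)   # two-tailed critical value at alpha = 0.05

ci <- mean(mice) + c(-1, 1) * t_crit * se
ci

# TRUE means the hypothesized mean of 135 lies inside the interval
135 >= ci[1] && 135 <= ci[2]
```

Because 135 falls inside the interval, the data are compatible with the null value, which agrees with failing to reject H0.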

2 Sample t-test, 2 tails

You have a mutant and a wildtype line of Drosophila and you want to test whether the wildtype flies faster than the mutant.

Mutant <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)

Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)

5

Boxplot

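library(ggplot2) #ggplot() and geom_boxplot()
library(ggpubr)  #ggarrange() (assuming the ggpubr version was used)

#put data into frame
df <- data.frame(Mutant, Wildtype)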

#plot the data
plot1 <- ggplot(df, aes(Mutant))+
  geom_boxplot(color="#c178f7")+
  xlab("Mutant Flight m/s^2")

plot2 <- ggplot(df, aes(Wildtype))+
  geom_boxplot(color="#00bfc4")+
  xlab("Wildtype Flight m/s^2")


#plot them together 
ggarrange(plot1, plot2)
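An alternative sketch puts both lines on one shared axis, which makes the groups easier to compare than two separate panels. It uses base R's boxplot() instead of ggplot2 so it runs without extra packages. The name `flight_long` and the axis labels are choices made here, not part of the exercise.

```r
Mutant <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)

# stack() reshapes to long format: one column of values, one factor column (ind) naming the group
flight_long <- stack(list(Mutant = Mutant, Wildtype = Wildtype))

# one boxplot per group on a common flight-speed axis
boxplot(values ~ ind, data = flight_long,
        xlab = "Line", ylab = "Flight speed",
        col = c("#c178f7", "#00bfc4"))
```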

6

Run the two sample t-test

Statistical Hypotheses

H0: \(\mu\)\(_1\) = \(\mu_2\)

HA: \(\mu_1\)\(\neq\)\(\mu_2\)

mean.wild <- mean(Wildtype)
mean.mutant <- mean(Mutant)
sd.wild <- sd(Wildtype)
sd.mutant <- sd(Mutant)

denominator1 <- (sd.mutant^2)/length(Mutant) #just doing it the long way to ensure I calc it correctly

denominator2 <- (sd.wild^2)/length(Wildtype)

t.calc <- (mean.mutant-mean.wild)/sqrt(denominator1+denominator2) #Mutant minus Wildtype, so the sign matches t.test(Mutant, Wildtype)

var(Mutant)
## [1] 21.98697
var(Wildtype) #sample variances are not equal, so use Welch's t-test!
## [1] 36.66727
#because this is Welch's t-test, just run this code (t.test() defaults to var.equal = FALSE, and the degrees of freedom are calculated differently)
t.test(Mutant, 
       Wildtype, 
       alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  Mutant and Wildtype
## t = 0.35431, df = 20.703, p-value = 0.7267
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.818397  5.385064
## sample estimates:
## mean of x mean of y 
##  34.88333  34.10000
# p-value > 0.05 and we fail to reject H0

Drosophila Conclusion Sentence: The mean flight speeds of wildtype and mutant flies were not statistically different from one another. We fail to reject H\(_0\): there is insufficient evidence (Welch two-sample t-test, t\(_{2, 20.7}\) = 0.35; two-tailed p = 0.727) that the mean flight speed \(\mu\) of Drosophila differs between wildtype and mutant.
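The non-integer df in the Welch output comes from the Welch-Satterthwaite approximation. The sketch below recomputes t, df, and p by hand from the fly data as a check on the t.test() output above. The variable names are chosen here for illustration.

```r
Mutant <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)

# per-group variance of the mean: s^2 / n
v1 <- var(Mutant) / length(Mutant)
v2 <- var(Wildtype) / length(Wildtype)

# Welch t statistic: difference in means over the unpooled standard error
t_welch <- (mean(Mutant) - mean(Wildtype)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(Mutant) - 1) + v2^2 / (length(Wildtype) - 1))

# two-tailed p-value
p_welch <- 2 * pt(abs(t_welch), df_welch, lower.tail = FALSE)

c(t = t_welch, df = df_welch, p = p_welch)
```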

7

#take the pairwise differences, Mutant minus Wildtype (R subtracts element by element across both vectors)
meandiff <- Mutant-Wildtype
meandiff
##  [1]   1.8  -3.0  -3.6  12.9   4.3  10.1 -10.8   4.1   2.9  -5.2  -3.5  -0.6
#now add this to a table
df <- cbind(Mutant, Wildtype, meandiff)
head(df)
##      Mutant Wildtype meandiff
## [1,]   35.4     33.6      1.8
## [2,]   33.0     36.0     -3.0
## [3,]   28.0     31.6     -3.6
## [4,]   42.2     29.3     12.9
## [5,]   34.1     29.8      4.3
## [6,]   33.2     23.1     10.1
mean(meandiff)
## [1] 0.7833333
t.test(meandiff) #fail to reject null
## 
##  One Sample t-test
## 
## data:  meandiff
## t = 0.40813, df = 11, p-value = 0.691
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -3.441046  5.007713
## sample estimates:
## mean of x 
## 0.7833333
# plug into equation: t = mean of the differences / standard error of the differences

t.meandiff <- mean(meandiff)/(sd(meandiff)/sqrt(length(meandiff)))

#solve df: number of pairs - 1
df <- length(meandiff)-1
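A one-sample t-test on the differences is what a paired t-test computes, so t.test() with paired = TRUE should give the same statistic. The sketch below checks this. Treating the i-th mutant and i-th wildtype value as a matched pair is an assumption made for the exercise and is not stated in the data description.

```r
Mutant <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)

# paired test on the two vectors (assumes element i of each vector forms a pair)
paired_res <- t.test(Mutant, Wildtype, paired = TRUE)

# one-sample test on the differences against a mean of 0
diff_res <- t.test(Mutant - Wildtype)

# TRUE if both routes give the same t statistic
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
```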

Part IV:

t-test function

8

t.test(mice, mu = 135, alternative = "two.sided") #mu is the hypothesized mean from H0 (135), not the sample mean
## 
##  One Sample t-test
## 
## data:  mice
## t = 1.6732, df = 9, p-value = 0.1286
## alternative hypothesis: true mean is not equal to 135
## 95 percent confidence interval:
##  132.0431 154.7569
## sample estimates:
## mean of x 
##     143.4
t.test(Mutant, Wildtype, alternative = "two.sided", var.equal = T)
## 
##  Two Sample t-test
## 
## data:  Mutant and Wildtype
## t = 0.35431, df = 22, p-value = 0.7265
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.801687  5.368354
## sample estimates:
## mean of x mean of y 
##  34.88333  34.10000
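t.test() returns a list-like object, so the values needed for a conclusion sentence can be pulled out directly instead of copied from the printed output. The sketch below does this for the mouse data, using the hypothesized mean of 135 from Part III. The names `res` and `summary_text` are chosen here for illustration.

```r
mice <- c(145, 132, 164, 139, 174, 122, 136, 141, 151, 130)

res <- t.test(mice, mu = 135, alternative = "two.sided")

# statistic = t, parameter = df, p.value = two-tailed p
summary_text <- sprintf("t(%d) = %.3f, p = %.3f",
                        as.integer(res$parameter), res$statistic, res$p.value)
summary_text
```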

Part V:

Algal Density

9

islandData <- read.csv("island.csv") #read in your data

#subset north and south beaches
north <- subset(islandData, Face == "North") #grab north beaches using the Face column
south <- subset(islandData, Face == "South") #look at the data in your environment. Notice it is a data frame, not a string or list of numbers.

hist(south$AlgalDensity, ylim = c(0,.1), xlim=c(20, 65), freq = F, col="#747da5")
curve(dnorm(x, mean(south$AlgalDensity), sd(south$AlgalDensity)), add=T) #overlay a normal curve with the mean and sd of south-facing algal density

hist(north$AlgalDensity, ylim = c(0,.1), xlim=c(20, 65), freq = F, col="#8db42a")
curve(dnorm(x, mean(north$AlgalDensity), sd(north$AlgalDensity)), add = T) #use the north-facing mean and sd here, not the south-facing values

hist(north$Nutrients, ylim = c(0,.1), xlim=c(0, 35), freq = F, col="steelblue")
curve(dnorm(x, mean(north$Nutrients), sd(north$Nutrients)), add = T) #mean and sd of the north-facing nutrient data

hist(south$Nutrients, ylim = c(0,.1), xlim=c(0, 35), freq = F, col="pink")
curve(dnorm(x, mean(south$Nutrients), sd(south$Nutrients)), add = T) #mean and sd of the south-facing nutrient data
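The means and standard deviations used for the normal curves can be computed per group in one step with tapply(). island.csv is not available here, so the sketch below uses a made-up stand-in: `islandDemo`, its values, and the 15 rows per face are all assumptions, and only the column names mirror the ones used above.

```r
# Hypothetical stand-in for island.csv (simulated values, not the real data)
set.seed(42)
islandDemo <- data.frame(
  Face = rep(c("North", "South"), each = 15),
  AlgalDensity = c(rnorm(15, mean = 47, sd = 8), rnorm(15, mean = 42, sd = 8))
)

# mean and sd of AlgalDensity within each level of Face
group_means <- tapply(islandDemo$AlgalDensity, islandDemo$Face, mean)
group_sds <- tapply(islandDemo$AlgalDensity, islandDemo$Face, sd)

rbind(mean = group_means, sd = group_sds)
```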

Statistical Hypotheses

H0: \(\mu_S\)\(=\)\(\mu_N\)

HA: \(\mu_S\)\(\neq\)\(\mu_N\)

We can run a 2 sample t-test with 2 tails

t.test(south$AlgalDensity, north$AlgalDensity, alternative = c("two.sided"), var.equal = T)
## 
##  Two Sample t-test
## 
## data:  south$AlgalDensity and north$AlgalDensity
## t = -2.1186, df = 28, p-value = 0.04313
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -10.4623835  -0.1762831
## sample estimates:
## mean of x mean of y 
##  42.11800  47.43733
t.test(south$Nutrients, north$Nutrients, alternative = c("two.sided"), var.equal = T)
## 
##  Two Sample t-test
## 
## data:  south$Nutrients and north$Nutrients
## t = 0.64891, df = 28, p-value = 0.5217
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.471101  6.690009
## sample estimates:
## mean of x mean of y 
##  18.61688  17.00743
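Both tests above set var.equal = T, which assumes the two faces have equal population variances. One possible check is an F test with var.test(), followed by a comparison of the pooled and Welch results. The sketch below runs on simulated values (`south_demo` and `north_demo` are hypothetical, not the island data).

```r
# Simulated stand-ins for the two groups (not the real island data)
set.seed(7)
south_demo <- rnorm(15, mean = 42, sd = 8)
north_demo <- rnorm(15, mean = 47, sd = 8)

# F test of H0: the ratio of the two population variances is 1
f_res <- var.test(south_demo, north_demo)
f_res$statistic   # F = var(south_demo) / var(north_demo)
f_res$p.value

# pooled test (assumes equal variances) versus Welch's test (does not)
pooled <- t.test(south_demo, north_demo, var.equal = TRUE)
welch <- t.test(south_demo, north_demo)

c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
```

With equal group sizes the pooled and Welch t statistics are identical and only the degrees of freedom differ, so the choice matters most when group sizes or variances are very unequal.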

10

Algae Conclusion Sentences: Write a conclusion for each test that reports t, df, and the p-value. Decide whether to reject or fail to reject each null hypothesis. Are the mean algal densities statistically different between North and South? Are the mean nutrient levels statistically different between North and South? See problem 6 for an example.