Markdown Author: Jessie Bell, 2023
Libraries Used: ggplot2, ggpubr
Answers: sea green
Setting up t-test scenarios
Take home from this exercise?
Answer:
1. Look at the data. (It is informative to examine the data visually before running a null hypothesis significance test, NHST.) Are the data normal? Was the sample random?
2. State the statistical hypotheses:
   * Null hypothesis (depends on the type of t-test)
   * Alternative (all other possibilities)
3. Determine the degrees of freedom.
4. Choose the criterion to reject or fail to reject H0. For instance, set α = 0.05: if p ≤ 0.05, reject H0; otherwise, fail to reject H0.
5. Calculate t. The test statistic for all t-tests is t_calc (or t_stat), but it is calculated slightly differently for one-sample, paired, and two-sample t-tests.
6. Make a decision about H0. There are two options: 'reject H0' or 'fail to reject H0'. ('If the p is low, the null must go.')
7. Write a conclusion sentence. Compose a summary sentence that answers the initial question, for example: The mean length of salmon differed between Chuckanut and Squalicum Creek (two-sample t-test, \(t_{2,36} = 3.12\), \(p = 0.004\)). The format is \(t_{\text{tails},\,df}\) = t-test statistic, p = calculated p-value.
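As a sketch of step 7, the small R helper below pulls t, df, and p out of the object that `t.test()` returns and pastes them into the t(tails)df format described above. The function name `format_t_result` and its rounding choices are assumptions for illustration, not part of the original exercise.

```r
# Hypothetical helper: format a t.test() result in the
# "t(tails)df = t-test stat, p = p-value" style used in this document.
format_t_result <- function(res, tails = 2L) {
  t_stat <- unname(res$statistic)   # the t statistic
  df     <- unname(res$parameter)   # degrees of freedom (fractional for Welch)
  p      <- res$p.value             # p-value for the chosen alternative
  sprintf("t(%d)%s = %.2f, p = %.3f", tails, format(round(df, 1)), t_stat, p)
}

# Example with the mouse heart-rate data used later in this document
mice <- c(145, 132, 164, 139, 174, 122, 136, 141, 151, 130)
res <- t.test(mice, mu = 135, alternative = "two.sided")
format_t_result(res)
```

For a one-tailed test you would pass `tails = 1L` together with the matching `alternative` in `t.test()`.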
Describe how the assumptions of the t-test are reflected in the way we use the t.test() function.
The t-distribution
Plot the standard normal distribution and overlay t-distributions on it. Indicate which is which.
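# standard normal as the reference curve (black)
curve(dnorm(x, 0, 1), from=-4, to=4, col='black', lwd=2, ylab='density')
# overlay t-distributions; add=TRUE keeps the normal curve on the same plot
curve(dt(x, df=6), from=-4, to=4, col='steelblue', add=TRUE)
curve(dt(x, df=10), from=-4, to=4, col='pink', add=TRUE)
curve(dt(x, df=30), from=-4, to=4, col='red', add=TRUE)
#add legend
legend(-4, .3, legend=c("normal", "df=6", "df=10", "df=30"),
       col=c("black", "steelblue", "pink", "red"), lty=1, cex=1.2)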
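The plot shows the heavier tails of the t-distributions shrinking toward the normal as df grows. The same pattern appears in the two-tailed critical values at α = 0.05, which this short sketch computes with `qt()` and `qnorm()` (the df values simply mirror the plot).

```r
# Two-tailed critical values at alpha = 0.05 (0.025 in each tail)
dfs    <- c(6, 10, 30)
t_crit <- qt(0.975, df = dfs)   # roughly 2.447, 2.228, 2.042
z_crit <- qnorm(0.975)          # roughly 1.96 for the standard normal
# heavier tails give a larger cut-off; the t values shrink toward z as df grows
data.frame(df = dfs, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```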
The average heart rate of mice is 135 beats per minute under normal conditions, and you want to see if a chemical changes the heart rate of mice.
Statistical Hypotheses
H0: \(\mu = 135\)
HA: \(\mu \neq 135\)
mice <- c(145, 132, 164, 139, 174, 122, 136, 141, 151, 130)
meanMice <- mean(mice)
meanMice #larger than 135, which is the mean heart rate of mice without the chemical
## [1] 143.4
sdMice <- sd(mice)
sdMice
## [1] 15.87591
n <- length(mice)
n
## [1] 10
df <- n-1 #for 1 sample t-test, df=9
tvalueMice <- (meanMice-135)/(sdMice/sqrt(n))
tvalueMice
## [1] 1.673173
# determine t-critical for a two-tailed test at alpha = 0.05 (0.025 in each tail), df = 9
qt(0.975, df)
## [1] 2.262157
#comparing t-critical (2.26) with tvalueMice (1.67): tvalueMice < t-critical, so we fail to reject H0.
#We should still calculate our p-value because we will want to use it in our summary statement.
pt(tvalueMice, df, lower.tail = F) *2
## [1] 0.1286214
# our p-value is greater than 0.05, so we fail to reject our null.
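The p-value line above multiplies the upper-tail probability by 2, which works here because `tvalueMice` is positive. A sign-safe version uses `abs()`. The sketch below wraps the whole one-sample calculation in a hypothetical helper, `one_sample_t()`, and checks it against `t.test()`.

```r
# Hypothetical helper: one-sample t-test by hand.
# abs() makes the two-tailed p-value correct whether t is positive or negative.
one_sample_t <- function(x, mu0) {
  n      <- length(x)
  t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # (sample mean - mu0) / SE
  df     <- n - 1
  p      <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
  c(t = t_stat, df = df, p = p)
}

mice <- c(145, 132, 164, 139, 174, 122, 136, 141, 151, 130)
manual  <- one_sample_t(mice, mu0 = 135)
builtin <- t.test(mice, mu = 135)
# the two approaches should agree
all.equal(unname(manual["t"]), unname(builtin$statistic))
all.equal(unname(manual["p"]), builtin$p.value)
```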
Mice Conclusion Sentence: The mean heart rate of mice after receiving the chemical was not statistically different from the average heart rate of mice without the chemical (135 bpm). We fail to reject H\(_0\): there is not enough evidence (\(t_{2,9}\) = 1.673; two-tailed p = 0.129) that the \(\mu\) heart rate of mice after receiving the chemical differs from that of mice without the chemical.
You have a mutant and a wildtype line of Drosophila and you want to test whether the wildtype flies faster than the mutant.
Mutant <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)
Boxplot
library(ggplot2)
library(ggpubr) #provides ggarrange()
#put data into a data frame
flyData <- data.frame(Mutant, Wildtype)
#plot the data
plot1 <- ggplot(flyData, aes(Mutant))+
  geom_boxplot(color="#c178f7")+
  xlab("Mutant Flight m/s^2")
plot2 <- ggplot(flyData, aes(Wildtype))+
  geom_boxplot(color="#00bfc4")+
  xlab("Wildtype Flight m/s^2")
#plot them together
ggarrange(plot1, plot2)
Run the two sample t-test
Statistical Hypotheses
H0: \(\mu_1 = \mu_2\)
HA: \(\mu_1 \neq \mu_2\)
mean.wild <- mean(Wildtype)
mean.mutant <- mean(Mutant)
sd.wild <- sd(Wildtype)
sd.mutant <- sd(Mutant)
denominator1 <- (sd.mutant^2)/length(Mutant) #just doing it the long way to ensure I calc it correctly
denominator2 <- (sd.wild^2)/length(Wildtype)
t.calc <- (mean.mutant-mean.wild)/sqrt((denominator1+denominator2)) #Mutant minus Wildtype, the same order t.test(Mutant, Wildtype) uses
t.calc
var(Mutant)
## [1] 21.98697
var(Wildtype) #variances are not equal (about 22 vs 37), so use Welch's t-test
## [1] 36.66727
#Welch's test estimates the degrees of freedom differently (Welch-Satterthwaite), so let t.test() do it; var.equal = FALSE is the default
t.test(Mutant,
Wildtype,
alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: Mutant and Wildtype
## t = 0.35431, df = 20.703, p-value = 0.7267
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.818397 5.385064
## sample estimates:
## mean of x mean of y
## 34.88333 34.10000
# p-value > 0.05 and we fail to reject H0
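`t.test()` reports a fractional df (20.703) because Welch's test uses the Welch-Satterthwaite approximation instead of n1 + n2 - 2. As a sketch, the approximation can be computed directly from the two sample variances and compared with the value `t.test()` reports.

```r
Mutant   <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)
# squared standard error of each mean
v1 <- var(Mutant) / length(Mutant)
v2 <- var(Wildtype) / length(Wildtype)
# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v1 + v2)^2 /
  (v1^2 / (length(Mutant) - 1) + v2^2 / (length(Wildtype) - 1))
welch_df
# compare with the df that t.test() reports for the Welch test
unname(t.test(Mutant, Wildtype)$parameter)
```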
Drosophila Conclusion Sentence: The mean flight speeds of wildtype and mutant flies are not statistically different from one another. We fail to reject H\(_0\): there is no evidence (Welch two-sample t-test, \(t_{2,20.7}\) = 0.35; two-tailed p = 0.727) that the \(\mu\) flight speed of Drosophila differs between the wildtype and mutant lines.
#paired approach: take the element-wise differences (R subtracts the two vectors position by position)
meandiff <- Mutant-Wildtype
meandiff
## [1] 1.8 -3.0 -3.6 12.9 4.3 10.1 -10.8 4.1 2.9 -5.2 -3.5 -0.6
#now add this to a table
df <- cbind(Mutant, Wildtype, meandiff)
head(df)
## Mutant Wildtype meandiff
## [1,] 35.4 33.6 1.8
## [2,] 33.0 36.0 -3.0
## [3,] 28.0 31.6 -3.6
## [4,] 42.2 29.3 12.9
## [5,] 34.1 29.8 4.3
## [6,] 33.2 23.1 10.1
mean(meandiff)
## [1] 0.7833333
t.test(meandiff) #fail to reject null
##
## One Sample t-test
##
## data: meandiff
## t = 0.40813, df = 11, p-value = 0.691
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -3.441046 5.007713
## sample estimates:
## mean of x
## 0.7833333
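# plug into equation: t = mean of the differences / (sd of the differences / sqrt(n))
t.meandiff <- mean(meandiff)/(sd(meandiff)/sqrt(length(meandiff)))
t.meandiff
#solve df: n - 1 for a paired t-test
df <- length(meandiff) - 1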
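The hand calculation above is a paired t-test: a one-sample t-test of the differences against 0. As a sketch, the same numbers come out of `t.test()` with `paired = TRUE`. Pairing assumes each Mutant value is matched with the Wildtype value in the same position (for example, measured under the same conditions); the original data do not say this, so treat the pairing here as illustrative.

```r
Mutant   <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)
d <- Mutant - Wildtype                     # element-wise differences
n <- length(d)
t_paired  <- mean(d) / (sd(d) / sqrt(n))   # mean difference over its standard error
df_paired <- n - 1                         # one df used to estimate the mean difference
p_paired  <- 2 * pt(abs(t_paired), df_paired, lower.tail = FALSE)
c(t = t_paired, df = df_paired, p = p_paired)
# built-in equivalent
t.test(Mutant, Wildtype, paired = TRUE)
```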
t-test function
t.test(mice, mu = 135, alternative = "two.sided") #mu is the hypothesized population mean (135 bpm), not the sample mean
##
## One Sample t-test
##
## data: mice
## t = 1.6732, df = 9, p-value = 0.1286
## alternative hypothesis: true mean is not equal to 135
## 95 percent confidence interval:
## 132.0431 154.7569
## sample estimates:
## mean of x
## 143.4
t.test(Mutant, Wildtype, alternative = "two.sided", var.equal = T)
##
## Two Sample t-test
##
## data: Mutant and Wildtype
## t = 0.35431, df = 22, p-value = 0.7265
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.801687 5.368354
## sample estimates:
## mean of x mean of y
## 34.88333 34.10000
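Both the Welch output and the Student (`var.equal = T`) output report t = 0.35431; only the df and p-value differ. That is expected when the two samples are the same size, because the pooled standard error then reduces to the Welch one. A sketch of the pooled-variance calculation, checked against `t.test()`:

```r
Mutant   <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)
n1 <- length(Mutant)
n2 <- length(Wildtype)
# pooled variance: df-weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(Mutant) + (n2 - 1) * var(Wildtype)) / (n1 + n2 - 2)
t_pooled  <- (mean(Mutant) - mean(Wildtype)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2
# Welch (unpooled) statistic for comparison
t_welch <- (mean(Mutant) - mean(Wildtype)) / sqrt(var(Mutant) / n1 + var(Wildtype) / n2)
c(t_pooled = t_pooled, t_welch = t_welch, df_pooled = df_pooled)
```

With unequal sample sizes the two statistics would generally differ, so the choice of `var.equal` matters more there.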
Algal Density
islandData <- read.csv("island.csv") #read in your data
#subset north and south beaches
north <- subset(islandData, Face == "North") #grab north beaches using the Face column
south <- subset(islandData, Face == "South") #look at the data in your environment. Notice it is a data frame, not a vector of numbers.
hist(south$AlgalDensity, ylim = c(0,.1), xlim=c(20, 65), freq = F, col="#747da5")
curve(dnorm(x, mean(south$AlgalDensity), sd(south$AlgalDensity)), add=T) #normal curve from the south-facing mean and sd (about 42.12 and 8.7)
hist(north$AlgalDensity, ylim = c(0,.1), xlim=c(20, 65), freq = F, col="#8db42a")
curve(dnorm(x, mean(north$AlgalDensity), sd(north$AlgalDensity)), add = T) #use the north-facing mean and sd here, not the south values
hist(north$Nutrients, ylim = c(0,.1), xlim=c(0, 35), freq = F, col="steelblue")
curve(dnorm(x, mean(north$Nutrients), sd(north$Nutrients)), add = T) #mean and sd of the north nutrient data
hist(south$Nutrients, ylim = c(0,.1), xlim=c(0, 35), freq = F, col="pink")
curve(dnorm(x, mean(south$Nutrients), sd(south$Nutrients)), add = T) #mean and sd of the south nutrient data
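The histograms give a visual check of normality. As a complement, this sketch runs a Shapiro-Wilk normality test on each sample and an F test of equal variances (`var.test()`), the assumption behind `var.equal = T`. The helper name `check_t_assumptions` is hypothetical, and it is demonstrated on the fly data because island.csv is not included here; for the island data you would pass, for example, `south$AlgalDensity` and `north$AlgalDensity`. With small samples these tests have low power, so a large p-value does not prove that an assumption holds.

```r
# Hypothetical helper: p-values for common t-test assumption checks
check_t_assumptions <- function(x, y) {
  c(normal_x  = shapiro.test(x)$p.value,  # Shapiro-Wilk, H0: x comes from a normal distribution
    normal_y  = shapiro.test(y)$p.value,  # Shapiro-Wilk, H0: y comes from a normal distribution
    equal_var = var.test(x, y)$p.value)   # F test, H0: the two variances are equal
}

Mutant   <- c(35.4, 33.0, 28.0, 42.2, 34.1, 33.2, 32.1, 36.8, 42.6, 33.2, 39.4, 28.6)
Wildtype <- c(33.6, 36.0, 31.6, 29.3, 29.8, 23.1, 42.9, 32.7, 39.7, 38.4, 42.9, 29.2)
check_t_assumptions(Mutant, Wildtype)
```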
Statistical Hypotheses
H0: \(\mu_S = \mu_N\)
HA: \(\mu_S \neq \mu_N\)
We can run a 2 sample t-test with 2 tails
t.test(south$AlgalDensity, north$AlgalDensity, alternative = c("two.sided"), var.equal = T)
##
## Two Sample t-test
##
## data: south$AlgalDensity and north$AlgalDensity
## t = -2.1186, df = 28, p-value = 0.04313
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -10.4623835 -0.1762831
## sample estimates:
## mean of x mean of y
## 42.11800 47.43733
t.test(south$Nutrients, north$Nutrients, alternative = c("two.sided"), var.equal = T)
##
## Two Sample t-test
##
## data: south$Nutrients and north$Nutrients
## t = 0.64891, df = 28, p-value = 0.5217
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.471101 6.690009
## sample estimates:
## mean of x mean of y
## 18.61688 17.00743
Algae Conclusion Sentences: Write conclusion sentences that report t, df, and the p-value, and decide whether to reject or fail to reject each null hypothesis. Are mean algal densities statistically different between North and South? Are mean nutrient levels statistically different between North and South? See problem 6 for an example.