In weeks 11 and 12 we cover various forms of t-tests. Please find code for the in-class examples below.
Answers.com claims that the mean length of all cell phone conversations in the United States is 195 seconds (3 minutes 15 seconds). One researcher believes this value is outdated and that the true mean length of a cell phone call is something other than 195 seconds. He collects a random sample of 100 phone call lengths from a phone company’s records and finds a sample mean of 182 seconds and a standard deviation of 245 seconds.
# EX: CELL PHONE CONVERSATIONS
mu0<-195
xbar<-182
s<-245
n<-100
# calculate the test statistic
t_stat<-(xbar-mu0)/(s/sqrt(n))
t_stat
## [1] -0.5306122
# two-sided alternative
pt(abs(t_stat), df=n-1, lower.tail=FALSE)*2
## [1] 0.5968758
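The problem does not state a significance level. As a rough sketch, assuming the conventional 0.05 level (an assumption, not part of the original example), the same conclusion can be reached with a critical value or a confidence interval:

```r
# Summary statistics from the cell phone example
mu0 <- 195
xbar <- 182
s <- 245
n <- 100
alpha <- 0.05  # assumed; the problem statement does not give a level

t_stat <- (xbar - mu0) / (s / sqrt(n))

# Two-sided critical value: reject H0 only when |t| exceeds it
t_crit <- qt(1 - alpha / 2, df = n - 1)
abs(t_stat) > t_crit  # FALSE here, so we do not reject H0

# Matching 95% confidence interval for the true mean call length
ci <- xbar + c(-1, 1) * t_crit * (s / sqrt(n))
ci  # the interval contains 195, which agrees with the large p-value
```

Under that assumed level, the sample does not give evidence that the mean differs from 195 seconds.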
Although it is known that the white shark grows to a mean length of 21 feet, a marine biologist believes that the great white sharks off the Bermuda coast grow much longer due to unusual feeding habits. To test this claim, a number of full-grown great white sharks are captured off the Bermuda coast, measured and then set free. For the 15 sharks that were caught, a sample mean of 22.1 feet and a sample standard deviation of 3.2 feet were found. A histogram of the sample data appeared to be approximately normal. Test using a significance level of 0.05.
# EX: SHARKS
mu0<-21
xbar<-22.1
s<-3.2
n<-15
# calculate the test statistic
t_stat<-(xbar-mu0)/(s/sqrt(n))
t_stat
## [1] 1.331338
# one-sided upper alternative
pt(t_stat, df=n-1, lower.tail=FALSE)
## [1] 0.1021756
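The first two examples repeat the same steps from summary statistics, so they can be wrapped in a small function. The helper below, `t_test_from_summary`, is a hypothetical name made up for this sketch and is not a base R function:

```r
# Hypothetical helper (not part of base R):
# one-sample t-test computed from summary statistics
t_test_from_summary <- function(xbar, s, n, mu0,
                                alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  deg_free <- n - 1
  p_value <- switch(alternative,
    two.sided = 2 * pt(abs(t_stat), df = deg_free, lower.tail = FALSE),
    greater   = pt(t_stat, df = deg_free, lower.tail = FALSE),
    less      = pt(t_stat, df = deg_free, lower.tail = TRUE)
  )
  list(statistic = t_stat, df = deg_free, p.value = p_value)
}

# Shark example: one-sided upper alternative at the 0.05 level
sharks <- t_test_from_summary(xbar = 22.1, s = 3.2, n = 15, mu0 = 21,
                              alternative = "greater")
sharks$p.value         # matches the hand calculation above, about 0.102
sharks$p.value < 0.05  # FALSE: do not reject H0 at the 0.05 level
```

Since the p-value is larger than 0.05, the sample does not give convincing evidence that the Bermuda sharks grow longer than 21 feet on average.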
There is evidence that T cells participate in controlling tumor growth and that they can be harnessed to use the body’s immune system to treat cancer. One study investigated the use of a T cell-engaging antibody to recruit T cells to control tumor growth. The data below are T cell counts (1000 per microliter) at baseline and after 20 days on this antibody for 6 subjects.
Do the data give convincing evidence that the mean count of T cells is higher after 20 days on this antibody at the 0.01 significance level?
# EX: T-cells
before<-c(0.04, 0.02, 0.00, 0.02, 0.33, 0.38)
after<-c(0.28, 0.47, 1.30, 0.25, 0.44, 1.22)
diff<-after-before
diff
## [1] 0.24 0.45 1.30 0.23 0.11 0.84
# summary statistics
mu0<-0
dbar<-mean(diff)
dbar
## [1] 0.5283333
s<-sd(diff)
s
## [1] 0.4573584
n<-length(diff)
n
## [1] 6
# calculate the test statistic
t_stat<-(dbar-mu0)/(s/sqrt(n))
t_stat
## [1] 2.829613
# one-sided upper alternative
pt(t_stat, df=n-1, lower.tail=FALSE)
## [1] 0.01834571
Instead of coding this from scratch, you can use R's built-in t.test function!
# code for passing in two vectors for a paired t-test
t.test(after, before, paired=TRUE, alternative = "greater", conf.level=.99)
##
## Paired t-test
##
## data: after and before
## t = 2.8296, df = 5, p-value = 0.01835
## alternative hypothesis: true difference in means is greater than 0
## 99 percent confidence interval:
## -0.09995215 Inf
## sample estimates:
## mean of the differences
## 0.5283333
# or you can pass in a single vector of differences and test against 0
t.test(diff, mu=0, alternative = "greater", conf.level=.99)
##
## One Sample t-test
##
## data: diff
## t = 2.8296, df = 5, p-value = 0.01835
## alternative hypothesis: true mean is greater than 0
## 99 percent confidence interval:
## -0.09995215 Inf
## sample estimates:
## mean of x
## 0.5283333
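To answer the question at the 0.01 level, the p-value and the confidence bound can be pulled straight out of the object that `t.test` returns. A minimal sketch, using the `p.value` and `conf.int` components of that object:

```r
before <- c(0.04, 0.02, 0.00, 0.02, 0.33, 0.38)
after  <- c(0.28, 0.47, 1.30, 0.25, 0.44, 1.22)
alpha  <- 0.01  # significance level given in the problem

result <- t.test(after, before, paired = TRUE,
                 alternative = "greater", conf.level = 1 - alpha)

result$p.value          # about 0.018, as in the printed output
result$p.value < alpha  # FALSE: do not reject H0 at the 0.01 level
result$conf.int         # one-sided 99% bound; the lower limit is below 0
```

The p-value is below 0.05 but not below 0.01, so at the stated level the data do not give convincing evidence of a higher mean T cell count. The one-sided 99% bound tells the same story because it extends below 0.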
A researcher compares two different thermometers on 6 different days. Ideally, these two instruments should agree. The researcher would like to know if the two measurements give the same reading on average.
Assume the populations are normally distributed. Use a 0.10 significance level.
# EX: Thermometers
therm1<-c(75.4, 74.5, 74.3, 74.7, 74.9, 75.3)
therm2<-c(75.2, 75.1, 74.9, 75.3, 75.0, 75.7)
diff<-therm1-therm2
diff
## [1] 0.2 -0.6 -0.6 -0.6 -0.1 -0.4
# summary statistics
mu0<-0
dbar<-mean(diff)
dbar
## [1] -0.35
s<-sd(diff)
s
## [1] 0.3331666
n<-length(diff)
n
## [1] 6
# critical value
qt(.95, df=n-1)
## [1] 2.015048
# calculate the confidence interval
dbar+c(-1, 1)*qt(.95, df=n-1)*(s/sqrt(n))
## [1] -0.62407621 -0.07592379
Instead of coding this from scratch, you can use R's t.test function and look at the confidence interval.
# code for passing in two vectors for a paired t-test
t.test(therm1, therm2, paired=TRUE, alternative="two.sided", conf.level=0.9)
##
## Paired t-test
##
## data: therm1 and therm2
## t = -2.5733, df = 5, p-value = 0.04984
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
## -0.62407621 -0.07592379
## sample estimates:
## mean of the differences
## -0.35
# or you can pass in a single vector of differences and test against 0
t.test(diff, mu=0, alternative="two.sided", conf.level=0.9)
##
## One Sample t-test
##
## data: diff
## t = -2.5733, df = 5, p-value = 0.04984
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
## -0.62407621 -0.07592379
## sample estimates:
## mean of x
## -0.35
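The hand calculation for this example stops at the confidence interval. As a sketch, the matching test statistic and two-sided p-value can be computed the same way as in the earlier examples. The differences are stored in `d` here (a name chosen for this sketch) so that the base R function `diff` is not masked:

```r
therm1 <- c(75.4, 74.5, 74.3, 74.7, 74.9, 75.3)
therm2 <- c(75.2, 75.1, 74.9, 75.3, 75.0, 75.7)
d <- therm1 - therm2  # paired differences
n <- length(d)

# test statistic for H0: mean difference = 0
t_stat <- (mean(d) - 0) / (sd(d) / sqrt(n))
t_stat   # about -2.5733, as reported by t.test

# two-sided p-value
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
p_value  # about 0.0498, as reported by t.test

# at the 0.10 level, reject H0 exactly when |t| exceeds qt(0.95, df)
abs(t_stat) > qt(0.95, df = n - 1)
```

The p-value is below 0.10 and the 90% confidence interval excludes 0, so both approaches suggest the two thermometers do not agree on average.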
A marketing research firm tests the effectiveness of a new flavoring for a leading beverage using a sample of 20 people, half of whom taste the beverage with the old flavoring and half of whom taste the beverage with the new flavoring. The people in the study are then given a questionnaire which evaluates how enjoyable the beverage was. The summary statistics for the scores are below. Determine whether there is a significant difference between the perception of the two flavorings.
# summary statistics
n_new<-10
xbar_new<-15
var_new<-13.33
n_old<-10
xbar_old<-11.1
var_old<-18.77
# calculate the pooled standard deviation
sp2<-((n_new-1)*var_new+(n_old-1)*var_old)/(n_new+n_old-2)
sp2
## [1] 16.05
sp<-sqrt(sp2)
sp
## [1] 4.006245
# test statistic
t_stat<-(xbar_new-xbar_old-0)/(sp*sqrt(1/n_new+1/n_old))
t_stat
## [1] 2.176768
# degrees of freedom
this_df<-n_new+n_old-2
this_df
## [1] 18
# p-value (two-sided)
pt(abs(t_stat), df=this_df, lower.tail=FALSE)*2
## [1] 0.04305272
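The problem does not name a significance level. Assuming the usual 0.05 (an assumption for this sketch), a pooled confidence interval for the difference in mean scores gives the same answer as the p-value:

```r
# summary statistics from the flavoring example
n_new <- 10; xbar_new <- 15;   var_new <- 13.33
n_old <- 10; xbar_old <- 11.1; var_old <- 18.77
alpha <- 0.05  # assumed; not stated in the problem

# pooled standard deviation and standard error of the difference
sp <- sqrt(((n_new - 1) * var_new + (n_old - 1) * var_old) / (n_new + n_old - 2))
se <- sp * sqrt(1 / n_new + 1 / n_old)
this_df <- n_new + n_old - 2

# 95% confidence interval for (mean new - mean old)
ci <- (xbar_new - xbar_old) + c(-1, 1) * qt(1 - alpha / 2, df = this_df) * se
ci

# TRUE when the interval excludes 0, i.e. a significant difference at alpha
ci[1] > 0 | ci[2] < 0
```

The interval lies entirely above 0, which agrees with the two-sided p-value of about 0.043. Keep in mind that the pooled method assumes the two groups have equal population variances.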
Different varieties of the tropical flower Heliconia are fertilized by different species of hummingbirds. Over time, the lengths of the flowers and the forms of the hummingbirds’ beaks have evolved to match each other.
Data on the lengths in millimeters of two color varieties of the same species of flower on the island of Dominica can be found in the R script.
# EX: TROPICAL FLOWERS
# H. caribaea RED
red<-c(42.90, 42.01, 41.93, 43.09, 41.47, 41.69, 39.78,
39.63, 42.18, 40.66, 37.87, 39.16, 37.40, 38.20,
38.10, 37.97, 38.79, 38.23, 38.87, 37.78, 38.01)
# H. caribaea YELLOW
yellow<-c(36.78, 37.02, 36.52, 36.11, 36.03, 35.45, 38.13,
37.1, 35.17, 36.82, 36.66, 35.68, 36.03, 34.57, 34.63)
# Create a data frame
flowers<-data.frame(color=c(rep("red", length(red)),
rep("yellow", length(yellow))),
length=c(red, yellow))
# into the verse!
library(tidyverse)
# Let's take a look at this problem
# Do the means of these distributions look significantly different?
ggplot(data=flowers, aes(y=length, x=color, fill=color))+
geom_boxplot()
# Learning objectives: the pipe operator, group_by, and summarise
flowers%>%
group_by(color)%>%
summarise(mean=mean(length),
sd=sd(length),
n=n())
## # A tibble: 2 x 4
##   color   mean    sd     n
##   <fct>  <dbl> <dbl> <int>
## 1 red     39.8 1.91     21
## 2 yellow  36.2 0.975    15
# We can check this!
# sample stats
xbar1<-mean(red) #39.79619
s1<-sd(red) #1.910195
n1<- length(red) #21
xbar2<-mean(yellow) #36.18
s2<-sd(yellow) #0.9753241
n2<-length(yellow) #15
# standard error
se<-sqrt(s1^2/n1 + s2^2/n2)
se #0.4870027
## [1] 0.4870027
# estimate degrees of freedom
approxDF<-min(n1-1, n2-1)
approxDF #14
## [1] 14
# estimate a 95% confidence interval
(xbar1-xbar2)+c(-1,1)*qt(0.975, df=approxDF)*se
## [1] 2.571674 4.660707
#2.571674 4.660707
# test for significant difference
tstat2<-(xbar1-xbar2)/se
tstat2 #7.425401
## [1] 7.425401
# two-sided
pt(abs(tstat2), df=approxDF, lower.tail=FALSE)*2
## [1] 3.225031e-06
# 3.225031e-06
# Now let's use the t.test function
t.test(red, yellow)
##
## Welch Two Sample t-test
##
## data: red and yellow
## t = 7.4254, df = 31.306, p-value = 2.171e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.623335 4.609046
## sample estimates:
## mean of x mean of y
## 39.79619 36.18000
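The t statistic from `t.test` matches the hand calculation, but the degrees of freedom (31.306 instead of 14) and therefore the p-value and interval do not. The hand calculation uses the conservative choice min(n1-1, n2-1), while `t.test` uses the Welch-Satterthwaite approximation by default. A sketch of that approximation by hand:

```r
red <- c(42.90, 42.01, 41.93, 43.09, 41.47, 41.69, 39.78,
         39.63, 42.18, 40.66, 37.87, 39.16, 37.40, 38.20,
         38.10, 37.97, 38.79, 38.23, 38.87, 37.78, 38.01)
yellow <- c(36.78, 37.02, 36.52, 36.11, 36.03, 35.45, 38.13,
            37.1, 35.17, 36.82, 36.66, 35.68, 36.03, 34.57, 34.63)

# each group's contribution to the squared standard error
v1 <- var(red) / length(red)
v2 <- var(yellow) / length(yellow)

# Welch-Satterthwaite approximate degrees of freedom
welch_df <- (v1 + v2)^2 /
  (v1^2 / (length(red) - 1) + v2^2 / (length(yellow) - 1))
welch_df  # about 31.3, matching the df reported by t.test

# two-sided p-value using the Welch degrees of freedom
t_stat <- (mean(red) - mean(yellow)) / sqrt(v1 + v2)
2 * pt(abs(t_stat), df = welch_df, lower.tail = FALSE)
```

Because the Welch degrees of freedom are larger than 14, the `t.test` interval is a bit narrower and its p-value a bit smaller than the conservative hand calculation. Both lead to the same conclusion here.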
Compare multiple means!
Let's look at another species!
## CATEGORICAL Predictor
# H. Bihai
bihai<-c(47.12, 48.07, 46.75, 48.34, 46.81, 48.15, 47.12, 50.26,
46.67, 50.12, 47.43, 46.34, 46.44, 46.94, 46.64, 48.36)
# H. caribaea RED
red<-c(42.90, 42.01, 41.93, 43.09, 41.47, 41.69, 39.78,
39.63, 42.18, 40.66, 37.87, 39.16, 37.40, 38.20,
38.10, 37.97, 38.79, 38.23, 38.87, 37.78, 38.01)
# H. caribaea YELLOW
yellow<-c(36.78, 37.02, 36.52, 36.11, 36.03, 35.45, 38.13,
37.1, 35.17, 36.82, 36.66, 35.68, 36.03, 34.57, 34.63)
type<-c(rep("bihai", length(bihai)),
rep("red", length(red)),
rep("yellow", length(yellow)))
lengths<-c(bihai, red, yellow)
heliconia<-data.frame(type, lengths)
ggplot(heliconia, aes(y=lengths, x=type, fill=type))+
geom_boxplot()
m2<-lm(lengths~type, data=heliconia)
anova(m2)
## Analysis of Variance Table
##
## Response: lengths
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## type       2 1074.13  537.06  242.86 < 2.2e-16 ***
## Residuals 49  108.36    2.21
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
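To see where the numbers in the ANOVA table come from, here is a sketch that rebuilds the F statistic from the between-group and within-group sums of squares. The variable names (`groups`, `ss_between`, `ss_within`, and so on) are chosen for this sketch only:

```r
bihai <- c(47.12, 48.07, 46.75, 48.34, 46.81, 48.15, 47.12, 50.26,
           46.67, 50.12, 47.43, 46.34, 46.44, 46.94, 46.64, 48.36)
red <- c(42.90, 42.01, 41.93, 43.09, 41.47, 41.69, 39.78,
         39.63, 42.18, 40.66, 37.87, 39.16, 37.40, 38.20,
         38.10, 37.97, 38.79, 38.23, 38.87, 37.78, 38.01)
yellow <- c(36.78, 37.02, 36.52, 36.11, 36.03, 35.45, 38.13,
            37.1, 35.17, 36.82, 36.66, 35.68, 36.03, 34.57, 34.63)

groups <- list(bihai = bihai, red = red, yellow = yellow)
all_lengths <- unlist(groups)
n_i <- sapply(groups, length)   # group sizes
mean_i <- sapply(groups, mean)  # group means
grand_mean <- mean(all_lengths)

# between-group and within-group sums of squares
ss_between <- sum(n_i * (mean_i - grand_mean)^2)
ss_within <- sum(sapply(groups, function(x) sum((x - mean(x))^2)))

# degrees of freedom: (groups - 1) and (observations - groups)
df_between <- length(groups) - 1
df_within <- length(all_lengths) - length(groups)

# F statistic = mean square between / mean square within
f_stat <- (ss_between / df_between) / (ss_within / df_within)
f_stat  # about 242.86, matching the anova table

# p-value from the upper tail of the F distribution
pf(f_stat, df_between, df_within, lower.tail = FALSE)
```

A large F statistic means the group means vary much more than the spread within each group would explain, so at least one variety has a different mean length.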