Answers.com claims that the mean length of all cell phone conversations in the United States is 195 seconds (3 minutes 15 seconds). One researcher believes this value is outdated and that the true mean length of a cell phone call is something other than 195 seconds. He collects a random sample of 100 call lengths from a phone company’s records and finds a sample mean of 182 seconds and a sample standard deviation of 245 seconds.
# EX: CELL PHONE CONVERSATIONS
mu0<-195
xbar<-182
s<-245
n<-100
# calculate the test statistic
t_stat<-(xbar-mu0)/(s/sqrt(n))
t_stat
## [1] -0.5306122
# P-value for two-sided alternative
pt(abs(t_stat), df=n-1, lower.tail=FALSE)*2
## [1] 0.5968758
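The same three steps (standard error, t statistic, tail area) recur whenever only summary statistics are available, so they can be wrapped in a small helper. This is a minimal sketch: the function name `t_test_summary` and its arguments are hypothetical and not part of base R.

```r
# Hypothetical helper (not a base R function): one-sample t-test
# computed from summary statistics instead of raw data.
t_test_summary <- function(xbar, mu0, s, n,
                           alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  se <- s / sqrt(n)              # standard error of the sample mean
  t_stat <- (xbar - mu0) / se    # test statistic
  df <- n - 1                    # degrees of freedom
  p_value <- switch(alternative,
    two.sided = 2 * pt(abs(t_stat), df = df, lower.tail = FALSE),
    greater   = pt(t_stat, df = df, lower.tail = FALSE),
    less      = pt(t_stat, df = df, lower.tail = TRUE))
  list(t_stat = t_stat, df = df, p_value = p_value)
}

# Cell phone example: two-sided test of mu = 195
cell <- t_test_summary(xbar = 182, mu0 = 195, s = 245, n = 100)
cell$t_stat   # matches the t_stat computed above
cell$p_value  # matches the two-sided P-value computed above
```

With a P-value of about 0.60, the sample gives no convincing evidence that the mean call length differs from 195 seconds.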
Although it is known that the white shark grows to a mean length of 21 feet, a marine biologist believes that the great white sharks off the Bermuda coast grow much longer due to unusual feeding habits. To test this claim, a number of full-grown great white sharks are captured off the Bermuda coast, measured, and then set free. For the 15 sharks that were caught, the sample mean was 22.1 feet and the sample standard deviation was 3.2 feet. A histogram of the sample data appeared to be approximately normal. Test the biologist’s claim using a significance level of 0.05.
# EX: SHARKS
mu0<-21
xbar<-22.1
s<-3.2
n<-15
# calculate the test statistic
t_stat<-(xbar-mu0)/(s/sqrt(n))
t_stat
## [1] 1.331338
# one-sided upper alternative
pt(t_stat, df=n-1, lower.tail=FALSE)
## [1] 0.1021756
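The problem asks for a test at the 0.05 level, so the last step is to compare the P-value with alpha. The sketch below repeats the shark calculation and reaches the decision two equivalent ways, by P-value and by critical value. The names `alpha`, `t_crit`, `reject_by_p`, and `reject_by_crit` are introduced here for illustration only.

```r
# Shark example: H0: mu = 21 versus Ha: mu > 21
mu0 <- 21
xbar <- 22.1
s <- 3.2
n <- 15
alpha <- 0.05

t_stat <- (xbar - mu0) / (s / sqrt(n))
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Decision by P-value: reject H0 when the P-value is below alpha
reject_by_p <- p_value < alpha

# Equivalent decision by critical value: reject H0 when t_stat exceeds
# the upper-alpha quantile of the t distribution with n - 1 df
t_crit <- qt(1 - alpha, df = n - 1)
reject_by_crit <- t_stat > t_crit

reject_by_p     # FALSE
reject_by_crit  # FALSE
```

Both checks agree: the P-value of about 0.10 is larger than 0.05, so we fail to reject the null hypothesis. The sample does not give convincing evidence that the Bermuda sharks have a mean length greater than 21 feet.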
There is evidence that T cells participate in controlling tumor growth and that they can be harnessed to use the body’s immune system to treat cancer. One study investigated the use of a T cell-engaging antibody to recruit T cells to control tumor growth. The data below are T cell counts (1000 per microliter) at baseline and after 20 days on this antibody for 6 subjects.
Do the data give convincing evidence that the mean T cell count is higher after 20 days on this antibody at the 0.01 significance level?
# EX: T-cells
before<-c(0.04, 0.02, 0.00, 0.02, 0.33, 0.38)
after<-c(0.28, 0.47, 1.30, 0.25, 0.44, 1.22)
diff<-after-before
diff
## [1] 0.24 0.45 1.30 0.23 0.11 0.84
# summary statistics
mu0<-0
dbar<-mean(diff)
dbar
## [1] 0.5283333
s<-sd(diff)
s
## [1] 0.4573584
n<-length(diff)
n
## [1] 6
# calculate the test statistic
t_stat<-(dbar-mu0)/(s/sqrt(n))
t_stat
## [1] 2.829613
# one-sided upper alternative
pt(t_stat, df=n-1, lower.tail=FALSE)
## [1] 0.01834571
# code for passing in two vectors for a paired t-test
t.test(after, before, paired=TRUE, alternative = "greater", conf.level=.99)
##
## Paired t-test
##
## data: after and before
## t = 2.8296, df = 5, p-value = 0.01835
## alternative hypothesis: true difference in means is greater than 0
## 99 percent confidence interval:
## -0.09995215 Inf
## sample estimates:
## mean of the differences
## 0.5283333
# or you can pass in a single vector of differences and test against 0
t.test(diff, mu=0, alternative = "greater", conf.level=.99)
##
## One Sample t-test
##
## data: diff
## t = 2.8296, df = 5, p-value = 0.01835
## alternative hypothesis: true mean is greater than 0
## 99 percent confidence interval:
## -0.09995215 Inf
## sample estimates:
## mean of x
## 0.5283333
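The one-sided interval reported by `t.test()` can be reproduced by hand, which also shows how it ties to the decision at the 0.01 level. This is a sketch that assumes the lower bound is the mean difference minus `qt(0.99, df)` standard errors, as is standard for a one-sided t interval. The names `d`, `lower_bound`, and `alpha` are introduced here for illustration.

```r
before <- c(0.04, 0.02, 0.00, 0.02, 0.33, 0.38)
after  <- c(0.28, 0.47, 1.30, 0.25, 0.44, 1.22)
d <- after - before          # paired differences
n <- length(d)
se <- sd(d) / sqrt(n)        # standard error of the mean difference
alpha <- 0.01

# One-sided 99% lower confidence bound for the mean difference
lower_bound <- mean(d) - qt(1 - alpha, df = n - 1) * se
lower_bound                  # matches the lower limit printed by t.test()

# One-sided upper P-value, as above
p_value <- pt(mean(d) / se, df = n - 1, lower.tail = FALSE)
p_value < alpha              # FALSE
```

Because the lower bound is below 0 and the P-value (about 0.018) is larger than 0.01, we fail to reject the null hypothesis at the 0.01 level, even though the result would be significant at 0.05.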
You will need to import the football dataset. Please use the following code:
football<-read.csv("https://raw.githubusercontent.com/kitadasmalley/fa2020_MATH138/main/data/football.csv",
header=TRUE)
head(football)
## Trial Helium Air Difference
## 1 1 25 25 0
## 2 2 16 23 -7
## 3 3 25 18 7
## 4 4 14 16 -2
## 5 5 23 35 -12
## 6 6 29 15 14
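The `Difference` column suggests the data are set up for a paired comparison, although the text above does not state the hypothesis. As a sketch under that assumption, the block below rebuilds only the six rows shown by `head(football)`, checks that `Difference` equals `Helium - Air`, and shows the two equivalent ways to call `t.test()`. The name `football_head` is hypothetical, and results on the full dataset will differ.

```r
# First six trials only, copied from the head(football) output above.
# This is NOT the full dataset.
football_head <- data.frame(
  Trial      = 1:6,
  Helium     = c(25, 16, 25, 14, 23, 29),
  Air        = c(25, 23, 18, 16, 35, 15),
  Difference = c(0, -7, 7, -2, -12, 14)
)

# Check the sign convention: Difference = Helium - Air
all(football_head$Helium - football_head$Air == football_head$Difference)

# Paired t-test passing in two columns ...
paired_fit <- t.test(football_head$Helium, football_head$Air, paired = TRUE)

# ... or, equivalently, a one-sample t-test on the differences against 0
diff_fit <- t.test(football_head$Difference, mu = 0)
```

To analyze the full data, replace `football_head` with the imported `football` data frame.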
Different varieties of the tropical flower Heliconia are fertilized by different species of hummingbirds. Over time, the lengths of the flowers and the forms of the hummingbirds’ beaks have evolved to match each other.
Data on the lengths in millimeters of two color varieties of the same species of flower on the island of Dominica can be found in the R script.
# EX: TROPICAL FLOWERS
# H. caribaea RED
red<-c(42.90, 42.01, 41.93, 43.09, 41.47, 41.69, 39.78,
39.63, 42.18, 40.66, 37.87, 39.16, 37.40, 38.20,
38.10, 37.97, 38.79, 38.23, 38.87, 37.78, 38.01)
# H. caribaea YELLOW
yellow<-c(36.78, 37.02, 36.52, 36.11, 36.03, 35.45, 38.13,
37.1, 35.17, 36.82, 36.66, 35.68, 36.03, 34.57, 34.63)
# Create a data frame
flowers<-data.frame(color=c(rep("red", length(red)),
rep("yellow", length(yellow))),
length=c(red, yellow))
# into the verse!
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Let's take a look at this problem
# Do the means of these distributions look significantly different?
ggplot(data=flowers, aes(y=length, x=color, fill=color))+
geom_boxplot()
# Learning objectives: the pipe operator, group_by, and summarise
flowers%>%
group_by(color)%>%
summarise(mean=mean(length),
sd=sd(length),
n=n())
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 4
## color mean sd n
## <fct> <dbl> <dbl> <int>
## 1 red 39.8 1.91 21
## 2 yellow 36.2 0.975 15
# We can check this!
# sample stats
xbar1<-mean(red) #39.79619
s1<-sd(red) #1.910195
n1<- length(red) #21
xbar2<-mean(yellow) #36.18
s2<-sd(yellow) #0.9753241
n2<-length(yellow) #15
# standard error
se<-sqrt(s1^2/n1 + s2^2/n2)
se #0.4870027
## [1] 0.4870027
# estimate degrees of freedom
approxDF<-min(n1-1, n2-1)
approxDF #14
## [1] 14
# estimate a 95% confidence interval
(xbar1-xbar2)+c(-1,1)*qt(0.975, df=approxDF)*se
## [1] 2.571674 4.660707
#2.571674 4.660707
# test for significant difference
tstat2<-(xbar1-xbar2)/se
tstat2 #7.425401
## [1] 7.425401
# two-sided
pt(abs(tstat2), df=approxDF, lower.tail=FALSE)*2
## [1] 3.225031e-06
# 3.225031e-06
# Now let's use the function
t.test(red, yellow)
##
## Welch Two Sample t-test
##
## data: red and yellow
## t = 7.4254, df = 31.306, p-value = 2.171e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.623335 4.609046
## sample estimates:
## mean of x mean of y
## 39.79619 36.18000
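The hand calculation and `t.test()` agree on the t statistic but not on the degrees of freedom. The hand calculation uses the conservative choice min(n1-1, n2-1) = 14, while `t.test()` uses the Welch-Satterthwaite approximation, which gives a larger df and therefore a smaller P-value and a narrower interval. The sketch below reproduces that approximation; the names `v1`, `v2`, and `welch_df` are introduced here for illustration.

```r
# H. caribaea RED
red <- c(42.90, 42.01, 41.93, 43.09, 41.47, 41.69, 39.78,
         39.63, 42.18, 40.66, 37.87, 39.16, 37.40, 38.20,
         38.10, 37.97, 38.79, 38.23, 38.87, 37.78, 38.01)
# H. caribaea YELLOW
yellow <- c(36.78, 37.02, 36.52, 36.11, 36.03, 35.45, 38.13,
            37.1, 35.17, 36.82, 36.66, 35.68, 36.03, 34.57, 34.63)

n1 <- length(red)
n2 <- length(yellow)
v1 <- var(red) / n1       # squared standard error contribution, red
v2 <- var(yellow) / n2    # squared standard error contribution, yellow

# Welch-Satterthwaite approximate degrees of freedom
welch_df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
welch_df                  # about 31.3, the df reported by t.test()

# Same t statistic as before, now with the Welch df
t_stat <- (mean(red) - mean(yellow)) / sqrt(v1 + v2)
p_value <- 2 * pt(abs(t_stat), df = welch_df, lower.tail = FALSE)

# 95% confidence interval using the Welch df
ci <- (mean(red) - mean(yellow)) + c(-1, 1) * qt(0.975, df = welch_df) * sqrt(v1 + v2)
```

Either choice of df leads to the same conclusion here: the mean lengths of the red and yellow varieties differ significantly.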
Compare multiple means!
Let's look at another species!
## CATEGORICAL Predictor
# H. Bihai
bihai<-c(47.12, 48.07, 46.75, 48.34, 46.81, 48.15, 47.12, 50.26,
46.67, 50.12, 47.43, 46.34, 46.44, 46.94, 46.64, 48.36)
# H. caribaea RED
red<-c(42.90, 42.01, 41.93, 43.09, 41.47, 41.69, 39.78,
39.63, 42.18, 40.66, 37.87, 39.16, 37.40, 38.20,
38.10, 37.97, 38.79, 38.23, 38.87, 37.78, 38.01)
# H. caribaea YELLOW
yellow<-c(36.78, 37.02, 36.52, 36.11, 36.03, 35.45, 38.13,
37.1, 35.17, 36.82, 36.66, 35.68, 36.03, 34.57, 34.63)
type<-c(rep("bihai", length(bihai)),
rep("red", length(red)),
rep("yellow", length(yellow)))
lengths<-c(bihai, red, yellow)
heliconia<-data.frame(type, lengths)
ggplot(heliconia, aes(y=lengths, x=type, fill=type))+
geom_boxplot()
m2<-lm(lengths~type, data=heliconia)
anova(m2)
## Analysis of Variance Table
##
## Response: lengths
## Df Sum Sq Mean Sq F value Pr(>F)
## type 2 1074.13 537.06 242.86 < 2.2e-16 ***
## Residuals 49 108.36 2.21
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
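The F statistic in the ANOVA table can be checked by hand in the same spirit as the t-tests above: it is the ratio of the between-group mean square to the within-group mean square. A minimal sketch follows; the names `ss_between`, `ss_within`, and `f_stat` are introduced here for illustration.

```r
bihai <- c(47.12, 48.07, 46.75, 48.34, 46.81, 48.15, 47.12, 50.26,
           46.67, 50.12, 47.43, 46.34, 46.44, 46.94, 46.64, 48.36)
red <- c(42.90, 42.01, 41.93, 43.09, 41.47, 41.69, 39.78,
         39.63, 42.18, 40.66, 37.87, 39.16, 37.40, 38.20,
         38.10, 37.97, 38.79, 38.23, 38.87, 37.78, 38.01)
yellow <- c(36.78, 37.02, 36.52, 36.11, 36.03, 35.45, 38.13,
            37.1, 35.17, 36.82, 36.66, 35.68, 36.03, 34.57, 34.63)

lengths <- c(bihai, red, yellow)
type <- factor(rep(c("bihai", "red", "yellow"),
                   times = c(length(bihai), length(red), length(yellow))))

# Per-group sample size, mean, and variance
group_n    <- tapply(lengths, type, length)
group_mean <- tapply(lengths, type, mean)
group_var  <- tapply(lengths, type, var)
grand_mean <- mean(lengths)

# Sums of squares and degrees of freedom
ss_between <- sum(group_n * (group_mean - grand_mean)^2)
ss_within  <- sum((group_n - 1) * group_var)
df_between <- length(group_n) - 1                # 3 groups - 1 = 2
df_within  <- length(lengths) - length(group_n)  # 52 - 3 = 49

# F statistic and its upper-tail P-value
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
f_stat   # matches the F value in the ANOVA table
```

A large F statistic says only that at least one group mean differs from the others; it does not say which ones.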