Question 4
setwd("/Users/wonny/Downloads/Applied reg")
rainfall <- read.csv(file='rainfall.csv',header=T)
groundhog <- read.table(file='groundhog.table',sep=",",header=T)
range(rainfall$WY);range(groundhog$year)
## [1] 1921 2013
## [1] 1990 2010
Separate the years in which Phil sees his shadow from the years in which he does not
YT <- groundhog[groundhog$shadow=="Y","year"] # Years Phil sees his shadow
YF <- groundhog[groundhog$shadow=="N","year"] # Years Phil does not see his shadow
YT;YF
## [1] 1991 1992 1993 1994 1996 1998 2000 2001 2002 2003 2004 2005 2006 2008 2009
## [16] 2010
## [1] 1990 1995 1997 1999 2007
Calculate the average monthly rainfall (annual total divided by 12) for each group of years
av_YT <- rainfall[which(rainfall$WY%in%YT),"Total"]/12
av_YF <- rainfall[which(rainfall$WY%in%YF),"Total"]/12
boxplot(av_YT,av_YF,names=c("Phil sees his shadow","Phil does not see his shadow"),ylab="Average rainfall",col=c("grey","darkgrey"))
Check whether the variances are equal or not
var(av_YT);var(av_YF)
## [1] 1.796807
## [1] 3.092972
var.test(av_YT,av_YF)
##
## F test to compare two variances
##
## data: av_YT and av_YF
## F = 0.58093, num df = 15, denom df = 4, p-value = 0.3952
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.06710904 2.21002367
## sample estimates:
## ratio of variances
## 0.5809322
Since the p-value (0.3952) is greater than 0.05, we fail to reject H0 that the two variances are equal, so the t procedures that follow assume equal variances.
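As a check on how the F test arrives at this p-value, the statistic and the two-sided p-value can be recomputed from the two sample variances printed above. This is only a minimal sketch: the names `v_shadow`, `v_no_shadow`, `f_stat` and `p_two_sided` are illustrative and not part of the original analysis, and the degrees of freedom are copied from the `var.test` output.

```r
# Sample variances copied from the output above
v_shadow    <- 1.796807   # var(av_YT), 16 years -> 15 df
v_no_shadow <- 3.092972   # var(av_YF),  5 years ->  4 df

# F statistic: ratio of the two sample variances
f_stat <- v_shadow / v_no_shadow

# Two-sided p-value: twice the smaller tail area of the F(15, 4) distribution
lower_tail  <- pf(f_stat, df1 = 15, df2 = 4)
p_two_sided <- 2 * min(lower_tail, 1 - lower_tail)

print(c(F = f_stat, p = p_two_sided))
```

The result should agree with the F and p-value reported by `var.test` up to rounding of the inputs.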
t.test(av_YT,av_YF,conf.level = 0.90,var.equal = T)$conf.int
## [1] -1.6779319 0.8710985
## attr(,"conf.level")
## [1] 0.9
Interpret the interval in part 2.
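We are 90% confident that the true difference in mean monthly rainfall
(years Phil sees his shadow minus years he does not) lies between
-1.68 and 0.87. Because the interval contains 0, the data do not show
a difference between the two groups at the 10% level.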
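The "90% confidence" statement refers to the long-run behaviour of the procedure, which a small simulation can illustrate. The sketch below uses hypothetical normal data (group sizes 16 and 5 as in the groundhog samples, a common standard deviation of 1.5, and a true difference of 0, all chosen purely for illustration) and counts how often the pooled 90% interval contains the true difference.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

true_diff <- 0   # assumed true difference in means (illustrative)
n_reps    <- 2000

covered <- replicate(n_reps, {
  x  <- rnorm(16, mean = true_diff, sd = 1.5)  # hypothetical "shadow" years
  y  <- rnorm(5,  mean = 0,         sd = 1.5)  # hypothetical "no shadow" years
  ci <- t.test(x, y, conf.level = 0.90, var.equal = TRUE)$conf.int
  ci[1] <= true_diff && true_diff <= ci[2]     # does the interval cover the truth?
})

coverage <- mean(covered)  # proportion of intervals that cover the true difference
print(coverage)
```

Under these assumptions the proportion should be close to 0.90; any single interval, such as the one computed from the rainfall data, either contains the true difference or it does not.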
At level α = 0.05, would you reject the null hypothesis that the average rainfall in Northern California during the month of February was the same in years Phil sees his shadow versus years he does not?
rain_YT_Feb <- rainfall[which(rainfall$WY%in%YT),"Feb"]
rain_YF_Feb <- rainfall[which(rainfall$WY%in%YF),"Feb"]
var.test(rain_YT_Feb,rain_YF_Feb)
##
## F test to compare two variances
##
## data: rain_YT_Feb and rain_YF_Feb
## F = 0.66525, num df = 15, denom df = 4, p-value = 0.5022
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.07684912 2.53078222
## sample estimates:
## ratio of variances
## 0.6652476
t.test(rain_YT_Feb,rain_YF_Feb,var.equal = T)
##
## Two Sample t-test
##
## data: rain_YT_Feb and rain_YF_Feb
## t = 0.8636, df = 19, p-value = 0.3986
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.597957 8.652707
## sample estimates:
## mean of x mean of y
## 9.779375 7.252000
Since the p-value 0.3986 is greater than 0.05, the null hypothesis cannot be rejected at α = 0.05.
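The decision can also be traced back to the test statistic itself. As a rough sketch (the variable names are illustrative), the two-sided p-value and the 5% critical value follow from the t statistic and degrees of freedom reported above:

```r
# Values copied from the t.test output above
t_stat <- 0.8636
df     <- 19

# Two-sided p-value: area in both tails beyond |t|
p_two_sided <- 2 * pt(-abs(t_stat), df = df)

# Equivalent critical-value rule at alpha = 0.05
t_crit <- qt(0.975, df = df)
reject <- abs(t_stat) > t_crit

print(c(p = p_two_sided, critical = t_crit))
print(reject)
```

Both views give the same answer: the p-value exceeds 0.05 exactly when |t| falls below the critical value.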
Question 5
Hertz <- c(37.16, 14.36, 17.59, 19.73, 30.77, 26.29, 30.03, 29.02, 22.63, 39.21)
Thrifty <- c(29.49, 12.19, 15.07, 15.17, 24.52, 22.32, 25.30, 22.74, 19.35, 34.44)
Explain why this is a paired-sample problem.
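The two samples are not independent: each Hertz rate is matched with
the Thrifty rate for the same car type, so the analysis should be
based on the within-pair differences.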
Use a graph to determine whether the assumption of normality is reasonable.
diff <- Hertz-Thrifty
s.diff <- sort(diff)
qqnorm(s.diff);qqline(s.diff)
The points follow the straight line closely, so the normality assumption for the differences is reasonable.
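The question only asks for a graph, but as an optional complement the differences can also be passed to a Shapiro-Wilk test (`shapiro.test` from base R's stats package). This is a swapped-in technique, not part of the original answer, and with only 10 differences the test has little power, so it should be read alongside the Q-Q plot rather than instead of it. The names `d` and `sw` are illustrative.

```r
# Rental rates as listed above
Hertz   <- c(37.16, 14.36, 17.59, 19.73, 30.77, 26.29, 30.03, 29.02, 22.63, 39.21)
Thrifty <- c(29.49, 12.19, 15.07, 15.17, 24.52, 22.32, 25.30, 22.74, 19.35, 34.44)

# Within-pair differences (Hertz minus Thrifty)
d <- Hertz - Thrifty

# Shapiro-Wilk test: H0 is that the differences come from a normal distribution
sw <- shapiro.test(d)
print(sw)
```

A large p-value would be consistent with the normality assumption; it would not prove it.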
t.test(diff,alternative="less", mu=0)
##
## One Sample t-test
##
## data: diff
## t = 8.3756, df = 9, p-value = 1
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
## -Inf 5.631148
## sample estimates:
## mean of x
## 4.62
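\(\mu=\mu_{H}-\mu_{T}\)
H0: \(\mu \le 0\) vs H1: \(\mu > 0\)
The output above is for the alternative \(\mu<0\) (alternative="less"),
and its p-value of nearly 1 shows no evidence that Hertz is cheaper.
For the alternative of interest, \(\mu>0\), the test statistic is the
same (t = 8.3756, df = 9) and the p-value is one minus the reported
value, that is, nearly 0. We therefore reject H0 at the 0.05 level and
conclude that Thrifty has a lower mean rental rate than Hertz.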
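For the one-sided alternative that Thrifty's mean rate is lower, i.e. \(\mu=\mu_{H}-\mu_{T}>0\), the matching direction in `t.test` is `alternative = "greater"`. A minimal sketch using the rates listed above (the name `res` is illustrative):

```r
# Rental rates as listed above
Hertz   <- c(37.16, 14.36, 17.59, 19.73, 30.77, 26.29, 30.03, 29.02, 22.63, 39.21)
Thrifty <- c(29.49, 12.19, 15.07, 15.17, 24.52, 22.32, 25.30, 22.74, 19.35, 34.44)

# Paired t-test of H0: mu_H - mu_T <= 0 against H1: mu_H - mu_T > 0.
# With paired = TRUE the differences are taken as Hertz - Thrifty.
res <- t.test(Hertz, Thrifty, paired = TRUE, alternative = "greater")
print(res)
```

This is equivalent to `t.test(Hertz - Thrifty, alternative = "greater", mu = 0)`; the paired form simply keeps the two samples visible in the call.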
Question 6
1. Create a boxplot of the supervisor rating Y, splitting the data based on the median of X4
P060 <- read.table(file="P060.txt",header=TRUE)
med4 <- median(P060$X4)
above_med <- subset(P060,X4>=med4)$Y # split the data based on the median of X4
below_med <- subset(P060,X4<med4)$Y
boxplot(above_med,below_med,names=c("above the median of X4","below the median of X4"),ylab="Y")
# mean and sd of Y in the first group
mean(above_med);sd(above_med)
## [1] 70.46667
## [1] 9.605554
# mean and sd of Y in the second group
mean(below_med);sd(below_med)
## [1] 58.8
## [1] 11.90558
par(mfrow=c(1,2))
hist(above_med,main="first group",xlab="Y");hist(below_med,main="second group",xlab="Y")
var.test(above_med,below_med)
##
## F test to compare two variances
##
## data: above_med and below_med
## F = 0.65094, num df = 14, denom df = 14, p-value = 0.4318
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2185412 1.9388938
## sample estimates:
## ratio of variances
## 0.650944
t.test(above_med,below_med,conf.level = 0.90,var.equal = T)$conf.int
## [1] 4.947601 18.385732
## attr(,"conf.level")
## [1] 0.9
The interval assumes that the two samples are independent and drawn from normal distributions with equal but unknown variance.
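Under these assumptions the 90% interval can be rebuilt by hand from the summary statistics printed earlier, which makes the role of the pooled variance explicit. A sketch, with group sizes of 15 inferred from the degrees of freedom in the `var.test` output and all variable names chosen for illustration:

```r
# Summary statistics copied from the output above
n1 <- 15; m1 <- 70.46667; s1 <- 9.605554   # X4 at or above its median
n2 <- 15; m2 <- 58.8;     s2 <- 11.90558   # X4 below its median

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)

# Standard error of the difference in means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# 90% interval: difference +/- t quantile (5% in each tail) times the standard error
t_crit <- qt(0.95, df = n1 + n2 - 2)
ci <- (m1 - m2) + c(-1, 1) * t_crit * se
print(ci)
```

The endpoints should match the `t.test` interval up to rounding of the inputs.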
t.test(above_med,below_med,conf.level = 0.95,var.equal = T)
##
## Two Sample t-test
##
## data: above_med and below_med
## t = 2.9538, df = 28, p-value = 0.006296
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.575942 19.757391
## sample estimates:
## mean of x mean of y
## 70.46667 58.80000
The assumptions are the same as in the previous part. Since the p-value (0.006296) is less than 0.05, we reject the null hypothesis that the true difference in means is 0: there is strong evidence that the mean supervisor rating differs between the two groups.