• Hypothesis Testing: One Sample Test
x=c(95,91,110,93,133,119,113,107,110,89,113,100,100,124,116,113,110,106,115,113)
t.test(x,alternative = 'greater',conf.level = 0.99)
##
## One Sample t-test
##
## data: x
## t = 43.182, df = 19, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 0
## 99 percent confidence interval:
## 102.1193 Inf
## sample estimates:
## mean of x
## 108.5
qqnorm(x, pch = 1, frame = FALSE)
print("Since the lower limit of the 99% confidence interval (102.12) is above the national average, we conclude that the mean IQ of the students at this school is greater than the national average.")
## [1] "Since the lower limit of the 99% confidence interval (102.12) is above the national average, we conclude that the mean IQ of the students at this school is greater than the national average."
print("Since the QQ plot is more or less a straight line, we can assume that the data is normally distributed.")
## [1] "Since the QQ plot is more or less a straight line, we can assume that the data is normally distributed."
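The `t.test` call above leaves `mu` at its default of 0, so the reported t statistic and p-value refer to a null mean of 0 rather than to the national average. A minimal sketch of the same test against an explicit null value follows. The value 100 is an assumption (the problem statement is not reproduced here), and the names `iq`, `mu0`, `res` and `sw` are illustrative only.

```r
# IQ scores copied from the sample above
iq <- c(95, 91, 110, 93, 133, 119, 113, 107, 110, 89,
        113, 100, 100, 124, 116, 113, 110, 106, 115, 113)

# ASSUMPTION: the national average IQ is 100 (not stated in the output above)
mu0 <- 100

# One-sided test of H0: mu = mu0 against H1: mu > mu0
res <- t.test(iq, mu = mu0, alternative = "greater", conf.level = 0.99)
res$statistic
res$p.value

# Shapiro-Wilk test as a numerical complement to the Q-Q plot
sw <- shapiro.test(iq)
sw$p.value
```

The one-sided confidence bound does not depend on `mu`, so the conclusion drawn from the interval is unchanged. Only the test statistic and p-value change.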
12.3 11.4 14.2 15.3 14.8 13.8 11.1 15.1 15.8 13.2
library(TeachingDemos)
## Warning: package 'TeachingDemos' was built under R version 4.0.3
x <- c(12.3,11.4,14.2,15.3,14.8,13.8,11.1,15.1,15.8,13.2)
xbar = mean(x)
stdev =sd(x)
var(x)
## [1] 2.74
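# The null value is the reference level 14.6 g/dL from the conclusion below; testing against xbar itself always gives z = 0
z.test(x,mu=14.6,sd=stdev,alternative ='less',conf.level = 0.99)
##
## One Sample z-test
##
## data: x
## z = -1.7194, n = 10.00000, Std. Dev. = 1.65529, Std. Dev. of the sample mean
## = 0.52345, p-value = 0.04277
## alternative hypothesis: true mean is less than 14.6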
## 99 percent confidence interval:
## -Inf 14.91773
## sample estimates:
## mean of x
## 13.7
boxplot(x)
qqnorm(x, pch = 1, frame = FALSE)
print("We fail to reject the null hypothesis that the mean Hb level for children with chronic diarrhea is not less than that of the normal value of 14.6 g/dL")
## [1] "We fail to reject the null hypothesis that the mean Hb level for children with chronic diarrhea is not less than that of the normal value of 14.6 g/dL"
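With only 10 observations and a standard deviation estimated from the sample, a one-sample t-test is a common alternative to the z-test. The sketch below assumes that 14.6 g/dL, the reference value quoted in the conclusion above, is the hypothesized mean. The names `hb`, `mu0` and `res_t` are illustrative.

```r
# Hb levels copied from the sample above
hb <- c(12.3, 11.4, 14.2, 15.3, 14.8, 13.8, 11.1, 15.1, 15.8, 13.2)

# ASSUMPTION: 14.6 g/dL is the null value, as quoted in the conclusion above
mu0 <- 14.6

# H0: mu = 14.6 against H1: mu < 14.6, using the t distribution with n - 1 df
res_t <- t.test(hb, mu = mu0, alternative = "less", conf.level = 0.99)
res_t$statistic
res_t$parameter  # degrees of freedom
res_t$p.value
```

The t statistic has the same value as the z statistic for the same null value; only the reference distribution, and therefore the p-value, differs.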
prop.test(550,1500,p=1/3,alternative = 'greater',conf.level=0.99)
##
## 1-sample proportions test with continuity correction
##
## data: 550 out of 1500, null probability 1/3
## X-squared = 7.3508, df = 1, p-value = 0.003352
## alternative hypothesis: true p is greater than 0.3333333
## 99 percent confidence interval:
## 0.337922 1.000000
## sample estimates:
## p
## 0.3666667
print("We reject the null hypothesis and conclude that the customers have a preference for the colour ivory.")
## [1] "We reject the null hypothesis and conclude that the customers have a preference for the colour ivory."
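The statistic reported by `prop.test` is the square of the usual one-sample z statistic for a proportion, up to the continuity correction. The sketch below computes the z statistic by hand for the counts above and compares it with `prop.test(..., correct = FALSE)`. The variable names are illustrative.

```r
# Counts and null proportion taken from the prop.test call above
x_success <- 550
n <- 1500
p0 <- 1/3

# z statistic for H0: p = p0 against H1: p > p0 (no continuity correction)
p_hat <- x_success / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- pnorm(z, lower.tail = FALSE)

# The same test without the continuity correction; its X-squared equals z^2
uncorrected <- prop.test(x_success, n, p = p0,
                         alternative = "greater", correct = FALSE)
c(z = z, z_squared = z^2, p_value = p_value)
uncorrected$statistic
```

The continuity correction used by default in the output above makes the test slightly more conservative, which is why its X-squared value is a little smaller than `z^2`.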
prop.test(45,175,p=0.12,alternative = 'greater',conf.level=0.98)
##
## 1-sample proportions test with continuity correction
##
## data: 45 out of 175, null probability 0.12
## X-squared = 29.884, df = 1, p-value = 2.294e-08
## alternative hypothesis: true p is greater than 0.12
## 98 percent confidence interval:
## 0.1930145 1.0000000
## sample estimates:
## p
## 0.2571429
print("The decision is correct and the machine should be repaired.")
## [1] "The decision is correct and the machine should be repaired."
prop.test(252,400,p=2/3)
##
## 1-sample proportions test with continuity correction
##
## data: 252 out of 400, null probability 2/3
## X-squared = 2.2578, df = 1, p-value = 0.1329
## alternative hypothesis: true p is not equal to 0.6666667
## 95 percent confidence interval:
## 0.5803883 0.6770735
## sample estimates:
## p
## 0.63
print("We fail to reject the null hypothesis and hence conclude that the claim may be correct.")
## [1] "We fail to reject the null hypothesis and hence conclude that the claim may be correct."
ssquare = 12^2
1-pchisq((24*ssquare)/100,24)
## [1] 0.07519706
print("We fail to reject the null hypothesis and hence conclude that the physician may be correct.")
## [1] "We fail to reject the null hypothesis and hence conclude that the physician may be correct."
1-pchisq((9)*(0.00027)/(0.0003),9)
## [1] 0.5241009
print("We fail to reject the null hypothesis and conclude that the company's claim may be correct.")
## [1] "We fail to reject the null hypothesis and conclude that the company's claim may be correct."
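Both variance tests above use the statistic (n - 1)s²/σ₀², which follows a chi-squared distribution with n - 1 degrees of freedom under the null hypothesis. The helper below is a hypothetical wrapper for that calculation, not a base R function. The sample sizes 25 and 10 are inferred from the degrees of freedom used above, and the variance arguments are read off the two expressions.

```r
# Hypothetical helper: upper-tailed chi-squared test for one variance
# n         sample size
# s2        sample variance
# sigma0_sq variance under the null hypothesis
var_test_upper <- function(n, s2, sigma0_sq) {
  stat <- (n - 1) * s2 / sigma0_sq
  p <- pchisq(stat, df = n - 1, lower.tail = FALSE)
  list(statistic = stat, df = n - 1, p.value = p)
}

# ASSUMPTION: n = 25, s = 12, sigma0^2 = 100 (from 24 * 12^2 / 100 above)
physician <- var_test_upper(25, 12^2, 100)
# ASSUMPTION: n = 10, s^2 = 0.00027, sigma0^2 = 0.0003 (from the second call above)
company <- var_test_upper(10, 0.00027, 0.0003)

physician$p.value
company$p.value
```

Using `lower.tail = FALSE` avoids the subtraction `1 - pchisq(...)`, which can lose precision when the p-value is very small.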
• Hypothesis Testing: Two Samples Test
1. In the academic year 1997–1998, two random samples of 25 male professors and 23 female professors from a large university produced a mean salary for male professors of $58,550 with a standard deviation of $4000 and an average for female professors of $53,700 with a standard deviation of $3200. (a) At the 5% significance level, can you conclude that the mean salary of all male professors for 1997–1998 was higher than that of all female professors? (b) Assume that the salaries of male and female professors are both normally distributed with equal standard deviations.
(58550-53700)/sqrt((4000)^2/25+(3200)^2/23)
## [1] 4.655683
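print("Since the test statistic (4.66) exceeds the 5% critical value, we conclude that the mean salary of male professors was higher than that of female professors.")
## [1] "Since the test statistic (4.66) exceeds the 5% critical value, we conclude that the mean salary of male professors was higher than that of female professors."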
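The statistic above uses separate variances. Under the equal standard deviation assumption stated in the problem, the usual alternative is the pooled two-sample t statistic with n1 + n2 - 2 degrees of freedom. The sketch below works from the summary figures in the problem statement. The variable names are illustrative.

```r
# Summary statistics from the problem statement
n1 <- 25; xbar1 <- 58550; s1 <- 4000   # male professors
n2 <- 23; xbar2 <- 53700; s2 <- 3200   # female professors

# Pooled variance under the equal-variance assumption
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)

# Pooled t statistic and one-sided p-value for H1: mu1 > mu2
t_pooled <- (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2
p_value <- pt(t_pooled, df = df_pooled, lower.tail = FALSE)

c(t = t_pooled, df = df_pooled, p_value = p_value)
```

The pooled statistic is close to the unpooled value computed above, so the decision at the 5% level is the same either way.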
Sample 1: 14 15 11 14 10 8 13 10 12 16 15
Sample 2: 17 16 21 12 20 18 16 14 21 20 13 20 13
Test at the 2% significance level whether µ1 is lower than µ2.
Sample1=c(14,15,11,14,10,8,13,10,12,16,15)
sample2=c(17,16,21,12,20,18,16,14,21,20,13,20,13)
t.test(Sample1,sample2,alternative ='less',conf.level = 0.98)
##
## Welch Two Sample t-test
##
## data: Sample1 and sample2
## t = -3.7528, df = 21.88, p-value = 0.0005541
## alternative hypothesis: true difference in means is less than 0
## 98 percent confidence interval:
## -Inf -1.862584
## sample estimates:
## mean of x mean of y
## 12.54545 17.00000
smoke <- matrix(c(70,280-70,78,400-78),ncol = 2,byrow = TRUE)
smoke = as.table(smoke)
smoke
## A B
## A 70 210
## B 78 322
prop.test(smoke,alternative = "greater")
##
## 2-sample test for equality of proportions with continuity correction
##
## data: smoke
## X-squared = 2.6119, df = 1, p-value = 0.05303
## alternative hypothesis: greater
## 95 percent confidence interval:
## -0.001640794 1.000000000
## sample estimates:
## prop 1 prop 2
## 0.250 0.195
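print("We fail to reject the null hypothesis of equal proportions at the 5% level (p-value = 0.053), so there is not enough evidence that Americans are more likely to develop lung cancer due to smoking.")
## [1] "We fail to reject the null hypothesis of equal proportions at the 5% level (p-value = 0.053), so there is not enough evidence that Americans are more likely to develop lung cancer due to smoking."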
tstat = (105.9-100.5)/(sqrt((0.21/80)+(0.19/100)))
degf = (((0.21/80)+(0.19/100))^2)/((((0.21/80)^2)/79)+(((0.19/100)^2)/99))
pt(-abs(tstat),df = degf)
## [1] 9.04672e-135
print("We reject the null hypothesis and conclude that there is in fact a difference between the mean weights.")
## [1] "We reject the null hypothesis and conclude that there is in fact a difference between the mean weights."
var1 = 0.9^2
var2 = 1.2^2
tstat = (11.2-9.8)/(sqrt((var1/95)+(var2/75)))
degf = (((var1/95)+(var2/75))^2)/((((var1/95)^2)/94)+(((var2/75)^2)/74))
pt(-abs(tstat),degf)
## [1] 2.762958e-14
print("We reject the null hypothesis and conclude that the mean Hb levels of well-nourished children were higher than those of undernourished children.")
## [1] "We reject the null hypothesis and conclude that the mean Hb levels of well-nourished children were higher than those of undernourished children."
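The two calculations above apply the Welch statistic and the Welch-Satterthwaite degrees of freedom to summary statistics. A hypothetical helper, `welch_summary`, collects those steps and reports both the one-sided and the two-sided p-value, since `pt(-abs(tstat), df)` alone gives only a single tail. The helper and its argument names are illustrative and not part of any package.

```r
# Hypothetical helper: Welch two-sample t-test from summary statistics
# m1, m2 sample means; v1, v2 sample variances; n1, n2 sample sizes
welch_summary <- function(m1, v1, n1, m2, v2, n2) {
  se2 <- v1 / n1 + v2 / n2
  t_stat <- (m1 - m2) / sqrt(se2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
  list(statistic = t_stat,
       df = df,
       p.greater = pt(t_stat, df, lower.tail = FALSE),  # H1: mu1 > mu2
       p.two.sided = 2 * pt(-abs(t_stat), df))          # H1: mu1 != mu2
}

# Hb example above: means 11.2 and 9.8, SDs 0.9 and 1.2, sizes 95 and 75
hb_test <- welch_summary(11.2, 0.9^2, 95, 9.8, 1.2^2, 75)
hb_test$p.greater
hb_test$p.two.sided
```

For a conclusion phrased as "there is a difference", the two-sided value is the relevant one. For "higher than", the one-sided value applies.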
Fstat1 = 100/49
1-pf(Fstat1,17,14)
## [1] 0.09178643
print("We fail to reject the null hypothesis and conclude that the variances of the IQs of the students in the two different areas of the city may be the same.")
## [1] "We fail to reject the null hypothesis and conclude that the variances of the IQs of the students in the two different areas of the city may be the same."
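The value `1-pf(Fstat1,17,14)` above is an upper-tail probability. If the alternative is that the two variances differ in either direction, a common convention is to double the smaller tail. The sketch below assumes sample variances of 100 and 49 with sample sizes 18 and 15, inferred from the ratio and the degrees of freedom used above.

```r
# ASSUMPTION: s1^2 = 100 with n1 = 18, s2^2 = 49 with n2 = 15
s1_sq <- 100; n1 <- 18
s2_sq <- 49;  n2 <- 15

f_stat <- s1_sq / s2_sq

# Upper-tail p-value, as computed above
p_upper <- pf(f_stat, n1 - 1, n2 - 1, lower.tail = FALSE)

# Two-sided p-value: twice the smaller of the two tail probabilities
p_two_sided <- 2 * min(p_upper, 1 - p_upper)

c(F = f_stat, p_upper = p_upper, p_two_sided = p_two_sided)
```

Both p-values exceed 0.05, so the decision stated above does not depend on which version is used.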
eco <- matrix(c(36*0.37,36*(1-0.37),36*0.36,36*(1-0.36)),ncol = 2,byrow = TRUE)
eco = as.table(eco)
eco
## A B
## A 13.32 22.68
## B 12.96 23.04
prop.test(eco,alternative = "less")
##
## 2-sample test for equality of proportions with continuity correction
##
## data: eco
## X-squared = 1.6734e-31, df = 1, p-value = 0.5
## alternative hypothesis: less
## 95 percent confidence interval:
## -1.0000000 0.2066383
## sample estimates:
## prop 1 prop 2
## 0.37 0.36
print("We fail to reject the null hypothesis and hence conclude that the domestic content might not have fallen during the period 1977–1997.")
## [1] "We fail to reject the null hypothesis and hence conclude that the domestic content might not have fallen during the period 1977–1997."
year_1989 <- c(523,498,539,487,561,509,560,496,507,515,539,469,475,512,520,510)
year_1999 <- c(525,509,555,498,576,525,571,502,499,526,585,493,482,517,568,518)
t.test(year_1989,year_1999,alternative = 'greater',conf.level=0.95,var.equal = TRUE)
##
## Two Sample t-test
##
## data: year_1989 and year_1999
## t = -1.3561, df = 30, p-value = 0.9074
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -32.22575 Inf
## sample estimates:
## mean of x mean of y
## 513.7500 528.0625
var.test(year_1989,year_1999,alternative = "two.sided")
##
## F test to compare two variances
##
## data: year_1989 and year_1999
## F = 0.66122, num df = 15, denom df = 15, p-value = 0.4324
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2310274 1.8924778
## sample estimates:
## ratio of variances
## 0.6612217
print("We fail to reject the null hypotheses in both cases and conclude that both the mean and variance in both years may have been the same")
## [1] "We fail to reject the null hypotheses in both cases and conclude that both the mean and variance in both years may have been the same"
Before: 268 225 252 192 307 228 246 298 231 185
After: 106 186 223 110 203 101 211 176 194 203
Do the data provide sufficient evidence to support the claim that the new program reduces blood glucose level in diabetic patients? Use alpha = 0.05.
Before <- c(268,225,252,192,307,228,246,298,231,185)
After <- c(106,186,223,110,203,101,211,176,194,203)
t.test(Before,After,alternative = 'greater')
##
## Welch Two Sample t-test
##
## data: Before and After
## t = 3.6739, df = 17.556, p-value = 0.000899
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 37.91728 Inf
## sample estimates:
## mean of x mean of y
## 243.2 171.3
print("We reject the null hypothesis and conclude that the new program reduces blood glucose level in diabetic patients")
## [1] "We reject the null hypothesis and conclude that the new program reduces blood glucose level in diabetic patients"
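Before and after measurements are usually taken on the same patients. If that holds here, a paired t-test on the within-patient differences is more appropriate than the Welch two-sample test. The sketch below assumes that the i-th `Before` value and the i-th `After` value belong to the same patient, which the data layout suggests but the output above does not state.

```r
Before <- c(268, 225, 252, 192, 307, 228, 246, 298, 231, 185)
After  <- c(106, 186, 223, 110, 203, 101, 211, 176, 194, 203)

# ASSUMPTION: observations are paired by position (same patient before and after)
paired <- t.test(Before, After, alternative = "greater", paired = TRUE)

# The paired test is a one-sample t-test on the differences
d <- Before - After
manual_t <- mean(d) / (sd(d) / sqrt(length(d)))

paired$statistic
paired$p.value
manual_t
```

The same consideration applies to any design in which each subject is measured under both conditions.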
No_Drug <- c(0,0,3,2,0,0,3,3,1)
After_Drug <- c(1,5,6,5,5,5,6,1,6)
t.test(No_Drug,After_Drug)
##
## Welch Two Sample t-test
##
## data: No_Drug and After_Drug
## t = -3.8015, df = 14.373, p-value = 0.001863
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.862103 -1.360119
## sample estimates:
## mean of x mean of y
## 1.333333 4.444444
print("We reject the null hypothesis and conclude that there is a difference in the individuals' driving ability under the two conditions.")
## [1] "We reject the null hypothesis and conclude that there is a difference in the individuals' driving ability under the two conditions."
Upstream: 9.0 6.8 6.5 8.0 7.7 8.6 6.8 8.9 7.2 7.0
Downstream: 10.2 10.2 9.9 11.1 9.6 8.7 9.6 9.7 10.4 8.1
Assuming that the samples come from a normal distribution,
(a) Test that the mean BOD for the downstream samples is less than for the samples upstream at alpha = 0.05. Assume that the variances are equal.
(b) Test for the equality of the variances at alpha = 0.05.
(c) In parts (a) and (b), we assumed samples are independent. Now, we feel this assumption is not reasonable. Assuming that the difference of each pair is approximately normal, test that the mean BOD for the downstream samples is less than for the upstream samples at alpha = 0.05.
Upstream=c(9.0,6.8,6.5,8.0,7.7,8.6,6.8,8.9,7.2,7.0)
Downstream=c(10.2,10.2,9.9,11.1,9.6,8.7,9.6,9.7,10.4,8.1)
t.test(Upstream,Downstream,alternative='greater',var.equal = TRUE)
##
## Two Sample t-test
##
## data: Upstream and Downstream
## t = -5.2591, df = 18, p-value = 1
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -2.79242 Inf
## sample estimates:
## mean of x mean of y
## 7.65 9.75
var.test(Upstream,Downstream,alternative = "two.sided")
##
## F test to compare two variances
##
## data: Upstream and Downstream
## F = 1.1925, num df = 9, denom df = 9, p-value = 0.7974
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.2962035 4.8010519
## sample estimates:
## ratio of variances
## 1.192513
t.test(Upstream,Downstream,alternative='greater',paired = TRUE)
##
## Paired t-test
##
## data: Upstream and Downstream
## t = -5.3982, df = 9, p-value = 0.9998
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -2.81311 Inf
## sample estimates:
## mean of the differences
## -2.1
cat("We fail to reject the null hypothesis in the first case and conclude that the mean BOD for the downstream samples is not less than that of the upstream samples. \n")
## We fail to reject the null hypothesis in the first case and conclude that the mean BOD for the downstream samples is not less than that of the upstream samples.
cat("We fail to reject the null hypothesis in the second case and conclude that the variance of BOD for the downstream samples may be equal to that of the upstream samples. \n")
## We fail to reject the null hypothesis in the second case and conclude that the variance of BOD for the downstream samples may be equal to that of the upstream samples.
cat("We fail to reject the null hypothesis in the third case and conclude that the mean BOD for the downstream samples is not less than that of the upstream samples when the two samples are treated as dependent. \n")
## We fail to reject the null hypothesis in the third case and conclude that the mean BOD for the downstream samples is not less than that of the upstream samples when the two samples are treated as dependent.
• Hypothesis Testing: Goodness-of-fit test
Up_face=c(1,2,3,4,5,6)
Frequency=c(8,11,5,12,15,9)
# Goodness-of-fit test of the observed counts against equal probabilities of 1/6
chisq.test(Frequency,p=rep(1/6,6))
##
## Chi-squared test for given probabilities
##
## data: Frequency
## X-squared = 6, df = 5, p-value = 0.3062
print("We fail to reject the null hypothesis and hence conclude that the die may be balanced.")
## [1] "We fail to reject the null hypothesis and hence conclude that the die may be balanced."
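A goodness-of-fit test compares the observed counts with the counts expected under the hypothesized probabilities. Note that passing two vectors as `chisq.test(x, y)` cross-tabulates them as factors and tests independence instead. The sketch below computes the statistic by hand for the die data, assuming the null hypothesis is that all six faces are equally likely. The variable names are illustrative.

```r
# Observed frequencies for faces 1 to 6, copied from above
observed <- c(8, 11, 5, 12, 15, 9)

# Expected counts under H0: each face has probability 1/6
expected <- sum(observed) * rep(1/6, 6)

# Pearson statistic and p-value with k - 1 degrees of freedom
stat <- sum((observed - expected)^2 / expected)
p_value <- pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)

# The same test through chisq.test with explicit null probabilities
gof <- chisq.test(observed, p = rep(1/6, 6))

c(statistic = stat, p_value = p_value)
gof$expected
```

All expected counts equal 10 here, so the usual rule of thumb (expected counts of at least 5) is satisfied.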
Sense_and_Sensibility=c(150,30,30,90)
Long_lost_work=c(90,20,10,80)
# Test of homogeneity: the two works form the rows of a 2 x 4 table of word counts
chisq.test(rbind(Sense_and_Sensibility,Long_lost_work))
##
## Pearson's Chi-squared test
##
## data: rbind(Sense_and_Sensibility, Long_lost_work)
## X-squared = 7.9044, df = 3, p-value = 0.04803
print("At the 5% significance level we reject the null hypothesis that the two works share the same word frequencies, so the data cast some doubt on the long lost book being by Jane Austen; at the 1% level we would fail to reject.")
## [1] "At the 5% significance level we reject the null hypothesis that the two works share the same word frequencies, so the data cast some doubt on the long lost book being by Jane Austen; at the 1% level we would fail to reject."
Number_of_Students=c(12,36,90,44,18)
pnorm(Number_of_Students,mean = 75,sd=8,lower.tail = TRUE)
## [1] 1.703714e-15 5.440423e-07 9.696036e-01 5.331235e-05 5.204034e-13
Number_of_days=c(4,6,13,23,14)
pnorm(Number_of_days,mean = 75,sd=sqrt(6),lower.tail = TRUE)
## [1] 4.992733e-185 6.986430e-175 1.196519e-141 2.582264e-100 3.439361e-137
freq <- c(39,23,12,1)
exp <- c(0,1,2,3)
chisq.test(exp,freq)
## Warning in chisq.test(exp, freq): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: exp and freq
## X-squared = 12, df = 9, p-value = 0.2133
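The call above cross-tabulates the two vectors as factors, so it does not test how well a distribution fits the frequencies. The problem statement is not reproduced here, so the following is only a sketch under a loud assumption: that the values 0 to 3 are counts, that the hypothesized model is a Poisson distribution with its mean estimated from the data, and that the last category stands for "3 or more". All variable names are illustrative.

```r
# Observed frequencies for the values 0, 1, 2, 3 (copied from above)
values <- 0:3
observed <- c(39, 23, 12, 1)
n <- sum(observed)

# ASSUMPTION: Poisson model with the mean estimated from the data
lambda_hat <- sum(values * observed) / n

# Cell probabilities; the last cell is treated as P(X >= 3) so they sum to 1
probs <- c(dpois(0:2, lambda_hat), ppois(2, lambda_hat, lower.tail = FALSE))
expected <- n * probs

# Pearson statistic; one extra degree of freedom is lost for estimating lambda
stat <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1 - 1
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

round(expected, 2)
c(statistic = stat, df = df, p_value = p_value)
```

The expected count in the last cell falls below 5, so in practice the last two cells might be pooled before testing, which would reduce the degrees of freedom by one more.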
• Hypothesis Testing: Testing of Independence
Female=c(23,45,12,15)
Male=c(66,75,40,24)
x=matrix(c(23,45,12,15,66,75,40,24),nrow=2,byrow=T)
subject=c('Arts','Science','Engineering','Business')
k=data.frame(Female,Male,subject)
y=rowSums(x)%*%t(colSums(x))/sum(x) # Expected: E
testStatistic=sum((x-y)^2/y)
pValue=pchisq(testStatistic,prod(dim(x)-1),lower.tail = FALSE)
chisq.test(x)
##
## Pearson's Chi-squared test
##
## data: x
## X-squared = 5.8873, df = 3, p-value = 0.1172
print("We fail to reject the null hypothesis and hence conclude that the choice of the major by undergraduate students in this university may be independent of their gender.")
## [1] "We fail to reject the null hypothesis and hence conclude that the choice of the major by undergraduate students in this university may be independent of their gender."
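The code above computes `testStatistic` and `pValue` by hand but never prints them. The sketch below repeats that calculation for the gender-by-major table, labels the rows and columns, and checks that the hand calculation agrees with `chisq.test`. The dimension names follow the `Female`, `Male` and `subject` vectors above. `expected` and `fit` are illustrative names.

```r
# Observed counts: rows are gender, columns are majors (same order as above)
x <- matrix(c(23, 45, 12, 15,
              66, 75, 40, 24),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Female", "Male"),
                            c("Arts", "Science", "Engineering", "Business")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(x), colSums(x)) / sum(x)

# Pearson statistic with (r - 1)(c - 1) degrees of freedom
testStatistic <- sum((x - expected)^2 / expected)
df <- prod(dim(x) - 1)
pValue <- pchisq(testStatistic, df, lower.tail = FALSE)

fit <- chisq.test(x)
c(statistic = testStatistic, df = df, p_value = pValue)
all(expected >= 5)  # rule-of-thumb check for the chi-squared approximation
```

Here `outer(rowSums(x), colSums(x))` gives the same matrix as `rowSums(x) %*% t(colSums(x))` and keeps the row and column labels.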
x=matrix(c(39,19,12,28,18,172,61,44,70,37),nrow=2,byrow=T) # Observed: O
colnames(x)=c("Religion A","Religion B","Religion C","Religion D","No Religion")
rownames(x)=c("Single","Married")
print(x)
## Religion A Religion B Religion C Religion D No Religion
## Single 39 19 12 28 18
## Married 172 61 44 70 37
y=rowSums(x)%*%t(colSums(x))/sum(x) # Expected: E
testStatistic=sum((x-y)^2/y)
pValue=pchisq(testStatistic,prod(dim(x)-1),lower.tail = FALSE)
chisq.test(x)
##
## Pearson's Chi-squared test
##
## data: x
## X-squared = 7.1355, df = 4, p-value = 0.1289
print("We fail to reject the null hypothesis that marital status and religious affiliation are independent.")
## [1] "We fail to reject the null hypothesis that marital status and religious affiliation are independent."
x=matrix(c(30,15,15,50,10,40,10,25,5),nrow=3,byrow=T)
Type_of_Employee=c('Staff','Faculty','Administration')
Opinion_on_Collective_Bargaining=c('For','Against','Undecided')
y=rowSums(x)%*%t(colSums(x))/sum(x)
testStatistic=sum((x-y)^2/y)
pValue=pchisq(testStatistic,prod(dim(x)-1),lower.tail = FALSE)
chisq.test(x)
##
## Pearson's Chi-squared test
##
## data: x
## X-squared = 43.861, df = 4, p-value = 6.856e-09
print("We reject the null hypothesis that opinion on collective bargaining is independent of employee classification.")
## [1] "We reject the null hypothesis that opinion on collective bargaining is independent of employee classification."
x=matrix(c(12,9,12,10,7,10,12,17,7,4),nrow=2,byrow=T)
accesories=c('Boots','Leather shoes','Sneakers','Sandals','Others')
y=rowSums(x)%*%t(colSums(x))/sum(x)
testStatistic=sum((x-y)^2/y)
pValue=pchisq(testStatistic,prod(dim(x)-1),lower.tail = FALSE)
chisq.test(x)
##
## Pearson's Chi-squared test
##
## data: x
## X-squared = 2.8201, df = 4, p-value = 0.5884
print("We fail to reject the null hypothesis that the choice of footwear by undergraduate students in this university is independent of their gender.")
## [1] "We fail to reject the null hypothesis that the choice of footwear by undergraduate students in this university is independent of their gender."