# ShadeNorm: draw a Normal(mu, sig) density and shade tail or interval areas
shadenorm = function(below = NULL, above = NULL, pcts = c(0.025, 0.975), mu = 0, sig = 1,
                     numpts = 500, color = "gray", dens = 40,
                     justabove = FALSE, justbelow = FALSE, lines = FALSE,
                     between = NULL, outside = NULL){
  if(is.null(between)){
    below = ifelse(is.null(below), qnorm(pcts[1], mu, sig), below)
    above = ifelse(is.null(above), qnorm(pcts[2], mu, sig), above)
  }
  if(is.null(outside) == FALSE){
    below = min(outside)
    above = max(outside)
  }
  lowlim = mu - 4*sig   # min point plotted on x axis
  uplim  = mu + 4*sig   # max point plotted on x axis
  x.grid   = seq(lowlim, uplim, length = numpts)
  dens.all = dnorm(x.grid, mean = mu, sd = sig)
  if(lines == FALSE){
    plot(x.grid, dens.all, type = "l", xlab = "X", ylab = "Density")   # label x and y axes
  }
  if(lines == TRUE){
    lines(x.grid, dens.all)
  }
  if(justabove == FALSE){   # shade the lower tail (everything below 'below')
    x.below    = x.grid[x.grid < below]
    dens.below = dens.all[x.grid < below]
    polygon(c(x.below, rev(x.below)), c(rep(0, length(x.below)), rev(dens.below)), col = color, density = dens)
  }
  if(justbelow == FALSE){   # shade the upper tail (everything above 'above')
    x.above    = x.grid[x.grid > above]
    dens.above = dens.all[x.grid > above]
    polygon(c(x.above, rev(x.above)), c(rep(0, length(x.above)), rev(dens.above)), col = color, density = dens)
  }
  if(is.null(between) == FALSE){   # shade the region between two values
    from = min(between)
    to   = max(between)
    x.between    = x.grid[x.grid > from & x.grid < to]
    dens.between = dens.all[x.grid > from & x.grid < to]
    polygon(c(x.between, rev(x.between)), c(rep(0, length(x.between)), rev(dens.between)), col = color, density = dens)
  }
}
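Example usage (illustrative only): shade the two 2.5% tails of a standard normal curve.
shadenorm(mu = 0, sig = 1, pcts = c(0.025, 0.975), color = "lightblue")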
# ShadeT: draw a t density with df degrees of freedom and shade tail or interval areas
shadet = function(below = NULL, above = NULL, pcts = c(0.025, 0.975), df = 1,
                  numpts = 500, color = "gray", dens = 40,
                  justabove = FALSE, justbelow = FALSE, lines = FALSE,
                  between = NULL, outside = NULL){
  if(is.null(between)){
    below = ifelse(is.null(below), qt(pcts[1], df), below)
    above = ifelse(is.null(above), qt(pcts[2], df), above)
  }
  if(is.null(outside) == FALSE){
    below = min(outside)
    above = max(outside)
  }
  lowlim = -4   # min point plotted on x axis
  uplim  = 4    # max point plotted on x axis
  x.grid   = seq(lowlim, uplim, length = numpts)
  dens.all = dt(x.grid, df)
  if(lines == FALSE){
    plot(x.grid, dens.all, type = "l", xlab = "X", ylab = "Density")
  }
  if(lines == TRUE){
    lines(x.grid, dens.all)
  }
  if(justabove == FALSE){   # shade the lower tail (everything below 'below')
    x.below    = x.grid[x.grid < below]
    dens.below = dens.all[x.grid < below]
    polygon(c(x.below, rev(x.below)), c(rep(0, length(x.below)), rev(dens.below)), col = color, density = dens)
  }
  if(justbelow == FALSE){   # shade the upper tail (everything above 'above')
    x.above    = x.grid[x.grid > above]
    dens.above = dens.all[x.grid > above]
    polygon(c(x.above, rev(x.above)), c(rep(0, length(x.above)), rev(dens.above)), col = color, density = dens)
  }
  if(is.null(between) == FALSE){   # shade the region between two values
    from = min(between)
    to   = max(between)
    x.between    = x.grid[x.grid > from & x.grid < to]
    dens.between = dens.all[x.grid > from & x.grid < to]
    polygon(c(x.between, rev(x.between)), c(rep(0, length(x.between)), rev(dens.between)), col = color, density = dens)
  }
}
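Example usage (illustrative only): shade both 2.5% tails of a t distribution with 10 degrees of freedom.
shadet(df = 10, pcts = c(0.025, 0.975), color = "lightblue")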
# MYP: print the test decision for a p-value at significance level alpha
# ("FAIL 2 REJECT" is shorthand for "fail to reject Ho")
myp = function(p, alpha){
  if(p < alpha){ print('REJECT Ho') } else { print('FAIL 2 REJECT') }
}
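Example usage: the helper just applies the decision rule p < alpha.
myp(p = 0.03, alpha = 0.05)   # prints "REJECT Ho" since 0.03 < 0.05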
Using traditional methods, it takes 109 hours to receive a basic driving license. A new license training method using Computer Aided Instruction (CAI) has been proposed. A researcher used the technique with 190 students and observed that they had a mean of 110 hours. Assume the standard deviation is known to be 6. A level of significance of 0.05 will be used to determine if the technique performs differently than the traditional method. Make a decision to reject or fail to reject the null hypothesis. Show all work in R.
HINT: In class lecture notes. See solution in HW_5_Questions_Template.Rmd, and feel free to build upon the .Rmd file for your own submission.
# Ho: Mu = 109, Ha: Mu != 109
Z = (110 - 109) / (6/sqrt(190))
teststat <- Z
teststat
## [1] 2.297341
alpha = .05
?pnorm
criticalval <- qnorm(p = 0.975,
mean = 0,
sd = 1
)
criticalval
## [1] 1.959964
The test statistic (2.30) exceeds the critical value (1.96), so reject the null hypothesis.
p_value = 2 * (1 - pnorm(q = teststat))
p_value
## [1] 0.0215993
The p-value (0.022) is less than alpha (0.05), so reject the null hypothesis: the CAI method's mean training time differs from 109 hours.
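As an optional cross-check, the 95% confidence interval built from the same summary statistics excludes 109, which agrees with the rejection above.
Se <- 6 / sqrt(190)                                 # standard error of the mean
c(110 - criticalval * Se, 110 + criticalval * Se)   # roughly (109.15, 110.85); 109 falls outside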
Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 5.3 parts/million (ppm). A researcher believes that the current ozone level is below the normal level. The mean of 5 samples is 5.0 ppm with a standard deviation of 1.1 ppm. Do the data support the claim at the 0.05 level? Assume the population distribution is approximately normal.
HINT: Lecture notes.
# Ho: Mu = 5.3, Ha: Mu != 5.3
alpha <- 0.05
xbar <- 5
Mu <- 5.3
n <- 5
df <- n-1
sd <- 1.1
Se <- sd/sqrt(n)
t <- (xbar - Mu) / Se
t
## [1] -0.6098367
teststat <- t
teststat
## [1] -0.6098367
?qt
criticalval <- qt(p = 0.025,
df = df)
criticalval
## [1] -2.776445
xbar
## [1] 5
Se
## [1] 0.491935
t <- qt(p = 0.975,
df = n-1)
t
## [1] 2.776445
interval = c(xbar - t * Se, xbar + t * Se)
interval
## [1] 3.63417 6.36583
The test statistic (-0.61) is well inside the critical values (±2.78) and the 95% interval (3.63, 6.37) contains 5.3, so fail to reject the null hypothesis.
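A quick additional check: the two-sided p-value from the same t statistic is far above 0.05, matching the decision from the interval.
2 * pt(q = teststat, df = df)   # about 0.57, well above alpha = 0.05, so fail to reject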
Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 7.3 parts/million (ppm). A researcher believes that the current ozone level is not at a normal level. The mean of 51 samples is 7.1 ppm with a variance of 0.49. Assume the population is normally distributed. A level of significance of 0.01 will be used. Show all work and hypothesis testing steps.
HINT: Lecture notes.
# Ho: Mu = 7.3, Ha: Mu != 7.3
Mu <- 7.3
alpha <- 0.01
n <- 51
df <- n-1
sd <- sqrt(0.49)
Se <- sd/sqrt(n)
t <- (7.1 - 7.3) / Se
t
## [1] -2.040408
criticalval <- qt(p = 0.005,
df = df)
criticalval
## [1] -2.677793
?pt
pvalue <- 2 * pt(q = t,
df = df)
pvalue
## [1] 0.04660827
temp <- rt(2000, df) * Se + Mu   # simulate 2000 sample means under Ho: Mu = 7.3
pvaluesim <- 2 * (length(temp[temp <= 7.1]) / length(temp))   # two-sided simulated p-value (varies run to run; no seed was set)
pvaluesim
## [1] 0.039
myp(p = pvalue, alpha = alpha)
## [1] "FAIL 2 REJECT"
shadet(df = n-1,
pcts = c(.005 , 0.995),
color = "lightblue")
lines(x = rep(t,10),
y = seq(from = 0,
to = 1,
length.out = 10
),
col='gray'
)
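An additional check: the 99% confidence interval from the same summary statistics contains 7.3, consistent with failing to reject at the 0.01 level.
7.1 + c(-1, 1) * qt(0.995, df = df) * Se   # roughly (6.84, 7.36); contains 7.3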
A publisher reports that 36% of their readers own a laptop. A marketing executive wants to test the claim that the percentage is actually less than the reported percentage. A random sample of 100 found that 29% of the readers owned a laptop. Is there sufficient evidence at the 0.02 level to support the executive’s claim? Show all work and hypothesis testing steps.
HINT: Question 4 (See Open Stats Textbook - Chapter 5 Section 5.2: Confidence intervals for a proportion)
# Ho: pi>=0.36, Ha: pi<0.36
phat <- 0.29
p <- 0.36
q <- 1 - p
alpha <- 0.02
n <- 100
Se <- sqrt(p*q/n)
Z <- (phat - p) / Se
teststat <- Z
teststat
## [1] -1.458333
criticalval <- qnorm(p = 0.02 )
criticalval
## [1] -2.053749
pvalue <- pnorm(Z)
pnorm(Z)
## [1] 0.07237434
myp(p = pvalue, alpha = alpha)
## [1] "FAIL 2 REJECT"
criticalval <- qnorm(p = 0.05)   # for comparison: critical value at the 0.05 level
criticalval
## [1] -1.644854
teststat
## [1] -1.458333
criticalval <- qnorm(p = 0.1)    # and at the 0.10 level
criticalval
## [1] -1.281552
teststat
## [1] -1.458333
The test statistic does not reach the rejection region at the 0.02 or 0.05 levels; only at the 0.10 level would the executive's claim be supported.
shadenorm(mu = 0.36,
          sig = Se,
          pcts = c(0.02, 0.98),
          justbelow = TRUE,      # one-sided test: shade only the lower 2% tail
          color = 'lightblue'
          )
lines(x = rep(.29,10),
y = seq(from = 0,
to = 20,
length.out=10),
col='gray')
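For comparison, R's built-in prop.test (with the continuity correction turned off) reproduces the same one-proportion test; this is an optional cross-check, not part of the required work.
prop.test(x = 29, n = 100, p = 0.36, alternative = "less", correct = FALSE)   # one-sided p-value about 0.072; fail to reject at the 0.02 level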
A hospital director is told that 31% of the treated patients are uninsured. The director wants to test the claim that the percentage of uninsured patients is less than the expected percentage. A sample of 380 patients found that 95 were uninsured. Make the decision to reject or fail to reject the null hypothesis at the 0.05 level. Show all work and hypothesis testing steps.
HINT: (See Open Stats Textbook - Chapter 5)
# Ho: pi>=0.31, Ha: pi<0.31
p <- 0.31
q <- 1-p
n <- 380
x <- 95
phat <- x/n
phat
## [1] 0.25
alpha<- 0.05
Se <- sqrt(p * q / n)
Se
## [1] 0.0237254
Z <- (phat - p) / Se
teststat <- Z
teststat
## [1] -2.528935
criticalval <- qnorm(p = alpha,
mean = 0,
sd = 1)
criticalval
## [1] -1.644854
pvalue <- pnorm(q = teststat)   # one-sided (lower-tail) p-value, about 0.006
myp(p = pvalue, alpha = alpha)
## [1] "REJECT Ho"
shadenorm(mu = 0.31,
          sig = Se,
          pcts = c(alpha, 1 - alpha),
          justbelow = TRUE,      # shade only the lower 5% tail
          color = 'lightblue')
lines(x = rep(phat,10),
y = seq(0,20,length.out=10),
col = 'gray')
The test statistic (-2.53) falls below the critical value (-1.64) and the p-value is below 0.05, so reject the null hypothesis: the proportion of uninsured patients appears to be less than 31%.
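The same conclusion follows from prop.test with the continuity correction turned off.
prop.test(x = 95, n = 380, p = 0.31, alternative = "less", correct = FALSE)   # one-sided p-value about 0.006; reject at the 0.05 level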
Given two independent random samples with the following results:
n1 = 11, x̄1 = 127, s1 = 33
n2 = 18, x̄2 = 157, s2 = 27
Use this data to find the 95% confidence interval for the true difference between the population means. Assume that the population variances are not equal and that the two populations are normally distributed.
HINT: (See Open Stats Section 7.3, Example 7.22 in particular)
# Ho: Mu1 - Mu2 = 0
# Ha: Mu1 - Mu2 != 0
mu1 <- 82   # sample mean, group 1
mu2 <- 81   # sample mean, group 2
alpha = 0.1
n1 = 34
n2 = 33
df1 = n1 - 1
df2 = n2 - 1
sd1 = 9
sd2 = 10
var1 = sd1^2
var2 = sd2^2
pointestdiff = (mu1-mu2)
denSe = sqrt(var1/n1 + var2/n2)
t = (pointestdiff - 0) / denSe
teststat <- t
teststat
## [1] 0.4298281
numdf = (var1/n1 + var2/n2)^2                   # Welch-Satterthwaite df: numerator
dendf = (var1/n1)^2 / df1 + (var2/n2)^2 / df2   # and denominator
df = numdf / dendf                              # approximate degrees of freedom
criticalvalsatter <- qt(p=(1-alpha/2), df=numdf/dendf)
criticalvalsatter
## [1] 1.669077
?min
criticalvalcons <- qt(p=(1-alpha/2), df=min(df1, df2))
criticalvalcons
## [1] 1.693889
?pt
pval = 2*(1-pt(q = t,
df = numdf/dendf)
)
pval
## [1] 0.6687682
myp(p = pval,
alpha = alpha)
## [1] "FAIL 2 REJECT"
pvalcons = 2*(1-pt(q = t,
df = min(df1, df2))
)
pvalcons
## [1] 0.6702014
myp(p = pvalcons,
alpha = alpha)
## [1] "FAIL 2 REJECT"
CI <- c(pointestdiff - (criticalvalcons * denSe),
pointestdiff + (criticalvalcons * denSe))
CI
## [1] -2.940852 4.940852
Fail to reject the null: both p-values exceed alpha = 0.10 and the 90% interval (-2.94, 4.94) contains 0.
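Note that the block above works through a two-mean comparison with different summary statistics (sample means 82 and 81) than the ones stated in this problem. For the stated data (n1 = 11, x̄1 = 127, s1 = 33; n2 = 18, x̄2 = 157, s2 = 27), a minimal sketch of the 95% interval with the Welch-Satterthwaite degrees of freedom is below; the same numbers are worked in full later in this document.
n1 <- 11; xbar1 <- 127; s1 <- 33                  # summary statistics as stated above
n2 <- 18; xbar2 <- 157; s2 <- 27
Se  <- sqrt(s1^2/n1 + s2^2/n2)                    # standard error of the difference, about 11.81
dfW <- (s1^2/n1 + s2^2/n2)^2 /
  ((s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1))       # Welch-Satterthwaite df, about 18.1
(xbar1 - xbar2) + c(-1, 1) * qt(0.975, dfW) * Se  # roughly (-54.8, -5.2)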
Two men, A and B, who usually commute to work together decide to conduct an experiment to see whether one route is faster than the other. The men feel that their driving habits are approximately the same, so each morning for two weeks one driver is assigned to route I and the other to route II. The times, recorded to the nearest minute, are shown in the following table.
Day        Mo  Tu  Wed  Th  F   Mo  Tu  Wed  Th  F
Route I    32  27  34   24  31  25  30  23   27  35
Route II   28  28  33   25  26  29  33  27   25  33
Using this data, find the 98% confidence interval for the true mean difference between the average travel time for route I and the average travel time for route II.
Let d = (route I travel time) − (route II travel time).
Assume that the populations of travel times are normally distributed for both routes. Show all work and hypothesis testing steps.
HINT: See Open Stats Section 7.3, Example 7.22 in particular
# Ho: Mu1 - Mu2 = 0,
# Ha: Mu1 - Mu2 != 0
alpha = 0.05
xbar1 = 127
xbar2 = 157
n1 = 11
n2 = 18
df1 = n1-1
df2 = n2-1
s1 = 33
s2 = 27
var1 = s1^2
var2 = s2^2
numdf = (var1 / n1 + var2 / n2)^2
dendf = (var1 / n1)^2 / df1 +
(var2 / n2)^2 / df2
df = numdf / dendf
delta = xbar1 - xbar2
delta
## [1] -30
t = qt(p = 0.025, df = df)
t
## [1] -2.10029
t = qt(p = 0.975, df = df)
t
## [1] 2.10029
Se = sqrt( var1/n1 + var2/n2 )
Se
## [1] 11.81101
interval = c(delta - t * Se,
delta + t * Se)
interval
## [1] -54.806548 -5.193452
teststat <- delta/Se
teststat
## [1] -2.540003
criticalval <- qt(p = alpha/2, df = numdf / dendf)
criticalval
## [1] -2.10029
criticalval <- qt(p = alpha/2, df = min(df1, df2))   # conservative df = min(n1-1, n2-1)
criticalval
## [1] -2.228139
?pt
pval = 2*(pt(q = delta/Se, df = numdf/dendf))
pval
## [1] 0.02048034
myp(pval,alpha)
## [1] "REJECT Ho"
pvalrob = 2*( pt(q = delta/Se, df = min(df1, df2)))
pvalrob
## [1] 0.02936303
myp(pvalrob,alpha)
## [1] "REJECT Ho"
The 95% interval (-54.8, -5.2) does not contain 0, so we reject the null hypothesis and conclude that the two population means differ.
The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons have registered to vote. Can we conclude that the percentage of employed workers (p1) who have registered to vote exceeds the percentage of unemployed workers (p2) who have registered to vote? Use a significance level of α = 0.05 for the test. Show all work and hypothesis testing steps.
HINT: See Open Stats Textbook, Sections 6.2.1 (sampling distribution of the difference of two proportions) and 6.2.2 (confidence intervals for p1 - p2).
HINT: Also see Open Stats Textbook, Chapter 5, Sections 5.2-5.3 (confidence intervals and hypothesis testing for a proportion).
r1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)   # route I daily travel times
r2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)   # route II daily travel times
alpha <- 0.02
xbar1 <- mean(r1)
xbar2 <- mean(r2)
n1 <- 10
n2 <- 10
df1 <- n1 - 1
df2 <- n2 - 1
?sd
s1 <- sd(r1)
s2 <- sd(r2)
var1 <- s1^2
var2 <- s2^2
Se = sqrt(var1/n1 + var2/n2)
Se
## [1] 1.678955
delta <- xbar1-xbar2
delta
## [1] 0.1
t = qt(p = alpha/2, df = min(df1,df2))
t
## [1] -2.821438
t = qt(p = 1-alpha/2, df = min(df1,df2))
t
## [1] 2.821438
interval = c( delta - t * Se , delta + t * Se )
interval
## [1] -4.637066 4.837066
numdf = (var1/n1 + var2/n2)^2
dendf = (var1/n1)^2 / df1 + (var2/n2)^2 / df2
df = numdf / dendf
criticalvalsatter <- qt(p = (1-alpha/2), df = df)
criticalvalsatter
## [1] 2.568883
interval = c(delta - criticalvalsatter * Se ,
delta + criticalvalsatter * Se )
interval
## [1] -4.213039 4.413039
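Because the route problem defines a per-day difference d = (route I) − (route II), a paired t interval is a natural cross-check on the independent-samples intervals above; it gives a narrower 98% interval, roughly (-2.8, 3.0), and the same conclusion that 0 is a plausible mean difference.
t.test(r1, r2, paired = TRUE, conf.level = 0.98)   # 98% CI for the mean daily difference, roughly (-2.8, 3.0)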
The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons have registered to vote. Can we conclude that the percentage of employed workers (p1) who have registered to vote exceeds the percentage of unemployed workers (p2) who have registered to vote? Use a significance level of 0.05 for the test. Show all work and hypothesis testing steps.
Notation: the employed group is group 1; the unemployed group is group 2.
# Ho: pi1 - pi2 <= 0,
# Ha: pi1 - pi2 > 0
alpha=0.05
n1 = 391
n2 = 510
x1 = 195
x2 = 193
p1 = x1/n1
p2 = x2/n2
p1-p2
## [1] 0.1202899
Se = sqrt (p1 * (1-p1) / n1 + p2 * (1-p2) / n2)
Se
## [1] 0.03317529
Z = (p1 - p2) / Se
teststat <- Z
teststat
## [1] 3.625887
criticalval <- qnorm(p = 1 - alpha)
criticalval
## [1] 1.644854
shadenorm(mu = 0,
          sig = Se,
          pcts = c(0.05, 0.95),
          justabove = TRUE,      # upper-tail test: shade only the top 5%
          color = 'lightblue'
          )
lines(rep(p1-p2,10),
seq(0,20,length.out=10),
col='gray')
pnorm(q = teststat,
      lower.tail = FALSE)   # one-sided p-value
## [1] 0.0001439855
1 - pnorm(q = teststat)     # same value, computed the equivalent way
## [1] 0.0001439855
The test statistic (3.63) exceeds the critical value (1.64) and the p-value (about 0.0001) is far below 0.05, so reject the null hypothesis: the data support p1 > p2.
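As a final cross-check, the built-in two-sample prop.test (which uses a pooled standard error, so its p-value differs slightly from the unpooled calculation above) reaches the same decision.
prop.test(x = c(195, 193), n = c(391, 510), alternative = "greater", correct = FALSE)   # p-value about 0.00015; reject Ho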