Setting up Functions

# Helper: print the test decision for a given p-value and significance level
myp=function(p, alpha){
  if(p<alpha){print('REJECT Ho')}else{print('FAIL 2 REJECT')}
}

1) Using traditional methods, it takes 109 hours to receive a basic driving license. A new license training method using Computer Aided Instruction (CAI) has been proposed. A researcher used the technique with 190 students and observed that they had a mean of 110 hours. Assume the standard deviation is known to be 6. A level of significance of 0.05 will be used to determine if the technique performs differently than the traditional method. Make a decision to reject or fail to reject the null hypothesis. Show all work in R.

Ho: μ = 109 Ha: μ ≠ 109

Method 1: Test statistic vs. critical value. If the test statistic is larger than the critical value in absolute terms, reject the null.

# Significance level
alp <- 0.05
# Test statistic: z = (xbar - mu0) / (sigma / sqrt(n))
z <- (110-109)/(6/sqrt(190))
test_sta <- z 
test_sta
## [1] 2.297341
critical_value <- qnorm(p = 0.975,
                        mean = 0,
                        sd = 1 
                        )
critical_value
## [1] 1.959964

Since the test statistic is more extreme than the critical value, we will reject the null.

Method 2: Compare the p-value with alpha. If the p-value is smaller than alpha, reject the null.

p_value <- 2 * (1 - pnorm(q = test_sta,
                          mean = 0,
                          sd = 1))
p_value
## [1] 0.0215993
myp(p = p_value, alpha = alp)
## [1] "REJECT Ho"

Since the p-value is less than alpha, we reject the null hypothesis.

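Method 3: Confidence interval. If the confidence interval built from the sample point estimate does not contain the hypothesized value, reject the null.

xbar <- 110 
Se <- 6/sqrt(190)
# The critical value comes from the quantile function qnorm(), not the CDF pnorm()
z <- qnorm(p = 0.975)

interval = c(xbar - z * Se, xbar + z * Se)

interval 
## [1] 109.1469 110.8531

Since the hypothesized mean of 109 lies outside the interval, we reject the null.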
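For a two-sided z-test the three methods always lead to the same decision, so they can be bundled into one function. The following is a minimal sketch rather than part of the original solution; the helper name `z_test_two_sided` and its argument names are hypothetical.

```r
# Hypothetical helper: two-sided one-sample z-test from summary statistics,
# assuming the population standard deviation sigma is known.
z_test_two_sided <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  se <- sigma / sqrt(n)                 # standard error of the mean
  z <- (xbar - mu0) / se                # test statistic
  crit <- qnorm(1 - alpha / 2)          # two-sided critical value
  p_value <- 2 * pnorm(-abs(z))         # two-sided p-value
  ci <- c(xbar - crit * se, xbar + crit * se)
  list(z = z, critical = crit, p_value = p_value, ci = ci,
       reject = abs(z) > crit)
}

# Problem 1 inputs: xbar = 110, mu0 = 109, sigma = 6, n = 190
res1 <- z_test_two_sided(xbar = 110, mu0 = 109, sigma = 6, n = 190)
res1$reject
res1$ci
```

The `reject` flag, the p-value, and the interval are all derived from the same `z` and `se`, which is why the three methods cannot disagree.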

2) Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 5.3 parts/million (ppm). A researcher believes that the current ozone level is at an insufficient level. The mean of 5 samples is 5.0 ppm with a standard deviation of 1.1. Does the data support the claim at the 0.05 level? Assume the population distribution is approximately normal.

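Ho: μ = 5.3 Ha: μ < 5.3

# Sample summary
xbar2 <- 5
Se2 <- 1.1/sqrt(5)
# The standard deviation is a sample estimate with n = 5, so use t with n - 1 = 4 df
t2 <- (xbar2 - 5.3)/Se2
t2
## [1] -0.6098367
# Left-tailed critical value at alpha = 0.05
critical2 <- qt(p = 0.05, df = 4)
critical2
## [1] -2.131847

Since the test statistic is not below the critical value, we cannot reject the null. The data do not support the claim that the ozone level is below 5.3 ppm.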
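Because the standard deviation here is estimated from only 5 samples, a t-based test is a natural check on this decision. The sketch below assumes a left-tailed alternative (ozone below 5.3 ppm); the helper name `t_test_lower` and its arguments are hypothetical.

```r
# Hypothetical helper: left-tailed one-sample t-test from summary statistics.
t_test_lower <- function(xbar, mu0, s, n, alpha = 0.05) {
  se <- s / sqrt(n)                     # estimated standard error
  t_stat <- (xbar - mu0) / se           # test statistic
  df <- n - 1                           # degrees of freedom
  crit <- qt(alpha, df = df)            # left-tail critical value (negative)
  p_value <- pt(t_stat, df = df)        # left-tail p-value
  list(t = t_stat, df = df, critical = crit, p_value = p_value,
       reject = t_stat < crit)
}

# Problem 2 inputs: xbar = 5.0, mu0 = 5.3, s = 1.1, n = 5
res2 <- t_test_lower(xbar = 5.0, mu0 = 5.3, s = 1.1, n = 5)
res2$reject
```

With so few observations the t critical value is noticeably farther from zero than the normal one, so the test is harder to reject than a z-test would suggest.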

3) Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 7.3 parts/million (ppm). A researcher believes that the current ozone level is not at a normal level. The mean of 51 samples is 7.1 ppm with a variance of 0.49. Assume the population is normally distributed. A level of significance of 0.01 will be used. Show all work and hypothesis testing steps.

Ho: μ = 7.3 Ha: μ ≠ 7.3

xbar3 <- 7.1
Se3 <- sqrt(0.49)/sqrt(51)
# Two-sided critical value at alpha = 0.01; the variance is a sample estimate, so use t with n - 1 = 50 df
t3 <- qt(p = 0.995, df = 50)

interval = c(xbar3 - t3 * Se3, xbar3 + t3 * Se3)

round(interval, 2)
## [1] 6.84 7.36

Since 7.3 is inside the 99% confidence interval, we cannot reject the null.

4) A publisher reports that 36% of their readers own a laptop. A marketing executive wants to test the claim that the percentage is actually less than the reported percentage. A random sample of 100 found that 29% of the readers owned a laptop. Is there sufficient evidence at the 0.02 level to support the executive’s claim? Show all work and hypothesis testing steps

Ho: p = 0.36 Ha: p < 0.36

n <- 100
actual <- 0.36
test <- 0.29

# Standard error under Ho
Se4 <- sqrt((actual*(1-actual))/n)
Se4
## [1] 0.048
# z test statistic for a proportion
z4 <- (test - actual)/Se4
z4
## [1] -1.458333
# Left-tailed critical value at alpha 
alpha4 <- 0.02
critical4 <- qnorm(alpha4)
critical4
## [1] -2.053749

Since the test statistic -1.46 is not below the critical value -2.05, we cannot reject the null. There is not sufficient evidence at the 0.02 level that fewer than 36% of readers own a laptop.
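A claim that a proportion is "less than" a reported value is a left-tailed test, so the decision can also be checked with a one-sided z statistic. This is a minimal sketch under the usual normal approximation; the helper name `prop_z_lower` and its arguments are hypothetical.

```r
# Hypothetical helper: left-tailed one-sample z-test for a proportion.
# The standard error uses the hypothesized proportion p0, as is standard under Ho.
prop_z_lower <- function(p_hat, p0, n, alpha) {
  se <- sqrt(p0 * (1 - p0) / n)         # standard error under Ho
  z <- (p_hat - p0) / se                # test statistic
  crit <- qnorm(alpha)                  # left-tail critical value (negative)
  list(z = z, critical = crit, p_value = pnorm(z), reject = z < crit)
}

# Problem 4 inputs: p_hat = 0.29, p0 = 0.36, n = 100, alpha = 0.02
res4 <- prop_z_lower(p_hat = 0.29, p0 = 0.36, n = 100, alpha = 0.02)
res4$reject
res4$p_value
```

The same function applies to any left-tailed proportion test once the sample proportion is expressed as a fraction between 0 and 1.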

5) A hospital director is told that 31% of the treated patients are uninsured. The director wants to test the claim that the percentage of uninsured patients is less than the expected percentage. A sample of 380 patients found that 95 were uninsured. Make the decision to reject or fail to reject the null hypothesis at the 0.05 level. Show all work and hypothesis testing steps.

Ho: p = 0.31 Ha: p < 0.31

actual5 <- 0.31
n5 <- 380
# Sample proportion as a fraction: 95 out of 380
test5 <- 95/n5
test5
## [1] 0.25
Se5 <- sqrt((actual5*(1-actual5))/n5)
Se5
## [1] 0.0237254
# z test statistic for a proportion
z5 <- (test5 - actual5)/Se5
z5
## [1] -2.528935
# Left-tailed critical value at alpha 
alpha5 <- 0.05
critical5 <- qnorm(alpha5)
critical5
## [1] -1.644854

Since the test statistic -2.53 is below the critical value -1.64, we reject the null. There is evidence at the 0.05 level that fewer than 31% of patients are uninsured.

6)

7) A standardized test is given to a sixth-grade class. Historically the mean score has been 112 with a standard deviation of 24. The superintendent believes that the standard deviation of performance may have recently decreased. She randomly sampled 22 students and found a mean of 102 with a standard deviation of 15.4387. Is there evidence that the standard deviation has decreased at the 𝛼 = 0.1 level? Show all work and hypothesis testing steps.

Ho: σ = 24 Ha: σ < 24

sd7 <- 24
n7 <- 22 
sd72 <- 15.4387
df7 <- n7 - 1 
alpha7 <- 0.1 
# A claim about a standard deviation is tested with the chi-square statistic (n - 1) * s^2 / sigma0^2
chisq7 <- df7 * sd72^2 / sd7^2
round(chisq7, 2)
## [1] 8.69
# Left-tailed critical value
critical7 <- qchisq(p = alpha7, df = df7)
round(critical7, 2)
## [1] 13.24

Since the test statistic 8.69 is below the critical value 13.24, we reject the null. There is evidence at the 0.1 level that the standard deviation has decreased.

8) A medical researcher wants to compare the pulse rates of smokers and non-smokers. He believes that the pulse rate for smokers and non-smokers is different and wants to test this claim at the 0.1 level of significance. The researcher checks 32 smokers and finds that they have a mean pulse rate of 87, and 31 non-smokers have a mean pulse rate of 84. The standard deviation of the pulse rates is found to be 9 for smokers and 10 for non-smokers. Let μ1 be the true mean pulse rate for smokers and μ2 be the true mean pulse rate for non-smokers. Show all work and hypothesis testing steps.

Ho: μ1 = μ2 (pulse rates are the same for smokers and non-smokers) Ha: μ1 ≠ μ2 (pulse rates differ between smokers and non-smokers)

nsmoker <- 32
meansmoker <- 87 
nnosmoker <- 31
meannosmoker <- 84
sdsmoker <- 9 
sdnosmoker <- 10
df8nosmoker <- nnosmoker -1 
df8smoker <- nsmoker -1 
varsmoker <- sdsmoker^2
varnosmoker <- sdnosmoker^2
alpha8 <- 0.1

Calculate the difference between the means

meandiff <- meansmoker - meannosmoker
meandiff
## [1] 3

Calculate the standard error using the sample standard deviations

Se8 <- sqrt( varsmoker/nsmoker + varnosmoker/nnosmoker)
Se8
## [1] 2.399387
## Calculate T test statistic 
t8 <- (meandiff - 0 )/Se8
test_stat8 <- t8
test_stat8
## [1] 1.25032

Calculate df for both samples

numdf <- (varsmoker/nsmoker + varnosmoker/nnosmoker)^2

dendf <- (varsmoker/nsmoker)^2 / df8smoker + (varnosmoker/nnosmoker)^2 / df8nosmoker

df8 <- numdf / dendf 
df8
## [1] 59.87528

Calculate critical value

critical8 <- qt(p = (1-alpha8/2), df = df8)
critical8 
## [1] 1.670703

Since the test statistic 1.25 is less extreme than the critical value 1.67, we fail to reject the null.
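The steps for the unequal-variance (Welch) comparison can be collected into one function that works from summary statistics. This is a sketch only; the helper name `welch_from_summary` and its argument names are hypothetical, and it assumes a two-sided alternative.

```r
# Hypothetical helper: two-sided Welch t-test from summary statistics.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2, alpha) {
  v1 <- s1^2 / n1                       # variance of the first sample mean
  v2 <- s2^2 / n2                       # variance of the second sample mean
  se <- sqrt(v1 + v2)                   # standard error of the difference
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t_stat <- (m1 - m2) / se
  crit <- qt(1 - alpha / 2, df = df)
  list(t = t_stat, df = df, critical = crit,
       p_value = 2 * pt(-abs(t_stat), df = df),
       ci = c(m1 - m2 - crit * se, m1 - m2 + crit * se),
       reject = abs(t_stat) > crit)
}

# Problem 8 inputs: smokers (87, 9, 32) vs. non-smokers (84, 10, 31)
res8 <- welch_from_summary(m1 = 87, s1 = 9, n1 = 32,
                           m2 = 84, s2 = 10, n2 = 31, alpha = 0.1)
res8$reject
```

Note that the degrees-of-freedom denominator divides by `n - 1` for each sample, and the variances are the squared standard deviations, never the squared means.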

9) Given two independent random samples with the following results: n1 = 11, x̄1 = 127, s1 = 33, n2 = 18, x̄2 = 157, s2 = 27

Use this data to find the 95% confidence interval for the true difference between the population means. Assume that the population variances are not equal and that the two populations are normally distributed.

n91 <- 11
x91 <- 127 
s91 <- 33 
n92 <- 18 
x92 <- 157
s92 <- 27
# Variances are the squared standard deviations, not the squared means
var91 <- s91^2
var92 <- s92^2
alpha9 <- 0.05
# Calculate mean diff 

meandiff9 <- x91 - x92
meandiff9
## [1] -30
# Calculate standard error using sample standard deviation 

Se9 <- sqrt( var91/n91 + var92/n92)
Se9
## [1] 11.81101
## Calculate Df for both samples (each term is divided by n - 1)

numdf9 <- (var91/n91 + var92/n92)^2

dendf9 <- (var91/n91)^2 / (n91 - 1) + (var92/n92)^2 / (n92 - 1)

df9 <- numdf9 / dendf9 
round(df9, 2)
## [1] 18.08
## Calculate T score 

tdf9 <- qt(p = alpha9/2, df9, lower.tail = FALSE)
round(tdf9, 2)
## [1] 2.1
## Confidence interval

interval = c( meandiff9 - tdf9 * Se9, meandiff9 + tdf9 * Se9)
round(interval, 1)
## [1] -54.8  -5.2

The 95% confidence interval for the true difference between the population means is approximately (-54.8, -5.2).

10) Two men, A and B, who usually commute to work together decide to conduct an experiment to see whether one route is faster than the other. The men feel that their driving habits are approximately the same, so each morning for two weeks one driver is assigned to route I and the other to route II. The times, recorded to the nearest minute, are shown in the following table.

route1 <- c(32,27,34,24,31,25,30,23,27,35)
route2 <- c(28,28,33,25,26,29,33,27,25,33)

# Both routes are timed on the same mornings, so work with the daily differences
d10 <- route1 - route2
n10 <- length(d10)
# Calculate the mean diff 

meandiff10 <- mean(d10)
meandiff10
## [1] 0.1
# Calculate standard error using the sample standard deviation of the differences

Se10 <- sd(d10)/sqrt(n10)
round(Se10, 3)
## [1] 1.016
## Calculate T score 
alpha10 <- 0.02 
tdf10 <- qt(p = alpha10/2, df = n10 - 1, lower.tail = FALSE)
tdf10
## [1] 2.821438
## Confidence interval

interval = c( meandiff10 - tdf10 * Se10, meandiff10 + tdf10 * Se10)
round(interval, 2)
## [1] -2.77  2.97

The 98% confidence interval for the mean difference contains 0, so the data do not show that one route is faster than the other.
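Since each morning yields one time on route I and one on route II, the observations can reasonably be treated as pairs. Base R's `t.test()` offers a quick check under that assumption. The 98% level below follows the `alpha10 <- 0.02` used in the solution, and the object name `paired10` is hypothetical.

```r
# Travel times in minutes, copied from the problem's data
route1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)
route2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)

# Paired t procedure on the daily differences (route1 - route2);
# pairing by morning is an assumption about the study design
paired10 <- t.test(route1, route2, paired = TRUE, conf.level = 0.98)

paired10$estimate   # mean of the daily differences
paired10$conf.int   # 98% confidence interval for the mean difference
```

Pairing removes the day-to-day variation in traffic that affects both routes alike, which is why the interval is based on the spread of the differences and not on the spread of each route separately.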

11) The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons have registered to vote. Can we conclude that the percentage of employed workers ( 𝑝1 ), who have registered to vote, exceeds the percentage of unemployed workers ( 𝑝2 ), who have registered to vote? Use a significance level of 𝛼 = 0.05 for the test. Show all work and hypothesis testing steps.

Ho: p1 ≤ p2 Ha: p1 > p2

n111 <- 391
x111 <- 195
p1 <- x111/n111
n112 <- 510
x112 <- 193
p2 <- x112/n112
# Calculate the pooled proportion

p <- (x111 + x112) / (n111 + n112)
p
## [1] 0.4306326
# z test statistic: (p1 - p2) / sqrt(p * (1 - p) * (1/n1 + 1/n2))
z111 <- (p1 - p2)/sqrt(p*(1-p)*(1/n111+1/n112))
round(z111, 3)
## [1] 3.614
pvalue11 <- pnorm (q = z111, mean = 0, sd = 1, lower.tail = FALSE)
myp(p = pvalue11, alpha = 0.05)
## [1] "REJECT Ho"

Since the p-value (about 0.00015) is smaller than alpha, we reject the null. There is evidence at the 0.05 level that the percentage of employed workers registered to vote exceeds that of unemployed workers.
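As a cross-check on the hand computation, base R's `prop.test()` can run the pooled two-proportion test directly from the counts in the problem statement (195 of 391 employed, 193 of 510 unemployed). This is a sketch; the object names `reg11` and `z_from_chisq` are hypothetical, and `correct = FALSE` turns off the continuity correction so the result matches the plain z-test.

```r
# One-sided test of p1 > p2 using the counts from the problem statement
reg11 <- prop.test(x = c(195, 193), n = c(391, 510),
                   alternative = "greater", correct = FALSE)

# prop.test reports a chi-square statistic; for two groups without
# continuity correction it equals the square of the pooled z statistic
z_from_chisq <- sqrt(unname(reg11$statistic))
z_from_chisq
reg11$p.value
```

Taking the positive square root is appropriate here because the first sample proportion is larger than the second, so the z statistic is positive.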