Function Setup (used to decide whether to reject the null hypothesis)

# Print the test decision: reject Ho when the p-value is below alpha
myp=function(p, alpha){
  if(p<alpha){print('REJECT Ho')}else{print('FAIL 2 REJECT')}
}

Q01 - Using traditional methods, it takes 109 hours to receive a basic driving license. A new license training method using Computer Aided Instruction (CAI) has been proposed. A researcher used the technique with 190 students and observed that they had a mean of 110 hours. Assume the standard deviation is known to be 6. A level of significance of 0.05 will be used to determine if the technique performs differently than the traditional method. Make a decision to reject or fail to reject the null hypothesis. Show all work in R.

Ho (mu = 109): Mean number of hours to obtain the driving license with the CAI is equal to the mean number of hours to obtain the driving license with traditional method (109).

Ha (mu != 109): Mean number of hours to obtain the driving license with the CAI is NOT equal to the mean number of hours to obtain the driving license with traditional method (109).

Two-sided test, because the alternative hypothesis is mu != 109.

# Define Variables
n <- 190 # Sample size
m <- 109 # Hypothesized population mean
xbar <- 110 # Sample mean
sigma <- 6 # Known population standard deviation
se <- sigma/sqrt(n) # Standard error of the sample mean
alpha <- 0.05 # Level of significance

# Test statistic and two-sided p-value
z <- (xbar - m)/(se)
p <- round(2*(1 - pnorm(abs(z))),4)

cat("The probability is",round(p,4), ".\n")
## The probability is 0.0216 .
myp(p,alpha)
## [1] "REJECT Ho"
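
The same decision can be cross-checked with the critical-value approach. The sketch below restates the Q01 numbers so it runs on its own; `z_crit` and `reject` are names introduced here for illustration only.

```r
# Q01 values restated so the block runs on its own
n <- 190; m <- 109; xbar <- 110; sigma <- 6; alpha <- 0.05

z <- (xbar - m) / (sigma / sqrt(n))   # observed test statistic
z_crit <- qnorm(1 - alpha / 2)        # two-sided critical value
reject <- abs(z) > z_crit             # TRUE means reject Ho
cat("z =", round(z, 2), "critical value =", round(z_crit, 2), "\n")
```

If `reject` is TRUE, the critical-value approach agrees with the p-value decision above.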

Q02 - Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 5.3 parts/million (ppm). A researcher believes that the current ozone level is at an insufficient level. The mean of 5 samples is 5.0 parts per million (ppm) with a standard deviation of 1.1. Does the data support the claim at the 0.05 level? Assume the population distribution is approximately normal.

Ho: Mean ppm of ozone is equal to 5.3 ppm.

Ha: Mean ppm of ozone is less than 5.3 ppm.

One-sided (left-tailed) t-test: the standard deviation comes from a small sample, so the t-distribution is used.

# Define Variables
n <- 5 # Sample size
m <- 5.3 # Hypothesized population mean
xbar <- 5.0 # Sample mean
s <- 1.1 # Sample standard deviation
se <- s/sqrt(n) # Standard error of the sample mean
alpha <- 0.05 # Level of significance

# Test statistic and left-tail p-value
t <- (xbar - m)/(se)
p <- pt(t,df=n-1,lower.tail=TRUE)

cat("The probability is",round(p,4), ".\n")
## The probability is 0.2875 .
myp(p,alpha)
## [1] "FAIL 2 REJECT"
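
Since several questions only give summary statistics, the t-test steps can be wrapped in a small helper. This is a minimal sketch: `t_test_summary` and its arguments are hypothetical names, not part of the original solution or of base R.

```r
# Hypothetical helper: one-sample t-test from summary statistics
t_test_summary <- function(xbar, mu0, s, n,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
  df <- n - 1                              # degrees of freedom
  p_value <- switch(alternative,
    less = pt(t_stat, df),
    greater = pt(t_stat, df, lower.tail = FALSE),
    two.sided = 2 * pt(abs(t_stat), df, lower.tail = FALSE)
  )
  list(statistic = t_stat, df = df, p.value = p_value)
}

# Q02 numbers: left-tailed test of mu = 5.3
q02 <- t_test_summary(xbar = 5.0, mu0 = 5.3, s = 1.1, n = 5, alternative = "less")
round(q02$p.value, 4)
```

The rounded p-value should match the one printed for Q02.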

Q03 - Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 7.3 parts/million (ppm). A researcher believes that the current ozone level is not at a normal level. The mean of 51 samples is 7.1 ppm with a variance of 0.49. Assume the population is normally distributed. A level of significance of 0.01 will be used. Show all work and hypothesis testing steps.

Ho: Mean ppm of ozone is equal to 7.3 ppm.

Ha: Mean ppm of ozone is not equal to 7.3 ppm.

Two-sided test, because the claim is only that the level is "not at a normal level".

# Define Variables
n <- 51 # Sample size
m <- 7.3 # Hypothesized population mean
xbar <- 7.1 # Sample mean
var <- 0.49 # Variance given in the problem
sigma <- sqrt(var) # Standard deviation
se <- sigma/sqrt(n) # Standard error of the sample mean
alpha <- 0.01 # Level of significance

# Test statistic and two-sided p-value
z <- (xbar - m)/(se)
p <- 2*pnorm(q=-abs(z))

cat("The probability is",round(p,4), ".\n")
## The probability is 0.0413 .
myp(p,alpha)
## [1] "FAIL 2 REJECT"
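
Reading the claim "not at a normal level" as two-sided, the decision can also be checked with a confidence interval: at the 0.01 level, Ho is rejected only if 7.3 falls outside the 99% interval for the mean. A short sketch, with `conf_level`, `ci`, and `contains_m` as illustrative names:

```r
# Q03 values restated so the block runs on its own
n <- 51; m <- 7.3; xbar <- 7.1; sigma <- sqrt(0.49); conf_level <- 0.99

se <- sigma / sqrt(n)                          # standard error of the mean
z_crit <- qnorm(1 - (1 - conf_level) / 2)      # two-sided critical value
ci <- c(lower = xbar - z_crit * se, upper = xbar + z_crit * se)

# TRUE means 7.3 is inside the interval, i.e. fail to reject Ho
contains_m <- unname(ci["lower"] <= m && m <= ci["upper"])
```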

Q04 - A publisher reports that 36% of their readers own a laptop. A marketing executive wants to test the claim that the percentage is actually less than the reported percentage. A random sample of 100 found that 29% of the readers owned a laptop. Is there sufficient evidence at the 0.02 level to support the executive’s claim? Show all work and hypothesis testing steps.

Ho: The proportion of readers who own a laptop is equal to 36%.

Ha: The proportion of readers who own a laptop is less than 36%.

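No population standard deviation is needed for a proportion: the one-sample z-test for a proportion (normal approximation) applies.

# Define Variables
n <- 100 # Sample size
pop <- 0.36 # Hypothesized population proportion
sam <- 0.29 # Sample proportion
alpha <- 0.02 # Level of significance
se <- sqrt((pop*(1-pop))/n) # Standard error under Ho

# Test statistic and left-tail p-value
z <- (sam - pop) / se
p <- pnorm(z, lower.tail = TRUE)

cat("The probability is",round(p,4), ".\n")
## The probability is 0.0724 .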
myp(p,alpha)
## [1] "FAIL 2 REJECT"
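
As a cross-check, base R's `binom.test` gives an exact left-tail test that does not rely on any approximating distribution. The sketch assumes that 29% of the 100 sampled readers means exactly 29 readers; `approx_ok` and `exact` are illustrative names.

```r
# Q04 values restated so the block runs on its own
n <- 100; x <- 29; pop <- 0.36; alpha <- 0.02

# Rule-of-thumb check for the normal approximation (both expected counts at least 5)
approx_ok <- n * pop >= 5 && n * (1 - pop) >= 5

# Exact left-tail binomial test of Ho: proportion = 0.36
exact <- binom.test(x, n, p = pop, alternative = "less")
exact$p.value < alpha   # FALSE agrees with "FAIL 2 REJECT"
```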

Q05 - A hospital director is told that 31% of the treated patients are uninsured. The director wants to test the claim that the percentage of uninsured patients is less than the expected percentage. A sample of 380 patients found that 95 were uninsured. Make the decision to reject or fail to reject the null hypothesis at the 0.05 level. Show all work and hypothesis testing steps.

Ho: The proportion of uninsured patients is equal to 31%.

Ha: The proportion of uninsured patients is less than 31%.

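# Define Variables
n <- 380 # Sample size
pop <- 0.31 # Hypothesized population proportion
sam <- 95/n # Sample proportion
alpha <- 0.05 # Level of significance
se <- sqrt((pop*(1-pop))/n) # Standard error under Ho

# Test statistic and left-tail p-value
z <- (sam - pop) /  se
p <- pnorm(z, lower.tail = TRUE)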

cat("The probability is",round(p,4), ".\n")
## The probability is 0.0057 .
myp(p,alpha)
## [1] "REJECT Ho"
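
The manual z-test can be compared against base R's `prop.test`. With `correct = FALSE` (no continuity correction) its one-sided p-value should coincide with `pnorm(z)`. The names `res` and `z_manual` are introduced here for illustration.

```r
# Q05 values restated so the block runs on its own
x <- 95; n <- 380; pop <- 0.31

# One-sample proportion test, left-tailed, without continuity correction
res <- prop.test(x, n, p = pop, alternative = "less", correct = FALSE)

# Manual z statistic for comparison
z_manual <- (x / n - pop) / sqrt(pop * (1 - pop) / n)
all.equal(unname(res$p.value), pnorm(z_manual))
```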

Q06 - [BONUS] Find the minimum sample size needed to be 99% confident that the sample’s variance is within 1% of the population’s variance.

Q07 - A standardized test is given to a sixth-grade class. Historically the mean score has been 112 with a standard deviation of 24. The superintendent believes that the standard deviation of performance may have recently decreased. She randomly sampled 22 students and found a mean of 102 with a standard deviation of 15.4387. Is there evidence that the standard deviation has decreased at the 𝛼 = 0.1 level? Show all work and hypothesis testing steps.

Ho: The standard deviation of tests is equal to 24.

Ha: The standard deviation of tests is less than 24.

# Define Variables
m <- 112 # Population mean (not needed for the variance test)
sigma <- 24 # Hypothesized population standard deviation
n <- 22 # Sample size
xbar <- 102  # Sample mean (not needed for the variance test)
s <- 15.4387 # Sample standard deviation
alpha <- 0.1 # Level of significance
df <- n - 1 # Degrees of freedom

# Chi-square test statistic and left-tail p-value
chisq <- (n - 1) * s^2 / sigma^2
p <- pchisq(chisq, df, lower.tail = TRUE)

cat("The probability is",round(p,3), ".\n")
## The probability is 0.009 .
myp(p,alpha)
## [1] "REJECT Ho"
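
A test about a standard deviation is usually based on the chi-square statistic (n - 1)s^2 / sigma^2, and the decision can be cross-checked against the left-tail critical value. A minimal sketch; `sigma0`, `chisq_stat`, `chisq_crit`, and `reject` are illustrative names.

```r
# Q07 values restated so the block runs on its own
n <- 22; s <- 15.4387; sigma0 <- 24; alpha <- 0.1

chisq_stat <- (n - 1) * s^2 / sigma0^2     # test statistic
chisq_crit <- qchisq(alpha, df = n - 1)    # left-tail critical value
reject <- chisq_stat < chisq_crit          # TRUE means reject Ho
```

A statistic below the critical value supports the claim that the standard deviation has decreased.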

Q08 - A medical researcher wants to compare the pulse rates of smokers and non-smokers. He believes that the pulse rate for smokers and non-smokers is different and wants to test this claim at the 0.1 level of significance. The researcher checks 32 smokers and finds that they have a mean pulse rate of 87, and 31 non-smokers have a mean pulse rate of 84. The standard deviation of the pulse rates is found to be 9 for smokers and 10 for non-smokers. Let μ1 be the true mean pulse rate for smokers and μ2 be the true mean pulse rate for non-smokers. Show all work and hypothesis testing steps.

Ho: The mean pulse rates for smokers and non-smokers are equal (μ1 = μ2).

Ha: The mean pulse rates for smokers and non-smokers are not equal (μ1 != μ2).

Two-sided, two-sample t-test: the population standard deviations are unknown, so the sample standard deviations are used.

# Define Variables
nsmoke <- 32 # Sample num smokers
msmoke <- 87 # Sample mean smokers
sdsmoke <- 9 # Sample standard deviation smokers
nnon <- 31 # Sample num nonsmokers
mnon <- 84 # Sample mean nonsmokers
sdnon <- 10 # Sample standard deviation nonsmokers
n <- nsmoke + nnon # Total Sample
alpha <- 0.1 # Level of significance
df <- min(nsmoke, nnon) - 1 # Conservative degrees of freedom (smaller sample size minus 1)

mdiff <- msmoke - mnon
varsmoke <- sdsmoke^2
varnon <- sdnon^2
se <- sqrt((varsmoke/nsmoke)+(varnon/nnon))
t <- mdiff/se
p <- 2*pt(t, df, lower.tail = FALSE)


cat("The probability is",round(p,4), ".\n")
## The probability is 0.2208 .
myp(p,alpha)
## [1] "FAIL 2 REJECT"
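
The degrees of freedom above are the conservative choice (smaller sample size minus 1). A common alternative is the Welch-Satterthwaite approximation, sketched below from the same summary statistics. The variable names are illustrative, and this is a cross-check rather than the method used above.

```r
# Q08 summary statistics restated so the block runs on its own
n1 <- 32; xbar1 <- 87; s1 <- 9    # smokers
n2 <- 31; xbar2 <- 84; s2 <- 10   # non-smokers

v1 <- s1^2 / n1                   # variance of the first sample mean
v2 <- s2^2 / n2                   # variance of the second sample mean
t_stat <- (xbar1 - xbar2) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
p_welch <- 2 * pt(abs(t_stat), df_welch, lower.tail = FALSE)
```

The Welch degrees of freedom are larger than the conservative value, so the p-value is somewhat smaller, but it should still exceed 0.1.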

Q09 - Given two independent random samples with the following results: [n1 = 11, x̄1 = 127, s1 = 33, n2 = 18, x̄2 = 157, s2 = 27]. Use this data to find the 95% confidence interval for the true difference between the population means. Assume that the population variances are not equal and that the two populations are normally distributed.

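# Define Variables
n1 <- 11 # Sample size, sample 1
xbar1 <- 127 # Sample mean, sample 1
sigma1 <- 33 # Sample standard deviation, sample 1
n2 <- 18 # Sample size, sample 2
xbar2 <- 157 # Sample mean, sample 2
sigma2 <- 27 # Sample standard deviation, sample 2
alpha <- 0.05 # 1 - confidence level
var1 <- sigma1^2
var2 <- sigma2^2
df <- min(n1-1,n2-1) # Conservative degrees of freedom

mdiff <- xbar1 - xbar2 # Point estimate of the difference
se <- sqrt((var1/n1)+(var2/n2)) # Standard error with unequal variances
t <- qt(alpha/2, df, lower.tail = FALSE) # Critical value
margin_error <- t * se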

l <- mdiff - margin_error
u <- mdiff + margin_error

cat("95% Confidence interval:", round(l, 4), ",", round(u, 4))
## 95% Confidence interval: -56.3166 , -3.6834

Q10 - Two men, A and B, who usually commute to work together decide to conduct an experiment to see whether one route is faster than the other. The men feel that their driving habits are approximately the same, so each morning for two weeks one driver is assigned to route I and the other to route II. The times, recorded to the nearest minute, are shown in the following table. Using this data, find the 98% confidence interval for the true mean difference between the average travel time for route I and the average travel time for route II.

# Create data sets for each route
route1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)
route2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)

# Define Variables
n1 <- 10
xbar1 <- mean(route1)
s1 <- sd(route1)
var1 <- s1^2
n2 <- 10
xbar2 <- mean(route2)
s2 <- sd(route2)
var2 <- s2^2
df <- n1-1

mdiff <- xbar1 - xbar2
se <- sqrt((var1/n1)+(var2/n2))

t <- mdiff/se
tdf <- qt(p=.01, df, lower.tail=FALSE)

u <- mdiff + (tdf*se)
l <- mdiff - (tdf*se)
cat("(Lower: ",l,", Upper:",u,")")
## (Lower:  -4.637066 , Upper: 4.837066 )
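
Because both routes are timed on the same mornings, the observations could arguably be treated as paired by day. That is an assumption about the design, not the method used above. Under that reading, the interval is built from the day-by-day differences; `d`, `se_d`, `ci_paired`, and `ci_builtin` are illustrative names.

```r
route1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)
route2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)

d <- route1 - route2                                  # day-by-day differences
n <- length(d)
se_d <- sd(d) / sqrt(n)                               # standard error of the mean difference
t_crit <- qt(0.01, df = n - 1, lower.tail = FALSE)    # 98% two-sided critical value
ci_paired <- mean(d) + c(-1, 1) * t_crit * se_d

# Built-in equivalent for comparison
ci_builtin <- t.test(route1, route2, paired = TRUE, conf.level = 0.98)$conf.int
```

Pairing removes the day-to-day variation shared by both routes, so this interval is narrower than the unpaired one, though it still contains 0.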

Q11 - The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons have registered to vote. Can we conclude that the percentage of employed workers (p1), who have registered to vote, exceeds the percentage of unemployed workers (p2), who have registered to vote? Use a significance level of α = 0.05 for the test. Show all work and hypothesis testing steps.

Ho: The percentage of employed workers registered to vote is less than or equal to the percentage of unemployed workers registered to vote (p1 <= p2).

Ha: The percentage of employed workers registered to vote is greater than the percentage of unemployed workers registered to vote (p1 > p2).

# Define variables
n1 <- 391 # Employed sample size
x1 <- 195 # Employed persons registered to vote
n2 <- 510 # Unemployed sample size
x2 <- 193 # Unemployed persons registered to vote
alpha <- 0.05 # Level of significance

p1 <- x1/n1 # Sample proportion, employed
p2 <- x2/n2 # Sample proportion, unemployed
ppool <- (x1 + x2)/(n1 + n2) # Pooled proportion under Ho
se <- sqrt(ppool*(1-ppool)*((1/n1)+(1/n2))) # Standard error under Ho

# Test statistic and right-tail p-value (Ha: p1 > p2)
z <- (p1 - p2)/se
p <- pnorm(z, lower.tail = FALSE)
cat("The test statistic is",round(z,2), ".\n")
## The test statistic is 3.61 .
cat("The probability is",signif(p,2), ".\n")
## The probability is 0.00015 .
myp(p,alpha)
## [1] "REJECT Ho"
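
Comparing two percentages from independent samples is a two-proportion problem, so base R's `prop.test` is a convenient cross-check. With `correct = FALSE` it reports the pooled statistic as X-squared, which is the square of the pooled z statistic. The names `res` and `z` are introduced here for illustration.

```r
x <- c(195, 193)   # registered voters: employed, unemployed
n <- c(391, 510)   # sample sizes: employed, unemployed

# correct = FALSE turns off the continuity correction so the result
# matches the plain pooled z-test
res <- prop.test(x, n, alternative = "greater", correct = FALSE)

# X-squared equals z^2; z is positive here because 195/391 > 193/510
z <- sqrt(unname(res$statistic))
res$p.value < 0.05   # TRUE means reject Ho
```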