Q1

Using traditional methods, it takes 109 hours to receive a basic driving license. A new license training method using Computer Aided Instruction (CAI) has been proposed. A researcher used the technique with 190 students and observed that they had a mean of 110 hours. Assume the standard deviation is known to be 6. A level of significance of 0.05 will be used to determine if the technique performs differently than the traditional method. Make a decision to reject or fail to reject the null hypothesis. Show all work in R.

H0: μ = 109 (the CAI technique does not perform differently than the traditional method)
Ha: μ ≠ 109 (the CAI technique performs differently than the traditional method)

First, define the hypothesized mean, the sample mean, n, and the known population standard deviation

mtrad<-109   # mean hours with the traditional method (null value)
mcai<-110    # sample mean with CAI
n<-190
SD<-6        # population standard deviation (sigma), known

Next, calculate the standard error. Because the population standard deviation is known, the standard normal (Z) distribution is used, so no degrees of freedom are needed

SE<-SD/sqrt(n)

Then calculate the critical Z value for a two-tailed test

alpha<-.05
Z<-qnorm(p=alpha/2, lower.tail=FALSE)

Finally, construct the 95% confidence interval

upper<- mcai + (Z*SE)
lower<- mcai - (Z*SE)
upper
## [1] 110.8531
lower
## [1] 109.1469

Since 109 falls outside of the interval, we reject the null hypothesis and conclude that the technique performs differently than the traditional method.
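The same decision can also be reached with the z test statistic and a two-tailed p-value. This is a minimal sketch that assumes, as the problem states, that σ = 6 is known; the variable names are illustrative.

```r
mu0   <- 109   # mean hours under the traditional method (null value)
xbar  <- 110   # sample mean with CAI
n     <- 190
sigma <- 6     # known population standard deviation
alpha <- 0.05

se <- sigma / sqrt(n)
z  <- (xbar - mu0) / se                            # z test statistic
pvalue <- 2 * pnorm(abs(z), lower.tail = FALSE)    # two-tailed p-value

cat("z =", round(z, 3), " p-value =", round(pvalue, 4), "\n")
if (pvalue < alpha) cat("Reject H0\n") else cat("Fail to reject H0\n")
```

Here z is about 2.30 and the two-tailed p-value is about 0.022, below 0.05, which agrees with the confidence-interval decision.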

Q2

Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 5.3 parts/million (ppm). A researcher believes that the current ozone level is at an insufficient level. The mean of 5 samples is 5.0 ppm with a standard deviation of 1.1. Does the data support the claim at the 0.05 level? Assume the population distribution is approximately normal.

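H0: μ ≥ 5.3 ppm (the ozone level is not insufficient; the data does not support the claim)
Ha: μ < 5.3 ppm (the ozone level is insufficient; the data supports the claim)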

First, define the normal ozone level, the sample mean, n, and the sample standard deviation

orig<-5.3
test<-5
n<-5
SD<-1.1 # sample standard deviation, usually written s (sigma denotes the population SD)

Next, calculate the degrees of freedom and the standard error

df<-n-1
SE<-SD/(sqrt(n))

Calculate the critical t value with the appropriate degrees of freedom

alpha<-.05
tdf<-qt(p=alpha/2, df, lower.tail=FALSE)

Construct confidence interval

upper<- test + (tdf*SE)
lower<- test - (tdf*SE)
upper
## [1] 6.36583
lower
## [1] 3.63417

Print the appropriate answer based on the CI

if(orig > upper){
  cat("With 95% confidence, the current mean ozone level is between", lower, "and", upper, "ppm.", 
    "\n", "Because", orig, "is greater than the upper bound, we reject the null hypothesis and conclude that the data supports the researcher's claim.")
  } else {cat ("With 95% confidence, the current mean ozone level is between", lower, "and", upper, "ppm.", 
    "\n", "Because", orig, "is not greater than the upper bound, we cannot reject the null hypothesis. The data does not support the claim.")
  }
## With 95% confidence, the current mean ozone level is between 3.63417 and 6.36583 ppm. 
##  Because 5.3 is not greater than the upper bound, we cannot reject the null hypothesis. The data does not support the claim.
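Because the researcher's claim is one-sided (the level is below 5.3 ppm), a left-tailed t test gives the most direct answer; a two-sided 95% interval corresponds to a two-tailed test at 0.05, which is more conservative here. A minimal sketch, assuming the ozone readings are approximately normal as the problem states:

```r
mu0   <- 5.3   # normal ozone level (ppm), null value
xbar  <- 5.0   # sample mean
s     <- 1.1   # sample standard deviation
n     <- 5
alpha <- 0.05

se     <- s / sqrt(n)
tstat  <- (xbar - mu0) / se
df     <- n - 1
tcrit  <- qt(alpha, df)        # left-tail critical value (negative)
pvalue <- pt(tstat, df)        # left-tail p-value for Ha: mu < 5.3

cat("t =", round(tstat, 3), " critical t =", round(tcrit, 3),
    " p-value =", round(pvalue, 3), "\n")
```

The statistic t ≈ −0.61 does not fall below the critical value ≈ −2.13, and the p-value (≈ 0.29) exceeds 0.05, so H0 is not rejected.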

Q3

Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 7.3 parts/million (ppm). A researcher believes that the current ozone level is not at a normal level. The mean of 51 samples is 7.1 ppm with a variance of 0.49. Assume the population is normally distributed. A level of significance of 0.01 will be used. Show all work and hypothesis testing steps.

H0: μ = 7.3 ppm (the current ozone level is normal)
Ha: μ ≠ 7.3 ppm (the current ozone level is not normal, as the researcher claims)

First, define the normal ozone level, the sample mean, n, and the sample standard deviation, then calculate the degrees of freedom and standard error

orig<-7.3
test<-7.1
n<-51
SD<-sqrt(0.49) # the problem gives the sample variance (0.49), so s = sqrt(0.49) = 0.7
df<-n-1
SE<-SD/(sqrt(n))

Calculate the critical t value with the appropriate degrees of freedom

alpha<-.01
tdf<-qt(p=alpha/2, df, lower.tail=FALSE)

Construct confidence interval

upper<- test + (tdf*SE)
lower<- test - (tdf*SE)
upper
## [1] 7.362476
lower
## [1] 6.837524

Print the appropriate answer based on the CI

if(orig > upper || orig < lower){
  cat("With 99% confidence, the current mean ozone level is between", lower, "and", upper, "ppm.", 
      "\n", "Because", orig, "is outside of that interval, we reject the null hypothesis and conclude that the data supports the researcher's claim.")
} else {cat ("With 99% confidence, the current mean ozone level is between", lower, "and", upper, "ppm.", 
             "\n", "Because", orig, "is within that interval, we cannot reject the null hypothesis. The data does not support the claim.")
}
## With 99% confidence, the current mean ozone level is between 6.837524 and 7.362476 ppm. 
##  Because 7.3 is within that interval, we cannot reject the null hypothesis. The data does not support the claim.
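A test statistic and p-value offer a second check. The problem gives a variance of 0.49, so the sample standard deviation is sqrt(0.49) = 0.7. This sketch assumes the population is normal, as stated, and uses illustrative variable names.

```r
mu0   <- 7.3
xbar  <- 7.1
s     <- sqrt(0.49)   # the problem gives the variance, so take its square root
n     <- 51
alpha <- 0.01

se     <- s / sqrt(n)
tstat  <- (xbar - mu0) / se
df     <- n - 1
pvalue <- 2 * pt(abs(tstat), df, lower.tail = FALSE)   # two-tailed p-value

cat("t =", round(tstat, 3), " p-value =", round(pvalue, 4), "\n")
```

With s = 0.7, t ≈ −2.04 and the two-tailed p-value is a little under 0.05, which is above α = 0.01, so H0 is not rejected.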

Q4

A publisher reports that 36% of their readers own a laptop. A marketing executive wants to test the claim that the percentage is actually less than the reported percentage. A random sample of 100 found that 29% of the readers owned a laptop. Is there sufficient evidence at the 0.02 level to support the executive’s claim? Show all work and hypothesis testing steps.

H0: p ≥ 0.36 (the percentage of readers who own a laptop is not less than 36%)
Ha: p < 0.36 (the percentage of readers who own a laptop is less than 36%)

First, define the reported proportion, the sample proportion, and n, and calculate the standard error

orig<-.36
test<-.29
n<-100
# For a proportion, no separate standard deviation is given: the standard error under H0 is sqrt(p0(1 - p0)/n), and the Z distribution is used
SE<-sqrt((orig*(1-orig))/n)

Calculate Z value with the appropriate α

Z<-qnorm(1-.02/2) 

Construct confidence interval

upper<-test+Z*SE
lower<-test-Z*SE
upper
## [1] 0.4016647
lower
## [1] 0.1783353

Print the appropriate answer based on the CI

if(orig > upper){
  cat("With 98% confidence, the percent of readers who own a laptop is between", lower*100, "and", upper*100, "percent.", 
      "\n", "Because", orig*100, "is greater than the upper bound, we reject the null hypothesis and conclude that the data supports the claim.")
} else {cat ("With 98% confidence, the percent of readers who own a laptop is between", lower*100, "and", upper*100, "percent.", 
             "\n", "Because", orig*100, "is equal to or less than the upper bound, we cannot reject the null hypothesis. The data does not support the claim.")
}
## With 98% confidence, the percent of readers who own a laptop is between 17.83353 and 40.16647 percent. 
##  Because 36 is equal to or less than the upper bound, we cannot reject the null hypothesis. The data does not support the claim.
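Since Ha is one-sided at α = 0.02, a left-tailed one-proportion z test compares the statistic with qnorm(0.02) directly. A minimal sketch, assuming the sample is random and n·p0 and n·(1 − p0) are large enough for the normal approximation (here 36 and 64):

```r
p0    <- 0.36   # publisher's reported proportion (null value)
phat  <- 0.29   # sample proportion
n     <- 100
alpha <- 0.02

se     <- sqrt(p0 * (1 - p0) / n)   # standard error under H0
z      <- (phat - p0) / se
zcrit  <- qnorm(alpha)              # left-tail critical value
pvalue <- pnorm(z)                  # left-tail p-value for Ha: p < 0.36

cat("z =", round(z, 3), " critical z =", round(zcrit, 3),
    " p-value =", round(pvalue, 4), "\n")
```

The statistic z ≈ −1.46 is not below the critical value ≈ −2.05, and the p-value (≈ 0.072) is above 0.02, so H0 is not rejected.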

Q5

A hospital director is told that 31% of the treated patients are uninsured. The director wants to test the claim that the percentage of uninsured patients is less than the expected percentage. A sample of 380 patients found that 95 were uninsured. Make the decision to reject or fail to reject the null hypothesis at the 0.05 level. Show all work and hypothesis testing steps.

H0: p ≥ 0.31 (the percentage of uninsured patients is not less than 31%)
Ha: p < 0.31 (the percentage of uninsured patients is less than 31%)

First, define the expected proportion, the sample proportion (95 of 380 patients), and n, and calculate the standard error

orig<-.31
test<-95/380 # sample proportion of uninsured patients = 0.25
n<-380
SE<-sqrt((orig*(1-orig))/n)

Calculate Z value with the appropriate α

Z<-qnorm(1-.05/2)

Construct confidence interval

upper<-test+Z*SE
lower<-test-Z*SE
upper
## [1] 0.2965009
lower
## [1] 0.2034991

Print the appropriate answer based on the upper CI

if(orig > upper){
  cat("With 95% confidence, the percent of uninsured patients is between", lower*100, "and", upper*100, "percent.", 
      "\n", "Because", orig*100, "is greater than the upper bound, we reject the null hypothesis and conclude that the data supports the claim.")
} else {cat ("With 95% confidence, the percent of uninsured patients is between", lower*100, "and", upper*100, "percent.", 
             "\n", "Because", orig*100, "is equal to or less than the upper bound, we cannot reject the null hypothesis. The data does not support the claim.")
}
## With 95% confidence, the percent of uninsured patients is between 20.34991 and 29.65009 percent. 
##  Because 31 is greater than the upper bound, we reject the null hypothesis and conclude that the data supports the claim.
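A left-tailed one-proportion z test gives the same kind of answer from a p-value. The sample proportion is 95/380 = 0.25. This sketch assumes a random sample and relies on the normal approximation (n·p0 ≈ 118 and n·(1 − p0) ≈ 262 are both large).

```r
p0    <- 0.31   # expected proportion of uninsured patients (null value)
x     <- 95     # uninsured patients in the sample
n     <- 380
alpha <- 0.05

phat   <- x / n                      # sample proportion
se     <- sqrt(p0 * (1 - p0) / n)    # standard error under H0
z      <- (phat - p0) / se
pvalue <- pnorm(z)                   # left-tail p-value for Ha: p < 0.31

cat("phat =", phat, " z =", round(z, 3), " p-value =", round(pvalue, 4), "\n")
```

Here p̂ = 0.25, z ≈ −2.53, and the p-value (≈ 0.006) is below 0.05, so H0 is rejected in favor of Ha.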

Q6

Find the minimum sample size needed to be 99% confident that the sample’s variance is within 1% of the population’s variance.

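First, calculate the Z value for 99% confidence

Z<-qnorm(1-.01/2)

Next, define the margin of error as a fraction of the population variance. Because the requirement is relative (within 1% of σ²), the actual value of σ does not matter

MOE<-.01 # sample variance within 1% of the population variance

For a normal population and a large sample, the sample variance is approximately normally distributed with mean σ² and standard deviation σ²·sqrt(2/(n − 1)). Requiring Z·sqrt(2/(n − 1)) ≤ MOE gives n ≥ 1 + 2(Z/MOE)², rounded up to the next whole number

n<-1 + 2*(Z/MOE)^2
ceiling(n)
## [1] 132699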
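The sample size can also be found from the exact sampling distribution instead of a large-sample approximation. Assuming a normal population, (n − 1)s²/σ² follows a chi-square distribution with n − 1 degrees of freedom, so the probability that s² lands within 1% of σ² can be computed with pchisq(). The sketch below searches for the smallest n that makes this probability at least 0.99; prob_within() and min_n_variance() are hypothetical helper names introduced for illustration.

```r
# Probability that s^2 / sigma^2 lands in [1 - rel, 1 + rel] for a sample of size n,
# assuming a normal population, so that (n - 1) * s^2 / sigma^2 ~ chi-square(n - 1)
prob_within <- function(n, rel = 0.01) {
  df <- n - 1
  pchisq((1 + rel) * df, df) - pchisq((1 - rel) * df, df)
}

# Smallest n with prob_within(n) >= conf, found by bisection over the integers.
# Assumes prob_within() increases with n, which holds over this range.
min_n_variance <- function(conf = 0.99, rel = 0.01, lo = 2, hi = 1e7) {
  while (lo < hi) {
    mid <- floor((lo + hi) / 2)
    if (prob_within(mid, rel) >= conf) hi <- mid else lo <- mid + 1
  }
  lo
}

n_exact <- min_n_variance()
n_exact
```

For comparison, the normal-approximation bound 1 + 2(Z/0.01)² is about 132,699; the exact search should land in the same neighborhood, since the approximation becomes very accurate at this sample size.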

Q7

A standardized test is given to a sixth-grade class. Historically, the mean score has been 112 with a standard deviation of 24. The superintendent believes that the standard deviation of performance may have recently decreased. She randomly sampled 22 students and found a mean of 102 with a standard deviation of 15.4387. Is there evidence that the standard deviation has decreased at the α = 0.1 level? Show all work and hypothesis testing steps.

H0: σ ≥ 24 (the SD has not decreased below 24)
Ha: σ < 24 (the SD has decreased below 24)

Because the claim concerns a standard deviation rather than a mean, use a chi-square test for a single variance (this assumes the scores are approximately normal). First, define the historical standard deviation, the sample standard deviation, and n, and calculate the degrees of freedom (the means of 112 and 102 are not needed for this test)

SD1<-24      # historical population standard deviation (null value)
SD2<-15.4387 # sample standard deviation
n<-22
df<-n-1

Calculate the chi-square test statistic and the left-tail critical value at α = 0.1

alpha<-.1
chisq<-(df*SD2^2)/SD1^2
crit<-qchisq(p=alpha, df)

Print the appropriate answer based on the critical value

if(chisq < crit){
  cat("The test statistic", round(chisq,2), "is less than the critical value", round(crit,2), 
      "\n", "so we reject the null hypothesis and conclude that the SD has decreased below 24.")
} else {cat ("The test statistic", round(chisq,2), "is not less than the critical value", round(crit,2), 
             "\n", "so we cannot reject the null hypothesis that the SD has not decreased below 24.")
}
## The test statistic 8.69 is less than the critical value 13.24 
##  so we reject the null hypothesis and conclude that the SD has decreased below 24.
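A p-value and a one-sided confidence bound give two more views of a chi-square test on a single variance. This sketch assumes the test scores are approximately normal, which the chi-square procedure requires; upper_sigma is an illustrative name for the 90% upper confidence bound on σ.

```r
sigma0 <- 24        # historical standard deviation (null value)
s      <- 15.4387   # sample standard deviation
n      <- 22
alpha  <- 0.1
df     <- n - 1

chisq  <- df * s^2 / sigma0^2        # chi-square test statistic
pvalue <- pchisq(chisq, df)          # left-tail p-value for Ha: sigma < 24

# One-sided 90% upper confidence bound for sigma:
# P(df * s^2 / sigma^2 >= qchisq(alpha, df)) = 1 - alpha
upper_sigma <- sqrt(df * s^2 / qchisq(alpha, df))

cat("chi-square =", round(chisq, 2), " p-value =", round(pvalue, 4),
    " upper bound for sigma =", round(upper_sigma, 2), "\n")
```

The statistic is about 8.69, the p-value is well below 0.1, and the upper bound for σ (about 19.4) is below 24, all consistent with rejecting H0 at α = 0.1.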

Q8

A medical researcher wants to compare the pulse rates of smokers and non-smokers. He believes that the pulse rate for smokers and non-smokers is different and wants to test this claim at the 0.1 level of significance. The researcher checks 32 smokers and finds that they have a mean pulse rate of 87, and 31 non-smokers have a mean pulse rate of 84. The standard deviation of the pulse rates is found to be 9 for smokers and 10 for non-smokers. Let μ1 be the true mean pulse rate for smokers and μ2 be the true mean pulse rate for non-smokers. Show all work and hypothesis testing steps.

H0: μ1 = μ2 (pulse rates are the same between smokers and non-smokers)
Ha: μ1 ≠ μ2 (pulse rates are not the same between smokers and non-smokers)

First, define the means, sample sizes, and standard deviations

mean.smoke<-87
sd.smoke<-9
mean.non<-84
sd.non<-10
n.smoke<-32
n.non<-31

Calculate the difference between the means (point estimate), standard error, and degrees of freedom (conservatively, the smaller sample size minus one)

meandiff<-mean.smoke - mean.non
SE<-sqrt((sd.smoke^2/n.smoke)+(sd.non^2/n.non))
df<-min(n.smoke, n.non)-1

Calculate the test statistic and the critical t value with the appropriate degrees of freedom

Tstat<-meandiff/SE
tdf<-qt(p=.05, df, lower.tail=FALSE)

Calculate the two-tailed p-value from the test statistic and df

pvalue<-2*pt(abs(Tstat), df, lower.tail = FALSE)
round(pvalue,3)
## [1] 0.221

Print the appropriate answer based on the p-value

if(pvalue < .1){
  cat("The p-value is",round(pvalue,3),"which is less than 0.1. Therefore, at the 0.1 significance level,",
      "\n", "we reject the null hypothesis and conclude that the pulse rates are not the same between",
      "\n", "smokers and non-smokers.")
} else {cat ("The p-value is",round(pvalue,3),"which is greater than 0.1. Therefore, at the 0.1 significance level,",
             "\n", "we cannot reject the null hypothesis that the pulse rates are the same between", 
             "\n", "smokers and non-smokers.")
}
## The p-value is 0.221 which is greater than 0.1. Therefore, at the 0.1 significance level, 
##  we cannot reject the null hypothesis that the pulse rates are the same between 
##  smokers and non-smokers.
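Using the smaller sample size minus one for the degrees of freedom is a conservative shortcut. A common alternative when variances are not assumed equal is the Welch–Satterthwaite approximation. This is a sketch under that assumption, using the summary statistics stated in the problem (standard deviations 9 and 10).

```r
m1 <- 87; s1 <- 9;  n1 <- 32   # smokers
m2 <- 84; s2 <- 10; n2 <- 31   # non-smokers

v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)
tstat <- (m1 - m2) / se

# Welch-Satterthwaite degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
pvalue <- 2 * pt(abs(tstat), df, lower.tail = FALSE)   # two-tailed

cat("t =", round(tstat, 3), " df =", round(df, 2), " p-value =", round(pvalue, 3), "\n")
```

The Welch degrees of freedom come out near 60 rather than 30, but the p-value stays above 0.1, so the conclusion does not change.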

Q9

Given two independent random samples with the following results:

n1 = 11, x̄1 = 127, s1 = 33

n2 = 18, x̄2 = 157, s2 = 27

Use this data to find the 95% confidence interval for the true difference between the population means. Assume that the population variances are not equal and that the two populations are normally distributed.

First, assign values to the appropriate variables

n1<-11
x1<-127
s1<-33
n2<-18
x2<-157
s2<-27

Calculate the difference between the means (point estimate), standard error, and degrees of freedom (conservatively, the smaller sample size minus one)

meandiff<-x1-x2
SE<-sqrt((s1^2/n1)+(s2^2/n2))
df<-min(n1, n2)-1

Calculate the critical t value for a 95% confidence interval based on the degrees of freedom

tdf<-qt(p=.025, df, lower.tail=FALSE)

Construct confidence interval

upper<- meandiff + (tdf*SE)
lower<- meandiff - (tdf*SE)
cat("(",lower,",",upper,")")
## ( -56.31657 , -3.683426 )
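Because the population variances are assumed unequal, the Welch–Satterthwaite degrees of freedom are another standard choice for this interval. A minimal sketch, assuming both populations are normal as the problem states:

```r
n1 <- 11; x1 <- 127; s1 <- 33
n2 <- 18; x2 <- 157; s2 <- 27

v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (about 18.08 here)
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
tcrit <- qt(0.975, df)

ci <- (x1 - x2) + c(-1, 1) * tcrit * se   # 95% CI for mu1 - mu2
ci
```

With about 18 degrees of freedom, the interval is roughly (−54.8, −5.2): narrower than the conservative interval using df = 10, and still entirely below 0.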

Q10

Two men, A and B, who usually commute to work together decide to conduct an experiment to see whether one route is faster than the other. The men feel that their driving habits are approximately the same, so each morning for two weeks one driver is assigned to route I and the other to route II. The times, recorded to the nearest minute, are shown in the following table. Using this data, find the 98% confidence interval for the true mean difference between the average travel time for route I and the average travel time for route II. Let d = (route I travel time) − (route II travel time). Assume that the populations of travel times are normally distributed for both routes. Show all work and hypothesis testing steps.

First, enter the data

route1<-c(32,27,34,24,31,25,30,23,27,35)
route2<-c(28,28,33,25,26,29,33,27,25,33)

Because both routes are driven on the same mornings, the times are paired, so calculate the difference d for each morning, along with its mean and standard deviation

d<-route1-route2
n<-length(d)
dbar<-mean(d)
sd.d<-sd(d)

Calculate the standard error and the degrees of freedom (number of pairs minus one)

SE<-sd.d/sqrt(n)
df<-n-1

Calculate the critical t value for a 98% confidence interval

tdf<-qt(p=.01, df, lower.tail=FALSE)

Construct confidence interval

upper<- dbar + (tdf*SE)
lower<- dbar - (tdf*SE)
cat("(",round(lower,3),",",round(upper,3),")")
## ( -2.767 , 2.967 )

Since the interval contains 0, the data does not show that either route is faster at the 98% confidence level.
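Because each morning produces one route I time and one route II time, the observations form natural pairs, and base R's `t.test()` with `paired = TRUE` computes the interval for d = route I − route II directly. A minimal cross-check, assuming the differences are approximately normal:

```r
route1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)
route2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)

# Paired t procedure on d = route1 - route2 at 98% confidence
res <- t.test(route1, route2, paired = TRUE, conf.level = 0.98)
res$estimate   # mean of the differences
res$conf.int   # 98% confidence interval for the mean difference
```

The mean difference is 0.1 minute and the interval is about (−2.767, 2.967), which contains 0.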

Q11

The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons have registered to vote. Can we conclude that the percentage of employed workers (p1), who have registered to vote, exceeds the percentage of unemployed workers (p2), who have registered to vote? Use a significance level of α = 0.05 for the test. Show all work and hypothesis testing steps.

H0: p1 ≤ p2 (the percentage of employed workers registered to vote does not exceed the percentage of unemployed workers registered to vote)
Ha: p1 > p2 (the percentage of employed workers registered to vote exceeds the percentage of unemployed workers registered to vote)

First, assign values to the appropriate variables

n1<-391
x1<-195   # employed persons registered to vote
n2<-510
x2<-193   # unemployed persons registered to vote

Calculate the sample proportions, their difference (point estimate), and the pooled proportion used under H0

p1<-x1/n1
p2<-x2/n2
pdiff<-p1-p2
ppool<-(x1+x2)/(n1+n2)

Calculate the standard error under H0 and the Z test statistic

SE<-sqrt(ppool*(1-ppool)*(1/n1+1/n2))
Z<-pdiff/SE
round(Z,3)
## [1] 3.614

Calculate the right-tail p-value, since Ha states that p1 exceeds p2

pvalue<-pnorm(Z, lower.tail = FALSE)
signif(pvalue,2)
## [1] 0.00015

Because the p-value is smaller than α (0.05), we reject H0 and conclude that the percentage of employed workers who have registered to vote exceeds the percentage of unemployed workers who have registered to vote.
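Base R's `prop.test()` offers a cross-check on the two-proportion test. This sketch assumes independent random samples and turns off the continuity correction (`correct = FALSE`) so the result matches the pooled Z test; with the correction left on, the statistic and p-value would differ slightly.

```r
# Two-sample test of proportions, Ha: p1 > p2, without continuity correction
res <- prop.test(x = c(195, 193), n = c(391, 510),
                 alternative = "greater", correct = FALSE)

res$estimate                  # sample proportions for employed and unemployed
sqrt(unname(res$statistic))   # square root of X-squared equals the pooled Z
res$p.value                   # one-sided p-value
```

The reported X-squared statistic is the square of the pooled Z statistic (about 3.614² ≈ 13.06), and the one-sided p-value is about 0.00015, well below 0.05.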