Homework 5, Data Analysis

Question 1 (In class Lecture notes)

Using traditional methods, it takes 109 hours to receive a basic driving license. A new license training method using Computer Aided Instruction (CAI) has been proposed. A researcher used the technique with 190 students and observed that they had a mean of 110 hours. Assume the standard deviation is known to be 6. A level of significance of 0.05 will be used to determine if the technique performs differently than the traditional method. Make a decision to reject or fail to reject the null hypothesis. Show all work in R.

Given: \(\mu=109, n=190, \bar{x}=110, \sigma=6, \alpha=.05\).

To Do: Determine if the technique performs differently than the traditional method. The burden of proof falls on the alternative hypothesis, so test \(H_{o}: \mu = 109\) against \(H_{a}: \mu \neq 109\) (two-sided).

mu=109
n=190
xbar=110
sigma=6
alpha=0.05
Zc=qnorm(1-alpha/2)
Se=sigma/sqrt(n)
lower=mu - Zc*Se
upper=mu + Zc*Se
c(lower,mu,upper)

## [1] 108.1469 109.0000 109.8531

if(xbar<lower||xbar>upper){print('REJECT Ho')}else{print('FAIL 2 REJECT')}

## [1] "REJECT Ho"

Since \(\bar{x} = 110\) is outside the limits, specifically above the upper limit of 109.8531, we reject the null hypothesis at the 0.05 level.
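The same decision can be cross-checked with the Z statistic and a two-sided p-value. The sketch below is a minimal illustration under the assumptions stated in the question (known \(\sigma\), two-sided test); the helper name `z_test_mean` is hypothetical and not part of the original work.

```r
# Hypothetical helper: two-sided Z-test for a mean with known sigma
z_test_mean <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  se <- sigma / sqrt(n)        # standard error of the sample mean
  z  <- (xbar - mu0) / se      # standardized distance of xbar from mu0
  p  <- 2 * pnorm(-abs(z))     # two-sided p-value
  list(z = z, p.value = p, reject = p < alpha)
}

res <- z_test_mean(xbar = 110, mu0 = 109, sigma = 6, n = 190)
res$reject   # TRUE when the p-value is below alpha
```

Rejecting when the p-value is below \(\alpha\) is equivalent to rejecting when \(\bar{x}\) falls outside the limits computed above.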

Question 2 (Lecture notes)

Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 5.3 parts/million (ppm). A researcher believes that the current ozone level is at an insufficient level. The mean of 5 samples is 5.0 parts per million (ppm) with a standard deviation of 1.1. Does the data support the claim at the 0.05 level? Assume the population distribution is approximately normal.

Given: \(\mu=5.3, n=5, \bar{x}=5, \sigma=1.1, \alpha=.05\).

mu=5.3
n=5
xbar=5
sigma=1.1
alpha=0.05
Zc=qnorm(1-alpha/2)
Se=sigma/sqrt(n)
lower=mu - Zc*Se
upper=mu + Zc*Se
c(lower,mu,upper)

## [1] 4.335825 5.300000 6.264175

if(xbar<lower||xbar>upper){print('REJECT Ho')}else{print('FAIL 2 REJECT')}

## [1] "FAIL 2 REJECT"

shadenorm(mu = mu, sig = Se, pcts = c(alpha/2,1-alpha/2),xval=xbar)

round(pnorm(xbar,mu,sd=Se), 3) # lower-tail p-value, using the standard error of the mean

## [1] 0.271

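To Do: The researcher believes that the current ozone level is insufficient, i.e. below normal. Does the data support the claim at the 0.05 level?

No. The sample mean is within the 0.95 interval, so we can’t reject the null hypothesis that ozone is at a normal level. The p-value, which is above 0.05, and the vertical line on the plot of the normal distribution confirm this. Because the claim is one-sided and the standard deviation comes from only 5 samples, a lower-tail t-test with 4 degrees of freedom is the stricter choice; it leads to the same decision.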
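With a sample standard deviation and \(n = 5\), the t distribution is the usual reference distribution. The following is a minimal sketch of the lower-tail t-test computed from the summary statistics in the question; the variable names are illustrative.

```r
# One-sided (lower-tail) t-test from summary statistics; names are illustrative
mu0   <- 5.3    # ozone level under Ho
xbar  <- 5.0    # sample mean
s     <- 1.1    # sample standard deviation
n     <- 5
alpha <- 0.05

t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
t_crit <- qt(alpha, df = n - 1)          # lower-tail critical value
p_val  <- pt(t_stat, df = n - 1)         # P(T <= t_stat) under Ho

reject <- t_stat < t_crit                # reject only if t_stat is below the critical value
reject
```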

Question 3 (Lecture notes)

Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 7.3 parts/million (ppm). A researcher believes that the current ozone level is not at a normal level. The mean of 51 samples is 7.1 ppm with a variance of 0.49. Assume the population is normally distributed. A level of significance of 0.01 will be used. Show all work and hypothesis testing steps.

Given: \(\mu=7.3, n=51, \bar{x}=7.1, \sigma^2=0.49, \alpha=.01\).

mu=7.3
n=51
xbar=7.1
v=0.49
sigma=sqrt(v)
alpha=0.01

State Null and Alternative Hypothesis - \(H_{o}\) : ozone level is at a normal level, \(H_{a}\) : ozone level is not at a normal level
Determine Statistical Test and Distribution: Use a two-sided Z-test for a mean (normal distribution)
Specify the Type 1 Error Rate: \(\alpha\) = 0.01
State Decision Rule: if p-value < \(\alpha\), then reject the null
Gather sample data and calculate value of test: 51 samples with 7.1 mean and 0.49 variance, see calculation below in R. Calculate Z-score above and below and create lower and upper limits to then compare with sample mean
State the statistical conclusion.

Zc=qnorm(1-alpha/2)
Se=sigma/sqrt(n)
lower=mu - Zc*Se
upper=mu + Zc*Se
c(lower,mu,upper)

## [1] 7.047518 7.300000 7.552482

if(xbar<lower||xbar>upper){print('REJECT Ho')}else{print('FAIL 2 REJECT')}

## [1] "FAIL 2 REJECT"

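shadenorm(mu = mu, sig = Se, pcts = c(alpha/2,1-alpha/2),xval=xbar)

round(2*pnorm(xbar,mu,sd=Se), 3) # two-sided p-value, using the standard error of the mean

## [1] 0.041

To Do: The researcher believes that the current ozone level is not at a normal level. Thus, set a double sided hypothesis.

As shown above, 7.1 is within the limits (7.0475, 7.5525), so we fail to reject \(H_{o}\) at the 0.01 level. The two-sided p-value of about 0.041 is greater than \(\alpha\) = 0.01, which agrees with this decision; the vertical line in the distribution plot marks the sample mean.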

Question 4 (See Open Stats Textbook - Chapter 5 Section 5.2: Confidence intervals for a proportion)

A publisher reports that 36% of their readers own a laptop. A marketing executive wants to test the claim that the percentage is actually less than the reported percentage. A random sample of 100 found that 29% of the readers owned a laptop. Is there sufficient evidence at the 0.02 level to support the executive’s claim? Show all work and hypothesis testing steps.

Given: \(\pi=.36, n=100, \hat{p}=.29,\alpha=.02\).

pi=.36
n=100
phat=.29
alpha=.02

State Null and Alternative Hypothesis - \(H_{o}\) : 36% of readers own a laptop, \(H_{a}\) : less than 36% of readers own a laptop
Determine Statistical Test and Distribution: Use Z-test of proportions, lower tail test
Specify the Type 1 Error Rate: \(\alpha\) = 0.02
State Decision Rule: if p-value < \(\alpha\) (one-sided test, so \(\alpha\) is not halved), then reject the null
Get sample data and calculate the value of the test statistic (Z)

# Z = (phat - pi) / sqrt(pi * (1 - pi) / n)
Zc <- (phat-pi)/sqrt((pi*(1-pi))/n)
Zc

## [1] -1.458333

# Variance of phat under Ho (the standard error is its square root)
var_phat <- (pi * ( 1-pi)) / n
var_phat

## [1] 0.002304

To Do: Executive wants to test the claim that the percentage is actually less than the reported percentage. Thus, set a single sided hypothesis.

pval <- pnorm(Zc)
pval

## [1] 0.07237434

if(pval<alpha){print('REJECT Ho')}else{print('FAIL 2 REJECT')}

## [1] "FAIL 2 REJECT"

The p-value of 0.0724 is not < 0.02, so we can’t reject \(H_{o}\): there is not sufficient evidence that fewer than 36% of readers own a laptop.
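As a cross-check on the hand calculation, base R's `prop.test()` can run the same one-proportion test. The sketch below assumes the 29% in the question corresponds to 29 of the 100 sampled readers; variable names are illustrative.

```r
# Cross-check of the hand-computed one-proportion Z-test (names are illustrative)
p0    <- 0.36   # proportion under Ho
n     <- 100
x     <- 29     # sampled readers who own a laptop (29% of 100)
alpha <- 0.02

z        <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)   # Z statistic
p_manual <- pnorm(z)                                 # lower-tail p-value

# Without continuity correction, prop.test() reports X-squared = z^2
pt_res <- prop.test(x, n, p = p0, alternative = "less", correct = FALSE)

c(manual = p_manual, prop.test = pt_res$p.value)
```

Setting `correct = FALSE` turns off the continuity correction, which is what makes the two p-values agree.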

Question 5 (See Open Stats Textbook - Chapter 5)

A hospital director is told that 31% of the treated patients are uninsured. The director wants to test the claim that the percentage of uninsured patients is less than the expected percentage. A sample of 380 patients found that 95 were uninsured. Make the decision to reject or fail to reject the null hypothesis at the 0.05 level. Show all work and hypothesis testing steps.

Given: \(\pi=.31, n=380, \hat{p}=\dfrac{95}{380}=.25,\alpha=.05\)

pi=0.31
n=380
phat=95/380
phat

## [1] 0.25

alpha=0.05
Zc <- (phat-pi)/sqrt((pi*(1-pi))/n)
Zc

## [1] -2.528935

# Variance of phat under Ho (the standard error is its square root)
var_phat <- (pi * ( 1-pi)) / n
var_phat

## [1] 0.0005628947

pval <- pnorm(Zc)
pval

## [1] 0.005720462

if(pval<alpha){print('REJECT Ho')}else{print('FAIL 2 REJECT')}

## [1] "REJECT Ho"

With a p-value of 0.00572, which is lower than \(\alpha\) = 0.05, we reject \(H_{o}\): the data support the claim that fewer than 31% of patients are uninsured.
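The Z-test for a proportion relies on the normal approximation to the binomial. A quick, hedged check of the usual rule of thumb (at least 10 expected successes and failures under \(H_{o}\); the threshold of 10 is a common convention, not something stated in the question) might look like this:

```r
# Rule-of-thumb check for the normal approximation (threshold of 10 is a convention)
p0 <- 0.31   # proportion under Ho
n  <- 380    # sample size

expected_success <- n * p0         # expected uninsured patients under Ho
expected_failure <- n * (1 - p0)   # expected insured patients under Ho

normal_ok <- expected_success >= 10 && expected_failure >= 10
c(expected_success, expected_failure)
normal_ok
```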

Question 6 (See Open Stats Section 7.3, Example 7.25 in particular)

A medical researcher wants to compare the pulse rates of smokers and non-smokers. He believes that the pulse rate for smokers and non-smokers is different and wants to test this claim at the 0.1 level of significance. The researcher checks 32 smokers and finds that they have a mean pulse rate of 87, and 31 non-smokers have a mean pulse rate of 84. The standard deviation of the pulse rates is found to be 9 for smokers and 10 for non-smokers. Let \(\mu_1\) be the true mean pulse rate for smokers and \(\mu_2\) be the true mean pulse rate for non-smokers. Show all work and hypothesis testing steps.

Let smoker group be indexed by 1, non-smoker group by 2.
Given: \(n_1 = 32, \bar{x}_1 = 87, n_2 = 31, \bar{x}_2 = 84, s_1 = 9, s_2 = 10 , \alpha = .1\).

n_1=32 # 32 smokers
xbar_1=87 # mean pulse smokers
n_2=31 # 31 non-smokers
xbar_2=84 # mean pulse non-smokers
s_1=9 # stdev of smokers
s_2=10 # stdev of non-smokers
alpha=0.1 # alpha
# Only summary statistics are given, so compute the Welch t statistic directly
# (t.test() needs raw observations; simulating them with rnorm() gives a different answer on every run)
Se <- sqrt(s_1^2/n_1 + s_2^2/n_2)
tStat <- (xbar_1 - xbar_2)/Se
# Welch-Satterthwaite degrees of freedom
dof <- Se^4 / ((s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1))
pval <- 2*pt(-abs(tStat), dof) # two sided p-value
round(c(t = tStat, df = dof, p.value = pval), 2)

##       t      df p.value 
##    1.25   59.88    0.22 

if(pval<alpha){print('REJECT Ho')}else{print('FAIL 2 REJECT')}

## [1] "FAIL 2 REJECT"

To Do: Test if the pulse rate for smokers and non-smokers is different at the 0.1 level of significance. Thus, double sided test: \(H_{o}: \mu_1 = \mu_2\), \(H_{a}: \mu_1 \neq \mu_2\).

With a p-value of about 0.22, which is greater than 0.1, we fail to reject \(H_{o}\): the data do not show that the pulse rates of smokers and non-smokers differ at the 0.1 level of significance

Question 7 (See Open Stats Section 7.3, Example 7.22 in particular)

Given two independent random samples with the following results: \(n_1=11, \bar{x}_1=127, \sigma_1=33, n_2=18, \bar{x}_2=157, \sigma_2=27\)

Use this data to find the 95% confidence interval for the true difference between the population means. Assume that the population variances are not equal and that the two populations are normally distributed.

n_1=11
xbar_1=127
sigma_1=33
n_2=18
xbar_2=157
sigma_2=27
alpha=.05
# Only summary statistics are given, so build the Welch interval directly
# (t.test() needs raw observations; simulating them with rnorm() gives a different interval on every run)
Se <- sqrt(sigma_1^2/n_1 + sigma_2^2/n_2)
# Welch-Satterthwaite degrees of freedom
dof <- Se^4 / ((sigma_1^2/n_1)^2/(n_1-1) + (sigma_2^2/n_2)^2/(n_2-1))
tc <- qt(1-alpha/2, dof)
lower <- (xbar_1 - xbar_2) - tc*Se
upper <- (xbar_1 - xbar_2) + tc*Se
round(c(lower, upper), 1)

## [1] -54.8  -5.2

To Do: Create a 95% confidence interval for true difference between the population means.

The 95% confidence interval for the true difference between the population means (\(\mu_1 - \mu_2\)) is approximately (-54.8, -5.2). The interval does not contain 0, so the population means differ at the 0.05 level.
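`t.test()` expects raw observations, whereas this problem supplies only summary statistics. One hedged workaround, sketched below, is to rescale simulated draws so that each sample reproduces the stated mean and standard deviation exactly; `t.test()` then returns the same Welch interval regardless of which random draws are used. The helper name `exact_sample` is hypothetical.

```r
# Hypothetical helper: a sample of size n whose mean and sd are exactly xbar and s
exact_sample <- function(n, xbar, s) {
  z <- as.numeric(scale(rnorm(n)))  # standardize: sample mean 0, sample sd 1
  z * s + xbar                      # shift and rescale to the target statistics
}

s1 <- exact_sample(11, xbar = 127, s = 33)
s2 <- exact_sample(18, xbar = 157, s = 27)

# Welch two-sample interval (t.test default: var.equal = FALSE)
ci <- t.test(s1, s2, conf.level = 0.95)$conf.int
round(as.numeric(ci), 1)
```

Plain `rnorm()` draws only match the target mean and standard deviation on average, which is why unscaled simulated samples give a different interval on each run.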

Question 8 (See Open Stats Section 6.2 Difference of two proportions, Example 6.2.2 in particular)

Two men, A and B, who usually commute to work together decide to conduct an experiment to see whether one route is faster than the other. The men feel that their driving habits are approximately the same, so each morning for two weeks one driver is assigned to route I and the other to route II. The times, recorded to the nearest minute, are shown in the following table. Using this data, find the 98% confidence interval for the true mean difference between the average travel time for route I and the average travel time for route II.

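r1 = c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35) # route I times (minutes)
r2 = c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33) # route II times (minutes)
mean_1=mean(r1)
mean_2=mean(r2)
sigma_1=sd(r1)
sigma_2=sd(r2)
# The same mornings are measured on both routes, so use a paired test
tRes <- t.test(r1, r2, conf.level=0.98,paired=TRUE)
tRes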

## 
##  Paired t-test
## 
## data:  r1 and r2
## t = 0.098427, df = 9, p-value = 0.9238
## alternative hypothesis: true mean difference is not equal to 0
## 98 percent confidence interval:
##  -2.766534  2.966534
## sample estimates:
## mean difference 
##             0.1

Let \(d1 =\) (route I travel time) − (route II travel time).

Assume that the populations of travel times are normally distributed for both routes. Show all work and hypothesis testing steps.

To Do: Find the 98% confidence interval for the true mean difference between the average travel time for route I and the average travel time for route II.

The 98% confidence interval for the true mean difference between the average travel times is (-2.766534, 2.966534). Since the interval contains 0, the data do not show that one route is faster than the other.
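The paired interval can also be built by hand from the per-day differences, which makes the role of \(d\) explicit. This is a minimal sketch using the route times from the question; variable names are illustrative.

```r
# Paired-difference interval computed by hand, as a check on t.test(paired = TRUE)
r1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)  # route I times (minutes)
r2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)  # route II times (minutes)

d    <- r1 - r2                        # per-day differences
n    <- length(d)
se_d <- sd(d) / sqrt(n)                # standard error of the mean difference
t_c  <- qt(1 - 0.02 / 2, df = n - 1)   # critical value for 98% confidence

ci_manual <- mean(d) + c(-1, 1) * t_c * se_d
ci_manual
```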

Question 9 (See Open Stats Textbook - Chapter 5 Section 5.2-5.33: Confidence intervals/Hypothesis testing for a proportion)

The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons have registered to vote. Can we conclude that the percentage of employed workers (p1) who have registered to vote, exceeds the percentage of unemployed workers (p2) who have registered to vote? Use a significance level of 0.05 for the test. Show all work and hypothesis testing steps.

Q: Can we conclude that the percentage of employed workers (p1) who have registered to vote, exceeds the percentage of unemployed workers (p2) who have registered to vote?

n1=391 # employed persons sampled
n2=510 # unemployed persons sampled
rv1=195 # employed persons registered to vote
phat1=rv1/n1
rv2=193 # unemployed persons registered to vote
phat2=rv2/n2
alpha=0.05
# Proportion Test of 2 samples
prop.test(x = c(rv1, rv2), n = c(n1, n2), alternative = "greater")

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(rv1, rv2) out of c(n1, n2)
## X-squared = 12.575, df = 1, p-value = 0.0001955
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.0634622 1.0000000
## sample estimates:
##    prop 1    prop 2 
## 0.4987212 0.3784314

The p-value is 0.0001955, which is below the significance level of 0.05, so we reject \(H_{o}: p_1 = p_2\) in favor of \(H_{a}: p_1 > p_2\) and conclude that the percentage of employed workers who have registered to vote exceeds the percentage of unemployed workers who have registered to vote.
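For comparison, the pooled two-proportion Z-test can be worked by hand. The sketch below uses the counts from the question with illustrative variable names; it omits the continuity correction, so its p-value differs slightly from the `prop.test()` output above, which applies the correction by default.

```r
# Pooled two-proportion Z-test by hand (names are illustrative)
x1 <- 195; n1 <- 391   # employed: registered, sampled
x2 <- 193; n2 <- 510   # unemployed: registered, sampled

p_pool <- (x1 + x2) / (n1 + n2)                        # pooled proportion under Ho
se     <- sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))  # standard error of the difference
z      <- (x1/n1 - x2/n2) / se                         # Z statistic
p_val  <- pnorm(z, lower.tail = FALSE)                 # upper-tail p-value

# Without continuity correction, prop.test() reports X-squared = z^2
chk <- prop.test(c(x1, x2), c(n1, n2), alternative = "greater", correct = FALSE)
c(z = z, p.value = p_val)
```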

Week-5

Doru Cojoc

2024-02-18