Week4 inference

traditional<-109
ca<-110
n<-190
sigma<-sqrt(n)  # square root of the sample size (despite the name, this is not a standard deviation)
sigma

## [1] 13.78405

sd<-6
# A significance level of 0.05 is used to test whether the technique performs differently from the traditional method.
# Calculate the degrees of freedom and the standard error.
# df: subtract 1 from the sample size. With n = 190, df = 190 - 1 = 189.
df<-n-1
se<-sd/sigma
# Calculate the critical t value with the appropriate degrees of freedom.
# Choose an alpha level. It is usually given in the question; the most common one is 5% (0.05).
alpha<-.05
# qt() returns a quantile of the t distribution (the inverse of the cdf, pt()).
# qt(p, df, lower.tail = FALSE) gives the value with upper-tail area p for df degrees of freedom.
tdf<-qt(p=alpha/2, df, lower.tail=FALSE)
# Finally, construct the confidence interval.
upper<- ca + (tdf*se)
upper

## [1] 110.8586

lower<- ca - (tdf*se)
lower

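## [1] 109.1414

# The traditional value 109 lies below the interval (109.14, 110.86), so the null hypothesis of no difference is rejected at the 0.05 level.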
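The same interval can be wrapped in a small reusable function. This is only a sketch: `t_ci` and its argument names are hypothetical helpers, not part of the original notes, and it assumes the summary statistics used above (mean 110, sd 6, n 190).

```r
# Hypothetical helper: two-sided t confidence interval for a mean,
# computed from summary statistics (sample mean, sample sd, sample size).
t_ci <- function(xbar, s, n, conf = 0.95) {
  se <- s / sqrt(n)                                   # standard error of the mean
  tcrit <- qt((1 - conf) / 2, df = n - 1, lower.tail = FALSE)
  c(lower = xbar - tcrit * se, upper = xbar + tcrit * se)
}

ci1 <- t_ci(110, 6, 190)   # problem 1 values
ci1
```

Calling `t_ci(5, 1.1, 5)` reproduces the interval of problem 2 in the same way.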

#2--------------------
ozo<-5.3
n<-5
mos<-5
sd<-1.1
alpha<-0.05

df<-n-1  #5-1 =4
SE<-sd/sqrt(n) #1.1/2.236 = 0.4919
SE

## [1] 0.491935

#t value
tvdf<-qt(p=alpha/2,df,lower.tail = FALSE)
tvdf

## [1] 2.776445

#confidence interval
upper<-mos+(tvdf*SE)
upper

## [1] 6.36583

lower<- mos-(tvdf*SE)
lower

## [1] 3.63417

# 5.3 lies inside the interval (3.63, 6.37), so the null hypothesis cannot be rejected: the data do not support the claim at the 0.05 level.
# -------------------------

# 3 Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 7.3 parts/million (ppm). A researcher believes that the current ozone level is not at a normal level. The mean of 51 samples is 7.1 ppm with a variance of 0.49. Assume the population is normally distributed. A level of significance of 0.01 will be used. Show all work and hypothesis testing steps.

OZO<-7.3
n<-51
mos<-7.1
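variSD<-sqrt(0.49)  # the question gives the variance (0.49), so sd = sqrt(0.49) = 0.7
los<-0.01
df<-n-1
SE<-variSD/sqrt(n)
alpha<-0.01
tvdf<-qt(p=0.01/2,df,lower.tail = FALSE)
tvdf

## [1] 2.677793

upper<-mos+(tvdf*SE)
upper

## [1] 7.362476

lower<-mos-(tvdf*SE)
lower

## [1] 6.837524

# 7.3 lies inside the interval (6.84, 7.36), so we fail to reject the null hypothesis at the 0.01 level: there is not sufficient evidence that the ozone level differs from 7.3 ppm.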
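The same decision can be reached through the test statistic and p-value instead of the interval. A minimal sketch, assuming the standard deviation is taken as sqrt(0.49) = 0.7 from the stated variance; the variable names are illustrative only.

```r
# One-sample, two-sided t test from summary statistics (problem 3 values).
mu0    <- 7.3          # hypothesized (normal) ozone level
xbar   <- 7.1          # sample mean
s      <- sqrt(0.49)   # sample sd from the stated variance (assumption: variance = 0.49)
n_obs  <- 51
alpha3 <- 0.01

t_stat <- (xbar - mu0) / (s / sqrt(n_obs))                      # test statistic
p_val  <- 2 * pt(abs(t_stat), df = n_obs - 1, lower.tail = FALSE)  # two-sided p-value
reject <- p_val < alpha3                                        # TRUE means reject H0
c(t_stat = t_stat, p_val = p_val)
reject
```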

#-----------------------------------------------------
  
#4  A publisher reports that 36% of their readers own a laptop. A marketing executive wants to test the claim that the percentage is actually less than the reported percentage. A random sample of 100 found that 29% of the readers owned a laptop. Is there sufficient evidence at the 0.02 level to support the executive's claim? Show all work and hypothesis testing steps.

n<-100
OWN<-.36
OWNED<-.29
alpha<-0.02
#Standard Error
SE<-sqrt((OWN*(1-OWN))/n)  # sqrt(0.36*0.64/100) = 0.048
SE

## [1] 0.048

Z<-qnorm(1-.02/2)  # two-sided critical value; note that the executive's claim itself is one-sided (less than)
Z

## [1] 2.326348

upper<-OWNED+Z*SE
upper

## [1] 0.4016647

lower<-OWNED-Z*SE
lower

## [1] 0.1783353

# No, we cannot reject the null hypothesis: the reported 0.36 lies inside the interval (0.178, 0.402), so there is not sufficient evidence at the 0.02 level to support the executive's claim.
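Because the claim is "less than", a left-tailed z test is a more direct check than a two-sided interval. The sketch below uses the problem 4 values; the variable names are illustrative and the standard error is computed under the null proportion.

```r
# Left-tailed one-proportion z test (problem 4 values).
p0     <- 0.36   # reported proportion (null value)
phat   <- 0.29   # sample proportion
n4     <- 100
alpha4 <- 0.02

se0    <- sqrt(p0 * (1 - p0) / n4)   # standard error under H0
z_stat <- (phat - p0) / se0          # test statistic, about -1.46
z_crit <- qnorm(alpha4)              # left-tail critical value, about -2.05
p_val4 <- pnorm(z_stat)              # left-tail p-value
reject4 <- z_stat < z_crit           # TRUE means reject H0
c(z_stat = z_stat, z_crit = z_crit, p_val = p_val4)
reject4
```

The statistic does not fall below the critical value, which agrees with the "cannot reject" decision.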

#-------------------------------------------------------------

  
#5 A hospital director is told that 31% of the treated patients are uninsured. The director wants to test the claim that the percentage of uninsured patients is less than the expected percentage. A sample of 380 patients found that 95 were uninsured. Make the decision to reject or fail to reject the null hypothesis at the 0.05 level. Show all work and hypothesis testing steps.
noins<-.31
n<-380
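unins<-95/n  # 95 of the 380 sampled patients were uninsured, so the sample proportion is 0.25
df<-n-1
los<-0.05

SE<-sqrt((noins*(1-noins))/n)
SE

## [1] 0.0237254

Z<-qnorm(1-0.05/2)
Z

## [1] 1.959964

upper<-unins+(Z*SE)
upper

## [1] 0.2965009

lower<-unins-(Z*SE) 
lower

## [1] 0.2034991

# 0.31 lies above the upper bound of the interval (0.2035, 0.2965), so we reject the null hypothesis: the data support the claim that fewer than 31% of patients are uninsured.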
#---------------------------


#6 Find the minimum sample size needed to be 99% confident that the sample's variance is within 1% of the population's variance.

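sd<-1
Z<-qnorm(1-.01/2)  # critical z value for 99% confidence (1-.01/2 = 0.995 on its own is a probability, not a z value)
Z

## [1] 2.575829

Er<-0.01  # "within 1%" is a margin of 0.01
# Note: this is the sample-size formula for estimating a mean (with sd standardized to 1); it is only a rough stand-in for the variance question.
n<-((Z*sd)/Er)^2
n

## [1] 66348.97

ceiling(n)  # a minimum sample size is always rounded up

## [1] 66349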
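Since the question is about a variance rather than a mean, a rough large-sample alternative can be sketched. It assumes normally distributed data, for which the sample variance has variance 2σ⁴/(n − 1), so its relative standard error is sqrt(2/(n − 1)); it also assumes the normal approximation for the sample variance is adequate at this sample size. Exact chi-square based tables may give a somewhat different answer. The names below are illustrative.

```r
# Approximate minimum n so that the sample variance is within a relative
# error rel_err of the population variance with the given confidence.
# Assumption: normal data and a large-sample normal approximation for s^2.
conf6   <- 0.99
rel_err <- 0.01

z6    <- qnorm(1 - (1 - conf6) / 2)          # two-sided critical z value
n_var <- ceiling(1 + 2 * (z6 / rel_err)^2)   # solve z * sqrt(2/(n-1)) <= rel_err for n
n_var
```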

#----------------------------
#7 A standardized test is given to a sixth-grade class. Historically the mean score has been 112 with a standard deviation of 24. The superintendent believes that the standard deviation of performance may have recently decreased. She randomly sampled 22 students and found a mean of 102 with a standard deviation of 15.4387. Is there evidence that the standard deviation has decreased at the α = 0.1 level? Show all work and hypothesis testing steps.
Act<-112
Lat<-102
sd1<-24
sd2<-15.4387
n<-22
alpha<-0.1
# H0: sigma = 24 versus Ha: sigma < 24.
# A claim about a standard deviation is tested with the chi-square statistic (n-1)*s^2/sigma0^2; the means (112 and 102) are not needed.
df<-n-1
chisq<-(df*sd2^2)/sd1^2
chisq

## [1] 8.68997

# Left-tail critical value, about 13.24 for df = 21
crit<-qchisq(p=alpha,df)

# The statistic (8.69) is below the critical value (about 13.24), so we reject the null hypothesis: there is evidence at the 0.1 level that the standard deviation has decreased.

#--------------------------------------------

#8 A medical researcher wants to compare the pulse rates of smokers and non-smokers. He believes that the pulse rate for smokers and non-smokers is different and wants to test this claim at the 0.1 level of significance. The researcher checks 32 smokers and finds that they have a mean pulse rate of 87, and 31 non-smokers have a mean pulse rate of 84. The standard deviation of the pulse rates is found to be 9 for smokers and 10 for non-smokers. Let μ1 be the true mean pulse rate for smokers and μ2 be the true mean pulse rate for non-smokers. Show all work and hypothesis testing steps.

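msmokers<-87
sdsmokers<-9  # the question gives a standard deviation of 9 for smokers
mnonsmokers<-84
sdnonsmokers<-10
nsmokers<-32
nnonsmokers<-31
meandiff<-(msmokers-mnonsmokers)  # difference of the sample means, not of the sample sizes
meandiff

## [1] 3

SE<-sqrt((sdsmokers^2/nsmokers)+(sdnonsmokers^2/nnonsmokers))
SE

## [1] 2.399387

df<-nnonsmokers-1  # conservative choice: the smaller sample size minus 1
df

## [1] 30

Tstat<-meandiff/SE
tvdf<-qt(p=.05, df, lower.tail=FALSE)
tvdf

## [1] 1.697261

# two-sided p-value from the t statistic and df
pvalue<-2*pt(Tstat, df, lower.tail = FALSE)
round(pvalue,2)

## [1] 0.22

# The p-value (about 0.22) is greater than 0.1, so we cannot reject the null hypothesis that the two mean pulse rates are equal.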

#---------------------------------------------------------------
   
#9 Given two independent random samples with the following results:
#n1 = 11
#x̄1 = 127
#s1 = 33
#n2 = 18
#x̄2 = 157
#s2 = 27
#Use this data to find the 95% confidence interval for the true difference between the population means. Assume that the population variances are not equal and that the two populations are normally distributed.

n1<-11
x1<-127
s1<-33
n2<-18
x2<-157
s2<-27

meandiff<-x1-x2
SE<-sqrt((s1^2/n1)+(s2^2/n2))
df<-n1-1  # conservative df: the smaller sample size minus 1

Tstat<-meandiff/SE
tdf<-qt(p=.025, df, lower.tail=FALSE)

upper<- meandiff + (tdf*SE)
lower<- meandiff - (tdf*SE)
cat("(",lower,",",upper,")")

## ( -56.31657 , -3.683426 )
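With unequal variances, the Welch-Satterthwaite formula gives a less conservative degrees of freedom than the smaller sample size minus 1. The sketch below recomputes the interval with that df from the problem 9 summary statistics; the variable names are illustrative.

```r
# Welch (unequal-variance) 95% CI for a difference in means, from summary statistics.
n1w <- 11; x1w <- 127; s1w <- 33
n2w <- 18; x2w <- 157; s2w <- 27

v1 <- s1w^2 / n1w                 # per-sample variance of the mean
v2 <- s2w^2 / n2w
se_w <- sqrt(v1 + v2)             # standard error of the difference

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1w - 1) + v2^2 / (n2w - 1))

t_w  <- qt(0.025, df_welch, lower.tail = FALSE)
ci_w <- (x1w - x2w) + c(-1, 1) * t_w * se_w
df_welch
ci_w
```

The resulting interval is somewhat narrower than the one based on df = 10, and it still excludes 0.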

#------------------------------

#10 Two men, A and B, who usually commute to work together decide to conduct an experiment to see whether one route is faster than the other. The men feel that their driving habits are approximately the same, so each morning for two weeks one driver is assigned to route I and the other to route II. The times, recorded to the nearest minute, are shown in the following table. Using this data, find the 98% confidence interval for the true mean difference between the average travel time for route I and the average travel time for route II. Let d = (route I travel time) − (route II travel time). Assume that the populations of travel times are normally distributed for both routes. Show all work and hypothesis testing steps.


route1<-c(32,27,34,24,31,25,30,23,27,35)
route2<-c(28,28,33,25,26,29,33,27,25,33)


# Both routes are driven on the same mornings, so the data are paired: work with the differences d.
d<-route1-route2
n<-length(d)
dbar<-mean(d)
sdd<-sd(d)

meandiff<-dbar
SE<-sdd/sqrt(n)
df<-n-1

tdf<-qt(p=.01, df, lower.tail=FALSE)

upper<- meandiff + (tdf*SE)
lower<- meandiff - (tdf*SE)
cat("(",lower,",",upper,")")

## ( -2.766534 , 2.966534 )
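Because d is defined morning by morning, a paired analysis can be cross-checked with R's built-in `t.test()`. This sketch repeats the two data vectors so that it runs on its own; `res10` is an illustrative name.

```r
# Paired 98% confidence interval for the mean of d = route I - route II.
route1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)
route2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)

res10 <- t.test(route1, route2, paired = TRUE, conf.level = 0.98)
res10$estimate   # mean of the differences
res10$conf.int   # lower and upper bounds of the interval
```

The interval contains 0, so at the 98% level the data do not show that either route is faster.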

#---------------------------
#11 The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons

# The statement above is cut off; the code assumes 195 and 193 are the numbers of registered voters in each sample.
n1<-391
x1<-195
n2<-510
x2<-193
p1<-x1/n1
p2<-x2/n2

propdiff<-p1-p2
round(propdiff,3)

## [1] 0.12

SE<-sqrt((p1*(1-p1)/n1)+(p2*(1-p2)/n2))

Z<-qnorm(1-.05/2)

upper<- propdiff + (Z*SE)
lower<- propdiff - (Z*SE)
round(c(lower,upper),3)

## [1] 0.055 0.185

pvalue<-2*pnorm(propdiff/SE, lower.tail = FALSE)
pvalue<0.05

## [1] TRUE

# The p-value is smaller than α (0.05) and the 95% confidence interval is positive at both bounds, so we reject the null hypothesis of equal proportions.
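For two sample proportions, R's built-in `prop.test()` offers a cross-check. The sketch assumes, since the problem statement is cut off, that 195 and 193 are counts of registered voters and that a 95% level is wanted; `correct = FALSE` turns off the continuity correction, and `res11` is an illustrative name.

```r
# Two-sample comparison of proportions (assumed counts of registered voters).
res11 <- prop.test(x = c(195, 193), n = c(391, 510),
                   conf.level = 0.95, correct = FALSE)
res11$estimate   # the two sample proportions
res11$conf.int   # 95% CI for the difference p1 - p2
res11$p.value    # two-sided p-value for H0: p1 = p2
```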

2023-02-16