5.8 As part of the recruitment of new businesses, the city’s economic development department wants to estimate the gross profit margin of small businesses (under $1 million in sales) currently residing in the city. A random sample of the previous year’s annual reports of 15 small businesses shows the mean net profit margin to be 7.2% (of sales) with a standard deviation of 12.5%.
library(BSDA)
tsum.test(mean.x = 7.2,s.x = 12.5,n.x = 15,alternative = "two.sided",conf.level = .99)
##
## One-sample t-Test
##
## data: Summarized x
## t = 2.2308, df = 14, p-value = 0.04256
## alternative hypothesis: true mean is not equal to 0
## 99 percent confidence interval:
## -2.407719 16.807719
## sample estimates:
## mean of x
## 7.2
The 99% confidence interval for the mean net profit margin is (-2.41%, 16.81%).
Since the population standard deviation is NOT known and the sample size is LESS THAN 30, we use the T DISTRIBUTION with n - 1 = 14 degrees of freedom. The margin of error based on the t distribution is larger than one based on the normal distribution because the critical values of t are larger than the standard normal critical values (here t(.005, 14) = 2.977 versus z(.005) = 2.576), which gives a wider interval. As n increases, the critical values of t approach the standard normal values. Using the t distribution does not by itself make the interval valid, however: with n = 15, the t interval still assumes that the profit margins come from an approximately normal population, so the interval may be unreliable if that distribution is strongly skewed.
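As a check on the tsum.test output, the interval can be computed directly from the summary statistics as ȳ ± t(.005, 14)·s/√n. This sketch uses only base R (`qt`, `qnorm`); the z-based interval is included only to show how much narrower it would be, not as an alternative answer.

```r
# Summary statistics from Exercise 5.8
ybar <- 7.2    # sample mean net profit margin (%)
s    <- 12.5   # sample standard deviation (%)
n    <- 15
conf <- 0.99

se     <- s / sqrt(n)                            # estimated standard error of the mean
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)     # t critical value with 14 df
z_crit <- qnorm(1 - (1 - conf) / 2)              # standard normal critical value, for comparison

t_interval <- ybar + c(-1, 1) * t_crit * se
z_interval <- ybar + c(-1, 1) * z_crit * se

t_interval   # matches the tsum.test interval
z_interval   # narrower, because z_crit < t_crit
```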
5.11 Refer to Example 5.4. Suppose an estimate of σ is given by σ̂ = .7. a. If the level of confidence remains 99% but the desired width of the interval is reduced to 0.3, what is the necessary sample size?
For a 99% two-sided interval, α = .01, so the z quantile needed is 1 - α/2 = .995.
a<-.995
woi<-.3
(qnorm(a,0,1,lower.tail = TRUE)*.7/(woi/2))^2
## [1] 144.4933
Sample size needed is 145
For a 95% two-sided interval, α = .05, so the z quantile needed is 1 - α/2 = .975.
a<-.975
woi<-.5
(qnorm(a,0,1,lower.tail = TRUE)*.7/(woi/2))^2
## [1] 30.11704
Sample size needed is 31
For a 99.5% two-sided interval, α = .005, so the z quantile needed is 1 - α/2 = .9975.
woi<-.5
a<-.9975
(qnorm(a,0,1,lower.tail = TRUE)*.7/(woi/2))^2
## [1] 61.7748
Sample size needed is 62
If we have a fixed desired width, an increase in confidence level will result in an increase in sample size.
If we have a fixed level of confidence, an increase in desired width will result in a decrease in sample size.
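The three calculations above all apply n = (z(α/2)·σ/E)², with E equal to half the desired width. A small helper function (the name `ci_sample_size` is hypothetical, not part of any package) makes the pattern explicit and rounds up automatically.

```r
# Sample size so that a (1 - alpha) z interval for the mean has total width w:
# n = (z_{alpha/2} * sigma / E)^2 with E = w / 2, always rounded up
ci_sample_size <- function(conf, sigma, width) {
  E <- width / 2                     # half-width (margin of error)
  z <- qnorm(1 - (1 - conf) / 2)     # two-sided critical value
  ceiling((z * sigma / E)^2)
}

ci_sample_size(0.99, 0.7, 0.3)    # 145
ci_sample_size(0.95, 0.7, 0.5)    # 31
ci_sample_size(0.995, 0.7, 0.5)   # 62
```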
5.14 The housing department in a large city monitors the rent for rent-controlled apartments in the city. The mayor wants an estimate of the average rent. The housing department must determine the number of apartments to include in a survey in order to be able to estimate the average rent to within $100 using a 95% confidence interval. From past surveys, the monthly charge for rent-controlled apartments ranged from $1,000 to $3,500. How many renters must be included in the survey to meet the requirements?
Confidence level = .95, E = 100, estimated standard deviation = (3500 - 1000)/4 = 625
E<-100
a<-.975
(qnorm(a,0,1,lower.tail = TRUE)*625/E)^2
## [1] 150.057
Sample size needed is 151
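The range-based estimate of σ and the rounding step can be combined in one short calculation. Dividing the range by 4 is the rough rule used in the setup above (it treats the range as spanning about four standard deviations).

```r
# Range rule: estimate sigma as (largest - smallest) / 4
rent_min  <- 1000
rent_max  <- 3500
sigma_hat <- (rent_max - rent_min) / 4   # 625

E     <- 100              # desired margin of error in dollars
z     <- qnorm(0.975)     # 95% two-sided critical value
n_raw <- (z * sigma_hat / E)^2
n     <- ceiling(n_raw)   # sample sizes are always rounded up
n                         # 151
```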
5.16 A study is designed to test the hypotheses H0: μ ≥ 26 versus Ha: μ < 26. A random sample of 50 units was selected from a specified population, and the measurements were summarized to ȳ = 25.9 and s = 7.6. a. With α = .05, is there substantial evidence that the population mean is less than 26?
For this left-tailed test, the critical value is -z_α, where P(Z > z_α) = .05; that is, we reject H0 if z ≤ -1.645.
zsum.test(25.9, sigma.x = 7.6, n.x = 50, alternative = "less", mu = 26, conf.level = 0.95)
##
## One-sample z-Test
##
## data: Summarized x
## z = -0.09304, p-value = 0.4629
## alternative hypothesis: true mean is less than 26
## 95 percent confidence interval:
## NA 27.66789
## sample estimates:
## mean of x
## 25.9
The p-value = 0.4629 is greater than the level of significance α = .05, so the test fails to reject the null hypothesis H0: μ ≥ 26. There is not substantial evidence that the population mean is less than 26.
a<-.05
uo<-26
u<-24
o<-7.6
n<-50
# probability of a Type II error, beta(24), for the left-tailed test with n = 50
pnorm(qnorm(a,0,1,lower.tail =FALSE)-(uo-u)/(o/sqrt(n)),0,1)
## [1] 0.4145119
a<-.05
uo<-26
u<-24
o<-7.6
n<-100
# same Type II error calculation, now with n = 100
pnorm(qnorm(a,0,1,lower.tail =FALSE)-(uo-u)/(o/sqrt(n)),0,1)
## [1] 0.1618887
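The two calculations above evaluate the probability of a Type II error, β(24) = P(Z < z_α - (μ0 - μa)/(σ/√n)), first with n = 50 and then with n = 100. A vectorized sketch (the function name `beta_left` is hypothetical) computes both at once and reports the corresponding power 1 - β.

```r
# Type II error probability beta(mu_a) for the left-tailed z test of
# H0: mu >= mu0 versus Ha: mu < mu0 (sigma replaced by s = 7.6)
beta_left <- function(mu_a, mu0, sigma, n, alpha) {
  se <- sigma / sqrt(n)
  pnorm(qnorm(1 - alpha) - (mu0 - mu_a) / se)
}

sizes <- c(50, 100)
beta  <- beta_left(mu_a = 24, mu0 = 26, sigma = 7.6, n = sizes, alpha = 0.05)
power <- 1 - beta
data.frame(n = sizes, beta = round(beta, 4), power = round(power, 4))
```

Doubling the sample size shrinks the standard error, which is why β drops from about .41 to about .16.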
5.17 Refer to Exercise 5.16. Graph the power curve for rejecting H0: μ ≥ 26 for the following values of μ: 20, 21, 22, 23, 24, 25, and 26.
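a<-.05
n<-50
o<-7.6   # sigma estimated by s from Exercise 5.16
mu<-seq(20,26,1)
# power of the left-tailed test of H0: mu >= 26 at each true mean
plot(mu,pnorm(qnorm(a,0,1,lower.tail=TRUE)-(mu-26)/(o/sqrt(n))),col="red",type="l",xlab="True mean",ylab="Power")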
Power decreases as μ increases toward 26; at μ = 26 the power equals α = .05.
a<-.05
n<-50
plot(seq(20,26,1),pnorm(qnorm(a,0,1,lower.tail=TRUE)-(seq(20,26,1)-26)/(o/sqrt(n))),col="red",type="l",xlab="True mean",ylab="Power")
a<-.01
n<-50
lines(seq(20,26,1),pnorm(qnorm(a,0,1,lower.tail=TRUE)-(seq(20,26,1)-26)/(o/sqrt(n))),col="green")
legend("topright",legend=c("alpha = .05","alpha = .01"),col=c("red","green"),lty=1)
a<-.05
n<-50
plot(seq(20,26,1),pnorm(qnorm(a,0,1,lower.tail=TRUE)-(seq(20,26,1)-26)/(o/sqrt(n))),col="red",type="l",xlab="True mean",ylab="Power")
a<-.05
n<-35
lines(seq(20,26,1),pnorm(qnorm(a,0,1,lower.tail=TRUE)-(seq(20,26,1)-26)/(o/sqrt(n))),col="green")
legend("topright",legend=c("n = 50","n = 35"),col=c("red","green"),lty=1)
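The three plots can also be summarized numerically. This sketch (`power_left` is a hypothetical helper) tabulates the same power values; at μ = 26 the power equals α, and lowering α or n lowers the curve at every μ < 26.

```r
# Power of the left-tailed z test of H0: mu >= 26 (sigma = 7.6)
power_left <- function(mu, mu0 = 26, sigma = 7.6, n = 50, alpha = 0.05) {
  se <- sigma / sqrt(n)
  pnorm(qnorm(alpha) - (mu - mu0) / se)
}

mu <- 20:26
tab <- data.frame(
  mu      = mu,
  a05_n50 = round(power_left(mu), 4),
  a01_n50 = round(power_left(mu, alpha = 0.01), 4),
  a05_n35 = round(power_left(mu, n = 35), 4)
)
tab
```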
5.18 Use a computer to simulate 100 samples of n = 25 from a normal distribution with μ = 43 and σ = 4. Test the hypotheses H0: μ = 43 versus Ha: μ ≠ 43 separately for each of the 100 samples of size 25 with α = .05.
a<-.05
u<-43
uo<-43
o<-4
n<-25
r<-100
twotailed<-1
y<-rep(0,n)
L<-rep(0,r)
U<-rep(0,r)
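chk<-0   # counts the samples in which H0 is rejected
if (twotailed==1){a<-a/2}   # split alpha between the two tails
for(i in 1:r)
{
y<-rnorm(n,u,o)
# (1 - alpha) confidence interval for mu with sigma known
L[i]<- mean(y)-qnorm(a,0,1,lower.tail=FALSE)*(o/sqrt(n))
U[i]<- mean(y)+qnorm(a,0,1,lower.tail=FALSE)*(o/sqrt(n))
# reject H0 when the interval excludes mu0 (same as the two-sided z test)
if (L[i]>uo | U[i]<uo)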
{
chk<-chk+1
}
}
chk
## [1] 5
rm(list=ls())
a<-.05
u<-43
uo<-43
o<-4
n<-50
r<-1000
twotailed<-1
y<-rep(0,n)
L<-rep(0,r)
U<-rep(0,r)
chk<-0
if (twotailed==1){a<-a/2}
for(i in 1:r)
{
y<-rnorm(n,u,o)
L[i]<- mean(y)-qnorm(a,0,1,lower.tail=FALSE)*(o/sqrt(n))
U[i]<- mean(y)+qnorm(a,0,1,lower.tail=FALSE)*(o/sqrt(n))
if (L[i]>uo | U[i]<uo)
{
chk<-chk+1
}
}
chk
## [1] 53
rm(list=ls())
a<-.01
u<-43
uo<-43
o<-4
n<-75
r<-1000
twotailed<-1
y<-rep(0,n)
L<-rep(0,r)
U<-rep(0,r)
chk<-0
if (twotailed==1){a<-a/2}
for(i in 1:r)
{
y<-rnorm(n,u,o)
L[i]<- mean(y)-qnorm(a,0,1,lower.tail=FALSE)*(o/sqrt(n))
U[i]<- mean(y)+qnorm(a,0,1,lower.tail=FALSE)*(o/sqrt(n))
if (L[i]>uo | U[i]<uo)
{
chk<-chk+1
}
}
chk
## [1] 10
rm(list=ls())
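The loops above check, for each sample, whether the interval ȳ ± z(α/2)·σ/√n excludes μ0 = 43, which is the same as rejecting H0 in a two-sided z test. A shorter sketch using `replicate()` makes that rule explicit; the `reject_rate` function, the seed, and the 10,000 replications are illustrative choices, not part of the exercise. With H0 true, the rejection rate should be close to α.

```r
# Proportion of simulated samples whose (1 - alpha) z interval excludes mu0,
# i.e. the rejection rate of the two-sided test of H0: mu = mu0
reject_rate <- function(mu, mu0, sigma, n, r, alpha) {
  z    <- qnorm(1 - alpha / 2)
  ybar <- replicate(r, mean(rnorm(n, mu, sigma)))   # r simulated sample means
  mean(abs(ybar - mu0) > z * sigma / sqrt(n))       # two-sided rejection rule
}

set.seed(1)   # fixed seed so the run can be reproduced
rate <- reject_rate(mu = 43, mu0 = 43, sigma = 4, n = 25, r = 10000, alpha = 0.05)
rate          # should be near alpha = 0.05 because H0 is true
```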
5.19 Refer to Exercise 5.18. Simulate 100 samples of size n = 25 from a normal population in which μ = 45 and σ = 4. Use α = .05 in conducting a test of H0: μ = 43 versus Ha: μ ≠ 43 for each of the 100 samples. a. What proportion of the 100 tests of H0: μ = 43 versus Ha: μ ≠ 43 resulted in the correct decision, that is, the rejection of H0?
a<-.05
u<-45
uo<-43
o<-4
n<-25
r<-100
twotailed<-1
y<-rep(0,n)
L<-rep(0,r)
U<-rep(0,r)
chk<-0
if (twotailed==1){a<-a/2}
for(i in 1:r)
{
y<-rnorm(n,u,o)
L[i]<- mean(y)-qnorm(a,0,1,lower.tail=FALSE)*(o/sqrt(n))
U[i]<- mean(y)+qnorm(a,0,1,lower.tail=FALSE)*(o/sqrt(n))
if (L[i]>uo | U[i]<uo)
{
chk<-chk+1
}
}
chk/r
## [1] 0.74
rm(list=ls())
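a<-.025   # alpha/2 for a two-sided test with alpha = .05
uo<-43
u<-45
o<-4
n<-25
# approximate power: probability of landing in the upper rejection region when mu = 45
pnorm(qnorm(a,0,1,lower.tail =TRUE)-(uo-u)/(o/sqrt(n)),0,1)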
## [1] 0.7054139
The power at μ = 45 is about 0.705, so about 70 of the 100 tests are expected to reject H0 correctly.
Compare this value to the number of rejections obtained in the simulation. Explain why the estimated number of rejections and the number of rejections observed in the simulation differ.
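In the simulation, 74 of the 100 tests rejected H0 (proportion 0.74), close to the expected 70.5. The difference is due to sampling variability: the number of rejections in 100 independent tests is a binomial random variable with mean about 70.5 and standard deviation about 4.6, so a count of 74 is well within the range expected by chance. A different set of simulated samples would give a different count.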
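The power formula above uses only the upper rejection region. This sketch adds the (negligible) lower region and treats the number of rejections in 100 independent tests as binomial, which shows how much run-to-run variation to expect; the 74 rejections recorded by the simulation are less than one standard deviation from the expected count.

```r
# Two-sided power of the z test of H0: mu = 43 when the true mean is 45
alpha <- 0.05; mu0 <- 43; mu <- 45; sigma <- 4; n <- 25; r <- 100
se    <- sigma / sqrt(n)
z     <- qnorm(1 - alpha / 2)
shift <- (mu - mu0) / se
power <- pnorm(shift - z) + pnorm(-shift - z)   # upper tail + (tiny) lower tail

expected <- r * power                        # expected rejections in r tests
sd_count <- sqrt(r * power * (1 - power))    # binomial SD of the rejection count
c(power = power, expected = expected, sd = sd_count)
(74 - expected) / sd_count                   # observed 74 rejections, in SD units
```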
5.26 The administrator of a nursing home would like to do a time-and-motion study of staff time spent per day performing nonemergency tasks. Prior to the introduction of some efficiency measures, the average number of person-hours per day spent on these tasks was μ = 16. The administrator wants to test whether the efficiency measures have reduced the value of μ. How many days must be sampled to test the proposed hypothesis if she wants a test having α = .05 and a probability of a Type II error of at most .10 when the actual value of μ is 12 hours or less (at least a 25% decrease from the number of hours spent before the efficiency measures were implemented)? Assume σ = 7.64.
power.t.test(n=NULL,delta = 16-12,sd=7.64,sig.level = .05,power = .90,type = c("one.sample"),alternative = c("one.sided"))
##
## One-sample t test power calculation
##
## n = 32.64325
## delta = 4
## sd = 7.64
## sig.level = 0.05
## power = 0.9
## alternative = one.sided
33 days must be sampled
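`power.t.test` bases the calculation on the t distribution. Because the exercise says to assume σ = 7.64, the usual z-based formula n = σ²(z_α + z_β)²/Δ² also applies; under that assumption this sketch gives 32 days, slightly fewer than the t-based 33, which is the more conservative of the two.

```r
# z-based sample size for a one-sided test with known sigma:
# n = sigma^2 * (z_alpha + z_beta)^2 / Delta^2
sigma    <- 7.64
Delta    <- 16 - 12     # smallest reduction worth detecting
alpha    <- 0.05
beta_max <- 0.10        # maximum Type II error probability

n_raw <- sigma^2 * (qnorm(1 - alpha) + qnorm(1 - beta_max))^2 / Delta^2
n_z   <- ceiling(n_raw)
n_z                     # 32
```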
5.36 The ability to read rapidly and simultaneously maintain a high level of comprehension is often a determining factor in the academic success of many high school students. A school district is considering a supplemental reading program for incoming freshmen. Prior to implementing the program, the school runs a pilot program on a random sample of n = 20 students. The students were thoroughly tested to determine reading speed and reading comprehension. Based on a fixed-length standardized test reading passage, the following reading times (in minutes) and comprehension scores (based on a 100-point scale) were recorded.

Student          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Reading Time     5   7  15  12   8   7  10  11   9  13  10   6  11   8  10   8   7   6  11   8
Comprehension   60  76  76  90  81  75  95  98  88  73  90  66  91  83 100  85  76  69  91  78

Summary: Reading Time n = 20, ȳ = 9.10, s = 2.573; Comprehension n = 20, ȳ = 82.05, s = 10.88.

a. What is the population about which inferences are being made?
The population is the incoming freshmen for whom this reading program is being tested.
rt<- c(5,7,15,12,8,7,10,11,9,13,10,6,11,8,10,8,7,6,11,8)
t.test(rt, alternative = "two.sided", mu = mean(rt), conf.level = .95)   # mu does not affect the confidence interval
##
## One Sample t-test
##
## data: rt
## t = 0, df = 19, p-value = 1
## alternative hypothesis: true mean is not equal to 9.1
## 95 percent confidence interval:
## 7.895733 10.304267
## sample estimates:
## mean of x
## 9.1
The 95% confidence interval for the mean reading time is (7.90, 10.30) minutes.
qqnorm(rt,
ylab="Reading Times",
xlab="Theoretical Quantiles",
main="Normal Probability Plot")
qqline(rt)
Yes. The points fall approximately along a straight line, which is what we expect if the data are drawn from a normal distribution; points that deviate clearly from the line would suggest that the data are not drawn from a normal distribution.
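The normal probability plot is a visual check. A formal complement, not part of the original exercise, is the Shapiro-Wilk test in base R (`shapiro.test`); with only n = 20 observations its result should be read alongside the plot rather than instead of it.

```r
rt <- c(5, 7, 15, 12, 8, 7, 10, 11, 9, 13, 10, 6, 11, 8, 10, 8, 7, 6, 11, 8)

# Shapiro-Wilk test of H0: the reading times come from a normal population
sw <- shapiro.test(rt)
sw$statistic   # W, close to 1 for data consistent with normality
sw$p.value     # a small p-value would indicate a departure from normality
```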
5.37 Refer to Exercise 5.36. Using the reading comprehension data, is there significant evidence that the reading program would produce for incoming freshmen a mean comprehension score greater than 80, the statewide average for comparable students during the previous year? Determine the level of significance for your test. Interpret your findings.
rc<- c(60,76,76,90,81,75,95,98,88,73,90,66,91,83,100,85,76,69, 91,78)
t.test(rc,alternative=c("greater"),mu=80,paired = FALSE,var.equal = FALSE,conf.level = .90)
##
## One Sample t-test
##
## data: rc
## t = 0.84267, df = 19, p-value = 0.2049
## alternative hypothesis: true mean is greater than 80
## 90 percent confidence interval:
## 78.81996 Inf
## sample estimates:
## mean of x
## 82.05
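The p-value is .2049; this is the level of significance of the test (the smallest α at which H0 could be rejected). Because .2049 is greater than α = .01, .05, and .10, we fail to reject the null hypothesis at each of these levels: there is not significant evidence that the reading program would produce a mean comprehension score greater than 80.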
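The t statistic and upper-tail p-value reported by `t.test` can be reproduced directly from the data, which makes explicit where the attained significance level comes from.

```r
rc  <- c(60, 76, 76, 90, 81, 75, 95, 98, 88, 73,
         90, 66, 91, 83, 100, 85, 76, 69, 91, 78)
mu0 <- 80
n   <- length(rc)

t_stat <- (mean(rc) - mu0) / (sd(rc) / sqrt(n))        # one-sample t statistic
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)   # upper-tail p-value for Ha: mu > 80
c(t = t_stat, p = p_val)
```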
5.60 When an audit must be conducted that involves a tedious examination of a large inventory, the audit may be very costly and time consuming if each item in the inventory must be examined. In such situations, the auditor frequently obtains a random sample of items from the complete inventory and uses the results of an audit of the sampled items to check the validity of the company’s financial statement. A large company’s financial statement claims an inventory that averages $600 per item. The following data are the auditor’s assessment of a random sample of 75 items from the company’s inventory. The values resulting from the audit are rounded to the nearest dollar.

303   547 1,368   493   984   507   148 2,546   738    83     2   135   274    74 1,472
399 1,784    71   751   136   571   147   282 2,039 1,909   748   188   548     1   280
102   618   129 1,324 1,428   469   102   454 1,059   939   303   600   234   514    17
551   293 1,395     7    28     2   973   506   511   812 1,290   685   447    11    35
252 1,526   464     5    67    99    67   259     7    67   248 3,215     3    33    41
inventory <- read.csv(file = "inventory.csv", header = TRUE)
inv <- inventory$inv   # the 75 audited values
z.test(inv,alternative = "two.sided",conf.level = .95,sigma.x = sd(inv))
##
## One-sample z-Test
##
## data: inv
## z = 7.5306, p-value = 5.052e-14
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 411.4787 701.0279
## sample estimates:
## mean of x
## 556.2533
With 95% confidence, we estimate the mean audited value per item to be between $411.48 and $701.03.
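# conf.level is a confidence level, not alpha: the "1 percent" interval below is not meaningful; only the p-value is used
z.test(inv,alternative = "less",mu=600,conf.level = .01,sigma.x = sd(inv))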
##
## One-sample z-Test
##
## data: inv
## z = -0.59224, p-value = 0.2768
## alternative hypothesis: true mean is less than 600
## 1 percent confidence interval:
## NA 384.4154
## sample estimates:
## mean of x
## 556.2533
There is not substantial evidence that the true mean is less than $600: the p-value of .2768 is greater than α = .01, so we fail to reject the null hypothesis.
The population is the company’s complete inventory.
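The z procedures used above assume that the 75 items are a random sample, so that the observations are independent, and that the sampling distribution of the sample mean is approximately normal. With a sample size greater than 30, the Central Limit Theorem is usually relied on for the second condition, but strong skewness in the data can still make the normal approximation poor. A normal probability plot of the data is shown below.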
qqnorm(inv,
ylab="Dollars",
xlab="Theoretical Quantiles",
main="Normal Probability Plot")
qqline(inv)
rm(list=ls())
The points deviate clearly from a straight line (the data are strongly right-skewed), so the data do not appear to come from a normal distribution, and an approach based on the normal distribution may not be appropriate. We could try other approaches, such as using the median instead of the mean.
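Entering the data directly (so the check does not depend on `inventory.csv`) makes it easy to compare the mean with the median and to contrast the large-sample z interval with a percentile bootstrap interval. The bootstrap is a swapped-in resampling technique that does not assume normal data; it is not part of the original exercise, and the seed and the 5000 resamples are arbitrary choices.

```r
# Audited values for the 75 sampled items (from the exercise statement)
inv <- c(303, 547, 1368, 493, 984, 507, 148, 2546, 738, 83, 2, 135, 274, 74, 1472,
         399, 1784, 71, 751, 136, 571, 147, 282, 2039, 1909, 748, 188, 548, 1, 280,
         102, 618, 129, 1324, 1428, 469, 102, 454, 1059, 939, 303, 600, 234, 514, 17,
         551, 293, 1395, 7, 28, 2, 973, 506, 511, 812, 1290, 685, 447, 11, 35,
         252, 1526, 464, 5, 67, 99, 67, 259, 7, 67, 248, 3215, 3, 33, 41)

c(mean = mean(inv), median = median(inv))   # the mean is pulled up by a few large values

# Large-sample z interval, as in the z.test call above
se   <- sd(inv) / sqrt(length(inv))
z_ci <- mean(inv) + c(-1, 1) * qnorm(0.975) * se

# Percentile bootstrap interval for the mean (no normality assumption)
set.seed(1)
boot_means <- replicate(5000, mean(sample(inv, replace = TRUE)))
boot_ci    <- quantile(boot_means, c(0.025, 0.975))
z_ci
boot_ci
```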
5.62 If a new process for mining copper is to be put into full-time operation, it must produce an average of more than 50 tons of ore per day. A 15-day trial period gave the results shown in the accompanying table.

Day               1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
Yield (tons)   57.8  58.3  50.3  38.5  47.9 157.0  38.6 140.2  39.3 138.7  49.2 139.7  48.3  59.2  49.7

a. Estimate the typical amount of ore produced by the mine using both a point estimate and a 95% confidence interval.
Yld<- c(57.8,58.3,50.3,38.5,47.9,157.0,38.6,140.2,39.3,138.7,
49.2,139.7,48.3,59.2,49.7)
t.test(Yld,alternative=c("two.sided"),mu=50,paired = FALSE,var.equal = FALSE,conf.level = .95)
##
## One Sample t-test
##
## data: Yld
## t = 2.1196, df = 14, p-value = 0.05239
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
## 49.71317 98.64683
## sample estimates:
## mean of x
## 74.18
Point estimate: ȳ = 74.18 tons; 95% confidence interval: (49.71, 98.65) tons.
t.test(Yld,alternative=c("greater"),mu=50,paired = FALSE,var.equal = FALSE,conf.level = .95)
##
## One Sample t-test
##
## data: Yld
## t = 2.1196, df = 14, p-value = 0.0262
## alternative hypothesis: true mean is greater than 50
## 95 percent confidence interval:
## 54.08771 Inf
## sample estimates:
## mean of x
## 74.18
rm(list=ls())
The p-value for this test is .0262, which is less than α = .05, so we reject H0: there is significant evidence at the .05 level that on a typical day the mine produces more than 50 tons of ore.
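Because the exercise asks about a typical day, it helps to look at the shape of the yields. The sorted values fall into two clusters, so the mean is pulled well above the median; this sketch uses the median only as a descriptive check, not as a replacement for the t test above.

```r
Yld <- c(57.8, 58.3, 50.3, 38.5, 47.9, 157.0, 38.6, 140.2, 39.3, 138.7,
         49.2, 139.7, 48.3, 59.2, 49.7)

sort(Yld)                                    # four days between 138 and 157 tons, the rest below 60
c(mean = mean(Yld), median = median(Yld))    # mean 74.18 versus median 50.3
```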
EXTRA PROBLEM.
A car manufacturer wants to test a new engine to determine whether it meets the new air pollution standard (the true mean emission must be less than 20 parts per million of carbon). Thirty-six engines manufactured for testing purposes yield the following summary results of emission levels: sample mean = 18.50, sample standard deviation = 3. Assume that the normal distribution assumption is satisfied for the sample data.
# conf.level is a confidence level, not alpha; only the p-value from these calls is used
zsum.test(18.5, sigma.x = 3, n.x = 36, alternative = "greater", mu = 19, conf.level = 0.01)
##
## One-sample z-Test
##
## data: Summarized x
## z = -1, p-value = 0.8413
## alternative hypothesis: true mean is greater than 19
## 1 percent confidence interval:
## 19.66317 NA
## sample estimates:
## mean of x
## 18.5
The p-value is greater than α, so we fail to reject the null hypothesis.
zsum.test(18.5, sigma.x = 3, n.x = 36, alternative = "greater", mu = 18, conf.level = 0.01)
##
## One-sample z-Test
##
## data: Summarized x
## z = 1, p-value = 0.1587
## alternative hypothesis: true mean is greater than 18
## 1 percent confidence interval:
## 19.66317 NA
## sample estimates:
## mean of x
## 18.5
The p-value is greater than α, so we fail to reject the null hypothesis.
zsum.test(18.5, sigma.x = 3, n.x = 36, alternative = "greater", mu = 17, conf.level = 0.01)
##
## One-sample z-Test
##
## data: Summarized x
## z = 3, p-value = 0.00135
## alternative hypothesis: true mean is greater than 17
## 1 percent confidence interval:
## 19.66317 NA
## sample estimates:
## mean of x
## 18.5
P value is less than alpha so we reject the null hypothesis.
As the hypothesized value of μ decreases, the p-value decreases.
a<-.01
uo<-19
u<-20
o<-3
n<-36
pnorm(qnorm(a,0,1,lower.tail =TRUE)-(uo-u)/(o/sqrt(n)),0,1)
## [1] 0.3720806
a<-.01
uo<-18
u<-20
o<-3
n<-36
pnorm(qnorm(a,0,1,lower.tail =TRUE)-(uo-u)/(o/sqrt(n)),0,1)
## [1] 0.9529005
a<-.01
uo<-17
u<-20
o<-3
n<-36
pnorm(qnorm(a,0,1,lower.tail =TRUE)-(uo-u)/(o/sqrt(n)),0,1)
## [1] 0.9998804
Which test has the highest power? The test with μ = 17 has the highest power.
zsum.test(18.5, sigma.x = 3, n.x = 36, alternative = "greater", mu = 19, conf.level = 0.05)
##
## One-sample z-Test
##
## data: Summarized x
## z = -1, p-value = 0.8413
## alternative hypothesis: true mean is greater than 19
## 5 percent confidence interval:
## 19.32243 NA
## sample estimates:
## mean of x
## 18.5
The p-value is greater than α, so we fail to reject the null hypothesis.
(ii) H0: μ = 20; Ha: μ = 18
zsum.test(18.5, sigma.x = 3, n.x = 36, alternative = "greater", mu = 18, conf.level = 0.05)
##
## One-sample z-Test
##
## data: Summarized x
## z = 1, p-value = 0.1587
## alternative hypothesis: true mean is greater than 18
## 5 percent confidence interval:
## 19.32243 NA
## sample estimates:
## mean of x
## 18.5
The p-value is greater than α, so we fail to reject the null hypothesis.
zsum.test(18.5, sigma.x = 3, n.x = 36, alternative = "greater", mu = 17, conf.level = 0.05)
##
## One-sample z-Test
##
## data: Summarized x
## z = 3, p-value = 0.00135
## alternative hypothesis: true mean is greater than 17
## 5 percent confidence interval:
## 19.32243 NA
## sample estimates:
## mean of x
## 18.5
P value is less than alpha so we reject the null hypothesis.
As the hypothesized value of μ decreases, the p-value decreases; the conclusions are the same as at the .01 level of significance.
a<-.05
uo<-19
u<-20
o<-3
n<-36
pnorm(qnorm(a,0,1,lower.tail =TRUE)-(uo-u)/(o/sqrt(n)),0,1)
## [1] 0.63876
a<-.05
uo<-18
u<-20
o<-3
n<-36
pnorm(qnorm(a,0,1,lower.tail =TRUE)-(uo-u)/(o/sqrt(n)),0,1)
## [1] 0.9907423
a<-.05
uo<-17
u<-20
o<-3
n<-36
pnorm(qnorm(a,0,1,lower.tail =TRUE)-(uo-u)/(o/sqrt(n)),0,1)
## [1] 0.9999934
Which test has the highest power? The test with μ = 17 has the highest power.
If we are willing to accept a higher α, the power increases for each alternative μ. The largest increase occurs for μ = 19 (from about .372 to .639), the alternative closest to 20.
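Part (ii) is labeled H0: μ = 20 versus Ha: μ = 18, which suggests that each part compares H0: μ = 20 with a single alternative μ1 below 20. Under that reading (an assumption, since the full question is not reproduced here), every part uses the same left-tailed test at μ0 = 20, with z = (18.5 - 20)/(3/√36) = -3 and p ≈ 0.00135, below both α = .01 and α = .05; only the power changes with μ1. The sketch below reproduces the power values computed above under that assumption.

```r
# Summary data for the emission test
ybar <- 18.5; sigma <- 3; n <- 36
mu0 <- 20                  # H0: mu = 20
mu1 <- c(19, 18, 17)       # simple alternatives below 20
se  <- sigma / sqrt(n)

z_stat <- (ybar - mu0) / se   # -3
p_val  <- pnorm(z_stat)       # left-tail p-value, the same for every mu1 < 20

# Power at each alternative: P(reject H0 | mu = mu1) for the left-tailed test
power <- function(alpha) pnorm(qnorm(alpha) + (mu0 - mu1) / se)

rbind(alpha_01 = power(0.01), alpha_05 = power(0.05))
p_val
```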