Interval Estimation

Based on 60 successful pregnancies involving natural birth, an experimenter found that the mean pregnancy term was 274 days, with a standard deviation of 14 days. Construct a 99% confidence interval for the true mean pregnancy term µ.

n = 60
xbar = 274
stdev = 14
p = 1-0.01/2
z = qnorm(p)
ci = z*stdev/sqrt(n)

cat("We can state with 99% confidence that the true mean lies within", xbar, "+/-", ci, "days.")

## We can state with 99% confidence that the true mean lies within 274 +/- 4.655534 days.
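The calculation above can be wrapped in a small helper that returns the interval endpoints rather than only the margin of error. This is a minimal sketch: the function name `z_interval` and its arguments are hypothetical, and it assumes the sample is large enough for the sample standard deviation to stand in for the population value.

```r
# Hypothetical helper (not from the original): z-based confidence interval
# computed from summary statistics. Assumes a large sample, so the sample
# standard deviation s is used in place of the population value.
z_interval <- function(xbar, s, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  z <- qnorm(1 - alpha / 2)        # two-sided critical value
  margin <- z * s / sqrt(n)        # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Pregnancy-term problem: n = 60, mean 274 days, sd 14 days, 99% confidence
ci_preg <- z_interval(xbar = 274, s = 14, n = 60, conf.level = 0.99)
print(ci_preg)
```

Returning both endpoints makes it easy to check whether a hypothesized mean falls inside the interval.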

Many mutual funds use an investment approach involving owning stocks whose price/earnings multiples (P/Es) are less than the P/E of the S&P 500. The following data give P/Es of 49 companies a randomly selected mutual fund owns in a particular year.

6.8 9.9 8.9 11.4 14.2 5.6 8.5 8.5 8.4 7.5 9.3 9.4 16.6 9.1 10.1 10.6 11.1 6.4 13.3 12.8 13.7 17.9 21.8 18.4 12.0 34.3 9.6 9.0 11.7 12.8 9.9 14.3 14.0 15.5 9.4 13.7 11.5 11.5 11.8 16.9 18.0 7.8 7.1 10.6 11.1 12.3 12.3 13.9 12.9

Find a 98% confidence interval for the mean P/E multiples. Interpret the result and state any assumptions you have made.

x = c(6.8, 9.9, 8.9, 11.4, 14.2, 5.6, 8.5, 8.5, 8.4, 7.5, 9.3, 9.4, 16.6,
9.1, 10.1, 10.6, 11.1, 6.4, 13.3, 12.8, 13.7, 17.9, 21.8, 18.4, 12.0,
34.3, 9.6, 9.0, 11.7, 12.8, 9.9, 14.3, 14.0, 15.5, 9.4, 13.7, 11.5,
11.5, 11.8, 16.9, 18.0, 7.8, 7.1, 10.6, 11.1, 12.3, 12.3, 13.9, 12.9)
# t.test() has no df argument; the degrees of freedom (n - 1 = 48) are computed from the data
t.test(x, conf.level = 0.98)

## 
##  One Sample t-test
## 
## data:  x
## t = 18.056, df = 48, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 98 percent confidence interval:
##  10.50850 13.74048
## sample estimates:
## mean of x 
##  12.12449

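print("We are 98% confident that the mean P/E multiple lies between 10.50850 and 13.74048. We assume the 49 companies form a random sample; with n = 49 the sample mean is approximately normal by the central limit theorem, so a z interval would be nearly the same.")

## [1] "We are 98% confident that the mean P/E multiple lies between 10.50850 and 13.74048. We assume the 49 companies form a random sample; with n = 49 the sample mean is approximately normal by the central limit theorem, so a z interval would be nearly the same."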
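How close the t and z intervals are for this sample can be checked directly. The sketch below rebuilds the 98% t interval by hand and computes a z interval from the same standard error; the variable names (`pe`, `t_ci`, `z_ci`) are illustrative and do not appear in the original.

```r
# P/E data from the problem statement (variable name pe is illustrative)
pe <- c(6.8, 9.9, 8.9, 11.4, 14.2, 5.6, 8.5, 8.5, 8.4, 7.5, 9.3, 9.4, 16.6,
        9.1, 10.1, 10.6, 11.1, 6.4, 13.3, 12.8, 13.7, 17.9, 21.8, 18.4, 12.0,
        34.3, 9.6, 9.0, 11.7, 12.8, 9.9, 14.3, 14.0, 15.5, 9.4, 13.7, 11.5,
        11.5, 11.8, 16.9, 18.0, 7.8, 7.1, 10.6, 11.1, 12.3, 12.3, 13.9, 12.9)

n <- length(pe)
se <- sd(pe) / sqrt(n)   # estimated standard error of the mean
alpha <- 0.02            # 98% confidence

# t interval with n - 1 degrees of freedom (what t.test() reports)
t_ci <- mean(pe) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se
# z interval using the normal critical value instead
z_ci <- mean(pe) + c(-1, 1) * qnorm(1 - alpha / 2) * se

print(rbind(t = t_ci, z = z_ci))
```

The z interval is slightly narrower because the normal critical value is smaller than the t critical value at 48 degrees of freedom.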

For a particular car, when the brake is applied at 62 mph, the following data give stopping distance (in feet) for 10 random trials on a dry surface.

146.9 148.4 149.4 148.6 150.3 147.5 147.5 149.3 148.4 145.5

Can we say that the data are approximately normally distributed?
Assuming normality, find a 95% confidence interval for population mean stopping distance µ.

x <- c(146.9, 148.4, 149.4, 148.6, 150.3, 147.5, 147.5, 149.3, 148.4, 145.5)
shapiro.test(x)

## 
##  Shapiro-Wilk normality test
## 
## data:  x
## W = 0.9728, p-value = 0.9155

mean(x)

## [1] 148.18

quantile(x)

##      0%     25%     50%     75%    100% 
## 145.500 147.500 148.400 149.125 150.300

print("Since the p-value of the Shapiro-Wilk test (0.9155) is large, we fail to reject the hypothesis of normality, and the sample mean (148.18) and median (148.4) are close to each other. The data are consistent with an approximately normal distribution, although with only 10 observations the test has little power to detect departures from normality.")

## [1] "Since the p-value of the Shapiro-Wilk test (0.9155) is large, we fail to reject the hypothesis of normality, and the sample mean (148.18) and median (148.4) are close to each other. The data are consistent with an approximately normal distribution, although with only 10 observations the test has little power to detect departures from normality."

# t.test() has no df argument; the degrees of freedom (n - 1 = 9) are computed from the data
t.test(x, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  x
## t = 338.41, df = 9, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  147.1895 149.1705
## sample estimates:
## mean of x 
##    148.18

print("We are 95% confident that the mean stopping distance lies between 147.1895 and 149.1705 feet.")

## [1] "We are 95% confident that the mean stopping distance lies between 147.1895 and 149.1705 feet."

A drug is suspected of causing an elevated heart rate in a certain group of high-risk patients. Twenty patients from the group were given the drug. The changes in heart rates were found to be as follows.

-1 8 5 10 2 12 7 9 1 3 4 6 4 12 11 2 -1 10 2 8

Construct a 98% confidence interval for the mean change in heart rate. Assume that the population has a normal distribution. Interpret your answer.

x <- c(-1, 8, 5, 10, 2, 12, 7, 9, 1, 3, 4, 6, 4, 12, 11, 2, -1, 10, 2, 8)
# t.test() has no df argument; the degrees of freedom (n - 1 = 19) are computed from the data
t.test(x, conf.level = 0.98)

## 
##  One Sample t-test
## 
## data:  x
## t = 6.078, df = 19, p-value = 7.611e-06
## alternative hypothesis: true mean is not equal to 0
## 98 percent confidence interval:
##  3.318466 8.081534
## sample estimates:
## mean of x 
##       5.7

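print("The mean change in heart rate lies between 3.318466 and 8.081534 with a confidence of 98%. Since the whole interval lies above 0, the data support the suspicion that the drug raises heart rate in this group.")

## [1] "The mean change in heart rate lies between 3.318466 and 8.081534 with a confidence of 98%. Since the whole interval lies above 0, the data support the suspicion that the drug raises heart rate in this group."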
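Rather than copying the endpoints from the printed output, the interval can be read from the object that `t.test()` returns. The following is a small sketch; the variable names `hr`, `res`, `ci` and `df` are illustrative, while `conf.int` and `parameter` are components of the returned `htest` object.

```r
# Heart-rate changes from the problem statement (variable name hr is illustrative)
hr <- c(-1, 8, 5, 10, 2, 12, 7, 9, 1, 3, 4, 6, 4, 12, 11, 2, -1, 10, 2, 8)

res <- t.test(hr, conf.level = 0.98)

ci <- res$conf.int    # numeric vector: lower and upper endpoints
df <- res$parameter   # degrees of freedom, derived from length(hr) - 1

cat("98% CI:", ci[1], "to", ci[2], "with", df, "degrees of freedom\n")
```

Because the degrees of freedom come from the data, they never need to be supplied by hand.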

A survey indicates that it is important to pay attention to truth in political advertising. Based on a survey of 1200 people, 35% indicated that they found political advertisements to be untrue; 60% say that they will not vote for candidates whose advertisements are judged to be untrue; and of this latter group, only 15% ever complained to the media or to the candidate about their dissatisfaction.

Find a 95% confidence interval for the percentage of people who find political advertising to be untrue.
Find a 95% confidence interval for the percentage of voters who will not vote for candidates whose advertisements are considered to be untrue.
Find a 95% confidence interval for the percentage of those who avoid voting for candidates whose advertisements are considered untrue and who have complained to the media or to the candidate about the falsehood in commercials.
For each case above, interpret the results and state any assumptions you have made.

n = 1200
x1 = 0.35*n
x2 = 0.6*n
x3 = 0.6*0.15*n
prop.test(x1,n)

## 
##  1-sample proportions test with continuity correction
## 
## data:  x1 out of n, null probability 0.5
## X-squared = 107.4, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.3231229 0.3778491
## sample estimates:
##    p 
## 0.35

prop.test(x2,n)

## 
##  1-sample proportions test with continuity correction
## 
## data:  x2 out of n, null probability 0.5
## X-squared = 47.601, df = 1, p-value = 5.225e-12
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.5715851 0.6277671
## sample estimates:
##   p 
## 0.6

prop.test(x3,n)

## 
##  1-sample proportions test with continuity correction
## 
## data:  x3 out of n, null probability 0.5
## X-squared = 805.24, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.07470757 0.10797606
## sample estimates:
##    p 
## 0.09

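print("We assume the 1200 respondents are a random sample and that the sample is large enough for the normal approximation to the binomial to hold. The third proportion is taken as a share of all 1200 respondents (0.6*0.15 = 9%). With 95% confidence, 32.3% to 37.8% of people find political advertising untrue, 57.2% to 62.8% will not vote for candidates whose advertisements are judged untrue, and 7.5% to 10.8% will not vote for them and have also complained.")

## [1] "We assume the 1200 respondents are a random sample and that the sample is large enough for the normal approximation to the binomial to hold. The third proportion is taken as a share of all 1200 respondents (0.6*0.15 = 9%). With 95% confidence, 32.3% to 37.8% of people find political advertising untrue, 57.2% to 62.8% will not vote for candidates whose advertisements are judged untrue, and 7.5% to 10.8% will not vote for them and have also complained."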
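The interval printed by `prop.test()` is a Wilson score interval with continuity correction, not the textbook interval p ± z·sqrt(p(1 − p)/n). As a rough check for the first proportion, the sketch below computes the textbook (Wald) interval by hand and places it next to the `prop.test()` result; the variable names are illustrative.

```r
n <- 1200
x <- round(0.35 * n)   # 420 respondents found the advertisements untrue
p_hat <- x / n

# Textbook (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
z <- qnorm(1 - 0.05 / 2)
wald_ci <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)

# Interval reported by prop.test() (Wilson score with continuity correction)
score_ci <- prop.test(x, n)$conf.int

print(rbind(wald = wald_ci, prop.test = as.numeric(score_ci)))
```

With n = 1200 the two methods agree to about three decimal places, so the choice of method matters little here.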

In a random sample of 50 college seniors, 18 indicated that they were planning to pursue a graduate degree. Find a 98% confidence interval for the true proportion of all college seniors planning to pursue a graduate degree, interpret the result, and state any assumptions you have made.

prop.test(18,50,conf.level = 0.98)

## 
##  1-sample proportions test with continuity correction
## 
## data:  18 out of 50, null probability 0.5
## X-squared = 3.38, df = 1, p-value = 0.06599
## alternative hypothesis: true p is not equal to 0.5
## 98 percent confidence interval:
##  0.2148709 0.5341143
## sample estimates:
##    p 
## 0.36

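print("With 98% confidence, the true proportion of college seniors planning to pursue a graduate degree lies between 0.2149 and 0.5341. We assume a random sample and that the sample is large enough for the normal approximation (18 successes and 32 failures, both above 10).")

## [1] "With 98% confidence, the true proportion of college seniors planning to pursue a graduate degree lies between 0.2149 and 0.5341. We assume a random sample and that the sample is large enough for the normal approximation (18 successes and 32 failures, both above 10)."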
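With only 50 observations, it may be worth comparing the approximate interval with an exact one. The sketch below uses `binom.test()`, which returns the Clopper-Pearson interval and does not rely on the normal approximation; the variable names are illustrative.

```r
successes <- 18
n <- 50

# Rule-of-thumb check for the normal approximation: both counts at least 10
approx_ok <- successes >= 10 && (n - successes) >= 10

# Exact (Clopper-Pearson) 98% interval for the proportion
exact_ci <- binom.test(successes, n, conf.level = 0.98)$conf.int

# Approximate interval from prop.test() for comparison
approx_ci <- prop.test(successes, n, conf.level = 0.98)$conf.int

print(approx_ok)
print(rbind(exact = as.numeric(exact_ci), prop.test = as.numeric(approx_ci)))
```

If the two intervals differ noticeably, the exact interval is the safer one to report.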

In a random sample of 500 items from a large lot of manufactured items, there were 40 defectives.

Find a 90% confidence interval for the true proportion of defectives in the lot.
Is the assumption of normal approximation valid?
Suppose we suspect that another lot has the same proportion of defectives as in the first lot. What should be the sample size if we want to estimate the true proportion within 0.01 with 90% confidence?

n=500
p=40/500
n*p*(1-p)

## [1] 36.8

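print("Since n*p*(1-p) = 36.8 is greater than 10, the sampling distribution of the sample proportion is approximately normal, so the normal approximation is valid.")

## [1] "Since n*p*(1-p) = 36.8 is greater than 10, the sampling distribution of the sample proportion is approximately normal, so the normal approximation is valid."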

prop.test(40,500,conf.level = 0.9)

## 
##  1-sample proportions test with continuity correction
## 
## data:  40 out of 500, null probability 0.5
## X-squared = 351.12, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 90 percent confidence interval:
##  0.06134797 0.10339737
## sample estimates:
##    p 
## 0.08

0.08*0.92*(qnorm(1-0.1/2)/0.01)^2

## [1] 1991.28

print("The sample size should be about 1992")

## [1] "The sample size should be about 1992"
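The sample-size formula used above, n = p(1 − p)(z/E)², can be written as a reusable function. This is a minimal sketch; the function name `sample_size_prop` and its arguments are hypothetical, and the result is rounded up because a sample size must be a whole number.

```r
# Hypothetical helper (not from the original): sample size needed to estimate
# a proportion within a given margin of error at a given confidence level.
sample_size_prop <- function(p, margin, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # two-sided critical value
  ceiling(p * (1 - p) * (z / margin)^2)  # round up to a whole unit
}

# Defective-items problem: planning value p = 0.08, margin 0.01, 90% confidence
n_needed <- sample_size_prop(p = 0.08, margin = 0.01, conf.level = 0.90)
print(n_needed)
```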

Suppose that a local TV station in a city wants to conduct a survey to estimate support for the presidential candidate within 2% error with 95% confidence.

How many people should the station survey if they have no information on the support level?
Suppose they have an initial estimate that 60% of the people in the city support the economic policies of the president. How many people should the station survey?

(qnorm(1-0.05/2)^2)*0.5*0.5/(0.02^2)

## [1] 2400.912

print("With no information on the support level, use the conservative value p = 0.5, which maximizes p*(1-p). The station should survey about 2401 people.")

## [1] "With no information on the support level, use the conservative value p = 0.5, which maximizes p*(1-p). The station should survey about 2401 people."

(qnorm(1-0.05/2)^2)*0.6*0.4/(0.02^2)

## [1] 2304.875

print("The station should survey about 2305 people.")

## [1] "The station should survey about 2305 people."
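The required sample size depends on the planning value of p only through the product p(1 − p). As an illustration, the sketch below tabulates the required n over a grid of planning values for a 2% margin at 95% confidence; the grid and variable names are assumptions made for this example.

```r
z <- qnorm(1 - 0.05 / 2)   # 95% confidence
margin <- 0.02             # 2% margin of error

# Grid of planning values for the unknown proportion (illustrative choice)
p_grid <- seq(0.1, 0.9, by = 0.1)
n_required <- ceiling(z^2 * p_grid * (1 - p_grid) / margin^2)

print(data.frame(p = p_grid, n = n_required))

# Planning value that demands the largest sample
p_worst <- p_grid[which.max(n_required)]
print(p_worst)
```

The largest requirement occurs at p = 0.5, which is why that value is the usual conservative choice when nothing is known about the support level.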

A drug is suspected of causing an elevated heart rate in a certain group of high-risk patients. Twenty patients from the group were given the drug. The changes in heart rates were found to be as follows.

-1 8 5 10 2 12 7 9 1 3 4 6 4 12 11 2 -1 10 2 8

Construct a 95% confidence interval for the variance of change in heart rate. Assume that the population has a normal distribution and interpret.

x <- c(-1, 8, 5, 10, 2, 12, 7, 9, 1, 3, 4, 6, 4, 12, 11, 2, -1, 10, 2,8)
chi1 = qchisq(1-0.05/2,19)
chi2 = qchisq(0.05/2,19)
v1 = 19*var(x)/chi1
v2 = 19*var(x)/chi2
cat("We can state with 95% confidence that the population variance lies within",v1,"and",v2)

## We can state with 95% confidence that the population variance lies within 10.1728 and 37.52309
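The chi-square interval for a variance, ((n − 1)s²/χ²_upper, (n − 1)s²/χ²_lower), can be packaged so that the degrees of freedom are always derived from the data. This is a minimal sketch assuming a normal population; the function name `var_interval` is hypothetical.

```r
# Hypothetical helper (not from the original): chi-square confidence interval
# for a population variance, assuming the data come from a normal population.
var_interval <- function(x, conf.level = 0.95) {
  df <- length(x) - 1      # degrees of freedom taken from the sample size
  alpha <- 1 - conf.level
  ss <- df * var(x)        # (n - 1) * s^2
  c(lower = ss / qchisq(1 - alpha / 2, df),
    upper = ss / qchisq(alpha / 2, df))
}

# Heart-rate changes from the problem statement
hr <- c(-1, 8, 5, 10, 2, 12, 7, 9, 1, 3, 4, 6, 4, 12, 11, 2, -1, 10, 2, 8)
v_ci <- var_interval(hr, conf.level = 0.95)

print(v_ci)         # interval for the variance
print(sqrt(v_ci))   # interval for the standard deviation
```

Taking square roots of the endpoints gives the corresponding interval for the standard deviation, since the square root is an increasing function.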

The rates of return (rounded to the nearest percentage) for 25 clients of a financial firm are given in the following table.

13 11 28 6 -4 15 13 6 11 11 3 12 20 3 16 16 15 8 20 15 4 1 12 2 -9

Find a 98% confidence interval for the variance of rates of return.
Use this to find the confidence interval for the population standard deviation, σ.

x <- c(13, 11, 28, 6, -4, 15, 13, 6, 11, 11, 3, 12, 20, 3, 16, 16, 15, 8, 20, 15, 4, 1, 12, 2, -9)
# n = 25 observations, so the chi-square quantiles use n - 1 = 24 degrees of freedom
n = length(x)
chi1 = qchisq(1-0.02/2,n-1)
chi2 = qchisq(0.02/2,n-1)
v1 = (n-1)*var(x)/chi1
v2 = (n-1)*var(x)/chi2
cat("We can state with 98% confidence that the population variance lies between",round(v1,1),"and",round(v2,1),"and the standard deviation lies between",round(sqrt(v1),2),"and",round(sqrt(v2),2))

## We can state with 98% confidence that the population variance lies between 36.6 and 144.8 and the standard deviation lies between 6.05 and 12.03

Aritra Halder

2/16/2021