This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
The overall learning objective of Chapter 8 is to help you understand estimating parameters of single populations, thereby enabling you to:
1. Estimate the population mean with a known population standard deviation using the z statistic, correcting for a finite population if necessary.
2. Estimate the population mean with an unknown population standard deviation using the t statistic and properties of the t distribution.
3. Estimate a population proportion using the z statistic.
4. Use the chi-square distribution to estimate the population variance given the sample variance.
5. Determine the sample size needed in order to estimate the population mean and population proportion.
\(z=\dfrac{\bar x-\mu}{\dfrac{\sigma}{\sqrt{n}}}\)
so
\(\mu=\bar x-z\dfrac{\sigma}{\sqrt n}\)
Because z can be positive or negative, the preceding expression takes the following form:
\(\bar x\pm z\dfrac{\sigma}{\sqrt n}\)
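Using the critical value \(z_{\alpha/2}\), which leaves area \(\alpha/2\) in each tail, the \(100(1-\alpha)\%\) confidence interval to estimate μ (σ known) is
\(\bar x \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt n}\)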
or
\(\bar x-z_{\alpha/2}\dfrac{\sigma}{\sqrt n}\le\mu\le\bar x+z_{\alpha/2}\dfrac{\sigma}{\sqrt n}\)
where
\(\alpha=\) the area under the normal curve outside the confidence interval, and \(\alpha/2=\) the area in one end (tail) of the distribution outside the confidence interval.
A point estimate is a statistic taken from a sample that is used to estimate a population parameter.
An interval estimate (confidence interval) is a range of values within which the analyst can declare, with some confidence, the population parameter lies.
The margin of error of the interval is the distance between the statistic computed to estimate a parameter and the parameter. The margin of error takes into account the desired level of confidence, the sample size, and the standard deviation.
When the margin of error is added to the point estimate, the result is the upper bound of the confidence interval. When the margin of error is subtracted from the point estimate, the result is the lower bound of the confidence interval.
For the company researcher's sample of 85 bills, a 95% confidence level indicates that, if the researcher were to randomly select 100 samples of 85 bills and use the results of each sample to construct a 95% confidence interval, approximately 95 of the 100 intervals would contain the population mean and approximately 5% would not.
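The repeated-sampling interpretation can be checked by simulation. The sketch below is illustrative only: it assumes a hypothetical normal population with μ = 50 and σ = 10 (values not from the text), draws 100 samples of 85, builds a 95% z interval from each, and counts how many intervals contain μ. The exact count depends on the random seed but should be close to 95.

```r
# Illustrative simulation of the "about 95 of 100 intervals" interpretation.
# Assumption (hypothetical): normal population with mu = 50 and sigma = 10.
set.seed(1)
mu <- 50; sigma <- 10; n <- 85; reps <- 100
z <- qnorm(0.975)                      # z_{alpha/2} for 95% confidence
margin.error <- z * sigma / sqrt(n)    # same margin for every sample (sigma known)
covered <- replicate(reps, {
  xbar <- mean(rnorm(n, mean = mu, sd = sigma))
  (xbar - margin.error <= mu) && (mu <= xbar + margin.error)
})
sum(covered)   # number of the 100 intervals that contain mu
```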
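When sampling without replacement from a finite population of size N, and n is a sizable fraction of N, the margin of error is multiplied by the finite correction factor \(\sqrt{\dfrac{N-n}{N-1}}\):
\(\bar x-z_{\alpha/2}\dfrac{\sigma}{\sqrt n}\sqrt{\dfrac{N-n}{N-1}}\le\mu\le\bar x+z_{\alpha/2}\dfrac{\sigma}{\sqrt n}\sqrt{\dfrac{N-n}{N-1}}\)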
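A quick sketch of how much the finite correction factor \(\sqrt{(N-n)/(N-1)}\) shrinks the margin of error. The helper name `fpc` is hypothetical; the first call uses N = 500 and n = 47 from the exercises that follow.

```r
# Finite population correction factor sqrt((N - n) / (N - 1)).
fpc <- function(N, n) sqrt((N - n) / (N - 1))

fpc(500, 47)   # about 0.953: the margin of error shrinks by roughly 5%
fpc(1e6, 47)   # about 1: the correction is negligible for a huge population
```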
Use the following information to construct the confidence intervals specified to estimate μ.
95% confidence for \(\bar x=25,\sigma=3.5,\) and \(n=60\)
get_ci <- function(xbar=25, sigma=3.5, n=60, pct=.95, dec=2, N=Inf){
z = qnorm(pct/2+.5)                  # z_{alpha/2}, e.g. 1.96 for 95% confidence
margin.error = z * sigma / n^.5      # z * sigma / sqrt(n)
if (is.finite(N)){                   # finite population correction when N is supplied
margin.error = margin.error * ((N-n)/(N-1))^.5
}
low = round(xbar - margin.error, dec)
high = round(xbar + margin.error, dec)
print(paste0(low, " <= mu <= ", high))
}
get_ci()
## [1] "24.11 <= mu <= 25.89"
get_ci(xbar=119.6, sigma = 23.89, n=75, pct=.98)
## [1] "113.18 <= mu <= 126.02"
get_ci(xbar=3.419, sigma = .974, n=32, pct=.90, dec=3)
## [1] "3.136 <= mu <= 3.702"
get_ci(xbar=56.7, sigma = 12.1, N=500, n=47, pct=.8)
## [1] "54.54 <= mu <= 58.86"
A random sample of 81 items is taken, producing a sample mean of 47. The population standard deviation is 5.89. Construct a 90% confidence interval to estimate the population mean.
get_ci(xbar=47, sigma = 5.89, n=81, pct=.9)
## [1] "45.92 <= mu <= 48.08"
A random sample of size 39 is taken from a population of 200 members. The sample mean is 66 and the population standard deviation is 11. Construct a 96% confidence interval to estimate the population mean. What is the point estimate of the population mean?
get_ci(xbar=66, sigma = 11, N=200, n=39, pct=.96)
## [1] "62.75 <= mu <= 69.25"
A small lawnmower company produced 1,500 lawnmowers in 2005. In an effort to determine how maintenance-free these units were, the company decided to conduct a multiyear study of the 2005 lawnmowers. A sample of 200 owners of these lawnmowers was drawn randomly from company records and contacted. The owners were given an 800 number and asked to call the company when the first major repair was required for the lawnmowers. Owners who no longer used the lawnmower to cut their grass were disqualified. After many years, 187 of the owners had reported. The other 13 disqualified themselves. The average number of years until the first major repair was 5.3 for the 187 owners reporting. It is believed that the population standard deviation was 1.28 years. If the company wants to advertise an average number of years of repair-free lawn mowing for this lawnmower, what is the point estimate? Construct a 95% confidence interval for the average number of years until the first major repair.
get_ci(xbar=5.3, sigma = 1.28, N=1500, n=187, pct=.95)
## [1] "5.13 <= mu <= 5.47"
A community health association is interested in estimating the average number of maternity days women stay in the local hospital. A random sample is taken of 36 women who had babies in the hospital during the past year. The following numbers of maternity days each woman was in the hospital are rounded to the nearest day. Assuming a population standard deviation of 1.17 days, construct a 98% confidence interval to estimate the average number of maternity days.
days <- c(3, 3, 4, 3, 2, 5, 3, 1, 4, 3, 4, 2, 3, 5, 3, 2, 4, 3, 2, 4, 1, 6, 3, 4, 3, 3, 5, 2, 3, 2, 3, 5, 4, 3, 5, 4)
get_ci(xbar=mean(days), sigma = 1.17, n=36, pct=.98, dec=3)
## [1] "2.852 <= mu <= 3.759"
According to the U.S. Census Bureau, the average travel time to work in Philadelphia is 27.4 minutes. Suppose a business researcher wants to estimate the average travel time to work in Cleveland using a 95% level of confidence. A random sample of 45 Cleveland commuters is taken and the travel time to work is obtained from each. The data follow. Assuming a population standard deviation of 5.124, compute a 95% confidence interval on the data. What is the point estimate and what is the error of the interval? Explain what these results mean in terms of Philadelphia commuters.
data.8.11 <- c(27, 25, 19, 21, 24, 27, 29, 34, 18, 29, 16, 28, 20, 32, 27, 28, 22, 20, 14, 15, 29, 28, 29, 33, 16, 29, 28, 28, 27, 23, 27, 20, 27, 25, 21, 18, 26, 14, 23, 27, 27, 21, 25, 28, 30)
get_ci(xbar=mean(data.8.11), sigma = 5.124, n=45, pct=.95, dec=3)
## [1] "23.036 <= mu <= 26.03"
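The question also asks for the point estimate and the error of the interval. A short sketch computing both directly from the Cleveland data, using σ = 5.124 as given in the problem:

```r
# Cleveland commute times (minutes), as listed above
data.8.11 <- c(27, 25, 19, 21, 24, 27, 29, 34, 18, 29, 16, 28, 20, 32, 27, 28, 22, 20, 14, 15, 29, 28, 29,
               33, 16, 29, 28, 28, 27, 23, 27, 20, 27, 25, 21, 18, 26, 14, 23, 27, 27, 21, 25, 28, 30)
xbar <- mean(data.8.11)                                # point estimate of mu
E <- qnorm(0.975) * 5.124 / sqrt(length(data.8.11))    # error (margin of error) of the interval
round(c(point.estimate = xbar, error = E), 3)          # 24.533 and 1.497
```

Philadelphia's reported 27.4 minutes lies above the upper bound of about 26.03, which suggests, under the stated assumptions, that the average Cleveland commute is shorter.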
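When σ is unknown and the population is normally distributed, the sample standard deviation s replaces σ and the statistic follows the t distribution: \(t=\dfrac{\bar x-\mu}{\dfrac{s}{\sqrt n}}\)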
\(\bar x\pm t_{\alpha/2,n-1}\dfrac{s}{\sqrt n}\)
\(\bar x-t_{\alpha/2,n-1}\dfrac{s}{\sqrt n}\le\mu\le\bar x+t_{\alpha/2,n-1}\dfrac{s}{\sqrt n}\)
\(df=n-1\)
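One property of the t distribution is that its critical values exceed the corresponding z value for small samples and approach it as the degrees of freedom grow. A small sketch comparing \(t_{.025,df}\) with \(z_{.025}\); the df values are chosen arbitrarily for illustration (df = 12 matches the 13-value exercise below).

```r
# Two-tailed 95% critical values: t_{.025, df} versus z_{.025}
df <- c(5, 12, 24, 40, 100, 1000)
t.crit <- qt(0.975, df)
z.crit <- qnorm(0.975)
round(rbind(df = df, t = t.crit, z = z.crit), 3)   # z.crit is recycled across columns
```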
Suppose the following data are selected randomly from a population of normally distributed values. Construct a 95% confidence interval to estimate the population mean.
v <- c(40,51,43,48,44,57,54,39,42,48,45,39,43)
get_ci.t <- function(v=NULL, xbar=NULL, s=NULL, n=NULL, p=.95, dec=2, N=Inf){
# use summary statistics when they are supplied, otherwise compute them from the raw data v
if (is.null(xbar)) {xbar = mean(v)}
if (is.null(n)) {n = length(v)}
if (is.null(s)) {s = sd(v)}
df = n-1
t.val = qt((p/2+.5), df)             # t_{alpha/2, n-1}
margin.error = t.val * s / n^.5
if (is.finite(N)){                   # finite population correction when N is supplied
margin.error = margin.error * ((N-n)/(N-1))^.5
}
low = round(xbar - margin.error, dec)
high = round(xbar + margin.error, dec)
print(paste0(low, " <= mu <= ", high, ", point estimate is: ", round(xbar, dec)))
}
get_ci.t(v)
## [1] "42.17 <= mu <= 49.06, point estimate is: 45.62"
get_ci.t(xbar=128.4, s=20.6, n=41, p=.98)
## [1] "120.6 <= mu <= 136.2, point estimate is: 128.4"
v <- c(16.4,17.1,17.0,15.6,16.2,14.8,16.0,15.6,17.3,17.4,15.6,15.7,17.2,16.6,16.0,15.3,15.4,16.0,15.8,17.2,14.6,15.5,14.9,16.7,16.3)
get_ci.t(v, p=.99, dec=3)
## [1] "15.631 <= mu <= 16.545, point estimate is: 16.088"
A valve manufacturer produces a butterfly valve composed of two semicircular plates on a common spindle that is used to permit flow in one direction only. The semicircular plates are supplied by a vendor with specifications that the plates be 2.37 millimeters thick and have a tensile strength of five pounds per millimeter. A random sample of 20 such plates is taken. Electronic calipers are used to measure the thickness of each plate; the measurements are given here. Assuming that the thicknesses of such plates are normally distributed, use the data to construct a 95% confidence interval for the population mean thickness of these plates. What is the point estimate? What is the error of the interval?
v <- c(2.4066,2.4579,2.6724,2.1228,2.3238,2.1328,2.0665,2.2738,2.2055,2.5267,2.5937,2.1994,2.5392,2.4359,2.2146,2.1933,2.4575,2.7956,2.3353,2.2699)
get_ci.t(v, p=.95, dec=5)
## [1] "2.26886 <= mu <= 2.45346, point estimate is: 2.36116"
print(paste0("The error of the interval is: ", round(2.45346 - 2.36116, 5)))
## [1] "The error of the interval is: 0.0923"
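Rather than subtracting the printed (rounded) bounds, the error of the interval can be computed directly as \(t_{.025,19}\,s/\sqrt{20}\). A sketch using the same thickness data:

```r
# Plate thickness measurements (mm), as listed above
v <- c(2.4066,2.4579,2.6724,2.1228,2.3238,2.1328,2.0665,2.2738,2.2055,2.5267,
       2.5937,2.1994,2.5392,2.4359,2.2146,2.1933,2.4575,2.7956,2.3353,2.2699)
n <- length(v)
E <- qt(0.975, df = n - 1) * sd(v) / sqrt(n)   # error of the 95% t interval
round(E, 5)                                    # about 0.0923
```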
The marketing director of a large department store wants to estimate the average number of customers who enter the store every five minutes. She randomly selects five-minute intervals and counts the number of arrivals at the store. She obtains the figures 58, 32, 41, 47, 56, 80, 45, 29, 32, and 78. The analyst assumes the number of arrivals is normally distributed. Using these data, the analyst computes a 95% confidence interval to estimate the mean value for all five-minute intervals. What interval values does she get?
v <- c(58, 32, 41, 47, 56, 80, 45, 29, 32, 78)
get_ci.t(v, p=.95)
## [1] "36.77 <= mu <= 62.83, point estimate is: 49.8"
How much experience do supply-chain transportation managers have in their field? Suppose in an effort to estimate this, 41 supply-chain transportation managers are surveyed and asked how many years of managerial experience they have in transportation. Survey results (in years) are shown below. Use these data to construct a 99% confidence interval to estimate the mean number of years of experience in transportation. Assume that years of experience in transportation is normally distributed in the population.
v <- c(5,8,10,21,20,25,14,6,19,3,1,9,11,2,3,13,2,4,9,4,5,4,21,7,6,3,28,17,32,2,25,8,13,17,27,7,3,15,4,16,6)
get_ci.t(v, p=.99)
## [1] "7.53 <= mu <= 14.66, point estimate is: 11.1"
\(z=\dfrac{\hat p-p}{\sqrt{\dfrac{p\cdot q}{n}}}\) where \(q=1-p\) and \(n\cdot p>5,n\cdot q>5\)
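Because p is unknown when it is being estimated, the sample proportion \(\hat p\) is substituted for p in the standard error: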
\(z=\dfrac{\hat p-p}{\sqrt{\dfrac{\hat p\cdot\hat q}{n}}}\)
where
\(\hat q=1-\hat p\)
\(\hat p-z_{\alpha/2}\sqrt{\dfrac{\hat p\cdot\hat q}{n}}\le p\le\hat p+z_{\alpha/2}\sqrt{\dfrac{\hat p\cdot\hat q}{n}}\)
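The z interval for a proportion relies on the conditions \(n\cdot p>5\) and \(n\cdot q>5\). Since p is unknown, a common practical check, assumed here, substitutes \(\hat p\). The helper `np.ok` and the second test case are hypothetical:

```r
# TRUE when both n * phat and n * (1 - phat) exceed 5
np.ok <- function(n, phat) (n * phat > 5) && (n * (1 - phat) > 5)

np.ok(44, 0.51)   # TRUE: 22.44 and 21.56 both exceed 5
np.ok(20, 0.1)    # FALSE (hypothetical case): n * phat = 2
```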
Use the information about each of the following samples to compute the confidence interval to estimate p.
get.z.p <- function(n=44, phat=.51, ci=.9, dec=3){
qhat = 1 - phat
p.ci = .5 + ci/2
z = qnorm(p.ci)
me = z * ((phat * qhat)/n)^.5
low = round(phat - me, dec)
high = round(phat + me, dec)
print(paste0(low, " <= mu <= ", high, ", point estimate is: ", round(phat, dec)))
}
get.z.p()
## [1] "0.386 <= mu <= 0.634, point estimate is: 0.51"
get.z.p(n=300, phat=.82, ci=.95)
## [1] "0.777 <= mu <= 0.863, point estimate is: 0.82"
get.z.p(n=1150, phat=.48, ci=.9)
## [1] "0.456 <= mu <= 0.504, point estimate is: 0.48"
get.z.p(n=95, phat=.32, ci=.88)
## [1] "0.246 <= mu <= 0.394, point estimate is: 0.32"
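Suppose a random sample of 85 items has been taken from a population and 40 of the items contain the characteristic of interest. Use this information to construct 90%, 95%, and 99% confidence intervals to estimate the proportion of the population that has the characteristic of interest.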
get.z.p(n=85, phat=40/85, ci=.9, dec=2)
## [1] "0.38 <= mu <= 0.56, point estimate is: 0.47"
get.z.p(n=85, phat=40/85, ci=.95, dec=2)
## [1] "0.36 <= mu <= 0.58, point estimate is: 0.47"
get.z.p(n=85, phat=40/85, ci=.99, dec=2)
## [1] "0.33 <= mu <= 0.61, point estimate is: 0.47"
According to the Stern Marketing Group, 9 out of 10 professional women say that financial planning is more important today than it was five years ago. Where do these women go for help in financial planning? Forty-seven percent use a financial advisor (broker, tax consultant, financial planner). Twenty-eight percent use written sources such as magazines, books, and newspapers. Suppose these figures were obtained by taking a sample of 560 professional women who said that financial planning is more important today than it was five years ago.
Construct a 95% confidence interval for the proportion of professional women who use a financial advisor. Use the percentage given in this problem as the point estimate.
get.z.p(n=560, phat=.47, ci=.95, dec=4)
## [1] "0.4287 <= mu <= 0.5113, point estimate is: 0.47"
Construct a 90% confidence interval for the proportion of professional women who use written sources. Use the percentage given in this problem as the point estimate.
get.z.p(n=560, phat=.28, ci=.9, dec=4)
## [1] "0.2488 <= mu <= 0.3112, point estimate is: 0.28"
The highway department wants to estimate the proportion of vehicles on Interstate 25 between the hours of midnight and 5:00 a.m. that are 18-wheel tractor trailers. The estimate will be used to determine highway repair and construction considerations and in highway patrol planning. Suppose researchers for the highway department counted vehicles at different locations on the interstate for several nights during this time period. Of the 3,481 vehicles counted, 927 were 18-wheelers.
Determine the point estimate for the proportion of vehicles traveling Interstate 25 during this time period that are 18-wheelers. Construct a 99% confidence interval for the proportion of vehicles on Interstate 25 during this time period that are 18-wheelers.
phat=927/3481
get.z.p(n=3481, phat=phat, ci=.99)
## [1] "0.247 <= mu <= 0.286, point estimate is: 0.266"
According to Runzheimer International, in a survey of relocation administrators 63% of all workers who rejected relocation offers did so for family considerations. Suppose this figure was obtained by using a random sample of the files of 672 workers who had rejected relocation offers. Use this information to construct a 95% confidence interval to estimate the population proportion of workers who reject relocation offers for family considerations.
get.z.p(n=672, phat=.63, ci=.95, dec=4)
## [1] "0.5935 <= mu <= 0.6665, point estimate is: 0.63"
\(s^2=\dfrac{\sum(x-\bar x)^2}{n-1}\)
\(\chi^2=\dfrac{(n-1)s^2}{\sigma^2}\)
\(df=n-1\)
The chi-square distribution is not symmetrical, and its shape will vary according to the degrees of freedom. Figure 8.13 shows the shape of chi-square distributions for three different degrees of freedom.
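The confidence interval to estimate the population variance is
\(\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}\le\sigma^2\le\dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\)
where \(\chi^2_{\alpha/2}\) and \(\chi^2_{1-\alpha/2}\) are the chi-square values with areas \(\alpha/2\) and \(1-\alpha/2\) to their right.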
\(df=n-1\)
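Because the chi-square distribution is skewed, the variance interval is not symmetric about \(s^2\). The sketch below recomputes the first exercise (n = 12, \(s^2 = 44.9\), 99% confidence) step by step; `qchisq(..., lower.tail = FALSE)` returns the value with area \(\alpha/2\) to its right.

```r
n <- 12; s2 <- 44.9; alpha <- 0.01
df <- n - 1
chi.upper <- qchisq(alpha / 2, df, lower.tail = FALSE)  # chi^2_{alpha/2}
chi.lower <- qchisq(alpha / 2, df)                      # chi^2_{1-alpha/2}
low  <- df * s2 / chi.upper    # about 18.46
high <- df * s2 / chi.lower    # about 189.73
c(below = s2 - low, above = high - s2)   # far more room above s^2 than below
```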
For each of the following sample results, construct the requested confidence interval. Assume the data come from normally distributed populations.
chi <- function(n=12, s2=44.9, ci=.99, dec=2){
alpha = 1-ci
p.ci = alpha/2
df = n - 1
val1 = df * s2                                                  # (n-1) s^2
low = round(val1/qchisq(p.ci, df, lower.tail = FALSE), dec)     # divide by chi^2_{alpha/2}
high = round(val1/qchisq(p.ci, df), dec)                        # divide by chi^2_{1-alpha/2}
print(paste0(low, " <= sigma^2 <= ", high))
}
chi()
## [1] "18.46 <= sigma^2 <= 189.73"
chi(n=7, s2=1.24^2, ci=.95)
## [1] "0.64 <= sigma^2 <= 7.46"
chi(n=20, s2=32^2, ci=.9)
## [1] "645.45 <= sigma^2 <= 1923.1"
chi(n=17, s2=18.56, ci=.8)
## [1] "12.61 <= sigma^2 <= 31.89"
The Interstate Conference of Employment Security Agencies says the average workweek in the United States is down to only 35 hours, largely because of a rise in part-time workers. Suppose this figure was obtained from a random sample of 20 workers and that the standard deviation of the sample was 4.3 hours. Assume hours worked per week are normally distributed in the population. Use this sample information to develop a 98% confidence interval for the population variance of the number of hours worked per week for a worker.
chi(n=20, s2=4.3^2, ci=.98)
## [1] "9.71 <= sigma^2 <= 46.03"
What is the point estimate? The point estimate of the population variance is the sample variance, \(s^2=4.3^2=18.49\).
A manufacturing plant produces steel rods. During one production run of 20,000 such rods, the specifications called for rods that were 46 centimeters in length and 3.8 centimeters in width. Fifteen of these rods comprising a random sample were measured for length; the resulting measurements are shown here. Use these data to estimate the population variance of length for the rods. Assume rod length is normally distributed in the population. Construct a 99% confidence interval. Discuss the ramifications of the results.
v <- c(44,47,43,46,46,45,43,44,47,46,48,48,43,44,45)
chi(n=length(v), s2=var(v), ci=.99)
## [1] "1.37 <= sigma^2 <= 10.54"
Suppose a random sample of 14 people 30-39 years of age produced the household incomes shown here. Use these data to determine a point estimate for the population variance of household incomes for people 30-39 years of age and construct a 95% confidence interval. Assume household income is normally distributed.
v <- c(37500, 44800, 33500, 36900, 42300, 32400, 28000, 41200, 46600, 38500, 40200, 32000, 35500, 36800)
chi(n=14, s2=var(v), ci=.95)
## [1] "14084035.72 <= sigma^2 <= 69553702.47"
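The problem also asks for a point estimate of the population variance, which is the sample variance \(s^2\). A sketch using the same household incomes:

```r
# Household incomes of the 14 sampled people aged 30-39, as listed above
v <- c(37500, 44800, 33500, 36900, 42300, 32400, 28000, 41200, 46600, 38500, 40200, 32000, 35500, 36800)
s2 <- var(v)   # point estimate of sigma^2, about 26798242
s2
sqrt(s2)       # corresponding estimate of sigma
```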
\(z=\dfrac{\bar x-\mu}{\dfrac{\sigma}{\sqrt n}}\)
Let \(E=\bar x-\mu\) be the margin of error of estimation, so that
\(z=\dfrac{E}{\dfrac{\sigma}{\sqrt n}}\)
Solving for n yields
\(n=\dfrac{z^2_{\alpha/2}\sigma^2}{E^2}=\left(\dfrac{z_{\alpha/2}\sigma}{E}\right)^2\)
Sometimes in estimating sample size the population variance is known or can be determined from past studies. Other times, the population variance is unknown and must be estimated to determine the sample size. In such cases, it is acceptable to use the following estimate to represent σ: \(\sigma\approx\dfrac{1}{4}(\text{range})\)
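A sketch of the range rule inside a sample-size calculation. All inputs are hypothetical: data expected to span about 40 units, a desired error of E = 2, and 95% confidence.

```r
# Hypothetical inputs: range of about 40 units, E = 2, 95% confidence
sigma.est <- 40 / 4                   # sigma is approximately range / 4
z <- qnorm(0.975)
E <- 2
n <- ceiling((z * sigma.est / E)^2)   # sample sizes are always rounded up
n                                     # 97
```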
\(z=\dfrac{\hat p-p}{\sqrt{\dfrac{p\cdot q}{n}}}\)
where
\(E=\hat p-p\)
\(z=\dfrac E{\sqrt{\dfrac{p\cdot q}n}}\)
Solving for n yields
\(n=\dfrac{z^2_{\alpha/2}pq}{E^2}\)
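The product pq is largest at p = .5, so using .5 when nothing is known about p (as in the appliance-store problem at the end of this section) gives the most conservative sample size. A quick check over a grid of p values:

```r
p <- seq(0, 1, by = 0.1)
pq <- p * (1 - p)
p[which.max(pq)]   # 0.5, where pq = 0.25
```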
Determine the sample size necessary to estimate p for the following information.
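get.n.p <- function(E=.02, p=.4, ci=.96, z.dec=2) {
alpha = 1-ci
z = round(qnorm(alpha/2), z.dec)           # z_{alpha/2} rounded like a table value; its sign vanishes when squared
return(ceiling(z^2 * p * (1-p) / E^2))     # n = z^2 p q / E^2, rounded up
}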
get.n.p()
## [1] 2522
get.n.p(E=.04, p=.5, ci=.95)
## [1] 601
get.n.p(E=.05, p=.55, ci=.9, z.dec = 3)
## [1] 268
The textbook answer is 16,577 because it uses \(z_{.005}=2.575\). Rounding z to four decimals (2.5758), as in the call below, gives 16,587.
get.n.p(E=.01, p=.5, ci=.99, z.dec = 4)
## [1] 16587
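A sketch of the rounding effect behind the textbook's 16,577. The helper `n.for` is hypothetical; it applies \(n=z^2pq/E^2\) with p = .5 and E = .01 for whichever z is passed in.

```r
# Sample size for a proportion, rounded up
n.for <- function(z, p = 0.5, E = 0.01) ceiling(z^2 * p * (1 - p) / E^2)

n.for(2.575)          # 16577: the textbook's table value of z
n.for(2.5758)         # 16587: z rounded to four decimals
n.for(qnorm(0.995))   # 16588: unrounded z
```

Small differences in z are magnified here because E = .01 makes the denominator tiny.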
Suppose you have been following a particular airline stock for many years. You are interested in determining the average daily price of this stock in a 10-year period and you have access to the stock reports for these years. However, you do not want to average all the daily prices over 10 years because there are several thousand data points, so you decide to take a random sample of the daily prices and estimate the average. You want to be 90% confident of your results, you want the estimate to be within $2.00 of the true average, and you believe the standard deviation of the price of this stock is about $12.50 over this period of time. How large a sample should you take?
get.n <- function(E=2, sigma=12.5, ci=.9, z.dec=2){
alpha = 1-ci
z = round(qnorm(alpha/2), z.dec)
return(ceiling((z*sigma/E)^2))
}
get.n()
## [1] 106
Suppose a production facility purchases a particular component part in large lots from a supplier. The production manager wants to estimate the proportion of defective parts received from this supplier. She believes the proportion defective is no more than .20 and wants to be within .02 of the true proportion of defective parts with a 90% level of confidence. How large a sample should she take?
get.n.p(E=.02, p=.2, ci=.9, z.dec = 3)
## [1] 1083
What proportion of shoppers at a large appliance store actually makes a large-ticket purchase? To estimate this proportion within 10% and be 95% confident of the results, how large a sample should you take? Assume you have no idea what proportion of all shoppers actually make a large-ticket purchase.
get.n.p(E=.1, p=.5, ci=.95)
## [1] 97