R03-R04 STA1511
I. Sampling Distributions
• A numerical descriptive measure that is calculated from a sample is called a statistic.
• Statistics vary from sample to sample.
• The sampling distribution of a statistic (not a parameter) is the distribution of the values that the statistic takes over all possible samples of the same size from the same population.
Sampling Distribution of a Sample Mean
Central Limit Theorem
Central Limit Theorem: If random samples of n observations are drawn from a nonnormal population with finite mean μ and standard deviation σ, then, when n is large, the sampling distribution of the sample mean (symbol: \(\bar{x}\)) is approximately normal, with mean μ and standard deviation \(\frac{\sigma}{\sqrt{n}}\) (the standard error of the mean). The approximation becomes more accurate as n becomes larger.
If the population is normal, then the sampling distribution of \(\bar{x}\) will also be normal, no matter what the sample size. When the population is approximately symmetric, the distribution becomes approximately normal for relatively small values of n. When the population is skewed, the sample size must be at least 30 before the sampling distribution of \(\bar{x}\) becomes approximately normal.
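The behaviour described by the Central Limit Theorem can be checked with a small simulation. The sketch below is illustrative only: the exponential population (rate 1, so μ = 1 and σ = 1), the sample size, the number of replications, and the seed are all assumptions chosen for the demonstration, not values from this handout.

```r
# Simulation sketch of the Central Limit Theorem.
# Assumed population: exponential with rate 1 (skewed; mu = 1, sigma = 1).
set.seed(1511)                 # arbitrary seed, for reproducibility only
n     <- 30                    # size of each sample
n_rep <- 10000                 # number of simulated samples

# One sample mean per simulated sample
xbars <- replicate(n_rep, mean(rexp(n, rate = 1)))

mean(xbars)                    # should be close to mu = 1
sd(xbars)                      # should be close to sigma / sqrt(n)
1 / sqrt(n)                    # theoretical standard error, sigma / sqrt(n)
```

Calling hist(xbars) afterwards gives a roughly bell-shaped histogram even though the population itself is strongly skewed.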
Finding Probabilities for the Sample Mean
Example
- A random sample of size n = 16 is drawn from a normal distribution with μ = 10 and σ = 8. Find the probability that \(\bar{x}>12\).
Answer:
xbar=12             # cutoff value for the sample mean
sd1=8/sqrt(16)      # standard error: sigma/sqrt(n)
miu=10              # population mean
pnorm(q = xbar, mean = miu, sd = sd1, lower.tail = FALSE)
## [1] 0.1586553
- A soda filling machine is supposed to fill cans of soda with 12 fluid ounces. Suppose that the fills are actually normally distributed with a mean of 12.1 oz and a standard deviation of 0.2 oz. What is the probability that the average fill for a 6-pack of soda is less than 12 oz (the probability of \(\bar{x}<12\))?
Answer:
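One possible way to work this out, following the same pattern as the previous example: because the fills are normal, \(\bar{x}\) for n = 6 cans is normal with mean 12.1 and standard error \(0.2/\sqrt{6}\). The variable names below are illustrative, not from the handout.

```r
# Sketch: P(xbar < 12) for a 6-pack, fills ~ Normal(12.1, 0.2)
mu    <- 12.1                  # population mean fill (oz)
sigma <- 0.2                   # population standard deviation (oz)
n     <- 6                     # cans in a 6-pack
se    <- sigma / sqrt(n)       # standard error of the sample mean

p_less_12 <- pnorm(q = 12, mean = mu, sd = se, lower.tail = TRUE)
p_less_12                      # roughly 0.11
```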
Sampling Distribution of a Sample Proportion
Example
- A member of the DPR got 52% of the vote in the previous election (last year). This year, he wants to gauge his popularity again. If his popularity has not changed, what is the probability that more than half of a sample of 300 voters will vote for him again?
Answer:
# normal approximation for the sample proportion (no continuity correction)
pnorm(q=0.50, 0.52, sd=sqrt(0.52*0.48/300),lower.tail = FALSE)
## [1] 0.755963
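The pnorm() call above relies on the normal approximation to the sampling distribution of the sample proportion, with mean p and standard deviation \(\sqrt{p(1-p)/n}\). As a rough cross-check, the sketch below compares it with the exact binomial probability P(X > 150) for X ~ Binomial(300, 0.52), and with a continuity-corrected approximation. The variable names are illustrative.

```r
# Sketch: exact binomial probability vs. normal approximation
n <- 300
p <- 0.52
se_phat <- sqrt(p * (1 - p) / n)     # standard error of the sample proportion

# Exact: P(X > 150) where X counts supporters among 300 voters
p_exact  <- pbinom(q = 150, size = n, prob = p, lower.tail = FALSE)

# Normal approximation as used above
p_approx <- pnorm(q = 0.50, mean = p, sd = se_phat, lower.tail = FALSE)

# Normal approximation with continuity correction (cutoff at 150.5 voters)
p_cc     <- pnorm(q = 150.5 / n, mean = p, sd = se_phat, lower.tail = FALSE)

c(exact = p_exact, approx = p_approx, approx_cc = p_cc)
```

The continuity-corrected value is expected to sit closer to the exact probability than the uncorrected one.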
- The soda bottler in the previous example claims that only 5% of the soda cans are underfilled. A quality control technician randomly samples 200 cans of soda. What is the probability that more than 10% of the cans are underfilled?
Answer:
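A possible approach, assuming the bottler's claim p = 0.05 is true and the normal approximation is acceptable (here np = 10 and n(1 − p) = 190): the sample proportion is approximately normal with mean 0.05 and standard error \(\sqrt{0.05 \times 0.95/200}\). The variable names are illustrative.

```r
# Sketch: P(p_hat > 0.10) when the true underfill rate is 5% and n = 200
p0      <- 0.05                        # claimed proportion of underfilled cans
n       <- 200                         # number of cans sampled
se_phat <- sqrt(p0 * (1 - p0) / n)     # standard error of the sample proportion

p_more_10 <- pnorm(q = 0.10, mean = p0, sd = se_phat, lower.tail = FALSE)
p_more_10                              # a very small probability
```

A probability this small suggests that observing more than 10% underfilled cans would cast doubt on the 5% claim.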
Exercise R03
Suppose a person's blood pressure readings have mean 160 and standard deviation 20 mm. If one takes n = 5 blood pressure readings, what is the probability that the average will be at most 150 (\(\bar{x}\le 150\))?
A factory manufactures 2000 DVDs every day. It is known that 3% of DVDs are faulty. Using a normal approximation, estimate the probability that at least 0.5% of the DVDs produced in one day are faulty.
II. Estimation
Point Estimate
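A point estimate is a single value, computed from a sample, that is used to estimate the value of a population parameter. The rule (statistic) that produces it is called a point estimator.
Properties of a Good Estimator:
Unbiased: the mean of its sampling distribution equals the parameter being estimated.
Minimum variability: among unbiased estimators, its sampling distribution has the smallest spread.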
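Both properties can be illustrated by simulation. The sketch below compares the sample mean and the sample median as estimators of the centre of a normal population. The population values (μ = 10, σ = 8), the sample size, the replication count, and the seed are assumptions made for the illustration.

```r
# Sketch: sample mean vs. sample median as estimators of mu (normal population)
set.seed(1511)                       # arbitrary seed, for reproducibility only
mu    <- 10
sigma <- 8
n     <- 16
n_rep <- 10000

# Each column of `samples` is one simulated sample of size n
samples <- replicate(n_rep, rnorm(n, mean = mu, sd = sigma))
means   <- colMeans(samples)
medians <- apply(samples, 2, median)

c(mean = mean(means), median = mean(medians))   # both close to mu: unbiased
c(mean = var(means),  median = var(medians))    # the mean varies less
```

For a normal population both estimators are centred on μ, but the sample mean has the smaller variance, which is why it is preferred here.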
Example :
Researchers are interested in the effect of a certain nutrient on the growth rate of plant seedlings. Using a hydroponics growth procedure that used water containing the nutrient, they planted six tomato plants and recorded the heights of each plant 14 days after germination. Those heights, measured in millimeters, were as follows: 55.5, 60.3, 60.6, 62.1, 65.5, 69.2. Find a point estimate of the population mean height of this variety of seedling 14 days after germination.
growth_rate <- c(55.5, 60.3, 60.6, 62.1, 65.5, 69.2)
(mean_growthrate <- mean(growth_rate))
## [1] 62.2
Interval Estimate (Confidence Interval)
Interval Estimator (Confidence Interval) is an interval used to estimate the value of a population parameter.
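The level of confidence (1−α) is the probability that the interval-construction procedure produces an interval containing the population parameter; that is, in repeated sampling, about 100(1−α)% of the intervals constructed this way contain the parameter.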
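The repeated-sampling meaning of the confidence level can be seen in a simulation. The sketch below draws many samples from an assumed normal population (μ = 10, σ = 8, n = 16; all illustrative choices), builds the known-σ interval \(\bar{x}\pm z_{\alpha/2}\sigma/\sqrt{n}\) for each, and records how often the interval contains μ.

```r
# Sketch: empirical coverage of 95% confidence intervals (sigma known)
set.seed(1511)                       # arbitrary seed, for reproducibility only
mu    <- 10
sigma <- 8
n     <- 16
n_rep <- 10000
z     <- qnorm(0.975)                # z_{alpha/2} for 95% confidence
moe   <- z * sigma / sqrt(n)         # margin of error (same for every sample)

covered <- replicate(n_rep, {
  xbar <- mean(rnorm(n, mean = mu, sd = sigma))
  (xbar - moe <= mu) && (mu <= xbar + moe)   # TRUE if the interval contains mu
})

mean(covered)                        # proportion of intervals containing mu
```

The proportion should be close to 0.95, although any single interval either contains μ or does not.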
1. Confidence Interval for the Mean of One Population
100(1−α)% confidence interval for the population mean μ:
- For a Large Sample or Known σ (when σ is unknown and n is large, s is used in place of σ)
Formula:
\(\bar{x}\pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\)
- For a Small Sample with Unknown σ (normal population), with df = n − 1
Formula:
\(\bar{x}\pm t_{\alpha/2,df} \frac{s}{\sqrt{n}}\)
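The two formulas above can be wrapped in small helper functions. This is a minimal sketch: the function names ci_mean_z and ci_mean_t are hypothetical (they are not part of base R), and the data are the seedling heights from the point-estimate example. The t interval is cross-checked against base R's t.test().

```r
# Sketch: helper functions for the z and t confidence intervals for a mean.
# ci_mean_z / ci_mean_t are illustrative names, not existing R functions.
ci_mean_z <- function(xbar, sigma, n, conf = 0.95) {
  alpha <- 1 - conf
  moe   <- qnorm(1 - alpha / 2) * sigma / sqrt(n)       # z_{alpha/2} * sigma/sqrt(n)
  c(lower = xbar - moe, upper = xbar + moe)
}

ci_mean_t <- function(xbar, s, n, conf = 0.95) {
  alpha <- 1 - conf
  moe   <- qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)  # t_{alpha/2, n-1} * s/sqrt(n)
  c(lower = xbar - moe, upper = xbar + moe)
}

# Seedling heights (mm) from the point-estimate example
growth_rate <- c(55.5, 60.3, 60.6, 62.1, 65.5, 69.2)
ci_t <- ci_mean_t(mean(growth_rate), sd(growth_rate), length(growth_rate))
ci_t
t.test(growth_rate)$conf.int         # base R gives the same t interval
```

With only six observations the t interval is the appropriate one; applying ci_mean_z with s in place of σ would give a narrower interval that understates the uncertainty.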
Example:
A random sample of 12 students of a certain school typed an average of 79.3 words per minute with a standard deviation of 7.8 words per minute. Assuming normal distribution for the number of words typed per minute, find a 95% confidence interval for the average number of words typed by all students of this school. [A 95% confidence interval for μ (small sample)]
\(\bar{x}\pm t_{\alpha/2,df} \frac{s}{\sqrt{n}}\)
mean1<-79.3
t_table<-qt(0.975,11,lower.tail = TRUE)
t_table
## [1] 2.200985
s_value<-7.8
n<-12
moe<-t_table*(s_value/sqrt(n))
cat(" Lower", mean1-moe,"\n","Upper:" ,mean1+moe)
## Lower 74.34412
## Upper: 84.25588
2. Confidence Interval for the Mean Difference between Two Populations
LARGE-SAMPLE CASE (n1 > 30 AND n2 > 30)
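- Interval Estimate with \(\sigma_{1}\) and \(\sigma_{2}\) Known
\((\bar{x}_{1}-\bar{x}_{2})\pm z_{\alpha/2} \sigma_{\bar{x}_{1}-\bar{x}_{2}}\), where \(\sigma_{\bar{x}_{1}-\bar{x}_{2}}=\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}\)
- Interval Estimate with \(\sigma_{1}\) and \(\sigma_{2}\) Unknown
\((\bar{x}_{1}-\bar{x}_{2})\pm z_{\alpha/2} s_{\bar{x}_{1}-\bar{x}_{2}}\), where \(s_{\bar{x}_{1}-\bar{x}_{2}}=\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}\)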
Example:
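Compare the average daily intake of dairy products of men and women using a 95% confidence interval. Could you conclude, based on this confidence interval, that there is a difference in the average daily intake of dairy products for men and women? The sample summaries used in the calculation below are: sample sizes 50 and 50, sample means 756 and 762, and sample standard deviations 35 and 30.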
mean_dif1<-756-762
n1<-50
n2<-50
s1<-35^2    # sample variance of group 1 (s1 squared)
s2<-30^2    # sample variance of group 2 (s2 squared)
z_table<-qnorm(0.975,lower.tail = TRUE)
z_table
## [1] 1.959964
s1_2<-sqrt(s1/n1+s2/n2)    # standard error of the difference in means
s1_2
## [1] 6.519202
moe<-z_table*s1_2
cat(" Lower", mean_dif1-moe,"\n","Upper:" ,mean_dif1+moe)
## Lower -18.7774
## Upper: 6.777402
The confidence interval contains 0, so \(\mu_{1}=\mu_{2}\) is a plausible value. There is therefore not sufficient evidence, at the 95% confidence level, to conclude that the average daily intake of dairy products differs between men and women.
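Checking whether 0 lies inside the 95% interval corresponds to a two-sided z test of \(\mu_{1}=\mu_{2}\) at the 5% significance level. The sketch below computes that test from the same summary values used in the example; the variable names are illustrative.

```r
# Sketch: two-sided large-sample z test matching the 95% interval above
mean_dif <- 756 - 762                      # difference in sample means
se_dif   <- sqrt(35^2 / 50 + 30^2 / 50)    # standard error of the difference
z_stat   <- mean_dif / se_dif              # test statistic under mu1 = mu2
p_value  <- 2 * pnorm(-abs(z_stat))        # two-sided p-value

c(z = z_stat, p_value = p_value)
```

The p-value is well above 0.05, which agrees with the interval containing 0: the data do not provide enough evidence of a difference.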
Exercise R04
A factory recorded the service life of its brake lights. In a sample of 44 brake lights, the average service life was 4900 days with a standard deviation of 220 days. Find a 95% confidence interval for the average service life of the brake lights.
Chemistry test scores for 50 female and 75 male students have averages of 76 and 86, respectively. Find the 90% confidence interval for the difference μ1 − μ2. Assume the population standard deviations for male and female students are 8 and 6.