R03-R04 STA1511

I. Sampling Distributions

• A numerical descriptive measure that is calculated from a sample is called a statistic.

• Statistics vary from sample to sample.

• The sampling distribution of a statistic (not a parameter) is the distribution of the values the statistic takes over all possible samples of the same size from the same population.

Sampling Distribution of a Sample Mean

Central Limit Theorem

Central Limit Theorem: If random samples of n observations are drawn from a nonnormal population with finite mean μ and standard deviation σ, then, when n is large, the sampling distribution of the sample mean (symbol: \(\bar{x}\)) is approximately normal, with mean μ and standard deviation \(\frac{\sigma}{\sqrt{n}}\). The approximation becomes more accurate as n becomes larger.

If the population is normal, then the sampling distribution of \(\bar{x}\) will also be normal, no matter what the sample size. When the population is approximately symmetric, the sampling distribution of \(\bar{x}\) becomes approximately normal for relatively small values of n. When the population is skewed, the sample size must be at least 30 (a common rule of thumb) before the sampling distribution of \(\bar{x}\) becomes approximately normal.

Illustration:
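
One way to see the theorem at work is a small simulation. The sketch below is only an illustration under assumed settings: the Exponential(rate = 1) population (skewed, with μ = 1 and σ = 1), the sample size n = 30, the number of repetitions, and the seed are all choices made here, not values from the course material.

```r
set.seed(1511)   # arbitrary seed, only so the run is reproducible
n    <- 30       # size of each sample (assumed for this sketch)
reps <- 10000    # number of simulated samples (assumed)

# Draw `reps` samples from a skewed population and keep each sample mean
xbars <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbars)      # should be close to mu = 1
sd(xbars)        # should be close to sigma / sqrt(n)
1 / sqrt(n)      # theoretical standard deviation of the sample mean
```

Calling hist(xbars) afterwards should show a roughly bell-shaped histogram even though the population itself is strongly skewed.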

Finding Probabilities for the Sample Mean

Example

A random sample of size n = 16 is drawn from a normal distribution with μ = 10 and σ = 8. Find the probability that \(\bar{x}>12\).

Answer:

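xbar=12            # value of the sample mean to evaluate
sd1=8/sqrt(16)     # standard deviation of the sample mean: sigma/sqrt(n)
miu=10             # population mean
# upper-tail probability P(xbar > 12)
pnorm(q = xbar, mean = miu, sd = sd1, lower.tail = FALSE)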

## [1] 0.1586553

A soda filling machine is supposed to fill cans of soda with 12 fluid ounces. Suppose that the fills are actually normally distributed with a mean of 12.1 oz and a standard deviation of 0.2 oz. What is the probability that the average fill for a 6-pack of soda is less than 12 oz (the probability of \(\bar{x}<12\))?

Answer:
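
A possible worked answer, offered as a sketch rather than the official solution: it assumes the six cans in a pack behave like a random sample of fills, so that \(\bar{x}\) is normal with mean 12.1 and standard deviation \(0.2/\sqrt{6}\). The variable names are illustrative.

```r
mu    <- 12.1              # mean fill in oz, from the problem statement
sigma <- 0.2               # standard deviation of a single fill in oz
n     <- 6                 # number of cans in a 6-pack
se    <- sigma / sqrt(n)   # standard deviation of the sample mean

# lower-tail probability P(xbar < 12)
p_under <- pnorm(q = 12, mean = mu, sd = se, lower.tail = TRUE)
p_under
```

Under these assumptions the probability comes out at roughly 0.11.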

Sampling Distribution of a Sample Proportion

Example

A member of the DPR received 52% of the vote in last year's election. This year, he wants to gauge his popularity again. If his popularity has not changed, what is the probability that more than half of a sample of 300 voters would vote for him again?

Answer:

# normal approximation to the sampling distribution of the sample proportion
pnorm(q=0.50, 0.52, sd=sqrt(0.52*0.48/300),lower.tail = FALSE)

## [1] 0.755963
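
The pnorm() call above relies on the normal approximation to the sampling distribution of the sample proportion. As a rough cross-check, the sketch below compares it with an exact binomial calculation, assuming the number of supporters in the sample follows a Binomial(300, 0.52) distribution. The names p_normal and p_exact are illustrative.

```r
n <- 300    # sample size
p <- 0.52   # assumed unchanged population proportion

# Normal approximation, as in the example: P(p_hat > 0.50)
p_normal <- pnorm(q = 0.50, mean = p, sd = sqrt(p * (1 - p) / n),
                  lower.tail = FALSE)

# Exact binomial: more than half of 300 voters means X > 150
p_exact <- pbinom(q = 150, size = n, prob = p, lower.tail = FALSE)

c(normal = p_normal, exact = p_exact)
```

The two values should be close but not identical, because the normal curve ignores the discreteness of the count.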

The soda bottler in the previous example claims that only 5% of the soda cans are underfilled. A quality control technician randomly samples 200 cans of soda. If the claim is true, what is the probability that more than 10% of the sampled cans are underfilled?

Answer:
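
A possible worked answer, again only a sketch: it assumes the bottler's claim p = 0.05 is true and uses the same normal approximation as the previous example. The names se_phat and p_over are illustrative.

```r
p <- 0.05                          # claimed proportion of underfilled cans
n <- 200                           # number of cans sampled
se_phat <- sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion

# upper-tail probability P(p_hat > 0.10)
p_over <- pnorm(q = 0.10, mean = p, sd = se_phat, lower.tail = FALSE)
p_over
```

The resulting probability is very small, so observing more than 10% underfilled cans would cast doubt on the 5% claim.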

Exercise R03

Suppose a person’s blood pressure readings have mean 160 and standard deviation 20 mm. If n = 5 blood pressure readings are taken, what is the probability that their average is at most 150?
A factory manufactures 2000 DVDs every day. It is known that 3% of DVDs are faulty. Using a normal approximation, estimate the probability that at least 0.5% of the DVDs produced in one day are faulty.

II. Estimation

Point Estimate

A point estimate is a single value, calculated from the sample, that is used to estimate a population parameter. The rule that produces it is called a point estimator.

Properties of a Good Estimator:

Unbiased
Minimum variability
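
The two properties can be illustrated with a small simulation. The sketch below compares the sample mean and the sample median as estimators of the centre of a normal population. The population (μ = 50, σ = 5), the sample size, the number of repetitions, and the seed are hypothetical values chosen only for this illustration.

```r
set.seed(1511)   # arbitrary seed for reproducibility
mu    <- 50      # hypothetical population mean
sigma <- 5       # hypothetical population standard deviation
n     <- 10      # size of each sample
reps  <- 5000    # number of simulated samples

# Each column of `samples` is one simulated sample of size n
samples <- replicate(reps, rnorm(n, mean = mu, sd = sigma))
means   <- apply(samples, 2, mean)
medians <- apply(samples, 2, median)

# Unbiased: both averages should be close to mu
c(mean_of_means = mean(means), mean_of_medians = mean(medians))
# Minimum variability: the estimator with the smaller variance is preferred
c(var_of_means = var(means), var_of_medians = var(medians))
```

For a normal population both estimators centre on μ, but the sample mean should show the smaller variance, which is why it is the preferred estimator in this setting.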

Example:

Researchers are interested in the effect of a certain nutrient on the growth rate of plant seedlings. Using a hydroponics growth procedure that used water containing the nutrient, they planted six tomato plants and recorded the heights of each plant 14 days after germination. Those heights, measured in millimeters, were as follows: 55.5, 60.3, 60.6, 62.1, 65.5, 69.2. Find a point estimate of the population mean height of this variety of seedling 14 days after germination.

growth_rate<- c(55.5, 60.3, 60.6, 62.1, 65.5, 69.2)
(mean_growthrate<-mean(growth_rate))

## [1] 62.2

Interval Estimate (Confidence Interval)

An interval estimator (confidence interval) is an interval used to estimate the value of a population parameter.

The level of confidence (1−α) is the probability that the interval-construction procedure, applied to repeated samples, produces an interval that contains the population parameter.

1. Confidence Interval for the Mean of One Population

100(1−α)% confidence interval for the population mean μ:

For Large Sample or Known σ

Formula:

\(\bar{x}\pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\)

For Small Sample with Unknown σ (Normal Population)

Formula:

\(\bar{x}\pm t_{\alpha/2,df} \frac{s}{\sqrt{n}}\), where df = n − 1
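
Both formulas can be wrapped in one small function. The sketch below is only an illustration: ci_mean() and its arguments are hypothetical names introduced here, not part of base R, and the values in the usage line are made up.

```r
# Hypothetical helper: confidence interval for a mean from summary statistics
# xbar  = sample mean
# s     = standard deviation (sigma if known, otherwise the sample s)
# n     = sample size
# use_t = TRUE for the t interval (small sample, unknown sigma)
ci_mean <- function(xbar, s, n, conf = 0.95, use_t = FALSE) {
  alpha <- 1 - conf
  crit <- if (use_t) {
    qt(1 - alpha / 2, df = n - 1)   # t critical value with df = n - 1
  } else {
    qnorm(1 - alpha / 2)            # z critical value
  }
  moe <- crit * s / sqrt(n)         # margin of error
  c(lower = xbar - moe, upper = xbar + moe)
}

# Usage with made-up values: large-sample z interval
ci_mean(xbar = 100, s = 15, n = 64)
```

Calling the function with use_t = TRUE and the summary statistics of the typing example that follows should reproduce the interval computed there.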

Example:

A random sample of 12 students of a certain school typed an average of 79.3 words per minute with a standard deviation of 7.8 words per minute. Assuming normal distribution for the number of words typed per minute, find a 95% confidence interval for the average number of words typed by all students of this school. [A 95% confidence interval for μ (small sample)]

\(\bar{x}\pm t_{\alpha/2,df} \frac{s}{\sqrt{n}}\)

mean1<-79.3
t_table<-qt(0.975,11,lower.tail = TRUE)
t_table

## [1] 2.200985

s_value<-7.8
n<-12
moe<-t_table*(s_value/sqrt(n))
cat(" Lower", mean1-moe,"\n","Upper:" ,mean1+moe)

##  Lower 74.34412 
##  Upper: 84.25588
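
When the raw observations are available, the same small-sample interval can be obtained with t.test(). The sketch below reuses the seedling heights from the point-estimate example and assumes, which that example does not state, that the heights are approximately normally distributed.

```r
growth_rate <- c(55.5, 60.3, 60.6, 62.1, 65.5, 69.2)

# 95% t interval for the population mean height, computed by t.test()
ci <- t.test(growth_rate, conf.level = 0.95)$conf.int
ci

# The same interval by hand, following the small-sample formula
n   <- length(growth_rate)
moe <- qt(0.975, df = n - 1) * sd(growth_rate) / sqrt(n)
c(lower = mean(growth_rate) - moe, upper = mean(growth_rate) + moe)
```

Both calculations should agree, since t.test() uses the same t critical value with n − 1 degrees of freedom.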

2. Confidence Interval for the Mean Difference between Two Populations

LARGE-SAMPLE CASE (n1 > 30 AND n2 > 30)

Interval Estimate with \(\sigma_{1}\) and \(\sigma_{2}\) Known

\((\bar{x}_{1}-\bar{x}_{2})\pm z_{\alpha/2}\, \sigma_{\bar{x}_{1}-\bar{x}_{2}}\)

Interval Estimate with \(\sigma_{1}\) and \(\sigma_{2}\) Unknown

\((\bar{x}_{1}-\bar{x}_{2})\pm z_{\alpha/2}\, s_{\bar{x}_{1}-\bar{x}_{2}}\)
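
The two formulas do not spell out the standard error of the difference. Assuming two independent random samples of sizes \(n_{1}\) and \(n_{2}\), the usual definitions, which match the sqrt(s1/n1+s2/n2) calculation in the example code, are:

\(\sigma_{\bar{x}_{1}-\bar{x}_{2}}=\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}\)

\(s_{\bar{x}_{1}-\bar{x}_{2}}=\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}\)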

Example:

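Compare the average daily intake of dairy products of men and women using a 95% confidence interval. Could you conclude, based on this confidence interval, that there is a difference in the average daily intake of dairy products for men and women? The summary statistics used in the code are sample sizes of 50 per group, sample means of 756 and 762, and sample standard deviations of 35 and 30.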

mean_dif1<-756-762
n1<-50
n2<-50
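s1<-35^2   # sample variance of group 1 (standard deviation 35, squared)
s2<-30^2   # sample variance of group 2 (standard deviation 30, squared)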

z_table<-qnorm(0.975,lower.tail = TRUE)
z_table

## [1] 1.959964

s1_2<-sqrt(s1/n1+s2/n2)
s1_2

## [1] 6.519202

moe<-z_table*s1_2

cat(" Lower", mean_dif1-moe,"\n","Upper:" ,mean_dif1+moe)

##  Lower -18.7774 
##  Upper: 6.777402

The confidence interval contains 0. Therefore, it is possible that \(\mu_{1}=\mu_{2}\), and we cannot conclude that there is a difference in the average daily intake of dairy products for men and women.

Exercise R04

A factory recorded the longevity of its brake lights: a sample of 44 brake lights lasted 4900 days on average, with a standard deviation of 220 days. Find a 95% confidence interval for the average service life of the brake lights.
The chemistry test scores of 50 female and 75 male students have averages of 76 and 86, respectively. Find the 90% confidence interval for the difference μ1 − μ2. Assume the population standard deviations for male and female students are 8 and 6, respectively.