SAMPLING DISTRIBUTION AND CENTRAL LIMIT THEOREM

  1. Sampling distribution of the Mean:
  1. What is the difference between a population mean μ and a sample mean x̄?

The population mean μ is the average of all the values in a population; it is a fixed (usually unknown) parameter. The sample mean x̄ is the average of the values in a subset of the population; it varies from sample to sample and is used to estimate μ.

  1. What is the difference between a population standard deviation σ and a sample standard deviation s?

The population standard deviation σ measures the spread of values in the entire population. The sample standard deviation s is calculated from a subset of the population and is used to estimate σ. Measuring an entire population can be very time consuming, so a sample is used to understand the population.

  1. Say you were to take a sample of size n from some population and calculate a sample mean (x̄1). Then you take a second sample from the same population and calculate that sample mean (x̄2). Would you expect x̄1 and x̄2 to be exactly the same? Why or why not?

No, x̄1 and x̄2 would not be expected to be exactly the same. The two samples will almost certainly contain different individuals, so their means will differ because of sampling variability. Both are estimates of the same population mean, so they should be similar, but not identical.

  1. Would the Central Limit Theorem (CLT) work if you were sampling from populations that are not normally distributed?

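Yes, provided that the sample size is sufficiently large (n ≥ 30 is a common rule of thumb). The sampling distribution of x̄ is then approximately normal even if the population is not.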

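  1. Does the mean of the sampling distribution of x̄ change as n gets larger? If yes, then how so?

No. The mean of the sampling distribution of x̄ equals the population mean μ for every sample size n. What changes as n grows is that an individual sample mean tends to fall closer to μ.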

  1. Does the standard deviation of the sampling distribution of x̄ change as n gets larger? If yes, then how so?

Yes. The standard deviation of the sampling distribution (the standard error) is σ/√n, so it gets smaller as n gets larger.
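
The behavior described in the last three answers can be checked with a small simulation. The sketch below is only an illustration under assumed values: the population is an exponential distribution with mean 1 and standard deviation 1 (chosen because it is clearly not normal), and the function name, number of samples, and seed are arbitrary. The mean of the simulated sample means should stay near 1 for every n, while their standard deviation should track 1/√n.

```python
import random
import statistics

# Assumed population for illustration: exponential with mean 1 and sd 1.
POP_MEAN = 1.0
POP_SD = 1.0

def simulate_sample_means(n, num_samples=2000, seed=0):
    """Draw num_samples samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0 / POP_MEAN) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

results = {}
for n in (4, 25, 100):
    means = simulate_sample_means(n)
    # (mean of the sample means, standard deviation of the sample means)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    theory_se = POP_SD / n ** 0.5
    print(f"n={n:3d}  mean of x-bar={results[n][0]:.3f}  "
          f"sd of x-bar={results[n][1]:.3f}  sigma/sqrt(n)={theory_se:.3f}")
```

The exact printed values depend on the seed, so none are quoted here; the pattern to look for is a stable mean and a shrinking standard deviation.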

  1. Suppose that X is a random variable that represents the age at time of death in the US. Assume that X is normally distributed with mean 73.9 years of age, and standard deviation 18.1 years of age.
  1. What is the probability that a randomly selected person has an age between 70 and 78 years at the time of death?

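Standardize with z = (x − μ)/σ, using σ = 18.1: P(70 < X < 78) = P((70 − 73.9)/18.1 < Z < (78 − 73.9)/18.1) = P(−0.22 < Z < 0.23). From the standard normal table, P(Z < 0.23) = 0.5910 and P(Z < −0.22) = 0.4129, so P(70 < X < 78) = 0.5910 − 0.4129 = 0.1781, or about 0.18 (about 0.175 if z is not rounded).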

  1. Suppose we now draw repeated samples of size 100 from the same population, what proportion of the samples would we expect to have a mean that lies between 70 and 78 years of age?

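For samples of size 100, the sample mean has standard error σ/√n = 18.1/√100 = 1.81, so P(70 < x̄ < 78) = P((70 − 73.9)/1.81 < Z < (78 − 73.9)/1.81) = P(−2.15 < Z < 2.27) = 0.9884 − 0.0158 = 0.9726. About 97% of the samples would be expected to have a mean between 70 and 78 years.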
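
As a cross-check on the two age-at-death calculations, the probabilities can be computed without a table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative. It uses unrounded z-scores, so results can differ from table-based answers in the third decimal place.

```python
from statistics import NormalDist

mu, sigma = 73.9, 18.1

# One randomly selected person: X ~ Normal(mu, sigma)
single = NormalDist(mu, sigma)
p_single = single.cdf(78) - single.cdf(70)

# Mean of a sample of n = 100: x-bar ~ Normal(mu, sigma / sqrt(n))
n = 100
sample_mean = NormalDist(mu, sigma / n ** 0.5)
p_mean = sample_mean.cdf(78) - sample_mean.cdf(70)

print(f"P(70 < X < 78)     = {p_single:.4f}")
print(f"P(70 < x-bar < 78) = {p_mean:.4f}")
```

The only difference between the two calculations is the standard deviation: σ for an individual, σ/√n for the sample mean.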

Confidence Intervals

  1. Suppose you are interested in estimating the mean height of the population of people between the ages of 12 and 40 who suffer from fetal alcohol syndrome. Assume that the heights follow a normal distribution with unknown mean µ but with known standard deviation σ = 6 cm. A random sample of 31 patients is selected from the underlying population; the mean height for these individuals is x̄ = 147.4 cm.
  1. What is the point estimate for µ?

The point estimate for µ is the sample mean: x̄ = 147.4 cm.

  1. Construct a 95% confidence interval for µ. Do this by hand

147.4 ± 1.96(6/√31) = 147.4 ± 2.11

Interval: (145.29, 149.51)

  1. Interpret the confidence interval.

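We are 95% confident that the population mean height µ lies between 145.29 cm and 149.51 cm. That is, about 95% of intervals constructed this way from repeated samples would contain µ. The interval describes the mean, not the height of an individual person.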

  1. Construct a 90% confidence interval for µ. How does this compare to the 95% confidence interval?

147.4 ± 1.645(6/√31) = 147.4 ± 1.77

90 percent Interval: (145.63, 149.17)

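Both intervals are centered at 147.4 cm, but the 90% interval is narrower (margin of error 1.77 cm versus 2.11 cm), because a lower confidence level uses a smaller critical value.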
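
The two intervals can be reproduced with a short sketch. The helper name `z_interval` is hypothetical; it assumes a known population standard deviation, as stated in the problem, and takes the critical value from `NormalDist().inv_cdf` rather than from a table.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for a mean when the population sd (sigma) is known."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin

ci95 = z_interval(147.4, 6, 31, 0.95)
ci90 = z_interval(147.4, 6, 31, 0.90)

print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")
print(f"90% CI: ({ci90[0]:.2f}, {ci90[1]:.2f})")
```

Lowering the confidence level shrinks only the critical value, which is why the 90% interval sits inside the 95% interval.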

Normal

  1. Remember the z-score for an observation x is nothing more than the distance of x from the mean, measured in units of the standard deviation. Complete Table 2:

See attached table in separate word document

  1. For Z, a standard random normal variable, what is P(Z > 1)? Hint: Use the standard normal table
  1. What is the P(Z < −1)?

P(Z > 1) = 1 − P(Z < 1) = 1 − 0.8413 = 0.1587. By symmetry, P(Z < −1) = P(Z > 1) = 0.1587.

  1. What is the P(−1 < Z < 1)?

P(−1 < Z < 1) = 1 − 2(0.1587) = 0.6826, or about 0.683

  1. What is the P(−2 < Z < 2)?

P(Z < −2) = 0.0228 and P(Z < 2) = 0.9772, so P(Z > 2) = 0.0228 as well.

P(−2 < Z < 2) = 1 − 2(0.0228) = 1 − 0.0456 = 0.9544, or about 0.954

  1. What value of z cuts off the upper 30% of the standard normal distribution?

z ≈ 0.524 (the value with 70% of the distribution below it and 30% above it)

  1. What value cuts off the lower 10% of the standard normal distribution?

z ≈ −1.28 (the value with 10% of the distribution below it)
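
The standard normal answers above can be verified without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library: `cdf` gives P(Z < z) and `inv_cdf` gives the z-value for a given lower-tail area. The variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1

p_gt_1 = 1 - Z.cdf(1)              # P(Z > 1)
p_lt_neg1 = Z.cdf(-1)              # P(Z < -1)
p_within_1 = Z.cdf(1) - Z.cdf(-1)  # P(-1 < Z < 1)
p_within_2 = Z.cdf(2) - Z.cdf(-2)  # P(-2 < Z < 2)

# Upper 30% cut-off: 70% of the area lies below this z
z_upper_30 = Z.inv_cdf(0.70)
# Lower 10% cut-off: 10% of the area lies below this z
z_lower_10 = Z.inv_cdf(0.10)

for name, value in [("P(Z > 1)", p_gt_1), ("P(Z < -1)", p_lt_neg1),
                    ("P(-1 < Z < 1)", p_within_1), ("P(-2 < Z < 2)", p_within_2),
                    ("z, upper 30%", z_upper_30), ("z, lower 10%", z_lower_10)]:
    print(f"{name:15s} {value:.4f}")
```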

  1. Among females in the United States between 18 and 74 years of age, diastolic blood pressure is normally distributed with µ = 77 mm Hg and standard deviation σ = 11.6mm Hg.
  1. What is the probability that a randomly selected woman has a diastolic blood pressure less than 60 mm Hg?

P((X − 77)/11.6 < (60 − 77)/11.6) = P(Z < −1.46) = 0.0721

  1. What is the probability that she has a diastolic blood pressure greater than 90 mm Hg?

P((X − 77)/11.6 > (90 − 77)/11.6) = P(Z > 1.12) = 1 − 0.8686 = 0.1314, or about 0.13

  1. What is the probability that the woman has a diastolic blood pressure between 60 and 90 mm Hg?

P(60 < X < 90) = P(Z < 1.12) − P(Z < −1.46) = 0.8686 − 0.0721 = 0.7965

  1. What is the probability that among 10 females selected at random from the population, exactly two will have blood pressure outside of the range 60 to 90 mm Hg?

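P(outside 60 to 90) = 1 − 0.7965 = 0.2035. With n = 10 independently selected women, the number outside the range is binomial: P(exactly 2) = C(10, 2)(0.2035)^2(0.7965)^8 = [10!/(2! 8!)](0.0067) = 45(0.0067) = 0.302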
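
The four blood pressure answers chain together, so a short sketch helps confirm the arithmetic. It uses `statistics.NormalDist` for the normal probabilities and `math.comb` for the binomial coefficient; the variable names are illustrative. Because the code does not round z to two decimals, the intermediate probabilities differ slightly from the table-based values.

```python
from math import comb
from statistics import NormalDist

bp = NormalDist(77, 11.6)  # diastolic blood pressure, mm Hg

p_below_60 = bp.cdf(60)
p_above_90 = 1 - bp.cdf(90)
p_between = bp.cdf(90) - bp.cdf(60)

# Binomial: exactly 2 of 10 independent women fall outside 60 to 90 mm Hg
p_outside = 1 - p_between
n, k = 10, 2
p_exactly_two = comb(n, k) * p_outside ** k * (1 - p_outside) ** (n - k)

print(f"P(X < 60)       = {p_below_60:.4f}")
print(f"P(X > 90)       = {p_above_90:.4f}")
print(f"P(60 < X < 90)  = {p_between:.4f}")
print(f"P(exactly 2/10) = {p_exactly_two:.4f}")
```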

Hypothesis Testing

  1. Suppose you are interested in the distribution of the concentration of benzene in a specific brand of cigar. Assume the distribution is approximately normal with unknown mean and standard deviation. Furthermore you know that the population mean concentration of benzene in cigarettes is 81 µg/g tobacco and you want to test whether the population mean concentration of benzene in cigars is different from the population mean concentration of benzene in cigarettes. A random sample of seven cigars has mean benzene concentration of x̄ = 151 µg/g and standard deviation s = 9 µg/g. Perform an appropriate hypothesis test at the 0.05 level of significance.
  1. Is this a one-sample or two-sample test scenario?

One-Sample

  1. Is a one-sided or two-sided test more appropriate?

Two-sided is more appropriate, because the question is whether the mean differs from 81 µg/g in either direction (higher or lower).

  1. State the null hypothesis.

H0: µ = 81 µg/g. The population mean concentration of benzene in cigars is the same as the population mean concentration of benzene in cigarettes.

  1. State the alternative hypothesis.

H1: µ ≠ 81 µg/g. The population mean concentration of benzene in cigars is different from the population mean concentration of benzene in cigarettes.

  1. Calculate the test statistic by hand. What is the distribution of the test statistic?

Given: µ0 = 81 µg/g, x̄ = 151 µg/g, s = 9 µg/g, n = 7. Standard error of the mean = s/√n

Standard error = 9/√7 = 3.402

t = (x̄ − µ0)/SE = (151 − 81)/3.402 = 20.58. Because the population standard deviation is unknown and is estimated by s, the test statistic follows a t distribution with n − 1 = 6 degrees of freedom.

  1. Calculate and interpret the p-value of the test statistic.

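The two-sided critical value for t with 6 degrees of freedom at the 0.05 level is 2.447. Since 20.58 is far beyond it, the p-value is essentially 0 (p < 0.0001): if the mean concentration in cigars were really 81 µg/g, a sample mean this far from 81 would almost never occur.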
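
The test statistic and its p-value can also be computed numerically. The Python standard library has no t distribution, so the sketch below integrates the Student t density directly with Simpson's rule; this is an illustrative approach, not a standard library routine. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and the tail integral as written is only valid for t > 0 and more than 1 degree of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t > 0 and df > 1, by Simpson's rule.

    The substitution u = 1/x maps the infinite range (t, infinity)
    onto the finite range (0, 1/t). steps must be even.
    """
    upper = 1.0 / t
    h = upper / steps

    def g(u):
        # Transformed integrand; it tends to 0 as u -> 0 when df > 1
        return 0.0 if u == 0.0 else t_pdf(1.0 / u, df) / (u * u)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

xbar, mu0, s, n = 151, 81, 9, 7
se = s / math.sqrt(n)            # standard error of the mean
t_stat = (xbar - mu0) / se       # one-sample t statistic
p_two_sided = 2 * t_upper_tail(t_stat, n - 1)

print(f"SE = {se:.3f}, t = {t_stat:.2f}, two-sided p = {p_two_sided:.2e}")
```

As a sanity check on the integration, `t_upper_tail(2.447, 6)` should come out close to 0.025, matching the tabulated two-sided critical value.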

  1. Do you reject or fail to reject the null hypothesis?

Reject the null hypothesis.

  1. What do you conclude?

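The population mean concentration of benzene in this brand of cigar is significantly different from 81 µg/g, the mean concentration in cigarettes. The sample suggests it is higher.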

  1. What is the corresponding 95% confidence interval for the population mean concentration of benzene level in cigars? Is this consistent with the results from the hypothesis test?

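Using the t distribution with 6 degrees of freedom, because σ is unknown: 151 ± 2.447(9/√7) = 151 ± 8.32, giving (142.68, 159.32). Yes, this is consistent with the test: the interval does not contain 81, so the null hypothesis is rejected at the 0.05 level.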
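
A t-based 95% interval for the cigar data can be sketched as follows. The critical value 2.447 is taken from a t table for 6 degrees of freedom (an input assumed here, not computed), and the variable names are illustrative.

```python
import math

xbar, s, n = 151, 9, 7
t_crit = 2.447  # table value: t with 6 df, two-sided 95% (assumed input)

se = s / math.sqrt(n)
margin = t_crit * se
ci = (xbar - margin, xbar + margin)

# The interval and the two-sided test agree when the null value lies outside it
null_value = 81
rejects_null = not (ci[0] <= null_value <= ci[1])

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}); excludes 81: {rejects_null}")
```

Using 1.96 in place of 2.447 would give a narrower interval, but that choice is only appropriate when the population standard deviation is known.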

  1. Suppose now that you thought that population mean concentration of benzene in cigars would be higher than the population mean concentration of benzene in cigarettes. Is a one-sided or two-sided test more appropriate?

One-sided, since we are only testing for a higher value (one direction from 81 µg/g).

  1. State the new null hypothesis.

H0: the mean concentration of benzene in cigars is less than or equal to 81 µg/g (µ ≤ 81).

  1. State the new alternative hypothesis.

H1: the mean concentration of benzene in cigars is higher than 81 µg/g (µ > 81).

  1. Calculate the test statistic by hand. Compare with the previous test statistic. What is the distribution of the test statistic?

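t = (x̄ − µ0)/SE = (x̄ − µ0)/(s/√n) = (151 − 81)/(9/√7) = 20.58. The test statistic is the same as before and still follows a t distribution with 6 degrees of freedom; only the rejection region changes.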

  1. Calculate and interpret the p-value.

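The one-sided critical value for t with 6 degrees of freedom at the 0.05 level is 1.943. The p-value is half the two-sided p-value and is still essentially 0.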

  1. Do you reject or fail to reject the null hypothesis?

Reject the null hypothesis.

  1. What do you conclude?

The mean concentration of benzene in cigars is significantly higher than the mean concentration in cigarettes.