Data 606 Chapter 4 Practice

Practice questions: 4.3, 4.13, 4.23, 4.25, 4.39, 4.47

4.3 College credits.

A college counselor is interested in estimating how many credits a student typically enrolls in each semester. The counselor decides to randomly sample 100 students by using the registrar’s database of students. The histogram below shows the distribution of the number of credits taken by these students. Sample statistics for this distribution are also provided.

(a) What is the point estimate for the average number of credits taken per semester by students at this college? What about the median?

The point estimate for the mean: 13.65

The point estimate for the median: 14

(b) What is the point estimate for the standard deviation of the number of credits taken per semester by students at this college? What about the IQR?

The point estimate for the SD: 1.91

The point estimate for the IQR: Q3 - Q1 = 15 - 13 = 2

(c) Is a load of 16 credits unusually high for this college? What about 18 credits? Explain your reasoning. Hint: Observations farther than two standard deviations from the mean are usually considered to be unusual.

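Values within two standard deviations of the mean fall in the range (13.65 - 2 * SD, 13.65 + 2 * SD) = (9.83, 17.47). This is a range for individual observations, not a confidence interval for the mean.

A load of 16 credits is not unusual, since it lies inside this range. A load of 18 credits is unusually high, since it lies more than two standard deviations above the mean.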

# Bounds of the "usual" range: mean +/- 2 standard deviations
lower_bound <- 13.65 - 2 * 1.91
upper_bound <- 13.65 + 2 * 1.91
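As a quick check of the two-standard-deviation rule from the hint, the sketch below flags whether a given credit load falls outside the usual range. The helper `is_unusual` and the variable names are introduced here only for illustration; they are not part of the exercise.

```r
# Sample statistics from the exercise
x_bar <- 13.65
s <- 1.91

# Hypothetical helper: TRUE when a value lies more than two SDs from the mean
is_unusual <- function(x, center = x_bar, spread = s) {
  abs(x - center) > 2 * spread
}

# Check the two loads asked about in part (c)
is_unusual(c(16, 18))
```

Under this rule the call returns FALSE for 16 credits and TRUE for 18 credits.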

(d) The college counselor takes another random sample of 100 students and this time finds a sample mean of 14.02 units. Should she be surprised that this sample statistic is slightly different than the one from the original sample? Explain your reasoning.

No. Sample means vary from one random sample to the next because of sampling error, so a small difference such as 14.02 versus 13.65 is expected.

(e) The sample means given above are point estimates for the mean number of credits taken by all students at that college. What measures do we use to quantify the variability of this estimate? (Hint: recall that $SD_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.) Compute this quantity using the data from the original sample.

The standard error quantifies the variability of this estimate: SE = s / sqrt(n) = 1.91 / sqrt(100) ≈ 0.19.

SE <- 1.91/sqrt(100)
SE
## [1] 0.191
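With the standard error available, the difference between the two sample means in part (d) can be expressed in standard errors. This is a rough sketch, assuming the second sample has about the same standard deviation as the first; the variable names are chosen here for illustration.

```r
# Sample means from parts (a) and (d), and the SD from the original sample
x_bar_first <- 13.65
x_bar_second <- 14.02
s <- 1.91
n <- 100

# Standard error of the mean, then the gap between the means in SE units
se <- s / sqrt(n)
z_diff <- (x_bar_second - x_bar_first) / se
z_diff
```

The gap works out to a little under two standard errors, which is consistent with ordinary sampling variation.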

4.13 Waiting at an ER, Part I.

A hospital administrator hoping to improve wait times decides to estimate the average emergency room waiting time at her hospital. She collects a simple random sample of 64 patients and determines the time (in minutes) between when they checked in to the ER until they were first seen by a doctor. A 95% confidence interval based on this sample is (128 minutes, 147 minutes), which is based on the normal model for the mean. Determine whether the following statements are true or false, and explain your reasoning.

(a) This confidence interval is not valid since we do not know if the population distribution of the ER wait times is nearly Normal.

False, provided the distribution of wait times is not strongly skewed. The sample is a simple random sample with n = 64, which is greater than 30, so the sample mean is approximately normal even if the population is not. We have no information about skewness, though; if the wait times were strongly skewed, a sample of 64 might not be large enough and the interval could be unreliable.

(b) We are 95% confident that the average waiting time of these 64 emergency room patients is between 128 and 147 minutes.

False. A confidence interval estimates the population mean, not the sample mean. The mean of these 64 patients is known exactly (137.5 minutes), so there is no uncertainty about it.

(c) We are 95% confident that the average waiting time of all patients at this hospital’s emergency room is between 128 and 147 minutes.

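True. The confidence interval is a statement about the population mean, here the average waiting time of all patients at this hospital’s emergency room.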

(d) 95% of random samples have a sample mean between 128 and 147 minutes.

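False. The confidence interval is about the population mean, not about the distribution of sample means. This interval is centered on one particular sample mean, so it does not describe where 95% of sample means fall.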

(e) A 99% confidence interval would be narrower than the 95% confidence interval since we need to be more sure of our estimate.

False. A 99% confidence interval would be wider: to be more confident that the interval captures the population mean, it must cover a larger range of values.
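The effect of the confidence level on the width can be sketched by backing the standard error out of the reported 95% interval and rebuilding the interval at 99%. This assumes the original interval was computed with the normal critical value `qnorm(0.975)`; the variable names are illustrative.

```r
# Endpoints of the reported 95% interval
lower <- 128
upper <- 147
center <- (lower + upper) / 2
margin_95 <- (upper - lower) / 2

# Back out the standard error, assuming z* = qnorm(0.975) was used
se <- margin_95 / qnorm(0.975)

# Rebuild the interval at the 99% level with z* = qnorm(0.995)
margin_99 <- qnorm(0.995) * se
c(center - margin_99, center + margin_99)
```

Under that assumption the 99% interval is roughly (125, 150), wider than the 95% interval of (128, 147).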

(f) The margin of error is 9.5 and the sample mean is 137.5.

True

sample_mean <- (128 + 147)/2
sample_mean
## [1] 137.5
margin <- 147 - sample_mean
margin
## [1] 9.5

(g) In order to decrease the margin of error of a 95% confidence interval to half of what it is now, we would need to double the sample size.

False. The margin of error is proportional to 1/sqrt(n), so halving it requires quadrupling the sample size, not doubling it.
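A small sketch of how the margin of error scales with the sample size is given below. The helper `margin_of_error` is a name made up for this illustration, and the standard deviation of 39 minutes is borrowed from Exercise 4.25, which describes the same sample; the ratios do not depend on that value.

```r
# Hypothetical helper: margin of error for a mean under the normal model
margin_of_error <- function(s, n, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)
  z_star * s / sqrt(n)
}

s <- 39  # assumption: SD reported for this sample in Exercise 4.25
n <- 64

# Ratio of the new margin to the current one
ratio_double <- margin_of_error(s, 2 * n) / margin_of_error(s, n)  # 1 / sqrt(2)
ratio_quadruple <- margin_of_error(s, 4 * n) / margin_of_error(s, n)  # 1 / 2
c(ratio_double, ratio_quadruple)
```

Doubling the sample size only shrinks the margin to about 71% of its current value, while quadrupling it cuts the margin in half.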

4.23 Nutrition labels.

The nutrition label on a bag of potato chips says that a one ounce (28 gram) serving of potato chips has 130 calories and contains ten grams of fat, with three grams of saturated fat. A random sample of 35 bags yielded a sample mean of 134 calories with a standard deviation of 17 calories. Is there evidence that the nutrition label does not provide an accurate measure of calories in the bags of potato chips? We have verified the independence, sample size, and skew conditions are satisfied.

H0 (null hypothesis): mu = 130

HA (alternative hypothesis): mu ≠ 130

Calculate the z value of the sample mean.

z <- (134 - 130)/(17/sqrt(35))
z
## [1] 1.392019

Calculate the two-sided p-value (the area in both tails).

pvalue <- pnorm(z, lower.tail=FALSE)
totalPval <- pvalue + pvalue
totalPval
## [1] 0.1639167

Since the two-sided p-value (0.1639) is greater than alpha (0.05), we fail to reject the null hypothesis. The data do not provide sufficient evidence that the nutrition label gives an inaccurate measure of the calories in the bags of potato chips.
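The same conclusion can be cross-checked with a 95% confidence interval for the mean: if the labeled value of 130 calories falls inside the interval, the two-sided test at the 5% level does not reject it. This is a sketch using the normal model, with variable names chosen here for illustration.

```r
# Sample statistics and labeled value from the exercise
x_bar <- 134
s <- 17
n <- 35
mu_0 <- 130

# 95% confidence interval for the mean under the normal model
se <- s / sqrt(n)
ci <- x_bar + c(-1, 1) * qnorm(0.975) * se
ci

# Does the interval contain the labeled value?
contains_label <- mu_0 >= ci[1] && mu_0 <= ci[2]
contains_label
```

The interval is roughly (128.4, 139.6), which contains 130 and so agrees with the hypothesis test.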

4.25 Waiting at an ER, Part III.

The hospital administrator mentioned in Exercise 4.13 randomly selected 64 patients and measured the time (in minutes) between when they checked in to the ER and the time they were first seen by a doctor. The average time is 137.5 minutes and the standard deviation is 39 minutes. She is getting grief from her supervisor on the basis that the wait times in the ER has increased greatly from last year’s average of 127 minutes. However, she claims that the increase is probably just due to chance.

(a) Are conditions for inference met? Note any assumptions you must make to proceed.

Independence: the patients were randomly selected, so the independence condition is satisfied.

Sample size: the sample has 64 observations, which is more than 30, so the sample size condition is satisfied.

Distribution: there is no information on skewness, so I am assuming that the distribution is not strongly skewed.

(b) Using a significance level of α = 0.05, is the change in wait times statistically significant? Use a two-sided test since it seems the supervisor had to inspect the data before she suggested an increase occurred.

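The hypotheses are H0: mu = 127 and HA: mu ≠ 127. For the two-sided test the p-value is about 0.03, which is less than the significance level (0.05), so the null hypothesis is rejected: the change in wait times is statistically significant.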

z <- (137.5 - 127)/(39/sqrt(64))
z
## [1] 2.153846
pvalue <- 2*(1- pnorm(z))
pvalue
## [1] 0.03125224

(c) Would the conclusion of the hypothesis test change if the significance level was changed to α = 0.01?

Yes. The p-value (about 0.031) is greater than 0.01, so at this significance level we fail to reject the null hypothesis.
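The dependence of the decision on the significance level can be laid out in a small table. This sketch recomputes the two-sided p-value from part (b) and compares it with a few common levels; the level 0.10 is included only for comparison and is not part of the exercise.

```r
# Test statistic and two-sided p-value, as in part (b)
z <- (137.5 - 127) / (39 / sqrt(64))
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Compare the p-value with several significance levels
alphas <- c(0.10, 0.05, 0.01)
decisions <- data.frame(alpha = alphas, reject_H0 = p_value < alphas)
decisions
```

The null hypothesis is rejected at the 0.10 and 0.05 levels but not at 0.01.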

4.39 Weights of pennies.

The distribution of weights of United States pennies is approximately normal with a mean of 2.5 grams and a standard deviation of 0.03 grams.

(a) What is the probability that a randomly chosen penny weighs less than 2.4 grams?

The probability is about 0.0004, or 0.04%.

mean = 2.5
sd = 0.03
#using the Z score formula
z <- (2.4-mean)/sd
pnorm(z)
## [1] 0.0004290603

(b) Describe the sampling distribution of the mean weight of 10 randomly chosen pennies.

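Because the population is approximately normal, the sampling distribution of the mean weight of 10 pennies is also approximately normal, with mean 2.5 grams and standard error 0.03 / sqrt(10) ≈ 0.0095 grams.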

n= 10
SampleD <- sd/(sqrt(n))
SampleD
## [1] 0.009486833
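The standard error can also be checked by simulation. The sketch below draws many samples of 10 pennies from the stated normal population and looks at the spread of the sample means; the seed and the number of replications are arbitrary choices made for this illustration.

```r
set.seed(606)  # arbitrary seed, used only for reproducibility

# Population parameters from the exercise
mu <- 2.5
sigma <- 0.03
n <- 10

# Draw 10,000 samples of 10 pennies and record each sample mean
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(sample_means)  # should be close to 2.5
sd(sample_means)    # should be close to sigma / sqrt(n), about 0.0095
```

The simulated spread should land close to the theoretical standard error, up to simulation noise.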

(c) What is the probability that the mean weight of 10 pennies is less than 2.4 grams?

The probability is nearly 0.

pnorm(2.4,mean = 2.5,sd = SampleD)
## [1] 2.797279e-26

(d) Sketch the two distributions (population and sampling) on the same scale.

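# Grids spanning 3 standard deviations on either side of the mean
normsample <- seq(mean - (3 * sd), mean + (3 * sd), length.out = 200)
randomsample <- seq(mean - (3 * SampleD), mean + (3 * SampleD), length.out = 200)
# Normal densities for the population and for the mean of 10 pennies
popDist <- dnorm(normsample, mean, sd)
sampleDist <- dnorm(randomsample, mean, SampleD)

# Blue: population distribution; red: sampling distribution of the mean
plot(normsample, popDist, type = "l", col = "blue", ylim = c(0, 75), xlab = "Weight (grams)", ylab = "Density")
lines(randomsample, sampleDist, col = "red")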

(e) Could you estimate the probabilities from (a) and (c) if the weights of pennies had a skewed distribution?

For (a), no: that probability was computed from the normal model, which does not apply if the weights are skewed. For (c), not with only 10 pennies: for a skewed population the sampling distribution of the mean is approximately normal only when the sample is large enough (roughly more than 100 observations), and n = 10 is too small.
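The role of the sample size can be illustrated with a simulation. The sketch below uses an exponential distribution purely as a stand-in for a right-skewed population (it is not a model of penny weights) and compares the skewness of sample means for n = 10 and n = 100. The `skewness` helper, the seed, and the number of replications are all choices made for this illustration.

```r
set.seed(606)  # arbitrary seed, used only for reproducibility

# Hypothetical helper: sample skewness (third standardized moment)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Sample means from a right-skewed stand-in population (exponential, rate 1)
means_10 <- replicate(10000, mean(rexp(10)))
means_100 <- replicate(10000, mean(rexp(100)))

# Skewness near 0 indicates a roughly symmetric, normal-like shape
skew_10 <- skewness(means_10)
skew_100 <- skewness(means_100)
c(n10 = skew_10, n100 = skew_100)
```

The means of 10 observations remain noticeably skewed, while the means of 100 observations are much closer to symmetric, which is why the normal model is not reliable for (c) with n = 10.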