Homework 4

Problem 4.4

(a) The point estimates for the average and median are the sample mean and sample median, respectively. Therefore, the point estimates for the average and median are \(\boxed{171.1}\) and \(\boxed{170.3}\), respectively.

(b) The point estimate for the standard deviation is the sample standard deviation, which is \(\boxed{9.4}\). The IQR is the difference between Q3 and Q1:

\(\text{IQR} = \text{Q}_{3} - \text{Q}_{1} = 177.8 - 163.8 = \boxed{14.0}\)

(c) For this problem, I calculated the Z-score for 180 and 155 cm:

\(Z = \frac{x - \mu}{\sigma} = \frac{180 - 171.1}{9.4} = \boxed{0.947}\)

\(Z = \frac{x - \mu}{\sigma} = \frac{155 - 171.1}{9.4} = \boxed{-1.713}\)

heights <- c(180, 155)
x_bar <- 171.1   # sample mean (renamed to avoid masking mean())
s <- 9.4         # sample standard deviation (renamed to avoid masking sd())
zscore <- (heights - x_bar) / s  # vectorized over both heights
zscore

[1]  0.9468085 -1.7127660

Both of these heights have z-scores within 2 standard deviations of the mean, so neither height is unusual.

(d) I would expect the new estimates to differ from those of the previous study, but to be fairly close. These are simply estimates of the population parameters, so a different sample of people could give different results.

(e) The variability is measured by the standard error. The equation and answer are shown below:

\(SE = \frac{s}{\sqrt{n}} = \frac{9.4}{\sqrt{507}} = \boxed{0.417}\)

SE <- 9.4/sqrt(507)
SE

[1] 0.4174687

Problem 4.14

(a) FALSE. We are 100% confident that the average spending for these 436 Americans is in this interval, since the interval is built around the sample mean. We are 95% confident that the population's average spending lies in this interval.

(b) FALSE. Since our sample size is greater than 30 and the sample is only slightly skewed, the sampling distribution of the sample mean is approximately normal.

(c) FALSE. The confidence interval is for the population mean.

(d) TRUE.

(e) TRUE.

(f) FALSE. The equation for error is as follows:

\(E = z^{*}\frac{s}{\sqrt{n}}\)

We want to find the ratio of the new sample size (\(n_{\text{new}}\)) to the original size (\(n_{\text{original}}\)):

\(\frac{1}{3} = \frac{E_{\text{new}}}{E_{\text{original}}} = \frac{z^{*}\frac{s}{\sqrt{n_{\text{new}}}}}{z^{*}\frac{s}{\sqrt{n_{\text{original}}}}}\)

\(\frac{1}{3} = \frac{\frac{1}{\sqrt{n_{\text{new}}}}}{\frac{1}{\sqrt{n_{\text{original}}}}}\) \(\rightarrow \left(\frac{1}{3}\right)^{2} = \left(\frac{\frac{1}{\sqrt{n_{\text{new}}}}}{\frac{1}{\sqrt{n_{\text{original}}}}}\right)^{2}\)

\(\frac{1}{9} = \frac{\frac{1}{n_{\text{new}}}}{\frac{1}{n_{\text{original}}}}\) \(\rightarrow n_{\text{new}} = 9 \times n_{\text{original}}\)

Therefore, the sample would need to be 9 times the size of the original.
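As a quick numerical check of the derivation above, the sketch below compares the margin of error at the original sample size with the margins at 3 and 9 times that size. The standard deviation `s_hyp` is a hypothetical value chosen only for illustration (it cancels in the ratios), and `margin_of_error()` is a helper name introduced here, not something from the problem.

```r
# Margin of error for a z-interval: E = z* * s / sqrt(n)
margin_of_error <- function(s, n, conf_level = 0.95) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  z_star * s / sqrt(n)
}

s_hyp <- 25          # hypothetical standard deviation (not given in the problem)
n_original <- 436    # sample size quoted in part (a)

e_original <- margin_of_error(s_hyp, n_original)
e_tripled  <- margin_of_error(s_hyp, 3 * n_original)  # sample 3 times larger
e_ninefold <- margin_of_error(s_hyp, 9 * n_original)  # sample 9 times larger

e_tripled / e_original   # equals 1/sqrt(3), so the margin shrinks by less than a third
e_ninefold / e_original  # equals 1/3
```

Because \(s\) and \(z^{*}\) cancel in the ratio, any positive value of `s_hyp` gives the same two ratios.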

(g) TRUE. The margin of error is half of the range of the confidence interval:

\(\text{Margin of Error} = \frac{89.11-80.31}{2} = \boxed{4.4}\)

Problem 4.24

(a) YES. The conditions for inference are independence, random sampling, and an approximately normal sampling distribution. The observations are independent because 36 students is less than 10% of the population. This is a random sample. The sampling distribution of the mean is approximately normal because the sample size is larger than 30.

(b) The claim is that these gifted children will be able to count to 10 before turning 32 months old. Therefore, the hypotheses are:

Null Hypothesis - \(H_{0}: \mu = 32\)

Alternative Hypothesis - \(H_{a}: \mu < 32\)

We then need to compute the Z-score for this:

\(Z = \frac{\bar{x} - \text{null value}}{SE_{\bar{x}}} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{30.69 - 32}{4.31/\sqrt{36}} = -1.82367\)

Now that we have the Z-score, we can find the corresponding p-value:

\(P(Z < -1.82367) = 0.0341\)

mu0 <- 32       # null value
x_bar <- 30.69  # sample mean
s <- 4.31       # sample standard deviation
n <- 36
Z <- (x_bar - mu0) / (s / sqrt(n))
pnorm(Z)        # lower-tail p-value, since H_a is mu < 32

[1] 0.0341013

Since the p-value is less than the significance level (0.0341 < 0.10), we can reject the null hypothesis in favor of the alternative, and say that gifted children tend to count to 10 at an average age of less than 32 months.

(c) If the average age at which children first count to 10 is 32 months across the population, then there is a 3.41% chance of getting a sample of 36 children whose average age is 30.69 months or less.

(d) For a confidence interval, we need to use the following equation:

\(\text{Confidence Interval} = \bar{x} \pm z^{*}SE\)

The \(z^{*}\) value corresponds to the confidence level. For a 90% confidence interval, \(z^{*}\) is the value that leaves half the significance level (0.05) in the upper tail. I use the qnorm() function to find it. The rest of the calculation is shown below with the resulting confidence interval.

z <- qnorm(0.05,lower.tail=FALSE)
mean <- 30.69
sd <- 4.31
n <- 36
se <- sd/sqrt(n)
CI1 <- mean - z*se
CI2 <- mean + z*se
cat("Confidence Interval: (",CI1,",",CI2,")")

Confidence Interval: ( 29.50845 , 31.87155 )
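The interval calculation above can also be wrapped in a small helper so that the same steps are reusable for other summary statistics. This is only a sketch: `z_interval()` and its argument names are introduced here for illustration, and the 90% default is an assumption matching the 0.05 tail area used above.

```r
# Two-sided z-interval for a mean, computed from summary statistics.
z_interval <- function(x_bar, s, n, conf_level = 0.90) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # leaves (1 - conf_level)/2 in the upper tail
  se <- s / sqrt(n)                          # standard error of the mean
  c(lower = x_bar - z_star * se, upper = x_bar + z_star * se)
}

ci <- z_interval(x_bar = 30.69, s = 4.31, n = 36)
round(ci, 5)  # should agree with the interval reported above
```

Calling `z_interval(118.2, 6.5, 36)` applies the same steps to the summary statistics of Problem 4.26.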

(e) Yes, because the confidence interval does not contain 32 months, and we rejected the null in favor of the alternative and concluded that the average was less than 32 months for gifted children.

Problem 4.26

(a) The setup of this problem is exactly the same as in the previous problem, except that the alternative hypothesis is "not equal to" rather than "less than". This matters because the test is now two-sided, so we have to double the one-tail p-value.

Null Hypothesis - \(H_{0}: \mu = 100\)

Alternative Hypothesis - \(H_{a}: \mu \ne 100\)

mu0 <- 100      # null value
x_bar <- 118.2  # sample mean
s <- 6.5        # sample standard deviation
n <- 36
Z <- (x_bar - mu0) / (s / sqrt(n))
2 * pnorm(-abs(Z))  # two-sided p-value: double the tail area beyond |Z|

[1] 2.44044e-63

Since the p-value is less than the significance level, we can reject the null hypothesis in favor of the alternative hypothesis that the mothers of gifted children have a different average IQ than mothers of children who are not classified as gifted.

(b) The confidence interval calculation is described in 4.24(d). Here is the solution for this problem.

z <- qnorm(0.05,lower.tail=FALSE)
mean <- 118.2
sd <- 6.5
n <- 36
se <- sd/sqrt(n)
CI1 <- mean - z*se
CI2 <- mean + z*se
cat("Confidence Interval: (",CI1,",",CI2,")")

Confidence Interval: ( 116.4181 , 119.9819 )

(c) Yes, the results agree because 100 is not in the confidence interval, and we rejected the null hypothesis.

Problem 4.34

The “sampling distribution” of the mean is the distribution of the means of many different samples drawn from the same population. For example, say we want to find the average weight of all the goldfish in a very large pond. Instead of weighing every fish, we can take several random samples of the same size and find the mean of each. Then, we can look at the distribution of all of those means. This would be the sampling distribution of the sample mean.

The central limit theorem tells us that as the size of each sample grows, the sampling distribution of the mean becomes closer to normal. As the sample size increases, the mean of the sampling distribution remains the same, its standard deviation becomes smaller (\(\sigma/\sqrt{n}\)), and its shape becomes more symmetric and closer to normal.
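A short simulation can illustrate this behavior. The sketch below is not part of the problem: it assumes an exponential population with rate 1 (mean 1, standard deviation 1, strongly right skewed), and `sample_means()` is a helper name introduced here. For each sample size it draws many samples and compares the spread of their means with \(\sigma/\sqrt{n}\).

```r
set.seed(1)  # arbitrary seed, used only so the run is reproducible

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
sample_means <- function(n, reps = 10000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

for (n in c(5, 30, 100)) {
  means <- sample_means(n)
  cat("n =", n,
      " mean of means =", round(mean(means), 3),
      " sd of means =", round(sd(means), 3),
      " sigma/sqrt(n) =", round(1 / sqrt(n), 3), "\n")
}
```

Plotting `hist(sample_means(5))` next to `hist(sample_means(100))` should show the skew fading as the sample size grows, while the center stays near 1.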

Problem 4.40

(a) We are looking for \(P(X > 10,500)\). First, we need to calculate the Z-score using the following equation:

\(Z = \frac{x-\mu}{\sigma}\)

Then, we need to calculate the p-value from this. The code is shown below:

x <- 10500
mu <- 9000
sd <- 1000
Z <- (x-mu)/sd
1-pnorm(Z)

[1] 0.0668072

Therefore, the probability of a lightbulb lasting more than 10,500 hours is \(\boxed{6.68\%}\)

(b) Since the population distribution is approximately normal, the sampling distribution of the mean is also approximately normal. Its mean is equal to the population mean, and its standard deviation is the population standard deviation divided by the square root of the sample size:

\(\mu_{\bar{x}} = \mu = \boxed{9000}\)

\(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1000}{\sqrt{15}} = \boxed{258.199}\)

(c) The equation for Z-score is:

\(Z = \frac{\bar{x}-\mu}{\sigma_{\bar{x}}} = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\)

In addition, since we are looking for the probability that the sample mean is greater than 10,500 hours, we use the complement rule:

\(P(\bar{X} > 10500) = 1 - P(\bar{X} < 10500)\)

The solution is shown below:

mu <- 9000
mean <- 10500
sd <- 1000
n <- 15
Z <- (mean-mu)/(sd/sqrt(n))
1-pnorm(Z)

[1] 3.133452e-09

This is approximately 0, which means there is virtually no chance that the mean lifespan of 15 light bulbs will be greater than 10,500 hours.
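Both tail probabilities in parts (a) and (c) can also be obtained without standardizing by hand, since pnorm() accepts a mean and standard deviation directly and `lower.tail = FALSE` returns the upper tail. The sketch below uses the values from the problem; the variable names are chosen here for illustration.

```r
mu <- 9000      # population mean lifespan (hours)
sigma <- 1000   # population standard deviation (hours)
n <- 15         # sample size in part (c)

# Part (a): a single bulb lasting more than 10,500 hours
p_single <- pnorm(10500, mean = mu, sd = sigma, lower.tail = FALSE)

# Part (c): the mean of 15 bulbs exceeding 10,500 hours,
# using the standard error sigma / sqrt(n) as the standard deviation
p_mean <- pnorm(10500, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)

c(single_bulb = p_single, mean_of_15 = p_mean)
```

Using `lower.tail = FALSE` instead of `1 - pnorm(...)` avoids subtracting two nearly equal numbers, which matters for very small tail probabilities like the one in part (c).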

(d) Below is the code for the plots, and the plots themselves:

library(ggplot2)
gginit.4.40 <- ggplot(data=data.frame(x=c(5000,13000)),aes(x))
stattype.4.40.1 <- stat_function(fun=dnorm,args=list(mean=9000,sd=1000),color="red")
stattype.4.40.2 <- stat_function(fun=dnorm,args=list(mean=9000,sd=258.199),color="blue")
annotate.4.40.1 <- annotate(geom="text",x=10000,y=0.0008,label="Sampling", color="blue")
annotate.4.40.2 <- annotate(geom="text",x=10500,y=0.0003,label="Population", color="red")
theme.4.40 <- theme_bw() +
             theme(axis.line.x = element_line(color="black"),
                   axis.line.y = element_blank(),
                   axis.text.y = element_blank(),
                   axis.title.y = element_blank(),
                   axis.ticks.y = element_blank(),
                   panel.grid.major = element_blank(),
                   panel.grid.minor = element_blank(),
                   panel.border = element_blank(),
                   panel.background = element_blank())

gginit.4.40 + stattype.4.40.1 + stattype.4.40.2 + theme.4.40 + xlab("Hours Lasted") + annotate.4.40.1 + annotate.4.40.2

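(e) No, we could not. Part (a) requires the population distribution to be approximately normal, and in part (c) a sample of 15 is too small for the sampling distribution of the mean to be approximately normal when the population is skewed.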

Problem 4.48

Let us take the equation for the Z-score: