4.4

  1. The point estimate for the average height is the sample mean, 171.1. The point estimate for the median height is the sample median, 170.3.

  2. Standard deviation = 9.4. IQR = Q3 - Q1 = 177.8 - 163.8 = 14.

  3. Neither measurement is unusual: both lie within 2 standard deviations of the mean.

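  4. I would not expect the mean and standard deviation of a new sample to be exactly the same, because point estimates vary from one random sample to the next (sampling variability). Since both samples are drawn at random from the same population, I would expect the new estimates to be similar.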

  5. The variability of the sample mean is measured by its standard error (SE): \(SE = \frac{s}{\sqrt{n}} = \frac{9.4}{\sqrt{507}} \approx 0.417\).
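The arithmetic in parts 2, 3 and 5 can be checked in R. This is a minimal sketch that uses only the summary statistics quoted above (mean 171.1, SD 9.4, quartiles 163.8 and 177.8, n = 507); the two heights passed to the z-score step, 180 and 155, are illustrative values assumed here, not taken from the text.

```r
# Summary statistics quoted in the answers above
xbar <- 171.1   # sample mean
s    <- 9.4     # sample standard deviation
q1   <- 163.8   # first quartile
q3   <- 177.8   # third quartile
n    <- 507     # sample size

iqr <- q3 - q1          # interquartile range
se  <- s / sqrt(n)      # standard error of the sample mean

# z-scores for two illustrative heights (ASSUMED values, not from the text)
heights <- c(180, 155)
z <- (heights - xbar) / s
unusual <- abs(z) > 2   # flag observations more than 2 SDs from the mean

print(round(c(IQR = iqr, SE = se), 3))
print(round(z, 2))
print(unusual)
```

A value is flagged only when it is more than 2 standard deviations from the mean, which is the rule of thumb used in part 3.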

4.14

library(readr)
tg_data <- "https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%204%20Exercise%20Data/tgSpending.csv"
tg <- read_csv(tg_data) # the file is comma-separated
## Parsed with column specification:
## cols(
##   spending = col_double()
## )
spending <- tg$spending
n <- 436 #sample size

summary(spending)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.719  49.177  75.792  84.707 112.255 282.803
sd <- sd(spending) #standard deviation
se <- sd/sqrt(n) #standard error
m_of_e <- 1.96*se #margin of error
print(paste0("Standard Deviation is ", round(sd,2)))
## [1] "Standard Deviation is 46.93"
print(paste0("Standard Error is ", round(se,2)))
## [1] "Standard Error is 2.25"
print(paste0("Margin of Error is ", round(m_of_e,2)))
## [1] "Margin of Error is 4.41"
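The 95% interval itself is the sample mean plus or minus the margin of error. The sketch below rebuilds it from the rounded summary statistics above (mean 84.707, SD 46.93, n = 436), then runs a small simulation to show what "95% confident" refers to: across many repeated samples, roughly 95% of intervals built this way capture the population mean. The right-skewed gamma population in the simulation (shape 3, scale 28) is an assumption chosen only to loosely resemble the spending data; it is not the true population.

```r
# Rebuild the 95% confidence interval from the summary statistics above
xbar   <- 84.707        # sample mean spending
s      <- 46.93         # sample standard deviation
n      <- 436           # sample size
z_star <- qnorm(0.975)  # about 1.96 for 95% confidence

margin <- z_star * s / sqrt(n)
ci <- c(lower = xbar - margin, upper = xbar + margin)
print(round(ci, 2))

# Coverage simulation with an ASSUMED right-skewed population
set.seed(1)
pop_mean <- 3 * 28      # mean of a gamma(shape = 3, scale = 28) distribution
reps <- 2000
covered <- logical(reps)
for (i in seq_len(reps)) {
  x  <- rgamma(n, shape = 3, scale = 28)          # one random sample of size n
  me <- z_star * sd(x) / sqrt(n)                  # its margin of error
  covered[i] <- abs(mean(x) - pop_mean) <= me     # does the interval contain pop_mean?
}
coverage <- mean(covered)
print(coverage)         # proportion of intervals that captured the population mean
```

The confidence level describes the long-run behavior of the method, not the chance that one particular interval, or one particular sample mean, lands anywhere.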
  1. FALSE. A confidence interval is a statement about the population mean, not the sample: the average spending of the 436 sampled adults is known exactly (the sample mean, about 84.71).

  2. FALSE. The sample is right skewed, but with 436 observations the sampling distribution of the mean is still nearly normal, so the interval is valid.

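  3. FALSE. The interval describes the population mean, not where other sample means will fall. What is true is that about 95% of intervals built this way from repeated random samples would contain the population mean.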

  4. TRUE. This is the correct interpretation of a confidence interval: we are 95% confident that the population mean lies between the interval's bounds.

  5. TRUE. A 90% interval uses a smaller critical value (1.645 instead of 1.96), so it is narrower.

  6. FALSE. The margin of error is proportional to \(1/\sqrt{n}\), so cutting it to a third requires a sample \(3^2 = 9\) times larger, as the calculation below shows.

  7. TRUE. The margin of error is \(1.96 \times 2.25 \approx 4.41\), which rounds to 4.4.

n3 <- 436*9 #Sample size times 9
se3 <- sd/sqrt(n3) #standard error for larger sample size

m_of_e
## [1] 4.405038
m_of_e3 <- 1.96*se3 #margin of error for larger sample size
m_of_e3
## [1] 1.468346
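Item 6 can also be approached from the other direction: since \(ME = z^\star s/\sqrt{n}\), solving for \(n\) gives \(n = (z^\star s / ME)^2\). A minimal sketch, reusing the rounded SD of 46.93 and the original n = 436 from above; `required_n()` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: solve ME = z * s / sqrt(n) for n, given a target margin of error
required_n <- function(target_me, s, z = 1.96) {
  (z * s / target_me)^2
}

s <- 46.93                        # sample standard deviation from above
me_now  <- 1.96 * s / sqrt(436)   # current margin of error (about 4.41)
n_third <- required_n(me_now / 3, s)

print(n_third)         # sample size needed for one third of the margin
print(n_third / 436)   # ratio to the original sample size
```

In practice the result would be rounded up to the next whole number of respondents.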

4.24

gifted_data <- "https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%204%20Exercise%20Data/gifted.csv"
gifted <- read_csv(gifted_data)
## Parsed with column specification:
## cols(
##   score = col_integer(),
##   fatheriq = col_integer(),
##   motheriq = col_integer(),
##   speak = col_integer(),
##   count = col_integer(),
##   read = col_double(),
##   edutv = col_double(),
##   cartoons = col_double()
## )
count <- gifted$count
summary(count)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   21.00   28.00   31.00   30.69   34.25   39.00
hist(count)

n <- length(count) #sample size
sd <- sd(count) #standard deviation
  1. Yes. The sample is random and makes up less than 10% of the population, so the observations are independent. The sample size (36) is greater than 30, and the distribution is not strongly skewed.

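\(H_0: \mu = 32 \ months\)

\(H_A: \mu \lt 32 \ months\)

\(\alpha = 0.10\)

Standard Error (se) = \(\frac{sd}{\sqrt{n}}\)

se <- sd/sqrt(n)
se
## [1] 0.7191479

Z Score = \(\frac{\text{Sample Mean} - \text{Null Value}}{se}\)

z <- (30.69-32)/se
z
## [1] -1.8216

p-value (lower tail, because \(H_A\) is \(\mu < 32\))

p <- pnorm(30.69, mean = 32, sd = se) # P(sample mean <= 30.69) if H_0 is true
round(p, 4)
## [1] 0.0343

The p-value (about 0.034) is less than \(\alpha = 0.10\), so we reject the null hypothesis. The data provide convincing evidence that, on average, gifted children first count to 10 successfully earlier than 32 months.

  1. If the true average age at which gifted children first count to 10 were 32 months, there would be about a 3.4% chance of drawing a random sample of 36 gifted children with a mean age of 30.69 months or lower.

# z* = 1.645 for a 90% confidence interval (from a z table)
m <- mean(count)
lower <- m - 1.645 * se
upper <- m + 1.645 * se
c(lower, upper)
## [1] 29.51145 31.87744
  1. Yes, the results agree. The 90% confidence interval (29.51, 31.88) does not contain the null value of 32 months, so 32 is not a plausible average, which matches rejecting \(H_0\). (A 90% interval corresponds to a one-sided test at \(\alpha = 0.05\), so excluding 32 also implies rejection at \(\alpha = 0.10\).)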
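The hypothesis test and confidence interval for the gifted-children data can be wrapped in a small helper. This is a sketch, not a standard R function: `z_test_summary()` is a hypothetical name, it works from summary values (sample mean 30.69, standard error 0.7191479, null value 32) rather than raw data, and it uses `qnorm()` in place of the z-table value 1.645. The example call assumes the one-sided alternative \(H_A: \mu < 32\).

```r
# Hypothetical helper: one-sample z-test and confidence interval from summary values
z_test_summary <- function(xbar, se, mu0,
                           alternative = c("two.sided", "less", "greater"),
                           conf_level = 0.90) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / se
  p <- switch(alternative,
              less      = pnorm(z),
              greater   = pnorm(z, lower.tail = FALSE),
              two.sided = 2 * pnorm(-abs(z)))
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # 1.645 when conf_level = 0.90
  list(z = z, p_value = p,
       ci = c(xbar - z_star * se, xbar + z_star * se))
}

# Gifted children: H_0: mu = 32 vs H_A: mu < 32, alpha = 0.10
res <- z_test_summary(xbar = 30.69, se = 0.7191479, mu0 = 32,
                      alternative = "less")
print(round(res$z, 4))
print(round(res$p_value, 4))
print(round(res$ci, 2))
```

Choosing the tail through `alternative` keeps the p-value calculation tied to the direction stated in \(H_A\), which is the step most easily gotten backwards.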

4.26

iq <- gifted$motheriq

summary(iq)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   101.0   113.8   118.0   118.2   122.2   131.0
hist(iq)

sd <- sd(iq)
n <- length(iq)

\(H_0: \mu = 100\)

\(H_A: \mu \neq 100\)

\(\alpha = 0.10\)

Standard Error (se) = \(\frac{sd}{\sqrt{n}}\)

se <- sd/sqrt(n)
se
## [1] 1.084157

Z Score = \(\frac{\text{Sample Mean} - \text{Null Value}}{se}\)

z <- (118.2-100)/se
z
## [1] 16.78723

p-value

p <- 2 * (1-pnorm(118.2, 100, se))
p
## [1] 0

The p-value is essentially 0, far smaller than the significance level \(\alpha = 0.10\), so we reject the null hypothesis. The data provide convincing evidence that the average IQ of mothers of gifted children is different from the average IQ of the population (100).
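The printed p-value of exactly 0 is a floating-point artifact: for \(z \approx 16.8\), `pnorm()` returns a value so close to 1 that `1 - pnorm(...)` rounds to 0. Asking `pnorm()` for the upper tail directly with `lower.tail = FALSE` keeps the tiny, but positive, probability. A minimal sketch using the standard error computed above:

```r
se <- 1.084157                # standard error from above
z  <- (118.2 - 100) / se      # about 16.79

p_subtract <- 2 * (1 - pnorm(z))                # subtracting from 1 loses all precision
p_tail     <- 2 * pnorm(z, lower.tail = FALSE)  # computes the upper tail directly

print(p_subtract)
print(p_tail)
```

The conclusion does not change, but `lower.tail = FALSE` is the safer habit whenever a test statistic is far in the tail.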

# z* = 1.645 for a 90% confidence interval (from a z table)
m <- mean(iq)
lower <- m - 1.645 * se
upper <- m + 1.645 * se
c(lower, upper)
## [1] 116.3832 119.9501

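Yes, the results agree. The 90% confidence interval (116.4, 120.0) lies well above 100, so 100 is not a plausible value for the average IQ of mothers of gifted children, which is consistent with rejecting \(H_0\) in the hypothesis test.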

4.34

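The sampling distribution of the mean is the distribution of the sample means obtained from many random samples of the same size drawn from a population. It is usually plotted with the sample means on the x-axis and their frequency on the y-axis. With a small sample size, its shape resembles the shape of the population: if the population is skewed, the sampling distribution will be skewed too, and with few samples it can be hard to judge its shape from a plot. Its center is the population mean, and its spread, measured by the standard error \(\sigma/\sqrt{n}\), is relatively wide.

As the sample size increases, the sampling distribution becomes nearly normal (unimodal and symmetric) even if the population is not, its center stays at the population mean, and its spread shrinks in proportion to \(1/\sqrt{n}\), so the sample means cluster more tightly around the population mean.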
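A short simulation makes these claims concrete. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1 (an arbitrary choice for illustration), draws 10,000 samples at n = 5 and at n = 100, and compares the center and spread of the sample means with the population mean and \(\sigma/\sqrt{n}\).

```r
set.seed(42)
sim_means <- function(n, reps = 10000) {
  # reps sample means, each from a sample of size n drawn from Exp(rate = 1)
  replicate(reps, mean(rexp(n, rate = 1)))
}

small <- sim_means(5)
large <- sim_means(100)

# Center: both should be close to the population mean of 1
print(c(small = mean(small), large = mean(large)))
# Spread: compare with the theoretical standard error sigma / sqrt(n)
print(c(small = sd(small), theory = 1 / sqrt(5)))
print(c(large = sd(large), theory = 1 / sqrt(100)))

# Shape: the n = 5 histogram stays right skewed; the n = 100 one looks nearly normal
par(mfrow = c(1, 2))
hist(small, main = "n = 5", xlab = "Sample mean")
hist(large, main = "n = 100", xlab = "Sample mean")
```

The center does not move with \(n\); only the shape and the spread change.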

4.40

  1. Probability that a randomly chosen light bulb lasts more than 10,500 hours:
prob <- 1-pnorm(10500, mean = 9000, sd = 1000)
prob
## [1] 0.0668072
  1. Since the problem states that the lifespans of the bulbs are nearly normal, the sampling distribution of the mean lifespan of 15 bulbs is also nearly normal, with mean 9,000 hours and standard error \(1000/\sqrt{15} \approx 258.2\) hours. We can use the standard error to find the Z-Score.

Standard Error (se) = \(\frac{1000}{\sqrt{15}}\)

Z Score = \(\frac{\text{Sample Mean} - \text{Population Mean}}{se}\)

se <- 1000/sqrt(15)
se
## [1] 258.1989
z <- (10500-9000)/se
z
## [1] 5.809475
  1. The probability that the mean lifespan of 15 randomly chosen bulbs exceeds 10,500 hours is essentially 0 (about \(3.1 \times 10^{-9}\)).
prob <- 1-pnorm(10500, 9000, se)
prob
## [1] 3.133452e-09
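Both normal-model answers can be sanity-checked by simulation. This sketch draws individual lifespans and means of 15 lifespans from the advertised normal model (mean 9,000, SD 1,000); the seed and the number of replicates are arbitrary choices.

```r
set.seed(2017)
reps <- 100000

# Individual bulbs drawn from the advertised population
single <- rnorm(reps, mean = 9000, sd = 1000)

# Means of 15 bulbs: one row per sample, 15 columns
samples <- matrix(rnorm(reps * 15, mean = 9000, sd = 1000), ncol = 15)
means15 <- rowMeans(samples)

p_single <- mean(single > 10500)   # normal model gives about 0.0668
p_mean   <- mean(means15 > 10500)  # normal model gives about 3.1e-09, so this is almost surely 0
sd_means <- sd(means15)            # should be close to 1000 / sqrt(15), about 258.2

print(c(p_single = p_single, p_mean = p_mean, sd_means = sd_means))
```

A tail probability of about \(3 \times 10^{-9}\) is too small to estimate by brute-force simulation, which is why the closed-form `pnorm()` calculation is used for part C.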

Population

x <- seq(5500, 12500, length.out = 1000)  # evenly spaced grid of lifespans (hours)
y <- dnorm(x, mean = 9000, sd = 1000)     # individual bulbs
plot(x, y, type = "l", col = "blue", xlab = "Lifespan (hours)", ylab = "Density")

Sampling distribution of the mean (n = 15)

x <- seq(5500, 12500, length.out = 1000)  # same grid, so both plots share the x scale
y <- dnorm(x, mean = 9000, sd = se)       # se = 1000 / sqrt(15)
plot(x, y, type = "l", col = "blue", xlab = "Lifespan (hours)", ylab = "Density")

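  1. No, for neither part. Part A concerns a single bulb, and the CLT says nothing about individual observations, so with a skewed population we would need the actual distribution of lifespans rather than the normal model. For part C, a sample of 15 is too small for the CLT to make the sampling distribution of the mean nearly normal when the population is skewed, so the normal approximation would not be reliable either.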
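To see why the normal model breaks down for a skewed population, here is a sketch with an assumed right-skewed population that keeps the same mean (9,000) and SD (1,000): each lifespan is 8,000 hours plus an exponential with mean 1,000. This population is hypothetical, chosen because its probabilities have closed forms. The mean of 15 such lifespans is 8,000 plus a gamma variable, so `pexp()` and `pgamma()` give exact answers to compare with the normal approximations.

```r
# ASSUMED skewed population: X = 8000 + Exp(mean 1000), so E[X] = 9000 and SD(X) = 1000
threshold <- 10500

# Part A analogue: a single bulb
p_exact_single  <- pexp(threshold - 8000, rate = 1 / 1000, lower.tail = FALSE)  # exp(-2.5)
p_normal_single <- pnorm(threshold, mean = 9000, sd = 1000, lower.tail = FALSE)

# Part C analogue: the mean of 15 bulbs.
# The mean of 15 Exp(rate = 1/1000) draws is Gamma(shape = 15, rate = 15/1000).
p_exact_mean  <- pgamma(threshold - 8000, shape = 15, rate = 15 / 1000,
                        lower.tail = FALSE)
p_normal_mean <- pnorm(threshold, mean = 9000, sd = 1000 / sqrt(15),
                       lower.tail = FALSE)

print(c(exact = p_exact_single, normal = p_normal_single))
print(c(exact = p_exact_mean, normal = p_normal_mean))
```

For the single bulb the normal model is off by a noticeable amount; for the mean of 15 it underestimates the tail probability by orders of magnitude, because n = 15 is not large enough to wash out the skew.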

4.48

\(z = \frac{(\bar x - \mu) \sqrt{n}}{\sigma}\)

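The larger \(n\) is, the larger \(|z|\) becomes, because \(z\) grows in proportion to \(\sqrt{n}\) when \(\bar x\), \(\mu\) and \(\sigma\) stay the same. A larger \(|z|\) lies further in the tail of the standard normal distribution (further into the region that favors \(H_A\)), so the p-value decreases as the sample size increases.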
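The effect of \(n\) can be shown numerically. In this sketch the starting p-value of 0.08 at n = 50, the increase to n = 500, and the two-sided test are all illustrative assumptions; \(\bar x\), \(\mu\) and \(\sigma\) are held fixed, so only the \(\sqrt{n}\) factor changes.

```r
# Illustrative (ASSUMED) values: a two-sided test with p = 0.08 at n = 50
p_old <- 0.08
n_old <- 50
n_new <- 500

z_old <- qnorm(1 - p_old / 2)          # |z| that gives a two-sided p-value of 0.08
z_new <- z_old * sqrt(n_new / n_old)   # same xbar, mu and sigma, larger n
p_new <- 2 * pnorm(z_new, lower.tail = FALSE)

print(c(z_old = z_old, z_new = z_new))
print(c(p_old = p_old, p_new = p_new))
```

Multiplying \(n\) by 10 multiplies \(z\) by \(\sqrt{10} \approx 3.16\), which moves a borderline result far into the tail.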