Foundations for inference

Data 606 - Homework 4

Heather Geiger; March 18, 2018

Questions

Question 4.4

Mean = 171.1. Median = 170.3
SD = 9.4, IQR = 177.8 - 163.8 = 14.
z(180) = (180 - 171.1)/9.4 = 0.95. z(155) = (155 - 171.1)/9.4 = -1.71. The person who is 180cm tall is not unusually tall, as their height is less than 1 SD above the mean. The person who is 155cm tall is not unusually short either, as their height is less than 2 SD below the mean. However, they are close to that threshold, and their height is more unusual than that of the person who is 180cm tall.
When we take a different sample, we expect the mean and standard deviation of the sample to change slightly from the original sample due to random chance. So, we would not expect them to be exactly the same.
Standard error of the mean here is SD/sqrt(n) = 9.4/sqrt(507) = 0.42.
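
The arithmetic in the last few answers can be checked directly in R. This is a minimal sketch that uses the rounded summary statistics quoted above; the variable names are introduced here for illustration only.

```r
# Summary statistics quoted above (rounded values)
height_mean <- 171.1
height_sd <- 9.4
n <- 507

# Z-scores for the two heights discussed above
z_180 <- (180 - height_mean) / height_sd
z_155 <- (155 - height_mean) / height_sd

# Standard error of the sample mean
se_mean <- height_sd / sqrt(n)

round(c(z_180 = z_180, z_155 = z_155, se_mean = se_mean), 2)
```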

Question 4.14

False. The sample mean of the current sample always falls within the confidence interval.
False. The sample is large enough that it should be able to accommodate a bit of skew.
False. The confidence interval is a statement about the population mean, not about what the means of other samples would be.
True.
True.
False. The denominator for the margin of error is equal to the square root of n. To decrease the margin of error to a third of what it is now, we would need to increase the denominator by a factor of 3. The square root of 9n is equal to 3 x the square root of n, so we would need to use a sample 9 times larger.
True. (89.11 - 80.31)/2 = 4.4.
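
The sample-size argument in the last two answers can be checked numerically. The sketch below assumes the usual normal-based margin of error (critical value times standard error); the function name `margin_of_error` and the values of `example_sd` and `example_n` are hypothetical and chosen only to show the scaling.

```r
# Margin of error for a mean: critical value times the standard error
margin_of_error <- function(sample_sd, sample_size, confidence = 0.95) {
  z_star <- qnorm(1 - (1 - confidence) / 2)
  z_star * sample_sd / sqrt(sample_size)
}

# Hypothetical values, chosen only to illustrate the scaling
example_sd <- 50
example_n <- 400

me_n <- margin_of_error(example_sd, example_n)
me_3n <- margin_of_error(example_sd, 3 * example_n)
me_9n <- margin_of_error(example_sd, 9 * example_n)

me_n / me_3n  # sqrt(3), about 1.73: tripling n is not enough
me_n / me_9n  # 3: a sample 9 times larger cuts the margin to a third

# Margin of error recovered from the interval endpoints quoted above
me_from_interval <- (89.11 - 80.31) / 2
me_from_interval
```

The ratios do not depend on the particular standard deviation or starting sample size, since both cancel out.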

Question 4.24

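# One-tailed p-value for a z-test of the mean, in the direction of the observed difference
hypothesis_test_one_tailed <- function(mean_if_null,sample_size,sample_mean,sample_sd){
  standard_error_mean <- sample_sd/sqrt(sample_size)
  # abs() makes the z-score positive, so the lower tail below -|z| is the observed tail
  absolute_z <- abs(sample_mean - mean_if_null)/standard_error_mean
  p_one_tailed <- pnorm(-1*absolute_z)
  return(p_one_tailed)
}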

round(hypothesis_test_one_tailed(32,36,30.69,4.31),digits=3)

## [1] 0.034

round(hypothesis_test_one_tailed(32,36,30.69,4.31)*2,digits=3)

## [1] 0.068

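# Normal-based confidence interval for the mean; confidence level given as a percentage (e.g. 90)
confidence_interval <- function(sample_size,sample_mean,sample_sd,confidence_interval_percentage){
  standard_error_mean <- sample_sd/sqrt(sample_size)
  # Critical value: e.g. 90% leaves 5% in each tail, so take the 0.05 quantile and flip its sign
  zscore <- -1*qnorm((100 - confidence_interval_percentage)/200)
  zscore_times_se_mean <- zscore*standard_error_mean
  return(c(sample_mean - zscore_times_se_mean,sample_mean + zscore_times_se_mean))
}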

round(confidence_interval(36,30.69,4.31,90),digits=2)

## [1] 29.51 31.87

Yes, all of the conditions for inference are satisfied.
- The sample size of 36 students is almost certainly less than 10% of the population in a large city. So, we can reasonably assume independence.
- The sample size of 36 is greater than the minimum recommended n of 30.
- The data do not appear unreasonably skewed.
Based on the assumption that we decided to test the hypothesis that gifted children learn to count at a lower age before looking at the data, a one-tailed hypothesis test could be reasonable here. In that case, the p-value would be .034. If we wanted to be extra conservative anyway and use a two-tailed test, the p-value would be double, so .068. In either case, we would reject the null hypothesis, as the p-value in both scenarios is less than the significance level threshold of 0.10 we have chosen.
Using a one-tailed hypothesis test, we would say that if gifted children really counted to 10 at the same mean age as typical children, there would be only a 3.4% chance of seeing a sample mean this young or younger due to sampling variability alone. Using a two-tailed hypothesis test, there would be only a 6.8% chance of seeing a sample mean this far from the null mean (at least 1.31 months older or younger). In either case, the p-value is below our significance level of 0.10, so we reject the null hypothesis that the mean age of counting to 10 is the same for typical and gifted children.
A 90% confidence interval for the mean age at which gifted children first count to 10 is 29.51 to 31.87 months.
Yes, the results from the hypothesis test and the confidence interval agree. The mean for typical children (32 months) is higher than the upper bound of the 90% confidence interval, which is consistent with rejecting the null hypothesis at a significance level of 0.10.
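
The agreement is not a coincidence: a two-tailed test at a significance level of 0.10 and a 90% confidence interval use the same critical value. The sketch below recomputes both from the values passed to the functions above and checks that they lead to the same decision; the variable names are introduced here for illustration.

```r
# Values from the exercise, as used in the calls above
null_mean <- 32
n <- 36
sample_mean <- 30.69
sample_sd <- 4.31
alpha <- 0.10

se <- sample_sd / sqrt(n)
z <- (sample_mean - null_mean) / se
p_two_tailed <- 2 * pnorm(-abs(z))

# The 90% interval uses the same critical value as the two-tailed test at alpha = 0.10
z_star <- qnorm(1 - alpha / 2)
ci <- sample_mean + c(-1, 1) * z_star * se

reject_null <- p_two_tailed < alpha
null_outside_ci <- null_mean < ci[1] || null_mean > ci[2]

# Both checks give the same answer
c(reject_null = reject_null, null_outside_ci = null_outside_ci)
```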

Question 4.26

signif(hypothesis_test_one_tailed(100,36,118.2,6.5)*2,3)

## [1] 2.44e-63

round(confidence_interval(36,118.2,6.5,90),digits=2)

## [1] 116.42 119.98

Since the question is whether the average IQ of mothers of gifted children is different from that of mothers of typical children (not whether it is higher), we multiply the result of the one-tailed hypothesis test by two to get the p-value for a two-tailed hypothesis test. We get a p-value of 2.44e-63, which is far lower than our alpha of 0.10. We reject the null hypothesis in favor of the alternative hypothesis that the average IQ of mothers of gifted children is different from that of mothers of typical children.
The 90% confidence interval around the average IQ of mothers of gifted children is 116.42 to 119.98.
Yes, the results of the hypothesis test and the confidence interval agree. The mean IQ for the mothers of typical children is much lower than the lower bound of the 90% confidence interval, which confirms what we found using the hypothesis test.

Question 4.34

The sampling distribution of the mean describes the distribution of values you would get if you took a number of samples from a population, using the same n each time, and took the mean of each of those samples.

The sampling distribution of the mean is approximately normal when n is large. When the samples are small and the population is skewed, its shape may be noticeably non-normal, but it gets closer and closer to normal as n increases.

The sampling distribution of the mean should be centered around the population mean.

The spread (standard deviation) of the sampling distribution of the mean decreases as n increases.
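
A small simulation can illustrate the points about center and spread. This is a sketch under stated assumptions: the population is a hypothetical right-skewed exponential distribution with mean 10, and the seed, sample sizes, and helper names (`draw_sample`, `simulate_means`) are arbitrary choices made for the example.

```r
set.seed(606)  # arbitrary seed, only for reproducibility

# Hypothetical right-skewed population: exponential with mean 10 and sd 10
population_mean <- 10
draw_sample <- function(n) rexp(n, rate = 1 / population_mean)

# Simulate many sample means for a given sample size
simulate_means <- function(n, reps = 5000) {
  replicate(reps, mean(draw_sample(n)))
}

sample_sizes <- c(5, 30, 200)
sim <- lapply(sample_sizes, simulate_means)

# Center stays near the population mean; spread shrinks like sd / sqrt(n)
data.frame(
  n = sample_sizes,
  mean_of_means = sapply(sim, mean),
  sd_of_means = sapply(sim, sd),
  theoretical_se = population_mean / sqrt(sample_sizes)
)
```

Calling `hist(sim[[1]])` and `hist(sim[[3]])` shows the change in shape: the histogram for n = 5 is still visibly right-skewed, while the one for n = 200 looks close to normal.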

Question 4.40

pnorm(-1.5)

## [1] 0.0668072

1000/sqrt(15)

## [1] 258.1989

1500/(1000/sqrt(15))

## [1] 5.809475

pnorm(-1*1500/(1000/sqrt(15)))

## [1] 3.133452e-09

# Draws the sampling distribution of the mean as a dashed density curve and sets up the axes
plot_setup_sampling_dist <- function(mytitle,max_abs_sd,mymean,mysd,myxmin,myxmax){
  # x grid spanning max_abs_sd standard deviations on either side of the mean
  x <- seq((-1*max_abs_sd)*mysd + mymean,max_abs_sd*mysd + mymean,length=200)
  # The x-axis is in the original units (hours), not z-scores
  plot(x,dnorm(x,mean=mymean,sd=mysd),
    type="l",xlab="Lifespan (hours)",ylab="Density",main=mytitle,
    xlim=c(myxmin,myxmax),
    lty=2)
  abline(h=0)
}

# Adds the population distribution as a solid density curve on the existing plot
plot_setup_population_dist <- function(max_abs_sd,mymean,mysd){
  x <- seq((-1*max_abs_sd)*mysd + mymean,max_abs_sd*mysd + mymean,length=200)
  lines(x,dnorm(x,mean=mymean,sd=mysd),type="l")
}

plot_setup_sampling_dist("Population and sampling distributions",6,9000,1000/sqrt(15),3000,15000)
plot_setup_population_dist(6,9000,1000)
legend("topleft",legend=c("Population","Sampling distribution of the mean"),lty=c(1,2),bty="n",cex=0.7)

The z-score for 10,500 hours is (10500 - 9000)/1000 = 1.5. The probability that a randomly chosen light bulb lasts more than 10,500 hours is therefore the probability of getting z > 1.5, which by symmetry equals the probability of z < -1.5 = 0.0668, or 6.68%.
The distribution of the mean lifespan of 15 light bulbs has a mean of 9,000 and a standard deviation of 1000/sqrt(15) ~ 258.20. It is roughly normally distributed.
Z-score here is 1500/258.20 ~ 5.81. Probability of getting z > 5.81 ~ 3.13e-9.
See plot above.
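No, you could not estimate the probabilities in (a) and (c) with a skewed distribution. Part (a) relies on the population itself being nearly normal, and in part (c) a sample of only 15 is too small for the sampling distribution of the mean to be reliably normal when the population is skewed.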
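
To get a feel for how much skew can matter, the sketch below compares the normal-model probabilities with exact probabilities for one hypothetical skewed population that has the same mean (9,000) and standard deviation (1,000): a lifespan of 8,000 hours plus an exponential amount with mean 1,000. That population is an assumption made purely for illustration and is not part of the exercise.

```r
# Normal-model answers, matching the calculations above
normal_single <- pnorm(10500, mean = 9000, sd = 1000, lower.tail = FALSE)
normal_mean15 <- pnorm(10500, mean = 9000, sd = 1000 / sqrt(15), lower.tail = FALSE)

# Hypothetical skewed population with the same mean and sd:
# lifespan = 8000 + Exponential(mean 1000), so mean 9000 and sd 1000
skewed_single <- pexp(10500 - 8000, rate = 1 / 1000, lower.tail = FALSE)

# The sum of 15 such exponentials is Gamma(shape = 15, scale = 1000),
# so the mean exceeds 10500 exactly when the sum exceeds 15 * (10500 - 8000)
skewed_mean15 <- pgamma(15 * (10500 - 8000), shape = 15, scale = 1000,
                        lower.tail = FALSE)

signif(c(normal_single = normal_single, skewed_single = skewed_single,
         normal_mean15 = normal_mean15, skewed_mean15 = skewed_mean15), 3)
```

For this particular population the single-bulb probabilities differ modestly, while the probability for the mean of 15 bulbs differs by several orders of magnitude, because far-tail probabilities are very sensitive to the shape of the distribution.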

Question 4.48

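The p-value will decrease, though not by a fixed factor. The p-value is based on comparing the difference we see to the standard error of the mean.

Before, we got the standard error of the mean by dividing by sqrt(50); now we divide by sqrt(500), which is sqrt(10) ~ 3.16 times larger. The standard error therefore shrinks by a factor of ~3.16, the z-score grows by the same factor, and the tail area beyond it (the p-value) gets smaller.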
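
The effect of the larger sample can be seen with a quick calculation. The observed difference and standard deviation below are hypothetical numbers picked only so that the p-value at n = 50 is moderate; the helper `two_tailed_p` is likewise introduced just for this sketch.

```r
# Hypothetical observed difference from the null mean and sample sd
observed_difference <- 1
sample_sd <- 4

# Two-tailed p-value of a z-test for a given sample size
two_tailed_p <- function(n) {
  z <- observed_difference / (sample_sd / sqrt(n))
  2 * pnorm(-abs(z))
}

z_50 <- observed_difference / (sample_sd / sqrt(50))
z_500 <- observed_difference / (sample_sd / sqrt(500))

z_500 / z_50  # sqrt(10), about 3.16

p_50 <- two_tailed_p(50)
p_500 <- two_tailed_p(500)

# The p-value shrinks by far more than a factor of sqrt(10)
signif(c(p_50 = p_50, p_500 = p_500, ratio = p_50 / p_500), 3)
```

The z-score scales with sqrt(n), but the p-value is a tail area of the normal curve, so it drops much faster than the z-score grows.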