# Load the data
bdims <- read.csv('https://raw.githubusercontent.com/jbryer/DATA606Fall2018/master/data/openintro.org/Ch%204%20Exercise%20Data/bdims.csv')
#a The point estimate for the average height of active individuals is 171.14.
mean(bdims$hgt)
## [1] 171.1438
# The median is 170.3
median(bdims$hgt)
## [1] 170.3
#b The point estimate for the standard deviation for the heights of active individuals is 9.41.
sd(bdims$hgt)
## [1] 9.407205
# The IQR is 14
IQR(bdims$hgt)
## [1] 14
#c A person who is 180 cm tall is not considered unusually tall because the height is within 2 standard deviations of the mean.
# A person who is 155 cm tall is not considered unusually short because the height is within 2 standard deviations of the mean.
x1 <- 180
x2 <- 155
mu <- mean(bdims$hgt)
sigma <- sd(bdims$hgt)
(x1 - mu) / sigma # tall person
## [1] 0.9414287
(x2 - mu) / sigma # short person
## [1] -1.716109
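The 2-standard-deviation rule used in part (c) can be wrapped in a small helper. This is a minimal sketch: the function name `is_unusual` and its `cutoff` argument are illustrative, and the mean and standard deviation are hard-coded from the output above so the block runs without downloading the data.

```r
# Hypothetical helper: flag observations more than `cutoff`
# standard deviations away from the mean
is_unusual <- function(x, center, spread, cutoff = 2) {
  z <- (x - center) / spread
  abs(z) > cutoff
}

# Mean and standard deviation copied from the output above
hgt_mean <- 171.1438
hgt_sd <- 9.407205

flags_c <- is_unusual(c(180, 155), hgt_mean, hgt_sd)  # the two heights from part (c)
flags_c
flag_195 <- is_unusual(195, hgt_mean, hgt_sd)         # z is about 2.54
flag_195
```

A height such as 195 cm would be flagged, while both heights from part (c) would not.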
#d A new random sample would give a slightly different mean and standard deviation, because point estimates vary from sample to sample.
# With a sample this large (n = 507), the new values should still be close to the ones above.
#e We use the standard error, SE = s / sqrt(n), to quantify the variability of such an estimate, i.e. 0.42
sigma / sqrt(nrow(bdims)) # n = 507 observations
## [1] 0.4177887
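To see what the standard error measures, one can simulate many samples and look at how much their means vary. This is only a sketch under stated assumptions: the population is taken to be normal with the mean and standard deviation reported above, and the seed, the number of replications, and all variable names are arbitrary choices.

```r
set.seed(606)  # arbitrary seed, for reproducibility only

# Values copied from the output above; the normal population is an assumption
sim_mu <- 171.1438
sim_sigma <- 9.407205
sim_n <- 507

# Draw 2000 samples of size 507 and record each sample mean
sim_means <- replicate(2000, mean(rnorm(sim_n, sim_mu, sim_sigma)))

sd(sim_means)            # empirical spread of the simulated sample means
sim_sigma / sqrt(sim_n)  # formula-based standard error
```

The two numbers should be close, which is why the formula can stand in for repeated sampling.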
#a False. We are 100% certain that the sample mean lies between $80.31 and $89.11; the confidence interval is a statement about the population mean.
#b False. The confidence interval is valid because the right skew is not extreme and n is greater than 30.
#c False. A confidence interval is a statement about the population mean, not about the means of other random samples.
#d True. By definition, a confidence interval gives a plausible range of values for the population parameter.
#e True. A 90% confidence interval uses a smaller critical value (1.645 instead of 1.96), so it is narrower.
#f False. The margin of error shrinks with sqrt(n), so cutting it to a third requires a sample 9 times larger.
#g True. The margin of error is 4.4
upperbound <- 89.11
lowerbound <- 80.31
(upperbound - lowerbound) / 2
## [1] 4.4
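The relationship between margin of error and sample size in parts (f) and (g) can be checked numerically. The sketch below assumes the interval is a 95% z-interval (the confidence level is not restated in this section), and the variable and function names are illustrative.

```r
ci_lower <- 80.31
ci_upper <- 89.11

point_estimate <- (ci_lower + ci_upper) / 2  # center of the interval
margin <- (ci_upper - ci_lower) / 2          # margin of error
se_implied <- margin / qnorm(0.975)          # assumes a 95% z-interval

# Margin of error if the sample were k times larger (SE scales with 1 / sqrt(k))
margin_scaled <- function(k) qnorm(0.975) * se_implied / sqrt(k)

margin_scaled(9) / margin  # one third of the original margin
margin_scaled(3) / margin  # about 0.58, not one third
```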
24 Researchers investigating characteristics of gifted children collected data from schools in a large city on a random sample of thirty-six children who were identified as gifted children soon after they reached the age of four. The following histogram shows the distribution of the ages (in months) at which these children first counted to 10 successfully. Also provided are some sample statistics.
a) Are conditions for inference satisfied?
b) Suppose you read online that children first count to 10 successfully when they are 32 months old, on average. Perform a hypothesis test to evaluate if these data provide convincing evidence that the average age at which gifted children first count to 10 successfully is less than the general average of 32 months. Use a significance level of 0.10.
c) Interpret the p-value in context of the hypothesis test and the data.
d) Calculate a 90% confidence interval for the average age at which gifted children first count to 10 successfully.
e) Do your results from the hypothesis test and the confidence interval agree? Explain.
#a Yes, conditions for inference are satisfied: the children are a random sample, so the observations can be treated as independent.
# Also, the distribution is nearly normal with very little skew, and n = 36 is large enough (n >= 30).
#b Null hypothesis: the average age at which gifted children first count to 10 successfully is mu = 32 months
# Alternative hypothesis: the average age at which gifted children first count to 10 successfully is mu < 32 months
xbar <- 30.69
sd <- 4.31
n <- 36
a <- 32
se <- sd / sqrt(n) # standard error
z_value <- (xbar - a) / se
pnorm(z_value) # one-sided (lower-tail) p-value, matching the alternative mu < 32
## [1] 0.0341013
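The choice of tail matters for the p-value: a one-sided alternative such as mu < 32 uses only the lower tail, while a two-sided alternative doubles the tail area. The helper below is a sketch for comparing the two from summary statistics; the name `z_test_p` and its arguments are hypothetical, not part of any package.

```r
# Hypothetical helper: p-value of a z-test for a mean from summary statistics
z_test_p <- function(sample_mean, null_value, sample_sd, size,
                     alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (sample_mean - null_value) / (sample_sd / sqrt(size))
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z)),
         less = pnorm(z),
         greater = pnorm(z, lower.tail = FALSE))
}

p_less <- z_test_p(30.69, 32, 4.31, 36, alternative = "less")
p_two <- z_test_p(30.69, 32, 4.31, 36, alternative = "two.sided")
c(less = p_less, two.sided = p_two)
```

Both p-values fall below the 0.10 significance level here, so the decision does not change, but the one-sided value is the one that matches the stated alternative.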
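#c The p-value is the probability of observing a sample mean of 30.69 months or lower if the true average were 32 months.
# Since the p-value is less than the significance level of 0.10, we reject the null hypothesis.
# This suggests that gifted children take less than 32 months, on average, to successfully count to 10.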
#d The 90% confidence interval for the average age at which gifted children first count to 10 successfully is (29.51, 31.87)
lower_tail <- xbar - 1.645 * se
upper_tail <- xbar + 1.645 * se
round(c(lower_tail, upper_tail), 2)
## [1] 29.51 31.87
#e Yes, the results from the hypothesis test and the confidence interval agree.
# The null value of 32 months is not in the confidence interval (the upper bound is less than 32),
# meaning that on average these children take less than 32 months to successfully count to 10.
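The agreement between the test and the interval can be checked directly by asking whether the null value falls inside the interval. This is a minimal sketch: `z_interval` is a hypothetical helper, and it uses `qnorm()` to obtain the critical value instead of the rounded 1.645.

```r
# Hypothetical helper: z-based confidence interval from summary statistics
z_interval <- function(sample_mean, sample_sd, size, level = 0.90) {
  z_star <- qnorm(1 - (1 - level) / 2)  # about 1.645 for a 90% interval
  sample_mean + c(-1, 1) * z_star * sample_sd / sqrt(size)
}

ci_90 <- z_interval(30.69, 4.31, 36)
round(ci_90, 2)

# TRUE when the null value lies outside the interval
null_value <- 32
null_outside <- null_value < ci_90[1] || null_value > ci_90[2]
null_outside
```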
#a Null hypothesis: the average IQ of mothers of gifted children is mu = 100
# Alternative hypothesis: the average IQ of mothers of gifted children is mu != 100
# Since the p-value is less than the significance level, we reject the null hypothesis.
# This suggests that the average IQ of mothers of gifted children is different from 100.
gifted <- read.csv('https://raw.githubusercontent.com/jbryer/DATA606Fall2018/master/data/openintro.org/Ch%204%20Exercise%20Data/gifted.csv')
mother_iq <- gifted$motheriq
xbar <- mean(mother_iq)
sd <- sd(mother_iq)
n <- 36
a <- 100
se <- sd / sqrt(n)
z_value <- (xbar - a) / se
2*pnorm(-abs(z_value))
## [1] 5.077477e-63
#b 90% confidence interval for the average IQ of mothers of gifted children is (116.38, 119.95)
lower_tail <- xbar - 1.645 * se
upper_tail <- xbar + 1.645 * se
round(c(lower_tail, upper_tail), 2)
## [1] 116.38 119.95
#c Yes, the results agree: the null value of 100 lies below the entire confidence interval, which is consistent with rejecting the null hypothesis.
# The sampling distribution of the mean is the distribution of sample means computed from repeated random samples of independent observations.
# As the sample size increases, its shape approaches a symmetric normal distribution centered at the true population mean.
# As the sample size increases we would expect samples to yield more consistent sample means,
# hence the variability among the sample means would be lower.
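A short simulation illustrates how the spread of sample means shrinks as the sample size grows. Everything in this sketch is an assumption made for illustration: the right-skewed exponential population with mean 1, the seed, the sample sizes, and the helper name `spread_of_means`.

```r
set.seed(606)  # arbitrary seed, for reproducibility only

# Standard deviation of simulated sample means for a given sample size,
# drawing from a right-skewed exponential population with mean 1 (an assumption)
spread_of_means <- function(size, reps = 2000) {
  sd(replicate(reps, mean(rexp(size, rate = 1))))
}

sizes <- c(5, 30, 100)
spreads <- sapply(sizes, spread_of_means)
data.frame(n = sizes, simulated = spreads, theoretical = 1 / sqrt(sizes))
```

The simulated spread should track 1 / sqrt(n), falling as n increases even though the population itself is skewed.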
xbar <- 9000
sd <- 1000
#a The probability that a randomly chosen light bulb lasts more than 10,500 hours is 6.68%
1 - pnorm(10500, xbar, sd)
## [1] 0.0668072
#b The distribution of the mean lifespan of 15 light bulbs is normally distributed with
# the mean 9,000 and with standard deviation 258.2 i.e. N(9000, 258.2)
n <- 15
se <- sd / sqrt(n)
se
## [1] 258.1989
#c The probability that the mean lifespan of 15 randomly chosen light bulbs
# is more than 10,500 hours is approximately 0%
a <- 10500
1 - pnorm(a, xbar, se)
## [1] 3.133452e-09
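The tiny probability in part (c) is easier to read as a z-score, and it can be computed without subtracting from 1. This is a sketch using the numbers above; the variable names are illustrative, and `lower.tail = FALSE` is a standard argument of `pnorm()`.

```r
bulb_mean <- 9000
bulb_sd <- 1000
bulb_n <- 15
bulb_se <- bulb_sd / sqrt(bulb_n)

# Distance of 10,500 from the mean, measured in standard errors
bulb_z <- (10500 - bulb_mean) / bulb_se
bulb_z

# Upper-tail probability computed directly, avoiding the subtraction 1 - pnorm(...)
p_upper <- pnorm(10500, bulb_mean, bulb_se, lower.tail = FALSE)
p_upper
```

A sample mean almost 6 standard errors above the population mean is what makes the probability effectively zero.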
#d Plot of population distribution and sampling distribution
par(mfrow = c(2,1))
bulb_population <- rnorm(10000, xbar, sd) # individual bulbs: N(9000, 1000)
bulb_means <- rnorm(10000, xbar, se) # means of 15 bulbs: N(9000, 258.2)
hist(bulb_population, xlim = c(4000,14000), prob = TRUE)
hist(bulb_means, xlim = c(4000,14000), prob = TRUE)
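#e If the lifespans of light bulbs had a skewed distribution, we could not estimate the probability in part (a), because that calculation relies on the population being normal.
# We could not estimate part (c) either: the sample size of 15 is less than 30, too small for the sampling distribution of the mean to be nearly normal.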
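# If the sample size increases, the p-value decreases (assuming the point estimate and standard deviation stay the same):
# the standard error shrinks, so the test statistic moves further from 0 and the evidence against the null hypothesis gets stronger.
# As we sample more observations from the population, the sample mean gets closer to the true population mean, so there is less uncertainty.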
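The effect of sample size on the p-value can be illustrated by holding the summary statistics fixed and varying only n. The sketch below reuses the counting-age numbers from earlier (sample mean 30.69, standard deviation 4.31, null value 32) purely as an example; the function name and the sample sizes tried are hypothetical.

```r
# p-value of a lower-tail z-test as a function of sample size,
# holding the sample mean and standard deviation fixed (an assumption)
p_for_size <- function(size, sample_mean = 30.69, null_value = 32, sample_sd = 4.31) {
  pnorm((sample_mean - null_value) / (sample_sd / sqrt(size)))
}

sizes_tried <- c(9, 36, 144)
p_values <- sapply(sizes_tried, p_for_size)
data.frame(n = sizes_tried, p_value = p_values)
```

The same observed difference of 1.31 months yields a smaller p-value each time the sample size grows.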