Chapter 4 Foundations for Inference
x <- 180
mu <- mean(bdims$hgt)
sd <- sd(bdims$hgt)
z <- (x - mu) / sd # 0.94
abs(z) > 2 # unusual means more than 2 SDs from the mean, in either direction
## [1] FALSE
As shown above, the z-score is within 2 standard deviations of the mean, so a height of 180cm is not unusual.
155cm
x <- 155
mu <- mean(bdims$hgt)
sd <- sd(bdims$hgt)
z <- (x - mu) / sd # -1.71
abs(z) > 2
## [1] FALSE
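The two checks above follow the same pattern, so they could be wrapped in a small helper. This is only a sketch: `is_unusual` is a hypothetical name, the cutoff of 2 is the rule of thumb used above, and `heights` is a made-up vector standing in for `bdims$hgt`, not the real data.

```r
# Hypothetical helper: TRUE when x is more than `cutoff` SDs from the mean of `values`
is_unusual <- function(x, values, cutoff = 2) {
  z <- (x - mean(values)) / sd(values)
  abs(z) > cutoff
}

# Made-up heights in cm (a stand-in for bdims$hgt, not the real data)
heights <- c(160, 165, 168, 170, 171, 172, 175, 178, 182, 185)

is_unusual(180, heights) # within 2 SDs of the mean of this toy vector
is_unusual(140, heights) # far below the mean of this toy vector
```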
No, I would not expect a new sample to match the one above exactly. Its mean and SD would be close to these values, but not the same.
n.samp <- 507
SE <- sd/sqrt(n.samp)
SE
## [1] 0.4177887
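To see why the SE describes how much the mean would change from sample to sample, a small simulation can help. This is a sketch under stated assumptions: the population is taken to be normal with a mean of 171 and an SD of 9.4, values picked for illustration rather than taken from `bdims`.

```r
set.seed(1)

# Assumed population parameters (illustrative, not computed from bdims)
pop_mean <- 171
pop_sd   <- 9.4
n        <- 507

# Means of 2000 independent samples of size n
sample_means <- replicate(2000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

sd(sample_means)   # simulated spread of the sample means
pop_sd / sqrt(n)   # theoretical standard error
```

The two numbers should be close, which is what lets a single sample's SE stand in for the variability across samples.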
F The sample mean (point estimate), 84.7067651, is always inside the confidence interval. The 95% confidence level refers to covering the population mean (a parameter), not the sample mean.
F For the sample mean to be nearly normal and the SE to be accurate, the sample observations should be independent, the sample size should be larger than 30, and the population distribution should not be strongly skewed. If there are prominent outliers, the sample should have at least 100 observations. The conditions for the confidence interval are met in this example. Each sample will show a somewhat different distribution.
F The confidence interval is a statement about the population mean, not about the means of other samples. Different samples, and different sample sizes, give different confidence intervals, as we saw in lab4b. What we can state is that about 95% of the intervals built this way from random samples of n = 436 would contain the population mean.
T We are 95% confident that the interval covers the parameter value (the average spending of American adults).
T The higher the confidence level, the wider the interval; a lower confidence level gives a narrower interval.
F A sample 3 times larger is not enough, since SE = sigma / sqrt(n). To shrink the margin of error to 1/3 of what it is now, we need a sample 9 times larger than 436 (n = 3924), because sigma / sqrt(9 * n) = (1/3) * sigma / sqrt(n).
T The margin of error is z * SE = 4.4.
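The answers about interval width can be checked numerically from the figures quoted above (point estimate 84.7067651, margin of error 4.4, n = 436). This is a sketch: `me_for` is a hypothetical helper, and it uses `qnorm` for the critical value instead of a rounded 1.96.

```r
xbar <- 84.7067651   # point estimate quoted above
me95 <- 4.4          # 95% margin of error quoted above
n    <- 436          # sample size quoted above

# Hypothetical helper: margin of error for a confidence level and a standard error
me_for <- function(conf, se) qnorm(1 - (1 - conf) / 2) * se

se       <- me95 / qnorm(0.975)   # standard error implied by the 95% margin
me90     <- me_for(0.90, se)      # 90% margin, narrower than the 95% margin
n_needed <- 9 * n                 # sample size that divides the SE by 3
se_9n    <- se / sqrt(9)          # SE with 9 times as many observations
me95_9n  <- me_for(0.95, se_9n)   # one third of the original 95% margin

c(lower = xbar - me95, upper = xbar + me95)
c(me90 = me90, me95 = me95, me95_9n = me95_9n)
```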
The observations are independent and randomly selected, and the sample size is larger than 30, so the sample satisfies the basic requirements.
H0: mu = 32
Ha: mu < 32
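x <- 32 # null value (H0: mu = 32)
n <- 36 # sample size
min <- 21 # sample minimum
mu <- 30.69 # sample mean
sd <- 4.31 # sample standard deviation
max <- 39 # sample maximum
alpha <- 0.10 # significance level
Z <- (mu - x) / (sd / sqrt(n)) # test statistic, about -1.82
P <- pnorm(Z, mean = 0, sd = 1) # lower-tail p-value, about 0.034
P >= alpha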
## [1] FALSE
As shown above, the p-value is lower than the significance level of 0.10, so we reject the null hypothesis H0.
There is significant evidence that gifted children first count to 10 at a younger average age than the general population.
low <- mu - 1.645 * sd / sqrt(n)
high <- mu + 1.645 * sd / sqrt(n)
The 90% confidence interval is (29.5083417, 31.8716583).
e. Yes. We are 90% confident that the population mean for gifted children is between 29.5083417 and 31.8716583 months. The null value of 32 months lies outside the confidence interval, which agrees with rejecting H0.
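The test and the interval above share the same ingredients, so they could be computed together. This is a minimal sketch: `z_test_mean` is a hypothetical helper (not from any package), it covers only the lower-tail alternative used here, and it uses `qnorm` for the critical value instead of 1.645, so the interval differs slightly in the fourth decimal.

```r
# Hypothetical helper: lower-tail z test for one mean plus a two-sided interval
z_test_mean <- function(xbar, mu0, s, n, conf = 0.90) {
  se    <- s / sqrt(n)                 # standard error of the mean
  z     <- (xbar - mu0) / se           # test statistic
  zstar <- qnorm(1 - (1 - conf) / 2)   # critical value for the interval
  list(z = z,
       p_lower = pnorm(z),             # P(Z <= z), for Ha: mu < mu0
       ci = c(xbar - zstar * se, xbar + zstar * se))
}

res <- z_test_mean(xbar = 30.69, mu0 = 32, s = 4.31, n = 36)
res$p_lower < 0.10   # TRUE means H0 is rejected at alpha = 0.10
res$ci
```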
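x <- 100 # null value (H0: mu = 100)
n <- 36 # sample size
min <- 101 # sample minimum
mu <- 118.2 # sample mean
sd <- 6.5 # sample standard deviation
max <- 131 # sample maximum
alpha <- 0.10 # significance level
z <- (mu - x) / (sd / sqrt(n)) # test statistic uses the standard error; z = 16.8
p <- 2 * (1 - pnorm(abs(z), 0, 1)) # two-sided p-value, matching Ha: mu != 100
p < alpha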
## [1] TRUE
H0: mu = 100 (the average IQ of mothers of gifted children equals the population average IQ)
Ha: mu != 100 (the average IQ of mothers of gifted children differs from the population average IQ)
Since p < alpha is TRUE, we reject the null hypothesis. The data provide strong evidence that the mean IQ of mothers of gifted children differs from 100, and the sample mean of 118.2 indicates that it is higher than in the general population.
SE <- sd / sqrt(n)
high <- mu + (1.645 * SE)
low <- mu - (1.645 * SE)
A 90% confidence interval for the average IQ of mothers of gifted children is (116.4179167, 119.9820833).
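A two-sided test at alpha = 0.10 and a 90% confidence interval should lead to the same decision, and that can be checked directly. This is a sketch using the summary statistics above; the variable names are mine, and `qnorm` replaces the rounded 1.645.

```r
xbar <- 118.2; mu0 <- 100; s <- 6.5; n <- 36; alpha <- 0.10

se <- s / sqrt(n)                 # standard error of the mean
z  <- (xbar - mu0) / se           # test statistic
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

zstar <- qnorm(1 - alpha / 2)     # critical value for a 90% interval
ci    <- c(xbar - zstar * se, xbar + zstar * se)

reject_by_p  <- p_two_sided < alpha          # decision from the p-value
reject_by_ci <- mu0 < ci[1] || mu0 > ci[2]   # decision from the interval
c(reject_by_p = reject_by_p, reject_by_ci = reject_by_ci)
```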
Sampling distribution of the mean:
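It is the distribution of the sample mean across repeated random samples of the same size. By the Central Limit Theorem it is approximately normal (given that the sample size is larger than 30 and the population is not strongly skewed) and it is centered at the population mean.
As the sample size increases, the shape becomes closer to normal, the center stays at the population mean, and the spread becomes narrower.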
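The narrowing of the sampling distribution can be illustrated with a short simulation. This is a sketch under an assumption of my own: the population is exponential with mean 1 and SD 1, chosen only because it is clearly skewed, and `sample_mean_sd` is a hypothetical helper.

```r
set.seed(42)

# Hypothetical helper: SD of the sample mean over `reps` samples of size n
# drawn from an assumed exponential population (mean 1, SD 1)
sample_mean_sd <- function(n, reps = 5000) {
  sd(replicate(reps, mean(rexp(n, rate = 1))))
}

sizes  <- c(5, 30, 100)
spread <- sapply(sizes, sample_mean_sd)

# Simulated spread next to the theoretical SE = sigma / sqrt(n), with sigma = 1
data.frame(n = sizes, simulated = spread, theoretical = 1 / sqrt(sizes))
```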
mu <- 9000
sd <- 1000
x <- 10500
z <- (x - mu) / sd
prob <- 1 - pnorm(z)
The probability that a randomly chosen light bulb lasts more than 10,500 hours is 0.0668072.
For a random sample of 15 independent light bulbs, the distribution of the mean lifespan is centered at the population mean of 9000 hours, has a nearly normal shape, and has a standard error of 1000 / sqrt(15), about 258 hours.
n <- 15
se <- sd / sqrt(n)
z <- (x - mu) / se
p <- pnorm(z, lower.tail = FALSE) # z is already standardized, so use the standard normal upper tail
p # roughly 3e-09
The probability that the mean lifespan of 15 randomly chosen light bulbs is more than 10,500 hours is approximately 0%.
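Both probabilities can also be obtained without computing a z-score by hand, by passing the mean and the appropriate standard deviation straight to `pnorm`. This is a sketch with my own variable names; the only difference between the two calls is the `sd` argument.

```r
mu <- 9000; sigma <- 1000; x <- 10500; n <- 15

# One bulb: spread is the population SD
p_single <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)

# Mean of 15 bulbs: spread is the standard error sigma / sqrt(n)
p_mean <- pnorm(x, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)

c(p_single = p_single, p_mean = p_mean)
```

Using `lower.tail = FALSE` avoids the subtraction `1 - pnorm(...)`, which loses precision when the upper tail is very small.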
library(ggplot2)
normal.sample <- seq(mu - (4 * sd), mu + (4 * sd), length.out = 100) # x grid for the population distribution
h.norm <- dnorm(normal.sample, mean = mu, sd = sd)
df <- data.frame(name = "Population", x = normal.sample, h.norm = h.norm)
random.sample <- seq(mu - (4 * se), mu + (4 * se), length.out = 100) # x grid for the sampling distribution of the mean (n = 15)
h.rand <- dnorm(random.sample, mean = mu, sd = se)
df <- rbind(df, data.frame(name = "Sample mean", x = random.sample, h.norm = h.rand))
ggplot(df, aes(x, h.norm, color = name)) + geom_line()
The p-value depends on the standard error.
To calculate the standard error: SE = s / sqrt(N)
With the SE, we can calculate the Z score: Z = (x_bar - mu0) / SE
To calculate the two-sided p-value: p = 2 * (1 - pnorm(|Z|))
So if we use N = 500, the denominator in the SE would be larger, thus the SE would be smaller. If the SE becomes smaller while the sample mean and standard deviation stay the same, then the Z score will get larger in absolute value. If the Z score is larger, then (1 - prob value from the Z score) * 2 will get smaller.
Thus the p-value will decrease.
Holding everything else fixed, a higher N makes it easier to reject the null hypothesis H0 in favor of the alternative hypothesis Ha.
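The chain of reasoning above can be checked with a short calculation. This is a sketch with hypothetical numbers: the sample mean of 10.2, null value of 10, and SD of 2 are made up for illustration, and only N changes between the two calls.

```r
# Hypothetical summary statistics, chosen only for illustration
xbar <- 10.2
mu0  <- 10
s    <- 2

# Two-sided p-value as a function of the sample size
p_two_sided <- function(n) {
  se <- s / sqrt(n)          # standard error shrinks as n grows
  z  <- (xbar - mu0) / se    # so |z| grows
  2 * pnorm(abs(z), lower.tail = FALSE)
}

p_two_sided(50)    # smaller sample, larger p-value
p_two_sided(500)   # larger sample, smaller p-value
```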