Heights of adults. (7.7, p. 260) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, for 507 physically active individuals. The histogram below shows the sample distribution of heights in centimeters.
mean (point estimate) = 171.1 cm
median = 170.3 cm
standard deviation = 9.4 cm
IQR = Q3 - Q1 = 177.8 - 163.8 = 14 cm
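180 cm is (180 - 171.1) / 9.4 = 0.95 standard deviations above the mean, and 155 cm is (155 - 171.1) / 9.4 = -1.71, that is, 1.71 standard deviations below it. Under the usual convention that an observation is unusual when it lies more than two standard deviations from the mean, neither height is unusual. Under a looser one-standard-deviation definition, only the 155 cm individual would count as unusually short.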
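As a quick check of the two heights, the z-scores can be computed from the summary statistics above. This is a minimal sketch; the variable names and the two-standard-deviation cutoff are illustrative assumptions, not part of the exercise.

```r
# Summary statistics reported for the 507 heights (cm)
x_bar <- 171.1
s <- 9.4

# Heights to evaluate
heights <- c(tall = 180, short = 155)

# z-score: number of standard deviations from the mean
z <- (heights - x_bar) / s
print(round(z, 2))

# Assumed convention: "unusual" means more than 2 SDs from the mean
unusual <- abs(z) > 2
print(unusual)
```

Changing the cutoff from 2 to 1 in the last step shows how the conclusion depends on the definition of unusual.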
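I would expect the mean and standard deviation of a new sample to be close to the ones we already have, but not exactly equal, because sample statistics vary from sample to sample.
The variability of the sample mean is quantified by the standard error, s / sqrt(n):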
9.4 / sqrt(507)
## [1] 0.4174687
Thanksgiving spending, Part I. The 2009 holiday retail season, which kicked off on November 27, 2009 (the day after Thanksgiving), had been marked by somewhat lower self-reported consumer spending than was seen during the comparable period in 2008. To get an estimate of consumer spending, 436 randomly sampled American adults were surveyed. Daily consumer spending for the six-day period after Thanksgiving, spanning the Black Friday weekend and Cyber Monday, averaged $84.71. A 95% confidence interval based on this sample is ($80.31, $89.11). Determine whether the following statements are true or false, and explain your reasoning.
\(\bar{x}=84.71 \\\) 95% confidence interval: (80.31, 89.11)
False. A confidence interval estimates a population parameter, not a sample statistic. The sample mean is known exactly ($84.71), so we are 100% sure the average spending of these 436 American adults is between $80.31 and $89.11.
False. Since n = 436 is well above 30, the Central Limit Theorem tells us the sampling distribution of the mean will be approximately normal even though the sample itself is right skewed, so the interval is still valid.
False. The confidence interval is a statement about the population mean, not about the means of other samples. Only if the actual population mean happened to be $84.71 could we say that about 95% of random samples of this size will have a mean between $80.31 and $89.11.
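True. This is the correct interpretation of the interval built from this sample of 436: it estimates the average spending of all American adults.
True. A lower confidence level gives a narrower interval, because the critical value is smaller.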
False. We would need a sample 9 times larger. The margin of error for a mean is z* · s / sqrt(n), so it shrinks with the square root of n: cutting it to a third requires multiplying n by 3^2 = 9.
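The claims about interval width and sample size can be checked numerically. The sketch below backs the standard error out of the reported 95% interval, assuming it was built as mean ± z* × SE with a normal critical value (the exercise does not state this). All variable names are illustrative.

```r
x_bar <- 84.71
ci_95 <- c(80.31, 89.11)
n <- 436

# Margin of error and implied standard error (assumes a z-based interval)
me_95 <- (ci_95[2] - ci_95[1]) / 2
se <- me_95 / qnorm(0.975)

# A 90% interval uses a smaller critical value, so it is narrower
me_90 <- qnorm(0.95) * se
ci_90 <- c(x_bar - me_90, x_bar + me_90)

# Margin of error scales with 1 / sqrt(n); implied SD is se * sqrt(n)
s <- se * sqrt(n)
me_3n <- qnorm(0.975) * s / sqrt(3 * n)  # 3x sample: shrinks by 1 / sqrt(3)
me_9n <- qnorm(0.975) * s / sqrt(9 * n)  # 9x sample: shrinks to one third

print(round(c(me_95 = me_95, me_90 = me_90, me_3n = me_3n, me_9n = me_9n), 2))
```

Tripling the sample only divides the margin of error by sqrt(3), which is why a ninefold increase is needed to reach one third.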
Gifted children, Part I. Researchers investigating characteristics of gifted children collected data from schools in a large city on a random sample of thirty-six children who were identified as gifted children soon after they reached the age of four. The following histogram shows the distribution of the ages (in months) at which these children first counted to 10 successfully. Also provided are some sample statistics.
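Are conditions for inference satisfied? Yes. The children are a random sample from a large city, so the observations can be treated as independent, and n = 36 just meets the usual n > 30 guideline. The check also requires that the histogram show no strong skew.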
Suppose you read online that children first count to 10 successfully when they are 32 months old, on average. Perform a hypothesis test to evaluate if these data provide convincing evidence that the average age at which gifted children first count to 10 successfully is less than the general average of 32 months. Use a significance level of 0.10.
\(H_0: \mu = 32 \\\) \(H_a: \mu < 32 \\\)
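For a one-sided (lower-tail) test at significance level 0.10, the critical value is z = -1.28, or t = -1.31 with 35 degrees of freedom.
std = 4.31
n = 36
df = n - 1
X_bar = 30.69  # sample mean
mu = 32  # hypothesized population mean under H_0
se <- std / sqrt(n)
t <- (X_bar - mu) / se
t_crit_test <- qt(0.10, df)  # one-sided critical value for the test, about -1.31
t_crit <- 1.69  # two-sided critical value, used for the 90% confidence interval
Since t (-1.82) < t_crit_test (-1.31) we reject the null hypothesis
p_val <- pnorm(t, mean=0, sd=1)  # lower-tail p-value (normal approximation)
The p_val is 0.034: if the true average were 32 months, a sample mean of 30.69 months or lower would occur about 3.4% of the time. Since 0.034 < 0.10, the data provide convincing evidence that gifted children first count to 10 at a younger age on average.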
c(30.69 - t_crit * se, 30.69 + t_crit * se)
## [1] 29.47602 31.90398
The 90% confidence interval is (29.5, 31.9) months: we are 90% confident that the average age at which gifted children first count to 10 lies in this range. The hypothesized value of 32 months lies outside the interval, which agrees with rejecting the null hypothesis.
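The test can also be reproduced from the summary statistics alone. The helper below, `t_test_from_summary`, is a hypothetical function written for illustration (it is not part of base R). It uses the t distribution for the p-value rather than the normal approximation, so its p-value will differ slightly from 0.034.

```r
# Hypothetical helper: one-sample, lower-tail t test from summary statistics
t_test_from_summary <- function(x_bar, mu0, s, n, conf_level = 0.90) {
  se <- s / sqrt(n)
  df <- n - 1
  t_stat <- (x_bar - mu0) / se
  p_lower <- pt(t_stat, df)                   # P(T <= t) under H_0
  t_star <- qt(1 - (1 - conf_level) / 2, df)  # two-sided critical value
  ci <- c(x_bar - t_star * se, x_bar + t_star * se)
  list(t = t_stat, df = df, p_value = p_lower, ci = ci)
}

res <- t_test_from_summary(x_bar = 30.69, mu0 = 32, s = 4.31, n = 36)
print(res)
```

Keeping the test and the interval in one function makes it easy to confirm that they agree: the null value falls outside the interval exactly when the two-sided test at the matching level rejects.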
Gifted children, Part II. Exercise above describes a study on gifted children. In this study, along with variables on the children, the researchers also collected data on the mother’s and father’s IQ of the 36 randomly sampled gifted children. The histogram below shows the distribution of mother’s IQ. Also provided are some sample statistics.
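se = 6.5 / sqrt(36)
mu = 118.2  # sample mean of mother's IQ (an x-bar, despite the name)
t = (mu - 100) / se  # test statistic against a hypothesized mean of 100
It looks like t=16.8 >> t_crit=1.69, therefore we reject the null hypothesis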
c(mu - t_crit * se, mu + t_crit * se)
## [1] 116.3692 120.0308
The 90% confidence interval for the average IQ of mothers of gifted children is (116.4, 120.0). The hypothesized value of 100 lies well outside this interval, which agrees with rejecting the null hypothesis.
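For completeness, the p-value behind this comparison can be sketched as follows. The two-sided alternative is an assumption here, and the variable names are illustrative.

```r
x_bar <- 118.2  # sample mean of mother's IQ
s <- 6.5        # sample standard deviation
n <- 36
mu0 <- 100      # hypothesized population mean

se <- s / sqrt(n)
t_stat <- (x_bar - mu0) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)
print(c(t = t_stat, p = p_two_sided))
```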
CLT. Define the term “sampling distribution” of the mean, and describe how the shape, center, and spread of the sampling distribution of the mean change as sample size increases.
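The sampling distribution of the mean is the distribution of sample means obtained from all possible random samples of a given size n drawn from the same population. As n increases, its shape becomes closer to normal, with any skewness less apparent; its center stays at the true population mean; and its spread (the standard error, σ/√n) shrinks, so the curve becomes narrower and a single sample mean is a closer approximation of the population mean.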
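A small simulation can illustrate the behavior of the center and spread. The sketch below assumes a right-skewed exponential population with mean 1 and standard deviation 1 (chosen purely for illustration) and compares the simulated spread of the sample mean with the theoretical σ/√n.

```r
set.seed(42)  # for reproducibility

# Simulate the sampling distribution of the mean for a given sample size
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

sizes <- c(5, 30, 200)
results <- sapply(sizes, function(n) {
  m <- sim_means(n)
  c(n = n,
    center = mean(m),      # stays near the population mean (1)
    spread = sd(m),        # shrinks as n grows
    theory = 1 / sqrt(n))  # sigma / sqrt(n) with sigma = 1
})
print(round(t(results), 3))
```

Plotting `hist(sim_means(5))` next to `hist(sim_means(200))` shows the change in shape: the first is visibly right skewed, the second is close to a bell curve.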
CFLBs. A manufacturer of compact fluorescent light bulbs advertises that the distribution of the lifespans of these light bulbs is nearly normal with a mean of 9,000 hours and a standard deviation of 1,000 hours.
$ X \sim N(\mu = 9000, \sigma = 1000) $
X = 10500
mu = 9000
std = 1000
z = (X - mu) / std
1 - pnorm(z)
## [1] 0.0668072
There is a 6.7% chance that a randomly chosen light bulb lasts more than 10,500 hours.
n = 15
se = std/sqrt(n)
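The problem statement says the lifespans are nearly normal, so the mean lifespan of 15 bulbs is also nearly normal even though n < 30, centered at 9,000 hours with standard error 1000 / sqrt(15), which is about 258.2 hours.
\(P(\bar{X} > 10500 \mid n=15) = 1 - P(\bar{X} \le 10500 \mid n=15)\)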
n = 15
X_bar = 10500
1 - pnorm(X_bar, mean=mu, sd = se)
## [1] 3.133452e-09
It looks like there is a near 0 chance that the mean lifespan of 15 randomly chosen light bulbs will be more than 10,500 hours.
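The size of this probability follows from the z-score of the sample mean, worked out here as a check on the output above:

```latex
\[
SE = \frac{\sigma}{\sqrt{n}} = \frac{1000}{\sqrt{15}} \approx 258.2,
\qquad
z = \frac{10500 - 9000}{258.2} \approx 5.81,
\qquad
P(\bar{X} > 10500) = P(Z > 5.81) \approx 3.1 \times 10^{-9}.
\]
```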
library(ggplot2)
library(dplyr)
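set.seed(1)  # for a reproducible plot
population = 100000  # size of the simulated population
mu = 9000
std = 1000
n = 15
dist = rnorm(population, mu, std)
# sampling distribution of the mean: means of many samples of size n,
# not the raw values of a single sample
sample_means = replicate(5000, mean(sample(dist, size=n)))
data <- rbind(data.frame(x=dist, label='Population'), data.frame(x=sample_means, label='Sampling'))
data %>% ggplot(aes(x=x, color=label)) + geom_density()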
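Not reliably. The single-bulb probability relies on the lifespans themselves being nearly normal, and the probability for the mean of 15 bulbs relies on the sample mean being nearly normal. With a skewed population and n as small as 15 (< 30), the Central Limit Theorem does not yet guarantee that, so normal-based estimates of either probability could be badly off.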
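To see how much skew can matter, the sketch below swaps in a hypothetical right-skewed lifespan model with the same mean and standard deviation: 8,000 hours plus an exponential component with mean 1,000. This model is an assumption made for illustration, not something the manufacturer claims. Under it, the sum of 15 exponential components follows a gamma distribution, so both probabilities can be computed exactly and compared with the normal-based answers.

```r
mu <- 9000; sigma <- 1000; n <- 15; cutoff <- 10500

# Normal-based answers
p_single_normal <- pnorm(cutoff, mu, sigma, lower.tail = FALSE)
p_mean_normal   <- pnorm(cutoff, mu, sigma / sqrt(n), lower.tail = FALSE)

# Assumed skewed model: X = 8000 + E, with E ~ Exponential(mean = 1000)
shift <- 8000
p_single_skewed <- pexp(cutoff - shift, rate = 1 / 1000, lower.tail = FALSE)

# Mean of n bulbs > cutoff  <=>  sum of n exponentials > n * (cutoff - shift)
p_mean_skewed <- pgamma(n * (cutoff - shift), shape = n, scale = 1000,
                        lower.tail = FALSE)

print(c(single_normal = p_single_normal, single_skewed = p_single_skewed,
        mean_normal = p_mean_normal, mean_skewed = p_mean_skewed))
```

Under this assumed model the normal calculation understates both tail probabilities, and the gap is far larger for the mean of 15 bulbs.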
Same observation, different sample size. Suppose you conduct a hypothesis test based on a sample where the sample size is n = 50, and arrive at a p-value of 0.08. You then refer back to your notes and discover that you made a careless mistake, the sample size should have been n = 500. Will your p-value increase, decrease, or stay the same? Explain.
n=50 p_val=0.08
Our initial n = 50 and the corrected n = 500 are both above 30, so the same test applies. Assuming the sample mean and standard deviation are unchanged, the standard error s / sqrt(n) falls by a factor of sqrt(10), or about 3.2, so the test statistic moves about 3.2 times further from the null value. The p-value therefore decreases.
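A numeric sketch makes the effect concrete. It assumes a one-sided z test and that the sample mean and standard deviation stay the same when n is corrected; neither detail is given in the exercise.

```r
p_old <- 0.08
n_old <- 50
n_new <- 500

# Test statistic implied by the original p-value (one-sided z test assumed)
z_old <- qnorm(1 - p_old)

# SE shrinks by sqrt(n_old / n_new), so z grows by sqrt(n_new / n_old)
z_new <- z_old * sqrt(n_new / n_old)
p_new <- pnorm(z_new, lower.tail = FALSE)

print(c(z_old = z_old, z_new = z_new, p_new = p_new))
```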