The point estimate of the mean is 171.1, and the point estimate of the median is 170.3.
The standard deviation is 9.4. The IQR is Q3 - Q1 = 177.8 - 163.8 = 14.
Both measurements are within 2 standard deviations of the mean, so neither is unusual.
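The 2-standard-deviation rule is easiest to check by computing z-scores. This is a minimal sketch that uses the mean (171.1) and standard deviation (9.4) above. The two values 180 and 155 are hypothetical example measurements and are not taken from this text.

```r
# Summary statistics from above
mu <- 171.1  # point estimate of the mean
s  <- 9.4    # standard deviation

# Hypothetical example measurements (assumed, not from the text)
x <- c(180, 155)

# z-score: how many standard deviations each value lies from the mean
z <- (x - mu) / s
round(z, 2)   # 0.95 and -1.71

# TRUE means the value is within 2 SDs of the mean (not unusual by this rule of thumb)
abs(z) < 2
```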
I would not expect a new sample's mean and standard deviation to be exactly the same as these, because point estimates vary from sample to sample. Since the distribution looks symmetric and is not skewed, I would expect them to be similar.
This sample-to-sample variability of the mean is quantified by the standard error (SE): \(SE = \frac{s}{\sqrt{n}} = \frac{9.4}{\sqrt{507}} \approx 0.417\).
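A short simulation shows both points: repeated samples give different but similar means, and the spread of those means matches \(SE = s/\sqrt{n}\). This sketch assumes the population is normal with mean 171.1 and SD 9.4. That is an illustration assumption and is not something the data establish.

```r
set.seed(1)  # reproducible draws

# Assumption: a normal population with the mean and SD estimated above
mu    <- 171.1
sigma <- 9.4
n     <- 507

# Draw 5000 samples of size n and keep each sample's mean
sample_means <- replicate(5000, mean(rnorm(n, mean = mu, sd = sigma)))

# Two different samples give different (but similar) means
head(sample_means, 2)

# The spread of the sample means matches the standard error formula
c(simulated_se = sd(sample_means), formula_se = sigma / sqrt(n))
```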
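library(readr)  # provides read_csv()
tg_data <- "https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%204%20Exercise%20Data/tgSpending.csv"
tg <- read_csv(tg_data)  # the file is comma-separated, so read_csv() is the matching reader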
## Parsed with column specification:
## cols(
## spending = col_double()
## )
spending <- tg$spending
n <- length(spending) # sample size (436)
summary(spending)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.719 49.177 75.792 84.707 112.255 282.803
sd <- sd(spending) #standard deviation
se <- sd/sqrt(n) #standard error
m_of_e <- 1.96*se #margin of error
print(paste0("Standard Deviation is ", round(sd,2)))
## [1] "Standard Deviation is 46.93"
print(paste0("Standard Error is ", round(se,2)))
## [1] "Standard Error is 2.25"
print(paste0("Margin of Error is ", round(m_of_e,2)))
## [1] "Margin of Error is 4.41"
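The margin of error turns into a 95% confidence interval as sample mean ± margin of error. The sketch below uses the rounded mean (84.707) and SD (46.93) printed above. `z_interval` is a hypothetical helper name and does not come from any package.

```r
# Hypothetical helper (not from any package): z-based confidence interval
z_interval <- function(xbar, s, n, level = 0.95) {
  z_star <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  moe <- z_star * s / sqrt(n)           # margin of error
  c(lower = xbar - moe, upper = xbar + moe)
}

# Rounded values from the output above
ci95 <- z_interval(xbar = 84.707, s = 46.93, n = 436)
round(ci95, 2)
```

With these rounded inputs the interval comes out to roughly (80.30, 89.11).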
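FALSE. A confidence interval is a statement about a population parameter. The average spending of the 436 sampled adults is simply the sample mean, which we know exactly, so it needs no confidence interval.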
FALSE. The sample is large (n = 436), so the sampling distribution of the mean is approximately normal despite the skew in the data, and the interval is still valid.
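FALSE. The interval gives plausible values for the population mean. It does not describe where 95% of sample means fall. What is true is that about 95% of intervals built this way would contain the population mean.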
TRUE. A confidence interval gives a range of plausible values for a population parameter, here the population mean spending.
TRUE. A 90% interval uses a smaller critical value (\(z^\star = 1.645\) instead of 1.96), so it is narrower.
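FALSE. The margin of error is proportional to \(1/\sqrt{n}\), so cutting it to a third requires a sample \(3^2 = 9\) times larger. The calculation below confirms this.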
TRUE. The margin of error is \(1.96 \times 2.25 \approx 4.41\), matching the output above.
n3 <- 436*9 #Sample size times 9
se3 <- sd/sqrt(n3) #standard error for larger sample size
m_of_e
## [1] 4.405038
m_of_e3 <- 1.96*se3 #margin of error for larger sample size
m_of_e3
## [1] 1.468346
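Both the 90%-vs-95% answer and the factor of 9 follow from the margin-of-error formula \(z^\star s/\sqrt{n}\). Solving that formula for \(n\) also gives the sample size needed to reach a target margin. This is a minimal sketch: `moe_z` is a hypothetical helper name, and 46.93 is the rounded SD printed above.

```r
# Hypothetical helper: margin of error of a z-based interval
moe_z <- function(s, n, level = 0.95) qnorm(1 - (1 - level) / 2) * s / sqrt(n)

s <- 46.93  # rounded SD of spending from above
n <- 436

# A lower confidence level gives a narrower interval
c(moe90 = moe_z(s, n, 0.90), moe95 = moe_z(s, n, 0.95))

# Multiplying n by 9 divides the margin of error by sqrt(9) = 3
ratio <- moe_z(s, 9 * n) / moe_z(s, n)
ratio

# Solving E = z* s / sqrt(n) for n gives the sample size for a target margin E
target   <- moe_z(s, n) / 3
n_needed <- (qnorm(0.975) * s / target)^2
n_needed  # about 9 * 436 = 3924
```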
gifted_data <- "https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%204%20Exercise%20Data/gifted.csv"
gifted <- read_csv(gifted_data)
## Parsed with column specification:
## cols(
## score = col_integer(),
## fatheriq = col_integer(),
## motheriq = col_integer(),
## speak = col_integer(),
## count = col_integer(),
## read = col_double(),
## edutv = col_double(),
## cartoons = col_double()
## )
count <- gifted$count
summary(count)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.00 28.00 31.00 30.69 34.25 39.00
hist(count)
n <- length(count) #sample size
sd <- sd(count) #standard deviation
Yes. The sample is random and is less than 10% of the population, so the observations are independent. The sample size is greater than 30, and the distribution is not strongly skewed.
\(H_0: \mu = 32 \text{ months}\)
\(H_A: \mu < 32 \text{ months}\)
\(\alpha = 0.10\)
Standard Error (se) = \(\frac{sd}{\sqrt{n}}\)
se <- sd/sqrt(n)
se
## [1] 0.7191479
Z score: \(Z = \frac{\text{sample mean} - \text{null value}}{SE}\), with sample mean 30.69 and null value 32.
z <- (30.69-32)/se
z
## [1] -1.8216
p-value
p <- pnorm(30.69, mean = 32, sd = se)  # lower tail, because H_A is mu < 32
round(p, 4)
## [1] 0.0343
The p-value (about 0.034) is less than \(\alpha = 0.10\), so we reject \(H_0\). The data provide convincing evidence that gifted children can count to 10 successfully earlier, on average, than the general average of 32 months.
The p-value is the probability, assuming \(H_0\) is true (\(\mu = 32\)), of observing a sample mean at least as extreme as the one we got, that is, 30.69 months or lower. Here that probability is about 3.4%.
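The p-value can be checked by simulation. Draw many sample means from the null distribution \(N(32, SE)\) and count how often they land at or below the observed 30.69. This sketch assumes the normal approximation and reuses the SE printed above.

```r
set.seed(42)  # reproducible simulation

null_value <- 32          # mu under H_0 (months)
se_count   <- 0.7191479   # standard error from above
xbar       <- 30.69       # observed sample mean

# Sample means we would expect to see if H_0 were true (normal approximation)
sim_means <- rnorm(100000, mean = null_value, sd = se_count)

# Share of simulated means at least as extreme as the observed one (lower tail)
p_sim   <- mean(sim_means <= xbar)
p_exact <- pnorm(xbar, mean = null_value, sd = se_count)
c(simulated = p_sim, exact = p_exact)
```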
# 90% confidence interval: z* = 1.645 from the z table
m <- mean(count)
lower <- m - 1.645 * se
upper <- m + 1.645 * se
c(lower, upper)
## [1] 29.51145 31.87744
iq <- gifted$motheriq
summary(iq)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 101.0 113.8 118.0 118.2 122.2 131.0
hist(iq)
sd <- sd(iq)
n <- length(iq)
\(H_0: \mu = 100\)
\(H_A: \mu \neq 100\)
\(\alpha = 0.10\)
Standard Error (se) = \(\frac{sd}{\sqrt{n}}\)
se <- sd/sqrt(n)
se
## [1] 1.084157
Z score: \(Z = \frac{\text{sample mean} - \text{null value}}{SE}\)
z <- (118.2-100)/se
z
## [1] 16.78723
p-value
p <- 2 * (1-pnorm(118.2, 100, se))
p
## [1] 0
The p-value is smaller than the significance level, so we reject the null hypothesis and conclude that the average IQ of mothers of gifted children is significantly different from the average IQ of the population (100).
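The p-value printed as exactly 0 is a floating-point artifact. `pnorm(118.2, 100, se)` is so close to 1 that it rounds to 1 in double precision, and `1 - 1` is 0. Asking `pnorm()` for the upper tail directly with `lower.tail = FALSE` keeps the tiny but nonzero value. This is a minimal sketch using the z-score printed above.

```r
z <- 16.78723  # z-score computed above

# Subtracting from 1 loses anything smaller than about 1e-16
p_naive <- 2 * (1 - pnorm(z))

# Computing the upper tail directly avoids that cancellation
p_tail <- 2 * pnorm(z, lower.tail = FALSE)

c(naive = p_naive, upper_tail = p_tail)
```

Either way the p-value is far below \(\alpha = 0.10\), so the conclusion does not change.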
# 90% confidence interval: z* = 1.645 from the z table
m <- mean(iq)
lower <- m - 1.645 * se
upper <- m + 1.645 * se
c(lower, upper)
## [1] 116.3832 119.9501
Yes. The 90% confidence interval (116.4, 120.0) lies entirely above 100, so the interval agrees with the hypothesis test: the average IQ of mothers of gifted children differs from the average population IQ.
The sampling distribution of the mean is the distribution of sample means computed from many samples of the same size. It is often shown as a histogram, with the sample means on the x-axis and their frequency on the y-axis. With a small sample size, its shape resembles the population distribution, so it may be skewed or irregular rather than symmetric. Its spread (the standard error, \(\sigma/\sqrt{n}\)) is relatively wide, and its center is the population mean.
With a large sample size, the distribution becomes unimodal and nearly normal, its spread shrinks, and it stays centered on the population mean \(\mu\).
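A simulation makes these shape, center and spread changes visible. This sketch assumes a right-skewed population, an exponential distribution with mean 1 and SD 1, chosen only for illustration. `draw_means` is a hypothetical helper name.

```r
set.seed(7)  # reproducible simulation

# Assumption: right-skewed exponential population with mean 1 and SD 1
draw_means <- function(n, reps = 5000) replicate(reps, mean(rexp(n, rate = 1)))

small <- draw_means(5)    # means of small samples
large <- draw_means(100)  # means of large samples

# Side by side: skewed and wide for n = 5, near-normal and narrow for n = 100
par(mfrow = c(1, 2))
hist(small, main = "n = 5", xlab = "Sample mean")
hist(large, main = "n = 100", xlab = "Sample mean")

# Both are centered near the population mean (1); spread is about 1/sqrt(n)
rbind(center = c(small = mean(small), large = mean(large)),
      spread = c(small = sd(small), large = sd(large)))
```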
prob <- 1-pnorm(10500, mean = 9000, sd = 1000)
prob
## [1] 0.0668072
Standard Error (se) = \(\frac{1000}{\sqrt{15}}\)
Z score: \(Z = \frac{\text{sample mean} - \text{null value}}{SE}\)
se <- 1000/sqrt(15)
se
## [1] 258.1989
z <- (10500-9000)/se
z
## [1] 5.809475
prob <- 1-pnorm(10500, 9000, se)
prob
## [1] 3.133452e-09
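A simulation contrasts the two probabilities above. Single observations exceed 10500 fairly often, but means of 15 observations almost never do. This sketch assumes normal observations with mean 9000 and SD 1000, as in the calculations above.

```r
set.seed(123)  # reproducible simulation

# Assumption: observations are normal with mean 9000 and SD 1000, as above
single  <- rnorm(1e6, mean = 9000, sd = 1000)                        # single observations
means15 <- replicate(1e5, mean(rnorm(15, mean = 9000, sd = 1000)))  # means of 15 observations

mean(single > 10500)   # close to 0.0668
mean(means15 > 10500)  # essentially 0; the exact probability is about 3e-9
sd(means15)            # close to the standard error 1000/sqrt(15), about 258.2
```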
Population distribution (a single observation)
x <- seq(5500, 12500, length.out = 1000)  # evenly spaced grid; random draws give a jagged, incomplete curve
y <- dnorm(x, mean = 9000, sd = 1000)
plot(x, y, type = "l", col = "blue", xlab = "Value", ylab = "Density")
Sampling distribution of the mean of 15 observations
x <- seq(5500, 12500, length.out = 1000)  # same range as the population plot
y <- dnorm(x, mean = 9000, sd = se)       # se = 1000/sqrt(15)
plot(x, y, type = "l", col = "blue", xlab = "Sample mean", ylab = "Density")
\(z = \frac{(\bar x - \mu) \sqrt{n}}{\sigma}\)
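The larger \(n\) is, the larger the magnitude of the z-score for the same difference \(\bar x - \mu\), because \(z\) grows with \(\sqrt{n}\). As the sample size increases, the same observed result falls further into the tail (toward the \(H_A\) region), so the p-value decreases.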
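The formula above can be applied directly to show this effect. This sketch keeps the observed difference and the SD fixed and changes only \(n\). The difference (2), SD (10) and sample sizes (50 and 500) are hypothetical values chosen for illustration.

```r
# Hypothetical values (assumed): same observed difference and SD, two sample sizes
d     <- 2          # xbar - mu
sigma <- 10         # population SD
n     <- c(50, 500)

z <- d * sqrt(n) / sigma   # z = (xbar - mu) * sqrt(n) / sigma
p <- 2 * pnorm(-abs(z))    # two-sided p-value

data.frame(n = n, z = round(z, 2), p_value = signif(p, 3))
```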