library('DATA606')
The point estimate for the average height is 171.1. The point estimate for the median is 170.3.
The point estimate for standard deviation is 9.4. The point estimate for the IQR is 14 (Q3 - Q1).
sample_mean <- 171.1
sample_sd <- 9.4
(180 - sample_mean)/(sample_sd)
## [1] 0.9468085
(155 - sample_mean)/(sample_sd)
## [1] -1.712766
With another sample I would not expect the same point estimates. They approximate population values, but vary between samples.
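As a rough illustration of this sample-to-sample variation, the sketch below draws two samples of 507 from a hypothetical normal population. The population mean of 171 and SD of 9.4 are assumptions chosen to resemble the estimates above; they are not values from the data.

```r
# Hypothetical population parameters (assumptions for illustration only)
pop_mean <- 171
pop_sd <- 9.4
n <- 507

set.seed(1)  # arbitrary seed so the sketch is reproducible
sample_a <- rnorm(n, mean = pop_mean, sd = pop_sd)
sample_b <- rnorm(n, mean = pop_mean, sd = pop_sd)

# Point estimates from each sample: close to each other, but not identical
estimates <- rbind(
  sample_a = c(mean = mean(sample_a), median = median(sample_a), sd = sd(sample_a)),
  sample_b = c(mean = mean(sample_b), median = median(sample_b), sd = sd(sample_b))
)
round(estimates, 2)
```

Both rows should land near the assumed population values while differing from each other in the decimals.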
n <- 507
sample_se <- sample_sd/sqrt(n)
sample_se
## [1] 0.4174687
FALSE. This is not what a 95% confidence interval means. The confidence level describes the method: if we took many different samples and calculated a 95% confidence interval from each, about 95% of those intervals would contain the population mean.
FALSE. The sample is large enough (n = 436) for the sampling distribution of the mean to be nearly normal despite the skew.
FALSE. A confidence interval for the mean estimates the population mean; it says nothing about where other sample means will fall.
TRUE. That is what a confidence interval is for: it describes how confident we are that the corresponding population parameter falls within the interval.
TRUE. A 90% confidence level uses a smaller critical value than a 95% level, so the interval would be narrower.
FALSE. To cut the margin of error to one third of its current size, we need to increase the sample size by a factor of 3^2 = 9.
TRUE. The margin of error is 4.4, since the confidence interval equals the point estimate plus or minus the margin of error, so the margin is half the interval's width.
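The repeated-sampling reading of a confidence level, and the effect of sample size on the margin of error, can be checked with a small simulation. This is a minimal sketch: the population mean, SD, sample size, and number of repetitions are hypothetical values chosen for illustration, not figures from the exercise.

```r
# Hypothetical population and design (assumed values, for illustration only)
pop_mean <- 50
pop_sd <- 10
n <- 100
reps <- 1000
z_star <- qnorm(0.975)  # critical value for a 95% interval

set.seed(42)  # arbitrary seed for reproducibility
covers <- replicate(reps, {
  x <- rnorm(n, mean = pop_mean, sd = pop_sd)
  se <- sd(x) / sqrt(n)
  lower <- mean(x) - z_star * se
  upper <- mean(x) + z_star * se
  lower <= pop_mean && pop_mean <= upper  # does this interval catch the true mean?
})
mean(covers)  # proportion of intervals containing the population mean

# Margin of error as a function of sample size (population SD treated as known)
moe <- function(n) z_star * pop_sd / sqrt(n)
moe(n) / moe(9 * n)  # ratio is 3, up to floating-point error
```

The coverage proportion should sit close to 0.95, and the ratio shows that a sample 9 times larger cuts the margin of error to a third.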
The sample is random, and 36 children from a large city are certainly fewer than 10% of the population, so the observations can be treated as independent. The sample size is over 30, and there does not appear to be any strong skew in the data. Based on this information, the conditions for inference are satisfied.
H_0: mean = 32 (the average age at which gifted children first count to 10 is the same as in the general population)
H_A: mean < 32 (the average age at which gifted children first count to 10 is lower than in the general population)
alpha: 0.10
The z-score is (30.69 - 32) / (4.31 / sqrt(36)) = -1.82, which gives a left-tail p-value of 0.0344. Since the p-value is below alpha = 0.10, we reject H_0.
If the null hypothesis is true, the probability of observing a sample mean of 30.69 or lower in a sample of 36 children is only 0.0344.
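The test statistic and p-value can be reproduced directly with `pnorm()`. The sketch below uses the sample values from this exercise; the variable names are illustrative. Because it keeps the unrounded z-score, the p-value comes out near 0.034 and may differ in the fourth decimal from a table lookup at z = -1.82.

```r
n <- 36         # sample size
x_bar <- 30.69  # sample mean
s <- 4.31       # sample standard deviation
mu_0 <- 32      # mean under the null hypothesis
alpha <- 0.10   # significance level

se <- s / sqrt(n)          # standard error of the mean
z <- (x_bar - mu_0) / se   # test statistic
p_value <- pnorm(z)        # left-tail probability, since H_A is mean < 32

c(z = z, p_value = p_value)
p_value < alpha  # TRUE means we reject H_0 at this significance level
```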
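n <- 36             # sample size
s <- 4.31           # sample standard deviation
se <- s / sqrt(n)   # standard error of the mean
x_bar <- 30.69      # sample mean
# 90% confidence interval, using a critical value of 1.64
lower <- x_bar - 1.64 * se
upper <- x_bar + 1.64 * se
c(lower, upper)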
## [1] 29.51193 31.86807
Yes, the results agree. The 90% confidence interval from 29.51 to 31.87 does not contain 32, so the difference from the general-population mean is statistically significant. This is consistent with the small p-value from the hypothesis test.
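As a check on whether 32 falls inside the 90% interval, the sketch below rebuilds the interval from the standard error and tests containment. It uses `qnorm(0.95)` for the critical value instead of a rounded table value, so the endpoints can differ slightly in the third decimal; the variable names are illustrative.

```r
n <- 36         # sample size
x_bar <- 30.69  # sample mean
s <- 4.31       # sample standard deviation
mu_0 <- 32      # general-population mean being compared

z_star <- qnorm(0.95)  # critical value for a 90% interval, about 1.645
ci <- x_bar + c(-1, 1) * z_star * s / sqrt(n)
ci

# Is the null value inside the interval?
contains_null <- mu_0 >= ci[1] && mu_0 <= ci[2]
contains_null  # FALSE: 32 lies above the upper endpoint
```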
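n <- 36               # assumes the same sample of 36 as above
se <- 6.5 / sqrt(n)   # standard error of the mean IQ, not the raw SD
z <- (118.2 - 100) / se
z
## [1] 16.8
normalPlot(mean = 0, sd = 1, bounds = c(-z,z), tails = TRUE)
lb <- round(118.2 - 1.65*se, 2)
ub <- round(118.2 + 1.65*se, 2)
lb
## [1] 116.41
ub
## [1] 119.99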
Yes, the results agree: the average IQ of mothers in the general population (100) does not fall within the 90% confidence interval for the mean IQ of mothers of gifted children.
The sampling distribution of the mean is the distribution we would get by repeatedly taking random samples of the same size from a population, calculating the mean of each, and collecting all the resulting means. As the sample size increases, this distribution gets taller and narrower: its center stays at the population mean, while its spread (the standard error) decreases.
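The description above can be sketched as a simulation. The population mean of 100 and SD of 15, the sample sizes, and the number of repetitions are all hypothetical choices for illustration.

```r
# Hypothetical population and sample sizes (assumptions for illustration only)
pop_mean <- 100
pop_sd <- 15
sample_sizes <- c(10, 40, 160)
reps <- 5000

set.seed(606)  # arbitrary seed for reproducibility
sim <- sapply(sample_sizes, function(n) {
  # Draw many samples of size n and keep the mean of each one
  means <- replicate(reps, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))
  c(n = n,
    center = mean(means),            # stays near the population mean
    spread = sd(means),              # shrinks as n grows
    theory_se = pop_sd / sqrt(n))    # theoretical standard error
})
round(t(sim), 2)
```

Each fourfold increase in sample size should roughly halve the spread, while the center stays near 100.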
z <- (10500-9000)/1000
z
## [1] 1.5
normalPlot(mean = 0, sd = 1, bounds = c(-Inf,z), tails = FALSE)
P <- 1-0.933
P
## [1] 0.067
sd <- 1000
mean <- 9000
sample_sd <- sd/sqrt(15)
sample_sd
## [1] 258.1989
1 - pnorm(q=10500, mean=9000, sd=258.20)
## [1] 3.13392e-09
(10500-9000)/258.20
## [1] 5.80945
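The two probabilities above, one for a single observation and one for the mean of 15, can be computed side by side. This sketch uses `lower.tail = FALSE` in `pnorm()`, which returns the upper tail directly and is more accurate than `1 - pnorm(...)` for very small probabilities; the variable names are illustrative.

```r
mu <- 9000       # population mean
sigma <- 1000    # population standard deviation
n <- 15          # sample size for the mean
cutoff <- 10500  # threshold of interest

# Probability that a single observation exceeds the cutoff
p_single <- pnorm(cutoff, mean = mu, sd = sigma, lower.tail = FALSE)

# Probability that the mean of n observations exceeds the cutoff
se <- sigma / sqrt(n)  # standard error of the mean
p_mean <- pnorm(cutoff, mean = mu, sd = se, lower.tail = FALSE)

c(p_single = p_single, p_mean = p_mean)
```

The same cutoff is 1.5 standard deviations from the mean for one observation but about 5.8 standard errors away for the mean of 15, which is why the second probability is so much smaller.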
# Compare the population density with the sampling distribution of the mean (n = 15)
xseq <- seq(0,12000,50)
densities1 <- dnorm(xseq, 9000,1000)
densities2 <- dnorm(xseq, 9000, 258)
par(mfrow = c(2, 1))
plot(xseq, densities1, col="yellow",xlab="", ylab="Density", type="l",lwd=2, cex=2, main="Population", cex.axis=.8)
plot(xseq, densities2, col="darkblue",xlab="", ylab="Density", type="l",lwd=2, cex=2, main="Sampling distribution of the mean (n = 15)", cex.axis=.8)
Yes, but we would need a much larger sample size to use the normal distribution for the sample mean.
As the sample size increases, the standard error decreases, so the sample statistic lies more standard errors away from the hypothesized population parameter, and the p-value becomes much smaller.
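A short numerical sketch makes the effect concrete. The observed difference of 0.5, the sample SD of 2, the two sample sizes, and the two-sided test are all hypothetical choices for illustration, not values from the exercise.

```r
# Hypothetical inputs (assumptions for illustration only)
effect <- 0.5     # observed difference between sample mean and null value
s <- 2            # sample standard deviation
n <- c(50, 500)   # two candidate sample sizes

se <- s / sqrt(n)               # standard error shrinks as n grows
z <- effect / se                # so the same difference is more standard errors away
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value

data.frame(n, se, z, p_value)
```

With the difference and SD held fixed, multiplying the sample size by 10 multiplies the z-score by sqrt(10) and shrinks the p-value by several orders of magnitude.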