library('DATA606')
## Loading required package: shiny
## Loading required package: openintro
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following object is masked from 'package:datasets':
##
## cars
## Loading required package: OIdata
## Loading required package: RCurl
## Loading required package: bitops
## Loading required package: maps
## Loading required package: ggplot2
## Loading required package: markdown
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
##
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
##
## demo
4.4 Heights of adults
4.14 Thanksgiving spending Part 1
4.24 Gifted Children Part 1
Conditions for inference are satisfied: the sample is random, the sample size (36) is greater than 30, and the distribution does not show strong skew.
Null hypothesis: mean = 32. Alternative hypothesis: mean < 32.
SE = 4.31/sqrt(36) = 0.72
Z = (30.69 - 32)/0.72 = -1.82
Conclusion: at the 10% significance level, the one-sided critical value is -1.28. Since the Z statistic is below the critical value, we reject the null hypothesis in favor of the alternative and conclude that the true mean is less than 32 months.
From the 90% confidence interval, the mean age at which gifted children first count to 10 is between 29.51 and 31.87 months. The interval lies entirely below 32, which is consistent with the alternative hypothesis.
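The test and interval above can be reproduced with a few lines of base R. This is only a sketch: the variable names are illustrative, and the inputs (n = 36, mean 30.69, SD 4.31, null value 32) are the ones used in the calculation above.

```r
# Summary statistics taken from the calculation above
n    <- 36
xbar <- 30.69
s    <- 4.31
mu0  <- 32      # value under the null hypothesis

se <- s / sqrt(n)        # standard error of the sample mean
z  <- (xbar - mu0) / se  # test statistic

# One-sided (lower-tail) p-value for H_A: mean < 32
p_value <- pnorm(z)

# 90% confidence interval: xbar +/- z* x SE, with z* = qnorm(0.95)
z_star <- qnorm(0.95)
ci <- xbar + c(-1, 1) * z_star * se

round(c(se = se, z = z, p_value = p_value), 4)
round(ci, 2)
```

Comparing the p-value with 0.10 leads to the same decision as comparing Z with the critical value of -1.28.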
4.26 Gifted Children Part 2
Null hypothesis: mean = 100. Alternative hypothesis: mean is not 100.
SE = 6.5/sqrt(36) = 1.08
Z = (118.2 - 100)/1.08 ≈ 16.8
At the 10% significance level, the two-sided critical values are ±1.64. The observed Z of about 16.8 is far beyond the critical value, so we reject the null hypothesis in favor of the alternative.
The 90% confidence interval for the mean IQ of mothers of gifted children is 116.4 to 120. Because the interval does not contain 100, it agrees with the alternative hypothesis that the mean is not 100.
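For a two-sided test the p-value counts both tails. A minimal sketch with the numbers above (n = 36, mean 118.2, SD 6.5, null value 100) follows; the variable names are illustrative. With the unrounded standard error the Z statistic comes out at about 16.8.

```r
# Summary statistics taken from the calculation above
n    <- 36
xbar <- 118.2
s    <- 6.5
mu0  <- 100

se <- s / sqrt(n)
z  <- (xbar - mu0) / se

# Two-sided p-value: area in both tails beyond |z|
p_value <- 2 * pnorm(-abs(z))

# 90% confidence interval for the mean IQ
ci <- xbar + c(-1, 1) * qnorm(0.95) * se

round(c(se = se, z = z), 2)
p_value
round(ci, 1)
```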
4.34 CLT
The sampling distribution of the sample mean is the distribution obtained by drawing many samples of the same size from a population and recording the mean of each one. By the central limit theorem, this distribution is approximately normal, whatever the shape of the population, provided the observations are independent and the sample size is large enough. A sample size of at least 30 is a common rule of thumb when the population is not strongly skewed. As the sample size increases, the standard error decreases and the sample mean estimates the population mean more precisely.
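A small simulation can illustrate this. The sketch below assumes a right-skewed exponential population with mean 1 and SD 1 (a choice made here purely for illustration, not part of the exercise), draws 5000 samples of size 30, and compares the spread of the sample means with the theoretical standard error sigma/sqrt(n).

```r
set.seed(606)   # arbitrary seed so the run is reproducible

n    <- 30      # size of each sample
reps <- 5000    # number of samples drawn

# Each replicate: draw n values from Exp(rate = 1) and keep their mean
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# Center and spread of the simulated sampling distribution vs. theory
c(mean      = mean(sample_means),
  sd        = sd(sample_means),
  theory_se = 1 / sqrt(n))

# The histogram is roughly bell-shaped even though the population is skewed
hist(sample_means, breaks = 40,
     main = "Sample means, n = 30", xlab = "Sample mean")
```

Rerunning with a smaller n (for example 5) should show a visibly skewed histogram, which is why the sample-size condition matters.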
4.40 CFLBs
Probability of x > 10500: Z = (10500 - 9000)/1000 = 1.5, so P(X > 10500) = 1 - pnorm(1.5) = 0.0668
normalPlot(mean = 0, sd = 1, bounds=c(1.5,4), tails = FALSE)
The distribution of the mean lifespan of 15 bulbs is normal with mean 9000 and SD 1000/sqrt(15) = 258.2
Z = (10500 - 9000)/258.2 = 5.81, so the probability is 1 - pnorm(5.81) ≈ 0
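Both bulb probabilities can be checked directly with pnorm(), without standardizing by hand. The sketch below uses the population mean of 9000 hours and SD of 1000 hours from the calculations above; the variable names are illustrative.

```r
mu    <- 9000   # population mean lifespan (hours)
sigma <- 1000   # population SD (hours)
n     <- 15     # number of bulbs in the sample

# P(a single bulb lasts more than 10,500 hours)
p_single <- pnorm(10500, mean = mu, sd = sigma, lower.tail = FALSE)

# Standard error of the mean lifespan of n bulbs
se <- sigma / sqrt(n)

# Z score and P(mean lifespan of n bulbs exceeds 10,500 hours)
z_mean <- (10500 - mu) / se
p_mean <- pnorm(10500, mean = mu, sd = se, lower.tail = FALSE)

round(c(p_single = p_single, se = se, z_mean = z_mean), 4)
p_mean
```

Using lower.tail = FALSE avoids computing 1 - pnorm(...) by hand, which loses precision when the tail probability is very small.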
Population distribution vs. sampling distribution of the sample mean
Black: population distribution. Red: sampling distribution of the mean.
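# Grid of lifespans covering both curves (a 1-hour step is enough for a smooth line)
s <- seq(5000, 13000, 1)
# Population distribution N(9000, 1000), drawn in black
plot(s, dnorm(s, 9000, 1000), type = "l", ylim = c(0, 0.002), ylab = "", xlab = "Lifespan (hours)")
# Sampling distribution of the mean of 15 bulbs, N(9000, 1000/sqrt(15)), drawn in red
lines(s, dnorm(s, 9000, 1000 / sqrt(15)), col = "red")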
4.48 Same observation with different sample size
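As the sample size increases, the standard error decreases, so for the same sample mean and standard deviation the Z value moves further from zero (larger for a positive Z, smaller for a negative Z). A more extreme Z value leaves less area in the tail, so the p-value decreases.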
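The effect can be seen numerically. The values below (sample mean 52, null value 50, SD 10, sample sizes 50 and 500) are hypothetical, chosen only to illustrate the pattern; they are not taken from the exercise.

```r
# Hypothetical summary statistics, held fixed while n changes
xbar <- 52
mu0  <- 50
s    <- 10
n    <- c(50, 500)

se      <- s / sqrt(n)         # shrinks as n grows
z       <- (xbar - mu0) / se   # moves further from zero as n grows
p_value <- 2 * pnorm(-abs(z))  # two-sided p-value shrinks as n grows

data.frame(n, se, z, p_value)
```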