# Good practice: basic housekeeping. Clean up the environment before starting new work.
rm(list=ls())
setwd("C:\\CUNY\\606Statistics\\Assignments")
# Libraries
library(DATA606)
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
##
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
##
## demo
a. The average height of individuals is 171.1 cm. The median height is 170.3 cm.
b. The point estimate for the standard deviation is 9.4 cm. The point estimate for the IQR is 14 cm (Q3 - Q1).
c. An observation is considered unusual if it lies 2 or more standard deviations from the mean: \[|Z| \geq 2\]
\[Z_{180} = \frac {x - \mu}{\sigma} = \frac {180 - 171.1}{9.4} = 0.9468085\]
Since \(|Z_{180}| < 2\), a height of 180 cm is not unusually tall.
\[Z_{155} = \frac {x - \mu}{\sigma} = \frac {155 - 171.1}{9.4} = -1.71\]
Since \(|Z_{155}| < 2\), a height of 155 cm is not unusually short.
d. With another sample we would not expect the same point estimates. They approximate population values, but can vary between samples.
e. We can quantify the variability of the sample mean with the standard error: \[SE = \frac {\sigma}{\sqrt{n}} = \frac {9.4}{\sqrt{507}} = 0.417 \]
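The Z-scores in part (c) and the standard error in part (e) can be checked in R. This is a minimal sketch that uses only the summary statistics quoted above; the variable names are illustrative.

```r
# Summary statistics quoted in the answers above
x_bar <- 171.1   # sample mean height
s     <- 9.4     # sample standard deviation
n     <- 507     # sample size

# Z-scores for heights of 180 and 155
z_180 <- (180 - x_bar) / s
z_155 <- (155 - x_bar) / s

# Flag an observation as unusual when it lies 2 or more SDs from the mean
unusual <- abs(c(z_180, z_155)) >= 2

# Standard error of the sample mean
se <- s / sqrt(n)

print(round(c(z_180 = z_180, z_155 = z_155, se = se), 3))
print(unusual)
```

Taking the absolute value of Z lets the same rule cover both the "unusually tall" and the "unusually short" question.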
a. FALSE. Inference is made about the population parameter, not the sample statistic. The point estimate from this sample always lies within the confidence interval.
b. FALSE. The sample is large enough (n = 436, and n >= 30) to account for the skew.
c. FALSE. The confidence interval is for the population mean, not for the means of other samples. We cannot be 95% sure that another random sample will have a mean between 80.31 and 89.11.
d. TRUE. The population parameter is being estimated by the point estimate and the confidence interval.
e. TRUE. With 90% confidence we do not need such a wide interval, so the interval would be narrower.
f. FALSE. To decrease the margin of error to a third of its current size, we need to increase the sample size by a factor of \(3^2 = 9\).
g. TRUE. The margin of error is half the width of the confidence interval: \((89.11 - 80.31)/2 = 4.4\).
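The arithmetic behind parts (e), (f), and (g) can be sketched in R. The interval bounds and sample size are taken from the answers above; the variable names are illustrative.

```r
# Confidence interval bounds quoted in part (c)
lower <- 80.31
upper <- 89.11

# Part (g): the margin of error is half the interval width
moe <- (upper - lower) / 2

# Part (f): the margin of error scales with 1 / sqrt(n), so cutting it
# to one third requires multiplying the sample size by 3^2
n <- 436
n_needed <- n * 3^2

# Part (e): a 90% interval uses a smaller critical value than a 95% interval
z_90 <- qnorm(0.95)
z_95 <- qnorm(0.975)

print(c(moe = moe, n_needed = n_needed, z_90 = z_90, z_95 = z_95))
```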
a: The conditions are satisfied. Because the data were collected from children in a large city, we can assume the observations are independent. The sample size (n = 36) meets the minimum of 30, and the graph shows no obvious skew, so the sampling distribution of the mean is approximately normal.
b: \(H_{0} : \mu = 32\) and \(H_{A} : \mu < 32\) \[SE = \frac {4.32}{\sqrt{36}} = 0.72\] \[Z = \frac{30.69-32}{0.72} = -1.81944\]
We can reject \(H_{0}\) because the p-value, \(0.0344219\), is less than \(\alpha = 0.10\).
c: If \(H_{0}\) is true, the probability of observing a sample mean of 30.69 months or lower in a sample of 36 children is only 0.0344 (the p-value).
d: The 90% confidence interval is \(30.69 \pm 1.65 \times SE = 30.69 \pm 1.188\), or \((29.502, 31.878)\).
e: The hypothesis test and the confidence interval agree. We are 90% confident that the average age at which gifted children first count to 10 is between 29.5 and 31.9 months. The null value \(\mu = 32\) lies outside this interval, which is consistent with rejecting \(H_{0}\).
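The test statistic, p-value, and confidence interval above can be reproduced with a few lines of R. This is a sketch using the quoted summary statistics; the variable names are illustrative, and the critical value 1.65 is the rounded value used in the text rather than `qnorm(0.95)`.

```r
# Summary statistics and null value quoted above
x_bar <- 30.69   # sample mean (months)
s     <- 4.32    # sample standard deviation
n     <- 36      # sample size
mu_0  <- 32      # null value
alpha <- 0.10    # significance level

# Standard error and test statistic
se <- s / sqrt(n)
z  <- (x_bar - mu_0) / se

# One-sided (lower-tail) p-value, matching H_A: mu < 32
p_value <- pnorm(z)
reject  <- p_value < alpha

# 90% confidence interval with the rounded critical value from the text
z_star <- 1.65
ci <- x_bar + c(-1, 1) * z_star * se

print(c(se = se, z = z, p_value = p_value))
print(ci)
```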
a. Null hypothesis \(H_{0}\): The average IQ of mothers of gifted children equals the population average IQ, \(\mu = 100\).
Alternative hypothesis \(H_{A}\): The average IQ of mothers of gifted children differs from the population average IQ, \(\mu \neq 100\).
b. The 90% confidence interval is \(118.2 \pm 1.65 \times SE = 118.2 \pm 1.782\), or \((116.418, 119.982)\).
c. The hypothesis test and the confidence interval agree. We are 90% confident that the average IQ of mothers of gifted children is between 116.4 and 120.0. The population average of 100 lies well below this interval, so the mean is significantly above it.
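The agreement between the test and the interval can be illustrated in R. The sample standard deviation is not stated in these answers, so this sketch backs out the standard error from the quoted margin of error (1.782 / 1.65); that step is an assumption about how the margin was computed, and the variable names are illustrative.

```r
# Values quoted in the answers above
x_bar  <- 118.2   # sample mean IQ of mothers
mu_0   <- 100     # population average IQ (null value)
z_star <- 1.65    # rounded 90% critical value used in the text
margin <- 1.782   # quoted margin of error

# Standard error implied by the quoted margin (assumption: margin = z_star * SE)
se <- margin / z_star

# 90% confidence interval
ci <- x_bar + c(-1, 1) * margin

# Two-sided test of H_0: mu = 100
z <- (x_bar - mu_0) / se
p_value <- 2 * pnorm(-abs(z))

print(ci)
print(c(se = se, z = z, p_value = p_value))
```

A test statistic this far from zero gives a p-value that is effectively 0, which matches the null value of 100 falling far outside the interval.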
As the sample size increases, the sampling distribution of the mean becomes closer to normal, and its spread becomes narrower.
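A small simulation can illustrate the narrowing spread. This is only a sketch: the exponential population, the seed, the sample sizes, and the helper `sample_mean_sd` are all hypothetical choices made for illustration and do not come from the exercise.

```r
set.seed(606)  # arbitrary seed so the simulation is reproducible

# Hypothetical right-skewed population: exponential with mean 1 and SD 1.
# Returns the standard deviation of `reps` simulated sample means of size n.
sample_mean_sd <- function(n, reps = 5000) {
  sd(replicate(reps, mean(rexp(n, rate = 1))))
}

sizes  <- c(5, 30, 100)
spread <- sapply(sizes, sample_mean_sd)

# Compare the simulated spread with the theoretical value sigma / sqrt(n)
print(data.frame(n = sizes,
                 simulated_sd = spread,
                 theoretical_sd = 1 / sqrt(sizes)))
```

Plotting a histogram of the simulated means for each sample size would show the shape moving toward a normal curve as well.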
a. \(Z = \frac{x-\mu}{\sigma} = \frac{10500-9000}{1000} = 1.5\)
normalPlot(mean = 0, sd = 1, bounds=c(1.5,4), tails = FALSE)
The probability of \(x > 10500\) is \(1-pnorm(1.5)\), or \(0.0668072\).
b. If the light bulbs are randomly selected, the mean lifespan of 15 light bulbs is nearly normally distributed as \(N(\mu, \frac{\sigma}{\sqrt{n}})\), or \(N(9000, 258.1989)\).
c. If the light bulbs are selected at random, the Z-score for a mean lifespan of 10,500 hours is \(Z = \frac{10500 - 9000}{258.1989} = 5.81\), so the probability that the mean exceeds 10,500 hours is \(1-pnorm(5.81) \approx 0\).
d.
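# Grid of lifespans; a step of 1 hour is fine enough for a smooth curve
s <- seq(5000, 14000, by = 1)
# Population distribution of individual bulb lifespans, N(9000, 1000)
plot(s, dnorm(s, 9000, 1000), type = "l", ylim = c(0, 0.002), ylab = "", xlab = "Lifespan (total hours)")
# Sampling distribution of the mean of 15 bulbs, N(9000, 258.1989), in blue
lines(s, dnorm(s, 9000, 258.1989), col = "blue")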
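The probabilities in parts (a) and (c) can also be computed without standardizing first, by passing the mean and standard deviation directly to `pnorm`. This is a minimal sketch using the values from the answers above; the variable names are illustrative.

```r
# Population parameters and sample size from the answers above
mu    <- 9000   # mean lifespan (hours)
sigma <- 1000   # standard deviation (hours)
n     <- 15     # number of bulbs in the sample

# Part (a): probability that a single bulb lasts more than 10,500 hours
p_single <- pnorm(10500, mean = mu, sd = sigma, lower.tail = FALSE)

# Part (b): standard error of the mean lifespan of 15 bulbs
se <- sigma / sqrt(n)

# Part (c): probability that the mean of 15 bulbs exceeds 10,500 hours
p_mean <- pnorm(10500, mean = mu, sd = se, lower.tail = FALSE)

print(c(p_single = p_single, se = se, p_mean = p_mean))
```

Using `lower.tail = FALSE` avoids the subtraction `1 - pnorm(...)`, which loses precision when the tail probability is very small, as in part (c).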