data606.ch4.hwk

4.4

#a) mean is 171.1 and median is 170.3.

#b) standard deviation is 9.4 and Q1 is 163.8 and Q3 is 177.8 so IQR is 14.

#c) I would say someone who is 180cm is fairly tall, but not unusually tall, as that height is within 1 sd of the mean.
#I would say someone who is 155cm is fairly short, but not unusually short, as that height is about 1.7 sd below the mean (still within 2 sd).
z=180
x_bar=171.1
sd=9.4

z180 = (z-x_bar)/sd
z180

## [1] 0.9468085

z=155
x_bar=171.1
sd=9.4

z155 = (z-x_bar)/sd
z155

## [1] -1.712766
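
#A minimal sketch of the check in c), assuming the common rule of thumb that an observation more than 2 sd from the mean counts as unusual. The names height_mean, height_sd, heights and unusual are illustrative and not part of the original answer.

```r
# Summary statistics taken from a) and b)
height_mean <- 171.1
height_sd <- 9.4
heights <- c(180, 155)

# z-score: how many sd each height is from the mean
z_scores <- (heights - height_mean) / height_sd

# Assumed cutoff: |z| > 2 is treated as unusual (a rule of thumb)
unusual <- abs(z_scores) > 2

data.frame(height = heights, z = round(z_scores, 2), unusual = unusual)
```

#Neither height is flagged under this cutoff, which matches the answer in c).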

#d) They would be different because a new random sample will most likely contain different individuals, so its mean and standard deviation will differ somewhat from the ones above.

#e) The variability of the sample mean can be measured using the standard error, SE = sd/sqrt(n).

sd=9.4
n=507

SE = sd/sqrt(n)
SE

## [1] 0.4174687

4.14

#a) FALSE, a confidence interval is about the population parameter, not the point estimate.
#b) FALSE, given that the sample size is 436 (much larger than 30), the sampling distribution of the mean will be nearly normal even with some skewness.
#c) FALSE, we are 95% confident that the population mean is between 80.31 and 89.11; the interval does not say where 95% of sample means (point estimates) will fall.
#d) TRUE, the interval is about the population parameter.
#e) TRUE, the lower the confidence level, the narrower the interval.
#f) FALSE, the sample would need to be 9 times larger, since the margin of error is proportional to 1/sqrt(n) and sqrt(9) = 3.
#g) TRUE, since the margin of error is (upper CI - lower CI) divided by 2, the margin of error is 4.4.
(89.11 - 80.31)/2

## [1] 4.4
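
#A small sketch of the reasoning in f), assuming the usual margin of error formula z * sd / sqrt(n). The value sd_guess is made up for illustration; the ratios below do not depend on it.

```r
z_star <- qnorm(0.975)   # critical value for 95% confidence
sd_guess <- 10           # hypothetical sd, only needed to evaluate the formula
n_now <- 436             # sample size from the exercise

# Margin of error as a function of sample size
me <- function(n) z_star * sd_guess / sqrt(n)

me(n_now) / me(3 * n_now)   # tripling n shrinks the margin by sqrt(3) only
me(n_now) / me(9 * n_now)   # 9 times the sample cuts the margin to a third
```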

4.24

#a) n is larger than 30, the data do not show strong skewness, and the observations are independent because they come from a random sample. So the conditions for inference are met and the sampling distribution of the mean will be approximately normal.
#b) 
#H0: mu = 32, Ha: mu < 32, one-tailed test

null = 32
sd = 4.31
x_bar = 30.69
n = 36

(x_bar - null)/(sd/sqrt(n))

## [1] -1.823666

#Z score -1.82 -> p-value = 0.0344. We reject the null hypothesis in favor of the alternative hypothesis as the p-value < 0.1.
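
#The table lookup above can also be sketched with pnorm(). The variable names here (null_mean, sample_mean, sample_sd, sample_n, z_stat, p_value) are chosen for illustration so they do not overwrite the values used elsewhere.

```r
null_mean <- 32
sample_mean <- 30.69
sample_sd <- 4.31
sample_n <- 36

# Test statistic: distance from the null value in standard errors
z_stat <- (sample_mean - null_mean) / (sample_sd / sqrt(sample_n))

# Lower-tail area, because the alternative is mu < 32
p_value <- pnorm(z_stat)
p_value            # about 0.034
p_value < 0.10     # compare with the 0.10 significance level
```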

#c) If the null hypothesis were true, the probability of observing a sample mean of 30.69 months or lower would be about 0.0344. From b), this p-value is lower than the significance level, so we reject the null hypothesis.

#d) 
z = 1.645
upper = x_bar + z*(sd/sqrt(n))
lower = x_bar - z*(sd/sqrt(n))

upper

## [1] 31.87166

lower

## [1] 29.50834

#e)
#Yes, from d), 32 does not fall within the 90% CI, so the data do not support the claim that children first count to 10 successfully at 32 months old, on average. In b) we rejected the null hypothesis, so the two results agree with each other.
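
#A sketch of d) and e) that gets the critical value from qnorm() instead of hard-coding 1.645. The names ci_mean, ci_sd, ci_n, conf_level, z_crit and ci are illustrative.

```r
ci_mean <- 30.69
ci_sd <- 4.31
ci_n <- 36
conf_level <- 0.90

# Two-sided critical value for the chosen confidence level (about 1.645)
z_crit <- qnorm(1 - (1 - conf_level) / 2)

# Interval: point estimate plus or minus z_crit standard errors
ci <- ci_mean + c(-1, 1) * z_crit * ci_sd / sqrt(ci_n)
ci

# Is the null value 32 inside the interval?
32 >= ci[1] & 32 <= ci[2]
```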

4.26

#a)
#H0: mu = 100, Ha: mu != 100, two-tailed test

null = 100
x_bar = 118.2
sd = 6.5
n = 36



(x_bar - null)/(sd/sqrt(n))

## [1] 16.8

#Since the z-score is very large, the p-value is essentially 0, so we are certain that the p-value < 0.1. Therefore, the result is statistically significant and we reject the null hypothesis in favor of the alternative hypothesis.
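
#A sketch of the two-sided p-value for a), using pnorm(). The iq_ prefixed names are illustrative and not part of the original answer.

```r
iq_null <- 100
iq_mean <- 118.2
iq_sd <- 6.5
iq_n <- 36

iq_z <- (iq_mean - iq_null) / (iq_sd / sqrt(iq_n))

# Both tails, because the alternative is mu != 100
iq_p <- 2 * pnorm(-abs(iq_z))
iq_p
iq_p < 0.10
```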

#b) 
z = 1.645
upper = x_bar + z*(sd/sqrt(n))
lower = x_bar - z*(sd/sqrt(n))

upper

## [1] 119.9821

lower

## [1] 116.4179

#We are 90% confident that the population mean falls within [116.42, 119.98].

#c) Yes, according to b), 100 does not fall within the 90% CI, so the data do not support the claim that the mothers of gifted children have an average IQ of 100 (their average IQ is different from 100). In a), we rejected the null hypothesis, which agrees with b).

4.34

#The sampling distribution of the mean is the distribution of sample means from repeated random samples of a given size. According to the CLT, it becomes more nearly normal (bell-shaped) as the sample size increases; n larger than 30 is the usual rule of thumb. Its center stays at the population mean, and its spread (the standard error) becomes smaller as the sample size increases.
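
#A simulation sketch of the idea above. The population here is an assumption made for illustration (exponential with mean 1, which is right-skewed), and sample_means() is a hypothetical helper.

```r
set.seed(606)

# Draw `reps` samples of size n and return the mean of each one
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

sizes <- c(5, 30, 100)
sims <- lapply(sizes, sample_means)

centers <- sapply(sims, mean)        # each close to the population mean, 1
spreads <- sapply(sims, stats::sd)   # shrink roughly like 1 / sqrt(n)

data.frame(n = sizes,
           center = round(centers, 3),
           spread = round(spreads, 3),
           theory_se = round(1 / sqrt(sizes), 3))
```

#Running hist(sims[[1]]) and hist(sims[[3]]) gives a visual check of how the shape changes with sample size.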

4.40

#a) 
library(DATA606)

## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.

## 
## Attaching package: 'DATA606'

## The following object is masked from 'package:utils':
## 
##     demo

mu = 9000
sd = 1000
x = 10500

z = (x - mu)/sd
z

## [1] 1.5

#Probability of Z > 1.5 = 6.7%.

normalPlot(mean = 0, sd = 1, bounds=c(1.5,4), tails = FALSE)
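
#The tail area in a) can also be computed directly with pnorm(), without converting to a z-score first. The bulb_ prefixed names are illustrative.

```r
bulb_mu <- 9000
bulb_sd <- 1000
bulb_x <- 10500

# Upper-tail probability P(X > 10500) under N(9000, 1000)
p_single <- pnorm(bulb_x, mean = bulb_mu, sd = bulb_sd, lower.tail = FALSE)
p_single   # about 0.0668
```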

#b)
n = 15
SE = sd/sqrt(n)

#The distribution of the sample mean should be nearly normal with mean 9000 and SE = 258.1989

#c) 

z = (x - mu)/SE
z

## [1] 5.809475

#Since the z-score is very large, we know that the probability is almost 0.
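
#A sketch that puts a number on "almost 0" in c), using the standard error from b). The names mean_se and p_mean are illustrative.

```r
# Standard error of the mean for 15 bulbs
mean_se <- 1000 / sqrt(15)

# Upper-tail probability that the sample mean exceeds 10500
p_mean <- pnorm(10500, mean = 9000, sd = mean_se, lower.tail = FALSE)
p_mean   # on the order of 1e-9
```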

#d)
t <- seq(5000,13000,0.01)
plot(t, dnorm(t,9000, 1000), type="l", ylim = c(0,0.003))
lines(t, dnorm(t,9000, 258.1989), col="blue")

#Population distribution = black line, Sampling distribution = blue line

#e) If the skewness is mild, we can still estimate the probabilities approximately. However, if the skewness is strong, the normal model is not reliable, especially with a sample as small as n = 15, and it will be difficult to estimate the probabilities.

4.48

#As the sample size becomes larger, the standard error becomes smaller. Therefore, the absolute value of the z-score becomes bigger, which decreases the p-value.
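
#A sketch of this effect with made-up numbers: the observed difference (diff_obs) and standard deviation (sd_obs) are hypothetical and held fixed while only the sample size changes.

```r
diff_obs <- 0.5          # hypothetical difference between x_bar and the null value
sd_obs <- 3              # hypothetical sample standard deviation
n_values <- c(50, 500)

se_values <- sd_obs / sqrt(n_values)    # smaller for the larger sample
z_values <- diff_obs / se_values        # larger for the larger sample
p_values <- 2 * pnorm(-abs(z_values))   # two-sided p-value

data.frame(n = n_values, se = se_values, z = z_values, p = p_values)
```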

Sang Yoon (Andy) Hwang

March 16, 2018