DATA606 : Assignment 4

mean <- 171.1
median <- 170.3
sd <- 9.4
IQR <- 177.8 - 163.8 # Q3 - Q1
IQR

## [1] 14

Ans: a. The point estimate for the average height of active individuals is 171.1 cm, and the point estimate for the median is 170.3 cm.
b. The point estimate for the SD is 9.4 cm. The IQR is 177.8 - 163.8 = 14 cm.

Z.180 <- (180 - mean)/(sd)
(round(Z.180, 2))

## [1] 0.95

Z.155 <- (155 - mean)/(sd)
(round(Z.155, 2))

## [1] -1.71

c. The Z score for a height of 180 cm is 0.95. Its absolute value is less than 2, so this height is not considered unusual.
The Z score for a height of 155 cm is -1.71. Its absolute value is also less than 2, so this height is not considered unusual either.

d. I do not expect the mean and standard deviation of a new sample to be exactly the same, because the sampling is random. They should be similar but not identical, since it is very unlikely that the same 507 values would be drawn again.

SE <- sd/sqrt(507)
paste0("Standard error: ", round(SE,3))

## [1] "Standard error: 0.417"

e. The standard error measures the variability of the point estimate, that is, how much the sample mean height would vary from one sample of 507 to the next. Here SE = 9.4 / sqrt(507) = 0.417.
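
The meaning of the standard error in (e) can be illustrated with a short simulation. This is only a sketch: it assumes heights are normally distributed and treats the sample mean and SD as if they were the population values, neither of which the data confirm. The names `sim.n` and `sim.means` and the 2000 replicates are illustrative choices.

```r
set.seed(606)  # arbitrary seed so the run is repeatable

pop.mean <- 171.1  # assumed population mean (borrowed from the sample)
pop.sd   <- 9.4    # assumed population SD (borrowed from the sample)
sim.n    <- 507    # sample size from the problem

# Draw 2000 samples of size sim.n and record each sample mean
sim.means <- replicate(2000, mean(rnorm(sim.n, mean = pop.mean, sd = pop.sd)))

# The spread of the simulated means should be close to sd / sqrt(n)
sd(sim.means)
pop.sd / sqrt(sim.n)
```

The two printed values should be close to each other, which is what the formula for the standard error claims.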

Ans: a. False. The confidence interval estimates the population mean, not the sample mean. We know with certainty that the average spending of these 436 American adults is the point estimate, which always lies inside the confidence interval.
b. False. With n = 436 the sample is large enough that the right skew can be overlooked.
c. False. The confidence interval is a statement about the population mean, not about sample means.
d. True. This is the definition of a 95% confidence interval.
e. True. If we do not need to be as sure, we can use a lower confidence level. The resulting interval is narrower than the 95% confidence interval.
f. False. In the calculation of the standard error, we divide the standard deviation by the square root of the sample size. To cut the SE (or margin of error) to a third, we would need to sample 3^2 = 9 times the number of people in the initial sample.

# Margin of error = z * standard error
# Width of the 95% confidence interval: 89.11 - 80.31 = 8.80
# Half of the width is the margin of error: 8.80 / 2 = 4.40

g. True. From the calculation above, the mean is 84.71 and the margin of error is 4.40.
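
The arithmetic behind (f) and (g) can be sketched in a few lines. The variable names and the helper `n.multiplier` are hypothetical and introduced only for illustration; the interval endpoints are the ones quoted above.

```r
lower <- 80.31
upper <- 89.11

ci.mean <- (lower + upper) / 2  # midpoint of the interval = point estimate
ci.me   <- (upper - lower) / 2  # half-width of the interval = margin of error

# The margin of error is z * sd / sqrt(n), so it scales with 1 / sqrt(n).
# Hypothetical helper: factor by which n must grow to shrink the ME by a factor k.
n.multiplier <- function(k) k^2

ci.mean
ci.me
n.multiplier(3)
```

Because the margin of error depends on the square root of n, tripling the sample size would only shrink it by a factor of sqrt(3), not 3.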

Ans: a. The observations are independent (randomly selected) and the sample size is greater than 30 (n = 36). The sample satisfies the basic requirements.

giftedchild.n <- 36
giftedchild.min <- 21
giftedchild.mean <- 30.69
giftedchild.sd <- 4.31
giftedchild.max <- 39

# Null hypothesis (H0): gifted children first count to 10 at 32 months on average, the same as children in general.
# Alternative hypothesis (HA): the average does NOT equal 32 months.
# This is a two-tailed test with alpha = 0.10, as designated in the question.
# The normal (Z) approximation is used because n = 36.

# Calculate the standard error.
child.SE <- giftedchild.sd/sqrt(giftedchild.n)
child.Z <- (giftedchild.mean - 32)/(child.SE)
# Double the tail area beyond |Z| so the result is two-tailed for either sign of Z
p.value <- pnorm(-abs(child.Z), mean = 0, sd = 1) * 2
p.value

## [1] 0.0682026

b. The two-tailed p-value is 0.0682, which is < 0.10; therefore, we reject the null hypothesis.
c. Because the p-value is below 0.10, the data suggest that gifted children first count to 10 at a younger average age than children in general.

# For a 90% confidence interval the Z table gives z* = 1.645
child.lowerCI90 <- giftedchild.mean - 1.645 * child.SE
child.upperCI90 <- giftedchild.mean + 1.645 * child.SE
child.lowerCI90

## [1] 29.50834

child.upperCI90

## [1] 31.87166

d. The 90% confidence interval runs from 29.50834 (lower) to 31.87166 (upper).
e. The results from the hypothesis test and the confidence interval agree. The upper limit of the 90% confidence interval is 31.87 months, which is below the 32-month average for children in general, so 32 is not a plausible value for the mean. A two-sided test at alpha = 0.10 corresponds to a 90% confidence interval.
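
The agreement described in (e) can be checked programmatically. The sketch below takes the critical value from `qnorm` instead of a Z table; the helpers `z.interval` and `z.pvalue` are hypothetical names introduced here, and the normal approximation is assumed as in the calculation above.

```r
# Hypothetical helper: z-based confidence interval for a mean
z.interval <- function(xbar, s, n, conf = 0.90) {
  z.star <- qnorm(1 - (1 - conf) / 2)  # about 1.645 when conf = 0.90
  se <- s / sqrt(n)
  c(lower = xbar - z.star * se, upper = xbar + z.star * se)
}

# Hypothetical helper: two-sided z-test p-value against a null value
z.pvalue <- function(xbar, s, n, null.value) {
  z <- (xbar - null.value) / (s / sqrt(n))
  2 * pnorm(-abs(z))
}

ci <- z.interval(30.69, 4.31, 36, conf = 0.90)
p  <- z.pvalue(30.69, 4.31, 36, null.value = 32)

# The two approaches agree when "null value outside the interval"
# goes together with "p-value below alpha"
outside.ci   <- 32 < ci["lower"] || 32 > ci["upper"]
rejects.null <- p < 0.10
c(outside.ci = outside.ci, rejects.null = rejects.null)
```

For a two-sided test at level alpha and a confidence level of 1 - alpha, the two flags should always match, since both are built from the same standard error and critical value.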

The null hypothesis: the average IQ of mothers of gifted children = the average IQ of the population at large.
The alternative hypothesis: the average IQ of mothers of gifted children != the average IQ of the population at large.

n <- 36
child.gift.min <- 101
child.gift.mean <- 118.2
child.gift.sd <- 6.5
child.gift.max <- 131

child.gift.SE <- (child.gift.sd)/sqrt(n)
child.gift.Z <- (child.gift.mean - 100)/(child.gift.SE)
(child.gift.Z)

## [1] 16.8

child.gift.twotailP <- (1 - pnorm(child.gift.Z, mean = 0, sd = 1)) * 2
child.gift.twotailP

## [1] 0

a. The P value is effectively zero, which is far below alpha = 0.10, so we reject the null hypothesis in favor of the alternative: the average IQ of mothers of gifted children differs from the population average of 100.

b. The 90% confidence interval is calculated below:

# For a 90% confidence interval the Z table gives z* = 1.645
CI.upper90 <- child.gift.mean + 1.645 * child.gift.SE
CI.lower90 <- child.gift.mean - 1.645 * child.gift.SE
paste("Lower 90% CI: ", round(CI.lower90, 2))

## [1] "Lower 90% CI:  116.42"

paste("Upper 90% CI: ", round(CI.upper90, 2))

## [1] "Upper 90% CI:  119.98"

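c. The results from the hypothesis test and the confidence interval agree: the 90% confidence interval lies entirely above 100, so 100 is not a plausible value for the mean, which matches rejecting the null hypothesis.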
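
The P value of exactly 0 reported in (a) is a floating-point artifact, not a true zero. The sketch below compares two ways of getting the upper tail from `pnorm`; the names `p.naive` and `p.tail` are illustrative only.

```r
z <- 16.8

# Subtracting from 1 loses the tail: pnorm(z) rounds to exactly 1 here
p.naive <- (1 - pnorm(z)) * 2

# Asking pnorm for the upper tail directly keeps the tiny probability
p.tail <- 2 * pnorm(z, lower.tail = FALSE)

p.naive
p.tail
```

The conclusion of the test is unchanged either way, but `lower.tail = FALSE` is the safer habit when the Z score is large.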

The sampling distribution of the mean is the probability distribution of the sample mean, obtained by drawing a large number of samples of the same size from a specific population. The means of all those samples make up the sampling distribution.
The shape is typically unimodal and symmetric (nearly normal). As the sample size n increases, the distribution looks more normal. The spread also shrinks as n increases, because the standard error is (standard deviation of the population) / sqrt(n): the larger the n, the smaller the spread or variance.
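
A small simulation can make both claims concrete. This sketch assumes a right-skewed population, an exponential distribution with rate 1 (mean 1, SD 1), chosen purely for illustration; the sample sizes, the 5000 replicates, and the helper `mean.of.sample` are likewise illustrative.

```r
set.seed(606)  # arbitrary seed for repeatability

# Assumed right-skewed population: exponential with rate 1 (mean 1, SD 1)
mean.of.sample <- function(n) mean(rexp(n, rate = 1))

sizes <- c(5, 30, 100)
sim <- lapply(sizes, function(n) replicate(5000, mean.of.sample(n)))
names(sim) <- paste0("n=", sizes)

# Observed spread of the sample means next to the theoretical SE = 1 / sqrt(n)
rbind(observed = sapply(sim, sd), theory = 1 / sqrt(sizes))

# Histograms of the sample means for each sample size, side by side
par(mfrow = c(1, 3))
for (k in names(sim)) hist(sim[[k]], main = k, xlab = "sample mean")
```

The observed spreads should track 1 / sqrt(n), and the histograms should look narrower and more symmetric as n grows, even though the population itself is skewed.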

bulb.mean <- 9000
bulb.sd <- 1000
Last.Longer <- pnorm(10500, mean = bulb.mean, sd = bulb.sd, lower.tail = FALSE)
round(Last.Longer,2) * 100

## [1] 7

a. The probability that a randomly chosen light bulb lasts more than 10,500 hours is about 7%.

fifteen.bulb <- rnorm(15, mean = bulb.mean, sd = bulb.sd)
par(mfrow = c(1,2))
hist(fifteen.bulb)
qqnorm(fifteen.bulb)
qqline(fifteen.bulb)

mean(fifteen.bulb)

## [1] 8989.507

b. For random samples of 15 independent light bulbs, the distribution of the mean lifespan is centered at the population mean of 9000 hours, has a nearly normal shape, and has a standard error of 1000 / sqrt(15), or about 258 hours.

fifteen.bulb.SE <- bulb.sd/sqrt(15)
prob.bulb <- pnorm(10500, mean = bulb.mean, sd = fifteen.bulb.SE, lower.tail = FALSE) * 100
round(prob.bulb,2)

## [1] 0

c. The probability that the mean lifespan of 15 randomly chosen light bulbs is more than 10,500 hours is approximately 0%.

d. Below is the sketch of two distributions (population and sampling) on the same scale:

par(mfrow = c(2,1))
bulb.pop <- rnorm(10000, mean = bulb.mean, sd = bulb.sd)
hist(bulb.pop, xlim = c(4000,14000), prob = TRUE)
lines(density(bulb.pop, adjust = 2), lty = "dotted", col = "darkgreen", lwd = 2)
hist(fifteen.bulb, xlim = c(4000,14000), prob = TRUE)
lines(density(fifteen.bulb, adjust = 2), lty = "dotted", col = "blue", lwd = 2)

samp.bulb <- rep(NA, 5000)

for (i in 1:5000){
  samp <- rnorm(15, mean = bulb.mean, sd = bulb.sd)
  samp.bulb[i] <- mean(samp)
}

par(mfrow = c(2,1))
hist(bulb.pop, xlim = c(4000,14000), prob = TRUE)
lines(density(bulb.pop, adjust = 2), lty = "dotted", col = "darkgreen", lwd = 2)
hist(samp.bulb, xlim = c(4000,14000), prob = TRUE)
lines(density(samp.bulb, adjust = 2), lty = 'dotted', col = "blue", lwd = 2)

e. If the lifespans have a skewed distribution, we CANNOT estimate these probabilities this way, because the normal-model calculations above assume a distribution with little to no skew, and a sample of 15 is too small for the sample mean to be nearly normal regardless.

The P value depends on the standard error. To calculate the standard error:

sd(population) / sqrt(n)

With the SE, we can calculate the Z score:

(point estimate - null value) / standard error

To calculate the P value: take 1 - (probability value from the absolute Z score) and multiply it by 2 (two-tailed test).

If we use N = 500, the denominator in the SE would be larger, thus the SE would be smaller. If the SE becomes smaller, then the Z score will get larger in absolute value. If the Z score is larger, then (1 - prob value from the Z score) * 2 will get smaller.

Thus the P value will decrease.

With the same point estimate and standard deviation, a higher N makes it easier to reject the null hypothesis.
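
The chain of reasoning above can be sketched as a function of n. All inputs below (`estimate`, `null.value`, `s`, and the comparison sample size of 50) are hypothetical values chosen only to illustrate the effect; the helper `p.for.n` is likewise an illustrative name.

```r
# Hypothetical inputs, chosen only for illustration
estimate   <- 10.5  # point estimate
null.value <- 10    # value under H0
s          <- 2     # standard deviation

# Two-sided p-value from a z-test as a function of the sample size
p.for.n <- function(n) {
  se <- s / sqrt(n)                   # SE shrinks as n grows
  z  <- (estimate - null.value) / se  # so |z| grows
  2 * pnorm(-abs(z))                  # and the two-tailed area shrinks
}

sapply(c(50, 500), p.for.n)
```

With everything else held fixed, the second value should be much smaller than the first, matching the conclusion that the P value decreases.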

Niteen Kumar

Mar 18, 2018