Part I

Question 2.2 Heart Transplants, Part II

2.2.a

65.22% of the patients in the treatment group died and 88.24% of patients in the control group died.

2.2.b.i

H0: Treatment has no effect on patient outcome; the two variables (outcome: alive or dead, and group: treatment or control) are independent.

HA: Treatment affects whether a patient lives or dies; the two variables are not independent.

2.2.b.ii

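We write alive on 28 cards and dead on 75 cards. The cards are shuffled and split into two groups: 69 cards represent treatment and 34 cards represent control. The resulting distribution of simulated differences is centered at 0. Finally, we find the fraction of simulated differences in proportions that are greater than or equal to 0.23 (24/69 - 4/34 = 0.35 - 0.12 = 0.23).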
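
The shuffling procedure in this answer can be sketched in R. This is a minimal sketch, not the simulation used for the exercise: the counts come from the answer above, while the seed, the number of shuffles (10,000), and all variable names are assumptions made for illustration.

```r
# Randomization test sketch for the heart transplant data.
# Counts from the answer above: 28 alive, 75 dead; 69 treatment, 34 control.
set.seed(1)                                    # assumed seed, for reproducibility only
outcome <- c(rep("alive", 28), rep("dead", 75))
n_treat <- 69
n_sims  <- 10000                               # assumed number of shuffles

# Observed difference in survival proportions (treatment - control).
obs_diff <- 24 / 69 - 4 / 34

sim_diff <- replicate(n_sims, {
  shuffled <- sample(outcome)                  # shuffle the cards
  treat    <- shuffled[1:n_treat]              # first 69 cards -> treatment
  control  <- shuffled[(n_treat + 1):length(shuffled)]
  mean(treat == "alive") - mean(control == "alive")
})

# Share of shuffles with a difference at least as large as the observed one
# (small tolerance guards against floating-point ties).
p_value <- mean(sim_diff >= obs_diff - 1e-12)
round(obs_diff, 2)
p_value
```

A small value of p_value means a difference of 0.23 or more would rarely arise from shuffling alone.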

2.2.b.iii

The histogram of simulated differences is slightly left skewed but otherwise close to the symmetric distribution centered at 0 that we expect under the null hypothesis of no relationship between treatment and patient outcome. A simulated difference as large as the observed 0.23 is rare under that null hypothesis. Thus, we reject the null hypothesis in favor of the alternative: there is a statistically significant relationship between treatment and patient outcome.

Question 3.28 Sleep Deprivation, CA vs. OR, Part I

We can be 95% confident that the true sleep deprivation rate among California residents is between 1.7 percentage points lower and 0.1 percentage points higher than the rate among Oregon residents.
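
The arithmetic behind an interval like this can be sketched as follows. The sample proportions and sample sizes below are not stated in this write-up; they are assumptions chosen to be consistent with the pooled proportion and test statistic reported in 3.30, and the variable names are hypothetical.

```r
# 95% CI for a difference in two proportions (California - Oregon).
# ASSUMED inputs (not stated in this write-up): sample proportions and sizes.
p_ca <- 0.080; n_ca <- 11545
p_or <- 0.088; n_or <- 4691

diff_hat <- p_ca - p_or
# Unpooled standard error, as used for a confidence interval.
se <- sqrt(p_ca * (1 - p_ca) / n_ca + p_or * (1 - p_or) / n_or)
z_star <- qnorm(0.975)                 # about 1.96

ci <- diff_hat + c(-1, 1) * z_star * se
round(ci, 4)
```

Under these assumed inputs the interval runs from roughly -1.7 to +0.1 percentage points, and it contains 0.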

Question 3.30 Sleep Deprivation, CA vs. OR, Part II

3.30.a

Let p1 be the proportion of sleep-deprived California residents and p2 the proportion of sleep-deprived Oregon residents.

H0: p1 = p2. There is no difference between the rates of sleep deprivation in California and Oregon; any difference seen in the survey is due to chance.

HA: p1 ≠ p2. There is a difference between the rates of sleep deprivation in California and Oregon.

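The pooled proportion estimate is 0.082, the z-statistic is -1.68, and the p-value is 0.093. Because the p-value is greater than the 5% significance level, we fail to reject the null hypothesis: the survey does not provide strong evidence of a difference in sleep deprivation rates between the two states.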
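
The reported quantities can be reproduced with a short two-proportion z-test. This is a sketch under assumptions: the sample proportions and sizes are not given in this write-up and are assumed here, and the variable names are hypothetical.

```r
# Two-proportion z-test with a pooled proportion.
# ASSUMED inputs (not stated in this write-up): sample proportions and sizes.
p_ca <- 0.080; n_ca <- 11545
p_or <- 0.088; n_or <- 4691

p_pool <- (p_ca * n_ca + p_or * n_or) / (n_ca + n_or)
se     <- sqrt(p_pool * (1 - p_pool) * (1 / n_ca + 1 / n_or))
z      <- (p_ca - p_or) / se
p_val  <- 2 * pnorm(-abs(z))           # two-sided p-value

round(c(pooled = p_pool, z = z, p_value = p_val), 3)
p_val < 0.05                           # TRUE would mean rejecting H0 at the 5% level
```

The null hypothesis is rejected only when the p-value falls below the significance level.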

3.30.b

We did not reject the null hypothesis, so if this conclusion is incorrect we have made a Type 2 Error: failing to reject a null hypothesis that is actually false.

Question 4.16 Work hours and education, Part I

4.16.a

The parameter of interest is the difference between the average hours worked per week by Americans with a college degree and by Americans without a college degree. The point estimate is the difference in the sample averages, 2.4 hours. We can be 95% confident that the true difference in average working hours for the entire population lies between 0.6 and 4.1 hours, matching the interval in 4.16.c.

4.16.b

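Both conditions appear to be satisfied.
Independence: Both samples (505 and 667) represent less than 10% of their respective American populations.
Sample size: With 505 and 667 observations, both samples are well above 30, so the normal approximation for the difference in means is reasonable unless the data are extremely skewed.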

4.16.c

We can be 95% confident that Americans with a college degree work, on average, 0.6 to 4.1 hours more per week than Americans without a college degree.
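
A sketch of how such an interval is computed: the difference of 2.4 hours and the sample sizes come from the answers above, while the two sample standard deviations (15.1 hours each) are assumptions not stated in this write-up, and the variable names are hypothetical.

```r
# 95% CI for a difference in two means (college degree - no college degree).
diff_hat <- 2.4                  # observed difference in mean weekly hours (4.16.a)
n_col <- 505; n_nocol <- 667     # sample sizes (4.16.b)
s_col <- 15.1; s_nocol <- 15.1   # ASSUMED sample standard deviations

# Standard error of the difference in means.
se <- sqrt(s_col^2 / n_col + s_nocol^2 / n_nocol)
ci <- diff_hat + c(-1, 1) * qnorm(0.975) * se
round(ci, 2)
```

Under these assumptions the bounds land close to the 0.6 and 4.1 hours quoted above; small differences come from rounding and the choice of critical value.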

4.16.d

College degrees often lead to salaried positions that are seldom paid by the hour, hence employees often work more hours to complete their tasks or take on more work. Jobs that do not require college degrees are often paid in hourly rates, which can directly limit the number of hours one works.

Question 5.10

5.10.a

There appears to be a positive relationship between the volume and height of these trees.

5.10.b

There appears to be a strong, positive relationship between the volume and diameter of these trees, with very little scatter around the trend.

5.10.c

Diameter would be the better predictor, because the relationship between diameter and volume is stronger, and therefore more predictable, than the relationship between height and volume.

Part II

Question 1

My best point estimate of the population mean price is $173,783.20.

code: price60 <- sample(price, 60)
mean_price60 <- mean(price60)
mean_price60
[1] 173783.2

Question 2

The mean of a sample of 50 home prices is $166,155.40.

code:
price50 <- sample(price,50)
price50
[1] 126000 105000 129500 137000 295493 139500 187500 149000 297000 172500 64000 239900 99900 193800 128500 150000
[17] 153500 318000 142000 127500 320000 99500 128250 83500 290000 128000 93500 229800 127000 107400 132500 148325
[33] 325000 193500 55000 130000 160000 128500 126000 162000 205000 168000 284000 136900 99500 177439 135500 117000
[49] 143000 318061
summary(price50)
Min. 1st Qu. Median Mean 3rd Qu. Max.
55000 127100 140800 166200 192000 325000
mean_price50 <- mean(price50)
mean_price50
[1] 166155.4

According to this histogram of 50 sample means (each from a sample of 50 home prices), the sampling distribution appears roughly normal with a slight right skew (a longer tail toward more expensive homes). Based on the shape of this sampling distribution, I would guess the mean home price of the population to be around $175,000, while the mean home price of the single sample of 50 above is $166,155.40.

code:
mean_price50 <- rep(NA, 50)
for (i in 1:50){
price50 <- sample(price, 50)
mean_price50[i] <- mean(price50)
}
hist(mean_price50)
hist(mean_price50, col = "orange")

The true population mean home price is $180,796.10.

code: mean(price)
[1] 180796.1

Question 3

All but 3 of the 50 confidence intervals include the true population mean of $180,796. That is a capture rate of 94% (47/50 = 0.94), slightly lower than, but close to, the 95% confidence level.

code: se <- sd(price50)/sqrt(50)
upper <- mean_price50 + 1.96 * se
lower <- mean_price50 - 1.96 * se
c(lower, upper)
[1] 146360.6 158854.4 158361.7 162793.1 143718.4 162175.4 144624.9
[8] 156105.6 149074.0 157804.8 155530.0 166318.3 137685.1 156602.6
[15] 157458.6 167144.0 173359.2 145126.0 155806.2 173961.7 160929.5
[22] 161498.9 146633.9 176928.3 164456.0 173029.3 155652.1 172905.9
[29] 173695.8 157140.1 175402.3 145109.1 156167.3 139159.3 178778.8
[36] 174691.5 178924.4 163386.4 153456.2 181800.2 159862.0 153185.8
[43] 160980.9 156626.9 175641.5 160075.6 157172.9 161878.8 166831.8
[50] 150471.6 185105.7 197599.6 197106.8 201538.2 182463.5 200920.5
[57] 183370.0 194850.7 187819.1 196549.9 194275.1 205063.5 176430.2
[64] 195347.7 196203.8 205889.1 212104.3 183871.1 194551.3 212706.8
[71] 199674.6 200244.1 185379.0 215673.4 203201.2 211774.4 194397.2
[78] 211651.0 212441.0 195885.2 214147.4 183854.2 194912.4 177904.4
[85] 217523.9 213436.7 217669.6 202131.5 192201.3 220545.3 198607.1
[92] 191930.9 199726.0 195372.0 214386.6 198820.7 195918.1 200623.9
[99] 205576.9 189216.7
plot_ci(lower, upper, mean(price))
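
The idea of checking how many intervals capture the true mean can be sketched on its own, without the assignment data. Everything below is an illustration under assumptions: the population is synthetic (a hypothetical right-skewed log-normal), not the `price` data, and the seed and variable names are made up. The sketch computes one standard error per sample.

```r
# Coverage of 95% confidence intervals for a mean, on a SYNTHETIC population.
# The population is hypothetical; it is not the `price` data from this assignment.
set.seed(42)                                   # assumed seed
population <- rlnorm(3000, meanlog = 12, sdlog = 0.4)
true_mean  <- mean(population)

n_samples <- 50
n <- 50
sample_mean <- rep(NA, n_samples)
sample_sd   <- rep(NA, n_samples)
for (i in 1:n_samples) {
  s <- sample(population, n)
  sample_mean[i] <- mean(s)
  sample_sd[i]   <- sd(s)                      # one SD per sample
}

se    <- sample_sd / sqrt(n)                   # one standard error per interval
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se

# Proportion of intervals that contain the true population mean.
coverage <- mean(lower <= true_mean & true_mean <= upper)
coverage
```

With only 50 intervals the observed coverage fluctuates around 95% from run to run, so a value such as 94% is not surprising.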

Question 4

For a 90% confidence interval, the z-score is approximately 1.65, giving an upper bound of $186,152.70 and a lower bound of $153,535.60.

code: se <- sd(price50)/sqrt(50)
upper2 <- pe + 1.65 * se
lower2 <- pe - 1.65 * se
c(upper2, lower2)
[1] 186152.7 153535.6
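
The critical value for a given confidence level comes from the standard normal quantile function. A minimal sketch, with hypothetical variable names:

```r
# Critical z-values for common confidence levels, from the standard normal.
conf_level <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - conf_level) / 2)      # two-sided critical values
round(z_star, 3)
# [1] 1.645 1.960 2.576
```

The value 1.65 used above is the 90% critical value rounded to two decimals.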

another code:
mean_price50 <- rep(NA, 50)
sd_price50 <- rep(NA, 50)
for (i in 1:50){
price50 <- sample(price, 50)
mean_price50[i] <- mean(price50)
sd_price50[i] <- sd(price50)
}
se <- sd_price50/sqrt(50)
upper2 <- mean_price50 + 1.65 * se
lower2 <- mean_price50 - 1.65 * se
c(upper2, lower2)
[1] 204381.5 220780.8 192717.9 218499.1 208326.2 192865.6 234517.0
[8] 181150.6 202074.5 203238.5 175574.5 193633.6 194280.8 196342.9
[15] 219658.7 203131.9 210991.5 188975.6 202101.0 226421.9 188601.6
[22] 207312.6 204321.5 223666.8 169335.6 196146.6 186718.2 204367.0
[29] 192378.3 194888.1 209170.8 188663.5 215591.4 200571.4 188943.6
[36] 169296.1 186851.0 187869.3 194420.5 217942.6 191723.8 208427.8
[43] 190345.9 193321.6 198395.0 190141.4 172676.7 227179.5 205386.8
[50] 218761.6 167485.9 171905.3 159404.8 176139.1 170378.6 164932.0
[57] 183149.9 151849.4 163535.0 174494.3 151985.1 161840.7 165028.1
[64] 166106.6 174340.1 165882.6 171186.9 159590.4 163169.2 183333.1
[71] 155079.5 171354.5 167437.5 175552.0 144458.9 162838.8 149088.9
[78] 155650.4 162431.7 161929.8 164695.5 157820.7 174180.4 162461.0
[85] 158336.5 148234.3 157116.6 155602.1 157004.2 172844.5 148141.0
[92] 165941.4 160581.4 157781.1 160892.1 156510.8 144879.3 178931.9
[99] 155030.6 174697.5

Question 5

According to the plot, about 12% of the confidence intervals (6/50 = 0.12) do not include the true population mean, so 88% of them do. This is very close to the 90% confidence level selected for the intervals.

code: plot_ci(lower2, upper2, mean(price))

Adv Quant Methods - Homework 2

Deepa Mehta

November 22, 2015