First homework assignment for DACSS 603
Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?
I wanted to work through this one in a bit more detail than strictly necessary, partly for the chance to practice organizing my R code neatly as I solve. I need some good repetition to build those good habits.
For this question I assigned all of the values in the table to objects to use in calculating each treatment’s t-scores, margins of error, and confidence intervals. I did a quick check of qnorm(.95) to see whether the t-scores had converged to the normal distribution, given the relatively high sample sizes of each treatment. Surprisingly, to me at least, even rounded to three decimal places they still differed very slightly, both from the normal distribution and from each other.
I know I have the standard deviations, wait time would be a continuous variable, and these samples are larger than n = 30, so I could use qnorm and the 1.645 z critical value, clearing those assumptions. But it seems like I should use the fatter-tailed t distribution values from qt regardless of whether I could assume normality from the other conditions, to err on the side of caution when analyzing and communicating medical procedure data. I have a note in my code on that point, mostly for my own reference, and will solve for the normal distribution values as well. After running the calculations, the practical effect is slightly more uncertainty in the mean wait time for the angiography procedure, while still keeping that procedure’s interval narrower than the bypass procedure’s, as we’d expect. Especially if this were used to set patient expectations on wait times, the wider range would be the option a hospital or doctor should communicate, but we do appear to be in the range where the t and z distributions are becoming quite similar.
## Means, Total N, and SDs from full question text. B = Bypass A = Angiography
#mean wait times for each procedure
mean_b <- 19
mean_a <- 18
#standard deviations
sd_b <- 10
sd_a <- 9
#total N
n_b <- 539
n_a <- 847
## 90% CI score finding. .90 = 1 - a, and I need a/2 for the two tails since I have no theory on why the direction of the error would matter. Did a little side exploring of the t distribution and normal distribution scores below.
#Rounded to 1.645 but not used for calculations. I thought the sample sizes were large enough that qt and qnorm would return the same values at three-digit rounding, but after checking, each t score rounded differently at the third digit. For medical data I would rather err on the side of caution.
#normal z score
z_heart <- qnorm(.95)
round(z_heart, digits = 3)
[1] 1.645
#check this for t scores
b_t <- round(qt(.95, df = n_b -1), digits = 3)
a_t <- round(qt(.95, df = n_a -1), digits = 3)
#print them
b_t
[1] 1.648
a_t
[1] 1.647
## 90% CI standard error of mean / margin of error
#margin of error = critical value * (sd / sqrt(n)), using the t value first
mofe_bypass <- b_t*(sd_b/sqrt(n_b))
mofe_angiography <- a_t*(sd_a/sqrt(n_a))
#doing this with z 1.645
mofe_bypass_z <- 1.645*(sd_b/sqrt(n_b))
mofe_angiography_z <- 1.645*(sd_a/sqrt(n_a))
#print the normal distribution score MoE
mofe_bypass_z
[1] 0.7085517
mofe_angiography_z
[1] 0.5087058
# Print them using t value
mofe_bypass
[1] 0.7098439
mofe_angiography
[1] 0.5093243
## +/- from the data set mean for the range
#bypass upper and lower
bypass_lower <- mean_b - mofe_bypass
bypass_upper <- mean_b + mofe_bypass
#combine the upper and lower to list interval
ci_bypass <- print(c(bypass_lower, bypass_upper))
[1] 18.29016 19.70984
#angiography upper and lower
angio_lower <- mean_a - mofe_angiography
angio_upper <- mean_a + mofe_angiography
#combine the upper and lower to list interval
ci_angiography <- print(c(angio_lower, angio_upper))
[1] 17.49068 18.50932
#Normal distribution 90% CI
#bypass Z
bypass_lower_z <- mean_b - mofe_bypass_z
bypass_upper_z <- mean_b + mofe_bypass_z
ci_bypass_z <- (print(c(bypass_lower_z, bypass_upper_z)))
[1] 18.29145 19.70855
#angiography Z
angiography_lower_z <- mean_a - mofe_angiography_z
angiography_upper_z <- mean_a + mofe_angiography_z
ci_angiography_z <- (print(c(angiography_lower_z, angiography_upper_z)))
[1] 17.49129 18.50871
The confidence interval is narrower for angiography: we have a larger sample size for that procedure’s wait time and a smaller standard deviation. We would expect this in theory, as the mean from the larger sample should be closer to the true population mean, and the smaller standard deviation means we have less variation to begin with. That expectation shows up in both the smaller numerator and the larger denominator of the margin of error calculation. For comparison on the denominators, since the standard deviations were already listed in the table: sqrt(n) is 23.2163735 for bypass and 29.1032644 for angiography. For completeness, the normal distribution CIs are 17.4912942, 18.5087058 for angiography and 18.2914483, 19.7085517 for the bypass procedure.
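Since the same critical-value-times-standard-error pattern repeats for every interval here, a small helper function could replace the repeated arithmetic. This is just a sketch for future reference: ci_mean is a name I made up, and it skips the intermediate rounding of the t score, so its results differ from the above in the third decimal place.
#Hypothetical helper: t-based CI for a mean from summary statistics
ci_mean <- function(xbar, s, n, level = 0.90) {
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)
  moe <- t_crit * s / sqrt(n)
  c(lower = xbar - moe, upper = xbar + moe)
}
#Reproduces the intervals above (without rounding t to three digits first)
ci_mean(mean_b, sd_b, n_b)
ci_mean(mean_a, sd_a, n_a)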
A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.
Since we are looking to construct this interval around a proportion, I used the prop.test function, with the total sample size of 1031 as the n input and the 567 respondents who said college is “essential for success” as the successes input. I decided to use this function rather than hand calculating as in question 1.
prop_coll_success <- prop.test(567, 1031, conf.level = .95)
prop_coll_success
1-sample proportions test with continuity correction
data: 567 out of 1031, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5189682 0.5805580
sample estimates:
p
0.5499515
names(prop_coll_success)
[1] "statistic" "parameter" "p.value" "estimate"
[5] "null.value" "conf.int" "alternative" "method"
[9] "data.name"
The point estimate from the survey data is 0.55, and the 95% confidence interval is (0.52, 0.58). This interval means we can say we are 95% confident that the proportion of American adults who believe that “college education is essential for success” is somewhere between the lower and upper ends of our confidence interval, assuming the initial survey was indeed random and representative. Representativeness (and randomness, though that is already extremely difficult in surveying) would be especially important for this question, depending on which variables were used to determine that the initial sample was representative: beliefs around college education are increasingly subject to the impacts of political polarization, so we could be breaking some of the assumptions needed for our analysis to be accurate.
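For comparison with the hand-calculation approach from question 1, below is a sketch of the large-sample (Wald) interval. It won’t match prop.test exactly, since prop.test returns the Wilson score interval with a continuity correction, but it lands close.
#Wald interval by hand: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
p_hat <- 567 / 1031
moe_p <- qnorm(.975) * sqrt(p_hat * (1 - p_hat) / 1031)
c(p_hat - moe_p, p_hat + moe_p)
#roughly 0.5196 to 0.5803, close to the prop.test interval above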
Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per quarter for students. The estimate will be useful if it is within $5 of the true population mean (i.e. they want the confidence interval to have a length of $10 or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between $30 and $200. They think that the population standard deviation is about a quarter of this range. Assuming the significance level to be 5%, what should be the size of the sample?
For this problem, I started off assigning all of the elements of the problem to objects, as I did for the prior two. The standard deviation is shown in sd_books below, as 1/4 of the difference between $200 and $30. The critical z for a 5% significance level comes from qnorm(1 - alpha/2) = qnorm(.975). The margin of error we need to hit is 5 dollars, assigned to book_moe. I assumed it would take a decent-sized sample, at least larger than 30, to get within 5 dollars with that large a standard deviation, and that we would be able to collect this sample randomly, so I used the large-sample equation for estimating a mean to find the sample size n. It took too long to figure out the fancy letters in Rmarkdown, so I used abbreviations for standard deviation and such, unfortunately. \[n = sd^2 \left(\frac{z}{M}\right)^2\]
#Assign all of the elements of the problem to objects
sd_books <- (200-30)*.25
sd_books
[1] 42.5
book_z <- qnorm(.975)
book_z
[1] 1.959964
book_moe <- 5
# Formula for n from the margin of error calculation, large sample: n = sd^2 * (z_(a/2) / M)^2. Calculate by hand first.
book_n_by_hand <- sd_books ^ 2 * (book_z / book_moe) ^ 2
book_n_by_hand
[1] 277.5454
#Use the samplingbook package to confirm.
library(samplingbook)
book_n_package <- sample.size.mean(e = book_moe, S = sd_books, level = .95)
book_n_package
sample.size.mean object: Sample size for mean estimate
Without finite population correction: N=Inf, precision e=5 and standard deviation S=42.5
Sample size needed: 278
After finding the answer of 278 (rounded up) students in the sample by hand calculating n, I wanted to check my work using the samplingbook package, putting the same elements from the problem into the function. That answer, shown above in book_n_package, matched my “hand” calculation: 278 students are needed in the sample to have a mean estimate within 5 dollars of the true population mean of textbook costs.
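As a quick sanity check on the rounding direction, plugging n = 277 and n = 278 back into the margin of error formula confirms why we round up rather than down:
#moe = z * sd / sqrt(n); we need this to be at most 5
book_z * sd_books / sqrt(277) #about 5.005, just over the target
book_z * sd_books / sqrt(278) #about 4.996, just under the target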
According to a union agreement, the mean income for all senior-level workers in a large service company equals $500 per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = $90.
A) Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.
B) Report the P-value for Ha: μ < 500. Interpret.
C) Report and interpret the P-value for Ha: μ > 500. (Hint: The P-values for the two possible one-sided tests must sum to 1.)
Since this question has several follow-up questions to report out, I’m going to dive right into the code portion of the work, and I’ll have all of the narrative explanations and steps of work described below it.
#question elements assigned to objects, will use alpha = .05
#population
all_union_mean <- 500
#sample
women_mean <- 410
women_sd <- 90
women_n <- 9
#get the test statistic t
#estimated standard error
women_se <- women_sd / sqrt(women_n)
women_se
[1] 30
#test statistic
women_t <- (women_mean - all_union_mean) / women_se
women_t
[1] -3
#find two tail p-value, round to two digits
women_p <- round(2*pt(-abs(women_t), df = 8, lower.tail = TRUE), digits = 2)
women_p
[1] 0.02
#p for < 500, round to two digits
women_lower_p <- round(pt(-abs(women_t), df = 8, lower.tail = TRUE), digits = 2)
women_lower_p
[1] 0.01
#p for > 500, round to two digits
women_greater_p <- round(pt(-abs(women_t), df = 8, lower.tail = FALSE), digits = 2)
women_greater_p
[1] 0.99
Assumptions: The t-test assumes a random sample and, because this is a small sample (n = 9), an approximately normal population distribution. I’m using the two-sided test, as it is more robust when the data may not clear the normality assumption: with a sample this small, highly skewed data could distort one-tailed results, and the question didn’t explicitly state whether the group expected the mean to be higher or lower than the overall union average.
Hypotheses: Null hypothesis is that the mean wage for women = 500 dollars, the same as the contract required mean for all senior workers at the union. The alternative hypothesis is that the mean wage for women =/= 500 dollars.
Test statistic: The test statistic women_t has a value of -3. Since this is a negative value due to the sample mean being lower than the population mean, it’s important to remember that the absolute value should be used in calculating the p value.
P value: The two-sided P value from women_t with 8 degrees of freedom equals 0.02.
Interpreting the results: Since the two-tailed P value of 0.02 is below our pre-selected alpha level of .05 by some distance, we can reject the null hypothesis that the mean wage for women is equal to 500 dollars and accept the alternative hypothesis that it is not equal to 500. The one-sided results below make it seem very likely that the true mean is below the overall union mean, which lines up with the sample mean of $410 and standard deviation of $90 from the study. I would suggest initiating the process of review, confirmation of the study, and the grievance process.
Question B: The P value for the women’s mean weekly wage being less than 500 is 0.01. This means that if our null hypothesis were true, we would see a sample mean this far below $500, or farther, only about 1% of the time. Given that the contract specifies an overall mean of 500, we have strong evidence that the mean weekly wage for women is lower than what the contract specifies. With a small sample size it is possible for a one-tailed value to be thrown off by highly skewed data, however. Given the rejection of the null and the one-tail results here and below for greater than 500, I think the women’s group would have a strong case that their mean wage is not 500 and is most likely lower than 500 dollars.
Question C: The P value for the mean weekly wage for women being greater than 500 is 0.99. This means that if the null hypothesis were true, we would expect a test statistic at least as large as ours 99% of the time; the data provide essentially no evidence that the mean is above 500, and the two one-sided P values sum to 1 as the hint says. With the same small-sample caveat as above on one-tailed results, this split in P values, failing to reject in the “greater than” direction while rejecting in the “less than” direction, is quite extreme and would be another supporting point for the group in filing a grievance.
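As a cross-check on the hand calculation, the tsum.test function in the BSDA package runs a t-test directly from summary statistics. A sketch, assuming that package is installed; it should reproduce t = -3 and the unrounded two-sided P value.
#t-test from summary statistics rather than raw data
library(BSDA)
tsum.test(mean.x = women_mean, s.x = women_sd, n.x = women_n,
          mu = all_union_mean, alternative = "two.sided")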
Jones and Smith separately conduct studies to test H0: μ = 500 against Ha: μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0.
A) Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.
B) Using α = 0.05, for each study indicate whether the result is “statistically significant.”
C)Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0 ,” without reporting the actual P-value.
Same as above, I’m going to assign out the objects and then go through the solution steps below the R code chunk. The answer for question A will be in the R code chunk, while I will answer the narrative aspects of the other questions in the text below.
#Question Elements
#both studies with same n
study_n <- 1000
#null mean = 500
#alt mean =/= 500
#jones study elements
jones_mean <- 519.5
jones_se <- 10.0
#smith study elements
smith_mean <- 519.7
smith_se <- 10.0
#Solve for Smith
#t for Smith, use two digits and it must equal 1.97
smith_t <- round((smith_mean - 500) / smith_se, digits = 2)
smith_t #It's 1.97 like it's supposed to be, yay!
[1] 1.97
#P for Smith, use three digits and two-tailed
smith_p <- round(2 * pt(-abs(smith_t), df = 999), digits = 3)
smith_p # It's .049 like it's supposed to be!
[1] 0.049
#Solve for Jones
#t for Jones, use two digits and it must equal 1.95.
jones_t <- round((jones_mean - 500) / jones_se, digits = 2)
jones_t #It's 1.95 like it's supposed to be, yay!
[1] 1.95
#P for Jones, use three digits and it must equal .051
jones_p <- round(2 * pt(-abs(jones_t), df = 999), digits = 3)
jones_p #It's .051 like it's supposed to be!
[1] 0.051
Without rounding, only Smith’s .049 would be below the alpha level, and Jones’ .051 would be above it. With margins this fine, reporting only whether the null was rejected, or even just the statement “p ≤ .05” versus “p > .05” without the raw P values included, would ascribe a practically significant difference to these two studies even though they are very nearly identical. The difference could very well be random noise and chance, and reporting the statement alone would make that harder to assess.
Choosing to publish Smith’s study just because it cleared the threshold and not Jones’ as well could make it harder to see the full picture of the parameter they studied or analyze the overall random variability in the experimental findings in a meta analysis setting.
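To see the reporting problem in code, here is a small sketch that prints the binary decision next to the actual P value; the labels diverge even though the P values sit a hair apart.
#Binary reporting hides how close the two results are
for (p in c(jones_p, smith_p)) {
  decision <- if (p <= 0.05) "reject H0" else "do not reject H0"
  cat("p =", p, "->", decision, "\n")
}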
Are the taxes on gasoline very high in the United States? According to the American Petroleum Institute, the federal tax levied on gasoline was 18.4 cents per gallon. However, state and local taxes vary over the same period.
The sample data of gasoline taxes for 18 large cities is given below in the variable called gas_taxes.
Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents? Explain.
After reading the question, I don’t think our initial sample was collected in a way that would let us be confident in a confidence interval constructed from it for the average gas tax in the nation as a whole in 2005. If the federal tax is constant at 18.4 cents, most of the variability from city to city in the sample data is driven by state and local tax decisions. The mean for the gas_taxes data is 40.86, so over half of our mean amount comes from state and local variables that do not look to be accounted for in the sampling description as is. Most Americans in 2005 (and still today) did not live in very large cities, the amount depending of course on the exact definition of “large.”
Our sample is only from 18 large cities in the country, and I think it’s reasonable to assume large cities have, or at the very least could have, systematically different gas tax policies than other population densities. Some states could have caps or specific legislation that could affect the overall national average, and there’s not enough information about how the 18 large cities were selected or sampled to clear all of the assumptions, even if we didn’t think large cities varied from suburban, exurban, or rural geographies in a way that would skew the small sample of data that we do have. Bootstrapping or other approaches would be affected by the same issue.
Now, if I’m reading entirely too much into this set up and I should just show that I can evaluate if our mean is below a set level using a confidence interval, I’ll proceed to do that, too! To answer this question, I’ll use the psych package to get the descriptive statistics and look those over for fun and possible use to calculate by hand if the t.test result looks funny. Then I’ll use t.test(gas_tax) and look at the 95% CI range.
If the entire confidence interval is below 45 cents we would have enough evidence to say we think it’s 95% likely that the mean gas tax is below the 45 cent level, the equivalent of rejecting the null hypothesis and accepting the alternative hypothesis of gas taxes were likely lower than 45 cents. If the interval includes 45 and/or above that level we don’t have enough information, and would be doing the equivalent of failing to reject the null hypothesis.
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
#Use describe from the psych package for an overview of the gas_taxes data; could use the values here to hand calculate the CI.
library(psych)
gas_tax_summary <- describe(gas_taxes)
gas_tax_summary
vars n mean sd median trimmed mad min max range skew
X1 1 18 40.86 9.31 41.47 41.41 9.72 18.49 54.41 35.92 -0.58
kurtosis se
X1 -0.32 2.19
#Use t.test on gas_taxes to see the 95% CI
t.test(gas_taxes)
One Sample t-test
data: gas_taxes
t = 18.625, df = 17, p-value = 9.555e-13
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
36.23386 45.49169
sample estimates:
mean of x
40.86278
The 95% interval does contain 45 cents, so we cannot say we have enough evidence at this confidence level to do the equivalent of rejecting the null hypothesis. Since the null value sits at the very top of the interval, and putting aside the other issues with the sample, this seems like the sort of result where we could say in practice the mean was likely lower than 45 cents. Since gas taxes and prices go to the third digit at the pump, the upper end of our 95% interval would have an “at the pump” reading of $0.455. With that interval it feels safe to say in practice that we’re fairly confident it was about 45 cents or lower, even if we’d need a bit more data to say so in strict statistical terms.
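For a more direct answer in statistical terms, the same t.test call can run the one-sided test of H0: μ = 45 against Ha: μ < 45; the one-sided P value answers the “less than 45 cents” question without the two-sided interval’s extra width. A sketch:
#One-sided test: is the mean tax less than 45 cents?
t.test(gas_taxes, mu = 45, alternative = "less")
#if the one-sided p-value comes out below .05, that would support the "in practice" reading above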