The time between the date a patient was recommended for heart surgery and the surgery date for cardiac patients in Ontario was collected by the Cardiac Care Network (“Wait Times Data Guide,” Ministry of Health and Long-Term Care, Ontario, Canada, 2006). The sample mean and sample standard deviation for wait times (in days) of patients for two cardiac procedures are given in the accompanying table. Assume that the sample is representative of the Ontario population.
Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?
To construct the 90% confidence interval, we first need one missing piece of information, the z value. A 90% confidence interval leaves 5% in each tail of the standard normal distribution, so the critical value is the 95th percentile, z = 1.645. With all the pieces of the formula (mean ± z × sd/sqrt(n)) in hand, we can compute each interval in R with the code provided.
Bypass: [18.29,19.71]
Width (upper bound - lower bound) = 1.42
R-Code:
qnorm(.95)
I used .95 because of the confidence level: a 90% confidence interval leaves .10 in the two tails, or .10/2 = .05 in each tail, so the upper critical value sits at cumulative probability 1 - .05 = .95, which gives a z value of 1.645.
1.644854
center <- 19
stddev <-10
n <- 539
error <- qnorm(0.95)*stddev/sqrt(n)
error 0.7084886
lower_bound <- center - error
lower_bound 18.29151
upper_bound <- center + error
upper_bound 19.70849
Angiography: [17.49,18.51]
Width (upper bound - lower bound) = 1.02
R-Code:
center <-18
stddev <-9
n <- 847
error <- qnorm(0.95)*stddev/sqrt(n)
error 0.5086606
lower_bound <- center - error
lower_bound 17.49134
upper_bound <- center + error
upper_bound 18.50866
• As we can see, the confidence interval is narrower for angiography. Its width is 1.02 days versus 1.42 days for bypass surgery, a difference of 0.4 days, because angiography has both a smaller standard deviation and a larger sample size.
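The two calculations above repeat the same steps, so they can be wrapped in one small function. The sketch below is only an illustration: the name `z_ci` is hypothetical (not from the exercise), and it assumes the z-based formula used above with the summary statistics already quoted in the R code.

```r
# Hypothetical helper: z-based confidence interval from summary statistics.
z_ci <- function(center, stddev, n, level = 0.90) {
  z <- qnorm(1 - (1 - level) / 2)   # critical value, about 1.645 for 90%
  error <- z * stddev / sqrt(n)     # margin of error
  c(lower = center - error, upper = center + error, width = 2 * error)
}

bypass      <- z_ci(center = 19, stddev = 10, n = 539)
angiography <- z_ci(center = 18, stddev = 9,  n = 847)

# One row per procedure: lower bound, upper bound, interval width
round(rbind(bypass, angiography), 2)
```

Comparing the `width` column directly shows which interval is narrower without subtracting the bounds by hand.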
A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.
Dividing 567 by 1031 gives the point estimate p = 0.54995, meaning that about 55% of the adult Americans surveyed believed that a college education is essential for success, while the remaining 45% did not. To construct a 95% confidence interval for p, I used prop.test in R as follows.
prop.test(x=567, n=1031, p=0.54995, correct=FALSE)
1-sample proportions test without continuity correction
data: 567 out of 1031, null probability 0.54995
X-squared = 9.415e-09, df = 1, p-value = 0.9999
alternative hypothesis: true p is not equal to 0.54995
95 percent confidence interval: [0.52, 0.58]
sample estimates: p = 0.5499515
Lower <- 0.52
Upper <- 0.58
The 95% confidence interval is [.52, .58]. To interpret: we are 95% confident that between 52% and 58% of all adult Americans believe that a college education is essential for success.
• The difference between this question and question 1 is that we weren't given a standard deviation. For a proportion, the standard error comes from the point estimate itself, sqrt(p(1 - p)/n), so we use the confidence interval formula for a proportion.
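As a rough cross-check, the interval can also be computed by hand from the proportion formula. The sketch below assumes the simple (Wald) form p ± z × sqrt(p(1 - p)/n); `prop.test` reports a score-based interval instead, so the two can differ slightly in later decimals. The variable names are illustrative.

```r
x <- 567     # respondents who said college is essential
n <- 1031    # sample size

p_hat <- x / n                        # point estimate
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of a proportion
z <- qnorm(0.975)                     # critical value for 95% confidence

# Hand-computed (Wald) interval
wald <- c(lower = p_hat - z * se, upper = p_hat + z * se)

# Interval reported by prop.test (no continuity correction)
score <- as.numeric(prop.test(x = x, n = n, correct = FALSE)$conf.int)

round(wald, 3)
round(score, 3)
```

Leaving out the `p` argument of `prop.test` only changes the null value used for the test; the confidence interval does not depend on it.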
Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per quarter for students. The estimate will be useful if it is within 5 dollars of the true population mean (i.e. they want the confidence interval to have a length of 10 dollars or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between 30 dollars and $200. They think that the population standard deviation is about a quarter of this range. Assuming the significance level to be 5%, what should be the size of the sample?
To find the size of the sample you need the following.
Range = 200 - 30 = 170
z = z-score for a 5% significance level (95% confidence) = 1.96
E = margin of error = 5 (half of the desired interval length of 10)
sd = standard deviation = 170/4 = 42.5
Next we use the following standard formula to solve for the sample size.
Sample Size = {(z*sd)/E}^2 = {(1.96*42.5)/5}^2 = 277.5556, which rounds up to 278 students.
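The same arithmetic can be checked in R. This is a minimal sketch assuming the values stated in the question (range of $30 to $200, standard deviation of one quarter of the range, margin of error of $5); the variable names are illustrative.

```r
low  <- 30
high <- 200

sigma <- (high - low) / 4   # assumed population sd: a quarter of the range
E <- 5                      # desired margin of error in dollars
z <- qnorm(1 - 0.05 / 2)    # critical value for a 5% significance level

n_raw <- (z * sigma / E)^2      # unrounded sample size
n_required <- ceiling(n_raw)    # always round up to a whole student

n_required
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than $5.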
(Exercise 6.7, Chapter 6 of SMSS, Agresti 2018) According to a union agreement, the mean income for all senior-level workers in a large service company equals 500 dollars per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = 90.
##PART A Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.
##Answer Assumptions
Type of data: Quantitative as the question represents amounts rather than groupings
Randomization: The question states that the nine female employees are a random sample.
Population distribution: Assumed to be approximately normal.
Sample size “n”: 9 Population Mean: 500 Standard Deviation “s”: 90 Sample mean “y”:410
Hypothesis The hypothesis is as follows.
Ho: μ = $500
H1: μ ≠ $500
Ho is the null hypothesis: the population mean income for female employees is $500. H1 is the alternative hypothesis: the population mean is not equal to $500.
Test Statistic & P – Value
• To find the answer I turn to the one sample t-Test.
Population Mean = 500 Sample Mean = 410 Sample Size = 9 Sample Standard Deviation = 90
• Assuming the data is normally distributed and the significance of the test is .05
t = (ȳ - μ) / (s / sqrt(n))
(410 - 500) / (90 / sqrt(9)) = -90 / 30 = -3
The probability of a t value of -3 or lower with 8 degrees of freedom is less than 1 percent (about .85%). Since the alternative hypothesis is two-sided (μ ≠ 500), we need a two-tailed probability, which doubles this value.
q = t-score = -3
df = degrees of freedom (sample size “n”-1) = 9-1 = 8
Left-tailed test in R:
p_value=pt(q=-3, df=8, lower.tail = TRUE)
0.00853584
Because we are running a two-tailed test, the P-value is double the left-tail probability:
2 * 0.0085 = 0.0171
With a P-value of 0.0171, below the significance level of .05, we reject the null hypothesis that the mean income of female employees is $500 per week. There is enough evidence to say that their mean income differs from the $500 norm, and the sample mean of $410 suggests that it is lower.
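The whole two-sided test can be reproduced from the summary statistics in a few lines. The sketch below assumes the one-sample t-test set up above; the variable names are illustrative.

```r
y_bar <- 410   # sample mean
mu0   <- 500   # hypothesized population mean
s     <- 90    # sample standard deviation
n     <- 9     # sample size

se     <- s / sqrt(n)          # standard error of the mean
t_stat <- (y_bar - mu0) / se   # test statistic
df     <- n - 1                # degrees of freedom

# Two-sided P-value: twice the tail area beyond |t|
p_two_sided <- 2 * pt(-abs(t_stat), df = df)

c(t = t_stat, df = df, p = p_two_sided)
```

Using `-abs(t_stat)` makes the doubling work whether the statistic is negative or positive.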
Report the P-value for Ha : μ < 500. Interpret.
Remembering that the P-value is calculated assuming the null hypothesis is true, we now look at the left-tailed test. The P-value does not prove whether a null hypothesis is true or false; it is a kind of “credibility rating” of the null hypothesis in light of the evidence, with small values indicating that the data would be unlikely if the null were true.
• To find the p-value we must first get the following values
q = t-score = -3
df = degrees of freedom (sample size “n” - 1) = 9 - 1 = 8
Left-tailed test in R:
p_value=pt(q=-3, df=8, lower.tail = TRUE)
p_value P-Value: 0.008535841
In this case the P-value (0.0085) is below .05, so the result is significant. We can reject the null hypothesis in favor of Ha: μ < 500; there is strong evidence that the mean income of female employees is below $500 per week.
Report and interpret the P-value for Ha: μ > 500. (Hint: The P-values for the two possible one-sided tests must sum to 1.)
##Answer Right-tailed test in R:
p_value=pt(q=-3,df=8, lower.tail=FALSE)
p_value P-Value: 0.9914642
0.008535841 + 0.9914642 = 1
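The P-values for the two one-sided tests sum to 1, as the hint requires, which confirms the calculation. With a P-value of 0.9915, there is no evidence that the mean income of female employees is greater than $500 per week.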
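The relationship between the three P-values can be laid out side by side. This is a small sketch using the t statistic and degrees of freedom found above; the variable names are illustrative.

```r
t_stat <- -3
df <- 8

p_less    <- pt(t_stat, df = df, lower.tail = TRUE)    # Ha: mu < 500
p_greater <- pt(t_stat, df = df, lower.tail = FALSE)   # Ha: mu > 500
p_two     <- 2 * min(p_less, p_greater)                # Ha: mu != 500

# The two one-sided P-values are complementary areas under the same t curve
c(less = p_less, greater = p_greater, two.sided = p_two,
  sum.one.sided = p_less + p_greater)
```

Because the two one-sided areas split the whole t distribution at the same point, their sum is 1 for any value of the statistic.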
(Exercise 6.23, Chapter 6 of SMSS, Agresti 2018) Jones and Smith separately conduct studies to test H0: μ = 500 against Ha: μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0.
To show the t test, I first used the test statistic formula: t = (ȳ - μ) / se
Jones <- (519.5 - 500) / 10
t statistic for Jones = 1.95
Smith <- (519.7 - 500) / 10
t statistic for Smith = 1.97
Now that we have the t statistics for both Jones and Smith, we can solve for the P-values. As in question 4, we need the degrees of freedom, n - 1 = 999, and because the alternative is two-sided we double the upper-tail probability.
p_value = 2 * pt(q=1.95, df=999, lower.tail = FALSE)
Jones P Value = 0.051
Performing the same operation for Smith we find the following:
p_value = 2 * pt(q=1.97, df=999, lower.tail = FALSE)
Smith's P Value = 0.049
Using α = 0.05, for each study indicate whether the result is “statistically significant.”
With a significance level of α = 0.05, we cannot reject the null hypothesis for Jones, because his P-value of .051 is larger than the significance level; his result is not statistically significant. Smith is the opposite: his P-value of .049 is less than the significance level, so he can reject the null hypothesis and his result is statistically significant.
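Both studies can be run through the same calculation at once, which makes the comparison easier to see. The sketch below assumes the figures given in the exercise; the data frame and its column names are illustrative.

```r
studies <- data.frame(
  study = c("Jones", "Smith"),
  y_bar = c(519.5, 519.7),
  se    = c(10, 10)
)

mu0   <- 500
df    <- 1000 - 1
alpha <- 0.05

studies$t <- (studies$y_bar - mu0) / studies$se                     # test statistics
studies$p_value <- 2 * pt(abs(studies$t), df, lower.tail = FALSE)   # two-sided P-values
studies$significant <- studies$p_value < alpha                      # decision at alpha

studies
```

The `significant` column flips between the two rows even though the `t` and `p_value` columns are nearly identical.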
Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0 ,” without reporting the actual P-value.
The misleading aspect of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0,” without reporting the actual P-value is that the decision rests on an arbitrary cutoff. Jones and Smith obtained nearly identical results (t = 1.95 versus 1.97, P = 0.051 versus 0.049), yet the binary labels place them on opposite sides of the line and make the two studies look as if they disagree. A reject or do-not-reject verdict is only a quick snapshot and does not tell the whole story, whereas the actual P-value shows how strong the evidence is and lets the reader see that both studies provide essentially the same evidence against H0.
Are the taxes on gasoline very high in the United States? According to the American Petroleum Institute, the per gallon federal tax that was levied on gasoline was 18.4 cents per gallon. However, state and local taxes vary over the same period. The sample data of gasoline taxes for 18 large cities is given below in the variable called gas_taxes.
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents? Explain.
A 95% confidence interval corresponds to a 5% significance level and a Z value of 1.960.
• We use the Z value together with the following values to solve for the confidence interval
Confidence interval formula: X ± Z × s/sqrt(n)
• X is the mean = 40.86278
• Z is the chosen Z-value = 1.960
• s is the standard deviation = 9.308317
• n is the number of observations = 18
R CODE used
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
mean(gas_taxes) [1] 40.86278
• The last thing that we need in order to find the confidence interval is the standard deviation.
sd(gas_taxes) [1] 9.308317
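The formula gives a margin of error of 1.960 × 9.308317/sqrt(18) = 4.30, so the 95% confidence interval is [36.6, 45.2]. With 95% confidence the population mean is between 36.6 and 45.2 cents, based on a sample of only 18 cities. Because this two-sided interval contains 45, it does not by itself show that the mean is below 45 cents. The question is one-sided, though, and for Ha: μ < 45 the test statistic is t = (40.86 - 45)/2.194 = -1.89 with 17 degrees of freedom, which is beyond the one-sided 5% critical value of -1.74. So yes, there is enough evidence at the 5% level to conclude that the average tax was less than 45 cents, although not by a wide margin.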
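Since the question asks whether the mean is less than 45 cents, a one-sided t-test is one reasonable way to check the conclusion directly. The sketch below uses base R's `t.test`, which relies on the t distribution with 17 degrees of freedom rather than the Z value, so it is offered as a complement to the interval above, not a replacement.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

# H0: mu = 45 versus Ha: mu < 45
res <- t.test(gas_taxes, mu = 45, alternative = "less", conf.level = 0.95)

res$statistic   # t statistic
res$p.value     # one-sided P-value
res$conf.int    # one-sided interval: (-Inf, upper bound]
```

If the upper bound of the one-sided interval falls below 45, that agrees with a P-value below 0.05.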