Question 1

The time between the date a patient was recommended for heart surgery and the surgery date for cardiac patients in Ontario was collected by the Cardiac Care Network (“Wait Times Data Guide,” Ministry of Health and Long-Term Care, Ontario, Canada, 2006). The sample mean and sample standard deviation for wait times (in days) of patients for two cardiac procedures are given in the accompanying table. Assume that the sample is representative of the Ontario population.

Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?

Answer

To construct the 90% confidence interval, we first need one missing piece of information: the critical z value. A 90% confidence interval leaves 5% in each tail, so the critical value is the 0.95 quantile of the standard normal distribution, z = 1.645. Because both samples are large (n = 539 and n = 847), using z with the sample standard deviation is a reasonable approximation. The interval is mean ± z × s/√n, computed in R with the code provided.

                                              Bypass: [18.29,19.71] 

Width (upper bound - lower bound) = 1.42

R-Code:

qnorm(.95)

I used .95 because a 90% confidence interval leaves .10/2 = .05 in each tail, so the upper critical value sits at the .05 + .90 = .95 quantile, which gives z = 1.645.

1.644854

center <- 19

stddev <-10

n <- 539

error <- qnorm(0.95)*stddev/sqrt(n)

error 0.7084886

lower_bound <- center - error

lower_bound 18.29151

upper_bound <- center + error

upper_bound 19.70849

                                           Angiography: [17.49,18.51]

Width (upper bound - lower bound) = 1.02

R-Code:

center <-18

stddev <-9

n <- 847

error <- qnorm(0.95)*stddev/sqrt(n)

error 0.5086606

lower_bound <- center - error

lower_bound 17.49134

upper_bound <- center + error

upper_bound 18.50866

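• As we can see, the confidence interval is narrower for angiography: its width is about 1.02 days versus 1.42 days for bypass surgery, a difference of about 0.4 days. The angiography sample is larger (847 vs. 539) and has a smaller standard deviation (9 vs. 10), and both reduce the margin of error.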
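The two intervals above can be reproduced with one small helper. This is only a sketch, not part of the original solution: the function name `z_interval` is hypothetical, and the inputs are the summary values already used in the code above.

```r
# Hypothetical helper: z-based confidence interval from summary statistics.
z_interval <- function(center, stddev, n, level = 0.90) {
  z <- qnorm(1 - (1 - level) / 2)   # 0.95 quantile for a 90% interval
  error <- z * stddev / sqrt(n)     # margin of error
  c(lower = center - error, upper = center + error, width = 2 * error)
}

bypass      <- z_interval(center = 19, stddev = 10, n = 539)
angiography <- z_interval(center = 18, stddev = 9,  n = 847)

round(bypass, 2)
round(angiography, 2)

# TRUE when the angiography interval is the narrower of the two
unname(angiography["width"] < bypass["width"])
```

Returning the width alongside the bounds makes the comparison between procedures a single expression.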

Question 2

A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.

Answer

Using division (567/1031), I found that the point estimate p is 0.54995: about 55% of the adult Americans surveyed believe that a college education is essential for success, while about 45% do not. Constructing a 95% confidence interval for p uses the following elements.

  1. Sample proportion; 0.54995 = p
  2. Number who believe college is essential; 567 = x
  3. The sample size (total adults surveyed); 1031 = n

prop.test(x=567, n=1031, p=0.54995, correct=FALSE)

1-sample proportions test without continuity correction

data: 567 out of 1031, null probability 0.54995
X-squared = 9.415e-09, df = 1, p-value = 0.9999
alternative hypothesis: true p is not equal to 0.54995
95 percent confidence interval: [0.52, 0.58]
sample estimates: p = 0.5499515

                              Lower <- 0.52 Upper <-  0.58

The 95% confidence interval is [.52, .58]. To interpret, we are 95% confident that between 52% and 58% of all adult Americans believe that a college education is essential for success. The null probability passed to prop.test was set to the sample estimate itself, so the reported test p-value (0.9999) is not meaningful here; only the confidence interval is used.

• The difference between this question and question 1 is that the data are counts of yes/no responses rather than measurements with a reported standard deviation. The standard error therefore comes from the sample proportion itself, √(p(1−p)/n), so we use the confidence interval for a proportion.
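As a cross-check, the proportion formula can be worked by hand and compared with `prop.test()`. The sketch below assumes the textbook normal-approximation (Wald) formula is the one meant by "Confidence Interval for a Proportion Formula"; `prop.test()` itself inverts the score test (the Wilson interval), so the two agree only approximately.

```r
x <- 567    # respondents who said college is essential
n <- 1031   # total respondents
p_hat <- x / n

# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
z  <- qnorm(0.975)
se <- sqrt(p_hat * (1 - p_hat) / n)
wald <- c(lower = p_hat - z * se, upper = p_hat + z * se)

# Interval reported by prop.test() (Wilson score method, no continuity correction)
wilson <- as.numeric(prop.test(x = x, n = n, correct = FALSE)$conf.int)

round(wald, 2)
round(wilson, 2)
```

With a sample this large the two methods give the same interval to two decimal places.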

Question 3

Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per quarter for students. The estimate will be useful if it is within 5 dollars of the true population mean (i.e. they want the confidence interval to have a length of 10 dollars or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between 30 dollars and $200. They think that the population standard deviation is about a quarter of this range. Assuming the significance level to be 5%, what should be the size of the sample?

Answer

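To find the size of the sample you need the following.

Range of values = 200 - 30 = 170

z = z-score for a 5% significance level = 1.96

e = margin of error = 5 (half of the desired interval length of 10)

sd = standard deviation = 170/4 = 42.5

Next we use the following standard formula to solve for the sample size.

                             Sample Size = {(z*sd)/e}^2 = {(1.96*42.5)/5}^2 = 277.5556 

Since the sample size must be a whole number and rounding down would exceed the margin of error, the financial aid office should sample at least 278 students.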
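The sample-size formula can be checked in R. This is a minimal sketch; the variable names are illustrative, and it uses the exact quantile from `qnorm()` rather than the rounded 1.96, so the unrounded value differs slightly in the second decimal place.

```r
z        <- qnorm(0.975)       # about 1.96 for a 5% significance level
sd_guess <- (200 - 30) / 4     # a quarter of the stated range: 42.5
margin   <- 5                  # desired margin of error in dollars

n_raw      <- (z * sd_guess / margin)^2
n_required <- ceiling(n_raw)   # round up so the margin of error is not exceeded

n_raw
n_required
```

Rounding up rather than to the nearest integer guarantees the interval is no wider than requested.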

Question 4

(Exercise 6.7, Chapter 6 of SMSS, Agresti 2018) According to a union agreement, the mean income for all senior-level workers in a large service company equals 500 dollars per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = 90.

Question 4 A

Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.

Answer

Assumptions

Type of data: Quantitative as the question represents amounts rather than groupings

Randomization: The data are a random sample of 9 female employees, as stated in the question.

Population distribution: Normal distribution.

Sample size “n”: 9
Hypothesized population mean: 500
Sample standard deviation “s”: 90
Sample mean “ȳ”: 410

Hypotheses

The hypotheses are as follows.

H0: μ = $500
Ha: μ ≠ $500

H0 is the null hypothesis: the mean income of female employees equals $500. Ha is the alternative hypothesis: the mean income of female employees is not equal to $500.

                                    Test Statistic & P – Value

• To find the answer I turn to the one sample t-Test.

Population Mean = 500 Sample Mean = 410 Sample Size = 9 Sample Standard Deviation = 90

• Assuming the data are normally distributed and the significance level of the test is .05

t = (ȳ - μ) / (s / sqrt(n))
  = (410 - 500) / (90 / sqrt(9)) = -90 / 30 = -3

The probability of a t value of -3 or lower is less than 1 percent (.85%), and since the alternative hypothesis is two-sided (not equal), we’ll need a two-tailed probability.

q = t-score = -3

df = degrees of freedom (sample size “n”-1) = 9-1 = 8

Left-tailed test in R:

                        p_value=pt(q=-3, df=8, lower.tail = TRUE)

0.00853584

Because we’re running a two-tailed test, the P-value is twice the one-tailed probability:

                                     2 * 0.0085 = 0.0171

With a P-value of 0.0171 (lower than the significance level of .05), we reject the null hypothesis. There is enough evidence to conclude that the mean income of female employees differs from $500 per week; the sample mean of $410 suggests it is lower.
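The whole two-sided test can be run from the summary statistics in a few lines. This is a sketch with illustrative variable names; since only the summary values are given, it works from the formula rather than calling `t.test()` on raw data.

```r
y_bar <- 410   # sample mean
mu0   <- 500   # mean under the null hypothesis
s     <- 90    # sample standard deviation
n     <- 9     # sample size

se     <- s / sqrt(n)           # standard error: 90 / 3 = 30
t_stat <- (y_bar - mu0) / se    # test statistic
df     <- n - 1

# Two-sided P-value: twice the tail area beyond |t|
p_two_sided <- 2 * pt(-abs(t_stat), df = df)

t_stat
p_two_sided
```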


Question 4 B

Report the P-value for Ha : μ < 500. Interpret.

Answer

The P-value is calculated assuming the null hypothesis is true: it is the probability of observing a test statistic at least as extreme as ours in the direction of the alternative. For Ha: μ < 500 that direction is the left tail. The P-value does not prove or disprove the null hypothesis; it is a kind of “credibility rating” of the null hypothesis in light of the evidence.

• To find the p-value we must first get the following values

q = t-score = -3

df = degrees of freedom (sample size “n”-1) = 9-1 = 8

Left-tailed test in R:

                              p_value=pt(q=-3, df=8, lower.tail = TRUE)

p_value P-Value: 0.008535841

In this case, the P-value of 0.0085 is well below .05, so the result is significant. We reject the null hypothesis: there is strong evidence that the mean income of female employees is less than $500 per week.


Question 4 C

Report and interpret the P-value for Ha: μ > 500. (Hint: The P-values for the two possible one-sided tests must sum to 1.)

Answer

Right-tailed test in R:

                            p_value=pt(q=-3,df=8, lower.tail=FALSE)

p_value P-Value: 0.9914642

                                  0.008535841 + 0.9914642 = 1
    

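The two one-sided P-values sum to 1, as the hint requires, which confirms the calculation. With a P-value of 0.99 we fail to reject the null hypothesis: there is no evidence that the mean income of female employees is greater than $500 per week.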
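The relationship between the three P-values in Question 4 can be verified directly. A short sketch, using illustrative variable names and the t statistic and degrees of freedom found earlier:

```r
t_stat <- -3
df     <- 8

p_less    <- pt(t_stat, df = df, lower.tail = TRUE)    # Ha: mu < 500
p_greater <- pt(t_stat, df = df, lower.tail = FALSE)   # Ha: mu > 500

# The two one-sided P-values are complementary tail areas
p_less + p_greater

# The two-sided P-value is twice the smaller one-sided P-value
2 * min(p_less, p_greater)
```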

Question 5A

(Exercise 6.23, Chapter 6 of SMSS, Agresti 2018) Jones and Smith separately conduct studies to test H0: μ = 500 against Ha : μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0

Answer

To compute the test statistic I used the formula: t = (ȳ - μ0) / se

Jones <- (519.5 - 500) / 10

                                   t statistic for Jones = 1.95

Smith <- (519.7 - 500) / 10

                                   t statistic for Smith = 1.97 

Now that we have the t statistics for both Jones and Smith, we can solve for the P-values. As in question 4, we need the t statistic and the degrees of freedom. With df = n - 1, we find that df = 999.

p_value=pt(q=1.95, df=999, lower.tail = FALSE), multiplied by 2 for the two-sided test = 0.051

                                    Jones's P-value = 0.051

Performing the same operation for Smith we find the following: p_value=pt(q=1.97, df=999, lower.tail = FALSE), multiplied by 2 = 0.049

                                  Smith's P-value = 0.049
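Both studies can be run through the same steps with a small helper. This is a sketch only; the function name `two_sided_p` is hypothetical and simply wraps the formula and `pt()` call used above.

```r
# Hypothetical helper: t statistic and two-sided P-value from a mean and its standard error.
two_sided_p <- function(y_bar, mu0, se, n) {
  t_stat  <- (y_bar - mu0) / se
  p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
  c(t = t_stat, p_value = p_value)
}

jones <- two_sided_p(y_bar = 519.5, mu0 = 500, se = 10, n = 1000)
smith <- two_sided_p(y_bar = 519.7, mu0 = 500, se = 10, n = 1000)

round(jones, 3)
round(smith, 3)
```

Printing the two results side by side makes it clear how little separates the studies.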

Question 5B

Using α = 0.05, for each study indicate whether the result is “statistically significant.”

Answer

With a significance level α = 0.05, we cannot reject the null hypothesis for Jones, as his P-value is .051. Because it is larger than the significance level, his result is not statistically significant. Smith is the opposite: his P-value of .049 is less than the significance level, so he rejects the null hypothesis and his result is statistically significant.

Question 5C

Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0 ,” without reporting the actual P-value.

Answer

Reporting a result only as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0,” is misleading because it hides how close the result is to an arbitrary cutoff. In this example Jones and Smith obtained nearly identical sample means (519.5 vs. 519.7) with the same standard error, and their P-values (0.051 and 0.049) are practically the same, yet the binary report labels one study “not significant” and the other “significant.” A reader would wrongly conclude that the two studies disagree. Reporting the actual P-value shows that both studies provide about the same moderate evidence against H0 and lets readers judge the strength of that evidence for themselves.

Question 6

Are the taxes on gasoline very high in the United States? According to the American Petroleum Institute, the per gallon federal tax that was levied on gasoline was 18.4 cents per gallon. However, state and local taxes vary over the same period. The sample data of gasoline taxes for 18 large cities is given below in the variable called gas_taxes.

gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents? Explain.

Answer

A 95% confidence interval corresponds to a 5% significance level and a two-sided z value of 1.960.

• We use the z value with the following values to solve for the confidence interval

Confidence interval formula: X (+ or –) Z × s/sqrt(n)

• X is the sample mean = 40.86278
• Z is the chosen z value = 1.960
• s is the sample standard deviation = 9.308317
• n is the number of observations = 18

R CODE used

gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

mean(gas_taxes) [1] 40.86278

• The last thing that we need in order to find the confidence interval is the standard deviation.

sd(gas_taxes) [1] 9.308317

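The margin of error from the formula is 1.960 × 9.308317/sqrt(18) ≈ 4.30, giving a two-sided interval of [36.6, 45.2]. This interval contains 45, so by itself it does not establish that the mean is below 45. Because the question is one-sided and n = 18 is small, a one-sided t procedure with 17 degrees of freedom is more appropriate: its 95% upper confidence bound is about 44.68 cents, which is below 45. So yes, there is enough evidence at the 95% confidence level to conclude that the average tax per gallon was less than 45 cents, though the conclusion rests on only 18 cities.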
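A one-sided, one-sample t-test answers the "less than 45 cents" question directly. The sketch below swaps the z interval used above for R's built-in `t.test()`, which is an assumption about the intended method rather than part of the original answer; it also assumes the 18 cities behave like a random sample from a roughly normal population.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

# H0: mu = 45 against Ha: mu < 45, at the 95% confidence level
result <- t.test(gas_taxes, mu = 45, alternative = "less", conf.level = 0.95)

result$statistic   # t statistic with 17 degrees of freedom
result$p.value     # one-sided P-value
result$conf.int    # (-Inf, upper bound]; compare the upper bound with 45
```

If the upper confidence bound falls below 45 and the P-value is below 0.05, the data support the claim that the mean tax is less than 45 cents.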