DACSS 603: Homework 1

Homework # 1 questions and answers for DACSS 603: Introduction to Quantitative Analysis

Megan Georges
2/22/2022

Question 1:

The time between the date a patient was recommended for heart surgery and the surgery date for cardiac patients in Ontario was collected by the Cardiac Care Network (“Wait Times Data Guide,” Ministry of Health and Long-Term Care, Ontario, Canada, 2006). The sample mean and sample standard deviation for wait times (in days) of patients for two cardiac procedures are given in the accompanying table. Assume that the sample is representative of the Ontario population. Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?

library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box(), and the %>% pipe

# Summary statistics from the table in the question
procedure <- c('Bypass', 'Angiography')
samplesize <- c(539, 847)
meanwait <- c(19, 18)
standev <- c(10, 9)

surgdata <- data.frame(procedure, samplesize, meanwait, standev)

kable(surgdata, col.names = c("Surgical Procedure", "Sample Size", "Mean Wait Time", "Standard Deviation"), 
      align = c('c', 'c', 'c', 'c')) %>%
  kable_styling(fixed_thead = TRUE) %>%
  scroll_box(width = "100%", height = "100%")
Surgical Procedure    Sample Size    Mean Wait Time (days)    Standard Deviation (days)
Bypass                539            19                       10
Angiography           847            18                       9

Answer:

We have the sample size, mean, and standard deviation for each procedure and can assume the sample is representative of the Ontario population. Because the population standard deviation is unknown and must be estimated by the sample standard deviation, we use the t-distribution to produce an interval estimate for the true mean wait time of each procedure. According to the text (SMSS, section 5.6), “confidence intervals using the t-distribution apply with any n but assume a normal population distribution.”

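Using the formula ȳ ± t·se to calculate the confidence intervals:

# Bypass procedure
ybarB <- 19
tB <- qt(.05, df=539-1)   # lower-tail quantile, so tB is negative
seB <- 10/sqrt(539)       # standard error of the mean

ybarB + tB*seB            # lower bound (tB < 0)
[1] 18.29029
ybarB - tB*seB            # upper bound
[1] 19.70971

# Angiography procedure
ybarA <- 18
tA <- qt(.05, df=847-1)   # lower-tail quantile, so tA is negative
seA <- 9/sqrt(847)        # standard error of the mean

ybarA + tA*seA            # lower bound (tA < 0)
[1] 17.49078
ybarA - tA*seA            # upper bound
[1] 18.50922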

Reporting the confidence intervals:

Bypass: 18.29 to 19.71 days
Angiography: 17.49 to 18.51 days

Which confidence interval is narrower?

# Bypass confidence interval width (upper bound minus lower bound)
(ybarB - tB*seB)-(ybarB + tB*seB)
[1] 1.419421
# Angiography confidence interval width (upper bound minus lower bound)
(ybarA - tA*seA)-(ybarA + tA*seA)
[1] 1.018436

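The confidence interval for the angiography procedure (about 1.02 days wide) is narrower than the confidence interval for the bypass procedure (about 1.42 days wide). The angiography sample is larger and has a smaller standard deviation, so its standard error is smaller.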
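
The same interval arithmetic can be wrapped in a small helper so that both procedures go through identical steps. This is only a sketch and not part of the original solution: the function name t_ci and its arguments are illustrative, and it uses the upper-tail quantile so that the margin of error is positive.

```r
# Hypothetical helper: t-based confidence interval from summary statistics
t_ci <- function(ybar, s, n, conf = 0.90) {
  se <- s / sqrt(n)                            # standard error of the mean
  tcrit <- qt(1 - (1 - conf) / 2, df = n - 1)  # positive critical value
  c(lower = ybar - tcrit * se,
    upper = ybar + tcrit * se,
    width = 2 * tcrit * se)
}

ci_bypass <- t_ci(ybar = 19, s = 10, n = 539)
ci_angio  <- t_ci(ybar = 18, s = 9,  n = 847)

# One row per procedure: lower bound, upper bound, and width in days
round(rbind(Bypass = ci_bypass, Angiography = ci_angio), 2)
```

Putting the width in the same return value makes the "which interval is narrower" comparison a direct read of one column.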


Question 2:

A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.

Answer:

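Because the data are a count of successes out of a fixed number of respondents, we use prop.test() to obtain the point estimate p and the 95% confidence interval. In the output below, the point estimate is p ≈ 0.55 and the interval runs from about 0.519 to 0.581: we can be 95% confident that between roughly 52% and 58% of all adult Americans believe that a college education is essential for success.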

prop.test(567, 1031, conf.level = .95)

    1-sample proportions test with continuity correction

data:  567 out of 1031, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5189682 0.5805580
sample estimates:
        p 
0.5499515 
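
As a rough cross-check, the large-sample (Wald) interval, p ± z·sqrt(p(1 − p)/n), can be computed by hand. This sketch is an addition and its variable names are illustrative. prop.test() reports a Wilson score interval with a continuity correction, so its limits should differ slightly from the ones computed here.

```r
# Counts taken from the question
x <- 567
n <- 1031

p_hat <- x / n                         # point estimate of the proportion
se_p <- sqrt(p_hat * (1 - p_hat) / n)  # large-sample standard error
z <- qnorm(0.975)                      # critical value for 95% confidence

# Wald interval: estimate plus or minus z standard errors
wald_ci <- c(lower = p_hat - z * se_p, upper = p_hat + z * se_p)

p_hat
wald_ci
```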

Question 3:

Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per quarter for students. The estimate will be useful if it is within $5 of the true population mean (i.e. they want the confidence interval to have a length of $10 or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between $30 and $200. They think that the population standard deviation is about a quarter of this range. Assuming the significance level to be 5%, what should be the size of the sample?

Answer:

# Computing sample size
stdevBooks <- (200-30)/4
margerrorBooks <- (10/2)
zBooks <- 1.96

stdevBooks^2 * (zBooks/margerrorBooks)^2
[1] 277.5556

To estimate the mean cost of books with a 95% confidence interval no wider than $10 (a margin of error of $5 or less), the sample size should be at least 278, rounding 277.56 up to the next whole student.
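
The calculation can also be written as a reusable function that rounds up automatically. This is a minimal sketch under the same assumptions as above (a guessed population standard deviation and a normal critical value); the name sample_size_mean is illustrative, and it uses qnorm(0.975) rather than the rounded 1.96.

```r
# Hypothetical helper: sample size needed to estimate a mean
# within a given margin of error
sample_size_mean <- function(sigma, margin, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # two-sided normal critical value
  ceiling((z * sigma / margin)^2)  # round up to a whole observation
}

# Standard deviation guessed as a quarter of the $30 to $200 range
n_books <- sample_size_mean(sigma = (200 - 30) / 4, margin = 5)
n_books
```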


Question 4:

(Exercise 6.7, Chapter 6 of SMSS, Agresti 2018) According to a union agreement, the mean income for all senior-level workers in a large service company equals $500 per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = 90.

  1. Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.
  2. Report the P-value for Ha: μ < 500. Interpret.
  3. Report and interpret the P-value for Ha: μ > 500.

(Hint: The P-values for the two possible one-sided tests must sum to 1.)

Answer:

a.

# Calculate t-statistic:
(410-500)/(90/sqrt(9))
[1] -3
# Calculate p-value; multiply by 2 to account for both tails
pt(-3, 8)*2 
[1] 0.01707168

We assume the nine employees are a random sample and that incomes are approximately normally distributed. The hypotheses are H0: μ = 500 and Ha: μ ≠ 500. The test statistic is t = -3 with df = 8, and the p-value is P = 0.017. With an α-level of .05, the p-value is less than .05, so we reject the null hypothesis. There is strong evidence that the mean income of female employees is not equal to $500 per week.

b.

# Calculate p-value for Ha: μ < 500
pt(-3, 8, lower.tail = TRUE)
[1] 0.008535841

The p-value for Ha: μ < 500 is P = 0.008535841. With an α-level of .05, the p-value is substantially less than .05, so we reject the null hypothesis. There is strong evidence that the mean income of female employees is less than $500 per week.

c.

# Calculate p-value for Ha: μ > 500
pt(-3, 8, lower.tail = FALSE)
[1] 0.9914642

The p-value for Ha: μ > 500 is P = 0.9914642. With an α-level of .05, we fail to reject the null hypothesis: the data provide no evidence that the mean income of female employees is greater than $500 per week. As the hint indicates, this p-value and the one from part b sum to 1 (0.0085 + 0.9915).
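
The three p-values in parts a through c all come from the same t statistic, which the following sketch makes explicit. It is an addition to the original answer, and the variable names are illustrative.

```r
# Summary statistics from the question
ybar <- 410
mu0 <- 500
s <- 90
n <- 9

t_stat <- (ybar - mu0) / (s / sqrt(n))  # test statistic
deg_free <- n - 1                       # degrees of freedom

p_less <- pt(t_stat, deg_free)                         # Ha: mu < 500
p_greater <- pt(t_stat, deg_free, lower.tail = FALSE)  # Ha: mu > 500
p_two <- 2 * min(p_less, p_greater)                    # Ha: mu != 500

c(t = t_stat, two_sided = p_two, less = p_less, greater = p_greater)

# The two one-sided p-values are complementary tail areas
p_less + p_greater
```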


Question 5:

(Exercise 6.23, Chapter 6 of SMSS, Agresti 2018) Jones and Smith separately conduct studies to test H0: μ = 500 against Ha: μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0.

  1. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.
  2. Using α = 0.05, for each study indicate whether the result is ‘statistically significant.’
  3. Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0,” without reporting the actual P-value.

Answer:

a.

# Jones t=1.95, P=.051
JonesT <- (519.5-500)/10
JonesT
[1] 1.95
JonesP <- pt(1.95, 999, lower.tail = FALSE)*2
JonesP
[1] 0.05145555
# Smith t=1.97, P=.049
SmithT <- (519.7-500)/10
SmithT
[1] 1.97
SmithP <- pt(1.97, 999, lower.tail = FALSE)*2
SmithP
[1] 0.04911426

b.

With an α-level of .05, Jones’ result (P = .051) is not statistically significant, so Jones fails to reject the null hypothesis, while Smith’s result (P = .049) is statistically significant, so Smith rejects it. The two P-values are nevertheless almost identical. Both provide moderate evidence against the null hypothesis that the mean equals 500, and the opposite verdicts arise only because the values fall on either side of the cutoff.

c.

If we do not report the P-value and simply state whether it is at or below the significance level or above it, the reader cannot judge the strength of the evidence. For example, a P-value of .009 provides much stronger evidence against the null than a P-value of .045; however, both allow rejection of the null at the .05 significance level. In the Jones/Smith example, reporting the results only as “P ≤ 0.05” versus “P > 0.05” leads to opposite conclusions (rejecting versus failing to reject the null) from nearly identical results.
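
One way to see how similar the two studies are is to put the P-values next to 95% confidence intervals for μ. The sketch below is an addition; the data frame and its column names are illustrative, and it relies only on the means, standard errors, and n given in the question.

```r
# Reported results for the two studies
studies <- data.frame(study = c("Jones", "Smith"),
                      ybar  = c(519.5, 519.7),
                      se    = c(10, 10))
mu0 <- 500
n <- 1000
tcrit <- qt(0.975, df = n - 1)  # critical value for a 95% interval

studies$t <- (studies$ybar - mu0) / studies$se
studies$p <- 2 * pt(abs(studies$t), df = n - 1, lower.tail = FALSE)
studies$lower <- studies$ybar - tcrit * studies$se
studies$upper <- studies$ybar + tcrit * studies$se
studies$significant <- studies$p <= 0.05  # the all-or-nothing verdict

studies
```

The intervals overlap almost entirely; only the position of the lower limit relative to 500 separates the two verdicts.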


Question 6:

Are the taxes on gasoline very high in the United States? According to the American Petroleum Institute, the per gallon federal tax that was levied on gasoline was 18.4 cents per gallon. However, state and local taxes vary over the same period. The sample data of gasoline taxes for 18 large cities is given below in the variable called gas_taxes.

gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents? Explain.

Answer:

t.test(gas_taxes, mu = 18.4, conf.level = .95)

    One Sample t-test

data:  gas_taxes
t = 10.238, df = 17, p-value = 1.095e-08
alternative hypothesis: true mean is not equal to 18.4
95 percent confidence interval:
 36.23386 45.49169
sample estimates:
mean of x 
 40.86278 

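The two-sided 95% confidence interval for the mean tax per gallon runs from 36.23 to 45.49 cents. Because it contains values above 45 cents, this interval alone does not rule out a mean of 45 cents or more. However, the call above tests against the federal rate (mu = 18.4) with a two-sided alternative, whereas the question is one-sided: H0: μ = 45 versus Ha: μ < 45. Using the sample mean of 40.86 and the standard error implied by the output (about 2.19), the one-sided test statistic is t ≈ -1.89 with 17 degrees of freedom, which lies below the 5% one-sided critical value of about -1.74. At the 5% level, a one-sided test therefore does support the conclusion that the mean tax is less than 45 cents, although the evidence is not strong.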
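
Because the question asks specifically whether the mean is below 45 cents, a one-sided t-test against 45 is a more direct check than a two-sided interval. The following is a minimal sketch of that test using the same data; the object name one_sided is illustrative. The p-value and the one-sided upper confidence bound it returns are the quantities to compare with 0.05 and 45 cents.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48,
               35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72,
               33.41, 45.02)

# H0: mu = 45 versus Ha: mu < 45
one_sided <- t.test(gas_taxes, mu = 45, alternative = "less",
                    conf.level = 0.95)

one_sided$statistic  # t statistic
one_sided$p.value    # one-sided p-value
one_sided$conf.int   # interval from -Inf up to the 95% upper bound
```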