HW1 V2

QUESTION 1

The time between the date a patient was recommended for heart surgery and the surgery date for cardiac patients in Ontario was collected by the Cardiac Care Network (“Wait Times Data Guide,” Ministry of Health and Long-Term Care, Ontario, Canada, 2006). The sample mean and sample standard deviation for wait times (in days) of patients for two cardiac procedures are given in the accompanying table. Assume that the sample is representative of the Ontario population.

Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures.

confidence_level <- 0.9
s_size_bypass <- 539
s_size_angio <- 847
S_mean_bypass <- 19
s_mean_angio <- 18
s_sd_bypass <- 10
s_sd_angio <- 9

tail_area <- (1-confidence_level)/2
tail_area

[1] 0.05

t_score_bypass <- qt(p = 1-tail_area, df = s_size_bypass-1)
t_score_bypass

[1] 1.647691

t_score_angio <- qt(p = 1-tail_area, df = s_size_angio-1)
t_score_angio

[1] 1.646657

CI_bypass <- c(S_mean_bypass - t_score_bypass * s_sd_bypass / sqrt(s_size_bypass),
        S_mean_bypass + t_score_bypass * s_sd_bypass / sqrt(s_size_bypass))


CI_angio <- c(s_mean_angio - t_score_angio * s_sd_angio / sqrt(s_size_angio),
       s_mean_angio + t_score_angio * s_sd_angio / sqrt(s_size_angio))

print(CI_bypass)

[1] 18.29029 19.70971

print(CI_angio)

[1] 17.49078 18.50922

19.70971-18.29029

[1] 1.41942

18.50922-17.49078

[1] 1.01844

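I followed the instructions to “Calculate Confidence Interval Manually” from the Tutorial to reach the following conclusions:

The 90% confidence interval for the mean bypass surgery wait time is 18.29029 to 19.70971 days.

The 90% confidence interval for the mean angiography wait time is 17.49078 to 18.50922 days.

Is the confidence interval narrower for angiography or bypass surgery?

The confidence interval is narrower for angiography (width of about 1.02 days versus 1.42 days for bypass). This is expected: the angiography sample is larger (n = 847 vs. 539) and has a smaller standard deviation (9 vs. 10 days), and both of these shrink the margin of error t·s/√n.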
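The manual interval calculation above repeats the same three steps (critical value, margin of error, endpoints) for each procedure. A minimal sketch of packaging those steps into one function, assuming only summary statistics are available; the name `t_ci_from_summary` is hypothetical and not from the tutorial:

```r
# Hypothetical helper (not from the tutorial): t-based confidence interval
# for a population mean, computed from summary statistics only.
t_ci_from_summary <- function(mean, sd, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  t_crit <- qt(1 - alpha / 2, df = n - 1)   # two-sided critical value
  margin <- t_crit * sd / sqrt(n)           # margin of error
  c(lower = mean - margin, upper = mean + margin)
}

ci_bypass <- t_ci_from_summary(19, 10, 539, conf_level = 0.90)
ci_angio  <- t_ci_from_summary(18,  9, 847, conf_level = 0.90)
ci_bypass
ci_angio

# Interval widths: the angiography interval should be the narrower one
unname(diff(ci_bypass))
unname(diff(ci_angio))
```

Calling it with the Question 1 values should reproduce the intervals printed above.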

QUESTION 2

A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.

confidence_level <- 0.95
x <- 567
s_size2 <- 1031
p <- x/s_size2
p

[1] 0.5499515

prop.test(x,s_size2)


    1-sample proportions test with continuity correction

data:  x out of s_size2, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5189682 0.5805580
sample estimates:
        p 
0.5499515

The point estimate is p̂ = 567/1031 ≈ 0.55; that is, based on the sample, an estimated 55% of adult Americans believe that a college education is essential for success.

The 95% confidence interval is 0.5189682 to 0.5805580 (prop.test computes this as a Wilson score interval with a continuity correction).

We are 95% confident that between 51.9% and 58.1% of adult Americans believe that college education is essential for success.
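As a cross-check, the familiar large-sample (Wald) formula p̂ ± z·√(p̂(1 − p̂)/n) can be computed by hand. This is a sketch of a different interval method from the one prop.test uses, so the endpoints are expected to be close to, but not identical with, the ones reported above:

```r
# Large-sample (Wald) 95% confidence interval for a proportion:
# p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
x <- 567
n <- 1031
p_hat <- x / n
z <- qnorm(0.975)                        # about 1.96
se <- sqrt(p_hat * (1 - p_hat) / n)      # estimated standard error of p_hat
wald_ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
round(wald_ci, 4)
```

This gives roughly 0.520 to 0.580, almost the same as prop.test's 0.519 to 0.581; with n = 1031 the choice of method makes little practical difference.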

QUESTION 3

Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per quarter for students. The estimate will be useful if it is within $5 of the true population mean (i.e. they want the confidence interval to have a length of $10 or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between $30 and $200. They think that the population standard deviation is about a quarter of this range.

Assuming the significance level to be 5%, what should be the size of the sample?

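# Sample size needed: n = (z * sigma / M)^2
z <- 1.96              # z-score for 95% confidence
M <- 5                 # desired margin of error, in dollars
sigma <- (200-30)/4    # guessed sigma: a quarter of the range, 170/4
sigma

[1] 42.5

(z*sigma/M)^2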

[1] 277.5556

I found this formula in section 5.4 of the Agresti textbook.

Rounding up, the sample should include at least 278 students.
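The same calculation can be wrapped in a small function that always rounds up, since a fractional sample size must be increased to the next whole student to keep the margin of error at or below M. A minimal sketch; `sample_size_for_mean` is a hypothetical name, and it uses the exact quantile qnorm(0.975) ≈ 1.959964 instead of 1.96:

```r
# Hypothetical helper: smallest n so that a z-based interval for a mean
# has margin of error <= M, i.e. n = (z * sigma / M)^2 rounded UP.
sample_size_for_mean <- function(sigma, M, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # two-sided critical value
  ceiling((z * sigma / M)^2)
}

sigma_guess <- (200 - 30) / 4            # range / 4 = 42.5 dollars
sample_size_for_mean(sigma_guess, M = 5)
```

Using the exact quantile gives about 277.55 before rounding, so the answer is still 278 students.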

QUESTION 4

(Exercise 6.7, Chapter 6 of SMSS, Agresti 2018) According to a union agreement, the mean income for all senior-level workers in a large service company equals $500 per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = 90.

Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result. Report the P-value for Ha: μ < 500. Interpret. Report and interpret the P-value for Ha: μ > 500. (Hint: The P-values for the two possible one-sided tests must sum to 1.)

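Assumptions: the nine employees are a random sample, income is a quantitative variable, and the population distribution of income for female employees is approximately normal (this matters because n = 9 is small, although the two-sided t test is fairly robust to moderate non-normality).

Hypotheses: H0: μ = 500 versus Ha: μ ≠ 500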

s_mean4 <- 500      # hypothesized population mean under H0
s_mean_alt <- 410   # sample mean ȳ
s <- 90
n4 <- 9
df4<- n4-1
df4

[1] 8

# t = (sample mean - hypothesized mean) / (s / sqrt(n))
t.score4 <- (s_mean_alt - s_mean4)/(s/sqrt(n4))
t.score4

[1] -3

# Two-sided p-value for Ha: mu != 500
p.value <- 2*pt(-abs(t.score4), df4)
p.value

[1] 0.01707168

# One-sided p-value for Ha: mu > 500 (upper tail)
greater_p.value <- pt(t.score4, df4, lower.tail = FALSE)
greater_p.value

[1] 0.9914642

# One-sided p-value for Ha: mu < 500 (lower tail)
less_p.value <- pt(t.score4, df4, lower.tail = TRUE)
less_p.value

[1] 0.008535841

greater_p.value + less_p.value

[1] 1

I started by calculating the t-score = (sample mean − hypothesized mean) / (standard deviation / √(sample size)) = (410 − 500)/(90/√9) = −90/30 = −3.0, with df = n − 1 = 8.

I got a little lost and used Google to help me find how to turn the t-score into a two-sided p-value, specifically https://www.cyclismo.org/tutorial/R/pValues.html, and followed the example in section 10.2 since it seemed similar.

Since the test is two-sided, I multiplied the lower-tail probability pt(−|t|, 8) by 2. The result is a p-value of about 0.017. This is less than 0.05, so we reject H0: there is fairly strong evidence that the mean income of female employees differs from $500 per week, and since ȳ = $410, the data suggest it is lower.

For Ha: μ < 500, the p-value is about 0.0085. If the true mean were $500, a sample mean this far below $500 would occur less than 1% of the time, so there is strong evidence that the mean income of female employees is below $500.

For Ha: μ > 500, the p-value is about 0.991. The sample mean is below $500, so the data give no evidence that the mean income is above $500.

The two one-sided p-values sum to 1, as the hint says they must.
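An equivalent way to reach the two-sided decision is to compare |t| with the critical value t(0.975, 8), or to check whether the matching 95% confidence interval for μ contains 500. A minimal sketch using only the summary statistics from the exercise (the variable names here are illustrative, not from the tutorial):

```r
# Rejection-region and confidence-interval check for the one-sample t test
ybar <- 410; s <- 90; n <- 9; mu0 <- 500
se <- s / sqrt(n)                    # standard error = 30
t_stat <- (ybar - mu0) / se          # -3
t_crit <- qt(0.975, df = n - 1)      # two-sided 5% critical value, about 2.306

# Reject H0 at alpha = 0.05 when |t| exceeds the critical value
abs(t_stat) > t_crit

# Matching 95% confidence interval for mu; rejecting H0 means it excludes 500
ci95 <- ybar + c(-1, 1) * t_crit * se
ci95
```

The interval comes out at roughly $340.8 to $479.2 per week, which excludes $500 and agrees with the p-value of about 0.017.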

QUESTION 5

(Exercise 6.23, Chapter 6 of SMSS, Agresti 2018) Jones and Smith separately conduct studies to test H0: μ = 500 against Ha : μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith. Using α = 0.05, for each study indicate whether the result is “statistically significant.” Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0 ,” without reporting the actual P-value.

H0: μ = 500   Ha: μ ≠ 500   Sample size n = 1000

population_mean <- 500
n5 <- 1000
jones_mean <- 519.5
jones_Se <- 10


jones_t <- (jones_mean - population_mean)/(jones_Se)
jones_t

[1] 1.95

jones_pvalue <-2*pt(jones_t,(n5-1), lower.tail= FALSE)
jones_pvalue

[1] 0.05145555

smith_mean <- 519.7
smith_se <- 10

smith_t <- (smith_mean - population_mean)/smith_se
smith_t

[1] 1.97

smith_pvalue <-2*pt(smith_t,(n5-1), lower.tail= FALSE)
smith_pvalue

[1] 0.04911426

The formulas above do indicate that t = 1.95 and P-value = 0.051 for Jones and t = 1.97 and P-value = 0.049 for Smith.

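Using α = 0.05, the Jones result is not statistically significant (P = 0.051 > 0.05), while the Smith result is statistically significant (P = 0.049 ≤ 0.05).

This example shows why reporting only “P ≤ 0.05” versus “P > 0.05,” or “reject H0” versus “do not reject H0,” is misleading. The two studies have almost identical sample means (519.5 vs. 519.7) and the same standard error, so they provide essentially the same evidence against H0. A yes/no report would nonetheless make them look like opposite findings, one “significant” and one not. Reporting the actual P-values (0.051 and 0.049) shows that the evidence is nearly the same in both studies and lets readers judge its strength for themselves.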
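Putting both studies side by side makes the point concrete: nearly identical t statistics and p-values land on opposite sides of the α = 0.05 cutoff. A short sketch reusing the numbers from the exercise:

```r
# Both studies: H0: mu = 500, n = 1000, se = 10
mu0 <- 500
se <- 10
df <- 1000 - 1
ybar <- c(jones = 519.5, smith = 519.7)

t_stat <- (ybar - mu0) / se
p_val <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)   # two-sided p-values

# A 0.2 difference in sample means flips the reject / do-not-reject label
data.frame(ybar, t = t_stat, p = round(p_val, 3), reject_at_0.05 = p_val <= 0.05)
```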

QUESTION 6

Are the taxes on gasoline very high in the United States? According to the American Petroleum Institute, the per gallon federal tax that was levied on gasoline was 18.4 cents per gallon. However, state and local taxes vary over the same period. The sample data of gasoline taxes for 18 large cities is given below in the variable called gas_taxes.

gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents? Explain.

gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

n6 <- 18

summary(gas_taxes)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  18.49   35.95   41.48   40.86   47.95   54.41

gas_taxes_mean <- mean(gas_taxes)
gas_taxes_mean

[1] 40.86278

gas_taxes_sd <- sd(gas_taxes)
gas_taxes_sd

[1] 9.308317

confidence_level6 <- 0.95

# The question is one-sided ("less than 45 cents"), so the whole 5% goes in
# one tail: use the 95th percentile of the t distribution with n6 - 1 df.
t_score6 <- qt(p = confidence_level6, df = n6-1)
t_score6

[1] 1.739607

# One-sided 95% upper confidence bound for the mean tax per gallon
upper_bound6 <- gas_taxes_mean + t_score6 * gas_taxes_sd / sqrt(n6)
upper_bound6

[1] 44.67946

I’m assuming this data is from 2005, though that’s not stated explicitly. I’m also assuming each value in the sample is the total of local, state, and federal taxes.

The mean gas tax in the sample is 40.86 cents.

I again used the formula from our tutorial for calculating a confidence interval manually. Because the question asks whether the mean is less than 45 cents, the matching interval is one-sided: the 95% upper confidence bound is ȳ + t(0.95, 17) · s/√n = 40.86 + 1.7396 × 9.308/√18 ≈ 44.68 cents.

So I am 95% confident that the average gas tax per gallon in the US for the given time period was below 44.68 cents. Since this bound is below 45 cents, there is enough evidence at the 95% confidence level (equivalently, a one-sided test at α = 0.05) to conclude that the average tax was less than 45 cents.

A two-sided 95% interval, 36.23 to 45.49 cents, does contain 45, but it answers a different question (whether the mean differs from 45 cents in either direction), so it is overly conservative here.
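R's built-in one-sample t test can serve as a cross-check, run as a one-sided test because the question asks about “less than.” A sketch using `t.test` with `alternative = "less"`; the one-sided interval it reports has the form (−∞, upper bound):

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

# One-sided one-sample t test: H0: mu = 45 versus Ha: mu < 45
res <- t.test(gas_taxes, mu = 45, alternative = "less", conf.level = 0.95)

res$statistic   # t statistic (about -1.89 with 17 df)
res$p.value     # one-sided p-value
res$conf.int    # one-sided 95% interval: (-Inf, upper bound)
```

Since |t| ≈ 1.89 lies between the t(17) critical values 1.740 and 2.110, the one-sided p-value falls between 0.025 and 0.05, and the reported upper bound matches the manual calculation of about 44.68 cents.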
