PROBLEM I: Complete a hypothesis test for two of the following. 1) State the null and alternative. You may use words or symbols. 2) Provide the test statistic (no interpretation needed) 3) Provide the p-value (no interpretation needed) 4) State your decision. You may use α=0.05 5) State your conclusion in terms of the problem. 6) Address any additional questions as needed.

Does patients' average cholesterol differ by sex?

  1. State the null and alternative:

Ho: There is no difference in average cholesterol between the sexes.

Ha: There is a difference in average cholesterol between the sexes.

  2. Provide the test statistic (no interpretation needed)

The test statistic and degrees of freedom are as follows: t = 2.3589, df = 4913

  3. Provide the p-value (no interpretation needed)

The p-value is as follows: p-value = 0.01837

  4. State your decision. You may use α=0.05

Since the p-value is 0.01837, which is less than α=0.05, we reject the null hypothesis.

  5. State your conclusion in terms of the problem.

At the 5% significance level, there is sufficient evidence to conclude that average cholesterol differs between the sexes (sample means: 197.10 for group F and 194.35 for group M).

## 
##  Two Sample t-test
## 
## data:  totalchol by sex
## t = 2.3589, df = 4913, p-value = 0.01837
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
##  0.4651368 5.0424457
## sample estimates:
## mean in group F mean in group M 
##        197.1000        194.3462
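
The statistic above comes from a pooled-variance (equal-variance) two-sample t-test. As a minimal sketch of what that computation involves, the block below rebuilds t by hand and compares it with `t.test(..., var.equal = TRUE)`. The vectors `chol_f` and `chol_m` are simulated, hypothetical stand-ins for the two sex groups, not the patients data, so the numbers will not match the output above.

```r
# Hypothetical data: simulated values, NOT the patients data set.
set.seed(1)
chol_f <- rnorm(40, mean = 197, sd = 40)
chol_m <- rnorm(35, mean = 194, sd = 40)

n1 <- length(chol_f)
n2 <- length(chol_m)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(chol_f) + (n2 - 1) * var(chol_m)) / (n1 + n2 - 2)

# Pooled t statistic and its degrees of freedom.
t_manual <- (mean(chol_f) - mean(chol_m)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_manual <- n1 + n2 - 2

# Built-in equal-variance t-test for comparison.
fit <- t.test(chol_f, chol_m, var.equal = TRUE)

# Both comparisons should agree with the built-in result.
all.equal(t_manual, unname(fit$statistic))
all.equal(df_manual, unname(fit$parameter))
```

Because the pooled test uses df = n1 + n2 - 2, the reported df = 4913 corresponds to 4915 patients in total.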

Do patients' average sedentary minutes differ by sex?

  1. State the null and alternative:

Ho: There is no difference in average sedentary minutes between the sexes.

Ha: There is a difference in average sedentary minutes between the sexes.

  2. Provide the test statistic (no interpretation needed)

The test statistic and degrees of freedom are as follows: t = -2.1003, df = 4913

  3. Provide the p-value (no interpretation needed)

The p-value is as follows: p-value = 0.03575

  4. State your decision. You may use α=0.05

Since the p-value is 0.03575, which is less than α=0.05, we reject the null hypothesis.

  5. State your conclusion in terms of the problem.

At the 5% significance level, there is sufficient evidence to conclude that average sedentary minutes differ between the sexes (sample means: 305.87 for group F and 316.90 for group M).

## 
##  Two Sample t-test
## 
## data:  sedmins by sex
## t = -2.1003, df = 4913, p-value = 0.03575
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
##  -21.3373009  -0.7348513
## sample estimates:
## mean in group F mean in group M 
##        305.8654        316.9015
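
The reported p-values follow directly from each t statistic and its degrees of freedom. As a small sketch, the block below recomputes the two-sided p-values from the values printed in the two t-test outputs and applies the α = 0.05 decision rule. The data frame `tests` and its column names are illustrative choices, not part of the original analysis.

```r
# Values copied from the two t-test outputs above.
tests <- data.frame(
  variable = c("totalchol", "sedmins"),
  t        = c(2.3589, -2.1003),
  df       = c(4913, 4913)
)

alpha <- 0.05

# Two-sided p-value: probability of a |t| at least this large under H0.
tests$p_value <- 2 * pt(-abs(tests$t), df = tests$df)

# Decision rule: reject H0 when the p-value is below alpha.
tests$reject_h0 <- tests$p_value < alpha

print(tests)
```

Small differences in the last digit are possible because the t statistics are rounded to four decimals in the output.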

PROBLEM II: Complete a hypothesis test to determine if the average age of patients differs by marital status.
1) State the null and alternative. You may use words or symbols. 2) Provide the test statistic (no interpretation needed) 3) Provide the p-value (no interpretation needed) 4) State your decision. You may use α=0.05 5) State your conclusion in terms of the problem. 6) Address any additional questions as needed.

  1. State the null and alternative. You may use words or symbols.

Ho: The mean age is the same across all marital status categories.

Ha: At least one marital status category has a different mean age.

  2. Provide the test statistic (no interpretation needed)

The F statistic and degrees of freedom are as follows: F = 558, df = 4 and 4910

  3. Provide the p-value (no interpretation needed)

The p-value is as follows: p-value = <2e-16

  4. State your decision. You may use α=0.05

Since the p-value is <2e-16, which is less than α=0.05, we reject the null hypothesis.

  5. State your conclusion in terms of the problem.

At the 5% significance level, there is sufficient evidence to conclude that the average age differs for at least one marital status category.

##               Df  Sum Sq Mean Sq F value Pr(>F)    
## marriage       4  482636  120659     558 <2e-16 ***
## Residuals   4910 1061672     216                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
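
The F value in the table above can be rebuilt from its own columns. As a short sketch, the block below divides each sum of squares by its degrees of freedom to get the mean squares, then takes their ratio. All numbers are copied from the ANOVA table; the variable names are illustrative.

```r
# Values copied from the ANOVA table above.
ss_between <- 482636   # Sum Sq for marriage
ss_within  <- 1061672  # Sum Sq for Residuals
df_between <- 4        # 5 marital categories - 1
df_within  <- 4910

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic: between-group variation relative to within-group variation.
f_stat <- ms_between / ms_within

# Upper-tail probability of an F at least this large under H0.
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(ms_between = ms_between, ms_within = ms_within, f_stat = f_stat), 1)
p_value < 0.05
```

The total degrees of freedom, 4 + 4910 = 4914, correspond to 4915 patients, which agrees with the df reported for the t-tests in Problem I.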

  6. Address any additional questions:
  1. If patients' average age differs by marriage, which marriage categories differ significantly?

Pairwise t-tests with a Bonferroni adjustment show a significant difference in mean age for the following pairs of marital categories (every pair except married vs divorced, which has an adjusted p-value of 0.97):

- married vs widowed
- married vs other
- married vs never married
- widowed vs other
- widowed vs never married
- widowed vs divorced
- other vs never married
- other vs divorced
- never married vs divorced

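  2. Report two significant comparisons and explain their importance.

Married vs never married: married patients are on average about 18.66 years older (mean age 52.22 vs 33.56), with a 95% confidence interval for the difference of 17.46 to 19.87 years. Widowed vs never married: widowed patients are on average about 37.71 years older (mean age 71.27 vs 33.56), with a 95% confidence interval of 36.17 to 39.25 years. Both intervals exclude 0, and both Bonferroni-adjusted p-values are below 2e-16.

Relative to the overall mean age of 49.27 years, the married group has an average age 2.95 years ABOVE the overall mean and the never married group has an average age 15.71 years BELOW the overall mean.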

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  Patients$age and Patients$marriage 
## 
##              divorced married nevermarried other  
## married      0.97     -       -            -      
## nevermarried < 2e-16  < 2e-16 -            -      
## other        < 2e-16  < 2e-16 3.9e-14      -      
## widowed      < 2e-16  < 2e-16 < 2e-16      < 2e-16
## 
## P value adjustment method: bonferroni
## $`married vs widowed`
## [1] -20.65591 -17.44047
## attr(,"conf.level")
## [1] 0.95
## 
## $`married vs other`
## [1] 10.95577 13.78595
## attr(,"conf.level")
## [1] 0.95
## 
## $`married vs nevermarried`
## [1] 17.45853 19.86660
## attr(,"conf.level")
## [1] 0.95
## 
## $`married vs divorced`
## [1] -2.5971896  0.2799448
## attr(,"conf.level")
## [1] 0.95
## 
## $`widowed vs other`
## [1] 29.80448 33.03363
## attr(,"conf.level")
## [1] 0.95
## 
## $`widowed vs nevermarried`
## [1] 36.16961 39.25190
## attr(,"conf.level")
## [1] 0.95
## 
## $`widowed vs divorced`
## [1] 16.33254 19.44660
## attr(,"conf.level")
## [1] 0.95
## 
## $`other vs nevermarried`
## [1] 4.792852 7.790557
## attr(,"conf.level")
## [1] 0.95
## 
## $`other vs divorced`
## [1] -15.14524 -11.91372
## attr(,"conf.level")
## [1] 0.95
## 
## $`nevermarried vs divorced`
## [1] -21.31127 -18.33110
## attr(,"conf.level")
## [1] 0.95
## [1] "Overall Mean:  49.2724313326551"
## [1] "Group Means:  53.3836126629423" "Group Means:  52.224990268587" 
## [3] "Group Means:  33.5624256837099" "Group Means:  39.8541300527241"
## [5] "Group Means:  71.2731829573935"
## [1] "Deviations:  4.11118133028713"  "Deviations:  2.95255893593186" 
## [3] "Deviations:  -15.7100056489453" "Deviations:  -9.41830127993106"
## [5] "Deviations:  22.0007516247383"
##                                                                    divorced 
##      "divorced group has an average age 4.11 years ABOVE the overall mean." 
##                                                                     married 
##       "married group has an average age 2.95 years ABOVE the overall mean." 
##                                                                nevermarried 
## "nevermarried group has an average age 15.71 years BELOW the overall mean." 
##                                                                       other 
##         "other group has an average age 9.42 years BELOW the overall mean." 
##                                                                     widowed 
##         "widowed group has an average age 22 years ABOVE the overall mean."
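
The pairwise table above reports Bonferroni-adjusted p-values. As a rough sketch of what that adjustment does, the block below multiplies each raw p-value by the number of comparisons (10 for five groups), caps the result at 1, and checks it against `p.adjust`. The raw p-values in `raw_p` are hypothetical, because the unadjusted values do not appear in the output.

```r
# Number of pairwise comparisons among 5 marital status groups.
n_groups <- 5
n_comparisons <- choose(n_groups, 2)

# Hypothetical unadjusted p-values (illustrative only, not from the report).
raw_p <- c(0.0004, 0.02, 0.3)

# Bonferroni by hand: multiply by the number of comparisons, cap at 1.
manual_adj <- pmin(1, raw_p * n_comparisons)

# Built-in adjustment; n is set to the full family of 10 comparisons.
builtin_adj <- p.adjust(raw_p, method = "bonferroni", n = n_comparisons)

all.equal(manual_adj, builtin_adj)
```

Equivalently, each raw p-value can be compared against α / 10 = 0.005 instead of 0.05.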

PROBLEM III Part A: Perform a hypothesis test to see if the proportions for each marriage category are equal.

  1. State the null and alternative. You may use words or symbols.

Ho: The proportions of patients in different marital status categories are equal.

Ha: At least one marital status category has a different proportion.

  2. Provide the test statistic (no interpretation needed)

The chi-square statistic and degrees of freedom are as follows: X-squared = 3303.1, df = 4

  3. Provide the p-value (no interpretation needed)

The p-value is as follows: p-value < 2.2e-16

  4. State your decision. You may use α=0.05

Since the p-value is < 2.2e-16, which is less than α=0.05, we reject the null hypothesis.

  5. State your conclusion in terms of the problem.

At the 5% significance level, there is sufficient evidence to conclude that the proportions of patients are not equal across the marital status categories.

## 
##  Chi-squared test for given probabilities
## 
## data:  marriage_counts
## X-squared = 3303.1, df = 4, p-value < 2.2e-16
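
The test above is a chi-square goodness-of-fit test against equal proportions. As a minimal sketch of the calculation, the block below computes the statistic by hand and compares it with `chisq.test`. The counts in `counts` are hypothetical and chosen for easy arithmetic; they are not the patients data.

```r
# Hypothetical category counts (illustrative only, not the patients data).
counts <- c(divorced = 50, married = 120, nevermarried = 40,
            other = 30, widowed = 10)

# Under H0 every category has the same expected count.
expected <- rep(sum(counts) / length(counts), length(counts))

# Chi-square statistic: sum of (observed - expected)^2 / expected.
x2_manual <- sum((counts - expected)^2 / expected)
df_manual <- length(counts) - 1

# Built-in test; given a single vector it assumes equal probabilities.
fit <- chisq.test(counts)

all.equal(x2_manual, unname(fit$statistic))
all.equal(df_manual, unname(fit$parameter))
```

The degrees of freedom equal the number of categories minus one, which is why five marriage categories give df = 4 in the output above.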

PROBLEM III Part B: Perform a hypothesis test to see if the proportions for each obese category are equal.

  1. State the null and alternative. You may use words or symbols.

Ho: The proportions of patients in different obesity categories are equal.

Ha: At least one obesity category has a different proportion.

  2. Provide the test statistic (no interpretation needed)

The chi-square statistic and degrees of freedom are as follows: X-squared = 1574.3, df = 3

  3. Provide the p-value (no interpretation needed)

The p-value is as follows: p-value < 2.2e-16

  4. State your decision. You may use α=0.05

Since the p-value is < 2.2e-16, which is less than α=0.05, we reject the null hypothesis.

  5. State your conclusion in terms of the problem.

At the 5% significance level, there is sufficient evidence to conclude that the proportions of patients are not equal across the obesity categories.

## 
##  Chi-squared test for given probabilities
## 
## data:  obese_counts
## X-squared = 1574.3, df = 3, p-value < 2.2e-16
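
The same decisions can be reached with critical values instead of p-values. As a sketch, the block below compares each reported chi-square statistic with the upper 5% quantile of the chi-square distribution for its degrees of freedom. The statistics and df are copied from the two outputs; the data frame `results` is an illustrative container.

```r
alpha <- 0.05

# Statistics and degrees of freedom copied from the two chi-square outputs.
results <- data.frame(
  test      = c("marriage", "obese"),
  statistic = c(3303.1, 1574.3),
  df        = c(4, 3)
)

# Critical value: the upper-alpha quantile of the chi-square distribution.
results$critical <- qchisq(1 - alpha, df = results$df)

# Reject H0 when the observed statistic exceeds the critical value.
results$reject_h0 <- results$statistic > results$critical

print(results)
```

R prints `p-value < 2.2e-16` because 2.2e-16 is `.Machine$double.eps`, the default threshold below which p-values are not displayed exactly.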

Code Appendix

knitr::opts_chunk$set(
    echo = FALSE,
    message = TRUE,
    warning = TRUE
)
# Load the data
Patients <- read.csv("C:/Users/hemanth/Documents/STA 100/Exams/patients.csv")
#t test for comparison between cholesterol by sex to identify significant difference
ttest_cholesterol <- t.test(totalchol ~ sex, data = Patients, var.equal = TRUE, conf.level = 0.95)
ttest_cholesterol
#t test for comparison between sedentary minutes by sex to identify significant difference
ttest_sedentary <- t.test(sedmins ~ sex, data = Patients, var.equal = TRUE, conf.level = 0.95)
ttest_sedentary
# ANOVA to compare mean age across marital status categories to identify significant differences
anova_marital_status_age <- aov(age ~ marriage, data = Patients)

# Results
summary(anova_marital_status_age)
# Pairwise t-tests with Bonferroni correction to identify difference between groups by age and marital status
pairwise.t.test(Patients$age, Patients$marriage, p.adjust.method = "bonferroni")

# Unique marriage categories
marriage_levels <- unique(Patients$marriage)

# Empty list to hold the confidence interval for each pair of categories
ci_results <- list()

# Loop through all pairs of marriage categories
for (i in 1:(length(marriage_levels) - 1)) {
  for (j in (i + 1):length(marriage_levels)) {
    
    # Extract two comparison groups
    group1 <- Patients$age[Patients$marriage == marriage_levels[i]]
    group2 <- Patients$age[Patients$marriage == marriage_levels[j]]
    
    # t-test with confidence interval
    test_result <- t.test(group1, group2, var.equal = TRUE, conf.level = 0.95)
    
    # Results
    ci_results[[paste(marriage_levels[i], "vs", marriage_levels[j])]] <- test_result$conf.int
  }
}

# Confidence intervals results
ci_results

# Compute overall mean
overall_mean <- mean(Patients$age)

# Print the overall mean
paste("Overall Mean: ",overall_mean)

# Compute group means

group_means <- tapply(Patients$age, Patients$marriage, mean)
paste("Group Means: ",group_means)

# Compute deviations from the overall mean
deviations <- group_means - overall_mean

# Print deviations
paste("Deviations: ",deviations)

# Generate interpretation
interpretation <- sapply(names(deviations), function(group) {
  if (deviations[group] > 0) {
    paste(group, "group has an average age", round(deviations[group], 2), "years ABOVE the overall mean.")
  } else {
    paste(group, "group has an average age", abs(round(deviations[group], 2)), "years BELOW the overall mean.")
  }
})

# Print interpretation
interpretation

# Frequency of marriage
marriage_counts <- table(Patients$marriage)

# Chi-square test on marital status for significant difference in proportions
chisq_marriage_counts <- chisq.test(marriage_counts)

# Results
chisq_marriage_counts
# Frequency of obese
obese_counts <- table(Patients$obese)

# Chi-square test on obesity status for significant difference in proportions
chisq_obese_counts <- chisq.test(obese_counts)

# Results
chisq_obese_counts
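
# Optional sketch: pairwise confidence intervals without nested loops

The nested loop in the appendix enumerates every pair of marriage categories by index. One possible alternative, shown here only as a sketch, lets `combn` generate the pairs. The data frame `demo` is simulated and hypothetical; with the real data, `Patients` would take its place.

```r
# Hypothetical data frame standing in for Patients (simulated values).
set.seed(42)
demo <- data.frame(
  age = c(rnorm(30, 52, 15), rnorm(30, 34, 15), rnorm(30, 71, 15)),
  marriage = rep(c("married", "nevermarried", "widowed"), each = 30),
  stringsAsFactors = FALSE
)

# All unordered pairs of categories, one pair per column.
levels_demo <- unique(demo$marriage)
pair_matrix <- combn(levels_demo, 2)

# One pooled-variance t-test per pair; keep only the confidence interval.
ci_list <- lapply(seq_len(ncol(pair_matrix)), function(k) {
  g1 <- demo$age[demo$marriage == pair_matrix[1, k]]
  g2 <- demo$age[demo$marriage == pair_matrix[2, k]]
  t.test(g1, g2, var.equal = TRUE, conf.level = 0.95)$conf.int
})

# Label each interval the same way the appendix loop does.
names(ci_list) <- paste(pair_matrix[1, ], "vs", pair_matrix[2, ])

ci_list
```

Like the loop in the appendix, these intervals are computed one pair at a time and are not adjusted for multiple comparisons.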