PROBLEM I: Complete a hypothesis test for two of the following. 1) State the null and alternative. You may use words or symbols. 2) Provide the test statistic (no interpretation needed) 3) Provide the p-value (no interpretation needed) 4) State your decision. You may use α=0.05 5) State your conclusion in terms of the problem. 6) Address any additional questions as needed.
Does patients' average cholesterol differ by sex?
Ho: There is no difference in average cholesterol between the sexes.
Ha: There is a difference in average cholesterol between the sexes.
The test statistic and degrees of freedom are as follows: t = 2.3589, df = 4913
The p-value is as follows: p-value = 0.01837
Since the p-value is 0.01837, which is less than α=0.05, we reject the null hypothesis.
At the 5% significance level, there is sufficient evidence to conclude that average cholesterol differs between the sexes. The sample mean is higher for females (197.10) than for males (194.35).
##
## Two Sample t-test
##
## data: totalchol by sex
## t = 2.3589, df = 4913, p-value = 0.01837
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
## 0.4651368 5.0424457
## sample estimates:
## mean in group F mean in group M
## 197.1000 194.3462
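The reported p-value can be recovered from the test statistic and degrees of freedom alone. The following is a minimal sketch that uses only the t and df values printed above together with base R's `pt()`; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the t-test output for totalchol by sex
t_stat <- 2.3589
df <- 4913

# Two-sided p-value: probability of a t statistic at least this extreme in either tail
p_value <- 2 * pt(-abs(t_stat), df = df)
round(p_value, 5)
```

Because the t statistic is rounded to four decimals, the result should agree with the printed p-value only up to rounding.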
Do patients' average sedentary minutes differ by sex?
Ho: There is no difference in average sedentary minutes between the sexes.
Ha: There is a difference in average sedentary minutes between the sexes.
The test statistic and degrees of freedom are as follows: t = -2.1003, df = 4913
The p-value is as follows: p-value = 0.03575
Since the p-value is 0.03575, which is less than α=0.05, we reject the null hypothesis.
At the 5% significance level, there is sufficient evidence to conclude that average sedentary minutes differ between the sexes. The sample mean is lower for females (305.87) than for males (316.90).
##
## Two Sample t-test
##
## data: sedmins by sex
## t = -2.1003, df = 4913, p-value = 0.03575
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
## -21.3373009 -0.7348513
## sample estimates:
## mean in group F mean in group M
## 305.8654 316.9015
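The 95 percent confidence interval in this output is consistent with the reported means, t statistic, and degrees of freedom. As a rough check, the sketch below rebuilds the interval from those printed values; it assumes the standard relation t = difference / standard error, and the variable names are illustrative.

```r
# Values copied from the t-test output for sedmins by sex
mean_f <- 305.8654
mean_m <- 316.9015
t_stat <- -2.1003
df <- 4913

diff_means <- mean_f - mean_m      # group F minus group M, the order R reports
std_error <- diff_means / t_stat   # since t = difference / standard error
t_crit <- qt(0.975, df = df)       # two-sided 95% critical value

# Difference in means plus or minus the margin of error
ci <- diff_means + c(-1, 1) * t_crit * std_error
round(ci, 2)
```

The interval excludes 0, which matches the decision to reject the null hypothesis at α=0.05.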
PROBLEM II: Complete a hypothesis test to determine if the average
age of patients differs by marital status.
1) State the null and alternative. You may use words or symbols. 2)
Provide the test statistic (no interpretation needed) 3) Provide the
p-value (no interpretation needed) 4) State your decision. You may use
α=0.05 5) State your conclusion in terms of the problem. 6) Address any
additional questions as needed.
Ho: Mean age is the same across all marital status categories.
Ha: At least one marital status category has a different mean age.
The F statistic and degrees of freedom are as follows: F = 558, df = 4 and 4910
The p-value is as follows: p-value < 2e-16
Since the p-value is < 2e-16, which is less than α=0.05, we reject the null hypothesis.
At the 5% significance level, there is sufficient evidence to conclude that mean age differs for at least one marital status category.
## Df Sum Sq Mean Sq F value Pr(>F)
## marriage 4 482636 120659 558 <2e-16 ***
## Residuals 4910 1061672 216
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
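The F value in the table is the ratio of the two mean squares, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below reproduces that arithmetic from the printed table; the variable names are illustrative.

```r
# Values copied from the ANOVA table for age by marriage
ss_between <- 482636
df_between <- 4
ss_within <- 1061672
df_within <- 4910

ms_between <- ss_between / df_between   # Mean Sq for marriage
ms_within <- ss_within / df_within      # Mean Sq for Residuals
f_stat <- ms_between / ms_within        # F value

# Upper-tail probability of the F distribution with (4, 4910) degrees of freedom
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
round(f_stat)
```

The p-value is far too small to display meaningfully, which is why R prints it as `<2e-16`.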
Pairwise t-tests with a Bonferroni correction show a significant difference in mean age for the following pairs of marital categories:
- married vs widowed
- married vs other
- married vs never married
- widowed vs other
- widowed vs never married
- widowed vs divorced
- other vs never married
- other vs divorced
- never married vs divorced
The only pair that does not differ significantly is married vs divorced (adjusted p-value = 0.97).
The married group has an average age 2.95 years above the overall mean, and the never married group has an average age 15.71 years below the overall mean.
##
## Pairwise comparisons using t tests with pooled SD
##
## data: Patients$age and Patients$marriage
##
## divorced married nevermarried other
## married 0.97 - - -
## nevermarried < 2e-16 < 2e-16 - -
## other < 2e-16 < 2e-16 3.9e-14 -
## widowed < 2e-16 < 2e-16 < 2e-16 < 2e-16
##
## P value adjustment method: bonferroni
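The Bonferroni adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1. With five marital categories there are 10 pairwise comparisons. The sketch below illustrates this with base R's `p.adjust()`; the raw p-values are hypothetical and chosen for illustration, not taken from the Patients data.

```r
n_groups <- 5
n_comparisons <- choose(n_groups, 2)   # number of distinct pairs among 5 groups

# Hypothetical raw (unadjusted) p-values for three of the comparisons
raw_p <- c(0.097, 0.001, 0.2)

# Built-in adjustment, telling p.adjust() the full number of comparisons
adjusted_p <- p.adjust(raw_p, method = "bonferroni", n = n_comparisons)

# The same adjustment by hand: multiply by the number of comparisons, cap at 1
manual_p <- pmin(1, raw_p * n_comparisons)

adjusted_p
```

A comparison is declared significant only when its adjusted p-value is below α=0.05, which keeps the overall chance of a false positive across all pairs at or below 5%.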
## $`married vs widowed`
## [1] -20.65591 -17.44047
## attr(,"conf.level")
## [1] 0.95
##
## $`married vs other`
## [1] 10.95577 13.78595
## attr(,"conf.level")
## [1] 0.95
##
## $`married vs nevermarried`
## [1] 17.45853 19.86660
## attr(,"conf.level")
## [1] 0.95
##
## $`married vs divorced`
## [1] -2.5971896 0.2799448
## attr(,"conf.level")
## [1] 0.95
##
## $`widowed vs other`
## [1] 29.80448 33.03363
## attr(,"conf.level")
## [1] 0.95
##
## $`widowed vs nevermarried`
## [1] 36.16961 39.25190
## attr(,"conf.level")
## [1] 0.95
##
## $`widowed vs divorced`
## [1] 16.33254 19.44660
## attr(,"conf.level")
## [1] 0.95
##
## $`other vs nevermarried`
## [1] 4.792852 7.790557
## attr(,"conf.level")
## [1] 0.95
##
## $`other vs divorced`
## [1] -15.14524 -11.91372
## attr(,"conf.level")
## [1] 0.95
##
## $`nevermarried vs divorced`
## [1] -21.31127 -18.33110
## attr(,"conf.level")
## [1] 0.95
## [1] "Overall Mean: 49.2724313326551"
## [1] "Group Means: 53.3836126629423" "Group Means: 52.224990268587"
## [3] "Group Means: 33.5624256837099" "Group Means: 39.8541300527241"
## [5] "Group Means: 71.2731829573935"
## [1] "Deviations: 4.11118133028713" "Deviations: 2.95255893593186"
## [3] "Deviations: -15.7100056489453" "Deviations: -9.41830127993106"
## [5] "Deviations: 22.0007516247383"
## divorced
## "divorced group has an average age 4.11 years ABOVE the overall mean."
## married
## "married group has an average age 2.95 years ABOVE the overall mean."
## nevermarried
## "nevermarried group has an average age 15.71 years BELOW the overall mean."
## other
## "other group has an average age 9.42 years BELOW the overall mean."
## widowed
## "widowed group has an average age 22 years ABOVE the overall mean."
PROBLEM III Part A: Perform a hypothesis test to see if
Ho: The proportions of patients in different marital status categories are equal.
Ha: At least one marital status category has a different proportion.
The chi-square statistic and degrees of freedom are as follows: X-squared = 3303.1, df = 4
The p-value is as follows: p-value < 2.2e-16
Since the p-value is < 2.2e-16, which is less than α=0.05, we reject the null hypothesis.
At the 5% significance level, there is sufficient evidence to conclude that the proportion of patients is not the same across all marital status categories.
##
## Chi-squared test for given probabilities
##
## data: marriage_counts
## X-squared = 3303.1, df = 4, p-value < 2.2e-16
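A chi-squared test for given probabilities compares the observed counts with the counts expected if every category were equally likely. The sketch below works through the statistic by hand and compares it with `chisq.test()`; the counts are hypothetical because the observed frequencies are not shown in this report.

```r
# Hypothetical counts for five categories (not the Patients data)
counts <- c(divorced = 40, married = 120, nevermarried = 60, other = 30, widowed = 50)

# Under Ho every category has the same expected count
expected <- rep(sum(counts) / length(counts), length(counts))

# Chi-square statistic: sum of (observed - expected)^2 / expected
x_squared <- sum((counts - expected)^2 / expected)
df <- length(counts) - 1
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# chisq.test() on a single vector of counts tests equal proportions by default
builtin <- chisq.test(counts)
c(manual = x_squared, builtin = unname(builtin$statistic))
```

The degrees of freedom equal the number of categories minus one, which is why the marital status test reports df = 4.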
PROBLEM III Part B: Perform a hypothesis test to see if
Ho: The proportions of patients in different obesity categories are equal.
Ha: At least one obesity category has a different proportion.
The chi-square statistic and degrees of freedom are as follows: X-squared = 1574.3, df = 3
The p-value is as follows: p-value < 2.2e-16
Since the p-value is < 2.2e-16, which is less than α=0.05, we reject the null hypothesis.
At the 5% significance level, there is sufficient evidence to conclude that the proportion of patients is not the same across all obesity categories.
##
## Chi-squared test for given probabilities
##
## data: obese_counts
## X-squared = 1574.3, df = 3, p-value < 2.2e-16
knitr::opts_chunk$set(
  echo = FALSE,
  message = TRUE,
  warning = TRUE
)
# Load the data
Patients <- read.csv("C:/Users/hemanth/Documents/STA 100/Exams/patients.csv")
#t test for comparison between cholesterol by sex to identify significant difference
ttest_cholesterol <- t.test(totalchol ~ sex, data = Patients, var.equal = TRUE, conf.level = 0.95)
ttest_cholesterol
#t test for comparison between sedentary minutes by sex to identify significant difference
ttest_sedentary <- t.test(sedmins ~ sex, data = Patients, var.equal = TRUE, conf.level = 0.95)
ttest_sedentary
# ANOVA to compare mean age across marital status categories to identify significant differences
anova_marital_status_age <- aov(age ~ marriage, data = Patients)
# Results
summary(anova_marital_status_age)
# Pairwise t-tests with Bonferroni correction to identify difference between groups by age and marital status
pairwise.t.test(Patients$age, Patients$marriage, p.adjust.method = "bonferroni")
# Unique marriage categories
marriage_levels <- unique(Patients$marriage)
# Empty list to hold the confidence interval for each pair
ci_results <- list()
# Loop through all pairs of marriage categories
for (i in seq_len(length(marriage_levels) - 1)) {
  for (j in (i + 1):length(marriage_levels)) {
    # Extract the ages for the two groups being compared
    group1 <- Patients$age[Patients$marriage == marriage_levels[i]]
    group2 <- Patients$age[Patients$marriage == marriage_levels[j]]
    # Two-sample t-test (equal variances) to get the 95% confidence interval
    test_result <- t.test(group1, group2, var.equal = TRUE, conf.level = 0.95)
    # Store the interval under a "group1 vs group2" label
    ci_results[[paste(marriage_levels[i], "vs", marriage_levels[j])]] <- test_result$conf.int
  }
}
# Confidence intervals results
ci_results
# Compute overall mean
overall_mean <- mean(Patients$age)
# Print the overall mean
paste("Overall Mean: ",overall_mean)
# Compute group means
group_means <- tapply(Patients$age, Patients$marriage, mean)
paste("Group Means: ",group_means)
# Compute deviations from the overall mean
deviations <- group_means - overall_mean
# Print deviations
paste("Deviations: ",deviations)
# Generate interpretation
interpretation <- sapply(names(deviations), function(group) {
  if (deviations[group] > 0) {
    paste(group, "group has an average age", round(deviations[group], 2), "years ABOVE the overall mean.")
  } else {
    paste(group, "group has an average age", abs(round(deviations[group], 2)), "years BELOW the overall mean.")
  }
})
# Print interpretation
interpretation
# Frequency of marriage
marriage_counts <- table(Patients$marriage)
# Chi-square test on marital status for significant difference in proportions
chisq_marriage_counts <- chisq.test(marriage_counts)
# Results
chisq_marriage_counts
# Frequency of obese
obese_counts <- table(Patients$obese)
# Chi-square test on obesity status for significant difference in proportions
chisq_obese_counts <- chisq.test(obese_counts)
# Results
chisq_obese_counts
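The pairwise confidence intervals computed in the loop above each use an unadjusted 95% level, while the pairwise p-values are Bonferroni-adjusted. One way to put both on the same footing is to widen each interval to level 1 - 0.05/k, where k is the number of pairs. The sketch below shows this on simulated data; the `demo` data frame and all names in it are hypothetical stand-ins for `Patients`, and this is an optional variation, not the method used in the analysis above.

```r
set.seed(1)

# Simulated stand-in for the Patients data (hypothetical values)
demo <- data.frame(
  age = c(rnorm(30, 50, 10), rnorm(30, 35, 10), rnorm(30, 70, 10)),
  marriage = rep(c("married", "nevermarried", "widowed"), each = 30)
)

# All distinct pairs of categories
pairs <- combn(sort(unique(demo$marriage)), 2, simplify = FALSE)
k <- length(pairs)

# Bonferroni-adjusted confidence level so the family-wise level is 95%
adj_level <- 1 - 0.05 / k

ci_adjusted <- lapply(pairs, function(p) {
  t.test(demo$age[demo$marriage == p[1]],
         demo$age[demo$marriage == p[2]],
         var.equal = TRUE, conf.level = adj_level)$conf.int
})
names(ci_adjusted) <- vapply(pairs, paste, character(1), collapse = " vs ")

ci_adjusted
```

With the five categories in the real data, k would be 10 and each interval would use a 99.5% level.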