Introduction
This report analyzes two datasets of individuals aged 65 and older from the Wentworth Medical Center to investigate whether geographic location (Florida, New York, North Carolina) is associated with depression scores (higher scores indicate higher levels of depression).
- Medical1 (Healthy): 60 participants in reasonably good health (20 per location).
- Medical2 (Chronic): 60 participants with a chronic condition (20 per location).
Per assignment requirements, we provide:
1) Descriptive statistics and preliminary observations,
2) One-way ANOVA for each dataset with hypotheses and conclusions,
3) Inferences about individual treatment means (Tukey HSD) where appropriate.
We interleave narrative with code for reproducibility and to allow easy upload to RPubs (HTML output only; no LaTeX needed).
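Data Import & Reshaping
# Packages attached for this report (they match the session info in the appendix)
library(readxl)      # read_excel()
library(tidyr)       # pivot_longer()
library(dplyr)       # %>%, mutate(), group_by(), summarise()
library(ggplot2)     # boxplots
library(car)         # leveneTest()
library(broom)       # tidy()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
# Update paths with Knit with Parameters… if needed:
med1 <- read_excel(params$medical1_path, sheet = "Data")
med2 <- read_excel(params$medical2_path, sheet = "Data")
# Reshape a wide sheet (one column per location) into long format
to_long <- function(df, label) {
  df %>%
    pivot_longer(cols = everything(),
                 names_to = "Location",
                 values_to = "Score") %>%
    mutate(Location = factor(Location),
           Study = label)
}
m1 <- to_long(med1, "Healthy")
m2 <- to_long(med2, "Chronic")
head(m1); head(m2)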
#> # A tibble: 6 × 3
#> Location Score Study
#> <fct> <dbl> <chr>
#> 1 Florida 3 Healthy
#> 2 New York 8 Healthy
#> 3 North Carolina 10 Healthy
#> 4 Florida 7 Healthy
#> 5 New York 11 Healthy
#> 6 North Carolina 7 Healthy
#> # A tibble: 6 × 3
#> Location Score Study
#> <fct> <dbl> <chr>
#> 1 Florida 13 Chronic
#> 2 New York 14 Chronic
#> 3 North Carolina 10 Chronic
#> 4 Florida 12 Chronic
#> 5 New York 9 Chronic
#> 6 North Carolina 12 Chronic
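To make the reshaping step concrete, here is a minimal sketch on a two-row wide table. The values are the first two rows of Medical1 as they appear in the `head(m1)` output above; the name `toy_wide` and the idea that the sheet has exactly one column per location are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Assumed wide layout: one column per location, one row per participant slot
toy_wide <- tibble(
  Florida = c(3, 7),
  `New York` = c(8, 11),
  `North Carolina` = c(10, 7)
)

# Same reshaping as to_long(): stack every column into Location/Score pairs
toy_long <- toy_wide %>%
  pivot_longer(cols = everything(),
               names_to = "Location",
               values_to = "Score") %>%
  mutate(Location = factor(Location))

dim(toy_long)              # 6 rows, 2 columns
levels(toy_long$Location)  # one factor level per location
```

Because `pivot_longer()` walks the wide table row by row, the long table cycles through the three locations, which is why the `head()` output alternates Florida, New York, North Carolina.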
Q1. Descriptive Statistics & Preliminary Observations
desc_tbl <- function(df) {
  df %>%
    group_by(Location) %>%
    summarise(
      n = n(),
      mean = mean(Score),
      sd = sd(Score),
      min = min(Score),
      median = median(Score),
      max = max(Score),
      .groups = "drop"
    )
}
desc1 <- desc_tbl(m1)
desc2 <- desc_tbl(m2)
desc1 %>%
  kable(caption = "Medical1 (Healthy): Descriptive Statistics by Location", digits = 2) %>%
  kable_styling(full_width = FALSE)
Medical1 (Healthy): Descriptive Statistics by Location
| Location | n | mean | sd | min | median | max |
|---|---|---|---|---|---|---|
| Florida | 20 | 5.55 | 2.14 | 2 | 6.0 | 9 |
| New York | 20 | 8.00 | 2.20 | 4 | 8.0 | 13 |
| North Carolina | 20 | 7.05 | 2.84 | 3 | 7.5 | 12 |
desc2 %>%
  kable(caption = "Medical2 (Chronic): Descriptive Statistics by Location", digits = 2) %>%
  kable_styling(full_width = FALSE)
Medical2 (Chronic): Descriptive Statistics by Location
| Location | n | mean | sd | min | median | max |
|---|---|---|---|---|---|---|
| Florida | 20 | 14.50 | 3.17 | 9 | 14.5 | 21 |
| New York | 20 | 15.25 | 4.13 | 9 | 14.5 | 24 |
| North Carolina | 20 | 13.95 | 2.95 | 8 | 14.0 | 19 |
overall <- bind_rows(
  m1 %>% summarise(Study = "Medical1 (Healthy)",
                   n = n(),
                   mean = mean(Score),
                   sd = sd(Score),
                   min = min(Score),
                   median = median(Score),
                   max = max(Score)),
  m2 %>% summarise(Study = "Medical2 (Chronic)",
                   n = n(),
                   mean = mean(Score),
                   sd = sd(Score),
                   min = min(Score),
                   median = median(Score),
                   max = max(Score))
)
overall %>%
  kable(caption = "Overall Summary by Study", digits = 2) %>%
  kable_styling(full_width = FALSE)
Overall Summary by Study
| Study | n | mean | sd | min | median | max |
|---|---|---|---|---|---|---|
| Medical1 (Healthy) | 60 | 6.87 | 2.58 | 2 | 7 | 13 |
| Medical2 (Chronic) | 60 | 14.57 | 3.44 | 8 | 14 | 24 |
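Preliminary observations. In Medical1, New York has the highest mean (8.00) and Florida the lowest (5.55), with North Carolina in between (7.05). In Medical2, means are higher overall than in Medical1 (14.57 vs. 6.87 across all participants, consistent with chronic illness) and closer together across locations (13.95 to 15.25).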
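The comparison in these observations can be checked directly from the reported location means. A minimal sketch, with the means copied from the descriptive tables (no raw data needed); the variable names are illustrative only.

```r
# Location means copied from the descriptive tables
means_healthy <- c(Florida = 5.55, `New York` = 8.00, `North Carolina` = 7.05)
means_chronic <- c(Florida = 14.50, `New York` = 15.25, `North Carolina` = 13.95)

# Gap between the highest and lowest location mean in each study
spread_healthy <- diff(range(means_healthy))
spread_chronic <- diff(range(means_chronic))

# With equal group sizes (20 per location), the grand mean is the
# simple average of the three location means
grand_healthy <- mean(means_healthy)
grand_chronic <- mean(means_chronic)

round(c(spread_healthy, spread_chronic, grand_healthy, grand_chronic), 2)
```

The location means span 2.45 points in Medical1 but only 1.30 points in Medical2, and the grand means reproduce the 6.87 and 14.57 shown in the overall summary.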
Optional Visuals
ggplot(m1, aes(Location, Score, fill = Location)) +
  geom_boxplot() +
  labs(title = "Medical1 (Healthy): Depression by Location", y = "Depression Score", x = "") +
  theme_minimal() + theme(legend.position = "none")

ggplot(m2, aes(Location, Score, fill = Location)) +
  geom_boxplot() +
  labs(title = "Medical2 (Chronic): Depression by Location", y = "Depression Score", x = "") +
  theme_minimal() + theme(legend.position = "none")

Q2. One-Way ANOVA with Hypotheses & Conclusions
Hypotheses (each dataset):
- \(H_0\): \(\mu_{\text{Florida}} = \mu_{\text{New York}} = \mu_{\text{North Carolina}}\)
- \(H_A\): At least one location mean differs.
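As a sketch of the decision rule behind these hypotheses, the critical value of the F distribution can be computed in base R. This assumes a significance level of 0.05 (the threshold the report later applies to the Tukey comparisons); the degrees of freedom follow from 3 locations and 60 participants per dataset.

```r
alpha   <- 0.05  # assumed significance level
k       <- 3     # number of locations
n_total <- 60    # participants per dataset

df_between <- k - 1        # numerator degrees of freedom
df_within  <- n_total - k  # denominator degrees of freedom

# Reject H0 when the observed F statistic exceeds this critical value
f_crit <- qf(1 - alpha, df_between, df_within)
f_crit
```

Under these assumptions the critical value is about 3.16, so an observed F above that threshold leads to rejecting \(H_0\).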
Medical1 (Healthy)
fit1 <- aov(Score ~ Location, data = m1)
# anova() returns a plain ANOVA table and avoids problems with tidying summary(fit1) via broom
aov1 <- anova(fit1)
aov1 %>% as.data.frame() %>%
  kable(caption = "ANOVA Table — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)
ANOVA Table — Medical1 (Healthy)
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Location | 2 | 61.0333 | 30.5167 | 5.2409 | 0.0081 |
| Residuals | 57 | 331.9000 | 5.8228 | NA | NA |
# Extract key values for inline reporting
df1_num <- aov1$Df[1]; df1_den <- aov1$Df[2]
F1 <- aov1$`F value`[1]; p1 <- aov1$`Pr(>F)`[1]
Conclusion (Medical1). The one-way ANOVA was significant, F(2, 57) = 5.241, p = 0.00814. At the 0.05 level we reject \(H_0\): mean depression scores differ by location among participants in reasonably good health.
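The reported F statistic and p-value can be recovered from the ANOVA table alone, which is a useful sanity check on the inline values. The sketch below uses the sums of squares printed in the Medical1 table; the eta squared effect size is an addition for illustration and is not part of the original analysis.

```r
# Values copied from the Medical1 ANOVA table
ss_location  <- 61.0333
ss_residuals <- 331.9000
df_location  <- 2
df_residuals <- 57

# Rebuild the F statistic from the two mean squares
f_stat <- (ss_location / df_location) / (ss_residuals / df_residuals)

# Upper-tail probability of the F distribution gives the p-value
p_val <- pf(f_stat, df_location, df_residuals, lower.tail = FALSE)

# Eta squared: share of total variation in scores associated with location
eta_sq <- ss_location / (ss_location + ss_residuals)

sprintf("F(%d, %d) = %.3f, p = %.4f, eta^2 = %.3f",
        df_location, df_residuals, f_stat, p_val, eta_sq)
```

On these numbers, location accounts for roughly 15.5% of the total variation in Medical1 scores.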
Assumption Checks
lev1 <- leveneTest(Score ~ Location, data = m1)
as.data.frame(lev1) %>%
  kable(caption = "Levene's Test — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Levene’s Test — Medical1 (Healthy)
| | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 2 | 1.2547 | 0.2929 |
| | 57 | NA | NA |
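m1 %>% group_by(Location) %>%
  summarise(W = shapiro.test(Score)$statistic,
            p.value = shapiro.test(Score)$p.value) %>%
  kable(caption = "Shapiro–Wilk by Location — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Shapiro–Wilk by Location — Medical1 (Healthy)
| Location | W | p.value |
|---|---|---|
| Florida | 0.9372 | 0.2119 |
| New York | 0.9336 | 0.1809 |
| North Carolina | 0.9278 | 0.1398 |
Assumption summary (Medical1). Neither check signals a violation at the 0.05 level: Levene's test does not reject equal variances (p = 0.2929), and Shapiro–Wilk does not reject normality in any location (smallest p = 0.1398).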
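Had Levene's test rejected equal variances, one common fallback is Welch's one-way test, available in base R as `oneway.test()`. The sketch below runs it on simulated scores; the `sim` data frame is hypothetical and generated only so the example is self-contained. It is not part of the original analysis and does not use the Wentworth data.

```r
set.seed(1)  # reproducible simulated data, not the Wentworth scores

# Hypothetical scores: three locations, 20 participants each, unequal spreads
sim <- data.frame(
  Location = factor(rep(c("Florida", "New York", "North Carolina"), each = 20)),
  Score = c(rnorm(20, mean = 6, sd = 2),
            rnorm(20, mean = 8, sd = 4),
            rnorm(20, mean = 7, sd = 3))
)

# Welch's one-way test: does not assume equal variances across groups
welch <- oneway.test(Score ~ Location, data = sim, var.equal = FALSE)

welch$statistic  # Welch F statistic
welch$parameter  # numerator df and (fractional) denominator df
welch$p.value
```

The denominator degrees of freedom are adjusted downward from 57 to reflect the unequal variances, which is why they are generally not a whole number.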
Medical2 (Chronic)
fit2 <- aov(Score ~ Location, data = m2)
aov2 <- anova(fit2)
aov2 %>% as.data.frame() %>%
  kable(caption = "ANOVA Table — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)
ANOVA Table — Medical2 (Chronic)
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Location | 2 | 17.0333 | 8.5167 | 0.7142 | 0.4939 |
| Residuals | 57 | 679.7000 | 11.9246 | NA | NA |
df2_num <- aov2$Df[1]; df2_den <- aov2$Df[2]
F2 <- aov2$`F value`[1]; p2 <- aov2$`Pr(>F)`[1]
Conclusion (Medical2). The one-way ANOVA was not significant, F(2, 57) = 0.714, p = 0.494. We fail to reject \(H_0\): the data provide no evidence that mean depression differs by location among participants with chronic conditions.
Assumption Checks
lev2 <- leveneTest(Score ~ Location, data = m2)
as.data.frame(lev2) %>%
  kable(caption = "Levene's Test — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Levene’s Test — Medical2 (Chronic)
| | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 2 | 0.7734 | 0.4662 |
| | 57 | NA | NA |
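m2 %>% group_by(Location) %>%
  summarise(W = shapiro.test(Score)$statistic,
            p.value = shapiro.test(Score)$p.value) %>%
  kable(caption = "Shapiro–Wilk by Location — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Shapiro–Wilk by Location — Medical2 (Chronic)
| Location | W | p.value |
|---|---|---|
| Florida | 0.9686 | 0.7246 |
| New York | 0.9592 | 0.5272 |
| North Carolina | 0.9759 | 0.8702 |
Assumption summary (Medical2). Neither check signals a violation at the 0.05 level: Levene's test does not reject equal variances (p = 0.4662), and Shapiro–Wilk does not reject normality in any location (smallest p = 0.5272).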
Q3. Inference on Individual Treatment Means (Tukey HSD)
- Medical1: Appropriate (ANOVA significant).
- Medical2: Not warranted; the omnibus ANOVA was not significant.
tukey1 <- TukeyHSD(fit1)
tukey1_df <- broom::tidy(tukey1)
tukey1_df %>%
  kable(caption = "Tukey HSD Pairwise Comparisons — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Tukey HSD Pairwise Comparisons — Medical1 (Healthy)
| term | contrast | null.value | estimate | conf.low | conf.high | adj.p.value |
|---|---|---|---|---|---|---|
| Location | New York-Florida | 0 | 2.45 | 0.6137 | 4.2863 | 0.0061 |
| Location | North Carolina-Florida | 0 | 1.50 | -0.3363 | 3.3363 | 0.1301 |
| Location | North Carolina-New York | 0 | -0.95 | -2.7863 | 0.8863 | 0.4320 |
sig_pairs <- subset(tukey1_df, adj.p.value < 0.05)
sig_pairs
#> # A tibble: 1 × 7
#> term contrast null.value estimate conf.low conf.high adj.p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Location New York-Florida 0 2.45 0.614 4.29 0.00608
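Post-hoc conclusion (Medical1). Tukey HSD indicates that only the New York-Florida difference is significant: the New York mean exceeds the Florida mean by 2.45 points (95% CI 0.61 to 4.29, adjusted p = 0.0061). The North Carolina-Florida and North Carolina-New York differences are not significant (adjusted p = 0.1301 and 0.4320).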
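The width of the Tukey intervals can be reproduced from the Medical1 ANOVA table. This sketch assumes the default 95% family-wise confidence level of `TukeyHSD()` and equal group sizes; the residual mean square and group size are copied from the tables above, and the variable names are illustrative.

```r
mse     <- 5.8228  # residual mean square from the Medical1 ANOVA table
n_group <- 20      # participants per location
k       <- 3       # number of locations
df_res  <- 57      # residual degrees of freedom

# Studentized range quantile for a 95% family-wise confidence level
q_crit <- qtukey(0.95, nmeans = k, df = df_res)

# Half-width of each pairwise interval when all groups have the same size
hsd <- q_crit * sqrt(mse / n_group)

# Interval for New York - Florida, whose estimated difference is 2.45
c(lower = 2.45 - hsd, upper = 2.45 + hsd)
```

Any pair of location means that differs by more than `hsd` (about 1.84 points here) is significant, which is why only the 2.45-point New York-Florida gap clears the bar while the 1.50 and 0.95 gaps do not.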
Submission Notes
This HTML report (RPubs-friendly) interleaves narrative with code
to satisfy the instruction to include answers and R code together.
If a PDF is required, install TinyTeX and knit to PDF later:
install.packages("tinytex"); tinytex::install_tinytex()
Appendix: Session Info
sessionInfo()
#> R version 4.2.3 (2023-03-15 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] kableExtra_1.4.0 knitr_1.45 broom_1.0.5 car_3.1-2
#> [5] carData_3.0-5 ggplot2_3.5.0 dplyr_1.1.4 tidyr_1.3.0
#> [9] readxl_1.4.3
#>
#> loaded via a namespace (and not attached):
#> [1] highr_0.10 cellranger_1.1.0 pillar_1.9.0 bslib_0.6.1
#> [5] compiler_4.2.3 jquerylib_0.1.4 tools_4.2.3 digest_0.6.34
#> [9] viridisLite_0.4.2 jsonlite_1.8.8 evaluate_0.23 lifecycle_1.0.3
#> [13] tibble_3.2.1 gtable_0.3.4 pkgconfig_2.0.3 rlang_1.1.3
#> [17] cli_3.6.1 rstudioapi_0.15.0 yaml_2.3.8 xfun_0.41
#> [21] fastmap_1.1.1 xml2_1.3.6 stringr_1.5.1 withr_2.5.0
#> [25] systemfonts_1.0.5 generics_0.1.3 vctrs_0.6.5 sass_0.4.8
#> [29] grid_4.2.3 tidyselect_1.2.0 svglite_2.1.3 glue_1.6.2
#> [33] R6_2.5.1 fansi_1.0.4 rmarkdown_2.25 farver_2.1.1
#> [37] purrr_1.0.2 magrittr_2.0.3 backports_1.4.1 scales_1.3.0
#> [41] htmltools_0.5.7 abind_1.4-5 colorspace_2.1-0 labeling_0.4.3
#> [45] utf8_1.2.3 stringi_1.8.3 munsell_0.5.0 cachem_1.0.8