1 Introduction
2 Data Import & Reshaping
3 Q1. Descriptive Statistics & Preliminary Observations
- 3.1 Optional Visuals
4 Q2. One-Way ANOVA with Hypotheses & Conclusions
- 4.1 Medical1 (Healthy)
- 4.2 Medical2 (Chronic)
5 Q3. Inference on Individual Treatment Means (Tukey HSD)
6 Submission Notes
7 Appendix: Session Info

1 Introduction

This report analyzes two datasets of individuals aged 65+ from the Wentworth Medical Center to investigate whether geographic location (Florida, New York, North Carolina) is associated with depression scores, where a higher score indicates a higher level of depression.

Medical1 (Healthy): 60 participants in reasonably good health (20 per location).
Medical2 (Chronic): 60 participants with a chronic condition (20 per location).

Per assignment requirements, we provide:
1) Descriptive statistics and preliminary observations,
2) One-way ANOVA for each dataset with hypotheses and conclusions,
3) Inferences about individual treatment means (Tukey HSD) where appropriate.
We interleave narrative with code for reproducibility and to allow easy upload to RPubs (HTML output only; no LaTeX needed).

2 Data Import & Reshaping

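# Packages used throughout the report
library(readxl); library(dplyr); library(tidyr)
library(ggplot2); library(car); library(broom)
library(knitr); library(kableExtra)

# File paths come from the document's params; change them via "Knit with Parameters…" if needed.
med1 <- read_excel(params$medical1_path, sheet = "Data")
med2 <- read_excel(params$medical2_path, sheet = "Data")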

to_long <- function(df, label) {
  df %>%
    pivot_longer(cols = everything(),
                 names_to = "Location",
                 values_to = "Score") %>%
    mutate(Location = factor(Location),
           Study = label)
}

m1 <- to_long(med1, "Healthy")
m2 <- to_long(med2, "Chronic")

head(m1); head(m2)

#> # A tibble: 6 × 3
#>   Location       Score Study  
#>   <fct>          <dbl> <chr>  
#> 1 Florida            3 Healthy
#> 2 New York           8 Healthy
#> 3 North Carolina    10 Healthy
#> 4 Florida            7 Healthy
#> 5 New York          11 Healthy
#> 6 North Carolina     7 Healthy

#> # A tibble: 6 × 3
#>   Location       Score Study  
#>   <fct>          <dbl> <chr>  
#> 1 Florida           13 Chronic
#> 2 New York          14 Chronic
#> 3 North Carolina    10 Chronic
#> 4 Florida           12 Chronic
#> 5 New York           9 Chronic
#> 6 North Carolina    12 Chronic
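The reshape above can be checked on a tiny example. The sketch below assumes the Excel sheet is laid out wide, with one column per location; the six values are the first two rows of Medical1 as implied by the `head(m1)` output, and the name `toy_wide` is hypothetical.

```r
library(tidyr)
library(dplyr)

# Assumed wide layout: one column per location (values taken from head(m1) above).
toy_wide <- data.frame(
  "Florida"        = c(3, 7),
  "New York"       = c(8, 11),
  "North Carolina" = c(10, 7),
  check.names = FALSE  # keep the spaces in the column names
)

# Same reshape as to_long(): every column becomes a (Location, Score) pair.
toy_long <- toy_wide %>%
  pivot_longer(cols = everything(),
               names_to = "Location",
               values_to = "Score") %>%
  mutate(Location = factor(Location))

print(toy_long)
table(toy_long$Location)  # each location should appear once per original row
```

Because `pivot_longer()` walks the data row by row, the long table cycles through the locations in column order, which is why `head(m1)` alternates Florida, New York, North Carolina.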

3 Q1. Descriptive Statistics & Preliminary Observations

desc_tbl <- function(df) {
  df %>%
    group_by(Location) %>%
    summarise(
      n = n(),
      mean = mean(Score),
      sd = sd(Score),
      min = min(Score),
      median = median(Score),
      max = max(Score),
      .groups = "drop"
    )
}

desc1 <- desc_tbl(m1)
desc2 <- desc_tbl(m2)

desc1 %>% 
  kable(caption = "Medical1 (Healthy): Descriptive Statistics by Location", digits = 2) %>%
  kable_styling(full_width = FALSE)

Medical1 (Healthy): Descriptive Statistics by Location
Location	n	mean	sd	min	median	max
Florida	20	5.55	2.14	2	6.0	9
New York	20	8.00	2.20	4	8.0	13
North Carolina	20	7.05	2.84	3	7.5	12

desc2 %>% 
  kable(caption = "Medical2 (Chronic): Descriptive Statistics by Location", digits = 2) %>%
  kable_styling(full_width = FALSE)

Medical2 (Chronic): Descriptive Statistics by Location
Location	n	mean	sd	min	median	max
Florida	20	14.50	3.17	9	14.5	21
New York	20	15.25	4.13	9	14.5	24
North Carolina	20	13.95	2.95	8	14.0	19

overall <- bind_rows(
  m1 %>% summarise(Study = "Medical1 (Healthy)",
                   n = n(),
                   mean = mean(Score),
                   sd = sd(Score),
                   min = min(Score),
                   median = median(Score),
                   max = max(Score)),
  m2 %>% summarise(Study = "Medical2 (Chronic)",
                   n = n(),
                   mean = mean(Score),
                   sd = sd(Score),
                   min = min(Score),
                   median = median(Score),
                   max = max(Score))
)

overall %>% 
  kable(caption = "Overall Summary by Study", digits = 2) %>%
  kable_styling(full_width = FALSE)

Overall Summary by Study
Study	n	mean	sd	min	median	max
Medical1 (Healthy)	60	6.87	2.58	2	7	13
Medical2 (Chronic)	60	14.57	3.44	8	14	24

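Preliminary observations. In Medical1, New York has the highest mean (8.00) and Florida the lowest (5.55), a gap of 2.45 points. In Medical2, every location mean (13.95 to 15.25) exceeds every Medical1 mean, consistent with chronic illness, and the location means lie closer together (range 1.30 versus 2.45) while the within-location spread is larger (SDs of 2.95 to 4.13 versus 2.14 to 2.84).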

3.1 Optional Visuals

ggplot(m1, aes(Location, Score, fill = Location)) +
  geom_boxplot() +
  labs(title = "Medical1 (Healthy): Depression by Location", y = "Depression Score", x = "") +
  theme_minimal() + theme(legend.position = "none")

ggplot(m2, aes(Location, Score, fill = Location)) +
  geom_boxplot() +
  labs(title = "Medical2 (Chronic): Depression by Location", y = "Depression Score", x = "") +
  theme_minimal() + theme(legend.position = "none")

4 Q2. One-Way ANOVA with Hypotheses & Conclusions

Hypotheses (each dataset):
- \(H_0\): \(\mu_{\text{Florida}} = \mu_{\text{New York}} = \mu_{\text{North Carolina}}\)
- \(H_A\): At least one location mean differs.
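For reference, the test statistic behind these hypotheses can be written in terms of the group summaries. The notation is introduced here for illustration and is not taken from the assignment: \(k\) is the number of locations, \(n_i\), \(\bar{y}_i\), and \(s_i\) are the size, sample mean, and sample standard deviation of group \(i\), \(N = \sum_i n_i\), and \(\bar{y}\) is the grand mean.

```latex
F = \frac{\mathrm{MSB}}{\mathrm{MSW}}
  = \frac{\displaystyle \sum_{i=1}^{k} n_i \,(\bar{y}_i - \bar{y})^2 \,/\, (k - 1)}
         {\displaystyle \sum_{i=1}^{k} (n_i - 1)\, s_i^2 \,/\, (N - k)},
\qquad
F \sim F_{k-1,\; N-k} \ \text{under } H_0 .
```

With \(k = 3\) locations and \(N = 60\) participants per dataset, the reference distribution is \(F_{2,\,57}\), matching the degrees of freedom in the ANOVA tables.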

4.1 Medical1 (Healthy)

fit1 <- aov(Score ~ Location, data = m1)

# anova() returns a table that converts cleanly to a data frame (avoids tidying summary(fit1) with broom)
aov1 <- anova(fit1)
aov1 %>% as.data.frame() %>%
  kable(caption = "ANOVA Table — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)

ANOVA Table — Medical1 (Healthy)
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
Location	2	61.0333	30.5167	5.2409	0.0081
Residuals	57	331.9000	5.8228	NA	NA

# Extract key values for inline reporting
df1_num <- aov1$Df[1]; df1_den <- aov1$Df[2]
F1 <- aov1$`F value`[1]; p1 <- aov1$`Pr(>F)`[1]

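Conclusion (Medical1). The one-way ANOVA was significant, F(2, 57) = 5.241, p = 0.00814. Because p < 0.05, we reject \(H_0\) and conclude that mean depression scores differ for at least one location among participants in good health.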
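As a cross-check, the F statistic can be rebuilt from the descriptive table alone. This is a sketch under the assumption that the printed means and standard deviations are accurate to two decimals, so the result is approximate rather than exact; the variable names are illustrative.

```r
# Group summaries copied from the Medical1 descriptive table (sd is rounded).
n <- c(Florida = 20, `New York` = 20, `North Carolina` = 20)
m <- c(5.55, 8.00, 7.05)
s <- c(2.14, 2.20, 2.84)

k <- length(m)   # number of groups
N <- sum(n)      # total sample size

grand <- sum(n * m) / N            # grand mean
ssb   <- sum(n * (m - grand)^2)    # between-group sum of squares
ssw   <- sum((n - 1) * s^2)        # within-group sum of squares (approximate)

F_approx <- (ssb / (k - 1)) / (ssw / (N - k))
p_approx <- pf(F_approx, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

round(c(SSB = ssb, SSW = ssw, F = F_approx, p = p_approx), 4)
```

The between-group sum of squares should agree with the table's 61.0333, and the F value should land close to the reported 5.2409; any small gap comes from the rounded standard deviations.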

4.1.1 Assumption Checks

lev1 <- leveneTest(Score ~ Location, data = m1)
as.data.frame(lev1) %>%
  kable(caption = "Levene's Test — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)

Levene’s Test — Medical1 (Healthy)
	Df	F value	Pr(>F)
group	2	1.2547	0.2929
	57	NA	NA
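To make the Levene output easier to read: with its default median centering, `car::leveneTest()` amounts to a one-way ANOVA on each observation's absolute deviation from its group median. The sketch below shows that idea in base R. It uses R's built-in `PlantGrowth` data as a stand-in, since the Medical1 file is not bundled here, so the numbers it prints are unrelated to this report.

```r
# PlantGrowth ships with R (30 plants in 3 groups); it stands in for m1 here.
dat <- PlantGrowth

# Absolute deviation of each observation from its own group median.
dat$abs_dev <- abs(dat$weight - ave(dat$weight, dat$group, FUN = median))

# Median-centered Levene's test: a one-way ANOVA on those deviations.
lev_by_hand <- anova(lm(abs_dev ~ group, data = dat))
lev_by_hand
```

A large p-value in the `group` row means the spread of scores is similar across groups, which is the equal-variance condition the ANOVA relies on.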

m1 %>% group_by(Location) %>%
  summarise(W = shapiro.test(Score)$statistic,
            p.value = shapiro.test(Score)$p.value) %>%
  kable(caption = "Shapiro–Wilk by Location — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)

Shapiro–Wilk by Location — Medical1 (Healthy)
Location	W	p.value
Florida	0.9372	0.2119
New York	0.9336	0.1809
North Carolina	0.9278	0.1398

Interpretation (Medical1). Levene's test (p = 0.2929) gives no evidence of unequal variances, and the Shapiro–Wilk p-values (0.1398 to 0.2119) give no evidence against normality in any location, so the ANOVA assumptions appear reasonable for this dataset.

4.2 Medical2 (Chronic)

fit2 <- aov(Score ~ Location, data = m2)
aov2 <- anova(fit2)
aov2 %>% as.data.frame() %>%
  kable(caption = "ANOVA Table — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)

ANOVA Table — Medical2 (Chronic)
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
Location	2	17.0333	8.5167	0.7142	0.4939
Residuals	57	679.7000	11.9246	NA	NA

df2_num <- aov2$Df[1]; df2_den <- aov2$Df[2]
F2 <- aov2$`F value`[1]; p2 <- aov2$`Pr(>F)`[1]

Conclusion (Medical2). The one-way ANOVA was not significant, F(2, 57) = 0.714, p = 0.494. We fail to reject \(H_0\): the data provide no evidence that mean depression differs by location among participants with chronic conditions.
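The two conclusions can be compared side by side by recomputing the p-values from the reported F statistics. This is a minimal sketch: the F values and degrees of freedom are copied from the ANOVA tables, and the 0.05 cutoff is an assumption (it is the threshold the report applies to the Tukey-adjusted p-values).

```r
# F statistics as reported in the two ANOVA tables; both use df = (2, 57).
f_stat <- c(Medical1 = 5.2409, Medical2 = 0.7142)

# Upper-tail probability of the F(2, 57) distribution.
p_val <- pf(f_stat, df1 = 2, df2 = 57, lower.tail = FALSE)

alpha <- 0.05  # assumed significance level
data.frame(F = f_stat,
           p = round(p_val, 4),
           reject_H0 = p_val < alpha)
```

Only Medical1 falls below the cutoff, which is why post-hoc comparisons are pursued for that dataset alone.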

4.2.1 Assumption Checks

lev2 <- leveneTest(Score ~ Location, data = m2)
as.data.frame(lev2) %>%
  kable(caption = "Levene's Test — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)

Levene’s Test — Medical2 (Chronic)
	Df	F value	Pr(>F)
group	2	0.7734	0.4662
	57	NA	NA

m2 %>% group_by(Location) %>%
  summarise(W = shapiro.test(Score)$statistic,
            p.value = shapiro.test(Score)$p.value) %>%
  kable(caption = "Shapiro–Wilk by Location — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)

Shapiro–Wilk by Location — Medical2 (Chronic)
Location	W	p.value
Florida	0.9686	0.7246
New York	0.9592	0.5272
North Carolina	0.9759	0.8702

Interpretation (Medical2). Levene's test (p = 0.4662) and the Shapiro–Wilk tests (p-values from 0.5272 to 0.8702) give no evidence against equal variances or within-location normality, so the ANOVA assumptions appear reasonable for this dataset as well.

5 Q3. Inference on Individual Treatment Means (Tukey HSD)

Medical1: Appropriate (ANOVA significant).
Medical2: Not warranted; the omnibus ANOVA was not significant.

tukey1 <- TukeyHSD(fit1)
tukey1_df <- broom::tidy(tukey1)

tukey1_df %>%
  kable(caption = "Tukey HSD Pairwise Comparisons — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)

Tukey HSD Pairwise Comparisons — Medical1 (Healthy)
term	contrast	estimate	conf.low	conf.high	adj.p.value
Location	New York-Florida	2.45	0.6137	4.2863	0.0061
Location	North Carolina-Florida	1.50	-0.3363	3.3363	0.1301
Location	North Carolina-New York	-0.95	-2.7863	0.8863	0.4320

sig_pairs <- subset(tukey1_df, adj.p.value < 0.05)
sig_pairs

#> # A tibble: 1 × 7
#>   term     contrast         null.value estimate conf.low conf.high adj.p.value
#>   <chr>    <chr>                 <dbl>    <dbl>    <dbl>     <dbl>       <dbl>
#> 1 Location New York-Florida          0     2.45    0.614      4.29     0.00608

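Post-hoc conclusion (Medical1). Tukey HSD indicates that only the New York versus Florida difference is significant: New York's mean is 2.45 points higher (95% CI 0.61 to 4.29, adjusted p = 0.006). The North Carolina versus Florida (adjusted p = 0.130) and North Carolina versus New York (adjusted p = 0.432) differences are not significant, and both intervals include zero.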
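The Tukey interval for New York versus Florida can be reconstructed from quantities already reported. The sketch below assumes the balanced design (20 per location) and takes the residual mean square and degrees of freedom from the Medical1 ANOVA table; the variable names are illustrative.

```r
mse    <- 5.8228   # residual mean square from the Medical1 ANOVA table
df_res <- 57       # residual degrees of freedom
n_per  <- 20       # participants per location
k      <- 3        # number of locations

# Standard error on the studentized-range scale for equal group sizes.
se <- sqrt(mse / n_per)

# Half-width of the 95% family-wise interval.
half_width <- qtukey(0.95, nmeans = k, df = df_res) * se

diff_ny_fl <- 8.00 - 5.55                    # New York mean minus Florida mean
ci <- diff_ny_fl + c(-1, 1) * half_width     # Tukey confidence interval

# Adjusted p-value from the studentized range distribution.
p_adj <- ptukey(abs(diff_ny_fl) / se, nmeans = k, df = df_res,
                lower.tail = FALSE)

round(c(lower = ci[1], upper = ci[2], p_adj = p_adj), 4)
```

Because every group has the same size, all three comparisons share this half-width, so a pairwise difference is significant only when its absolute value exceeds it.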

6 Submission Notes

This HTML report (RPubs-friendly) interleaves narrative with code to satisfy the instruction to include answers and R code together.
If a PDF is required, install TinyTeX and knit to PDF later:
```
install.packages("tinytex"); tinytex::install_tinytex()
```

7 Appendix: Session Info

sessionInfo()

#> R version 4.2.3 (2023-03-15 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] kableExtra_1.4.0 knitr_1.45       broom_1.0.5      car_3.1-2       
#> [5] carData_3.0-5    ggplot2_3.5.0    dplyr_1.1.4      tidyr_1.3.0     
#> [9] readxl_1.4.3    
#> 
#> loaded via a namespace (and not attached):
#>  [1] highr_0.10        cellranger_1.1.0  pillar_1.9.0      bslib_0.6.1      
#>  [5] compiler_4.2.3    jquerylib_0.1.4   tools_4.2.3       digest_0.6.34    
#>  [9] viridisLite_0.4.2 jsonlite_1.8.8    evaluate_0.23     lifecycle_1.0.3  
#> [13] tibble_3.2.1      gtable_0.3.4      pkgconfig_2.0.3   rlang_1.1.3      
#> [17] cli_3.6.1         rstudioapi_0.15.0 yaml_2.3.8        xfun_0.41        
#> [21] fastmap_1.1.1     xml2_1.3.6        stringr_1.5.1     withr_2.5.0      
#> [25] systemfonts_1.0.5 generics_0.1.3    vctrs_0.6.5       sass_0.4.8       
#> [29] grid_4.2.3        tidyselect_1.2.0  svglite_2.1.3     glue_1.6.2       
#> [33] R6_2.5.1          fansi_1.0.4       rmarkdown_2.25    farver_2.1.1     
#> [37] purrr_1.0.2       magrittr_2.0.3    backports_1.4.1   scales_1.3.0     
#> [41] htmltools_0.5.7   abind_1.4-5       colorspace_2.1-0  labeling_0.4.3   
#> [45] utf8_1.2.3        stringi_1.8.3     munsell_0.5.0     cachem_1.0.8

Assignment #2: ANOVA

Lisa Ovalle

October 05, 2025