1 Introduction

This report analyzes two datasets of individuals aged 65+ from the Wentworth Medical Center, Medical1 (healthy individuals) and Medical2 (individuals with chronic health conditions), to investigate whether geographic location (Florida, New York, North Carolina) is associated with depression scores (higher scores indicate more severe depression).

Per assignment requirements, we provide:
1) Descriptive statistics and preliminary observations,
2) One-way ANOVA for each dataset with hypotheses and conclusions,
3) Inferences about individual treatment means (Tukey HSD) where appropriate.
We interleave narrative with code for reproducibility and to allow easy upload to RPubs (HTML output only; no LaTeX needed).

2 Data Import & Reshaping

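# Packages used throughout (all are listed as attached in the session info appendix)
library(readxl); library(tidyr); library(dplyr); library(ggplot2)
library(car); library(broom); library(knitr); library(kableExtra)

# Update paths with Knit with Parameters… if needed:
med1 <- read_excel(params$medical1_path, sheet = "Data")
med2 <- read_excel(params$medical2_path, sheet = "Data")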

to_long <- function(df, label) {
  df %>%
    pivot_longer(cols = everything(),
                 names_to = "Location",
                 values_to = "Score") %>%
    mutate(Location = factor(Location),
           Study = label)
}

m1 <- to_long(med1, "Healthy")
m2 <- to_long(med2, "Chronic")

head(m1); head(m2)
#> # A tibble: 6 × 3
#>   Location       Score Study  
#>   <fct>          <dbl> <chr>  
#> 1 Florida            3 Healthy
#> 2 New York           8 Healthy
#> 3 North Carolina    10 Healthy
#> 4 Florida            7 Healthy
#> 5 New York          11 Healthy
#> 6 North Carolina     7 Healthy
#> # A tibble: 6 × 3
#>   Location       Score Study  
#>   <fct>          <dbl> <chr>  
#> 1 Florida           13 Chronic
#> 2 New York          14 Chronic
#> 3 North Carolina    10 Chronic
#> 4 Florida           12 Chronic
#> 5 New York           9 Chronic
#> 6 North Carolina    12 Chronic

3 Q1. Descriptive Statistics & Preliminary Observations

desc_tbl <- function(df) {
  df %>%
    group_by(Location) %>%
    summarise(
      n = n(),
      mean = mean(Score),
      sd = sd(Score),
      min = min(Score),
      median = median(Score),
      max = max(Score),
      .groups = "drop"
    )
}

desc1 <- desc_tbl(m1)
desc2 <- desc_tbl(m2)

desc1 %>% 
  kable(caption = "Medical1 (Healthy): Descriptive Statistics by Location", digits = 2) %>%
  kable_styling(full_width = FALSE)
Medical1 (Healthy): Descriptive Statistics by Location
Location          n   mean    sd  min  median  max
Florida          20   5.55  2.14    2     6.0    9
New York         20   8.00  2.20    4     8.0   13
North Carolina   20   7.05  2.84    3     7.5   12
desc2 %>% 
  kable(caption = "Medical2 (Chronic): Descriptive Statistics by Location", digits = 2) %>%
  kable_styling(full_width = FALSE)
Medical2 (Chronic): Descriptive Statistics by Location
Location          n   mean    sd  min  median  max
Florida          20  14.50  3.17    9    14.5   21
New York         20  15.25  4.13    9    14.5   24
North Carolina   20  13.95  2.95    8    14.0   19
overall <- bind_rows(
  m1 %>% summarise(Study = "Medical1 (Healthy)",
                   n = n(),
                   mean = mean(Score),
                   sd = sd(Score),
                   min = min(Score),
                   median = median(Score),
                   max = max(Score)),
  m2 %>% summarise(Study = "Medical2 (Chronic)",
                   n = n(),
                   mean = mean(Score),
                   sd = sd(Score),
                   min = min(Score),
                   median = median(Score),
                   max = max(Score))
)

overall %>% 
  kable(caption = "Overall Summary by Study", digits = 2) %>%
  kable_styling(full_width = FALSE)
Overall Summary by Study
Study                 n   mean    sd  min  median  max
Medical1 (Healthy)   60   6.87  2.58    2       7   13
Medical2 (Chronic)   60  14.57  3.44    8      14   24

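Preliminary observations. In Medical1, New York has the highest mean (8.00) and Florida the lowest (5.55), a gap of 2.45 points, with North Carolina in between (7.05). In Medical2, every location mean is far higher than in Medical1 (13.95 to 15.25 versus 5.55 to 8.00), consistent with the presence of chronic conditions. The Medical2 means are also closer together (a range of 1.30 points, with New York highest and North Carolina lowest), while the within-location SDs are larger (2.95 to 4.13 versus 2.14 to 2.84), so a location effect looks less likely to be detectable in Medical2.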

3.1 Optional Visuals

ggplot(m1, aes(Location, Score, fill = Location)) +
  geom_boxplot() +
  labs(title = "Medical1 (Healthy): Depression by Location", y = "Depression Score", x = "") +
  theme_minimal() + theme(legend.position = "none")

ggplot(m2, aes(Location, Score, fill = Location)) +
  geom_boxplot() +
  labs(title = "Medical2 (Chronic): Depression by Location", y = "Depression Score", x = "") +
  theme_minimal() + theme(legend.position = "none")

4 Q2. One-Way ANOVA with Hypotheses & Conclusions

Hypotheses (each dataset):
- \(H_0\): \(\mu_{\text{Florida}} = \mu_{\text{New York}} = \mu_{\text{North Carolina}}\)
- \(H_A\): At least one location mean differs.
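
Both F tests below compare between-location variability with within-location variability. In our own notation (not the assignment's), with \(k\) locations, \(N\) participants, group sizes \(n_i\), group means \(\bar{y}_i\), group SDs \(s_i\), and grand mean \(\bar{y}\), the statistic and its Medical1 value from the ANOVA table in 4.1 are:

```latex
F = \frac{\mathrm{MS}_{\text{Location}}}{\mathrm{MS}_{\text{Residual}}}
  = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 / (k - 1)}
         {\sum_{i=1}^{k} (n_i - 1)\, s_i^2 / (N - k)},
\qquad F \sim F_{k-1,\,N-k} \text{ under } H_0

\text{Medical1 } (k = 3,\ N = 60):\quad
F = \frac{61.0333 / 2}{331.9000 / 57} = \frac{30.5167}{5.8228} \approx 5.241
```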

4.1 Medical1 (Healthy)

fit1 <- aov(Score ~ Location, data = m1)

# anova() returns a data-frame-style table that kable() accepts directly;
# summary(fit1) returns a list of tables, which is awkward to tidy
aov1 <- anova(fit1)
aov1 %>% as.data.frame() %>%
  kable(caption = "ANOVA Table — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)
ANOVA Table — Medical1 (Healthy)
            Df    Sum Sq  Mean Sq  F value  Pr(>F)
Location     2   61.0333  30.5167   5.2409  0.0081
Residuals   57  331.9000   5.8228       NA      NA
# Extract key values for inline reporting
df1_num <- aov1$Df[1]; df1_den <- aov1$Df[2]
F1 <- aov1$`F value`[1]; p1 <- aov1$`Pr(>F)`[1]

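Conclusion (Medical1). The one-way ANOVA indicates a significant effect of location on depression, F(2, 57) = 5.241, p = 0.00814. Because p < 0.05, we reject \(H_0\) and conclude that at least one location's mean depression score differs among healthy participants.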
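
The values extracted into df1_num, df1_den, F1, and p1 are presumably substituted into the conclusion sentence with inline R code. A minimal sketch of such a formatter follows; the helper name report_F() and its formatting choices (three decimals for F, three significant digits for p via base R's format.pval()) are our assumptions, chosen to reproduce the strings shown in this report.

```r
# Hypothetical helper: format an ANOVA F test as "F(df1, df2) = F, p = p"
report_F <- function(df_num, df_den, F_value, p_value) {
  sprintf("F(%d, %d) = %.3f, p = %s",
          as.integer(df_num), as.integer(df_den), F_value,
          # format.pval() keeps 3 significant digits and prints a "<" bound for extremely small p
          format.pval(p_value, digits = 3))
}

# Usage with the Medical1 values as printed above
report_F(2, 57, 5.2409, 0.00814)
```

In the .Rmd source, an inline call such as r report_F(df1_num, df1_den, F1, p1) would then emit the statistics for the sentence, and the same helper works unchanged for Medical2.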

4.1.1 Assumption Checks

lev1 <- leveneTest(Score ~ Location, data = m1)
as.data.frame(lev1) %>%
  kable(caption = "Levene's Test — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Levene’s Test — Medical1 (Healthy)
        Df  F value  Pr(>F)
group    2   1.2547  0.2929
        57       NA      NA
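
car::leveneTest() centers each group on its median by default (center = median), which is the Brown–Forsythe variant of Levene's test: a one-way ANOVA on the absolute deviations of each score from its own group median. The base-R sketch below illustrates that computation. It uses R's built-in PlantGrowth data as a stand-in because the Wentworth files are not bundled here, and the function name bf_levene() is hypothetical.

```r
# Hypothetical helper: Brown-Forsythe (median-centered Levene) test in base R
bf_levene <- function(y, g) {
  g <- factor(g)
  # Absolute deviation of each observation from its own group's median
  z <- abs(y - ave(y, g, FUN = median))
  # One-way ANOVA on those deviations; its F test is the Levene statistic
  tab <- anova(lm(z ~ g))
  c(F = tab$`F value`[1], p = tab$`Pr(>F)`[1])
}

# Stand-in data: PlantGrowth (3 groups of 10) ships with base R
res <- bf_levene(PlantGrowth$weight, PlantGrowth$group)
res
```

Because both use median centering, bf_levene(m1$Score, m1$Location) should reproduce the F and p values in the Levene table above.
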
m1 %>% group_by(Location) %>%
  summarise(W = shapiro.test(Score)$statistic,
            p.value = shapiro.test(Score)$p.value) %>%
  kable(caption = "Shapiro–Wilk by Location — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Shapiro–Wilk by Location — Medical1 (Healthy)
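Location               W  p.value
Florida           0.9372   0.2119
New York          0.9336   0.1809
North Carolina    0.9278   0.1398

Assumption checks (Medical1). Levene's test (car's default median-centered version) is not significant, F(2, 57) = 1.255, p = 0.293, and all three Shapiro–Wilk p-values exceed 0.05 (smallest 0.140), so neither the equal-variance nor the normality assumption is contradicted at α = 0.05. With only 20 scores per location these tests have limited power, so this is an absence of evidence against the assumptions rather than proof that they hold.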

4.2 Medical2 (Chronic)

fit2 <- aov(Score ~ Location, data = m2)
aov2 <- anova(fit2)
aov2 %>% as.data.frame() %>%
  kable(caption = "ANOVA Table — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)
ANOVA Table — Medical2 (Chronic)
            Df    Sum Sq  Mean Sq  F value  Pr(>F)
Location     2   17.0333   8.5167   0.7142  0.4939
Residuals   57  679.7000  11.9246       NA      NA
df2_num <- aov2$Df[1]; df2_den <- aov2$Df[2]
F2 <- aov2$`F value`[1]; p2 <- aov2$`Pr(>F)`[1]

Conclusion (Medical2). The one-way ANOVA is not significant, F(2, 57) = 0.714, p = 0.494. Because p > 0.05, we fail to reject \(H_0\): the data do not provide evidence that mean depression differs by location among participants with chronic conditions.
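
The contrast between the two F tests can be traced to the group summaries in Section 3: the Medical2 means lie within 1.30 points of each other, while their within-location SDs are larger than in Medical1. The sketch below recomputes F from each study's n, mean, and SD. It uses the rounded values printed in the descriptive tables, so because the SDs are rounded to two decimals the results match the ANOVA tables only approximately. The helper name f_from_summary() is hypothetical.

```r
# Hypothetical helper: one-way ANOVA F test from per-group n, mean, and SD
f_from_summary <- function(n, m, s) {
  k <- length(n); N <- sum(n)
  grand <- sum(n * m) / N                        # weighted grand mean
  ms_between <- sum(n * (m - grand)^2) / (k - 1) # between-location mean square
  ms_within  <- sum((n - 1) * s^2) / (N - k)     # pooled within-location variance (MSE)
  F_value <- ms_between / ms_within
  c(F = F_value, p = pf(F_value, k - 1, N - k, lower.tail = FALSE))
}

# Rounded summaries from the descriptive tables (Florida, New York, North Carolina)
healthy_F <- f_from_summary(n = c(20, 20, 20), m = c(5.55, 8.00, 7.05),
                            s = c(2.14, 2.20, 2.84))
chronic_F <- f_from_summary(n = c(20, 20, 20), m = c(14.50, 15.25, 13.95),
                            s = c(3.17, 4.13, 2.95))
rbind(Medical1 = healthy_F, Medical2 = chronic_F)
```

For Medical2 the between-location mean square (about 8.5) is smaller than the pooled within-location variance (about 11.9), so F falls below 1; for Medical1 the between-location mean square (about 30.5) is roughly five times the pooled variance.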

4.2.1 Assumption Checks

lev2 <- leveneTest(Score ~ Location, data = m2)
as.data.frame(lev2) %>%
  kable(caption = "Levene's Test — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Levene’s Test — Medical2 (Chronic)
        Df  F value  Pr(>F)
group    2   0.7734  0.4662
        57       NA      NA
m2 %>% group_by(Location) %>%
  summarise(W = shapiro.test(Score)$statistic,
            p.value = shapiro.test(Score)$p.value) %>%
  kable(caption = "Shapiro–Wilk by Location — Medical2 (Chronic)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Shapiro–Wilk by Location — Medical2 (Chronic)
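Location               W  p.value
Florida           0.9686   0.7246
New York          0.9592   0.5272
North Carolina    0.9759   0.8702

Assumption checks (Medical2). Levene's test is not significant, F(2, 57) = 0.773, p = 0.466, and all three Shapiro–Wilk p-values exceed 0.05 (smallest 0.527), so the equal-variance and normality assumptions are not contradicted at α = 0.05.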
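
The Shapiro–Wilk chunks call shapiro.test() twice per group, once for W and once for the p-value. A sketch of a variant that runs the test once per group with base R's split() and lapply() is shown below; it is illustrated on the built-in PlantGrowth data, and the name shapiro_by_group() is hypothetical.

```r
# Hypothetical helper: Shapiro-Wilk test run once per group, results in one data frame
shapiro_by_group <- function(y, g) {
  tests <- lapply(split(y, g), shapiro.test)  # one htest object per group
  data.frame(
    group   = names(tests),
    W       = vapply(tests, function(t) unname(t$statistic), numeric(1)),
    p.value = vapply(tests, function(t) t$p.value, numeric(1)),
    row.names = NULL
  )
}

sw <- shapiro_by_group(PlantGrowth$weight, PlantGrowth$group)
sw
```

Since it runs the same test, shapiro_by_group(m2$Score, m2$Location) should give the W and p-values in the table above.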

5 Q3. Inference on Individual Treatment Means (Tukey HSD)

tukey1 <- TukeyHSD(fit1)
tukey1_df <- broom::tidy(tukey1)

tukey1_df %>%
  kable(caption = "Tukey HSD Pairwise Comparisons — Medical1 (Healthy)", digits = 4) %>%
  kable_styling(full_width = FALSE)
Tukey HSD Pairwise Comparisons — Medical1 (Healthy)
term      contrast                 null.value  estimate  conf.low  conf.high  adj.p.value
Location  New York-Florida                  0      2.45    0.6137     4.2863       0.0061
Location  North Carolina-Florida            0      1.50   -0.3363     3.3363       0.1301
Location  North Carolina-New York           0     -0.95   -2.7863     0.8863       0.4320
sig_pairs <- subset(tukey1_df, adj.p.value < 0.05)
sig_pairs
#> # A tibble: 1 × 7
#>   term     contrast         null.value estimate conf.low conf.high adj.p.value
#>   <chr>    <chr>                 <dbl>    <dbl>    <dbl>     <dbl>       <dbl>
#> 1 Location New York-Florida          0     2.45    0.614      4.29     0.00608

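Post-hoc conclusion (Medical1). Tukey HSD finds one significant pair: New York's mean exceeds Florida's by 2.45 points (95% family-wise CI 0.61 to 4.29, adjusted p = 0.006). The North Carolina versus Florida (adjusted p = 0.130) and North Carolina versus New York (adjusted p = 0.432) differences are not significant, and both intervals include 0. Tukey comparisons are not reported for Medical2 because its overall ANOVA was not significant.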
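
For a balanced design, TukeyHSD() builds each interval as the mean difference ± q × sqrt(MSE / n), where q is the upper studentized-range quantile for k groups and the residual df, and it takes the adjusted p-value from the same distribution. The sketch below reproduces the New York versus Florida row for Medical1 from the ANOVA table (MSE = 331.9 / 57) and the group means, using base R's qtukey() and ptukey(); the variable names are ours.

```r
# Values copied from the Medical1 tables: 3 locations, 20 per group, 57 residual df
k <- 3; n <- 20; df_res <- 57
mse <- 331.9 / df_res                 # residual mean square (about 5.8228)
diff_ny_fl <- 8.00 - 5.55             # New York mean minus Florida mean

se <- sqrt(mse / n)                   # scale used by Tukey HSD with equal group sizes
margin <- qtukey(0.95, nmeans = k, df = df_res) * se
ci <- c(lower = diff_ny_fl - margin, upper = diff_ny_fl + margin)

# Family-wise adjusted p-value from the studentized range distribution
p_adj <- ptukey(abs(diff_ny_fl) / se, nmeans = k, df = df_res, lower.tail = FALSE)

ci; p_adj
```

Because every pair shares the same margin in a balanced design, any Medical1 difference larger than about 1.84 points in absolute value is significant at the 5% family-wise level, which is why only the 2.45-point New York versus Florida gap qualifies.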

6 Submission Notes

7 Appendix: Session Info

sessionInfo()
#> R version 4.2.3 (2023-03-15 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] kableExtra_1.4.0 knitr_1.45       broom_1.0.5      car_3.1-2       
#> [5] carData_3.0-5    ggplot2_3.5.0    dplyr_1.1.4      tidyr_1.3.0     
#> [9] readxl_1.4.3    
#> 
#> loaded via a namespace (and not attached):
#>  [1] highr_0.10        cellranger_1.1.0  pillar_1.9.0      bslib_0.6.1      
#>  [5] compiler_4.2.3    jquerylib_0.1.4   tools_4.2.3       digest_0.6.34    
#>  [9] viridisLite_0.4.2 jsonlite_1.8.8    evaluate_0.23     lifecycle_1.0.3  
#> [13] tibble_3.2.1      gtable_0.3.4      pkgconfig_2.0.3   rlang_1.1.3      
#> [17] cli_3.6.1         rstudioapi_0.15.0 yaml_2.3.8        xfun_0.41        
#> [21] fastmap_1.1.1     xml2_1.3.6        stringr_1.5.1     withr_2.5.0      
#> [25] systemfonts_1.0.5 generics_0.1.3    vctrs_0.6.5       sass_0.4.8       
#> [29] grid_4.2.3        tidyselect_1.2.0  svglite_2.1.3     glue_1.6.2       
#> [33] R6_2.5.1          fansi_1.0.4       rmarkdown_2.25    farver_2.1.1     
#> [37] purrr_1.0.2       magrittr_2.0.3    backports_1.4.1   scales_1.3.0     
#> [41] htmltools_0.5.7   abind_1.4-5       colorspace_2.1-0  labeling_0.4.3   
#> [45] utf8_1.2.3        stringi_1.8.3     munsell_0.5.0     cachem_1.0.8