Part 1 - Introduction

Water quality in recreational swimming areas is an important environmental and public health concern. Contaminated water can expose swimmers to harmful microorganisms that may cause gastrointestinal illness, skin infections, and other health problems. Because of these risks, environmental agencies routinely monitor bacterial levels in beaches, lakes, rivers, and other recreational waters.

One commonly used bacterial indicator is enterococci bacteria. Enterococci are naturally found in the intestines of humans and animals, so elevated levels in water often suggest fecal contamination from sewage, stormwater runoff, wildlife waste, or failing wastewater systems. Since enterococci levels are strongly associated with swimmer illness, they are commonly used to determine whether water is safe for recreation.

Rainfall may strongly affect bacteria levels in water. During rain events, runoff can wash pet waste, sediments, sewage overflow, and other contaminants into nearby waterways. Heavy rainfall may therefore increase bacterial concentrations and reduce water quality.

The data used in this project combine water quality measurements with weather observations, matched by sampling date. The primary response variable (dependent variable) is enterococci bacteria concentration, measured in colony-forming units per 100 milliliters (CFU/100 mL). The main explanatory variable (independent variable) is rainfall amount, measured in millimeters (mm) of precipitation.

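# Load packages (tidyverse provides readr, dplyr, tidyr, and ggplot2; knitr provides kable)
library(tidyverse)
library(knitr)

# Import datasets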
water_quality <- readr::read_csv(
'https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-05-20/water_quality.csv'
)
weather <- readr::read_csv(
'https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-05-20/weather.csv'
)

# Include map of sampling locations
knitr::include_graphics("~/Desktop/BIO 320 - Lister/Final Project/BeachwatchMap.png")

Part 2 - Main Research Question

Is rainfall associated with higher enterococci bacteria levels in recreational swimming water?

Part 3 - Exploring the Data (Descriptive Statistics)

# Merge datasets by date
combined_data <- water_quality %>%
  left_join(weather, by = "date") %>%
  select(swim_site, date, enterococci_cfu_100ml,
         water_temperature_c, precipitation_mm) %>%
  drop_na(enterococci_cfu_100ml, precipitation_mm)
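
The merge above relies on the weather table having exactly one row per date. The sketch below, written in base R on small made-up tables (the `wq_toy` and `weather_toy` names and all values are hypothetical), shows how a duplicated date in the weather table would silently add rows to the joined data, and how to check for that before joining.

```r
# Hypothetical water quality samples: three samples over two dates
wq_toy <- data.frame(
  swim_site = c("A", "B", "A"),
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02")),
  enterococci_cfu_100ml = c(2, 4, 30)
)

# Hypothetical weather table with one date accidentally listed twice
weather_toy <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-02")),
  precipitation_mm = c(0, 5, 5.2)
)

# Check 1: dates that appear more than once in the weather table
dup_dates <- weather_toy$date[duplicated(weather_toy$date)]

# Check 2: a left join should never return more rows than the left table
joined <- merge(wq_toy, weather_toy, by = "date", all.x = TRUE)
rows_added <- nrow(joined) - nrow(wq_toy)

dup_dates
rows_added
```

If both checks come back empty or zero on the real `weather` table, the join by `date` keeps one row per water sample as intended.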

# Create new variables
combined_data <- combined_data %>%
  mutate(
    rainy_day = if_else(precipitation_mm > 0, "Rainy", "Non-rainy"),
    rainy_day = factor(rainy_day),
    log_enterococci = log10(enterococci_cfu_100ml + 1)
  )

# Overall summary statistics
summary_stats <- combined_data %>%
  summarise(
    n = n(),
    mean_enterococci = mean(enterococci_cfu_100ml),
    median_enterococci = median(enterococci_cfu_100ml),
    sd_enterococci = sd(enterococci_cfu_100ml),
    se_enterococci = sd(enterococci_cfu_100ml) / sqrt(n()),
    mean_rain = mean(precipitation_mm),
    median_rain = median(precipitation_mm),
    sd_rain = sd(precipitation_mm),
    se_rain = sd(precipitation_mm) / sqrt(n())
  )

kable(summary_stats, digits = 2)

| n | mean_enterococci | median_enterococci | sd_enterococci | se_enterococci | mean_rain | median_rain | sd_rain | se_rain |
|---|---|---|---|---|---|---|---|---|
| 123223 | 116.77 | 4 | 4714.63 | 13.43 | 1.94 | 0.1 | 5.61 | 0.02 |

# Grouped summary statistics 
group_stats <- combined_data %>%
  group_by(rainy_day) %>%
  summarise(
    n = n(),
    mean_enterococci = mean(enterococci_cfu_100ml),
    median_enterococci = median(enterococci_cfu_100ml),
    sd_enterococci = sd(enterococci_cfu_100ml),
    se_enterococci = sd(enterococci_cfu_100ml) / sqrt(n())
  )

kable(group_stats, digits = 2)

| rainy_day | n | mean_enterococci | median_enterococci | sd_enterococci | se_enterococci |
|---|---|---|---|---|---|
| Non-rainy | 60057 | 53.69 | 2 | 1462.13 | 5.97 |
| Rainy | 63166 | 176.75 | 6 | 6428.20 | 25.58 |

# Histogram of Enterococci Levels
ggplot(combined_data, aes(x = enterococci_cfu_100ml)) +
  geom_histogram(fill = "steelblue", bins = 30) +
  theme_minimal() +
  labs(
    title = "Distribution of Enterococci Bacteria Levels",
    x = "Enterococci (CFU per 100mL)",
    y = "Frequency"
  )

# Histogram of Log-Transformed Data
ggplot(combined_data, aes(x = log_enterococci)) +
  geom_histogram(fill = "darkgreen", bins = 30) +
  theme_minimal() +
  labs(
    title = "Distribution of Log-Transformed Enterococci Levels",
    x = "Log10 Enterococci",
    y = "Frequency"
  )

# Boxplot: Rainy vs Non-Rainy Days
ggplot(combined_data, aes(x = rainy_day, y = log_enterococci, fill = rainy_day)) +
  geom_boxplot() +
  theme_minimal() +
  labs(
    title = "Bacteria Levels on Rainy vs Non-Rainy Days",
    x = "Day Type",
    y = "Log Enterococci"
  )

# Scatterplot: Rainfall vs Bacteria
ggplot(combined_data, aes(x = precipitation_mm, y = log_enterococci)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  theme_minimal() +
  labs(
    title = "Rainfall vs Bacteria Levels",
    x = "Rainfall (mm)",
    y = "Log Enterococci"
  )

Part 4 - Statistical Tests (Inferential Statistics)

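The bacteria data were strongly right-skewed rather than normally distributed (overall mean 116.77 vs. median 4 CFU/100 mL), so a log10(x + 1) transformation was applied before statistical testing. Adding 1 keeps any zero counts defined on the log scale.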
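
To illustrate why the transformation helps, the sketch below applies log10(x + 1) to a small made-up vector of counts (the values are hypothetical, not taken from the dataset) and compares the arithmetic mean with the back-transformed mean of the logged values.

```r
# Hypothetical counts (CFU/100 mL), including a zero and one extreme value
toy_counts <- c(0, 2, 4, 6, 10, 5000)

# Same transformation as in the analysis: add 1 so that a zero count stays finite
toy_log <- log10(toy_counts + 1)

arith_mean <- mean(toy_counts)       # pulled upward by the single extreme value
back_mean  <- 10^mean(toy_log) - 1   # back-transformed mean of the logged values

c(arithmetic = arith_mean,
  back_transformed = back_mean,
  median = median(toy_counts))
```

In this toy example the back-transformed mean sits much closer to the median than the arithmetic mean does, which is the behavior the log scale is meant to provide for skewed counts.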

Independent samples t-test (Welch): compares mean log-transformed bacteria levels between rainy and non-rainy days.
- Null Hypothesis: Mean log-transformed bacteria levels are equal on rainy and non-rainy days.
- Alternative Hypothesis: Mean log-transformed bacteria levels differ between rainy and non-rainy days.

Linear Regression: this model tests whether rainfall amount predicts log-transformed bacteria levels.
- Null Hypothesis: Rainfall amount has no linear relationship with bacteria levels (slope = 0).
- Alternative Hypothesis: Rainfall amount has a linear relationship with bacteria levels (slope ≠ 0).

Multiple Regression (Improved Model): this model controls for water temperature and swim site.

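Diagnostic plots for the simple regression model (residuals vs. fitted, normal Q-Q, scale-location, and residuals vs. leverage) were used to assess linearity, normality of residuals, constant variance, and influential outliers.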

# Independent Sample t-test
t_test_results <- t.test(log_enterococci ~ rainy_day, data = combined_data)
t_test_results
## 
##  Welch Two Sample t-test
## 
## data:  log_enterococci by rainy_day
## t = -61.899, df = 121492, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Non-rainy and group Rainy is not equal to 0
## 95 percent confidence interval:
##  -0.2742393 -0.2574052
## sample estimates:
## mean in group Non-rainy     mean in group Rainy 
##               0.7014733               0.9672956
# Linear Regression 
rain_model <- lm(log_enterococci ~ precipitation_mm, data = combined_data)
summary(rain_model)
## 
## Call:
## lm(formula = log_enterococci ~ precipitation_mm, data = combined_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2599 -0.7903 -0.0962  0.4601  4.9433 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.7902765  0.0022793  346.72   <2e-16 ***
## precipitation_mm 0.0244279  0.0003842   63.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.756 on 123221 degrees of freedom
## Multiple R-squared:  0.03177,    Adjusted R-squared:  0.03176 
## F-statistic:  4043 on 1 and 123221 DF,  p-value: < 2.2e-16
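
Because the response is log10(enterococci + 1), the slope and the group means are easier to read after back-transforming them. The sketch below does this arithmetic with the numbers printed in the outputs above; it takes the fitted values at face value and is meant only as an interpretation aid.

```r
# Values copied from the t-test and regression outputs above
slope_per_mm   <- 0.0244279   # precipitation_mm coefficient (log10 units per mm)
mean_non_rainy <- 0.7014733   # mean log10(enterococci + 1) on non-rainy days
mean_rainy     <- 0.9672956   # mean log10(enterococci + 1) on rainy days

# Multiplicative change in (enterococci + 1) per additional mm of rain
fold_per_mm <- 10^slope_per_mm
pct_per_mm  <- 100 * (fold_per_mm - 1)

# Ratio of back-transformed group means (rainy / non-rainy)
rainy_ratio <- 10^(mean_rainy - mean_non_rainy)

round(c(pct_per_mm = pct_per_mm, rainy_ratio = rainy_ratio), 2)
```

Read this way, the slope corresponds to an increase of a little under 6% in (enterococci + 1) per additional millimeter of rain, and rainy days have roughly 1.8 times the back-transformed mean of non-rainy days.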
# Multiple Regression
multi_model <- lm(
  log_enterococci ~ precipitation_mm + water_temperature_c + swim_site,
  data = combined_data
)

summary(multi_model)
## 
## Call:
## lm(formula = log_enterococci ~ precipitation_mm + water_temperature_c + 
##     swim_site, data = combined_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4039 -0.4391 -0.1212  0.3576  4.6005 
## 
## Coefficients:
##                                              Estimate Std. Error t value
## (Intercept)                                 0.2055298  0.0257566   7.980
## precipitation_mm                            0.0203184  0.0004795  42.377
## water_temperature_c                         0.0010228  0.0002097   4.877
## swim_siteBalmoral Baths                     0.5432819  0.0382200  14.215
## swim_siteBilarong Reserve                   1.0298891  0.0361783  28.467
## swim_siteBilgola Beach                      0.1109708  0.0360188   3.081
## swim_siteBoat Harbour                       0.8205223  0.0354658  23.136
## swim_siteBondi Beach                        0.5801112  0.0338422  17.142
## swim_siteBronte Beach                       0.6609371  0.0331994  19.908
## swim_siteBungan Beach                       0.0521869  0.0359199   1.453
## swim_siteCabarita Beach                     0.5797650  0.0316765  18.303
## swim_siteCallan Park Seawall                0.6770511  0.0478790  14.141
## swim_siteCamp Cove                          0.1985926  0.0420930   4.718
## swim_siteChinamans Beach                    0.3085838  0.0318677   9.683
## swim_siteChiswick Baths                     0.5712400  0.0401415  14.231
## swim_siteClifton Gardens                    0.4590599  0.0383698  11.964
## swim_siteClontarf Pool                      0.5568633  0.0386553  14.406
## swim_siteClovelly Beach                     0.4459010  0.0339412  13.137
## swim_siteCollaroy Beach                     0.3795577  0.0360046  10.542
## swim_siteCoogee Beach                       0.8508519  0.0331617  25.658
## swim_siteDarling Harbour                    1.2447456  0.0392951  31.677
## swim_siteDavidson Reserve                   0.8700922  0.0380555  22.864
## swim_siteDawn Fraser Pool                   0.6343674  0.0380129  16.688
## swim_siteDee Why Beach                      0.2819881  0.0359905   7.835
## swim_siteEdwards Beach                      0.3578920  0.0390996   9.153
## swim_siteElouera Beach                      0.2425258  0.0353390   6.863
## swim_siteFairlight Beach                    0.3283249  0.0394707   8.318
## swim_siteForty Baskets Pool                 0.4440417  0.0389556  11.399
## swim_siteFreshwater Beach                   0.4921560  0.0360047  13.669
## swim_siteGordons Bay (East)                 0.3961165  0.0353392  11.209
## swim_siteGreenhills Beach                   0.1750181  0.0355296   4.926
## swim_siteGreenwich Baths                    0.6186080  0.0379926  16.282
## swim_siteGurney Crescent Baths              0.5178153  0.0396527  13.059
## swim_siteHayes Street Beach                 0.7866814  0.0389560  20.194
## swim_siteHenley Baths (Kelly Street Baths)  0.8449548  0.1073281   7.873
## swim_siteLittle Bay Beach                   0.7113614  0.0340523  20.890
## swim_siteLittle Manly Cove                  0.5366109  0.0384122  13.970
## swim_siteLittle Sirius Cove                 1.3246587  0.0612569  21.625
## swim_siteLong Reef Beach                    0.1766427  0.0314278   5.621
## swim_siteMalabar Beach                      0.9119533  0.0337273  27.039
## swim_siteManly Cove                         0.4678243  0.0315187  14.843
## swim_siteMaroubra Beach                     0.4107633  0.0313343  13.109
## swim_siteMegalong Creek                     1.7648720  0.0646351  27.305
## swim_siteMona Vale Beach                    0.1490143  0.0313784   4.749
## swim_siteMurray Rose Pool                   0.7422783  0.0381365  19.464
## swim_siteNarrabeen Lagoon (Birdwood Park)   1.0449724  0.0364190  28.693
## swim_siteNewport Beach                      0.1718602  0.0359059   4.786
## swim_siteNielsen Park                       0.2176661  0.0316517   6.877
## swim_siteNorth Cronulla Beach               0.2788387  0.0350723   7.950
## swim_siteNorth Curl Curl Beach              0.3345735  0.0349907   9.562
## swim_siteNorth Narrabeen Beach              0.2255751  0.0360474   6.258
## swim_siteNorth Steyne Beach                 0.4495448  0.0360190  12.481
## swim_siteNorthbridge Baths                  0.6648958  0.0385006  17.270
## swim_siteOak Park Beach                     0.1779708  0.0353018   5.041
## swim_sitePalm Beach                         0.1042418  0.0313256   3.328
## swim_siteParsley Bay                        0.6588720  0.0382411  17.229
## swim_sitePenrith Beach                      0.7177563  0.0991408   7.240
## swim_siteQueenscliff Beach                  0.6885395  0.0348990  19.729
## swim_siteRose Bay Beach                     0.8678935  0.0386326  22.465
## swim_siteSangrado Baths                     1.2623801  0.1182144  10.679
## swim_siteShelly Beach (Manly)               0.6201551  0.0313744  19.766
## swim_siteShelly Beach (Sutherland)          0.2209829  0.0352648   6.266
## swim_siteSouth Cronulla Beach               0.4556274  0.0352648  12.920
## swim_siteSouth Curl Curl Beach              0.1972558  0.0360332   5.474
## swim_siteSouth Maroubra Beach               0.4030818  0.0350374  11.504
## swim_siteSouth Maroubra Rockpool            0.7170564  0.0353392  20.291
## swim_siteSouth Steyne Beach                 0.7079300  0.0352772  20.068
## swim_siteTamarama Beach                     0.5705754  0.0337188  16.922
## swim_siteTambourine Bay                     0.8881127  0.0382210  23.236
## swim_siteTurimetta Beach                    0.0916142  0.0361197   2.536
## swim_siteWanda Beach                        0.2122272  0.0352893   6.014
## swim_siteWarriewood Beach                   0.1423594  0.0360762   3.946
## swim_siteWatsons Bay                        0.4968795  0.0384339  12.928
## swim_siteWentworth Falls Lake - Beach       1.3006895  0.0630730  20.622
## swim_siteWentworth Falls Lake - Jetty       1.6645338  0.0632875  26.301
## swim_siteWhale Beach                       -0.0134556  0.0359199  -0.375
## swim_siteWindsor Beach                      1.3817573  0.0635073  21.757
## swim_siteWoodford Bay                       0.6837651  0.0388387  17.605
## swim_siteWoolwich Baths                     0.8216067  0.0379131  21.671
## swim_siteYarramundi Reserve                 1.6777750  0.0641735  26.144
## swim_siteYosemite Creek - Minnehaha Falls   1.6468562  0.0682587  24.127
##                                            Pr(>|t|)    
## (Intercept)                                1.50e-15 ***
## precipitation_mm                            < 2e-16 ***
## water_temperature_c                        1.08e-06 ***
## swim_siteBalmoral Baths                     < 2e-16 ***
## swim_siteBilarong Reserve                   < 2e-16 ***
## swim_siteBilgola Beach                     0.002065 ** 
## swim_siteBoat Harbour                       < 2e-16 ***
## swim_siteBondi Beach                        < 2e-16 ***
## swim_siteBronte Beach                       < 2e-16 ***
## swim_siteBungan Beach                      0.146266    
## swim_siteCabarita Beach                     < 2e-16 ***
## swim_siteCallan Park Seawall                < 2e-16 ***
## swim_siteCamp Cove                         2.39e-06 ***
## swim_siteChinamans Beach                    < 2e-16 ***
## swim_siteChiswick Baths                     < 2e-16 ***
## swim_siteClifton Gardens                    < 2e-16 ***
## swim_siteClontarf Pool                      < 2e-16 ***
## swim_siteClovelly Beach                     < 2e-16 ***
## swim_siteCollaroy Beach                     < 2e-16 ***
## swim_siteCoogee Beach                       < 2e-16 ***
## swim_siteDarling Harbour                    < 2e-16 ***
## swim_siteDavidson Reserve                   < 2e-16 ***
## swim_siteDawn Fraser Pool                   < 2e-16 ***
## swim_siteDee Why Beach                     4.78e-15 ***
## swim_siteEdwards Beach                      < 2e-16 ***
## swim_siteElouera Beach                     6.83e-12 ***
## swim_siteFairlight Beach                    < 2e-16 ***
## swim_siteForty Baskets Pool                 < 2e-16 ***
## swim_siteFreshwater Beach                   < 2e-16 ***
## swim_siteGordons Bay (East)                 < 2e-16 ***
## swim_siteGreenhills Beach                  8.42e-07 ***
## swim_siteGreenwich Baths                    < 2e-16 ***
## swim_siteGurney Crescent Baths              < 2e-16 ***
## swim_siteHayes Street Beach                 < 2e-16 ***
## swim_siteHenley Baths (Kelly Street Baths) 3.54e-15 ***
## swim_siteLittle Bay Beach                   < 2e-16 ***
## swim_siteLittle Manly Cove                  < 2e-16 ***
## swim_siteLittle Sirius Cove                 < 2e-16 ***
## swim_siteLong Reef Beach                   1.91e-08 ***
## swim_siteMalabar Beach                      < 2e-16 ***
## swim_siteManly Cove                         < 2e-16 ***
## swim_siteMaroubra Beach                     < 2e-16 ***
## swim_siteMegalong Creek                     < 2e-16 ***
## swim_siteMona Vale Beach                   2.05e-06 ***
## swim_siteMurray Rose Pool                   < 2e-16 ***
## swim_siteNarrabeen Lagoon (Birdwood Park)   < 2e-16 ***
## swim_siteNewport Beach                     1.70e-06 ***
## swim_siteNielsen Park                      6.19e-12 ***
## swim_siteNorth Cronulla Beach              1.90e-15 ***
## swim_siteNorth Curl Curl Beach              < 2e-16 ***
## swim_siteNorth Narrabeen Beach             3.94e-10 ***
## swim_siteNorth Steyne Beach                 < 2e-16 ***
## swim_siteNorthbridge Baths                  < 2e-16 ***
## swim_siteOak Park Beach                    4.64e-07 ***
## swim_sitePalm Beach                        0.000876 ***
## swim_siteParsley Bay                        < 2e-16 ***
## swim_sitePenrith Beach                     4.56e-13 ***
## swim_siteQueenscliff Beach                  < 2e-16 ***
## swim_siteRose Bay Beach                     < 2e-16 ***
## swim_siteSangrado Baths                     < 2e-16 ***
## swim_siteShelly Beach (Manly)               < 2e-16 ***
## swim_siteShelly Beach (Sutherland)         3.73e-10 ***
## swim_siteSouth Cronulla Beach               < 2e-16 ***
## swim_siteSouth Curl Curl Beach             4.42e-08 ***
## swim_siteSouth Maroubra Beach               < 2e-16 ***
## swim_siteSouth Maroubra Rockpool            < 2e-16 ***
## swim_siteSouth Steyne Beach                 < 2e-16 ***
## swim_siteTamarama Beach                     < 2e-16 ***
## swim_siteTambourine Bay                     < 2e-16 ***
## swim_siteTurimetta Beach                   0.011203 *  
## swim_siteWanda Beach                       1.82e-09 ***
## swim_siteWarriewood Beach                  7.96e-05 ***
## swim_siteWatsons Bay                        < 2e-16 ***
## swim_siteWentworth Falls Lake - Beach       < 2e-16 ***
## swim_siteWentworth Falls Lake - Jetty       < 2e-16 ***
## swim_siteWhale Beach                       0.707959    
## swim_siteWindsor Beach                      < 2e-16 ***
## swim_siteWoodford Bay                       < 2e-16 ***
## swim_siteWoolwich Baths                     < 2e-16 ***
## swim_siteYarramundi Reserve                 < 2e-16 ***
## swim_siteYosemite Creek - Minnehaha Falls   < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6428 on 48103 degrees of freedom
##   (75039 observations deleted due to missingness)
## Multiple R-squared:  0.2114, Adjusted R-squared:  0.2101 
## F-statistic: 161.2 on 80 and 48103 DF,  p-value: < 2.2e-16
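
The output above notes that 75039 observations were deleted due to missingness, so the multiple regression was fit to fewer rows than the simple regression. One way to compare the two models on equal footing is to refit both on the same complete cases. The sketch below shows the idea on simulated data; the `toy` data frame, its coefficients, and the amount of missingness are all made up for illustration, and swim site is left out to keep it short.

```r
set.seed(1)
n <- 200

# Simulated stand-in for combined_data (hypothetical values)
toy <- data.frame(
  precipitation_mm    = rexp(n, rate = 0.5),
  water_temperature_c = rnorm(n, mean = 20, sd = 3)
)
toy$log_enterococci <- 0.8 + 0.02 * toy$precipitation_mm + rnorm(n, sd = 0.6)

# Make water temperature missing for 80 rows, as happens in the real data
toy$water_temperature_c[sample(n, 80)] <- NA

# Keep only rows that are complete for every variable used in either model
model_vars <- c("log_enterococci", "precipitation_mm", "water_temperature_c")
toy_cc <- toy[complete.cases(toy[, model_vars]), ]

simple_cc <- lm(log_enterococci ~ precipitation_mm, data = toy_cc)
multi_cc  <- lm(log_enterococci ~ precipitation_mm + water_temperature_c,
                data = toy_cc)

# Both models now use the same rows, so a nested-model F-test is valid
c(simple = nobs(simple_cc), multi = nobs(multi_cc))
anova(simple_cc, multi_cc)
```

Applied to the real data, the same pattern would mean filtering `combined_data` to rows with non-missing `water_temperature_c` before fitting `rain_model` and `multi_model`.
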
# Assumption Checks 
par(mfrow = c(2,2))
plot(rain_model)

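Part 5 - Discussion

The Welch t-test compared log-transformed bacteria levels on rainy and non-rainy days. Mean log10 enterococci was higher on rainy days (0.97) than on non-rainy days (0.70), and the difference was statistically significant (t = -61.90, df = 121492, p < 2.2e-16; 95% CI for the difference: -0.274 to -0.257). Back-transformed, this corresponds to roughly 1.8 times higher levels of (enterococci + 1) on rainy days.

The simple regression tested whether rainfall amount predicts bacteria levels. The slope was positive and significant (0.0244 log10 units per mm, p < 2e-16), so bacteria levels tended to increase as rainfall increased. However, R-squared was only 0.032, meaning rainfall amount alone explained about 3% of the variation in log-transformed bacteria levels.

In the multiple regression, the rainfall slope remained positive and significant after controlling for water temperature and swim site (0.0203, p < 2e-16). This is consistent with rainfall having an association with bacterial contamination that is not explained by these two variables. R-squared rose to 0.211, which appears to be driven largely by differences among swim sites. This model was fit to 48,184 observations because 75,039 rows were dropped for missing values (most likely water temperature, since rows missing enterococci or rainfall had already been removed), so the two regression models are not based on the same sample.

This analysis is important because rainfall runoff may carry fecal matter, pollutants, and waste into recreational waters, increasing health risks for swimmers. Understanding this relationship can help improve beach advisories and public safety decisions.

This analysis has several limitations. Other environmental factors such as wind, sunlight, currents, and wildlife were not included. Rainfall effects may vary across beaches, and bacteria levels naturally fluctuate over time. With more than 120,000 observations, even small effects produce very small p-values, so effect sizes such as the slope and R-squared matter more than significance alone. Lastly, because the data are observational, direct causation cannot be confirmed.

Part 6 - Conclusion

Overall, this project examined whether rainfall is associated with bacteria levels in recreational water. The t-test and both regression models found that rainfall was associated with higher enterococci concentrations and therefore poorer water quality, although rainfall alone explained little of the overall variation. Monitoring rainfall may help predict unsafe swimming conditions and protect public health.

Part 7 - References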

Water Quality Dataset. TidyTuesday (2025-05-20).
Weather Dataset. TidyTuesday (2025-05-20).
U.S. Environmental Protection Agency (EPA). Recreational Water Quality Criteria.
Wikimedia Commons or personal project image: BeachwatchMap.png

AI Statement

We acknowledge the use of ChatGPT (chatgpt.com) to assist with revising the original code, identifying errors, and improving the graphs and statistical tests used in this project. It was used to help streamline sections of code and identify potential weaknesses in the analysis. The outputs were used to correct syntax errors and refine the workflow before running the final analysis on the complete dataset.