1 Executive Summary

The aim of this report is to investigate the effects of location and time of year on the water quality of the beach water in the Northern Beaches Council. The data set used focuses on the levels of Enterococci (cfu per 100ml) present in the water as a measure of water quality. It was constructed by the NSW Government Office of Environment and Heritage in 2018. Research Question 1 explored the effects of the month and season on the levels of Enterococci recorded. While the summer season had the highest mean, it was only by a slight amount, and mostly due to high levels recorded in January. Further, most months (eight of twelve) recorded mean levels of Enterococci below 33 cfu per 100ml, which is the mean recommended by the NHMRC (Hickey and Cowie, n.d.). There was not sufficient evidence to suggest that any particular month or season affects the level of Enterococci to a significant degree. Research Question 2 explored the effects of geographical location on the Enterococci levels of the Northern Beaches. While the scatterplot showed a negative correlation between latitude and Enterococci levels, it was only slight, and further investigation using a hypothesis test showed that in the Northern Beaches, geographical location has little effect on the levels of Enterococci.

2 Full Report

2.1 Initial Data Analysis

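# Read the Beachwatch data; stringsAsFactors = TRUE keeps text columns as factors,
# matching the output below (the read.csv default changed in R 4.0.0)
beaches <- read.csv("~/Desktop/Project 2 - fr/beaches.csv", stringsAsFactors = TRUE)
# Drop rows with missing values before any analysis
beaches <- na.omit(beaches)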

2.1.1 Background

This data set was constructed by the NSW Government Office of Environment and Heritage in 2018 as part of its Beachwatch water quality program, so it comes from a credible source. However, there are still some limitations:

- The bacterial indicators used to detect the presence of Enterococci in the water do not demonstrate the presence of viable pathogens or other bacteria.
- The analysis of bacterial indicators takes between 24 and 48 hours to process, so it is impossible to know the quality of the water at the time of swimming.

There are no ethical issues with this data set.

2.1.2 Variables

summary(beaches)
##     BeachId                               Region    
##  Min.   : 2.0   Sydney Northern Ocean Beaches:1273  
##  1st Qu.: 7.0                                       
##  Median :10.2                                       
##  Mean   :11.3                                       
##  3rd Qu.:16.0                                       
##  Max.   :21.0                                       
##                                                     
##                      Council                                      Site    
##  Northern Beaches Council:1273   Avalon Beach                       : 58  
##                                  Bilarong Reserve (Narrabeen Lagoon): 58  
##                                  Birdwood Park (Narrabeen Lagoon)   : 58  
##                                  Bungan Beach                       : 58  
##                                  Collaroy Beach                     : 58  
##                                  Dee Why Beach                      : 58  
##                                  (Other)                            :925  
##    Longitude        Latitude            Date      Enterococci..cfu.100ml.
##  Min.   :151.3   Min.   :-33.80   1/2/18  :  22   Min.   :   0.00        
##  1st Qu.:151.3   1st Qu.:-33.77   10/4/18 :  22   1st Qu.:   0.00        
##  Median :151.3   Median :-33.71   10/9/18 :  22   Median :   1.00        
##  Mean   :151.3   Mean   :-33.72   11/7/18 :  22   Mean   :  26.73        
##  3rd Qu.:151.3   3rd Qu.:-33.67   12/11/18:  22   3rd Qu.:   8.00        
##  Max.   :151.3   Max.   :-33.60   12/12/18:  22   Max.   :5300.00        
##                                   (Other) :1141                          
##       Month        Day.of.Week 
##  May     :132   Friday   :198  
##  April   :110   Monday   :308  
##  August  :110   Sunday   : 22  
##  February:110   Thursday :242  
##  January :110   Tuesday  :227  
##  July    :110   Wednesday:276  
##  (Other) :591

There are ten variables in this data set: BeachID, Region, Council, Site, Longitude, Latitude, Date, Enterococci (cfu/100ml), Month and Day of Week.

class(beaches$BeachId)
## [1] "numeric"

Beach ID has been classed as numeric, which is not the most accurate class for this variable. It needs to be reclassed as a factor, as the numbers represent each beach’s ID and should not be treated as numeric data.

beaches$BeachId <- as.factor(beaches$BeachId)
class(beaches$BeachId)
## [1] "factor"

Beach ID has now been reclassed as a factor, which is a more accurate classification of this variable.

class(beaches$Region)
## [1] "factor"

This variable is classed as a factor, which is the correct classification of Region, as it comprises a single level: Sydney Northern Ocean Beaches.

class(beaches$Council)
## [1] "factor"

The variable Council has been classed as a factor, which is an appropriate classification, as this variable contains no numerical data, only a single level: Northern Beaches Council.

class(beaches$Site)
## [1] "factor"

Site is classified correctly as a factor, as there is no numeric data, only 22 different levels, including Avalon Beach, Bilgola Beach, Collaroy Beach and all the other sites in the Northern Beaches Council.

class(beaches$Longitude)
## [1] "numeric"

This variable, Longitude, is classed as numeric. This is an accurate classification, as Longitude is made up of numeric data including decimals.

class(beaches$Latitude)
## [1] "numeric"

Latitude has also been correctly classed as numeric, as it too contains decimals.

class(beaches$Date)
## [1] "factor"

The variable Date has been classified as a factor with 58 different levels, including 1/2/18, 10/4/18, and all the other dates on which the beach water quality was tested.

class(beaches$Enterococci..cfu.100ml.)
## [1] "integer"

This variable is classified as an integer, which is correct because each value is a whole number.

class(beaches$Month)
## [1] "factor"

The variable Month is classed as a factor, which is accurate, as there is no numeric data, only twelve levels: January, February, March, April, May, June, July, August, September, October, November and December.

class(beaches$Day.of.Week)
## [1] "factor"

This variable is correctly classified as a factor, as there is no numeric data, only six levels: Monday, Tuesday, Wednesday, Thursday, Friday and Sunday.
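The class checks above can also be done in a single call. The following is a minimal sketch, assuming a small hypothetical data frame named `toy` that stands in for `beaches` (its values are illustrative only and are not taken from the data set).

```r
# Hypothetical toy data frame standing in for `beaches` (illustrative values only)
toy <- data.frame(
  BeachId = c(2, 4, 6),
  Site = factor(c("Avalon Beach", "Bungan Beach", "Palm Beach")),
  Latitude = c(-33.64, -33.67, -33.60),
  Enterococci = c(1L, 0L, 8L)
)

# Class of every column in one call, returned as a named character vector
classes <- sapply(toy, class)
print(classes)

# Reclass the identifier column as a factor, as was done for BeachId above
toy$BeachId <- as.factor(toy$BeachId)
print(class(toy$BeachId))
```

On the real data the equivalent call would be `sapply(beaches, class)`.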

summary(beaches)
##     BeachId                              Region    
##  2      : 58   Sydney Northern Ocean Beaches:1273  
##  4      : 58                                       
##  6      : 58                                       
##  7      : 58                                       
##  8      : 58                                       
##  9      : 58                                       
##  (Other):925                                       
##                      Council                                      Site    
##  Northern Beaches Council:1273   Avalon Beach                       : 58  
##                                  Bilarong Reserve (Narrabeen Lagoon): 58  
##                                  Birdwood Park (Narrabeen Lagoon)   : 58  
##                                  Bungan Beach                       : 58  
##                                  Collaroy Beach                     : 58  
##                                  Dee Why Beach                      : 58  
##                                  (Other)                            :925  
##    Longitude        Latitude            Date      Enterococci..cfu.100ml.
##  Min.   :151.3   Min.   :-33.80   1/2/18  :  22   Min.   :   0.00        
##  1st Qu.:151.3   1st Qu.:-33.77   10/4/18 :  22   1st Qu.:   0.00        
##  Median :151.3   Median :-33.71   10/9/18 :  22   Median :   1.00        
##  Mean   :151.3   Mean   :-33.72   11/7/18 :  22   Mean   :  26.73        
##  3rd Qu.:151.3   3rd Qu.:-33.67   12/11/18:  22   3rd Qu.:   8.00        
##  Max.   :151.3   Max.   :-33.60   12/12/18:  22   Max.   :5300.00        
##                                   (Other) :1141                          
##       Month        Day.of.Week 
##  May     :132   Friday   :198  
##  April   :110   Monday   :308  
##  August  :110   Sunday   : 22  
##  February:110   Thursday :242  
##  January :110   Tuesday  :227  
##  July    :110   Wednesday:276  
##  (Other) :591

2.1.3 Stakeholders

The only potential stakeholder in this data is the NSW government department that constructed the data set, the Office of Environment and Heritage. However, there would be no conflict of interest.

2.1.4 Assessment

This data set is reliable, as it was constructed by the NSW Government Office of Environment and Heritage as part of its Beachwatch program. However, the limitations of this data set include:

- The bacterial indicators used to detect the presence of Enterococci in the water do not demonstrate the presence of viable pathogens or other bacteria.
- The analysis of bacterial indicators takes between 24 and 48 hours to process, so it is impossible to know the quality of the water at the time of swimming.

Despite these limitations, this data is credible and reliable.

2.2 Research Question 1

How does the month of the year affect the water quality of the beaches in the Northern Beaches Council?

2.2.1 Hypothesis

The summer months will record higher levels of Enterococci, as the beaches would be busier at these times.

2.2.2 Mean

mean(beaches$Enterococci..cfu.100ml.)
## [1] 26.7337
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "January"])
## [1] 107.4273
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "February"])
## [1] 11.11818
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "March"])
## [1] 45.3578
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "April"])
## [1] 5.572727
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "May"])
## [1] 9.280303
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "June"])
## [1] 42.61364
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "July"])
## [1] 5.218182
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "August"])
## [1] 48.97273
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "September"])
## [1] 27.38889
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "October"])
## [1] 4.909091
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "November"])
## [1] 4.590909
mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "December"])
## [1] 6.784091

As shown above, the mean results vary by month. November has the smallest mean level of Enterococci at 4.590909 cfu per 100ml, whilst January’s mean is the highest, at 107.4273 cfu per 100ml. The NSW and Australian guidelines suggest that for recreational water, Enterococci levels should have a mean of less than 33 cfu per 100ml, or a median of less than 35 with 4 out of 5 samples less than or equal to 100 cfu per 100ml (Hickey and Cowie, n.d.). Thus, most of these months fall into the safe category, with only January, March, June and August presenting means higher than 33.
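The twelve separate `mean()` calls can be collapsed into one grouped summary. The sketch below uses `tapply()` on a hypothetical data frame `toy` with made-up values; the column name `Enterococci` is an assumption used here for brevity in place of `Enterococci..cfu.100ml.`.

```r
# Hypothetical toy data (illustrative values, not taken from the report)
toy <- data.frame(
  Month = c("January", "January", "February", "February", "March"),
  Enterococci = c(100, 20, 4, 6, 50)
)

# Keep the months in calendar order rather than alphabetical order
toy$Month <- factor(toy$Month, levels = c("January", "February", "March"))

# Mean Enterococci level for each month in one call
monthly_means <- tapply(toy$Enterococci, toy$Month, mean)
print(monthly_means)

# Months whose mean exceeds the guideline value of 33 cfu per 100ml
above_guideline <- names(monthly_means)[monthly_means > 33]
print(above_guideline)
```

This gives the same per-month means as subsetting by hand, while making it harder to repeat or miss a month.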

(mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "January"]) + mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "February"]) + mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "December"]))/3
## [1] 41.77652
(mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "March"]) + mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "April"]) + mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "May"]))/3
## [1] 20.07028
(mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "June"]) + mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "July"]) + mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "August"]))/3
## [1] 32.26818
(mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "September"]) + mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "October"]) + mean(beaches$Enterococci..cfu.100ml.[beaches$Month == "November"]))/3
## [1] 12.2963

Above, the seasonal means show that the summer mean (41.78) is the only one above the recommended level of 33 cfu per 100ml of Enterococci, although the winter mean (32.27) falls only just below it.
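The seasonal figures above are averages of three monthly means. When months contain different numbers of samples, this can differ from the mean of all samples in the season. The sketch below illustrates the difference on hypothetical data; the `toy` data frame, the `Enterococci` column name and the `season_of` lookup vector are all assumptions introduced for illustration.

```r
# Hypothetical toy data with unequal sample counts per month (illustrative only)
toy <- data.frame(
  Month = c("December", "January", "January", "January", "February", "June"),
  Enterococci = c(6, 90, 120, 30, 12, 40),
  stringsAsFactors = FALSE
)

# Lookup from month name to Southern Hemisphere season (hypothetical helper)
season_of <- c(
  December = "Summer", January = "Summer", February = "Summer",
  March = "Autumn", April = "Autumn", May = "Autumn",
  June = "Winter", July = "Winter", August = "Winter",
  September = "Spring", October = "Spring", November = "Spring"
)
toy$Season <- unname(season_of[toy$Month])

# Pooled seasonal mean: every sample in the season weighted equally
pooled <- tapply(toy$Enterococci, toy$Season, mean)

# Mean of monthly means: every month weighted equally (the approach used above)
monthly <- tapply(toy$Enterococci, toy$Month, mean)
mean_of_means <- tapply(as.vector(monthly), unname(season_of[names(monthly)]), mean)

print(pooled)
print(mean_of_means)
```

In this toy example the pooled summer mean is 51.6 while the mean of monthly means is about 32.67, so the two approaches can fall on opposite sides of the guideline value.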

2.2.3 Box Plot

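# Order the months chronologically rather than alphabetically
beaches$Month <- factor(beaches$Month, levels = c("January","February","March","April","May","June","July","August","September","October","November","December"))
# With a factor on the x-axis, plot() draws one box plot per month
plot(x = beaches$Month, y = beaches$Enterococci..cfu.100ml., main = "Enterococci levels by Month", ylab = "Enterococci Levels (cfu/100ml)", xlab = "2018")
# Reference line at the recommended mean of 33 cfu per 100ml
abline(h = 33, col = "blue")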

This plot shows the distribution of Enterococci values for each month of 2018 (because Month is a factor, plot() draws a box plot for each month), with a reference line at 33, the level suggested by the NSW guidelines. January, March, June and September show the most values above the recommended level. August has one extremely high value, which is almost certainly an outlier and which would have dragged its mean above the recommended 33.
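Because a single extreme value can dominate a monthly mean, the median-based form of the guideline (a median below 35 with 4 out of 5 samples at or below 100 cfu per 100ml) can be a useful cross-check. The sketch below uses hypothetical values in a data frame named `toy`; only the thresholds come from the guideline quoted earlier.

```r
# Hypothetical toy data: one month contains a single extreme value
toy <- data.frame(
  Month = rep(c("July", "August"), each = 5),
  Enterococci = c(2, 4, 5, 7, 8, 1, 3, 4, 6, 5300)
)

# The mean is pulled up sharply by the outlier; the median is not
means <- tapply(toy$Enterococci, toy$Month, mean)
medians <- tapply(toy$Enterococci, toy$Month, median)

# Share of samples at or below 100 cfu per 100ml in each month
share_ok <- tapply(toy$Enterococci <= 100, toy$Month, mean)

print(means)
print(medians)
print(share_ok)
```

Here the hypothetical August mean is 1062.8, yet its median is 4 and 80% of its samples are at or below 100.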

2.2.4 Conclusion

While the summer months of 2018 did have a higher average level of Enterococci than the other seasons, it was only by a slight amount, and mostly due to the high levels recorded in January. December and February both had low levels. Further, each month, with the exception of January, March, June and August, recorded a mean lower than 33 cfu per 100ml, and the levels of Enterococci generally reduced as the year went on. Thus, there is not enough evidence to suggest that certain months or seasons affect the levels of Enterococci (cfu per 100ml) present in the water of the Northern Beaches.

2.3 Research Question 2

What is the impact of location on the water quality of the Northern Beaches in NSW?

2.3.1 Hypothesis

The beaches further south will have higher levels of Enterococci (cfu per 100ml) than the northernmost beaches.

2.3.2 Mean

mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Avalon Beach"])
## [1] 1.810345
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Bilarong Reserve (Narrabeen Lagoon)"])
## [1] 51.24138
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Bilgola Beach"])
## [1] 5.578947
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Birdwood Park (Narrabeen Lagoon)"])
## [1] 171.6207
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Bungan Beach"])
## [1] 2.913793
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Collaroy Beach"])
## [1] 30.37931
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Dee Why Beach"])
## [1] 17.06897
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Freshwater Beach"])
## [1] 24.86207
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Long Reef Beach"])
## [1] 5.241379
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Mona Vale Beach"])
## [1] 4.310345
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Newport Beach"])
## [1] 8.465517
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "North Curl Curl Beach"])
## [1] 45.51724
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "North Narrabeen Beach"])
## [1] 7.241379
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "North Steyne Beach"])
## [1] 16.67241
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Palm Beach"])
## [1] 3.241379
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Queenscliff Beach"])
## [1] 55.15517
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Shelly Beach (Manly)"])
## [1] 58.94828
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "South Curl Curl Beach"])
## [1] 10
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "South Steyne Beach"])
## [1] 52.68966
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Turimetta Beach"])
## [1] 5.982759
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Warriewood Beach"])
## [1] 4.724138
mean(beaches$Enterococci..cfu.100ml.[beaches$Site == "Whale Beach"])
## [1] 3.421053

The average levels of Enterococci at each site are mostly below the recommended 33 cfu per 100ml, with the exceptions of Bilarong Reserve, Birdwood Park, North Curl Curl Beach, Queenscliff Beach, Shelly Beach and South Steyne Beach. The lowest mean was recorded at Avalon Beach (1.810345), and Birdwood Park recorded the highest mean (171.6207). However, Birdwood Park also recorded the highest raw level at 5300, which would have pulled its mean up significantly.
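The per-site means can be ranked in one step, which makes the highest and lowest sites easy to read off. This is a minimal sketch on hypothetical data; the site labels "Site A" to "Site C", the `toy` data frame and the `Enterococci` column name are placeholders, not values from the data set.

```r
# Hypothetical toy data with placeholder site names (illustrative values only)
toy <- data.frame(
  Site = c("Site A", "Site A", "Site B", "Site B", "Site C", "Site C"),
  Enterococci = c(1, 3, 50, 70, 10, 20)
)

# Mean per site, sorted from highest to lowest
site_means <- sort(tapply(toy$Enterococci, toy$Site, mean), decreasing = TRUE)
print(site_means)

# Sites whose mean exceeds the guideline value of 33 cfu per 100ml
exceeding <- names(site_means)[site_means > 33]
print(exceeding)
```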

2.3.3 Scatterplot

plot(x = beaches$Latitude, y = beaches$Enterococci..cfu.100ml., main = "Enterococci Levels by Latitude", ylab = "Enterococci levels (cfu per 100ml)", xlab = "Latitude (ϕ)" )
abline(h = 33, col = "blue")

This scatterplot shows the relationship between Enterococci levels and latitude, with an added line at 33 showing the recommended mean level. There is a slight downward trend: as latitude increases (moving north), Enterococci levels mostly decrease. However, this trend is only very slight. At approximately -33.72 there is a clear outlier, but the negative relationship between Enterococci levels and latitude can be seen regardless of the outlier.
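The strength of a trend like this can be quantified with a correlation coefficient. The sketch below computes both the Pearson and the rank-based Spearman coefficient with `cor()` on hypothetical values; Spearman is included because it is less sensitive to a single extreme observation. The `toy` data frame is an assumption for illustration.

```r
# Hypothetical toy data: levels tend to fall as latitude increases (moving north)
toy <- data.frame(
  Latitude = c(-33.80, -33.75, -33.70, -33.65, -33.60),
  Enterococci = c(60, 35, 40, 15, 10)
)

# Pearson correlation measures linear association
r_pearson <- cor(toy$Latitude, toy$Enterococci)

# Spearman correlation uses ranks, so outliers have less influence
r_spearman <- cor(toy$Latitude, toy$Enterococci, method = "spearman")

print(r_pearson)
print(r_spearman)
```

A value close to 0 would indicate a weak relationship, and a value close to -1 a strong negative one.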

2.3.4 Significance Level

The significance level for this research question is 0.05.

2.3.5 Hypothesis

H0: The beaches with latitude less than -33.70 will record a mean Enterococci level of 33 cfu per 100ml (μ = 33).

H1: The beaches with latitude less than -33.70 will record a mean Enterococci level of more than 33 cfu per 100ml (μ > 33).

2.3.6 Assumptions

For this research question, it is assumed that:

- The sample mean follows a normal distribution: the sample size is large enough that the mean will approximately follow a normal distribution.
- Observations are independent of each other.

2.3.7 Test Statistic

mean(beaches$Enterococci..cfu.100ml.[beaches$Latitude < -33.70])
## [1] 42.09163
sd(beaches$Enterococci..cfu.100ml.[beaches$Latitude < -33.70])
## [1] 263.6469

Sample size: n = 1273 (the total number of observations in the data set)

Test Statistic:

(42.09163 - 33)/(263.6469/sqrt(1273))
## [1] 1.230363
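The calculation above uses n = 1273, the size of the full data set, whereas the mean and standard deviation are taken from the subset of observations with latitude below -33.70. A minimal sketch of how the statistic could be computed with n taken from the same subset is shown below; the `toy` data frame and its values are hypothetical, so the resulting number is not comparable with the one reported above.

```r
# Hypothetical toy data (illustrative values only)
toy <- data.frame(
  Latitude = c(-33.80, -33.78, -33.75, -33.72, -33.68, -33.65),
  Enterococci = c(40, 60, 20, 80, 5, 3)
)

# Observations from the southern subset only
south <- toy$Enterococci[toy$Latitude < -33.70]

n <- length(south)   # sample size of the subset, not the full data set
x_bar <- mean(south) # sample mean
s <- sd(south)       # sample standard deviation
mu0 <- 33            # hypothesised mean under H0

# Test statistic: (sample mean - hypothesised mean) / standard error
z <- (x_bar - mu0) / (s / sqrt(n))
print(c(n = n, mean = x_bar, sd = s, z = z))
```

Taking `n` from `length(south)` keeps the standard error consistent with the mean and standard deviation it is paired with.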

2.3.8 P value

round(1 - pnorm(1.230363), 3)
## [1] 0.109

As the one-sided p-value (approximately 0.109) is greater than the significance level (0.05), there are not sufficient grounds to reject H0.
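For a one-sided test of this kind, the decision can be reached in two equivalent ways: by comparing the p-value with the significance level, or by comparing the test statistic with the critical value. The sketch below applies both to the test statistic reported above, treating it as approximately standard normal under H0; the variable names are illustrative.

```r
z <- 1.230363  # test statistic reported above
alpha <- 0.05  # significance level

# One-sided p-value: probability of a value at least as large as z under H0
p_value <- pnorm(z, lower.tail = FALSE)

# Critical value for a one-sided test at the 5% level (about 1.645)
z_crit <- qnorm(1 - alpha)

# Both comparisons lead to the same decision
reject_by_p <- p_value < alpha
reject_by_z <- z > z_crit

print(c(p_value = p_value, z_crit = z_crit))
print(c(reject_by_p = reject_by_p, reject_by_z = reject_by_z))
```

A test statistic is never compared directly with a p-value, since the two are on different scales.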

2.3.9 Conclusion

The numerical and graphical summaries of the data show some evidence that the most southerly beaches reported higher levels of Enterococci than the northernmost beaches. The scatterplot shows a slight downward trend, and the beaches with the highest means were mostly located in the south. However, the hypothesis test did not support this. The p-value was greater than the significance level of 0.05, which means H0 cannot be rejected. Thus, while this particular data set shows that the southern beaches had slightly higher levels of Enterococci, this is probably not always the case. Geographical location has not been shown to have any large effect on the levels of Enterococci (cfu per 100ml) recorded in the Northern Beaches.

3 References

Beachwatch NSW (2009). Appendix A: Indicators and Guidelines. [online] Available at: https://www.environment.nsw.gov.au/resources/beach/bppsob0809/09633appendixes0809.pdf [Accessed 10 Nov. 2020].

Beachwatch NSW (2018). Microbial assessments of beach water quality. [online] NSW Environment, Energy and Science. Available at: https://www.environment.nsw.gov.au/topics/water/beaches/reporting-beach-water-quality/guidelines/microbial-assessments [Accessed 5 Nov. 2020].

Hickey, C. and Cowie, C. (n.d.). Taking the Plunge: Recreational Water Quality Guidelines. NSW Public Health Bulletin, [online] 14(8). Available at: https://www.publish.csiro.au/nb/pdf/NB03051 [Accessed 5 Nov. 2020].