The Effectiveness of the Fish Farm Monitoring and Notification System by Determining the Correlations of pH Level, Temperature, and Turbidity 🐟

Cucio, Quintos, Tan, Villar & Villareal

August 6, 2021

🐟 Introduction

About 71% of the Earth's surface is covered by water, and the rest is land [1]. Some countries are surrounded by large bodies of water that serve as an important source of food and livelihood. Monitoring water quality is essential to maintaining a robust aquatic environment, especially in an archipelago like the Philippines, whose many islands are surrounded by different bodies of water. These islands and waters are key tourist attractions; unfortunately, the waters are also considered polluted: a fifth of the 2.7 million tons of trash generated each year ends up in the ocean, posing a threat to sea life and people [2]. To address these issues, water quality monitoring is implemented to track parameters continuously in real time, with the goal of avoiding life-threatening effects on sea life. In this paper, water quality data from a digital fish farm monitoring system were analyzed to determine whether the monitoring system is an effective model for observing the water quality of fish farms. Using simple linear regression, the researchers test the significance of regression between pH level, temperature, and turbidity. The results of this paper will be used to assess the effectiveness of the monitoring system. If it proves to be an effective model, the monitoring system will be a suitable foundation for creating more advanced monitoring systems in future studies.

🐟 Methodology

A.I. Arafat et al. chose to analyze the factors with the greatest impact on water quality [3]. They used three different types of sensors to record pH level, temperature, and turbidity. The pH level is a measure of how acidic or basic a substance is [4]. Temperature is the measure of heat in a given object [5]. Turbidity is the measure of the relative clarity of a liquid [6]. These data were recorded at two depths, 30 cm and 60 cm below the surface. The data gathered by the sensors were processed by an algorithm and sent to a website that monitors the condition of the fish farm at any time of the day. The raw data are available in a Mendeley Data repository [7]. The dataset recorded at 30 cm depth is used here to monitor the condition of the water, as it contains the three variables needed for regression analysis. Using the data gathered by A.I. Arafat et al., the researchers chose to evaluate the accuracy of the fish farm monitoring system by testing the significance of regression between the different factors affecting water quality. Three simple linear regression models were made: pH level against temperature, pH level against turbidity, and temperature against turbidity. Thus, the following hypotheses were formulated:
  • For temperature vs. pH Level: \(H_0:\beta_{TempVspH}=0\), \(H_A:\beta_{TempVspH}\neq0\)
  • For turbidity vs. pH Level: \(H_0:\beta_{TurbVspH}=0\), \(H_A:\beta_{TurbVspH}\neq0\)
  • For temperature vs. turbidity: \(H_0:\beta_{TempVsTurb}=0\), \(H_A:\beta_{TempVsTurb}\neq0\)
where each \(\beta\) represents the population slope of the least squares regression line of the y-variable as a function of the x-variable. Rejecting the null hypothesis \(H_0\) provides evidence of a linear relationship between the two variables.
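As a reminder of what each test evaluates, the standard simple linear regression model and its slope test can be written as follows. The symbols \(b_1\) (estimated slope), \(\mathrm{SE}(b_1)\) (its standard error), and \(n\) (number of observations) are notation introduced here for illustration and do not appear in the original text.

```latex
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
\]
\[
t = \frac{b_1}{\mathrm{SE}(b_1)} \sim t_{n-2}
\quad \text{under } H_0 : \beta_1 = 0
\]
```

\(H_0\) is rejected when the two-sided p-value of \(t\) falls below the chosen significance level.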

🐟 Results and Discussion

The hypotheses were tested for significance of regression using simple linear regression at a significance level of \(\alpha=0.05\). The raw data were first imported.
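# Import the tab-delimited sensor readings recorded at 30 cm depth.
rawdata <- read.delim("Sensor-data-for-30-cm.txt", header = TRUE)
Turb <- rawdata$Turbidity..NTU.
# Select the temperature column by prefix: the degree sign in its header is
# mis-encoded on import, so the full column name is not portable.
Temp <- rawdata[[grep("^Temperature", names(rawdata))[1]]]
pHLevel <- rawdata$pH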
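The imported column names carry unit suffixes that `read.delim()` turns into runs of dots. A minimal sketch of one way to tidy them is shown below; `strip_units` is a hypothetical helper, not part of the original analysis, and it assumes the unit suffix always starts at the first pair of consecutive dots.

```r
# Hypothetical helper: strip the unit suffix that read.delim() leaves behind
# when it converts headers such as "Turbidity (NTU)" into valid R names.
strip_units <- function(col_names) {
  # Drop everything from the first run of two dots onward.
  sub("\\.\\..*$", "", col_names)
}

# Column names as they appear in the import code above.
imported <- c("Turbidity..NTU.", "Temperature..\u00c2.C.", "pH")
clean <- strip_units(imported)
print(clean)
```

Applying it as `names(rawdata) <- strip_units(names(rawdata))` would only be safe if the shortened names remain unique, which has not been checked against the full dataset.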
The factors imported above are expected to be linearly related to one another. Confirming these relationships with simple linear regression supports the validity of the monitoring system as a solid foundation for similar future models. Drawing a scatter diagram is the simplest way to judge whether a simple linear regression model is reasonable.
plot(Temp, pHLevel,
     main = "Scatter Diagram of Temperature vs. pH Level",
     xlab = "Temperature",
     ylab = "pH Level", las = 1)

plot(Turb, pHLevel,
     main = "Scatter Diagram of Turbidity vs. pH Level",
     xlab = "Turbidity",
     ylab = "pH Level", las = 1)

plot(Temp, Turb,
     main = "Scatter Diagram of Temperature vs. Turbidity",
     xlab = "Temperature",
     ylab = "Turbidity", las = 1)

Based on the diagrams above, a simple linear regression model does not seem reasonable at first glance. This is because many external factors affect the variables. When the data were separated by day and time of day, the relationships between the variables were seen more clearly, as the scatterplots below show.
plot(Temp[5156:5885], pHLevel[5156:5885],
     main = "Scatter Diagram of Temperature vs. pH Level (Day 5: Daytime)",
     xlab = "Temperature",
     ylab = "pH Level", las = 1)

plot(Temp[6554:7189], pHLevel[6554:7189],
     main = "Scatter Diagram of Temperature vs. pH Level (Day 6: Daytime)",
     xlab = "Temperature",
     ylab = "pH Level", las = 1)

plot(Turb[6554:7189], pHLevel[6554:7189],
     main = "Scatter Diagram of Turbidity vs. pH Level (Day 6: Daytime)",
     xlab = "Turbidity",
     ylab = "pH Level", las = 1)

plot(Turb[7833:8476], pHLevel[7833:8476],
     main = "Scatter Diagram of Turbidity vs. pH Level (Day 7: Daytime)",
     xlab = "Turbidity",
     ylab = "pH Level", las = 1)

plot(Temp[811:1445], Turb[811:1445],
     main = "Scatter Diagram of Temperature vs. Turbidity (Day 2: Daytime)",
     xlab = "Temperature",
     ylab = "Turbidity", las = 1)

plot(Temp[7833:8476], Turb[7833:8476],
     main = "Scatter Diagram of Temperature vs. Turbidity (Day 7: Daytime)",
     xlab = "Temperature",
     ylab = "Turbidity", las = 1)

In the diagrams above, the relationships between the variables are evident. Hence, testing the significance of regression for the different models is plausible.
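The same windowing used for the scatter diagrams could also be applied to the regression itself. The sketch below is only an illustration on simulated data: `fit_window`, `temp_sim`, `ph_sim`, and the two windows are hypothetical names and values, not part of the original analysis.

```r
# Hypothetical helper: fit y ~ x on one index window and report the slope
# estimate together with its two-sided p-value.
fit_window <- function(x, y, idx) {
  fit <- lm(y ~ x, data = data.frame(x = x[idx], y = y[idx]))
  coefs <- summary(fit)$coefficients
  c(slope = coefs["x", "Estimate"], p_value = coefs["x", "Pr(>|t|)"])
}

# Simulated stand-in for the sensor readings (not the real data).
set.seed(1)
temp_sim <- seq(18, 24, length.out = 200)
ph_sim <- 8.3 - 0.03 * temp_sim + rnorm(200, sd = 0.01)

# Two illustrative windows; the real analysis would use index ranges such as
# 5156:5885 from the plots above.
windows <- list(first_half = 1:100, second_half = 101:200)
results <- sapply(windows, function(idx) fit_window(temp_sim, ph_sim, idx))
print(results)
```

Each column of `results` holds the slope and p-value for one window, which makes it easy to compare how the fitted relationship changes from one period to the next.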

Testing the significance of regression for temperature vs. pH level:

TempvpH = lm(pHLevel ~ Temp)
summary(TempvpH)
## 
## Call:
## lm(formula = pHLevel ~ Temp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.85876 -0.37672  0.00436  0.43688  0.67226 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.308087   0.044005  188.80   <2e-16 ***
## Temp        -0.027503   0.002188  -12.57   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.44 on 9621 degrees of freedom
## Multiple R-squared:  0.01615,    Adjusted R-squared:  0.01605 
## F-statistic:   158 on 1 and 9621 DF,  p-value: < 2.2e-16
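The printed summary truncates very small p-values to `<2e-16`. A minimal sketch of how the exact values can be read from a fitted model is given below, using simulated data rather than the sensor readings; the variable names are illustrative assumptions.

```r
# Simulated stand-in data (not the sensor readings).
set.seed(42)
x <- runif(500, min = 15, max = 25)
y <- 8 - 0.03 * x + rnorm(500, sd = 0.4)

fit <- lm(y ~ x)
s <- summary(fit)

# Two-sided p-value of the slope, read from the coefficient table.
p_slope <- s$coefficients["x", "Pr(>|t|)"]

# p-value of the overall F-test, recomputed from the stored F statistic.
f <- s$fstatistic
p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# With a single predictor the two tests coincide (F = t^2).
print(c(slope = p_slope, model = p_model))
```

For the models in this paper, the same two expressions applied to `summary(TempvpH)` would return the untruncated p-value.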

Testing the significance of regression for turbidity vs. pH level:

TurbvpH = lm(pHLevel ~ Turb)
summary(TurbvpH)
## 
## Call:
## lm(formula = pHLevel ~ Turb)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.79342 -0.39126 -0.00422  0.45999  0.66840 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.4395716  0.0603153 123.345  < 2e-16 ***
## Turb        0.0014772  0.0002791   5.292 1.23e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.443 on 9621 degrees of freedom
## Multiple R-squared:  0.002903,   Adjusted R-squared:  0.002799 
## F-statistic: 28.01 on 1 and 9621 DF,  p-value: 1.234e-07

Testing the significance of regression for turbidity vs. temperature:

TurbvTemp = lm(Temp ~ Turb)
summary(TurbvTemp)
## 
## Call:
## lm(formula = Temp ~ Turb)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4326 -1.6668 -0.0747  1.5594  4.0650 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 21.726667   0.278581  77.990  < 2e-16 ***
## Turb        -0.007991   0.001289  -6.198 5.94e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.046 on 9621 degrees of freedom
## Multiple R-squared:  0.003977,   Adjusted R-squared:  0.003874 
## F-statistic: 38.42 on 1 and 9621 DF,  p-value: 5.942e-10
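The three summaries above can be cross-checked by hand. In simple linear regression, the slope's \(t\) statistic is the estimate divided by its standard error, the \(F\) statistic equals \(t^2\), and \(R^2 = F/(F + \mathrm{df})\) with \(\mathrm{df} = 9621\). The arithmetic below uses only the rounded values printed in the output, so the last digit may differ slightly.

```latex
\[
t_{\text{Temp}} = \frac{-0.027503}{0.002188} \approx -12.57, \qquad
t_{\text{Turb}\to\text{pH}} = \frac{0.0014772}{0.0002791} \approx 5.29, \qquad
t_{\text{Turb}\to\text{Temp}} = \frac{-0.007991}{0.001289} \approx -6.20
\]
\[
F = t^2: \quad (-12.57)^2 \approx 158.0, \qquad
5.292^2 \approx 28.01, \qquad
(-6.198)^2 \approx 38.42
\]
\[
R^2 = \frac{F}{F + 9621}: \quad
\frac{158}{9779} \approx 0.016, \qquad
\frac{28.01}{9649.01} \approx 0.0029, \qquad
\frac{38.42}{9659.42} \approx 0.0040
\]
```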
The p-value of each simple linear regression model can be determined from the information above. The p-values for temperature vs. pH level, turbidity vs. pH level, and turbidity vs. temperature are \(6.1580143\times10^{-36}\), \(1.2350948\times10^{-7}\), and \(5.9509674\times10^{-10}\), respectively. Since all of the p-values are less than \(0.05\), the null hypotheses are rejected. Therefore, there is sufficient evidence to conclude that there is a linear relationship between each pair of factors.
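The decision rule applied above can be sketched in a few lines. The p-values are the ones reported in the text, read in scientific notation; the vector names are labels chosen here for illustration.

```r
# p-values reported in the text above, in the order the models were fitted.
p_values <- c(
  temp_vs_ph   = 6.1580143e-36,
  turb_vs_ph   = 1.2350948e-7,
  turb_vs_temp = 5.9509674e-10
)
alpha <- 0.05  # significance level used in this paper

# Reject H0 for a model when its p-value falls below alpha.
reject_h0 <- p_values < alpha
print(reject_h0)
```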

🐟 Conclusion

From the data gathered, the researchers rejected the null hypotheses, as there was sufficient evidence of a linear relationship between the given factors. From the regression models created, it is concluded that pH level and temperature, pH level and turbidity, and temperature and turbidity are each linearly related. These results indicate that the monitoring system is an accurate model for observing the water quality of fish farms. This research will be helpful not only to fish farm owners but also to communities near bodies of water that need to monitor the quality of their waters. It also serves as a foundation for future studies aiming to build more advanced monitoring systems that track factors other than pH level, temperature, and turbidity. Monitoring systems can also be developed for media other than water, such as air and soil.

🐟 References

[1] M. Williams, What percent of Earth is water? Universe Today, 2014. https://phys.org/news/2014-12-percent-earth.html

[2] McKinsey & Company and Ocean Conservancy, Stemming the Tide: Land-based Strategies for a Plastic-Free Ocean. 2015. https://www.mckinsey.com/business-functions/sustainability/our-insights/stemming-the-tide-land-based-strategies-for-a-plastic-free-ocean

[3] Arif Istiaq Arafat, Tasmima Akter, Md. Ferdous Ahammed, Md. Younus Ali, Abdullah-Al Nahid, A dataset for internet of things based fish farm monitoring and notification system. 2020. https://doi.org/10.1016/j.dib.2020.106457

[4] United States Geological Survey, pH and Water. 2019. https://www.usgs.gov/special-topic/water-science-school/science/ph-and-water?qt-science_center_objects=0#qt-science_center_objects

[5] Andrew Zimmerman Jones, Temperature Definition in Science. 2019. https://www.thoughtco.com/temperature-definition-in-science-2699014

[6] United States Geological Survey, Turbidity and Water. 2019. https://www.usgs.gov/special-topic/water-science-school/science/turbidity-and-water?qt-science_center_objects=0#qt-science_center_objects

[7] Arif Istiaq Arafat, Tasmima Akter, Md. Ferdous Ahammed, Md. Younus Ali, Abdullah-Al Nahid, KU-MWQ: A Dataset for Monitoring Water Quality Using Digital Sensors. 2020. https://data.mendeley.com/datasets/34rczh25kc/4