This report presents an analysis of the water quality at two monitoring stations in the ‘Foz de Areia’ water reservoir. The dataset for each station has 19 variables with 39 observations each, totaling 780 observations.
The first step of a statistical analysis is a descriptive analysis of each variable. The measures used in this report are shown below, along with the boxplot for each variable.
## Min Max Amplitude Mean Median
## Temp.Amb 6.90 33.00 26.10 19.37 19.10
## Temp.Agua 15.10 27.40 12.30 21.60 21.80
## Secchi 0.45 2.40 1.95 1.18 1.15
## OD 4.20 9.40 5.20 7.06 7.10
## pH 6.30 9.70 3.40 7.52 7.20
## Condutividade 39.00 90.00 51.00 63.51 63.00
## Ptotal 0.01 0.07 0.06 0.04 0.03
## Ntotal 0.50 3.80 3.30 1.66 1.50
## ST 39.00 89.00 50.00 58.77 57.00
## Turbidez 3.00 41.00 38.00 9.95 9.00
## Coli.Totais 0.90 3400.00 3399.10 438.53 99.00
## Coli.Termot 0.90 31.00 30.10 5.84 2.00
## DBO 0.90 8.12 7.22 2.11 1.89
## DQO 3.02 25.12 22.10 10.86 10.10
## Clorofila.a 0.92 148.67 147.75 14.67 6.04
## Fitoplancton 467.00 271594.00 271127.00 22465.79 5776.00
## Cianobacteria 0.00 269536.00 269536.00 19145.79 2152.00
## Pluvio 0.00 82.00 82.00 12.21 1.00
## Standard.Deviaton Mode Coefficient.of.Variation
## Temp.Amb 5.37 26.00 0.28
## Temp.Agua 3.61 26.00 0.17
## Secchi 0.46 0.90 0.39
## OD 1.13 7.10 0.16
## pH 0.72 7.20 0.10
## Condutividade 12.79 61.00 0.20
## Ptotal 0.01 0.02 0.41
## Ntotal 0.68 1.30 0.41
## ST 11.26 57.00 0.19
## Turbidez 7.00 5.00 0.70
## Coli.Totais 799.66 49.00 1.82
## Coli.Termot 7.90 0.90 1.35
## DBO 1.56 0.90 0.74
## DQO 4.88 11.05 0.45
## Clorofila.a 27.25 2.60 1.86
## Fitoplancton 46424.56 467.00 2.07
## Cianobacteria 46382.84 0.00 2.42
## Pluvio 19.97 0.00 1.64
## Min Max Amplitude Mean Median
## Temp.Amb 6.70 29.60 22.90 21.63 22.00
## Temp.Agua 10.10 26.90 16.80 19.91 19.60
## Secchi 0.20 3.00 2.80 1.50 1.40
## OD 3.50 15.20 11.70 7.68 7.20
## pH 6.70 10.10 3.40 7.77 7.50
## Condutividade 36.00 83.00 47.00 58.33 58.00
## Ptotal 0.01 0.46 0.45 0.05 0.03
## Ntotal 0.40 10.00 9.60 2.07 1.60
## ST 33.00 153.00 120.00 58.72 51.00
## Turbidez 2.00 95.00 93.00 13.41 5.00
## Coli.Totais 0.90 27000.00 26999.10 1111.53 7.80
## Coli.Termot 0.90 22.00 21.10 2.56 1.00
## DBO 0.90 154.56 153.66 6.08 1.90
## DQO 1.50 330.00 328.50 21.03 8.95
## Clorofila.a 0.82 469.00 468.18 41.62 5.80
## Fitoplancton 569.00 1432347.00 1431778.00 92972.95 9651.00
## Cianobacteria 0.00 1431246.00 1431246.00 90646.05 6820.00
## Pluvio 0.00 121.40 121.40 14.87 4.80
## Standard.Deviaton Mode Coefficient.of.Variation
## Temp.Amb 4.33 18.00 0.20
## Temp.Agua 4.13 18.00 0.21
## Secchi 0.76 0.90 0.50
## OD 2.52 6.50 0.33
## pH 0.93 7.20 0.12
## Condutividade 11.83 62.00 0.20
## Ptotal 0.08 0.02 1.63
## Ntotal 1.97 1.00 0.95
## ST 25.19 47.00 0.43
## Turbidez 20.65 4.00 1.54
## Coli.Totais 4429.79 1.70 3.99
## Coli.Termot 4.66 0.90 1.82
## DBO 24.47 0.90 4.03
## DQO 54.03 6.00 2.57
## Clorofila.a 107.30 0.82 2.58
## Fitoplancton 263237.64 569.00 2.83
## Cianobacteria 263680.54 0.00 2.91
## Pluvio 26.95 0.00 1.81
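As an illustration of how the measures in the tables above are defined, here is a minimal Python sketch (not the R code that produced this report). The function name `describe` and the sample values are hypothetical, and it assumes the sample standard deviation (divisor n - 1) and a coefficient of variation defined as standard deviation divided by mean, which is consistent with the tabulated values (for example 5.37 / 19.37 ≈ 0.28 for Temp.Amb).

```python
import statistics

def describe(values):
    """Return the summary measures used in the tables for one variable."""
    lo, hi = min(values), max(values)
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (divisor n - 1).
    sd = statistics.stdev(values)
    return {
        "Min": lo,
        "Max": hi,
        "Amplitude": hi - lo,                   # range: max minus min
        "Mean": mean,
        "Median": statistics.median(values),
        "Standard.Deviation": sd,
        "Mode": statistics.mode(values),        # most frequent value
        "Coefficient.of.Variation": sd / mean,  # relative dispersion
    }

# Hypothetical readings, not taken from the Foz de Areia dataset.
sample = [2, 4, 4, 6, 9]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A coefficient of variation above 1, as seen for Coli.Totais, Clorofila.a, Fitoplancton and Cianobacteria, means the standard deviation exceeds the mean, which usually points to a strongly skewed variable.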
The second analysis is principal component analysis (PCA), a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. First, we standardize the data by subtracting the mean of each variable and dividing by its standard deviation. We then examine the importance of the components, as follows:
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 2.3145 1.7712 1.3843 1.21847 1.09862 1.00618
## Proportion of Variance 0.2976 0.1743 0.1065 0.08248 0.06705 0.05624
## Cumulative Proportion 0.2976 0.4719 0.5784 0.66083 0.72789 0.78413
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.96817 0.81672 0.76256 0.65970 0.61352 0.51168
## Proportion of Variance 0.05208 0.03706 0.03231 0.02418 0.02091 0.01455
## Cumulative Proportion 0.83621 0.87326 0.90557 0.92975 0.95066 0.96520
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 0.46570 0.39847 0.35167 0.29794 0.19350 0.02797
## Proportion of Variance 0.01205 0.00882 0.00687 0.00493 0.00208 0.00004
## Cumulative Proportion 0.97725 0.98607 0.99294 0.99788 0.99996 1.00000
As we can see, the first nine components have a cumulative proportion of variance higher than 90% (0.90557). The plot also shows that the variance becomes very small around the ninth component. So, for this water station (FA_2R), the first nine components should be used. For the second station (FA_3R), the first seven components will be used (cumulative proportion 0.90707), as the following table and plot confirm.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 3.0304 1.3471 1.13506 1.07824 1.02571 0.98634
## Proportion of Variance 0.5102 0.1008 0.07158 0.06459 0.05845 0.05405
## Cumulative Proportion 0.5102 0.6110 0.68258 0.74717 0.80562 0.85967
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.92375 0.71354 0.64142 0.50971 0.42154 0.35064
## Proportion of Variance 0.04741 0.02829 0.02286 0.01443 0.00987 0.00683
## Cumulative Proportion 0.90707 0.93536 0.95821 0.97265 0.98252 0.98935
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 0.30114 0.22999 0.18572 0.10862 0.04257 0.003698
## Proportion of Variance 0.00504 0.00294 0.00192 0.00066 0.00010 0.000000
## Cumulative Proportion 0.99439 0.99733 0.99924 0.99990 1.00000 1.000000
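The selection rule used above (keep the smallest number of components whose cumulative proportion of variance reaches 90%) can be sketched in Python from the standard deviations printed in the two tables. The function names `importance` and `components_needed` are hypothetical, and the 0.90 threshold is taken from the text; this is only a sketch of the rule, not the original analysis code.

```python
def importance(sdev):
    """Proportion and cumulative proportion of variance per component."""
    variances = [s ** 2 for s in sdev]  # variance = squared standard deviation
    total = sum(variances)
    proportion = [v / total for v in variances]
    cumulative, running = [], 0.0
    for p in proportion:
        running += p
        cumulative.append(running)
    return proportion, cumulative

def components_needed(sdev, threshold=0.90):
    """Smallest number of components whose cumulative proportion >= threshold."""
    _, cumulative = importance(sdev)
    for k, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return k
    return len(sdev)

# Standard deviations copied from the two "Importance of components" tables.
sdev_fa_2r = [2.3145, 1.7712, 1.3843, 1.21847, 1.09862, 1.00618,
              0.96817, 0.81672, 0.76256, 0.65970, 0.61352, 0.51168,
              0.46570, 0.39847, 0.35167, 0.29794, 0.19350, 0.02797]
sdev_fa_3r = [3.0304, 1.3471, 1.13506, 1.07824, 1.02571, 0.98634,
              0.92375, 0.71354, 0.64142, 0.50971, 0.42154, 0.35064,
              0.30114, 0.22999, 0.18572, 0.10862, 0.04257, 0.003698]

k_2r = components_needed(sdev_fa_2r)
k_3r = components_needed(sdev_fa_3r)
print(k_2r, k_3r)
```

Because the data are standardized, the component variances sum to the number of variables (18), so each proportion is simply the squared standard deviation divided by 18.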
Once the number of components to analyze has been defined, we test whether the season (Estacao) affects these components. To do this, we use the ANOVA procedure, with one table per retained component. The results for the FA_2R station are:
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 66.9 22.302 2.767 0.0562 .
## Residuals 35 282.1 8.059
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 34.97 11.656 12 1.48e-05 ***
## Residuals 35 33.99 0.971
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 6.43 2.145 1.765 0.172
## Residuals 35 42.52 1.215
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 1.03 0.3443 0.279 0.84
## Residuals 35 43.15 1.2327
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 1.44 0.4792 0.435 0.729
## Residuals 35 38.54 1.1012
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 4.34 1.4458 1.551 0.219
## Residuals 35 32.63 0.9323
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 4.137 1.3791 1.706 0.184
## Residuals 35 28.289 0.8082
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 0.147 0.0490 0.089 0.965
## Residuals 35 19.200 0.5486
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 1.894 0.6312 1.608 0.205
## Residuals 35 13.740 0.3926
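In most cases, the season does not affect the components: only one of the nine tests is significant at the 5% level (p = 1.48e-05), and one more is marginal (p = 0.0562). The results for the FA_3R station are quite similar: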
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 66.9 22.300 2.767 0.0562 .
## Residuals 35 282.1 8.059
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 34.57 11.523 11.73 1.8e-05 ***
## Residuals 35 34.39 0.983
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 6.48 2.160 1.78 0.169
## Residuals 35 42.48 1.214
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 0.76 0.2541 0.205 0.892
## Residuals 35 43.42 1.2405
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 1.22 0.4056 0.366 0.778
## Residuals 35 38.76 1.1075
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 4.31 1.4357 1.539 0.222
## Residuals 35 32.66 0.9332
## Df Sum Sq Mean Sq F value Pr(>F)
## Estacao 3 3.909 1.3029 1.599 0.207
## Residuals 35 28.517 0.8148
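To make the ANOVA tables above easier to read, the following Python sketch shows how the F value of a one-way ANOVA is built from the between-group and within-group sums of squares. The function name `one_way_anova` and the component scores are hypothetical; only the last lines use numbers from the report (the first FA_2R table). The p-value is not computed here, since it requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """Return (df_between, df_within, ss_between, ss_within, f_value)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean ("Estacao" row).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean ("Residuals" row).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_value

# Hypothetical component scores grouped by season (not the report's data).
scores_by_season = [[1, 2, 3], [2, 3, 4], [5, 6, 7], [2, 3, 4]]
df_b, df_w, ss_b, ss_w, f_toy = one_way_anova(scores_by_season)
print(df_b, df_w, ss_b, ss_w, f_toy)

# Check against the first FA_2R table: F = (Sum Sq / Df) of Estacao
# divided by (Sum Sq / Df) of Residuals.
f_report = (66.9 / 3) / (282.1 / 35)
print(round(f_report, 3))
```

With four seasons and 39 observations, the degrees of freedom are 4 - 1 = 3 and 39 - 4 = 35, matching the Df column in every table.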
Finally, we calculated the factor loadings, the communality and the specific variance. Factor loadings represent how much a factor explains a variable in factor analysis; a loading is essentially the correlation between a variable and a factor. Communality is the portion of a variable's variance that it shares with all other variables considered, that is, the proportion of variance explained by the common factors. The specific variance is the remaining portion; since the variables are standardized to variance 1, it equals 1 minus the communality. For the first station the results are shown below:
## PC1 PC2 PC3 PC4
## Temp.Ambiente 0.74313585 -0.21548163 0.15446632 -0.187035481
## Temp.Agua 0.63688068 -0.06971953 0.10525507 0.084413453
## Secchi -0.57336715 0.56630706 -0.34677295 -0.023825889
## OD 0.37895905 0.43869672 0.07266318 0.501267230
## pH 0.64520041 0.43625584 0.03257857 0.386561941
## Condutividade 0.17751045 -0.17444074 -0.83874492 0.251269425
## Ptotal 0.67637736 -0.48523197 -0.20298907 -0.084128779
## Ntotal 0.66184739 -0.40032701 -0.03819923 0.210106740
## ST 0.48359035 -0.46952336 -0.57705777 0.066374566
## Turbidez 0.40465919 -0.53154584 0.12719559 -0.297444321
## Coli.Totais 0.22592359 -0.68698210 -0.00361884 0.083728456
## Coli.Termot -0.14046539 -0.30779733 0.47937381 -0.233753925
## DBO 0.17289699 0.35654723 -0.42926294 -0.638638363
## DQO 0.34672762 0.29296167 -0.21949904 -0.587899383
## Clorofila.a 0.80009739 0.48623152 0.05442041 0.002397134
## Fitoplancton 0.81439829 0.44272330 0.17506303 -0.080516173
## Cianobacteria 0.81321200 0.43167465 0.16694731 -0.097362523
## Pluvio. 0.01750353 -0.23385103 0.36831099 -0.054431688
## PC5 PC6 PC7 PC8
## Temp.Ambiente 0.198465379 -0.333229392 0.164525095 -0.065498644
## Temp.Agua 0.531638083 -0.432379656 -0.045815005 0.143357494
## Secchi 0.058765188 -0.086791106 0.030790434 0.291860935
## OD -0.334637818 0.292022069 0.013552065 -0.254267917
## pH 0.038692133 -0.110071762 -0.136432736 -0.017116348
## Condutividade 0.013656858 -0.051968542 -0.241572649 0.172061469
## Ptotal -0.067902977 0.243068441 -0.187182724 0.052704246
## Ntotal 0.178302632 0.319989934 0.248409183 0.131978711
## ST -0.201280751 -0.028617920 -0.212176425 0.007377689
## Turbidez -0.399844663 -0.294896114 -0.113938729 -0.305059235
## Coli.Totais -0.004318535 0.131308839 0.492257810 0.090799896
## Coli.Termot -0.427352624 0.019230849 -0.305670178 0.469126251
## DBO 0.132406743 0.025463616 -0.119988649 -0.206826370
## DQO 0.012791559 0.366330471 0.277745565 0.076671640
## Clorofila.a -0.137664002 0.023816756 -0.059393211 -0.037392673
## Fitoplancton -0.088128904 0.008250901 0.001981988 0.166827700
## Cianobacteria -0.098202943 0.009511001 -0.017090155 0.182938768
## Pluvio. 0.542517711 0.452039590 -0.494191350 -0.097300981
## PC9
## Temp.Ambiente 0.021081118
## Temp.Agua 0.185408934
## Secchi -0.005682843
## OD 0.203991933
## pH 0.361863031
## Condutividade -0.104625208
## Ptotal -0.142681105
## Ntotal -0.125103385
## ST 0.167624988
## Turbidez -0.078091539
## Coli.Totais 0.203363933
## Coli.Termot 0.270437616
## DBO 0.172868097
## DQO 0.219203788
## Clorofila.a -0.073878985
## Fitoplancton -0.218459768
## Cianobacteria -0.217001892
## Pluvio. 0.015534816
## Communality Specific Variance Variance
## Temp.Ambiente 0.8397587 0.16024134 1
## Temp.Agua 0.9553001 0.04469985 1
## Secchi 0.8674219 0.13257807 1
## OD 0.8963214 0.10367856 1
## pH 0.9205588 0.07944119 1
## Condutividade 0.9303651 0.06963495 1
## Ptotal 0.8630847 0.13691531 1
## Ntotal 0.8728694 0.12713055 1
## ST 0.9062174 0.09378260 1
## Turbidez 0.9099228 0.09007719 1
## Coli.Totais 0.8391894 0.16081065 1
## Coli.Termot 0.9685602 0.03143984 1
## DBO 0.8543827 0.14561733 1
## DQO 0.8652852 0.13471483 1
## Clorofila.a 0.9094467 0.09055326 1
## Fitoplancton 0.9797733 0.02022673 1
## Cianobacteria 0.9755904 0.02440961 1
## Pluvio. 0.9462076 0.05379236 1
And for the second station:
## PC1 PC2 PC3 PC4
## Temp.Ambiente -0.253537252 0.8680569595 -0.255524785 -0.219074240
## Temp.Agua -0.426407777 0.7433589509 0.038344457 -0.035770479
## Secchi 0.589064389 -0.1265945721 -0.545999396 0.172102507
## OD -0.703176634 -0.2609594540 0.461179675 0.231684515
## pH -0.803510299 0.3173544573 -0.074201548 0.270983537
## Condutividade -0.540887619 0.0900357674 -0.485157043 0.220790830
## Ptotal -0.772760995 0.0716064805 0.147494104 0.072266030
## Ntotal -0.944364556 -0.0530069177 -0.007123909 -0.008785959
## ST -0.900842765 0.0538131674 0.087518694 0.148121045
## Turbidez -0.906619176 -0.0331535849 0.110504699 -0.142210075
## Coli.Totais 0.024375816 0.1987778790 0.411289506 -0.151390796
## Coli.Termot 0.005703707 0.2412633880 0.386779078 0.446247427
## DBO -0.775137276 -0.3196473807 -0.164305414 -0.189489481
## DQO -0.896069610 -0.2403634815 -0.108499836 -0.109126998
## Clorofila.a -0.917661486 0.0006985685 -0.046237809 -0.049010646
## Fitoplancton -0.941366203 -0.1507706812 -0.139168987 -0.098684807
## Cianobacteria -0.941886339 -0.1504591184 -0.137939884 -0.097776698
## Pluvio. 0.124022880 0.0253542725 0.173730489 -0.754198817
## PC5 PC6 PC7
## Temp.Ambiente 0.010364042 0.038863130 -0.03764755
## Temp.Agua -0.089981337 -0.009731537 -0.37430562
## Secchi -0.275786012 0.189343738 0.05571132
## OD 0.002633356 -0.105333437 -0.14165770
## pH 0.119822530 -0.080993234 0.01345327
## Condutividade -0.171495884 -0.105860702 0.46483176
## Ptotal 0.316394766 -0.060107074 0.29752880
## Ntotal -0.001446046 0.020984581 0.16678082
## ST 0.117422444 -0.055968136 0.27716797
## Turbidez 0.109673890 0.021830761 -0.10262922
## Coli.Totais -0.176859094 0.794208605 0.30802053
## Coli.Termot -0.712613404 -0.220193315 0.04466749
## DBO -0.324458109 0.119713602 -0.23509856
## DQO -0.195378155 0.077543203 -0.09009743
## Clorofila.a 0.052049006 0.022754837 -0.02571117
## Fitoplancton -0.130085082 0.074487444 -0.09280143
## Cianobacteria -0.129961828 0.074957218 -0.09092354
## Pluvio. -0.281169341 -0.433128718 0.30145465
## Communality Specific Variance Variance
## Temp.Ambiente 0.9341256 0.06587444 1
## Temp.Agua 0.8854520 0.11454801 1
## Secchi 0.8057704 0.19422962 1
## OD 0.8600906 0.13990940 1
## pH 0.8463789 0.15362107 1
## Condutividade 0.8414777 0.15852231 1
## Ptotal 0.8215058 0.17849417 1
## Ntotal 0.9230204 0.07697962 1
## ST 0.9377555 0.06224455 1
## Turbidez 0.8785302 0.12146981 1
## Coli.Totais 0.9891082 0.01089185 1
## Coli.Termot 0.9652735 0.03472648 1
## DBO 0.9407905 0.05920948 1
## DQO 0.9366994 0.06330062 1
## Clorofila.a 0.8505310 0.14946899 1
## Fitoplancton 0.9690914 0.03090856 1
## Cianobacteria 0.9691513 0.03084873 1
## Pluvio. 0.9725542 0.02744576 1
Each variable is assigned to the component on which its factor loading is largest in absolute value. Variables whose largest loading is smaller than 0.5 in absolute value should be removed from the dataset. For the FA_2R station we removed the Coli.Termot variable, whose largest loading is 0.479 (on PC3); for the FA_3R station no variables were removed.
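The quantities in the loading tables and the assignment rule above can be sketched in Python as follows. The function name `summarize_loadings` is hypothetical, and the sketch assumes uncorrelated components, so that the communality is the sum of squared loadings over the retained components. That assumption reproduces the tabulated values for the two FA_2R rows used here.

```python
def summarize_loadings(loadings, threshold=0.5):
    """Communality, specific variance and component assignment for one variable."""
    # Assumption: uncorrelated components, so communality = sum of squared loadings.
    communality = sum(l ** 2 for l in loadings)
    # Index of the component with the largest loading in absolute value.
    best = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
    return {
        "communality": communality,
        "specific_variance": 1.0 - communality,  # variables have variance 1
        "component": f"PC{best + 1}",
        "max_abs_loading": abs(loadings[best]),
        "keep": abs(loadings[best]) >= threshold,
    }

# Loadings on PC1..PC9 copied from the FA_2R table.
temp_ambiente = [0.74313585, -0.21548163, 0.15446632, -0.187035481,
                 0.198465379, -0.333229392, 0.164525095, -0.065498644,
                 0.021081118]
coli_termot = [-0.14046539, -0.30779733, 0.47937381, -0.233753925,
               -0.427352624, 0.019230849, -0.305670178, 0.469126251,
               0.270437616]

res_temp = summarize_loadings(temp_ambiente)
res_coli = summarize_loadings(coli_termot)
print(res_temp)
print(res_coli)
```

Note that Coli.Termot is dropped even though its communality is high (about 0.97): its variance is spread over several components, and no single loading reaches 0.5.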