1. Introduction

This report presents an analysis of water quality at two stations in the ‘Foz de Areia’ water reservoir. The dataset for each station has 19 variables and 39 observations, totaling 780 observations.

2. Descriptive Analysis

The first step of a statistical analysis is a descriptive analysis of each variable. The measures used in this report are shown below, along with the boxplot for each variable.

##                  Min       Max Amplitude     Mean  Median
## Temp.Amb        6.90     33.00     26.10    19.37   19.10
## Temp.Agua      15.10     27.40     12.30    21.60   21.80
## Secchi          0.45      2.40      1.95     1.18    1.15
## OD              4.20      9.40      5.20     7.06    7.10
## pH              6.30      9.70      3.40     7.52    7.20
## Condutividade  39.00     90.00     51.00    63.51   63.00
## Ptotal          0.01      0.07      0.06     0.04    0.03
## Ntotal          0.50      3.80      3.30     1.66    1.50
## ST             39.00     89.00     50.00    58.77   57.00
## Turbidez        3.00     41.00     38.00     9.95    9.00
## Coli.Totais     0.90   3400.00   3399.10   438.53   99.00
## Coli.Termot     0.90     31.00     30.10     5.84    2.00
## DBO             0.90      8.12      7.22     2.11    1.89
## DQO             3.02     25.12     22.10    10.86   10.10
## Clorofila.a     0.92    148.67    147.75    14.67    6.04
## Fitoplancton  467.00 271594.00 271127.00 22465.79 5776.00
## Cianobacteria   0.00 269536.00 269536.00 19145.79 2152.00
## Pluvio          0.00     82.00     82.00    12.21    1.00
##              Standard.Deviation   Mode Coefficient.of.Variation
## Temp.Amb                   5.37  26.00                     0.28
## Temp.Agua                  3.61  26.00                     0.17
## Secchi                     0.46   0.90                     0.39
## OD                         1.13   7.10                     0.16
## pH                         0.72   7.20                     0.10
## Condutividade             12.79  61.00                     0.20
## Ptotal                     0.01   0.02                     0.41
## Ntotal                     0.68   1.30                     0.41
## ST                        11.26  57.00                     0.19
## Turbidez                   7.00   5.00                     0.70
## Coli.Totais              799.66  49.00                     1.82
## Coli.Termot                7.90   0.90                     1.35
## DBO                        1.56   0.90                     0.74
## DQO                        4.88  11.05                     0.45
## Clorofila.a               27.25   2.60                     1.86
## Fitoplancton           46424.56 467.00                     2.07
## Cianobacteria          46382.84   0.00                     2.42
## Pluvio                    19.97   0.00                     1.64
##                  Min        Max  Amplitude     Mean  Median
## Temp.Amb        6.70      29.60      22.90    21.63   22.00
## Temp.Agua      10.10      26.90      16.80    19.91   19.60
## Secchi          0.20       3.00       2.80     1.50    1.40
## OD              3.50      15.20      11.70     7.68    7.20
## pH              6.70      10.10       3.40     7.77    7.50
## Condutividade  36.00      83.00      47.00    58.33   58.00
## Ptotal          0.01       0.46       0.45     0.05    0.03
## Ntotal          0.40      10.00       9.60     2.07    1.60
## ST             33.00     153.00     120.00    58.72   51.00
## Turbidez        2.00      95.00      93.00    13.41    5.00
## Coli.Totais     0.90   27000.00   26999.10  1111.53    7.80
## Coli.Termot     0.90      22.00      21.10     2.56    1.00
## DBO             0.90     154.56     153.66     6.08    1.90
## DQO             1.50     330.00     328.50    21.03    8.95
## Clorofila.a     0.82     469.00     468.18    41.62    5.80
## Fitoplancton  569.00 1432347.00 1431778.00 92972.95 9651.00
## Cianobacteria   0.00 1431246.00 1431246.00 90646.05 6820.00
## Pluvio          0.00     121.40     121.40    14.87    4.80
##              Standard.Deviation   Mode Coefficient.of.Variation
## Temp.Amb                   4.33  18.00                     0.20
## Temp.Agua                  4.13  18.00                     0.21
## Secchi                     0.76   0.90                     0.50
## OD                         2.52   6.50                     0.33
## pH                         0.93   7.20                     0.12
## Condutividade             11.83  62.00                     0.20
## Ptotal                     0.08   0.02                     1.63
## Ntotal                     1.97   1.00                     0.95
## ST                        25.19  47.00                     0.43
## Turbidez                  20.65   4.00                     1.54
## Coli.Totais             4429.79   1.70                     3.99
## Coli.Termot                4.66   0.90                     1.82
## DBO                       24.47   0.90                     4.03
## DQO                       54.03   6.00                     2.57
## Clorofila.a              107.30   0.82                     2.58
## Fitoplancton          263237.64 569.00                     2.83
## Cianobacteria         263680.54   0.00                     2.91
## Pluvio                    26.95   0.00                     1.81
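
The measures in the tables above are related by simple formulas: the amplitude is the maximum minus the minimum (33.00 - 6.90 = 26.10 for Temp.Amb in the first table) and the coefficient of variation is the standard deviation divided by the mean (5.37 / 19.37 ≈ 0.28 for the same variable). The tables look like R output; the Python sketch below is only an illustration of those formulas. The function name `describe`, the sample values, and the use of the sample standard deviation (n - 1 denominator) are assumptions, not taken from the report.

```python
import statistics


def describe(values):
    """Return the summary measures listed in the descriptive tables for one variable."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator); assumed convention.
    sd = statistics.stdev(values)
    return {
        "min": min(values),
        "max": max(values),
        "amplitude": max(values) - min(values),
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "mode": statistics.mode(values),
        "cv": sd / mean,  # coefficient of variation
    }


# Hypothetical readings, not taken from the reservoir dataset.
sample = [2.0, 4.0, 4.0, 6.0]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>9}: {value:.2f}")
```

A coefficient of variation above 1, as seen for Coli.Totais, Clorofila.a, Fitoplancton and Cianobacteria in both tables, means the standard deviation exceeds the mean, which is typical of strongly right-skewed variables.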

3. Principal Component Analysis

The second analysis is principal component analysis (PCA), a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. First, we standardize the data: each variable is centered on its mean and divided by its standard deviation. We then examine the importance of the components, as follows:

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6
## Standard deviation     2.3145 1.7712 1.3843 1.21847 1.09862 1.00618
## Proportion of Variance 0.2976 0.1743 0.1065 0.08248 0.06705 0.05624
## Cumulative Proportion  0.2976 0.4719 0.5784 0.66083 0.72789 0.78413
##                            PC7     PC8     PC9    PC10    PC11    PC12
## Standard deviation     0.96817 0.81672 0.76256 0.65970 0.61352 0.51168
## Proportion of Variance 0.05208 0.03706 0.03231 0.02418 0.02091 0.01455
## Cumulative Proportion  0.83621 0.87326 0.90557 0.92975 0.95066 0.96520
##                           PC13    PC14    PC15    PC16    PC17    PC18
## Standard deviation     0.46570 0.39847 0.35167 0.29794 0.19350 0.02797
## Proportion of Variance 0.01205 0.00882 0.00687 0.00493 0.00208 0.00004
## Cumulative Proportion  0.97725 0.98607 0.99294 0.99788 0.99996 1.00000

As we can see, the first nine components have a cumulative proportion of variance above 90% (0.90557). The plot also shows that from around the ninth component onward the variance of each component is very small. So, for this water station (FA_2R), the first nine components should be used. For the second station (FA_3R), the first seven components will be used (cumulative proportion 0.90707), as the following table and plot confirm.

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     3.0304 1.3471 1.13506 1.07824 1.02571 0.98634
## Proportion of Variance 0.5102 0.1008 0.07158 0.06459 0.05845 0.05405
## Cumulative Proportion  0.5102 0.6110 0.68258 0.74717 0.80562 0.85967
##                            PC7     PC8     PC9    PC10    PC11    PC12
## Standard deviation     0.92375 0.71354 0.64142 0.50971 0.42154 0.35064
## Proportion of Variance 0.04741 0.02829 0.02286 0.01443 0.00987 0.00683
## Cumulative Proportion  0.90707 0.93536 0.95821 0.97265 0.98252 0.98935
##                           PC13    PC14    PC15    PC16    PC17     PC18
## Standard deviation     0.30114 0.22999 0.18572 0.10862 0.04257 0.003698
## Proportion of Variance 0.00504 0.00294 0.00192 0.00066 0.00010 0.000000
## Cumulative Proportion  0.99439 0.99733 0.99924 0.99990 1.00000 1.000000
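
The component counts chosen above can be recomputed from the "Standard deviation" rows of the two importance tables: each component's variance is its standard deviation squared, and its proportion is that variance divided by the total. The sketch below is in Python for illustration only. The helper name `components_needed` is hypothetical, and treating 90% as a hard selection threshold is an assumption based on the wording of the text.

```python
def components_needed(std_devs, threshold=0.90):
    """Smallest number of leading components whose cumulative share of
    variance reaches the threshold, together with that cumulative share."""
    variances = [s ** 2 for s in std_devs]
    total = sum(variances)
    cumulative = 0.0
    for k, var in enumerate(variances, start=1):
        cumulative += var / total
        if cumulative >= threshold:
            return k, cumulative
    return len(variances), cumulative


# Standard deviations copied from the two importance tables.
fa_2r_sd = [2.3145, 1.7712, 1.3843, 1.21847, 1.09862, 1.00618,
            0.96817, 0.81672, 0.76256, 0.65970, 0.61352, 0.51168,
            0.46570, 0.39847, 0.35167, 0.29794, 0.19350, 0.02797]
fa_3r_sd = [3.0304, 1.3471, 1.13506, 1.07824, 1.02571, 0.98634,
            0.92375, 0.71354, 0.64142, 0.50971, 0.42154, 0.35064,
            0.30114, 0.22999, 0.18572, 0.10862, 0.04257, 0.003698]

k_2r, cum_2r = components_needed(fa_2r_sd)
k_3r, cum_3r = components_needed(fa_3r_sd)
print("FA_2R:", k_2r, "components, cumulative", round(cum_2r, 4))
print("FA_3R:", k_3r, "components, cumulative", round(cum_3r, 4))
```

Because the data were standardized, the variances of the 18 components sum to approximately 18, the number of variables, so each proportion is roughly the squared standard deviation divided by 18.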

Once the number of components to be analyzed has been defined, we check whether the seasons (the Estacao factor) affect these components. To do this, we run an ANOVA for each retained component, in order from the first component. The results for the FA_2R station are:

##             Df Sum Sq Mean Sq F value Pr(>F)  
## Estacao      3   66.9  22.302   2.767 0.0562 .
## Residuals   35  282.1   8.059                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Estacao      3  34.97  11.656      12 1.48e-05 ***
## Residuals   35  33.99   0.971                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3   6.43   2.145   1.765  0.172
## Residuals   35  42.52   1.215
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3   1.03  0.3443   0.279   0.84
## Residuals   35  43.15  1.2327
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3   1.44  0.4792   0.435  0.729
## Residuals   35  38.54  1.1012
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3   4.34  1.4458   1.551  0.219
## Residuals   35  32.63  0.9323
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3  4.137  1.3791   1.706  0.184
## Residuals   35 28.289  0.8082
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3  0.147  0.0490   0.089  0.965
## Residuals   35 19.200  0.5486
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3  1.894  0.6312   1.608  0.205
## Residuals   35 13.740  0.3926

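In most cases, the seasons do not affect the components: only the second component shows a significant effect (p = 1.48e-05), while the first is marginal (p = 0.0562). The results for the FA_3R station were quite similar: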

##             Df Sum Sq Mean Sq F value Pr(>F)  
## Estacao      3   66.9  22.300   2.767 0.0562 .
## Residuals   35  282.1   8.059                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## Estacao      3  34.57  11.523   11.73 1.8e-05 ***
## Residuals   35  34.39   0.983                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3   6.48   2.160    1.78  0.169
## Residuals   35  42.48   1.214
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3   0.76  0.2541   0.205  0.892
## Residuals   35  43.42  1.2405
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3   1.22  0.4056   0.366  0.778
## Residuals   35  38.76  1.1075
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3   4.31  1.4357   1.539  0.222
## Residuals   35  32.66  0.9332
##             Df Sum Sq Mean Sq F value Pr(>F)
## Estacao      3  3.909  1.3029   1.599  0.207
## Residuals   35 28.517  0.8148
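
Each ANOVA table above follows the same arithmetic: the mean square is the sum of squares divided by its degrees of freedom, and the F value is the ratio of the Estacao mean square to the residual mean square. The Python sketch below illustrates this for a one-way layout. The function name `one_way_anova` and the grouped scores are hypothetical, and the p-value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """One-way ANOVA decomposition for a list of groups of numbers.

    Returns (df_between, df_within, ss_between, ss_within, f_value).
    """
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group sizes times squared mean offsets.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (residual) sum of squares.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_value


# Hypothetical component scores grouped by four seasons (not the report's data).
scores_by_season = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [4.0, 5.0, 6.0]]
df_b, df_w, ss_b, ss_w, f_value = one_way_anova(scores_by_season)
print(df_b, df_w, ss_b, ss_w, f_value)

# Recomputing F from the printed Sum Sq and Df of the second FA_2R table.
f_from_table = (34.97 / 3) / (33.99 / 35)
print(round(f_from_table, 2))
```

With 39 observations and four seasons, the degrees of freedom are 4 - 1 = 3 for Estacao and 39 - 4 = 35 for the residuals, as printed in every table.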

Finally, we calculated the factor loadings, communalities, and specific variances. Factor loadings represent how much a factor explains a variable in factor analysis; each loading is essentially the correlation between a variable and a factor. Communality is the portion of a variable's variance that it shares with all other variables considered, that is, the proportion of its variance explained by the common factors. The specific variance is the remainder, so the two sum to the total variance of 1 for each standardized variable. For the first station the results are shown below:

##                       PC1         PC2         PC3          PC4
## Temp.Ambiente  0.74313585 -0.21548163  0.15446632 -0.187035481
## Temp.Agua      0.63688068 -0.06971953  0.10525507  0.084413453
## Secchi        -0.57336715  0.56630706 -0.34677295 -0.023825889
## OD             0.37895905  0.43869672  0.07266318  0.501267230
## pH             0.64520041  0.43625584  0.03257857  0.386561941
## Condutividade  0.17751045 -0.17444074 -0.83874492  0.251269425
## Ptotal         0.67637736 -0.48523197 -0.20298907 -0.084128779
## Ntotal         0.66184739 -0.40032701 -0.03819923  0.210106740
## ST             0.48359035 -0.46952336 -0.57705777  0.066374566
## Turbidez       0.40465919 -0.53154584  0.12719559 -0.297444321
## Coli.Totais    0.22592359 -0.68698210 -0.00361884  0.083728456
## Coli.Termot   -0.14046539 -0.30779733  0.47937381 -0.233753925
## DBO            0.17289699  0.35654723 -0.42926294 -0.638638363
## DQO            0.34672762  0.29296167 -0.21949904 -0.587899383
## Clorofila.a    0.80009739  0.48623152  0.05442041  0.002397134
## Fitoplancton   0.81439829  0.44272330  0.17506303 -0.080516173
## Cianobacteria  0.81321200  0.43167465  0.16694731 -0.097362523
## Pluvio.        0.01750353 -0.23385103  0.36831099 -0.054431688
##                        PC5          PC6          PC7          PC8
## Temp.Ambiente  0.198465379 -0.333229392  0.164525095 -0.065498644
## Temp.Agua      0.531638083 -0.432379656 -0.045815005  0.143357494
## Secchi         0.058765188 -0.086791106  0.030790434  0.291860935
## OD            -0.334637818  0.292022069  0.013552065 -0.254267917
## pH             0.038692133 -0.110071762 -0.136432736 -0.017116348
## Condutividade  0.013656858 -0.051968542 -0.241572649  0.172061469
## Ptotal        -0.067902977  0.243068441 -0.187182724  0.052704246
## Ntotal         0.178302632  0.319989934  0.248409183  0.131978711
## ST            -0.201280751 -0.028617920 -0.212176425  0.007377689
## Turbidez      -0.399844663 -0.294896114 -0.113938729 -0.305059235
## Coli.Totais   -0.004318535  0.131308839  0.492257810  0.090799896
## Coli.Termot   -0.427352624  0.019230849 -0.305670178  0.469126251
## DBO            0.132406743  0.025463616 -0.119988649 -0.206826370
## DQO            0.012791559  0.366330471  0.277745565  0.076671640
## Clorofila.a   -0.137664002  0.023816756 -0.059393211 -0.037392673
## Fitoplancton  -0.088128904  0.008250901  0.001981988  0.166827700
## Cianobacteria -0.098202943  0.009511001 -0.017090155  0.182938768
## Pluvio.        0.542517711  0.452039590 -0.494191350 -0.097300981
##                        PC9
## Temp.Ambiente  0.021081118
## Temp.Agua      0.185408934
## Secchi        -0.005682843
## OD             0.203991933
## pH             0.361863031
## Condutividade -0.104625208
## Ptotal        -0.142681105
## Ntotal        -0.125103385
## ST             0.167624988
## Turbidez      -0.078091539
## Coli.Totais    0.203363933
## Coli.Termot    0.270437616
## DBO            0.172868097
## DQO            0.219203788
## Clorofila.a   -0.073878985
## Fitoplancton  -0.218459768
## Cianobacteria -0.217001892
## Pluvio.        0.015534816
##               Communality Specific Variance Variance
## Temp.Ambiente   0.8397587        0.16024134        1
## Temp.Agua       0.9553001        0.04469985        1
## Secchi          0.8674219        0.13257807        1
## OD              0.8963214        0.10367856        1
## pH              0.9205588        0.07944119        1
## Condutividade   0.9303651        0.06963495        1
## Ptotal          0.8630847        0.13691531        1
## Ntotal          0.8728694        0.12713055        1
## ST              0.9062174        0.09378260        1
## Turbidez        0.9099228        0.09007719        1
## Coli.Totais     0.8391894        0.16081065        1
## Coli.Termot     0.9685602        0.03143984        1
## DBO             0.8543827        0.14561733        1
## DQO             0.8652852        0.13471483        1
## Clorofila.a     0.9094467        0.09055326        1
## Fitoplancton    0.9797733        0.02022673        1
## Cianobacteria   0.9755904        0.02440961        1
## Pluvio.         0.9462076        0.05379236        1

And for the second station:

##                        PC1           PC2          PC3          PC4
## Temp.Ambiente -0.253537252  0.8680569595 -0.255524785 -0.219074240
## Temp.Agua     -0.426407777  0.7433589509  0.038344457 -0.035770479
## Secchi         0.589064389 -0.1265945721 -0.545999396  0.172102507
## OD            -0.703176634 -0.2609594540  0.461179675  0.231684515
## pH            -0.803510299  0.3173544573 -0.074201548  0.270983537
## Condutividade -0.540887619  0.0900357674 -0.485157043  0.220790830
## Ptotal        -0.772760995  0.0716064805  0.147494104  0.072266030
## Ntotal        -0.944364556 -0.0530069177 -0.007123909 -0.008785959
## ST            -0.900842765  0.0538131674  0.087518694  0.148121045
## Turbidez      -0.906619176 -0.0331535849  0.110504699 -0.142210075
## Coli.Totais    0.024375816  0.1987778790  0.411289506 -0.151390796
## Coli.Termot    0.005703707  0.2412633880  0.386779078  0.446247427
## DBO           -0.775137276 -0.3196473807 -0.164305414 -0.189489481
## DQO           -0.896069610 -0.2403634815 -0.108499836 -0.109126998
## Clorofila.a   -0.917661486  0.0006985685 -0.046237809 -0.049010646
## Fitoplancton  -0.941366203 -0.1507706812 -0.139168987 -0.098684807
## Cianobacteria -0.941886339 -0.1504591184 -0.137939884 -0.097776698
## Pluvio.        0.124022880  0.0253542725  0.173730489 -0.754198817
##                        PC5          PC6         PC7
## Temp.Ambiente  0.010364042  0.038863130 -0.03764755
## Temp.Agua     -0.089981337 -0.009731537 -0.37430562
## Secchi        -0.275786012  0.189343738  0.05571132
## OD             0.002633356 -0.105333437 -0.14165770
## pH             0.119822530 -0.080993234  0.01345327
## Condutividade -0.171495884 -0.105860702  0.46483176
## Ptotal         0.316394766 -0.060107074  0.29752880
## Ntotal        -0.001446046  0.020984581  0.16678082
## ST             0.117422444 -0.055968136  0.27716797
## Turbidez       0.109673890  0.021830761 -0.10262922
## Coli.Totais   -0.176859094  0.794208605  0.30802053
## Coli.Termot   -0.712613404 -0.220193315  0.04466749
## DBO           -0.324458109  0.119713602 -0.23509856
## DQO           -0.195378155  0.077543203 -0.09009743
## Clorofila.a    0.052049006  0.022754837 -0.02571117
## Fitoplancton  -0.130085082  0.074487444 -0.09280143
## Cianobacteria -0.129961828  0.074957218 -0.09092354
## Pluvio.       -0.281169341 -0.433128718  0.30145465
##               Communality Specific Variance Variance
## Temp.Ambiente   0.9341256        0.06587444        1
## Temp.Agua       0.8854520        0.11454801        1
## Secchi          0.8057704        0.19422962        1
## OD              0.8600906        0.13990940        1
## pH              0.8463789        0.15362107        1
## Condutividade   0.8414777        0.15852231        1
## Ptotal          0.8215058        0.17849417        1
## Ntotal          0.9230204        0.07697962        1
## ST              0.9377555        0.06224455        1
## Turbidez        0.8785302        0.12146981        1
## Coli.Totais     0.9891082        0.01089185        1
## Coli.Termot     0.9652735        0.03472648        1
## DBO             0.9407905        0.05920948        1
## DQO             0.9366994        0.06330062        1
## Clorofila.a     0.8505310        0.14946899        1
## Fitoplancton    0.9690914        0.03090856        1
## Cianobacteria   0.9691513        0.03084873        1
## Pluvio.         0.9725542        0.02744576        1
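
The communality column can be checked against the loadings: for each variable it equals the sum of the squared loadings on the retained components, and the specific variance is 1 minus that sum. The Python sketch below does this for the Temp.Ambiente row of the first station, using the nine loadings printed above. The function name `communality` is hypothetical, and the code is an illustration rather than the procedure used to produce the tables.

```python
def communality(loadings):
    """Sum of squared loadings of one variable on the retained components."""
    return sum(value ** 2 for value in loadings)


# Loadings of Temp.Ambiente on PC1..PC9 for the first station, as printed above.
temp_ambiente = [0.74313585, -0.21548163, 0.15446632, -0.187035481,
                 0.198465379, -0.333229392, 0.164525095, -0.065498644,
                 0.021081118]

h2 = communality(temp_ambiente)
specific_variance = 1.0 - h2  # total variance is 1 for a standardized variable
print(round(h2, 7), round(specific_variance, 7))
```

The result agrees with the table entries for Temp.Ambiente (communality 0.8397587, specific variance 0.16024134) to the printed precision.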

Each variable is assigned to the component on which its factor loading is largest in absolute value. Variables whose largest absolute loading is smaller than 0.5 should be removed from the dataset. For the FA_2R station we removed the Coli.Termot variable (its largest absolute loading is 0.48, on PC3); for the FA_3R station no variables were removed.
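
The assignment and removal rule can be sketched as follows, in Python for illustration. The function name `assign_component` is hypothetical, and comparing loadings by absolute value is an assumption; it is consistent with the tables, since a variable such as Condutividade (loading -0.84 on PC3 at FA_2R) was kept. The two rows are copied from the FA_2R loadings table.

```python
def assign_component(loadings, cutoff=0.5):
    """Return (component label, loading, keep flag) for one variable.

    The variable is assigned to the component with the largest absolute
    loading and kept only if that absolute loading reaches the cutoff.
    """
    best = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
    keep = abs(loadings[best]) >= cutoff
    return f"PC{best + 1}", loadings[best], keep


# Rows copied from the FA_2R loadings table (PC1..PC9).
fa_2r_rows = {
    "Condutividade": [0.17751045, -0.17444074, -0.83874492, 0.251269425,
                      0.013656858, -0.051968542, -0.241572649, 0.172061469,
                      -0.104625208],
    "Coli.Termot": [-0.14046539, -0.30779733, 0.47937381, -0.233753925,
                    -0.427352624, 0.019230849, -0.305670178, 0.469126251,
                    0.270437616],
}

results = {name: assign_component(row) for name, row in fa_2r_rows.items()}
for name, (component, loading, keep) in results.items():
    print(name, component, round(loading, 3), "keep" if keep else "remove")
```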