Water Quality v Precipitation Variability

Author

Kathleen Kirsch

Pull in Rainfall, Runoff, and Water Quality Data

Code hidden throughout for brevity*

#packages

library(lubridate)
library(data.table)
library(dplyr) 
library(tibble)
library(raster)

Cleaning & Data Check

    n
1 155
    n
1 188
[1] 134
[1] 21

Histograms

Categorical Indicators

Characteristic N = 1881
wq
    0 6 (3.2%)
    1-9 37 (20%)
    10-99 74 (39%)
    100+ 71 (38%)
wq_b1
    0 6 (3.2%)
    1+ 182 (97%)
wq_b10
    0-9 43 (23%)
    10+ 145 (77%)
wq_b100
    0-99 117 (62%)
    100+ 71 (38%)
1 n (%)
Characteristic N = 9161
wq
    0 279 (30%)
    1-9 168 (18%)
    10-99 164 (18%)
    100+ 305 (33%)
wq_b1
    0 279 (30%)
    1+ 637 (70%)
wq_b10
    0-9 447 (49%)
    10+ 469 (51%)
wq_b100
    0-99 611 (67%)
    100+ 305 (33%)
1 n (%)

Logistic regression B1 Rainfall

Testing unit increases in rainfall and runoff across risk of being contaminated (binary indicator 1 count)


Call:
glm(formula = wq_b1 ~ imerg_rf + lis_runoff, family = "binomial", 
    data = rwanda)

Coefficients:
            Estimate Std. Error z value          Pr(>|z|)    
(Intercept)  0.62989    0.08786   7.169 0.000000000000755 ***
imerg_rf     0.04085    0.01846   2.213            0.0269 *  
lis_runoff   0.28000    0.11793   2.374            0.0176 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1126.1  on 915  degrees of freedom
Residual deviance: 1108.0  on 913  degrees of freedom
AIC: 1114

Number of Fisher Scoring iterations: 4
Parameter   | Odds Ratio |    SE |         95% CI |     z |      p
------------------------------------------------------------------
(Intercept) |      1.877 | 0.165 | [1.582, 2.233] | 7.169 | < .001
imerg rf    |      1.042 | 0.019 | [1.006, 1.082] | 2.213 | 0.027 
lis runoff  |      1.323 | 0.156 | [1.076, 1.721] | 2.374 | 0.018 

Model: wq_b1 ~ imerg_rf + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 913)
Tjur's R2: 0.018

Logistic regression B10 Rainfall

Testing unit increases in rainfall and runoff across risk of being contaminated (binary indicator 10 count)


Call:
glm(formula = wq_b10 ~ imerg_rf + lis_runoff, family = "binomial", 
    data = rwanda)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept) -0.15853    0.08221  -1.928  0.05381 . 
imerg_rf     0.04328    0.01602   2.701  0.00691 **
lis_runoff   0.22759    0.08442   2.696  0.00702 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1269.3  on 915  degrees of freedom
Residual deviance: 1246.6  on 913  degrees of freedom
AIC: 1252.6

Number of Fisher Scoring iterations: 4
Parameter   | Odds Ratio |    SE |         95% CI |      z |     p
------------------------------------------------------------------
(Intercept) |      0.853 | 0.070 | [0.726, 1.002] | -1.928 | 0.054
imerg rf    |      1.044 | 0.017 | [1.013, 1.078] |  2.701 | 0.007
lis runoff  |      1.256 | 0.106 | [1.075, 1.501] |  2.696 | 0.007

Model: wq_b10 ~ imerg_rf + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 913)
Tjur's R2: 0.024

Logistic regression B100 Rainfall

Testing unit increases in rainfall and runoff across risk of being contaminated (binary indicator 10 count)


Call:
glm(formula = wq_b100 ~ imerg_rf + lis_runoff, family = "binomial", 
    data = rwanda)

Coefficients:
            Estimate Std. Error z value             Pr(>|z|)    
(Intercept) -0.89267    0.08857 -10.078 < 0.0000000000000002 ***
imerg_rf     0.05704    0.01591   3.585             0.000337 ***
lis_runoff   0.02272    0.06841   0.332             0.739831    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1165.6  on 915  degrees of freedom
Residual deviance: 1150.6  on 913  degrees of freedom
AIC: 1156.6

Number of Fisher Scoring iterations: 4
Parameter   | Odds Ratio |    SE |         95% CI |       z |      p
--------------------------------------------------------------------
(Intercept) |      0.410 | 0.036 | [0.344, 0.486] | -10.078 | < .001
imerg rf    |      1.059 | 0.017 | [1.026, 1.093] |   3.585 | < .001
lis runoff  |      1.023 | 0.070 | [0.892, 1.169] |   0.332 | 0.740 

Model: wq_b100 ~ imerg_rf + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 913)
Tjur's R2: 0.017

Quadratic Effect B1

Testing linearity – is there an inflection point?


Call:
glm(formula = wq_b1 ~ imerg_rf + I(imerg_rf^2) + lis_runoff, 
    family = "binomial", data = rwanda)

Coefficients:
               Estimate Std. Error z value       Pr(>|z|)    
(Intercept)    0.570073   0.092645   6.153 0.000000000759 ***
imerg_rf       0.097563   0.035220   2.770         0.0056 ** 
I(imerg_rf^2) -0.003318   0.001688  -1.966         0.0493 *  
lis_runoff     0.230589   0.117737   1.959         0.0502 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1126.1  on 915  degrees of freedom
Residual deviance: 1104.4  on 912  degrees of freedom
AIC: 1112.4

Number of Fisher Scoring iterations: 4
Parameter   | Odds Ratio |    SE |         95% CI |      z |      p
-------------------------------------------------------------------
(Intercept) |      1.768 | 0.164 | [1.477, 2.124] |  6.153 | < .001
imerg rf    |      1.102 | 0.039 | [1.029, 1.182] |  2.770 | 0.006 
imerg rf^2  |      0.997 | 0.002 | [0.993, 1.000] | -1.966 | 0.049 
lis runoff  |      1.259 | 0.148 | [1.024, 1.636] |  1.959 | 0.050 

Model: wq_b1 ~ imerg_rf + I(imerg_rf^2) + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 912)
Tjur's R2: 0.022

Quadratic Effect B10

Testing linearity – is there an inflection point?


Call:
glm(formula = wq_b10 ~ imerg_rf + I(imerg_rf^2) + lis_runoff, 
    family = "binomial", data = rwanda)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)  
(Intercept)   -0.200468   0.088121  -2.275   0.0229 *
imerg_rf       0.079697   0.031820   2.505   0.0123 *
I(imerg_rf^2) -0.002133   0.001587  -1.344   0.1788  
lis_runoff     0.200043   0.085998   2.326   0.0200 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1269.3  on 915  degrees of freedom
Residual deviance: 1244.8  on 912  degrees of freedom
AIC: 1252.8

Number of Fisher Scoring iterations: 4
Parameter   | Odds Ratio |    SE |         95% CI |      z |     p
------------------------------------------------------------------
(Intercept) |      0.818 | 0.072 | [0.688, 0.972] | -2.275 | 0.023
imerg rf    |      1.083 | 0.034 | [1.017, 1.153] |  2.505 | 0.012
imerg rf^2  |      0.998 | 0.002 | [0.995, 1.001] | -1.344 | 0.179
lis runoff  |      1.221 | 0.105 | [1.042, 1.464] |  2.326 | 0.020

Model: wq_b10 ~ imerg_rf + I(imerg_rf^2) + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 912)
Tjur's R2: 0.026

Quadratic Effect B100

Testing linearity – is there an inflection point?


Call:
glm(formula = wq_b100 ~ imerg_rf + I(imerg_rf^2) + lis_runoff, 
    family = "binomial", data = rwanda)

Coefficients:
               Estimate Std. Error z value            Pr(>|z|)    
(Intercept)   -0.918125   0.096005  -9.563 <0.0000000000000002 ***
imerg_rf       0.077001   0.032567   2.364              0.0181 *  
I(imerg_rf^2) -0.001142   0.001613  -0.708              0.4790    
lis_runoff     0.009373   0.070993   0.132              0.8950    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1165.6  on 915  degrees of freedom
Residual deviance: 1150.1  on 912  degrees of freedom
AIC: 1158.1

Number of Fisher Scoring iterations: 4
Parameter   | Odds Ratio |    SE |         95% CI |      z |      p
-------------------------------------------------------------------
(Intercept) |      0.399 | 0.038 | [0.330, 0.481] | -9.563 | < .001
imerg rf    |      1.080 | 0.035 | [1.013, 1.151] |  2.364 | 0.018 
imerg rf^2  |      0.999 | 0.002 | [0.996, 1.002] | -0.708 | 0.479 
lis runoff  |      1.009 | 0.072 | [0.876, 1.160] |  0.132 | 0.895 

Model: wq_b100 ~ imerg_rf + I(imerg_rf^2) + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 912)
Tjur's R2: 0.017

Clustered Standard Errors + Model Summary B1

Adjusting by school site ID


z test of coefficients:

            Estimate Std. Error z value     Pr(>|z|)    
(Intercept) 0.629887   0.119603  5.2665 0.0000001391 ***
imerg_rf    0.040847   0.027532  1.4836      0.13792    
lis_runoff  0.280005   0.114970  2.4355      0.01487 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tinytable_d0uh62h7n4fuq114l8xf
(1)
Est. p
(Intercept) 1.877 [1.485, 2.373] <0.001
imerg_rf 1.042 [0.987, 1.099] 0.138
lis_runoff 1.323 [1.056, 1.658] 0.015
Num.Obs. 916
AIC 1114.0
BIC 1128.4
Log.Lik. -553.983
RMSE 0.46
Std.Errors by: school_id

Clustered Standard Errors + Model Summary B10

Adjusting by school site ID


z test of coefficients:

             Estimate Std. Error z value Pr(>|z|)   
(Intercept) -0.158528   0.104561 -1.5161 0.129487   
imerg_rf     0.043283   0.023452  1.8456 0.064953 . 
lis_runoff   0.227595   0.086767  2.6231 0.008714 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tinytable_wuhpbwqwh7uw66gbgiue
(1)
Est. p
(Intercept) 0.853 [0.695, 1.048] 0.129
imerg_rf 1.044 [0.997, 1.093] 0.065
lis_runoff 1.256 [1.059, 1.488] 0.009
Num.Obs. 916
AIC 1252.6
BIC 1267.0
Log.Lik. -623.277
RMSE 0.49
Std.Errors by: school_id

Clustered Standard Errors + Model Summary B100

Adjusting by school site ID


z test of coefficients:

             Estimate Std. Error z value            Pr(>|z|)    
(Intercept) -0.892674   0.118244 -7.5494 0.00000000000004372 ***
imerg_rf     0.057036   0.022735  2.5087             0.01212 *  
lis_runoff   0.022717   0.050727  0.4478             0.65428    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tinytable_0f39l80vfoo8gnixks8q
(1)
Est. p
(Intercept) 0.410 [0.325, 0.516] <0.001
imerg_rf 1.059 [1.013, 1.107] 0.012
lis_runoff 1.023 [0.926, 1.130] 0.654
Num.Obs. 916
AIC 1156.6
BIC 1171.1
Log.Lik. -575.309
RMSE 0.47
Std.Errors by: school_id

Predicted Probability Outcome: Plot B1


 imerg_rf Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
        0    0.670     0.0262 25.6   <0.001 476.7 0.619  0.722
        5    0.713     0.0245 29.1   <0.001 616.9 0.665  0.761
       15    0.789     0.0595 13.3   <0.001 130.8 0.672  0.905
       20    0.821     0.0722 11.4   <0.001  96.9 0.679  0.962
       35    0.894     0.0853 10.5   <0.001  82.9 0.727  1.061

Type:  response 
Columns: imerg_rf, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 

Predicted Probability Outcome: Plot B10


 imerg_rf Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
        0    0.479     0.0260 18.43   <0.001 249.6 0.428  0.530
        5    0.533     0.0268 19.91   <0.001 290.5 0.480  0.585
       15    0.636     0.0712  8.93   <0.001  61.1 0.497  0.776
       20    0.684     0.0910  7.52   <0.001  44.1 0.506  0.863
       35    0.805     0.1206  6.68   <0.001  35.3 0.569  1.042

Type:  response 
Columns: imerg_rf, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 

Predicted Probability Outcome: Plot B100


 imerg_rf Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
        0    0.292     0.0244 11.97   <0.001 107.2 0.244  0.340
        5    0.354     0.0240 14.77   <0.001 161.5 0.307  0.401
       15    0.493     0.0719  6.85   <0.001  37.0 0.352  0.634
       20    0.564     0.0976  5.77   <0.001  27.0 0.372  0.755
       35    0.752     0.1365  5.51   <0.001  24.7 0.485  1.020

Type:  response 
Columns: imerg_rf, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 

Predicted Data: Receiver Operator Characteristic (ROC) and Area Under Curve (AUC) B1

Create ROC curve, plot, calculate AUC to examine how predictive model is

Area under the curve: 0.6211

Predicted Data: Receiver Operator Characteristic (ROC) and Area Under Curve (AUC) B10

Create ROC curve, plot, calculate AUC to examine how predictive model is

Area under the curve: 0.6078

Predicted Data: Receiver Operator Characteristic (ROC) and Area Under Curve (AUC) B100

Create ROC curve, plot, calculate AUC to examine how predictive model is

Area under the curve: 0.5801

Next steps

Update w/ most recent Rwanda data

Run with different rainfall and runoff timelines