#packages
library(lubridate)
library(data.table)
library(dplyr)
library(tibble)
library(raster)Water Quality v Precipitation Variability
Pull in Rainfall, Runoff, and Water Quality Data
Code hidden throughout for brevity*
Cleaning & Data Check
n
1 155
n
1 188
[1] 134
[1] 21
Histograms
Categorical Indicators
| Characteristic | N = 1881 |
|---|---|
| wq | |
| 0 | 6 (3.2%) |
| 1-9 | 37 (20%) |
| 10-99 | 74 (39%) |
| 100+ | 71 (38%) |
| wq_b1 | |
| 0 | 6 (3.2%) |
| 1+ | 182 (97%) |
| wq_b10 | |
| 0-9 | 43 (23%) |
| 10+ | 145 (77%) |
| wq_b100 | |
| 0-99 | 117 (62%) |
| 100+ | 71 (38%) |
| 1 n (%) | |
| Characteristic | N = 9161 |
|---|---|
| wq | |
| 0 | 279 (30%) |
| 1-9 | 168 (18%) |
| 10-99 | 164 (18%) |
| 100+ | 305 (33%) |
| wq_b1 | |
| 0 | 279 (30%) |
| 1+ | 637 (70%) |
| wq_b10 | |
| 0-9 | 447 (49%) |
| 10+ | 469 (51%) |
| wq_b100 | |
| 0-99 | 611 (67%) |
| 100+ | 305 (33%) |
| 1 n (%) | |
Logistic regression B1 Rainfall
Testing unit increases in rainfall and runoff across risk of being contaminated (binary indicator 1 count)
Call:
glm(formula = wq_b1 ~ imerg_rf + lis_runoff, family = "binomial",
data = rwanda)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.62989 0.08786 7.169 0.000000000000755 ***
imerg_rf 0.04085 0.01846 2.213 0.0269 *
lis_runoff 0.28000 0.11793 2.374 0.0176 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1126.1 on 915 degrees of freedom
Residual deviance: 1108.0 on 913 degrees of freedom
AIC: 1114
Number of Fisher Scoring iterations: 4
Parameter | Odds Ratio | SE | 95% CI | z | p
------------------------------------------------------------------
(Intercept) | 1.877 | 0.165 | [1.582, 2.233] | 7.169 | < .001
imerg rf | 1.042 | 0.019 | [1.006, 1.082] | 2.213 | 0.027
lis runoff | 1.323 | 0.156 | [1.076, 1.721] | 2.374 | 0.018
Model: wq_b1 ~ imerg_rf + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 913)
Tjur's R2: 0.018
Logistic regression B10 Rainfall
Testing unit increases in rainfall and runoff across risk of being contaminated (binary indicator 10 count)
Call:
glm(formula = wq_b10 ~ imerg_rf + lis_runoff, family = "binomial",
data = rwanda)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.15853 0.08221 -1.928 0.05381 .
imerg_rf 0.04328 0.01602 2.701 0.00691 **
lis_runoff 0.22759 0.08442 2.696 0.00702 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1269.3 on 915 degrees of freedom
Residual deviance: 1246.6 on 913 degrees of freedom
AIC: 1252.6
Number of Fisher Scoring iterations: 4
Parameter | Odds Ratio | SE | 95% CI | z | p
------------------------------------------------------------------
(Intercept) | 0.853 | 0.070 | [0.726, 1.002] | -1.928 | 0.054
imerg rf | 1.044 | 0.017 | [1.013, 1.078] | 2.701 | 0.007
lis runoff | 1.256 | 0.106 | [1.075, 1.501] | 2.696 | 0.007
Model: wq_b10 ~ imerg_rf + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 913)
Tjur's R2: 0.024
Logistic regression B100 Rainfall
Testing unit increases in rainfall and runoff across risk of being contaminated (binary indicator 10 count)
Call:
glm(formula = wq_b100 ~ imerg_rf + lis_runoff, family = "binomial",
data = rwanda)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.89267 0.08857 -10.078 < 0.0000000000000002 ***
imerg_rf 0.05704 0.01591 3.585 0.000337 ***
lis_runoff 0.02272 0.06841 0.332 0.739831
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1165.6 on 915 degrees of freedom
Residual deviance: 1150.6 on 913 degrees of freedom
AIC: 1156.6
Number of Fisher Scoring iterations: 4
Parameter | Odds Ratio | SE | 95% CI | z | p
--------------------------------------------------------------------
(Intercept) | 0.410 | 0.036 | [0.344, 0.486] | -10.078 | < .001
imerg rf | 1.059 | 0.017 | [1.026, 1.093] | 3.585 | < .001
lis runoff | 1.023 | 0.070 | [0.892, 1.169] | 0.332 | 0.740
Model: wq_b100 ~ imerg_rf + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 913)
Tjur's R2: 0.017
Quadratic Effect B1
Testing linearity – is there an inflection point?
Call:
glm(formula = wq_b1 ~ imerg_rf + I(imerg_rf^2) + lis_runoff,
family = "binomial", data = rwanda)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.570073 0.092645 6.153 0.000000000759 ***
imerg_rf 0.097563 0.035220 2.770 0.0056 **
I(imerg_rf^2) -0.003318 0.001688 -1.966 0.0493 *
lis_runoff 0.230589 0.117737 1.959 0.0502 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1126.1 on 915 degrees of freedom
Residual deviance: 1104.4 on 912 degrees of freedom
AIC: 1112.4
Number of Fisher Scoring iterations: 4
Parameter | Odds Ratio | SE | 95% CI | z | p
-------------------------------------------------------------------
(Intercept) | 1.768 | 0.164 | [1.477, 2.124] | 6.153 | < .001
imerg rf | 1.102 | 0.039 | [1.029, 1.182] | 2.770 | 0.006
imerg rf^2 | 0.997 | 0.002 | [0.993, 1.000] | -1.966 | 0.049
lis runoff | 1.259 | 0.148 | [1.024, 1.636] | 1.959 | 0.050
Model: wq_b1 ~ imerg_rf + I(imerg_rf^2) + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 912)
Tjur's R2: 0.022
Quadratic Effect B10
Testing linearity – is there an inflection point?
Call:
glm(formula = wq_b10 ~ imerg_rf + I(imerg_rf^2) + lis_runoff,
family = "binomial", data = rwanda)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.200468 0.088121 -2.275 0.0229 *
imerg_rf 0.079697 0.031820 2.505 0.0123 *
I(imerg_rf^2) -0.002133 0.001587 -1.344 0.1788
lis_runoff 0.200043 0.085998 2.326 0.0200 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1269.3 on 915 degrees of freedom
Residual deviance: 1244.8 on 912 degrees of freedom
AIC: 1252.8
Number of Fisher Scoring iterations: 4
Parameter | Odds Ratio | SE | 95% CI | z | p
------------------------------------------------------------------
(Intercept) | 0.818 | 0.072 | [0.688, 0.972] | -2.275 | 0.023
imerg rf | 1.083 | 0.034 | [1.017, 1.153] | 2.505 | 0.012
imerg rf^2 | 0.998 | 0.002 | [0.995, 1.001] | -1.344 | 0.179
lis runoff | 1.221 | 0.105 | [1.042, 1.464] | 2.326 | 0.020
Model: wq_b10 ~ imerg_rf + I(imerg_rf^2) + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 912)
Tjur's R2: 0.026
Quadratic Effect B100
Testing linearity – is there an inflection point?
Call:
glm(formula = wq_b100 ~ imerg_rf + I(imerg_rf^2) + lis_runoff,
family = "binomial", data = rwanda)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.918125 0.096005 -9.563 <0.0000000000000002 ***
imerg_rf 0.077001 0.032567 2.364 0.0181 *
I(imerg_rf^2) -0.001142 0.001613 -0.708 0.4790
lis_runoff 0.009373 0.070993 0.132 0.8950
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1165.6 on 915 degrees of freedom
Residual deviance: 1150.1 on 912 degrees of freedom
AIC: 1158.1
Number of Fisher Scoring iterations: 4
Parameter | Odds Ratio | SE | 95% CI | z | p
-------------------------------------------------------------------
(Intercept) | 0.399 | 0.038 | [0.330, 0.481] | -9.563 | < .001
imerg rf | 1.080 | 0.035 | [1.013, 1.151] | 2.364 | 0.018
imerg rf^2 | 0.999 | 0.002 | [0.996, 1.002] | -0.708 | 0.479
lis runoff | 1.009 | 0.072 | [0.876, 1.160] | 0.132 | 0.895
Model: wq_b100 ~ imerg_rf + I(imerg_rf^2) + lis_runoff (916 Observations)
Residual standard deviation: 1.000 (df = 912)
Tjur's R2: 0.017
Clustered Standard Errors + Model Summary B1
Adjusting by school site ID
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.629887 0.119603 5.2665 0.0000001391 ***
imerg_rf 0.040847 0.027532 1.4836 0.13792
lis_runoff 0.280005 0.114970 2.4355 0.01487 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| (1) | ||
|---|---|---|
| Est. | p | |
| (Intercept) | 1.877 [1.485, 2.373] | <0.001 |
| imerg_rf | 1.042 [0.987, 1.099] | 0.138 |
| lis_runoff | 1.323 [1.056, 1.658] | 0.015 |
| Num.Obs. | 916 | |
| AIC | 1114.0 | |
| BIC | 1128.4 | |
| Log.Lik. | -553.983 | |
| RMSE | 0.46 | |
| Std.Errors | by: school_id | |
Clustered Standard Errors + Model Summary B10
Adjusting by school site ID
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.158528 0.104561 -1.5161 0.129487
imerg_rf 0.043283 0.023452 1.8456 0.064953 .
lis_runoff 0.227595 0.086767 2.6231 0.008714 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| (1) | ||
|---|---|---|
| Est. | p | |
| (Intercept) | 0.853 [0.695, 1.048] | 0.129 |
| imerg_rf | 1.044 [0.997, 1.093] | 0.065 |
| lis_runoff | 1.256 [1.059, 1.488] | 0.009 |
| Num.Obs. | 916 | |
| AIC | 1252.6 | |
| BIC | 1267.0 | |
| Log.Lik. | -623.277 | |
| RMSE | 0.49 | |
| Std.Errors | by: school_id | |
Clustered Standard Errors + Model Summary B100
Adjusting by school site ID
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.892674 0.118244 -7.5494 0.00000000000004372 ***
imerg_rf 0.057036 0.022735 2.5087 0.01212 *
lis_runoff 0.022717 0.050727 0.4478 0.65428
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| (1) | ||
|---|---|---|
| Est. | p | |
| (Intercept) | 0.410 [0.325, 0.516] | <0.001 |
| imerg_rf | 1.059 [1.013, 1.107] | 0.012 |
| lis_runoff | 1.023 [0.926, 1.130] | 0.654 |
| Num.Obs. | 916 | |
| AIC | 1156.6 | |
| BIC | 1171.1 | |
| Log.Lik. | -575.309 | |
| RMSE | 0.47 | |
| Std.Errors | by: school_id | |
Predicted Probability Outcome: Plot B1
imerg_rf Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 0.670 0.0262 25.6 <0.001 476.7 0.619 0.722
5 0.713 0.0245 29.1 <0.001 616.9 0.665 0.761
15 0.789 0.0595 13.3 <0.001 130.8 0.672 0.905
20 0.821 0.0722 11.4 <0.001 96.9 0.679 0.962
35 0.894 0.0853 10.5 <0.001 82.9 0.727 1.061
Type: response
Columns: imerg_rf, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Predicted Probability Outcome: Plot B10
imerg_rf Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 0.479 0.0260 18.43 <0.001 249.6 0.428 0.530
5 0.533 0.0268 19.91 <0.001 290.5 0.480 0.585
15 0.636 0.0712 8.93 <0.001 61.1 0.497 0.776
20 0.684 0.0910 7.52 <0.001 44.1 0.506 0.863
35 0.805 0.1206 6.68 <0.001 35.3 0.569 1.042
Type: response
Columns: imerg_rf, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Predicted Probability Outcome: Plot B100
imerg_rf Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0 0.292 0.0244 11.97 <0.001 107.2 0.244 0.340
5 0.354 0.0240 14.77 <0.001 161.5 0.307 0.401
15 0.493 0.0719 6.85 <0.001 37.0 0.352 0.634
20 0.564 0.0976 5.77 <0.001 27.0 0.372 0.755
35 0.752 0.1365 5.51 <0.001 24.7 0.485 1.020
Type: response
Columns: imerg_rf, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Predicted Data: Receiver Operator Characteristic (ROC) and Area Under Curve (AUC) B1
Create ROC curve, plot, calculate AUC to examine how predictive model is
Area under the curve: 0.6211
Predicted Data: Receiver Operator Characteristic (ROC) and Area Under Curve (AUC) B10
Create ROC curve, plot, calculate AUC to examine how predictive model is
Area under the curve: 0.6078
Predicted Data: Receiver Operator Characteristic (ROC) and Area Under Curve (AUC) B100
Create ROC curve, plot, calculate AUC to examine how predictive model is
Area under the curve: 0.5801
Next steps
Update w/ most recent Rwanda data
Run with different rainfall and runoff timelines