Cloud Seeding Report

Premise

Does cloud seeding actually increase rainfall? This report analyzes data collected during a cloud seeding experiment. The variables are rainfall in cubic meters times a factor of 1e8 (rainfall), a factor indicating whether the clouds were seeded (seeding), the number of days after the first day of the experiment (time), a suitability criterion (sne), the percentage of cloud cover measured by radar (cloudcover), the total rainfall in the area one hour before seeding (prewetness), and a factor indicating whether the radar echo was stationary or in motion (echomotion). The statistical analysis below does not show a significant relationship between cloud seeding and increased rainfall; the suitability criterion (sne), however, does show a relationship with rainfall.

Basic Statistical Analysis Between Seeded and Unseeded Rainfall

Below are histograms of rainfall for seeded and unseeded clouds:

The histograms look slightly different: rainfall for unseeded clouds appears somewhat more right-skewed than rainfall for seeded clouds. However, the means and standard deviations of the two groups are not very different:

Mean.Seeded.Cloud.Rainfall
[1] 4.634167
Standard.Deviation.Seeded.Cloud.Rainfall
[1] 2.776841
Mean.Unseeded.Cloud.Rainfall
[1] 4.171667
Standard.Deviation.Unseeded.Cloud.Rainfall
[1] 3.519196
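
The summary values above could be reproduced with a short sketch along the following lines. It assumes the data are the `clouds` data frame shipped with the HSAUR package (an assumption; the report does not name its source) and that `seeding` is a factor with levels `no` and `yes`, as the `seedingyes` coefficient later in the report suggests. The helper name `summarise_rainfall` is hypothetical.

```r
# Hypothetical helper: mean and standard deviation of rainfall per seeding group.
# Assumes `df` has a numeric column `rainfall` and a factor column `seeding`.
summarise_rainfall <- function(df) {
  groups <- split(df$rainfall, df$seeding)
  data.frame(
    seeding = names(groups),
    n       = vapply(groups, length, integer(1)),
    mean    = vapply(groups, mean, numeric(1)),
    sd      = vapply(groups, sd, numeric(1)),
    row.names = NULL
  )
}

# Assumption: the data set is HSAUR::clouds; skipped if the package is not installed.
if (requireNamespace("HSAUR", quietly = TRUE)) {
  data("clouds", package = "HSAUR")
  # Subsets named as in the test and model output elsewhere in this report
  cloudsSeeded   <- subset(clouds, seeding == "yes")
  cloudsUnseeded <- subset(clouds, seeding == "no")
  print(summarise_rainfall(clouds))
}
```

Splitting once and summarising every group in the same call keeps the seeded and unseeded figures computed in exactly the same way.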

The unseeded rainfall data are somewhat more spread out than the seeded data (standard deviation 3.52 versus 2.78), but the means are close (4.17 versus 4.63). A more formal way to test whether two population means differ is a t-test. The Welch version used below does not assume that the two groups have equal variances.


    Welch Two Sample t-test

data:  cloudsSeeded$rainfall and cloudsUnseeded$rainfall
t = 0.3574, df = 20.871, p-value = 0.7244
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.229691  3.154691
sample estimates:
mean of x mean of y 
 4.634167  4.171667 

The p-value (0.7244) is well above the 0.05 cutoff, so the null hypothesis of equal means cannot be rejected. The observed difference between the two groups is consistent with random variation and does not indicate two distinct populations.
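
As a worked check, the Welch statistic can be recomputed from the summary values printed earlier. This is a minimal sketch, not the code used for the report. The group size of 12 is an inference from the 10 residual degrees of freedom in the per-group regressions later in the report, and all variable names are illustrative.

```r
# Summary statistics as reported above. n = 12 per group is inferred from the
# 10 residual degrees of freedom of the per-group regressions (12 - 2 = 10).
m1 <- 4.634167; s1 <- 2.776841; n1 <- 12   # seeded
m2 <- 4.171667; s2 <- 3.519196; n2 <- 12   # unseeded

v1 <- s1^2 / n1            # squared standard error of the seeded mean
v2 <- s2^2 / n2            # squared standard error of the unseeded mean
se <- sqrt(v1 + v2)        # standard error of the difference in means

t_stat <- (m1 - m2) / se
# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
p_value <- 2 * pt(-abs(t_stat), dof)              # two-sided p-value
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, dof) * se  # 95 percent confidence interval

print(round(c(t = t_stat, df = dof, p = p_value), 4))
print(round(ci, 4))
```

Up to rounding, the results should agree with the t, df, p-value, and confidence interval in the test output above.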

Linear Regression Modeling To Evaluate Seeding

Based on the t-test, seeding does not produce a conclusive increase in rainfall in this experiment. However, other variables collected during the experiment are worth investigating. Below is a linear regression model that adds the variables introduced earlier (cloudcover, prewetness, echomotion, and sne), followed by its analysis of variance table:


Call:
lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
    sne, data = clouds)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.1158 -1.7078 -0.2422  1.3368  6.4827 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)  
(Intercept)           6.37680    2.43432   2.620   0.0174 *
seedingyes            1.12011    1.20725   0.928   0.3658  
cloudcover            0.01821    0.11508   0.158   0.8761  
prewetness            2.55109    2.70090   0.945   0.3574  
echomotionstationary  2.59855    1.54090   1.686   0.1090  
sne                  -1.27530    0.68015  -1.875   0.0771 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.855 on 18 degrees of freedom
Multiple R-squared:  0.3403,    Adjusted R-squared:  0.157 
F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524

Analysis of Variance Table

Response: rainfall
           Df  Sum Sq Mean Sq F value  Pr(>F)  
seeding     1   1.283  1.2834  0.1575 0.69613  
cloudcover  1  15.738 15.7377  1.9313 0.18157  
prewetness  1   0.003  0.0027  0.0003 0.98557  
echomotion  1  29.985 29.9853  3.6798 0.07108 .
sne         1  28.649 28.6491  3.5158 0.07711 .
Residuals  18 146.677  8.1487                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The linear regression model above reveals two things. First, it agrees with the t-test: the seeding coefficient is not significant (p = 0.3658). Second, the suitability criterion (sne) shows a possible negative relationship with rainfall, although it is significant only at the 0.1 level (p = 0.0771).
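
To make the regression table easier to read, the sketch below recomputes a few of its entries from the printed values alone: each t value is the estimate divided by its standard error, and the adjusted R-squared and F statistic follow from the multiple R-squared and the degrees of freedom. The variable names are illustrative and the inputs are copied from the summary above.

```r
# Values copied from the regression summary above.
est <- c(seedingyes = 1.12011, sne = -1.27530)   # coefficient estimates
se  <- c(seedingyes = 1.20725, sne = 0.68015)    # standard errors
resid_df <- 18     # residual degrees of freedom
r2 <- 0.3403       # multiple R-squared
k  <- 5            # number of predictors, excluding the intercept

t_val <- est / se                         # t value = estimate / standard error
p_val <- 2 * pt(-abs(t_val), resid_df)    # two-sided p-value

n <- resid_df + k + 1                            # number of observations
adj_r2 <- 1 - (1 - r2) * (n - 1) / resid_df      # adjusted R-squared
f_stat <- (r2 / k) / ((1 - r2) / resid_df)       # overall F statistic
f_p    <- pf(f_stat, k, resid_df, lower.tail = FALSE)

print(round(rbind(t = t_val, p = p_val), 4))
print(round(c(adj_r2 = adj_r2, F = f_stat, p = f_p), 4))
```

Up to rounding of the inputs, these values should match the corresponding entries in the summary, which confirms how the p-values quoted in the text were obtained.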

Linear Regression Modeling To Evaluate Suitability Criterion

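To further explore the suitability criterion and its effect on rainfall, the data are again separated into seeded and unseeded observations, so that the effect of sne is not confounded with the effect of seeding. The output below shows the analysis of variance table and model summary for the seeded observations first, then for the unseeded observations.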

Analysis of Variance Table

Response: rainfall
          Df Sum Sq Mean Sq F value  Pr(>F)  
sne        1 33.310  33.310  6.4669 0.02921 *
Residuals 10 51.509   5.151                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Call:
lm(formula = rainfall ~ sne, data = cloudsSeeded)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0134 -1.3297 -0.3276  0.6171  4.3867 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  12.0202     2.9774   4.037  0.00237 **
sne          -2.2180     0.8722  -2.543  0.02921 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.27 on 10 degrees of freedom
Multiple R-squared:  0.3927,    Adjusted R-squared:  0.332 
F-statistic: 6.467 on 1 and 10 DF,  p-value: 0.02921

Analysis of Variance Table

Response: rainfall
          Df  Sum Sq Mean Sq F value Pr(>F)
sne        1  13.565  13.565  1.1058 0.3177
Residuals 10 122.667  12.267               

Call:
lm(formula = rainfall ~ sne, data = cloudsUnseeded)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4892 -2.1762  0.2958  1.4902  7.3616 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    7.319      3.160   2.317    0.043 *
sne           -1.046      0.995  -1.052    0.318  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.502 on 10 degrees of freedom
Multiple R-squared:  0.09957,   Adjusted R-squared:  0.009528 
F-statistic: 1.106 on 1 and 10 DF,  p-value: 0.3177

Separating the data makes the difference between seeded and unseeded observations easy to see. For the seeded observations, sne is significantly related to rainfall (p = 0.029, R-squared = 0.39). For the unseeded observations the slope is also negative, but it is not statistically significant (p = 0.318, R-squared = 0.10). Below is a graph showing both data sets with their fitted lines:
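
The per-group fits summarised above could be produced in one pass with a sketch like the following. The helper name `sne_slopes` is hypothetical, and the use of `clouds` from the HSAUR package is an assumption about where the data come from.

```r
# Hypothetical helper: fit rainfall ~ sne within each level of `seeding`
# and collect the slope, its standard error, and its p-value.
sne_slopes <- function(df) {
  fits <- lapply(split(df, df$seeding),
                 function(d) lm(rainfall ~ sne, data = d))
  rows <- lapply(names(fits), function(g) {
    co <- coef(summary(fits[[g]]))   # coefficient table of the group's model
    data.frame(seeding = g,
               slope   = co["sne", "Estimate"],
               se      = co["sne", "Std. Error"],
               p       = co["sne", "Pr(>|t|)"])
  })
  do.call(rbind, rows)
}

# Assumption: the data set is HSAUR::clouds; skipped if the package is not installed.
if (requireNamespace("HSAUR", quietly = TRUE)) {
  data("clouds", package = "HSAUR")
  print(sne_slopes(clouds))
}
```

Putting both slopes in one table makes it easier to compare them directly with their standard errors.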


The difference in slope between the two data sets is apparent in this graph. The fitted line is steeper for the seeded observations (slope -2.22) than for the unseeded observations (slope -1.05), so rainfall decreases more quickly as sne increases when the clouds are seeded. That is to say, sne affects rainfall more for seeded than for unseeded observations.