Cloud Seeding Report
Premise
Does cloud seeding actually increase rainfall? This report analyzes data collected during a cloud seeding experiment. The data set contains the following variables: rainfall, in cubic meters times a factor of 1e8 (rainfall); whether the clouds were seeded (seeding); the number of days after the first day of the experiment (time); a suitability criterion (sne); the percentage of cloud cover, measured by radar (cloudcover); the total rainfall in the area one hour before seeding (prewetness); and a factor indicating whether the radar echo was moving or stationary (echomotion). The statistical analysis below does not show a direct relationship between cloud seeding and increased rainfall. It does, however, show an association between the suitability criterion (sne) and rainfall, with higher sne values going with less rainfall.
Basic Statistical Comparison of Seeded and Unseeded Rainfall
Below are histograms of rainfall for seeded and unseeded clouds:
The two histograms look slightly different: the unseeded rainfall appears somewhat more skewed to the right than the seeded rainfall. However, the means and standard deviations of the two groups are not very different:
Group       Mean rainfall   Standard deviation
Seeded      4.634167        2.776841
Unseeded    4.171667        3.519196
The unseeded rainfall is somewhat more spread out than the seeded rainfall, but the means differ by only about 0.46. A more formal way to tell whether two groups have different means is a t-test. The version used here is Welch's two-sample t-test, which does not assume that the two groups have equal variances.
Welch Two Sample t-test
data: cloudsSeeded$rainfall and cloudsUnseeded$rainfall
t = 0.3574, df = 20.871, p-value = 0.7244
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.229691 3.154691
sample estimates:
mean of x mean of y
4.634167 4.171667
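The p-value of 0.7244 is far above the conventional 0.05 cutoff, so the data provide no evidence that mean rainfall differs between seeded and unseeded clouds; the observed difference is consistent with random variation. The 95 percent confidence interval for the difference in means (-2.23 to 3.15) also includes zero.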
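As a sketch using only the summary statistics above, Welch's t statistic and its Welch-Satterthwaite degrees of freedom can be recomputed by hand. The group size of 12 is an assumption inferred from the 10 residual degrees of freedom in the per-group regressions later in this report, and the function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Seeded vs. unseeded summary statistics from this report.
# ASSUMPTION: n = 12 per group, inferred from the per-group regressions.
t, df = welch_t(4.634167, 2.776841, 12, 4.171667, 3.519196, 12)
print(f"t = {t:.4f}, df = {df:.3f}")
```

This reproduces t = 0.3574 and df = 20.871 from the output. The two-sided p-value is then the probability that a t variable with about 20.87 degrees of freedom exceeds |t| in either direction; with SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` computes it.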
Linear Regression Modeling To Evaluate Seeding
According to this experiment and the t-test, seeding does not produce a conclusive increase in rainfall. However, the experiment recorded other variables that are worth investigating. Below is a linear regression model that adds the other variables introduced earlier (cloud cover, prewetness, echomotion, and sne):
Call:
lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion +
sne, data = clouds)
Residuals:
Min 1Q Median 3Q Max
-5.1158 -1.7078 -0.2422 1.3368 6.4827
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.37680 2.43432 2.620 0.0174 *
seedingyes 1.12011 1.20725 0.928 0.3658
cloudcover 0.01821 0.11508 0.158 0.8761
prewetness 2.55109 2.70090 0.945 0.3574
echomotionstationary 2.59855 1.54090 1.686 0.1090
sne -1.27530 0.68015 -1.875 0.0771 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.855 on 18 degrees of freedom
Multiple R-squared: 0.3403, Adjusted R-squared: 0.157
F-statistic: 1.857 on 5 and 18 DF, p-value: 0.1524
Analysis of Variance Table
Response: rainfall
Df Sum Sq Mean Sq F value Pr(>F)
seeding 1 1.283 1.2834 0.1575 0.69613
cloudcover 1 15.738 15.7377 1.9313 0.18157
prewetness 1 0.003 0.0027 0.0003 0.98557
echomotion 1 29.985 29.9853 3.6798 0.07108 .
sne 1 28.649 28.6491 3.5158 0.07711 .
Residuals 18 146.677 8.1487
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
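The linear regression model above reveals some important information. First, it agrees with the t-test: after adjusting for the other variables, the seeding coefficient (1.12) is not statistically significant (p = 0.366), so seeding still shows no clear relationship with increased rainfall. Second, the suitability criterion (sne) has a negative coefficient (-1.28) that is marginally significant (p = 0.077), which suggests a possible association worth investigating. The model as a whole explains about 34% of the variance in rainfall (adjusted R-squared 0.157) and is not significant overall (F = 1.857 on 5 and 18 DF, p = 0.152). The ANOVA table uses sequential sums of squares, so each term's p-value there depends on the order in which the terms enter the model.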
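The summary line at the bottom of the regression output (R-squared, adjusted R-squared, F statistic) can be rebuilt from the ANOVA table. Below is a minimal Python sketch under two assumptions: n = 24 observations (18 residual degrees of freedom plus 6 estimated coefficients) and p = 5 predictor terms with one degree of freedom each. Because the sums of squares are sequential, they add up to the model sum of squares regardless of term order.

```python
# Sequential (Type I) sums of squares from the ANOVA table above
seq_ss = {
    "seeding": 1.283,
    "cloudcover": 15.738,
    "prewetness": 0.003,
    "echomotion": 29.985,
    "sne": 28.649,
}
ss_resid = 146.677
n, p = 24, 5  # ASSUMPTION: 24 observations, 5 one-df predictor terms

ss_model = sum(seq_ss.values())  # sequential SS sum to the model SS
ss_total = ss_model + ss_resid
r2 = ss_model / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes for the 5 predictors
f_stat = (ss_model / p) / (ss_resid / (n - p - 1))  # mean square ratio
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, F = {f_stat:.3f}")
```

The results match the reported Multiple R-squared (0.3403), Adjusted R-squared (0.157), and F-statistic (1.857), which shows how heavily the adjustment penalizes five predictors fitted to only 24 observations.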
Linear Regression Modeling To Evaluate Suitability Criterion
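To explore further what role the suitability criterion plays in rainfall, the data are again split into seeded and unseeded observations, and a separate regression of rainfall on sne is fitted to each group. This keeps the effect of seeding from being mixed into the sne estimate and allows the sne slope to differ between the two groups.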
Analysis of Variance Table
Response: rainfall
Df Sum Sq Mean Sq F value Pr(>F)
sne 1 33.310 33.310 6.4669 0.02921 *
Residuals 10 51.509 5.151
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = rainfall ~ sne, data = cloudsSeeded)
Residuals:
Min 1Q Median 3Q Max
-3.0134 -1.3297 -0.3276 0.6171 4.3867
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.0202 2.9774 4.037 0.00237 **
sne -2.2180 0.8722 -2.543 0.02921 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.27 on 10 degrees of freedom
Multiple R-squared: 0.3927, Adjusted R-squared: 0.332
F-statistic: 6.467 on 1 and 10 DF, p-value: 0.02921
Analysis of Variance Table
Response: rainfall
Df Sum Sq Mean Sq F value Pr(>F)
sne 1 13.565 13.565 1.1058 0.3177
Residuals 10 122.667 12.267
Call:
lm(formula = rainfall ~ sne, data = cloudsUnseeded)
Residuals:
Min 1Q Median 3Q Max
-5.4892 -2.1762 0.2958 1.4902 7.3616
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.319 3.160 2.317 0.043 *
sne -1.046 0.995 -1.052 0.318
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.502 on 10 degrees of freedom
Multiple R-squared: 0.09957, Adjusted R-squared: 0.009528
F-statistic: 1.106 on 1 and 10 DF, p-value: 0.3177
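For a regression with a single predictor, R-squared is the model sum of squares divided by the total sum of squares, and the ANOVA F value equals the square of the slope's t value. The small Python sketch below checks both for each group using only the numbers printed above, assuming 12 observations per group. It also recovers each group's rainfall standard deviation from its total sum of squares.

```python
import math

# (model SS, residual SS) from each group's ANOVA table and
# (slope, standard error) for sne from each group's coefficient table
groups = {
    "seeded": (33.310, 51.509, -2.2180, 0.8722),
    "unseeded": (13.565, 122.667, -1.046, 0.995),
}
N_PER_GROUP = 12  # ASSUMPTION: inferred from 10 residual df + 2 coefficients

results = {}
for name, (ss_model, ss_resid, slope, se) in groups.items():
    ss_total = ss_model + ss_resid
    r2 = ss_model / ss_total  # share of rainfall variance explained by sne
    t_value = slope / se  # t statistic for the sne slope
    sd = math.sqrt(ss_total / (N_PER_GROUP - 1))  # sample SD of rainfall
    results[name] = {"r2": r2, "t": t_value, "f": t_value ** 2, "sd": sd}
    print(f"{name}: R^2 = {r2:.4f}, t^2 = {t_value ** 2:.3f}, SD = {sd:.4f}")
```

The squared t values match the reported F values up to rounding of the printed coefficients, and the recovered standard deviations match the seeded and unseeded values from the start of the report, confirming that the same observations are used throughout.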
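Separating the data makes the difference between seeded and unseeded observations easy to see. For the seeded observations, sne is significantly associated with rainfall (slope -2.22, p = 0.029) and explains about 39% of its variance (R-squared 0.393). For the unseeded observations, the slope is also negative (-1.05) but not significant (p = 0.318), and sne explains only about 10% of the variance (R-squared 0.100). Below is a graph showing both data sets with their fitted regression lines: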
[Figure: rainfall versus sne for seeded and unseeded observations, each with a linear fit (geom_smooth, formula y ~ x)]
The difference in slopes between the two data sets is apparent in this graph. The line is steeper for the seeded observations (-2.22 versus -1.05 rainfall units per unit of sne), meaning that rainfall decreases more quickly as sne increases when clouds are seeded. In other words, sne appears to be more strongly associated with rainfall for seeded than for unseeded observations.
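One rough way to gauge whether the two slopes actually differ is to divide their difference by its standard error. The Python sketch below does this under stated assumptions: the two groups are independent, the reported standard errors are treated as known, and a normal approximation is acceptable. This is a swapped-in rough check, not part of the analysis above. A more standard approach would fit one model with an interaction term, for example `lm(rainfall ~ seeding * sne, data = clouds)` in R, and examine the seeding-by-sne coefficient.

```python
import math

# sne slopes and standard errors from the two separate regressions above
b_seeded, se_seeded = -2.2180, 0.8722
b_unseeded, se_unseeded = -1.046, 0.995

diff = b_seeded - b_unseeded
# ASSUMPTION: the groups are disjoint and independent, so the variances add
se_diff = math.sqrt(se_seeded ** 2 + se_unseeded ** 2)
z = diff / se_diff  # approximate standardized slope difference
print(f"slope difference = {diff:.3f}, SE = {se_diff:.3f}, ratio = {z:.2f}")
```

The ratio comes out near -0.89, well under 2 in magnitude, which suggests the gap between the slopes is within their sampling uncertainty at this sample size. The steeper seeded slope is therefore best read as suggestive rather than conclusive.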