Using the clouds.csv data set, this project examines the effect of cloud seeding on rainfall. It also takes a closer look at other variables that could affect rainfall under both seeded and non-seeded conditions.
Setting Up The Work Space
To set up RStudio, the working directory was set to the folder containing the clouds.csv data set. The packages needed for the analysis were then loaded into the current R session.
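A minimal sketch of this setup, assuming clouds.csv sits in the working directory and that ggplot2 is among the loaded packages (the later plots call `ggplot()`). The helper name `load_clouds` is hypothetical. The original does not show how the `seeding_no` and `seeding_yes` subsets were created, so the `subset()` calls are an assumption.

```r
# Hypothetical helper: read the data and split it by seeding condition.
load_clouds <- function(path = "clouds.csv") {
  clouds <- read.csv(path, stringsAsFactors = FALSE)
  list(
    clouds      = clouds,
    seeding_no  = subset(clouds, seeding == "no"),   # non-seeded experiments
    seeding_yes = subset(clouds, seeding == "yes")   # seeded experiments
  )
}

# Usage (assumes clouds.csv is in the working directory):
# library(ggplot2)
# parts       <- load_clouds("clouds.csv")
# clouds      <- parts$clouds
# seeding_no  <- parts$seeding_no
# seeding_yes <- parts$seeding_yes
# summary(clouds)
```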
       X           seeding               time            sne
 Min.   : 1.00   Length:24          Min.   : 0.00   Min.   :1.300
 1st Qu.: 6.75   Class :character   1st Qu.:15.75   1st Qu.:2.612
 Median :12.50   Mode  :character   Median :32.50   Median :3.250
 Mean   :12.50                      Mean   :35.33   Mean   :3.169
 3rd Qu.:18.25                      3rd Qu.:55.25   3rd Qu.:3.962
 Max.   :24.00                      Max.   :83.00   Max.   :4.650
   cloudcover       prewetness      echomotion           rainfall
 Min.   : 2.200   Min.   :0.0180   Length:24          Min.   : 0.280
 1st Qu.: 3.750   1st Qu.:0.1405   Class :character   1st Qu.: 2.342
 Median : 5.250   Median :0.2220   Mode  :character   Median : 4.335
 Mean   : 7.246   Mean   :0.3271                      Mean   : 4.403
 3rd Qu.: 7.175   3rd Qu.:0.3297                      3rd Qu.: 5.575
 Max.   :37.900   Max.   :1.2670                      Max.   :12.850
For the rainfall variable, summary statistics were used to compare rainfall data between the seeded and non-seeded experiments. These include mean, median, and interquartile range.
summary(seeding_no$rainfall)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.280 1.075 4.055 4.172 5.832 12.850
summary(seeding_yes$rainfall)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.090 2.683 4.530 4.634 5.468 11.860
Mean:
The mean rainfall for non-seeded clouds is 4.171667 m3 x 1e+8, and the mean rainfall for seeded clouds is 4.634167 m3 x 1e+8. On average, seeded clouds produced 0.4625 m3 x 1e+8 more rainfall than non-seeded clouds.
Median:
The median rainfall for non-seeded clouds is 4.055 m3 x 1e+8, and the median rainfall for seeded clouds is 4.530 m3 x 1e+8. The median for seeded clouds is therefore 0.475 m3 x 1e+8 higher than for non-seeded clouds.
Interquartile Range (IQR):
With an IQR of 4.7575, the middle fifty percent of rainfall measurements from non-seeded clouds fall between 1.075 and 5.832 m3 x 1e+8. With an IQR of 2.7850, the middle fifty percent of rainfall measurements from seeded clouds fall between 2.683 and 5.468 m3 x 1e+8.
The standard deviation of rainfall is 3.519196 for non-seeded clouds and 2.776841 for seeded clouds. Rainfall varies less for seeded clouds: their standard deviation is lower by 0.7423555.
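The four statistics discussed above can be computed per group in one pass. This is a sketch, not necessarily how the original values were produced; the helper name `describe_rainfall` is hypothetical.

```r
# Hypothetical helper: centre and spread of a numeric vector.
describe_rainfall <- function(x) {
  c(mean = mean(x), median = median(x), iqr = IQR(x), sd = sd(x))
}

# Usage (assumes the clouds data frame is loaded):
# sapply(split(clouds$rainfall, clouds$seeding), describe_rainfall)
```

`split()` groups rainfall by the seeding column, so the result has one column per condition ("no" and "yes") and one row per statistic.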
The statistical findings were plotted for better visualization of results.
Rainfall x Time Line Plot
This plot shows rainfall measurements for both seeded and non-seeded clouds throughout 83 days following the first day of the experiment (initial seeding day).
# Line plot of rainfall over time, one line per seeding condition
ggplot(clouds, aes(x = time, y = rainfall, group = seeding, colour = seeding)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("no" = "deepskyblue3", "yes" = "deeppink3")) +
  labs(title = "Non-Seeded Clouds vs Seeded Clouds Rainfall Amounts Over 83 Days",
       x = "Time: days after the first day of the experiment",
       y = "Rainfall (cubic meters times 1e+8)",
       colour = "Seeding") +
  scale_x_continuous(breaks = seq(0, 85, by = 5)) +
  scale_y_continuous(breaks = seq(0, 15, by = 1))
We can see that rainfall for seeded clouds peaked at about 12 m3 x 1e+8 on approximately day 38, and fluctuated between 1 and 6 m3 x 1e+8 for most of the period.
Non-seeded rainfall drops sharply from about 13 to almost zero m3 x 1e+8 in the first two weeks, then peaks around 6.5 m3 x 1e+8 on approximately day 22, and falls back to almost zero m3 x 1e+8 by day 83.
Rainfall Distribution Box Plot:
This plot shows the distribution of rainfall for both seeded and non-seeded clouds. It helps visualize the Interquartile Range (IQR) in the shaded grey area, as well as any outliers in the data. Here we can see that the seeded 12 m3 x 1e+8 measurement recorded on approximately day 38 is a major outlier in that data set. This plot offers a visual representation of the difference in IQR between the seeded and non-seeded data sets that was identified in the statistical analysis: seeded clouds have a much narrower IQR than non-seeded clouds.
boxplot(seeding_no$rainfall, seeding_yes$rainfall, names =c("Non-Seeded Clouds", "Seeded Clouds"), ylab ="Rainfall (cubic meters times 1e+8)", main ="Rainfall Distribution: Non-Seeded vs Seeded Clouds")
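As a rough check on the outlier reading, `boxplot()` by default (`range = 1.5`) flags points lying more than 1.5 times the box length beyond the box. The sketch below applies that rule to the quartiles reported by `summary()`. `boxplot()` itself uses hinges, which can differ slightly from these quartiles, so the fences are approximate. The helper name `upper_fence` is hypothetical.

```r
# Hypothetical helper: upper outlier fence under the 1.5 x IQR rule.
upper_fence <- function(q1, q3, k = 1.5) q3 + k * (q3 - q1)

fence_seeded     <- upper_fence(2.683, 5.468)  # quartiles from summary(seeding_yes$rainfall)
fence_non_seeded <- upper_fence(1.075, 5.832)  # quartiles from summary(seeding_no$rainfall)

11.860 > fence_seeded      # seeded maximum vs its fence
12.850 > fence_non_seeded  # non-seeded maximum vs its fence
```

Under this rule the seeded maximum (11.860) lies above its fence of about 9.65, while the non-seeded maximum (12.850) stays just below its fence of about 12.97. This is consistent with only the seeded group showing an outlier.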
Rainfall Distribution Density Plots
These plots show the density distribution of rainfall measurements for each group. The plots are similar: right-skewed with a slight bimodal appearance. They have been facet wrapped for side-by-side comparison.
ggplot(clouds, aes(x = rainfall, colour = seeding, fill = seeding)) +
  guides(fill = guide_legend(title = "Seeding"), colour = guide_legend(title = "Seeding")) +
  facet_wrap(~seeding, ncol = 2,
             labeller = labeller(seeding = c("yes" = "Seeded Clouds", "no" = "Non-Seeded Clouds"))) +
  geom_density() +
  scale_fill_manual(values = c("no" = "deepskyblue3", "yes" = "deeppink3")) +
  scale_color_manual(values = c("no" = "deepskyblue3", "yes" = "deeppink3")) +
  scale_x_continuous(breaks = seq(0, 15, by = 1)) +
  scale_y_continuous(breaks = seq(0, 0.2, by = 0.01)) +
  labs(title = "Rainfall Density Distribution (non-seeded vs seeded clouds)",
       x = "Rainfall (cubic meters times 1e+8)", y = "Density")
Overlaid Rainfall Distribution Density Plot
This plot overlays the density plots for seeded and non-seeded cloud rainfall measurements, making it very easy to visualize the difference in standard deviation between the two groups. The narrow curves of the seeded rainfall density plot communicate that those measurements are more tightly clustered around the mean, while the wider curves of the non-seeded rainfall density plot communicate that those measurements are more spread out around the mean.
ggplot(clouds, aes(x = rainfall, colour = seeding, fill = seeding)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("no" = "deepskyblue3", "yes" = "deeppink3")) +
  scale_color_manual(values = c("no" = "deepskyblue3", "yes" = "deeppink3")) +
  scale_x_continuous(breaks = seq(0, 15, by = 1)) +
  scale_y_continuous(breaks = seq(0, 0.2, by = 0.01)) +
  labs(title = "Rainfall Distribution (non-seeded vs seeded clouds) (overlaid)",
       x = "Rainfall (cubic meters times 1e+8)", y = "Density")
A t-test was performed to determine if the differences in rainfall between seeded and non-seeded clouds are statistically significant.
Welch Two Sample t-test
data: clouds$rainfall[clouds$seeding == "no"] and clouds$rainfall[clouds$seeding == "yes"]
t = -0.3574, df = 20.871, p-value = 0.7244
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.154691 2.229691
sample estimates:
mean of x mean of y
4.171667 4.634167
P-value = 0.7244
We fail to reject the null hypothesis, since the p-value is much greater than our 0.05 significance level.
There is not a statistically significant difference between the mean rainfall amounts of non-seeded clouds and seeded clouds.
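The reported t statistic, degrees of freedom, and p-value can be reproduced from the group means and standard deviations given earlier. The sketch below hand-computes the Welch test. It assumes 12 observations per group, which is inferred from the 24 rows in the data summary and the 10 residual degrees of freedom in each single-group regression. The helper name `welch_t` is hypothetical.

```r
# Hypothetical helper: Welch two-sample t-test from summary statistics.
welch_t <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                       # squared standard error, group 1
  v2 <- s2^2 / n2                       # squared standard error, group 2
  t  <- (m1 - m2) / sqrt(v1 + v2)       # test statistic
  df <- (v1 + v2)^2 /                   # Welch-Satterthwaite degrees of freedom
    (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p  <- 2 * pt(-abs(t), df)             # two-sided p-value
  c(t = t, df = df, p = p)
}

# Non-seeded (x) vs seeded (y), using the values reported above;
# n = 12 per group is an inferred assumption.
res <- welch_t(4.171667, 3.519196, 12, 4.634167, 2.776841, 12)
round(res, 4)
```

With the full data loaded, the same test is obtained directly from the call shown in the output's `data:` line, or equivalently with `t.test(rainfall ~ seeding, data = clouds)`.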
Analyzing More Variables: Multiple Linear Regression Model
A multiple linear regression model was created to examine other variables that may affect rainfall amounts for both seeded and non-seeded clouds.
Other variables include cloudcover, prewetness, echomotion, and sne (suitability index).
mlr_model_s2p1 =lm(rainfall~seeding+cloudcover+prewetness+echomotion+sne, data = clouds)
The model summary was examined, and an ANOVA test was performed.
summary(mlr_model_s2p1)
Call:
lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion +
sne, data = clouds)
Residuals:
Min 1Q Median 3Q Max
-5.1158 -1.7078 -0.2422 1.3368 6.4827
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           6.37680    2.43432   2.620   0.0174 *
seedingyes            1.12011    1.20725   0.928   0.3658
cloudcover            0.01821    0.11508   0.158   0.8761
prewetness            2.55109    2.70090   0.945   0.3574
echomotionstationary  2.59855    1.54090   1.686   0.1090
sne                  -1.27530    0.68015  -1.875   0.0771 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.855 on 18 degrees of freedom
Multiple R-squared: 0.3403, Adjusted R-squared: 0.157
F-statistic: 1.857 on 5 and 18 DF, p-value: 0.1524
In the model summary, each coefficient is estimated while holding all other independent variables constant. Ranking the variables from most to least influential by p-value gives:
Sne (0.0771),
Echomotion (0.1090),
Prewetness (0.3574),
Seeding (0.3658),
Cloudcover (0.8761)
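The adjusted R2 value is 0.157, which means that approximately 15.7 percent of the variance in rainfall is explained by these five variables after adjusting for the number of predictors. The overall F-test (p-value 0.1524) shows that the model as a whole is not statistically significant at the 0.05 level.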
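Both the p-value ordering and the adjusted R2 can be checked from the numbers printed in the summary. This sketch hard-codes the reported values rather than refitting the model. The names `p_values` and `adj_r2` are hypothetical, and the adjusted R2 formula assumes n = 24 observations and p = 5 predictors, as implied by the 18 residual degrees of freedom.

```r
# Coefficient p-values as printed in the model summary.
p_values <- c(seeding = 0.3658, cloudcover = 0.8761, prewetness = 0.3574,
              echomotion = 0.1090, sne = 0.0771)
ranked <- sort(p_values)  # smallest p-value first
names(ranked)

# Adjusted R-squared from the multiple R-squared.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2(0.3403, n = 24, p = 5)

# With the fitted model available, the p-values can be pulled directly:
# sort(coef(summary(mlr_model_s2p1))[-1, "Pr(>|t|)"])
```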
When examining the results from the ANOVA test, the p-value influence rankings change:
Echomotion (0.07108),
Sne (0.07711),
Cloudcover (0.18157),
Seeding (0.69613),
Prewetness (0.98557)
If the variables are ranked by their Sum of Squares values, we get the same ranking order as the ANOVA p-value ranking. Echomotion has the highest (most influential) value and prewetness has the lowest (least influential) value.
Across both the summary and the ANOVA results, only two variables stand out as potentially influential: echomotion and sne. Neither reaches the 0.05 significance level in either table, so the evidence is suggestive rather than conclusive. Although cloudcover has a fairly high Sum of Squares value (15.738), it cannot be considered influential because of its very large summary p-value (0.8761).
The case for echomotion rests on the combination of its fairly low (although not statistically significant) summary p-value (0.1090) and its high Sum of Squares value (29.985), which suggests that it explains a notable share of the variance.
The case for sne likewise rests on its low summary p-value (0.0771, significant only at the 0.1 level) and its high Sum of Squares value (28.649).
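One reason the two rankings differ is that `anova()` on a single `lm` fit reports sequential (Type I) sums of squares. Each row measures what a term adds after the terms listed before it, whereas the summary t-tests adjust each term for all the others. The sketch below illustrates this order dependence on R's built-in `mtcars` data, which is only a stand-in and not the clouds data. It then shows `drop1()`, which gives order-independent tests.

```r
# Illustration on the built-in mtcars data (a stand-in, not the clouds data).
fit_a <- lm(mpg ~ wt + hp, data = mtcars)  # wt entered first
fit_b <- lm(mpg ~ hp + wt, data = mtcars)  # wt entered after hp

ss_a <- anova(fit_a)["wt", "Sum Sq"]  # sequential SS for wt, entered first
ss_b <- anova(fit_b)["wt", "Sum Sq"]  # sequential SS for wt, entered last

# drop1() tests each term given all the others, so its p-values do not
# depend on term order and match the summary() t-tests for 1-df terms.
p_drop <- drop1(fit_a, test = "F")[c("wt", "hp"), "Pr(>F)"]
p_summ <- coef(summary(fit_a))[c("wt", "hp"), "Pr(>|t|)"]

# For the rainfall model, the analogous call would be:
# drop1(mlr_model_s2p1, test = "F")
```

Because sequential sums of squares depend on the order of terms in the formula, the ANOVA ranking above is specific to the order seeding, cloudcover, prewetness, echomotion, sne.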
Suitability Index (sne) Effect on Rainfall
The suitability index (sne) variable’s relationship with rainfall was more closely examined through two new regression models: one for non-seeded clouds, and one for seeded clouds.
Non-Seeded Model
lr_model_sne_seeding_no =lm(rainfall~sne, data = seeding_no)
Seeded Model
lr_model_sne_seeding_yes =lm(rainfall~sne, data = seeding_yes)
The summaries were examined and an ANOVA test was performed for each model.
Non-Seeded Model: Summary and ANOVA test
summary(lr_model_sne_seeding_no)
Call:
lm(formula = rainfall ~ sne, data = seeding_no)
Residuals:
Min 1Q Median 3Q Max
-5.4892 -2.1762 0.2958 1.4902 7.3616
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    7.319      3.160   2.317    0.043 *
sne           -1.046      0.995  -1.052    0.318
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.502 on 10 degrees of freedom
Multiple R-squared: 0.09957, Adjusted R-squared: 0.009528
F-statistic: 1.106 on 1 and 10 DF, p-value: 0.3177
anova(lr_model_sne_seeding_no)
Analysis of Variance Table
Response: rainfall
          Df  Sum Sq Mean Sq F value Pr(>F)
sne        1  13.565  13.565  1.1058 0.3177
Residuals 10 122.667  12.267
Seeded Model: Summary and ANOVA test
summary(lr_model_sne_seeding_yes)
Call:
lm(formula = rainfall ~ sne, data = seeding_yes)
Residuals:
Min 1Q Median 3Q Max
-3.0134 -1.3297 -0.3276 0.6171 4.3867
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  12.0202     2.9774   4.037  0.00237 **
sne          -2.2180     0.8722  -2.543  0.02921 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.27 on 10 degrees of freedom
Multiple R-squared: 0.3927, Adjusted R-squared: 0.332
F-statistic: 6.467 on 1 and 10 DF, p-value: 0.02921
anova(lr_model_sne_seeding_yes)
Analysis of Variance Table
Response: rainfall
          Df Sum Sq Mean Sq F value  Pr(>F)
sne        1 33.310  33.310  6.4669 0.02921 *
Residuals 10 51.509   5.151
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpreting the Models
The sne coefficient (slope) for the seeded model is -2.218, while the coefficient for the non-seeded model is -1.046. This suggests that rainfall falls more steeply as sne increases under seeded conditions.
The p-value for the seeded model is 0.029, which is statistically significant. The p-value for the non-seeded model is 0.318, which is not significant.
The R2 value for the seeded model is 0.393, which suggests that sne accounts for 39.3 percent of the variance in rainfall under seeded conditions. The R2 value for the non-seeded model is 0.100, which suggests that sne accounts for only 10 percent of the variance in rainfall under non-seeded conditions.
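The R2 values follow directly from the ANOVA tables above, since R2 is the share of the total sum of squares attributed to the model. The sketch below recomputes them from the printed sums of squares. The helper name `r_squared` is hypothetical.

```r
# Hypothetical helper: R-squared from model and residual sums of squares.
r_squared <- function(ss_model, ss_resid) ss_model / (ss_model + ss_resid)

r2_seeded     <- r_squared(33.310, 51.509)   # from anova(lr_model_sne_seeding_yes)
r2_non_seeded <- r_squared(13.565, 122.667)  # from anova(lr_model_sne_seeding_no)

round(c(seeded = r2_seeded, non_seeded = r2_non_seeded), 4)
```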
These results likely tell us that high sne values inhibit/disrupt a storm’s ability to produce rainfall in seeded conditions.
A figure showing the two models was produced.
# Scatter plot with one fitted regression line per seeding condition
ggplot(clouds, aes(x = sne, y = rainfall, color = seeding)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
  scale_color_manual(values = c("no" = "deepskyblue3", "yes" = "deeppink3")) +
  scale_x_continuous(breaks = seq(0, 5, by = .5)) +
  scale_y_continuous(breaks = seq(0, 15, by = 1)) +
  labs(title = "Rainfall by Suitability Index (sne) for Seeded and Non-Seeded Clouds",
       x = "Suitability Index (SNE)", y = "Rainfall (cubic metres times 1e+8)",
       color = "Seeding")
Comparing Slopes
The difference in slope between seeded and non-seeded conditions suggests that the suitability index has a much more pronounced negative association with rainfall in seeded conditions: high sne values appear to inhibit or disrupt a storm’s ability to produce rainfall when clouds are seeded. These results also suggest that cloud seeding could be counterproductive in conditions where sne values are high.
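Fitting two separate models shows that the slopes differ, but it does not test whether that difference is itself statistically significant. One common way to do so, which is not part of the original analysis, is a single model with an sne-by-seeding interaction. The helper name `fit_slope_interaction` is hypothetical. The coefficient name `sne:seedingyes` assumes seeding is coded "no"/"yes" with "no" as the reference level, as in the `seedingyes` term of the earlier output.

```r
# Hypothetical helper: one model with separate intercepts and slopes per group.
fit_slope_interaction <- function(df) {
  lm(rainfall ~ sne * seeding, data = df)
}

# Usage (assumes the clouds data frame is loaded):
# fit <- fit_slope_interaction(clouds)
# coef(summary(fit))["sne:seedingyes", ]  # estimate, SE, t value, p-value
```

The `sne:seedingyes` estimate is the seeded slope minus the non-seeded slope, so with the slopes reported above it should come out at about -2.218 - (-1.046) = -1.172. Its p-value indicates whether the two slopes differ by more than chance would explain.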