Final Project - GEOG 5680

Author

Vivian Strange

Project 3: Cloud Seeding

06/20/2025

Project Description

Using the clouds.csv data set, this project examines the effects of cloud seeding on rainfall. It also takes a closer look at other variables that could affect rainfall in both seeded and non-seeded conditions.

Setting Up The Work Space

To set up RStudio, the working directory was set to the folder containing the clouds.csv data set, and the packages needed for the analysis were loaded into the current R session.

setwd("~/Desktop/geog5680/FINAL_PROJECT")
library(ggplot2)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(knitr)
library(rmarkdown)

The clouds.csv data set was loaded into a data frame named clouds and inspected.

clouds = read.csv("clouds.csv")
str(clouds)

'data.frame':   24 obs. of  8 variables:
 $ X         : int  1 2 3 4 5 6 7 8 9 10 ...
 $ seeding   : chr  "no" "yes" "yes" "no" ...
 $ time      : int  0 1 3 4 6 9 18 25 27 28 ...
 $ sne       : num  1.75 2.7 4.1 2.35 4.25 1.6 1.3 3.35 2.85 2.2 ...
 $ cloudcover: num  13.4 37.9 3.9 5.3 7.1 6.9 4.6 4.9 12.1 5.2 ...
 $ prewetness: num  0.274 1.267 0.198 0.526 0.25 ...
 $ echomotion: chr  "stationary" "moving" "stationary" "moving" ...
 $ rainfall  : num  12.85 5.52 6.29 6.11 2.45 ...

summary(clouds)

       X           seeding               time            sne       
 Min.   : 1.00   Length:24          Min.   : 0.00   Min.   :1.300  
 1st Qu.: 6.75   Class :character   1st Qu.:15.75   1st Qu.:2.612  
 Median :12.50   Mode  :character   Median :32.50   Median :3.250  
 Mean   :12.50                      Mean   :35.33   Mean   :3.169  
 3rd Qu.:18.25                      3rd Qu.:55.25   3rd Qu.:3.962  
 Max.   :24.00                      Max.   :83.00   Max.   :4.650  
   cloudcover       prewetness      echomotion           rainfall     
 Min.   : 2.200   Min.   :0.0180   Length:24          Min.   : 0.280  
 1st Qu.: 3.750   1st Qu.:0.1405   Class :character   1st Qu.: 2.342  
 Median : 5.250   Median :0.2220   Mode  :character   Median : 4.335  
 Mean   : 7.246   Mean   :0.3271                      Mean   : 4.403  
 3rd Qu.: 7.175   3rd Qu.:0.3297                      3rd Qu.: 5.575  
 Max.   :37.900   Max.   :1.2670                      Max.   :12.850

head(clouds)

  X seeding time  sne cloudcover prewetness echomotion rainfall
1 1      no    0 1.75       13.4      0.274 stationary    12.85
2 2     yes    1 2.70       37.9      1.267     moving     5.52
3 3     yes    3 4.10        3.9      0.198 stationary     6.29
4 4      no    4 2.35        5.3      0.526     moving     6.11
5 5     yes    6 4.25        7.1      0.250     moving     2.45
6 6      no    9 1.60        6.9      0.018 stationary     3.61

Analyzing The Data

The difference in rainfall between seeding and non-seeding experiments was examined using statistical analyses.

The data was filtered by seeding status into two new subsets: data from seeded clouds (seeding_yes) and data from non-seeded clouds (seeding_no).

seeding_yes = clouds |> filter(seeding == "yes")
seeding_no = clouds |> filter(seeding == "no")

For the rainfall variable, summary statistics were used to compare rainfall data between the seeded and non-seeded experiments. These include mean, median, and interquartile range.

summary(seeding_no$rainfall)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.280   1.075   4.055   4.172   5.832  12.850

summary(seeding_yes$rainfall)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.090   2.683   4.530   4.634   5.468  11.860

Mean:

The mean rainfall for non-seeded clouds is 4.171667 m³ x 1e+8, and the mean rainfall for seeded clouds is 4.634167 m³ x 1e+8. On average, seeded clouds produced 0.4625 m³ x 1e+8 more rainfall than non-seeded clouds.

Median:

The median rainfall for non-seeded clouds is 4.055 m³ x 1e+8, and the median rainfall for seeded clouds is 4.530 m³ x 1e+8. The median rainfall for seeded clouds is 0.475 m³ x 1e+8 higher than for non-seeded clouds.

Interquartile Range (IQR):

With an IQR value of 4.7575, the middle fifty percent of rainfall measurements from non-seeded clouds fall between 1.075 and 5.832 m³ x 1e+8. With an IQR value of 2.7850, the middle fifty percent of rainfall measurements from seeded clouds fall between 2.683 and 5.468 m³ x 1e+8.
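The three statistics above can also be computed per group in a single call. The sketch below is a minimal illustration, assuming a data frame shaped like clouds. To stay self-contained it uses only the six rows printed by head(clouds), so its numbers will differ from the full 24-row results reported above. The names clouds_head and group_stats are hypothetical.

```r
# Six rows copied from the head(clouds) output; NOT the full data set.
clouds_head <- data.frame(
  seeding  = c("no", "yes", "yes", "no", "yes", "no"),
  rainfall = c(12.85, 5.52, 6.29, 6.11, 2.45, 3.61)
)

# Mean, median, and IQR of rainfall for each seeding group.
# The result is a matrix with one column per group.
group_stats <- sapply(
  split(clouds_head$rainfall, clouds_head$seeding),
  function(x) c(mean = mean(x), median = median(x), iqr = IQR(x))
)
print(group_stats)
```

Replacing clouds_head with the full clouds data frame should reproduce the per-group means, medians, and IQR values discussed above.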

Standard deviation was also examined.

sd_rain = tapply(clouds$rainfall, clouds$seeding, sd)
tapply(clouds$rainfall, clouds$seeding, sd)

      no      yes 
3.519196 2.776841

dif_sd = sd_rain["yes"]-sd_rain["no"]
sd_rain["yes"]-sd_rain["no"]

       yes 
-0.7423555

The standard deviation of rainfall is 3.519196 for non-seeded clouds and 2.776841 for seeded clouds. Rainfall from seeded clouds varies less: its standard deviation is 0.7423555 lower than that of non-seeded clouds.

The statistical findings were plotted for better visualization of results.

Rainfall x Time Line Plot

This plot shows rainfall measurements for both seeded and non-seeded clouds throughout 83 days following the first day of the experiment (initial seeding day).

ggplot(clouds, aes(x=time, y=rainfall, group = seeding, colour = seeding)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("no"="deepskyblue3", "yes"="deeppink3")) +
  labs(title = "Non-Seeded Clouds vs Seeded Clouds Rainfall Amounts Over 83 Days",
       x="Time: days after the first day of the experiment",
       y="Rainfall (cubic meters times 1e+8)",
       colour="Seeding") +
  scale_x_continuous(breaks = seq(0,85,by=5)) +
  scale_y_continuous(breaks = seq(0,15,by=1))

We can see that rainfall for seeded clouds peaked at about 12 m³ x 1e+8 on approximately day 38, and fluctuated between 1 and 6 m³ x 1e+8 for most of the period.

Non-seeded rainfall drops sharply from about 13 to almost zero m³ x 1e+8 in the first two weeks, peaks again at around 6.5 m³ x 1e+8 on approximately day 22, and falls back to almost zero m³ x 1e+8 by day 83.

Rainfall Distribution Box Plot

This plot shows the distribution of rainfall for both seeded and non-seeded clouds. The shaded grey box marks the Interquartile Range (IQR), and any outliers are drawn as separate points. Here we can see that the seeded measurement of about 12 m³ x 1e+8, recorded on approximately day 38, is an outlier in that data set. The plot also shows the difference in IQR identified in the statistical analysis: seeded clouds have a much narrower IQR than non-seeded clouds.

boxplot(seeding_no$rainfall, seeding_yes$rainfall, names = c("Non-Seeded Clouds", "Seeded Clouds"), ylab = "Rainfall (cubic meters times 1e+8)", main = "Rainfall Distribution: Non-Seeded vs Seeded Clouds")
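The outlier reading above can be cross-checked with the conventional 1.5 x IQR rule. The sketch below is an approximation under stated assumptions: it uses the rounded quartiles and maxima from the summary() output, whereas boxplot() draws its box from hinges, which can differ slightly from these quartiles. The helper name upper_fence is hypothetical.

```r
# Upper whisker limit under the 1.5 * IQR convention.
upper_fence <- function(q1, q3) q3 + 1.5 * (q3 - q1)

# Quartiles taken from the summary() output above (rounded values).
fence_no  <- upper_fence(q1 = 1.075, q3 = 5.832)
fence_yes <- upper_fence(q1 = 2.683, q3 = 5.468)

# Group maxima from the same summary() output.
# TRUE means the maximum lies beyond the fence and would be flagged.
flagged <- c(no = 12.850 > fence_no, yes = 11.860 > fence_yes)
print(round(c(fence_no = fence_no, fence_yes = fence_yes), 4))
print(flagged)
```

Under these assumptions only the seeded maximum exceeds its fence (about 9.65), while the non-seeded maximum of 12.85 falls just inside its fence (about 12.97).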

Rainfall Distribution Density Plots

These plots show the density distribution of rainfall measurements for each group. The plots are similar: right-skewed with a slight bimodal appearance. They have been facet wrapped for side-by-side comparison.

ggplot(clouds, aes(x=rainfall, colour=seeding, fill =seeding)) + 
  guides(fill=guide_legend(title="Seeding"), colour=guide_legend(title="Seeding")) + 
  facet_wrap(~seeding, ncol=2, labeller = labeller(seeding = c("yes" = "Seeded Clouds", "no" = "Non-Seeded Clouds"))) + 
  geom_density() + 
  scale_fill_manual(values = c("no"="deepskyblue3", "yes"="deeppink3")) + 
  scale_color_manual(values = c("no"="deepskyblue3", "yes"="deeppink3")) + 
  scale_x_continuous(breaks = seq(0,15,by=1)) + 
  scale_y_continuous(breaks = seq(0,0.2,by=0.01)) + 
  labs(title = "Rainfall Density Distribution (non-seeded vs seeded clouds)", 
       x = "Rainfall (cubic meters times 1e+8)", 
       y= "Density")

Overlaid Rainfall Distribution Density Plot

This plot overlays the density plots for seeded and non-seeded cloud rainfall measurements, making it very easy to visualize the difference in standard deviation between the two groups. The narrow curves of the seeded rainfall density plot communicate that those measurements are more tightly clustered around the mean, while the wider curves of the non-seeded rainfall density plot communicate that those measurements are more spread out around the mean.

ggplot(clouds, aes(x=rainfall, colour=seeding, fill =seeding)) + 
  geom_density(alpha=0.5) + 
  scale_fill_manual(values = c("no"="deepskyblue3", "yes"="deeppink3")) + 
  scale_color_manual(values = c("no"="deepskyblue3", "yes"="deeppink3")) + 
  scale_x_continuous(breaks = seq(0,15,by=1)) + 
  scale_y_continuous(breaks = seq(0,0.2,by=0.01)) + 
  labs(title = "Rainfall Distribution (non-seeded vs seeded clouds) (overlaid)", 
       x = "Rainfall (cubic meters times 1e+8)", 
       y= "Density")

A t-test was performed to determine if the differences in rainfall between seeded and non-seeded clouds are statistically significant.

t.test(clouds$rainfall[clouds$seeding == "no"], clouds$rainfall[clouds$seeding == "yes"])


    Welch Two Sample t-test

data:  clouds$rainfall[clouds$seeding == "no"] and clouds$rainfall[clouds$seeding == "yes"]
t = -0.3574, df = 20.871, p-value = 0.7244
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.154691  2.229691
sample estimates:
mean of x mean of y 
 4.171667  4.634167

P-value = 0.7244

We fail to reject the null hypothesis, since the p-value is much greater than our 0.05 significance level.

There is not a statistically significant difference between the mean rainfall amounts of non-seeded clouds and seeded clouds.
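The Welch test output above can be reproduced from the group summaries alone, which makes the calculation behind t, df, and the p-value explicit. The sketch below is a minimal reconstruction, assuming 12 observations per group (the 24 observations split evenly). All variable names are hypothetical.

```r
# Group means and standard deviations reported earlier in this report.
# Assumption: n = 12 per group (24 observations split evenly).
m_no  <- 4.171667; s_no  <- 3.519196; n_no  <- 12
m_yes <- 4.634167; s_yes <- 2.776841; n_yes <- 12

# Squared standard error contributed by each group.
v_no  <- s_no^2  / n_no
v_yes <- s_yes^2 / n_yes

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
se       <- sqrt(v_no + v_yes)
t_stat   <- (m_no - m_yes) / se
df_welch <- (v_no + v_yes)^2 /
  (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value and 95 percent confidence interval (no minus yes).
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci      <- (m_no - m_yes) + c(-1, 1) * qt(0.975, df_welch) * se

print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
print(round(ci, 3))
```

The values should agree with the t.test() output to within rounding, since the inputs are themselves rounded summaries.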

Analyzing More Variables: Multiple Linear Regression Model

A multiple linear regression model was created to examine other variables that may affect rainfall amounts for both seeded and non-seeded clouds.

Other variables include cloudcover, prewetness, echomotion, and sne (suitability index).

mlr_model_s2p1 = lm(rainfall~seeding+cloudcover+prewetness+echomotion+sne, data = clouds)

The model summary was examined, and an ANOVA test was performed.

summary(mlr_model_s2p1)


Call:
lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
    sne, data = clouds)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.1158 -1.7078 -0.2422  1.3368  6.4827 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)  
(Intercept)           6.37680    2.43432   2.620   0.0174 *
seedingyes            1.12011    1.20725   0.928   0.3658  
cloudcover            0.01821    0.11508   0.158   0.8761  
prewetness            2.55109    2.70090   0.945   0.3574  
echomotionstationary  2.59855    1.54090   1.686   0.1090  
sne                  -1.27530    0.68015  -1.875   0.0771 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.855 on 18 degrees of freedom
Multiple R-squared:  0.3403,    Adjusted R-squared:  0.157 
F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524

anova(mlr_model_s2p1)

Analysis of Variance Table

Response: rainfall
           Df  Sum Sq Mean Sq F value  Pr(>F)  
seeding     1   1.283  1.2834  0.1575 0.69613  
cloudcover  1  15.738 15.7377  1.9313 0.18157  
prewetness  1   0.003  0.0027  0.0003 0.98557  
echomotion  1  29.985 29.9853  3.6798 0.07108 .
sne         1  28.649 28.6491  3.5158 0.07711 .
Residuals  18 146.677  8.1487                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using the coefficient p-values from the model summary, each of which tests one variable while holding all other independent variables constant, the variables rank from smallest to largest p-value as follows:

Sne (0.0771),
Echomotion (0.1090),
Prewetness (0.3574),
Seeding (0.3658),
Cloudcover (0.8761)

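The adjusted R² value is 0.157, which means that approximately 15.7 percent of the variance in rainfall is explained by these five variables after adjusting for the number of predictors. The overall F-test (p-value = 0.1524) indicates that the model as a whole is not statistically significant at the 0.05 level.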
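The adjusted R² can be recovered from the multiple R² in the summary output, which shows how much of the reduction comes from the number of predictors. This is a minimal sketch using the standard adjustment formula; the function name adj_r2 is hypothetical.

```r
# Adjusted R-squared from R-squared, sample size n, and predictor count p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Values from the model summary above:
# multiple R-squared 0.3403, 24 observations, 5 predictors.
adj_full <- adj_r2(r2 = 0.3403, n = 24, p = 5)
print(round(adj_full, 3))
```

With 24 observations and five predictors, the adjustment reduces the multiple R² of 0.3403 to about 0.157.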

When examining the results from the ANOVA test, the p-value ranking changes:

Echomotion (0.07108),
Sne (0.07711),
Cloudcover (0.18157),
Seeding (0.69613),
Prewetness (0.98557)

If the variables are ranked by their Sum of Squares values, we get the same order as the ANOVA p-value ranking. This is expected, because each term has one degree of freedom and is compared against the same residual mean square. Echomotion has the highest Sum of Squares value and prewetness has the lowest.
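The link between the Sum of Squares values and the ANOVA p-values can be made explicit by rebuilding the F statistics from the table. The sketch below is illustrative only: it uses the rounded Sum of Squares values printed above, so the results match the anova() output only approximately. All variable names are hypothetical.

```r
# Sum of Squares and residual terms copied from the ANOVA table above.
sum_sq <- c(seeding = 1.283, cloudcover = 15.738, prewetness = 0.003,
            echomotion = 29.985, sne = 28.649)
resid_ss <- 146.677
resid_df <- 18

# Each term has 1 df, so its mean square equals its Sum of Squares.
f_value <- sum_sq / (resid_ss / resid_df)
p_value <- pf(f_value, df1 = 1, df2 = resid_df, lower.tail = FALSE)

# Largest Sum of Squares first, and smallest p-value first.
rank_ss <- names(sort(sum_sq, decreasing = TRUE))
rank_p  <- names(sort(p_value))
print(rank_ss)
print(rank_p)

# Multiple R-squared as the explained share of the total Sum of Squares.
r2_model <- sum(sum_sq) / (sum(sum_sq) + resid_ss)
print(round(r2_model, 4))
```

Because anova() reports sequential sums of squares for an lm model, these values depend on the order in which the terms enter the formula; a different term order could produce a different ranking.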

Across both the summary and the ANOVA results, two variables stand out as the strongest candidates: echomotion and sne. Neither reaches the 0.05 significance level, but both have ANOVA p-values below 0.1. Although cloudcover has a moderately high Sum of Squares value (15.738), its summary p-value is very large (0.8761), so it cannot be considered influential.

Echomotion stands out because of the combination of its relatively low (although not significant) summary p-value (0.1090) and its very high Sum of Squares value (29.985), which suggests that it explains a comparatively large share of the variance.

Sne stands out because of the combination of its low, marginally significant summary p-value (0.0771) and its very high Sum of Squares value (28.649), which also suggests that it explains a comparatively large share of the variance.

Suitability Index (sne) Effect on Rainfall

The suitability index (sne) variable’s relationship with rainfall was more closely examined through two new regression models: one for non-seeded clouds, and one for seeded clouds.

Non-Seeded Model

lr_model_sne_seeding_no = lm(rainfall~sne, data = seeding_no)

Seeded Model

lr_model_sne_seeding_yes = lm(rainfall~sne, data = seeding_yes)

The summaries were examined and an ANOVA test was performed for each model.

Non-Seeded Model: Summary and ANOVA test

summary(lr_model_sne_seeding_no)


Call:
lm(formula = rainfall ~ sne, data = seeding_no)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4892 -2.1762  0.2958  1.4902  7.3616 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    7.319      3.160   2.317    0.043 *
sne           -1.046      0.995  -1.052    0.318  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.502 on 10 degrees of freedom
Multiple R-squared:  0.09957,   Adjusted R-squared:  0.009528 
F-statistic: 1.106 on 1 and 10 DF,  p-value: 0.3177

anova(lr_model_sne_seeding_no)

Analysis of Variance Table

Response: rainfall
          Df  Sum Sq Mean Sq F value Pr(>F)
sne        1  13.565  13.565  1.1058 0.3177
Residuals 10 122.667  12.267

Seeded Model: Summary and ANOVA test

summary(lr_model_sne_seeding_yes)


Call:
lm(formula = rainfall ~ sne, data = seeding_yes)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0134 -1.3297 -0.3276  0.6171  4.3867 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  12.0202     2.9774   4.037  0.00237 **
sne          -2.2180     0.8722  -2.543  0.02921 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.27 on 10 degrees of freedom
Multiple R-squared:  0.3927,    Adjusted R-squared:  0.332 
F-statistic: 6.467 on 1 and 10 DF,  p-value: 0.02921

anova(lr_model_sne_seeding_yes)

Analysis of Variance Table

Response: rainfall
          Df Sum Sq Mean Sq F value  Pr(>F)  
sne        1 33.310  33.310  6.4669 0.02921 *
Residuals 10 51.509   5.151                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpreting the Models

The sne coefficient (slope) for the seeded model is -2.218, while the coefficient for the non-seeded model is -1.046. This suggests that sne has a stronger effect on rainfall under seeded conditions.

The p-value for the seeded model is 0.029, which is statistically significant. The p-value for the non-seeded model is 0.318, which is not significant.

The R² value for the seeded model is 0.393, which suggests that sne accounts for 39.3 percent of the variance in rainfall under seeded conditions. The R² value for the non-seeded model is 0.100, which suggests that sne accounts for only 10 percent of the variance in rainfall under non-seeded conditions.
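The R² values quoted above follow directly from the Sum of Squares columns of the two ANOVA tables. The sketch below is a minimal check using the rounded table values; the names ss and r2 are hypothetical.

```r
# Sum of Squares from the two per-group ANOVA tables above.
ss <- rbind(
  non_seeded = c(sne = 13.565, residual = 122.667),
  seeded     = c(sne = 33.310, residual = 51.509)
)

# R-squared: share of the total Sum of Squares attributed to sne.
r2 <- ss[, "sne"] / rowSums(ss)
print(round(r2, 3))
```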

These results suggest that high sne values may inhibit or disrupt a storm’s ability to produce rainfall in seeded conditions.

A figure showing the two models was produced.

ggplot(clouds, aes(x=sne, y=rainfall, color=seeding)) + 
  geom_point() + 
  geom_smooth(method=lm, se=FALSE, linewidth=1) + 
  scale_color_manual(values = c("no"="deepskyblue3", "yes"="deeppink3")) + 
  scale_x_continuous(breaks = seq(0,5,by=.5)) + 
  scale_y_continuous(breaks = seq(0,15,by=1)) +
  labs(title="Rainfall by Suitability Index (sne) for Seeded and Non-Seeded Clouds",
       x="Suitability Index (sne)",
       y="Rainfall (cubic meters times 1e+8)",
       colour="Seeding")

`geom_smooth()` using formula = 'y ~ x'

Comparing Slopes

The difference in slope between seeded and non-seeded conditions suggests that higher Suitability Index values have a more pronounced negative association with rainfall in seeded conditions. If this pattern holds, cloud seeding could be counterproductive in conditions where sne values are high.
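A rough way to gauge how firmly the two slopes differ is to compare their difference with its standard error. The sketch below is an approximation, not the analysis performed in this report: it uses the rounded slopes and standard errors from the two summaries and assumes 20 degrees of freedom (the sum of the two residual df). All variable names are hypothetical.

```r
# Slopes and standard errors from the two per-group summaries above.
b_no  <- -1.046; se_no  <- 0.995
b_yes <- -2.218; se_yes <- 0.8722

# The two fits use disjoint rows, so the variances of the slopes add.
diff_slope <- b_yes - b_no
se_diff    <- sqrt(se_no^2 + se_yes^2)
t_approx   <- diff_slope / se_diff

# Assumption: 20 df, the sum of the two residual df (10 + 10).
p_approx <- 2 * pt(-abs(t_approx), df = 20)
print(round(c(difference = diff_slope, se = se_diff,
              t = t_approx, p = p_approx), 3))
```

Under these assumptions the slope difference of about -1.17 has a standard error of about 1.32, so the contrast between the two slopes is itself uncertain with 12 observations per group. Fitting a single model with an sne-by-seeding interaction, for example lm(rainfall ~ sne * seeding, data = clouds), would be a more standard way to test it.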