6680 Final Project

Author

Ethan Schatz

This document describes the process and results of my final project for Geography 6680. The project examines whether seeding clouds with silver iodide increases rainfall amounts.


Load the Data

Here we load the required libraries and data files for this project.

library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
clouds <- read.csv("E:/Summer 2026/data/clouds.csv")
clouds <- na.omit(clouds)

Seeding vs. Non-seeding (t-test)

The goal of this project is to test the effects of cloud seeding, so the natural first step is a simple two-sample t-test of whether seeded and non-seeded experiments differ in mean rainfall.

Null Hypothesis

There isn’t a significant difference in rainfall between seeding and non-seeding experiments.

Alternate Hypothesis

There is a significant difference in rainfall between seeding and non-seeding experiments.

Building the model

Statistical Summary

clouds %>%
  group_by(seeding) %>%
  summarise(
    count = n(),
    mean_rainfall = mean(rainfall, na.rm = TRUE),
    sd_rainfall = sd(rainfall, na.rm = TRUE),
    median_rainfall = median(rainfall, na.rm = TRUE),
    min_rainfall = min(rainfall, na.rm = TRUE),
    max_rainfall = max(rainfall, na.rm = TRUE)
  )
# A tibble: 2 × 7
  seeding count mean_rainfall sd_rainfall median_rainfall min_rainfall
  <chr>   <int>         <dbl>       <dbl>           <dbl>        <dbl>
1 no         12          4.17        3.52            4.06         0.28
2 yes        12          4.63        2.78            4.53         1.09
# ℹ 1 more variable: max_rainfall <dbl>

This table shows summary statistics for rainfall, grouped by whether the experiment was seeded.

Visualization

ggplot(clouds, aes(x = seeding, y = rainfall, fill = seeding)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) + 
  geom_jitter(width = 0.2, alpha = 0.5) +         
  labs(
    title = "Comparison of Rainfall between Seeding and Non-Seeding Experiments",
    x = "Seeding?",
    y = "Rainfall"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

This graph visualizes the rainfall distributions summarized above.

t-test

clouds_t_test <- t.test(rainfall ~ seeding, data = clouds)
print(clouds_t_test)

    Welch Two Sample t-test

data:  rainfall by seeding
t = -0.3574, df = 20.871, p-value = 0.7244
alternative hypothesis: true difference in means between group no and group yes is not equal to 0
95 percent confidence interval:
 -3.154691  2.229691
sample estimates:
 mean in group no mean in group yes 
         4.171667          4.634167 

The Welch t-test yields a p-value of 0.7244, so we fail to reject the null hypothesis: the data do not show a significant difference in mean rainfall between seeding and non-seeding experiments. The 95% confidence interval for the difference in means (-3.15 to 2.23) includes zero.
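As a sanity check on the output above, the Welch statistic can be rebuilt by hand from the group summaries. This is a minimal sketch, not part of the original analysis: it uses the means printed by t.test() and the standard deviations from the summary table, which are rounded to two decimals, so the results should land close to the printed values rather than match them exactly.

```r
# Group summaries copied from the output above (sd values are rounded)
mean_no  <- 4.171667
mean_yes <- 4.634167
sd_no    <- 3.52
sd_yes   <- 2.78
n_no     <- 12
n_yes    <- 12

# Per-group variance of the mean
v_no  <- sd_no^2 / n_no
v_yes <- sd_yes^2 / n_yes

# Welch t statistic: difference in means over its unpooled standard error
t_stat <- (mean_no - mean_yes) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v_no + v_yes)^2 /
  (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

Because the two groups have different standard deviations, the degrees of freedom come out below the 22 that a pooled-variance t-test would use.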

Multiple Linear Regression Model

While the t-test found no significant effect of seeding on rainfall, it’s plausible that environmental factors mask or drive rainfall totals. This section builds a multiple linear regression model to isolate seeding’s impact while controlling for key variables such as cloud cover, pre-wetness, echo motion, and suitability (sne).

Model

clouds_multi <- lm(rainfall ~ seeding + cloudcover + prewetness + echomotion + sne, data = clouds)
summary(clouds_multi)

Call:
lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
    sne, data = clouds)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.1158 -1.7078 -0.2422  1.3368  6.4827 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)  
(Intercept)           6.37680    2.43432   2.620   0.0174 *
seedingyes            1.12011    1.20725   0.928   0.3658  
cloudcover            0.01821    0.11508   0.158   0.8761  
prewetness            2.55109    2.70090   0.945   0.3574  
echomotionstationary  2.59855    1.54090   1.686   0.1090  
sne                  -1.27530    0.68015  -1.875   0.0771 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.855 on 18 degrees of freedom
Multiple R-squared:  0.3403,    Adjusted R-squared:  0.157 
F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524
anova(clouds_multi)
Analysis of Variance Table

Response: rainfall
           Df  Sum Sq Mean Sq F value  Pr(>F)  
seeding     1   1.283  1.2834  0.1575 0.69613  
cloudcover  1  15.738 15.7377  1.9313 0.18157  
prewetness  1   0.003  0.0027  0.0003 0.98557  
echomotion  1  29.985 29.9853  3.6798 0.07108 .
sne         1  28.649 28.6491  3.5158 0.07711 .
Residuals  18 146.677  8.1487                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The resulting model suggests that none of the predictors is statistically significant at the 0.05 level. Only sne (suitability) comes close (p = 0.0771). The model also fits poorly: it explains only 34% of the variance in rainfall (adjusted R-squared 0.157), and the overall F-test is not significant (p = 0.1524). In the ANOVA table, echomotion and sne have the largest sums of squares, 29.985 and 28.649, which is about 39.6% and 37.9% of the variation explained by the model (13.5% and 12.9% of the total variation in rainfall). Because anova() reports sequential sums of squares, these shares depend on the order in which the terms enter the model. Based on these results, suitability appears to be the most promising variable: it is closest to statistical significance in the coefficient table and has the second-largest sum of squares.
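The percentages quoted for the ANOVA table can be reproduced directly from the printed sums of squares. The sketch below copies those values by hand (the variable names are illustrative, not from the original code) and shows each term's share of the explained variation alongside its share of the total variation.

```r
# Sequential sums of squares copied from the anova() table above
ss <- c(seeding = 1.283, cloudcover = 15.738, prewetness = 0.003,
        echomotion = 29.985, sne = 28.649)
ss_resid <- 146.677

ss_model <- sum(ss)              # variation explained by the model
ss_total <- ss_model + ss_resid  # total variation in rainfall

# Share of the explained variation vs. share of the total variation
share_of_model <- round(100 * ss / ss_model, 1)
share_of_total <- round(100 * ss / ss_total, 1)

# R-squared is the explained share of the total
r_squared <- ss_model / ss_total

print(rbind(share_of_model, share_of_total))
print(round(r_squared, 4))
```

The explained share of the total agrees with the Multiple R-squared reported by summary(), which is a useful check that the values were copied correctly.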

Suitability Models

Since sne (suitability) appears to be the most promising predictor in the data set, it is worth examining how rainfall responds to it under different conditions. This section builds two models of rainfall against sne: one for seeded experiments and one for non-seeded experiments.

Cloud Seeding Model

seeding_data <- clouds %>% filter(seeding == "yes")
model_seeding <- lm(rainfall ~ sne, data = seeding_data)
summary(model_seeding)

Call:
lm(formula = rainfall ~ sne, data = seeding_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0134 -1.3297 -0.3276  0.6171  4.3867 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  12.0202     2.9774   4.037  0.00237 **
sne          -2.2180     0.8722  -2.543  0.02921 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.27 on 10 degrees of freedom
Multiple R-squared:  0.3927,    Adjusted R-squared:  0.332 
F-statistic: 6.467 on 1 and 10 DF,  p-value: 0.02921

Non-cloud Seeding Model

non_seeding_data <- clouds %>% filter(seeding == "no")
model_non_seeding <- lm(rainfall ~ sne, data = non_seeding_data)
summary(model_non_seeding)

Call:
lm(formula = rainfall ~ sne, data = non_seeding_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4892 -2.1762  0.2958  1.4902  7.3616 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    7.319      3.160   2.317    0.043 *
sne           -1.046      0.995  -1.052    0.318  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.502 on 10 degrees of freedom
Multiple R-squared:  0.09957,   Adjusted R-squared:  0.009528 
F-statistic: 1.106 on 1 and 10 DF,  p-value: 0.3177

Plot

ggplot(clouds, aes(x = sne, y = rainfall, color = seeding)) +
  geom_point(size = 3, alpha = 0.7) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.2) +
  labs(
    title = "Effect of Suitability on Rainfall by Seeding Status",
    x = "Suitability Criterion (sne)",
    y = "Rainfall",
    color = "Seeding Occurred?"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Both fitted lines slope downward: the higher the sne (suitability) score, the less rainfall is expected. The slope is steeper for seeded experiments (-2.22, p = 0.029) than for non-seeded ones (-1.05, p = 0.318), where it is not statistically significant. This suggests that cloud seeding may amplify the negative relationship between rainfall and suitability, although the two separate models do not formally test whether the slopes differ.
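One way to test whether the two slopes actually differ is a single model with an interaction between sne and seeding. This is a sketch of that idea, not something the original analysis ran: fit_slope_interaction is a hypothetical helper name, and it assumes a data frame with the same rainfall, sne, and seeding columns as clouds, with seeding coded "no"/"yes".

```r
# Hypothetical helper (not part of the original analysis):
# fits one model with a separate intercept and sne slope per seeding group.
fit_slope_interaction <- function(data) {
  lm(rainfall ~ sne * seeding, data = data)
}

# Usage, assuming `clouds` has been loaded as in the "Load the Data" section:
# interaction_fit <- fit_slope_interaction(clouds)
# summary(interaction_fit)  # inspect the sne:seedingyes row
```

In such a model the sne:seedingyes estimate equals the difference between the two slopes fitted separately above (about -2.22 minus -1.05, or roughly -1.17). Its standard error and p-value would indicate whether that difference is distinguishable from zero; those values are not reported here because the model was not run.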

Conclusion

These results show no statistically significant overall effect of cloud seeding on rainfall totals. Neither the t-test nor the multiple regression found a significant seeding effect, whereas suitability (sne) showed a stronger relationship with rainfall, and that relationship was significant only in the seeded experiments.

DISCLAIMER:

ChatGPT and Gemini were used during the process to write the code and help interpret results.