View the Interactive App

You can access the interactive version of this project here:
https://seanplawler.shinyapps.io/cloudseedingapp/


Introduction

This project analyzes data from a 1975 cloud seeding experiment that tested whether introducing large quantities of silver iodide into clouds could increase rainfall. The dataset has 24 observations and records whether seeding took place, cloud characteristics, pre-seeding rainfall (prewetness), and the rainfall that followed. My goal was to determine whether seeding has a statistically detectable effect on rainfall and how that effect varies with other atmospheric variables such as cloud cover and a suitability index (SNE).

Cloud Seeding Analysis

# Load Required Libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggpubr)

# Load and Inspect the Data
clouds <- read.csv("clouds.csv")
str(clouds)
## 'data.frame':    24 obs. of  8 variables:
##  $ X         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ seeding   : chr  "no" "yes" "yes" "no" ...
##  $ time      : int  0 1 3 4 6 9 18 25 27 28 ...
##  $ sne       : num  1.75 2.7 4.1 2.35 4.25 1.6 1.3 3.35 2.85 2.2 ...
##  $ cloudcover: num  13.4 37.9 3.9 5.3 7.1 6.9 4.6 4.9 12.1 5.2 ...
##  $ prewetness: num  0.274 1.267 0.198 0.526 0.25 ...
##  $ echomotion: chr  "stationary" "moving" "stationary" "moving" ...
##  $ rainfall  : num  12.85 5.52 6.29 6.11 2.45 ...
summary(clouds)
##        X           seeding               time            sne       
##  Min.   : 1.00   Length:24          Min.   : 0.00   Min.   :1.300  
##  1st Qu.: 6.75   Class :character   1st Qu.:15.75   1st Qu.:2.612  
##  Median :12.50   Mode  :character   Median :32.50   Median :3.250  
##  Mean   :12.50                      Mean   :35.33   Mean   :3.169  
##  3rd Qu.:18.25                      3rd Qu.:55.25   3rd Qu.:3.962  
##  Max.   :24.00                      Max.   :83.00   Max.   :4.650  
##    cloudcover       prewetness      echomotion           rainfall     
##  Min.   : 2.200   Min.   :0.0180   Length:24          Min.   : 0.280  
##  1st Qu.: 3.750   1st Qu.:0.1405   Class :character   1st Qu.: 2.342  
##  Median : 5.250   Median :0.2220   Mode  :character   Median : 4.335  
##  Mean   : 7.246   Mean   :0.3271                      Mean   : 4.403  
##  3rd Qu.: 7.175   3rd Qu.:0.3297                      3rd Qu.: 5.575  
##  Max.   :37.900   Max.   :1.2670                      Max.   :12.850
# Compare Rainfall by Seeding Status
rainfall_summary <- clouds %>%
  group_by(seeding) %>%
  summarise(
    mean_rainfall = mean(rainfall, na.rm = TRUE),
    sd_rainfall = sd(rainfall, na.rm = TRUE),
    n = n()
  )
print(rainfall_summary)
## # A tibble: 2 × 4
##   seeding mean_rainfall sd_rainfall     n
##   <chr>           <dbl>       <dbl> <int>
## 1 no               4.17        3.52    12
## 2 yes              4.63        2.78    12
# Boxplot
p1 <- ggboxplot(clouds, x = "seeding", y = "rainfall",
                color = "seeding", palette = "jco",
                add = "jitter") +
       theme_minimal()
print(p1)

# Welch two-sample t-test (t.test() does not assume equal variances by default)
ttest <- t.test(rainfall ~ seeding, data = clouds)
print(ttest)
## 
##  Welch Two Sample t-test
## 
## data:  rainfall by seeding
## t = -0.3574, df = 20.871, p-value = 0.7244
## alternative hypothesis: true difference in means between group no and group yes is not equal to 0
## 95 percent confidence interval:
##  -3.154691  2.229691
## sample estimates:
##  mean in group no mean in group yes 
##          4.171667          4.634167
# Multiple Linear Regression
model1 <- lm(rainfall ~ seeding + cloudcover + prewetness + echomotion + sne, data = clouds)
print(summary(model1))
## 
## Call:
## lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
##     sne, data = clouds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1158 -1.7078 -0.2422  1.3368  6.4827 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)  
## (Intercept)           6.37680    2.43432   2.620   0.0174 *
## seedingyes            1.12011    1.20725   0.928   0.3658  
## cloudcover            0.01821    0.11508   0.158   0.8761  
## prewetness            2.55109    2.70090   0.945   0.3574  
## echomotionstationary  2.59855    1.54090   1.686   0.1090  
## sne                  -1.27530    0.68015  -1.875   0.0771 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.855 on 18 degrees of freedom
## Multiple R-squared:  0.3403, Adjusted R-squared:  0.157 
## F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524
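# Sequential (Type I) ANOVA: each term is tested after the terms listed before it,
# so the order of predictors in the formula affects these F-tests
print(anova(model1))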
## Analysis of Variance Table
## 
## Response: rainfall
##            Df  Sum Sq Mean Sq F value  Pr(>F)  
## seeding     1   1.283  1.2834  0.1575 0.69613  
## cloudcover  1  15.738 15.7377  1.9313 0.18157  
## prewetness  1   0.003  0.0027  0.0003 0.98557  
## echomotion  1  29.985 29.9853  3.6798 0.07108 .
## sne         1  28.649 28.6491  3.5158 0.07711 .
## Residuals  18 146.677  8.1487                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Seeding-specific models
seed_yes <- filter(clouds, seeding == "yes")
seed_no  <- filter(clouds, seeding == "no")

model_yes <- lm(rainfall ~ sne, data = seed_yes)
model_no  <- lm(rainfall ~ sne, data = seed_no)

print(summary(model_yes))
## 
## Call:
## lm(formula = rainfall ~ sne, data = seed_yes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0134 -1.3297 -0.3276  0.6171  4.3867 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  12.0202     2.9774   4.037  0.00237 **
## sne          -2.2180     0.8722  -2.543  0.02921 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.27 on 10 degrees of freedom
## Multiple R-squared:  0.3927, Adjusted R-squared:  0.332 
## F-statistic: 6.467 on 1 and 10 DF,  p-value: 0.02921
print(summary(model_no))
## 
## Call:
## lm(formula = rainfall ~ sne, data = seed_no)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.4892 -2.1762  0.2958  1.4902  7.3616 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    7.319      3.160   2.317    0.043 *
## sne           -1.046      0.995  -1.052    0.318  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.502 on 10 degrees of freedom
## Multiple R-squared:  0.09957,    Adjusted R-squared:  0.009528 
## F-statistic: 1.106 on 1 and 10 DF,  p-value: 0.3177
# Combined scatter + regression lines
p2 <- ggplot(clouds, aes(x = sne, y = rainfall, color = seeding)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  labs(title = "Rainfall vs SNE by Seeding",
       x = "Suitability Index (SNE)",
       y = "Rainfall (×10⁸ m³)")
print(p2)
## `geom_smooth()` using formula = 'y ~ x'

Interpretation and Conclusion

Rainfall by Seeding

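The average rainfall was slightly higher for seeded clouds (4.63) than for non-seeded ones (4.17). However, a Welch two-sample t-test showed no statistically significant difference (p ≈ 0.72), and the 95% confidence interval for the difference in means (non-seeded minus seeded), -3.15 to 2.23, comfortably includes zero. With only 12 observations per group, the data neither demonstrate nor rule out a seeding effect of a few units in either direction.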
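
To make the t-test result easier to check, the sketch below recomputes the Welch statistic from the group summaries printed earlier. The variable names are my own, and because the standard deviations are the rounded values from the summary table, the results should only be expected to agree with `t.test()` to about two or three decimal places.

```r
# Group summaries copied from the output above (SDs are rounded to 2 decimals)
mean_no  <- 4.171667; sd_no  <- 3.52; n_no  <- 12
mean_yes <- 4.634167; sd_yes <- 2.78; n_yes <- 12

# Variance of each group mean
v_no  <- sd_no^2  / n_no
v_yes <- sd_yes^2 / n_yes

# Welch t statistic: difference in means divided by its standard error
t_stat <- (mean_no - mean_yes) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_no + v_yes)^2 / (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(round(c(t = t_stat, df = df_welch, p = p_value), 3))
```

The difference in means (about 0.46) is much smaller than its standard error (about 1.29), which is why the test is nowhere near significance.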

Multiple Regression

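The multiple regression model explained about 34% of the variance in rainfall (R² = 0.34), but the adjusted R² was only 0.16 and the overall F-test was not significant (p ≈ 0.15). Only the intercept was significant at the 5% level. The seeding coefficient was positive (about 1.12) but far from significant (p ≈ 0.37). SNE (suitability index) was marginally significant (p ≈ 0.077) with a negative coefficient, indicating a possible relationship worth further exploration.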
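
As a quick consistency check on the regression summary, the adjusted R² and the overall F-statistic can be recovered from R², the sample size, and the number of predictors. This is a minimal sketch using the values printed by `summary(model1)`; the variable names are illustrative.

```r
# Values copied from summary(model1) above
r2 <- 0.3403   # multiple R-squared
n  <- 24       # number of observations
k  <- 5        # number of predictors, excluding the intercept

# Residual degrees of freedom
df_resid <- n - k - 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
p_f <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

print(round(c(adj_r2 = adj_r2, F = f_stat, p = p_f), 4))
```

With five predictors and only 24 observations, the penalty is large: more than half of the raw R² disappears after adjustment.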

Stratified Models by Seeding Status

When I modeled rainfall against SNE for seeded and non-seeded clouds separately:

  • For seeded clouds, the model was significant (p ≈ 0.03, R² ≈ 0.39) with a negative slope of about -2.22: higher SNE was associated with less rainfall.
  • For non-seeded clouds, the slope was also negative (about -1.05) but not statistically significant (p ≈ 0.32, R² ≈ 0.10).

This was surprising, since SNE is meant to measure how “suitable” a cloud is for seeding, yet more suitable clouds produced less rainfall when seeded.
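
One caveat is that a significant slope in one group and a non-significant slope in the other does not by itself show that the two slopes differ. The sketch below is a rough check of that difference using only the coefficients and standard errors reported above. It is an approximation, not the analysis behind the results in this project: the reference distribution (a t with 10 + 10 degrees of freedom) is an assumption, and the variable names are my own.

```r
# Slopes and standard errors copied from summary(model_yes) and summary(model_no)
b_yes <- -2.2180; se_yes <- 0.8722
b_no  <- -1.046;  se_no  <- 0.995

# Difference between the two slopes
slope_diff <- b_yes - b_no

# The two models are fit to disjoint subsets, so the variances of the slopes add
se_diff <- sqrt(se_yes^2 + se_no^2)

# Approximate t statistic for "the slopes are equal"
t_diff <- slope_diff / se_diff

# Assumption: summed residual degrees of freedom as a rough reference distribution
df_diff <- 10 + 10
p_diff <- 2 * pt(-abs(t_diff), df = df_diff)

print(round(c(diff = slope_diff, se = se_diff, t = t_diff, p = p_diff), 3))
```

On these numbers the difference between the slopes is smaller than its own standard error, so the stratified fits alone do not establish that SNE behaves differently under seeding. With the full data, a more direct check would be the `seeding:sne` term in a model such as `lm(rainfall ~ seeding * sne, data = clouds)`.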

Visualization

The scatterplot with per-group regression lines shows the same pattern: a clear negative trend in seeded clouds, and only a weak, non-significant negative trend in non-seeded clouds.
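
To read the two regression lines more concretely, the sketch below computes where the fitted lines cross, using the coefficients from the stratified models above. The variable names and the grid of SNE values are illustrative choices, and the results are point estimates from two small samples.

```r
# Intercepts and slopes copied from summary(model_yes) and summary(model_no)
a_yes <- 12.0202; b_yes <- -2.2180
a_no  <- 7.319;   b_no  <- -1.046

# SNE value at which the two fitted lines intersect:
# a_yes + b_yes * x = a_no + b_no * x
sne_cross <- (a_yes - a_no) / (b_no - b_yes)

# Fitted seeded-minus-unseeded rainfall at a few SNE values
# inside the observed range (1.3 to 4.65)
sne_grid <- c(1.5, 3, 4.5)
fitted_gap <- (a_yes + b_yes * sne_grid) - (a_no + b_no * sne_grid)

print(round(sne_cross, 2))
print(round(setNames(fitted_gap, sne_grid), 2))
```

Under these fits the lines cross at an SNE of about 4, so the seeded line sits above the unseeded one over most of the observed range and dips below it only at the highest SNE values.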

Conclusion

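While simple comparisons didn’t show a clear effect of seeding on rainfall, the stratified regressions suggest that the suitability index (SNE) may modify seeding effectiveness. That suggestion is tentative: the separate models do not formally test whether the two slopes differ, and the sample is small (24 observations). Even so, this project shows how deeper statistical modeling can surface patterns that aren’t obvious from simple group comparisons.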