You can access the interactive version of this project here:
https://seanplawler.shinyapps.io/cloudseedingapp/
This project analyzes data from a 1975 cloud seeding experiment that tested whether introducing large quantities of silver iodide into clouds could increase rainfall. The dataset includes information about seeding, cloud characteristics, pre-seeding rainfall, and actual rainfall. My goal was to determine whether seeding significantly impacts rainfall and how that effect varies with other atmospheric variables like cloud cover and a suitability index (SNE).
# Load Required Libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggpubr)
# Load and Inspect the Data
clouds <- read.csv("clouds.csv")
str(clouds)
## 'data.frame': 24 obs. of 8 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ seeding : chr "no" "yes" "yes" "no" ...
## $ time : int 0 1 3 4 6 9 18 25 27 28 ...
## $ sne : num 1.75 2.7 4.1 2.35 4.25 1.6 1.3 3.35 2.85 2.2 ...
## $ cloudcover: num 13.4 37.9 3.9 5.3 7.1 6.9 4.6 4.9 12.1 5.2 ...
## $ prewetness: num 0.274 1.267 0.198 0.526 0.25 ...
## $ echomotion: chr "stationary" "moving" "stationary" "moving" ...
## $ rainfall : num 12.85 5.52 6.29 6.11 2.45 ...
summary(clouds)
## X seeding time sne
## Min. : 1.00 Length:24 Min. : 0.00 Min. :1.300
## 1st Qu.: 6.75 Class :character 1st Qu.:15.75 1st Qu.:2.612
## Median :12.50 Mode :character Median :32.50 Median :3.250
## Mean :12.50 Mean :35.33 Mean :3.169
## 3rd Qu.:18.25 3rd Qu.:55.25 3rd Qu.:3.962
## Max. :24.00 Max. :83.00 Max. :4.650
## cloudcover prewetness echomotion rainfall
## Min. : 2.200 Min. :0.0180 Length:24 Min. : 0.280
## 1st Qu.: 3.750 1st Qu.:0.1405 Class :character 1st Qu.: 2.342
## Median : 5.250 Median :0.2220 Mode :character Median : 4.335
## Mean : 7.246 Mean :0.3271 Mean : 4.403
## 3rd Qu.: 7.175 3rd Qu.:0.3297 3rd Qu.: 5.575
## Max. :37.900 Max. :1.2670 Max. :12.850
# Compare Rainfall by Seeding Status
rainfall_summary <- clouds %>%
  group_by(seeding) %>%
  summarise(
    mean_rainfall = mean(rainfall, na.rm = TRUE),
    sd_rainfall = sd(rainfall, na.rm = TRUE),
    n = n()
  )
print(rainfall_summary)
## # A tibble: 2 × 4
## seeding mean_rainfall sd_rainfall n
## <chr> <dbl> <dbl> <int>
## 1 no 4.17 3.52 12
## 2 yes 4.63 2.78 12
# Boxplot
p1 <- ggboxplot(clouds, x = "seeding", y = "rainfall",
                color = "seeding", palette = "jco",
                add = "jitter") +
  theme_minimal()
print(p1)
# T-test
ttest <- t.test(rainfall ~ seeding, data = clouds)
print(ttest)
##
## Welch Two Sample t-test
##
## data: rainfall by seeding
## t = -0.3574, df = 20.871, p-value = 0.7244
## alternative hypothesis: true difference in means between group no and group yes is not equal to 0
## 95 percent confidence interval:
## -3.154691 2.229691
## sample estimates:
## mean in group no mean in group yes
## 4.171667 4.634167
# Multiple Linear Regression
model1 <- lm(rainfall ~ seeding + cloudcover + prewetness + echomotion + sne, data = clouds)
print(summary(model1))
##
## Call:
## lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion +
## sne, data = clouds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1158 -1.7078 -0.2422 1.3368 6.4827
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.37680 2.43432 2.620 0.0174 *
## seedingyes 1.12011 1.20725 0.928 0.3658
## cloudcover 0.01821 0.11508 0.158 0.8761
## prewetness 2.55109 2.70090 0.945 0.3574
## echomotionstationary 2.59855 1.54090 1.686 0.1090
## sne -1.27530 0.68015 -1.875 0.0771 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.855 on 18 degrees of freedom
## Multiple R-squared: 0.3403, Adjusted R-squared: 0.157
## F-statistic: 1.857 on 5 and 18 DF, p-value: 0.1524
print(anova(model1))
## Analysis of Variance Table
##
## Response: rainfall
## Df Sum Sq Mean Sq F value Pr(>F)
## seeding 1 1.283 1.2834 0.1575 0.69613
## cloudcover 1 15.738 15.7377 1.9313 0.18157
## prewetness 1 0.003 0.0027 0.0003 0.98557
## echomotion 1 29.985 29.9853 3.6798 0.07108 .
## sne 1 28.649 28.6491 3.5158 0.07711 .
## Residuals 18 146.677 8.1487
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Seeding-specific models
seed_yes <- filter(clouds, seeding == "yes")
seed_no <- filter(clouds, seeding == "no")
model_yes <- lm(rainfall ~ sne, data = seed_yes)
model_no <- lm(rainfall ~ sne, data = seed_no)
print(summary(model_yes))
##
## Call:
## lm(formula = rainfall ~ sne, data = seed_yes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0134 -1.3297 -0.3276 0.6171 4.3867
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.0202 2.9774 4.037 0.00237 **
## sne -2.2180 0.8722 -2.543 0.02921 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.27 on 10 degrees of freedom
## Multiple R-squared: 0.3927, Adjusted R-squared: 0.332
## F-statistic: 6.467 on 1 and 10 DF, p-value: 0.02921
print(summary(model_no))
##
## Call:
## lm(formula = rainfall ~ sne, data = seed_no)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4892 -2.1762 0.2958 1.4902 7.3616
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.319 3.160 2.317 0.043 *
## sne -1.046 0.995 -1.052 0.318
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.502 on 10 degrees of freedom
## Multiple R-squared: 0.09957, Adjusted R-squared: 0.009528
## F-statistic: 1.106 on 1 and 10 DF, p-value: 0.3177
# Combined scatter + regression lines
p2 <- ggplot(clouds, aes(x = sne, y = rainfall, color = seeding)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  labs(title = "Rainfall vs SNE by Seeding",
       x = "Suitability Index (SNE)",
       y = "Rainfall (×10⁸ m³)")
print(p2)
## `geom_smooth()` using formula = 'y ~ x'
The average rainfall was slightly higher for seeded clouds (4.63) than for non-seeded ones (4.17). However, a Welch t-test showed no statistically significant difference (p ≈ 0.72; 95% confidence interval for the no-minus-yes difference from −3.15 to 2.23), so seeding alone did not detectably increase rainfall totals.
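As a sanity check on the t-test, the Welch statistic can be recomputed by hand from the group summaries printed earlier. This is a minimal sketch that uses only the rounded means, standard deviations, and group sizes from the output, so the results should agree with `t.test()` only up to rounding. The variable names are illustrative.

```r
# Group summaries copied (rounded) from the printed output above.
mean_no  <- 4.171667; sd_no  <- 3.52; n_no  <- 12
mean_yes <- 4.634167; sd_yes <- 2.78; n_yes <- 12

# Squared standard error contributed by each group.
v_no  <- sd_no^2  / n_no
v_yes <- sd_yes^2 / n_yes

# Welch t statistic (no minus yes, matching the order used by t.test()).
t_welch <- (mean_no - mean_yes) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v_no + v_yes)^2 /
  (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value.
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

print(c(t = t_welch, df = df_welch, p = p_welch))
```

The values should land close to the reported t = -0.3574, df = 20.871, and p = 0.7244. Small differences come from the rounded standard deviations.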
The multiple regression model explained about 34% of the variance in rainfall (R² = 0.34), but the adjusted R² was only 0.16 and the overall F-test was not significant (p ≈ 0.15). None of the predictors was significant at the 0.05 level; only the intercept was. SNE (suitability index) came closest (p ≈ 0.077), indicating a possible relationship worth further exploration.
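The gap between R² and adjusted R² matters with only 24 clouds and 5 predictors. The sketch below re-derives the adjusted R² and the overall F-test from the reported R² alone, using the standard formulas. It takes the rounded R² = 0.3403 from the printed summary as input, so the last digit may differ slightly.

```r
# Values taken from the summary(model1) output above.
r2 <- 0.3403   # multiple R-squared (rounded)
n  <- 24       # observations
p  <- 5        # predictors, excluding the intercept

# Adjusted R-squared penalises the fit for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F statistic and its p-value on (p, n - p - 1) degrees of freedom.
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

print(c(adj_r2 = adj_r2, F = f_stat, p = p_val))
```

These should reproduce, up to rounding, the adjusted R² of 0.157, the F-statistic of 1.857, and the p-value of 0.1524 shown in the model summary.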
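When I modeled rainfall against SNE for seeded and non-seeded clouds separately, the seeded model showed a significant negative slope (−2.22, p ≈ 0.029, R² ≈ 0.39), while the non-seeded model showed a weaker, non-significant negative slope (−1.05, p ≈ 0.318, R² ≈ 0.10).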
This was surprising, since SNE is meant to measure how “suitable” a cloud is for seeding, yet more suitable clouds produced less rainfall when seeded.
The scatterplot shows the same pattern: the fitted line falls steeply for seeded clouds and more gently for non-seeded clouds, where the slope was not statistically significant.
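One slope being significant and the other not does not by itself show that the two slopes differ. A rough check, using only the coefficients and standard errors printed in the two model summaries, is to divide the difference in slopes by its combined standard error. This is an approximation made for illustration: it treats the two fits as independent and uses the summed residual degrees of freedom (10 + 10) as the reference distribution.

```r
# Slopes and standard errors copied from summary(model_yes) and summary(model_no).
b_yes <- -2.2180; se_yes <- 0.8722   # seeded clouds
b_no  <- -1.046;  se_no  <- 0.995    # non-seeded clouds

# Difference in slopes and its approximate standard error.
diff_slopes <- b_yes - b_no
se_diff     <- sqrt(se_yes^2 + se_no^2)

# Approximate t statistic; df = 10 + 10 is an assumption, not an exact result.
t_diff  <- diff_slopes / se_diff
df_diff <- 10 + 10
p_diff  <- 2 * pt(-abs(t_diff), df = df_diff)

print(c(diff = diff_slopes, se = se_diff, t = t_diff, p = p_diff))
```

The statistic comes out at roughly −0.89, so on this approximate check the difference between the slopes is not itself statistically significant. The apparent contrast between the groups should be read cautiously.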
While simple comparisons didn’t show a clear effect of seeding on rainfall, the separate regressions suggest that the suitability index (SNE) may modify seeding effectiveness, although with only 12 clouds per group the difference between the two slopes was not formally tested. This project shows how regression modeling can surface patterns that aren’t obvious from simple group comparisons.
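A more direct way to ask whether SNE modifies the seeding effect is a single model with a `seeding * sne` interaction, compared against the additive model. The sketch below shows the mechanics only. To stay self-contained it runs on a simulated placeholder data frame (`toy`, with the same column names as `clouds`), not on the experiment's measurements, so its numbers say nothing about cloud seeding. The same two `lm()` calls could be pointed at `clouds` instead.

```r
# Simulated placeholder data: NOT the real experiment, illustration only.
set.seed(1)
toy <- data.frame(
  seeding = rep(c("no", "yes"), each = 12),
  sne     = runif(24, min = 1.3, max = 4.65)
)
toy$rainfall <- 8 - 1.2 * toy$sne + rnorm(24, sd = 2.5)

# Additive model: one common SNE slope, with a shift for seeding.
fit_additive <- lm(rainfall ~ seeding + sne, data = toy)

# Interaction model: the SNE slope is allowed to differ by seeding status.
fit_interaction <- lm(rainfall ~ seeding * sne, data = toy)

# F-test for whether the extra interaction term improves the fit.
cmp <- anova(fit_additive, fit_interaction)
print(cmp)

# The coefficient "seedingyes:sne" estimates the difference in slopes.
print(coef(fit_interaction))
```

The interaction model pools the residual variance across both groups, so its test of the slope difference is generally better suited than comparing two separately fitted models.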