Methods

I imported clouds.csv and inspected its structure and summary statistics. I then computed descriptive statistics and drew boxplots to compare rainfall by seeding status, and ran a two-sample (Welch) t-test for the difference in mean rainfall between seeded and non-seeded clouds. Next, I fitted a multiple linear regression with all predictors and inspected the coefficients and the ANOVA table. I then fitted separate simple regressions of rainfall vs. sne for the seeded and non-seeded subsets. Finally, I plotted the fitted lines to show the relationship between suitability and rainfall by seeding status.

Part 1: Difference in Rainfall Between Seeding and Non-Seeding

Descriptive Statistics

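library(dplyr)    # group_by(), summarize(), filter(), %>%
library(ggplot2)  # boxplot and regression-line plots
clouds <- read.csv("clouds.csv", stringsAsFactors = TRUE)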
stats <- clouds %>%
  group_by(seeding) %>%
  summarize(
    n      = n(),
    mean   = mean(rainfall),
    sd     = sd(rainfall),
    median = median(rainfall),
    IQR    = IQR(rainfall)
  )
stats

## # A tibble: 2 × 6
##   seeding     n  mean    sd median   IQR
##   <fct>   <int> <dbl> <dbl>  <dbl> <dbl>
## 1 no         12  4.17  3.52   4.06  4.76
## 2 yes        12  4.63  2.78   4.53  2.78

Seeded clouds have a slightly higher mean rainfall (4.63 vs. 4.17, a difference of 0.4625 × 10^8 m^3) and lower variability (SD 2.78 vs. 3.52).

Boxplot Visualization

ggplot(clouds, aes(x = seeding, y = rainfall)) +
  geom_boxplot() +
  labs(
    x = "Seeding Status",
    y = "Rainfall (×10^8 m^3)",
    title = "Rainfall by Seeding Status"
  )

The boxplot shows overlapping distributions; the medians are similar (4.06 for non-seeded vs. 4.525 for seeded), and the interquartile range is wider for non-seeded clouds.

Two-Sample t-Test

t_res <- t.test(rainfall ~ seeding, data = clouds)
t_res

## 
##  Welch Two Sample t-test
## 
## data:  rainfall by seeding
## t = -0.3574, df = 20.871, p-value = 0.7244
## alternative hypothesis: true difference in means between group no and group yes is not equal to 0
## 95 percent confidence interval:
##  -3.154691  2.229691
## sample estimates:
##  mean in group no mean in group yes 
##          4.171667          4.634167

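Since p = 0.7244 > 0.05, we fail to reject the null hypothesis of equal means, and the 95% confidence interval for the difference (-3.15 to 2.23) includes 0. There is no significant difference in mean rainfall between seeded and non-seeded clouds.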
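
As a rough cross-check, the Welch statistic can be recomputed from the group summaries printed above. This is a minimal sketch rather than part of the original analysis; the variable names are illustrative, and small differences from the t.test() output come from using SDs rounded to two decimals.

```r
# Group summaries copied from the descriptive table and the t-test output
n_no  <- 12; mean_no  <- 4.171667; sd_no  <- 3.52
n_yes <- 12; mean_yes <- 4.634167; sd_yes <- 2.78

# Per-group variance of the mean (Welch does not pool the variances)
v_no  <- sd_no^2  / n_no
v_yes <- sd_yes^2 / n_yes

# Test statistic: difference in means over its standard error
t_stat <- (mean_no - mean_yes) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_no + v_yes)^2 /
  (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 3)
```

The recomputed values should agree with the reported t, df, and p-value to about two decimal places.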

Part 2: Multiple Linear Regression Model

clouds$echomotion <- relevel(clouds$echomotion, ref = "stationary")
model_all <- lm(rainfall ~ seeding + cloudcover + prewetness + echomotion + sne, data = clouds)
summary(model_all)

## 
## Call:
## lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
##     sne, data = clouds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1158 -1.7078 -0.2422  1.3368  6.4827 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)       8.97535    2.43913   3.680  0.00171 **
## seedingyes        1.12011    1.20725   0.928  0.36578   
## cloudcover        0.01821    0.11508   0.158  0.87606   
## prewetness        2.55109    2.70090   0.945  0.35741   
## echomotionmoving -2.59855    1.54090  -1.686  0.10898   
## sne              -1.27530    0.68015  -1.875  0.07711 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.855 on 18 degrees of freedom
## Multiple R-squared:  0.3403, Adjusted R-squared:  0.157 
## F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524

anova(model_all)

## Analysis of Variance Table
## 
## Response: rainfall
##            Df  Sum Sq Mean Sq F value  Pr(>F)  
## seeding     1   1.283  1.2834  0.1575 0.69613  
## cloudcover  1  15.738 15.7377  1.9313 0.18157  
## prewetness  1   0.003  0.0027  0.0003 0.98557  
## echomotion  1  29.985 29.9853  3.6798 0.07108 .
## sne         1  28.649 28.6491  3.5158 0.07711 .
## Residuals  18 146.677  8.1487                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

None of the predictors reaches significance at α = 0.05. In the sequential ANOVA table, echomotion (F = 3.68, p = 0.07108) and sne (F = 3.52, p = 0.07711) have the largest F-values and account for the most variance in rainfall, but neither is statistically significant. The overall model is not significant either (F = 1.857 on 5 and 18 DF, p = 0.1524).
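
The summary() and anova() outputs are two views of the same fit: R-squared, adjusted R-squared, and the overall F-statistic can be recovered from the sums of squares in the ANOVA table. The sketch below uses only the printed values; the variable names are illustrative.

```r
# Sequential sums of squares copied from the anova(model_all) table
ss_terms <- c(seeding = 1.283, cloudcover = 15.738, prewetness = 0.003,
              echomotion = 29.985, sne = 28.649)
ss_resid <- 146.677
df_model <- length(ss_terms)  # five terms, one degree of freedom each
df_resid <- 18

ss_total <- sum(ss_terms) + ss_resid

# Proportion of total variation explained by the model
r2 <- 1 - ss_resid / ss_total

# Adjusted R-squared penalizes for the number of predictors (n - 1 = 23)
adj_r2 <- 1 - (1 - r2) * (df_model + df_resid) / df_resid

# Overall F: mean square for the model over the residual mean square
f_stat  <- (sum(ss_terms) / df_model) / (ss_resid / df_resid)
p_model <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

round(c(R2 = r2, adj_R2 = adj_r2, F = f_stat, p = p_model), 4)
```

Because anova() reports sequential sums of squares, only the last term (sne) has the same p-value in both tables: its F of 3.5158 is the square of t = -1.875. The echomotion p-values differ (0.07108 vs. 0.10898) because its ANOVA row is not adjusted for sne.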

Part 3: Relationship Between sne and Rainfall by Seeding Status

Stratified Regression Models

m_no  <- lm(rainfall ~ sne, data = filter(clouds, seeding == "no"))
m_yes <- lm(rainfall ~ sne, data = filter(clouds, seeding == "yes"))
coef_no  <- coef(summary(m_no))
coef_yes <- coef(summary(m_yes))
coef_no; coef_yes

##              Estimate Std. Error   t value   Pr(>|t|)
## (Intercept)  7.319500  3.1595671  2.316615 0.04302009
## sne         -1.046371  0.9950467 -1.051580 0.31773971

##              Estimate Std. Error   t value    Pr(>|t|)
## (Intercept) 12.020237   2.977439  4.037106 0.002372239
## sne         -2.218039   0.872211 -2.543008 0.029212123

Only the seeded slope is significant at α = 0.05 (p = 0.029, vs. p = 0.318 for non-seeded). The more negative slope for seeded clouds (-2.218 vs. -1.046) suggests a stronger inverse relationship between suitability and rainfall when seeding is applied.
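
Whether the two slopes differ from each other is a separate question from whether each differs from zero. One rough way to gauge it from the printed coefficient tables is to compare the slope difference with its standard error. The sketch below is an illustrative check, not part of the original analysis: it assumes the two subsets are independent and uses a t distribution with 10 + 10 residual degrees of freedom as an approximation.

```r
# Slopes and standard errors copied from the two coefficient tables
b_no  <- -1.046371; se_no  <- 0.9950467
b_yes <- -2.218039; se_yes <- 0.872211

# Difference in slopes and its standard error (independent subsets assumed)
slope_diff <- b_yes - b_no
se_diff    <- sqrt(se_no^2 + se_yes^2)
t_diff     <- slope_diff / se_diff

# Approximate reference distribution: (12 - 2) + (12 - 2) residual df
df_approx <- (12 - 2) + (12 - 2)
p_diff    <- 2 * pt(-abs(t_diff), df = df_approx)

round(c(diff = slope_diff, se = se_diff, t = t_diff, p = p_diff), 3)
```

With these inputs the statistic is about -0.89, well short of conventional significance, so the steeper seeded slope is best read as suggestive. Fitting `lm(rainfall ~ seeding * sne, data = clouds)` on the full data would test the interaction directly.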

Regression Lines Plot

ggplot(clouds, aes(x = sne, y = rainfall, color = seeding)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "Suitability Index (sne)",
    y = "Rainfall (×10^8 m^3)",
    title = "Rainfall vs. sne by Seeding Status"
  )

## `geom_smooth()` using formula = 'y ~ x'

The regression line for seeded clouds has a steeper negative slope (-2.218) than the line for non-seeded clouds (-1.046): each unit increase in sne is associated with a larger decrease in rainfall when seeding is used.
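
The fitted coefficients also determine where the two lines cross, which marks the range of sne over which seeded clouds are predicted to produce more rain. A small sketch using the printed estimates (the helper predict_line is a hypothetical name introduced here):

```r
# Intercepts and slopes copied from the stratified regression output
a_no  <- 7.319500;  b_no  <- -1.046371
a_yes <- 12.020237; b_yes <- -2.218039

# Predicted rainfall on a fitted line at a given sne value
predict_line <- function(a, b, sne) a + b * sne

# Solve a_no + b_no * x = a_yes + b_yes * x for x
sne_cross  <- (a_yes - a_no) / (b_no - b_yes)
rain_cross <- predict_line(a_no, b_no, sne_cross)

round(c(sne = sne_cross, rainfall = rain_cross), 2)
```

Below the crossing point (sne of roughly 4.0) the seeded line lies above the non-seeded line; above it the order reverses.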

Conclusion

Overall, seeding is not associated with a significant change in mean rainfall, but the relationship between suitability and rainfall appears stronger for seeded clouds.

Cloud Seeding Experiment Analysis

Lucy Engar

2025-06-27