Methods

First, I imported clouds.csv and inspected its structure and summary statistics. I then computed descriptive statistics and drew boxplots to compare rainfall by seeding status, and ran a Welch two-sample t-test to assess the difference in mean rainfall between seeded and non-seeded clouds. Next, I fitted a multiple linear regression with all predictors and inspected the coefficients and ANOVA results. I then fitted separate simple regressions of rainfall vs. sne for the seeded and non-seeded subsets. Finally, I plotted the fitted lines to show the relationship between suitability and rainfall by seeding status.

Part 1: Difference in Rainfall Between Seeding and Non-Seeding

Descriptive Statistics

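library(dplyr)    # %>%, group_by(), summarize(), filter()
library(ggplot2)  # ggplot() for the boxplot and regression-line plot
clouds <- read.csv("clouds.csv", stringsAsFactors = TRUE)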
stats <- clouds %>%
  group_by(seeding) %>%
  summarize(
    n      = n(),
    mean   = mean(rainfall),
    sd     = sd(rainfall),
    median = median(rainfall),
    IQR    = IQR(rainfall)
  )
stats
## # A tibble: 2 × 6
##   seeding     n  mean    sd median   IQR
##   <fct>   <int> <dbl> <dbl>  <dbl> <dbl>
## 1 no         12  4.17  3.52   4.06  4.76
## 2 yes        12  4.63  2.78   4.53  2.78

Seeded clouds have a slightly higher mean rainfall (4.63 vs. 4.17, a difference of 0.4625 × 10^8 m^3) and lower variability (SD 2.78 vs. 3.52).

Boxplot Visualization

ggplot(clouds, aes(x = seeding, y = rainfall)) +
  geom_boxplot() +
  labs(
    x = "Seeding Status",
    y = "Rainfall (×10^8 m^3)",
    title = "Rainfall by Seeding Status"
  )

The boxplot shows overlapping distributions: the medians are similar (4.06 for non-seeded vs. 4.53 for seeded), and the interquartile range is wider for non-seeded clouds (4.76 vs. 2.78).

Two-Sample t-Test

t_res <- t.test(rainfall ~ seeding, data = clouds)
t_res
## 
##  Welch Two Sample t-test
## 
## data:  rainfall by seeding
## t = -0.3574, df = 20.871, p-value = 0.7244
## alternative hypothesis: true difference in means between group no and group yes is not equal to 0
## 95 percent confidence interval:
##  -3.154691  2.229691
## sample estimates:
##  mean in group no mean in group yes 
##          4.171667          4.634167

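Since p = 0.7244 > 0.05 and the 95% confidence interval for the difference in means (-3.15 to 2.23) includes 0, we fail to reject the null hypothesis: the data show no significant difference in mean rainfall between seeded and non-seeded clouds.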
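
As a rough check on the output above, the Welch statistic and its degrees of freedom can be recomputed from the group summaries alone. The sketch below assumes the rounded standard deviations from the descriptive table (3.52 and 2.78), so it should agree with the reported values only to a few decimals; the variable names are illustrative and are not part of the original analysis.

```r
# Welch two-sample t statistic from summary statistics (illustrative names).
# Means come from the t-test output; SDs are the rounded values from the
# descriptive table, so small rounding differences are expected.
n_no  <- 12; mean_no  <- 4.171667; sd_no  <- 3.52
n_yes <- 12; mean_yes <- 4.634167; sd_yes <- 2.78

# Variance of each group mean
v_no  <- sd_no^2  / n_no
v_yes <- sd_yes^2 / n_yes

# t statistic: difference in means divided by its standard error
t_stat <- (mean_no - mean_yes) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_no + v_yes)^2 /
  (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value
p_val <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(t = t_stat, df = df_welch, p = p_val), 3)
```

Because the two groups have equal sizes, the standard error here is the same as in the pooled-variance test; only the degrees of freedom differ.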

Part 2: Multiple Linear Regression Model

clouds$echomotion <- relevel(clouds$echomotion, ref = "stationary")
model_all <- lm(rainfall ~ seeding + cloudcover + prewetness + echomotion + sne, data = clouds)
summary(model_all)
## 
## Call:
## lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
##     sne, data = clouds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1158 -1.7078 -0.2422  1.3368  6.4827 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)       8.97535    2.43913   3.680  0.00171 **
## seedingyes        1.12011    1.20725   0.928  0.36578   
## cloudcover        0.01821    0.11508   0.158  0.87606   
## prewetness        2.55109    2.70090   0.945  0.35741   
## echomotionmoving -2.59855    1.54090  -1.686  0.10898   
## sne              -1.27530    0.68015  -1.875  0.07711 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.855 on 18 degrees of freedom
## Multiple R-squared:  0.3403, Adjusted R-squared:  0.157 
## F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524
anova(model_all)
## Analysis of Variance Table
## 
## Response: rainfall
##            Df  Sum Sq Mean Sq F value  Pr(>F)  
## seeding     1   1.283  1.2834  0.1575 0.69613  
## cloudcover  1  15.738 15.7377  1.9313 0.18157  
## prewetness  1   0.003  0.0027  0.0003 0.98557  
## echomotion  1  29.985 29.9853  3.6798 0.07108 .
## sne         1  28.649 28.6491  3.5158 0.07711 .
## Residuals  18 146.677  8.1487                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

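None of the predictors reaches significance at α = 0.05. In the sequential ANOVA table, echomotion (p = 0.07108) and sne (p = 0.07711) come closest and have the largest F-values, so they account for the most variation in rainfall among the predictors. Because anova() tests terms in the order they enter the model, these F-values depend on that order; in the coefficient table, where each term is adjusted for all the others, echomotion has p = 0.10898. The overall model is not significant either (F = 1.857, p = 0.1524).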
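
The overall fit statistics reported by summary() can be recovered from the ANOVA table, which makes the link between the two outputs explicit. The following is a minimal sketch using the sums of squares copied from the table above (rounded to three decimals); the object names are illustrative.

```r
# Sums of squares copied from the sequential ANOVA table (order of entry)
ss <- c(seeding = 1.283, cloudcover = 15.738, prewetness = 0.003,
        echomotion = 29.985, sne = 28.649)
ss_resid <- 146.677
df_model <- length(ss)  # five terms, one degree of freedom each
df_resid <- 18

ss_total <- sum(ss) + ss_resid

# R-squared, adjusted R-squared, and the overall F test
r2     <- 1 - ss_resid / ss_total
adj_r2 <- 1 - (1 - r2) * (df_model + df_resid) / df_resid
f_all  <- (sum(ss) / df_model) / (ss_resid / df_resid)
p_all  <- pf(f_all, df_model, df_resid, lower.tail = FALSE)

# Share of total variation attributed to each term, given the entry order
share <- ss / ss_total

# sne enters last, so its sequential F equals the square of its t value
t_sne_sq <- (-1.875)^2

round(c(r2 = r2, adj_r2 = adj_r2, F = f_all, p = p_all), 4)
round(share, 3)
```

Only the last term (sne) has the same p-value in both tables; for the earlier terms the sequential F test and the coefficient t test answer different questions, which is why echomotion shows p = 0.07108 in one table and p = 0.10898 in the other.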

Part 3: Relationship Between sne and Rainfall by Seeding Status

Stratified Regression Models

m_no  <- lm(rainfall ~ sne, data = filter(clouds, seeding == "no"))
m_yes <- lm(rainfall ~ sne, data = filter(clouds, seeding == "yes"))
coef_no  <- coef(summary(m_no))
coef_yes <- coef(summary(m_yes))
coef_no; coef_yes
##              Estimate Std. Error   t value   Pr(>|t|)
## (Intercept)  7.319500  3.1595671  2.316615 0.04302009
## sne         -1.046371  0.9950467 -1.051580 0.31773971
##              Estimate Std. Error   t value    Pr(>|t|)
## (Intercept) 12.020237   2.977439  4.037106 0.002372239
## sne         -2.218039   0.872211 -2.543008 0.029212123

Only the seeded slope is significantly different from zero (p = 0.029, vs. p = 0.318 for non-seeded clouds). The more negative seeded slope (-2.218 vs. -1.046) suggests a stronger inverse relationship between suitability and rainfall when seeding is applied.
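
One slope being significant and the other not does not by itself show that the two slopes differ. As an approximate check, the sketch below compares the slopes directly from the coefficient tables above. It assumes the two fits are independent and uses 20 degrees of freedom (10 residual degrees of freedom per group); both choices, and all object names, are assumptions made for illustration.

```r
# Slope estimates and standard errors copied from the stratified fits
b_no  <- -1.046371; se_no  <- 0.9950467
b_yes <- -2.218039; se_yes <- 0.872211

# Difference in slopes and its standard error (independent fits assumed)
b_diff  <- b_yes - b_no
se_diff <- sqrt(se_no^2 + se_yes^2)

# Approximate t statistic; df = 10 + 10 residual df is an assumption
t_diff <- b_diff / se_diff
p_diff <- 2 * pt(-abs(t_diff), df = 20)

round(c(diff = b_diff, se = se_diff, t = t_diff, p = p_diff), 3)
```

With these inputs the approximate p-value is well above 0.05, so the difference between the slopes is itself not established by these data. A single model with an interaction term, such as lm(rainfall ~ sne * seeding, data = clouds), would test this more directly using a pooled residual variance.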

Regression Lines Plot

ggplot(clouds, aes(x = sne, y = rainfall, color = seeding)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "Suitability Index (sne)",
    y = "Rainfall (×10^8 m^3)",
    title = "Rainfall vs. sne by Seeding Status"
  )
## `geom_smooth()` using formula = 'y ~ x'

The regression line for seeded clouds has a steeper negative slope (-2.218) than the line for non-seeded clouds (-1.046): each one-unit increase in sne is associated with about 2.22 × 10^8 m^3 less rainfall for seeded clouds, compared with about 1.05 × 10^8 m^3 less for non-seeded clouds.
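
Because the seeded line has both a higher intercept and a steeper negative slope, the two fitted lines cross. A short sketch of where they cross, using the coefficients from the stratified models above (object names are illustrative):

```r
# Intercepts and slopes copied from the stratified fits
b0_no  <- 7.319500;  b1_no  <- -1.046371
b0_yes <- 12.020237; b1_yes <- -2.218039

# Setting b0_no + b1_no * x equal to b0_yes + b1_yes * x and solving for x
sne_cross <- (b0_yes - b0_no) / (b1_no - b1_yes)

# Fitted rainfall at the crossing point (identical on both lines)
rain_cross <- b0_no + b1_no * sne_cross

round(c(sne = sne_cross, rainfall = rain_cross), 3)
```

Below the crossing point the seeded line predicts more rainfall than the non-seeded line, and above it the order reverses. Whether the crossing falls inside the observed range of sne should be checked against the data before interpreting it.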

Conclusion

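Overall, seeding is not associated with a significant difference in mean rainfall, but the relationship between suitability (sne) and rainfall appears stronger for seeded clouds than for non-seeded clouds.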