Final Project 3: Cloud Seeding

Author

Hyunjeong Sin

Load all the files

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)
clouds <- read.csv("clouds.csv")

Explore the difference in rainfall between seeding and non-seeding experiments

Descriptive Statistics of Rainfall by Seeding Status

clouds %>%
  group_by(seeding) %>%
  summarise(n = n(), 
    mean_rainfall = mean(rainfall),
    sd_rainfall = sd(rainfall),
    median_rainfall = median(rainfall),
    min_rainfall = min(rainfall),
    max_rainfall = max(rainfall)
  )
# A tibble: 2 × 7
  seeding     n mean_rainfall sd_rainfall median_rainfall min_rainfall
  <chr>   <int>         <dbl>       <dbl>           <dbl>        <dbl>
1 no         12          4.17        3.52            4.06         0.28
2 yes        12          4.63        2.78            4.53         1.09
# ℹ 1 more variable: max_rainfall <dbl>

The mean and median rainfall are slightly higher in the seeded group (4.63 and 4.53) than in the non-seeded group (4.17 and 4.06). The non-seeded group is more variable, with a higher standard deviation (3.52 vs. 2.78) and a lower minimum (0.28 vs. 1.09), and the ranges of the two groups overlap. The difference in means (about 0.46) is small relative to the spread within either group, so the descriptive statistics alone do not suggest that seeding substantially or consistently increases rainfall.

Histogram of Rainfall by Seeding Status

ggplot(clouds, aes(x = rainfall, fill = seeding)) +
  geom_histogram(color =  "black", alpha = 0.6, position = "identity", bins = 20) +
  facet_wrap(~seeding) + theme_minimal() +
  ggtitle("Rainfall Distribution by Seeding Status")

Both the seeded and non-seeded groups span a wide range of rainfall values, and their distributions overlap substantially. The non-seeded group has more observations at the lower end of the rainfall range, and both groups include a few high values. There is no clear separation between the groups, and seeding does not produce a visible shift toward higher rainfall, so the slightly higher mean and median in the seeded group do not indicate a substantial or consistent effect.

T-Test Comparing Rainfall Between Seeded and Non-Seeded Clouds

t.test(rainfall ~ seeding, data = clouds)

    Welch Two Sample t-test

data:  rainfall by seeding
t = -0.3574, df = 20.871, p-value = 0.7244
alternative hypothesis: true difference in means between group no and group yes is not equal to 0
95 percent confidence interval:
 -3.154691  2.229691
sample estimates:
 mean in group no mean in group yes 
         4.171667          4.634167 

The p-value for the difference in mean rainfall between the two groups is 0.72, well above the conventional 0.05 threshold for statistical significance. The 95% confidence interval for the difference in means (non-seeded minus seeded: -3.15 to 2.23) includes zero and is wide, so the data are compatible with seeding either increasing or decreasing mean rainfall by a few units. Although mean rainfall is slightly higher in the seeded group (4.63 vs. 4.17), this difference is not statistically significant and may be due to random variation. In summary, this test provides no evidence that cloud seeding changes mean rainfall.
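As a check on where these numbers come from, the Welch statistic can be rebuilt by hand from the group summaries. The sketch below uses only the means from the t-test output and the standard deviations from the descriptive table. Because those standard deviations are rounded to two decimals, the results should agree with `t.test()` only approximately. The variable names are illustrative.

```r
# Summary statistics as printed in the descriptive table and t-test output
# (standard deviations are rounded to two decimals, so results are approximate)
n_no  <- 12; mean_no  <- 4.171667; sd_no  <- 3.52
n_yes <- 12; mean_yes <- 4.634167; sd_yes <- 2.78

# Squared standard error contributed by each group
v_no  <- sd_no^2  / n_no
v_yes <- sd_yes^2 / n_yes

# Welch t statistic: difference in means over the unpooled standard error
se_diff <- sqrt(v_no + v_yes)
t_stat  <- (mean_no - mean_yes) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_no + v_yes)^2 / (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean_no - mean_yes) + c(-1, 1) * qt(0.975, df_welch) * se_diff

round(c(t = t_stat, df = df_welch, p = p_value), 3)
round(ci, 2)
```

The standard error of the difference (about 1.3) is nearly three times the observed difference of 0.46, which is why the test statistic is so close to zero.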

Build a multiple linear regression model to estimate the effects of seeding on rainfall

clouds$echomotion <- as.factor(clouds$echomotion)
lm1 <- lm(rainfall ~ seeding + cloudcover + prewetness + echomotion + sne, data = clouds)
summary(lm1)

Call:
lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
    sne, data = clouds)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.1158 -1.7078 -0.2422  1.3368  6.4827 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)  
(Intercept)           6.37680    2.43432   2.620   0.0174 *
seedingyes            1.12011    1.20725   0.928   0.3658  
cloudcover            0.01821    0.11508   0.158   0.8761  
prewetness            2.55109    2.70090   0.945   0.3574  
echomotionstationary  2.59855    1.54090   1.686   0.1090  
sne                  -1.27530    0.68015  -1.875   0.0771 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.855 on 18 degrees of freedom
Multiple R-squared:  0.3403,    Adjusted R-squared:  0.157 
F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524

The model explains approximately 34% of the variance in rainfall (adjusted R-squared 0.157), but the overall F-test is not significant (p = 0.15) and none of the predictors are statistically significant at the 0.05 level. SNE (p = 0.077) is marginally significant at the 0.10 level, while echomotion (p = 0.11) falls just short of it. The seeding coefficient is positive (1.12), but its standard error is larger than the estimate (1.21, p = 0.37), so the model cannot distinguish the seeding effect from zero. Overall, no single variable reliably predicts rainfall in this dataset. With only 24 observations and six estimated coefficients, a larger sample or additional predictors may be necessary to clarify these relationships.
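To make the uncertainty in the seeding coefficient concrete, the t value, p-value, and a 95% confidence interval can be recomputed from the numbers printed in the coefficient table. This is a sketch from the reported estimate and standard error only. With the fitted model available, `confint(lm1)` would give the interval directly.

```r
# Values copied from the seedingyes row of summary(lm1)
estimate  <- 1.12011
std_error <- 1.20725
resid_df  <- 18   # residual degrees of freedom reported by the model

# t statistic and two-sided p-value
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = resid_df)

# 95% confidence interval: estimate +/- critical t value * standard error
ci <- estimate + c(-1, 1) * qt(0.975, df = resid_df) * std_error

round(c(t = t_value, p = p_value), 4)
round(ci, 2)
```

The interval runs from a negative to a positive value, so a modest decrease and a sizeable increase in rainfall are both compatible with this fit.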

anova(lm1)
Analysis of Variance Table

Response: rainfall
           Df  Sum Sq Mean Sq F value  Pr(>F)  
seeding     1   1.283  1.2834  0.1575 0.69613  
cloudcover  1  15.738 15.7377  1.9313 0.18157  
prewetness  1   0.003  0.0027  0.0003 0.98557  
echomotion  1  29.985 29.9853  3.6798 0.07108 .
sne         1  28.649 28.6491  3.5158 0.07711 .
Residuals  18 146.677  8.1487                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Seeding, cloud cover, and prewetness all have high p-values (greater than 0.18), which suggests that they do not meaningfully explain rainfall in this model. Echomotion (p = 0.071) and SNE (p = 0.077) are significant only at the 0.10 level, so they may have an effect, but it is not confirmed at the standard threshold (p < 0.05). Note that anova() on a single lm fit reports sequential sums of squares: each term is tested after adjusting only for the terms listed before it. This is why echomotion has p = 0.071 here but p = 0.109 in the coefficient table, whereas SNE, entered last, has the same p-value in both.
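The order dependence of `anova()` is easy to see on any data set with correlated predictors. The sketch below uses the built-in `mtcars` data purely as a stand-in (it is not the cloud-seeding data). It fits the same two-predictor model with the terms in either order, then uses `drop1()` to get tests that adjust each term for all the others.

```r
# mtcars ships with R and is used here only as a stand-in for clouds.csv
fit_a <- lm(mpg ~ wt + hp, data = mtcars)  # wt entered first
fit_b <- lm(mpg ~ hp + wt, data = mtcars)  # wt entered last

# Sequential (Type I) sums of squares: each term is adjusted only for
# the terms listed before it, so the value for wt depends on its position
ss_wt_first <- anova(fit_a)["wt", "Sum Sq"]
ss_wt_last  <- anova(fit_b)["wt", "Sum Sq"]
c(first = ss_wt_first, last = ss_wt_last)

# drop1() tests each term adjusted for all the others, so the result
# does not depend on the order of terms in the formula
marginal <- drop1(fit_a, test = "F")
marginal
```

For the regression in this report, `drop1(lm1, test = "F")` would be the analogous call. For single-degree-of-freedom terms such as these, its p-values match those in the `summary(lm1)` coefficient table.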

Build two new models relating the suitability index (SNE) to rainfall, one for the seeding experiments and one for the non-seeding experiments

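# Split the experiments by seeding status (dplyr::filter keeps matching rows)
seeded <- filter(clouds, seeding == "yes")
noseed <- filter(clouds, seeding == "no")

# Fit a simple linear regression of rainfall on SNE within each group
lm_seeded <- lm(rainfall ~ sne, data = seeded)
lm_noseed <- lm(rainfall ~ sne, data = noseed)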

summary(lm_seeded)

Call:
lm(formula = rainfall ~ sne, data = seeded)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0134 -1.3297 -0.3276  0.6171  4.3867 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  12.0202     2.9774   4.037  0.00237 **
sne          -2.2180     0.8722  -2.543  0.02921 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.27 on 10 degrees of freedom
Multiple R-squared:  0.3927,    Adjusted R-squared:  0.332 
F-statistic: 6.467 on 1 and 10 DF,  p-value: 0.02921
summary(lm_noseed)

Call:
lm(formula = rainfall ~ sne, data = noseed)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4892 -2.1762  0.2958  1.4902  7.3616 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    7.319      3.160   2.317    0.043 *
sne           -1.046      0.995  -1.052    0.318  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.502 on 10 degrees of freedom
Multiple R-squared:  0.09957,   Adjusted R-squared:  0.009528 
F-statistic: 1.106 on 1 and 10 DF,  p-value: 0.3177

In the seeded group, the regression coefficient for SNE is -2.22 (p = 0.029), a statistically significant negative association: for each one-unit increase in the suitability index, rainfall decreases by approximately 2.22 units on average. For non-seeded clouds, the coefficient is -1.05 (p = 0.318), a weaker relationship that is not statistically significant. SNE also explains more of the variation in rainfall in the seeded group (R-squared 0.39) than in the non-seeded group (R-squared 0.10). The two fitted lines cross at an SNE of about 4.0: below that value the seeded model predicts more rainfall than the non-seeded model, and above it less. This suggests that seeding may be less effective, or even counterproductive, when SNE is high. However, each model is based on only 12 observations, and fitting the groups separately does not test whether the two slopes differ from each other, so the pattern is suggestive rather than conclusive.

Produce a figure showing the two models

ggplot(clouds, aes(x = sne, y = rainfall, color = seeding)) +
  geom_point(size = 2, alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, lwd = 1.2) +
  theme_minimal() +
  labs(
    title = "Relationship between SNE and Rainfall by Seeding",
    x = "Suitability Index (SNE)",
    y = "Rainfall"
  )
`geom_smooth()` using formula = 'y ~ x'

The blue line for the seeded group has a much steeper negative slope, consistent with the regression result that rainfall tends to decrease as SNE increases when seeding is performed. In contrast, the non-seeded group has a flatter (and non-significant) downward slope, indicating only a slight decrease in rainfall as SNE increases. The figure illustrates that the estimated association between SNE and rainfall is stronger in seeded clouds than in non-seeded clouds, although with 12 points per group the difference between the two slopes is uncertain.
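One way to check whether the two slopes actually differ is to fit a single model with an SNE-by-seeding interaction instead of two separate regressions. The sketch below shows the mechanics on synthetic data, because the original file is not reproduced here. The data frame `sim` and its numbers are made up for illustration and are not results from the experiment. On the real data the call would be `lm(rainfall ~ sne * seeding, data = clouds)`, and from the slopes reported above the interaction estimate would be about -2.22 - (-1.05) = -1.17.

```r
set.seed(1)

# Synthetic stand-in for clouds.csv: 12 non-seeded and 12 seeded rows.
# These values are invented for illustration only.
sim <- data.frame(
  seeding = rep(c("no", "yes"), each = 12),
  sne     = runif(24, min = 1.5, max = 4.5)
)
sim$rainfall <- 8 - 1.5 * sim$sne + rnorm(24, sd = 2)

# sne * seeding expands to sne + seeding + sne:seeding
lm_int <- lm(rainfall ~ sne * seeding, data = sim)

# The sne:seedingyes row estimates how much the seeded slope differs
# from the non-seeded slope, with a t-test of that difference
coef(summary(lm_int))

# The interaction coefficient equals the difference between the slopes
# of the two separately fitted models
slope_no  <- coef(lm(rainfall ~ sne, data = subset(sim, seeding == "no")))["sne"]
slope_yes <- coef(lm(rainfall ~ sne, data = subset(sim, seeding == "yes")))["sne"]
unname(slope_yes - slope_no)
```

The interaction model reproduces the two separate slopes exactly but adds a standard error and p-value for their difference, which is the quantity the figure invites the reader to judge by eye.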