Final_Project_Part 3

Author

Cienna Kim

Published

June 19, 2026

Introduction

The dataset clouds.csv contains information from a cloud seeding experiment conducted during the summer of 1975. The purpose of the experiment was to investigate whether cloud seeding using silver iodide could increase rainfall. The dataset includes information about seeding status, cloud characteristics, and measured rainfall.

The goal of this analysis is to:

  1. Compare rainfall between seeded and non-seeded clouds.
  2. Build a multiple linear regression model to evaluate factors affecting rainfall.
  3. Examine the relationship between sne (the suitability criterion for seeding) and rainfall separately for seeded and non-seeded clouds.

Load Data

clouds <- read.csv("clouds.csv")

clouds$seeding <- factor(clouds$seeding)
clouds$echomotion <- factor(clouds$echomotion)

str(clouds)
'data.frame':   24 obs. of  8 variables:
 $ X         : int  1 2 3 4 5 6 7 8 9 10 ...
 $ seeding   : Factor w/ 2 levels "no","yes": 1 2 2 1 2 1 1 1 1 2 ...
 $ time      : int  0 1 3 4 6 9 18 25 27 28 ...
 $ sne       : num  1.75 2.7 4.1 2.35 4.25 1.6 1.3 3.35 2.85 2.2 ...
 $ cloudcover: num  13.4 37.9 3.9 5.3 7.1 6.9 4.6 4.9 12.1 5.2 ...
 $ prewetness: num  0.274 1.267 0.198 0.526 0.25 ...
 $ echomotion: Factor w/ 2 levels "moving","stationary": 2 1 2 1 1 2 1 1 1 1 ...
 $ rainfall  : num  12.85 5.52 6.29 6.11 2.45 ...
summary(clouds)
       X         seeding       time            sne          cloudcover    
 Min.   : 1.00   no :12   Min.   : 0.00   Min.   :1.300   Min.   : 2.200  
 1st Qu.: 6.75   yes:12   1st Qu.:15.75   1st Qu.:2.612   1st Qu.: 3.750  
 Median :12.50            Median :32.50   Median :3.250   Median : 5.250  
 Mean   :12.50            Mean   :35.33   Mean   :3.169   Mean   : 7.246  
 3rd Qu.:18.25            3rd Qu.:55.25   3rd Qu.:3.962   3rd Qu.: 7.175  
 Max.   :24.00            Max.   :83.00   Max.   :4.650   Max.   :37.900  
   prewetness          echomotion    rainfall     
 Min.   :0.0180   moving    :19   Min.   : 0.280  
 1st Qu.:0.1405   stationary: 5   1st Qu.: 2.342  
 Median :0.2220                   Median : 4.335  
 Mean   :0.3271                   Mean   : 4.403  
 3rd Qu.:0.3297                   3rd Qu.: 5.575  
 Max.   :1.2670                   Max.   :12.850  

Rainfall and Seeding

Summary Statistics

# mean
aggregate(rainfall ~ seeding,
          data = clouds,
          mean)
  seeding rainfall
1      no 4.171667
2     yes 4.634167
# sd
aggregate(rainfall ~ seeding,
          data = clouds,
          sd)
  seeding rainfall
1      no 3.519196
2     yes 2.776841
# median
aggregate(rainfall ~ seeding,
          data = clouds,
          median)
  seeding rainfall
1      no    4.055
2     yes    4.530

Results

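The mean rainfall for seeded clouds was approximately 4.63, compared with 4.17 for non-seeded clouds, a difference of about 0.46. The medians (4.53 and 4.06) show the same ordering. The standard deviation was lower in the seeded group (2.78 versus 3.52), so rainfall was somewhat less variable among seeded clouds.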

Visualization

boxplot(rainfall ~ seeding,
        data = clouds,
        main = "Rainfall by Seeding Status",
        xlab = "Seeding",
        ylab = "Rainfall")

The boxplot shows substantial overlap between the two groups, suggesting that any difference in rainfall may be small.

t-test

t.test(rainfall ~ seeding, data = clouds)

    Welch Two Sample t-test

data:  rainfall by seeding
t = -0.3574, df = 20.871, p-value = 0.7244
alternative hypothesis: true difference in means between group no and group yes is not equal to 0
95 percent confidence interval:
 -3.154691  2.229691
sample estimates:
 mean in group no mean in group yes 
         4.171667          4.634167 

Interpretation

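The Welch two-sample t-test gave t = -0.357 on about 20.9 degrees of freedom, with a p-value of approximately 0.724.

Because the p-value is much larger than 0.05, there is no statistically significant evidence that mean rainfall differs between seeded and non-seeded clouds in this dataset. The 95% confidence interval for the difference in means (non-seeded minus seeded), -3.15 to 2.23, includes 0 and is wide relative to the observed difference of -0.46, which reflects the small sample of 12 clouds per group.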
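
As a check on the test above, the Welch statistic can be recomputed by hand from the group means and standard deviations already reported. The sketch below uses only numbers copied from the earlier output (n = 12 per group comes from summary(clouds)); the variable names are illustrative and not part of the original analysis.

```r
# Summary statistics copied from the aggregate() output above
n_no <- 12
n_yes <- 12
mean_no <- 4.171667
mean_yes <- 4.634167
sd_no <- 3.519196
sd_yes <- 2.776841

# Per-group variance of the sample mean
v_no <- sd_no^2 / n_no
v_yes <- sd_yes^2 / n_yes

# Standard error of the difference in means, without assuming equal variances
se_diff <- sqrt(v_no + v_yes)

# Welch t statistic (non-seeded minus seeded, matching t.test's group order)
t_stat <- (mean_no - mean_yes) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_no + v_yes)^2 /
  (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (mean_no - mean_yes) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(t = t_stat, df = df_welch, p = p_value), 4)
round(ci, 3)
```

Up to rounding, these values should agree with the t, df, p-value, and confidence interval printed by t.test().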

Multiple Linear Regression

Model Summary

A multiple linear regression model was fit with

  • seeding
  • cloudcover
  • prewetness
  • echomotion
  • sne

as predictors of rainfall.

model1 <- lm(rainfall ~ seeding +
                           cloudcover +
                           prewetness +
                           echomotion +
                           sne,
             data = clouds)

summary(model1)

Call:
lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
    sne, data = clouds)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.1158 -1.7078 -0.2422  1.3368  6.4827 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)  
(Intercept)           6.37680    2.43432   2.620   0.0174 *
seedingyes            1.12011    1.20725   0.928   0.3658  
cloudcover            0.01821    0.11508   0.158   0.8761  
prewetness            2.55109    2.70090   0.945   0.3574  
echomotionstationary  2.59855    1.54090   1.686   0.1090  
sne                  -1.27530    0.68015  -1.875   0.0771 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.855 on 18 degrees of freedom
Multiple R-squared:  0.3403,    Adjusted R-squared:  0.157 
F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524

ANOVA Results

anova(model1)
Analysis of Variance Table

Response: rainfall
           Df  Sum Sq Mean Sq F value  Pr(>F)  
seeding     1   1.283  1.2834  0.1575 0.69613  
cloudcover  1  15.738 15.7377  1.9313 0.18157  
prewetness  1   0.003  0.0027  0.0003 0.98557  
echomotion  1  29.985 29.9853  3.6798 0.07108 .
sne         1  28.649 28.6491  3.5158 0.07711 .
Residuals  18 146.677  8.1487                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual Analysis

hist(residuals(model1),
     xlab = "Residuals",
     main = "Distribution of Residuals")

Interpretation

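The ANOVA table indicates that echomotion and sne have the strongest relationships with rainfall: of the five predictors, they account for the largest sums of squares (29.99 and 28.65, against a residual sum of squares of 146.68). However, neither variable is statistically significant at the 0.05 level, although both are below 0.10 (p = 0.071 and p = 0.077). Because anova() reports sequential sums of squares, these values depend on the order in which the predictors were entered. The model as a whole is also not significant (F = 1.857 on 5 and 18 degrees of freedom, p = 0.152), and it explains about 34% of the variation in rainfall (adjusted R-squared 0.157).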
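
To make "proportion of variation" concrete, each term's share can be computed from the sums of squares in the ANOVA table. The sketch below copies those values from the anova(model1) output and also checks that they recover the R-squared, adjusted R-squared, and F-statistic from summary(model1). The object names are illustrative.

```r
# Sequential (Type I) sums of squares copied from the anova(model1) table
ss <- c(seeding = 1.283, cloudcover = 15.738, prewetness = 0.003,
        echomotion = 29.985, sne = 28.649, Residuals = 146.677)

# Share of the total sum of squares attributed to each row
share <- ss / sum(ss)
round(share, 3)

# Everything not left in the residuals is explained by the model
r_squared <- 1 - share[["Residuals"]]

# Adjusted R-squared and overall F from R-squared,
# with n = 24 observations and p = 5 estimated slopes
n <- 24
p <- 5
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
f_stat <- (r_squared / p) / ((1 - r_squared) / (n - p - 1))

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared, F = f_stat), 4)
```

echomotion and sne each account for roughly 13% of the total variation, while about two thirds remains in the residuals. These shares are order-dependent: entering the predictors in a different order would change the individual rows but not the total.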

Relationship Between sne and Rainfall

Because sne appears to be related to rainfall, separate linear models were fit for seeded and non-seeded clouds.

Non-seeded Clouds

model_no <- lm(rainfall ~ sne,
               data = subset(clouds, seeding == "no"))

summary(model_no)

Call:
lm(formula = rainfall ~ sne, data = subset(clouds, seeding == 
    "no"))

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4892 -2.1762  0.2958  1.4902  7.3616 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    7.319      3.160   2.317    0.043 *
sne           -1.046      0.995  -1.052    0.318  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.502 on 10 degrees of freedom
Multiple R-squared:  0.09957,   Adjusted R-squared:  0.009528 
F-statistic: 1.106 on 1 and 10 DF,  p-value: 0.3177

Seeded Clouds

model_yes <- lm(rainfall ~ sne,
                data = subset(clouds, seeding == "yes"))

summary(model_yes)

Call:
lm(formula = rainfall ~ sne, data = subset(clouds, seeding == 
    "yes"))

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0134 -1.3297 -0.3276  0.6171  4.3867 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  12.0202     2.9774   4.037  0.00237 **
sne          -2.2180     0.8722  -2.543  0.02921 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.27 on 10 degrees of freedom
Multiple R-squared:  0.3927,    Adjusted R-squared:  0.332 
F-statistic: 6.467 on 1 and 10 DF,  p-value: 0.02921

Compare Slopes

coef(model_no)
(Intercept)         sne 
   7.319500   -1.046371 
coef(model_yes)
(Intercept)         sne 
  12.020237   -2.218039 

The estimated models are:

Non-seeded clouds

rainfall = 7.32 - 1.05(sne)

Seeded clouds

rainfall = 12.02 - 2.22(sne)

The seeded group has a steeper negative slope than the non-seeded group.
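
Because the seeded line starts higher but falls faster, the two fitted lines cross within the observed range of sne (1.30 to 4.65). The sketch below finds that crossing point and tabulates predicted rainfall for both groups, using only the coefficients printed above. predict_line is a hypothetical helper introduced for illustration.

```r
# Coefficients copied from coef(model_no) and coef(model_yes)
b_no <- c(intercept = 7.319500, slope = -1.046371)
b_yes <- c(intercept = 12.020237, slope = -2.218039)

# Hypothetical helper: predicted rainfall on a fitted line at given sne values
predict_line <- function(b, sne) b[["intercept"]] + b[["slope"]] * sne

# sne value at which the two fitted lines intersect:
# solve a_no + b_no * x = a_yes + b_yes * x for x
sne_cross <- (b_yes[["intercept"]] - b_no[["intercept"]]) /
  (b_no[["slope"]] - b_yes[["slope"]])

# Predicted rainfall for each group at a few sne values in the observed range
sne_grid <- c(1.30, 2.00, 3.00, sne_cross, 4.65)
data.frame(sne = round(sne_grid, 2),
           no = round(predict_line(b_no, sne_grid), 2),
           yes = round(predict_line(b_yes, sne_grid), 2))
```

The lines intersect at an sne of about 4.0. Below that value the seeded line predicts more rainfall than the non-seeded line, and above it the ordering reverses. These are point predictions from two small samples, so they describe the fitted lines rather than establish a seeding effect.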

Visualization

plot(rainfall ~ sne,
     data = clouds,
     col = ifelse(clouds$seeding == "yes", "red", "black"),
     pch = 16,
     xlab = "sne",
     ylab = "rainfall",
     main = "Rainfall vs sne")

abline(model_no,
       col = "black")

abline(model_yes,
       col = "red")

legend("topright",
       legend = c("No Seeding", "Seeding"),
       col = c("black", "red"),
       pch = 16)

Interpretation

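Both groups show a negative estimated relationship between sne and rainfall, but only the slope for seeded clouds is statistically significant at the 0.05 level (-2.22, p = 0.029). The slope for non-seeded clouds is not (-1.05, p = 0.318).

The seeded group has a steeper negative slope than the non-seeded group. This suggests that rainfall decreases more rapidly as sne increases in seeded clouds, which is consistent with the relationship between sne and rainfall depending on whether seeding occurred. With 12 clouds in each group, and because the two models were fit separately, this comparison of slopes is descriptive rather than a formal test of the difference.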
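
One way to gauge whether the two slopes really differ is to compare them using their reported standard errors. The sketch below is an approximation, not part of the original analysis: it treats the two groups as independent samples and assumes the residual degrees of freedom of the two fits can be added (10 + 10 = 20). All numbers are copied from the two summary() outputs.

```r
# Slope estimates and standard errors copied from
# summary(model_no) and summary(model_yes)
slope_no <- -1.046
se_no <- 0.995
slope_yes <- -2.2180
se_yes <- 0.8722

# Difference in slopes (seeded minus non-seeded) and its standard error,
# treating the two groups as independent samples
diff_slopes <- slope_yes - slope_no
se_diff <- sqrt(se_no^2 + se_yes^2)

# Approximate t statistic for the difference
t_approx <- diff_slopes / se_diff

# Assumption: degrees of freedom taken as the sum of the residual df
df_approx <- 10 + 10
p_approx <- 2 * pt(-abs(t_approx), df = df_approx)

round(c(diff = diff_slopes, se = se_diff, t = t_approx, p = p_approx), 3)
```

The approximate t statistic is below 1 in absolute value, so by this rough check the difference in slopes is not statistically significant. A single model with an interaction term, for example lm(rainfall ~ seeding * sne, data = clouds), estimates the same slope difference through its seedingyes:sne coefficient. Its standard error will differ somewhat because that model pools the residual variance across both groups.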

Conclusion

This analysis examined the effect of cloud seeding on rainfall.

The comparison of seeded and non-seeded clouds showed that the seeded group had slightly higher average rainfall. However, the t-test indicated that this difference was not statistically significant.

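The multiple linear regression model suggested that sne and echomotion are the variables most strongly associated with rainfall, although neither was statistically significant at the 0.05 level and the overall model was not significant (p = 0.152).

Finally, separate models of rainfall versus sne showed that the seeded group had a steeper negative slope than the non-seeded group (-2.22 versus -1.05), suggesting that the effect of sne on rainfall may depend on seeding status. Only the seeded slope was significant at the 0.05 level, and the difference between the two slopes was not formally tested.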