project03

Author

Hangu Lee

1. Data Exploration & t-test

# Load the dataset
clouds <- read.csv("clouds.csv")

# Check the structure and summarized values
str(clouds)

'data.frame':   24 obs. of  8 variables:
 $ X         : int  1 2 3 4 5 6 7 8 9 10 ...
 $ seeding   : chr  "no" "yes" "yes" "no" ...
 $ time      : int  0 1 3 4 6 9 18 25 27 28 ...
 $ sne       : num  1.75 2.7 4.1 2.35 4.25 1.6 1.3 3.35 2.85 2.2 ...
 $ cloudcover: num  13.4 37.9 3.9 5.3 7.1 6.9 4.6 4.9 12.1 5.2 ...
 $ prewetness: num  0.274 1.267 0.198 0.526 0.25 ...
 $ echomotion: chr  "stationary" "moving" "stationary" "moving" ...
 $ rainfall  : num  12.85 5.52 6.29 6.11 2.45 ...

summary(clouds)

       X              seeding        time            sne       
 Min.   : 1.00   Length   :24   Min.   : 0.00   Min.   :1.300  
 1st Qu.: 6.75   N.unique : 2   1st Qu.:15.75   1st Qu.:2.612  
 Median :12.50   N.blank  : 0   Median :32.50   Median :3.250  
 Mean   :12.50   Min.nchar: 2   Mean   :35.33   Mean   :3.169  
 3rd Qu.:18.25   Max.nchar: 3   3rd Qu.:55.25   3rd Qu.:3.962  
 Max.   :24.00                  Max.   :83.00   Max.   :4.650  
   cloudcover       prewetness         echomotion    rainfall     
 Min.   : 2.200   Min.   :0.0180   Length   :24   Min.   : 0.280  
 1st Qu.: 3.750   1st Qu.:0.1405   N.unique : 2   1st Qu.: 2.342  
 Median : 5.250   Median :0.2220   N.blank  : 0   Median : 4.335  
 Mean   : 7.246   Mean   :0.3271   Min.nchar: 6   Mean   : 4.403  
 3rd Qu.: 7.175   3rd Qu.:0.3297   Max.nchar:10   3rd Qu.: 5.575  
 Max.   :37.900   Max.   :1.2670                  Max.   :12.850

# 1. Split data into Seeding and Non-Seeding groups
seed_yes <- subset(clouds, seeding == "yes")
seed_no <- subset(clouds, seeding == "no")

# 2. Summary statistics (Mean and Standard Deviation)
mean(seed_yes$rainfall)

[1] 4.634167

sd(seed_yes$rainfall)

[1] 2.776841

mean(seed_no$rainfall)

[1] 4.171667

sd(seed_no$rainfall)

[1] 3.519196

# 3. Visualization: Boxplot
boxplot(rainfall ~ seeding, data = clouds, 
        main = "Rainfall Distribution by Seeding Status",
        xlab = "Seeding", ylab = "Rainfall", 
        col = c("lightcoral", "lightskyblue"))

# 4. Independent two-sample t-test (R defaults to Welch's unequal-variance version)
t.test(rainfall ~ seeding, data = clouds)


    Welch Two Sample t-test

data:  rainfall by seeding
t = -0.3574, df = 20.871, p-value = 0.7244
alternative hypothesis: true difference in means between group no and group yes is not equal to 0
95 percent confidence interval:
 -3.154691  2.229691
sample estimates:
 mean in group no mean in group yes 
         4.171667          4.634167

The seeded group had a slightly higher mean rainfall (4.63) than the unseeded group (4.17), while the unseeded group was more variable (SD = 3.52 vs. 2.78). The boxplot shows substantial overlap between the two distributions. A two-sided Welch two-sample t-test found no significant difference in mean rainfall (t = -0.357, df = 20.87, p = 0.724; 95% CI for the difference, no minus yes: -3.15 to 2.23). The data therefore provide no evidence that cloud seeding changes mean rainfall, although with only 24 observations the interval is wide and a moderate effect in either direction cannot be ruled out.
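The reported test statistic can be reproduced from the group summaries alone, which is a useful check on the output above. The sketch below assumes 12 observations per group (inferred from the 10 residual degrees of freedom in each single-predictor model later in the report); the variable names are illustrative.

```r
# Group summaries copied from the output above
mean_no  <- 4.171667; sd_no  <- 3.519196
mean_yes <- 4.634167; sd_yes <- 2.776841
n_no <- 12; n_yes <- 12   # assumed group sizes (24 observations split evenly)

# Welch standard error: the two variances are not pooled
v_no  <- sd_no^2  / n_no
v_yes <- sd_yes^2 / n_yes
se_diff <- sqrt(v_no + v_yes)

# t statistic for (no - yes), matching R's group ordering
t_stat <- (mean_no - mean_yes) / se_diff

# Welch-Satterthwaite degrees of freedom
df_welch <- (v_no + v_yes)^2 / (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean_no - mean_yes) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
print(round(ci, 4))
```

If the assumed group sizes are right, the printed values should agree with the t.test() output to the reported precision.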

2. Multiple Linear Regression Model

# 1. Build the Multiple Linear Regression Model
model_clouds <- lm(rainfall ~ seeding + cloudcover + prewetness + echomotion + sne, data = clouds)

# 2. Check Coefficients and Model Fit
summary(model_clouds)


Call:
lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion + 
    sne, data = clouds)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.1158 -1.7078 -0.2422  1.3368  6.4827 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)  
(Intercept)           6.37680    2.43432   2.620   0.0174 *
seedingyes            1.12011    1.20725   0.928   0.3658  
cloudcover            0.01821    0.11508   0.158   0.8761  
prewetness            2.55109    2.70090   0.945   0.3574  
echomotionstationary  2.59855    1.54090   1.686   0.1090  
sne                  -1.27530    0.68015  -1.875   0.0771 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.855 on 18 degrees of freedom
Multiple R-squared:  0.3403,    Adjusted R-squared:  0.157 
F-statistic: 1.857 on 5 and 18 DF,  p-value: 0.1524

# 3. Sequential (Type I) ANOVA: each term is tested after the terms listed before it
anova(model_clouds)

Analysis of Variance Table

Response: rainfall
           Df  Sum Sq Mean Sq F value  Pr(>F)  
seeding     1   1.283  1.2834  0.1575 0.69613  
cloudcover  1  15.738 15.7377  1.9313 0.18157  
prewetness  1   0.003  0.0027  0.0003 0.98557  
echomotion  1  29.985 29.9853  3.6798 0.07108 .
sne         1  28.649 28.6491  3.5158 0.07711 .
Residuals  18 146.677  8.1487                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

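The multiple regression model explains about 34% of the variation in rainfall (R² = 0.340; adjusted R² = 0.157), and the overall model is not statistically significant (F = 1.857 on 5 and 18 df, p = 0.152). No individual predictor is significant at the 5% level. echomotion and sne come closest: in the sequential ANOVA they have the largest sums of squares (29.99 and 28.65) and F-values (p = 0.071 and 0.077), and in the coefficient table they have the largest t-values in absolute terms (p = 0.109 and 0.077). Because the ANOVA is sequential, each term is assessed only after the terms entered before it, so these sums of squares depend on the order of the formula. Seeding, cloudcover, and prewetness show little evidence of an association with rainfall. Cloud seeding itself (coefficient 1.12, p = 0.366) does not appear to increase rainfall significantly in this dataset after adjusting for the other covariates.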
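The fit statistics in the summary can be recovered from the sums of squares in the ANOVA table, which makes the link between the two outputs explicit. This is a minimal sketch using the rounded values printed above; the variable names are illustrative, and small rounding differences are expected.

```r
# Sequential sums of squares copied from the ANOVA table above
ss_terms <- c(seeding = 1.283, cloudcover = 15.738, prewetness = 0.003,
              echomotion = 29.985, sne = 28.649)
ss_resid <- 146.677
df_model <- 5
df_resid <- 18

# Total sum of squares is the model part plus the residual part
ss_total <- sum(ss_terms) + ss_resid

# R-squared, adjusted R-squared, and the overall F test
r2        <- 1 - ss_resid / ss_total
adj_r2    <- 1 - (ss_resid / df_resid) / (ss_total / (df_model + df_resid))
f_stat    <- (sum(ss_terms) / df_model) / (ss_resid / df_resid)
p_overall <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

# Share of total variation attributed to each term, in the order entered
share <- ss_terms / ss_total

print(round(c(R2 = r2, adjR2 = adj_r2, F = f_stat, p = p_overall), 4))
print(round(share, 3))
```

The per-term shares apply only to this ordering of the predictors. Reordering the formula would redistribute the sequential sums of squares, while R² and the overall F test would stay the same.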

3. Suitability Index (sne) Models

# 1. Build separate models for Seeding (Yes) and No Seeding (No)
model_yes <- lm(rainfall ~ sne, data = seed_yes)
model_no <- lm(rainfall ~ sne, data = seed_no)

# 2. Check the coefficients of both models
summary(model_yes)


Call:
lm(formula = rainfall ~ sne, data = seed_yes)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0134 -1.3297 -0.3276  0.6171  4.3867 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  12.0202     2.9774   4.037  0.00237 **
sne          -2.2180     0.8722  -2.543  0.02921 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.27 on 10 degrees of freedom
Multiple R-squared:  0.3927,    Adjusted R-squared:  0.332 
F-statistic: 6.467 on 1 and 10 DF,  p-value: 0.02921

summary(model_no)


Call:
lm(formula = rainfall ~ sne, data = seed_no)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4892 -2.1762  0.2958  1.4902  7.3616 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)    7.319      3.160   2.317    0.043 *
sne           -1.046      0.995  -1.052    0.318  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.502 on 10 degrees of freedom
Multiple R-squared:  0.09957,   Adjusted R-squared:  0.009528 
F-statistic: 1.106 on 1 and 10 DF,  p-value: 0.3177

# 3. Produce a figure showing the two models together
plot(rainfall ~ sne, data = clouds, 
     col = ifelse(seeding == "yes", "blue", "red"), 
     pch = 16, main = "Rainfall vs SNE by Seeding Status",
     xlab = "SNE (Suitability Criterion)", ylab = "Rainfall")

# Add the regression lines for each model
abline(model_yes, col = "blue", lwd = 2) # Seeding Yes Line
abline(model_no, col = "red", lwd = 2)   # Seeding No Line

# Add a simple legend
legend("topright", legend = c("Seeding (Yes)", "No Seeding (No)"), 
       col = c("blue", "red"), lwd = 2)

Both models show a negative relationship between SNE and rainfall. The fitted slope is steeper in the seeding model (−2.22) than in the non-seeding model (−1.05), and the SNE coefficient is statistically significant only in the seeding model (p = 0.029 vs. p = 0.318). This pattern is consistent with rainfall being more sensitive to SNE in seeded experiments, but the two slopes were not compared formally: each is based on only 12 observations, and their standard errors (0.87 and 1.00) are large relative to the difference of about 1.17, so the apparent difference in sensitivity should be treated as suggestive.
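One rough way to compare the two slopes is to divide their difference by a combined standard error. The sketch below uses only the coefficients reported above and is an approximation, not the author's analysis: it treats the two fits as independent, which holds because they use disjoint rows, and it uses a t reference distribution with the combined residual degrees of freedom, which is an assumption. All names are illustrative.

```r
# Slope estimates and standard errors copied from the two summaries above
b_yes <- -2.2180; se_yes <- 0.8722
b_no  <- -1.046;  se_no  <- 0.995
df_yes <- 10; df_no <- 10   # residual degrees of freedom of each fit

# Difference in slopes and its standard error (independent fits)
slope_diff <- b_yes - b_no
se_diff    <- sqrt(se_yes^2 + se_no^2)
t_diff     <- slope_diff / se_diff

# Approximate two-sided p-value using the combined residual df (assumption)
p_diff <- 2 * pt(-abs(t_diff), df_yes + df_no)

print(round(c(diff = slope_diff, se = se_diff, t = t_diff, p = p_diff), 3))
```

The resulting statistic is small in magnitude (roughly -0.9), so this check does not support a clear difference between the slopes. With the raw data, fitting lm(rainfall ~ seeding * sne, data = clouds) and inspecting the interaction term would test the same question with a pooled residual variance.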