I began by importing clouds.csv and inspecting its structure and
summary statistics. I then computed descriptive statistics and drew
boxplots to compare rainfall by seeding status, and ran a two-sample
t-test to assess the difference in mean rainfall between seeded and
non-seeded clouds. Next, I fitted a multiple linear regression with all
predictors and inspected its coefficients and ANOVA table. I then fitted
separate simple regressions of rainfall vs. sne for the seeded and
non-seeded subsets. Finally, I plotted the fitted lines to show how the
relationship between suitability (sne) and rainfall differs by seeding
status.
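library(dplyr)    # %>%, group_by(), summarize(), filter()
library(ggplot2)  # ggplot(), geom_boxplot(), geom_smooth()
clouds <- read.csv("clouds.csv", stringsAsFactors = TRUE)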
stats <- clouds %>%
  group_by(seeding) %>%
  summarize(
    n = n(),
    mean = mean(rainfall),
    sd = sd(rainfall),
    median = median(rainfall),
    IQR = IQR(rainfall)
  )
stats
## # A tibble: 2 × 6
##   seeding     n  mean    sd median   IQR
##   <fct>   <int> <dbl> <dbl>  <dbl> <dbl>
## 1 no         12  4.17  3.52   4.06  4.76
## 2 yes        12  4.63  2.78   4.53  2.78
Seeded clouds have a slightly higher mean rainfall (4.63 vs. 4.17, a difference of 0.4625 × 10^8 m^3) and lower variability (SD 2.78 vs. 3.52).
ggplot(clouds, aes(x = seeding, y = rainfall)) +
  geom_boxplot() +
  labs(
    x = "Seeding Status",
    y = "Rainfall (×10^8 m^3)",
    title = "Rainfall by Seeding Status"
  )
The boxplot shows overlapping distributions: the medians are similar (4.06 for non-seeded vs. 4.53 for seeded), and the interquartile range is wider for non-seeded clouds (4.76 vs. 2.78).
t_res <- t.test(rainfall ~ seeding, data = clouds)
t_res
##
## Welch Two Sample t-test
##
## data: rainfall by seeding
## t = -0.3574, df = 20.871, p-value = 0.7244
## alternative hypothesis: true difference in means between group no and group yes is not equal to 0
## 95 percent confidence interval:
## -3.154691 2.229691
## sample estimates:
## mean in group no mean in group yes
## 4.171667 4.634167
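Since p = 0.7244 > 0.05, we fail to reject the null hypothesis of equal means: the data show no statistically significant difference in mean rainfall between seeded and non-seeded clouds. The 95% confidence interval for the difference (-3.15 to 2.23) includes 0, which is consistent with this.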
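As a cross-check, the Welch statistic and its degrees of freedom can be recomputed from the group summaries alone. The sketch below plugs in the group means from the t-test output and the rounded SDs from the summary table, so it should only be expected to agree with the reported t and df to about two decimal places. The variable names are illustrative and not part of the original analysis.

```r
# Group summaries copied from the output above (SDs are rounded to 2 decimals)
n_no  <- 12; mean_no  <- 4.171667; sd_no  <- 3.52
n_yes <- 12; mean_yes <- 4.634167; sd_yes <- 2.78

# Variance of each group's sample mean
v_no  <- sd_no^2  / n_no
v_yes <- sd_yes^2 / n_yes

# Welch t statistic: difference in means divided by its standard error
t_stat <- (mean_no - mean_yes) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_no + v_yes)^2 /
  (v_no^2 / (n_no - 1) + v_yes^2 / (n_yes - 1))

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df = df_welch)

round(c(t = t_stat, df = df_welch, p = p_val), 3)
```

Small discrepancies from the t.test() output are expected because the SDs used here are rounded.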
clouds$echomotion <- relevel(clouds$echomotion, ref = "stationary")
model_all <- lm(rainfall ~ seeding + cloudcover + prewetness + echomotion + sne, data = clouds)
summary(model_all)
##
## Call:
## lm(formula = rainfall ~ seeding + cloudcover + prewetness + echomotion +
## sne, data = clouds)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1158 -1.7078 -0.2422 1.3368 6.4827
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)       8.97535    2.43913   3.680  0.00171 **
## seedingyes        1.12011    1.20725   0.928  0.36578
## cloudcover        0.01821    0.11508   0.158  0.87606
## prewetness        2.55109    2.70090   0.945  0.35741
## echomotionmoving -2.59855    1.54090  -1.686  0.10898
## sne              -1.27530    0.68015  -1.875  0.07711 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.855 on 18 degrees of freedom
## Multiple R-squared: 0.3403, Adjusted R-squared: 0.157
## F-statistic: 1.857 on 5 and 18 DF, p-value: 0.1524
anova(model_all)
## Analysis of Variance Table
##
## Response: rainfall
##            Df  Sum Sq Mean Sq F value  Pr(>F)
## seeding     1   1.283  1.2834  0.1575 0.69613
## cloudcover  1  15.738 15.7377  1.9313 0.18157
## prewetness  1   0.003  0.0027  0.0003 0.98557
## echomotion  1  29.985 29.9853  3.6798 0.07108 .
## sne         1  28.649 28.6491  3.5158 0.07711 .
## Residuals  18 146.677  8.1487
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
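No predictor reaches significance at α = 0.05. In the coefficient table, sne (p = 0.07711) and echomotion (p = 0.10898) come closest. In the sequential ANOVA, echomotion (F = 3.68, p = 0.07108) and sne (F = 3.52, p = 0.07711) have the largest F-values and account for the most variance among the predictors, but neither is statistically significant.
The overall model is not significant either (F = 1.857 on 5 and 18 DF, p = 0.1524), and it explains little of the variation in rainfall (adjusted R-squared 0.157).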
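The ANOVA table and the model summary describe the same fit, and the link between them can be checked by hand. The following sketch rebuilds R-squared, adjusted R-squared, and the overall F-statistic from the sums of squares printed above. The values are copied from the table, so results agree only up to rounding, and the variable names are illustrative.

```r
# Sequential sums of squares copied from the anova() table above
ss_terms <- c(seeding = 1.283, cloudcover = 15.738, prewetness = 0.003,
              echomotion = 29.985, sne = 28.649)
ss_resid <- 146.677
df_model <- length(ss_terms)  # five terms, one degree of freedom each
df_resid <- 18

# Total sum of squares is the model part plus the residual part
ss_total <- sum(ss_terms) + ss_resid

# Share of variance explained, and its degrees-of-freedom-adjusted version
r2     <- 1 - ss_resid / ss_total
adj_r2 <- 1 - (ss_resid / df_resid) / (ss_total / (df_model + df_resid))

# Overall F: mean square of the model over mean square of the residuals
f_stat <- (sum(ss_terms) / df_model) / (ss_resid / df_resid)
p_val  <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

round(c(R2 = r2, adjR2 = adj_r2, F = f_stat, p = p_val), 4)
```

These should line up with the Multiple R-squared, Adjusted R-squared, and F-statistic lines of summary(model_all).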
m_no <- lm(rainfall ~ sne, data = filter(clouds, seeding == "no"))
m_yes <- lm(rainfall ~ sne, data = filter(clouds, seeding == "yes"))
coef_no <- coef(summary(m_no))
coef_yes <- coef(summary(m_yes))
coef_no; coef_yes
##              Estimate Std. Error   t value   Pr(>|t|)
## (Intercept)  7.319500  3.1595671  2.316615 0.04302009
## sne         -1.046371  0.9950467 -1.051580 0.31773971
##              Estimate Std. Error   t value    Pr(>|t|)
## (Intercept) 12.020237   2.977439  4.037106 0.002372239
## sne         -2.218039   0.872211 -2.543008 0.029212123
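Only the seeded slope is significant at α = 0.05 (p = 0.029, vs. p = 0.318 for non-seeded). The more negative seeded slope (-2.218 vs. -1.046) suggests a stronger inverse relationship between suitability and rainfall when seeding is applied, although fitting the two subsets separately does not formally test whether the slopes differ.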
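The t values and p-values for the two slopes follow directly from each estimate and its standard error. A minimal sketch, assuming 10 residual degrees of freedom per fit (12 clouds per group, as in the summary table, minus 2 estimated coefficients); the variable names are illustrative.

```r
# Slope estimates and standard errors copied from the coefficient tables above
est <- c(no = -1.046371, yes = -2.218039)
se  <- c(no = 0.9950467, yes = 0.872211)

# Assumed residual degrees of freedom: 12 observations minus 2 coefficients
df_resid <- 12 - 2

# t statistic for each slope and its two-sided p-value
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)

round(rbind(t = t_val, p = p_val), 4)
```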
ggplot(clouds, aes(x = sne, y = rainfall, color = seeding)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "Suitability Index (sne)",
    y = "Rainfall (×10^8 m^3)",
    title = "Rainfall vs. sne by Seeding Status"
  )
## `geom_smooth()` using formula = 'y ~ x'
The fitted line for seeded clouds has a steeper negative slope (-2.218) than the line for non-seeded clouds (-1.046): each one-unit increase in sne is associated with a drop in rainfall of about 2.22 × 10^8 m^3 when seeding is used, versus about 1.05 × 10^8 m^3 without seeding.
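Because the seeded line starts higher and falls faster, the two fitted lines cross. The sketch below locates that crossing from the reported coefficients; predict_line is a hypothetical helper introduced only for this illustration.

```r
# Coefficients copied from the two simple regressions above
b_no  <- c(intercept = 7.319500,  slope = -1.046371)
b_yes <- c(intercept = 12.020237, slope = -2.218039)

# Hypothetical helper: predicted rainfall on one fitted line at a given sne
predict_line <- function(b, sne) b[["intercept"]] + b[["slope"]] * sne

# sne at which both lines predict the same rainfall
# (set the two line equations equal and solve for sne)
sne_cross <- (b_yes[["intercept"]] - b_no[["intercept"]]) /
  (b_no[["slope"]] - b_yes[["slope"]])

sne_cross
predict_line(b_no, sne_cross)
predict_line(b_yes, sne_cross)
```

Under these fits the seeded line lies above the non-seeded line for sne below roughly 4.0 and below it beyond that point. Whether the crossing falls inside the observed range of sne should be checked against the data before reading much into it.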
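Overall, seeding is not associated with a significant difference in mean rainfall, but it appears to change how suitability relates to rainfall: the negative association with sne is stronger, and statistically significant, only among seeded clouds.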