After exploring your dataset over the past few weeks, you should already have some questions.
Devise at least two different null hypotheses based on two different aspects (e.g., columns) of your data. For each hypothesis:
Build two visualizations that best illustrate the results from the two pairs of hypothesis tests, one for each null hypothesis.
Null Hypothesis (H0): There is no linear correlation between temperature and humidity. Alternative Hypothesis (H1): There is a nonzero linear correlation between temperature and humidity.
We hypothesize that temperature and humidity are correlated.
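Both hypotheses in this analysis are tests about the population correlation ρ. Pearson's test turns the sample correlation r from n paired observations into a t statistic with n - 2 degrees of freedom. Substituting the values reported below for Hypothesis 1 (r = -0.2306, df = n - 2 = 2532) reproduces the t value in the `cor.test()` output:

$$
H_0:\ \rho = 0 \qquad H_1:\ \rho \neq 0
$$

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.2306\,\sqrt{2532}}{\sqrt{1-(-0.2306)^2}} \approx \frac{-11.60}{0.973} \approx -11.92
$$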
Setting Alpha, Power, and Effect Size
For both hypotheses:

- Alpha Level (α): 0.05. This standard choice accepts a 5% risk of rejecting the null hypothesis when it is true (a Type I error).
- Power Level: 0.8. If the true effect is at least the minimum effect size, the test has an 80% chance of correctly rejecting the null hypothesis.
- Minimum Effect Size: This depends on what counts as a practically meaningful relationship in this data. For simplicity, we assume a medium effect size, which for a correlation is conventionally r = 0.3 (Cohen's guideline).
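As a rough check on these settings, the sample size needed to detect r = 0.3 with α = 0.05 and power 0.8 can be approximated with the Fisher z transformation. The sketch below uses only base R; `n_required` and `min_detectable_r` are hypothetical helper names, and the `pwr` package's `pwr.r.test()` uses a slightly different approximation that should give a similar figure. The sample size of 2,534 is inferred from the reported df = 2532.

```r
# Approximate power calculations for a two-sided test of H0: rho = 0,
# based on the Fisher z transformation atanh(r), whose standard error
# is roughly 1 / sqrt(n - 3).

# Smallest n giving the requested power to detect a true correlation r
n_required <- function(r, alpha = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - alpha / 2)  # two-sided critical value
  z_beta  <- qnorm(power)          # normal quantile for the desired power
  ceiling(((z_alpha + z_beta) / atanh(abs(r)))^2 + 3)
}

# Smallest |r| detectable with the requested power at a given n
min_detectable_r <- function(n, alpha = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta  <- qnorm(power)
  tanh((z_alpha + z_beta) / sqrt(n - 3))
}

n_required(0.3)         # 85 observations for a medium effect
min_detectable_r(2534)  # about 0.056 with the n implied by df = 2532
```

With roughly 2,534 observations, these tests are far more than adequately powered for r = 0.3: even a correlation near 0.056 would be detected about 80% of the time, which matters when interpreting small but significant results.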
##
## Call:
## lm(formula = humidity ~ temp, data = weather_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.993 -10.122 5.498 14.734 30.005
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 89.66068 1.43002 62.70 <2e-16 ***
## temp -0.72836 0.06109 -11.92 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.92 on 2532 degrees of freedom
## Multiple R-squared: 0.05316, Adjusted R-squared: 0.05278
## F-statistic: 142.1 on 1 and 2532 DF, p-value: < 2.2e-16
##
## Pearson's product-moment correlation
##
## data: temp and humidity
## t = -11.923, df = 2532, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2670977 -0.1933538
## sample estimates:
## cor
## -0.2305568
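Output like the above comes from fitting a simple linear regression with `lm()` and running Pearson's correlation test with `cor.test()`. A minimal sketch that runs both for any pair of columns, assuming `weather_data` is a data frame with numeric columns such as `temp` and `humidity` (the `lm()` formula is taken from the output; the exact `cor.test()` call used originally is not shown, and `test_relationship` is a hypothetical helper name):

```r
# Run both tests used for each hypothesis: a simple linear regression
# (y ~ x) and Pearson's two-sided correlation test of H0: rho = 0.
# `data` is assumed to be a data frame such as weather_data; `x` and `y`
# are column names given as strings.
test_relationship <- function(data, x, y, alpha = 0.05) {
  fit <- lm(reformulate(x, response = y), data = data)
  ct <- cor.test(data[[x]], data[[y]],
                 method = "pearson", alternative = "two.sided")
  list(
    fit       = fit,
    slope     = unname(coef(fit)[x]),   # estimated change in y per unit x
    r         = unname(ct$estimate),    # sample correlation
    t         = unname(ct$statistic),   # t statistic, df = n - 2
    p_value   = ct$p.value,
    reject_h0 = ct$p.value < alpha      # decision at the chosen alpha
  )
}

# Usage (assumes weather_data is loaded):
# h1 <- test_relationship(weather_data, "temp", "humidity")
# summary(h1$fit); h1$r; h1$reject_h0
```

In a simple regression, the t test on the slope and Pearson's correlation test are equivalent and produce the same t statistic and p-value. This is why both outputs above report t = -11.92 with 2532 degrees of freedom.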
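The scatter plot with the regression line shows a weak negative relationship
between temperature and humidity. Because p < 2.2e-16 is far below α = 0.05, we
reject H0. Humidity is estimated to fall by about 0.73 units per unit increase in
temperature, but temperature explains only about 5% of the variance in humidity
(R² = 0.053).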
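One way to draw such a plot in base R is sketched below. `plot_with_fit` is a hypothetical helper, the column names are assumed to match those in the output, and `ggplot2` with `geom_smooth(method = "lm")` would be an equally reasonable choice.

```r
# Draw a scatter plot of column `y` against column `x` and overlay the
# fitted least-squares line. Column names are passed as strings; extra
# arguments (e.g. main) are forwarded to plot().
plot_with_fit <- function(data, x, y, ...) {
  fit <- lm(reformulate(x, response = y), data = data)
  plot(data[[x]], data[[y]], xlab = x, ylab = y, pch = 16,
       col = adjustcolor("grey30", alpha.f = 0.3),  # translucent points
       ...)
  abline(fit, col = "red", lwd = 2)  # regression line from the lm fit
  invisible(fit)
}

# Usage (assumes weather_data is loaded):
# plot_with_fit(weather_data, "temp", "humidity",
#               main = "Humidity vs. temperature")
```

Translucent points help with roughly 2,500 observations, where overplotting can hide how weak the trend is relative to the scatter.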
Null Hypothesis (H0): There is no linear correlation between wind speed and visibility. Alternative Hypothesis (H1): There is a nonzero linear correlation between wind speed and visibility.
We hypothesize that wind speed and visibility are correlated.
##
## Call:
## lm(formula = visibility ~ wind_speed, data = weather_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.4809 0.0285 0.1984 0.2738 22.2606
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.611067 0.088749 108.294 < 2e-16 ***
## wind_speed 0.018868 0.006902 2.734 0.00631 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.551 on 2532 degrees of freedom
## Multiple R-squared: 0.002942, Adjusted R-squared: 0.002549
## F-statistic: 7.472 on 1 and 2532 DF, p-value: 0.006309
##
## Pearson's product-moment correlation
##
## data: wind_speed and visibility
## t = 2.7336, df = 2532, p-value = 0.006309
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.01533817 0.09298693
## sample estimates:
## cor
## 0.05424456
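The scatter plot with the regression line shows a very weak positive
relationship between wind speed and visibility. Because p = 0.0063 is below
α = 0.05, we reject H0, but wind speed explains only about 0.3% of the variance
in visibility (R² = 0.0029).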
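The analysis of the weather dataset finds statistically significant relationships between temperature and humidity and between wind speed and visibility. The test of Hypothesis 1 shows a weak negative correlation between temperature and humidity (r = -0.23), while the test of Hypothesis 2 shows a very weak positive correlation between wind speed and visibility (r = 0.054). Although both relationships are statistically significant, neither reaches the medium effect size (r = 0.3) assumed as the minimum of practical interest: even the confidence-interval bounds farthest from zero (-0.267 and 0.093) fall short of it. With 2,534 observations, even very small correlations reach statistical significance.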
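The distinction between statistical and practical significance can be made explicit by comparing each confidence interval with the minimum effect size. The sketch below only re-enters the estimates printed by `cor.test()` above. It assumes r = 0.3 (the conventional medium correlation) as the minimum effect of interest, and the column names of `results` are illustrative.

```r
# Statistical vs. practical significance for the two hypotheses, using the
# estimates printed by cor.test() above. The first p-value is only known
# to be below 2.2e-16, so that value is used as an upper bound.
results <- data.frame(
  hypothesis = c("temp vs humidity", "wind_speed vs visibility"),
  r          = c(-0.2305568, 0.05424456),
  p_value    = c(2.2e-16, 0.006309),
  ci_low     = c(-0.2670977, 0.01533817),
  ci_high    = c(-0.1933538, 0.09298693)
)

alpha <- 0.05  # significance level set in advance
r_min <- 0.3   # assumed minimum effect of interest (medium correlation)

results$significant <- results$p_value < alpha
results$r_squared   <- results$r^2  # share of variance explained
# TRUE when the entire 95% CI lies inside (-r_min, r_min)
results$below_r_min <- pmax(abs(results$ci_low), abs(results$ci_high)) < r_min

print(results)
```

Squaring r recovers the Multiple R-squared values from the regression output (0.0532 and 0.0029), a quick consistency check between the two analyses.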