1. After exploring your dataset over the past few weeks, you should already have some questions.

  2. Devise at least two different null hypotheses based on two different aspects (e.g., columns) of your data. For each hypothesis:

    • Come up with an alpha level, power level, and minimum effect size, and explain why you chose each value.
    • Determine if you have enough data to perform a Neyman-Pearson hypothesis test. If you do, perform one and interpret results. If not, explain why.
    • Perform a Fisher’s style test for significance, and interpret the p-value.
    • So, technically, you have two hypothesis tests for each hypothesis, for a total of four tests.
  3. Build two visualizations that best illustrate the results from the two pairs of hypothesis tests, one for each null hypothesis.

Hypothesis Testing

Hypothesis 1: Temperature and Humidity Relationship

Null Hypothesis (H0): There is no linear correlation between temperature and humidity (population correlation ρ = 0).
Alternative Hypothesis (H1): Temperature and humidity are linearly correlated (ρ ≠ 0).

We expect temperature and humidity to be correlated.

Setting Alpha, Power, and Effect Size

For both hypotheses:

  • Alpha level (α): 0.05. This is the conventional choice and caps the risk of rejecting a true null hypothesis (a Type I error) at 5%.
  • Power: 0.8. If the true effect is at least the minimum effect size, the test has an 80% chance of correctly rejecting the null hypothesis (a 20% Type II error rate).
  • Minimum effect size: what counts as a practically meaningful relationship depends on the variables involved. For simplicity, we assume a medium effect, which for a correlation corresponds to r = 0.3 under Cohen's conventions.
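Whether the data set is large enough for a Neyman-Pearson test can be checked with a quick sample-size calculation. The R sketch below assumes that "medium effect size" means Cohen's conventional correlation of r = 0.3, and it uses the Fisher z approximation rather than an exact power routine; the helper function names are hypothetical.

```r
# Sample size needed to detect a population correlation of size r with a
# two-sided test, using the Fisher z approximation:
#   n = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3
required_n_for_correlation <- function(r, alpha = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - alpha / 2)  # two-sided critical value (1.96 for 0.05)
  z_power <- qnorm(power)          # about 0.84 for 80% power
  ceiling(((z_alpha + z_power) / atanh(abs(r)))^2 + 3)
}

# Approximate power of the same test at sample size n (the tiny chance of
# rejecting in the wrong tail is ignored).
approx_power_for_correlation <- function(r, n, alpha = 0.05) {
  z_alpha <- qnorm(1 - alpha / 2)
  pnorm(atanh(abs(r)) * sqrt(n - 3) - z_alpha)
}

n_needed   <- required_n_for_correlation(r = 0.3)               # assumed medium effect
power_at_n <- approx_power_for_correlation(r = 0.3, n = 2534)  # n used in the models
print(n_needed)    # 85
print(power_at_n)  # effectively 1
```

Both regressions report 2532 residual degrees of freedom, i.e. 2534 observations, far more than the roughly 85 required, so under these assumptions there is enough data to run a Neyman-Pearson test for each hypothesis at α = 0.05 and 80% power.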

Neyman-Pearson Hypothesis Test (Linear Regression)

## 
## Call:
## lm(formula = humidity ~ temp, data = weather_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -70.993 -10.122   5.498  14.734  30.005 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 89.66068    1.43002   62.70   <2e-16 ***
## temp        -0.72836    0.06109  -11.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.92 on 2532 degrees of freedom
## Multiple R-squared:  0.05316,    Adjusted R-squared:  0.05278 
## F-statistic: 142.1 on 1 and 2532 DF,  p-value: < 2.2e-16
  • The regression model indicates a negative relationship between temperature and humidity.
  • Coefficient for temperature: -0.7284, suggesting that each 1 degree Celsius increase in temperature is associated with a decrease in humidity of approximately 0.73 percentage points.
  • The model has an R-squared value of 0.053, indicating that about 5.3% of the variability in humidity is explained by temperature.
  • The F-statistic (142.1 on 1 and 2532 DF) has a p-value below 2.2e-16, far under α = 0.05, so we reject H0: the relationship between temperature and humidity is statistically significant.
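In the Neyman-Pearson framing the rejection region is fixed before the data are examined, and the decision is simply whether the test statistic lands in it. A minimal R sketch of that decision, using the t and F values copied from the summary above with α = 0.05 and a two-sided t test:

```r
alpha    <- 0.05
df_resid <- 2532     # residual degrees of freedom from summary(lm(...))
t_obs    <- -11.92   # t value of the temp coefficient
f_obs    <- 142.1    # F-statistic on 1 and 2532 DF

# Critical values that define the rejection region.
t_crit <- qt(1 - alpha / 2, df = df_resid)        # two-sided t test
f_crit <- qf(1 - alpha, df1 = 1, df2 = df_resid)  # equals t_crit^2 with one predictor

reject_t <- abs(t_obs) > t_crit
reject_f <- f_obs > f_crit
print(c(t_crit = t_crit, f_crit = f_crit))         # about 1.96 and 3.85
print(c(reject_t = reject_t, reject_f = reject_f)) # both TRUE: reject H0
```

Both statistics fall far beyond their critical values, so H0 is rejected; in this framework the decision, not the size of the p-value, is the result.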

Fisher’s Style Test (Pearson Correlation)

## 
##  Pearson's product-moment correlation
## 
## data:  temp and humidity
## t = -11.923, df = 2532, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2670977 -0.1933538
## sample estimates:
##        cor 
## -0.2305568
  • Correlation Coefficient: -0.231, indicating a weak negative correlation between temperature and humidity.
  • P-value: R prints "p-value < 2.2e-16" because it does not display values below machine precision; the exact value is approximately 6.35e-32, far below the alpha level of 0.05, so the correlation between temperature and humidity is statistically significant.
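The printed "p-value < 2.2e-16" is a display floor, not the p-value itself. The R sketch below recomputes the test statistic, an exact two-sided p-value, and the 95% confidence interval from the reported correlation; n = 2534 is inferred from df = 2532. cor.test() builds its interval with the same Fisher z-transformation.

```r
r <- -0.2305568   # sample correlation reported by cor.test()
n <- 2534         # observations, since df = n - 2 = 2532

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value; pt() evaluates the far tail accurately.
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via the Fisher z-transformation.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(t_stat)   # about -11.92, matching cor.test()
print(p_value)  # on the order of 1e-32, far below the 2.2e-16 display floor
print(ci)       # about -0.267 to -0.193
```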

Visualization

The scatter plot with the regression line visually demonstrates the negative relationship between temperature and humidity.
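The figure itself is not reproduced here. A base-R sketch of how such a plot can be drawn follows; plot_with_fit() is a hypothetical helper, and the simulated data frame only stands in for weather_data, whose column names temp and humidity are taken from the lm() call above.

```r
# Scatter plot of y against x with the least-squares line and Pearson r.
# Returns the fitted lm object invisibly.
plot_with_fit <- function(data, x, y, ...) {
  fml <- reformulate(x, response = y)  # e.g. humidity ~ temp
  fit <- lm(fml, data = data)
  plot(fml, data = data, pch = 16,
       col = adjustcolor("grey30", alpha.f = 0.3), ...)
  abline(fit, col = "red", lwd = 2)
  r <- cor(data[[x]], data[[y]], use = "complete.obs")
  legend("topright", legend = sprintf("r = %.3f", r), bty = "n")
  invisible(fit)
}

# Illustration on simulated (hypothetical) data; with the real data the call
# would be plot_with_fit(weather_data, "temp", "humidity").
set.seed(1)
fake <- data.frame(temp = runif(200, min = 0, max = 35))
fake$humidity <- 90 - 0.7 * fake$temp + rnorm(200, sd = 20)

pdf(NULL)  # discard output; drop this line to draw on the screen device
fit <- plot_with_fit(fake, "temp", "humidity",
                     xlab = "Temperature (deg C)", ylab = "Humidity (%)")
invisible(dev.off())
```

Calling the same helper with x = "wind_speed" and y = "visibility" would give the corresponding plot for Hypothesis 2.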

Hypothesis 2: Wind Speed and Visibility

Null Hypothesis (H0): Wind speed is not linearly associated with visibility (population correlation ρ = 0).
Alternative Hypothesis (H1): Wind speed is linearly associated with visibility (ρ ≠ 0).

We hypothesize that wind speed affects visibility.

Neyman-Pearson Hypothesis Test (Linear Regression)

## 
## Call:
## lm(formula = visibility ~ wind_speed, data = weather_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.4809   0.0285   0.1984   0.2738  22.2606 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.611067   0.088749 108.294  < 2e-16 ***
## wind_speed  0.018868   0.006902   2.734  0.00631 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.551 on 2532 degrees of freedom
## Multiple R-squared:  0.002942,   Adjusted R-squared:  0.002549 
## F-statistic: 7.472 on 1 and 2532 DF,  p-value: 0.006309
  • The regression model shows a very slight positive relationship between wind speed and visibility.
  • Coefficient for wind speed: 0.0189, indicating that each 1 km/h increase in wind speed is associated with an increase in visibility of approximately 0.019 km.
  • The model has an R-squared value of 0.003, suggesting that only about 0.3% of the variability in visibility is explained by wind speed.
  • The F-statistic’s p-value is 0.00631, which is below the alpha level of 0.05, indicating that the relationship, although very weak, is statistically significant.
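Whether the alternative is two-sided or directional matters here. The p-values reported by summary(lm()) and cor.test() are two-sided, while a directional alternative such as "higher wind speeds reduce visibility" calls for a one-sided p-value in the negative direction. The R sketch below compares the three versions using the reported t value:

```r
t_obs    <- 2.734  # t value of the wind_speed coefficient
df_resid <- 2532   # residual degrees of freedom

p_two_sided <- 2 * pt(-abs(t_obs), df_resid)           # H1: slope != 0
p_reduces   <- pt(t_obs, df_resid)                     # H1: slope < 0
p_increases <- pt(t_obs, df_resid, lower.tail = FALSE) # H1: slope > 0

# Roughly 0.006, 0.997 and 0.003 respectively.
print(round(c(two_sided = p_two_sided, reduces = p_reduces,
              increases = p_increases), 4))
```

Because the estimated slope is positive, a one-sided test for a reduction gives a p-value near 1 and would not reject H0; the significant result supports an association (in the positive direction), not a reduction.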

Fisher’s Style Test (Pearson Correlation)

## 
##  Pearson's product-moment correlation
## 
## data:  wind_speed and visibility
## t = 2.7336, df = 2532, p-value = 0.006309
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01533817 0.09298693
## sample estimates:
##        cor 
## 0.05424456
  • Correlation Coefficient: 0.0542, indicating a very weak positive correlation between wind speed and visibility.
  • P-value: Approximately 0.00631, which is below the alpha level of 0.05, suggesting that there is a statistically significant, though weak, correlation between wind speed and visibility.
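The regression and correlation p-values for each hypothesis are identical (0.006309 here, below 2.2e-16 for Hypothesis 1) because, with a single predictor, the slope t-test and the Pearson correlation t-test are the same test. The Neyman-Pearson and Fisher approaches differ in how the result is used (a fixed-α reject/retain decision versus a p-value read as strength of evidence), not in the statistic. The arithmetic check below uses only values printed above:

```r
r_cor  <- 0.05424456  # Pearson r from cor.test()
t_cor  <- 2.7336      # t from cor.test()
t_reg  <- 2.734       # t value of wind_speed in summary(lm(...))
r2_reg <- 0.002942    # Multiple R-squared from summary(lm(...))
f_reg  <- 7.472       # F-statistic from summary(lm(...))

# Identities for a regression with one predictor:
#   R^2 = r^2, F = t^2, and the slope t equals the correlation t.
print(c(r_squared = r_cor^2, reported_r2 = r2_reg))
print(c(t_squared = t_cor^2, reported_f = f_reg))
print(c(t_cor = t_cor, t_reg = t_reg))
```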

Visualization

The scatter plot with the regression line shows the weak positive relationship between wind speed and visibility.

Conclusion

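The analysis of the weather dataset finds statistically significant relationships between temperature and humidity and between wind speed and visibility. Hypothesis 1 reveals a weak negative correlation (r = -0.231), with temperature explaining about 5.3% of the variability in humidity, while Hypothesis 2 indicates a very weak positive correlation (r = 0.054), with wind speed explaining only about 0.3% of the variability in visibility. With 2534 observations, even very small correlations reach statistical significance, so both relationships should be read as statistically significant but weak in practical terms.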
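One way to show both test results on a single figure, and to relate them to the medium effect size assumed at the start, is a confidence-interval plot. The R sketch below uses the estimates and 95% intervals printed by cor.test(); it assumes the medium effect is Cohen's r = 0.3, and plot_correlation_cis() is a hypothetical base-graphics helper.

```r
# Estimates and 95% confidence intervals reported by cor.test().
results <- data.frame(
  hypothesis = c("Temperature vs humidity", "Wind speed vs visibility"),
  r     = c(-0.2305568, 0.05424456),
  lower = c(-0.2670977, 0.01533817),
  upper = c(-0.1933538, 0.09298693)
)

# Point estimate with CI for each pair, a solid line at rho = 0 (H0) and
# dashed lines at +/- the assumed medium effect size.
plot_correlation_cis <- function(res, medium = 0.3) {
  old_par <- par(mar = c(4, 12, 1, 1))  # wide left margin for the labels
  on.exit(par(old_par))
  y <- seq_len(nrow(res))
  plot(res$r, y, xlim = c(-0.5, 0.5), ylim = c(0.5, nrow(res) + 0.5),
       pch = 19, yaxt = "n", xlab = "Pearson correlation (95% CI)", ylab = "")
  axis(2, at = y, labels = res$hypothesis, las = 1)
  arrows(res$lower, y, res$upper, y, angle = 90, code = 3, length = 0.05)
  abline(v = 0, col = "grey50")
  abline(v = c(-medium, medium), lty = 2, col = "red")
  invisible(res)
}

# An interval that excludes 0 matches rejecting H0 at alpha = 0.05.
excludes_zero <- results$lower > 0 | results$upper < 0
below_medium  <- pmax(abs(results$lower), abs(results$upper)) < 0.3

pdf(NULL)  # discard output; drop this line to draw on the screen device
plot_correlation_cis(results)
invisible(dev.off())
print(data.frame(results["hypothesis"], excludes_zero, below_medium))
```

Both intervals exclude zero, consistent with the significant tests, and both lie entirely inside ±0.3, consistent with the conclusion that the relationships are weak.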