1. Select a continuous (or ordered integer) column of data that seems most “valuable” given the context of your data, and call this your response variable.
  • For example, in the Ames housing data, the price of the house is likely of the most value to both buyers and sellers. This is the thing most people will ask about when it comes to houses.
2. Select a categorical column of data (explanatory variable) that you expect might influence the response variable.
  • Devise a null hypothesis for an ANOVA test given this situation. Test this hypothesis using ANOVA, and summarize your results. Be clear about how the R output relates to your conclusions.
  • If there are more than 10 categories, consider consolidating them before running the test using the methods we’ve learned in class.
  • Explain what this might mean for people who may be interested in your data. E.g., “there is not enough evidence to conclude [____], so it would be safe to assume that we can [____]”.
3. Find at least one other continuous (or ordered integer) column of data that might influence the response variable. Make sure the relationship between this variable and the response is roughly linear.
  • Build a linear regression model of the response using just this column, and evaluate its fit.
  • Run appropriate hypothesis tests and summarize their results. Use diagnostic plots to identify any issues with your model.
  • Interpret the coefficients of your model, and explain how they relate to the context of your data. For example, can you make any recommendations about an optimal way of doing something?
4. Include at least one other variable into your regression model (e.g., you might use the one from the ANOVA), and evaluate how it helps (or doesn’t).
  • Maybe include an interaction term, but explain why you included it.
  • You can add up to 4 variables if you like.

For each of the above tasks, you must explain to the reader what insight was gathered, its significance, and any further questions you have which might need to be further investigated.


Selecting a Response (Continuous/Ordered Integer) and Explanatory Categorical Variable

We use temperature (temperature_celsius) as the response variable and weather condition (condition_text) as the categorical explanatory variable, and analyze how weather conditions affect temperature.

Formulating a null hypothesis for an ANOVA test

We examine whether average temperature (temperature_celsius) differs significantly across weather conditions (condition_text).

Null Hypothesis: The mean temperature is the same for all weather conditions.

Alternate Hypothesis: The mean temperature differs for at least one weather condition.
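A one-way ANOVA of this kind can be run in R with `aov()`. The sketch below is a minimal illustration only: `toy` is a small synthetic data frame invented here so that the block runs on its own, and its group means are arbitrary. With the real data, the same formula would be applied to `weather_data`.

```r
# Synthetic stand-in for weather_data; the values are invented for illustration only.
set.seed(1)
toy <- data.frame(
  condition_text = rep(c("Sunny", "Fog", "Light rain"), each = 30),
  temperature_celsius = c(rnorm(30, mean = 25, sd = 5),
                          rnorm(30, mean = 15, sd = 5),
                          rnorm(30, mean = 22, sd = 5))
)

# One-way ANOVA: under the null hypothesis every condition has the same mean temperature.
fit <- aov(temperature_celsius ~ factor(condition_text), data = toy)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Pull the exact p-value instead of reading a truncated "<2e-16" display.
p_value <- anova_table[["Pr(>F)"]][1]
print(p_value)
```

Extracting the p-value from the table object is one way to report a figure more precise than the printed summary allows.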

##                          Df Sum Sq Mean Sq F value Pr(>F)    
## factor(condition_text)   21   8940   425.7   10.99 <2e-16 ***
## Residuals              2512  97331    38.7                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results of the ANOVA test are as follows:

  • Sum of Squares for condition_text (Sum Sq): \(8939.96\)
  • Degrees of Freedom for condition_text (Df): \(21\)
  • F-Statistic (F value): \(10.99\)
  • P-Value (Pr(>F)): \(1.97 \times 10^{-35}\), which R displays as <2e-16
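As a rough check on how these numbers fit together, the F-statistic can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table, and the p-value from the upper tail of the F distribution. The sketch below uses the rounded values printed in the table, so small rounding differences from the reported figures are to be expected.

```r
# Values copied from the ANOVA table in the report (rounded as printed).
ss_between <- 8940;  df_between <- 21
ss_within  <- 97331; df_within  <- 2512

# F is the ratio of the between-group mean square to the within-group mean square.
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat <- ms_between / ms_within

# The upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```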

Observations:

  • The extremely low P-value (\(1.97 \times 10^{-35}\)) is far below the usual 0.05 threshold, so there is a statistically significant difference in average temperatures across weather conditions.
  • This result leads us to reject the null hypothesis that weather condition has no effect on average temperature. At least one weather condition has a different mean temperature from the others, although the ANOVA alone does not tell us which conditions differ.

Identifying another Continuous Variable with Linear Relationship

For the linear regression model, we keep temperature_celsius as the dependent variable and select air_quality_PM2.5 as the independent variable.
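A simple linear regression like this is fitted in R with `lm()`. The sketch below is illustrative only: `toy` is a synthetic data frame invented for this example, built so that temperature is unrelated to PM2.5, and it is not the report's data. It shows how the slope row and the R-squared value can be read from the summary object.

```r
# Synthetic stand-in for weather_data; invented values, no real measurements.
set.seed(2)
toy <- data.frame(air_quality_PM2.5 = runif(200, min = 0, max = 150))
toy$temperature_celsius <- 22 + rnorm(200, sd = 6)  # unrelated to PM2.5 by construction

# Simple linear regression of temperature on PM2.5.
fit <- lm(temperature_celsius ~ air_quality_PM2.5, data = toy)
fit_summary <- summary(fit)

# Slope estimate, its standard error, t value and p-value.
print(coef(fit_summary)["air_quality_PM2.5", ])

# Share of variance explained by the model.
print(fit_summary$r.squared)
```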

## 
## Call:
## lm(formula = temperature_celsius ~ air_quality_PM2.5, data = weather_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.5595  -4.6709   0.8365   4.9909  22.3935 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       22.458056   0.137562 163.257   <2e-16 ***
## air_quality_PM2.5  0.001797   0.002410   0.746    0.456    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.478 on 2532 degrees of freedom
## Multiple R-squared:  0.0002196,  Adjusted R-squared:  -0.0001752 
## F-statistic: 0.5563 on 1 and 2532 DF,  p-value: 0.4558

The results of the Linear Regression are as follows:

  • R-squared: \(0.0002\), indicating that the model explains essentially none (about 0.02%) of the variability of the response data around its mean. The adjusted R-squared is slightly negative (\(-0.0002\)).
  • F-statistic: \(0.5563\) on 1 and 2532 degrees of freedom, with a P-value of \(0.456\), suggesting that the model is not statistically significant.
  • Coefficients:
    • Constant (Intercept): \(22.4581\), the estimated average temperature (in °C) when PM2.5 is zero.
    • air_quality_PM2.5: \(0.0018\), the estimated change in temperature (in °C) per one-unit increase in PM2.5. With a P-value of \(0.456\), the effect of air quality (PM2.5) on temperature is not statistically significant.

Observations:

  • The model does not provide strong evidence to suggest a significant linear relationship between air quality (PM2.5) and temperature (Celsius).
  • Given the high P-value for the air_quality_PM2.5 coefficient and the low R-squared value, it’s clear that this model doesn’t effectively explain variations in temperature based on air quality PM2.5 levels.
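Diagnostic plots are a common way to look for problems with a fit like this. The sketch below is a minimal example on synthetic data (the `toy` data frame is invented here and is not the report's data); it draws the residuals-versus-fitted and normal Q-Q plots with base R's `plot()` method for `lm` objects. For comparison, the residual quartiles reported in the summary above (\(-19.56\), \(-4.67\), \(0.84\), \(4.99\), \(22.39\)) are roughly symmetric around zero.

```r
# Synthetic stand-in for weather_data; invented values for illustration only.
set.seed(4)
toy <- data.frame(air_quality_PM2.5 = runif(200, min = 0, max = 150))
toy$temperature_celsius <- 22 + rnorm(200, sd = 6)

fit <- lm(temperature_celsius ~ air_quality_PM2.5, data = toy)

# Residuals vs fitted (curvature, unequal spread) and normal Q-Q (non-normal residuals).
par(mfrow = c(1, 2))
plot(fit, which = 1:2)

# Numeric companion to the plots: residual quantiles should sit roughly symmetric around 0.
print(quantile(resid(fit)))
```

A curved pattern in the first plot would hint that a straight line is the wrong shape, and points bending away from the line in the second would hint at non-normal residuals.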

Optimal way using Polynomial Regression?

## 
## Call:
## lm(formula = temperature_celsius ~ air_quality_PM2.5 + I(air_quality_PM2.5^2), 
##     data = weather_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.366  -4.777   0.719   4.861  21.567 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.225e+01  1.512e-01 147.149  < 2e-16 ***
## air_quality_PM2.5       1.670e-02  5.189e-03   3.218  0.00131 ** 
## I(air_quality_PM2.5^2) -2.924e-05  9.022e-06  -3.241  0.00121 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.466 on 2531 degrees of freedom
## Multiple R-squared:  0.004351,   Adjusted R-squared:  0.003564 
## F-statistic:  5.53 on 2 and 2531 DF,  p-value: 0.004015

Observations

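  • I(air_quality_PM2.5^2) (\(\beta_2\)): \(-2.924 \times 10^{-5}\) captures the non-linear effect of PM2.5 on temperature. Combined with the positive linear coefficient (\(1.670 \times 10^{-2}\)), the negative sign means the fitted curve rises at low PM2.5 levels and turns downward after a certain point. Both terms are statistically significant (P-values of \(0.0013\) and \(0.0012\)).
  • Despite this, the R-squared value is only \(0.0044\), so the model explains less than 0.5% of the variation in temperature and does not capture the relationship between temperature and air quality (PM2.5) well.
  • This suggests that other factors, not accounted for in this model, drive most of the variation in temperature.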
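The "certain point" at which the fitted quadratic turns downward can be worked out from the reported coefficients: a parabola \(\beta_0 + \beta_1 x + \beta_2 x^2\) with \(\beta_2 < 0\) peaks at \(x = -\beta_1 / (2\beta_2)\). The sketch below applies this to the rounded estimates printed in the summary, which gives a turning point of about 286 in the units PM2.5 is recorded in. Whether the data contain many observations near that level is not shown in the report, so the figure should be read with caution.

```r
# Coefficients copied from the quadratic model summary in the report (rounded as printed).
b0 <- 2.225e+01   # intercept
b1 <- 1.670e-02   # linear term
b2 <- -2.924e-05  # quadratic term

# The curve peaks where its derivative b1 + 2 * b2 * x equals zero.
turning_point <- -b1 / (2 * b2)

# Fitted temperature at that PM2.5 level.
peak_temperature <- b0 + b1 * turning_point + b2 * turning_point^2

print(c(turning_point = turning_point, peak_temperature = peak_temperature))
```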

Introducing an additional variable into the Regression Model

## 
## Call:
## lm(formula = temperature_celsius ~ air_quality_PM2.5 * factor(condition_text), 
##     data = weather_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.1930  -4.3248   0.5969   4.5097  20.4847 
## 
## Coefficients: (2 not defined because of singularities)
##                                                                               Estimate
## (Intercept)                                                                  20.271639
## air_quality_PM2.5                                                             0.019995
## factor(condition_text)Cloudy                                                 -1.722313
## factor(condition_text)Fog                                                    -7.184122
## factor(condition_text)Heavy rain                                              3.704204
## factor(condition_text)Heavy rain at times                                    13.400701
## factor(condition_text)Light drizzle                                          -9.265671
## factor(condition_text)Light rain                                              1.737184
## factor(condition_text)Light rain shower                                       3.215193
## factor(condition_text)Mist                                                    2.191950
## factor(condition_text)Moderate or heavy rain shower                           3.754595
## factor(condition_text)Moderate or heavy rain with thunder                     3.795936
## factor(condition_text)Moderate rain                                          -0.636838
## factor(condition_text)Moderate rain at times                                -11.760291
## factor(condition_text)Overcast                                                1.406537
## factor(condition_text)Partly cloudy                                           4.153278
## factor(condition_text)Patchy light drizzle                                   -2.897606
## factor(condition_text)Patchy light rain                                       4.664378
## factor(condition_text)Patchy light rain with thunder                          5.988027
## factor(condition_text)Patchy rain possible                                    2.840489
## factor(condition_text)Sunny                                                   2.383336
## factor(condition_text)Thundery outbreaks possible                             5.221131
## factor(condition_text)Torrential rain shower                                  5.919104
## air_quality_PM2.5:factor(condition_text)Cloudy                                0.114687
## air_quality_PM2.5:factor(condition_text)Fog                                   0.034967
## air_quality_PM2.5:factor(condition_text)Heavy rain                           -0.249830
## air_quality_PM2.5:factor(condition_text)Heavy rain at times                  -2.402973
## air_quality_PM2.5:factor(condition_text)Light drizzle                         0.190373
## air_quality_PM2.5:factor(condition_text)Light rain                           -0.018484
## air_quality_PM2.5:factor(condition_text)Light rain shower                    -0.033577
## air_quality_PM2.5:factor(condition_text)Mist                                 -0.017868
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain shower        -0.292910
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain with thunder  -0.036616
## air_quality_PM2.5:factor(condition_text)Moderate rain                        -0.057087
## air_quality_PM2.5:factor(condition_text)Moderate rain at times                0.777919
## air_quality_PM2.5:factor(condition_text)Overcast                             -0.054563
## air_quality_PM2.5:factor(condition_text)Partly cloudy                        -0.031817
## air_quality_PM2.5:factor(condition_text)Patchy light drizzle                        NA
## air_quality_PM2.5:factor(condition_text)Patchy light rain                           NA
## air_quality_PM2.5:factor(condition_text)Patchy light rain with thunder       -0.166316
## air_quality_PM2.5:factor(condition_text)Patchy rain possible                 -0.058718
## air_quality_PM2.5:factor(condition_text)Sunny                                 0.002527
## air_quality_PM2.5:factor(condition_text)Thundery outbreaks possible          -0.238588
## air_quality_PM2.5:factor(condition_text)Torrential rain shower               -1.646088
##                                                                             Std. Error
## (Intercept)                                                                   0.243997
## air_quality_PM2.5                                                             0.006587
## factor(condition_text)Cloudy                                                  2.180961
## factor(condition_text)Fog                                                     1.602128
## factor(condition_text)Heavy rain                                              3.687890
## factor(condition_text)Heavy rain at times                                    10.547545
## factor(condition_text)Light drizzle                                           5.103785
## factor(condition_text)Light rain                                              0.877597
## factor(condition_text)Light rain shower                                       0.887078
## factor(condition_text)Mist                                                    0.970734
## factor(condition_text)Moderate or heavy rain shower                           2.763893
## factor(condition_text)Moderate or heavy rain with thunder                     1.283121
## factor(condition_text)Moderate rain                                           1.667590
## factor(condition_text)Moderate rain at times                                  8.606456
## factor(condition_text)Overcast                                                0.796875
## factor(condition_text)Partly cloudy                                           0.320873
## factor(condition_text)Patchy light drizzle                                    6.194774
## factor(condition_text)Patchy light rain                                       6.195086
## factor(condition_text)Patchy light rain with thunder                          1.780255
## factor(condition_text)Patchy rain possible                                    0.970894
## factor(condition_text)Sunny                                                   0.639261
## factor(condition_text)Thundery outbreaks possible                             7.589107
## factor(condition_text)Torrential rain shower                                  7.050596
## air_quality_PM2.5:factor(condition_text)Cloudy                                0.043046
## air_quality_PM2.5:factor(condition_text)Fog                                   0.061453
## air_quality_PM2.5:factor(condition_text)Heavy rain                            0.113097
## air_quality_PM2.5:factor(condition_text)Heavy rain at times                   1.862763
## air_quality_PM2.5:factor(condition_text)Light drizzle                         0.234618
## air_quality_PM2.5:factor(condition_text)Light rain                            0.033925
## air_quality_PM2.5:factor(condition_text)Light rain shower                     0.024187
## air_quality_PM2.5:factor(condition_text)Mist                                  0.007423
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain shower         0.172055
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain with thunder   0.027022
## air_quality_PM2.5:factor(condition_text)Moderate rain                         0.041082
## air_quality_PM2.5:factor(condition_text)Moderate rain at times                0.906440
## air_quality_PM2.5:factor(condition_text)Overcast                              0.037697
## air_quality_PM2.5:factor(condition_text)Partly cloudy                         0.008502
## air_quality_PM2.5:factor(condition_text)Patchy light drizzle                        NA
## air_quality_PM2.5:factor(condition_text)Patchy light rain                           NA
## air_quality_PM2.5:factor(condition_text)Patchy light rain with thunder        0.094432
## air_quality_PM2.5:factor(condition_text)Patchy rain possible                  0.035627
## air_quality_PM2.5:factor(condition_text)Sunny                                 0.011776
## air_quality_PM2.5:factor(condition_text)Thundery outbreaks possible           0.318171
## air_quality_PM2.5:factor(condition_text)Torrential rain shower                2.046957
##                                                                             t value
## (Intercept)                                                                  83.081
## air_quality_PM2.5                                                             3.035
## factor(condition_text)Cloudy                                                 -0.790
## factor(condition_text)Fog                                                    -4.484
## factor(condition_text)Heavy rain                                              1.004
## factor(condition_text)Heavy rain at times                                     1.271
## factor(condition_text)Light drizzle                                          -1.815
## factor(condition_text)Light rain                                              1.979
## factor(condition_text)Light rain shower                                       3.624
## factor(condition_text)Mist                                                    2.258
## factor(condition_text)Moderate or heavy rain shower                           1.358
## factor(condition_text)Moderate or heavy rain with thunder                     2.958
## factor(condition_text)Moderate rain                                          -0.382
## factor(condition_text)Moderate rain at times                                 -1.366
## factor(condition_text)Overcast                                                1.765
## factor(condition_text)Partly cloudy                                          12.944
## factor(condition_text)Patchy light drizzle                                   -0.468
## factor(condition_text)Patchy light rain                                       0.753
## factor(condition_text)Patchy light rain with thunder                          3.364
## factor(condition_text)Patchy rain possible                                    2.926
## factor(condition_text)Sunny                                                   3.728
## factor(condition_text)Thundery outbreaks possible                             0.688
## factor(condition_text)Torrential rain shower                                  0.840
## air_quality_PM2.5:factor(condition_text)Cloudy                                2.664
## air_quality_PM2.5:factor(condition_text)Fog                                   0.569
## air_quality_PM2.5:factor(condition_text)Heavy rain                           -2.209
## air_quality_PM2.5:factor(condition_text)Heavy rain at times                  -1.290
## air_quality_PM2.5:factor(condition_text)Light drizzle                         0.811
## air_quality_PM2.5:factor(condition_text)Light rain                           -0.545
## air_quality_PM2.5:factor(condition_text)Light rain shower                    -1.388
## air_quality_PM2.5:factor(condition_text)Mist                                 -2.407
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain shower        -1.702
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain with thunder  -1.355
## air_quality_PM2.5:factor(condition_text)Moderate rain                        -1.390
## air_quality_PM2.5:factor(condition_text)Moderate rain at times                0.858
## air_quality_PM2.5:factor(condition_text)Overcast                             -1.447
## air_quality_PM2.5:factor(condition_text)Partly cloudy                        -3.742
## air_quality_PM2.5:factor(condition_text)Patchy light drizzle                     NA
## air_quality_PM2.5:factor(condition_text)Patchy light rain                        NA
## air_quality_PM2.5:factor(condition_text)Patchy light rain with thunder       -1.761
## air_quality_PM2.5:factor(condition_text)Patchy rain possible                 -1.648
## air_quality_PM2.5:factor(condition_text)Sunny                                 0.215
## air_quality_PM2.5:factor(condition_text)Thundery outbreaks possible          -0.750
## air_quality_PM2.5:factor(condition_text)Torrential rain shower               -0.804
##                                                                             Pr(>|t|)
## (Intercept)                                                                  < 2e-16
## air_quality_PM2.5                                                           0.002427
## factor(condition_text)Cloudy                                                0.429776
## factor(condition_text)Fog                                                   7.65e-06
## factor(condition_text)Heavy rain                                            0.315272
## factor(condition_text)Heavy rain at times                                   0.204024
## factor(condition_text)Light drizzle                                         0.069575
## factor(condition_text)Light rain                                            0.047872
## factor(condition_text)Light rain shower                                     0.000295
## factor(condition_text)Mist                                                  0.024030
## factor(condition_text)Moderate or heavy rain shower                         0.174446
## factor(condition_text)Moderate or heavy rain with thunder                   0.003122
## factor(condition_text)Moderate rain                                         0.702575
## factor(condition_text)Moderate rain at times                                0.171921
## factor(condition_text)Overcast                                              0.077675
## factor(condition_text)Partly cloudy                                          < 2e-16
## factor(condition_text)Patchy light drizzle                                  0.640004
## factor(condition_text)Patchy light rain                                     0.451572
## factor(condition_text)Patchy light rain with thunder                        0.000781
## factor(condition_text)Patchy rain possible                                  0.003469
## factor(condition_text)Sunny                                                 0.000197
## factor(condition_text)Thundery outbreaks possible                           0.491531
## factor(condition_text)Torrential rain shower                                0.401259
## air_quality_PM2.5:factor(condition_text)Cloudy                              0.007765
## air_quality_PM2.5:factor(condition_text)Fog                                 0.569407
## air_quality_PM2.5:factor(condition_text)Heavy rain                          0.027266
## air_quality_PM2.5:factor(condition_text)Heavy rain at times                 0.197169
## air_quality_PM2.5:factor(condition_text)Light drizzle                       0.417204
## air_quality_PM2.5:factor(condition_text)Light rain                          0.585910
## air_quality_PM2.5:factor(condition_text)Light rain shower                   0.165182
## air_quality_PM2.5:factor(condition_text)Mist                                0.016154
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain shower       0.088801
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain with thunder 0.175518
## air_quality_PM2.5:factor(condition_text)Moderate rain                       0.164784
## air_quality_PM2.5:factor(condition_text)Moderate rain at times              0.390857
## air_quality_PM2.5:factor(condition_text)Overcast                            0.147909
## air_quality_PM2.5:factor(condition_text)Partly cloudy                       0.000187
## air_quality_PM2.5:factor(condition_text)Patchy light drizzle                      NA
## air_quality_PM2.5:factor(condition_text)Patchy light rain                         NA
## air_quality_PM2.5:factor(condition_text)Patchy light rain with thunder      0.078322
## air_quality_PM2.5:factor(condition_text)Patchy rain possible                0.099457
## air_quality_PM2.5:factor(condition_text)Sunny                               0.830109
## air_quality_PM2.5:factor(condition_text)Thundery outbreaks possible         0.453402
## air_quality_PM2.5:factor(condition_text)Torrential rain shower              0.421379
##                                                                                
## (Intercept)                                                                 ***
## air_quality_PM2.5                                                           ** 
## factor(condition_text)Cloudy                                                   
## factor(condition_text)Fog                                                   ***
## factor(condition_text)Heavy rain                                               
## factor(condition_text)Heavy rain at times                                      
## factor(condition_text)Light drizzle                                         .  
## factor(condition_text)Light rain                                            *  
## factor(condition_text)Light rain shower                                     ***
## factor(condition_text)Mist                                                  *  
## factor(condition_text)Moderate or heavy rain shower                            
## factor(condition_text)Moderate or heavy rain with thunder                   ** 
## factor(condition_text)Moderate rain                                            
## factor(condition_text)Moderate rain at times                                   
## factor(condition_text)Overcast                                              .  
## factor(condition_text)Partly cloudy                                         ***
## factor(condition_text)Patchy light drizzle                                     
## factor(condition_text)Patchy light rain                                        
## factor(condition_text)Patchy light rain with thunder                        ***
## factor(condition_text)Patchy rain possible                                  ** 
## factor(condition_text)Sunny                                                 ***
## factor(condition_text)Thundery outbreaks possible                              
## factor(condition_text)Torrential rain shower                                   
## air_quality_PM2.5:factor(condition_text)Cloudy                              ** 
## air_quality_PM2.5:factor(condition_text)Fog                                    
## air_quality_PM2.5:factor(condition_text)Heavy rain                          *  
## air_quality_PM2.5:factor(condition_text)Heavy rain at times                    
## air_quality_PM2.5:factor(condition_text)Light drizzle                          
## air_quality_PM2.5:factor(condition_text)Light rain                             
## air_quality_PM2.5:factor(condition_text)Light rain shower                      
## air_quality_PM2.5:factor(condition_text)Mist                                *  
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain shower       .  
## air_quality_PM2.5:factor(condition_text)Moderate or heavy rain with thunder    
## air_quality_PM2.5:factor(condition_text)Moderate rain                          
## air_quality_PM2.5:factor(condition_text)Moderate rain at times                 
## air_quality_PM2.5:factor(condition_text)Overcast                               
## air_quality_PM2.5:factor(condition_text)Partly cloudy                       ***
## air_quality_PM2.5:factor(condition_text)Patchy light drizzle                   
## air_quality_PM2.5:factor(condition_text)Patchy light rain                      
## air_quality_PM2.5:factor(condition_text)Patchy light rain with thunder      .  
## air_quality_PM2.5:factor(condition_text)Patchy rain possible                .  
## air_quality_PM2.5:factor(condition_text)Sunny                                  
## air_quality_PM2.5:factor(condition_text)Thundery outbreaks possible            
## air_quality_PM2.5:factor(condition_text)Torrential rain shower                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.191 on 2492 degrees of freedom
## Multiple R-squared:  0.1013, Adjusted R-squared:  0.08653 
## F-statistic: 6.852 on 41 and 2492 DF,  p-value: < 2.2e-16

Observations

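  • The Multiple R-squared of 0.1013 indicates that the model explains about 10.13% of the variance in temperature_celsius. The Adjusted R-squared, which penalizes the 41 estimated terms, is lower at 0.0865. Both values suggest that this model has limited predictive power, although it is a clear improvement over the single-variable models above.
  • The significance of each coefficient is determined by its p-value, and the asterisks mark the level: *** for p < 0.001, ** for p < 0.01, * for p < 0.05, and . for p < 0.1. Each condition_text coefficient is a difference from the reference condition, which does not appear in the table. Several categories show significant differences, including ‘Partly cloudy,’ ‘Sunny,’ ‘Fog,’ ‘Light rain shower,’ ‘Patchy light rain with thunder,’ ‘Patchy rain possible,’ and ‘Moderate or heavy rain with thunder.’
  • Among the interaction terms, ‘Cloudy,’ ‘Heavy rain,’ ‘Mist,’ and ‘Partly cloudy’ are significant at the 0.05 level, meaning the slope of temperature on PM2.5 differs from the reference condition for those categories.
  • Two interaction coefficients (‘Patchy light drizzle’ and ‘Patchy light rain’) are NA because of singularities, which usually happens when a category has too few observations to estimate its own slope.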
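One way to judge whether the interaction terms help is to compare the interaction model with a simpler additive model using a partial F-test via `anova()`. The sketch below is an illustration on synthetic data: the `toy` data frame, its three conditions, and their slopes are all invented here, and the additive model is an assumption of this example, not something fitted in the report.

```r
# Synthetic stand-in for weather_data; invented values for illustration only.
set.seed(3)
n <- 300
toy <- data.frame(
  air_quality_PM2.5 = runif(n, min = 0, max = 150),
  condition_text = sample(c("Sunny", "Fog", "Mist"), n, replace = TRUE)
)

# Give each condition its own baseline temperature and its own PM2.5 slope.
cond <- as.character(toy$condition_text)
baseline <- unname(c(Sunny = 25, Fog = 15, Mist = 20)[cond])
slope    <- unname(c(Sunny = 0.00, Fog = 0.03, Mist = -0.02)[cond])
toy$temperature_celsius <- baseline + slope * toy$air_quality_PM2.5 + rnorm(n, sd = 5)

m_additive    <- lm(temperature_celsius ~ air_quality_PM2.5 + factor(condition_text), data = toy)
m_interaction <- lm(temperature_celsius ~ air_quality_PM2.5 * factor(condition_text), data = toy)

# Partial F-test: do the interaction terms improve on the additive model?
comparison <- anova(m_additive, m_interaction)
print(comparison)

# Adjusted R-squared penalizes the extra parameters.
print(c(additive = summary(m_additive)$adj.r.squared,
        interaction = summary(m_interaction)$adj.r.squared))
```

A small p-value in the comparison table would support keeping the interaction; otherwise the additive model may be the simpler choice.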

Conclusion

In conclusion, while certain weather conditions and their interaction with air quality (PM2.5) are statistically significant predictors of temperature, the model explains only about 10% of the variance and therefore does not fully capture the complexity of temperature variations.