Overview and Introduction:

Capital Bikeshare is a bicycle sharing system operating in several cities, including Washington, D.C., and it releases a large amount of its bike rental data for Washington, D.C. to the public. In this project, I will use the 2011 dataset to explore how factors such as weather, temperature and date correlate with the variation in the hourly count of rental bikes. I will also use the same dataset to build and compare several models, both linear and nonlinear, in terms of how well they can predict future hourly bike rental counts from the given variables.


Dataset

##     instant         dteday               season            yr   
##  Min.   :   1   Min.   :2011-01-01   Min.   :1.000   Min.   :0  
##  1st Qu.:2162   1st Qu.:2011-04-04   1st Qu.:2.000   1st Qu.:0  
##  Median :4323   Median :2011-07-04   Median :3.000   Median :0  
##  Mean   :4323   Mean   :2011-07-03   Mean   :2.514   Mean   :0  
##  3rd Qu.:6484   3rd Qu.:2011-10-02   3rd Qu.:3.000   3rd Qu.:0  
##  Max.   :8645   Max.   :2011-12-31   Max.   :4.000   Max.   :0  
##       mnth              hr           holiday           weekday     
##  Min.   : 1.000   Min.   : 0.00   Min.   :0.00000   Min.   :0.000  
##  1st Qu.: 4.000   1st Qu.: 6.00   1st Qu.:0.00000   1st Qu.:1.000  
##  Median : 7.000   Median :12.00   Median :0.00000   Median :3.000  
##  Mean   : 6.574   Mean   :11.57   Mean   :0.02765   Mean   :3.013  
##  3rd Qu.:10.000   3rd Qu.:18.00   3rd Qu.:0.00000   3rd Qu.:5.000  
##  Max.   :12.000   Max.   :23.00   Max.   :1.00000   Max.   :6.000  
##    workingday       weathersit         temp            atemp       
##  Min.   :0.0000   Min.   :1.000   Min.   :0.0200   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:1.000   1st Qu.:0.3200   1st Qu.:0.3182  
##  Median :1.0000   Median :1.000   Median :0.5000   Median :0.4848  
##  Mean   :0.6837   Mean   :1.438   Mean   :0.4891   Mean   :0.4690  
##  3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:0.6600   3rd Qu.:0.6212  
##  Max.   :1.0000   Max.   :4.000   Max.   :0.9600   Max.   :1.0000  
##       hum           windspeed          casual        registered   
##  Min.   :0.0000   Min.   :0.0000   Min.   :  0.0   Min.   :  0.0  
##  1st Qu.:0.4900   1st Qu.:0.1045   1st Qu.:  3.0   1st Qu.: 26.0  
##  Median :0.6500   Median :0.1940   Median : 14.0   Median : 90.0  
##  Mean   :0.6434   Mean   :0.1912   Mean   : 28.6   Mean   :115.2  
##  3rd Qu.:0.8100   3rd Qu.:0.2836   3rd Qu.: 38.0   3rd Qu.:168.0  
##  Max.   :1.0000   Max.   :0.8507   Max.   :272.0   Max.   :567.0  
##       cnt       
##  Min.   :  1.0  
##  1st Qu.: 31.0  
##  Median :109.0  
##  Mean   :143.8  
##  3rd Qu.:211.0  
##  Max.   :651.0
##     instant          dteday               season           yr   
##  Min.   : 8646   Min.   :2012-01-01   Min.   :1.00   Min.   :1  
##  1st Qu.:10829   1st Qu.:2012-04-01   1st Qu.:2.00   1st Qu.:1  
##  Median :13012   Median :2012-07-01   Median :2.00   Median :1  
##  Mean   :13012   Mean   :2012-07-01   Mean   :2.49   Mean   :1  
##  3rd Qu.:15196   3rd Qu.:2012-09-30   3rd Qu.:3.00   3rd Qu.:1  
##  Max.   :17379   Max.   :2012-12-31   Max.   :4.00   Max.   :1  
##       mnth              hr           holiday           weekday     
##  Min.   : 1.000   Min.   : 0.00   Min.   :0.00000   Min.   :0.000  
##  1st Qu.: 4.000   1st Qu.: 6.00   1st Qu.:0.00000   1st Qu.:1.000  
##  Median : 7.000   Median :12.00   Median :0.00000   Median :3.000  
##  Mean   : 6.502   Mean   :11.52   Mean   :0.02988   Mean   :2.995  
##  3rd Qu.: 9.000   3rd Qu.:18.00   3rd Qu.:0.00000   3rd Qu.:5.000  
##  Max.   :12.000   Max.   :23.00   Max.   :1.00000   Max.   :6.000  
##    workingday       weathersit         temp            atemp       
##  Min.   :0.0000   Min.   :1.000   Min.   :0.0200   Min.   :0.0152  
##  1st Qu.:0.0000   1st Qu.:1.000   1st Qu.:0.3400   1st Qu.:0.3333  
##  Median :1.0000   Median :1.000   Median :0.5200   Median :0.4848  
##  Mean   :0.6817   Mean   :1.413   Mean   :0.5048   Mean   :0.4825  
##  3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:0.6600   3rd Qu.:0.6212  
##  Max.   :1.0000   Max.   :4.000   Max.   :1.0000   Max.   :0.9242  
##       hum           windspeed     
##  Min.   :0.1600   Min.   :0.0000  
##  1st Qu.:0.4600   1st Qu.:0.1045  
##  Median :0.6100   Median :0.1642  
##  Mean   :0.6112   Mean   :0.1890  
##  3rd Qu.:0.7700   3rd Qu.:0.2537  
##  Max.   :1.0000   Max.   :0.8060

There are 8,645 observations and 17 variables in the 2011 dataset. According to the information given, five of the variables are quantitative (temp, atemp, hum, windspeed and cnt), and the remaining 12 are qualitative. The 2012 dataset has 8,734 observations and 14 variables. The three missing variables are “cnt” (count of total rental bikes), “casual” (count of casual users) and “registered” (count of registered users). In this project, I intend to predict “cnt” for 2012 using the selected model.
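The three missing variables can be confirmed by comparing the column names of the two files. This is a minimal sketch: the two name vectors are copied from the dataset summaries above rather than read from disk, and in practice one would call `names()` on the loaded data frames (how the original files were read is not shown).

```r
# Column names of the 2011 (training) and 2012 (prediction) hourly data,
# as listed in the dataset summaries above
cols_2011 <- c("instant", "dteday", "season", "yr", "mnth", "hr", "holiday",
               "weekday", "workingday", "weathersit", "temp", "atemp", "hum",
               "windspeed", "casual", "registered", "cnt")
cols_2012 <- c("instant", "dteday", "season", "yr", "mnth", "hr", "holiday",
               "weekday", "workingday", "weathersit", "temp", "atemp", "hum",
               "windspeed")

# Variables present in 2011 but absent in 2012: casual, registered, cnt
missing_vars <- setdiff(cols_2011, cols_2012)
print(missing_vars)

# The 2012 row count follows from its instant range (8646 to 17379)
n_2012 <- 17379 - 8646 + 1
print(n_2012)
```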

Part 1: Exploratory Analysis

In this part, I will plot the hourly bike rental count against several variables to explore their relationships.

Figure 1 shows the rental count per hour in each season in 2011. Rentals are highest in summer and fall, somewhat lower in winter, and lowest in spring. This seems counterintuitive at first glance, because spring is usually warmer than winter and should therefore see more rentals. To investigate, I calculate the average temperature in each season below:

## # A tibble: 4 x 3
##   season  avg_temp avg_atemp
##   <fctr>     <dbl>     <dbl>
## 1 Spring 0.2753482 0.2769897
## 2 Summer 0.5346074 0.5103303
## 3   Fall 0.7013393 0.6541496
## 4 Winter 0.4263543 0.4180605

The table shows that both the average temperature and the average feeling temperature are lowest in spring, contrary to my earlier expectation. Next, I plot the relationship between the rental count and the normalized feeling temperature.
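The seasonal averages in the table above amount to a grouped mean. Below is a base-R sketch (the original analysis loaded the tidyverse, but its code is not shown). The mapping of codes 1 to 4 onto Spring/Summer/Fall/Winter is taken from the output above, and the five-row data frame is made-up illustration data, not the real dataset.

```r
# Map numeric season codes to the labels used in this analysis
label_season <- function(code) {
  factor(code, levels = 1:4, labels = c("Spring", "Summer", "Fall", "Winter"))
}

# Toy hourly records (illustrative values only)
toy <- data.frame(
  season = c(1, 1, 2, 3, 4),
  temp   = c(0.20, 0.30, 0.52, 0.70, 0.42),
  atemp  = c(0.21, 0.29, 0.50, 0.65, 0.41)
)
toy$season.f <- label_season(toy$season)

# Mean actual and feeling temperature per season
season_means <- aggregate(cbind(temp, atemp) ~ season.f, data = toy, FUN = mean)
names(season_means) <- c("season", "avg_temp", "avg_atemp")
print(season_means)
```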


Figure 2 shows the relationship between the rental count per hour and the normalized feeling temperature in each season in 2011. As expected, the rental count tends to rise with the normalized feeling temperature. This is consistent with Figure 1: summer and fall have the highest average feeling temperatures and the most rentals, while spring has the lowest average feeling temperature and the fewest rentals.

Figure 3 shows the relationship between the rental count per hour and the normalized humidity in 2011. Unlike temperature, humidity shows no clear relationship with the rental count: the counts appear roughly uniformly spread across humidity values between 0.25 and 0.75. This seems reasonable, since people may be less sensitive to humidity than to temperature. Next, let’s re-examine the plot by taking weather conditions into account.

Figure 4 shows the relationship between the rental count per hour and the normalized humidity under each weather condition in 2011. Even after splitting the data by weather condition, there is still no clear relationship between the two variables. On the other hand, rental traffic is highest when the weather is clear or partly cloudy and much lower under adverse and severe weather conditions.

Figure 5 shows the relationship between the rental count per hour and the normalized wind speed in 2011. As with humidity, wind speed does not seem to affect rental traffic much: the rental count stays roughly the same across most wind speeds. However, the count drops sharply when the normalized wind speed exceeds 0.5, which corresponds to approximately 33 km/h. This matches my expectation, since riding a bike is harder in strong wind. Now, let’s look at the hours with a normalized wind speed above 0.5.
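The km/h figure follows from how the wind speed column is normalized. This is a small helper assuming the scaling documented for the UCI Bike Sharing Dataset, which these column names appear to come from: the raw wind speed divided by a maximum of 67.

```r
# Convert normalized wind speed back to km/h, assuming the documented
# scaling (raw value divided by a maximum of 67)
denormalize_wind <- function(x, max_kmh = 67) {
  x * max_kmh
}

denormalize_wind(0.5)     # threshold discussed above, about 33.5 km/h
denormalize_wind(0.8507)  # largest 2011 value in the summary, about 57 km/h
```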

Figure 6 shows the rental count per hour when the normalized wind speed is above 0.5 in 2011. Even when the wind is strong at a particular hour, there is still a substantial number of rentals as long as the weather is clear.

Let’s now quantify the relationship between the rental count per hour and each of the normalized feeling temperature, the normalized humidity and the normalized wind speed in 2011, starting with one simple linear regression per predictor:

## 
## Call:
## lm(formula = cnt ~ atemp, data = hour11)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -244.70  -80.38  -20.19   59.23  459.72 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -14.905      3.650  -4.083 4.48e-05 ***
## atemp        338.377      7.283  46.460  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 119.7 on 8643 degrees of freedom
## Multiple R-squared:  0.1998, Adjusted R-squared:  0.1997 
## F-statistic:  2159 on 1 and 8643 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = cnt ~ hum, data = hour11)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -269.37  -92.09  -32.24   61.55  487.07 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  270.374      4.722   57.26   <2e-16 ***
## hum         -196.727      7.020  -28.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 128.1 on 8643 degrees of freedom
## Multiple R-squared:  0.0833, Adjusted R-squared:  0.08319 
## F-statistic: 785.4 on 1 and 8643 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = cnt ~ windspeed, data = hour11)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -193.02 -109.76  -34.83   67.48  516.62 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  126.072      2.647  47.625  < 2e-16 ***
## windspeed     92.705     11.640   7.964 1.87e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 133.3 on 8643 degrees of freedom
## Multiple R-squared:  0.007286,   Adjusted R-squared:  0.007171 
## F-statistic: 63.43 on 1 and 8643 DF,  p-value: 1.871e-15

From the outputs above, feeling temperature is the most effective single predictor of the rental count: its model has the lowest residual standard error (119.7) and the highest R-squared (0.1998). Wind speed, on the contrary, is the least effective predictor, with the largest residual standard error (133.3) and an R-squared of only 0.0073. Now let’s look at a linear model with all three variables together:

## 
## Call:
## lm(formula = cnt ~ atemp + hum + windspeed, data = hour11)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -254.62  -74.03  -22.11   52.19  468.69 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   97.492      6.265  15.562  < 2e-16 ***
## atemp        334.804      6.930  48.315  < 2e-16 ***
## hum         -183.359      6.461 -28.378  < 2e-16 ***
## windspeed     37.964     10.303   3.685  0.00023 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 113.7 on 8641 degrees of freedom
## Multiple R-squared:  0.2783, Adjusted R-squared:  0.2781 
## F-statistic:  1111 on 3 and 8641 DF,  p-value: < 2.2e-16

The result shows that feeling temperature and wind speed are positively associated with the rental count, while humidity is negatively associated with it. Notably, although the p-value for windspeed (0.00023) is still significant, it is far larger than those of the other two predictors (< 2e-16). It is possible that this p-value would increase further if more variables were included in the model, eventually rendering windspeed statistically insignificant.
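To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below plugs in the estimates from the three-predictor model above; the inputs are the 2011 medians of atemp, hum and windspeed from the dataset summary, used only as an illustrative "typical hour".

```r
# Coefficient estimates from lm(cnt ~ atemp + hum + windspeed, data = hour11)
coefs <- c(intercept = 97.492, atemp = 334.804, hum = -183.359, windspeed = 37.964)

# Predicted hourly rental count from the fitted linear equation
predict_cnt <- function(atemp, hum, windspeed) {
  unname(coefs["intercept"] + coefs["atemp"] * atemp +
         coefs["hum"] * hum + coefs["windspeed"] * windspeed)
}

# Typical hour: 2011 medians of the three predictors, about 148 rentals
predict_cnt(atemp = 0.4848, hum = 0.65, windspeed = 0.194)

# Holding the others fixed, +0.1 in normalized feeling temperature adds ~33.5 rentals
predict_cnt(0.5848, 0.65, 0.194) - predict_cnt(0.4848, 0.65, 0.194)
```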


Figure 7 shows the general trend of hourly bike rental counts for casual and registered users on working days in 2011. Registered users far outnumber casual users at most hours, except between 3 and 5 am. Registered rentals also peak during the rush hours, around 8 am and 6 pm. This suggests that registered users are more likely than casual users to ride rental bikes to and from work, while casual users may rely on other means of transportation more often. Between 3 and 5 am, however, there are more casual users, probably because they need rental bikes to get home when public transportation is out of service.
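The hourly profile in Figure 7 boils down to averaging casual and registered counts by hour within working days. Below is a base-R sketch on a few made-up rows; the figures themselves were drawn with ggplot2 smoothers, and their code is not shown. Filtering on `workingday == 0` instead gives the weekend and holiday profile.

```r
# Toy hourly records (illustrative values only)
toy <- data.frame(
  hr         = c(8, 8, 4, 4, 14, 14),
  workingday = c(1, 1, 1, 1, 0, 0),
  casual     = c(5, 7, 3, 5, 60, 80),
  registered = c(300, 340, 2, 4, 150, 170)
)

# Mean casual and registered rentals per hour, working days only
workday_profile <- aggregate(cbind(casual, registered) ~ hr,
                             data = subset(toy, workingday == 1), FUN = mean)
print(workday_profile)
```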


Figure 8 shows the general trend of hourly bike rental counts for casual and registered users on weekends and holidays in 2011. Although registered users still far outnumber casual users, both groups follow a similar pattern: rentals increase gradually from 5 am, peak around 2 pm and then gradually decline. This suggests that both user groups tend to use rental bikes between 10 am and 6 pm, which is usually the prime time for bike riding.

Part 2: Predictive Modeling

In this part, I will construct and train several models (linear & non-linear) based on the bike sharing data from 2011, and then I will compare and select the model with the best performance to predict the ride counts for each hour in 2012.
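The models below use factor versions of several integer columns (hr.f, mnth.f, holiday.f, workingday.f, weathersit.f, season.f) and a training set, train11, with 6,483 rows, which is exactly floor(0.75 × 8,645). Below is a sketch of how such a setup could look. The 75% random split, the seed, and the label "Clear" for weather code 1 are assumptions, since the original code is not shown; the other weather and season labels are taken from the model output.

```r
# Turn integer codes into factors so lm() fits one coefficient per level
encode_factors <- function(df) {
  df$hr.f         <- factor(df$hr, levels = 0:23)
  df$mnth.f       <- factor(df$mnth, levels = 1:12)
  df$holiday.f    <- factor(df$holiday, levels = 0:1)
  df$workingday.f <- factor(df$workingday, levels = 0:1)
  # "Clear" is a hypothetical label; the other three appear in the lm output
  df$weathersit.f <- factor(df$weathersit, levels = 1:4,
                            labels = c("Clear", "Cloudy & Misty",
                                       "Adverse Weather", "Severe Weather"))
  df$season.f     <- factor(df$season, levels = 1:4,
                            labels = c("Spring", "Summer", "Fall", "Winter"))
  df
}

# Assumed 75/25 random split of the 2011 rows (seed is arbitrary)
split_rows <- function(n, frac = 0.75, seed = 1) {
  set.seed(seed)
  sample(n, floor(frac * n))
}

train_idx <- split_rows(8645)
length(train_idx)  # 6483, matching the size of train11
```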

Non-interactive & interactive linear model

## 
## Call:
## lm(formula = formula1, data = train11)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -284.69  -44.56   -6.79   40.68  412.24 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -20.059      7.597  -2.641  0.00830 ** 
## hr.f1                        -11.854      6.439  -1.841  0.06569 .  
## hr.f2                        -19.513      6.379  -3.059  0.00223 ** 
## hr.f3                        -28.343      6.538  -4.335 1.48e-05 ***
## hr.f4                        -31.386      6.591  -4.762 1.96e-06 ***
## hr.f5                        -16.147      6.504  -2.482  0.01307 *  
## hr.f6                         25.983      6.464   4.019 5.90e-05 ***
## hr.f7                        125.997      6.409  19.659  < 2e-16 ***
## hr.f8                        226.988      6.410  35.411  < 2e-16 ***
## hr.f9                        119.749      6.413  18.672  < 2e-16 ***
## hr.f10                        79.723      6.391  12.475  < 2e-16 ***
## hr.f11                        99.235      6.488  15.295  < 2e-16 ***
## hr.f12                       127.027      6.532  19.445  < 2e-16 ***
## hr.f13                       128.801      6.601  19.512  < 2e-16 ***
## hr.f14                       111.807      6.568  17.022  < 2e-16 ***
## hr.f15                       114.221      6.673  17.116  < 2e-16 ***
## hr.f16                       165.481      6.597  25.084  < 2e-16 ***
## hr.f17                       281.329      6.539  43.025  < 2e-16 ***
## hr.f18                       262.727      6.504  40.393  < 2e-16 ***
## hr.f19                       177.417      6.415  27.657  < 2e-16 ***
## hr.f20                       121.553      6.501  18.697  < 2e-16 ***
## hr.f21                        87.495      6.354  13.771  < 2e-16 ***
## hr.f22                        57.223      6.330   9.040  < 2e-16 ***
## hr.f23                        27.306      6.362   4.292 1.79e-05 ***
## mnth.f2                        5.066      4.897   1.035  0.30094    
## mnth.f3                       10.636      5.448   1.952  0.05095 .  
## mnth.f4                       26.836      8.439   3.180  0.00148 ** 
## mnth.f5                       64.931      9.080   7.151 9.56e-13 ***
## mnth.f6                       47.330      9.662   4.899 9.88e-07 ***
## mnth.f7                       15.693     10.911   1.438  0.15040    
## mnth.f8                       30.038     10.507   2.859  0.00427 ** 
## mnth.f9                       45.287      9.400   4.818 1.48e-06 ***
## mnth.f10                      38.111      8.293   4.596 4.40e-06 ***
## mnth.f11                      18.935      8.015   2.362  0.01819 *  
## mnth.f12                      19.962      6.239   3.199  0.00138 ** 
## holiday.f1                   -19.307      5.922  -3.260  0.00112 ** 
## workingday.f1                 -2.498      2.093  -1.193  0.23274    
## weathersit.fCloudy & Misty    -3.288      2.354  -1.397  0.16252    
## weathersit.fAdverse Weather  -44.359      3.774 -11.755  < 2e-16 ***
## weathersit.fSevere Weather   -62.796     75.559  -0.831  0.40595    
## season.fSummer                18.224      5.933   3.072  0.00214 ** 
## season.fFall                  27.508      6.934   3.967 7.36e-05 ***
## season.fWinter                45.858      5.881   7.797 7.33e-15 ***
## temp                         117.851     44.297   2.660  0.00782 ** 
## atemp                         49.923     46.792   1.067  0.28605    
## hum                          -78.225      6.632 -11.796  < 2e-16 ***
## windspeed                    -21.745      8.676  -2.506  0.01222 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 75.25 on 6436 degrees of freedom
## Multiple R-squared:  0.6851, Adjusted R-squared:  0.6828 
## F-statistic: 304.3 on 46 and 6436 DF,  p-value: < 2.2e-16

According to the summary output, the model as a whole is statistically significant (F-test p-value < 2.2e-16). However, its residual standard error (RSE) of 75.25 rentals per hour is still relatively high, suggesting that the model has certain limitations. Most predictors are statistically significant; the notable exceptions are workingday and atemp, probably due to multicollinearity (for example, atemp closely tracks temp). Let’s re-run the model without these two variables:
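The 75.25 that `summary()` reports is the residual standard error, sqrt(RSS / (n - p)), not the mean squared error RSS / n. The ANOVA comparison further down prints this model’s residual sum of squares (36,439,980 on 6,436 residual degrees of freedom, with n = 6,483 training rows), so the two quantities can be computed side by side:

```r
rss    <- 36439980  # residual sum of squares of the full non-interactive model
n      <- 6483      # training observations
df_res <- 6436      # residual degrees of freedom (n minus 47 coefficients)

rse  <- sqrt(rss / df_res)  # residual standard error, as printed by summary()
mse  <- rss / n             # training mean squared error, in squared counts
rmse <- sqrt(mse)           # root mean squared error, on the scale of cnt

round(c(rse = rse, mse = mse, rmse = rmse), 2)
```

The MSE is in squared rentals (about 5,600), so the RSE or RMSE is the more readable error measure.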

## 
## Call:
## lm(formula = cnt ~ hr.f + mnth.f + holiday.f + weathersit.f + 
##     season.f + temp + hum + windspeed, data = train11)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -284.30  -44.90   -6.51   41.21  410.37 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -20.023      7.374  -2.716 0.006635 ** 
## hr.f1                        -11.968      6.439  -1.859 0.063123 .  
## hr.f2                        -19.606      6.379  -3.073 0.002125 ** 
## hr.f3                        -28.454      6.538  -4.352 1.37e-05 ***
## hr.f4                        -31.462      6.591  -4.773 1.85e-06 ***
## hr.f5                        -16.296      6.504  -2.506 0.012249 *  
## hr.f6                         25.932      6.465   4.011 6.11e-05 ***
## hr.f7                        125.980      6.409  19.656  < 2e-16 ***
## hr.f8                        227.030      6.410  35.417  < 2e-16 ***
## hr.f9                        119.816      6.414  18.681  < 2e-16 ***
## hr.f10                        79.716      6.391  12.474  < 2e-16 ***
## hr.f11                        99.318      6.488  15.309  < 2e-16 ***
## hr.f12                       127.071      6.532  19.453  < 2e-16 ***
## hr.f13                       128.936      6.600  19.535  < 2e-16 ***
## hr.f14                       112.007      6.567  17.055  < 2e-16 ***
## hr.f15                       114.348      6.673  17.136  < 2e-16 ***
## hr.f16                       165.540      6.597  25.094  < 2e-16 ***
## hr.f17                       281.396      6.538  43.038  < 2e-16 ***
## hr.f18                       262.726      6.504  40.392  < 2e-16 ***
## hr.f19                       177.484      6.415  27.667  < 2e-16 ***
## hr.f20                       121.629      6.502  18.708  < 2e-16 ***
## hr.f21                        87.554      6.354  13.780  < 2e-16 ***
## hr.f22                        57.219      6.330   9.039  < 2e-16 ***
## hr.f23                        27.347      6.362   4.298 1.75e-05 ***
## mnth.f2                        5.345      4.891   1.093 0.274492    
## mnth.f3                       10.719      5.448   1.968 0.049168 *  
## mnth.f4                       27.463      8.429   3.258 0.001128 ** 
## mnth.f5                       65.239      9.072   7.191 7.16e-13 ***
## mnth.f6                       47.245      9.634   4.904 9.63e-07 ***
## mnth.f7                       15.891     10.880   1.461 0.144189    
## mnth.f8                       29.798     10.472   2.845 0.004449 ** 
## mnth.f9                       44.746      9.345   4.788 1.72e-06 ***
## mnth.f10                      38.872      8.279   4.695 2.72e-06 ***
## mnth.f11                      19.738      7.994   2.469 0.013571 *  
## mnth.f12                      20.544      6.223   3.301 0.000967 ***
## holiday.f1                   -18.051      5.719  -3.156 0.001606 ** 
## weathersit.fCloudy & Misty    -3.469      2.351  -1.475 0.140152    
## weathersit.fAdverse Weather  -44.877      3.760 -11.935  < 2e-16 ***
## weathersit.fSevere Weather   -65.007     75.549  -0.860 0.389570    
## season.fSummer                17.997      5.930   3.035 0.002416 ** 
## season.fFall                  27.401      6.929   3.955 7.75e-05 ***
## season.fWinter                45.460      5.876   7.737 1.17e-14 ***
## temp                         162.402     11.764  13.805  < 2e-16 ***
## hum                          -77.549      6.608 -11.735  < 2e-16 ***
## windspeed                    -24.719      8.235  -3.002 0.002695 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 75.25 on 6438 degrees of freedom
## Multiple R-squared:  0.6849, Adjusted R-squared:  0.6828 
## F-statistic: 318.1 on 44 and 6438 DF,  p-value: < 2.2e-16
## Analysis of Variance Table
## 
## Model 1: cnt ~ hr.f + mnth.f + holiday.f + workingday.f + weathersit.f + 
##     season.f + temp + atemp + hum + windspeed
## Model 2: cnt ~ hr.f + mnth.f + holiday.f + weathersit.f + season.f + temp + 
##     hum + windspeed
##   Res.Df      RSS Df Sum of Sq      F Pr(>F)
## 1   6436 36439980                           
## 2   6438 36454586 -2    -14606 1.2899 0.2754

The reduced model performs essentially the same as the previous one, which further suggests that the eliminated variables add little predictive power. The ANOVA comparison supports this: the F-test p-value (0.2754) is larger than 0.05, so dropping the two variables does not significantly worsen the fit. The final non-interactive model therefore explains roughly 68.28% of the variation in rental count (adjusted R-squared) and has a residual standard error of 75.25. Let’s now look at an interaction model:
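The ANOVA row can be reproduced from the two residual sums of squares: the partial F statistic compares the increase in RSS per dropped parameter with the full model’s residual variance. A check using the numbers printed in the ANOVA table:

```r
rss_full    <- 36439980; df_full    <- 6436  # with workingday.f and atemp
rss_reduced <- 36454586; df_reduced <- 6438  # without them

# Partial F-test for dropping q = 2 parameters
q       <- df_reduced - df_full
f_stat  <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 3)  # approximately F = 1.29, p = 0.275
```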

## Call: lm(formula = cnt ~ hr.f * temp * holiday.f + weathersit.f + mnth.f + season.f + hum + windspeed, data = train11)
## Coefficients: 115 (including the intercept)
## Residual standard error: 68.7 on 6368 degrees of freedom
## Multiple R-squared: 0.74, Adjusted R-squared: 0.735
## F-statistic: 159 on 114 and 6368 DF

In the model above, I add one three-way interaction term, hr.f*temp*holiday.f, to explore how the effect of temperature on the rental count differs by hour on holidays and non-holidays. The model’s adjusted R-squared increases by about 7.7% (from 0.6828 to 0.735) and its residual standard error decreases by about 8.6% (from 75.25 to roughly 68.7), indicating a substantial improvement over the previous model.

However, with 10 candidate predictors there are roughly 1024 (2^10) possible combinations of variables that could enter an interaction term, so trying all of them is impractical. I need a more feasible way to identify strong interaction effects; one possible approach is to fit a tree model.

Tree Model to determine interaction terms

The tree plot shows that hour is the most important factor affecting the rental count (the longer a branch, the more deviance it explains), which corroborates the hourly patterns found in Part 1. Temperature matters more during the daytime (from about 6:30 am onward). The second branch shows that at lower temperatures (below 31 °C), there are more rentals before 8:30 pm in late fall and winter. At higher temperatures (above 31 °C), rental traffic is busier before 8:30 am and from 4:30 to 6:30 pm on working days, which are the usual rush hours, while on weekends and holidays it is busier from 9:30 am to 3:30 pm and from 4:30 to 6:30 pm. Overall, the tree model indicates that the interaction structure of the data is not complex.
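The idea of reading interactions off a regression tree can be illustrated on simulated data. This sketch uses the `rpart` package (a recommended package shipped with most R installations) as a stand-in, since the package used for the tree above is not shown. The data are synthetic, with a built-in hour-by-temperature interaction: temperature only affects counts between 7:00 and 19:59.

```r
library(rpart)

set.seed(42)
n <- 2000
sim <- data.frame(
  hr         = sample(0:23, n, replace = TRUE),
  temp       = runif(n),
  workingday = sample(0:1, n, replace = TRUE)
)
# Synthetic counts: temp raises rentals only during daytime hours
daytime <- sim$hr >= 7 & sim$hr <= 19
sim$cnt <- 20 + daytime * (100 + 200 * sim$temp) + rnorm(n, sd = 10)

fit <- rpart(cnt ~ hr + temp + workingday, data = sim, method = "anova")
print(fit)
# Expect the first split on hr, with temp splits appearing only under the
# daytime branch: a sign that hr and temp interact
```

When a variable is split on only inside certain branches of another, that pattern suggests an interaction term worth adding to the linear model.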

## Call: lm(formula = cnt ~ hr.f * temp * workingday.f + hr.f * temp * season.f + temp + mnth.f + hum + windspeed, data = train11)
## Coefficients: 253 (including the intercept)
## Residual standard error: 46.4 on 6230 degrees of freedom
## Multiple R-squared: 0.884, Adjusted R-squared: 0.879
## F-statistic: 188 on 252 and 6230 DF

The new interaction model, which uses the interactions suggested by the tree (hour with temperature and working day, and hour with temperature and season), improves greatly on the previous one. It explains approximately 87.91% of the variance in rental count (adjusted R-squared), about 1.2 times the 73.5% of the previous interaction model. The residual standard error drops to only 46.45, further indicating a much better fit to the training data.
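The coefficient counts of the two interaction models (115 and 253, per their summaries) show how quickly factor interactions add parameters: every level of hr.f gets its own temperature slope, and that slope is again split by holiday, working day or season. The sketch below rebuilds both design matrices on a small synthetic frame with the same factor levels. Only the column count matters here, so the values are arbitrary.

```r
set.seed(7)
n <- 200
toy <- data.frame(
  hr.f         = factor(sample(0:23, n, replace = TRUE), levels = 0:23),
  mnth.f       = factor(sample(1:12, n, replace = TRUE), levels = 1:12),
  holiday.f    = factor(sample(0:1, n, replace = TRUE), levels = 0:1),
  workingday.f = factor(sample(0:1, n, replace = TRUE), levels = 0:1),
  weathersit.f = factor(sample(1:4, n, replace = TRUE), levels = 1:4),
  season.f     = factor(sample(1:4, n, replace = TRUE), levels = 1:4),
  temp = runif(n), hum = runif(n), windspeed = runif(n)
)

# Right-hand sides of the two interaction models above
f_holiday <- ~ hr.f * temp * holiday.f + weathersit.f + mnth.f + season.f +
  hum + windspeed
f_tree <- ~ hr.f * temp * workingday.f + hr.f * temp * season.f + temp +
  mnth.f + hum + windspeed

# Number of design-matrix columns (treatment contrasts, intercept included)
ncol(model.matrix(f_holiday, data = toy))  # 115
ncol(model.matrix(f_tree, data = toy))     # 253
```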

Log Transformation of interactive & non-interactive linear models

Now let’s modify the model slightly by log-transforming the rental count and the temperature (lg_cnt and lg_temp in the model below).
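Below is a sketch of how the log columns could be built and how a log-scale prediction maps back to counts. It assumes natural logs, since the document does not state the base. Both columns are safe to log: in 2011 the minimum of cnt is 1 and the minimum of temp is 0.02. The three-row frame is made-up illustration data.

```r
# Add log-transformed response and temperature columns (natural log assumed)
add_log_columns <- function(df) {
  df$lg_cnt  <- log(df$cnt)   # cnt >= 1, so no log(0)
  df$lg_temp <- log(df$temp)  # temp >= 0.02 in 2011
  df
}

toy <- add_log_columns(data.frame(cnt = c(1, 20, 400), temp = c(0.02, 0.5, 0.9)))

# Back-transforming with exp() recovers a geometric mean, which sits below
# the arithmetic mean, so exp(prediction) tends to understate average counts
exp(mean(toy$lg_cnt))  # 20
mean(toy$cnt)          # about 140.3
```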

## 
## Call:
## lm(formula = lg_cnt ~ hr.f * lg_temp * workingday.f + hr.f * 
##     lg_temp * season.f + lg_temp + mnth.f + hum + windspeed, 
##     data = train11)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4604 -0.1540  0.0488  0.2229  2.0736 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    4.533582   0.171660  26.410  < 2e-16 ***
## hr.f1                         -0.247070   0.233793  -1.057 0.290648    
## hr.f2                         -0.632714   0.233104  -2.714 0.006660 ** 
## hr.f3                         -1.441122   0.240813  -5.984 2.29e-09 ***
## hr.f4                         -2.772145   0.264385 -10.485  < 2e-16 ***
## hr.f5                         -3.086811   0.251909 -12.254  < 2e-16 ***
## hr.f6                         -1.954148   0.241950  -8.077 7.93e-16 ***
## hr.f7                         -1.344376   0.232257  -5.788 7.46e-09 ***
## hr.f8                          0.025458   0.296019   0.086 0.931469    
## hr.f9                          0.497165   0.251371   1.978 0.047993 *  
## hr.f10                         1.180717   0.240942   4.900 9.80e-07 ***
## hr.f11                         1.598780   0.266407   6.001 2.07e-09 ***
## hr.f12                         1.447013   0.244800   5.911 3.58e-09 ***
## hr.f13                         1.835071   0.262841   6.982 3.22e-12 ***
## hr.f14                         1.476195   0.245502   6.013 1.92e-09 ***
## hr.f15                         1.308035   0.243573   5.370 8.15e-08 ***
## hr.f16                         1.330853   0.251860   5.284 1.31e-07 ***
## hr.f17                         1.200911   0.243127   4.939 8.04e-07 ***
## hr.f18                         0.983277   0.246273   3.993 6.61e-05 ***
## hr.f19                         0.535366   0.249987   2.142 0.032266 *  
## hr.f20                         0.520891   0.263068   1.980 0.047740 *  
## hr.f21                         0.343491   0.263052   1.306 0.191671    
## hr.f22                         0.290723   0.253398   1.147 0.251302    
## hr.f23                        -0.129158   0.240747  -0.536 0.591642    
## lg_temp                        0.442288   0.113199   3.907 9.44e-05 ***
## workingday.f1                 -1.008461   0.107166  -9.410  < 2e-16 ***
## season.fSummer                 0.803474   0.212422   3.782 0.000157 ***
## season.fFall                   0.667479   0.248495   2.686 0.007249 ** 
## season.fWinter                 0.858886   0.246182   3.489 0.000489 ***
## mnth.f2                        0.159044   0.029677   5.359 8.66e-08 ***
## mnth.f3                        0.185035   0.035250   5.249 1.58e-07 ***
## mnth.f4                        0.384559   0.049256   7.807 6.80e-15 ***
## mnth.f5                        0.640851   0.053113  12.066  < 2e-16 ***
## mnth.f6                        0.468726   0.055746   8.408  < 2e-16 ***
## mnth.f7                        0.383784   0.061098   6.281 3.58e-10 ***
## mnth.f8                        0.363315   0.059133   6.144 8.54e-10 ***
## mnth.f9                        0.378031   0.056019   6.748 1.63e-11 ***
## mnth.f10                       0.322501   0.049281   6.544 6.46e-11 ***
## mnth.f11                       0.212335   0.047685   4.453 8.62e-06 ***
## mnth.f12                       0.305000   0.039377   7.746 1.10e-14 ***
## hum                           -0.842518   0.033580 -25.090  < 2e-16 ***
## windspeed                     -0.549272   0.046353 -11.850  < 2e-16 ***
## hr.f1:lg_temp                  0.052460   0.159449   0.329 0.742163    
## hr.f2:lg_temp                 -0.090375   0.152572  -0.592 0.553641    
## hr.f3:lg_temp                 -0.134459   0.153075  -0.878 0.379766    
## hr.f4:lg_temp                 -0.194066   0.188310  -1.031 0.302784    
## hr.f5:lg_temp                 -0.287946   0.166815  -1.726 0.084371 .  
## hr.f6:lg_temp                  0.070624   0.160420   0.440 0.659773    
## hr.f7:lg_temp                 -0.088647   0.151780  -0.584 0.559206    
## hr.f8:lg_temp                  0.078601   0.199715   0.394 0.693916    
## hr.f9:lg_temp                  0.044830   0.175144   0.256 0.797989    
## hr.f10:lg_temp                 0.141348   0.170534   0.829 0.407217    
## hr.f11:lg_temp                 0.377567   0.203392   1.856 0.063451 .  
## hr.f12:lg_temp                 0.122378   0.184474   0.663 0.507106    
## hr.f13:lg_temp                 0.483142   0.219843   2.198 0.028009 *  
## hr.f14:lg_temp                 0.182143   0.193983   0.939 0.347786    
## hr.f15:lg_temp                 0.108955   0.194907   0.559 0.576175    
## hr.f16:lg_temp                 0.148920   0.198991   0.748 0.454261    
## hr.f17:lg_temp                 0.137317   0.189329   0.725 0.468304    
## hr.f18:lg_temp                 0.088821   0.183557   0.484 0.628484    
## hr.f19:lg_temp                 0.008569   0.184728   0.046 0.963005    
## hr.f20:lg_temp                 0.112515   0.202481   0.556 0.578446    
## hr.f21:lg_temp                 0.092249   0.191920   0.481 0.630774    
## hr.f22:lg_temp                 0.146752   0.179383   0.818 0.413335    
## hr.f23:lg_temp                 0.018176   0.165227   0.110 0.912408    
## hr.f1:workingday.f1           -0.506829   0.158107  -3.206 0.001355 ** 
## hr.f2:workingday.f1           -0.681905   0.155886  -4.374 1.24e-05 ***
## hr.f3:workingday.f1           -0.771363   0.159695  -4.830 1.40e-06 ***
## hr.f4:workingday.f1            0.678457   0.168105   4.036 5.50e-05 ***
## hr.f5:workingday.f1            2.047011   0.164829  12.419  < 2e-16 ***
## hr.f6:workingday.f1            2.481230   0.160912  15.420  < 2e-16 ***
## hr.f7:workingday.f1            2.801075   0.150463  18.616  < 2e-16 ***
## hr.f8:workingday.f1            2.170660   0.156122  13.904  < 2e-16 ***
## hr.f9:workingday.f1            0.831912   0.154596   5.381 7.67e-08 ***
## hr.f10:workingday.f1           0.128856   0.148708   0.867 0.386248    
## hr.f11:workingday.f1           0.102741   0.151080   0.680 0.496505    
## hr.f12:workingday.f1           0.361688   0.150254   2.407 0.016105 *  
## hr.f13:workingday.f1           0.162788   0.150858   1.079 0.280593    
## hr.f14:workingday.f1           0.186932   0.145993   1.280 0.200447    
## hr.f15:workingday.f1           0.303987   0.149553   2.033 0.042131 *  
## hr.f16:workingday.f1           0.676216   0.148135   4.565 5.09e-06 ***
## hr.f17:workingday.f1           1.346999   0.148305   9.083  < 2e-16 ***
## hr.f18:workingday.f1           1.338956   0.152796   8.763  < 2e-16 ***
## hr.f19:workingday.f1           1.068509   0.150602   7.095 1.44e-12 ***
## hr.f20:workingday.f1           1.000281   0.157066   6.369 2.04e-10 ***
## hr.f21:workingday.f1           0.961633   0.155228   6.195 6.20e-10 ***
## hr.f22:workingday.f1           1.044531   0.154527   6.760 1.51e-11 ***
## hr.f23:workingday.f1           0.979434   0.154711   6.331 2.61e-10 ***
## lg_temp:workingday.f1          0.007914   0.104927   0.075 0.939878    
## hr.f1:season.fSummer          -0.144182   0.296560  -0.486 0.626856    
## hr.f2:season.fSummer          -0.375242   0.292593  -1.282 0.199724    
## hr.f3:season.fSummer          -0.127003   0.302173  -0.420 0.674281    
## hr.f4:season.fSummer          -0.194272   0.313755  -0.619 0.535817    
## hr.f5:season.fSummer           0.291361   0.307402   0.948 0.343259    
## hr.f6:season.fSummer           0.262480   0.299688   0.876 0.381149    
## hr.f7:season.fSummer           0.443285   0.285372   1.553 0.120389    
## hr.f8:season.fSummer          -0.124917   0.333514  -0.375 0.708009    
## hr.f9:season.fSummer          -0.118051   0.300941  -0.392 0.694870    
## hr.f10:season.fSummer         -0.587658   0.288455  -2.037 0.041666 *  
## hr.f11:season.fSummer         -0.868934   0.301097  -2.886 0.003916 ** 
## hr.f12:season.fSummer         -0.650876   0.293189  -2.220 0.026455 *  
## hr.f13:season.fSummer         -0.978481   0.301501  -3.245 0.001179 ** 
## hr.f14:season.fSummer         -0.750543   0.285751  -2.627 0.008646 ** 
## hr.f15:season.fSummer         -0.628057   0.283282  -2.217 0.026654 *  
## hr.f16:season.fSummer         -0.404307   0.292900  -1.380 0.167526    
## hr.f17:season.fSummer         -0.384052   0.287293  -1.337 0.181338    
## hr.f18:season.fSummer         -0.159922   0.291170  -0.549 0.582861    
## hr.f19:season.fSummer          0.369939   0.297743   1.242 0.214107    
## hr.f20:season.fSummer          0.232877   0.303774   0.767 0.443341    
## hr.f21:season.fSummer          0.244015   0.308863   0.790 0.429533    
## hr.f22:season.fSummer         -0.030339   0.302866  -0.100 0.920210    
## hr.f23:season.fSummer         -0.150611   0.297560  -0.506 0.612766    
## hr.f1:season.fFall            -0.194763   0.356116  -0.547 0.584461    
## hr.f2:season.fFall            -0.200652   0.355830  -0.564 0.572843    
## hr.f3:season.fFall            -0.055470   0.362335  -0.153 0.878331    
## hr.f4:season.fFall             0.446332   0.367830   1.213 0.225016    
## hr.f5:season.fFall             0.102268   0.365524   0.280 0.779652    
## hr.f6:season.fFall             0.210322   0.364430   0.577 0.563876    
## hr.f7:season.fFall             0.214991   0.339269   0.634 0.526307    
## hr.f8:season.fFall            -0.232171   0.378131  -0.614 0.539239    
## hr.f9:season.fFall            -0.165559   0.339668  -0.487 0.625982    
## hr.f10:season.fFall           -0.649566   0.332356  -1.954 0.050695 .  
## hr.f11:season.fFall           -0.881288   0.335838  -2.624 0.008708 ** 
## hr.f12:season.fFall           -0.727042   0.327090  -2.223 0.026268 *  
## hr.f13:season.fFall           -0.795132   0.326067  -2.439 0.014774 *  
## hr.f14:season.fFall           -0.610155   0.318650  -1.915 0.055562 .  
## hr.f15:season.fFall           -0.619515   0.315418  -1.964 0.049562 *  
## hr.f16:season.fFall           -0.553842   0.318016  -1.742 0.081636 .  
## hr.f17:season.fFall           -0.415055   0.316771  -1.310 0.190153    
## hr.f18:season.fFall           -0.249275   0.322341  -0.773 0.439359    
## hr.f19:season.fFall            0.288256   0.331905   0.868 0.385159    
## hr.f20:season.fFall            0.309249   0.348028   0.889 0.374265    
## hr.f21:season.fFall            0.178797   0.353387   0.506 0.612908    
## hr.f22:season.fFall           -0.109180   0.345311  -0.316 0.751878    
## hr.f23:season.fFall           -0.024723   0.349387  -0.071 0.943591    
## hr.f1:season.fWinter          -0.118934   0.341901  -0.348 0.727955    
## hr.f2:season.fWinter          -0.310943   0.341705  -0.910 0.362871    
## hr.f3:season.fWinter           0.242569   0.338418   0.717 0.473540    
## hr.f4:season.fWinter           0.513317   0.348231   1.474 0.140513    
## hr.f5:season.fWinter           0.571597   0.348828   1.639 0.101343    
## hr.f6:season.fWinter           0.290813   0.342543   0.849 0.395923    
## hr.f7:season.fWinter           0.462898   0.330353   1.401 0.161197    
## hr.f8:season.fWinter          -0.169189   0.370172  -0.457 0.647647    
## hr.f9:season.fWinter          -0.046470   0.346530  -0.134 0.893328    
## hr.f10:season.fWinter         -0.283863   0.346151  -0.820 0.412215    
## hr.f11:season.fWinter         -0.666731   0.364578  -1.829 0.067481 .  
## hr.f12:season.fWinter         -0.481386   0.342987  -1.404 0.160515    
## hr.f13:season.fWinter         -0.566056   0.352204  -1.607 0.108065    
## hr.f14:season.fWinter         -0.517507   0.341636  -1.515 0.129876    
## hr.f15:season.fWinter         -0.145710   0.332351  -0.438 0.661096    
## hr.f16:season.fWinter         -0.037562   0.345107  -0.109 0.913332    
## hr.f17:season.fWinter         -0.038625   0.334028  -0.116 0.907945    
## hr.f18:season.fWinter          0.008282   0.346020   0.024 0.980905    
## hr.f19:season.fWinter          0.447459   0.340161   1.315 0.188413    
## hr.f20:season.fWinter          0.154944   0.351101   0.441 0.659005    
## hr.f21:season.fWinter          0.151750   0.350371   0.433 0.664948    
## hr.f22:season.fWinter         -0.029243   0.351391  -0.083 0.933680    
## hr.f23:season.fWinter         -0.075897   0.340536  -0.223 0.823640    
## lg_temp:season.fSummer         0.659489   0.193552   3.407 0.000660 ***
## lg_temp:season.fFall           0.222227   0.436667   0.509 0.610828    
## lg_temp:season.fWinter         0.223747   0.214568   1.043 0.297090    
## hr.f1:lg_temp:workingday.f1   -0.081711   0.151283  -0.540 0.589134    
## hr.f2:lg_temp:workingday.f1    0.120254   0.146402   0.821 0.411452    
## hr.f3:lg_temp:workingday.f1   -0.150413   0.149882  -1.004 0.315635    
## hr.f4:lg_temp:workingday.f1   -0.026785   0.162194  -0.165 0.868837    
## hr.f5:lg_temp:workingday.f1    0.022084   0.149649   0.148 0.882688    
## hr.f6:lg_temp:workingday.f1   -0.420714   0.147607  -2.850 0.004383 ** 
## hr.f7:lg_temp:workingday.f1   -0.312886   0.140042  -2.234 0.025503 *  
## hr.f8:lg_temp:workingday.f1   -0.535887   0.150371  -3.564 0.000368 ***
## hr.f9:lg_temp:workingday.f1   -0.796751   0.152421  -5.227 1.78e-07 ***
## hr.f10:lg_temp:workingday.f1  -0.354933   0.150500  -2.358 0.018387 *  
## hr.f11:lg_temp:workingday.f1  -0.332130   0.161361  -2.058 0.039602 *  
## hr.f12:lg_temp:workingday.f1  -0.031091   0.160063  -0.194 0.845992    
## hr.f13:lg_temp:workingday.f1  -0.292423   0.172284  -1.697 0.089683 .  
## hr.f14:lg_temp:workingday.f1  -0.190454   0.160406  -1.187 0.235143    
## hr.f15:lg_temp:workingday.f1  -0.138938   0.166894  -0.832 0.405165    
## hr.f16:lg_temp:workingday.f1  -0.141818   0.163442  -0.868 0.385594    
## hr.f17:lg_temp:workingday.f1  -0.270057   0.160692  -1.681 0.092894 .  
## hr.f18:lg_temp:workingday.f1  -0.276793   0.159941  -1.731 0.083574 .  
## hr.f19:lg_temp:workingday.f1  -0.439711   0.157414  -2.793 0.005233 ** 
## hr.f20:lg_temp:workingday.f1  -0.386429   0.165305  -2.338 0.019436 *  
## hr.f21:lg_temp:workingday.f1  -0.304770   0.157112  -1.940 0.052446 .  
## hr.f22:lg_temp:workingday.f1  -0.174430   0.154225  -1.131 0.258095    
## hr.f23:lg_temp:workingday.f1  -0.044544   0.150651  -0.296 0.767488    
## hr.f1:lg_temp:season.fSummer  -0.142476   0.275935  -0.516 0.605637    
## hr.f2:lg_temp:season.fSummer  -0.637802   0.263226  -2.423 0.015420 *  
## hr.f3:lg_temp:season.fSummer  -0.235465   0.268749  -0.876 0.380981    
## hr.f4:lg_temp:season.fSummer  -0.391888   0.282791  -1.386 0.165862    
## hr.f5:lg_temp:season.fSummer  -0.262559   0.266549  -0.985 0.324647    
## hr.f6:lg_temp:season.fSummer  -0.157071   0.258373  -0.608 0.543260    
## hr.f7:lg_temp:season.fSummer   0.075900   0.257885   0.294 0.768525    
## hr.f8:lg_temp:season.fSummer  -0.274536   0.286916  -0.957 0.338680    
## hr.f9:lg_temp:season.fSummer  -0.161635   0.274626  -0.589 0.556175    
## hr.f10:lg_temp:season.fSummer -0.483412   0.268832  -1.798 0.072194 .  
## hr.f11:lg_temp:season.fSummer -0.717763   0.282930  -2.537 0.011208 *  
## hr.f12:lg_temp:season.fSummer -0.644799   0.286327  -2.252 0.024359 *  
## hr.f13:lg_temp:season.fSummer -0.969055   0.316436  -3.062 0.002205 ** 
## hr.f14:lg_temp:season.fSummer -0.684099   0.286924  -2.384 0.017144 *  
## hr.f15:lg_temp:season.fSummer -0.670677   0.292795  -2.291 0.022020 *  
## hr.f16:lg_temp:season.fSummer -0.355513   0.300866  -1.182 0.237395    
## hr.f17:lg_temp:season.fSummer -0.375922   0.296961  -1.266 0.205597    
## hr.f18:lg_temp:season.fSummer -0.174114   0.287889  -0.605 0.545337    
## hr.f19:lg_temp:season.fSummer  0.171923   0.301667   0.570 0.568759    
## hr.f20:lg_temp:season.fSummer  0.179098   0.292607   0.612 0.540510    
## hr.f21:lg_temp:season.fSummer  0.199830   0.294715   0.678 0.497768    
## hr.f22:lg_temp:season.fSummer  0.023362   0.281565   0.083 0.933878    
## hr.f23:lg_temp:season.fSummer -0.209306   0.281064  -0.745 0.456486    
## hr.f1:lg_temp:season.fFall    -0.366280   0.638366  -0.574 0.566139    
## hr.f2:lg_temp:season.fFall    -0.140081   0.633988  -0.221 0.825137    
## hr.f3:lg_temp:season.fFall    -0.411336   0.626651  -0.656 0.511588    
## hr.f4:lg_temp:season.fFall     0.644355   0.616070   1.046 0.295642    
## hr.f5:lg_temp:season.fFall    -0.861608   0.623324  -1.382 0.166935    
## hr.f6:lg_temp:season.fFall    -0.410943   0.633637  -0.649 0.516656    
## hr.f7:lg_temp:season.fFall    -0.411451   0.596207  -0.690 0.490148    
## hr.f8:lg_temp:season.fFall    -0.564782   0.618044  -0.914 0.360845    
## hr.f9:lg_temp:season.fFall    -0.435533   0.590300  -0.738 0.460654    
## hr.f10:lg_temp:season.fFall   -0.934171   0.639697  -1.460 0.144249    
## hr.f11:lg_temp:season.fFall   -1.169814   0.605338  -1.932 0.053343 .  
## hr.f12:lg_temp:season.fFall   -0.925475   0.624344  -1.482 0.138307    
## hr.f13:lg_temp:season.fFall   -0.240478   0.599054  -0.401 0.688118    
## hr.f14:lg_temp:season.fFall   -0.362682   0.609279  -0.595 0.551688    
## hr.f15:lg_temp:season.fFall   -0.948783   0.603819  -1.571 0.116163    
## hr.f16:lg_temp:season.fFall   -0.884953   0.576669  -1.535 0.124934    
## hr.f17:lg_temp:season.fFall   -0.502880   0.599878  -0.838 0.401892    
## hr.f18:lg_temp:season.fFall   -0.757977   0.580224  -1.306 0.191481    
## hr.f19:lg_temp:season.fFall   -0.161670   0.598097  -0.270 0.786933    
## hr.f20:lg_temp:season.fFall    0.294859   0.636128   0.464 0.643007    
## hr.f21:lg_temp:season.fFall   -0.164910   0.629314  -0.262 0.793293    
## hr.f22:lg_temp:season.fFall   -0.245219   0.608577  -0.403 0.687008    
## hr.f23:lg_temp:season.fFall   -0.165293   0.621985  -0.266 0.790440    
## hr.f1:lg_temp:season.fWinter  -0.042274   0.293756  -0.144 0.885576    
## hr.f2:lg_temp:season.fWinter  -0.135257   0.291176  -0.465 0.642292    
## hr.f3:lg_temp:season.fWinter   0.298732   0.279031   1.071 0.284389    
## hr.f4:lg_temp:season.fWinter   0.296046   0.289791   1.022 0.307017    
## hr.f5:lg_temp:season.fWinter   0.148366   0.286943   0.517 0.605134    
## hr.f6:lg_temp:season.fWinter  -0.003593   0.278967  -0.013 0.989723    
## hr.f7:lg_temp:season.fWinter   0.045860   0.272403   0.168 0.866311    
## hr.f8:lg_temp:season.fWinter  -0.218740   0.298102  -0.734 0.463112    
## hr.f9:lg_temp:season.fWinter  -0.057390   0.299537  -0.192 0.848066    
## hr.f10:lg_temp:season.fWinter -0.157159   0.315250  -0.499 0.618134    
## hr.f11:lg_temp:season.fWinter -0.385751   0.342026  -1.128 0.259430    
## hr.f12:lg_temp:season.fWinter -0.291798   0.328061  -0.889 0.373789    
## hr.f13:lg_temp:season.fWinter -0.217109   0.342245  -0.634 0.525865    
## hr.f14:lg_temp:season.fWinter -0.259841   0.336588  -0.772 0.440154    
## hr.f15:lg_temp:season.fWinter  0.170420   0.326336   0.522 0.601533    
## hr.f16:lg_temp:season.fWinter  0.204876   0.329475   0.622 0.534078    
## hr.f17:lg_temp:season.fWinter  0.235525   0.314210   0.750 0.453536    
## hr.f18:lg_temp:season.fWinter  0.097032   0.321844   0.301 0.763052    
## hr.f19:lg_temp:season.fWinter  0.443724   0.308953   1.436 0.150990    
## hr.f20:lg_temp:season.fWinter  0.198931   0.317424   0.627 0.530877    
## hr.f21:lg_temp:season.fWinter  0.178255   0.308565   0.578 0.563494    
## hr.f22:lg_temp:season.fWinter  0.132711   0.306606   0.433 0.665147    
## hr.f23:lg_temp:season.fWinter -0.054760   0.294036  -0.186 0.852266    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4212 on 6230 degrees of freedom
## Multiple R-squared:  0.919,  Adjusted R-squared:  0.9157 
## F-statistic: 280.5 on 252 and 6230 DF,  p-value: < 2.2e-16

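The model improves further: adjusted R-squared rises to 0.916 and MSE falls to 0.4188. Note, however, that both figures are now measured on the log scale, so they are not directly comparable with the count-scale R-squared and MSE of the previous model.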
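Because the response is now `lg_cnt`, each coefficient acts multiplicatively on the rental count: holding the other terms fixed, a one-unit increase in a predictor multiplies the (geometric-mean) count by `exp(beta)`. This applies cleanly to `hum` and `windspeed`, which do not enter any interaction. Plugging in their estimates from the summary above:

```r
# Estimates copied from the log-model summary above
beta <- c(hum = -0.842518, windspeed = -0.549272)

# Multiplicative effect on the count of a one-unit increase in each predictor
ratio <- exp(beta)
round(ratio, 3)
```

A one-unit rise in humidity therefore corresponds to roughly 0.43 times the count (about 57% fewer rentals), and a one-unit rise in windspeed to roughly 0.58 times the count (about 42% fewer).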

Model Trimming

However, with 253 estimated coefficients, many of them not significant, the model still contains many redundant terms, which conflicts with the principle of parsimony. Let’s trim it down further by removing interaction terms one at a time:
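The trimming can be reproduced on the formula alone with base R's `terms()` and `update()`. This sketch expands the full log-model formula and drops interaction terms in the same order as Models 2 to 5 in the ANOVA table; it only manipulates formulas, so no data is needed.

```r
# Full log-model formula used above
f1 <- lg_cnt ~ hr.f * lg_temp * workingday.f + hr.f * lg_temp * season.f +
  lg_temp + mnth.f + hum + windspeed

# Drop interaction terms one at a time ('.' keeps the existing sides of the formula)
f2 <- update(f1, . ~ . - hr.f:lg_temp)
f3 <- update(f2, . ~ . - hr.f:season.f)
f4 <- update(f3, . ~ . - hr.f:lg_temp:season.f)
f5 <- update(f4, . ~ . - hr.f:lg_temp:workingday.f)

# Number of model terms (intercept excluded) in each formula
n_terms <- sapply(list(f1, f2, f3, f4, f5),
                  function(f) length(attr(terms(f), "term.labels")))
n_terms
```

With fitted `lm` objects, the same `update()` call refits the model on the original data, e.g. `update(fit, . ~ . - hr.f:lg_temp)`.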

## List of 11
##  $ call         : language lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f +      hum + windspeed + hr.f:workingday.| __truncated__ ...
##  $ terms        :Classes 'terms', 'formula'  language lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum +      windspeed + hr.f:workingday.f + lg_temp:w| __truncated__ ...
##   .. ..- attr(*, "variables")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "factors")= int [1:8, 1:13] 0 1 0 0 0 0 0 0 0 0 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
##   .. .. .. ..$ : chr [1:13] "hr.f" "lg_temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "term.labels")= chr [1:13] "hr.f" "lg_temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "order")= int [1:13] 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
##   .. .. ..- attr(*, "names")= chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
##  $ residuals    : Named num [1:6483] 0.3024 -0.2709 -0.5322 0.7109 -0.0839 ...
##   ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
##  $ coefficients : num [1:253, 1:4] 4.534 -0.247 -0.633 -1.441 -2.772 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
##  $ aliased      : Named logi [1:253] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..- attr(*, "names")= chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ sigma        : num 0.421
##  $ df           : int [1:3] 253 6230 253
##  $ r.squared    : num 0.919
##  $ adj.r.squared: num 0.916
##  $ fstatistic   : Named num [1:3] 280 252 6230
##   ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
##  $ cov.unscaled : num [1:253, 1:253] 0.166 -0.152 -0.153 -0.153 -0.153 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  - attr(*, "class")= chr "summary.lm"
## List of 11
##  $ call         : language lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f +      hum + windspeed + hr.f:workingday.| __truncated__ ...
##  $ terms        :Classes 'terms', 'formula'  language lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum +      windspeed + hr.f:workingday.f + lg_temp:w| __truncated__ ...
##   .. ..- attr(*, "variables")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "factors")= int [1:8, 1:12] 0 1 0 0 0 0 0 0 0 0 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
##   .. .. .. ..$ : chr [1:12] "hr.f" "lg_temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "term.labels")= chr [1:12] "hr.f" "lg_temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "order")= int [1:12] 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
##   .. .. ..- attr(*, "names")= chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
##  $ residuals    : Named num [1:6483] 0.3203 -0.27 -0.4912 0.75 -0.0666 ...
##   ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
##  $ coefficients : num [1:184, 1:4] 4.638 -0.358 -0.855 -1.456 -2.63 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
##  $ aliased      : Named logi [1:184] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..- attr(*, "names")= chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ sigma        : num 0.423
##  $ df           : int [1:3] 184 6299 184
##  $ r.squared    : num 0.918
##  $ adj.r.squared: num 0.915
##  $ fstatistic   : Named num [1:3] 383 183 6299
##   ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
##  $ cov.unscaled : num [1:184, 1:184] 0.0775 -0.0588 -0.059 -0.0585 -0.0576 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  - attr(*, "class")= chr "summary.lm"
## List of 11
##  $ call         : language lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f +      hum + windspeed + hr.f:workingday.| __truncated__ ...
##  $ terms        :Classes 'terms', 'formula'  language lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum +      windspeed + hr.f:workingday.f + lg_temp:w| __truncated__ ...
##   .. ..- attr(*, "variables")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "factors")= int [1:8, 1:11] 0 1 0 0 0 0 0 0 0 0 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
##   .. .. .. ..$ : chr [1:11] "hr.f" "lg_temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "term.labels")= chr [1:11] "hr.f" "lg_temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "order")= int [1:11] 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
##   .. .. ..- attr(*, "names")= chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
##  $ residuals    : Named num [1:6483] 0.2978 -0.2638 -0.5056 0.7851 -0.0664 ...
##   ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
##  $ coefficients : num [1:115, 1:4] 4.641 -0.323 -0.758 -1.392 -2.606 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
##  $ aliased      : Named logi [1:115] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..- attr(*, "names")= chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ sigma        : num 0.425
##  $ df           : int [1:3] 115 6368 115
##  $ r.squared    : num 0.916
##  $ adj.r.squared: num 0.914
##  $ fstatistic   : Named num [1:3] 607 114 6368
##   ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
##  $ cov.unscaled : num [1:115, 1:115] 0.0595 -0.0415 -0.0417 -0.0417 -0.0406 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  - attr(*, "class")= chr "summary.lm"
## List of 11
##  $ call         : language lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f +      hum + windspeed + hr.f:workingday.| __truncated__ ...
##  $ terms        :Classes 'terms', 'formula'  language lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum +      windspeed + hr.f:workingday.f + lg_temp:w| __truncated__
##   .. ..- attr(*, "variables")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "factors")= int [1:8, 1:10] 0 1 0 0 0 0 0 0 0 0 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
##   .. .. .. ..$ : chr [1:10] "hr.f" "lg_temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "term.labels")= chr [1:10] "hr.f" "lg_temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "order")= int [1:10] 1 1 1 1 1 1 1 2 2 2
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
##   .. .. ..- attr(*, "names")= chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
##  $ residuals    : Named num [1:6483] 0.314 -0.252 -0.418 0.768 -0.134 ...
##   ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
##  $ coefficients : num [1:69, 1:4] 4.645 -0.335 -0.572 -1.291 -2.486 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:69] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
##  $ aliased      : Named logi [1:69] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..- attr(*, "names")= chr [1:69] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ sigma        : num 0.432
##  $ df           : int [1:3] 69 6414 69
##  $ r.squared    : num 0.912
##  $ adj.r.squared: num 0.912
##  $ fstatistic   : Named num [1:3] 983 68 6414
##   ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
##  $ cov.unscaled : num [1:69, 1:69] 0.031 -0.0107 -0.0106 -0.0104 -0.0106 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:69] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:69] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  - attr(*, "class")= chr "summary.lm"

Removing these interaction terms does not reduce the explanatory power of the log model by much: adjusted R-squared decreases from 0.916 to 0.912 and MSE increases from 0.419 to 0.432. However, the ANOVA below, which compares each model with the one before it, shows that every trimming step after Model 2 (Models 3, 4 and 5) significantly increases the residual sum of squares, with all p-values below 0.001:

## Analysis of Variance Table
## 
## Model 1: lg_cnt ~ hr.f * lg_temp * workingday.f + hr.f * lg_temp * season.f + 
##     lg_temp + mnth.f + hum + windspeed
## Model 2: lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + 
##     hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f + 
##     hr.f:season.f + lg_temp:season.f + hr.f:lg_temp:workingday.f + 
##     hr.f:lg_temp:season.f
## Model 3: lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + 
##     hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f + 
##     lg_temp:season.f + hr.f:lg_temp:workingday.f + hr.f:lg_temp:season.f
## Model 4: lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + 
##     hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f + 
##     lg_temp:season.f + hr.f:lg_temp:workingday.f
## Model 5: lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + 
##     hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f + 
##     lg_temp:season.f
##   Res.Df    RSS  Df Sum of Sq      F    Pr(>F)    
## 1   6230 1105.1                                   
## 2   6230 1105.1   0     0.000                     
## 3   6299 1125.3 -69   -20.202 1.6506 0.0005986 ***
## 4   6368 1149.7 -69   -24.367 1.9909 2.341e-06 ***
## 5   6414 1194.6 -46   -44.980 5.5125 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
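Each F statistic in this table divides the increase in RSS per removed parameter by the residual mean square of the largest model (Model 1), which is how `anova()` treats a list of `lm` fits. Recomputing rows 3 to 5 from the printed, rounded values:

```r
# Values copied from the ANOVA table above (RSS is rounded to one decimal)
rss_full <- 1105.1                 # Model 1 residual sum of squares
df_full  <- 6230                   # Model 1 residual degrees of freedom
ms_full  <- rss_full / df_full     # residual mean square of the largest model

delta_ss <- c(20.202, 24.367, 44.980)  # increase in RSS for Models 3, 4, 5
delta_df <- c(69, 69, 46)              # parameters removed at each step

F_stat <- (delta_ss / delta_df) / ms_full
p_val  <- pf(F_stat, delta_df, df_full, lower.tail = FALSE)
round(F_stat, 4)
```

This reproduces the printed F values of 1.6506, 1.9909 and 5.5125 up to rounding.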

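Model 2, which drops hr.f:lg_temp, has the same residual degrees of freedom (6230) and the same RSS (1105.1) as Model 1, so it is not significantly different from the full model; in fact, its fit is identical. Since every further trimming step worsens the fit significantly, I will choose Model 2 as my final model.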
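Why the first two rows of the ANOVA table show 0 degrees of freedom: when a two-way term such as `hr.f:lg_temp` is dropped but the three-way terms containing it remain, R's model-matrix coding compensates by coding one factor of the higher-order term with indicator columns for all of its levels instead of contrasts, so the dropped columns reappear inside the three-way interaction. A small simulated check (the factors `a`, `b`, the covariate `x` and the seed are illustrative assumptions, not the bikeshare data):

```r
set.seed(42)
n <- 300
d <- data.frame(
  a = factor(sample(c("p", "q", "r"), n, replace = TRUE)),  # stands in for hr.f
  b = factor(sample(c("u", "v"), n, replace = TRUE)),       # stands in for workingday.f
  x = runif(n)                                              # stands in for lg_temp
)
d$y <- 1 + 2 * d$x + as.integer(d$a) * d$x * as.integer(d$b) + rnorm(n)

full    <- lm(y ~ a * x * b, data = d)
dropped <- update(full, . ~ . - a:x)   # remove only the two-way term

# Same number of coefficients, same residual df and same RSS
c(length(coef(full)), length(coef(dropped)))
anova(full, dropped)
```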

## 
## Call:
## lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + 
##     mnth.f + hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f + 
##     hr.f:season.f + lg_temp:season.f + hr.f:lg_temp:workingday.f + 
##     hr.f:lg_temp:season.f, data = train11)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4604 -0.1540  0.0488  0.2229  2.0736 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    4.533582   0.171660  26.410  < 2e-16 ***
## hr.f1                         -0.247070   0.233793  -1.057 0.290648    
## hr.f2                         -0.632714   0.233104  -2.714 0.006660 ** 
## hr.f3                         -1.441122   0.240813  -5.984 2.29e-09 ***
## hr.f4                         -2.772145   0.264385 -10.485  < 2e-16 ***
## hr.f5                         -3.086811   0.251909 -12.254  < 2e-16 ***
## hr.f6                         -1.954148   0.241950  -8.077 7.93e-16 ***
## hr.f7                         -1.344376   0.232257  -5.788 7.46e-09 ***
## hr.f8                          0.025458   0.296019   0.086 0.931469    
## hr.f9                          0.497165   0.251371   1.978 0.047993 *  
## hr.f10                         1.180717   0.240942   4.900 9.80e-07 ***
## hr.f11                         1.598780   0.266407   6.001 2.07e-09 ***
## hr.f12                         1.447013   0.244800   5.911 3.58e-09 ***
## hr.f13                         1.835071   0.262841   6.982 3.22e-12 ***
## hr.f14                         1.476195   0.245502   6.013 1.92e-09 ***
## hr.f15                         1.308035   0.243573   5.370 8.15e-08 ***
## hr.f16                         1.330853   0.251860   5.284 1.31e-07 ***
## hr.f17                         1.200911   0.243127   4.939 8.04e-07 ***
## hr.f18                         0.983277   0.246273   3.993 6.61e-05 ***
## hr.f19                         0.535366   0.249987   2.142 0.032266 *  
## hr.f20                         0.520891   0.263068   1.980 0.047740 *  
## hr.f21                         0.343491   0.263052   1.306 0.191671    
## hr.f22                         0.290723   0.253398   1.147 0.251302    
## hr.f23                        -0.129158   0.240747  -0.536 0.591642    
## lg_temp                        0.442288   0.113199   3.907 9.44e-05 ***
## workingday.f1                 -1.008461   0.107166  -9.410  < 2e-16 ***
## season.fSummer                 0.803474   0.212422   3.782 0.000157 ***
## season.fFall                   0.667479   0.248495   2.686 0.007249 ** 
## season.fWinter                 0.858886   0.246182   3.489 0.000489 ***
## mnth.f2                        0.159044   0.029677   5.359 8.66e-08 ***
## mnth.f3                        0.185035   0.035250   5.249 1.58e-07 ***
## mnth.f4                        0.384559   0.049256   7.807 6.80e-15 ***
## mnth.f5                        0.640851   0.053113  12.066  < 2e-16 ***
## mnth.f6                        0.468726   0.055746   8.408  < 2e-16 ***
## mnth.f7                        0.383784   0.061098   6.281 3.58e-10 ***
## mnth.f8                        0.363315   0.059133   6.144 8.54e-10 ***
## mnth.f9                        0.378031   0.056019   6.748 1.63e-11 ***
## mnth.f10                       0.322501   0.049281   6.544 6.46e-11 ***
## mnth.f11                       0.212335   0.047685   4.453 8.62e-06 ***
## mnth.f12                       0.305000   0.039377   7.746 1.10e-14 ***
## hum                           -0.842518   0.033580 -25.090  < 2e-16 ***
## windspeed                     -0.549272   0.046353 -11.850  < 2e-16 ***
## hr.f1:workingday.f1           -0.506829   0.158107  -3.206 0.001355 ** 
## hr.f2:workingday.f1           -0.681905   0.155886  -4.374 1.24e-05 ***
## hr.f3:workingday.f1           -0.771363   0.159695  -4.830 1.40e-06 ***
## hr.f4:workingday.f1            0.678457   0.168105   4.036 5.50e-05 ***
## hr.f5:workingday.f1            2.047011   0.164829  12.419  < 2e-16 ***
## hr.f6:workingday.f1            2.481230   0.160912  15.420  < 2e-16 ***
## hr.f7:workingday.f1            2.801075   0.150463  18.616  < 2e-16 ***
## hr.f8:workingday.f1            2.170660   0.156122  13.904  < 2e-16 ***
## hr.f9:workingday.f1            0.831912   0.154596   5.381 7.67e-08 ***
## hr.f10:workingday.f1           0.128856   0.148708   0.867 0.386248    
## hr.f11:workingday.f1           0.102741   0.151080   0.680 0.496505    
## hr.f12:workingday.f1           0.361688   0.150254   2.407 0.016105 *  
## hr.f13:workingday.f1           0.162788   0.150858   1.079 0.280593    
## hr.f14:workingday.f1           0.186932   0.145993   1.280 0.200447    
## hr.f15:workingday.f1           0.303987   0.149553   2.033 0.042131 *  
## hr.f16:workingday.f1           0.676216   0.148135   4.565 5.09e-06 ***
## hr.f17:workingday.f1           1.346999   0.148305   9.083  < 2e-16 ***
## hr.f18:workingday.f1           1.338956   0.152796   8.763  < 2e-16 ***
## hr.f19:workingday.f1           1.068509   0.150602   7.095 1.44e-12 ***
## hr.f20:workingday.f1           1.000281   0.157066   6.369 2.04e-10 ***
## hr.f21:workingday.f1           0.961633   0.155228   6.195 6.20e-10 ***
## hr.f22:workingday.f1           1.044531   0.154527   6.760 1.51e-11 ***
## hr.f23:workingday.f1           0.979434   0.154711   6.331 2.61e-10 ***
## lg_temp:workingday.f1          0.007914   0.104927   0.075 0.939878    
## hr.f1:season.fSummer          -0.144182   0.296560  -0.486 0.626856    
## hr.f2:season.fSummer          -0.375242   0.292593  -1.282 0.199724    
## hr.f3:season.fSummer          -0.127003   0.302173  -0.420 0.674281    
## hr.f4:season.fSummer          -0.194272   0.313755  -0.619 0.535817    
## hr.f5:season.fSummer           0.291361   0.307402   0.948 0.343259    
## hr.f6:season.fSummer           0.262480   0.299688   0.876 0.381149    
## hr.f7:season.fSummer           0.443285   0.285372   1.553 0.120389    
## hr.f8:season.fSummer          -0.124917   0.333514  -0.375 0.708009    
## hr.f9:season.fSummer          -0.118051   0.300941  -0.392 0.694870    
## hr.f10:season.fSummer         -0.587658   0.288455  -2.037 0.041666 *  
## hr.f11:season.fSummer         -0.868934   0.301097  -2.886 0.003916 ** 
## hr.f12:season.fSummer         -0.650876   0.293189  -2.220 0.026455 *  
## hr.f13:season.fSummer         -0.978481   0.301501  -3.245 0.001179 ** 
## hr.f14:season.fSummer         -0.750543   0.285751  -2.627 0.008646 ** 
## hr.f15:season.fSummer         -0.628057   0.283282  -2.217 0.026654 *  
## hr.f16:season.fSummer         -0.404307   0.292900  -1.380 0.167526    
## hr.f17:season.fSummer         -0.384052   0.287293  -1.337 0.181338    
## hr.f18:season.fSummer         -0.159922   0.291170  -0.549 0.582861    
## hr.f19:season.fSummer          0.369939   0.297743   1.242 0.214107    
## hr.f20:season.fSummer          0.232877   0.303774   0.767 0.443341    
## hr.f21:season.fSummer          0.244015   0.308863   0.790 0.429533    
## hr.f22:season.fSummer         -0.030339   0.302866  -0.100 0.920210    
## hr.f23:season.fSummer         -0.150611   0.297560  -0.506 0.612766    
## hr.f1:season.fFall            -0.194763   0.356116  -0.547 0.584461    
## hr.f2:season.fFall            -0.200652   0.355830  -0.564 0.572843    
## hr.f3:season.fFall            -0.055470   0.362335  -0.153 0.878331    
## hr.f4:season.fFall             0.446332   0.367830   1.213 0.225016    
## hr.f5:season.fFall             0.102268   0.365524   0.280 0.779652    
## hr.f6:season.fFall             0.210322   0.364430   0.577 0.563876    
## hr.f7:season.fFall             0.214991   0.339269   0.634 0.526307    
## hr.f8:season.fFall            -0.232171   0.378131  -0.614 0.539239    
## hr.f9:season.fFall            -0.165559   0.339668  -0.487 0.625982    
## hr.f10:season.fFall           -0.649566   0.332356  -1.954 0.050695 .  
## hr.f11:season.fFall           -0.881288   0.335838  -2.624 0.008708 ** 
## hr.f12:season.fFall           -0.727042   0.327090  -2.223 0.026268 *  
## hr.f13:season.fFall           -0.795132   0.326067  -2.439 0.014774 *  
## hr.f14:season.fFall           -0.610155   0.318650  -1.915 0.055562 .  
## hr.f15:season.fFall           -0.619515   0.315418  -1.964 0.049562 *  
## hr.f16:season.fFall           -0.553842   0.318016  -1.742 0.081636 .  
## hr.f17:season.fFall           -0.415055   0.316771  -1.310 0.190153    
## hr.f18:season.fFall           -0.249275   0.322341  -0.773 0.439359    
## hr.f19:season.fFall            0.288256   0.331905   0.868 0.385159    
## hr.f20:season.fFall            0.309249   0.348028   0.889 0.374265    
## hr.f21:season.fFall            0.178797   0.353387   0.506 0.612908    
## hr.f22:season.fFall           -0.109180   0.345311  -0.316 0.751878    
## hr.f23:season.fFall           -0.024723   0.349387  -0.071 0.943591    
## hr.f1:season.fWinter          -0.118934   0.341901  -0.348 0.727955    
## hr.f2:season.fWinter          -0.310943   0.341705  -0.910 0.362871    
## hr.f3:season.fWinter           0.242569   0.338418   0.717 0.473540    
## hr.f4:season.fWinter           0.513317   0.348231   1.474 0.140513    
## hr.f5:season.fWinter           0.571597   0.348828   1.639 0.101343    
## hr.f6:season.fWinter           0.290813   0.342543   0.849 0.395923    
## hr.f7:season.fWinter           0.462898   0.330353   1.401 0.161197    
## hr.f8:season.fWinter          -0.169189   0.370172  -0.457 0.647647    
## hr.f9:season.fWinter          -0.046470   0.346530  -0.134 0.893328    
## hr.f10:season.fWinter         -0.283863   0.346151  -0.820 0.412215    
## hr.f11:season.fWinter         -0.666731   0.364578  -1.829 0.067481 .  
## hr.f12:season.fWinter         -0.481386   0.342987  -1.404 0.160515    
## hr.f13:season.fWinter         -0.566056   0.352204  -1.607 0.108065    
## hr.f14:season.fWinter         -0.517507   0.341636  -1.515 0.129876    
## hr.f15:season.fWinter         -0.145710   0.332351  -0.438 0.661096    
## hr.f16:season.fWinter         -0.037562   0.345107  -0.109 0.913332    
## hr.f17:season.fWinter         -0.038625   0.334028  -0.116 0.907945    
## hr.f18:season.fWinter          0.008282   0.346020   0.024 0.980905    
## hr.f19:season.fWinter          0.447459   0.340161   1.315 0.188413    
## hr.f20:season.fWinter          0.154944   0.351101   0.441 0.659005    
## hr.f21:season.fWinter          0.151750   0.350371   0.433 0.664948    
## hr.f22:season.fWinter         -0.029243   0.351391  -0.083 0.933680    
## hr.f23:season.fWinter         -0.075897   0.340536  -0.223 0.823640    
## lg_temp:season.fSummer         0.659489   0.193552   3.407 0.000660 ***
## lg_temp:season.fFall           0.222227   0.436667   0.509 0.610828    
## lg_temp:season.fWinter         0.223747   0.214568   1.043 0.297090    
## hr.f1:lg_temp:workingday.f0    0.052460   0.159449   0.329 0.742163    
## hr.f2:lg_temp:workingday.f0   -0.090375   0.152572  -0.592 0.553641    
## hr.f3:lg_temp:workingday.f0   -0.134459   0.153075  -0.878 0.379766    
## hr.f4:lg_temp:workingday.f0   -0.194066   0.188310  -1.031 0.302784    
## hr.f5:lg_temp:workingday.f0   -0.287946   0.166815  -1.726 0.084371 .  
## hr.f6:lg_temp:workingday.f0    0.070624   0.160420   0.440 0.659773    
## hr.f7:lg_temp:workingday.f0   -0.088647   0.151780  -0.584 0.559206    
## hr.f8:lg_temp:workingday.f0    0.078601   0.199715   0.394 0.693916    
## hr.f9:lg_temp:workingday.f0    0.044830   0.175144   0.256 0.797989    
## hr.f10:lg_temp:workingday.f0   0.141348   0.170534   0.829 0.407217    
## hr.f11:lg_temp:workingday.f0   0.377567   0.203392   1.856 0.063451 .  
## hr.f12:lg_temp:workingday.f0   0.122378   0.184474   0.663 0.507106    
## hr.f13:lg_temp:workingday.f0   0.483142   0.219843   2.198 0.028009 *  
## hr.f14:lg_temp:workingday.f0   0.182143   0.193983   0.939 0.347786    
## hr.f15:lg_temp:workingday.f0   0.108955   0.194907   0.559 0.576175    
## hr.f16:lg_temp:workingday.f0   0.148920   0.198991   0.748 0.454261    
## hr.f17:lg_temp:workingday.f0   0.137317   0.189329   0.725 0.468304    
## hr.f18:lg_temp:workingday.f0   0.088821   0.183557   0.484 0.628484    
## hr.f19:lg_temp:workingday.f0   0.008569   0.184728   0.046 0.963005    
## hr.f20:lg_temp:workingday.f0   0.112515   0.202481   0.556 0.578446    
## hr.f21:lg_temp:workingday.f0   0.092249   0.191920   0.481 0.630774    
## hr.f22:lg_temp:workingday.f0   0.146752   0.179383   0.818 0.413335    
## hr.f23:lg_temp:workingday.f0   0.018176   0.165227   0.110 0.912408    
## hr.f1:lg_temp:workingday.f1   -0.029251   0.168329  -0.174 0.862048    
## hr.f2:lg_temp:workingday.f1    0.029879   0.166662   0.179 0.857726    
## hr.f3:lg_temp:workingday.f1   -0.284873   0.170704  -1.669 0.095205 .  
## hr.f4:lg_temp:workingday.f1   -0.220851   0.167522  -1.318 0.187437    
## hr.f5:lg_temp:workingday.f1   -0.265863   0.158393  -1.679 0.093299 .  
## hr.f6:lg_temp:workingday.f1   -0.350090   0.168773  -2.074 0.038090 *  
## hr.f7:lg_temp:workingday.f1   -0.401533   0.150648  -2.665 0.007710 ** 
## hr.f8:lg_temp:workingday.f1   -0.457287   0.193303  -2.366 0.018029 *  
## hr.f9:lg_temp:workingday.f1   -0.751921   0.176814  -4.253 2.14e-05 ***
## hr.f10:lg_temp:workingday.f1  -0.213585   0.169864  -1.257 0.208660    
## hr.f11:lg_temp:workingday.f1   0.045438   0.183257   0.248 0.804185    
## hr.f12:lg_temp:workingday.f1   0.091287   0.186541   0.489 0.624598    
## hr.f13:lg_temp:workingday.f1   0.190718   0.194808   0.979 0.327615    
## hr.f14:lg_temp:workingday.f1  -0.008311   0.187666  -0.044 0.964677    
## hr.f15:lg_temp:workingday.f1  -0.029982   0.185967  -0.161 0.871922    
## hr.f16:lg_temp:workingday.f1   0.007102   0.187889   0.038 0.969849    
## hr.f17:lg_temp:workingday.f1  -0.132740   0.182277  -0.728 0.466500    
## hr.f18:lg_temp:workingday.f1  -0.187973   0.183938  -1.022 0.306852    
## hr.f19:lg_temp:workingday.f1  -0.431143   0.182319  -2.365 0.018071 *  
## hr.f20:lg_temp:workingday.f1  -0.273913   0.180401  -1.518 0.128975    
## hr.f21:lg_temp:workingday.f1  -0.212521   0.182732  -1.163 0.244866    
## hr.f22:lg_temp:workingday.f1  -0.027678   0.176888  -0.156 0.875666    
## hr.f23:lg_temp:workingday.f1  -0.026368   0.170813  -0.154 0.877325    
## hr.f1:lg_temp:season.fSummer  -0.142476   0.275935  -0.516 0.605637    
## hr.f2:lg_temp:season.fSummer  -0.637802   0.263226  -2.423 0.015420 *  
## hr.f3:lg_temp:season.fSummer  -0.235465   0.268749  -0.876 0.380981    
## hr.f4:lg_temp:season.fSummer  -0.391888   0.282791  -1.386 0.165862    
## hr.f5:lg_temp:season.fSummer  -0.262559   0.266549  -0.985 0.324647    
## hr.f6:lg_temp:season.fSummer  -0.157071   0.258373  -0.608 0.543260    
## hr.f7:lg_temp:season.fSummer   0.075900   0.257885   0.294 0.768525    
## hr.f8:lg_temp:season.fSummer  -0.274536   0.286916  -0.957 0.338680    
## hr.f9:lg_temp:season.fSummer  -0.161635   0.274626  -0.589 0.556175    
## hr.f10:lg_temp:season.fSummer -0.483412   0.268832  -1.798 0.072194 .  
## hr.f11:lg_temp:season.fSummer -0.717763   0.282930  -2.537 0.011208 *  
## hr.f12:lg_temp:season.fSummer -0.644799   0.286327  -2.252 0.024359 *  
## hr.f13:lg_temp:season.fSummer -0.969055   0.316436  -3.062 0.002205 ** 
## hr.f14:lg_temp:season.fSummer -0.684099   0.286924  -2.384 0.017144 *  
## hr.f15:lg_temp:season.fSummer -0.670677   0.292795  -2.291 0.022020 *  
## hr.f16:lg_temp:season.fSummer -0.355513   0.300866  -1.182 0.237395    
## hr.f17:lg_temp:season.fSummer -0.375922   0.296961  -1.266 0.205597    
## hr.f18:lg_temp:season.fSummer -0.174114   0.287889  -0.605 0.545337    
## hr.f19:lg_temp:season.fSummer  0.171923   0.301667   0.570 0.568759    
## hr.f20:lg_temp:season.fSummer  0.179098   0.292607   0.612 0.540510    
## hr.f21:lg_temp:season.fSummer  0.199830   0.294715   0.678 0.497768    
## hr.f22:lg_temp:season.fSummer  0.023362   0.281565   0.083 0.933878    
## hr.f23:lg_temp:season.fSummer -0.209306   0.281064  -0.745 0.456486    
## hr.f1:lg_temp:season.fFall    -0.366280   0.638366  -0.574 0.566139    
## hr.f2:lg_temp:season.fFall    -0.140081   0.633988  -0.221 0.825137    
## hr.f3:lg_temp:season.fFall    -0.411336   0.626651  -0.656 0.511588    
## hr.f4:lg_temp:season.fFall     0.644355   0.616070   1.046 0.295642    
## hr.f5:lg_temp:season.fFall    -0.861608   0.623324  -1.382 0.166935    
## hr.f6:lg_temp:season.fFall    -0.410943   0.633637  -0.649 0.516656    
## hr.f7:lg_temp:season.fFall    -0.411451   0.596207  -0.690 0.490148    
## hr.f8:lg_temp:season.fFall    -0.564782   0.618044  -0.914 0.360845    
## hr.f9:lg_temp:season.fFall    -0.435533   0.590300  -0.738 0.460654    
## hr.f10:lg_temp:season.fFall   -0.934171   0.639697  -1.460 0.144249    
## hr.f11:lg_temp:season.fFall   -1.169814   0.605338  -1.932 0.053343 .  
## hr.f12:lg_temp:season.fFall   -0.925475   0.624344  -1.482 0.138307    
## hr.f13:lg_temp:season.fFall   -0.240478   0.599054  -0.401 0.688118    
## hr.f14:lg_temp:season.fFall   -0.362682   0.609279  -0.595 0.551688    
## hr.f15:lg_temp:season.fFall   -0.948783   0.603819  -1.571 0.116163    
## hr.f16:lg_temp:season.fFall   -0.884953   0.576669  -1.535 0.124934    
## hr.f17:lg_temp:season.fFall   -0.502880   0.599878  -0.838 0.401892    
## hr.f18:lg_temp:season.fFall   -0.757977   0.580224  -1.306 0.191481    
## hr.f19:lg_temp:season.fFall   -0.161670   0.598097  -0.270 0.786933    
## hr.f20:lg_temp:season.fFall    0.294859   0.636128   0.464 0.643007    
## hr.f21:lg_temp:season.fFall   -0.164910   0.629314  -0.262 0.793293    
## hr.f22:lg_temp:season.fFall   -0.245219   0.608577  -0.403 0.687008    
## hr.f23:lg_temp:season.fFall   -0.165293   0.621985  -0.266 0.790440    
## hr.f1:lg_temp:season.fWinter  -0.042274   0.293756  -0.144 0.885576    
## hr.f2:lg_temp:season.fWinter  -0.135257   0.291176  -0.465 0.642292    
## hr.f3:lg_temp:season.fWinter   0.298732   0.279031   1.071 0.284389    
## hr.f4:lg_temp:season.fWinter   0.296046   0.289791   1.022 0.307017    
## hr.f5:lg_temp:season.fWinter   0.148366   0.286943   0.517 0.605134    
## hr.f6:lg_temp:season.fWinter  -0.003593   0.278967  -0.013 0.989723    
## hr.f7:lg_temp:season.fWinter   0.045860   0.272403   0.168 0.866311    
## hr.f8:lg_temp:season.fWinter  -0.218740   0.298102  -0.734 0.463112    
## hr.f9:lg_temp:season.fWinter  -0.057390   0.299537  -0.192 0.848066    
## hr.f10:lg_temp:season.fWinter -0.157159   0.315250  -0.499 0.618134    
## hr.f11:lg_temp:season.fWinter -0.385751   0.342026  -1.128 0.259430    
## hr.f12:lg_temp:season.fWinter -0.291798   0.328061  -0.889 0.373789    
## hr.f13:lg_temp:season.fWinter -0.217109   0.342245  -0.634 0.525865    
## hr.f14:lg_temp:season.fWinter -0.259841   0.336588  -0.772 0.440154    
## hr.f15:lg_temp:season.fWinter  0.170420   0.326336   0.522 0.601533    
## hr.f16:lg_temp:season.fWinter  0.204876   0.329475   0.622 0.534078    
## hr.f17:lg_temp:season.fWinter  0.235525   0.314210   0.750 0.453536    
## hr.f18:lg_temp:season.fWinter  0.097032   0.321844   0.301 0.763052    
## hr.f19:lg_temp:season.fWinter  0.443724   0.308953   1.436 0.150990    
## hr.f20:lg_temp:season.fWinter  0.198931   0.317424   0.627 0.530877    
## hr.f21:lg_temp:season.fWinter  0.178255   0.308565   0.578 0.563494    
## hr.f22:lg_temp:season.fWinter  0.132711   0.306606   0.433 0.665147    
## hr.f23:lg_temp:season.fWinter -0.054760   0.294036  -0.186 0.852266    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4212 on 6230 degrees of freedom
## Multiple R-squared:  0.919,  Adjusted R-squared:  0.9157 
## F-statistic: 280.5 on 252 and 6230 DF,  p-value: < 2.2e-16
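The decision to drop hr.f:lg_temp rests on an F-test between two nested linear models. The sketch below shows that kind of comparison with `anova()` on simulated data. The variables `x`, `g` and `y` are hypothetical stand-ins for lg_temp, hr.f and lg_cnt, not the report's actual train11 columns.

```r
# Sketch: an F-test between two nested linear models with anova().
# Simulated data; x, g and y are hypothetical stand-ins for lg_temp, hr.f
# and lg_cnt in train11.
set.seed(1)
n <- 600
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))  # categorical predictor
x <- runif(n)                                              # continuous predictor
y <- 1 + 2 * x + as.numeric(g) + rnorm(n, sd = 0.5)       # no true x:g interaction

full    <- lm(y ~ x * g)  # main effects plus the x:g interaction
reduced <- lm(y ~ x + g)  # the interaction dropped

cmp <- anova(reduced, full)  # does adding x:g reduce the RSS significantly?
print(cmp)

# A large p-value means dropping the interaction does not make the fit
# significantly worse, so the simpler model is preferred.
p_value <- cmp[["Pr(>F)"]][2]
print(p_value)
```

The `Df` entry in the second row counts the coefficients the interaction adds: two here, since `g` has three levels.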

Next, I generate predictions for the training dataset from the log model with interactions and compare them with the actual counts.

## # A tibble: 20 x 2
##      cnt       pred
##    <int>      <dbl>
##  1    40  17.520762
##  2     1   2.171260
##  3     1   1.725305
##  4     2   3.091957
##  5    94 131.559509
##  6    67  82.434962
##  7    35  65.062956
##  8    37  44.570102
##  9    39  23.138253
## 10     6   7.063183
## 11    93 108.167812
## 12    74  87.733195
## 13    22  47.521091
## 14     9  31.469255
## 15     5   9.737650
## 16    30  25.051612
## 17    88 125.419972
## 18    76  92.209247
## 19   110 110.204445
## 20    94  65.457335
## [1] 4743026
## [1] 6.589896

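Glancing over the first 20 rows, the predictions track the actual counts, although several miss by 20 to 40 rentals (row 5, for instance, is 94 actual versus 131.6 predicted). The reported mean squared error is 6.59, but its scale needs care: on the raw count scale the squared errors in these 20 rows alone average several hundred, so 6.59 is unlikely to be a count-scale MSE. It is also an in-sample figure, so it describes how well the model fits the training data rather than how well it will predict new hours.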
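Because the model is fitted on a log scale, an error figure means different things depending on whether it is computed before or after back-transforming. The sketch below, on simulated data, computes the mean squared error on both scales. It assumes that lg_cnt is `log(cnt)`; the report may have used a different transform such as `log(cnt + 1)`.

```r
# Sketch: predictions from a linear model fitted to log(cnt), back-transformed
# to counts, with the mean squared error computed on both scales.
# Simulated data; treating lg_cnt as log(cnt) is an assumption.
set.seed(2)
n <- 500
temp <- runif(n)
cnt <- rpois(n, lambda = exp(2 + 2 * temp)) + 1  # +1 keeps log(cnt) finite

fit <- lm(log(cnt) ~ temp)
pred_log <- predict(fit, newdata = data.frame(temp = temp))  # log scale
pred_cnt <- exp(pred_log)                                    # count scale

mse_log <- mean((log(cnt) - pred_log)^2)  # comparable to the residual variance
mse_cnt <- mean((cnt - pred_cnt)^2)       # in squared rentals per hour

print(head(data.frame(cnt = cnt, pred = round(pred_cnt, 2))))
print(c(mse_log = mse_log, mse_cnt = mse_cnt))
```

Applying `exp()` to a log-scale prediction gives something closer to a geometric mean than an arithmetic mean, so back-transformed predictions tend to run slightly low. This is another reason to state which scale an error figure refers to.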

Generalized linear model

Poisson Models

After some research and a close look at the dataset, I decided that a generalized linear model for count data is also suitable here, because the response variable is a non-negative integer (a count). A Poisson regression is a natural starting point, since we are measuring the total number of rentals per hour (the number of arrivals per unit of time).
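A Poisson regression models the log of the expected hourly count as a linear function of the predictors. The sketch below fits one with `glm(family = poisson)` on simulated data and shows how to read its coefficients. `rush_hour` and `temp` are hypothetical predictors, not the report's variables.

```r
# Sketch: Poisson regression with a log link for hourly counts.
# Simulated data; rush_hour and temp are hypothetical predictors.
set.seed(3)
n <- 5000
rush_hour <- rbinom(n, 1, 0.3)                    # 1 = commuting hour
temp <- runif(n)                                  # normalized temperature
lambda <- exp(1 + 0.8 * rush_hour + 0.5 * temp)   # expected rentals per hour
cnt <- rpois(n, lambda)

pois_fit <- glm(cnt ~ rush_hour + temp, family = poisson)

# With a log link, exp(coefficient) is a rate ratio: exp(0.8) is about 2.2,
# so rush hours are modeled as having roughly 2.2 times the expected count.
print(round(exp(coef(pois_fit)), 3))
```

The two values printed next come from the report's training data, not from this sketch.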

## [1] 144.3972
## [1] 17850.4

The variance of the hourly rental count (17850.4) is more than 100 times its mean (144.4). A Poisson distribution has a variance equal to its mean, so a plain Poisson regression is unlikely to describe these data well. This points to overdispersion, which occurs when the observed variance is much larger than the variance the model implies. Because the raw mean and variance ignore the predictors, a more direct check is to fit both a Poisson and a quasi-Poisson model to the same data and compare them.
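Comparing the raw mean and variance of cnt is only a rough check, because part of the spread comes from the predictors themselves. The sketch below simulates counts that are exactly Poisson within each hypothetical `hour_group`. The marginal variance still far exceeds the mean, while the within-group variance-to-mean ratios stay close to 1.

```r
# Sketch: why the raw mean and variance can overstate overdispersion.
# Counts are simulated as exactly Poisson within each hour group, so there is
# no overdispersion conditional on the group; hour_group is hypothetical.
set.seed(4)
n <- 3000
hour_group <- factor(sample(c("night", "midday", "rush"), n, replace = TRUE))
lambda <- c(night = 3, midday = 20, rush = 150)[as.character(hour_group)]
cnt <- rpois(n, lambda)

# Marginal check: ignores the predictor, so the variance far exceeds the mean.
print(c(mean = mean(cnt), var = var(cnt)))

# Conditional check: within each group, variance / mean is close to 1.
ratios <- tapply(cnt, hour_group, function(v) var(v) / mean(v))
print(round(ratios, 2))
```

The Poisson and quasi-Poisson summaries printed next are the report's fits to train11, not output of this sketch.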

## 
## Call:
## glm(formula = formula1, family = poisson, data = train11)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -19.5243   -3.3235   -0.6176    2.6411   21.0595  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  2.961642   0.012113 244.497  < 2e-16 ***
## hr.f1                       -0.487373   0.014926 -32.652  < 2e-16 ***
## hr.f2                       -0.883611   0.016884 -52.334  < 2e-16 ***
## hr.f3                       -1.505289   0.022331 -67.408  < 2e-16 ***
## hr.f4                       -2.089087   0.029326 -71.237  < 2e-16 ***
## hr.f5                       -1.066271   0.018826 -56.638  < 2e-16 ***
## hr.f6                        0.331216   0.012052  27.483  < 2e-16 ***
## hr.f7                        1.332681   0.010086 132.136  < 2e-16 ***
## hr.f8                        1.807462   0.009617 187.952  < 2e-16 ***
## hr.f9                        1.303421   0.010079 129.318  < 2e-16 ***
## hr.f10                       1.044882   0.010312 101.324  < 2e-16 ***
## hr.f11                       1.191663   0.010184 117.011  < 2e-16 ***
## hr.f12                       1.355565   0.010053 134.847  < 2e-16 ***
## hr.f13                       1.359879   0.010062 135.151  < 2e-16 ***
## hr.f14                       1.274582   0.010142 125.677  < 2e-16 ***
## hr.f15                       1.284774   0.010195 126.020  < 2e-16 ***
## hr.f16                       1.538777   0.009931 154.943  < 2e-16 ***
## hr.f17                       1.955886   0.009623 203.258  < 2e-16 ***
## hr.f18                       1.902731   0.009614 197.906  < 2e-16 ***
## hr.f19                       1.595422   0.009759 163.488  < 2e-16 ***
## hr.f20                       1.321851   0.010064 131.345  < 2e-16 ***
## hr.f21                       1.092284   0.010241 106.661  < 2e-16 ***
## hr.f22                       0.827568   0.010642  77.766  < 2e-16 ***
## hr.f23                       0.461826   0.011434  40.391  < 2e-16 ***
## mnth.f2                      0.199456   0.008061  24.743  < 2e-16 ***
## mnth.f3                      0.322305   0.008307  38.799  < 2e-16 ***
## mnth.f4                      0.530135   0.011356  46.684  < 2e-16 ***
## mnth.f5                      0.786666   0.011803  66.647  < 2e-16 ***
## mnth.f6                      0.699463   0.012226  57.210  < 2e-16 ***
## mnth.f7                      0.526043   0.013198  39.859  < 2e-16 ***
## mnth.f8                      0.607408   0.012715  47.773  < 2e-16 ***
## mnth.f9                      0.690178   0.011694  59.018  < 2e-16 ***
## mnth.f10                     0.608446   0.010831  56.174  < 2e-16 ***
## mnth.f11                     0.466332   0.010624  43.893  < 2e-16 ***
## mnth.f12                     0.444657   0.009086  48.939  < 2e-16 ***
## holiday.f1                  -0.136887   0.007172 -19.085  < 2e-16 ***
## workingday.f1               -0.005971   0.002314  -2.580  0.00988 ** 
## weathersit.fCloudy & Misty  -0.026932   0.002662 -10.118  < 2e-16 ***
## weathersit.fAdverse Weather -0.447284   0.005124 -87.299  < 2e-16 ***
## weathersit.fSevere Weather  -0.686365   0.166881  -4.113 3.91e-05 ***
## season.fSummer               0.163607   0.007620  21.470  < 2e-16 ***
## season.fFall                 0.228876   0.007993  28.635  < 2e-16 ***
## season.fWinter               0.357385   0.007335  48.722  < 2e-16 ***
## temp                         0.014832   0.045870   0.323  0.74643    
## atemp                        0.879267   0.048703  18.054  < 2e-16 ***
## hum                         -0.399088   0.007716 -51.719  < 2e-16 ***
## windspeed                   -0.091023   0.009183  -9.912  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 786230  on 6482  degrees of freedom
## Residual deviance: 164223  on 6436  degrees of freedom
## AIC: 204137
## 
## Number of Fisher Scoring iterations: 5
## 
## Call:
## glm(formula = formula1, family = quasipoisson, data = train11)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -19.5243   -3.3235   -0.6176    2.6411   21.0595  
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  2.961642   0.060173  49.218  < 2e-16 ***
## hr.f1                       -0.487373   0.074147  -6.573 5.31e-11 ***
## hr.f2                       -0.883611   0.083873 -10.535  < 2e-16 ***
## hr.f3                       -1.505289   0.110932 -13.570  < 2e-16 ***
## hr.f4                       -2.089087   0.145680 -14.340  < 2e-16 ***
## hr.f5                       -1.066271   0.093520 -11.402  < 2e-16 ***
## hr.f6                        0.331216   0.059867   5.533 3.28e-08 ***
## hr.f7                        1.332681   0.050101  26.600  < 2e-16 ***
## hr.f8                        1.807462   0.047771  37.836  < 2e-16 ***
## hr.f9                        1.303421   0.050069  26.032  < 2e-16 ***
## hr.f10                       1.044882   0.051227  20.397  < 2e-16 ***
## hr.f11                       1.191663   0.050591  23.555  < 2e-16 ***
## hr.f12                       1.355565   0.049937  27.146  < 2e-16 ***
## hr.f13                       1.359879   0.049984  27.207  < 2e-16 ***
## hr.f14                       1.274582   0.050380  25.300  < 2e-16 ***
## hr.f15                       1.284774   0.050644  25.369  < 2e-16 ***
## hr.f16                       1.538777   0.049334  31.191  < 2e-16 ***
## hr.f17                       1.955886   0.047801  40.917  < 2e-16 ***
## hr.f18                       1.902731   0.047760  39.840  < 2e-16 ***
## hr.f19                       1.595422   0.048477  32.911  < 2e-16 ***
## hr.f20                       1.321851   0.049993  26.440  < 2e-16 ***
## hr.f21                       1.092284   0.050871  21.471  < 2e-16 ***
## hr.f22                       0.827568   0.052864  15.655  < 2e-16 ***
## hr.f23                       0.461826   0.056798   8.131 5.07e-16 ***
## mnth.f2                      0.199456   0.040045   4.981 6.50e-07 ***
## mnth.f3                      0.322305   0.041266   7.810 6.61e-15 ***
## mnth.f4                      0.530135   0.056411   9.398  < 2e-16 ***
## mnth.f5                      0.786666   0.058635  13.416  < 2e-16 ***
## mnth.f6                      0.699463   0.060734  11.517  < 2e-16 ***
## mnth.f7                      0.526043   0.065560   8.024 1.21e-15 ***
## mnth.f8                      0.607408   0.063161   9.617  < 2e-16 ***
## mnth.f9                      0.690178   0.058092  11.881  < 2e-16 ***
## mnth.f10                     0.608446   0.053806  11.308  < 2e-16 ***
## mnth.f11                     0.466332   0.052777   8.836  < 2e-16 ***
## mnth.f12                     0.444657   0.045135   9.852  < 2e-16 ***
## holiday.f1                  -0.136887   0.035629  -3.842 0.000123 ***
## workingday.f1               -0.005971   0.011497  -0.519 0.603536    
## weathersit.fCloudy & Misty  -0.026932   0.013223  -2.037 0.041713 *  
## weathersit.fAdverse Weather -0.447284   0.025452 -17.574  < 2e-16 ***
## weathersit.fSevere Weather  -0.686365   0.828993  -0.828 0.407729    
## season.fSummer               0.163607   0.037854   4.322 1.57e-05 ***
## season.fFall                 0.228876   0.039705   5.764 8.58e-09 ***
## season.fWinter               0.357385   0.036438   9.808  < 2e-16 ***
## temp                         0.014832   0.227863   0.065 0.948103    
## atemp                        0.879267   0.241936   3.634 0.000281 ***
## hum                         -0.399088   0.038332 -10.411  < 2e-16 ***
## windspeed                   -0.091023   0.045618  -1.995 0.046049 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 24.67681)
## 
##     Null deviance: 786230  on 6482  degrees of freedom
## Residual deviance: 164223  on 6436  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

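The output above shows clear overdispersion. The dispersion parameter is fixed at 1 in the Poisson model but is estimated at 24.68 in the quasi-Poisson model. The coefficient estimates are identical in the two fits, but every quasi-Poisson standard error is about sqrt(24.68), roughly 4.97, times larger. As a result, some terms that looked highly significant under the Poisson model, such as workingday.f1 and weathersit.fSevere Weather, are no longer significant. A Negative Binomial regression model is therefore worth considering.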
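In the quasi-Poisson output above, the dispersion parameter is 24.67681 and each standard error is about 4.97 times its Poisson counterpart. The sketch below, on simulated overdispersed counts (hypothetical `x` and `cnt`), shows where both numbers come from. The dispersion estimate is essentially the Pearson chi-square statistic divided by the residual degrees of freedom, and the quasi-Poisson standard errors are the Poisson ones multiplied by its square root.

```r
# Sketch: the quasi-Poisson dispersion estimate matches the Pearson
# chi-square statistic divided by the residual degrees of freedom (up to the
# IRLS convergence tolerance), and the quasi-Poisson standard errors are the
# Poisson ones times sqrt(dispersion).
# Simulated overdispersed counts; x and cnt are hypothetical.
set.seed(5)
n <- 2000
x <- runif(n)
cnt <- rnbinom(n, size = 2, mu = exp(2 + x))  # variance exceeds the mean

p_fit  <- glm(cnt ~ x, family = poisson)
qp_fit <- glm(cnt ~ x, family = quasipoisson)

phi_pearson <- sum(residuals(p_fit, type = "pearson")^2) / df.residual(p_fit)
phi_glm     <- summary(qp_fit)$dispersion  # the value summary() prints

se_p  <- summary(p_fit)$coefficients[, "Std. Error"]
se_qp <- summary(qp_fit)$coefficients[, "Std. Error"]

print(c(phi_pearson = phi_pearson, phi_glm = phi_glm))
print(se_qp / se_p)  # each ratio equals sqrt(phi_glm)
print(sqrt(phi_glm))
```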

Negative Binomial Model

## 
## Call:
## glm.nb(formula = formula1, data = train11, init.theta = 3.955918709, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.8413  -0.7292  -0.0680   0.5048   4.6242  
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  2.95168    0.05353  55.136  < 2e-16 ***
## hr.f1                       -0.48481    0.04616 -10.503  < 2e-16 ***
## hr.f2                       -0.86861    0.04657 -18.653  < 2e-16 ***
## hr.f3                       -1.48901    0.04999 -29.789  < 2e-16 ***
## hr.f4                       -2.02125    0.05372 -37.626  < 2e-16 ***
## hr.f5                       -0.99721    0.04801 -20.769  < 2e-16 ***
## hr.f6                        0.41283    0.04521   9.132  < 2e-16 ***
## hr.f7                        1.42285    0.04429  32.124  < 2e-16 ***
## hr.f8                        1.94834    0.04416  44.118  < 2e-16 ***
## hr.f9                        1.46087    0.04431  32.970  < 2e-16 ***
## hr.f10                       1.10280    0.04426  24.915  < 2e-16 ***
## hr.f11                       1.22616    0.04488  27.323  < 2e-16 ***
## hr.f12                       1.40588    0.04515  31.135  < 2e-16 ***
## hr.f13                       1.40405    0.04562  30.780  < 2e-16 ***
## hr.f14                       1.32565    0.04544  29.176  < 2e-16 ***
## hr.f15                       1.34238    0.04613  29.098  < 2e-16 ***
## hr.f16                       1.58428    0.04556  34.775  < 2e-16 ***
## hr.f17                       2.04214    0.04508  45.305  < 2e-16 ***
## hr.f18                       1.97881    0.04483  44.139  < 2e-16 ***
## hr.f19                       1.64800    0.04427  37.223  < 2e-16 ***
## hr.f20                       1.36771    0.04491  30.456  < 2e-16 ***
## hr.f21                       1.12405    0.04399  25.550  < 2e-16 ***
## hr.f22                       0.87459    0.04396  19.895  < 2e-16 ***
## hr.f23                       0.49897    0.04440  11.238  < 2e-16 ***
## mnth.f2                      0.20277    0.03505   5.785 7.24e-09 ***
## mnth.f3                      0.25294    0.03885   6.511 7.46e-11 ***
## mnth.f4                      0.44504    0.05967   7.458 8.77e-14 ***
## mnth.f5                      0.74809    0.06403  11.684  < 2e-16 ***
## mnth.f6                      0.61994    0.06795   9.123  < 2e-16 ***
## mnth.f7                      0.49269    0.07637   6.452 1.11e-10 ***
## mnth.f8                      0.52598    0.07352   7.155 8.40e-13 ***
## mnth.f9                      0.60646    0.06586   9.208  < 2e-16 ***
## mnth.f10                     0.47290    0.05827   8.116 4.82e-16 ***
## mnth.f11                     0.36379    0.05637   6.454 1.09e-10 ***
## mnth.f12                     0.38096    0.04420   8.619  < 2e-16 ***
## holiday.f1                  -0.22606    0.04151  -5.446 5.15e-08 ***
## workingday.f1               -0.18759    0.01457 -12.878  < 2e-16 ***
## weathersit.fCloudy & Misty  -0.02660    0.01643  -1.619 0.105449    
## weathersit.fAdverse Weather -0.49182    0.02692 -18.271  < 2e-16 ***
## weathersit.fSevere Weather  -0.59877    0.53181  -1.126 0.260207    
## season.fSummer               0.15658    0.04186   3.741 0.000183 ***
## season.fFall                 0.22103    0.04834   4.572 4.83e-06 ***
## season.fWinter               0.41641    0.04119  10.109  < 2e-16 ***
## temp                         0.12568    0.30733   0.409 0.682580    
## atemp                        1.01866    0.32461   3.138 0.001700 ** 
## hum                         -0.37074    0.04635  -7.998 1.26e-15 ***
## windspeed                   -0.15108    0.05994  -2.520 0.011722 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(3.9559) family taken to be 1)
## 
##     Null deviance: 31408.6  on 6482  degrees of freedom
## Residual deviance:  6916.9  on 6436  degrees of freedom
## AIC: 66927
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  3.9559 
##           Std. Err.:  0.0754 
## 
##  2 x log-likelihood:  -66831.0750
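The theta estimate of 3.9559 is what allows the negative binomial model to absorb the extra variation in hourly counts. As a rough sketch that is not part of the original analysis, the R snippet below uses the values printed in the summary above. It computes the deviance-to-degrees-of-freedom ratio and compares the Poisson variance, Var(Y) = mu, with the NB2 variance used by `MASS::glm.nb`, Var(Y) = mu + mu^2/theta. The `mu` values are hypothetical and chosen only for illustration.

```r
# Values copied from the glm.nb summary printed above
theta_nb  <- 3.9559  # estimated negative binomial theta (size parameter)
resid_dev <- 6916.9  # residual deviance
resid_df  <- 6436    # residual degrees of freedom

# Deviance per residual degree of freedom; values near 1 suggest the
# assumed mean-variance relationship is roughly adequate
dev_ratio <- resid_dev / resid_df
print(round(dev_ratio, 3))

# Hypothetical hourly means, used only to illustrate the two variance functions
mu <- c(10, 100, 300)
var_poisson <- mu                    # Poisson: variance equals the mean
var_negbin  <- mu + mu^2 / theta_nb  # NB2 (MASS::glm.nb): mean plus a quadratic term
print(data.frame(mu, var_poisson, var_negbin = round(var_negbin, 1)))
```

At a mean of 100 rentals per hour, for example, the negative binomial variance is roughly 26 times the Poisson variance. Variance of that size is consistent with the Poisson model's much larger residual deviance.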

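The negative binomial model is clearly a better fit in this case. Its residual deviance is only 6916.9 on 6436 degrees of freedom. This is far below the Poisson model's 164223 and the null model's 31408.6, where the null model contains only the intercept. A residual deviance this close to its degrees of freedom (a ratio of about 1.07) also suggests that the negative binomial variance, which grows with the square of the mean, largely captures the overdispersion in the hourly counts. Next, I will build a new negative binomial model that includes the same interaction terms as the log interactive model: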

## 
## Call:
## glm.nb(formula = cnt ~ hr.f * temp * workingday.f + hr.f * temp * 
##     season.f + temp + mnth.f + hum + windspeed, data = train11, 
##     init.theta = 10.79018799, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.6408  -0.5869  -0.0127   0.4855   7.6183  
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 3.234525   0.144000  22.462  < 2e-16 ***
## hr.f1                      -0.347395   0.216383  -1.605 0.108392    
## hr.f2                      -0.543862   0.222008  -2.450 0.014296 *  
## hr.f3                      -0.806485   0.245096  -3.290 0.001000 ** 
## hr.f4                      -1.953445   0.341688  -5.717 1.08e-08 ***
## hr.f5                      -2.454827   0.281584  -8.718  < 2e-16 ***
## hr.f6                      -1.751092   0.232172  -7.542 4.62e-14 ***
## hr.f7                      -0.808791   0.205229  -3.941 8.12e-05 ***
## hr.f8                       0.253261   0.204217   1.240 0.214918    
## hr.f9                       0.602675   0.201634   2.989 0.002799 ** 
## hr.f10                      0.927773   0.196133   4.730 2.24e-06 ***
## hr.f11                      1.010174   0.203592   4.962 6.99e-07 ***
## hr.f12                      1.296911   0.201387   6.440 1.20e-10 ***
## hr.f13                      1.165416   0.210081   5.547 2.90e-08 ***
## hr.f14                      1.327834   0.201840   6.579 4.75e-11 ***
## hr.f15                      1.249909   0.200624   6.230 4.66e-10 ***
## hr.f16                      1.263467   0.202568   6.237 4.45e-10 ***
## hr.f17                      1.220836   0.198479   6.151 7.70e-10 ***
## hr.f18                      1.003953   0.196178   5.118 3.10e-07 ***
## hr.f19                      0.826709   0.198115   4.173 3.01e-05 ***
## hr.f20                      0.487677   0.211928   2.301 0.021384 *  
## hr.f21                      0.294547   0.203374   1.448 0.147531    
## hr.f22                      0.080870   0.202359   0.400 0.689425    
## hr.f23                     -0.119409   0.204410  -0.584 0.559111    
## temp                        2.404332   0.444450   5.410 6.31e-08 ***
## workingday.f1              -0.838300   0.124688  -6.723 1.78e-11 ***
## season.fSummer              0.052391   0.204583   0.256 0.797885    
## season.fFall                1.060568   0.383392   2.766 0.005670 ** 
## season.fWinter              0.683308   0.209729   3.258 0.001122 ** 
## mnth.f2                     0.135055   0.024803   5.445 5.18e-08 ***
## mnth.f3                     0.183770   0.029480   6.234 4.55e-10 ***
## mnth.f4                     0.485137   0.039891  12.162  < 2e-16 ***
## mnth.f5                     0.728367   0.042241  17.243  < 2e-16 ***
## mnth.f6                     0.557212   0.044830  12.430  < 2e-16 ***
## mnth.f7                     0.476269   0.049313   9.658  < 2e-16 ***
## mnth.f8                     0.466275   0.047609   9.794  < 2e-16 ***
## mnth.f9                     0.447456   0.045556   9.822  < 2e-16 ***
## mnth.f10                    0.398653   0.039580  10.072  < 2e-16 ***
## mnth.f11                    0.297713   0.038525   7.728 1.09e-14 ***
## mnth.f12                    0.328305   0.032432  10.123  < 2e-16 ***
## hum                        -0.747564   0.026711 -27.987  < 2e-16 ***
## windspeed                  -0.446801   0.036797 -12.142  < 2e-16 ***
## hr.f1:temp                  0.111825   0.673224   0.166 0.868075    
## hr.f2:temp                  0.185791   0.746426   0.249 0.803432    
## hr.f3:temp                 -1.799047   0.858148  -2.096 0.036044 *  
## hr.f4:temp                 -1.316758   1.096259  -1.201 0.229698    
## hr.f5:temp                 -0.201107   0.928272  -0.217 0.828484    
## hr.f6:temp                 -0.236679   0.731513  -0.324 0.746281    
## hr.f7:temp                 -0.379553   0.654047  -0.580 0.561703    
## hr.f8:temp                 -0.574469   0.685468  -0.838 0.401992    
## hr.f9:temp                 -0.321478   0.630245  -0.510 0.609993    
## hr.f10:temp                 0.164608   0.594267   0.277 0.781785    
## hr.f11:temp                 0.357694   0.604155   0.592 0.553813    
## hr.f12:temp                -0.122439   0.589396  -0.208 0.835435    
## hr.f13:temp                 0.252067   0.587558   0.429 0.667917    
## hr.f14:temp                -0.298886   0.577475  -0.518 0.604756    
## hr.f15:temp                -0.131445   0.562051  -0.234 0.815089    
## hr.f16:temp                -0.366307   0.575707  -0.636 0.524598    
## hr.f17:temp                -0.522655   0.567310  -0.921 0.356901    
## hr.f18:temp                -0.418664   0.574812  -0.728 0.466400    
## hr.f19:temp                -0.782685   0.588904  -1.329 0.183830    
## hr.f20:temp                -0.362281   0.620663  -0.584 0.559422    
## hr.f21:temp                -0.137911   0.619173  -0.223 0.823743    
## hr.f22:temp                 0.146134   0.617024   0.237 0.812783    
## hr.f23:temp                 0.214216   0.637604   0.336 0.736893    
## hr.f1:workingday.f1        -0.325313   0.187119  -1.739 0.082117 .  
## hr.f2:workingday.f1        -0.687554   0.194266  -3.539 0.000401 ***
## hr.f3:workingday.f1        -0.358921   0.222838  -1.611 0.107250    
## hr.f4:workingday.f1         0.419159   0.256274   1.636 0.101926    
## hr.f5:workingday.f1         1.945314   0.220463   8.824  < 2e-16 ***
## hr.f6:workingday.f1         2.977014   0.187445  15.882  < 2e-16 ***
## hr.f7:workingday.f1         2.999566   0.174246  17.215  < 2e-16 ***
## hr.f8:workingday.f1         2.904639   0.165167  17.586  < 2e-16 ***
## hr.f9:workingday.f1         2.237143   0.167240  13.377  < 2e-16 ***
## hr.f10:workingday.f1        0.743018   0.166549   4.461 8.15e-06 ***
## hr.f11:workingday.f1        0.526213   0.170053   3.094 0.001972 ** 
## hr.f12:workingday.f1        0.399288   0.169837   2.351 0.018723 *  
## hr.f13:workingday.f1        0.502059   0.174853   2.871 0.004088 ** 
## hr.f14:workingday.f1        0.374639   0.169401   2.212 0.026997 *  
## hr.f15:workingday.f1        0.405625   0.175485   2.311 0.020808 *  
## hr.f16:workingday.f1        0.740821   0.171488   4.320 1.56e-05 ***
## hr.f17:workingday.f1        1.576515   0.169447   9.304  < 2e-16 ***
## hr.f18:workingday.f1        1.687312   0.169167   9.974  < 2e-16 ***
## hr.f19:workingday.f1        1.643839   0.169949   9.673  < 2e-16 ***
## hr.f20:workingday.f1        1.606472   0.176354   9.109  < 2e-16 ***
## hr.f21:workingday.f1        1.434903   0.170851   8.399  < 2e-16 ***
## hr.f22:workingday.f1        1.251914   0.171790   7.287 3.16e-13 ***
## hr.f23:workingday.f1        0.885177   0.172222   5.140 2.75e-07 ***
## temp:workingday.f1         -0.249861   0.241684  -1.034 0.301213    
## hr.f1:season.fSummer        0.317319   0.308663   1.028 0.303930    
## hr.f2:season.fSummer        0.470324   0.314996   1.493 0.135408    
## hr.f3:season.fSummer       -0.297656   0.365160  -0.815 0.414993    
## hr.f4:season.fSummer        0.211142   0.444053   0.475 0.634439    
## hr.f5:season.fSummer        0.430101   0.331190   1.299 0.194062    
## hr.f6:season.fSummer        0.114944   0.281751   0.408 0.683300    
## hr.f7:season.fSummer        0.116801   0.268835   0.434 0.663945    
## hr.f8:season.fSummer       -0.007301   0.275834  -0.026 0.978884    
## hr.f9:season.fSummer       -0.040854   0.269140  -0.152 0.879349    
## hr.f10:season.fSummer       0.265597   0.265195   1.002 0.316578    
## hr.f11:season.fSummer       0.461079   0.270765   1.703 0.088591 .  
## hr.f12:season.fSummer       0.443921   0.274112   1.619 0.105343    
## hr.f13:season.fSummer       0.768343   0.290457   2.645 0.008162 ** 
## hr.f14:season.fSummer       0.512371   0.274080   1.869 0.061564 .  
## hr.f15:season.fSummer       0.601283   0.279081   2.155 0.031200 *  
## hr.f16:season.fSummer       0.288349   0.282335   1.021 0.307111    
## hr.f17:season.fSummer       0.285656   0.280969   1.017 0.309305    
## hr.f18:season.fSummer       0.221351   0.271510   0.815 0.414924    
## hr.f19:season.fSummer      -0.030059   0.285119  -0.105 0.916039    
## hr.f20:season.fSummer      -0.270543   0.282254  -0.959 0.337806    
## hr.f21:season.fSummer      -0.220993   0.282947  -0.781 0.434779    
## hr.f22:season.fSummer      -0.128376   0.276603  -0.464 0.642563    
## hr.f23:season.fSummer       0.273147   0.284005   0.962 0.336167    
## hr.f1:season.fFall          0.137648   0.590748   0.233 0.815756    
## hr.f2:season.fFall          0.266562   0.630900   0.423 0.672652    
## hr.f3:season.fFall          0.062576   0.661348   0.095 0.924617    
## hr.f4:season.fFall         -0.945234   0.733153  -1.289 0.197303    
## hr.f5:season.fFall          1.039369   0.596354   1.743 0.081356 .  
## hr.f6:season.fFall          0.404026   0.550755   0.734 0.463202    
## hr.f7:season.fFall          0.427568   0.513800   0.832 0.405314    
## hr.f8:season.fFall          0.117710   0.516668   0.228 0.819782    
## hr.f9:season.fFall         -0.140387   0.500017  -0.281 0.778891    
## hr.f10:season.fFall         0.525113   0.534669   0.982 0.326037    
## hr.f11:season.fFall         0.713331   0.510283   1.398 0.162139    
## hr.f12:season.fFall         0.643023   0.520797   1.235 0.216946    
## hr.f13:season.fFall         0.142355   0.502883   0.283 0.777117    
## hr.f14:season.fFall         0.641095   0.511158   1.254 0.209768    
## hr.f15:season.fFall         0.953101   0.504535   1.889 0.058882 .  
## hr.f16:season.fFall         0.609139   0.484226   1.258 0.208405    
## hr.f17:season.fFall         0.285678   0.501238   0.570 0.568716    
## hr.f18:season.fFall         0.541732   0.486147   1.114 0.265135    
## hr.f19:season.fFall         0.098278   0.501291   0.196 0.844572    
## hr.f20:season.fFall        -0.452658   0.532394  -0.850 0.395196    
## hr.f21:season.fFall         0.032158   0.526710   0.061 0.951317    
## hr.f22:season.fFall         0.081629   0.516255   0.158 0.874364    
## hr.f23:season.fFall         0.311802   0.530873   0.587 0.556977    
## hr.f1:season.fWinter        0.104451   0.306550   0.341 0.733307    
## hr.f2:season.fWinter        0.099897   0.322790   0.309 0.756957    
## hr.f3:season.fWinter       -0.628045   0.337230  -1.862 0.062552 .  
## hr.f4:season.fWinter       -0.262745   0.410650  -0.640 0.522286    
## hr.f5:season.fWinter        0.438743   0.329060   1.333 0.182427    
## hr.f6:season.fWinter        0.235670   0.280215   0.841 0.400329    
## hr.f7:season.fWinter        0.321368   0.263481   1.220 0.222578    
## hr.f8:season.fWinter        0.114831   0.271407   0.423 0.672225    
## hr.f9:season.fWinter        0.005893   0.275802   0.021 0.982952    
## hr.f10:season.fWinter       0.064308   0.284904   0.226 0.821420    
## hr.f11:season.fWinter       0.268389   0.297560   0.902 0.367074    
## hr.f12:season.fWinter       0.179002   0.291351   0.614 0.538961    
## hr.f13:season.fWinter       0.053973   0.297113   0.182 0.855850    
## hr.f14:season.fWinter       0.002737   0.294758   0.009 0.992591    
## hr.f15:season.fWinter      -0.216986   0.290879  -0.746 0.455687    
## hr.f16:season.fWinter      -0.228378   0.288796  -0.791 0.429064    
## hr.f17:season.fWinter      -0.407663   0.279909  -1.456 0.145278    
## hr.f18:season.fWinter      -0.222853   0.282375  -0.789 0.429990    
## hr.f19:season.fWinter      -0.447668   0.278049  -1.610 0.107391    
## hr.f20:season.fWinter      -0.249107   0.287224  -0.867 0.385783    
## hr.f21:season.fWinter      -0.139278   0.280772  -0.496 0.619856    
## hr.f22:season.fWinter      -0.202351   0.280815  -0.721 0.471165    
## hr.f23:season.fWinter       0.019190   0.282439   0.068 0.945829    
## temp:season.fSummer        -0.134685   0.524599  -0.257 0.797381    
## temp:season.fFall          -1.635534   0.686621  -2.382 0.017219 *  
## temp:season.fWinter        -0.672058   0.579167  -1.160 0.245891    
## hr.f1:temp:workingday.f1   -0.195383   0.368507  -0.530 0.595973    
## hr.f2:temp:workingday.f1   -0.126522   0.382656  -0.331 0.740915    
## hr.f3:temp:workingday.f1   -0.609561   0.438720  -1.389 0.164709    
## hr.f4:temp:workingday.f1    0.416065   0.509294   0.817 0.413960    
## hr.f5:temp:workingday.f1   -0.044430   0.453610  -0.098 0.921974    
## hr.f6:temp:workingday.f1   -0.619696   0.376369  -1.647 0.099659 .  
## hr.f7:temp:workingday.f1   -0.294822   0.343680  -0.858 0.390983    
## hr.f8:temp:workingday.f1   -0.827457   0.323972  -2.554 0.010646 *  
## hr.f9:temp:workingday.f1   -1.707480   0.326088  -5.236 1.64e-07 ***
## hr.f10:temp:workingday.f1  -0.760246   0.317001  -2.398 0.016474 *  
## hr.f11:temp:workingday.f1  -0.425599   0.317519  -1.340 0.180119    
## hr.f12:temp:workingday.f1  -0.050557   0.316326  -0.160 0.873019    
## hr.f13:temp:workingday.f1  -0.314423   0.318987  -0.986 0.324284    
## hr.f14:temp:workingday.f1  -0.165556   0.309417  -0.535 0.592609    
## hr.f15:temp:workingday.f1  -0.102321   0.318280  -0.321 0.747846    
## hr.f16:temp:workingday.f1  -0.021761   0.313496  -0.069 0.944661    
## hr.f17:temp:workingday.f1  -0.239307   0.313086  -0.764 0.444660    
## hr.f18:temp:workingday.f1  -0.411709   0.319052  -1.290 0.196906    
## hr.f19:temp:workingday.f1  -0.641671   0.321386  -1.997 0.045872 *  
## hr.f20:temp:workingday.f1  -0.697131   0.334658  -2.083 0.037241 *  
## hr.f21:temp:workingday.f1  -0.571893   0.329387  -1.736 0.082522 .  
## hr.f22:temp:workingday.f1  -0.316056   0.333171  -0.949 0.342809    
## hr.f23:temp:workingday.f1   0.039174   0.339015   0.116 0.908007    
## hr.f1:temp:season.fSummer  -0.581861   0.795288  -0.732 0.464391    
## hr.f2:temp:season.fSummer  -1.011343   0.864672  -1.170 0.242151    
## hr.f3:temp:season.fSummer   1.531419   1.017043   1.506 0.132130    
## hr.f4:temp:season.fSummer  -0.037001   1.234459  -0.030 0.976088    
## hr.f5:temp:season.fSummer  -0.417087   0.993899  -0.420 0.674743    
## hr.f6:temp:season.fSummer   0.322193   0.795190   0.405 0.685347    
## hr.f7:temp:season.fSummer   0.163798   0.733064   0.223 0.823190    
## hr.f8:temp:season.fSummer   0.115861   0.766810   0.151 0.879901    
## hr.f9:temp:season.fSummer   0.068361   0.713394   0.096 0.923660    
## hr.f10:temp:season.fSummer -0.916039   0.679229  -1.349 0.177451    
## hr.f11:temp:season.fSummer -1.341498   0.683323  -1.963 0.049623 *  
## hr.f12:temp:season.fSummer -1.060638   0.678338  -1.564 0.117915    
## hr.f13:temp:season.fSummer -1.676274   0.682962  -2.454 0.014111 *  
## hr.f14:temp:season.fSummer -1.148471   0.661159  -1.737 0.082377 .  
## hr.f15:temp:season.fSummer -1.344426   0.654032  -2.056 0.039821 *  
## hr.f16:temp:season.fSummer -0.549422   0.670720  -0.819 0.412699    
## hr.f17:temp:season.fSummer -0.401389   0.665517  -0.603 0.546426    
## hr.f18:temp:season.fSummer -0.186584   0.663139  -0.281 0.778431    
## hr.f19:temp:season.fSummer  0.754416   0.690005   1.093 0.274241    
## hr.f20:temp:season.fSummer  0.922665   0.703966   1.311 0.189970    
## hr.f21:temp:season.fSummer  0.680261   0.713512   0.953 0.340389    
## hr.f22:temp:season.fSummer  0.258364   0.710912   0.363 0.716286    
## hr.f23:temp:season.fSummer -0.672484   0.736295  -0.913 0.361066    
## hr.f1:temp:season.fFall    -0.250268   1.062386  -0.236 0.813765    
## hr.f2:temp:season.fFall    -0.741203   1.157574  -0.640 0.521973    
## hr.f3:temp:season.fFall     1.171419   1.269347   0.923 0.356084    
## hr.f4:temp:season.fFall     1.902958   1.471095   1.294 0.195816    
## hr.f5:temp:season.fFall    -1.321183   1.210882  -1.091 0.275232    
## hr.f6:temp:season.fFall    -0.093779   1.038339  -0.090 0.928035    
## hr.f7:temp:season.fFall    -0.305965   0.953350  -0.321 0.748258    
## hr.f8:temp:season.fFall     0.010419   0.973434   0.011 0.991460    
## hr.f9:temp:season.fFall     0.396880   0.909004   0.437 0.662394    
## hr.f10:temp:season.fFall   -1.026441   0.915605  -1.121 0.262266    
## hr.f11:temp:season.fFall   -1.389246   0.889946  -1.561 0.118513    
## hr.f12:temp:season.fFall   -1.133210   0.889998  -1.273 0.202921    
## hr.f13:temp:season.fFall   -0.624003   0.863211  -0.723 0.469750    
## hr.f14:temp:season.fFall   -0.975502   0.865730  -1.127 0.259828    
## hr.f15:temp:season.fFall   -1.450047   0.851617  -1.703 0.088625 .  
## hr.f16:temp:season.fFall   -0.770828   0.840803  -0.917 0.359260    
## hr.f17:temp:season.fFall   -0.210693   0.854370  -0.247 0.805213    
## hr.f18:temp:season.fFall   -0.342930   0.849322  -0.404 0.686382    
## hr.f19:temp:season.fFall    0.740974   0.879705   0.842 0.399621    
## hr.f20:temp:season.fFall    1.286106   0.928827   1.385 0.166157    
## hr.f21:temp:season.fFall    0.468874   0.933794   0.502 0.615585    
## hr.f22:temp:season.fFall   -0.001077   0.922962  -0.001 0.999069    
## hr.f23:temp:season.fFall   -0.524609   0.961003  -0.546 0.585136    
## hr.f1:temp:season.fWinter  -0.361377   0.867892  -0.416 0.677128    
## hr.f2:temp:season.fWinter  -0.600436   0.947539  -0.634 0.526290    
## hr.f3:temp:season.fWinter   2.246708   1.040462   2.159 0.030824 *  
## hr.f4:temp:season.fWinter   1.195505   1.245791   0.960 0.337239    
## hr.f5:temp:season.fWinter  -0.435289   1.046770  -0.416 0.677527    
## hr.f6:temp:season.fWinter  -0.041082   0.849670  -0.048 0.961437    
## hr.f7:temp:season.fWinter  -0.096498   0.787759  -0.122 0.902506    
## hr.f8:temp:season.fWinter  -0.192347   0.819123  -0.235 0.814348    
## hr.f9:temp:season.fWinter  -0.033539   0.782570  -0.043 0.965815    
## hr.f10:temp:season.fWinter -0.363427   0.772407  -0.471 0.637989    
## hr.f11:temp:season.fWinter -0.974418   0.787358  -1.238 0.215872    
## hr.f12:temp:season.fWinter -0.615315   0.760252  -0.809 0.418310    
## hr.f13:temp:season.fWinter -0.500382   0.759439  -0.659 0.509970    
## hr.f14:temp:season.fWinter -0.269946   0.750831  -0.360 0.719199    
## hr.f15:temp:season.fWinter  0.077422   0.731741   0.106 0.915737    
## hr.f16:temp:season.fWinter  0.436001   0.748241   0.583 0.560094    
## hr.f17:temp:season.fWinter  0.823586   0.731882   1.125 0.260463    
## hr.f18:temp:season.fWinter  0.651915   0.748287   0.871 0.383640    
## hr.f19:temp:season.fWinter  1.402116   0.748953   1.872 0.061192 .  
## hr.f20:temp:season.fWinter  0.776728   0.776630   1.000 0.317250    
## hr.f21:temp:season.fWinter  0.448498   0.777169   0.577 0.563877    
## hr.f22:temp:season.fWinter  0.284529   0.783051   0.363 0.716337    
## hr.f23:temp:season.fWinter -0.180054   0.801823  -0.225 0.822324    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(10.7902) family taken to be 1)
## 
##     Null deviance: 76179.4  on 6482  degrees of freedom
## Residual deviance:  6858.9  on 6230  degrees of freedom
## AIC: 61583
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  10.790 
##           Std. Err.:  0.233 
## 
##  2 x log-likelihood:  -61074.707
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: cnt
##                                                                                                 Model
## 1 hr.f + mnth.f + holiday.f + workingday.f + weathersit.f + season.f + temp + atemp + hum + windspeed
## 2               hr.f * temp * workingday.f + hr.f * temp * season.f + temp + mnth.f + hum + windspeed
##       theta Resid. df    2 x log-lik.   Test    df LR stat. Pr(Chi)
## 1  3.955919      6436       -66831.07                              
## 2 10.790188      6230       -61074.71 1 vs 2   206 5756.368       0

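According to the output, the interactive negative binomial model improves further. Its residual deviance falls to 6858.9 and its AIC drops from 66927 to 61583. However, the two models estimate different values of theta (3.96 and 10.79), so their deviances are not on exactly the same scale. The change in null deviance from 31408.6 to 76179.4 shows this. The likelihood ratio test from the anova analysis is therefore the more reliable comparison. That test gives a statistic of 5756.4 on 206 degrees of freedom with a p-value of essentially zero, which confirms that the interactive model fits our data significantly better.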
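The likelihood ratio test in the table above can be reproduced by hand from the printed values. The following is a minimal R sketch that assumes only the numbers shown in the two summaries and the anova table. It does not refit the models.

```r
# Values copied from the "Likelihood ratio tests of Negative Binomial Models" table
twologlik_main <- -66831.07  # model 1: main effects only
twologlik_int  <- -61074.71  # model 2: with interaction terms
df_main <- 6436              # residual df of model 1
df_int  <- 6230              # residual df of model 2

# LR statistic = difference in 2 x log-likelihood; df = number of extra parameters
lr_stat <- twologlik_int - twologlik_main
lr_df   <- df_main - df_int
# Upper-tail chi-squared probability; for a statistic this large it underflows to 0
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(LR = lr_stat, df = lr_df, p = p_value))

# AIC values from the two glm.nb summaries; lower is better
aic <- c(main = 66927, interactive = 61583)
print(aic[["main"]] - aic[["interactive"]])
```

The statistic differs from the table's 5756.368 only because the log-likelihoods are rounded to two decimals here. The AIC penalizes the 206 extra parameters, yet it still drops by 5344, so the improvement is not just a by-product of added model complexity.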

## List of 21
##  $ call          : language glm.nb(formula = cnt ~ hr.f + temp + workingday.f + season.f + mnth.f +      hum + windspeed + hr.f:temp + hr.f:w| __truncated__ ...
##  $ terms         :Classes 'terms', 'formula'  language cnt ~ hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed +      hr.f:temp + hr.f:workingday.f + tem| __truncated__ ...
##   .. ..- attr(*, "variables")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "factors")= int [1:8, 1:14] 0 1 0 0 0 0 0 0 0 0 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
##   .. .. .. ..$ : chr [1:14] "hr.f" "temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "term.labels")= chr [1:14] "hr.f" "temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "order")= int [1:14] 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
##   .. .. ..- attr(*, "names")= chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
##  $ family        :List of 12
##   ..$ family    : chr "Negative Binomial(10.7902)"
##   ..$ link      : chr "log"
##   ..$ linkfun   :function (mu)  
##   ..$ linkinv   :function (eta)  
##   ..$ variance  :function (mu)  
##   ..$ dev.resids:function (y, mu, wt)  
##   ..$ aic       :function (y, n, mu, wt, dev)  
##   ..$ mu.eta    :function (eta)  
##   ..$ initialize:  expression({  if (any(y < 0))  stop("negative values not allowed for the negative binomial family")  n <- rep(1, | __truncated__
##   ..$ validmu   :function (mu)  
##   ..$ valideta  :function (eta)  
##   ..$ simulate  :function (object, nsim)  
##   ..- attr(*, "class")= chr "family"
##  $ deviance      : num 6859
##  $ aic           : num 61583
##  $ contrasts     :List of 4
##   ..$ hr.f        : chr "contr.treatment"
##   ..$ workingday.f: chr "contr.treatment"
##   ..$ season.f    : chr "contr.treatment"
##   ..$ mnth.f      : chr "contr.treatment"
##  $ df.residual   : int 6230
##  $ null.deviance : num 76179
##  $ df.null       : int 6482
##  $ iter          : int 1
##  $ deviance.resid: Named num [1:6483] 0.919 -0.78 -1.454 0.558 -0.344 ...
##   ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
##  $ coefficients  : num [1:253, 1:4] 3.235 -0.347 -0.544 -0.806 -1.953 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:4] "Estimate" "Std. Error" "z value" "Pr(>|z|)"
##  $ aliased       : Named logi [1:253] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..- attr(*, "names")= chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ dispersion    : num 1
##  $ df            : int [1:3] 253 6230 253
##  $ cov.unscaled  : num [1:253, 1:253] 0.0207 -0.0204 -0.0204 -0.0204 -0.0204 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ cov.scaled    : num [1:253, 1:253] 0.0207 -0.0204 -0.0204 -0.0204 -0.0204 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ theta         : num 10.8
##  $ SE.theta      : num 0.233
##  $ twologlik     : num -61075
##  $ NA            : NULL
##  - attr(*, "class")= chr [1:2] "summary.negbin" "summary.glm"
## List of 21
##  $ call          : language glm.nb(formula = cnt ~ hr.f + temp + workingday.f + season.f + mnth.f +      hum + windspeed + hr.f:temp + hr.f:w| __truncated__ ...
##  $ terms         :Classes 'terms', 'formula'  language cnt ~ hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed +      hr.f:temp + hr.f:workingday.f + tem| __truncated__ ...
##   .. ..- attr(*, "variables")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "factors")= int [1:8, 1:13] 0 1 0 0 0 0 0 0 0 0 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
##   .. .. .. ..$ : chr [1:13] "hr.f" "temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "term.labels")= chr [1:13] "hr.f" "temp" "workingday.f" "season.f" ...
##   .. ..- attr(*, "order")= int [1:13] 1 1 1 1 1 1 1 2 2 2 ...
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
##   .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
##   .. .. ..- attr(*, "names")= chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
##  $ family        :List of 12
##   ..$ family    : chr "Negative Binomial(10.5934)"
##   ..$ link      : chr "log"
##   ..$ linkfun   :function (mu)  
##   ..$ linkinv   :function (eta)  
##   ..$ variance  :function (mu)  
##   ..$ dev.resids:function (y, mu, wt)  
##   ..$ aic       :function (y, n, mu, wt, dev)  
##   ..$ mu.eta    :function (eta)  
##   ..$ initialize:  expression({  if (any(y < 0))  stop("negative values not allowed for the negative binomial family")  n <- rep(1, | __truncated__
##   ..$ validmu   :function (mu)  
##   ..$ valideta  :function (eta)  
##   ..$ simulate  :function (object, nsim)  
##   ..- attr(*, "class")= chr "family"
##  $ deviance      : num 6854
##  $ aic           : num 61537
##  $ contrasts     :List of 4
##   ..$ hr.f        : chr "contr.treatment"
##   ..$ workingday.f: chr "contr.treatment"
##   ..$ season.f    : chr "contr.treatment"
##   ..$ mnth.f      : chr "contr.treatment"
##  $ df.residual   : int 6299
##  $ null.deviance : num 75008
##  $ df.null       : int 6482
##  $ iter          : int 1
##  $ deviance.resid: Named num [1:6483] 0.898 -0.75 -1.41 0.748 -0.367 ...
##   ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
##  $ coefficients  : num [1:184, 1:4] 3.177 -0.233 -0.389 -1.11 -2.072 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:4] "Estimate" "Std. Error" "z value" "Pr(>|z|)"
##  $ aliased       : Named logi [1:184] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..- attr(*, "names")= chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ dispersion    : num 1
##  $ df            : int [1:3] 184 6299 184
##  $ cov.unscaled  : num [1:184, 1:184] 0.0121 -0.0115 -0.0115 -0.0114 -0.0114 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ cov.scaled    : num [1:184, 1:184] 0.0121 -0.0115 -0.0115 -0.0114 -0.0114 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##   .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
##  $ theta         : num 10.6
##  $ SE.theta      : num 0.228
##  $ twologlik     : num -61167
##  $ NA            : NULL
##  - attr(*, "class")= chr [1:2] "summary.negbin" "summary.glm"
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: cnt
##   Model
## 1 hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + temp:workingday.f + temp:season.f + hr.f:temp:workingday.f + hr.f:temp:season.f
## 2 hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + temp:workingday.f + temp:season.f + hr.f:temp:workingday.f + hr.f:temp:season.f
## 3 hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + temp:workingday.f + temp:season.f + hr.f:temp:workingday.f + hr.f:temp:season.f
## 4 hr.f * temp * workingday.f + hr.f * temp * season.f + temp + mnth.f + hum + windspeed
## 5 hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + temp:workingday.f + hr.f:season.f + temp:season.f + hr.f:temp:workingday.f + hr.f:temp:season.f
##      theta Resid. df    2 x log-lik.   Test    df      LR stat.    Pr(Chi)
## 1 10.59339      6299       -61167.18                                      
## 2 10.59339      6299       -61167.18 1 vs 2     0 -6.388291e-09 1.00000000
## 3 10.59339      6299       -61167.18 2 vs 3     0  0.000000e+00 1.00000000
## 4 10.79019      6230       -61074.71 3 vs 4    69  9.246979e+01 0.03121904
## 5 10.79019      6230       -61074.71 4 vs 5     0 -3.135210e-08 1.00000000

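The summary output shows that the updated interactive model improves on the previous model, with the residual deviance falling to 6854 on 6299 degrees of freedom (against a null deviance of 75008 on 6482). In the likelihood ratio tests, models 1, 2 and 3 have the same formula, as do models 4 and 5 (model 4 is written with the * shorthand), so those comparisons have 0 degrees of freedom and p = 1. The only substantive comparison is 3 vs 4: adding the hr.f:season.f interaction costs 69 parameters (23 non-baseline hours × 3 non-baseline seasons) and yields an LR statistic of 92.5 (p ≈ 0.031), which is significant at the 5% level but not at the 1% level. Since dropping this interaction gives up only a marginal amount of explanatory power in exchange for a smaller model, I will choose glm_nb3 as the final interactive negative binomial model.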
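The "3 vs 4" row of the likelihood ratio table can be checked by hand: the LR statistic is the difference in 2 × log-likelihood between the two fits, and it is compared with a chi-squared distribution whose degrees of freedom equal the difference in residual degrees of freedom. The sketch below uses only base R and the numbers printed in the table; the variable names are illustrative.

```r
# Values copied from the "3 vs 4" row of the likelihood ratio table
twologlik_small <- -61167.18   # model 3: no hr.f:season.f interaction
twologlik_large <- -61074.71   # model 4: includes hr.f:season.f
df_diff <- 6299 - 6230         # extra parameters in the larger model (23 x 3)

# LR statistic = difference in 2 x log-likelihood
lr_stat <- twologlik_large - twologlik_small

# Upper-tail chi-squared probability gives the p-value
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

cat(sprintf("LR = %.2f on %d df, p = %.4f\n", lr_stat, df_diff, p_value))
```

Because the table's log-likelihoods are rounded to two decimals, the result matches the printed LR statistic (92.47) and p-value (about 0.0312) only up to rounding.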

Now, let’s generate predictions for the training dataset using the negative binomial model and compare them with the actual values.

## # A tibble: 20 x 3
##      cnt       pred      pred2
##    <int>      <dbl>      <dbl>
##  1    40  17.520762  17.578447
##  2     1   2.171260   2.581132
##  3     1   1.725305   2.133603
##  4     2   3.091957   4.014142
##  5    94 131.559509 138.340101
##  6    67  82.434962  95.385245
##  7    35  65.062956  74.375771
##  8    37  44.570102  55.398084
##  9    39  23.138253  33.663672
## 10     6   7.063183   8.576729
## 11    93 108.167812 112.747895
## 12    74  87.733195  98.343140
## 13    22  47.521091  47.250220
## 14     9  31.469255  30.822930
## 15     5   9.737650  10.556992
## 16    30  25.051612  26.735983
## 17    88 125.419972 112.300792
## 18    76  92.209247  92.028032
## 19   110 110.204445  99.493973
## 20    94  65.457335  66.866203
## [1] 4499429
## [1] 2081.142
## [1] 4743026
## [1] 2193.814

Both error measures are lower for the interactive negative binomial model (SSE 4,499,429 vs. 4,743,026; MSE 2,081.1 vs. 2,193.8), so I conclude that it is the better fit to the dataset.
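For reference, SSE is the sum of squared differences between observed and predicted counts, and MSE is that sum divided by the number of rows, so MSE = SSE / n. The base-R sketch below applies this to the 20 rows printed in the tibble above; the helper names sse and mse are hypothetical and not taken from the original analysis. Dividing each printed SSE by its MSE gives n = 2162 in both cases, which suggests the printed error measures were computed on 2,162 rows rather than on all 6,483 observations used to fit the model.

```r
# Hypothetical helpers: sum of squared errors and mean squared error
sse <- function(actual, predicted) sum((actual - predicted)^2)
mse <- function(actual, predicted) mean((actual - predicted)^2)

# The 20 rows printed in the tibble above
cnt   <- c(40, 1, 1, 2, 94, 67, 35, 37, 39, 6, 93, 74, 22, 9, 5, 30, 88, 76, 110, 94)
pred  <- c(17.520762, 2.171260, 1.725305, 3.091957, 131.559509, 82.434962,
           65.062956, 44.570102, 23.138253, 7.063183, 108.167812, 87.733195,
           47.521091, 31.469255, 9.737650, 25.051612, 125.419972, 92.209247,
           110.204445, 65.457335)
pred2 <- c(17.578447, 2.581132, 2.133603, 4.014142, 138.340101, 95.385245,
           74.375771, 55.398084, 33.663672, 8.576729, 112.747895, 98.343140,
           47.250220, 30.822930, 10.556992, 26.735983, 112.300792, 92.028032,
           99.493973, 66.866203)

# Error measures on these 20 rows only (not the full evaluation set)
c(sse_pred = sse(cnt, pred), sse_pred2 = sse(cnt, pred2))
c(mse_pred = mse(cnt, pred), mse_pred2 = mse(cnt, pred2))

# Since MSE = SSE / n, dividing each printed SSE by its MSE recovers n
round(4499429 / 2081.142)  # 2162
round(4743026 / 2193.814)  # 2162
```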

Predictions

Now, I will use the interactive negative binomial model to predict the cnt variable for the 2012 dataset:

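# get predictions for 2012 data on the count scale;
# type = "response" applies the inverse log link, equivalent to exp() of the default link-scale output
pred_12 <- predict(glm_nbin, newdata = hour12_x, type = "response")

# add predictions to 2012 data frame
hour12_x <- hour12_x %>%
  mutate(cnt_pred = pred_12)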
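By default, predict() on a model fitted with MASS::glm.nb() returns values on the link (log) scale, so exponentiating them gives the same expected counts as asking for type = "response". The sketch below checks this on simulated data; the data, the coefficients 1 and 2, and theta = 10 are arbitrary illustrations, not the bike-sharing data or model.

```r
library(MASS)  # glm.nb() and rnegbin()

# Simulated negative binomial counts (illustration only)
set.seed(42)
n <- 500
x <- runif(n)
y <- rnegbin(n, mu = exp(1 + 2 * x), theta = 10)
fit <- glm.nb(y ~ x)

new_x <- data.frame(x = c(0.1, 0.5, 0.9))

# Default: predictions on the link (log) scale
pred_link <- predict(fit, newdata = new_x)
# type = "response": predictions on the count scale
pred_resp <- predict(fit, newdata = new_x, type = "response")

# Exponentiating the link-scale predictions gives the same expected counts
all.equal(exp(pred_link), pred_resp)
```

Requesting type = "response" directly avoids having to remember which link the family uses, which matters if the model is later refitted with a different link.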

For the complete dataset, please see bike_sharing_predictions_2012