Capital Bikeshare is a bicycle sharing system that operates in several cities, including Washington, D.C., and it publicly releases a large amount of data about bike rentals in Washington, D.C. In this project, I will use the 2011 dataset to explore how factors such as weather, temperature and date correlate with the variation in the hourly count of rental bikes. I will also use the same dataset to construct several models, both linear and nonlinear, and compare how well they can predict future hourly bike rental counts from the given variables.
## Parsed with column specification:
## cols(
## instant = col_integer(),
## dteday = col_date(format = ""),
## season = col_integer(),
## yr = col_integer(),
## mnth = col_integer(),
## hr = col_integer(),
## holiday = col_integer(),
## weekday = col_integer(),
## workingday = col_integer(),
## weathersit = col_integer(),
## temp = col_double(),
## atemp = col_double(),
## hum = col_double(),
## windspeed = col_double(),
## casual = col_integer(),
## registered = col_integer(),
## cnt = col_integer()
## )
## Parsed with column specification:
## cols(
## instant = col_integer(),
## dteday = col_date(format = ""),
## season = col_integer(),
## yr = col_integer(),
## mnth = col_integer(),
## hr = col_integer(),
## holiday = col_integer(),
## weekday = col_integer(),
## workingday = col_integer(),
## weathersit = col_integer(),
## temp = col_double(),
## atemp = col_double(),
## hum = col_double(),
## windspeed = col_double()
## )
## instant dteday season yr
## Min. : 1 Min. :2011-01-01 Min. :1.000 Min. :0
## 1st Qu.:2162 1st Qu.:2011-04-04 1st Qu.:2.000 1st Qu.:0
## Median :4323 Median :2011-07-04 Median :3.000 Median :0
## Mean :4323 Mean :2011-07-03 Mean :2.514 Mean :0
## 3rd Qu.:6484 3rd Qu.:2011-10-02 3rd Qu.:3.000 3rd Qu.:0
## Max. :8645 Max. :2011-12-31 Max. :4.000 Max. :0
## mnth hr holiday weekday
## Min. : 1.000 Min. : 0.00 Min. :0.00000 Min. :0.000
## 1st Qu.: 4.000 1st Qu.: 6.00 1st Qu.:0.00000 1st Qu.:1.000
## Median : 7.000 Median :12.00 Median :0.00000 Median :3.000
## Mean : 6.574 Mean :11.57 Mean :0.02765 Mean :3.013
## 3rd Qu.:10.000 3rd Qu.:18.00 3rd Qu.:0.00000 3rd Qu.:5.000
## Max. :12.000 Max. :23.00 Max. :1.00000 Max. :6.000
## workingday weathersit temp atemp
## Min. :0.0000 Min. :1.000 Min. :0.0200 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:1.000 1st Qu.:0.3200 1st Qu.:0.3182
## Median :1.0000 Median :1.000 Median :0.5000 Median :0.4848
## Mean :0.6837 Mean :1.438 Mean :0.4891 Mean :0.4690
## 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:0.6600 3rd Qu.:0.6212
## Max. :1.0000 Max. :4.000 Max. :0.9600 Max. :1.0000
## hum windspeed casual registered
## Min. :0.0000 Min. :0.0000 Min. : 0.0 Min. : 0.0
## 1st Qu.:0.4900 1st Qu.:0.1045 1st Qu.: 3.0 1st Qu.: 26.0
## Median :0.6500 Median :0.1940 Median : 14.0 Median : 90.0
## Mean :0.6434 Mean :0.1912 Mean : 28.6 Mean :115.2
## 3rd Qu.:0.8100 3rd Qu.:0.2836 3rd Qu.: 38.0 3rd Qu.:168.0
## Max. :1.0000 Max. :0.8507 Max. :272.0 Max. :567.0
## cnt
## Min. : 1.0
## 1st Qu.: 31.0
## Median :109.0
## Mean :143.8
## 3rd Qu.:211.0
## Max. :651.0
## instant dteday season yr
## Min. : 8646 Min. :2012-01-01 Min. :1.00 Min. :1
## 1st Qu.:10829 1st Qu.:2012-04-01 1st Qu.:2.00 1st Qu.:1
## Median :13012 Median :2012-07-01 Median :2.00 Median :1
## Mean :13012 Mean :2012-07-01 Mean :2.49 Mean :1
## 3rd Qu.:15196 3rd Qu.:2012-09-30 3rd Qu.:3.00 3rd Qu.:1
## Max. :17379 Max. :2012-12-31 Max. :4.00 Max. :1
## mnth hr holiday weekday
## Min. : 1.000 Min. : 0.00 Min. :0.00000 Min. :0.000
## 1st Qu.: 4.000 1st Qu.: 6.00 1st Qu.:0.00000 1st Qu.:1.000
## Median : 7.000 Median :12.00 Median :0.00000 Median :3.000
## Mean : 6.502 Mean :11.52 Mean :0.02988 Mean :2.995
## 3rd Qu.: 9.000 3rd Qu.:18.00 3rd Qu.:0.00000 3rd Qu.:5.000
## Max. :12.000 Max. :23.00 Max. :1.00000 Max. :6.000
## workingday weathersit temp atemp
## Min. :0.0000 Min. :1.000 Min. :0.0200 Min. :0.0152
## 1st Qu.:0.0000 1st Qu.:1.000 1st Qu.:0.3400 1st Qu.:0.3333
## Median :1.0000 Median :1.000 Median :0.5200 Median :0.4848
## Mean :0.6817 Mean :1.413 Mean :0.5048 Mean :0.4825
## 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:0.6600 3rd Qu.:0.6212
## Max. :1.0000 Max. :4.000 Max. :1.0000 Max. :0.9242
## hum windspeed
## Min. :0.1600 Min. :0.0000
## 1st Qu.:0.4600 1st Qu.:0.1045
## Median :0.6100 Median :0.1642
## Mean :0.6112 Mean :0.1890
## 3rd Qu.:0.7700 3rd Qu.:0.2537
## Max. :1.0000 Max. :0.8060
There are a total of 8645 observations and 17 variables in the 2011 dataset. According to the information given, five of the variables are quantitative (temp, atemp, hum, windspeed, cnt), and the remaining 12 variables are qualitative. The 2012 dataset contains 8734 observations and 14 variables. The three missing variables are “cnt” (count of total rental bikes), “casual” (count of casual users) and “registered” (count of registered users). In this project, I intend to predict “cnt” for 2012 using the selected model.
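Two bookkeeping steps come before any analysis. The first is finding which columns the 2012 file lacks. The second is recoding the integer-coded qualitative variables as factors. The sketch below does both in base R. The `.f` suffix mirrors names such as `hr.f` and `season.f` that appear in the model output later in this report. However, the author's own recoding code is not shown, so treat this as an illustration. The two-row data frame is made up, and the label "Clear" for weather code 1 is an assumption.

```r
# Column names as reported by readr above for the two hourly files.
cols_2012 <- c("instant", "dteday", "season", "yr", "mnth", "hr", "holiday",
               "weekday", "workingday", "weathersit", "temp", "atemp", "hum",
               "windspeed")
cols_2011 <- c(cols_2012, "casual", "registered", "cnt")

# Columns available in 2011 but not in 2012 (the quantities to predict).
missing_in_2012 <- setdiff(cols_2011, cols_2012)
print(missing_in_2012)

# Turn integer-coded qualitative variables into labelled factors.
# Assumption: weathersit level 1 is labelled "Clear"; the other labels match
# the factor levels printed in the model summaries later in the report.
add_factor_columns <- function(df) {
  df$season.f <- factor(df$season, levels = 1:4,
                        labels = c("Spring", "Summer", "Fall", "Winter"))
  df$weathersit.f <- factor(df$weathersit, levels = 1:4,
                            labels = c("Clear", "Cloudy & Misty",
                                       "Adverse Weather", "Severe Weather"))
  df$hr.f         <- factor(df$hr, levels = 0:23)
  df$mnth.f       <- factor(df$mnth, levels = 1:12)
  df$holiday.f    <- factor(df$holiday, levels = 0:1)
  df$workingday.f <- factor(df$workingday, levels = 0:1)
  df
}

# Two made-up rows, only to show the conversion.
toy <- data.frame(season = c(1, 3), weathersit = c(1, 3), hr = c(8, 17),
                  mnth = c(1, 7), holiday = c(0, 0), workingday = c(1, 1))
toy <- add_factor_columns(toy)
str(toy)
```

Declaring `levels` explicitly keeps all 24 hours and 12 months as factor levels, even when a subset of the data happens not to contain some of them.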
In this part, I will plot the hourly bike rental count against several variables to explore their relationships.
Figure 1 shows the rental count per hour in each season in 2011. There are more rentals in summer and fall, slightly fewer in winter and the fewest in spring. This seems unreasonable at first glance, because one would expect more rentals in spring than in winter, given spring's higher average temperature. To resolve this, I calculate the average temperature and feeling temperature in each season below:
## # A tibble: 4 x 3
## season avg_temp avg_atemp
## <fctr> <dbl> <dbl>
## 1 Spring 0.2753482 0.2769897
## 2 Summer 0.5346074 0.5103303
## 3 Fall 0.7013393 0.6541496
## 4 Winter 0.4263543 0.4180605
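The tibble above was presumably produced with dplyr's `group_by()` and `summarise()`. As a dependency-free alternative, base R's `aggregate()` computes the same kind of per-season means. The data below are simulated, so the numbers will not match the table above. Only the column names mirror the report.

```r
# Simulated stand-in for hour11: 50 hours per season with made-up
# temperatures. Only the column names mirror the report.
set.seed(1)
sim <- data.frame(season = rep(1:4, each = 50))
sim$temp  <- c(0.3, 0.5, 0.7, 0.4)[sim$season] + rnorm(200, sd = 0.05)
sim$atemp <- 0.95 * sim$temp + rnorm(200, sd = 0.01)
sim$season <- factor(sim$season, levels = 1:4,
                     labels = c("Spring", "Summer", "Fall", "Winter"))

# Mean of temp and atemp within each season (one row per season).
season_means <- aggregate(cbind(temp, atemp) ~ season, data = sim, FUN = mean)
names(season_means)[2:3] <- c("avg_temp", "avg_atemp")
print(season_means)
```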
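From the result above, both the average temperature and the average feeling temperature are lowest in spring, which is contrary to my initial expectation. This is likely because season 1 (“Spring”) in this dataset covers roughly late December through most of March, the coldest part of the year in Washington, D.C. Next, I plot the relationship between the rental count and the normalized feeling temperature.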
Figure 2 shows the relationship between the rental count per hour and the normalized feeling temperature in each season in 2011. As expected, the higher the normalized feeling temperature, the higher the rental count. There are more rentals in summer and fall because the average feeling temperature is higher in both seasons. In contrast, the feeling temperature is lowest in spring, which coincides with fewer rentals.
Figure 3 shows the relationship between the rental count per hour and the normalized humidity in 2011. Unlike temperature, humidity shows no clear relationship with the rental count: the points are spread roughly uniformly across humidity values from 0.25 to 0.75. This seems reasonable, because people may be less sensitive to humidity than to temperature. Next, let’s re-examine the plot by taking weather conditions into account.
Figure 4 shows the relationship between the rental count per hour and the normalized humidity under each weather condition in 2011. Even after splitting the data by weather condition, there is still no clear relationship between these two variables. On the other hand, rental traffic is highest when the weather is clear or partly cloudy, and much lower under adverse and severe weather conditions.
Figure 5 shows the relationship between the rental count per hour and the normalized wind speed in 2011. As with humidity, wind speed does not seem to affect rental traffic much: the rental count stays roughly the same across wind speeds. However, the count drops noticeably once the normalized wind speed exceeds 0.5, which corresponds to roughly 33 km/h. This matches my expectation, since riding a bike is more difficult in strong wind. Now, let’s take a closer look at the hours with a normalized wind speed above 0.5.
Figure 6 shows the rental count per hour when the normalized wind speed is larger than 0.5 in 2011. Even when the wind speed is high in a given hour, there can still be a substantial number of rentals as long as the weather is clear.
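The filtering behind Figure 6 can be sketched in base R. The sketch also converts the normalized wind speed back to km/h. The divisor of 67 km/h is an assumption, chosen because it is consistent with the statement above that 0.5 corresponds to roughly 33 km/h. The hourly data are simulated, with made-up counts that fall in worse weather.

```r
# Assumption: normalized wind speed = km/h / 67 (consistent with 0.5 ~ 33 km/h).
wind_kmh <- function(x, max_kmh = 67) x * max_kmh
wind_kmh(0.5)

# Simulated hourly data: weathersit codes 1-3 with lower counts in worse weather.
set.seed(2)
n <- 500
sim <- data.frame(windspeed  = runif(n, 0, 0.85),
                  weathersit = sample(1:3, n, replace = TRUE))
sim$cnt <- rpois(n, lambda = c(200, 120, 40)[sim$weathersit])

# Keep only the windy hours, then summarise the count by weather condition.
windy <- subset(sim, windspeed > 0.5)
windy_by_weather <- aggregate(cnt ~ weathersit, data = windy, FUN = median)
print(windy_by_weather)
```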
Let’s now quantify the relationship between the rental count per hour and each of the normalized feeling temperature, normalized humidity and normalized wind speed in 2011.
##
## Call:
## lm(formula = cnt ~ atemp, data = hour11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -244.70 -80.38 -20.19 59.23 459.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -14.905 3.650 -4.083 4.48e-05 ***
## atemp 338.377 7.283 46.460 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 119.7 on 8643 degrees of freedom
## Multiple R-squared: 0.1998, Adjusted R-squared: 0.1997
## F-statistic: 2159 on 1 and 8643 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = cnt ~ hum, data = hour11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -269.37 -92.09 -32.24 61.55 487.07
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 270.374 4.722 57.26 <2e-16 ***
## hum -196.727 7.020 -28.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 128.1 on 8643 degrees of freedom
## Multiple R-squared: 0.0833, Adjusted R-squared: 0.08319
## F-statistic: 785.4 on 1 and 8643 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = cnt ~ windspeed, data = hour11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -193.02 -109.76 -34.83 67.48 516.62
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 126.072 2.647 47.625 < 2e-16 ***
## windspeed 92.705 11.640 7.964 1.87e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 133.3 on 8643 degrees of freedom
## Multiple R-squared: 0.007286, Adjusted R-squared: 0.007171
## F-statistic: 63.43 on 1 and 8643 DF, p-value: 1.871e-15
From the outputs above, feeling temperature is clearly the most effective single predictor of the rental count. Its model has the lowest residual standard error (119.7) and the highest R-squared (0.1998). Wind speed, by contrast, is the least effective predictor, with the largest residual standard error (133.3) and an R-squared of only 0.0073. Now let’s look at a linear model with all three variables together:
##
## Call:
## lm(formula = cnt ~ atemp + hum + windspeed, data = hour11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -254.62 -74.03 -22.11 52.19 468.69
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 97.492 6.265 15.562 < 2e-16 ***
## atemp 334.804 6.930 48.315 < 2e-16 ***
## hum -183.359 6.461 -28.378 < 2e-16 ***
## windspeed 37.964 10.303 3.685 0.00023 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 113.7 on 8641 degrees of freedom
## Multiple R-squared: 0.2783, Adjusted R-squared: 0.2781
## F-statistic: 1111 on 3 and 8641 DF, p-value: < 2.2e-16
The result shows that, with the other two predictors held fixed, feeling temperature and wind speed both have positive coefficients, while humidity has a negative one. Together, the three variables explain about 27.8% of the variation in rental count (adjusted R-squared 0.2781). Notably, the p-value of windspeed (0.00023) is still small but much larger than those of the other two predictors (< 2e-16). This p-value may increase further as more variables are included in the model, eventually rendering windspeed statistically insignificant.
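The four fits above can also be compared in one table. The sketch below fits one `lm()` per predictor set and collects R-squared, adjusted R-squared and the residual standard error. The residual standard error is the value `summary.lm` reports as "Residual standard error", sqrt(RSS / residual df). For contrast, the sketch also computes the in-sample mean squared error, RSS / n, which is a different quantity. The data and coefficients are simulated; only the column names follow the report.

```r
# Simulated data with the same column names as hour11; coefficients are made up.
set.seed(3)
n <- 1000
sim <- data.frame(atemp = runif(n), hum = runif(n), windspeed = runif(n, 0, 0.8))
sim$cnt <- 300 * sim$atemp - 150 * sim$hum + 20 * sim$windspeed +
  rnorm(n, sd = 100)

# Fit one lm() per predictor set and collect goodness-of-fit measures.
compare_fits <- function(data, response, predictor_sets) {
  rows <- lapply(predictor_sets, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    s <- summary(fit)
    data.frame(model     = paste(p, collapse = " + "),
               r_squared = s$r.squared,
               adj_r_sq  = s$adj.r.squared,
               rse       = s$sigma,                  # sqrt(RSS / residual df)
               mse       = mean(residuals(fit)^2))   # RSS / n
  })
  do.call(rbind, rows)
}

res <- compare_fits(sim, "cnt",
                    list("atemp", "hum", "windspeed",
                         c("atemp", "hum", "windspeed")))
print(res)
```

Because the MSE divides by n rather than by the residual degrees of freedom, it is always slightly smaller than the squared residual standard error.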
Figure 7 shows the general trend of hourly bike rental counts for casual and registered users on working days in 2011. The number of registered users is much larger than that of casual users at most hours, except between 3 and 5am. Registered users also peak during rush hours, around 8am and 6pm. This suggests that registered users are more likely than casual users to use rental bikes to commute to and from work, while casual users might rely on other means of transportation more often. Between 3 and 5am, however, there are more casual users. This is probably because they need rental bikes to get home when public transportation is out of service.
Figure 8 shows the general trend of hourly bike rental counts for casual and registered users on weekends and holidays in 2011. The number of registered users is still much larger than that of casual users, but both groups follow a similar trend. The count increases gradually from 5am, peaks around 2pm and then gradually decreases. This suggests that both user groups tend to use rental bikes between 10am and 6pm, which is usually the prime time for bike riding.
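Figures 7 and 8 show smoothed curves. The underlying hourly profile can be sketched as a plain mean per hour, user type and day type. The data below are simulated, with a made-up rush-hour bump for registered users on working days. The day-of-week rule is also an arbitrary assumption.

```r
# Simulated hourly data for 28 days; all counts are made up.
set.seed(4)
sim <- expand.grid(hr = 0:23, day = 1:28)
sim$workingday <- as.integer(sim$day %% 7 %in% 1:5)  # 5 of every 7 days
rush <- sim$hr %in% c(8, 17, 18)
sim$registered <- rpois(nrow(sim),
                        ifelse(sim$workingday == 1 & rush, 300, 80))
sim$casual <- rpois(nrow(sim), 30)

# Mean hourly count per user type, separately for working and non-working days.
profile <- aggregate(cbind(casual, registered) ~ hr + workingday,
                     data = sim, FUN = mean)

# Hour at which registered users peak on working days.
workday <- subset(profile, workingday == 1)
workday$hr[which.max(workday$registered)]
```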
In this part, I will construct and train several models (linear & non-linear) based on the bike sharing data from 2011, and then I will compare and select the model with the best performance to predict the ride counts for each hour in 2012.
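The model outputs below are fitted on `train11`, which has 6483 rows. This equals floor(0.75 * 8645), so the 2011 data were likely split 75/25 at random. The author's split code is not shown, so the helpers below (`split_train_test()`, `test_mse()`) are hypothetical. They sketch one way to make such a split and score a model on the held-out rows. The data are simulated.

```r
# Hypothetical helper: random train/test split of a data frame.
split_train_test <- function(df, frac = 0.75, seed = 1) {
  set.seed(seed)
  idx <- sample(nrow(df), size = floor(frac * nrow(df)))
  list(train = df[idx, , drop = FALSE], test = df[-idx, , drop = FALSE])
}

# Hypothetical helper: mean squared prediction error on held-out rows.
test_mse <- function(fit, test, response = "cnt") {
  mean((test[[response]] - predict(fit, newdata = test))^2)
}

# Simulated data with 8645 rows, the size of the 2011 file.
set.seed(5)
sim <- data.frame(temp = runif(8645))
sim$cnt <- 50 + 300 * sim$temp + rnorm(8645, sd = 75)

parts <- split_train_test(sim)
fit <- lm(cnt ~ temp, data = parts$train)
test_mse(fit, parts$test)
```

With factor predictors such as `weathersit.f`, `predict()` stops with an error if the test split contains a level that never appeared in the training split. A rare level like "Severe Weather" is worth checking before scoring.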
##
## Call:
## lm(formula = formula1, data = train11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -284.69 -44.56 -6.79 40.68 412.24
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.059 7.597 -2.641 0.00830 **
## hr.f1 -11.854 6.439 -1.841 0.06569 .
## hr.f2 -19.513 6.379 -3.059 0.00223 **
## hr.f3 -28.343 6.538 -4.335 1.48e-05 ***
## hr.f4 -31.386 6.591 -4.762 1.96e-06 ***
## hr.f5 -16.147 6.504 -2.482 0.01307 *
## hr.f6 25.983 6.464 4.019 5.90e-05 ***
## hr.f7 125.997 6.409 19.659 < 2e-16 ***
## hr.f8 226.988 6.410 35.411 < 2e-16 ***
## hr.f9 119.749 6.413 18.672 < 2e-16 ***
## hr.f10 79.723 6.391 12.475 < 2e-16 ***
## hr.f11 99.235 6.488 15.295 < 2e-16 ***
## hr.f12 127.027 6.532 19.445 < 2e-16 ***
## hr.f13 128.801 6.601 19.512 < 2e-16 ***
## hr.f14 111.807 6.568 17.022 < 2e-16 ***
## hr.f15 114.221 6.673 17.116 < 2e-16 ***
## hr.f16 165.481 6.597 25.084 < 2e-16 ***
## hr.f17 281.329 6.539 43.025 < 2e-16 ***
## hr.f18 262.727 6.504 40.393 < 2e-16 ***
## hr.f19 177.417 6.415 27.657 < 2e-16 ***
## hr.f20 121.553 6.501 18.697 < 2e-16 ***
## hr.f21 87.495 6.354 13.771 < 2e-16 ***
## hr.f22 57.223 6.330 9.040 < 2e-16 ***
## hr.f23 27.306 6.362 4.292 1.79e-05 ***
## mnth.f2 5.066 4.897 1.035 0.30094
## mnth.f3 10.636 5.448 1.952 0.05095 .
## mnth.f4 26.836 8.439 3.180 0.00148 **
## mnth.f5 64.931 9.080 7.151 9.56e-13 ***
## mnth.f6 47.330 9.662 4.899 9.88e-07 ***
## mnth.f7 15.693 10.911 1.438 0.15040
## mnth.f8 30.038 10.507 2.859 0.00427 **
## mnth.f9 45.287 9.400 4.818 1.48e-06 ***
## mnth.f10 38.111 8.293 4.596 4.40e-06 ***
## mnth.f11 18.935 8.015 2.362 0.01819 *
## mnth.f12 19.962 6.239 3.199 0.00138 **
## holiday.f1 -19.307 5.922 -3.260 0.00112 **
## workingday.f1 -2.498 2.093 -1.193 0.23274
## weathersit.fCloudy & Misty -3.288 2.354 -1.397 0.16252
## weathersit.fAdverse Weather -44.359 3.774 -11.755 < 2e-16 ***
## weathersit.fSevere Weather -62.796 75.559 -0.831 0.40595
## season.fSummer 18.224 5.933 3.072 0.00214 **
## season.fFall 27.508 6.934 3.967 7.36e-05 ***
## season.fWinter 45.858 5.881 7.797 7.33e-15 ***
## temp 117.851 44.297 2.660 0.00782 **
## atemp 49.923 46.792 1.067 0.28605
## hum -78.225 6.632 -11.796 < 2e-16 ***
## windspeed -21.745 8.676 -2.506 0.01222 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 75.25 on 6436 degrees of freedom
## Multiple R-squared: 0.6851, Adjusted R-squared: 0.6828
## F-statistic: 304.3 on 46 and 6436 DF, p-value: < 2.2e-16
According to the summary output, the model as a whole is statistically significant (F-test p-value < 2.2e-16) and explains about 68% of the variation in rental count (adjusted R-squared 0.6828). However, its residual standard error (75.25) is still relatively high, which suggests that the model has some limitations. Most of the predictors are statistically significant, except workingday and atemp. This is probably due to multicollinearity; for instance, atemp closely tracks temp. Let’s re-run the model without these two variables:
##
## Call:
## lm(formula = cnt ~ hr.f + mnth.f + holiday.f + weathersit.f +
## season.f + temp + hum + windspeed, data = train11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -284.30 -44.90 -6.51 41.21 410.37
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.023 7.374 -2.716 0.006635 **
## hr.f1 -11.968 6.439 -1.859 0.063123 .
## hr.f2 -19.606 6.379 -3.073 0.002125 **
## hr.f3 -28.454 6.538 -4.352 1.37e-05 ***
## hr.f4 -31.462 6.591 -4.773 1.85e-06 ***
## hr.f5 -16.296 6.504 -2.506 0.012249 *
## hr.f6 25.932 6.465 4.011 6.11e-05 ***
## hr.f7 125.980 6.409 19.656 < 2e-16 ***
## hr.f8 227.030 6.410 35.417 < 2e-16 ***
## hr.f9 119.816 6.414 18.681 < 2e-16 ***
## hr.f10 79.716 6.391 12.474 < 2e-16 ***
## hr.f11 99.318 6.488 15.309 < 2e-16 ***
## hr.f12 127.071 6.532 19.453 < 2e-16 ***
## hr.f13 128.936 6.600 19.535 < 2e-16 ***
## hr.f14 112.007 6.567 17.055 < 2e-16 ***
## hr.f15 114.348 6.673 17.136 < 2e-16 ***
## hr.f16 165.540 6.597 25.094 < 2e-16 ***
## hr.f17 281.396 6.538 43.038 < 2e-16 ***
## hr.f18 262.726 6.504 40.392 < 2e-16 ***
## hr.f19 177.484 6.415 27.667 < 2e-16 ***
## hr.f20 121.629 6.502 18.708 < 2e-16 ***
## hr.f21 87.554 6.354 13.780 < 2e-16 ***
## hr.f22 57.219 6.330 9.039 < 2e-16 ***
## hr.f23 27.347 6.362 4.298 1.75e-05 ***
## mnth.f2 5.345 4.891 1.093 0.274492
## mnth.f3 10.719 5.448 1.968 0.049168 *
## mnth.f4 27.463 8.429 3.258 0.001128 **
## mnth.f5 65.239 9.072 7.191 7.16e-13 ***
## mnth.f6 47.245 9.634 4.904 9.63e-07 ***
## mnth.f7 15.891 10.880 1.461 0.144189
## mnth.f8 29.798 10.472 2.845 0.004449 **
## mnth.f9 44.746 9.345 4.788 1.72e-06 ***
## mnth.f10 38.872 8.279 4.695 2.72e-06 ***
## mnth.f11 19.738 7.994 2.469 0.013571 *
## mnth.f12 20.544 6.223 3.301 0.000967 ***
## holiday.f1 -18.051 5.719 -3.156 0.001606 **
## weathersit.fCloudy & Misty -3.469 2.351 -1.475 0.140152
## weathersit.fAdverse Weather -44.877 3.760 -11.935 < 2e-16 ***
## weathersit.fSevere Weather -65.007 75.549 -0.860 0.389570
## season.fSummer 17.997 5.930 3.035 0.002416 **
## season.fFall 27.401 6.929 3.955 7.75e-05 ***
## season.fWinter 45.460 5.876 7.737 1.17e-14 ***
## temp 162.402 11.764 13.805 < 2e-16 ***
## hum -77.549 6.608 -11.735 < 2e-16 ***
## windspeed -24.719 8.235 -3.002 0.002695 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 75.25 on 6438 degrees of freedom
## Multiple R-squared: 0.6849, Adjusted R-squared: 0.6828
## F-statistic: 318.1 on 44 and 6438 DF, p-value: < 2.2e-16
## Analysis of Variance Table
##
## Model 1: cnt ~ hr.f + mnth.f + holiday.f + workingday.f + weathersit.f +
## season.f + temp + atemp + hum + windspeed
## Model 2: cnt ~ hr.f + mnth.f + holiday.f + weathersit.f + season.f + temp +
## hum + windspeed
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 6436 36439980
## 2 6438 36454586 -2 -14606 1.2899 0.2754
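Both the multicollinearity explanation and the ANOVA comparison above can be sketched in base R. The block below first computes variance inflation factors by hand; the `car` package's `vif()` is a common ready-made alternative. It then runs the same kind of nested-model F-test with `anova()`, listing the larger model first as in the table above. The data are simulated so that atemp is a near-copy of temp, and workingday is left out for brevity.

```r
# Simulated data: atemp is a noisy copy of temp, and only temp and hum
# actually drive cnt. Names mirror the report; the values are made up.
set.seed(7)
n <- 1500
sim <- data.frame(temp = runif(n), hum = runif(n))
sim$atemp <- 0.9 * sim$temp + rnorm(n, sd = 0.03)
sim$cnt   <- 20 + 300 * sim$temp - 80 * sim$hum + rnorm(n, sd = 75)

# Variance inflation factor: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on the remaining predictors.
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    r2 <- summary(lm(reformulate(setdiff(predictors, p), response = p),
                     data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_by_hand(sim, c("temp", "atemp", "hum"))
print(round(vifs, 1))

# Nested-model F-test: does dropping atemp significantly increase the RSS?
full    <- lm(cnt ~ temp + atemp + hum, data = sim)
reduced <- lm(cnt ~ temp + hum, data = sim)
cmp <- anova(full, reduced)
print(cmp)
```

A VIF far above 10 for temp and atemp signals that their separate coefficients are poorly determined. This can leave one of them looking insignificant even though the pair jointly carries strong signal.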
The new model neither improves nor deteriorates significantly compared with the previous model, which further suggests that the eliminated variables have limited predictive power. The ANOVA result supports this finding, since its p-value (0.2754) is larger than 0.05. The final non-interaction model therefore explains roughly 68.28% of the variation in rental count, with a residual standard error of 75.25. Let’s now take a look at a linear model with interaction terms:
## List of 11
## $ call : language lm(formula = cnt ~ hr.f * temp * holiday.f + weathersit.f + mnth.f + season.f + hum + windspeed, data = train11)
## $ terms :Classes 'terms', 'formula' language cnt ~ hr.f * temp * holiday.f + weathersit.f + mnth.f + season.f + hum + windspeed
## .. ..- attr(*, "variables")= language list(cnt, hr.f, temp, holiday.f, weathersit.f, mnth.f, season.f, hum, windspeed)
## .. ..- attr(*, "factors")= int [1:9, 1:12] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:9] "cnt" "hr.f" "temp" "holiday.f" ...
## .. .. .. ..$ : chr [1:12] "hr.f" "temp" "holiday.f" "weathersit.f" ...
## .. ..- attr(*, "term.labels")= chr [1:12] "hr.f" "temp" "holiday.f" "weathersit.f" ...
## .. ..- attr(*, "order")= int [1:12] 1 1 1 1 1 1 1 1 2 2 ...
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(cnt, hr.f, temp, holiday.f, weathersit.f, mnth.f, season.f, hum, windspeed)
## .. ..- attr(*, "dataClasses")= Named chr [1:9] "numeric" "factor" "numeric" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:9] "cnt" "hr.f" "temp" "holiday.f" ...
## $ residuals : Named num [1:6483] 105.5 -49.9 -135.4 -65.5 -96.2 ...
## ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
## $ coefficients : num [1:115, 1:4] 49.277 0.682 5.634 -10.488 -8.386 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:115] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..- attr(*, "names")= chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ sigma : num 68.7
## $ df : int [1:3] 115 6368 115
## $ r.squared : num 0.74
## $ adj.r.squared: num 0.735
## $ fstatistic : Named num [1:3] 159 114 6368
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:115, 1:115] 0.0328 -0.0278 -0.0279 -0.0277 -0.0277 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## - attr(*, "class")= chr "summary.lm"
In the model above, I add one three-way interaction term, hr.f*temp*holiday.f, to explore how the effect of temperature on the rental count varies by hour on holidays and non-holidays. The model’s adjusted R-squared increases by 7.7% (from 0.6828 to 0.735), and its residual standard error decreases by 8.6% (from 75.25 to 68.7). Together, these changes indicate a substantial improvement over the previous model.
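In R's formula syntax, `*` expands into every main effect and every interaction among the crossed variables. That expansion explains why this model has 115 coefficients. The sketch below lists the generated terms and counts the design-matrix columns. It uses placeholder rows whose factors carry the level counts seen in the report (24 hours, 2 holiday codes, 4 weather codes, 12 months, 4 seasons). The numeric values in the placeholder rows are arbitrary.

```r
f <- cnt ~ hr.f * temp * holiday.f + weathersit.f + mnth.f + season.f +
  hum + windspeed

# Term labels produced by expanding the formula.
labs <- attr(terms(f), "term.labels")
print(labs)

# Placeholder rows (arbitrary values) whose factors have the report's level counts.
grid <- data.frame(
  hr.f         = factor(0:23),
  temp         = seq(0.1, 0.9, length.out = 24),
  holiday.f    = factor(rep(0:1, 12)),
  weathersit.f = factor(rep(1:4, 6)),
  mnth.f       = factor(rep(1:12, 2)),
  season.f     = factor(rep(1:4, 6)),
  hum          = 0.5,
  windspeed    = 0.2
)

# Design matrix without the response; each column is one model coefficient.
X <- model.matrix(delete.response(terms(f)), data = grid)
ncol(X)
```

The three-way term alone contributes 12 terms and 115 columns, including the intercept. Each of hr.f:temp, hr.f:holiday.f and hr.f:temp:holiday.f adds 23 of those columns.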
However, there are roughly 1024 possible combinations of interaction terms, so it would be impractical to try them all. I need a more efficient way to identify strong interaction effects. One possible approach is to fit a tree model.
The plot shows that hour is the most important factor affecting the rental count; the longer a branch in the tree, the more deviance its split explains. This corroborates my earlier observations in Part 1. Temperature matters more during the daytime (from about 6:30am). The second branch shows that at lower temperatures (below about 31 °C), there are more rentals before 8:30pm in late fall and winter. At higher temperatures (above about 31 °C), rental traffic is busier on working days before 8:30am and from 4:30 to 6:30pm, which are the usual rush hours. On weekends and holidays, it is busier from 9:30am to 3:30pm and from 4:30 to 6:30pm. Overall, the tree model indicates that the interaction structure of the data is not complex.
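The report does not show which package produced the tree plot. Plots with branch lengths proportional to deviance are characteristic of the `tree` package. As a stand-in, the sketch below fits a regression tree with `rpart`, a recommended package that ships with standard R installations. The data are simulated, with a daytime lift, an extra rush-hour bump on working days and a mild temperature effect, so the splits illustrate the technique rather than reproduce the author's tree.

```r
library(rpart)

# Simulated data; all effect sizes are made up for illustration.
set.seed(8)
n <- 3000
sim <- data.frame(hr         = sample(0:23, n, replace = TRUE),
                  temp       = runif(n),
                  workingday = sample(0:1, n, replace = TRUE))
rush <- sim$hr %in% c(7, 8, 17, 18)
sim$cnt <- 40 + 200 * (sim$hr >= 7 & sim$hr <= 19) +
  150 * (rush & sim$workingday == 1) + 100 * sim$temp + rnorm(n, sd = 30)

# Regression tree: method = "anova" chooses splits that reduce the residual
# sum of squares; cp sets the minimum improvement a split must achieve.
tree_fit <- rpart(cnt ~ hr + temp + workingday, data = sim, method = "anova",
                  control = rpart.control(cp = 0.01))
print(tree_fit)

# Variable used at the root split, i.e. the single most useful split.
as.character(tree_fit$frame$var[1])
```

In the printed tree, splits on workingday that appear only beneath certain hour branches are the kind of signal that points to an hour-by-workingday interaction.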
## List of 11
## $ call : language lm(formula = cnt ~ hr.f * temp * workingday.f + hr.f * temp * season.f + temp + mnth.f + hum + windspeed, data = train11)
## $ terms :Classes 'terms', 'formula' language cnt ~ hr.f * temp * workingday.f + hr.f * temp * season.f + temp + mnth.f + hum + windspeed
## .. ..- attr(*, "variables")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "factors")= int [1:8, 1:14] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
## .. .. .. ..$ : chr [1:14] "hr.f" "temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "term.labels")= chr [1:14] "hr.f" "temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "order")= int [1:14] 1 1 1 1 1 1 1 2 2 2 ...
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
## $ residuals : Named num [1:6483] 72.2 -50.5 -144.6 20.8 -21 ...
## ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
## $ coefficients : num [1:253, 1:4] 72.7 -3.32 1.07 -1.36 -4.73 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:253] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..- attr(*, "names")= chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ sigma : num 46.4
## $ df : int [1:3] 253 6230 253
## $ r.squared : num 0.884
## $ adj.r.squared: num 0.879
## $ fstatistic : Named num [1:3] 188 252 6230
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:253, 1:253] 0.15 -0.147 -0.147 -0.147 -0.147 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## - attr(*, "class")= chr "summary.lm"
The new interaction model crosses hour and temperature with both workingday and season, and it greatly improves on the previous one. It now explains approximately 87.91% of the variance in rental count, about 1.2 times as much as the previous interaction model (adjusted R-squared 0.735). Its residual standard error drops to only 46.45, which further suggests that it fits the training data considerably better than the previous model.
Now let’s tweak the model slightly by applying a log transformation to the rental count and temperature (lg_cnt and lg_temp) before fitting it.
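A log-scale model's residual standard error is measured in log units, so it cannot be compared directly with the 46.45 above. A comparison on the count scale needs a back-transform. The sketch below assumes `lg_` denotes the natural log. It fits a small log-log model on simulated data, back-transforms with `exp()` and applies Duan's smearing factor. Smearing is one common correction, not necessarily what the author did.

```r
# Simulated data with a multiplicative (log-linear) structure; values are made up.
set.seed(9)
n <- 2000
sim <- data.frame(temp = runif(n, 0.02, 1), hum = runif(n))
sim$cnt <- pmax(1, round(exp(3 + 0.8 * log(sim$temp) - 0.8 * sim$hum +
                               rnorm(n, sd = 0.5))))

# log() needs strictly positive inputs; here cnt >= 1 and temp > 0, as in the
# 2011 summary (minimum cnt = 1, minimum temp = 0.02).
sim$lg_cnt  <- log(sim$cnt)
sim$lg_temp <- log(sim$temp)
fit_log <- lm(lg_cnt ~ lg_temp + hum, data = sim)

# Back-transform to the count scale. With normal log-scale errors, exp() of the
# predicted log estimates a conditional median; the smearing factor
# mean(exp(residuals)) rescales it toward the conditional mean.
pred_cnt <- exp(predict(fit_log, newdata = sim))
smear    <- mean(exp(residuals(fit_log)))
pred_cnt_smeared <- pred_cnt * smear

# Error on the original count scale, comparable with untransformed models.
mse_orig <- mean((sim$cnt - pred_cnt_smeared)^2)
mse_orig
```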
##
## Call:
## lm(formula = lg_cnt ~ hr.f * lg_temp * workingday.f + hr.f *
## lg_temp * season.f + lg_temp + mnth.f + hum + windspeed,
## data = train11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4604 -0.1540 0.0488 0.2229 2.0736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.533582 0.171660 26.410 < 2e-16 ***
## hr.f1 -0.247070 0.233793 -1.057 0.290648
## hr.f2 -0.632714 0.233104 -2.714 0.006660 **
## hr.f3 -1.441122 0.240813 -5.984 2.29e-09 ***
## hr.f4 -2.772145 0.264385 -10.485 < 2e-16 ***
## hr.f5 -3.086811 0.251909 -12.254 < 2e-16 ***
## hr.f6 -1.954148 0.241950 -8.077 7.93e-16 ***
## hr.f7 -1.344376 0.232257 -5.788 7.46e-09 ***
## hr.f8 0.025458 0.296019 0.086 0.931469
## hr.f9 0.497165 0.251371 1.978 0.047993 *
## hr.f10 1.180717 0.240942 4.900 9.80e-07 ***
## hr.f11 1.598780 0.266407 6.001 2.07e-09 ***
## hr.f12 1.447013 0.244800 5.911 3.58e-09 ***
## hr.f13 1.835071 0.262841 6.982 3.22e-12 ***
## hr.f14 1.476195 0.245502 6.013 1.92e-09 ***
## hr.f15 1.308035 0.243573 5.370 8.15e-08 ***
## hr.f16 1.330853 0.251860 5.284 1.31e-07 ***
## hr.f17 1.200911 0.243127 4.939 8.04e-07 ***
## hr.f18 0.983277 0.246273 3.993 6.61e-05 ***
## hr.f19 0.535366 0.249987 2.142 0.032266 *
## hr.f20 0.520891 0.263068 1.980 0.047740 *
## hr.f21 0.343491 0.263052 1.306 0.191671
## hr.f22 0.290723 0.253398 1.147 0.251302
## hr.f23 -0.129158 0.240747 -0.536 0.591642
## lg_temp 0.442288 0.113199 3.907 9.44e-05 ***
## workingday.f1 -1.008461 0.107166 -9.410 < 2e-16 ***
## season.fSummer 0.803474 0.212422 3.782 0.000157 ***
## season.fFall 0.667479 0.248495 2.686 0.007249 **
## season.fWinter 0.858886 0.246182 3.489 0.000489 ***
## mnth.f2 0.159044 0.029677 5.359 8.66e-08 ***
## mnth.f3 0.185035 0.035250 5.249 1.58e-07 ***
## mnth.f4 0.384559 0.049256 7.807 6.80e-15 ***
## mnth.f5 0.640851 0.053113 12.066 < 2e-16 ***
## mnth.f6 0.468726 0.055746 8.408 < 2e-16 ***
## mnth.f7 0.383784 0.061098 6.281 3.58e-10 ***
## mnth.f8 0.363315 0.059133 6.144 8.54e-10 ***
## mnth.f9 0.378031 0.056019 6.748 1.63e-11 ***
## mnth.f10 0.322501 0.049281 6.544 6.46e-11 ***
## mnth.f11 0.212335 0.047685 4.453 8.62e-06 ***
## mnth.f12 0.305000 0.039377 7.746 1.10e-14 ***
## hum -0.842518 0.033580 -25.090 < 2e-16 ***
## windspeed -0.549272 0.046353 -11.850 < 2e-16 ***
## hr.f1:lg_temp 0.052460 0.159449 0.329 0.742163
## hr.f2:lg_temp -0.090375 0.152572 -0.592 0.553641
## hr.f3:lg_temp -0.134459 0.153075 -0.878 0.379766
## hr.f4:lg_temp -0.194066 0.188310 -1.031 0.302784
## hr.f5:lg_temp -0.287946 0.166815 -1.726 0.084371 .
## hr.f6:lg_temp 0.070624 0.160420 0.440 0.659773
## hr.f7:lg_temp -0.088647 0.151780 -0.584 0.559206
## hr.f8:lg_temp 0.078601 0.199715 0.394 0.693916
## hr.f9:lg_temp 0.044830 0.175144 0.256 0.797989
## hr.f10:lg_temp 0.141348 0.170534 0.829 0.407217
## hr.f11:lg_temp 0.377567 0.203392 1.856 0.063451 .
## hr.f12:lg_temp 0.122378 0.184474 0.663 0.507106
## hr.f13:lg_temp 0.483142 0.219843 2.198 0.028009 *
## hr.f14:lg_temp 0.182143 0.193983 0.939 0.347786
## hr.f15:lg_temp 0.108955 0.194907 0.559 0.576175
## hr.f16:lg_temp 0.148920 0.198991 0.748 0.454261
## hr.f17:lg_temp 0.137317 0.189329 0.725 0.468304
## hr.f18:lg_temp 0.088821 0.183557 0.484 0.628484
## hr.f19:lg_temp 0.008569 0.184728 0.046 0.963005
## hr.f20:lg_temp 0.112515 0.202481 0.556 0.578446
## hr.f21:lg_temp 0.092249 0.191920 0.481 0.630774
## hr.f22:lg_temp 0.146752 0.179383 0.818 0.413335
## hr.f23:lg_temp 0.018176 0.165227 0.110 0.912408
## hr.f1:workingday.f1 -0.506829 0.158107 -3.206 0.001355 **
## hr.f2:workingday.f1 -0.681905 0.155886 -4.374 1.24e-05 ***
## hr.f3:workingday.f1 -0.771363 0.159695 -4.830 1.40e-06 ***
## hr.f4:workingday.f1 0.678457 0.168105 4.036 5.50e-05 ***
## hr.f5:workingday.f1 2.047011 0.164829 12.419 < 2e-16 ***
## hr.f6:workingday.f1 2.481230 0.160912 15.420 < 2e-16 ***
## hr.f7:workingday.f1 2.801075 0.150463 18.616 < 2e-16 ***
## hr.f8:workingday.f1 2.170660 0.156122 13.904 < 2e-16 ***
## hr.f9:workingday.f1 0.831912 0.154596 5.381 7.67e-08 ***
## hr.f10:workingday.f1 0.128856 0.148708 0.867 0.386248
## hr.f11:workingday.f1 0.102741 0.151080 0.680 0.496505
## hr.f12:workingday.f1 0.361688 0.150254 2.407 0.016105 *
## hr.f13:workingday.f1 0.162788 0.150858 1.079 0.280593
## hr.f14:workingday.f1 0.186932 0.145993 1.280 0.200447
## hr.f15:workingday.f1 0.303987 0.149553 2.033 0.042131 *
## hr.f16:workingday.f1 0.676216 0.148135 4.565 5.09e-06 ***
## hr.f17:workingday.f1 1.346999 0.148305 9.083 < 2e-16 ***
## hr.f18:workingday.f1 1.338956 0.152796 8.763 < 2e-16 ***
## hr.f19:workingday.f1 1.068509 0.150602 7.095 1.44e-12 ***
## hr.f20:workingday.f1 1.000281 0.157066 6.369 2.04e-10 ***
## hr.f21:workingday.f1 0.961633 0.155228 6.195 6.20e-10 ***
## hr.f22:workingday.f1 1.044531 0.154527 6.760 1.51e-11 ***
## hr.f23:workingday.f1 0.979434 0.154711 6.331 2.61e-10 ***
## lg_temp:workingday.f1 0.007914 0.104927 0.075 0.939878
## hr.f1:season.fSummer -0.144182 0.296560 -0.486 0.626856
## hr.f2:season.fSummer -0.375242 0.292593 -1.282 0.199724
## hr.f3:season.fSummer -0.127003 0.302173 -0.420 0.674281
## hr.f4:season.fSummer -0.194272 0.313755 -0.619 0.535817
## hr.f5:season.fSummer 0.291361 0.307402 0.948 0.343259
## hr.f6:season.fSummer 0.262480 0.299688 0.876 0.381149
## hr.f7:season.fSummer 0.443285 0.285372 1.553 0.120389
## hr.f8:season.fSummer -0.124917 0.333514 -0.375 0.708009
## hr.f9:season.fSummer -0.118051 0.300941 -0.392 0.694870
## hr.f10:season.fSummer -0.587658 0.288455 -2.037 0.041666 *
## hr.f11:season.fSummer -0.868934 0.301097 -2.886 0.003916 **
## hr.f12:season.fSummer -0.650876 0.293189 -2.220 0.026455 *
## hr.f13:season.fSummer -0.978481 0.301501 -3.245 0.001179 **
## hr.f14:season.fSummer -0.750543 0.285751 -2.627 0.008646 **
## hr.f15:season.fSummer -0.628057 0.283282 -2.217 0.026654 *
## hr.f16:season.fSummer -0.404307 0.292900 -1.380 0.167526
## hr.f17:season.fSummer -0.384052 0.287293 -1.337 0.181338
## hr.f18:season.fSummer -0.159922 0.291170 -0.549 0.582861
## hr.f19:season.fSummer 0.369939 0.297743 1.242 0.214107
## hr.f20:season.fSummer 0.232877 0.303774 0.767 0.443341
## hr.f21:season.fSummer 0.244015 0.308863 0.790 0.429533
## hr.f22:season.fSummer -0.030339 0.302866 -0.100 0.920210
## hr.f23:season.fSummer -0.150611 0.297560 -0.506 0.612766
## hr.f1:season.fFall -0.194763 0.356116 -0.547 0.584461
## hr.f2:season.fFall -0.200652 0.355830 -0.564 0.572843
## hr.f3:season.fFall -0.055470 0.362335 -0.153 0.878331
## hr.f4:season.fFall 0.446332 0.367830 1.213 0.225016
## hr.f5:season.fFall 0.102268 0.365524 0.280 0.779652
## hr.f6:season.fFall 0.210322 0.364430 0.577 0.563876
## hr.f7:season.fFall 0.214991 0.339269 0.634 0.526307
## hr.f8:season.fFall -0.232171 0.378131 -0.614 0.539239
## hr.f9:season.fFall -0.165559 0.339668 -0.487 0.625982
## hr.f10:season.fFall -0.649566 0.332356 -1.954 0.050695 .
## hr.f11:season.fFall -0.881288 0.335838 -2.624 0.008708 **
## hr.f12:season.fFall -0.727042 0.327090 -2.223 0.026268 *
## hr.f13:season.fFall -0.795132 0.326067 -2.439 0.014774 *
## hr.f14:season.fFall -0.610155 0.318650 -1.915 0.055562 .
## hr.f15:season.fFall -0.619515 0.315418 -1.964 0.049562 *
## hr.f16:season.fFall -0.553842 0.318016 -1.742 0.081636 .
## hr.f17:season.fFall -0.415055 0.316771 -1.310 0.190153
## hr.f18:season.fFall -0.249275 0.322341 -0.773 0.439359
## hr.f19:season.fFall 0.288256 0.331905 0.868 0.385159
## hr.f20:season.fFall 0.309249 0.348028 0.889 0.374265
## hr.f21:season.fFall 0.178797 0.353387 0.506 0.612908
## hr.f22:season.fFall -0.109180 0.345311 -0.316 0.751878
## hr.f23:season.fFall -0.024723 0.349387 -0.071 0.943591
## hr.f1:season.fWinter -0.118934 0.341901 -0.348 0.727955
## hr.f2:season.fWinter -0.310943 0.341705 -0.910 0.362871
## hr.f3:season.fWinter 0.242569 0.338418 0.717 0.473540
## hr.f4:season.fWinter 0.513317 0.348231 1.474 0.140513
## hr.f5:season.fWinter 0.571597 0.348828 1.639 0.101343
## hr.f6:season.fWinter 0.290813 0.342543 0.849 0.395923
## hr.f7:season.fWinter 0.462898 0.330353 1.401 0.161197
## hr.f8:season.fWinter -0.169189 0.370172 -0.457 0.647647
## hr.f9:season.fWinter -0.046470 0.346530 -0.134 0.893328
## hr.f10:season.fWinter -0.283863 0.346151 -0.820 0.412215
## hr.f11:season.fWinter -0.666731 0.364578 -1.829 0.067481 .
## hr.f12:season.fWinter -0.481386 0.342987 -1.404 0.160515
## hr.f13:season.fWinter -0.566056 0.352204 -1.607 0.108065
## hr.f14:season.fWinter -0.517507 0.341636 -1.515 0.129876
## hr.f15:season.fWinter -0.145710 0.332351 -0.438 0.661096
## hr.f16:season.fWinter -0.037562 0.345107 -0.109 0.913332
## hr.f17:season.fWinter -0.038625 0.334028 -0.116 0.907945
## hr.f18:season.fWinter 0.008282 0.346020 0.024 0.980905
## hr.f19:season.fWinter 0.447459 0.340161 1.315 0.188413
## hr.f20:season.fWinter 0.154944 0.351101 0.441 0.659005
## hr.f21:season.fWinter 0.151750 0.350371 0.433 0.664948
## hr.f22:season.fWinter -0.029243 0.351391 -0.083 0.933680
## hr.f23:season.fWinter -0.075897 0.340536 -0.223 0.823640
## lg_temp:season.fSummer 0.659489 0.193552 3.407 0.000660 ***
## lg_temp:season.fFall 0.222227 0.436667 0.509 0.610828
## lg_temp:season.fWinter 0.223747 0.214568 1.043 0.297090
## hr.f1:lg_temp:workingday.f1 -0.081711 0.151283 -0.540 0.589134
## hr.f2:lg_temp:workingday.f1 0.120254 0.146402 0.821 0.411452
## hr.f3:lg_temp:workingday.f1 -0.150413 0.149882 -1.004 0.315635
## hr.f4:lg_temp:workingday.f1 -0.026785 0.162194 -0.165 0.868837
## hr.f5:lg_temp:workingday.f1 0.022084 0.149649 0.148 0.882688
## hr.f6:lg_temp:workingday.f1 -0.420714 0.147607 -2.850 0.004383 **
## hr.f7:lg_temp:workingday.f1 -0.312886 0.140042 -2.234 0.025503 *
## hr.f8:lg_temp:workingday.f1 -0.535887 0.150371 -3.564 0.000368 ***
## hr.f9:lg_temp:workingday.f1 -0.796751 0.152421 -5.227 1.78e-07 ***
## hr.f10:lg_temp:workingday.f1 -0.354933 0.150500 -2.358 0.018387 *
## hr.f11:lg_temp:workingday.f1 -0.332130 0.161361 -2.058 0.039602 *
## hr.f12:lg_temp:workingday.f1 -0.031091 0.160063 -0.194 0.845992
## hr.f13:lg_temp:workingday.f1 -0.292423 0.172284 -1.697 0.089683 .
## hr.f14:lg_temp:workingday.f1 -0.190454 0.160406 -1.187 0.235143
## hr.f15:lg_temp:workingday.f1 -0.138938 0.166894 -0.832 0.405165
## hr.f16:lg_temp:workingday.f1 -0.141818 0.163442 -0.868 0.385594
## hr.f17:lg_temp:workingday.f1 -0.270057 0.160692 -1.681 0.092894 .
## hr.f18:lg_temp:workingday.f1 -0.276793 0.159941 -1.731 0.083574 .
## hr.f19:lg_temp:workingday.f1 -0.439711 0.157414 -2.793 0.005233 **
## hr.f20:lg_temp:workingday.f1 -0.386429 0.165305 -2.338 0.019436 *
## hr.f21:lg_temp:workingday.f1 -0.304770 0.157112 -1.940 0.052446 .
## hr.f22:lg_temp:workingday.f1 -0.174430 0.154225 -1.131 0.258095
## hr.f23:lg_temp:workingday.f1 -0.044544 0.150651 -0.296 0.767488
## hr.f1:lg_temp:season.fSummer -0.142476 0.275935 -0.516 0.605637
## hr.f2:lg_temp:season.fSummer -0.637802 0.263226 -2.423 0.015420 *
## hr.f3:lg_temp:season.fSummer -0.235465 0.268749 -0.876 0.380981
## hr.f4:lg_temp:season.fSummer -0.391888 0.282791 -1.386 0.165862
## hr.f5:lg_temp:season.fSummer -0.262559 0.266549 -0.985 0.324647
## hr.f6:lg_temp:season.fSummer -0.157071 0.258373 -0.608 0.543260
## hr.f7:lg_temp:season.fSummer 0.075900 0.257885 0.294 0.768525
## hr.f8:lg_temp:season.fSummer -0.274536 0.286916 -0.957 0.338680
## hr.f9:lg_temp:season.fSummer -0.161635 0.274626 -0.589 0.556175
## hr.f10:lg_temp:season.fSummer -0.483412 0.268832 -1.798 0.072194 .
## hr.f11:lg_temp:season.fSummer -0.717763 0.282930 -2.537 0.011208 *
## hr.f12:lg_temp:season.fSummer -0.644799 0.286327 -2.252 0.024359 *
## hr.f13:lg_temp:season.fSummer -0.969055 0.316436 -3.062 0.002205 **
## hr.f14:lg_temp:season.fSummer -0.684099 0.286924 -2.384 0.017144 *
## hr.f15:lg_temp:season.fSummer -0.670677 0.292795 -2.291 0.022020 *
## hr.f16:lg_temp:season.fSummer -0.355513 0.300866 -1.182 0.237395
## hr.f17:lg_temp:season.fSummer -0.375922 0.296961 -1.266 0.205597
## hr.f18:lg_temp:season.fSummer -0.174114 0.287889 -0.605 0.545337
## hr.f19:lg_temp:season.fSummer 0.171923 0.301667 0.570 0.568759
## hr.f20:lg_temp:season.fSummer 0.179098 0.292607 0.612 0.540510
## hr.f21:lg_temp:season.fSummer 0.199830 0.294715 0.678 0.497768
## hr.f22:lg_temp:season.fSummer 0.023362 0.281565 0.083 0.933878
## hr.f23:lg_temp:season.fSummer -0.209306 0.281064 -0.745 0.456486
## hr.f1:lg_temp:season.fFall -0.366280 0.638366 -0.574 0.566139
## hr.f2:lg_temp:season.fFall -0.140081 0.633988 -0.221 0.825137
## hr.f3:lg_temp:season.fFall -0.411336 0.626651 -0.656 0.511588
## hr.f4:lg_temp:season.fFall 0.644355 0.616070 1.046 0.295642
## hr.f5:lg_temp:season.fFall -0.861608 0.623324 -1.382 0.166935
## hr.f6:lg_temp:season.fFall -0.410943 0.633637 -0.649 0.516656
## hr.f7:lg_temp:season.fFall -0.411451 0.596207 -0.690 0.490148
## hr.f8:lg_temp:season.fFall -0.564782 0.618044 -0.914 0.360845
## hr.f9:lg_temp:season.fFall -0.435533 0.590300 -0.738 0.460654
## hr.f10:lg_temp:season.fFall -0.934171 0.639697 -1.460 0.144249
## hr.f11:lg_temp:season.fFall -1.169814 0.605338 -1.932 0.053343 .
## hr.f12:lg_temp:season.fFall -0.925475 0.624344 -1.482 0.138307
## hr.f13:lg_temp:season.fFall -0.240478 0.599054 -0.401 0.688118
## hr.f14:lg_temp:season.fFall -0.362682 0.609279 -0.595 0.551688
## hr.f15:lg_temp:season.fFall -0.948783 0.603819 -1.571 0.116163
## hr.f16:lg_temp:season.fFall -0.884953 0.576669 -1.535 0.124934
## hr.f17:lg_temp:season.fFall -0.502880 0.599878 -0.838 0.401892
## hr.f18:lg_temp:season.fFall -0.757977 0.580224 -1.306 0.191481
## hr.f19:lg_temp:season.fFall -0.161670 0.598097 -0.270 0.786933
## hr.f20:lg_temp:season.fFall 0.294859 0.636128 0.464 0.643007
## hr.f21:lg_temp:season.fFall -0.164910 0.629314 -0.262 0.793293
## hr.f22:lg_temp:season.fFall -0.245219 0.608577 -0.403 0.687008
## hr.f23:lg_temp:season.fFall -0.165293 0.621985 -0.266 0.790440
## hr.f1:lg_temp:season.fWinter -0.042274 0.293756 -0.144 0.885576
## hr.f2:lg_temp:season.fWinter -0.135257 0.291176 -0.465 0.642292
## hr.f3:lg_temp:season.fWinter 0.298732 0.279031 1.071 0.284389
## hr.f4:lg_temp:season.fWinter 0.296046 0.289791 1.022 0.307017
## hr.f5:lg_temp:season.fWinter 0.148366 0.286943 0.517 0.605134
## hr.f6:lg_temp:season.fWinter -0.003593 0.278967 -0.013 0.989723
## hr.f7:lg_temp:season.fWinter 0.045860 0.272403 0.168 0.866311
## hr.f8:lg_temp:season.fWinter -0.218740 0.298102 -0.734 0.463112
## hr.f9:lg_temp:season.fWinter -0.057390 0.299537 -0.192 0.848066
## hr.f10:lg_temp:season.fWinter -0.157159 0.315250 -0.499 0.618134
## hr.f11:lg_temp:season.fWinter -0.385751 0.342026 -1.128 0.259430
## hr.f12:lg_temp:season.fWinter -0.291798 0.328061 -0.889 0.373789
## hr.f13:lg_temp:season.fWinter -0.217109 0.342245 -0.634 0.525865
## hr.f14:lg_temp:season.fWinter -0.259841 0.336588 -0.772 0.440154
## hr.f15:lg_temp:season.fWinter 0.170420 0.326336 0.522 0.601533
## hr.f16:lg_temp:season.fWinter 0.204876 0.329475 0.622 0.534078
## hr.f17:lg_temp:season.fWinter 0.235525 0.314210 0.750 0.453536
## hr.f18:lg_temp:season.fWinter 0.097032 0.321844 0.301 0.763052
## hr.f19:lg_temp:season.fWinter 0.443724 0.308953 1.436 0.150990
## hr.f20:lg_temp:season.fWinter 0.198931 0.317424 0.627 0.530877
## hr.f21:lg_temp:season.fWinter 0.178255 0.308565 0.578 0.563494
## hr.f22:lg_temp:season.fWinter 0.132711 0.306606 0.433 0.665147
## hr.f23:lg_temp:season.fWinter -0.054760 0.294036 -0.186 0.852266
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4212 on 6230 degrees of freedom
## Multiple R-squared: 0.919, Adjusted R-squared: 0.9157
## F-statistic: 280.5 on 252 and 6230 DF, p-value: < 2.2e-16
This model improves the fit further: the adjusted R-squared rises to 0.9157 and the MSE falls to 0.4188.
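The adjusted R-squared in the summary can be reproduced from the multiple R-squared, the number of observations and the number of predictors. The R check below uses the rounded figures printed in the output: 252 model degrees of freedom, 6230 residual degrees of freedom, and the RSS of 1105.1 from the ANOVA table further down. Small differences from the printed values come from that rounding.

```r
# Figures copied from the printed lm() summary for this model
r2     <- 0.919                 # Multiple R-squared (rounded in the output)
df_mod <- 252                   # numerator df of the F-statistic = number of predictors p
df_res <- 6230                  # residual degrees of freedom
n      <- df_mod + df_res + 1   # observations: n = p + df_res + 1

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Residual standard error = sqrt(RSS / residual df); RSS from the ANOVA table
rss <- 1105.1
rse <- sqrt(rss / df_res)

print(round(c(n = n, adj_r2 = adj_r2, rse = rse), 4))
# n = 6483, adj_r2 = 0.9157, rse = 0.4212, matching the summary output
```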
However, the model still contains a large number of terms, many of which (particularly the hour-by-season interactions) are not statistically significant, and this goes against the principle of parsimony. Let’s try trimming the model down further:
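One way to carry out this kind of pruning is to drop interaction terms one at a time with `update()` and compare only the key fit statistics, rather than printing the full structure of each summary object. The sketch below is an illustration of that workflow, not the code that produced the output that follows: it uses a small simulated data set in which the hypothetical variables `hr`, `work` and `temp` stand in for `hr.f`, `workingday.f` and `lg_temp`.

```r
# Synthetic stand-in data: hypothetical, NOT the Capital Bikeshare data
set.seed(42)
n <- 500
d <- data.frame(
  hr   = factor(sample(0:3, n, replace = TRUE)),  # plays the role of hr.f
  work = factor(sample(0:1, n, replace = TRUE)),  # plays the role of workingday.f
  temp = runif(n)                                 # plays the role of lg_temp
)
d$y <- 1 + 0.5 * as.integer(d$hr) + d$temp + rnorm(n, sd = 0.3)

# Start from a three-way interaction model, then drop terms one at a time
m1 <- lm(y ~ hr * temp * work, data = d)
m2 <- update(m1, . ~ . - hr:temp:work)
m3 <- update(m2, . ~ . - hr:temp)

# Collect only the fit statistics of interest from each summary
fit_stats <- function(m) {
  s <- summary(m)
  c(n_coef = nrow(coef(s)), sigma = s$sigma, adj_r2 = s$adj.r.squared)
}
fit_table <- t(sapply(list(m1 = m1, m2 = m2, m3 = m3), fit_stats))
print(round(fit_table, 4))

# Nested-model F-tests, analogous to the ANOVA table in this report
cmp <- anova(m1, m2, m3)
print(cmp)
```

Each `update()` call keeps the data and response of the previous model, so the three fits stay nested and can be compared directly with `anova()`.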
## List of 11
## $ call : language lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:workingday.| __truncated__ ...
## $ terms :Classes 'terms', 'formula' language lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:workingday.f + lg_temp:w| __truncated__ ...
## .. ..- attr(*, "variables")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "factors")= int [1:8, 1:13] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
## .. .. .. ..$ : chr [1:13] "hr.f" "lg_temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "term.labels")= chr [1:13] "hr.f" "lg_temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "order")= int [1:13] 1 1 1 1 1 1 1 2 2 2 ...
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
## $ residuals : Named num [1:6483] 0.3024 -0.2709 -0.5322 0.7109 -0.0839 ...
## ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
## $ coefficients : num [1:253, 1:4] 4.534 -0.247 -0.633 -1.441 -2.772 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:253] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..- attr(*, "names")= chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ sigma : num 0.421
## $ df : int [1:3] 253 6230 253
## $ r.squared : num 0.919
## $ adj.r.squared: num 0.916
## $ fstatistic : Named num [1:3] 280 252 6230
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:253, 1:253] 0.166 -0.152 -0.153 -0.153 -0.153 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## - attr(*, "class")= chr "summary.lm"
## List of 11
## $ call : language lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:workingday.| __truncated__ ...
## $ terms :Classes 'terms', 'formula' language lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:workingday.f + lg_temp:w| __truncated__ ...
## .. ..- attr(*, "variables")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "factors")= int [1:8, 1:12] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
## .. .. .. ..$ : chr [1:12] "hr.f" "lg_temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "term.labels")= chr [1:12] "hr.f" "lg_temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "order")= int [1:12] 1 1 1 1 1 1 1 2 2 2 ...
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
## $ residuals : Named num [1:6483] 0.3203 -0.27 -0.4912 0.75 -0.0666 ...
## ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
## $ coefficients : num [1:184, 1:4] 4.638 -0.358 -0.855 -1.456 -2.63 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:184] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..- attr(*, "names")= chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ sigma : num 0.423
## $ df : int [1:3] 184 6299 184
## $ r.squared : num 0.918
## $ adj.r.squared: num 0.915
## $ fstatistic : Named num [1:3] 383 183 6299
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:184, 1:184] 0.0775 -0.0588 -0.059 -0.0585 -0.0576 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## - attr(*, "class")= chr "summary.lm"
## List of 11
## $ call : language lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:workingday.| __truncated__ ...
## $ terms :Classes 'terms', 'formula' language lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:workingday.f + lg_temp:w| __truncated__ ...
## .. ..- attr(*, "variables")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "factors")= int [1:8, 1:11] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
## .. .. .. ..$ : chr [1:11] "hr.f" "lg_temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "term.labels")= chr [1:11] "hr.f" "lg_temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "order")= int [1:11] 1 1 1 1 1 1 1 2 2 2 ...
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
## $ residuals : Named num [1:6483] 0.2978 -0.2638 -0.5056 0.7851 -0.0664 ...
## ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
## $ coefficients : num [1:115, 1:4] 4.641 -0.323 -0.758 -1.392 -2.606 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:115] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..- attr(*, "names")= chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ sigma : num 0.425
## $ df : int [1:3] 115 6368 115
## $ r.squared : num 0.916
## $ adj.r.squared: num 0.914
## $ fstatistic : Named num [1:3] 607 114 6368
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:115, 1:115] 0.0595 -0.0415 -0.0417 -0.0417 -0.0406 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:115] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## - attr(*, "class")= chr "summary.lm"
## List of 11
## $ call : language lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:workingday.| __truncated__ ...
## $ terms :Classes 'terms', 'formula' language lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:workingday.f + lg_temp:w| __truncated__
## .. ..- attr(*, "variables")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "factors")= int [1:8, 1:10] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
## .. .. .. ..$ : chr [1:10] "hr.f" "lg_temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "term.labels")= chr [1:10] "hr.f" "lg_temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "order")= int [1:10] 1 1 1 1 1 1 1 2 2 2
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(lg_cnt, hr.f, lg_temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:8] "lg_cnt" "hr.f" "lg_temp" "workingday.f" ...
## $ residuals : Named num [1:6483] 0.314 -0.252 -0.418 0.768 -0.134 ...
## ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
## $ coefficients : num [1:69, 1:4] 4.645 -0.335 -0.572 -1.291 -2.486 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:69] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
## $ aliased : Named logi [1:69] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..- attr(*, "names")= chr [1:69] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ sigma : num 0.432
## $ df : int [1:3] 69 6414 69
## $ r.squared : num 0.912
## $ adj.r.squared: num 0.912
## $ fstatistic : Named num [1:3] 983 68 6414
## ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
## $ cov.unscaled : num [1:69, 1:69] 0.031 -0.0107 -0.0106 -0.0104 -0.0106 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:69] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:69] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## - attr(*, "class")= chr "summary.lm"
Removing these interaction terms does not reduce the explanatory power of the log model by much: the adjusted R-squared decreases from 0.9157 to 0.912 and the MSE increases from 0.419 to 0.432. However, the ANOVA table shows that each reduction step (Models 3, 4 and 5) fits significantly worse than the model before it (all p-values below 0.001), so the most reduced model, Model 5, is significantly different from the initial one:
## Analysis of Variance Table
##
## Model 1: lg_cnt ~ hr.f * lg_temp * workingday.f + hr.f * lg_temp * season.f +
## lg_temp + mnth.f + hum + windspeed
## Model 2: lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f +
## hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f +
## hr.f:season.f + lg_temp:season.f + hr.f:lg_temp:workingday.f +
## hr.f:lg_temp:season.f
## Model 3: lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f +
## hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f +
## lg_temp:season.f + hr.f:lg_temp:workingday.f + hr.f:lg_temp:season.f
## Model 4: lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f +
## hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f +
## lg_temp:season.f + hr.f:lg_temp:workingday.f
## Model 5: lg_cnt ~ hr.f + lg_temp + workingday.f + season.f + mnth.f +
## hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f +
## lg_temp:season.f
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 6230 1105.1
## 2 6230 1105.1 0 0.000
## 3 6299 1125.3 -69 -20.202 1.6506 0.0005986 ***
## 4 6368 1149.7 -69 -24.367 1.9909 2.341e-06 ***
## 5 6414 1194.6 -46 -44.980 5.5125 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
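Each row of this table tests the terms dropped relative to the model listed just above it. When `anova()` is given several models, however, it scales every comparison by the residual mean square of the largest model (here Model 1, RSS 1105.1 on 6230 df). The short R check below reproduces the F column and the p-values from the printed sums of squares:

```r
# Values copied from the ANOVA table above
rss_full <- 1105.1               # RSS of Model 1, the largest model
df_full  <- 6230                 # residual df of Model 1
ms_full  <- rss_full / df_full   # residual mean square used as the F denominator

# Extra sum of squares and df for each step (Model 2 -> 3, 3 -> 4, 4 -> 5)
extra_ss <- c(20.202, 24.367, 44.980)
extra_df <- c(69, 69, 46)

# F = (extra SS / extra df) / MS of the largest model
f_stat <- (extra_ss / extra_df) / ms_full
p_val  <- pf(f_stat, extra_df, df_full, lower.tail = FALSE)

print(round(f_stat, 4))
print(signif(p_val, 3))
```

Because every row uses Model 1's mean square, these F values differ slightly from what `anova()` would report if each consecutive pair of models were compared on its own.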
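By contrast, the second model, with hr.f:lg_temp removed, does not differ from the first at all: its ANOVA row shows 0 degrees of freedom and an unchanged RSS of 1105.1. Because the three-way interactions are kept, R re-expresses the dropped hr.f:lg_temp effect through them (hence the separate hr.f:lg_temp:workingday.f0 and hr.f:lg_temp:workingday.f1 coefficients below), so both models give exactly the same fit. I will choose this second model as my final model.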
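A small simulated example shows why dropping a two-way interaction while keeping the three-way interaction that contains it costs no degrees of freedom. The hypothetical variables `h`, `x` and `w` stand in for `hr.f`, `lg_temp` and `workingday.f`. Under R's factor-coding rules, when `h:x` is absent, `model.matrix()` codes `w` inside `h:x:w` with full indicator columns instead of contrasts. The column space, and hence the fit, is therefore unchanged:

```r
# Synthetic stand-in data: hypothetical, NOT the Capital Bikeshare data
set.seed(1)
n <- 200
d <- data.frame(
  h = factor(sample(1:3, n, replace = TRUE)),  # stand-in for hr.f
  w = factor(sample(0:1, n, replace = TRUE)),  # stand-in for workingday.f
  x = rnorm(n)                                 # stand-in for lg_temp
)
d$y <- rnorm(n)

full    <- lm(y ~ h * x * w, data = d)
reduced <- lm(y ~ h * x * w - h:x, data = d)  # drop h:x, keep h:x:w

# Same residual df and same RSS: the two formulas describe one model
print(c(df_full = df.residual(full), df_reduced = df.residual(reduced)))
print(c(rss_full = deviance(full), rss_reduced = deviance(reduced)))

# The re-parameterised three-way columns now carry both levels of w
print(grep("h2:x:w", names(coef(reduced)), value = TRUE))

# anova() reports 0 Df and 0 Sum of Sq, as in the table above
print(anova(full, reduced))
```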
##
## Call:
## lm(formula = lg_cnt ~ hr.f + lg_temp + workingday.f + season.f +
## mnth.f + hum + windspeed + hr.f:workingday.f + lg_temp:workingday.f +
## hr.f:season.f + lg_temp:season.f + hr.f:lg_temp:workingday.f +
## hr.f:lg_temp:season.f, data = train11)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4604 -0.1540 0.0488 0.2229 2.0736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.533582 0.171660 26.410 < 2e-16 ***
## hr.f1 -0.247070 0.233793 -1.057 0.290648
## hr.f2 -0.632714 0.233104 -2.714 0.006660 **
## hr.f3 -1.441122 0.240813 -5.984 2.29e-09 ***
## hr.f4 -2.772145 0.264385 -10.485 < 2e-16 ***
## hr.f5 -3.086811 0.251909 -12.254 < 2e-16 ***
## hr.f6 -1.954148 0.241950 -8.077 7.93e-16 ***
## hr.f7 -1.344376 0.232257 -5.788 7.46e-09 ***
## hr.f8 0.025458 0.296019 0.086 0.931469
## hr.f9 0.497165 0.251371 1.978 0.047993 *
## hr.f10 1.180717 0.240942 4.900 9.80e-07 ***
## hr.f11 1.598780 0.266407 6.001 2.07e-09 ***
## hr.f12 1.447013 0.244800 5.911 3.58e-09 ***
## hr.f13 1.835071 0.262841 6.982 3.22e-12 ***
## hr.f14 1.476195 0.245502 6.013 1.92e-09 ***
## hr.f15 1.308035 0.243573 5.370 8.15e-08 ***
## hr.f16 1.330853 0.251860 5.284 1.31e-07 ***
## hr.f17 1.200911 0.243127 4.939 8.04e-07 ***
## hr.f18 0.983277 0.246273 3.993 6.61e-05 ***
## hr.f19 0.535366 0.249987 2.142 0.032266 *
## hr.f20 0.520891 0.263068 1.980 0.047740 *
## hr.f21 0.343491 0.263052 1.306 0.191671
## hr.f22 0.290723 0.253398 1.147 0.251302
## hr.f23 -0.129158 0.240747 -0.536 0.591642
## lg_temp 0.442288 0.113199 3.907 9.44e-05 ***
## workingday.f1 -1.008461 0.107166 -9.410 < 2e-16 ***
## season.fSummer 0.803474 0.212422 3.782 0.000157 ***
## season.fFall 0.667479 0.248495 2.686 0.007249 **
## season.fWinter 0.858886 0.246182 3.489 0.000489 ***
## mnth.f2 0.159044 0.029677 5.359 8.66e-08 ***
## mnth.f3 0.185035 0.035250 5.249 1.58e-07 ***
## mnth.f4 0.384559 0.049256 7.807 6.80e-15 ***
## mnth.f5 0.640851 0.053113 12.066 < 2e-16 ***
## mnth.f6 0.468726 0.055746 8.408 < 2e-16 ***
## mnth.f7 0.383784 0.061098 6.281 3.58e-10 ***
## mnth.f8 0.363315 0.059133 6.144 8.54e-10 ***
## mnth.f9 0.378031 0.056019 6.748 1.63e-11 ***
## mnth.f10 0.322501 0.049281 6.544 6.46e-11 ***
## mnth.f11 0.212335 0.047685 4.453 8.62e-06 ***
## mnth.f12 0.305000 0.039377 7.746 1.10e-14 ***
## hum -0.842518 0.033580 -25.090 < 2e-16 ***
## windspeed -0.549272 0.046353 -11.850 < 2e-16 ***
## hr.f1:workingday.f1 -0.506829 0.158107 -3.206 0.001355 **
## hr.f2:workingday.f1 -0.681905 0.155886 -4.374 1.24e-05 ***
## hr.f3:workingday.f1 -0.771363 0.159695 -4.830 1.40e-06 ***
## hr.f4:workingday.f1 0.678457 0.168105 4.036 5.50e-05 ***
## hr.f5:workingday.f1 2.047011 0.164829 12.419 < 2e-16 ***
## hr.f6:workingday.f1 2.481230 0.160912 15.420 < 2e-16 ***
## hr.f7:workingday.f1 2.801075 0.150463 18.616 < 2e-16 ***
## hr.f8:workingday.f1 2.170660 0.156122 13.904 < 2e-16 ***
## hr.f9:workingday.f1 0.831912 0.154596 5.381 7.67e-08 ***
## hr.f10:workingday.f1 0.128856 0.148708 0.867 0.386248
## hr.f11:workingday.f1 0.102741 0.151080 0.680 0.496505
## hr.f12:workingday.f1 0.361688 0.150254 2.407 0.016105 *
## hr.f13:workingday.f1 0.162788 0.150858 1.079 0.280593
## hr.f14:workingday.f1 0.186932 0.145993 1.280 0.200447
## hr.f15:workingday.f1 0.303987 0.149553 2.033 0.042131 *
## hr.f16:workingday.f1 0.676216 0.148135 4.565 5.09e-06 ***
## hr.f17:workingday.f1 1.346999 0.148305 9.083 < 2e-16 ***
## hr.f18:workingday.f1 1.338956 0.152796 8.763 < 2e-16 ***
## hr.f19:workingday.f1 1.068509 0.150602 7.095 1.44e-12 ***
## hr.f20:workingday.f1 1.000281 0.157066 6.369 2.04e-10 ***
## hr.f21:workingday.f1 0.961633 0.155228 6.195 6.20e-10 ***
## hr.f22:workingday.f1 1.044531 0.154527 6.760 1.51e-11 ***
## hr.f23:workingday.f1 0.979434 0.154711 6.331 2.61e-10 ***
## lg_temp:workingday.f1 0.007914 0.104927 0.075 0.939878
## hr.f1:season.fSummer -0.144182 0.296560 -0.486 0.626856
## hr.f2:season.fSummer -0.375242 0.292593 -1.282 0.199724
## hr.f3:season.fSummer -0.127003 0.302173 -0.420 0.674281
## hr.f4:season.fSummer -0.194272 0.313755 -0.619 0.535817
## hr.f5:season.fSummer 0.291361 0.307402 0.948 0.343259
## hr.f6:season.fSummer 0.262480 0.299688 0.876 0.381149
## hr.f7:season.fSummer 0.443285 0.285372 1.553 0.120389
## hr.f8:season.fSummer -0.124917 0.333514 -0.375 0.708009
## hr.f9:season.fSummer -0.118051 0.300941 -0.392 0.694870
## hr.f10:season.fSummer -0.587658 0.288455 -2.037 0.041666 *
## hr.f11:season.fSummer -0.868934 0.301097 -2.886 0.003916 **
## hr.f12:season.fSummer -0.650876 0.293189 -2.220 0.026455 *
## hr.f13:season.fSummer -0.978481 0.301501 -3.245 0.001179 **
## hr.f14:season.fSummer -0.750543 0.285751 -2.627 0.008646 **
## hr.f15:season.fSummer -0.628057 0.283282 -2.217 0.026654 *
## hr.f16:season.fSummer -0.404307 0.292900 -1.380 0.167526
## hr.f17:season.fSummer -0.384052 0.287293 -1.337 0.181338
## hr.f18:season.fSummer -0.159922 0.291170 -0.549 0.582861
## hr.f19:season.fSummer 0.369939 0.297743 1.242 0.214107
## hr.f20:season.fSummer 0.232877 0.303774 0.767 0.443341
## hr.f21:season.fSummer 0.244015 0.308863 0.790 0.429533
## hr.f22:season.fSummer -0.030339 0.302866 -0.100 0.920210
## hr.f23:season.fSummer -0.150611 0.297560 -0.506 0.612766
## hr.f1:season.fFall -0.194763 0.356116 -0.547 0.584461
## hr.f2:season.fFall -0.200652 0.355830 -0.564 0.572843
## hr.f3:season.fFall -0.055470 0.362335 -0.153 0.878331
## hr.f4:season.fFall 0.446332 0.367830 1.213 0.225016
## hr.f5:season.fFall 0.102268 0.365524 0.280 0.779652
## hr.f6:season.fFall 0.210322 0.364430 0.577 0.563876
## hr.f7:season.fFall 0.214991 0.339269 0.634 0.526307
## hr.f8:season.fFall -0.232171 0.378131 -0.614 0.539239
## hr.f9:season.fFall -0.165559 0.339668 -0.487 0.625982
## hr.f10:season.fFall -0.649566 0.332356 -1.954 0.050695 .
## hr.f11:season.fFall -0.881288 0.335838 -2.624 0.008708 **
## hr.f12:season.fFall -0.727042 0.327090 -2.223 0.026268 *
## hr.f13:season.fFall -0.795132 0.326067 -2.439 0.014774 *
## hr.f14:season.fFall -0.610155 0.318650 -1.915 0.055562 .
## hr.f15:season.fFall -0.619515 0.315418 -1.964 0.049562 *
## hr.f16:season.fFall -0.553842 0.318016 -1.742 0.081636 .
## hr.f17:season.fFall -0.415055 0.316771 -1.310 0.190153
## hr.f18:season.fFall -0.249275 0.322341 -0.773 0.439359
## hr.f19:season.fFall 0.288256 0.331905 0.868 0.385159
## hr.f20:season.fFall 0.309249 0.348028 0.889 0.374265
## hr.f21:season.fFall 0.178797 0.353387 0.506 0.612908
## hr.f22:season.fFall -0.109180 0.345311 -0.316 0.751878
## hr.f23:season.fFall -0.024723 0.349387 -0.071 0.943591
## hr.f1:season.fWinter -0.118934 0.341901 -0.348 0.727955
## hr.f2:season.fWinter -0.310943 0.341705 -0.910 0.362871
## hr.f3:season.fWinter 0.242569 0.338418 0.717 0.473540
## hr.f4:season.fWinter 0.513317 0.348231 1.474 0.140513
## hr.f5:season.fWinter 0.571597 0.348828 1.639 0.101343
## hr.f6:season.fWinter 0.290813 0.342543 0.849 0.395923
## hr.f7:season.fWinter 0.462898 0.330353 1.401 0.161197
## hr.f8:season.fWinter -0.169189 0.370172 -0.457 0.647647
## hr.f9:season.fWinter -0.046470 0.346530 -0.134 0.893328
## hr.f10:season.fWinter -0.283863 0.346151 -0.820 0.412215
## hr.f11:season.fWinter -0.666731 0.364578 -1.829 0.067481 .
## hr.f12:season.fWinter -0.481386 0.342987 -1.404 0.160515
## hr.f13:season.fWinter -0.566056 0.352204 -1.607 0.108065
## hr.f14:season.fWinter -0.517507 0.341636 -1.515 0.129876
## hr.f15:season.fWinter -0.145710 0.332351 -0.438 0.661096
## hr.f16:season.fWinter -0.037562 0.345107 -0.109 0.913332
## hr.f17:season.fWinter -0.038625 0.334028 -0.116 0.907945
## hr.f18:season.fWinter 0.008282 0.346020 0.024 0.980905
## hr.f19:season.fWinter 0.447459 0.340161 1.315 0.188413
## hr.f20:season.fWinter 0.154944 0.351101 0.441 0.659005
## hr.f21:season.fWinter 0.151750 0.350371 0.433 0.664948
## hr.f22:season.fWinter -0.029243 0.351391 -0.083 0.933680
## hr.f23:season.fWinter -0.075897 0.340536 -0.223 0.823640
## lg_temp:season.fSummer 0.659489 0.193552 3.407 0.000660 ***
## lg_temp:season.fFall 0.222227 0.436667 0.509 0.610828
## lg_temp:season.fWinter 0.223747 0.214568 1.043 0.297090
## hr.f1:lg_temp:workingday.f0 0.052460 0.159449 0.329 0.742163
## hr.f2:lg_temp:workingday.f0 -0.090375 0.152572 -0.592 0.553641
## hr.f3:lg_temp:workingday.f0 -0.134459 0.153075 -0.878 0.379766
## hr.f4:lg_temp:workingday.f0 -0.194066 0.188310 -1.031 0.302784
## hr.f5:lg_temp:workingday.f0 -0.287946 0.166815 -1.726 0.084371 .
## hr.f6:lg_temp:workingday.f0 0.070624 0.160420 0.440 0.659773
## hr.f7:lg_temp:workingday.f0 -0.088647 0.151780 -0.584 0.559206
## hr.f8:lg_temp:workingday.f0 0.078601 0.199715 0.394 0.693916
## hr.f9:lg_temp:workingday.f0 0.044830 0.175144 0.256 0.797989
## hr.f10:lg_temp:workingday.f0 0.141348 0.170534 0.829 0.407217
## hr.f11:lg_temp:workingday.f0 0.377567 0.203392 1.856 0.063451 .
## hr.f12:lg_temp:workingday.f0 0.122378 0.184474 0.663 0.507106
## hr.f13:lg_temp:workingday.f0 0.483142 0.219843 2.198 0.028009 *
## hr.f14:lg_temp:workingday.f0 0.182143 0.193983 0.939 0.347786
## hr.f15:lg_temp:workingday.f0 0.108955 0.194907 0.559 0.576175
## hr.f16:lg_temp:workingday.f0 0.148920 0.198991 0.748 0.454261
## hr.f17:lg_temp:workingday.f0 0.137317 0.189329 0.725 0.468304
## hr.f18:lg_temp:workingday.f0 0.088821 0.183557 0.484 0.628484
## hr.f19:lg_temp:workingday.f0 0.008569 0.184728 0.046 0.963005
## hr.f20:lg_temp:workingday.f0 0.112515 0.202481 0.556 0.578446
## hr.f21:lg_temp:workingday.f0 0.092249 0.191920 0.481 0.630774
## hr.f22:lg_temp:workingday.f0 0.146752 0.179383 0.818 0.413335
## hr.f23:lg_temp:workingday.f0 0.018176 0.165227 0.110 0.912408
## hr.f1:lg_temp:workingday.f1 -0.029251 0.168329 -0.174 0.862048
## hr.f2:lg_temp:workingday.f1 0.029879 0.166662 0.179 0.857726
## hr.f3:lg_temp:workingday.f1 -0.284873 0.170704 -1.669 0.095205 .
## hr.f4:lg_temp:workingday.f1 -0.220851 0.167522 -1.318 0.187437
## hr.f5:lg_temp:workingday.f1 -0.265863 0.158393 -1.679 0.093299 .
## hr.f6:lg_temp:workingday.f1 -0.350090 0.168773 -2.074 0.038090 *
## hr.f7:lg_temp:workingday.f1 -0.401533 0.150648 -2.665 0.007710 **
## hr.f8:lg_temp:workingday.f1 -0.457287 0.193303 -2.366 0.018029 *
## hr.f9:lg_temp:workingday.f1 -0.751921 0.176814 -4.253 2.14e-05 ***
## hr.f10:lg_temp:workingday.f1 -0.213585 0.169864 -1.257 0.208660
## hr.f11:lg_temp:workingday.f1 0.045438 0.183257 0.248 0.804185
## hr.f12:lg_temp:workingday.f1 0.091287 0.186541 0.489 0.624598
## hr.f13:lg_temp:workingday.f1 0.190718 0.194808 0.979 0.327615
## hr.f14:lg_temp:workingday.f1 -0.008311 0.187666 -0.044 0.964677
## hr.f15:lg_temp:workingday.f1 -0.029982 0.185967 -0.161 0.871922
## hr.f16:lg_temp:workingday.f1 0.007102 0.187889 0.038 0.969849
## hr.f17:lg_temp:workingday.f1 -0.132740 0.182277 -0.728 0.466500
## hr.f18:lg_temp:workingday.f1 -0.187973 0.183938 -1.022 0.306852
## hr.f19:lg_temp:workingday.f1 -0.431143 0.182319 -2.365 0.018071 *
## hr.f20:lg_temp:workingday.f1 -0.273913 0.180401 -1.518 0.128975
## hr.f21:lg_temp:workingday.f1 -0.212521 0.182732 -1.163 0.244866
## hr.f22:lg_temp:workingday.f1 -0.027678 0.176888 -0.156 0.875666
## hr.f23:lg_temp:workingday.f1 -0.026368 0.170813 -0.154 0.877325
## hr.f1:lg_temp:season.fSummer -0.142476 0.275935 -0.516 0.605637
## hr.f2:lg_temp:season.fSummer -0.637802 0.263226 -2.423 0.015420 *
## hr.f3:lg_temp:season.fSummer -0.235465 0.268749 -0.876 0.380981
## hr.f4:lg_temp:season.fSummer -0.391888 0.282791 -1.386 0.165862
## hr.f5:lg_temp:season.fSummer -0.262559 0.266549 -0.985 0.324647
## hr.f6:lg_temp:season.fSummer -0.157071 0.258373 -0.608 0.543260
## hr.f7:lg_temp:season.fSummer 0.075900 0.257885 0.294 0.768525
## hr.f8:lg_temp:season.fSummer -0.274536 0.286916 -0.957 0.338680
## hr.f9:lg_temp:season.fSummer -0.161635 0.274626 -0.589 0.556175
## hr.f10:lg_temp:season.fSummer -0.483412 0.268832 -1.798 0.072194 .
## hr.f11:lg_temp:season.fSummer -0.717763 0.282930 -2.537 0.011208 *
## hr.f12:lg_temp:season.fSummer -0.644799 0.286327 -2.252 0.024359 *
## hr.f13:lg_temp:season.fSummer -0.969055 0.316436 -3.062 0.002205 **
## hr.f14:lg_temp:season.fSummer -0.684099 0.286924 -2.384 0.017144 *
## hr.f15:lg_temp:season.fSummer -0.670677 0.292795 -2.291 0.022020 *
## hr.f16:lg_temp:season.fSummer -0.355513 0.300866 -1.182 0.237395
## hr.f17:lg_temp:season.fSummer -0.375922 0.296961 -1.266 0.205597
## hr.f18:lg_temp:season.fSummer -0.174114 0.287889 -0.605 0.545337
## hr.f19:lg_temp:season.fSummer 0.171923 0.301667 0.570 0.568759
## hr.f20:lg_temp:season.fSummer 0.179098 0.292607 0.612 0.540510
## hr.f21:lg_temp:season.fSummer 0.199830 0.294715 0.678 0.497768
## hr.f22:lg_temp:season.fSummer 0.023362 0.281565 0.083 0.933878
## hr.f23:lg_temp:season.fSummer -0.209306 0.281064 -0.745 0.456486
## hr.f1:lg_temp:season.fFall -0.366280 0.638366 -0.574 0.566139
## hr.f2:lg_temp:season.fFall -0.140081 0.633988 -0.221 0.825137
## hr.f3:lg_temp:season.fFall -0.411336 0.626651 -0.656 0.511588
## hr.f4:lg_temp:season.fFall 0.644355 0.616070 1.046 0.295642
## hr.f5:lg_temp:season.fFall -0.861608 0.623324 -1.382 0.166935
## hr.f6:lg_temp:season.fFall -0.410943 0.633637 -0.649 0.516656
## hr.f7:lg_temp:season.fFall -0.411451 0.596207 -0.690 0.490148
## hr.f8:lg_temp:season.fFall -0.564782 0.618044 -0.914 0.360845
## hr.f9:lg_temp:season.fFall -0.435533 0.590300 -0.738 0.460654
## hr.f10:lg_temp:season.fFall -0.934171 0.639697 -1.460 0.144249
## hr.f11:lg_temp:season.fFall -1.169814 0.605338 -1.932 0.053343 .
## hr.f12:lg_temp:season.fFall -0.925475 0.624344 -1.482 0.138307
## hr.f13:lg_temp:season.fFall -0.240478 0.599054 -0.401 0.688118
## hr.f14:lg_temp:season.fFall -0.362682 0.609279 -0.595 0.551688
## hr.f15:lg_temp:season.fFall -0.948783 0.603819 -1.571 0.116163
## hr.f16:lg_temp:season.fFall -0.884953 0.576669 -1.535 0.124934
## hr.f17:lg_temp:season.fFall -0.502880 0.599878 -0.838 0.401892
## hr.f18:lg_temp:season.fFall -0.757977 0.580224 -1.306 0.191481
## hr.f19:lg_temp:season.fFall -0.161670 0.598097 -0.270 0.786933
## hr.f20:lg_temp:season.fFall 0.294859 0.636128 0.464 0.643007
## hr.f21:lg_temp:season.fFall -0.164910 0.629314 -0.262 0.793293
## hr.f22:lg_temp:season.fFall -0.245219 0.608577 -0.403 0.687008
## hr.f23:lg_temp:season.fFall -0.165293 0.621985 -0.266 0.790440
## hr.f1:lg_temp:season.fWinter -0.042274 0.293756 -0.144 0.885576
## hr.f2:lg_temp:season.fWinter -0.135257 0.291176 -0.465 0.642292
## hr.f3:lg_temp:season.fWinter 0.298732 0.279031 1.071 0.284389
## hr.f4:lg_temp:season.fWinter 0.296046 0.289791 1.022 0.307017
## hr.f5:lg_temp:season.fWinter 0.148366 0.286943 0.517 0.605134
## hr.f6:lg_temp:season.fWinter -0.003593 0.278967 -0.013 0.989723
## hr.f7:lg_temp:season.fWinter 0.045860 0.272403 0.168 0.866311
## hr.f8:lg_temp:season.fWinter -0.218740 0.298102 -0.734 0.463112
## hr.f9:lg_temp:season.fWinter -0.057390 0.299537 -0.192 0.848066
## hr.f10:lg_temp:season.fWinter -0.157159 0.315250 -0.499 0.618134
## hr.f11:lg_temp:season.fWinter -0.385751 0.342026 -1.128 0.259430
## hr.f12:lg_temp:season.fWinter -0.291798 0.328061 -0.889 0.373789
## hr.f13:lg_temp:season.fWinter -0.217109 0.342245 -0.634 0.525865
## hr.f14:lg_temp:season.fWinter -0.259841 0.336588 -0.772 0.440154
## hr.f15:lg_temp:season.fWinter 0.170420 0.326336 0.522 0.601533
## hr.f16:lg_temp:season.fWinter 0.204876 0.329475 0.622 0.534078
## hr.f17:lg_temp:season.fWinter 0.235525 0.314210 0.750 0.453536
## hr.f18:lg_temp:season.fWinter 0.097032 0.321844 0.301 0.763052
## hr.f19:lg_temp:season.fWinter 0.443724 0.308953 1.436 0.150990
## hr.f20:lg_temp:season.fWinter 0.198931 0.317424 0.627 0.530877
## hr.f21:lg_temp:season.fWinter 0.178255 0.308565 0.578 0.563494
## hr.f22:lg_temp:season.fWinter 0.132711 0.306606 0.433 0.665147
## hr.f23:lg_temp:season.fWinter -0.054760 0.294036 -0.186 0.852266
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4212 on 6230 degrees of freedom
## Multiple R-squared: 0.919, Adjusted R-squared: 0.9157
## F-statistic: 280.5 on 252 and 6230 DF, p-value: < 2.2e-16
Next, I generate predictions for the training dataset from the interactive log model and compare them with the actual values.
## # A tibble: 20 x 2
## cnt pred
## <int> <dbl>
## 1 40 17.520762
## 2 1 2.171260
## 3 1 1.725305
## 4 2 3.091957
## 5 94 131.559509
## 6 67 82.434962
## 7 35 65.062956
## 8 37 44.570102
## 9 39 23.138253
## 10 6 7.063183
## 11 93 108.167812
## 12 74 87.733195
## 13 22 47.521091
## 14 9 31.469255
## 15 5 9.737650
## 16 30 25.051612
## 17 88 125.419972
## 18 76 92.209247
## 19 110 110.204445
## 20 94 65.457335
## [1] 4743026
## [1] 6.589896
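A glance at the first 20 rows shows that the predictions broadly track the actual values, although some deviate noticeably (for example, 65.1 predicted against 35 observed in row 7). The mean squared error is reported as 6.59, but this figure needs care. Because several of the rows above already miss by 20 to 40 bikes, 6.59 is unlikely to be a mean squared error on the count scale. If the first printed value (4743026) is the sum of squared errors over the 6483 training rows, the count-scale MSE would be roughly 732 (an RMSE of about 27 bikes). In either case this is an in-sample measure: it shows how well the model fits the training data, not how well it predicts future hourly counts, so it does not by itself establish that the model is valid.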
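The comparison above can be sketched as follows, assuming the interactive log model was fitted with lm() on log(cnt) and its predictions were back-transformed with exp(). The simulated data frame sim and the helper count_errors() are hypothetical stand-ins for train11 and the original code. The sketch shows that one fit produces very different error magnitudes on the log and count scales, so an MSE is only interpretable once its scale is stated.

```r
# Hypothetical stand-in for train11: one covariate, counts that grow with x
set.seed(1)
n <- 500
sim <- data.frame(x = runif(n))
sim$cnt <- rpois(n, exp(2 + 1.5 * sim$x)) + 1  # +1 keeps log(cnt) finite

# Log-scale linear model, analogous in spirit to the interactive log model
fit_log <- lm(log(cnt) ~ x, data = sim)

# exp() of a log-scale prediction estimates a conditional median (if the
# log-scale errors are roughly normal), so it tends to sit below the mean
pred <- exp(predict(fit_log, newdata = sim))

# Hypothetical helper: squared-error metrics on the original count scale
count_errors <- function(actual, predicted) {
  err <- actual - predicted
  c(sse = sum(err^2), mse = mean(err^2), rmse = sqrt(mean(err^2)))
}

errs <- count_errors(sim$cnt, pred)
log_mse <- mean(residuals(fit_log)^2)  # mean squared residual, log scale

print(round(errs, 2))
print(round(log_mse, 4))
```

The count-scale MSE comes out far larger than the log-scale one for the same model, which is why the scale of any reported error should be stated alongside it.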
After researching online and closely examining the dataset, I decided that a non-linear model is also suitable here, because the response variable is a non-negative integer (count data). A Poisson regression seems reasonable because we are measuring the total number of rentals per hour, that is, a number of arrivals per unit of time.
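For reference, the Poisson regression fitted below with glm(..., family = poisson), which uses the log link by default, assumes the following for each hourly observation i with covariate vector x_i:

$$
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \qquad
\Pr(Y_i = k) = \frac{\mu_i^{k} e^{-\mu_i}}{k!}, \qquad
\mathrm{E}(Y_i) = \mathrm{Var}(Y_i) = \mu_i .
$$

The last identity, equality of mean and variance (equidispersion), is the assumption examined next.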
## [1] 144.3972
## [1] 17850.4
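The variance of the hourly rental count (17850.4) is more than 120 times its mean (144.4), whereas a Poisson distribution has a variance equal to its mean. This points to overdispersion, which occurs when the observed variance is much higher than the observed mean. Because these are marginal figures, computed before accounting for hour, month, weather and the other covariates, they are only a first hint; what matters for the model is whether the variance still exceeds the mean once the covariates are taken into account. To test for overdispersion, let's fit both a Poisson and a quasi-Poisson model to the same data.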
##
## Call:
## glm(formula = formula1, family = poisson, data = train11)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -19.5243 -3.3235 -0.6176 2.6411 21.0595
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.961642 0.012113 244.497 < 2e-16 ***
## hr.f1 -0.487373 0.014926 -32.652 < 2e-16 ***
## hr.f2 -0.883611 0.016884 -52.334 < 2e-16 ***
## hr.f3 -1.505289 0.022331 -67.408 < 2e-16 ***
## hr.f4 -2.089087 0.029326 -71.237 < 2e-16 ***
## hr.f5 -1.066271 0.018826 -56.638 < 2e-16 ***
## hr.f6 0.331216 0.012052 27.483 < 2e-16 ***
## hr.f7 1.332681 0.010086 132.136 < 2e-16 ***
## hr.f8 1.807462 0.009617 187.952 < 2e-16 ***
## hr.f9 1.303421 0.010079 129.318 < 2e-16 ***
## hr.f10 1.044882 0.010312 101.324 < 2e-16 ***
## hr.f11 1.191663 0.010184 117.011 < 2e-16 ***
## hr.f12 1.355565 0.010053 134.847 < 2e-16 ***
## hr.f13 1.359879 0.010062 135.151 < 2e-16 ***
## hr.f14 1.274582 0.010142 125.677 < 2e-16 ***
## hr.f15 1.284774 0.010195 126.020 < 2e-16 ***
## hr.f16 1.538777 0.009931 154.943 < 2e-16 ***
## hr.f17 1.955886 0.009623 203.258 < 2e-16 ***
## hr.f18 1.902731 0.009614 197.906 < 2e-16 ***
## hr.f19 1.595422 0.009759 163.488 < 2e-16 ***
## hr.f20 1.321851 0.010064 131.345 < 2e-16 ***
## hr.f21 1.092284 0.010241 106.661 < 2e-16 ***
## hr.f22 0.827568 0.010642 77.766 < 2e-16 ***
## hr.f23 0.461826 0.011434 40.391 < 2e-16 ***
## mnth.f2 0.199456 0.008061 24.743 < 2e-16 ***
## mnth.f3 0.322305 0.008307 38.799 < 2e-16 ***
## mnth.f4 0.530135 0.011356 46.684 < 2e-16 ***
## mnth.f5 0.786666 0.011803 66.647 < 2e-16 ***
## mnth.f6 0.699463 0.012226 57.210 < 2e-16 ***
## mnth.f7 0.526043 0.013198 39.859 < 2e-16 ***
## mnth.f8 0.607408 0.012715 47.773 < 2e-16 ***
## mnth.f9 0.690178 0.011694 59.018 < 2e-16 ***
## mnth.f10 0.608446 0.010831 56.174 < 2e-16 ***
## mnth.f11 0.466332 0.010624 43.893 < 2e-16 ***
## mnth.f12 0.444657 0.009086 48.939 < 2e-16 ***
## holiday.f1 -0.136887 0.007172 -19.085 < 2e-16 ***
## workingday.f1 -0.005971 0.002314 -2.580 0.00988 **
## weathersit.fCloudy & Misty -0.026932 0.002662 -10.118 < 2e-16 ***
## weathersit.fAdverse Weather -0.447284 0.005124 -87.299 < 2e-16 ***
## weathersit.fSevere Weather -0.686365 0.166881 -4.113 3.91e-05 ***
## season.fSummer 0.163607 0.007620 21.470 < 2e-16 ***
## season.fFall 0.228876 0.007993 28.635 < 2e-16 ***
## season.fWinter 0.357385 0.007335 48.722 < 2e-16 ***
## temp 0.014832 0.045870 0.323 0.74643
## atemp 0.879267 0.048703 18.054 < 2e-16 ***
## hum -0.399088 0.007716 -51.719 < 2e-16 ***
## windspeed -0.091023 0.009183 -9.912 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 786230 on 6482 degrees of freedom
## Residual deviance: 164223 on 6436 degrees of freedom
## AIC: 204137
##
## Number of Fisher Scoring iterations: 5
##
## Call:
## glm(formula = formula1, family = quasipoisson, data = train11)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -19.5243 -3.3235 -0.6176 2.6411 21.0595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.961642 0.060173 49.218 < 2e-16 ***
## hr.f1 -0.487373 0.074147 -6.573 5.31e-11 ***
## hr.f2 -0.883611 0.083873 -10.535 < 2e-16 ***
## hr.f3 -1.505289 0.110932 -13.570 < 2e-16 ***
## hr.f4 -2.089087 0.145680 -14.340 < 2e-16 ***
## hr.f5 -1.066271 0.093520 -11.402 < 2e-16 ***
## hr.f6 0.331216 0.059867 5.533 3.28e-08 ***
## hr.f7 1.332681 0.050101 26.600 < 2e-16 ***
## hr.f8 1.807462 0.047771 37.836 < 2e-16 ***
## hr.f9 1.303421 0.050069 26.032 < 2e-16 ***
## hr.f10 1.044882 0.051227 20.397 < 2e-16 ***
## hr.f11 1.191663 0.050591 23.555 < 2e-16 ***
## hr.f12 1.355565 0.049937 27.146 < 2e-16 ***
## hr.f13 1.359879 0.049984 27.207 < 2e-16 ***
## hr.f14 1.274582 0.050380 25.300 < 2e-16 ***
## hr.f15 1.284774 0.050644 25.369 < 2e-16 ***
## hr.f16 1.538777 0.049334 31.191 < 2e-16 ***
## hr.f17 1.955886 0.047801 40.917 < 2e-16 ***
## hr.f18 1.902731 0.047760 39.840 < 2e-16 ***
## hr.f19 1.595422 0.048477 32.911 < 2e-16 ***
## hr.f20 1.321851 0.049993 26.440 < 2e-16 ***
## hr.f21 1.092284 0.050871 21.471 < 2e-16 ***
## hr.f22 0.827568 0.052864 15.655 < 2e-16 ***
## hr.f23 0.461826 0.056798 8.131 5.07e-16 ***
## mnth.f2 0.199456 0.040045 4.981 6.50e-07 ***
## mnth.f3 0.322305 0.041266 7.810 6.61e-15 ***
## mnth.f4 0.530135 0.056411 9.398 < 2e-16 ***
## mnth.f5 0.786666 0.058635 13.416 < 2e-16 ***
## mnth.f6 0.699463 0.060734 11.517 < 2e-16 ***
## mnth.f7 0.526043 0.065560 8.024 1.21e-15 ***
## mnth.f8 0.607408 0.063161 9.617 < 2e-16 ***
## mnth.f9 0.690178 0.058092 11.881 < 2e-16 ***
## mnth.f10 0.608446 0.053806 11.308 < 2e-16 ***
## mnth.f11 0.466332 0.052777 8.836 < 2e-16 ***
## mnth.f12 0.444657 0.045135 9.852 < 2e-16 ***
## holiday.f1 -0.136887 0.035629 -3.842 0.000123 ***
## workingday.f1 -0.005971 0.011497 -0.519 0.603536
## weathersit.fCloudy & Misty -0.026932 0.013223 -2.037 0.041713 *
## weathersit.fAdverse Weather -0.447284 0.025452 -17.574 < 2e-16 ***
## weathersit.fSevere Weather -0.686365 0.828993 -0.828 0.407729
## season.fSummer 0.163607 0.037854 4.322 1.57e-05 ***
## season.fFall 0.228876 0.039705 5.764 8.58e-09 ***
## season.fWinter 0.357385 0.036438 9.808 < 2e-16 ***
## temp 0.014832 0.227863 0.065 0.948103
## atemp 0.879267 0.241936 3.634 0.000281 ***
## hum -0.399088 0.038332 -10.411 < 2e-16 ***
## windspeed -0.091023 0.045618 -1.995 0.046049 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 24.67681)
##
## Null deviance: 786230 on 6482 degrees of freedom
## Residual deviance: 164223 on 6436 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
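The output confirms overdispersion: the dispersion parameter, which is fixed at 1 in the Poisson model, is estimated at about 24.7 in the quasi-Poisson model. The coefficient estimates are identical in the two fits; only the standard errors change, each being multiplied by the square root of 24.68 (about 4.97). As a result, some terms that looked highly significant under the Poisson model, such as workingday.f1 and weathersit.fSevere Weather, are no longer significant. Given this degree of overdispersion, a negative binomial regression, whose variance mu + mu^2/theta can exceed the mean, is a natural next step.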
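The dispersion figure in the quasi-Poisson summary can be reproduced by hand. The sketch below uses simulated data (x, y_pois and y_nb are hypothetical, not the bike data) and a hypothetical helper, pearson_dispersion(), which divides the Pearson chi-square statistic by the residual degrees of freedom; this is the estimate summary() reports for family = quasipoisson. It also checks that the quasi-Poisson standard errors are the Poisson ones scaled by the square root of that estimate.

```r
set.seed(42)
n <- 2000
x <- runif(n)
mu <- exp(1 + 2 * x)                   # true mean on the count scale
y_pois <- rpois(n, mu)                 # equidispersed: Var = mu
y_nb <- rnbinom(n, mu = mu, size = 2)  # overdispersed: Var = mu + mu^2 / 2

# Pearson chi-square divided by the residual degrees of freedom
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

fit_p_pois <- glm(y_pois ~ x, family = poisson)
fit_p_nb   <- glm(y_nb ~ x, family = poisson)
fit_q_nb   <- glm(y_nb ~ x, family = quasipoisson)

disp_pois <- pearson_dispersion(fit_p_pois)
disp_nb   <- pearson_dispersion(fit_p_nb)

# Same coefficients; quasi-Poisson SEs = Poisson SEs * sqrt(dispersion)
se_ratio <- coef(summary(fit_q_nb))[, "Std. Error"] /
  coef(summary(fit_p_nb))[, "Std. Error"]

print(c(poisson_data = disp_pois, nb_data = disp_nb))
print(se_ratio)
```

On the equidispersed draws the estimate sits near 1, while on the negative binomial draws it lands well above 1, the same pattern as in the bike data.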
##
## Call:
## glm.nb(formula = formula1, data = train11, init.theta = 3.955918709,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.8413 -0.7292 -0.0680 0.5048 4.6242
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.95168 0.05353 55.136 < 2e-16 ***
## hr.f1 -0.48481 0.04616 -10.503 < 2e-16 ***
## hr.f2 -0.86861 0.04657 -18.653 < 2e-16 ***
## hr.f3 -1.48901 0.04999 -29.789 < 2e-16 ***
## hr.f4 -2.02125 0.05372 -37.626 < 2e-16 ***
## hr.f5 -0.99721 0.04801 -20.769 < 2e-16 ***
## hr.f6 0.41283 0.04521 9.132 < 2e-16 ***
## hr.f7 1.42285 0.04429 32.124 < 2e-16 ***
## hr.f8 1.94834 0.04416 44.118 < 2e-16 ***
## hr.f9 1.46087 0.04431 32.970 < 2e-16 ***
## hr.f10 1.10280 0.04426 24.915 < 2e-16 ***
## hr.f11 1.22616 0.04488 27.323 < 2e-16 ***
## hr.f12 1.40588 0.04515 31.135 < 2e-16 ***
## hr.f13 1.40405 0.04562 30.780 < 2e-16 ***
## hr.f14 1.32565 0.04544 29.176 < 2e-16 ***
## hr.f15 1.34238 0.04613 29.098 < 2e-16 ***
## hr.f16 1.58428 0.04556 34.775 < 2e-16 ***
## hr.f17 2.04214 0.04508 45.305 < 2e-16 ***
## hr.f18 1.97881 0.04483 44.139 < 2e-16 ***
## hr.f19 1.64800 0.04427 37.223 < 2e-16 ***
## hr.f20 1.36771 0.04491 30.456 < 2e-16 ***
## hr.f21 1.12405 0.04399 25.550 < 2e-16 ***
## hr.f22 0.87459 0.04396 19.895 < 2e-16 ***
## hr.f23 0.49897 0.04440 11.238 < 2e-16 ***
## mnth.f2 0.20277 0.03505 5.785 7.24e-09 ***
## mnth.f3 0.25294 0.03885 6.511 7.46e-11 ***
## mnth.f4 0.44504 0.05967 7.458 8.77e-14 ***
## mnth.f5 0.74809 0.06403 11.684 < 2e-16 ***
## mnth.f6 0.61994 0.06795 9.123 < 2e-16 ***
## mnth.f7 0.49269 0.07637 6.452 1.11e-10 ***
## mnth.f8 0.52598 0.07352 7.155 8.40e-13 ***
## mnth.f9 0.60646 0.06586 9.208 < 2e-16 ***
## mnth.f10 0.47290 0.05827 8.116 4.82e-16 ***
## mnth.f11 0.36379 0.05637 6.454 1.09e-10 ***
## mnth.f12 0.38096 0.04420 8.619 < 2e-16 ***
## holiday.f1 -0.22606 0.04151 -5.446 5.15e-08 ***
## workingday.f1 -0.18759 0.01457 -12.878 < 2e-16 ***
## weathersit.fCloudy & Misty -0.02660 0.01643 -1.619 0.105449
## weathersit.fAdverse Weather -0.49182 0.02692 -18.271 < 2e-16 ***
## weathersit.fSevere Weather -0.59877 0.53181 -1.126 0.260207
## season.fSummer 0.15658 0.04186 3.741 0.000183 ***
## season.fFall 0.22103 0.04834 4.572 4.83e-06 ***
## season.fWinter 0.41641 0.04119 10.109 < 2e-16 ***
## temp 0.12568 0.30733 0.409 0.682580
## atemp 1.01866 0.32461 3.138 0.001700 **
## hum -0.37074 0.04635 -7.998 1.26e-15 ***
## windspeed -0.15108 0.05994 -2.520 0.011722 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(3.9559) family taken to be 1)
##
## Null deviance: 31408.6 on 6482 degrees of freedom
## Residual deviance: 6916.9 on 6436 degrees of freedom
## AIC: 66927
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 3.9559
## Std. Err.: 0.0754
##
## 2 x log-likelihood: -66831.0750
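The negative binomial model fits far better than the Poisson model: its AIC is 66927, compared with 204137 for the Poisson fit. (Residual deviances should not be compared across families, but within the negative binomial fit the deviance falls from 31408.6 for the intercept-only null model to 6916.9, which is close to the 6436 residual degrees of freedom.) Next, I will build a new negative binomial model with interaction terms mirroring those of the interactive log model (hour by temperature by working day, and hour by temperature by season):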
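The AIC comparison can be reproduced on simulated overdispersed counts with MASS::glm.nb(), the function used above. The data (x, y) are hypothetical; the sketch only illustrates comparing the two families by AIC, which is computed from the full likelihood and is therefore comparable across them.

```r
library(MASS)  # glm.nb(), as used in the analysis above

set.seed(7)
n <- 2000
x <- runif(n)
# Hypothetical overdispersed counts: Var(Y) = mu + mu^2 / 4
y <- rnbinom(n, mu = exp(2 + 1.5 * x), size = 4)

fit_pois <- glm(y ~ x, family = poisson)
fit_nb   <- glm.nb(y ~ x)

# AIC uses the full likelihood, so it can compare the two families;
# for glm.nb it also counts theta as an estimated parameter
aic_tab <- c(poisson = AIC(fit_pois), negbin = AIC(fit_nb))
print(round(aic_tab, 1))
print(fit_nb$theta)  # estimated theta; the simulated value is 4
```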
##
## Call:
## glm.nb(formula = cnt ~ hr.f * temp * workingday.f + hr.f * temp *
## season.f + temp + mnth.f + hum + windspeed, data = train11,
## init.theta = 10.79018799, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.6408 -0.5869 -0.0127 0.4855 7.6183
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.234525 0.144000 22.462 < 2e-16 ***
## hr.f1 -0.347395 0.216383 -1.605 0.108392
## hr.f2 -0.543862 0.222008 -2.450 0.014296 *
## hr.f3 -0.806485 0.245096 -3.290 0.001000 **
## hr.f4 -1.953445 0.341688 -5.717 1.08e-08 ***
## hr.f5 -2.454827 0.281584 -8.718 < 2e-16 ***
## hr.f6 -1.751092 0.232172 -7.542 4.62e-14 ***
## hr.f7 -0.808791 0.205229 -3.941 8.12e-05 ***
## hr.f8 0.253261 0.204217 1.240 0.214918
## hr.f9 0.602675 0.201634 2.989 0.002799 **
## hr.f10 0.927773 0.196133 4.730 2.24e-06 ***
## hr.f11 1.010174 0.203592 4.962 6.99e-07 ***
## hr.f12 1.296911 0.201387 6.440 1.20e-10 ***
## hr.f13 1.165416 0.210081 5.547 2.90e-08 ***
## hr.f14 1.327834 0.201840 6.579 4.75e-11 ***
## hr.f15 1.249909 0.200624 6.230 4.66e-10 ***
## hr.f16 1.263467 0.202568 6.237 4.45e-10 ***
## hr.f17 1.220836 0.198479 6.151 7.70e-10 ***
## hr.f18 1.003953 0.196178 5.118 3.10e-07 ***
## hr.f19 0.826709 0.198115 4.173 3.01e-05 ***
## hr.f20 0.487677 0.211928 2.301 0.021384 *
## hr.f21 0.294547 0.203374 1.448 0.147531
## hr.f22 0.080870 0.202359 0.400 0.689425
## hr.f23 -0.119409 0.204410 -0.584 0.559111
## temp 2.404332 0.444450 5.410 6.31e-08 ***
## workingday.f1 -0.838300 0.124688 -6.723 1.78e-11 ***
## season.fSummer 0.052391 0.204583 0.256 0.797885
## season.fFall 1.060568 0.383392 2.766 0.005670 **
## season.fWinter 0.683308 0.209729 3.258 0.001122 **
## mnth.f2 0.135055 0.024803 5.445 5.18e-08 ***
## mnth.f3 0.183770 0.029480 6.234 4.55e-10 ***
## mnth.f4 0.485137 0.039891 12.162 < 2e-16 ***
## mnth.f5 0.728367 0.042241 17.243 < 2e-16 ***
## mnth.f6 0.557212 0.044830 12.430 < 2e-16 ***
## mnth.f7 0.476269 0.049313 9.658 < 2e-16 ***
## mnth.f8 0.466275 0.047609 9.794 < 2e-16 ***
## mnth.f9 0.447456 0.045556 9.822 < 2e-16 ***
## mnth.f10 0.398653 0.039580 10.072 < 2e-16 ***
## mnth.f11 0.297713 0.038525 7.728 1.09e-14 ***
## mnth.f12 0.328305 0.032432 10.123 < 2e-16 ***
## hum -0.747564 0.026711 -27.987 < 2e-16 ***
## windspeed -0.446801 0.036797 -12.142 < 2e-16 ***
## hr.f1:temp 0.111825 0.673224 0.166 0.868075
## hr.f2:temp 0.185791 0.746426 0.249 0.803432
## hr.f3:temp -1.799047 0.858148 -2.096 0.036044 *
## hr.f4:temp -1.316758 1.096259 -1.201 0.229698
## hr.f5:temp -0.201107 0.928272 -0.217 0.828484
## hr.f6:temp -0.236679 0.731513 -0.324 0.746281
## hr.f7:temp -0.379553 0.654047 -0.580 0.561703
## hr.f8:temp -0.574469 0.685468 -0.838 0.401992
## hr.f9:temp -0.321478 0.630245 -0.510 0.609993
## hr.f10:temp 0.164608 0.594267 0.277 0.781785
## hr.f11:temp 0.357694 0.604155 0.592 0.553813
## hr.f12:temp -0.122439 0.589396 -0.208 0.835435
## hr.f13:temp 0.252067 0.587558 0.429 0.667917
## hr.f14:temp -0.298886 0.577475 -0.518 0.604756
## hr.f15:temp -0.131445 0.562051 -0.234 0.815089
## hr.f16:temp -0.366307 0.575707 -0.636 0.524598
## hr.f17:temp -0.522655 0.567310 -0.921 0.356901
## hr.f18:temp -0.418664 0.574812 -0.728 0.466400
## hr.f19:temp -0.782685 0.588904 -1.329 0.183830
## hr.f20:temp -0.362281 0.620663 -0.584 0.559422
## hr.f21:temp -0.137911 0.619173 -0.223 0.823743
## hr.f22:temp 0.146134 0.617024 0.237 0.812783
## hr.f23:temp 0.214216 0.637604 0.336 0.736893
## hr.f1:workingday.f1 -0.325313 0.187119 -1.739 0.082117 .
## hr.f2:workingday.f1 -0.687554 0.194266 -3.539 0.000401 ***
## hr.f3:workingday.f1 -0.358921 0.222838 -1.611 0.107250
## hr.f4:workingday.f1 0.419159 0.256274 1.636 0.101926
## hr.f5:workingday.f1 1.945314 0.220463 8.824 < 2e-16 ***
## hr.f6:workingday.f1 2.977014 0.187445 15.882 < 2e-16 ***
## hr.f7:workingday.f1 2.999566 0.174246 17.215 < 2e-16 ***
## hr.f8:workingday.f1 2.904639 0.165167 17.586 < 2e-16 ***
## hr.f9:workingday.f1 2.237143 0.167240 13.377 < 2e-16 ***
## hr.f10:workingday.f1 0.743018 0.166549 4.461 8.15e-06 ***
## hr.f11:workingday.f1 0.526213 0.170053 3.094 0.001972 **
## hr.f12:workingday.f1 0.399288 0.169837 2.351 0.018723 *
## hr.f13:workingday.f1 0.502059 0.174853 2.871 0.004088 **
## hr.f14:workingday.f1 0.374639 0.169401 2.212 0.026997 *
## hr.f15:workingday.f1 0.405625 0.175485 2.311 0.020808 *
## hr.f16:workingday.f1 0.740821 0.171488 4.320 1.56e-05 ***
## hr.f17:workingday.f1 1.576515 0.169447 9.304 < 2e-16 ***
## hr.f18:workingday.f1 1.687312 0.169167 9.974 < 2e-16 ***
## hr.f19:workingday.f1 1.643839 0.169949 9.673 < 2e-16 ***
## hr.f20:workingday.f1 1.606472 0.176354 9.109 < 2e-16 ***
## hr.f21:workingday.f1 1.434903 0.170851 8.399 < 2e-16 ***
## hr.f22:workingday.f1 1.251914 0.171790 7.287 3.16e-13 ***
## hr.f23:workingday.f1 0.885177 0.172222 5.140 2.75e-07 ***
## temp:workingday.f1 -0.249861 0.241684 -1.034 0.301213
## hr.f1:season.fSummer 0.317319 0.308663 1.028 0.303930
## hr.f2:season.fSummer 0.470324 0.314996 1.493 0.135408
## hr.f3:season.fSummer -0.297656 0.365160 -0.815 0.414993
## hr.f4:season.fSummer 0.211142 0.444053 0.475 0.634439
## hr.f5:season.fSummer 0.430101 0.331190 1.299 0.194062
## hr.f6:season.fSummer 0.114944 0.281751 0.408 0.683300
## hr.f7:season.fSummer 0.116801 0.268835 0.434 0.663945
## hr.f8:season.fSummer -0.007301 0.275834 -0.026 0.978884
## hr.f9:season.fSummer -0.040854 0.269140 -0.152 0.879349
## hr.f10:season.fSummer 0.265597 0.265195 1.002 0.316578
## hr.f11:season.fSummer 0.461079 0.270765 1.703 0.088591 .
## hr.f12:season.fSummer 0.443921 0.274112 1.619 0.105343
## hr.f13:season.fSummer 0.768343 0.290457 2.645 0.008162 **
## hr.f14:season.fSummer 0.512371 0.274080 1.869 0.061564 .
## hr.f15:season.fSummer 0.601283 0.279081 2.155 0.031200 *
## hr.f16:season.fSummer 0.288349 0.282335 1.021 0.307111
## hr.f17:season.fSummer 0.285656 0.280969 1.017 0.309305
## hr.f18:season.fSummer 0.221351 0.271510 0.815 0.414924
## hr.f19:season.fSummer -0.030059 0.285119 -0.105 0.916039
## hr.f20:season.fSummer -0.270543 0.282254 -0.959 0.337806
## hr.f21:season.fSummer -0.220993 0.282947 -0.781 0.434779
## hr.f22:season.fSummer -0.128376 0.276603 -0.464 0.642563
## hr.f23:season.fSummer 0.273147 0.284005 0.962 0.336167
## hr.f1:season.fFall 0.137648 0.590748 0.233 0.815756
## hr.f2:season.fFall 0.266562 0.630900 0.423 0.672652
## hr.f3:season.fFall 0.062576 0.661348 0.095 0.924617
## hr.f4:season.fFall -0.945234 0.733153 -1.289 0.197303
## hr.f5:season.fFall 1.039369 0.596354 1.743 0.081356 .
## hr.f6:season.fFall 0.404026 0.550755 0.734 0.463202
## hr.f7:season.fFall 0.427568 0.513800 0.832 0.405314
## hr.f8:season.fFall 0.117710 0.516668 0.228 0.819782
## hr.f9:season.fFall -0.140387 0.500017 -0.281 0.778891
## hr.f10:season.fFall 0.525113 0.534669 0.982 0.326037
## hr.f11:season.fFall 0.713331 0.510283 1.398 0.162139
## hr.f12:season.fFall 0.643023 0.520797 1.235 0.216946
## hr.f13:season.fFall 0.142355 0.502883 0.283 0.777117
## hr.f14:season.fFall 0.641095 0.511158 1.254 0.209768
## hr.f15:season.fFall 0.953101 0.504535 1.889 0.058882 .
## hr.f16:season.fFall 0.609139 0.484226 1.258 0.208405
## hr.f17:season.fFall 0.285678 0.501238 0.570 0.568716
## hr.f18:season.fFall 0.541732 0.486147 1.114 0.265135
## hr.f19:season.fFall 0.098278 0.501291 0.196 0.844572
## hr.f20:season.fFall -0.452658 0.532394 -0.850 0.395196
## hr.f21:season.fFall 0.032158 0.526710 0.061 0.951317
## hr.f22:season.fFall 0.081629 0.516255 0.158 0.874364
## hr.f23:season.fFall 0.311802 0.530873 0.587 0.556977
## hr.f1:season.fWinter 0.104451 0.306550 0.341 0.733307
## hr.f2:season.fWinter 0.099897 0.322790 0.309 0.756957
## hr.f3:season.fWinter -0.628045 0.337230 -1.862 0.062552 .
## hr.f4:season.fWinter -0.262745 0.410650 -0.640 0.522286
## hr.f5:season.fWinter 0.438743 0.329060 1.333 0.182427
## hr.f6:season.fWinter 0.235670 0.280215 0.841 0.400329
## hr.f7:season.fWinter 0.321368 0.263481 1.220 0.222578
## hr.f8:season.fWinter 0.114831 0.271407 0.423 0.672225
## hr.f9:season.fWinter 0.005893 0.275802 0.021 0.982952
## hr.f10:season.fWinter 0.064308 0.284904 0.226 0.821420
## hr.f11:season.fWinter 0.268389 0.297560 0.902 0.367074
## hr.f12:season.fWinter 0.179002 0.291351 0.614 0.538961
## hr.f13:season.fWinter 0.053973 0.297113 0.182 0.855850
## hr.f14:season.fWinter 0.002737 0.294758 0.009 0.992591
## hr.f15:season.fWinter -0.216986 0.290879 -0.746 0.455687
## hr.f16:season.fWinter -0.228378 0.288796 -0.791 0.429064
## hr.f17:season.fWinter -0.407663 0.279909 -1.456 0.145278
## hr.f18:season.fWinter -0.222853 0.282375 -0.789 0.429990
## hr.f19:season.fWinter -0.447668 0.278049 -1.610 0.107391
## hr.f20:season.fWinter -0.249107 0.287224 -0.867 0.385783
## hr.f21:season.fWinter -0.139278 0.280772 -0.496 0.619856
## hr.f22:season.fWinter -0.202351 0.280815 -0.721 0.471165
## hr.f23:season.fWinter 0.019190 0.282439 0.068 0.945829
## temp:season.fSummer -0.134685 0.524599 -0.257 0.797381
## temp:season.fFall -1.635534 0.686621 -2.382 0.017219 *
## temp:season.fWinter -0.672058 0.579167 -1.160 0.245891
## hr.f1:temp:workingday.f1 -0.195383 0.368507 -0.530 0.595973
## hr.f2:temp:workingday.f1 -0.126522 0.382656 -0.331 0.740915
## hr.f3:temp:workingday.f1 -0.609561 0.438720 -1.389 0.164709
## hr.f4:temp:workingday.f1 0.416065 0.509294 0.817 0.413960
## hr.f5:temp:workingday.f1 -0.044430 0.453610 -0.098 0.921974
## hr.f6:temp:workingday.f1 -0.619696 0.376369 -1.647 0.099659 .
## hr.f7:temp:workingday.f1 -0.294822 0.343680 -0.858 0.390983
## hr.f8:temp:workingday.f1 -0.827457 0.323972 -2.554 0.010646 *
## hr.f9:temp:workingday.f1 -1.707480 0.326088 -5.236 1.64e-07 ***
## hr.f10:temp:workingday.f1 -0.760246 0.317001 -2.398 0.016474 *
## hr.f11:temp:workingday.f1 -0.425599 0.317519 -1.340 0.180119
## hr.f12:temp:workingday.f1 -0.050557 0.316326 -0.160 0.873019
## hr.f13:temp:workingday.f1 -0.314423 0.318987 -0.986 0.324284
## hr.f14:temp:workingday.f1 -0.165556 0.309417 -0.535 0.592609
## hr.f15:temp:workingday.f1 -0.102321 0.318280 -0.321 0.747846
## hr.f16:temp:workingday.f1 -0.021761 0.313496 -0.069 0.944661
## hr.f17:temp:workingday.f1 -0.239307 0.313086 -0.764 0.444660
## hr.f18:temp:workingday.f1 -0.411709 0.319052 -1.290 0.196906
## hr.f19:temp:workingday.f1 -0.641671 0.321386 -1.997 0.045872 *
## hr.f20:temp:workingday.f1 -0.697131 0.334658 -2.083 0.037241 *
## hr.f21:temp:workingday.f1 -0.571893 0.329387 -1.736 0.082522 .
## hr.f22:temp:workingday.f1 -0.316056 0.333171 -0.949 0.342809
## hr.f23:temp:workingday.f1 0.039174 0.339015 0.116 0.908007
## hr.f1:temp:season.fSummer -0.581861 0.795288 -0.732 0.464391
## hr.f2:temp:season.fSummer -1.011343 0.864672 -1.170 0.242151
## hr.f3:temp:season.fSummer 1.531419 1.017043 1.506 0.132130
## hr.f4:temp:season.fSummer -0.037001 1.234459 -0.030 0.976088
## hr.f5:temp:season.fSummer -0.417087 0.993899 -0.420 0.674743
## hr.f6:temp:season.fSummer 0.322193 0.795190 0.405 0.685347
## hr.f7:temp:season.fSummer 0.163798 0.733064 0.223 0.823190
## hr.f8:temp:season.fSummer 0.115861 0.766810 0.151 0.879901
## hr.f9:temp:season.fSummer 0.068361 0.713394 0.096 0.923660
## hr.f10:temp:season.fSummer -0.916039 0.679229 -1.349 0.177451
## hr.f11:temp:season.fSummer -1.341498 0.683323 -1.963 0.049623 *
## hr.f12:temp:season.fSummer -1.060638 0.678338 -1.564 0.117915
## hr.f13:temp:season.fSummer -1.676274 0.682962 -2.454 0.014111 *
## hr.f14:temp:season.fSummer -1.148471 0.661159 -1.737 0.082377 .
## hr.f15:temp:season.fSummer -1.344426 0.654032 -2.056 0.039821 *
## hr.f16:temp:season.fSummer -0.549422 0.670720 -0.819 0.412699
## hr.f17:temp:season.fSummer -0.401389 0.665517 -0.603 0.546426
## hr.f18:temp:season.fSummer -0.186584 0.663139 -0.281 0.778431
## hr.f19:temp:season.fSummer 0.754416 0.690005 1.093 0.274241
## hr.f20:temp:season.fSummer 0.922665 0.703966 1.311 0.189970
## hr.f21:temp:season.fSummer 0.680261 0.713512 0.953 0.340389
## hr.f22:temp:season.fSummer 0.258364 0.710912 0.363 0.716286
## hr.f23:temp:season.fSummer -0.672484 0.736295 -0.913 0.361066
## hr.f1:temp:season.fFall -0.250268 1.062386 -0.236 0.813765
## hr.f2:temp:season.fFall -0.741203 1.157574 -0.640 0.521973
## hr.f3:temp:season.fFall 1.171419 1.269347 0.923 0.356084
## hr.f4:temp:season.fFall 1.902958 1.471095 1.294 0.195816
## hr.f5:temp:season.fFall -1.321183 1.210882 -1.091 0.275232
## hr.f6:temp:season.fFall -0.093779 1.038339 -0.090 0.928035
## hr.f7:temp:season.fFall -0.305965 0.953350 -0.321 0.748258
## hr.f8:temp:season.fFall 0.010419 0.973434 0.011 0.991460
## hr.f9:temp:season.fFall 0.396880 0.909004 0.437 0.662394
## hr.f10:temp:season.fFall -1.026441 0.915605 -1.121 0.262266
## hr.f11:temp:season.fFall -1.389246 0.889946 -1.561 0.118513
## hr.f12:temp:season.fFall -1.133210 0.889998 -1.273 0.202921
## hr.f13:temp:season.fFall -0.624003 0.863211 -0.723 0.469750
## hr.f14:temp:season.fFall -0.975502 0.865730 -1.127 0.259828
## hr.f15:temp:season.fFall -1.450047 0.851617 -1.703 0.088625 .
## hr.f16:temp:season.fFall -0.770828 0.840803 -0.917 0.359260
## hr.f17:temp:season.fFall -0.210693 0.854370 -0.247 0.805213
## hr.f18:temp:season.fFall -0.342930 0.849322 -0.404 0.686382
## hr.f19:temp:season.fFall 0.740974 0.879705 0.842 0.399621
## hr.f20:temp:season.fFall 1.286106 0.928827 1.385 0.166157
## hr.f21:temp:season.fFall 0.468874 0.933794 0.502 0.615585
## hr.f22:temp:season.fFall -0.001077 0.922962 -0.001 0.999069
## hr.f23:temp:season.fFall -0.524609 0.961003 -0.546 0.585136
## hr.f1:temp:season.fWinter -0.361377 0.867892 -0.416 0.677128
## hr.f2:temp:season.fWinter -0.600436 0.947539 -0.634 0.526290
## hr.f3:temp:season.fWinter 2.246708 1.040462 2.159 0.030824 *
## hr.f4:temp:season.fWinter 1.195505 1.245791 0.960 0.337239
## hr.f5:temp:season.fWinter -0.435289 1.046770 -0.416 0.677527
## hr.f6:temp:season.fWinter -0.041082 0.849670 -0.048 0.961437
## hr.f7:temp:season.fWinter -0.096498 0.787759 -0.122 0.902506
## hr.f8:temp:season.fWinter -0.192347 0.819123 -0.235 0.814348
## hr.f9:temp:season.fWinter -0.033539 0.782570 -0.043 0.965815
## hr.f10:temp:season.fWinter -0.363427 0.772407 -0.471 0.637989
## hr.f11:temp:season.fWinter -0.974418 0.787358 -1.238 0.215872
## hr.f12:temp:season.fWinter -0.615315 0.760252 -0.809 0.418310
## hr.f13:temp:season.fWinter -0.500382 0.759439 -0.659 0.509970
## hr.f14:temp:season.fWinter -0.269946 0.750831 -0.360 0.719199
## hr.f15:temp:season.fWinter 0.077422 0.731741 0.106 0.915737
## hr.f16:temp:season.fWinter 0.436001 0.748241 0.583 0.560094
## hr.f17:temp:season.fWinter 0.823586 0.731882 1.125 0.260463
## hr.f18:temp:season.fWinter 0.651915 0.748287 0.871 0.383640
## hr.f19:temp:season.fWinter 1.402116 0.748953 1.872 0.061192 .
## hr.f20:temp:season.fWinter 0.776728 0.776630 1.000 0.317250
## hr.f21:temp:season.fWinter 0.448498 0.777169 0.577 0.563877
## hr.f22:temp:season.fWinter 0.284529 0.783051 0.363 0.716337
## hr.f23:temp:season.fWinter -0.180054 0.801823 -0.225 0.822324
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(10.7902) family taken to be 1)
##
## Null deviance: 76179.4 on 6482 degrees of freedom
## Residual deviance: 6858.9 on 6230 degrees of freedom
## AIC: 61583
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 10.790
## Std. Err.: 0.233
##
## 2 x log-likelihood: -61074.707
## Likelihood ratio tests of Negative Binomial Models
##
## Response: cnt
## Model
## 1 hr.f + mnth.f + holiday.f + workingday.f + weathersit.f + season.f + temp + atemp + hum + windspeed
## 2 hr.f * temp * workingday.f + hr.f * temp * season.f + temp + mnth.f + hum + windspeed
## theta Resid. df 2 x log-lik. Test df LR stat. Pr(Chi)
## 1 3.955919 6436 -66831.07
## 2 10.790188 6230 -61074.71 1 vs 2 206 5756.368 0
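According to the output, the interactive negative binomial model improves the fit further: the AIC drops from 66927 to 61583, and the residual deviance falls from 6916.9 to 6858.9 despite the extra terms. The likelihood ratio test from anova() confirms this: a statistic of 5756.4 on 206 degrees of freedom gives a p-value that is effectively zero, so the interactive model fits the data significantly better than the main-effects model.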
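The likelihood ratio statistic in the table above follows directly from the two printed "2 x log-likelihood" values. The short check below recomputes it and its p-value with base R's pchisq(), using only numbers copied from the output.

```r
# Values copied from the glm.nb summaries and the anova() table above
two_ll_main <- -66831.0750   # 2 x log-likelihood, main-effects model
two_ll_int  <- -61074.707    # 2 x log-likelihood, interactive model
df_diff     <- 6436 - 6230   # difference in residual degrees of freedom

# LR statistic = difference in 2 x log-likelihood
lr_stat <- two_ll_int - two_ll_main
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

aic_change <- 61583 - 66927  # AIC: interactive minus main-effects

print(c(lr_stat = lr_stat, df = df_diff, p_value = p_value,
        aic_change = aic_change))
```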
## List of 21
## $ call : language glm.nb(formula = cnt ~ hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:w| __truncated__ ...
## $ terms :Classes 'terms', 'formula' language cnt ~ hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + tem| __truncated__ ...
## .. ..- attr(*, "variables")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "factors")= int [1:8, 1:14] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
## .. .. .. ..$ : chr [1:14] "hr.f" "temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "term.labels")= chr [1:14] "hr.f" "temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "order")= int [1:14] 1 1 1 1 1 1 1 2 2 2 ...
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
## $ family :List of 12
## ..$ family : chr "Negative Binomial(10.7902)"
## ..$ link : chr "log"
## ..$ linkfun :function (mu)
## ..$ linkinv :function (eta)
## ..$ variance :function (mu)
## ..$ dev.resids:function (y, mu, wt)
## ..$ aic :function (y, n, mu, wt, dev)
## ..$ mu.eta :function (eta)
## ..$ initialize: expression({ if (any(y < 0)) stop("negative values not allowed for the negative binomial family") n <- rep(1, | __truncated__
## ..$ validmu :function (mu)
## ..$ valideta :function (eta)
## ..$ simulate :function (object, nsim)
## ..- attr(*, "class")= chr "family"
## $ deviance : num 6859
## $ aic : num 61583
## $ contrasts :List of 4
## ..$ hr.f : chr "contr.treatment"
## ..$ workingday.f: chr "contr.treatment"
## ..$ season.f : chr "contr.treatment"
## ..$ mnth.f : chr "contr.treatment"
## $ df.residual : int 6230
## $ null.deviance : num 76179
## $ df.null : int 6482
## $ iter : int 1
## $ deviance.resid: Named num [1:6483] 0.919 -0.78 -1.454 0.558 -0.344 ...
## ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
## $ coefficients : num [1:253, 1:4] 3.235 -0.347 -0.544 -0.806 -1.953 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "z value" "Pr(>|z|)"
## $ aliased : Named logi [1:253] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..- attr(*, "names")= chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ dispersion : num 1
## $ df : int [1:3] 253 6230 253
## $ cov.unscaled : num [1:253, 1:253] 0.0207 -0.0204 -0.0204 -0.0204 -0.0204 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ cov.scaled : num [1:253, 1:253] 0.0207 -0.0204 -0.0204 -0.0204 -0.0204 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:253] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ theta : num 10.8
## $ SE.theta : num 0.233
## $ twologlik : num -61075
## $ NA : NULL
## - attr(*, "class")= chr [1:2] "summary.negbin" "summary.glm"
## List of 21
## $ call : language glm.nb(formula = cnt ~ hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:w| __truncated__ ...
## $ terms :Classes 'terms', 'formula' language cnt ~ hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + tem| __truncated__ ...
## .. ..- attr(*, "variables")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "factors")= int [1:8, 1:13] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
## .. .. .. ..$ : chr [1:13] "hr.f" "temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "term.labels")= chr [1:13] "hr.f" "temp" "workingday.f" "season.f" ...
## .. ..- attr(*, "order")= int [1:13] 1 1 1 1 1 1 1 2 2 2 ...
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(cnt, hr.f, temp, workingday.f, season.f, mnth.f, hum, windspeed)
## .. ..- attr(*, "dataClasses")= Named chr [1:8] "numeric" "factor" "numeric" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:8] "cnt" "hr.f" "temp" "workingday.f" ...
## $ family :List of 12
## ..$ family : chr "Negative Binomial(10.5934)"
## ..$ link : chr "log"
## ..$ linkfun :function (mu)
## ..$ linkinv :function (eta)
## ..$ variance :function (mu)
## ..$ dev.resids:function (y, mu, wt)
## ..$ aic :function (y, n, mu, wt, dev)
## ..$ mu.eta :function (eta)
## ..$ initialize: expression({ if (any(y < 0)) stop("negative values not allowed for the negative binomial family") n <- rep(1, | __truncated__
## ..$ validmu :function (mu)
## ..$ valideta :function (eta)
## ..$ simulate :function (object, nsim)
## ..- attr(*, "class")= chr "family"
## $ deviance : num 6854
## $ aic : num 61537
## $ contrasts :List of 4
## ..$ hr.f : chr "contr.treatment"
## ..$ workingday.f: chr "contr.treatment"
## ..$ season.f : chr "contr.treatment"
## ..$ mnth.f : chr "contr.treatment"
## $ df.residual : int 6299
## $ null.deviance : num 75008
## $ df.null : int 6482
## $ iter : int 1
## $ deviance.resid: Named num [1:6483] 0.898 -0.75 -1.41 0.748 -0.367 ...
## ..- attr(*, "names")= chr [1:6483] "1" "2" "3" "4" ...
## $ coefficients : num [1:184, 1:4] 3.177 -0.233 -0.389 -1.11 -2.072 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:4] "Estimate" "Std. Error" "z value" "Pr(>|z|)"
## $ aliased : Named logi [1:184] FALSE FALSE FALSE FALSE FALSE FALSE ...
## ..- attr(*, "names")= chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ dispersion : num 1
## $ df : int [1:3] 184 6299 184
## $ cov.unscaled : num [1:184, 1:184] 0.0121 -0.0115 -0.0115 -0.0114 -0.0114 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ cov.scaled : num [1:184, 1:184] 0.0121 -0.0115 -0.0115 -0.0114 -0.0114 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## .. ..$ : chr [1:184] "(Intercept)" "hr.f1" "hr.f2" "hr.f3" ...
## $ theta : num 10.6
## $ SE.theta : num 0.228
## $ twologlik : num -61167
## $ NA : NULL
## - attr(*, "class")= chr [1:2] "summary.negbin" "summary.glm"
## Likelihood ratio tests of Negative Binomial Models
##
## Response: cnt
## Model
## 1 hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + temp:workingday.f + temp:season.f + hr.f:temp:workingday.f + hr.f:temp:season.f
## 2 hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + temp:workingday.f + temp:season.f + hr.f:temp:workingday.f + hr.f:temp:season.f
## 3 hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + temp:workingday.f + temp:season.f + hr.f:temp:workingday.f + hr.f:temp:season.f
## 4 hr.f * temp * workingday.f + hr.f * temp * season.f + temp + mnth.f + hum + windspeed
## 5 hr.f + temp + workingday.f + season.f + mnth.f + hum + windspeed + hr.f:temp + hr.f:workingday.f + temp:workingday.f + hr.f:season.f + temp:season.f + hr.f:temp:workingday.f + hr.f:temp:season.f
## theta Resid. df 2 x log-lik. Test df LR stat. Pr(Chi)
## 1 10.59339 6299 -61167.18
## 2 10.59339 6299 -61167.18 1 vs 2 0 -6.388291e-09 1.00000000
## 3 10.59339 6299 -61167.18 2 vs 3 0 0.000000e+00 1.00000000
## 4 10.79019 6230 -61074.71 3 vs 4 69 9.246979e+01 0.03121904
## 5 10.79019 6230 -61074.71 4 vs 5 0 -3.135210e-08 1.00000000
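The summary output shows that the trimmed interactive model, which drops the hr.f:season.f interaction, needs only 184 coefficients instead of 253, while its residual deviance stays essentially the same (6854 versus 6858.9). In the likelihood ratio table, models 1 to 3 have identical formulas and log-likelihoods, so they are the same fitted model. Dropping hr.f:season.f (model 3 versus model 4) gives an LR statistic of 92.5 on 69 degrees of freedom (p ≈ 0.031): the loss of fit is significant at the 5% level but small relative to the 69 parameters removed, and the AIC falls from 61583 to 61537. Therefore, I will choose glm_nb3 as the final interactive negative binomial model.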
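The trade-off between the full and trimmed interactive models can be checked from the printed summaries alone. In the output above, the AIC values are reproduced by AIC = -2 log L + 2(p + 1), where p is the number of coefficients and the extra 1 counts the estimated θ. A minimal sketch using only those printed numbers (the data frame and its column names are illustrative):

```r
# Printed summary values for the two interactive models
models <- data.frame(
  model     = c("full (keeps hr.f:season.f)", "trimmed (glm_nb3)"),
  twologlik = c(-61074.707, -61167.18),
  n_coef    = c(253, 184)
)

# AIC = -2 log L + 2 * (coefficients + 1 for theta)
models$aic <- -models$twologlik + 2 * (models$n_coef + 1)

# Likelihood ratio test for dropping the extra coefficients of the full model
lr_stat <- models$twologlik[1] - models$twologlik[2]
lr_df   <- models$n_coef[1] - models$n_coef[2]
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(models)
print(c(lr_stat = lr_stat, df = lr_df, p_value = p_value))
```

The rounded AIC values agree with the printed 61583 and 61537, and the p-value agrees with the `3 vs 4` row of the table, which is why the lower-AIC trimmed model is a defensible choice despite the borderline LR test.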
Now, let's generate predictions from both negative binomial models (pred and pred2 below) and compare them with the actual values of cnt.
## # A tibble: 20 x 3
## cnt pred pred2
## <int> <dbl> <dbl>
## 1 40 17.520762 17.578447
## 2 1 2.171260 2.581132
## 3 1 1.725305 2.133603
## 4 2 3.091957 4.014142
## 5 94 131.559509 138.340101
## 6 67 82.434962 95.385245
## 7 35 65.062956 74.375771
## 8 37 44.570102 55.398084
## 9 39 23.138253 33.663672
## 10 6 7.063183 8.576729
## 11 93 108.167812 112.747895
## 12 74 87.733195 98.343140
## 13 22 47.521091 47.250220
## 14 9 31.469255 30.822930
## 15 5 9.737650 10.556992
## 16 30 25.051612 26.735983
## 17 88 125.419972 112.300792
## 18 76 92.209247 92.028032
## 19 110 110.204445 99.493973
## 20 94 65.457335 66.866203
## [1] 4499429
## [1] 2081.142
## [1] 4743026
## [1] 2193.814
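Based on the SSE and MSE, I conclude that the interactive negative binomial model fits the data better: its SSE (4,499,429) and MSE (2,081.1, an RMSE of about 45.6 rentals per hour) are roughly 5% lower than those of the other model (4,743,026 and 2,193.8).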
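The comparison rests on SSE = Σ(cnt − pred)² and MSE = SSE / n. A minimal sketch of both metrics plus RMSE, using a hypothetical helper `error_metrics()` (not part of the original analysis) and the printed values. One side observation, inferred from the numbers rather than stated in the output: SSE / MSE equals 2,162 for both models, whereas the model summaries report 6,483 fitted rows (df.null + 1), so the errors appear to have been computed on a separate set of rows.

```r
# Hypothetical helper: SSE, MSE and RMSE from actual and predicted counts
error_metrics <- function(actual, predicted) {
  resid <- actual - predicted
  sse <- sum(resid^2)
  mse <- sse / length(resid)
  c(SSE = sse, MSE = mse, RMSE = sqrt(mse))
}

# Usage on the first five rows of the tibble above (cnt vs pred)
demo <- error_metrics(
  actual    = c(40, 1, 1, 2, 94),
  predicted = c(17.520762, 2.171260, 1.725305, 3.091957, 131.559509)
)
print(demo)

# Printed SSE and MSE for the two models, in the order they appear above
sse <- c(first = 4499429, second = 4743026)
mse <- c(first = 2081.142, second = 2193.814)

# Since MSE = SSE / n, the number of scored rows is SSE / MSE
n_scored <- round(sse / mse)
# RMSE is in the same units as cnt (rentals per hour)
rmse <- sqrt(mse)
# Relative reduction in SSE of the first model versus the second
rel_gain <- unname(1 - sse["first"] / sse["second"])

print(n_scored)
print(rmse)
print(rel_gain)
```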
Now, I will use the interactive negative binomial model to predict the cnt variable for the 2012 dataset:
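# get predicted hourly counts for the 2012 data;
# type = "response" applies the inverse log link, i.e. exp() of the linear predictor
pred_12 <- predict(glm_nbin, newdata = hour12_x, type = "response")
# add predictions to the 2012 data frame
hour12_x <- hour12_x %>%
  mutate(cnt_pred = pred_12)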
For the complete set of 2012 predictions, please see bike_sharing_predictions_2012.
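Because `glm.nb` uses a log link, `predict()` returns log-scale values by default, and exponentiating them gives the same numbers as requesting `type = "response"`. A small self-contained check on simulated data (the toy data frame, coefficients and seed are hypothetical stand-ins, not the Capital Bikeshare data):

```r
library(MASS)  # provides glm.nb

set.seed(2011)
# Simulated stand-in data with negative binomial counts
n <- 500
toy <- data.frame(
  temp = runif(n),
  workingday.f = factor(sample(0:1, n, replace = TRUE))
)
mu <- exp(1 + 2 * toy$temp + 0.3 * (toy$workingday.f == "1"))
toy$cnt <- rnbinom(n, size = 10, mu = mu)

fit <- glm.nb(cnt ~ temp + workingday.f, data = toy)

new_rows <- data.frame(
  temp = c(0.2, 0.8),
  workingday.f = factor(c("0", "1"), levels = c("0", "1"))
)

# Log-scale predictions back-transformed by hand
pred_link <- exp(predict(fit, newdata = new_rows))
# Count-scale predictions requested directly
pred_resp <- predict(fit, newdata = new_rows, type = "response")

print(all.equal(unname(pred_link), unname(pred_resp)))
```

Either form is fine for point predictions; `type = "response"` simply makes the intended scale explicit.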