Using R, build a simple linear model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?
The dataset is bicycle counts from the Fremont Bridge counters, published on Seattle’s open data portal. I chose Fremont because its data begins in October 2012, whereas the other counters were installed years later.
Link: https://data.seattle.gov/Transportation/Fremont-Bridge-Hourly-Bicycle-Counts-by-Month-Octo/65db-xm6k
I’ve formatted the data to one row per day and merged in daily weather data I scraped from WeatherUnderground.com, plus hours of sunlight per day as calculated by Jake VanderPlas.
This is part of a greater data science project I’m working on for DATA 602.
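In the summary further down, the Sunlight column runs from 8.219 to 15.781 hours. Those bounds sum to 24, which is what Seattle's winter and summer solstice day lengths should do. VanderPlas's Fremont bike-count analysis derives daylight from an astronomical approximation based on Earth's axial tilt and Seattle's latitude. The sketch below is my R port of that approximation. The function name, the 2000-12-21 solstice reference date, and the default parameters follow his Python version as I recall it. Whether this dataset's column was produced exactly this way is an assumption, though the port reproduces the same bounds.

```r
# Hours of daylight from an astronomical approximation.
# Assumed R port of VanderPlas's Python hours_of_daylight(); hypothetical, not project code.
# axis: Earth's axial tilt (degrees); latitude: Seattle's latitude (degrees).
hours_of_daylight <- function(date, axis = 23.44, latitude = 47.61) {
  deg2rad <- function(d) d * pi / 180
  # Days elapsed since a reference winter solstice
  days <- as.numeric(as.Date(date) - as.Date("2000-12-21"))
  m <- 1 - tan(deg2rad(latitude)) *
    tan(deg2rad(axis) * cos(days * 2 * pi / 365.25))
  # Clip to [0, 2] so acos() stays defined at polar latitudes, then convert to hours
  24 * acos(1 - pmin(pmax(m, 0), 2)) / pi
}

daylight_2013 <- hours_of_daylight(seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day"))
round(range(daylight_2013), 3)
```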
library(zoo)  # rollmeanr(): trailing rolling means
URL = "https://raw.githubusercontent.com/cspitmit03/data602-finalproject/master/FremontAndPredictors.csv"
data = read.csv(URL)
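# If TempAvg was read as a factor (read.csv() does this for text columns in R < 4.0),
# as.numeric() alone returns level codes, so convert through character first
data$TempAvg = as.numeric(as.character(data$TempAvg))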
weeklyData = rollmeanr(data[,2],7,fill=NA)
plot(weeklyData, type = "l", main = "Counts of bikes on Fremont, 7 day rolling average, 10/2012 - 10/2017")
How does year-over-year growth look?
YoYData = diff(data[,2], lag = 365)  # each day's count minus the count 365 days earlier
weeklyYoY = rollmeanr(YoYData,7,fill=NA)
hist(weeklyYoY,main = "Histogram, YoY change of rolling weekly average")
summary(weeklyYoY)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -1471.00 -254.20 58.14 70.46 376.40 2291.00 6
Fremont = data$Fremont
DeltaPercent = tail(Fremont, -365)/head(Fremont, length(Fremont) - 365)
weeklyDeltaP = rollmeanr(DeltaPercent, 7)  # no fill: drops the 6 incomplete windows
weeklyDeltaP = weeklyDeltaP*100 - 100      # ratio -> percent change
plot(weeklyDeltaP, type = "l", main = "YoY % Change, weekly rolling average, Mean = 23%")
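The percent-change calculation above divides each day's count by the count 365 days earlier, then takes a trailing 7-day mean of those ratios. The sketch below wraps that in a helper (the name yoy_pct_change is hypothetical). It uses base R's stats::filter() for the trailing mean so it runs without zoo, and checks the result on a synthetic series that grows by exactly 10% every 365 days. Because 365 = 52 × 7 + 1, a 365-day lag compares each weekday with the following weekday a year later. The 7-day smoothing should mostly average that mismatch away; a 364-day lag would align weekdays exactly.

```r
# Hypothetical helper: year-over-year percent change of a daily series,
# smoothed with a trailing k-day mean (the same window zoo::rollmeanr() uses)
yoy_pct_change <- function(x, lag = 365, k = 7) {
  n <- length(x)
  stopifnot(n >= lag + k)
  # Ratio of each day's value to the value `lag` days earlier
  ratio <- x[(lag + 1):n] / x[1:(n - lag)]
  # Trailing k-day mean; stats::filter() leaves the first k - 1 values NA
  smoothed <- as.numeric(stats::filter(ratio, rep(1 / k, k), sides = 1))
  # Drop the incomplete windows and express the ratio as a percent change
  smoothed[!is.na(smoothed)] * 100 - 100
}

# Synthetic check: a series that grows by exactly 10% every 365 days
x <- 100 * 1.1^((0:(3 * 365)) / 365)
pct <- yoy_pct_change(x)
summary(pct)
```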
Here are the predictors in the dataset:
summary(data)
## Date Fremont Friday Monday Saturday
## 2012-10-03: 1 Min. : 98 False:1590 False:1590 False:1590
## 2012-10-04: 1 1st Qu.:1774 True : 265 True : 265 True : 265
## 2012-10-05: 1 Median :2493
## 2012-10-06: 1 Mean :2675
## 2012-10-07: 1 3rd Qu.:3704
## 2012-10-08: 1 Max. :7314
## (Other) :1849
## Sunday Sunlight Thursday Tuesday Wednesday
## False:1590 Min. : 8.219 False:1590 False:1590 False:1590
## True : 265 1st Qu.: 9.497 True : 265 True : 265 True : 265
## Median :11.923
## Mean :11.977
## 3rd Qu.:14.503
## Max. :15.781
##
## isMay TempHi TempAvg TempLow
## False:1700 Min. :30.00 Min. : 1.00 Min. :18.00
## True : 155 1st Qu.:53.00 1st Qu.:24.00 1st Qu.:42.00
## Median :62.00 Median :31.00 Median :48.00
## Mean :62.69 Mean :31.42 Mean :48.54
## 3rd Qu.:73.00 3rd Qu.:40.00 3rd Qu.:57.00
## Max. :98.00 Max. :56.00 Max. :70.00
##
## DewHigh DewAvg DewLow HumidHi
## Min. : 6.00 Min. : 1.00 Min. :-4.00 Min. : 40.00
## 1st Qu.:43.00 1st Qu.:40.00 1st Qu.:35.00 1st Qu.: 80.00
## Median :49.00 Median :46.00 Median :42.00 Median : 87.00
## Mean :48.19 Mean :44.76 Mean :40.72 Mean : 86.16
## 3rd Qu.:54.00 3rd Qu.:51.00 3rd Qu.:48.00 3rd Qu.: 93.00
## Max. :77.00 Max. :63.00 Max. :61.00 Max. :100.00
##
## HumidAvg HumidLow PressHigh PressAvg
## Min. :24.00 Min. :11.00 Min. :29.41 Min. :29.19
## 1st Qu.:62.00 1st Qu.:40.00 1st Qu.:30.01 1st Qu.:29.93
## Median :72.00 Median :52.00 Median :30.12 Median :30.05
## Mean :70.63 Mean :52.09 Mean :30.13 Mean :30.04
## 3rd Qu.:80.00 3rd Qu.:64.00 3rd Qu.:30.24 3rd Qu.:30.16
## Max. :99.00 Max. :96.00 Max. :30.86 Max. :30.81
##
## PressLow VisHigh VisAvg VisLow
## Min. :28.94 Min. : 3.00 Min. : 1.000 Min. : 0.000
## 1st Qu.:29.84 1st Qu.:10.00 1st Qu.: 9.000 1st Qu.: 4.000
## Median :29.97 Median :10.00 Median :10.000 Median : 9.000
## Mean :29.95 Mean : 9.98 Mean : 9.353 Mean : 7.071
## 3rd Qu.:30.09 3rd Qu.:10.00 3rd Qu.:10.000 3rd Qu.:10.000
## Max. :30.75 Max. :10.00 Max. :10.000 Max. :10.000
##
## WindAvg WindLow WindHigh Precip
## Min. : 4 Min. : 0.000 Min. : 0.000 Min. :0.000
## 1st Qu.: 8 1st Qu.: 3.000 1st Qu.: 0.000 1st Qu.:0.000
## Median :10 Median : 4.000 Median : 0.000 Median :0.000
## Mean :11 Mean : 4.695 Mean : 9.323 Mean :0.105
## 3rd Qu.:13 3rd Qu.: 6.000 3rd Qu.:20.000 3rd Qu.:0.080
## Max. :30 Max. :23.000 Max. :52.000 Max. :2.490
##
## Events
## :930
## Rain :792
## Fog : 67
## Fog , Rain : 26
## Rain , Thunderstorm: 18
## Rain , Snow : 13
## (Other) : 9
People are more apt to bike from late spring through early fall, partly because of warmer temperatures but also because there is more daylight during peak commute times.
We’ll do a simple regression of counts on hours of sunlight per day, exclusive of local weather conditions.
plot(data$Sunlight, data$Fremont, xlab = "Hours of sunlight", ylab = "Daily bicycle count", main = "Hours of Sunlight vs Daily Bicycle Counts")
fit = lm(formula = Fremont ~ Sunlight, data = data[ ,-1])
print(summary(fit))
##
## Call:
## lm(formula = Fremont ~ Sunlight, data = data[, -1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2961.8 -843.6 111.1 825.3 3500.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1069.28 115.44 -9.263 <2e-16 ***
## Sunlight 312.65 9.42 33.189 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1052 on 1853 degrees of freedom
## Multiple R-squared: 0.3728, Adjusted R-squared: 0.3725
## F-statistic: 1102 on 1 and 1853 DF, p-value: < 2.2e-16
simple_preds = fit$coefficients[1] + fit$coefficients[2]*data$Sunlight
# RMSE compares predictions with the observed counts, not with their mean
RMSESimple = sqrt(mean((data$Fremont - simple_preds)^2))
plot(simple_preds, type = "l", main = "Fremont Bike count ~ Hours of Sun daily, Oct 2012 - Oct 2017")
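RMSE measures how far predictions fall from the observed counts, so the error term has to be actual minus predicted. Subtracting the mean count instead measures the spread of the fitted values, not prediction error. The sketch below checks the definition on simulated data (rmse, sim_fit and the data are hypothetical). It also relates RMSE to the residual standard error that summary() prints, which divides the residual sum of squares by n - p rather than n. For the sunlight-only model, a residual standard error of 1052 on 1853 degrees of freedom with n = 1855 implies an in-sample RMSE of about 1052 × sqrt(1853/1855) ≈ 1051.

```r
# Root mean squared error (hypothetical helper): predictions vs observed values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Simulated stand-in for the sunlight-only model (hypothetical data)
set.seed(3)
n <- 200
sunlight <- runif(n, 8.2, 15.8)
count <- -1000 + 300 * sunlight + rnorm(n, sd = 1000)
sim_fit <- lm(count ~ sunlight)

in_sample_rmse <- rmse(count, fitted(sim_fit))
# summary()'s residual standard error uses n - p (here p = 2) in the denominator
rse <- summary(sim_fit)$sigma
rse_to_rmse <- rse * sqrt((n - 2) / n)
c(rmse = in_sample_rmse, converted_rse = rse_to_rmse)
```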
Now, a regression with all available predictors. The adjusted R squared rises to about 0.85, from 0.37 for the sunlight-only model.
fitAll = lm(formula = Fremont ~ ., data = data[ ,-1])
print(summary(fitAll))
##
## Call:
## lm(formula = Fremont ~ ., data = data[, -1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2532.7 -266.5 19.4 297.7 3246.2
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -103.5166 2470.2044 -0.042 0.966578
## FridayTrue -468.5192 45.6316 -10.267 < 2e-16 ***
## MondayTrue -238.8359 45.4798 -5.251 1.69e-07 ***
## SaturdayTrue -1792.9771 45.4411 -39.457 < 2e-16 ***
## SundayTrue -1870.3564 45.4333 -41.167 < 2e-16 ***
## Sunlight 109.2160 8.8341 12.363 < 2e-16 ***
## ThursdayTrue -153.1263 45.4842 -3.367 0.000777 ***
## TuesdayTrue 1.9220 45.4141 0.042 0.966247
## WednesdayTrue NA NA NA NA
## isMayTrue 306.7764 48.0239 6.388 2.13e-10 ***
## TempHi 19.3731 8.4765 2.285 0.022398 *
## TempAvg 12.4042 13.8592 0.895 0.370898
## TempLow -35.2364 9.1078 -3.869 0.000113 ***
## DewHigh -8.3558 8.9742 -0.931 0.351931
## DewAvg 38.5168 15.1782 2.538 0.011243 *
## DewLow 11.1100 6.1320 1.812 0.070182 .
## HumidHi 10.4090 3.4184 3.045 0.002360 **
## HumidAvg -18.6429 5.5506 -3.359 0.000799 ***
## HumidLow -5.3913 2.9199 -1.846 0.064996 .
## PressHigh -874.7818 331.7241 -2.637 0.008434 **
## PressAvg 1629.0921 553.8027 2.942 0.003306 **
## PressLow -719.1377 288.4316 -2.493 0.012746 *
## VisHigh 27.6175 49.2578 0.561 0.575089
## VisAvg -3.7339 17.5413 -0.213 0.831460
## VisLow 18.8054 6.6590 2.824 0.004794 **
## WindAvg -5.1743 6.2551 -0.827 0.408219
## WindLow -13.6698 7.6021 -1.798 0.072317 .
## WindHigh 0.2824 1.7505 0.161 0.871864
## Precip -478.3503 70.3441 -6.800 1.41e-11 ***
## EventsFog 189.4520 88.4961 2.141 0.032423 *
## EventsFog , Rain -74.2264 117.4106 -0.632 0.527339
## EventsFog , Rain , Snow 358.4908 527.4743 0.680 0.496821
## EventsRain -298.9633 37.7543 -7.919 4.14e-15 ***
## EventsRain , Snow -264.6140 153.9009 -1.719 0.085716 .
## EventsRain , Thunderstorm -444.0541 128.1289 -3.466 0.000541 ***
## EventsSnow -151.9296 204.8444 -0.742 0.458375
## EventsThunderstorm -62.2274 522.9879 -0.119 0.905301
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 519.3 on 1819 degrees of freedom
## Multiple R-squared: 0.85, Adjusted R-squared: 0.8471
## F-statistic: 294.5 on 35 and 1819 DF, p-value: < 2.2e-16
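The WednesdayTrue row of NAs, and the note "1 not defined because of singularities", appear because the seven day-of-week indicators always sum to one. Together they duplicate the intercept column, so lm() reports NA for the indicator that is an exact linear combination of the columns before it. The simulated sketch below (hypothetical names and data) reproduces this. It also shows the usual alternative: a single day-of-week factor, which R encodes as one baseline level plus six contrasts. Both versions give identical fitted values; only the parameterization differs.

```r
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
n <- 70
day <- factor(rep(days, length.out = n), levels = days)
# Hypothetical counts: weekends much lower, as in the Fremont coefficients
count <- 3000 - 1800 * (day %in% c("Sat", "Sun")) + rnorm(n, sd = 100)

# One logical indicator column per day, like Friday ... Wednesday in the Fremont data
indicators <- sapply(days, function(d) day == d)
fit_dummies <- lm(count ~ ., data = data.frame(count, indicators))

# A single factor: "Mon" becomes the baseline and 6 day effects are estimated
fit_factor <- lm(count ~ day)

coef(fit_dummies)   # the last indicator, SunTRUE, is NA
coef(fit_factor)
```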
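And lastly, a stepwise regression in both directions. Note that step() adds and drops predictors by AIC; test = "F" only adds F statistics to the printed tables and does not drive the selection.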
KitchenSink = step(lm(formula = Fremont ~ ., data = data[ ,-1]), direction = "both", test = "F")
## Start: AIC=23232.6
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## Tuesday + Wednesday + isMay + TempHi + TempAvg + TempLow +
## DewHigh + DewAvg + DewLow + HumidHi + HumidAvg + HumidLow +
## PressHigh + PressAvg + PressLow + VisHigh + VisAvg + VisLow +
## WindAvg + WindLow + WindHigh + Precip + Events
##
##
## Step: AIC=23232.6
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## Tuesday + isMay + TempHi + TempAvg + TempLow + DewHigh +
## DewAvg + DewLow + HumidHi + HumidAvg + HumidLow + PressHigh +
## PressAvg + PressLow + VisHigh + VisAvg + VisLow + WindAvg +
## WindLow + WindHigh + Precip + Events
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - Tuesday 1 483 490601186 23231 0.0018 0.9662470
## - WindHigh 1 7018 490607721 23231 0.0260 0.8718636
## - VisAvg 1 12220 490612923 23231 0.0453 0.8314596
## - VisHigh 1 84784 490685487 23231 0.3144 0.5750893
## - WindAvg 1 184561 490785264 23231 0.6843 0.4082194
## - TempAvg 1 216050 490816753 23231 0.8010 0.3708982
## - DewHigh 1 233818 490834521 23232 0.8669 0.3519307
## <none> 490600703 23233
## - WindLow 1 872074 491472777 23234 3.2334 0.0723170 .
## - DewLow 1 885350 491486053 23234 3.2826 0.0701822 .
## - HumidLow 1 919494 491520196 23234 3.4092 0.0649960 .
## - TempHi 1 1408820 492009523 23236 5.2235 0.0223985 *
## - PressLow 1 1676618 492277320 23237 6.2164 0.0127455 *
## - DewAvg 1 1736819 492337521 23237 6.4396 0.0112432 *
## - PressHigh 1 1875603 492476306 23238 6.9542 0.0084336 **
## - VisLow 1 2150992 492751694 23239 7.9752 0.0047939 **
## - PressAvg 1 2333870 492934572 23239 8.6533 0.0033060 **
## - HumidHi 1 2500690 493101392 23240 9.2718 0.0023605 **
## - HumidAvg 1 3042567 493643270 23242 11.2809 0.0007992 ***
## - Thursday 1 3056855 493657557 23242 11.3339 0.0007769 ***
## - TempLow 1 4036954 494637657 23246 14.9678 0.0001132 ***
## - Monday 1 7438039 498038742 23259 27.5780 1.686e-07 ***
## - isMay 1 11005855 501606557 23272 40.8064 2.129e-10 ***
## - Precip 1 12471889 503072591 23277 46.2420 1.413e-11 ***
## - Events 8 21789412 512390115 23297 10.0986 7.162e-14 ***
## - Friday 1 28432721 519033423 23335 105.4200 < 2.2e-16 ***
## - Sunlight 1 41223703 531824406 23380 152.8451 < 2.2e-16 ***
## - Saturday 1 419902189 910502891 24378 1556.8712 < 2.2e-16 ***
## - Sunday 1 457083958 947684660 24452 1694.7300 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=23230.61
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow +
## HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow +
## VisHigh + VisAvg + VisLow + WindAvg + WindLow + WindHigh +
## Precip + Events
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - WindHigh 1 6944 490608130 23229 0.0258 0.8725048
## - VisAvg 1 12242 490613427 23229 0.0454 0.8312695
## - VisHigh 1 84806 490685992 23229 0.3146 0.5749350
## - WindAvg 1 184082 490785268 23229 0.6829 0.4086998
## - TempAvg 1 216389 490817574 23229 0.8027 0.3703925
## - DewHigh 1 233773 490834959 23230 0.8672 0.3518448
## <none> 490601186 23231
## - WindLow 1 871833 491473018 23232 3.2343 0.0722782 .
## - DewLow 1 885216 491486401 23232 3.2839 0.0701266 .
## - HumidLow 1 920479 491521665 23232 3.4147 0.0647791 .
## + Tuesday 1 483 490600703 23233 0.0018 0.9662470
## + Wednesday 1 483 490600703 23233 0.0018 0.9662470
## - TempHi 1 1408687 492009872 23234 5.2259 0.0223680 *
## - PressLow 1 1676208 492277393 23235 6.2183 0.0127319 *
## - DewAvg 1 1736548 492337734 23235 6.4421 0.0112272 *
## - PressHigh 1 1875122 492476308 23236 6.9562 0.0084240 **
## - VisLow 1 2151013 492752199 23237 7.9797 0.0047821 **
## - PressAvg 1 2333521 492934706 23237 8.6567 0.0032997 **
## - HumidHi 1 2501173 493102359 23238 9.2787 0.0023516 **
## - HumidAvg 1 3042114 493643299 23240 11.2854 0.0007973 ***
## - TempLow 1 4038222 494639408 23244 14.9807 0.0001124 ***
## - Thursday 1 4131088 494732274 23244 15.3252 9.383e-05 ***
## - Monday 1 10038435 500639621 23266 37.2399 1.273e-09 ***
## - isMay 1 11006466 501607651 23270 40.8311 2.103e-10 ***
## - Precip 1 12510779 503111964 23275 46.4117 1.298e-11 ***
## - Events 8 21792902 512394087 23295 10.1057 6.979e-14 ***
## - Friday 1 38283425 528884611 23368 142.0213 < 2.2e-16 ***
## - Sunlight 1 41223270 531824456 23378 152.9274 < 2.2e-16 ***
## - Saturday 1 561285202 1051886388 24643 2082.2189 < 2.2e-16 ***
## - Sunday 1 611320927 1101922113 24730 2267.8382 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=23228.63
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow +
## HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow +
## VisHigh + VisAvg + VisLow + WindAvg + WindLow + Precip +
## Events
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - VisAvg 1 12333 490620463 23227 0.0458 0.8306040
## - VisHigh 1 84949 490693079 23227 0.3153 0.5745106
## - WindAvg 1 205126 490813255 23227 0.7614 0.3830156
## - TempAvg 1 226767 490834897 23228 0.8417 0.3590333
## - DewHigh 1 234174 490842304 23228 0.8692 0.3513032
## <none> 490608130 23229
## - WindLow 1 869105 491477234 23230 3.2259 0.0726488 .
## - DewLow 1 885906 491494036 23230 3.2882 0.0699425 .
## - HumidLow 1 920628 491528757 23230 3.4171 0.0646858 .
## + WindHigh 1 6944 490601186 23231 0.0258 0.8725048
## + Tuesday 1 409 490607721 23231 0.0015 0.9689483
## + Wednesday 1 409 490607721 23231 0.0015 0.9689483
## - TempHi 1 1401772 492009901 23232 5.2030 0.0226632 *
## - PressLow 1 1684688 492292818 23233 6.2531 0.0124849 *
## - DewAvg 1 1731878 492340008 23233 6.4282 0.0113150 *
## - PressHigh 1 1868474 492476603 23234 6.9353 0.0085228 **
## - VisLow 1 2144421 492752551 23235 7.9595 0.0048355 **
## - PressAvg 1 2329858 492937988 23235 8.6478 0.0033159 **
## - HumidHi 1 2496265 493104395 23236 9.2654 0.0023686 **
## - HumidAvg 1 3037214 493645344 23238 11.2733 0.0008025 ***
## - TempLow 1 4067426 494675556 23242 15.0971 0.0001058 ***
## - Thursday 1 4124184 494732314 23242 15.3078 9.469e-05 ***
## - Monday 1 10043414 500651543 23264 37.2783 1.249e-09 ***
## - isMay 1 11038663 501646793 23268 40.9724 1.959e-10 ***
## - Precip 1 12504340 503112469 23273 46.4126 1.297e-11 ***
## - Events 8 21790723 512398853 23293 10.1101 6.868e-14 ***
## - Friday 1 38286083 528894213 23366 142.1072 < 2.2e-16 ***
## - Sunlight 1 41458813 532066943 23377 153.8835 < 2.2e-16 ***
## - Saturday 1 561600650 1052208780 24642 2084.5044 < 2.2e-16 ***
## - Sunday 1 611796226 1102404356 24728 2270.8163 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=23226.68
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow +
## HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow +
## VisHigh + VisLow + WindAvg + WindLow + Precip + Events
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - VisHigh 1 72628 490693090 23225 0.2697 0.6035866
## - WindAvg 1 205977 490826440 23226 0.7649 0.3819050
## - TempAvg 1 229988 490850451 23226 0.8541 0.3555183
## - DewHigh 1 233122 490853585 23226 0.8657 0.3522614
## <none> 490620463 23227
## - DewLow 1 874668 491495131 23228 3.2482 0.0716661 .
## - WindLow 1 882651 491503114 23228 3.2779 0.0703846 .
## - HumidLow 1 909669 491530132 23228 3.3782 0.0662266 .
## + VisAvg 1 12333 490608130 23229 0.0458 0.8306040
## + WindHigh 1 7036 490613427 23229 0.0261 0.8716402
## + Tuesday 1 428 490620035 23229 0.0016 0.9682199
## + Wednesday 1 428 490620035 23229 0.0016 0.9682199
## - TempHi 1 1423132 492043595 23230 5.2850 0.0216222 *
## - PressLow 1 1675430 492295893 23231 6.2220 0.0127053 *
## - DewAvg 1 1720380 492340843 23231 6.3889 0.0115675 *
## - PressHigh 1 1875165 492495628 23232 6.9637 0.0083887 **
## - PressAvg 1 2327391 492947854 23234 8.6431 0.0033243 **
## - HumidHi 1 2485043 493105506 23234 9.2286 0.0024165 **
## - VisLow 1 2680488 493300951 23235 9.9544 0.0016308 **
## - HumidAvg 1 3025275 493645737 23236 11.2349 0.0008192 ***
## - TempLow 1 4055459 494675922 23240 15.0606 0.0001078 ***
## - Thursday 1 4142199 494762662 23240 15.3827 9.104e-05 ***
## - Monday 1 10040025 500660488 23262 37.2853 1.244e-09 ***
## - isMay 1 11027367 501647830 23266 40.9519 1.979e-10 ***
## - Precip 1 13045756 503666219 23273 48.4476 4.716e-12 ***
## - Events 8 24947224 515567687 23303 11.5807 3.655e-16 ***
## - Friday 1 38292665 528913128 23364 142.2061 < 2.2e-16 ***
## - Sunlight 1 41455133 532075596 23375 153.9505 < 2.2e-16 ***
## - Saturday 1 561945980 1052566443 24641 2086.8791 < 2.2e-16 ***
## - Sunday 1 611886549 1102507012 24727 2272.3416 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=23224.95
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow +
## HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow +
## VisLow + WindAvg + WindLow + Precip + Events
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - WindAvg 1 204694 490897784 23224 0.7605 0.3832968
## - DewHigh 1 227654 490920745 23224 0.8458 0.3578731
## - TempAvg 1 235823 490928914 23224 0.8761 0.3493908
## <none> 490693090 23225
## - WindLow 1 874692 491567782 23226 3.2496 0.0716054 .
## - DewLow 1 902628 491595718 23226 3.3534 0.0672294 .
## - HumidLow 1 971242 491664333 23227 3.6083 0.0576491 .
## + VisHigh 1 72628 490620463 23227 0.2697 0.6035866
## + WindHigh 1 7089 490686001 23227 0.0263 0.8711313
## + Tuesday 1 429 490692662 23227 0.0016 0.9681828
## + Wednesday 1 429 490692662 23227 0.0016 0.9681828
## + VisAvg 1 12 490693079 23227 0.0000 0.9947352
## - TempHi 1 1404961 492098052 23228 5.2196 0.0224475 *
## - PressLow 1 1685441 492378531 23229 6.2617 0.0124248 *
## - DewAvg 1 1712185 492405276 23229 6.3610 0.0117499 *
## - PressHigh 1 1884559 492577649 23230 7.0014 0.0082145 **
## - PressAvg 1 2343425 493036515 23232 8.7062 0.0032118 **
## - HumidHi 1 2461979 493155069 23232 9.1466 0.0025266 **
## - VisLow 1 2725392 493418483 23233 10.1252 0.0014872 **
## - HumidAvg 1 2992666 493685756 23234 11.1182 0.0008720 ***
## - TempLow 1 4091264 494784355 23238 15.1997 0.0001002 ***
## - Thursday 1 4215969 494909059 23239 15.6630 7.860e-05 ***
## - Monday 1 10055161 500748252 23261 37.3565 1.200e-09 ***
## - isMay 1 11068857 501761947 23264 41.1225 1.817e-10 ***
## - Precip 1 12996400 503689491 23271 48.2836 5.115e-12 ***
## - Events 8 25015965 515709055 23301 11.6173 3.207e-16 ***
## - Friday 1 38255730 528948821 23362 142.1259 < 2.2e-16 ***
## - Sunlight 1 41382598 532075688 23373 153.7427 < 2.2e-16 ***
## - Saturday 1 561873680 1052566770 24639 2087.4468 < 2.2e-16 ***
## - Sunday 1 612855116 1103548207 24726 2276.8506 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=23223.73
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow +
## HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow +
## VisLow + WindLow + Precip + Events
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - TempAvg 1 226397 491124181 23223 0.8412 0.3591720
## - DewHigh 1 259057 491156841 23223 0.9626 0.3266720
## <none> 490897784 23224
## + WindAvg 1 204694 490693090 23225 0.7605 0.3832968
## - DewLow 1 926858 491824642 23225 3.4439 0.0636480 .
## - HumidLow 1 985863 491883647 23225 3.6631 0.0557859 .
## + VisHigh 1 71344 490826440 23226 0.2650 0.6067810
## + WindHigh 1 27657 490870128 23226 0.1027 0.7486364
## + VisAvg 1 61 490897723 23226 0.0002 0.9880014
## + Tuesday 1 3 490897781 23226 0.0000 0.9971698
## + Wednesday 1 3 490897781 23226 0.0000 0.9971698
## - TempHi 1 1390321 492288105 23227 5.1659 0.0231499 *
## - PressLow 1 1738062 492635846 23228 6.4580 0.0111275 *
## - DewAvg 1 1755009 492652793 23228 6.5210 0.0107416 *
## - PressHigh 1 2191477 493089261 23230 8.1427 0.0043722 **
## - HumidHi 1 2385859 493283643 23231 8.8650 0.0029452 **
## - PressAvg 1 2596916 493494701 23232 9.6492 0.0019235 **
## - VisLow 1 2813637 493711421 23232 10.4545 0.0012454 **
## - WindLow 1 2912957 493810741 23233 10.8235 0.0010213 **
## - HumidAvg 1 2931848 493829632 23233 10.8937 0.0009835 ***
## - TempLow 1 4062286 494960070 23237 15.0940 0.0001059 ***
## - Thursday 1 4190637 495088421 23238 15.5709 8.248e-05 ***
## - Monday 1 10030394 500928178 23259 37.2693 1.254e-09 ***
## - isMay 1 11165529 502063313 23263 41.4871 1.514e-10 ***
## - Precip 1 13336806 504234590 23272 49.5548 2.721e-12 ***
## - Events 8 25419382 516317166 23301 11.8062 < 2.2e-16 ***
## - Friday 1 38154980 529052764 23361 141.7702 < 2.2e-16 ***
## - Sunlight 1 42054310 532952094 23374 156.2587 < 2.2e-16 ***
## - Saturday 1 562202295 1053100079 24638 2088.9420 < 2.2e-16 ***
## - Sunday 1 612922164 1103819948 24725 2277.3988 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=23222.58
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## isMay + TempHi + TempLow + DewHigh + DewAvg + DewLow + HumidHi +
## HumidAvg + HumidLow + PressHigh + PressAvg + PressLow + VisLow +
## WindLow + Precip + Events
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## - DewHigh 1 261283 491385464 23222 0.9709 0.3245820
## <none> 491124181 23223
## + TempAvg 1 226397 490897784 23224 0.8412 0.3591720
## + WindAvg 1 195268 490928914 23224 0.7255 0.3944579
## - DewLow 1 904827 492029008 23224 3.3623 0.0668675 .
## + VisHigh 1 77040 491047141 23224 0.2862 0.5927520
## - HumidLow 1 1032712 492156893 23225 3.8375 0.0502696 .
## + WindHigh 1 14043 491110138 23225 0.0522 0.8193785
## + VisAvg 1 271 491123910 23225 0.0010 0.9746876
## + Tuesday 1 67 491124115 23225 0.0002 0.9874369
## + Wednesday 1 67 491124115 23225 0.0002 0.9874369
## - PressLow 1 1802332 492926513 23227 6.6974 0.0097320 **
## - DewAvg 1 1905094 493029276 23228 7.0793 0.0078664 **
## - PressHigh 1 2168960 493293141 23229 8.0598 0.0045760 **
## - HumidHi 1 2400926 493525107 23230 8.9218 0.0028555 **
## - PressAvg 1 2618173 493742354 23230 9.7290 0.0018421 **
## - VisLow 1 2800910 493925091 23231 10.4081 0.0012769 **
## - HumidAvg 1 3095947 494220129 23232 11.5044 0.0007091 ***
## - WindLow 1 3301445 494425627 23233 12.2681 0.0004719 ***
## - Thursday 1 4159347 495283528 23236 15.4560 8.760e-05 ***
## - TempHi 1 4729050 495853232 23238 17.5730 2.897e-05 ***
## - TempLow 1 5600956 496725138 23242 20.8130 5.401e-06 ***
## - Monday 1 10004022 501128204 23258 37.1746 1.315e-09 ***
## - isMay 1 11192439 502316620 23262 41.5907 1.437e-10 ***
## - Precip 1 13254130 504378311 23270 49.2519 3.162e-12 ***
## - Events 8 25293896 516418077 23300 11.7489 < 2.2e-16 ***
## - Friday 1 38096085 529220267 23359 141.5637 < 2.2e-16 ***
## - Sunlight 1 41925850 533050031 23373 155.7950 < 2.2e-16 ***
## - Saturday 1 561995723 1053119904 24636 2088.3561 < 2.2e-16 ***
## - Sunday 1 613663507 1104787688 24725 2280.3518 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: AIC=23221.57
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday +
## isMay + TempHi + TempLow + DewAvg + DewLow + HumidHi + HumidAvg +
## HumidLow + PressHigh + PressAvg + PressLow + VisLow + WindLow +
## Precip + Events
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 491385464 23222
## + DewHigh 1 261283 491124181 23223 0.9709 0.3245820
## + TempAvg 1 228623 491156841 23223 0.8495 0.3568161
## + WindAvg 1 226076 491159388 23223 0.8400 0.3595092
## + VisHigh 1 70931 491314534 23223 0.2635 0.6078058
## + WindHigh 1 18034 491367431 23224 0.0670 0.7958153
## + VisAvg 1 261 491385203 23224 0.0010 0.9751566
## + Tuesday 1 31 491385433 23224 0.0001 0.9914299
## + Wednesday 1 31 491385433 23224 0.0001 0.9914299
## - HumidLow 1 1226406 492611870 23224 4.5574 0.0329105 *
## - PressLow 1 1755022 493140486 23226 6.5217 0.0107372 *
## - DewLow 1 1815797 493201261 23226 6.7475 0.0094631 **
## - DewAvg 1 1970892 493356357 23227 7.3239 0.0068678 **
## - HumidHi 1 2174389 493559853 23228 8.0801 0.0045252 **
## - PressHigh 1 2226470 493611935 23228 8.2736 0.0040692 **
## - PressAvg 1 2623441 494008905 23229 9.7488 0.0018226 **
## - HumidAvg 1 2909279 494294743 23231 10.8109 0.0010282 **
## - VisLow 1 3043961 494429425 23231 11.3114 0.0007862 ***
## - WindLow 1 3500134 494885599 23233 13.0066 0.0003187 ***
## - Thursday 1 4138356 495523820 23235 15.3782 9.125e-05 ***
## - TempHi 1 4482921 495868385 23236 16.6586 4.668e-05 ***
## - TempLow 1 5610138 496995602 23241 20.8474 5.306e-06 ***
## - Monday 1 9932672 501318136 23257 36.9100 1.502e-09 ***
## - isMay 1 11128734 502514198 23261 41.3546 1.617e-10 ***
## - Precip 1 13430611 504816075 23270 49.9085 2.282e-12 ***
## - Events 8 27625742 519011206 23307 12.8322 < 2.2e-16 ***
## - Friday 1 37876238 529261702 23357 140.7490 < 2.2e-16 ***
## - Sunlight 1 42248291 533633756 23373 156.9956 < 2.2e-16 ***
## - Saturday 1 562066084 1053451548 24634 2088.6509 < 2.2e-16 ***
## - Sunday 1 613402330 1104787795 24723 2279.4176 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
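For a linear model, step() ranks candidates by the AIC that extractAIC() computes, n * log(RSS / n) + 2 * edf. The F value and Pr(>F) columns are informational. The starting value in the trace checks out: with n = 1855, RSS = 490,600,703 and 36 estimated coefficients, 1855 × ln(490,600,703 / 1855) + 2 × 36 ≈ 23,160.6 + 72 = 23,232.6. The simulated sketch below (hypothetical data and names) verifies the formula against extractAIC(). It also confirms that adding test = "F" does not change which terms are kept. Setting trace = 0 suppresses the long step-by-step printout shown above.

```r
set.seed(11)
n <- 300
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is unrelated to y
y <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n)
sim_fit <- lm(y ~ x1 + x2 + x3)

# AIC as step() computes it for lm: n * log(RSS / n) + 2 * edf
rss <- sum(resid(sim_fit)^2)
edf <- length(coef(sim_fit))                  # intercept + 3 slopes
aic_by_hand <- n * log(rss / n) + 2 * edf
aic_extract <- extractAIC(sim_fit)[2]         # extractAIC() returns c(edf, AIC)

# Selection is driven by AIC; test = "F" only adds columns to the printed tables
step_f    <- step(sim_fit, direction = "both", test = "F", trace = 0)
step_none <- step(sim_fit, direction = "both", trace = 0)
terms_f    <- attr(terms(step_f), "term.labels")
terms_none <- attr(terms(step_none), "term.labels")
```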
summary(KitchenSink)
##
## Call:
## lm(formula = Fremont ~ Friday + Monday + Saturday + Sunday +
## Sunlight + Thursday + isMay + TempHi + TempLow + DewAvg +
## DewLow + HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg +
## PressLow + VisLow + WindLow + Precip + Events, data = data[,
## -1])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2527.6 -265.4 24.6 294.8 3259.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -304.205 2410.311 -0.126 0.899580
## FridayTrue -466.034 39.282 -11.864 < 2e-16 ***
## MondayTrue -238.431 39.246 -6.075 1.50e-09 ***
## SaturdayTrue -1793.691 39.248 -45.702 < 2e-16 ***
## SundayTrue -1872.218 39.214 -47.743 < 2e-16 ***
## Sunlight 109.768 8.761 12.530 < 2e-16 ***
## ThursdayTrue -153.721 39.199 -3.922 9.12e-05 ***
## isMayTrue 307.946 47.886 6.431 1.62e-10 ***
## TempHi 23.632 5.790 4.081 4.67e-05 ***
## TempLow -29.405 6.440 -4.566 5.31e-06 ***
## DewAvg 29.998 11.085 2.706 0.006868 **
## DewLow 13.982 5.383 2.598 0.009463 **
## HumidHi 9.449 3.324 2.843 0.004525 **
## HumidAvg -17.904 5.445 -3.288 0.001028 **
## HumidLow -6.041 2.830 -2.135 0.032911 *
## PressHigh -934.871 325.015 -2.876 0.004069 **
## PressAvg 1708.496 547.191 3.122 0.001823 **
## PressLow -732.482 286.825 -2.554 0.010737 *
## VisLow 19.030 5.658 3.363 0.000786 ***
## WindLow -19.313 5.355 -3.606 0.000319 ***
## Precip -479.393 67.858 -7.065 2.28e-12 ***
## EventsFog 191.467 82.362 2.325 0.020197 *
## EventsFog , Rain -75.836 114.605 -0.662 0.508234
## EventsFog , Rain , Snow 346.743 526.619 0.658 0.510344
## EventsRain -307.410 36.223 -8.487 < 2e-16 ***
## EventsRain , Snow -269.686 153.385 -1.758 0.078876 .
## EventsRain , Thunderstorm -454.125 127.126 -3.572 0.000363 ***
## EventsSnow -173.836 203.546 -0.854 0.393197
## EventsThunderstorm -75.146 521.147 -0.144 0.885364
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 518.8 on 1826 degrees of freedom
## Multiple R-squared: 0.8498, Adjusted R-squared: 0.8475
## F-statistic: 368.9 on 28 and 1826 DF, p-value: < 2.2e-16
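A histogram of our residuals approximates the normal distribution, although the extremes (-2,528 and +3,260, roughly 4.9 and 6.3 residual standard errors from zero) point to heavier tails than a normal distribution would produce.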
residuals = resid(KitchenSink)
hist(residuals)
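A histogram checks only the shape of the residuals. A fuller residual analysis for a linear model usually adds four views:

- residuals against fitted values, for non-linearity and unequal variance;
- a normal Q-Q plot, for the tails;
- residuals over time, since these data were collected day by day;
- the autocorrelation of the residuals, because the standard errors in the lm() summaries assume independent errors.

The helper below (residual_diagnostics is a hypothetical name) draws those four panels with base R. It returns the lag-1 residual autocorrelation and a Shapiro-Wilk p-value. The demo uses simulated data with AR(1) errors, not the Fremont data.

```r
# Hypothetical helper: four diagnostic panels plus two summary numbers
residual_diagnostics <- function(fit, time_index = seq_along(resid(fit))) {
  r <- resid(fit)
  old_par <- par(mfrow = c(2, 2))
  on.exit(par(old_par))                        # restore the plotting layout
  plot(fitted(fit), r, xlab = "Fitted value", ylab = "Residual",
       main = "Residuals vs fitted")
  abline(h = 0, lty = 2)
  qqnorm(r, main = "Normal Q-Q plot of residuals")
  qqline(r)
  plot(time_index, r, type = "l", xlab = "Time", ylab = "Residual",
       main = "Residuals over time")
  abline(h = 0, lty = 2)
  lag1 <- acf(r, lag.max = 14, main = "Residual autocorrelation")$acf[2]
  invisible(list(lag1_autocorrelation = lag1,
                 shapiro_p = shapiro.test(r)$p.value))  # needs 3 <= n <= 5000
}

# Demo on simulated data whose errors follow an AR(1) process
set.seed(42)
n <- 500
sunlight <- runif(n, 8.2, 15.8)
errors <- as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 200))
count <- -1000 + 300 * sunlight + errors
out <- residual_diagnostics(lm(count ~ sunlight))
out$lag1_autocorrelation   # should land near the simulated 0.8
```

On this project it could be called as residual_diagnostics(KitchenSink), which I have not run. A strong lag-1 autocorrelation or a drift in the residuals-over-time panel would be consistent with the time trend noted below.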
KSPredictions = predict(KitchenSink, data)
KSPredsRolling = rollmeanr(KSPredictions,7,fill=NA)
plot(KSPredsRolling, type = "l", main = "Stepwise Prediction of Bicycle Counts")
While the model showed a high degree of fit (adjusted R squared of about 0.85), there is plenty of room for improvement. Specifically, the dataset exhibits a year-over-year trend that none of the predictors captures. A time series model that weights observations less the older they are would likely reduce prediction error further. The regression should also be trained and validated on different data, for example through k-fold cross-validation or, since the days are ordered in time, a train/test split by date.
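One way to act on both suggestions is sketched below on simulated data; all names and numbers are hypothetical. Because the observations are ordered in time, the split holds out the most recent days. Random k-fold cross-validation would arguably let the model train on days that come after the ones it is tested on. The sketch compares three fits on that holdout:

- a model without a trend;
- a model with a linear day-index term;
- a weighted least squares fit whose observation weights halve for every 365 days into the past.

The weighted fit is a simple stand-in for the "weight older data less" idea, not a full time series model.

```r
# Root mean squared error (hypothetical helper)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Simulated daily counts: seasonal sunlight effect plus a steady upward trend
set.seed(7)
n <- 1000
day <- seq_len(n)
sunlight <- 12 + 3.8 * sin(2 * pi * day / 365.25)
count <- 500 + 110 * sunlight + 1.2 * day + rnorm(n, sd = 300)
bikes <- data.frame(count, sunlight, day)

# Time-ordered split: fit on the first 800 days, evaluate on the last 200
train <- bikes[bikes$day <= 800, ]
test  <- bikes[bikes$day > 800, ]

fit_no_trend <- lm(count ~ sunlight, data = train)
fit_trend    <- lm(count ~ sunlight + day, data = train)

# Weighted least squares: an observation's weight halves every 365 days back
half_life <- 365
w <- 0.5^((max(train$day) - train$day) / half_life)
fit_weighted <- lm(count ~ sunlight, data = train, weights = w)

test_rmse <- c(
  no_trend = rmse(test$count, predict(fit_no_trend, newdata = test)),
  trend    = rmse(test$count, predict(fit_trend, newdata = test)),
  weighted = rmse(test$count, predict(fit_weighted, newdata = test))
)
round(test_rmse)
```

The 365-day half-life is an arbitrary choice here. Tuning it would need its own validation period, separate from the final test days.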