Discussion Prompt

Using R, build a simple linear model for data that interests you. Conduct residual analysis. Was the linear model appropriate? Why or why not?


The dataset is bicycle counts from the Fremont Bridge bike counters, published on Seattle’s data portal. I chose Fremont because its data begins in October 2012, whereas the other counters were installed years later.

Link: https://data.seattle.gov/Transportation/Fremont-Bridge-Hourly-Bicycle-Counts-by-Month-Octo/65db-xm6k

I’ve formatted the data and merged in weather data that I scraped from WeatherUnderground.com, along with hours of sunlight per day as calculated by Jake VanderPlas.

This is part of a larger data science project I’m working on for DATA 602.

library(zoo)  # provides rollmeanr()
## Warning: package 'zoo' was built under R version 3.3.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
URL = "https://raw.githubusercontent.com/cspitmit03/data602-finalproject/master/FremontAndPredictors.csv"

data = read.csv(URL)
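# Convert through character: as.numeric() on a factor returns level codes, not the recorded values
data$TempAvg = as.numeric(as.character(data$TempAvg))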

weeklyData = rollmeanr(data[,2],7,fill=NA)

plot(weeklyData, type = "l", main = "Counts of bikes on Fremont, 7 day rolling average, 10/2012 - 10/2017")
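
One caveat on converting a column with as.numeric(): if read.csv() imported the column as a factor (which R 3.x does by default for any column containing a non-numeric entry), the result is the internal level codes rather than the recorded values. The sketch below uses a small made-up vector, not the Fremont data, to show the difference.

```r
# Hypothetical temperature column that was read in as a factor
temp_factor <- factor(c("55", "61", "48", "61"))

# Direct conversion returns the level codes (levels sort as "48", "55", "61")
codes <- as.numeric(temp_factor)

# Converting through character recovers the recorded values
values <- as.numeric(as.character(temp_factor))

print(codes)   # 2 3 1 3
print(values)  # 55 61 48 61
```

In a summary of the predictors, an average temperature that falls below the daily low would be one symptom of this.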

How does the year-over-year growth look?

# Difference between each day's count and the count 365 days earlier
YoYData = diff(data[,2], lag = 365)
weeklyYoY = rollmeanr(YoYData,7,fill=NA)
hist(weeklyYoY,main = "Histogram, YoY change of rolling weekly average")

summary(weeklyYoY)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -1471.00  -254.20    58.14    70.46   376.40  2291.00        6
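
The six NA values come from the right-aligned 7-day window, which has no complete week until the seventh observation. A minimal base-R sketch on a made-up series shows both steps; stats::filter stands in for zoo's rollmeanr here, and the lag of 3 is only for illustration.

```r
# Made-up daily series; not the Fremont counts
x <- c(10, 12, 11, 15, 14, 13, 16, 18, 17, 20)

# Lagged difference: x[t] - x[t - 3]; the result is 3 elements shorter than x
lagged <- diff(x, lag = 3)

# Right-aligned 7-point moving average; the first 6 positions have no full window
rolling <- as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 1))

print(length(lagged))       # 7
print(sum(is.na(rolling)))  # 6
print(rolling[7])           # 13, the mean of the first seven values
```
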
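Fremont = data$Fremont
# Ratio of each day's count to the count 365 days earlier
DeltaPercent = tail(Fremont, -365)/head(Fremont, length(Fremont) - 365)
weeklyDeltaP = rollmeanr(DeltaPercent,7,fill=NA)
# Drop the incomplete leading window and convert the ratio to a percent change
weeklyDeltaP = (tail(weeklyDeltaP, -7))*100 - 100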

plot(weeklyDeltaP, type = "l", main = "YoY % Change, weekly rolling average, Mean = 23%")

Here are the predictors in the dataset:

summary(data)
##          Date         Fremont       Friday       Monday      Saturday   
##  2012-10-03:   1   Min.   :  98   False:1590   False:1590   False:1590  
##  2012-10-04:   1   1st Qu.:1774   True : 265   True : 265   True : 265  
##  2012-10-05:   1   Median :2493                                         
##  2012-10-06:   1   Mean   :2675                                         
##  2012-10-07:   1   3rd Qu.:3704                                         
##  2012-10-08:   1   Max.   :7314                                         
##  (Other)   :1849                                                        
##    Sunday        Sunlight       Thursday     Tuesday     Wednesday   
##  False:1590   Min.   : 8.219   False:1590   False:1590   False:1590  
##  True : 265   1st Qu.: 9.497   True : 265   True : 265   True : 265  
##               Median :11.923                                         
##               Mean   :11.977                                         
##               3rd Qu.:14.503                                         
##               Max.   :15.781                                         
##                                                                      
##    isMay          TempHi         TempAvg         TempLow     
##  False:1700   Min.   :30.00   Min.   : 1.00   Min.   :18.00  
##  True : 155   1st Qu.:53.00   1st Qu.:24.00   1st Qu.:42.00  
##               Median :62.00   Median :31.00   Median :48.00  
##               Mean   :62.69   Mean   :31.42   Mean   :48.54  
##               3rd Qu.:73.00   3rd Qu.:40.00   3rd Qu.:57.00  
##               Max.   :98.00   Max.   :56.00   Max.   :70.00  
##                                                              
##     DewHigh          DewAvg          DewLow         HumidHi      
##  Min.   : 6.00   Min.   : 1.00   Min.   :-4.00   Min.   : 40.00  
##  1st Qu.:43.00   1st Qu.:40.00   1st Qu.:35.00   1st Qu.: 80.00  
##  Median :49.00   Median :46.00   Median :42.00   Median : 87.00  
##  Mean   :48.19   Mean   :44.76   Mean   :40.72   Mean   : 86.16  
##  3rd Qu.:54.00   3rd Qu.:51.00   3rd Qu.:48.00   3rd Qu.: 93.00  
##  Max.   :77.00   Max.   :63.00   Max.   :61.00   Max.   :100.00  
##                                                                  
##     HumidAvg        HumidLow       PressHigh        PressAvg    
##  Min.   :24.00   Min.   :11.00   Min.   :29.41   Min.   :29.19  
##  1st Qu.:62.00   1st Qu.:40.00   1st Qu.:30.01   1st Qu.:29.93  
##  Median :72.00   Median :52.00   Median :30.12   Median :30.05  
##  Mean   :70.63   Mean   :52.09   Mean   :30.13   Mean   :30.04  
##  3rd Qu.:80.00   3rd Qu.:64.00   3rd Qu.:30.24   3rd Qu.:30.16  
##  Max.   :99.00   Max.   :96.00   Max.   :30.86   Max.   :30.81  
##                                                                 
##     PressLow        VisHigh          VisAvg           VisLow      
##  Min.   :28.94   Min.   : 3.00   Min.   : 1.000   Min.   : 0.000  
##  1st Qu.:29.84   1st Qu.:10.00   1st Qu.: 9.000   1st Qu.: 4.000  
##  Median :29.97   Median :10.00   Median :10.000   Median : 9.000  
##  Mean   :29.95   Mean   : 9.98   Mean   : 9.353   Mean   : 7.071  
##  3rd Qu.:30.09   3rd Qu.:10.00   3rd Qu.:10.000   3rd Qu.:10.000  
##  Max.   :30.75   Max.   :10.00   Max.   :10.000   Max.   :10.000  
##                                                                   
##     WindAvg      WindLow          WindHigh          Precip     
##  Min.   : 4   Min.   : 0.000   Min.   : 0.000   Min.   :0.000  
##  1st Qu.: 8   1st Qu.: 3.000   1st Qu.: 0.000   1st Qu.:0.000  
##  Median :10   Median : 4.000   Median : 0.000   Median :0.000  
##  Mean   :11   Mean   : 4.695   Mean   : 9.323   Mean   :0.105  
##  3rd Qu.:13   3rd Qu.: 6.000   3rd Qu.:20.000   3rd Qu.:0.080  
##  Max.   :30   Max.   :23.000   Max.   :52.000   Max.   :2.490  
##                                                                
##                  Events   
##                     :930  
##  Rain               :792  
##  Fog                : 67  
##  Fog , Rain         : 26  
##  Rain , Thunderstorm: 18  
##  Rain , Snow        : 13  
##  (Other)            :  9

Linear Regression

People are more apt to bike from late spring through early fall, partly because of higher temperatures but also because there is more sunlight during peak commute times.

We’ll start with a simple regression of daily counts on hours of sunlight per day, ignoring local weather conditions.

plot(data$Sunlight, data$Fremont, main = "Hours of Sunlight vs Daily Bicycle Counts")

fit = lm(formula = Fremont ~ Sunlight, data = data[ ,-1])
print(summary(fit))
## 
## Call:
## lm(formula = Fremont ~ Sunlight, data = data[, -1])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2961.8  -843.6   111.1   825.3  3500.5 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1069.28     115.44  -9.263   <2e-16 ***
## Sunlight      312.65       9.42  33.189   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1052 on 1853 degrees of freedom
## Multiple R-squared:  0.3728, Adjusted R-squared:  0.3725 
## F-statistic:  1102 on 1 and 1853 DF,  p-value: < 2.2e-16
simple_preds = fit$coefficients[1] + fit$coefficients[2]*data$Sunlight
# RMSE compares predictions with the observed counts, not with their mean
RMSESimple = (sum((simple_preds - data$Fremont)^2) / nrow(data))^.5
plot(simple_preds, type = "l", main = "Fremont Bike count ~ Hours of Sun daily, Oct 2012 - Oct 2017")
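
For reference, root mean squared error compares each prediction with the observed value for the same day. A small sketch with made-up numbers follows; the helper name rmse is illustrative and does not come from any package used here.

```r
# Illustrative helper: root mean squared error between observed and predicted
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

observed  <- c(2000, 2500, 3000, 3500)
predicted <- c(2100, 2400, 3200, 3300)

# Errors are -100, 100, -200, 200, so the mean squared error is 25000
print(rmse(observed, predicted))
```

For an lm() fit with an intercept, the in-sample RMSE is close to the residual standard error reported by summary(); the two differ only in the divisor (n versus the residual degrees of freedom).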

Regression with all predictors in the dataset

Now, a regression with all available predictors. Note the adjusted R-squared of 0.847, up from 0.373 for the sunlight-only model.

fitAll = lm(formula = Fremont ~ ., data = data[ ,-1])

print(summary(fitAll))
## 
## Call:
## lm(formula = Fremont ~ ., data = data[, -1])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2532.7  -266.5    19.4   297.7  3246.2 
## 
## Coefficients: (1 not defined because of singularities)
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -103.5166  2470.2044  -0.042 0.966578    
## FridayTrue                 -468.5192    45.6316 -10.267  < 2e-16 ***
## MondayTrue                 -238.8359    45.4798  -5.251 1.69e-07 ***
## SaturdayTrue              -1792.9771    45.4411 -39.457  < 2e-16 ***
## SundayTrue                -1870.3564    45.4333 -41.167  < 2e-16 ***
## Sunlight                    109.2160     8.8341  12.363  < 2e-16 ***
## ThursdayTrue               -153.1263    45.4842  -3.367 0.000777 ***
## TuesdayTrue                   1.9220    45.4141   0.042 0.966247    
## WednesdayTrue                     NA         NA      NA       NA    
## isMayTrue                   306.7764    48.0239   6.388 2.13e-10 ***
## TempHi                       19.3731     8.4765   2.285 0.022398 *  
## TempAvg                      12.4042    13.8592   0.895 0.370898    
## TempLow                     -35.2364     9.1078  -3.869 0.000113 ***
## DewHigh                      -8.3558     8.9742  -0.931 0.351931    
## DewAvg                       38.5168    15.1782   2.538 0.011243 *  
## DewLow                       11.1100     6.1320   1.812 0.070182 .  
## HumidHi                      10.4090     3.4184   3.045 0.002360 ** 
## HumidAvg                    -18.6429     5.5506  -3.359 0.000799 ***
## HumidLow                     -5.3913     2.9199  -1.846 0.064996 .  
## PressHigh                  -874.7818   331.7241  -2.637 0.008434 ** 
## PressAvg                   1629.0921   553.8027   2.942 0.003306 ** 
## PressLow                   -719.1377   288.4316  -2.493 0.012746 *  
## VisHigh                      27.6175    49.2578   0.561 0.575089    
## VisAvg                       -3.7339    17.5413  -0.213 0.831460    
## VisLow                       18.8054     6.6590   2.824 0.004794 ** 
## WindAvg                      -5.1743     6.2551  -0.827 0.408219    
## WindLow                     -13.6698     7.6021  -1.798 0.072317 .  
## WindHigh                      0.2824     1.7505   0.161 0.871864    
## Precip                     -478.3503    70.3441  -6.800 1.41e-11 ***
## EventsFog                   189.4520    88.4961   2.141 0.032423 *  
## EventsFog , Rain            -74.2264   117.4106  -0.632 0.527339    
## EventsFog , Rain , Snow     358.4908   527.4743   0.680 0.496821    
## EventsRain                 -298.9633    37.7543  -7.919 4.14e-15 ***
## EventsRain , Snow          -264.6140   153.9009  -1.719 0.085716 .  
## EventsRain , Thunderstorm  -444.0541   128.1289  -3.466 0.000541 ***
## EventsSnow                 -151.9296   204.8444  -0.742 0.458375    
## EventsThunderstorm          -62.2274   522.9879  -0.119 0.905301    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 519.3 on 1819 degrees of freedom
## Multiple R-squared:   0.85,  Adjusted R-squared:  0.8471 
## F-statistic: 294.5 on 35 and 1819 DF,  p-value: < 2.2e-16
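
The NA for Wednesday looks like the usual dummy-variable trap: with an intercept plus one indicator for every day of the week, the indicators sum to one on every row, so one of them is redundant. A small sketch on simulated data (all names and numbers below are made up) reproduces the effect.

```r
set.seed(1)

# Simulated four weeks of daily data; names and values are hypothetical
day <- rep(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), times = 4)
counts <- 2500 - 1800 * (day %in% c("Sat", "Sun")) + rnorm(length(day), sd = 100)

# One TRUE/FALSE indicator column per day, mirroring the layout of the dataset
dummies <- as.data.frame(sapply(unique(day), function(d) day == d))
sim <- cbind(counts = counts, dummies)

# Intercept + seven indicators is rank deficient, so lm() reports one NA
fit_dummies <- lm(counts ~ ., data = sim)
print(coef(fit_dummies))

# A single factor column lets R pick a reference level automatically
fit_factor <- lm(counts ~ factor(day))
print(length(coef(fit_factor)))  # intercept + 6 contrasts
```

Dropping one indicator by hand, or storing the weekday as a single factor, gives the same fit without the undefined coefficient.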

Stepwise Regression Model

Lastly, a stepwise regression model. Note that step() selects predictors by AIC; the test = "F" argument only adds F statistics and p-values to the printed tables.

KitchenSink = step(lm(formula = Fremont ~ ., data = data[ ,-1]), direction = "both", test = "F")
## Start:  AIC=23232.6
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     Tuesday + Wednesday + isMay + TempHi + TempAvg + TempLow + 
##     DewHigh + DewAvg + DewLow + HumidHi + HumidAvg + HumidLow + 
##     PressHigh + PressAvg + PressLow + VisHigh + VisAvg + VisLow + 
##     WindAvg + WindLow + WindHigh + Precip + Events
## 
## 
## Step:  AIC=23232.6
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     Tuesday + isMay + TempHi + TempAvg + TempLow + DewHigh + 
##     DewAvg + DewLow + HumidHi + HumidAvg + HumidLow + PressHigh + 
##     PressAvg + PressLow + VisHigh + VisAvg + VisLow + WindAvg + 
##     WindLow + WindHigh + Precip + Events
## 
##             Df Sum of Sq       RSS   AIC   F value    Pr(>F)    
## - Tuesday    1       483 490601186 23231    0.0018 0.9662470    
## - WindHigh   1      7018 490607721 23231    0.0260 0.8718636    
## - VisAvg     1     12220 490612923 23231    0.0453 0.8314596    
## - VisHigh    1     84784 490685487 23231    0.3144 0.5750893    
## - WindAvg    1    184561 490785264 23231    0.6843 0.4082194    
## - TempAvg    1    216050 490816753 23231    0.8010 0.3708982    
## - DewHigh    1    233818 490834521 23232    0.8669 0.3519307    
## <none>                   490600703 23233                        
## - WindLow    1    872074 491472777 23234    3.2334 0.0723170 .  
## - DewLow     1    885350 491486053 23234    3.2826 0.0701822 .  
## - HumidLow   1    919494 491520196 23234    3.4092 0.0649960 .  
## - TempHi     1   1408820 492009523 23236    5.2235 0.0223985 *  
## - PressLow   1   1676618 492277320 23237    6.2164 0.0127455 *  
## - DewAvg     1   1736819 492337521 23237    6.4396 0.0112432 *  
## - PressHigh  1   1875603 492476306 23238    6.9542 0.0084336 ** 
## - VisLow     1   2150992 492751694 23239    7.9752 0.0047939 ** 
## - PressAvg   1   2333870 492934572 23239    8.6533 0.0033060 ** 
## - HumidHi    1   2500690 493101392 23240    9.2718 0.0023605 ** 
## - HumidAvg   1   3042567 493643270 23242   11.2809 0.0007992 ***
## - Thursday   1   3056855 493657557 23242   11.3339 0.0007769 ***
## - TempLow    1   4036954 494637657 23246   14.9678 0.0001132 ***
## - Monday     1   7438039 498038742 23259   27.5780 1.686e-07 ***
## - isMay      1  11005855 501606557 23272   40.8064 2.129e-10 ***
## - Precip     1  12471889 503072591 23277   46.2420 1.413e-11 ***
## - Events     8  21789412 512390115 23297   10.0986 7.162e-14 ***
## - Friday     1  28432721 519033423 23335  105.4200 < 2.2e-16 ***
## - Sunlight   1  41223703 531824406 23380  152.8451 < 2.2e-16 ***
## - Saturday   1 419902189 910502891 24378 1556.8712 < 2.2e-16 ***
## - Sunday     1 457083958 947684660 24452 1694.7300 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=23230.61
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow + 
##     HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow + 
##     VisHigh + VisAvg + VisLow + WindAvg + WindLow + WindHigh + 
##     Precip + Events
## 
##             Df Sum of Sq        RSS   AIC   F value    Pr(>F)    
## - WindHigh   1      6944  490608130 23229    0.0258 0.8725048    
## - VisAvg     1     12242  490613427 23229    0.0454 0.8312695    
## - VisHigh    1     84806  490685992 23229    0.3146 0.5749350    
## - WindAvg    1    184082  490785268 23229    0.6829 0.4086998    
## - TempAvg    1    216389  490817574 23229    0.8027 0.3703925    
## - DewHigh    1    233773  490834959 23230    0.8672 0.3518448    
## <none>                    490601186 23231                        
## - WindLow    1    871833  491473018 23232    3.2343 0.0722782 .  
## - DewLow     1    885216  491486401 23232    3.2839 0.0701266 .  
## - HumidLow   1    920479  491521665 23232    3.4147 0.0647791 .  
## + Tuesday    1       483  490600703 23233    0.0018 0.9662470    
## + Wednesday  1       483  490600703 23233    0.0018 0.9662470    
## - TempHi     1   1408687  492009872 23234    5.2259 0.0223680 *  
## - PressLow   1   1676208  492277393 23235    6.2183 0.0127319 *  
## - DewAvg     1   1736548  492337734 23235    6.4421 0.0112272 *  
## - PressHigh  1   1875122  492476308 23236    6.9562 0.0084240 ** 
## - VisLow     1   2151013  492752199 23237    7.9797 0.0047821 ** 
## - PressAvg   1   2333521  492934706 23237    8.6567 0.0032997 ** 
## - HumidHi    1   2501173  493102359 23238    9.2787 0.0023516 ** 
## - HumidAvg   1   3042114  493643299 23240   11.2854 0.0007973 ***
## - TempLow    1   4038222  494639408 23244   14.9807 0.0001124 ***
## - Thursday   1   4131088  494732274 23244   15.3252 9.383e-05 ***
## - Monday     1  10038435  500639621 23266   37.2399 1.273e-09 ***
## - isMay      1  11006466  501607651 23270   40.8311 2.103e-10 ***
## - Precip     1  12510779  503111964 23275   46.4117 1.298e-11 ***
## - Events     8  21792902  512394087 23295   10.1057 6.979e-14 ***
## - Friday     1  38283425  528884611 23368  142.0213 < 2.2e-16 ***
## - Sunlight   1  41223270  531824456 23378  152.9274 < 2.2e-16 ***
## - Saturday   1 561285202 1051886388 24643 2082.2189 < 2.2e-16 ***
## - Sunday     1 611320927 1101922113 24730 2267.8382 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=23228.63
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow + 
##     HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow + 
##     VisHigh + VisAvg + VisLow + WindAvg + WindLow + Precip + 
##     Events
## 
##             Df Sum of Sq        RSS   AIC   F value    Pr(>F)    
## - VisAvg     1     12333  490620463 23227    0.0458 0.8306040    
## - VisHigh    1     84949  490693079 23227    0.3153 0.5745106    
## - WindAvg    1    205126  490813255 23227    0.7614 0.3830156    
## - TempAvg    1    226767  490834897 23228    0.8417 0.3590333    
## - DewHigh    1    234174  490842304 23228    0.8692 0.3513032    
## <none>                    490608130 23229                        
## - WindLow    1    869105  491477234 23230    3.2259 0.0726488 .  
## - DewLow     1    885906  491494036 23230    3.2882 0.0699425 .  
## - HumidLow   1    920628  491528757 23230    3.4171 0.0646858 .  
## + WindHigh   1      6944  490601186 23231    0.0258 0.8725048    
## + Tuesday    1       409  490607721 23231    0.0015 0.9689483    
## + Wednesday  1       409  490607721 23231    0.0015 0.9689483    
## - TempHi     1   1401772  492009901 23232    5.2030 0.0226632 *  
## - PressLow   1   1684688  492292818 23233    6.2531 0.0124849 *  
## - DewAvg     1   1731878  492340008 23233    6.4282 0.0113150 *  
## - PressHigh  1   1868474  492476603 23234    6.9353 0.0085228 ** 
## - VisLow     1   2144421  492752551 23235    7.9595 0.0048355 ** 
## - PressAvg   1   2329858  492937988 23235    8.6478 0.0033159 ** 
## - HumidHi    1   2496265  493104395 23236    9.2654 0.0023686 ** 
## - HumidAvg   1   3037214  493645344 23238   11.2733 0.0008025 ***
## - TempLow    1   4067426  494675556 23242   15.0971 0.0001058 ***
## - Thursday   1   4124184  494732314 23242   15.3078 9.469e-05 ***
## - Monday     1  10043414  500651543 23264   37.2783 1.249e-09 ***
## - isMay      1  11038663  501646793 23268   40.9724 1.959e-10 ***
## - Precip     1  12504340  503112469 23273   46.4126 1.297e-11 ***
## - Events     8  21790723  512398853 23293   10.1101 6.868e-14 ***
## - Friday     1  38286083  528894213 23366  142.1072 < 2.2e-16 ***
## - Sunlight   1  41458813  532066943 23377  153.8835 < 2.2e-16 ***
## - Saturday   1 561600650 1052208780 24642 2084.5044 < 2.2e-16 ***
## - Sunday     1 611796226 1102404356 24728 2270.8163 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=23226.68
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow + 
##     HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow + 
##     VisHigh + VisLow + WindAvg + WindLow + Precip + Events
## 
##             Df Sum of Sq        RSS   AIC   F value    Pr(>F)    
## - VisHigh    1     72628  490693090 23225    0.2697 0.6035866    
## - WindAvg    1    205977  490826440 23226    0.7649 0.3819050    
## - TempAvg    1    229988  490850451 23226    0.8541 0.3555183    
## - DewHigh    1    233122  490853585 23226    0.8657 0.3522614    
## <none>                    490620463 23227                        
## - DewLow     1    874668  491495131 23228    3.2482 0.0716661 .  
## - WindLow    1    882651  491503114 23228    3.2779 0.0703846 .  
## - HumidLow   1    909669  491530132 23228    3.3782 0.0662266 .  
## + VisAvg     1     12333  490608130 23229    0.0458 0.8306040    
## + WindHigh   1      7036  490613427 23229    0.0261 0.8716402    
## + Tuesday    1       428  490620035 23229    0.0016 0.9682199    
## + Wednesday  1       428  490620035 23229    0.0016 0.9682199    
## - TempHi     1   1423132  492043595 23230    5.2850 0.0216222 *  
## - PressLow   1   1675430  492295893 23231    6.2220 0.0127053 *  
## - DewAvg     1   1720380  492340843 23231    6.3889 0.0115675 *  
## - PressHigh  1   1875165  492495628 23232    6.9637 0.0083887 ** 
## - PressAvg   1   2327391  492947854 23234    8.6431 0.0033243 ** 
## - HumidHi    1   2485043  493105506 23234    9.2286 0.0024165 ** 
## - VisLow     1   2680488  493300951 23235    9.9544 0.0016308 ** 
## - HumidAvg   1   3025275  493645737 23236   11.2349 0.0008192 ***
## - TempLow    1   4055459  494675922 23240   15.0606 0.0001078 ***
## - Thursday   1   4142199  494762662 23240   15.3827 9.104e-05 ***
## - Monday     1  10040025  500660488 23262   37.2853 1.244e-09 ***
## - isMay      1  11027367  501647830 23266   40.9519 1.979e-10 ***
## - Precip     1  13045756  503666219 23273   48.4476 4.716e-12 ***
## - Events     8  24947224  515567687 23303   11.5807 3.655e-16 ***
## - Friday     1  38292665  528913128 23364  142.2061 < 2.2e-16 ***
## - Sunlight   1  41455133  532075596 23375  153.9505 < 2.2e-16 ***
## - Saturday   1 561945980 1052566443 24641 2086.8791 < 2.2e-16 ***
## - Sunday     1 611886549 1102507012 24727 2272.3416 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=23224.95
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow + 
##     HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow + 
##     VisLow + WindAvg + WindLow + Precip + Events
## 
##             Df Sum of Sq        RSS   AIC   F value    Pr(>F)    
## - WindAvg    1    204694  490897784 23224    0.7605 0.3832968    
## - DewHigh    1    227654  490920745 23224    0.8458 0.3578731    
## - TempAvg    1    235823  490928914 23224    0.8761 0.3493908    
## <none>                    490693090 23225                        
## - WindLow    1    874692  491567782 23226    3.2496 0.0716054 .  
## - DewLow     1    902628  491595718 23226    3.3534 0.0672294 .  
## - HumidLow   1    971242  491664333 23227    3.6083 0.0576491 .  
## + VisHigh    1     72628  490620463 23227    0.2697 0.6035866    
## + WindHigh   1      7089  490686001 23227    0.0263 0.8711313    
## + Tuesday    1       429  490692662 23227    0.0016 0.9681828    
## + Wednesday  1       429  490692662 23227    0.0016 0.9681828    
## + VisAvg     1        12  490693079 23227    0.0000 0.9947352    
## - TempHi     1   1404961  492098052 23228    5.2196 0.0224475 *  
## - PressLow   1   1685441  492378531 23229    6.2617 0.0124248 *  
## - DewAvg     1   1712185  492405276 23229    6.3610 0.0117499 *  
## - PressHigh  1   1884559  492577649 23230    7.0014 0.0082145 ** 
## - PressAvg   1   2343425  493036515 23232    8.7062 0.0032118 ** 
## - HumidHi    1   2461979  493155069 23232    9.1466 0.0025266 ** 
## - VisLow     1   2725392  493418483 23233   10.1252 0.0014872 ** 
## - HumidAvg   1   2992666  493685756 23234   11.1182 0.0008720 ***
## - TempLow    1   4091264  494784355 23238   15.1997 0.0001002 ***
## - Thursday   1   4215969  494909059 23239   15.6630 7.860e-05 ***
## - Monday     1  10055161  500748252 23261   37.3565 1.200e-09 ***
## - isMay      1  11068857  501761947 23264   41.1225 1.817e-10 ***
## - Precip     1  12996400  503689491 23271   48.2836 5.115e-12 ***
## - Events     8  25015965  515709055 23301   11.6173 3.207e-16 ***
## - Friday     1  38255730  528948821 23362  142.1259 < 2.2e-16 ***
## - Sunlight   1  41382598  532075688 23373  153.7427 < 2.2e-16 ***
## - Saturday   1 561873680 1052566770 24639 2087.4468 < 2.2e-16 ***
## - Sunday     1 612855116 1103548207 24726 2276.8506 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=23223.73
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     isMay + TempHi + TempAvg + TempLow + DewHigh + DewAvg + DewLow + 
##     HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + PressLow + 
##     VisLow + WindLow + Precip + Events
## 
##             Df Sum of Sq        RSS   AIC   F value    Pr(>F)    
## - TempAvg    1    226397  491124181 23223    0.8412 0.3591720    
## - DewHigh    1    259057  491156841 23223    0.9626 0.3266720    
## <none>                    490897784 23224                        
## + WindAvg    1    204694  490693090 23225    0.7605 0.3832968    
## - DewLow     1    926858  491824642 23225    3.4439 0.0636480 .  
## - HumidLow   1    985863  491883647 23225    3.6631 0.0557859 .  
## + VisHigh    1     71344  490826440 23226    0.2650 0.6067810    
## + WindHigh   1     27657  490870128 23226    0.1027 0.7486364    
## + VisAvg     1        61  490897723 23226    0.0002 0.9880014    
## + Tuesday    1         3  490897781 23226    0.0000 0.9971698    
## + Wednesday  1         3  490897781 23226    0.0000 0.9971698    
## - TempHi     1   1390321  492288105 23227    5.1659 0.0231499 *  
## - PressLow   1   1738062  492635846 23228    6.4580 0.0111275 *  
## - DewAvg     1   1755009  492652793 23228    6.5210 0.0107416 *  
## - PressHigh  1   2191477  493089261 23230    8.1427 0.0043722 ** 
## - HumidHi    1   2385859  493283643 23231    8.8650 0.0029452 ** 
## - PressAvg   1   2596916  493494701 23232    9.6492 0.0019235 ** 
## - VisLow     1   2813637  493711421 23232   10.4545 0.0012454 ** 
## - WindLow    1   2912957  493810741 23233   10.8235 0.0010213 ** 
## - HumidAvg   1   2931848  493829632 23233   10.8937 0.0009835 ***
## - TempLow    1   4062286  494960070 23237   15.0940 0.0001059 ***
## - Thursday   1   4190637  495088421 23238   15.5709 8.248e-05 ***
## - Monday     1  10030394  500928178 23259   37.2693 1.254e-09 ***
## - isMay      1  11165529  502063313 23263   41.4871 1.514e-10 ***
## - Precip     1  13336806  504234590 23272   49.5548 2.721e-12 ***
## - Events     8  25419382  516317166 23301   11.8062 < 2.2e-16 ***
## - Friday     1  38154980  529052764 23361  141.7702 < 2.2e-16 ***
## - Sunlight   1  42054310  532952094 23374  156.2587 < 2.2e-16 ***
## - Saturday   1 562202295 1053100079 24638 2088.9420 < 2.2e-16 ***
## - Sunday     1 612922164 1103819948 24725 2277.3988 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=23222.58
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     isMay + TempHi + TempLow + DewHigh + DewAvg + DewLow + HumidHi + 
##     HumidAvg + HumidLow + PressHigh + PressAvg + PressLow + VisLow + 
##     WindLow + Precip + Events
## 
##             Df Sum of Sq        RSS   AIC   F value    Pr(>F)    
## - DewHigh    1    261283  491385464 23222    0.9709 0.3245820    
## <none>                    491124181 23223                        
## + TempAvg    1    226397  490897784 23224    0.8412 0.3591720    
## + WindAvg    1    195268  490928914 23224    0.7255 0.3944579    
## - DewLow     1    904827  492029008 23224    3.3623 0.0668675 .  
## + VisHigh    1     77040  491047141 23224    0.2862 0.5927520    
## - HumidLow   1   1032712  492156893 23225    3.8375 0.0502696 .  
## + WindHigh   1     14043  491110138 23225    0.0522 0.8193785    
## + VisAvg     1       271  491123910 23225    0.0010 0.9746876    
## + Tuesday    1        67  491124115 23225    0.0002 0.9874369    
## + Wednesday  1        67  491124115 23225    0.0002 0.9874369    
## - PressLow   1   1802332  492926513 23227    6.6974 0.0097320 ** 
## - DewAvg     1   1905094  493029276 23228    7.0793 0.0078664 ** 
## - PressHigh  1   2168960  493293141 23229    8.0598 0.0045760 ** 
## - HumidHi    1   2400926  493525107 23230    8.9218 0.0028555 ** 
## - PressAvg   1   2618173  493742354 23230    9.7290 0.0018421 ** 
## - VisLow     1   2800910  493925091 23231   10.4081 0.0012769 ** 
## - HumidAvg   1   3095947  494220129 23232   11.5044 0.0007091 ***
## - WindLow    1   3301445  494425627 23233   12.2681 0.0004719 ***
## - Thursday   1   4159347  495283528 23236   15.4560 8.760e-05 ***
## - TempHi     1   4729050  495853232 23238   17.5730 2.897e-05 ***
## - TempLow    1   5600956  496725138 23242   20.8130 5.401e-06 ***
## - Monday     1  10004022  501128204 23258   37.1746 1.315e-09 ***
## - isMay      1  11192439  502316620 23262   41.5907 1.437e-10 ***
## - Precip     1  13254130  504378311 23270   49.2519 3.162e-12 ***
## - Events     8  25293896  516418077 23300   11.7489 < 2.2e-16 ***
## - Friday     1  38096085  529220267 23359  141.5637 < 2.2e-16 ***
## - Sunlight   1  41925850  533050031 23373  155.7950 < 2.2e-16 ***
## - Saturday   1 561995723 1053119904 24636 2088.3561 < 2.2e-16 ***
## - Sunday     1 613663507 1104787688 24725 2280.3518 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=23221.57
## Fremont ~ Friday + Monday + Saturday + Sunday + Sunlight + Thursday + 
##     isMay + TempHi + TempLow + DewAvg + DewLow + HumidHi + HumidAvg + 
##     HumidLow + PressHigh + PressAvg + PressLow + VisLow + WindLow + 
##     Precip + Events
## 
##             Df Sum of Sq        RSS   AIC   F value    Pr(>F)    
## <none>                    491385464 23222                        
## + DewHigh    1    261283  491124181 23223    0.9709 0.3245820    
## + TempAvg    1    228623  491156841 23223    0.8495 0.3568161    
## + WindAvg    1    226076  491159388 23223    0.8400 0.3595092    
## + VisHigh    1     70931  491314534 23223    0.2635 0.6078058    
## + WindHigh   1     18034  491367431 23224    0.0670 0.7958153    
## + VisAvg     1       261  491385203 23224    0.0010 0.9751566    
## + Tuesday    1        31  491385433 23224    0.0001 0.9914299    
## + Wednesday  1        31  491385433 23224    0.0001 0.9914299    
## - HumidLow   1   1226406  492611870 23224    4.5574 0.0329105 *  
## - PressLow   1   1755022  493140486 23226    6.5217 0.0107372 *  
## - DewLow     1   1815797  493201261 23226    6.7475 0.0094631 ** 
## - DewAvg     1   1970892  493356357 23227    7.3239 0.0068678 ** 
## - HumidHi    1   2174389  493559853 23228    8.0801 0.0045252 ** 
## - PressHigh  1   2226470  493611935 23228    8.2736 0.0040692 ** 
## - PressAvg   1   2623441  494008905 23229    9.7488 0.0018226 ** 
## - HumidAvg   1   2909279  494294743 23231   10.8109 0.0010282 ** 
## - VisLow     1   3043961  494429425 23231   11.3114 0.0007862 ***
## - WindLow    1   3500134  494885599 23233   13.0066 0.0003187 ***
## - Thursday   1   4138356  495523820 23235   15.3782 9.125e-05 ***
## - TempHi     1   4482921  495868385 23236   16.6586 4.668e-05 ***
## - TempLow    1   5610138  496995602 23241   20.8474 5.306e-06 ***
## - Monday     1   9932672  501318136 23257   36.9100 1.502e-09 ***
## - isMay      1  11128734  502514198 23261   41.3546 1.617e-10 ***
## - Precip     1  13430611  504816075 23270   49.9085 2.282e-12 ***
## - Events     8  27625742  519011206 23307   12.8322 < 2.2e-16 ***
## - Friday     1  37876238  529261702 23357  140.7490 < 2.2e-16 ***
## - Sunlight   1  42248291  533633756 23373  156.9956 < 2.2e-16 ***
## - Saturday   1 562066084 1053451548 24634 2088.6509 < 2.2e-16 ***
## - Sunday     1 613402330 1104787795 24723 2279.4176 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(KitchenSink)
## 
## Call:
## lm(formula = Fremont ~ Friday + Monday + Saturday + Sunday + 
##     Sunlight + Thursday + isMay + TempHi + TempLow + DewAvg + 
##     DewLow + HumidHi + HumidAvg + HumidLow + PressHigh + PressAvg + 
##     PressLow + VisLow + WindLow + Precip + Events, data = data[, 
##     -1])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2527.6  -265.4    24.6   294.8  3259.6 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -304.205   2410.311  -0.126 0.899580    
## FridayTrue                 -466.034     39.282 -11.864  < 2e-16 ***
## MondayTrue                 -238.431     39.246  -6.075 1.50e-09 ***
## SaturdayTrue              -1793.691     39.248 -45.702  < 2e-16 ***
## SundayTrue                -1872.218     39.214 -47.743  < 2e-16 ***
## Sunlight                    109.768      8.761  12.530  < 2e-16 ***
## ThursdayTrue               -153.721     39.199  -3.922 9.12e-05 ***
## isMayTrue                   307.946     47.886   6.431 1.62e-10 ***
## TempHi                       23.632      5.790   4.081 4.67e-05 ***
## TempLow                     -29.405      6.440  -4.566 5.31e-06 ***
## DewAvg                       29.998     11.085   2.706 0.006868 ** 
## DewLow                       13.982      5.383   2.598 0.009463 ** 
## HumidHi                       9.449      3.324   2.843 0.004525 ** 
## HumidAvg                    -17.904      5.445  -3.288 0.001028 ** 
## HumidLow                     -6.041      2.830  -2.135 0.032911 *  
## PressHigh                  -934.871    325.015  -2.876 0.004069 ** 
## PressAvg                   1708.496    547.191   3.122 0.001823 ** 
## PressLow                   -732.482    286.825  -2.554 0.010737 *  
## VisLow                       19.030      5.658   3.363 0.000786 ***
## WindLow                     -19.313      5.355  -3.606 0.000319 ***
## Precip                     -479.393     67.858  -7.065 2.28e-12 ***
## EventsFog                   191.467     82.362   2.325 0.020197 *  
## EventsFog , Rain            -75.836    114.605  -0.662 0.508234    
## EventsFog , Rain , Snow     346.743    526.619   0.658 0.510344    
## EventsRain                 -307.410     36.223  -8.487  < 2e-16 ***
## EventsRain , Snow          -269.686    153.385  -1.758 0.078876 .  
## EventsRain , Thunderstorm  -454.125    127.126  -3.572 0.000363 ***
## EventsSnow                 -173.836    203.546  -0.854 0.393197    
## EventsThunderstorm          -75.146    521.147  -0.144 0.885364    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 518.8 on 1826 degrees of freedom
## Multiple R-squared:  0.8498, Adjusted R-squared:  0.8475 
## F-statistic: 368.9 on 28 and 1826 DF,  p-value: < 2.2e-16
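
As the AIC-ordered tables above suggest, the F columns are informational and the selection itself follows AIC. A sketch on simulated data (variable names are made up) compares step() with and without test = "F"; on that assumption, both calls should retain the same terms.

```r
set.seed(42)

# Simulated data: y depends on x1 and x2; x3 and x4 are pure noise
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 3 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

full <- lm(y ~ ., data = sim)

# trace = 0 suppresses the step-by-step tables
by_aic     <- step(full, direction = "both", trace = 0)
with_ftest <- step(full, direction = "both", trace = 0, test = "F")

# Compare the terms each run retained
print(names(coef(by_aic)))
print(setequal(names(coef(by_aic)), names(coef(with_ftest))))
```

To select on a stricter criterion, the k argument of step() changes the penalty; k = log(n) gives BIC-style selection.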

Residuals Analysis

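A histogram of the residuals is roughly bell-shaped and centered near zero. The extremes are wide, though: residuals run from about -2,528 to 3,260 against a residual standard error of 519, which suggests heavier tails than a normal distribution would give.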

residuals = resid(KitchenSink)
hist(residuals)
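
A histogram alone can hide structure, so it may be worth also looking at residuals against fitted values, a normal Q-Q plot, and residuals in time order. The sketch below uses simulated data with a deliberately omitted upward trend (all names and numbers are assumptions, not the Fremont data) to show how a missing time effect surfaces in the residuals.

```r
set.seed(7)

# Simulated daily series: a seasonal driver plus a linear trend the model omits
n <- 730
day_index <- seq_len(n)
sunlight <- 12 + 4 * sin(2 * pi * day_index / 365)
counts <- 1000 + 150 * sunlight + 1.5 * day_index + rnorm(n, sd = 200)

fit_sim <- lm(counts ~ sunlight)
res <- resid(fit_sim)

# Numeric checks: residuals average to zero by construction, but they still
# correlate with time when a trend has been left out of the model
print(mean(res))
print(cor(res, day_index))

# Visual checks (skipped when run non-interactively)
if (interactive()) {
  plot(fitted(fit_sim), res, main = "Residuals vs Fitted")
  abline(h = 0, lty = 2)
  qqnorm(res); qqline(res)
  plot(day_index, res, type = "l", main = "Residuals in Time Order")
}
```

The same three plots can be drawn for a fitted model such as KitchenSink by swapping in its residuals and fitted values.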

KSPredictions = predict(KitchenSink, data)
KSPredsRolling = rollmeanr(KSPredictions,7,fill=NA)
plot(KSPredsRolling, type = "l", main = "Stepwise Prediction of Bicycle Counts")

Parting Comments

While the model fits well (adjusted R-squared of about 0.85), there is plenty of room for improvement. In particular, the dataset exhibits a time trend that none of the predictors captures. A time series model that weights older observations less would likely reduce prediction error further. The regression should also be trained and validated on separate data, for example through k-fold cross-validation.
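
As a rough sketch of the cross-validation idea, the base-R loop below runs 5-fold cross-validation on simulated data and compares a model with and without a day-index trend term. The data, the column names and the helper cv_rmse are all hypothetical; the real dataset would be substituted in.

```r
set.seed(123)

# Simulated stand-in for the bike data (all names and values are made up)
n <- 500
sim <- data.frame(day_index = seq_len(n))
sim$sunlight <- 12 + 4 * sin(2 * pi * sim$day_index / 365)
sim$counts <- 1000 + 150 * sim$sunlight + 1.5 * sim$day_index + rnorm(n, sd = 200)

# Hypothetical helper: k-fold cross-validated RMSE for a model of `counts`
cv_rmse <- function(formula, data, k = 5) {
  # Randomly assign each row to one of k folds of near-equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  fold_errors <- sapply(seq_len(k), function(i) {
    train   <- data[folds != i, ]
    holdout <- data[folds == i, ]
    fit     <- lm(formula, data = train)
    # Score on the held-out fold only
    sqrt(mean((holdout$counts - predict(fit, newdata = holdout))^2))
  })
  mean(fold_errors)
}

rmse_no_trend <- cv_rmse(counts ~ sunlight, sim)
rmse_trend    <- cv_rmse(counts ~ sunlight + day_index, sim)
print(c(no_trend = rmse_no_trend, with_trend = rmse_trend))
```

Random folds ignore the time ordering. For forecasting, a rolling-origin split that always trains on earlier days and tests on later ones may be the more honest check.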