R Markdown
Import the data
bike.csv <- read.csv("bike_sharing_data.csv")
bike.txt <- read.delim("bike_sharing_data.txt")
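The two import calls differ only in the delimiter they expect: read.csv() reads comma-separated files and read.delim() reads tab-separated files. The sketch below illustrates this with a small made-up data frame written to temporary files. The toy values and the tempfile() paths are assumptions for illustration, not part of the bike sharing data.

```r
# Toy data standing in for the bike sharing file (values are made up).
toy <- data.frame(datetime = c("1/1/2011 0:00", "1/1/2011 1:00"),
                  count = c(16L, 40L))

# Write the same data once as comma-separated and once as tab-separated.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(toy, csv_path, row.names = FALSE)
write.table(toy, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.csv() expects commas, read.delim() expects tabs.
toy_csv <- read.csv(csv_path)
toy_txt <- read.delim(txt_path)

# Both imports should yield the same data frame.
same_result <- isTRUE(all.equal(toy_csv, toy_txt))
```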
Preview the data
head(bike.csv)
## datetime season holiday workingday weather temp atemp humidity
## 1 1/1/2011 0:00 1 0 0 1 9.84 14.395 81
## 2 1/1/2011 1:00 1 0 0 1 9.02 13.635 80
## 3 1/1/2011 2:00 1 0 0 1 9.02 13.635 80
## 4 1/1/2011 3:00 1 0 0 1 9.84 14.395 75
## 5 1/1/2011 4:00 1 0 0 1 9.84 14.395 75
## 6 1/1/2011 5:00 1 0 0 2 9.84 12.880 75
## windspeed casual registered count sources
## 1 0.0000 3 13 16 ad campaign
## 2 0.0000 8 32 40 www.yahoo.com
## 3 0.0000 5 27 32 www.google.fi
## 4 0.0000 3 10 13 AD campaign
## 5 0.0000 0 1 1 Twitter
## 6 6.0032 0 1 1 www.bing.com
tail(bike.csv)
## datetime season holiday workingday weather temp atemp humidity
## 17374 12/31/2012 18:00 1 0 1 2 10.66 13.635 48
## 17375 12/31/2012 19:00 1 0 1 2 10.66 12.880 60
## 17376 12/31/2012 20:00 1 0 1 2 10.66 12.880 60
## 17377 12/31/2012 21:00 1 0 1 1 10.66 12.880 60
## 17378 12/31/2012 22:00 1 0 1 1 10.66 13.635 56
## 17379 12/31/2012 23:00 1 0 1 1 10.66 13.635 65
## windspeed casual registered count sources
## 17374 8.9981 10 326 336 facebook page
## 17375 11.0014 5 206 211 <NA>
## 17376 11.0014 4 140 144 AD campaign
## 17377 11.0014 3 96 99 AD campaign
## 17378 8.9981 4 90 94 ad campaign
## 17379 8.9981 3 50 53 direct
Describe the data
str(bike.csv)
## 'data.frame': 17379 obs. of 13 variables:
## $ datetime : chr "1/1/2011 0:00" "1/1/2011 1:00" "1/1/2011 2:00" "1/1/2011 3:00" ...
## $ season : int 1 1 1 1 1 1 1 1 1 1 ...
## $ holiday : int 0 0 0 0 0 0 0 0 0 0 ...
## $ workingday: int 0 0 0 0 0 0 0 0 0 0 ...
## $ weather : int 1 1 1 1 1 2 1 1 1 1 ...
## $ temp : num 9.84 9.02 9.02 9.84 9.84 ...
## $ atemp : num 14.4 13.6 13.6 14.4 14.4 ...
## $ humidity : chr "81" "80" "80" "75" ...
## $ windspeed : num 0 0 0 0 0 ...
## $ casual : int 3 8 5 3 0 0 2 1 1 8 ...
## $ registered: int 13 32 27 10 1 1 0 2 7 6 ...
## $ count : int 16 40 32 13 1 1 2 3 8 14 ...
## $ sources : chr "ad campaign" "www.yahoo.com" "www.google.fi" "AD campaign" ...
summary(bike.csv)
## datetime season holiday workingday
## Length:17379 Min. :1.000 Min. :0.00000 Min. :0.0000
## Class :character 1st Qu.:2.000 1st Qu.:0.00000 1st Qu.:0.0000
## Mode :character Median :3.000 Median :0.00000 Median :1.0000
## Mean :2.502 Mean :0.02877 Mean :0.6827
## 3rd Qu.:3.000 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :4.000 Max. :1.00000 Max. :1.0000
## weather temp atemp humidity
## Min. :1.000 Min. : 0.82 Min. : 0.00 Length:17379
## 1st Qu.:1.000 1st Qu.:13.94 1st Qu.:16.66 Class :character
## Median :1.000 Median :20.50 Median :24.24 Mode :character
## Mean :1.425 Mean :20.38 Mean :23.79
## 3rd Qu.:2.000 3rd Qu.:27.06 3rd Qu.:31.06
## Max. :4.000 Max. :41.00 Max. :50.00
## windspeed casual registered count
## Min. : 0.000 Min. : 0.00 Min. : 0.0 Min. : 1
## 1st Qu.: 7.002 1st Qu.: 4.00 1st Qu.: 36.0 1st Qu.: 42
## Median :12.998 Median : 16.00 Median :116.0 Median :141
## Mean :12.737 Mean : 34.48 Mean :152.5 Mean :187
## 3rd Qu.:16.998 3rd Qu.: 46.00 3rd Qu.:217.0 3rd Qu.:277
## Max. :56.997 Max. :367.00 Max. :886.0 Max. :977
## sources
## Length:17379
## Class :character
## Mode :character
##
##
##
There are 17379 observations of 13 variables.
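The same figures can be read off programmatically instead of from the str() header. The sketch below uses a small made-up data frame (an assumption for illustration) to show dim(), nrow(), and ncol(), plus a per-column class check of the kind that reveals humidity being stored as character in the str() output above.

```r
# Toy data frame standing in for bike.csv (values are made up).
toy <- data.frame(temp = c(9.84, 9.02, 9.02),
                  humidity = c("81", "80", "80"),
                  stringsAsFactors = FALSE)

# Number of observations (rows) and variables (columns).
n_obs <- nrow(toy)
n_vars <- ncol(toy)
dims <- dim(toy)  # both at once: c(rows, columns)

# Class of each column; a numeric-looking column stored as text shows up here.
col_classes <- sapply(toy, class)
print(col_classes)
```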
Create a contingency table
sort(table(bike.csv$season), decreasing = TRUE)
##
## 3 2 1 4
## 4496 4409 4242 4232
Season 4 is winter, so winter has 4232 observations.
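Reading the table requires remembering which code stands for which season. A minimal sketch of labelling the codes before tabulating, using made-up values: the labels for 1 (spring) and 4 (winter) follow this report, while 2 = summer and 3 = fall are assumptions.

```r
# Toy vector standing in for bike.csv$season (values are made up).
season <- c(1, 4, 3, 3, 2, 4, 3, 1, 2, 3)

# Attach names to the codes. 1 = spring and 4 = winter come from the report;
# 2 = summer and 3 = fall are assumptions.
season_labels <- factor(season,
                        levels = 1:4,
                        labels = c("spring", "summer", "fall", "winter"))

# Same contingency table as before, now with readable names.
season_counts <- sort(table(season_labels), decreasing = TRUE)
print(season_counts)

# Look up one season by name instead of reading it off the printout.
winter_n <- as.integer(season_counts["winter"])
```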
Select the data using indexing
bike.csv[6251,'season']
## [1] 4
Row 6251 falls in season 4 (winter).
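A single cell can be reached in more than one way. The sketch below shows three equivalent forms on a small made-up data frame (the data and the name toy are assumptions for illustration).

```r
# Toy data frame standing in for bike.csv (values are made up).
toy <- data.frame(season = c(1, 1, 2, 3, 4),
                  windspeed = c(0, 6.0032, 12.998, 43.9989, 8.9981))

# Row 5 of the "season" column, three equivalent ways.
by_name <- toy[5, "season"]   # [row, column name]
by_position <- toy[5, 1]      # [row, column position]
by_dollar <- toy$season[5]    # extract the column, then index the vector

# Leaving the column index empty returns the whole row as a data frame.
whole_row <- toy[5, ]
```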
Subset the data using subset()
subset(bike.csv, windspeed >= 40 & season %in% c(1, 4))
## datetime season holiday workingday weather temp atemp humidity
## 1008 2/14/2011 15:00 1 0 1 1 22.96 26.515 21
## 1010 2/14/2011 17:00 1 0 1 1 18.86 22.725 33
## 1011 2/14/2011 18:00 1 0 1 1 16.40 20.455 40
## 1015 2/14/2011 22:00 1 0 1 1 13.94 14.395 46
## 1018 2/15/2011 1:00 1 0 1 1 12.30 12.120 42
## 1019 2/15/2011 2:00 1 0 1 1 11.48 11.365 41
## 1120 2/19/2011 9:00 1 0 0 1 16.40 20.455 16
## 1124 2/19/2011 13:00 1 0 0 1 18.04 21.970 16
## 1125 2/19/2011 14:00 1 0 0 1 18.86 22.725 15
## 1126 2/19/2011 15:00 1 0 0 1 18.04 21.970 16
## 1127 2/19/2011 16:00 1 0 0 1 18.04 21.970 16
## 1128 2/19/2011 17:00 1 0 0 1 17.22 21.210 19
## 1259 2/25/2011 14:00 1 0 1 3 22.96 26.515 56
## 1260 2/25/2011 15:00 1 0 1 1 18.86 22.725 41
## 1262 2/25/2011 17:00 1 0 1 1 13.12 13.635 49
## 1265 2/25/2011 20:00 1 0 1 1 12.30 12.880 49
## 1333 2/28/2011 19:00 1 0 1 3 18.04 21.970 88
## 1334 2/28/2011 20:00 1 0 1 3 18.04 21.970 88
## 1478 3/6/2011 21:00 1 0 0 3 9.84 9.090 93
## 8068 12/7/2011 19:00 4 0 1 3 13.94 14.395 87
## 8069 12/7/2011 20:00 4 0 1 3 13.94 14.395 87
## 8706 1/3/2012 13:00 1 0 1 1 7.38 6.060 34
## 8943 1/13/2012 11:00 1 0 1 1 9.84 9.090 38
## 9644 2/11/2012 18:00 1 0 0 2 9.02 9.090 47
## 9647 2/11/2012 21:00 1 0 0 1 5.74 3.790 43
## 9653 2/12/2012 3:00 1 0 0 2 4.10 2.275 46
## 9654 2/12/2012 4:00 1 0 0 2 4.10 2.275 46
## 9662 2/12/2012 12:00 1 0 0 1 5.74 3.790 39
## 9957 2/24/2012 21:00 1 0 1 1 17.22 21.210 35
## 9959 2/24/2012 23:00 1 0 1 1 15.58 19.695 37
## 9971 2/25/2012 11:00 1 0 0 1 12.30 12.880 39
## 9972 2/25/2012 12:00 1 0 0 1 13.12 13.635 29
## 10169 3/4/2012 18:00 1 0 0 1 13.12 13.635 33
## 10193 3/5/2012 18:00 1 0 1 3 11.48 11.365 55
## 10260 3/8/2012 13:00 1 0 1 2 24.60 31.060 49
## 10261 3/8/2012 14:00 1 0 1 2 25.42 31.060 43
## 10262 3/8/2012 15:00 1 0 1 1 26.24 31.060 38
## 10263 3/8/2012 16:00 1 0 1 2 25.42 31.060 41
## 10264 3/8/2012 17:00 1 0 1 1 25.42 31.060 38
## 10290 3/9/2012 19:00 1 0 1 1 17.22 21.210 28
## 16208 11/13/2012 1:00 4 0 1 3 18.04 21.970 88
## 16473 11/24/2012 2:00 4 0 0 1 13.12 13.635 39
## 16483 11/24/2012 12:00 4 0 0 2 12.30 12.880 36
## 17150 12/22/2012 8:00 1 0 0 1 10.66 10.605 44
## 17154 12/22/2012 12:00 1 0 0 1 12.30 12.880 36
## 17345 12/30/2012 13:00 1 0 0 1 12.30 12.880 36
## windspeed casual registered count sources
## 1008 43.9989 19 71 90 www.google.co.uk
## 1010 40.9973 25 218 243 ad campaign
## 1011 40.9973 11 194 205 ad campaign
## 1015 43.9989 1 44 45 www.google.co.uk
## 1018 51.9987 0 5 5 www.google.fi
## 1019 46.0022 1 2 3 www.google.fi
## 1120 43.9989 18 37 55 Ad Campaign
## 1124 40.9973 52 103 155 Twitter
## 1125 43.9989 102 94 196 Twitter
## 1126 50.0021 84 87 171 ad campaign
## 1127 43.0006 39 81 120 blog
## 1128 40.9973 36 91 127 Ad Campaign
## 1259 40.9973 22 55 77 www.bing.com
## 1260 54.0020 31 98 129 www.google.co.uk
## 1262 50.0021 13 180 193 www.google.com
## 1265 40.9973 3 66 69 direct
## 1333 40.9973 8 76 84 Twitter
## 1334 40.9973 8 47 55 facebook page
## 1478 40.9973 1 6 7 ad campaign
## 8068 43.0006 2 31 33 ad campaign
## 8069 43.0006 1 25 26 Twitter
## 8706 43.9989 5 68 73 www.google.co.uk
## 8943 40.9973 12 102 114 www.yahoo.com
## 9644 43.9989 3 105 108 <NA>
## 9647 43.0006 5 43 48 Twitter
## 9653 46.0022 0 14 14 www.yahoo.com
## 9654 47.9988 0 1 1 www.bing.com
## 9662 43.0006 7 133 140 ad campaign
## 9957 54.0020 12 138 150 ad campaign
## 9959 46.0022 9 71 80 Twitter
## 9971 40.9973 29 155 184 www.google.com
## 9972 43.9989 49 218 267 www.google.com
## 10169 40.9973 20 164 184 <NA>
## 10193 43.9989 12 363 375 www.yahoo.com
## 10260 43.0006 35 198 233 www.bing.com
## 10261 43.0006 48 155 203 ad campaign
## 10262 46.0022 24 161 185 ad campaign
## 10263 43.0006 37 305 342 AD campaign
## 10264 43.9989 52 545 597 blog
## 10290 40.9973 12 232 244 www.google.fi
## 16208 43.0006 0 5 5 Twitter
## 16473 40.9973 5 29 34 facebook page
## 16483 40.9973 39 227 266 direct
## 17150 40.9973 8 75 83 ad campaign
## 17154 43.9989 30 169 199 www.yahoo.com
## 17345 43.9989 28 152 180 ad campaign
The result has 46 rows, so there are 46 observations with a windspeed of 40
or higher in spring (1) or winter (4).
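Counting rows by eye is error-prone for long output, so the count can be computed directly. A minimal sketch on made-up data (the data frame toy and its values are assumptions for illustration):

```r
# Toy data frame standing in for bike.csv (values are made up).
toy <- data.frame(season = c(1, 1, 2, 3, 4, 4),
                  windspeed = c(43.9989, 6.0032, 54.002, 40.9973, 40, 8.9981))

# Same filter as above: windspeed of 40 or higher in season 1 or 4.
hits <- subset(toy, windspeed >= 40 & season %in% c(1, 4))

# Count the matching rows instead of counting printed lines.
n_hits <- nrow(hits)

# Equivalent count without building the subset: TRUE counts as 1 in sum().
n_hits_alt <- sum(toy$windspeed >= 40 & toy$season %in% c(1, 4))
```

Note that the row with a windspeed of exactly 40 is included, because the condition uses >= rather than >.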