R Markdown

Import the data

bike.csv <- read.csv("bike_sharing_data.csv")
bike.txt <- read.delim("bike_sharing_data.txt")
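`read.csv()` and `read.delim()` are the same reader, `read.table()`, with different defaults: `read.csv()` splits fields on commas and `read.delim()` splits on tabs. The import above therefore assumes the `.txt` file is tab-delimited. Below is a minimal sketch on hypothetical toy rows (not the real file). It writes the same data both ways to temporary files and shows that the two readers recover identical data frames.

```r
# Hypothetical toy rows shaped like a few of the bike-sharing columns
toy <- data.frame(
  season  = c(1L, 1L, 4L),
  temp    = c(9.84, 9.02, 10.66),
  sources = c("ad campaign", "www.yahoo.com", "direct")
)

csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")

# Comma-separated file for read.csv(); row.names = FALSE avoids an extra column
write.csv(toy, csv_path, row.names = FALSE)
# Tab-separated file for read.delim()
write.table(toy, txt_path, sep = "\t", row.names = FALSE)

from_csv <- read.csv(csv_path)
from_txt <- read.delim(txt_path)

# Both readers recover the same data frame
print(identical(from_csv, from_txt))
```

If the `.txt` file used a different separator, you would need to pass `sep` explicitly.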

Preview the data

head(bike.csv)
##        datetime season holiday workingday weather temp  atemp humidity
## 1 1/1/2011 0:00      1       0          0       1 9.84 14.395       81
## 2 1/1/2011 1:00      1       0          0       1 9.02 13.635       80
## 3 1/1/2011 2:00      1       0          0       1 9.02 13.635       80
## 4 1/1/2011 3:00      1       0          0       1 9.84 14.395       75
## 5 1/1/2011 4:00      1       0          0       1 9.84 14.395       75
## 6 1/1/2011 5:00      1       0          0       2 9.84 12.880       75
##   windspeed casual registered count       sources
## 1    0.0000      3         13    16   ad campaign
## 2    0.0000      8         32    40 www.yahoo.com
## 3    0.0000      5         27    32 www.google.fi
## 4    0.0000      3         10    13   AD campaign
## 5    0.0000      0          1     1       Twitter
## 6    6.0032      0          1     1  www.bing.com
tail(bike.csv)
##               datetime season holiday workingday weather  temp  atemp humidity
## 17374 12/31/2012 18:00      1       0          1       2 10.66 13.635       48
## 17375 12/31/2012 19:00      1       0          1       2 10.66 12.880       60
## 17376 12/31/2012 20:00      1       0          1       2 10.66 12.880       60
## 17377 12/31/2012 21:00      1       0          1       1 10.66 12.880       60
## 17378 12/31/2012 22:00      1       0          1       1 10.66 13.635       56
## 17379 12/31/2012 23:00      1       0          1       1 10.66 13.635       65
##       windspeed casual registered count       sources
## 17374    8.9981     10        326   336 facebook page
## 17375   11.0014      5        206   211          <NA>
## 17376   11.0014      4        140   144   AD campaign
## 17377   11.0014      3         96    99   AD campaign
## 17378    8.9981      4         90    94   ad campaign
## 17379    8.9981      3         50    53        direct
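`head()` and `tail()` show six rows by default, and `tail()` keeps the original row names. That is why the second preview is labelled 17374 to 17379 rather than 1 to 6. The sketch below demonstrates this on a hypothetical 10-row toy data frame.

```r
# Hypothetical 10-row toy data frame
toy <- data.frame(count = c(16L, 40L, 32L, 13L, 1L, 1L, 2L, 3L, 8L, 14L))

# head() and tail() default to n = 6; a different n can be passed
first_three <- head(toy, n = 3)
last_two    <- tail(toy, n = 2)

# tail() keeps the original row names ("9" and "10" here)
print(rownames(last_two))
print(last_two)
```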

Describe the data

str(bike.csv)
## 'data.frame':    17379 obs. of  13 variables:
##  $ datetime  : chr  "1/1/2011 0:00" "1/1/2011 1:00" "1/1/2011 2:00" "1/1/2011 3:00" ...
##  $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ weather   : int  1 1 1 1 1 2 1 1 1 1 ...
##  $ temp      : num  9.84 9.02 9.02 9.84 9.84 ...
##  $ atemp     : num  14.4 13.6 13.6 14.4 14.4 ...
##  $ humidity  : chr  "81" "80" "80" "75" ...
##  $ windspeed : num  0 0 0 0 0 ...
##  $ casual    : int  3 8 5 3 0 0 2 1 1 8 ...
##  $ registered: int  13 32 27 10 1 1 0 2 7 6 ...
##  $ count     : int  16 40 32 13 1 1 2 3 8 14 ...
##  $ sources   : chr  "ad campaign" "www.yahoo.com" "www.google.fi" "AD campaign" ...
summary(bike.csv)
##    datetime             season         holiday          workingday    
##  Length:17379       Min.   :1.000   Min.   :0.00000   Min.   :0.0000  
##  Class :character   1st Qu.:2.000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Mode  :character   Median :3.000   Median :0.00000   Median :1.0000  
##                     Mean   :2.502   Mean   :0.02877   Mean   :0.6827  
##                     3rd Qu.:3.000   3rd Qu.:0.00000   3rd Qu.:1.0000  
##                     Max.   :4.000   Max.   :1.00000   Max.   :1.0000  
##     weather           temp           atemp         humidity        
##  Min.   :1.000   Min.   : 0.82   Min.   : 0.00   Length:17379      
##  1st Qu.:1.000   1st Qu.:13.94   1st Qu.:16.66   Class :character  
##  Median :1.000   Median :20.50   Median :24.24   Mode  :character  
##  Mean   :1.425   Mean   :20.38   Mean   :23.79                     
##  3rd Qu.:2.000   3rd Qu.:27.06   3rd Qu.:31.06                     
##  Max.   :4.000   Max.   :41.00   Max.   :50.00                     
##    windspeed          casual         registered        count    
##  Min.   : 0.000   Min.   :  0.00   Min.   :  0.0   Min.   :  1  
##  1st Qu.: 7.002   1st Qu.:  4.00   1st Qu.: 36.0   1st Qu.: 42  
##  Median :12.998   Median : 16.00   Median :116.0   Median :141  
##  Mean   :12.737   Mean   : 34.48   Mean   :152.5   Mean   :187  
##  3rd Qu.:16.998   3rd Qu.: 46.00   3rd Qu.:217.0   3rd Qu.:277  
##  Max.   :56.997   Max.   :367.00   Max.   :886.0   Max.   :977  
##    sources         
##  Length:17379      
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

There are 17379 observations of 13 variables.
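Both `str()` and `summary()` report `humidity` as character rather than numeric. `read.csv()` only leaves a column as character when some non-missing entry cannot be parsed as a number. One way to locate such entries is a minimal sketch like the following. The vector and its `"n/a"` value are hypothetical stand-ins, not taken from the real file. With the real data you would use `bike.csv$humidity` instead.

```r
# Hypothetical humidity values read as character; "n/a" is a made-up
# placeholder for an entry that does not parse as a number
humidity_chr <- c("81", "80", "n/a", "75")

# as.numeric() turns anything it cannot parse into NA (with a warning)
humidity_num <- suppressWarnings(as.numeric(humidity_chr))

# Entries that were present but failed to convert
bad <- humidity_chr[is.na(humidity_num) & !is.na(humidity_chr)]
print(bad)
```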

Create a contingency table

sort(table(bike.csv$season), decreasing = TRUE)
## 
##    3    2    1    4 
## 4496 4409 4242 4232

Season 4 (winter) has the fewest observations, 4232.
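The table above is keyed by integer codes. Converting the codes to a labelled factor makes the output self-explanatory. The sketch below uses hypothetical toy codes. The text fixes 1 = spring and 4 = winter. The labels 2 = summer and 3 = fall are assumed here.

```r
# Hypothetical season codes standing in for bike.csv$season
season <- c(3L, 2L, 3L, 1L, 4L, 3L, 2L, 1L)

# Attach labels to the integer codes. 1 = spring and 4 = winter follow the
# text; 2 = summer and 3 = fall are an assumption
season_f <- factor(season, levels = 1:4,
                   labels = c("spring", "summer", "fall", "winter"))

# Same sorted contingency table, but with readable names
counts <- sort(table(season_f), decreasing = TRUE)
print(counts)
```

Because `levels = 1:4` is given explicitly, a season with no observations would still appear in the table with a count of 0.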

Select the data using indexing

bike.csv[6251,'season']
## [1] 4

Row 6251 is in season 4 (winter).
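The `[row, column]` form used above is one of several equivalent ways to pull a single value out of a data frame. Below is a small sketch on a hypothetical toy data frame, where row 4 stands in for row 6251.

```r
# Hypothetical toy data; row 4 plays the role of row 6251
toy <- data.frame(season = c(1L, 1L, 2L, 4L), count = c(16L, 40L, 32L, 13L))

by_row_col   <- toy[4, "season"]     # [row, column] indexing, as above
by_dollar    <- toy$season[4]        # extract the column, then the element
by_double_br <- toy[["season"]][4]   # same via [[ ]]

print(c(by_row_col, by_dollar, by_double_br))

# Leaving the column index empty returns the whole row as a data frame
whole_row <- toy[4, ]
print(whole_row)
```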

Subset the data using subset()

subset(bike.csv, windspeed >= 40 & season %in% c(1, 4))
##               datetime season holiday workingday weather  temp  atemp humidity
## 1008   2/14/2011 15:00      1       0          1       1 22.96 26.515       21
## 1010   2/14/2011 17:00      1       0          1       1 18.86 22.725       33
## 1011   2/14/2011 18:00      1       0          1       1 16.40 20.455       40
## 1015   2/14/2011 22:00      1       0          1       1 13.94 14.395       46
## 1018    2/15/2011 1:00      1       0          1       1 12.30 12.120       42
## 1019    2/15/2011 2:00      1       0          1       1 11.48 11.365       41
## 1120    2/19/2011 9:00      1       0          0       1 16.40 20.455       16
## 1124   2/19/2011 13:00      1       0          0       1 18.04 21.970       16
## 1125   2/19/2011 14:00      1       0          0       1 18.86 22.725       15
## 1126   2/19/2011 15:00      1       0          0       1 18.04 21.970       16
## 1127   2/19/2011 16:00      1       0          0       1 18.04 21.970       16
## 1128   2/19/2011 17:00      1       0          0       1 17.22 21.210       19
## 1259   2/25/2011 14:00      1       0          1       3 22.96 26.515       56
## 1260   2/25/2011 15:00      1       0          1       1 18.86 22.725       41
## 1262   2/25/2011 17:00      1       0          1       1 13.12 13.635       49
## 1265   2/25/2011 20:00      1       0          1       1 12.30 12.880       49
## 1333   2/28/2011 19:00      1       0          1       3 18.04 21.970       88
## 1334   2/28/2011 20:00      1       0          1       3 18.04 21.970       88
## 1478    3/6/2011 21:00      1       0          0       3  9.84  9.090       93
## 8068   12/7/2011 19:00      4       0          1       3 13.94 14.395       87
## 8069   12/7/2011 20:00      4       0          1       3 13.94 14.395       87
## 8706    1/3/2012 13:00      1       0          1       1  7.38  6.060       34
## 8943   1/13/2012 11:00      1       0          1       1  9.84  9.090       38
## 9644   2/11/2012 18:00      1       0          0       2  9.02  9.090       47
## 9647   2/11/2012 21:00      1       0          0       1  5.74  3.790       43
## 9653    2/12/2012 3:00      1       0          0       2  4.10  2.275       46
## 9654    2/12/2012 4:00      1       0          0       2  4.10  2.275       46
## 9662   2/12/2012 12:00      1       0          0       1  5.74  3.790       39
## 9957   2/24/2012 21:00      1       0          1       1 17.22 21.210       35
## 9959   2/24/2012 23:00      1       0          1       1 15.58 19.695       37
## 9971   2/25/2012 11:00      1       0          0       1 12.30 12.880       39
## 9972   2/25/2012 12:00      1       0          0       1 13.12 13.635       29
## 10169   3/4/2012 18:00      1       0          0       1 13.12 13.635       33
## 10193   3/5/2012 18:00      1       0          1       3 11.48 11.365       55
## 10260   3/8/2012 13:00      1       0          1       2 24.60 31.060       49
## 10261   3/8/2012 14:00      1       0          1       2 25.42 31.060       43
## 10262   3/8/2012 15:00      1       0          1       1 26.24 31.060       38
## 10263   3/8/2012 16:00      1       0          1       2 25.42 31.060       41
## 10264   3/8/2012 17:00      1       0          1       1 25.42 31.060       38
## 10290   3/9/2012 19:00      1       0          1       1 17.22 21.210       28
## 16208  11/13/2012 1:00      4       0          1       3 18.04 21.970       88
## 16473  11/24/2012 2:00      4       0          0       1 13.12 13.635       39
## 16483 11/24/2012 12:00      4       0          0       2 12.30 12.880       36
## 17150  12/22/2012 8:00      1       0          0       1 10.66 10.605       44
## 17154 12/22/2012 12:00      1       0          0       1 12.30 12.880       36
## 17345 12/30/2012 13:00      1       0          0       1 12.30 12.880       36
##       windspeed casual registered count          sources
## 1008    43.9989     19         71    90 www.google.co.uk
## 1010    40.9973     25        218   243      ad campaign
## 1011    40.9973     11        194   205      ad campaign
## 1015    43.9989      1         44    45 www.google.co.uk
## 1018    51.9987      0          5     5    www.google.fi
## 1019    46.0022      1          2     3    www.google.fi
## 1120    43.9989     18         37    55      Ad Campaign
## 1124    40.9973     52        103   155          Twitter
## 1125    43.9989    102         94   196          Twitter
## 1126    50.0021     84         87   171      ad campaign
## 1127    43.0006     39         81   120             blog
## 1128    40.9973     36         91   127      Ad Campaign
## 1259    40.9973     22         55    77     www.bing.com
## 1260    54.0020     31         98   129 www.google.co.uk
## 1262    50.0021     13        180   193   www.google.com
## 1265    40.9973      3         66    69           direct
## 1333    40.9973      8         76    84      Twitter    
## 1334    40.9973      8         47    55    facebook page
## 1478    40.9973      1          6     7      ad campaign
## 8068    43.0006      2         31    33      ad campaign
## 8069    43.0006      1         25    26      Twitter    
## 8706    43.9989      5         68    73 www.google.co.uk
## 8943    40.9973     12        102   114    www.yahoo.com
## 9644    43.9989      3        105   108             <NA>
## 9647    43.0006      5         43    48      Twitter    
## 9653    46.0022      0         14    14    www.yahoo.com
## 9654    47.9988      0          1     1     www.bing.com
## 9662    43.0006      7        133   140      ad campaign
## 9957    54.0020     12        138   150      ad campaign
## 9959    46.0022      9         71    80          Twitter
## 9971    40.9973     29        155   184   www.google.com
## 9972    43.9989     49        218   267   www.google.com
## 10169   40.9973     20        164   184             <NA>
## 10193   43.9989     12        363   375    www.yahoo.com
## 10260   43.0006     35        198   233     www.bing.com
## 10261   43.0006     48        155   203      ad campaign
## 10262   46.0022     24        161   185      ad campaign
## 10263   43.0006     37        305   342      AD campaign
## 10264   43.9989     52        545   597             blog
## 10290   40.9973     12        232   244    www.google.fi
## 16208   43.0006      0          5     5      Twitter    
## 16473   40.9973      5         29    34    facebook page
## 16483   40.9973     39        227   266           direct
## 17150   40.9973      8         75    83      ad campaign
## 17154   43.9989     30        169   199    www.yahoo.com
## 17345   43.9989     28        152   180      ad campaign

The subset has 46 rows, so there are 46 hourly observations with a wind speed of at least 40 in spring (season 1) or winter (season 4).
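The same filter can be written with ordinary logical indexing. The row count can also be computed directly with `sum()`. One difference matters when data are missing: `subset()` treats an `NA` condition as FALSE, while plain `df[cond, ]` would return an all-`NA` row. The sketch below uses hypothetical toy rows, one of which has a missing wind speed, and `which()` to reproduce the `subset()` behaviour.

```r
# Hypothetical toy rows; the last one has a missing wind speed
bikes <- data.frame(
  windspeed = c(43.9989, 12.998, 40.9973, 51.9987, 0, NA),
  season    = c(1L, 1L, 4L, 2L, 4L, 1L)
)

# subset() drops the row whose condition evaluates to NA
windy <- subset(bikes, windspeed >= 40 & season %in% c(1, 4))

# Equivalent base-R indexing; which() also discards NA positions
keep <- which(bikes$windspeed >= 40 & bikes$season %in% c(1, 4))
windy_idx <- bikes[keep, ]

# Counting matches directly, without building the subset
n_windy <- sum(bikes$windspeed >= 40 & bikes$season %in% c(1, 4), na.rm = TRUE)
print(nrow(windy))
print(n_windy)
```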