Questions 1-3: Conceptual

Question 4:

# Option 1:
bike <- read.table("bike_sharing_data.csv", sep=",", header=TRUE)

# Option 2:
bike1 <- read.csv("bike_sharing_data.csv")

# Option 3:
bike2 <- read.table("bike_sharing_data.txt", sep="\t", header=TRUE)

# Option 4:
bike3 <- read.delim("bike_sharing_data.txt")
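All four options import the bike sharing dataset correctly: Options 1 and 2 read the comma-separated .csv file, and Options 3 and 4 read the tab-delimited .txt file.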
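
One way to convince yourself that read.csv() is a convenience wrapper around read.table() is to read the same file both ways and compare the results. The sketch below uses a small invented file written to a temporary path (the rows and the names tmp, a, and b are hypothetical, not the real bike data), so it runs without the assignment files.

```r
# Toy data written to a temporary file; invented for illustration only.
tmp <- tempfile(fileext = ".csv")
writeLines(c("datetime,season,humidity",
             "1/1/2011 0:00,1,81",
             "1/1/2011 1:00,1,80"), tmp)

# Explicit separator and header, as in Option 1.
a <- read.table(tmp, sep = ",", header = TRUE)

# read.csv() sets sep = "," and header = TRUE by default, as in Option 2.
b <- read.csv(tmp)

# Compare the two data frames.
same <- identical(a, b)
same
```

The two functions differ in a few other defaults (quoting, comment characters, fill), so on messier files the results may not match exactly.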

Questions 5 and 6:

str(bike)
## 'data.frame':    17379 obs. of  13 variables:
##  $ datetime  : chr  "1/1/2011 0:00" "1/1/2011 1:00" "1/1/2011 2:00" "1/1/2011 3:00" ...
##  $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ weather   : int  1 1 1 1 1 2 1 1 1 1 ...
##  $ temp      : num  9.84 9.02 9.02 9.84 9.84 ...
##  $ atemp     : num  14.4 13.6 13.6 14.4 14.4 ...
##  $ humidity  : chr  "81" "80" "80" "75" ...
##  $ windspeed : num  0 0 0 0 0 ...
##  $ casual    : int  3 8 5 3 0 0 2 1 1 8 ...
##  $ registered: int  13 32 27 10 1 1 0 2 7 6 ...
##  $ count     : int  16 40 32 13 1 1 2 3 8 14 ...
##  $ sources   : chr  "ad campaign" "www.yahoo.com" "www.google.fi" "AD campaign" ...
5: The dataset contains 17,379 observations and 13 variables.
6: R reads the humidity variable as a character (chr) type, even though the values shown look numeric, which suggests the column contains at least one non-numeric entry.
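
A common way to track down why a numeric-looking column was read as chr is to attempt the conversion and list the values that fail. This is only a sketch on an invented vector: the "x61" entry is a made-up example of a bad value, and the source does not say what the real offending entries are.

```r
# Toy vector standing in for bike$humidity; the "x61" entry is invented.
humidity <- c("81", "80", "x61", "75")

# as.numeric() returns NA (with a warning) for strings it cannot parse.
num <- suppressWarnings(as.numeric(humidity))

# Positions that became NA during conversion but were not missing originally.
bad <- which(is.na(num) & !is.na(humidity))

# The raw strings that blocked numeric conversion.
humidity[bad]
```

On the real data, the same steps applied to bike$humidity would show which entries need cleaning before the column can be treated as numeric.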

Question 7:

bike[6251, "season"]
## [1] 4
The season value in row 6251 is 4.

Question 8:

table(bike$season)
## 
##    1    2    3    4 
## 4242 4409 4496 4232
Winter (season = 4) has 4,232 observations.
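
If only one season's count is needed, it can be pulled out of the table by name or computed directly with a logical test. A minimal sketch on an invented vector (the values in season are hypothetical, not the real data):

```r
# Toy season codes, invented for illustration; not the real bike data.
season <- c(1, 4, 4, 2, 3, 4)

# Frequency table of all codes, then look up one level by its name.
counts <- table(season)
n_from_table <- counts[["4"]]

# Count the matching entries directly with a logical comparison.
n_from_sum <- sum(season == 4)
```

Both approaches give the same number; the lookup uses the character name "4" because table() stores the codes as names.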

Question 9: Conceptual

Question 10:

nrow(subset(bike, season %in% c(1, 4) & windspeed >= 40))
## [1] 46
There are 46 observations with a high wind threat or above (windspeed >= 40 mph) recorded during winter (season = 4) or spring (season = 1).
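
The same count can also be obtained without subset() by summing a logical vector. The sketch below uses an invented data frame (toy and its values are hypothetical; only the column names follow str(bike)) and includes a missing windspeed to show how the two approaches treat NA.

```r
# Toy data frame, invented for illustration; column names follow str(bike).
toy <- data.frame(
  season    = c(1, 4, 2, 4, 1),
  windspeed = c(45, 39.9, 50, 40, NA)
)

# subset() keeps only rows where the condition is TRUE, so NA rows are dropped.
n_subset <- nrow(subset(toy, season %in% c(1, 4) & windspeed >= 40))

# Same count from a logical vector; na.rm = TRUE drops the NA result
# so it matches subset()'s handling of missing values.
n_sum <- sum(toy$season %in% c(1, 4) & toy$windspeed >= 40, na.rm = TRUE)
```

Here %in% tests membership in the set of season codes, which is shorter than writing season == 1 | season == 4.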