This document is a hands-on lab on the bike data provided with Lab 1 of the UT Austin MOOC course on the foundations of data science.

We will be working on the BikeData file.

The first step is to install the SDSFoundations package. It is available at: https://preview.edx.org/c4x/UTAustinX/UT.7.01x/asset/SDSFoundations_1.1.zip

The next step is to load the library. Once this is done, the bike data should be available. Let us assign it to our environment.

library(SDSFoundations)
BikeData<-BikeData

Let us explore this data set.

What is the age of the 7th rider in the dataset?

str(BikeData)
## 'data.frame':    121 obs. of  9 variables:
##  $ user_id : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ age     : int  28 35 28 44 42 36 45 54 39 44 ...
##  $ gender  : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 2 ...
##  $ student : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ employed: int  1 1 1 1 1 1 1 1 1 0 ...
##  $ cyc_freq: Factor w/ 4 levels "Daily","Less than once a month",..: 1 1 1 2 4 4 4 4 4 3 ...
##  $ distance: num  3.25 1.11 5.59 3.24 7.81 ...
##  $ time    : int  15 5 23 24 26 20 51 39 50 44 ...
##  $ speed   : num  13 13.3 14.6 8.1 18 ...
BikeData$age[7]
## [1] 45

How many of the first 10 riders in the dataset ride daily?

# Count the rows that meet both conditions; sum() counts the TRUE values
sum(BikeData$user_id<=10 & BikeData$cyc_freq=="Daily")
## [1] 3
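
As a minimal sketch of row counting, the snippet below rebuilds the first ten rows from the str() output above in a stand-in data frame (the names first10 and freq_levels are assumptions for illustration, not part of SDSFoundations). Summing a logical vector counts its TRUE values. Note that indexing a data frame with a single logical vector and no comma selects columns rather than rows, so the comma in [rows, ] matters when filtering riders.

```r
# Stand-in for the first ten rows of BikeData, copied from the str() output.
# 'first10' and 'freq_levels' are hypothetical names used only here.
freq_levels <- c("Daily", "Less than once a month",
                 "Several times per month", "Several times per week")
first10 <- data.frame(
  user_id  = 1:10,
  cyc_freq = factor(freq_levels[c(1, 1, 1, 2, 4, 4, 4, 4, 4, 3)],
                    levels = freq_levels)
)

# TRUE for each row that is among the first ten riders and rides daily
is_daily <- first10$user_id <= 10 & first10$cyc_freq == "Daily"

# sum() counts the TRUE values
n_daily <- sum(is_daily)

# Equivalent: filter rows (note the comma) and count them
n_daily_rows <- nrow(first10[is_daily, ])

print(n_daily)
```

Both approaches count riders, not variables, so they stay correct even when the number of matches differs from the number of columns.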

What is the speed of the first female who cycles less than one time per month (in miles/hour)?

levels(BikeData$cyc_freq)
## [1] "Daily"                   "Less than once a month" 
## [3] "Several times per month" "Several times per week"
BikeData$speed[BikeData$cyc_freq=="Less than once a month" & BikeData$gender=='F']
## [1]  8.1 14.8

Two riders match both conditions; the first of them rides at 8.1 miles/hour.
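
To return only the first matching rider instead of every match, one option is to combine which() with an index of 1. The sketch below uses a stand-in data frame built from the first five rows shown in the str() output (the name first5 is an assumption, and the speeds are the rounded values that str() displays).

```r
# Stand-in for the first five rows of BikeData, with values as displayed
# (and rounded) in the str() output. 'first5' is a hypothetical name.
first5 <- data.frame(
  gender   = factor(c("M", "M", "M", "F", "M"), levels = c("F", "M")),
  cyc_freq = c("Daily", "Daily", "Daily",
               "Less than once a month", "Several times per week"),
  speed    = c(13, 13.3, 14.6, 8.1, 18)
)

# Row numbers of all riders matching both conditions
matches <- which(first5$cyc_freq == "Less than once a month" &
                 first5$gender == "F")

# Speed of the first matching rider only
first_speed <- first5$speed[matches[1]]
print(first_speed)
```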

What type of variable is student?

class(BikeData$student)
## [1] "integer"

What type of variable is cyc_freq?

class(BikeData$cyc_freq)
## [1] "factor"

What type of variable is distance?

class(BikeData$distance)
## [1] "numeric"
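
Note that class() reports how R stores a column, which is not always the same as the kind of variable it represents. The student column is stored as an integer, but its 0/1 values act as codes for two groups. A minimal sketch, using the first ten student values from the str() output and the assumed labels "No" and "Yes" (the names student_flag and student_factor are also hypothetical):

```r
# First ten values of the student column, copied from the str() output
student_flag <- c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
class(student_flag)   # storage type: "integer"

# Relabel the 0/1 codes as a factor; the labels "No"/"Yes" are an assumption
student_factor <- factor(student_flag, levels = c(0, 1),
                         labels = c("No", "Yes"))
class(student_factor)

# Count riders in each group
print(table(student_factor))
```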

How many students are in the dataset?

table(BikeData$student)
## 
##   0   1 
## 107  14

Since we only want to work with the student data, let’s create a new data frame that only includes students.

student <- BikeData[BikeData$student == 1, ]

We want to know how often the students ride.

table(student$cyc_freq)
## 
##                   Daily  Less than once a month Several times per month 
##                       8                       0                       0 
##  Several times per week 
##                       6
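
The table lists all four levels because a factor keeps its full set of levels after subsetting, even when some have no observations left. If the zero counts are not wanted, droplevels() can remove them. A minimal sketch, using a stand-in factor rebuilt from the counts shown above (student_freq and freq_levels are hypothetical names):

```r
# Stand-in for student$cyc_freq, rebuilt from the counts in the table above
freq_levels <- c("Daily", "Less than once a month",
                 "Several times per month", "Several times per week")
student_freq <- factor(rep(c("Daily", "Several times per week"),
                           times = c(8, 6)),
                       levels = freq_levels)

# table() reports every level, including those with zero riders
all_levels <- table(student_freq)

# droplevels() removes the unused levels before tabulating
used_only <- table(droplevels(student_freq))
print(used_only)
```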

We also want to know how far the students travel on average. Let’s create a vector of just the distances.

distance <-student$distance
distance
##  [1]  3.25 10.94  9.34  1.25  9.29  2.77  4.84  6.56  0.85  6.07  3.52
## [12] 13.43  7.19  8.31

Now let’s find the average distance ridden by the students, using the mean function.

mean(distance)
## [1] 6.257857
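
As a check on what mean() does, the average can also be computed by hand as the total distance divided by the number of students. The sketch below retypes the distances as printed above (rounded to two decimals, so the underlying values may carry more digits); manual_mean and builtin_mean are names introduced only for this illustration.

```r
# Student distances as printed above (rounded to two decimals)
distance <- c(3.25, 10.94, 9.34, 1.25, 9.29, 2.77, 4.84,
              6.56, 0.85, 6.07, 3.52, 13.43, 7.19, 8.31)

# The mean is the total distance divided by the number of riders
manual_mean <- sum(distance) / length(distance)

# mean() gives the same result
builtin_mean <- mean(distance)
print(c(manual_mean, builtin_mean))
```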