This document is a hands-on lab using the bike data provided with Lab 1 of the UT Austin MOOC on the foundations of data science.
We will be working on the BikeData file.
The first step is to install the SDSFoundations package. It is available at: https://preview.edx.org/c4x/UTAustinX/UT.7.01x/asset/SDSFoundations_1.1.zip
The next step is to load the library. Once this is done, the bike data should be available. Let us assign it to our environment.
library(SDSFoundations)
BikeData<-BikeData
Let us explore this data set.
What is the age of the 7th rider in the dataset?
str(BikeData)
## 'data.frame': 121 obs. of 9 variables:
## $ user_id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ age : int 28 35 28 44 42 36 45 54 39 44 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 2 ...
## $ student : int 1 0 0 0 0 0 0 0 0 0 ...
## $ employed: int 1 1 1 1 1 1 1 1 1 0 ...
## $ cyc_freq: Factor w/ 4 levels "Daily","Less than once a month",..: 1 1 1 2 4 4 4 4 4 3 ...
## $ distance: num 3.25 1.11 5.59 3.24 7.81 ...
## $ time : int 15 5 23 24 26 20 51 39 50 44 ...
## $ speed : num 13 13.3 14.6 8.1 18 ...
BikeData$age[7]
## [1] 45
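As a small illustration of positional indexing, the sketch below rebuilds only the first ten ages listed in the str() output (the names first_ages and toy are made up for this example) and looks up the 7th value in two equivalent ways.

```r
# First ten ages, copied from the str() output above (not the full data set)
first_ages <- c(28L, 35L, 28L, 44L, 42L, 36L, 45L, 54L, 39L, 44L)

# R vectors are 1-indexed, so [7] is the 7th rider
seventh_age <- first_ages[7]

# The same lookup on a data frame uses [row, column]
toy <- data.frame(user_id = 1:10, age = first_ages)
seventh_age_df <- toy[7, "age"]

print(seventh_age)
```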
How many of the first 10 riders in the dataset ride daily?
sum(BikeData$user_id <= 10 & BikeData$cyc_freq == "Daily")
## [1] 3
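Counting riders that meet a condition comes down to counting the TRUE values in a logical vector. The sketch below shows this on a toy copy of the first ten rows, rebuilt from the str() output (first10, is_daily and the other names are hypothetical helpers, not part of the package).

```r
# Toy copy of the first ten rows, rebuilt from the str() output above
freq_levels <- c("Daily", "Less than once a month",
                 "Several times per month", "Several times per week")
first10 <- data.frame(
  user_id  = 1:10,
  cyc_freq = factor(freq_levels[c(1, 1, 1, 2, 4, 4, 4, 4, 4, 3)],
                    levels = freq_levels)
)

# A comparison returns one TRUE/FALSE per rider
is_daily <- first10$cyc_freq == "Daily"

# TRUE counts as 1, so sum() gives the number of matching riders
daily_count <- sum(is_daily)

# Equivalent: keep the matching rows (note the comma) and count them
daily_rows <- nrow(first10[is_daily, ])

print(daily_count)
```

The comma matters when subsetting a data frame by rows: a single index without a comma refers to columns, not rows.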
What is the speed of the first female who cycles less than one time per month (in miles/hour)?
levels(BikeData$cyc_freq)
## [1] "Daily" "Less than once a month"
## [3] "Several times per month" "Several times per week"
BikeData$speed[BikeData$cyc_freq=="Less than once a month" & BikeData$gender=='F']
## [1] 8.1 14.8
Two riders match; the first one listed rides at 8.1 miles/hour.
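When only the first match is wanted, the filtered vector can be indexed again with [1]. A minimal sketch on the first five riders, using the values as displayed (and rounded) by str(); the names toy and is_match are assumptions for illustration:

```r
# First five riders, with values as displayed by str() above
toy <- data.frame(
  gender   = factor(c("M", "M", "M", "F", "M")),
  cyc_freq = c("Daily", "Daily", "Daily",
               "Less than once a month", "Several times per week"),
  speed    = c(13, 13.3, 14.6, 8.1, 18)
)

# TRUE for female riders who cycle less than once a month
is_match <- toy$gender == "F" & toy$cyc_freq == "Less than once a month"

# Speed of the first matching rider
first_speed <- toy$speed[is_match][1]

# Row position of that rider
first_row <- which(is_match)[1]

print(first_speed)
```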
What type of variable is student?
class(BikeData$student)
## [1] "integer"
What type of variable is cyc_freq?
class(BikeData$cyc_freq)
## [1] "factor"
What type of variable is distance?
class(BikeData$distance)
## [1] "numeric"
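The three class() calls can also be made in one pass with sapply(), which applies a function to every column of a data frame. This is a sketch on a toy data frame holding the first five values shown by str() (toy and col_types are made-up names):

```r
# Three columns with the first five values shown by str() above
toy <- data.frame(
  student  = c(1L, 0L, 0L, 0L, 0L),
  cyc_freq = factor(c("Daily", "Daily", "Daily",
                      "Less than once a month", "Several times per week")),
  distance = c(3.25, 1.11, 5.59, 3.24, 7.81)
)

# One class per column, returned as a named character vector
col_types <- sapply(toy, class)

print(col_types)
```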
How many students are in the dataset?
table(BikeData$student)
##
## 0 1
## 107 14
Since we only want to work with the student data, let’s create a new data frame that only includes students.
student <- BikeData[BikeData$student == 1, ]
We want to know how often the students ride.
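Base R's subset() is a common alternative to bracket indexing for this kind of row filter. The sketch below compares the two on a toy copy of the first ten rows (the data frame toy and both result names are hypothetical):

```r
# Toy copy of the first ten rows, rebuilt from the str() output above
toy <- data.frame(
  user_id = 1:10,
  student = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  age     = c(28L, 35L, 28L, 44L, 42L, 36L, 45L, 54L, 39L, 44L)
)

# Bracket indexing: the logical vector picks rows, the empty slot keeps all columns
students_idx <- toy[toy$student == 1, ]

# subset() evaluates the condition inside the data frame, so no toy$ prefix is needed
students_sub <- subset(toy, student == 1)

print(students_idx)
```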
table(student$cyc_freq)
##
## Daily Less than once a month Several times per month
## 8 0 0
## Several times per week
## 6
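The zero counts appear because a factor keeps all of its levels after subsetting, even those no remaining row uses. As a sketch, the student counts above can be rebuilt as a factor (student_freq is a made-up name) and the unused levels removed with droplevels():

```r
freq_levels <- c("Daily", "Less than once a month",
                 "Several times per month", "Several times per week")

# Rebuild the students' riding frequencies from the counts above: 8 daily, 6 weekly
student_freq <- factor(
  rep(c("Daily", "Several times per week"), times = c(8, 6)),
  levels = freq_levels
)

# table() reports every level, including the unused ones
all_counts <- table(student_freq)

# droplevels() removes levels with no observations
used_counts <- table(droplevels(student_freq))

# Counts converted to proportions of all students
shares <- prop.table(used_counts)

print(used_counts)
```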
We also want to know how far the students travel on average. Let’s create a vector of just the distances.
distance <- student$distance
distance
## [1] 3.25 10.94 9.34 1.25 9.29 2.77 4.84 6.56 0.85 6.07 3.52
## [12] 13.43 7.19 8.31
Now let’s find the average distance ridden by the students, using the mean function.
mean(distance)
## [1] 6.257857
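The mean is simply the sum of the values divided by how many there are. The sketch below checks this by hand on the 14 distances printed above (manual_mean and builtin_mean are names chosen for this example):

```r
# The 14 student distances, copied from the printed vector above
distance <- c(3.25, 10.94, 9.34, 1.25, 9.29, 2.77, 4.84,
              6.56, 0.85, 6.07, 3.52, 13.43, 7.19, 8.31)

# Mean by hand: total distance divided by the number of riders
manual_mean <- sum(distance) / length(distance)

# Built-in equivalent
builtin_mean <- mean(distance)

print(round(builtin_mean, 6))
```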