Lab Instructions

This lab is designed to give you practice acquiring data from an external site and reading it into R. In addition, it should give you more practice using R and Markdown, and provide experience using the R language when producing a

Part 1: Data

  • Find a data set you’re interested in from Seattle’s Open Data Portal at: https://data.seattle.gov/
  • Download that dataset as a CSV file and save it to your computer.
  • Write a code chunk to import the dataset into R using read.csv().

# Point R at the folder that holds the downloaded CSV
setwd("~/Documents/Data-Driven Analytics/Class Exercises/Homeworks/Lab 2")
# Read the counter data into a data frame
data <- read.csv("Burke_Gilman_Trail_Bike_and_Ped_Counter.csv")
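
The import above depends on the working directory. A minimal sketch of an alternative that reads the file by full path is shown here, under a few assumptions: `data_dir`, `csv_path`, and `bgt` are hypothetical names, `tempdir()` stands in for the real download folder, and the two sample rows are copied from the head() output in Part 2 so the example runs on its own. The header line is reconstructed from the column names R reports and may not match the real file exactly.

```r
# Hypothetical folder; tempdir() stands in for the real download location
data_dir <- tempdir()
csv_path <- file.path(data_dir, "Burke_Gilman_Trail_Bike_and_Ped_Counter.csv")

# Write a tiny sample file so the sketch is self-contained
# (header text is an assumption reconstructed from names(data))
writeLines(c(
  "Date,BGT North of NE 70th Total,Ped South,Ped North,Bike North,Bike South",
  "3/31/19 23:00,2,0,0,1,1",
  "3/31/19 22:00,5,0,0,3,2"
), csv_path)

# Read by full path; keep Date as character text rather than a factor
bgt <- read.csv(csv_path, stringsAsFactors = FALSE)
str(bgt)
```

Reading by path means the chunk does not change global state with setwd(), and `stringsAsFactors = FALSE` keeps the Date column as plain text, which is easier to convert to a date-time later.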

Part 2: Look at the data

Tell me something about the data you downloaded. Why do you think it’s interesting? How large is it, and what variables does it contain? What kind of information is available? What kinds of questions might it let you answer? Use and display a few of the commands we’ve learned (table, head, dim, names), but make sure the output displays in a readable way. Remember to create separate chunks of code, as shown below, and to discuss what you see in the output.

Name of Dataset: Burke Gilman Trail north of NE 70th St Bike and Ped Counter Link

This dataset comes from a bike and pedestrian counter on the Burke-Gilman Trail north of NE 70th St. It has 45,984 rows and 6 columns, with one row per hour. The variables are: Date (the date and hour of the count), BGT North of NE 70th Total (total bikes and pedestrians counted in that hour), Ped South, Ped North, Bike North, and Bike South. Apart from the timestamp, the dataset contains purely quantitative count data.

I think this dataset is interesting because it comes from one of the few pedestrian/bike counters in the City, on a heavily used path. It would be interesting to see how pedestrian and bike use changes over time.

head(data)
##            Date BGT.North.of.NE.70th.Total Ped.South Ped.North Bike.North
## 1 3/31/19 23:00                          2         0         0          1
## 2 3/31/19 22:00                          5         0         0          3
## 3 3/31/19 21:00                          4         0         0          2
## 4 3/31/19 20:00                         12         2         3          4
## 5 3/31/19 19:00                         60        10        14         18
## 6 3/31/19 18:00                        142        19        25         45
##   Bike.South
## 1          1
## 2          2
## 3          2
## 4          3
## 5         18
## 6         53
summary(data)
##            Date       BGT.North.of.NE.70th.Total   Ped.South     
##  1/1/14 0:00 :    1   Min.   :    0.00           Min.   :   0.0  
##  1/1/14 1:00 :    1   1st Qu.:    3.00           1st Qu.:   0.0  
##  1/1/14 10:00:    1   Median :   32.00           Median :   4.0  
##  1/1/14 11:00:    1   Mean   :   75.37           Mean   :  22.1  
##  1/1/14 12:00:    1   3rd Qu.:   92.00           3rd Qu.:  14.0  
##  1/1/14 13:00:    1   Max.   :10493.00           Max.   :4054.0  
##  (Other)     :45978   NA's   :2335               NA's   :2335    
##    Ped.North          Bike.North       Bike.South     
##  Min.   :   0.000   Min.   :  0.00   Min.   :   0.00  
##  1st Qu.:   0.000   1st Qu.:  1.00   1st Qu.:   1.00  
##  Median :   4.000   Median :  8.00   Median :   9.00  
##  Mean   :   9.975   Mean   : 21.79   Mean   :  21.51  
##  3rd Qu.:  12.000   3rd Qu.: 29.00   3rd Qu.:  32.00  
##  Max.   :4095.000   Max.   :794.00   Max.   :8191.00  
##  NA's   :2335       NA's   :2335     NA's   :2335
dim(data)
## [1] 45984     6
names(data)
## [1] "Date"                       "BGT.North.of.NE.70th.Total"
## [3] "Ped.South"                  "Ped.North"                 
## [5] "Bike.North"                 "Bike.South"
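
The summary() output reports NA's in every count column. A minimal sketch of counting missing values per column is shown here, assuming a small stand-in data frame named `sample_counts` (a hypothetical name). Its first three rows reuse Ped South and Ped North values from the head() output; the last row, with missing counts, is made up for illustration.

```r
# Small stand-in for the real data frame; the NA row is illustrative only
sample_counts <- data.frame(
  Date = c("3/31/19 20:00", "3/31/19 19:00", "3/31/19 18:00", "1/1/14 0:00"),
  Ped.South = c(2, 10, 19, NA),
  Ped.North = c(3, 14, 25, NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(sample_counts))
print(na_per_column)

# Keep only the rows that have no missing value in any column
complete_rows <- sample_counts[complete.cases(sample_counts), ]
print(dim(complete_rows))
```

Running the same two calls on the full data frame would show whether the missing counts fall on the same rows across columns.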

Part 3: Modifying data

In future weeks we’ll learn about modifying data: how to change the orientation of a figure, remove some entries, and many other things. Look at your data and think about what would make the raw data more useful. Are dates entered in the wrong form? Do you need locations aggregated to a higher level? Are words entered inconsistently (e.g., Seattle, seattle, SEATTLE)? Start to think ahead about what you want to learn to make the data as useful as possible for you. If you are experienced with R, go ahead and modify one of the columns.
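
As one illustration of the inconsistent-spelling case mentioned above, here is a minimal sketch that normalizes case and stray whitespace with base R. The vector `city` is a hypothetical example; the entry with surrounding spaces is an added assumption about how such data might be entered.

```r
# Hypothetical column with the same city entered several ways
city <- c("Seattle", "seattle", "SEATTLE", " Seattle ")

# Before cleaning, table() treats each spelling as a separate value
print(table(city))

# Trim surrounding whitespace, then lower-case everything
city_clean <- tolower(trimws(city))
print(table(city_clean))
```

After cleaning, all four entries collapse into a single category, so counts and groupings treat them as the same city.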

I think it would be useful to condense the rows into morning, afternoon, and evening date/time chunks. Right now the rows are hourly, which produces more data than is useful.
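
A minimal sketch of that condensing step is shown here, assuming the Date text follows the month/day/two-digit-year hour:minute pattern seen in the head() output. The names `hourly`, `stamp`, and `by_part` are hypothetical, the hour cut points are assumptions, and a "night" bucket is added so that every hour falls somewhere. The six sample rows reuse the Date and total values from head().

```r
# Six hourly rows copied from the head() output
hourly <- data.frame(
  Date = c("3/31/19 23:00", "3/31/19 22:00", "3/31/19 21:00",
           "3/31/19 20:00", "3/31/19 19:00", "3/31/19 18:00"),
  BGT.North.of.NE.70th.Total = c(2, 5, 4, 12, 60, 142),
  stringsAsFactors = FALSE
)

# Parse the text timestamps into date-times (UTC is an arbitrary choice here)
stamp <- as.POSIXct(hourly$Date, format = "%m/%d/%y %H:%M", tz = "UTC")
hour_of_day <- as.integer(format(stamp, "%H"))

# Assumed buckets: 0-5 night, 6-11 morning, 12-17 afternoon, 18-23 evening
hourly$part_of_day <- cut(
  hour_of_day,
  breaks = c(-1, 5, 11, 17, 23),
  labels = c("night", "morning", "afternoon", "evening")
)
hourly$day <- as.Date(stamp)

# Sum the hourly totals within each day and part of day
by_part <- aggregate(
  BGT.North.of.NE.70th.Total ~ day + part_of_day,
  data = hourly,
  FUN = sum
)
print(by_part)
```

With the formula interface, aggregate() drops rows that have NA in the summed column by default, which matters for the full dataset because each count column contains missing values.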

I think the columns are good as they are; besides the date and time, there are only 5 variables, and they are all useful. If I were to combine columns, I would combine Ped North with Ped South, and Bike North with Bike South.
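
A minimal sketch of that combination is shown here, using rows 4 to 6 of the head() output. The data frame `counts` and the new columns `Ped.Total` and `Bike.Total` are hypothetical names.

```r
# Rows 4-6 of the head() output
counts <- data.frame(
  BGT.North.of.NE.70th.Total = c(12, 60, 142),
  Ped.South  = c(2, 10, 19),
  Ped.North  = c(3, 14, 25),
  Bike.North = c(4, 18, 45),
  Bike.South = c(3, 18, 53)
)

# Add the two directions together for each mode of travel
counts$Ped.Total  <- counts$Ped.South + counts$Ped.North
counts$Bike.Total <- counts$Bike.North + counts$Bike.South

# Check that the combined columns add up to the reported overall total
print(all(counts$Ped.Total + counts$Bike.Total ==
            counts$BGT.North.of.NE.70th.Total))
print(counts[, c("Ped.Total", "Bike.Total")])
```

Because `+` propagates NA, any hour with a missing count in either direction would also be NA in the combined column.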

Part 4: Final Project

  • Look at the sources of life logging data you have available now. Make a list of the data sources you already have (already are signed up for).
  • As a second (nested) point on each item, identify whether it is passive, active, or administrative data.
  • Answer whether you can find a way to download the data you generate. If yes, post a link to where. If no, can you copy the information each day? How many sources do you have? Is there any type of data you lack, and is there anything else you’re going to look into acquiring? Remember, you’ll want to start data collection next week.
  1. Steps/Sleep/Heart Rate Logging on Fitbit
  2. Credit Card/Debit Card Data
      • Administrative
      • Would have to enter by hand.
  3. RescueTime
  4. AskMeEvery