Chicago Transit API

Explanation of the service

The API I will be looking into is the Chicago Transit Authority (CTA) API, which lets you see both the buses and the subway system for Chicago. Among the things you can see are delays and scheduled service, and additionally you can see customer satisfaction. One of the main things I want to look into is which line of the L, the primary subway system in Chicago, most often has the most delays, whether these delays occur at the same spot, and where the delayed trains are coming from. The focus is in particular on the trains.

Setting up the API

To find the service initially, go to https://www.transitchicago.com/developers/traintracker/. On this website you have the opportunity to apply for an API key. Once you have received your key, you can go to http://lapi.transitchicago.com/api/1.0/ttarrivals.aspx? and, after the question mark, enter key= followed by your key, together with the request parameters listed below. Some of the fields you can then see are:
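Since every request is just this base address plus key=value pairs joined with &, it can help to build the URL in one place rather than typing it by hand. A minimal sketch follows; build_arrivals_url() is a hypothetical helper written for this report, not part of the CTA service or any R package, and it assumes the key contains only URL-safe characters.

```r
# Hypothetical helper (not part of the CTA or any R package): assemble a
# Train Tracker arrivals URL from an API key, a station mapid and an optional max.
build_arrivals_url <- function(key, mapid, max = NULL) {
  # c() drops NULL entries, so max is only included when it is supplied
  params <- c(key = key, mapid = mapid, max = max)
  query <- paste(names(params), params, sep = "=", collapse = "&")
  paste0("https://lapi.transitchicago.com/api/1.0/ttarrivals.aspx?", query)
}

build_arrivals_url("YOUR_KEY", 40380, max = 40)
# "https://lapi.transitchicago.com/api/1.0/ttarrivals.aspx?key=YOUR_KEY&mapid=40380&max=40"
```

The result can be passed straight to read_xml(), which keeps the station and result limit visible as arguments instead of buried in a long string.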

Request Parameters and Response Fields

Request parameters

mapid = the five-digit code telling the server which station you are looking into
stpid = a five-digit code selecting a specific stop within a station
max = the maximum number of results to return

Identifiers of the Station

staNm = station name
staId = station identifier (the same value as the mapid sent in the request)
stpId = a five-digit code indicating the specific stop
rn = run number of the train (the line itself is reported in a separate rt field)
destNm = destination name

Time fields

arrT = the date and time at which the train is predicted to arrive at the station
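Because arrT comes back as text, it has to be parsed before you can measure how far away a train is. The sketch below assumes the timestamp layout yyyyMMdd HH:mm:ss (for example 20240315 14:05:30) and the Chicago time zone; confirm both against your own response. parse_cta_time() and minutes_until() are hypothetical helpers for illustration.

```r
# Assumption: arrT strings look like "20240315 14:05:30" (yyyyMMdd HH:mm:ss)
# and are Chicago local time. Strings that do not match become NA.
parse_cta_time <- function(x) {
  as.POSIXct(x, format = "%Y%m%d %H:%M:%S", tz = "America/Chicago")
}

# Minutes between `now` and the predicted arrival time(s)
minutes_until <- function(arrT, now = Sys.time()) {
  as.numeric(difftime(parse_cta_time(arrT), now, units = "mins"))
}

minutes_until("20240315 14:05:30", now = parse_cta_time("20240315 14:00:00"))
# 5.5
```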

Location Fields

lon = longitude of the train
lat = latitude of the train
heading = direction the train is facing, in degrees (0 = north, 90 = east)
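One possible use of these coordinates, for example checking whether delayed trains tend to be at the same spot, is to measure the distance between two reported positions. A minimal sketch using the haversine formula, assuming a spherical Earth of radius 6371 km (an approximation); haversine_km() is a hypothetical helper, and lat/lon from the API must first be converted with as.numeric().

```r
# Great-circle distance in kilometres between points given in decimal degrees,
# using the haversine formula on a sphere of radius 6371 km (an approximation).
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing a just above 1
  2 * radius_km * asin(sqrt(pmin(a, 1)))
}

haversine_km(0, 0, 0, 1)  # one degree of longitude at the equator, about 111.19 km
```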

Setting up the API in R

library(tidyverse)


library(xml2)

# Keep the API key out of the script: read it from an environment variable
# (CTA_API_KEY is a name chosen for this report, e.g. set it in ~/.Renviron)
cta_key <- Sys.getenv("CTA_API_KEY")

# mapid = 40380 selects the Clark station; max = 40 caps the number of predictions
clark <- read_xml(paste0(
  "https://lapi.transitchicago.com/api/1.0/ttarrivals.aspx?key=", cta_key,
  "&mapid=40380&max=40"
))

# Parameters
# mapid is only a request parameter and does not appear in the response;
# staId below carries the same value
stpId <- xml_text(xml_find_all(clark, ".//stpId"))

# Station Identifiers Fields
staNm <- xml_text(xml_find_all(clark, ".//staNm"))
staId <- xml_text(xml_find_all(clark, ".//staId"))
rn <- xml_text(xml_find_all(clark, ".//rn"))
destNm <- xml_text(xml_find_all(clark, ".//destNm"))

# Time Fields
arrT <- xml_text(xml_find_all(clark, ".//arrT"))

# Location Based Fields
lon <- xml_text(xml_find_all(clark, ".//lon"))
lat <- xml_text(xml_find_all(clark, ".//lat"))
heading <- xml_text(xml_find_all(clark, ".//heading"))

# One row per predicted arrival; tibble() replaces the deprecated data_frame()
train_df <- tibble(staNm, staId, stpId, rn, destNm, arrT, lon, lat, heading)
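Pulling each field with its own xml_find_all() works as long as every prediction contains every element; if one record lacks a field, the vectors end up different lengths and the columns no longer line up. A sturdier sketch, assuming (per the CTA documentation) that each prediction is wrapped in an eta element, reads the fields one record at a time. parse_arrivals() is a hypothetical helper, and the sample XML uses made-up values.

```r
library(xml2)

# One row per <eta> element. xml_find_first() returns a missing node when a
# field is absent, and xml_text() turns that into NA, so every column has
# exactly one entry per train.
parse_arrivals <- function(doc) {
  etas <- xml_find_all(doc, ".//eta")
  field <- function(name) xml_text(xml_find_first(etas, paste0("./", name)))
  data.frame(
    staNm = field("staNm"),
    stpId = field("stpId"),
    rn = field("rn"),
    destNm = field("destNm"),
    arrT = field("arrT"),
    lat = as.numeric(field("lat")),
    lon = as.numeric(field("lon")),
    stringsAsFactors = FALSE
  )
}

# Made-up sample response: the second train has no position fields
sample_xml <- read_xml(
  "<ctatt>
     <eta><staNm>Example</staNm><stpId>30001</stpId><rn>101</rn><destNm>Loop</destNm>
          <arrT>20240315 14:05:30</arrT><lat>41.88</lat><lon>-87.63</lon></eta>
     <eta><staNm>Example</staNm><stpId>30002</stpId><rn>102</rn><destNm>Loop</destNm>
          <arrT>20240315 14:09:00</arrT></eta>
   </ctatt>"
)
parse_arrivals(sample_xml)
```

With the live feed, parse_arrivals(clark) would give one row per prediction, with NA wherever a field is missing.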

Examining the Clark Results

To run this code you have to work with XML, an older data format than JSON. Looking at the data, you can see that when creating clark I limited the request to 40 results (max=40). I arrived at this by trial and error: I started with 5, 10, and 20, ultimately used 40, and found that there were only 29 trains predicted to arrive at the Clark station. Interestingly, if you wait five or so minutes and run the code again, the results update and are different, which means that if you were in charge of tracking arrivals, you could do it through the API.
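Because the results change every few minutes, one way to track them is to re-run the request on a schedule and stack the snapshots with the time each was pulled. A minimal sketch: poll_arrivals() is a hypothetical helper, and fetch stands for any function returning a data frame, such as one wrapping the read_xml() call and field extraction from the code above. The five-minute default echoes the observation above, not a CTA recommendation.

```r
# Hypothetical helper: call fetch() `times` times, `wait_secs` seconds apart,
# tag each snapshot with the time it was pulled, and stack them into one data frame.
poll_arrivals <- function(fetch, times = 3, wait_secs = 300) {
  snapshots <- vector("list", times)
  for (i in seq_len(times)) {
    snap <- fetch()
    snap$pulled_at <- rep(Sys.time(), nrow(snap))
    snapshots[[i]] <- snap
    # no need to wait after the final pull
    if (i < times) Sys.sleep(wait_secs)
  }
  do.call(rbind, snapshots)
}
```

Keeping the pull time on every row makes it possible to compare the same run number (rn) across snapshots and see how its predicted arrival shifts.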

Summary

In summary, the Chicago Transit Authority API lets you see a multitude of trains throughout the Chicago area, keep track of them, and collect data on how the trains are arriving, which routes reach a given location the quickest, and so on. I think this API is very interesting for a data analyst because it is continuously updating, so if you tracked it for a week at the same time each day you could see some very interesting results.