IS607 - Project 2 - Trains (Dataset 1 of 3)

Steps

Attach libraries, load and show raw data

library(tidyr)
library(dplyr)
library(knitr)

rawdata = read.csv("https://raw.githubusercontent.com/L-Velasco/Fall16_IS607/master/Project%202_Train.csv")
kable(rawdata)

Trains	Arrival	Atlantis	El.Dorado	Hyperborea	Narnia	Valhalla
shinkansen	on_time	497	221	212	503	1841
shinkansen	delayed	62	12	20	102	305
tgv	on_time	694	4840	383	320	201
tgv	delayed	117	415	65	129	61

Tidy the data

Used gather() to make the city destination variables be part of observations and spread() to make the arrival type (on time, delayed) be variables. This step reshapes the data from “wide” to “tall” presentation.

tidydata <- rawdata %>%
  gather("destination","freq",3:7) %>%
  spread(Arrival,freq)
kable(tidydata)

Trains	destination	delayed	on_time
shinkansen	Atlantis	62	497
shinkansen	El.Dorado	12	221
shinkansen	Hyperborea	20	212
shinkansen	Narnia	102	503
shinkansen	Valhalla	305	1841
tgv	Atlantis	117	694
tgv	El.Dorado	415	4840
tgv	Hyperborea	65	383
tgv	Narnia	129	320
tgv	Valhalla	61	201

Transform and Analyze the data

Used mutate() to add two new variables that sums the total service and on-time percentage, per train and destination. These new variables help in analysing the train’s arrival. performance across city destinations.

transformed_data <- tidydata %>% 
  mutate(total_service = delayed + on_time,
         pct_ontime = round(on_time / total_service,2) * 100)
kable(transformed_data)

Trains	destination	delayed	on_time	total_service	pct_ontime
shinkansen	Atlantis	62	497	559	89
shinkansen	El.Dorado	12	221	233	95
shinkansen	Hyperborea	20	212	232	91
shinkansen	Narnia	102	503	605	83
shinkansen	Valhalla	305	1841	2146	86
tgv	Atlantis	117	694	811	86
tgv	El.Dorado	415	4840	5255	92
tgv	Hyperborea	65	383	448	85
tgv	Narnia	129	320	449	71
tgv	Valhalla	61	201	262	77

1. Which was the more reliable service for each city and by how much?

reliable_train_per_city <- transformed_data %>% 
  select(-delayed, -on_time, -total_service) %>% 
  spread(Trains, pct_ontime) %>% 
  mutate(pct_diff = shinkansen - tgv)

kable(reliable_train_per_city)

destination	shinkansen	tgv	pct_diff
Atlantis	89	86	3
El.Dorado	95	92	3
Hyperborea	91	85	6
Narnia	83	71	12
Valhalla	86	77	9

Based on the table above, SHINKANSEN appears to have a more reliable service for each city by at least 3% on-time arrival.

#2. Which was the most reliable train service, overall?

reliable_train_overall <- transformed_data %>% 
  group_by(Trains) %>% 
  summarise(pct_ontime = round(sum(on_time)/sum(total_service),2) * 100)

kable(reliable_train_overall)

Trains	pct_ontime
shinkansen	87
tgv	89

Based on the table above, TGV appears to have a more reliable service overall.

#3. In aggregate, which city had the most reliable on-time train service?

city_reliable_ontime_service <- transformed_data %>% 
  group_by(destination) %>% 
  summarise(pct_ontime = round(sum(on_time)/sum(total_service),2) * 100)

kable(city_reliable_ontime_service)

destination	pct_ontime
Atlantis	87
El.Dorado	92
Hyperborea	88
Narnia	78
Valhalla	85

Based on the table above, the destination EL DORADO appears to have the most reliable on-time service from both trains.

IS607 - Project 2 - Trains (Dataset 1 of 3)

Luisa Velasco

October 8, 2016

Description of the Project

Steps

Attach libraries, load and show raw data

Tidy the data

Transform and Analyze the data