Notebook Setup

setwd("~/Documents/Via")
library(readr)
library(data.table)
library(magrittr)
library(dplyr)
library(ff)
library(ggplot2)
library(scales)
library(RColorBrewer)
library(extrafont)
library(hexbin)
library(lubridate)

Load the 2013 fare and ride data

fare <- read_csv('trip_fare_1.csv')
ride <- read_csv('trip_data_1.csv')
# ride <- fread('trip_data_1.csv')

Source functions and parameters

Airport Coordinates Definition

* JFK: 40.6413° N, 73.7781° W

* LaGuardia: 40.7769° N, 73.8740° W

jfk <- data.frame(x=c(-73.81,-73.77), y=c(40.635,40.67))
laguardia <- data.frame(x=c(-73.88,-73.858), y=c(40.764,40.779))

Tag airports and manipulate the data frame (details in the code)
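The tagging code itself is not shown in this notebook. As a minimal sketch of how the bounding boxes above could be used, assuming a point counts as an airport ride when it falls inside the rectangle spanned by the two corner points, something like the following would work. The helpers `in_box` and `tag_airport` and the column name `pickup_airport` are hypothetical and not taken from the original code.

```r
# Bounding boxes as defined in the notebook: two corner points each.
jfk <- data.frame(x = c(-73.81, -73.77), y = c(40.635, 40.67))
laguardia <- data.frame(x = c(-73.88, -73.858), y = c(40.764, 40.779))

# Hypothetical helper: TRUE where (lon, lat) lies inside the box.
# Missing coordinates are treated as "not inside".
in_box <- function(lon, lat, box) {
  !is.na(lon) & !is.na(lat) &
    lon >= min(box$x) & lon <= max(box$x) &
    lat >= min(box$y) & lat <= max(box$y)
}

# Hypothetical helper: label each point "JFK", "LGA", or NA.
tag_airport <- function(lon, lat) {
  ifelse(in_box(lon, lat, jfk), "JFK",
         ifelse(in_box(lon, lat, laguardia), "LGA", NA_character_))
}

# Toy points: the two airport coordinates listed above plus Times Square.
pts <- data.frame(
  lon = c(-73.7781, -73.8740, -73.9855),
  lat = c(40.6413, 40.7769, 40.7580)
)
pts$pickup_airport <- tag_airport(pts$lon, pts$lat)
print(pts)
```

The same helper could be applied to the dropoff coordinates to tag rides heading to an airport.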

Pickups

Visualization — Pickups by Count

Warning: Removed 716 rows containing missing values (geom_point).

  • The bright spots at the airports and around Times Square in Midtown mark the highest taxi ride counts.
  • Yellow dots mark the airport coordinates.

Visualization — Pickups by Fare Revenue

Warning: Removed 720 rows containing non-finite values (stat_summary_hex).
Warning: Removed 720 rows containing missing values (geom_point).

  • The bright spots at the airports, around Times Square in Midtown, and in the Financial District mark the highest fare revenue from taxi rides.
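The plotting code is not included in this notebook, but the warnings name `stat_summary_hex` and `geom_point`. A rough sketch of a comparable plot, assuming fare revenue is summed within hexagonal bins and the airports are overlaid as yellow points, might look like this. The toy data, bin count, and labels are illustrative assumptions, not the original settings.

```r
library(ggplot2)  # stat_summary_hex also needs the hexbin package installed

# Toy data standing in for the aggregated pickup table (values are made up).
set.seed(1)
toy <- data.frame(
  pickup_longitude = runif(200, -74.02, -73.75),
  pickup_latitude  = runif(200, 40.62, 40.82),
  fare             = runif(200, 5, 60)
)

# Airport coordinates from the definitions above (west longitudes are negative).
airports <- data.frame(x = c(-73.7781, -73.8740), y = c(40.6413, 40.7769))

p <- ggplot(toy, aes(pickup_longitude, pickup_latitude)) +
  # Sum the fare of all points that fall into each hexagon.
  stat_summary_hex(aes(z = fare), fun = sum, bins = 40) +
  # Overlay the airports as yellow dots.
  geom_point(data = airports, aes(x, y), colour = "yellow", size = 2) +
  labs(x = "Longitude", y = "Latitude", fill = "Fare revenue")

# print(p) would draw the plot.
```

If axis limits are set tighter than the data, ggplot2 drops the points outside them and reports warnings of the kind shown above.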

Visualization — Pickups by Fare Per Second

Warning: Removed 808 rows containing non-finite values (stat_summary_hex).
Warning: Removed 720 rows containing missing values (geom_point).

  • The bright spots at the airports and around Times Square in Midtown mark the most efficient taxi rides.

Dropoffs

Manipulate dataframe

# Round all selected columns to 4 decimals so nearby dropoffs share a group, then aggregate.
map2 <- only_airport %>%
  select(dropoff_longitude, dropoff_latitude, fare_amount, trip_time_in_secs) %>%
  round(4) %>%
  group_by(dropoff_longitude, dropoff_latitude) %>%
  summarise(count = n(), fare = sum(fare_amount), time = mean(trip_time_in_secs))

Visualization — Dropoffs by Count

Warning: Removed 485 rows containing missing values (geom_point).

Visualization — Dropoffs by Fare Revenue

Warning: Removed 485 rows containing non-finite values (stat_summary_hex).
Warning: Removed 485 rows containing missing values (geom_point).

Visualization — Dropoffs by Fare Per Second

Warning: Removed 485 rows containing non-finite values (stat_summary_hex).
Warning: Removed 485 rows containing missing values (geom_point).

  • Interestingly, for dropoffs, Times Square in Midtown is actually more efficient than the airports: taxi drivers there earn more fare revenue per second of trip time.

Analysis Questions:

• How would you assess the efficiency of aggregating rides to/from each airport?

I would use fare revenue per second (fare revenue / trip_time_in_secs) to assess the efficiency of aggregating rides to/from the airports. The idea is that the more fare revenue a ride earns per unit of time (second), the more economically efficient it is.
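As a small worked example of this metric, assuming it is computed per ride as `fare_amount / trip_time_in_secs`, the sketch below uses made-up rides. Rides with a zero or missing trip time are set to `NA` rather than producing `Inf` or `NaN`; that guard is an assumption added here, not something the original code is known to do.

```r
# Toy rides; the values are illustrative, not taken from the data set.
rides <- data.frame(
  fare_amount       = c(52, 12.5, 8, 30),
  trip_time_in_secs = c(2400, 600, 0, NA)
)

# Fare per second, with NA where the trip time is zero or missing.
valid <- !is.na(rides$trip_time_in_secs) & rides$trip_time_in_secs > 0
rides$fare_per_sec <- ifelse(valid, rides$fare_amount / rides$trip_time_in_secs, NA_real_)

print(rides)
```

Here the first ride earns 52 / 2400, roughly 0.0217 dollars per second, and the second earns 12.5 / 600, roughly 0.0208, so the two are about equally efficient despite very different fares.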

• How does this compare to our current area of service (e.g. the Upper East Side)?

The graphs show that the UES generates fare revenue similar to that of rides to/from the airports. Two other service areas worth noting are Midtown (Times Square) and the Financial District. The Upper West Side has potential too.

• Which of the airport expansion options is most beneficial and why?

Expanding to LaGuardia is more beneficial than expanding to JFK. It has relatively fewer pickup and dropoff points, but demand is not low. In addition, rides to/from LaGuardia are economically efficient.

Manipulate dataframe

# Round pickup coordinates to 4 decimals, then total surcharges per location.
map3 <- only_airport %>% select(pickup_longitude, pickup_latitude, surcharge) %>% round(4) %>%
  group_by(pickup_longitude, pickup_latitude) %>% summarise(count = n(), surcharge = sum(surcharge))

Visualization — Pickups by Surcharge

Warning: Removed 720 rows containing non-finite values (stat_summary_hex).
Warning: Removed 720 rows containing missing values (geom_point).

  • Pickups at the airports collect the most surcharges per ride.

Manipulate dataframe

# Round dropoff coordinates to 4 decimals, then total surcharges per location.
map4 <- only_airport %>% select(dropoff_longitude, dropoff_latitude, surcharge) %>% round(4) %>%
  group_by(dropoff_longitude, dropoff_latitude) %>% summarise(count = n(), surcharge = sum(surcharge))

Visualization — Dropoffs by Surcharge

Warning: Removed 485 rows containing non-finite values (stat_summary_hex).
Warning: Removed 485 rows containing missing values (geom_point).

  • The dropoff map offers another insight: LaGuardia actually incurs more surcharges than JFK. Hence expanding to LaGuardia may be more beneficial economically.

• Would you launch airports as a separate service or as a new service? Why?

To decide whether to launch airports as a separate service, I look at the relationship between demand and revenue. Based on the graphs above, both metrics send a favorable signal. In addition, people are inclined to request airport rides in advance on their phones, which fits Via’s operating model well.

Visualization — Ride Time Analysis

  • By the package's definition, Sunday is day 1 of the week.

• Would you launch airport rides during all our hours of service (6am-12am on weekdays and 10am-12am on Saturdays) or only for certain hours? Which hours?

As the graph above shows, the current hours of service cover most high-demand periods. That said, I would allocate more resources around mid-afternoon, especially on Wednesdays and Thursdays.
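One way to put a number on this coverage is to compute the share of pickups that fall inside the service hours quoted in the question (6am-12am on weekdays, 10am-12am on Saturdays). The sketch below is an illustration on made-up timestamps, not the analysis behind the graph; `in_service_hours` is a hypothetical helper, and it relies on lubridate's `wday()` numbering with Sunday as 1.

```r
library(lubridate)

# Made-up pickup timestamps; the real data would supply the pickup times.
pickups <- ymd_hms(c(
  "2013-01-07 05:30:00",  # Monday, before 6am
  "2013-01-07 14:15:00",  # Monday afternoon
  "2013-01-12 09:00:00",  # Saturday, before 10am
  "2013-01-12 22:45:00",  # Saturday night
  "2013-01-13 12:00:00"   # Sunday, no service
))

# Hypothetical helper: TRUE when a timestamp falls within the service hours.
in_service_hours <- function(t) {
  d <- wday(t, week_start = 7)  # 1 = Sunday, ..., 7 = Saturday
  h <- hour(t)
  weekday  <- d %in% 2:6 & h >= 6   # Monday-Friday, 6am to midnight
  saturday <- d == 7 & h >= 10      # Saturday, 10am to midnight
  weekday | saturday
}

covered <- in_service_hours(pickups)
mean(covered)  # share of pickups inside the service hours
```

Applied to airport rides only, the same share would indicate how much demand the current schedule already captures.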

Visualization — Fare Per Passenger Analysis

• How would you price airport rides and why (our current model is a $5.00 flat fee weekdays before 9pm and $5.95 weeknights after 9pm and all day Saturdays)?

Judging by taxi fares divided by the number of passengers, I still think a $5 flat fee is quite low compared with the normal rate of almost $30. If Via’s initial strategy is to use an aggressive price incentive to start the business, that makes sense. However, as Via establishes itself, it may no longer be optimal to leave so much money on the table. Hence, I would suggest a different price tier. To make a profit and alleviate traffic congestion, I would charge a flat fee of $10 to $15, including tax (the exact amount can vary according to model accuracy), for regular operating hours, plus a surcharge of $1 to $5 for the rush hours of 6am to 7am and 6pm to 7pm.
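The per-passenger figure behind this comparison can be sketched as the fare divided by the passenger count. The example below uses made-up rides and assumes the data has `fare_amount` and `passenger_count` columns; rides recorded with zero passengers are set to `NA`, which is a guard added here for illustration.

```r
# Toy airport rides; the values are illustrative only.
rides <- data.frame(
  fare_amount     = c(30, 30, 45, 20),
  passenger_count = c(1, 2, 3, 0)
)

# Fare per passenger, with NA where no passengers were recorded.
rides$fare_per_passenger <- ifelse(
  rides$passenger_count > 0,
  rides$fare_amount / rides$passenger_count,
  NA_real_
)

print(rides)
mean(rides$fare_per_passenger, na.rm = TRUE)
```

Even when a $30 fare is split between two or three passengers, each pays $10 to $15, which is the range the proposed flat fee targets.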

Qualitative Questions:

• What additional data would you like to see in order to answer questions 1-5 more confidently and how would you incorporate it?

I would want to see the channel through which those taxi rides are requested. For example, we may see growing use of mobile applications for airport rides, as people may feel more assured requesting a ride in advance on their phones rather than hailing a cab on the street the traditional way. This would also inform whether we should launch airport rides as a separate service.

• How might your answer change over time? What Via data would you monitor to ensure the proposed expansion was a good business decision?

My answer might change as the competitive landscape of on-demand transportation changes. Therefore, I would monitor data on Via’s competitors to assess the efficiency of the proposed expansion.

