Client

Transdev Sydney Light Rail


Recommendations

  • The Central Grand Concourse, Central Chalmers Street, and Chinatown light rail routes evidently have the most boarding passengers

  • Increasing the frequency of trains on these routes would make travel more convenient and comfortable, since a large number of passengers use them

  • This would reduce overcrowding and improve the passenger experience during peak hours

  • Hypothesis: The light rail routes around Town Hall and Central are the most crowded, and more trips should be added to these routes from 7am to 8am to make travel more comfortable for passengers.


Evidence

The Data

  • The number of passengers on different light rail routes varies greatly, not only in the number of trips taken but also in the type of people who took them.

  • The data cover the trips taken and the types of passengers on the light rail from July 2016 to January 2023 (recent)

Which light rail route carries the most passengers?

# Load the tidyverse (this also attaches ggplot2)
library(tidyverse)

# Print large trip counts in full rather than in scientific notation
options(scipen = 999)

dataset <- read.csv("LightRail_Jan2023.csv")

# One bar per location; rows for the same location stack into a total
ggplot(dataset, aes(x = Location, y = Trip, fill = Location)) +
  geom_col(show.legend = FALSE) +
  labs(x = "Location", y = "Number of Trips",
       title = "Number of trips people took the light rail in different routes") +
  coord_flip()

Analysis

  • The number of trips shown on the horizontal axis above is the total across all the different types of passengers who have taken the light rail

  • No direct correlation between the location of a light rail route and the number of trips taken can be seen in this bar chart

  • Central Grand Concourse, Central Chalmers Street, and Chinatown are the three main routes with the most passengers

  • Limitation: many people do not swipe their cards and pay when boarding the light rail, so the recorded trips under-count actual passengers

Which routes should have more frequent trips (and when)?
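The ranking read off the bar chart can also be checked numerically. The following is a minimal sketch, assuming the `Location` and `Trip` columns used in the plotting code; the helper name `top_locations` and the small `example` data frame are hypothetical, and its numbers are made up purely to show the call.

```r
# Sum trips per location and return the n busiest locations.
# Assumes columns named Location and Trip, as in the plotting code.
top_locations <- function(data, n = 3) {
  totals <- aggregate(Trip ~ Location, data = data, FUN = sum)
  totals <- totals[order(-totals$Trip), ]  # largest total first
  rownames(totals) <- NULL
  head(totals, n)
}

# Hypothetical illustration data (made-up numbers, not from the real file)
example <- data.frame(
  Location = c("Stop A", "Stop A", "Stop B", "Stop C", "Stop D"),
  Trip     = c(10, 5, 30, 20, 1)
)
top_locations(example)
```

With the real file, `top_locations(read.csv("LightRail_Jan2023.csv"))` would list the three locations with the most recorded trips.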

# Load the tidyverse (this also attaches ggplot2)
library(tidyverse)

dataset <- read.csv("LightRail_Jan2023.csv")

# Total trips for each combination of card type and location
agg_data <- aggregate(Trip ~ Card_type + Location, data = dataset, sum)

# Bubble chart: the size of each point encodes the number of trips
ggplot(agg_data, aes(x = Card_type, y = Location, size = Trip)) +
  geom_point(color = "blue", shape = 16) +
  scale_size_continuous(range = c(1, 10), breaks = pretty(agg_data$Trip, n = 5)) +
  labs(title = "Number of trips passengers took the light rail with different card types in different locations", x = "Card type", y = "Location", size = "Trips") +
  # The geom_smooth(method = "lm") layer was removed: both axes are categorical,
  # so a linear trend line between them is not meaningful
  theme(plot.title = element_text(hjust = 0.5, size = 30),
        legend.text = element_text(size = 14),
        legend.title = element_text(size = 16))

Analysis

  • As shown above, adults are the main group of passengers on the three main routes (the bigger the point, the larger the number of trips)

  • Information from the Australian Government's Fair Work Ombudsman indicates that most working adults work from 9am to 5pm, which supports the assumption that light rail passengers experience peak hours from 8am to 9am and from 5pm to 6pm

  • Take the Central Grand Concourse light rail as an example, since the number of passengers on this route greatly exceeds that of the other routes. Adults make up one third (8,000,000 trips) of its trips, so we can infer that overcrowding is most likely from 8am to 9am and from 5pm to 6pm

  • Increasing the frequency of trains within those time frames can greatly improve passenger flow and convenience, and reduce problems such as the increased risk of injury, discomfort from crowded rides, and people not paying for their rides because it is easy to hide in a crowd

  • Limitation: there is no detailed data on which hours are peak hours and which are not, and there is no exact distinction between the usage of different cards (people might use card types that are not meant for them)

  • Most of the busier light rail routes tend to have a higher percentage of adults, meaning that the reasoning and recommendation for the Central Grand Concourse light rail can also be applied to the others.
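The share of trips made on each card type at a location, such as the adult share quoted above, can be computed directly rather than judged from point sizes. This is a minimal sketch, assuming the `Card_type`, `Location`, and `Trip` columns from the code above; the helper `card_share`, the card labels, and the numbers in `example` are hypothetical placeholders, not values from the real file.

```r
# Share of trips by card type within each location.
# Assumes columns named Card_type, Location and Trip.
card_share <- function(data) {
  agg <- aggregate(Trip ~ Card_type + Location, data = data, FUN = sum)
  # Total trips at each location, repeated so it aligns row by row with agg
  location_total <- ave(agg$Trip, agg$Location, FUN = sum)
  agg$Share <- agg$Trip / location_total
  agg
}

# Hypothetical illustration data (made-up numbers and labels)
example <- data.frame(
  Location  = c("Stop A", "Stop A", "Stop A", "Stop B", "Stop B"),
  Card_type = c("Adult", "Child", "Senior", "Adult", "Child"),
  Trip      = c(50, 25, 25, 30, 10)
)
shares <- card_share(example)
shares[shares$Card_type == "Adult", ]
```

Applied to the real dataset, the `Share` column would show whether adults are the largest group on the other busy routes as well, which is what the final point above relies on.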


Acknowledgements


Appendix 1

Number of trips people took the light rail with different card types

# Load the tidyverse (this also attaches ggplot2)
library(tidyverse)

# Print large trip counts in full rather than in scientific notation
options(scipen = 999)

dataset <- read.csv("LightRail_Jan2023.csv")

# One bar per card type; rows for the same card type stack into a total
ggplot(dataset, aes(x = Card_type, y = Trip, fill = Card_type)) +
  geom_col() +
  labs(x = "Type of Card", y = "Number of Trips",
       title = "Number of trips people took the light rail with different card types") +
  coord_flip()

Appendix 2

Defense of Approach

Client

  • Overcrowding on the light rail during normal and peak hours can only be addressed by the company that controls and runs it, Transdev Sydney Light Rail. The analysis of the graphs and plots is provided so that the company receives an interpreted and informative report.

Statistical Analysis

  • The relationship between the number of trips and the types of passengers who took the light rail is reported to the client so that they have more detail when deciding where to increase frequency. The assumptions about peak hours are based on two things: the distribution of the different passenger types taking the light rail (figure 2) and the information provided by the Australian Government.

Limitation

  • Data on the number of trips taken can easily be misreported (many people tend not to swipe their cards), the exact time of peak hours cannot be identified, and the method used for the busier light rail routes cannot simply be applied to the comparatively less-used routes.