2022-06-23

Introduction

We created a Shiny App named My RapidKL Train Scheduler for Assignment 2.

Problem Identified: Planning a journey by public transport is a hassle because train fare, route and duration information is not centralized; it is scattered across multiple web pages.

RapidKL Train Dataset :

  • 3 data files covering the LRT Kelana Jaya Line and the MRT Sungai Buloh-Kajang Line.
  • The dataset source is RapidKL.
  • The dataset was checked for dirty data and the number of samples was reduced.
  • The finalised dataset is kept on Kaggle for reference.

Dataset : RapidKL Train Dataset

Item          Dataset Description
“Fare.csv”    Numeric value for train fare, measured in Ringgit Malaysia (RM).
“Time.csv”    Numeric value for train ride duration, measured in minutes.
“Route.csv”   String value for train route.

Questions :

  • What is the minimum amount of money a passenger needs in order to travel from Gombak to any other station?
   Fare <- read.csv(file = 'Fare.csv') # read the file and store it as a data frame
   print(max(Fare$Gombak)) # max() is used because the passenger might alight at any station, so the highest fare must be covered
  • How many minutes does the train take from Gombak to KL Sentral?
   Time <- read.csv(file = 'Time.csv') # read the file and store it as a data frame
   print(Time[1,4]) # look up the value by row and column index instead of by station name
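
Index-based lookups such as Time[1,4] are easy to get wrong if the file layout changes. The following is a minimal sketch of a name-based lookup instead. It assumes a layout the slides do not confirm: a hypothetical "Station" column naming the destination and one column per origin station. The helper trip_minutes and all values in the table are placeholders for illustration, not real RapidKL data.

```r
# Placeholder durations in minutes; NOT real RapidKL values.
# Assumed layout: "Station" names the destination, each other column is an origin.
Time <- data.frame(
  Station      = c("Gombak", "Taman Melati", "KL Sentral"),
  Gombak       = c(0, 3, 30),
  Taman.Melati = c(3, 0, 27),
  KL.Sentral   = c(30, 27, 0)
)

# Hypothetical helper: duration of a trip between two named stations.
trip_minutes <- function(time_table, from, to) {
  # make.names() mirrors how read.csv() turns "KL Sentral" into "KL.Sentral"
  time_table[time_table$Station == to, make.names(from)]
}

print(trip_minutes(Time, "Gombak", "KL Sentral"))
```

With the real Time.csv, the column and row names would need to be checked first (for example with names(Time)) before relying on a helper like this.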

Exploratory Data Analysis : Barplot

    barplot(Fare$Gombak, xlab = "Station", ylab = "Fare (RM)", col = "blue",
            main = "Train fare from Gombak")
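
As written, the bars carry no station labels. A possible variation, sketched below, passes station names through the names.arg argument of barplot(). It assumes the fare table has a hypothetical "Station" column; the fares shown are placeholder numbers, not real RapidKL values.

```r
# Placeholder fares in RM; NOT real RapidKL values.
Fare <- data.frame(
  Station = c("Gombak", "Taman Melati", "KL Sentral"),
  Gombak  = c(0.0, 1.2, 3.5)
)

# names.arg puts a station name under each bar; las = 2 rotates the labels
# so that longer station names stay readable.
bar_positions <- barplot(
  Fare$Gombak,
  names.arg = as.character(Fare$Station),
  las  = 2,
  ylab = "Fare (RM)",
  col  = "blue",
  main = "Train fare from Gombak"
)
```

barplot() returns the x positions of the bars, which can be reused to add text labels above each bar if needed.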

Shiny App : My RapidKL Train Scheduler

Key Takeaways

  1. We should be able to identify the different types of data involved in a data science project. An integer is not the same as a float, so pay attention to the difference.

  2. Users can plan their train ride by selecting their current station and destination.

  3. The app will then retrieve the fare, route and duration for the trip.

  4. Practicing a data-driven culture is important: always check the dataset format and the libraries declared. In our group’s case we used .csv files, so we did not need the “xlsx” library to read them.
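
The station selection and lookup described in takeaways 2 and 3 could be wired together roughly as follows. This is only a sketch under stated assumptions, not the app's actual source: the table layout, the lookup helper, the input IDs and all values are hypothetical, and the route lookup is left out because the format of Route.csv is not shown here.

```r
library(shiny)

# Placeholder tables; values are illustrative only, NOT real RapidKL data.
# Assumed layout: "Station" names the destination, each other column is an origin.
stations <- c("Gombak", "Taman Melati", "KL Sentral")
Fare <- data.frame(Station = stations,
                   Gombak = c(0.0, 1.2, 3.5),
                   Taman.Melati = c(1.2, 0.0, 3.1),
                   KL.Sentral = c(3.5, 3.1, 0.0))
Time <- data.frame(Station = stations,
                   Gombak = c(0, 3, 30),
                   Taman.Melati = c(3, 0, 27),
                   KL.Sentral = c(30, 27, 0))

# Hypothetical helper: read one origin/destination cell from a table.
lookup <- function(tbl, from, to) tbl[tbl$Station == to, make.names(from)]

# Two drop-downs for the stations and one text output for the result.
ui <- fluidPage(
  titlePanel("Train trip lookup (sketch)"),
  selectInput("from", "Current station", choices = stations),
  selectInput("to", "Destination", choices = stations, selected = "KL Sentral"),
  textOutput("summary")
)

# The text re-renders whenever either selection changes.
server <- function(input, output) {
  output$summary <- renderText({
    sprintf("Fare: RM %.2f, duration: %.0f minutes",
            lookup(Fare, input$from, input$to),
            lookup(Time, input$from, input$to))
  })
}

app <- shinyApp(ui, server)
```

Calling runApp(app) in an interactive R session would launch the sketch locally.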