Welcome! This dataset contains information about the best flights ticket prices from Jakarta. We all know how important it is to obtain affordable flights tickets when planning a trip. Sometimes, we need to invest time in searching for the best deals or even stay up late to take advantage of cheaper ticket prices. This data has been collected from the online ticket agent, tiket.com, and compiled by Riki Akbar as the code author.
This dataset provides an opportunity for us to analyze and understand the trends in flights ticket prices from Jakarta to various destinations in Indonesia. With this data, we can explore opportunities to secure flights tickets at better prices when planning our travels
The data we have is currently in a single column, even though it contains different pieces of information. This makes it appear messy, and we need to split it into several new columns. We can do this using the ‘tidyr’ library
library(tidyr)
# Split the column based on the delimiter "|"
ticket <- separate(tc,
col = extract_timestamp.origin.destination.depart_date.best_price,
into = c("Timestamp", "Origin", "Destination", "Depart_Date", "Best_Price"),
sep = "\\|")
# Check the data frame after the transformation
ticketNext, let’s take a look at the data components using ‘glimpse()’ from the ‘dplyr’ library.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Rows: 45,438
## Columns: 5
## $ Timestamp <chr> "2022-12-24 11:25:37.656571", "2022-12-24 11:25:37.656571"…
## $ Origin <chr> "JKTC", "JKTC", "JKTC", "JKTC", "JKTC", "JKTC", "JKTC", "J…
## $ Destination <chr> "BTJ", "BTJ", "BTJ", "BTJ", "BTJ", "BTJ", "BTJ", "BTJ", "B…
## $ Depart_Date <chr> "2023-04-25", "2023-04-04", "2023-04-27", "2023-04-29", "2…
## $ Best_Price <chr> "2310582.0", "2310582.0", "2310582.0", "2310582.0", "23163…
## Timestamp Origin Destination Depart_Date Best_Price
## 0 0 0 0 0
## [1] FALSE
The data currently does not have appropriate data types for the five columns. They should have the following data types:
•Origin: factor
•Destination: factor
•Depart_Date: date
•Best_Price: integer
•Timestamp: timestamp using ‘as.POSIXct()’
# Modifying data types using mutate()
ticket <- ticket %>%
mutate(Timestamp = as.POSIXct(Timestamp,format = "%Y-%m-%d %H:%M:%S"),
Origin = as.factor(Origin),
Destination = as.factor(Destination),
Best_Price = as.integer(Best_Price),
Depart_Date = as.Date(Depart_Date, format = "%Y-%m-%d"))
str(ticket)## 'data.frame': 45438 obs. of 5 variables:
## $ Timestamp : POSIXct, format: "2022-12-24 11:25:37" "2022-12-24 11:25:37" ...
## $ Origin : Factor w/ 1 level "JKTC": 1 1 1 1 1 1 1 1 1 1 ...
## $ Destination: Factor w/ 29 levels "BDJ","BDO","BIK",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ Depart_Date: Date, format: "2023-04-25" "2023-04-04" ...
## $ Best_Price : int 2310582 2310582 2310582 2310582 2316313 2474980 2474980 2474980 2474980 2474980 ...
In the code above, we use the ‘mutate()’ function to modify the data types of the columns as specified. We ensure that “Timestamp” is in the timestamp format using ‘as.POSIXct()’, “Origin” and “Destination” are converted to factors, “Best_Price” is an integer, and “Depart_Date” is in date format. After these modifications, we use ‘str()’ to review the data structure.
This dataset contains the best flight prices from Jakarta (JKTC) to various cities in Indonesia according to their respective destinations. It includes information about the departure date, as explained in the “Depart_Date” column. Additionally, the ticket prices for these flights are provided in the “Best_Price” column.
To start, let’s determine the total number of rows in the flight dataset using ‘nrow()’:
## [1] 45438
In the dataset, there are 45,438 flights from Jakarta (JKTC). However, we are interested in knowing the departure dates of these flights. We can do this by using the ‘unique()’ function:
## [1] "2023-04-25" "2023-04-04" "2023-04-27" "2023-04-29" "2023-04-01"
## [6] "2023-04-24" "2023-04-02" "2023-04-03" "2023-04-26" "2023-04-09"
## [11] "2023-04-05" "2023-04-28" "2023-04-06" "2023-04-07" "2023-04-08"
## [16] "2023-04-12" "2023-04-13" "2023-04-30" "2023-04-10" "2023-04-11"
## [21] "2023-04-23" "2023-04-16" "2023-04-17" "2023-04-18" "2023-04-19"
## [26] "2023-04-20" "2023-04-21" "2023-04-22" "2023-04-14" "2023-04-15"
## [31] "2023-01-15" "2023-01-16" "2023-01-09" "2023-01-13" "2023-01-02"
## [36] "2023-01-05" "2023-01-27" "2023-01-03" "2023-01-08" "2023-01-06"
## [41] "2023-01-29" "2023-01-30" "2023-01-10" "2023-01-22" "2023-01-23"
## [46] "2023-01-01" "2023-01-20" "2023-01-04" "2023-01-25" "2023-01-11"
## [51] "2023-01-12" "2023-01-19" "2023-01-18" "2023-01-07" "2023-01-26"
## [56] "2023-01-24" "2023-01-31" "2023-01-14" "2023-01-17" "2023-01-28"
## [61] "2023-01-21" "2023-02-16" "2023-02-02" "2023-02-23" "2023-02-09"
## [66] "2023-02-11" "2023-02-18" "2023-02-04" "2023-02-25" "2023-02-17"
## [71] "2023-02-15" "2023-02-14" "2023-02-19" "2023-02-20" "2023-02-24"
## [76] "2023-02-01" "2023-02-21" "2023-02-06" "2023-02-28" "2023-02-27"
## [81] "2023-02-05" "2023-02-26" "2023-02-03" "2023-02-08" "2023-02-07"
## [86] "2023-02-13" "2023-02-12" "2023-02-10" "2023-02-22" "2023-03-28"
## [91] "2023-03-30" "2023-03-23" "2023-03-25" "2023-03-06" "2023-03-05"
## [96] "2023-03-10" "2023-03-13" "2023-03-12" "2023-03-17" "2023-03-19"
## [101] "2023-03-20" "2023-03-03" "2023-03-24" "2023-03-29" "2023-03-07"
## [106] "2023-03-27" "2023-03-26" "2023-03-04" "2023-03-09" "2023-03-08"
## [111] "2023-03-31" "2023-03-14" "2023-03-11" "2023-03-18" "2023-03-16"
## [116] "2023-03-15" "2023-03-02" "2023-03-01" "2023-03-22" "2023-03-21"
## [121] "2023-05-28" "2023-05-29" "2023-05-08" "2023-05-09" "2023-05-13"
## [126] "2023-05-14" "2023-05-15" "2023-05-16" "2023-05-31" "2023-05-10"
## [131] "2023-05-11" "2023-05-12" "2023-05-30" "2023-05-17" "2023-05-18"
## [136] "2023-05-19" "2023-05-24" "2023-05-25" "2023-05-26" "2023-05-27"
## [141] "2023-05-20" "2023-05-21" "2023-05-22" "2023-05-23" "2023-05-06"
## [146] "2023-05-07" "2023-05-02" "2023-05-03" "2023-05-04" "2023-05-05"
## [151] "2023-05-01" "2023-06-19" "2023-06-26" "2023-06-28" "2023-06-27"
## [156] "2023-06-21" "2023-06-20" "2023-06-14" "2023-06-13" "2023-06-12"
## [161] "2023-06-18" "2023-06-04" "2023-06-25" "2023-06-06" "2023-06-05"
## [166] "2023-06-22" "2023-06-24" "2023-06-02" "2023-06-01" "2023-06-23"
## [171] "2023-06-08" "2023-06-07" "2023-06-29" "2023-06-09" "2023-06-15"
## [176] "2023-06-17" "2023-06-16" "2023-06-11" "2023-06-10" "2023-06-03"
## [181] "2023-06-30" "2024-01-08" "2024-01-13" "2024-01-28" "2024-02-27"
## [186] "2024-02-12" "2024-03-28" "2024-04-27" "2024-04-12" "2024-05-27"
## [191] "2024-05-12" "2024-06-11" "2024-06-26" "2024-03-13" "2024-01-01"
## [196] "2024-01-20" "2024-04-23" "2024-05-23" "2024-05-08" "2024-06-07"
## [201] "2024-06-22" "2024-01-02" "2023-07-03" "2023-07-28" "2023-07-09"
## [206] "2023-07-08" "2023-07-16" "2023-07-15" "2023-07-18" "2023-07-17"
## [211] "2023-07-12" "2023-07-11" "2023-07-14" "2023-07-13" "2023-07-30"
## [216] "2023-07-10" "2023-07-31" "2023-07-19" "2023-07-05" "2023-07-27"
## [221] "2023-07-26" "2023-07-04" "2023-07-29" "2023-07-07" "2023-07-06"
## [226] "2023-07-23" "2023-07-01" "2023-07-22" "2023-07-25" "2023-07-02"
## [231] "2023-07-24" "2023-07-21" "2023-07-20"
Based on the “Depart_Date” column, there are departures in the year 2024. However, we only want to focus on the data for departures in the year 2023 from January to July. Here is how we can filter this data:
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
# Filter for the year 2023
data_tiket_2023 <- ticket %>%
filter(year(Depart_Date) == 2023)
# Remove the day component from the "Depart_Date" column
data_tiket_2023 <- data_tiket_2023 %>%
mutate(Depart_Date = as.Date(format(Depart_Date, "%Y-%m-01")))
data_tiket_2023After this separation, it is observed that there are 45,395 flights in the dataset for the year 2023, from January to July.
Next, we want to find out the destinations from Jakarta using the ‘levels()’ function.
## [1] "BDJ" "BDO" "BIK" "BKS" "BPN" "BTH" "BTJ" "DJB" "DJJ" "DPS" "JOG" "KNO"
## [13] "LOP" "MDC" "MKQ" "MLG" "PDG" "PGK" "PKU" "PLM" "PNK" "SOC" "SRG" "SUB"
## [25] "TKG" "TNJ" "TRK" "UPG" "YIA"
In the “Destination” column, there are flights to 29 cities in Indonesia. These destinations are identified by their International Air Transport Association (IATA) codes, which correspond to various airports in Indonesia. Here is an explanation of the IATA codes:
•BDJ: Syamsudin Noor Airport (Banjarmasin)
•BDO: Husein Sastranegara International Airport (Bandung)
•BIK: Frans Kaisiepo Airport (Biak)
•BKS: Fatmawati Soekarno Airport (Bengkulu)
•BPN: Sultan Aji Muhammad Sulaiman International Airport (Balikpapan)
•BTH: Hang Nadim Airport (Batam)
•BTJ: Sultan Iskandar Muda International Airport (Banda Aceh)
•DJB: Sultan Thaha Syaifuddin International Airport (Jambi)
•DJJ: Sentani Airport (Jayapura)
•DPS: Ngurah Rai International Airport (Denpasar, Bali)
•JOG: Adisutjipto International Airport (Yogyakarta)
•KNO: Kualanamu International Airport (Medan)
•LOP: Lombok International Airport (Praya)
•MDC: Sam Ratulangi International Airport (Manado)
•MKQ: Mopah Airport (Merauke)
•MLG: Abdul Rachman Saleh Airport (Malang)
•PDG: Minangkabau International Airport (Padang)
•PGK: Depati Amir International Airport (Pangkalpinang)
•PKU: Sultan Syarif Kasim II International Airport (Pekanbaru)
•PLM: Sultan Mahmud Badaruddin II International Airport (Palembang)
•PNK: Supadio International Airport (Pontianak)
•SOC: Adi Sumarmo Airport (Solo)
•SRG: Adisumarmo International Airport (Surakarta)
•SUB: Juanda International Airport (Surabaya)
•TKG: Radin Inten II Airport (Bandar Lampung)
•TNJ: Raja Haji Fisabilillah International Airport (Tanjung Pinang)
•TRK: Juwata Airport (Tarakan)
•UPG: Sultan Hasanuddin Airport (Makassar)
•YIA: Yogyakarta International Airport (Kulon Progo)
Next, we will explore the data further.
This exploration will be based on business questions that need to be answered.
library(dplyr)
# Finding the average flight ticket prices
ticket_TS <- ticket %>% mutate(Booking_Month = format(Timestamp, "%Y-%m"))%>% group_by(Booking_Month) %>% summarize(Average_Price = mean(Best_Price))
ticket_TSFrom this data, we can observe the changes in flights ticket prices over time. For example, we can see that the average flights ticket prices tend to decrease from month to month.
In the provided output, the ticket prices in December 2022 (1,573,344) are higher than in January 2023 (1,477,963) and February 2023 (1,402,706).
This indicates that booking tickets in advance (e.g., a few months before the departure date) tends to result in better prices.
# Import the necessary library
library(ggplot2)
# Group the data by "Depart_Date"
ticket_o <- data_tiket_2023 %>%
group_by(Depart_Date) %>%
summarize(Average_Price = mean(Best_Price))
# Create a ggplot for the output
ggplot(ticket_o, aes(x = Depart_Date, y = Average_Price)) +
geom_line() +
labs(title = "Perbedaan Harga Tiket Pesawat antara Tanggal Keberangkatan",
x = "Tanggal Keberangkatan",
y = "Rata-rata Harga Tiket") +
theme_minimal()In the provided plot, there is an upward price trend around February to April, followed by a gradual decrease after April towards July. This indicates a fluctuation in ticket prices from month to month. Further statistical analysis may be needed to investigate this.
3.Is there a price trend for specific days of the week?
# Import the necessary libraries
library(dplyr)
library(ggplot2)
# Extract the day of the week
data_tiket_2023 <- data_tiket_2023 %>%
mutate(Day_of_Week = weekdays(Depart_Date))
# Group the data by the day of the week
ticket_week <- data_tiket_2023 %>%
group_by(Day_of_Week) %>%
summarize(Average_Price = mean(Best_Price))
ticket_weekBased on the analysis of the data output, we can see the average flights ticket prices for each day of the week. The results are as follows:
•Monday has an average airfare ticket price of approximately 1,580,924.
•Saturday has an average airfare ticket price of approximately 1,569,280.
•Sunday has an average airfare ticket price of approximately 1,385,607.
•Thursday has an average airfare ticket price of approximately 1,553,374.
•Wednesday has an average airfare ticket price of approximately 1,379,488.
From this data, we can observe differences in the average flights ticket prices between days of the week. For example, ticket prices on Sundays appear to be lower compared to Mondays or Thursdays.