The COVID-19 pandemic, which has lasted for more than a year, has had a major impact on many industries, and tourism is one of them. Players in the tourism sector must prepare a strategy to welcome the “New Normal” era.
In this markdown, I will give an overview of the tourism industry in five big cities on the island of Java: Jakarta, Bandung, Semarang, Surabaya, and Yogyakarta.
The data is taken from https://www.kaggle.com/aprabowo/indonesia-tourism-destination. It consists of 437 rows and 13 columns:
Place_Id: ID of the tourist attraction
Place_Name: Name of the tourist attraction
Description: Description of the tourist attraction
Category: Category of the tourist attraction
City: The name of the city where the tourist attraction is located
Price: Ticket price for the tourist attraction
Rating: Guest rating for the tourist attraction
Time_Minutes: Travel time to the tourist attraction from the city center, in minutes
Coordinate: Coordinates of the tourist attraction
Lat: Latitude coordinate
Long: Longitude coordinate
X: NA
X.1: NA
First, we pull the data from the source, so we can see the overall data structure.
wisata <- read.csv("data_input/tourism_with_id.csv")
wisata
Next, we drop the columns that are not needed for the analysis: Place_Id, Description, Coordinate, X, and X.1.
wisata <- subset(wisata, select = -c(Place_Id, Description, Coordinate, X, X.1))
wisata
Let’s see a brief description of the data in each column.
summary(wisata)
#>   Place_Name          Category             City               Price
#>  Length:437         Length:437         Length:437         Min.   :     0
#>  Class :character   Class :character   Class :character   1st Qu.:     0
#>  Mode  :character   Mode  :character   Mode  :character   Median :  5000
#>                                                           Mean   : 24652
#>                                                           3rd Qu.: 20000
#>                                                           Max.   :900000
#>
#>      Rating       Time_Minutes         Lat              Long
#>  Min.   :3.400   Min.   : 10.00   Min.   :-8.198   Min.   :103.9
#>  1st Qu.:4.300   1st Qu.: 45.00   1st Qu.:-7.750   1st Qu.:107.6
#>  Median :4.500   Median : 60.00   Median :-7.021   Median :110.2
#>  Mean   :4.443   Mean   : 82.61   Mean   :-7.095   Mean   :109.2
#>  3rd Qu.:4.600   3rd Qu.:120.00   3rd Qu.:-6.829   3rd Qu.:110.4
#>  Max.   :5.000   Max.   :360.00   Max.   : 1.079   Max.   :112.8
#>                  NA's   :232
Ticket prices range from free to a maximum of Rp 900,000.
The average ticket price across the five cities is Rp 24,652, well above the median of Rp 5,000.
The lowest rating is 3.4, while the highest rating is 5.
Travel time to the attractions ranges from 10 minutes to 6 hours (360 minutes), with an average of 82.61 minutes. The summary also reports 232 missing values (NA's).
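Because the summary reports missing values, it helps to be explicit about how they are handled when computing an average by hand. The following is a minimal sketch on made-up numbers (the vector `time_minutes` is hypothetical, not the real column): `mean()` returns `NA` unless `na.rm = TRUE` is set, whereas `summary()` drops missing values automatically.

```r
# Hypothetical travel times in minutes; NA marks a missing record
time_minutes <- c(10, 45, NA, 60, NA, 120)

# Without na.rm, a single NA makes the whole mean NA
mean_default <- mean(time_minutes)

# With na.rm = TRUE, only the observed values are averaged
mean_observed <- mean(time_minutes, na.rm = TRUE)

# Count how many records are missing
n_missing <- sum(is.na(time_minutes))
```

On the real data, the same pattern would be `mean(wisata$Time_Minutes, na.rm = TRUE)`.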
table(wisata$Category)
#>
#>             Bahari             Budaya         Cagar Alam Pusat Perbelanjaan
#>                 47                117                106                 15
#>      Taman Hiburan      Tempat Ibadah
#>                135                 17
The tourist attractions across the five cities fall into six categories. Amusement parks (Taman Hiburan) are the most common with 135 attractions, and shopping centers (Pusat Perbelanjaan) are the least common with 15.
barplot(table(wisata$Category), xlab = "Category", ylab = "Number of Attractions")
head(wisata[order(-wisata$Price), c(1,2,3,4)],10)
Half of the ten most expensive tourist attractions are in the amusement park category.
xtabs(formula = Price ~ City, data = wisata) / table(wisata$City)
#> City
#> Bandung Jakarta Semarang Surabaya Yogyakarta
#> 24931.45 45130.95 17017.54 10195.65 19456.35
The highest average ticket price is in Jakarta (Rp 45,130.95), while the lowest average ticket price is in Surabaya (Rp 10,195.65).
barplot(xtabs(formula = Price ~ City, data = wisata) / table(wisata$City), xlab = "City", ylab = "Average Ticket Price")
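Dividing the `xtabs()` sums by the `table()` counts is one way to get a per-city mean. As a rough sketch on a small made-up data frame (the `toy` data and its city names are hypothetical), `tapply()` gives the same result in a single call:

```r
# Hypothetical data: two cities with a handful of ticket prices
toy <- data.frame(
  City  = c("A", "A", "B", "B", "B"),
  Price = c(0, 10000, 5000, 20000, 35000)
)

# Approach used in this analysis: sum of prices per city / number of rows per city
avg_ratio <- xtabs(Price ~ City, data = toy) / table(toy$City)

# Equivalent: apply mean() to Price within each City
avg_tapply <- tapply(toy$Price, toy$City, mean)
```

Both objects hold the same averages, so either can be passed to `barplot()`.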
head(wisata[order(-wisata$Rating), c(1,2,3,5)],10)
tail(wisata[order(-wisata$Rating), c(1,2,3,5)])
Although several places in Bandung are among the 10 highest-rated tourist attractions, several other attractions in Bandung are also among the lowest-rated.
Jakarta <- nrow(wisata[wisata$City == "Jakarta" & wisata$Rating > 4.5,])
Bandung <- nrow(wisata[wisata$City == "Bandung" & wisata$Rating > 4.5,])
Semarang <- nrow(wisata[wisata$City == "Semarang" & wisata$Rating > 4.5,])
Surabaya <- nrow(wisata[wisata$City == "Surabaya" & wisata$Rating > 4.5,])
Yogyakarta <- nrow(wisata[wisata$City == "Yogyakarta" & wisata$Rating > 4.5,])
rating_kota <- data.frame(Jakarta, Bandung, Semarang, Surabaya, Yogyakarta)
barplot(as.matrix(rating_kota), xlab = "City", ylab = "Number of Ratings > 4.5", width=50)
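The per-city counts of attractions rated above 4.5 can also be computed without one line per city. This is only a sketch on made-up data (the `toy` data frame is hypothetical): filtering a factor keeps all of its levels, so cities with no highly rated attraction still appear with a count of zero.

```r
# Hypothetical data: three cities with a few ratings each
toy <- data.frame(
  City   = c("A", "A", "B", "B", "C"),
  Rating = c(4.6, 4.4, 4.7, 4.8, 4.5)
)

# Subset the city factor to rows with Rating > 4.5, then count per level;
# unused levels are kept, so city "C" is reported with a count of 0
high_rated <- table(factor(toy$City)[toy$Rating > 4.5])
```

The resulting table can be passed directly to `barplot()`.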
The shortest typical travel time is in Surabaya, which indicates that tourist sites there are close to the city center. Jakarta has several outliers: a few tourist attractions take much longer to reach than the rest.
boxplot(formula = Time_Minutes ~ City, data = wisata)
Is there a correlation between rating and price?
cor(wisata$Rating, wisata$Price)
#> [1] 0.02324285
A coefficient of about 0.02 is very close to zero, so there is essentially no linear correlation between Price and Rating. A higher rating does not go together with a higher ticket price, and vice versa.
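To judge whether a correlation coefficient differs meaningfully from zero, `cor.test()` from base R reports a p-value and confidence interval alongside the estimate. The sketch below uses made-up vectors (`rating` and `price` are hypothetical values, not the real columns) and the default Pearson method:

```r
# Hypothetical ratings and ticket prices for six attractions
rating <- c(4.1, 4.3, 4.4, 4.5, 4.6, 4.8)
price  <- c(20000, 0, 50000, 5000, 30000, 10000)

# Pearson correlation coefficient, as computed by cor()
r <- cor(rating, price)

# Same coefficient plus a significance test of H0: correlation = 0
test <- cor.test(rating, price)
p_value <- test$p.value
```

On the real data, the equivalent call would be `cor.test(wisata$Rating, wisata$Price)`.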
table(wisata$City)
#>
#> Bandung Jakarta Semarang Surabaya Yogyakarta
#> 124 84 57 46 126
Yogyakarta has the most tourist attractions, while Surabaya ranks last with only 46 tourist attractions.
barplot(table(wisata$City), xlab = "City", ylab = "Number of Tourist Attractions")
Finally, we plot the location of each tourist attraction on an interactive map, based on the data in the Long and Lat columns.
library(tidyverse)
library(leaflet)
wisata %>%
  leaflet(width = "100%") %>%
  addTiles() %>%
  setView(106.8272, -6.175392, zoom = 5) %>%
  addMarkers(lat = ~Lat,
             lng = ~Long,
             popup = wisata$Place_Name)