INTRO

The COVID-19 pandemic, which has lasted for more than a year, has had a major impact on many industries, tourism among them. Players in the tourism sector must prepare a strategy to welcome the “New Normal” era.

In this markdown, I will give an overview of the tourism industry in 5 big cities on the island of Java:

  1. Jakarta
  2. Bandung
  3. Yogyakarta
  4. Semarang
  5. Surabaya

The data is taken from https://www.kaggle.com/aprabowo/indonesia-tourism-destination. It consists of 437 rows and 13 columns:

  • Place_Id: ID of the tourist attraction
  • Place_Name: Name of the tourist attraction
  • Description: Description of the tourist attraction
  • Category: Category of the tourist attraction
  • City: The name of the city where the tourist attraction is located
  • Price: Ticket price for the tourist attraction
  • Rating: Guest rating for the tourist attraction
  • Time_Minutes: Travel time from the city center to the tourist attraction, in minutes
  • Coordinate: Coordinates of tourist attractions
  • Lat: Latitude Coordinates
  • Long: Longitude Coordinates
  • X: NA
  • X.1: NA

Dataset

First, we load the data from the source so we can see its overall structure.

wisata <- read.csv("data_input/tourism_with_id.csv")
wisata

Then we drop the columns that we don’t need in the analysis. The columns we don’t need are:

  • Place_Id
  • Description
  • Coordinate
  • X
  • X.1

wisata <- subset(wisata, select = -c(Place_Id, Description, Coordinate, X, X.1))
wisata
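
As a hedged alternative to the step above, columns can also be dropped by name with `setdiff()`, which does not fail when one of the listed columns (for example `X.1`) is absent from a particular copy of the file. The small data frame `toy` and its values are invented for illustration and are not the Kaggle data.

```r
# Toy stand-in for the real data (all values are invented for illustration)
toy <- data.frame(
  Place_Id    = 1:2,
  Place_Name  = c("A", "B"),
  Description = c("x", "y"),
  Price       = c(0, 5000),
  X           = c(NA, NA)
)

# Columns we want to remove; some of them may not exist in `toy`
drop_cols <- c("Place_Id", "Description", "Coordinate", "X", "X.1")

# Keep every column whose name is not listed in drop_cols
toy_clean <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]
names(toy_clean)
```

With `subset(..., select = -c(...))` a missing column name raises an error, so this form is mainly useful when the input layout is not guaranteed.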

Summary

Let’s see a brief description of the data in each column.

summary(wisata)
#>   Place_Name          Category             City               Price       
#>  Length:437         Length:437         Length:437         Min.   :     0  
#>  Class :character   Class :character   Class :character   1st Qu.:     0  
#>  Mode  :character   Mode  :character   Mode  :character   Median :  5000  
#>                                                           Mean   : 24652  
#>                                                           3rd Qu.: 20000  
#>                                                           Max.   :900000  
#>                                                                           
#>      Rating       Time_Minutes         Lat              Long      
#>  Min.   :3.400   Min.   : 10.00   Min.   :-8.198   Min.   :103.9  
#>  1st Qu.:4.300   1st Qu.: 45.00   1st Qu.:-7.750   1st Qu.:107.6  
#>  Median :4.500   Median : 60.00   Median :-7.021   Median :110.2  
#>  Mean   :4.443   Mean   : 82.61   Mean   :-7.095   Mean   :109.2  
#>  3rd Qu.:4.600   3rd Qu.:120.00   3rd Qu.:-6.829   3rd Qu.:110.4  
#>  Max.   :5.000   Max.   :360.00   Max.   : 1.079   Max.   :112.8  
#>                  NA's   :232

  • Ticket prices range from free to a maximum of Rp. 900,000

  • The average ticket price across the 5 major cities is Rp. 24,652, well above the median of Rp. 5,000

  • The lowest rating is 3.4, while the highest rating is 5

  • Travel time to the tourist attractions ranges from 10 minutes to 6 hours, with an average of 82.61 minutes; 232 attractions have no recorded travel time

table(wisata$Category)
#> 
#>             Bahari             Budaya         Cagar Alam Pusat Perbelanjaan 
#>                 47                117                106                 15 
#>      Taman Hiburan      Tempat Ibadah 
#>                135                 17

The tourist attractions across the 5 cities fall into six categories. The largest category is Taman Hiburan (amusement parks) and the smallest is Pusat Perbelanjaan (shopping centers).

barplot(table(wisata$Category), xlab = "Category", ylab = "Number of Tourist Attractions")
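
To make the ranking of categories easier to read, the counts can be sorted and expressed as shares of the total. The sketch below re-enters the counts from the `table()` output above by hand, so it runs without the CSV file; the names `kategori`, `kategori_sorted` and `share` are only illustrative.

```r
# Category counts copied from the table() output above
kategori <- c(
  "Bahari" = 47, "Budaya" = 117, "Cagar Alam" = 106,
  "Pusat Perbelanjaan" = 15, "Taman Hiburan" = 135, "Tempat Ibadah" = 17
)

# Largest category first
kategori_sorted <- sort(kategori, decreasing = TRUE)

# Share of each category in percent, rounded to one decimal place
share <- round(100 * kategori_sorted / sum(kategori_sorted), 1)
share
```

On the real data, `sort(table(wisata$Category), decreasing = TRUE)` should give the same ordering and can be passed directly to `barplot()`.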

PRICE

Top 10 Tourist Attractions with Highest Ticket Price

head(wisata[order(-wisata$Price), c("Place_Name", "Category", "City", "Price")], 10)

Half of the list consists of tourist attractions in the amusement park category.

Average Ticket Price in Each City

xtabs(formula = Price ~ City, data = wisata) / table(wisata$City)
#> City
#>    Bandung    Jakarta   Semarang   Surabaya Yogyakarta 
#>   24931.45   45130.95   17017.54   10195.65   19456.35

Jakarta has the highest average ticket price (Rp. 45,130.95), while Surabaya has the lowest (Rp. 10,195.65).

barplot(xtabs(formula = Price ~ City, data = wisata) / table(wisata$City), xlab = "City", ylab = "Average Ticket Price")
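
As a quick consistency check, the per-city averages can be combined back into the overall average, weighting each city by its number of attractions. The sketch below copies the averages from the output above and the per-city counts from the `table(wisata$City)` output in the LOCATION section; the variable names are illustrative.

```r
# Per-city average ticket prices, copied from the output above
avg_price <- c(Bandung = 24931.45, Jakarta = 45130.95, Semarang = 17017.54,
               Surabaya = 10195.65, Yogyakarta = 19456.35)

# Number of attractions per city, copied from table(wisata$City)
n_city <- c(Bandung = 124, Jakarta = 84, Semarang = 57,
            Surabaya = 46, Yogyakarta = 126)

# Overall average = per-city averages weighted by the number of attractions
overall <- weighted.mean(avg_price, w = n_city)
round(overall)
```

The result agrees with the mean price reported by `summary(wisata)`. On the real data, `tapply(wisata$Price, wisata$City, mean)` should reproduce the per-city averages without dividing two tables.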

RATING

Top 10 Tourist Attractions with the Highest Rating

head(wisata[order(-wisata$Rating), c("Place_Name", "Category", "City", "Rating")], 10)

Tourist Attractions with the Lowest Rating

tail(wisata[order(-wisata$Rating), c("Place_Name", "Category", "City", "Rating")], 6)

Although several places in Bandung appear among the 10 highest-rated tourist attractions, several other attractions in Bandung sit at the bottom of the rating list.

# Count the attractions rated above 4.5 in each city
kota <- c("Jakarta", "Bandung", "Semarang", "Surabaya", "Yogyakarta")
rating_kota <- sapply(kota, function(k) sum(wisata$City == k & wisata$Rating > 4.5))
barplot(rating_kota, xlab = "City", ylab = "Number of Attractions Rated > 4.5")

TIME RANGE

Surabaya has the shortest typical travel times, which suggests that its tourist sites are close to the city center. Jakarta has several outliers: a few attractions take much longer to reach than the rest. Attractions with a missing Time_Minutes value are left out of the plot.

boxplot(formula = Time_Minutes ~ City, data = wisata)
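
A boxplot reading can be backed up with per-city medians. Because `Time_Minutes` contains missing values, `na.rm = TRUE` is needed, and it helps to report how many values each median is based on. The data frame `toy` below is invented for illustration and is not the Kaggle data.

```r
# Toy data (invented values), including NAs like the real Time_Minutes column
toy <- data.frame(
  City = c("Jakarta", "Jakarta", "Jakarta", "Surabaya", "Surabaya", "Surabaya"),
  Time_Minutes = c(60, 360, NA, 30, 45, NA)
)

# Median travel time per city, ignoring missing values
median_time <- tapply(toy$Time_Minutes, toy$City, median, na.rm = TRUE)

# Number of non-missing travel times behind each median
n_known <- tapply(!is.na(toy$Time_Minutes), toy$City, sum)

median_time
n_known
```

Replacing `toy` with `wisata` should give the medians that the boxplot draws as the thick line in each box.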

PRICE vs RATING

Is there a correlation between rating and price?

cor(wisata$Rating, wisata$Price)
#> [1] 0.02324285

With a coefficient of about 0.02, there is practically no linear correlation between Price and Rating: a higher rating does not go hand in hand with a higher ticket price, or vice versa.
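
One way to judge a coefficient of this size is the usual t test for a Pearson correlation. The sketch below uses only the coefficient printed above and the 437 rows of the dataset, assuming that neither `Price` nor `Rating` has missing values, as the `summary()` output suggests.

```r
# Values reported above: correlation coefficient and number of rows
r <- 0.02324285
n <- 437  # assumption: no missing Price or Rating values

# t statistic for the null hypothesis "the true correlation is zero"
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

round(t_stat, 2)
p_value > 0.05
```

The p-value is well above 0.05, so the data gives no evidence of a linear relationship. On the real data, `cor.test(wisata$Rating, wisata$Price)` should report the same test in one call.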

LOCATION

Number of Tourist Attractions in Each City

table(wisata$City)
#> 
#>    Bandung    Jakarta   Semarang   Surabaya Yogyakarta 
#>        124         84         57         46        126

Yogyakarta has the most tourist attractions, while Surabaya ranks last with only 46 tourist attractions.

barplot(table(wisata$City), xlab = "City", ylab = "Number of Tourist Attractions")

Map Plot of Every Tourist Attraction

We will look for the location of each tourist attraction, based on the data in the Long and Lat columns.

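library(tidyverse)  # provides the %>% pipe
library(leaflet)
wisata %>%
  leaflet(width = "100%") %>%
  addTiles() %>%
  # Center the initial view on Jakarta
  setView(106.8272, -6.175392, zoom = 5) %>%
  # One marker per attraction; clicking it shows the place name
  addMarkers(lat = ~Lat,
             lng = ~Long,
             popup = ~Place_Name)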
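
The `summary()` output shows a maximum `Lat` of 1.079 and a minimum `Long` of 103.9, which lie outside Java, so it may be worth flagging suspicious coordinates before reading the map. The sketch below uses a rough bounding box for Java; the box limits are an assumption, and the two rows of `toy` are illustrative (the first is the `setView()` center, the second combines the two extreme values, which need not belong to the same attraction).

```r
# Illustrative rows, not actual attractions from the dataset
toy <- data.frame(
  Place_Name = c("inside", "outside"),
  Lat  = c(-6.175392, 1.079),
  Long = c(106.8272, 103.9)
)

# Rough bounding box around Java (assumed limits, not taken from the source)
in_java <- toy$Lat >= -9 & toy$Lat <= -5.5 &
  toy$Long >= 105 & toy$Long <= 115

# Rows that fall outside the box and deserve a manual check
toy[!in_java, ]
```

Applying the same condition to `wisata` would list the attractions whose coordinates should be checked against the source.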