Background and Objectives

The project aims to analyze and forecast visitor trends to South Korea based on country of nationality and age groups. This information will help anticipate shifts in tourism demand and develop targeted strategies.

The main objectives are:

• Age Group Analysis: Identify which age groups visit South Korea most frequently and how this changes over time.

• Country-wise Trends: Analyze visitor trends by country of origin to determine the major tourism contributors.

• Seasonality and Long-Term Trends: Examine seasonal fluctuations and long-term trends in visitor arrivals to understand peak travel periods and shifts in demand.

More information and the complete dataset can be found using the following link:

https://www.kaggle.com/datasets/bappekim/south-korea-visitors?resource=download

Data wrangling, munging and cleaning

To clean the data, I performed the following steps:

  1. Check for missing values. Every column count is zero, so no missing values were found.
colSums(is.na(df))
##     date   nation  visitor   growth    share  age0.20 age21.30 age31.40 
##        0        0        0        0        0        0        0        0 
## age41.50 age51.60    age61 
##        0        0        0
  2. Check for duplicates. The count of duplicated rows is zero, so no duplicates were found.
sum(duplicated(df))
## [1] 0
  3. Correct data types (e.g. the date column was parsed into a Date with lubridate's ym()).
df$date <- ym(df$date)
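
The three cleaning steps can be rehearsed on a small data frame before running them on the full dataset. The sketch below is illustrative only: the `toy` data frame and the `check_quality()` helper are hypothetical names, the values are made up, and the "year-month" text format of the date column is an assumption.

```r
library(lubridate)

# Toy data standing in for the real CSV (made-up values, assumed date format).
toy <- data.frame(
  date    = c("2019-1", "2019-1", "2019-2"),
  nation  = c("China", "Japan", "China"),
  visitor = c(100, 80, 120),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect the missing-value and duplicate checks in one list.
check_quality <- function(data) {
  list(
    missing_per_column = colSums(is.na(data)),
    duplicate_rows     = sum(duplicated(data))
  )
}

quality <- check_quality(toy)

# Stop early if either check fails instead of inspecting the output by eye.
stopifnot(all(quality$missing_per_column == 0))
stopifnot(quality$duplicate_rows == 0)

# Parse the year-month text into a Date (first day of each month).
toy$date <- ym(toy$date)
class(toy$date)
```

Wrapping the checks in `stopifnot()` makes the script fail loudly if a later version of the data introduces missing values or duplicate rows.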

Exploratory Data Analysis

To address the first objective, I summed the visitor counts for each age group and sorted the totals in descending order.

df_long <- df %>%
  pivot_longer(
    cols = starts_with("age"),  
    names_to = "age", 
    values_to = "age_visitor"
  )

df_age_total <- df_long %>%
  group_by(age) %>%
  summarise(total_visitors = sum(age_visitor, na.rm = TRUE)) %>%
  arrange(desc(total_visitors))  # largest age group first
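
To make the wide-to-long reshaping concrete, here is a minimal sketch on a two-row toy table. The `toy` values are made up, and the `share` column is an illustrative addition that is not part of the original analysis.

```r
library(dplyr)
library(tidyr)

# Toy wide table: one row per nation, one column per age group (made-up values).
toy <- tibble(
  nation   = c("A", "B"),
  age0.20  = c(10, 5),
  age21.30 = c(30, 20)
)

# Each age column becomes a (age, age_visitor) pair, so 2 rows x 2 columns -> 4 rows.
toy_long <- toy %>%
  pivot_longer(
    cols = starts_with("age"),
    names_to = "age",
    values_to = "age_visitor"
  )

# Sum per age group, sort descending, and add each group's share of the total.
toy_totals <- toy_long %>%
  group_by(age) %>%
  summarise(total_visitors = sum(age_visitor, na.rm = TRUE)) %>%
  arrange(desc(total_visitors)) %>%
  mutate(share = total_visitors / sum(total_visitors))

toy_totals
```

Reporting a share alongside the raw total makes it easier to compare age groups when the overall visitor volume changes.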

For the second objective, I compiled a list of the 10 countries with the most visitors.

df_country_total <- df %>%
  group_by(nation) %>%
  summarise(total_visitors = sum(visitor, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_visitors))

top_10_countries <- df_country_total %>%
  slice_max(total_visitors, n = 10) %>%
  pull(nation)

head(top_10_countries)
## [1] "China"     "Japan"     "Taiwan"    "USA"       "Hong Kong" "Thailand"
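
Two details of the calls above are worth keeping in mind: `head()` returns only the first six elements by default, which is why six of the ten countries are printed, and `slice_max()` keeps ties at the cutoff by default, so it can return more than `n` rows. The sketch below illustrates both on made-up data; `toy_totals` is a hypothetical stand-in for `df_country_total`.

```r
library(dplyr)

# Toy totals with a tie at the cutoff (made-up values).
toy_totals <- tibble(
  nation         = c("A", "B", "C", "D"),
  total_visitors = c(500, 300, 300, 100)
)

# Default: ties are kept, so asking for 2 rows returns 3 here.
with_ties <- toy_totals %>% slice_max(total_visitors, n = 2)

# with_ties = FALSE caps the result at exactly n rows.
exactly_n <- toy_totals %>% slice_max(total_visitors, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(exactly_n)

# head() shows six elements unless n is given explicitly.
length(head(letters[1:10]))
length(head(letters[1:10], n = 10))
```

Printing `top_10_countries` directly, or calling `head(top_10_countries, n = 10)`, would display the full list.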

To address the seasonal fluctuations, I aggregated visitors by month. Comparing months shows when most people travel, and the peaks may coincide with holidays or seasonal events.

df <- df %>%
  mutate(year = year(date),
         month = month(date, label = TRUE))

df_monthly <- df %>%
  group_by(month) %>%
  summarise(avg_visitors = mean(visitor, na.rm = TRUE)) %>%
  arrange(match(month, month.abb))
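
Note that `mean(visitor)` grouped only by month averages over every nation-month row, so it reflects the typical count per country rather than the country-wide total. A possible alternative, sketched below on made-up data and assuming one row per nation per month, first sums across nations within each year and month and then averages those totals across years. The names `toy` and `avg_total_visitors` are hypothetical.

```r
library(dplyr)
library(lubridate)

# Toy data: two nations observed in January of two different years (made-up values).
toy <- tibble(
  date    = as.Date(c("2019-01-01", "2019-01-01", "2020-01-01", "2020-01-01")),
  nation  = c("A", "B", "A", "B"),
  visitor = c(100, 50, 200, 70)
)

toy_monthly <- toy %>%
  mutate(year = year(date),
         month = month(date, label = TRUE)) %>%
  # Step 1: total visitors across all nations for each year-month.
  group_by(year, month) %>%
  summarise(total_visitors = sum(visitor, na.rm = TRUE), .groups = "drop") %>%
  # Step 2: average those totals across years for each calendar month.
  group_by(month) %>%
  summarise(avg_total_visitors = mean(total_visitors))

toy_monthly
```

In this toy case the January totals are 150 and 270, so the average total is 210, whereas the per-row mean would be 105. The seasonal shape is usually similar either way, but the y-axis scale and its interpretation differ.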

Data Visualization

Age group with the highest total visitors

ggplot(df_age_total, aes(x = reorder(age, total_visitors), y = total_visitors, fill = age)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Total Visitors to South Korea by Age Group",
       x = "Age Group",
       y = "Total Number of Visitors") +
  theme_minimal()
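
The chart above uses the raw column names (such as `age21.30`) as axis labels. A small relabeling helper could make them easier to read. The sketch below is one way to do it; `pretty_age()` is a hypothetical helper, and reading `age61` as "61 and over" is an assumption about the dataset.

```r
# Hypothetical helper: turn "age21.30" into "21-30" and "age61" into "61+".
# Assumes a column name without a range (e.g. "age61") means "61 and over".
pretty_age <- function(x) {
  x <- sub("^age", "", x)             # drop the "age" prefix
  x <- sub(".", "-", x, fixed = TRUE) # replace the literal dot with a dash
  ifelse(grepl("-", x, fixed = TRUE), x, paste0(x, "+"))
}

pretty_age(c("age0.20", "age21.30", "age61"))
```

The helper could then be passed to the plot, for example with `scale_x_discrete(labels = pretty_age)`, leaving the underlying data unchanged.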

Top 10 countries by visitor count

df_top_10 <- df_country_total %>%
  filter(nation %in% top_10_countries)

ggplot(df_top_10, aes(x = reorder(nation, total_visitors), y = total_visitors, fill = nation)) +
  geom_bar(stat = "identity") +
  coord_flip() +  
  labs(title = "Top 10 Countries Visiting South Korea",
       x = "Country",
       y = "Total Visitors") +
  theme_minimal() +
  theme(legend.position = "none") 

Seasonality Trends

ggplot(df_monthly, aes(x = month, y = avg_visitors, group = 1)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "Average Monthly Visitors to South Korea",
       x = "Month",
       y = "Average Visitors") +
  theme_minimal()

Conclusion

The key findings from this dataset are:

• Visitors aged 21 to 30 are the age group that visits South Korea most frequently.

• Neighboring countries like China, Japan, and Taiwan lead the pack as top contributors of visitors.

• Most people visit South Korea in late summer (August) and early fall (September and October). This may be related to the weather the country experiences at that time of year.

With the information above, stakeholders in industries like transportation, accommodations, entertainment, and food services can prepare and deploy campaigns that target these demographics and peak periods. This could increase revenue for the country and provide better trip experiences for visitors.