This project analyzes and forecasts visitor trends to South Korea by nationality and age group. The results can help anticipate shifts in tourism demand and support targeted strategies.
The main objectives are:
• Age Group Analysis: Identify which age groups visit South Korea most frequently and how this changes over time.
• Country-wise Trends: Analyze visitor trends by country of origin to determine the major tourism contributors.
• Seasonality and Long-Term Trends: Examine seasonal fluctuations and long-term trends in visitor arrivals to understand peak travel periods and shifts in demand.
More information and the complete dataset can be found using the following link:
https://www.kaggle.com/datasets/bappekim/south-korea-visitors?resource=download
To clean the data, I checked for missing values and duplicate rows, then converted the date column to a date type:
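# Packages used throughout (df is assumed to be loaded already)
library(tidyverse)  # dplyr, tidyr, ggplot2
library(lubridate)  # ym(), year(), month()
# Count missing values in each column
colSums(is.na(df))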
##     date   nation  visitor   growth    share  age0.20 age21.30 age31.40
##        0        0        0        0        0        0        0        0
## age41.50 age51.60    age61
##        0        0        0
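# Count fully duplicated rows
sum(duplicated(df))
## [1] 0
# Parse the year-month strings into Date values (first day of each month)
df$date <- ym(df$date)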
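Because `ym()` returns `NA` for any string it cannot parse, it can be worth confirming that the conversion did not silently drop values. The following is a minimal sketch using made-up date strings; the `"YYYY-MM"` format is an assumption, and the real column may look different.

```r
library(lubridate)

# Hypothetical year-month strings; the dataset's actual format may differ.
raw_dates <- c("2019-01", "2019-02", "2019-12")

# ym() parses year-month strings and sets the day to the first of the month.
parsed <- ym(raw_dates)

# Any string that failed to parse would be counted here.
sum(is.na(parsed))
```

A result of zero means every string was converted; anything higher points to rows that need a closer look before aggregating by date.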
To address the first objective, I summed the visitor numbers by age group and sorted them in descending order.
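# Reshape the age columns into long format: one row per date, nation, and age group
df_long <- df %>%
  pivot_longer(
    cols = starts_with("age"),
    names_to = "age",
    values_to = "age_visitor"
  )
# Total visitors per age group, largest first
df_age_total <- df_long %>%
  group_by(age) %>%
  summarise(total_visitors = sum(age_visitor, na.rm = TRUE)) %>%
  arrange(desc(total_visitors))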
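The totals above rank the age groups but do not show how they change over time, which the first objective also asks about. One possible way to look at that is to group by year as well as age group. The sketch below uses a small invented table (`toy`) in place of `df`; the column names mirror the dataset, but the values are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy data standing in for df; the numbers are invented.
toy <- tibble(
  date     = as.Date(c("2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01")),
  age21.30 = c(100, 120, 40, 30),
  age31.40 = c(80, 90, 35, 25)
)

# Reshape to long format, then total the visitors per year and age group.
toy_age_year <- toy %>%
  pivot_longer(
    cols = starts_with("age"),
    names_to = "age",
    values_to = "age_visitor"
  ) %>%
  mutate(year = year(date)) %>%
  group_by(year, age) %>%
  summarise(total_visitors = sum(age_visitor, na.rm = TRUE), .groups = "drop")

toy_age_year
```

Plotting `total_visitors` against `year` with one line per `age` would then show whether the ranking of age groups is stable across years.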
For the second objective, I compiled a list of the top 10 countries by total visitors.
# Total visitors per country, largest first
df_country_total <- df %>%
  group_by(nation) %>%
  summarise(total_visitors = sum(visitor, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_visitors))
# Names of the 10 countries with the most visitors
top_10_countries <- df_country_total %>%
  slice_max(total_visitors, n = 10) %>%
  pull(nation)
head(top_10_countries)
## [1] "China" "Japan" "Taiwan" "USA" "Hong Kong" "Thailand"
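One detail of the `slice_max()` call above: by default it keeps ties, so it can return more than `n` rows when several countries share the cutoff value. That is unlikely with visitor totals this large, but the behavior is easy to check. A minimal sketch on an invented table (`toy_totals`, not part of the dataset):

```r
library(dplyr)

# Invented totals with a tie for second place.
toy_totals <- tibble(
  nation         = c("A", "B", "C", "D"),
  total_visitors = c(500, 300, 300, 100)
)

# Default (with_ties = TRUE): both tied rows are kept, so n = 2 yields 3 rows.
top_with_ties <- toy_totals %>%
  slice_max(total_visitors, n = 2)

# with_ties = FALSE caps the result at exactly n rows.
top_no_ties <- toy_totals %>%
  slice_max(total_visitors, n = 2, with_ties = FALSE)

nrow(top_with_ties)
nrow(top_no_ties)
```

Setting `with_ties = FALSE` is a reasonable choice when a fixed-length list is required, for example when the result feeds a chart titled "Top 10".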
To address seasonal fluctuations, I averaged visitor counts by calendar month. This shows when most people travel, which may coincide with holidays or seasonal events.
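# Derive the year and the month (as an ordered factor) from the parsed date
df <- df %>%
  mutate(year = year(date),
         month = month(date, label = TRUE))
# Average the visitor column per calendar month.
# month is an ordered factor, so arrange() sorts Jan to Dec in any locale.
df_monthly <- df %>%
  group_by(month) %>%
  summarise(avg_visitors = mean(visitor, na.rm = TRUE)) %>%
  arrange(month)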
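Note that `mean(visitor)` here averages individual rows, so assuming `df` holds one row per nation per month, the result is the average per nation rather than the average monthly total. If the monthly total is of interest, or a yearly total for the long-term trend, one option is to sum across nations first. The sketch below illustrates this on invented data (`toy`); the column names mirror the dataset, but the values are made up.

```r
library(dplyr)
library(lubridate)

# Toy data standing in for df: two nations, three year-months. Values are invented.
toy <- tibble(
  date    = as.Date(c("2019-01-01", "2019-01-01", "2020-01-01",
                      "2020-01-01", "2019-02-01", "2019-02-01")),
  nation  = c("A", "B", "A", "B", "A", "B"),
  visitor = c(100, 50, 60, 30, 80, 40)
)

# Step 1: total visitors across all nations for each year-month.
toy_month_totals <- toy %>%
  group_by(date) %>%
  summarise(total_visitors = sum(visitor, na.rm = TRUE), .groups = "drop")

# Step 2 (seasonality): average those totals for each calendar month.
toy_monthly <- toy_month_totals %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarise(avg_total_visitors = mean(total_visitors), .groups = "drop")

# Step 3 (long-term trend): total visitors per year.
toy_yearly <- toy_month_totals %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  summarise(total_visitors = sum(total_visitors), .groups = "drop")
```

Both approaches should show the same seasonal shape when every month has the same set of nations; they differ in scale and in how months with missing nations are weighted.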
Age group with the highest total visitors
ggplot(df_age_total, aes(x = reorder(age, total_visitors), y = total_visitors, fill = age)) +
  geom_col() +
  coord_flip() +
  labs(title = "Total Visitors to South Korea by Age Group",
       x = "Age Group",
       y = "Total Number of Visitors") +
  theme_minimal()
Top 10 countries by visitor count
df_top_10 <- df_country_total %>%
  filter(nation %in% top_10_countries)
ggplot(df_top_10, aes(x = reorder(nation, total_visitors), y = total_visitors, fill = nation)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Countries Visiting South Korea",
       x = "Country",
       y = "Total Visitors") +
  theme_minimal() +
  theme(legend.position = "none")
Seasonality Trends
ggplot(df_monthly, aes(x = month, y = avg_visitors, group = 1)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "Average Monthly Visitors to South Korea",
       x = "Month",
       y = "Average Visitors") +
  theme_minimal()
The key findings from this dataset are:
• Visitors aged 21 to 30 form the largest age group by total arrivals.
• Nearby countries such as China, Japan, and Taiwan are the top contributors of visitors.
• Visitor numbers peak in late summer (August) and early fall (September and October), possibly because of the weather during this period.
With this information, stakeholders in industries such as transportation, accommodation, entertainment, and food services can prepare and deploy campaigns aimed at these demographics. This could increase tourism revenue for the country and improve the trip experience for visitors.