Introduction: For my Business Analytics class, my group decided to build a luggage tourism business based in Seoul, South Korea. My role in the group's final project was to develop maps and graphs covering location, interest, travel popularity, and shopping satisfaction. Below is the code, with step-by-step notes on how I completed each visualization.
From Lines 11-16 I loaded the required library packages.
From Lines 25–66, I loaded the Hotspots2 dataset that contains key locations our business would offer luggage transportation for. From there, I created a Leaflet map that groups locations under a common “Hotel” category, removes theme parks, assigns distinct colors by location type, and visualizes the points on a map with an accompanying legend. I then saved the interactive map as an HTML file using saveWidget() and converted it to a high-resolution PNG using webshot().
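The map-building steps described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the stand-in data frame, its column names (Name, Type, Lat, Lng), the type labels, and the colors are all assumptions, since the structure of Hotspots2 is not shown here.

```r
library(leaflet)

# Hypothetical stand-in for Hotspots2; real column names and values may differ
hotspots <- data.frame(
  Name = c("Sample Hotel", "Sample Guesthouse", "Sample Landmark", "Sample Theme Park"),
  Type = c("Hotel", "Guesthouse", "Landmark", "Theme Park"),
  Lat  = c(37.5665, 37.5796, 37.5512, 37.5111),
  Lng  = c(126.9780, 126.9770, 126.9882, 127.0982),
  stringsAsFactors = FALSE
)

# Group lodging-style locations under a common "Hotel" category
hotspots$Type[hotspots$Type %in% c("Guesthouse", "Hostel")] <- "Hotel"

# Remove theme parks
hotspots <- subset(hotspots, Type != "Theme Park")

# One distinct color per remaining location type
pal <- colorFactor(palette = c("red", "blue"), domain = hotspots$Type)

# Plot the points and add a legend keyed to the same palette
hotspot_map <- leaflet(hotspots) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~Lng, lat = ~Lat, color = ~pal(Type),
                   popup = ~Name, radius = 6) %>%
  addLegend(position = "bottomright", pal = pal, values = ~Type,
            title = "Location type")
```

The widget could then be exported with htmlwidgets::saveWidget(hotspot_map, "hotspot_map.html") and converted with webshot::webshot("hotspot_map.html", file = "hotspot_map.png", zoom = 2); the file names here are placeholders, and webshot() needs PhantomJS installed.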
From Lines 70-79 I reset the working directory to /Users/asherscott/Desktop/Data 4Fun, loaded the Seoul.csv file into the ST data set using read_csv(), and displayed the first few rows with head().
From Lines 83-111 I used ggplot() and geom_col() to create and format an overlapping bar chart that compares yearly tourist arrivals in Seoul with those for all of South Korea. In most years Seoul alone accounts for roughly three quarters or more of the country's visitors; in the 2018–2024 data the exceptions are 2020 (42.7%) and 2021 (57.7%).
tourism_plot <- ggplot(ST, aes(x = factor(Year))) +
  # National total, drawn first so the Seoul bars overlap it
  geom_col(
    aes(y = Tourism_total / 1e6, fill = "Total Visitors"),
    position = "identity", alpha = 0.5, color = "black"
  ) +
  geom_col(
    aes(y = Tourism_capital / 1e6, fill = "Seoul Visitors"),
    position = "identity", alpha = 0.7, color = "black"
  ) +
  scale_fill_manual(
    values = c("Total Visitors" = "blue", "Seoul Visitors" = "red")
  ) +
  labs(
    title = "Tourism Trends Over Time",
    x = "Year",
    y = "Number of Tourists (Millions)",
    fill = "Tourist Type"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.background = element_blank()
  )

tourism_plot
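As a quick check on how large Seoul's share is, the ratio of the two plotted columns can be computed directly. The sketch below is only illustrative: it uses a small stand-in data frame (ST_sample, a name introduced here) holding the 2018–2024 values that appear in the merged output later in this write-up, and the Seoul_share column is likewise a name chosen for this example.

```r
# Stand-in for ST, using the 2018-2024 values printed in the merged dataset
ST_sample <- data.frame(
  Year            = 2018:2024,
  Tourism_total   = c(15346879, 17502756, 2519118, 967003,
                      3198017, 11031665, 16375409),
  Tourism_capital = c(12188460, 13377105, 1075664, 558341,
                      2634378, 8858427, 12854696)
)

# Seoul's share of national visitors, as a percentage rounded to one decimal
ST_sample$Seoul_share <- round(
  100 * ST_sample$Tourism_capital / ST_sample$Tourism_total, 1
)

print(ST_sample[, c("Year", "Seoul_share")])
```

These shares line up with the Nat_Percent column in the merged output, which suggests that column was derived the same way.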
From Lines 115-118 I loaded in the shopping data set.
Shop <- read.csv("Shopping_Int.csv")
head(Shop)
  Year Rate Satisfactory  X X.1 X.2 X.3
1 2024 58.2         58.6 NA  NA  NA  NA
2 2023 57.9         60.6 NA  NA  NA  NA
3 2022 51.5         55.0 NA  NA  NA  NA
4 2021 33.5         37.1 NA  NA  NA  NA
5 2020 16.1         19.7 NA  NA  NA  NA
6 2019 66.2         70.0 NA  NA  NA  NA
From Lines 122-158 I created a combined bar-and-line chart showing shopping rates and tourist satisfaction by year. I converted Year to a factor so it is treated as categorical, set the upper y-axis limit from the data using max(), and built the chart with geom_bar(), geom_line(), geom_point(), and geom_text().
Shop$Year <- as.factor(Shop$Year)

# Upper y-axis limit: largest value in either series, plus 5 points of headroom
y_max <- max(c(Shop$Rate, Shop$Satisfactory), na.rm = TRUE) + 5

shopping_plot <- ggplot(Shop, aes(x = Year)) +
  geom_bar(aes(y = Rate, fill = "Shopping Rate"), stat = "identity") +
  # linewidth replaces the size aesthetic for lines, deprecated in ggplot2 3.4.0
  geom_line(aes(y = Satisfactory, color = "Satisfactory"), group = 1, linewidth = 1) +
  geom_point(aes(y = Satisfactory, color = "Satisfactory"), size = 3) +
  geom_text(
    aes(y = Satisfactory, label = Satisfactory),
    vjust = -1, color = "black", size = 3.5, show.legend = FALSE
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Shopping Rate and Tourist Satisfaction by Year",
    x = "Year",
    y = "Percentage",
    fill = NULL,
    color = NULL
  ) +
  scale_fill_manual(values = c("Shopping Rate" = "steelblue")) +
  scale_color_manual(values = c("Satisfactory" = "black")) +
  ylim(0, y_max)
From Lines 162-170 I converted the Year variable to numeric in both the Shop and ST datasets, merged them by Year into a single dataset, filtered the merged data to include only years 2018–2024, and displayed the resulting combined dataset.
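# Year was converted to a factor for plotting; turn it back into a number
Shop$Year <- as.numeric(as.character(Shop$Year))
ST$Year <- as.numeric(as.character(ST$Year))

# Full outer join on Year, then keep 2018-2024 only
Combined <- merge(Shop, ST, by = "Year", all = TRUE)
Combined_filtered <- subset(Combined, Year >= 2018 & Year <= 2024)
print(Combined_filtered)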
   Year Rate Satisfactory  X X.1 X.2 X.3 Tourism_total Tourism_capital
5  2018 63.8         67.2 NA  NA  NA  NA      15346879        12188460
6  2019 66.2         70.0 NA  NA  NA  NA      17502756        13377105
7  2020 16.1         19.7 NA  NA  NA  NA       2519118         1075664
8  2021 33.5         37.1 NA  NA  NA  NA        967003          558341
9  2022 51.5         55.0 NA  NA  NA  NA       3198017         2634378
10 2023 57.9         60.6 NA  NA  NA  NA      11031665         8858427
11 2024 58.2         58.6 NA  NA  NA  NA      16375409        12854696
   Nat_Percent
5         79.4
6         76.4
7         42.7
8         57.7
9         82.4
10        80.3
11        78.5
From Lines 174-182 I converted Tourism_capital in the Combined_filtered dataset to actual people, estimated potential customers by applying the shopping Rate to this number, rounded the results for readability, and displayed the Year, Tourism_capital_real, Rate, and Potential_Customers columns.
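The estimate described above might look something like the sketch below. It is an illustration under assumptions rather than the original code: Combined_sample is a stand-in built from the printed 2018–2024 values, and scale_factor is a hypothetical placeholder set to 1 because the actual conversion applied to Tourism_capital is not shown.

```r
# Stand-in for Combined_filtered, using the values printed above
Combined_sample <- data.frame(
  Year            = 2018:2024,
  Rate            = c(63.8, 66.2, 16.1, 33.5, 51.5, 57.9, 58.2),
  Tourism_capital = c(12188460, 13377105, 1075664, 558341,
                      2634378, 8858427, 12854696)
)

# Assumption: the conversion to head counts is a simple multiplier;
# 1 leaves the values unchanged because the real factor is unknown
scale_factor <- 1
Combined_sample$Tourism_capital_real <-
  Combined_sample$Tourism_capital * scale_factor

# Apply the shopping Rate (a percentage) and round to whole people
Combined_sample$Potential_Customers <- round(
  Combined_sample$Tourism_capital_real * Combined_sample$Rate / 100
)

print(Combined_sample[, c("Year", "Tourism_capital_real",
                          "Rate", "Potential_Customers")])
```

Dividing Rate by 100 matters here: the column is stored as a percentage, so multiplying by it directly would overstate the estimate a hundredfold.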
From Lines 185-189 I loaded in the first length-of-stay data set.
LOS <- read.csv("LengthOfStay.csv")
head(LOS)
  Year    AE AE_NT  DE DE_NT  X X.1
1 2024 1,752 1,353 328   238 NA  NA
2 2023 2,152 1,257 333   229 NA  NA
3 2022 3,047       360   345 NA  NA
4 2021 4,217       265   152 NA  NA
5 2020 3,885       167    85 NA  NA
6 2019 1,442 1,239 245   148 NA  NA
From Lines 193-196 I loaded in the second Length-of-stay data set.
From Lines 200-253 I cleaned the LOS2 dataset by converting the AE, DE, and DE_NT columns to numeric using the mutate() function, where AE = Average Expense, DE = Daily Expense, and DE_NT = Daily Expense with Travel. I then created two bar graphs: one showing all three expense categories and another showing only the daily expenses (DE and DE_NT, excluding AE). Both graphs were customized with colors, labels, themes, and saved as PNG files.
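One detail worth illustrating in the cleaning step: values such as "1,752" contain thousands separators, so as.numeric() alone would return NA for them. The sketch below shows one way the conversion could be done inside mutate(). It is an assumption-laden example: LOS2_sample is a stand-in that borrows a few rows from the LOS preview above, since the contents of LOS2 are not shown, and the original code may have handled the commas differently.

```r
library(dplyr)

# Hypothetical stand-in for LOS2, with numbers stored as text
LOS2_sample <- data.frame(
  Year  = c(2024, 2023, 2019),
  AE    = c("1,752", "2,152", "1,442"),
  DE    = c("328", "333", "245"),
  DE_NT = c("238", "229", "148"),
  stringsAsFactors = FALSE
)

# Strip the commas, then convert each expense column to numeric
LOS2_clean <- LOS2_sample %>%
  mutate(across(c(AE, DE, DE_NT), ~ as.numeric(gsub(",", "", .x))))

str(LOS2_clean)
```

Once the columns are numeric, they can be passed to ggplot() for the bar graphs and written out with ggsave().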