In this final part, I worked with a travel pricing dataset comparing different booking agencies and service types (flights and hotels) across four seasons: Winter, Spring, Summer, and Fall. The dataset is untidy because it stores seasons as separate columns and combines multiple variables (Agency and Service Type) into a single field such as TravelOne - Flight. The goal was to tidy this structure into a long format and make it easier to compare prices across seasons, agencies, and services.
To start, I loaded the libraries used for organizing, transforming, and analyzing the data.
library(tidyverse)
I recreated the dataset in wide format to match the original structure.
travel_prices <- data.frame(
Agency = c("TravelOne - Flight", "TravelOne - Hotel",
"Google Travel - Flight", "Google Travel - Hotel",
"Booking.com - Flight", "Booking.com - Hotel"),
Winter = c(350, 500, 340, 480, 360, 490),
Spring = c(400, 550, 410, 530, 415, 540),
Summer = c(525, 700, 510, 690, 520, 710),
Fall = c(375, 600, 365, 580, 370, 590)
)
travel_prices
## Agency Winter Spring Summer Fall
## 1 TravelOne - Flight 350 400 525 375
## 2 TravelOne - Hotel 500 550 700 600
## 3 Google Travel - Flight 340 410 510 365
## 4 Google Travel - Hotel 480 530 690 580
## 5 Booking.com - Flight 360 415 520 370
## 6 Booking.com - Hotel 490 540 710 590
The dataset was untidy because each season was stored as a separate column and agency information mixed two variables (agency and service type). I reshaped the dataset into a long format using pivot_longer() and then separated the combined Agency field into two distinct variables: Agency and Service.
travel_tidy <- travel_prices %>%
pivot_longer(
cols = Winter:Fall,
names_to = "Season",
values_to = "Price"
) %>%
separate(Agency, into = c("Agency", "Service"), sep = " - ")
travel_tidy
## # A tibble: 24 × 4
## Agency Service Season Price
## <chr> <chr> <chr> <dbl>
## 1 TravelOne Flight Winter 350
## 2 TravelOne Flight Spring 400
## 3 TravelOne Flight Summer 525
## 4 TravelOne Flight Fall 375
## 5 TravelOne Hotel Winter 500
## 6 TravelOne Hotel Spring 550
## 7 TravelOne Hotel Summer 700
## 8 TravelOne Hotel Fall 600
## 9 Google Travel Flight Winter 340
## 10 Google Travel Flight Spring 410
## # ℹ 14 more rows
Next, I summarized the data to find the average price per service type and per season. This allowed me to identify which agencies and seasons tend to offer the most cost-effective travel options.
avg_by_service <- travel_tidy %>%
group_by(Service) %>%
summarise(Average_Price = mean(Price))
avg_by_season <- travel_tidy %>%
group_by(Season) %>%
summarise(Average_Price = mean(Price))
avg_by_service
## # A tibble: 2 × 2
## Service Average_Price
## <chr> <dbl>
## 1 Flight 412.
## 2 Hotel 580
avg_by_season
## # A tibble: 4 × 2
## Season Average_Price
## <chr> <dbl>
## 1 Fall 480
## 2 Spring 474.
## 3 Summer 609.
## 4 Winter 420
To better compare seasonal trends, I created a line chart showing the average travel prices across the four seasons.
ggplot(travel_tidy, aes(x = Season, y = Price, color = Service, group = Service)) +
geom_line(size = 1.2) +
geom_point(size = 2) +
labs(title = "Average Travel Prices by Season and Service Type",
y = "Average Price ($)",
x = "Season") +
theme_minimal()
After tidying and transforming the dataset, I was able to clearly compare seasonal price patterns across travel agencies and services. This final part of the project illustrated how untidy data, where multiple variables are embedded in column names or combined fields, can be reshaped into an organized structure that supports efficient analysis and visualization. Together, all three datasets demonstrated the importance of data tidying for uncovering insights, ensuring clarity, and enabling reproducible analysis.