1. Introduction

In this final part, I worked with a travel pricing dataset comparing different booking agencies and service types (flights and hotels) across four seasons: Winter, Spring, Summer, and Fall. The dataset is untidy because it stores seasons as separate columns and combines multiple variables (Agency and Service Type) into a single field such as TravelOne - Flight. The goal was to tidy this structure into a long format and make it easier to compare prices across seasons, agencies, and services.


2. Load Libraries

To start, I loaded the libraries used for organizing, transforming, and analyzing the data.

library(tidyverse)

3. Create the Dataset

I recreated the dataset in wide format to match the original structure.

travel_prices <- data.frame(
  Agency = c("TravelOne - Flight", "TravelOne - Hotel",
             "Google Travel - Flight", "Google Travel - Hotel",
             "Booking.com - Flight", "Booking.com - Hotel"),
  Winter = c(350, 500, 340, 480, 360, 490),
  Spring = c(400, 550, 410, 530, 415, 540),
  Summer = c(525, 700, 510, 690, 520, 710),
  Fall = c(375, 600, 365, 580, 370, 590)
)

travel_prices
##                   Agency Winter Spring Summer Fall
## 1     TravelOne - Flight    350    400    525  375
## 2      TravelOne - Hotel    500    550    700  600
## 3 Google Travel - Flight    340    410    510  365
## 4  Google Travel - Hotel    480    530    690  580
## 5   Booking.com - Flight    360    415    520  370
## 6    Booking.com - Hotel    490    540    710  590

4. Tidy the Data

The dataset was untidy because each season was stored as a separate column and agency information mixed two variables (agency and service type). I reshaped the dataset into a long format using pivot_longer() and then separated the combined Agency field into two distinct variables: Agency and Service.

travel_tidy <- travel_prices %>%
  pivot_longer(
    cols = Winter:Fall,
    names_to = "Season",
    values_to = "Price"
  ) %>%
  separate(Agency, into = c("Agency", "Service"), sep = " - ")

travel_tidy
## # A tibble: 24 × 4
##    Agency        Service Season Price
##    <chr>         <chr>   <chr>  <dbl>
##  1 TravelOne     Flight  Winter   350
##  2 TravelOne     Flight  Spring   400
##  3 TravelOne     Flight  Summer   525
##  4 TravelOne     Flight  Fall     375
##  5 TravelOne     Hotel   Winter   500
##  6 TravelOne     Hotel   Spring   550
##  7 TravelOne     Hotel   Summer   700
##  8 TravelOne     Hotel   Fall     600
##  9 Google Travel Flight  Winter   340
## 10 Google Travel Flight  Spring   410
## # ℹ 14 more rows

5. Transform and Summary

Next, I summarized the data to find the average price per service type and per season. This allowed me to identify which agencies and seasons tend to offer the most cost-effective travel options.

avg_by_service <- travel_tidy %>%
  group_by(Service) %>%
  summarise(Average_Price = mean(Price))

avg_by_season <- travel_tidy %>%
  group_by(Season) %>%
  summarise(Average_Price = mean(Price))

avg_by_service
## # A tibble: 2 × 2
##   Service Average_Price
##   <chr>           <dbl>
## 1 Flight           412.
## 2 Hotel            580
avg_by_season
## # A tibble: 4 × 2
##   Season Average_Price
##   <chr>          <dbl>
## 1 Fall            480 
## 2 Spring          474.
## 3 Summer          609.
## 4 Winter          420

6. Visualization

To better compare seasonal trends, I created a line chart showing the average travel prices across the four seasons.

ggplot(travel_tidy, aes(x = Season, y = Price, color = Service, group = Service)) +
  geom_line(size = 1.2) +
  geom_point(size = 2) +
  labs(title = "Average Travel Prices by Season and Service Type", 
       y = "Average Price ($)", 
       x = "Season") +
  theme_minimal()

7. Conclusion

After tidying and transforming the dataset, I was able to clearly compare seasonal price patterns across travel agencies and services. This final part of the project illustrated how untidy data, where multiple variables are embedded in column names or combined fields, can be reshaped into an organized structure that supports efficient analysis and visualization. Together, all three datasets demonstrated the importance of data tidying for uncovering insights, ensuring clarity, and enabling reproducible analysis.