Hotel Data Analysis with Group by and Probabilities

Initial Setup and Data Configuration
Install and load the dplyr and ggplot2 libraries.
Load the data set into the variable h_data with read.csv(); this data frame is used throughout the program.
Convert the reservation_status_date column to R's Date class.
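A minimal setup sketch follows. The CSV file name is hypothetical because the document does not give it, so the read.csv() call is commented out. The example instead uses a few invented date strings to show one behavior worth checking: as.Date() returns NA, not an error, when a string does not match the format pattern.

```r
# One-time installation (uncomment if the packages are missing)
# install.packages(c("dplyr", "ggplot2"))
library(dplyr)
library(ggplot2)

# Hypothetical file name: the original document does not state the path
# h_data <- read.csv("hotel_bookings.csv")

# Invented sample values standing in for h_data$reservation_status_date
raw_dates <- c("07/01/2015", "07/15/2015", "2015-07-03")

# Parse mm/dd/YYYY text into R's Date class
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
print(parsed)

# Strings in any other layout silently become NA, so count them after converting
n_unparsed <- sum(is.na(parsed))
print(n_unparsed)
```

A nonzero count after converting the real column would suggest the file stores dates in a different layout than mm/dd/YYYY.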

Group 1: Average lead_time by market_segment
1. Parse reservation_status_date, stored as mm/dd/YYYY text, into R's Date class.
2. Group the data by market_segment using group_by().
3. Calculate the mean lead_time for each market segment using summarize().
4. Count the observations in each group using n().
5. Filter the summarized data to keep the group with the smallest count (n).
6. Tag that smallest group as "Lowest Probability Group".
7. Merge the tagged group back into the original data frame by market_segment.

# Load dplyr for group_by(), summarize(), filter(), mutate() and left_join()
library(dplyr)

# Parse reservation_status_date from mm/dd/YYYY text into R's Date class
h_data$reservation_status_date <- as.Date(h_data$reservation_status_date, format = "%m/%d/%Y")

# Average lead time and number of bookings per market segment
grouped_market_segment <- h_data %>%
  group_by(market_segment) %>%
  summarize(average_lead_time = mean(lead_time),
            n = n()) %>%
  ungroup()

# Assign a special tag to the smallest group.
# The filter runs on the ungrouped summary so that n is compared across all segments.
smallest_group_market_segment <- grouped_market_segment %>%
  filter(n == min(n)) %>%
  mutate(special_tag = "Lowest Probability Group")

# Merge back to the original data frame (rows from other segments get NA in the new columns)
hotel_data_with_tag_market_segment <- h_data %>%
  left_join(smallest_group_market_segment, by = "market_segment")
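The "smallest group" filter must run on the ungrouped summary table. Calling group_by_all() first makes every row its own group. Inside each one-row group, min(n) equals that row's own n, so the filter keeps every segment. The sketch below shows the difference on an invented three-segment summary. The segment names follow the data set's style, but the counts are made up.

```r
library(dplyr)

# Invented summary in the shape of grouped_market_segment
summary_tbl <- data.frame(
  market_segment    = c("Online TA", "Groups", "Aviation"),
  average_lead_time = c(80, 150, 5),
  n                 = c(500L, 200L, 10L)
)

# Pitfall: each row becomes its own group, so n == min(n) is always TRUE
kept_grouped <- summary_tbl %>%
  group_by_all() %>%
  filter(n == min(n)) %>%
  ungroup()

# Filtering the ungrouped table compares n across all segments
kept_plain <- summary_tbl %>%
  filter(n == min(n))

print(nrow(kept_grouped))  # all 3 segments survive
print(kept_plain$market_segment)
```

If several groups tie for the smallest count, the plain filter keeps all of them, so more than one group can receive the tag.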

Including Plot #1

Create a bar plot with ggplot() to visualize the average lead time for each market segment.
The x-axis shows ‘Market Segment’ and the y-axis shows ‘Average Lead Time’.
The bars use a green fill.
The plot uses a minimal theme.
Print the generated plot.
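The document does not show the plotting code, so the following is a minimal sketch reconstructed from the description above. It assumes the summary table is named grouped_market_segment, as in the Group 1 code. An invented stand-in table keeps the example self-contained, and the exact colour value "green" is an assumption.

```r
library(ggplot2)

# Invented stand-in for grouped_market_segment (values are illustrative)
grouped_market_segment <- data.frame(
  market_segment    = c("Aviation", "Direct", "Groups", "Online TA"),
  average_lead_time = c(5, 50, 150, 80),
  n                 = c(10L, 120L, 200L, 500L)
)

# geom_col() takes bar heights directly from the data
# (equivalent to geom_bar(stat = "identity"))
plot_lead_time <- ggplot(grouped_market_segment,
                         aes(x = market_segment, y = average_lead_time)) +
  geom_col(fill = "green") +
  labs(x = "Market Segment", y = "Average Lead Time") +
  theme_minimal()

print(plot_lead_time)
```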

Insight: This analysis identifies the market segment with the fewest bookings, which is the lowest empirical probability, and compares the average lead time across segments.
The bar plot shows the average lead time for each market segment. The special tag marks the least frequent segment in the merged data.
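The "probability" behind the tag is the empirical share of bookings in each group, P(group) = n / N, where N is the total number of bookings. The code above never computes this share explicitly. The sketch below adds it to an invented summary table. The column name probability is a hypothetical addition.

```r
library(dplyr)

# Invented counts in the shape of grouped_market_segment (N = 1000)
grouped_market_segment <- data.frame(
  market_segment    = c("Aviation", "Groups", "Online TA"),
  average_lead_time = c(5, 150, 80),
  n                 = c(10L, 190L, 800L)
)

# Empirical probability: share of all bookings that fall in each segment
with_prob <- grouped_market_segment %>%
  mutate(probability = n / sum(n)) %>%
  arrange(probability)

# After sorting, the first row is the lowest-probability group
print(with_prob)
```

Because every group shares the same N, the group with the smallest n is also the group with the smallest probability. That is why filtering on n == min(n) finds it.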

Group 2: Average stays_in_weekend_nights by meal
1. Group the data by the meal column using group_by().
2. Calculate the mean of stays_in_weekend_nights within each meal group using summarize().
3. Count the observations in each group using n().
4. Filter the summarized data to keep the group with the smallest count (n).
5. Tag that smallest group as "Lowest Probability Group".
6. Merge the tagged group back into the original data frame using the meal column.

grouped_meal <- h_data %>%
  group_by(meal) %>%
  summarize(average_stays_in_weekend_nights = mean(stays_in_weekend_nights),
            n = n()) %>%
  ungroup()
# Assign a special tag to the smallest group.
# The filter runs on the ungrouped summary so that n is compared across all meal types.
smallest_group_meal <- grouped_meal %>%
  filter(n == min(n)) %>%
  mutate(special_tag = "Lowest Probability Group")

# Merge back to the original data frame
hotel_data_with_tag_meal <- h_data %>%
  left_join(smallest_group_meal, by = "meal")
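left_join() keeps every booking. Rows whose meal is not the smallest group get NA in the joined columns, including special_tag. The sketch below checks this on invented rows. As an optional assumption, it then fills the gaps with a hypothetical "Other" label using dplyr's coalesce().

```r
library(dplyr)

# Invented bookings: meal codes and weekend nights are illustrative
h_data <- data.frame(
  meal = c("BB", "BB", "HB", "HB", "SC"),
  stays_in_weekend_nights = c(1, 2, 0, 2, 1)
)

smallest_group_meal <- h_data %>%
  group_by(meal) %>%
  summarize(average_stays_in_weekend_nights = mean(stays_in_weekend_nights),
            n = n()) %>%
  filter(n == min(n)) %>%
  mutate(special_tag = "Lowest Probability Group")

# left_join() keeps all 5 bookings; non-matching rows get NA in the new columns
tagged <- h_data %>%
  left_join(smallest_group_meal, by = "meal")
print(sum(!is.na(tagged$special_tag)))  # rows carrying the tag

# Optional: replace NA with a hypothetical "Other" label
tagged <- tagged %>%
  mutate(special_tag = coalesce(special_tag, "Other"))
print(table(tagged$special_tag))
```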

Including Plot #2

Create a bar plot with ggplot() to visualize the average number of weekend nights stayed for each meal type.
The x-axis shows ‘Meal Type’ and the y-axis shows ‘Average Stays in Weekend Nights’.
The bars use a green fill.
The plot uses a minimal theme.
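The plotting code for Plot #2 is not shown, so here is a minimal sketch reconstructed from the description. It assumes the summary table grouped_meal from the Group 2 code, replaced here by an invented stand-in. The colour value "green" is an assumption.

```r
library(ggplot2)

# Invented stand-in for grouped_meal (values are illustrative)
grouped_meal <- data.frame(
  meal = c("BB", "HB", "SC"),
  average_stays_in_weekend_nights = c(1.5, 1.0, 1.0),
  n = c(2L, 2L, 1L)
)

plot_meal <- ggplot(grouped_meal,
                    aes(x = meal, y = average_stays_in_weekend_nights)) +
  geom_col(fill = "green") +
  labs(x = "Meal Type", y = "Average Stays in Weekend Nights") +
  theme_minimal()

print(plot_meal)
```

As an optional tweak, mapping x to reorder(meal, -average_stays_in_weekend_nights) would sort the bars from the highest to the lowest average.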

Insight: The analysis looks for differences in the average number of weekend nights stayed across meal types.
The bar plot compares these averages side by side, showing which meal plans are associated with longer or shorter weekend stays.
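The group averages and the group probabilities are linked. Weighting each meal type's average by its share of bookings, n / N, recovers the overall average. This is the law of total expectation, E[X] = sum over groups g of P(g) * E[X | g]. The check below runs on a handful of invented rows.

```r
library(dplyr)

# Invented bookings (illustrative values only)
h_data <- data.frame(
  meal = c("BB", "BB", "HB", "HB", "SC"),
  stays_in_weekend_nights = c(1, 2, 0, 2, 1)
)

grouped_meal <- h_data %>%
  group_by(meal) %>%
  summarize(average_stays_in_weekend_nights = mean(stays_in_weekend_nights),
            n = n())

# Weight each group mean by its empirical probability n / N
weighted_mean <- with(grouped_meal,
                      sum(average_stays_in_weekend_nights * n / sum(n)))
overall_mean <- mean(h_data$stays_in_weekend_nights)

print(c(weighted = weighted_mean, overall = overall_mean))
```

Here both values are 6 / 5 = 1.2. The identity is a useful sanity check that the grouped summary covers every booking.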

Group 3: Average total_of_special_requests by deposit_type
1. Group the data by the deposit_type column using group_by().
2. Calculate the mean of total_of_special_requests within each deposit_type group using summarize().
3. Count the observations in each group using n().
4. Filter the summarized data to keep the group with the smallest count (n).
5. Tag that smallest group as "Lowest Probability Group".
6. Merge the tagged group back into the original data frame using the deposit_type column.

grouped_deposit_type <- h_data %>%
  group_by(deposit_type) %>%
  summarize(average_total_of_special_requests = mean(total_of_special_requests),
            n = n()) %>%
  ungroup()

# Assign a special tag to the smallest group.
# The filter runs on the ungrouped summary so that n is compared across all deposit types.
smallest_group_deposit_type <- grouped_deposit_type %>%
  filter(n == min(n)) %>%
  mutate(special_tag = "Lowest Probability Group")

# Merge back to the original data frame
hotel_data_with_tag_deposit_type <- h_data %>%
  left_join(smallest_group_deposit_type, by = "deposit_type")
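The three grouping pipelines differ only in their column names, so they could be factored into one function. The sketch below is a hypothetical helper, tag_smallest_group(), not part of the original analysis. It takes column names as strings and uses dplyr's .data pronoun. The generic output column name average and the sample rows are assumptions.

```r
library(dplyr)

# Hypothetical helper: group by group_col, average value_col,
# tag the group(s) with the smallest count, and join back to the data
tag_smallest_group <- function(data, group_col, value_col) {
  smallest <- data %>%
    group_by(.data[[group_col]]) %>%
    summarize(average = mean(.data[[value_col]]), n = n(), .groups = "drop") %>%
    filter(n == min(n)) %>%
    mutate(special_tag = "Lowest Probability Group")
  left_join(data, smallest, by = group_col)
}

# Invented sample rows
h_data <- data.frame(
  deposit_type = c("No Deposit", "No Deposit", "No Deposit",
                   "Non Refund", "Non Refund", "Refundable"),
  total_of_special_requests = c(1, 0, 2, 0, 0, 3)
)

tagged_deposit <- tag_smallest_group(h_data, "deposit_type",
                                     "total_of_special_requests")
print(tagged_deposit)
```

Calling the same helper with ("market_segment", "lead_time") or ("meal", "stays_in_weekend_nights") would cover the other two groupings.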

Including Plot #3

1. Create a bar plot with ggplot() to visualize the average total of special requests for each deposit type.
2. The x-axis shows ‘Deposit Type’ and the y-axis shows ‘Average Total Special Requests’.
3. The bars use the fill color "#66c2a5".
4. The plot uses a minimal theme.
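The code for Plot #3 is not shown either. The following minimal sketch is reconstructed from the description and uses the stated fill "#66c2a5". It assumes the summary table grouped_deposit_type from the Group 3 code, replaced here by an invented stand-in.

```r
library(ggplot2)

# Invented stand-in for grouped_deposit_type (values are illustrative)
grouped_deposit_type <- data.frame(
  deposit_type = c("No Deposit", "Non Refund", "Refundable"),
  average_total_of_special_requests = c(0.6, 0.0, 0.3),
  n = c(300L, 100L, 5L)
)

plot_deposit <- ggplot(grouped_deposit_type,
                       aes(x = deposit_type,
                           y = average_total_of_special_requests)) +
  geom_col(fill = "#66c2a5") +
  labs(x = "Deposit Type", y = "Average Total Special Requests") +
  theme_minimal()

print(plot_deposit)
```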

Insight: The analysis explores the relationship between deposit type and the average total of special requests made by guests. The visualization compares this average across deposit types.
This shows whether specific deposit types are associated with a higher or lower average number of special requests, which can inform hotel management decisions.
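The special tag can also be made visible in a chart. In the sketch below, each deposit type is labelled as the lowest-probability group or not, and that label is mapped to the bar fill. The label column group_type, the "Other" label and the grey colour are assumptions, and the numbers are invented.

```r
library(dplyr)
library(ggplot2)

# Invented stand-in for grouped_deposit_type (values are illustrative)
grouped_deposit_type <- data.frame(
  deposit_type = c("No Deposit", "Non Refund", "Refundable"),
  average_total_of_special_requests = c(0.6, 0.0, 0.3),
  n = c(300L, 100L, 5L)
)

# Label the group with the fewest bookings
flagged <- grouped_deposit_type %>%
  mutate(group_type = if_else(n == min(n), "Lowest Probability Group", "Other"))

plot_flagged <- ggplot(flagged,
                       aes(x = deposit_type,
                           y = average_total_of_special_requests,
                           fill = group_type)) +
  geom_col() +
  scale_fill_manual(values = c("Lowest Probability Group" = "#66c2a5",
                               "Other" = "grey70"),
                    name = NULL) +
  labs(x = "Deposit Type", y = "Average Total Special Requests") +
  theme_minimal()

print(plot_flagged)
```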

Thank You

Reshu Gupta

2024-01-29
