1. Introduction

This report is divided into two main parts: the first analyzes user search patterns, and the second analyzes user frustration, measured as search sessions that return no results. The goal is to understand how users search and which scenarios lead to no search results.

The dataset comprises 11 main features:

  • Destination Country
  • Destination City
  • Search Date
  • Search Day
  • Check in Date
  • Check in Day
  • Traveller Type
  • Session
  • No Search Result Session
  • No Search Result %
  • Bookings

Since the destination country and city are consistently Hong Kong, we will disregard the impact of these features. Additionally, since we want to analyze trends in search dates relative to check-in dates in the user search pattern analysis, a new feature called Days in Advance is created by subtracting the search date from the check-in date (Check-in Date - Search Date). This feature quantifies how far in advance users are searching and is categorized into the following 11 intervals:

  • 0D: On the same day
  • 1D: 1 day in advance
  • 2D ~ 3D: 2 to 3 days in advance
  • 4D ~ 1W: 4 to 7 days in advance
  • 1W ~ 2W: 8 to 14 days in advance
  • 2W ~ 1M: 15 to 30 days in advance
  • 1M ~ 2M: 31 to 60 days in advance
  • 2M ~ 3M: 61 to 90 days in advance
  • 3M ~ 6M: 91 to 180 days in advance
  • 6M ~ 1Y: 181 to 365 days in advance
  • Over 1Y: More than 365 days in advance

(D stands for Day, W for Week, M for Month, Y for Year)
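The Days in Advance feature and its 11 intervals can be sketched in Python as below. This is a minimal illustration, not the code used to build the dataset: the function and constant names are hypothetical, the interval bounds are read directly from the list above (inclusive upper bounds in days), and a negative lead time (check-in before search) is assumed to be invalid.

```python
from datetime import date

# (inclusive upper bound in days, label), following the 11 intervals above.
# Anything above 365 days falls into "Over 1Y".
LEAD_TIME_BINS = [
    (0, "0D"),
    (1, "1D"),
    (3, "2D ~ 3D"),
    (7, "4D ~ 1W"),
    (14, "1W ~ 2W"),
    (30, "2W ~ 1M"),
    (60, "1M ~ 2M"),
    (90, "2M ~ 3M"),
    (180, "3M ~ 6M"),
    (365, "6M ~ 1Y"),
]


def days_in_advance(search_date: date, check_in_date: date) -> int:
    """Days in Advance = Check-in Date - Search Date."""
    return (check_in_date - search_date).days


def lead_time_category(days: int) -> str:
    """Map a non-negative lead time in days to its interval label."""
    if days < 0:
        # Assumption: a check-in date before the search date is treated as invalid.
        raise ValueError("check-in date precedes search date")
    for upper, label in LEAD_TIME_BINS:
        if days <= upper:
            return label
    return "Over 1Y"
```

For example, a search on 10 April 2024 for a 1 May 2024 check-in gives 21 days, which falls in "2W ~ 1M".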

Another key feature is Traveller Type, categorized into five types (excluding the ‘0’ type, which is undefined in the data):

  • Solo
  • 2 pax
  • 3-4 pax
  • >= 5 pax
  • Family

The pie chart below shows the total search session percentage for each type of traveller:

The predominant traveller type in our dataset is 2 pax (two adults), accounting for about 70% of the total. This is followed by Family and 3-4 pax. The types Solo and >= 5 pax are relatively rare, comprising only 2.4% and 3.3% respectively. The predominance of two-person travel might be due to the popularity and availability of hotel rooms for two, suggesting that this segment should be our primary target to increase the conversion rate from searching to booking.
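Each slice of the pie chart is one traveller type's sessions divided by the total sessions. A minimal Python sketch, assuming hypothetical (traveller_type, sessions) rows and assuming the undefined '0' type is dropped before the total is computed; the function name is also hypothetical.

```python
from collections import Counter


def session_share(rows):
    """Percentage of total search sessions per traveller type.

    `rows` is an iterable of (traveller_type, sessions) pairs. Rows with the
    undefined '0' traveller type are dropped before the total is computed.
    """
    totals = Counter()
    for traveller_type, sessions in rows:
        if traveller_type == "0":
            continue
        totals[traveller_type] += sessions
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {t: 100.0 * n / grand_total for t, n in totals.items()}
```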

2. User Search Pattern Analysis

2.1 From Total Session Perspective

The bar chart below illustrates the distribution of sessions by days in advance, without taking traveller type into account. Observations:

  • Short-term searching: A substantial share of sessions falls within short notice (0D to 3D), with same-day searches (0D) alone accounting for 11%. This suggests a high demand for immediate or very short-term travel plans.
  • Peak searching period: The highest session percentage occurs 2 weeks to 1 month in advance (19.2%), indicating that this is the most common lead time. This might be driven by users looking to secure better rates or availability.
  • Decline in longer-term planning: Beyond 2 months, there is a noticeable decline in session percentages, except for the 6 months to 1 year category, which slightly increases to 8%.

The faceted bar chart below extends the previous analysis by incorporating traveller type into the session distribution, showing how different types of travellers search for their stays. Observations:

  • Solo and Small Groups: Solo travellers and small groups (2 pax and 3-4 pax) tend to search closer to their travel date, with a notable percentage of sessions occurring on the same day (0D).
  • Large Groups and Families: In contrast, larger groups (>= 5 pax) and families show higher session percentages 1 to 2 months in advance. This suggests that these groups likely need to plan further ahead to accommodate their specific needs.
  • Consistent Trends: Across all traveller types, session percentages peak between 2 weeks and 2 months in advance, which may indicate the most favorable booking window for many travellers.
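Comparing panels in a faceted chart is most meaningful when each panel is a distribution within one traveller type, so that percentages sum to 100% per panel rather than across the whole dataset; otherwise the dominant 2 pax segment would dwarf the others. A minimal Python sketch under that assumption (the row layout (traveller_type, category, sessions) and the function name are hypothetical):

```python
from collections import defaultdict


def share_within_type(rows):
    """Distribution of sessions over lead-time categories within each traveller type.

    `rows` is an iterable of (traveller_type, category, sessions) triples.
    Returns {traveller_type: {category: percent}}, where each inner dict sums to 100.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traveller_type, category, sessions in rows:
        counts[traveller_type][category] += sessions

    shares = {}
    for traveller_type, by_category in counts.items():
        total = sum(by_category.values())
        if total == 0:
            continue  # no sessions for this type, nothing to normalize
        shares[traveller_type] = {c: 100.0 * n / total for c, n in by_category.items()}
    return shares
```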

By aggregating the information from the analysis above, we can draw the following conclusions for different types of travellers:

  • Solo and Small Groups: Marketing initiatives could be tailored to encourage earlier bookings, especially for solo travellers and small groups, who currently tend to book at the last minute. Also, given the peak at 2 weeks to 2 months in advance, marketing efforts should be timed to increase visibility around this period.
  • Large Groups and Families: Collaborating with hotels to offer early booking guarantees for larger groups and families can alleviate the stress of finding suitable accommodations. Targeted incentives and packages for those booking well in advance, particularly in windows with occasional spikes such as the 6M ~ 1Y category, could further improve the booking experience and satisfaction.

2.2 From Search Day Perspective

The bar chart below displays the overall percentage of search sessions by day of the week, giving an initial view of which days users most often search for accommodations. However, the chart shows no notable differences in the percentage of search sessions across the days of the week. The faceted bar charts below therefore extend the analysis by breaking the session distribution down by traveller type and by days in advance, respectively.

The first faceted chart breaks the weekday distribution down by traveller type. Observations:

  • No Significant Differences: The search day percentages across different traveller types do not show significant variations, suggesting that the day of the week has a minimal impact on when different types of travellers decide to search.
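The claim that sessions do not differ across days of the week could be checked formally with a chi-square goodness-of-fit test against a uniform weekday split. This is a swapped-in technique, not one used in this report. The sketch below computes the statistic in plain Python and compares it with the 5% critical value for 6 degrees of freedom (about 12.59); the function name and the input layout (seven session counts, one per weekday) are hypothetical. `scipy.stats.chisquare` computes the same statistic if SciPy is available. With counts as large as a search log's, even very small differences can come out as significant, so the size of the deviation matters as much as the test result.

```python
# 95th percentile of the chi-square distribution with 6 degrees of freedom
# (7 weekdays - 1), i.e. the critical value for a test at the 5% level.
CHI2_CRIT_DF6_5PCT = 12.592


def weekday_uniformity_chi2(counts):
    """Chi-square goodness-of-fit test of weekday session counts against a uniform split.

    `counts` holds seven session counts, one per day of the week.
    Returns (statistic, reject_uniform_at_5pct).
    """
    if len(counts) != 7:
        raise ValueError("expected one count per day of the week")
    expected = sum(counts) / 7
    if expected == 0:
        raise ValueError("no sessions to test")
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, statistic > CHI2_CRIT_DF6_5PCT
```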

The second faceted chart breaks the weekday distribution down by days in advance. Observations:

  • Immediate Searching (0D): Search activity is relatively lower around weekends, suggesting that different factors drive same-day booking decisions.
  • Searching 1D and 2D~3D: There is a noticeable increase in search activities on weekends, indicating a last-minute planning trend.
  • Searching 4D~1W and 1W ~ 2W: A similar weekend search trend is observed but with less pronounced differences.
  • Searching Beyond 2 Weeks: Search activities are spread more evenly throughout the week, with a slight peak on Fridays.

By aggregating the information from the two bar charts above, we can draw the following conclusion:

  • Further Analysis Required: While certain short-term booking trends can be observed, particularly with last-minute searches peaking on weekends, these patterns do not consistently hold as the booking lead time increases. The absence of a clear and consistent pattern across different segments indicates the need for a broader data set for analysis.

3. User Frustration Analysis

3.1 No Result Session Percentage

The heatmap below illustrates the no result session percentage across different traveller types and days in advance.

Reading the heatmap vertically, the no result session percentage clearly increases with the size of the traveller group. This trend holds regardless of how early travellers search for accommodations, indicating a strong positive relationship between group size and the likelihood of no results.

Reading the heatmap horizontally, there is no consistent pattern between days in advance and the no result session percentage across traveller types. However, searches made 6 months to 1 year in advance show a notably higher likelihood of no results.

From the observations above, we can conclude that the no result session percentage is likely associated with traveller type and shows little to no association with days in advance. Notably, the no result session percentage for groups of five or more travellers is substantially higher than that for smaller groups, surpassing even family groups.
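Each heatmap cell is a no result session percentage for one traveller type and lead-time category. When several rows fall into the same cell, summing the No Search Result Session and Session counts before dividing gives a session-weighted rate, whereas averaging the row-level No Search Result % values would overweight small rows. A minimal Python sketch, assuming hypothetical (traveller_type, category, sessions, no_result_sessions) rows and a hypothetical function name:

```python
from collections import defaultdict


def no_result_pct(rows):
    """No result session percentage per (traveller_type, category) heatmap cell.

    `rows` is an iterable of (traveller_type, category, sessions, no_result_sessions).
    Counts are summed per cell before dividing, so larger rows carry more weight.
    """
    sessions = defaultdict(int)
    no_result = defaultdict(int)
    for traveller_type, category, n, n_empty in rows:
        cell = (traveller_type, category)
        sessions[cell] += n
        no_result[cell] += n_empty
    return {cell: 100.0 * no_result[cell] / n for cell, n in sessions.items() if n > 0}
```

For example, `no_result_pct([(">= 5 pax", "0D", 10, 4), (">= 5 pax", "0D", 30, 2)])` gives 15.0 for that cell (6 / 40), whereas a plain mean of the two row percentages (40% and about 6.7%) would give about 23.3%.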

The following heatmap represents a calendar view of total search sessions per check-in date.

The plot shows that check-in dates in May 2024 attract the most search activity, consistent with the earlier finding that session percentages peak between two weeks and two months in advance (the search data cover April 2024). In contrast, search sessions for check-in dates after July 2024 decrease drastically, with calendar blocks mostly white, indicating very low activity.

The heatmap below contrasts with the previous one by displaying the no result session percentage, rather than total search sessions, for each check-in date. A clear trend emerges: no result percentages increase as check-in dates move further into the future.

The above analyses therefore show opposite trends in total search sessions and no result session percentages: although search volumes for check-in dates after July 2024 are extremely low, their no result session percentages are significantly higher.
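Both calendar heatmaps can be built from one per-check-in-date aggregation: total sessions for the first, the no result percentage for the second, with each date placed by ISO week and weekday. Because dates after July 2024 have very few sessions, their percentages rest on small denominators, so the sketch below includes an optional `min_sessions` threshold (a hypothetical parameter, not something used in this report) that masks such cells as NaN. The function name and row layout (check_in_date, sessions, no_result_sessions) are hypothetical, and the example rows are toy values.

```python
import math
from collections import defaultdict
from datetime import date


def calendar_cells(rows, min_sessions=1):
    """Aggregate (check_in_date, sessions, no_result_sessions) rows per check-in date.

    Returns {check_in_date: (iso_year, iso_week, iso_weekday, sessions, no_result_pct)}.
    The percentage is NaN when a date has fewer than `min_sessions` sessions.
    """
    sessions = defaultdict(int)
    no_result = defaultdict(int)
    for check_in, n, n_empty in rows:
        sessions[check_in] += n
        no_result[check_in] += n_empty

    cells = {}
    for check_in, n in sessions.items():
        iso_year, iso_week, iso_weekday = check_in.isocalendar()
        if n > 0 and n >= min_sessions:
            pct = 100.0 * no_result[check_in] / n
        else:
            pct = math.nan  # too few sessions for a stable percentage
        cells[check_in] = (iso_year, iso_week, iso_weekday, n, pct)
    return cells


# Toy rows: two rows for 1 May 2024 and one sparse far-future date.
example = calendar_cells(
    [(date(2024, 5, 1), 40, 2), (date(2024, 5, 1), 60, 3), (date(2024, 8, 15), 2, 1)],
    min_sessions=5,
)
```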

Thus, we can conclude that the no result session percentage is strongly associated with traveller type and, to a lesser extent, with days in advance. These findings point to a potential mismatch between supply and demand, especially for larger groups and long lead times: although demand for stays booked more than six months in advance is low, the available supply appears to be even lower.

3.2 Correlation with Other Factors

This section details the results of a logistic regression model that predicts the probability of a search session ending with no results (Percentage_no_result_session). The dependent variable in this model is binary, indicating whether or not a search session ended without any results. The logistic regression equation is expressed as follows:

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times \text{Category1D} + \beta_2 \times \text{Category2D\_3D} + \cdots + \beta_{n} \times \text{Check\_in\_DaySaturday} \]

where \(p\) is the probability of a session ending with no results and \(n = 25\) is the number of dummy-coded predictors listed in the table below.

Summary of Logistic Regression Model

term                        estimate   std.error   statistic     p.value
(Intercept)               -2.3391962   0.3686254  -6.3457275   0.0000000
Category1D                -0.5636665   0.5100230  -1.1051786   0.2690822
Category2D ~ 3D           -0.4998088   0.4452017  -1.1226570   0.2615832
Category4D ~ 1W           -0.3151248   0.3802743  -0.8286778   0.4072868
Category1W ~ 2W           -0.3398839   0.3524871  -0.9642448   0.3349232
Category2W ~ 1M           -0.3979447   0.3520726  -1.1302916   0.2583534
Category1M ~ 2M           -0.2915577   0.3508600  -0.8309802   0.4059848
Category2M ~ 3M           -0.2586577   0.3506129  -0.7377301   0.4606784
Category3M ~ 6M           -0.0968134   0.3419018  -0.2831615   0.7770530
Category6M ~ 1Y            1.3422144   0.3388912   3.9606059   0.0000748
Traveller_Type2 pax        0.4920510   0.1442639   3.4107692   0.0006478
Traveller_Type3-4 pax      1.8181935   0.1368567  13.2853848   0.0000000
Traveller_Type>= 5 pax     2.7157365   0.1391750  19.5131060   0.0000000
Traveller_TypeFamily       1.9349377   0.1361162  14.2153406   0.0000000
Search_DayMonday          -0.0170352   0.1309787  -0.1300610   0.8965182
Search_DayTuesday         -0.0774684   0.1308527  -0.5920272   0.5538323
Search_DayWednesday       -0.0468606   0.1301160  -0.3601447   0.7187389
Search_DayThursday        -0.0322845   0.1301562  -0.2480444   0.8041000
Search_DayFriday          -0.0459921   0.1267255  -0.3629267   0.7166597
Search_DaySaturday         0.0346562   0.1280981   0.2705440   0.7867418
Check_in_DayMonday        -0.0433447   0.1287582  -0.3366365   0.7363910
Check_in_DayTuesday       -0.0158334   0.1290199  -0.1227209   0.9023281
Check_in_DayWednesday      0.0112726   0.1283530   0.0878249   0.9300158
Check_in_DayThursday       0.0119217   0.1277522   0.0933190   0.9256502
Check_in_DayFriday        -0.0459016   0.1274429  -0.3601735   0.7187174
Check_in_DaySaturday      -0.0825308   0.1285399  -0.6420634   0.5208320

The following variables are statistically significant (p-value < 0.05) predictors of no result sessions:

  • Category 6M ~ 1Y: Coefficient = 1.34221. Sessions searching six months to one year in advance are more likely to end without results than same-day searches (the baseline category). Because none of the other lead-time categories is significant, this points to an elevated risk specifically in the 6M ~ 1Y window rather than a steady increase with lead time.
  • Traveller_Type 2 pax: Coefficient = 0.49205, indicating that two-person groups are more likely to experience no result sessions than solo travellers (the baseline category).
  • Traveller_Type 3-4 pax: Coefficient = 1.81819, indicating that groups of three to four have a substantially higher likelihood of no result sessions.
  • Traveller_Type >= 5 pax: Coefficient = 2.71574, the highest among the traveller types, showing that groups of five or more are the most likely to encounter no result sessions.
  • Traveller_Type Family: Coefficient = 1.93494, indicating that family groups also face a higher incidence of no result sessions than solo travellers.
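The coefficients above are on the log-odds scale. One way to sanity-check the table and to read the coefficients as odds ratios is to recompute the Wald statistics from the estimate and std.error columns. The Python sketch below does that under the assumption that the statistic and p.value columns come from a standard normal (Wald z) test; it converts each estimate to an odds ratio exp(β) with a 95% Wald interval. The helper name `wald_summary` and the `TERMS` dictionary are hypothetical; only the numbers are copied from the table.

```python
import math

# (estimate, std.error) pairs copied from the model summary table.
TERMS = {
    "Category6M ~ 1Y": (1.3422144, 0.3388912),
    "Traveller_Type2 pax": (0.4920510, 0.1442639),
    "Traveller_Type3-4 pax": (1.8181935, 0.1368567),
    "Traveller_Type>= 5 pax": (2.7157365, 0.1391750),
    "Traveller_TypeFamily": (1.9349377, 0.1361162),
}


def wald_summary(estimate, std_error, z_crit=1.959964):
    """Return (z, two-sided p-value, odds ratio, 95% Wald CI for the odds ratio)."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    odds_ratio = math.exp(estimate)
    ci = (math.exp(estimate - z_crit * std_error), math.exp(estimate + z_crit * std_error))
    return z, p, odds_ratio, ci


for term, (est, se) in TERMS.items():
    z, p, odds_ratio, (lo, hi) = wald_summary(est, se)
    print(f"{term:<24} z={z:7.3f} p={p:.2e} OR={odds_ratio:6.2f} 95% CI=({lo:.2f}, {hi:.2f})")
```

For example, exp(2.7157) is about 15.1, so under this model the odds of a no result session for a >= 5 pax group are roughly 15 times those of solo travellers, with the other terms held fixed.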

Other variables, including different days of search and check-in, did not show significant correlations, suggesting they do not substantially affect the probability of no result sessions.

The analysis reveals a clear relationship between no result session rates, group size, and the 6M ~ 1Y lead time, as shown by the GLM results. Larger groups and those searching six months to one year in advance exhibit significantly higher rates of no result sessions, which aligns with the trends observed in the earlier heatmap analyses.
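To translate these log-odds coefficients into probabilities, the inverse logit p = 1 / (1 + e^(-η)) can be applied to the linear predictor η. The sketch below is a minimal Python illustration using only the significant coefficients from the table; every term not listed is assumed to sit at its baseline level (0D, Solo, and the day of week with no row in the table, presumably Sunday). The names `INTERCEPT`, `COEF`, and `predicted_probability` are hypothetical.

```python
import math

# Log-odds coefficients copied from the model summary table (significant terms only).
INTERCEPT = -2.3391962
COEF = {
    "Category6M ~ 1Y": 1.3422144,
    "Traveller_Type2 pax": 0.4920510,
    "Traveller_Type3-4 pax": 1.8181935,
    "Traveller_Type>= 5 pax": 2.7157365,
    "Traveller_TypeFamily": 1.9349377,
}


def predicted_probability(active_terms):
    """Inverse logit of the linear predictor with the given dummy terms set to 1.

    Every term not listed is left at 0, i.e. at its baseline level.
    """
    eta = INTERCEPT + sum(COEF[term] for term in active_terms)
    return 1.0 / (1.0 + math.exp(-eta))
```

With these coefficients, `predicted_probability([])` (a solo, same-day search) is roughly 0.088, `predicted_probability(["Traveller_Type>= 5 pax"])` is roughly 0.59, and adding `"Category6M ~ 1Y"` raises it to roughly 0.85.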

4. Conclusions and Discussion

This report highlights several key insights:

  • Search behaviors differ notably among traveller types: solo travellers and small groups tend to search closer to their travel date, while larger groups and families search further ahead.
  • No result sessions are positively associated with the size of the traveller group and show no clear association with days in advance, except in the 6M ~ 1Y category.
  • No significant or impactful patterns are evident in the search days of the week for long-term advance searches.

However, the dataset covers only one month of search data (April 2024) and a single destination, Hong Kong. A more diverse dataset is therefore needed to improve the robustness and precision of these results and insights; it would also make it possible to identify seasonal trends in user search behavior.