This report is divided into two parts: the first focuses on analyzing user search patterns, while the second addresses user frustration analysis. The goal is to understand user search patterns and the scenarios leading to no search results.
The dataset comprises 11 main features:
- Destination Country
- Destination City
- Search Date
- Search Day
- Check in Date
- Check in Day
- Traveller Type
- Session
- No Search Result Session
- No Search Result %
- Bookings

Since the destination country and city are consistently Hong Kong, we
will disregard the impact of these features. Additionally, since we want
to analyze trends in search dates relative to check-in dates in the user
search pattern analysis, a new feature called
Days in Advance is created by subtracting the search date
from the check-in date (Check-in Date -
Search Date). This feature quantifies how far in advance
users are searching and is categorized into the following 11
intervals:
(D stands for Day, W for Week, M for Month, Y for Year)
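
As a rough illustration of the feature described above, the sketch below computes Days in Advance and assigns an interval label. The labels follow the category names used in the regression table later in this report. The day counts for a week, month, and year (7, 30, 365), the inclusive upper bounds, and the `0D` and `> 1Y` labels are assumptions made for this example, not definitions taken from the dataset. The function names are hypothetical.

```python
from datetime import date

# (inclusive upper bound in days, interval label).
# Assumed boundaries: 1W = 7 days, 1M = 30 days, 1Y = 365 days.
# "0D" and "> 1Y" are assumed labels for the two ends of the range.
BINS = [
    (0, "0D"),
    (1, "1D"),
    (3, "2D ~ 3D"),
    (7, "4D ~ 1W"),
    (14, "1W ~ 2W"),
    (30, "2W ~ 1M"),
    (60, "1M ~ 2M"),
    (90, "2M ~ 3M"),
    (180, "3M ~ 6M"),
    (365, "6M ~ 1Y"),
]


def days_in_advance(search_date: date, check_in_date: date) -> int:
    """Check-in Date minus Search Date, in whole days."""
    return (check_in_date - search_date).days


def categorize(days: int) -> str:
    """Map a non-negative day count to its interval label."""
    if days < 0:
        raise ValueError("check-in date is earlier than search date")
    for upper, label in BINS:
        if days <= upper:
            return label
    return "> 1Y"


# Example: a search on 1 April 2024 for a check-in on 15 May 2024.
lead = days_in_advance(date(2024, 4, 1), date(2024, 5, 15))
label = categorize(lead)
```

Keeping the boundaries in a single ordered list makes it easy to adjust the cut-offs if the report's actual interval definitions differ from the ones assumed here.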
Another key feature is Traveller Type, categorized into
five types (excluding the ‘0’ type, which is undefined in the data):
The pie chart below shows the total search session percentage for each type of traveller:
The predominant traveller type in our dataset is 2 pax (two adults), accounting for about 70% of the total. This is followed by Family and 3-4 pax. The types Solo and >= 5 pax are relatively rare, comprising only 2.4% and 3.3% respectively. The predominance of two-person travel might be due to the popularity and availability of hotel rooms for two, suggesting that this segment should be our primary target to increase the conversion rate from searching to booking.
The bar chart below illustrates the distribution of sessions by days
in advance, without taking traveller type into account.
Observations:
The faceted bar chart below extends the previous analysis by
incorporating traveller type into the session distribution. This allows
us to examine how different types of travellers are searching their
stays.
Observations:
By aggregating the information from the analysis above, we can draw conclusions for each type of traveller:
The bar chart below displays the overall percentage of search
sessions by day of the week. This gives an initial insight into which
days are most popular for users to engage in searching for
accommodations.
However, the plot above shows no significant differences in the percentage of
search sessions between the days of the week. Thus, the
faceted bar charts below extend our previous analysis by incorporating
traveller type and days in advance, respectively, into the session
distribution.
Observations:
Observations:
By aggregating the information from the two bar charts above, we can conclude the following:
The heatmap below illustrates the no result session percentage across different traveler types and days in advance.
Observing the heatmap vertically, it is evident that the no result session percentage increases with the size of the traveler group. This trend holds regardless of how early travelers search for accommodations, indicating a strong positive relationship between the likelihood of no results and group size.
Observing the heatmap horizontally, there is no consistent pattern between the days in advance of searching and no result session percentage across all traveler types. However, a notably higher likelihood of no results is observed for searches made 6 months to 1 year in advance.
From the observations above, we can conclude that the no result session percentage likely correlates with traveler type, and shows little to no correlation with days in advance. Notably, the no result session percentage for groups of five or more travelers is significantly higher than that for smaller groups, surpassing even family groups.
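
The percentages behind a heatmap of this kind could be computed along the following lines. This is a minimal sketch that assumes each input row carries a traveller type, a days-in-advance category, a session count, and a no result session count. The function name and the sample numbers are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def no_result_pct(rows):
    """Return {(traveller_type, category): no result session %}.

    rows: iterable of (traveller_type, category, sessions, no_result_sessions).
    Counts are pooled per cell before dividing, so cells with many sessions
    are not distorted by averaging per-row percentages.
    """
    totals = defaultdict(lambda: [0, 0])
    for traveller_type, category, sessions, no_result in rows:
        cell = totals[(traveller_type, category)]
        cell[0] += sessions
        cell[1] += no_result
    return {
        key: 100.0 * no_result / sessions
        for key, (sessions, no_result) in totals.items()
        if sessions > 0  # skip empty cells to avoid dividing by zero
    }


# Illustrative rows only (hypothetical numbers).
sample_rows = [
    ("Solo", "1D", 200, 10),
    ("Solo", "1D", 100, 5),
    (">= 5 pax", "1D", 50, 20),
]
pct = no_result_pct(sample_rows)
```

Pooling the counts first matters here because the search volume differs greatly between cells, so a simple mean of daily percentages would give low-volume days too much weight.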
The following heatmap represents a calendar view of total search sessions per check-in date.
The plot reveals that May 2024 experiences the highest search activity, aligning with the observation that most travelers book 1 to 2 months in advance. In contrast, search sessions for dates after July 2024 drastically decrease, with calendar blocks mostly white, indicating very low activity.
The heatmap below contrasts with the previous one by displaying no result session percentages instead of total search sessions for each check-in date. A clear trend emerges: no result percentages increase as check-in dates become further away.
From the above analyses, opposite trends are apparent for total search sessions and no result session percentages. Despite extremely low search volumes for check-in dates after July 2024, the no result session percentages are significantly higher.
Thus, we can conclude that the no result session percentage is significantly influenced by traveler type and to a lesser extent by the days in advance. These findings highlight a potential mismatch between supply and demand, especially for larger groups and long-term bookings. Although demand for bookings longer than six months in advance is low, the supply is even lower and insufficient.
This section details the results of the logistic regression model,
which predicts the probability of a session resulting in no search
outcomes (Percentage_no_result_session). The dependent
variable in this model is binary, indicating whether or not a search
session ended without any results. The logistic regression equation is
expressed as follows:
\[ \log \left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times \text{Category1D} + \beta_2 \times \text{Category2D_3D} + \ldots + \beta_{n} \times \text{Check_in_DaySaturday} \]
where \(p\) is the probability of a session ending with no results.
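
For reading the coefficients, it can help to solve the equation above for \(p\). Writing \(\eta\) for the right-hand side (the linear predictor; the symbol \(\eta\) is introduced here as shorthand and is not part of the original notation), the standard inverse-logit form is:

\[ p = \frac{1}{1 + e^{-\eta}}, \qquad \frac{p}{1-p} = e^{\eta} \]

Under this form, a one-unit change in an indicator variable with coefficient \(\beta_k\) multiplies the odds \(p/(1-p)\) by \(e^{\beta_k}\), holding the other terms fixed. A positive coefficient therefore raises the probability of a no result session, and a negative one lowers it.
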
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -2.3391962 | 0.3686254 | -6.3457275 | 0.0000000 |
| Category1D | -0.5636665 | 0.5100230 | -1.1051786 | 0.2690822 |
| Category2D ~ 3D | -0.4998088 | 0.4452017 | -1.1226570 | 0.2615832 |
| Category4D ~ 1W | -0.3151248 | 0.3802743 | -0.8286778 | 0.4072868 |
| Category1W ~ 2W | -0.3398839 | 0.3524871 | -0.9642448 | 0.3349232 |
| Category2W ~ 1M | -0.3979447 | 0.3520726 | -1.1302916 | 0.2583534 |
| Category1M ~ 2M | -0.2915577 | 0.3508600 | -0.8309802 | 0.4059848 |
| Category2M ~ 3M | -0.2586577 | 0.3506129 | -0.7377301 | 0.4606784 |
| Category3M ~ 6M | -0.0968134 | 0.3419018 | -0.2831615 | 0.7770530 |
| Category6M ~ 1Y | 1.3422144 | 0.3388912 | 3.9606059 | 0.0000748 |
| Traveller_Type2 pax | 0.4920510 | 0.1442639 | 3.4107692 | 0.0006478 |
| Traveller_Type3-4 pax | 1.8181935 | 0.1368567 | 13.2853848 | 0.0000000 |
| Traveller_Type>= 5 pax | 2.7157365 | 0.1391750 | 19.5131060 | 0.0000000 |
| Traveller_TypeFamily | 1.9349377 | 0.1361162 | 14.2153406 | 0.0000000 |
| Search_DayMonday | -0.0170352 | 0.1309787 | -0.1300610 | 0.8965182 |
| Search_DayTuesday | -0.0774684 | 0.1308527 | -0.5920272 | 0.5538323 |
| Search_DayWednesday | -0.0468606 | 0.1301160 | -0.3601447 | 0.7187389 |
| Search_DayThursday | -0.0322845 | 0.1301562 | -0.2480444 | 0.8041000 |
| Search_DayFriday | -0.0459921 | 0.1267255 | -0.3629267 | 0.7166597 |
| Search_DaySaturday | 0.0346562 | 0.1280981 | 0.2705440 | 0.7867418 |
| Check_in_DayMonday | -0.0433447 | 0.1287582 | -0.3366365 | 0.7363910 |
| Check_in_DayTuesday | -0.0158334 | 0.1290199 | -0.1227209 | 0.9023281 |
| Check_in_DayWednesday | 0.0112726 | 0.1283530 | 0.0878249 | 0.9300158 |
| Check_in_DayThursday | 0.0119217 | 0.1277522 | 0.0933190 | 0.9256502 |
| Check_in_DayFriday | -0.0459016 | 0.1274429 | -0.3601735 | 0.7187174 |
| Check_in_DaySaturday | -0.0825308 | 0.1285399 | -0.6420634 | 0.5208320 |
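
The `statistic` and `p.value` columns can be cross-checked from `estimate` and `std.error`. The sketch below assumes the statistic is a Wald z statistic compared against a standard normal distribution, which is the usual output for a binomial GLM but is not stated explicitly in this report. The function name is hypothetical.

```python
import math


def wald_test(estimate: float, std_error: float):
    """Return (z, two-sided p-value) under a standard normal reference."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Row "Category6M ~ 1Y" from the table above.
z_6m_1y, p_6m_1y = wald_test(1.3422144, 0.3388912)
```

For this row, the recomputed values agree with the tabulated statistic (3.9606059) and p-value (0.0000748) up to rounding of the inputs.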
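The following variables are statistically significant (p-value < 0.05), indicating an association with the occurrence of no result sessions:

- Category6M ~ 1Y (estimate 1.34)
- Traveller_Type2 pax (estimate 0.49)
- Traveller_Type3-4 pax (estimate 1.82)
- Traveller_Type>= 5 pax (estimate 2.72)
- Traveller_TypeFamily (estimate 1.93)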
Other variables, including different days of search and check-in, did not show significant correlations, suggesting they do not substantially affect the probability of no result sessions.
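
To make the size of the significant coefficients more concrete, the sketch below converts them into predicted probabilities and an odds ratio. It assumes that the intercept represents the omitted reference levels of every categorical variable, and it sets all non-significant terms to zero for simplicity. Both choices are simplifications for illustration rather than part of the original analysis, and the function name is hypothetical.

```python
import math

# Estimates copied from the coefficient table above (significant terms only).
INTERCEPT = -2.3391962
COEF = {
    "Category6M ~ 1Y": 1.3422144,
    "Traveller_Type2 pax": 0.4920510,
    "Traveller_Type3-4 pax": 1.8181935,
    "Traveller_Type>= 5 pax": 2.7157365,
    "Traveller_TypeFamily": 1.9349377,
}


def predicted_probability(active_terms):
    """Inverse logit of the intercept plus the coefficients of active terms."""
    eta = INTERCEPT + sum(COEF[term] for term in active_terms)
    return 1.0 / (1.0 + math.exp(-eta))


# Reference profile: every indicator is zero, so only the intercept applies.
p_reference = predicted_probability([])

# Group of five or more searching 6 months to 1 year in advance.
p_large_far = predicted_probability(
    ["Traveller_Type>= 5 pax", "Category6M ~ 1Y"]
)

# Multiplicative change in odds for groups of five or more vs the reference.
odds_ratio_5_pax = math.exp(COEF["Traveller_Type>= 5 pax"])
```

Under these assumptions, the predicted probability rises from roughly 0.09 for the reference profile to roughly 0.85 for a group of five or more searching 6 months to 1 year ahead, and the odds for that group size are about 15 times those of the reference traveller type.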
The analysis reveals a statistically significant relationship between no result session rates and both group size and the advance period of booking, as shown by the GLM results. Notably, larger groups and those booking well in advance (6 months to 1 year) exhibit significantly higher rates of no result sessions, which aligns with trends observed in the earlier heatmap analyses.
This report highlights several key insights:
However, the dataset covers only one month of search data in April 2024 and is limited to a single destination, Hong Kong. Therefore, a more diverse dataset is necessary for further analysis to enhance the robustness and precision of the results and insights. This expansion would also facilitate the identification of seasonal trends in user search behavior and patterns.