The goal of this challenge is to use a dataset based on millions of real, anonymized accommodation reservations to develop a strategy for recommending a traveler's next destination in real time.
More precisely, the task is to predict (and recommend) the final city (city_id) of each trip (utrip_id). Predictions are evaluated on the top four recommended cities for each trip using the Precision@4 metric (the 4 represents the four suggestion slots on the Booking.com website). A prediction counts as correct when the true city is among the top four suggestions, regardless of order.
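To make the metric concrete, here is a minimal Python sketch of Precision@4 as described above: a trip scores a hit when its true final city appears anywhere in the first four suggestions. The function name `precision_at_k` and the predicted city lists are my own illustrative assumptions, not part of the challenge's official scoring code. The true cities are the last stops of the two sample trips shown further down.

```python
from typing import Dict, List


def precision_at_k(predictions: Dict[str, List[int]],
                   truth: Dict[str, int],
                   k: int = 4) -> float:
    """Share of trips whose true final city is among the top-k suggestions.

    Order inside the top k does not matter, matching the challenge text.
    """
    if not truth:
        raise ValueError("truth must contain at least one trip")
    hits = 0
    for utrip_id, true_city in truth.items():
        # A trip with no prediction simply counts as a miss.
        top_k = predictions.get(utrip_id, [])[:k]
        if true_city in top_k:
            hits += 1
    return hits / len(truth)


# True final cities taken from the two sample trips in this post.
truth = {"1006220_1": 24144, "1010293_1": 20545}

# Hypothetical predictions, invented purely for illustration.
predictions = {
    "1006220_1": [31114, 39641, 24144, 20232],  # true city in slot 3: hit
    "1010293_1": [55, 5325, 23921, 65322],      # true city absent: miss
}

score = precision_at_k(predictions, truth)
print(score)  # 0.5
```

One hit out of two trips gives 0.5. A fifth suggestion would be ignored, since only the first four slots are scored.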
“Data are just summaries of thousands of stories – tell a few of those stories to help make the data meaningful.” – Chip & Dan Heath
“A data scientist is someone who can obtain, scrub, explore, model, and interpret data, blending hacking, statistics, and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product.” – Hilary Mason, founder, Fast Forward Labs.
“Information is the oil of the 21st century, and analytics is the combustion engine.” – Peter Sondergaard, senior vice president, Gartner Research.
Questions: Questions are posed before each visualization to make sure the visualizations in this project are insightful.
Summary: After each section, I provide a summary of what the visualizations showed.
Observations/Comments: Any observations or comments I have about a given metric, visual, or summary of the data.
Observations - What can we learn from this deep dive that we didn’t know from our initial observations of the dataset?
Understanding Current Dataset - This exploration is also intended to build an understanding of the Booking.com dataset and of how customers travel using Booking.com.
Modeling - Creating a model to predict (and recommend) the final city (city_id) of each trip (utrip_id).
## 'data.frame': 1048575 obs. of 9 variables:
## $ user_id : int 1006220 1006220 1006220 1006220 1010293 1010293 1010293 1010293 1010293 1010293 ...
## $ checkin : Factor w/ 425 levels "1/1/2016","1/1/2017",..: 272 245 246 250 364 335 337 338 340 341 ...
## $ checkout : Factor w/ 425 levels "1/1/2016","1/1/2017",..: 245 246 250 251 335 336 338 340 341 342 ...
## $ city_id : int 31114 39641 20232 24144 5325 55 23921 65322 23921 20545 ...
## $ device_class : Factor w/ 3 levels "desktop","mobile",..: 1 1 1 1 2 2 2 1 1 1 ...
## $ affiliate_id : int 384 384 384 384 359 359 359 9924 9924 10573 ...
## $ booker_country: Factor w/ 5 levels "Bartovia","Elbonia",..: 3 3 3 3 5 5 5 5 5 5 ...
## $ hotel_country : Factor w/ 193 levels "Absurdistan",..: 62 62 61 62 37 37 37 37 37 37 ...
## $ utrip_id : Factor w/ 195685 levels "1000027_1","1000045_1",..: 230 230 230 230 382 382 382 382 382 382 ...
| user_id | checkin | checkout | city_id | device_class | affiliate_id | booker_country | hotel_country | utrip_id |
|---|---|---|---|---|---|---|---|---|
| 1006220 | 4/9/2016 | 4/11/2016 | 31114 | desktop | 384 | Gondal | Gondal | 1006220_1 |
| 1006220 | 4/11/2016 | 4/12/2016 | 39641 | desktop | 384 | Gondal | Gondal | 1006220_1 |
| 1006220 | 4/12/2016 | 4/16/2016 | 20232 | desktop | 384 | Gondal | Glubbdubdrib | 1006220_1 |
| 1006220 | 4/16/2016 | 4/17/2016 | 24144 | desktop | 384 | Gondal | Gondal | 1006220_1 |
| 1010293 | 7/9/2016 | 7/10/2016 | 5325 | mobile | 359 | The Devilfire Empire | Cobra Island | 1010293_1 |
| 1010293 | 7/10/2016 | 7/11/2016 | 55 | mobile | 359 | The Devilfire Empire | Cobra Island | 1010293_1 |
| 1010293 | 7/12/2016 | 7/13/2016 | 23921 | mobile | 359 | The Devilfire Empire | Cobra Island | 1010293_1 |
| 1010293 | 7/13/2016 | 7/15/2016 | 65322 | desktop | 9924 | The Devilfire Empire | Cobra Island | 1010293_1 |
| 1010293 | 7/15/2016 | 7/16/2016 | 23921 | desktop | 9924 | The Devilfire Empire | Cobra Island | 1010293_1 |
| 1010293 | 7/16/2016 | 7/17/2016 | 20545 | desktop | 10573 | The Devilfire Empire | Cobra Island | 1010293_1 |
| user_id_cnt | city_id_cnt | affiliate_id_cnt | booker_country_cnt | hotel_country_cnt | utrip_id_cnt |
|---|---|---|---|---|---|
| 181231 | 38638 | 3126 | 5 | 193 | 195685 |
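The counts above are the number of distinct values in each identifier column. As a rough Python sketch of that computation, the snippet below runs on the ten sample rows shown earlier rather than on the full dataset, so its numbers are far smaller than those in the table. `distinct_counts` and `SAMPLE_CSV` are hypothetical names I introduce for illustration.

```python
import csv
import io

# The ten sample rows from the table above, in CSV form.
SAMPLE_CSV = """user_id,checkin,checkout,city_id,device_class,affiliate_id,booker_country,hotel_country,utrip_id
1006220,4/9/2016,4/11/2016,31114,desktop,384,Gondal,Gondal,1006220_1
1006220,4/11/2016,4/12/2016,39641,desktop,384,Gondal,Gondal,1006220_1
1006220,4/12/2016,4/16/2016,20232,desktop,384,Gondal,Glubbdubdrib,1006220_1
1006220,4/16/2016,4/17/2016,24144,desktop,384,Gondal,Gondal,1006220_1
1010293,7/9/2016,7/10/2016,5325,mobile,359,The Devilfire Empire,Cobra Island,1010293_1
1010293,7/10/2016,7/11/2016,55,mobile,359,The Devilfire Empire,Cobra Island,1010293_1
1010293,7/12/2016,7/13/2016,23921,mobile,359,The Devilfire Empire,Cobra Island,1010293_1
1010293,7/13/2016,7/15/2016,65322,desktop,9924,The Devilfire Empire,Cobra Island,1010293_1
1010293,7/15/2016,7/16/2016,23921,desktop,9924,The Devilfire Empire,Cobra Island,1010293_1
1010293,7/16/2016,7/17/2016,20545,desktop,10573,The Devilfire Empire,Cobra Island,1010293_1
"""

ID_COLUMNS = ["user_id", "city_id", "affiliate_id",
              "booker_country", "hotel_country", "utrip_id"]


def distinct_counts(rows, columns):
    """Return {"<column>_cnt": number of distinct values} for each column."""
    seen = {col: set() for col in columns}
    for row in rows:
        for col in columns:
            seen[col].add(row[col])
    return {f"{col}_cnt": len(values) for col, values in seen.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = distinct_counts(rows, ID_COLUMNS)
for name, value in counts.items():
    print(name, value)
```

On the sample, city 23921 is visited twice within one trip, so the sample has nine distinct cities across ten rows.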
| user_id | checkin | checkout | city_id | device_class | affiliate_id | booker_country | hotel_country | utrip_id | stop_duration | trip_duration | total_city_dest | month | year | month_name | leg_of_trip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1006220 | 4/9/2016 | 4/11/2016 | 31114 | desktop | 384 | Gondal | Gondal | 1006220_1 | 2 | 8 | 4 | 4 | 2016 | April | 4 |
| 1006220 | 4/11/2016 | 4/12/2016 | 39641 | desktop | 384 | Gondal | Gondal | 1006220_1 | 1 | 8 | 4 | 4 | 2016 | April | 1 |
| 1006220 | 4/12/2016 | 4/16/2016 | 20232 | desktop | 384 | Gondal | Glubbdubdrib | 1006220_1 | 4 | 8 | 4 | 4 | 2016 | April | 2 |
| 1006220 | 4/16/2016 | 4/17/2016 | 24144 | desktop | 384 | Gondal | Gondal | 1006220_1 | 1 | 8 | 4 | 4 | 2016 | April | 3 |
| 1010293 | 7/9/2016 | 7/10/2016 | 5325 | mobile | 359 | The Devilfire Empire | Cobra Island | 1010293_1 | 1 | 7 | 5 | 7 | 2016 | July | 6 |
| 1010293 | 7/10/2016 | 7/11/2016 | 55 | mobile | 359 | The Devilfire Empire | Cobra Island | 1010293_1 | 1 | 7 | 5 | 7 | 2016 | July | 1 |
| 1010293 | 7/12/2016 | 7/13/2016 | 23921 | mobile | 359 | The Devilfire Empire | Cobra Island | 1010293_1 | 1 | 7 | 5 | 7 | 2016 | July | 2 |
| 1010293 | 7/13/2016 | 7/15/2016 | 65322 | desktop | 9924 | The Devilfire Empire | Cobra Island | 1010293_1 | 2 | 7 | 5 | 7 | 2016 | July | 3 |
| 1010293 | 7/15/2016 | 7/16/2016 | 23921 | desktop | 9924 | The Devilfire Empire | Cobra Island | 1010293_1 | 1 | 7 | 5 | 7 | 2016 | July | 4 |
| 1010293 | 7/16/2016 | 7/17/2016 | 20545 | desktop | 10573 | The Devilfire Empire | Cobra Island | 1010293_1 | 1 | 7 | 5 | 7 | 2016 | July | 5 |
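The added columns can be derived from the check-in and check-out dates. The Python sketch below is one way to do it, under assumptions I inferred from the sample rows rather than from any stated definition: stop_duration is the number of nights between check-in and check-out, trip_duration is the sum of stop durations within a utrip_id, total_city_dest is the number of distinct cities in the trip, and month, year, and month_name come from the check-in date. I leave leg_of_trip out because its definition is not given here. The helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

MONTH_NAMES = ("January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December")

# (utrip_id, checkin, checkout, city_id) copied from the sample rows.
SAMPLE = [
    ("1006220_1", "4/9/2016", "4/11/2016", 31114),
    ("1006220_1", "4/11/2016", "4/12/2016", 39641),
    ("1006220_1", "4/12/2016", "4/16/2016", 20232),
    ("1006220_1", "4/16/2016", "4/17/2016", 24144),
    ("1010293_1", "7/9/2016", "7/10/2016", 5325),
    ("1010293_1", "7/10/2016", "7/11/2016", 55),
    ("1010293_1", "7/12/2016", "7/13/2016", 23921),
    ("1010293_1", "7/13/2016", "7/15/2016", 65322),
    ("1010293_1", "7/15/2016", "7/16/2016", 23921),
    ("1010293_1", "7/16/2016", "7/17/2016", 20545),
]


def parse_date(text):
    """Parse month/day/year strings such as '4/9/2016'."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def add_trip_features(rows):
    """Return one dict per stop with per-stop and per-trip features."""
    stops = []
    nights_per_trip = defaultdict(int)
    cities_per_trip = defaultdict(set)
    for utrip_id, checkin, checkout, city_id in rows:
        start, end = parse_date(checkin), parse_date(checkout)
        stop_duration = (end - start).days
        nights_per_trip[utrip_id] += stop_duration
        cities_per_trip[utrip_id].add(city_id)
        stops.append({
            "utrip_id": utrip_id,
            "city_id": city_id,
            "stop_duration": stop_duration,
            "month": start.month,
            "year": start.year,
            "month_name": MONTH_NAMES[start.month - 1],
        })
    # Second pass: attach the trip-level aggregates to every stop.
    for stop in stops:
        stop["trip_duration"] = nights_per_trip[stop["utrip_id"]]
        stop["total_city_dest"] = len(cities_per_trip[stop["utrip_id"]])
    return stops


features = add_trip_features(SAMPLE)
for stop in features:
    print(stop)
```

Summing stop durations, rather than taking last check-out minus first check-in, reproduces the trip_duration of 7 for trip 1010293_1, which has a one-night gap between 7/11/2016 and 7/12/2016.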