Please complete the research discussion assignment in a Jupyter or R Markdown notebook. You should post the GitHub link to your research in a new discussion thread.
Now that we have covered basic techniques for recommender systems, choose one commercial recommender and describe how you think it works (content-based, collaborative filtering, etc). Does the technique deliver a good experience or are the recommendations off-target?
You may also choose one of the three non-personalized recommenders (below) we went over in class and describe the technique and which of the three you prefer to use.
I’ve chosen to examine the vacation property rental recommendation system implemented by data scientists at VRBO (“Vacation Rental By Owner”), an affiliate of HomeAway, which is now part of the Expedia travel group. The engine appears to be shared across affiliated websites (e.g., the HomeAway and VRBO websites appear to be nearly identical in format as well as content). Although there are also many similarities with the hotel booking section of their parent, Expedia (which also books flights, rental cars, etc.), here I will look only at the use of this engine in the vacation rental market rather than the commercial hotel booking market.
VRBO and HomeAway connect property managers and homeowners with travelers who seek the space, value and amenities of vacation rental homes rather than hotels. In a sense, VRBO competes with AirBnB, but the two view their target customers differently. VRBO’s brand is known for high-end vacation properties, often in luxury locations, where the customer gets the use of the entire property for the duration of the contract. In contrast, AirBnB arranges rentals of both entire properties and individual rooms within a homeowner’s dwelling, e.g., where the homeowner may provide Breakfast in addition to the Bed (thus the “BnB” in the name).
As such, for VRBO/Homeaway there are two sets of target users, each with differing needs and goals:
Property managers and homeowners who seek to rent out their vacation homes to reputable guests.
Individuals who are interested in a private vacation rental rather than a hotel room.
Here are some excerpts from the HomeAway and VRBO websites which describe their (now united) mission:
“HomeAway.com connects property managers and homeowners with travelers who seek the space, value and amenities of vacation rental homes. The site has the largest and most diverse selection of homes around the world, with more than 1 million listings across 120 countries. By listing your properties on HomeAway you can take advantage of a global network of 50 sites, including VRBO, with over 750 million traveler visits each year. Property Managers can market to HomeAway’s traveler audience via yearly subscriptions or via a pay-per-booking model. HomeAway’s pricing models are flexible, allowing you to choose the listing option that meets your business needs, plus HomeAway offers discounts on bulk listing purchases. HomeAway’s dedicated account management team is standing by to recommend strategies, maximize your marketing ROI, and help your business grow.”
"In 1995, VRBO introduced a new way for people to travel together, pairing homeowners with families and friends looking for places to stay. We were grounded in one purpose: To give people the space they need to drop the distractions of everyday life and simply be together. Since then, we’ve grown into a global community of homeowners and travelers, with unique properties in 190 countries around the world. VRBO makes it easy and fun to book cabins, condos, beach houses and every kind of space in between.
VRBO is part of Expedia Group and offers homeowners and property managers exposure to over 750 million visits to Expedia Group sites each month."
As there are two sets of target users, each has a differing goal:
It can be much cheaper to stay in an apartment or home rented through an online service such as VRBO/HomeAway than a hotel. Not only that, you can often get a lot more room for your money, making these short-term rentals particularly cost effective for families. Plus, you’ll ordinarily have access to a kitchen of some type, so you can save money on eating at restaurants during your stay.
Among prospective renters, some users may already have a specific destination in mind, and are searching for an available property at a specific location. In this case, the system will present a set of properties which meet the user’s search specifications. However, the system’s recommendation algorithm will decide on which properties to present at the top of the list, making those the first items seen by the customer (unless the customer has specified a different sorting preference.)
However, other users may not yet have decided on a destination. The system has the opportunity to suggest a few potential destinations to the customer upon initial navigation to the website. In such case, the system can recommend locations which may be of interest to the user. If such a location results in the customer thinking “I haven’t been there, this looks like a really {beautiful|interesting} place, why not take my next vacation to that place?” the system has sold the customer on the macro perspective – the destination. Next it can pitch the customer on the micro perspective – a specific booking at that destination.
The website enables
owners of properties to find prospective renters who are interested in coming to their location and renting a vacation property, and
vacationers to be guided toward various destinations of interest and then offered listings of vacation properties available for rent at those locations.
The recommendation algorithm ensures that owners’ properties are presented to prospective customers who may have an interest in the destination. From the perspective of vacationers, the destinations and properties presented must be interesting to the user, who otherwise may not select a booking from the website and may pursue alternatives elsewhere. However, through a methodology known as “retargeting”, the ecosystem retains the ability to recognize prospective customers who didn’t make a booking, and targets them with advertisements (displayed on unrelated internet pages in their web browser or other interactive applications) to induce them to return to the website.
While other firms typically use collaborative filtering and/or content-based recommendation, VRBO’s recommendation engine differs in that it takes a hybrid approach which implements a session-based local embedding model. This approach is based upon two stages:
Train a skip-gram sequence model to capture a local embedding representation for each listing, then extrapolate latent embeddings for listings subject to the Cold Start problem.
Source: Tomas Mikolov et al., “Distributed Representations of Words and Phrases and their Compositionality”, https://arxiv.org/abs/1310.4546
A skip-gram is an architecture for word2vec in which the model uses the current word to predict the surrounding window of context words, weighing nearby context words more heavily than more distant context words.
Here, the skip-gram model attempts to predict the listings \(x_{i-c}, \ldots, x_{i+c}\) that surround a listing \(x_i\) viewed in a traveler session \(s_k\), based on the premise that a traveler’s views of listings in the same session signal the similarity of those listings.
The training objective is to find a local representation for each listing that places it close to the listings most often viewed around it.
Two key issues to address include sparsity and heterogeneity in views per item.
Especially frequent items are downsampled using the inverse square root of the frequency, and listings with extremely low frequency are removed.
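The data preparation described above can be sketched in a few lines. This is only an illustration, not VRBO's code: the function names, the `window`, `threshold` and `min_count` parameters, and the toy sessions are all assumptions. The keep-probability follows the subsampling rule from the Mikolov et al. word2vec paper, which may differ in detail from what VRBO uses.

```python
import math
from collections import Counter


def skipgram_pairs(session, window):
    """Return (center, context) pairs for one session of viewed listing IDs."""
    pairs = []
    for i, center in enumerate(session):
        lo = max(0, i - window)
        hi = min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, session[j]))
    return pairs


def keep_probability(count, total, threshold):
    """Probability of keeping a view; shrinks with the inverse square
    root of the listing's relative frequency (word2vec-style subsampling)."""
    freq = count / total
    return min(1.0, math.sqrt(threshold / freq))


def filter_rare(sessions, min_count):
    """Drop listings that appear fewer than min_count times overall."""
    counts = Counter(listing for s in sessions for listing in s)
    return [[l for l in s if counts[l] >= min_count] for s in sessions]


# Hypothetical sessions of listing IDs, in viewing order.
sessions = [["A", "B", "C"], ["A", "B"], ["A", "D"]]
cleaned = filter_rare(sessions, min_count=2)
pairs = [p for s in cleaned for p in skipgram_pairs(s, window=1)]
```

The resulting pairs are what a skip-gram trainer would consume; in practice a library such as gensim handles the windowing, subsampling and minimum-count filtering internally.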
In this context, the Cold Start problem refers to the situation in which rental properties have recently been added to the system, but no (or few) user interactions have occurred which would allow the system to learn about the property in order to recommend it.
To resolve the Cold Start problem, the contextual information that relates destinations (or search terms) to the listings based on the booking information is leveraged. Given latitude and longitude of the cold listing (for which we have no data), a belief is formed about the proportion of demand driven from each of the search terms pertaining to related/nearby destinations. Then the destination embedding from the earlier step is used to find the expected listing embedding for the cold listing.
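One way to read the last step is as a demand-weighted average of destination embeddings. The sketch below assumes the demand shares for the cold listing have already been estimated from its latitude and longitude; how that belief is formed is not shown here. The function name, destination names and vectors are hypothetical.

```python
import numpy as np


def cold_start_embedding(demand_share, destination_embeddings):
    """Expected listing embedding: demand-weighted mean of destination vectors."""
    names = list(demand_share)
    weights = np.array([demand_share[n] for n in names], dtype=float)
    weights = weights / weights.sum()  # normalize shares to sum to 1
    vectors = np.stack([destination_embeddings[n] for n in names])
    return weights @ vectors


# Hypothetical 2-d destination embeddings and demand shares for a new listing.
destination_embeddings = {
    "aspen": np.array([1.0, 0.0]),
    "vail": np.array([0.0, 1.0]),
}
demand_share = {"aspen": 3, "vail": 1}
embedding = cold_start_embedding(demand_share, destination_embeddings)
# embedding is [0.75, 0.25]
```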
Train a Deep Average Network (DAN) stacked with decoder and encoder layers predicting purchase events to capture a given traveler’s embedding, or latent preference for listings embedding.
Source: Tomas Mikolov, “Efficient Estimation of Word Representations in Vector Space”, https://arxiv.org/abs/1301.3781
In the second stage, given the listing embeddings from the previous stage, we model traveler embeddings using encoder and decoder layers that sandwich a non-linear ReLU activation.
In contrast to relatively weak implicit view signals, in this stage we leverage strong booking signals as a target variable based on historical traveler-listing interactions. An adaptive stochastic gradient descent method is used to train the neural network by minimizing the binary cross-entropy loss.
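A forward pass through such a network might look roughly like the following. This is a simplified sketch and not the authors' architecture: the layer sizes, the weight names (`w_enc`, `w_dec`, `w_out`), the sigmoid output and the toy values are all assumptions, and training by adaptive gradient descent is omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dan_forward(viewed_embeddings, w_enc, w_dec, w_out):
    """Average the viewed listing embeddings, pass them through two
    ReLU layers, and score the probability of a booking."""
    avg = viewed_embeddings.mean(axis=0)   # the "averaging" step
    hidden = relu(w_enc @ avg)             # encoder layer
    traveler = relu(w_dec @ hidden)        # decoder layer: traveler embedding
    prob = sigmoid(w_out @ traveler)       # predicted booking probability
    return traveler, prob


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Loss for a single booked (1) / not booked (0) label."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))


# Toy example: two viewed listings with 2-d embeddings.
viewed = np.array([[1.0, 0.0], [0.0, 1.0]])
traveler, prob = dan_forward(
    viewed,
    w_enc=np.ones((3, 2)),
    w_dec=np.ones((2, 3)),
    w_out=np.zeros(2),
)
loss = binary_cross_entropy(1.0, prob)
```

In a real system the weights would be learned from booking labels, and the intermediate `traveler` vector is what would be kept as the traveler embedding.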
The final question to answer is how to combine the traveler and listing embedding for personalized recommendations.
This is a particularly challenging task because the traveler embeddings are a non-linear projection of the listing embeddings, with a different dimension. As a result, the two are not in the same space, so cosine similarity cannot be computed between them directly. (The authors defer discussion of this approach to their subsequent study.)
Additionally, VRBO can draw users to its website by bidding for advertisements when users are engaged in other activities in a web browser. For prospective customers who have previously visited the site, but perhaps didn’t purchase anything, VRBO can “retarget” them by displaying advertisements which are most likely to draw their interest.
Here is an illustration of the concept:
Source: Meisam HejaziNia et al., “Slide deck: Deep Personalized Retargeting”, https://www.slideshare.net/MeisamHejaziNia/readnet-vrbo-deep-personalized-retargeting-2
Such Deep Personalized Retargeting is detailed in a paper and explained in a companion slide deck.
Here is an illustration of their process:
Source: Meisam HejaziNia et al., “Deep Personalized Retargeting”, https://arxiv.org/pdf/1907.02822
One problem that the developers face is that in session-based recommenders, recommendations are provided based only on the visitor’s interactions in the current session. The goal is to propagate signals from “recent” sessions to the current one, for example by use of “cookies.”
The developers indicated that they are implementing a Hierarchical Recurrent Neural Network (HRNN) to improve their model. In short, the HRNN learns a representation embedding from “recent” sessions to inform the current one. For example, if you’re planning a ski holiday, you have probably searched in recent sessions for hotels in places such as the French Alps, and viewed hotels in ski areas. So, the algorithm would boost hotels in ski areas in the current session.
Here is an illustration of such architecture:
Source: Massimo Quadrana et al., “Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks”, https://arxiv.org/abs/1706.04148
Meisam HejaziNia et al., “Deep Personalized Retargeting”, https://arxiv.org/pdf/1907.02822.pdf
Meisam HejaziNia et al., “Slide deck: Deep Personalized Retargeting”, available at https://www.slideshare.net/MeisamHejaziNia/readnet-vrbo-deep-personalized-retargeting-2
Tomas Mikolov et al., “Efficient Estimation of Word Representations in Vector Space”, https://arxiv.org/abs/1301.3781
Tomas Mikolov et al., “Distributed Representations of Words and Phrases and their Compositionality”, https://arxiv.org/abs/1310.4546
Pavlos Mitsoulis-Ntompos et al., “A Simple Deep Personalized Recommendation System”, https://arxiv.org/abs/1906.11336
Pavlos Mitsoulis-Ntompos, “Build a Hotel Recommender using Amazon Personalize - No PhD Required”, slide deck from talk delivered at AWS Machine Learning Web Day, Nov 6, 2019
Massimo Quadrana et al., “Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks”, https://arxiv.org/abs/1706.04148
Read the article below and consider how to handle attacks on recommender systems.
Customer reviews have become the life-blood for countless businesses. A favorable rating on a site like <yelp.com> can translate into more business, with studies estimating that each one-star increment translates to an increase of 5 to 9 percent in revenue:
https://www.hbs.edu/faculty/Pages/item.aspx?num=41233
Because ratings have such a lucrative impact, some businesses arrange for “fake reviews” which improve their own ratings or disparage competitors. Collaborative filtering-based recommender systems are most at risk from such attacks.
The same Harvard author subsequently noted that a sizable percentage of online reviews are classified as fake:
https://dash.harvard.edu/bitstream/handle/1/22836596/luca,zervas_fake-it-till-you-make-it.pdf
One term for this activity is “review bombing”, where a coordinated attack produces a flurry of negative reviews, adversely impacting the target. Here is an article discussing this phenomenon:
https://storyful.com/resources/blog/review-bombing/
As examples, it describes similar attacks on films such as “Captain Marvel” and on books written by Hillary Clinton and Megyn Kelly.
Another blog from a small-business owner describes the sudden appearance of hundreds of one-star reviews (none of which had any textual comments) and the difficulty of getting the social media site to remove them: https://www.fastcomet.com/blog/250-1-star-reviews-twice
Such attacks can be the work of a direct competitor, a disgruntled ex-employee, or the collective followers of a prominent person, who may use social media influence to incite followers, as in the case of a YouTube prankster with millions of followers who was kicked out of a Florida hotel which feared that he would make prank videos during his stay:
There are various methods to detect and prevent such activity.
Yelp indicates that they have recommendation software which assesses user reviews based upon quality, reliability, and user activity. Reviews of which Yelp is suspicious are flagged as “not recommended”; the ratings associated with such reviews are not factored into the calculation of the overall score for the business.
https://www.yelp-support.com/article/What-is-Yelp-s-recommendation-software
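The effect of excluding flagged reviews from the overall score can be illustrated with a small sketch. How Yelp decides which reviews are “not recommended” is not public, so the `recommended` flag is simply taken as given here. The function name and record layout are hypothetical.

```python
def overall_score(reviews):
    """Average star rating over recommended reviews only.

    Returns None when no review qualifies.
    """
    stars = [r["stars"] for r in reviews if r["recommended"]]
    return sum(stars) / len(stars) if stars else None


# Hypothetical reviews: the suspicious one-star review is excluded.
reviews = [
    {"stars": 5, "recommended": True},
    {"stars": 4, "recommended": True},
    {"stars": 1, "recommended": False},
]
score = overall_score(reviews)  # 4.5 rather than the naive mean of 3.33
```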
A system could require users to complete a “CAPTCHA” to prevent automated “bots” from spamming fake reviews. However, this may not prevent the entry of fake reviews via “click-farms” located overseas in low-cost regions, which are relatively inexpensive to hire.
Similar attributes associated with the batch of reviews can be used to detect such activity. For example,
In contrast, a system in which users gradually accumulate “trust” – through evaluation by other users – can be used to ascribe veracity to a user’s rating. A system which only allows verified purchasers of a product (e.g., eBay) to submit a rating following a transaction prevents non-purchasing users from entering reviews. A two-way rating scheme (where buyers and sellers rate each other) discourages nefarious activity.
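As a rough illustration of how verification and accumulated trust could be combined, the sketch below computes a trust-weighted average over verified purchasers only. The field names, the 0-to-1 trust scale and the weighting scheme are assumptions for illustration and do not describe any particular site.

```python
def trusted_rating(ratings):
    """Trust-weighted mean of star ratings from verified purchasers.

    Returns None when no verified rating carries any trust.
    """
    verified = [r for r in ratings if r["verified"]]
    total_trust = sum(r["trust"] for r in verified)
    if total_trust == 0:
        return None
    return sum(r["stars"] * r["trust"] for r in verified) / total_trust


# Hypothetical ratings: the unverified one-star rating is ignored, and the
# low-trust one-star rating counts for much less than the trusted five-star.
ratings = [
    {"stars": 5, "trust": 0.9, "verified": True},
    {"stars": 1, "trust": 0.1, "verified": True},
    {"stars": 1, "trust": 1.0, "verified": False},
]
rating = trusted_rating(ratings)
```

Under these assumptions a batch of reviews from new, unverified accounts has little or no influence on the displayed score.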