Yelp - Business Recommendations

Gautam Makani
11-22-2015

Introduction

  • Problem: implement a recommendation engine that offers personalized restaurant recommendations
  • Build a “good” recommendation algorithm
  • Test the predictions of the algorithm
  • Understand what it would take to improve the algorithm (for a future phase)

Methods

  • Exploratory data analysis to understand the problem domain
  • Create a random sample of 100 users for the recommendation method to operate on
  • Extract restaurants from the business data
  • For each user: create a user profile capturing the average review rating, the businesses reviewed and the location of the user
  • For each user: recommend the top 3 highest-rated restaurants
  • Test each recommendation by checking whether the user has reviewed any business in the recommended list (success factor)
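
The steps above can be sketched in base R as follows. This is a minimal illustration on a tiny made-up data set, not the code behind the results in this deck. The data frames `reviews` and `restaurants`, their column names, and the helpers `user_profile`, `recommend_top_n` and `has_match` are all assumptions, and the user's location is simplified to the city they have reviewed in most often.

```r
# Toy stand-ins for the Yelp business and review tables (hypothetical data).
restaurants <- data.frame(
  business_id = c("b1", "b2", "b3", "b4", "b5"),
  name        = c("A", "B", "C", "D", "E"),
  city        = c("Phoenix", "Phoenix", "Phoenix", "Phoenix", "Las Vegas"),
  stars       = c(5.0, 4.5, 4.0, 3.0, 4.5),
  stringsAsFactors = FALSE
)
reviews <- data.frame(
  user_id     = c("u1", "u1", "u2"),
  business_id = c("b1", "b4", "b4"),
  stars       = c(5, 3, 2),
  stringsAsFactors = FALSE
)

# User profile: average review rating, businesses reviewed, and a location
# (assumed here to be the city the user has reviewed in most often).
user_profile <- function(uid, reviews, restaurants) {
  r <- merge(reviews[reviews$user_id == uid, ], restaurants,
             by = "business_id", suffixes = c("_review", "_business"))
  list(
    user_id    = uid,
    avg_review = mean(r$stars_review),
    reviewed   = r$business_id,
    city       = names(which.max(table(r$city)))
  )
}

# Recommend the n highest-rated restaurants in the user's city.
recommend_top_n <- function(profile, restaurants, n = 3) {
  local <- restaurants[restaurants$city == profile$city, ]
  local <- local[order(-local$stars), ]
  head(local$business_id, n)
}

# Success factor: has the user reviewed any recommended business?
has_match <- function(profile, recommended) {
  any(recommended %in% profile$reviewed)
}

# Random sample of up to 100 users (the toy data only has two).
set.seed(1)
all_users <- unique(reviews$user_id)
sampled   <- sample(all_users, size = min(100, length(all_users)))

# One TRUE/FALSE per sampled user, named by user id.
rt <- vapply(sampled, function(uid) {
  p <- user_profile(uid, reviews, restaurants)
  has_match(p, recommend_top_n(p, restaurants))
}, logical(1))

summary(rt)
```

In this toy data, user `u1` has reviewed one of the three top-rated Phoenix restaurants and counts as a match, while `u2` has only reviewed a lower-rated one and does not.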

Results - Recommendation output - 3 restaurants for each user

   user_name                       name rec_matches
1        Roy                    Sunfare           1
2        Roy  La Parilla Villa Catering           1
3        Roy VIP Dinner At Fogo De Chao           1
4        Kim               Port of Subs           1
5        Kim      La Playita Restaurant           1
6        Kim             The Paiza Club           1
7    Frances                    Sunfare           0
8    Frances  La Parilla Villa Catering           0
9    Frances VIP Dinner At Fogo De Chao           0
10     Kelly               Port of Subs           1
..       ...                        ...         ...

Results - Histogram of matching recommendations

[Figure: Recommendation Histogram]

Of the 100 randomly sampled users, 52 had reviewed at least one of their recommended restaurants.

 summary(rt)
   Mode   FALSE    TRUE    NA's 
logical      48      52       0 
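
The summary can be turned into a success rate directly. A minimal sketch, assuming `rt` is a logical vector with one entry per sampled user; here it is rebuilt from the counts reported above rather than from the original data, and `success_rate` is a name introduced for illustration.

```r
# Rebuild a logical vector with the reported counts: 52 TRUE, 48 FALSE.
rt <- rep(c(TRUE, FALSE), times = c(52, 48))

# The mean of a logical vector is the proportion of TRUE values,
# i.e. the share of sampled users with at least one match.
success_rate <- mean(rt)

table(rt)
success_rate
```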

Discussion

This type of algorithm is ideally tested with an A/B mechanism, from which its efficacy is then determined. Lacking that kind of interactive response, I devised a “good” algorithm and checked it against the existing review data.

  • Over 50% of the randomly sampled users (52 of 100) had actually visited and reviewed a recommended restaurant => “good” success

In an actual trial, the success percentage would likely be higher due to:

  • an incentive to visit the target restaurant
  • the potential for visits to the restaurant outside of this data-gathering timeline