Introduction

This report examines the Yelp restaurant review dataset to discover knowledge about cuisines. We mine the dataset for a particular cuisine to discover its common and popular dishes. For this report, the author mined popular dishes for Indian cuisine.

Data Exploration

The dataset consists of five files: yelp_academic_dataset_business.json, yelp_academic_dataset_review.json, yelp_academic_dataset_user.json, yelp_academic_dataset_checkin.json, and yelp_academic_dataset_tip.json. We go through yelp_academic_dataset_business.json and select all the businesses that are categorized as Indian restaurants. Once we have identified these businesses, we go through all of their reviews. The reviews are partitioned into positive reviews, which have 3 or more stars, and negative reviews, which have 2 or fewer stars.
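The selection and partitioning steps can be sketched in Python as follows. This is a minimal illustration, not the author's actual code. It assumes that each file holds one JSON object per line, that business records carry `business_id` and `categories` fields, and that review records carry `business_id`, `stars`, and `text` fields. It also assumes that "Indian restaurant" means a business tagged with both the "Indian" and "Restaurants" categories. All function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Set, Tuple


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line (assumes one JSON object per line)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def has_category(business: dict, wanted: str) -> bool:
    """Return True if `wanted` is one of the business's categories.

    Assumption: `categories` is either a list of strings or a single
    comma-separated string. A missing value counts as no categories.
    """
    categories = business.get("categories") or []
    if isinstance(categories, str):
        categories = categories.split(",")
    return wanted.lower() in {c.strip().lower() for c in categories}


def indian_restaurant_ids(businesses: Iterable[dict]) -> Set[str]:
    """Collect the IDs of businesses tagged as both Indian and Restaurants."""
    return {
        b["business_id"]
        for b in businesses
        if has_category(b, "Indian") and has_category(b, "Restaurants")
    }


def partition_reviews(
    reviews: Iterable[dict], business_ids: Set[str]
) -> Tuple[List[dict], List[dict]]:
    """Split reviews of the selected businesses into (positive, negative).

    Positive: stars >= 3. Negative: stars <= 2.
    """
    positive, negative = [], []
    for review in reviews:
        if review["business_id"] not in business_ids:
            continue
        if review["stars"] >= 3:
            positive.append(review)
        elif review["stars"] <= 2:
            negative.append(review)
    return positive, negative
```

With the real files, the functions would be chained as `ids = indian_restaurant_ids(read_json_lines("yelp_academic_dataset_business.json"))` followed by `partition_reviews(read_json_lines("yelp_academic_dataset_review.json"), ids)`.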

Data Mining

For each partition, we look for all the common dishes identified in task 3. For each dish we compute a weighted review count, separately for positive and negative reviews. The weighted count is the number of times the dish is mentioned multiplied by the number of unique restaurants whose reviews mention it. Only the dishes with an aggregate weight greater than 1000 are considered for recommendation. This rules out dishes that are mentioned rarely, or that are mentioned often but only for a few restaurants.
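The weighting described above can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: a dish counts as mentioned once per review whose text contains the dish name (case-insensitive substring match), and the "aggregate weight" is taken to be the sum of the positive and negative weighted counts. The function names and the review fields `text` and `business_id` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def weighted_counts(reviews: Iterable[dict], dishes: List[str]) -> Dict[str, int]:
    """Return mentions * unique restaurants for each dish in one partition.

    Assumption: a review mentions a dish if the dish name appears in the
    review text, ignoring case; each review counts at most once per dish.
    """
    mentions = defaultdict(int)
    restaurants = defaultdict(set)
    for review in reviews:
        text = review["text"].lower()
        for dish in dishes:
            if dish.lower() in text:
                mentions[dish] += 1
                restaurants[dish].add(review["business_id"])
    return {dish: mentions[dish] * len(restaurants[dish]) for dish in dishes}


def popular_dishes(
    positive: Iterable[dict],
    negative: Iterable[dict],
    dishes: List[str],
    threshold: int = 1000,
) -> List[str]:
    """Keep dishes whose aggregate weight exceeds the threshold.

    Assumption: aggregate weight = positive weight + negative weight.
    """
    pos = weighted_counts(positive, dishes)
    neg = weighted_counts(negative, dishes)
    return [dish for dish in dishes if pos[dish] + neg[dish] > threshold]
```

As a small worked case, a dish mentioned in 50 reviews spread over 25 restaurants gets a weight of 50 × 25 = 1250, while a dish mentioned in 200 reviews of a single restaurant gets only 200 × 1 = 200.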

Data Visualization

Once we have identified the popular dishes, we plot the number of positive mentions and the number of negative mentions for each dish.
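One possible way to draw such a plot is a grouped bar chart with matplotlib. The report does not state which chart type or library was used, so both are assumptions here, as are the function names and the output file name. The inputs are assumed to be dictionaries mapping each dish name to its raw mention count.

```python
from typing import Dict, List, Tuple


def mention_series(
    dishes: List[str],
    positive_mentions: Dict[str, int],
    negative_mentions: Dict[str, int],
) -> Tuple[List[int], List[int]]:
    """Return parallel lists of positive and negative counts, one per dish."""
    pos = [positive_mentions.get(dish, 0) for dish in dishes]
    neg = [negative_mentions.get(dish, 0) for dish in dishes]
    return pos, neg


def plot_mentions(
    dishes: List[str],
    positive_mentions: Dict[str, int],
    negative_mentions: Dict[str, int],
    path: str = "dish_mentions.png",  # hypothetical output file name
) -> None:
    """Save a grouped bar chart of positive vs. negative mentions per dish."""
    # Imported here so mention_series can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    pos, neg = mention_series(dishes, positive_mentions, negative_mentions)
    positions = list(range(len(dishes)))
    width = 0.4

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar([p - width / 2 for p in positions], pos, width, label="Positive mentions")
    ax.bar([p + width / 2 for p in positions], neg, width, label="Negative mentions")
    ax.set_xticks(positions)
    ax.set_xticklabels(dishes, rotation=45, ha="right")
    ax.set_ylabel("Number of mentions")
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Placing the two bars for each dish side by side makes it easy to compare how often a dish appears in positive versus negative reviews.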