Introduction

This report explores the Yelp restaurant review dataset to discover knowledge about cuisines. We mine the dataset for a particular cuisine to discover its common, popular dishes; for this report, the author mined the popular dishes of Indian cuisine. Once we know the popular dishes, we mine the dataset again to gather knowledge about the restaurants that serve them. In this report, we only mined the dataset to determine the popular restaurants for the most popular Indian dish, chicken tikka, which was found using data mining in task 4.

Data Exploration

The dataset consists of five files: yelp_academic_dataset_business.json, yelp_academic_dataset_review.json, yelp_academic_dataset_user.json, yelp_academic_dataset_checkin.json and yelp_academic_dataset_tip.json. We go through yelp_academic_dataset_business.json and select all the businesses that are categorized as Indian restaurants. Once we have identified these businesses, we go through all of their reviews. The reviews are partitioned into positive reviews, whose star rating is greater than or equal to 3, and negative reviews, whose star rating is less than or equal to 2. Using these two sets of reviews, we rank the dishes discovered in task 3 by their weighted review count and pick the dish with the highest weighted review count as the most popular one, which in this case happens to be chicken tikka.
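A minimal Python sketch of the filtering and partitioning steps might look like the following. It assumes the Yelp files are in JSON Lines format (one JSON object per line), that business records carry `business_id` and `categories` fields and review records carry `business_id` and `stars`, and that the relevant category label is exactly "Indian". The weighting behind the weighted review count is not described in this report, so `most_popular_dish` only takes precomputed weighted counts and picks the largest. All function names here are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one JSON object per non-empty line (assumes JSON Lines format)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def has_category(business: dict, category: str) -> bool:
    """Return True if `category` is one of the business's categories.

    Assumption: depending on the dataset release, `categories` may be a list
    or a comma-separated string (or missing); all three cases are handled.
    """
    cats = business.get("categories") or []
    if isinstance(cats, str):
        cats = [c.strip() for c in cats.split(",")]
    return category in cats


def indian_business_ids(businesses: Iterable[dict]) -> Set[str]:
    """Collect the IDs of businesses tagged with the (assumed) "Indian" label."""
    return {b["business_id"] for b in businesses if has_category(b, "Indian")}


def partition_reviews(
    reviews: Iterable[dict], business_ids: Set[str]
) -> Tuple[List[dict], List[dict]]:
    """Split reviews of the selected businesses into positive (>= 3 stars)
    and negative (<= 2 stars) lists; other businesses are skipped."""
    positive: List[dict] = []
    negative: List[dict] = []
    for review in reviews:
        if review["business_id"] not in business_ids:
            continue
        if review["stars"] >= 3:
            positive.append(review)
        elif review["stars"] <= 2:
            negative.append(review)
    return positive, negative


def most_popular_dish(weighted_counts: Dict[str, float]) -> str:
    """Pick the dish with the highest (precomputed) weighted review count."""
    return max(weighted_counts, key=weighted_counts.get)
```

For example, `ids = indian_business_ids(read_json_lines("yelp_academic_dataset_business.json"))` followed by `partition_reviews(read_json_lines("yelp_academic_dataset_review.json"), ids)` streams the large review file line by line and keeps only the reviews of the selected restaurants in memory.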

Data Mining

Once we have identified the most popular dish, we go back and collect all the positive and negative reviews for every restaurant that serves that dish. With that data, we draw a bar plot of these restaurants showing their positive and negative review counts, which gives users a good idea of which restaurant to choose when they want that dish.
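The counting step could be sketched as follows. The report does not say how the restaurants serving the dish were identified, so `restaurants_serving` uses a hypothetical heuristic: a restaurant is assumed to serve the dish if any of its reviews mentions the dish name (case-insensitive substring match). Each review is assumed to have `business_id`, `stars` and `text` fields, and the same star thresholds as in the Data Exploration section are used.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def restaurants_serving(reviews: Iterable[dict], dish: str) -> Set[str]:
    """Hypothetical heuristic: a restaurant 'serves' the dish if any of its
    reviews mentions the dish name (case-insensitive substring match)."""
    dish = dish.lower()
    return {r["business_id"] for r in reviews if dish in r.get("text", "").lower()}


def review_counts(reviews: Iterable[dict], dish: str) -> List[Tuple[str, int, int]]:
    """Return (business_id, positive_count, negative_count) for every restaurant
    assumed to serve `dish`, sorted by positive count (descending), then ID."""
    reviews = list(reviews)  # materialize, since we iterate twice
    serving = restaurants_serving(reviews, dish)
    positive: Counter = Counter()
    negative: Counter = Counter()
    for r in reviews:
        bid = r["business_id"]
        if bid not in serving:
            continue
        if r["stars"] >= 3:
            positive[bid] += 1
        elif r["stars"] <= 2:
            negative[bid] += 1
    rows = [(bid, positive[bid], negative[bid]) for bid in serving]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

Passing all the reviews of the Indian restaurants with `dish="chicken tikka"` yields rows ordered by positive review count, ready to plot. Note that a plain substring match also counts mentions such as "chicken tikka masala", which may or may not be desired.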

Data Visualization

We visualize the positive and negative review counts for all the restaurants that serve the dish using a bar plot.
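A grouped bar plot of this kind could be drawn with matplotlib, as in the sketch below. It assumes the input is a list of `(restaurant_name, positive_count, negative_count)` tuples, computed per restaurant as described in the Data Mining section; the function name `plot_review_counts` and the styling choices are illustrative, not taken from the original report.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt


def plot_review_counts(rows, dish):
    """Grouped bar plot: one positive and one negative bar per restaurant.

    `rows` is a list of (restaurant_name, positive_count, negative_count).
    Returns the Figure and Axes so the caller can save or tweak the plot.
    """
    names = [r[0] for r in rows]
    positive = [r[1] for r in rows]
    negative = [r[2] for r in rows]
    x = list(range(len(rows)))
    width = 0.4  # each bar takes 40% of the slot, leaving a gap between groups

    fig, ax = plt.subplots(figsize=(max(6, 0.6 * len(rows)), 4))
    ax.bar([i - width / 2 for i in x], positive, width,
           label="positive (>= 3 stars)", color="tab:green")
    ax.bar([i + width / 2 for i in x], negative, width,
           label="negative (<= 2 stars)", color="tab:red")
    ax.set_xticks(x)
    ax.set_xticklabels(names, rotation=45, ha="right")
    ax.set_ylabel("Number of reviews")
    ax.set_title(f"Reviews of restaurants serving {dish}")
    ax.legend()
    fig.tight_layout()
    return fig, ax
```

Placing the positive and negative bars side by side, rather than stacking them, makes it easier to compare the two counts within each restaurant; `fig.savefig("reviews.png")` would then write the chart to disk.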