Yelp Dataset Analysis

Jeremy T. Finke
11/11/2015

Do like restaurants in close proximity to one another get similar ratings?

What is the definition of a like restaurant?

  • Could have used restaurant type
  • Ended up using price range
    • 0 = NA (not available)
    • 1 = cheapest
    • 4 = most expensive
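
A minimal Python sketch of the price-range grouping, assuming each restaurant is a plain dict with `city`, `stars`, and a hypothetical integer `price_range` field (the real field name in the Yelp JSON is an assumption here). A value of 0 or a missing value is treated as NA and dropped. The sample records are made up.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumption: "price_range" is a hypothetical key holding 0 (NA) or 1-4;
# it is not necessarily the field name used in the Yelp data.


def price_bucket(restaurant: dict) -> Optional[int]:
    """Return the price range 1-4, or None when it is 0 or missing (NA)."""
    value = restaurant.get("price_range", 0)
    return value if value in (1, 2, 3, 4) else None


def group_like_restaurants(restaurants: List[dict]) -> Dict[Tuple[str, int], List[dict]]:
    """Group restaurants into (city, price range) buckets, skipping NA prices."""
    groups: Dict[Tuple[str, int], List[dict]] = defaultdict(list)
    for r in restaurants:
        bucket = price_bucket(r)
        if bucket is not None:
            groups[(r["city"], bucket)].append(r)
    return dict(groups)


# Made-up records for illustration only.
sample = [
    {"city": "Madison", "stars": 4.0, "price_range": 2},
    {"city": "Madison", "stars": 3.5, "price_range": 2},
    {"city": "Madison", "stars": 4.5, "price_range": 0},  # NA: dropped
    {"city": "Phoenix", "stars": 3.0, "price_range": 1},
]
groups = group_like_restaurants(sample)
print({key: len(items) for key, items in groups.items()})
# {('Madison', 2): 2, ('Phoenix', 1): 1}
```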

What is the definition of a similar rating?

  • Would have liked to use quarter-star increments
  • Problem: the smallest increment in the data is half a star
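
The half-star granularity can be checked directly. This sketch uses `fractions.Fraction` to test whether a rating lies on a given grid, and treats two ratings as similar when they differ by at most one half star; that tolerance is an assumption for illustration, not a threshold stated in the analysis.

```python
from fractions import Fraction


def on_grid(stars: float, increment: float) -> bool:
    """True if `stars` is an exact multiple of `increment` (e.g. 0.5 or 0.25)."""
    # Converting via str avoids binary floating-point noise (0.1 -> 1/10).
    return Fraction(str(stars)) % Fraction(str(increment)) == 0


def similar(a: float, b: float, tolerance: float = 0.5) -> bool:
    """Assumed definition: two ratings are similar if within one half star."""
    return abs(a - b) <= tolerance


ratings = [3.0, 3.5, 4.0, 4.5, 5.0]  # toy values
print(all(on_grid(r, 0.5) for r in ratings))    # True: all half-star steps
print(on_grid(3.75, 0.5), on_grid(3.75, 0.25))  # False True
print(similar(3.5, 4.0), similar(3.5, 4.5))     # True False
```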

How to cluster neighborhoods?

  • Used pamk from the fpc library
  • pamk runs partitioning around medoids (PAM) and estimates the optimal number of clusters
  • Searched a range of 2 to 10 clusters for every city

City                  Total Restaurants  Neighborhood Clusters
Edinburgh, UK                    4364.0  5
Karlsruhe, Germany               1785.0  2
Montreal, Canada                 8496.0  2
Waterloo, Canada                  865.0  10
Pittsburgh, PA                   4836.5  3
Charlotte, NC                    7269.5  5
Urbana-Champaign, IL              892.5  10
Phoenix, AZ                     27355.0  2
Las Vegas, NV                   16956.0  6
Madison, WI                      3381.0  5
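
The clustering step can be illustrated with a small, self-contained Python sketch of the same idea: run partitioning around medoids (PAM) for each k from 2 to 10 and keep the k with the highest average silhouette width, which is, to my understanding, the default selection criterion of pamk (criterion = "asw"). This is not the fpc implementation. It assumes restaurants are clustered on (latitude, longitude) with plain Euclidean distance, and the toy points are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pam(d: List[List[float]], k: int) -> Tuple[List[int], List[int]]:
    """Partitioning around medoids on a distance matrix: greedy BUILD, then SWAP."""
    n = len(d)

    def cost(medoids: List[int]) -> float:
        # Total distance from every point to its nearest medoid.
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    # BUILD: add, one at a time, the point that lowers the total cost the most.
    medoids: List[int] = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))

    # SWAP: exchange a medoid with a non-medoid while that lowers the cost.
    current = cost(medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:slot] + [h] + medoids[slot + 1:]
                trial_cost = cost(trial)
                if trial_cost < current - 1e-12:
                    medoids, current, improved = trial, trial_cost, True

    # Assign every point to its nearest medoid (cluster label = medoid slot).
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels


def average_silhouette(d: List[List[float]], labels: List[int], k: int) -> float:
    """Mean silhouette width; a point alone in its cluster counts as 0."""
    n = len(labels)
    members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
    total = 0.0
    for i in range(n):
        own = [j for j in members[labels[i]] if j != i]
        if not own:
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(
            sum(d[i][j] for j in members[c]) / len(members[c])
            for c in range(k)
            if c != labels[i] and members[c]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def choose_k(points: Sequence[Point], krange=range(2, 11)):
    """Return (asw, k, labels) for the k with the highest average silhouette."""
    d = [[math.dist(p, q) for q in points] for p in points]
    best = None
    for k in krange:
        if k >= len(points):
            break
        _, labels = pam(d, k)
        asw = average_silhouette(d, labels, k)
        if best is None or asw > best[0]:
            best = (asw, k, labels)
    return best


# Three made-up, well-separated groups of (x, y) locations.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
asw, k, labels = choose_k(points)
print(k)  # 3
```

The exhaustive SWAP step is slow for cities with tens of thousands of restaurants; for those, the real pamk call (which can switch to clara via usepam = FALSE) is the practical route.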

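Las Vegas Neighborhoods

[Figure: Las Vegas neighborhood clusters]

Charlotte Neighborhoods

[Figure: Charlotte neighborhood clusters]

Edinburgh Neighborhoods

[Figure: Edinburgh neighborhood clusters]

Montreal Neighborhoods

[Figure: Montreal neighborhood clusters]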

Results

  • The F-test found no statistically significant difference in ratings between neighborhoods
  • However, the mean star rating per neighborhood shows some interesting patterns within cities

[Figure: mean star rating per neighborhood, by city]
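
Assuming the F-test above is a one-way ANOVA comparing star ratings across a city's neighborhood clusters within one price range (the slides do not say which F-test was used), the statistic and the per-neighborhood means behind the mean-star plot can be sketched as follows. The ratings are made-up toy values, not Yelp results.

```python
from statistics import mean
from typing import Dict, List, Tuple


def neighborhood_means(stars_by_cluster: Dict[int, List[float]]) -> Dict[int, float]:
    """Mean star rating for each neighborhood cluster."""
    return {cluster: mean(stars) for cluster, stars in stars_by_cluster.items()}


def one_way_anova_f(groups: List[List[float]]) -> Tuple[float, int, int]:
    """F statistic and degrees of freedom for a one-way ANOVA across groups."""
    groups = [g for g in groups if g]  # ignore empty clusters
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up star ratings for one city and one price range, keyed by cluster label.
stars_by_cluster = {
    0: [3.5, 4.0, 4.5],
    1: [3.0, 3.5, 4.0],
    2: [4.0, 4.0, 4.5],
}
print(neighborhood_means(stars_by_cluster))
f_stat, df_b, df_w = one_way_anova_f(list(stars_by_cluster.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")  # F(2, 6) = 1.857
```

The sketch stops at the F statistic. In Python, `scipy.stats.f_oneway(*groups)` returns the same statistic together with its p-value; in R, `summary(aov(stars ~ factor(cluster)))` does the equivalent.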