2025-12-17

Project Overview

  • Proposed: NYC restaurant inspections + Yelp reviews
  • Goal: Integrate R + Tableau (via Rserve) for hypothesis testing & visualization
  • Pivot: California dataset due to data availability

Final Project: Challenges & Solutions

  • File Size Limits → Split files, used SSH key
  • Yelp Data Access → Used Yelp Open Dataset
  • Dataset Coverage → Pivoted to California data
  • JSON Errors → Added try/catch for ingestion
  • Outcome: Completed analysis using California restaurant inspections + Yelp reviews
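
The JSON fix above can be sketched as follows. This is a minimal illustration in Python, not the project's R code: it assumes the Yelp files are newline-delimited JSON (one record per line) and skips any line that fails to parse. The helper name `load_json_lines` and the sample records are hypothetical.

```python
import json

def load_json_lines(lines):
    """Parse newline-delimited JSON, skipping malformed records.

    Returns (records, skipped), where skipped counts unparseable lines.
    """
    records, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # count the bad record instead of aborting the load
    return records, skipped

# Hypothetical sample: two valid records and one truncated record.
sample = [
    '{"business_id": "a1", "stars": 4.0}',
    '{"business_id": "b2", "stars": ',
    '{"business_id": "c3", "stars": 3.5}',
]
records, skipped = load_json_lines(sample)
```

Counting the skipped lines, rather than dropping them silently, makes it possible to report how much data the error handling discarded.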

Introduction

California’s restaurant inspection program provides comprehensive health and safety data for food establishments across the state.

Research Question

How do regulatory compliance and customer experience intersect?

Hypothesis 1: Inspection Scores vs Yelp Ratings

Key Finding: Weak negative correlation (-0.124) - health scores don’t predict Yelp ratings.
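
For reference, a correlation like the one above could be computed as in the sketch below. It assumes the reported value is a Pearson coefficient, which the slide does not state, and the scores and ratings are made-up illustration data, not project data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations, and the root sum of squared deviations per variable.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)

# Hypothetical inspection scores and Yelp star ratings.
scores = [98, 92, 85, 90, 76, 88]
stars = [3.5, 4.0, 4.5, 3.0, 4.0, 3.5]
r = pearson_r(scores, stars)
r_squared = r * r  # share of variance in one variable explained by the other
```

If the reported -0.124 is a Pearson coefficient, its square is about 0.015, so inspection scores would account for roughly 1.5% of the variance in ratings.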

Hypothesis 2: Critical Violations vs Ratings

Key Finding: Critical violations don’t determine customer ratings.
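
One way to test a hypothesis like this is to compare mean ratings between restaurants with and without critical violations. The sketch below computes Welch's t statistic for two groups. The slide does not say which test the project used, so the choice of test is an assumption, and both rating lists are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two samples."""
    # Sample variances (n - 1 denominator); the groups need not share a variance.
    var_a, var_b = variance(group_a), variance(group_b)
    std_err = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / std_err

# Hypothetical Yelp ratings, split by whether a critical violation was recorded.
with_critical = [3.5, 4.0, 3.0, 4.5, 3.5]
without_critical = [4.0, 3.5, 4.0, 3.0, 4.5]
t_stat = welch_t(with_critical, without_critical)
```

A t statistic near zero is consistent with the finding above; a full test would also need the degrees of freedom and a p-value.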

Hypothesis 3: Violations vs Review Volume

Key Finding: Very weak positive correlation (0.022) - negligible relationship.

Rating Distribution by Grade

Key Finding: Different grades show distinct patterns, but overlap considerably.

Heatmap: Violations vs Ratings

Key Finding: Most restaurants cluster at 3-5 violations and 3-4 star ratings.
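
The counts behind a heatmap like this can be sketched as a tally of restaurants per (violation count, star rating) cell. The project built its charts in Tableau; the Python below only illustrates the binning, with hypothetical data and a hypothetical helper name.

```python
from collections import Counter

def heatmap_counts(violations, ratings):
    """Count restaurants in each (violation count, star rating) cell."""
    cells = Counter()
    for violation_count, rating in zip(violations, ratings):
        cells[(violation_count, rating)] += 1
    return cells

# Hypothetical data: one entry per restaurant.
violations = [3, 3, 4, 3, 5, 1]
ratings = [3.5, 3.5, 4.0, 3.5, 3.0, 4.5]
cells = heatmap_counts(violations, ratings)
densest_cell, densest_count = cells.most_common(1)[0]
```

The densest cell is the one a heatmap would shade most strongly.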

Overall Conclusions

Health inspection metrics and Yelp ratings measure different dimensions of quality:

  • ✓ Minimal relationship between health scores and customer ratings
  • ✓ Customer priorities: taste, service, ambiance > sanitation compliance
  • ✓ Information asymmetry: diners rarely check health scores
  • ✓ Rating compression: most restaurants cluster around 3-4 stars

Implications: Both perspectives are valuable but independent tools for assessing restaurant quality.

Data Sources: