Data Science Capstone

Yelp & Smoking

Eric Dolan

Intro

This study looks at the popularity of businesses that allow smoking, using the number of reviews (talking points) in the Yelp Dataset Challenge data as a proxy. The goal is to understand the economic impact, and the very real economic trade-offs, that we should consider when voting on a smoking ban.

I hypothesize that a smoking ban has a real economic impact on the businesses we analyze.

The Yelp data can be found in

yelp_dataset_challenge_academic_dataset.zip

More on the Yelp Dataset Challenge and the details of the data structure can be found at

http://www.yelp.com/dataset_challenge

In summary: I analyze the number of reviews to understand the economic trade-offs involved in a smoking ban. I hypothesize that a ban will have a measurable effect on the micro scale, but is a net benefit on the macro scale.

Background

From the data set provided by Yelp, there are:

  • a total of 61164 businesses
  • only 2754 businesses with the smoking attribute (covering both indoor and outdoor smoking)
  • a total of 1569264 reviews, from the period Oct 2004 to Jan 2015
  • only 144080 reviews for the 2754 smoking businesses.

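That review count is surprisingly high for only 2754 businesses: they make up about 4.5% of all businesses but account for about 9.2% of all reviews.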
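To make the comparison concrete, the four counts above can be turned into shares and per-business averages. The following is a small Python sketch that uses only the figures listed in this section; the variable names are illustrative and are not taken from the original analysis.

```python
# Counts quoted in the Background section
total_businesses = 61164
smoking_businesses = 2754
total_reviews = 1569264
smoking_reviews = 144080

# Share of businesses and of reviews that belong to the smoking group
business_share = smoking_businesses / total_businesses
review_share = smoking_reviews / total_reviews

# Average number of reviews per business, overall and for the smoking group
avg_reviews_all = total_reviews / total_businesses
avg_reviews_smoking = smoking_reviews / smoking_businesses

print(f"share of businesses: {business_share:.1%}")            # 4.5%
print(f"share of reviews:    {review_share:.1%}")              # 9.2%
print(f"reviews per business (all):     {avg_reviews_all:.1f}")      # 25.7
print(f"reviews per business (smoking): {avg_reviews_smoking:.1f}")  # 52.3
```

On these numbers, a smoking business receives roughly twice as many reviews as the average business in the data set.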

Exploratory analysis/ Methods

After exploring the data, I converted the monthly review counts into a time series object.

  • Only data from 2006 to 2014 are used because those years are complete. The final transformed series (9 years × 12 months = 108 values) is shown in fig.1.
##  Time-Series [1:108] from 2006 to 2015: 88 35 23 43 27 31 44 63 73 98 ...

fig.1 - Final transformed data
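The transformation described above, counting reviews per month and keeping only the full years 2006 to 2014, can be sketched as follows. This is an illustrative Python sketch, not the code used for the report (whose output appears to come from R). It assumes each review carries a date string in `YYYY-MM-DD` form; the function name `monthly_review_counts` and the sample dates are made up for the example.

```python
from collections import Counter
from datetime import date


def monthly_review_counts(review_dates, first_year=2006, last_year=2014):
    """Count reviews per calendar month, keeping full years only.

    review_dates: iterable of 'YYYY-MM-DD' strings (assumed date format).
    Returns one count per month in chronological order.
    """
    counts = Counter()
    for text in review_dates:
        d = date.fromisoformat(text)
        if first_year <= d.year <= last_year:
            counts[(d.year, d.month)] += 1
    # Months with no reviews are reported as 0 rather than skipped
    return [
        counts[(year, month)]
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


# Made-up sample: the 2005 and 2015 dates fall outside the window and are dropped
sample_dates = [
    "2005-12-31", "2006-01-03", "2006-01-20",
    "2006-02-11", "2014-12-25", "2015-01-02",
]
series = monthly_review_counts(sample_dates)
print(len(series))   # 108 months = 9 years x 12
print(series[:3])    # [2, 1, 0]
```

Returning a fixed-length list with explicit zeros keeps the series regular, which a seasonal model with a 12-month period relies on.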

Next, we build a Seasonal Decomposition of Time Series by Loess (STL) model.
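To illustrate what a seasonal decomposition produces, here is a minimal Python sketch that splits a monthly series into trend, seasonal and remainder parts. Note that this is a classical additive decomposition based on a centered moving average, used here as a simpler stand-in; it is not the loess-based STL procedure and not the model fitted in this report. The function name `classical_decompose` and the synthetic data are assumptions made for the example.

```python
from statistics import mean


def classical_decompose(series, period=12):
    """Additive decomposition: series = trend + seasonal + remainder.

    Simplified stand-in for STL: the trend is a centered moving average
    instead of a loess fit, so the first and last period/2 points have no
    trend (None). Assumes an even period such as 12 and at least two
    full cycles of data.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        # 2 x period moving average: the two end points get half weight
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    # Average detrended value for each position in the seasonal cycle
    detrended = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            detrended[t % period].append(series[t] - trend[t])
    raw = [mean(values) for values in detrended]
    offset = mean(raw)  # centre the seasonal effects on zero
    seasonal = [raw[t % period] - offset for t in range(n)]
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic example: linear growth plus a fixed 12-month pattern (made-up data)
pattern = [5, -3, 2, -4, 6, -1, 0, 3, -2, -5, 1, -2]
demo = [100 + 2 * t + pattern[t % 12] for t in range(36)]
trend, seasonal, remainder = classical_decompose(demo)
print([round(s, 2) for s in seasonal[:12]])
```

STL replaces the moving average with locally weighted regression, which lets the seasonal pattern change slowly over time and makes the fit more robust to outliers.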

Modelling & Methodology

After a few rounds of tuning and experimenting with different settings, we were able to produce a final model for forecasting. We analyzed the Yelp data set, concentrated our analysis on "Total Review", and applied time series modelling to produce the plot in fig.2.

fig.2 - Final model plot

Forecast/Results

The 2-year forecast in fig.3 shows that review volume remains in the same zone as today. We interpret this as evidence of an economic impact: smoking bans seem to have a real and measurable economic effect on these Yelp businesses.

## Error in library(forecast): there is no package called 'forecast'
## Error in eval(expr, envir, enclos): could not find function "forecast"
## Error in plot(fcast, main = ""): object 'fcast' not found

fig.3 - Forecast result

Forecast Chart

our chart

Discussion

So, as hypothesized, the analysis shows an apparent relationship: there is a real economic impact associated with a smoking ban.

Our methodology is well documented and resulted in clear plots of the data we collected, analyzed and presented.

Although a ban can hurt businesses on the micro level, I suspect there is a net benefit on the macro scale.

MOVING FORWARD: This data could be presented at the state/municipal level to support a tax incentive plan for businesses that become smoke-free establishments. This would enable a macro health benefit while offsetting lost economic gains on the micro level.