Does Wifi Matter? A Yelp Data Study (Capstone Project)

LI SUN
2015-11

Data Source

Data is from the Yelp Dataset Challenge: http://www.yelp.com/dataset_challenge

  • The jsonlite package was used to stream the data into R
  • Only the business and review data sets were used
  • Data were merged and cleaned
  • R code is on GitHub ()
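
As a rough illustration of the loading and merging steps above, here is a minimal sketch. It writes two tiny newline-delimited JSON files to temporary paths and reads them back with jsonlite's stream_in. The records, the field names and the merge on business_id are toy assumptions for illustration, not the actual project code.

```r
library(jsonlite)

# Toy stand-ins for the business and review files (hypothetical records).
business_path <- tempfile(fileext = ".json")
review_path   <- tempfile(fileext = ".json")
writeLines(c(
  '{"business_id": "b1", "name": "Cafe A", "attributes": {"Wi-Fi": "free"}}',
  '{"business_id": "b2", "name": "Diner B", "attributes": {"Wi-Fi": "paid"}}'
), business_path)
writeLines(c(
  '{"business_id": "b1", "stars": 5, "text": "Great wifi"}',
  '{"business_id": "b1", "stars": 4, "text": "Nice coffee"}',
  '{"business_id": "b2", "stars": 2, "text": "Had to pay for wifi"}'
), review_path)

# stream_in() parses newline-delimited JSON line by line;
# flatten() turns the nested attributes field into ordinary columns.
business <- jsonlite::flatten(stream_in(file(business_path), verbose = FALSE))
review   <- stream_in(file(review_path), verbose = FALSE)

# Attach the business attributes to every review via the shared key.
merged <- merge(review, business, by = "business_id")
```

With the real files, only the two paths would change; stream_in is useful there because the review file is large and is stored as one JSON record per line.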

Specify Question

So do people care about wifi? What do they say about wifi?

[Figure: plots from chunk unnamed-chunk-1]

People actually don't care about wifi as much as I do :) So, should a restaurant owner provide wifi?
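
One simple way to check how often reviews bring up wifi is to count the reviews whose text matches a wifi pattern. The sketch below uses four made-up review texts and an assumed regular expression; it is not the code behind the plots.

```r
# Hypothetical review texts.
reviews <- c(
  "Great coffee and free WiFi",
  "The burger was cold",
  "wi-fi kept dropping",
  "Lovely patio"
)

# Match "wifi" or "wi-fi" in any letter case.
mentions_wifi <- grepl("wi-?fi", reviews, ignore.case = TRUE)

# Share of reviews that mention wifi at all.
share <- mean(mentions_wifi)
```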

Exploratory Data Analysis

In this stage, I convert all variables to the appropriate class and choose some variables by common sense. I also check the correlations among variables and remove some highly correlated ones, so that the design matrix is close to full rank.

[Figure: plot from chunk unnamed-chunk-2]
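
The correlation check can be sketched in base R as follows. The data, the 0.9 cutoff and the helper drop_correlated are all assumptions for illustration; the original analysis may have used a different rule.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.05),  # nearly a copy of x1
  x3 = rnorm(n),
  x4 = rnorm(n)
)

# Hypothetical helper: repeatedly find the most correlated pair and drop
# the member with the larger mean absolute correlation to the others,
# until no pair exceeds the cutoff.
drop_correlated <- function(df, cutoff = 0.9) {
  keep <- names(df)
  repeat {
    cm <- abs(cor(df[, keep, drop = FALSE]))
    diag(cm) <- 0
    if (length(keep) < 2 || max(cm) <= cutoff) break
    worst      <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    candidates <- keep[worst]
    drop       <- candidates[which.max(colMeans(cm[, candidates, drop = FALSE]))]
    keep       <- setdiff(keep, drop)
  }
  keep
}

kept <- drop_correlated(toy, cutoff = 0.9)

# Numerical rank of the reduced design matrix; full rank means
# the rank equals the number of kept columns.
design_rank <- qr(as.matrix(toy[, kept]))$rank
```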

Exploratory Modeling

In this stage, I start with the full model, an ordinary least squares linear regression fitted with the lm function. Results for several different models:

              Rsqr     sigma vif.ave_pos
original 0.6047648 0.6467814    9.264232
weighted 0.6484257 1.5147396   15.608399
centered 0.6047648 0.6286774    9.264232
W and C  0.6484257 1.4723406   15.608399

As you can see, the full model explains almost 65% of the variation in ratings, which is good. The weighted model has a noticeably higher R squared (0.648 vs. 0.605), so we will use the weighted model.
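
A comparison like the one in the table can be sketched as below. Everything here is a toy assumption: the simulated data, the second predictor ave_neg, and the choice of review_count as the weight are hypothetical, since the slides do not say which weights were used. The VIF is computed from the correlation matrix of the coefficient estimates, which also works for a weighted fit.

```r
set.seed(42)
n <- 300
toy <- data.frame(
  ave_pos      = rnorm(n),
  ave_neg      = rnorm(n),                         # hypothetical predictor
  review_count = sample(5:200, n, replace = TRUE)  # hypothetical weight
)
# Ratings are noisier for businesses with few reviews.
toy$stars <- 3.5 + 0.6 * toy$ave_pos - 0.4 * toy$ave_neg +
  rnorm(n, sd = 2 / sqrt(toy$review_count))

# VIF per coefficient: diagonal of the inverse correlation matrix
# of the coefficient estimates, intercept excluded.
vif_from_fit <- function(fit) {
  v <- vcov(fit)[-1, -1, drop = FALSE]
  diag(solve(cov2cor(v)))
}

# Collect the three columns shown in the table.
summarise_fit <- function(fit) {
  s <- summary(fit)
  c(Rsqr        = s$r.squared,
    sigma       = s$sigma,
    vif.ave_pos = unname(vif_from_fit(fit)["ave_pos"]))
}

original <- lm(stars ~ ave_pos + ave_neg, data = toy)
weighted <- lm(stars ~ ave_pos + ave_neg, data = toy, weights = review_count)

comparison <- rbind(original = summarise_fit(original),
                    weighted = summarise_fit(weighted))
```

Note that for a weighted fit, summary() reports sigma on the scale of the weighted residuals, so the sigma values of weighted and unweighted fits are not directly comparable.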

Final Model

Finally, lasso was used to select variables.

[Figure: coefficient plot]

These are the variables that remain when using lambda.1se from the glmnet package.
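
Selecting variables at lambda.1se can be sketched as follows. The data are simulated and the variable names x1 to x10 are hypothetical; no observation weights are applied in this toy version.

```r
library(glmnet)

set.seed(7)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
# Only x1 and x2 truly affect the response in this toy setup.
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

# alpha = 1 is the lasso penalty; cv.glmnet picks lambda by cross-validation.
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at lambda.1se: the largest lambda whose cross-validated
# error is within one standard error of the minimum.
coefs    <- as.matrix(coef(cv_fit, s = "lambda.1se"))[, 1]
selected <- setdiff(names(coefs)[coefs != 0], "(Intercept)")

# Coefficient paths against log(lambda).
plot(cv_fit$glmnet.fit, xvar = "lambda")
```

Using lambda.1se instead of lambda.min gives a sparser model, which fits the goal of keeping only the variables that clearly matter.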

Take-home message: free wifi is a tiny plus, but paid wifi is a big no!