Insightful advertisement based on the Yelp dataset

Weijia Lu
Nov. 2015

Problem statement

Dr. Jerry is planning to open a new clinic. He has several questions to answer.

  • Should I set up a large waiting room?
  • What kind of office staff do I need?
  • Who will come to my clinic?
  • Should I highlight this or that feature in my advertisement?

A way out

  • Red rectangle = data from Yelp
  • Green rectangle = output of our algorithm

Demonstration - Find insightful feature phrases

C0 = noun that potentially carries an explicit opinion

N1-N4 = prefixes, P1-P5 = postfixes

e.g., for rule 5: The[DT] billing[NN] is[VBZ] absolutely[RB] horrible[JJ]

  • Aggregate business buddies and mark insightful feature phrases in their relevant reviews (see the figure on the right and the notation)

Business buddies share the same category

Insightful features are explicit opinions found by the rules

Now Dr. Jerry knows that staff and office are important
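
The rule notation above can be illustrated with a small sketch. The slides do not spell out rule 5, so the pattern below (a noun C0 followed by the postfix tags VBZ, RB, JJ) is only an assumption read off the tagged example; the names `parse_tagged`, `find_feature_phrases`, and `RULE` are hypothetical.

```python
import re

# Parse "word[TAG]" tokens, the notation used in the tagged example.
TOKEN_RE = re.compile(r"(\S+?)\[([A-Z$]+)\]")

def parse_tagged(text):
    """Return a list of (word, tag) pairs from 'word[TAG] word[TAG] ...'."""
    return TOKEN_RE.findall(text)

# Hypothetical rule: noun C0 followed by P1=VBZ, P2=RB, P3=JJ.
# This is an assumption; the actual rule 5 is not given in the slides.
RULE = {"C0": {"NN", "NNS"}, "postfix": ["VBZ", "RB", "JJ"]}

def find_feature_phrases(tagged, rule=RULE):
    """Return (noun, following_words) wherever the rule matches."""
    hits = []
    n_post = len(rule["postfix"])
    for i, (word, tag) in enumerate(tagged):
        if tag not in rule["C0"]:
            continue
        # Tokens in positions P1..Pn directly after the candidate noun.
        window = tagged[i + 1 : i + 1 + n_post]
        if [t for _, t in window] == rule["postfix"]:
            hits.append((word, [w for w, _ in window]))
    return hits

sentence = "The[DT] billing[NN] is[VBZ] absolutely[RB] horrible[JJ]"
print(find_feature_phrases(parse_tagged(sentence)))
```

On the example sentence this marks "billing" as the feature, with "is absolutely horrible" as the opinion attached to it. A fuller version would also check the prefix positions N1-N4 and carry one pattern per rule.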

Demonstration - Find potential users

  • Down-select business buddies into business neighbors by clustering on their attributes

Clean NA values in the business buddies -> k-means

Plot the total within-group sum of squares against the number of cluster centers (see the figure above) to select the number of cluster centers (= 45)
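
A minimal sketch of this step, assuming numeric business attributes and using a plain Lloyd's k-means written with the standard library only. It is not the code behind the slides: the toy data, the helper names (`drop_na`, `kmeans`), and the range of k are illustrative, and the toy data will not reproduce the value of 45 reported above.

```python
import random

def drop_na(rows):
    """Keep only rows with no missing (None) attribute."""
    return [tuple(r) for r in rows if all(v is not None for v in r)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, total within-group SS)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wss

# Toy attributes: three loose groups plus one row with a missing value.
rng = random.Random(1)
raw = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
       for cx, cy in [(0, 0), (10, 10), (20, 0)] for _ in range(20)]
raw.append((None, 3.0))

clean = drop_na(raw)
wss_by_k = {k: kmeans(clean, k)[1] for k in range(1, 7)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 1))
```

Plotting `wss_by_k` against k gives the curve described above; the number of centers is read off where the curve flattens.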

  • Find candidate users heuristically

Find the relevant reviews of the business neighbors and select those submitted this year.

Find the users who wrote those reviews

Keep the users who wrote more than one review
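
The three steps above can be sketched as a single filter-and-count pass. The field names `user_id`, `business_id`, and `date` follow the Yelp review records; the function name `candidate_users` and the sample records are hypothetical. Counting "more than one review" only within the filtered reviews is an assumption, since the slides do not say whether the count covers all of a user's reviews.

```python
from collections import Counter

def candidate_users(reviews, neighbor_ids, year, min_reviews=2):
    """Users with at least `min_reviews` reviews of neighbor businesses in `year`."""
    counts = Counter(
        r["user_id"]
        for r in reviews
        # Dates are assumed to be "YYYY-MM-DD" strings.
        if r["business_id"] in neighbor_ids and r["date"].startswith(str(year))
    )
    return sorted(u for u, n in counts.items() if n >= min_reviews)

# Hypothetical sample records, for illustration only.
reviews = [
    {"user_id": "u1", "business_id": "b1", "date": "2015-03-02"},
    {"user_id": "u1", "business_id": "b2", "date": "2015-07-19"},
    {"user_id": "u2", "business_id": "b1", "date": "2015-01-11"},
    {"user_id": "u3", "business_id": "b1", "date": "2014-05-05"},
    {"user_id": "u3", "business_id": "b2", "date": "2015-02-01"},
    {"user_id": "u4", "business_id": "b9", "date": "2015-02-01"},
    {"user_id": "u4", "business_id": "b9", "date": "2015-04-01"},
]

print(candidate_users(reviews, {"b1", "b2"}, 2015))
```

Here only `u1` qualifies: `u2` wrote a single review, one of the two reviews by `u3` is from an earlier year, and `u4` reviewed a business outside the neighbor set.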

Discussion and conclusion

  • We present a method that lets Dr. Jerry generate an insightful advertisement and find potential users to deliver it to.
  • To explore key feature phrases, we use association mining to find their syntactic structure.
  • The training set for association mining is built by sampling explicit opinion statements.
  • Implicit statements are difficult to find and characterize, which is why we use association mining instead of a classifier.
  • The training set is small because labeling samples is tedious and arduous.
  • To find potential users, we use k-means clustering to down-select businesses based on their features.
  • The raw dataset offers no good quantitative measure to support user profiling, so we use a heuristic to find potential users.