ABSTRACT

This study uses Yelp review data for the state of Pennsylvania (PA) to examine business categories and review activity across the state. We used Microsoft Excel and R to clean the large raw dataset and to produce plots, and we applied both qualitative and quantitative analysis to present results aimed at prospective business owners. Guided by the visualizations, we consulted literature, news, and reports to identify factors directly associated with the concentration, or clustering, of different types of businesses in different Pennsylvania cities. The resulting plots help us understand and analyze the business and entrepreneurial ecosystem of these cities.

Keywords: Yelp data, Reviews, entrepreneurial ecosystem

INTRODUCTION

In today’s globalized and digitized economy, mobile applications play a vital role in the growth and successful operation of a wide range of business sectors. They also give customers an efficient platform for reviewing and rating businesses, which in turn helps other customers, business partners, small business owners, and entrepreneurs make data-driven decisions and choose a business sector, city, or neighborhood wisely. Among these applications, Yelp is one of the most widely used, and it holds a large volume of original data. Yelp develops, hosts and markets Yelp.com and the Yelp mobile app, which publish crowd-sourced reviews about local businesses [3]. Yelp users have contributed approximately 142 million cumulative reviews of almost every type of local business, from restaurants, boutiques and salons to dentists, mechanics, plumbers and more. The information these reviews provide is valuable for consumers and businesses alike [4]. “Frequency of Yelp.com usage and perceived influence of Yelp.com reviews were also positively related to the majority of motives” [2]. Yelp generates a large amount of customer data every year; by examining Yelp’s dataset and analyzing the resulting visualizations, we can gain insight into the business landscape of American cities. Because HU is located in PA, we used the Yelp reviews for PA.

DATA

We retrieved the raw data from https://www.kaggle.com/yelp-dataset/yelp-dataset/data. This is an authorized dataset from the official Yelp website, provided for research use. The raw dataset includes ten variables: (1) Business Category, (2) State, (3) Neighborhood, (4) City, (5) Zip Code, (6) Stars, (7) Review Count, (8) Is Open, (9) Latitude, and (10) Longitude. Business Category refers to the type of business operating in Pennsylvania, such as Restaurants, Shopping Centers, Hotel & Travel, Salon & Spa, Health Services, Pet Services, Real Estate, Entertainment Centers, Fitness Centers, and Home Services. Stars refers to the rating that customers give to a business. Review Count refers to the number of reviews received by businesses operating in the different cities of Pennsylvania.
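To make the ten variables concrete, the following is a minimal sketch of how one record might be represented and how the data could be restricted to Pennsylvania. It is written in Python for illustration only and is not the Excel/R workflow used in this study; the field names, the `Business` class, the `in_state` helper, and the sample values are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Business:
    """One business record; field names are hypothetical, chosen to mirror the ten variables."""
    category: str
    state: str
    neighborhood: Optional[str]
    city: str
    zip_code: str
    stars: float
    review_count: int
    is_open: bool
    latitude: Optional[float]
    longitude: Optional[float]


def in_state(records, state="PA"):
    """Keep only the records whose state code matches."""
    return [r for r in records if r.state == state]


# Made-up sample records, not taken from the dataset.
sample = [
    Business("Restaurants", "PA", "Downtown", "Pittsburgh", "15222", 4.0, 120, True, 40.44, -79.99),
    Business("Shopping Centers", "NV", None, "Las Vegas", "89109", 3.5, 45, True, 36.12, -115.17),
]
pa_only = in_state(sample)
```

Here `pa_only` holds just the Pittsburgh record, since the second sample record has a different state code.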

METHODOLOGY

“Exploratory data analysis is a process of sifting through data in search of interesting information or patterns” [1]. With that in mind, we performed our data cleaning in Microsoft Excel and used R alongside it to obtain a tidy dataset. Our data cleaning approach had two parts. First, we removed the data we were not interested in, to keep the dataset size manageable, and then cleaned the messy and noisy data. Because we were interested in data such as City, Business Category, Stars, Review Count, Longitude, and Latitude, but not in social information, we deleted Check Ins, Business_id, Address, and Name. Second, we consolidated related business categories into a single category; for example, we combined hospitals and health centers under a single category named Health Services. We adopted a similar approach for the other business categories. This yielded 10 important business categories that operate in different cities of Pennsylvania, generate substantial revenue, spur entrepreneurial and research activity, and foster job creation, ultimately contributing significantly to the economic and social growth and development of the state. This paper seeks to answer the following questions:

. Which types of business centers are popular in Pennsylvania?

. Where in Pennsylvania are most business centers located?

. What factors play a vital role in setting up a business venture?

. What is the relationship (if any) between star ratings, review counts, and business categories?

Answering these questions can show how the visualization plots and the dataset may be used to improve the customer experience and to formulate a roadmap and strategy for the future growth of the associated business categories. In this study, we attempt to answer the above questions using the available Yelp dataset.
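The two cleaning steps described in this section (dropping fields that are not of interest, then consolidating related categories) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Excel/R procedure actually used: the key names in `DROP_FIELDS`, the entries in `CATEGORY_MAP`, and the `clean_record` helper are hypothetical. Only the hospitals and health centers example comes from the text.

```python
# Hypothetical key names for the fields the study removed
# (Check Ins, Business_id, Address, Name).
DROP_FIELDS = {"checkins", "business_id", "address", "name"}

# Hypothetical consolidation table; only this example is given in the text.
CATEGORY_MAP = {
    "Hospitals": "Health Services",
    "Health Centers": "Health Services",
}


def clean_record(raw):
    """Drop unwanted fields, then map the category to its consolidated name."""
    kept = {key: value for key, value in raw.items() if key not in DROP_FIELDS}
    category = kept.get("category")
    # Categories without a mapping entry are left unchanged.
    kept["category"] = CATEGORY_MAP.get(category, category)
    return kept


# Made-up record for illustration.
raw = {
    "business_id": "abc123",
    "name": "Example Clinic",
    "category": "Hospitals",
    "city": "Pittsburgh",
    "stars": 4.0,
}
cleaned = clean_record(raw)
```

Keeping the consolidation rules in a single lookup table makes it easier to review which raw categories were merged into each of the 10 final categories.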

VISUALIZATION PLOTS & FINDINGS


# Findings:

. A total of 8,011 business centers were identified across all 10 business categories.

. Restaurants are the most preferred business, with the highest count (3,647), followed by shopping centers (1,488).

. Real Estate is the least preferred business in Pennsylvania, with the lowest count (162).
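Counts of this kind come from tallying records by category and ranking the tallies. The sketch below shows the idea in Python on made-up toy records; it does not reproduce the figures above, and the `category` key and the `count_by_category` helper are assumptions.

```python
from collections import Counter


def count_by_category(records):
    """Tally how many business records fall into each category."""
    return Counter(r["category"] for r in records)


# Toy records for illustration only.
toy = [
    {"category": "Restaurants"},
    {"category": "Restaurants"},
    {"category": "Restaurants"},
    {"category": "Shopping Centers"},
    {"category": "Shopping Centers"},
    {"category": "Real Estate"},
]
counts = count_by_category(toy)
ranked = counts.most_common()  # list of (category, count), largest count first
total = sum(counts.values())
```

The first and last entries of `ranked` give the most and least frequent categories, and `total` gives the overall number of records.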

# Findings:

. The average star rating across all sectors stands at 3.5.

. The Entertainment sector receives the highest average star rating (3.7), whereas Shopping Centers receive a star rating below (0.5).

# Findings:

. Reviews play a very important role in influencing the decision to set up a business and in other business decisions.

. Restaurants account for the highest number of reviews.
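Average star ratings and review totals per category can be obtained by grouping records by category and aggregating each group. The following Python sketch illustrates this on made-up toy records; the key names and the `summarize` helper are assumptions, and the numbers are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Return the mean star rating and total review count for each category."""
    stars = defaultdict(list)
    reviews = defaultdict(int)
    for r in records:
        stars[r["category"]].append(r["stars"])
        reviews[r["category"]] += r["review_count"]
    return {
        category: {
            "mean_stars": round(mean(stars[category]), 2),
            "total_reviews": reviews[category],
        }
        for category in stars
    }


# Toy records for illustration only.
toy = [
    {"category": "Restaurants", "stars": 4.0, "review_count": 100},
    {"category": "Restaurants", "stars": 3.0, "review_count": 50},
    {"category": "Entertainment Centers", "stars": 4.5, "review_count": 20},
]
summary = summarize(toy)
```

Placing the two aggregates side by side for each category is one simple way to look at how ratings and review volume vary across categories.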


# Findings:

. From Plot 4, we found that Pittsburgh accounts for the highest number of business centers (5,044), whereas Canonsburg has the lowest (128).

. The two plots above (Plots 4 and 5) make it clear that Pittsburgh is the epicenter of business activity in Pennsylvania. We also found that restaurants are the main drivers of business operations in Pittsburgh. However, these two plots also show that Philadelphia, one of Pennsylvania’s major cities and widely known as home to a number of corporate giants, is missing from the dataset. The Yelp dataset does not capture Philadelphia data.

. We then zeroed in on Pittsburgh and performed extensive cleaning of the dataset to identify the top 5 neighborhoods that act as growth engines for the city. Plot 6 highlights the distribution of business centers in the top 5 neighborhoods of Pittsburgh.

. Finally, we plotted a map (Plot 7) that shows the dispersion of restaurants located in Pittsburgh. Restaurants account for a major share of all business operations.
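The city-level and neighborhood-level counts discussed above follow the same group-and-count pattern. The Python sketch below is an illustration on made-up toy records, not the analysis code used for the plots; the key names and both helper functions are assumptions.

```python
from collections import Counter


def count_by_city(records):
    """Tally the number of business records per city."""
    return Counter(r["city"] for r in records)


def top_neighborhoods(records, city, n=5):
    """Return the n neighborhoods with the most records in one city.

    Records with an empty or missing neighborhood are skipped.
    """
    counts = Counter(
        r["neighborhood"]
        for r in records
        if r["city"] == city and r.get("neighborhood")
    )
    return counts.most_common(n)


# Toy records for illustration only.
toy = [
    {"city": "Pittsburgh", "neighborhood": "Downtown"},
    {"city": "Pittsburgh", "neighborhood": "Downtown"},
    {"city": "Pittsburgh", "neighborhood": "Shadyside"},
    {"city": "Pittsburgh", "neighborhood": ""},
    {"city": "Canonsburg", "neighborhood": ""},
]
city_counts = count_by_city(toy)
top = top_neighborhoods(toy, "Pittsburgh", n=2)
```

Skipping blank neighborhoods is a design choice in this sketch: it keeps unlabeled records from appearing as a neighborhood of their own, at the cost of leaving them out of the ranking.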


CONCLUSION

Our study, which relies purely on the Yelp dataset, used data visualization in the form of seven (7) plots to conclude that Pittsburgh is the hub of business operations and entrepreneurial activity in Pennsylvania. Moreover, Pittsburgh is a business-friendly city, especially for restaurants. In addition, government statistics indicate that the economic environment in Pittsburgh (represented by labor force conditions) is expected to trend upward over the next few years. Furthermore, our visualizations could help budding entrepreneurs and small business owners identify the best location to launch a business in restaurants, health services, hotel and travel, fitness centers, and similar categories. They could be equally helpful to venture capitalists and angel investors who want to fund businesses wisely, mitigate risk, and improve their chances of profitability based on the location and feasibility of a business in Pennsylvania.

LIMITATIONS

One major bottleneck was the presence of unwanted data and missing values in the dataset, which added no value to our research and findings. We eliminated these data, but doing so raises questions of fairness, transparency, accountability, and bias. The eliminated data may contain hidden information that could lead to different findings and visualization plots. We hope to mitigate these shortcomings in future research.
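One possible way to make the elimination step more transparent is to record how many records were dropped and which fields were missing. The Python sketch below illustrates the idea on made-up toy records; it is not part of the analysis reported here, and the `REQUIRED` field names and the `split_complete` helper are assumptions.

```python
from collections import Counter

# Hypothetical names for the fields a record must have to be kept.
REQUIRED = ("city", "category", "stars", "review_count", "latitude", "longitude")


def split_complete(records):
    """Separate complete records from incomplete ones.

    Returns the complete records and a tally of how often each required
    field was missing (None or an empty string) among the dropped records.
    """
    kept = []
    missing_tally = Counter()
    for r in records:
        missing = [f for f in REQUIRED if r.get(f) in (None, "")]
        if missing:
            missing_tally.update(missing)
        else:
            kept.append(r)
    return kept, missing_tally


# Toy records for illustration only.
toy = [
    {"city": "Pittsburgh", "category": "Restaurants", "stars": 4.0,
     "review_count": 10, "latitude": 40.44, "longitude": -79.99},
    {"city": "Pittsburgh", "category": "Restaurants", "stars": 3.5,
     "review_count": 5, "latitude": None, "longitude": None},
]
kept, missing_tally = split_complete(toy)
dropped = len(toy) - len(kept)
```

Reporting `dropped` together with `missing_tally` alongside the findings would let readers judge how much the eliminated records might have affected the results.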

REFERENCES

  1. Derthick, M., Kolojejchick, J., & Roth, S. F. (1997, August). An Interactive Visualization Environment for Data Exploration. In KDD (pp. 2-9).
  2. Hicks, A., Horovitz, J., Hovarter, M., Miki, M., & Bevan, J. L. (2012). Why people use Yelp.com: An exploration of uses and gratifications. Computers in Human Behavior, 28(6), 2274-2279.
  3. Yelp (n.d.). In Wikipedia. Retrieved October 14, 2009, from https://en.wikipedia.org/wiki/Yelp
  4. Yelp Statistics. Retrieved May 9, 2017, from https://yelpinc.gcs-web.com