33880719 Assignment Part A
1 Part 1: Technical Validation
1.0.1 Show first 6 rows of data set
1.0.2 Number of rows
[1] 7239
1.0.3 Number of variables
[1] 22
1.0.4 Mean star rating
[1] 3.61
1.0.5 Mean number of 5 star reviews
[1] 15.35
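The checks above can be reproduced with a few base R calls. The following is a minimal sketch on a small made-up data frame, not the code used for this report: the object name `yelp` and the columns `stars` and `review_count` are assumptions, and "mean number of 5 star reviews" is read here as the mean review count among businesses rated exactly 5 stars.

```r
# Hypothetical toy data standing in for the Yelp business data set;
# the column names are assumptions, not taken from the real file.
yelp <- data.frame(
  stars        = c(4.0, 3.5, 5.0, 2.0, 3.0),
  review_count = c(120, 45, 8, 30, 60)
)

head(yelp)                                 # first rows (up to 6)
n_rows     <- nrow(yelp)                   # number of rows
n_vars     <- ncol(yelp)                   # number of variables
mean_stars <- round(mean(yelp$stars), 2)   # mean star rating

# Mean review count among businesses rated exactly 5 stars
mean_reviews_5 <- round(mean(yelp$review_count[yelp$stars == 5]), 2)
```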
2 Part 2: Insights Report
2.1 Overall shape of Yelp ratings
2.1.1 Five number summary
0% 25% 50% 75% 100%
1.0 3.0 4.0 4.5 5.0
2.1.2 Histogram
2.1.3 Skewness and Kurtosis
skewness kurtosis
-0.530 2.549
2.1.4 Shape analysis
The distribution of business ratings has a five-number summary of 1, 3, 4, 4.5, and 5, so the middle 50% of businesses are rated between 3 and 4.5 stars, with a median of 4. The histogram is unimodal, with a clear peak in the 3.5–4 star range, which contains the largest share of businesses. The skewness of -0.530 indicates a slight negative skew, with a longer tail towards the lower ratings, but overall the distribution is fairly symmetrical. The kurtosis of 2.55 is slightly below the value of 3 for a normal distribution (assuming the figure reported is not excess kurtosis), so the peak is broadly similar to, though a little flatter than, a normal curve, and the shape is not exactly bell-shaped.
A ceiling effect is clearly visible, with many businesses achieving the maximum of 5 stars and very few receiving 1 star. As a result, the distribution is not “normal-looking” in practice; rather, it reflects a positive bias, with most businesses achieving a favourable rating.
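The summary statistics discussed above can be sketched in base R as follows. This is an illustration on made-up ratings, not the report's own code: the vector `stars` is hypothetical, and skewness and kurtosis are computed from central moments on the assumption that the reported kurtosis is the non-excess kind (normal distribution = 3).

```r
# Hypothetical star ratings; the real analysis uses the full data set.
stars <- c(1, 2.5, 3, 3.5, 4, 4, 4.5, 5, 5)

# Five-number summary: minimum, quartiles and maximum
five_num <- quantile(stars, probs = c(0, 0.25, 0.5, 0.75, 1))

# Moment-based skewness and (non-excess) kurtosis
m  <- mean(stars)
m2 <- mean((stars - m)^2)   # second central moment
m3 <- mean((stars - m)^3)   # third central moment
m4 <- mean((stars - m)^4)   # fourth central moment

skewness <- m3 / m2^1.5     # negative value: longer tail towards low ratings
kurtosis <- m4 / m2^2       # equals 3 for a normal distribution

round(c(skewness = skewness, kurtosis = kurtosis), 3)
```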
2.2 Two-state market comparison: AZ vs CA
2.2.1 Prepare two State data of AZ and CA
2.2.2 Table: Percentage by star rating
| Stars | AZ (%) | CA (%) |
|---|---|---|
| 5.0 | 10.70 | 28.82 |
| 4.5 | 14.20 | 19.44 |
| 4.0 | 21.60 | 22.57 |
| 3.5 | 19.75 | 14.24 |
| 3.0 | 14.20 | 4.51 |
| 2.5 | 9.47 | 3.82 |
| 2.0 | 5.56 | 4.51 |
| 1.5 | 2.88 | 1.04 |
| 1.0 | 1.65 | 1.04 |
2.2.3 Mean reviews by star rating
2.2.4 Two state plot: AZ vs CA star rating
2.2.5 Comparison between AZ and CA
When comparing AZ to CA, CA has a greater proportion of its businesses at the higher star ratings: 28.82% of CA businesses have a 5-star rating compared to just 10.70% for AZ. However, 5-star businesses had mean review counts of only 15.63 in CA and 13.25 in AZ, which suggests that businesses with a 5-star mean rating may be less established, since their ratings rest on fewer reviews. In comparison, 4-star businesses received a mean of 98.51 reviews in CA and 57.72 in AZ, a far greater average review count. In general, CA businesses received higher mean review counts than AZ businesses, which suggests CA businesses are more established and more commonly reviewed by Yelp users, adding more weight to CA’s ratings.
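A comparison of this kind, percentage of businesses per star rating within each state plus mean review count per rating, can be sketched in base R as below. The data frame `biz` and its columns `state`, `stars` and `review_count` are hypothetical stand-ins, so the numbers it produces are not those in the table above.

```r
# Hypothetical toy data; column names are assumptions.
biz <- data.frame(
  state        = c("AZ", "AZ", "AZ", "AZ", "CA", "CA", "CA", "CA", "NV"),
  stars        = c(5, 4, 4, 3, 5, 5, 4, 3, 2),
  review_count = c(10, 60, 50, 20, 12, 18, 100, 30, 5),
  stringsAsFactors = FALSE
)

# Keep only the two states being compared
two_state <- biz[biz$state %in% c("AZ", "CA"), ]

# Column percentages: share of each state's businesses at each star rating
pct <- round(
  100 * prop.table(table(two_state$stars, two_state$state), margin = 2),
  2
)

# Mean review count for each star rating within each state
mean_reviews <- aggregate(review_count ~ stars + state,
                          data = two_state, FUN = mean)
```

Using `margin = 2` makes each state's column sum to 100, so the two states can be compared even though they contain different numbers of businesses.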
2.3 Rating distribution: open vs closed restaurants
2.3.1 Number of restaurants
[1] 2377
2.3.2 Side by side percentage bar plot
2.3.3 Table: Key statistics open vs closed
2.3.4 Analysis of Open vs Closed star rating
The side-by-side chart shows how open and closed restaurants are distributed across star ratings. From the table above, currently open restaurants have a mean rating of 3.49, almost identical to the mean rating of 3.50 for closed restaurants. In other words, there is practically no difference in mean rating between currently open and closed restaurants.
Surprisingly, a larger share of open restaurants are rated below 3 stars (21.07%) than closed ones (16.26%), which is counterintuitive, as one might expect businesses with poor Yelp ratings to be the ones that close.
These results suggest that Yelp ratings have little association with whether a business is open or closed, and that closure is more likely to depend on other factors such as location, finances or competition.
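The two figures compared above, mean rating and share of restaurants below 3 stars for each group, could be computed along the following lines. This is a sketch on made-up data: the data frame `restaurants` and the columns `is_open` (1 = open, 0 = closed) and `stars` are assumptions.

```r
# Hypothetical toy data; column names are assumptions.
restaurants <- data.frame(
  is_open = c(1, 1, 1, 1, 0, 0, 0, 0),
  stars   = c(2.5, 3.5, 4.0, 4.0, 2.0, 3.5, 4.0, 4.5)
)

# One summary row per group (closed = 0, open = 1)
key_stats <- do.call(rbind, lapply(
  split(restaurants, restaurants$is_open),
  function(g) {
    data.frame(
      is_open     = g$is_open[1],
      n           = nrow(g),
      mean_stars  = round(mean(g$stars), 2),
      # mean() of a logical vector gives the proportion of TRUE values
      pct_below_3 = round(100 * mean(g$stars < 3), 2)
    )
  }
))

key_stats
```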
2.4 Weekend opening effect
2.4.1 Table: Weekend Open vs Weekend Closed
2.4.2 Comparison between open vs closed on weekends
Restaurants that are open on weekends have a lower mean rating (3.52) than those that are closed on weekends (4.14). Whilst this could suggest that closing on weekends is associated with a higher mean star rating, the weekend-closed category contains just 48 businesses, compared to 1964 that open on weekends. With such a small sample, the high average could be due to chance or to unrepresentative businesses.
These businesses may be niche or specialised restaurants, such as fine dining venues, which may have a specific reason to close on weekends. In contrast, the much larger group of weekend-open restaurants likely captures a broad mix of venues, including fast food, which could pull the mean rating lower.
This suggests that while weekend opening is common, it does not necessarily translate into higher star ratings, and other factors such as business type and market niche may be more important.
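One way to make the small-sample concern concrete is to report the group size and the standard error of the mean alongside each mean rating. The sketch below does this on made-up data; the data frame `weekend`, the flag `open_weekend` and the helper `summarise_group()` are hypothetical, and the standard error is an addition for illustration, not a statistic from the table above.

```r
# Hypothetical toy data; column names are assumptions.
weekend <- data.frame(
  open_weekend = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  stars        = c(3.0, 3.5, 4.0, 3.5, 2.5, 4.5, 3.5, 5.0)
)

# Group size, mean rating and standard error of the mean
summarise_group <- function(x) {
  c(n = length(x), mean = mean(x), se = sd(x) / sqrt(length(x)))
}

# One row per group ("FALSE" = closed on weekends, "TRUE" = open)
weekend_stats <- t(sapply(
  split(weekend$stars, weekend$open_weekend),
  summarise_group
))

round(weekend_stats, 2)
```

A smaller group generally has a larger standard error for the same spread of ratings, which is why a mean based on only 48 restaurants is less stable than one based on 1964.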
3 Appendix
Generative AI prompts used on chatgpt.com:
How can i troubleshoot this “Error in install.packages : Updating loaded packages error” - in r coding
What does echo: false/ true do again. in r code chunks
How to round mean rating values in r code
Explain how to filter data by a value
How to format a histogram in r code
Show how to rangle data set by proportion of mean < 3