Company Background

Caffeine Form is a company creating coffee cups from recycled material. Although they started selling the products on their website last year, the results were not as good as they expected. To better enter the local market, they decided to collaborate with local coffee shops to advertise and sell their coffee mugs.

Customer Questions

The marketing team is trying to come up with the best criteria to choose possible collaborators by investigating the local market. They would like you to answer the following questions to help:

  • What is the most common type of coffee shop in this local market?
  • How do the ranges of reviews differ in delivery options?
  • In each region, which type of coffee shops have the most reviews in total?

Dataset

The dataset contains information about coffee shops in this new market.
It needs to be validated against the description below:

Column Name       Criteria
Region            Character, one of 10 possible regions (A to J) where the coffee shop is located.
Place name        Character, name of the shop.
Place type        Character, the type of coffee shop, one of “Coffee shop”, “Cafe”, “Espresso bar”, and “Others”.
Rating            Numeric, coffee shop rating (on a 5-point scale).
Reviews           Numeric, number of reviews provided for the shop. Remove the rows if the number of reviews is missing.
Price             Character, price category, one of “one dollar”, “two dollar”, “three dollar”.
Delivery option   Binary, describing whether there is a delivery option, either True or False.
Dine in option    Binary, describing whether there is a dine-in option, either True or False. Replace missing values with False.
Takeout option    Binary, describing whether there is a takeout option, either True or False. Replace missing values with False.
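
The criteria above can be checked programmatically before any cleaning. The following is a minimal sketch, assuming the dplyr package is installed. The `toy_df` data and the `check_criteria()` helper are hypothetical names introduced only for illustration, and the lower bound of 0 for Rating is an assumption, since the description only says "5-point scale".

```r
library(dplyr)

# Hypothetical toy data standing in for coffee.csv (not the real dataset)
toy_df <- tibble(
  Region = c("A", "B", "K"),
  `Place type` = c("Cafe", "Coffee shop", "Bakery"),
  Rating = c(4.5, 5.0, 5.6),
  Reviews = c(120, NA, 30)
)

# Hypothetical helper: count the rows that violate each criterion
check_criteria <- function(df) {
  df %>%
    summarise(
      bad_region      = sum(!Region %in% LETTERS[1:10]),
      bad_place_type  = sum(!`Place type` %in%
                              c("Coffee shop", "Cafe", "Espresso bar", "Others")),
      # Assumption: ratings are expected to lie between 0 and 5
      bad_rating      = sum(!is.na(Rating) & (Rating < 0 | Rating > 5)),
      missing_reviews = sum(is.na(Reviews))
    )
}

result <- check_criteria(toy_df)
print(result)
```

A row of zeros would suggest that the checked columns already satisfy the description. Any non-zero count points to the column that needs attention.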

Load the Data

library(tidyverse)  # read_csv(), dplyr verbs, ggplot2, fct_reorder()
library(skimr)      # skim()

coffe_df <- read_csv("coffee.csv", show_col_types = FALSE)

coffe_df %>% 
  select(- `Place name`) %>% 
  head()
## # A tibble: 6 x 8
##   Region `Place type` Rating Reviews Price `Delivery option` `Dine in option`
##   <chr>  <chr>         <dbl>   <dbl> <chr> <lgl>             <lgl>           
## 1 C      Others          4.6     206 $$    FALSE             NA              
## 2 C      Cafe            5        24 $$    FALSE             NA              
## 3 C      Coffee shop     5        11 $$    FALSE             NA              
## 4 C      Coffee shop     4.4     331 $$    FALSE             TRUE            
## 5 C      Coffee shop     5        12 $$    FALSE             TRUE            
## 6 C      Espresso bar    4.6     367 $$    FALSE             TRUE            
## # ... with 1 more variable: `Takeout option` <lgl>

Data dictionary

skim(coffe_df)
Data summary
Name coffe_df
Number of rows 200
Number of columns 9
_______________________
Column type frequency:
character 4
logical 3
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Region 0 1 1 1 0 10 0
Place name 0 1 4 60 0 187 0
Place type 0 1 4 12 0 4 0
Price 0 1 1 3 0 3 0

Variable type: logical

skim_variable n_missing complete_rate mean count
Delivery option 0 1.00 0.17 FAL: 165, TRU: 35
Dine in option 60 0.70 1.00 TRU: 140
Takeout option 56 0.72 1.00 TRU: 144

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Rating 2 0.99 4.66 0.22 3.9 4.6 4.7 4.80 5 ▁▁▃▇▆
Reviews 2 0.99 622.49 1400.90 3.0 47.5 271.5 786.25 17937 ▇▁▁▁▁

Observation

  • The data have 200 observations and 9 columns.
  • The character columns have no missing values.
  • Two logical columns have missing values: Dine in option (60) and Takeout option (56). Following the description above, these will be replaced with FALSE.
  • Both numeric columns have missing values (2 each). Because so few values are missing, we will drop those rows rather than impute them.
  • The Reviews column appears highly right-skewed: the standard deviation (1400.90) is more than twice the mean (622.49), and the maximum (17937) is far above the 75th percentile (786.25). We will investigate the cause later.



Data Cleaning

## Replace missing values in Dine in option and Takeout option with FALSE, then drop the rows with missing values in the numeric columns

coffe_df <- coffe_df %>% 
  mutate(`Dine in option` = ifelse(is.na(`Dine in option`), FALSE, `Dine in option`)) %>% 
  mutate(`Takeout option` = ifelse(is.na(`Takeout option`), FALSE, `Takeout option`)) %>% 
  drop_na()
skim(coffe_df)
Data summary
Name coffe_df
Number of rows 198
Number of columns 9
_______________________
Column type frequency:
character 4
logical 3
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Region 0 1 1 1 0 10 0
Place name 0 1 4 60 0 185 0
Place type 0 1 4 12 0 4 0
Price 0 1 1 3 0 3 0

Variable type: logical

skim_variable n_missing complete_rate mean count
Delivery option 0 1 0.18 FAL: 163, TRU: 35
Dine in option 0 1 0.71 TRU: 140, FAL: 58
Takeout option 0 1 0.73 TRU: 144, FAL: 54

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Rating 0 1 4.66 0.22 3.9 4.6 4.7 4.80 5 ▁▁▃▇▆
Reviews 0 1 622.49 1400.90 3.0 47.5 271.5 786.25 17937 ▇▁▁▁▁
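
The dataset description asks only that rows with missing Reviews be removed, whereas `drop_na()` with no arguments drops a row when any column is missing. A narrower variant can be sketched as follows, assuming the dplyr and tidyr packages. `toy_df` is hypothetical data used only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the real dataset)
toy_df <- tibble(
  Reviews = c(10, NA, 25),
  `Dine in option` = c(TRUE, NA, NA),
  `Takeout option` = c(NA, TRUE, TRUE)
)

cleaned <- toy_df %>%
  # replace_na() accepts a named list of per-column replacement values
  replace_na(list(`Dine in option` = FALSE, `Takeout option` = FALSE)) %>%
  # Drop a row only when Reviews is missing
  drop_na(Reviews)

print(cleaned)
```

In the report's data both approaches should give the same 198 rows if the missing Rating and Reviews values sit in the same two rows, which the row counts above suggest.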

EDA

Now that the data is clean, we will perform EDA to answer the questions.

coffe_df %>% 
  ggplot(aes(Rating)) + 
  geom_histogram(aes(y = after_stat(density)), fill = "#3279a8") +
  geom_density(color = "blue") +
  theme_minimal()

Observation

  • The distribution of Rating is skewed to the left, which means most shops have high ratings and only a few have low ones.
  • The most frequent rating is 4.6.
  • Ratings of 4.2 and below are the least frequent.

coffe_df %>% 
  ggplot(aes(Reviews)) + 
  geom_boxplot(fill = "#3279a8") +
  theme_minimal()

The Reviews column has outliers, and one value is far larger than the rest. Such a huge number of reviews looks suspicious, and we will investigate it later in the analysis.
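
The points that `geom_boxplot()` draws beyond the whiskers follow the 1.5 × IQR rule by default. A minimal sketch of that rule in base R, using hypothetical review counts rather than the real data:

```r
# Hypothetical review counts; the last value mimics an extreme observation
reviews <- c(3, 40, 120, 270, 500, 790, 1200, 17937)

# First and third quartiles (R's default quantile type 7)
q <- quantile(reviews, probs = c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences: values beyond them are flagged as outliers
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

outliers <- reviews[reviews < lower_fence | reviews > upper_fence]
print(outliers)
```

Applied to `coffe_df$Reviews`, the same steps would list every value the boxplot shows as a separate point.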

Correlation Analysis

cor(coffe_df$Reviews, coffe_df$Rating)
## [1] -0.1040226
coffe_df %>% 
  ggplot(aes(Reviews, Rating)) +
  geom_point(position = "jitter") +
  geom_smooth(method = lm) +
  theme_minimal()+
  labs(title = "Rating vs. Reviews") +
  theme(plot.title = element_text(hjust = .5))
## `geom_smooth()` using formula 'y ~ x'

Observation

The two numeric variables show almost no linear relationship: the correlation coefficient (about -0.10) is negative but very close to zero.
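
For reference, `cor()` computes the Pearson coefficient by default. The sketch below reproduces it from its definition on hypothetical vectors (not the real data) to show what the number measures.

```r
x <- c(10, 50, 200, 400, 900)    # hypothetical review counts
y <- c(4.9, 4.8, 4.6, 4.7, 4.5)  # hypothetical ratings

# Pearson r: sum of cross-deviations divided by the
# square root of the product of the summed squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)  # method = "pearson" is the default

print(c(manual = r_manual, builtin = r_builtin))
```

Because Pearson's r is sensitive to extreme values, a single very large review count can pull the coefficient noticeably.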

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarize(cor = cor(Reviews, Rating))
## # A tibble: 4 x 2
##   `Place type`     cor
##   <chr>          <dbl>
## 1 Cafe         -0.286 
## 2 Coffee shop  -0.0854
## 3 Espresso bar -0.356 
## 4 Others       -0.156
coffe_df %>% 
  ggplot(aes(x = Reviews, y = Rating)) +
  geom_point() +
  geom_smooth(method = lm) +
  facet_wrap(~ `Place type`) +
  theme_minimal()+
  labs(title = "Reviews and Rating Relationship per Place Type") +
  theme(plot.title = element_text(hjust = .5))
## `geom_smooth()` using formula 'y ~ x'

Observation

  • Within each place type, the correlation between Reviews and Rating is negative and weak, ranging from about -0.09 (Coffee shop) to about -0.36 (Espresso bar).
  • The highest number of reviews, which looks like an outlier, belongs to a shop of place type Coffee shop.



coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(count = n()) %>% 
  arrange(- count)
## # A tibble: 4 x 2
##   `Place type` count
##   <chr>        <int>
## 1 Coffee shop     96
## 2 Cafe            57
## 3 Others          25
## 4 Espresso bar    20

What is the most common type of coffee shop in this local market?

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(count = n()) %>% 
  mutate( `Place type` = fct_reorder(`Place type`, count, .desc = T )) %>% 
  ggplot(aes(x = `Place type`, y = count, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  geom_text(aes(label = count), vjust = -.5) +
  theme_minimal()

Observation

Coffee shop is the most common place type (96 shops), while Espresso bar is the least common (20 shops).
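
The counts can also be read as shares of the market. A small base R sketch using the counts printed in the table above; the data frame name `counts` and its column names are hypothetical:

```r
# Counts copied from the summary table above
counts <- data.frame(
  place_type = c("Coffee shop", "Cafe", "Others", "Espresso bar"),
  count = c(96, 57, 25, 20)
)

# Share of all shops held by each place type
counts$share <- round(counts$count / sum(counts$count), 3)
print(counts)
```

On these counts, Coffee shop accounts for a little under half of the 198 shops.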



How do the ranges of reviews differ in delivery options?

coffe_df %>% 
  ggplot(aes(x = `Delivery option` , y = Reviews, fill = `Delivery option`)) +
  geom_boxplot(show.legend = F ) +
  theme_minimal()

Observation

  • The range of reviews is wider for shops with a delivery option than for shops without one.
  • The IQR is also wider for shops with a delivery option, and the middle 50% of their review counts sits higher than for shops without one.
  • Both groups have outliers, but the delivery group contains one extremely large review count.
  • That value (17937) is far larger than every other observation. We assume it may be a typographical error and drop it so that it does not distort the analysis that follows.



coffe_df <- coffe_df %>% 
  filter(Reviews != 17937)

coffe_df %>% 
  ggplot(aes(x = `Delivery option` , y = Reviews, fill = `Delivery option`)) +
  geom_boxplot(show.legend = F ) +
  theme_minimal()

Observation

  • With the extreme value dropped, the range of reviews is now wider for shops without a delivery option than for shops with one.
  • Even though the range is wider without delivery, the median (which is close to the mean) is still higher for shops with a delivery option, so those shops have more reviews on average.
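
The comparison read off the boxplots can be backed with numbers. A minimal sketch, assuming the dplyr package and hypothetical toy data (the values below are not from the real dataset):

```r
library(dplyr)

# Hypothetical toy data (not the real dataset)
toy_df <- tibble(
  `Delivery option` = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  Reviews = c(200, 400, 900, 10, 50, 300, 2000)
)

# Spread and centre of Reviews for each delivery group
review_spread <- toy_df %>%
  group_by(`Delivery option`) %>%
  summarise(
    min_reviews    = min(Reviews),
    max_reviews    = max(Reviews),
    range_width    = max(Reviews) - min(Reviews),
    iqr            = IQR(Reviews),
    median_reviews = median(Reviews),
    mean_reviews   = mean(Reviews)
  )

print(review_spread)
```

Running the same pipeline on `coffe_df` would give the exact range, IQR, median, and mean behind the statements above.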



What is the average number of reviews per place type?

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(AvgReviews = mean(Reviews)) %>% 
  arrange( - AvgReviews)
## # A tibble: 4 x 2
##   `Place type` AvgReviews
##   <chr>             <dbl>
## 1 Coffee shop        557.
## 2 Cafe               533.
## 3 Espresso bar       526.
## 4 Others             462.
coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(AvgReviews = mean(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgReviews, .desc = T)) %>% 
  ggplot(aes(`Place type`, AvgReviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
   geom_text(aes(label = round(AvgReviews, 2)), vjust = -.5) +
  labs(
    y = "Average Reviews"
  ) +
  theme_minimal()

Observation

Coffee shop has the highest average number of reviews, while Others has the lowest.



What is the average rating per place type?

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  arrange( - AvgRating)
## # A tibble: 4 x 2
##   `Place type` AvgRating
##   <chr>            <dbl>
## 1 Others            4.72
## 2 Espresso bar      4.69
## 3 Coffee shop       4.68
## 4 Cafe              4.60
coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>%
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(`Place type`, AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
   geom_text(aes(label = round(AvgRating, 2)), vjust = -.5) +
  labs(
    y = "Average Rating"
  ) +
  theme_minimal()

Observation

Others has the highest average rating, while Cafe has the lowest.



In each region, which type of coffee shops have the most reviews in total?

coffe_df %>%
  filter(Region == "A") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region A",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Coffee shop has the highest total number of reviews in Region A.
  • Others has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "B") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region B",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Cafe has the highest total number of reviews in Region B.
  • Others has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "C") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region C",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Cafe has the highest total number of reviews in Region C.
  • Others has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "D") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region D",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Cafe has the highest total number of reviews in Region D.
  • Espresso bar has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "E") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region E",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Coffee shop has the highest total number of reviews in Region E.
  • Others has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "F") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region F",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Coffee shop has the highest total number of reviews in Region F.
  • Espresso bar has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "G") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region G",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Espresso bar has the highest total number of reviews in Region G.
  • Others has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "H") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region H",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Cafe has the highest total number of reviews in Region H.
  • Coffee shop has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "I") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region I",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Coffee shop has the highest total number of reviews in Region I.
  • Cafe has the lowest total number of reviews in this region.


coffe_df %>%
  filter(Region == "J") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region J",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

  • Coffee shop has the highest total number of reviews in Region J.
  • Espresso bar has the lowest total number of reviews in this region.
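
The ten per-region pipelines above repeat the same steps with a different filter. The question can also be answered in one pass by grouping on both Region and Place type and keeping the top row per region. A minimal sketch, assuming the dplyr package (version 1.0.0 or later for `slice_max()`) and hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data (not the real dataset)
toy_df <- tibble(
  Region = c("A", "A", "A", "B", "B", "B"),
  `Place type` = c("Cafe", "Coffee shop", "Coffee shop", "Cafe", "Cafe", "Others"),
  Reviews = c(100, 80, 70, 300, 50, 20)
)

top_by_region <- toy_df %>%
  # Total reviews for every Region / Place type combination
  group_by(Region, `Place type`) %>%
  summarise(totalreviews = sum(Reviews), .groups = "drop") %>%
  # Keep the place type with the largest total in each region
  group_by(Region) %>%
  slice_max(totalreviews, n = 1) %>%
  ungroup()

print(top_by_region)
```

By default `slice_max()` keeps ties, so a region with two equally reviewed place types would return both rows. The ten plots could likewise be drawn in one call by adding `facet_wrap(~ Region)` instead of filtering region by region.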


For each region, what is the average rating per place type?

coffe_df %>%
  filter(Region == "A") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region A",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region A; Others is the highest.


coffe_df %>%
  filter(Region == "B") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region B",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region B; Others is the highest.


coffe_df %>%
  filter(Region == "C") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region C",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region C; Coffee shop is the highest.


coffe_df %>%
  filter(Region == "D") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region D",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region D; Espresso bar is the highest.


coffe_df %>%
  filter(Region == "E") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region E",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region E; Cafe is the highest.


coffe_df %>%
  filter(Region == "F") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region F",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region F; Cafe is the highest.


coffe_df %>%
  filter(Region == "G") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region G",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region G; Others is the highest.


coffe_df %>%
  filter(Region == "H") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region H",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region H; Others is the highest.


coffe_df %>%
  filter(Region == "I") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region I",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region I; Coffee shop is the highest.


coffe_df %>%
  filter(Region == "J") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region J",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

Average ratings are nearly the same across place types in Region J; Espresso bar is the highest.
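
The per-region averages can also be collected into a single table with one row per region and one column per place type, which makes it easy to spot combinations that have no shops at all. A minimal sketch, assuming the dplyr and tidyr packages and hypothetical toy data:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the real dataset)
toy_df <- tibble(
  Region = c("A", "A", "B", "B", "B"),
  `Place type` = c("Cafe", "Others", "Cafe", "Cafe", "Espresso bar"),
  Rating = c(4.5, 4.9, 4.6, 4.8, 4.7)
)

avg_rating_wide <- toy_df %>%
  group_by(Region, `Place type`) %>%
  summarise(AvgRating = mean(Rating), .groups = "drop") %>%
  # One column per place type; missing combinations become NA
  pivot_wider(names_from = `Place type`, values_from = AvgRating)

print(avg_rating_wide)
```

An NA cell in the result means that the region has no shop of that place type, so no average rating exists for it.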


Other key findings

  1. The average number of reviews is similar across place types (roughly 460 to 560).
  2. The average rating of every place type is about 4.6 or higher.
  3. Some place types have no reviews at all in some regions.
  4. Coffee shop has the highest total number of reviews in half of the regions (A, E, F, I, J), followed by Cafe (B, C, D, H) and Espresso bar (G).
  5. In many regions, the Others type has the lowest total number of reviews.
  6. Some place types have no ratings at all in some regions.
  7. In almost every region, the average rating per place type is greater than 4.5.
  8. Across regions, the Others type has some of the highest and some of the lowest average ratings.