Case Study Project - Coffee Shops

Company Background

Caffeine Form is a company creating coffee cups from recycled material. Although they started selling the products on their website last year, the results were not as good as they expected. To better enter the local market, they decided to collaborate with local coffee shops to advertise and sell their coffee mugs.

Customer Questions

The marketing team is trying to come up with the best criteria to choose possible collaborators by investigating the local market. They would like you to answer the following questions to help:

What is the most common type of coffee shop in this local market?
How do the ranges of reviews differ in delivery options?
In each region, which type of coffee shops have the most reviews in total?

Dataset

The dataset contains the information about coffee shops in this new market.
The dataset needs to be validated based on the description below:

Column Name	Criteria
Region	Character, one of 10 possible regions (A to J) where the coffee shop is located.
Place name	Character, name of the shop.
Place type	Character, the type of coffee shop, one of “Coffee shop”, “Cafe”, “Espresso bar”, and “Others”.
Rating	Numeric, coffee shop rating (on a 5 point scale).
Reviews	Numeric, number of reviews provided for the shop. Remove the rows if the number of reviews is missing.
Price	Character, price category, one of “one dollar”, “two dollar”, “three dollar”.
Delivery option	Binary, describing whether there is a delivery option, either True or False.
Dine in option	Binary, describing whether there is a dine-in option, either True or False. Replace missing values with False.
Takeout option	Binary, describing whether there is a takeout option, either True or False. Replace missing values with False.
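
The criteria above can be turned into explicit checks. The following is a minimal sketch, not the validation used in this report: the rows in `toy_shops` are made up for illustration, the flag names are hypothetical, and the 0 to 5 bounds on Rating are an assumption about what "5 point scale" means.

```r
library(dplyr)

# Made-up rows for illustration only; they are not taken from coffee.csv.
toy_shops <- tibble(
  Region = c("A", "B", "K"),
  `Place type` = c("Cafe", "Coffee shop", "Bakery"),
  Rating = c(4.5, 5.0, 5.6),
  Reviews = c(120, NA, 40)
)

# Allowed values taken from the criteria table above.
valid_regions <- LETTERS[1:10]
valid_types <- c("Coffee shop", "Cafe", "Espresso bar", "Others")

# One logical flag per criterion; TRUE marks a row that breaks the rule.
# The 0 to 5 bounds on Rating are an assumption.
validation <- toy_shops %>%
  mutate(
    bad_region = !Region %in% valid_regions,
    bad_type = !`Place type` %in% valid_types,
    bad_rating = is.na(Rating) | Rating < 0 | Rating > 5,
    missing_reviews = is.na(Reviews)
  )

# Number of violations per rule.
violations <- validation %>%
  summarise(across(c(bad_region, bad_type, bad_rating, missing_reviews), sum))
print(violations)
```

Running the same flags on the real data frame would show at a glance which criteria need cleaning before the analysis starts.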

Load the Data

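# tidyverse provides read_csv(), the dplyr verbs, forcats and ggplot2; skimr provides skim()
library(tidyverse)
library(skimr)

coffe_df <- read_csv("coffee.csv", show_col_types = FALSE)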

coffe_df %>% 
  select(- `Place name`) %>% 
  head()

## # A tibble: 6 x 8
##   Region `Place type` Rating Reviews Price `Delivery option` `Dine in option`
##   <chr>  <chr>         <dbl>   <dbl> <chr> <lgl>             <lgl>           
## 1 C      Others          4.6     206 $$    FALSE             NA              
## 2 C      Cafe            5        24 $$    FALSE             NA              
## 3 C      Coffee shop     5        11 $$    FALSE             NA              
## 4 C      Coffee shop     4.4     331 $$    FALSE             TRUE            
## 5 C      Coffee shop     5        12 $$    FALSE             TRUE            
## 6 C      Espresso bar    4.6     367 $$    FALSE             TRUE            
## # ... with 1 more variable: `Takeout option` <lgl>

Data dictionary

skim(coffe_df)

Data summary
Name	coffe_df
Number of rows	200
Number of columns	9
_______________________
Column type frequency:
character	4
logical	3
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
Region	1	1	1	10
Place name	1	4	60	187
Place type	1	4	12	4
Price	1	1	3	3

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
Delivery option	0	1.00	0.17	FAL: 165, TRU: 35
Dine in option	60	0.70	1.00	TRU: 140
Takeout option	56	0.72	1.00	TRU: 144

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Rating	2	0.99	4.66	0.22	3.9	4.6	4.7	4.80	5	▁▁▃▇▆
Reviews	2	0.99	622.49	1400.90	3.0	47.5	271.5	786.25	17937	▇▁▁▁▁

Observation

The data have 200 observations and 9 columns.
The character columns have no missing values.
Among the logical columns, Dine in option and Takeout option have missing values (60 and 56 respectively). Following the description above, these will be replaced with FALSE.
Both numeric columns have missing values (2 each). Because so few rows are affected, we drop them instead of imputing.
The Reviews column appears highly right-skewed: its standard deviation (1400.90) is more than twice its mean (622.49), and its maximum (17937) is far larger than its 75th percentile (786.25). We investigate the cause later.

Data Cleaning

## Replace missing values in Dine in option and Takeout option with FALSE, then drop the rows that still contain missing values (the numeric columns)

coffe_df <- coffe_df %>% 
  mutate(`Dine in option` = ifelse(is.na(`Dine in option`), FALSE, `Dine in option`)) %>% 
  mutate(`Takeout option` = ifelse(is.na(`Takeout option`), FALSE, `Takeout option`)) %>% 
  drop_na()
skim(coffe_df)

Data summary
Name	coffe_df
Number of rows	198
Number of columns	9
_______________________
Column type frequency:
character	4
logical	3
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
Region	1	1	1	10
Place name	1	4	60	185
Place type	1	4	12	4
Price	1	1	3	3

Variable type: logical

skim_variable	complete_rate	mean	count
Delivery option	1	0.18	FAL: 163, TRU: 35
Dine in option	1	0.71	TRU: 140, FAL: 58
Takeout option	1	0.73	TRU: 144, FAL: 54

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Rating	0	1	4.66	0.22	3.9	4.6	4.7	4.80	5	▁▁▃▇▆
Reviews	0	1	622.49	1400.90	3.0	47.5	271.5	786.25	17937	▇▁▁▁▁

EDA

Now that the data are clean, we perform EDA to answer the questions.

coffe_df %>% 
  ggplot(aes(Rating)) + 
  geom_histogram(aes(y = after_stat(density)), fill = "#3279a8") +
  geom_density(color = "blue") +
  theme_minimal()

Observation

The distribution of ratings is skewed to the left, which means most shops have high ratings and only a few have low ones.
The most frequent rating is 4.6.
Ratings of 4.2 and below are the least frequent.

coffe_df %>% 
  ggplot(aes(Reviews)) + 
  geom_boxplot(fill = "#3279a8") +
  theme_minimal()

The Reviews column has outliers, and one value is far larger than the rest. Such a large review count looks suspicious, so we examine it further below.
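
To put a number on "far larger", the usual 1.5 × IQR boxplot rule can be applied to the quartiles that skim() reported for Reviews. This is a small worked sketch using only those reported figures; the variable names are chosen here for illustration.

```r
# Quartiles and maximum of Reviews as reported by skim() above.
q1 <- 47.5
q3 <- 786.25
max_reviews <- 17937

iqr <- q3 - q1                  # 738.75
upper_fence <- q3 + 1.5 * iqr   # 1894.375

# Is the maximum beyond the upper fence, and by what factor?
max_reviews > upper_fence
max_reviews / upper_fence
```

Any value above the upper fence is drawn as an individual point by a boxplot with default settings, so the fence gives a concrete threshold for which review counts count as outliers.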

Correlation Analysis

cor(coffe_df$Reviews, coffe_df$Rating)

## [1] -0.1040226

coffe_df %>% 
  ggplot(aes(Reviews, Rating)) +
  geom_point(position = "jitter") +
  geom_smooth(method = lm) +
  theme_minimal()+
  labs(title = "Rating vs. Reviews") +
  theme(plot.title = element_text(hjust = .5))

## `geom_smooth()` using formula 'y ~ x'

Observation

The two numeric variables have only a very weak negative linear relationship: the correlation coefficient is about -0.10.
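
Because Reviews is heavily skewed, a rank-based coefficient can serve as a cross-check on the Pearson value. The sketch below is not part of the original analysis: it uses made-up vectors, one of which contains an extreme review count, to show how `cor()` with `method = "spearman"` compares with the default.

```r
# Made-up values for illustration only; one shop has an extreme review count.
reviews <- c(10, 25, 40, 80, 150, 300, 600, 18000)
rating  <- c(5.0, 4.9, 4.8, 4.8, 4.7, 4.6, 4.6, 4.7)

pearson  <- cor(reviews, rating)                       # linear association
spearman <- cor(reviews, rating, method = "spearman")  # rank-based association

round(c(pearson = pearson, spearman = spearman), 3)
```

In this toy example the single extreme point pulls the Pearson coefficient toward zero, while the rank-based coefficient still reflects the downward trend in the remaining points.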

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarize(cor = cor(Reviews, Rating))

## # A tibble: 4 x 2
##   `Place type`     cor
##   <chr>          <dbl>
## 1 Cafe         -0.286 
## 2 Coffee shop  -0.0854
## 3 Espresso bar -0.356 
## 4 Others       -0.156

coffe_df %>% 
  ggplot(aes(x = Reviews, y = Rating)) +
  geom_point() +
  geom_smooth(method = lm) +
  facet_wrap(~ `Place type`) +
  theme_minimal()+
  labs(title = "Reviews and Rating Relationship per Place Type") +
  theme(plot.title = element_text(hjust = .5))

## `geom_smooth()` using formula 'y ~ x'

Observation

Within each place type, the correlation between Reviews and Rating is negative but weak, ranging from about -0.09 for Coffee shop to about -0.36 for Espresso bar.
The largest review count, which looks like an outlier, belongs to the Coffee shop place type.

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(count = n()) %>% 
  arrange(- count)

## # A tibble: 4 x 2
##   `Place type` count
##   <chr>        <int>
## 1 Coffee shop     96
## 2 Cafe            57
## 3 Others          25
## 4 Espresso bar    20

What is the most common type of coffee shop in this local market?

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(count = n()) %>% 
  mutate( `Place type` = fct_reorder(`Place type`, count, .desc = T )) %>% 
  ggplot(aes(x = `Place type`, y = count, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  geom_text(aes(label = count), vjust = -.5) +
  theme_minimal()

Observation

Coffee shop is the most common place type (96 shops), while Espresso bar is the least common (20 shops).
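
The counts can also be expressed as shares of the market. The following short sketch uses only the counts printed in the table above; `place_counts` and `share_pct` are names introduced here for illustration.

```r
# Counts per place type as printed in the table above.
place_counts <- c("Coffee shop" = 96, "Cafe" = 57, "Others" = 25, "Espresso bar" = 20)

# Percentage share of each place type, rounded to one decimal place.
share_pct <- round(100 * place_counts / sum(place_counts), 1)
share_pct
```

Coffee shop accounts for a little under half of the 198 shops, which is a more direct way to state how dominant the category is.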

How do the ranges of reviews differ in delivery options?

coffe_df %>% 
  ggplot(aes(x = `Delivery option` , y = Reviews, fill = `Delivery option`)) +
  geom_boxplot(show.legend = F ) +
  theme_minimal()

Observation

The range of reviews is wider for shops with a delivery option than for shops without one.
The IQR is also wider for shops with a delivery option, and the middle 50% of their review counts sits higher than for shops without delivery.
Both groups have outliers, but the group with a delivery option contains one extremely large review count.
That value is far larger than all the others. We assume it may be a typographical error and drop it so that the analysis that follows is not distorted.

coffe_df <- coffe_df %>% 
  filter(Reviews != 17937)

coffe_df %>% 
  ggplot(aes(x = `Delivery option` , y = Reviews, fill = `Delivery option`)) +
  geom_boxplot(show.legend = F ) +
  theme_minimal()

Observation

With the extreme value dropped, the range of reviews is now wider for shops without a delivery option than for shops with one.
Even though the range is wider without delivery, the median for shops with a delivery option is still higher (and close to the mean), so shops with a delivery option tend to have more reviews than shops without one.
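
The comparison read off the boxplots can be backed by numbers. This is a minimal sketch on made-up rows, not output from the report's data; the column names in the summary are hypothetical.

```r
library(dplyr)

# Made-up rows for illustration only; they are not taken from coffee.csv.
toy_shops <- tibble(
  `Delivery option` = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  Reviews = c(200, 450, 900, 10, 60, 300, 1500)
)

# Range, median and IQR of Reviews for each delivery option.
spread_by_delivery <- toy_shops %>%
  group_by(`Delivery option`) %>%
  summarise(
    shops = n(),
    min_reviews = min(Reviews),
    median_reviews = median(Reviews),
    max_reviews = max(Reviews),
    range_reviews = max(Reviews) - min(Reviews),
    iqr_reviews = IQR(Reviews)
  )
print(spread_by_delivery)
```

Applied to `coffe_df`, the same summary would give the exact ranges and medians that the observations above describe qualitatively.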

What is the average number of reviews per place type?

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(AvgReviews = mean(Reviews)) %>% 
  arrange( - AvgReviews)

## # A tibble: 4 x 2
##   `Place type` AvgReviews
##   <chr>             <dbl>
## 1 Coffee shop        557.
## 2 Cafe               533.
## 3 Espresso bar       526.
## 4 Others             462.

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(AvgReviews = mean(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgReviews, .desc = T)) %>% 
  ggplot(aes(`Place type`, AvgReviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  geom_text(aes(label = round(AvgReviews, 2)), vjust = -.5) +
  labs(
    y = "Average Reviews"
  ) +
  theme_minimal()

Observation

Coffee shop has the highest average number of reviews, while Others has the lowest.

What is the average rating per place type?

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  arrange( - AvgRating)

## # A tibble: 4 x 2
##   `Place type` AvgRating
##   <chr>            <dbl>
## 1 Others            4.72
## 2 Espresso bar      4.69
## 3 Coffee shop       4.68
## 4 Cafe              4.60

coffe_df %>% 
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>%
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(`Place type`, AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.5) +
  labs(
    y = "Average Rating"
  ) +
  theme_minimal()

Observation

Others has the highest average rating, while Cafe has the lowest.

In each region, which type of coffee shops have the most reviews in total?

coffe_df %>%
  filter(Region == "A") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region A",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region A, Coffee shop has the highest total number of reviews.
Others has the lowest total in this region.

coffe_df %>%
  filter(Region == "B") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region B",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region B, Cafe has the highest total number of reviews.
Others has the lowest total in this region.

coffe_df %>%
  filter(Region == "C") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region C",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region C, Cafe has the highest total number of reviews.
Others has the lowest total in this region.

coffe_df %>%
  filter(Region == "D") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region D",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region D, Cafe has the highest total number of reviews.
Espresso bar has the lowest total in this region.

coffe_df %>%
  filter(Region == "E") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region E",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region E, Coffee shop has the highest total number of reviews.
Others has the lowest total in this region.

coffe_df %>%
  filter(Region == "F") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region F",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region F, Coffee shop has the highest total number of reviews.
Espresso bar has the lowest total in this region.

coffe_df %>%
  filter(Region == "G") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region G",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region G, Espresso bar has the highest total number of reviews.
Others has the lowest total in this region.

coffe_df %>%
  filter(Region == "H") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region H",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region H, Cafe has the highest total number of reviews.
Coffee shop has the lowest total in this region.

coffe_df %>%
  filter(Region == "I") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region I",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region I, Coffee shop has the highest total number of reviews.
Cafe has the lowest total in this region.

coffe_df %>%
  filter(Region == "J") %>%
  group_by(`Place type`) %>% 
  summarise(totalreviews = sum(Reviews)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, totalreviews, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = totalreviews, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = totalreviews), vjust = -.3) +
  labs(
    title = "Region J",
    y = "Total Reviews"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region J, Coffee shop has the highest total number of reviews.
Espresso bar has the lowest total in this region.
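
The ten per-region charts answer the question one region at a time. As a compact alternative, the top place type of every region can be computed in a single grouped summary. This is a sketch on made-up rows rather than the report's data, and `top_type_by_region` is a name introduced here.

```r
library(dplyr)

# Made-up rows for illustration only; they are not taken from coffee.csv.
toy_shops <- tibble(
  Region = c("A", "A", "A", "B", "B", "B"),
  `Place type` = c("Cafe", "Coffee shop", "Coffee shop", "Cafe", "Cafe", "Others"),
  Reviews = c(100, 250, 50, 400, 30, 90)
)

# Total reviews per region and place type, then the largest total per region.
top_type_by_region <- toy_shops %>%
  group_by(Region, `Place type`) %>%
  summarise(totalreviews = sum(Reviews), .groups = "drop") %>%
  group_by(Region) %>%
  slice_max(totalreviews, n = 1) %>%
  ungroup()
print(top_type_by_region)
```

On the real data this would yield one row per region (more if totals tie), which makes the regional winners easy to compare side by side.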

For each region, what is the average rating per place type?

coffe_df %>%
  filter(Region == "A") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region A",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region A, average ratings are almost the same across place types; Others is the highest.

coffe_df %>%
  filter(Region == "B") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region B",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region B, average ratings are almost the same across place types; Others is the highest.

coffe_df %>%
  filter(Region == "C") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region C",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region C, average ratings are almost the same across place types; Coffee shop is the highest.

coffe_df %>%
  filter(Region == "D") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region D",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region D, average ratings are almost the same across place types; Espresso bar is the highest.

coffe_df %>%
  filter(Region == "E") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region E",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region E, average ratings are almost the same across place types; Cafe is the highest.

coffe_df %>%
  filter(Region == "F") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region F",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region F, average ratings are almost the same across place types; Cafe is the highest.

coffe_df %>%
  filter(Region == "G") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region G",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region G, average ratings are almost the same across place types; Others is the highest.

coffe_df %>%
  filter(Region == "H") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region H",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region H, average ratings are almost the same across place types; Others is the highest.

coffe_df %>%
  filter(Region == "I") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region I",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region I, average ratings are almost the same across place types; Coffee shop is the highest.

coffe_df %>%
  filter(Region == "J") %>%
  group_by(`Place type`) %>% 
  summarise(AvgRating = mean(Rating)) %>% 
  mutate(`Place type` = fct_reorder(`Place type`, AvgRating, .desc = T)) %>% 
  ggplot(aes(x = `Place type`, y = AvgRating, fill = `Place type`)) +
  geom_col(show.legend = F, width = .7) +
  scale_fill_viridis_d() +
  geom_text(aes(label = round(AvgRating, 2)), vjust = -.3) +
  labs(
    title = "Region J",
    y = "Average Rating"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))

Observation

In Region J, average ratings are almost the same across place types; Espresso bar is the highest.
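
The per-region averages can also be laid out as one wide table, with regions as rows and place types as columns, so that combinations with no shops show up as NA. The sketch below uses made-up rows and is not output from the report's data; `avg_rating_wide` is a name introduced here.

```r
library(dplyr)
library(tidyr)

# Made-up rows for illustration only; they are not taken from coffee.csv.
toy_shops <- tibble(
  Region = c("A", "A", "A", "B", "B"),
  `Place type` = c("Cafe", "Cafe", "Others", "Coffee shop", "Others"),
  Rating = c(4.4, 4.8, 4.9, 4.5, 4.7)
)

# Average rating per region and place type, spread into one column per type.
# Region/type combinations that do not occur become NA.
avg_rating_wide <- toy_shops %>%
  group_by(Region, `Place type`) %>%
  summarise(AvgRating = mean(Rating), .groups = "drop") %>%
  pivot_wider(names_from = `Place type`, values_from = AvgRating)
print(avg_rating_wide)
```

The NA cells make it explicit when a place type is absent from a region, which a bar chart only shows as a missing bar.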

Other key finding

The average number of reviews per place type is broadly similar (roughly 460 to 560).
The average rating per place type is about 4.6 or higher for every type.
Some place types have no reviews at all in some regions.
Across regions, Coffee shop most often has the highest total number of reviews, followed by Cafe and Espresso bar.
In most regions, Others has a low total number of reviews.
Some place types have no rating at all in some regions.
In almost every region, the average rating per place type is greater than 4.5.
Others appears among both the highest and the lowest average ratings, depending on the region.

John Lloyd Espada

9/3/2022
