Load in Data

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(htmltools)
packageVersion("htmltools")
## [1] '0.5.8.1'
library(readxl)
library(tibble)
library(ggplot2)

# Load in data
coffee <- read_csv("~/Downloads/coffee_analysis.csv")
## Rows: 2095 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): name, roaster, roast, loc_country, origin_1, origin_2, review_date...
## dbl  (2): 100g_USD, rating
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Show column types
spec(coffee) # double = numeric
## cols(
##   name = col_character(),
##   roaster = col_character(),
##   roast = col_character(),
##   loc_country = col_character(),
##   origin_1 = col_character(),
##   origin_2 = col_character(),
##   `100g_USD` = col_double(),
##   rating = col_double(),
##   review_date = col_character(),
##   desc_1 = col_character(),
##   desc_2 = col_character(),
##   desc_3 = col_character()
## )

Converting the categorical variables (roast, loc_country, origin_1) to factors will make statistical analysis and visualization far easier. Factors are simply another way to represent categorical variables and will make my life easier.

# Convert key categorical columns to factors
coffee <- coffee %>%
  mutate(
    roast = as.factor(roast),
    loc_country = as.factor(loc_country),
    origin_1 = as.factor(origin_1)
  )

Grouping Data

1. Roast Level –> Price&Rating

# Using group_by to group roast by price&rating
# The 'roast_summary' DataFrame will provide insights into the price, rating, and number of reviews (n) each type of coffee roast has received
roast_summary <- coffee %>%
  group_by(roast) %>%
  summarise(
    avg_price = mean(`100g_USD`, na.rm = TRUE),
    avg_rating = mean(rating, na.rm = TRUE),
    n = n()
  ) %>%
  arrange(n)

roast_summary
## # A tibble: 6 × 4
##   roast        avg_price avg_rating     n
##   <fct>            <dbl>      <dbl> <int>
## 1 Dark              4.46       88.2     5
## 2 <NA>             42.2        93.5    15
## 3 Medium-Dark       9.51       91.9    39
## 4 Medium            7.06       92.3   259
## 5 Light            11.4        93.5   287
## 6 Medium-Light      8.99       93.2  1490
  • Low Probability Group: ‘Dark’

The ‘n’ represents the total number of observations by each row. For Dark roast, n=5, which is the lowest number of observations by row, meaning it will have the lowest probability of being selected if we were to randomly select a row from the dataset. In this dataset, dark roast coffee has the average lowest price, rating, and number of reviews. These insights tell us that dark roasts are far less popular and enjoyed than other coffee roasts.

# Finding the roast with the lowest probability to be chosen
roast_summary %>% slice(1)
## # A tibble: 1 × 4
##   roast avg_price avg_rating     n
##   <fct>     <dbl>      <dbl> <int>
## 1 Dark       4.46       88.2     5

The ‘roast_summary’ below shows the difference in random probability to be chosen if we were to select one row from the dataset, highlighting that ‘Dark’ roast is far less likely to be selected than the other coffee roasts. ‘Medium-Light’ coffee roasts have by far the greatest chance of being selected, as it has received far and away the most reviews in this dataset.

roast_summary %>%
  mutate(probability = n / sum(n))
## # A tibble: 6 × 5
##   roast        avg_price avg_rating     n probability
##   <fct>            <dbl>      <dbl> <int>       <dbl>
## 1 Dark              4.46       88.2     5     0.00239
## 2 <NA>             42.2        93.5    15     0.00716
## 3 Medium-Dark       9.51       91.9    39     0.0186 
## 4 Medium            7.06       92.3   259     0.124  
## 5 Light            11.4        93.5   287     0.137  
## 6 Medium-Light      8.99       93.2  1490     0.711
ggplot(roast_summary, aes(x = roast, y = avg_rating)) +
  geom_col() +
  labs(title = "Average Rating by Roast Level",
       x = "Roast",
       y = "Average Rating")

Conclusion - Medium-Light coffee has received far and away the most reviews in this coffee reviews dataset. Additionally, Light to Medium-Light coffee are the two most popular coffee roasts, being the only blends that averaged ratings over 93. Testable Hypothesis - Coffee roasts with fewer reviews are not because consumers do not like dark roast coffee, but because it is more rare to find.

2. Origin –> Rating

I am aiming to analyze which regions/countries grow the best coffee in the world. I will gain insights through finding with places of origin receive the highest ratings.

origin_summary <- coffee %>%
  group_by(origin_1) %>%
  summarise(
    avg_rating = mean(rating, na.rm = TRUE),
    count = n()  # Create a count column to filter by
  ) %>%
  filter(count >= 10) %>%  # Only keep groups with at least 10 occurrences
  ungroup()

origin_summary
## # A tibble: 40 × 3
##    origin_1                  avg_rating count
##    <fct>                          <dbl> <int>
##  1 Acatenango Growing Region       93.5    14
##  2 Aceh Province                   93.8    12
##  3 Agaro Gera                      93.7    10
##  4 Alishan                         93.5    10
##  5 Antigua                         90.5    11
##  6 Antigua Growing Region          91.9    25
##  7 Bench-Maji Zone                 94.1    11
##  8 Boquete Growing Region          94.6    39
##  9 Brazil                          91.8    27
## 10 Caicedonia                      94.3    11
## # ℹ 30 more rows

I chose to filter out places of origin that received less than 10 reviews to eliminate higher variance due to small sample sizes.

origin_summary %>% slice(1)
## # A tibble: 1 × 3
##   origin_1                  avg_rating count
##   <fct>                          <dbl> <int>
## 1 Acatenango Growing Region       93.5    14
# Count replaces 'n' in the mutate function since I created a count column to filter by to only keep places of origin with at least 10 reviews in the 'group_by' function; 'count=n()'
origin_summary %>%
  mutate(probability = count / sum(count))
## # A tibble: 40 × 4
##    origin_1                  avg_rating count probability
##    <fct>                          <dbl> <int>       <dbl>
##  1 Acatenango Growing Region       93.5    14     0.0139 
##  2 Aceh Province                   93.8    12     0.0119 
##  3 Agaro Gera                      93.7    10     0.00990
##  4 Alishan                         93.5    10     0.00990
##  5 Antigua                         90.5    11     0.0109 
##  6 Antigua Growing Region          91.9    25     0.0248 
##  7 Bench-Maji Zone                 94.1    11     0.0109 
##  8 Boquete Growing Region          94.6    39     0.0386 
##  9 Brazil                          91.8    27     0.0267 
## 10 Caicedonia                      94.3    11     0.0109 
## # ℹ 30 more rows
ggplot(origin_summary %>% filter(count >= 15),
       aes(x = reorder(origin_1, avg_rating), y = avg_rating)) +
  geom_col() +
  coord_flip() +
  labs(title = "Average Rating by Origin (min 15 coffees)",
       x = "Origin",
       y = "Average Rating")

Conclusion - Certain countries of origin that stand out, receiving excellent reviews for coffee grown in their homeland include Costa Rica, Columbia, Guatemala, and El Salvador. These countries located in Central and South America all received an average rating of at least 92/100 for coffee grown in their country, highlighting Central and South America as one of the best coffee growing regions in the world!

Testable Hypothesis - There is a weak correlation between the average rating being better as a result of more reviews for a coffee.

3. Price Buckets(cut) –> Rating

I am creating categorical price ranges based on coffee ratings.

# Using cut() to break coffee prices per 100g_USD into 4 price bins
coffee <- coffee %>%
  mutate(
    price_bucket = cut(
      `100g_USD`,
      breaks = c(-Inf, 4.99, 9.99, 19.99, Inf),
      labels = c("$0-5", "$5-10", "$10-20", "$20<="),
      right = TRUE
    )
  )

# Grouped summary by custom buckets
price_summary <- coffee %>%
  group_by(price_bucket) %>%
  summarise(
    avg_rating = mean(rating, na.rm = TRUE),
    avg_price = mean(`100g_USD`, na.rm = TRUE),
    n = n()
  ) %>%
  arrange(n)

price_summary
## # A tibble: 4 × 4
##   price_bucket avg_rating avg_price     n
##   <fct>             <dbl>     <dbl> <int>
## 1 $20<=              94.3     37.2    177
## 2 $10-20             93.5     14.0    258
## 3 $0-5               92.4      4.12   586
## 4 $5-10              93.2      6.43  1074
price_summary %>% slice(1)
## # A tibble: 1 × 4
##   price_bucket avg_rating avg_price     n
##   <fct>             <dbl>     <dbl> <int>
## 1 $20<=              94.3      37.2   177

The correlation between higher average coffee price and higher average rating makes a ton of sense, as well as the indication that higher price is a result of rarer coffee grounds, increasing their value as well.

# Visualizing average coffee ratings by price per 100g USD
ggplot(price_summary,
       aes(x = price_bucket, y = avg_rating, fill = price_bucket)) +
  geom_col() +
  scale_fill_manual(values = c(
    "$0-5" = "lightgray",
    "$5-10" = "skyblue",
    "$10-20" = "lightgreen",
    "$20<=" = "green"
  )) +
  labs(
    title = "Average Rating by Custom Price Bucket",
    x = "Price Bucket (USD per 100g)",
    y = "Average Rating",
    fill = "Price Bucket"
  )

Conclusion - Extremely high prices for coffee worldwide that are priced 20 USD or higher per 100g are rather rare, as it seems that most coffee is reasonably priced, with 1,074 out of 2,095 total coffees in the dataset being priced in between 5-10 USD per 100g. With that said, the higher priced coffees do yield the highest average rating of a 94.3/100, indicating a that the highest priced coffees return great value and quality. Testable Hypothesis - Coffees that fall under the highest price bucket average ratings that are significantly higher than coffees priced in the lowest price bucket, reflecting the incredible quality, value, and rarity that create the steep price of certain coffees.

Roast X Country Combination

All Existing Combos

Building data frame with all combinations for two categorical variables.

# Find all existing combinations for 'roast' and 'loc_country'
# Cleaning data by creating coffee_clean to remove NA roasts
coffee_clean <- coffee %>%
  filter(!is.na(roast))

# Count 'roast' X 'loc_country
# Adding additional columns for visualization purposes
roast_country_counts <- coffee_clean %>%
  count(loc_country, roast) %>%
  arrange(loc_country, roast) %>% # Order by country, then roast
  group_by(loc_country) %>%
  mutate(
    dominant_roast = roast[which.max(n)],
    is_dominant = roast == dominant_roast
  ) %>%
  ungroup()

roast_country_counts
## # A tibble: 41 × 5
##    loc_country roast            n dominant_roast is_dominant
##    <fct>       <fct>        <int> <fct>          <lgl>      
##  1 Australia   Medium-Light     2 Medium-Light   TRUE       
##  2 Belgium     Medium           1 Medium         TRUE       
##  3 Canada      Light            1 Medium-Light   FALSE      
##  4 Canada      Medium           7 Medium-Light   FALSE      
##  5 Canada      Medium-Light    23 Medium-Light   TRUE       
##  6 China       Light            2 Light          TRUE       
##  7 China       Medium-Light     2 Light          FALSE      
##  8 England     Medium-Light     2 Medium-Light   TRUE       
##  9 Guatemala   Light            5 Medium-Light   FALSE      
## 10 Guatemala   Medium           5 Medium-Light   FALSE      
## # ℹ 31 more rows

All Possible Combos

all_combos <- expand.grid(
  roast = unique(coffee_clean$roast),
  loc_country = unique(coffee_clean$loc_country)
)

combo_check <- all_combos %>%
  left_join(roast_country_counts, by = c("roast", "loc_country")) %>%
  replace_na(list(n = 0))

combo_check
##           roast             loc_country   n dominant_roast is_dominant
## 1  Medium-Light               Hong Kong  16   Medium-Light        TRUE
## 2        Medium               Hong Kong   2   Medium-Light       FALSE
## 3         Light               Hong Kong   2   Medium-Light       FALSE
## 4   Medium-Dark               Hong Kong   0           <NA>          NA
## 5          Dark               Hong Kong   0           <NA>          NA
## 6  Medium-Light           United States 995   Medium-Light        TRUE
## 7        Medium           United States 151   Medium-Light       FALSE
## 8         Light           United States 166   Medium-Light       FALSE
## 9   Medium-Dark           United States  15   Medium-Light       FALSE
## 10         Dark           United States   4   Medium-Light       FALSE
## 11 Medium-Light                  Canada  23   Medium-Light        TRUE
## 12       Medium                  Canada   7   Medium-Light       FALSE
## 13        Light                  Canada   1   Medium-Light       FALSE
## 14  Medium-Dark                  Canada   0           <NA>          NA
## 15         Dark                  Canada   0           <NA>          NA
## 16 Medium-Light                  Taiwan 355   Medium-Light        TRUE
## 17       Medium                  Taiwan  73   Medium-Light       FALSE
## 18        Light                  Taiwan  99   Medium-Light       FALSE
## 19  Medium-Dark                  Taiwan  21   Medium-Light       FALSE
## 20         Dark                  Taiwan   1   Medium-Light       FALSE
## 21 Medium-Light                 Hawai'i  66   Medium-Light        TRUE
## 22       Medium                 Hawai'i  18   Medium-Light       FALSE
## 23        Light                 Hawai'i   2   Medium-Light       FALSE
## 24  Medium-Dark                 Hawai'i   1   Medium-Light       FALSE
## 25         Dark                 Hawai'i   0           <NA>          NA
## 26 Medium-Light               Australia   2   Medium-Light        TRUE
## 27       Medium               Australia   0           <NA>          NA
## 28        Light               Australia   0           <NA>          NA
## 29  Medium-Dark               Australia   0           <NA>          NA
## 30         Dark               Australia   0           <NA>          NA
## 31 Medium-Light                  Uganda   0           <NA>          NA
## 32       Medium                  Uganda   1         Medium        TRUE
## 33        Light                  Uganda   0           <NA>          NA
## 34  Medium-Dark                  Uganda   0           <NA>          NA
## 35         Dark                  Uganda   0           <NA>          NA
## 36 Medium-Light                  Mexico   1         Medium       FALSE
## 37       Medium                  Mexico   1         Medium        TRUE
## 38        Light                  Mexico   0           <NA>          NA
## 39  Medium-Dark                  Mexico   0           <NA>          NA
## 40         Dark                  Mexico   0           <NA>          NA
## 41 Medium-Light                 Belgium   0           <NA>          NA
## 42       Medium                 Belgium   1         Medium        TRUE
## 43        Light                 Belgium   0           <NA>          NA
## 44  Medium-Dark                 Belgium   0           <NA>          NA
## 45         Dark                 Belgium   0           <NA>          NA
## 46 Medium-Light United States And Floyd   1   Medium-Light        TRUE
## 47       Medium United States And Floyd   0           <NA>          NA
## 48        Light United States And Floyd   0           <NA>          NA
## 49  Medium-Dark United States And Floyd   0           <NA>          NA
## 50         Dark United States And Floyd   0           <NA>          NA
## 51 Medium-Light               Guatemala  18   Medium-Light        TRUE
## 52       Medium               Guatemala   5   Medium-Light       FALSE
## 53        Light               Guatemala   5   Medium-Light       FALSE
## 54  Medium-Dark               Guatemala   1   Medium-Light       FALSE
## 55         Dark               Guatemala   0           <NA>          NA
## 56 Medium-Light                   Japan   2          Light       FALSE
## 57       Medium                   Japan   0           <NA>          NA
## 58        Light                   Japan   9          Light        TRUE
## 59  Medium-Dark                   Japan   1          Light       FALSE
## 60         Dark                   Japan   0           <NA>          NA
## 61 Medium-Light                    Peru   2   Medium-Light        TRUE
## 62       Medium                    Peru   0           <NA>          NA
## 63        Light                    Peru   0           <NA>          NA
## 64  Medium-Dark                    Peru   0           <NA>          NA
## 65         Dark                    Peru   0           <NA>          NA
## 66 Medium-Light                Honduras   1   Medium-Light        TRUE
## 67       Medium                Honduras   0           <NA>          NA
## 68        Light                Honduras   0           <NA>          NA
## 69  Medium-Dark                Honduras   0           <NA>          NA
## 70         Dark                Honduras   0           <NA>          NA
## 71 Medium-Light                 England   2   Medium-Light        TRUE
## 72       Medium                 England   0           <NA>          NA
## 73        Light                 England   0           <NA>          NA
## 74  Medium-Dark                 England   0           <NA>          NA
## 75         Dark                 England   0           <NA>          NA
## 76 Medium-Light                   China   2          Light       FALSE
## 77       Medium                   China   0           <NA>          NA
## 78        Light                   China   2          Light        TRUE
## 79  Medium-Dark                   China   0           <NA>          NA
## 80         Dark                   China   0           <NA>          NA
## 81 Medium-Light                   Kenya   1   Medium-Light        TRUE
## 82       Medium                   Kenya   0           <NA>          NA
## 83        Light                   Kenya   0           <NA>          NA
## 84  Medium-Dark                   Kenya   0           <NA>          NA
## 85         Dark                   Kenya   0           <NA>          NA
## 86 Medium-Light                Malaysia   3   Medium-Light        TRUE
## 87       Medium                Malaysia   0           <NA>          NA
## 88        Light                Malaysia   0           <NA>          NA
## 89  Medium-Dark                Malaysia   0           <NA>          NA
## 90         Dark                Malaysia   0           <NA>          NA
## 91 Medium-Light              New Taiwan   0           <NA>          NA
## 92       Medium              New Taiwan   0           <NA>          NA
## 93        Light              New Taiwan   1          Light        TRUE
## 94  Medium-Dark              New Taiwan   0           <NA>          NA
## 95         Dark              New Taiwan   0           <NA>          NA

Missing Combos

missing_combos <- combo_check %>% filter(n == 0)
missing_combos
##           roast             loc_country n dominant_roast is_dominant
## 1   Medium-Dark               Hong Kong 0           <NA>          NA
## 2          Dark               Hong Kong 0           <NA>          NA
## 3   Medium-Dark                  Canada 0           <NA>          NA
## 4          Dark                  Canada 0           <NA>          NA
## 5          Dark                 Hawai'i 0           <NA>          NA
## 6        Medium               Australia 0           <NA>          NA
## 7         Light               Australia 0           <NA>          NA
## 8   Medium-Dark               Australia 0           <NA>          NA
## 9          Dark               Australia 0           <NA>          NA
## 10 Medium-Light                  Uganda 0           <NA>          NA
## 11        Light                  Uganda 0           <NA>          NA
## 12  Medium-Dark                  Uganda 0           <NA>          NA
## 13         Dark                  Uganda 0           <NA>          NA
## 14        Light                  Mexico 0           <NA>          NA
## 15  Medium-Dark                  Mexico 0           <NA>          NA
## 16         Dark                  Mexico 0           <NA>          NA
## 17 Medium-Light                 Belgium 0           <NA>          NA
## 18        Light                 Belgium 0           <NA>          NA
## 19  Medium-Dark                 Belgium 0           <NA>          NA
## 20         Dark                 Belgium 0           <NA>          NA
## 21       Medium United States And Floyd 0           <NA>          NA
## 22        Light United States And Floyd 0           <NA>          NA
## 23  Medium-Dark United States And Floyd 0           <NA>          NA
## 24         Dark United States And Floyd 0           <NA>          NA
## 25         Dark               Guatemala 0           <NA>          NA
## 26       Medium                   Japan 0           <NA>          NA
## 27         Dark                   Japan 0           <NA>          NA
## 28       Medium                    Peru 0           <NA>          NA
## 29        Light                    Peru 0           <NA>          NA
## 30  Medium-Dark                    Peru 0           <NA>          NA
## 31         Dark                    Peru 0           <NA>          NA
## 32       Medium                Honduras 0           <NA>          NA
## 33        Light                Honduras 0           <NA>          NA
## 34  Medium-Dark                Honduras 0           <NA>          NA
## 35         Dark                Honduras 0           <NA>          NA
## 36       Medium                 England 0           <NA>          NA
## 37        Light                 England 0           <NA>          NA
## 38  Medium-Dark                 England 0           <NA>          NA
## 39         Dark                 England 0           <NA>          NA
## 40       Medium                   China 0           <NA>          NA
## 41  Medium-Dark                   China 0           <NA>          NA
## 42         Dark                   China 0           <NA>          NA
## 43       Medium                   Kenya 0           <NA>          NA
## 44        Light                   Kenya 0           <NA>          NA
## 45  Medium-Dark                   Kenya 0           <NA>          NA
## 46         Dark                   Kenya 0           <NA>          NA
## 47       Medium                Malaysia 0           <NA>          NA
## 48        Light                Malaysia 0           <NA>          NA
## 49  Medium-Dark                Malaysia 0           <NA>          NA
## 50         Dark                Malaysia 0           <NA>          NA
## 51 Medium-Light              New Taiwan 0           <NA>          NA
## 52       Medium              New Taiwan 0           <NA>          NA
## 53  Medium-Dark              New Taiwan 0           <NA>          NA
## 54         Dark              New Taiwan 0           <NA>          NA

Analyzing most and least common Roast X Country combinations

roast_country_counts %>% slice_max(n, n = 5)
## # A tibble: 5 × 5
##   loc_country   roast            n dominant_roast is_dominant
##   <fct>         <fct>        <int> <fct>          <lgl>      
## 1 United States Medium-Light   995 Medium-Light   TRUE       
## 2 Taiwan        Medium-Light   355 Medium-Light   TRUE       
## 3 United States Light          166 Medium-Light   FALSE      
## 4 United States Medium         151 Medium-Light   FALSE      
## 5 Taiwan        Light           99 Medium-Light   FALSE
roast_country_counts %>% slice_min(n, n = 5)
## # A tibble: 13 × 5
##    loc_country             roast            n dominant_roast is_dominant
##    <fct>                   <fct>        <int> <fct>          <lgl>      
##  1 Belgium                 Medium           1 Medium         TRUE       
##  2 Canada                  Light            1 Medium-Light   FALSE      
##  3 Guatemala               Medium-Dark      1 Medium-Light   FALSE      
##  4 Hawai'i                 Medium-Dark      1 Medium-Light   FALSE      
##  5 Honduras                Medium-Light     1 Medium-Light   TRUE       
##  6 Japan                   Medium-Dark      1 Light          FALSE      
##  7 Kenya                   Medium-Light     1 Medium-Light   TRUE       
##  8 Mexico                  Medium           1 Medium         TRUE       
##  9 Mexico                  Medium-Light     1 Medium         FALSE      
## 10 New Taiwan              Light            1 Light          TRUE       
## 11 Taiwan                  Dark             1 Medium-Light   FALSE      
## 12 Uganda                  Medium           1 Medium         TRUE       
## 13 United States And Floyd Medium-Light     1 Medium-Light   TRUE

It is evident that the United States and Taiwan have received the most amount of coffee reviews, and that medium-light roast is far and away the most popular roast that consumers drink. The number of observations that combine the United States with medium-light roasts and Taiwan with medium-light roasts makes up roughly half of the total observations in the entire data set!

Coffee X Loc_Country Visualization

ggplot(roast_country_counts,
       aes(x = roast,
           y = loc_country,
           size = n,
           color = dominant_roast)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Roast × Roaster Country",
    x = "Roast",
    y = "Roaster Country",
    size = "Count",
    color = "Most Common Roast"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

This visualizations displays type of roast and roaster country combinations, highlighting the most common roast for each country. Medium-dark and dark roasts both did not make the visual as the most common roast for any country at all, while medium-light roast dominated this chart, as light blue dots are prevalent. The United States and Taiwan are the leading coffee roasters in the world based on this dataset, as they dominate the number of observations in this data set combining roast and location country, which is represented by their dot sizes, with those two countries having the largest dots. While the United States and Taiwan dominate across all roasts in roasting coffee, many other countries specialize in one or two roast levels, with medium-light receiving worldwide popularity.