Zhuo Ding
2025-11-27
Coffee is a critical export crop for many developing nations, and its market value depends strongly on its overall rating. The total cup score varies based on geographic, environmental, and post-harvest factors such as altitude, species/variety, and processing method. Understanding which factors strongly influence quality can support sustainable farming and global competitiveness.
Dataset: TidyTuesday 2020-07-07 Coffee Ratings
Source: Coffee Quality Institute (CQI) and the Specialty Coffee Association
Imported using standard readr::read_csv() in R
The TidyTuesday repository documents every variable in a data dictionary, including altitude_mean_meters, aroma, flavor, processing_method, species, and total_cup_points.
The original dataset is described below with str() and summary().
library(tidyverse)
coffee <- readr::read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-07-07/coffee_ratings.csv"
)
## Rows: 1339 Columns: 43
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (24): species, owner, country_of_origin, farm_name, lot_number, mill, ic...
## dbl (19): total_cup_points, number_of_bags, aroma, flavor, aftertaste, acidi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(coffee)
## spc_tbl_ [1,339 × 43] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ total_cup_points : num [1:1339] 90.6 89.9 89.8 89 88.8 ...
## $ species : chr [1:1339] "Arabica" "Arabica" "Arabica" "Arabica" ...
## $ owner : chr [1:1339] "metad plc" "metad plc" "grounds for health admin" "yidnekachew dabessa" ...
## $ country_of_origin : chr [1:1339] "Ethiopia" "Ethiopia" "Guatemala" "Ethiopia" ...
## $ farm_name : chr [1:1339] "metad plc" "metad plc" "san marcos barrancas \"san cristobal cuch" "yidnekachew dabessa coffee plantation" ...
## $ lot_number : chr [1:1339] NA NA NA NA ...
## $ mill : chr [1:1339] "metad plc" "metad plc" NA "wolensu" ...
## $ ico_number : chr [1:1339] "2014/2015" "2014/2015" NA NA ...
## $ company : chr [1:1339] "metad agricultural developmet plc" "metad agricultural developmet plc" NA "yidnekachew debessa coffee plantation" ...
## $ altitude : chr [1:1339] "1950-2200" "1950-2200" "1600 - 1800 m" "1800-2200" ...
## $ region : chr [1:1339] "guji-hambela" "guji-hambela" NA "oromia" ...
## $ producer : chr [1:1339] "METAD PLC" "METAD PLC" NA "Yidnekachew Dabessa Coffee Plantation" ...
## $ number_of_bags : num [1:1339] 300 300 5 320 300 100 100 300 300 50 ...
## $ bag_weight : chr [1:1339] "60 kg" "60 kg" "1" "60 kg" ...
## $ in_country_partner : chr [1:1339] "METAD Agricultural Development plc" "METAD Agricultural Development plc" "Specialty Coffee Association" "METAD Agricultural Development plc" ...
## $ harvest_year : chr [1:1339] "2014" "2014" NA "2014" ...
## $ grading_date : chr [1:1339] "April 4th, 2015" "April 4th, 2015" "May 31st, 2010" "March 26th, 2015" ...
## $ owner_1 : chr [1:1339] "metad plc" "metad plc" "Grounds for Health Admin" "Yidnekachew Dabessa" ...
## $ variety : chr [1:1339] NA "Other" "Bourbon" NA ...
## $ processing_method : chr [1:1339] "Washed / Wet" "Washed / Wet" NA "Natural / Dry" ...
## $ aroma : num [1:1339] 8.67 8.75 8.42 8.17 8.25 8.58 8.42 8.25 8.67 8.08 ...
## $ flavor : num [1:1339] 8.83 8.67 8.5 8.58 8.5 8.42 8.5 8.33 8.67 8.58 ...
## $ aftertaste : num [1:1339] 8.67 8.5 8.42 8.42 8.25 8.42 8.33 8.5 8.58 8.5 ...
## $ acidity : num [1:1339] 8.75 8.58 8.42 8.42 8.5 8.5 8.5 8.42 8.42 8.5 ...
## $ body : num [1:1339] 8.5 8.42 8.33 8.5 8.42 8.25 8.25 8.33 8.33 7.67 ...
## $ balance : num [1:1339] 8.42 8.42 8.42 8.25 8.33 8.33 8.25 8.5 8.42 8.42 ...
## $ uniformity : num [1:1339] 10 10 10 10 10 10 10 10 9.33 10 ...
## $ clean_cup : num [1:1339] 10 10 10 10 10 10 10 10 10 10 ...
## $ sweetness : num [1:1339] 10 10 10 10 10 10 10 9.33 9.33 10 ...
## $ cupper_points : num [1:1339] 8.75 8.58 9.25 8.67 8.58 8.33 8.5 9 8.67 8.5 ...
## $ moisture : num [1:1339] 0.12 0.12 0 0.11 0.12 0.11 0.11 0.03 0.03 0.1 ...
## $ category_one_defects : num [1:1339] 0 0 0 0 0 0 0 0 0 0 ...
## $ quakers : num [1:1339] 0 0 0 0 0 0 0 0 0 0 ...
## $ color : chr [1:1339] "Green" "Green" NA "Green" ...
## $ category_two_defects : num [1:1339] 0 1 0 2 2 1 0 0 0 4 ...
## $ expiration : chr [1:1339] "April 3rd, 2016" "April 3rd, 2016" "May 31st, 2011" "March 25th, 2016" ...
## $ certification_body : chr [1:1339] "METAD Agricultural Development plc" "METAD Agricultural Development plc" "Specialty Coffee Association" "METAD Agricultural Development plc" ...
## $ certification_address: chr [1:1339] "309fcf77415a3661ae83e027f7e5f05dad786e44" "309fcf77415a3661ae83e027f7e5f05dad786e44" "36d0d00a3724338ba7937c52a378d085f2172daa" "309fcf77415a3661ae83e027f7e5f05dad786e44" ...
## $ certification_contact: chr [1:1339] "19fef5a731de2db57d16da10287413f5f99bc2dd" "19fef5a731de2db57d16da10287413f5f99bc2dd" "0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660" "19fef5a731de2db57d16da10287413f5f99bc2dd" ...
## $ unit_of_measurement : chr [1:1339] "m" "m" "m" "m" ...
## $ altitude_low_meters : num [1:1339] 1950 1950 1600 1800 1950 ...
## $ altitude_high_meters : num [1:1339] 2200 2200 1800 2200 2200 NA NA 1700 1700 1850 ...
## $ altitude_mean_meters : num [1:1339] 2075 2075 1700 2000 2075 ...
## - attr(*, "spec")=
## .. cols(
## .. total_cup_points = col_double(),
## .. species = col_character(),
## .. owner = col_character(),
## .. country_of_origin = col_character(),
## .. farm_name = col_character(),
## .. lot_number = col_character(),
## .. mill = col_character(),
## .. ico_number = col_character(),
## .. company = col_character(),
## .. altitude = col_character(),
## .. region = col_character(),
## .. producer = col_character(),
## .. number_of_bags = col_double(),
## .. bag_weight = col_character(),
## .. in_country_partner = col_character(),
## .. harvest_year = col_character(),
## .. grading_date = col_character(),
## .. owner_1 = col_character(),
## .. variety = col_character(),
## .. processing_method = col_character(),
## .. aroma = col_double(),
## .. flavor = col_double(),
## .. aftertaste = col_double(),
## .. acidity = col_double(),
## .. body = col_double(),
## .. balance = col_double(),
## .. uniformity = col_double(),
## .. clean_cup = col_double(),
## .. sweetness = col_double(),
## .. cupper_points = col_double(),
## .. moisture = col_double(),
## .. category_one_defects = col_double(),
## .. quakers = col_double(),
## .. color = col_character(),
## .. category_two_defects = col_double(),
## .. expiration = col_character(),
## .. certification_body = col_character(),
## .. certification_address = col_character(),
## .. certification_contact = col_character(),
## .. unit_of_measurement = col_character(),
## .. altitude_low_meters = col_double(),
## .. altitude_high_meters = col_double(),
## .. altitude_mean_meters = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(coffee)
## total_cup_points species owner country_of_origin
## Min. : 0.00 Length:1339 Length:1339 Length:1339
## 1st Qu.:81.08 Class :character Class :character Class :character
## Median :82.50 Mode :character Mode :character Mode :character
## Mean :82.09
## 3rd Qu.:83.67
## Max. :90.58
##
## farm_name lot_number mill ico_number
## Length:1339 Length:1339 Length:1339 Length:1339
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## company altitude region producer
## Length:1339 Length:1339 Length:1339 Length:1339
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## number_of_bags bag_weight in_country_partner harvest_year
## Min. : 0.0 Length:1339 Length:1339 Length:1339
## 1st Qu.: 14.0 Class :character Class :character Class :character
## Median : 175.0 Mode :character Mode :character Mode :character
## Mean : 154.2
## 3rd Qu.: 275.0
## Max. :1062.0
##
## grading_date owner_1 variety processing_method
## Length:1339 Length:1339 Length:1339 Length:1339
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## aroma flavor aftertaste acidity body
## Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:7.420 1st Qu.:7.33 1st Qu.:7.250 1st Qu.:7.330 1st Qu.:7.330
## Median :7.580 Median :7.58 Median :7.420 Median :7.580 Median :7.500
## Mean :7.567 Mean :7.52 Mean :7.401 Mean :7.536 Mean :7.517
## 3rd Qu.:7.750 3rd Qu.:7.75 3rd Qu.:7.580 3rd Qu.:7.750 3rd Qu.:7.670
## Max. :8.750 Max. :8.83 Max. :8.670 Max. :8.750 Max. :8.580
##
## balance uniformity clean_cup sweetness
## Min. :0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:7.330 1st Qu.:10.000 1st Qu.:10.000 1st Qu.:10.000
## Median :7.500 Median :10.000 Median :10.000 Median :10.000
## Mean :7.518 Mean : 9.835 Mean : 9.835 Mean : 9.857
## 3rd Qu.:7.750 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.:10.000
## Max. :8.750 Max. :10.000 Max. :10.000 Max. :10.000
##
## cupper_points moisture category_one_defects quakers
## Min. : 0.000 Min. :0.00000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 7.250 1st Qu.:0.09000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 7.500 Median :0.11000 Median : 0.0000 Median : 0.0000
## Mean : 7.503 Mean :0.08838 Mean : 0.4795 Mean : 0.1734
## 3rd Qu.: 7.750 3rd Qu.:0.12000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :10.000 Max. :0.28000 Max. :63.0000 Max. :11.0000
## NA's :1
## color category_two_defects expiration certification_body
## Length:1339 Min. : 0.000 Length:1339 Length:1339
## Class :character 1st Qu.: 0.000 Class :character Class :character
## Mode :character Median : 2.000 Mode :character Mode :character
## Mean : 3.556
## 3rd Qu.: 4.000
## Max. :55.000
##
## certification_address certification_contact unit_of_measurement
## Length:1339 Length:1339 Length:1339
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## altitude_low_meters altitude_high_meters altitude_mean_meters
## Min. : 1 Min. : 1 Min. : 1
## 1st Qu.: 1100 1st Qu.: 1100 1st Qu.: 1100
## Median : 1311 Median : 1350 Median : 1311
## Mean : 1751 Mean : 1799 Mean : 1775
## 3rd Qu.: 1600 3rd Qu.: 1650 3rd Qu.: 1600
## Max. :190164 Max. :190164 Max. :190164
## NA's :230 NA's :230 NA's :230
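The dataset contains 1,339 rows and 43 columns describing coffee samples evaluated worldwide. Key variables include:
- total_cup_points
- country_of_origin
- variety
- processing_method
- altitude_mean_meters
- aroma, flavor, acidity
- moisture

The summary also exposes data-quality problems: total_cup_points and every sensory attribute have a minimum of 0, the altitude columns range from 1 m to an implausible 190,164 m, and 230 rows have no altitude at all.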
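A minimal sketch of how those problems could be counted before cleaning. The helper flag_implausible() is hypothetical (it is not part of the original analysis), and its 400 m and 3000 m cut-offs are assumptions that mirror the altitude filter used later in section C.

```r
# Hypothetical helper: count rows with implausible or missing values.
# The altitude window (alt_min, alt_max) is an assumption matching section C,
# where only altitudes strictly between 400 and 3000 m are kept.
flag_implausible <- function(df, alt_min = 400, alt_max = 3000) {
  alt <- df$altitude_mean_meters
  c(
    zero_score   = sum(df$total_cup_points == 0, na.rm = TRUE),  # 0-point ratings
    alt_missing  = sum(is.na(alt)),                              # no altitude recorded
    alt_too_low  = sum(alt <= alt_min, na.rm = TRUE),            # e.g. the 1 m minimum
    alt_too_high = sum(alt >= alt_max, na.rm = TRUE)             # e.g. 190,164 m
  )
}
```

Calling flag_implausible(coffee) on the imported data returns the four counts; alt_missing should agree with the 230 NA's that summary() reports for altitude_mean_meters.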
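coffee_clean <- coffee %>%
  # keep only the variables used in the analysis
  select(country_of_origin, species,
         altitude_mean_meters, processing_method,
         aroma, flavor, aftertaste, acidity, body, balance,
         total_cup_points) %>%
  # drop rows with a missing value in any selected column
  drop_na() %>%
  # mean of the six sensory attributes
  mutate(avg_sensory = (aroma + flavor + aftertaste + acidity + body + balance) / 6)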
glimpse(coffee_clean)
## Rows: 1,013
## Columns: 12
## $ country_of_origin <chr> "Ethiopia", "Ethiopia", "Ethiopia", "Ethiopia", "…
## $ species <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Arab…
## $ altitude_mean_meters <dbl> 2075.0, 2075.0, 2000.0, 2075.0, 1822.5, 1905.0, 1…
## $ processing_method <chr> "Washed / Wet", "Washed / Wet", "Natural / Dry", …
## $ aroma <dbl> 8.67, 8.75, 8.17, 8.25, 8.08, 8.17, 8.25, 8.08, 8…
## $ flavor <dbl> 8.83, 8.67, 8.58, 8.50, 8.58, 8.67, 8.42, 8.67, 8…
## $ aftertaste <dbl> 8.67, 8.50, 8.42, 8.25, 8.50, 8.25, 8.17, 8.33, 8…
## $ acidity <dbl> 8.75, 8.58, 8.42, 8.50, 8.50, 8.50, 8.33, 8.42, 8…
## $ body <dbl> 8.50, 8.42, 8.50, 8.42, 7.67, 7.75, 8.08, 8.00, 8…
## $ balance <dbl> 8.42, 8.42, 8.25, 8.33, 8.42, 8.17, 8.17, 8.08, 8…
## $ total_cup_points <dbl> 90.58, 89.92, 89.00, 88.83, 88.25, 88.08, 87.92, …
## $ avg_sensory <dbl> 8.640000, 8.556667, 8.390000, 8.375000, 8.291667,…
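Cleaning steps:
Selected the 11 variables needed for the analysis
Removed rows with a missing value in any selected column, reducing the data from 1,339 to 1,013 rows
Created avg_sensory as the mean of six sensory attributes (aroma, flavor, aftertaste, acidity, body, and balance)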
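To see which columns drive the drop from 1,339 to 1,013 rows, one could count missing values per selected column before calling drop_na(). A minimal base-R sketch; the helper names are illustrative and not part of the original analysis.

```r
# Count missing values in each selected column, largest first.
missing_by_column <- function(df, cols) {
  sort(colSums(is.na(df[cols])), decreasing = TRUE)
}

# Number of rows drop_na() would keep: rows complete across all selected columns.
rows_kept <- function(df, cols) {
  sum(complete.cases(df[cols]))
}
```

With `cols` set to the eleven column names passed to select(), rows_kept(coffee, cols) should reproduce the 1,013 rows reported by glimpse(), and missing_by_column(coffee, cols) shows which variables account for the removed rows.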
Q1: Which factors most strongly predict cup quality?
A. Average Sensory Score vs Total Cup Points:
ggplot(coffee_clean, aes(x = avg_sensory, y = total_cup_points)) +
geom_point(alpha = 0.6) +
geom_smooth() +
labs(title = "Average Sensory Quality and Coffee Rating",
x = "Average Sensory Score",
y = "Total Cup Points")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
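The scatterplot shows a strong positive relationship: the higher the average sensory score, the higher the total cup points. This is expected by construction, because total_cup_points is essentially the sum of ten attribute scores, six of which are averaged into avg_sensory.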
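This can be checked directly. A minimal sketch, assuming total_cup_points is the sum of the ten attribute columns (aroma through cupper_points). The two rows below are the first two samples, with values copied from the str() and glimpse() output above; small gaps are expected because the displayed scores are rounded to two decimals.

```r
# The ten cupping attributes; the check assumes total_cup_points is their sum.
attr_cols <- c("aroma", "flavor", "aftertaste", "acidity", "body", "balance",
               "uniformity", "clean_cup", "sweetness", "cupper_points")

# Difference between the reported total and the sum of the attribute scores.
attribute_sum_gap <- function(df, cols = attr_cols) {
  df$total_cup_points - rowSums(df[cols])
}

# First two samples (Ethiopia, metad plc), copied from the str()/glimpse() output.
first_rows <- data.frame(
  aroma = c(8.67, 8.75), flavor = c(8.83, 8.67), aftertaste = c(8.67, 8.50),
  acidity = c(8.75, 8.58), body = c(8.50, 8.42), balance = c(8.42, 8.42),
  uniformity = c(10, 10), clean_cup = c(10, 10), sweetness = c(10, 10),
  cupper_points = c(8.75, 8.58), total_cup_points = c(90.58, 89.92)
)
gap <- attribute_sum_gap(first_rows)
print(gap)
```

On the full data, summary(attribute_sum_gap(coffee)) would show whether the relationship holds for every sample. If it does, a model of total_cup_points on avg_sensory mostly recovers the scoring formula rather than an independent driver of quality.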
B. Species vs Total Cup Points:
ggplot(coffee_clean, aes(x = species, y = total_cup_points)) +
  geom_boxplot() +
  labs(title = "Coffee Rating by Coffee Species")
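Arabica generally earns higher scores than Robusta, although Robusta is represented by far fewer samples. Each box encloses the middle 50% of the scores (the interquartile range), and the line inside marks the median.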
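Because the two boxes rest on very different sample sizes, it may help to report the counts alongside a formal comparison. A minimal sketch: the Wilcoxon rank-sum test is a swapped-in choice (not used in the original analysis), picked because it does not assume normally distributed scores, and compare_species() is a hypothetical helper that assumes both species are present in the data.

```r
# Sample size per species plus a two-sided Wilcoxon rank-sum test of whether
# the Arabica and Robusta score distributions differ.
compare_species <- function(df) {
  counts <- table(df$species)
  # exact = FALSE uses the normal approximation, which tolerates tied scores
  test <- wilcox.test(total_cup_points ~ species, data = df, exact = FALSE)
  list(counts = counts, p_value = test$p.value)
}
```

Usage: compare_species(coffee_clean) returns the per-species counts and the test's p-value; a small Robusta count is a reason to treat its box with caution whatever the p-value.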
C. Altitude vs Total Cup Points
coffee_alt <- coffee_clean %>%
  # keep plausible altitudes; the raw data contain entry errors such as 190,164 m
  filter(
    altitude_mean_meters > 400,
    altitude_mean_meters < 3000
  )
ggplot(coffee_alt, aes(x = altitude_mean_meters, y = total_cup_points, color = species)) +
  geom_point(alpha = 0.5) +
  # a fixed colour overrides the species mapping, so this is one pooled linear fit
  geom_smooth(method = "lm", formula = y ~ x, color = "blue") +
  labs(title = "Altitude and Coffee Quality",
       x = "Altitude (m)",
       y = "Total Cup Points")
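Coffees grown at higher altitudes, particularly Arabica, tend to receive higher quality scores, while Robusta samples cluster at lower altitudes and score lower on average. Because the fitted line pools both species, part of the upward trend may reflect the species difference rather than altitude itself.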
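The blue line in the plot is a single linear fit pooled across both species, so it mixes the altitude trend with the Arabica/Robusta gap. A minimal sketch that separates the two by adding species as a covariate; the model form and the helper altitude_effect() are assumptions, not the author's method.

```r
# Fit total_cup_points on altitude with species as a covariate and return the
# estimated change in cup points per additional 100 m, holding species fixed.
altitude_effect <- function(df) {
  fit <- lm(total_cup_points ~ altitude_mean_meters + species, data = df)
  unname(coef(fit)[["altitude_mean_meters"]] * 100)
}
```

Usage: altitude_effect(coffee_alt). The model needs both species in the filtered data; if only Arabica remains, drop the species term.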
D. Processing Method Influence
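coffee_clean %>%
  ggplot(aes(x = processing_method, y = total_cup_points)) +
  # boxplots compare score distributions; geom_col() would sum the scores,
  # so bar length would mostly reflect how many samples used each method
  geom_boxplot() +
  coord_flip() +
  labs(title = "Scores by Processing Method",
       x = "Processing Method",
       y = "Total Cup Points")
Wet/Washed methods appear to yield consistently high scores. The groups differ greatly in size, however (Washed / Wet is the most common method), so differences between boxes should be read together with the number of samples behind each one.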
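"Consistent" can be made concrete with the interquartile range (IQR) of each method's scores, reported next to the sample size. A minimal sketch; method_summary() is a hypothetical helper, and the Kruskal-Wallis test is a swapped-in check (not in the original analysis) of whether the score distributions differ across methods at all.

```r
# Per-method sample size, median and IQR (a narrow IQR means more consistent
# scores), sorted by median, plus a Kruskal-Wallis p-value across all methods.
method_summary <- function(df) {
  groups <- split(df$total_cup_points, df$processing_method)
  stats <- t(vapply(groups,
                    function(x) c(n = length(x), median = median(x), iqr = IQR(x)),
                    numeric(3)))
  list(
    summary    = stats[order(-stats[, "median"]), , drop = FALSE],
    kw_p_value = kruskal.test(total_cup_points ~ processing_method, data = df)$p.value
  )
}
```

Usage: method_summary(coffee_clean). A method with a narrow IQR but only a handful of samples is weaker evidence of consistency than a similar IQR backed by hundreds of samples.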
Q2: Do coffees from certain countries consistently score higher than others?
coffee_clean %>%
group_by(country_of_origin) %>%
summarize(mean_score = mean(total_cup_points),
n = n()) %>% # n = number of coffee samples (rows) per country
filter(n >= 5) %>% # keep countries with at least 5 samples; smaller groups give unreliable means
slice_max(mean_score, n = 15) %>%
ggplot(aes(x = reorder(country_of_origin, mean_score),
y = mean_score)) +
geom_col() +
coord_flip() +
labs(title = "Top 15 Countries by Mean Total Cup Points (Cleaned Data)",
x = "Country of Origin",
y = "Mean Cup Score") +
theme_minimal() +
theme(legend.position = "none")
• Bar chart of the top 15 countries by mean rating

Using the cleaned coffee dataset, we computed the mean cupping score for each country, kept countries with at least 5 valid evaluations, and selected the 15 with the highest means. The bar chart shows that Ethiopia ranks highest in mean coffee quality, followed closely by the United States, Kenya, Uganda, and Colombia. These results highlight the strength of East African and Latin American origins in the specialty market. Guatemala, Costa Rica, and El Salvador also rank among the top 15, in line with the reputation of high-altitude Arabica production regions. A mean alone does not show consistency, however; the spread of scores within each country is needed to support that claim. The United States appears near the top despite not being a major coffee producer; this likely reflects specialty coffee grown in Hawaii, which is known for premium, high-scoring Arabica beans.
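To judge whether a country scores consistently high, the ranking needs a measure of spread. A minimal base-R sketch of the same ranking with the standard deviation and standard error added; country_ranking() is a hypothetical helper, min_n = 5 and top_k = 15 follow the original code, and the spread columns are additions.

```r
# Rank countries by mean total_cup_points, keeping only those with at least
# min_n samples. Unlike slice_max(), which keeps ties by default, head() here
# cuts the list at exactly top_k rows.
country_ranking <- function(df, min_n = 5, top_k = 15) {
  groups <- split(df$total_cup_points, df$country_of_origin)
  tab <- data.frame(
    country    = names(groups),
    n          = vapply(groups, length, integer(1)),
    mean_score = vapply(groups, mean, numeric(1)),
    sd_score   = vapply(groups, sd, numeric(1)),
    row.names  = NULL
  )
  tab$se <- tab$sd_score / sqrt(tab$n)  # standard error of each country's mean
  tab <- tab[tab$n >= min_n, ]
  head(tab[order(-tab$mean_score), ], top_k)
}
```

Usage: country_ranking(coffee_clean). Countries whose means differ by less than about two standard errors are hard to separate, which matters for a ranking built on groups as small as five samples.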
Conclusion
Altitude, sensory quality, species, and processing method are all associated with coffee rating scores; the link with sensory quality is partly built in, because the sensory attributes are components of the total cup score.
High-sensory-quality Arabica beans processed by Wet/Washed methods tend to score best.
Origins in Latin America and East Africa dominate the top of the country rankings.
Future work:
Incorporate climate variables
Predict scores using regression models
Examine sustainability metrics
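A minimal sketch of the regression idea, under stated assumptions: the predictors (altitude, species, processing method) deliberately exclude the sensory attributes, which are components of total_cup_points; the 80/20 split and the seed are arbitrary; and a mean-only baseline is included so the model's error has a reference point. fit_and_score() is a hypothetical helper.

```r
# Fit a linear model on a random training split and report its test RMSE
# next to the RMSE of always predicting the training-set mean.
fit_and_score <- function(df, train_frac = 0.8, seed = 1) {
  set.seed(seed)  # reproducible split (note: this resets the global RNG)
  idx <- sample(nrow(df), size = floor(train_frac * nrow(df)))
  train <- df[idx, ]
  test  <- df[-idx, ]
  fit <- lm(total_cup_points ~ altitude_mean_meters + species + processing_method,
            data = train)
  pred <- predict(fit, newdata = test)
  c(
    rmse          = sqrt(mean((test$total_cup_points - pred)^2)),
    baseline_rmse = sqrt(mean((test$total_cup_points - mean(train$total_cup_points))^2))
  )
}
```

Usage: fit_and_score(coffee_clean). Rare processing-method levels can end up only in the test split, which makes predict() fail with a "new levels" error; lumping them first (for example with forcats::fct_lump_min()) is one way around this.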