Zhuo Ding
2025-11-27
Coffee is a critical export crop for many developing nations, and its market value depends strongly on its overall rating. The total cup score varies based on geographic, environmental, and post-harvest factors such as altitude, species/variety, and processing method. Understanding which factors strongly influence quality can support sustainable farming and global competitiveness.
Dataset: TidyTuesday 2020-07-07 Coffee Ratings
Source: Coffee Quality Institute (CQI) and the Specialty Coffee Association
Imported using standard readr::read_csv() in R
The dataset is documented in the TidyTuesday repository with full descriptions of variables such as altitude_mean_meters, aroma, flavor, processing_method, species, and total_cup_points. A data dictionary is provided in the repository.
The structure and summary statistics of the imported dataset are inspected below with str() and summary().
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
coffee <- readr::read_csv(
"https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-07-07/coffee_ratings.csv"
)
## Rows: 1339 Columns: 43
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (24): species, owner, country_of_origin, farm_name, lot_number, mill, ic...
## dbl (19): total_cup_points, number_of_bags, aroma, flavor, aftertaste, acidi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(coffee)
## spc_tbl_ [1,339 × 43] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ total_cup_points : num [1:1339] 90.6 89.9 89.8 89 88.8 ...
## $ species : chr [1:1339] "Arabica" "Arabica" "Arabica" "Arabica" ...
## $ owner : chr [1:1339] "metad plc" "metad plc" "grounds for health admin" "yidnekachew dabessa" ...
## $ country_of_origin : chr [1:1339] "Ethiopia" "Ethiopia" "Guatemala" "Ethiopia" ...
## $ farm_name : chr [1:1339] "metad plc" "metad plc" "san marcos barrancas \"san cristobal cuch" "yidnekachew dabessa coffee plantation" ...
## $ lot_number : chr [1:1339] NA NA NA NA ...
## $ mill : chr [1:1339] "metad plc" "metad plc" NA "wolensu" ...
## $ ico_number : chr [1:1339] "2014/2015" "2014/2015" NA NA ...
## $ company : chr [1:1339] "metad agricultural developmet plc" "metad agricultural developmet plc" NA "yidnekachew debessa coffee plantation" ...
## $ altitude : chr [1:1339] "1950-2200" "1950-2200" "1600 - 1800 m" "1800-2200" ...
## $ region : chr [1:1339] "guji-hambela" "guji-hambela" NA "oromia" ...
## $ producer : chr [1:1339] "METAD PLC" "METAD PLC" NA "Yidnekachew Dabessa Coffee Plantation" ...
## $ number_of_bags : num [1:1339] 300 300 5 320 300 100 100 300 300 50 ...
## $ bag_weight : chr [1:1339] "60 kg" "60 kg" "1" "60 kg" ...
## $ in_country_partner : chr [1:1339] "METAD Agricultural Development plc" "METAD Agricultural Development plc" "Specialty Coffee Association" "METAD Agricultural Development plc" ...
## $ harvest_year : chr [1:1339] "2014" "2014" NA "2014" ...
## $ grading_date : chr [1:1339] "April 4th, 2015" "April 4th, 2015" "May 31st, 2010" "March 26th, 2015" ...
## $ owner_1 : chr [1:1339] "metad plc" "metad plc" "Grounds for Health Admin" "Yidnekachew Dabessa" ...
## $ variety : chr [1:1339] NA "Other" "Bourbon" NA ...
## $ processing_method : chr [1:1339] "Washed / Wet" "Washed / Wet" NA "Natural / Dry" ...
## $ aroma : num [1:1339] 8.67 8.75 8.42 8.17 8.25 8.58 8.42 8.25 8.67 8.08 ...
## $ flavor : num [1:1339] 8.83 8.67 8.5 8.58 8.5 8.42 8.5 8.33 8.67 8.58 ...
## $ aftertaste : num [1:1339] 8.67 8.5 8.42 8.42 8.25 8.42 8.33 8.5 8.58 8.5 ...
## $ acidity : num [1:1339] 8.75 8.58 8.42 8.42 8.5 8.5 8.5 8.42 8.42 8.5 ...
## $ body : num [1:1339] 8.5 8.42 8.33 8.5 8.42 8.25 8.25 8.33 8.33 7.67 ...
## $ balance : num [1:1339] 8.42 8.42 8.42 8.25 8.33 8.33 8.25 8.5 8.42 8.42 ...
## $ uniformity : num [1:1339] 10 10 10 10 10 10 10 10 9.33 10 ...
## $ clean_cup : num [1:1339] 10 10 10 10 10 10 10 10 10 10 ...
## $ sweetness : num [1:1339] 10 10 10 10 10 10 10 9.33 9.33 10 ...
## $ cupper_points : num [1:1339] 8.75 8.58 9.25 8.67 8.58 8.33 8.5 9 8.67 8.5 ...
## $ moisture : num [1:1339] 0.12 0.12 0 0.11 0.12 0.11 0.11 0.03 0.03 0.1 ...
## $ category_one_defects : num [1:1339] 0 0 0 0 0 0 0 0 0 0 ...
## $ quakers : num [1:1339] 0 0 0 0 0 0 0 0 0 0 ...
## $ color : chr [1:1339] "Green" "Green" NA "Green" ...
## $ category_two_defects : num [1:1339] 0 1 0 2 2 1 0 0 0 4 ...
## $ expiration : chr [1:1339] "April 3rd, 2016" "April 3rd, 2016" "May 31st, 2011" "March 25th, 2016" ...
## $ certification_body : chr [1:1339] "METAD Agricultural Development plc" "METAD Agricultural Development plc" "Specialty Coffee Association" "METAD Agricultural Development plc" ...
## $ certification_address: chr [1:1339] "309fcf77415a3661ae83e027f7e5f05dad786e44" "309fcf77415a3661ae83e027f7e5f05dad786e44" "36d0d00a3724338ba7937c52a378d085f2172daa" "309fcf77415a3661ae83e027f7e5f05dad786e44" ...
## $ certification_contact: chr [1:1339] "19fef5a731de2db57d16da10287413f5f99bc2dd" "19fef5a731de2db57d16da10287413f5f99bc2dd" "0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660" "19fef5a731de2db57d16da10287413f5f99bc2dd" ...
## $ unit_of_measurement : chr [1:1339] "m" "m" "m" "m" ...
## $ altitude_low_meters : num [1:1339] 1950 1950 1600 1800 1950 ...
## $ altitude_high_meters : num [1:1339] 2200 2200 1800 2200 2200 NA NA 1700 1700 1850 ...
## $ altitude_mean_meters : num [1:1339] 2075 2075 1700 2000 2075 ...
## - attr(*, "spec")=
## .. cols(
## .. total_cup_points = col_double(),
## .. species = col_character(),
## .. owner = col_character(),
## .. country_of_origin = col_character(),
## .. farm_name = col_character(),
## .. lot_number = col_character(),
## .. mill = col_character(),
## .. ico_number = col_character(),
## .. company = col_character(),
## .. altitude = col_character(),
## .. region = col_character(),
## .. producer = col_character(),
## .. number_of_bags = col_double(),
## .. bag_weight = col_character(),
## .. in_country_partner = col_character(),
## .. harvest_year = col_character(),
## .. grading_date = col_character(),
## .. owner_1 = col_character(),
## .. variety = col_character(),
## .. processing_method = col_character(),
## .. aroma = col_double(),
## .. flavor = col_double(),
## .. aftertaste = col_double(),
## .. acidity = col_double(),
## .. body = col_double(),
## .. balance = col_double(),
## .. uniformity = col_double(),
## .. clean_cup = col_double(),
## .. sweetness = col_double(),
## .. cupper_points = col_double(),
## .. moisture = col_double(),
## .. category_one_defects = col_double(),
## .. quakers = col_double(),
## .. color = col_character(),
## .. category_two_defects = col_double(),
## .. expiration = col_character(),
## .. certification_body = col_character(),
## .. certification_address = col_character(),
## .. certification_contact = col_character(),
## .. unit_of_measurement = col_character(),
## .. altitude_low_meters = col_double(),
## .. altitude_high_meters = col_double(),
## .. altitude_mean_meters = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(coffee)
## total_cup_points species owner country_of_origin
## Min. : 0.00 Length:1339 Length:1339 Length:1339
## 1st Qu.:81.08 Class :character Class :character Class :character
## Median :82.50 Mode :character Mode :character Mode :character
## Mean :82.09
## 3rd Qu.:83.67
## Max. :90.58
##
## farm_name lot_number mill ico_number
## Length:1339 Length:1339 Length:1339 Length:1339
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## company altitude region producer
## Length:1339 Length:1339 Length:1339 Length:1339
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## number_of_bags bag_weight in_country_partner harvest_year
## Min. : 0.0 Length:1339 Length:1339 Length:1339
## 1st Qu.: 14.0 Class :character Class :character Class :character
## Median : 175.0 Mode :character Mode :character Mode :character
## Mean : 154.2
## 3rd Qu.: 275.0
## Max. :1062.0
##
## grading_date owner_1 variety processing_method
## Length:1339 Length:1339 Length:1339 Length:1339
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## aroma flavor aftertaste acidity body
## Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:7.420 1st Qu.:7.33 1st Qu.:7.250 1st Qu.:7.330 1st Qu.:7.330
## Median :7.580 Median :7.58 Median :7.420 Median :7.580 Median :7.500
## Mean :7.567 Mean :7.52 Mean :7.401 Mean :7.536 Mean :7.517
## 3rd Qu.:7.750 3rd Qu.:7.75 3rd Qu.:7.580 3rd Qu.:7.750 3rd Qu.:7.670
## Max. :8.750 Max. :8.83 Max. :8.670 Max. :8.750 Max. :8.580
##
## balance uniformity clean_cup sweetness
## Min. :0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:7.330 1st Qu.:10.000 1st Qu.:10.000 1st Qu.:10.000
## Median :7.500 Median :10.000 Median :10.000 Median :10.000
## Mean :7.518 Mean : 9.835 Mean : 9.835 Mean : 9.857
## 3rd Qu.:7.750 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.:10.000
## Max. :8.750 Max. :10.000 Max. :10.000 Max. :10.000
##
## cupper_points moisture category_one_defects quakers
## Min. : 0.000 Min. :0.00000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 7.250 1st Qu.:0.09000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 7.500 Median :0.11000 Median : 0.0000 Median : 0.0000
## Mean : 7.503 Mean :0.08838 Mean : 0.4795 Mean : 0.1734
## 3rd Qu.: 7.750 3rd Qu.:0.12000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :10.000 Max. :0.28000 Max. :63.0000 Max. :11.0000
## NA's :1
## color category_two_defects expiration certification_body
## Length:1339 Min. : 0.000 Length:1339 Length:1339
## Class :character 1st Qu.: 0.000 Class :character Class :character
## Mode :character Median : 2.000 Mode :character Mode :character
## Mean : 3.556
## 3rd Qu.: 4.000
## Max. :55.000
##
## certification_address certification_contact unit_of_measurement
## Length:1339 Length:1339 Length:1339
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## altitude_low_meters altitude_high_meters altitude_mean_meters
## Min. : 1 Min. : 1 Min. : 1
## 1st Qu.: 1100 1st Qu.: 1100 1st Qu.: 1100
## Median : 1311 Median : 1350 Median : 1311
## Mean : 1751 Mean : 1799 Mean : 1775
## 3rd Qu.: 1600 3rd Qu.: 1650 3rd Qu.: 1600
## Max. :190164 Max. :190164 Max. :190164
## NA's :230 NA's :230 NA's :230
The dataset contains 1,339 rows and 43 columns describing coffee samples evaluated worldwide. Key variables include:
- total_cup_points
- country_of_origin
- variety
- processing_method
- altitude_mean_meters
- aroma, flavor, acidity
- moisture
coffee_clean <- coffee %>%
select(country_of_origin, species,
altitude_mean_meters, processing_method,
aroma, flavor, aftertaste, acidity, body, balance,
total_cup_points) %>%
drop_na() %>%
mutate(avg_sensory = (aroma + flavor + aftertaste + acidity + body + balance) / 6)
glimpse(coffee_clean)
## Rows: 1,013
## Columns: 12
## $ country_of_origin <chr> "Ethiopia", "Ethiopia", "Ethiopia", "Ethiopia", "…
## $ species <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Arab…
## $ altitude_mean_meters <dbl> 2075.0, 2075.0, 2000.0, 2075.0, 1822.5, 1905.0, 1…
## $ processing_method <chr> "Washed / Wet", "Washed / Wet", "Natural / Dry", …
## $ aroma <dbl> 8.67, 8.75, 8.17, 8.25, 8.08, 8.17, 8.25, 8.08, 8…
## $ flavor <dbl> 8.83, 8.67, 8.58, 8.50, 8.58, 8.67, 8.42, 8.67, 8…
## $ aftertaste <dbl> 8.67, 8.50, 8.42, 8.25, 8.50, 8.25, 8.17, 8.33, 8…
## $ acidity <dbl> 8.75, 8.58, 8.42, 8.50, 8.50, 8.50, 8.33, 8.42, 8…
## $ body <dbl> 8.50, 8.42, 8.50, 8.42, 7.67, 7.75, 8.08, 8.00, 8…
## $ balance <dbl> 8.42, 8.42, 8.25, 8.33, 8.42, 8.17, 8.17, 8.08, 8…
## $ total_cup_points <dbl> 90.58, 89.92, 89.00, 88.83, 88.25, 88.08, 87.92, …
## $ avg_sensory <dbl> 8.640000, 8.556667, 8.390000, 8.375000, 8.291667,…
Cleaning steps:
Selected only necessary variables
Removed rows with a missing value in any selected variable (1,339 rows reduced to 1,013)
Created avg_sensory score based on six sensory attributes
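To see which of the selected columns account for the rows lost in drop_na(), missing values can be counted per column before dropping. The following is a minimal sketch in base R; the helper count_missing() and the data frame toy (including its values) are hypothetical and only illustrate the idea.

```r
# Count missing values per column, sorted from most to least missing.
# `toy` is made-up example data, not part of the coffee dataset.
count_missing <- function(df) {
  sort(colSums(is.na(df)), decreasing = TRUE)
}

toy <- data.frame(
  altitude_mean_meters = c(2075, NA, 1700, NA),
  processing_method    = c("Washed / Wet", NA, "Natural / Dry", "Washed / Wet"),
  total_cup_points     = c(90.58, 89.92, 89.75, 89.00)
)

missing_by_column <- count_missing(toy)
print(missing_by_column)

# Number of rows with no missing value in any column
# (the rows a complete-case filter such as drop_na() would keep)
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

The same helper could be applied to the selected columns of coffee to check whether altitude or processing method is responsible for most of the dropped rows.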
Q1 — Which factors most strongly predict cup quality?
A. Average Sensory Score vs Total Cup Points:
ggplot(coffee_clean, aes(x = avg_sensory, y = total_cup_points)) +
geom_point(alpha = 0.6) +
geom_smooth() +
labs(title = "Average Sensory Quality and Coffee Rating",
x = "Average Sensory Score",
y = "Total Cup Points")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
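The scatterplot shows a positive relationship: the higher the average sensory score, the higher the total cup points. This is partly by construction, since the six sensory attributes are among the components summed into total_cup_points.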
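The strength of a relationship like this can be put into numbers with a correlation coefficient and a simple linear fit. The sketch below uses base R on a small made-up data frame; toy and its values are hypothetical and are not results from the coffee data.

```r
# Quantify the linear association between two numeric columns.
# `toy` is made-up data used only to show the calls.
toy <- data.frame(
  avg_sensory      = c(7.2, 7.4, 7.5, 7.7, 8.0, 8.4),
  total_cup_points = c(80.1, 81.5, 82.3, 83.6, 85.9, 88.7)
)

# Pearson correlation coefficient
r <- cor(toy$avg_sensory, toy$total_cup_points)

# Simple linear regression: the slope is the expected change in total
# points for a one-point change in the average sensory score
fit <- lm(total_cup_points ~ avg_sensory, data = toy)
slope <- unname(coef(fit)["avg_sensory"])
r_squared <- summary(fit)$r.squared

print(c(r = r, slope = slope, r_squared = r_squared))
```

Replacing toy with coffee_clean would give the corresponding values for the cleaned data.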
B. Species vs Total Cup Points:
## <ggplot2::labels> List of 1
## $ title: chr "Coffee Rating by Coffee Species"
Arabica generally earns higher scores than Robusta. Each box encloses the middle 50% of the data.
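The code for this plot is not shown in the report. A boxplot of this kind could be produced roughly as follows; this is a sketch under assumptions, not the author's code, and toy with its values is made up for illustration.

```r
library(ggplot2)

# Made-up scores for illustration only
toy <- data.frame(
  species = rep(c("Arabica", "Robusta"), each = 5),
  total_cup_points = c(82.1, 83.4, 84.0, 81.7, 85.2,
                       80.3, 81.0, 79.8, 82.2, 80.9)
)

# Keep `+` at the end of each line so the next layer or label
# is attached to the plot object
p <- ggplot(toy, aes(x = species, y = total_cup_points)) +
  geom_boxplot() +
  labs(title = "Coffee Rating by Coffee Species",
       x = "Species",
       y = "Total Cup Points")

# Median per species (the line drawn inside each box)
medians <- tapply(toy$total_cup_points, toy$species, median)
print(medians)
```

Printing p draws the plot; with the real data, coffee_clean would take the place of toy.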
C. Altitude vs Total Cup Points:
coffee_alt <- coffee_clean %>%
filter(
altitude_mean_meters > 400,
altitude_mean_meters < 3000
)
ggplot(coffee_alt, aes(x = altitude_mean_meters, y = total_cup_points, color = species)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", color = "blue") +
labs(title = "Altitude and Coffee Quality",
x = "Altitude (m)",
y = "Total Cup Points")
## `geom_smooth()` using formula = 'y ~ x'
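Coffees grown at higher altitudes, particularly Arabica, tend to receive higher quality scores, while Robusta coffees cluster at lower altitudes and show lower average scores. The plot is restricted to mean altitudes between 400 m and 3,000 m because the raw column contains implausible values (the summary above reports a minimum of 1 m and a maximum of 190,164 m).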
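Before filtering, it can help to check how many rows an altitude cut-off would remove. The sketch below is a base R illustration; the helper is_plausible_altitude() is hypothetical, its default bounds mirror the 400 m and 3,000 m limits used in the filter above, and toy is made-up data.

```r
# Flag altitude values inside a plausible growing range.
# Bounds are exclusive, matching `> 400` and `< 3000` in the filter.
is_plausible_altitude <- function(x, lower = 400, upper = 3000) {
  !is.na(x) & x > lower & x < upper
}

# Made-up values chosen to resemble typical and extreme entries
toy <- data.frame(
  altitude_mean_meters = c(1, 1311, 2075, 190164, NA, 3000)
)

keep <- is_plausible_altitude(toy$altitude_mean_meters)

# Rows the cut-off would drop (too low, too high, or missing)
n_removed <- sum(!keep)
toy_alt <- toy[keep, , drop = FALSE]

print(n_removed)
print(toy_alt)
```

Reporting the number of removed rows alongside the plot makes it easier to judge whether the cut-off changes the picture.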
D. Processing Method Influence:
mean_scores <- coffee_clean %>%
group_by(processing_method) %>%
summarise(mean_score = mean(total_cup_points))
ggplot(mean_scores, aes(y = processing_method, x = mean_score)) +
geom_point()
According to the mean score comparison, Pulped Natural/Honey processing yields the highest average cupping performance in this sample. Washed/Wet coffees follow closely behind, while Natural/Dry coffees produce lower average scores despite occasional high-scoring examples. This suggests that Honey processing may offer a strong balance of flavor development and quality control.
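A comparison of group means is easier to interpret when the group sizes and spread are reported next to it, since a method with few samples can rank first by chance. The following sketch extends the summary with a count and a standard deviation; toy and its values are made up, and the column names n and sd_score are illustrative choices.

```r
library(dplyr)

# Made-up scores for illustration only
toy <- data.frame(
  processing_method = c("Washed / Wet", "Washed / Wet", "Washed / Wet",
                        "Natural / Dry", "Natural / Dry",
                        "Pulped natural / honey"),
  total_cup_points = c(82.0, 83.0, 84.0, 81.0, 84.0, 84.5)
)

method_summary <- toy %>%
  group_by(processing_method) %>%
  summarise(
    n          = n(),                     # samples per method
    mean_score = mean(total_cup_points),  # average rating
    sd_score   = sd(total_cup_points),    # spread (NA when n = 1)
    .groups    = "drop"
  ) %>%
  arrange(desc(mean_score))

print(method_summary)
```

In this toy example the top-ranked method rests on a single sample, which is the kind of situation the extra columns are meant to expose.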
Q2. Do coffees from certain countries consistently score higher than others?
coffee_clean %>%
group_by(country_of_origin) %>%
summarize(mean_score = mean(total_cup_points), n = n()) %>% # n = number of samples per country
filter(n >= 5) %>% # keep countries with at least 5 samples; means from fewer samples can be misleading
slice_max(mean_score, n = 15) %>%
ggplot(aes(x = reorder(country_of_origin, mean_score),
y = mean_score)) +
geom_col() +
coord_flip() +
labs(title = "Top 15 Countries by Mean Total Cup Points (Cleaned Data)",
x = "Country of Origin",
y = "Mean Cup Score") +
theme_minimal() +
theme(legend.position = "none")
Conclusion
Altitude, sensory quality, species, and processing method strongly relate to the coffee rating scores.
Arabica beans with high sensory scores processed by Pulped Natural/Honey methods perform best.
Certain regions (Latin America and East Africa) consistently lead in ratings.
Future work:
Incorporate climate variables
Examine sustainability metrics