Author: Haley Grace Henson
mydata <- read.csv("simplified_coffee.csv", header = TRUE)
head(mydata)
## name roaster roast
## 1 Ethiopia Shakiso Mormora Revel Coffee Medium-Light
## 2 Ethiopia Suke Quto Roast House Medium-Light
## 3 Ethiopia Gedeb Halo Beriti Big Creek Coffee Roasters Medium
## 4 Ethiopia Kayon Mountain Red Rooster Coffee Roaster Light
## 5 Ethiopia Gelgelu Natural Organic Willoughby's Coffee & Tea Medium-Light
## 6 Ethiopia Hambela Alaka Black Oak Coffee Roasters Medium-Light
## loc_country origin X100g_USD rating review_date
## 1 United States Ethiopia 4.70 92 November 2017
## 2 United States Ethiopia 4.19 92 November 2017
## 3 United States Ethiopia 4.85 94 November 2017
## 4 United States Ethiopia 5.14 93 November 2017
## 5 United States Ethiopia 3.97 93 November 2017
## 6 United States Ethiopia 5.14 93 November 2017
## review
## 1 Crisply sweet, cocoa-toned. Lemon blossom, roasted cacao nib, date, rice candy, white peppercorn in aroma and cup. Savory-tart structure; delicate, silky mouthfeel. The richly drying finish leads with cocoa-toned, crisply sweet floral notes in the short and a hint of spice (white peppercorn) in the long.
## 2 Delicate, sweetly spice-toned. Pink peppercorn, date, myrrh, lavender, roasted cacao nib in aroma and cup. Crisp, spice-toned structure with citrus-like acidity; satiny, very smooth mouthfeel. The crisply sweet finish centers around spice and pungent floral notes.
## 3 Deeply sweet, subtly pungent. Honey, pear, tangerine zest, dark chocolate, pistachio in aroma and cup. Sweet and juicy yet crisp in structure (think ripe pear); plush and buoyant in mouthfeel. Resonant and flavor saturated in the short finish; simplifies around hints of tangerine and pistachio in the long.
## 4 Delicate, richly and sweetly tart. Dried hibiscus, fine musk, almond, sandalwood, raspberry in aroma and cup. Fruit-toned, deeply sweet structure with gently-expressed acidity; buoyant, satiny-smooth mouthfeel. The crisp, flavor-saturated finish is characterized by sweet-tart fruit and floral tones (raspberry, hibiscus).
## 5 High-toned, floral. Dried apricot, magnolia, almond butter, maple syrup, cherry brandy in aroma and cup. Crisp, sweetly-tart in structure; plush, syrupy mouthfeel. Notes of magnolia and dried apricot dominate in the flavor-laden short finish, with hints of almond butter and fruit brandy in the long.
## 6 Very delicate, sweetly savory. Lemon verbena, allspice, dried persimmon, dogwood, baker’s chocolate in aroma and cup. Balanced, sweet-savory structure; velvety-smooth mouthfeel. The sweetly herb-toned finish centers on notes of lemon verbena and dried persimmon wrapped in baker’s chocolate.
##Description:
##name: Name of the blend
##roaster: Name of the roaster
##roast: Type of roast (Light, Medium-Light, Medium, Medium-Dark, Dark)
##loc_country: Location of the roaster
##origin: Origin of the beans
##X100g_USD: Price per 100g of beans in US dollars
##rating: Rating of the coffee (out of 100)
##review_date: Date of the coffee review
##review: Text of the coffee review
##Sample Size: 1046 (coffee reviews from around the world)
##Source: https://www.kaggle.com/datasets/schmoyote/coffee-reviews-dataset/code
mycleandata <- subset(mydata, select = -c(2,4,5,8,9))
##removed the variables not needed for this analysis (roaster, loc_country, origin, review_date, review), which have too many categories to summarize
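##The column removal above relies on column positions. A minimal sketch of the same step using column names, which keeps working if the column order changes. The data frame `toy` and its name, roaster, and review values are placeholders made up for illustration; only the column names come from the coffee data.

```r
# Toy data frame with the same column names as the coffee data.
# The name, roaster, and review values are placeholders, not real records.
toy <- data.frame(
  name        = c("Coffee A", "Coffee B"),
  roaster     = c("Roaster 1", "Roaster 2"),
  roast       = c("Medium-Light", "Light"),
  loc_country = c("United States", "United States"),
  origin      = c("Ethiopia", "Ethiopia"),
  X100g_USD   = c(4.70, 5.14),
  rating      = c(92, 93),
  review_date = c("November 2017", "November 2017"),
  review      = c("placeholder text", "placeholder text"),
  stringsAsFactors = FALSE
)

# Drop columns by name instead of by position
drop_cols <- c("roaster", "loc_country", "origin", "review_date", "review")
toy_clean <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

# The positional version used in the analysis, for comparison
by_index <- subset(toy, select = -c(2, 4, 5, 8, 9))

print(names(toy_clean))
print(names(by_index))
```

##Both versions keep name, roast, X100g_USD, and rating; the name-based version also documents which variables were dropped.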
mycleandata$roast <- factor(mycleandata$roast,
                            levels = c("Light", "Medium-Light", "Medium", "Medium-Dark", "Dark"))
##converted roast to a factor with its levels listed from lightest to darkest
mycleandata2 <- na.omit(mycleandata)
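##dropped every row that contains an NA; this includes any roast value not among the five factor levels (blank cells, for example), because factor() turns unmatched values into NA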
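##A small sketch of how the factor step and na.omit interact. The data frame `toy` is made up for illustration and is not part of the coffee data.

```r
roast_levels <- c("Light", "Medium-Light", "Medium", "Medium-Dark", "Dark")

# Made-up rows: one blank roast and one missing rating
toy <- data.frame(
  roast  = c("Light", "", "Medium", "Dark"),
  rating = c(93, 92, NA, 94),
  stringsAsFactors = FALSE
)

# Values that are not in `levels` (here the blank string) become NA
toy$roast <- factor(toy$roast, levels = roast_levels)
na_roasts <- sum(is.na(toy$roast))

# na.omit drops every row that has an NA in any column
toy_complete <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(toy_complete)

print(na_roasts)     # 1
print(rows_dropped)  # 2
```

##Comparing nrow() before and after na.omit in the same way shows how many reviews the cleaning step removed.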
summary(mycleandata2[ , c(-1,-2)])
## X100g_USD rating
## Min. : 0.17 Min. :84.0
## 1st Qu.: 5.28 1st Qu.:93.0
## Median : 6.17 Median :93.0
## Mean : 10.09 Mean :93.3
## 3rd Qu.: 9.60 3rd Qu.:94.0
## Max. :132.28 Max. :97.0
mean(mycleandata2$X100g_USD)
## [1] 10.09368
##The mean price per 100g of coffee is $10.09, well above the median of $6.17, so a few very expensive coffees (maximum $132.28) pull the mean upward
median(mycleandata2$rating)
## [1] 93
##There is a slight upward curve, which could indicate a positive correlation between the price of a coffee and its rating: higher-priced coffees tend to be rated higher.
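##One way to check the suspected price-rating relationship numerically is a rank correlation. This is a sketch only: it uses the six rows printed by head(mydata) above as a stand-in, so the value it prints says nothing about the full data set. Using Spearman's method is an assumption made here because the prices are strongly skewed; it is not part of the original analysis.

```r
# First six rows printed by head(mydata); a stand-in for mycleandata2
price  <- c(4.70, 4.19, 4.85, 5.14, 3.97, 5.14)  # X100g_USD
rating <- c(92, 92, 94, 93, 93, 93)              # rating

# Spearman's rank correlation is positive when higher prices
# tend to go with higher ratings
rho <- cor(price, rating, method = "spearman")
print(rho)
```

##On the cleaned data the same call would be cor(mycleandata2$X100g_USD, mycleandata2$rating, method = "spearman").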
library(ggplot2)
ggplot(mycleandata2, aes(x = rating, fill = roast)) +
  geom_histogram(position = "dodge", binwidth = 5, colour = "lightblue") +
  ylab("Frequency") +
  labs(fill = "Roast")
##The histogram indicates that the most frequently reviewed roast was "Medium-Light". We can also see that "Medium-Dark" has almost equal counts in the rating bins around 90 and 95.
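##The counts behind the histogram can also be tabulated directly. The sketch below uses the roast and rating values from the six rows printed by head(mydata), so its counts cover only those rows. The bin edges passed to cut() are an assumption about how a bin width of 5 divides the rating axis (bins centered on 85, 90, and 95), not something taken from the plot itself.

```r
roast_levels <- c("Light", "Medium-Light", "Medium", "Medium-Dark", "Dark")

# Roast and rating for the six rows printed by head(mydata)
roast <- factor(
  c("Medium-Light", "Medium-Light", "Medium", "Light", "Medium-Light", "Medium-Light"),
  levels = roast_levels
)
rating <- c(92, 92, 94, 93, 93, 93)

# Number of reviews per roast, and the roast with the most reviews
roast_counts  <- table(roast)
most_reviewed <- names(which.max(roast_counts))

# Assumed bin edges: width 5, centered on 85, 90, and 95
rating_bin    <- cut(rating, breaks = c(82.5, 87.5, 92.5, 97.5))
counts_by_bin <- table(roast, rating_bin)

print(roast_counts)
print(most_reviewed)
print(counts_by_bin)
```

##Running the same table() calls on mycleandata2$roast and mycleandata2$rating gives exact counts to compare against the bar heights.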