Author: Haley Grace Henson

mydata <- read.csv("simplified_coffee.csv", header = TRUE)
head(mydata)
##                               name                    roaster        roast
## 1         Ethiopia Shakiso Mormora               Revel Coffee Medium-Light
## 2               Ethiopia Suke Quto                Roast House Medium-Light
## 3       Ethiopia Gedeb Halo Beriti  Big Creek Coffee Roasters       Medium
## 4          Ethiopia Kayon Mountain Red Rooster Coffee Roaster        Light
## 5 Ethiopia Gelgelu Natural Organic  Willoughby's Coffee & Tea Medium-Light
## 6           Ethiopia Hambela Alaka  Black Oak Coffee Roasters Medium-Light
##     loc_country   origin X100g_USD rating   review_date
## 1 United States Ethiopia      4.70     92 November 2017
## 2 United States Ethiopia      4.19     92 November 2017
## 3 United States Ethiopia      4.85     94 November 2017
## 4 United States Ethiopia      5.14     93 November 2017
## 5 United States Ethiopia      3.97     93 November 2017
## 6 United States Ethiopia      5.14     93 November 2017
##                                                                                                                                                                                                                                                                                                                               review
## 1                  Crisply sweet, cocoa-toned. Lemon blossom, roasted cacao nib, date, rice candy, white peppercorn in aroma and cup. Savory-tart structure; delicate, silky mouthfeel. The richly drying finish leads with cocoa-toned, crisply sweet floral notes in the short and a hint of spice (white peppercorn) in the long.
## 2                                                           Delicate, sweetly spice-toned. Pink peppercorn, date, myrrh, lavender, roasted cacao nib in aroma and cup. Crisp, spice-toned structure with citrus-like acidity; satiny, very smooth mouthfeel. The crisply sweet finish centers around spice and pungent floral notes.
## 3                Deeply sweet, subtly pungent. Honey, pear, tangerine zest, dark chocolate, pistachio in aroma and cup. Sweet and juicy yet crisp in structure (think ripe pear); plush and buoyant in mouthfeel. Resonant and flavor saturated in the short finish; simplifies around hints of tangerine and pistachio in the long.
## 4 Delicate, richly and sweetly tart. Dried hibiscus, fine musk, almond, sandalwood, raspberry in aroma and cup. Fruit-toned, deeply sweet structure with gently-expressed acidity; buoyant, satiny-smooth mouthfeel. The crisp, flavor-saturated finish is characterized by sweet-tart fruit and floral tones (raspberry, hibiscus).
## 5                       High-toned, floral. Dried apricot, magnolia, almond butter, maple syrup, cherry brandy in aroma and cup. Crisp, sweetly-tart in structure; plush, syrupy mouthfeel. Notes of magnolia and dried apricot dominate in the flavor-laden short finish, with hints of almond butter and fruit brandy in the long.
## 6                                Very delicate, sweetly savory. Lemon verbena, allspice, dried persimmon, dogwood, baker’s chocolate in aroma and cup. Balanced, sweet-savory structure; velvety-smooth mouthfeel. The sweetly herb-toned finish centers on notes of lemon verbena and dried persimmon wrapped in baker’s chocolate.

##Description:

##name: Name of the blend
##roaster: Name of the roaster
##roast: Type of roast (Light, Medium-Light, Medium, Medium-Dark, Dark)
##loc_country: Location of the roaster
##origin: Origin of the beans
##X100g_USD: Price per 100g of beans in US dollars
##rating: Rating of the coffee (out of 100)
##review_date: Date of the coffee review
##review: Text of the coffee review

##Sample Size: 1046 (coffee reviews from around the world)

##Source: https://www.kaggle.com/datasets/schmoyote/coffee-reviews-dataset/code

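# Drop columns by name (positions 2, 4, 5, 8, 9) so the call still works if the column order changes
mycleandata <- subset(mydata, select = -c(roaster, loc_country, origin, review_date, review))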

##Removed the variables that are irrelevant to my analysis (variables with too many categories)

mycleandata$roast <- factor(mycleandata$roast,
            levels = c("Light","Medium-Light", "Medium", "Medium-Dark", "Dark"))

##Converted the categorical variable roast to a factor, with levels listed from lightest to darkest

mycleandata2 <- na.omit(mycleandata)

##Removed every row that contains a missing (NA) value
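
##A minimal sketch of what the two cleaning steps above do, using a toy data frame with made-up values (toy and toy_clean are illustrative names, not part of the analysis). It assumes that any roast value not listed in levels, such as an empty string, is turned into NA by factor(), and that na.omit() then drops the whole row.

```r
# Toy data (hypothetical values, not taken from simplified_coffee.csv)
toy <- data.frame(
  roast     = c("Light", "Medium", "", "Dark"),
  X100g_USD = c(4.70, NA, 5.14, 3.97),
  rating    = c(92, 94, 93, 93),
  stringsAsFactors = FALSE
)

# Values not listed in `levels` (here the empty string) become NA
toy$roast <- factor(toy$roast,
                    levels = c("Light", "Medium-Light", "Medium", "Medium-Dark", "Dark"))

# Count missing values per column before cleaning
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# na.omit() drops every row that has at least one NA
toy_clean <- na.omit(toy)

# Number of rows removed by the cleaning step
rows_removed <- nrow(toy) - nrow(toy_clean)
print(rows_removed)
```

##Running the same two checks on mycleandata and mycleandata2 would show how many of the reviews were dropped.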

summary(mycleandata2[ , c(-1,-2)])
##    X100g_USD          rating    
##  Min.   :  0.17   Min.   :84.0  
##  1st Qu.:  5.28   1st Qu.:93.0  
##  Median :  6.17   Median :93.0  
##  Mean   : 10.09   Mean   :93.3  
##  3rd Qu.:  9.60   3rd Qu.:94.0  
##  Max.   :132.28   Max.   :97.0
mean(mycleandata2$X100g_USD)
## [1] 10.09368

##The average price per 100g of coffee is $10.09

median(mycleandata2$rating)
## [1] 93

##At least 50% of coffees have a rating of 93 or below
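
##The reading of the median can be checked directly by computing the share of ratings at or below it. The sketch below uses a short made-up vector (ratings is an illustrative name); with the real data the same expression would take mycleandata2$rating. Because many coffees can share the median value, the share may be well above one half.

```r
# Hypothetical ratings, used only to illustrate the calculation
ratings <- c(90, 92, 93, 93, 93, 94, 95)

# Middle value of the sorted ratings
med <- median(ratings)

# Proportion of ratings at or below the median;
# mean() of a logical vector gives the share of TRUE values
share_at_or_below <- mean(ratings <= med)
print(med)
print(share_at_or_below)
```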

##Plotting price against rating shows a slight upward curve, which could indicate a positive correlation between the two: higher-priced coffees tend to be rated higher.
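
##The code behind the price and rating comparison is not shown in this report, so the following is only a sketch of one way to look at it, not the original method. It assumes a scatterplot with a log-scaled price axis (the mean price of 10.09 sits well above the median of 6.17, so prices are right-skewed) and a Spearman rank correlation, which measures a monotone trend without assuming a straight line. The toy data frame holds made-up values.

```r
library(ggplot2)

# Hypothetical prices and ratings, used only to illustrate the approach
toy <- data.frame(
  X100g_USD = c(4.19, 4.70, 5.14, 6.17, 9.60, 15.00),
  rating    = c(91, 92, 93, 93, 94, 95)
)

# Scatterplot of rating against price; the log scale spreads out the cheap coffees
p <- ggplot(toy, aes(x = X100g_USD, y = rating)) +
  geom_point() +
  scale_x_log10() +
  xlab("Price per 100g (USD, log scale)") +
  ylab("Rating")
print(p)

# Spearman rank correlation: positive when higher prices go with higher ratings
rho <- cor(toy$X100g_USD, toy$rating, method = "spearman")
print(rho)
```

##Replacing toy with mycleandata2 would give the correlation for the cleaned reviews; a value near zero would mean the upward curve is weak.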

library(ggplot2)
ggplot(mycleandata2, aes(x = rating, fill = roast)) +
  geom_histogram(position = "dodge", binwidth = 5, colour = "lightblue") +
  ylab("Frequency") +
  labs(fill = "Roast")

##The histogram indicates that “Medium-Light” was the most frequently reviewed roast. We can also see that “Medium-Dark” has almost equal counts in the rating bins around 90 and 95.
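
##The counts read off the histogram can also be tabulated. The sketch below uses a toy data frame with made-up values, and assumes 5-point rating bins with edges at 82.5, 87.5, 92.5 and 97.5, which is how a histogram with binwidth = 5 would be expected to group ratings around 85, 90 and 95.

```r
# Hypothetical roasts and ratings, used only to illustrate the tabulation
roast_levels <- c("Light", "Medium-Light", "Medium", "Medium-Dark", "Dark")
toy <- data.frame(
  roast  = factor(c("Light", "Medium-Light", "Medium-Light", "Medium-Light",
                    "Medium", "Medium-Dark", "Medium-Dark"),
                  levels = roast_levels),
  rating = c(93, 92, 94, 93, 93, 90, 95)
)

# Number of reviews per roast; the largest count is the most reviewed roast
roast_counts <- table(toy$roast)
print(roast_counts)

# Group ratings into 5-point bins (assumed edges, see the note above)
rating_bin <- cut(toy$rating, breaks = seq(82.5, 97.5, by = 5))

# Cross-tabulate roast against rating bin to compare counts within a roast
roast_by_bin <- table(toy$roast, rating_bin)
print(roast_by_bin)
```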