WORLD OF CHOCOLATE

I eat chocolate often, probably more often than most, and I always ask myself the same question: where did the cocoa in the chocolate come from? Luckily, I got the opportunity to analyze the ratings of more than 1,700 chocolate bars.

By the end of this study, we will see which factors contribute to a higher perceived rating of chocolate and where the cocoa in it actually comes from.

The higher the percentage of cocoa in a chocolate, the more bitter it tastes. So we will look at the different types of cocoa beans used, and in what concentrations, to make the world’s favourite candy.

According to World Atlas, the top eight cocoa-producing countries are Mexico, Ecuador, Brazil, Cameroon, Nigeria, Indonesia, Ghana, and Côte d’Ivoire. Many of these countries are represented in the data, but the data lists many more origins than these eight.

In this dataset, the "Broad Bean Origin" column records the broad geographic region where the cocoa beans were grown; it does not refer to the broad (or fava) bean. In addition, many companies that sell the chocolate are located nowhere near where their beans were harvested.

This data was imported from Kaggle and has more than 1,700 rows and 9 variables. It may contain unnecessary items, so we will start this journey of discovery by cleaning it.

Let us first import our data:

library(tidyverse);
## -- Attaching packages ---------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## Warning: package 'tidyr' was built under R version 3.6.2
## -- Conflicts ------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(rpart);
library(caret);
## Warning: package 'caret' was built under R version 3.6.2
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(rpart.plot); library(e1071); library(MASS);
## Warning: package 'rpart.plot' was built under R version 3.6.2
## Warning: package 'e1071' was built under R version 3.6.2
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
library(dplyr);
chocolate_raw <- read_csv("flavors_of_cacao.csv");
## Parsed with column specification:
## cols(
##   Company = col_character(),
##   SpecificBeanOriginorBarName = col_character(),
##   REF = col_double(),
##   ReviewDate = col_double(),
##   CocoaPercent = col_character(),
##   CompanyLocation = col_character(),
##   Rating = col_double(),
##   BeanType = col_character(),
##   BroadBeanOrigin = col_character()
## )
chocolate_clean <- read_csv("flavors_of_cacao.csv")
## Parsed with column specification:
## cols(
##   Company = col_character(),
##   SpecificBeanOriginorBarName = col_character(),
##   REF = col_double(),
##   ReviewDate = col_double(),
##   CocoaPercent = col_character(),
##   CompanyLocation = col_character(),
##   Rating = col_double(),
##   BeanType = col_character(),
##   BroadBeanOrigin = col_character()
## )
write.csv(chocolate_raw, file="chocolate_raw.csv")
write.csv(chocolate_clean, file="chocolate_clean.csv")

DATA CLEANING

The data imported from Kaggle had many text (categorical) columns that couldn’t be analyzed easily. Below, we convert them to numeric variables and correct spelling and organizational errors to make the analysis easier and more fun. You wouldn’t like spelling errors messing up your work, right?

#Remove rows with repeat labels
chocolate_clean = chocolate_clean[-1,]
View(chocolate_clean)

#Check to make sure there aren't columns with too many NA's
colMeans(is.na(chocolate_clean));
##                     Company SpecificBeanOriginorBarName 
##                0.0000000000                0.0000000000 
##                         REF                  ReviewDate 
##                0.0000000000                0.0000000000 
##                CocoaPercent             CompanyLocation 
##                0.0000000000                0.0000000000 
##                      Rating                    BeanType 
##                0.0000000000                0.0005574136 
##             BroadBeanOrigin 
##                0.0005574136
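As a small illustration of the missing-value check above, the sketch below applies the same `colMeans(is.na(...))` idea to a made-up data frame. The `toy` data and the 40% cutoff are assumptions chosen for the example, not values from the chocolate data.

```r
# Toy data frame standing in for chocolate_clean (values are made up)
toy <- data.frame(
  Rating   = c(3.5, 2.75, NA, 4.0),
  BeanType = c("Criollo", NA, NA, "Trinitario"),
  stringsAsFactors = FALSE
)

# Fraction of missing values in each column
na_share <- colMeans(is.na(toy))

# Columns whose NA share exceeds a hypothetical 40% cutoff
too_sparse <- names(na_share)[na_share > 0.40]

print(na_share)
print(too_sparse)
```

A column flagged this way might be a candidate for dropping, although the right cutoff depends on the analysis.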
#Change misspellings
chocolate_clean[chocolate_clean == "Eucador"] <- "Ecuador"
chocolate_clean[chocolate_clean == "Domincan Republic"] <- "Dominican Republic"
chocolate_clean[chocolate_clean == "Niacragua"] <- "Nicaragua"

#Make data numeric
chocolate_clean$REF <- as.numeric(as.character(chocolate_clean$REF))
chocolate_clean$Rating <- as.numeric(as.character(chocolate_clean$Rating))
chocolate_clean$`ReviewDate` <- as.numeric(as.character(chocolate_clean$`ReviewDate`))

#Make percent into a decimal
chocolate_clean$`CocoaPercent` <- as.numeric(sub("%", "",chocolate_clean$`CocoaPercent`,fixed=TRUE))/100
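To see what the percent conversion above does on its own, here is a minimal sketch on a few made-up strings. The `pct` vector is an assumption for illustration only.

```r
# Made-up cocoa percentages in the same "NN%" text format as CocoaPercent
pct <- c("63%", "70%", "72.5%")

# Drop the literal "%" sign, convert to numeric, and scale to a 0-1 decimal
dec <- as.numeric(sub("%", "", pct, fixed = TRUE)) / 100

print(dec)
```

Using `fixed = TRUE` treats `"%"` as plain text rather than a regular expression, which keeps the substitution simple and predictable.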
#Assign numbers to factors to make Company Location (loc) numeric
chocolate_clean <- chocolate_clean %>%
  mutate(location = recode(
    chocolate_clean$`CompanyLocation`,
      "France" = 1,
      "U.S.A." = 2,
      "Fiji" = 3,
      "Ecuador" = 4,
      "Mexico" = 5,
      "Switzerland" = 6,
      "Netherlands" = 7,
      "Spain" = 8,
      "Peru" = 9,
      "Canada" = 10,
      "Italy" = 11,
      "Brazil" = 12,
      "U.K." = 13,
      "Australia" = 14,
      "Wales" = 15,
      "Belgium"= 16,
      "Germany"= 17,
      "Russia"= 18,
      "Puerto Rico"= 19,
      "Venezuela"=20,
      "Columbia"=21,
      "Japan"=22,
      "New Zealand"=23,
      "Costa Rico"=24,
      "South Korea"=25,
      "Amsterdam"=26,
      "Scotland"=27,
      "Martinique"=28,
      "Sao Tome"=29,
      "Argentina"=30,
      "Guatemala"=31,
      "South Africa"=32,
      "Bolivia"=33,
      "St. Lucia"=34,
      "Portugal"=35,
      "Singapore"=36,
      "Vietnam"=37,
      "Grenada"=38,
      "Israel"=39,
      "India"=40,
      "Czech Republic"=41,
      "Dominican Republic"=42,
      "Finland"=43,
      "Madagascar"=44,
      "Philippines"=45,
      "Sweden"=46,
      "Poland"=47,
      "Austria"=48,
      "Honduras"=49,
      "Nicaragua"=50,
      "Lithuania"=51,
      "Chile"=52,
      "Ghana"=53,
      "Iceland"=54,
      "Hungary"=55,
      "Denmark"=56,
      "Suriname"=57,
      "Ireland"=58
  ))
## Warning: Unreplaced values treated as NA as .x is not compatible. Please
## specify replacements exhaustively or supply .default
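The warning above appears when some values in the column are not covered by the hand-written list, so they become NA. One possible alternative, sketched below on made-up values, is to let R build the codes from the data itself with `factor()`. This is a different technique from the `recode()` call used in this study, and the `locations` vector is an assumption for illustration.

```r
# Made-up company locations, including a repeat and a missing value
locations <- c("France", "U.S.A.", "Fiji", "France", "Ecuador", NA)

# Levels in order of first appearance, so every observed value gets a code
lvls <- unique(locations[!is.na(locations)])

# Integer code for each row; NA stays NA
codes <- as.integer(factor(locations, levels = lvls))

# Lookup table to translate codes back to names
lookup <- data.frame(code = seq_along(lvls), location = lvls,
                     stringsAsFactors = FALSE)

print(codes)
print(lookup)
```

Either way, the codes are only labels: the numbers carry no order or distance, so a country coded 40 is not "more" than one coded 2.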
#Assign numbers to factors to make Bean Type (beantype) numeric
chocolate_clean <- chocolate_clean %>%
  mutate(beantype = recode(
    chocolate_clean$`BeanType`,
      "Amazon"=1,
      "Amazon mix"=2,
      "Amazon, ICS"=3,
      "Beniano"=4,
      "Blend"=5,
      "Blend-Forastero,Criollo"=6,
      "CCN51"=7,
      "ciol"=8,
      "ciol (Arriba)"=9,
      "Criollo"=10,
      "Criollo (Amarru)"=11,
      "Criollo (Ocumare 61)"=12,
      "Criollo (Ocumare 67)"=13,
      "Criollo (Ocumare 77)"=14,
      "Criollo (Ocumare)"=15,
      "Criollo (Porcela)"=16,
      "Criollo (Wild)"=17,
      "Criollo, +"=18,
      "Criollo, Forastero"=19,
      "Criollo, Trinitario"=20,
      "EET"=21,
      "Forastero"=22,
      "Forastero (Amelodo)"=23,
      "Forastero (Arriba)"=24,
      "Forastero (Arriba) ASS"=25,
      "Forastero (Arriba) ASSS"=26,
      "Forastero (Catongo)"=27,
      "Forastero (ciol)"=28,
      "Forastero (Parazinho)"=29,
      "Forastero(Arriba, CCN)"=30,
      "Forastero, Trinitario"=31,
      "Mati"=32,
      "Trinitario"=33,
      "Trinitario (85% Criollo)"=34,
      "Trinitario (Amelodo)"=35,
      "Trinitario (Scavi)"=36,
      "Trinitario, ciol"=37,
      "Trinitario, Criollo"=38,
      "Trinitario, Forastero"=39,
      "Trinitario, TCGA"=40
  ))
## Warning: Unreplaced values treated as NA as .x is not compatible. Please
## specify replacements exhaustively or supply .default
#Assign numbers to factors to make Broad Bean Origin (bborigin) numeric
chocolate_clean <- chocolate_clean %>%
  mutate(bborigin = recode(
    chocolate_clean$`BroadBeanOrigin`,
      "Africa, Carribean, C. Am."=1,
      "Australia"=2,
      "Belize"=3,
      "Bolivia"=4,
      "Brazil"=5,
      "Burma"=6,
      "Cameroon"=7,
      "Carribean"=8,
      "Carribean(DR/Jam/Tri)"=9,
      "Central and S. America"=10,
      "Colombia"=11,
      "Colombia, Ecuador"=12,
      "Congo"=13,
      "Cost Rica, Venezuela"=14,
      "Costa Rica"=15,
      "Cuba"=16,
      "Domincan Republic, Madagascar"=17,
      "Domincan Republic"=18,
      "Dominican Republic, Bali"=19,
      "Dominican Republic"=20,
      "Domincan Republic, Ecuador, Peru"=21,
      "Ecuador"=22,
      "Ecuador, Costa Rica"=23,
      "Ecuador, Madagascar, PNG"=24,
      "El Salvador"=25,
      "Fiji"=26,
      "Gabon"=27,
      "Ghana"=28,
      "GhaNA& Madagascar"=29,
      "Ghana, Domincan Republic "=30,
      "Ghana, Pama, Ecuador"=31,
      "Greda, PNG, Hawaii, Haiti, Madagascar"=32,
      "Greda"=33,
      "Guatemala"=34,
      "Haiti"=35,
      "Hawaii"=36,
      "Honduras"=37,
      "India"=38,
      "Indonesia"=39,
      "Indonesia, Ghana"=40,
      "Ivory Coast"=41,
      "Jamaica"=42,
      "Liberia"=43,
      "Madagascar, Java, PNG"=44,
      "Madagascar"=45,
      "Madagascar & Ecuador"=46,
      "Malaysia"=47,
      "Martinique"=48,
      "Mexico"=49,
      "Nicaragua"=50,
      "Nigeria"=51,
      "Pama"=52,
      "Papua New Guinea"=53,
      "Peru"=54,
      "Peru(SMartin,Pangoa,ciol)"=55,
      "Peru, Belize"=56,
      "Peru, Dom. Rep"=57,
      "Peru, Ecuador"=58,
      "Peru, Ecuador, Venezuela"=59,
      "Peru, Madagascar, Dominican Republic"=60,
      "Peru, Madagascar"=61,
      "Philippines"=62,
      "PNG, Vanuatu, Mad"=63,
      "Principe"=64,
      "Puerto Rico"=65,
      "Samoa"=66,
      "Sao Tome"=67,
      "Sao Tome & Principe"=68,
      "Solomon Islands"=69,
      "South America"=70,
      "South America, Africa"=71,
      "Sri Lanka"=72,
      "St. Lucia"=73,
      "Suriname"=74,
      "Tanzania"=75,
      "Tobago"=76,
      "Togo"=77,
      "Trinidad"=78,
      "Trinidad, Ecuador"=79,
      "Trinidad, Tobago"=80,
      "Trinidad-Tobago"=81,
      "Uganda"=82,
      "Vanuatu"=83,
      "Venezuela, Bolivia, D.R."=84,
      "Venezuela, Trinidad, Ecuador"=85,
      "Venezuela, Indonesia, Ecuador"=86,
      "Venezuela, Trinidad, Madagascar"=87,
      "Venezuela,Ecuador,Peru,Nicaragua"=88,
      "Venezuela,Africa,Brasil,Peru,Mexico"=89,
      "Venezuela"=90,
      "Venezuela, Carribean"=91,
      "Venezuela, Dom. Rep."=92,
      "Venezuela, Gha"=93,
      "Venezuela, Java"=94,
      "Venezuela, Trinidad"=95,
      "Venezuela/ Ghana"=96,
      "Vietnam"=97,
      "West Africa"=98,
      "Guatemala, Domincan Republic, Peru, Madagascar, PNG"=99
  ))
## Warning: Unreplaced values treated as NA as .x is not compatible. Please
## specify replacements exhaustively or supply .default
#Assign numbers to factors to make Companies/Maker (companies) numeric
chocolate_clean <- chocolate_clean %>%
  mutate(companies = recode(
    chocolate_clean$`Company`,
      "A. Morin"=1,
      "Acalli"=2,
      "Adi"=3,
      "Aequare (Gianduja)"=3,
      "Ah Cacao"=4,
      "Akesson's (Pralus)"=5,
      "Alain Ducasse"=6,
      "Alexandre"=7,
      "Altus aka Cao Artisan"=8,
      "Amano"=9,
      "Amatller (Simon Coll)"=10,
      "Amazona"=11,
      "Ambrosia"=12,
      "Amedei"=13,
      "AMMA"=14,
      "Anahata"=15,
      "Animas"=16,
      "Ara"=17,
      "Arete"=18,
      "Artisan du Chocolat"=19,
      "Artisan du Chocolat (Casa Luker)"=20,
      "Askinosie"=21,
      "Bahen & Co."=22,
      "Bakau"=23,
      "Bar Au Chocolat"=24,
      "Baravelli's"=25,
      "Batch"=26,
      "Beau Cacao"=27,
      "Beehive"=28,
      "Belcolade"=29,
      "Bellflower"=30,
      "Belyzium"=31,
      "Benoit Nihant"=32,
      "Bernachon"=33,
      "Beschle (Felchlin)"=34,
      "Bisou"=35,
      "Bittersweet Origins"=36,
      "Black Mountain"=37,
      "Black River (A. Morin)"=38,
      "Blanxart"=39,
      "Blue Bandana"=40,
      "Bonnat"=41,
      "Bouga Cacao (Tulicorp)"=42,
      "Bowler Man"=43,
      "Brasstown aka It's Chocolate"=44,
      "Brazen"=45,
      "Breeze Mill"=46,
      "Bright"=47,
      "Britarev"=48,
      "Bronx Grrl Chocolate"=49,
      "Burnt Fork Bend"=50,
      "Cacao Arabuco"=51,
      "Cacao Atlanta"=52,
      "Cacao Barry"=53,
      "Cacao de Origen"=54,
      "Cacao de Origin"=55,
      "Cacao Hunters"=56,
      "Cacao Market"=57,
      "Cacao Prieto"=58,
      "Cacao Sampaka"=59,
      "Cacao Store"=60,
      "Cacaosuyo (Theobroma Inversiones)"=61,
      "Cacaoyere (Ecuatoriana)"=62,
      "Callebaut"=63,
      "C-Amaro"=64,
      "Cao"=65,
      "Caoni (Tulicorp)"=66,
      "Captain Pembleton"=67,
      "Caribeans"=68,
      "Carlotta Chocolat"=69,
      "Castronovo"=70,
      "Cello"=71,
      "Cemoi"=72,
      "Chaleur B"=73,
      "Charm School"=74,
      "Chchukululu (Tulicorp)"=75,
      "Chequessett"=76,
      "Chloe Chocolat"=77,
      "Chocablog"=78,
      "Choco Del Sol"=79,
      "Choco Dong"=80,
      "Chocolarder"=81,
      "Chocola'te"=82,
      "Chocolate Alchemist-Philly"=83,
      "Chocolate Con Amor"=84,
      "Chocolate Conspiracy"=85,
      "Chocolate Makers"=86,
      "Chocolate Tree, The"=87,
      "Chocolats Privilege"=88,
      "ChocoReko"=89,
      "Chocovic"=90,
      "Chocovivo"=91,
      "Choklat"=92,
      "Chokolat Elot (Girard)"=93,
      "Choocsol"=94,
      "Christopher Morel (Felchlin)"=95,
      "Chuao Chocolatier"=96,
      "Chuao Chocolatier (Pralus)"=97,
      "Claudio Corallo"=98,
      "Cloudforest"=99,
      "Coleman & Davis"=100,
      "Compania de Chocolate (Salgado)"=101,
      "Condor"=102,
      "Confluence"=103,
      "Coppeneur"=104,
      "Cote d' Or (Kraft)"=105,
      "Cravve"=106,
      "Creo"=107,
      "Daintree"=108,
      "Dalloway"=109,
      "Damson"=110,
      "Dandelion"=111,
      "Danta"=112,
      "DAR"=113,
      "Dark Forest"=114,
      "Davis"=115,
      "De Mendes"=116,
      "De Villiers"=117,
      "Dean and Deluca (Belcolade)"=118,
      "Debauve & Gallais (Michel Cluizel)"=119,
      "Desbarres"=120,
      "DeVries"=121,
      "Dick Taylor"=122,
      "Doble & Bignall"=123,
      "Dole (Guittard)"=124,
      "Dolfin (Belcolade)"=125,
      "Domori"=126,
      "Dormouse"=127,
      "Duffy's"=128,
      "Dulcinea"=129,
      "Durand"=130,
      "Durci"=131,
      "East Van Roasters"=132,
      "Eau de Rose"=133,
      "Eclat (Felchlin)"=134,
      "Edelmond"=135,
      "El Ceibo"=136,
      "El Rey"=137,
      "Emerald Estate"=138,
      "Emily's"=139,
      "ENNA"=140,
      "Enric Rovira (Claudio Corallo)"=141,
      "Erithaj (A. Morin)"=142,
      "Escazu"=143,
      "Ethel's Artisan (Mars)"=144,
      "Ethereal"=145,
      "Fearless (AMMA)"=146,
      "Feitoria Cacao"=147,
      "Felchlin"=148,
      "Finca"=149,
      "Forever Cacao"=150,
      "Forteza (Cortes)"=151,
      "Fossa"=152,
      "Franceschi"=153,
      "Frederic Blondeel"=154,
      "French Broad"=155,
      "Fresco"=156,
      "Friis Holm"=157,
      "Friis Holm (Bonnat)"=158,
      "Fruition"=159,
      "Garden Island"=160,
      "Georgia Ramon"=161,
      "Glennmade"=162,
      "Goodnow Farms"=163,
      "Grand Place"=164,
      "Green & Black's (ICAM)"=165,
      "Green Bean to Bar"=166,
      "Grenada Chocolate Co."=167,
      "Guido Castagna"=168,
      "Guittard"=169,
      "Habitual"=170,
      "Hachez"=171,
      "Hacienda El Castillo"=172,
      "Haigh"=173,
      "Harper Macaw"=174,
      "Heilemann"=175,
      "Heirloom Cacao Preservation (Brasstown)"=176,
      "Heirloom Cacao Preservation (Fruition)"=177,
      "Heirloom Cacao Preservation (Guittard)"=178,
      "Heirloom Cacao Preservation (Manoa)"=179,
      "Heirloom Cacao Preservation (Millcreek)"=180,
      "Heirloom Cacao Preservation (Mindo)"=181,
      "Heirloom Cacao Preservation (Zokoko)"=182,
      "hello cocoa"=183,
      "hexx"=184,
      "Hogarth"=185,
      "Hoja Verde (Tulicorp)"=186,
      "Holy Cacao"=187,
      "Honest"=188,
      "Hotel Chocolat"=189,
      "Hotel Chocolat (Coppeneur)"=190,
      "Hummingbird"=191,
      "Idilio (Felchlin)"=192,
      "Indah"=193,
      "Indaphoria"=194,
      "Indi"=195,
      "iQ Chocolate"=196,
      "Isidro"=197,
      "Izard"=198,
      "Jacque Torres"=199,
      "Jordis"=200,
      "Just Good Chocolate"=201,
      "Kah Kow"=202,
      "Kakao"=203,
      "Kallari (Ecuatoriana)"=204,
      "Kaoka (Cemoi)"=205,
      "Kerchner"=206,
      "Ki' Xocolatl"=207,
      "Kiskadee"=208,
      "Kto"=209,
      "K'ul"=210,
      "Kyya"=211,
      "L.A. Burdick (Felchlin)"=212,
      "La Chocolaterie Nanairo"=213,
      "La Maison du Chocolat (Valrhona)"=214,
      "La Oroquidea"=215,
      "La Pepa de Oro"=216,
      "Laia aka Chat-Noir"=217,
      "Lajedo do Ouro"=218,
      "Lake Champlain (Callebaut)"=219,
      "L'Amourette"=220,
      "Letterpress"=221,
      "Levy"=222,
      "Lilla"=223,
      "Lillie Belle"=224,
      "Lindt & Sprungli"=225,
      "Loiza"=226,
      "Lonohana"=227,
      "Love Bar"=228,
      "Luker"=229,
      "Machu Picchu Trading Co."=230,
      "Madecasse (Cinagra)"=231,
      "Madre"=232,
      "Maglio"=233,
      "Majani"=234,
      "Malagasy (Chocolaterie Robert)"=235,
      "Malagos"=236,
      "Malie Kai (Guittard)"=237,
      "Malmo"=238,
      "Mana"=239,
      "Manifesto Cacao"=240,
      "Manoa"=241,
      "Manufaktura Czekolady"=242,
      "Map Chocolate"=243,
      "Marana"=244,
      "Marigold's Finest"=245,
      "Marou"=246,
      "Mars"=247,
      "Marsatta"=248,
      "Martin Mayer"=249,
      "Mast Brothers"=250,
      "Matale"=251,
      "Maverick"=252,
      "Mayacama"=253,
      "Meadowlands"=254,
      "Menakao (aka Cinagra)"=255,
      "Mesocacao"=256,
      "Metiisto"=257,
      "Metropolitan"=258,
      "Michel Cluizel"=259,
      "Middlebury"=260,
      "Millcreek Cacao Roasters"=261,
      "Mindo"=262,
      "Minimal"=263,
      "Mission"=264,
      "Mita"=265,
      "Moho"=266,
      "Molucca"=267,
      "Momotombo"=268,
      "Monarque"=269,
      "Monsieur Truffe"=270,
      "Montecristi"=271,
      "Muchomas (Mesocacao)"=272,
      "Mutari"=273,
      "Nahua"=274,
      "Naive"=275,
      "Na�ve"=276,
      "Nanea"=277,
      "Nathan Miller"=278,
      "Neuhaus (Callebaut)"=279,
      "Nibble"=280,
      "Night Owl"=281,
      "Noble Bean aka Jerjobo"=282,
      "Noir d' Ebine"=283,
      "Nova Monda"=284,
      "Nuance"=285,
      "Nugali"=286,
      "Oakland Chocolate Co."=287,
      "Obolo"=288,
      "Ocelot"=289,
      "Ocho"=290,
      "Ohiyo"=291,
      "Oialla by Bojessen (Malmo)"=292,
      "Olive and Sinclair"=293,
      "Olivia"=294,
      "Omanhene"=295,
      "Omnom"=296,
      "organicfair"=297,
      "Original Beans (Felchlin)"=298,
      "Original Hawaiin Chocolate Factory"=299,
      "Orquidea"=300,
      "Pacari"=301,
      "Palette de Bine"=302,
      "Pangea"=303,
      "Park 75"=304,
      "Parliament"=305,
      "Pascha"=306,
      "Patric"=307,
      "Paul Young"=308,
      "Peppalo"=309,
      "Pierre Marcolini"=310,
      "Pinellas"=311,
      "Pitch Dark"=312,
      "Pomm (aka Dead Dog)"=313,
      "Potomac"=314,
      "Pralus"=315,
      "Pump Street Bakery"=316,
      "Pura Delizia"=317,
      "Q Chocolate"=318,
      "Quetzalli (Wolter)"=319,
      "Raaka"=320,
      "Rain Republic"=321,
      "Rancho San Jacinto"=322,
      "Ranger"=323,
      "Raoul Boulanger"=324,
      "Raw Cocoa"=325,
      "Republica del Cacao (aka Confecta)"=326,
      "Ritual"=327,
      "Roasting Masters"=328,
      "Robert (aka Chocolaterie Robert)"=329,
      "Rococo (Grenada Chocolate Co.)"=330,
      "Rogue"=331,
      "Rozsavolgyi"=332,
      "S.A.I.D."=333,
      "Sacred"=334,
      "Salgado"=335,
      "Santander (Compania Nacional)"=336,
      "Santome"=337,
      "Scharffen Berger"=338,
      "Seaforth"=339,
      "Shark Mountain"=340,
      "Shark's"=341,
      "Shattel"=342,
      "Shattell"=343,
      "Sibu"=344,
      "Sibu Sura"=345,
      "Silvio Bessone"=346,
      "Sirene"=347,
      "Sjolinds"=348,
      "Smooth Chocolator, The"=349,
      "Snake & Butterfly"=350,
      "Sol Cacao"=351,
      "Solkiki"=352,
      "Solomons Gold"=353,
      "Solstice"=354,
      "Soma"=355,
      "Somerville"=356,
      "Soul"=357,
      "Spagnvola"=358,
      "Spencer"=359,
      "Sprungli (Felchlin)"=360,
      "SRSLY"=361,
      "Starchild"=362,
      "Stella (aka Bernrain)"=363,
      "Stone Grindz"=364,
      "StRita Supreme"=365,
      "Sublime Origins"=366,
      "Summerbird"=367,
      "Suruca Chocolate"=368,
      "Svenska Kakaobolaget"=369,
      "Szanto Tibor"=370,
      "Tabal"=371,
      "Tablette (aka Vanillabeans)"=372,
      "Tan Ban Skrati"=373,
      "Taza"=374,
      "TCHO"=375,
      "Tejas"=376,
      "Terroir"=377,
      "The Barn"=378,
      "Theo"=379,
      "Theobroma"=380,
      "Timo A. Meyer"=381,
      "To'ak (Ecuatoriana)"=382,
      "Tobago Estate (Pralus)"=383,
      "Tocoti"=384,
      "Treehouse"=385,
      "Tsara (Cinagra)"=386,
      "twenty-four blackbirds"=387,
      "Two Ravens"=388,
      "Un Dimanche A Paris"=389,
      "Undone"=390,
      "Upchurch"=391,
      "Urzi"=392,
      "Valrhona"=393,
      "Vanleer (Barry Callebaut)"=394,
      "Vao Vao (Chocolaterie Robert)"=395,
      "Vicuna"=396,
      "Videri"=397,
      "Vietcacao (A. Morin)"=398,
      "Vintage Plantations"=399,
      "Vintage Plantations (Tulicorp)"=400,
      "Violet Sky"=401,
      "Vivra"=402,
      "Wellington Chocolate Factory"=403,
      "Whittakers"=404,
      "Wilkie's Organic"=405,
      "Willie's Cacao"=406,
      "Wm"=407,
      "Woodblock"=408,
      "Xocolat"=409,
      "Xocolla"=410,
      "Zak's"=411,
      "Zart Pralinen"=412,
      "Zokoko"=413,
      "Zotter"=414
  ))
## Warning: Unreplaced values treated as NA as .x is not compatible. Please
## specify replacements exhaustively or supply .default
write.csv(chocolate_clean, file="flavor_of_cacao.csv")

DATA ANALYSIS: VISUALISATION

Let us look for a trend by plotting the ratings against the cocoa percentage (concentration).

Graph of Rating against Cocoa Percentage.

ggplot(chocolate_clean, aes(x = CocoaPercent, y = Rating)) +
  geom_jitter() +
  geom_smooth(method = "lm") +
  xlab("Cocoa Percent")

This plot shows that the higher the cocoa concentration, the lower the rating: the higher the percentage of cocoa, the more bitter the chocolate becomes.
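The direction of a trend like this can also be checked numerically. The sketch below computes a correlation and a straight-line fit, the same kind of line that `geom_smooth(method = "lm")` draws. The `cocoa` and `rating` vectors are made-up numbers for illustration, not values from the chocolate data.

```r
# Made-up cocoa fractions and ratings (illustrative only)
cocoa  <- c(0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 1.00)
rating <- c(3.50, 3.25, 3.50, 3.25, 3.25, 3.00, 3.00, 2.75, 2.50)

# Pearson correlation: the sign gives the direction of the relationship
r <- cor(cocoa, rating)

# Straight-line fit: rating = intercept + slope * cocoa
fit   <- lm(rating ~ cocoa)
slope <- coef(fit)[["cocoa"]]

print(r)
print(slope)
```

A negative slope means the fitted rating drops as the cocoa fraction rises. Running the same two calls on `chocolate_clean$CocoaPercent` and `chocolate_clean$Rating` would put a number on the trend seen in the plot.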

Graph of Rating against Company with respect to the location.

bar <- ggplot(chocolate_clean, aes(x = companies, y = Rating, color = location)) +
  geom_bar(stat = "identity") +
  xlab("Company")

bar+coord_polar();
## Warning: Removed 4 rows containing missing values (position_stack).

bar+coord_flip();
## Warning: Removed 4 rows containing missing values (position_stack).

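The tallest bar, the only one that stands out in the 350 to 400 range of company codes, belongs to Soma (code 355), which is located in Canada. Because geom_bar(stat = "identity") stacks every rating a company received, this height is the total of Soma’s ratings across all its bars. By this measure, Soma is the best chocolate-producing company in the data.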
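Since stacked bars add up all of a company's ratings, a company with many reviewed bars can look taller than one with fewer but better-rated bars. The sketch below shows the difference between totals and averages using base R's `aggregate()`. The companies "A" and "B" and their ratings are hypothetical.

```r
# Hypothetical ratings: company A has many average bars, B has few good ones
toy <- data.frame(
  Company = c("A", "A", "A", "A", "B", "B"),
  Rating  = c(3.0, 3.0, 3.0, 3.0, 4.0, 3.5),
  stringsAsFactors = FALSE
)

# Total rating per company (what a stacked bar chart displays)
totals <- aggregate(Rating ~ Company, data = toy, FUN = sum)

# Mean rating per company (average quality per bar)
means <- aggregate(Rating ~ Company, data = toy, FUN = mean)

top_by_total <- as.character(totals$Company[which.max(totals$Rating)])
top_by_mean  <- as.character(means$Company[which.max(means$Rating)])

print(totals)
print(means)
```

Here the two rankings disagree, so comparing mean ratings per company on the real data would be a useful cross-check on the bar chart.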

Graph of the Rating against the REF

ggplot(chocolate_clean,aes(x=REF, y=Rating))+
  geom_jitter() +
  geom_smooth(method = "lm")+
  xlab("REF");

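The rating increases as the REF increases. REF is a reference number tied to when a review was entered, with higher values being more recent, so this suggests that more recently reviewed bars tend to be rated higher.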