# Read the semicolon-separated coffee data; text columns become factors
coffeData <- read.table("coffeData.csv", header = TRUE, sep = ";", dec = ".", stringsAsFactors = TRUE)
summary(coffeData)
## X Species Country.of.Origin Fragrance...Aroma
## Min. : 1 Arabica:1309 Mexico :237 Min. :5.080
## 1st Qu.: 335 Robusta: 28 Colombia :183 1st Qu.:7.420
## Median : 669 Guatemala :180 Median :7.580
## Mean : 669 Brazil :132 Mean :7.572
## 3rd Qu.:1003 Taiwan : 75 3rd Qu.:7.750
## Max. :1337 United States (Hawaii): 73 Max. :8.750
## (Other) :457
## Flavor Aftertaste Salt...Acid Mouthfeel
## Min. :6.080 Min. :6.170 Min. :5.250 Min. :5.080
## 1st Qu.:7.330 1st Qu.:7.250 1st Qu.:7.330 1st Qu.:7.330
## Median :7.580 Median :7.420 Median :7.580 Median :7.500
## Mean :7.527 Mean :7.407 Mean :7.541 Mean :7.524
## 3rd Qu.:7.750 3rd Qu.:7.580 3rd Qu.:7.750 3rd Qu.:7.750
## Max. :8.830 Max. :8.670 Max. :8.750 Max. :8.750
##
## Balance Bitter...Sweet Uniform.Cup Clean.Cup
## Min. : 5.250 Min. :5.250 Min. : 6.000 Min. : 0.000
## 1st Qu.:10.000 1st Qu.:7.330 1st Qu.:10.000 1st Qu.:10.000
## Median :10.000 Median :7.500 Median :10.000 Median :10.000
## Mean : 9.868 Mean :7.527 Mean : 9.844 Mean : 9.849
## 3rd Qu.:10.000 3rd Qu.:7.670 3rd Qu.:10.000 3rd Qu.:10.000
## Max. :10.000 Max. :8.580 Max. :10.000 Max. :10.000
##
## Cupper.Points quality_score
## Min. : 5.17 Min. :63.08
## 1st Qu.: 7.25 1st Qu.:81.17
## Median : 7.50 Median :82.50
## Mean : 7.51 Mean :82.17
## 3rd Qu.: 7.75 3rd Qu.:83.67
## Max. :10.00 Max. :90.58
##
library(ggplot2)
library(gridExtra)  # grid.arrange() places several plots on one page
# Histograms of flavor and overall quality score (default of 30 bins)
g1 <- ggplot(coffeData, aes(x = Flavor)) +
  geom_histogram(fill = "#F08787") +
  labs(title = "Coffee flavor histogram")
g1
g2 <- ggplot(coffeData, aes(x = quality_score)) +
  geom_histogram(fill = "#6D94C5") +
  labs(title = "Coffee quality score histogram")
g2
# Horizontal boxplots of the same two variables
g3 <- ggplot(coffeData, aes(x = Flavor)) +
  geom_boxplot(fill = "#F08787") +
  labs(title = "Coffee flavor boxplot")
g4 <- ggplot(coffeData, aes(x = quality_score)) +
  geom_boxplot(fill = "#6D94C5") +
  labs(title = "Coffee quality score boxplot")
# Stack each pair vertically for comparison
grid.arrange(g3, g4)
grid.arrange(g1, g2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
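The messages above come from ggplot2 falling back to 30 bins and asking for an explicit `binwidth`. The original analysis does not choose one; a common data-driven option, assumed here, is the Freedman-Diaconis rule, binwidth = 2 * IQR(x) / n^(1/3). A minimal base-R sketch; `fd_binwidth` is a hypothetical helper name:

```r
# Hypothetical helper: Freedman-Diaconis bin width, 2 * IQR / n^(1/3).
# IQR() uses R's default quantile type 7.
fd_binwidth <- function(x) {
  x <- x[!is.na(x)]                  # ignore missing values
  2 * IQR(x) / length(x)^(1/3)
}

# IQR(1:8) = 3.5 and 8^(1/3) = 2, so the width is 3.5
fd_binwidth(1:8)
```

With the coffee data this would be used as, for example, `geom_histogram(binwidth = fd_binwidth(coffeData$Flavor), fill = "#F08787")`, which also silences the message. Base R's `nclass.FD()` (package grDevices) applies the same rule but returns a number of bins rather than a width.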
# Jittered scatter plot (reduces overplotting) with a least-squares line
ggplot(coffeData, aes(x = Flavor, y = quality_score)) +
  geom_jitter(colour = "#748873") +
  geom_smooth(method = "lm", colour = "#F08787")
## `geom_smooth()` using formula = 'y ~ x'
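`geom_smooth(method = "lm")` fits an ordinary least-squares line with formula `y ~ x`, as the message reports, but only draws it. Fitting the same model with `lm()` exposes the intercept and slope. The sketch uses a small hypothetical data frame so it runs on its own; with the real data the call would be `lm(quality_score ~ Flavor, data = coffeData)`.

```r
# Hypothetical toy data shaped like coffeData (not real scores)
toy <- data.frame(
  Flavor        = c(7.0, 7.25, 7.5, 7.75, 8.0),
  quality_score = c(80, 81, 82, 83, 84)
)

# The same model geom_smooth(method = "lm") draws: quality_score ~ Flavor
fit <- lm(quality_score ~ Flavor, data = toy)

# Intercept 52 and slope 4 for these exactly linear toy values
coef(fit)
```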
cor(x=coffeData$Flavor, y=coffeData$quality_score, method='pearson')
## [1] 0.8348271
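Interpretation
The Pearson correlation coefficient is \(r=0.83\), which indicates a strong positive linear relationship between flavor and quality score: coffees with higher flavor ratings tend to have higher overall quality scores. Note, however, that the means of the ten attribute columns in the summary above add up to the mean quality_score (about 82.17), which suggests that quality_score is the sum of those attributes, Flavor among them; part of this correlation is therefore built into how the score is computed.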
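Pearson's r is the covariance of the two variables divided by the product of their standard deviations, \(r = \mathrm{cov}(x, y) / (s_x s_y)\). The sketch below checks that formula against `cor()` and adds `cor.test()`, which the original analysis does not run, to attach a p-value and a 95% confidence interval to r. It uses hypothetical toy vectors; on the real data, substitute `coffeData$Flavor` and `coffeData$quality_score`.

```r
# Hypothetical toy vectors (not the coffee data)
x <- c(6.5, 7.0, 7.2, 7.5, 7.8, 8.1)
y <- c(78.0, 80.5, 81.0, 82.5, 82.0, 85.0)

# Pearson's r from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y, method = "pearson")

# cor.test(): t-test of H0 rho = 0 plus a 95% confidence interval
ct <- cor.test(x, y, method = "pearson")
ct$estimate   # same value as r_builtin
ct$conf.int   # interval for the population correlation

# With a single predictor, the R^2 of the least-squares fit equals r^2
r2 <- summary(lm(y ~ x))$r.squared
```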
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
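# Keep only the South American producers of interest
SurAmericaCountries <- filter(coffeData,
                              Country.of.Origin %in% c("Colombia", "Brazil", "Ecuador", "Peru"))
# List every country present, useful for checking the names used above
unique(coffeData$Country.of.Origin)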
## [1] Uganda India
## [3] United States Ecuador
## [5] Vietnam Ethiopia
## [7] Guatemala Brazil
## [9] Peru United States (Hawaii)
## [11] Indonesia China
## [13] Costa Rica Mexico
## [15] Honduras Taiwan
## [17] Nicaragua Tanzania, United Republic Of
## [19] Kenya Thailand
## [21] Colombia Panama
## [23] Papua New Guinea El Salvador
## [25] Japan United States (Puerto Rico)
## [27] Haiti Burundi
## [29] Philippines Rwanda
## [31] Malawi Laos
## [33] Zambia Myanmar
## [35] Mauritius Cote d?Ivoire
## 36 Levels: Brazil Burundi China Colombia Costa Rica Cote d?Ivoire ... Zambia
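`%in%` keeps only exact matches, so a misspelled or absent country name silently yields zero rows. The names can be checked against the factor levels programmatically, and `droplevels()` removes the empty levels that `filter()` leaves on the factor (ggplot drops them on a discrete axis by default, but `table()` and `tapply()` do not). A minimal sketch on a hypothetical factor standing in for `coffeData$Country.of.Origin`:

```r
# Hypothetical stand-in for coffeData$Country.of.Origin
country <- factor(c("Mexico", "Colombia", "Brazil", "Peru", "Colombia", "Kenya"))
wanted  <- c("Colombia", "Brazil", "Ecuador", "Peru")

# Requested names that are not levels of the factor (typos or absent countries)
setdiff(wanted, levels(country))   # "Ecuador" in this toy example

# Keep the matching values and drop the now-empty levels
sub <- droplevels(country[country %in% wanted])
table(sub)   # Brazil 1, Colombia 2, Peru 1
```

On the real data frame the equivalent step would be `SurAmericaCountries <- droplevels(SurAmericaCountries)`.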
# Quality score distribution for each selected country
ggplot(SurAmericaCountries, aes(x = Country.of.Origin, y = quality_score, fill = Country.of.Origin)) +
  geom_boxplot() +
  labs(title = "Quality score boxplots by country", y = "Quality score")
Interpretation
The boxplots show that Colombia's coffee quality scores are generally higher than those of the other South American countries in the comparison (Brazil, Ecuador and Peru).
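The visual comparison can be backed with numbers: the sample size and median quality score per country. Counts matter here because a boxplot built from very few coffees is hard to interpret. A sketch with `table()` and `tapply()` on hypothetical toy data shaped like `SurAmericaCountries`:

```r
# Hypothetical toy data shaped like SurAmericaCountries (not real scores)
toy <- data.frame(
  Country.of.Origin = c("Brazil", "Brazil", "Colombia", "Colombia", "Colombia", "Peru"),
  quality_score     = c(81, 83, 84, 85, 83, 80)
)

# Sample size and median quality score per country
n   <- table(toy$Country.of.Origin)
med <- tapply(toy$quality_score, toy$Country.of.Origin, median)

# Countries ordered from highest to lowest median
sort(med, decreasing = TRUE)
n
```

On the real data, apply `droplevels()` first so countries with no rows do not appear as `NA` in the `tapply()` result. To ask whether the differences exceed sampling noise, `kruskal.test(quality_score ~ Country.of.Origin, data = SurAmericaCountries)` is one option, not part of the original analysis.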
Conclusions:
Colombia shows higher coffee quality scores than the other South American countries analyzed (Brazil, Ecuador, Peru).
Flavor and quality score are strongly and positively correlated (\(r \approx 0.83\)).
Both flavor and quality score contain outliers (values beyond the boxplot whiskers).
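The outliers in the last point are the values a boxplot draws beyond its whiskers. By Tukey's rule, which `geom_boxplot()` follows by default, these are values more than 1.5 × IQR below the first quartile or above the third. A sketch that lists them; `iqr_outliers` is a hypothetical helper name:

```r
# Hypothetical helper: values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, k = 1.5)
iqr_outliers <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # quantile type 7 (R's default)
  iqr <- q[2] - q[1]
  x[x < q[1] - k * iqr | x > q[2] + k * iqr]
}

iqr_outliers(c(1:10, 100))  # 100
```

Applied to the coffee data, `iqr_outliers(coffeData$Flavor)` and `iqr_outliers(coffeData$quality_score)` would list the points drawn beyond the whiskers.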