Universidad Latina logo
Objective: produce a reproducible document (HTML) that includes the following:
Find out how to insert an image into an R Markdown document:
Find a chart (image) online and insert it into the document. Then explain the chart's components and what information it encodes.
Using the data in the cafe.csv table, do the following:
Submit an HTML document created with rmarkdown that contains the code. Upload the final product to the platform.
For this section and section 3 we use the post by Peter H. Diamandis available at World in Data. We apply the required styles to the text, plus a few additional ones, using HTML, R Markdown, and CSS tools. P.S. Requirements 1 and 2 are also applied on the cover page of this work.
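As a minimal sketch of how a text color can be applied through inline HTML and CSS in an HTML document knitted from R Markdown, the helper below builds a `<span>` tag with a `style` attribute. The function name `color_span` and the sample phrase are hypothetical choices for illustration and are not the code actually used in this report.

```r
# Hypothetical helper (not from the original report): wrap text in an
# HTML <span> that carries an inline CSS color.
color_span <- function(text, color) {
  sprintf("<span style='color: %s;'>%s</span>", color, text)
}

# Build a styled fragment; in R Markdown it could be placed in the text
# with inline code such as `r color_span("negative news", "red")`.
styled <- color_span("negative news", "red")
print(styled)
```

This approach only affects HTML output; other output formats would need their own markup.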
Before I share the new “data” with you, it’s essential that you understand why this matters.
We live in a world where we are constantly bombarded by negative news from every angle. If you turn on CNN (what I call the Crisis News Network), you’ll predominantly hear about death, terrorism, airplane crashes, bombings, financial crisis and political scandal.
I think of the news as a drug pusher, and negative news is their drug.
We humans are wired to pay 10x more attention to negative news than positive news.
Being able to rapidly notice and pay attention to negative news (like a predator or a dangerous fire) was an evolutionary advantage to keep you alive on the savannahs of Africa millions of years ago.
Today, we still pay more attention to negative news, and the news media knows this. They take advantage of it to drive our eyeballs to their advertisers. Typically, good news networks fail as businesses.
It’s not that the news media is lying – it’s just not a balanced view of what’s going on in the world.
AND because your mindset matters A LOT, my purpose with my work and with this blog is to share with you the data supporting the positive side of the equation and to give you insight to some fundamental truths about where humanity really is going…
The truth is, driven by advances in exponential technologies, things are getting much better around the world at an accelerating rate.
XKCD image
This line chart was produced by Max Roser; we took it from Peter Diamandis's site.
It shows the percentage of the world population living in poverty and how that percentage has changed over time. The X axis shows a time scale from 1820 to 2010, while the Y axis shows percentages of the population.
The data contain three series: the percentage of the population in poverty, in extreme poverty, and living on less than $1 a day. All three series show the same trend: poverty declines markedly over time.
Max Roser: Declining Global Poverty
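One way to embed an image such as the chart above in an R Markdown document is `knitr::include_graphics()` inside a code chunk; plain Markdown syntax of the form `![caption](path)` is another option. The sketch below assumes a local file path, `images/declining-global-poverty.png`, which is a hypothetical name and not the file used in this report.

```r
library(knitr)

# Hypothetical path: replace it with the actual image file.
# error = FALSE lets the call succeed even if the file does not exist yet.
img <- include_graphics("images/declining-global-poverty.png", error = FALSE)

# When this is the last expression of a chunk, knitr renders the image.
img
```

Chunk options such as `out.width` and `fig.cap` can then control the displayed size and the caption.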
**I could not find the cafe.csv table among the documents attached to the assignment. However, I obtained a copy of the dataset from GitHub.**
library(readr)
coffee_ratings <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv')
## Rows: 1339 Columns: 43
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (24): species, owner, country_of_origin, farm_name, lot_number, mill, ic...
## dbl (19): total_cup_points, number_of_bags, aroma, flavor, aftertaste, acidi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(coffee_ratings)
## spc_tbl_ [1,339 × 43] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ total_cup_points : num [1:1339] 90.6 89.9 89.8 89 88.8 ...
## $ species : chr [1:1339] "Arabica" "Arabica" "Arabica" "Arabica" ...
## $ owner : chr [1:1339] "metad plc" "metad plc" "grounds for health admin" "yidnekachew dabessa" ...
## $ country_of_origin : chr [1:1339] "Ethiopia" "Ethiopia" "Guatemala" "Ethiopia" ...
## $ farm_name : chr [1:1339] "metad plc" "metad plc" "san marcos barrancas \"san cristobal cuch" "yidnekachew dabessa coffee plantation" ...
## $ lot_number : chr [1:1339] NA NA NA NA ...
## $ mill : chr [1:1339] "metad plc" "metad plc" NA "wolensu" ...
## $ ico_number : chr [1:1339] "2014/2015" "2014/2015" NA NA ...
## $ company : chr [1:1339] "metad agricultural developmet plc" "metad agricultural developmet plc" NA "yidnekachew debessa coffee plantation" ...
## $ altitude : chr [1:1339] "1950-2200" "1950-2200" "1600 - 1800 m" "1800-2200" ...
## $ region : chr [1:1339] "guji-hambela" "guji-hambela" NA "oromia" ...
## $ producer : chr [1:1339] "METAD PLC" "METAD PLC" NA "Yidnekachew Dabessa Coffee Plantation" ...
## $ number_of_bags : num [1:1339] 300 300 5 320 300 100 100 300 300 50 ...
## $ bag_weight : chr [1:1339] "60 kg" "60 kg" "1" "60 kg" ...
## $ in_country_partner : chr [1:1339] "METAD Agricultural Development plc" "METAD Agricultural Development plc" "Specialty Coffee Association" "METAD Agricultural Development plc" ...
## $ harvest_year : chr [1:1339] "2014" "2014" NA "2014" ...
## $ grading_date : chr [1:1339] "April 4th, 2015" "April 4th, 2015" "May 31st, 2010" "March 26th, 2015" ...
## $ owner_1 : chr [1:1339] "metad plc" "metad plc" "Grounds for Health Admin" "Yidnekachew Dabessa" ...
## $ variety : chr [1:1339] NA "Other" "Bourbon" NA ...
## $ processing_method : chr [1:1339] "Washed / Wet" "Washed / Wet" NA "Natural / Dry" ...
## $ aroma : num [1:1339] 8.67 8.75 8.42 8.17 8.25 8.58 8.42 8.25 8.67 8.08 ...
## $ flavor : num [1:1339] 8.83 8.67 8.5 8.58 8.5 8.42 8.5 8.33 8.67 8.58 ...
## $ aftertaste : num [1:1339] 8.67 8.5 8.42 8.42 8.25 8.42 8.33 8.5 8.58 8.5 ...
## $ acidity : num [1:1339] 8.75 8.58 8.42 8.42 8.5 8.5 8.5 8.42 8.42 8.5 ...
## $ body : num [1:1339] 8.5 8.42 8.33 8.5 8.42 8.25 8.25 8.33 8.33 7.67 ...
## $ balance : num [1:1339] 8.42 8.42 8.42 8.25 8.33 8.33 8.25 8.5 8.42 8.42 ...
## $ uniformity : num [1:1339] 10 10 10 10 10 10 10 10 9.33 10 ...
## $ clean_cup : num [1:1339] 10 10 10 10 10 10 10 10 10 10 ...
## $ sweetness : num [1:1339] 10 10 10 10 10 10 10 9.33 9.33 10 ...
## $ cupper_points : num [1:1339] 8.75 8.58 9.25 8.67 8.58 8.33 8.5 9 8.67 8.5 ...
## $ moisture : num [1:1339] 0.12 0.12 0 0.11 0.12 0.11 0.11 0.03 0.03 0.1 ...
## $ category_one_defects : num [1:1339] 0 0 0 0 0 0 0 0 0 0 ...
## $ quakers : num [1:1339] 0 0 0 0 0 0 0 0 0 0 ...
## $ color : chr [1:1339] "Green" "Green" NA "Green" ...
## $ category_two_defects : num [1:1339] 0 1 0 2 2 1 0 0 0 4 ...
## $ expiration : chr [1:1339] "April 3rd, 2016" "April 3rd, 2016" "May 31st, 2011" "March 25th, 2016" ...
## $ certification_body : chr [1:1339] "METAD Agricultural Development plc" "METAD Agricultural Development plc" "Specialty Coffee Association" "METAD Agricultural Development plc" ...
## $ certification_address: chr [1:1339] "309fcf77415a3661ae83e027f7e5f05dad786e44" "309fcf77415a3661ae83e027f7e5f05dad786e44" "36d0d00a3724338ba7937c52a378d085f2172daa" "309fcf77415a3661ae83e027f7e5f05dad786e44" ...
## $ certification_contact: chr [1:1339] "19fef5a731de2db57d16da10287413f5f99bc2dd" "19fef5a731de2db57d16da10287413f5f99bc2dd" "0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660" "19fef5a731de2db57d16da10287413f5f99bc2dd" ...
## $ unit_of_measurement : chr [1:1339] "m" "m" "m" "m" ...
## $ altitude_low_meters : num [1:1339] 1950 1950 1600 1800 1950 ...
## $ altitude_high_meters : num [1:1339] 2200 2200 1800 2200 2200 NA NA 1700 1700 1850 ...
## $ altitude_mean_meters : num [1:1339] 2075 2075 1700 2000 2075 ...
## - attr(*, "spec")=
## .. cols(
## .. total_cup_points = col_double(),
## .. species = col_character(),
## .. owner = col_character(),
## .. country_of_origin = col_character(),
## .. farm_name = col_character(),
## .. lot_number = col_character(),
## .. mill = col_character(),
## .. ico_number = col_character(),
## .. company = col_character(),
## .. altitude = col_character(),
## .. region = col_character(),
## .. producer = col_character(),
## .. number_of_bags = col_double(),
## .. bag_weight = col_character(),
## .. in_country_partner = col_character(),
## .. harvest_year = col_character(),
## .. grading_date = col_character(),
## .. owner_1 = col_character(),
## .. variety = col_character(),
## .. processing_method = col_character(),
## .. aroma = col_double(),
## .. flavor = col_double(),
## .. aftertaste = col_double(),
## .. acidity = col_double(),
## .. body = col_double(),
## .. balance = col_double(),
## .. uniformity = col_double(),
## .. clean_cup = col_double(),
## .. sweetness = col_double(),
## .. cupper_points = col_double(),
## .. moisture = col_double(),
## .. category_one_defects = col_double(),
## .. quakers = col_double(),
## .. color = col_character(),
## .. category_two_defects = col_double(),
## .. expiration = col_character(),
## .. certification_body = col_character(),
## .. certification_address = col_character(),
## .. certification_contact = col_character(),
## .. unit_of_measurement = col_character(),
## .. altitude_low_meters = col_double(),
## .. altitude_high_meters = col_double(),
## .. altitude_mean_meters = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
library(tidyverse)
## Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
## had status 1
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ dplyr 1.0.10
## ✔ tibble 3.1.8 ✔ stringr 1.5.0
## ✔ tidyr 1.2.1 ✔ forcats 0.5.2
## ✔ purrr 0.3.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
glimpse(coffee_ratings)
## Rows: 1,339
## Columns: 43
## $ total_cup_points <dbl> 90.58, 89.92, 89.75, 89.00, 88.83, 88.83, 88.75,…
## $ species <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Ara…
## $ owner <chr> "metad plc", "metad plc", "grounds for health ad…
## $ country_of_origin <chr> "Ethiopia", "Ethiopia", "Guatemala", "Ethiopia",…
## $ farm_name <chr> "metad plc", "metad plc", "san marcos barrancas …
## $ lot_number <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ mill <chr> "metad plc", "metad plc", NA, "wolensu", "metad …
## $ ico_number <chr> "2014/2015", "2014/2015", NA, NA, "2014/2015", N…
## $ company <chr> "metad agricultural developmet plc", "metad agri…
## $ altitude <chr> "1950-2200", "1950-2200", "1600 - 1800 m", "1800…
## $ region <chr> "guji-hambela", "guji-hambela", NA, "oromia", "g…
## $ producer <chr> "METAD PLC", "METAD PLC", NA, "Yidnekachew Dabes…
## $ number_of_bags <dbl> 300, 300, 5, 320, 300, 100, 100, 300, 300, 50, 3…
## $ bag_weight <chr> "60 kg", "60 kg", "1", "60 kg", "60 kg", "30 kg"…
## $ in_country_partner <chr> "METAD Agricultural Development plc", "METAD Agr…
## $ harvest_year <chr> "2014", "2014", NA, "2014", "2014", "2013", "201…
## $ grading_date <chr> "April 4th, 2015", "April 4th, 2015", "May 31st,…
## $ owner_1 <chr> "metad plc", "metad plc", "Grounds for Health Ad…
## $ variety <chr> NA, "Other", "Bourbon", NA, "Other", NA, "Other"…
## $ processing_method <chr> "Washed / Wet", "Washed / Wet", NA, "Natural / D…
## $ aroma <dbl> 8.67, 8.75, 8.42, 8.17, 8.25, 8.58, 8.42, 8.25, …
## $ flavor <dbl> 8.83, 8.67, 8.50, 8.58, 8.50, 8.42, 8.50, 8.33, …
## $ aftertaste <dbl> 8.67, 8.50, 8.42, 8.42, 8.25, 8.42, 8.33, 8.50, …
## $ acidity <dbl> 8.75, 8.58, 8.42, 8.42, 8.50, 8.50, 8.50, 8.42, …
## $ body <dbl> 8.50, 8.42, 8.33, 8.50, 8.42, 8.25, 8.25, 8.33, …
## $ balance <dbl> 8.42, 8.42, 8.42, 8.25, 8.33, 8.33, 8.25, 8.50, …
## $ uniformity <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
## $ clean_cup <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
## $ sweetness <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
## $ cupper_points <dbl> 8.75, 8.58, 9.25, 8.67, 8.58, 8.33, 8.50, 9.00, …
## $ moisture <dbl> 0.12, 0.12, 0.00, 0.11, 0.12, 0.11, 0.11, 0.03, …
## $ category_one_defects <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ quakers <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ color <chr> "Green", "Green", NA, "Green", "Green", "Bluish-…
## $ category_two_defects <dbl> 0, 1, 0, 2, 2, 1, 0, 0, 0, 4, 1, 0, 0, 2, 2, 0, …
## $ expiration <chr> "April 3rd, 2016", "April 3rd, 2016", "May 31st,…
## $ certification_body <chr> "METAD Agricultural Development plc", "METAD Agr…
## $ certification_address <chr> "309fcf77415a3661ae83e027f7e5f05dad786e44", "309…
## $ certification_contact <chr> "19fef5a731de2db57d16da10287413f5f99bc2dd", "19f…
## $ unit_of_measurement <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m"…
## $ altitude_low_meters <dbl> 1950.0, 1950.0, 1600.0, 1800.0, 1950.0, NA, NA, …
## $ altitude_high_meters <dbl> 2200.0, 2200.0, 1800.0, 2200.0, 2200.0, NA, NA, …
## $ altitude_mean_meters <dbl> 2075.0, 2075.0, 1700.0, 2000.0, 2075.0, NA, NA, …
total_cup_points contains invalid values of 0. We filter out values that fall far below the mean and are likely erroneous.
summary(coffee_ratings$total_cup_points)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 81.08 82.50 82.09 83.67 90.58
coffee_ratings <- coffee_ratings %>%
  filter(total_cup_points > 10)
summary(coffee_ratings$total_cup_points)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 59.83 81.10 82.50 82.15 83.67 90.58
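The cutoff of 10 used above is a fixed threshold. As an alternative sketch, a lower bound can be derived from the data with the common 1.5 × IQR rule. This technique is swapped in for illustration and is not what the report does; the `scores` vector holds toy values, not the real `coffee_ratings` column.

```r
# Toy scores for illustration only (not the real coffee_ratings data)
scores <- c(0, 59.83, 81.1, 82.5, 83.67, 90.58)

# Lower fence: first quartile minus 1.5 times the interquartile range
q1 <- quantile(scores, 0.25, names = FALSE)
lower_fence <- q1 - 1.5 * IQR(scores)

# Keep only the scores at or above the fence
kept <- scores[scores >= lower_fence]
print(lower_fence)
print(kept)
```

On real data the fence may be stricter or looser than the fixed cutoff, so the rows it removes should be inspected before being discarded.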
Several countries have very few samples and add noise to the table. We consider only those above the first quartile, with more than 10 samples.
country_counts <- coffee_ratings %>% count(country_of_origin)
summary(country_counts )
## country_of_origin n
## Length:37 Min. : 1.00
## Class :character 1st Qu.: 3.00
## Mode :character Median : 11.00
## Mean : 36.16
## 3rd Qu.: 40.00
## Max. :236.00
coffee_ratings <- coffee_ratings %>%
  group_by(country_of_origin) %>%
  filter(n() > 10) %>%
  ungroup()  # drop the grouping so later steps work on a plain tibble
summary(coffee_ratings %>% count(country_of_origin))
## country_of_origin n
## Length:19 Min. : 11.00
## Class :character 1st Qu.: 23.00
## Mode :character Median : 40.00
## Mean : 66.74
## 3rd Qu.: 74.00
## Max. :236.00
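For readers less familiar with dplyr, the same "keep groups with more than 10 rows" idea can be sketched in base R with `table()`. The country labels and counts below are toy values chosen for illustration.

```r
# Toy country labels (illustrative): A has 12 rows, B has 3, C has 11
toy_countries <- c(rep("A", 12), rep("B", 3), rep("C", 11))

# Count rows per country, then keep the names with more than 10 rows
counts <- table(toy_countries)
keep <- names(counts)[counts > 10]

# Retain only the rows that belong to the kept countries
filtered <- toy_countries[toy_countries %in% keep]
print(keep)
print(length(filtered))
```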
country_summary <- tapply(coffee_ratings$total_cup_points, coffee_ratings$country_of_origin, summary)
head(country_summary)
## $Brazil
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 70.67 81.73 82.42 82.41 83.25 88.83
##
## $China
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 78.00 82.40 83.17 82.93 84.31 87.25
##
## $Colombia
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 72.83 82.62 83.25 83.11 83.92 86.00
##
## $`Costa Rica`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 71.75 81.75 83.25 82.79 84.46 87.17
##
## $`El Salvador`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 79.67 82.25 82.83 83.05 84.17 85.58
##
## $Ethiopia
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 80.00 84.06 85.20 85.48 87.12 90.58
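The `tapply()` call above returns a list with one summary per country, and `head()` shows only the first six in alphabetical order. To compare countries directly, a single statistic per country can be collected in a data frame and sorted. The sketch below uses base R `aggregate()` on toy data; the country labels and scores are illustrative, not taken from the dataset.

```r
# Toy data with the same column names as coffee_ratings (values are made up)
toy <- data.frame(
  country_of_origin = c("A", "A", "B", "B", "B"),
  total_cup_points  = c(80, 82, 85, 86, 87)
)

# Median score per country, returned as a data frame
by_country <- aggregate(total_cup_points ~ country_of_origin,
                        data = toy, FUN = median)

# Sort from highest to lowest median
by_country <- by_country[order(-by_country$total_cup_points), ]
print(by_country)
```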
Manual chart
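# Load ggplot2 for plotting
library(ggplot2)
# One boxplot of total_cup_points per country of origin,
# with the x-axis labels rotated 90 degrees so they do not overlap
ggplot(coffee_ratings, aes(x = country_of_origin, y = total_cup_points)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))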
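The boxplot above orders countries alphabetically. As an optional variation, the x axis can be ordered by median score with `reorder()`, which tends to make comparisons easier to read. The sketch below uses toy data; the labels and values are illustrative and the column name `country` is a hypothetical addition.

```r
library(ggplot2)

# Toy data (made-up values) with the same column names as coffee_ratings
toy <- data.frame(
  country_of_origin = c("A", "A", "A", "B", "B", "C", "C"),
  total_cup_points  = c(85, 86, 87, 80, 82, 83, 84)
)

# Reorder the country factor by ascending median score
toy$country <- reorder(toy$country_of_origin, toy$total_cup_points, FUN = median)

# Boxplot with countries sorted by median instead of alphabetically
p <- ggplot(toy, aes(x = country, y = total_cup_points)) +
  geom_boxplot() +
  labs(x = "Country of origin", y = "Total cup points") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```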
RStudio (2020). R Markdown Authoring Basics. RStudio. Retrieved from https://rmarkdown.rstudio.com/authoring_basics.html
Yihui Xie (2022). R Markdown Cookbook. Chapman & Hall/CRC. Retrieved from https://bookdown.org/yihui/rmarkdown-cookbook/font-color.html
R-Graph Gallery. Box plot with ggplot2. Retrieved from https://r-graph-gallery.com/265-grouped-boxplot-with-ggplot2.html