Universidad Latina logo

Master's in Business Analytics

Analytical Decision Modeling

Final Project: Self-reproducible document.

Pablo Torre

December 2022.

Instructions:

Objective: produce a self-reproducible document (HTML) that includes the following:

  1. Choose a document and transcribe it in the R console. Apply different formats to this document:
  • Heading levels:

    Header 4

    Header 6
  • Bold: “Luck favors the bold.”
  • Font colors: Roses are red, violets are blue.
  2. Research how to insert an image in an R Markdown document.

  3. Find a chart (image) on the internet and insert it in the document. Then explain what the components of the chart are and what information is encoded in it.

  4. Using the data in the table cafe.csv, do the following:

  • Read the file and verify that it was read correctly with str
  • Draw a chart with pencil and paper that allows comparing different countries using the variable total.cup.points
  • Replicate the same chart in ggplot.

The deliverable is an HTML document created with rmarkdown that contains the code. Upload the final product to the platform.

1. Choose a document and transcribe it in the R console. With formats.

For this section and section 3 we use the post by Peter H. Diamandis available at World in Data. We apply the required styles to the text, plus some additional ones, using HTML, R Markdown, and CSS tools. P.S. Requirements 1 and 2 are also applied on the cover page of this work.

Why is this important

Before I share the new “data” with you, it’s essential that you understand why this matters.

We live in a world where we are constantly bombarded by negative news from every angle. If you turn on CNN (what I call the Crisis News Network), you’ll predominantly hear about death, terrorism, airplane crashes, bombings, financial crisis and political scandal.

I think of the news as a drug pusher, and negative news is their drug.

There’s a reason for this.

We humans are wired to pay 10x more attention to negative news than positive news.

Being able to rapidly notice and pay attention to negative news (like a predator or a dangerous fire) was an evolutionary advantage to keep you alive on the savannahs of Africa millions of years ago.

Today, we still pay more attention to negative news, and the news media knows this. They take advantage of it to drive our eyeballs to their advertisers. Typically, good news networks fail as businesses.

It’s not that the news media is lying – it’s just not a balanced view of what’s going on in the world.

AND because your mindset matters A LOT, my purpose with my work and with this blog is to share with you the data supporting the positive side of the equation and to give you insight to some fundamental truths about where humanity really is going…

The truth is, driven by advances in exponential technologies, things are getting much better around the world at an accelerating rate.
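The font color and bold formatting requested in the instructions can also be generated from R instead of being typed by hand. The sketch below is one possible way to do it, assuming the document is rendered to HTML. `colorize_html()` and `bold_md()` are hypothetical helper names (they do not belong to any package), loosely following the idea described in the font color section of the R Markdown Cookbook.

```r
# Hypothetical helper: wrap text in an HTML <span> with an inline CSS color.
# Assumes HTML output; other output formats would not interpret the tag.
colorize_html <- function(text, color) {
  sprintf("<span style=\"color: %s;\">%s</span>", color, text)
}

# Hypothetical helper: Markdown bold.
bold_md <- function(text) {
  sprintf("**%s**", text)
}

roses   <- colorize_html("Roses are red", "red")
violets <- colorize_html("violets are blue", "blue")
line    <- paste0(roses, ", ", violets, ".")

cat(line, "\n")
cat(bold_md("Luck favors the bold."), "\n")
```

In an R Markdown source file, a helper like this would typically be called from an inline R expression, so the generated HTML or Markdown is placed directly in the text.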

2. Research and insert an image in the document

XKCD image
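For reference, an image is usually inserted in R Markdown either with the Markdown syntax `![alt text](path)` or, from a code chunk, with `knitr::include_graphics()`. The sketch below only builds the Markdown string in base R, so it runs without extra packages. `md_image()` and the file path are hypothetical placeholders, not the ones used in this assignment.

```r
# Hypothetical helper: build the Markdown image syntax ![alt](path "title").
md_image <- function(alt, path, title = NULL) {
  if (is.null(title)) {
    sprintf("![%s](%s)", alt, path)
  } else {
    sprintf("![%s](%s \"%s\")", alt, path, title)
  }
}

# "images/xkcd.png" is a placeholder path, not a file from the assignment.
img <- md_image("XKCD image", "images/xkcd.png")
cat(img, "\n")

# With knitr installed, a code chunk could instead call:
# knitr::include_graphics("images/xkcd.png")
```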

3. Insert and interpret a chart from the internet.

This line chart was created by Max Roser, and we took it from Peter Diamandis's site.

It shows the percentage of the world population living in poverty and how that percentage has changed over time. The X axis shows a time scale from 1820 to 2010, while the Y axis shows percentages of the population.

The data contain 3 series: the percentage of the population in poverty, in extreme poverty, and living on less than $1 a day. All 3 series show the same trend: poverty declines markedly over time.

Max Roser: Declining Global Poverty

4. cafe.csv table

Note: I could not find the table cafe.csv among the documents attached to the assignment. However, I obtained a copy of the dataset on GitHub.

Read the file and verify that it was read correctly with str

library(readr)

# Read the coffee ratings dataset from the tidytuesday repository on GitHub
coffee_ratings <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv')
## Rows: 1339 Columns: 43
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (24): species, owner, country_of_origin, farm_name, lot_number, mill, ic...
## dbl (19): total_cup_points, number_of_bags, aroma, flavor, aftertaste, acidi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(coffee_ratings)
## spc_tbl_ [1,339 × 43] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ total_cup_points     : num [1:1339] 90.6 89.9 89.8 89 88.8 ...
##  $ species              : chr [1:1339] "Arabica" "Arabica" "Arabica" "Arabica" ...
##  $ owner                : chr [1:1339] "metad plc" "metad plc" "grounds for health admin" "yidnekachew dabessa" ...
##  $ country_of_origin    : chr [1:1339] "Ethiopia" "Ethiopia" "Guatemala" "Ethiopia" ...
##  $ farm_name            : chr [1:1339] "metad plc" "metad plc" "san marcos barrancas \"san cristobal cuch" "yidnekachew dabessa coffee plantation" ...
##  $ lot_number           : chr [1:1339] NA NA NA NA ...
##  $ mill                 : chr [1:1339] "metad plc" "metad plc" NA "wolensu" ...
##  $ ico_number           : chr [1:1339] "2014/2015" "2014/2015" NA NA ...
##  $ company              : chr [1:1339] "metad agricultural developmet plc" "metad agricultural developmet plc" NA "yidnekachew debessa coffee plantation" ...
##  $ altitude             : chr [1:1339] "1950-2200" "1950-2200" "1600 - 1800 m" "1800-2200" ...
##  $ region               : chr [1:1339] "guji-hambela" "guji-hambela" NA "oromia" ...
##  $ producer             : chr [1:1339] "METAD PLC" "METAD PLC" NA "Yidnekachew Dabessa Coffee Plantation" ...
##  $ number_of_bags       : num [1:1339] 300 300 5 320 300 100 100 300 300 50 ...
##  $ bag_weight           : chr [1:1339] "60 kg" "60 kg" "1" "60 kg" ...
##  $ in_country_partner   : chr [1:1339] "METAD Agricultural Development plc" "METAD Agricultural Development plc" "Specialty Coffee Association" "METAD Agricultural Development plc" ...
##  $ harvest_year         : chr [1:1339] "2014" "2014" NA "2014" ...
##  $ grading_date         : chr [1:1339] "April 4th, 2015" "April 4th, 2015" "May 31st, 2010" "March 26th, 2015" ...
##  $ owner_1              : chr [1:1339] "metad plc" "metad plc" "Grounds for Health Admin" "Yidnekachew Dabessa" ...
##  $ variety              : chr [1:1339] NA "Other" "Bourbon" NA ...
##  $ processing_method    : chr [1:1339] "Washed / Wet" "Washed / Wet" NA "Natural / Dry" ...
##  $ aroma                : num [1:1339] 8.67 8.75 8.42 8.17 8.25 8.58 8.42 8.25 8.67 8.08 ...
##  $ flavor               : num [1:1339] 8.83 8.67 8.5 8.58 8.5 8.42 8.5 8.33 8.67 8.58 ...
##  $ aftertaste           : num [1:1339] 8.67 8.5 8.42 8.42 8.25 8.42 8.33 8.5 8.58 8.5 ...
##  $ acidity              : num [1:1339] 8.75 8.58 8.42 8.42 8.5 8.5 8.5 8.42 8.42 8.5 ...
##  $ body                 : num [1:1339] 8.5 8.42 8.33 8.5 8.42 8.25 8.25 8.33 8.33 7.67 ...
##  $ balance              : num [1:1339] 8.42 8.42 8.42 8.25 8.33 8.33 8.25 8.5 8.42 8.42 ...
##  $ uniformity           : num [1:1339] 10 10 10 10 10 10 10 10 9.33 10 ...
##  $ clean_cup            : num [1:1339] 10 10 10 10 10 10 10 10 10 10 ...
##  $ sweetness            : num [1:1339] 10 10 10 10 10 10 10 9.33 9.33 10 ...
##  $ cupper_points        : num [1:1339] 8.75 8.58 9.25 8.67 8.58 8.33 8.5 9 8.67 8.5 ...
##  $ moisture             : num [1:1339] 0.12 0.12 0 0.11 0.12 0.11 0.11 0.03 0.03 0.1 ...
##  $ category_one_defects : num [1:1339] 0 0 0 0 0 0 0 0 0 0 ...
##  $ quakers              : num [1:1339] 0 0 0 0 0 0 0 0 0 0 ...
##  $ color                : chr [1:1339] "Green" "Green" NA "Green" ...
##  $ category_two_defects : num [1:1339] 0 1 0 2 2 1 0 0 0 4 ...
##  $ expiration           : chr [1:1339] "April 3rd, 2016" "April 3rd, 2016" "May 31st, 2011" "March 25th, 2016" ...
##  $ certification_body   : chr [1:1339] "METAD Agricultural Development plc" "METAD Agricultural Development plc" "Specialty Coffee Association" "METAD Agricultural Development plc" ...
##  $ certification_address: chr [1:1339] "309fcf77415a3661ae83e027f7e5f05dad786e44" "309fcf77415a3661ae83e027f7e5f05dad786e44" "36d0d00a3724338ba7937c52a378d085f2172daa" "309fcf77415a3661ae83e027f7e5f05dad786e44" ...
##  $ certification_contact: chr [1:1339] "19fef5a731de2db57d16da10287413f5f99bc2dd" "19fef5a731de2db57d16da10287413f5f99bc2dd" "0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660" "19fef5a731de2db57d16da10287413f5f99bc2dd" ...
##  $ unit_of_measurement  : chr [1:1339] "m" "m" "m" "m" ...
##  $ altitude_low_meters  : num [1:1339] 1950 1950 1600 1800 1950 ...
##  $ altitude_high_meters : num [1:1339] 2200 2200 1800 2200 2200 NA NA 1700 1700 1850 ...
##  $ altitude_mean_meters : num [1:1339] 2075 2075 1700 2000 2075 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   total_cup_points = col_double(),
##   ..   species = col_character(),
##   ..   owner = col_character(),
##   ..   country_of_origin = col_character(),
##   ..   farm_name = col_character(),
##   ..   lot_number = col_character(),
##   ..   mill = col_character(),
##   ..   ico_number = col_character(),
##   ..   company = col_character(),
##   ..   altitude = col_character(),
##   ..   region = col_character(),
##   ..   producer = col_character(),
##   ..   number_of_bags = col_double(),
##   ..   bag_weight = col_character(),
##   ..   in_country_partner = col_character(),
##   ..   harvest_year = col_character(),
##   ..   grading_date = col_character(),
##   ..   owner_1 = col_character(),
##   ..   variety = col_character(),
##   ..   processing_method = col_character(),
##   ..   aroma = col_double(),
##   ..   flavor = col_double(),
##   ..   aftertaste = col_double(),
##   ..   acidity = col_double(),
##   ..   body = col_double(),
##   ..   balance = col_double(),
##   ..   uniformity = col_double(),
##   ..   clean_cup = col_double(),
##   ..   sweetness = col_double(),
##   ..   cupper_points = col_double(),
##   ..   moisture = col_double(),
##   ..   category_one_defects = col_double(),
##   ..   quakers = col_double(),
##   ..   color = col_character(),
##   ..   category_two_defects = col_double(),
##   ..   expiration = col_character(),
##   ..   certification_body = col_character(),
##   ..   certification_address = col_character(),
##   ..   certification_contact = col_character(),
##   ..   unit_of_measurement = col_character(),
##   ..   altitude_low_meters = col_double(),
##   ..   altitude_high_meters = col_double(),
##   ..   altitude_mean_meters = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
library(tidyverse)
## Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
## had status 1
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ dplyr   1.0.10
## ✔ tibble  3.1.8      ✔ stringr 1.5.0 
## ✔ tidyr   1.2.1      ✔ forcats 0.5.2 
## ✔ purrr   0.3.5      
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)  # redundant: dplyr is already attached by library(tidyverse)
glimpse(coffee_ratings)
## Rows: 1,339
## Columns: 43
## $ total_cup_points      <dbl> 90.58, 89.92, 89.75, 89.00, 88.83, 88.83, 88.75,…
## $ species               <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Ara…
## $ owner                 <chr> "metad plc", "metad plc", "grounds for health ad…
## $ country_of_origin     <chr> "Ethiopia", "Ethiopia", "Guatemala", "Ethiopia",…
## $ farm_name             <chr> "metad plc", "metad plc", "san marcos barrancas …
## $ lot_number            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ mill                  <chr> "metad plc", "metad plc", NA, "wolensu", "metad …
## $ ico_number            <chr> "2014/2015", "2014/2015", NA, NA, "2014/2015", N…
## $ company               <chr> "metad agricultural developmet plc", "metad agri…
## $ altitude              <chr> "1950-2200", "1950-2200", "1600 - 1800 m", "1800…
## $ region                <chr> "guji-hambela", "guji-hambela", NA, "oromia", "g…
## $ producer              <chr> "METAD PLC", "METAD PLC", NA, "Yidnekachew Dabes…
## $ number_of_bags        <dbl> 300, 300, 5, 320, 300, 100, 100, 300, 300, 50, 3…
## $ bag_weight            <chr> "60 kg", "60 kg", "1", "60 kg", "60 kg", "30 kg"…
## $ in_country_partner    <chr> "METAD Agricultural Development plc", "METAD Agr…
## $ harvest_year          <chr> "2014", "2014", NA, "2014", "2014", "2013", "201…
## $ grading_date          <chr> "April 4th, 2015", "April 4th, 2015", "May 31st,…
## $ owner_1               <chr> "metad plc", "metad plc", "Grounds for Health Ad…
## $ variety               <chr> NA, "Other", "Bourbon", NA, "Other", NA, "Other"…
## $ processing_method     <chr> "Washed / Wet", "Washed / Wet", NA, "Natural / D…
## $ aroma                 <dbl> 8.67, 8.75, 8.42, 8.17, 8.25, 8.58, 8.42, 8.25, …
## $ flavor                <dbl> 8.83, 8.67, 8.50, 8.58, 8.50, 8.42, 8.50, 8.33, …
## $ aftertaste            <dbl> 8.67, 8.50, 8.42, 8.42, 8.25, 8.42, 8.33, 8.50, …
## $ acidity               <dbl> 8.75, 8.58, 8.42, 8.42, 8.50, 8.50, 8.50, 8.42, …
## $ body                  <dbl> 8.50, 8.42, 8.33, 8.50, 8.42, 8.25, 8.25, 8.33, …
## $ balance               <dbl> 8.42, 8.42, 8.42, 8.25, 8.33, 8.33, 8.25, 8.50, …
## $ uniformity            <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
## $ clean_cup             <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
## $ sweetness             <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00,…
## $ cupper_points         <dbl> 8.75, 8.58, 9.25, 8.67, 8.58, 8.33, 8.50, 9.00, …
## $ moisture              <dbl> 0.12, 0.12, 0.00, 0.11, 0.12, 0.11, 0.11, 0.03, …
## $ category_one_defects  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ quakers               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ color                 <chr> "Green", "Green", NA, "Green", "Green", "Bluish-…
## $ category_two_defects  <dbl> 0, 1, 0, 2, 2, 1, 0, 0, 0, 4, 1, 0, 0, 2, 2, 0, …
## $ expiration            <chr> "April 3rd, 2016", "April 3rd, 2016", "May 31st,…
## $ certification_body    <chr> "METAD Agricultural Development plc", "METAD Agr…
## $ certification_address <chr> "309fcf77415a3661ae83e027f7e5f05dad786e44", "309…
## $ certification_contact <chr> "19fef5a731de2db57d16da10287413f5f99bc2dd", "19f…
## $ unit_of_measurement   <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m"…
## $ altitude_low_meters   <dbl> 1950.0, 1950.0, 1600.0, 1800.0, 1950.0, NA, NA, …
## $ altitude_high_meters  <dbl> 2200.0, 2200.0, 1800.0, 2200.0, 2200.0, NA, NA, …
## $ altitude_mean_meters  <dbl> 2075.0, 2075.0, 1700.0, 2000.0, 2075.0, NA, NA, …
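Beyond reading the `str()` output by eye, the check can be automated. The sketch below is a minimal illustration in base R. So that it runs offline, it uses a tiny inline CSV holding only three columns of the first three rows shown above, and `check_read()` is a hypothetical helper, not part of readr.

```r
# Tiny inline sample: three columns of the first three rows shown above.
# It stands in for the full dataset only for illustration.
csv_text <- "total_cup_points,species,country_of_origin
90.58,Arabica,Ethiopia
89.92,Arabica,Ethiopia
89.75,Arabica,Guatemala"

sample_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical helper: fail loudly if the shape or column types are unexpected.
check_read <- function(df, n_rows, numeric_cols, character_cols) {
  stopifnot(
    nrow(df) == n_rows,
    all(c(numeric_cols, character_cols) %in% names(df)),
    all(vapply(df[numeric_cols], is.numeric, logical(1))),
    all(vapply(df[character_cols], is.character, logical(1)))
  )
  invisible(TRUE)
}

check_read(sample_df, n_rows = 3,
           numeric_cols = "total_cup_points",
           character_cols = c("species", "country_of_origin"))
str(sample_df)
```

On the full dataset the same idea would apply with `n_rows = 1339`, the row count reported by `read_csv()` above.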
Data cleaning.

total_cup_points contains invalid values of 0. We filter out values that are far below the mean and may be erroneous.

summary(coffee_ratings$total_cup_points)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   81.08   82.50   82.09   83.67   90.58
# Keep only rows with a plausible score (drops the invalid 0 values)
coffee_ratings <- coffee_ratings %>%
  filter(total_cup_points > 10)

summary(coffee_ratings$total_cup_points)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   59.83   81.10   82.50   82.15   83.67   90.58
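The cutoff of 10 used above is a manual choice. As a sketch of an alternative, the snippet below flags low outliers with the common 1.5 × IQR rule in base R. This is a different technique shown only for comparison, not the method used in this document, and the score vector is made up.

```r
# Made-up scores for illustration: one invalid 0 among plausible values.
scores <- c(0, 81.1, 82.5, 83.7, 84.2, 82.0, 80.9, 85.3)

# Lower fence of the 1.5 * IQR rule.
q1 <- quantile(scores, 0.25, names = FALSE)
q3 <- quantile(scores, 0.75, names = FALSE)
lower_fence <- q1 - 1.5 * (q3 - q1)

# Inspect what would be dropped before dropping it.
dropped <- scores[scores < lower_fence]
kept    <- scores[scores >= lower_fence]

print(dropped)
print(summary(kept))
```

With the quartiles reported above (81.08 and 83.67), this rule would place the lower fence near 77.2, so on the real data it would also remove the 59.83 minimum that the `> 10` filter keeps.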

For several countries we have very few samples, which adds noise to the table. We keep only countries with more than 10 samples, a cutoff that sits above the first quartile (3) of the per-country sample counts.

# Number of samples per country
country_counts <- coffee_ratings %>% count(country_of_origin)
summary(country_counts)
##  country_of_origin        n         
##  Length:37          Min.   :  1.00  
##  Class :character   1st Qu.:  3.00  
##  Mode  :character   Median : 11.00  
##                     Mean   : 36.16  
##                     3rd Qu.: 40.00  
##                     Max.   :236.00
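# Keep only countries with more than 10 samples
coffee_ratings <- coffee_ratings %>%
  group_by(country_of_origin) %>%
  filter(n() > 10) %>%
  ungroup()  # drop the grouping so later steps are not affected by it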

summary(coffee_ratings %>% count(country_of_origin))
##  country_of_origin        n         
##  Length:19          Min.   : 11.00  
##  Class :character   1st Qu.: 23.00  
##  Mode  :character   Median : 40.00  
##                     Mean   : 66.74  
##                     3rd Qu.: 74.00  
##                     Max.   :236.00
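The same filter by sample count can be written in base R with `table()`, which may help to see what `group_by()` plus `filter(n() > 10)` is doing. This is a minimal sketch on a made-up data frame with a lower threshold; `toy` and `min_n` are illustrative names.

```r
# Made-up data: country A has 3 rows, B has 1, C has 2.
toy <- data.frame(
  country_of_origin = c("A", "A", "A", "B", "C", "C"),
  total_cup_points  = c(82, 83, 81, 90, 79, 80),
  stringsAsFactors  = FALSE
)

min_n <- 1  # illustrative threshold; the document uses 10

# Count rows per country, then keep countries with more than min_n rows.
counts   <- table(toy$country_of_origin)
keep     <- names(counts)[as.vector(counts) > min_n]
toy_kept <- toy[toy$country_of_origin %in% keep, ]

print(counts)
print(unique(toy_kept$country_of_origin))
```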

Draw a chart with pencil and paper that allows comparing different countries using the variable total.cup.points

country_summary <- tapply(coffee_ratings$total_cup_points, coffee_ratings$country_of_origin, summary)

head(country_summary)
## $Brazil
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   70.67   81.73   82.42   82.41   83.25   88.83 
## 
## $China
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   78.00   82.40   83.17   82.93   84.31   87.25 
## 
## $Colombia
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   72.83   82.62   83.25   83.11   83.92   86.00 
## 
## $`Costa Rica`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   71.75   81.75   83.25   82.79   84.46   87.17 
## 
## $`El Salvador`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   79.67   82.25   82.83   83.05   84.17   85.58 
## 
## $Ethiopia
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   80.00   84.06   85.20   85.48   87.12   90.58
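To draw a boxplot by hand, each country needs its box edges, median, and whisker ends, which `summary()` does not give directly because the minimum and maximum may be outliers. The sketch below shows how those values could be tabulated with base R's `boxplot.stats()`. The data are made up, and it assumes the hand-drawn chart is a boxplot like the ggplot version later in this document.

```r
# Made-up scores for two hypothetical countries.
toy <- data.frame(
  country = rep(c("A", "B"), each = 5),
  points  = c(80, 81, 82, 83, 84,  78, 82, 83, 84, 90)
)

# boxplot.stats()$stats returns: lower whisker, lower hinge, median,
# upper hinge, upper whisker.
stats_by_country <- sapply(
  split(toy$points, toy$country),
  function(x) boxplot.stats(x)$stats
)
rownames(stats_by_country) <- c("whisker_low", "hinge_low", "median",
                                "hinge_high", "whisker_high")
print(stats_by_country)
```

For country B the values 78 and 90 fall outside the whiskers, so on paper they would be drawn as separate points.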

Hand-drawn chart

Replicate the same chart in ggplot.

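# Load ggplot2 (already attached by tidyverse; repeated here for clarity)
library(ggplot2)

# Boxplot of total cup points by country of origin
ggplot(coffee_ratings, aes(x = country_of_origin, y = total_cup_points)) +
  geom_boxplot() +
  # Rotate the country labels so they do not overlap
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))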
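Alphabetical order on the x axis makes countries harder to compare. One common refinement is to order the countries by median score with `reorder()`. The sketch below uses made-up data, and the ggplot call is guarded so the snippet still runs when ggplot2 is not installed.

```r
# Made-up scores for three hypothetical countries.
toy <- data.frame(
  country_of_origin = rep(c("A", "B", "C"), each = 3),
  total_cup_points  = c(82, 83, 84,  79, 80, 81,  85, 86, 87)
)

# reorder() turns the country into a factor whose levels are sorted
# by the median score of each country.
toy$country_ordered <- reorder(toy$country_of_origin,
                               toy$total_cup_points,
                               FUN = median)
print(levels(toy$country_ordered))

# Only build the plot when ggplot2 is available; print(p) would draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = country_ordered,
                                         y = total_cup_points)) +
    ggplot2::geom_boxplot() +
    ggplot2::labs(x = "Country of origin", y = "Total cup points")
}
```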

References:

R-Studio (2020). R Markdown Authoring Basics. R-Studio. Retrieved from https://rmarkdown.rstudio.com/authoring_basics.html

Yihui Xie (2022). R Markdown Cookbook. Chapman & Hall/CRC. Retrieved from https://bookdown.org/yihui/rmarkdown-cookbook/font-color.html

R-Graph Gallery. Box plot with ggplot2. Retrieved from https://r-graph-gallery.com/265-grouped-boxplot-with-ggplot2.html

Extras:
  • Italics: ceteris paribus <- all else remains constant
  • Equations in LaTeX: \(e^{i\pi}+1=0\)
  • URL links: RStudio
  • Superscript: 3² = 9
  • Strikethrough: “If you strike me down, I shall become more powerful than you can possibly imagine.”