Aim

This is a sample seminar that illustrates the final project each student should present.

The objective of the seminar is to demonstrate your skills in performing an exploratory analysis.

The seminar should include at least the following elements:

  • pose a question that can be answered with data
  • load data into R, ideally from an online source
  • load R packages
  • comment all the code in detail
  • explore the data
  • identify tabulated data in tidy format
  • identify the location and proportion of null data in the dataset
  • create one or more summary tables
  • create one or more exploratory graphics
  • wrangle the data by applying one or more of these commands: filter, select, mutate, pivot
  • answer the question posed with a correctly formatted graphic
  • generate a codebook using the reporteR package (or dataMaid if reporteR is not available yet)
  • use the rmarkdown format to integrate text and code
  • export the document together with the code to a pdf or docx file
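
As an illustration of the item about null data, here is a minimal sketch of one possible way to find the proportion and the location of missing values. The `toy` data frame and its values are made up for this example; they are not the seminar data.

```r
# Toy data frame with some missing values (hypothetical example data)
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  x1960   = c(35.7, NA, 49.1, NA),
  x1961   = c(34.5, 51.4, NA, 40.4)
)

# Proportion of missing values in each column
na_prop <- colMeans(is.na(toy))
na_prop

# Location (row and column index) of every missing value
na_where <- which(is.na(toy), arr.ind = TRUE)
na_where
```

The same idea works on any data frame; visdat::vis_dat(), used later in the example, shows the same information as a picture.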

Minimum code must include

  • Packages
    • pacman::p_load()
  • Data import
    • read_csv()
  • Data exploration
    • head()
    • summary()
    • dim()
  • Data wrangling
    • %>%
    • filter()
    • select()
    • mutate()
    • group_by()
    • summarize()
  • Tables
    • gtsummary::tbl_summary()
  • Graphs
    • ggplot()
  • Codebook
    • dataMaid::makeCodebook()
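
The list asks for gtsummary::tbl_summary(), so here is a small sketch of how such a table could be built. The `toy` data frame and its numbers are invented for illustration, and I assume the gtsummary package is installed.

```r
library(gtsummary)

# Toy long-format data (hypothetical values, not the World Bank data)
toy <- data.frame(
  continent = rep(c("Europe", "Africa"), each = 4),
  value     = c(10.1, 11.3, 9.8, 12.0, 40.2, 38.5, 42.1, 36.9)
)

# Summarise the value column by continent;
# `type` asks for a continuous summary instead of a frequency table
tbl <- tbl_summary(
  toy,
  by = continent,
  type = list(value ~ "continuous")
)
tbl
```

With few distinct values, tbl_summary() may treat a numeric column as categorical, which is why the sketch sets `type` explicitly.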

Extra points:

  • use packages that we haven’t seen in class
  • use a join function (for example, left_join)
  • use log10 transformations for axes
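
For the extra point about log10 axes, a possible sketch is shown below. The data are made up, and scale_y_log10() is only one of the ways ggplot2 offers to do this.

```r
library(ggplot2)

# Hypothetical data: two groups with values on very different scales
toy <- data.frame(
  year  = rep(c(1960, 1990, 2018), times = 2),
  value = c(50, 45, 32, 15, 12, 10),
  group = rep(c("High", "Low"), each = 3)
)

p <- ggplot(toy, aes(x = year, y = value, color = group)) +
  geom_line() +
  scale_y_log10() +   # draw the y axis on a log10 scale
  labs(y = "Value (log10 scale)")
p
```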

Below is a sample seminar. Your code may be longer or shorter than the example; there is no maximum or minimum length. The important thing is that:

  • you must use the minimum commands listed above,
  • you must comment most of your code to document the steps of your analysis, and
  • you must export your code script to a pdf or docx document (go to the Preview Notebook tab and select the format to export; detailed instructions are at https://rmarkdown.rstudio.com/lesson-9.html)

IMPORTANT: Your code should be executable

Some recommendations

Write clear code (in RStudio, select the code and press CTRL+SHIFT+A to reformat it): it will help you when something doesn’t work.

If something doesn’t work, don’t despair: it happens to everyone, beginners and advanced users alike. The important thing is to be able to find the error. When an error message appears, first check your code for obvious mistakes (an orphan parenthesis, a period instead of a comma, etc.). If the error persists, copy and paste the error message into Google to look for a solution.

Remember: there is no problem that cannot be solved with the proper use of Google or a hammer!

SEMINAR EXAMPLE

Question

What is the birth rate for European countries and for continents, and what is the birth rate for the Baltic countries?

Packages

# if the pacman package is not installed yet, uncomment the next line to install it

# install.packages("pacman")

pacman::p_load(tidyverse, # several packages for data science
               visdat,  # to visualize NAs
               gtsummary, # for nice tables
               dataMaid, # for the codebook
               janitor) # for data cleaning

Dataset

I found the data at the World Bank:

https://data.worldbank.org/indicator/SP.DYN.CBRT.IN

The indicator is the Birth rate, crude (per 1,000 people). The download is in zip format, so I created a copy online in Google Drive, published it as a csv file and imported it into R. I will call my dataset df, for Data Frame.

df <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vStv7Pr69DtRKv6Nw6gVBep8hbT3pEeO6B1vNwxK_1DUHgpoTgbuRpZ4SvgtHFQnBZJVGeeQVyRuXZl/pub?gid=473966571&single=true&output=csv")
Missing column names filled in: 'X3' [3], 'X4' [4], 'X5' [5], 'X6' [6], 'X7' [7], 'X8' [8], 'X9' [9], 'X10' [10], 'X11' [11], 'X12' [12], 'X13' [13], 'X14' [14], 'X15' [15], 'X16' [16], 'X17' [17], 'X18' [18], 'X19' [19], 'X20' [20], 'X21' [21], 'X22' [22], 'X23' [23], 'X24' [24], 'X25' [25], 'X26' [26], 'X27' [27], 'X28' [28], 'X29' [29], 'X30' [30], 'X31' [31], 'X32' [32], 'X33' [33], 'X34' [34], 'X35' [35], 'X36' [36], 'X37' [37], 'X38' [38], 'X39' [39], 'X40' [40], 'X41' [41], 'X42' [42], 'X43' [43], 'X44' [44], 'X45' [45], 'X46' [46], 'X47' [47], 'X48' [48], 'X49' [49], 'X50' [50], 'X51' [51], 'X52' [52], 'X53' [53], 'X54' [54], 'X55' [55], 'X56' [56], 'X57' [57], 'X58' [58], 'X59' [59], 'X60' [60], 'X61' [61], 'X62' [62], 'X63' [63], 'X64' [64], 'X65' [65]
── Column specification ───────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  `Data Source` = col_character(),
  `World Development Indicators` = col_character(),
  X3 = col_character(),
  X4 = col_character()
)
ℹ Use `spec()` for the full column specifications.

Data cleaning

head(df)

It seems that there are some extra rows at the top of the dataset. I will read the file again, adding the option to skip those rows

df <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vStv7Pr69DtRKv6Nw6gVBep8hbT3pEeO6B1vNwxK_1DUHgpoTgbuRpZ4SvgtHFQnBZJVGeeQVyRuXZl/pub?gid=473966571&single=true&output=csv", 
               skip = 4) # add the option to skip 4 rows when importing the file

── Column specification ───────────────────────────────────────────────────────────
cols(
  .default = col_double(),
  `Country Name` = col_character(),
  `Country Code` = col_character(),
  `Indicator Name` = col_character(),
  `Indicator Code` = col_character(),
  `2019` = col_logical(),
  `2020` = col_logical()
)
ℹ Use `spec()` for the full column specifications.

Now it’s ok
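
To see what the skip option does without downloading anything, here is a minimal sketch on a small temporary file. The file content is made up to mimic the layout of the export (metadata lines above the real header), and it uses skip = 2 because this toy file has only two such lines.

```r
library(readr)

# Write a small file with two metadata lines above the real header
# (made-up content that mimics the layout of the World Bank export)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Data Source,World Development Indicators",
  "Last Updated Date,2020-01-01",
  "Country Name,Country Code,1960",
  "Aruba,ABW,35.679",
  "Afghanistan,AFG,51.279"
), tmp)

# skip = 2 ignores the metadata lines, so the third line becomes the header
# col_types = cols() lets readr guess the column types without printing them
toy <- read_csv(tmp, skip = 2, col_types = cols())
toy
```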

I will standardize the names to facilitate handling

df <- df %>%              # overwrite the dataset with a cleaned copy of itself
  janitor::clean_names()  # standardize all the column names
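
A quick sketch of what clean_names() does to names like the ones in this file, using a made-up two-column data frame:

```r
library(janitor)

# Toy data frame with the same kind of untidy names (hypothetical data)
toy <- data.frame(
  `Country Name` = c("Aruba", "Angola"),
  `1960` = c(35.679, 49.08),
  check.names = FALSE   # keep the untidy names exactly as written
)

names(toy)               # names before cleaning
toy <- clean_names(toy)
names(toy)               # names after cleaning: "country_name" "x1960"
```

Names that start with a number get an x in front, which is why the year columns appear as x1960, x1961 and so on.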
  

The last two columns are empty, so I will delete them

df <- df %>% 
  select(-x2019,  # the minus sign drops these columns
         -x2020)
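
When there are many empty columns, naming each one gets tedious. A possible alternative, sketched here on made-up data and assuming dplyr 1.0 or later, is to keep only the columns that have at least one value:

```r
library(dplyr)

# Toy data with two completely empty columns (hypothetical values)
toy <- tibble(
  country_name = c("Aruba", "Angola"),
  x2018 = c(11.652, 40.729),
  x2019 = c(NA, NA),
  x2020 = c(NA, NA)
)

# Keep only the columns that contain at least one non-missing value
toy_clean <- toy %>%
  select(where(~ !all(is.na(.x))))

names(toy_clean)
```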

Data wrangling

Check the dimensions

dim(df)
[1] 264  63

Check the variables included

glimpse(df)
Rows: 264
Columns: 63
$ country_name   <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra", "…
$ country_code   <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "…
$ indicator_name <chr> "Birth rate, crude (per 1,000 people)", "Birth rate, crud…
$ indicator_code <chr> "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP…
$ x1960          <dbl> 35.67900, 51.27900, 49.08000, 40.92400, NA, 47.79008, 47.…
$ x1961          <dbl> 34.52900, 51.37300, 48.77900, 40.36800, NA, 47.55839, 46.…
$ x1962          <dbl> 33.3200, 51.4570, 48.5470, 39.6270, NA, 47.3276, 46.0930,…
$ x1963          <dbl> 32.05000, 51.53000, 48.43000, 38.72300, NA, 47.09162, 45.…
$ x1964          <dbl> 30.73700, 51.58900, 48.45000, 37.69500, NA, 46.84421, 44.…
$ x1965          <dbl> 29.4130, 51.6310, 48.6220, 36.5990, NA, 46.5771, 43.8560,…
$ x1966          <dbl> 28.12100, 51.65200, 48.93600, 35.49600, NA, 46.28291, 42.…
$ x1967          <dbl> 26.90800, 51.65000, 49.34300, 34.43500, NA, 45.96055, 41.…
$ x1968          <dbl> 25.81700, 51.62300, 49.78700, 33.45800, NA, 45.61137, 40.…
$ x1969          <dbl> 24.87200, 51.57400, 50.23100, 32.59000, NA, 45.23716, 38.…
$ x1970          <dbl> 24.09900, 51.50200, 50.61900, 31.83700, NA, 44.84362, 37.…
$ x1971          <dbl> 23.50500, 51.41100, 50.90300, 31.18300, NA, 44.44035, 35.…
$ x1972          <dbl> 23.06800, 51.30300, 51.06200, 30.58700, NA, 44.03865, 34.…
$ x1973          <dbl> 22.76000, 51.18400, 51.09400, 30.01900, NA, 43.64783, 32.…
$ x1974          <dbl> 22.56100, 51.05800, 51.00500, 29.47300, NA, 43.27485, 31.…
$ x1975          <dbl> 22.45200, 50.93000, 50.82500, 28.94900, NA, 42.92493, 30.…
$ x1976          <dbl> 22.41400, 50.80300, 50.60000, 28.45500, NA, 42.60063, 30.…
$ x1977          <dbl> 22.4240, 50.6780, 50.3860, 28.0040, NA, 42.2929, 29.7350,…
$ x1978          <dbl> 22.45400, 50.55500, 50.22600, 27.60600, NA, 41.98993, 29.…
$ x1979          <dbl> 22.47800, 50.43600, 50.13900, 27.26200, NA, 41.68051, 29.…
$ x1980          <dbl> 22.47200, 50.32100, 50.13400, 26.98100, NA, 41.34983, 29.…
$ x1981          <dbl> 22.42400, 50.21000, 50.20700, 26.77200, NA, 40.98353, 30.…
$ x1982          <dbl> 22.32900, 50.09800, 50.32200, 26.62700, NA, 40.57024, 30.…
$ x1983          <dbl> 22.18700, 49.98400, 50.44900, 26.52800, NA, 40.10007, 30.…
$ x1984          <dbl> 21.98900, 49.86500, 50.56900, 26.45200, NA, 39.56784, 30.…
$ x1985          <dbl> 21.72600, 49.73500, 50.66300, 26.36700, NA, 38.96898, 30.…
$ x1986          <dbl> 21.39700, 49.58600, 50.71200, 26.24100, 11.90000, 38.3028…
$ x1987          <dbl> 21.00800, 49.41800, 50.71100, 26.04700, 11.00000, 37.5805…
$ x1988          <dbl> 20.5700, 49.2360, 50.6570, 25.7620, 11.6000, 36.8177, 28.…
$ x1989          <dbl> 20.08900, 49.04800, 50.54700, 25.37200, 12.50000, 36.0297…
$ x1990          <dbl> 19.57100, 48.88000, 50.38300, 24.86700, 11.90000, 35.3180…
$ x1991          <dbl> 19.02100, 48.76300, 50.16800, 24.24500, 11.90000, 34.5218…
$ x1992          <dbl> 18.44600, 48.70900, 49.91900, 23.52900, 12.10000, 33.8345…
$ x1993          <dbl> 17.85900, 48.71700, 49.65200, 22.74200, 11.40000, 33.0478…
$ x1994          <dbl> 17.27000, 48.77000, 49.37800, 21.90200, 10.90000, 32.2864…
$ x1995          <dbl> 16.69100, 48.83500, 49.11300, 21.02000, 11.00000, 31.5017…
$ x1996          <dbl> 16.13200, 48.87000, 48.87000, 20.10600, 10.90000, 30.8345…
$ x1997          <dbl> 15.59800, 48.83300, 48.65200, 19.17300, 11.20000, 30.2171…
$ x1998          <dbl> 15.09000, 48.68800, 48.46000, 18.23800, 11.90000, 29.6497…
$ x1999          <dbl> 14.61500, 48.41900, 48.29300, 17.32100, 12.60000, 29.1436…
$ x2000          <dbl> 14.17300, 48.02100, 48.15000, 16.43600, 11.30000, 28.7091…
$ x2001          <dbl> 13.76200, 47.50500, 48.02700, 15.59000, 11.80000, 28.3477…
$ x2002          <dbl> 13.37500, 46.90100, 47.91100, 14.79000, 11.20000, 28.0589…
$ x2003          <dbl> 13.01000, 46.23100, 47.78600, 14.04800, 10.30000, 27.8307…
$ x2004          <dbl> 12.66700, 45.50700, 47.63900, 13.38100, 10.90000, 27.6578…
$ x2005          <dbl> 12.34800, 44.72300, 47.45300, 12.82100, 10.70000, 27.5377…
$ x2006          <dbl> 12.05300, 43.87000, 47.21500, 12.39800, 10.60000, 27.4772…
$ x2007          <dbl> 11.78800, 42.94400, 46.92000, 12.11800, 10.10000, 27.4607…
$ x2008          <dbl> 11.556, 41.949, 46.563, 11.973, 10.400, 27.463, 12.581, 1…
$ x2009          <dbl> 11.361, 40.903, 46.143, 11.945, 9.900, 27.496, 12.208, 18…
$ x2010          <dbl> 11.21400, 39.82900, 45.65600, 12.00100, 9.80000, 27.51344…
$ x2011          <dbl> 11.12300, 38.75000, 45.10200, 12.10000, NA, 27.48487, 11.…
$ x2012          <dbl> 11.0900, 37.6900, 44.4930, 12.1970, 9.5000, 27.3893, 11.3…
$ x2013          <dbl> 11.11100, 36.67000, 43.84700, 12.25700, NA, 27.21144, 11.…
$ x2014          <dbl> 11.17900, 35.70600, 43.18200, 12.25900, NA, 26.94078, 10.…
$ x2015          <dbl> 11.28100, 34.80900, 42.52000, 12.19700, NA, 26.57699, 10.…
$ x2016          <dbl> 11.4040, 33.9810, 41.8820, 12.0800, 8.8000, 26.1348, 10.6…
$ x2017          <dbl> 11.53200, 33.21100, 41.28100, 11.93400, NA, 25.64801, 10.…
$ x2018          <dbl> 11.65200, 32.48700, 40.72900, 11.78000, 7.20000, 25.14754…

Check the content of some variables with simple tables

table(df$indicator_name)

Birth rate, crude (per 1,000 people) 
                                 264 
table(df$indicator_code)

SP.DYN.CBRT.IN 
           264 

So, both columns are constants (the same value in every row) and not variables, so I will delete them

df <- df %>% 
  select(-indicator_name, 
         -indicator_code)
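
Instead of tabulating the columns one by one, constant columns could also be found in one go. This is only a sketch on invented data; `constant_cols` is a name I made up for the example.

```r
library(dplyr)

# Toy data with one constant column (hypothetical values)
toy <- tibble(
  country_name   = c("Aruba", "Angola", "Albania"),
  indicator_code = rep("SP.DYN.CBRT.IN", 3),
  x1960          = c(35.679, 49.08, 40.924)
)

# Count the distinct values in every column
n_values <- sapply(toy, n_distinct)
n_values

# Columns with a single distinct value carry no information
constant_cols <- names(n_values)[n_values == 1]
constant_cols

# Drop them by name
toy <- toy %>%
  select(-all_of(constant_cols))
```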

Check

summary(df)
 country_name       country_code           x1960           x1961      
 Length:264         Length:264         Min.   :13.40   Min.   :13.70  
 Class :character   Class :character   1st Qu.:27.81   1st Qu.:26.90  
 Mode  :character   Mode  :character   Median :42.69   Median :42.58  
                                       Mean   :38.13   Mean   :37.81  
                                       3rd Qu.:47.29   3rd Qu.:47.21  
                                       Max.   :58.12   Max.   :58.19  
                                       NA's   :25      NA's   :26     
     x1962           x1963           x1964           x1965           x1966      
 Min.   :12.90   Min.   :13.10   Min.   :13.10   Min.   :13.10   Min.   :12.70  
 1st Qu.:28.41   1st Qu.:28.33   1st Qu.:27.56   1st Qu.:26.67   1st Qu.:25.28  
 Median :42.18   Median :42.16   Median :41.83   Median :41.19   Median :40.38  
 Mean   :37.89   Mean   :37.80   Mean   :37.35   Mean   :36.85   Mean   :36.25  
 3rd Qu.:47.01   3rd Qu.:46.83   3rd Qu.:46.45   3rd Qu.:46.13   3rd Qu.:45.95  
 Max.   :58.23   Max.   :58.21   Max.   :58.15   Max.   :58.04   Max.   :57.87  
 NA's   :25      NA's   :26      NA's   :26      NA's   :26      NA's   :25     
     x1967           x1968           x1969           x1970           x1971      
 Min.   :14.00   Min.   :13.70   Min.   :12.52   Min.   :11.57   Min.   :10.87  
 1st Qu.:24.57   1st Qu.:23.72   1st Qu.:23.02   1st Qu.:22.10   1st Qu.:22.54  
 Median :39.76   Median :39.09   Median :38.16   Median :37.16   Median :36.64  
 Mean   :35.96   Mean   :35.62   Mean   :35.22   Mean   :34.71   Mean   :34.50  
 3rd Qu.:45.74   3rd Qu.:45.56   3rd Qu.:45.32   3rd Qu.:45.27   3rd Qu.:45.08  
 Max.   :57.66   Max.   :57.43   Max.   :57.19   Max.   :56.95   Max.   :56.73  
 NA's   :26      NA's   :26      NA's   :26      NA's   :22      NA's   :24     
     x1972           x1973            x1974            x1975       
 Min.   :10.34   Min.   : 9.943   Min.   : 9.701   Min.   : 9.715  
 1st Qu.:22.52   1st Qu.:21.971   1st Qu.:21.115   1st Qu.:20.888  
 Median :35.94   Median :35.267   Median :34.877   Median :34.493  
 Mean   :34.16   Mean   :33.722   Mean   :33.426   Mean   :33.058  
 3rd Qu.:44.92   3rd Qu.:44.911   3rd Qu.:44.528   3rd Qu.:44.619  
 Max.   :56.55   Max.   :56.409   Max.   :56.315   Max.   :56.274  
 NA's   :23      NA's   :22       NA's   :22       NA's   :22      
     x1976           x1977           x1978           x1979           x1980      
 Min.   :10.13   Min.   :10.30   Min.   :10.40   Min.   :10.50   Min.   :11.10  
 1st Qu.:20.20   1st Qu.:20.11   1st Qu.:19.78   1st Qu.:19.75   1st Qu.:19.86  
 Median :34.09   Median :33.76   Median :33.52   Median :33.36   Median :33.18  
 Mean   :32.65   Mean   :32.41   Mean   :32.20   Mean   :32.08   Mean   :31.95  
 3rd Qu.:44.51   3rd Qu.:43.78   3rd Qu.:43.20   3rd Qu.:43.56   3rd Qu.:43.24  
 Max.   :56.29   Max.   :56.35   Max.   :56.44   Max.   :56.54   Max.   :56.63  
 NA's   :20      NA's   :20      NA's   :20      NA's   :20      NA's   :20     
     x1981           x1982           x1983           x1984           x1985      
 Min.   :10.40   Min.   :10.30   Min.   : 9.90   Min.   :10.10   Min.   :10.20  
 1st Qu.:20.66   1st Qu.:20.48   1st Qu.:20.17   1st Qu.:19.90   1st Qu.:19.00  
 Median :32.94   Median :32.99   Median :32.22   Median :31.99   Median :31.32  
 Mean   :31.95   Mean   :31.74   Mean   :31.44   Mean   :31.12   Mean   :30.80  
 3rd Qu.:42.96   3rd Qu.:42.81   3rd Qu.:42.33   3rd Qu.:41.89   3rd Qu.:41.11  
 Max.   :56.68   Max.   :56.69   Max.   :56.63   Max.   :56.52   Max.   :56.37  
 NA's   :21      NA's   :20      NA's   :20      NA's   :19      NA's   :19     
     x1986           x1987           x1988           x1989           x1990      
 Min.   : 9.80   Min.   : 9.70   Min.   :10.10   Min.   : 9.90   Min.   :10.00  
 1st Qu.:18.60   1st Qu.:19.14   1st Qu.:18.85   1st Qu.:18.35   1st Qu.:18.56  
 Median :30.58   Median :30.20   Median :29.72   Median :29.14   Median :28.45  
 Mean   :30.51   Mean   :30.18   Mean   :29.73   Mean   :29.19   Mean   :28.89  
 3rd Qu.:40.66   3rd Qu.:40.07   3rd Qu.:39.85   3rd Qu.:39.16   3rd Qu.:38.60  
 Max.   :56.18   Max.   :55.98   Max.   :55.80   Max.   :55.63   Max.   :55.48  
 NA's   :19      NA's   :17      NA's   :18      NA's   :17      NA's   :16     
     x1991           x1992           x1993           x1994           x1995      
 Min.   : 9.90   Min.   : 9.80   Min.   : 9.40   Min.   : 9.40   Min.   : 8.60  
 1st Qu.:17.53   1st Qu.:17.22   1st Qu.:16.70   1st Qu.:16.00   1st Qu.:15.60  
 Median :27.71   Median :27.04   Median :26.10   Median :25.35   Median :24.88  
 Mean   :28.28   Mean   :27.78   Mean   :27.32   Mean   :26.72   Mean   :26.21  
 3rd Qu.:38.09   3rd Qu.:37.50   3rd Qu.:37.16   3rd Qu.:35.89   3rd Qu.:34.92  
 Max.   :55.35   Max.   :55.22   Max.   :55.07   Max.   :54.91   Max.   :54.73  
 NA's   :15      NA's   :14      NA's   :17      NA's   :15      NA's   :15     
     x1996           x1997           x1998           x1999           x2000      
 Min.   : 8.10   Min.   : 7.70   Min.   : 7.60   Min.   : 7.80   Min.   : 7.80  
 1st Qu.:15.15   1st Qu.:15.02   1st Qu.:14.34   1st Qu.:14.05   1st Qu.:14.04  
 Median :24.24   Median :23.68   Median :23.21   Median :22.71   Median :22.17  
 Mean   :25.70   Mean   :25.37   Mean   :24.92   Mean   :24.65   Mean   :24.25  
 3rd Qu.:34.33   3rd Qu.:34.01   3rd Qu.:33.26   3rd Qu.:32.77   3rd Qu.:32.07  
 Max.   :54.53   Max.   :54.31   Max.   :54.08   Max.   :53.82   Max.   :53.54  
 NA's   :12      NA's   :15      NA's   :16      NA's   :16      NA's   :16     
     x2001           x2002           x2003           x2004           x2005       
 Min.   : 7.20   Min.   : 7.10   Min.   : 6.90   Min.   : 7.20   Min.   : 7.812  
 1st Qu.:13.60   1st Qu.:13.44   1st Qu.:13.46   1st Qu.:13.22   1st Qu.:13.045  
 Median :21.70   Median :21.20   Median :20.99   Median :20.79   Median :20.654  
 Mean   :23.83   Mean   :23.47   Mean   :23.34   Mean   :23.12   Mean   :22.891  
 3rd Qu.:31.96   3rd Qu.:31.61   3rd Qu.:31.28   3rd Qu.:30.81   3rd Qu.:30.826  
 Max.   :53.24   Max.   :52.91   Max.   :52.57   Max.   :52.20   Max.   :51.820  
 NA's   :15      NA's   :13      NA's   :16      NA's   :14      NA's   :12      
     x2006            x2007           x2008           x2009           x2010      
 Min.   : 8.122   Min.   : 8.30   Min.   : 8.30   Min.   : 8.10   Min.   : 8.30  
 1st Qu.:13.123   1st Qu.:13.10   1st Qu.:12.96   1st Qu.:12.71   1st Qu.:12.71  
 Median :20.694   Median :20.52   Median :20.22   Median :20.00   Median :19.61  
 Mean   :22.692   Mean   :22.57   Mean   :22.47   Mean   :22.27   Mean   :22.01  
 3rd Qu.:30.308   3rd Qu.:30.02   3rd Qu.:29.83   3rd Qu.:29.93   3rd Qu.:29.61  
 Max.   :51.428   Max.   :51.03   Max.   :50.62   Max.   :50.22   Max.   :49.80  
 NA's   :10       NA's   :11      NA's   :13      NA's   :13      NA's   :11     
     x2011           x2012           x2013           x2014           x2015      
 Min.   : 8.30   Min.   : 8.20   Min.   : 7.90   Min.   : 7.90   Min.   : 8.00  
 1st Qu.:12.58   1st Qu.:12.60   1st Qu.:12.41   1st Qu.:12.48   1st Qu.:12.10  
 Median :19.42   Median :18.89   Median :18.96   Median :18.83   Median :18.60  
 Mean   :21.88   Mean   :21.59   Mean   :21.34   Mean   :21.06   Mean   :20.83  
 3rd Qu.:29.57   3rd Qu.:29.12   3rd Qu.:28.67   3rd Qu.:28.34   3rd Qu.:28.13  
 Max.   :49.37   Max.   :48.93   Max.   :48.47   Max.   :47.99   Max.   :47.50  
 NA's   :13      NA's   :13      NA's   :14      NA's   :11      NA's   :14     
     x2016           x2017           x2018      
 Min.   : 7.80   Min.   : 6.70   Min.   : 5.90  
 1st Qu.:11.91   1st Qu.:11.59   1st Qu.:11.42  
 Median :18.32   Median :18.01   Median :17.60  
 Mean   :20.53   Mean   :20.19   Mean   :19.83  
 3rd Qu.:27.67   3rd Qu.:27.27   3rd Qu.:27.09  
 Max.   :47.02   Max.   :46.54   Max.   :46.08  
 NA's   :13      NA's   :13      NA's   :13     

Reshaping

OK, the dataset is in wide format, so I will reshape it into long format

df %>% 
  pivot_longer(x1960:x2018, 
               names_to = "year", 
               values_to = "value")

That looks correct, so I will store the result. Since I will not use the wide version again, I will overwrite df

df <- df %>% 
  pivot_longer(x1960:x2018, 
               names_to = "year", 
               values_to = "value")

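Now I will fix the year column. It currently looks like this:

head(df$year)
[1] "x1960" "x1961" "x1962" "x1963" "x1964" "x1965"

So I will remove the leading x:

df <- df %>% 
  mutate(year = str_sub(year, 2)) # keep the text from the 2nd character on, which removes the x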
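
The reshaping and the removal of the x can also be done in a single step. The sketch below uses made-up data and assumes tidyr 1.1.0 or later (for names_transform); note that it stores the year as an integer, which is a different choice from the one made in this example.

```r
library(tidyr)

# Toy wide data (hypothetical values)
toy <- tibble::tibble(
  country_name = c("Aruba", "Angola"),
  x1960 = c(35.679, 49.08),
  x1961 = c(34.529, 48.779)
)

toy_long <- toy %>%
  pivot_longer(x1960:x1961,
               names_to = "year",
               names_prefix = "x",                        # drop the leading x
               names_transform = list(year = as.integer), # store the year as a number
               values_to = "value")
toy_long
```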

Join

Now we need a column for the continent. Since I have a column with the three-letter country code, I googled “three code country continent csv” and found a csv that matches the three-letter code to the continent.

Found a file here: https://datahub.io/JohnSnowLabs/country-and-continent-codes-list

continents <- read_csv("https://datahub.io/JohnSnowLabs/country-and-continent-codes-list/r/country-and-continent-codes-list-csv.csv")

── Column specification ───────────────────────────────────────────────────────────
cols(
  Continent_Name = col_character(),
  Continent_Code = col_character(),
  Country_Name = col_character(),
  Two_Letter_Country_Code = col_character(),
  Three_Letter_Country_Code = col_character(),
  Country_Number = col_double()
)

So I will select only the relevant columns, Three_Letter_Country_Code and Continent_Name

continents <- continents %>% 
  select(Three_Letter_Country_Code, Continent_Name)

Check

head(continents)

Now try to join

left_join(df, continents, 
          by = c("country_code" = "Three_Letter_Country_Code"))

It works, so let’s store the join

df  <- left_join(df, continents, 
          by = c("country_code" = "Three_Letter_Country_Code"))
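
Before trusting a join it can help to check which rows find no match and whether the lookup table repeats any key. This is a sketch of such a check on two small made-up tables (`toy_df` and `toy_continents` are hypothetical stand-ins for df and continents):

```r
library(dplyr)

# Hypothetical stand-ins for the two tables
toy_df <- tibble(
  country_name = c("Aruba", "Arab World", "Latvia"),
  country_code = c("ABW", "ARB", "LVA")
)
toy_continents <- tibble(
  Three_Letter_Country_Code = c("ABW", "LVA"),
  Continent_Name = c("North America", "Europe")
)

# Rows of toy_df whose code has no match in the lookup table
unmatched <- anti_join(toy_df, toy_continents,
                       by = c("country_code" = "Three_Letter_Country_Code"))
unmatched

# Codes that appear more than once in the lookup table;
# each repeated code would duplicate rows in a left_join
dupes <- toy_continents %>%
  count(Three_Letter_Country_Code) %>%
  filter(n > 1)
dupes
```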

and delete the continents dataframe

rm(continents)

Finally, convert the year to date format

This was tricky; I finally found the answer here: https://stackoverflow.com/questions/30255833/convert-four-digit-year-values-to-a-date-type

df <- df %>% 
  # parse the four-digit year as a Date; as.Date() fills in the month and day from today's date
  mutate(year = as.Date(as.character(year), format = "%Y")) 

So, the dataset is ready for analysis!
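
A side note on the date conversion: with format = "%Y" alone, as.Date() takes the month and day from the day the code is run, so the dates shift from one run to the next. One possible way to get the same date every time, sketched here on a made-up vector, is to pin each year to 1 January:

```r
# Hypothetical vector of years stored as text
years <- c("1960", "1961", "2018")

# Pin every year to 1 January so the result does not depend on the run date
dates <- as.Date(paste0(years, "-01-01"))
dates

# The year can always be recovered from the date
format(dates, "%Y")
```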

Exploratory data analysis

head(df)

How many countries?

df %>%  
  distinct(country_name) # check unique values in one specified column

OK, we have 264 entries in the country column, from the following continents:

df %>% 
  distinct(Continent_Name)

Check the NA values

df %>% 
  visdat::vis_dat() # visualize the variables and NAs from a dataset

Check the NAs in the continent column in more detail:

df %>% 
  filter(is.na(Continent_Name)) %>%  # keep only the rows with NA in the Continent_Name column
  distinct(country_name)

OK, there are some entries without a continent. I will remove all of them and keep only the countries

df <- df %>% 
  filter(!is.na(Continent_Name)) # the ! does the trick here: it means "is not NA"

check again

df %>% 
  visdat::vis_dat()

There are still some NA values, so let’s find them

df %>% 
  filter(is.na(value))

OK, I will remove these too, this time using drop_na

df <- df %>% 
  drop_na(value)

What is the average birth rate per year?

Check how the average birth rate changes over time

df %>% 
  group_by(year) %>%  # group by year
  summarise(average_birth_rate = mean(value)) %>% # calculate the mean for each year
  ggplot(aes(x = year, 
             y = average_birth_rate)) + 
  geom_line(group = 1) +  # there is one value per year, so group = 1 connects them all into a single line
  labs(title = "Average Birth Rate per 1000", 
       subtitle = "Source: World Bank", 
       x = "Year", 
       y = "Average Birth Rate per 1000 inhabitants") + 
  theme_minimal()   # use the minimal theme
`summarise()` ungrouping output (override with `.groups` argument)
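
The message above is only informative. A possible way to silence it, and to make the mean robust to missing values at the same time, is sketched below on made-up data (assuming dplyr 1.0 or later, where the `.groups` argument exists):

```r
library(dplyr)

# Toy data with one missing value (hypothetical values)
toy <- tibble(
  year  = c(1960, 1960, 1961, 1961),
  value = c(30, 50, 28, NA)
)

avg <- toy %>%
  group_by(year) %>%
  summarise(average_birth_rate = mean(value, na.rm = TRUE), # ignore missing values
            .groups = "drop")                                # return an ungrouped table
avg
```

In the seminar data the NAs were already removed with drop_na(), so na.rm = TRUE is only a safety net there.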

What is the average birth rate per year and continent?

df %>% 
  group_by(Continent_Name, year) %>% 
  summarise(average_birth_rate = mean(value)) %>% 
  ggplot(aes(x = year, 
             y = average_birth_rate, 
             color = Continent_Name)) + 
  geom_line() +  
  labs(title = "Average Birth Rate per 1000 per Continent", 
       subtitle = "Source: World Bank", 
       x = "Year", 
       y = "Average Birth Rate per 1000 inhabitants", 
       color = "Continent") + 
  theme_minimal()   # use the minimal theme
`summarise()` regrouping output by 'Continent_Name' (override with `.groups` argument)

Check each country individually

df %>% 
  ggplot(aes(x = year, 
             y = value, 
             group = country_name)) + 
  geom_line() + 
  facet_wrap(~Continent_Name)

What is the average birth rate per year for the Baltic countries?

We will focus on the Baltic countries

df %>% 
  filter(country_name %in% c("Latvia", "Estonia", "Lithuania")) %>%  # keep only the three Baltic countries
  ggplot(aes(x = year, 
             y = value, 
             group = country_name,  
             color = country_name)) + 
  geom_line() + 
  facet_grid(country_name~.) + # one facet per country, stacked in three rows. Try changing to (. ~ country_name)
  theme_minimal() + 
  labs(
    title = "Birth rate per 1000 inhabitants for the Baltic countries", 
    subtitle = "Data Source: World Bank", 
    x = "Year", 
    y = "Birth rate per 1000 inhabitants", 
    color = "Country"
  )

Tables

I will calculate the change in birth rate from 1988 to 2018 for European countries and create a table.

I will use a new package, “DT”. The documentation is here: https://rstudio.github.io/DT/

pacman::p_load(DT)
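
The calculation described above could look roughly like the sketch below. This is not the original code: the `toy` data frame and its values are made up, the year is stored as a plain number rather than a Date, and `change` and `dt_table` are names chosen for the example.

```r
library(dplyr)
library(tidyr)

# Toy long-format data for two years (made-up values)
toy <- tibble(
  country_name = rep(c("Latvia", "Estonia", "Lithuania"), each = 2),
  year  = rep(c(1988, 2018), times = 3),
  value = c(15, 10, 16, 11, 15.5, 10)
)

change <- toy %>%
  # one column per year: y1988 and y2018
  pivot_wider(names_from = year, values_from = value, names_prefix = "y") %>%
  # difference between the last and the first year
  mutate(change = y2018 - y1988) %>%
  # largest decrease first
  arrange(change)
change

# Build an interactive table if the DT package is installed
if (requireNamespace("DT", quietly = TRUE)) {
  dt_table <- DT::datatable(change)  # print dt_table to display it
}
```

With the real data, a filter on Continent_Name and on the two years would come before the pivot_wider() step.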

Create a codebook

Use the dataMaid package

Run the next line to generate a codebook

dataMaid::makeCodebook(df)
Data report generation is finished. Please wait while your output file is being rendered.

This command creates a codebook in PDF format.

IGxlYXZlIG9ubHkgdGhlIGNvdW50cmllcwoKYGBge3J9CmRmIDwtIGRmICU+JSAKICBmaWx0ZXIoIWlzLm5hKENvbnRpbmVudF9OYW1lKSkgIyBoZXJlIHRoZSAhIG1ha2VzIHRoZSB0cmljaywgbWVhbnMgIklzIG5vdCBOQSIKYGBgCgpjaGVjayBhZ2FpbgoKYGBge3J9CmRmICU+JSAKICB2aXNkYXQ6OnZpc19kYXQoKQpgYGAKdGhlcmUgYXJlIHNvbWUgTkFzIHZhbHVlcywgbGV0J3MgZmluZCB0aGVtCmBgYHtyfQpkZiAlPiUgCiAgZmlsdGVyKGlzLm5hKHZhbHVlKSkKYGBgCm9rLCBhZ2FpbiwgcmVtb3ZlLCBub3cgSSB3aWxsIHVzZSBkcm9wX25hCgpgYGB7cn0KZGYgPC0gZGYgJT4lIAogIGRyb3BfbmEodmFsdWUpCmBgYAoKCiMjIyBXaGF0IGlzIHRoZSBhdmVyYWdlIGJpcnRoIHJhdGUgcGVyIHllYXI/CkNoZWNrIHRoZSBhdmVyYWdlIGNoYW5nZSBpbiB0aGUgYmlydGggcmF0ZSAKCmBgYHtyfQpkZiAlPiUgCiAgZ3JvdXBfYnkoeWVhcikgJT4lICAjIGdyb3VwIGJ5IHllYXIKICBzdW1tYXJpc2UoYXZlcmFnZV9iaXJ0aF9yYXRlID0gbWVhbih2YWx1ZSkpICU+JSAjIG5vdyBjYWxjdWxhdGUgdGhlIG1lYW4gZm9yIGVhY2ggeWVhcgogIGdncGxvdChhZXMoeCA9IHllYXIsIAogICAgICAgICAgICAgeSA9IGF2ZXJhZ2VfYmlydGhfcmF0ZSkpICsgCiAgZ2VvbV9saW5lKGdyb3VwID0gMSkgKyAgIyBzaW5jZSB0aGVyZSBpcyBvbmx5IG9uZSBwb2ludCBwZXIgeWVhciwgSSBzYXkgaGVyZSAidXNlIHRoZSBwb2ludHMgYW5kIG1lcmdlIGl0IgogIGxhYnModGl0bGUgPSAiQXZlcmFnZSBCaXJ0aCBSYXRlIHBlciAxMDAwIiwgCiAgICAgICBzdWJ0aXRsZSA9ICJTb3VyY2U6IFdvcmxkIEJhbmsiLCAKICAgICAgIHggPSAiWWVhciIsIAogICAgICAgeSA9ICJBdmVyYWdlIEJpcnRoIFJhdGUgcGVyIDEwMDAgaW5oYWJzLiIpICsgCiAgdGhlbWVfbWluaW1hbCgpICAgIyB0aGlzIHVzZSB0aGUgdGhlbWUgbWluaW1hbAoKYGBgCgojIyMgV2hhdCBpcyB0aGUgYXZlcmFnZSBiaXJ0aCByYXRlIHBlciB5ZWFyIGFuZCBjb250aW5lbnQ/CgpgYGB7cn0KZGYgJT4lIAogIGdyb3VwX2J5KENvbnRpbmVudF9OYW1lLCB5ZWFyKSAlPiUgCiAgc3VtbWFyaXNlKGF2ZXJhZ2VfYmlydGhfcmF0ZSA9IG1lYW4odmFsdWUpKSAlPiUgCiAgZ2dwbG90KGFlcyh4ID0geWVhciwgCiAgICAgICAgICAgICB5ID0gYXZlcmFnZV9iaXJ0aF9yYXRlLCAKICAgICAgICAgICAgIGNvbG9yID0gQ29udGluZW50X05hbWUpKSArIAogIGdlb21fbGluZSgpICsgIAogIGxhYnModGl0bGUgPSAiQXZlcmFnZSBCaXJ0aCBSYXRlIHBlciAxMDAwIHBlciBDb250aW5lbnQiLCAKICAgICAgIHN1YnRpdGxlID0gIlNvdXJjZTogV29ybGQgQmFuayIsIAogICAgICAgeCA9ICJZZWFyIiwgCiAgICAgICB5ID0gIkF2ZXJhZ2UgQmlydGggUmF0ZSBwZXIgMTAwMCBpbmhhYnMuIiwgCiAgICAgICBjb2xvciA9ICJDb250aW5lbnQiKSArIAogIHRoZW1lX21pbmltYWwoKSAgICMgdGhpcyB1c2Ug
dGhlIHRoZW1lIG1pbmltYWwKCmBgYAoKQ2hlY2sgZWFjaCBjb3VudHJ5IGluZGl2aWR1YWxseQoKYGBge3J9CmRmICU+JSAKICBnZ3Bsb3QoYWVzKHggPSB5ZWFyLCAKICAgICAgICAgICAgIHkgPSB2YWx1ZSwgCiAgICAgICAgICAgICBncm91cCA9IGNvdW50cnlfbmFtZSkpICsgCiAgZ2VvbV9saW5lKCkgKyAKICBmYWNldF93cmFwKH5Db250aW5lbnRfTmFtZSkKYGBgCiMjIyBXaGF0IGlzIHRoZSBhdmVyYWdlIGJpcnRoIHJhdGUgcGVyIHllYXIgZm9yIHRoZSBCYWx0aWMgY291bnRyaWVzPwpXZSB3aWxsIGZvY3VzIG9uIHRoZSBCYWx0aWMgY291bnRyaWVzCgpgYGB7cn0KZGYgJT4lIAogIGZpbHRlcihjb3VudHJ5X25hbWUgJWluJSBjKCJMYXR2aWEiLCAiRXN0b25pYSIsICJMaXRodWFuaWEiKSkgJT4lICAjIFNlbGVjdCBvbmx5IHRoZSB0aHJlZSBiYWx0aWMgY291bnRyaWVzCiAgIGdncGxvdChhZXMoeCA9IHllYXIsIAogICAgICAgICAgICAgeSA9IHZhbHVlLCAKICAgICAgICAgICAgIGdyb3VwID0gY291bnRyeV9uYW1lLCAgCiAgICAgICAgICAgICBjb2xvciA9IGNvdW50cnlfbmFtZSkpICsgCiAgZ2VvbV9saW5lKCkgKyAKICBmYWNldF9ncmlkKGNvdW50cnlfbmFtZX4uKSArICMgb3JkZXIgdGhlIGZhY2V0IHdpdGggdGhlIGNvdW50cmllcyBpbiB0aHJlZSByb3dzLiBUcnkgY2hhbmdpbmcgdG8gKC4gfiBjb3VudHJ5X25hbWUpCiAgdGhlbWVfbWluaW1hbCgpICsgCiAgbGFicygKICAgIHRpdGxlID0gIkJpcnRoIHJhdGUgcGVyIDEwMDAgaW5oYWIgZm9yIEJhbHRpYyBDb3VudHJpZXMiLCAKICAgIHN1YnRpdGxlID0gIkRhdGEgU291cmNlOiBXb3JsZCBCYW5rIiwgCiAgICB4ID0gIlllYXIiLCAKICAgIHkgPSAiQmlydGggcmF0ZSBwZXIgMTAwMCBpbmhhYnMiLCAKICAgIGNvbG9yID0gIkNvdW50cnkiCiAgKQpgYGAKCiMjIFRhYmxlcwoKSSB3aWxsIGNhbGN1bGF0ZSB0aGUgY2hhbmdlIGluIGJpcnRoIHJhdGUgZnJvbSAxOTg4IHRvIDIwMTggZm9yIGV1cm9wZWFuIGNvdW50cmllcyBhbmQgY3JlYXRlIGEgdGFibGUuIAoKSSB3aWxsIHVzZSBhIG5ldyBwYWNrYWdlLCAiRFQiLiBUaGUgZG9jdW1lbnRhdGlvbiBpcyBoZXJlOiBodHRwczovL3JzdHVkaW8uZ2l0aHViLmlvL0RULwoKYGBge3J9CnBhY21hbjo6cF9sb2FkKERUKQpgYGAKCmBgYHtyfQpkZiAlPiUKICBzZWxlY3QoLWNvdW50cnlfY29kZSkgJT4lICAjdW5zZWxlY3QgdGhpcyBjb2x1bW4sIHNpbmNlIGlzIHVzZWxlc3MKICBmaWx0ZXIoQ29udGluZW50X05hbWUgPT0gIkV1cm9wZSIpICU+JSAgICMgZmlsdGVyIG9ubHkgZXVyb3BlYW4gY291bnRyaWVzCiAgc2VsZWN0KC1Db250aW5lbnRfTmFtZSkgJT4lICAjIGFuZCBub3cgdW5zZWxlY3QgdGhpcyB1c2VsZXNzIGNvbHVtbgogICMgY29udmVydCB0aGUgZGF0ZSBmcm9tIFlZWVlNTUREIGZvcm1hdCB0byBZWVlZIAogIG11dGF0ZSh5ZWFyID0gbHVicmlkYXRlOjp5ZWFyKHllYXIpKSAlPiUgCiAgZmlsdGVyKHll
YXIgPT0gIjE5ODgiIHwKICAgICAgICAgICB5ZWFyID09ICIyMDE4IikgICU+JSAKICAjIG5vdyBjb252ZXJ0IHRvIHdpZGUgZm9ybWF0IHRvIGNhbGN1bGF0ZSB0aGUgZGlmZmVyZW5jZQogIHBpdm90X3dpZGVyKG5hbWVzX2Zyb20gPSB5ZWFyLCAKICAgICAgICAgICAgICB2YWx1ZXNfZnJvbSA9IHZhbHVlKSAgJT4lCiAgIyBjaGFuZ2UgdGhlIG5hbWUgb2YgdGhlIGNvbHVtbnMKICByZW5hbWUoICJ4MTk4OCIgPSAiMTk4OCIsIAogICAgICAgICAgIngyMDE4IiA9ICIyMDE4IikgJT4lIAogICMgbm93IGNhbGN1bGF0ZSB0aGUgZGlmZmVyZW5jZQogIG11dGF0ZSgiRGlmZmVyZW5jZSBpbiBiaXJ0aCByYXRlIHBlciAxMDAwIGluaGFicy4gMTk4OC0yMDE4IiA9IHgyMDE4IC0geDE5ODgpICU+JSAjIGNyZWF0ZSBhIG5ldyB2YXJpYWJsZSB3aXRoIHRoZSBkaWZmZXJlbmNlIG9mIHRoZSBiaXJ0aCByYXRlIHBlciAxMDAwCiAgIyBmaWx0ZXIgb25seSByb3dzIHdpdGggdmFsdWVzCiAgZHJvcF9uYSgpICU+JSAKICAjIHJvdW5kIG51bWJlcnMKICBtdXRhdGVfaWYoaXMubnVtZXJpYywgcm91bmQsIDEpICU+JSAKICAjIGNyZWF0ZSBhIG5pY2UgdGFibGUKICBzZWxlY3QoY291bnRyeV9uYW1lLCAiRGlmZmVyZW5jZSBpbiBiaXJ0aCByYXRlIHBlciAxMDAwIGluaGFicy4gMTk4OC0yMDE4IikgJT4lIAogICMgYW5kIG5vdyB0aGUgdGFibGUsIGNvcHkgYW5kIHBhc3RlIGZyb20gdGhlIGRvY3VtZW50YXRpb24KICBkYXRhdGFibGUoKQpgYGAKCgojIyBDcmVhdGUgYSBjb2RlYm9vawoKVXNlIHRoZSBkYXRhTWFpZCBwYWNrYWdlCgp1bmNvbW1lbnQgdGhlIG5leHQgbGluZSB0byBnZW5lcmF0ZSBhIGNvZGVib29rCgpgYGB7cn0KIyBkYXRhTWFpZDo6bWFrZUNvZGVib29rKGRmKQpgYGAKVGhpcyBjb21tYW5kIGNyZWF0ZSBhIGNvZGVib29rIGluIFBERiBmb3JtYXQuCg==