1 Markdown & automation (9-11h)

UBISSS16 Day 3. 9:00 h - 11:00 h (Xavier de Pedro) http://rpubs.com/Xavi/ubiss16d3

1.1 R Markdown tutorial

Using knitr, r-markdown and Plot.ly.

Some examples from:

1.1.1 Install requirements (only if missing)

You can display the code that is run without showing the messages that the R console usually prints when a package is installed or a library is loaded, by setting the chunk option “message=FALSE”:

if (!require(rmarkdown, quietly = TRUE))  {
  install.packages('rmarkdown', repos='http://cran.rediris.es') 
  }
if (!require(knitr, quietly = TRUE))     {
  install.packages('knitr', repos='http://cran.rediris.es') 
}
if (!require(plotly, quietly = TRUE))     {
  install.packages('plotly', repos='http://cran.rediris.es') 
}
if (!require(readr))     {
  install.packages('readr', repos='http://cran.rediris.es') 
  }
if (!require(xtable))     {
  install.packages('xtable', repos='http://cran.rediris.es') 
  }
if (!require(stargazer))     {
  install.packages('stargazer', repos='http://cran.rediris.es') 
  }
if (suppressPackageStartupMessages(!require(googleVis, quietly = TRUE)))  {
  install.packages('googleVis', repos='http://cran.rediris.es') 
}
if (!require("DT", quietly = T)) {
  install.packages('DT', repos = 'http://cran.rstudio.com')
}
if (!require("webshot", quietly = T)) {
  install.packages('webshot', repos = 'http://cran.rstudio.com')
}
if (!require("shiny", quietly = T)) {
  install.packages('shiny', repos = 'http://cran.rstudio.com')
}

Try re-running it with “message=TRUE”
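To compare both settings without reinstalling anything, the sketch below knits the same one-line chunk twice and looks at the Markdown that knitr produces. It assumes knitr is installed; the helper name knit_chunk and the chunk content are made up for illustration.

```r
library(knitr)

# Build the chunk delimiter programmatically so that this example can itself
# be pasted inside a code chunk.
fence <- strrep("`", 3)

# Hypothetical helper: knit a one-line chunk with the given chunk options
# and return the resulting Markdown as a single string.
knit_chunk <- function(options) {
  rmd <- c(
    paste0(fence, "{r, ", options, "}"),
    "message(\"hello from the chunk\")",
    fence
  )
  knit(text = rmd, quiet = TRUE)
}

md_quiet <- knit_chunk("message=FALSE")  # the code is shown, the message is not
md_loud  <- knit_chunk("message=TRUE")   # the code and the message are shown

cat(md_loud)
```

In both cases the code itself is echoed; only the captured console message differs.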

1.2 Overview

Creating documents with R Markdown starts with an .Rmd file that contains a combination of markdown (content with simple text formatting) and R code chunks. The .Rmd file is fed to knitr, which executes all of the R code chunks and creates a new markdown (.md) document which includes the R code and its output. The markdown file generated by knitr is then processed by pandoc, which is responsible for creating a finished web page, PDF, MS Word document, slide show, handout, book, dashboard, package vignette or other format. This may sound complicated, but R Markdown makes it extremely simple by encapsulating all of the above processing into a single render function. Better still, RStudio includes a “Knit” button that enables you to render an .Rmd and preview it using a single click or keyboard shortcut.
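The two stages described above can also be run by hand, which may help to see what render does for you. This is a minimal sketch, assuming knitr and rmarkdown are installed; the file names and the document content are invented, and the pandoc stage is skipped when pandoc is not available on the machine.

```r
library(knitr)
library(rmarkdown)

fence <- strrep("`", 3)

# A tiny, made-up .Rmd document written to a temporary directory
work_dir <- tempfile("rmd-demo-")
dir.create(work_dir)
rmd_file <- file.path(work_dir, "demo.Rmd")
writeLines(c(
  "# Demo",
  "",
  paste0(fence, "{r}"),
  "summary(cars$speed)",
  fence
), rmd_file)

# Stage 1: knitr runs the chunks and writes plain Markdown (.md)
md_file <- knit(rmd_file, output = file.path(work_dir, "demo.md"), quiet = TRUE)

# Stage 2: pandoc converts the Markdown into the final format
if (pandoc_available()) {
  pandoc_convert(md_file, to = "html",
                 output = file.path(work_dir, "demo.html"))
}

# render() wraps both stages in a single call:
# rmarkdown::render(rmd_file)
```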

1.3 Usage

You can install the R Markdown package from CRAN as follows:

install.packages("rmarkdown")

In our case, it was installed (only if needed) in a step above. Once installed, you can open a new .Rmd file in the RStudio IDE by going to File > New File > R Markdown.

1.3.1 Markdown Basics

Markdown is a simple formatting language designed to make authoring content easy for everyone. Rather than write in complex markup code (e.g. HTML or LaTeX), you write in plain text with formatting cues. Pandoc uses these cues to turn your document into attractive output. For example, the file on the left shows basic Markdown and the resulting output on the right:

1.3.2 R Code Chunks

Within an R Markdown file, R Code Chunks can be embedded with the native Markdown syntax for fenced code regions. For example, the following code chunk computes a data summary and renders a plot as a PNG image:

1.3.3 Inline R Code

You can also evaluate R expressions inline by enclosing the expression within a single back-tick qualified with ‘r’. For example, the following code embeds R results as text in the output at right
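As a small illustration of inline evaluation, the sketch below passes one line of R Markdown text to knitr::knit() and prints the result. It assumes knitr is installed; the sentence itself is made up and uses the built-in cars data set.

```r
library(knitr)

# One line of R Markdown text with an inline expression: `r nrow(cars)`
line <- "The cars data set has `r nrow(cars)` rows."

# knit() evaluates the inline expression and splices its value into the text
md <- knit(text = line, quiet = TRUE)
cat(md, "\n")
```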

1.3.4 Rendering Output

There are two ways to render an R Markdown document into its final output format. If you are using RStudio, then the “Knit” button (Ctrl+Shift+K) will render the document and display a preview of it. If you are not using RStudio, then you simply need to call the rmarkdown::render function, for example:

rmarkdown::render("input.Rmd")

Note that both methods use the same mechanism; RStudio’s “Knit” button calls rmarkdown::render() under the hood.

1.3.5 Using Parameters

R Markdown documents can contain a metadata section that includes title, author, and date information as well as options for customizing output. For example, this metadata included at the top of an .Rmd file adds a table of contents and chooses a different HTML theme:

---
title: "Sample Document"
output:
  html_document:
    toc: true
    theme: united
---

You can add a params field to the metadata to provide a list of values for your document to use. R Markdown will make the list available as params within any R code chunk in the report. For example, the file below takes a filename as a parameter and uses the name to read in a data set.

Parameters let you quickly apply your report to new data sets, models, and parameter values. You can set new values for the parameters when you call rmarkdown::render(),

rmarkdown::render("input.Rmd", params = list())

as well as when you press the “Knit” button:
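A minimal sketch of a parameterized report follows, assuming rmarkdown is installed. The document content and the parameter name n are invented for illustration (n mirrors the header shown in the last section of this document); the render step needs pandoc, so it is guarded.

```r
library(rmarkdown)

fence <- strrep("`", 3)

# A made-up parameterized report: `n` is declared with a default of 10
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Parameter demo\"",
  "output: html_document",
  "params:",
  "  n: 10",
  "---",
  "",
  paste0(fence, "{r}"),
  "x <- 1:params$n",
  "length(x)",
  fence
), rmd_file)

# The defaults declared in the header can be inspected without rendering
declared <- yaml_front_matter(rmd_file)$params
declared$n

# Override the default at render time (needs pandoc, hence the guard)
if (pandoc_available()) {
  out_file <- render(rmd_file, params = list(n = 25), quiet = TRUE)
}
```

Any parameter not listed in params = list(...) keeps the default declared in the header.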

1.3.6 Insert figures

You can insert a simple static figure, for instance with “hardcoded” parameters, while also displaying the code that generated it, by setting the chunk option “echo=TRUE”:

x <- 1:10
y <- x^3
plot(x,y)

Or you can display an equivalent simple figure using params defined in the markdown header and a few more custom values for plot and axis titles.

x <- 1:params$n
y <- x^3
plot(x, 
     y, 
     xlab="My X label",
     ylab="My Y label", 
     main="My Chart Title",
     sub=paste0("Chart generated on ", params$d, " by ", params$a) # "sub" in a plot stands for subtitle
     )

Or you can insert an awesome interactive chart, which can be as simple as printing out a “plotly” object in a code chunk. Use the code snippet below, after fetching the data and functions.

1.3.7 Eval and Echo params

Download the function definition for GetYahooData() here: https://github.com/royr2/StockPriceAnalytics/blob/master/support/Yahoo%20Stock%20Data%20Pull.R

Click on “Raw” on that GitHub page to get the clean code, which can be used as a “source” for a script to run.

Here you get the display of a code chunk without evaluating it, and therefore, without printing the results from the R console after it is run:

  source("https://raw.githubusercontent.com/royr2/StockPriceAnalytics/master/support/Yahoo%20Stock%20Data%20Pull.R")

AAPL <- GetYahooData("AAPL")
IBM <- GetYahooData("IBM")

And here you get the output of running the code chunk without displaying the source code:

## [1] "Data pull successful..."

## [1] "Data pull successful..."
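The difference between the two options can be checked on a trivial chunk. This is a sketch assuming knitr is installed; the helper name knit_with and the expression 6 * 7 are made up for illustration.

```r
library(knitr)

fence <- strrep("`", 3)

# Hypothetical helper: knit the expression `6 * 7` under the given options
knit_with <- function(options) {
  knit(text = c(paste0(fence, "{r, ", options, "}"), "6 * 7", fence),
       quiet = TRUE)
}

shown_not_run <- knit_with("eval=FALSE")  # source only, no result
run_not_shown <- knit_with("echo=FALSE")  # result only, no source

cat(shown_not_run)
cat(run_not_shown)
```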

Let’s produce the interactive graph embedded in our R Markdown report:

# Plotly chart
# (written for plotly < 4.0; from plotly 4.0 on, columns must be referenced
# with formula notation, e.g. x = ~Date)
library(plotly)
mat <-  data.frame(Date = AAPL$Date, 
                   AAPL = round(AAPL$Adj.Close,2),
                   IBM = round(IBM$Adj.Close,2))
 
# The first trace shows the AAPL series, so it is labelled "Apple"
p <- mat %>% 
  plot_ly(x = Date, y = AAPL, fill = "tozeroy", name = "Apple") %>% 
  add_trace(y = IBM, fill = "tonexty", name = "IBM") %>% 
  layout(title = "Stock Prices", 
         xaxis = list(title = "Time"),
         yaxis = list(title = "Stock Prices"))
p  # That's it!

1.4 Case study

Task for YOU: 
* Reproduce this example, but select just one district from the whole data set, different from the one shown in the subset example below (i.e. not "Sants-Montjuïc"), and display the equivalent adapted tables.
* By the end of the day (after the second part), you will be able to publish your report on the internet and send the link to the professor's email: xavier.depedro@seeds4c.org .
** For this second part, you need to register an account at https://rpubs.com (it's a free and easy process, but the validation email may take a while to reach your inbox, so we'd better start creating the account now).

1.4.1 Fetch Data & Plot chart

First, let’s get the data set we will be playing with. http://www.aspb.cat/quefem/docs/InformeSalut2014_2010.pdf

Massaged with this custom code: https://github.com/xavidp/rscripts/blob/master/tabulizer_summer_school_ub_2016.R

Data set:

require(knitr)

getwd()

## [1] "/home/xavi/Dropbox/2016_SummeR_School_UB_HospClinic/day3"

datafile <- "InformeSalut2014_2010.csv"
download.file(url="https://seeds4c.org/tiki-download_file.php?fileId=453", destfile=datafile)
# Take 1 at reading a csv file into R, with default function read.csv
my.data <- read.csv(datafile, check.names=FALSE)
vnames <- colnames(my.data)
colnames(my.data) <- c("District", "Suburb", paste0("V", 1:13))
dim(my.data)

## [1] 73 15

head(my.data)

##       District                                 Suburb    V1    V2     V3
## 1 Ciutat Vella                              El Raval  55,1  38,6   60,3 
## 2 Ciutat Vella                        El Barri Gòtic  56,4  33,6  103,6 
## 3 Ciutat Vella                        La Barceloneta  58,7  37,5   82,1 
## 4 Ciutat Vella Sant Pere, Santa Caterina i la Ribera  56,7  39,4   91,2 
## 5     Eixample                         El Fort Pienc  53,7  30,2   99,0 
## 6     Eixample                    La Sagrada Família  56,3  33,5   97,5 
##     V4    V5    V6    V7     V8     V9    V10  V11 V12 V13
## 1 40,6 12,4  54,9  80,3  119,8  149,8  118,2  17,5 5,3  NA
## 2 23,5  9,8  50,1  81,3  115,7  118,7   42,3   3,8 3,9  NA
## 3 33,2 12,8  52,9  80,4  118,6  163,9   44,8   7,9   8  NA
## 4 25,4 11,4  49,6  82,4  104,3  126,2   31,9  16,4   6  NA
## 5 19,3  8,7  36,0  84,3   97,2   72,7   13,1   2,6 6,2  NA
## 6 20,8  8,7  38,0  84,3   94,2   82,1   15,4   4,7 6,3  NA

# As you can see there is an extra column that came with a note, and not real data. Therefore, we can remove it
tablelegend <- cbind(colnames(my.data[1:14]), vnames[1:14])
tablelegend <- rbind(tablelegend, unlist(strsplit(vnames[15], ":")))
colnames(tablelegend) <- c("Variable Code", "Variable Description")
knitr::kable(tablelegend, caption = "Table with kable")

Table with kable
Variable Code	Variable Description
District	District (Barcelona-Catalonia-Spain)
Suburb	Suburb
V1	Rate of over-aging, year 2014
V2	% of 75 y.o people or more living alone, year 2014
V3	Available Family income rate, year 2013*
V4	% of 15 y.o people or more with primary studies or less, year 2014
V5	% of recorded unemployment 16-64 y.o, year 2014
V6	% of non-voters municipal elections, year 2015
V7	Life expectancy when born, period 2009-2013
V8	Comparative mortality rate, period 2009-2013*
V9	Rate of Potential life years lost, period 2009-2013*
V10	Tuberculosis Rate, period 2010-2014
V11	Teenager fecundity rate, period 2010-2014
V12	Prevalence of low weight when born, period 2010-2014
Note	* 100 based on the total of Barcelona; dark gray corresponds to 25% with the worst indicator, green 25% better indicator and light gray the remaining 50%.

# As you can see, all columns were taken as factors, and not as numbers.
str(my.data)

## 'data.frame':    73 obs. of  15 variables:
##  $ District: Factor w/ 10 levels "Ciutat Vella",..: 1 1 1 1 2 2 2 2 2 2 ...
##  $ Suburb  : Factor w/ 73 levels "Baró de Viver ",..: 22 8 26 64 17 39 29 37 36 59 ...
##  $ V1      : Factor w/ 60 levels "37,3 ","43,1 ",..: 43 47 52 48 38 46 45 47 33 49 ...
##  $ V2      : Factor w/ 52 levels "10,0 ","13,5 ",..: 50 42 49 52 24 41 30 38 37 45 ...
##  $ V3      : Factor w/ 71 levels "101,0 ","102,5 ",..: 37 3 60 68 71 70 14 10 5 2 ...
##  $ V4      : Factor w/ 66 levels "10,1","10,5",..: 53 22 42 26 14 16 6 7 11 19 ...
##  $ V5      : Factor w/ 51 levels "10,0 ","10,1 ",..: 15 51 18 10 43 43 37 41 42 50 ...
##  $ V6      : Factor w/ 61 levels "29,3 ","31,7 ",..: 57 51 55 50 14 24 8 17 16 27 ...
##  $ V7      : Factor w/ 43 levels "75,2 ","78,1 ",..: 7 9 8 14 32 32 18 27 34 27 ...
##  $ V8      : Factor w/ 71 levels "100,0 ","101,1 ",..: 27 23 25 10 64 54 16 71 50 61 ...
##  $ V9      : Factor w/ 72 levels "100,3 ","101,7 ",..: 28 14 31 21 41 50 48 60 52 4 ...
##  $ V10     : Factor w/ 65 levels "0,0 ","11,0 ",..: 5 51 52 47 12 17 11 2 10 28 ...
##  $ V11     : Factor w/ 66 levels "0","0,0 ","1",..: 20 41 59 18 30 47 3 28 36 52 ...
##  $ V12     : Factor w/ 43 levels "10,4","10,8",..: 13 6 34 18 20 21 8 23 20 16 ...
##  $ V13     : logi  NA NA NA NA NA NA ...

# Take 2 at reading the file, this time with readr package, since it's better for performance and
# for self-detecting data types: numbers as numbers and not as factors, for instance, in this example.
my.data <- readr::read_csv(datafile)
vnames <- colnames(my.data)
my.data<-my.data[,-15]
colnames(my.data) <- c("District", "Suburb", paste0("V", 1:12))
str(my.data)

## Classes 'tbl_df' and 'data.frame':   73 obs. of  14 variables:
##  $ District: chr  "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
##  $ Suburb  : chr  "El Raval" "El Barri Gòtic" "La Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
##  $ V1      : num  551 564 587 567 537 563 558 564 528 572 ...
##  $ V2      : num  386 336 375 394 302 335 311 332 330 349 ...
##  $ V3      : num  603 1036 821 912 990 ...
##  $ V4      : num  406 235 332 254 193 208 127 143 173 225 ...
##  $ V5      : num  124 98 128 114 87 87 71 84 86 95 ...
##  $ V6      : num  549 501 529 496 360 380 350 368 367 386 ...
##  $ V7      : num  803 813 804 824 843 843 829 838 846 838 ...
##  $ V8      : num  1198 1157 1186 1043 972 ...
##  $ V9      : num  1498 1187 1639 1262 727 ...
##  $ V10     : chr  "118,2" "42,3" "44,8" "31,9" ...
##  $ V11     : chr  "17,5" "3,8" "7,9" "16,4" ...
##  $ V12     : num  53 39 8 6 62 63 48 65 62 58 ...

# Take 3 at reading the file, this time setting the locale so that the comma is read as the decimal mark, even though it is also the field delimiter. Fields that contain a comma come surrounded by quotation marks, so there is no confusion with this format.
# All numbers will finally be detected as numbers, and not as factors or character strings
my.data <- readr::read_csv(datafile, locale=readr::locale("es", decimal_mark = ","))
vnames <- colnames(my.data)
my.data<-my.data[,-15]
colnames(my.data) <- c("District", "Suburb", paste0("V", 1:12))
str(my.data)

## Classes 'tbl_df' and 'data.frame':   73 obs. of  14 variables:
##  $ District: chr  "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
##  $ Suburb  : chr  "El Raval" "El Barri Gòtic" "La Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
##  $ V1      : num  55.1 56.4 58.7 56.7 53.7 56.3 55.8 56.4 52.8 57.2 ...
##  $ V2      : num  38.6 33.6 37.5 39.4 30.2 33.5 31.1 33.2 33 34.9 ...
##  $ V3      : num  60.3 103.6 82.1 91.2 99 ...
##  $ V4      : num  40.6 23.5 33.2 25.4 19.3 20.8 12.7 14.3 17.3 22.5 ...
##  $ V5      : num  12.4 9.8 12.8 11.4 8.7 8.7 7.1 8.4 8.6 9.5 ...
##  $ V6      : num  54.9 50.1 52.9 49.6 36 38 35 36.8 36.7 38.6 ...
##  $ V7      : num  80.3 81.3 80.4 82.4 84.3 84.3 82.9 83.8 84.6 83.8 ...
##  $ V8      : num  119.8 115.7 118.6 104.3 97.2 ...
##  $ V9      : num  149.8 118.7 163.9 126.2 72.7 ...
##  $ V10     : num  118.2 42.3 44.8 31.9 13.1 ...
##  $ V11     : num  17.5 3.8 7.9 16.4 2.6 4.7 1 2.5 3.5 6.1 ...
##  $ V12     : num  5.3 3.9 8 6 6.2 6.3 4.8 6.5 6.2 5.8 ...
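The effect of the locale can be reproduced on a small file. The sketch below assumes readr is installed; the two-row file is invented and only mimics the format of the real data set (comma as field delimiter and as decimal mark, with numeric fields quoted).

```r
library(readr)

# Made-up file: the numeric fields are quoted because they contain a comma
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Suburb,Rate",
  "\"El Raval\",\"55,1\"",
  "\"La Barceloneta\",\"58,7\""
), csv_file)

# Telling readr that "," is the decimal mark yields proper decimal numbers
toy <- suppressMessages(
  read_csv(csv_file, locale = locale(decimal_mark = ","))
)
toy$Rate
```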

1.4.2 Transform data set?

You might want to aggregate by District:

my.ag.data <-aggregate(my.data[,3:14], by=list(my.data$District), 
  FUN=mean, na.rm=TRUE)
my.ag.data

##                Group.1       V1       V2        V3       V4        V5
## 1         Ciutat Vella 56.72500 37.27500  84.30000 30.67500 11.600000
## 2             Eixample 55.36667 32.65000 116.40000 17.81667  8.500000
## 3               Gràcia 51.78000 31.96000 102.20000 18.70000  8.920000
## 4       Horta-Guinardó 55.02727 28.59091  79.85455 29.14545 10.800000
## 5            Les Corts 50.13333 22.13333 180.13333 15.40000  5.933333
## 6           Nou Barris 54.80769 28.98462  52.23077 41.63846 15.015385
## 7          Sant Andreu 51.82857 31.92857  69.98571 33.98571 11.985714
## 8           Sant Martí 51.34000 29.54000  88.98000 27.06000 11.030000
## 9       Sants-Montjuïc 53.48889 31.71111  79.04444 31.32222 11.366667
## 10 Sarrià-Sant Gervasi 53.28000 28.22000 189.50000 10.40000  5.660000
##          V6       V7        V8        V9       V10       V11      V12
## 1  51.87500 81.10000 114.60000 139.65000 59.300000 11.400000 5.800000
## 2  36.85000 83.95000  98.10000  85.66667 14.116667  3.400000 5.966667
## 3  36.90000 83.64000  99.50000  93.72000 17.240000  4.220000 6.520000
## 4  40.99091 83.07273 106.59091 104.34545 16.172727  7.763636 7.136364
## 5  35.96667 84.13333  99.50000  93.03333  8.866667  3.533333 6.533333
## 6  47.87692 81.96154 110.80000 150.10000 19.892308 24.392308 8.246154
## 7  43.00000 83.27143  99.34286 117.87143 21.128571 13.200000 7.514286
## 8  38.79000 83.88000  97.05000  95.12000 19.130000  7.640000 6.350000
## 9  43.31111 83.48889 119.71111 148.02222 28.533333 11.544444 6.744444
## 10 33.54000 85.02000  90.42000  75.46000  9.120000  1.460000 5.380000

or you may want to subset Districts and eventually remove some column (V12 in this example):

my.subset <- subset(my.data, District == "Sants-Montjuïc", select = -V12)
knitr::kable(my.subset, caption = "Table with kable")

Table with kable
District	Suburb	V1	V2	V3	V4	V5	V6	V7	V8	V9	V10	V11
Sants-Montjuïc	Poble-sec	55.5	35.3	71.0	33.3	11.8	46.6	82.6	106.2	106.9	41.3	12.8
Sants-Montjuïc	La Marina del Prat Vermell	61.9	33.6	59.1	54.3	20.1	64.8	83.1	274.1	566.5	69.8	33.6
Sants-Montjuïc	La Marina del Port	53.6	30.0	70.9	37.6	12.1	44.5	84.0	96.3	110.8	25.1	10.2
Sants-Montjuïc	La Font de la Guatlla	51.5	30.2	77.8	26.4	12.4	41.7	82.3	110.8	96.4	11.7	2.8
Sants-Montjuïc	Hostafrancs	51.7	33.2	77.2	28.3	9.4	41.2	82.6	109.5	92.0	23.9	11.6
Sants-Montjuïc	La Bordeta	53.1	26.8	71.4	29.3	10.0	38.9	83.3	97.6	110.2	23.7	8.1
Sants-Montjuïc	Sants-Badal	50.8	31.5	76.6	30.6	8.3	41.2	84.5	92.0	88.3	22.2	14.7
Sants-Montjuïc	Sants	54.5	33.9	82.6	25.5	9.3	37.1	83.7	101.1	89.7	24.1	8.5
Sants-Montjuïc	Les Corts	48.8	30.9	124.8	16.6	8.9	33.8	85.3	89.8	71.4	15.0	1.6
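Both operations are easier to follow on a tiny data frame. The following sketch uses base R only; the data are invented and merely have the same shape as my.data (a District column, a Suburb column and numeric variables).

```r
# Toy data with the same shape as my.data (all values are invented)
toy <- data.frame(
  District = c("A", "A", "B", "B"),
  Suburb   = c("a1", "a2", "b1", "b2"),
  V1       = c(10, 20, 30, 50),
  V2       = c(1, 3, 5, NA),
  stringsAsFactors = FALSE
)

# Mean of every numeric column per district, ignoring missing values;
# naming the list element gives the grouping column a readable name
by_district <- aggregate(toy[, c("V1", "V2")],
                         by = list(District = toy$District),
                         FUN = mean, na.rm = TRUE)
by_district

# Rows of one district only, dropping column V2
only_b <- subset(toy, District == "B", select = -V2)
only_b
```

Naming the element in by = list(District = ...) avoids the default Group.1 column name seen in the output above.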

TASK:
Subset the generic my.data set to get just one District.

1.4.3 Static tables using …

Click below on the different names (Kable, Xtable, Stargazer, formattable), to open the tabs with more information about each option. This display is organized within tabs (“tabbed” display; see the R Markdown document to see how it was generated).

1.4.3.1 Kable

knitr::kable(head(my.data[,1:5]), caption = "Table with kable")

Table with kable
District	Suburb	V1	V2	V3
Ciutat Vella	El Raval	55.1	38.6	60.3
Ciutat Vella	El Barri Gòtic	56.4	33.6	103.6
Ciutat Vella	La Barceloneta	58.7	37.5	82.1
Ciutat Vella	Sant Pere, Santa Caterina i la Ribera	56.7	39.4	91.2
Eixample	El Fort Pienc	53.7	30.2	99.0
Eixample	La Sagrada Família	56.3	33.5	97.5

1.4.3.2 Xtable

print(xtable::xtable(head(my.data[,1:5]), caption = "Table with xtable"),
 type = "html", html.table.attributes = "border=1")

Table with xtable
	District	Suburb	V1	V2	V3
1	Ciutat Vella	El Raval	55.10	38.60	60.30
2	Ciutat Vella	El Barri Gòtic	56.40	33.60	103.60
3	Ciutat Vella	La Barceloneta	58.70	37.50	82.10
4	Ciutat Vella	Sant Pere, Santa Caterina i la Ribera	56.70	39.40	91.20
5	Eixample	El Fort Pienc	53.70	30.20	99.00
6	Eixample	La Sagrada Família	56.30	33.50	97.50

1.4.3.3 Stargazer

stargazer::stargazer(head(my.data[,1:5]), type = "html",
 title = "Table with stargazer", summary=FALSE)

**Table with stargazer**

	District	Suburb	V1	V2	V3

1	Ciutat Vella	El Raval	55.1	38.6	60.3
2	Ciutat Vella	El Barri Gòtic	56.4	33.6	103.6
3	Ciutat Vella	La Barceloneta	58.7	37.5	82.1
4	Ciutat Vella	Sant Pere, Santa Caterina i la Ribera	56.7	39.4	91.2
5	Eixample	El Fort Pienc	53.7	30.2	99
6	Eixample	La Sagrada Família	56.3	33.5	97.5

1.4.3.4 Formattable

For those interested in displaying their results as tables, there is another way of displaying tabular data, with custom formats that depend on the values shown, called formattable. It didn’t work by default with the provided examples, but it is a very promising package for improving the display of results in tables (at least for printing in color).

See: http://renkun.me/formattable/

1.4.3.4.1 tabset end

End of display of static tables in a set of tabs

1.4.4 Dynamic tables (i): gvT

gvT stands for GoogleVisTable, a dynamic table display from the googleVis R package. It allows you, at least, to sort columns based on their values in real time and, with some extra work, to paint cells based on some logic in your scripts (values, legend codes from figures, etc.; see its documentation pages).

require(googleVis)
# Set the googleVis options first to change the behaviour of plot.gvis, so
# that only the chart component of the HTML file is written into the output
# file.
op <- options(gvis.plot.tag = "chart")

# Make a clone of my.data, with a first extra column for id or samples (with
# 2 digits for easy re-sorting later on)
my.data.indexed <- cbind(sprintf("%02d", as.numeric(rownames(my.subset))), my.subset)
colnames(my.data.indexed)[1] <- "#"

## Table with paging disabled
gvT <- gvisTable(my.data.indexed, options = list(page = "disable", height = "automatic", 
    width = "automatic"))
plot(gvT)

# save the googlevis table to disk

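# Assign file name for the my.data.indexed
# analystName is not defined in the code shown here; the value below is only
# a placeholder, so set it to your own name
analystName <- "analyst"
outFileName <- paste("my.subset", analystName, format(Sys.Date(), "%y%m%d"), 
    "indexed.html", sep = ".")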

# Display just the chart in the generated html
cat(gvT$html$chart, file = outFileName)

1.4.5 Dynamic tables (ii): DT

The R package DT provides an R interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables provides filtering, pagination, sorting, and many other features in the tables.

require('DT')
d <- data.frame(my.data, stringsAsFactors = FALSE)
dt <- datatable(d, filter = 'bottom', options = list(pageLength = 5)) %>%
  formatStyle('V1',
              color = styleInterval(c(0.5, 56), c('black', 'red', 'blue')),
              backgroundColor = styleInterval(56.5, c('snow', 'lightyellow')),
              # font-weight takes 'normal' or 'bold' ('italics' is not a valid value)
              fontWeight = styleInterval(58.0, c('normal', 'bold')))

Display within this Rmd file the dynamic table produced

dt

You can also save the whole dynamic table in its own html file, to reuse elsewhere or display in full width, etc.

saveWidget(dt, paste("my.data", analystName, format(Sys.Date(), "%y%m%d"),'summary.html', sep="."))
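The styleInterval() calls used for the table above follow one rule worth spelling out: n cut points need n + 1 values. A short sketch, assuming DT is installed; the object names are made up.

```r
library(DT)

# Two cut points split the number line into three intervals (below 0.5,
# between 0.5 and 56, above 56), so three values are needed
js_colour <- styleInterval(c(0.5, 56), c("black", "red", "blue"))

# A mismatch between the number of cuts and values is rejected with an error
mismatch <- tryCatch(
  styleInterval(c(0.5, 56), c("black", "red")),
  error = function(e) conditionMessage(e)
)
mismatch
```

The value returned by styleInterval() is a piece of JavaScript that formatStyle() applies to each cell in the browser, so no colours are computed on the R side.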

1.5 Questions?

Ask your classmates in class, or ask the professor directly.

2 Reports in html (11:30-13:30h)

You have produced your analysis results, and you want to tell the world (or your customer) about them without complicated steps, such as requiring specific programs that might not be available on your readers' computers or mobile devices.

You can use html reports, so that they can be easily seen by anyone, regardless of the Operating System they use, or device (tablet, smartphone, …), and they can be seen at any time.

2.1 Custom reports via markdown + knitr

Write html reports with R Markdown from RStudio, in a similar fashion to what you have seen in this example.

See another example, for instance, here:

http://www.jacolienvanrij.com/Tutorials/tutorialMarkdown.html

2.2 Custom reports via R packages. E.g. Nozzle.R1

Nozzle is an R package that provides an API to generate HTML reports with dynamic user interface elements based on JavaScript and CSS (Cascading Style Sheets). Nozzle was designed to facilitate summarization and rapid browsing of complex results in data analysis pipelines where multiple analyses are performed frequently on big data sets. The package can be applied to any project where user-friendly reports need to be created.

See more here:

You could publish this type of report by uploading the generated html, plus the png images and the corresponding pdf files produced and linked from your report, to a web or ftp/sftp server that you or your institution have access to. The report then becomes accessible to others through a web browser or through public ftp accounts.

2.3 Embed dynamic objects

Beyond static charts and graphs and beyond dynamic tables: maps, other plot.ly charts, rCharts & htmlwidgets, animations, Shiny apps, …

2.3.1 Maps with Plot.ly (i)

# Learn about API authentication here: https://plot.ly/r/getting-started
# Find your api_key here: https://plot.ly/settings/api

df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_world_gdp_with_codes.csv')

# light grey boundaries
l <- list(color = toRGB("grey"), width = 0.5)

# specify map projection/options
g <- list(
  showframe = FALSE,
  showcoastlines = FALSE,
  projection = list(type = 'Equirectangular') # Instead of the usual but very biased 'Mercator' projection
)

plot_ly(df, z = GDP..BILLIONS., text = COUNTRY, locations = CODE, type = 'choropleth',
        color = GDP..BILLIONS., colors = 'Blues', marker = list(line = l),
        colorbar = list(tickprefix = '$', title = 'GDP Billions US$')) %>%
  layout(title = '2014 Global GDP<br>Source: CIA World Factbook',
         geo = g)

2.3.2 Maps with Plot.ly (ii)

df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_ebola.csv')
# restrict from June to September
df <- subset(df, Month %in% 6:9)
# ordered factor variable with month abbreviations
df$abbrev <- ordered(month.abb[df$Month], levels = month.abb[6:9])
# September totals
df9 <- subset(df, Month == 9)

# common plot options
g <- list(
  scope = 'africa',
  showframe = F,
  showland = T,
  landcolor = toRGB("grey90")
)

g1 <- c(
  g,
  resolution = 50,
  showcoastlines = T,
  countrycolor = toRGB("white"),
  coastlinecolor = toRGB("white"),
  projection = list(type = 'Mercator'),
  list(lonaxis = list(range = c(-15, -5))),
  list(lataxis = list(range = c(0, 12))),
  list(domain = list(x = c(0, 1), y = c(0, 1)))
)

g2 <- c(
  g,
  showcountries = F,
  bgcolor = toRGB("white", alpha = 0),
  list(domain = list(x = c(0, .6), y = c(0, .6)))
)

plot_ly(df, type = 'scattergeo', mode = 'markers', locations = Country,
        locationmode = 'country names', text = paste(Value, "cases"),
        color = as.ordered(abbrev), marker = list(size = Value/50), inherit = F) %>%
  add_trace(type = 'scattergeo', mode = 'text', geo = 'geo2', showlegend = F,
            lon = 21.0936, lat = 7.1881, text = 'Africa') %>%
  add_trace(type = 'choropleth', locations = Country, locationmode = 'country names',
            z = Month, colors = "black", showscale = F, geo = 'geo2', data = df9) %>%
  layout(title = 'Ebola cases reported by month in West Africa 2014<br> Source: <a href="https://data.hdx.rwlabs.org/dataset/rowca-ebola-cases">HDX</a>',
         geo = g1, geo2 = g2)

2.3.3 Other plot.ly charts

See other plot.ly charts and applications, from several categories:

Basic
Statistical
Scientific
Maps
3D
Add events and controls

2.3.4 Google Charts (other than tables)

See googleVis examples: https://cran.r-project.org/web/packages/googleVis/vignettes/googleVis_examples.html

2.3.5 rCharts & htmlwidgets

rCharts is an R package to create, customize and publish interactive JavaScript visualizations from R using a familiar lattice-style plotting interface. See: http://rcharts.io

The htmlwidgets package brings the best of JavaScript data visualization to R. Use JavaScript visualization libraries at the R console, just like plots. Embed widgets in R Markdown documents and Shiny web applications. Develop new widgets using a framework that seamlessly bridges R and JavaScript.

See:

2.3.6 Inserting Shiny Apps

Shiny is a web application framework for R. You can turn your analyses into interactive web applications, without much previous knowledge of HTML, CSS, or JavaScript being required.

You can insert a Shiny application in your Markdown document if you change the parameter runtime: static into runtime: shiny. Try that, and then set the chunk option eval=TRUE to run this chunk of code and get the Shiny application displayed below:

source("https://raw.githubusercontent.com/rstudio/rmdexamples/master/R/kmeans_cluster.R")
kmeans_cluster(iris)

2.3.7 Where to publish your reports

Easy web publishing from R with RPubs (among others). Write R Markdown documents in RStudio. Share them here on RPubs. (It’s free, and couldn’t be simpler!)

See http://rpubs.com/about/getting-started

RStudio lets you harness the power of R Markdown to create documents that weave together your writing and the output of your R code. And now, with RPubs, you can publish those documents on the web with the click of a button!

Prerequisites

You’ll need R itself, RStudio (v0.96.230 or later), and the knitr package (v0.5 or later).

Instructions

In RStudio, create a new R Markdown document by choosing File | New | R Markdown. Click the Knit HTML button in the doc toolbar to preview your document. In the preview window, click the Publish button.

2.4 Case study

Task for YOU: 
* Add to your report a Plot.ly scatter plot equivalent to the one shown below, but for the same subset of data that you used in the previous part of today's session. 
* Publish your report on the internet at [RPubs](rpubs.com) and send the link to the professor's email: xavier.depedro@seeds4c.org .

require(knitr)

getwd()

## [1] "/home/xavi/Dropbox/2016_SummeR_School_UB_HospClinic/day3"

datafile <- "InformeSalut2014_2010.csv"
download.file(url="https://seeds4c.org/tiki-download_file.php?fileId=453", destfile=datafile)

# Get all data in place
my.data <- readr::read_csv(datafile, locale=readr::locale("es", decimal_mark = ","))
vnames <- colnames(my.data)
my.data<-my.data[,-15]
colnames(my.data) <- c("District", "Suburb", paste0("V", 1:12))
str(my.data)

## Classes 'tbl_df' and 'data.frame':   73 obs. of  14 variables:
##  $ District: chr  "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
##  $ Suburb  : chr  "El Raval" "El Barri Gòtic" "La Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
##  $ V1      : num  55.1 56.4 58.7 56.7 53.7 56.3 55.8 56.4 52.8 57.2 ...
##  $ V2      : num  38.6 33.6 37.5 39.4 30.2 33.5 31.1 33.2 33 34.9 ...
##  $ V3      : num  60.3 103.6 82.1 91.2 99 ...
##  $ V4      : num  40.6 23.5 33.2 25.4 19.3 20.8 12.7 14.3 17.3 22.5 ...
##  $ V5      : num  12.4 9.8 12.8 11.4 8.7 8.7 7.1 8.4 8.6 9.5 ...
##  $ V6      : num  54.9 50.1 52.9 49.6 36 38 35 36.8 36.7 38.6 ...
##  $ V7      : num  80.3 81.3 80.4 82.4 84.3 84.3 82.9 83.8 84.6 83.8 ...
##  $ V8      : num  119.8 115.7 118.6 104.3 97.2 ...
##  $ V9      : num  149.8 118.7 163.9 126.2 72.7 ...
##  $ V10     : num  118.2 42.3 44.8 31.9 13.1 ...
##  $ V11     : num  17.5 3.8 7.9 16.4 2.6 4.7 1 2.5 3.5 6.1 ...
##  $ V12     : num  5.3 3.9 8 6 6.2 6.3 4.8 6.5 6.2 5.8 ...

# As you can see there is an extra column that came with a note, and not real data. Therefore, we can remove it
tablelegend <- cbind(colnames(my.data[1:14]), vnames[1:14])
tablelegend <- rbind(tablelegend, unlist(strsplit(vnames[15], ":")))
colnames(tablelegend) <- c("Variable Code", "Variable Description")
knitr::kable(tablelegend, caption = "Table with kable")

Table with kable
Variable Code	Variable Description
District	District (Barcelona-Catalonia-Spain)
Suburb	Suburb
V1	Rate of over-aging, year 2014
V2	% of 75 y.o people or more living alone, year 2014
V3	Available Family income rate, year 2013*
V4	% of 15 y.o people or more with primary studies or less, year 2014
V5	% of recorded unemployment 16-64 y.o, year 2014
V6	% of non-voters municipal elections, year 2015
V7	Life expectancy when born, period 2009-2013
V8	Comparative mortality rate, period 2009-2013*
V9	Rate of Potential life years lost, period 2009-2013*
V10	Tuberculosis Rate, period 2010-2014
V11	Teenager fecundity rate, period 2010-2014
V12	Prevalence of low weight when born, period 2010-2014
Note	* 100 based on the total of Barcelona; dark gray corresponds to 25% with the worst indicator, green 25% better indicator and light gray the remaining 50%.

# Simple interactive scatter chart
library(plotly)

# Add some info about variables displayed
cat(paste0("V3: ", tablelegend[5,2]),
    paste0("\nV5: ", tablelegend[7,2]),
    paste0("\nV6: ", tablelegend[8,2]),
    paste0("\nV11: ", tablelegend[13,2]))

## V3: Available Family income rate, year 2013* 
## V5: % of recorded unemployment 16-64 y.o, year 2014 
## V6: % of non-voters municipal elections, year 2015 
## V11: Teenager fecundity rate, period 2010-2014

plot_ly(my.data, 
        x = V5, y = V6, text = paste("Over-aging: ", V1, 
                                     "Income: ", V3,
                                     "Fecundity: ", V11,
                                     "Suburb: ", Suburb),
        mode = "markers",  # the scatter mode is "markers", not "marker"
        size = V3, opacity = V3,
        group = District)
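The call above relies on the plotly 3.x syntax (the session info at the end of this document lists plotly_3.6.0), where column names are passed bare. Since plotly 4.0, columns are referenced with a formula (`~V5`), and grouping is usually expressed through `color` instead of `group`. The following is a minimal sketch of the same chart in that newer syntax. The `demo` data frame and all its values are made up for illustration and only stand in for `my.data`.

```r
library(plotly)

# Hypothetical stand-in for my.data: made-up values, same column names
demo <- data.frame(
  District = c("A", "A", "B", "B"),
  Suburb   = c("s1", "s2", "s3", "s4"),
  V1  = c(50, 48, 55, 60),
  V3  = c(70, 110, 85, 55),
  V5  = c(12, 10, 13, 11),
  V6  = c(55, 50, 53, 49),
  V11 = c(17, 4, 8, 16)
)

# plotly >= 4.0: columns are referenced with formulas (~)
p <- plot_ly(
  demo,
  x = ~V5, y = ~V6,
  type = "scatter", mode = "markers",
  color = ~District,   # one colour (and legend entry) per district
  size  = ~V3,         # marker size scaled by income
  text  = ~paste0("Over-aging: ", V1,
                  "<br>Income: ", V3,
                  "<br>Fecundity: ", V11,
                  "<br>Suburb: ", Suburb)
)
# Print p (or leave it as the last line of a chunk) to display the widget
```

The `<br>` tags put each value on its own line in the hover label.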

2.4.1 Publish your report

Publish your report on RPubs, with the account you registered earlier today.

If needed, see again: http://rpubs.com/about/getting-started

Once published, if you make further changes to your document, you can either republish the updated version to the same URL or publish it as a new document.

2.5 Questions?

Ask your classmates in class, or the professor directly, if time permits.

3 Last notes

This document has been produced with the following parameters in the R Markdown header and the following versions of R packages:

3.1 Markdown header parameters

---
title: "Markdown, Automation & HTML Reports (UBISS16, Jul 6)"
author: "Xavier de Pedro, Ph.D. xavier.depedro@vhir.org - https://seeds4c.org/ubiss16d3"
date: "July 6, 2016" 
output:
  html_document:
    toc: true
    number_sections: true
    toc_depth: 3
    toc_float: 
      collapsed: true
      smooth_scroll: true
    theme: united
  pdf_document:
    toc: true
    highlight: zenburn
runtime: static
params:
  n: 100
  d: !r Sys.Date()
  a: "My Name"
---
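The `params` block in the header declares default values that code chunks can read as `params$n`, `params$d` and `params$a`. As a rough sketch of how those defaults are parsed, the example below feeds a trimmed, hypothetical copy of the header (only the `params` block, with a made-up title) to `knitr::knit_params()` and collects the results into a named list.

```r
library(knitr)

# Trimmed, hypothetical copy of the header: only the params block is kept
header <- c(
  "---",
  "title: \"Demo\"",
  "params:",
  "  n: 100",
  "  d: !r Sys.Date()",
  "  a: \"My Name\"",
  "---",
  "",
  "Body text."
)

# One entry per declared parameter, each with a name and a value
declared <- knit_params(header)

# Collect the name/value pairs into a plain named list
defaults <- setNames(
  lapply(declared, function(p) p$value),
  vapply(declared, function(p) p$name, character(1))
)
str(defaults)
```

The `!r` tag marks a value as an R expression, so `d` is evaluated when the document is knitted. The defaults can be overridden at render time, for example with `rmarkdown::render(myfile, params = list(n = 50))`.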

3.2 Generate R or PDF from Rmd

You can extract all the R commands out of the R Markdown document (Rmd) with the purl function from the knitr package:

wd <- "/home/xavi/Dropbox/2016_SummeR_School_UB_HospClinic/day3"
myfile <- file.path(wd, "ubiss16_d3_Markdown_Automation_Html_Reports.Rmd")
knitr::purl(myfile)
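To try `purl` without the course file, the following self-contained sketch writes a small, hypothetical two-chunk Rmd to a temporary file, extracts its code and runs the result. The file contents and the object names `x` and `y` are invented for the example. `documentation = 0` asks for the code only, without chunk headers or text.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, built here to avoid nesting fences

# Hypothetical Rmd document with two code chunks
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r first}"),
  "x <- 1 + 1",
  fence,
  "",
  paste0(fence, "{r second}"),
  "y <- x * 10",
  fence
), rmd)

# Extract only the R code into a temporary .R script
r_file <- purl(rmd, output = tempfile(fileext = ".R"),
               documentation = 0, quiet = TRUE)

# Run the extracted script in its own environment
env <- new.env()
source(r_file, local = env)
env$y
```

By default `purl` writes the script next to the input file with an `.R` extension. Passing `output` avoids touching the working directory.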

With rmarkdown::render, you can pass an output format name as the second argument to choose among the formats declared in the document header. For example:

wd <- "/home/xavi/Dropbox/2016_SummeR_School_UB_HospClinic/day3"
myfile <- file.path(wd, "ubiss16_d3_Markdown_Automation_Html_Reports.Rmd")
rmarkdown::render(myfile, "pdf_document")

# You can also render all formats defined in an input file with:
rmarkdown::render(myfile, "all")

3.3 Session info

sessionInfo()

## R version 3.3.0 (2016-05-03)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04 LTS
## 
## locale:
##  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
##  [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
##  [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] shiny_0.13.2     webshot_0.3.2    DT_0.1           googleVis_0.5.10
##  [5] stargazer_5.2    xtable_1.8-2     readr_0.2.2      plotly_3.6.0    
##  [9] ggplot2_2.1.0    knitr_1.13       rmarkdown_0.9.6 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5        RColorBrewer_1.1-2 highr_0.6         
##  [4] formatR_1.4        plyr_1.8.3         base64enc_0.1-3   
##  [7] viridis_0.3.4      tools_3.3.0        digest_0.6.9      
## [10] jsonlite_1.0       evaluate_0.9       gtable_0.2.0      
## [13] DBI_0.4-1          yaml_2.1.13        parallel_3.3.0    
## [16] gridExtra_2.2.1    dplyr_0.4.3        httr_1.1.0        
## [19] stringr_1.0.0      htmlwidgets_0.6    grid_3.3.0        
## [22] R6_2.1.2           RJSONIO_1.3-0      tidyr_0.4.1       
## [25] magrittr_1.5       scales_0.4.0       htmltools_0.3.5   
## [28] assertthat_0.1     mime_0.4           colorspace_1.2-6  
## [31] httpuv_1.3.3       stringi_1.1.1      munsell_0.4.3

## Set options back to original options
options(op)
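
The object `op` is not defined in this section. It is presumably the value returned by an earlier `options(...)` call in the document. A minimal sketch of that save-and-restore pattern, using the `digits` option purely as an illustrative assumption:

```r
before <- getOption("digits")

# options() returns the previous values of whatever it changes
op <- options(digits = 3)
print(pi)        # printed with 3 significant digits

# Restore the saved values
options(op)
print(pi)        # printed with the original number of digits
```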