UBISS16 Day 3. 9:00 h - 11:00 h (Xavier de Pedro) http://rpubs.com/Xavi/ubiss16d3
Using knitr, r-markdown and Plot.ly.
Some examples from:
You can display the code that is run without showing the usual messages the R console prints when installing a package or loading a library, by setting the chunk option “message=FALSE”:
# Install (only if needed) and then load every package used in this session.
# The original per-package blocks installed missing packages but did not load them afterwards.
pkgs <- c("rmarkdown", "knitr", "plotly", "readr", "xtable", "stargazer",
          "googleVis", "DT", "webshot", "shiny", "htmlwidgets")
for (pkg in pkgs) {
  if (!require(pkg, character.only = TRUE, quietly = TRUE)) {
    install.packages(pkg, repos = "http://cran.rediris.es")
    library(pkg, character.only = TRUE)
  }
}
Try re-running it with “message=TRUE”
Creating documents with R Markdown starts with an .Rmd file that contains a combination of markdown (content with simple text formatting) and R code chunks. The .Rmd file is fed to knitr, which executes all of the R code chunks and creates a new markdown (.md) document that includes the R code and its output. The markdown file generated by knitr is then processed by pandoc, which is responsible for creating the finished web page, PDF, MS Word document, slide show, handout, book, dashboard, package vignette or other format. This may sound complicated, but R Markdown makes it extremely simple by encapsulating all of the above processing into a single render function. Better still, RStudio includes a “Knit” button that enables you to render an .Rmd and preview it using a single click or keyboard shortcut.
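To make the two stages visible, the sketch below runs them by hand on a tiny .Rmd file: knitr::knit() executes the chunk and writes a .md file, and rmarkdown::pandoc_convert() turns that .md into HTML. It assumes knitr and rmarkdown are installed; the file names are illustrative, and the pandoc step is skipped when pandoc is not available. In everyday use, rmarkdown::render() performs both steps (and cleans up the intermediate files) for you.

```r
# Minimal sketch of the knitr -> pandoc pipeline that rmarkdown::render() wraps.
# File names below are illustrative temporary files.
library(knitr)
library(rmarkdown)

rmd_file <- file.path(tempdir(), "pipeline-demo.Rmd")
writeLines(c(
  "# Pipeline demo",
  "",
  "```{r}",
  "summary(cars$speed)",
  "```"
), rmd_file)

# Stage 1: knitr runs the R chunks and writes plain markdown (.md)
md_file <- knitr::knit(rmd_file,
                       output = file.path(tempdir(), "pipeline-demo.md"),
                       quiet = TRUE)
cat(readLines(md_file), sep = "\n")

# Stage 2: pandoc converts the markdown into the final format (HTML here)
if (rmarkdown::pandoc_available()) {
  html_file <- file.path(tempdir(), "pipeline-demo.html")
  rmarkdown::pandoc_convert(md_file, to = "html", output = html_file)
}
```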
You can install the R Markdown package from CRAN as follows:
install.packages("rmarkdown")
In our case, it was installed (only if needed) in a step above. Once installed, you can open a new .Rmd file in the RStudio IDE by going to File > New File > R Markdown.
Markdown is a simple formatting language designed to make authoring content easy for everyone. Rather than writing complex markup code (e.g. HTML or LaTeX), you write in plain text with formatting cues. Pandoc uses these cues to turn your document into attractive output. For example, the file on the left shows basic Markdown and the resulting output on the right:
Within an R Markdown file, R Code Chunks can be embedded with the native Markdown syntax for fenced code regions. For example, the following code chunk computes a data summary and renders a plot as a PNG image:
You can also evaluate R expressions inline by enclosing the expression within single backticks and prefixing it with r (for example, `r nrow(cars)`). The result of the expression is embedded as text in the output.
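As a quick check of how inline expressions behave, the sketch below knits a single line of text with knitr::knit(text = ...), which returns the knitted text instead of writing a file. It assumes knitr is installed and uses only the built-in cars data set (speed in mph).

```r
# Sketch: knitr replaces each `r ...` expression with its value in the text.
library(knitr)

n_cars <- nrow(cars)  # objects in the calling environment are visible to inline code
txt <- "The cars data set has `r n_cars` rows and a mean speed of `r mean(cars$speed)` mph."
out <- knitr::knit(text = txt, quiet = TRUE)
cat(out, "\n")
```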
There are two ways to render an R Markdown document into its final output format. If you are using RStudio, the “Knit” button (Ctrl+Shift+K) renders the document and displays a preview of it. If you are not using RStudio, you simply need to call the rmarkdown::render() function, for example:
rmarkdown::render("input.Rmd")
Note that both methods use the same mechanism; RStudio’s “Knit” button calls rmarkdown::render() under the hood.
R Markdown documents can contain a metadata section that includes title, author, and date information as well as options for customizing output. For example, this metadata included at the top of an .Rmd file adds a table of contents and chooses a different HTML theme:
---
title: "Sample Document"
output:
  html_document:
    toc: true
    theme: united
---
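In this header the indentation is meaningful: toc and theme are options of html_document, which is itself a value of output. One way to confirm how rmarkdown reads such a header is rmarkdown::yaml_front_matter(), sketched below on a temporary copy of the example (the file name is illustrative; rmarkdown is assumed to be installed).

```r
# Sketch: parse the YAML header of an .Rmd file into a nested R list.
library(rmarkdown)

rmd_file <- file.path(tempdir(), "metadata-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Sample Document\"",
  "output:",
  "  html_document:",
  "    toc: true",
  "    theme: united",
  "---",
  "",
  "Body text."
), rmd_file)

meta <- rmarkdown::yaml_front_matter(rmd_file)
str(meta)  # nested list: meta$output$html_document$toc, ...$theme
```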
You can add a params field to the metadata to provide a list of values for your document to use. R Markdown will make the list available as params within any R code chunk in the report. For example, the file below takes a filename as a parameter and uses the name to read in a data set.
Parameters let you quickly apply your analysis to new data sets, models, and parameters. You can set new values for the parameters when you call rmarkdown::render(),
rmarkdown::render("input.Rmd", params = list())
as well as when you press the “Knit” button (in RStudio, “Knit with Parameters…” prompts you for the values).
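Below is a minimal sketch of overriding a parameter at render time. It writes a hypothetical params-demo.Rmd whose header declares n: 10, then renders it with params = list(n = 20), so the rendered HTML should show the sum for n = 20 instead of 10. It assumes rmarkdown is installed and only renders when pandoc is available; a fresh environment is passed as envir so that any params object already in your session does not interfere.

```r
# Sketch: declare a parameter in the YAML header and override it in render().
library(rmarkdown)

rmd_file <- file.path(tempdir(), "params-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Parameterised report\"",
  "output: html_document",
  "params:",
  "  n: 10",
  "---",
  "",
  "```{r}",
  "sum(seq_len(params$n))",
  "```"
), rmd_file)

# Override the default n = 10 at render time (rendering requires pandoc)
if (rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, params = list(n = 20),
                                 output_dir = tempdir(), quiet = TRUE,
                                 envir = new.env())
}
```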
You can insert a simple static figure, for instance with “hardcoded” parameters, while also displaying the code that generated it, by setting the chunk option “echo=TRUE”:
x <- 1:10
y <- x^3
plot(x,y)
Or you can display an equivalent figure using the params defined in the YAML header, plus a few custom values for the plot and axis titles:
x <- 1:params$n
y <- x^3
plot(x,
y,
xlab="My X label",
ylab="My Y label",
main="My Chart Title",
sub=paste0("Chart generated on ", params$d, " by ", params$a) # "sub" in a plot stands for subtitle
)
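Because params only exists while rmarkdown::render() (or the Knit button) is running, the chunk above fails with "object 'params' not found" if you run it line by line in the console. A common workaround, sketched below, is to create a plain list with the same fields by hand; the values are copied from this document's YAML header. As far as we know, render() refuses to run when an object called params already exists in the environment it knits in, so remove it with rm(params) before knitting from the same session.

```r
# Mimic this document's YAML params for interactive testing only
params <- list(n = 100, d = Sys.Date(), a = "My Name")

# Same computations as in the chunk above
x <- 1:params$n
y <- x^3
subtitle <- paste0("Chart generated on ", params$d, " by ", params$a)
print(subtitle)

# Clean up before knitting in this session: rm(params)
```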
Or you can insert an awesome interactive chart, which can be as simple as printing a “plotly” object in a code chunk. Use the code snippet below after fetching the data and functions.
Download the function definition for GetYahooData() here: https://github.com/royr2/StockPriceAnalytics/blob/master/support/Yahoo%20Stock%20Data%20Pull.R
Click “Raw” on that GitHub page to get the plain code, which can then be passed to source() to run it as a script.
Here you get the display of a code chunk without evaluating it, and therefore, without printing the results from the R console after it is run:
source("https://raw.githubusercontent.com/royr2/StockPriceAnalytics/master/support/Yahoo%20Stock%20Data%20Pull.R")
AAPL <- GetYahooData("AAPL")
IBM <- GetYahooData("IBM")
And here you get the output of running the code chunk without displaying the source code:
## [1] "Data pull successful..."
## [1] "Data pull successful..."
Let’s produce the interactive graph embedded in our Rmarkdown report:
# Plotly chart (plotly >= 4.0 syntax: data columns are mapped with ~)
library(plotly)
mat <- data.frame(Date = AAPL$Date,
                  AAPL = round(AAPL$Adj.Close, 2),
                  IBM = round(IBM$Adj.Close, 2))
p <- mat %>%
  plot_ly(x = ~Date, y = ~AAPL, type = "scatter", mode = "lines",
          fill = "tozeroy", name = "Apple") %>%
  add_trace(y = ~IBM, fill = "tonexty", name = "IBM") %>%
  layout(title = "Stock Prices",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Stock Prices"))
p # That's it!
Task for YOU:
* Reproduce this example, but select just one district of the whole data set, different from the one shown in the subset example below (e.g. different from "Sants-Montjuïc"), and display the equivalent adapted tables.
* By the end of the day (after second part), you will be able to publish your report on the internet and send the link to the professor's email: xavier.depedro@seeds4c.org .
** For this second part, you need to register an account at https://rpubs.com (it's a free and easy process, but you may need to validate your email address and the validation message may take a while to arrive, so it is better to create the account now)
First, let’s get the data set we will be playing with. It was extracted from this report: http://www.aspb.cat/quefem/docs/InformeSalut2014_2010.pdf
It was then processed with this custom code: https://github.com/xavidp/rscripts/blob/master/tabulizer_summer_school_ub_2016.R
Data set:
require(knitr)
getwd()
## [1] "/home/xavi/Dropbox/2016_SummeR_School_UB_HospClinic/day3"
datafile <- "InformeSalut2014_2010.csv"
download.file(url="https://seeds4c.org/tiki-download_file.php?fileId=453", destfile=datafile)
# Take 1 at reading a csv file into R, with default function read.csv
my.data <- read.csv(datafile, check.names=FALSE)
vnames <- colnames(my.data)
colnames(my.data) <- c("District", "Suburb", paste0("V", 1:13))
dim(my.data)
## [1] 73 15
head(my.data)
## District Suburb V1 V2 V3
## 1 Ciutat Vella El Raval 55,1 38,6 60,3
## 2 Ciutat Vella El Barri Gòtic 56,4 33,6 103,6
## 3 Ciutat Vella La Barceloneta 58,7 37,5 82,1
## 4 Ciutat Vella Sant Pere, Santa Caterina i la Ribera 56,7 39,4 91,2
## 5 Eixample El Fort Pienc 53,7 30,2 99,0
## 6 Eixample La Sagrada Família 56,3 33,5 97,5
## V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
## 1 40,6 12,4 54,9 80,3 119,8 149,8 118,2 17,5 5,3 NA
## 2 23,5 9,8 50,1 81,3 115,7 118,7 42,3 3,8 3,9 NA
## 3 33,2 12,8 52,9 80,4 118,6 163,9 44,8 7,9 8 NA
## 4 25,4 11,4 49,6 82,4 104,3 126,2 31,9 16,4 6 NA
## 5 19,3 8,7 36,0 84,3 97,2 72,7 13,1 2,6 6,2 NA
## 6 20,8 8,7 38,0 84,3 94,2 82,1 15,4 4,7 6,3 NA
# The last column holds a note, not real data: use its header as the final row of the variable legend (the column itself is dropped later on)
tablelegend <- cbind(colnames(my.data[1:14]), vnames[1:14])
tablelegend <- rbind(tablelegend, unlist(strsplit(vnames[15], ":")))
colnames(tablelegend) <- c("Variable Code", "Variable Description")
knitr::kable(tablelegend, caption = "Table with kable")
| Variable Code | Variable Description |
|---|---|
| District | District (Barcelona-Catalonia-Spain) |
| Suburb | Suburb |
| V1 | Rate of over-aging, year 2014 |
| V2 | % of 75 y.o people or more living alone, year 2014 |
| V3 | Available Family income rate, year 2013* |
| V4 | % of 15 y.o people or more with primary studies or less, year 2014 |
| V5 | % of recorded unemployment 16-64 y.o, year 2014 |
| V6 | % of non-voters municipal elections, year 2015 |
| V7 | Life expectancy when born, period 2009-2013 |
| V8 | Comparative mortality rate, period 2009-2013* |
| V9 | Rate of Potential life years lost, period 2009-2013* |
| V10 | Tuberculosis Rate, period 2010-2014 |
| V11 | Teenager fecundity rate, period 2010-2014 |
| V12 | Prevalence of low weight when born, period 2010-2014 |
| Note | * 100 based on the total of Barcelona; dark gray corresponds to 25% with the worst indicator, green 25% better indicator and light gray the remaining 50%. |
# As you can see, the columns were read as factors (the default for text in R < 4.0), or as logical for the empty V13, and not as numbers.
str(my.data)
## 'data.frame': 73 obs. of 15 variables:
## $ District: Factor w/ 10 levels "Ciutat Vella",..: 1 1 1 1 2 2 2 2 2 2 ...
## $ Suburb : Factor w/ 73 levels "Baró de Viver ",..: 22 8 26 64 17 39 29 37 36 59 ...
## $ V1 : Factor w/ 60 levels "37,3 ","43,1 ",..: 43 47 52 48 38 46 45 47 33 49 ...
## $ V2 : Factor w/ 52 levels "10,0 ","13,5 ",..: 50 42 49 52 24 41 30 38 37 45 ...
## $ V3 : Factor w/ 71 levels "101,0 ","102,5 ",..: 37 3 60 68 71 70 14 10 5 2 ...
## $ V4 : Factor w/ 66 levels "10,1","10,5",..: 53 22 42 26 14 16 6 7 11 19 ...
## $ V5 : Factor w/ 51 levels "10,0 ","10,1 ",..: 15 51 18 10 43 43 37 41 42 50 ...
## $ V6 : Factor w/ 61 levels "29,3 ","31,7 ",..: 57 51 55 50 14 24 8 17 16 27 ...
## $ V7 : Factor w/ 43 levels "75,2 ","78,1 ",..: 7 9 8 14 32 32 18 27 34 27 ...
## $ V8 : Factor w/ 71 levels "100,0 ","101,1 ",..: 27 23 25 10 64 54 16 71 50 61 ...
## $ V9 : Factor w/ 72 levels "100,3 ","101,7 ",..: 28 14 31 21 41 50 48 60 52 4 ...
## $ V10 : Factor w/ 65 levels "0,0 ","11,0 ",..: 5 51 52 47 12 17 11 2 10 28 ...
## $ V11 : Factor w/ 66 levels "0","0,0 ","1",..: 20 41 59 18 30 47 3 28 36 52 ...
## $ V12 : Factor w/ 43 levels "10,4","10,8",..: 13 6 34 18 20 21 8 23 20 16 ...
## $ V13 : logi NA NA NA NA NA NA ...
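The factors appear because the values use a comma as decimal mark (and many carry a trailing space), so read.csv() cannot recognise them as numbers. The sketch below uses a hypothetical helper, comma_to_num(), that is not part of the original tutorial, to show the classic pitfall of calling as.numeric() directly on such a factor and one base-R way to recover the values. The readr approach that follows is what this tutorial actually uses.

```r
# Hypothetical helper: factor or character with decimal commas -> numeric
comma_to_num <- function(x) {
  as.numeric(sub(",", ".", trimws(as.character(x)), fixed = TRUE))
}

f <- factor(c("55,1 ", "56,4 ", "58,7 "))  # values as read by read.csv() above
as.numeric(f)     # WRONG: returns the internal level codes, not the values
comma_to_num(f)   # recovers the actual numbers

# It could then be applied to columns V1..V12 of the take-1 data frame, e.g.:
# my.data[, paste0("V", 1:12)] <- lapply(my.data[, paste0("V", 1:12)], comma_to_num)
```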
# Take 2 at reading the file, this time with readr package, since it's better for performance and
# for self-detecting data types: numbers as numbers and not as factors, for instance, in this example.
my.data <- readr::read_csv(datafile)
vnames <- colnames(my.data)
my.data<-my.data[,-15]
colnames(my.data) <- c("District", "Suburb", paste0("V", 1:12))
str(my.data)
## Classes 'tbl_df' and 'data.frame': 73 obs. of 14 variables:
## $ District: chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Suburb : chr "El Raval" "El Barri Gòtic" "La Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ V1 : num 551 564 587 567 537 563 558 564 528 572 ...
## $ V2 : num 386 336 375 394 302 335 311 332 330 349 ...
## $ V3 : num 603 1036 821 912 990 ...
## $ V4 : num 406 235 332 254 193 208 127 143 173 225 ...
## $ V5 : num 124 98 128 114 87 87 71 84 86 95 ...
## $ V6 : num 549 501 529 496 360 380 350 368 367 386 ...
## $ V7 : num 803 813 804 824 843 843 829 838 846 838 ...
## $ V8 : num 1198 1157 1186 1043 972 ...
## $ V9 : num 1498 1187 1639 1262 727 ...
## $ V10 : chr "118,2" "42,3" "44,8" "31,9" ...
## $ V11 : chr "17,5" "3,8" "7,9" "16,4" ...
## $ V12 : num 53 39 8 6 62 63 48 65 62 58 ...
# Take 2 is still wrong: with the default locale, readr treated the comma as a thousands (grouping) mark,
# so "55,1" became 551, and V10 and V11 were still read as text.
# Take 3 sets the locale so that the comma is read as the decimal mark. Since every field is surrounded
# by quotation marks, the comma used as field delimiter causes no confusion.
# Now all numbers are finally detected as numbers, not as text or factors.
my.data <- readr::read_csv(datafile, locale=readr::locale("es", decimal_mark = ","))
vnames <- colnames(my.data)
my.data<-my.data[,-15]
colnames(my.data) <- c("District", "Suburb", paste0("V", 1:12))
str(my.data)
## Classes 'tbl_df' and 'data.frame': 73 obs. of 14 variables:
## $ District: chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Suburb : chr "El Raval" "El Barri Gòtic" "La Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ V1 : num 55.1 56.4 58.7 56.7 53.7 56.3 55.8 56.4 52.8 57.2 ...
## $ V2 : num 38.6 33.6 37.5 39.4 30.2 33.5 31.1 33.2 33 34.9 ...
## $ V3 : num 60.3 103.6 82.1 91.2 99 ...
## $ V4 : num 40.6 23.5 33.2 25.4 19.3 20.8 12.7 14.3 17.3 22.5 ...
## $ V5 : num 12.4 9.8 12.8 11.4 8.7 8.7 7.1 8.4 8.6 9.5 ...
## $ V6 : num 54.9 50.1 52.9 49.6 36 38 35 36.8 36.7 38.6 ...
## $ V7 : num 80.3 81.3 80.4 82.4 84.3 84.3 82.9 83.8 84.6 83.8 ...
## $ V8 : num 119.8 115.7 118.6 104.3 97.2 ...
## $ V9 : num 149.8 118.7 163.9 126.2 72.7 ...
## $ V10 : num 118.2 42.3 44.8 31.9 13.1 ...
## $ V11 : num 17.5 3.8 7.9 16.4 2.6 4.7 1 2.5 3.5 6.1 ...
## $ V12 : num 5.3 3.9 8 6 6.2 6.3 4.8 6.5 6.2 5.8 ...
You might want to aggregate by District:
my.ag.data <-aggregate(my.data[,3:14], by=list(my.data$District),
FUN=mean, na.rm=TRUE)
my.ag.data
## Group.1 V1 V2 V3 V4 V5
## 1 Ciutat Vella 56.72500 37.27500 84.30000 30.67500 11.600000
## 2 Eixample 55.36667 32.65000 116.40000 17.81667 8.500000
## 3 Gràcia 51.78000 31.96000 102.20000 18.70000 8.920000
## 4 Horta-Guinardó 55.02727 28.59091 79.85455 29.14545 10.800000
## 5 Les Corts 50.13333 22.13333 180.13333 15.40000 5.933333
## 6 Nou Barris 54.80769 28.98462 52.23077 41.63846 15.015385
## 7 Sant Andreu 51.82857 31.92857 69.98571 33.98571 11.985714
## 8 Sant Martí 51.34000 29.54000 88.98000 27.06000 11.030000
## 9 Sants-Montjuïc 53.48889 31.71111 79.04444 31.32222 11.366667
## 10 Sarrià-Sant Gervasi 53.28000 28.22000 189.50000 10.40000 5.660000
## V6 V7 V8 V9 V10 V11 V12
## 1 51.87500 81.10000 114.60000 139.65000 59.300000 11.400000 5.800000
## 2 36.85000 83.95000 98.10000 85.66667 14.116667 3.400000 5.966667
## 3 36.90000 83.64000 99.50000 93.72000 17.240000 4.220000 6.520000
## 4 40.99091 83.07273 106.59091 104.34545 16.172727 7.763636 7.136364
## 5 35.96667 84.13333 99.50000 93.03333 8.866667 3.533333 6.533333
## 6 47.87692 81.96154 110.80000 150.10000 19.892308 24.392308 8.246154
## 7 43.00000 83.27143 99.34286 117.87143 21.128571 13.200000 7.514286
## 8 38.79000 83.88000 97.05000 95.12000 19.130000 7.640000 6.350000
## 9 43.31111 83.48889 119.71111 148.02222 28.533333 11.544444 6.744444
## 10 33.54000 85.02000 90.42000 75.46000 9.120000 1.460000 5.380000
or you may want to subset by District and optionally drop some columns (V12 in this example):
my.subset <- subset(my.data, District == "Sants-Montjuïc", select = -V12)
knitr::kable(my.subset, caption = "Table with kable")
| District | Suburb | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sants-Montjuïc | Poble-sec | 55.5 | 35.3 | 71.0 | 33.3 | 11.8 | 46.6 | 82.6 | 106.2 | 106.9 | 41.3 | 12.8 |
| Sants-Montjuïc | La Marina del Prat Vermell | 61.9 | 33.6 | 59.1 | 54.3 | 20.1 | 64.8 | 83.1 | 274.1 | 566.5 | 69.8 | 33.6 |
| Sants-Montjuïc | La Marina del Port | 53.6 | 30.0 | 70.9 | 37.6 | 12.1 | 44.5 | 84.0 | 96.3 | 110.8 | 25.1 | 10.2 |
| Sants-Montjuïc | La Font de la Guatlla | 51.5 | 30.2 | 77.8 | 26.4 | 12.4 | 41.7 | 82.3 | 110.8 | 96.4 | 11.7 | 2.8 |
| Sants-Montjuïc | Hostafrancs | 51.7 | 33.2 | 77.2 | 28.3 | 9.4 | 41.2 | 82.6 | 109.5 | 92.0 | 23.9 | 11.6 |
| Sants-Montjuïc | La Bordeta | 53.1 | 26.8 | 71.4 | 29.3 | 10.0 | 38.9 | 83.3 | 97.6 | 110.2 | 23.7 | 8.1 |
| Sants-Montjuïc | Sants-Badal | 50.8 | 31.5 | 76.6 | 30.6 | 8.3 | 41.2 | 84.5 | 92.0 | 88.3 | 22.2 | 14.7 |
| Sants-Montjuïc | Sants | 54.5 | 33.9 | 82.6 | 25.5 | 9.3 | 37.1 | 83.7 | 101.1 | 89.7 | 24.1 | 8.5 |
| Sants-Montjuïc | Les Corts | 48.8 | 30.9 | 124.8 | 16.6 | 8.9 | 33.8 | 85.3 | 89.8 | 71.4 | 15.0 | 1.6 |
TASK:
Subset the generic my.data set to get just one District.
Click on the tab names below (kable, xtable, stargazer, formattable) to see more information about each option. This content is organized in tabs (a “tabbed” display; see the R Markdown source to learn how it was generated).
knitr::kable(head(my.data[,1:5]), caption = "Table with kable")
| District | Suburb | V1 | V2 | V3 |
|---|---|---|---|---|
| Ciutat Vella | El Raval | 55.1 | 38.6 | 60.3 |
| Ciutat Vella | El Barri Gòtic | 56.4 | 33.6 | 103.6 |
| Ciutat Vella | La Barceloneta | 58.7 | 37.5 | 82.1 |
| Ciutat Vella | Sant Pere, Santa Caterina i la Ribera | 56.7 | 39.4 | 91.2 |
| Eixample | El Fort Pienc | 53.7 | 30.2 | 99.0 |
| Eixample | La Sagrada Família | 56.3 | 33.5 | 97.5 |
print(xtable::xtable(head(my.data[,1:5]), caption = "Table with xtable"),
type = "html", html.table.attributes = "border=1")
|  | District | Suburb | V1 | V2 | V3 |
|---|---|---|---|---|---|
| 1 | Ciutat Vella | El Raval | 55.10 | 38.60 | 60.30 |
| 2 | Ciutat Vella | El Barri Gòtic | 56.40 | 33.60 | 103.60 |
| 3 | Ciutat Vella | La Barceloneta | 58.70 | 37.50 | 82.10 |
| 4 | Ciutat Vella | Sant Pere, Santa Caterina i la Ribera | 56.70 | 39.40 | 91.20 |
| 5 | Eixample | El Fort Pienc | 53.70 | 30.20 | 99.00 |
| 6 | Eixample | La Sagrada Família | 56.30 | 33.50 | 97.50 |
stargazer::stargazer(head(my.data[,1:5]), type = "html",
title = "Table with stargazer", summary=FALSE)
|  | District | Suburb | V1 | V2 | V3 |
|---|---|---|---|---|---|
| 1 | Ciutat Vella | El Raval | 55.1 | 38.6 | 60.3 |
| 2 | Ciutat Vella | El Barri Gòtic | 56.4 | 33.6 | 103.6 |
| 3 | Ciutat Vella | La Barceloneta | 58.7 | 37.5 | 82.1 |
| 4 | Ciutat Vella | Sant Pere, Santa Caterina i la Ribera | 56.7 | 39.4 | 91.2 |
| 5 | Eixample | El Fort Pienc | 53.7 | 30.2 | 99 |
| 6 | Eixample | La Sagrada Família | 56.3 | 33.5 | 97.5 |
For those interested in the table display of their results, there is another way of showing tabular data with custom formats depending on the values shown, called formattable. It did not work out of the box with the examples provided here, but it looks like a very promising package to improve the display of results in tables (at least for color output).
See: http://renkun.me/formattable/
End of display of static tables in a set of tabs
gvT stands for googleVis Table, a dynamic table display from the googleVis R package. It lets you, at least, sort columns by their values in real time and, with some extra work, color cells based on some logic in your scripts (values, legend codes from figures, etc.; see its documentation pages).
require(googleVis)
# Set the googleVis options first to change the behaviour of plot.gvis, so
# that only the chart component of the HTML file is written into the output
# file.
op <- options(gvis.plot.tag = "chart")
# Make a copy of my.subset with an extra first column holding a row id
# (with 2 digits for easy re-sorting later on)
my.data.indexed <- cbind(sprintf("%02d", as.numeric(rownames(my.subset))), my.subset)
colnames(my.data.indexed)[1] <- "#"
# Table with paging disabled
gvT <- gvisTable(my.data.indexed, options = list(page = "disable", height = "automatic",
                                                 width = "automatic"))
plot(gvT)
# Save the googleVis table to disk
# analystName is not defined elsewhere in this document: set it to your own name
analystName <- "MyName"
# Assign the file name for my.data.indexed
outFileName <- paste("my.subset", analystName, format(Sys.Date(), "%y%m%d"),
                     "indexed.html", sep = ".")
# Write just the chart component of the generated HTML
cat(gvT$html$chart, file = outFileName)
The R package DT provides an R interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables provides filtering, pagination, sorting, and many other features in the tables.
require('DT')
d = data.frame(
my.data,
stringsAsFactors = FALSE
)
dt <- datatable(d, filter = 'bottom', options = list(pageLength = 5)) %>%
formatStyle('V1',
color = styleInterval(c(0.5, 56), c('black', 'red', 'blue')),
backgroundColor = styleInterval(56.5, c('snow', 'lightyellow')),
fontWeight = styleInterval(58.0, c('normal', 'bold')))  # 'italics' is not a valid CSS font-weight
Display the resulting dynamic table within this document:
dt
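styleInterval(cuts, values) needs one more value than cut points. To check which colour a given V1 value receives, the sketch below reproduces the mapping in base R with findInterval(). It assumes the intervals are closed on the right, i.e. (-Inf, 0.5], (0.5, 56] and (56, Inf), which is our reading of how DT compares values; verify against the documentation of your DT version. The sample values are illustrative.

```r
# Sketch (base R, no DT needed): which colour would styleInterval() pick?
# Assumption: right-closed intervals (-Inf, 0.5], (0.5, 56], (56, Inf)
cuts   <- c(0.5, 56)
colors <- c("black", "red", "blue")   # one more value than cut points

v1 <- c(0.3, 50.8, 56, 56.7, 61.9)     # illustrative V1 values
picked <- colors[findInterval(v1, cuts, left.open = TRUE) + 1]
data.frame(V1 = v1, color = picked)
```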
You can also save the whole dynamic table in its own html file, to reuse elsewhere or display in full width, etc.
htmlwidgets::saveWidget(dt, paste("my.data", analystName, format(Sys.Date(), "%y%m%d"), "summary.html", sep = "."))
Ask your classmates or the professor in class.
You have produced your analysis results and want to tell the world (or your customer) about them, without requiring your readers to go through complicated steps, such as installing specific programs that might not be available on their computer or mobile device, just to view your results.
You can use html reports, so that they can be easily seen by anyone, regardless of the Operating System they use, or device (tablet, smartphone, …), and they can be seen at any time.
Write HTML reports with R Markdown from RStudio, in a similar fashion to what you have seen in this example.
See another example, for instance, here:
http://www.jacolienvanrij.com/Tutorials/tutorialMarkdown.html
Nozzle is an R package that provides an API to generate HTML reports with dynamic user interface elements based on JavaScript and CSS (Cascading Style Sheets). Nozzle was designed to facilitate summarization and rapid browsing of complex results in data analysis pipelines where multiple analyses are performed frequently on big data sets. The package can be applied to any project where user-friendly reports need to be created.
See more here:
You could publish this type of report by uploading the generated HTML file, together with the PNG images and PDF files it produces and links to, to a web or FTP/SFTP server that you or your institution have access to. The report is then accessible to others through a web browser or through public FTP accounts.
Beyond static charts and graphs and beyond dynamic tables: maps, other plot.ly charts, Rcharts & htmlwidgets, animations, Shiny apps, …
# Learn about API authentication here: https://plot.ly/r/getting-started
# Find your api_key here: https://plot.ly/settings/api
df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_world_gdp_with_codes.csv')
# light grey boundaries
l <- list(color = toRGB("grey"), width = 0.5)
# specify map projection/options
g <- list(
showframe = FALSE,
showcoastlines = FALSE,
projection = list(type = 'Equirectangular') # Instead of the usual but very biased 'Mercator' projection
)
# plotly >= 4.0 syntax: data columns are mapped with ~
plot_ly(df, z = ~GDP..BILLIONS., text = ~COUNTRY, locations = ~CODE, type = 'choropleth',
        color = ~GDP..BILLIONS., colors = 'Blues', marker = list(line = l)) %>%
  colorbar(title = 'GDP Billions US$', tickprefix = '$') %>%
  layout(title = '2014 Global GDP<br>Source: CIA World Factbook',
         geo = g)
df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_ebola.csv')
# restrict from June to September
df <- subset(df, Month %in% 6:9)
# ordered factor variable with month abbreviations
df$abbrev <- ordered(month.abb[df$Month], levels = month.abb[6:9])
# September totals
df9 <- subset(df, Month == 9)
# common plot options
g <- list(
scope = 'africa',
showframe = F,
showland = T,
landcolor = toRGB("grey90")
)
g1 <- c(
g,
resolution = 50,
showcoastlines = T,
countrycolor = toRGB("white"),
coastlinecolor = toRGB("white"),
projection = list(type = 'Mercator'),
list(lonaxis = list(range = c(-15, -5))),
list(lataxis = list(range = c(0, 12))),
list(domain = list(x = c(0, 1), y = c(0, 1)))
)
g2 <- c(
g,
showcountries = F,
bgcolor = toRGB("white", alpha = 0),
list(domain = list(x = c(0, .6), y = c(0, .6)))
)
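# Note: this chart uses the plotly 3.x syntax (the version listed in sessionInfo() below);
# with plotly >= 4.0, data columns must be mapped as formulas (e.g. locations = ~Country)
plot_ly(df, type = 'scattergeo', mode = 'markers', locations = Country,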
locationmode = 'country names', text = paste(Value, "cases"),
color = as.ordered(abbrev), marker = list(size = Value/50), inherit = F) %>%
add_trace(type = 'scattergeo', mode = 'text', geo = 'geo2', showlegend = F,
lon = 21.0936, lat = 7.1881, text = 'Africa') %>%
add_trace(type = 'choropleth', locations = Country, locationmode = 'country names',
z = Month, colors = "black", showscale = F, geo = 'geo2', data = df9) %>%
layout(title = 'Ebola cases reported by month in West Africa 2014<br> Source: <a href="https://data.hdx.rwlabs.org/dataset/rowca-ebola-cases">HDX</a>',
geo = g1, geo2 = g2)
See other plot.ly charts and applications, from several categories:
See googleVis examples: https://cran.r-project.org/web/packages/googleVis/vignettes/googleVis_examples.html
The rCharts is an R package to create, customize and publish interactive javascript visualizations from R using a familiar lattice style plotting interface. See: http://rcharts.io
The htmlwidgets package brings the best of JavaScript data visualization to R. Use JavaScript visualization libraries at the R console, just like plots. Embed widgets in R Markdown documents and Shiny web applications. Develop new widgets using a framework that seamlessly bridges R and JavaScript
See:
Shiny is a web application framework for R. You can turn your analyses into interactive web applications, without much previous knowledge of HTML, CSS, or JavaScript being required.
You can insert a Shiny application in your R Markdown document if you change the YAML parameter runtime: static to runtime: shiny. Try that, and then set the chunk option eval=TRUE to run the chunk below and display the Shiny application:
source("https://raw.githubusercontent.com/rstudio/rmdexamples/master/R/kmeans_cluster.R")
kmeans_cluster(iris)
Easy web publishing from R with RPubs (among others). Write R Markdown documents in RStudio. Share them here on RPubs. (It’s free, and couldn’t be simpler!)
See http://rpubs.com/about/getting-started
RStudio lets you harness the power of R Markdown to create documents that weave together your writing and the output of your R code. And now, with RPubs, you can publish those documents on the web with the click of a button!
Prerequisites
You’ll need R itself, RStudio (v0.96.230 or later), and the knitr package (v0.5 or later).
Instructions
In RStudio, create a new R Markdown document by choosing File | New | R Markdown. Click the Knit HTML button in the doc toolbar to preview your document. In the preview window, click the Publish button.
Task for YOU:
* Add to your report the equivalent Plot.ly scatter plot to the one shown below, but for the same subset of data that you used in the previous part of today's session.
* Publish your report on the internet at [RPubs](https://rpubs.com) and send the link to the professor's email: xavier.depedro@seeds4c.org .
require(knitr)
getwd()
## [1] "/home/xavi/Dropbox/2016_SummeR_School_UB_HospClinic/day3"
datafile <- "InformeSalut2014_2010.csv"
download.file(url="https://seeds4c.org/tiki-download_file.php?fileId=453", destfile=datafile)
# Get all data in place
my.data <- readr::read_csv(datafile, locale=readr::locale("es", decimal_mark = ","))
vnames <- colnames(my.data)
my.data<-my.data[,-15]
colnames(my.data) <- c("District", "Suburb", paste0("V", 1:12))
str(my.data)
## Classes 'tbl_df' and 'data.frame': 73 obs. of 14 variables:
## $ District: chr "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" "Ciutat Vella" ...
## $ Suburb : chr "El Raval" "El Barri Gòtic" "La Barceloneta" "Sant Pere, Santa Caterina i la Ribera" ...
## $ V1 : num 55.1 56.4 58.7 56.7 53.7 56.3 55.8 56.4 52.8 57.2 ...
## $ V2 : num 38.6 33.6 37.5 39.4 30.2 33.5 31.1 33.2 33 34.9 ...
## $ V3 : num 60.3 103.6 82.1 91.2 99 ...
## $ V4 : num 40.6 23.5 33.2 25.4 19.3 20.8 12.7 14.3 17.3 22.5 ...
## $ V5 : num 12.4 9.8 12.8 11.4 8.7 8.7 7.1 8.4 8.6 9.5 ...
## $ V6 : num 54.9 50.1 52.9 49.6 36 38 35 36.8 36.7 38.6 ...
## $ V7 : num 80.3 81.3 80.4 82.4 84.3 84.3 82.9 83.8 84.6 83.8 ...
## $ V8 : num 119.8 115.7 118.6 104.3 97.2 ...
## $ V9 : num 149.8 118.7 163.9 126.2 72.7 ...
## $ V10 : num 118.2 42.3 44.8 31.9 13.1 ...
## $ V11 : num 17.5 3.8 7.9 16.4 2.6 4.7 1 2.5 3.5 6.1 ...
## $ V12 : num 5.3 3.9 8 6 6.2 6.3 4.8 6.5 6.2 5.8 ...
# The last column held a note, not real data (it was dropped above): use its header as the final row of the variable legend
tablelegend <- cbind(colnames(my.data[1:14]), vnames[1:14])
tablelegend <- rbind(tablelegend, unlist(strsplit(vnames[15], ":")))
colnames(tablelegend) <- c("Variable Code", "Variable Description")
knitr::kable(tablelegend, caption = "Table with kable")
| Variable Code | Variable Description |
|---|---|
| District | District (Barcelona-Catalonia-Spain) |
| Suburb | Suburb |
| V1 | Rate of over-aging, year 2014 |
| V2 | % of 75 y.o people or more living alone, year 2014 |
| V3 | Available Family income rate, year 2013* |
| V4 | % of 15 y.o people or more with primary studies or less, year 2014 |
| V5 | % of recorded unemployment 16-64 y.o, year 2014 |
| V6 | % of non-voters municipal elections, year 2015 |
| V7 | Life expectancy when born, period 2009-2013 |
| V8 | Comparative mortality rate, period 2009-2013* |
| V9 | Rate of Potential life years lost, period 2009-2013* |
| V10 | Tuberculosis Rate, period 2010-2014 |
| V11 | Teenager fecundity rate, period 2010-2014 |
| V12 | Prevalence of low weight when born, period 2010-2014 |
| Note | * 100 based on the total of Barcelona; dark gray corresponds to 25% with the worst indicator, green 25% better indicator and light gray the remaining 50%. |
# Simple interactive scatter chart
library(plotly)
# Add some info about variables displayed
cat(paste0("V3: ", tablelegend[5,2]),
paste0("\nV5: ", tablelegend[7,2]),
paste0("\nV6: ", tablelegend[8,2]),
paste0("\nV11: ", tablelegend[13,2]))
## V3: Available Family income rate, year 2013*
## V5: % of recorded unemployment 16-64 y.o, year 2014
## V6: % of non-voters municipal elections, year 2015
## V11: Teenager fecundity rate, period 2010-2014
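Selecting legend rows by position (tablelegend[5,2], tablelegend[7,2], ...) breaks silently if the column order changes. The sketch below looks descriptions up by variable code with a named vector instead. legend_tbl is a hypothetical three-row stand-in for tablelegend so that the example runs on its own; with the real object you would build the named vector from tablelegend directly.

```r
# Hypothetical stand-in for 'tablelegend' (first column: code, second: description)
legend_tbl <- cbind(
  c("V3", "V5", "V6"),
  c("Available Family income rate, year 2013*",
    "% of recorded unemployment 16-64 y.o, year 2014",
    "% of non-voters municipal elections, year 2015"))
colnames(legend_tbl) <- c("Variable Code", "Variable Description")

# Named vector: names are variable codes, values are descriptions
descr <- setNames(legend_tbl[, "Variable Description"], legend_tbl[, "Variable Code"])

vars <- c("V3", "V6")
cat(paste0(vars, ": ", descr[vars]), sep = "\n")
```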
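# plotly >= 4.0 syntax: columns are mapped with ~ and 'color' replaces 'group';
# opacity is a single value per trace here, so a fixed value is used
plot_ly(my.data,
        x = ~V5, y = ~V6,
        text = ~paste("Over-aging: ", V1,
                      "Income: ", V3,
                      "Fecundity: ", V11,
                      "Suburb: ", Suburb),
        type = "scatter", mode = "markers",
        size = ~V3, color = ~District,
        marker = list(opacity = 0.6))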
Publish your report at RPubs, with the account you registered earlier today.
If needed, see again: http://rpubs.com/about/getting-started
In RStudio, create a new R Markdown document by choosing File | New | R Markdown. Click the Knit HTML button in the doc toolbar to preview your document. In the preview window, click the Publish button.
Once published, if you make new changes to your document, you have the option to republish the updated version to the same url, or publish a new document.
Ask your classmates or the professor in class, if time permits.
This document has been produced with the following parameters in the R Markdown document and the corresponding versions of R packages:
---
title: "Markdown, Automation & HTML Reports (UBISS16, Jul 6)"
author: "Xavier de Pedro, Ph.D. xavier.depedro@vhir.org - https://seeds4c.org/ubiss16d3"
date: "July 6, 2016"
output:
  html_document:
    toc: true
    number_sections: true
    toc_depth: 3
    toc_float:
      collapsed: true
      smooth_scroll: true
    theme: united
  pdf_document:
    toc: true
    highlight: zenburn
runtime: static
params:
  n: 100
  d: !r Sys.Date()
  a: "My Name"
---
You can extract all the R commands out of the R Markdown document (Rmd) with the purl() function from the knitr package:
wd <- "/home/xavi/Dropbox/2016_SummeR_School_UB_HospClinic/day3"
myfile <- file.path(wd, "ubiss16_d3_Markdown_Automation_Html_Reports.Rmd")
knitr::purl(myfile)
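By default purl() keeps chunk headers as comments in the .R file. The sketch below runs it on a small hypothetical .Rmd with documentation = 0, which keeps only the code, and with an explicit output path. It assumes knitr is installed; the file names are illustrative.

```r
# Sketch: extract only the R code from an .Rmd file with knitr::purl()
library(knitr)

rmd_file <- file.path(tempdir(), "purl-demo.Rmd")
writeLines(c(
  "Some prose that purl() will drop.",
  "",
  "```{r}",
  "x <- 1:10",
  "mean(x)",
  "```"
), rmd_file)

# documentation = 0: pure code, no chunk headers or text kept as comments
r_file <- knitr::purl(rmd_file, output = file.path(tempdir(), "purl-demo.R"),
                      documentation = 0, quiet = TRUE)
cat(readLines(r_file), sep = "\n")
```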
If you are using rmarkdown::render then you can pass a format name to render to select from the available formats. For example:
wd <- "/home/xavi/Dropbox/2016_SummeR_School_UB_HospClinic/day3"
myfile <- file.path(wd, "ubiss16_d3_Markdown_Automation_Html_Reports.Rmd")
rmarkdown::render(myfile, "pdf_document")
# You can also render all formats defined in an input file with:
rmarkdown::render(myfile, "all")
sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04 LTS
##
## locale:
## [1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=ca_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
## [5] LC_MONETARY=ca_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
## [7] LC_PAPER=ca_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] shiny_0.13.2 webshot_0.3.2 DT_0.1 googleVis_0.5.10
## [5] stargazer_5.2 xtable_1.8-2 readr_0.2.2 plotly_3.6.0
## [9] ggplot2_2.1.0 knitr_1.13 rmarkdown_0.9.6
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.5 RColorBrewer_1.1-2 highr_0.6
## [4] formatR_1.4 plyr_1.8.3 base64enc_0.1-3
## [7] viridis_0.3.4 tools_3.3.0 digest_0.6.9
## [10] jsonlite_1.0 evaluate_0.9 gtable_0.2.0
## [13] DBI_0.4-1 yaml_2.1.13 parallel_3.3.0
## [16] gridExtra_2.2.1 dplyr_0.4.3 httr_1.1.0
## [19] stringr_1.0.0 htmlwidgets_0.6 grid_3.3.0
## [22] R6_2.1.2 RJSONIO_1.3-0 tidyr_0.4.1
## [25] magrittr_1.5 scales_0.4.0 htmltools_0.3.5
## [28] assertthat_0.1 mime_0.4 colorspace_1.2-6
## [31] httpuv_1.3.3 stringi_1.1.1 munsell_0.4.3
# Restore the original options saved in 'op' before changing gvis.plot.tag
options(op)