R Markdown provides an authoring framework for data science. You can use a single R Markdown file to both save and execute code and generate high-quality reports that can be shared with an audience.
R Markdown documents are fully reproducible and support dozens of static and dynamic output formats.
In a nutshell, R Markdown stands on the shoulders of knitr and Pandoc. The former executes the computer code embedded in Markdown, and converts R Markdown to Markdown. The latter renders Markdown to the output format you want (such as PDF, HTML, Word, and so on). The rmarkdown package was first created in early 2014. During the past four years, it has steadily evolved into a relatively complete ecosystem for authoring documents.
At this point, there are a large number of tasks that you could do with R Markdown:
- Compile a single R Markdown document to a report in different formats, such as PDF, HTML, or Word.
- Create notebooks in which you can directly run code chunks interactively.
- Make slides for presentations (HTML5, LaTeX Beamer, or PowerPoint).
- Produce dashboards with flexible, interactive, and attractive layouts.
- Build interactive applications based on Shiny.
- Write journal articles.
- Author books of multiple chapters.
- Generate websites and blogs.
Markdown documents that contain chunks of R code are called R Markdown documents and are saved with a .Rmd extension. Code chunks in R Markdown behave similarly to chunks in other knitr documents but are demarcated differently and have some added flexibility. A chunk opens with three backticks (`), an opening curly brace ({), the letter r, an optional chunk label followed by comma-separated options, and a closing curly brace (}). The chunk is closed with three backticks. All code and comments inside the chunk are treated as R code.
#Example 1
x <- 20
x+1
## [1] 21
#Example 2
print(1:10)
## [1] 1 2 3 4 5 6 7 8 9 10
#Example 3
print("Hello World")
## [1] "Hello World"
As the three examples above show, both the code and its output are printed.
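The chunk syntax described earlier can be made concrete with a small sketch. It builds a minimal R Markdown body as a character vector and then picks the label and options out of the chunk header. The label ‘add-one’ and the options shown are illustrative assumptions, not taken from the examples above.

```r
# Build the three-backtick fence programmatically so it can be shown here
fence <- strrep("`", 3)

# A minimal R Markdown body: one chunk labelled "add-one" (hypothetical label)
# with the knitr options echo and eval set explicitly
rmd_lines <- c(
  "Some narrative text.",
  "",
  paste0(fence, "{r add-one, echo=TRUE, eval=TRUE}"),
  "x <- 20",
  "x + 1",
  fence
)

# Locate the chunk header and split it into label and options
header <- grep(paste0("^", fence, "\\{r"), rmd_lines, value = TRUE)
inner <- sub("^.*\\{r[[:space:]]*(.*)\\}$", "\\1", header)
parts <- trimws(strsplit(inner, ",")[[1]])
chunk_label <- parts[1]
chunk_options <- parts[-1]

# Write the document to a temporary .Rmd file; rmarkdown::render(rmd_path)
# would knit it if the rmarkdown package and Pandoc are installed
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)
```

Setting echo=FALSE in the header would hide the code and keep only the output in the rendered document.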
R Markdown can print good-looking tables that adapt to the type of output document.
#Example 1
knitr::kable(
head(USArrests),
caption='TABLE: Violent crime rates by US state.'
)
| | Murder | Assault | UrbanPop | Rape |
|---|---|---|---|---|
| Alabama | 13.2 | 236 | 58 | 21.2 |
| Alaska | 10.0 | 263 | 48 | 44.5 |
| Arizona | 8.1 | 294 | 80 | 31.0 |
| Arkansas | 8.8 | 190 | 50 | 19.5 |
| California | 9.0 | 276 | 91 | 40.6 |
| Colorado | 7.9 | 204 | 78 | 38.7 |
#Example 2
knitr::kable(
head(airquality),
caption='TABLE: New York Air Quality Measurements'
)
| Ozone | Solar.R | Wind | Temp | Month | Day |
|---|---|---|---|---|---|
| 41 | 190 | 7.4 | 67 | 5 | 1 |
| 36 | 118 | 8.0 | 72 | 5 | 2 |
| 12 | 149 | 12.6 | 74 | 5 | 3 |
| 18 | 313 | 11.5 | 62 | 5 | 4 |
| NA | NA | 14.3 | 56 | 5 | 5 |
| 28 | NA | 14.9 | 66 | 5 | 6 |
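kable() normally picks a table format that suits the output document, but the format can also be requested explicitly. The sketch below assumes a reasonably recent knitr in which the ‘pipe’ format name is available; the digits and caption values are only illustrative.

```r
# Sketch: request a specific table format from knitr::kable().
# "pipe" produces a Markdown pipe table; "html" or "latex" could be passed instead.
tbl <- knitr::kable(
  head(USArrests, 3),
  format = "pipe",
  digits = 1,
  caption = "Violent crime rates by US state (first three rows)."
)
print(tbl)
```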
Tables printed with kable() are static. We can make them interactive with the DT package, which provides an interactive tabular experience through the DataTables JavaScript library. Since DT is based on htmlwidgets, its full interactivity is only available in HTML-based output, so a screenshot of the output is provided below.
library(DT)
## Warning: package 'DT' was built under R version 3.6.3
#Example 1
datatable(head(USArrests,100))
#Example 2
datatable(head(airquality,100))
With the DT package, both tables become interactive: we can search the table, jump to any page, and choose how many entries to show.
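Paging and filtering can be tuned through arguments to datatable(). The following is a minimal sketch; the page length and menu values are arbitrary choices made for illustration.

```r
library(DT)

# Sketch: show 5 rows per page and add a filter box above each column.
# pageLength and lengthMenu are DataTables options passed through `options`.
tbl <- datatable(
  USArrests,
  filter = "top",
  options = list(pageLength = 5, lengthMenu = c(5, 10, 25))
)
tbl
```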
A datatable object can be passed, via a pipe, to formatting functions to customize the output. The following code builds a datatable object, formats the ‘estimate’ column as currency so that a ‘$’ sign precedes every value, and colors the rows whose ‘NAME’ column is ‘Alabama’ or ‘California’ blue and yellow respectively.
data(us_rent_income,package = 'tidyr')
datatable(
head(us_rent_income,500),
rownames=FALSE,extensions='Scroller',filter='top',
options=list(
dom="tis",scrollX=TRUE,
scrollY=400,
scrollCollapse=TRUE
)
)%>%
formatCurrency('estimate',digits=0)%>%
formatStyle(columns = 'NAME',
valueColumns = 'NAME',target='row',
backgroundColor = styleEqual(
levels = c('Alabama','California'),
values = c('blue','yellow')
)
)
Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. The leaflet R package creates scrollable, zoomable maps based on OpenStreetMap (or another map provider). It can also use shapefiles, GeoJSON, TopoJSON and raster images to build up the map.
Features
In the example below, I read pizza data from the jaredlander.com website into a variable ‘pizza’ with the fromJSON() function and unnest it with unnest(). The getcoords() function converts each street address to coordinates using the RDSTK package and returns two new columns, ‘latitude’ and ‘longitude’. These columns are bound to the pizza dataset with bind_cols(). Finally, leaflet() displays a map with a marker for every address in the dataset.
#leaflet example 1
library(jsonlite)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(purrr)
##
## Attaching package: 'purrr'
## The following object is masked from 'package:jsonlite':
##
## flatten
library(leaflet)
## Warning: package 'leaflet' was built under R version 3.6.3
library(tidyr)
pizza<-fromJSON('https://www.jaredlander.com/data/PizzaFavorites.json')
# name the list-column explicitly; recent tidyr versions require `cols`
pizza<-
pizza %>%
unnest(cols = c(Details)) %>%
rename(Street=Address) %>%
unite(col=Address, Street, City, State, Zip, sep=', ', remove=FALSE)
# geocode one address and keep only its coordinates
getcoords<-function(address){
RDSTK::street2coordinates(address)%>%
dplyr::select(latitude,longitude)
}
pizza<-bind_cols(pizza,pizza$Address%>% map_df(getcoords))
leaflet()%>%
addTiles()%>%
addMarkers(lng=~longitude,lat=~latitude,popup = ~sprintf('%s<br/>%s',Name,Street),
data=pizza)
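The geocoding step depends on an external service, so it can be useful to try the mapping step on its own. The sketch below uses a small data frame with hypothetical place names and hand-entered, approximate coordinates; none of these values come from the pizza data.

```r
library(leaflet)

# Hypothetical places with approximate, hand-entered coordinates (illustration only)
places <- data.frame(
  Name = c("Place A", "Place B"),
  Street = c("1 Example St", "2 Sample Ave"),
  latitude = c(40.7128, 40.7306),
  longitude = c(-74.0060, -73.9866)
)

# Same calls as in the pizza example: base tiles plus one marker per row
m <- leaflet(data = places) %>%
  addTiles() %>%
  addMarkers(lng = ~longitude, lat = ~latitude,
             popup = ~sprintf("%s<br/>%s", Name, Street))
m
```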
In this second example, I take the first 30 rows of the ‘quakes’ dataset. Each marker is colored by earthquake magnitude: green for magnitudes up to 4, orange for magnitudes above 4 and up to 5, and red for magnitudes above 5.
#leaflet example 2
library(dplyr)
library(leaflet)
data(quakes)
df.20 <- quakes[1:30,]
getColor <- function(quakes) {
sapply(quakes$mag, function(mag) {
if(mag <= 4) {
"green"
} else if(mag <= 5) {
"orange"
} else {
"red"
} })
}
icons <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = getColor(df.20)
)
leaflet(df.20) %>% addTiles() %>%
addAwesomeMarkers(~long, ~lat,icon=icons, label=~as.character(mag))
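The magnitude-to-color rule can also be written without a per-element function. The sketch below uses base R's cut() as one possible alternative and checks it against the same rule written with ifelse(); the variable names are chosen here for illustration.

```r
data(quakes)
df <- quakes[1:30, ]

# Vectorised alternative: intervals are (-Inf, 4], (4, 5], (5, Inf)
marker_color <- as.character(
  cut(df$mag, breaks = c(-Inf, 4, 5, Inf), labels = c("green", "orange", "red"))
)

# Same rule written element by element, for comparison
by_hand <- ifelse(df$mag <= 4, "green", ifelse(df$mag <= 5, "orange", "red"))
identical(marker_color, by_hand)
table(marker_color)
```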
The dygraphs package is an R interface to the dygraphs JavaScript charting library. It provides rich facilities for charting time-series data in R.
Below is a simple dygraphs example. It plots three time series giving the monthly deaths from bronchitis, emphysema and asthma in the UK, 1974–1979: both sexes (ldeaths), males (mdeaths) and females (fdeaths).
#Example 1
library(dygraphs)
## Warning: package 'dygraphs' was built under R version 3.6.3
lungDeaths <- cbind(ldeaths, mdeaths, fdeaths)
dygraph(lungDeaths, main = "Deaths from Lung Disease (UK)") %>%
dyOptions(colors = RColorBrewer::brewer.pal(3, "Set2"))
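Before plotting, it can help to inspect the object that dygraph() receives. This short sketch uses only base R and the built-in datasets to show that cbind() on the three series yields one multivariate monthly time series.

```r
# Combine the three monthly series into one multivariate time series
lungDeaths <- cbind(ldeaths, mdeaths, fdeaths)

class(lungDeaths)      # includes "mts" and "ts"
dim(lungDeaths)        # rows are months, columns are series
start(lungDeaths)      # first observation (year, month)
frequency(lungDeaths)  # observations per year
colnames(lungDeaths)
```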
In this second example, a dygraph shows the per capita GDP (Gross Domestic Product) of four countries: Britain, the USA, Belgium and Portugal. The WDI (World Development Indicators) package is used to download the data. dyRangeSelector() adds a range selector to the bottom of the chart that lets users pan and zoom to various date ranges.
#Example 2
library(WDI)
## Warning: package 'WDI' was built under R version 3.6.3
library(dygraphs)
gdp<-WDI(country=c("GB","US","BE","PT"),
indicator = c("NY.GDP.PCAP.CD"),
start=1970,end=2011)
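# rename by column name so the result does not depend on the order of the columns returned by WDI()
gdp<-dplyr::rename(gdp,Country=country,PerCapGDP=NY.GDP.PCAP.CD,Year=year)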
gdpWide<-gdp%>%
dplyr::select(Country,Year,PerCapGDP)%>%
tidyr::spread(key=Country,value=PerCapGDP)
dygraph(gdpWide,main="Yearly Per Capita GDP",
xlab='Year',ylab='Per Capita GDP')%>%
dyOptions(drawPoints=TRUE,pointSize=1)%>%
dyLegend(width=400)%>%
dyRangeSelector(dateWindow = c("1990","2000"))
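The long-to-wide reshaping step can be tried without a network connection. The sketch below uses tidyr::pivot_wider(), which supersedes spread() in tidyr 1.0.0 and later, on a small data frame whose countries and numbers are made up for illustration.

```r
library(tidyr)

# Illustrative long-format data; the numbers are made up, not WDI values
gdp_long <- data.frame(
  Country = rep(c("A", "B"), each = 3),
  Year = rep(2001:2003, times = 2),
  PerCapGDP = c(10, 11, 12, 20, 21, 22)
)

# One row per Year, one column per Country
gdp_wide <- pivot_wider(gdp_long, names_from = Country, values_from = PerCapGDP)
gdp_wide
```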
The threejs package was developed by Bryan Lewis. It is an R interface to the three.js JavaScript library, with functions for building 3D scatterplots and globes that can be spun around to view different angles. The package includes:
- graphjs: an interactive force-directed graph widget
- scatterplot3js: a 3-d scatterplot widget similar to the scatterplot3d function
- globejs: a widget that plots data and images on a 3-d globe
The widgets are easy to use and render directly in RStudio, in R Markdown, in Shiny applications, and from command-line R via a web browser. They produce high-quality interactive visualizations with just a few lines of R code.
Below is a complete working example using threejs.
# required library files
library(readr)
## Warning: package 'readr' was built under R version 3.6.3
library(dplyr)
library(threejs)
## Warning: package 'threejs' was built under R version 3.6.3
## Loading required package: igraph
## Warning: package 'igraph' was built under R version 3.6.3
##
## Attaching package: 'igraph'
## The following object is masked from 'package:tidyr':
##
## crossing
## The following objects are masked from 'package:purrr':
##
## compose, simplify
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
# read_tsv returns a table
# we can see the datatypes of all the columns
flights<-read_tsv('http://www.jaredlander.com/data/Flights_Jan_2.tsv')
## Parsed with column specification:
## cols(
## From = col_character(),
## To = col_character(),
## From_Lat = col_double(),
## From_Long = col_double(),
## To_Lat = col_double(),
## To_Long = col_double()
## )
# Since the airports are in the dataset multiple times
# I have counted the number of times they appear and arranged them in descending order.
# This will avoid overlaying of same points multiple times.
airports<-flights%>%
count(From_Lat,From_Long)%>%
arrange(desc(n))
airports
## # A tibble: 49 x 3
## From_Lat From_Long n
## <dbl> <dbl> <int>
## 1 40.6 -73.8 25
## 2 26.1 -80.2 16
## 3 42.4 -71.0 15
## 4 28.4 -81.3 11
## 5 18.4 -66.0 7
## 6 40.7 -74.2 5
## 7 26.5 -81.8 4
## 8 26.7 -80.1 4
## 9 33.9 -118. 4
## 10 12.5 -70.0 3
## # ... with 39 more rows
# This is a high resolution image of earth by NASA
earth<-"http://eoimages.gsfc.nasa.gov/images/imagerecords/73000/73909/world.topo.bathy.200412.3x5400x2700.jpg"
# img is the image to use
# bg is background (default value is black)
# lat and long are coordinates of the points to draw
# value controls how tall to draw the points
# arcs takes a four-column data.frame: the first two columns are the origin latitude and longitude, the last two are the destination latitude and longitude
# The remaining arguments customize the look and feel of the globe
globejs(img=earth,bg="white", lat=airports$From_Lat,long=airports$From_Long,
value=airports$n*5,color='red',
arcs=flights %>%
dplyr::select(From_Lat,From_Long,To_Lat,To_Long),
arcsHeight=.4,arcsLwd=4,arcsColor="#fdcb6e",arcsOpacity=.85,
atmosphere=TRUE,fov=30,rotationlat=.5,rotationlong=-.05)
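The counting step that collapses repeated airports can be checked on a toy table. In this sketch the origins and coordinates are hypothetical and only the column names mirror the flights data.

```r
library(dplyr)

# Hypothetical origins; coordinates are illustrative, not from the flights file
toy_flights <- data.frame(
  From_Lat = c(40.6, 40.6, 26.1, 40.6, 26.1, 42.4),
  From_Long = c(-73.8, -73.8, -80.2, -73.8, -80.2, -71.0)
)

# One row per distinct origin, with the number of departures in column n
toy_airports <- toy_flights %>%
  count(From_Lat, From_Long) %>%
  arrange(desc(n))
toy_airports
```

Each distinct origin now appears once, so the globe draws one bar per airport whose height can be scaled by n.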
We can use the globejs function to plot bars over the globe on arbitrary points with just a few lines of code. In this example, I have used ‘world.cities’ dataset from the ‘maps’ package. This database is primarily of world cities of population greater than about 40,000. Also included are capital cities of any population size, and many smaller towns.
# Example 2: Adding bars to make a spiky globe
library("threejs")
# maptools is not loaded: this example only needs the world.cities data from the maps package
library("maps")
## Warning: package 'maps' was built under R version 3.6.3
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
data(world.cities, package="maps")
cities <- world.cities[order(world.cities$pop,decreasing=TRUE)[1:500],]
value <- 100 * cities$pop / max(cities$pop)
globejs(bg="white", lat=cities$lat, long=cities$long, value=value,
rotationlat=-0.34, rotationlong=-0.38, fov=30)
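The bar heights in the globe are populations rescaled so that the largest city gets a value of 100. A tiny sketch with hypothetical populations shows the arithmetic.

```r
# Hypothetical populations, for illustration only
pop <- c(CityA = 8e6, CityB = 2e6, CityC = 5e5)

# Scale so the largest city gets a bar of height 100
value <- 100 * pop / max(pop)
value
```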
Heatmaps display the intensity of numeric data and are particularly helpful with correlation matrices. The d3heatmap package, written by Tal Galili, builds interactive heatmaps. We first build a correlation matrix of the numeric columns, then call d3heatmap(), which builds the heatmap and clusters the variables, displaying a dendrogram for the clustering. The result can be seen in the screenshot.
The ‘economics’ dataset from ggplot2 is used in the example below to build an interactive heatmap. Hovering over individual cells shows more information about the data, and dragging a box zooms in on the plot.
library(d3heatmap)
## Warning: package 'd3heatmap' was built under R version 3.6.3
library(dplyr)
data("economics",package = 'ggplot2')
econCor<-economics%>%select_if(is.numeric)%>%cor
d3heatmap(econCor,xaxis_font_size='12pt',yaxis_font_size='12pt',width=600,height=600)
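The two ingredients of the heatmap, a correlation matrix and a clustering of the variables, can be reproduced in base R. The sketch below uses the built-in ‘mtcars’ dataset and takes 1 - |r| as the distance between variables; that distance is an assumption made for illustration and may differ from what d3heatmap does by default.

```r
# Correlation matrix of the numeric columns of a built-in dataset
num_cols <- Filter(is.numeric, mtcars)
cor_mat <- cor(num_cols)

# Hierarchical clustering of the variables, using 1 - |r| as a distance
# (an assumed choice for illustration; d3heatmap's own defaults may differ)
clusters <- hclust(as.dist(1 - abs(cor_mat)))
clusters$labels[clusters$order]  # variables in dendrogram order
```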
Displaying data and analysis is an important part of the data science process. R has long had great visualization capabilities thanks to its built-in graphics and ggplot2. With Shiny from RStudio, we can build dashboards, all with R code.
Shiny is a powerful tool for building Web-based dashboards. At first glance, the best part of Shiny is that everything can be done in R, avoiding the need to learn new tools, but that is only half the power of Shiny. Since everything is written in R, the dashboards can make use of the entire R ecosystem and compute statistics and models not possible in most dashboard tools. This capability brings machine learning, data science and even AI to accessible dashboards that everyone can understand.
Shiny is built natively in R, so a dashboard can be backed by any data munging, modelling, processing and visualization that can be done in R. Shiny enables R programmers to develop Web-based dashboards without having to learn HTML and JavaScript, though knowing those tools helps. Using Shiny can be simple, but the code takes some getting used to.
library(shinydashboard)
## Warning: package 'shinydashboard' was built under R version 3.6.3
##
## Attaching package: 'shinydashboard'
## The following object is masked from 'package:graphics':
##
## box
library(shiny)
## Warning: package 'shiny' was built under R version 3.6.3
##
## Attaching package: 'shiny'
## The following object is masked from 'package:jsonlite':
##
## validate
## The following objects are masked from 'package:DT':
##
## dataTableOutput, renderDataTable
library(ggplot2)
data(diamonds,package='ggplot2')
# Define UI for application
ui <- dashboardPage(
dashHeader<-dashboardHeader(
title='My Dashboard',
tags$li(
class="dropdown",
tags$a(href="https://www.linkedin.com/in/ashishkaparwan17/",
icon('linkedin'), "LinkedIn", target="_blank")
),
tags$li(
class="dropdown",
tags$a(href="https://github.com/ashishkaparwan17",
icon('github-square'), "GitHub", target="_blank")
),
tags$li(
class="dropdown",
tags$a(href="https://www.instagram.com/uttrakhandi/",
icon('instagram'), "Instagram", target="_blank")
)
),
dashSideBar<-dashboardSidebar(
sidebarMenu(
menuItem('Common shiny inputs',
tabName='commonShinyInputTab',
icon=icon('dashboard')
),
menuItem('Graphs',
tabName='GraphsTab',
icon=icon('bar-chart-o')
)
)
),
dashBody <- dashboardBody(
tabItems(
tabItem(
tabName = 'commonShinyInputTab',
h2("Common shiny inputs"),
sliderInput(
inputId = 'SliderSample',
label="This is a slider",
min=0, max=20, value=5
),
textInput(
inputId = 'TextSample',
label='Space to enter text'
),
checkboxInput(
inputId = 'CheckSample',
label = 'Single check box'
),
checkboxGroupInput(
inputId = 'CheckGroupSample',
label = 'Multiple check boxes',
choices=list('A','B','C')
),
radioButtons(
inputId = 'RadioSample',
label = 'Radio button',
choices=list('A','B','C')
),
dateInput(
inputId = 'DateChoice',
label = 'Date Selector'
)
),
tabItem(tabName = "GraphsTab",
h2("Graphs tab content"),
selectInput(
inputId='VarToPlot',
label='choose',
choices=c('carat','depth','table','price'),
selected='price'
),
plotOutput(outputId = 'HistPlot')
)
)
)
)
server <- function(input, output) {
output$HistPlot<-renderPlot({
ggplot(diamonds,aes_string(x=input$VarToPlot))+
geom_histogram(bins=30)
})
}
shinyApp(
ui = ui,
server = server,
options = list(height=650)
)
Below are the screenshots of the Shiny dashboard output.
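The server code above maps a column name held in a string to an aesthetic with aes_string(), which recent ggplot2 versions mark as deprecated. One possible alternative, sketched here, is the ‘.data’ pronoun. The helper name make_hist() is hypothetical and not part of the dashboard code.

```r
library(ggplot2)
data(diamonds, package = "ggplot2")

# Hypothetical helper: build the histogram for a column named by a string,
# using the .data pronoun rather than aes_string()
make_hist <- function(var) {
  ggplot(diamonds, aes(x = .data[[var]])) +
    geom_histogram(bins = 30)
}

# Inside the server function this could be used as:
# output$HistPlot <- renderPlot(make_hist(input$VarToPlot))
p <- make_hist("price")
```

Keeping the plotting logic in a plain function also makes it possible to test it outside a running Shiny session.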