What is R Markdown?

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the written content and the output of any embedded R code chunks. You can embed an R code chunk like the ones used to create the plots below.
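Clicking Knit is roughly equivalent to calling `rmarkdown::render()` on the file. A minimal sketch, assuming the rmarkdown package and pandoc are installed (RStudio ships with both). The temporary file and its contents are hypothetical:

```r
library(rmarkdown)

# Write a small, hypothetical R Markdown file to a temporary location
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Knit demo\"",
  "output: html_document",
  "---",
  "",
  "Some text, followed by the output of an R chunk:",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

# render() runs the R chunks with knitr, then converts the result with pandoc;
# it returns the path of the generated HTML document
html_file <- render(rmd_file, quiet = TRUE)
```

Open `html_file` in a browser to see the text and the chunk output together.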

To create PDF documents, you will need to install a complete LaTeX distribution, such as MiKTeX from https://miktex.org/2.9/setup. Use the Net Installer, not the Basic Installer.
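After installing LaTeX, one quick check is whether R can find the `pdflatex` program on the system PATH. A minimal sketch, assuming the default `pdflatex` engine (other engines such as `xelatex` would need their own check):

```r
# Sys.which() returns the full path of a program on the PATH, or "" if it is absent
pdflatex_path <- Sys.which("pdflatex")
has_latex <- nzchar(pdflatex_path)

if (has_latex) {
  message("LaTeX found at: ", pdflatex_path)
} else {
  message("pdflatex not found; install a LaTeX distribution before knitting to PDF")
}
```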

What is Plotly?

Plotly is data visualization software designed to produce interactive, visually appealing plots. Its website is https://plot.ly/. Additional documentation and help with errors can usually be found quickly with a web search.

Advantages of using Plotly over Tableau

  • R is free to use whereas Tableau requires a paid license

  • The script makes the plots reproducible

  • R Markdown can embed Plotly plots in HTML, Word, and PDF documents (the plots are interactive in HTML output and appear as static snapshots in Word and PDF)

  • Plotly has more customization abilities

Disadvantages of using Plotly over Tableau

  • Scripts take much more up-front work than Tableau's drag-and-drop features

  • Plotly requires more extensive coding knowledge

Necessary Packages

  • plotly

  • ggplot2

  • dplyr

  • plotly relies on several prerequisite packages, which need to be installed the first time you use it (install.packages() installs them along with plotly).
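One way to handle the first-run setup is to install whichever of these packages are missing and then load them. A minimal sketch, assuming the CRAN mirror URL below is reachable:

```r
pkgs <- c("plotly", "ggplot2", "dplyr")

# Install only the packages that are missing; install.packages() also
# installs their required dependencies by default
missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Load the packages (needed in every R session or R Markdown document)
library(plotly)
library(ggplot2)
library(dplyr)
```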


General notes about formatting
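The examples in this section use vectors `x`, `y1`, and `y2`, which are not defined in the text. To run them, here is a minimal sketch with hypothetical values, chosen so that `y1` rises and `y2` falls (matching the trace names used further down):

```r
# Hypothetical example data (the original values are not shown in this document)
set.seed(42)                 # makes the random noise reproducible
x  <- 1:20
y1 <- x + rnorm(20)          # rises as x increases
y2 <- 21 - x + rnorm(20)     # falls as x increases
```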

Here is a basic scatter plot:

plot_ly(
  x = x,                  #data for x variable (remember to specify the data frame when applicable)
  y = y1,                 #data for y variable (remember to specify the data frame when applicable)
  type = "scatter",       #specifies it is a scatterplot
  mode = "markers") %>%   #draws points only (if omitted, plotly prints a message and uses markers)
  add_trace(              #adds a second trace to the graph
    x = x,                #data for x variable
    y = y2)               #data for y variable

Let’s try naming our axes:

#specifying axis preferences
x_axis <- list(
  showgrid = TRUE,        #draws gridlines on the x-axis
  zeroline = FALSE,       #hides the line at zero
  nticks = 20,            #maximum number of tick marks on the axis
  showline = FALSE,       #hides the axis line (no border along this side of the graph)
  title = "X-Axis Title", #names the axis
  mirror = "all")         #mirrors the axis line to the opposite side (visible only when showline = TRUE)

y_axis <- list(
  showgrid = TRUE,        #draws gridlines along the tick marks on the y-axis
  zeroline = FALSE,       #hides the line at zero
  nticks = 20,            #maximum number of tick marks on the axis
  showline = FALSE,       #hides the axis line (no border along this side of the graph)
  title = "Y-Axis Title", #names the axis
  mirror = "all")         #mirrors the axis line to the opposite side (visible only when showline = TRUE)

#Creating a scatterplot with titled axes
plot_ly(
  x = x,                  #data for x variable (remember to specify the data frame when applicable)
  y = y1,                 #data for y variable (remember to specify the data frame when applicable)
  type = "scatter",       #specifies it is a scatterplot
  mode = "markers") %>%   #draws points only
  add_trace(              #adds a second trace to the graph
    x = x,                #data for x variable
    y = y2) %>%           #data for y variable
  layout(
    xaxis = x_axis,       #applies the x-axis settings defined above
    yaxis = y_axis)       #applies the y-axis settings defined above

What other formatting can we do?

#specifying axis preferences
x_axis <- list(
  showgrid = FALSE,       #hides gridlines on the x-axis
  zeroline = FALSE,       #hides the line at zero
  nticks = 20,            #maximum number of tick marks on the axis
  showline = TRUE,        #draws the axis line
  title = "X-Axis Title", #names the axis
  mirror = "all")         #repeats the axis line on the opposite side, forming a border

y_axis <- list(
  showgrid = TRUE,        #draws gridlines along the tick marks on the y-axis
  zeroline = FALSE,       #hides the line at zero
  nticks = 20,            #maximum number of tick marks on the axis
  showline = TRUE,        #draws the axis line
  title = "Y-Axis Title", #names the axis
  mirror = "all")         #repeats the axis line on the opposite side, forming a border

#Creating a scatterplot with axes, a title, and a legend
plot_ly(
  x = x,                  #data for x variable (note: specify df when needed)
  y = y1,                 #data for y variable (note: specify df when needed)
  type = "scatter",       #specifies it is a scatterplot
  mode = "markers",       #draws points only
  name = "Blue Rising") %>%        #names the trace (shown in the legend)
  add_trace(              #adds a second trace to the graph
    x = x,                #data for x variable
    y = y2,               #data for y variable
    name = "Orange Falling") %>%   #names the trace
  layout(
    title = "Main Title", #names the plot
    xaxis = x_axis,       #applies the x-axis settings defined above
    yaxis = y_axis,       #applies the y-axis settings defined above
    legend = list(        #legend settings
      x = 0.5,            #horizontal position (0 = left edge, 1 = right edge of the plot area)
      y = 1,              #vertical position (0 = bottom, 1 = top)
      bgcolor = "#F3F3F3")) #legend background color (light grey)
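The two pairs of axis lists above repeat most of their settings. One way to avoid the repetition, sketched here with base R's `modifyList()` (the name `base_axis` is hypothetical), is to define the shared settings once and override only what differs:

```r
# Settings shared by both axes (same values as the second pair of lists above)
base_axis <- list(
  zeroline = FALSE,
  nticks   = 20,
  showline = TRUE,
  mirror   = "all"
)

# modifyList() copies base_axis, then overrides or adds the named elements
x_axis <- modifyList(base_axis, list(showgrid = FALSE, title = "X-Axis Title"))
y_axis <- modifyList(base_axis, list(showgrid = TRUE,  title = "Y-Axis Title"))
```

The resulting lists can be passed to `layout(xaxis = x_axis, yaxis = y_axis)` exactly as before.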

Basic Plots

This is a line plot:

#line plot (points are connected in the order they appear, so the line need not be straight)
plot_ly(x = c(1, 2, 3),   #choose x variable (here created in-line as the vector <1,2,3>)
        y = c(5, 6, 7),   #choose y variable (same situation as with x variable)
        type = "scatter", #line plots use the scatter trace type
        mode = "lines")   #connects the points with lines instead of drawing markers
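Because the line visits the points in the order they appear, unsorted data produces a line that doubles back on itself. A minimal sketch, using a hypothetical data frame, of sorting by x before plotting:

```r
library(plotly)

# Hypothetical data whose rows are not sorted by x
pts <- data.frame(x = c(3, 1, 2), y = c(7, 5, 6))

# Lines join points in row order, so sort by x first for a left-to-right line
pts_sorted <- pts[order(pts$x), ]

p <- plot_ly(pts_sorted, x = ~x, y = ~y, type = "scatter", mode = "lines")
```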

This is a bubble chart:

#bubble chart
plot_ly(x = c(1, 2, 3),       #select x variable
        y = c(5, 6, 7),       #select y variable
        type = "scatter",     #specifies that it is a scatterplot
        mode = "markers",     #uses markers as opposed to lines
        size = c(1, 5, 10),   #scales marker size by these values
        marker = list(        #marker options (here, only color)
          color = c("red", "blue", "green"))) #specifies the color of each bubble
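When the values live in a data frame, the same chart can map a column to bubble size with a formula, and `plot_ly()`'s `sizes` argument sets the range of marker sizes in pixels. A sketch with a hypothetical data frame and column names:

```r
library(plotly)

# Hypothetical data frame: the third numeric column drives bubble size
bubbles <- data.frame(
  x   = c(1, 2, 3),
  y   = c(5, 6, 7),
  pop = c(100, 500, 1000)
)

p <- plot_ly(
  bubbles,
  x = ~x,
  y = ~y,
  size = ~pop,          # maps pop to marker size
  sizes = c(10, 50),    # smallest and largest marker size in pixels
  type = "scatter",
  mode = "markers"
)
```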

Here is another scatterplot:

#scatter plot (the scatter plot above is more detailed)
plot_ly(x = c(1, 2, 3),   #select x variable
        y = c(5, 6, 7),   #select y variable
        type = "scatter", #indicate it is a scatter plot
        mode = "markers") #says to use markers not lines

This is a heat map:

If you are unsure what input is needed, run View(volcano) to inspect the data: volcano is a built-in numeric matrix of elevation values.

#heatmap (real data usually needs quite a bit of manipulation to get into matrix form)
plot_ly(z = volcano,        #z takes a numeric matrix (rows along y, columns along x)
        type = "heatmap")   #specifies plot is a heatmap
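Data often arrives in long format (one row per observation), while `z` must be a matrix. A minimal sketch of that reshaping step with base R's `tapply()`, using hypothetical day/hour readings:

```r
library(plotly)

# Hypothetical long-format data: one row per (day, hour) measurement
readings <- expand.grid(day = c("Mon", "Tue", "Wed"), hour = 1:4)
readings$value <- seq_len(nrow(readings))

# tapply() arranges the values into a matrix: rows = day, columns = hour
heat_z <- with(readings, tapply(value, list(day, hour), sum))

p <- plot_ly(
  z = heat_z,
  x = colnames(heat_z),   # column labels along the x-axis
  y = rownames(heat_z),   # row labels along the y-axis
  type = "heatmap"
)
```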

Here is a bar chart:

#bar chart (aggregate the data first: each y value is drawn as given, as one bar height)
plot_ly(
  x = c("giraffes", "orangutans", "monkeys"), #choose unique bar category
  y = c(20, 14, 23),                          #choose height of bar
  type = "bar"                                #specify bar chart
) %>%
layout(title = "SF Zoo")                      #name plot
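Since each bar's height is taken as given, raw records need to be summarized first. A minimal sketch using dplyr's `count()` on a hypothetical table of animal sightings:

```r
library(dplyr)
library(plotly)

# Hypothetical raw data: one row per animal sighting
sightings <- data.frame(
  animal = c("giraffes", "monkeys", "giraffes", "orangutans", "monkeys", "monkeys")
)

# Aggregate to one row per category; count() adds a column `n`
animal_counts <- sightings %>%
  count(animal)

p <- plot_ly(animal_counts, x = ~animal, y = ~n, type = "bar") %>%
  layout(title = "Sightings per animal")
```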

Here is an area plot:

#area plots
plot_ly(x = c(1, 2, 3),     #choose x variable
        y = c(5, 6, 7),     #choose y variable
        type = "scatter",   #specifies the chart is a scatter plot
        mode = "lines",     #indicates line plot
        fill = "tozeroy")   #fills the area between the line and y = 0

Diagrams

Histograms

#histograms
plot_ly(alpha = 0.5) %>%        #bigger alpha = more opaque
  add_histogram(                #adds first histogram
    x = ~rnorm(500)) %>%        #choose numeric var. for first histogram
  add_histogram(                #add second histogram
    x = ~rnorm(500) + 1) %>%    #choose numeric var. for second histogram
  layout(barmode = "overlay")   #says to overlay the two plots

Here’s a box plot:

#box plots
plot_ly(                  #creates first box plot
  y = rnorm(50),          #choose numeric variable for first plot
  type = "box") %>%       #specifies plot type is box plot
  
  add_trace(              #creates second box plot
    y = rnorm(50, 1))     #choose numeric variable for second plot
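When the values are stored in one column with a grouping column beside it, a single call can draw one box per group. A sketch with a hypothetical data frame that mirrors the two random samples above:

```r
library(plotly)

# Hypothetical long-format data: a group label and a numeric value per row
set.seed(1)
scores <- data.frame(
  group = rep(c("A", "B"), each = 50),
  value = c(rnorm(50), rnorm(50, mean = 1))
)

# One box is drawn for each distinct value of `group`
p <- plot_ly(scores, x = ~group, y = ~value, type = "box")
```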

This is a “2D Histogram”

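It counts how many observations fall into each cell of a two-dimensional grid, showing the joint distribution of two numeric variables (and therefore any correlation between them).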

#2D histograms
plot_ly (
  x = rnorm(1000, sd = 10), #choose first numeric variable
  y = rnorm(1000, sd = 5),  #choose second numeric variable
  type = "histogram2d")     #indicates plot is 2D histogram
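Under the hood, a 2D histogram counts the points that land in each cell of a grid. A sketch of that counting step in base R, using `cut()` and `table()` with break points chosen by `pretty()` (plotly picks its own bins, so its counts will generally differ):

```r
set.seed(1)
v1 <- rnorm(1000, sd = 10)
v2 <- rnorm(1000, sd = 5)

# Cut each variable into intervals; pretty() picks round breaks covering the data
v1_bins <- cut(v1, breaks = pretty(range(v1), n = 10), include.lowest = TRUE)
v2_bins <- cut(v2, breaks = pretty(range(v2), n = 10), include.lowest = TRUE)

# Cross-tabulate: each cell counts the observations in that pair of bins
bin_counts <- table(v1_bins, v2_bins)
```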

Maps

This is a Bubble Map

Bubble plots are useful for depicting three numeric variables on a single plot: two on the axes and a third as the bubble size.

#bubble map
data <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv") #reads in dataset

plot_ly(
  data,               #indicate the data frame you're using
  x = ~Women,         #identify x variable (keep the tilde)
  y = ~Men,           #identify y variable (keep the tilde)
  text = ~School,     #text displayed when hovering over a bubble
  type = "scatter",   #makes the plot a scatterplot
  mode = "markers",   #indicates values are points (e.g. not lines)
  marker = list(      #chooses properties of each bubble
    size = ~Gap,      #sets each bubble's size (in pixels) to the value of Gap
    opacity = 0.5)) %>%   #0 = invisible, 1 = opaque; lower values let overlapping bubbles show through
  layout(title = 'Gender Gap in Earnings per University', #names plot
         xaxis = list(showgrid = TRUE),       #makes x gridlines
         yaxis = list(showgrid = TRUE))       #makes y gridlines

This is a Choropleth Map

This plot type shades each region by the value of a single variable, so regions can be compared at a glance.

#Note: these maps usually display only in a web browser (HTML output)
#reads in the dataset
df <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv")
#Sets what information is shown when hovering
df$hover <- with(df,
                 paste(state, '<br>',      #'<br>' creates a line break
                       "Beef", beef,       #quoted text is shown verbatim
                       "Dairy", dairy,     #unquoted names display each row's value
                       "<br>", "Fruits",
                       total.fruits, "Veggies",
                       total.veggies, "<br>",
                       "Wheat", wheat,
                       "Corn", corn))

# sets color and width of state borders
l <- list(color = toRGB("white"), width = 2)     

# specify some map projection/options
g <- list(
  scope = 'usa',                         #sets scope of the map
  projection = list(
    type = 'albers usa'),                #selects which map type to use
  showlakes = FALSE                      #hides lakes on the map
  )

plot_geo(df,                             #calls dataset
   locationmode = 'USA-states') %>%      #specifies map location
  
  add_trace(
    z = ~total.exports,                  #assigns a value to each location
    text = ~hover,                       #calls hover information
    locations = ~code,                   #links state codes in the data to the map
    color = ~total.exports,              #chooses what var to color by
    colors = 'Purples',                  #selects map colors
    marker = list(line = l)              #applies the state border style defined in l
  ) %>%
  
  colorbar(title = "Millions USD") %>%   #names the colorbar
  
  layout(
    title = '2011 US Agriculture Exports by State<br>(Hover for breakdown)', #names plot
    geo = g                                  #calls map info from g
  )

This is a scatter map

#scatter map (needs to be shown in a browser)
#loads the data
df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2015_06_30_precipitation.csv')

# geo styling
g <- list(
  scope = 'north america',           #says map of N. America
  showland = TRUE,                   #shows the landmass
  landcolor = toRGB("grey83"),       #colors the land grey
  subunitcolor = toRGB("white"),     #colors state/province borders white
  countrycolor = toRGB("white"),     #colors country borders white
  showlakes = TRUE,                  #shows lakes on map
  lakecolor = toRGB("white"),        #colors lakes white
  showsubunits = TRUE,               #shows state/province borders
  showcountries = TRUE,              #shows country borders
  resolution = 50,                   #map resolution (50 is finer than the default 110)
  projection = list(                 #indicates map properties
    type = 'conic conformal',        #uses a conic conformal projection
    rotation = list(lon = -100)      #rotates the map to center on longitude -100
  ),
  lonaxis = list(                    #gives info on longitude
    showgrid = TRUE,                 #shows longitude gridlines
    gridwidth = 0.5,                 #specifies gridline width
    range = c(-140, -55),            #range of longitude on map
    dtick = 5                        #frequency of gridlines (every 5 degrees)
  ),
  lataxis = list(                    #gives info on latitude
    showgrid = TRUE,                 #shows latitude gridlines
    gridwidth = 0.5,                 #specifies gridline width
    range = c(20, 60),               #range of latitude on map
    dtick = 5                        #frequency of gridlines (every 5 degrees)
  )
)

plot_geo(df,                         #calls the data
         lat = ~Lat,                 #specifies latitude info
         lon = ~Lon,                 #specifies longitude info
         color = ~Globvalue) %>%     #colors by precipitation
  add_markers(
    text = ~paste(Globvalue, "inches"),  #hover text: rainfall followed by "inches"
    hoverinfo = "text"               #shows only the text above on hover
  ) %>%
  colorbar(title = "Total Inches") %>%   #changes the default color scale title
  #titles the chart and calls map information
  layout(title = 'US Precipitation 06-30-2015<br>Source: NOAA',
         geo = g)

3D (Probably Will Not Use)

3D Surface Plots

#3d surface plots
plot_ly(z = ~volcano) %>%  #z takes a matrix of numeric values (volcano is a built-in R dataset)
  add_surface()            #turns the data into a surface plot
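Any numeric matrix can serve as `z`. A minimal sketch that builds one by evaluating a hypothetical function over a grid with base R's `outer()`:

```r
library(plotly)

x_vals <- seq(-3, 3, length.out = 50)
y_vals <- seq(-3, 3, length.out = 50)

# outer() evaluates the function at every (y, x) pair; rows correspond to
# y_vals and columns to x_vals, the layout plotly expects for z
surface_z <- outer(y_vals, x_vals, function(y, x) sin(x) * cos(y))

p <- plot_ly(x = x_vals, y = y_vals, z = surface_z) %>%
  add_surface()
```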

3D Line Plots

#3d line plots
#reads in the data from online source
data <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/3d-line1.csv')  
data$color <- as.factor(data$color)  #makes color column a factor variable

plot_ly(
  data,                              #specifies the data frame the data comes from
  x = ~x,                            #numeric variable for the x-axis (needs a tilde in front)
  y = ~y,                            #numeric variable for the y-axis (needs a tilde in front)
  z = ~z,                            #numeric variable for the z-axis (needs a tilde in front)
  type = 'scatter3d',                #makes the plot 3D (the scatter type is needed to use the "lines" mode)
  mode = 'lines',                    #makes the plot a line plot
  opacity = 1,                       #how opaque the line is, from 0 (invisible) to 1
  line = list(                       #lists all the line options
              width = 6,             #line thickness
              color = ~color,        #variable to color the line by (converted to a factor above)
              dash = "solid"))       #line style; plotly also accepts "dot", "dash", "longdash", "dashdot", and "longdashdot"

3D Scatter Plots

#3d scatter plots
mtcars$am <- factor(mtcars$am,                   #converts am (0/1) to a factor with readable labels
                    levels = c(0, 1),
                    labels = c("Automatic", "Manual"))

plot_ly(mtcars,
        x = ~wt,                                 #choose x variable (must be numeric)
        y = ~hp,                                 #choose y variable (must be numeric)
        z = ~qsec,                               #choose z variable (must be numeric)
        color = ~am,                             #select categorical variable to color dots by
        colors = c('#BF382A', '#0C4B8E')) %>%    #colors for the groups (give one color per category)
  add_markers() %>%                              #draws the points as markers
  layout(scene = list(xaxis = list(title = "Weight"),             #labels x-axis
                      yaxis = list(title = "Gross horsepower"),   #labels y-axis
                      zaxis = list(title = "1/4 mile time")))     #labels z-axis