This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like the ones used to create the plots below.
To create PDF documents you will need to install a complete LaTeX distribution. MiKTeX is available at https://miktex.org/2.9/setup; choose the Net Installer (you do not want the Basic Installer).
Plotly is data visualization software that is designed to be interactive and visually appealing. Its website is located at https://plot.ly/. Additional documentation and help with error messages can usually be found quickly with a Google search.
R is free to use whereas Tableau requires a paid license
The script makes the plots reproducible
R Markdown can embed plots within HTML, Word, and PDF documents (the plots stay interactive only in HTML output)
Plotly has more customization abilities
Scripts take a lot of front-end work compared to the drag-and-drop features in Tableau
Plotly requires more extensive coding knowledge
plotly
ggplot2
dplyr
There are some prerequisite packages for plotly that you will need to install the first time you run the script.
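A minimal setup sketch, assuming the three packages listed above come from CRAN. The helper name `missing_packages` is hypothetical, and the vectors `x`, `y1`, and `y2` are assumed sample data: the data behind the scatter plots in this document is not shown, so a rising and a falling trend are simulated here.

```r
# Hypothetical helper: returns the entries of `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

needed <- c("plotly", "ggplot2", "dplyr")
to_install <- missing_packages(needed)
# Uncomment to install whatever is missing (only needed once per machine)
# install.packages(to_install)

# Assumed sample data for the scatter plots (not the author's original data)
set.seed(1)                        # makes the random noise repeatable
x  <- 1:50
y1 <- x + rnorm(50, sd = 5)        # rising trend
y2 <- 50 - x + rnorm(50, sd = 5)   # falling trend
```

After installing, `library(plotly)` loads the package and makes `plot_ly()` and the `%>%` pipe available.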
Here is a basic scatter plot:
plot_ly(
x = x, #data for x variable (remember to specify dataframe when applicable)
y = y1, #data for y variable (remember to specify dataframe when applicable)
type = "scatter") %>% #specifies it is a scatterplot
add_trace( #adds a second scatter trace to the graph
x = x, #data for x variable
y = y2) #data for y variable
Let’s try naming our axes:
#specifying axis preferences
x_axis <- list(
showgrid = TRUE, #says to include gridlines on x-axis
zeroline = FALSE, #says not to include a line at zero
nticks = 20, #sets the maximum number of tick marks for the axis
showline = FALSE, #says to not include a border on the sides of graph
title = "X-Axis Title", #names axis
mirror = "all") #mirrors the axis line onto the opposite side of the plot
y_axis <- list(
showgrid = TRUE, #says to include gridlines along the tick marks on y-axis
zeroline = FALSE, #says not to include a line at zero
nticks = 20, #sets the maximum number of tick marks for the axis
showline = FALSE, #says to not include a border on the sides of graph
title = "Y-Axis Title", #names axis
mirror = "all") #mirrors the axis line onto the opposite side of the plot
#Creating a scatterplot with axes and legends
plot_ly(
x = x, #data for x variable (remember to specify dataframe when applicable)
y = y1, #data for y variable (remember to specify dataframe when applicable)
type = "scatter") %>% #specifies it is a scatterplot
add_trace( #adds a second scatter trace to the graph
x = x, #data for x variable
y = y2) %>% #data for y variable
layout(
xaxis = x_axis, #indicates what object to use for x-axis settings
yaxis = y_axis) #indicates what object to use for y-axis settings
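Because the x-axis and y-axis lists differ only in their title, the repeated settings can be generated by a small function. This is a sketch only: `make_axis` is a hypothetical helper, not part of plotly, and its defaults simply copy the values used above.

```r
# Hypothetical helper that builds an axis settings list for layout()
make_axis <- function(title, showgrid = TRUE, showline = FALSE, nticks = 20) {
  list(
    title    = title,     # axis title
    showgrid = showgrid,  # draw gridlines?
    showline = showline,  # draw the axis line (border)?
    zeroline = FALSE,     # no line at zero
    nticks   = nticks,    # maximum number of tick marks
    mirror   = "all"      # mirror the axis line onto the opposite side
  )
}

x_axis <- make_axis("X-Axis Title")
y_axis <- make_axis("Y-Axis Title")
```

The resulting lists are passed to `layout(xaxis = x_axis, yaxis = y_axis)` exactly as before; changing one argument, such as `showline = TRUE`, then updates a single axis without retyping the rest.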
What other formatting can we do?
#specifying axis preferences
x_axis <- list(
showgrid = FALSE, #says to not include gridlines on x-axis
zeroline = FALSE, #says not to include a line at zero
nticks = 20, #sets the maximum number of tick marks for the axis
showline = TRUE, #says to include a border on the sides of graph
title = "X-Axis Title", #names axis
mirror = "all") #mirrors the axis line onto the opposite side of the plot
y_axis <- list(
showgrid = TRUE, #says to include gridlines along the tick marks on y-axis
zeroline = FALSE, #says not to include a line at zero
nticks = 20, #sets the maximum number of tick marks for the axis
showline = TRUE, #says to include a border on the sides of graph
title = "Y-Axis Title", #names axis
mirror = "all") #mirrors the axis line onto the opposite side of the plot
#Creating a scatterplot with axes and legends
plot_ly(
x = x, #data for x variable (note: specify df when needed)
y = y1, #data for y variable (note: specify df when needed)
type = "scatter", #specifies it is a scatterplot
name = "Blue Rising") %>% #names trace
add_trace( #adds a second scatter trend to the graph
x = x, #data for x variable
y = y2, #data for y variable
name = "Orange Falling") %>% #names trace
layout(
title = "Main Title", #names title
xaxis = x_axis, #indicates what object to use for x-axis settings
yaxis = y_axis, #indicates what object to use for y-axis settings
legend = list( #positions and styles the legend
x = 0.5, #specifies position of legend horizontally
y = 1, #specifies position of legend vertically
bgcolor = "#F3F3F3")) #specifies the background color of the legend
This is a line plot:
#line plot (the line does not have to be straight; points are connected in the order they are written)
plot_ly(x = c(1, 2, 3), #choose x variable (here, the x variable is created in-line as a vector <1,2,3>)
y = c(5, 6, 7), #choose y variable (same situation as with x variable)
type = "scatter", #line plots use the scatter trace type
mode = "lines") #draws lines instead of markers, which makes the chart a line plot
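Since points are connected in the order they appear, unsorted data produces a line that doubles back on itself. The sketch below sorts by x first. The data frame `line_df` is made-up example data, and the plotting call is guarded so the sorting step still runs if plotly is not installed.

```r
# Assumed example data, deliberately out of order on x
line_df <- data.frame(x = c(3, 1, 2), y = c(7, 5, 6))

# Sort the rows by x so the line runs left to right
line_df_sorted <- line_df[order(line_df$x), ]

if (requireNamespace("plotly", quietly = TRUE)) {
  p_line <- plotly::plot_ly(line_df_sorted, x = ~x, y = ~y,
                            type = "scatter", mode = "lines")
  # type p_line at the console to display the plot
}
```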
This is a bubble chart:
#bubble chart
plot_ly(x = c(1, 2, 3), #select x variable
y = c(5, 6, 7), #select y variable
type = "scatter", #specifies that it is a scatterplot
mode = "markers", #indicates to use markers as opposed to lines
size = c(1, 5, 10), #makes marker size a variable
marker = list( #chooses marker options (we just do color)
color = c("red", "blue", "green"))) #specifies color of bubbles
Here is another scatterplot:
#scatter plot (the scatter plot above is more detailed)
plot_ly(x = c(1, 2, 3), #select x variable
y = c(5, 6, 7), #select y variable
type = "scatter", #indicate it is a scatter plot
mode = "markers") #says to use markers not lines
This is a heat map:
If you are unsure what input data is needed, type View(volcano)
and look at its structure. Note that volcano is a numeric matrix, not a dataframe.
#heatmap (usually will need quite a bit of data manipulation)
plot_ly(z = volcano, #specify a matrix of numeric values
type = "heatmap") #specifies plot is a heatmap
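The data manipulation mentioned above usually means reshaping a long table into a matrix. Here is a rough sketch using base R's `tapply()`. The `long` data frame and its column names are invented for illustration, and the `volcano` checks simply confirm the shape of the built-in matrix.

```r
# volcano ships with R as a numeric matrix
dim(volcano)        # 87 rows, 61 columns
is.matrix(volcano)  # TRUE

# Assumed long-format data: one row per (row_id, col_id) cell
long <- expand.grid(row_id = 1:3, col_id = 1:4)
long$value <- long$row_id * 10 + long$col_id

# Pivot to a matrix: rows = row_id, columns = col_id
z <- tapply(long$value, list(long$row_id, long$col_id), sum)

if (requireNamespace("plotly", quietly = TRUE)) {
  p_heat <- plotly::plot_ly(z = z, type = "heatmap")
  # type p_heat at the console to display the plot
}
```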
Here is a bar chart:
#bar charts (you generally need to aggregate the data before plotting)
plot_ly(
x = c("giraffes", "orangutans", "monkeys"), #choose unique bar category
y = c(20, 14, 23), #choose height of bar
type = "bar" #specify bar chart
) %>%
layout(title = "SF Zoo") #name plot
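When the data has one row per observation rather than one row per bar, it can be aggregated first. The sketch below counts cars per cylinder class in the built-in `mtcars` data using base R; the variable names are illustrative, and `dplyr::count()` would be a reasonable alternative.

```r
# One row per category, with the count in column n
counts <- as.data.frame(table(cyl = mtcars$cyl), responseName = "n")
counts

if (requireNamespace("plotly", quietly = TRUE)) {
  p_bar <- plotly::plot_ly(counts, x = ~cyl, y = ~n, type = "bar")
  # type p_bar at the console to display the plot
}
```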
Here is an area plot:
#area plots
plot_ly(x = c(1, 2, 3), #choose x variable
y = c(5, 6, 7), #choose y variable
type = "scatter", #specifies the chart is a scatter plot
mode = "lines", #indicates line plot
fill = "tozeroy") #says to fill below line
Histograms
#histograms
plot_ly(alpha = 0.5) %>% #bigger alpha = more opaque
add_histogram( #adds first histogram
x = ~rnorm(500)) %>% #choose numeric var. for first histogram
add_histogram( #add second histogram
x = ~rnorm(500) + 1) %>% #choose numeric var. for second histogram
layout(barmode = "overlay") #says to overlay the two plots
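Because `rnorm()` draws new random numbers on every run, the histograms above change each time the document is knit. A small sketch of one way to make them repeatable with `set.seed()` follows; the seed value 42 and the variable names are arbitrary choices.

```r
set.seed(42)                 # fix the random number stream
a <- rnorm(500)
b <- rnorm(500) + 1

set.seed(42)                 # resetting the seed reproduces the same draws
a_again <- rnorm(500)
identical(a, a_again)        # TRUE

if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  p_hist <- plot_ly(alpha = 0.5) %>%
    add_histogram(x = a) %>%
    add_histogram(x = b) %>%
    layout(barmode = "overlay")
  # type p_hist at the console to display the plot
}
```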
Here’s a box plot
#box plots
plot_ly( #creates first box plot
y = rnorm(50), #choose numeric variable for first plot
type = "box") %>% #specifies plot type is box plot
add_trace( #creates second box plot
y = rnorm(50, 1)) #choose numeric variable for second plot
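This is a “2D Histogram”
Shows the joint distribution of two numeric variables by counting the observations in each rectangular bin, which can make a correlation between them visible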
#2D histograms
plot_ly(
x = rnorm(1000, sd = 10), #choose first numeric variable
y = rnorm(1000, sd = 5), #choose second numeric variable
type = "histogram2d") #indicates plot is 2D histogram
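The two variables in the example above are drawn independently, so the plot shows no trend. As an illustration under assumed data, the sketch below makes the second variable depend on the first and computes the correlation separately with `cor()`, since the plot itself only displays bin counts. The names `u` and `v` and the coefficient 0.5 are arbitrary.

```r
set.seed(7)
u <- rnorm(1000, sd = 10)
v <- 0.5 * u + rnorm(1000, sd = 5)   # v depends on u, plus noise

r <- cor(u, v)                       # numeric measure of linear correlation
r

if (requireNamespace("plotly", quietly = TRUE)) {
  p_2d <- plotly::plot_ly(x = u, y = v, type = "histogram2d")
  # type p_2d at the console; the bins should form a diagonal band
}
```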
This is a Bubble Map
Bubble plots are useful to depict three numeric variables on a single plot.
#bubble map
data <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv") #reads in dataset
plot_ly(
data, #indicate the dataframe you're using
x = ~Women, #identify x variable (keep the tilde)
y = ~Men, #identify y variable (keep the tilde)
text = ~School, #when hovering over a bubble will display this var.
type = "scatter", #makes the plot a scatterplot
mode = "markers", #indicates values are points (e.g. not lines)
marker = list( #chooses properties of each bubble
size = ~Gap, #sets the size of the bubble equal to selected var.
opacity = 0.5)) %>% #the lower the opacity, the more transparent the bubble
layout(title = 'Gender Gap in Earnings per University', #names plot
xaxis = list(showgrid = TRUE), #makes x gridlines
yaxis = list(showgrid = TRUE)) #makes y gridlines
This is a Choropleth Map
This plot type is used to compare regions graphically with regard to a single variable.
#Note: these maps can usually only be viewed in a browser
#opens dataset
df <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv")
#Sets what information is shown when hovering
df$hover <- with(df,
paste(state, '<br>', #'<br>' creates a break
"Beef", beef, #in quotes are shown verbatim
"Dairy", dairy, #nonquotes display each value
"<br>","Fruits",
total.fruits, "Veggies",
total.veggies,"<br>",
"Wheat", wheat,
"Corn", corn))
# sets color and width of state borders
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
scope = 'usa', #sets scope of the map
projection = list(
type = 'albers usa'), #selects which map type to use
showlakes = FALSE #says not to draw lakes on this map
)
plot_geo(df, #calls dataset
locationmode = 'USA-states') %>% #specifies map location
add_trace(
z = ~total.exports, #assigns a value to each location
text = ~hover, #calls hover information
locations = ~code, #links locations from data to map
color = ~total.exports, #chooses what var to color by
colors = 'Purples' #select map colors
) %>%
colorbar(title = "Millions USD") %>% #names the colorbar
layout(
title = '2011 US Agriculture Exports by State<br>(Hover for breakdown)', #names plot
geo = g #calls map info from g
)
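To see what the hover column holds, it can help to run `paste()` on its own. The values below are made up for illustration. `paste()` separates its arguments with a space by default, while `paste0()` uses no separator.

```r
# Quoted pieces appear verbatim; unquoted values are converted to text
hover_demo <- paste("Iowa", "<br>", "Beef", 10, "Dairy", 2)
hover_demo    # "Iowa <br> Beef 10 Dairy 2"

# paste0() avoids the extra spaces around the <br> line break
hover_tight <- paste0("Iowa", "<br>", "Beef: ", 10)
hover_tight   # "Iowa<br>Beef: 10"
```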
This is a scatter map
#scatter map (need to show in browser)
#loads the data
df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2015_06_30_precipitation.csv')
# change default color scale title
m <- list(colorbar = list(title = "Total Inches"))
# geo styling
g <- list(
scope = 'north america', #says map of N. America
showland = TRUE, #shows the landmass
landcolor = toRGB("grey83"), #colors the land grey
subunitcolor = toRGB("white"), #colors subunit (e.g. state) borders white
countrycolor = toRGB("white"), #colors country borders white
showlakes = TRUE, #shows lakes on map
lakecolor = toRGB("white"), #colors lakes white
showsubunits = TRUE, #shows subland units
showcountries = TRUE, #shows countries
resolution = 50, #specifies resolution
projection = list( #indicates map properties
type = 'conic conformal', #map is conical shape
rotation = list(lon = -100) #rotates by longitude
),
lonaxis = list( #gives info of longitude
showgrid = TRUE, #show longitude grids
gridwidth = 0.5, #specifies width of grid
range = c(-140, -55), #range of longitude on map
dtick = 5 #frequency of gridlines
),
lataxis = list( #gives info on latitude
showgrid = TRUE, #show latitude grids
gridwidth = 0.5, #specifies gridline width
range = c(20, 60), #range of latitude on map
dtick = 5 #frequency of gridlines
)
)
plot_geo(df, #calls the data
lat = ~Lat, #specifies latitude info
lon = ~Lon, #specifies longitude info
color = ~Globvalue) %>% #colors by precipitation
add_markers(
text = ~paste(Globvalue, #hover value is rainfall
"inches"), #displays "inches" afterwards
hoverinfo = "text" #shows only the text above when hovering
) %>%
#titles the chart and calls map information
layout(title = 'US Precipitation 06-30-2015<br>Source: NOAA',
geo = g)
3D Surface Plots
#3d surface plots
plot_ly(z = ~volcano) %>% #z takes a matrix of numeric values (volcano is a built-in R dataset)
add_surface() #turns the data into a surface plot
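A surface can also be built from a function instead of a ready-made matrix. The sketch below uses `outer()` to evaluate an arbitrary example function on a grid. It assumes, as far as I understand plotly's convention, that the rows of `z` correspond to y values and the columns to x values, which is why `y_vals` is passed first. All names here are illustrative.

```r
x_vals <- seq(-3, 3, length.out = 40)
y_vals <- seq(-2, 2, length.out = 30)

# Rows follow y, columns follow x (assumed plotly convention)
z_vals <- outer(y_vals, x_vals, function(y, x) sin(x) * cos(y))
dim(z_vals)   # 30 40

if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)
  p_surf <- plot_ly(x = x_vals, y = y_vals, z = z_vals) %>%
    add_surface()
  # type p_surf at the console to display the plot
}
```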
3D Line Plots
#3d line plots
#reads in the data from online source
data <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/3d-line1.csv')
data$color <- as.factor(data$color) #makes color column a factor variable
plot_ly(
data, #specifies the dataframe data is coming from
x = ~x, #indicates what numeric variable to use on x axis (need tilde in front of variable)
y = ~y, #indicates what numeric variable to use on y axis (need tilde in front of variable)
z = ~z, #indicates what numeric variable to use on z axis (need tilde in front of variable)
type = 'scatter3d', #makes the plot 3D (making it scatter is needed to use the "lines" option)
mode = 'lines', #makes the plot the line plot
opacity = 1, #indicates on range 0 to 1 how opaque to make the line (0 is invisible)
line = list( #lists all the line options
width = 6, #indicates how thick the line should be
color = ~color, #sets the variable to color the line by (must be a factor variable)
dash = "solid")) #specifies line style, can also make it "dot", "dash", "longdash", "dashdot", and "longdashdot"
3D Scatter Plots
#3d scatter plots
mtcars$am[which(mtcars$am == 0)] <- 'Automatic' #replaces 0 with "Automatic" in am column
mtcars$am[which(mtcars$am == 1)] <- 'Manual' #replaces 1 with "Manual" in am column
mtcars$am <- as.factor(mtcars$am) #changes am variable to a factor variable
plot_ly(mtcars,
x = ~wt, #choose x variable (must be numeric)
y = ~hp, #choose y variable (must be numeric)
z = ~qsec, #choose z variable (must be numeric)
color = ~am, #select categorical variable to color dots by
colors = c('#BF382A', '#0C4B8E')) %>% #choose the colors for the groups (make sure the number of colors equals the number of categories)
add_markers() %>% #specifies plot type
layout(scene = list(xaxis = list(title = "Weight"), #labels x-axis
yaxis = list(title = "Gross horsepower"), #labels y-axis
zaxis = list(title = "1/4 mile time"))) #labels z-axis
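The recoding of `am` above overwrites the session's copy of the `mtcars` dataset. An alternative sketch works on a copy and lets `factor()` do the relabeling in one step. The name `cars_df` is an arbitrary choice; the labels follow the same 0 = Automatic, 1 = Manual coding used above.

```r
cars_df <- mtcars                      # copy, so mtcars itself stays unchanged
cars_df$am <- factor(cars_df$am,
                     levels = c(0, 1),
                     labels = c("Automatic", "Manual"))
table(cars_df$am)                      # Automatic 19, Manual 13
```

`cars_df` can then replace `mtcars` in the `plot_ly()` call with `color = ~am`.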