Today we are going to start by looking at scatterplots, heatmaps, and
Here is some old code from Stat. 6501 Rice Chapter 2:
This was developed to examine the randomness of an LCG random number generator.
Code to view correlation in pseudorandom numbers generated by an LCG. David Read, California State University, East Bay, 3/30/09
Note: This is a TERRIBLE LCG used only to show the correlation.
N <- 26
Seed <- 44
Mvalue <- 26
Cvalue <- 27
Dvalue <- 3
v1 <- numeric(N)
v1[1] <- Seed
for (i in 2:N) {
  Next <- (Cvalue*Seed + Dvalue) %% Mvalue
  v1[i] <- Next
  Seed <- Next
}
v2 <- v1/Mvalue
plot(v2[1:(N-1)], v2[2:N], main="Correlation of Pseudorandom Numbers from a Linear Congruential Generator")
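The loop above can be wrapped in a small function so the same recursion is reusable for both generators in these notes. This is only a sketch: the name lcg and its argument names are my own, not part of the original Stat. 6501 code. It also shows why this first generator is so bad: 27 %% 26 is 1, so each step just adds 3 modulo 26.

```r
# Hypothetical helper (not in the original notes): returns n values of
# x[i+1] = (a * x[i] + c) mod m, with x[1] equal to the seed.
lcg <- function(n, seed, m, a, c) {
  x <- numeric(n)
  x[1] <- seed
  for (i in seq_len(n - 1)) {
    x[i + 1] <- (a * x[i] + c) %% m
  }
  x
}

# Same parameters as the TERRIBLE LCG above
x <- lcg(26, seed = 44, m = 26, a = 27, c = 3)

# Successive values always differ by 3 modulo 26
step <- diff(x) %% 26
all(step == 3)
```

With a constant step like this, each value is completely determined by the previous one in the simplest possible way, which is the structure the scatterplot makes visible.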
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.0 ──
✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.4
✔ tibble  1.4.1          ✔ dplyr   0.7.4
✔ tidyr   0.7.2          ✔ stringr 1.2.0
✔ readr   1.1.1          ✔ forcats 0.2.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(plotly)
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
v3 <- tibble(x = v2[1:(N-1)], y = v2[2:N])
head(v3)
plot_ly(v3, x = ~x, y = ~y)
No trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
No scatter mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
Note: This LCG has passed, and still passes, many tests for “randomness” and was suggested for use by the U.S. Bureau of Standards in 1964.
N <- 1000
Seed <- 762939453125
Mvalue <- 2^47
Cvalue <- (2^7)+1
Dvalue <- 29741096258473
v1 <- numeric(N)
v1[1] <- Seed
for (i in 2:N) {
  Next <- (Cvalue*Seed + Dvalue) %% Mvalue
  v1[i] <- Next
  Seed <- Next
}
v2 <- v1/Mvalue
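One caveat about the loop above: R stores these numbers as double-precision values, which represent integers exactly only up to 2^53. Here Cvalue*Seed can be as large as about 129 * 2^47, which is above 2^53, so some products are rounded before %% is applied. The effect on the plots is far too small to see, but the low-order bits of the sequence are then not exactly those of the true recursion. Below is a sketch of one way to keep the arithmetic exact, assuming the multiplier stays 2^7 + 1 (the variable names are mine).

```r
# Exact version of x[i] = (129 * x[i-1] + d) mod 2^47 using doubles.
# Assumption: the multiplier is 2^7 + 1, so 129 * x = 128 * x + x.
m <- 2^47
d <- 29741096258473
n <- 1000

x <- numeric(n)
x[1] <- 762939453125
for (i in 2:n) {
  # Multiplying by 128 only changes the exponent, so no digits are lost
  hi <- (128 * x[i - 1]) %% m
  # hi, x[i-1] and d are each below 2^47, so their sum is far below 2^53
  x[i] <- (hi + x[i - 1] + d) %% m
}

u <- x / m   # scaled to [0, 1), as v2 is above

# Sanity check: with an odd multiplier, an odd increment and an even modulus,
# the exact sequence must alternate odd, even, odd, ...
all(x %% 2 == rep(c(1, 0), length.out = n))
```

The alternating last bit is also a reminder that the low-order bits of a power-of-two-modulus LCG are far from random, even when the high-order bits look fine.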
View correlation in 2-dimensions. X-axis is v[i-1], Y-axis is v[i]. Expand this image to full screen to see the “structure.”
plot(v2[1:(N-1)], v2[2:N], main="Correlation of Pseudorandom Numbers from a Linear Congruential Generator")
v3 <- tibble(x = v2[1:(N-1)], y = v2[2:N])
head(v3)
plot_ly(v3, x = ~x, y = ~y)
No trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
No scatter mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
p <- plot_ly(v3, x = ~x, y = ~y)
subplot(
add_histogram2d(p) %>%
colorbar(title = "default") %>%
layout(xaxis = list(title = "default")),
add_histogram2d(p, zsmooth = "best") %>%
colorbar(title = "zsmooth") %>%
layout(xaxis = list(title = "zsmooth")),
add_histogram2d(p, nbinsx = 60, nbinsy = 60) %>%
colorbar(title = "nbins") %>%
layout(xaxis = list(title = "nbins")),
shareY = TRUE, titleX = TRUE
)
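A 2D histogram heatmap is, at heart, a table of counts over a grid of bins, with the counts mapped to color. The sketch below does that binning by hand in base R so the idea is visible. It uses uniform random numbers as a stand-in for v3, and the 10 by 10 grid is my own choice; plotly picks its own bins unless nbinsx and nbinsy are given, so the counts will not match its output exactly.

```r
set.seed(1)
x <- runif(1000)   # stand-in for v3$x
y <- runif(1000)   # stand-in for v3$y

# Ten equal-width bins on each axis
breaks <- seq(0, 1, length.out = 11)

# Cross-tabulate the bin membership of x and y: one cell per 2D bin
counts <- table(
  cut(x, breaks, include.lowest = TRUE),
  cut(y, breaks, include.lowest = TRUE)
)

dim(counts)   # one row and one column per bin
sum(counts)   # every point lands in exactly one cell
```

For a good generator the cell counts should look roughly even; visible stripes or empty cells are the "structure" these plots are meant to expose.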
View correlation in 3-dimensions. X-axis is v[i-2], Y-axis is v[i-1], Z-axis is v[i].
NOTE: You MUST install and load the R package rgl for this code to work. The 3d image can be rotated by clicking and dragging. The mouse wheel zooms. The “structure” can be seen by rotating the image until the empty planes can be seen. Again, full screen is best.
library(rgl)
plot3d(v2[1:(N-2)], v2[2:(N-1)], v2[3:N], main="Correlation of Pseudorandom Numbers from a Linear Congruential Generator")
v3 <- tibble(x = v2[1:(N-2)], y = v2[2:(N-1)], z = v2[3:N])
head(v3)
plot_ly(v3, x = ~x, y = ~y, z = ~z, size = I(1))
No trace type specified:
Based on info supplied, a 'scatter3d' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter3d
No scatter3d mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
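The three index ranges used above to build (v[i-2], v[i-1], v[i]) can also be produced in one step with embed() from the stats package. This is just an alternative sketch on a toy vector; note that embed() returns the newest value first, so the columns are reversed to match the axes used in these notes.

```r
v <- c(0.1, 0.2, 0.3, 0.4, 0.5)   # toy sequence standing in for v2

# embed(v, 3) gives rows (v[i], v[i-1], v[i-2]); reverse the columns
triples <- embed(v, 3)[, 3:1]
colnames(triples) <- c("x", "y", "z")
triples
```

Each row is one point in the 3D plot, and a vector of length N gives N - 2 such points.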
Here is some old code from Stat. 6501 Rice Chapter 3:
Code to graph and manipulate in 3 dimensions the bivariate normal distribution. Demonstration of the rgl package in R. April 1, 2009 - David Read - California State University, East Bay
Parameters for BVN, change as you see fit.
rho <-0.5
mux <- 0
muy <- 0
sigmax <- 1
sigmay <- 1
Graphing parameters:
deviations is the number of std. deviations included in the plot.
partitions^2 is the total number of dots in the plot; more = blacker.
You want to leave a lattice in areas of interest, so choose accordingly.
partitions <-80
deviations <- 4
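# length.out = partitions gives exactly partitions points per axis, so the
# plot has partitions^2 dots and the x and y vectors line up as a lattice
v1 <- rep(seq(mux - deviations*sigmax, mux + deviations*sigmax,
length.out = partitions), times = partitions)
v2 <- seq(muy - deviations*sigmay, muy + deviations*sigmay,
length.out = partitions)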
v3 <- rep(v2, each=partitions)
The rgl package must be installed and loaded for the following code to work!!! The resulting plot can be rotated by clicking and dragging; the mouse wheel zooms in and out. Full screen is best!
library(rgl)
# standardized coordinates, then the bivariate normal density at each grid point
zx <- (v1 - mux)/sigmax
zy <- (v3 - muy)/sigmay
v4 <- exp(-(zx^2 + zy^2 - 2*rho*zx*zy)/(2*(1 - rho^2)))/(2*pi*sigmax*sigmay*sqrt(1 - rho^2))
plot3d(v1, v3, v4, xlab="X", ylab="Y", zlab="f(x,y)", main="Bivariate Normal Distribution")
df <- tibble(x = v1, y = v3, z = v4)
head(df)
plot_ly(df, x = ~x, y = ~y, z = ~z, size = I(0.5))
No trace type specified:
Based on info supplied, a 'scatter3d' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter3d
No scatter3d mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
plot_ly(df, x = ~x, y = ~y, z = ~z, type = "contour")
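The height being plotted in this section is the usual bivariate normal density. As a cross-check on the long expression, here is a sketch that wraps the same formula in a function and evaluates it on a regular grid with outer(). The name dbvn and the 0.1 grid step are my own choices, not part of the original code.

```r
# Hypothetical helper: bivariate normal density with correlation rho
dbvn <- function(x, y, mux = 0, muy = 0, sigmax = 1, sigmay = 1, rho = 0) {
  zx <- (x - mux) / sigmax
  zy <- (y - muy) / sigmay
  q <- (zx^2 + zy^2 - 2 * rho * zx * zy) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sigmax * sigmay * sqrt(1 - rho^2))
}

# Regular grid covering 4 standard deviations in each direction
xs <- seq(-4, 4, by = 0.1)
ys <- seq(-4, 4, by = 0.1)
z <- outer(xs, ys, dbvn, rho = 0.5)   # z[i, j] = f(xs[i], ys[j])

# Checks: with rho = 0 the density factors into two univariate normals,
# and the grid sum times the cell area should be close to 1
all.equal(dbvn(1, -0.5), dnorm(1) * dnorm(-0.5))
sum(z) * 0.1 * 0.1
```

A matrix like z, with one row per x value and one column per y value, is also a convenient shape for surface and contour plots.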
Arranging multiple views.
Multiple time series plotted side by side on the same time scale, but not same vertical scale.
?economics
head(economics)
head(economics_long)
p1 <- plot_ly(economics, x = ~date, y = ~unemploy) %>%
add_lines(name = "unemploy")
p2 <- plot_ly(economics, x = ~date, y = ~uempmed) %>%
add_lines(name = "uempmed")
subplot(p1, p2)
Multiple Time Series plotted on the same time scale.
vars <- setdiff(names(economics), "date")
plots <- lapply(vars, function(var) {
plot_ly(economics, x = ~date, y = as.formula(paste0("~", var))) %>%
add_lines(name = var)
})
subplot(plots, nrows = length(plots), shareX = TRUE, titleX = FALSE)
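The as.formula(paste0("~", var)) idiom above is what lets one function build a plot for each column: plotly expects a one-sided formula such as ~pce, and here the formula is assembled from the column name as a string. A small sketch of just that step, using three of the economics column names and no plotting:

```r
vars <- c("pce", "pop", "psavert")

# Build one one-sided formula per column name
formulas <- lapply(vars, function(var) as.formula(paste0("~", var)))

formulas[[1]]             # the formula for the first column
all.vars(formulas[[2]])   # the variable name it refers to
```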
A plot I have been interested in for a long time. Scatterplot with marginal histograms. Very cool.
x <- rnorm(500)
y <- rnorm(500)
s <- subplot(
plot_ly(x = x, color = I("black")),
plotly_empty(),
plot_ly(x = x, y = y, color = I("black")),
plot_ly(y = y, color = I("black")),
nrows = 2, heights = c(0.2, 0.8), widths = c(0.8, 0.2),
shareX = TRUE, shareY = TRUE, titleX = FALSE, titleY = FALSE
)
No trace type specified:
Based on info supplied, a 'histogram' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#histogram
No trace type specified and no positional attributes specified
No trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
No scatter mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
No trace type specified:
Based on info supplied, a 'histogram' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#histogram
layout(s, showlegend = FALSE)
Here is an interesting plot. Note the violin plot. What do you think of it as an alternative to a boxplot?
e <- tidyr::gather(economics, variable, value, -date)
head(e)
gg1 <- ggplot(e, aes(date, value)) + geom_line() +
facet_wrap(~variable, scales = "free_y", ncol = 1)
gg2 <- ggplot(e, aes(factor(1), value)) + geom_violin() +
facet_wrap(~variable, scales = "free_y", ncol = 1) +
theme(axis.text = element_blank(), axis.ticks = element_blank())
subplot(gg1, gg2) %>% layout(margin = list(l = 50))
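One way to think about the violin versus boxplot question: a boxplot draws only five summary numbers, while a violin plot draws a kernel density estimate, so it can show more than one mode. The sketch below uses simulated bimodal data; the data and the peak-counting line are my own illustration, not something from the economics example.

```r
set.seed(42)
x <- c(rnorm(200, mean = -2), rnorm(200, mean = 2))   # two clear clusters

# What a boxplot is built from: min, lower hinge, median, upper hinge, max
five <- fivenum(x)
five

# What a violin plot is built from: a kernel density estimate
d <- density(x)

# Count local maxima of the estimated density
peaks <- sum(diff(sign(diff(d$y))) == -2)
peaks
```

The five numbers give no hint that the data come from two clusters, while the density estimate shows them as separate peaks. The price is that the violin's shape depends on the smoothing bandwidth.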
Linking views without shiny.
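The examples in this section lean on crosstalk's idea of a key: every row gets a key value, and selecting one mark selects every row that shares its key. Here is a rough base R illustration of that lookup. It is only meant to show the idea, not how crosstalk is implemented, and the small data frame is made up.

```r
housing <- data.frame(
  year   = c(2014, 2014, 2015, 2015, 2015),
  month  = c(1, 2, 1, 2, 3),
  median = c(150, 152, 160, 161, 163)   # made-up values
)

# Using year as the key, as in SharedData$new(txhousing, ~year)
key <- housing$year

# Suppose the user clicks the mark drawn from row 4
clicked_row <- 4
linked_rows <- which(key == key[clicked_row])
linked_rows   # all rows from the same year as the clicked one
```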
library(crosstalk)
sd <- SharedData$new(txhousing, ~year)
p <- ggplot(sd, aes(month, median)) +
geom_line(aes(group = year)) +
geom_smooth(data = txhousing, method = "gam") +
facet_wrap(~ city)
ggplotly(p, tooltip = "year") %>%
highlight(defaultValues = 2015, color = "red")
Removed 616 rows containing non-finite values (stat_smooth).
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_click'). You can change this default via the `highlight()` function.
The iris data plotted by pairs in base R.
pairs(iris)
Using the ipairs function in IDPmisc
library(IDPmisc)
ipairs(iris)
Now with GGally and plotly.
d <- SharedData$new(iris)
p <- GGally::ggpairs(d, aes(color = Species), columns = 1:4)
highlight(ggplotly(p), on = "plotly_selected")
Can only have one: highlight
Can only have one: highlight
Can only have one: highlight
Setting the `off` event (i.e., 'plotly_deselect') to match the `on` event (i.e., 'plotly_selected'). You can change this default via the `highlight()` function.
More time series data with transient versus persistent selection.
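The difference between the two modes comes down to what happens to the old selection when a new one arrives: transient selection replaces it, persistent selection adds to it. A toy sketch of that rule follows; update_selection is a made-up function for illustration, not part of plotly or crosstalk, and the city names are just labels.

```r
# Hypothetical helper: combine the current selection with a new one
update_selection <- function(current, new, persistent = FALSE) {
  if (persistent) union(current, new) else new
}

# Transient: hovering over Dallas forgets Austin
transient <- update_selection("Austin", "Dallas")

# Persistent: hovering over Dallas keeps Austin highlighted too
persistent <- update_selection("Austin", "Dallas", persistent = TRUE)

transient
persistent
```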
sd <- SharedData$new(txhousing, ~city)
p <- ggplot(sd, aes(date, median)) + geom_line()
gg <- ggplotly(p, tooltip = "city")
gg
# Persistent mode can still be enabled in this case by holding the
# shift key when hovering over lines
highlight(gg, on = "plotly_hover", dynamic = TRUE)
Adding more colors to the selection color palette
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_hover'). You can change this default via the `highlight()` function.
# Persistent mode can be set permanently like so
highlight(gg, on = "plotly_hover", dynamic = TRUE, persistent = TRUE)
Adding more colors to the selection color palette
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_hover'). You can change this default via the `highlight()` function.
For some reason this next code does not work as expected. Still trying to figure out why.
library(plotly)
# requires an experimental version of leaflet
# devtools::install_github("rstudio/leaflet#346")
library(leaflet)
sd <- SharedData$new(quakes)
# let plotly & leaflet know this is persistent selection
options(persistent = TRUE)
p <- plot_ly(sd, x = ~depth, y = ~mag) %>%
add_markers(alpha = 0.5) %>%
highlight("plotly_selected", dynamic = TRUE)
Adding more colors to the selection color palette
map <- leaflet(sd) %>%
addTiles() %>%
addCircles()
Assuming 'long' and 'lat' are longitude and latitude, respectively
bscols(widths = c(6, 6), p, map)
Setting the `off` event (i.e., 'plotly_deselect') to match the `on` event (i.e., 'plotly_selected'). You can change this default via the `highlight()` function.
Nice example of the highlight() function with a selectize dropdown.
# Group name is used to populate a title for the dropdown
sd <- SharedData$new(txhousing, ~city, group = "Choose a city")
plot_ly(sd, x = ~date, y = ~median) %>%
group_by(city) %>%
add_lines(text = ~city, hoverinfo = "text") %>%
highlight(on = "plotly_hover", persistent = TRUE, selectize = TRUE)
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_hover'). You can change this default via the `highlight()` function.
Nested selection.
# if you don't want to highlight individual points, you could specify
# `class` as the key variable here, instead of the default (rownames)
m <- SharedData$new(mpg)
p <- ggplot(m, aes(displ, hwy, colour = class)) +
geom_point() +
geom_smooth(se = FALSE, method = "lm")
ggplotly(p) %>% highlight("plotly_hover")
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_hover'). You can change this default via the `highlight()` function.
# for better tick labels
mtcars$am <- dplyr::recode(mtcars$am, `0` = "automatic", `1` = "manual")
# choose a model by AIC stepping backwards
mod <- step(lm(mpg ~ ., data = mtcars), trace = FALSE)
# produce diagnostic plots, coloring by automatic/manual
pm <- GGally::ggnostic(mod, mapping = aes(color = am))
# ggplotly() automatically adds rownames as a key if none is provided
ggplotly(pm) %>% highlight("plotly_click")
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_click'). You can change this default via the `highlight()` function.
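The model selection step in the diagnostic example can be inspected on its own, without any plotting. The sketch below uses a fresh copy of mtcars (the name cars is mine) so the recoded am column is not needed. With no scope argument, step() works backwards from the full model.

```r
cars <- datasets::mtcars

# Full model with every predictor, then backward stepwise selection by AIC
full <- lm(mpg ~ ., data = cars)
mod <- step(full, trace = FALSE)

formula(mod)                                  # predictors that survived
c(full = AIC(full), selected = AIC(mod))      # selected model should not be worse
```

The diagnostic plots are then drawn for mod, so it helps to know which terms are in it before reading them.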
m <- SharedData$new(mpg)
p1 <- ggplot(m, aes(displ, fill = class)) + geom_density()
p2 <- ggplot(m, aes(displ, hwy, fill = class)) + geom_point()
subplot(p1, p2) %>% highlight("plotly_click") %>% hide_legend()
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_click'). You can change this default via the `highlight()` function.
Animation!!!
data(gapminder, package = "gapminder")
gg <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
geom_point(aes(size = pop, frame = year, ids = country)) +
scale_x_log10()
Ignoring unknown aesthetics: frame, ids
ggplotly(gg)
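Conceptually, the frame aesthetic splits the data into one piece per distinct value (here, per year), and the animation steps through those pieces, while ids tells plotly which points are the same object from one frame to the next. Here is a base R sketch of the splitting step on a tiny made-up data frame. It illustrates the idea only and says nothing about plotly's internals.

```r
toy <- data.frame(
  country = rep(c("A", "B"), times = 3),
  year    = rep(c(1952, 1957, 1962), each = 2),
  lifeExp = c(40, 60, 42, 62, 45, 63)   # made-up values
)

# One data frame per frame value
frames <- split(toy, toy$year)

names(frames)                  # one entry per year
sapply(frames, nrow)           # each frame holds every country once
```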
g <- crosstalk::SharedData$new(gapminder, ~continent)
gg <- ggplot(g, aes(gdpPercap, lifeExp, color = continent, frame = year)) +
geom_point(aes(size = pop, ids = country)) +
geom_smooth(se = FALSE, method = "lm") +
scale_x_log10()
Ignoring unknown aesthetics: ids
ggplotly(gg) %>%
highlight("plotly_hover")
Setting the `off` event (i.e., 'plotly_doubleclick') to match the `on` event (i.e., 'plotly_hover'). You can change this default via the `highlight()` function.