manipulate package
You may download the script with all the code from this report, as well as a copy of this report, from this link: https://goo.gl/KSF9RZ.
What does the manipulate package do?
From the documentation:
“The manipulate function accepts a plotting expression and a set of controls (e.g. slider, picker, checkbox, or button) which are used to dynamically change values within the expression. When a value is changed using its corresponding control the expression is automatically re-executed and the plot is redrawn.”
Essentially, we can generate ‘interactive’ plots where we can change input variables via a GUI, similar to Shiny apps, but with minimal setup.
The manipulate package was created way back in 2015 by Joseph Allaire, who is the founder and current CEO of RStudio. The package never seemed to get really popular, presumably being overshadowed by flashier and more powerful interactive app packages like Shiny. However, in this report we make the case that the timeless simplicity of manipulate makes it still relevant, and something that you might even use for your next data analysis.
Here’s a preview of how to use manipulate (adapted from the manipulate documentation):
library(manipulate)

# Supply a single plotting expression, plus any number of input controls
# assigned to variables, as arguments to the manipulate function.
manipulate(
  plot(cars, xlim = c(x.min, x.max),
       type = plot_type,
       axes = axes,
       ann = label),
  # The expression above refers to the variables defined below.
  # The order of the control arguments does not matter.
  x.min = slider(0, 15),
  x.max = slider(15, 30, initial = 25),
  plot_type = picker("p", "l", "b"),
  axes = checkbox(TRUE, "Axes"),
  label = checkbox(TRUE, "Labels")
)
If we run the code from an R script in RStudio, a plot will be drawn in the ‘Plots’ panel, with a gear icon in the top left corner. Clicking the gear icon reveals a small grey panel with the input controls that we defined, which we can manipulate to redraw the plot.
As you can see, using manipulate is really simple; for most scenarios, we can achieve basic interactive functionality with very little code, just following this template.
The best way to learn how and when you can use manipulate effectively is to try it with examples.
Some key differences between Shiny apps and manipulate plots will become apparent in the examples. We will also summarize the general benefits and limitations of the manipulate package at the end of the report.
Note: manipulate only works within RStudio, since the controls are generated as part of the native RStudio GUI. Hence these examples must be run from an R script. If you run manipulate in an R Markdown chunk, you will simply see the plot image inline with no controls. If you try to knit an Rmd file containing a chunk with manipulate code, you will get an error (unless you set eval=FALSE for that chunk).
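Because manipulate depends on the RStudio GUI, a script that might also be run elsewhere can guard the call. The sketch below is one possible way to do this and is not taken from the manipulate documentation: `can_use_manipulate()` is a hypothetical helper, and the check on the `RSTUDIO` environment variable assumes that RStudio sets it to "1" in its own sessions.

```r
# Hypothetical helper: TRUE only when an interactive RStudio session with
# the manipulate package installed appears to be available.
can_use_manipulate <- function() {
  interactive() &&
    identical(Sys.getenv("RSTUDIO"), "1") &&  # assumption: set by RStudio
    requireNamespace("manipulate", quietly = TRUE)
}

if (can_use_manipulate()) {
  manipulate::manipulate(
    plot(cars, xlim = c(0, x.max)),
    x.max = manipulate::slider(15, 30, initial = 25)
  )
} else {
  # Fall back to a static plot with a fixed value.
  plot(cars, xlim = c(0, 25))
}
```

Outside RStudio the script then still produces a plot, just without the controls.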
attitude dataset
First, let us take a look at how we can use manipulate to understand our data in a quick and easy manner.
For this example, we use the inbuilt R dataset called attitude - The Chatterjee Price Attitude Data Set.
According to the R documentation, this data set includes aggregated data from a survey of approximately 35 clerical employees in each of the 30 randomly selected departments at a large financial organization. The numbers denote the percent proportion of favorable responses to 7 questions in each department.
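The 7 questions, with the variable names used in the dataset, are:
rating - overall rating
complaints - handling of employee complaints
privileges - does not allow special privileges
learning - opportunity to learn
raises - raises based on performance
critical - too critical
advance - advancement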
To show how the picker can be used, we make a scatterplot and trend line with ‘rating’ on the y-axis and another variable on the x-axis. We can do this using the ggplot2 functions ggplot(), geom_point() and geom_smooth().
We create 2 pickers, x.variable, which contains a list of all the different variables or in this case, questions mentioned above that can be plotted on the x-axis and method, which contains a list of all the smoothing methods that can be used in geom_smooth().
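The methods are:
lm - linear model
glm - generalized linear model
gam - generalized additive model
loess - local polynomial regression
rlm - robust linear model (rlm() comes from the MASS package)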
Here is the basic code.
library(manipulate)
library(ggplot2)
library(MASS)  # needed for the "rlm" smoothing option

# Note that aes_string() is used instead of aes().
# This is to allow input of a string.
manipulate(
  ggplot(attitude, aes_string(x = x.variable, y = "rating")) +
    geom_point() +
    geom_smooth(method = method),
  # Create 2 picker options:
  # 1. x.variable, which contains the names of all the categories
  #    that can be plotted on the x axis.
  # 2. method, which contains the different methods of smoothing
  #    that can be used to smooth the data.
  # x.variable and method can be altered using the pickers.
  x.variable = picker("rating", "complaints", "privileges",
                      "learning", "raises", "critical", "advance"),
  method = picker("lm", "glm", "gam", "loess", "rlm")
)
When we run the code above, we obtain the following output.
Since ‘rating’ has been plotted against ‘rating’, we have a perfectly straight best fit line along y=x.
If we were to change the x-axis to ‘learning’ instead of ‘rating’, what type of correlation do we expect between the 2 variables?
Generally, we would expect the availability of opportunities for employees to learn to be positively correlated with the overall rating, right? Now, let us change the variable on the x-axis from ‘rating’ to ‘learning’ and take a look at what the data tells us.
We obtain the following output.
As we can see, the data meets our expectation. By using the picker to switch between multiple categories on the x-axis, we can draw quick inferences about how each variable is correlated with the overall rating instead of having to edit our code over and over again.
Next, let us try to change our method of smoothing using the ‘method’ picker. When we change method from ‘lm’ to ‘loess’, we obtain the following output.
Clearly, this graph looks very different from our previous one. Instead of the straight line generated by the linear model option ‘lm’, we have a smooth curve generated by the local regression option, ‘loess’. Once again, it is much more convenient to change the method of smoothing using the picker than to alter the code each time.
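The example above relies on aes_string(), which recent ggplot2 releases flag as deprecated. As a hedged alternative, the same plot can be built with the `.data` pronoun inside aes(). `plot_rating()` is a hypothetical helper name introduced here for illustration, and the sketch assumes a ggplot2 version that supports `.data`.

```r
library(ggplot2)

# Hypothetical helper: scatterplot of `rating` against a column chosen by name,
# with a smoother whose method is also passed as a string.
plot_rating <- function(x_var, method = "lm") {
  ggplot(attitude, aes(x = .data[[x_var]], y = .data[["rating"]])) +
    geom_point() +
    geom_smooth(method = method) +
    labs(x = x_var, y = "rating")
}

# Build the plot for one combination of inputs without any GUI.
p <- plot_rating("learning", method = "loess")

# Inside RStudio, the same helper could be driven by pickers:
# manipulate(
#   plot_rating(x.variable, method),
#   x.variable = picker("complaints", "privileges", "learning",
#                       "raises", "critical", "advance"),
#   method = picker("lm", "loess")
# )
```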
mpg dataset
Our second example will illustrate two points:
We will demonstrate this using the mpg dataset, which ships with the ggplot2 package. This dataset gives us fuel economy information for popular car models from different manufacturers in the US, covering model years from 1999 to 2008.
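Looking at the dataset, we can see that there are many variables, such as manufacturer, model, displ (engine displacement), year, cyl (number of cylinders), trans (type of transmission), cty (city miles per gallon) and hwy (highway miles per gallon).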
How then do we quickly make sense of this dataset to understand fuel efficiency of different automobile brands in the US?
library(manipulate)
library(ggplot2)

# Begin manipulate function.
# Nest ggplot under the manipulate function.
manipulate(
  # Note that aes_string() is used instead of aes().
  # This is to allow input of a string.
  # We set aes_string(x, y) so that we can manipulate them later.
  ggplot(data = mpg, aes_string(x, y)) +
    # Adjust the colour of the boxplots to make them more visible.
    geom_boxplot(fill = "white", colour = "#3366FF") +
    theme(axis.text.x = element_text(angle = 90,
                                     vjust = 0.5,
                                     hjust = 1)),
  # Create 2 picker options for the x-variable and y-variable so that we can
  # adjust them later. We use as.list() to turn the column names of the mpg
  # dataset into a list that the picker function can use.
  x = picker(as.list(names(mpg))),
  y = picker(as.list(names(mpg)))
)
Let’s take a look at the code. Here, we use ggplot to produce boxplots for visualization, and we nest it within the ‘manipulate’ function. Unlike the previous example, we use y = picker(as.list(names(mpg))) to generate the options from all the column names rather than from literal string values. This is convenient as it saves us from having to type out every column name, i.e. y = picker("manufacturer", "model", "displ", "year", "cyl", "trans", ...).
One key point to note is that we must use as.list() to wrap the string vector when we use names(mpg). If we use picker(names(mpg)), we see that there is only one input option created, which is the entire string vector itself. After trial and error, we discovered that the input controls for ‘picker’ only take a list of arguments, not a vector. We cannot use list(names(mpg)) either because that returns a list with only one component - the string vector itself - which gives the same result as using picker(names(mpg)).
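The difference between the two wrappers is easy to check in plain R, without involving manipulate at all. The short sketch below uses a made-up character vector `cols` as a stand-in for names(mpg).

```r
# Stand-in for names(mpg); any character vector behaves the same way.
cols <- c("manufacturer", "model", "displ")

as_choices <- as.list(cols)  # one list element per column name
wrapped    <- list(cols)     # a single element holding the whole vector

length(as_choices)  # 3
length(wrapped)     # 1
```

as.list() yields one element per name, which is what a picker needs to offer one option per column, while list() keeps the vector together as a single element.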
After running our ‘manipulate’ widget, our initial plot shows a y = x function, because both picker values are set at the default ‘manufacturer’. So how can we use our widget to quickly analyse the data?
Let’s say I am an environmental studies major and I want to know which car manufacturers have the highest fuel efficiency. From our mpg dataset, we can analyse fuel efficiency of different car manufacturers by looking at the correlation between automobile manufacturers and variables like the number of miles it runs per gallon. Let’s take a look.
A common way to measure fuel efficiency is to look at miles travelled per gallon. Using our ‘manipulate’ widget, we let our x-picker remain at the manufacturer variable, and set our y-picker to cty (city miles per gallon). Any interesting observations?
There is much to be interpreted, but one can observe that Japanese car manufacturers like Honda, Nissan and Toyota tend to have higher median city miles per gallon than American car manufacturers like Ford, Jeep and Chevrolet.
Nothing against American cars, but this observation suggests that Japanese car manufacturers produce cars that have higher fuel efficiency for city driving. Closer statistical analysis of the dataset may surface more interesting observations about car manufacturers and fuel efficiency. Let’s take a look at another metric - highway miles per gallon.
Highway miles per gallon is another metric for measuring fuel efficiency. Because there is more continuous driving (fewer traffic stops) on highways than in cities, this data indicates how efficient different car brands are at purely continuous driving.
So, we let our x-picker remain at the manufacturer variable, and set our y-picker to hwy (highway miles per gallon). Again, we can observe correlations between the number of highway miles per gallon against the car manufacturer.
There appears to be a similar trend in fuel efficiency for highway driving as for city driving. One interesting observation: Pontiac brand cars’ fuel efficiency on highways appears to improve more than proportionately relative to their fuel efficiency in cities.
As seen, the ‘manipulate’ widget can help us to observe relationships between variables in a dataset very quickly.
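Visual impressions from boxplots can be checked numerically once the picker has pointed us at an interesting pair of variables. The sketch below is a hedged illustration: `median_by_group()` is a hypothetical helper, and the `toy` data frame contains made-up numbers purely to show the mechanics, not values from mpg.

```r
# Hypothetical helper: median of `value` for each level of `group`,
# sorted from highest to lowest median.
median_by_group <- function(df, group, value) {
  out <- aggregate(df[[value]], by = list(df[[group]]), FUN = median)
  names(out) <- c(group, paste0("median_", value))
  out[order(-out[[2]]), ]
}

# Made-up toy data, only to demonstrate the helper.
toy <- data.frame(
  manufacturer = c("a", "a", "b", "b", "b"),
  cty          = c(10, 14, 20, 22, 30)
)
res <- median_by_group(toy, "manufacturer", "cty")

# With ggplot2 loaded, the same call works on the real data:
# median_by_group(as.data.frame(ggplot2::mpg), "manufacturer", "cty")
```

Running the commented call on mpg gives the exact ranking of manufacturers, which is a useful check on any conclusion read off the boxplots.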
Okay, so now that we have seen how to use manipulate, let’s try an example that shows why manipulate is such a great package! To do this, we will directly contrast its ease of use with an exercise we did in class, where we used a Shiny app instead.
First, let us load the asean_tourism dataset into our RStudio. It contains the tourism numbers for a range of countries from the year 2004 to 2015. In Tutorial 21, we were asked to create a barplot of the tourism values for each country. We also created a dropdown menu with the different years as options. Overall, this allowed us to view the tourism values of any year we wanted.
This was the code we used. Complicated, right? Shiny has a lot of advantages, but while it is useful, its code tends to be complex. So how do we do something similar using manipulate?
First, let’s recall that picker controls are a feature of the manipulate package. So we type the following code:
# Prepare the data.
asean_tourism <- read.csv("data/asean_tourism.csv", stringsAsFactors = FALSE)
names(asean_tourism)[1] <- "country"

library(manipulate)
library(ggplot2)

manipulate(
  # Build the column name for the chosen year (e.g. "X2004") from the picker value.
  ggplot(asean_tourism, aes_string("country", paste0("X", year), fill = "country")) +
    geom_bar(stat = "identity") +
    theme(legend.position = "none",
          axis.text.x = element_text(angle = 90,
                                     vjust = 0.5,
                                     hjust = 1)),
  year = picker(as.list(2004:2015))
)
So in our code, what we did was assign a picker control to a variable, year, which selects the column to plot. In my opinion, this package is useful for data exploration. For example, if we toggle through the years, Brunei’s tourism numbers do not cross 500 at any point from 2004 to 2013. But from 2013 to 2014, we see a sudden spike in tourism to almost 2500! That’s a hike of more than 5 times! So, if you were doing, say, an economics capstone, you’d be interested to know why there was a hike. Were there any policy changes between 2013 and 2014? If so, is it worth analyzing? How significant is this spike in tourism numbers?
See! This is why manipulate is so handy: it allows for preliminary data exploration so you can decide which variables and data are easy to work with. Of course, once I know which variables I want to work with, it’s best to use a Shiny app to publish my results. But until then, this is a user-friendly tool for you to work with.
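The "X" prefix in the plotting code deserves a short explanation. Assuming the CSV header stores the years as bare numbers (which the prefix suggests, although the file itself is not shown here), read.csv() with its default check.names = TRUE passes the headers through make.names(), which turns them into syntactically valid R names. The sketch below shows that the names built with paste0() match what make.names() produces.

```r
years <- 2004:2015

# Column names as constructed in the manipulate expression.
col_names <- paste0("X", years)

# make.names() turns headers such as "2004" into valid R names.
identical(make.names(as.character(years)), col_names)
```

If the file were read with check.names = FALSE instead, the columns would keep their original headers and the pasted names would no longer match.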
Let’s look at the shinyApp code we used in class again:
The direct benefit is that our manipulate code (which you have probably typed up by now if you were following along) is a lot less complicated.
manipulate is a single function call, whereas a Shiny app has two parts: its ui and server specifications.
But also, on a functional level, Shiny is more concerned with layout. Our Shiny code creates a beautifully spaced barplot with a nice drop-down menu. manipulate, on the other hand, is just for the data analyst’s use. There’s no help text, no title panel, etc. This makes it convenient in the short run, but a lot less aesthetic.
In a Shiny app, the UI refers by name to outputs generated on the server side, and the server side assigns each output to that name. Worrying about this mapping can get tedious and, in my humble R experience, leads to several errors that require debugging. Then again, Shiny is designed that way because we may have several outputs that need referring to, so the added complexity has its benefits.
All in all though, this package is convenient and effective for getting the job done when it comes to sorting data according to its varying categories - and we hope we’ve proved its use!
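To make the name mapping between UI and server concrete, here is a minimal Shiny sketch. It is not the code from class, which is not reproduced in this report. It uses the built-in attitude data instead of the tourism dataset, and the ids "x_var" and "scatter" are arbitrary names chosen for illustration.

```r
library(shiny)

ui <- fluidPage(
  selectInput("x_var", "X variable:", choices = names(attitude)),
  plotOutput("scatter")            # refers to output$scatter by name
)

server <- function(input, output) {
  output$scatter <- renderPlot({   # must match the id used in the UI
    plot(attitude[[input$x_var]], attitude$rating,
         xlab = input$x_var, ylab = "rating")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # launch the app in an interactive session
```

A typo in either "scatter" or "x_var" breaks the link between the two halves, which is the kind of error that the single manipulate call avoids.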
manipulate
So we’ve seen how we can explore a given dataset with manipulate, but so far we’ve only done minimal preprocessing of the data. What if we want to parameterize our data processing? That is, how can we make more than one line of code react to changes in our input controls, as is done with Shiny apps?
We’ll describe two simple approaches, which we will explore using exactly the same code in the template Shiny app, which uses the Old Faithful Geyser data.
head(faithful)
##   eruptions waiting
## 1     3.600      79
## 2     1.800      54
## 3     3.333      74
## 4     2.283      62
## 5     4.533      85
## 6     2.883      55
The sample Shiny app simply allows us to change the number of bins in the histogram. Here is the part of the code which processes the data and renders the output:
output$distPlot <- renderPlot({
  # generate bins based on input$bins from ui.R
  x <- faithful[, 2]
  bins <- seq(min(x), max(x), length.out = input$bins + 1)
  # draw the histogram with the specified number of bins
  hist(x, breaks = bins, col = 'darkgray', border = 'white')
})
And here is what the Shiny app looks like:
So, the first, straightforward approach to replicating this in manipulate is to simply place all the lines of code (including the plotting expression) within curly braces, just like how it is done in the Shiny app code. This is good if you only have a few lines of code.
manipulate(
  {
    x <- faithful[, 2]
    bins <- seq(min(x), max(x), length.out = numBins + 1)
    hist(x, breaks = bins, col = 'darkgray', border = 'white')
  },
  numBins = slider(min = 1, max = 50, initial = 30, label = "Number of bins:")
)
And again this gets you the same basic functionality with one and a half extra lines of code.
Note: We do realize that the three lines of code in the curly braces could have been combined into one line. But often this isn’t possible or sane; we just picked this example for the sake of simplicity.
Note: We actually discovered this ourselves. This style was not used in any of the documentation examples, nor in any of the very few articles online that used manipulate. All the examples we’ve seen were single-line plotting expressions, with preprocessing occurring independently outside of the manipulate function.
The second approach would be to abstract the relevant plotting code into a separate function defined outside of manipulate. This function takes the input control variables as parameters, runs the processing steps, and draws the plot. A call to this function, with the input control variables as arguments, is then passed to the manipulate function.
f <- function(numBins) {
  x <- faithful[, 2]
  bins <- seq(min(x), max(x), length.out = numBins + 1)
  hist(x, breaks = bins, col = 'darkgray', border = 'white')
}

manipulate(
  f(numBins),
  numBins = slider(min = 1, max = 50, initial = 30, label = "Number of bins:")
)
And it works exactly the same as the curly braces approach.
This approach has several benefits:
Cleaner code - separates plotting logic from UI implementation, similar to how a Shiny app’s server and UI code are separate. This is important when your code is long and you don’t want to wrap the entire thing in a manipulate call.
Easier debugging - if the plot isn’t working, simply call the function with literal values and check whether the problem is in the logic or in the UI.
Reduces code duplication, if we are planning to call manipulate multiple times for whatever reason. Or if we want our script to generate both a manipulate plot as well as a final, static plot afterwards.
Note: We also came up with this ourselves.
It is up to you to judge which approach is more appropriate depending on the dataset. Even though these are very simple tips, hopefully it makes working with manipulate even easier.
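The debugging benefit can be taken one step further by splitting the processing step from the drawing step, so that the processing can be checked with literal values and no plot at all. The sketch below is one way to do this. `compute_bins()` and `plot_waiting()` are hypothetical names introduced here, not part of manipulate or of the Shiny template.

```r
# Processing step: numBins equal-width bins need numBins + 1 break points.
compute_bins <- function(x, numBins) {
  seq(min(x), max(x), length.out = numBins + 1)
}

# Drawing step: this is the function that would be handed to manipulate().
plot_waiting <- function(numBins) {
  x <- faithful[, 2]
  hist(x, breaks = compute_bins(x, numBins), col = "darkgray", border = "white")
}

# Check the logic with a literal value, without any GUI or plot.
breaks <- compute_bins(faithful[, 2], 30)
h <- hist(faithful[, 2], breaks = breaks, plot = FALSE)
length(breaks)    # 31 break points for 30 bins
length(h$counts)  # 30 bins
```

If these numbers look right, any remaining problem is more likely in the control definitions than in the plotting logic.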
The inspiration and data sets for this example come from this website. The chief focus of the article is transforming the data to produce a more pleasing representation in ggplot2. Let’s see if we can do a better (or quicker) job with manipulate.
First we make a preliminary graph of both of the data sets.
library(ggplot2)
library(manipulate)
library(RColorBrewer)

chromium <- read.csv("data/chromium.csv", stringsAsFactors = FALSE)
ggplot(chromium, aes(air, bm, colour = welding.type)) +
  geom_point() +
  geom_smooth(aes(group = ""), method = "lm", colour = "white")

nickel <- read.csv("data/nickel.csv", stringsAsFactors = FALSE)
ggplot(nickel, aes(air, bm, colour = welding.type)) +
  geom_point() +
  geom_smooth(aes(group = ""), method = "lm", colour = "white")
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
They look rather… displeasing. The data is very sparse farther from the origin, and while ggplot2 chooses more pleasant colours than base graphics, they are hardly ideal.
We know we can find all kinds of handy transformations in ggplot2, but who has time to try them all out? RColorBrewer palettes are beautiful, but their codes are difficult to memorize, and the outcome of applying them is impossible to visualize in advance.
What about R’s built in colours? You know you need a kind of blue, but who knows if they called it "powderblue" or "babyblue". Besides, who can be bothered to read through the entire output of colors() every few seconds? Not you!
With manipulate you simply paste all the options from the help files or use helper functions that list them for you to try them out on the spot! You can search through the options by simply typing the first few letters.
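For comparison, this is what the manual lookup looks like without a picker. The short sketch below uses only base R: colors() returns the built-in colour names, and grep() filters them. The variable name `blues` is just an illustrative choice.

```r
# All built-in colour names that contain "blue".
blues <- grep("blue", colors(), value = TRUE)
head(blues)

# Check whether a particular name exists before using it.
"powderblue" %in% colors()  # TRUE
"babyblue" %in% colors()    # FALSE
```

The same character vector, wrapped in as.list(), is what gets handed to picker() so the search happens in the GUI instead.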
In addition, we are taking the easy way out and using code blocks! Granted, we lose some of the functionality of passing a graphing function to manipulate, but we gain at least some of it back by using picker() to select the data we’re plotting.
library(ggplot2)
library(manipulate)
library(RColorBrewer)  # provides brewer.pal() and brewer.pal.info

manipulate({
    p <- ggplot(data, aes(air, bm, colour = welding.type)) + geom_point()
    p + geom_smooth(aes(group = ""), method = "lm", colour = lm) +
      scale_x_continuous(trans = xScale) +
      scale_y_continuous(trans = yScale) +
      scale_colour_manual(values = brewer.pal(4, scater))
  },
  # Pick appropriate transformation.
  yScale = picker("asn", "atanh", "boxcox",
                  "exp", "identity", "log",
                  "log10", "log1p", "log2", "logit",
                  "probability", "probit", "reciprocal",
                  "reverse", "sqrt",
                  label = "Y Scale Transformation"),
  xScale = picker("asn", "atanh", "boxcox",
                  "exp", "identity", "log",
                  "log10", "log1p", "log2", "logit",
                  "probability", "probit", "reciprocal",
                  "reverse", "sqrt",
                  label = "X Scale Transformation"),
  # Switch between datasets.
  data = picker("Chromium" = chromium, "Nickel" = nickel, label = "Datasets"),
  # Pick appropriate point colour palette (searchable).
  scater = picker(as.list(rownames(brewer.pal.info)),
                  label = "Palette"),
  # Pick appropriate regression colour (searchable).
  # Use as.list() as picker either takes a list or individual arguments.
  # It would interpret a vector as a single option which it would not know
  # how to handle.
  lm = picker(as.list(colours()),
              label = "Regression")
)
Whoops, the graph is all garbled, and there are all kinds of warnings! This is because picker() defaults to its first option (here "asn") unless it is supplied with an initial argument. And remember, not all transformations are possible on all data (e.g. log-based transformations on negative values).
Let’s fix that.
library(ggplot2)
library(manipulate)
library(RColorBrewer)  # provides brewer.pal() and brewer.pal.info

manipulate({
    p <- ggplot(data, aes(air, bm, colour = welding.type)) + geom_point()
    p + geom_smooth(aes(group = ""), method = "lm", colour = lm) +
      scale_x_continuous(trans = xScale) +
      scale_y_continuous(trans = yScale) +
      scale_colour_manual(values = brewer.pal(4, scater))
  },
  # Pick appropriate transformation, starting from the identity.
  yScale = picker("asn", "atanh", "boxcox",
                  "exp", "identity", "log",
                  "log10", "log1p", "log2", "logit",
                  "probability", "probit", "reciprocal",
                  "reverse", "sqrt", initial = "identity",
                  label = "Y Scale Transformation"),
  xScale = picker("asn", "atanh", "boxcox",
                  "exp", "identity", "log",
                  "log10", "log1p", "log2", "logit",
                  "probability", "probit", "reciprocal",
                  "reverse", "sqrt", initial = "identity",
                  label = "X Scale Transformation"),
  # Switch between datasets.
  data = picker("Chromium" = chromium, "Nickel" = nickel, label = "Datasets"),
  # Pick appropriate point colour palette (searchable).
  scater = picker(as.list(rownames(brewer.pal.info)),
                  label = "Palette"),
  # Pick appropriate regression colour (searchable).
  # Use as.list() as picker either takes a list or individual arguments.
  # It would interpret a vector as a single option which it would not know
  # how to handle.
  lm = picker(as.list(colours()),
              label = "Regression")
)
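As a side check on the earlier remark that not every transformation suits every dataset, the effect of a log-based transformation on non-positive values can be seen in base R. `ok_for_log()` is a hypothetical helper and `vals` is a made-up vector used only for illustration.

```r
# Hypothetical helper: TRUE when every non-missing value is strictly positive,
# so that log-type transformations give finite results.
ok_for_log <- function(x) all(x > 0, na.rm = TRUE)

vals <- c(-1, 0, 10)             # made-up values
suppressWarnings(log10(vals))    # NaN, -Inf, 1
ok_for_log(vals)                 # FALSE
ok_for_log(c(0.5, 10, 200))      # TRUE
```

Non-finite results like these are what ggplot2 drops with a warning, so a quick check of this kind on the air and bm columns can help explain why some picker choices produce warnings.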
Now let’s do some picking.
"Set2" and "hotpink1" seem rather attractive, with "Set2" being a slightly more tame version of ggplot2’s defaults and "hotpink1" simply being fabulous. As for transformations, "log10" seems to work best, and is also rather easy to interpret.
Your plot is now ready to be finalized and beautified just how you like it, with all the trial and error out of the way and only labels to be changed.
For reference, this is the code used on the example website simply to try out two of the countless transformations in ggplot2. One might as well try them out one by one.
chromium <- read.csv("chromium.csv")
nickel <- read.csv("nickel.csv")
p <- ggplot(chromium, aes(air, bm)) +
geom_point()
win_ctrls <- gwindow("Plot controls 1-4")
grp_ctrls <- ggroup(container = win_ctrls, horizontal = FALSE)
#1 Changing scales
available_scales <- c(
Linear = "identity",
Log = "log10"
)
frm_scale_trans_y <- gframe(
"Y scale transformation",
container = grp_ctrls,
expand = TRUE
)
rad_scale_trans_y <- gradio(
names(available_scales),
container = frm_scale_trans_y,
handler = function(h, ...)
{
scale_trans_y <- available_scales[svalue(h$obj)]
p <<- p +
scale_y_continuous(
trans = scale_trans_y
)
print(p)
}
)
frm_scale_trans_x <- gframe(
"X scale transformation",
container = grp_ctrls,
expand = TRUE
)
rad_scale_trans_x <- gradio(
names(available_scales),
container = frm_scale_trans_x,
handler = function(h, ...)
{
scale_trans_x <- available_scales[svalue(h$obj)]
p <<- p +
scale_x_continuous(
trans = scale_trans_x
)
print(p)
}
)
How is manipulate different from Shiny apps?
manipulate functions generate a GUI control panel within RStudio that allows you to automatically redraw plots.
manipulate functions combine the ‘server’ and ‘user interface’ together.
manipulate?