Packages are bundles of functions, data, and documentation. Anyone can develop an R package, and packages are free to share. Some R packages are simply a bundle of a few useful functions, and some are essentially comprehensive software that runs on the R platform. To date, there are more than 10,000 packages distributed on the official CRAN (Comprehensive R Archive Network) website. This means that for most problems you are likely to encounter, there is probably an R package that can help you solve it.
All packages distributed through CRAN follow a common convention and have an associated manual, which makes them easier to use. Developers can also distribute packages independently, and you can install those in a few simple steps.
The first step in using a package is to install it on your computer. This simply means that the package will be available to load.
Note: You will still have to load the package in each session to use it. See Section 7.1.3.
You can install packages very easily. The simplest way is to run the function install.packages() at the R command line.
Try installing these packages:
install.packages('RColorBrewer') # a package for color palettes
install.packages('nlme') # a package for fitting mixed-effects models
install.packages('tidyr') # a package for tidying dataframes
install.packages('dplyr') # another package for simplifying complex tasks with dataframes
install.packages('ggplot2') # a popular package for graphics
install.packages('igraph') # a package for network analysis
install.packages('leaflet') # a package to generate embedded maps in html documents
Alternatively, you can go to Tools -> Install Packages... in RStudio.
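The separate install.packages() calls above can also be combined, since install.packages() accepts a character vector of package names. The following is a minimal sketch that works out which of the packages are not yet installed. The names wanted, already.installed and to.install are made up for illustration, and the install line is left commented out so that running the snippet does not trigger any downloads.

```r
# Packages used in this chapter
wanted <- c("RColorBrewer", "nlme", "tidyr", "dplyr", "ggplot2", "igraph", "leaflet")

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names
already.installed <- rownames(installed.packages())

# Keep only the packages that are not installed yet
to.install <- setdiff(wanted, already.installed)
print(to.install)

# Uncomment to install everything that is missing in a single call
# if (length(to.install) > 0) install.packages(to.install)
```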
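Now that you have installed packages, you can load them in your R session using library(). Some people use require(), which does essentially the same thing; the main difference is that library() stops with an error if the package is not installed, whereas require() returns FALSE with a warning (if you really want to get into the details, I refer you to this webpage).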
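As a rough illustration of how library() and require() behave when a package is missing, the sketch below tries both on a package name that is deliberately made up, so that neither call can succeed. The names fake.pkg, loaded and result are only for illustration.

```r
# "notARealPackage123" is a hypothetical name, assumed not to be installed
fake.pkg <- "notARealPackage123"

# require() returns FALSE (and issues a warning) when the package cannot be loaded
loaded <- suppressWarnings(require(fake.pkg, character.only = TRUE))
print(loaded)

# library() stops with an error instead; tryCatch() lets us record that
result <- tryCatch(
  {
    library(fake.pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
print(result)
```

Because require() returns a logical value, it is sometimes used inside if() statements, whereas library() is the safer choice at the top of a script because a missing package stops the script immediately.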
As an example, let’s load the igraph package and look at its help file:
library(igraph)
library(help='igraph')
You can see that this package has a ton of different functions that carry out specialized tasks related to network analysis. You will also notice that the header material contains a lot of information, including a URL that acts as the package’s homepage: http://igraph.org
Just for show-and-tell, here is an example network generated from igraph (this is an example of a random network generated by what is called a preferential attachment process).
#create a random graph using preferential attachment
set.seed(5)
g=sample_pa(50, directed=FALSE)
fg=cluster_fast_greedy(g) #fastgreedy.community() is the older name for this function
plot(g, vertex.label="", vertex.size=10, vertex.color=membership(fg), edge.color="black")
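In a preferential attachment process, each new node tends to link to nodes that already have many links, so a few nodes end up as well-connected hubs while most have only one or two connections. As a quick check on a graph generated the same way as above, the sketch below tabulates how many nodes have each number of connections using degree() from igraph. The object name deg is just for illustration.

```r
library(igraph)

# Regenerate the same random graph as in the example above
set.seed(5)
g <- sample_pa(50, directed = FALSE)

# degree() returns the number of connections of each node
deg <- degree(g)

# How many nodes have 1, 2, 3, ... connections?
table(deg)

# The most connected node (the biggest "hub")
max(deg)
```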
In this section, I will go through a few examples of popular and useful R packages. These are fairly general-purpose packages that simplify complex tasks or generate pretty graphics. Many other packages are aimed at specific types of analyses, and we may go through examples of those later.
RColorBrewer
Don’t you sometimes get frustrated trying to figure out the optimal set of colors for your figures, for example to make them colorblind-friendly, to maximize contrast, or to create a nice gradient of colors? RColorBrewer is a really useful package for generating color palettes that are based on principles of cartography. This package basically takes the information available from the “colorbrewer” website http://colorbrewer2.org/ and makes it accessible in R.
Let’s load the package and use the display.brewer.all() function to see all of the possible palettes available.
library(RColorBrewer)
display.brewer.all()
The codes on the sides of the palettes are the names of the palettes. You can use the brewer.pal() function to select a palette and the number of colors you want from that palette. See ?brewer.pal for much more info.
Here are a couple of examples of how to use brewer.pal() to create colors for your plots.
colors=brewer.pal(8, "Set2")
plot(1:8, 1:8, pch=19, col=colors, cex=8, ylim=c(0,9), xlim=c(0,9))
colors=brewer.pal(8, "YlOrRd")
plot(1:8, 1:8, pch=19, col=colors, cex=8, ylim=c(0,9), xlim=c(0,9))
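If colorblind-friendliness is a priority, RColorBrewer ships a lookup table called brewer.pal.info that flags which palettes qualify, and display.brewer.all() accepts a colorblindFriendly argument. The sketch below also shows one possible way to get more colors than a palette provides, by interpolating with colorRampPalette() from base R. The object names cb.palettes, ramp and many.colors are illustrative.

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette, with the columns
# maxcolors, category and colorblind
cb.palettes <- brewer.pal.info[brewer.pal.info$colorblind, ]
head(cb.palettes)

# Display only the colorblind-friendly palettes
display.brewer.all(colorblindFriendly = TRUE)

# "YlOrRd" offers at most 9 colors; interpolate between them to get 20
ramp <- colorRampPalette(brewer.pal(9, "YlOrRd"))
many.colors <- ramp(20)
plot(1:20, rep(1, 20), pch = 19, col = many.colors, cex = 3)
```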
dplyr
dplyr helps you deal with complex manipulations of dataframes. The package introduces functions for summarizing, grouping, and otherwise manipulating data that are more intuitive than the functions available in base R. It also allows you to use ‘pipes’ as implemented through another package called magrittr. A ‘pipe’ (written with the symbol %>%) passes the value on its left to the function on its right as the first argument, so you can carry a value forward without assigning it to a new object. This lets you combine a series of tasks in a single sequence.
Let’s try it out to create a table of means and standard errors of petal length for three species of Iris in the iris dataset.
library(dplyr)
se=function(x) sqrt(var(x)/length(x))
iris.summary=iris %>%
  group_by(Species) %>%
  summarise(mean.petal=mean(Petal.Length), se.petal=se(Petal.Length))
iris.summary
## # A tibble: 3 × 3
## Species mean.petal se.petal
## <fctr> <dbl> <dbl>
## 1 setosa 1.462 0.02455980
## 2 versicolor 4.260 0.06645545
## 3 virginica 5.552 0.07804970
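To see what the pipe is doing, the same summary can be written without it by nesting the function calls. This sketch redefines the se() helper from above so that it runs on its own, and then compares the two versions. The object names piped and nested are just for illustration.

```r
library(dplyr)

# Standard error of the mean, as defined above
se <- function(x) sqrt(var(x) / length(x))

# Piped version: read from top to bottom
piped <- iris %>%
  group_by(Species) %>%
  summarise(mean.petal = mean(Petal.Length), se.petal = se(Petal.Length))

# Nested version: read from the inside out
nested <- summarise(
  group_by(iris, Species),
  mean.petal = mean(Petal.Length),
  se.petal = se(Petal.Length)
)

# Both approaches give the same table
all.equal(piped, nested)
```

The nested form gets harder to read with every additional step, which is the main reason many people prefer pipes for longer sequences.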
ggplot2
ggplot2 is a very popular and useful package for creating graphics. It was created by the same person who created dplyr (Hadley Wickham, who is a giant in the R community).
The benefit of ggplot2 is that it can make really pretty plots by default. The big cost is the additional learning curve of understanding how to put a plot together and tinker with it. Here is the ggplot2 tutorial website: http://docs.ggplot2.org/current/. You can also get the book.
Let’s compare the task of making a bar plot with error bars in base R vs. ggplot2. We will create a bar plot with mean \(\pm\) standard error for petal length in the three species of irises (using the iris.summary object created in dplyr above). We’ll make it pretty by generating colors from RColorBrewer.
#create a color palette from RColorBrewer
new.palette=brewer.pal(3, "Spectral")
#note: assigning the barplot to an object (called 'bp' here) allows you to get the x-axis values of the centers of the bars.
bp=barplot(iris.summary$mean.petal, col=new.palette,
           ylim=c(0, max(iris.summary$mean.petal)+max(iris.summary$se.petal)*2),
           names.arg=iris.summary$Species, xlab="Species", ylab="Petal Length", las=1)
arrows(bp, iris.summary$mean.petal+iris.summary$se.petal,
       bp, iris.summary$mean.petal-iris.summary$se.petal,
       code=3, angle=90, length=0.3, lwd=2)
Code for creating a similar graph in ggplot2 might look like this:
library(ggplot2)
p = ggplot(iris.summary, aes(x=Species, y = mean.petal, fill=Species))
p = p + geom_bar(stat="identity")
p = p + geom_errorbar(aes(ymin=mean.petal-se.petal, ymax=mean.petal+se.petal), width=0.2)
p = p + scale_fill_manual(values = new.palette)
p
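Because ggplot2 builds plots by adding layers with +, the steps above can also be chained into one expression. The sketch below is one possible variation, not the only way to do it: it rebuilds the summary table with base R so that it runs on its own, uses geom_col() (a shorthand for geom_bar(stat="identity")), and picks the palette with scale_fill_brewer() instead of passing colors manually. The axis labels and the theme are my own choices for illustration.

```r
library(ggplot2)

# Rebuild the summary table with base R so this example is self-contained
se <- function(x) sqrt(var(x) / length(x))
iris.summary <- data.frame(
  Species = levels(iris$Species),
  mean.petal = as.numeric(tapply(iris$Petal.Length, iris$Species, mean)),
  se.petal = as.numeric(tapply(iris$Petal.Length, iris$Species, se))
)

# Each "+" adds one layer or setting to the plot
p <- ggplot(iris.summary, aes(x = Species, y = mean.petal, fill = Species)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean.petal - se.petal, ymax = mean.petal + se.petal), width = 0.2) +
  scale_fill_brewer(palette = "Spectral") +
  labs(x = "Species", y = "Petal Length") +
  theme_classic()
p
```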
I think it all comes down to preference. Some people like the convenience of a pretty default and perks like an automatic legend and axes. Personally, I prefer plotting with base R graphics because I find them plenty powerful and I am much more familiar with the syntax and options.
leaflet
This is a random one that I just recently came across. It allows you to embed an interactive map in your HTML R Markdown documents. I’m just including it here because I think it’s super cool. This will NOT work when rendering to PDF.
library(leaflet)
m=leaflet()
m=addTiles(m)
m=addMarkers(m, lat=40.818766, lng=-96.705075, popup="You are here")
m
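Each leaflet function takes the map as its first argument and returns the modified map, so the same steps can also be written with the %>% pipe introduced in the dplyr section (leaflet makes the pipe available when it is loaded). A sketch of the equivalent piped version:

```r
library(leaflet)

# Start an empty map, add the default background tiles, then add one marker
m <- leaflet() %>%
  addTiles() %>%
  addMarkers(lat = 40.818766, lng = -96.705075, popup = "You are here")
m
```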