Hello, Mr. Coppock. Here I have recreated a plot of GDP per capita over time that you used in lecture this week, using the ggplot2 package in R. I’ll go through the steps in order to give you an idea of how things work and what is possible.

First, here is the plot.

The Maddison-Project, http://www.ggdc.net/maddison/maddison-project/home.htm, 2013 version.


Now I’ll work through the process of producing that. I’ll assume you’re using R with RStudio (you’ll need to install both), and have an idea of how they work.

The first thing is to install the necessary packages. We will need ggplot2, reshape2, directlabels, and scales. You can check to see if you already have them by trying to load them with library(ggplot2), and so forth with the other packages.

For those that you don’t have, you only want to install them once, so run the appropriate install.packages() commands in the console, not in a script.

install.packages("ggplot2")
install.packages("reshape2")
install.packages("directlabels")
install.packages("scales")

Note that install.packages() needs quotation marks around the package name, while library() works without them.
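If you'd rather not check the packages one at a time, here is a small sketch that lists which of the four are missing. The names needed and missing_pkgs are just ones I made up for this example, and the install line is commented out so nothing gets installed by accident.

```r
# Packages this walkthrough uses.
needed <- c("ggplot2", "reshape2", "directlabels", "scales")

# requireNamespace() returns TRUE if a package is installed,
# without attaching it the way library() does.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of packages that are not installed yet.
missing_pkgs <- needed[!installed]
missing_pkgs

# Uncomment to install only what is missing:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

If missing_pkgs comes back empty, you already have everything.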


The next thing is to set your working directory to the location of the data, and read the data in as an object. I downloaded a .xlsx from the Maddison project and manually deleted the first two rows (they were extraneous) and put “Year” in A1, and saved it as a csv. Change the path and name to your own.

setwd("~/Dropbox/School/Year 4 Spring/Macroecon/")
gdp <- read.csv("gdp_per_capita.csv")

<- is the operator that assigns names to objects. You can also use = if you like.

The gdp dataframe should have appeared in your global environment pane. You can click the name to take a look at it.

Now we’ll want to take a subset of the gdp dataframe with only data from the columns and years we want in our plot. We’ll make a vector of the column names, and a logical vector marking the rows to keep.

desired_columns <- c("Year","Poland","Mexico","Ukraine","India","Nicaragua","Somalia")
desired_rows <- gdp$Year >= 1950

The c() function “combines” the arguments you provide into a vector. We need quotes around each name because they are character strings, not objects in our environment.

The dollar sign in gdp$Year is a way to call a column of a dataframe. It’s always dataframe$column. The statement gdp$Year >= 1950 produces a vector of FALSEs for each year before 1950, and TRUEs for the rest.
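To see what that comparison produces, here is a tiny example. The toy dataframe and its numbers are made up for illustration; they are not Maddison Project values.

```r
# Made-up toy data, not real GDP figures.
toy <- data.frame(Year = c(1948, 1949, 1950, 1951),
                  Poland = c(10, 20, 30, 40))

toy$Year                  # the Year column as a plain vector

keep <- toy$Year >= 1950  # one TRUE/FALSE per row
keep                      # FALSE FALSE  TRUE  TRUE

sum(keep)                 # TRUE counts as 1, so this is the number of rows kept: 2
```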

Now we’ll use bracket notation to define a new object as a subset of the dataframe.

subset.gdp <- gdp[desired_rows,desired_columns]

Brackets call a subset of a dataframe. You specify the rows, then the columns. Here we’ve said we want rows for 1950 and beyond, and the columns whose names match the strings in the desired_columns vector. You can also specify row and column numbers. dataframe[3,5] would return the value in the third row and fifth column.
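Here is a short sketch of both styles of bracket indexing. Again, toy and its columns A and B are made up for the example.

```r
# Made-up toy data, not real GDP figures.
toy <- data.frame(Year = 1948:1951,
                  A = c(10, 20, 30, 40),
                  B = c(1, 2, 3, 4))

# Rows chosen by a logical vector, columns chosen by name.
by_name <- toy[toy$Year >= 1950, c("Year", "B")]
by_name

# A single cell chosen by position: third row, second column.
by_position <- toy[3, 2]
by_position  # 30

# Leaving the row slot empty keeps every row.
toy[, "A"]   # 10 20 30 40
```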

In some cases, we’d be ready to plot now. But in this case there is one more step. The philosophy of ggplot2 revolves around mapping “aesthetics” to variables. Here we will want to map x to the year, y to the GDP value, and color to the nation. So we need a dataframe with three columns: Year, Nation, and GDP.

The function melt() in the package reshape2 will do this for us. It will change the data from “wide form” to “long form”. We’ll tell it that the Year column holds id-variables, and the other columns hold measured variables. We’ll get back a long dataframe with the columns Year, variable (nation), and value (GDP).

library(reshape2)
long.subset.gdp <- melt(subset.gdp, id.vars="Year")

Now our data is in the right form for plotting.
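If the wide-to-long change is hard to picture, this small example shows it on a made-up dataframe with two "nations", A and B. The names wide and long are just for this sketch.

```r
library(reshape2)

# Made-up wide data: one row per year, one column per nation.
wide <- data.frame(Year = c(1950, 1951),
                   A = c(10, 20),
                   B = c(1, 2))

# Year stays as an identifier; every other column is stacked.
long <- melt(wide, id.vars = "Year")
long
#   Year variable value
# 1 1950        A    10
# 2 1951        A    20
# 3 1950        B     1
# 4 1951        B     2
```

Each nation column has become a block of rows, with its name in variable and its numbers in value.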

The function ggplot() initializes a plot with given data and aesthetics. We’ll assign it to a variable, and add elements and modifications to it with other functions after the + operator. geom_line() will add lines to our plot.

library(ggplot2)
figure1 <- ggplot(data=long.subset.gdp, aes(x=Year, y=value, color=variable)) +
    geom_line()

We can display our basic plot by calling it by name. Don’t worry about the warning about missing values. It is a warning, not an error, and the plot is still drawn.

figure1
## Warning: Removed 27 rows containing missing values (geom_path).
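If you're curious where a warning like that comes from, you can count the missing values yourself. This sketch uses a made-up long dataframe with one NA in it. I'm assuming the removed rows in the real data are NAs in the value column, which is what the warning suggests.

```r
# Made-up long data with one missing GDP value.
long <- data.frame(Year = c(1950, 1951, 1950, 1951),
                   variable = c("A", "A", "B", "B"),
                   value = c(10, 20, NA, 2))

# How many values are missing?
sum(is.na(long$value))     # 1

# Which rows are they?
long[is.na(long$value), ]

# A copy with the missing rows dropped, if you want to avoid the warning.
complete <- long[!is.na(long$value), ]
nrow(complete)             # 3
```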


Now we’ll want to change the axis labels and add a title.

figure1 <- ggplot(data=long.subset.gdp, aes(x=Year, y=value, color=variable)) +
    geom_line() +
    scale_y_continuous("Constant Dollars") +
    ggtitle("Real GDP per capita")
figure1
## Warning: Removed 27 rows containing missing values (geom_path).



We can change the size and color of the text using ggplot2’s theme() function.

figure1 <- ggplot(data=long.subset.gdp, aes(x=Year, y=value, color=variable)) +
    geom_line() +
    scale_y_continuous("Constant Dollars") +
    ggtitle("Real GDP per capita") +
    theme(text=element_text(color="grey50")) +
    theme(axis.title=element_text(size=15)) +
    theme(plot.title=element_text(size=20, color="steelblue"))
figure1
## Warning: Removed 27 rows containing missing values (geom_path).
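As a side note, the three theme() calls can also be written as one call and saved for reuse. This is a sketch on a made-up dataframe. my_theme, toy, and p are names I've invented here.

```r
library(ggplot2)

# One theme() call holding all three text settings.
my_theme <- theme(text = element_text(color = "grey50"),
                  axis.title = element_text(size = 15),
                  plot.title = element_text(size = 20, color = "steelblue"))

# Made-up data, just to have something to draw.
toy <- data.frame(x = 1:3, y = c(2, 4, 8))

# A saved theme is added to a plot with + like any other element.
p <- ggplot(toy, aes(x = x, y = y)) +
    geom_line() +
    ggtitle("Toy plot") +
    my_theme
print(p)
```

The same my_theme object can then be added to any other plot in the script.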



Now we may want to change to a white background with theme_bw() and label the x-axis every 10 years with the breaks argument in scale_x_continuous(). We’ll feed it a sequence from 1950 to 2010 with 10 year steps using seq(1950, 2010, 10).

figure1 <- ggplot(data=long.subset.gdp, aes(x=Year, y=value, color=variable)) +
    geom_line() +
    theme_bw() +
    scale_y_continuous("Constant Dollars") +
    ggtitle("Real GDP per capita") +
    scale_x_continuous(breaks=seq(1950,2010,10)) +
    theme(text=element_text(color="grey50")) +
    theme(axis.title=element_text(size=15)) +
    theme(plot.title=element_text(size=20, color="steelblue"))
figure1
## Warning: Removed 27 rows containing missing values (geom_path).
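You can run seq() on its own in the console to check what breaks you're passing. The name decade_breaks is just for this example.

```r
# Start, end, step.
decade_breaks <- seq(1950, 2010, 10)
decade_breaks              # 1950 1960 1970 1980 1990 2000 2010

# Naming the step argument does the same thing and reads more clearly.
seq(1950, 2010, by = 10)

# seq() never goes past the end value, so this also stops at 2010.
seq(1950, 2015, by = 10)
```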



I would prefer a comma in the labels of the y-axis. We can do that with labels=comma, but we need the scales package.

library(scales)
figure1 <- ggplot(data=long.subset.gdp, aes(x=Year, y=value, color=variable)) +
    geom_line() +
    theme_bw() +
    scale_y_continuous("Constant Dollars", labels=comma) +
    ggtitle("Real GDP per capita") +
    scale_x_continuous(breaks=seq(1950,2010,10)) +
    theme(text=element_text(color="grey50")) +
    theme(axis.title=element_text(size=15)) +
    theme(plot.title=element_text(size=20, color="steelblue"))
figure1
## Warning: Removed 27 rows containing missing values (geom_path).
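comma is an ordinary function, so you can try it outside of a plot to see what it does to the numbers. The values here are made up.

```r
library(scales)

# comma() turns numbers into text with thousands separators.
labelled <- comma(c(500, 2500, 25000))
labelled                       # "500"    "2,500"  "25,000"

# Base R can do something similar without the scales package.
format(25000, big.mark = ",")  # "25,000"
```

When you write labels=comma, ggplot2 calls this function on the axis breaks for you.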


Finally, we’d like to label the lines directly with the name of the country, and get rid of the legend. ggplot2 does not do that by itself, but we can feed the ggplot2 object to direct.label() in the package directlabels, and it will do it for us. We’ll need a little more space for the labels, so we’ll extend the plot out to year 2015 with limits=c(1950,2015) in scale_x_continuous().

library(directlabels)
figure1 <- ggplot(data=long.subset.gdp, aes(x=Year, y=value, color=variable)) +
    geom_line() +
    theme_bw() +
    scale_y_continuous("Constant Dollars", labels=comma) +
    ggtitle("Real GDP per capita") +
    scale_x_continuous(breaks=seq(1950,2010,10), limits=c(1950,2015)) +
    theme(text=element_text(color="grey50")) +
    theme(axis.title=element_text(size=15)) +
    theme(plot.title=element_text(size=20, color="steelblue"))
direct.label(figure1, method="last.points")
## Warning: Removed 27 rows containing missing values (geom_path).



Graphics made this way are reproducible. You can plug new data into scripts that are already made. You also have a record of what’s been done to the data.
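One way to make the "plug in new data" idea concrete is to wrap the steps in a function. This is only a sketch. plot_gdp() is a hypothetical helper I'm making up here, not part of any package, and it assumes the data has a Year column and one column per country, like the CSV above. The toy data is invented.

```r
library(ggplot2)
library(reshape2)
library(scales)

# Hypothetical helper: subset, melt, and plot in one step.
plot_gdp <- function(gdp, countries, start_year = 1950) {
    # Keep the chosen years and countries.
    wide <- gdp[gdp$Year >= start_year, c("Year", countries)]
    # Reshape from wide form to long form.
    long <- melt(wide, id.vars = "Year")
    # Build the same kind of line plot as in the walkthrough.
    ggplot(long, aes(x = Year, y = value, color = variable)) +
        geom_line() +
        theme_bw() +
        scale_y_continuous("Constant Dollars", labels = comma) +
        ggtitle("Real GDP per capita")
}

# Made-up data standing in for a new CSV.
toy_gdp <- data.frame(Year = 1948:1953,
                      A = c(900, 950, 1000, 1100, 1250, 1300),
                      B = c(400, 420, 450, 440, 470, 500))

p <- plot_gdp(toy_gdp, c("A", "B"))
print(p)
```

With a function like this, a new plot is one line: read the new file, then call plot_gdp() with the countries you want.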


You can also do some exotic things. Here are a couple of things that I’ve made in the past.

require(ggplot2)
require(scales)

abund <- read.csv("~/Dropbox/School/PaceLab/Chaoborus Phenology/master_abund.csv")
abund.fract <- ggplot(data = abund, aes(x = DOY)) +
    facet_grid(Lake ~ Year, scales = "free_y") + theme_bw() +
    geom_line(aes(y = fractionFI))
abund.fract

salaries <- read.csv("~/Dropbox/School/Year 4 Fall/D2K/project/sal_by_dept.csv")
violin <- ggplot(data=salaries, aes(y=Salary, x=GDept, fill=GDept)) +
  geom_violin(scale="area", alpha=0.5) +
  scale_y_continuous("Salary (USD)", labels=comma, breaks=(0:7)*50000) +
  scale_x_discrete("Department") +
  theme(legend.position="none") +
  ggtitle("Density Estimates of 2014 UVa Faculty Salaries by Department")
violin


When you aren’t sure how to use a function, you can see its documentation by running ?ggplot or help(ggplot) in the console. This works for any function; give its name without parentheses. After that, a web search will usually find answers.

Here are a few resources for learning more about plotting with ggplot2 in R.


This document was generated using R Markdown.