We want to compare the performance of the US presidents beginning with Ronald Reagan and ending with Barack Obama using actual economic data from official sources. We will use data from the Federal Reserve Economic Data (FRED) system.
The main FRED page is at https://fred.stlouisfed.org/.
There are five variables from FRED to be used in our analysis. Each of these can be thought of as a performance measure in the area of macroeconomics.
CPIAUCSL: The Consumer Price Index for All Urban Consumers: All Items, Index 1982-1984=100, Quarterly, Seasonally Adjusted. For our purposes, we will use the annual growth rate of this index, which is commonly referred to as the rate of inflation.
A191RL1Q225SBEA: Real GDP, Percent Change from Preceding Period, Seasonally Adjusted Annual Rate.
PAYEMS: All Employees: Total Nonfarm Payrolls, Thousands of Persons, Quarterly, Seasonally Adjusted. We will use the first difference of this series, which is the quarterly counterpart of the “new jobs” figure reported each month. Because each quarterly change spans three months, the numbers here are about three times as large as the monthly numbers.
UNRATE: The civilian unemployment rate, defined as the fraction of the labor force that is unemployed. Percent, Seasonally Adjusted.
LNS11300060: Civilian Labor Force Participation Rate: 25 to 54 years, Percent, Quarterly, Seasonally Adjusted. This differs from the number that is usually reported in that it is based on the population between the ages of 25 and 54, the prime working years. The usual number is based on the civilian population aged 16 and over, and its meaning has changed since 1980 because of the aging of the baby boom generation.
The variables, in the form in which we want to use them, are all in the CSV file indicators.csv.
Most of the import is easy, but handling the date raises a question. Before doing the import, google for “r date format.” Examine the data with the usual tools after you import it.
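As a hedged sketch of the date handling, the example below writes a tiny made-up file and reads it back. The column names (date, president, unrate), the values, and the month/day/year layout are all assumptions for illustration; check how the dates in indicators.csv are actually written and adjust the format string to match.

```r
# Write a tiny made-up CSV so the example runs on its own.
# Column names, values, and the m/d/Y layout are assumptions, not the real file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,president,unrate",
             "4/1/1981,Reagan,7",
             "7/1/1981,Reagan,8"), tmp)

ind <- read.csv(tmp, stringsAsFactors = FALSE)

# read.csv() leaves the date as text; as.Date() converts it.
# %m = month number, %d = day of month, %Y = four-digit year.
ind$date <- as.Date(ind$date, format = "%m/%d/%Y")

# The usual tools for a first look at the result.
str(ind)
head(ind)
summary(ind)
```

If the dates are already written as year-month-day, as.Date() should parse them without a format argument.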
How do we want to visualize this data? Can we work with the data as is, or do we need to restructure it?
Produce a time-series graph of one performance measure for one president.
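One possible approach, sketched on toy data: subset the rows for a single president and draw a line plot of one measure against the date. The data frame below is a made-up stand-in for the imported file; its column names, president labels, and values are assumptions.

```r
library(ggplot2)

# Toy stand-in for indicators.csv: column names and values are made up.
ind <- data.frame(
  date      = as.Date(paste0(c(1981, 1985, 1989, 1991, 1993,
                               1997, 2001, 2005, 2009, 2013), "-01-01")),
  president = rep(c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"), each = 2),
  unrate    = c(7, 8, 5, 7, 7, 5, 4, 5, 8, 7)
)

# Keep only one president's rows, then plot one measure over time.
reagan <- subset(ind, president == "Reagan")

p <- ggplot(reagan, aes(x = date, y = unrate)) +
  geom_line()
p
```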
Produce a time series plot of the data for each variable. Map the presidents’ names to color.
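A minimal sketch for one variable, again on made-up data with assumed column names: mapping the president to color inside aes() gives each president a separate, differently colored line segment. The same call can be repeated with each of the other measures on the y-axis.

```r
library(ggplot2)

# Toy stand-in for indicators.csv: column names and values are made up.
ind <- data.frame(
  date      = as.Date(paste0(c(1981, 1985, 1989, 1991, 1993,
                               1997, 2001, 2005, 2009, 2013), "-01-01")),
  president = rep(c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"), each = 2),
  unrate    = c(7, 8, 5, 7, 7, 5, 4, 5, 8, 7)
)

# color = president both colors the lines and splits them into one group per president.
p <- ggplot(ind, aes(x = date, y = unrate, color = president)) +
  geom_line()
p
```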
We need to create a narrow version of our dataframe in which the indicator names are the contents of a variable rather than variable names. The name of the variable will be “indicator” and the values will be in a variable called “value.” Call the resulting dataframe “indg.”
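One way to do the reshaping is tidyr's pivot_longer(); this is an assumption about tooling, and the older gather() function does the same job. The wide data frame below is a toy with made-up column names and values, standing in for the imported file.

```r
library(tidyr)

# Toy wide data: one column per indicator (names and values are made up).
ind <- data.frame(
  date      = as.Date(c("1981-01-01", "1985-01-01", "2009-01-01", "2013-01-01")),
  president = c("Reagan", "Reagan", "Obama", "Obama"),
  gdp       = c(2, 4, 0, 2),
  unrate    = c(7, 8, 8, 7)
)

# Every column except date and president is an indicator.
# Their names go into "indicator" and their contents into "value".
indg <- pivot_longer(ind,
                     cols      = -c(date, president),
                     names_to  = "indicator",
                     values_to = "value")
indg
```

Each original row becomes one row per indicator, so the narrow dataframe has as many rows as the wide one multiplied by the number of indicator columns.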
Create a basic scatterplot of value against time. Use facet_grid() to put indicators in rows and presidents’ names in columns.
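A sketch of the basic grid, using a toy narrow dataframe built by hand (the president labels, indicator names, and values are made up). In the facet_grid() formula the variable on the left of ~ defines the rows and the variable on the right defines the columns.

```r
library(ggplot2)

# Toy narrow data standing in for indg (all names and values are made up).
dates <- as.Date(paste0(c(1981, 1985, 1989, 1991, 1993,
                          1997, 2001, 2005, 2009, 2013), "-01-01"))
pres  <- rep(c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"), each = 2)
indg <- data.frame(
  date      = rep(dates, times = 2),
  president = rep(pres, times = 2),
  indicator = rep(c("gdp", "unrate"), each = 10),
  value     = c(2, 4, 3, 1, 3, 4, 2, 1, 0, 2,
                7, 8, 5, 7, 7, 5, 4, 5, 8, 7)
)

# Rows are indicators, columns are presidents.
p <- ggplot(indg, aes(x = date, y = value)) +
  geom_point() +
  facet_grid(indicator ~ president)
p
```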
What problems do we see in this graph?
Note that every cell in the grid has the same scale. The value and date axes are all able to accommodate every possible value, so the points in each cell are squeezed into a small part of it. Fix this.
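One possible fix is the scales argument of facet_grid(). The sketch below uses the same kind of hand-built toy data (all names and values made up). With scales = "free", each row of the grid gets its own y-axis range and each column its own x-axis range.

```r
library(ggplot2)

# Toy narrow data standing in for indg (all names and values are made up).
dates <- as.Date(paste0(c(1981, 1985, 1989, 1991, 1993,
                          1997, 2001, 2005, 2009, 2013), "-01-01"))
pres  <- rep(c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"), each = 2)
indg <- data.frame(
  date      = rep(dates, times = 2),
  president = rep(pres, times = 2),
  indicator = rep(c("gdp", "unrate"), each = 10),
  value     = c(2, 4, 3, 1, 3, 4, 2, 1, 0, 2,
                7, 8, 5, 7, 7, 5, 4, 5, 8, 7)
)

# scales = "free" lets the y range vary by row and the x range vary by column.
p <- ggplot(indg, aes(x = date, y = value)) +
  geom_point() +
  facet_grid(indicator ~ president, scales = "free")
p
```

Using "free_y" or "free_x" instead frees only one of the two axes.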
The next problem is that the names of the presidents are in alphabetical order. Use the levels argument of factor() to put the presidents’ names in chronological order.
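A small illustration of the levels argument, using made-up president labels (the spellings in the actual file may differ, and the levels must match them exactly, or the unmatched values become NA):

```r
# Made-up labels; the real file may spell the names differently.
president <- c("Reagan", "Bush41", "Clinton", "Bush43", "Obama", "Reagan")

# Default: levels are sorted alphabetically.
levels(factor(president))

# Explicit levels put the names in chronological order.
president <- factor(president,
                    levels = c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"))
levels(president)
```

ggplot2 lays out facets and legend entries in level order, so the columns of the grid should follow this order once the variable in the dataframe is converted in the same way.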
The names of the indicator variables are somewhat cryptic. These should be replaced with reader-friendly labels. Use the labels argument of the factor() function to make these. Remember that the order of levels is established by the sorting order of the original values. This order must be followed in assigning labels.
The indicator labels are easier to understand, but still hard to read. Investigate the use of theme() options to change the angle of the y-axis strip text.
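The ordering rule can be checked on a short made-up vector. The indicator names and the label text here are illustrative assumptions, not the ones in the file:

```r
# Made-up indicator names, deliberately not in sorted order.
indicator <- c("unrate", "gdp", "unrate", "gdp")

# The levels come out sorted: "gdp" first, then "unrate".
levels(factor(indicator))

# Labels are matched to the levels by position, so they must be
# listed in that same sorted order.
f <- factor(indicator,
            labels = c("Real GDP growth (%)", "Unemployment rate (%)"))
f
```

Supplying levels and labels together, in matching order, is a way to avoid depending on the sort order.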
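As a starting point for the investigation, the sketch below sets the strip.text.y theme element on a toy plot. The data, the labels, and the choice of angle are all assumptions for illustration.

```r
library(ggplot2)

# Toy narrow data standing in for indg (all names and values are made up).
dates <- as.Date(paste0(c(1981, 1985, 1989, 1991, 1993,
                          1997, 2001, 2005, 2009, 2013), "-01-01"))
pres  <- rep(c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"), each = 2)
indg <- data.frame(
  date      = rep(dates, times = 2),
  president = rep(pres, times = 2),
  indicator = rep(c("Real GDP growth (%)", "Unemployment rate (%)"), each = 10),
  value     = c(2, 4, 3, 1, 3, 4, 2, 1, 0, 2,
                7, 8, 5, 7, 7, 5, 4, 5, 8, 7)
)

# strip.text.y controls the row strips; angle = 0 makes the text horizontal.
p <- ggplot(indg, aes(x = date, y = value)) +
  geom_point() +
  facet_grid(indicator ~ president, scales = "free") +
  theme(strip.text.y = element_text(angle = 0))
p
```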
Enhance the appearance of the graph by mapping col to the name of the president. The default colors are a bit wimpy, so pick something better using scale_color_brewer.
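A sketch of the color mapping on toy data (names and values made up). The palette "Set1" is just one choice among the ColorBrewer palettes; RColorBrewer::display.brewer.all() shows the others.

```r
library(ggplot2)

# Toy narrow data standing in for indg (all names and values are made up).
dates <- as.Date(paste0(c(1981, 1985, 1989, 1991, 1993,
                          1997, 2001, 2005, 2009, 2013), "-01-01"))
pres  <- rep(c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"), each = 2)
indg <- data.frame(
  date      = rep(dates, times = 2),
  president = rep(pres, times = 2),
  indicator = rep(c("gdp", "unrate"), each = 10),
  value     = c(2, 4, 3, 1, 3, 4, 2, 1, 0, 2,
                7, 8, 5, 7, 7, 5, 4, 5, 8, 7)
)

# col, color, and colour are interchangeable names for the same aesthetic.
p <- ggplot(indg, aes(x = date, y = value, col = president)) +
  geom_point() +
  facet_grid(indicator ~ president, scales = "free") +
  scale_color_brewer(palette = "Set1")
p
```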
Investigate the theme() options for dealing with the legend.
Play with the parameters to get something you like. Also change the angle of the years on the x-axis to avoid the overprinting. Finally, save the result to both a PDF file and a JPG file.
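Two legend options worth trying are sketched here on toy data (names and values made up). Which one is better is a judgment call: the column strips already name the presidents, so dropping the legend is one reasonable choice.

```r
library(ggplot2)

# Toy narrow data standing in for indg (all names and values are made up).
dates <- as.Date(paste0(c(1981, 1985, 1989, 1991, 1993,
                          1997, 2001, 2005, 2009, 2013), "-01-01"))
pres  <- rep(c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"), each = 2)
indg <- data.frame(
  date      = rep(dates, times = 2),
  president = rep(pres, times = 2),
  indicator = rep(c("gdp", "unrate"), each = 10),
  value     = c(2, 4, 3, 1, 3, 4, 2, 1, 0, 2,
                7, 8, 5, 7, 7, 5, 4, 5, 8, 7)
)

base <- ggplot(indg, aes(x = date, y = value, col = president)) +
  geom_point() +
  facet_grid(indicator ~ president, scales = "free")

# Option 1: move the legend below the plot and drop its title.
p_bottom <- base + theme(legend.position = "bottom",
                         legend.title    = element_blank())

# Option 2: remove the legend entirely.
p_none <- base + theme(legend.position = "none")

p_bottom
p_none
```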
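A minimal sketch of the last two steps, on toy data with made-up names and values. The file names, the plot size, and the 90-degree angle are placeholders to adjust; the files are written to a temporary directory here only so that the example does not clutter the working directory.

```r
library(ggplot2)

# Toy narrow data standing in for indg (all names and values are made up).
dates <- as.Date(paste0(c(1981, 1985, 1989, 1991, 1993,
                          1997, 2001, 2005, 2009, 2013), "-01-01"))
pres  <- rep(c("Reagan", "Bush41", "Clinton", "Bush43", "Obama"), each = 2)
indg <- data.frame(
  date      = rep(dates, times = 2),
  president = rep(pres, times = 2),
  indicator = rep(c("gdp", "unrate"), each = 10),
  value     = c(2, 4, 3, 1, 3, 4, 2, 1, 0, 2,
                7, 8, 5, 7, 7, 5, 4, 5, 8, 7)
)

# Rotate the x-axis labels so the years do not overprint.
p <- ggplot(indg, aes(x = date, y = value, col = president)) +
  geom_point() +
  facet_grid(indicator ~ president, scales = "free") +
  theme(axis.text.x     = element_text(angle = 90, vjust = 0.5),
        legend.position = "none")

# ggsave() picks the output device from the file extension.
pdf_file <- file.path(tempdir(), "indicators.pdf")
jpg_file <- file.path(tempdir(), "indicators.jpg")
ggsave(pdf_file, plot = p, width = 10, height = 6)
ggsave(jpg_file, plot = p, width = 10, height = 6)
```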