For this project, you will examine the unemployment rate and its relation to the job vacancy rate. The unemployment rate is the percentage of the actively-engaged, adult population (has a job or looking for one) who do not have a job. The job vacancy rate is the number of job openings relative to the size of the actively-engaged, adult population.
The data for this project is available in the wooldridge
package. If you are not using RStudio cloud, be sure to install the
package by clicking “Tools > Install Packages” (only need to do
once). Once installed, load the data.
data("beveridge", package="wooldridge")
The important variables in the data set are:
month - date of the observation in “Year-Month-Day” format
urate - unemployment rate
vrate - job vacancy rate
t - the observation number, ordered by time
You will be using the ggplot library to plot the relationships.
library(ggplot2)
See 3.3 and 3.4 for ggplot examples.
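Before plotting, it can help to confirm that `urate` and `vrate` are columns inside the `beveridge` data frame rather than data sets of their own. A minimal sketch, assuming the wooldridge package is installed and that the column names match the variable list above:

```r
# Load the beveridge data frame from the wooldridge package
data("beveridge", package = "wooldridge")

# List the column names; month, urate, vrate and t should be among them
names(beveridge)

# Inspect the type of each column and the first few values
str(beveridge)

# Summary statistics for the two rates used in this project
summary(beveridge[, c("urate", "vrate")])
```

The rates are then reached as `beveridge$urate` and `beveridge$vrate`, or by passing `beveridge` as the data argument to `ggplot()`.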
Start by looking at how the data is distributed. Create two histograms: one for the unemployment rate and one for the job vacancy rate. Each histogram should have 20 bins rather than the default of 30. Change the labels on the x axis and add a title to the plot.
Discuss what you see.
## Distribution of Unemployment
# load the library (?)
library(ggplot2)
#I can't find the data for wooldridge or urate or vrate anywhere? I have been looking for the past hour and can't get anything figured out.
# this was me trying to find the data
data("urate")
## Warning in data("urate"): data set 'urate' not found
# I already installed wooldridge and looked through all of the options there, but none of that data is related to the population or what I am trying to find.
# I don't know how I will be able to complete this assignment and there's only about an hour left until it is due.
#deleted R Studio, redownloaded everything, trying again.
# histogram for unemployment rate
UnemploymentHistogram <- ggplot(beveridge, aes(x = urate)) +
geom_histogram(binwidth = (max(beveridge$urate) - min(beveridge$urate))/20, fill = "blue", color = "black") +
labs(title = "Distribution of Unemployment Rate",
x = "Unemployment Rate",
y = "Frequency")
plot(UnemploymentHistogram)
# histogram for job vacancy rate
JobVacancyHistogram <- ggplot(beveridge, aes(x = vrate)) +
geom_histogram(binwidth = (max(beveridge$vrate) - min(beveridge$vrate))/20, fill = "green", color = "black") +
labs(title = "Distribution of Job Vacancy Rate",
x = "Job Vacancy Rate",
y = "Frequency")
plot(JobVacancyHistogram)
# What I see on the histograms:
# Discussion below
What I see on the histograms:
Unemployment rate: there are only 19 columns even though I asked for 20, with blank spaces between columns 11 and 12 and between columns 12 and 13. It almost looks like two separate graphs: a cluster of tall columns next to each other on the left, then a stretch of lower or empty columns, then a second cluster of tall columns toward the far right. The tallest columns on the left reach just above 15, and those on the right reach just below 10.
Job vacancy rate: 20 columns, with a blank space between the first 18 and the last 2. The highest peak is at around 2.6 on the x axis and reaches just under 25 on the y axis. A second peak at around 3.25 reaches just under 20 on the y axis. The shape loosely resembles a bell curve, but only in that the two peaks sit toward the middle and the counts taper off at both ends.
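One possible reason the bar count differs from 20: by default ggplot2 does not start the first bin at the smallest data value, so a `binwidth` of one twentieth of the range does not guarantee exactly 20 bins, and bins that contain no observations show up as gaps. A hedged alternative sketch that passes the bin count directly through the `bins` argument of `geom_histogram()` (the object name `urate_hist_20` is only illustrative):

```r
library(ggplot2)
data("beveridge", package = "wooldridge")

# Ask for 20 bins directly instead of deriving a bin width from the range
urate_hist_20 <- ggplot(beveridge, aes(x = urate)) +
  geom_histogram(bins = 20, fill = "blue", color = "black") +
  labs(title = "Distribution of Unemployment Rate",
       x = "Unemployment Rate",
       y = "Frequency")

print(urate_hist_20)

# Number of bins ggplot2 actually computed (empty bins are counted too)
nrow(layer_data(urate_hist_20))
```

The same call with `x = vrate` would give the job vacancy histogram. Empty bins can still appear as gaps, since they reflect the data rather than the bin settings.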
Plot the unemployment rates and job vacancy rates across time (month) using line plots. Note that the time variable should be on the x axis. Change the x and y labels and add a title to the plot.
Discuss what is happening to each variable across time. Do the two series move together?
## Time Series Plots
# create line plot for unemployment rate
UnemploymentLinePlot <- ggplot(beveridge, aes(x = month, y = urate)) + geom_line(color = "blue") +
labs(title = "Unemployment Rates Across Time",
x = "Month",
y = "Unemployment Rate")
plot(UnemploymentLinePlot)
# Create line plot for job vacancy rates
JobVacancyLinePlot <- ggplot(beveridge, aes(x = month, y = vrate)) + geom_line(color = "green") +
labs(title = "Job Vacancy Rates Across Time",
x = "Month",
y = "Job Vacancy Rate")
plot(JobVacancyLinePlot)
Line Plots Discussion:
- Unemployment rate: it starts at around 4 before 2002, rises a little above 6, then dips to slightly above 4 around 2007. It then peaks in 2010 at around 11 and declines slightly and unevenly afterward.
- Job vacancy rate: it starts well above 4.0, drops sharply to about 2.5 in 2003, peaks again in 2007-2008 at just above 3.5, dips below 2.0 right before 2010, then rises slightly to above 2.5 through 2012.
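To judge whether the two series move together, one option is to draw them in a single panel and compute their correlation. The sketch below assumes `month` is stored in a form ggplot2 can place on a date axis, as the line plots above suggest; the object name `both_series` and the legend labels are illustrative choices:

```r
library(ggplot2)
data("beveridge", package = "wooldridge")

# Draw both rates against time; mapping a constant string to color creates a legend
both_series <- ggplot(beveridge, aes(x = month)) +
  geom_line(aes(y = urate, color = "Unemployment rate")) +
  geom_line(aes(y = vrate, color = "Job vacancy rate")) +
  labs(title = "Unemployment and Job Vacancy Rates Across Time",
       x = "Month",
       y = "Rate",
       color = "Series")

print(both_series)

# Sample correlation between the two rates over the whole period
cor(beveridge$urate, beveridge$vrate)
```

A negative correlation would be consistent with the two series tending to move in opposite directions.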
After plotting the two variables across time, let us look at their relationship. This relationship describes how “easy” it is to find a job all else held constant.
Create a scatterplot with the unemployment rate on the x axis and the job vacancy rate on the y axis. Label the graph and add a linear trend (regression) line. Add a dashed vertical line through the unemployment rate in December 2008 (middle of the Global Financial Crisis), and a dashed horizontal line through the vacancy rate at that point.
Discuss the relationship. Does a straight line describe the relationship well? What is the significance of December 2008 in the plot?
## The Beveridge Curve
#create a scatterplot
#had to change this all to comments because it wouldn't knit
#Scatterplot <- ggplot(beveridge, aes(x = urate, y = vrate)) + geom_point() + geom_smooth(method = "lm", se = FALSE, color = "red") +
# labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
# x = "Unemployment Rate",
# y = "Job Vacancy Rate")
# now add dashed vertical line through unemployment rate in Dec 2008
#also had to change to comments
#Scatterplot <- Scatterplot + geom_vline(xintercept = beveridge$urate[beveridge$month == "Dec 2008"], # linetype = "dashed", color = "purple")
# Now add dashed horizontal line through job vacancy rate in Dec 2008
#changed to comments
#Scatterplot <- Scatterplot + geom_hline(yintercept = beveridge$vrate[beveridge$month == "Dec 2008"], #linetype = "dashed", color = "orange")
The scatterplot trends negatively, with the points clustered heavily at the top left, more thinly in the middle, and slightly less heavily at the bottom right. The trend line is distinct within the data and accurately captures the negative trend shown by the plotted points.
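The commented-out code compares `month` to the string "Dec 2008", which does not match the “Year-Month-Day” format described in the variable list. A minimal sketch of one way to pick out that observation, assuming `month` is a Date (or a “Year-Month-Day” string that `as.Date()` can parse); the names `dec2008` and `beveridge_plot` are illustrative:

```r
library(ggplot2)
data("beveridge", package = "wooldridge")

# Keep the row whose month falls in December 2008, whatever day of the month is stored
dec2008 <- beveridge[format(as.Date(beveridge$month), "%Y-%m") == "2008-12", ]

beveridge_plot <- ggplot(beveridge, aes(x = urate, y = vrate)) +
  geom_point() +
  # Linear trend line without the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  # Dashed reference lines through the December 2008 observation
  geom_vline(xintercept = dec2008$urate, linetype = "dashed", color = "purple") +
  geom_hline(yintercept = dec2008$vrate, linetype = "dashed", color = "orange") +
  labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
       x = "Unemployment Rate",
       y = "Job Vacancy Rate")

print(beveridge_plot)
```

Matching on year and month, rather than on a full date, avoids having to know which day of the month the data set records.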
Create a new variable that is TRUE when the month is
December 2008 or after, FALSE otherwise. Plot a histogram
for unemployment rate, using a facet_wrap with your new
variable so that two histograms are created – before 2009 & after.
Then plot a scatterplot, as in the previous question, except with different colors and trend
lines for the pre- and post- periods (dashed lines not
needed).
Does splitting on December 2008 improve our understanding of the Beveridge Curve?
## Groups
#creating the new variable
#changed to comments because it wouldn't knit
#beveridge$NewVariable <- beveridge$month >= "Dec 2008"
#Plot histograms for unemployment rate with facet wrap
#Had to get help to understand this fully
#changed to comments also
#UnemploymentHistogram <- ggplot(beveridge, aes(x = urate, fill = NewVariable)) + geom_histogram(binwidth = (max(beveridge$urate) - min(beveridge$urate))/20, color = "black", position = "identity", alpha = 0.7) + labs(title = "Distribution of Unemployment Rate Before and After 2009",
# x = "Unemployment Rate",
# y = "Frequency") +
# facet_wrap(~NewVariable, scales = "free_y")
#Plot scatterplot with different colors and trend lines for pre- and post- period
#Scatterplot <- ggplot(beveridge, aes(x = urate, y = vrate, color = NewVariable)) + geom_point() +
# geom_smooth(method = "lm", se = FALSE) +
# labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
# x = "Unemployment Rate",
# y = "Job Vacancy Rate")
#plot(UnemploymentHistogram)
#plot(Scatterplot)
The code isn’t working, and I don’t have enough time to redo it if I want to complete number 5 in time as well.
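A hedged sketch of how the grouping could be made to run, again assuming `month` can be converted with `as.Date()`. The column name `post_dec2008` and the object names are hypothetical, chosen here for illustration:

```r
library(ggplot2)
data("beveridge", package = "wooldridge")

# TRUE for December 2008 and later, FALSE for earlier months
beveridge$post_dec2008 <- as.Date(beveridge$month) >= as.Date("2008-12-01")

# One histogram panel per period
period_hist <- ggplot(beveridge, aes(x = urate)) +
  geom_histogram(bins = 20, fill = "blue", color = "black") +
  facet_wrap(~ post_dec2008, labeller = label_both) +
  labs(title = "Distribution of Unemployment Rate Before and After December 2008",
       x = "Unemployment Rate",
       y = "Frequency")

# Mapping color to the grouping variable gives each period its own points and trend line
period_scatter <- ggplot(beveridge, aes(x = urate, y = vrate, color = post_dec2008)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
       x = "Unemployment Rate",
       y = "Job Vacancy Rate",
       color = "Dec 2008 or after")

print(period_hist)
print(period_scatter)
```

Comparing two Date values with `>=` orders them chronologically, which a comparison against the text "Dec 2008" cannot do.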
ggplot2 contains a number of themes, such as
theme_bw() that can be added to the plot to change the
overall look and design. A number of other packages go further. The
ggthemes package contains themes associated with popular
data software and news sites such as the Wall Street Journal or
Economist. ggthemr, tvthemes, and
hrbrthemes contain others.
Choose two themes from any package – or create your own if you are very ambitious! – then recreate one of the time series graphs in question 2 and the scatterplot in question 3 using the chosen themes.
## Themes
# create a line plot for unemployment rates across time with minimal theme (question 2)
MinimalTheme <- ggplot(beveridge, aes(x = month, y = urate)) +
geom_line(color = "red") +
labs(title = "Unemployment Rates Across Time",
x = "Month",
y = "Unemployment Rate") +
theme_minimal()
# create a scatterplot with regression line using minimal theme (question 3)
ScatterplotMinimalTheme <- ggplot(beveridge, aes(x = urate, y = vrate)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
x = "Unemployment Rate",
y = "Job Vacancy Rate") +
theme_minimal()
plot(MinimalTheme)
plot(ScatterplotMinimalTheme)
## `geom_smooth()` using formula = 'y ~ x'
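Since the question asks for two themes, a second one could be applied to one of the plots. As an illustration only, the scatterplot could be redrawn with `theme_bw()`, the built-in ggplot2 theme named in the prompt (the object name `ScatterplotBwTheme` is a hypothetical choice that follows the naming above):

```r
library(ggplot2)
data("beveridge", package = "wooldridge")

# Same scatterplot and regression line, drawn with the black-and-white theme
ScatterplotBwTheme <- ggplot(beveridge, aes(x = urate, y = vrate)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
  labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
       x = "Unemployment Rate",
       y = "Job Vacancy Rate") +
  theme_bw()

print(ScatterplotBwTheme)
```

Themes from ggthemes or the other packages mentioned in the prompt are added in the same position, after loading the package.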