Introduction

For this project, you will examine the unemployment rate and its relation to the job vacancy rate. The unemployment rate is the percentage of the actively engaged adult population (those who have a job or are looking for one) who do not have a job. The job vacancy rate is the number of job openings relative to the size of the actively engaged adult population.
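As a quick worked illustration of the two definitions, here is a minimal R sketch. The counts are hypothetical numbers chosen only to make the arithmetic easy; they are not taken from the beveridge data, the variable names are illustrative, and expressing the vacancy rate as a percentage is an assumption made to match the unemployment rate.

```r
# Hypothetical counts, for illustration only (not from the data set)
labor_force  <- 1000  # adults who have a job or are looking for one
unemployed   <- 50    # members of that group without a job
job_openings <- 30    # vacant positions

# Both rates are expressed here as a percentage of the same population
urate_example <- 100 * unemployed / labor_force
vrate_example <- 100 * job_openings / labor_force

print(c(urate = urate_example, vrate = vrate_example))
```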

The data for this project is available in the wooldridge package. If you are not using RStudio Cloud, be sure to install the package by clicking “Tools > Install Packages” (you only need to do this once). Once the package is installed, load the data.

data("beveridge", package="wooldridge")
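If the load succeeds, beveridge appears as a single data frame, and urate and vrate are columns inside it rather than data sets of their own. A minimal sketch for checking this, assuming an internet connection for the one-time install and using the CRAN cloud mirror as one possible repository:

```r
# Install wooldridge once if it is missing (the mirror URL is one possible choice)
if (!requireNamespace("wooldridge", quietly = TRUE)) {
  install.packages("wooldridge", repos = "https://cloud.r-project.org")
}

# Load the data frame into the workspace
data("beveridge", package = "wooldridge")

# The rates are columns of beveridge, so inspect the data frame itself
print(names(beveridge))
print(head(beveridge[, c("month", "urate", "vrate")]))
```

Calling data("urate") fails because no data set has that name; the column is reached as beveridge$urate once the data frame is loaded.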

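The important variables in the data set are month (the date of each monthly observation), urate (the unemployment rate), and vrate (the job vacancy rate).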

You will be using the ggplot2 package to plot the relationships.

library(ggplot2)

See 3.3 and 3.4 for ggplot examples.

1. Distribution of Unemployment

Start by looking at how the data is distributed. Create two histograms: one for the unemployment rate and one for the job vacancy rate. Each histogram should have 20 bins rather than the default of 30. Change the labels on the x axis and add a title to the plot.

Discuss what you see.


## Distribution of Unemployment

# load the plotting library
library(ggplot2)

# urate and vrate are columns of the beveridge data frame, not data sets of
# their own, so data("urate") fails with "data set 'urate' not found".
# Load the whole data frame from the wooldridge package instead.
data("beveridge", package = "wooldridge")





# histogram for unemployment rate (bins = 20 asks for 20 bins directly)
UnemploymentHistogram <- ggplot(beveridge, aes(x = urate)) +
  geom_histogram(bins = 20, fill = "blue", color = "black") +
  labs(title = "Distribution of Unemployment Rate",
       x = "Unemployment Rate",
       y = "Frequency")

plot(UnemploymentHistogram)

# histogram for job vacancy rate (bins = 20 asks for 20 bins directly)
JobVacancyHistogram <- ggplot(beveridge, aes(x = vrate)) +
  geom_histogram(bins = 20, fill = "green", color = "black") +
  labs(title = "Distribution of Job Vacancy Rate",
       x = "Job Vacancy Rate",
       y = "Frequency")

plot(JobVacancyHistogram)

What I see on the histograms:

2. Time-Series Plots

Plot the unemployment rates and job vacancy rates across time (month) using line plots. Note that the time variable should be on the x axis. Change the x and y labels and add a title to the plot.

Discuss what is happening to each variable across time. Do the two series move together?
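One way to judge whether the two series move together is to draw them on the same axes and, as a rough numeric check, compute their correlation. The sketch below uses a small made-up data frame (toy, with invented values) so that it runs without the course data; with the real data you would pass beveridge instead.

```r
library(ggplot2)

# Made-up illustrative values, NOT taken from the beveridge data
toy <- data.frame(
  month = seq(as.Date("2008-01-01"), by = "month", length.out = 6),
  urate = c(5.0, 5.2, 5.6, 6.1, 6.8, 7.3),
  vrate = c(3.0, 2.9, 2.7, 2.5, 2.2, 2.0)
)

# Two line layers on one panel; the color labels become the legend entries
both_series <- ggplot(toy, aes(x = month)) +
  geom_line(aes(y = urate, color = "Unemployment rate")) +
  geom_line(aes(y = vrate, color = "Job vacancy rate")) +
  labs(title = "Unemployment and Job Vacancy Rates Across Time",
       x = "Month", y = "Rate", color = NULL)

print(both_series)

# A negative correlation means the series tend to move in opposite directions
series_cor <- cor(toy$urate, toy$vrate)
print(series_cor)
```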


## Time Series Plots

# create line plot for unemployment rate
UnemploymentLinePlot <- ggplot(beveridge, aes(x = month, y = urate)) + geom_line(color = "blue") +
  labs(title = "Unemployment Rates Across Time",
       x = "Month",
       y = "Unemployment Rate")

plot(UnemploymentLinePlot)

# Create line plot for job vacancy rates
JobVacancyLinePlot <- ggplot(beveridge, aes(x = month, y = vrate)) + geom_line(color = "green") +
  labs(title = "Job Vacancy Rates Across Time",
       x = "Month",
       y = "Job Vacancy Rate")

plot(JobVacancyLinePlot)

Line Plots Discussion:
- The unemployment rate starts at around 4 before 2002, rises to a little above 6, then dips to slightly above 4 around 2007. It peaks at around 11 in 2010 and then declines slowly and unevenly.
- The job vacancy rate starts high, above 4.0, drops sharply to about 2.5 in 2003, peaks again at just above 3.5 between 2007 and 2008, dips below 2.0 right before 2010, and then rises gradually to above 2.5 by 2012.

3. The Beveridge Curve

After plotting the two variables across time, let us look at their relationship. This relationship describes how “easy” it is to find a job, all else held constant.

Create a scatterplot with the unemployment rate on the x axis and the job vacancy rate on the y axis. Label the graph and add a linear trend (regression) line. Add a dashed vertical line through the unemployment rate in December 2008 (middle of the Global Financial Crisis), and a dashed horizontal line through the vacancy rate at that point.

Discuss the relationship. Does a straight line describe the relationship well? What is the significance of December 2008 in the plot?
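The dashed reference lines need the unemployment and vacancy rates for one specific month. A minimal sketch of that lookup, assuming the month column is stored as a Date; the toy data frame and its values are invented for illustration:

```r
# Made-up illustrative values, NOT taken from the beveridge data
toy <- data.frame(
  month = seq(as.Date("2008-11-01"), by = "month", length.out = 3),
  urate = c(6.5, 7.1, 8.5),
  vrate = c(2.3, 2.1, 2.0)
)

# Match on year and month so the day of the month does not matter
dec2008 <- toy[format(toy$month, "%Y-%m") == "2008-12", ]

# These two numbers are what geom_vline(xintercept = ...) and
# geom_hline(yintercept = ...) would receive
print(dec2008$urate)
print(dec2008$vrate)
```

With the real data, passing linetype = "dashed" to both geom_vline() and geom_hline() gives the dashed style the question asks for.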


## The Beveridge Curve

#create a scatterplot with a linear trend line
Scatterplot <- ggplot(beveridge, aes(x = urate, y = vrate)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
       x = "Unemployment Rate",
       y = "Job Vacancy Rate")

# Find the December 2008 row. This assumes month is a date-like column;
# comparing month to the string "Dec 2008" does not work.
Dec2008 <- beveridge[format(as.Date(beveridge$month), "%Y-%m") == "2008-12", ]

# now add dashed vertical line through unemployment rate in Dec 2008
Scatterplot <- Scatterplot +
  geom_vline(xintercept = Dec2008$urate, linetype = "dashed", color = "purple")

# Now add dashed horizontal line through job vacancy rate in Dec 2008
Scatterplot <- Scatterplot +
  geom_hline(yintercept = Dec2008$vrate, linetype = "dashed", color = "orange")

plot(Scatterplot)

The scatterplot shows a negative relationship. The points are densest at the top left (low unemployment, high vacancies), sparser in the middle, and fairly dense again, though slightly less so, at the bottom right. The trend line stands out clearly within the data and reflects the negative trend in the plotted points.

4. Groups

Create a new variable that is TRUE when the month is December 2008 or after, FALSE otherwise. Plot a histogram for unemployment rate, using a facet_wrap with your new variable so that two histograms are created: before 2009 and after. Then plot a scatterplot, as in the previous question, except with different colors and trend lines for the pre- and post- periods (dashed lines not needed).

Does splitting on December 2008 improve our understanding of the Beveridge Curve?
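Creating the grouping variable comes down to a date comparison. The sketch below assumes the dates are stored as Date values and uses an invented four-month sequence; the label text is only a suggestion.

```r
# Four consecutive months around the cutoff (illustrative only)
months <- seq(as.Date("2008-10-01"), by = "month", length.out = 4)

# TRUE for December 2008 or later, FALSE otherwise
post_crisis <- months >= as.Date("2008-12-01")
print(post_crisis)

# Optional: readable labels for facet strips and legends
period <- factor(post_crisis, levels = c(FALSE, TRUE),
                 labels = c("Before Dec 2008", "Dec 2008 and after"))
print(table(period))
```

The cutoff is written as "2008-12-01" because as.Date() cannot interpret a string such as "Dec 2008" without an explicit format.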


## Groups

#creating the new variable: TRUE from December 2008 onward, FALSE before
#(assumes month is a date-like column; comparing it to the string "Dec 2008" does not work)
beveridge$NewVariable <- as.Date(beveridge$month) >= as.Date("2008-12-01")

#Plot histograms for unemployment rate with facet wrap
#Had to get help to understand this fully
UnemploymentHistogram <- ggplot(beveridge, aes(x = urate, fill = NewVariable)) +
  geom_histogram(binwidth = (max(beveridge$urate) - min(beveridge$urate))/20,
                 color = "black", position = "identity", alpha = 0.7) +
  labs(title = "Distribution of Unemployment Rate Before and After 2009",
       x = "Unemployment Rate",
       y = "Frequency") +
  facet_wrap(~NewVariable, scales = "free_y")

#Plot scatterplot with different colors and trend lines for pre- and post- period
Scatterplot <- ggplot(beveridge, aes(x = urate, y = vrate, color = NewVariable)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
       x = "Unemployment Rate",
       y = "Job Vacancy Rate")

plot(UnemploymentHistogram)
plot(Scatterplot)

5. Themes!

ggplot2 contains a number of themes, such as theme_bw(), that can be added to a plot to change its overall look and design. A number of other packages go further. The ggthemes package contains themes associated with popular data software and news sites such as the Wall Street Journal or Economist. ggthemr, tvthemes, and hrbrthemes contain others.

Choose two themes from any package – or create your own if you are very ambitious! – then recreate one of the time series graphs in question 2 and the scatterplot in question 3 using the chosen themes.
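Because a theme is just another component added with +, one saved plot can be restyled several times. A minimal sketch using two themes that ship with ggplot2, theme_bw() and theme_classic(); the toy data are invented, and any other theme function could be substituted.

```r
library(ggplot2)

# Made-up illustrative values, NOT taken from the beveridge data
toy <- data.frame(urate = c(4.5, 5.0, 6.2, 7.8, 9.5),
                  vrate = c(3.4, 3.1, 2.6, 2.2, 1.9))

# Build the plot once, without choosing a theme
base_plot <- ggplot(toy, aes(x = urate, y = vrate)) +
  geom_point() +
  labs(x = "Unemployment Rate", y = "Job Vacancy Rate")

# Apply a different theme to each copy
plot_bw      <- base_plot + theme_bw()
plot_classic <- base_plot + theme_classic()

print(plot_bw)
print(plot_classic)
```

Themes from add-on packages work the same way once the package is loaded, for example theme_economist() or theme_wsj() from ggthemes.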


## Themes



# create a line plot for unemployment rates across time with minimal theme (question 2)
MinimalTheme <- ggplot(beveridge, aes(x = month, y = urate)) +
  geom_line(color = "red") +
  labs(title = "Unemployment Rates Across Time",
       x = "Month",
       y = "Unemployment Rate") +
  theme_minimal()

# create a scatterplot with regression line using minimal theme (question 3)
ScatterplotMinimalTheme <- ggplot(beveridge, aes(x = urate, y = vrate)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  
  labs(title = "Scatterplot of Unemployment Rate vs. Job Vacancy Rate",
       x = "Unemployment Rate",
       y = "Job Vacancy Rate") +
  theme_minimal()





plot(MinimalTheme)

plot(ScatterplotMinimalTheme)
## `geom_smooth()` using formula = 'y ~ x'