For this discussion I pulled annual inflation and unemployment data for 1960 to 2024 from FRED, the data service of the Federal Reserve Bank of St. Louis.
library(readxl)
fred_data <- read_excel("fredgraph.xlsx", sheet = "Annual")
We will treat the unemployment rate as the independent variable and the inflation rate as the dependent one. According to the Phillips curve, the two are inversely related. When unemployment is high, labor (an input to every sector) is cheap, which helps keep prices low. When unemployment is low and labor is scarce, we would expect prices to rise to reflect higher wages.
plot(fred_data$UNRATE, fred_data$FPCPITOTLZGUSA, main = "U.S. Unemployment Rate and Average Yearly Inflation",
xlab = "Unemployment Rate (%)", ylab = "CPI Inflation (Yearly Average %)")
Though we will try it here, it is questionable whether a linear model suits this case. For one, the relationship between the two variables may not be linear (though we often draw it that way in classes). For another, the observations are not really independent: past inflation informs future expectations and thus future inflation. We fit a linear regression to the data anyway to see how well it does.
#find the linear regression for when unemployment is the independent variable, and inflation is the response variable
lm(FPCPITOTLZGUSA ~ UNRATE, fred_data)
##
## Call:
## lm(formula = FPCPITOTLZGUSA ~ UNRATE, data = fred_data)
##
## Coefficients:
## (Intercept) UNRATE
## 2.8628 0.1518
#find the correlation of the 2 variables
r <- cor(fred_data$UNRATE, fred_data$FPCPITOTLZGUSA)
r^2
## [1] 0.008058693
According to the results of this regression, the line of best fit has a slope of 0.1518 and an intercept of 2.86. The positive slope suggests that higher unemployment is associated with higher inflation, the opposite of what the Phillips curve predicts. We should be careful, though, as the outliers present (the joint high unemployment and high inflation of the 1970s) are certainly exerting a lot of influence on the slope of the line. Nonetheless, removing these outliers would blind us to an important era of macroeconomic history. Though an imperfect analysis, it does suggest a murkier relationship between inflation and unemployment than is normally shown.
The intercept of 2.86 implies that at an unemployment rate of 0%, inflation would be around 2.86%. It is an interesting figure, but an unemployment rate of 0% lies well outside the observed data, so it is an extrapolation with little practical use.
Another statistic worth looking at is the \(R^2\) value. Found by squaring the correlation, it tells us how much of the variation in inflation is explained by the trend line. In this case it is about 0.8%, so the line explains less than 1% of the variation.
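As a quick check on that interpretation, the sketch below uses a small simulated data set (the vectors `x` and `y` are made up for illustration and are not the FRED series) to show that, for a one-predictor regression, the squared correlation matches both the `r.squared` value from `summary()` and one minus the ratio of residual to total sum of squares.

```r
# Simulated data for illustration only; x and y are hypothetical, not the FRED series
set.seed(42)
x <- runif(50, min = 3, max = 10)
y <- 3 + 0.15 * x + rnorm(50, sd = 2.5)

fit <- lm(y ~ x)

# Three routes to R^2 for a simple (one-predictor) linear regression
r2_cor     <- cor(x, y)^2
r2_summary <- summary(fit)$r.squared
r2_sums    <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

# The three values should agree up to floating-point error
c(r2_cor, r2_summary, r2_sums)
```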
We can replicate the regression findings with a couple of different methods. The first is the least squares formula for the slope.
\(b_1 = \frac{s_y}{s_x}\cdot R\)
#find the standard deviation for unemployment
sx <- sd(fred_data$UNRATE)
#find the standard deviation for inflation
sy <- sd(fred_data$FPCPITOTLZGUSA)
#solve for the slope of the line
b1 <- (sy/sx)*r
b1
## [1] 0.1517985
Note: Here the correlation gives us a standardized direction and strength of fit (from -1 to +1), while the ratio of the standard deviations rescales it into the units of the data (percentage points of inflation per percentage point of unemployment).
We can also find the same slope using the covariance/variance formula.
#find the slope using the covariance/variance formula
cov(fred_data$UNRATE, fred_data$FPCPITOTLZGUSA)/var(fred_data$UNRATE)
## [1] 0.1517985
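The two slope calculations agree because the correlation is the covariance divided by the product of the two standard deviations. Writing \(s_{xy}\) for the sample covariance (a symbol introduced here for convenience), the covariance/variance formula rearranges into the standard deviation formula:
\(b_1 = \frac{s_{xy}}{s_x^2} = \frac{s_{xy}}{s_x s_y}\cdot\frac{s_y}{s_x} = R\cdot\frac{s_y}{s_x}\)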
Having confirmed the slope, we now move on to the intercept. We will solve for it using the point-slope equation and the fact that the least squares line always passes through the point \((\bar{x}, \bar{y})\).
\(b_0 = \bar{y}-b_1\cdot\bar{x}\)
b0 <- mean(fred_data$FPCPITOTLZGUSA) - b1 * mean(fred_data$UNRATE)
b0
## [1] 2.862794
Using these parameters, let's overlay the regression line onto our scatter plot from earlier.
plot(fred_data$UNRATE, fred_data$FPCPITOTLZGUSA, main = "U.S. Unemployment Rate and Average Yearly Inflation",
xlab = "Unemployment Rate (%)", ylab = "CPI Inflation (Yearly Average %)")
lines(fred_data$UNRATE, b0+b1*fred_data$UNRATE, col = "red")
Again, the fit does not seem to be very good in this case. The outliers exert enough leverage that they pull the trend line up, but removing them from the data set would prevent us from recognizing that the relationship is actually more complicated. In the end, the model resulting from least squares regression does not seem to be a good one here.
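One way to put a number on that pull, sketched here on simulated data rather than the FRED series, is Cook's distance via `cooks.distance()` from base R. The variable names, the three injected points, and the 4/n cutoff (a common rule of thumb, not a formal test) are all assumptions made for illustration.

```r
# Simulated data for illustration only; not the FRED series
set.seed(1)
x <- runif(60, min = 3, max = 8)
y <- 6 - 0.5 * x + rnorm(60, sd = 1)

# Inject three hypothetical points with high x and high y together
x <- c(x, 8.5, 9.0, 9.7)
y <- c(y, 11, 13.5, 10.3)

fit_all <- lm(y ~ x)

# Cook's distance measures how much the fit changes when each point is dropped
cd <- cooks.distance(fit_all)
keep <- cd <= 4 / length(x)   # 4/n is a rule-of-thumb cutoff, assumed here

# Refit without the flagged points to see how far the slope moves
fit_trim <- lm(y ~ x, subset = keep)

coef(fit_all)["x"]
coef(fit_trim)["x"]
```

Comparing the two slopes shows how much a handful of points can move the line. It does not say whether those points should be dropped, which is the judgment call discussed above.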
There are 4 assumptions made by the least squares regression method. If any of these conditions does not hold, the resulting model may not be appropriate. I will break down each assumption and how it applies to the data set from the first question.
Linearity - For linear regression to be appropriate, the data being fit should have an underlying linear trend. In the case of the FRED data from before, the main cloud of data does seem to suggest a linear, albeit weak, relationship, so this assumption is reasonably satisfied.
Nearly Normal Residuals - This assumes that the residuals of a fit (the differences between the actual data points and where the model predicts they will be) are normally distributed around the line of best fit. In other words, we shouldn’t see extreme outliers. The FRED data has a few such outliers and as such, the least squares line is not appropriate.
Constant Variability - Variability describes how far off on average a data point is from the line of best fit. In the FRED data, it seems that as unemployment rises, the variability of inflation increases dramatically. This condition is not satisfied and suggests another model is needed.
Independent Observations - Another assumption is that the observations are independent of one another. In our example that condition is very much not satisfied, as current inflation can influence future inflation if it changes expectations.
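These four conditions are usually checked from the residuals. The sketch below runs the standard base R checks on simulated data: the `x` and `y` vectors are hypothetical stand-ins, with autocorrelated errors built in on purpose so the last panel has something to show. It is not run on the FRED columns here.

```r
# Simulated data for illustration only; errors follow an AR(1) process so that
# the independence condition is deliberately violated
set.seed(7)
n <- 65
x <- runif(n, min = 3, max = 10)
e <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- 3 + 0.15 * x + e

fit <- lm(y ~ x)
res <- residuals(fit)

op <- par(mfrow = c(2, 2))
# Linearity and constant variability: look for curvature or a fan shape
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)
# Nearly normal residuals: points should track the reference line
qqnorm(res)
qqline(res)
hist(res, main = "Histogram of residuals", xlab = "Residuals")
# Independent observations (rows assumed to be in time order):
# spikes outside the dashed bands suggest autocorrelation
acf(res, main = "Autocorrelation of residuals")
par(op)
```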
In the end, linear regression may not be an appropriate fit for this kind of data: of the four conditions above, only linearity is reasonably satisfied.