Non-linearity in Vacancy Rates in British Columbia

The number of enterprises reporting a vacancy lasting more than four months appears to be a non-linear function of the average wage and the regional population. This document uses the R package 'mgcv' to fit and plot that non-linearity. mgcv represents each smooth term with a penalized regression spline, which lets the fitted curve 'wiggle' between the points while penalizing excessive wiggliness.

Load the data

wages <- read.csv("http://dl.dropbox.com/u/23281950/WagesPop.csv")

Use an ordinary linear regression model to estimate the relationship between vacancy rate, average wage and population. Only the average wage is statistically significant (p < 0.001); population is not (p = 0.31), and the R-squared is very low (0.078).

x <- lm(Vac ~ AvWage + Population, data = wages)
summary(x)
## 
## Call:
## lm(formula = Vac ~ AvWage + Population, data = wages)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33.40 -13.77  -4.42  12.12  78.88 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  42.5929     3.3294   12.79  < 2e-16 ***
## AvWage       -0.7399     0.1461   -5.06  6.7e-07 ***
## Population   -0.0488     0.0482   -1.01     0.31    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 20 on 343 degrees of freedom
## Multiple R-squared: 0.0781,  Adjusted R-squared: 0.0727 
## F-statistic: 14.5 on 2 and 343 DF,  p-value: 8.8e-07
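The figures quoted from the summary can also be pulled out programmatically rather than read off the printout. The following is a minimal sketch that assumes simulated data in place of the original CSV: the data frame `sim` and its values are hypothetical, and only the column names match the source.

```r
# Simulated stand-in for the wages data (hypothetical values, not the real CSV)
set.seed(1)
n <- 346
sim <- data.frame(
  AvWage     = runif(n, 10, 40),
  Population = runif(n, 0, 100)
)
sim$Vac <- 40 - 0.7 * sim$AvWage + rnorm(n, sd = 20)

fit_lm <- lm(Vac ~ AvWage + Population, data = sim)
lm_summary <- summary(fit_lm)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- coef(lm_summary)
p_values <- coefs[, "Pr(>|t|)"]

# Flag the terms that are significant at the 5% level
significant <- p_values < 0.05
print(round(p_values, 4))
print(significant)

# Share of the variance in Vac explained by the model
r_squared <- lm_summary$r.squared
print(r_squared)
```

Checking `significant` term by term avoids misreading the stars in the printed table.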

It is possible that the relationship between the variables is non-linear, which we can explore with mgcv.

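# mgcv is a recommended package bundled with most R installations;
# install it only if library(mgcv) fails. Naming a CRAN mirror avoids the
# "trying to use CRAN without setting a mirror" error in non-interactive sessions.
install.packages("mgcv", repos = "https://cloud.r-project.org")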
library(mgcv)
## This is mgcv 1.7-22. For overview type 'help("mgcv-package")'.

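Create a model object with 'gam', which fits a generalized additive model. Each s() term is a smooth function of its variable, and k sets the upper limit on the basis dimension of that smooth.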

t <- gam(Vac ~ s(AvWage) + s(Population, k = 5), data = wages)
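To see how k limits the flexibility of each smooth, the effective degrees of freedom (edf) can be read from the model summary. This is a sketch on simulated data, not the original CSV: `sim`, `fit_gam` and the shape of the simulated wage effect are assumptions made for illustration.

```r
library(mgcv)

# Simulated stand-in for the wages data (hypothetical values)
set.seed(2)
n <- 346
sim <- data.frame(
  AvWage     = runif(n, 10, 40),
  Population = runif(n, 0, 100)
)
# Hypothetical non-linear wage effect plus noise; no Population effect
sim$Vac <- 30 + 10 * sin(sim$AvWage / 5) - 0.5 * sim$AvWage + rnorm(n, sd = 5)

# Same model formula as in the text: default k for AvWage, k = 5 for Population
fit_gam <- gam(Vac ~ s(AvWage) + s(Population, k = 5), data = sim)

# One row per smooth term; the "edf" column is the effective degrees of freedom
smooth_table <- summary(fit_gam)$s.table
edf <- smooth_table[, "edf"]
print(edf)

# Each smooth loses one degree of freedom to an identifiability constraint,
# so edf cannot exceed k - 1 (9 for the default k = 10, 4 for k = 5)
print(edf <= c(9, 4) + 1e-6)
```

An edf close to 1 indicates a nearly straight line, while an edf close to k - 1 suggests the smooth is using all the flexibility it was given.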

Check the model fit.

gam.check(t)

[Figure: residual diagnostic plots produced by gam.check(t)]

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 5 iterations.
## The RMS GCV score gradiant at convergence was 0.0005124 .
## The Hessian was positive definite.
## The estimated model rank was 14 (maximum possible: 14)
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                  k'   edf k-index p-value
## s(AvWage)     9.000 5.656   0.985    0.35
## s(Population) 4.000 1.250   0.839    0.00
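The table above reports a low p-value for s(Population) (k-index 0.839), although its edf of 1.25 is well below k' of 4. One common way to probe such a result is to refit with a larger k and see whether the edf changes. The sketch below does this on simulated data; `sim`, `fit_k5` and `fit_k10` are hypothetical names, and `k.check()` is assumed to be available, which requires a more recent mgcv than the 1.7-22 used in this document.

```r
library(mgcv)

# Simulated stand-in for the wages data (hypothetical values)
set.seed(3)
n <- 346
sim <- data.frame(
  AvWage     = runif(n, 10, 40),
  Population = runif(n, 0, 100)
)
sim$Vac <- 30 + 10 * sin(sim$AvWage / 5) - 0.5 * sim$AvWage + rnorm(n, sd = 5)

# Fit the same model with two different basis dimensions for Population
fit_k5  <- gam(Vac ~ s(AvWage) + s(Population, k = 5),  data = sim)
fit_k10 <- gam(Vac ~ s(AvWage) + s(Population, k = 10), data = sim)

# k.check() returns the basis dimension table that gam.check() prints
kc5  <- k.check(fit_k5)
kc10 <- k.check(fit_k10)
print(kc5)
print(kc10)

# If edf barely moves when k is raised, the original k was probably adequate
edf_k5  <- summary(fit_k5)$s.table["s(Population)", "edf"]
edf_k10 <- summary(fit_k10)$s.table["s(Population)", "edf"]
print(c(k5 = edf_k5, k10 = edf_k10))
```

The p-value in this check comes from a randomization test, so it varies slightly from run to run unless a seed is set.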

Then visualise the fitted surface.

vis.gam(t, view = c("AvWage", "Population"))
## Warning: data length [31] is not a sub-multiple or multiple of the number
## of rows [30]

[Figure: perspective plot from vis.gam of the linear predictor over AvWage and Population]


The 'linear predictor' axis is the predicted vacancy rate. The surface has an interesting shape. Higher wages are associated with a lower vacancy rate, as one would expect. Along the Population axis there is a shallow trough about mid-way, but otherwise the surface runs roughly parallel to that axis, implying that Population on its own doesn't have a big impact on vacancy rates. This agrees with the estimated degrees of freedom of 1.25 for s(Population), which is close to a straight line. The average wage has much more impact, and its effect is non-linear: about one-third of the way along the AvWage axis there is a 'bump' in the vacancy rate. It is possible that this marks the boundary between 'McJobs' and more highly paid employment.
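Reading a bump off a perspective plot is imprecise, so it can help to predict along the AvWage axis with Population held fixed and look for local maxima numerically. The following is a rough sketch on simulated data: `sim`, `fit`, `grid` and the simulated wage effect are hypothetical, and the band of two standard errors is only an approximate 95% interval.

```r
library(mgcv)

# Simulated stand-in for the wages data (hypothetical values)
set.seed(4)
n <- 346
sim <- data.frame(
  AvWage     = runif(n, 10, 40),
  Population = runif(n, 0, 100)
)
sim$Vac <- 30 + 10 * sin(sim$AvWage / 5) - 0.5 * sim$AvWage + rnorm(n, sd = 5)

fit <- gam(Vac ~ s(AvWage) + s(Population, k = 5), data = sim)

# Predict along AvWage with Population held at its median
grid <- data.frame(
  AvWage     = seq(min(sim$AvWage), max(sim$AvWage), length.out = 100),
  Population = median(sim$Population)
)
pred <- predict(fit, newdata = grid, se.fit = TRUE)
grid$fit   <- as.numeric(pred$fit)
grid$lower <- grid$fit - 2 * as.numeric(pred$se.fit)
grid$upper <- grid$fit + 2 * as.numeric(pred$se.fit)

# A local maximum is where the slope changes from positive to negative
d <- diff(grid$fit)
bumps <- which(d[-length(d)] > 0 & d[-1] < 0) + 1
print(grid[bumps, c("AvWage", "fit", "lower", "upper")])
```

If the interval around a bump overlaps the fitted values on either side of it, the bump may be noise rather than a real feature.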