The goal of this project is to develop an algorithm that predicts the length of unemployment in the US economy from a set of economic indicators. The length of unemployment is closely related to government spending on social security, default rates on mortgages and credit cards, etc. Being able to predict it is valuable both for making government spending more efficient and for reducing risk to financial institutions. Additionally, it serves as a thermometer for the economy as a whole.

The problem we are trying to solve is to develop a simple method that predicts the median duration of unemployment, in weeks, from monthly indicators that are easier to obtain than an accurate, direct measurement of the length of unemployment.

Our clients could be either the government (or other public sector entities) or private companies that need an unemployment prediction. Public sector entities would be interested in adjusting social security payments to the actual need and in producing timely budget predictions. Private sector entities could be, for example, banks that use the length of unemployment as a predictor of the default probability on debts; such institutions could then predict their gains and losses more accurately.

The variables are:

Additional dummy variables may be created to capture some political or economic trends.

Part of the data is available in the “economics” dataset in R. The remaining variables can be found on BLS, Quandl and similar sources. The final data set will result from merging the existing “economics” data with the additional variables.
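The merge described above can be sketched as a join on the month of each observation. This is only a minimal illustration in Python, not the project's actual code: the helper names (month_key, merge_monthly), the field name extra_indicator and all numeric values are hypothetical placeholders, and uempmed is assumed here to be the name of the median-duration column in the “economics” data.

```python
from datetime import date

def month_key(d):
    """Reduce a date to its (year, month) pair so sources align by month."""
    return (d.year, d.month)

def merge_monthly(base_rows, extra_rows):
    """Inner-join two lists of dicts on the (year, month) of their 'date' field.

    Rows of base_rows with no matching month in extra_rows are dropped.
    The 'date' of the base row is kept in the merged result.
    """
    extra_by_month = {month_key(r["date"]): r for r in extra_rows}
    merged = []
    for row in base_rows:
        match = extra_by_month.get(month_key(row["date"]))
        if match is None:
            continue  # no additional data for this month
        combined = dict(row)
        combined.update({k: v for k, v in match.items() if k != "date"})
        merged.append(combined)
    return merged

# Placeholder values for illustration only; these are not real statistics.
base = [
    {"date": date(2000, 1, 1), "uempmed": 1.0},
    {"date": date(2000, 2, 1), "uempmed": 2.0},
]
extra = [
    {"date": date(2000, 1, 31), "extra_indicator": 10.0},
]

merged = merge_monthly(base, extra)
```

Joining on (year, month) rather than on the exact date is a design choice that tolerates sources which stamp a month with different days (first versus last day of the month).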

The first step is to obtain the data. The data will then be formatted and cleaned, followed by descriptive statistics and graphs to gain some insight into it. Each month/year combination will be defined as one entry (observation), so that the month itself can be used as a predictor to capture possible seasonal effects. Data from 1967 to 2015 should provide a large enough data set, with 574 observations, that can be split into training and test data. The idea is to fit a random forest model (or another model) to predict the length of unemployment.
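The observation layout and the training/test split can be sketched as follows. This is a rough illustration under stated assumptions, not the final procedure: the range July 1967 to April 2015 is assumed because it yields exactly 574 monthly observations, the 80/20 proportion and the chronological (rather than random) split are choices made for this example only, and the function names are hypothetical. Model fitting is left out.

```python
def monthly_observations(start_year, start_month, end_year, end_month):
    """List one observation per month/year combination, inclusive of both ends."""
    obs = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        # 'month' is kept as its own field so it can serve as a predictor.
        obs.append({"year": year, "month": month})
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return obs

def chronological_split(rows, train_fraction=0.8):
    """Put the earliest rows in the training set and the latest in the test set."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Assumed date range (see the note above).
observations = monthly_observations(1967, 7, 2015, 4)
train, test = chronological_split(observations)

print(len(observations))  # 574
```

A chronological split avoids training on months that come after the ones being predicted; a random split would be the alternative if the observations are treated as independent of one another.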

The deliverables are the code used to generate the model, a slide deck for the presentation and a brief paper explaining the concepts, methods and results.