Rate Cloud

Terry Leitch
02-06-2018

Background

OTC derivative data is available, but it is hard to use:

  • 100+ fields per trade
    • Field interpretation differs by asset type
      • Interest rate and credit derivatives
      • Equity derivatives
      • Commodity derivatives
    • Mix of categorical factors
      • Compounding frequency
      • Reset index (LIBOR, Prime, etc.)
      • Existence and terms of a collateral agreement
      • Traded on or off exchange
      • Cleared or uncleared
  • Several factors have a non-linear impact on price

Goal: find a way to quantify what is happening in the market

Solution

Define a Standard Instrument and invert its value from recent prices

  • Analyze the trade terms to identify the most common combinations
  • Select relevant factors using business knowledge
  • Pick a model
    • Fit
    • Check errors
    • If errors are not acceptable, adjust the number of points and trees, then refit
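The first step, finding the most common combinations of terms, can be sketched in base R by counting each distinct combination of categorical fields. The column names and values below are hypothetical stand-ins, not the actual SDR field names; the most frequent combination is one natural candidate for the Standard Instrument's terms.

```r
# Hypothetical trade records; column names and values are illustrative only.
trades <- data.frame(
  reset_index = c("LIBOR", "LIBOR", "Prime", "LIBOR", "LIBOR", "Prime"),
  compounding = c("3M", "3M", "1M", "6M", "3M", "1M"),
  cleared     = c("Y", "Y", "N", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# Count trades per distinct combination of the chosen columns, most frequent first.
most_common_terms <- function(df, cols) {
  counts <- aggregate(list(n = rep(1L, nrow(df))), by = df[cols], FUN = sum)
  counts[order(-counts$n), , drop = FALSE]
}

combos <- most_common_terms(trades, c("reset_index", "compounding", "cleared"))
print(combos)  # top row: LIBOR / 3M / Y with n = 3
```

On real data the same count would run over many more columns, which is where the business-knowledge filter in the second step matters.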

Model Choices

  • Standard linear regression is hard to apply because of the non-linearity
  • Logit: handles non-linearity, but there are too many factor combinations
  • Tree methods: handle non-linearity and the time dimension
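A quick count shows why "so many combinations" hurts a parametric model: a fully interacted model needs one parameter per cell of the grid of categorical levels. The level lists below are assumed for illustration, using only the five categorical terms from the Background slide.

```r
# Assumed levels for the categorical terms listed under Background (illustrative).
levels_by_term <- list(
  compounding = c("1M", "3M", "6M", "12M"),
  reset_index = c("LIBOR", "Prime", "Other"),
  collateral  = c("Yes", "No"),
  venue       = c("On-exchange", "Off-exchange"),
  cleared     = c("Cleared", "Uncleared")
)

# Every combination of levels; a fully interacted model has one parameter per row.
grid <- expand.grid(levels_by_term, stringsAsFactors = FALSE)
n_cells <- nrow(grid)
print(n_cells)  # 4 * 3 * 2 * 2 * 2 = 96
```

With only five fields the grid already has 96 cells, many of which will have few or no trades; a tree ensemble instead learns which splits matter from the data.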

Chosen approach: tree based Random Forest

Details

Used the “caret” package in R

  • Keeps track of training/test/validation subsets automatically
  • Provides method parRF, a fast, parallel CPU version of randomForest
  • Interfaced with a Shiny app
  • Public data version running on my server at http://ruxton.webhop.net:3839/fkotey/sdr
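The code that follows indexes rows with `trainSlices` and `testSlices`, which suggests rolling time windows over time-sorted trades. caret provides `createTimeSlices(y, initialWindow, horizon, fixedWindow = TRUE)` for this; the base-R helper below, `make_time_slices`, is a hypothetical stand-in that mimics its fixed-window behavior so the idea is visible without caret. Whether the original used exactly this scheme is an assumption.

```r
# Hypothetical helper mimicking caret::createTimeSlices(..., fixedWindow = TRUE):
# each training window is followed immediately by its test window.
make_time_slices <- function(n, initial_window, horizon) {
  stopifnot(n >= initial_window + horizon)
  starts <- seq_len(n - initial_window - horizon + 1)
  list(
    train = lapply(starts, function(s) s:(s + initial_window - 1)),
    test  = lapply(starts, function(s) (s + initial_window):(s + initial_window + horizon - 1))
  )
}

slices <- make_time_slices(n = 10, initial_window = 5, horizon = 2)
length(slices$train)  # 4 slices
slices$train[[1]]     # rows 1 to 5
slices$test[[1]]      # rows 6 and 7
```

Because the test rows always come after the training rows, the error estimates reflect how the model forecasts later trades rather than interpolating between them.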

Code:

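library(caret)  # train() and predict(); method "parRF" also needs the randomForest and foreach packages
# parRF only runs in parallel when a foreach backend is registered, e.g. doParallel::registerDoParallel()

# Body of a loop over time slices k; trainSlices/testSlices hold that slice's row indices
irFitTimeGLM <- train(PRICE_NOTATION ~ .,
                      data = irSort[trainSlices, ],
                      method = "parRF",
                      trControl = trainControl(allowParallel = TRUE))
pred <- predict(irFitTimeGLM, irSort[testSlices, ])
true <- irSort$PRICE_NOTATION[testSlices]
meanErr[k]  <- mean(pred - true)         # mean signed error on slice k
meanPcnt[k] <- meanErr[k] / mean(true)   # mean error relative to the mean price
sdErr[k]    <- sd(pred - true)           # dispersion of the errors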
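The three error metrics can be isolated into a small function and checked on toy numbers. `slice_errors` and the data below are made up for illustration; they only reproduce the arithmetic of `meanErr`, `meanPcnt` and `sdErr`, with no model involved.

```r
# Error summary for one time slice, mirroring meanErr / meanPcnt / sdErr.
slice_errors <- function(pred, true) {
  err <- pred - true
  c(mean_err  = mean(err),               # signed bias
    mean_pcnt = mean(err) / mean(true),  # bias relative to the average price
    sd_err    = sd(err))                 # dispersion of the errors
}

# Made-up predictions and prices for two slices, only to show the arithmetic.
toy_slices <- list(
  list(pred = c(2.1, 1.9, 2.3), true = c(2.0, 2.0, 2.0)),
  list(pred = c(3.0, 3.2),      true = c(3.1, 3.1))
)

# One row per slice; columns mean_err, mean_pcnt, sd_err.
res <- t(sapply(toy_slices, function(s) slice_errors(s$pred, s$true)))
print(res)
```

The first slice gives a mean error of 0.1, a relative error of 5% and a standard deviation of 0.2; the second slice is unbiased but still has a non-zero spread, which is why both the mean and the standard deviation are tracked.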

Screenshot
