Forecast Competition

Popularity of David Cameron…

library(Quandl)

## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

Quandl.auth("y8y3ezt48WZeqUqq1yQd")
library(quantmod)

## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.

library(zoo)
library(forecast)

## Loading required package: timeDate
## This is forecast 5.6

library(knitr)
options(scipen=13)
setwd("/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Lecture-9")
dave03 <- read.csv("Dave_030215.csv",stringsAsFactors=F)
dave03$Week <- as.Date(substr(dave03$Week,1,10))
plot(dave03$Week,dave03$david.cameron,
     main="Google Search Popularity of David Cameron",
     ylab="Search popularity",xlab="Date",type="l",ylim=range(0,20))

dave03.ts <- ts(dave03$david.cameron,start=c(2004,1),freq=52)
tsdisplay(dave03.ts)

dave03.arima <- auto.arima(dave03.ts)
dave03.forc <- forecast(dave03.arima,h=8)
plot(dave03.forc,include=100)

dave03.forc$mean

## Time Series:
## Start = c(2015, 8) 
## End = c(2015, 15) 
## Frequency = 52 
## [1] 5.436670 5.195599 5.092435 5.048288 5.029395 5.021310 5.017851 5.016370

dave09 <- read.csv("Dave_090215.csv",stringsAsFactors=F)
tail(dave09,1)

##                        Week david.cameron
## 579 2015-02-01 - 2015-02-07             5

FTSE last week

ftse <- Quandl("YAHOO/INDEX_FTSE")
ftse <- ftse[order(ftse$Date),]
ftse.0 <- ftse[ftse$Date<as.Date("2015-02-06"),]
ftse.auto <- auto.arima(ftse.0$Close) # fit only on data before 6 February
ftse.forc <- forecast(ftse.auto,h=50)
plot(ftse.forc,include=100)

ftse.forc$mean[seq(1,45,5)]

## [1] 6832.093 6836.908 6840.509 6844.105 6847.702 6851.299 6854.896 6858.492
## [9] 6862.089

ftse[ftse$Date==as.Date("2015-02-06"),]

##         Date   Open   High    Low  Close Volume Adjusted Close
## 3 2015-02-06 6865.9 6886.2 6835.5 6853.4      0         6853.4
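The ftse.0 subset above holds back the data from 6 February onwards, which is the idea behind evaluating any forecast: fit on data before the target date, then compare with what actually happened. A minimal sketch of that workflow, using base R's arima() and predict() on a simulated AR(1) series rather than the FTSE. The series, the AR coefficient of 0.7, the model order and the 8-step holdout are all illustrative assumptions:

```r
# Reproducible simulated data (a stand-in for a real series such as the FTSE)
set.seed(313)
y <- arima.sim(model = list(ar = 0.7), n = 200)

# Hold back the last 8 observations, as ftse.0 holds back the most recent days
h <- 8
train <- y[1:(length(y) - h)]
test  <- y[(length(y) - h + 1):length(y)]

# Fit an AR(1) to the training data only, then forecast h steps ahead
fit  <- arima(train, order = c(1, 0, 0))
pred <- predict(fit, n.ahead = h)$pred

# Forecast errors and root mean squared error on the held-out data
err  <- test - as.numeric(pred)
rmse <- sqrt(mean(err^2))
rmse
```

With the forecast package loaded, its accuracy() function reports RMSE and related error measures when given a forecast and the held-out data.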

Trade balance:

trade <- read.csv("trade_030215.csv",stringsAsFactors=F)
trade$date <- as.Date(substr(trade$DateTime,1,8),"%Y%m%d")
plot(trade$date,trade$Actual,type="o",ylab="Total Trade Balance (£)",xlab="Date")
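The DateTime column stores stamps such as 20150206 09:30:00 (the layout shown in the trade09 output below). Keeping the first 8 characters and giving as.Date() an explicit format is what turns them into dates. A small sketch with one hard-coded example stamp:

```r
# A timestamp in the same layout as the DateTime column
stamp <- "20150206 09:30:00"

# Keep "YYYYMMDD" and tell as.Date() how to read it:
# %Y = 4-digit year, %m = month number, %d = day of month
d <- as.Date(substr(stamp, 1, 8), format = "%Y%m%d")
d

# Print it in another style (the month name depends on your locale)
format(d, "%d %B %Y")
```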

trade.t <- ts(trade$Actual[order(trade$date)],start=c(2012,8),freq=12)
trade.auto <- auto.arima(trade.t)
trade.f <- forecast(trade.auto,h=1)
plot(trade.f)

trade.f$mean

##            Jan
## 2015 -90411765

trade09 <- read.csv("trade_090215.csv",stringsAsFactors=F)
head(trade09,1)

##            DateTime       Actual   Consensus    Previous
## 1 20150206 09:30:00 -10154000000 -9100000000 -9283000000

Next week’s challenge: CPI YoY.
- http://www.fxstreet.com/economic-calendar/event.aspx?id=3ac4e096-06c8-4981-b973-622269563b1f

cpi <- read.csv("cpi_090215.csv")
cpi.t <- ts(cpi$Actual[order(cpi$DateTime)],start=c(2007,1),freq=12)
cpi.t.a <- auto.arima(cpi.t)
cpi.t.f <- forecast(cpi.t.a,h=10)
plot(cpi.t.f,main="Forecasts of CPI Inflation",ylab="Inflation (%, YoY)",xlab="Date")

cpi.t.f$mean

##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 2015 0.5015949 0.6033607 0.6889686 0.7836768 0.9564424 0.8226495 0.9617996
##            Aug       Sep       Oct
## 2015 1.0601266 1.2608149 1.2895016

Submit forecast: http://goo.gl/forms/odKgSjyiU0

EC313 Section 2: Practical

Week 5:
- Tuesday: Introduction to R: Getting set up, syntax.
- Wednesday: Loading data, running regression models. BYOD.
Week 7:
- Tuesday: Introduction to R: Model Selection.
- Wednesday: Introduction to R: Forecasting.
Week 8:
- Tuesday: Class, review, BYOD.
- Wednesday: Midterm II.

R: What is it?

R is a powerful computer programming language.
R is free, open source, and widely used in academia and industry.
- Free: Put it on your computer!
- Open source: Anyone can write code (centrally vetted).
  - New procedures are often coded in R first.
- Widely used: Lots of online help.
  - An impressive skill to add to your CV.

Lots of online help: Resources

Aim for Today: Familiarity

Step 1: Download and install.
1. R: http://cran.rstudio.com/
2. RStudio: http://www.rstudio.com/products/rstudio/download/
Step 2: Open RStudio!
- RStudio is freeware (i.e. free!) that makes R more user-friendly.

What is R?

R is a statistical programming language consisting of a base system and thousands of packages.
R’s base has most functions required for basic data work.
Packages exist for more complex and novel routines. Examples:
- fpp package for the course textbook.
- gets package for general-to-specific model selection.
- quantmod and Quandl for data downloading.
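Packages are installed once per computer with install.packages() and then loaded in every new session with library(). A sketch that checks which of the packages named above are missing before installing them; the install line is commented out because it downloads from CRAN and needs an internet connection:

```r
# Packages mentioned above
pkgs <- c("fpp", "gets", "quantmod", "Quandl")

# Which of them are not yet installed on this computer?
missing <- pkgs[!pkgs %in% rownames(installed.packages())]
missing

# Install the missing ones (uncomment to run; requires internet access)
# if (length(missing) > 0) install.packages(missing)

# Then load a package in each session where you need it, e.g.:
# library(quantmod)
```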

What is RStudio?

Freeware that makes R easier to use.
- Makes a cold programming language a little more like software you’ve previously used…
Four windows:
- Console window: Interactive coding and output.
- Source window: For editing source/batch code.
- Workspace/history:
  - Workspace: All data and values R has in memory.
  - History: List of all commands previously used.
- Files/Plots/Packages/Help:
  - Files: Windows explorer-type interface for opening files.
  - Plots: Displays the graphs you create, with options to zoom and export them.
  - Packages: Interface for finding, installing and loading packages.
  - Help: Window to view help files for functions.

The Language of R (and programming)

A function: A set of commands collected together and called using one command.
- E.g. mean(x) calculates the mean of a data series x.
A package: A collection of functions (and often data) bundled together and distributed centrally, mostly via CRAN.
An error: If you make a mistake (e.g. misspelling a command), R will produce an error.
- Easily the most frustrating aspect of programming.
- First step (if error message unclear): Google the error message.
- If nothing makes sense, email me with command used, error produced, and data being used.
The workspace: The memory (on your computer) into which you load datasets.
Brackets: Hugely important; be very careful with them.
- If you forget to close a bracket, you get a + prompt rather than the usual > prompt.
- If need be, press Esc to cancel the command and start again.
Help!
- Don’t be afraid to ask for help: type a question mark before a function name.
- E.g. ?mean
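For example, misspelling mean as mena produces an error. The sketch below wraps the call in tryCatch() only so that a script keeps running and the message can be inspected; typed straight into the Console, mena(b) simply prints the error:

```r
b <- c(3, 4, 5)

# Capture the error message instead of stopping the script
msg <- tryCatch(mena(b), error = function(e) conditionMessage(e))
msg   # something like: could not find function "mena"

# The correctly spelled function works
mean(b)
```

Pasting the text of msg into a search engine is the "Google the error message" step above.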

Basics of R

R is centred around objects that you create.
- E.g. a dataset loaded up is an object that you can manipulate.
More basically, R is a calculator:
- Typing 100*10/50 into the Console window yields 20.
Can give numbers a name in your workspace:
- E.g. a=100*10/50 creates a value a that can be later used.
- Note that values can be respecified, so a = a + 10 makes programming sense: it replaces a with its old value plus 10.
- Usually better to create new values (e.g. b = a + 10) to avoid losing output.
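The calculator and naming steps above, typed as code. The comments give the values each line produces; <- is used for assignment here, and = works the same way at the top level:

```r
# R as a calculator
100 * 10 / 50        # 20

# Store the result under a name
a <- 100 * 10 / 50   # a is 20

# Creating a new value keeps a unchanged
b <- a + 10          # b is 30, a is still 20

# Respecifying a overwrites its old value
a <- a + 10          # a is now 30
```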

Scalars, Vectors and Matrices

Scalar: Single number (0-dimensional).
- a was a scalar.
Vector: Row/column of numbers (1-dimensional), often referred to as an array.
- Create using c() function (concatenate). E.g. c(3,4,5) yields 3, 4, 5.
Matrix: Table of numbers (2-dimensional).
- Creation later…
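A quick preview of the three shapes, assuming nothing beyond base R: a scalar is really a vector of length 1, length() counts elements, and dim() gives the rows and columns of a matrix:

```r
a <- 20                       # scalar: a vector of length 1
v <- c(3, 4, 5)               # vector, built with c()
m <- matrix(1:6, nrow = 2)    # matrix with 2 rows and 3 columns

length(a)   # 1
length(v)   # 3
dim(m)      # 2 3
```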

Functions

Functions are automated procedures:
- Usually constituting multiple lines of code.
For example, the mean of b, where b <- c(3,4,5) :
- Could type out (3+4+5)/3, but this is time consuming (esp. for longer vectors).
- Could use the sum command for sum(b)/3, as sum(b) yields 12.
- Or use mean(b) instead: 4.
More serious example: Random number generation.
- rnorm generates normal random variables.
- From ?rnorm, typing rnorm(10) generates 10 standard normal random variables.
- E.g. -0.757523, -0.7040399, 0.0964325, 2.6603142, 0.6652204, 0.0577479, 0.6418337, 0.0588863, 0.8258567, -0.7761527.
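The three routes to the mean of b, a home-made function, and a reproducible version of the rnorm() example. my_mean is a hypothetical name chosen for illustration, and set.seed() (with the arbitrary seed 42) is an addition that makes the random draws repeat exactly on every run:

```r
b <- c(3, 4, 5)

(3 + 4 + 5) / 3      # by hand: 4
sum(b) / 3           # sum(b) is 12, so 4
mean(b)              # built-in function: 4

# Your own function: several lines of code, called with one command
my_mean <- function(x) {
  total <- sum(x)
  total / length(x)
}
my_mean(b)           # 4

# Fix the random number generator so rnorm() gives the same draws each time
set.seed(42)
rnorm(10)
```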

Plotting Data

R makes plotting data (fairly) easy. The plot function is very general.
- Other packages, such as ggplot2, also exist for plotting.
plot allows scatter plots and time plots.

xt <- rnorm(100)
plot(xt)

plot(xt,main="Scatter plot of random normal variables",pch=4,col="pink")

Scatter plot points:

Types of plot:
- “p” is points.
- “l” is lines.
- “b” is both (points joined by lines), but “o” (lines drawn over the points) often looks better.
- “h” is histogram-like vertical lines.

plot(xt,main="Scatter plot of random normal variables",pch=4,col="pink",type="o")
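To compare the plot types listed above side by side, this sketch draws the same 20 random numbers four times in a 2 x 2 grid using par(mfrow = ...). The series name xs and the seed are illustrative choices, and par() is restored afterwards so later plots use the full window:

```r
set.seed(1)
xs <- rnorm(20)

# Split the plotting window into 2 rows and 2 columns
op <- par(mfrow = c(2, 2))
for (tp in c("p", "l", "o", "h")) {
  plot(xs, type = tp, main = paste0('type = "', tp, '"'))
}
par(op)   # restore the previous layout
```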

Colours:

rnums <- data.frame(x=rnorm(50),y=rnorm(50),z=rnorm(50),w=rnorm(50),v=rnorm(50))
plot(rnums$x,pch=1,col=1,ylim=range(rnums)) # ylim covers all five series
lines(rnums$y,pch=2,col=2,type="p")
lines(rnums$z,pch=3,col=3,type="p")
lines(rnums$w,pch=4,col=4,type="p")
lines(rnums$v,pch=5,col=5,type="p")

Multiple series need a legend…

plot(rnums$x,pch=1,col=1,ylab="Number",ylim=range(rnums))
lines(rnums$y,pch=2,col=2,type="p")
lines(rnums$z,pch=3,col=3,type="p")
lines(rnums$w,pch=4,col=4,type="p")
lines(rnums$v,pch=5,col=5,type="p")
legend("topleft",pch=1:5,col=1:5,legend=colnames(rnums),ncol=5)

Wider range, by name:
- Reference: Quick-R: “Graphical Parameters”

colours()

##   [1] "white"                "aliceblue"            "antiquewhite"        
##   [4] "antiquewhite1"        "antiquewhite2"        "antiquewhite3"       
##   [7] "antiquewhite4"        "aquamarine"           "aquamarine1"         
##  [10] "aquamarine2"          "aquamarine3"          "aquamarine4"         
##  [13] "azure"                "azure1"               "azure2"              
##  [16] "azure3"               "azure4"               "beige"               
##  [19] "bisque"               "bisque1"              "bisque2"             
##  [22] "bisque3"              "bisque4"              "black"               
##  [25] "blanchedalmond"       "blue"                 "blue1"               
##  [28] "blue2"                "blue3"                "blue4"               
##  [31] "blueviolet"           "brown"                "brown1"              
##  [34] "brown2"               "brown3"               "brown4"              
##  [37] "burlywood"            "burlywood1"           "burlywood2"          
##  [40] "burlywood3"           "burlywood4"           "cadetblue"           
##  [43] "cadetblue1"           "cadetblue2"           "cadetblue3"          
##  [46] "cadetblue4"           "chartreuse"           "chartreuse1"         
##  [49] "chartreuse2"          "chartreuse3"          "chartreuse4"         
##  [52] "chocolate"            "chocolate1"           "chocolate2"          
##  [55] "chocolate3"           "chocolate4"           "coral"               
##  [58] "coral1"               "coral2"               "coral3"              
##  [61] "coral4"               "cornflowerblue"       "cornsilk"            
##  [64] "cornsilk1"            "cornsilk2"            "cornsilk3"           
##  [67] "cornsilk4"            "cyan"                 "cyan1"               
##  [70] "cyan2"                "cyan3"                "cyan4"               
##  [73] "darkblue"             "darkcyan"             "darkgoldenrod"       
##  [76] "darkgoldenrod1"       "darkgoldenrod2"       "darkgoldenrod3"      
##  [79] "darkgoldenrod4"       "darkgray"             "darkgreen"           
##  [82] "darkgrey"             "darkkhaki"            "darkmagenta"         
##  [85] "darkolivegreen"       "darkolivegreen1"      "darkolivegreen2"     
##  [88] "darkolivegreen3"      "darkolivegreen4"      "darkorange"          
##  [91] "darkorange1"          "darkorange2"          "darkorange3"         
##  [94] "darkorange4"          "darkorchid"           "darkorchid1"         
##  [97] "darkorchid2"          "darkorchid3"          "darkorchid4"         
## [100] "darkred"              "darksalmon"           "darkseagreen"        
## [103] "darkseagreen1"        "darkseagreen2"        "darkseagreen3"       
## [106] "darkseagreen4"        "darkslateblue"        "darkslategray"       
## [109] "darkslategray1"       "darkslategray2"       "darkslategray3"      
## [112] "darkslategray4"       "darkslategrey"        "darkturquoise"       
## [115] "darkviolet"           "deeppink"             "deeppink1"           
## [118] "deeppink2"            "deeppink3"            "deeppink4"           
## [121] "deepskyblue"          "deepskyblue1"         "deepskyblue2"        
## [124] "deepskyblue3"         "deepskyblue4"         "dimgray"             
## [127] "dimgrey"              "dodgerblue"           "dodgerblue1"         
## [130] "dodgerblue2"          "dodgerblue3"          "dodgerblue4"         
## [133] "firebrick"            "firebrick1"           "firebrick2"          
## [136] "firebrick3"           "firebrick4"           "floralwhite"         
## [139] "forestgreen"          "gainsboro"            "ghostwhite"          
## [142] "gold"                 "gold1"                "gold2"               
## [145] "gold3"                "gold4"                "goldenrod"           
## [148] "goldenrod1"           "goldenrod2"           "goldenrod3"          
## [151] "goldenrod4"           "gray"                 "gray0"               
## [154] "gray1"                "gray2"                "gray3"               
## [157] "gray4"                "gray5"                "gray6"               
## [160] "gray7"                "gray8"                "gray9"               
## [163] "gray10"               "gray11"               "gray12"              
## [166] "gray13"               "gray14"               "gray15"              
## [169] "gray16"               "gray17"               "gray18"              
## [172] "gray19"               "gray20"               "gray21"              
## [175] "gray22"               "gray23"               "gray24"              
## [178] "gray25"               "gray26"               "gray27"              
## [181] "gray28"               "gray29"               "gray30"              
## [184] "gray31"               "gray32"               "gray33"              
## [187] "gray34"               "gray35"               "gray36"              
## [190] "gray37"               "gray38"               "gray39"              
## [193] "gray40"               "gray41"               "gray42"              
## [196] "gray43"               "gray44"               "gray45"              
## [199] "gray46"               "gray47"               "gray48"              
## [202] "gray49"               "gray50"               "gray51"              
## [205] "gray52"               "gray53"               "gray54"              
## [208] "gray55"               "gray56"               "gray57"              
## [211] "gray58"               "gray59"               "gray60"              
## [214] "gray61"               "gray62"               "gray63"              
## [217] "gray64"               "gray65"               "gray66"              
## [220] "gray67"               "gray68"               "gray69"              
## [223] "gray70"               "gray71"               "gray72"              
## [226] "gray73"               "gray74"               "gray75"              
## [229] "gray76"               "gray77"               "gray78"              
## [232] "gray79"               "gray80"               "gray81"              
## [235] "gray82"               "gray83"               "gray84"              
## [238] "gray85"               "gray86"               "gray87"              
## [241] "gray88"               "gray89"               "gray90"              
## [244] "gray91"               "gray92"               "gray93"              
## [247] "gray94"               "gray95"               "gray96"              
## [250] "gray97"               "gray98"               "gray99"              
## [253] "gray100"              "green"                "green1"              
## [256] "green2"               "green3"               "green4"              
## [259] "greenyellow"          "grey"                 "grey0"               
## [262] "grey1"                "grey2"                "grey3"               
## [265] "grey4"                "grey5"                "grey6"               
## [268] "grey7"                "grey8"                "grey9"               
## [271] "grey10"               "grey11"               "grey12"              
## [274] "grey13"               "grey14"               "grey15"              
## [277] "grey16"               "grey17"               "grey18"              
## [280] "grey19"               "grey20"               "grey21"              
## [283] "grey22"               "grey23"               "grey24"              
## [286] "grey25"               "grey26"               "grey27"              
## [289] "grey28"               "grey29"               "grey30"              
## [292] "grey31"               "grey32"               "grey33"              
## [295] "grey34"               "grey35"               "grey36"              
## [298] "grey37"               "grey38"               "grey39"              
## [301] "grey40"               "grey41"               "grey42"              
## [304] "grey43"               "grey44"               "grey45"              
## [307] "grey46"               "grey47"               "grey48"              
## [310] "grey49"               "grey50"               "grey51"              
## [313] "grey52"               "grey53"               "grey54"              
## [316] "grey55"               "grey56"               "grey57"              
## [319] "grey58"               "grey59"               "grey60"              
## [322] "grey61"               "grey62"               "grey63"              
## [325] "grey64"               "grey65"               "grey66"              
## [328] "grey67"               "grey68"               "grey69"              
## [331] "grey70"               "grey71"               "grey72"              
## [334] "grey73"               "grey74"               "grey75"              
## [337] "grey76"               "grey77"               "grey78"              
## [340] "grey79"               "grey80"               "grey81"              
## [343] "grey82"               "grey83"               "grey84"              
## [346] "grey85"               "grey86"               "grey87"              
## [349] "grey88"               "grey89"               "grey90"              
## [352] "grey91"               "grey92"               "grey93"              
## [355] "grey94"               "grey95"               "grey96"              
## [358] "grey97"               "grey98"               "grey99"              
## [361] "grey100"              "honeydew"             "honeydew1"           
## [364] "honeydew2"            "honeydew3"            "honeydew4"           
## [367] "hotpink"              "hotpink1"             "hotpink2"            
## [370] "hotpink3"             "hotpink4"             "indianred"           
## [373] "indianred1"           "indianred2"           "indianred3"          
## [376] "indianred4"           "ivory"                "ivory1"              
## [379] "ivory2"               "ivory3"               "ivory4"              
## [382] "khaki"                "khaki1"               "khaki2"              
## [385] "khaki3"               "khaki4"               "lavender"            
## [388] "lavenderblush"        "lavenderblush1"       "lavenderblush2"      
## [391] "lavenderblush3"       "lavenderblush4"       "lawngreen"           
## [394] "lemonchiffon"         "lemonchiffon1"        "lemonchiffon2"       
## [397] "lemonchiffon3"        "lemonchiffon4"        "lightblue"           
## [400] "lightblue1"           "lightblue2"           "lightblue3"          
## [403] "lightblue4"           "lightcoral"           "lightcyan"           
## [406] "lightcyan1"           "lightcyan2"           "lightcyan3"          
## [409] "lightcyan4"           "lightgoldenrod"       "lightgoldenrod1"     
## [412] "lightgoldenrod2"      "lightgoldenrod3"      "lightgoldenrod4"     
## [415] "lightgoldenrodyellow" "lightgray"            "lightgreen"          
## [418] "lightgrey"            "lightpink"            "lightpink1"          
## [421] "lightpink2"           "lightpink3"           "lightpink4"          
## [424] "lightsalmon"          "lightsalmon1"         "lightsalmon2"        
## [427] "lightsalmon3"         "lightsalmon4"         "lightseagreen"       
## [430] "lightskyblue"         "lightskyblue1"        "lightskyblue2"       
## [433] "lightskyblue3"        "lightskyblue4"        "lightslateblue"      
## [436] "lightslategray"       "lightslategrey"       "lightsteelblue"      
## [439] "lightsteelblue1"      "lightsteelblue2"      "lightsteelblue3"     
## [442] "lightsteelblue4"      "lightyellow"          "lightyellow1"        
## [445] "lightyellow2"         "lightyellow3"         "lightyellow4"        
## [448] "limegreen"            "linen"                "magenta"             
## [451] "magenta1"             "magenta2"             "magenta3"            
## [454] "magenta4"             "maroon"               "maroon1"             
## [457] "maroon2"              "maroon3"              "maroon4"             
## [460] "mediumaquamarine"     "mediumblue"           "mediumorchid"        
## [463] "mediumorchid1"        "mediumorchid2"        "mediumorchid3"       
## [466] "mediumorchid4"        "mediumpurple"         "mediumpurple1"       
## [469] "mediumpurple2"        "mediumpurple3"        "mediumpurple4"       
## [472] "mediumseagreen"       "mediumslateblue"      "mediumspringgreen"   
## [475] "mediumturquoise"      "mediumvioletred"      "midnightblue"        
## [478] "mintcream"            "mistyrose"            "mistyrose1"          
## [481] "mistyrose2"           "mistyrose3"           "mistyrose4"          
## [484] "moccasin"             "navajowhite"          "navajowhite1"        
## [487] "navajowhite2"         "navajowhite3"         "navajowhite4"        
## [490] "navy"                 "navyblue"             "oldlace"             
## [493] "olivedrab"            "olivedrab1"           "olivedrab2"          
## [496] "olivedrab3"           "olivedrab4"           "orange"              
## [499] "orange1"              "orange2"              "orange3"             
## [502] "orange4"              "orangered"            "orangered1"          
## [505] "orangered2"           "orangered3"           "orangered4"          
## [508] "orchid"               "orchid1"              "orchid2"             
## [511] "orchid3"              "orchid4"              "palegoldenrod"       
## [514] "palegreen"            "palegreen1"           "palegreen2"          
## [517] "palegreen3"           "palegreen4"           "paleturquoise"       
## [520] "paleturquoise1"       "paleturquoise2"       "paleturquoise3"      
## [523] "paleturquoise4"       "palevioletred"        "palevioletred1"      
## [526] "palevioletred2"       "palevioletred3"       "palevioletred4"      
## [529] "papayawhip"           "peachpuff"            "peachpuff1"          
## [532] "peachpuff2"           "peachpuff3"           "peachpuff4"          
## [535] "peru"                 "pink"                 "pink1"               
## [538] "pink2"                "pink3"                "pink4"               
## [541] "plum"                 "plum1"                "plum2"               
## [544] "plum3"                "plum4"                "powderblue"          
## [547] "purple"               "purple1"              "purple2"             
## [550] "purple3"              "purple4"              "red"                 
## [553] "red1"                 "red2"                 "red3"                
## [556] "red4"                 "rosybrown"            "rosybrown1"          
## [559] "rosybrown2"           "rosybrown3"           "rosybrown4"          
## [562] "royalblue"            "royalblue1"           "royalblue2"          
## [565] "royalblue3"           "royalblue4"           "saddlebrown"         
## [568] "salmon"               "salmon1"              "salmon2"             
## [571] "salmon3"              "salmon4"              "sandybrown"          
## [574] "seagreen"             "seagreen1"            "seagreen2"           
## [577] "seagreen3"            "seagreen4"            "seashell"            
## [580] "seashell1"            "seashell2"            "seashell3"           
## [583] "seashell4"            "sienna"               "sienna1"             
## [586] "sienna2"              "sienna3"              "sienna4"             
## [589] "skyblue"              "skyblue1"             "skyblue2"            
## [592] "skyblue3"             "skyblue4"             "slateblue"           
## [595] "slateblue1"           "slateblue2"           "slateblue3"          
## [598] "slateblue4"           "slategray"            "slategray1"          
## [601] "slategray2"           "slategray3"           "slategray4"          
## [604] "slategrey"            "snow"                 "snow1"               
## [607] "snow2"                "snow3"                "snow4"               
## [610] "springgreen"          "springgreen1"         "springgreen2"        
## [613] "springgreen3"         "springgreen4"         "steelblue"           
## [616] "steelblue1"           "steelblue2"           "steelblue3"          
## [619] "steelblue4"           "tan"                  "tan1"                
## [622] "tan2"                 "tan3"                 "tan4"                
## [625] "thistle"              "thistle1"             "thistle2"            
## [628] "thistle3"             "thistle4"             "tomato"              
## [631] "tomato1"              "tomato2"              "tomato3"             
## [634] "tomato4"              "turquoise"            "turquoise1"          
## [637] "turquoise2"           "turquoise3"           "turquoise4"          
## [640] "violet"               "violetred"            "violetred1"          
## [643] "violetred2"           "violetred3"           "violetred4"          
## [646] "wheat"                "wheat1"               "wheat2"              
## [649] "wheat3"               "wheat4"               "whitesmoke"          
## [652] "yellow"               "yellow1"              "yellow2"             
## [655] "yellow3"              "yellow4"              "yellowgreen"

colours()[grep("pink",colours())]

##  [1] "deeppink"   "deeppink1"  "deeppink2"  "deeppink3"  "deeppink4" 
##  [6] "hotpink"    "hotpink1"   "hotpink2"   "hotpink3"   "hotpink4"  
## [11] "lightpink"  "lightpink1" "lightpink2" "lightpink3" "lightpink4"
## [16] "pink"       "pink1"      "pink2"      "pink3"      "pink4"

Challenge: Redo the earlier graph (rnums) with random colours.
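One possible answer, assuming "random colours" means colour names drawn with sample() from colours(). It uses points() in a loop in place of the repeated lines(..., type="p") calls, and sets ylim so no series is cut off. Some draws will be very pale (e.g. "white"), so rerun if a series disappears:

```r
rnums <- data.frame(x = rnorm(50), y = rnorm(50), z = rnorm(50),
                    w = rnorm(50), v = rnorm(50))

# Draw one colour name at random for each series
cols <- sample(colours(), ncol(rnums))

plot(rnums$x, pch = 1, col = cols[1], ylab = "Number", ylim = range(rnums))
for (j in 2:ncol(rnums)) {
  points(rnums[[j]], pch = j, col = cols[j])
}
legend("topleft", pch = 1:5, col = cols, legend = colnames(rnums), ncol = 5)
```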
A bit sad, but…

x<-seq(2,7,by=.02)
y<-x+.25*x^3+rnorm(length(x),0,10)
x2<-seq(3,5,by=0.06)
y2<-22+10*x2+rnorm(length(x2),0,6)
plot(x,y,pch='~',main='Plot of Village by a River',col='blue',cex=2,cex.main=2,cex.lab=1.5)
points(x2,y2,pch='^',col='red',cex=2)

Finally (for now), histograms:

hist(xt)
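hist() takes the same labelling arguments as plot(), plus breaks to suggest a number of bins (R treats it as a suggestion, not an exact count). With plot = FALSE it returns the bin counts instead of drawing. The data xs and the seed are illustrative:

```r
set.seed(2)
xs <- rnorm(100)

# A labelled histogram with roughly 20 bins
hist(xs, breaks = 20, main = "Histogram of random normal variables",
     xlab = "Value", col = "lightblue")

# Bin counts only, no plot
h <- hist(xs, breaks = 20, plot = FALSE)
sum(h$counts)   # 100: every observation falls in some bin
```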

Best Practice and RStudio

It is advisable to keep a file containing the important commands for each project.
- The history exists, but it contains all commands, even wrong ones.
Better to start an R script file containing the commands you need:
1. To open data.
2. To manipulate data.
3. To plot data and create graphical objects.
4. To run regression models.
Additional awesomeness in RStudio:
- R Markdown: A very simple way to create all sorts of documents that combine text, R code and its output.
- E.g. these slides, basic web pages containing output from code.
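A hypothetical script skeleton following the four steps above. Simulated data stand in for a read.csv() call so the script runs as is; the variable names and the true slope of 2 are illustrative assumptions:

```r
# project.R (hypothetical name)

# 1. Open data (in practice: dat <- read.csv("yourfile.csv"))
set.seed(313)
dat <- data.frame(x = rnorm(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)

# 2. Manipulate data
dat$x.sq <- dat$x^2

# 3. Plot data
plot(dat$x, dat$y, main = "y against x", xlab = "x", ylab = "y")

# 4. Run a regression model and add the fitted line to the plot
model <- lm(y ~ x, data = dat)
summary(model)
abline(model, col = "red")
```

Saving this file and rerunning it from the top reproduces every step, which the history cannot guarantee.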

Data Structures

The way data are stored in R matters:
- Vectors
- Matrices
- Data frames
- Lists
- Specialised time-series objects

Vectors

We can do a lot with vectors.

vec1 <- c(4,2,78,28,2)
vec1

## [1]  4  2 78 28  2

vec1[4]

## [1] 28

vec1[3] = 12
vec1

## [1]  4  2 12 28  2

vec2 <- seq(from=0,to=1,by=0.1)
vec2

##  [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

vec3 <- seq(0,1,0.1)
vec3

##  [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

sum(vec1)

## [1] 48

vec2+vec3

##  [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0

Often useful when manipulating data.
- E.g. see later when constructing rankings.
- Also when constructing lags and differences.
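For plain vectors, lags and differences can be built with diff() and by shifting with negative indexing. A sketch using vec1 as it stands after the vec1[3] = 12 step; the comments show the values each line produces:

```r
vec1 <- c(4, 2, 12, 28, 2)

# First difference: each element minus the one before (one element shorter)
d1 <- diff(vec1)
d1            # -2 10 16 -26

# One-period lag: drop the last element and pad the start with NA
lag1 <- c(NA, vec1[-length(vec1)])
lag1          # NA 4 2 12 28

# The difference is the series minus its lag
vec1 - lag1   # NA -2 10 16 -26
```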

Matrices

A matrix is a two-dimensional array of data.
Mathematically: \[ \Pi = \left(\begin{array}{ccc}\pi_{11} & \pi_{12} & \pi_{13}\\\pi_{21} & \pi_{22} & \pi_{23}\end{array}\right) \]
Inserting numbers, e.g.: \[ \Pi = \left(\begin{array}{ccc}3 & 1 & 6\\7 & 2 & 4\end{array}\right) \]
In R…

mat <- matrix(data=c(3,1,6,7,2,4),ncol=3,byrow=TRUE) # fill row by row to match Pi
mat

##      [,1] [,2] [,3]
## [1,]    3    1    6
## [2,]    7    2    4

mat[2,3]

## [1] 4

mat[2,]

## [1] 7 2 4

mat[,3]

## [1] 6 4

mean(mat)

## [1] 3.833333
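matrix() fills its entries column by column unless told otherwise, so the order in which the numbers are typed matters. This sketch compares the default with byrow = TRUE, which fills row by row and reproduces \(\Pi\) as written above:

```r
nums <- c(3, 1, 6, 7, 2, 4)

by.col <- matrix(nums, ncol = 3)                # default: fills down columns
by.row <- matrix(nums, ncol = 3, byrow = TRUE)  # fills across rows

by.col   # rows: 3 6 2 and 1 7 4
by.row   # rows: 3 1 6 and 7 2 4, matching Pi
```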

Data Frames

Datasets are commonly organised as data frames.
- Basically a matrix with column names, except that different columns can hold different types of data (e.g. numbers and text).

df <- data.frame("person"=1:5,"age"=c(14,35,24,51,3),"weight"=c(55,75,60,80,15))
df

##   person age weight
## 1      1  14     55
## 2      2  35     75
## 3      3  24     60
## 4      4  51     80
## 5      5   3     15

df$age

## [1] 14 35 24 51  3

mean(df$weight)

## [1] 57
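Beyond $, data frames can be indexed like matrices, filtered with logical conditions, and extended with new columns. A sketch on the same df; the new column name weight.per.year is purely illustrative:

```r
df <- data.frame(person = 1:5, age = c(14, 35, 24, 51, 3),
                 weight = c(55, 75, 60, 80, 15))

# Select by [row, column], as with a matrix
df[2, "age"]              # 35

# Keep only rows meeting a condition (note the trailing comma: all columns)
adults <- df[df$age >= 18, ]
adults                    # persons 2, 3 and 4

# Create a new column by assignment
df$weight.per.year <- df$weight / df$age
ncol(df)                  # 4
```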

Loading Data

Data are mainly loaded into data frames.
read.csv() is the simplest command for the purpose:
- But files need to be spreadsheets saved in csv format.
- With a single cell for each variable name, and variables in columns.
read.table() is a more general variant.
read.xlsx() (from the xlsx package) exists, but is fiddly (you must specify worksheets and spell their names exactly, etc.).
Packages exist to enable direct downloading of data:
- Ensures data are always up to date.
- Removes one step from a data project.
Quandl package:
- Sign up for a Quandl account to avoid a warning message.

cpi <- Quandl("UKONS/MM23_D7BT_M") #UK CPI data
plot(cpi$Date,cpi$Value,type="l")

quantmod package:
- Download from Yahoo Finance, Google Finance, FRED and others.

getSymbols("YHOO",src="google") # from google finance

##     As of 0.4-0, 'getSymbols' uses env=parent.frame() and
##  auto.assign=TRUE by default.
## 
##  This  behavior  will be  phased out in 0.5-0  when the call  will
##  default to use auto.assign=FALSE. getOption("getSymbols.env") and 
##  getOptions("getSymbols.auto.assign") are now checked for alternate defaults
## 
##  This message is shown once per session and may be disabled by setting 
##  options("getSymbols.warning4.0"=FALSE). See ?getSymbol for more details

## [1] "YHOO"

plot(YHOO)

## Warning in plot.xts(YHOO): only the univariate series will be plotted

getSymbols("GOOG",src="yahoo") # from yahoo finance

## [1] "GOOG"

plot(GOOG)

## Warning in plot.xts(GOOG): only the univariate series will be plotted

getSymbols("DEXJPUS",src="FRED") # FX rates from FRED

## [1] "DEXJPUS"

plot(DEXJPUS)
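Returning to read.csv(): to see the layout it expects (variable names in the first row, one variable per column) without needing a file to hand, this sketch writes a small data frame to a temporary csv file and reads it back. tempfile() is used only so the example is self-contained; in practice you would pass the name of your own csv file, as in the forecast competition code:

```r
# A small dataset in the expected layout
small <- data.frame(date = c("2015-01-01", "2015-02-01"), value = c(0.5, 0.3))

# Write it to a temporary csv file, without R's row numbers
f <- tempfile(fileext = ".csv")
write.csv(small, f, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps text as text
small.in <- read.csv(f, stringsAsFactors = FALSE)
small.in$date <- as.Date(small.in$date)
str(small.in)
```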

Lists

More flexible than data frames:
- Allow different elements to be different shapes and sizes.

L <- list(some=rep(10,10),loads=rnorm(100),few=c(5,2,1))
L

## $some
##  [1] 10 10 10 10 10 10 10 10 10 10
## 
## $loads
##   [1]  1.940483302 -0.530228074 -0.007000273  1.125705004  0.861079623
##   [6] -1.228192937  1.247321590  1.066988058  0.276167988 -1.139004478
##  [11] -0.766858462 -1.740075458  0.347018015  0.899099738 -0.463321940
##  [16]  0.859686890 -0.110916436  1.206734051  0.178537375 -2.109497563
##  [21]  1.771985732 -0.737970101 -1.191087245 -1.601451476  0.301214567
##  [26] -0.341526024  0.090501267 -0.432785116 -1.432635415 -0.752748846
##  [31]  0.491984917  0.774979263 -0.704949820 -0.968474658 -1.379463180
##  [36]  0.354956001  0.575155455 -0.907711249 -0.924924393  0.096493377
##  [41] -0.440367684  0.079522338  1.128982727 -0.197281960  1.706570182
##  [46]  0.421466152  0.277162637  0.818629392  0.890684100  0.169291206
##  [51] -0.835707784 -1.084673093  0.109420124 -1.403637379 -1.109292043
##  [56] -0.541588185 -1.012795004 -1.633326169 -1.596925172  2.783458464
##  [61] -0.590140829  0.058860715 -0.716817377 -1.058080914  0.148069259
##  [66] -0.782639325 -0.550971202 -0.504858722  0.098528235  0.058069652
##  [71] -0.708511082  0.942307031  0.800899426 -0.766255767  1.485849166
##  [76] -0.689603770 -1.237895330  0.383032400 -0.046925825  0.600963557
##  [81]  0.241653724  1.084610507 -0.982266901 -2.208240598  1.557096116
##  [86] -0.138823297  0.722630517  0.854933364  0.246394306 -0.508719749
##  [91] -0.694801044  0.757515083 -1.011761635  0.576484479  0.149412712
##  [96]  0.537440416 -1.243343705  1.863930546  0.703049816  1.440098555
## 
## $few
## [1] 5 2 1

L$loads

##   [1]  1.940483302 -0.530228074 -0.007000273  1.125705004  0.861079623
##   [6] -1.228192937  1.247321590  1.066988058  0.276167988 -1.139004478
##  [11] -0.766858462 -1.740075458  0.347018015  0.899099738 -0.463321940
##  [16]  0.859686890 -0.110916436  1.206734051  0.178537375 -2.109497563
##  [21]  1.771985732 -0.737970101 -1.191087245 -1.601451476  0.301214567
##  [26] -0.341526024  0.090501267 -0.432785116 -1.432635415 -0.752748846
##  [31]  0.491984917  0.774979263 -0.704949820 -0.968474658 -1.379463180
##  [36]  0.354956001  0.575155455 -0.907711249 -0.924924393  0.096493377
##  [41] -0.440367684  0.079522338  1.128982727 -0.197281960  1.706570182
##  [46]  0.421466152  0.277162637  0.818629392  0.890684100  0.169291206
##  [51] -0.835707784 -1.084673093  0.109420124 -1.403637379 -1.109292043
##  [56] -0.541588185 -1.012795004 -1.633326169 -1.596925172  2.783458464
##  [61] -0.590140829  0.058860715 -0.716817377 -1.058080914  0.148069259
##  [66] -0.782639325 -0.550971202 -0.504858722  0.098528235  0.058069652
##  [71] -0.708511082  0.942307031  0.800899426 -0.766255767  1.485849166
##  [76] -0.689603770 -1.237895330  0.383032400 -0.046925825  0.600963557
##  [81]  0.241653724  1.084610507 -0.982266901 -2.208240598  1.557096116
##  [86] -0.138823297  0.722630517  0.854933364  0.246394306 -0.508719749
##  [91] -0.694801044  0.757515083 -1.011761635  0.576484479  0.149412712
##  [96]  0.537440416 -1.243343705  1.863930546  0.703049816  1.440098555

names(L)

## [1] "some"  "loads" "few"
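Elements of a list can also be pulled out by position or name, and a function can be applied to every element at once. A minimal sketch that rebuilds the same list `L` (the `rnorm()` draws will differ from the output shown above, and the `note` element is purely illustrative):

```r
# Rebuild the list from above; rnorm() draws will differ on each run
L <- list(some = rep(10, 10), loads = rnorm(100), few = c(5, 2, 1))

# Double brackets (or $) return the element itself
L[["few"]]            # same as L$few: 5 2 1
# Single brackets return a smaller list containing that element
str(L["few"])

# Apply a function to every element: here, the length of each
sapply(L, length)     # some 10, loads 100, few 3

# Lists can be extended by assigning to a new name; elements may differ in class
L$note <- "elements can be of different classes"
names(L)              # "some" "loads" "few" "note"
```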

Other Structures

Explicit time-series structures exist:
- Helpful for creating lags, differences, etc.
ts creates a time series object:
- Most forecast package commands need ts-formatted data.

cpi.t <- ts(cpi$Value[order(cpi$Date)],start=c(1988,6),frequency=12)
plot(cpi.t)
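As a minimal sketch of why ts objects help with lags and differences, here is a short monthly series with made-up values (not the CPI data). `diff()`, `stats::lag()` and `ts.union()` are base R functions that keep track of the dates for us:

```r
# Hypothetical monthly index values starting July 2014
x <- ts(c(100, 101, 103, 102, 105, 107), start = c(2014, 7), frequency = 12)

# First differences: month-on-month changes, starting one period later
dx <- diff(x)
print(dx)             # 1 2 -1 3 2, from August 2014

# stats::lag(x, -1) shifts the series so each value appears one period later
x.lag <- stats::lag(x, -1)
print(start(x.lag))   # 2014 8

# ts.union() lines the two series up by date, padding with NA where needed
print(ts.union(x, x.lag))
```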

zoo is another time-series related package:
- More flexible for daily data.
- gets and isat require zoo-formatted data.

dex.t <- zoo(DEXJPUS,order.by=index(DEXJPUS))
plot(dex.t)

xts also exists, but we won’t use this much.

Not Available Data

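Often you will find NA entries in a dataset.
- NA means "not available".
This could be because data have not been released yet.
- Or because of a calculation error: an undefined result, such as the square root of a negative number, is stored as NaN ("not a number").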

xtr <- sqrt(xt)

## Warning in sqrt(xt): NaNs produced

We need to be careful about NAs:
- Any calculation involving an NA (or NaN) returns NA (or NaN).

mean(xtr)

## [1] NaN

mean(xtr,na.rm=T)

## [1] 0.8202736

mean(xtr[is.na(xtr)==FALSE])

## [1] 0.8202736
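R distinguishes NA (missing) from NaN (not a number), but `is.na()` flags both, which is why the two approaches above agree. A minimal sketch on a made-up vector:

```r
# Hypothetical vector: one negative value and one missing value
x <- c(4, -1, 9, NA)
y <- suppressWarnings(sqrt(x))  # sqrt(-1) is NaN; sqrt(NA) stays NA
print(y)                         # 2 NaN 3 NA

print(is.na(y))    # TRUE for both NaN and NA
print(is.nan(y))   # TRUE only for NaN

# Any NA/NaN makes the plain mean missing (R may report NA or NaN)
print(mean(y))
# Dropping them first gives the mean of 2 and 3
print(mean(y, na.rm = TRUE))     # 2.5
```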

Data Classes

Thus far, all our data have been numeric.
Data are often more qualitative:
- Names, colours, models, makes, etc.
- We represent such data as strings.
Three main classes of data in R:
- Numeric (already seen).
- Character (strings).
- Dates, times and date-times (classes Date and POSIXct/POSIXlt).
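The `class()` function reports which of these classes an object belongs to. A minimal sketch with illustrative values:

```r
x  <- 6853.4                                          # numeric
s  <- "James Reade"                                   # character
d  <- as.Date("2015-02-06")                           # a date
tm <- as.POSIXct("2015-02-06 09:30:00", tz = "UTC")   # a date and time

class(x)   # "numeric"
class(s)   # "character"
class(d)   # "Date"
class(tm)  # "POSIXct" "POSIXt"
```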

Character Data

Put text between double quotes (") to tell R it is a string.

name <- "James Reade"
name

## [1] "James Reade"

#name <- James Reade #(this produces an error if uncommented)

Computations on strings are not possible:
- But categorical data can still be used in regressions, where R turns each category into a dummy variable.

degree.class <- c("1","2:1","2:2","3","Fail")
degree.class

## [1] "1"    "2:1"  "2:2"  "3"    "Fail"
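To see both points, a minimal sketch using the `degree.class` vector: arithmetic on strings fails, whereas converting it to a factor gives the dummy variables a regression would use. `model.matrix()` is used here only to reveal those dummies; the error text depends on R's language setting, so it is not reproduced:

```r
degree.class <- c("1", "2:1", "2:2", "3", "Fail")

# Arithmetic on character data is an error; catch it and keep the message
res <- tryCatch("1" + "2:1", error = function(e) conditionMessage(e))
print(res)

# A factor stores the categories as levels
f <- factor(degree.class)
print(levels(f))

# model.matrix() shows the design matrix lm() would build:
# an intercept plus one 0/1 dummy per level except the first (the baseline)
mm <- model.matrix(~ f)
print(mm)
```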

Dates

Dates are complicated, but essential for time series and forecasting.
Examples:

head(cpi$Date)

## [1] "2014-12-31" "2014-11-30" "2014-10-31" "2014-09-30" "2014-08-31"
## [6] "2014-07-31"

head(dave09$Week)

## [1] "2004-01-04 - 2004-01-10" "2004-01-11 - 2004-01-17"
## [3] "2004-01-18 - 2004-01-24" "2004-01-25 - 2004-01-31"
## [5] "2004-02-01 - 2004-02-07" "2004-02-08 - 2004-02-14"

head(trade09$DateTime)

## [1] "20150206 09:30:00" "20150109 09:30:00" "20141210 09:30:00"
## [4] "20141107 09:30:00" "20141010 08:30:00" "20140909 08:30:00"

What we do depends on what we want to do with the data.
If R recognises a variable as a date, all is well.
- We need dates to order series properly.
- But dates are often loaded as character strings; str() shows which:

str(cpi$Date)

##  Date[1:324], format: "2014-12-31" "2014-11-30" "2014-10-31" "2014-09-30" ...

str(dave09$Week)

##  chr [1:579] "2004-01-04 - 2004-01-10" "2004-01-11 - 2004-01-17" ...

str(trade09$DateTime)

##  chr [1:91] "20150206 09:30:00" "20150109 09:30:00" ...

dave09$date <- substr(dave09$Week,1,10)
dave09$date <- as.Date(dave09$date)
str(dave09$date)

##  Date[1:579], format: "2004-01-04" "2004-01-11" "2004-01-18" "2004-01-25" ...

trade09$date <- substr(trade09$DateTime,1,8)
trade09$date <- as.Date(trade09$date,"%Y%m%d")
str(trade09$date)

##  Date[1:91], format: "2015-02-06" "2015-01-09" "2014-12-10" "2014-11-07" ...
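If the time of day matters, `as.POSIXct()` with a format string keeps it, and once strings are proper dates they can be subtracted and sorted. A minimal sketch using one label of each style shown above (the `tz = "UTC"` choice is an assumption for illustration):

```r
week <- "2015-02-01 - 2015-02-07"   # Google Trends style week label
dt   <- "20150206 09:30:00"         # release time stamp, as in trade09

d1 <- as.Date(substr(week, 1, 10))          # default format is "%Y-%m-%d"
d2 <- as.Date(substr(dt, 1, 8), "%Y%m%d")   # explicit format string

# as.POSIXct() keeps the time of day as well as the date
t2 <- as.POSIXct(dt, format = "%Y%m%d %H:%M:%S", tz = "UTC")

print(d2 - d1)               # Time difference of 5 days
print(format(t2, "%H:%M"))   # "09:30"
# Proper dates sort chronologically
print(sort(c(d2, d1)))
```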

Numerical

Sometimes numerical data will be loaded as character data.
- Check by looking in the Environment tab or by using the str() function.
We can use the as.numeric() function to coerce data to be numerical.
The as. functions are a family of functions that coerce data to the class you want.
- Perhaps the most useful is as.Date(), as already seen.
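Two common pitfalls when coercing with `as.numeric()`, sketched on made-up values: strings that are not valid numbers (including ones with thousands separators) become NA, and factors convert to their internal codes rather than their labels:

```r
x <- c("6853.4", "1,234", "n/a")

# Strings that are not valid numbers become NA (with a warning)
y <- suppressWarnings(as.numeric(x))
print(y)                             # 6853.4 NA NA

# Removing thousands separators first rescues "1,234"
z <- as.numeric(gsub(",", "", x[1:2]))
print(z)                             # 6853.4 1234.0

# Factors are stored as integer codes: convert via character
f <- factor(c("10", "20", "10"))
print(as.numeric(f))                 # 1 2 1 (the codes)
print(as.numeric(as.character(f)))   # 10 20 10 (the values)
```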

The If Statement

A conditional statement:
- Carry out a command only if a condition holds.
E.g.: exit the building if the fire alarm sounds.

r <- "great"
if (r == "great") {
  print("everything is great")
} else {
  print("everything is not great")
}

## [1] "everything is great"

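Conditions are particularly important when manipulating data.
- A more subtle, vectorised variant tests the condition for every row at once, giving a TRUE/FALSE variable: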

dave09$prime.minister <- dave09$date>as.Date("2010-05-11")

Better, for regression purposes, is a 0/1 dummy variable:

dave09$prime.minister <- as.numeric(dave09$date>as.Date("2010-05-11"))
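What these two lines produce can be checked on a handful of made-up week-start dates, rather than the full dave09 data:

```r
# Hypothetical week-start dates either side of 11 May 2010
dates <- as.Date(c("2009-05-03", "2010-05-09", "2010-05-16", "2014-01-05"))

after <- dates > as.Date("2010-05-11")   # one TRUE/FALSE per date
print(after)                             # FALSE FALSE TRUE TRUE

pm <- as.numeric(after)                  # TRUE -> 1, FALSE -> 0
print(pm)                                # 0 0 1 1
```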

We can have multiple conditions:
- E.g. student studies English AND Maths at A-level: if (subject.English==1 & subject.Maths==1)
- E.g. student studies English OR Maths at A-level: if (subject.English==1 | subject.Maths==1)
Be careful when combining statements:
- Q: Is it a boy or a girl?
- A: Yes.
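A minimal sketch of combining conditions, with hypothetical 0/1 indicators `subject.English` and `subject.Maths` for four students. On whole vectors `&` and `|` work element by element, `ifelse()` is the vectorised form of if/else, and inside `if()` a single TRUE/FALSE is needed, for which `&&` is the usual choice:

```r
# Hypothetical 0/1 indicators for four students
subject.English <- c(1, 1, 0, 0)
subject.Maths   <- c(1, 0, 1, 0)

# Element-by-element AND / OR give one result per student
both   <- as.numeric(subject.English == 1 & subject.Maths == 1)
either <- as.numeric(subject.English == 1 | subject.Maths == 1)
print(both)     # 1 0 0 0
print(either)   # 1 1 1 0

# ifelse() is the vectorised counterpart of if/else
print(ifelse(both == 1, "English and Maths", "not both"))

# if() needs a single TRUE/FALSE, so test one student at a time
i <- 1
if (subject.English[i] == 1 && subject.Maths[i] == 1) {
  print("student 1 studies both")
}
```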

For Loops

Instead of writing out repetitive commands, we can use a for loop.
A loop repeats the commands inside it as many times as we specify.

for(i in 1:10) {
  print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

E.g. the earlier plot of the rnums variables, one command per variable:

plot(rnums$x,pch=1,col=1)
lines(rnums$y,pch=2,col=2,type="p")
lines(rnums$z,pch=3,col=3,type="p")
lines(rnums$w,pch=4,col=4,type="p")
lines(rnums$v,pch=5,col=5,type="p")

Instead create a for loop to make this less burdensome (and tidier):

plot(rnums$x,pch=1,col=1)
for(i in 2:5) {
  lines(rnums[,i],pch=i,col=i,type="p")
}
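Loops are also used to store results, not just to plot. A minimal sketch, assuming a stand-in `rnums` with five columns of random numbers (the original rnums is not shown here): pre-allocate a vector, fill one element per pass, and check the answer against the built-in `colMeans()`:

```r
set.seed(313)
# Hypothetical stand-in for rnums: five columns of random numbers
rnums <- data.frame(x = rnorm(20), y = rnorm(20), z = rnorm(20),
                    w = rnorm(20), v = rnorm(20))

# Pre-allocate a vector, then fill one element per pass of the loop
col.means <- numeric(ncol(rnums))
for (i in seq_along(rnums)) {
  col.means[i] <- mean(rnums[, i])
}
names(col.means) <- colnames(rnums)
print(col.means)

# The built-in colMeans() gives the same answer
print(all.equal(col.means, colMeans(rnums)))   # TRUE
```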

Tips

Always print things out.
- Especially when writing loops, or anything complicated.
Copying and pasting is a great way to learn.
Most useful functions:
- table: an easy way to understand basic patterns in data.
- grep/regexpr: important for finding particular patterns in text.
Get practice writing out code yourself: it makes you more efficient.
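A minimal sketch of these functions on a made-up vector of degree classes. Note that in patterns "." matches any character (so `grep("What.", ...)` in the code below also matches "What" followed by anything); `fixed = TRUE` treats the pattern literally:

```r
degree.class <- c("2:1", "2:1", "1", "2:2", "2:1", "3")   # hypothetical

# table() counts how often each value occurs
print(table(degree.class))

# grep() returns the positions of elements matching a pattern
print(grep("^2:", degree.class))     # 1 2 4 5
# grepl() returns TRUE/FALSE instead of positions
print(grepl("^2:", degree.class))

# regexpr() gives where the first match starts (-1 if no match)
print(regexpr(":", degree.class))

# Use fixed = TRUE to match a literal dot
print(grepl(".", c("a.b", "ab"), fixed = TRUE))   # TRUE FALSE
```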

R and Forecasting League Table

#load data
wk1 <- read.csv("Wk1.csv",stringsAsFactors=F)
wk1$Wk <- 1
wk1$X <- NULL
wk1 <- wk1[!(wk1$Timestamp=="15/01/2015 16:51:56" & wk1$Name=="James Reade"),]
wk2 <- read.csv("Wk2.csv",stringsAsFactors=F)
wk2$Wk <- 2
wk3 <- read.csv("Wk3.csv",stringsAsFactors=F)
wk3$Wk <- 3
wk4 <- read.csv("Wk4.csv",stringsAsFactors=F)
wk4$Wk <- 4

#want to combine data so first collect all variable names
total.cols <- union(union(colnames(wk1),colnames(wk2)),union(colnames(wk3),colnames(wk4)))
#next add empty columns to datasets with variable names
for(i in setdiff(total.cols,colnames(wk1))) {wk1[[i]] <- NA}
for(i in setdiff(total.cols,colnames(wk2))) {wk2[[i]] <- NA}
for(i in setdiff(total.cols,colnames(wk3))) {wk3[[i]] <- NA}
for(i in setdiff(total.cols,colnames(wk4))) {wk4[[i]] <- NA}
forcs <- rbind(wk1,wk2,wk3,wk4)
forcs <- forcs[forcs$Name!="Outcome" & forcs$Name!="",]
forcs$Name <- gsub("Desiati Cosimo","Cosimo Desiati",forcs$Name)

#outcomes
outcomes <- data.frame("temp"=0,stringsAsFactors=F)
for(i in total.cols) {outcomes[[i]] <- NA}
outcomes$temp <- NULL
outcomes$What.value.will.FTSE.close.at.on.Friday.January.16. <- ftse$Close[ftse$Date==as.Date("2015-01-16")]
outcomes$What.value.will.the.FTSE.close.at.on.Friday.January.23. <- ftse$Close[ftse$Date==as.Date("2015-01-23")]
outcomes$What.value.will.the.FTSE.close.at.on.Friday.January.30. <- ftse$Close[ftse$Date==as.Date("2015-01-30")]
outcomes$What.value.will.the.FTSE.close.at.on.Friday.February.6. <- ftse$Close[ftse$Date==as.Date("2015-02-06")]
outcomes$What.will.be.the.relative.search.volume.for.David.Cameron.in.the.week.commencing.January.11. <- 
  dave09$david.cameron[dave09$Week=="2015-01-11 - 2015-01-17"]
outcomes$What.will.be.the.relative.search.volume.for.David.Cameron.in.the.week.commencing.January.18. <- 
  dave09$david.cameron[dave09$Week=="2015-01-18 - 2015-01-24"]
outcomes$What.will.be.the.relative.search.volume.for.David.Cameron.in.the.week.commencing.January.25. <- 
  dave09$david.cameron[dave09$Week=="2015-01-25 - 2015-01-31"]
outcomes$What.will.be.the.relative.search.volume.for.David.Cameron.in.the.week.commencing.February.1. <- 
  dave09$david.cameron[dave09$Week=="2015-02-01 - 2015-02-07"]
gdp09 <- read.csv("gdp09.csv",stringsAsFactors=F)
outcomes$What.will.GDP.growth..QoQ..be.for.2014Q4. <- gdp09$Actual[gdp09$DateTime=="20150127 09:30:00"]
outcomes$What.will.the.total.trade.balance.be.for.December.2014. <- 
  trade09$Actual[trade09$DateTime=="20150206 09:30:00"]
mgage09 <- read.csv("mortgage_090215.csv",stringsAsFactors=F)
outcomes$What.will.mortgage.approvals.be.for.December.2014. <- mgage09$Actual[mgage09$DateTime=="20150130 09:30:00"]

#first create absolute percentage errors for each forecast (as proportions, not multiplied by 100)
abspc <- data.frame("Wk"=forcs$Wk,"Name"=forcs$Name)
for(i in colnames(outcomes)[grep("What.",colnames(outcomes))]) {
  abspc[[i]] <- abs((forcs[,i] - outcomes[,i])/outcomes[,i])
}

#first metric: mean absolute percentage error (MAPE), rounded to 1 decimal place
#first create matrix with column per name and forecasts down columns
ppl <- data.frame("temp"=rep(NA,4*NROW(grep("What.",colnames(abspc)))))
names <- abspc$Name[duplicated(abspc$Name)==F]
for(i in names) {
  person.forcs <- abspc[abspc$Name==i,c(1,grep("What.",colnames(abspc)))]
  full.person.forcs <- c()
  for(j in 1:4) {
    if(j %in% person.forcs$Wk) {
      full.person.forcs <- cbind(full.person.forcs,
                                 t(as.numeric(person.forcs[person.forcs$Wk==j,
                                                           grep("What.",colnames(person.forcs))])))
    } else {
      full.person.forcs <- cbind(full.person.forcs,t(rep(NA,NROW(grep("What.",colnames(person.forcs))))))
    }    
  }
  ppl[[i]] <- t(full.person.forcs)
}
ppl$temp <- NULL
mape <- round(colMeans(ppl,na.rm=T),1)
#tie-breaker: number of forecasts
no.forcs <- colSums(is.na(ppl)==F)
#table
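#Adjusted: MAPE minus a small bonus per forecast submitted, used as a tie-breaker
lge.tab <- data.frame("Name"=colnames(ppl),"MAPE"=as.numeric(mape),"No. Forecasts"=as.numeric(no.forcs),
                      "Adjusted"=as.numeric(mape)-as.numeric(no.forcs)/2000,
                      check.names=FALSE) #keep the space in "No. Forecasts"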
lge.tab <- lge.tab[order(lge.tab$Adjusted),]
kable(lge.tab[1:10,])

    Name                  MAPE  No. Forecasts  Adjusted
6   James Reade            0.1             23    0.0885
8   Lloyd Morrish-Thomas   0.3             20    0.2900
12  Adam Shermon           0.3              5    0.2975
4   Will Colwell           0.4             11    0.3945
16  Chris Tucker           0.4              8    0.3960
17  Luke                   0.4              2    0.3990
19  Matthew                0.5              3    0.4985
3   Neil Chandratreya      0.5              2    0.4990
13  Marios H               0.5              2    0.4990
18  JMW                    0.5              2    0.4990
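The scoring rule can be checked on a tiny example. This is a minimal sketch, not the code used to build the table above: the forecasters Alice and Bob and all the numbers are hypothetical. Both round to the same MAPE, so the tie-breaker (subtracting the number of forecasts divided by 2000, as in lge.tab) decides the order:

```r
# Hypothetical outcomes for three questions and two forecasters' answers
outcome <- c(6853.4, 5, 0.5)
forc <- data.frame(Alice = c(6800, 6, NA),   # Alice skipped question 3
                   Bob   = c(6900, 5, 0.6))

# Absolute errors as a proportion of the outcome (each column minus outcome)
ape <- abs((forc - outcome) / outcome)

# MAPE to 1 decimal place, ignoring questions not answered
mape <- round(colMeans(ape, na.rm = TRUE), 1)
# Tie-breaker: more forecasts lowers the adjusted score slightly
n.forcs  <- colSums(!is.na(ape))
adjusted <- mape - n.forcs / 2000

lge <- data.frame(Name = names(mape), MAPE = as.numeric(mape),
                  Forecasts = as.numeric(n.forcs), Adjusted = as.numeric(adjusted),
                  stringsAsFactors = FALSE)
print(lge[order(lge$Adjusted), ])   # Bob (3 forecasts) ranks above Alice (2)
```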

EC313 Lecture, Week 5

EC313 Section 2: Practical
