Data 607 - Project #1: WTI - Moving Average Crossover Strategy

Silverio J. Vasquez
Sept. 26th, 2017

Goal of R Script

This R script goes to the EIA website and scrapes the daily WTI prices. A weekly average is calculated, then 13-,26-, and 52-week averages are calculated for a simple moving avreage crossover strategy.

Then a long/short signal is created if the weekly average is above/below the 52-week average +/- 1 standard deviation.

EIA Website

https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=RCLC1&f=D

Using the 'rvest' package, I extracted the necessary fields from the webpage.

The 'rvest' package was by Hadley Wickham - the father of tidy data and creater of many of the tidy data packages we're learning about.

The main two functions from 'rvest' that allowed me to scrape the webpage fairly quickly and easily were 'html_nodes' and 'html_text'.

Using rvest to read HTML

First read all the html for the website. The output isn't very intuitive, so I won't show it. But essentially, all the HTML (tags and all) are now saved in the 'wb' variable.

# Website URL
url <- 'https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=RCLC1&f=D'

# Read website HTML
wb <- read_html(url)

Using rvest to scrape table headers pt.1

After analyzing the html, I realized that the table headers were surrounded by the 'B6' CSS class. I used the function html_nodes passing the html and '.B6' CSS class as parameters. I saved the output as 'wti_html'.

# Extract B6 CSS class
wti_html <- html_nodes(wb,'.B6')
head(wti_html)

{xml_nodeset (6)}
[1] <td class="B6">  1983 Apr- 4 to Apr- 8</td>
[2] <td class="B6">  1983 Apr-11 to Apr-15</td>
[3] <td class="B6">  1983 Apr-18 to Apr-22</td>
[4] <td class="B6">  1983 Apr-25 to Apr-29</td>
[5] <td class="B6">  1983 May- 2 to May- 6</td>
[6] <td class="B6">  1983 May- 9 to May-13</td>

Using rvest to scrape table headers pt.2

Then I passed 'wit_html' into the html_text function to remove all the html tags from the content, into str_trim to remove white space, and into str_split_fixed to separate the year from the week.

wti_text <- html_text(wti_html)
wti_text <- str_trim(wti_text)
year_week <- str_split_fixed(wti_text," ",2)
head(year_week)

     [,1]   [,2]              
[1,] "1983" "Apr- 4 to Apr- 8"
[2,] "1983" "Apr-11 to Apr-15"
[3,] "1983" "Apr-18 to Apr-22"
[4,] "1983" "Apr-25 to Apr-29"
[5,] "1983" "May- 2 to May- 6"
[6,] "1983" "May- 9 to May-13"

Using rvest to scrape table prices pt.1

Using the same process, I extracted the oil prices after noticing the prices were always surrounded by the 'B3' CSS class and saved the output to 'price_html'.

# Extract B3 CSS class
price_html <- html_nodes(wb,'.B3')
head(price_html)

{xml_nodeset (6)}
[1] <td class="B3">29.44</td>
[2] <td class="B3">29.71</td>
[3] <td class="B3">29.92</td>
[4] <td class="B3">30.17</td>
[5] <td class="B3">30.38</td>
[6] <td class="B3">30.26</td>

Using rvest to scrape table prices pt.2

Then I passed 'price_html' into the html_text function to remove all the html tags from the data and as.numeric to convert it to numbers.

price_text <- html_text(price_html)
price_text <- as.numeric(price_text)
head(price_text)

[1] 29.44 29.71 29.92 30.17 30.38 30.26

Convert scraped data into dataframe

df <- as.data.frame(matrix(price_text,ncol = 5,byrow = T))
df <- cbind(year_week[,1],year_week[,2],df)
names(df) <- c('year','week','mon','tues','wed','thurs','fri')
head(df)

  year             week   mon  tues   wed thurs   fri
1 1983 Apr- 4 to Apr- 8 29.44 29.71 29.92 30.17 30.38
2 1983 Apr-11 to Apr-15 30.26 30.83 30.82 30.67 30.48
3 1983 Apr-18 to Apr-22 30.75 30.75 30.70 30.68 30.75
4 1983 Apr-25 to Apr-29 30.84 30.71 30.78 30.74 30.63
5 1983 May- 2 to May- 6 30.61 30.50 30.42 30.20 30.45
6 1983 May- 9 to May-13 30.10 30.31 29.97 29.72 29.80

Extract latest date for prices

date <- df$week[dim(df)[1]]
date <- str_split_fixed(date," ",3)[3]
year <- str_trim(df$year[dim(df)[1]])
mon <- str_extract_all(date,"[[:alpha:]]+")
day <- str_extract_all(date,"[0-9]+")
print(date)

[1] "Sep-22"

print(mon)

[[1]]
[1] "Sep"

print(day)

[[1]]
[1] "22"

Define function to convert month to number

    mon_convert <- function(x){
            counter = 1
            limit = 13
            mon_az <- c("jan","feb","mar","apr","may","jun","jul","aug",
                    "sep","oct","nov","dec")

            while (counter < limit){
                    if (tolower(x) == mon_az[counter]) break
                    counter = counter + 1
            }
            if (counter > 12) return(0)
            return(counter)
    }

Convert month to number

# Use mon_convert function to convert 3 character month into an integer
print(mon)

[[1]]
[1] "Sep"

mon_n <- mon_convert(mon)
print(mon_n)

[1] 9

date_str <- paste0(year,"/",mon_n,"/",day)

print(paste0('Latest price as of ', date_str))

[1] "Latest price as of 2017/9/22"

# Create a vector of weekly dates (each Friday)
idx <- seq(as.Date('1983/4/8'), as.Date(date_str),"week")

Calculate moving averages for WTI prices and 52 week standard deviation

# calculate average across week days
df$wkavg <- apply(df[,3:7],1,mean,na.rm = TRUE)

# calculate rolling averages for 12-,26-, and 52-weeks
df$wkavg13 <- rollmean(df$wkavg,13,fill=NA,align='right')
df$wkavg26 <- rollmean(df$wkavg,26,fill=NA,align='right')
df$wkavg52 <- rollmean(df$wkavg,52,fill=NA,align='right')

# calculate 52-week standard deviation using weekly average oil price
df$wkstd52 <- rollapply(df$wkavg,52,FUN=sd,fill=NA,align='right')

# Show bottom of dataframe
tail(df)

     year             week   mon  tues   wed thurs   fri   wkavg  wkavg13
1794 2017 Aug-14 to Aug-18 47.59 47.55 46.78 47.09 48.51 47.5040 46.96073
1795 2017 Aug-21 to Aug-25 47.37 47.64 48.41 47.43 47.87 47.7440 46.75242
1796 2017 Aug-28 to Sep- 1 46.57 46.44 45.96 47.23 47.29 46.6980 46.61381
1797 2017 Sep- 4 to Sep- 8    NA 48.66 49.16 49.09 47.48 48.5975 46.77085
1798 2017 Sep-11 to Sep-15 48.07 48.23 49.30 49.89 49.89 49.0760 47.06177
1799 2017 Sep-18 to Sep-22 49.91 49.48    NA    NA    NA 49.6950 47.56585
      wkavg26  wkavg52  wkstd52
1794 48.57825 49.01327 3.036840
1795 48.33677 49.02031 3.033398
1796 48.07262 49.05054 3.001491
1797 47.98490 49.10131 2.970313
1798 48.00698 49.19235 2.893127
1799 48.07779 49.29079 2.819114

Calculate returns using average weekly WTI prices

Calculate the log difference of the weekly WTI price aka returns. Then calculate the position (long or short, +1 or -1) based on whether the weekly average price is above/below the 52-week avg +/- 1 standard deviation.

# calculate log difference of oil prices
df$wkret <- c(0,diff(log(df$wkavg),lag=1))

# calculate position (i.e., signal) 
df$post <- ifelse(df$wkavg > (df$wkavg52+df$wkstd52),1, ifelse(df$wkavg < (df$wkavg52-df$wkstd52), -1, 0))

Calculate strategy's return, std, and Sharpe ratio

Make a new dataframe with two columns (weekly returns and long/short signal) for all the necessary calculations. To avoid look ahead bias, we lag the signal column before multiplying it by the return column. For example, if on day 2 the signal is +1, then you don't get the returns for day 2. You enter into a long position the following day when the market opens and you get the return for day 3.

[1] "The annualized return is 5.139%."

[1] "The annualized standard deviation is 23.735%."

[1] "The Sharpe ratio is 0.217."

Create plot of equity curve

Calculate the strategy's equity curve (the growth of a $1 investment) and plot it. Original plot of equity curve was done using the 'dygraph' package for interactive charts. However, I had to switch to ggplot2 chart because dygraph wasn't working well with this presentation file.

plot of chunk dyplot

Create a plot with all the averages

And export necssary data as CSV file: write.csv(as.data.frame(port),file=“wti_portfolio.csv”,row.names = TRUE)

plot of chunk sma_chart