Silverio J. Vasquez
Sept. 26th, 2017
This R script goes to the EIA website and scrapes the daily WTI prices. A weekly average is calculated, then 13-,26-, and 52-week averages are calculated for a simple moving avreage crossover strategy.
Then a long/short signal is created if the weekly average is above/below the 52-week average +/- 1 standard deviation.
https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=RCLC1&f=D
Using the 'rvest' package, I extracted the necessary fields from the webpage.
The 'rvest' package was by Hadley Wickham - the father of tidy data and creater of many of the tidy data packages we're learning about.
The main two functions from 'rvest' that allowed me to scrape the webpage fairly quickly and easily were 'html_nodes' and 'html_text'.
First read all the html for the website. The output isn't very intuitive, so I won't show it. But essentially, all the HTML (tags and all) are now saved in the 'wb' variable.
# Website URL
url <- 'https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=RCLC1&f=D'
# Read website HTML
wb <- read_html(url)
After analyzing the html, I realized that the table headers were surrounded by the 'B6' CSS class. I used the function html_nodes passing the html and '.B6' CSS class as parameters. I saved the output as 'wti_html'.
# Extract B6 CSS class
wti_html <- html_nodes(wb,'.B6')
head(wti_html)
{xml_nodeset (6)}
[1] <td class="B6"> 1983 Apr- 4 to Apr- 8</td>
[2] <td class="B6"> 1983 Apr-11 to Apr-15</td>
[3] <td class="B6"> 1983 Apr-18 to Apr-22</td>
[4] <td class="B6"> 1983 Apr-25 to Apr-29</td>
[5] <td class="B6"> 1983 May- 2 to May- 6</td>
[6] <td class="B6"> 1983 May- 9 to May-13</td>
Then I passed 'wit_html' into the html_text function to remove all the html tags from the content, into str_trim to remove white space, and into str_split_fixed to separate the year from the week.
wti_text <- html_text(wti_html)
wti_text <- str_trim(wti_text)
year_week <- str_split_fixed(wti_text," ",2)
head(year_week)
[,1] [,2]
[1,] "1983" "Apr- 4 to Apr- 8"
[2,] "1983" "Apr-11 to Apr-15"
[3,] "1983" "Apr-18 to Apr-22"
[4,] "1983" "Apr-25 to Apr-29"
[5,] "1983" "May- 2 to May- 6"
[6,] "1983" "May- 9 to May-13"
Using the same process, I extracted the oil prices after noticing the prices were always surrounded by the 'B3' CSS class and saved the output to 'price_html'.
# Extract B3 CSS class
price_html <- html_nodes(wb,'.B3')
head(price_html)
{xml_nodeset (6)}
[1] <td class="B3">29.44</td>
[2] <td class="B3">29.71</td>
[3] <td class="B3">29.92</td>
[4] <td class="B3">30.17</td>
[5] <td class="B3">30.38</td>
[6] <td class="B3">30.26</td>
Then I passed 'price_html' into the html_text function to remove all the html tags from the data and as.numeric to convert it to numbers.
price_text <- html_text(price_html)
price_text <- as.numeric(price_text)
head(price_text)
[1] 29.44 29.71 29.92 30.17 30.38 30.26
df <- as.data.frame(matrix(price_text,ncol = 5,byrow = T))
df <- cbind(year_week[,1],year_week[,2],df)
names(df) <- c('year','week','mon','tues','wed','thurs','fri')
head(df)
year week mon tues wed thurs fri
1 1983 Apr- 4 to Apr- 8 29.44 29.71 29.92 30.17 30.38
2 1983 Apr-11 to Apr-15 30.26 30.83 30.82 30.67 30.48
3 1983 Apr-18 to Apr-22 30.75 30.75 30.70 30.68 30.75
4 1983 Apr-25 to Apr-29 30.84 30.71 30.78 30.74 30.63
5 1983 May- 2 to May- 6 30.61 30.50 30.42 30.20 30.45
6 1983 May- 9 to May-13 30.10 30.31 29.97 29.72 29.80
date <- df$week[dim(df)[1]]
date <- str_split_fixed(date," ",3)[3]
year <- str_trim(df$year[dim(df)[1]])
mon <- str_extract_all(date,"[[:alpha:]]+")
day <- str_extract_all(date,"[0-9]+")
print(date)
[1] "Sep-22"
print(mon)
[[1]]
[1] "Sep"
print(day)
[[1]]
[1] "22"
mon_convert <- function(x){
counter = 1
limit = 13
mon_az <- c("jan","feb","mar","apr","may","jun","jul","aug",
"sep","oct","nov","dec")
while (counter < limit){
if (tolower(x) == mon_az[counter]) break
counter = counter + 1
}
if (counter > 12) return(0)
return(counter)
}
# Use mon_convert function to convert 3 character month into an integer
print(mon)
[[1]]
[1] "Sep"
mon_n <- mon_convert(mon)
print(mon_n)
[1] 9
date_str <- paste0(year,"/",mon_n,"/",day)
print(paste0('Latest price as of ', date_str))
[1] "Latest price as of 2017/9/22"
# Create a vector of weekly dates (each Friday)
idx <- seq(as.Date('1983/4/8'), as.Date(date_str),"week")
# calculate average across week days
df$wkavg <- apply(df[,3:7],1,mean,na.rm = TRUE)
# calculate rolling averages for 12-,26-, and 52-weeks
df$wkavg13 <- rollmean(df$wkavg,13,fill=NA,align='right')
df$wkavg26 <- rollmean(df$wkavg,26,fill=NA,align='right')
df$wkavg52 <- rollmean(df$wkavg,52,fill=NA,align='right')
# calculate 52-week standard deviation using weekly average oil price
df$wkstd52 <- rollapply(df$wkavg,52,FUN=sd,fill=NA,align='right')
# Show bottom of dataframe
tail(df)
year week mon tues wed thurs fri wkavg wkavg13
1794 2017 Aug-14 to Aug-18 47.59 47.55 46.78 47.09 48.51 47.5040 46.96073
1795 2017 Aug-21 to Aug-25 47.37 47.64 48.41 47.43 47.87 47.7440 46.75242
1796 2017 Aug-28 to Sep- 1 46.57 46.44 45.96 47.23 47.29 46.6980 46.61381
1797 2017 Sep- 4 to Sep- 8 NA 48.66 49.16 49.09 47.48 48.5975 46.77085
1798 2017 Sep-11 to Sep-15 48.07 48.23 49.30 49.89 49.89 49.0760 47.06177
1799 2017 Sep-18 to Sep-22 49.91 49.48 NA NA NA 49.6950 47.56585
wkavg26 wkavg52 wkstd52
1794 48.57825 49.01327 3.036840
1795 48.33677 49.02031 3.033398
1796 48.07262 49.05054 3.001491
1797 47.98490 49.10131 2.970313
1798 48.00698 49.19235 2.893127
1799 48.07779 49.29079 2.819114
Calculate the log difference of the weekly WTI price aka returns. Then calculate the position (long or short, +1 or -1) based on whether the weekly average price is above/below the 52-week avg +/- 1 standard deviation.
# calculate log difference of oil prices
df$wkret <- c(0,diff(log(df$wkavg),lag=1))
# calculate position (i.e., signal)
df$post <- ifelse(df$wkavg > (df$wkavg52+df$wkstd52),1, ifelse(df$wkavg < (df$wkavg52-df$wkstd52), -1, 0))
Make a new dataframe with two columns (weekly returns and long/short signal) for all the necessary calculations. To avoid look ahead bias, we lag the signal column before multiplying it by the return column. For example, if on day 2 the signal is +1, then you don't get the returns for day 2. You enter into a long position the following day when the market opens and you get the return for day 3.
[1] "The annualized return is 5.139%."
[1] "The annualized standard deviation is 23.735%."
[1] "The Sharpe ratio is 0.217."
Calculate the strategy's equity curve (the growth of a $1 investment) and plot it. Original plot of equity curve was done using the 'dygraph' package for interactive charts. However, I had to switch to ggplot2 chart because dygraph wasn't working well with this presentation file.
And export necssary data as CSV file: write.csv(as.data.frame(port),file=“wti_portfolio.csv”,row.names = TRUE)