A simple scraper extracting currency exchange data from SUNAT

Here I will demonstrate a simple web scraping example using a table of exchange rates from Peru’s tax agency, SUNAT. Today that page looked like this:

Sunat currency exchange for Oct 2017

First we load the rvest library and read the webpage we want to scrape from its URL.

  library(rvest)
## Loading required package: xml2
  url <- 'http://www.sunat.gob.pe/cl-at-ittipcam/tcS01Alias'
  webpage <- read_html(url)

We extract the page's tables with html_nodes.

  tbls <- html_nodes(webpage, "table")
  length(tbls)
## [1] 6

There are 6 tables. Through trial and error I found that my table of interest is table 2.

  tbl2<-html_table(tbls[[2]])
  print(tbl2)
##    X1     X2    X3  X4     X5    X6  X7     X8    X9 X10    X11   X12
## 1 Día Compra Venta Día Compra Venta Día Compra Venta Día Compra Venta
## 2   3  3.267 3.271   4  3.266 3.268   5  3.258 3.260   6  3.254 3.256
## 3   7  3.266 3.268  10  3.270 3.273  11  3.265 3.267  12  3.260 3.262
## 4  13  3.254 3.256  14  3.248 3.251  17  3.244 3.247  18  3.244 3.246
## 5  19  3.242 3.244  20  3.235 3.237  21  3.237 3.240  24  3.238 3.241
## 6  25  3.238 3.242  26  3.233 3.235  27  3.236 3.239  28  3.244 3.248
  dim(tbl2)
## [1]  6 12
  num.cols<-dim(tbl2)[2]
  num.rows<-dim(tbl2)[1]
  num.cols
## [1] 12
  num.rows
## [1] 6
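
Rather than finding the table index by trial and error, one option is to pick the table by its content. The following is only a minimal sketch under stated assumptions: the inline HTML string is a hypothetical stand-in for the SUNAT page, and it assumes the table of interest is the only one whose text contains both "Compra" and "Venta". On a page with nested tables, an outer table would match as well, so the check would need to be stricter there.

```r
library(rvest)

# Hypothetical inline page standing in for the real SUNAT response
html <- '<html><body>
<table><tr><td>Menu</td></tr></table>
<table>
<tr><td>Dia</td><td>Compra</td><td>Venta</td></tr>
<tr><td>3</td><td>3.267</td><td>3.271</td></tr>
</table>
</body></html>'

page <- read_html(html)
tbls <- html_nodes(page, "table")

# html_text() is vectorised over the node set: one string per table
txt <- html_text(tbls)

# Keep the tables whose text mentions both column labels
has_rates <- grepl("Compra", txt, fixed = TRUE) & grepl("Venta", txt, fixed = TRUE)
idx <- which(has_rates)

# Fail loudly if the assumption of a single match does not hold
stopifnot(length(idx) == 1)

rates <- html_table(tbls[[idx]])
print(rates)
```

Selecting by content keeps the script working if the page gains or loses a layout table, at the cost of depending on the column labels staying the same.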

Reformatting the data into a tidy data.frame

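We already have the number of rows and columns, and we use them to build vectors that we then combine into a data.frame. Each row of the table holds four (Día, Compra, Venta) triplets side by side, that is, day, buying rate and selling rate, so the outer loop walks the rows and the inner loop walks the triplets within each row.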

  dia<-c()
  compra<-c()
  venta<-c()
  num.cols
## [1] 12
  num.rows
## [1] 6
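  # Row 1 holds the column labels, so the data start at row 2
  for(i in 2:num.rows){
     # Each row holds num.cols/3 (Día, Compra, Venta) triplets
     for(j in 1:(num.cols/3)){
        # (j-1)*3 is the column offset of the j-th triplet in the row
        dia<-c(dia,as.numeric(tbl2[i,(j-1)*3+1]))
        compra<-c(compra,as.numeric(tbl2[i,(j-1)*3+2]))
        venta<-c(venta,as.numeric(tbl2[i,(j-1)*3+3]))
     }
  }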
  
  output<-data.frame(dia,compra,venta)
  print(output)
##    dia compra venta
## 1    3  3.267 3.271
## 2    4  3.266 3.268
## 3    5  3.258 3.260
## 4    6  3.254 3.256
## 5    7  3.266 3.268
## 6   10  3.270 3.273
## 7   11  3.265 3.267
## 8   12  3.260 3.262
## 9   13  3.254 3.256
## 10  14  3.248 3.251
## 11  17  3.244 3.247
## 12  18  3.244 3.246
## 13  19  3.242 3.244
## 14  20  3.235 3.237
## 15  21  3.237 3.240
## 16  24  3.238 3.241
## 17  25  3.238 3.242
## 18  26  3.233 3.235
## 19  27  3.236 3.239
## 20  28  3.244 3.248
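
The same reshaping can also be done without growing vectors inside a nested loop. The sketch below is one possible alternative, not the method used above: it stacks the column triplets with lapply and rbind, then sorts by day. The data frame `tbl` is a small hypothetical stand-in for tbl2 (two triplets wide, using the first values from the table above), and the blank triplet in its last row is an assumption about how a partially filled month might look.

```r
# Hypothetical stand-in for tbl2: a label row plus two data rows
tbl <- data.frame(
  X1 = c("Dia", "3", "7"),
  X2 = c("Compra", "3.267", "3.266"),
  X3 = c("Venta", "3.271", "3.268"),
  X4 = c("Dia", "4", ""),
  X5 = c("Compra", "3.266", ""),
  X6 = c("Venta", "3.268", ""),
  stringsAsFactors = FALSE
)

body   <- tbl[-1, ]                     # drop the label row
starts <- seq(1, ncol(body), by = 3)    # first column of each triplet

# Build one small data.frame per triplet of columns
blocks <- lapply(starts, function(s) {
  data.frame(
    dia    = as.numeric(body[[s]]),
    compra = as.numeric(body[[s + 1]]),
    venta  = as.numeric(body[[s + 2]])
  )
})

tidy <- do.call(rbind, blocks)          # stack the triplets
tidy <- tidy[!is.na(tidy$dia), ]        # blank cells became NA; drop them
tidy <- tidy[order(tidy$dia), ]         # restore day order
rownames(tidy) <- NULL
print(tidy)
```

Sorting by dia recovers the row-by-row order of the loop version only because all days belong to a single month.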