This is an example of scraping data from the official website of the Botanical Garden of Medicinal Plants “Akademik Dr. Jovan Tucakov” in Valjevo, Serbia.
library(XML)
## Warning: package 'XML' was built under R version 3.2.2
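# Page that holds the garden's plant table
url <- "http://www.bastalekbiljava.rs/pages/garden-map.php"
# Parse the page into an internal document so that XPath queries can be run on it
html <- htmlTreeParse(url, useInternalNodes = TRUE)
# Cell values (<td>) and column headers (<th>) as character vectors
raw <- xpathSApply(html, "//td", xmlValue)
col_names <- xpathSApply(html, "//th", xmlValue)
# The cells arrive row by row, five per row, so fill a five-column matrix by row
plants <- as.data.frame(matrix(raw, ncol = 5, byrow = TRUE), stringsAsFactors = FALSE)
colnames(plants) <- col_names
# row.names = FALSE keeps the header line aligned with the data columns
write.table(plants, file = "botanical_garden.txt", sep = ",", row.names = FALSE)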
Here are the first few rows of the output.
##   Code No         Scientific name      Common name       Family Details
## 1   10001    Dryopteris filix-mas        male-fern Pterydophyta    more
## 2   10002       Equisetum arvense common horsetail Pterydophyta    more
## 3   10003 Asplenium scolopendrium    hart's-tongue Pterydophyta    more
## 4   10004      Polypodium vulgare  common polypody Pterydophyta    more
## 5   10005    Ceterach officinarum        rustyback Pterydophyta    more
## 6   10006   Asplenium trichomanes                  Pterydophyta    more
Note that this database is still in development, so not all ‘common names’ for plant species have been entered yet. Furthermore, the ‘Details’ column is superfluous in this scrape, since it contains no data (only ‘more’ values), but I’ve included it anyway.
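
If the placeholder column and the blank names get in the way of later analysis, they could be cleaned up after scraping. The following is only a minimal sketch, not part of the original script: it assumes a small hand-built data frame (`plants`, made from three of the rows shown above) in place of the scraped table, and `n_missing` is a name introduced purely for illustration.

```r
# Hypothetical stand-in for the scraped table: three of the rows shown above
plants <- data.frame(
  code = c("10001", "10005", "10006"),
  scientific = c("Dryopteris filix-mas", "Ceterach officinarum",
                 "Asplenium trichomanes"),
  common = c("male-fern", "rustyback", ""),
  family = "Pterydophyta",
  details = "more",
  stringsAsFactors = FALSE
)
colnames(plants) <- c("Code No", "Scientific name", "Common name",
                      "Family", "Details")

# Drop "Details" only if every value is the placeholder "more"
if (all(plants[["Details"]] == "more")) {
  plants[["Details"]] <- NULL
}

# Treat empty (or whitespace-only) common names as missing values
common <- trimws(plants[["Common name"]])
plants[["Common name"]] <- ifelse(common == "", NA_character_, common)

# Count the species that still lack a common name
n_missing <- sum(is.na(plants[["Common name"]]))
```

Checking that every ‘Details’ value really is ‘more’ before dropping the column guards against silently discarding data if the site starts filling that column in later.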