botanical_garden

This is an example of scraping data from the official website of the Botanical Garden of Medicinal Plants “Akademik Dr. Jovan Tucakov”, Valjevo, Serbia.

library(XML)

## Warning: package 'XML' was built under R version 3.2.2

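# Page that holds the garden's plant table
url <- "http://www.bastalekbiljava.rs/pages/garden-map.php"
html <- htmlTreeParse(url, useInternalNodes = TRUE)
# Cell values and header labels, in document order
raw <- xpathSApply(html, "//td", xmlValue)
names <- xpathSApply(html, "//th", xmlValue)
lngt <- length(raw)
# The table has 5 columns, so every fifth cell belongs to the same column
table <- data.frame(raw[seq(1, lngt, 5)], raw[seq(2, lngt, 5)], raw[seq(3, lngt, 5)], raw[seq(4, lngt, 5)], raw[seq(5, lngt, 5)])
colnames(table) <- names
# row.names = FALSE keeps the header line aligned with the data columns
write.table(table, file = "botanical_garden.txt", sep = ",", row.names = FALSE)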

Here are the first few rows of the output.

##   Code No         Scientific name      Common name       Family Details
## 1   10001    Dryopteris filix-mas        male-fern Pterydophyta    more
## 2   10002       Equisetum arvense common horsetail Pterydophyta    more
## 3   10003 Asplenium scolopendrium    hart's-tongue Pterydophyta    more
## 4   10004      Polypodium vulgare  common polypody Pterydophyta    more
## 5   10005    Ceterach officinarum        rustyback Pterydophyta    more
## 6   10006   Asplenium trichomanes                  Pterydophyta    more
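The five `seq()` calls in the script can also be written as a single row-wise reshape. The following is only a sketch of that idea, not the code that produced the output: it parses a small inline HTML sample (a hypothetical stand-in for the live page, with two rows copied from the output) so it runs without network access, and the names `sample_html`, `cells`, `headers` and `plants` are made up for illustration.

```r
library(XML)

# Hypothetical inline sample standing in for the live page
sample_html <- "<table>
  <tr><th>Code No</th><th>Scientific name</th><th>Common name</th><th>Family</th><th>Details</th></tr>
  <tr><td>10001</td><td>Dryopteris filix-mas</td><td>male-fern</td><td>Pterydophyta</td><td>more</td></tr>
  <tr><td>10002</td><td>Equisetum arvense</td><td>common horsetail</td><td>Pterydophyta</td><td>more</td></tr>
</table>"

doc <- htmlParse(sample_html, asText = TRUE)
cells <- xpathSApply(doc, "//td", xmlValue)    # flat vector of cell values
headers <- xpathSApply(doc, "//th", xmlValue)  # column labels

# Fill a matrix row by row: one row per plant, one column per header
plants <- as.data.frame(matrix(cells, ncol = length(headers), byrow = TRUE),
                        stringsAsFactors = FALSE)
colnames(plants) <- headers
print(plants)
```

Taking the column count from `length(headers)` instead of hard-coding 5 means the reshape would still line up if the site added or removed a column, provided every row has one cell per header.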

Note that this database is still in development, so not all ‘common names’ for plant species have been entered. Furthermore, the ‘Details’ column is superfluous in this scrape, since it contains no data (only ‘more’ values), but I’ve included it anyway.
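If the placeholder column and the blank common names get in the way of later analysis, they could be cleaned up along these lines. This is a minimal sketch on a hypothetical two-row stand-in for the scraped table (values copied from the output above); it assumes that missing common names arrive as empty strings, which may not hold for the live page.

```r
# Hypothetical stand-in for the scraped table
plants <- data.frame(
  "Code No" = c("10005", "10006"),
  "Scientific name" = c("Ceterach officinarum", "Asplenium trichomanes"),
  "Common name" = c("rustyback", ""),
  Family = c("Pterydophyta", "Pterydophyta"),
  Details = c("more", "more"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Drop the placeholder column only if it really holds nothing but "more"
if (all(plants$Details == "more")) plants$Details <- NULL

# Mark common names that have not been entered yet as missing (assumes "" for blanks)
plants[["Common name"]][plants[["Common name"]] == ""] <- NA
print(plants)
```

Checking the column contents before dropping it guards against the site later replacing ‘more’ with real data.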

Shansh

Tuesday, September 29, 2015