library(httr)
library(rjson)
library(XML)
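# Load the HTML version of the book table
url <- "https://raw.githubusercontent.com/djunga/A7-Web-Data/main/mybooks.html"
myhtml <- GET(url)
x <- rawToChar(myhtml$content)
x <- htmlParse(x)
x <- readHTMLTable(x)  # returns a list of tables
x <- data.frame(x)
# Drop the literal "NULL." prefix that the unnamed table adds to each column
colnames(x) <- gsub("^NULL\\.", "", colnames(x))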
# Load the XML version; xmlToDataFrame() maps each record to a row
url <- "https://raw.githubusercontent.com/djunga/A7-Web-Data/main/mybooks.xml"
myxml <- GET(url)
a <- rawToChar(myxml$content)
a <- xmlParse(a)
a <- xmlToDataFrame(a)
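# Load the JSON version; fromJSON() returns a nested list
url <- "https://raw.githubusercontent.com/djunga/A7-Web-Data/main/mybooks.json"
b <- GET(url)
b <- rawToChar(b$content)
b <- fromJSON(b)
# Flatten the nested list into a named character vector
b <- unlist(b)
b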
##                           book1.Title                         book1.Author1
##                         "Neuromancer"                      "William Gibson"
##                         book1.Author2                           book1.Genre
##                                  "NA"                              "Sci-Fi"
##                       book1.Published                           book1.Pages
##                                "1984"                                 "271"
##                           book2.Title                         book2.Author1
##                          "1 the Road"                        "Ross Goodwin"
##                         book2.Author2                           book2.Genre
##                     "Kenric McDowell"                              "Poetry"
##                       book2.Published                           book2.Pages
##                                "2018"                                 "171"
##                           book3.Title                         book3.Author1
## "AI 2041: Ten Visions for Our Future"                         "Chen Qiufan"
##                         book3.Author2                           book3.Genre
##                          "Kai-Fu Lee"                              "Sci-Fi"
##                       book3.Published                           book3.Pages
##                                "2021"                                 "480"
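# Strip the "bookN." prefix to recover the six field names
mycolnames <- gsub("book[0-9][.]", "", names(b))[1:6]
# Build one column per book (six fields each), then transpose so each book is a row
w <- data.frame(b[1:6], b[7:12], b[13:18])
w <- data.frame(t(w))
colnames(w) <- mycolnames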
head(x)
##                                 Title        Author1         Author2  Genre
## 1                         Neuromancer William Gibson              NA Sci-Fi
## 2                          1 the Road   Ross Goodwin Kenric McDowell Poetry
## 3 AI 2041: Ten Visions for Our Future    Chen Qiufan      Kai-Fu Lee Sci-Fi
##   Published Pages
## 1      1984   271
## 2      2018   171
## 3      2021   480
head(a)
##                                 title        author1         author2  genre
## 1                         Neuromancer William Gibson              NA Sci-Fi
## 2                          1 the Road   Ross Goodwin Kenric McDowell Poetry
## 3 AI 2041: Ten Visions for Our Future    Chen Qiufan      Kai-Fu Lee Sci-Fi
##   published pages
## 1      1984   271
## 2      2018   171
## 3      2021   480
head(w)
##                                        Title        Author1         Author2
## b.1.6.                           Neuromancer William Gibson              NA
## b.7.12.                           1 the Road   Ross Goodwin Kenric McDowell
## b.13.18. AI 2041: Ten Visions for Our Future    Chen Qiufan      Kai-Fu Lee
##           Genre Published Pages
## b.1.6.   Sci-Fi      1984   271
## b.7.12.  Poetry      2018   171
## b.13.18. Sci-Fi      2021   480
The HTML and XML data required very little processing to convert into data frames. In contrast, the JSON data needed several extra steps: flattening the parsed list with unlist, using gsub to recover the column names, and transposing the rows and columns. The three data frames hold the same values, although the XML version has lowercase column names and the JSON version keeps the row names b.1.6., b.7.12., and b.13.18. left over from the reshaping. A JSON file may look friendlier to read on its own, but an HTML or XML file can be less frustrating to load into your R environment.
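Much of the JSON friction above comes from flattening the parsed list and then rebuilding rows by position. The following is a minimal sketch of one alternative, not the method used above: it binds each book to a row directly. The `books` list is a hypothetical stand-in for what fromJSON appears to return before unlist, inferred from the printed names such as book1.Title. Storing Published and Pages as strings is also an assumption, since the original JSON types are not shown.

```r
# Hypothetical stand-in for the parsed JSON: a named list with one entry per
# book, which is the shape the printed names (book1.Title, ...) suggest.
books <- list(
  book1 = list(Title = "Neuromancer", Author1 = "William Gibson",
               Author2 = "NA", Genre = "Sci-Fi",
               Published = "1984", Pages = "271"),
  book2 = list(Title = "1 the Road", Author1 = "Ross Goodwin",
               Author2 = "Kenric McDowell", Genre = "Poetry",
               Published = "2018", Pages = "171"),
  book3 = list(Title = "AI 2041: Ten Visions for Our Future",
               Author1 = "Chen Qiufan", Author2 = "Kai-Fu Lee",
               Genre = "Sci-Fi", Published = "2021", Pages = "480")
)

# Turn each book into a one-row data frame, then stack the rows.
rows <- lapply(books, function(book) {
  as.data.frame(book, stringsAsFactors = FALSE)
})
w2 <- do.call(rbind, rows)

# Treat the literal string "NA" as a missing value and restore integer types.
w2[w2 == "NA"] <- NA
w2$Published <- as.integer(w2$Published)
w2$Pages <- as.integer(w2$Pages)

str(w2)
```

Because the fields are matched by name rather than by index ranges such as b[1:6], this approach should keep working if a fourth book is added. It would still need adjusting if the books did not all share the same fields.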