The goal of this assignment is to create and import HTML, XML, and JSON files into R.
library(XML) #for readHTMLTable, xmlTreeParse, and xmlSApply
library(RCurl) #for getURL function
library(rjson) #for fromJSON function
library(jsonlite) #for fromJSON function
library(DT) #for datatables
library(tidyr) #for data wrangling
library(dplyr) #for data wrangling
library(magrittr) #for data wranglingbooks_html <- as.data.frame(readHTMLTable(getURL("https://raw.githubusercontent.com/mkivenson/Data-Acquisition-and-Management/master/Assignment%205/books.html"), header = TRUE))
datatable(books_html)books_html %<>%
separate(books.author,c("author1","author2","author3","author4"), sep = ",") %>%
gather("id","author",2:5) %>%
na.omit() %>%
arrange(books.title)
datatable(books_html)books_xml <- xmlTreeParse(getURL("https://raw.githubusercontent.com/mkivenson/Data-Acquisition-and-Management/master/Assignment%205/books.xml"))
books_xml <- xmlSApply(books_xml,function(x) xmlSApply(x, xmlValue))
books_xml <- as.data.frame(t(books_xml), row.names=NULL)
datatable(books_xml)books_xml %<>%
separate(author,c("author1","author2","author3","author4"), sep = ",") %>%
gather("id","author",2:5) %>%
na.omit() %>%
arrange(title)
datatable(books_xml)books_json <- as.data.frame(fromJSON(getURL("https://raw.githubusercontent.com/mkivenson/Data-Acquisition-and-Management/master/Assignment%205/books.json")))
datatable(books_json)books_json %<>%
separate(books.author,c("author1","author2","author3","author4"), sep = ",") %>%
gather("id","author",2:5) %>%
na.omit() %>%
arrange(books.title)
datatable(books_json)The HTML, XML, and JSON formats are nearly identical once imported as into R as dataframes. One distinction is that for the XML dataframe, the columns are named based on the item name.