For this assignment, I created an HTML file, an XML file, and a JSON file, each containing information about three of my college physics textbooks. Each file lists the book titles, authors, subject, and page counts. I then used R to read each file and create a dataframe from the information.
I read in each file using the rvest (HTML), XML (XML), and jsonlite (JSON) libraries, then converted each result to its own dataframe.
library(XML)
library(jsonlite)
library(rvest)
## Loading required package: xml2
##
## Attaching package: 'rvest'
## The following object is masked from 'package:XML':
##
## xml
html <- as.data.frame(read_html("books.html") %>% html_table())
xml <- xmlParse(file = "books.xml")
xml <- xmlRoot(xml)
xml <- xmlToDataFrame(xml)
json <- fromJSON("books.json")
json <- json$books
html
##                                  X1                          X2      X3   X4
## 1           Fundamentals of Physics Halliday, Resnick, & Walker Physics 1248
## 2               Classical Mechanics              John R. Taylor Physics  745
## 3 Introduction to Quantum Mechanics          David J. Griffiths Physics  434
xml
##                               title                      author subject length
## 1           Fundamentals of Physics Halliday, Resnick, & Walker Physics   1248
## 2               Classical Mechanics              John R. Taylor Physics    745
## 3 Introduction to Quantum Mechanics          David J. Griffiths Physics    434
json
##                               title                    author subject length
## 1           Fundamentals of Physics Halliday, Resnick, Walker Physics   1248
## 2               Classical Mechanics            John R. Taylor Physics    745
## 3 Introduction to Quantum Mechanics        David J. Griffiths Physics    434
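The comma-joined authors in the `json` output come from a list column rather than a plain string. The sketch below reproduces that behavior with a small inline JSON string. `books.json` itself is not shown here, so the structure of `json_text` (a top-level `books` array with `author` stored as a JSON array) is only an assumption about what the file might look like.

```r
library(jsonlite)

# Hypothetical stand-in for books.json: a top-level "books" array in which
# each record stores its authors as a JSON array.
json_text <- '{
  "books": [
    {"title": "Fundamentals of Physics",
     "author": ["Halliday", "Resnick", "Walker"],
     "subject": "Physics", "length": 1248},
    {"title": "Classical Mechanics",
     "author": ["John R. Taylor"],
     "subject": "Physics", "length": 745}
  ]
}'

# fromJSON() accepts a JSON string as well as a file path.
books <- fromJSON(json_text)$books

# The author arrays have different lengths, so they are kept as a list column.
is.list(books$author)
lengths(books$author)
```

With input shaped like this, each element of `books$author` is a character vector, which is why a row can hold several authors at once.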
Each of the dataframes is a little bit different from the others. The HTML dataframe lost the headers for the columns (they show up as X1 through X4), and the JSON dataframe represents the authors as a list instead of a single string (which is nice).
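If the three dataframes need to match, the differences can be smoothed over with a few lines of base R. This is only a sketch: the `html` and `json` objects below are hand-built stand-ins that mimic the first two rows of the printed output (they are not read from the original files), and collapsing the authors into one string is just one possible choice.

```r
# Stand-ins for two of the dataframes printed above (first two books only).
html <- data.frame(
  X1 = c("Fundamentals of Physics", "Classical Mechanics"),
  X2 = c("Halliday, Resnick, & Walker", "John R. Taylor"),
  X3 = c("Physics", "Physics"),
  X4 = c(1248L, 745L),
  stringsAsFactors = FALSE
)
json <- data.frame(
  title = c("Fundamentals of Physics", "Classical Mechanics"),
  subject = c("Physics", "Physics"),
  length = c(1248L, 745L),
  stringsAsFactors = FALSE
)
# Add the authors as a list column, then restore the original column order.
json$author <- list(c("Halliday", "Resnick", "Walker"), "John R. Taylor")
json <- json[, c("title", "author", "subject", "length")]

# Restore the column names that the HTML table did not carry.
names(html) <- c("title", "author", "subject", "length")

# Collapse each author vector into one comma-separated string.
json$author <- vapply(json$author, paste, character(1), collapse = ", ")

str(json)
```

Keeping `author` as a list column is equally reasonable if the individual names are needed later. Collapsing only helps when the frames have to be compared or combined. The `length` column from `xmlToDataFrame()` may also come back as text rather than a number. If so, `as.integer(xml$length)` would bring it in line.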