Introduction

For this assignment, I created HTML, XML, and JSON files containing information about three of my college physics textbooks. Each file contains the title, author(s), subject, and page count of every book. I then used R to read each file and create a dataframe from the information.

Code

I read the HTML file with rvest, the XML file with XML, and the JSON file with jsonlite, then converted each one to its own dataframe.

library(XML)
library(jsonlite)
library(rvest)
## Loading required package: xml2
## 
## Attaching package: 'rvest'
## The following object is masked from 'package:XML':
## 
##     xml
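# HTML: html_table() returns a list of tables (one here), which
# as.data.frame() turns into a single dataframe.
html <- as.data.frame(read_html("books.html") %>% html_table())

# XML: parse the file, take the root node, and convert each child
# of the root into one row.
xml <- xmlParse(file = "books.xml")
xml <- xmlRoot(xml)
xml <- xmlToDataFrame(xml)

# JSON: parse the file and keep the dataframe stored under "books".
json <- fromJSON("books.json")
json <- json$books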

html
##                                  X1                          X2      X3   X4
## 1           Fundamentals of Physics Halliday, Resnick, & Walker Physics 1248
## 2               Classical Mechanics              John R. Taylor Physics  745
## 3 Introduction to Quantum Mechanics          David J. Griffiths Physics  434
xml
##                               title                      author subject length
## 1           Fundamentals of Physics Halliday, Resnick, & Walker Physics   1248
## 2               Classical Mechanics              John R. Taylor Physics    745
## 3 Introduction to Quantum Mechanics          David J. Griffiths Physics    434
json
##                               title                    author subject length
## 1           Fundamentals of Physics Halliday, Resnick, Walker Physics   1248
## 2               Classical Mechanics            John R. Taylor Physics    745
## 3 Introduction to Quantum Mechanics        David J. Griffiths Physics    434
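
The HTML output above carries the placeholder names X1 to X4, while the XML and JSON dataframes share the names title, author, subject, and length. One way to line them up is to assign the names by hand. The sketch below is only an illustration: it uses an inline stand-in for books.html, and it assumes the table has no header row (inferred from the printed output, not from the file itself). The object name `html_demo` is hypothetical.

```r
library(rvest)

# Inline stand-in for books.html. Assumption: the real table has no
# header row, which would explain the X1..X4 names in the output.
page <- read_html(paste0(
  "<table>",
  "<tr><td>Classical Mechanics</td><td>John R. Taylor</td>",
  "<td>Physics</td><td>745</td></tr>",
  "<tr><td>Introduction to Quantum Mechanics</td><td>David J. Griffiths</td>",
  "<td>Physics</td><td>434</td></tr>",
  "</table>"
))

# Same pattern as the report: list of tables -> dataframe.
html_demo <- as.data.frame(page %>% html_table())

# Assign the column names used by the XML and JSON dataframes.
names(html_demo) <- c("title", "author", "subject", "length")

names(html_demo)
```

If the real file does include a header row built from plain `<td>` cells, passing `header = TRUE` to `html_table()` may be the better fix, since renaming alone would leave that row in the data.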

Conclusion

Each dataframe differs slightly from the others. The HTML dataframe lost its column headers, which default to X1 through X4, and the JSON dataframe represents the authors of each book as a list instead of a single string (which is nice).
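
To make the list-valued author column concrete, here is a minimal sketch. It uses an inline stand-in for books.json, and its layout (a "books" array whose "author" field is itself an array) is an assumption based on the description above, not a copy of the actual file. The names `json_text`, `books`, and `author_string` are hypothetical. It shows how the list column can be collapsed into one string per book when a flat column is needed for comparison with the other two dataframes.

```r
library(jsonlite)

# Inline stand-in for books.json (assumed layout, see the text above).
json_text <- '{"books": [
  {"title": "Fundamentals of Physics",
   "author": ["Halliday", "Resnick", "Walker"],
   "subject": "Physics", "length": 1248},
  {"title": "Classical Mechanics",
   "author": ["John R. Taylor"],
   "subject": "Physics", "length": 745}
]}'

books <- fromJSON(json_text)$books

# author is a list column: one character vector per book.
is.list(books$author)

# Collapse each vector into a single comma-separated string.
books$author_string <- vapply(books$author, paste, character(1),
                              collapse = ", ")
books$author_string
```

Keeping the list column preserves the individual author names, so collapsing is only worth doing when a plain character column is required.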