Lab5

Author

Gabriel Castellanos

Introduction

This project calls for loading 3 different files (XML, JSON, and HTML), each describing 3 of my favorite books and their attributes. Each file was created with Notepad++ and uploaded to a GitHub repository. In the end, we should have 3 separate data frames, one per file format.

JSON

We need the jsonlite package.

install.packages("jsonlite")

Installing jsonlite [1.8.4] ...
    OK [linked cache]

library(jsonlite)

url <- "https://raw.githubusercontent.com/gc521/DATA-607-Data-Acquisition-Mangement/Lab-5/Books.json"

book <- fromJSON(url)

as.data.frame(book)

                                                  Name
1                                   A Clockwork Orange
2                                      The Kite Runner
3 In For Life: Confessions of a Three-Strikes Prisoner
                           Authour Year       Genre
1                  Anthony Burgess 1962      Horror
2                  Khaled Hosseini 2003       Drama
3 Damien Lartigue amd Eric W. Senn 2017 Non-Fiction
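To show why fromJSON() gets us most of the way to a data frame, here is a minimal sketch that runs without a network connection. The inline JSON string and its field names are hypothetical stand-ins for Books.json (the real file's layout is assumed to be an array with one object per book).

```r
library(jsonlite)

# Hypothetical inline sample; assumed to mirror the layout of Books.json:
# a JSON array containing one flat object per book.
json_sample <- '[
  {"Name": "A Clockwork Orange", "Author": "Anthony Burgess", "Year": 1962},
  {"Name": "The Kite Runner", "Author": "Khaled Hosseini", "Year": 2003}
]'

# By default fromJSON() simplifies an array of flat objects into a data frame,
# with one row per object and one column per key.
books_json <- fromJSON(json_sample)

# Inspect the column names and types
str(books_json)
```

If the file has this shape, the result is already a data frame, and as.data.frame() mainly acts as a safeguard.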

XML

This can be done with the aid of a few packages.

install.packages("xml2")

Installing xml2 [1.3.3] ...
    OK [linked cache]

library(xml2)
install.packages("XML")

Installing XML [3.99-0.13] ...
    OK [linked cache]

library("XML")
install.packages('methods')

* There are no packages to install.

library(methods)
result <- read_xml('https://raw.githubusercontent.com/gc521/DATA-607-Data-Acquisition-Mangement/Lab-5/Books.xml')

books.xml <- xmlParse(result)

books.xml <- xmlToDataFrame(books.xml)
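As an alternative sketch, the same kind of data frame can be built with xml2 alone by pulling out each field with XPath. The inline XML and its element names (books, book, Name, Author, Year) are assumptions for illustration and may not match the real Books.xml.

```r
library(xml2)

# Hypothetical inline sample; the element names are assumed, not taken
# from the real Books.xml.
xml_sample <- '<books>
  <book><Name>A Clockwork Orange</Name><Author>Anthony Burgess</Author><Year>1962</Year></book>
  <book><Name>The Kite Runner</Name><Author>Khaled Hosseini</Author><Year>2003</Year></book>
</books>'

doc <- read_xml(xml_sample)

# One node per book
book_nodes <- xml_find_all(doc, "//book")

# Extract each child element as a column; xml_find_first() returns
# one result per book node, so the columns line up row by row.
books_xml <- data.frame(
  Name   = xml_text(xml_find_first(book_nodes, "./Name")),
  Author = xml_text(xml_find_first(book_nodes, "./Author")),
  Year   = as.integer(xml_text(xml_find_first(book_nodes, "./Year"))),
  stringsAsFactors = FALSE
)

str(books_xml)
```

This takes more typing than xmlToDataFrame(), but it makes the column types explicit and needs only one package.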

HTML

install.packages("rvest")

Installing rvest [1.0.3] ...
    OK [linked cache]

library(rvest)


books.html <- read_html('https://raw.githubusercontent.com/gc521/DATA-607-Data-Acquisition-Mangement/Lab-5/Books.html')
books.html <- html_table(books.html, trim = TRUE)
books.html <- as.data.frame(books.html)
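One detail worth seeing in isolation: html_table() applied to a whole page returns a list with one table per table element, so a single table has to be picked out of that list. The sketch below uses a hypothetical inline HTML table rather than the real Books.html.

```r
library(rvest)

# Hypothetical inline sample; assumed to resemble Books.html, which is
# expected to contain a single <table> with a header row.
html_sample <- '<html><body><table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>A Clockwork Orange</td><td>1962</td></tr>
  <tr><td>The Kite Runner</td><td>2003</td></tr>
</table></body></html>'

page <- read_html(html_sample)

# html_table() on a document returns a list of tibbles, one per <table>
tables <- html_table(page)

# Take the first (here, the only) table and convert it to a base data frame
books_html <- as.data.frame(tables[[1]])

str(books_html)
```

Indexing with [[1]] makes the choice of table explicit, which matters if the page ever gains a second table.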

Conclusion

All 3 file types posed their own challenges, but with the right packages (jsonlite for JSON, xml2 and XML for XML, and rvest for HTML), loading each file into a data frame becomes very manageable.