Data607 - HW7

Assignment – Working with XML and JSON in R

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(RCurl)
library(tidyverse)

## -- Attaching packages ------------------------------------------------------------------------------------------------ tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.0
## v readr   1.3.1

## -- Conflicts --------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x tidyr::complete() masks RCurl::complete()
## x dplyr::filter()   masks stats::filter()
## x dplyr::lag()      masks stats::lag()

library(XML)
library(knitr)
library(rjson)
library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following object is masked from 'package:purrr':
## 
##     compact

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

JSON

library(jsonlite)

## 
## Attaching package: 'jsonlite'

## The following objects are masked from 'package:rjson':
## 
##     fromJSON, toJSON

## The following object is masked from 'package:purrr':
## 
##     flatten

books_json <- fromJSON("https://raw.githubusercontent.com/hrensimin05/Data_607/master/books.json")

books_json <- bind_rows(books_json, .id = 'Author')
books_json

## # A tibble: 3 x 4
##      ID Title                                             Author       ISBN     
##   <int> <chr>                                             <chr>        <chr>    
## 1     1 Innumeracy : mathematical illiteracy and its con~ John Paulos  08090584~
## 2     2 The Rosie Project                                 Graeme Sims~ 14767290~
## 3     3 Is Everyone Hanging Out Without Me? (And Other C~ Mindy Kaling 03078862~
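
How the JSON file is laid out decides what fromJSON() returns. The sketch below is not the books.json file used above; it uses a small, made-up inline JSON string (an assumption for illustration) to show that a top-level array of records is simplified straight into a data frame, with no extra reshaping step.

```r
library(jsonlite)

# Hypothetical inline JSON (not the contents of books.json):
# a top-level array with one object per book.
json_text <- '[
  {"ID": 1, "Title": "Innumeracy", "Author": "John Paulos"},
  {"ID": 2, "Title": "The Rosie Project", "Author": "Graeme Simsion"}
]'

# jsonlite simplifies an array of flat objects into a data frame by default.
books_example <- jsonlite::fromJSON(json_text)

class(books_example)
names(books_example)
```

If the file instead nests the records under named keys, fromJSON() returns a list, and a step such as bind_rows() is needed to stack the pieces into one table.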

HTML

books_df <- readHTMLTable(
    getURL("https://raw.githubusercontent.com/hrensimin05/Data_607/master/books.htm"), header = TRUE, which = 1)

class(books_df)

## [1] "data.frame"

knitr::kable(books_df)

ID	Title	Author	ISBN
1	Innumeracy : mathematical illiteracy and its consequences	John Paulos	0809058405
2	The Rosie Project	Graeme Simsion	1476729093
3	Is Everyone Hanging Out Without Me? (And Other Concerns)	Mindy Kaling	0307886271

XML

books2 <- ldply(xmlToList(getURL("https://raw.githubusercontent.com/hrensimin05/Data_607/master/books.xml")), data.frame) %>%
    select(-.id)

class(books2)

## [1] "data.frame"

knitr::kable(books2)

id	title	author	isbn
1	Innumeracy : mathematical illiteracy and its consequences	John Paulos	0809058405
2	The Rosie Project	Graeme Simsion	1476729093
3	Is Everyone Hanging Out Without Me? (And Other Concerns)	Mindy Kaling	0307886271
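
For a flat XML layout like this one, the XML package also offers xmlToDataFrame() as an alternative to the xmlToList() plus ldply() approach. The following is a minimal sketch on a made-up inline document (the element names and values are assumptions for illustration, not the contents of books.xml).

```r
library(XML)

# Hypothetical inline XML: one <book> node per record, one child node per field.
xml_text <- '<books>
  <book><id>1</id><title>Innumeracy</title></book>
  <book><id>2</id><title>The Rosie Project</title></book>
</books>'

# Parse the string as XML text rather than as a file name or URL.
doc <- xmlParse(xml_text, asText = TRUE)

# Each child of the root becomes a row; each grandchild becomes a column.
books_xml_example <- xmlToDataFrame(doc)

dim(books_xml_example)
names(books_xml_example)
```

This only works well when every record has the same shallow structure; deeper nesting would still need xmlToList() or XPath queries.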

Conclusion

All three files store the information slightly differently. The HTML and XML data frames are the same, but the JSON data frame, which was the most difficult for me to implement, required transforming the data into rows. I also structured the JSON file a bit differently compared to the XML and HTML files.

# Compare the HTML and XML data frames element by element
books_df == books2

##        ID Title Author ISBN
## [1,] TRUE  TRUE   TRUE TRUE
## [2,] TRUE  TRUE   TRUE TRUE
## [3,] TRUE  TRUE   TRUE TRUE
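
The == comparison above checks values cell by cell and labels the result with the first data frame's column names, so it does not reveal that the HTML table uses capitalized headers (ID, Title) while the XML table uses lowercase ones (id, title). A small sketch with two made-up data frames (hypothetical stand-ins for books_df and books2) shows how values and column names can be checked separately.

```r
# Hypothetical stand-ins for the HTML-derived and XML-derived data frames.
html_like <- data.frame(ID = c("1", "2"), Title = c("A", "B"),
                        stringsAsFactors = FALSE)
xml_like  <- data.frame(id = c("1", "2"), title = c("A", "B"),
                        stringsAsFactors = FALSE)

# Element-wise comparison: only the cell values are compared.
values_match <- all(html_like == xml_like)

# Column names are compared separately, and here they differ in case.
names_match <- identical(names(html_like), names(xml_like))

# Comparing names case-insensitively confirms they differ only in capitalization.
names_match_ignoring_case <- identical(tolower(names(html_like)),
                                       tolower(names(xml_like)))
```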

Dominika Markowska-Desvallons

10/10/2020
