Loading the necessary packages
library(XML)
library(rjson)
## Warning: package 'rjson' was built under R version 3.4.4
library(RJSONIO)
##
## Attaching package: 'RJSONIO'
## The following objects are masked from 'package:rjson':
##
## fromJSON, toJSON
suppressWarnings(suppressMessages(library(DT)))
library(data.table)
library(RCurl)
## Loading required package: bitops
suppressWarnings(suppressMessages(library(tidyverse)))
library(knitr)
suppressWarnings(suppressMessages(library(rvest)))
library(plyr)
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
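The plyr startup message above asks for plyr to be attached before dplyr. A minimal sketch of one way to do that is shown here; the helper name attach_quietly and the vector pkgs are illustrative and not part of the original report. It assumes only one JSON package is needed, so rjson is left out (RJSONIO masks its fromJSON and toJSON anyway). Packages that are not installed are skipped rather than raising an error.

```r
# Packages in attach order: plyr comes before tidyverse (which attaches dplyr),
# as the plyr startup message recommends.
pkgs <- c("XML", "RJSONIO", "RCurl", "plyr", "tidyverse",
          "rvest", "DT", "data.table", "knitr")

# Hypothetical helper: attach one package without its startup messages.
# Returns TRUE if the package was attached, FALSE if it is not installed.
attach_quietly <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  TRUE
}

# Named logical vector showing which packages were attached.
loaded <- vapply(pkgs, attach_quietly, logical(1))
loaded
```

Note that suppressPackageStartupMessages() hides startup messages only; warnings such as "built under R version ..." would still be printed.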
Converting to Data Frames
html <- readHTMLTable(books.html)
html.df <- data.frame(html, stringsAsFactors = FALSE)
# Drop the literal "NULL." prefix that data.frame() adds to the column names
names(html.df) <- gsub("NULL.", "", names(html.df), fixed = TRUE)
html.df
## Title
## 1 Hackers: Heroes of the Computer Revolution
## 2 DarkMarket
## 3 Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## Authors Genre
## 1 Steven Levy Technology
## 2 Misha Glenny Crime
## 3 Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## Publisher Year ISBN Price
## 1 O'Reilly Media 2001 9780141000510 $18.15
## 2 Vintage 2012 9780307476449 $16.00
## 3 Little, Brown and Company 2011 9780316037723 $2.99
xml.df <- xmlToDataFrame(books.xml, stringsAsFactors = FALSE)
xml.df
## title
## 1 Hackers: Heroes of the Computer Revolution
## 2 DarkMarket
## 3 Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## authors genre
## 1 Steven Levy Technology
## 2 Misha Glenny Crime
## 3 Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## publisher year isbn price
## 1 O'Reilly Media 2001 9780141000510 $18.15
## 2 Vintage 2012 9780307476449 $16.00
## 3 Little, Brown and Company 2011 ‎9780316037723 $14.45
books <- fromJSON(books.json)
jsonframe <- data.frame(books$`Tech Books`)
jsonframe
## structure.c..Hackers..Heroes.of.the.Computer.Revolution....Steven.Levy...
## Title Hackers: Heroes of the Computer Revolution
## Author Steven Levy
## Genre Technology
## Publisher O'Reilly Media
## Year 2001
## ISBN 9780141000510
## Price $18.15
## structure.c..DarkMarket....Misha.Glenny....Crime....Vintage...
## Title DarkMarket
## Author Misha Glenny
## Genre Crime
## Publisher Vintage
## Year 2012
## ISBN 9780307476449
## Price $16.00
## Title
## Title Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## Author Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## Genre Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## Publisher Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## Year Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## ISBN Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## Price Ghost in the Wires: My Adventures as the World's Most Wanted Hacker
## Author Genre
## Title Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## Author Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## Genre Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## Publisher Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## Year Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## ISBN Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## Price Kevin Mitnick,Steve Wozniak,William L. Simon Biography
## Publisher Year ISBN Price
## Title Little, Brown and Company 2011 9780316037723 $14.45
## Author Little, Brown and Company 2011 9780316037723 $14.45
## Genre Little, Brown and Company 2011 9780316037723 $14.45
## Publisher Little, Brown and Company 2011 9780316037723 $14.45
## Year Little, Brown and Company 2011 9780316037723 $14.45
## ISBN Little, Brown and Company 2011 9780316037723 $14.45
## Price Little, Brown and Company 2011 9780316037723 $14.45
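The JSON frame above has the fields as rows and the third book smeared across several columns. One possible way to get one row per book is sketched below. The list records is a hypothetical stand-in for books$`Tech Books` with only three fields; the real parsed JSON may be shaped differently, and the names record_to_row and json.df are illustrative.

```r
# Hypothetical stand-in for books$`Tech Books`: one named record per book.
records <- list(
  list(Title = "Hackers: Heroes of the Computer Revolution",
       Author = "Steven Levy", Year = "2001"),
  list(Title = "DarkMarket",
       Author = "Misha Glenny", Year = "2012"),
  list(Title = "Ghost in the Wires: My Adventures as the World's Most Wanted Hacker",
       Author = c("Kevin Mitnick", "Steve Wozniak", "William L. Simon"),
       Year = "2011")
)

# Turn one record into a one-row data frame. Multi-valued fields
# (such as several authors) are collapsed into one comma-separated string.
record_to_row <- function(rec) {
  fields <- lapply(rec, function(x) paste(unlist(x), collapse = ","))
  as.data.frame(fields, stringsAsFactors = FALSE)
}

# Stack the rows so that each book is a row and each field is a column.
json.df <- do.call(rbind, lapply(records, record_to_row))
json.df
```

Collapsing each field to a single string before calling as.data.frame() keeps every record the same width, which is what lets rbind() stack them.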
identical(jsonframe, html.df)
## [1] FALSE
identical(jsonframe, xml.df)
## [1] FALSE
identical(xml.df, html.df)
## [1] FALSE
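The three data frames are not the same: every pairwise identical() comparison above returns FALSE. The printed output shows why. The HTML frame uses capitalized column names (Title, Authors) while the XML frame uses lowercase ones (title, authors), the two list different prices for the third book ($2.99 versus $14.45), and the JSON frame came out with the fields as rows rather than one row per book.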
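Because identical() only answers TRUE or FALSE, it can help to compare column by column once the names are aligned. The sketch below does this on two small stand-in frames built from values in the printed output. The names html.small, xml.small and same_column are illustrative, and only two columns and two books are included.

```r
# Small stand-ins for html.df and xml.df, using values from the output above.
html.small <- data.frame(
  Title = c("DarkMarket",
            "Ghost in the Wires: My Adventures as the World's Most Wanted Hacker"),
  Price = c("$16.00", "$2.99"),
  stringsAsFactors = FALSE
)
xml.small <- data.frame(
  title = c("DarkMarket",
            "Ghost in the Wires: My Adventures as the World's Most Wanted Hacker"),
  price = c("$16.00", "$14.45"),
  stringsAsFactors = FALSE
)

# identical() is all-or-nothing: differing column names alone make it FALSE.
identical(html.small, xml.small)

# Normalise the column names, then test each pair of columns separately.
names(html.small) <- tolower(names(html.small))
same_column <- mapply(identical, html.small, xml.small)
same_column

# Row numbers where the price differs between the two sources.
which(html.small$price != xml.small$price)
```

Here same_column is a named logical vector with one entry per column, so it points at the column that breaks the comparison instead of a single FALSE.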