Create three files which store book content in HTML (table) format, XML format, and JSON format. Load the information from each of the three sources into separate R data frames.

Are the three data frames identical?
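
One way to make this question concrete, as a minimal sketch on two toy data frames (the names `a` and `b` are illustrative and not part of the original analysis): `identical()` requires matching dimensions, column names, column types and values, while `all.equal()` describes whatever differences it finds.

```r
# Two toy frames holding the same book; price is numeric in one, text in the other
a <- data.frame(title = "Flashboys", price = 14.31, stringsAsFactors = FALSE)
b <- data.frame(title = "Flashboys", price = "14.31", stringsAsFactors = FALSE)

strict_before <- identical(a, b)   # FALSE: the price columns have different types
report <- all.equal(a, b)          # character vector describing the mismatch

b$price <- as.numeric(b$price)     # harmonise the column type
same_after <- isTRUE(all.equal(a, b))  # TRUE once the types agree
```

The same checks could be applied pairwise to the three data frames built in the sections below, after bringing them to a common shape.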

  1. Load packages
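
The package list is not shown in the write-up. The sketch below assumes `XML` (for `readHTMLTable` and `xmlToList`), `plyr` (for `ldply`) and `knitr` (for `kable`), which export the functions used later. The JSON package is a guess: `rjson` is assumed here, but `RJSONIO` and `jsonlite` also export a `fromJSON`.

```r
# Assumed package set; only the function names appear in the original
pkgs <- c("XML", "plyr", "knitr", "rjson")

# Check which packages are installed before attaching them
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
for (p in pkgs[installed]) library(p, character.only = TRUE)

if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}
```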

  2. Evaluate books.html

rinterface

readHTMLTable

doesn’t capture header names (columns come back as NULL.V1 through NULL.V8)
shifts values in sparse rows leftward, leaving the missing cells (NA) at the right
aligns columns nicely
adds no extraneous attributes
url <-  "/Users/scottkarr/IS607Spring2016/hw8/more/books.html"
html <- readHTMLTable(url)
df <- data.frame(html)
kable(df, align='l')
| NULL.V1 | NULL.V2 | NULL.V3 | NULL.V4 | NULL.V5 | NULL.V6 | NULL.V7 | NULL.V8 |
|:--------|:--------|:--------|:--------|:--------|:--------|:--------|:--------|
| Flashboys | Michael Lewis | Non-fiction | Thriller, Suspense, Sleuth | 14.31 | Chronicles the team that discovered market distortions from electronic trading. | NA | NA |
| title | author1 | author2 | author3 | type | genre | price | description |
| The Economists’ Voice | Joseph E. Stiglitz | Aaron S. Edlin | J. Bradford Delong | Non-fiction | Economics, Public Policy | 18.95 | Compilation of contemporary economists on public policy issues. |
| title | author1 | type | genre | price | description | NA | NA |
| The Circle | David Eggers | Fiction | Thriller, Suspense, Dystopia, Contemporary issues | 9.49 | A “Brave New World” for the modern surveillance state. | NA | NA |
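
The `NULL.` prefix appears because `readHTMLTable` returns a list of tables and `data.frame()` is applied to the whole list. A hedged sketch of one alternative: ask for a single table with `which = 1` and request a header row with `header = TRUE`. The inline HTML is a hypothetical stand-in for books.html, whose markup is not shown, and `books_html` is an illustrative name.

```r
library(XML)

# Hypothetical stand-in for books.html (the real file's markup is not shown)
html_text <- paste0(
  "<table>",
  "<tr><th>title</th><th>author1</th><th>price</th></tr>",
  "<tr><td>Flashboys</td><td>Michael Lewis</td><td>14.31</td></tr>",
  "</table>"
)

doc <- htmlParse(html_text, asText = TRUE)

# which = 1 returns the first table as a data frame rather than a list;
# header = TRUE treats the first row as column names
books_html <- readHTMLTable(doc, header = TRUE, which = 1,
                            stringsAsFactors = FALSE)
names(books_html)
```

This does not address rows with differing numbers of cells; a table whose rows repeat their own header, as in the output above, would still need to be realigned by hand.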
  3. Evaluate books.xml

rinterface

xmlToList

captures header names
maintains table structure with nulls
aligns columns nicely
adds columns for attributes (title..attrs) and text nodes (text, text.1, ...)
url <-  "/Users/scottkarr/IS607Spring2016/hw8/more/books.xml"
df <- ldply(xmlToList(url), data.frame)
kable(df, align='l')
.id text title.text title..attrs text.1 author1 text.2 type text.3 genre text.4 price text.5 description text.6 author2 author3 genre.1 text.7 text.8
book Flashboys en Michael Lewis Non-fiction Thriller, Suspense, Sleuth 14.31 Chronicles the team that discovered market distortions from electronic trading. NA NA NA NA NA
book The Economists’ Voice en Joseph E. Stiglitz NA Non-fiction 18.95 Compilation of contemporary economists on public policy issues. Aaron S. Edlin J. Bradford Delong Economics, Public Policy
book The Circle en David Eggers NA Fiction 9.49 A “Brave New World” for the modern surveillance state. NA NA Thriller, Suspense, Dystopia, Contemporary issues NA NA
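
The extra `text` columns probably come from whitespace text nodes, and `title..attrs` from an attribute on `title`, both of which `xmlToList` preserves. As a hedged alternative, `xmlToDataFrame` from the same package keeps only the child element values. The inline XML is a hypothetical stand-in for books.xml, and `books_xml` is an illustrative name.

```r
library(XML)

# Hypothetical stand-in for books.xml; only the second book has author2
xml_text <- paste0(
  "<books>",
  "<book><title lang=\"en\">Flashboys</title>",
  "<author1>Michael Lewis</author1><price>14.31</price></book>",
  "<book><title lang=\"en\">The Economists' Voice</title>",
  "<author1>Joseph E. Stiglitz</author1><author2>Aaron S. Edlin</author2>",
  "<price>18.95</price></book>",
  "</books>"
)

doc <- xmlParse(xml_text, asText = TRUE)

# One row per <book>, one column per distinct child element name;
# attributes such as lang are not carried over
books_xml <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
names(books_xml)
```

All columns come back as character, so `price` would still need `as.numeric()` before numeric comparison.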
  4. Evaluate books.json

rinterface

fromJSON

captures header names
maintains table structure with nulls
columns displayed in wide format
adds no extraneous attributes
url <-  "/Users/scottkarr/IS607Spring2016/hw8/more/books.json"
json <- fromJSON(paste(readLines(url), collapse=""))
df <- data.frame(json)
kable(df, align='l')
title author1 author2 author3 type genre price description title.1 author1.1 author2.1 author3.1 type.1 genre.1 price.1 description.1 title.2 author1.2 author2.2 author3.2 type.2 genre.2 price.2 description.2
Flashboys Michael Lewis Non-fiction Thriller, Suspense, Sleuth 14.31 Chronicles the team that discovered market distortions from electronic trading. The Economists’ Voice Joseph Stiglitz Aaron S. Edlin J. Bradford Delong Non-fiction Economics, Public Policy 18.95 Compilation of contemporary economists on public policy issues. The Circle David Eggers Fiction Thriller, Suspense, Dystopia, Contemporary Issues 9.49 A ‘Brave New World’ for the modern surveillance state.
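
The wide layout arises because `data.frame()` binds the three book records side by side. A minimal base R sketch of stacking them as rows instead is shown here. It assumes the parsed JSON is a list of records with the same fields in each, and that the missing authors are stored as empty strings; the toy `json` list carries only a few of the fields, and `books_json` is an illustrative name.

```r
# Toy stand-in for the parsed JSON: one named list per book (assumed structure)
json <- list(
  list(title = "Flashboys", author1 = "Michael Lewis",
       author2 = "", author3 = "", price = 14.31),
  list(title = "The Economists' Voice", author1 = "Joseph Stiglitz",
       author2 = "Aaron S. Edlin", author3 = "J. Bradford Delong", price = 18.95),
  list(title = "The Circle", author1 = "David Eggers",
       author2 = "", author3 = "", price = 9.49)
)

# Convert each record to a one-row data frame, then stack the rows
books_json <- do.call(
  rbind,
  lapply(json, function(book) data.frame(book, stringsAsFactors = FALSE))
)
dim(books_json)
```

`rbind` requires every record to have the same field names, so records with differing fields would need a filling step first, for example `plyr::rbind.fill`.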