I chose three nonfiction books that I used during my undergraduate years as a Physics student. For each book I recorded five attributes: title, edition, author(s), cover designer, and cover art/image. I then created three files (HTML, XML, and JSON) by hand and stored the books' information in each of them separately.
Install and load the packages needed for this analysis.
library(devtools)
## Warning: package 'devtools' was built under R version 3.2.2
## WARNING: Rtools is required to build R packages, but is not currently installed.
##
## Please download and install Rtools 3.3 from http://cran.r-project.org/bin/windows/Rtools/ and then run find_rtools().
devtools::install_github("crubba/htmltab")
## Downloading GitHub repo crubba/htmltab@master
## Installing htmltab
## "C:/PROGRA~1/R/R-32~1.1/bin/x64/R" --no-site-file --no-environ --no-save \
## --no-restore CMD INSTALL \
## "C:/Users/Nabila/AppData/Local/Temp/Rtmp8qZrlL/devtools19b06f482a8a/crubba-htmltab-51d42b0" \
## --library="C:/Users/Nabila/Documents/R/win-library/3.2" \
## --install-tests
library(htmltab)
library(XML)
## Warning: package 'XML' was built under R version 3.2.2
library(RCurl)
## Loading required package: bitops
library(plyr)
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.2.2
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:utils':
##
## View
HTML File Link: https://github.com/nabilahossain/Class-IS607/blob/master/Week%208%20Assignment/Week_8_Assignment_HTML.html
Using the htmltab package, I loaded the HTML table that I created from the file hosted online. Since we only had to load the data from online, I did not tidy or transform the table.
u1 <- "https://raw.githubusercontent.com/nabilahossain/Class-IS607/master/Week%208%20Assignment/Week_8_Assignment_HTML.html"
books1<- htmltab(doc = u1)
## Argument 'which' was left unspecified. Choosing first table.
books1
## Table 1: Three Physics books and their cover. >> Book Title
## 3 Horizons: Exploring the Universe
## 4 Horizons: Exploring the Universe
## 5 Horizons: Exploring the Universe
## 6 Horizons: Exploring the Universe
## 7 Concise Notes For Physics
## 8 Fundamentals Of Electric Circuits
## 9 Fundamentals Of Electric Circuits
## Table 1: Three Physics books and their cover. >> Edition
## 3 11th
## 4 11th
## 5 11th
## 6 11th
## 7 4th
## 8 4th
## 9 4th
## Table 1: Three Physics books and their cover. >> Author
## 3 Michael A. Seeds
## 4 Michael A. Seeds
## 5 Dana E. Backman
## 6 Dana E. Backman
## 7 Dr. Robert W. Finkel
## 8 Charles K. Alexander
## 9 Matthew N. O. Sadiku
## Table 1: Three Physics books and their cover. >> Cover Designer
## 3 Irene Morris
## 4 Irene Morris
## 5 Irene Morris
## 6 Irene Morris
## 7 Renee Sartell
## 8 Studio Montage
## 9 Studio Montage
## Table 1: Three Physics books and their cover. >> Cover Art/Image
## 3 Background: Young stars in the Rho Ophiuchi Cloud
## 4 Top Insert: Nebula in the Large Magellanic Cloud
## 5 Middle Insert: Gamma-ray burst
## 6 Bottom Insert: Phoenix Mars Lander
## 7 Pattern 4
## 8 Astronauts Repairing Spacecraft
## 9 Astronauts Repairing Spacecraft
XML File Link: https://github.com/nabilahossain/Class-IS607/blob/master/Week%208%20Assignment/Week_8_Assignment_XML.xml
Using the XML, RCurl and plyr packages, I loaded the XML file from online and created the data frame shown below. Since we only had to load the data from online, I did not tidy or transform the table.
u2 <- getURL("https://raw.githubusercontent.com/nabilahossain/Class-IS607/master/Week%208%20Assignment/Week_8_Assignment_XML.xml")
xmlt1 <- htmlParse(u2, asText=TRUE)
books2 <- ldply(xmlToList(xmlt1), data.frame)
## Warning in data.frame(title = "Horizons: Exploring the Universe", edition
## = "11th", : row names were found from a short variable and have been
## discarded
## Warning in data.frame(book = structure(list(title = "Horizons: Exploring
## the Universe", : row names were found from a short variable and have been
## discarded
books2
## .id catalog.book.title catalog.book.edition
## 1 body Horizons: Exploring the Universe 11th
## 2 body Horizons: Exploring the Universe 11th
## 3 body Horizons: Exploring the Universe 11th
## 4 body Horizons: Exploring the Universe 11th
## catalog.book.author catalog.book.cover_designer
## 1 Michael A. Seeds Irene Morris
## 2 Dana E. Backman Irene Morris
## 3 Michael A. Seeds Irene Morris
## 4 Dana E. Backman Irene Morris
## catalog.book.cover_art_image catalog.book..attrs
## 1 Young stars in the Rho Ophiuchi Cloud 1
## 2 Nebula in the Large Magellanic Cloud 1
## 3 Gamma-ray burst 1
## 4 Phoenix Mars Lander 1
## catalog.book.title.1 catalog.book.edition.1 catalog.book.author.1
## 1 Concise Notes For Physics 4th Dr. Robert W. Finkel
## 2 Concise Notes For Physics 4th Dr. Robert W. Finkel
## 3 Concise Notes For Physics 4th Dr. Robert W. Finkel
## 4 Concise Notes For Physics 4th Dr. Robert W. Finkel
## catalog.book.cover_designer.1 catalog.book.cover_art_image.1
## 1 Renee Sartell Pattern 4
## 2 Renee Sartell Pattern 4
## 3 Renee Sartell Pattern 4
## 4 Renee Sartell Pattern 4
## catalog.book..attrs.1 catalog.book.title.2
## 1 2 Fundamentals Of Electric Circuits
## 2 2 Fundamentals Of Electric Circuits
## 3 2 Fundamentals Of Electric Circuits
## 4 2 Fundamentals Of Electric Circuits
## catalog.book.edition.2 catalog.book.author.2
## 1 4th Charles K. Alexandert
## 2 4th Matthew N. O. Sadiku
## 3 4th Charles K. Alexandert
## 4 4th Matthew N. O. Sadiku
## catalog.book.cover_designer.2 catalog.book.cover_art_image.2
## 1 Studio Montage Astronauts Repairing Spacecraft
## 2 Studio Montage Astronauts Repairing Spacecraft
## 3 Studio Montage Astronauts Repairing Spacecraft
## 4 Studio Montage Astronauts Repairing Spacecraft
## catalog.book..attrs.2
## 1 3
## 2 3
## 3 3
## 4 3
JSON File Link: https://github.com/nabilahossain/Class-IS607/blob/master/Week%208%20Assignment/Week_8_Assignment_JSON.json
Using the jsonlite package, I loaded the JSON file from online and created the data frame shown below. Since we only had to load the data from online, I did not tidy or transform the table.
jf <- "https://raw.githubusercontent.com/nabilahossain/Class-IS607/master/Week%208%20Assignment/Week_8_Assignment_JSON.json"
books3 <- fromJSON(jf, flatten = FALSE)
books3
## id Title Edition
## 1 1 Horizons: Exploring the Universe 11th
## 2 2 Concise Notes For Physics 4th
## 3 2 Fundamentals Of Electric Circuits 4th
## Author Cover_Designer
## 1 Michael A. Seeds, Dana E. Backman Irene Morris
## 2 Dr. Robert W. Finkel Renee Sartell
## 3 Charles K. Alexander, Matthew N. O. Sadiku Studio Montage
## Cover_Art_Image
## 1 Background: Young stars in the Rho Ophiuchi Cloud, Top_Insert: Nebula in the Large Magellanic Cloud, Middle_Insert: Gamma-ray burst, Bottom_Insert:Phoenix Mars Lander
## 2 Pattern 4
## 3 Astronauts Repairing Spacecraft
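When a JSON field holds arrays of different lengths, such as one or two authors per book, fromJSON keeps it as a list column. A minimal sketch of flattening such a column into plain strings follows; the inline JSON is a shortened, assumed stand-in for the original file, and the "; " separator is an arbitrary choice.

```r
library(jsonlite)

# Inline sample (assumed structure): an array of book objects with an Author array
json_text <- '[
  {"id": 1, "Title": "Horizons: Exploring the Universe", "Edition": "11th",
   "Author": ["Michael A. Seeds", "Dana E. Backman"]},
  {"id": 2, "Title": "Concise Notes For Physics", "Edition": "4th",
   "Author": ["Dr. Robert W. Finkel"]}
]'

books_json <- fromJSON(json_text)

# Author arrives as a list of character vectors; collapse each to a single string
books_json$Author <- sapply(books_json$Author, paste, collapse = "; ")

str(books_json)
```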
The three data frames are not identical; they all look different. The HTML data frame has seven rows (labeled 3 through 9) and five columns. The XML data frame has 4 rows and 19 columns. The JSON data frame has 3 rows and 6 columns (an id plus the five attributes). The HTML data frame is the tidiest-looking of the three. The XML data frame would need the most transformation and tidying.
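A comparison like this can also be made programmatically. The sketch below uses small toy data frames as stand-ins (their names and contents are made up for illustration) and reports the dimensions of each, along with a pairwise identical() check.

```r
# Toy stand-ins, not the book data: three frames with different shapes
frames <- list(
  html = data.frame(a = 1:3, b = 4:6),
  xml  = data.frame(a = 1:2, b = 3:4, c = 5:6),
  json = data.frame(a = 1:3, b = 4:6)
)

# One row per data frame, with its number of rows and columns
shape <- t(sapply(frames, dim))
colnames(shape) <- c("rows", "cols")
shape

# identical() is TRUE only when contents, names, and types all match
same_html_json <- identical(frames$html, frames$json)
same_html_xml  <- identical(frames$html, frames$xml)
```

With the real objects, the list would hold books1, books2 and books3 instead of the toy frames.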