DATA 607 WEEK 7 ASSIGNMENT
WORKING WITH HTML, XML AND JSON IN R
March 26, 2019
SOURCE FILES, R SCRIPTS AND RMARKDOWN ON GITHUB
An HTML table is defined with the <table> tag. Each row is defined with a <tr> tag, each header cell with a <th> tag, and each data cell with a <td> tag. By default, browsers render header cells bold and centered.
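To see how these tags map onto a data frame, here is a minimal sketch that parses a small inline table with rvest. The HTML snippet and the names `snippet_html`, `toy_tables` and `toy_frame` are made up for illustration and are not part of the assignment files.

```r
library(rvest)

# Hypothetical inline HTML: one header row (<th>) and two data rows (<td>)
snippet_html <- paste0(
  "<table>",
  "<tr><th>Title</th><th>Year</th></tr>",
  "<tr><td>Book A</td><td>2012</td></tr>",
  "<tr><td>Book B</td><td>2011</td></tr>",
  "</table>"
)

# read_html() accepts literal HTML as well as a URL or a file path
toy_tables <- read_html(snippet_html) %>%
  html_nodes("table") %>%
  html_table()

# html_table() returns one data frame per <table> node, so take the first
toy_frame <- toy_tables[[1]]
names(toy_frame)  # column names are taken from the <th> cells
nrow(toy_frame)   # one row per <tr> that contains <td> cells
```

The same three steps (read, select the table nodes, convert) are applied to the real books.html file in the code that follows.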
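# Packages used in this section
library(rvest)       # read_html(), html_nodes(), html_table(), %>%
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box()

# Read the HTML page and convert every <table> element into a data frame
url_html <- read_html("https://raw.githubusercontent.com/kleberperez1/CUNY-SPS-Data607-Week7-Assignment/master/books.html")
tabs <- url_html %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)
html_frame <- tabs[[1]]  # keep the first table found on the page
kable(html_frame) %>%
  kable_styling(bootstrap_options = "striped", font_size = 10) %>%
  scroll_box()

| Title | Authors | Genre | Publisher | Year | ISBN | QR Code | Price |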
|---|---|---|---|---|---|---|---|
| QR Codes For The Creative Business Person | Dr. Chris Thomas | Self help | Milton Contact Limited | 2012 | 9780956264992 | NA | $16 |
| Scan Me: Everybody’s Guide to the Magical World of QR Codes | Mick Winter | Guide | Westsong Publishing | 2011 | 9780965900034 | NA | $15.95 |
| 40 Ways to Use QR Codes For Mobile Marketing | Jeff A Hamilton, Joan Mullally, Andrew P Simon | Guide | External Spiral Books | 2013 | None | NA | $2.99 |
XML (Extensible Markup Language) is a text-based markup format for encoding structured data, so that both the structure and the data can be shared on the World Wide Web, intranets, and elsewhere. Like HTML, XML uses nested tags, but the tag names are chosen by the document author rather than fixed by the language.
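As a small illustration of how author-defined tags become columns, the sketch below parses an inline XML string with the XML package. The snippet and the names `snippet_xml`, `doc` and `toy_xml` are hypothetical and only stand in for the real books.xml file.

```r
library(XML)

# Hypothetical inline XML: <books> is the root, each <book> is one record
snippet_xml <- paste0(
  "<books>",
  "<book><title>Book A</title><year>2012</year></book>",
  "<book><title>Book B</title><year>2011</year></book>",
  "</books>"
)

# Parse the text, then turn each child of the root node into one row
doc <- xmlParse(snippet_xml, asText = TRUE)
toy_xml <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

names(toy_xml)          # columns are named after the child tags
sapply(toy_xml, class)  # values are read as character text, not numbers
```

Because the values arrive as text, numeric fields such as the year would need an explicit conversion (for example with `as.integer()`) before any arithmetic.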
library(RCurl); library(XML); library(dplyr)  # getURL(), xmlToDataFrame(), as_tibble()
xmlfile <- getURL("https://raw.githubusercontent.com/kleberperez1/CUNY-SPS-Data607-Week7-Assignment/master/books.xml")
xmlBooks <- as_tibble(xmlToDataFrame(xmlfile))  # as_tibble() replaces the deprecated tbl_df()
xmlBooks %>% kable() %>% kable_styling(bootstrap_options = "striped", font_size = 10) %>% scroll_box()

| title | authors | genre | publisher | year | isbn | price |
|---|---|---|---|---|---|---|
| QR Codes For The Creative Business Person | Dr. Chris Thomas | Self help | Milton Contact Limited | 2012 | 9780956264992 | $16 |
| Scan Me: Everybody’s Guide to the Magical World of QR Codes | Mick Winter | Guide | Westsong Publishing | 2011 | 9780965900034 | $15.95 |
| 40 Ways to Use QR Codes For Mobile Marketing | Jeff A Hamilton,Joan Mullally,Andrew P Simon | Guide | External Spiral Books | 2013 | None | $2.99 |
The JSON syntax is a subset of the JavaScript syntax.
JSON data is written as name/value pairs.
A name/value pair consists of a field name (in double quotes), followed by a colon, followed by a value: "name": "John".
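The name/value structure can be seen in a minimal sketch that parses an inline JSON string. It assumes the jsonlite package, which may not be the package used for the results below, and the snippet and the names `snippet_json` and `toy_json` are made up for illustration.

```r
library(jsonlite)

# Hypothetical JSON text: an array of objects, each a set of name/value pairs
snippet_json <- '[
  {"name": "John", "age": 30},
  {"name": "Ana",  "age": 25}
]'

is_valid <- validate(snippet_json)   # checks that the text is well-formed JSON
toy_json <- fromJSON(snippet_json)   # an array of objects simplifies to a data frame

is_valid
names(toy_json)  # field names become column names
str(toy_json)
```

Each object in the array becomes one row, and each field name becomes a column.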
# isValidJSON() is provided by RJSONIO; fromJSON() is exported by both RJSONIO and jsonlite
jsonFile <- "https://raw.githubusercontent.com/kleberperez1/CUNY-SPS-Data607-Week7-Assignment/master/books.json"
isValidJSON(jsonFile)
## [1] TRUE
jsonbooks <- as_tibble(as.data.frame(fromJSON(jsonFile)))  # as_tibble() replaces the deprecated tbl_df()
jsonbooks %>% kable() %>% kable_styling(bootstrap_options = "striped", font_size = 10) %>% scroll_box()

| Title | Authors | Genre | Publisher | Year | ISBN | QR.Code | Price |
|---|---|---|---|---|---|---|---|
| QR Codes For The Creative Business Person | Dr. Chris Thomas | Self help | Milton Contact Limited | 2012 | 9780956264992 | http://miltoncontact.co.uk/sites/default/files/imagecache/cat_grid_image/qr_code_book_cover-front.png | $16 |
| Scan Me: Everybody’s Guide to the Magical World of QR Codes | Mick Winter | Guide | Westsong Publishing | 2011 | 9780965900034 | https://images-na.ssl-images-amazon.com/images/I/51Tbhu0t5xL._SX329_BO1,204,203,200_.jpg | $15.95 |
| 40 Ways to Use QR Codes For Mobile Marketing | Jeff A Hamilton, Joan Mullally, Andrew P Simon | Guide | External Spiral Books | 2013 | None | https://images-na.ssl-images-amazon.com/images/I/51N7DgzLm6L._SY346_.jpg | $2.99 |
Are the three data frames identical?
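The three data frames are not identical. The XML data frame has seven columns with lowercase names and no QR code column, while the HTML and JSON data frames each have eight columns.
The QR code column is NA in the HTML data frame but holds image URLs in the JSON data frame, where it is named QR.Code rather than QR Code.
The data sizes differ as well: the HTML data frame is the largest, followed by the JSON data frame, and the XML data frame is the smallest.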
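A comparison like this can also be checked in code. The sketch below uses base R only and two small hypothetical data frames (`frame_a`, `frame_b`) as stand-ins, so it runs without downloading anything. It is one possible way to compare the frames, not necessarily how the conclusion above was reached.

```r
# Hypothetical stand-ins for two parsed results: same books, different shape
frame_a <- data.frame(Title = c("Book A", "Book B"), Year = c(2012L, 2011L),
                      stringsAsFactors = FALSE)
frame_b <- data.frame(title = c("Book A", "Book B"), year = c("2012", "2011"),
                      stringsAsFactors = FALSE)

# Strict test: column names, column types and values must all match
same <- identical(frame_a, frame_b)

# Rows and columns of each frame
dims <- rbind(a = dim(frame_a), b = dim(frame_b))

# Column names present in frame_a but missing from frame_b
only_in_a <- setdiff(names(frame_a), names(frame_b))

# In-memory size of each frame, in bytes
sizes <- c(a = as.numeric(object.size(frame_a)),
           b = as.numeric(object.size(frame_b)))

same
dims
only_in_a
sizes
```

To apply the same checks to the assignment data, replace `frame_a` and `frame_b` with any two of `html_frame`, `xmlBooks` and `jsonbooks`.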