Working with XML and JSON in R
Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book include:
- the title
- the authors
- two or three other attributes that you find interesting
Take the information that you’ve selected about these three books, and create three separate files that store the books’ information in HTML (using an HTML table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, create each of these files “by hand” unless you’re already very comfortable with the file formats.
_________________________________________________________________________
Packages:
RCurl
XML
jsonlite
htmltab (for htmltab())
DT (for datatable())
tibble (for as_data_frame())
_________________________________________________________________________
HTML
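Using GitHub URL
# Raw URL of the hand-written HTML file
url <- "https://raw.githubusercontent.com/mgino11/Books_Json_XML/main/Book_files/books.html"
# XPath expression selecting the first table on the page
xp <- "//table[1]"
# Parse the table into a data frame, keeping columns that contain no data
df_html <- htmltab(url, rm_nodata_cols = FALSE, which = xp)
df_html
datatable(df_html)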
_________________________________________________________________________
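JSON File
# Read the JSON file directly from GitHub
jason_file <- fromJSON("https://raw.githubusercontent.com/mgino11/Books_Json_XML/main/Book_files/books.json")
# as_data_frame() is deprecated in tibble; as_tibble() is the current equivalent
json_df <- as_data_frame(jason_file)
json_df
datatable(jason_file)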
_________________________________________________________________________
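XML File
# Use a forward slash in the path: "\b" inside an R string is a backspace escape
filename <- "Book_files/books.xml"
# Parse the file into an internal XML document
xml_book <- xmlInternalTreeParse(filename)
# One row per child of the root node, one column per child tag
xml_df <- xmlToDataFrame(xml_book)
xml_df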
_________________________________________________________________________
Conclusions
XML looks familiar to someone with basic knowledge about HTML, as it shares the same features of a markup language. Nevertheless, HTML and XML both serve their own specific purposes. While HTML is used to shape the display of information, the main purpose of XML is to store data.
XML is data wrapped in user-defined tags. The user-defined tags make XML much more flexible for storing data than HTML. Indentation further facilitates reading but is not a necessary component of XML.
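As a small illustration of user-defined tags, the sketch below parses an inline XML string with the XML package. The tag names (books, book, title, authors, year) and the two records are hypothetical placeholders chosen for this example, not the contents of books.xml. Note that every value comes back as text, so numeric fields need an explicit conversion.

```r
library(XML)

# Hypothetical placeholder records, NOT the contents of books.xml.
# The tag names are made up by the author of the file; XML itself does not define them.
xml_text <- '<books>
  <book>
    <title>Example Book A</title>
    <authors>Author One; Author Two</authors>
    <year>2001</year>
  </book>
  <book>
    <title>Example Book B</title>
    <authors>Author Three</authors>
    <year>2010</year>
  </book>
</books>'

# Parse the string (asText = TRUE: the input is XML content, not a file name)
doc <- xmlParse(xml_text, asText = TRUE)

# Each <book> becomes a row, each child tag becomes a column
books_df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# XML stores everything as text, so convert the year explicitly
books_df$year <- as.integer(books_df$year)

# XPath can also pull out a single field across all records
titles <- xpathSApply(doc, "//book/title", xmlValue)

print(books_df)
print(titles)
```

Removing the indentation from xml_text should give the same data frame, since the whitespace between tags only helps human readers.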
Another standard for data storage and interchange we frequently find on the Web is the JavaScript Object Notation, abbreviated JSON. JSON is an increasingly popular alternative to XML for data exchange, with a more compact syntax and built-in arrays, numbers, and booleans. JSON was designed for the same tasks that XML is often used for: the storage and exchange of human-readable data. Many APIs of popular web applications provide data in the JSON format.
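To make the comparison concrete, here is a minimal sketch of the same kind of record in JSON, read with jsonlite. The field names and the two records are hypothetical placeholders, not the contents of books.json. Unlike the XML case, numbers keep their type and a book with several authors can use a real array.

```r
library(jsonlite)

# Hypothetical placeholder records, NOT the contents of books.json
json_text <- '[
  {"title": "Example Book A", "authors": ["Author One", "Author Two"], "year": 2001},
  {"title": "Example Book B", "authors": ["Author Three"], "year": 2010}
]'

# Check that the text is syntactically valid JSON before parsing
is_valid <- validate(json_text)

# An array of objects is simplified to a data frame: one row per object
books <- fromJSON(json_text)

# "authors" is a list column because the arrays differ in length
n_authors <- lengths(books$authors)

# "year" arrives as a number, with no conversion from text needed
str(books)

# Serialize the data frame back to JSON text
json_out <- toJSON(books, pretty = TRUE)
cat(json_out)
```

The list column is the main thing to watch for when passing such a data frame to functions that expect only atomic columns.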