Introduction / Conclusion
For this assignment, we were tasked with working with HTML, XML, and JSON data in R. First, we were asked to store information about three of our favorite books in each of these formats, and then to read that data back into R. The files I created can be seen at the links shown below.
Reading the data into R and transforming it into a friendly data frame format is straightforward, as the code below shows. Using the XML, xml2, rvest, and jsonlite libraries, we can easily extract the necessary information from each file format. Then, using base R's as.data.frame() (or xmlToDataFrame() from the XML package for the XML file), we can turn the data into data frames that can then be used for analysis.
The data frame for each format can also be seen below. Some of the quirks of each format show up as differences in the outputs. For example, R changes any slash in a column name into a period when building a data frame, and the JSON import prepends the name of the top-level object to each column name. Other than that, there is not much else to be said. This was a simple warm-up on how R handles different file formats. Thank you very much for reading my assignment.
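The two quirks mentioned above can be reproduced with base R alone. The snippet below is only a minimal sketch: the column name `Series/Universe` and the top-level key `books` are made-up stand-ins, not names taken from the actual files, and the nested list only imitates what a JSON parser would typically return for an object with a single top-level key.

```r
# Quirk 1: data.frame() passes column names through make.names() by default,
# so a slash becomes a period. "Series/Universe" is a hypothetical column name.
with_slash <- data.frame("Series/Universe" = "The Kingkiller Chronicle")
names(with_slash)

# check.names = FALSE keeps the name exactly as written.
kept <- data.frame("Series/Universe" = "The Kingkiller Chronicle",
                   check.names = FALSE)
names(kept)

# Quirk 2: a named list holding a multi-column data frame (hypothetical key
# "books") gets that name prefixed onto every column by as.data.frame().
parsed <- list(books = data.frame(Title = "The Talisman",
                                  Genre = "Dark fantasy"))
prefixed <- as.data.frame(parsed)
names(prefixed)

# Extracting the inner data frame first avoids the prefix.
plain <- parsed[[1]]
names(plain)
```

If the prefix or the periods get in the way of later analysis, either extracting the inner element or passing `check.names = FALSE` should be enough; which one applies depends on how the file is actually structured.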
library(rvest)  # read_html(), html_table(), and the %>% pipe
html_data <- read_html("https://raw.githubusercontent.com/zachsfr/Cuny-SPS/main/607/Data/books.html") %>%
  html_table() %>%
  as.data.frame()
| The Name of the Wind | Patrick Rothfuss | The Kingkiller Chronicle | Heroic fantasy |
| A Game of Thrones | George R. R. Martin | A Song of Ice and Fire | Fantasy |
| The Talisman | Stephen King and Peter Straub | Jack Sawyer Trilogy | Dark fantasy |
library(jsonlite)  # fromJSON()
library(magrittr)  # %>% pipe
json_data <- fromJSON("https://raw.githubusercontent.com/zachsfr/Cuny-SPS/main/607/Data/books.json") %>%
  as.data.frame()
| The Name of the Wind | Patrick Rothfuss | The Kingkiller Chronicle | Heroic fantasy |
| A Game of Thrones | George R. R. Martin | A Song of Ice and Fire | Fantasy |
| The Talisman | Stephen King and Peter Straub | Jack Sawyer Trilogy | Dark fantasy |
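library(xml2)      # read_xml()
library(XML)       # xmlParse(), xmlToDataFrame()
library(magrittr)  # %>% pipe
xml_data <- read_xml("https://raw.githubusercontent.com/zachsfr/Cuny-SPS/main/607/Data/book.xml") %>%
  xmlParse() %>%
  xmlToDataFrame()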
| The Name of the Wind | Patrick Rothfuss | The Kingkiller Chronicle | Heroic fantasy |
| A Game of Thrones | George R. R. Martin | A Song of Ice and Fire | Fantasy |
| The Talisman | Stephen King and Peter Straub | Jack Sawyer Trilogy | Dark fantasy |
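For comparison, the XML case can also be handled with xml2 alone, without passing the document on to the XML package. This is only a sketch of an alternative, not the approach used above: the inline XML string and its element names (`books`, `book`, `title`, `author`) are hypothetical and may not match the structure of the real file.

```r
library(xml2)

# Hypothetical inline XML; the real file's element names may differ.
doc <- read_xml("
<books>
  <book><title>The Name of the Wind</title><author>Patrick Rothfuss</author></book>
  <book><title>A Game of Thrones</title><author>George R. R. Martin</author></book>
</books>")

# One node per <book> element.
books <- xml_find_all(doc, ".//book")

# xml_find_first() returns one match per book, so each column has the
# same length as the set of books.
xml_sketch <- data.frame(
  title  = xml_text(xml_find_first(books, "./title")),
  author = xml_text(xml_find_first(books, "./author"))
)
xml_sketch
```

Naming each column explicitly takes a little more typing than xmlToDataFrame(), but it makes the column names and their order independent of how the file happens to be laid out.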