Create data

I created the data in a notebook and uploaded the files to my GitHub repository.

Load the data

To load the information from the HTML, XML, and JSON sources into separate R data frames, you can use the following code. We will use the rvest package for HTML, the xml2 and XML packages for XML, and the rjson package for JSON.

library(rvest) # read and parse HTML
library(xml2)  # read XML from a URL
library(XML)   # parse XML and convert it to a data frame
library(rjson) # read JSON

This DataCamp tutorial (https://www.datacamp.com/tutorial/r-data-import-tutorial) explains how to import HTML, XML, and JSON data into R.

Load data from HTML into a data frame

Extract the HTML table from GitHub

file <- read_html("https://github.com/Kossi-Akplaka/Data607-data_acquisition_and_management/blob/main/assignment7/books.html")
table <- html_nodes(file, "table")
table
## {xml_nodeset (1)}
## [1] <table>\\r","  <tr>\\r","    <th>Books</th>\\r","    <th>Author1</th>\\r" ...

Transform into a data frame

books_html <- html_table(table)[[1]] # convert the first HTML table to a data frame (tibble)
books_html
## # A tibble: 3 × 5
##   Books                    Author1                  Author2  Atribute1 Atribute2
##   <chr>                    <chr>                    <chr>    <chr>     <chr>    
## 1 Big Data                 Viktor Mayer-Schönberger Kenneth… Transfor… Challeng…
## 2 The signal and the noise Nate Silver              N/A      Art and … Data, st…
## 3 Superforecasting         Philip E. Tetlock        Dan Gar… Art and … Challeng…
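
The same read_html / html_nodes / html_table steps can be tried without a network connection. The following is only a minimal sketch: the inline HTML string and the names html_input and books_demo_html are made up for illustration and stand in for the real books.html file.

```r
library(rvest)

# Hypothetical inline HTML standing in for books.html (only three columns kept)
html_input <- paste0(
  "<table>",
  "<tr><th>Books</th><th>Author1</th><th>Author2</th></tr>",
  "<tr><td>The signal and the noise</td><td>Nate Silver</td><td>N/A</td></tr>",
  "<tr><td>Superforecasting</td><td>Philip E. Tetlock</td><td>Dan Gardner</td></tr>",
  "</table>"
)

page <- read_html(html_input)              # parse the literal HTML string
tables <- html_nodes(page, "table")        # select every <table> node
books_demo_html <- html_table(tables)[[1]] # first table as a data frame

books_demo_html
```

The header row of the table becomes the column names, so any typo in the `<th>` cells carries over into the data frame.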

Load data from XML into a data frame

Get the data

xml_file <- xmlParse(read_xml('https://raw.githubusercontent.com/Kossi-Akplaka/Data607-data_acquisition_and_management/main/assignment7/book.xml'))
xml_file
## <?xml version="1.0" encoding="UTF-8"?>
## <books>
##   <book>
##     <title>Big Data</title>
##     <author1>Viktor Mayer-Schönberger</author1>
##     <author2>Kenneth Neil Cukier</author2>
##     <attribute1>Transformative power of big data in various fields</attribute1>
##     <attribute2>Challenges and opportunities of big data</attribute2>
##   </book>
##   <book>
##     <title>The signal and the noise</title>
##     <author1>Nate Silver</author1>
##     <author2>N/A</author2>
##     <attribute1>Art and science of prediction</attribute1>
##     <attribute2>Data, statistics, and critical thinking</attribute2>
##   </book>
##   <book>
##     <title>Superforecasting</title>
##     <author1>Philip E. Tetlock</author1>
##     <author2>Dan Gardner</author2>
##     <attribute1>Art and science of prediction</attribute1>
##     <attribute2>Challenges and uncertainties of making accurate predictions</attribute2>
##   </book>
## </books>
## 

Transform the data into a data frame

books_xml <- xmlToDataFrame(xml_file)
books_xml
##                      title                  author1             author2
## 1                 Big Data Viktor Mayer-Schönberger Kenneth Neil Cukier
## 2 The signal and the noise              Nate Silver                 N/A
## 3         Superforecasting        Philip E. Tetlock         Dan Gardner
##                                           attribute1
## 1 Transformative power of big data in various fields
## 2                      Art and science of prediction
## 3                      Art and science of prediction
##                                                    attribute2
## 1                    Challenges and opportunities of big data
## 2                     Data, statistics, and critical thinking
## 3 Challenges and uncertainties of making accurate predictions
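
As an alternative to xmlToDataFrame, the data frame can also be assembled with xml2 alone by pulling each field out with an XPath query. This is a sketch under assumptions, not the method used above: the inline XML string and the helper name field are hypothetical, and only three of the five fields are included.

```r
library(xml2)

# Hypothetical inline XML standing in for book.xml
xml_input <- paste0(
  "<books>",
  "<book><title>The signal and the noise</title>",
  "<author1>Nate Silver</author1><author2>N/A</author2></book>",
  "<book><title>Superforecasting</title>",
  "<author1>Philip E. Tetlock</author1><author2>Dan Gardner</author2></book>",
  "</books>"
)

doc <- read_xml(xml_input)
book_nodes <- xml_find_all(doc, ".//book") # one node per <book>

# Hypothetical helper: text of the named child element for every <book>
field <- function(name) {
  xml_text(xml_find_first(book_nodes, paste0("./", name)))
}

books_demo_xml <- data.frame(
  title   = field("title"),
  author1 = field("author1"),
  author2 = field("author2"),
  stringsAsFactors = FALSE
)

books_demo_xml
```

Naming the columns explicitly takes more typing than xmlToDataFrame, but it avoids loading two XML packages.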

Load data from JSON into a data frame

Get the data

json_file <- fromJSON(file = 'https://raw.githubusercontent.com/Kossi-Akplaka/Data607-data_acquisition_and_management/main/assignment7/books.json')
json_file
## [[1]]
## [[1]]$title
## [1] "Big Data"
## 
## [[1]]$Author1
## [1] "Viktor Mayer-Schönberger"
## 
## [[1]]$Author2
## [1] "Kenneth Neil Cukier"
## 
## [[1]]$Attribute1
## [1] "Transformative power of big data in various fields"
## 
## [[1]]$Attribute2
## [1] "Challenges and opportunities of big data"
## 
## 
## [[2]]
## [[2]]$title
## [1] "The signal and the noise"
## 
## [[2]]$Author1
## [1] "Nate Silver"
## 
## [[2]]$Author2
## [1] "N/A"
## 
## [[2]]$Attribute1
## [1] "Art and science of prediction"
## 
## [[2]]$Attribute2
## [1] "Data, statistics, and critical thinking"
## 
## 
## [[3]]
## [[3]]$title
## [1] "Superforecasting"
## 
## [[3]]$Author1
## [1] "Philip E. Tetlock"
## 
## [[3]]$Author2
## [1] "Dan Gardner"
## 
## [[3]]$Attribute1
## [1] "Art and science of prediction"
## 
## [[3]]$Attribute2
## [1] "Challenges and opportunities of big data"

Transform to a data frame

# convert each book (a named list) to a one-row data frame, then stack the rows
books_json <- do.call(rbind, lapply(json_file, as.data.frame))
books_json
##                      title                  Author1             Author2
## 1                 Big Data Viktor Mayer-Schönberger Kenneth Neil Cukier
## 2 The signal and the noise              Nate Silver                 N/A
## 3         Superforecasting        Philip E. Tetlock         Dan Gardner
##                                           Attribute1
## 1 Transformative power of big data in various fields
## 2                      Art and science of prediction
## 3                      Art and science of prediction
##                                 Attribute2
## 1 Challenges and opportunities of big data
## 2  Data, statistics, and critical thinking
## 3 Challenges and opportunities of big data
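
The stacking step is easier to follow on a small in-memory list. The sketch below uses base R only; the list records is a hand-written stand-in for the parsed JSON, and the names rows and stacked are illustrative.

```r
# Hand-written stand-in for the list that fromJSON() returns:
# one named list per book
records <- list(
  list(title = "The signal and the noise", Author1 = "Nate Silver",
       Author2 = "N/A"),
  list(title = "Superforecasting", Author1 = "Philip E. Tetlock",
       Author2 = "Dan Gardner")
)

# Step 1: turn every named list into a one-row data frame
rows <- lapply(records, as.data.frame, stringsAsFactors = FALSE)

# Step 2: pass all one-row data frames to rbind() in a single call
stacked <- do.call(rbind, rows)

stacked
```

do.call(rbind, rows) is equivalent to writing rbind(rows[[1]], rows[[2]]) by hand, but it works for a list of any length. It assumes every record has the same field names.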

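The three data frames have the same shape (3 rows and 5 columns) and nearly the same content, but they are not strictly identical. The column names differ by source (Books and Atribute1 in the HTML table, lowercase title and author1 in the XML, title and Author1 in the JSON), html_table() returns a tibble while the other two are base data frames, and the JSON file's Attribute2 for Superforecasting ("Challenges and opportunities of big data") differs from the XML value ("Challenges and uncertainties of making accurate predictions").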
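
One way to check how closely two of the data frames match is to harmonize the column names and then compare cell by cell. The sketch below is illustrative only: xml_like and json_like are small hand-built data frames that copy two columns and two rows from the printed XML and JSON output above, not the full objects.

```r
# Two rows and two columns copied from the printed XML and JSON results
xml_like <- data.frame(
  title = c("The signal and the noise", "Superforecasting"),
  attribute2 = c("Data, statistics, and critical thinking",
                 "Challenges and uncertainties of making accurate predictions"),
  stringsAsFactors = FALSE
)
json_like <- data.frame(
  title = c("The signal and the noise", "Superforecasting"),
  Attribute2 = c("Data, statistics, and critical thinking",
                 "Challenges and opportunities of big data"),
  stringsAsFactors = FALSE
)

# Harmonize the column names so that only the values are compared
names(json_like) <- tolower(names(json_like))

# Strict check: TRUE only if names, types, and every value match
same <- identical(xml_like, json_like)

# Cell-by-cell check: row and column index of every differing value
mismatches <- which(xml_like != json_like, arr.ind = TRUE)

same
mismatches
```

The mismatches matrix points at the exact cells that differ, which is more informative than a single TRUE or FALSE.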