I created the data in a notebook and uploaded the files to my GitHub repository.
To load the information from the HTML, XML, and JSON sources into separate R data frames, we use the code below. It relies on the rvest package for HTML, the xml2 and XML packages for XML, and rjson for JSON.
library(rvest) # read and parse HTML
library(xml2)  # read XML
library(XML)   # xmlParse() and xmlToDataFrame()
library(rjson) # parse JSON
The DataCamp tutorial at https://www.datacamp.com/tutorial/r-data-import-tutorial explains how to import HTML, XML, and JSON data in R.
Extracting the HTML data from GitHub into a table
file <- read_html("https://github.com/Kossi-Akplaka/Data607-data_acquisition_and_management/blob/main/assignment7/books.html")
table <- html_nodes(file, "table")
table
## {xml_nodeset (1)}
## [1] <table>\\r"," <tr>\\r"," <th>Books</th>\\r"," <th>Author1</th>\\r" ...
Transform the table into a data frame
books_html <- html_table(table)[[1]] # html_table() returns a list; keep the first table
books_html
## # A tibble: 3 × 5
## Books Author1 Author2 Atribute1 Atribute2
## <chr> <chr> <chr> <chr> <chr>
## 1 Big Data Viktor Mayer-Schönberger Kenneth… Transfor… Challeng…
## 2 The signal and the noise Nate Silver N/A Art and … Data, st…
## 3 Superforecasting Philip E. Tetlock Dan Gar… Art and … Challeng…
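The same read, select, and convert steps can be tried without a network connection. The sketch below is only an illustration: `html_source` is a hypothetical inline stand-in for books.html, reduced to two columns and two rows, and `books_demo_html` is a name introduced here.

```r
library(rvest)

# Hypothetical inline HTML standing in for books.html (not the real file)
html_source <- paste0(
  "<table>",
  "<tr><th>Books</th><th>Author1</th></tr>",
  "<tr><td>Big Data</td><td>Kenneth Neil Cukier</td></tr>",
  "<tr><td>Superforecasting</td><td>Philip E. Tetlock</td></tr>",
  "</table>"
)

page <- read_html(html_source)           # parse the string as an HTML document
tables <- html_nodes(page, "table")      # select every <table> node
books_demo_html <- html_table(tables)[[1]] # one data frame per table; keep the first

print(books_demo_html)
```

The `<th>` cells in the first row become the column names, which is why a misspelled header in the source file carries over into the data frame.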
Getting the XML data
xml_file <- xmlParse(read_xml('https://raw.githubusercontent.com/Kossi-Akplaka/Data607-data_acquisition_and_management/main/assignment7/book.xml'))
xml_file
## <?xml version="1.0" encoding="UTF-8"?>
## <books>
## <book>
## <title>Big Data</title>
## <author1>Viktor Mayer-Schönberger</author1>
## <author2>Kenneth Neil Cukier</author2>
## <attribute1>Transformative power of big data in various fields</attribute1>
## <attribute2>Challenges and opportunities of big data</attribute2>
## </book>
## <book>
## <title>The signal and the noise</title>
## <author1>Nate Silver</author1>
## <author2>N/A</author2>
## <attribute1>Art and science of prediction</attribute1>
## <attribute2>Data, statistics, and critical thinking</attribute2>
## </book>
## <book>
## <title>Superforecasting</title>
## <author1>Philip E. Tetlock</author1>
## <author2>Dan Gardner</author2>
## <attribute1>Art and science of prediction</attribute1>
## <attribute2>Challenges and uncertainties of making accurate predictions</attribute2>
## </book>
## </books>
##
Transform the XML data into a data frame
books_xml <- xmlToDataFrame(xml_file)
books_xml
## title author1 author2
## 1 Big Data Viktor Mayer-Schönberger Kenneth Neil Cukier
## 2 The signal and the noise Nate Silver N/A
## 3 Superforecasting Philip E. Tetlock Dan Gardner
## attribute1
## 1 Transformative power of big data in various fields
## 2 Art and science of prediction
## 3 Art and science of prediction
## attribute2
## 1 Challenges and opportunities of big data
## 2 Data, statistics, and critical thinking
## 3 Challenges and uncertainties of making accurate predictions
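As an alternative to `xmlToDataFrame()`, the same kind of table can be built with xml2 alone. This is a sketch under assumptions, not the method used above: `xml_source` is a hypothetical inline stand-in for book.xml with only three fields, and `book_nodes`, `fields`, and `books_demo_xml` are names introduced here.

```r
library(xml2)

# Hypothetical inline XML standing in for book.xml (not the real file)
xml_source <- paste0(
  "<books>",
  "<book><title>The signal and the noise</title>",
  "<author1>Nate Silver</author1><author2>N/A</author2></book>",
  "<book><title>Superforecasting</title>",
  "<author1>Philip E. Tetlock</author1><author2>Dan Gardner</author2></book>",
  "</books>"
)

doc <- read_xml(xml_source)
book_nodes <- xml_find_all(doc, ".//book")  # one node per <book>

# For each field, pull the text of that child element from every book
fields <- c("title", "author1", "author2")
columns <- lapply(setNames(fields, fields), function(field) {
  xml_text(xml_find_first(book_nodes, field))
})

books_demo_xml <- as.data.frame(columns, stringsAsFactors = FALSE)
print(books_demo_xml)
```

Listing the fields explicitly means a book that lacks one of them gets `NA` in that column rather than shifting the other values.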
Getting the JSON data
json_file <- fromJSON(file = 'https://raw.githubusercontent.com/Kossi-Akplaka/Data607-data_acquisition_and_management/main/assignment7/books.json')
json_file
## [[1]]
## [[1]]$title
## [1] "Big Data"
##
## [[1]]$Author1
## [1] "Viktor Mayer-Schönberger"
##
## [[1]]$Author2
## [1] "Kenneth Neil Cukier"
##
## [[1]]$Attribute1
## [1] "Transformative power of big data in various fields"
##
## [[1]]$Attribute2
## [1] "Challenges and opportunities of big data"
##
##
## [[2]]
## [[2]]$title
## [1] "The signal and the noise"
##
## [[2]]$Author1
## [1] "Nate Silver"
##
## [[2]]$Author2
## [1] "N/A"
##
## [[2]]$Attribute1
## [1] "Art and science of prediction"
##
## [[2]]$Attribute2
## [1] "Data, statistics, and critical thinking"
##
##
## [[3]]
## [[3]]$title
## [1] "Superforecasting"
##
## [[3]]$Author1
## [1] "Philip E. Tetlock"
##
## [[3]]$Author2
## [1] "Dan Gardner"
##
## [[3]]$Attribute1
## [1] "Art and science of prediction"
##
## [[3]]$Attribute2
## [1] "Challenges and opportunities of big data"
Transform the JSON data into a data frame
# convert each book (a named list) to a one-row data frame, then stack the rows
books_json <- do.call(rbind, lapply(json_file, as.data.frame))
books_json
## title Author1 Author2
## 1 Big Data Viktor Mayer-Schönberger Kenneth Neil Cukier
## 2 The signal and the noise Nate Silver N/A
## 3 Superforecasting Philip E. Tetlock Dan Gardner
## Attribute1
## 1 Transformative power of big data in various fields
## 2 Art and science of prediction
## 3 Art and science of prediction
## Attribute2
## 1 Challenges and opportunities of big data
## 2 Data, statistics, and critical thinking
## 3 Challenges and opportunities of big data
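The row-stacking step can be seen in isolation on a small inline example. The sketch below assumes a JSON array of flat objects like the one printed above. `json_source` is a hypothetical stand-in for books.json with three keys, and `records`, `rows`, and `books_demo_json` are names introduced here.

```r
library(rjson)

# Hypothetical inline JSON standing in for books.json (not the real file)
json_source <- paste0(
  '[{"title": "The signal and the noise", "Author1": "Nate Silver", "Author2": "N/A"},',
  ' {"title": "Superforecasting", "Author1": "Philip E. Tetlock", "Author2": "Dan Gardner"}]'
)

records <- fromJSON(json_str = json_source)  # list with one named list per book

# Each named list becomes a one-row data frame; rbind stacks them by column name
rows <- lapply(records, as.data.frame, stringsAsFactors = FALSE)
books_demo_json <- do.call(rbind, rows)

print(books_demo_json)
```

This approach relies on every record having the same keys. If one book were missing a key, `rbind` would stop with an error about mismatched columns.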
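The three data frames contain the same three books in the same five-column layout, but they are not strictly identical. The column names differ (Books and Atribute1 in the HTML version, lowercase names in the XML version), the HTML version is a tibble, and the second attribute for Superforecasting differs between the XML and JSON sources.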
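A quick way to check how closely two of the data frames agree is to normalize the column names and compare them cell by cell. The sketch below uses base R only. `books_a` and `books_b` are hypothetical two-column stand-ins modeled on the XML and JSON outputs above, not the full data frames.

```r
# Hypothetical stand-ins modeled on the XML and JSON results (two columns only)
books_a <- data.frame(
  title = c("Big Data", "Superforecasting"),
  attribute2 = c("Challenges and opportunities of big data",
                 "Challenges and uncertainties of making accurate predictions"),
  stringsAsFactors = FALSE
)
books_b <- data.frame(
  title = c("Big Data", "Superforecasting"),
  Attribute2 = c("Challenges and opportunities of big data",
                 "Challenges and opportunities of big data"),
  stringsAsFactors = FALSE
)

# Column names differ only in case here, so lowercase them before comparing
names(books_a) <- tolower(names(books_a))
names(books_b) <- tolower(names(books_b))

same <- identical(books_a, books_b)                      # strict equality check
mismatches <- which(books_a != books_b, arr.ind = TRUE)  # row/col of differing cells

print(same)
print(mismatches)
```

`identical()` answers only yes or no, while the `which(..., arr.ind = TRUE)` call points to the cells that disagree, which is more useful when the sources were typed by hand.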