Reading files in XML/JSON/HTML formats into R

The XML package is used to read the XML and HTML files, and the jsonlite package is used to read the JSON file.

Reading XML data

library(XML)

# xmlToDataFrame() creates a data frame from the input XML file
df_booksxml <- xmlToDataFrame("books.xml")
# Print the data frame
df_booksxml
##   ListId                            Title
## 1      1       Introduction to Algorithms
## 2      2               R for Data Science
## 3      3 Automated Data Collection with R
##                                                                    Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2                                        Garrett Grolemund, Hadley Wickham
## 3            Simon Munzert, Christian Rubba, Peter Meibner, Dominic Nyhuis
##                Publisher   ReleaseDate          ISBN
## 1              MIT Press     July 2009 9780262033848
## 2   O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd          2015 9781118834817
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2                                                                                                                                  Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                               A Practical Guide to Web Scraping and Text Mining
# The class is "data.frame" and the underlying type is "list"
(class(df_booksxml))
## [1] "data.frame"
(typeof(df_booksxml))
## [1] "list"
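
xmlToDataFrame() can also be given an already parsed document instead of a file name. The following is a minimal sketch under that assumption; the inline XML and the names xml_text, doc and df_sample are hypothetical stand-ins for books.xml. It also shows that numeric fields such as ListId may need an explicit conversion, because the values are read from the XML as text.

```r
library(XML)

# Hypothetical inline XML standing in for a file such as books.xml
xml_text <- '<books>
  <book><ListId>1</ListId><Title>Introduction to Algorithms</Title></book>
  <book><ListId>2</ListId><Title>R for Data Science</Title></book>
</books>'

# Parse the string, then flatten each <book> node into one row
doc <- xmlParse(xml_text, asText = TRUE)
df_sample <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# The values arrive as text, so convert numeric columns explicitly
df_sample$ListId <- as.integer(df_sample$ListId)
str(df_sample)
```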

Reading HTML data

library(XML)

# Read the HTML file using the readHTMLTable function, which returns a list of tables
obj_html <- readHTMLTable("books.html", header = TRUE)
# Assign a name to the table in the list
names(obj_html) <- "books"
# Print the list
obj_html
## $books
##   ListId                            Title
## 1      1       Introduction to Algorithms
## 2      2               R for Data Science
## 3      3 Automated Data Collection with R
##                                                                    Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2                                        Garrett Grolemund, Hadley Wickham
## 3            Simon Munzert, Christian Rubba, Peter Meibner, Dominic Nyhuis
##                Publisher   ReleaseDate          ISBN
## 1              MIT Press     July 2009 9780262033848
## 2   O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd          2015 9781118834817
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2                                                                                                                                  Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                               A Practical Guide to Web Scraping and Text Mining
# Extract the data frame from the list
df_bookshtml <- obj_html$books
# Print the data frame
df_bookshtml
##   ListId                            Title
## 1      1       Introduction to Algorithms
## 2      2               R for Data Science
## 3      3 Automated Data Collection with R
##                                                                    Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2                                        Garrett Grolemund, Hadley Wickham
## 3            Simon Munzert, Christian Rubba, Peter Meibner, Dominic Nyhuis
##                Publisher   ReleaseDate          ISBN
## 1              MIT Press     July 2009 9780262033848
## 2   O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd          2015 9781118834817
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2                                                                                                                                  Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                               A Practical Guide to Web Scraping and Text Mining
# The class is "data.frame" and the underlying type is "list"
(class(df_bookshtml))
## [1] "data.frame"
(typeof(df_bookshtml))
## [1] "list"
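
When only one table is needed, the `which` argument of readHTMLTable() may be a shorter route than naming the list and extracting an element. This is a sketch rather than the approach used above; the inline HTML and the names html_text, doc and df_sample are hypothetical stand-ins for books.html.

```r
library(XML)

# Hypothetical inline HTML standing in for a file such as books.html
html_text <- '<html><body><table>
  <tr><th>ListId</th><th>Title</th></tr>
  <tr><td>1</td><td>Introduction to Algorithms</td></tr>
  <tr><td>2</td><td>R for Data Science</td></tr>
</table></body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# which = 1 asks for the first table only, so the result is a data frame
# rather than a list of data frames
df_sample <- readHTMLTable(doc, header = TRUE, which = 1,
                           stringsAsFactors = FALSE)
class(df_sample)
```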

Reading JSON data

library(jsonlite)

# The fromJSON function imports the JSON data as a list
obj_json <- fromJSON("books.json")
# Print the list
obj_json
## $books
##   ListId                            Title
## 1      1       Introduction to Algorithms
## 2      2               R for Data Science
## 3      3 Automated Data Collection with R
##                                                                    Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2                                        Garrett Grolemund, Hadley Wickham
## 3            Christian Rubba, Dominic Nyhuis, Simon Munzert, Peter Meibner
##                Publisher   ReleaseDate          ISBN
## 1              MIT Press     July 2009 9780262033848
## 2   O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd          2015 9781118834817
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2                                                                                                                                  Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                               A Practical Guide to Web Scraping and Text Mining
# Extract the data frame from the list.
# Note: the Authors column is a list column (one character vector per book)
df_booksjson <- obj_json$books
# Print the data frame
df_booksjson
##   ListId                            Title
## 1      1       Introduction to Algorithms
## 2      2               R for Data Science
## 3      3 Automated Data Collection with R
##                                                                    Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2                                        Garrett Grolemund, Hadley Wickham
## 3            Christian Rubba, Dominic Nyhuis, Simon Munzert, Peter Meibner
##                Publisher   ReleaseDate          ISBN
## 1              MIT Press     July 2009 9780262033848
## 2   O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd          2015 9781118834817
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2                                                                                                                                  Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                               A Practical Guide to Web Scraping and Text Mining
# Authors for each book, in row order
(df_booksjson$Authors)
## [[1]]
## [1] "Thomas H. Cormen"     "Charles E. Leiserson" "Ronald L. Rivest"    
## [4] "Clifford Stein"      
## 
## [[2]]
## [1] "Garrett Grolemund" "Hadley Wickham"   
## 
## [[3]]
## [1] "Christian Rubba" "Dominic Nyhuis"  "Simon Munzert"   "Peter Meibner"
# The class is "data.frame" and the underlying type is "list"
class(df_booksjson)
## [1] "data.frame"
typeof(df_booksjson)
## [1] "list"
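
Because Authors is a list column, the JSON data frame does not have the same shape as the XML and HTML ones, where Authors is a single string per book. One possible way to flatten it is sketched below; the inline JSON and the names json_text, df_sample and AuthorsText are hypothetical and only mirror the structure of books.json as it appears in the output above.

```r
library(jsonlite)

# Hypothetical inline JSON standing in for a file such as books.json
json_text <- '{"books": [
  {"ListId": 1, "Title": "Introduction to Algorithms",
   "Authors": ["Thomas H. Cormen", "Charles E. Leiserson",
               "Ronald L. Rivest", "Clifford Stein"]},
  {"ListId": 2, "Title": "R for Data Science",
   "Authors": ["Garrett Grolemund", "Hadley Wickham"]}
]}'

df_sample <- fromJSON(json_text)$books

# Authors is a list column; join each character vector into one string
df_sample$AuthorsText <- vapply(df_sample$Authors, paste,
                                character(1), collapse = ", ")
df_sample[, c("ListId", "AuthorsText")]
```

Keeping the original list column alongside the flattened one preserves the individual author names for later use.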