Reading files in XML/JSON/HTML formats into R
- The XML package is used to read the XML and HTML files, and the jsonlite package is used to read the JSON file.
library(XML)
# xmlToDataFrame() builds a data frame from the input XML file
df_booksxml <- xmlToDataFrame("books.xml")
# Print the data frame
df_booksxml
## ListId Title
## 1 1 Introduction to Algorithms
## 2 2 R for Data Science
## 3 3 Automated Data Collection with R
## Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2 Garrett Grolemund, Hadley Wickham
## 3 Simon Munzert, Christian Rubba, Peter Meibner, Dominic Nyhuis
## Publisher ReleaseDate ISBN
## 1 MIT Press July 2009 9780262033848
## 2 O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd 2015 9781118834817
## BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2 Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3 A Practical Guide to Web Scraping and Text Mining
# class is dataframe and type is list
(class(df_booksxml))
## [1] "data.frame"
(typeof(df_booksxml))
## [1] "list"
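xmlToDataFrame() reads every field as text unless column classes are supplied, so numeric fields such as ListId usually need an explicit conversion. The following is a minimal sketch under assumptions: the inline XML string is a hypothetical two-record stand-in for books.xml (whose exact structure is not shown here), and the names xml_text, doc_sample and df_sample_xml are illustrative only.

```r
library(XML)

# Hypothetical sample standing in for books.xml (structure assumed)
xml_text <- paste0(
  "<books>",
  "<book><ListId>1</ListId><Title>Introduction to Algorithms</Title></book>",
  "<book><ListId>2</ListId><Title>R for Data Science</Title></book>",
  "</books>"
)

# Parse the string, then flatten the child nodes of the root into rows
doc_sample <- xmlParse(xml_text, asText = TRUE)
df_sample_xml <- xmlToDataFrame(doc_sample, stringsAsFactors = FALSE)

# Columns arrive as character; convert the numeric field explicitly
df_sample_xml$ListId <- as.integer(df_sample_xml$ListId)
str(df_sample_xml)
```

The same conversion step could be applied to df_booksxml$ListId if numeric comparisons or sorting are needed.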
library(XML)
# Read the HTML file with readHTMLTable(), which returns a list of data frames
obj_html <- readHTMLTable("books.html", header = TRUE)
# Name the single table in the list
names(obj_html) <- "books"
# Print the list
obj_html
## $books
## ListId Title
## 1 1 Introduction to Algorithms
## 2 2 R for Data Science
## 3 3 Automated Data Collection with R
## Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2 Garrett Grolemund, Hadley Wickham
## 3 Simon Munzert, Christian Rubba, Peter Meibner, Dominic Nyhuis
## Publisher ReleaseDate ISBN
## 1 MIT Press July 2009 9780262033848
## 2 O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd 2015 9781118834817
## BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2 Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3 A Practical Guide to Web Scraping and Text Mining
# Extract the data frame from the list
df_bookshtml <- obj_html$books
# Print the data frame
df_bookshtml
## ListId Title
## 1 1 Introduction to Algorithms
## 2 2 R for Data Science
## 3 3 Automated Data Collection with R
## Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2 Garrett Grolemund, Hadley Wickham
## 3 Simon Munzert, Christian Rubba, Peter Meibner, Dominic Nyhuis
## Publisher ReleaseDate ISBN
## 1 MIT Press July 2009 9780262033848
## 2 O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd 2015 9781118834817
## BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2 Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3 A Practical Guide to Web Scraping and Text Mining
# class is dataframe and type is list
(class(df_bookshtml))
## [1] "data.frame"
(typeof(df_bookshtml))
## [1] "list"
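When the page holds a single table, the intermediate list can be skipped by asking readHTMLTable() for one table directly through its which argument. This is only a sketch: the inline HTML is a hypothetical stand-in for books.html, and html_text, doc_html and df_sample_html are illustrative names.

```r
library(XML)

# Hypothetical sample standing in for books.html (structure assumed)
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>ListId</th><th>Title</th></tr>",
  "<tr><td>1</td><td>Introduction to Algorithms</td></tr>",
  "<tr><td>2</td><td>R for Data Science</td></tr>",
  "</table></body></html>"
)

# Parse the string as HTML, then pull out the first table as a data frame
doc_html <- htmlParse(html_text, asText = TRUE)
df_sample_html <- readHTMLTable(doc_html, which = 1, header = TRUE,
                                stringsAsFactors = FALSE)

class(df_sample_html)
```

With which = 1 the result is already a data frame, so no names() assignment or $ extraction is required.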
library(jsonlite)
# fromJSON() imports the JSON data as a list
obj_json <- fromJSON("books.json")
# Print the list
obj_json
## $books
## ListId Title
## 1 1 Introduction to Algorithms
## 2 2 R for Data Science
## 3 3 Automated Data Collection with R
## Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2 Garrett Grolemund, Hadley Wickham
## 3 Christian Rubba, Dominic Nyhuis, Simon Munzert, Peter Meibner
## Publisher ReleaseDate ISBN
## 1 MIT Press July 2009 9780262033848
## 2 O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd 2015 9781118834817
## BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2 Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3 A Practical Guide to Web Scraping and Text Mining
# Extract the data frame from the list.
# Note: the Authors column is a list column (one character vector per row)
df_booksjson <- obj_json$books
# Print the data frame
df_booksjson
## ListId Title
## 1 1 Introduction to Algorithms
## 2 2 R for Data Science
## 3 3 Automated Data Collection with R
## Authors
## 1 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 2 Garrett Grolemund, Hadley Wickham
## 3 Christian Rubba, Dominic Nyhuis, Simon Munzert, Peter Meibner
## Publisher ReleaseDate ISBN
## 1 MIT Press July 2009 9780262033848
## 2 O'Reilly Media, Inc. December 2016 9781491910399
## 3 John Wiley & Sons, Ltd 2015 9781118834817
## BookDescription
## 1 Introduction to Algorithms uniquely combines rigor and comprehensiveness. The book covers a broad range of algorithms in depth, yet makes their design and analysis accessible to all levels of readers. Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.
## 2 Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
## 3 A Practical Guide to Web Scraping and Text Mining
# Authors for each row, in row order
(df_booksjson$Authors)
## [[1]]
## [1] "Thomas H. Cormen" "Charles E. Leiserson" "Ronald L. Rivest"
## [4] "Clifford Stein"
##
## [[2]]
## [1] "Garrett Grolemund" "Hadley Wickham"
##
## [[3]]
## [1] "Christian Rubba" "Dominic Nyhuis" "Simon Munzert" "Peter Meibner"
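Because Authors is a list column, the JSON data frame differs in structure from the XML and HTML ones, where the authors are a single string per row. One possible way to bring it in line is to collapse each vector into a comma-separated string. The sketch below assumes a hypothetical inline JSON sample in place of books.json; json_text, df_sample_json and the AuthorsFlat column are illustrative names.

```r
library(jsonlite)

# Hypothetical sample standing in for books.json (structure assumed).
# The author arrays have different lengths, so jsonlite keeps a list column.
json_text <- '{"books":[
  {"ListId":1,"Title":"R for Data Science",
   "Authors":["Garrett Grolemund","Hadley Wickham"]},
  {"ListId":2,"Title":"Introduction to Algorithms",
   "Authors":["Thomas H. Cormen","Charles E. Leiserson","Ronald L. Rivest"]}
]}'

df_sample_json <- fromJSON(json_text)$books

# Authors holds one character vector per row
is.list(df_sample_json$Authors)

# Collapse each vector into a single comma-separated string
df_sample_json$AuthorsFlat <- vapply(
  df_sample_json$Authors, paste, character(1), collapse = ", "
)
df_sample_json$AuthorsFlat
```

vapply() is used here rather than sapply() so that the result is guaranteed to be a character vector with one element per row.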
# class is dataframe and type is list
class(df_booksjson)
## [1] "data.frame"
typeof(df_booksjson)
## [1] "list"
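All three checks report class "data.frame" and type "list" because a data frame is stored as a list of equal-length columns with a class attribute on top. A small sketch with a hypothetical data frame (df_demo is an illustrative name, not one of the objects above) shows the relationship:

```r
# Hypothetical two-column data frame for illustration
df_demo <- data.frame(id = 1:2, title = c("A", "B"))

class(df_demo)    # the S3 class attribute: "data.frame"
typeof(df_demo)   # the underlying storage type: "list"
length(df_demo)   # list elements are columns, so this is 2
is.list(df_demo)  # a data frame passes the list test
```

This is also why a column can itself be a list, as with the Authors column read from JSON.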