Load the HTML file I created with my favourite books into a data frame!
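library(XML)  # provides readHTMLTable() and xmlToDataFrame()
books_html <- readHTMLTable("books.html", stringsAsFactors = FALSE)
books_html <- books_html[[1]]  # readHTMLTable() returns a list of tables; keep the first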
print(books_html)
## Title
## 1 Thomas Calculus
## 2 Introduction to Algorithms
## 3 Discrete Mathematics and Its Applications
## Authors
## 1 Joel R. Hass,Christopher E. Heil,Maurice D. Weir
## 2 Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
## 3 Kenneth H. Rosen
## Cost Edition ISBN Publication Year
## 1 95.99 3rd 978-0137442997 2021
## 2 52.12 3rd 978-0262033848 2009
## 3 129.15 7th 978-0073383095 2011
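The `[[1]]` step is needed because `readHTMLTable()` returns a list with one data frame per `<table>` in the page. A small sketch of that behaviour, using a hypothetical two-column HTML string in place of `books.html` (the real file is not shown here). `header = TRUE` is set explicitly in this sketch; the call above relies on automatic header detection instead.

```r
library(XML)

# Hypothetical stand-in for books.html (not the real file)
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Title</th><th>Cost</th></tr>",
  "<tr><td>Introduction to Algorithms</td><td>52.12</td></tr>",
  "<tr><td>Discrete Mathematics and Its Applications</td><td>129.15</td></tr>",
  "</table></body></html>"
)

doc <- htmlParse(html_text, asText = TRUE)
# Returns a list: one element per <table> found in the document
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

length(tables)
books <- tables[[1]]  # same extraction step as for books.html
str(books)            # cells are kept as character, including Cost
```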
XML
Load the XML file I created with my favourite books into a data frame!
books_xml <- xmlToDataFrame("books.xml")
print(books_xml)
## Title
## 1 Thomas Calculus
## 2 Introduction to Algorithms
## 3 Discrete Mathematics and Its Applications
## Author Cost
## 1 Joel R. Hass,Christopher E. Heil,Maurice D. Weir 95.99
## 2 Thomas H. Cormen,Charles E. Leiserson,Ronald L. Rivest,Clifford Stein 52.12
## 3 Kenneth H. Rosen 129.15
## Edition ISBN Publication_Year
## 1 3rd 978-0137442997 2021
## 2 3rd 978-0262033848 2009
## 3 7th 978-0073383095 2011
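`xmlToDataFrame()` treats each child of the root element as one row and each grandchild element as one column, reading every value as text. A sketch with a hypothetical `<books>`/`<book>` layout (the actual structure of `books.xml` is not shown, so the tag names here are assumptions):

```r
library(XML)

# Hypothetical stand-in for books.xml: <books> root, one <book> element per row
xml_text <- paste0(
  "<books>",
  "<book><Title>Introduction to Algorithms</Title><Cost>52.12</Cost>",
  "<Publication_Year>2009</Publication_Year></book>",
  "<book><Title>Discrete Mathematics and Its Applications</Title><Cost>129.15</Cost>",
  "<Publication_Year>2011</Publication_Year></book>",
  "</books>"
)

doc <- xmlParse(xml_text, asText = TRUE)
# Column names come from the child tag names; values stay character
books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
str(books)
```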
JSON
Load the JSON file I created with my favourite books into a data frame!
library(jsonlite)  # provides fromJSON()
books_json <- fromJSON("books.json")
print(books_json)
## Title
## 1 Thomas Calculus
## 2 Introduction to Algorithms
## 3 Discrete Mathematics and Its Applications
## Author Cost
## 1 Joel R. Hass,Christopher E. Heil,Maurice D. Weir 95.99
## 2 Thomas H. Cormen,Charles E. Leiserson,Ronald L. Rivest,Clifford Stein 52.12
## 3 Kenneth H. Rosen 129.15
## Edition ISBN Publication_Year
## 1 3rd 978-0137442997 2021
## 2 3rd 978-0262033848 2009
## 3 7th 978-0073383095 2011
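Unlike the HTML and XML readers, `jsonlite::fromJSON()` keeps JSON's own types: unquoted numbers become numeric (or integer, when they have no decimal part), while quoted values stay character. A sketch with hypothetical JSON strings, assuming `books.json` stores Cost and Publication_Year as unquoted numbers (the file itself is not shown):

```r
library(jsonlite)

# Hypothetical stand-in for books.json with unquoted numbers
json_text <- '[
  {"Title": "Introduction to Algorithms", "Cost": 52.12, "Publication_Year": 2009},
  {"Title": "Discrete Mathematics and Its Applications", "Cost": 129.15, "Publication_Year": 2011}
]'
books <- fromJSON(json_text)
sapply(books, class)   # Cost is numeric, Publication_Year is integer

# The same fields written as JSON strings are read back as character
quoted <- fromJSON('[{"Cost": "52.12", "Publication_Year": "2009"}]')
sapply(quoted, class)
```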
I will use the identical() function to compare the three data frames.
identical() returns TRUE only when two objects are exactly the same: the
same column names, the same column types, the same value in every cell,
and the same attributes (such as row names). If anything differs, it
returns FALSE.
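A small illustration of how strict identical() is: two hypothetical one-column data frames that print the same numbers still differ if one stores them as character. `all.equal()` is shown as a companion check because it describes the mismatch instead of returning a bare FALSE.

```r
# Same printed values, different storage types
chr_df <- data.frame(Cost = c("95.99", "52.12"), stringsAsFactors = FALSE)
num_df <- data.frame(Cost = c(95.99, 52.12))

identical(chr_df, num_df)   # FALSE: character vs. double column
all.equal(chr_df, num_df)   # a character vector describing the difference

# Converting the numeric column to character removes the difference
fixed_df <- num_df
fixed_df$Cost <- as.character(fixed_df$Cost)
identical(chr_df, fixed_df) # TRUE
```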
identical(books_html, books_xml)
## [1] FALSE
identical(books_html, books_json)
## [1] FALSE
identical(books_xml, books_json)
## [1] FALSE
I’ll try to make them identical, but first I need to find out where
they differ.
names(books_html)
## [1] "Title" "Authors" "Cost" "Edition"
## [5] "ISBN" "Publication Year"
names(books_xml)
## [1] "Title" "Author" "Cost" "Edition"
## [5] "ISBN" "Publication_Year"
names(books_json)
## [1] "Title" "Author" "Cost" "Edition"
## [5] "ISBN" "Publication_Year"
identical(names(books_html), names(books_xml))
## [1] FALSE
identical(names(books_html), names(books_json))
## [1] FALSE
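Comparing the two name vectors element by element pinpoints the mismatches instead of just reporting FALSE. The sketch below uses the names copied from the output above; with the loaded data the equivalent would be `which(names(books_html) != names(books_xml))`.

```r
# Column names copied from the names() output above
html_names <- c("Title", "Authors", "Cost", "Edition", "ISBN", "Publication Year")
xml_names  <- c("Title", "Author", "Cost", "Edition", "ISBN", "Publication_Year")

# Positions where the two vectors disagree (both have length 6)
mismatch <- which(html_names != xml_names)
data.frame(position = mismatch, html = html_names[mismatch], xml = xml_names[mismatch])
```

This shows that the HTML frame uses "Authors" and "Publication Year" where the XML and JSON frames use "Author" and "Publication_Year".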
str(books_html)
## 'data.frame': 3 obs. of 6 variables:
## $ Title : chr "Thomas Calculus" "Introduction to Algorithms" "Discrete Mathematics and Its Applications"
## $ Authors : chr "Joel R. Hass,Christopher E. Heil,Maurice D. Weir" "Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein" "Kenneth H. Rosen"
## $ Cost : chr "95.99" "52.12" "129.15"
## $ Edition : chr "3rd" "3rd" "7th"
## $ ISBN : chr "978-0137442997" "978-0262033848" "978-0073383095"
## $ Publication Year: chr "2021" "2009" "2011"
str(books_xml)
## 'data.frame': 3 obs. of 6 variables:
## $ Title : chr "Thomas Calculus" "Introduction to Algorithms" "Discrete Mathematics and Its Applications"
## $ Author : chr "Joel R. Hass,Christopher E. Heil,Maurice D. Weir" "Thomas H. Cormen,Charles E. Leiserson,Ronald L. Rivest,Clifford Stein" "Kenneth H. Rosen"
## $ Cost : chr "95.99" "52.12" "129.15"
## $ Edition : chr "3rd" "3rd" "7th"
## $ ISBN : chr "978-0137442997" "978-0262033848" "978-0073383095"
## $ Publication_Year: chr "2021" "2009" "2011"
str(books_json)
## 'data.frame': 3 obs. of 6 variables:
## $ Title : chr "Thomas Calculus" "Introduction to Algorithms" "Discrete Mathematics and Its Applications"
## $ Author : chr "Joel R. Hass,Christopher E. Heil,Maurice D. Weir" "Thomas H. Cormen,Charles E. Leiserson,Ronald L. Rivest,Clifford Stein" "Kenneth H. Rosen"
## $ Cost : num 96 52.1 129.2
## $ Edition : chr "3rd" "3rd" "7th"
## $ ISBN : chr "978-0137442997" "978-0262033848" "978-0073383095"
## $ Publication_Year: int 2021 2009 2011
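Reading str() output by eye works for six columns; a sketch that lists the columns whose classes differ can do the same job programmatically. The two frames below are hypothetical stand-ins that mimic the types reported above; with the real data one would pass `books_xml` and `books_json` to `sapply()` instead.

```r
# Stand-ins mimicking the column types reported by str() above
xml_like  <- data.frame(Cost = c("95.99", "52.12"),
                        Publication_Year = c("2021", "2009"),
                        stringsAsFactors = FALSE)
json_like <- data.frame(Cost = c(95.99, 52.12),
                        Publication_Year = c(2021L, 2009L))

# One row of column classes per data frame
classes <- rbind(xml = sapply(xml_like, class), json = sapply(json_like, class))
classes

# Columns whose class differs between the two frames
differing <- names(which(classes["xml", ] != classes["json", ]))
differing
```

This assumes every column has a single class; a column with several classes (e.g. a date-time) would make `sapply()` return a list instead of a vector.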
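The type problem is in the JSON data frame: fromJSON() read Cost as
numeric and Publication_Year as integer, while the HTML and XML data
frames store every column as character. (str() abbreviates Cost to three
significant digits; the stored values are still 95.99, 52.12 and 129.15,
as print() showed.) Let’s convert both columns to character!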
books_json$Cost <- as.character(books_json$Cost)
books_json$Publication_Year <- as.character(books_json$Publication_Year)
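The two as.character() calls above fix exactly the columns that str() flagged. If a JSON file had more numeric fields, a more general option would be to convert every column in one step. A sketch on a hypothetical stand-in frame; note that the `[] <-` form keeps the data.frame class and row names while replacing the columns:

```r
# Hypothetical stand-in with the same mixed types as books_json
json_like <- data.frame(
  Title = c("Introduction to Algorithms", "Discrete Mathematics and Its Applications"),
  Cost = c(52.12, 129.15),
  Publication_Year = c(2009L, 2011L),
  stringsAsFactors = FALSE
)

# Convert every column to character, keeping the data.frame structure
json_like[] <- lapply(json_like, as.character)
str(json_like)
```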
identical(books_html, books_xml)
## [1] FALSE
identical(books_html, books_json)
## [1] FALSE
identical(books_xml, books_json)
## [1] TRUE
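The XML and JSON frames now match, but the HTML frame still does not. Going by the names() and str() output above, two differences remain: the column names "Authors" and "Publication Year" (instead of "Author" and "Publication_Year"), and the spaces after the commas in the Cormen author list. A minimal sketch of aligning both, shown on hypothetical one-row stand-ins built from that row:

```r
# Hypothetical one-row stand-ins built from the Cormen row shown above
html_like <- data.frame(
  Title = "Introduction to Algorithms",
  Authors = "Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein",
  `Publication Year` = "2009",
  check.names = FALSE,          # keep the space in "Publication Year"
  stringsAsFactors = FALSE
)
xml_like <- data.frame(
  Title = "Introduction to Algorithms",
  Author = "Thomas H. Cormen,Charles E. Leiserson,Ronald L. Rivest,Clifford Stein",
  Publication_Year = "2009",
  stringsAsFactors = FALSE
)

identical(html_like, xml_like)  # FALSE

# 1. Adopt the XML/JSON column names (assumes the columns are in the same order)
names(html_like) <- names(xml_like)
# 2. Remove the whitespace that follows each comma in the author list
html_like$Author <- gsub(",\\s+", ",", html_like$Author)

identical(html_like, xml_like)  # TRUE
```

For the real data the same two steps would be `names(books_html) <- names(books_xml)` and a `gsub()` on `books_html$Author`. If identical() still returns FALSE afterwards, `all.equal(books_html, books_xml)` can report what else differs.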