My Approach

My approach is as follows:

  • I created the three files by hand using the Brackets editor.
  • I use the rvest, XML and jsonlite packages to parse the HTML, XML and JSON files, respectively.
  • Once the files are imported, I use the kableExtra package to display each of them.
  • I use the all.equal and/or identical functions to determine whether the resulting data frames are identical.
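The import step can be sketched roughly as follows. This is a minimal illustration rather than the code used for this report: the inline one-book inputs and the object names (html_txt, books_html, and so on) are hypothetical, and it assumes rvest 1.0 or later for html_element. The real files would be read from disk instead of from strings.

```r
# Hypothetical one-book inputs, inlined so the sketch runs without any files
html_txt <- "<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>Precision</td><td>2011</td></tr>
</table>"
xml_txt  <- "<books><book><Title>Precision</Title><Year>2011</Year></book></books>"
json_txt <- '[{"Title": "Precision", "Year": "2011"}]'

# HTML: parse the page, pick the first <table>, convert it to a data frame.
# html_table() converts column types by default, so Year comes back numeric.
books_html <- as.data.frame(
  rvest::html_table(rvest::html_element(rvest::read_html(html_txt), "table"))
)

# XML: parse the text, then flatten the repeated <book> nodes into rows.
# xmlToDataFrame() returns every column as character.
books_xml <- XML::xmlToDataFrame(
  XML::xmlParse(xml_txt, asText = TRUE),
  stringsAsFactors = FALSE
)

# JSON: an array of objects maps directly onto a data frame.
# Values quoted in the JSON stay character.
books_json <- jsonlite::fromJSON(json_txt)

str(books_html)
str(books_xml)
str(books_json)
```

The three parsers make different choices about column types, which is one way the same data can end up numeric in one data frame and character in another.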

Import Data

1. HTML File

| Title | Author.s. | Subject | Publisher | Year | ISBN |
|---|---|---|---|---|---|
| Efficiency of Racetrack Betting Markets | Donald B. Hausch,Victor S.Y. Lo, William T. Ziemba | Academic Finance - Betting Markets | World Scientific | 2012 | 9.812819e+09 |
| Precision: Statistical and Mathematical Methods in Horse Racing | C X Wong | Quantitative Methods in Horse Racing | Outskirts Press | 2011 | 9.781433e+12 |
| The Odds Must Be Crazy: Beating the Races with the Man Who Revolutionized Handicapping | Len Ragozin | Handicapping-Figure Making | Little, Brown and Company | 1997 | 9.780317e+12 |

2. XML File

| Title | Author | Subject | Publisher | Year | ISBN |
|---|---|---|---|---|---|
| Efficiency of Racetrack Betting Markets | Donald B. Hausch,Victor S.Y. Lo, William T. Ziemba | Academic Finance - Betting Markets | World Scientific | 2012 | 9812819185 |
| Precision: Statistical and Mathematical Methods in Horse Racing | C X Wong | Quantitative Methods in Horse Racing | Outskirts Press | 2011 | 978143276852 |
| The Odds Must Be Crazy: Beating the Races with the Man Who Revolutionized Handicapping | Len Ragozin | Handicapping-Figure Making | Little, Brown and Company | 1997 | 9781432768522 |

3. JSON File

| Title | Author | Subject | Publisher | Year | ISBN |
|---|---|---|---|---|---|
| Efficiency of Racetrack Betting Markets | Donald B. Hausch,Victor S.Y. Lo, William T. Ziemba | Academic Finance Betting Markets | World Scientific | 2012 | 9812819185 |
| Precision: Statistical and Mathematical Methods in Horse Racing | C X Wong | Quantitative Methods in Horse Racing | Outskirts Press | 2011 | 978143276852 |
| The Odds Must Be Crazy: Beating the Races with the Man Who Revolutionized Handicapping | Len Ragozin | Handicapping Figure Making | Little, Brown and Company | 1997 | 9781432768522 |

Are the Files Identical?

## [1] "Names: 1 string mismatch"                                   
## [2] "Component \"Year\": Modes: numeric, character"              
## [3] "Component \"Year\": target is numeric, current is character"
## [4] "Component \"ISBN\": Modes: numeric, character"              
## [5] "Component \"ISBN\": target is numeric, current is character"
## [1] FALSE
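The all.equal and identical functions indicate that the data frames are not identical. The output points to two causes: one column name differs (Author.s. in the HTML table versus Author in the others), and Year and ISBN are numeric in one data frame and character in the other. Both are straightforward to fix: the column can be renamed and the character variables coerced to numeric. The comparison can then be rerun to confirm whether the underlying values also match.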
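As a rough sketch of that fix, the snippet below uses two small stand-in data frames (hypothetical, not the imported book tables) that show the same kind of mismatch: one differing column name and a Year column stored as numeric in one and character in the other. It uses base R only.

```r
# Hypothetical stand-ins for two of the imported data frames
books_html <- data.frame(Author.s. = "C X Wong", Year = 2011,
                         stringsAsFactors = FALSE)
books_xml  <- data.frame(Author = "C X Wong", Year = "2011",
                         stringsAsFactors = FALSE)

# Before the fix all.equal() returns a character vector of differences,
# so wrap it in isTRUE() to get a plain logical
before <- isTRUE(all.equal(books_html, books_xml))

# Align the mismatched column name
names(books_html)[names(books_html) == "Author.s."] <- "Author"

# Coerce the character column to numeric
books_xml$Year <- as.numeric(books_xml$Year)

# After the fix, rerun the comparison
after <- isTRUE(all.equal(books_html, books_xml))

print(before)
print(after)
```

Note that all.equal tolerates tiny numeric differences while identical requires an exact match, so it is worth running both after coercion.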