Introduction

This notebook reads three different file types (an .xml file, a .json file, and an .html table) and converts each one into a data frame.

Convert JSON file to a Data Frame

library(jsonlite)
# fromJSON accepts a URL, a file path, or a JSON string
json_file <- "https://raw.githubusercontent.com/KevinJpotter/data_607/master/data/books.json"
json_df <- fromJSON(json_file)
json_df
##                       author                                title
## 1              Douglas Adams The Hitchhiker's Guide to the Galaxy
## 2       Kim Stanley Robinson                               Aurora
## 3 Stephen King, Peter Straub                         The Talisman
##   year_published  genre
## 1           1979 sci-fi
## 2           2015 sci-fi
## 3           1984 sci-fi
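As a rough offline illustration of the same call, fromJSON can also parse a JSON string directly. The sketch below assumes the remote file is an array of records; the inline text and the names json_text and inline_df are stand-ins made up for this example, not the contents of books.json.

```r
library(jsonlite)

# Hypothetical inline JSON: an array of records shaped like the printed table
json_text <- '[
  {"author": "Kim Stanley Robinson", "title": "Aurora",
   "year_published": 2015, "genre": "sci-fi"},
  {"author": "Stephen King, Peter Straub", "title": "The Talisman",
   "year_published": 1984, "genre": "sci-fi"}
]'

# An array of objects with matching keys is simplified to a data frame
inline_df <- fromJSON(json_text)

# Inspect the column types that jsonlite inferred
str(inline_df)
```

Checking the structure with str is a quick way to confirm that numeric fields such as year_published did not arrive as text.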

Convert HTML file to a Data Frame

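library(rvest)
library(RCurl)
# Download the raw HTML as text, then parse it
text <- getURL("https://raw.githubusercontent.com/KevinJpotter/data_607/master/data/books.html")
# html_table() returns a list of tables; as.data.frame() turns the single table into a data frame
html_df <- as.data.frame(read_html(text) %>% html_table(fill = TRUE))
html_df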
##   Var.1                     author  genre                                title
## 1     0              Douglas Adams sci-fi The Hitchhiker's Guide to the Galaxy
## 2     1       Kim Stanley Robinson sci-fi                               Aurora
## 3     2 Stephen King, Peter Straub sci-fi                         The Talisman
##   year_published
## 1           1979
## 2           2015
## 3           1984

Convert XML file to a Data Frame

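library(XML)
library(dplyr)

# Download the XML text (getURL comes from RCurl, loaded above) and parse it
data <- getURL("https://raw.githubusercontent.com/KevinJpotter/data_607/master/data/books2.xml")
doc <- xmlParse(data)
# xmlToDataFrame returns every column as character; type.convert restores numeric types
xml_df <- xmlToDataFrame(doc, stringsAsFactors = FALSE) %>%
    mutate_all(~type.convert(., as.is = TRUE))
xml_df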
##                       author  genre                                title
## 1              Douglas Adams sci-fi The Hitchhiker's Guide to the Galaxy
## 2       Kim Stanley Robinson sci-fi                               Aurora
## 3 Stephen King, Peter Straub sci-fi                         The Talisman
##   year_published
## 1           1979
## 2           2015
## 3           1984
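The printed results do not line up exactly: the HTML table carries an extra Var.1 index column, and the column order differs from the JSON result. One possible way to check that two results hold the same data is sketched below in base R. The data frames from_json and from_html are small hand-built stand-ins that mimic the printed output, not the objects created above.

```r
# Stand-in for the JSON result (column order as printed earlier)
from_json <- data.frame(
  author = c("Kim Stanley Robinson", "Stephen King, Peter Straub"),
  title = c("Aurora", "The Talisman"),
  year_published = c(2015L, 1984L),
  genre = c("sci-fi", "sci-fi"),
  stringsAsFactors = FALSE
)

# Stand-in for the HTML result: extra index column, different column order
from_html <- data.frame(
  Var.1 = c(0L, 1L),
  author = c("Kim Stanley Robinson", "Stephen King, Peter Straub"),
  genre = c("sci-fi", "sci-fi"),
  title = c("Aurora", "The Talisman"),
  year_published = c(2015L, 1984L),
  stringsAsFactors = FALSE
)

# Selecting by name drops Var.1 and reorders the columns in one step
aligned <- from_html[, names(from_json)]

# all.equal reports differences; isTRUE collapses the result to a single logical
same_data <- isTRUE(all.equal(aligned, from_json))
same_data
```

Selecting columns by name rather than by position keeps the comparison valid even if a source adds or reorders columns later.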

Conclusion

Each file type has its own R package, and each works a little differently when loading data. The formats store data slightly differently, so extracting the information can be tricky. The file structures in this example are very simple, but in large nested files extracting what you want can become difficult.
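To give a sense of what nesting looks like in practice, here is a minimal sketch using jsonlite. The JSON text is hypothetical (the split of author into first and last is invented for illustration and does not appear in the files above), as are the names nested_text, nested_df, and flat_df.

```r
library(jsonlite)

# Hypothetical nested JSON: each record holds an "author" object
nested_text <- '[
  {"title": "Aurora",
   "author": {"first": "Kim Stanley", "last": "Robinson"}},
  {"title": "The Talisman",
   "author": {"first": "Stephen", "last": "King"}}
]'

# By default the nested object becomes a data frame stored inside a column
nested_df <- fromJSON(nested_text)
class(nested_df$author)

# flatten = TRUE spreads nested fields into ordinary dotted columns
flat_df <- fromJSON(nested_text, flatten = TRUE)
names(flat_df)
```

Flattening works well for shallow nesting like this; deeper or irregular structures usually need more manual extraction.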