This notebook reads in three different file types (an .xml file, a .json file, and an .html table) and converts each of them into a data frame.
library(jsonlite)
json_file <- "https://raw.githubusercontent.com/KevinJpotter/data_607/master/data/books.json"
# fromJSON() accepts a URL directly and simplifies an array of records into a data frame
json_df <- fromJSON(json_file)
json_df
## author title
## 1 Douglas Adams The Hitchhiker's Guide to the Galaxy
## 2 Kim Stanley Robinson Aurora
## 3 Stephen King, Peter Straub The Talisman
## year_published genre
## 1 1979 sci-fi
## 2 2015 sci-fi
## 3 1984 sci-fi
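fromJSON() does more than parse. By default (`simplifyVector = TRUE`) it collapses a JSON array of records into a data frame, which is why `json_df` needs no further conversion. The sketch below uses a small hypothetical inline string, assumed to have the same array-of-objects shape as books.json, and contrasts the default with `simplifyVector = FALSE`, which keeps the raw nested lists.

```r
library(jsonlite)

# Hypothetical inline sample, assumed to mirror the shape of books.json:
# an array of objects, one object per book.
json_text <- '[
  {"title": "Aurora", "year_published": 2015},
  {"title": "The Talisman", "year_published": 1984}
]'

# Default simplification: the array of records becomes a data frame.
books_df <- fromJSON(json_text)
str(books_df)

# With simplification switched off, the same text comes back as a list of lists.
books_list <- fromJSON(json_text, simplifyVector = FALSE)
books_list[[2]]$title
```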
library(rvest)
library(RCurl)
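# Download the raw HTML as text, then parse it and extract its table(s)
text <- getURL("https://raw.githubusercontent.com/KevinJpotter/data_607/master/data/books.html")
# html_table() returns a list of tables; as.data.frame() turns that list into a data frame
html_df <- as.data.frame(read_html(text) %>% html_table(fill = TRUE))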
html_df
## Var.1 author genre title
## 1 0 Douglas Adams sci-fi The Hitchhiker's Guide to the Galaxy
## 2 1 Kim Stanley Robinson sci-fi Aurora
## 3 2 Stephen King, Peter Straub sci-fi The Talisman
## year_published
## 1 1979
## 2 2015
## 3 1984
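When html_table() is applied to a whole document, it returns a list with one element per `<table>`, which is why the call above is wrapped in as.data.frame(). Below is a minimal sketch that indexes the list explicitly instead. It uses a made-up inline table rather than books.html and assumes the page of interest holds a single table.

```r
library(rvest)

# Hypothetical inline table, so no download is needed.
html_text <- "<table>
  <tr><th>title</th><th>year_published</th></tr>
  <tr><td>Aurora</td><td>2015</td></tr>
  <tr><td>The Talisman</td><td>1984</td></tr>
</table>"

# One list element per <table> found in the document.
tables <- html_table(read_html(html_text))
length(tables)

# Keep the first (and here only) table as a plain data frame.
books_tbl <- as.data.frame(tables[[1]])
books_tbl
```

Indexing with `[[1]]` makes it explicit which table you keep, which matters once a page contains more than one.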
library(XML)
library(dplyr)
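# Download the XML as text and parse it into a document object
data <- getURL("https://raw.githubusercontent.com/KevinJpotter/data_607/master/data/books2.xml")
doc <- xmlParse(data)
# xmlToDataFrame() reads every field as text; type.convert() restores
# numeric columns such as year_published
xml_df <- xmlToDataFrame(doc, stringsAsFactors = FALSE) %>%
  mutate(across(everything(), ~ type.convert(.x, as.is = TRUE)))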
xml_df
## author genre title
## 1 Douglas Adams sci-fi The Hitchhiker's Guide to the Galaxy
## 2 Kim Stanley Robinson sci-fi Aurora
## 3 Stephen King, Peter Straub sci-fi The Talisman
## year_published
## 1 1979
## 2 2015
## 3 1984
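Each file type has its own R package (jsonlite for JSON, rvest for HTML, and XML for XML), and each one loads data a little differently. The formats also store the same records differently. The HTML table carries an extra unnamed column (Var.1, holding 0, 1, 2), and xmlToDataFrame() returns every column as text, so year_published has to be converted back to a number. In this example the file structures are very simple, but in large, nested files extracting exactly what you want can become difficult.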
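As an illustration of how nesting complicates things, suppose a JSON file stored each book's authors as an array instead of the single comma-separated string used in books.json. This layout is hypothetical. fromJSON() then returns a list-column, which must be expanded before each author gets their own row. The sketch below does that expansion with base R only.

```r
library(jsonlite)

# Hypothetical nested layout: "authors" is a JSON array, not one string.
nested_json <- '[
  {"title": "Aurora", "authors": ["Kim Stanley Robinson"], "year_published": 2015},
  {"title": "The Talisman", "authors": ["Stephen King", "Peter Straub"], "year_published": 1984}
]'

books <- fromJSON(nested_json)
# authors is a list-column: one character vector per book
class(books$authors)

# Expand to one row per (book, author) pair by repeating each book's
# fields once per author.
n_authors <- lengths(books$authors)
long_books <- data.frame(
  title          = rep(books$title, n_authors),
  author         = unlist(books$authors),
  year_published = rep(books$year_published, n_authors),
  stringsAsFactors = FALSE
)
long_books
```

If tidyr is available, its unnest() function performs the same expansion in one call.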