The webpage contains only one table, so for simplicity I don't need to single out a specific table: the XPath //th/ancestor::table is enough. I could use //th[text() = 'title']/ancestor::table instead, which gives the same output.
library(htmltab)
# store url
url <- "https://raw.githubusercontent.com/Sugarcane-svg/R/main/R607/Assignments/a5/movies.html"
# select the table that contains a th (header cell) element
html.tb <- htmltab(url, which = "//th/ancestor::table")
# print result
datatable(html.tb)
Read .json file
My JSON file contains three objects: movie1, movie2, and movie3. Each of them has the same field names. movie3 has two directors. rbind works cleanly only when every field has the same length, but movie3's director field has length 2 instead of 1, so I cannot use rbind directly.
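A minimal sketch of the length mismatch, using a made-up record rather than the real file: the director names are placeholders, and only the title comes from this write-up. Converting a list whose fields differ in length recycles the shorter field, so the record expands into two rows.

```r
# Hypothetical record shaped like movie3; director values are placeholders
record <- list(
  title    = "Jiang Ziya",
  director = c("Director A", "Director B")
)

# Field lengths differ: title has 1 element, director has 2
field.lengths <- lengths(record)

# as.data.frame() recycles the length-1 field, producing two rows
expanded <- as.data.frame(record, stringsAsFactors = FALSE)
print(field.lengths)
print(expanded)
```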
# store url
url <- "https://raw.githubusercontent.com/Sugarcane-svg/R/main/R607/Assignments/a5/movies.json"
# read .json
json.file <- fromJSON(getURL(url))
# create an empty data frame
json.df <- data.frame()
# bind movie1 and movie2
json.df <- rbind(json.df,json.file$movie1)
json.df <- rbind(json.df,json.file$movie2)
# convert movie3 to its own data frame, which has two observations
m3 <- as.data.frame(json.file$movie3)
datatable(m3)
From the table above, we can see that only the director differs between the two rows; everything else is the same. So I bind the data frame m3 first, then combine the two directors into one value, and finally remove the redundant row.
# bind m3 into json.df
json.df <- rbind(json.df, m3)
# concatenate the directors that share the same title
json.df[3,2] <- m3 %>%
  filter(title == "Jiang Ziya") %>%
  summarize(paste(director, collapse = ", "))
# remove the last row
json.df <- json.df[-4, ]
datatable(json.df)
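An alternative, sketched here in base R under the assumption that fromJSON returns each movie as a named list: collapse any multi-valued field before binding, so that every record already occupies one row. The helper collapse.record and the sample values are hypothetical and are not taken from the actual file.

```r
# Hypothetical records; only the title "Jiang Ziya" comes from the write-up
movies <- list(
  movie1 = list(title = "Movie One", director = "Director X"),
  movie2 = list(title = "Movie Two", director = "Director Y"),
  movie3 = list(title = "Jiang Ziya", director = c("Director A", "Director B"))
)

# Hypothetical helper: join any field with more than one value into one string
collapse.record <- function(record) {
  lapply(record, function(field) {
    if (length(field) > 1) paste(field, collapse = ", ") else field
  })
}

# Turn each record into a one-row data frame, then stack the rows
rows <- lapply(movies, function(record) {
  as.data.frame(collapse.record(record), stringsAsFactors = FALSE)
})
movies.df <- do.call(rbind, rows)
rownames(movies.df) <- NULL
print(movies.df)
```

This variant avoids hard-coding the cell position (json.df[3,2]) and the index of the row to delete, so it would keep working if another movie had several directors.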
Check whether the three data frames are identical to each other
# json vs. xml
identical(xml.df, json.df)
## [1] FALSE
# json vs. html
identical(html.tb, json.df)
## [1] FALSE
# xml vs html
identical(html.tb, xml.df)
## [1] FALSE
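identical() is strict: it returns FALSE if column types, column names, or attributes differ, even when the displayed values look the same. The sketch below shows one way to diagnose such a result with all.equal() and a class comparison. The two frames are made up (the year column and its value are hypothetical), and differing column types are only one possible cause of the FALSE results above, not a confirmed one.

```r
# Two hypothetical frames with the same visible values but different column types
a <- data.frame(title = "Jiang Ziya", year = "2020", stringsAsFactors = FALSE)
b <- data.frame(title = "Jiang Ziya", year = 2020, stringsAsFactors = FALSE)

same <- identical(a, b)      # FALSE: year is character in a, numeric in b
report <- all.equal(a, b)    # character vector describing the mismatch
print(report)

# Compare column classes side by side
print(sapply(a, class))
print(sapply(b, class))

# Convert the column to a common type, then compare again
b.fixed <- b
b.fixed$year <- as.character(b.fixed$year)
same.after <- isTRUE(all.equal(a, b.fixed))
print(same.after)
```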