# Load the packages used in this document
library(magrittr)
library(data.table)
library(kableExtra)
library(httr)
library(jsonlite)
library(XML) # Parse XML
library(RCurl)
library(stringr)
This document introduces REST and APIs and shows how to use R to access data available on a server over the Internet.
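Representational State Transfer (REST) refers to the conventions used to develop and operate web services. A RESTful API is a URL interface that uses HTTP methods to send and receive information. In a RESTful API, Uniform Resource Identifiers (URIs) identify and address resources, and the HTTP methods GET, PUT, POST, and DELETE are used to read, replace, create, and delete them.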
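An API (Application Programming Interface) is a contract for how the different components of a software system connect to each other. Access to an API is usually restricted to authenticated or authorized users. This kind of interaction is a cornerstone of today's web applications, which exchange information through authorized methods.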
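Responses from a RESTful API are usually structured as machine-readable XML or JSON. JSON (JavaScript Object Notation) is a file format that holds the data together with the structure that organizes it. This is useful because data is usually organized in a structure.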
Several R packages 'wrap' API calls. These are often written to access specific services and may cover only a limited set of endpoints. However, data may also be accessed by making direct HTTP requests with packages like httr, and parsed for use in R with packages like jsonlite. This document uses httr and jsonlite to access data.
Example: Accessing OMDb API
The OMDb API is an open web service that hosts movie information, contributed and maintained by users. To make a call to this service, the endpoint must be specified, as well as an ‘API key’, which may be requested from the host. Such services are usually protected by several layers of security. An API key is issued mainly to identify users and to limit traffic.
Other parameters must also be specified in the request call. The request in this example uses t (the movie title) and y (the year of release).
Using these, as well as an apikey supplied by the host, I will request details for the 2008 release of ‘Iron Man’ with the following HTTP GET request:
library(httr)
path <- "http://www.omdbapi.com/?apikey=1c8a7927&t=Iron+Man&y=2008" # web service of JSON
r <- GET(url = path)
r
## Response [http://www.omdbapi.com/?apikey=1c8a7927&t=Iron+Man&y=2008]
## Date: 2020-07-26 03:46
## Status: 200
## Content-Type: application/json; charset=utf-8
## Size: 1.22 kB
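The request URL can also be assembled from named parameters instead of being typed by hand. The following is a minimal sketch, assuming the httr package is installed; `YOUR_KEY` is a hypothetical placeholder, not a working key. It only builds the URL and sends no request.

```r
library(httr)

# Hypothetical placeholder; replace with the key issued by the host.
api_key <- "YOUR_KEY"

# modify_url() URL-encodes each query parameter and appends it to the base URL.
path <- modify_url(
  url = "http://www.omdbapi.com/",
  query = list(apikey = api_key, t = "Iron Man", y = "2008")
)
path
```

Passing `path` to `GET(url = path)` would then send the request. Building the query from a list avoids encoding mistakes such as unescaped spaces in the title.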
To check whether an HTTP request succeeded, inspect the status code of the reply. For a list of codes, refer to https://www.restapitutorial.com/httpstatuscodes.html. A successful connection and transmission is represented by ‘200’.
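Status codes are grouped by their first digit, so a reply can be classified without memorizing every code. The sketch below uses base R only; `status_class()` is a hypothetical helper written for illustration and is not part of httr.

```r
# Hypothetical helper: map an HTTP status code to its class by the first digit.
status_class <- function(code) {
  stopifnot(is.numeric(code), length(code) == 1)
  switch(as.character(code %/% 100),
         "1" = "informational",
         "2" = "success",
         "3" = "redirection",
         "4" = "client error",
         "5" = "server error",
         "unknown")
}

status_class(200)
status_class(404)
```

With a real response object `r`, `httr::status_code(r)` returns the numeric code to pass to such a helper, and `httr::http_error(r)` reports whether the code is 400 or above.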
# See what type of content is in the response
r %>%
http_type()
## [1] "application/json"
# To view the content of the reply:
r %>%
httr::content() %>%
str()
## List of 25
## $ Title : chr "Iron Man"
## $ Year : chr "2008"
## $ Rated : chr "PG-13"
## $ Released : chr "02 May 2008"
## $ Runtime : chr "126 min"
## $ Genre : chr "Action, Adventure, Sci-Fi"
## $ Director : chr "Jon Favreau"
## $ Writer : chr "Mark Fergus (screenplay), Hawk Ostby (screenplay), Art Marcum (screenplay), Matt Holloway (screenplay), Stan Le"| __truncated__
## $ Actors : chr "Robert Downey Jr., Terrence Howard, Jeff Bridges, Gwyneth Paltrow"
## $ Plot : chr "After being held captive in an Afghan cave, billionaire engineer Tony Stark creates a unique weaponized suit of"| __truncated__
## $ Language : chr "Hungarian, Kurdish, Hindi, English, Persian, Urdu, Arabic"
## $ Country : chr "USA, Canada"
## $ Awards : chr "Nominated for 2 Oscars. Another 21 wins & 65 nominations."
## $ Poster : chr "https://m.media-amazon.com/images/M/MV5BMTczNTI2ODUwOF5BMl5BanBnXkFtZTcwMTU0NTIzMw@@._V1_SX300.jpg"
## $ Ratings :List of 3
## ..$ :List of 2
## .. ..$ Source: chr "Internet Movie Database"
## .. ..$ Value : chr "7.9/10"
## ..$ :List of 2
## .. ..$ Source: chr "Rotten Tomatoes"
## .. ..$ Value : chr "94%"
## ..$ :List of 2
## .. ..$ Source: chr "Metacritic"
## .. ..$ Value : chr "79/100"
## $ Metascore : chr "79"
## $ imdbRating: chr "7.9"
## $ imdbVotes : chr "914,253"
## $ imdbID : chr "tt0371746"
## $ Type : chr "movie"
## $ DVD : chr "30 Sep 2008"
## $ BoxOffice : chr "$318,298,180"
## $ Production: chr "Paramount Pictures"
## $ Website : chr "N/A"
## $ Response : chr "True"
The list above shows that the Ratings data in the OMDb API come from the Internet Movie Database, Rotten Tomatoes, and Metacritic.
After retrieving data from the OMDb API, the JSON body can be parsed into an R list with the code below. A data frame or data.table can then be built from that list.
library(jsonlite)
df <- r %>%
content(x = ., as = "text", encoding = "UTF-8") %>%
fromJSON(simplifyVector = FALSE) # the piped text is already the first argument; keep the nested list
df
## $Title
## [1] "Iron Man"
##
## $Year
## [1] "2008"
##
## $Rated
## [1] "PG-13"
##
## $Released
## [1] "02 May 2008"
##
## $Runtime
## [1] "126 min"
##
## $Genre
## [1] "Action, Adventure, Sci-Fi"
##
## $Director
## [1] "Jon Favreau"
##
## $Writer
## [1] "Mark Fergus (screenplay), Hawk Ostby (screenplay), Art Marcum (screenplay), Matt Holloway (screenplay), Stan Lee (characters), Don Heck (characters), Larry Lieber (characters), Jack Kirby (characters)"
##
## $Actors
## [1] "Robert Downey Jr., Terrence Howard, Jeff Bridges, Gwyneth Paltrow"
##
## $Plot
## [1] "After being held captive in an Afghan cave, billionaire engineer Tony Stark creates a unique weaponized suit of armor to fight evil."
##
## $Language
## [1] "Hungarian, Kurdish, Hindi, English, Persian, Urdu, Arabic"
##
## $Country
## [1] "USA, Canada"
##
## $Awards
## [1] "Nominated for 2 Oscars. Another 21 wins & 65 nominations."
##
## $Poster
## [1] "https://m.media-amazon.com/images/M/MV5BMTczNTI2ODUwOF5BMl5BanBnXkFtZTcwMTU0NTIzMw@@._V1_SX300.jpg"
##
## $Ratings
## $Ratings[[1]]
## $Ratings[[1]]$Source
## [1] "Internet Movie Database"
##
## $Ratings[[1]]$Value
## [1] "7.9/10"
##
##
## $Ratings[[2]]
## $Ratings[[2]]$Source
## [1] "Rotten Tomatoes"
##
## $Ratings[[2]]$Value
## [1] "94%"
##
##
## $Ratings[[3]]
## $Ratings[[3]]$Source
## [1] "Metacritic"
##
## $Ratings[[3]]$Value
## [1] "79/100"
##
##
##
## $Metascore
## [1] "79"
##
## $imdbRating
## [1] "7.9"
##
## $imdbVotes
## [1] "914,253"
##
## $imdbID
## [1] "tt0371746"
##
## $Type
## [1] "movie"
##
## $DVD
## [1] "30 Sep 2008"
##
## $BoxOffice
## [1] "$318,298,180"
##
## $Production
## [1] "Paramount Pictures"
##
## $Website
## [1] "N/A"
##
## $Response
## [1] "True"
The code above returns the information available for this movie.
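With its default settings, `fromJSON()` simplifies a JSON array of objects into a data frame, which is often more convenient than a nested list. The sketch below assumes the jsonlite package is installed; the JSON string is hand-written to imitate part of the reply above and is not a real API response.

```r
library(jsonlite)

# Hand-written JSON imitating part of the OMDb reply (not a real API response).
txt <- '{
  "Title": "Iron Man",
  "Year": "2008",
  "Ratings": [
    {"Source": "Internet Movie Database", "Value": "7.9/10"},
    {"Source": "Rotten Tomatoes", "Value": "94%"},
    {"Source": "Metacritic", "Value": "79/100"}
  ]
}'

movie <- fromJSON(txt)    # simplifyVector = TRUE by default
ratings <- movie$Ratings  # the array of objects becomes a data frame
str(ratings)
```

Scalar fields such as `movie$Title` stay character values, while `ratings` has one row per rating source.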
Users may be interested in only a few fields, such as the title (Title), release year (Year), content rating (Rated), actors (Actors), and director (Director). The code below extracts them.
filter_df <- data.table(Title = df$Title,
Year = df$Year,
Director = df$Director,
Actors = df$Actors,
IMDB_Grading = df$Rated)
filter_df %>%
kable(., "html", caption = "The Movie information") %>%
kable_styling(bootstrap_options = "striped",
full_width = F)
Title | Year | Director | Actors | IMDB_Grading |
---|---|---|---|---|
Iron Man | 2008 | Jon Favreau | Robert Downey Jr., Terrence Howard, Jeff Bridges, Gwyneth Paltrow | PG-13 |
XML files consist of markup and content. The content of an XML file can be parsed with the read_xml() function from the xml2 package.
This example uses the XML revision history of the Wikipedia article on Hadley Wickham.
library(xml2)
resp <- GET(url = "https://en.wikipedia.org/w/api.php",
query = list(action = "query",
titles = "Hadley Wickham",
prop = "revisions",
rvprop = "timestamp|user|comment|content",
rvlimit = "5",
format = "xml",
rvdir = "newer",
rvstart = "2015-01-14T17:12:45Z",
rvsection = "0"))
resp
## Response [https://en.wikipedia.org/w/api.php?action=query&titles=Hadley%20Wickham&prop=revisions&rvprop=timestamp%7Cuser%7Ccomment%7Ccontent&rvlimit=5&format=xml&rvdir=newer&rvstart=2015-01-14T17%3A12%3A45Z&rvsection=0]
## Date: 2020-07-26 03:46
## Status: 200
## Content-Type: text/xml; charset=utf-8
## Size: 12.4 kB
## <?xml version="1.0"?><api><continue rvcontinue="20150528042700|664370232...
##
## He is a prominent and active member of the [[R (programming language)|R]...
##
## He is a prominent and active member of the [[R (programming language)|R]...
##
## He is a prominent and active member of the [[R (programming language)|R]...
##
## He is a prominent and active member of the [[R (programming language)|R]...
# See what type of content is in the response
resp %>%
http_type()
## [1] "text/xml"
# To view the content of the reply:
resp %>%
httr::content()
## {xml_document}
## <api>
## [1] <continue rvcontinue="20150528042700|664370232" continue="||"/>
## [2] <warnings>\n <main xml:space="preserve">Subscribe to the mediawiki- ...
## [3] <query>\n <pages>\n <page _idx="41916270" pageid="41916270" ns=" ...
Alternatively, extract the text and parse the xml explicitly.
# Parse the xml
resp_xml <- resp %>%
content(., as = "text") %>%
read_xml()
# Structure of the XML
resp_xml %>%
xml_structure()
## <api>
## <continue [rvcontinue, continue]>
## <warnings>
## <main [space]>
## {text}
## <revisions [space]>
## {text}
## <query>
## <pages>
## <page [_idx, pageid, ns, title]>
## <revisions>
## <rev [user, anon, timestamp, contentformat, contentmodel, comment, space]>
## {text}
## <rev [user, anon, timestamp, contentformat, contentmodel, comment, space]>
## {text}
## <rev [user, timestamp, contentformat, contentmodel, comment, space]>
## {text}
## <rev [user, timestamp, contentformat, contentmodel, comment, space]>
## {text}
## <rev [user, timestamp, contentformat, contentmodel, comment, space]>
## {text}
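The same parsing steps can be tried offline on a small document. This sketch assumes the xml2 package is installed; the XML string is hand-written to imitate the shape of the reply above, and the users `alice` and `bob` are made up.

```r
library(xml2)

# Hand-written XML imitating the shape of the Wikipedia reply (not real data).
txt <- '<api>
  <query>
    <pages>
      <page pageid="1" title="Example">
        <revisions>
          <rev user="alice" timestamp="2015-01-14T17:12:45Z">first text</rev>
          <rev user="bob" timestamp="2015-01-15T15:49:34Z">second text</rev>
        </revisions>
      </page>
    </pages>
  </query>
</api>'

doc <- read_xml(txt)                   # parse literal XML text
xml_name(doc)                          # tag name of the root node
page <- xml_find_first(doc, "//page")  # first <page> node at any depth
xml_attr(page, "title")                # value of its title attribute
xml_structure(doc)                     # print the tag hierarchy
```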
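library(rlist)
# content(resp) is an xml_document, so indexing it with `$` returns NULL and
# selects nothing. Collect the attributes of each <rev> node as a list instead.
revs <- content(resp) %>%
  xml_find_all(xpath = "//rev") %>%
  xml_attrs() %>%
  lapply(as.list)
# revs is a list of lists. Extract user, timestamp elements from each sublist.
user_time <- list.select(revs, user, timestamp)
# Stack to turn into a data frame
user_time %>% list.stack()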
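The xml_find_all() function retrieves the nodes that match an XPath expression. In XPath, a path that starts with / is matched from the document root one level per step, // matches nodes at any depth, and @ selects an attribute of the current node.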
In R, xml_find_all() returns a nodeset of the nodes that match the XPath. To get data out of the nodes in the nodeset, ask for it explicitly with xml_text(), xml_double(), xml_integer(), or as_list().
For example
# Get the nodeset of XPATH
xml_find_all(resp_xml,
xpath = "/api/query/pages/page/revisions/rev")
## {xml_nodeset (5)}
## [1] <rev user="214.28.226.251" anon="" timestamp="2015-01-14T17:12:45Z" ...
## [2] <rev user="73.183.151.193" anon="" timestamp="2015-01-15T15:49:34Z" ...
## [3] <rev user="FeanorStar7" timestamp="2015-01-24T16:34:31Z" contentform ...
## [4] <rev user="KasparBot" timestamp="2015-04-26T19:18:17Z" contentformat ...
## [5] <rev user="Spkal" timestamp="2015-05-06T18:24:57Z" contentformat="te ...
The result above is the nodeset of rev nodes in the XML for the Wikipedia article on Hadley Wickham.
# Two ways to get the same nodeset with XPath
# All nodes in path
rev_nodes <- xml_find_all(resp_xml,
xpath = "/api/query/pages/page/revisions/rev")
# All rev nodes in document
rev_nodes <- xml_find_all(resp_xml, xpath = "//rev")
# Use xml_text() to get the text content of each node in the nodeset
rev_nodes %>% xml_text()
## [1] "'''Hadley Mary Helen Wickham III''' is a [[statistician]] from [[New Zealand]] who is currently Chief Scientist at [[RStudio]]<ref>{{cite web|url=http://washstat.org/wss1310.shtml |title=Washington Statistical Society October 2013 Newsletter |publisher=Washstat.org |date= |accessdate=2014-02-12}}</ref><ref>{{cite web|url=http://news.idg.no/cw/art.cfm?id=F66B12BB-D13E-94B0-DAA22F5AB01BEFE7 |title=60+ R resources to improve your data skills ( - Software ) |publisher=News.idg.no |date= |accessdate=2014-02-12}}</ref> and an [[Professors_in_the_United_States#Adjunct_professor|adjunct]] [[Assistant Professor]] of statistics at [[Rice University]].<ref name=\"about\">{{cite web|url=http://www.rstudio.com/about/ |title=About - RStudio |accessdate=2014-08-13}}</ref> He is best known for his development of open-source statistical analysis software packages for [[R (programming language)]] that implement logics of [[data visualisation]] and data transformation. Wickham completed his undergraduate studies at the [[University of Auckland]] and his PhD at [[Iowa State University]] under the supervision of Di Cook and Heike Hoffman.<ref>{{cite web|URL=http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html |title= The R-Files: Hadley Wickham}}</ref> In 2006 he was awarded the [[John_Chambers_(statistician)|John Chambers]] Award for Statistical Computing for his work developing tools for data reshaping and visualisation.<ref>{{cite web|url=http://stat-computing.org/awards/jmc/winners.html |title=John Chambers Award Past winners|publisher=ASA Sections on Statistical Computing, Statistical Graphics,|date= |accessdate=2014-08-12}}</ref>\n\nHe is a prominent and active member of the [[R (programming language)|R]] user community and has developed several notable and widely used packages including [[ggplot2]], plyr, dplyr, and reshape2.<ref name=\"about\" /><ref>{{cite web|url=http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/ |title=Top 
100 R Packages for 2013 (Jan-May)! |publisher=R-statistics blog |date= |accessdate=2014-08-12}}</ref>"
## [2] "'''Hadley Wickham''' is a [[statistician]] from [[New Zealand]] who is currently Chief Scientist at [[RStudio]]<ref>{{cite web|url=http://washstat.org/wss1310.shtml |title=Washington Statistical Society October 2013 Newsletter |publisher=Washstat.org |date= |accessdate=2014-02-12}}</ref><ref>{{cite web|url=http://news.idg.no/cw/art.cfm?id=F66B12BB-D13E-94B0-DAA22F5AB01BEFE7 |title=60+ R resources to improve your data skills ( - Software ) |publisher=News.idg.no |date= |accessdate=2014-02-12}}</ref> and an [[Professors_in_the_United_States#Adjunct_professor|adjunct]] [[Assistant Professor]] of statistics at [[Rice University]].<ref name=\"about\">{{cite web|url=http://www.rstudio.com/about/ |title=About - RStudio |accessdate=2014-08-13}}</ref> He is best known for his development of open-source statistical analysis software packages for [[R (programming language)]] that implement logics of [[data visualisation]] and data transformation. Wickham completed his undergraduate studies at the [[University of Auckland]] and his PhD at [[Iowa State University]] under the supervision of Di Cook and Heike Hoffman.<ref>{{cite web|URL=http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html |title= The R-Files: Hadley Wickham}}</ref> In 2006 he was awarded the [[John_Chambers_(statistician)|John Chambers]] Award for Statistical Computing for his work developing tools for data reshaping and visualisation.<ref>{{cite web|url=http://stat-computing.org/awards/jmc/winners.html |title=John Chambers Award Past winners|publisher=ASA Sections on Statistical Computing, Statistical Graphics,|date= |accessdate=2014-08-12}}</ref>\n\nHe is a prominent and active member of the [[R (programming language)|R]] user community and has developed several notable and widely used packages including [[ggplot2]], plyr, dplyr, and reshape2.<ref name=\"about\" /><ref>{{cite web|url=http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/ |title=Top 100 R Packages 
for 2013 (Jan-May)! |publisher=R-statistics blog |date= |accessdate=2014-08-12}}</ref>"
## [3] "'''Hadley Wickham''' is a [[statistician]] from [[New Zealand]] who is currently Chief Scientist at [[RStudio]]<ref>{{cite web|url=http://washstat.org/wss1310.shtml |title=Washington Statistical Society October 2013 Newsletter |publisher=Washstat.org |date= |accessdate=2014-02-12}}</ref><ref>{{cite web|url=http://news.idg.no/cw/art.cfm?id=F66B12BB-D13E-94B0-DAA22F5AB01BEFE7 |title=60+ R resources to improve your data skills ( - Software ) |publisher=News.idg.no |date= |accessdate=2014-02-12}}</ref> and an [[Professors_in_the_United_States#Adjunct_professor|adjunct]] [[Assistant Professor]] of statistics at [[Rice University]].<ref name=\"about\">{{cite web|url=http://www.rstudio.com/about/ |title=About - RStudio |accessdate=2014-08-13}}</ref> He is best known for his development of open-source statistical analysis software packages for [[R (programming language)]] that implement logics of [[data visualisation]] and data transformation. Wickham completed his undergraduate studies at the [[University of Auckland]] and his PhD at [[Iowa State University]] under the supervision of Di Cook and Heike Hoffman.<ref>{{cite web|URL=http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html |title= The R-Files: Hadley Wickham}}</ref> In 2006 he was awarded the [[John_Chambers_(statistician)|John Chambers]] Award for Statistical Computing for his work developing tools for data reshaping and visualisation.<ref>{{cite web|url=http://stat-computing.org/awards/jmc/winners.html |title=John Chambers Award Past winners|publisher=ASA Sections on Statistical Computing, Statistical Graphics,|date= |accessdate=2014-08-12}}</ref>\n\nHe is a prominent and active member of the [[R (programming language)|R]] user community and has developed several notable and widely used packages including [[ggplot2]], plyr, dplyr, and reshape2.<ref name=\"about\" /><ref>{{cite web|url=http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/ |title=Top 100 R Packages 
for 2013 (Jan-May)! |publisher=R-statistics blog |date= |accessdate=2014-08-12}}</ref>"
## [4] "'''Hadley Wickham''' is a [[statistician]] from [[New Zealand]] who is currently Chief Scientist at [[RStudio]]<ref>{{cite web|url=http://washstat.org/wss1310.shtml |title=Washington Statistical Society October 2013 Newsletter |publisher=Washstat.org |date= |accessdate=2014-02-12}}</ref><ref>{{cite web|url=http://news.idg.no/cw/art.cfm?id=F66B12BB-D13E-94B0-DAA22F5AB01BEFE7 |title=60+ R resources to improve your data skills ( - Software ) |publisher=News.idg.no |date= |accessdate=2014-02-12}}</ref> and an [[Professors_in_the_United_States#Adjunct_professor|adjunct]] [[Assistant Professor]] of statistics at [[Rice University]].<ref name=\"about\">{{cite web|url=http://www.rstudio.com/about/ |title=About - RStudio |accessdate=2014-08-13}}</ref> He is best known for his development of open-source statistical analysis software packages for [[R (programming language)]] that implement logics of [[data visualisation]] and data transformation. Wickham completed his undergraduate studies at the [[University of Auckland]] and his PhD at [[Iowa State University]] under the supervision of Di Cook and Heike Hoffman.<ref>{{cite web|URL=http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html |title= The R-Files: Hadley Wickham}}</ref> In 2006 he was awarded the [[John_Chambers_(statistician)|John Chambers]] Award for Statistical Computing for his work developing tools for data reshaping and visualisation.<ref>{{cite web|url=http://stat-computing.org/awards/jmc/winners.html |title=John Chambers Award Past winners|publisher=ASA Sections on Statistical Computing, Statistical Graphics,|date= |accessdate=2014-08-12}}</ref>\n\nHe is a prominent and active member of the [[R (programming language)|R]] user community and has developed several notable and widely used packages including [[ggplot2]], plyr, dplyr, and reshape2.<ref name=\"about\" /><ref>{{cite web|url=http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/ |title=Top 100 R Packages 
for 2013 (Jan-May)! |publisher=R-statistics blog |date= |accessdate=2014-08-12}}</ref>"
## [5] "'''Hadley Wickham''' is a [[statistician]] from [[New Zealand]] who is currently Chief Scientist at [[RStudio]]<ref>{{cite web|url=http://washstat.org/wss1310.shtml |title=Washington Statistical Society October 2013 Newsletter |publisher=Washstat.org |date= |accessdate=2014-02-12}}</ref><ref>{{cite web|url=http://news.idg.no/cw/art.cfm?id=F66B12BB-D13E-94B0-DAA22F5AB01BEFE7 |title=60+ R resources to improve your data skills ( - Software ) |publisher=News.idg.no |date= |accessdate=2014-02-12}}</ref> and an [[Professors_in_the_United_States#Adjunct_professor|adjunct]] [[Assistant Professor]] of statistics at [[Rice University]].<ref name=\"about\">{{cite web|url=http://www.rstudio.com/about/ |title=About - RStudio |accessdate=2014-08-13}}</ref> He is best known for his development of open-source statistical analysis software packages for [[R (programming language)]] that implement logics of [[data visualisation]] and data transformation. Wickham completed his undergraduate studies at the [[University of Auckland]] and his PhD at [[Iowa State University]] under the supervision of Di Cook and Heike Hoffman.<ref>{{cite web|URL=http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html |title= The R-Files: Hadley Wickham}}</ref> In 2006 he was awarded the [[John_Chambers_(statistician)|John Chambers]] Award for Statistical Computing for his work developing tools for data reshaping and visualisation.<ref>{{cite web|url=http://stat-computing.org/awards/jmc/winners.html |title=John Chambers Award Past winners|publisher=ASA Sections on Statistical Computing, Statistical Graphics,|date= |accessdate=2014-08-12}}</ref>\n\nHe is a prominent and active member of the [[R (programming language)|R]] user community and has developed several notable and widely used packages including [[ggplot2]], plyr, dplyr, and reshape2.<ref name=\"about\" /><ref>{{cite web|url=http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/ |title=Top 100 R Packages 
for 2013 (Jan-May)! |publisher=R-statistics blog |date= |accessdate=2014-08-12}}</ref>"
Get all attributes of a tag with xml_attrs(), or a single named attribute with xml_attr().
# Get the nodeset of XPATH
rev_nodes <- xml_find_all(resp_xml, xpath = "//rev")
rev_nodes %>%
xml_attrs() %>% .[[1]] # Get all attributes of the first node
## user anon timestamp
## "214.28.226.251" "" "2015-01-14T17:12:45Z"
## contentformat contentmodel comment
## "text/x-wiki" "wikitext" ""
## space
## "preserve"
rev_nodes %>%
xml_attr(x = ., attr = "user") # Get one named attribute from every node
## [1] "214.28.226.251" "73.183.151.193" "FeanorStar7" "KasparBot"
## [5] "Spkal"
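The XPath forms and attribute accessors can be compared side by side on a small document. This sketch assumes the xml2 package is installed; the XML is hand-written and the user names are made up. As in the structure shown earlier, only some rev nodes carry an anon attribute.

```r
library(xml2)

# Hand-written XML (not real data); only the first <rev> has an anon attribute.
doc <- read_xml('<api><query><revisions>
  <rev user="alice" anon="" timestamp="2015-01-14T17:12:45Z">first text</rev>
  <rev user="bob" timestamp="2015-01-15T15:49:34Z">second text</rev>
</revisions></query></api>')

by_path   <- xml_find_all(doc, "/api/query/revisions/rev")  # absolute path from the root
anywhere  <- xml_find_all(doc, "//rev")                     # rev nodes at any depth
anon_only <- xml_find_all(doc, "//rev[@anon]")              # rev nodes that have an anon attribute

# xml_attr() returns NA when a node lacks the attribute.
revisions <- data.frame(
  user = xml_attr(anywhere, "user"),
  anon = !is.na(xml_attr(anywhere, "anon")),
  text = xml_text(anywhere),
  stringsAsFactors = FALSE
)
revisions
```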
Finally, everything covered above can be combined into a single function.
get_revision_history <- function(article_title,
limit_index = 5){
# 1. Get the response ----
resp <- GET(url = "https://en.wikipedia.org/w/api.php",
query = list(action = "query",
titles = article_title,
prop = "revisions",
rvprop = "timestamp|user|comment|content",
rvlimit = limit_index,
format = "xml",
rvdir = "newer",
rvstart = "2015-01-14T17:12:45Z",
rvsection = "0"))
# 2. Convert response text to xml ----
resp_xml <- resp %>%
content(x = ., as = "text", encoding = "UTF-8") %>%
read_xml()
# 3. Find revision nodeset -----
rev_nodes <- resp_xml %>%
xml_find_all(x = ., xpath = "//rev")
# 4. Parse user names, timestamp, and content ----
user <- rev_nodes %>%
xml_attr(x = ., attr = "user") # Parse usernames
timestamp <- rev_nodes %>%
xml_attr(x = ., attr = "timestamp") %>%
readr::parse_datetime() # Parse timestamp
content <- rev_nodes %>%
xml_text()
# 5. Return data frame ----
return_df <- data.frame(user = user,
timestamp = timestamp,
content = content %>%
str_sub(string = .,
start = 1, end = 40))
return(return_df)
}
# Call function for "Hadley Wickham"
get_revision_history(article_title = "Hadley Wickham",
limit_index = 7) %>%
kable(., "html", caption = "Wiki Revision history") %>%
kable_styling(bootstrap_options = "striped",
full_width = F)
user | timestamp | content |
---|---|---|
214.28.226.251 | 2015-01-14 17:12:45 | '''Hadley Mary Helen Wickham III''' is a |
73.183.151.193 | 2015-01-15 15:49:34 | '''Hadley Wickham''' is a [[statisticia |
FeanorStar7 | 2015-01-24 16:34:31 | '''Hadley Wickham''' is a [[statisticia |
KasparBot | 2015-04-26 19:18:17 | '''Hadley Wickham''' is a [[statisticia |
Spkal | 2015-05-06 18:24:57 | '''Hadley Wickham''' is a [[statisticia |
Ser Amantio di Nicolao | 2015-05-28 04:27:00 | '''Hadley Wickham''' is a [[statisticia |
141.58.138.101 | 2015-06-14 15:16:42 | '''Hadley Wickham''' is a [[statisticia |
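The timestamp attribute arrives as ISO 8601 text and has to be parsed before it can be sorted or subtracted. The function above uses readr::parse_datetime() for this; the sketch below is a base R alternative (a swapped-in technique, not the function's own code) applied to two timestamps from the table.

```r
# Two timestamps copied from the revision table above.
ts <- c("2015-01-14T17:12:45Z", "2015-01-15T15:49:34Z")

# The trailing Z means UTC, so state the time zone explicitly.
parsed <- as.POSIXct(ts, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S")
difftime(parsed[2], parsed[1], units = "hours")  # time between the two revisions
```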
This example uses the Cleveland Museum of Art Open Access API to retrieve a set of artworks from the museum's collection.
The API documentation lists the available request parameters. The request below uses department, type, and q (a search term).
# Query the API
resp <- GET(url = "https://openaccess-api.clevelandart.org/api/artworks/",
query = list(department = "American Painting and Sculpture",
type = "Painting",
q = "Sargent"))
cont <- content(resp)$data # List per result
result_list <- vector(mode = "list")
for (i in seq_along(cont)) {
ID <- cont[[i]]$id
Title <- cont[[i]]$title
Creation_Date <- cont[[i]]$creation_date
Url <- cont[[i]]$url
Fun_fact <- ifelse(test = is.null(cont[[i]]$fun_fact),
yes = NA, no = cont[[i]]$fun_fact)
Description <- ifelse(test = is.null(cont[[i]]$wall_description),
yes = NA, no = cont[[i]]$wall_description)
result_list[[i]] <- cbind(ID, Title,
Creation_Date, Url,
Fun_fact, Description)
}
df <- result_list %>%
do.call(what = rbind, args = .) %>%
data.table()
df %>% head() %>%
kable(., "html", caption = "Cleveland Museum of Art artworks") %>%
kable_styling(bootstrap_options = "striped",
full_width = F)
ID | Title | Creation_Date | Url | Fun_fact | Description |
---|---|---|---|---|---|
160289 | Portrait of Lisa Colt Curtis | 1898 | https://clevelandart.org/art/1998.168 | Elegant and poised in her silk gown, Lisa Colt Curtis does not directly engage the viewer. Sargent’s portrait seems to preserve the moment just before she steps forward to greet guests. | One of the most sought-after painters of his era, Sargent achieved considerable critical and financial success portraying cosmopolitan members of high society on both sides of the Atlantic. Here, the artist depicts an acquaintance—an heir to the Colt firearms fortune—who had recently married his distant cousin Ralph. In her portrait, Curtis wears an elegant satin dress and poses as if she were welcoming guests into her palatial Venetian home. The painting apparently was a wedding gift to the couple by the artist; its inscription at the top right reads, “To Ralph and Mrs. Ralph, John S. Sargent 1898.” |
109250 | The Cossack | undated | https://clevelandart.org/art/1927.397 | NA | NA |
170082 | Self-Portrait with Five Muses | | https://clevelandart.org/art/2012.30 | Church “spoke” at his funeral via an Edison phonograph recording he made for the occasion. | Lifelong Chagrin Falls resident Church is considered one of the great self-taught artists of 19th-century America. A painter, sculptor, and musician by passion, he offered his appearance and enthusiasms in this highly imaginative self-portrait, surrounding himself with a squadron of miniature winged muses. These figures represent not only the traditional arts of painting, sculpture, and music, but also Church’s profession of blacksmithing (identified as a crowned figure holding a hammer and anvil). A savvy entrepreneur, Church launched the first commercial art gallery in northeast Ohio: Church’s Art Museum, at Geauga Lake, in 1888. Its inventory consisted entirely of his own work. |
109971 | Head of a Girl | before 1929 | https://clevelandart.org/art/1928.579 | NA | NA |
102578 | Portrait of Dora Wheeler | 1882–83 | https://clevelandart.org/art/1921.1239 | Chase and Wheeler worked together to raise money for constructing the Statue of Liberty’s pedestal. | Dora Wheeler became Chase’s first student when he returned from overseas study in Munich and set up a teaching studio in New York. At the time, few American artists accepted women as private pupils. After her course of study, Wheeler joined her mother in launching a successful decorating firm, one of the first businesses in the country to be operated entirely by women. For the firm, she designed luxurious textiles, and the embroidered silk tapestry that fills the background in her portrait references her occupational interest. Chase’s portrait was awarded a gold medal at an international exhibition of contemporary art in Munich in 1883, and later that year was also shown in Paris. At some later point, the painting was acquired by the sitter, who subsequently donated it to the museum. |
121261 | The Violin Player | | https://clevelandart.org/art/1942.1133 | NA | NA |
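The NA cells in the table come from fields that are absent in some records. The sketch below shows one way to handle that case in base R; the records and the `null_to_na()` helper are hypothetical and stand in for the parsed API results.

```r
# Hypothetical records imitating parsed API results; the second has no fun_fact.
records <- list(
  list(id = 1L, title = "First work", fun_fact = "A fact."),
  list(id = 2L, title = "Second work", fun_fact = NULL)
)

# Return NA instead of NULL so every row has the same columns.
null_to_na <- function(x) if (is.null(x)) NA_character_ else x

rows <- lapply(records, function(rec) {
  data.frame(
    ID = rec$id,
    Title = rec$title,
    Fun_fact = null_to_na(rec$fun_fact),
    stringsAsFactors = FALSE
  )
})

artworks <- do.call(rbind, rows)  # one row per record
artworks
```

Building one small data frame per record keeps each column's type, whereas binding character matrices with cbind() converts every value to text.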
APIs allow any application to become a node within an almost infinite pool of expanding data and functionality. What is exchanged can include multi-dimensional data structures (such as JSON), source code, and database schemas.
OMDb API [http://www.omdbapi.com/]
The Cleveland Museum of Art Open Access API [https://openaccess-api.clevelandart.org/]
RPubs_plantagenet [https://rpubs.com/plantagenet/481658]
RPubs_mpfoley73 [https://rpubs.com/mpfoley73/548500]
List of HTTP status codes [https://en.wikipedia.org/wiki/List_of_HTTP_status_codes]
Working with Web Data in R. DataCamp [https://campus.datacamp.com/courses/working-with-web-data-in-r]
Best practices for API packages. [https://rpubs.com/mpfoley73/548500]