Some Data on the Web
Let’s say we want that “Recent Princes” table.
Package rvest
Attach the rvest package:
library(rvest)

Read in the Page
Here’s the URL we want:
url <- "http://gccs.surge.sh/tidypres.html"
Now we grab the page:
page <- read_html(url)

Go For the Tables
tables <- html_nodes(page, "table")
Did we get it? Try this:
tables[[2]]
{xml_node}
<table class="table table-hover table-bordered">
[1] <tr>\n<th>sex</th>\n \n <th>count</th>\n \n ...
[2] <tr>\n<td>M</td>\n \n <td>73</td>\n \n ...
[3] <tr>\n<td>M</td>\n \n <td>92</td>\n \n ...
and so on for more rows ...
Yep, looks like we did!
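If you'd rather not guess that the table you want is number 2, here is a minimal sketch of one way to check. It assumes a made-up inline page (`demo_html` is hypothetical, not the real site) so it runs without a network connection; the two data rows reuse values from the output above.

```r
library(rvest)

# Hypothetical stand-in for the real page: two small tables, inline.
demo_html <- '<html><body>
<table><tr><th>name</th></tr><tr><td>placeholder</td></tr></table>
<table>
  <tr><th>sex</th><th>count</th><th>year</th></tr>
  <tr><td>M</td><td>73</td><td>1978</td></tr>
  <tr><td>M</td><td>92</td><td>1979</td></tr>
</table>
</body></html>'

demo_page <- read_html(demo_html)
demo_tables <- html_nodes(demo_page, "table")

# How many tables did we find?
length(demo_tables)

# Which table has a "sex" column header?
has_sex <- vapply(seq_along(demo_tables), function(i) {
  "sex" %in% html_text(html_nodes(demo_tables[[i]], "th"))
}, logical(1))
which(has_sex)
```

On the real page you would run the same check against `tables` instead of `demo_tables`.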
Turn Into a Data Frame
recentPrinces <- html_table(tables[[2]])
Did it work? Try this:
str(recentPrinces)
## 'data.frame': 39 obs. of 3 variables:
## $ sex : chr "M" "M" "M" "M" ...
## $ count: int 73 92 131 146 137 5 167 206 195 5 ...
## $ year : int 1978 1979 1980 1981 1982 1983 1983 1984 1985 1986 ...
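To try the conversion step offline, here is a small sketch under the same assumption as before: `demo_html` is a hypothetical inline table, built from the first two rows shown above, not the real page. It shows `html_table()` turning a table node into a data frame and guessing the column types.

```r
library(rvest)

# Hypothetical inline table, using the first two rows from the output above.
demo_html <- '<table>
  <tr><th>sex</th><th>count</th><th>year</th></tr>
  <tr><td>M</td><td>73</td><td>1978</td></tr>
  <tr><td>M</td><td>92</td><td>1979</td></tr>
</table>'

demo_table <- html_nodes(read_html(demo_html), "table")[[1]]
demo_df <- html_table(demo_table)

# Check the shape and the column types before trusting the result
dim(demo_df)
sapply(demo_df, class)
```

Depending on your rvest version, the result may print as a tibble rather than a plain data frame; either way the columns behave the same for what follows.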
And try this:
DT::datatable(recentPrinces, options = list(
  pageLength = 5,
  lengthMenu = c(5, 10, 15, 20)
))

Analyse
Let’s make a graph:
library(ggplot2)  # needed for ggplot(), if not already attached
ggplot(recentPrinces, aes(x = year, y = count)) +
  geom_line(aes(color = sex)) +
  labs(x = "Year",
       y = "number of babies named 'Prince'",
       title = "The name 'Prince' has been getting popular!",
       subtitle = "(for boys, at any rate ...)")