ScrapeExampleSolution

Load Libraries

This is a worked solution for the scraping assignment. Start by loading two tidyverse packages: rvest, which provides functions for reading and parsing HTML, and dplyr, for general data manipulation.

library(rvest)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Read data from a single web page

#Define the URL. Using paste0() to append the sample ID to the generic
#search URL makes it easy to loop over many IDs later
namePage <- paste0("https://www.ncbi.nlm.nih.gov/biosample/?term=",31280775)
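Because paste0() is vectorized, the same call can build a whole set of URLs at once, which is one way to prepare for the loop mentioned in the comment above. The sketch below is only an illustration: apart from the first ID, the values in sampleIDs are hypothetical placeholders, and the names baseURL, sampleIDs, and namePages are not part of the original solution. The IDs are stored as character strings on the assumption that this avoids R printing large round numbers in scientific notation.

```r
# Generic search URL, taken from the example above
baseURL <- "https://www.ncbi.nlm.nih.gov/biosample/?term="

# Hypothetical IDs: only the first comes from the example; the rest are placeholders.
# Stored as character so they are pasted exactly as written.
sampleIDs <- c("31280775", "31280776", "31280777")

# paste0() recycles baseURL across every ID, returning one URL per ID
namePages <- paste0(baseURL, sampleIDs)

print(namePages)
```

Each element of namePages could then be passed to read_html() in turn, for example inside a for loop or with lapply().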

#testPage stores the parsed HTML of the whole web page,
#similar to what you see with "Inspect Element" in a browser
testPage <- read_html(namePage)

#The data we want is in a table, so select the first <table> element
#and convert it to a data frame
tableText <- testPage %>% 
    html_node("table") %>%
    html_table()

#Give the two columns descriptive names
names(tableText) <- c('Question', 'Response')

head(tableText)

## # A tibble: 6 × 2
##   Question              Response         
##   <chr>                 <chr>            
## 1 dominant hand         I am right handed
## 2 environmental medium  feces            
## 3 environmental package human-gut        
## 4 host body habitat     UBERON:feces     
## 5 host body mass index  33.3             
## 6 host body product     UBERON:feces
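To repeat these steps for more than one page, it can help to wrap the table extraction in a small function. The following is a minimal sketch, not the assignment's required approach: extractAttributes is a hypothetical helper name, and demoPage is a made-up inline page that stands in for a real BioSample page so the example runs without a network connection. It assumes rvest 1.0.0 or later, where html_element() and minimal_html() are available.

```r
library(rvest)

# Hypothetical helper: takes an already-parsed page and returns its first
# table with the same two column names used above
extractAttributes <- function(page) {
  tableText <- page %>%
    html_element("table") %>%
    html_table()
  names(tableText) <- c("Question", "Response")
  tableText
}

# Made-up markup standing in for a real page; every cell is a <td> so that
# no row is treated as a header
demoPage <- minimal_html('
  <table>
    <tr><td>dominant hand</td><td>I am right handed</td></tr>
    <tr><td>host body mass index</td><td>33.3</td></tr>
  </table>')

demoTable <- extractAttributes(demoPage)
print(demoTable)
```

With a live page, the call would look like extractAttributes(read_html(namePage)). Note that the helper assumes the table has exactly two columns; assigning two names to a table of a different width would raise an error.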

Jess Kaufman

2025-09-22