Web scraping tutorial!

Author

Gary

Web scraping uses CSS selectors to pull information out of a web page's HTML. Use your browser's Inspect Element tool to find the selectors for the different parts of a page you want.

Libraries

library(rvest)
library(tidyverse) # can also use base R
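
As a rough sketch of how selecting parts of a page by CSS selector might look with rvest: the tiny page below is made up (its `title` and `price` classes are assumptions for illustration), so the example runs without touching a real website.

```r
library(rvest)

# A tiny made-up page, built in memory so no real website is contacted.
# The class names "title" and "price" are hypothetical.
page <- minimal_html('
  <h1 class="title">Example page</h1>
  <p class="price">10</p>
  <p class="price">25</p>
')

# html_element() returns the first match; html_elements() returns all matches.
title <- page %>% html_element("h1.title") %>% html_text2()
prices <- page %>% html_elements("p.price") %>% html_text2()

print(title)
print(prices)
```

For a real site, you would swap `minimal_html(...)` for `read_html()` with the page's URL, and use the selectors you found with Inspect Element.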

Some websites try to prevent scraping, or to limit how much can be scraped, so sometimes you will get a 403 error or similar, meaning that the website denied your request. Some websites may also try to ban you from accessing them. Do not try scraping Google, and don’t run extensive scraping code too many times in a row: if you’re scraping 5000 pages on a website, limit your testing code to 5-50 pages or so to reduce your burden on the website. There are usually ways around these issues, but so far I’ve not run into any of them except 403, so don’t quote me on any of this.
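
One possible way to keep test runs small and to survive a denied request is sketched below. The `example.com` URLs, the `read_page_safely()` helper, and the 2-second pause are all assumptions for illustration, not a recommended standard.

```r
library(rvest)

# Hypothetical list of page URLs; example.com stands in for the real site.
all_urls <- paste0("https://example.com/page/", 1:5000)

# While testing, only touch a small sample of the pages.
test_urls <- head(all_urls, 10)

# Hypothetical helper: read one page, returning NULL instead of stopping
# the whole script if the request fails (for example, with a 403 error).
read_page_safely <- function(url, pause = 2) {
  Sys.sleep(pause)  # wait between requests to reduce load on the site
  tryCatch(
    read_html(url),
    error = function(e) {
      message("Skipping ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Uncomment to actually fetch the test pages:
# pages <- lapply(test_urls, read_page_safely)
```

Pages that fail come back as `NULL`, so you can drop them afterwards and see from the messages which requests were denied.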