Question 5

Checking whether a webpage may be scraped, using robotstxt

library(robotstxt)  # provides paths_allowed()
paths_allowed("https://en.wikipedia.org/wiki/2026_in_film")

## Warning: package 'future' was built under R version 4.4.3

##  en.wikipedia.org

## [1] TRUE
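The check above returns TRUE, so the page may be fetched. One way to make that check a precondition of the download is sketched below. The helper name `read_if_allowed` is hypothetical and not part of the assignment; it assumes the robotstxt and rvest packages are installed and calls them through their namespaces.

```r
# Hypothetical helper (not from the assignment): only download a page
# when its robots.txt permits access to that path.
read_if_allowed <- function(url) {
  # paths_allowed() returns TRUE when the path may be crawled
  if (!isTRUE(robotstxt::paths_allowed(url))) {
    stop("robots.txt disallows scraping: ", url)
  }
  # rvest re-exports read_html() from xml2
  rvest::read_html(url)
}
```

Calling `read_if_allowed("https://en.wikipedia.org/wiki/2026_in_film")` would then either return the parsed page or stop with an error before any scraping happens.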

Question 6

Scraping the Highest-Grossing Films table using HTML nodes

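library(rvest)  # provides read_html(), html_elements(), html_table()

# Download and parse the page, then collect every <table> element
WikiFilmsData <- read_html("https://en.wikipedia.org/wiki/2026_in_film")
FilmDataTable <- html_elements(WikiFilmsData, "table")
# When this was run, the third table was the highest-grossing films list
Highestgross <- html_table(FilmDataTable[[3]])
Highestgross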

## # A tibble: 10 × 4
##     Rank Title                             Distributor         `Worldwide gross`
##    <int> <chr>                             <chr>               <chr>            
##  1     1 Cheburashka 2 †                   Central Partnership $79,559,272      
##  2     2 28 Years Later: The Bone Temple † Sony                $56,722,595      
##  3     3 Send Help †                       20th Century Studi… $54,977,893      
##  4     4 Border 2 †                        AA Films            $51,204,000[3]   
##  5     5 Mercy †                           Amazon MGM Studios… $49,802,465      
##  6     6 Return to Silent Hill †           Iconic Events Rele… $41,586,056[4]   
##  7     7 Primate †                         Paramount Pictures  $39,702,818      
##  8     8 Iron Lung †                       Markiplier Studios  $38,965,988      
##  9     9 Dracula †                         SND (France)        $33,593,404      
## 10    10 Mana Shankara Vara Prasad Garu †  Gold Box Entertain… $32,225,000[5][6]
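The `Worldwide gross` column comes back as text, and some values carry footnote markers such as `[3]` or `[5][6]`. A minimal base-R sketch of one way to turn those strings into numbers follows. It uses three sample values copied from the table above rather than the live page, and the variable names are illustrative only.

```r
# Sample values copied from the `Worldwide gross` column above
gross_raw <- c("$79,559,272", "$51,204,000[3]", "$32,225,000[5][6]")

# Drop bracketed footnote markers such as [3] or [5][6]
gross_no_notes <- gsub("\\[[0-9]+\\]", "", gross_raw)

# Remove the dollar sign and thousands separators, then convert to numeric
gross_usd <- as.numeric(gsub("[$,]", "", gross_no_notes))

gross_usd
```

The same two `gsub()` calls could be applied to the full `Highestgross` column, assuming every value follows the dollar-amount pattern shown in the table.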

Assignment 3

Jake Vaughan

2026-02-12

Loading Web Scraping Tools
