Question 5: Robotstxt

Does the site's robots.txt allow scraping this web page? (Checked with the robotstxt package.)

library(robotstxt)

# Check if scraping is allowed
paths_allowed("https://en.wikipedia.org/wiki/2026_in_film")
## [1] TRUE
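
The check above can also gate the download itself, so the page is only fetched when robots.txt permits it. The following is a minimal sketch, not part of the original answer: `scrape_if_allowed` is a hypothetical helper name, and its `check` and `fetch` arguments are assumptions added so the logic can be tried without network access.

```r
# Hypothetical helper: fetch a page only when robots.txt permits it.
# `check` and `fetch` default to the functions used in this report and can be
# swapped out, for example to try the logic without network access.
scrape_if_allowed <- function(url,
                              check = robotstxt::paths_allowed,
                              fetch = rvest::read_html) {
  # Stop unless the check returns a single TRUE
  if (!isTRUE(check(url))) {
    stop("robots.txt does not allow scraping: ", url)
  }
  fetch(url)
}

# Usage (needs network access and the robotstxt and rvest packages):
# page <- scrape_if_allowed("https://en.wikipedia.org/wiki/2026_in_film")
```

Because the defaults are only evaluated when they are needed, passing stand-in functions for `check` and `fetch` exercises the helper without loading either package.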

Question 6: Highest Grossing Films

Scrape the table of highest-grossing films from the Wikipedia page "2026 in film".

# Load libraries
library(rvest)
library(dplyr)
library(knitr)
library(stringr)

# URL of the 2026 in film Wikipedia page
url <- "https://en.wikipedia.org/wiki/2026_in_film"
page <- read_html(url)

# Extract the first table with class "wikitable"; on this page that is
# the highest-grossing films table
highest_grossing <- page %>%
  html_element("table.wikitable") %>%
  html_table()

# Clean the data
highest_grossing <- highest_grossing %>%
  mutate(
    # Drop the dagger marker from titles
    Title = str_remove_all(Title, "†"),
    # Strip any leading run of capital letters from the gross figure
    `Worldwide gross` = str_remove_all(`Worldwide gross`, "^[A-Z]+"),
    # Strip bracketed footnote markers such as [1]
    `Worldwide gross` = str_remove_all(`Worldwide gross`, "\\[.*?\\]")
  )

# Output the table
kable(head(highest_grossing, 15), caption = "Highest Grossing Films of 2026")
Highest Grossing Films of 2026

| Rank | Title                           | Distributor               | Worldwide gross |
|-----:|:--------------------------------|:--------------------------|:----------------|
|    1 | Cheburashka 2                   | Central Partnership       | $79,559,272     |
|    2 | 28 Years Later: The Bone Temple | Sony                      | $56,722,595     |
|    3 | Send Help                       | 20th Century Studios      | $54,977,893     |
|    4 | Border 2                        | AA Films                  | $51,204,000     |
|    5 | Mercy                           | Amazon MGM Studios / Sony | $49,802,465     |
|    6 | Primate                         | Paramount Pictures        | $39,702,818     |
|    7 | Iron Lung                       | Markiplier Studios        | $37,974,493     |
|    8 | Dracula                         | SND (France)              | $33,593,404     |
|    9 | Mana Shankara Vara Prasad Garu  | Gold Box Entertainments   | $32,225,000     |
|   10 | Prostokvashino [ru]             | Cinema Atmosphere         | $31,934,076     |
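
The cleaned `Worldwide gross` column is still text, so it cannot be sorted or summed as it stands. As a possible follow-up step (not part of the original answer), the sketch below converts a few sample values copied from the table to numbers using base R only. The variable names `gross`, `gross_num`, and `is_descending` are illustrative.

```r
# Sample values copied from the first three rows of the table
gross <- c("$79,559,272", "$56,722,595", "$54,977,893")

# Drop everything except digits and the decimal point, then convert
gross_num <- as.numeric(gsub("[^0-9.]", "", gross))

# Sanity check: the ranking should list gross in descending order
is_descending <- !is.unsorted(rev(gross_num))
```

Applied to the full data frame, the same `gsub()` call could sit inside the existing `mutate()` step, writing to a new numeric column so the formatted text stays available for display.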