Let me show you how to scrape HTML tables with R, using the htmltab package. The sample tables are on this page: http://www.mishou.be/2021/10/04/pythonr-sample-data-for-data-analysis/ You can also learn how to scrape the same tables with Python here: http://www.mishou.be/2021/10/04/pythonr-sample-data-for-data-analysis/
# load libraries: htmltab for scraping tables, tidyverse for tibbles
library(htmltab)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# retrieve the first table
url <- "http://www.mishou.be/2021/10/04/pythonr-sample-data-for-data-analysis/"
df1 <- htmltab(url, which = 1)
# convert the data frame to a tibble for compact printing
df1_tibble <- as_tibble(df1)
df1_tibble
## # A tibble: 200 × 7
##    id    english japanese nationality department classes gender
##    <chr> <chr>   <chr>    <chr>       <chr>      <chr>   <chr>
##  1 1     17.8    75.6     japan       literature 2       male
##  2 2     64.4    53.3     nepal       literature 2       male
##  3 3     86.7    31.1     nepal       literature 1       male
##  4 4     60      62.2     indonesia   literature 2       male
##  5 5     42.2    80       japan       literature 1       male
##  6 6     33.3    75.6     japan       literature 1       male
##  7 7     28.9    60       japan       literature 2       male
##  8 8     53.3    88.9     japan       literature 1       male
##  9 9     42.2    60       japan       literature 1       male
## 10 10    40      80       japan       literature 1       male
## # … with 190 more rows
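
Every column in the output above is <chr>, so the scores cannot be averaged or plotted as numbers yet. Here is a minimal sketch of one way to fix that with base R's type.convert(). The sample_df below is a hypothetical three-row stand-in that I typed in from the output above so the code runs without a network connection; with the real data you would pass df1 instead. The names sample_df and typed_df are my own and are not part of the original script.

```r
# Hypothetical stand-in for the first rows of df1 (every column is character)
sample_df <- data.frame(
  id = c("1", "2", "3"),
  english = c("17.8", "64.4", "86.7"),
  japanese = c("75.6", "53.3", "31.1"),
  nationality = c("japan", "nepal", "nepal"),
  department = c("literature", "literature", "literature"),
  classes = c("2", "2", "1"),
  gender = c("male", "male", "male"),
  stringsAsFactors = FALSE
)

# Let R guess a type for each column;
# as.is = TRUE keeps text columns as character instead of factor
typed_df <- type.convert(sample_df, as.is = TRUE)

# Check the class of each column after conversion
sapply(typed_df, class)

# Numeric columns can now be summarized
mean(typed_df$english)
```

If you prefer to stay inside the tidyverse, readr::type_convert() should do a similar job on a data frame of character columns, though I have not compared the two on this data set.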