Here I hope to extract the episode titles of Frasier into an easy-to-use data frame.

readLines

For a relatively small text file, we can simply use readLines to load the data. Each line of the file becomes one element of a character vector.

raw_data <- readLines("frasierEpisodes.txt")
head(raw_data)
## [1] "1.     1-1                 16 Sep 93   <a target=\"_blank\" href=\"http://www.tvmaze.com/episodes/49441/frasier-1x01-the-good-son\">The Good Son</a>"                 
## [2] "2.     1-2                 23 Sep 93   <a target=\"_blank\" href=\"http://www.tvmaze.com/episodes/49442/frasier-1x02-space-quest\">Space Quest</a>"                   
## [3] "3.     1-3                 30 Sep 93   <a target=\"_blank\" href=\"http://www.tvmaze.com/episodes/49443/frasier-1x03-dinner-at-eight\">Dinner at Eight</a>"           
## [4] "4.     1-4                 07 Oct 93   <a target=\"_blank\" href=\"http://www.tvmaze.com/episodes/49444/frasier-1x04-i-hate-frasier-crane\">I Hate Frasier Crane</a>" 
## [5] "5.     1-5                 14 Oct 93   <a target=\"_blank\" href=\"http://www.tvmaze.com/episodes/49445/frasier-1x05-heres-looking-at-you\">Here's Looking at You</a>"
## [6] "6.     1-6                 21 Oct 93   <a target=\"_blank\" href=\"http://www.tvmaze.com/episodes/49446/frasier-1x06-the-crucible\">The Crucible</a>"

Regular Expressions

Now comes the fun part. I want to extract the season numbers, the episode numbers, the titles of each episode, and maybe the air date.

library("stringr")

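In each line, the season and episode numbers (presently) appear as a one- or two-digit number, a hyphen, and another one- or two-digit number. The pattern below uses + (one or more digits), which is slightly more permissive than that; str_extract returns only the first match in each line, which is the season-episode field.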

seasonEpisode_pattern <- "[0-9]+-[0-9]+"
seasonAndEpisode <- str_extract(raw_data, seasonEpisode_pattern)

The pattern for the dates is two digits, a space, three letters (the abbreviated month), a space, and two digits.

airDate_pattern <- "[0-9]{2} [A-Za-z]{3} [0-9]{2}"
airDate <- str_extract(raw_data, airDate_pattern)
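The extracted dates are still character strings. If sorting or filtering by date is wanted, something like the following might convert them. This is only a sketch: it assumes an English locale (%b matches abbreviated month names in the current locale) and relies on R reading two-digit years from 69 to 99 as 19xx. The airDate_sample vector is a stand-in for airDate, using values from the output above.

```r
# Stand-in for the airDate vector (values taken from the head() output above)
airDate_sample <- c("16 Sep 93", "23 Sep 93", "07 Oct 93")

# %d = day of month, %b = abbreviated month name (locale-dependent),
# %y = two-digit year (69-99 are read as 1969-1999)
airDate_parsed <- as.Date(airDate_sample, format = "%d %b %y")

# Strings that fail to parse become NA, so it is worth counting them
sum(is.na(airDate_parsed))

# Show the parsed dates in ISO format
format(airDate_parsed, "%Y-%m-%d")
```

With the real data, the same call on airDate would give a Date vector that could be added to the data frame as an extra column.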

To extract the titles, I will start by finding any characters between the “>” and the “<” of the HTML link tags, and then I can simply trim off those brackets.

title_pattern <- ">(.+?)<"
raw_title <- str_extract(raw_data, title_pattern)
title <- str_sub(raw_title, 2, -2)  # drop the leading ">" and trailing "<"
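Since the pattern already contains a capture group, another option is to let stringr return the group directly instead of trimming the brackets afterwards. The sketch below uses str_match on the first line from the head(raw_data) output; sample_line, title_match, and title_alt are names introduced here for illustration only.

```r
library(stringr)

# First line of the file, as shown in the head(raw_data) output above
sample_line <- "1.     1-1                 16 Sep 93   <a target=\"_blank\" href=\"http://www.tvmaze.com/episodes/49441/frasier-1x01-the-good-son\">The Good Son</a>"

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group (the text between ">" and "<")
title_match <- str_match(sample_line, ">(.+?)<")
title_alt <- title_match[, 2]
title_alt
```

Applied to the whole raw_data vector, the second column of the result should line up with the title vector built above.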

Data Frame

Now we can combine the extracted strings into a nice data frame.

Frasier <- data.frame(seasonAndEpisode, airDate, title)

Let us split that seasonAndEpisode variable.

library(tidyverse)
Frasier <- Frasier %>%
  separate(seasonAndEpisode, c("season", "episode"), "-")
head(Frasier)
##   season episode   airDate                 title
## 1      1       1 16 Sep 93          The Good Son
## 2      1       2 23 Sep 93           Space Quest
## 3      1       3 30 Sep 93       Dinner at Eight
## 4      1       4 07 Oct 93  I Hate Frasier Crane
## 5      1       5 14 Oct 93 Here's Looking at You
## 6      1       6 21 Oct 93          The Crucible
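One thing to keep in mind is that separate() leaves the new columns as character strings, so sorting by season would place "10" before "2". A possible variation, sketched here on a small stand-in data frame (frasier_sample is not part of the original code), passes convert = TRUE so that tidyr attempts type conversion on the new columns.

```r
library(tidyr)

# Small stand-in for the Frasier data frame before splitting
frasier_sample <- data.frame(
  seasonAndEpisode = c("1-1", "1-2", "1-3"),
  title = c("The Good Son", "Space Quest", "Dinner at Eight")
)

# convert = TRUE runs type conversion on the new columns,
# so "1" becomes the integer 1 rather than staying a string
frasier_split <- separate(frasier_sample, seasonAndEpisode,
                          into = c("season", "episode"), sep = "-",
                          convert = TRUE)

# Inspect the column types
str(frasier_split)
```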

Random Episode Picker

Now we can make some code that randomly selects an episode! Since all 11 seasons had exactly 24 episodes each, this part is easy to code: pick a season, pick an episode number, and filter for that pair.

seasonPicker <- sample(1:11, 1)
episodePicker <- sample(1:24, 1)
Frasier %>%
  filter(season == seasonPicker) %>%
  filter(episode == episodePicker)
##   season episode   airDate               title
## 1      4       1 17 Sep 96 The Two Mrs. Cranes
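The picker above depends on every season having the same number of episodes and on every season-episode pair appearing in the data. A sketch that avoids both assumptions is to sample a row index directly. The frasier_sample data frame below is a small stand-in built from the first rows shown earlier, and random_row is a name introduced here for illustration.

```r
# Small stand-in for the Frasier data frame built above
frasier_sample <- data.frame(
  season  = c(1, 1, 1),
  episode = c(1, 2, 3),
  title   = c("The Good Son", "Space Quest", "Dinner at Eight")
)

# Pick one row index uniformly at random from 1..nrow
random_row <- sample(seq_len(nrow(frasier_sample)), 1)

# Return the chosen episode as a one-row data frame
frasier_sample[random_row, ]
```

Because every row is equally likely, this keeps working if the seasons differ in length or if some lines of the file failed to parse.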

Source

I started with http://epguides.com/Frasier/, viewed the HTML source, and copied and pasted what I needed into a text document.