How might YouTube be important for research?

My main research interest is the BookTube (books + YouTube) audience: their cultural consumption patterns, how they perceive the role of BookTube in their literature consumption, and so on. So it was really exciting for me to learn how to collect the comments posted under BookTube videos (here I take several videos by one booktuber) and to be able to analyze them later. For my purposes there are mostly three useful variables: the text of each comment, its number of likes, and its number of replies. The YouTube API will also be useful for me later on, so I decided to stick with it.
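Everything in this post is R. Before the authentication step below, all the packages used throughout need to be loaded (a setup sketch; I assume they are already installed via install.packages()):

# Packages used throughout this post
library(vosonSML)            # Authenticate() and Collect() for YouTube
library(dplyr)               # select() and the %>% pipe
library(ggplot2)             # bar chart and scatter plot
library(table1)              # descriptive statistics table
library(beeswarm)            # beeswarm plots
library(quanteda)            # corpus(), tokens(), dfm()
library(quanteda.textplots)  # textplot_wordcloud()
library(lexicon)             # the hash_lemmas lemma lookup table
library(igraph)              # closeness() and network plotting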

# Authenticate against the YouTube Data API with the key obtained from Google Cloud
youtubeAuth <- Authenticate("youtube", apiKey = mykey)

Downloading data from chosen videos

# Video IDs; the "&t=1s" URL fragment is stripped from the last one so that
# only the bare ID is passed to the API
videos <- c("f5wbru5KUXY", "D1KIxFQGz-A", "Ia-PnWACAZk", "CfBUgMwBPuw", "_wAfHTpY0bQ")

yt_data <- youtubeAuth %>%
  Collect(videoIDs = videos)
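If API quota is a concern, Collect() for YouTube also accepts a maxComments argument that caps how many comments are collected; a sketch, assuming the argument behaves as described in the vosonSML documentation (the 500 is an arbitrary example value):

# Cap the number of comments collected so a run stays within the daily quota
yt_data_small <- youtubeAuth %>%
  Collect(videoIDs = videos, maxComments = 500)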

Data preparation

# Attach a human-readable title to each comment, matched on the video ID
yt_data$Video_name[yt_data$VideoID == "f5wbru5KUXY"] <- "pov: i tell you about the great books i read while cozying up on the couch"
yt_data$Video_name[yt_data$VideoID == "D1KIxFQGz-A"] <- "new faves, leigh bardugo & vampire smut ⛈️ books i recently read ft a thunderstorm"
yt_data$Video_name[yt_data$VideoID == "Ia-PnWACAZk"] <- "new favourite books, love triangles & gothic fantasy books I recently read"
yt_data$Video_name[yt_data$VideoID == "CfBUgMwBPuw"] <- "books i recently snuggled up with hyped dark academia, neil gaiman & witchy books"
yt_data$Video_name[yt_data$VideoID == "_wAfHTpY0bQ"] <- "books i recently read (bad romance⚔ unhinged drama & a good sci-fi)"
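A more compact way to do the same mapping (just a sketch of an alternative, not what I originally ran) is a named lookup vector:

# Hypothetical alternative: one named vector instead of five assignments
video_names <- c(
  "f5wbru5KUXY" = "pov: i tell you about the great books i read while cozying up on the couch",
  "D1KIxFQGz-A" = "new faves, leigh bardugo & vampire smut ⛈️ books i recently read ft a thunderstorm",
  "Ia-PnWACAZk" = "new favourite books, love triangles & gothic fantasy books I recently read",
  "CfBUgMwBPuw" = "books i recently snuggled up with hyped dark academia, neil gaiman & witchy books",
  "_wAfHTpY0bQ" = "books i recently read (bad romance⚔ unhinged drama & a good sci-fi)"
)
yt_data$Video_name <- unname(video_names[yt_data$VideoID])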

# The counts come back as character strings, so convert them to numeric
yt_data$ReplyCount <- as.numeric(yt_data$ReplyCount)
yt_data$LikeCount <- as.numeric(yt_data$LikeCount)

# Keep only the variables needed for the analysis
youtube_data <- select(yt_data, Comment, CommentID, AuthorDisplayName, ReplyCount, LikeCount, Video_name)

Visualization

1. Bar chart of reply counts

ggplot(youtube_data, aes(x = ReplyCount)) +
  geom_bar() +
  theme_minimal() +
  labs(x = "Replies",
       y = "Frequency",
       title = "Interactions in YouTube video comment section",
       subtitle = "Source: 'The Book Leo' channel")
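Almost all comments sit at zero replies, which is why the bars are so hard to read; a quick numeric check makes the skew explicit (a small sketch I added for context, not part of the original assignment code):

# Share of comments that received no replies at all
mean(youtube_data$ReplyCount == 0)
# Full frequency table of reply counts
table(youtube_data$ReplyCount)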

2. Descriptive statistics and beeswarm plots

Since the bar chart is not very informative, I decided to make beeswarm plots instead, but first to look at the descriptive statistics of these two variables:

table1(~LikeCount + ReplyCount, data = youtube_data)

                    Overall (N=1518)
LikeCount
  Mean (SD)         9.06 (50.0)
  Median [Min, Max] 0 [0, 751]
ReplyCount
  Mean (SD)         0.229 (2.50)
  Median [Min, Max] 0 [0, 92.0]

beeswarm(youtube_data$ReplyCount, horizontal = TRUE, pch = 16, col = "darkgrey", xlab = "Replies")

beeswarm(youtube_data$LikeCount, horizontal = TRUE, pch = 16, col = "darkgrey", xlab = "Likes")

3. Scatter plot of likes vs. replies

# Relationship between likes and replies, with a fitted linear trend
ggplot(youtube_data, aes(x = LikeCount, y = ReplyCount)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red", fill = "#69b3a2", se = TRUE)
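Both variables are heavily skewed (both medians are 0, see the table above), so a rank-based correlation is a more robust numeric summary of this relationship than the linear fit; a sketch:

# Spearman correlation is rank-based, hence robust to the extreme skew in both counts
cor.test(youtube_data$LikeCount, youtube_data$ReplyCount, method = "spearman")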

4. Word cloud of the comments

# Build a quanteda corpus from the comment texts
youtube_corpus <- corpus(youtube_data, text_field = "Comment", docid_field = "CommentID")

# Tokenize and clean: drop numbers, punctuation and URLs
youtube_tokens <- tokens(youtube_corpus, remove_numbers = TRUE, remove_punct = TRUE, remove_url = TRUE)
# Lemmatize with the hash_lemmas lookup table from the lexicon package
youtube_tokens_2 <- tokens_replace(youtube_tokens, pattern = hash_lemmas$token, replacement = hash_lemmas$lemma)
# Remove English stopwords
youtube_tokens_clean <- tokens_remove(youtube_tokens_2, pattern = stopwords("en"))

# Document-feature matrix, trimmed of the very rarest terms
youtube_dfm <- dfm(youtube_tokens_clean)
youtube_dfm_trim <- dfm_trim(youtube_dfm, min_docfreq = 0.0005, docfreq_type = "prop")

textplot_wordcloud(youtube_dfm_trim, color = "#015781", min_size = 1, max_size = 5, max_words = 100)
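Alongside the word cloud, a plain frequency list can be easier to read; quanteda's topfeatures() produces one in a single call (a small sketch):

# The 20 most frequent terms in the trimmed document-feature matrix
topfeatures(youtube_dfm_trim, n = 20)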

5. Commenter network

# Build the actor network: nodes are commenters, edges are comment/reply relations
actor_graph <- yt_data |> Create("actor") |> AddText(yt_data) |> Graph()

# Hide vertex labels and color vertices by (scaled) closeness centrality
plot(actor_graph, vertex.label = NA, vertex.size = 4,
     vertex.color = closeness(actor_graph) * 10000,
     edge.arrow.size = 0.4, layout = layout_with_fr)

P.S. The last plot is a network of commenters, mostly just a fancy picture: I was curious how it would look, and for serious use it would probably need some refinement.
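One possible refinement (a hypothetical sketch, not taken from the tutorials below) is to size each vertex by its degree, so the most active commenters stand out:

# Hypothetical upgrade: vertex size proportional to the number of connections
plot(actor_graph, vertex.label = NA,
     vertex.size = 2 + degree(actor_graph),
     edge.arrow.size = 0.4, layout = layout_with_fr)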

How did I figure out the YouTube API?

There are two main references, both of which I found through YouTube videos about the YouTube API:

  1. https://www.cspoerlein.com/files/textanalyse.html#downloading_data

  2. https://cran.r-project.org/web/packages/vosonSML/vignettes/Intro-to-vosonSML.html

The second one is mostly for network construction, while the first is VERY useful both for this homework and for my future research, because it contains a lot of information on text analysis.

To collect the data I needed to create a project in the Google Cloud Console, so I did that and named it 'booktube-trial'.

Then I enabled the YouTube Data API v3 in the Google API Library; this is what allows downloading comment data from any video on YouTube.

And lastly, on the third attempt (at first I hadn't enabled the YouTube Data API, so nothing worked), I generated an API key (stored in the mykey variable) that allowed me to collect comments from videos.
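For completeness, a sketch of how the key ends up in the mykey variable (the key string is a placeholder, and the environment variable name is my own suggestion):

# Simplest option: paste the key directly (fine for homework, risky in shared code)
mykey <- "PASTE-YOUR-API-KEY-HERE"
# Safer option: read it from an environment variable so it never lands in the script
mykey <- Sys.getenv("YOUTUBE_API_KEY")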

That’s it! Thank you for your attention :)