Video Selection

Video Link: https://www.youtube.com/watch?v=KCZJ6Ttsp-A

Video Title: How To Invest in ETFs | Ultimate Guide

For my video, I chose an industry topic that I started getting into this year: investing in the stock market, specifically in ETFs. I’ve used this video and many others to help decide which ETFs are worth investing in. I’m curious to see what this YouTube scraping exercise reveals about it.

Observations and Findings

For my code, I added both a word frequency chart and a word cloud, and they provide some insight into what viewers thought about the video. The most common words in the word frequency chart were video, market, investing, etf, portfolio, and stocks. These all relate closely to the topic of the video, suggesting that many of the commenters are engaging with its content. The word cloud shows many of the same words along with a wider vocabulary. It surfaces more positive words such as informative, amazing, and easy, reflecting the video’s excellent delivery of ETF guidance. Although these positive comments showcase the quality of the video, I did notice words that seem either too broad or too random to have anything to do with the video. I suspect that comments such as “Nice video”, “Amazing video”, or “informative video” push a generic word like video to the top of the frequency chart. There is also the possibility of bot accounts that post “fake” comments to inflate the comment count and skew the results. Judging from these visualizations alone, there do not seem to be many bots commenting, since most of the words relate to the general themes of the video such as investing, growth, and portfolios.

Learning Objectives

By the end of this tutorial, you will be able to:

1. Register a project in Google Cloud Console and enable the YouTube Data API v3.
2. Create OAuth 2.0 credentials and authorize R to access YouTube data on your behalf.
3. Use the tuber package to pull comments from any public YouTube video.
4. Clean the scraped data with dplyr and export it to a CSV file.
5. Recognize and fix the most common errors students run into during this process.

# Prerequisites

- R and RStudio installed
- A Google account (a personal Gmail account works fine)
- Packages: tuber, dplyr, readr
- A YouTube video URL you want to scrape comments from (we’ll use a real example below)

---

# Part 1 — Get API Access from Google Cloud Console

Before R can talk to YouTube, you need to tell Google that your project is allowed to request data. This happens in three steps inside the Google Cloud Console.

## Step 1.1 — Create a project and enable the YouTube Data API v3

1. Sign in to console.cloud.google.com and create a new project (or pick an existing one) using the project selector at the top of the page.
2. Go to APIs & Services → Library.
3. Search for YouTube Data API v3 and open it.
4. Click Enable.

Figure: Enabling the YouTube Data API v3 for your project (`images/01_enable_api.jpg`)

## Step 1.2 — Create an OAuth 2.0 Client ID

The tuber package authenticates as you, not as an anonymous script, so you need an OAuth client rather than a plain API key.

1. Go to APIs & Services → Credentials.
2. Click + CREATE CREDENTIALS → OAuth client ID.

Figure: Creating credentials: choose OAuth client ID (`images/02_create_oauth_client.jpg`)

3. For Application type, choose Web application and give it any name (e.g., “MSBA 580 YouTube Scraper”).
4. Under Authorized redirect URIs, click + ADD URI and enter exactly:

http://localhost:1410/

Why this exact URL? tuber authenticates through the httr package, which spins up a temporary local web server on port 1410 to catch Google’s response. If this URI doesn’t match exactly (including the trailing slash), authentication will fail.

5. Click Create. Google will show you a Client ID and Client Secret. Copy both somewhere safe; you’ll paste them into R in Part 2.

## Step 1.3 — Add yourself as a test user

New OAuth apps start in Testing mode, which means only approved accounts can authenticate. If you skip this step, you’ll hit a 403: access_denied error the first time you try to log in from R.

1. Go to APIs & Services → OAuth consent screen.
2. Scroll to Test users and click + ADD USERS.
3. Enter the Gmail address you’ll use to authenticate (your own account is fine) and click Save.

Figure: Adding yourself as a test user under the OAuth consent screen (`images/03_test_users.jpg`)

---
# Part 2 — Connect R to the API
## Step 2.1 — Load the package

``` r
library(tuber)
library(dplyr)
library(readr)
```

## Step 2.2 — Authenticate with yt_oauth()

Paste the Client ID and Client Secret you copied in Step 1.2 below.

> Never commit real credentials to GitHub or share them in a script you hand in. Treat them like a password: store them in a separate, untracked file (e.g., an .Renviron file) for real projects.

library(tuber)

app_id <- "YOUTUBE_APP_ID"
app_secret <- "YOUTUBE_APP_SECRET"

yt_oauth(app_id, app_secret)
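
The advice about keeping credentials out of the script can be sketched as follows. This is a minimal illustration, not part of tuber: the helper name `get_youtube_credentials()` and the environment variable names `YOUTUBE_APP_ID` and `YOUTUBE_APP_SECRET` are assumptions, chosen to match the placeholder strings above.

```r
# Hypothetical helper: read the OAuth client ID and secret from environment
# variables instead of hard-coding them in the script.
# The names YOUTUBE_APP_ID / YOUTUBE_APP_SECRET are assumptions; define them
# in your .Renviron file, one NAME=value pair per line.
get_youtube_credentials <- function() {
  app_id <- Sys.getenv("YOUTUBE_APP_ID", unset = "")
  app_secret <- Sys.getenv("YOUTUBE_APP_SECRET", unset = "")

  # Fail early with a readable message if either value is missing.
  if (!nzchar(app_id) || !nzchar(app_secret)) {
    stop("Set YOUTUBE_APP_ID and YOUTUBE_APP_SECRET in .Renviron, then restart R.")
  }

  list(app_id = app_id, app_secret = app_secret)
}

# Usage (not run here, because it opens a browser window):
# creds <- get_youtube_credentials()
# tuber::yt_oauth(creds$app_id, creds$app_secret)
```

With this pattern the script can be shared or handed in without exposing the secret, because the real values live only in the untracked .Renviron file.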

When you run this, R will ask:

Use a local file ('.httr-oauth'), to cache OAuth access credentials
between R sessions?
1: Yes
2: No

Choose 1: Yes so you don’t have to log in again every session.

## Step 2.3 — Approve access in your browser

A browser window will open automatically. Because the app is still in Testing mode, you’ll see a warning screen first:

Figure: This warning is expected for apps in Testing mode; click Continue (`images/04_unverified_app_warning.jpg`)

Click Continue, then Allow on the next screen that lists what the app can access. Your browser tab will then say “Authentication complete. Please close this page and return to R.”, and your R console will print Authentication complete.

---

# Part 3 — Scrape Comments from a Video Step by Step

## Step 3.1 — Get the video ID

The video ID is the part of the URL after v=. For example, the code below demonstrates how to scrape comments from a YouTube video about investing in ETFs step by step.

https://www.youtube.com/watch?v=KCZJ6Ttsp-A
                                └────┬────┘
                                  video_id
video_id <- "KCZJ6Ttsp-A" # "How To Invest in ETFs | Ultimate Guide"
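
Instead of copying the ID by hand, it can also be pulled out of the URL programmatically. The sketch below uses base R only; `extract_video_id()` is a hypothetical helper name, and it assumes a standard `youtube.com/watch?v=...` URL (shortened `youtu.be` links are not handled).

```r
# Hypothetical helper: extract the value of the v= query parameter
# from a standard youtube.com/watch URL using base R only.
extract_video_id <- function(url) {
  # Capture everything after "v=" up to the next "&" or "#" (or the end).
  match <- regmatches(url, regexec("[?&]v=([^&#]+)", url))[[1]]
  if (length(match) < 2) {
    stop("No v= parameter found in URL: ", url)
  }
  match[2]
}

video_id <- extract_video_id("https://www.youtube.com/watch?v=KCZJ6Ttsp-A")
print(video_id)
```

The helper stops with an error when no `v=` parameter is present, which is easier to debug than passing an empty ID to the API.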

## Step 3.2 — Pull every comment

comments_raw <- get_all_comments(video_id = video_id)
head(comments_raw)
##     authorDisplayName
## 1         @JoshuaMayo
## 2 @honeypotqueens9865
## 3          @Dee-rc2lt
## 4    @devonpruitt9456
## 5    @devonpruitt9456
## 6    @dahirukabiru672
##                                                                                                        authorProfileImageUrl
## 1 https://yt3.ggpht.com/xKDMsCpOCBE-trjD1xQzNaiR6_xAUKVzHv4OQ_XHrV6jbHR2G5_NHK5d3P_wkWWaSMsqiBgdfA=s48-c-k-c0x00ffffff-no-rj
## 2 https://yt3.ggpht.com/B5hFbAfoCnwej2iVnBh2IEaZZGNNm92i8uvj3LBOItFRPCocoWeo9NawBCvc-eNlvwj7XKtv1Q=s48-c-k-c0x00ffffff-no-rj
## 3                        https://yt3.ggpht.com/ytc/AIdro_n6Mc61AwG8RuqiyQtG_ureSrBRiRvFrgCLR1KjMr4=s48-c-k-c0x00ffffff-no-rj
## 4                        https://yt3.ggpht.com/ytc/AIdro_m_h2e2XEszk7BVOmbnFWUv1lRV3qfyKH0bTdMDr3I=s48-c-k-c0x00ffffff-no-rj
## 5                        https://yt3.ggpht.com/ytc/AIdro_m_h2e2XEszk7BVOmbnFWUv1lRV3qfyKH0bTdMDr3I=s48-c-k-c0x00ffffff-no-rj
## 6 https://yt3.ggpht.com/F6CEnnNFXADq27uyZ0hgiZg-tuKYnamxJ_fi6SsrwjQ48BsGeaS4UP8bRsgUqkuAzpWmI1TTIw=s48-c-k-c0x00ffffff-no-rj
##                             authorChannelUrl    authorChannelId.value
## 1         http://www.youtube.com/@JoshuaMayo UCJZ7zr9a6AT6STkyOFztgiQ
## 2 http://www.youtube.com/@honeypotqueens9865 UCt60aQ7XDsuIS8D4YSEfpMg
## 3          http://www.youtube.com/@Dee-rc2lt UC6VB3xX_6tkDwx_zMYxWy_Q
## 4    http://www.youtube.com/@devonpruitt9456 UCQ7kiaSi7o3HPQKDYn7N4Lg
## 5    http://www.youtube.com/@devonpruitt9456 UCQ7kiaSi7o3HPQKDYn7N4Lg
## 6    http://www.youtube.com/@dahirukabiru672 UCG5T5Fb4lEflBLbjJJyt_gg
##       videoId
## 1 KCZJ6Ttsp-A
## 2 KCZJ6Ttsp-A
## 3 KCZJ6Ttsp-A
## 4 KCZJ6Ttsp-A
## 5 KCZJ6Ttsp-A
## 6 KCZJ6Ttsp-A
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        textDisplay
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                   A monster of an ETF guide! Let me know if there are other videos you&#39;d guys like to see. 👍
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Which CDs 💿 would you recommend with the highest dividends and compound interest
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Proper portfolio allocation
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              @Dee-rc2lt this!!!!
## 5 Brother , I am proud of you. I am a new financial advisor in this field . You probably have your thoughts about my profession lol. However , watching your videos help me refresh my knowledge and mirror the way you convey financial principles in a clear and concise manner. I am inspired to create my own platform . Rarely do I subscribe to specific people , but you caught my interest . Blessings upon you. Please continue to share!! <br><br>Request : What are your thoughts about portfolio allocation across multiple accounts (ex. 401K + IRA).
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Hey.... You can get connected to Mrs Anna with this number here 👆she is always online
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   textOriginal
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                   A monster of an ETF guide! Let me know if there are other videos you'd guys like to see. 👍
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Which CDs 💿 would you recommend with the highest dividends and compound interest
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Proper portfolio allocation
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          @Dee-rc2lt this!!!!
## 5 Brother , I am proud of you. I am a new financial advisor in this field . You probably have your thoughts about my profession lol. However , watching your videos help me refresh my knowledge and mirror the way you convey financial principles in a clear and concise manner. I am inspired to create my own platform . Rarely do I subscribe to specific people , but you caught my interest . Blessings upon you. Please continue to share!! \n\nRequest : What are your thoughts about portfolio allocation across multiple accounts (ex. 401K + IRA).
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Hey.... You can get connected to Mrs Anna with this number here 👆she is always online
##   canRate viewerRating likeCount          publishedAt            updatedAt
## 1    TRUE         none       460 2022-02-05T00:09:39Z 2022-02-05T00:09:39Z
## 2    TRUE         none        10 2022-02-05T03:10:04Z 2022-02-05T03:10:04Z
## 3    TRUE         none         9 2022-02-05T14:41:11Z 2022-02-05T14:41:11Z
## 4    TRUE         none         1 2022-02-06T10:53:06Z 2022-02-06T10:53:06Z
## 5    TRUE         none         8 2022-02-06T10:54:07Z 2022-02-06T10:54:07Z
## 6    TRUE         none         0 2022-02-07T01:59:33Z 2022-02-07T01:59:33Z
##                                                  id moderationStatus
## 1                        Ugw8K1L11NUIgrm8Qop4AaABAg             <NA>
## 2 Ugw8K1L11NUIgrm8Qop4AaABAg.9Y2IzT9_3519Y2cctUVmwA             <NA>
## 3 Ugw8K1L11NUIgrm8Qop4AaABAg.9Y2IzT9_3519Y3rijJrazd             <NA>
## 4 Ugw8K1L11NUIgrm8Qop4AaABAg.9Y2IzT9_3519Y61Q6M2Lwa             <NA>
## 5 Ugw8K1L11NUIgrm8Qop4AaABAg.9Y2IzT9_3519Y61XYIgVk-             <NA>
## 6 Ugw8K1L11NUIgrm8Qop4AaABAg.9Y2IzT9_3519Y7e973jW4o             <NA>
##                     parentId
## 1                       <NA>
## 2 Ugw8K1L11NUIgrm8Qop4AaABAg
## 3 Ugw8K1L11NUIgrm8Qop4AaABAg
## 4 Ugw8K1L11NUIgrm8Qop4AaABAg
## 5 Ugw8K1L11NUIgrm8Qop4AaABAg
## 6 Ugw8K1L11NUIgrm8Qop4AaABAg
## 
## --- Tuber Metadata ---
## function: get_all_comments  api_calls: 47  results_found: 1654  timestamp: 2026-06-29 17:22:38  
## (Use tuber_info() for full metadata)
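
In the output above, `parentId` is `<NA>` for the first row and filled in for the rows that reply to it. Assuming that pattern holds for the whole result, top-level comments and replies can be told apart as in this minimal sketch. The toy data frame and its values are made up for illustration; only the `id` and `parentId` column names come from the real output.

```r
# Toy data frame that mirrors the id / parentId columns shown above.
# The values are made up; the real comments_raw has the same two columns.
toy_comments <- data.frame(
  id = c("A", "A.1", "A.2", "B"),
  parentId = c(NA, "A", "A", NA),
  stringsAsFactors = FALSE
)

# A row is a reply when parentId is filled in, and a top-level comment otherwise.
toy_comments$is_reply <- !is.na(toy_comments$parentId)

# Count top-level comments versus replies.
comment_counts <- table(ifelse(toy_comments$is_reply, "reply", "top_level"))
print(comment_counts)
```

The same two lines should work on `comments_raw` by swapping in its name for `toy_comments`.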

Figure: Scraping YouTube comments (`images/05_scraping.png`)

This returns a data frame with one row per top-level comment and reply. Depending on the video’s popularity, this can take anywhere from a few seconds to a few minutes.

## Step 3.3 — Lifesaver trick: recovering output you forgot to save

It happens to everyone: you run get_all_comments(video_id = "...") directly in the console without assigning it to anything, and the scrape (which can take a while) finishes, but the result wasn’t saved anywhere. As long as you haven’t run anything else in the console since, R keeps the most recent top-level result in .Last.value. Because your API quota may be limited and you do not want to rerun the same scrape again and again, recover the result like this (also see the screenshot below):

comments1 <- .Last.value

Figure: Scraping YouTube comments (`images/06_saving_scraping_results_last.value.jpg. png`)

Figure: Scraping YouTube comments (`images/07_saving_scraping_results.png`)

This is much faster than re-scraping the video from scratch.

---

# Part 4 — Clean the Data with dplyr and the pipe operator (%>% or |>)

## Step 4.1 — Always check the real column names first

Don’t guess at column names; tuber’s output doesn’t always match what you’d expect. For example, the unique comment identifier is stored in a column called id, not comment_id. Run this first:

head(comments1)
## $help_type
## NULL
glimpse(comments1)
## List of 1
##  $ help_type: NULL
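
The output above shows a list with a single `help_type` element rather than a comments data frame, which suggests that `.Last.value` had already been overwritten by another call before it was saved. A small check like the one sketched below can catch this before the cleaning step. `check_comments()` is a hypothetical helper, and the default column names are the ones the cleaning pipeline in this tutorial selects.

```r
# Hypothetical helper: confirm that an object is a data frame containing
# the columns the cleaning pipeline expects, and fail with a clear message if not.
check_comments <- function(x, required = c("authorDisplayName", "textOriginal",
                                           "publishedAt", "likeCount")) {
  if (!is.data.frame(x)) {
    stop("Expected a data frame but got an object of class: ",
         paste(class(x), collapse = "/"))
  }
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# A list like the one printed above fails the check.
not_comments <- list(help_type = NULL)
result <- try(check_comments(not_comments), silent = TRUE)
print(inherits(result, "try-error"))
```

If the check fails on a recovered object, fall back to the assigned result (`comments_raw` in this tutorial) rather than scraping again.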

## Step 4.2 — Tidy it up

Once you know the real column names, pass the data frame through a dplyr pipeline that keeps only the columns you need, drops rows with missing or duplicated comment text, renames textOriginal to text, and adds a comment_id row number:

comments_df <- comments_raw

names(comments_df)
##  [1] "authorDisplayName"     "authorProfileImageUrl" "authorChannelUrl"     
##  [4] "authorChannelId.value" "videoId"               "textDisplay"          
##  [7] "textOriginal"          "canRate"               "viewerRating"         
## [10] "likeCount"             "publishedAt"           "updatedAt"            
## [13] "id"                    "moderationStatus"      "parentId"
comments_df <- comments_df %>%
  select(authorDisplayName, textOriginal, publishedAt, likeCount) %>%
  filter(!is.na(textOriginal)) %>%
  distinct(textOriginal, .keep_all = TRUE) %>%
  rename(text = textOriginal) %>%
  mutate(comment_id = row_number())

head(comments_df)
##     authorDisplayName
## 1         @JoshuaMayo
## 2 @honeypotqueens9865
## 3          @Dee-rc2lt
## 4    @devonpruitt9456
## 5    @devonpruitt9456
## 6    @dahirukabiru672
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           text
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                   A monster of an ETF guide! Let me know if there are other videos you'd guys like to see. 👍
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Which CDs 💿 would you recommend with the highest dividends and compound interest
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Proper portfolio allocation
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          @Dee-rc2lt this!!!!
## 5 Brother , I am proud of you. I am a new financial advisor in this field . You probably have your thoughts about my profession lol. However , watching your videos help me refresh my knowledge and mirror the way you convey financial principles in a clear and concise manner. I am inspired to create my own platform . Rarely do I subscribe to specific people , but you caught my interest . Blessings upon you. Please continue to share!! \n\nRequest : What are your thoughts about portfolio allocation across multiple accounts (ex. 401K + IRA).
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Hey.... You can get connected to Mrs Anna with this number here 👆she is always online
##            publishedAt likeCount comment_id
## 1 2022-02-05T00:09:39Z       460          1
## 2 2022-02-05T03:10:04Z        10          2
## 3 2022-02-05T14:41:11Z         9          3
## 4 2022-02-06T10:53:06Z         1          4
## 5 2022-02-06T10:54:07Z         8          5
## 6 2022-02-07T01:59:33Z         0          6
## 
## --- Tuber Metadata ---
## function: get_all_comments  api_calls: 47  results_found: 1654  timestamp: 2026-06-29 17:22:38  
## (Use tuber_info() for full metadata)
library(tidytext)

comments_words <- comments_df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

head(comments_words, 20)
##          word   n
## 1       video 238
## 2      market 184
## 3   investing 178
## 4        etfs 164
## 5         etf 160
## 6   portfolio 142
## 7   financial 123
## 8       money 123
## 9      stocks 123
## 10 investment 117
## 11    advisor 100
## 12       time  96
## 13     invest  93
## 14        i’m  77
## 15      stock  75
## 16     videos  72
## 17        lot  67
## 18        buy  65
## 19    trading  61
## 20       term  60
## 
## --- Tuber Metadata ---
## function: get_all_comments  api_calls: 47  results_found: 1654  timestamp: 2026-06-29 17:22:38  
## (Use tuber_info() for full metadata)
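
The frequency table above still lists `i’m`, written with a typographic apostrophe. A likely reason is that the stop word list spells such contractions with a straight apostrophe, so the `anti_join()` does not match them. One possible fix, sketched here with base R, is to normalize apostrophes before tokenizing. `normalize_apostrophes()` is a hypothetical helper name.

```r
# Hypothetical helper: replace typographic apostrophes (U+2019) with the
# straight apostrophe, so contractions match a straight-apostrophe stop word list.
normalize_apostrophes <- function(text) {
  gsub("\u2019", "'", text, fixed = TRUE)
}

# Small made-up examples: one with a typographic apostrophe, one without.
example_comments <- c("I\u2019m new to ETFs", "you'd guys like to see")
print(normalize_apostrophes(example_comments))
```

In the pipeline above, this would go in a `mutate(text = normalize_apostrophes(text))` step placed before `unnest_tokens()`.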
library(wordcloud)
library(RColorBrewer)

set.seed(123)

wordcloud(
  words = comments_words$word,
  freq = comments_words$n,
  max.words = 100,
  random.order = FALSE,
  colors = brewer.pal(8, "Dark2")
)

library(ggplot2)

comments_words %>%
  slice_max(n, n = 15) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 15 Most Common Words",
    x = "Word",
    y = "Frequency"
  )

Notice we got an error here. What do you think we should do next?

Figure: Scraping YouTube comments (`images/08_error_handling_comment_id_not_found.png`)

## Let’s have a discussion

| Function | What it does |
|---|---|
| as_tibble() | Converts the result into a tibble so dplyr verbs behave predictably |
| distinct(id, .keep_all = TRUE) | Removes duplicate rows if the API returns overlapping replies, keeping all other columns |
| select(...) | Keeps only the columns you actually need for analysis |

---

# Part 5 — Save Your Data

write_csv(comments_df, "comments1.csv")
getwd()
## [1] "C:/Users/Darre/OneDrive - CSUCI/Documents"

Once saved, you can reload it anytime without re-scraping:

comments_clean <- read_csv("comments1.csv")
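
The `publishedAt` values shown earlier are ISO 8601 strings in UTC. For time-based analysis they can be converted to date-times, as in this base R sketch. It assumes the column is still stored as character, as in `comments_df`; `read_csv()` may already guess it as a date-time when reloading, in which case the parsing step is unnecessary.

```r
# Timestamps copied from the publishedAt column shown earlier.
published_at <- c("2022-02-05T00:09:39Z", "2022-02-07T01:59:33Z")

# Parse the ISO 8601 strings as UTC date-times.
published_time <- as.POSIXct(published_at,
                             format = "%Y-%m-%dT%H:%M:%SZ",
                             tz = "UTC")

# Derive a calendar date, for example to count comments per day.
published_date <- as.Date(published_time, tz = "UTC")
print(published_date)
```

Setting `tz = "UTC"` explicitly keeps the dates from shifting with the local time zone of the machine running the script.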
Figure: The final scraped and cleaned comments, opened in Excel (`images/05_final_csv_output.jpg`)

---

# Troubleshooting Field Guide

| What you see | Likely cause | Fix |
|---|---|---|
| Error 403: access_denied, “…has not completed the Google verification process” | Your account isn’t on the test-user list yet | OAuth consent screen → Test users → + ADD USERS → enter your Gmail address (Step 1.3) |
| “Google hasn’t verified this app” | Normal: your app is in Testing publishing status | Click Continue (only do this for apps you created yourself) |
| Error in distinct(): Must use existing variables. x comment_id not found | The real column is named id, not comment_id | Run glimpse(comments_raw) to confirm actual column names before selecting |
| Browser never redirects back to R / hangs at Waiting for authentication in browser... | The Authorized redirect URI doesn’t match | Confirm it’s exactly http://localhost:1410/ in your OAuth client settings (Step 1.2) |
| Lost your scraped data after forgetting to assign it | Result wasn’t saved to a variable | Recover with comments_raw <- .Last.value, but only if nothing else ran in the console since (Step 3.3) |

---

# Summary

In this tutorial you:

- Enabled the YouTube Data API v3 and created OAuth 2.0 credentials in Google Cloud Console
- Authenticated R against your Google account using tuber::yt_oauth()
- Scraped every comment from a YouTube video with get_all_comments()
- Cleaned the result with a dplyr pipeline and exported it to CSV
- Learned how to recognize and fix the most common errors in this workflow

Next steps: try running vader or tidytext sentiment analysis on comments_clean$text (the cleaned column that was renamed from textOriginal), or compare comment sentiment and engagement (likeCount) across two competing brands’ videos.

# References

- Sysoev, J. (tuber package documentation). tuber: Access to YouTube via the API
- Google Developers. YouTube Data API v3 Reference
- Wickham, H., et al. dplyr: A Grammar of Data Manipulation

Jimmy Zhenning Xu, Ph.D. | github.com/utjimmyx