library(vosonSML)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(igraph)
##
## Attaching package: 'igraph'
##
## The following objects are masked from 'package:lubridate':
##
## %--%, union
##
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
##
## The following objects are masked from 'package:purrr':
##
## compose, simplify
##
## The following object is masked from 'package:tidyr':
##
## crossing
##
## The following object is masked from 'package:tibble':
##
## as_data_frame
##
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
##
## The following object is masked from 'package:base':
##
## union
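Through the Google API Console, I generated an API key and passed it to the program. To avoid publishing the key, it is read from an environment variable instead of being written into the script:
myAPIKey <- Sys.getenv("YOUTUBE_API_KEY") # API key read from an environment variable, not hard-coded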
youtubeAuth <- Authenticate("youtube", apiKey = myAPIKey)
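Sys.getenv() returns an empty string when a variable is not set. If that happens, the failure only shows up later, when the API rejects the request. Below is a minimal sketch of a helper that stops early instead. It assumes the key was stored beforehand, for example in ~/.Renviron, under the name YOUTUBE_API_KEY. Both the variable name and the helper get_api_key() are hypothetical and are not part of vosonSML.

```r
# Hypothetical helper: read the YouTube Data API key from an environment
# variable (YOUTUBE_API_KEY is an assumed name) and stop early if it is unset
get_api_key <- function(var = "YOUTUBE_API_KEY") {
  key <- Sys.getenv(var)          # returns "" when the variable is not set
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set")
  }
  key
}

# Usage (not run here):
# youtubeAuth <- Authenticate("youtube", apiKey = get_api_key())
```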
For this homework, I chose a YouTube video that gives a brief overview of the 2022 FIFA World Cup match between the national football teams of Argentina and France. I copied the video's URL and passed it to the following command:
videoIDs <- "https://www.youtube.com/watch?v=zhEWqfP6V_w"
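Next, I use the Collect() function to download the comments with maxComments = 100, and I print the resulting table to check that the data were collected correctly. According to the vosonSML documentation, this limit applies to top-level comments and replies are collected in addition, which would explain why the table below has 130 rows.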
youtubeData <- youtubeAuth %>%
Collect(videoIDs = videoIDs,
maxComments = 100,
writeToFile = TRUE)
youtubeData
## # A tibble: 130 × 12
## Comment AuthorDisplayName AuthorProfileImageUrl AuthorChannelUrl
## <chr> <chr> <chr> <chr>
## 1 😢😢😢😢 @74-hfc https://yt3.ggpht.co… http://www.yout…
## 2 Without a doubt the… @zeinz https://yt3.ggpht.co… http://www.yout…
## 3 Cuando el debate ac… @Lalo_18 https://yt3.ggpht.co… http://www.yout…
## 4 No one should disre… @hasanatbaher6332 https://yt3.ggpht.co… http://www.yout…
## 5 October 19 2024 @ArfatinNurrahmah https://yt3.ggpht.co… http://www.yout…
## 6 the best of all tim… @victorfarcas8905 https://yt3.ggpht.co… http://www.yout…
## 7 هرجع للتعليق ده في … @Shakir_Mo. https://yt3.ggpht.co… http://www.yout…
## 8 الله على الذكريات @Shakir_Mo. https://yt3.ggpht.co… http://www.yout…
## 9 Messi Fans click th… @kelvinakuneto15… https://yt3.ggpht.co… http://www.yout…
## 10 একটি ম্যাচ একটি পেনা… @mdseezan4451 https://yt3.ggpht.co… http://www.yout…
## # ℹ 120 more rows
## # ℹ 8 more variables: AuthorChannelID <chr>, ReplyCount <chr>, LikeCount <chr>,
## # PublishedAt <chr>, UpdatedAt <chr>, CommentID <chr>, ParentID <chr>,
## # VideoID <chr>
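The printed table stores LikeCount and ReplyCount as character strings (<chr>). They should be converted to numbers before sorting. Sorting the raw strings in decreasing order would, for example, put "7" ahead of "40". The ParentID column can also be used to separate top-level comments from replies. The self-contained sketch below works on a hypothetical four-row table with the same column names. It assumes that ParentID is NA for top-level comments, and this should be checked against the real youtubeData.

```r
# Hypothetical toy data standing in for youtubeData (same column names,
# counts stored as character strings, as in the real table)
toy <- data.frame(
  CommentID  = c("c1", "c2", "c3", "c4"),
  Comment    = c("Best match ever", "Messi!", "Agreed", "Mbappe hat-trick"),
  LikeCount  = c("12", "40", "0", "7"),
  ReplyCount = c("1", "0", "0", "0"),
  ParentID   = c(NA, NA, "c1", NA),   # assumption: NA marks a top-level comment
  stringsAsFactors = FALSE
)

# Convert the count columns to numbers before sorting or summing them
toy$LikeCount  <- as.numeric(toy$LikeCount)
toy$ReplyCount <- as.numeric(toy$ReplyCount)

# Most-liked comments first
top <- toy[order(toy$LikeCount, decreasing = TRUE), c("Comment", "LikeCount")]
print(head(top, 3))

# Split the collected rows into top-level comments and replies
n_top     <- sum(is.na(toy$ParentID))
n_replies <- sum(!is.na(toy$ParentID))
cat("top-level:", n_top, "replies:", n_replies, "\n")
```

With the real data, replacing toy with youtubeData shows how the 130 rows divide between top-level comments and replies.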
Next, I save this table to a CSV file using the following command:
write.csv(youtubeData,
"youtubeDataLab3.csv",
row.names = TRUE)
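With row.names = TRUE, write.csv() also writes the row numbers as an extra, unnamed first column. When the file is read back with read.csv(), that column gets the name "X". The self-contained sketch below shows the difference on a hypothetical two-row table written to a temporary file.

```r
# Hypothetical two-row table standing in for youtubeData
toy <- data.frame(Comment   = c("Messi!", "Allez les Bleus"),
                  LikeCount = c("3", "1"))
path <- tempfile(fileext = ".csv")

# row.names = TRUE (as in the command above) adds an unnamed index column,
# which read.csv() renames to "X"
write.csv(toy, path, row.names = TRUE)
with_rn <- names(read.csv(path))

# row.names = FALSE keeps only the original columns
write.csv(toy, path, row.names = FALSE)
without_rn <- names(read.csv(path))

print(with_rn)
print(without_rn)
```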
Next, I create an activity network from the collected data. I also color the comments that mention France or its leader Mbappe in blue, and the comments that mention Argentina or its leader Messi in yellow:
activityNetwork <- youtubeData %>% Create("activity") %>% AddText(youtubeData)
activityGraph <- activityNetwork %>% Graph(writeToFile = TRUE)
V(activityGraph)$color <- "grey"
V(activityGraph)$color[which(V(activityGraph)$node_type=="video")] <- "red"
indFr <- grep("france|mbappe",tolower(V(activityGraph)$vosonTxt_comment))
V(activityGraph)$color[indFr] <- "blue"
indArg <- grep("messi|argentina",tolower(V(activityGraph)$vosonTxt_comment))
V(activityGraph)$color[indArg] <- "yellow"
plot(activityGraph,
     vertex.label = "", # hide vertex labels
     vertex.size = 4,
     edge.arrow.size = 0.5)
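The coloring step uses two successive grep() calls, so the order of the calls matters. Each pattern matches either word, not both. A comment that matches both patterns keeps the color assigned last, which is yellow. The self-contained sketch below repeats the same logic on a few hypothetical comment strings. In the sketch, the NA stands in for the video node, which has no comment text.

```r
# Hypothetical comment texts standing in for V(activityGraph)$vosonTxt_comment;
# the first element plays the role of the video node, which has no text
txt <- c(NA, "Messi is the GOAT", "Allez la France!", "Mbappe vs Messi", "What a match")

col <- rep("grey", length(txt))
col[1] <- "red"

# Same two-step pattern as above: blue first, then yellow overwrites matches
col[grep("france|mbappe", tolower(txt))]   <- "blue"
col[grep("messi|argentina", tolower(txt))] <- "yellow"

# "Mbappe vs Messi" matches both patterns and ends up yellow
print(col)
table(col)   # number of nodes per color
```

On the real graph, table(V(activityGraph)$color) gives the number of nodes in each color. The pattern "mbappe" also does not match the accented spelling "Mbappé". A character class such as "mbapp[eé]" would probably be needed to catch both spellings.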
In the activity network plot, each node is a separate comment, except for the red node, which is the video itself. Blue nodes are comments that mention “France” or “Mbappe”, and yellow nodes are comments that mention “Argentina” or “Messi”. A comment that matches both patterns is colored yellow, because that assignment runs last. Most comments reply directly to the video, but there are also replies to top-level comments, and two comments received more than three replies each. To see how many nodes and edges the network has, simply print
activityGraph:
activityGraph
## IGRAPH fa29f3d DN-- 131 130 --
## + attr: type (g/c), name (v/c), video_id (v/c), published_at (v/c),
## | updated_at (v/c), author_id (v/c), screen_name (v/c), node_type
## | (v/c), vosonTxt_comment (v/c), color (v/c), edge_type (e/c)
## + edges from fa29f3d (vertex names):
## [1] Ugxp_F6k9GLojS2XTqN4AaABAg->VIDEOID:zhEWqfP6V_w
## [2] UgxVUM3PDX_pB_NYmtd4AaABAg->VIDEOID:zhEWqfP6V_w
## [3] Ugx1zi5UDK8gItvh9Hl4AaABAg->VIDEOID:zhEWqfP6V_w
## [4] UgzghtvM4v7xrUncrZR4AaABAg->VIDEOID:zhEWqfP6V_w
## [5] Ugwnho--Lhg4CgpKVS54AaABAg->VIDEOID:zhEWqfP6V_w
## [6] Ugx0AtInkkXjJdxioyR4AaABAg->VIDEOID:zhEWqfP6V_w
## + ... omitted several edges
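In total, the network has 131 nodes and 130 edges (the "DN-- 131 130" header line). The nodes are the 130 collected comments plus one video node. Each comment contributes one edge, pointing to the video or to the comment it replies to.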
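The same figures can be computed directly with igraph: vcount() and ecount() give the totals. Checking which edges end at the video node separates top-level comments from replies, and in-degree gives the number of replies each comment received. The self-contained sketch below runs on a hypothetical miniature network with the same structure, in which edges point from a comment to what it replies to. With the real data, g would be replaced by activityGraph, which already has a node_type attribute.

```r
library(igraph)

# Hypothetical miniature activity network: edges point from a comment to the
# video or comment it replies to
edges <- data.frame(
  from = c("c1", "c2", "c3", "r1", "r2", "r3", "r4"),
  to   = c("VIDEO", "VIDEO", "VIDEO", "c1", "c1", "c1", "c1")
)
g <- graph_from_data_frame(edges, directed = TRUE)
V(g)$node_type <- ifelse(V(g)$name == "VIDEO", "video", "comment")

cat("nodes:", vcount(g), "edges:", ecount(g), "\n")

# Edges whose head is the video are top-level comments; the rest are replies
to_video <- head_of(g, E(g))$node_type == "video"
cat("top-level:", sum(to_video), "replies:", sum(!to_video), "\n")

# In-degree of a comment node = number of direct replies it received
replies <- degree(g, mode = "in")[V(g)$node_type == "comment"]
busy <- names(replies)[replies > 3]   # comments with more than three replies
print(busy)
```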
Next, we create an actor network, in which each node is a user and the video is represented by a separate node:
actorNetwork <- youtubeData %>% Create("actor") %>% AddText(youtubeData)
actorGraph <- actorNetwork %>% Graph(writeToFile = TRUE)
V(actorGraph)$color <- ifelse(V(actorGraph)$node_type == "video", # color the video node red
                              "red",
                              "grey")
#plotting YouTube actor network (red node is video)
plot(actorGraph,
vertex.size=4,
vertex.label="",
edge.arrow.size=0.5)
This is the actor network. It shows how users interacted with the video and with each other. Most actors commented directly on the video, but interactions between users are also noticeable, and these are usually one-on-one exchanges. We can also see closed loops, as well as several arrows leaving the same node. This indicates that some users commented or replied several times.
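The loops and repeated arrows can be counted rather than read off the plot. In igraph, which_loop() flags self-loops, which_multiple() flags repeated edges between the same ordered pair of nodes, and out-degree counts how many edges each user sends. The self-contained sketch below uses a hypothetical miniature actor network. It assumes, as the plot suggests, that each edge is one comment from its author to the author being replied to, or to the video node. With the real data, a would be replaced by actorGraph.

```r
library(igraph)

# Hypothetical miniature actor network (assumption: one edge per comment,
# from its author to the author being replied to, or to the video node)
edges <- data.frame(
  from = c("alice", "bob", "carol", "bob", "bob", "alice"),
  to   = c("VIDEO", "VIDEO", "alice", "alice", "alice", "alice")
)
a <- graph_from_data_frame(edges, directed = TRUE)

# Self-loops: a user replying to their own comment
n_loops <- sum(which_loop(a))

# Repeated edges between the same ordered pair of users
# (which_multiple() does not flag the first occurrence)
n_multi <- sum(which_multiple(a))

# Users ranked by the number of comments and replies they posted
out_deg <- sort(degree(a, mode = "out"), decreasing = TRUE)
print(out_deg)
```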
Next, we analyze the sentiment of the comments. To do this, load the syuzhet package and run the following commands:
library(syuzhet)
comments <- iconv(youtubeData$Comment, to = 'UTF-8') # convert comment text to UTF-8 before scoring
comments %>% str
## chr [1:130] "😢😢😢😢" "Without a doubt the best football match ever." ...
# Obtain NRC emotion and sentiment scores for each comment
s <- get_nrc_sentiment(comments)
# Flag comments with no positive or negative words as neutral
s$neutral <- ifelse(s$negative + s$positive == 0, 1, 0)
barplot(100 * colSums(s) / sum(s),
        las = 2,
        col = rainbow(ncol(s)),
        ylab = "Percentage",
        main = "Sentiment of comments on the 2022 World Cup video")
The resulting chart shows that the neutral category has the largest share, and that there are more negative than positive scores. These results surprised me, because I had expected more positive comments. It is also worth considering that some comments are written in languages other than English. By default, get_nrc_sentiment() uses the English NRC lexicon, so such comments, like comments consisting only of emoji, receive zero scores and are counted as neutral. This inflates the neutral category.
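The bar chart plots 100*colSums(s)/sum(s). That is each column's share of all scored counts, which is not the same as the share of comments in each class. Non-English text can also be estimated with a rough proxy: strings that iconv() cannot convert to ASCII. This proxy also flags emoji and accented Latin letters. The self-contained sketch below uses hypothetical scores with only two of the eight emotion columns, plus a few hypothetical comments written with \u escapes. It computes both percentages and the non-ASCII flag.

```r
# Hypothetical NRC-style scores for four comments (real output from
# get_nrc_sentiment() has eight emotion columns plus negative/positive)
toy_s <- data.frame(
  joy      = c(2, 0, 0, 0),
  anger    = c(0, 1, 0, 0),
  negative = c(0, 2, 0, 0),
  positive = c(3, 0, 0, 0)
)
toy_s$neutral <- ifelse(toy_s$negative + toy_s$positive == 0, 1, 0)

# Share of all counts, which is what the bar chart above plots
count_share <- round(100 * colSums(toy_s) / sum(toy_s), 1)

# Share of comments per class, one label per comment
cls <- ifelse(toy_s$neutral == 1, "neutral",
       ifelse(toy_s$positive > toy_s$negative, "positive",
       ifelse(toy_s$negative > toy_s$positive, "negative", "mixed")))
comment_share <- round(100 * prop.table(table(cls)), 1)

print(count_share)
print(comment_share)

# Rough proxy for non-English text: iconv() returns NA for strings that
# cannot be converted to ASCII (this also catches emoji and accented letters)
toy_comments <- c("\U0001F622\U0001F622", "Best match ever",
                  "\u0627\u0644\u0644\u0647", "Messi!")
non_ascii <- is.na(iconv(toy_comments, from = "UTF-8", to = "ASCII"))
print(non_ascii)
```

In this toy case the neutral flag is only 20% of all counts but covers 50% of the comments. This is why the bar chart and a per-comment breakdown can give quite different impressions.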