INTRODUCTION
In this project, “Young Love: Then and Now”, I compared the lyrics of young female singer-songwriters from the 2000s with those of the current era. That is, I compared singer-songwriters who were teenagers or in their early twenties between the late 2000s and early 2010s with artists who are teenagers or in their early twenties in the 2020s. Many new social issues have arisen in the past 10 years, especially for girls, with the worldwide emergence of feminism. The artists I examine in this project are demographically similar: women in their twenties from English-speaking countries, with a considerable level of income, since they were or are considered successful in the global music industry. The difference is the period they live in. Young people are sensitive to their environment, and singer-songwriters tend to be honest and genuine in their work. I therefore expected their lyrics to show how these artists perceive the world around them.
RESEARCH QUESTIONS
I expected the two groups (young artists of then and of now) to have clear similarities, but I wanted to identify any noticeable differences between them.
This leads to my research questions:
For these questions, some of my expectations were:
- Lyrics of current artists would be more cynical, skeptical, and realistic than those from the late 2000s.
- The speakers in current lyrics would be more independent and confident female figures than the speakers of the 2000s.
- Recent lyrics would reflect the shift in technology, for example through keywords related to digital technology.
For the analysis, I visualized the most frequent words in the lyrics, the sentiments, the most frequent words in the tf-idf tables, and a bigram network graph for each group. More detailed explanations of each figure and its results are provided below with the code.
CONCLUSION
In summary (detailed explanations follow below), I did not find significant numerical differences in content or sentiment between the two groups of artists. However, I found a number of noteworthy contextual points. As the word frequency charts show, each artist had her own character, such as Jepsen’s casual voice and Rodrigo’s emotional struggles. Across the word frequencies and the sentiment analysis, I found clues that can be read as a shift from the more positive atmosphere of the 2000s to a more cynical contemporary landscape. These can be seen in the sentiment graphs and the bigram networks. I also found hints of change in how women present themselves, especially in the network graphs, where words like ‘gonna’ connected to different types of words than in the past. In addition, although negative words slightly outweighed positive ones in both time periods (excluding ‘love’, which was by far the most frequent word), the shape of the negative emotion seemed to change, from intimidated sadness to rage and disgust. Topics related to progress in technology, however, did not appear in this analysis. In short, changes in emotional state or sentiment were not clearly visible in the numbers, but there was still enough context to argue that contemporary young female singer-songwriters convey a more cynical voice. Differences in theme or topic between the young female artists of then and now were not visible.
This project showed me that the themes and agendas in young artists’ lyrics do not react rapidly to, or vividly convey, changes in society over approximately 10 years. However, changes in sentiment are more visible, which tells me that although our concerns and interests might be timeless, attitudes change, and pop music can be part of that change.
For each time period, I selected two artists. I then picked two full albums (with appropriate release years) from each artist, which gave me bodies of lyric text to use as datasets for the analysis.
In selecting the artists, I considered several criteria. First, she must be a female singer-songwriter who writes her own songs. She must have been in her late teens or early twenties when the album was released. Another important factor was that the artist must be ‘popular’, that is, commercially ‘successful’ to some degree. This was to ensure that the songs had resonated with the public. I also wanted the artists to be similar in image and style where possible, with love and relationships as their main themes. So I tried to match the image of a typical teenage girl, leaving out artists who are considered relatively ‘minor’ for now.
The artists who met these conditions (birth years in parentheses) were Taylor Swift (1989) and Carly Rae Jepsen (1985) for the 2000s Young Artists, and Olivia Rodrigo (2003) and Sabrina Carpenter (1999) for the Current Young Artists.
Considering release dates, the albums used as datasets are ‘Fearless’ (2008) and ‘Speak Now’ (2010) by Taylor Swift, ‘Tug of War’ (2008) and ‘Kiss’ (2012) by Carly Rae Jepsen, ‘SOUR’ (2021) and ‘GUTS’ (2023) by Olivia Rodrigo, and ‘Singular: Act II’ (2019) and ‘emails i can’t send’ (2022) by Sabrina Carpenter.
I collected all the lyrics from online streaming platforms such as MelOn and Apple Music, put them into Excel, and saved them as csv files. I made one csv file per artist, putting the lyrics from every track of the two albums into a single sheet.
I started by loading each artist’s csv file into R. Each artist’s lyrics were assigned to a variable named after the artist: rodrigo, sabrina, taylor, jepsen.
Originally, the text column in each data frame was named after the title of the first track, so I renamed it to ‘lyrics’ with the rename() function for later tokenization.
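# Packages used throughout the analysis (tidyverse provides dplyr, tidyr, ggplot2 and tibble)
library(tidyverse)
library(tidytext)
library(patchwork)
library(tidygraph)
library(ggraph)
library(showtext)
rodrigo <- read.csv("text_files/rodrigo.csv")
sabrina <- read.csv("text_files/sabrina.csv")
taylor <- read.csv("text_files/taylor.csv")
jepsen <- read.csv("text_files/jepsen.csv")
rodrigo <- rename(rodrigo, lyrics = brutal)
sabrina <- rename(sabrina, lyrics = emails.i.cant.send)
taylor <- rename(taylor, lyrics = Fearless)
jepsen <- rename(jepsen, lyrics = kiss)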
In the cleaning process, I designated some words beyond the basic stop words to exclude from the analysis. These include meaningless interjections like “la”, “ooh”, “yeah”, and musical terms like “chorus”, since they carry no meaning for the analysis. I bound these additional words to the stop words and named the result ‘filter_words’. It is used throughout the project to make tidy datasets.
I tokenized the text into words and removed the filter words. The tidy texts were assigned to ‘tidy_artist’ variables (ex. tidy_taylor).
Using the tidy texts, I counted the words. Since the original csv files contained only the lyric text, I used the mutate() function to add an ‘artist’ column so that the tables are distinguishable.
data("stop_words")
# unnest_tokens() lowercases every token by default, so the extra filter words are listed in lowercase
exclude <- tibble(word = c("ooh", "ah", "ahh", "eh", "la", "yeah", "da", "mm", "rai", "ha", "uh", "em", "whoa", "woah", "chorus", "verse", "outro", "bridge"))
filter_words <- bind_rows(stop_words, exclude)
tidy_taylor <- taylor %>%
  unnest_tokens(word, lyrics) %>%
  anti_join(filter_words)
## Joining with `by = join_by(word)`
tidy_taylor %>%
  count(word, sort = TRUE) %>%
  mutate(artist = "Taylor Swift")
tidy_jepsen <- jepsen %>%
  unnest_tokens(word, lyrics) %>%
  anti_join(filter_words)
## Joining with `by = join_by(word)`
tidy_jepsen %>%
  count(word, sort = TRUE) %>%
  mutate(artist = "Carly Rae Jepsen")
tidy_rodrigo <- rodrigo %>%
  unnest_tokens(word, lyrics) %>%
  anti_join(filter_words)
## Joining with `by = join_by(word)`
tidy_rodrigo %>%
  count(word, sort = TRUE) %>%
  mutate(artist = "Olivia Rodrigo")
tidy_sab <- sabrina %>%
  unnest_tokens(word, lyrics) %>%
  anti_join(filter_words)
## Joining with `by = join_by(word)`
tidy_sab %>%
  count(word, sort = TRUE) %>%
  mutate(artist = "Sabrina Carpenter")
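The four tokenizing blocks repeat the same steps, so they could be wrapped in a small helper. The following is only a sketch, not the code used for the figures: tidy_lyrics(), toy_lyrics and toy_filter are hypothetical names introduced for illustration, and it assumes the data frame has a ‘lyrics’ column as described above.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: tokenize one artist's lyrics, drop the filter words,
# and keep an 'artist' label on every row
tidy_lyrics <- function(lyrics_df, artist_name, drop_words) {
  lyrics_df %>%
    unnest_tokens(word, lyrics) %>%          # one lowercase word per row
    anti_join(drop_words, by = "word") %>%   # remove stop words and interjections
    mutate(artist = artist_name)
}

# Toy data (made-up lines, not real lyrics)
toy_lyrics <- tibble(lyrics = c("Ooh, I love the rain",
                                "La la la, the rain will stop someday"))
toy_filter <- bind_rows(stop_words, tibble(word = c("ooh", "la")))

tidy_toy <- tidy_lyrics(toy_lyrics, "Toy Artist", toy_filter)
count(tidy_toy, artist, word, sort = TRUE)
```

Because the ‘artist’ column is added before counting, the tidy tables of several artists could be stacked with bind_rows() and still be told apart.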
The first figure is a word frequency graph. To start the analysis, I visualized the 30 most frequent words of each artist so that I could see the overall mood and tone of the texts. I grouped the graphs from the same era and put them side by side, in order to examine the characteristics of that time period at a glance and compare them with the other era.
As for the variables, the ‘artistfreq’ variables (ex. tayfreq) hold each artist’s frequency graph, and the ‘time period_count’ variables (ex. then_count) hold the two graphs combined side by side in one image.
This was mainly to kick-start the analysis, so rather than distinguishing the overall character and context of the eras, I focused on what I could note about each individual artist.
In Taylor Swift’s lyrics, wishful and romantic words like ‘grow’, ‘forever’, ‘smile’, ‘someday’ were visible, along with descriptive words like ‘rain’, ‘shine’, ‘door’. Carly Rae Jepsen’s lyrics contained many everyday, casual terms, such as ‘girlfriend’, ‘sweetie’, ‘dancing’, ‘money’. For Olivia Rodrigo and Sabrina Carpenter, the current young artists, some unique, distinguishing words stood out, like ‘jealousy’, ‘social’, ‘suicide’ for Rodrigo, and ‘pushing’ and ‘faking’ for Carpenter. Overall, the young artists of the 2000s seemed to convey a slightly more positive atmosphere than the current artists.
tay_wordcount <- tidy_taylor %>%
  count(word, sort = TRUE)
tayfreq <- tay_wordcount %>%
  top_n(30) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  coord_flip() +
  labs(title = "Taylor Swift",
       x = "Words",
       y = "Frequency")
## Selecting by n
jepsen_wordcount <- tidy_jepsen %>%
  count(word, sort = TRUE)
jepsenfreq <- jepsen_wordcount %>%
  top_n(30) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_bar(stat = "identity", fill = "lightpink") +
  coord_flip() +
  labs(title = "Carly Rae Jepsen",
       x = "Words",
       y = "Frequency")
## Selecting by n
then_count <- tayfreq + jepsenfreq +
  plot_layout(ncol = 2) +
  plot_annotation(title = "Top 30 Common Words in Taylor Swift & Carly Rae Jepsen EPs")
then_count
rodrigo_wordcount <- tidy_rodrigo %>%
  count(word, sort = TRUE)
rodrigofreq <- rodrigo_wordcount %>%
  top_n(30) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_bar(stat = "identity", fill = "purple") +
  coord_flip() +
  labs(title = "Olivia Rodrigo",
       x = "Words",
       y = "Frequency")
## Selecting by n
sab_wordcount <- tidy_sab %>%
  count(word, sort = TRUE)
sabfreq <- sab_wordcount %>%
  top_n(30) %>%
  ggplot(aes(x = reorder(word, n), y = n)) +
  geom_bar(stat = "identity", fill = "darkred") +
  coord_flip() +
  labs(title = "Sabrina Carpenter",
       x = "Words",
       y = "Frequency")
## Selecting by n
now_count <- rodrigofreq + sabfreq +
  plot_layout(ncol = 2) +
  plot_annotation(title = "Top 30 Common Words in Olivia Rodrigo & Sabrina Carpenter EPs")
now_count
Next, I visualized the sentiments of each artist. I took the top 15 positive and negative words for each artist and plotted them as bar graphs.
‘artist_sentiment’ (ex. tay_sentiment) is the count of the artist’s positive and negative words, and ‘plot_artist_sentiment’ (ex. plot_taylor_sentiment) is the graph of the 15 most frequent sentiment words, drawn with ‘generate_sentiment_plot’. ‘artist combination_sentiment_plot’ (ex. tayjepsen_sentiment_plot) is the chart in which the graphs of the two artists are placed side by side.
The overall sentiment comparison between then and now is conducted below, so the purpose of this part is similar to that of the first figures: to examine and compare the characteristics of the individual artists in the early stages of the project.
The frequencies of positive and negative words do not differ significantly between artists, but I noticed that Carly Rae Jepsen’s top negative words were high in frequency compared to the others. Interestingly, Taylor Swift had a wide range of positive words, meaning she used diverse positive words at similar, moderate frequencies. Also, among positive words, ‘love’ ranked high for all four artists.
generate_sentiment_plot <- function(sentiment_data, title) {
  sentiment_data %>%
    group_by(sentiment) %>%
    top_n(15, wt = n) %>%
    ungroup() %>%
    mutate(word = reorder(word, n)) %>%
    ggplot(aes(x = word, y = n, fill = sentiment)) +
    geom_col(show.legend = FALSE) +
    facet_wrap(~sentiment, scales = "free_y") +
    coord_flip() +
    labs(title = title, x = "Words", y = "Frequency")
}
get_sentiments("bing")
tay_sentiment <- tidy_taylor %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining with `by = join_by(word)`
jepsen_sentiment <- tidy_jepsen %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining with `by = join_by(word)`
plot_taylor_sentiment <- generate_sentiment_plot(tay_sentiment, "Taylor Swift Sentiment")
plot_jepsen_sentiment <- generate_sentiment_plot(jepsen_sentiment, "Carly Rae Jepsen Sentiment")
# plot_annotation() must be added with `+`; on its own line it only prints its settings
tayjepsen_sentiment_plot <- plot_taylor_sentiment + plot_jepsen_sentiment +
  plot_annotation(title = "Top 15 Positive and Negative Words in Swift & Jepsen EPs")
tayjepsen_sentiment_plot
rodrigo_sentiment <- tidy_rodrigo %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining with `by = join_by(word)`
sab_sentiment <- tidy_sab %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining with `by = join_by(word)`
plot_rodrigo_sentiment <- generate_sentiment_plot(rodrigo_sentiment, "Olivia Rodrigo Sentiment")
plot_sab_sentiment <- generate_sentiment_plot(sab_sentiment, "Sabrina Carpenter Sentiment")
# plot_annotation() must be added with `+`; on its own line it only prints its settings
rodrisab_sentiment_plot <- plot_rodrigo_sentiment + plot_sab_sentiment +
  plot_annotation(title = "Top 15 Positive and Negative Words in Rodrigo & Carpenter EPs")
rodrisab_sentiment_plot
From here on, to improve reliability a little, I added one more artist to each group: Lana Del Rey (1985) for the 2000s group and Billie Eilish (2001) for the Current group. These two artists were not included in the previous analysis because their lyrics are considered comparatively ‘dark’, and I judged them unsuitable for the comparison of individual artists.
From this point, however, I combine the text data of the three artists in each group and treat them as one dataset to compare the overall context of THEN and NOW.
The albums in question are ‘Lana Del Ray’ (2010) and ‘Born to Die’ (2012) by Lana Del Rey, and ‘WHEN WE ALL FALL ASLEEP, WHERE DO WE GO?’ (2019) and ‘Happier Than Ever’ (2021) by Billie Eilish.
Let’s load the csv files of Lana and Billie, rename the columns for tokenization, and tidy them up.
billie <- read.csv("text_files/billie.csv")
lana <- read.csv("text_files/lana.csv")
billie <- rename(billie, lyrics = Getting.Older)
lana <- rename(lana, lyrics = Born.To.Die)
tidy_lana <- lana %>%
  unnest_tokens(word, lyrics) %>%
  anti_join(filter_words)
## Joining with `by = join_by(word)`
tidy_lana %>%
  count(word, sort = TRUE) %>%
  mutate(artist = "Lana Del Rey")
tidy_billie <- billie %>%
  unnest_tokens(word, lyrics) %>%
  anti_join(filter_words)
## Joining with `by = join_by(word)`
tidy_billie %>%
  count(word, sort = TRUE) %>%
  mutate(artist = "Billie Eilish")
Now I’ll combine the text data of the three artists in each group. For example, I bind the texts of Olivia Rodrigo, Sabrina Carpenter, and Billie Eilish together as one dataset and label the combined text as ‘now’ using mutate(), which marks it as the text that represents young artists of the current era.
After binding each group’s three artists into ‘combined_now’ and ‘combined_then’, I cleaned the new datasets with unnest_tokens() and removed the filter words. I also computed the word counts for the new combined texts.
combined_now <- bind_rows(
  rodrigo %>% mutate(document = "Rodrigo_Sabrina_Billie"),
  sabrina %>% mutate(document = "Rodrigo_Sabrina_Billie"),
  billie %>% mutate(document = "Rodrigo_Sabrina_Billie")
)
combined_then <- bind_rows(
  taylor %>% mutate(document = "Taylor_Jepsen_Lana"),
  jepsen %>% mutate(document = "Taylor_Jepsen_Lana"),
  lana %>% mutate(document = "Taylor_Jepsen_Lana")
)
tidy_then <- combined_then %>%
  unnest_tokens(word, lyrics) %>%
  anti_join(filter_words)
## Joining with `by = join_by(word)`
tidy_then %>%
  count(word, sort = TRUE) %>%
  mutate(when = "then")
tidy_now <- combined_now %>%
  unnest_tokens(word, lyrics) %>%
  anti_join(filter_words)
## Joining with `by = join_by(word)`
tidy_now %>%
  count(word, sort = TRUE) %>%
  mutate(when = "now")
These are the sentiment graphs of the combined texts, which show the top 15 positive and negative words of each era. From these graphs we can compare the overall sentiments in the lyrics of the two time periods.
The ‘time_sentiment’ variables (ex. then_sentiment) are the counts of positive and negative words from the combined data for each era, and the top 15 words from those counts are visualized as graphs in the ‘plot_time_sentiment’ variables.
Regarding the ratio of positive to negative words, if we leave out ‘love’, which is remarkably frequent in both time periods, negative words are slightly more frequent for both the 2000s and the current young artists. This difference is more visible in the graph of the Current artists, though only slightly. The word ‘love’, which topped the positive ranking for both then and now, was used about twice as frequently in the 2000s as now. On the other hand, the most frequent negative word of the current era, ‘bad’, appeared about 20 more times than ‘cry’, the most frequent negative word of the 2000s.
One thing I found interesting was that in the 2000s, the most frequent negative words are closely related to sadness, as we can see from words like ‘cry’, ‘sad’, ‘sadness’, whereas none of those words appear in the graph of the current era. Instead, the current negative chart consists of words more related to anger and cynicism, such as ‘hate’, ‘hell’, ‘damn’, ‘blah’, which are not on the 2000s graph. As for positive words, the Current graph contains more specific, distinctive adjectives like ‘logical’, ‘satisfied’, ‘smart’, while the 2000s graph contains broader and more abstract concepts like ‘bright’, ‘beautiful’, ‘heaven’.
then_sentiment <- tidy_then %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining with `by = join_by(word)`
now_sentiment <- tidy_now %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining with `by = join_by(word)`
plot_then_sentiment <- generate_sentiment_plot(then_sentiment, "Sentiment of 2000s Young Artists")
plot_now_sentiment <- generate_sentiment_plot(now_sentiment, "Sentiment of Current Young Artists")
plot_then_sentiment
plot_now_sentiment
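The positive-to-negative ratio discussed above can also be put into numbers rather than read off the bar graphs. The following is a minimal sketch under stated assumptions: the two toy tables are hypothetical stand-ins shaped like then_sentiment and now_sentiment (columns word, sentiment, n), and their counts are made up for illustration, not taken from the real lyrics.

```r
library(dplyr)

# Hypothetical counts shaped like then_sentiment / now_sentiment
toy_then_sentiment <- tibble(
  word      = c("love", "bright", "cry", "sad"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n         = c(40, 5, 12, 8)
)
toy_now_sentiment <- tibble(
  word      = c("love", "smart", "bad", "hate"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n         = c(20, 4, 30, 10)
)

sentiment_share <- bind_rows(
  toy_then_sentiment %>% mutate(era = "then"),
  toy_now_sentiment %>% mutate(era = "now")
) %>%
  filter(word != "love") %>%                       # leave out 'love', as in the discussion
  count(era, sentiment, wt = n, name = "total") %>%  # sum word counts per era and sentiment
  group_by(era) %>%
  mutate(share = total / sum(total)) %>%           # share of positive vs negative within an era
  ungroup()

sentiment_share
```

Applied to the real tables, the ‘share’ column would show directly how much the negative words outweigh the positive ones in each era.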
The third set of figures shows the 30 most frequent words in the tf-idf tables for each era. The aim was to see whether I could find any special, distinguishing topic or issue in the different time periods, but the results consist mostly of common words, and no particular keywords portrayed a social agenda. So the content of these figures is not very different from what can be seen in the first figures, and is useful only for understanding the general atmosphere each text conveys.
The ‘tf_idf_time’ variables (ex. tf_idf_then) calculate tf-idf and count the words. The ‘top30_frequent_time’ variables hold the 30 words with the highest counts (n) in those tables.
tf_idf_then <- tidy_then %>%
  count(document, word) %>%
  bind_tf_idf(word, document, n) %>%
  arrange(desc(tf_idf))
tf_idf_now <- tidy_now %>%
  count(document, word) %>%
  bind_tf_idf(word, document, n) %>%
  arrange(desc(tf_idf))
top30_frequent_then <- tf_idf_then %>%
  arrange(desc(n)) %>%
  slice_max(n, n = 30)
top30_frequent_now <- tf_idf_now %>%
  arrange(desc(n)) %>%
  slice_max(n, n = 30)
# A single geom_col() draws the bars; the document-based fill was fully covered by the fixed colour
top30_frequent_then %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col(fill = "violet") +
  coord_flip() +
  labs(x = NULL,
       y = "Frequency",
       title = "Top 30 Frequent Words in TF-IDF from 2000s Young Artists") +
  theme_minimal()
top30_frequent_now %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col(fill = "navy") +
  coord_flip() +
  labs(x = NULL,
       y = "Frequency",
       title = "Top 30 Frequent Words in TF-IDF from Current Young Artists") +
  theme_minimal()
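One point to keep in mind when reading these figures: bind_tf_idf() weights a word by how few documents contain it, and each table above holds a single document, so the idf term, ln(1/1), is 0 for every word and the charts effectively rank words by the raw count n. A minimal sketch of an alternative, assuming each era is treated as one document in a shared table, is shown below. The toy words and the ‘era’ column are hypothetical stand-ins for tidy_then and tidy_now, not the real lyrics.

```r
library(dplyr)
library(tidytext)

# Toy stand-ins for tidy_then and tidy_now (made-up words, not the real data)
toy_then <- tibble(word = c("love", "love", "rain", "forever"))
toy_now  <- tibble(word = c("love", "bad", "bad", "hate"))

# Label each era as its own document, then stack them into one table
toy_eras <- bind_rows(
  toy_then %>% mutate(era = "then"),
  toy_now  %>% mutate(era = "now")
)

era_tf_idf <- toy_eras %>%
  count(era, word) %>%
  bind_tf_idf(word, era, n) %>%   # term = word, document = era, count = n
  arrange(desc(tf_idf))

era_tf_idf
```

With only two documents, a word gets a non-zero score only if it appears in one era and not the other, so a word shared by both eras, such as ‘love’ in the toy data, drops to 0 while era-specific words rise to the top.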
The last figures are the network graphs for each era’s text. I made bigram networks to see which pairs of words are often used in succession and how words relate to one another overall, and thereby to understand the context of each era’s lyrics.
First, I tokenized the texts into bigrams and assigned them to the ‘bigram_time’ variables. Then I separated the two words of each bigram and assigned them to ‘bigrams_separated_time’. Finally, I removed the filter words and counted the word pairs.
bigram_then <- combined_then %>%
  unnest_tokens(bigram, lyrics, token = "ngrams", n = 2)
bigram_now <- combined_now %>%
  unnest_tokens(bigram, lyrics, token = "ngrams", n = 2)
bigrams_separated_then <- bigram_then %>%
  separate(bigram, c("word1", "word2"), sep = " ")
bigrams_separated_now <- bigram_now %>%
  separate(bigram, c("word1", "word2"), sep = " ")
bigrams_filtered_then <- bigrams_separated_then %>%
  filter(!word1 %in% filter_words$word) %>%
  filter(!word2 %in% filter_words$word)
bigrams_filtered_now <- bigrams_separated_now %>%
  filter(!word1 %in% filter_words$word) %>%
  filter(!word2 %in% filter_words$word)
bigram_counts_then <- bigrams_filtered_then %>%
  count(word1, word2, sort = TRUE) %>%
  na.omit()
bigram_counts_now <- bigrams_filtered_now %>%
  count(word1, word2, sort = TRUE) %>%
  na.omit()
bigram_counts_then
bigram_counts_now
Once the bigram counts were ready, I created the ‘graph_bigram_time’ variables to visualize the bigram networks. When drawing each network, I set it to show word pairs that appeared 3 times or more, for both THEN and NOW.
The settings were the same for the two graphs, but the result made clear that the network graph for the 2000s artists was more complex than the Current one, with far more nodes. I interpreted this as the 2000s young artists having more diversity in their vocabulary, though it might also imply that recent songs tend to repeat the same words a lot.
I observed some interesting points in the network graphs. The words ‘wanna’, ‘love’ and ‘gonna’ were frequently used in both time periods, but the words connected to them differed.
In the 2000s Young Artists graph, ‘wanna’, a word that expresses wants and desires, connected with ‘jump’, ‘kiss’, ‘feel’, ‘wake’, ‘miss’. In the Current Young Artists graph, however, ‘wanna’ was used in succession with ‘suicide’, ‘curl’, ‘stay’, etc., giving quite a different impression from the past group. The word ‘love’ was connected with strongly negative words like ‘embarrassing’, ‘fucking’ in the Current artists’ network, but with relatively positive words like ‘true’, ‘forever’ in the graph of the 2000s artists. The last point concerns ‘gonna’, a word used to express someone’s will. It was connected with ‘leave’, ‘follow’ in the 2000s, and with ‘run’, ‘stop’ in the Current era. Whereas the former are verbs that typically take an object, the latter can stand without one. These observations made me wonder what this might imply.
font_add_google("Roboto", "roboto")
showtext_auto()
graph_bigram_then <- bigram_counts_then %>%
  filter(n >= 3) %>%
  as_tbl_graph(directed = FALSE) %>%
  mutate(centrality = centrality_degree(),
         group = as.factor(group_infomap()))
set.seed(1234)
ggraph(graph_bigram_then, layout = "fr") +
  geom_edge_link(color = "gray50",
                 alpha = 0.5) +
  geom_node_point(aes(size = centrality,
                      color = group),
                  show.legend = FALSE) +
  scale_size(range = c(3, 5)) +
  geom_node_text(aes(label = name),
                 repel = TRUE,
                 size = 3.5) +
  theme_graph() +
  labs(title = "Network Graph of 2000s Young Artists") +
  theme(
    plot.title = element_text(family = "roboto", size = 15, face = "bold")
  )
font_add_google("Roboto", "roboto")
showtext_auto()
graph_bigram_now <- bigram_counts_now %>%
  filter(n >= 3) %>%
  as_tbl_graph(directed = FALSE) %>%
  mutate(centrality = centrality_degree(),
         group = as.factor(group_infomap()))
set.seed(1234)
ggraph(graph_bigram_now, layout = "fr") +
  geom_edge_link(color = "gray50",
                 alpha = 0.5) +
  geom_node_point(aes(size = centrality,
                      color = group),
                  show.legend = FALSE) +
  scale_size(range = c(3, 5)) +
  geom_node_text(aes(label = name),
                 repel = TRUE,
                 size = 4) +
  theme_graph() +
  labs(title = "Network Graph of Current Young Artists") +
  theme(
    plot.title = element_text(family = "roboto", size = 15, face = "bold")
  )
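The observation that the 2000s network has far more nodes than the Current one can be checked with a simple count instead of by eye. This is only a sketch under stated assumptions: network_size() is a hypothetical helper, and the two toy tables are made-up stand-ins shaped like bigram_counts_then and bigram_counts_now (columns word1, word2, n).

```r
library(dplyr)

# Hypothetical bigram counts (illustrative values, not the real results)
toy_counts_then <- tibble(
  word1 = c("wanna", "wanna", "true", "gonna", "love"),
  word2 = c("kiss", "feel", "love", "leave", "story"),
  n     = c(5, 4, 3, 3, 2)
)
toy_counts_now <- tibble(
  word1 = c("wanna", "gonna", "love"),
  word2 = c("stay", "run", "hurts"),
  n     = c(6, 3, 1)
)

# Count the edges (word pairs) and distinct nodes (words) that pass the threshold
network_size <- function(bigram_counts, min_n = 3) {
  kept <- filter(bigram_counts, n >= min_n)
  tibble(
    edges = nrow(kept),
    nodes = n_distinct(c(kept$word1, kept$word2))
  )
}

size_summary <- bind_rows(
  then = network_size(toy_counts_then),
  now  = network_size(toy_counts_now),
  .id  = "era"
)
size_summary
```

Comparing these counts alongside the total number of words in each era would help separate a genuinely richer vocabulary from a simple difference in the amount of text.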