Assignment 7

Author

Olivia Delffs

Introduction

In recent years, the gap between what qualifies someone as a classic celebrity and what qualifies them as an internet celebrity, otherwise known as an influencer, has narrowed. In the early days of the internet, when people were first able to make a living as internet personalities, the career was seen as somewhat laughable, and there was a clear distinction between influencers and classic celebrities. But now, internet influencers are invited to exclusive events such as the Met Gala and are endorsed by luxury brands like Dior.

Often regarded as the most influential celebrities in terms of fashion, beauty, and lifestyle, the Kardashians have been spotted spending time with younger internet influencers. The most well-known example is the friendship between Kourtney Kardashian and TikTok star Addison Rae, who is much younger and does not come from the same wealthy celebrity background. This has led many people to question the future of celebrity culture and the nepotistic structures within the space.

In general, are the Kardashians talked about in the same light as influencers?

To answer this question, we will look at post titles from various subreddits that discuss the Kardashians and compare the positivity of the words in these titles to that of titles from subreddits discussing influencers. Both sets of titles are contained in fullFrame.

library(tidyverse)
library(tidytext)

fullFrame <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/delffso_xavier_edu/Edn6R7tWl8BErvaR0Tm8pGkB0db4Uc-nP3S85XO_gV0rbQ?download=1")

To begin, we will remove stop words: words such as “accordingly” that mainly serve sentence flow and carry no sentiment.

tidy_sub <- fullFrame %>%
  unnest_tokens(word, Title) %>%
  anti_join(stop_words, by = "word")
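To make the tokenizing step concrete, here is a minimal sketch on made-up data. The data frame toy_posts and its two titles are hypothetical stand-ins for fullFrame and do not come from the real data set; only the column names Category and Title are taken from the code above.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for fullFrame (not the real posts)
toy_posts <- tibble(
  Category = c("Kardashian", "Influencer"),
  Title    = c("Kim wore this dress", "Why is this influencer so annoying")
)

toy_tidy <- toy_posts %>%
  unnest_tokens(word, Title) %>%        # one lowercased word per row
  anti_join(stop_words, by = "word")    # drop common function words such as "this"

toy_tidy
```

Each remaining row keeps its Category, so later steps can still compare the two groups word by word.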

To categorize the words used in the Reddit post titles, we will label each one as positive or negative based on the Bing lexicon and count how many times each word is used.

bing <- get_sentiments("bing")
sub_counts <- 
  tidy_sub %>% 
  group_by(Category, word) %>% 
  summarize(n = n()) %>% 
  inner_join(bing, by = "word")

This graph shows that the Kardashians are spoken about with more positive sentiment on Reddit than influencers in general, though it only includes words that are used more than once. It also shows that both groups are mainly discussed in terms of their looks.

To give each category a numerical score, we will add a new column, Score, equal to the number of times a word is used multiplied by 1 if the word is positive and -1 if it is negative, and then add the scores together within each category.

sub_counts <- sub_counts %>% 
  mutate(Score = ifelse(sentiment == "positive", n * 1, n * -1))

influencerScore <- sub_counts %>%
  filter(Category == "Influencer") %>%
  summarise(Score = sum(Score)) %>%
  pull(Score)

kardashianScore <- sub_counts %>%
  filter(Category == "Kardashian") %>%
  summarise(Score = sum(Score)) %>%
  pull(Score)
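The scoring rule can be checked by hand on a small example. The sketch below uses hypothetical word counts and a hypothetical four-word lexicon (toy_counts and toy_lexicon are illustrative names, not the real Bing lexicon or the real data). It computes both category scores in one grouped step rather than with two separate filters.

```r
library(dplyr)

# Hypothetical word counts, shaped like sub_counts before the lexicon join
toy_counts <- tibble(
  Category = c("Kardashian", "Kardashian", "Influencer", "Influencer"),
  word     = c("beautiful", "fake", "love", "cringe"),
  n        = c(3L, 1L, 1L, 2L)
)

# Hypothetical mini-lexicon with the same two columns as get_sentiments("bing")
toy_lexicon <- tibble(
  word      = c("beautiful", "fake", "love", "cringe"),
  sentiment = c("positive", "negative", "positive", "negative")
)

toy_scores <- toy_counts %>%
  inner_join(toy_lexicon, by = "word") %>%
  mutate(Score = if_else(sentiment == "positive", n, -n)) %>%  # +n or -n per word
  group_by(Category) %>%
  summarise(Score = sum(Score))                                # net score per category

toy_scores
```

With these made-up counts, the Kardashian category scores 3 - 1 = 2 and the Influencer category scores 1 - 2 = -1.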

This gives the Kardashians a score of 5 and the influencers a score of -5. This could be used to strengthen an argument that the Kardashians are still regarded as a higher level of celebrity than internet influencers and are therefore more respected.

What emotions are reflected in these post titles?

To answer this question, we will perform an analysis similar to the previous one, except instead of creating a positivity score, we will analyze the emotions associated with the words used.

This code pulls the emotions associated with each word, counts the number of words associated with each emotion, and then compiles the results into a single data frame.

kardashCounts <- tidy_sub %>% filter(Category == "Kardashian")
influencerCounts <- tidy_sub %>% filter(Category == "Influencer")
nrc <- get_sentiments("nrc")
kardash_sentiment <- 
  kardashCounts %>% 
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% 
  group_by(sentiment) %>%
  summarize(`Count`=n(),
            `Percent of scoreable words` = `Count`/nrow(.)) %>% 
  arrange(-`Percent of scoreable words`)
kardash_sentiment <- kardash_sentiment %>% mutate(Category = "Kardashian")
influencer_sentiment <- 
  influencerCounts %>% 
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% 
  group_by(sentiment) %>%
  summarize(`Count`=n(),
            `Percent of scoreable words` = `Count`/nrow(.)) %>% 
  arrange(-`Percent of scoreable words`)
influencer_sentiment <- influencer_sentiment %>% mutate(Category = "Influencer")
redditSentiment <- rbind(influencer_sentiment,kardash_sentiment)

This graph shows us that about the same share of the words used to discuss each category have a positive connotation or one relating to anticipation, whereas a larger share have a negative connotation for influencers. More words are related to trust for the Kardashians than for influencers, which makes sense because their family dynamic is often discussed. Sentiments of anger, disgust, fear, and sadness are found more in discussions of influencers; sentiments of joy and surprise are found more in discussions of the Kardashians. This further supports the finding that the Kardashians are talked about in a more positive light than influencers.
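The share calculation behind this comparison can be sketched on toy data. This is an alternative formulation, not the code used above: it handles both categories in one pass by grouping on Category. The data frames toy_words and toy_nrc and the column name Share are hypothetical, and the word-to-emotion pairs are made up rather than taken from the real NRC lexicon.

```r
library(dplyr)

# Hypothetical tokenized words, shaped like tidy_sub
toy_words <- tibble(
  Category = c("Kardashian", "Kardashian", "Influencer", "Influencer"),
  word     = c("family", "wedding", "scam", "family")
)

# Hypothetical emotion lexicon; one word may map to several emotions, as in NRC
toy_nrc <- tibble(
  word      = c("family", "family", "wedding", "scam", "scam"),
  sentiment = c("trust", "positive", "joy", "anger", "negative")
)

toy_emotions <- toy_words %>%
  inner_join(toy_nrc, by = "word", relationship = "many-to-many") %>%
  count(Category, sentiment, name = "Count") %>%  # matches per category and emotion
  group_by(Category) %>%
  mutate(Share = Count / sum(Count)) %>%          # share of that category's matches
  ungroup()

toy_emotions
```

Because a word can carry more than one emotion, it contributes one row per emotion after the join, so the shares are proportions of word-emotion matches rather than of distinct words.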

Does taking slang into account have an impact on sentiment analysis?

Many slang words, and words that carry a different connotation on the internet than in the general lexicon, appear in online spaces, especially those discussing celebrities. These spaces often use slang that is also popular among fans of RuPaul’s Drag Race, so we will use a data frame that contains words from a subreddit about RuPaul’s Drag Race along with their sentiment values.

rupaulSent <- read_tsv("https://myxavier-my.sharepoint.com/:u:/g/personal/delffso_xavier_edu/ESfuy1M39AdPmF5ZyCNXkrgByiEJMhEBg5PVJR9Jq-pk1w?download=1", col_names = FALSE)
colnames(rupaulSent) <- c("word", "mean", "StD")
rupaulSent <- rupaulSent %>% select(word, mean)

Now we will repeat the analysis we performed with the Bing lexicon, this time using this lexicon.

common_words <- intersect(tidy_sub$word, rupaulSent$word)
reddit_tidy <- tidy_sub %>% filter(word %in% common_words)
rup_counts <- 
  reddit_tidy %>% 
  group_by(Category, word) %>% 
  summarize(n = n()) %>% 
  inner_join(rupaulSent, by = "word")
rup_counts <- rup_counts %>% mutate(sentiment = ifelse(mean < 0, "negative", "positive"))

Because “kim”, “kylie”, “kardashian”, “west”, “parker”, and “bianca” are names, we will take those out and reanalyze. It is interesting to note, however, that most of those names have negative sentiment associated with them, especially “kim” and “kylie”, which are the names of two of the Kardashians.

From this graph, it seems that influencers and the Kardashians are talked about in much the same light, with the Kardashians discussed slightly more positively, if anything. Again, this only includes words that are used more than once, so we will calculate a numerical score that includes all words.

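# Names listed above, excluded from the score
namesVec <- c("kim", "kylie", "kardashian", "west", "parker", "bianca")

influencerScore2 <- rup_counts %>%
  filter(Category == "Influencer") %>%
  filter(!word %in% namesVec) %>%
  summarise(Score = sum(mean)) %>%
  pull(Score)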

kardashianScore2 <- rup_counts %>%
  filter(Category == "Kardashian") %>%
  filter(!word %in% namesVec) %>%
  summarise(Score = sum(mean)) %>%
  pull(Score)

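With this lexicon, the score for influencers is 6.58, which is positive, in contrast to the Bing score of -5. The Kardashian score is also higher at 33.56, a large jump from the Bing score of 5. The magnitudes are not directly comparable, because the Bing score counts each word as +1 or -1 while this lexicon assigns each word a mean value, but the change in sign for influencers suggests that accounting for slang and internet language makes a difference in the result.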
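One difference between the two scores is worth illustrating. The Bing score multiplies each word by the number of times it appears, while sum(mean) counts each distinct word once per category. The sketch below shows both versions side by side on made-up rows. The data frame toy_rup, its words, and its mean values are hypothetical, and the weighted variant is only a possible alternative, not the calculation used above.

```r
library(dplyr)

# Hypothetical rows shaped like rup_counts: one row per Category and word
toy_rup <- tibble(
  Category = c("Kardashian", "Kardashian", "Influencer"),
  word     = c("iconic", "shade", "iconic"),
  n        = c(4L, 1L, 2L),
  mean     = c(1.5, -0.5, 1.5)
)

toy_compare <- toy_rup %>%
  group_by(Category) %>%
  summarise(
    Unweighted = sum(mean),      # each distinct word counted once, as in the code above
    Weighted   = sum(n * mean)   # each occurrence counted, closer to the Bing score
  )

toy_compare
```

With these made-up values, the Kardashian row moves from 1.0 unweighted to 5.5 weighted, which shows how much a frequently repeated word can shift the total.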

Conclusion

While internet influencers are making their way into traditional celebrity spaces, they still seem to be discussed less positively on Reddit than the famous and influential Kardashians.