Abstract:

The question asked throughout this project was: “Does what you say reflect who you are?” The goal of this project is to examine what characters say in each of the three Lord of the Rings movies and how their speech changes. Language is a medium through which characters communicate their personalities and motives, but does what they say accurately depict who they are? For example, is Sam a good character? Many would say yes, but does his speech match up with that view? A related question is whether a character's speech is one way to signal their character development. A hope for this project is to develop a program that tracks a character's growth through the sentiment of their language.

Import Data

The data for this project comes from https://www.kaggle.com/datasets/paultimothymooney/lord-of-the-rings-data?select=lotr_characters.csv. Kaggle is a data competition website where another data scientist collected and analyzed this data and shared the dataset for others to use. In literature classes we talk about joining a conversation and responding to what we read, and this is not confined to literature. In mathematics we do the same thing: we look at what others have learned, and their observations spark questions and prompt responses worthy of research and thought. We have such a tendency to make distinctions, but boiled down we are more alike than we think.

# The ?token= query strings are temporary GitHub access tokens and may have expired.
data = read.csv('https://raw.githubusercontent.com/nicabey/LOTR/main/lotr_characters.csv?token=GHSAT0AAAAAABS2BA62KNN5FIJYTLUO2HLAYSEV5XA')
scriptsdata = read.csv('https://raw.githubusercontent.com/nicabey/LOTR/main/lotr_scripts.csv?token=GHSAT0AAAAAABS2BA63TGRMJVAXOAI52E46YSEV6IA')

Data Cleaning

In the movie scripts data, the speaker labels make some unnecessary distinctions. Movie scripts sometimes label a character's voice separately from the character, for example “Frodo Voice-over” versus Frodo speaking to another character. Both are things the character is saying, so removing identifiers like “voice over” from the speaker labels lets us gather more of what each character says.

library(dplyr)
library(stringr)

# Strip voice-over tags so that "FRODO VOICE-OVER" counts as "FRODO".
# One pattern covers " VOICE OVER", " VOICE-OVER", " VOICEOVER" and a bare " VOICE".
scriptsdata <- scriptsdata %>%
  mutate(char = str_replace_all(char, pattern = " VOICE([- ]?OVER)?", replacement = ""))
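The regular expression ` VOICE([- ]?OVER)?` covers all four voice-over spellings in a single pattern: the separator and the word "OVER" are only consumed together, so a bare " VOICE" is still removed. The base R sketch below checks it with `gsub()` on a few made-up labels in the style of the `char` column; the labels are illustrative, not taken from the dataset.

```r
# Hypothetical speaker labels in the style of the script's `char` column
labels <- c("FRODO VOICE OVER", "FRODO VOICE-OVER", "SAM VOICEOVER",
            "GANDALF VOICE", "ARAGORN")

# " VOICE" optionally followed by "OVER", "-OVER" or " OVER"
voice_pattern <- " VOICE([- ]?OVER)?"

cleaned <- gsub(voice_pattern, "", labels)
print(cleaned)  # "FRODO" "FRODO" "SAM" "GANDALF" "ARAGORN"
```

Because `str_replace_all()` accepts the same pattern, the four separate `mutate()` calls can be reduced to one.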

Sentiment Analysis

Deciding whether something is good or bad is relative, but there is a way to automate a first pass at it: sentiment analysis. Sentiment analysis is a technique that looks at strings of text and assigns them a positive, negative, or neutral score. Businesses often use it to examine consumer feedback such as social media posts and product reviews. The goal of this section is to examine the lines each character speaks in “The Fellowship of the Ring”, “The Two Towers”, and “The Return of the King” and determine each character's sentiment.
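To make the idea concrete, here is a simplified sketch of dictionary-based scoring in base R: count the words that appear in a positive list and in a negative list, then divide the difference by the number of words. The word lists and the function name `score_line` are hypothetical. The dictionaries used by the SentimentAnalysis package are far larger, and `analyzeSentiment()` also preprocesses the text before counting, so these scores will not match its output.

```r
# Tiny hypothetical word lists (illustration only)
positive_words <- c("excited", "happy", "precious", "love")
negative_words <- c("nervous", "afraid", "died", "dark")

score_line <- function(text) {
  # Lower-case, then split on anything that is not a letter or apostrophe
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[words != ""]
  n <- length(words)
  if (n == 0) return(NaN)  # empty line: no score, like the NaN rows later on
  pos <- sum(words %in% positive_words)
  neg <- sum(words %in% negative_words)
  (pos - neg) / n          # net sentiment per word
}

score_line("I am excited for graduation next week")  # 1/7, about 0.143
score_line("She died.")                              # -0.5
```

A score above zero means positive words outnumber negative ones, below zero the reverse, and zero is neutral.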

Sentiment in practice?

library(SentimentAnalysis)
statement = "I am very nervous about presenting in class"
statement2 = "I am excited for graduation next week"
t(analyzeSentiment(c(statement, statement2))[,1:4])
##                    [,1] [,2]
## WordCount     3.0000000  4.0
## SentimentGI  -0.3333333  0.5
## NegativityGI  0.3333333  0.0
## PositivityGI  0.0000000  0.5

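We can see in the above example two statements with different sentiments. The first, about being nervous, receives a negative SentimentGI score (-0.33), while the second, about being excited, receives a positive one (0.5).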
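The numbers in the table are consistent with a simple ratio rule, assuming SentimentAnalysis divides dictionary word counts by WordCount, the number of words left after preprocessing (3 and 4 rather than 8 and 7, presumably because stop words are dropped first). Writing $P$ and $N$ for the numbers of positive and negative words and $W$ for WordCount:

$$
\mathrm{SentimentGI} = \frac{P - N}{W} = \mathrm{PositivityGI} - \mathrm{NegativityGI}, \qquad \mathrm{NegativityGI} = \frac{N}{W}, \qquad \mathrm{PositivityGI} = \frac{P}{W}
$$

$$
\text{statement: } \frac{0 - 1}{3} \approx -0.333, \qquad \text{statement2: } \frac{2 - 0}{4} = 0.5
$$

Here $N = 1$ and $P = 0$ for the first statement follow from NegativityGI $= 0.333$ and PositivityGI $= 0$ with $W = 3$; $P = 2$ and $N = 0$ for the second follow from PositivityGI $= 0.5$ and NegativityGI $= 0$ with $W = 4$.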

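# Score every line of dialogue and attach the scores to the script rows
sentiment1 <- analyzeSentiment(scriptsdata$dialog)
sentiment1 <- cbind(scriptsdata, sentiment1)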
t(sentiment1[10,])
##                    10                       
## X                  "9"                      
## char               "SMEAGOL"                
## dialog             " My precious.  "        
## movie              "The Return of the King "
## WordCount          "1"                      
## SentimentGI        "1"                      
## NegativityGI       "0"                      
## PositivityGI       "1"                      
## SentimentHE        "0"                      
## NegativityHE       "0"                      
## PositivityHE       "0"                      
## SentimentLM        "0"                      
## NegativityLM       "0"                      
## PositivityLM       "0"                      
## RatioUncertaintyLM "0"                      
## SentimentQDAP      "1"                      
## NegativityQDAP     "0"                      
## PositivityQDAP     "1"
head(sentiment1[sentiment1$movie == "The Fellowship of the Ring ", ])
##         X    char
## 1401 1400   MERRY
## 1402 1401 STRIDER
## 1403 1402   FRODO
## 1404 1403 STRIDER
## 1405 1404   FRODO
## 1406 1405 STRIDER
##                                                                                    dialog
## 1401                               What are     they eating when they can't get hobbit ? 
## 1402                                                                                     
## 1403                                          Who is     she ? This woman you sing of ?  
## 1404 Tis the Lady of L'thien. The Elf Maiden who gave her love     to Beren ... a mortal 
## 1405                                                               What happened to her? 
## 1406                                                                        She died.    
##                            movie WordCount SentimentGI NegativityGI
## 1401 The Fellowship of the Ring          4  -0.2500000         0.25
## 1402 The Fellowship of the Ring          0         NaN          NaN
## 1403 The Fellowship of the Ring          2   0.0000000         0.00
## 1404 The Fellowship of the Ring          9   0.1111111         0.00
## 1405 The Fellowship of the Ring          1   0.0000000         0.00
## 1406 The Fellowship of the Ring          1  -1.0000000         1.00
##      PositivityGI SentimentHE NegativityHE PositivityHE SentimentLM
## 1401    0.0000000           0            0            0           0
## 1402          NaN         NaN          NaN          NaN         NaN
## 1403    0.0000000           0            0            0           0
## 1404    0.1111111           0            0            0           0
## 1405    0.0000000           0            0            0           0
## 1406    0.0000000           0            0            0           0
##      NegativityLM PositivityLM RatioUncertaintyLM SentimentQDAP NegativityQDAP
## 1401            0            0                  0     0.0000000            0.0
## 1402          NaN          NaN                NaN           NaN            NaN
## 1403            0            0                  0    -0.5000000            0.5
## 1404            0            0                  0     0.1111111            0.0
## 1405            0            0                  0     0.0000000            0.0
## 1406            0            0                  0    -1.0000000            1.0
##      PositivityQDAP
## 1401      0.0000000
## 1402            NaN
## 1403      0.0000000
## 1404      0.1111111
## 1405      0.0000000
## 1406      0.0000000
sentiment1 %>%
  filter(movie == "The Fellowship of the Ring ") %>%
  group_by(char)%>%
  summarise(mean(SentimentGI, na.rm = TRUE)) %>%
  head(10)
## # A tibble: 10 × 2
##    char             `mean(SentimentGI, na.rm = TRUE)`
##    <chr>                                        <dbl>
##  1 ARAGORN                                    0.0494 
##  2 ARWEN                                      0.00725
##  3 BARLIMAN                                   0.0374 
##  4 BILBO                                      0.105  
##  5 BOROMIR                                    0.00343
##  6 CHILDREN HOBBITS                           0      
##  7 CROWD                                      0      
##  8 ELROND                                    -0.0309 
##  9 FARMER MAGGOT                             -0.333  
## 10 FRODO                                      0.0634

Do People Change?

Now that we have scored what each character says as positive or negative, can we see whether they change over time?

sentiment1 %>%
  group_by(char,movie)%>%
  summarise(mean(SentimentGI, na.rm = TRUE)) %>%
  arrange(char,match(movie, c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))) %>%
  head(7)
## `summarise()` has grouped output by 'char'. You can override using the
## `.groups` argument.
## # A tibble: 7 × 3
## # Groups:   char [5]
##   char       movie                         `mean(SentimentGI, na.rm = TRUE)`
##   <chr>      <chr>                                                     <dbl>
## 1 " GANDALF" "The Return of the King "                              -0.417  
## 2 "(GOLLUM"  "The Return of the King "                               0      
## 3 "ARAGORN"  "The Fellowship of the Ring "                           0.0494 
## 4 "ARAGORN"  "The Two Towers "                                      -0.00507
## 5 "ARAGORN"  "The Return of the King "                               0.0864 
## 6 "ARAGORN " "The Two Towers "                                       0.125  
## 7 "ARGORN"   "The Two Towers "                                       0.125
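The output above shows the same speaker under several labels (" GANDALF", "(GOLLUM", "ARAGORN ", "ARGORN"), which splits one character's lines into separate groups. A minimal base R sketch of a name-normalizing step follows. The function `normalize_speaker` and the `aliases` table are hypothetical, and a real alias table would have to be built by hand after inspecting `unique(scriptsdata$char)`.

```r
# Hypothetical raw labels mirroring the variants in the output above
raw <- c(" GANDALF", "(GOLLUM", "ARAGORN", "ARAGORN ", "ARGORN", "FRODO")

# Hypothetical alias table: misspelling -> canonical name
aliases <- c(ARGORN = "ARAGORN")

normalize_speaker <- function(x, aliases) {
  x <- toupper(x)
  x <- gsub("[^A-Z ]", "", x)           # drop punctuation such as the stray "("
  x <- trimws(x)                        # drop leading/trailing spaces
  hit <- x %in% names(aliases)
  x[hit] <- unname(aliases[x[hit]])     # map known misspellings to one spelling
  x
}

normalize_speaker(raw, aliases)
# "GANDALF" "GOLLUM" "ARAGORN" "ARAGORN" "ARAGORN" "FRODO"
```

Applied before grouping, for example with `mutate(char = normalize_speaker(char, aliases))`, it would merge these variants into one group per character, which would also change the per-character means reported in this section.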
frodosarc <- sentiment1 %>%
  arrange(match(movie, c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))) %>%
  filter(char == "FRODO") %>%
  select(SentimentGI) %>%
  filter(!is.na(SentimentGI))
# Number Frodo's lines in order rather than hard-coding the count
frodosarc$TalkingSequence <- seq_len(nrow(frodosarc))
library(ggplot2)
frodosarc %>%
  ggplot(mapping = aes(y = SentimentGI, x = TalkingSequence)) +
  geom_line()

This is all over the place. Why? It looks like the scripts data is not as clean as I had hoped. Some of the lines are out of order: the entire Fellowship of the Ring script is spliced throughout the Return of the King lines. In a perfect world, a character's sentiment would move steadily in one direction, either more negative or more positive, as the scripts progress.
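Smoothing cannot fix the ordering problem, but once the lines are in script order, a trailing moving average would make a gradual trend easier to see than the raw per-line scores. Below is a minimal base R sketch, assuming NaN scores have already been filtered out as in `frodosarc`. The function name `moving_average`, the window `k`, and the scores are made-up illustrations. `stats::filter` is called with its namespace because dplyr's `filter` masks it.

```r
# Trailing moving average: each point is the mean of the current and previous k - 1 scores.
# The first k - 1 values are NA because a full window is not yet available.
moving_average <- function(x, k = 10) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Hypothetical per-line scores, used only to show the shape of the output
scores <- c(0.5, 0, -1, 0.25, 0, 0.5, -0.5, 0)
ma <- moving_average(scores, k = 4)
ma  # NA NA NA -0.0625 -0.1875 -0.0625 0.0625 0
```

With the real data, something like `frodosarc$Smoothed <- moving_average(frodosarc$SentimentGI, k = 10)` followed by `geom_line(aes(y = Smoothed))` would overlay the smoothed curve; a larger `k` gives a smoother but less responsive line.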

# Flag each line as positive (TRUE), not positive (FALSE), or NA when it has no score
sentiment1 <- sentiment1 %>%
  mutate(Positive = SentimentGI > 0)
samschange <- sentiment1 %>%
  arrange(match(movie, c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))) %>%
  filter(char == "SAM") %>%
  select(SentimentGI) %>%
  filter(!is.na(SentimentGI))
samschange$TalkingSequence <- seq_len(nrow(samschange))
samschange %>%
  ggplot(mapping = aes(y = SentimentGI, x = TalkingSequence)) +
  geom_line()

Sentiment Change

Having categorized each character's statements as positive or negative, I am curious to see how they change over the course of Tolkien's story. Because no conclusion could be drawn from the previous section, let's approach things in a new way. Since the data is out of order, we cannot follow a character's change line by line. Instead, let's look at things more generally: I have chosen to take the mean sentiment of a specific character in each of the three movies. From this I can draw a conclusion such as “Frodo's lines are positive on average in The Fellowship of the Ring” rather than reading noise in the line-by-line data. This is illustrated graphically below for Frodo, Sam, and our beloved Boromir.

frodoavg <- sentiment1 %>%
  filter(char == 'FRODO') %>%
  arrange(match(movie, c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))) %>%
  group_by(movie) %>%
  summarise(meanSGI = mean(SentimentGI, na.rm = TRUE))
frodoavg$movie <- factor(frodoavg$movie, levels = c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))
ggplot(frodoavg, aes(x = movie, y = meanSGI)) +
  geom_col()

samavg <- sentiment1 %>%
  filter(char == 'SAM') %>%
  arrange(match(movie, c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))) %>%
  group_by(movie) %>%
  summarise(meanSGI = mean(SentimentGI, na.rm = TRUE))
samavg$movie <- factor(samavg$movie, levels = c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))
ggplot(samavg, aes(x = movie, y = meanSGI)) +
  geom_col()

boroavg <- sentiment1 %>%
  filter(char == 'BOROMIR') %>%
  arrange(match(movie, c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))) %>%
  group_by(movie) %>%
  summarise(meanSGI = mean(SentimentGI, na.rm = TRUE))
boroavg$movie <- factor(boroavg$movie, levels = c("The Fellowship of the Ring ","The Two Towers ","The Return of the King " ))
ggplot(boroavg, aes(x = movie, y = meanSGI)) +
  geom_col()
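The three blocks above differ only in the character's name, so the per-film average could be written once as a helper. Below is a base R sketch using `aggregate()`. The names `character_means` and `film_order` are hypothetical, and the `toy` data frame is made up purely to show the shape of the result.

```r
film_order <- c("The Fellowship of the Ring ", "The Two Towers ", "The Return of the King ")

# Mean SentimentGI per film for one character. Rows with a missing score are
# dropped, matching mean(..., na.rm = TRUE). aggregate() stops with an error
# if the character has no scored lines at all.
character_means <- function(df, who) {
  rows <- df[which(df$char == who & !is.na(df$SentimentGI)), ]
  out <- aggregate(SentimentGI ~ movie, data = rows, FUN = mean)
  out$movie <- factor(out$movie, levels = film_order)  # fixes the x-axis order
  out[order(out$movie), ]
}

# Hypothetical toy input
toy <- data.frame(
  char = c("FRODO", "FRODO", "FRODO", "FRODO", "SAM"),
  movie = c("The Two Towers ", "The Fellowship of the Ring ",
            "The Fellowship of the Ring ", "The Two Towers ", "The Two Towers "),
  SentimentGI = c(-0.5, 0.25, 0.75, NaN, 1)
)
character_means(toy, "FRODO")  # Fellowship 0.5, Two Towers -0.5
```

With the real data, `ggplot(character_means(sentiment1, "SAM"), aes(x = movie, y = SentimentGI)) + geom_col()` would draw the same kind of bar chart as above for any character.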

We can see the change in Frodo's language over the course of the three movies. He begins with a high positive sentiment in the first movie, and his sentiment decreases over time. Knowing that Frodo does become less positive, the question of why comes to mind. This could be due to the many hardships of the journey, such as exhaustion, or to another influence: as the ring-bearer, he has carried the Ring the longest by The Return of the King. This leaves more opportunity for response! This is a conversation, but knowing some things with certainty can rule out ideas that are driven only by personal feelings.

Conclusion

Had there been more time, I would have liked to clean the data even further, reducing the error caused by discrepancies between names and other identifiers, and to put the scripts data back into its correct order. Additionally, I had found a second dataset that identifies each character's race; with more time, we could have classified good and bad races based on their sentiment. In this project I learned much about the importance of data entry and the practice of cleaning up a dirty dataset. I also gained more practice with sentiment analysis and applied it to something I had not analyzed before. Language matters. It teaches us not only about the characters in this series; by seeing how their language reveals who they are, we can see ourselves in the same way. What we say matters, and it has the power to communicate who we are, so speak wisely.
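The race comparison mentioned above would amount to joining each scored line to its speaker's race and averaging by race. A base R sketch with `merge()` and `aggregate()` follows. All values here are hypothetical toy inputs. The real join would first need the full character names in lotr_characters.csv (assumed here to have `name` and `race` columns) mapped by hand to the upper-case speaker labels used in the scripts.

```r
# Hypothetical per-line scores keyed by the script's speaker label
line_scores <- data.frame(
  char = c("FRODO", "SAM", "ARAGORN", "LEGOLAS", "FRODO", "GIMLI"),
  SentimentGI = c(0.25, 0.5, -0.25, 0, 0.75, NaN)
)

# Hypothetical lookup table from speaker label to race
races <- data.frame(
  char = c("FRODO", "SAM", "ARAGORN", "LEGOLAS", "GIMLI"),
  race = c("Hobbit", "Hobbit", "Men", "Elves", "Dwarves")
)

scored <- merge(line_scores, races, by = "char")  # inner join on the speaker label

# Mean sentiment per race; the formula interface drops NA/NaN scores,
# so a race whose lines are all unscored (Dwarves here) disappears.
race_means <- aggregate(SentimentGI ~ race, data = scored, FUN = mean)
race_means  # Elves 0, Hobbit 0.5, Men -0.25
```

Because speakers missing from the lookup table are silently dropped by the inner join, comparing `nrow(scored)` with `nrow(line_scores)` is a quick check on how complete the mapping is.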