Assignment 10

Load the libraries

library("tibble")
library("tidyverse")
library("tidytext")
library("textdata")
library("slam")
library("tm")
library("lexicon") # provides hash_sentiment_sentiword (based on SentiWordNet)

Getting the primary example code from chapter 2 of the textbook

Sentiment analysis is a very exciting topic that can help us understand text better. The second chapter of the book “Text Mining with R: A Tidy Approach” describes how to do sentiment analysis with tidy data.

The following chunks of code are examples taken from that book.

Explore the sentiment lexicons

Let’s explore the different sentiment lexicons.

get_sentiments("afinn")

## # A tibble: 2,477 × 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ℹ 2,467 more rows

get_sentiments("bing")

## # A tibble: 6,786 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # ℹ 6,776 more rows

get_sentiments("nrc")

## # A tibble: 13,872 × 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # ℹ 13,862 more rows

Extend the code with a different corpus and lexicon

Martin Luther King was a leader of the civil rights movement who delivered the speech “I Have a Dream” on August 28, 1963. It is widely considered one of the greatest speeches of the 20th century for its power and resonance.

I’m interested in using a sentiment lexicon to understand what made the speech so powerful and memorable.

I found a text version of the speech on Kaggle. I added it to my GitHub repository and started tidying the data.

Getting the data

speech_df <- read.csv("https://raw.githubusercontent.com/Kossi-Akplaka/Data607-data_acquisition_and_management/main/Assignment10/dream.txt", header = FALSE)
tibble(speech_df)

## # A tibble: 43 × 1
##    V1                                                                           
##    <chr>                                                                        
##  1 "I am happy to join with you today in what will go down in history as the gr…
##  2 "Five score years ago, a great American, in whose symbolic shadow we stand t…
##  3 "One hundred years later, the colored American lives on a lonely island of p…
##  4 "In a sense we have come to our Nation’s Capital to cash a check. When the a…
##  5 "This note was a promise that all men, yes, black men as well as white men, …
##  6 "It is obvious today that America has defaulted on this promissory note inso…
##  7 " a check that will give us upon demand the riches of freedom and security o…
##  8 "I would be fatal for the nation to overlook the urgency of the moment and t…
##  9 "There will be neither rest nor tranquility in America until the colored cit…
## 10 "We can never be satisfied as long as our bodies, heavy with the fatigue of …
## # ℹ 33 more rows

Sentiment lexicon

The choice of a sentiment lexicon depends on the nature of the text. The Loughran-McDonald Financial Sentiment Word Lists, for instance, are tailored for financial text and may not be suitable for a historical speech.

An alternative is the SentiWordNet lexicon. According to the SentiWordNet GitHub repository, SentiWordNet is a lexical resource for opinion mining that assigns three sentiment scores to each synset of WordNet:

Positivity
Negativity
Objectivity (neutral)
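
To make the three scores concrete, here is a minimal sketch. In SentiWordNet the three scores of a synset sum to 1, so objectivity can be derived from the other two. The synset names and numbers below are made up for illustration, and collapsing the scores into a single polarity as positivity minus negativity is an assumption, not necessarily how the lexicon package builds its table.

```r
# Hypothetical synset scores, invented for illustration
# (these are not real SentiWordNet entries).
synsets <- data.frame(
  synset = c("example_a", "example_b", "example_c"),
  pos    = c(0.625, 0.000, 0.125),
  neg    = c(0.000, 0.750, 0.125)
)

# The three scores sum to 1, so objectivity is whatever is left over.
synsets$obj <- 1 - (synsets$pos + synsets$neg)

# One simple way (an assumption) to reduce the scores to a single polarity.
synsets$polarity <- synsets$pos - synsets$neg

print(synsets)
```

A single polarity value like this is convenient for joining against word counts, at the cost of hiding how objective a word is.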

Tidy the text data

The speech data has 43 rows and 1 column. Let’s tidy it up by converting it into a corpus and removing punctuation, numbers, and stop words.

# Interprets each element as a document
corpus <- Corpus(VectorSource(speech_df$V1))

# Remove quotation marks like “ or ”
corpus <- tm_map(corpus, content_transformer(function(x) gsub('”', '', x)))
corpus <- tm_map(corpus, content_transformer(function(x) gsub('“', '', x)))

# Pre-process the data
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)

# Remove common English stop words
corpus <- tm_map(corpus, removeWords, stopwords("english"))
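
For comparison, the same cleaning steps can be sketched in the tidy style the book uses. This is an alternative, not the approach taken in this report: `toy_speech` is a made-up two-line data frame, and the `stop_words` list that ships with tidytext differs from tm's `stopwords("english")`, so the surviving words may not match exactly.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the speech data: one row per line of text.
toy_speech <- tibble::tibble(
  line = 1:2,
  text = c("I have a dream today.", "Let freedom ring!")
)

tidy_words <- toy_speech %>%
  unnest_tokens(word, text) %>%        # one word per row, lowercased, punctuation stripped
  anti_join(stop_words, by = "word")   # drop common English stop words

print(tidy_words)
```

The result keeps the `line` column, so each remaining word can still be traced back to the row it came from.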

Data visualization

First, let’s count the words in each row and store the counts in a data frame, df.

# Create a Document-Term Matrix (DTM)
dtm <- DocumentTermMatrix(corpus)

# Convert DTM to a Data Frame
df <- as.data.frame(as.matrix(dtm))
tibble(df)

## # A tibble: 43 × 397
##    demonstration freedom greatest happy history  join nation today  will   ago
##            <dbl>   <dbl>    <dbl> <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
##  1             1       1        1     1       2     1      1     1     1     0
##  2             0       0        0     0       0     0      0     1     0     1
##  3             0       0        0     0       0     0      0     1     0     0
##  4             0       0        0     0       0     0      0     0     0     0
##  5             0       0        0     0       0     0      0     0     0     0
##  6             0       0        0     0       0     0      1     1     0     0
##  7             0       1        0     0       0     0      1     0     1     0
##  8             0       1        0     0       0     0      2     0     3     0
##  9             0       0        0     0       0     0      1     0     2     0
## 10             0       0        0     0       0     0      0     0     0     0
## # ℹ 33 more rows
## # ℹ 387 more variables: america <dbl>, american <dbl>, beacon <dbl>,
## #   came <dbl>, captivity <dbl>, chains <dbl>, colored <dbl>, crippled <dbl>,
## #   daybreak <dbl>, decree <dbl>, discrimination <dbl>, emancipation <dbl>,
## #   end <dbl>, five <dbl>, flames <dbl>, free <dbl>, great <dbl>, hope <dbl>,
## #   hundred <dbl>, injustice <dbl>, joyous <dbl>, later <dbl>, life <dbl>,
## #   long <dbl>, manacle <dbl>, millions <dbl>, momentous <dbl>, night <dbl>, …

This data frame has 43 rows. Let’s sum the rows to count the total number of times each word is spoken.

# Sum the counts across all documents
total_counts <- colSums(df)
# Convert to a data frame
total_counts_df <- data.frame(word = names(total_counts), count = total_counts) %>%
  group_by(word) %>%
  summarize(count = sum(count))

head(total_counts_df)

## # A tibble: 6 × 2
##   word        count
##   <chr>       <dbl>
## 1 able            8
## 2 ago             1
## 3 alabama         3
## 4 alleghenies     1
## 5 almighty        1
## 6 also            1
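
As a cross-check, the same totals can be computed without building the wide data frame. The sketch below assumes the `tidy()` method that tidytext provides for document-term matrices, which returns one row per document and term. The two toy documents are made up for illustration.

```r
library(tm)
library(tidytext)
library(dplyr)

# Two hypothetical documents standing in for rows of the speech.
docs <- c("freedom will ring", "freedom freedom now")
toy_dtm <- DocumentTermMatrix(Corpus(VectorSource(docs)))

# tidy() yields columns document, term, count; summing count per term
# gives the total number of occurrences across all documents.
word_totals <- tidy(toy_dtm) %>%
  count(term, wt = count, name = "count", sort = TRUE)

print(word_totals)
```

Because the long format stores only non-zero entries, this avoids creating hundreds of mostly empty columns.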

Now, we can sort the data frame and plot the 10 most used words in the speech.

# Arrange by count in descending order and select the top 10
top_10_words <- total_counts_df %>%
  arrange(desc(count)) %>%
  head(10)

# Plot the top 10 words
ggplot(top_10_words, aes(x = reorder(word, count), y = count)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Top 10 Most Used Words in the Speech", x = "Word", y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))

In his speech, Martin Luther King uses the word “will” the most, which points to a forward-looking perspective. He also frequently uses words such as “freedom”, “colored”, and “every”, conveying a vision in which everyone, regardless of the color of their skin, will experience freedom.

Based on that, one might assume that the speech is very positive and encouraging. Let’s use the SentiWordNet lexicon to check whether that is the case.

Sentiment analysis using SentiWordNet lexicon

The SentiWord table from the lexicon package has roughly 20,000 rows, each giving a polarity value for a word or phrase. In the data frame, x holds the words and y holds the sentiment values.

See the R help page for details (hash_sentiment_sentiword {lexicon}).

head(hash_sentiment_sentiword)

##               x       y
## 1:     365 days -0.5000
## 2:     366 days  0.2500
## 3:          3tc -0.2500
## 4:   a fortiori  0.2500
## 5:  a good deal  0.2500
## 6: a great deal  0.3125

Now we perform an inner join between “total_counts_df” and “hash_sentiment_sentiword”.

sentiment_analysis_df <- total_counts_df %>%
  inner_join(hash_sentiment_sentiword, by = c("word" = "x"))
head(sentiment_analysis_df)

## # A tibble: 6 × 3
##   word     count      y
##   <chr>    <dbl>  <dbl>
## 1 able         8  0.125
## 2 back         8  0.25 
## 3 bad          1 -0.518
## 4 bank         1  0.375
## 5 basic        1  0.25 
## 6 battered     1 -0.75

Finally, we can plot each word against its sentiment score.

ggplot(sentiment_analysis_df, aes(x = word, y = y)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Sentiment Scores for Each word", y = "Sentiment Score") +
  theme(axis.text.x = element_blank(), axis.title.x = element_blank())

Based on the distribution, there is no clear pattern. Possible reasons include:

Lexicon-based analysis scores words in isolation, so it may miss nuances such as negation and context.
The speech mixes strongly negative and strongly positive language.
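
The first limitation can be illustrated with a small sketch. The two-word lexicon and its scores are hypothetical, and the phrase is a made-up example. Since each word is scored on its own, “not” and “never” are ignored and a clearly negative phrase comes out positive.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-word lexicon, invented for illustration.
toy_lexicon <- tibble::tibble(
  word  = c("happy", "satisfied"),
  score = c(1, 1)
)

phrase <- tibble::tibble(text = "we are not happy and never satisfied")

scored <- phrase %>%
  unnest_tokens(word, text) %>%          # split the phrase into single words
  inner_join(toy_lexicon, by = "word")   # keep only words found in the lexicon

# Negation words are not in the lexicon, so the total is positive.
phrase_score <- sum(scored$score)
print(phrase_score)
```

Handling negation would require looking at word pairs rather than single words.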

Exploring another sentiment lexicon may provide a clearer picture.

Sentiment analysis using AFINN lexicon

# Load AFINN lexicon
afinn_lexicon <- get_sentiments("afinn")

# Join total_counts_df with AFINN lexicon
afinn_analysis_df <- total_counts_df %>%
  inner_join(afinn_lexicon, by = "word")

# Plot sentiment scores vs terms
ggplot(afinn_analysis_df, aes(x = word, y = value)) +
  geom_bar(stat = "identity", fill = "brown") +
  labs(title = "Sentiment Scores using AFINN for Each word", y = "Sentiment Score") +
  theme(axis.text.x = element_blank(), axis.title.x = element_blank())

Based on the plot, slightly more words have positive AFINN scores than negative ones.
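
A reading taken from a plot can be backed up with a tally. The sketch below shows one way to count positive and negative words, both as distinct words and as total occurrences. It assumes a table shaped like afinn_analysis_df (word, count, value). The rows in `toy_afinn` are made-up illustrative values, not results from the speech.

```r
library(dplyr)

# Made-up rows in the same shape as afinn_analysis_df.
toy_afinn <- tibble::tibble(
  word  = c("freedom", "hope", "injustice", "great"),
  count = c(5, 2, 3, 1),
  value = c(2, 2, -2, 3)
)

polarity_tally <- toy_afinn %>%
  mutate(polarity = if_else(value > 0, "positive", "negative")) %>%
  group_by(polarity) %>%
  summarize(
    distinct_words = n(),         # how many different words fall on each side
    occurrences    = sum(count),  # how many times those words are spoken
    .groups = "drop"
  )

print(polarity_tally)
```

Comparing the two columns shows whether a small number of frequently repeated words drives the balance.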

Conclusion

Speakers often use rhetorical devices to create emotional impact. This can involve emphasizing challenges and injustices (negative sentiment) while concurrently expressing optimism and aspirations (positive sentiment).

This speech motivates and inspires the audience by incorporating hope, dreams, and a vision for a better future while reminding them of the historical struggles for civil rights.


Assignment 10 - NLP

Kossi Akplaka

2023-11-11