1 Background

In conducting sentiment analysis on Leo Tolstoy’s short story “How much land does a man need?” (Tolstoy 1905), the primary objective is to illustrate automated text mining in R. The secondary objective is to examine the sentiments conveyed within the text using a quantitative approach. By analyzing the story through this lens, we aim to gain a deeper understanding of the characters, themes, and overall message conveyed by Tolstoy.

Tolstoy’s “How much land does a man need?” delves into the timeless themes of greed, ambition, and the pursuit of material wealth. The narrative follows a peasant named Pahom, who becomes consumed by the desire for more land. The Devil, having overheard Pahom boast that with enough land he would not fear the Devil himself, sets out to tempt him. As Pahom accumulates plots of land, his insatiable greed drives him to strike a bargain with the Bashkírs: for a fixed price, he may keep as much land as he can walk around in a single day. The climax of the story occurs during that walk, where Pahom’s relentless pursuit of more ultimately leads to his demise.

This poignant tale serves as a cautionary allegory, highlighting the destructive consequences of unchecked ambition and the pitfalls of materialism. Tolstoy’s powerful storytelling invites readers to reflect on the true value of wealth and the detrimental effects of never-ending desires.

By analyzing the sentiment in this short story, we can explore how Tolstoy’s writing evokes emotions such as fear and regret alongside the themes of greed and ambition. Because the analysis scores individual words, it offers a quantitative view of the story’s emotional tone rather than a full reading of the characters’ actions and dialogues. We use it to complement, not replace, a close reading of the themes explored in “How much land does a man need?”

2 The Approach

The approach we take for sentiment analysis, utilizing the quantitative scoring method and implementing it through the R programming language, focuses on assigning numerical values to words or phrases to determine their sentiment polarity. This approach has its strengths, as it allows for automated analysis and provides a quantitative assessment of sentiment (Silge and Robinson 2017).

However, it’s important to acknowledge a potential limitation of this approach. Quantitative sentiment scoring might not always capture the true meaning of words or phrases, especially when they are used sarcastically or in a context that deviates from their literal interpretation. Sarcasm, irony, and other forms of nuanced language can be challenging to accurately detect and interpret solely based on quantitative scoring methods.
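The limitation can be illustrated with a minimal sketch. The three-word lexicon and its scores below are hypothetical, made up for illustration and not taken from any published dictionary. The first sentence is an invented literal statement, and the second paraphrases the Chief's closing remark, which is arguably ironic because Pahom has just collapsed.

```r
library(dplyr)
library(tidytext)

# Hypothetical lexicon: scores are illustrative assumptions only
toy_lexicon <- tibble(
    word = c("fine", "gained", "dead"),
    value = c(2, 1, -3)
)

toy_text <- tibble(
    id = c("literal", "ironic"),
    text = c(
        "He gained a fine farm",
        "What a fine fellow, he has gained much land"
    )
)

# Split into words, keep only words found in the lexicon, sum per sentence
toy_scores <- toy_text %>%
    unnest_tokens(word, text) %>%
    inner_join(toy_lexicon, by = "word") %>%
    group_by(id) %>%
    summarise(score = sum(value))

toy_scores
```

Both sentences receive the same score because word-level scoring sees only "fine" and "gained", not the context in which they are spoken.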

While the quantitative approach can provide valuable insights into overall sentiment trends and patterns within the text, it’s crucial to consider the context and use additional qualitative analysis techniques to capture the full meaning and subtleties of the sentiments expressed. Combining quantitative scoring with qualitative assessment, such as contextual analysis or manual review, can help mitigate this limitation and provide a more comprehensive understanding of the sentiment conveyed in Tolstoy’s “How much land does a man need?”

We adopt the tidy approach for conducting text analysis, as proposed by the Tidyverse framework (Wickham et al. 2019). With tidy data, each variable is organized into its own column, and each observation is represented in a row. The primary unit of analysis is a rectangular grid consisting of rows and columns of data.

Let’s define some important terms. A ‘corpus’ refers to a collection of text documents, which can also be seen as raw strings accompanied by additional metadata. In our case, the corpus holds a single document, the short story. When a corpus contains multiple documents, we can create a document-term matrix (DTM). A DTM is a typically sparse matrix with one row for each document and one column for each term in the corpus, where each cell records how often that term occurs in that document.
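As a rough sketch of what a DTM looks like, the snippet below builds one from two tiny made-up documents. The document names and texts are hypothetical. It uses cast_dtm() from tidytext, which relies on the tm package being installed.

```r
library(dplyr)
library(tidytext)

# Two hypothetical one-line documents, for illustration only
docs <- tibble(
    doc = c("elder", "younger"),
    text = c("town life is fine", "peasant life is free")
)

# Count each word per document, then cast the counts to a DTM
dtm <- docs %>%
    unnest_tokens(word, text) %>%
    count(doc, word) %>%
    cast_dtm(document = doc, term = word, value = n)

dim(dtm) # rows are documents, columns are distinct terms
```

With a single document, as in this article, the DTM would have only one row, which is why we work with the tidy one-token-per-row table instead.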

A ‘token’ represents a meaningful unit of text, such as a word, sentence, or tweet. Tokenization involves the process of dividing text into these individual tokens. In our analysis, we employed the unnest_tokens() function from the tidytext package to split the short story into its constituent words. The result is a table where each row corresponds to a single token or word.
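To make the choice of token concrete, here is a small sketch that tokenizes the same passage twice, once by sentence and once by word. The passage joins two sentences from the younger sister's opening speech. The object names are illustrative.

```r
library(dplyr)
library(tidytext)

passage <- tibble(
    text = "Our way is safer. We shall never grow rich, but we shall always have enough to eat."
)

# One row per sentence
by_sentence <- passage %>%
    unnest_tokens(sentence, text, token = "sentences")

# One row per word (the default token)
by_word <- passage %>%
    unnest_tokens(word, text)

nrow(by_sentence)
nrow(by_word)
```

By default unnest_tokens() also lowercases the text and strips punctuation, which is why the word table later in this article contains only lowercase words.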

3 Data

I start by loading the requisite R packages that we utilize in the analysis.

Code

if (!require(pacman)) {
    install.packages("pacman")
}

# install.packages("tokenizers")

pacman::p_load(
    tokenizers,
    tidyverse,
    tidytext,
    janitor,
    stopwords,
    gt,
    grateful,
    wordcloud2,
    wordcloud,
    NLP,
    tm,
    textdata
)

Next, we load the short story, “How much land does a man need?” by Leo Tolstoy, into R. I scrape the story from https://www.marxists.org/archive/tolstoy/1886/how-much-land-does-a-man-need.html using the rvest package (Wickham 2022).

Code

library(rvest)
land <- read_html("https://www.marxists.org/archive/tolstoy/1886/how-much-land-does-a-man-need.html") %>%
    html_nodes("p") %>%
    html_text() %>%
    tibble() %>%
    set_names("lines") %>%
    filter(!str_detect(lines, "Leo Tolstoy Archive")) %>%
    mutate(lines = str_remove_all(lines, "\\r|\\n")) %>%
    filter(!str_detect(lines, "^Written: 1886Source")) %>%
    filter(lines != "1886. ")
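The cleaning steps can be tried offline on a small inline page. This is only a sketch: the HTML string is a made-up stand-in for the real page, and html_elements() is the newer rvest name for html_nodes(), assuming rvest 1.0.0 or later.

```r
library(rvest)
library(dplyr)
library(stringr)

# Hypothetical miniature page that mimics the site's header and footer lines
page <- read_html(
    "<html><body><p>Leo Tolstoy Archive</p><p>An elder sister came\n to visit.</p><p>1886. </p></body></html>"
)

lines_tbl <- tibble(lines = page %>% html_elements("p") %>% html_text()) %>%
    filter(!str_detect(lines, "Leo Tolstoy Archive")) %>% # drop the site header
    mutate(lines = str_remove_all(lines, "\\r|\\n")) %>%  # remove line breaks
    filter(lines != "1886. ")                             # drop the date footer

lines_tbl
```

Only the story paragraph survives the filters, which is the behaviour we want from the full scrape.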

Let us look at the first six paragraphs of the short story.

Code

head(land) %>% gt()

lines
An elder sister came to visit her younger sister in the country. The elder was married to a tradesman in town, the younger to a peasant in the village. As the sisters sat over their tea talking, the elder began to boast of the advantages of town life: saying how comfortably they lived there, how well they dressed, what fine clothes her children wore, what good things they ate and drank, and how she went to the theater, promenades, and entertainments.
The younger sister was piqued, and in turn disparaged the life of a tradesman, and stood up for that of a peasant.
'I would not change my way of life for yours,' said she. 'We may live roughly, but at least we are free from anxiety. You live in better style than we do, but though you often earn more than you need, you are very likely to lose all you have. You know the proverb, "Loss and gain are brothers twain." It often happens that people who are wealthy one day are begging their bread the next. Our way is safer. Though a peasant's life is not a fat one, it is a long one. We shall never grow rich, but we shall always have enough to eat.'
The elder sister said sneeringly:
'Enough? Yes, if you like to share with the pigs and the calves! What do you know of elegance or manners! However much your good man may slave, you will die as you are living—on a dung heap—and your children the same.'
'Well, what of that?' replied the younger. 'Of course our work is rough and coarse. But, on the other hand, it is sure; and we need not bow to any one. But you, in your towns, are surrounded by temptations; to-day all may be right, but to-morrow the Evil One may tempt your husband with cards, wine, or women, and all will go to ruin. Don't such things happen often enough?'

Here are the closing paragraphs of the short story.

Code

tail(land) %>% gt()

lines
'Ah, what a fine fellow!' exclaimed the Chief. 'He has gained much land!'
Pahóm's servant came running up and tried to raise him, but he saw that blood was flowing from his mouth. Pahóm was dead!
The Bashkírs clicked their tongues to show their pity.
His servant picked up the spade and dug a grave long enough for Pahóm to lie in, and buried him in it. Six feet from his head to his heels was all he needed.

Next, we break the story down into its constituent words. Here, unnest_tokens() splits the variable lines into individual words and stores them in a new column, word, so that each word in the short story occupies a separate row. We also drop tokens that begin with a digit.

Code

land_tokens <- land %>%
    unnest_tokens(word, lines) %>%
    filter(!str_detect(word, "^\\d.*"))

head(land_tokens, 20) %>%
    gt()

word
an
elder
sister
came
to
visit
her
younger
sister
in
the
country
the
elder
was
married
to
a
tradesman
in

Next, we remove stop words: words that carry little meaning on their own and mainly serve to join other words together, like “for”, or articles like “a” and “the”. In R, the get_stopwords() function supplies a dictionary of stop words that lets us quickly weed out these unnecessary words.

Code

final_words <- land_tokens %>%
    anti_join(get_stopwords(), by = "word")

head(final_words, 20) %>%
    gt(caption = "First 20 Words After Removing Stop Words")

First 20 Words After Removing Stop Words
word
elder
sister
came
visit
younger
sister
country
elder
married
tradesman
town
younger
peasant
village
sisters
sat
tea
talking
elder
began
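As a quick check of how the filter works, the sketch below tests the first three tokens of the story against the default stop word list, which get_stopwords() takes from the English "snowball" source. The object names are illustrative.

```r
library(tidytext)

# Default list: English, "snowball" source
stop_tbl <- get_stopwords()

first_tokens <- c("an", "elder", "sister")

# TRUE marks a token that anti_join() would remove
is_stop <- first_tokens %in% stop_tbl$word
is_stop
```

Other sources can be selected through the source argument of get_stopwords(), and a different list may keep or drop different words, so the word counts that follow depend on this choice.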

4 Analysis

4.1 Exploratory Analysis

To start the analysis we can count the occurrence of each key word in the story.

Code

final_words %>%
    count(word, sort = TRUE) %>%
    slice_head(n = 15) %>%
    gt(caption = "Prominent Words in the Short Story")

Prominent Words in the Short Story
word	n
land	79
pahóm	79
one	38
thought	31
went	28
said	27
now	20
bashkírs	18
chief	18
go	18
day	16
much	16
sun	16
came	14
began	13

Clearly, this story is about Pahom and land, given their prominence in the text. The frequency of “thought” suggests that much of the narrative follows what runs through the mind of the main character, while “chief” and “bashkírs” point to the secondary players. Most of the remaining words capture the conversations and actions in the story.

We plot the frequency of the 15 most used words in the short story.

Code

final_words %>%
    count(word, sort = TRUE) %>%
    slice_head(n = 15) %>%
    mutate(word = fct_reorder(word, n)) %>%
    ggplot(mapping = aes(x = word, y = n, size = n)) +
    geom_point(show.legend = FALSE) +
    ggthemes::theme_fivethirtyeight() +
    coord_flip() +
    labs(
        x = "", y = "Count",
        title = "Word Frequency"
    )

A word cloud is an alternative plot that shows the prominence of words in a piece of text. Its merit is that it can display more words than the frequency plot above.

Code

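# Count each word once, then pass words and frequencies explicitly
word_counts <- final_words %>%
    count(word, sort = TRUE)

wordcloud(
    words = word_counts$word,
    freq = word_counts$n,
    scale = c(2.5, 0.5),
    random.order = FALSE,
    colors = brewer.pal(8, "Dark2")
)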

4.2 Sentiment Analysis

In this section, we estimate the average sentiment or emotional content of words in Tolstoy’s story. I have written a [separate article](https://rpubs.com/Karuitha/sentiment_star) on sentiment analysis (Karuitha 2022). In summary, there are several tools that allow for the estimation of sentiment. In R, the common sentiment analysis dictionaries are:

Bing
Afinn
Loughran
NRC

Please refer to the literature for each of these lexicons. In this case, we start with the nrc dictionary, which has 10 categories (eight emotions plus positive and negative), listed below.

Code

get_sentiments("nrc") %>%
    count(sentiment) %>%
    pull(sentiment)

 [1] "anger"        "anticipation" "disgust"      "fear"         "joy"         
 [6] "negative"     "positive"     "sadness"      "surprise"     "trust"

The lexicon assigns a large set of English words to one or more of these categories, so a single word can be counted under several sentiments. However, there is still room for error. As noted earlier, people may use words in ways that do not correspond with their literal meaning. In such a case, the kind of analysis used in this article may be misleading.

Code

final_words %>%
    inner_join(get_sentiments("nrc"), by = "word") %>%
    count(sentiment) %>%
    arrange(desc(n)) %>%
    mutate(prop = n / sum(n)) %>%
    gt()

sentiment	n	prop
positive	259	0.22839506
anticipation	182	0.16049383
trust	147	0.12962963
negative	131	0.11552028
joy	96	0.08465608
sadness	77	0.06790123
anger	68	0.05996473
surprise	68	0.05996473
fear	63	0.05555556
disgust	43	0.03791887

Sentiment Score Using the NRC Method
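One point to keep in mind when reading these proportions is that they are shares of word-sentiment pairs, not of words, because a word can sit in several categories at once. The sketch below shows the effect with a hypothetical mini lexicon. The category assignments are made up for illustration and are not taken from the nrc dictionary.

```r
library(dplyr)

toy_words <- tibble(word = c("gain", "grave", "tea"))

# Hypothetical category assignments, for illustration only
toy_lexicon <- tibble(
    word = c("gain", "gain", "grave", "grave", "grave"),
    sentiment = c("positive", "anticipation", "negative", "sadness", "fear")
)

# Each matched word contributes one row per category it belongs to
matched <- toy_words %>%
    inner_join(toy_lexicon, by = "word")

tally <- matched %>%
    count(sentiment) %>%
    mutate(prop = n / sum(n))

tally
```

Two matched words produce five rows here, and the unmatched word is dropped altogether, so the denominator of prop is the number of pairs.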

Let us redo the analysis using the bing lexicon.

Code

final_words %>%
    inner_join(get_sentiments("bing"), by = "word") %>%
    count(sentiment) %>%
    arrange(desc(n)) %>%
    mutate(prop = n / sum(n)) %>%
    gt()

sentiment	n	prop
negative	142	0.5
positive	142	0.5

Sentiment Score Using the Bing Method

Here is the same analysis using the loughran lexicon, which was built for financial text and so may fit a literary story less well.

Code

final_words %>%
    inner_join(get_sentiments("loughran"), by = "word") %>%
    count(sentiment) %>%
    arrange(desc(n)) %>%
    mutate(prop = n / sum(n)) %>%
    gt()

sentiment	n	prop
negative	73	0.437125749
positive	47	0.281437126
uncertainty	25	0.149700599
litigious	21	0.125748503
constraining	1	0.005988024

Sentiment Score Using the Loughran Method

The afinn lexicon, which scores words from -5 to 5, gives an average that is only slightly positive.

Code

final_words %>%
    inner_join(get_sentiments("afinn"), by = "word") %>%
    summarise(average = mean(value)) %>%
    gt()

average
0.2276423

Sentiment Score Using the Afinn Method
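To put the three categorical lexicons on a common footing, the following sketch computes the positive share among words tagged positive or negative, using the counts reported in the tables above. The object and column names are illustrative.

```r
library(dplyr)

# Positive and negative counts copied from the tables above
polarity <- tibble(
    lexicon = c("nrc", "bing", "loughran"),
    positive = c(259, 142, 47),
    negative = c(131, 142, 73)
) %>%
    mutate(positive_share = positive / (positive + negative))

polarity
```

A share above 0.5 leans positive and a share below 0.5 leans negative, so this gives a simple way to see how far the lexicons agree.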

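Overall, the lexicons give a mixed picture. NRC assigns roughly twice as many words to positive as to negative (259 against 131), and the AFINN average of 0.23 is slightly above zero, whereas Bing is evenly split (142 each) and Loughran tags more words as negative (73) than positive (47). A positive lean is plausible for a story meant to provoke people to rethink their attitude to the pursuit of material wealth, but the evidence for it is modest. Like every great story, “How much land does a man need?” uses a great deal of suspense, which is consistent with the prominence of anticipation in the NRC results and of uncertainty in the Loughran results.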

5 Conclusion

In conclusion, Leo Tolstoy’s short story, “How much land does a man need?”, offers a timeless and cautionary tale that resonates with readers on multiple levels. Through the exploration of themes such as greed, ambition, and the pursuit of material wealth, Tolstoy masterfully crafts a narrative that exposes the destructive consequences of unchecked desires. The sentiment scores lean mildly positive under the NRC and AFINN lexicons, though not under Bing or Loughran, and the story is packed with uncertainty and anticipation, as any good story should be in using suspense to draw the reader’s curiosity.

The story serves as an allegory, urging readers to reflect upon the true worth of wealth and the dangers of insatiable cravings. By conducting sentiment analysis on this poignant tale, we gain a deeper understanding of the emotional nuances and underlying sentiments conveyed by Tolstoy, enabling us to appreciate the profound impact of his storytelling. Ultimately, “How much land does a man need?” stands as a powerful reminder of the importance of contentment and the perils of endless aspirations.

References

Karuitha, John. 2022. “Natural Language Processing in R: Sentiment Analysis of Kenya’s Star Newspaper on Saturday July 16, 2022.” https://rpubs.com/Karuitha/sentiment_star.

Silge, Julia, and David Robinson. 2017. Text Mining with R. 1st ed. Sebastopol, CA: O’Reilly Media.

Tolstoy, Leo. 1905. How Much Land Does a Man Need? Moscow, Russia: The Literature Network.

Wickham, Hadley. 2022. “Rvest: Easily Harvest (Scrape) Web Pages.” https://CRAN.R-project.org/package=rvest.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.