Mining Text and Exploring Sentiments in Leo Tolstoy's ‘How Much Land Does a Man Need?’
A Sentimental Journey through Greed, Consequence, and Redemption in Tolstoy’s Timeless Tale
Author: John Karuitha (Karatina University)
Published: May 29, 2023
Modified: May 29, 2023
1 Background
In conducting sentiment analysis on Leo Tolstoy’s short story “How much land does a man need?” (Tolstoy 1905), our primary objective is to illustrate automated text mining in R. The secondary objective is to examine the sentiments conveyed in the text using a quantitative approach. By analyzing the story through this lens, we aim to gain a deeper understanding of the characters, themes, and overall message conveyed by Tolstoy.
Tolstoy’s “How much land does a man need?” delves into the timeless themes of greed, ambition, and the pursuit of material wealth. The narrative follows a peasant named Pahom, who becomes consumed by the desire for more land. His boast that with enough land he would not fear the Devil himself draws the Devil’s attention, and as Pahom accumulates more plots, his insatiable greed pushes him from one land deal to the next. The climax comes when the Bashkírs offer him as much land as he can walk around in a single day, provided he returns to his starting point by sunset; Pahom’s relentless pursuit of more ultimately leads to his death.
This poignant tale serves as a cautionary allegory, highlighting the destructive consequences of unchecked ambition and the pitfalls of materialism. Tolstoy’s powerful storytelling invites readers to reflect on the true value of wealth and the detrimental effects of never-ending desires.
By analyzing the sentiment in this short story, we can explore how Tolstoy’s language conveys emotions such as fear, anticipation, and sadness alongside the themes of greed and ambition. By scoring the words of both the narrative and the characters’ dialogue, we aim to gauge the emotional tone of the story. This sentiment analysis offers a quantitative view of the text that complements, rather than replaces, a nuanced reading of the themes explored in “How much land does a man need?”
2 The Approach
Our approach, quantitative sentiment scoring implemented in the R programming language, uses a predefined lexicon to assign numerical values or sentiment labels to words and then aggregates them to determine the sentiment polarity of the text. This approach has its strengths: it allows for automated analysis and provides a quantitative assessment of sentiment (Silge and Robinson 2017).
However, it’s important to acknowledge a potential limitation of this approach. Quantitative sentiment scoring might not always capture the true meaning of words or phrases, especially when they are used sarcastically or in a context that deviates from their literal interpretation. Sarcasm, irony, and other forms of nuanced language can be challenging to accurately detect and interpret solely based on quantitative scoring methods.
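The scoring idea, and the blind spot just described, can be sketched in a few lines of base R. The four-word lexicon and its scores below are hypothetical, chosen only for illustration; published lexicons are far larger but are applied in the same look-up-and-add way. The last call shows why context matters: "not good" still scores as positive because each word is scored on its own.

```r
# Hypothetical toy lexicon: illustrative scores, not from any published lexicon
toy_lexicon <- c(good = 2, fine = 2, ruin = -3, dead = -3)

# Score a text by summing the scores of the words it contains
score_text <- function(text, lexicon) {
  # Lowercase, then split on anything that is not an ASCII letter
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[words != ""]
  # Words missing from the lexicon look up as NA and contribute nothing
  sum(lexicon[words], na.rm = TRUE)
}

score_text("All will go to ruin", toy_lexicon)    # negative
score_text("What a fine fellow", toy_lexicon)     # positive
score_text("It is not good at all", toy_lexicon)  # positive: the negation is ignored
```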
While the quantitative approach can provide valuable insights into overall sentiment trends and patterns within the text, it’s crucial to consider the context and use additional qualitative analysis techniques to capture the full meaning and subtleties of the sentiments expressed. Combining quantitative scoring with qualitative assessment, such as contextual analysis or manual review, can help mitigate this limitation and provide a more comprehensive understanding of the sentiment conveyed in Tolstoy’s “How much land does a man need?”
We adopt the tidy approach to text analysis, following the tidy data principles that underpin the Tidyverse (Wickham et al. 2019). In tidy data, each variable is a column and each observation is a row, so the data form a rectangular table. For text, this means a table with one token per row (Silge and Robinson 2017).
Let’s define some important terms. A ‘corpus’ is a collection of text documents, typically stored as raw strings with accompanying metadata. In our case, the short story is the corpus we analyze. When a corpus contains several documents, it can be represented as a document-term matrix (DTM): a matrix with one row for each document, one column for each term in the document set, and cells holding the number of times each term appears in each document. Because most documents use only a small share of the vocabulary, most cells are zero, so DTMs are usually stored as sparse matrices.
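As an illustration of these terms, the base-R sketch below starts from a small tidy table with one row per document-word pair (the documents and counts are hypothetical) and casts it into a document-term matrix. For real corpora, tidytext’s cast_dtm() performs the same conversion into a sparse matrix from the tm package.

```r
# Hypothetical tidy counts: one row per (document, word) pair
counts <- data.frame(
  document = c("doc1", "doc1", "doc2", "doc2", "doc2"),
  word     = c("land", "sister", "land", "bashkirs", "chief"),
  n        = c(3, 1, 5, 2, 2)
)

# Cast to a document-term matrix: rows = documents, columns = words.
# xtabs() sums n for each document/word cell and fills absent pairs with 0.
dtm <- xtabs(n ~ document + word, data = counts)
print(dtm)

# Share of cells that are zero (the "sparsity" of the matrix)
sparsity <- mean(dtm == 0)
sparsity
```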
A ‘token’ represents a meaningful unit of text, such as a word, sentence, or tweet. Tokenization involves the process of dividing text into these individual tokens. In our analysis, we employed the unnest_tokens() function from the tidytext package to split the short story into its constituent words. The result is a table where each row corresponds to a single token or word.
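Here is a minimal sketch of that step on two short, hypothetical paragraphs stored the way the story is stored in this article (one paragraph per row, in a column named lines). It assumes the dplyr and tidytext packages are installed.

```r
library(dplyr)
library(tidytext)

# Two hypothetical paragraphs, one per row, text in a column called `lines`
paragraphs <- tibble(
  paragraph = 1:2,
  lines = c("An elder sister came to visit her younger sister.",
            "The younger sister was piqued.")
)

# One row per word: unnest_tokens() lowercases the text, strips punctuation,
# drops the original `lines` column and keeps the other columns (`paragraph`)
tokens <- paragraphs %>%
  unnest_tokens(word, lines)

tokens
```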
3 Data
We start by loading the R packages used in the analysis.
Let us look at the first six paragraphs of the short story (head() returns six rows by default).
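The package-loading code is not reproduced here. Judging from the functions called later (unnest_tokens(), str_detect(), fct_reorder(), ggplot(), gt()), a setup along the following lines would be needed; it is a reconstruction rather than the author’s original code, and it does not cover how the land data frame was created.

```r
# Packages assumed from the functions used later in the article
library(tidyverse)  # dplyr, stringr, forcats, ggplot2 and the %>% pipe
library(tidytext)   # unnest_tokens(), get_sentiments()
library(gt)         # gt() for rendering tables
# ggthemes is called as ggthemes::theme_fivethirtyeight(), so it only
# needs to be installed, not attached
```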
head(land) %>% gt()
lines
An elder sister came to visit her younger sister in the country. The elder was married to a tradesman in town, the younger to a peasant in the village. As the sisters sat over their tea talking, the elder began to boast of the advantages of town life: saying how comfortably they lived there, how well they dressed, what fine clothes her children wore, what good things they ate and drank, and how she went to the theater, promenades, and entertainments.
The younger sister was piqued, and in turn disparaged the life of a tradesman, and stood up for that of a peasant.
'I would not change my way of life for yours,' said she. 'We may live roughly, but at least we are free from anxiety. You live in better style than we do, but though you often earn more than you need, you are very likely to lose all you have. You know the proverb, "Loss and gain are brothers twain." It often happens that people who are wealthy one day are begging their bread the next. Our way is safer. Though a peasant's life is not a fat one, it is a long one. We shall never grow rich, but we shall always have enough to eat.'
The elder sister said sneeringly:
'Enough? Yes, if you like to share with the pigs and the calves! What do you know of elegance or manners! However much your good man may slave, you will die as you are living—on a dung heap—and your children the same.'
'Well, what of that?' replied the younger. 'Of course our work is rough and coarse. But, on the other hand, it is sure; and we need not bow to any one. But you, in your towns, are surrounded by temptations; to-day all may be right, but to-morrow the Evil One may tempt your husband with cards, wine, or women, and all will go to ruin. Don't such things happen often enough?'
Here are the closing paragraphs of the short story.
tail(land) %>% gt()
lines
'Ah, what a fine fellow!' exclaimed the Chief. 'He has gained much land!'
Pahóm's servant came running up and tried to raise him, but he saw that blood was flowing from his mouth. Pahóm was dead!
The Bashkírs clicked their tongues to show their pity.
His servant picked up the spade and dug a grave long enough for Pahóm to lie in, and buried him in it. Six feet from his head to his heels was all he needed.
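Next, we break the story down into its constituent words. Here, unnest_tokens() splits the text in the variable lines into individual words and stores them in a new column named word, so that each word in the story becomes a separate row. By default, it also converts words to lowercase and strips punctuation. The filter() step then drops any token that starts with a digit.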
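# Split each paragraph (column `lines`) into one lowercase word per row,
# then drop tokens that start with a digit
land_tokens <- land %>%
  unnest_tokens(word, lines) %>%
  filter(!str_detect(word, "^\\d.*"))

head(land_tokens, 20) %>% gt()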
word
an
elder
sister
came
to
visit
her
younger
sister
in
the
country
the
elder
was
married
to
a
tradesman
in
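The filter() step above relies on the regular expression "^\\d.*", which matches any token whose first character is a digit. The sketch below applies the same test to a handful of hypothetical tokens using stringr.

```r
library(stringr)

# Hypothetical tokens, some of which start with a digit
tokens <- c("land", "1000", "5th", "roubles", "sun")

# "^\\d.*" matches tokens whose first character is a digit;
# negating the match keeps only tokens that do not start with one
kept <- tokens[!str_detect(tokens, "^\\d.*")]
kept
```

Because str_detect() only checks whether the pattern occurs somewhere, the trailing .* is not needed: "^\\d" keeps exactly the same tokens.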
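Next, we remove stop words: common words such as “for” or the articles “a” and “the” that carry little meaning on their own and mainly serve to join other words together. R packages provide ready-made stopword lists that let us quickly weed out such unnecessary words. The remaining words are stored in final_words, which the rest of the analysis uses.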
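The code for this step is not shown, so the exact stopword list used to build final_words is unknown. One common way to do it, assumed in the sketch below, is an anti_join() against the stop_words table bundled with tidytext; the tokens are hypothetical. A different stopword list would remove a different set of words.

```r
library(dplyr)
library(tidytext)

# Hypothetical tokens in the same one-word-per-row shape as land_tokens
tokens <- tibble(word = c("an", "elder", "sister", "came", "to",
                          "visit", "her", "younger", "sister"))

# Keep only the tokens that do NOT appear in tidytext's stop_words table
content_words <- tokens %>%
  anti_join(stop_words, by = "word")

content_words
```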
To start the analysis we can count the occurrence of each key word in the story.
final_words %>%
  count(word, sort = TRUE) %>%
  slice_head(n = 15) %>%
  gt(caption = "Prominent Words in the Short Story")
Prominent Words in the Short Story
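| word | n |
|---|---|
| land | 79 |
| pahóm | 79 |
| one | 38 |
| thought | 31 |
| went | 28 |
| said | 27 |
| now | 20 |
| bashkírs | 18 |
| chief | 18 |
| go | 18 |
| day | 16 |
| much | 16 |
| sun | 16 |
| came | 14 |
| began | 13 |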
Unsurprisingly, given their prominence in the text (79 occurrences each), the story centers on Pahom and land. The frequency of “thought” suggests that much of the narrative follows what runs through the mind of the main character, Pahom, while the Chief and the Bashkírs are the most prominent secondary characters. Most of the remaining frequent words (went, said, go, came, began) capture the story’s dialogue and action.
We plot the frequency of the 15 most used words in the short story.
final_words %>%
  count(word, sort = TRUE) %>%
  slice_head(n = 15) %>%
  mutate(word = fct_reorder(word, n)) %>%
  ggplot(mapping = aes(x = word, y = n, size = n)) +
  geom_point(show.legend = FALSE) +
  ggthemes::theme_fivethirtyeight() +
  coord_flip() +
  labs(x = "", y = "Count", title = "Word Frequency")
Prominent Words in the Short Story
A word cloud is an alternative plot that shows the prominence of words in a piece of text. Its advantage over the frequency plot above is that it can display many more words in the same space.
In this section, we estimate the sentiment and emotional content of the words in Tolstoy’s story. I have written a [separate article](https://rpubs.com/Karuitha/sentiment_star) on sentiment analysis (Karuitha 2022). In summary, several tools allow for the estimation of sentiment. In R, the tidytext function get_sentiments() gives access to four common sentiment lexicons:
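The word cloud itself is not reproduced here. As a sketch, the wordcloud package (an assumption; the article does not say which package was used) can draw one from the 15 word counts in the table above.

```r
library(dplyr)
library(wordcloud)  # assumed to be installed; not used elsewhere in the article

# The 15 most frequent words and their counts, copied from the table above
word_counts <- tibble(
  word = c("land", "pahóm", "one", "thought", "went", "said", "now",
           "bashkírs", "chief", "go", "day", "much", "sun", "came", "began"),
  n = c(79, 79, 38, 31, 28, 27, 20, 18, 18, 18, 16, 16, 16, 14, 13)
)

set.seed(1)  # word placement is random; fix the seed for a reproducible layout
wordcloud(words = word_counts$word, freq = word_counts$n,
          min.freq = 1, random.order = FALSE)  # most frequent words in the center
```

With the full set of counts from final_words, the same call can show many more words; the max.words argument caps how many are drawn.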
Bing
AFINN
Loughran
NRC
Please refer to the literature for details of each lexicon. In this case, we use the NRC lexicon, which has 10 classes of sentiment: two sentiments (positive and negative) and eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust).
Words in the lexicon have been manually annotated with one or more of these classes, so a single word can count towards several of them. However, there is still room for error. As noted earlier, people may use words in ways that do not correspond to their literal meaning, in which case the kind of analysis used in this article may be misleading.
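Before running the join on the full story, it helps to see what the counts and proportions in the next table measure. The sketch below uses hypothetical tokens and a hypothetical NRC-style mini lexicon with the same word and sentiment columns (the real NRC lexicon is downloaded through the textdata package the first time it is requested). Because one word can carry several classes, the proportions are shares of class assignments rather than shares of words.

```r
library(dplyr)

# Hypothetical tokens and a hypothetical NRC-style lexicon (word, sentiment);
# "devil" carries three classes, as real NRC entries often do
tokens <- tibble(word = c("land", "devil", "pity", "fine", "fine"))
mini_lexicon <- tibble(
  word      = c("devil", "devil", "devil", "pity", "fine"),
  sentiment = c("anger", "fear", "negative", "sadness", "positive")
)

sentiment_shares <- tokens %>%
  inner_join(mini_lexicon, by = "word") %>%  # unmatched words ("land") drop out
  count(sentiment, sort = TRUE) %>%          # one count per class
  mutate(prop = n / sum(n))                  # share of all class assignments

sentiment_shares
```

Four tokens match, yet the counts add up to six class assignments. The NRC table below works the same way: its counts sum to 1,134 assignments. With dplyr 1.1.0 or later, joining repeated story words to a lexicon that lists a word several times may produce a many-to-many warning; passing relationship = "many-to-many" to inner_join() marks this as intended.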
final_words %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  count(sentiment) %>%
  arrange(desc(n)) %>%
  mutate(prop = n / sum(n)) %>%
  gt()
| sentiment | n | prop |
|---|---|---|
| positive | 259 | 0.22839506 |
| anticipation | 182 | 0.16049383 |
| trust | 147 | 0.12962963 |
| negative | 131 | 0.11552028 |
| joy | 96 | 0.08465608 |
| sadness | 77 | 0.06790123 |
| anger | 68 | 0.05996473 |
| surprise | 68 | 0.05996473 |
| fear | 63 | 0.05555556 |
| disgust | 43 | 0.03791887 |
Sentiment Score Using the NRC Method
Let us redo the analysis using the Bing lexicon.
final_words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment) %>%
  arrange(desc(n)) %>%
  mutate(prop = n / sum(n)) %>%
  gt()
| sentiment | n | prop |
|---|---|---|
| negative | 142 | 0.5 |
| positive | 142 | 0.5 |
Sentiment Score Using the Bing Method
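Unlike NRC, the Bing lexicon labels each word only as positive or negative, so its results reduce naturally to a net score: positive matches minus negative matches, which for the table above is 142 - 142 = 0. A minimal sketch using the Bing lexicon bundled with tidytext and a few hypothetical tokens:

```r
library(dplyr)
library(tidytext)

# The Bing lexicon ships with tidytext, so no download is needed
bing <- get_sentiments("bing")

# Every entry is labelled either "positive" or "negative"
table(bing$sentiment)

# Hypothetical tokens standing in for final_words
tokens <- tibble(word = c("land", "fine", "dead", "happy", "sun"))

# Count positive and negative matches (unmatched words drop out)
matched <- tokens %>%
  inner_join(bing, by = "word") %>%
  count(sentiment)

# Net sentiment: positive matches minus negative matches
net <- matched %>%
  summarise(net = sum(n[sentiment == "positive"]) - sum(n[sentiment == "negative"]))

net
```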
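Here is the same analysis using the Loughran lexicon. This lexicon (Loughran-McDonald) was developed for financial documents, which explains categories such as “litigious” and “constraining” that rarely fit a work of fiction.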
final_words %>%
  inner_join(get_sentiments("loughran"), by = "word") %>%
  count(sentiment) %>%
  arrange(desc(n)) %>%
  mutate(prop = n / sum(n)) %>%
  gt()
| sentiment | n | prop |
|---|---|---|
| negative | 73 | 0.437125749 |
| positive | 47 | 0.281437126 |
| uncertainty | 25 | 0.149700599 |
| litigious | 21 | 0.125748503 |
| constraining | 1 | 0.005988024 |
Sentiment Score Using the Loughran Method
The AFINN lexicon shows that the story is largely positive.
Overall, the story leans positive under the NRC and AFINN lexicons, while the Bing lexicon splits evenly and the Loughran lexicon leans negative. A positive lean fits the story’s purpose of provoking readers to rethink their attitude to the pursuit of material wealth. Like every great story, “How much land does a man need?” uses a great deal of suspense and hence scores high in both anticipation (NRC) and uncertainty (Loughran).
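The AFINN code and output are not shown. AFINN assigns each word an integer score from -5 (most negative) to +5 (most positive), so a text can be summarized by the total or mean score of its matched words. The sketch below uses a hypothetical stand-in with illustrative scores, because get_sentiments("afinn") needs a one-time download through the textdata package; recent tidytext versions return the scores in a column named value, which the stand-in mimics.

```r
library(dplyr)

# Hypothetical stand-in for get_sentiments("afinn"): same `word`/`value`
# columns, but the scores are illustrative only
mini_afinn <- tibble(
  word  = c("fine", "happy", "dead", "ruin"),
  value = c(2, 3, -3, -2)
)

# Hypothetical tokens standing in for final_words
tokens <- tibble(word = c("land", "fine", "fine", "happy", "dead"))

afinn_summary <- tokens %>%
  inner_join(mini_afinn, by = "word") %>%   # "land" has no score and drops out
  summarise(total = sum(value),             # overall polarity
            mean_score = mean(value),       # average score per matched word
            matched = n())                  # how many tokens were scored

afinn_summary
```

A positive mean indicates that, among the words the lexicon recognizes, positive scores outweigh negative ones; words outside the lexicon do not affect the result at all.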
5 Conclusion
In conclusion, Leo Tolstoy’s short story, “How much land does a man need?”, offers a timeless and cautionary tale that resonates with readers on multiple levels. Through the exploration of themes such as greed, ambition, and the pursuit of material wealth, Tolstoy masterfully crafts a narrative that exposes the destructive consequences of unchecked desires. The lexicon scores suggest that the story leans positive overall, though it is packed with uncertainty and anticipation, as any good story should be when it uses suspense to arouse the reader’s curiosity.
The story serves as an allegory, urging readers to reflect upon the true worth of wealth and the dangers of insatiable cravings. By conducting sentiment analysis on this poignant tale, we gain a deeper understanding of the emotional nuances and underlying sentiments conveyed by Tolstoy, enabling us to appreciate the profound impact of his storytelling. Ultimately, “How much land does a man need?” stands as a powerful reminder of the importance of contentment and the perils of endless aspirations.
References
Karuitha, John. 2022. “Natural Language Processing in R: Sentiment Analysis of Kenya’s Star Newspaper on Saturday July 16, 2022.” https://rpubs.com/Karuitha/sentiment_star.
Silge, Julia, and David Robinson. 2017. Text Mining with R. 1st ed. Sebastopol, CA: O’Reilly Media.
Tolstoy, Leo. 1905. How Much Land Does a Man Need? Moscow, Russia: The Literature Network.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.