4 01 2020
Goals of the project
- Trace the evolution of the English language in literature from the 17th to the 20th century
- Cross-century analysis of sentiment
- Cross-century analysis of parts of speech
Dataset Description
- Books collected from Project Gutenberg
- Books sampled for each century from the most popular titles as ranked on www.goodreads.com
Dataset Preparation
- Only words kept from the book texts, extracted with a regular expression
- Lemmatization with the Stanford NLP library (Python)
- Stopword removal with the R package tm
- Sentiment analysis of the lemmatized words with VADER (Python)
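The first and third preparation steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the regex pattern, the function names and the tiny stopword set are hypothetical (the project itself removed stopwords with the R package tm), and lemmatization is skipped.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only;
# the project used the stopword list shipped with the R package tm.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "is", "was"}

# Assumed pattern: runs of ASCII letters, optionally joined by one apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def extract_words(text):
    """Return lowercase word tokens, dropping digits and punctuation."""
    return WORD_RE.findall(text.lower())


def remove_stopwords(words, stopwords=STOPWORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [w for w in words if w not in stopwords]


sample = "It was the best of times, it was the worst of times."
words = extract_words(sample)
content_words = remove_stopwords(words)
print(content_words)
```

In the real pipeline the lemmatizer would run between the two functions, so that stopwords are matched against lemmas rather than inflected forms.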
Initial analysis
Number of Authors vs Number of Titles

Birthplaces of most popular authors
Total Number of Words vs Count of Unique Words for each Century

Ratio of Unique Words to All Words
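A minimal sketch of how the per-century word counts and the unique-to-all ratio could be computed. The toy corpora and the function name are hypothetical; the real input would be the lemmatized word lists for each century.

```python
def vocabulary_stats(words):
    """Return total tokens, unique tokens and their ratio for one word list."""
    total = len(words)
    unique = len(set(words))
    ratio = unique / total if total else 0.0
    return {"total": total, "unique": unique, "ratio": ratio}


# Hypothetical toy data standing in for the lemmatized books of each century.
corpus_by_century = {
    "17th": ["thou", "art", "fair", "thou", "art", "wise"],
    "20th": ["you", "are", "fair", "you", "are", "you", "are", "wise"],
}

stats = {century: vocabulary_stats(words)
         for century, words in corpus_by_century.items()}

for century, s in stats.items():
    print(century, s["total"], s["unique"], round(s["ratio"], 3))
```

Note that this ratio tends to fall as a corpus grows, so centuries with very different total word counts are not directly comparable without sampling equal-sized texts.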

Part of speech analysis
Stacked Barplot with Percentage of Parts of Speech for Unique Words
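The percentages behind such a stacked bar plot could be derived as follows. This sketch assumes the words have already been tagged and arrive as (word, tag) pairs; the tag names, the toy data and the choice to count a word once per distinct tag are all assumptions, not the project's confirmed method.

```python
from collections import Counter


def pos_percentages(tagged_words):
    """Percentage of each part of speech among unique (word, tag) pairs."""
    unique_pairs = set(tagged_words)  # a word counts once per distinct tag
    counts = Counter(tag for _, tag in unique_pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100.0 * n / total for tag, n in counts.items()}


# Hypothetical tagged tokens for one century.
tagged = [
    ("love", "NOUN"), ("love", "VERB"), ("love", "NOUN"),
    ("fair", "ADJ"), ("sea", "NOUN"),
]

percentages = pos_percentages(tagged)
print(percentages)
```

Each century's dictionary then supplies one bar, with the tag percentages as its stacked segments.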

Sentiment Analysis
Share of Each Sentiment Type among All Words
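One way to turn per-word sentiment scores into the share of each sentiment type is sketched below. The scores in the toy lexicon are invented for illustration and are not VADER values; the 0.05 cut-off mirrors the threshold commonly used with VADER compound scores, but whether the project applied it is an assumption.

```python
from collections import Counter

# Hypothetical per-word scores; real values would come from the sentiment tool.
TOY_SCORES = {"love": 0.6, "joy": 0.5, "war": -0.6, "grief": -0.5}

LABELS = ("positive", "negative", "neutral")


def classify(score, threshold=0.05):
    """Map a score to a sentiment label using an assumed symmetric threshold."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(words, scores=TOY_SCORES):
    """Fraction of all words falling into each sentiment type."""
    counts = Counter(classify(scores.get(w, 0.0)) for w in words)
    total = len(words)
    return {label: (counts[label] / total if total else 0.0)
            for label in LABELS}


words = ["love", "war", "sea", "ship", "joy", "stone", "grief", "road"]
shares = sentiment_shares(words)
print(shares)
```

Words missing from the lexicon are treated as neutral here, which is why neutral words usually dominate the totals.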

Is there a progressive simplification of the English language?
Basic English by Charles Kay Ogden

Use of basic words in literature - analysis of all used words

Use of basic words in literature - analysis of unique words
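The two views of basic-word usage, over all words and over unique words, can be sketched as below. The word set is only a short illustrative excerpt, not Ogden's full Basic English list, and the function name and toy tokens are hypothetical.

```python
# Illustrative excerpt only; the full Basic English list is much longer.
BASIC_WORDS = {"come", "get", "give", "go", "keep", "let", "make", "put"}


def basic_coverage(words, basic=BASIC_WORDS):
    """Return (share of all tokens, share of unique words) that are basic."""
    unique = set(words)
    all_share = sum(w in basic for w in words) / len(words) if words else 0.0
    unique_share = len(unique & basic) / len(unique) if unique else 0.0
    return all_share, unique_share


# Hypothetical lemmatized tokens for one century.
words = ["go", "go", "make", "wander", "go", "ponder", "give", "go"]
all_share, unique_share = basic_coverage(words)
print(round(all_share, 2), round(unique_share, 2))
```

The two numbers can diverge considerably: a few very frequent basic words raise the all-words share, while a large rare vocabulary lowers the unique-words share.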

Wordmap - 20th century
Wordmap - 17th century