4 01 2020

Goals of the project

  • Trace the evolution of the English language in literature from the 17th to the 20th century
  • Cross-Century Analysis of Sentiment
  • Cross-Century Analysis of Parts of Speech

Dataset Description

  • Books collected from Project Gutenberg
  • Books sampled for each century according to their popularity ranking on www.goodreads.com

Dataset Preparation

  • Only words kept: extracted from the books with regular expressions
  • Lemmatization with the Stanford NLP library (Python)
  • Stopwords removed with the R package tm
  • Sentiment analysis of lemmatized words with VADER (Python)
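The regex extraction and stopword removal steps can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the project's code: the regular expression and the tiny `STOPWORDS` set are hypothetical (the project used the stopword list from R's tm), and lemmatization and VADER scoring are left out because they need external libraries.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only;
# the project itself used the English stopword list from the R package tm.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"})

# Assumed pattern: runs of ASCII letters, optionally joined by one internal
# apostrophe ("don't"); digits and punctuation never match.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def extract_words(text: str) -> list[str]:
    """Return the lower-cased words matched by WORD_RE, in order."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]


def remove_stopwords(words: list[str]) -> list[str]:
    """Keep only the words that are not in STOPWORDS."""
    return [word for word in words if word not in STOPWORDS]


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times (1859)."
    print(remove_stopwords(extract_words(sample)))
    # → ['best', 'times', 'worst', 'times']
```

An ASCII-only pattern is a simplification: accented letters or curly apostrophes, which occur in some Gutenberg texts, would split words and may need a broader character class.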

Initial Analysis

Number of Authors vs Number of Titles

Birthplaces of most popular authors

Total Number of Words vs Number of Unique Words per Century

Ratio of Unique Words to All Words
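The per-century comparisons rest on two counts: total word occurrences and distinct words, whose quotient gives the ratio of unique words to all words. A minimal sketch, assuming each century's cleaned words are already collected in a list; the `vocabulary_stats` name and the toy corpus are hypothetical.

```python
def vocabulary_stats(words_by_century: dict[str, list[str]]) -> dict[str, tuple[int, int, float]]:
    """Map each century to (total words, unique words, unique/total ratio)."""
    stats = {}
    for century, words in words_by_century.items():
        total = len(words)          # every word occurrence (tokens)
        unique = len(set(words))    # distinct words (types)
        ratio = unique / total if total else 0.0
        stats[century] = (total, unique, ratio)
    return stats


if __name__ == "__main__":
    # Hypothetical toy corpus; real input would be the cleaned words per century.
    corpus = {
        "17th": ["king", "sword", "king", "honour"],
        "20th": ["car", "phone", "car", "car"],
    }
    for century, (total, unique, ratio) in vocabulary_stats(corpus).items():
        print(f"{century}: {total} words, {unique} unique, ratio {ratio:.2f}")
```

One caveat: the ratio tends to fall as a text grows, because common words keep repeating, so centuries with very different total word counts are easier to compare on equal-sized samples.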

Part-of-Speech Analysis

Stacked Bar Plot of Part-of-Speech Percentages for Unique Words
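Each stacked bar needs, per century, the percentage of unique words in each part of speech, so the segments sum to 100. A minimal sketch, assuming the tagger's output is available as (century, lemma, UPOS tag) triples and that a "unique word" means a distinct (lemma, tag) pair; both the input shape and the `pos_percentages` name are hypothetical.

```python
from collections import Counter


def pos_percentages(tagged: list[tuple[str, str, str]]) -> dict[str, dict[str, float]]:
    """tagged holds (century, lemma, upos) triples, one per word occurrence.

    Returns century -> {upos tag: percentage of that century's unique
    (lemma, upos) pairs}; each inner dict sums to 100, i.e. one stacked bar.
    """
    unique_pairs: dict[str, set[tuple[str, str]]] = {}
    for century, lemma, upos in tagged:
        unique_pairs.setdefault(century, set()).add((lemma, upos))

    percentages = {}
    for century, pairs in unique_pairs.items():
        counts = Counter(upos for _, upos in pairs)
        percentages[century] = {tag: 100 * n / len(pairs) for tag, n in counts.items()}
    return percentages


if __name__ == "__main__":
    # Hypothetical tagged lemmas; in practice the tags would come from the tagger.
    sample = [
        ("17th", "king", "NOUN"), ("17th", "king", "NOUN"),
        ("17th", "fight", "VERB"), ("17th", "brave", "ADJ"),
        ("20th", "car", "NOUN"), ("20th", "drive", "VERB"),
    ]
    print(pos_percentages(sample))
```

Counting (lemma, tag) pairs means a word used as both noun and verb contributes to both segments; counting bare lemmas instead would require picking one tag per lemma.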

Sentiment Analysis

Share of Each Sentiment Type in All Words
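With one VADER compound score per lemmatized word (in the project these come from VADER; with the `vaderSentiment` package that would be `SentimentIntensityAnalyzer().polarity_scores(word)["compound"]`), each word can be labelled and the labels counted against the total. A minimal sketch: the ±0.05 cut-off follows the convention suggested in the VADER documentation and is an assumption about this project, and the scores below are hard-coded hypothetical values.

```python
from collections import Counter

SENTIMENT_TYPES = ("positive", "neutral", "negative")


def sentiment_type(compound: float, threshold: float = 0.05) -> str:
    """Label a VADER-style compound score (range -1..1) with a symmetric cut-off."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(scores: list[float]) -> dict[str, float]:
    """Fraction of all word occurrences falling into each sentiment type."""
    if not scores:
        return {label: 0.0 for label in SENTIMENT_TYPES}
    counts = Counter(sentiment_type(score) for score in scores)
    return {label: counts[label] / len(scores) for label in SENTIMENT_TYPES}


if __name__ == "__main__":
    # Hypothetical compound scores, one per word occurrence.
    print(sentiment_shares([0.6369, 0.0, 0.0, -0.4767]))
    # → {'positive': 0.25, 'neutral': 0.5, 'negative': 0.25}
```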

Is there a progressive simplification of the English language?

Basic English by Charles Kay Ogden

Use of Basic English words in literature: analysis of all words used

Use of Basic English words in literature: analysis of unique words
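The two views differ only in what is counted: every word occurrence, or each distinct word once. A minimal sketch, assuming the text has already been lemmatized (so inflected forms match Ogden's base forms); `BASIC_ENGLISH` here is a hypothetical excerpt of the 850-word list, and `basic_coverage` is an illustrative name.

```python
# Hypothetical excerpt of Ogden's 850-word Basic English list; a real analysis
# would load the complete list.
BASIC_ENGLISH = frozenset({"come", "go", "house", "man", "good", "water", "give", "take"})


def basic_coverage(words: list[str], basic: frozenset[str] = BASIC_ENGLISH) -> tuple[float, float]:
    """Return (share of all word occurrences in `basic`, share of unique words in `basic`)."""
    if not words:
        return 0.0, 0.0
    all_share = sum(word in basic for word in words) / len(words)
    unique = set(words)
    unique_share = len(unique & basic) / len(unique)
    return all_share, unique_share


if __name__ == "__main__":
    lemmas = ["man", "go", "house", "man", "tempest", "man"]
    all_share, unique_share = basic_coverage(lemmas)
    print(f"all words: {all_share:.1%}, unique words: {unique_share:.1%}")
    # → all words: 83.3%, unique words: 75.0%
```

Because frequent words repeat, the share over all words weighs common vocabulary more heavily, while the share over unique words reflects the breadth of vocabulary.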

Word map: 20th century

Word map: 17th century
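A word map of this kind is usually sized by word frequency, so its input is just the most common words of a century with their counts. A minimal sketch, assuming the century's cleaned words are in a list; `top_frequencies` and the cut-off `n` are hypothetical. The result can be passed as a dict to a plotting tool, for example the Python `wordcloud` package's `WordCloud.generate_from_frequencies`.

```python
from collections import Counter


def top_frequencies(words: list[str], n: int = 100) -> list[tuple[str, int]]:
    """Return the n most frequent words with their counts, most frequent first."""
    return Counter(words).most_common(n)


if __name__ == "__main__":
    words = ["ship", "sea", "ship", "storm", "sea", "ship"]
    print(top_frequencies(words, n=2))
    # → [('ship', 3), ('sea', 2)]
```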