Introduction

Goodreads.com is a popular literary social media site that catalogs books, provides a platform for ratings and reviews, and helps users find their next book based on their interests and the recommendations of other users. With over 20 million users and 10 million books catalogued, Goodreads data is a valuable source for analysing how readers themselves judge books, independent of professional critics. The following analysis examines book popularity from several perspectives and visualizes popularity trends.


Dataset

The dataset used for this analysis is goodbooks-10k from GitHub user Zygmunt Zajac, available at https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/books.csv. The set includes the 10,000 books with the most reviews, with a total of 6 million ratings. Reviews are written opinions of a work, while ratings are numerical scores from 1 to 5, with 5 being the most positive. Books that come in multiple formats or editions are included as a single record, considered a “work.”
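The report does not show how the file was read. As a minimal sketch, the Python standard library is enough to load books.csv and convert the fields that the sections below rely on. The column names (original_publication_year, title, language_code, average_rating, ratings_count, work_text_reviews_count, ratings_5) are assumed to match the file's header. The function names parse_books and load_books are hypothetical.

```python
import csv
import io
import urllib.request

# URL given in the Dataset section.
BOOKS_URL = "https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/books.csv"


def parse_books(csv_text):
    """Parse books.csv text into dicts holding the fields used in this analysis.

    Assumed columns: original_publication_year, title, language_code,
    average_rating, ratings_count, work_text_reviews_count, ratings_5.
    """
    books = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        year_text = (row.get("original_publication_year") or "").strip()
        books.append({
            "title": row.get("title", ""),
            # Empty language codes become None so later steps can skip them.
            "language": (row.get("language_code") or "").strip() or None,
            # Years are stored as floats such as "2008.0"; BCE years are assumed negative.
            "year": int(float(year_text)) if year_text else None,
            "average_rating": float(row["average_rating"]),
            "ratings_count": int(row["ratings_count"]),
            "text_reviews": int(row["work_text_reviews_count"]),
            "ratings_5": int(row["ratings_5"]),
        })
    return books


def load_books(url=BOOKS_URL):
    """Download books.csv and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_books(response.read().decode("utf-8"))
```

Calling load_books() fetches the file and returns one dict per book. parse_books can also read a local copy of the CSV passed in as a string.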
     

Analysis and Visualizations


Ratings by Year

The oldest book in the dataset is The Epic of Gilgamesh, first published in 1750 BCE, and the most recent books in the dataset are from 2017. Although most books were published in recent centuries, a summary of ratings across this full range shows average ratings rising through the earliest millennia, then dropping sharply, and then climbing unevenly up to modern times.
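The report does not state how ratings were aggregated by year. One plausible way to see a long-range trend like this is to average each book's average_rating within fixed-width spans of publication years. The sketch below is a minimal version under that assumption. The dict keys year and average_rating and the function name are hypothetical, and the toy rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_period(books, width=100):
    """Average the per-book average_rating within fixed-width year buckets.

    Floor division keeps BCE (negative) years in the right bucket:
    with width=100, the year -1750 falls in the bucket starting at -1800.
    Books without a publication year are skipped.
    """
    buckets = defaultdict(list)
    for book in books:
        if book["year"] is not None:
            start = (book["year"] // width) * width
            buckets[start].append(book["average_rating"])
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical toy rows for illustration only; not values from the dataset.
books = [
    {"year": -1750, "average_rating": 3.5},
    {"year": 1813, "average_rating": 4.25},
    {"year": 1851, "average_rating": 3.75},
    {"year": 2008, "average_rating": 4.5},
    {"year": 2011, "average_rating": 4.0},
    {"year": None, "average_rating": 5.0},
]

for start, avg in mean_rating_by_period(books, width=100).items():
    print(start, round(avg, 2))
```

In this approach every book counts equally. Weighting each book by its number of ratings would instead emphasize heavily rated titles.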

Average Ratings by Year, 2000-2017

A closer look at books published since 2000 shows that average ratings first declined, then rose to a peak around 2011, before dipping again.
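Zooming in on 2000 to 2017 amounts to filtering by publication year before averaging. The peak can then be located programmatically. The sketch below assumes the same hypothetical dict keys as the other examples. The function names and toy rows are illustrative, not taken from the report.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean_rating(books, start=2000, end=2017):
    """Mean of per-book average_rating for each publication year in [start, end]."""
    by_year = defaultdict(list)
    for book in books:
        year = book["year"]
        if year is not None and start <= year <= end:
            by_year[year].append(book["average_rating"])
    return {year: mean(values) for year, values in sorted(by_year.items())}


def peak_year(yearly):
    """Year with the highest mean rating; max() keeps the earliest year on ties."""
    return max(yearly, key=yearly.get)


# Hypothetical toy rows for illustration only; not values from the dataset.
books = [
    {"year": 1999, "average_rating": 4.75},  # outside the window, ignored
    {"year": 2000, "average_rating": 4.0},
    {"year": 2005, "average_rating": 3.75},
    {"year": 2011, "average_rating": 4.5},
    {"year": 2011, "average_rating": 4.25},
    {"year": 2016, "average_rating": 3.5},
]

yearly = yearly_mean_rating(books)
print(yearly)
print("peak:", peak_year(yearly))
```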


Number of Ratings by Year

An analysis of the number of ratings by year shows that most ratings were given to books published in the last few hundred years.
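Counting ratings, rather than averaging them, can be sketched by summing a per-book rating count over publication years. The sketch assumes the ratings_count column holds each book's number of ratings. The work_ratings_count column, which pools editions, may be a closer match to the report's "work" records. The function names and toy rows are hypothetical. share_since reports what fraction of ratings go to books published in or after a chosen year.

```python
from collections import Counter


def ratings_by_century(books):
    """Sum ratings_count per century; floor division handles BCE years."""
    totals = Counter()
    for book in books:
        if book["year"] is not None:
            totals[(book["year"] // 100) * 100] += book["ratings_count"]
    return dict(sorted(totals.items()))


def share_since(books, cutoff):
    """Fraction of all dated ratings that go to books published in or after cutoff."""
    dated = [book for book in books if book["year"] is not None]
    total = sum(book["ratings_count"] for book in dated)
    recent = sum(book["ratings_count"] for book in dated if book["year"] >= cutoff)
    return recent / total if total else 0.0


# Hypothetical toy rows for illustration only; not values from the dataset.
books = [
    {"year": -1750, "ratings_count": 100},
    {"year": 1600, "ratings_count": 300},
    {"year": 1850, "ratings_count": 600},
    {"year": 2008, "ratings_count": 3000},
]

print(ratings_by_century(books))
print(f"share since 1800: {share_since(books, 1800):.0%}")
```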


5-Star Ratings by Year

Looking only at 5-star ratings, it is apparent that books published in recent years received the most 5-star ratings.
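Totals of 5-star ratings per year can be sketched with a Counter, which also ranks the years directly. The per-book 5-star count is assumed to come from a ratings_5 column. The function name and toy rows are hypothetical.

```python
from collections import Counter


def five_star_by_year(books):
    """Total 5-star ratings (the assumed ratings_5 column) per publication year."""
    totals = Counter()
    for book in books:
        if book["year"] is not None:
            totals[book["year"]] += book["ratings_5"]
    return totals


# Hypothetical toy rows for illustration only; not values from the dataset.
books = [
    {"year": 1925, "ratings_5": 50},
    {"year": 2008, "ratings_5": 900},
    {"year": 2008, "ratings_5": 100},
    {"year": 2012, "ratings_5": 700},
    {"year": 2016, "ratings_5": 300},
]

totals = five_star_by_year(books)
print(totals.most_common(3))
```

Raw counts grow with the number of books and ratings in a year. Dividing each book's 5-star count by its total ratings would give the share of 5-star ratings, which separates enthusiasm from sheer volume.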


Average Ratings by Language

Graphing the average rating of books by their original language shows that books in Turkish have the highest average ratings, while books in Arabic have the lowest. Books in English fall toward the lower end of the range.
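A per-language average can be sketched by grouping books on a language field. This sketch assumes the language_code column stands in for a book's original language, although in the file it may describe the catalogued edition instead. The function name and toy rows are hypothetical. The sketch returns the number of books per language alongside the mean, so the output shows when a high or low average rests on only a handful of titles.

```python
from collections import defaultdict
from statistics import mean


def rating_by_language(books):
    """Return (language, mean average_rating, book count) tuples, highest mean first."""
    groups = defaultdict(list)
    for book in books:
        if book["language"]:  # skip books with no language code
            groups[book["language"]].append(book["average_rating"])
    rows = [(lang, mean(ratings), len(ratings)) for lang, ratings in groups.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical toy rows for illustration only; not values from the dataset.
books = [
    {"language": "tur", "average_rating": 4.5},
    {"language": "eng", "average_rating": 4.0},
    {"language": "eng", "average_rating": 3.75},
    {"language": "ara", "average_rating": 3.5},
    {"language": None, "average_rating": 5.0},
]

for lang, avg, n in rating_by_language(books):
    print(f"{lang}: {avg:.2f} ({n} books)")
```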


Average Ratings by Year for each Language

Looking at each original language separately, it is apparent that English has many more reviews than other languages, and also that most languages have the greatest number of reviews in recent years. For every language, ratings are highest for books published in recent years.
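A per-language, per-year breakdown can be sketched as a nested table of review totals. A small helper then finds each language's busiest publication year. The sketch assumes the per-book review count comes from the work_text_reviews_count column, stored here under a hypothetical text_reviews key. It also assumes language_code approximates the original language. The function names and toy rows are illustrative.

```python
from collections import defaultdict


def reviews_by_language_and_year(books):
    """Nested totals of text reviews: language -> publication year -> count."""
    table = defaultdict(lambda: defaultdict(int))
    for book in books:
        if book["language"] and book["year"] is not None:
            table[book["language"]][book["year"]] += book["text_reviews"]
    return {lang: dict(sorted(years.items())) for lang, years in table.items()}


def peak_year_per_language(table):
    """For each language, the publication year with the most text reviews."""
    return {lang: max(years, key=years.get) for lang, years in table.items()}


# Hypothetical toy rows for illustration only; not values from the dataset.
books = [
    {"language": "eng", "year": 1960, "text_reviews": 200},
    {"language": "eng", "year": 2012, "text_reviews": 5000},
    {"language": "eng", "year": 2012, "text_reviews": 1000},
    {"language": "spa", "year": 1967, "text_reviews": 300},
    {"language": "spa", "year": 2015, "text_reviews": 400},
    {"language": "fre", "year": 1862, "text_reviews": 150},
]

table = reviews_by_language_and_year(books)
print(peak_year_per_language(table))
```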