Since I was a young child there has always been a book in my hand. My two favorite genres have always been Science Fiction and Fantasy; something about magic and future technology has captivated my attention in a way no other medium ever has.
Picking between the two has always been difficult, so with this project I am taking the time to see what the larger world of readers has to say, using my skills as a Data Science student at Xavier University.
There is no bigger collection of review data than Goodreads. I personally am not a huge fan of their platform, but it has become increasingly hard to find companies that exclusively use their own reviews or reviews from purchasers. So I collected 30 reviews for each of my 10 favorite books in Science Fiction and in Fantasy. The book lists contain a variety of old and new books that span the breadth of each genre's specialties.
Fantasy:
Eragon
The Hobbit
Pawn of Prophecy
Wizard’s First Rule
Transit to Scorpio
A Princess of Mars
The Way of Kings
The Princess Bride
The Lies of Locke Lamora
The Shadow of the Gods
Science Fiction:
Dune
To Sleep in a Sea of Stars
Nyxia
The Martian
Project Hail Mary
Battlefield Earth
The Hitchhiker's Guide to the Galaxy
The Tar-Aiym Krang
Ready Player One
Dark Operator
The data set contains:
review_content = text of the review
review_rating = 1 to 5 scale rating of the book
review_date = the date of the review
book_id = the id number for the book
reviewer_id = the id number of the reviewer
reviewer_name = the reviewer’s username
reviewer_followers = the number of followers the reviewer has
reviewer_total_reviews = the total number of reviews the reviewer has published
I decided to do a quick sentiment analysis on the review content for each genre.
[1] 19.23645
[1] 18.99994
Looking at this chart, we can see that the spread is much tighter for Science Fiction than for Fantasy, where the interquartile range runs from about 10 to 30. The average sentiment is only marginally higher for Fantasy at 19.24, whereas SciFi is 19.00; both remain thoroughly positive.
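The sentiment scoring itself is not shown, so here is a minimal sketch of one way a per-review score and a per-genre average could be computed. It assumes a word-list approach in which each review's score is the sum of the scores of its matched words. The lexicon, the reviews, and the helper name `score_review` are all hypothetical; only the column names `genera` and `review_content` come from the data set.

```r
# Toy lexicon: a hypothetical stand-in for a real word list (assumption).
lexicon <- c(love = 3, great = 3, good = 2, fun = 2,
             slow = -1, boring = -2, bad = -3)

# Hypothetical reviews using the column names from the data set description.
reviews <- data.frame(
  genera = c("Fantasy", "Fantasy", "SciFi", "SciFi"),
  review_content = c(
    "I love this great book",
    "Good story but a slow start",
    "Fun and great science",
    "Boring middle, good ending"
  ),
  stringsAsFactors = FALSE
)

# Score one review: lowercase it, split on non-letters, and sum the
# scores of the words found in the lexicon (unmatched words count as 0).
score_review <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  sum(lexicon[words], na.rm = TRUE)
}

reviews$sentiment <- vapply(reviews$review_content, score_review,
                            numeric(1), USE.NAMES = FALSE)

# Average sentiment per genre.
genre_means <- tapply(reviews$sentiment, reviews$genera, mean)
print(genre_means)
```

Because the score is a sum over words, longer reviews can accumulate larger totals, which is worth keeping in mind when comparing averages.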
all_books_reviews %>%
  ggplot(aes(x = review_rating)) +
  geom_histogram() +
  facet_wrap(~genera)
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
Warning: Removed 18 rows containing non-finite outside the scale range
(`stat_bin()`).
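The message and warning above can usually be avoided. This is a hedged sketch on hypothetical data, assuming the 18 removed rows are reviews with a missing `review_rating`. Setting `binwidth = 1` gives one bar per star value, and dropping the missing ratings first leaves nothing for ggplot2 to remove. The data frame `toy_reviews` is invented for illustration.

```r
library(ggplot2)

# Hypothetical ratings; NA stands in for reviews left without a star rating.
toy_reviews <- data.frame(
  genera = rep(c("Fantasy", "SciFi"), each = 6),
  review_rating = c(5, 4, 4, 3, 5, NA, 4, 4, 5, 2, 3, NA)
)

# Drop unrated reviews up front so no rows are silently removed later.
rated <- toy_reviews[!is.na(toy_reviews$review_rating), ]

# One bin per star value instead of the default 30 bins.
p <- ggplot(rated, aes(x = review_rating)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~genera)
print(p)
```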
all_books_reviews %>%
  filter(genera == "Fantasy") %>%
  pull(review_rating) %>%
  mean(na.rm = TRUE) %>%
  print()
[1] 3.786942
all_books_reviews %>%
  filter(genera == "SciFi") %>%
  pull(review_rating) %>%
  mean(na.rm = TRUE) %>%
  print()
[1] 3.75945
The ratings turn into another incredibly tight race, with similar-looking histograms and a difference of only about 0.03 (3.79 versus 3.76) giving Fantasy the slightest edge. It would appear that Science Fiction and Fantasy hold very similar places in the hearts of the general population of readers.
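As a small illustration of the comparison, the sketch below computes both genre means in one step and then works out the gap from the two means reported above. The data frame `toy_reviews` is hypothetical; the two reported means are copied from the output.

```r
# Hypothetical ratings, including one missing value.
toy_reviews <- data.frame(
  genera = c("Fantasy", "Fantasy", "Fantasy", "SciFi", "SciFi", "SciFi"),
  review_rating = c(5, 4, NA, 4, 3, 5)
)

# The formula interface drops rows with missing ratings by default,
# which plays the same role as na.rm = TRUE in mean().
genre_means <- aggregate(review_rating ~ genera, data = toy_reviews, FUN = mean)
print(genre_means)

# Gap between the two means reported in the output above.
reported_gap <- 3.786942 - 3.75945
print(round(reported_gap, 3))
```

A gap this small on a 1 to 5 scale, with a few hundred reviews per genre, is best read as a tie rather than a win.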
I chose to include this plot of sentiment scores vs. review ratings because it shows an interesting trend: people who rated a book 3 stars have higher sentiment values. This makes sense to me because a lot of the 4-star readers leave shorter reviews, while people who review at 3 stars generally write more words, and a larger share of those words is favorable to the book than unfavorable.
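One way to check the review-length explanation is to compare raw sentiment with sentiment per word at each rating. The sketch below shows only the mechanics. Every number and review text in it is hypothetical, and the `sentiment` column is assumed to hold a summed per-review score.

```r
# Hypothetical reviews: the 3-star ones are long, the 4-star ones short.
toy <- data.frame(
  review_rating = c(3, 3, 4, 4),
  sentiment = c(6, 4, 3, 3),  # assumed summed sentiment per review
  review_content = c(
    "A long and thoughtful review with plenty of praise",
    "Another detailed take on what worked and what did not",
    "Loved it",
    "Great fun read"
  ),
  stringsAsFactors = FALSE
)

# Word count per review: split on whitespace and count the pieces.
toy$word_count <- lengths(strsplit(trimws(toy$review_content), "\\s+"))

# Sentiment per word removes the advantage of simply writing more.
toy$sentiment_per_word <- toy$sentiment / toy$word_count

# Average each measure by star rating.
by_rating <- aggregate(
  cbind(sentiment, word_count, sentiment_per_word) ~ review_rating,
  data = toy, FUN = mean
)
print(by_rating)
```

In this toy data the 3-star reviews have the higher raw sentiment but the lower sentiment per word, which is the pattern to look for if length is driving the trend.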
Because these genres had been reviewed so positively and were in such close competition, I decided that I needed to confirm that SciFi and Fantasy are in fact the two best genres, even if I cannot tell them apart.
So I turned to mystery, which happens to be my third favorite genre, to see if another genre could oust Fantasy or SciFi in public opinion. I scraped a Goodreads-generated table of top mystery novels to make the graphic and means below.
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
[1] "Fantasy: 3.8021"
[1] "SciFi: 3.7665"
[1] "Mystery: 3.8971"
Unfortunately, this shows that the top mystery books according to Goodreads have beaten my selection of Science Fiction and Fantasy books by a narrow margin.
Books will always hold a special place in the hearts of humanity; they have been the premier way for us to convey stories for the last two and a half thousand years. Although I was not able to find a clear winner among my three favorite genres, this study has shown that people desire to read good stories. None of these books had a negative average sentiment score or an average rating below three. The reason no genre stands apart is that they all serve a different and necessary function, and they each have their tales to tell. I for one know that my life would be worse off had I not had the opportunity to read selections from each of these and many more genres besides. The moral of the story is to read more and read diversely: while you might have a favorite genre, there is a book out there in every space that will resonate with you and teach you how to think about life in new ways. Happy reading!