Data and Data Source

These data list the top five bestselling books on the New York Times Bestseller List, organized by date and genre, from January 3rd, 2010 to December 29th, 2019. The relevant fields are the publication date of each list, book genre, title, author, price, and total weeks on the list (including weeks before January 3rd, 2010). The data originally came from the New York Times Bestseller List and were downloaded from Kaggle at this link: https://www.kaggle.com/dhruvildave/new-york-times-best-sellers

Five-Number Summary and More: Price Column

## Standard Deviation of Price Column
sdTotalPrice = sd(bestSellers$price); sdTotalPrice
## [1] 8.789769
## 5 Number Analysis 
summary(bestSellers$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   3.863   0.000 150.000
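For reference, R's `summary()` prints the five-number summary (minimum, first quartile, median, third quartile, maximum) plus the mean, and `sd()` is the sample standard deviation with an n - 1 denominator. A minimal sketch on a made-up vector (not the bestseller data) shows how these numbers relate, and why a column in which at least three quarters of the values are zero reports 0 for Q1, the median, and Q3:

```r
# Made-up prices, mostly zeros like the price column (not the real data)
price <- c(0, 0, 0, 0, 0, 0, 0, 0, 14.99, 27.00)

# Five-number summary: min, Q1, median, Q3, max (quantile type 7,
# the default that summary() also uses)
five_num <- quantile(price, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_num)

# summary() reports the same five values plus the mean
print(summary(price))

# Sample standard deviation with an n - 1 denominator, as sd() computes it
sd_manual <- sqrt(sum((price - mean(price))^2) / (length(price) - 1))
print(c(sd = sd(price), manual = sd_manual))
```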

Scatter of Price over Time

After 2013, every price is entered as zero, and some unknown prices from earlier years are also entered as zero. Among the prices that are known, most books cost under $50. The price data are therefore incomplete, and the pile of zeros at the low end, together with a few expensive titles up to $150, leaves the distribution skewed to the right.
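One way to confirm the zero-price problem described above is to compute, for each year, the share of entries whose price is zero, and then treat zeros as missing. This is a sketch on made-up rows; the column name `published_date` and all values are assumptions, not taken from the dataset:

```r
# Made-up rows; `published_date` and `price` are assumed column names
toyBooks <- data.frame(
  published_date = as.Date(c("2011-05-01", "2012-03-04", "2012-08-12",
                             "2014-01-05", "2016-06-19", "2019-12-29")),
  price = c(26.99, 0, 15.00, 0, 0, 0)
)

# Share of zero prices in each year; a share of 1 means no price was recorded
year <- format(toyBooks$published_date, "%Y")
zero_share <- tapply(toyBooks$price == 0, year, mean)
print(zero_share)

# Treat zero as missing so it stops piling up at the low end of the distribution
toyBooks$price_clean <- ifelse(toyBooks$price == 0, NA, toyBooks$price)
print(summary(toyBooks$price_clean))
```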

Box Plot Distribution of Price Column

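The box plot of the full price column, outliers included, shows how strongly the large number of zero values skews the data to the right: with Q1, the median, and Q3 all at zero, the box collapses onto zero and the non-zero prices form a long right tail. When all zero values are removed, the distribution is still skewed to the right (the mean of $19.30 sits above the median of $16.99), and the box now spans $13.95 (Q1) to $25.99 (Q3), an IQR of $12.04.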
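The summary below uses `skewPrice`, which is not defined in this section. A minimal sketch, assuming it is simply the rows with a non-zero price (the author's exact filter is not shown), also shows that R's `IQR()` returns the spread Q3 - Q1 as a single number. `toyBooks` and `toyNonZero` are hypothetical names holding made-up data:

```r
# Made-up prices (not the real data)
toyBooks <- data.frame(price = c(0, 0, 13.95, 16.99, 0, 25.99, 150.00, 9.99))

# Assumed filter: keep only rows with a recorded, non-zero price.
# With the real data this would be something like
#   skewPrice <- subset(bestSellers, price > 0)
toyNonZero <- subset(toyBooks, price > 0)

# IQR() returns Q3 - Q1 as one number (quantile type 7 by default)
print(quantile(toyNonZero$price, c(0.25, 0.75)))
print(IQR(toyNonZero$price))
```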

## 5 Point Summary without Zero Values
summary(skewPrice$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.99   13.95   16.99   19.30   25.99  150.00

Five-Number Summary and More: Weeks on List Column

## Standard Deviation of Weeks on List Column 
sdTotalWeeks = sd(bestSellers$weeks_on_list); sdTotalWeeks
## [1] 63.53431
## 5 Number Analysis of Weeks on List Column
summary(bestSellers$weeks_on_list)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    2.00   24.71   16.00  607.00

Scatter of Weeks on List over Time

The number of weeks on the list varies from book to book, though most books' weeks on the list increase over time. In the early 2010s, there appears to be a cluster of outlying books that stayed on the list for a very long time, a pattern not repeated in later years. Otherwise, there does not seem to be any relationship between the date and weeks on the list.

Box Plot Distribution of Weeks on List Column

## 5 Point Summary Without Outliers
summary(skewWeeks$weeks_on_list)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    1.00    5.29    6.00   40.00

Weeks on List Analysis

The Weeks on List distribution with outliers shows how strongly the data are skewed to the right by a large number of high-value outliers. When all outliers are removed, the distribution is still skewed to the right, but the box now spans 0 to 6 weeks (an IQR of 6). Three quarters of the remaining entries show six weeks or fewer, so most books do not stay on the bestseller list much longer than about a month and a half.
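`skewWeeks` is likewise not defined in this section. Under the usual 1.5 × IQR rule, values above Q3 + 1.5 × IQR are flagged as outliers; using the full-column quartiles above, that cutoff is 16 + 1.5 × (16 - 0) = 40, which matches the maximum of 40 in the outlier-free summary. That is consistent with (though it does not prove) the outliers having been removed this way. A sketch of the rule on made-up values; `iqr_keep` is a hypothetical helper name:

```r
# Logical mask: TRUE for points inside Tukey's fences, Q1 - k*IQR to Q3 + k*IQR
iqr_keep <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]
  x >= q[1] - k * spread & x <= q[2] + k * spread
}

# Made-up weeks-on-list values (not the real column)
weeks <- c(0, 0, 0, 1, 2, 3, 6, 10, 16, 16, 40, 41, 120, 607)
kept <- weeks[iqr_keep(weeks)]
print(summary(kept))

# With the real data, the assumed equivalent would be
#   skewWeeks <- bestSellers[iqr_keep(bestSellers$weeks_on_list), ]
```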

Top 10 Books Most Often in Top 5

The book that appears in the top 5 most often is Unbroken, with 662 appearances. The closest after it is The Five Love Languages, with 466. Unbroken is an outlier, since the remaining bars in the top 10 are relatively even; the lowest is The Book Thief, with 300. Because January 3rd, 2010 to December 29th, 2019 contains only 522 weekly list dates, a total above 522, such as Unbroken's, must count some weeks more than once, when the book appeared on more than one genre list. All of these totals are far larger than the 24.71 average of the Weeks on List column. The two measures are not directly comparable, because the Weeks on List column includes time before the start of 2010, while this graph only counts appearances between January 3rd, 2010 and December 29th, 2019. Still, the comparison is an interesting gauge of how far each of these books exceeds the average, and of how statistically unique Unbroken is, both within the top ten and against the rest of the data.
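A minimal sketch of two ways to count a title's presence: rows (appearances, where a book on two genre lists in the same week counts twice) versus distinct list dates. It also counts the weekly list dates in the study period. The column names `published_date`, `list_name`, and `title`, and all of the rows, are made up for illustration:

```r
# Made-up rows; `published_date`, `list_name`, and `title` are assumed names
toyLists <- data.frame(
  published_date = as.Date(c("2015-01-04", "2015-01-04", "2015-01-11",
                             "2015-01-11", "2015-01-18")),
  list_name = c("Hardcover Nonfiction", "Paperback Nonfiction",
                "Hardcover Nonfiction", "Paperback Nonfiction",
                "Hardcover Nonfiction"),
  title = c("BOOK A", "BOOK A", "BOOK A", "BOOK A", "BOOK B")
)

# Appearances: one per row, i.e. per genre list per date; keep the top 10
appearances <- head(sort(table(toyLists$title), decreasing = TRUE), 10)
print(appearances)

# Distinct weeks: a title on two lists in the same week counts once
distinct_weeks <- table(unique(toyLists[c("title", "published_date")])$title)
print(distinct_weeks)

# Number of weekly list dates from 2010-01-03 to 2019-12-29
n_dates <- length(seq(as.Date("2010-01-03"), as.Date("2019-12-29"), by = "week"))
print(n_dates)
```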

Top 50 Popular Authors vs Authors with Unique Books in NYT Top 5

The left side shows the authors with the most time on the list overall, while the right side shows the authors with the most distinct books reaching the top 5. The top entry for time on the list is the co-author pair Bill O’Reilly and Martin Dugard, with 4.37% of the total. They do not appear in the top 50 on the right, which suggests that their time on the list comes from fewer books, each staying longer than those of the authors on the right. On the right, Danielle Steel has the most distinct books that reached the NYT top 5. However, she does not appear on the left, which means she writes many books that make the top 5 but do not stay there for long. Both charts have a clear leader followed by a gradual decline in percentage, and below the leader the top 50 values fall off roughly linearly.
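The two rankings can be sketched as follows, assuming the left chart is each author's percentage of all list entries and the right chart is the number of distinct titles per author (the exact measures behind the charts are not shown). `toyAuthors`, its rows, and the helper `as_named` are hypothetical:

```r
# Made-up rows; `author` and `title` are assumed column names
toyAuthors <- data.frame(
  author = c("Author A", "Author A", "Author A", "Author A",
             "Author B", "Author B", "Author B", "Author C"),
  title  = c("A1", "A1", "A1", "A1", "B1", "B2", "B3", "C1")
)

# Helper: turn a one-dimensional table into a plain named vector
as_named <- function(tab) setNames(as.vector(tab), names(tab))

# Left chart (assumed measure): each author's percentage of all list entries
entry_share <- sort(as_named(prop.table(table(toyAuthors$author))) * 100,
                    decreasing = TRUE)
print(round(entry_share, 2))

# Right chart (assumed measure): number of distinct titles per author
distinct_titles <- sort(as_named(table(unique(toyAuthors)$author)),
                        decreasing = TRUE)
print(distinct_titles)
```

In this toy data, Author A leads the left ranking with one long-running title, while Author B leads the right ranking with three titles, the same split the charts show for O’Reilly and Dugard versus Danielle Steel.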

Weeks on Bestseller List

Since there are dozens of genres, I chose three popular genres aimed at different age groups. I included data from books that were on the list before January 2010 to get a complete picture of each book's life on the list. I also chose hardcover lists, since they follow current industry trends more closely than lists of older books.

The left panel shows that Graphic Books have the most outliers, and that their outliers are larger than those of Chapter Books and Fiction. Chapter Books have the fewest outliers. With the outliers removed, Chapter Books have the greatest IQR, range, and median, so they stay on the list longer than both Graphic Books and Fiction, with fewer extreme cases. In contrast, only a few Graphic Books greatly exceed the typical time on the list. Fiction has the smallest IQR and range, so Hardcover Fiction books spend less time on the list, even though some of its outliers match those of Chapter Books.
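The medians, IQRs, ranges, and outlier counts compared above are the quantities a box plot draws. A sketch using base R's `boxplot.stats()` on made-up values for three genre labels (hypothetical data; with the real data, the input would be the `weeks_on_list` values of the three chosen hardcover lists, and the `list_name` column name is an assumption):

```r
# Made-up weeks-on-list values for three genre labels (not the real data)
toyGenres <- data.frame(
  list_name = rep(c("Chapter Books", "Graphic Books", "Hardcover Fiction"),
                  times = c(8, 10, 8)),
  weeks_on_list = c(5, 8, 12, 15, 20, 25, 30, 50,          # Chapter Books
                    1, 1, 2, 2, 3, 3, 4, 5, 60, 90,        # Graphic Books
                    1, 1, 2, 2, 3, 3, 4, 30)               # Hardcover Fiction
)

# boxplot.stats() returns what boxplot() draws: $stats holds the lower
# whisker end, lower hinge, median, upper hinge, and upper whisker end;
# $out holds points more than 1.5 hinge-spreads beyond the hinges
by_genre <- lapply(split(toyGenres$weeks_on_list, toyGenres$list_name),
                   boxplot.stats)

for (g in names(by_genre)) {
  s <- by_genre[[g]]$stats
  cat(sprintf("%-18s median=%5.1f  box width=%5.1f  whisker range=%5.1f  outliers=%d\n",
              g, s[3], s[4] - s[2], s[5] - s[1], length(by_genre[[g]]$out)))
}
```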

Time vs Price vs Weeks on List

Because prices are recorded as zero after 2013, I restricted this graph to the period from 2010 through 2013. I also averaged the price and weeks on list of the top 5 entries for each date and genre.
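The filtering and averaging step can be sketched with base R's `aggregate()`. The rows below are made up, and the column names `published_date` and `list_name` are assumptions:

```r
# Made-up rows (prices and weeks invented); column names are assumptions
toyLists <- data.frame(
  published_date = as.Date(c("2010-01-03", "2010-01-03", "2010-01-03",
                             "2010-01-10", "2010-01-10", "2015-03-01")),
  list_name = c("Hardcover Fiction", "Hardcover Fiction", "Chapter Books",
                "Hardcover Fiction", "Chapter Books", "Hardcover Fiction"),
  price = c(26.99, 24.95, 16.99, 27.95, 12.99, 0),
  weeks_on_list = c(3, 10, 40, 4, 41, 2)
)

# Keep 2010 through 2013 only, since later prices are recorded as zero
early <- subset(toyLists, published_date >= as.Date("2010-01-01") &
                          published_date <= as.Date("2013-12-31"))

# Mean price and mean weeks on list for each (date, genre) pair
avg <- aggregate(cbind(price, weeks_on_list) ~ published_date + list_name,
                 data = early, FUN = mean)
print(avg)
```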

There is no apparent correlation between the date a book makes the list and its average time on the list. Chapter Books seem to stay on the list longer than any other genre from January to July 2010. Fiction and Graphic Books have similar price ranges, so average price alone does not explain differences in their average time on the list. Chapter Books have lower prices and a higher ceiling on time on the list. The average price of Chapter Books and Fiction does not vary over time. Graphic Books vary more, but there is still a consistent price line within the genre.
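The "no correlation" observation can also be checked numerically rather than by eye. A sketch computing the Pearson correlation between date and average weeks on list for one genre; the series below is made up, and `avg_dates` and `avg_weeks` are hypothetical names:

```r
# Made-up averaged series for one genre (not the real data)
avg_dates <- seq(as.Date("2010-01-03"), by = "week", length.out = 8)
avg_weeks <- c(12, 30, 8, 25, 10, 28, 9, 27)

# Pearson correlation between date (as a number of days) and average weeks
# on list; a value near 0 suggests no linear trend over time
r <- cor(as.numeric(avg_dates), avg_weeks)
print(round(r, 3))

# cor.test() adds a p-value for the null hypothesis of zero correlation
print(cor.test(as.numeric(avg_dates), avg_weeks))
```

Pearson's r only captures linear trends, so a value near zero is consistent with, but does not rule out, other kinds of change over time.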