Importing Packages

Importing Dataset

Exploring Dataset

head(bestsellers_with_categories)
##                                                                 Name
## 1                                      10-Day Green Smoothie Cleanse
## 2                                                  11/22/63: A Novel
## 3                            12 Rules for Life: An Antidote to Chaos
## 4                                             1984 (Signet Classics)
## 5 5,000 Awesome Facts (About Everything!) (National Geographic Kids)
## 6                      A Dance with Dragons (A Song of Ice and Fire)
##                     Author User.Rating Reviews Price Year       Genre
## 1                 JJ Smith         4.7   17350     8 2016 Non Fiction
## 2             Stephen King         4.6    2052    22 2011     Fiction
## 3       Jordan B. Peterson         4.7   18979    15 2018 Non Fiction
## 4            George Orwell         4.7   21424     6 2017     Fiction
## 5 National Geographic Kids         4.8    7665    12 2019 Non Fiction
## 6      George R. R. Martin         4.4   12643    11 2011     Fiction
is.null(bestsellers_with_categories)
## [1] FALSE
str(bestsellers_with_categories)
## 'data.frame':    550 obs. of  7 variables:
##  $ Name       : chr  "10-Day Green Smoothie Cleanse" "11/22/63: A Novel" "12 Rules for Life: An Antidote to Chaos" "1984 (Signet Classics)" ...
##  $ Author     : chr  "JJ Smith" "Stephen King" "Jordan B. Peterson" "George Orwell" ...
##  $ User.Rating: num  4.7 4.6 4.7 4.7 4.8 4.4 4.7 4.7 4.7 4.6 ...
##  $ Reviews    : int  17350 2052 18979 21424 7665 12643 19735 19699 5983 23848 ...
##  $ Price      : int  8 22 15 6 12 11 30 15 3 8 ...
##  $ Year       : int  2016 2011 2018 2017 2019 2011 2014 2017 2018 2016 ...
##  $ Genre      : chr  "Non Fiction" "Fiction" "Non Fiction" "Fiction" ...

The data looks good: it consists of 550 rows and 7 columns, with no null or empty cells.
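Note that `is.null()` only tests whether the object itself is `NULL`, so it returns `FALSE` for any data frame regardless of its contents. A per-column check for missing or empty cells could look like the sketch below. It uses base R only and rebuilds the six rows shown in the `head()` output above; the name `books` is a stand-in for the full data frame, not the name used in this analysis.

```r
# Six rows copied from the head() output above; `books` is a hypothetical
# stand-in for bestsellers_with_categories.
books <- data.frame(
  Name = c("10-Day Green Smoothie Cleanse", "11/22/63: A Novel",
           "12 Rules for Life: An Antidote to Chaos", "1984 (Signet Classics)",
           "5,000 Awesome Facts (About Everything!) (National Geographic Kids)",
           "A Dance with Dragons (A Song of Ice and Fire)"),
  Author = c("JJ Smith", "Stephen King", "Jordan B. Peterson", "George Orwell",
             "National Geographic Kids", "George R. R. Martin"),
  User.Rating = c(4.7, 4.6, 4.7, 4.7, 4.8, 4.4),
  Reviews = c(17350L, 2052L, 18979L, 21424L, 7665L, 12643L),
  Price = c(8L, 22L, 15L, 6L, 12L, 11L),
  Year = c(2016L, 2011L, 2018L, 2017L, 2019L, 2011L),
  Genre = c("Non Fiction", "Fiction", "Non Fiction",
            "Fiction", "Non Fiction", "Fiction"),
  stringsAsFactors = FALSE
)

# Number of NA values in each column.
na_counts <- colSums(is.na(books))

# Number of empty or whitespace-only strings in each character column.
is_chr <- vapply(books, is.character, logical(1))
empty_counts <- vapply(books[is_chr],
                       function(col) sum(trimws(col) == ""),
                       integer(1))

print(na_counts)
print(empty_counts)
```

If every count is zero, the column has no missing or blank cells. Running the same two checks on the full data frame would back up the statement above.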

Analyzing Dataset

The dataset covers the years 2009 to 2019. The number of reviews ranges from 37 to 87,841. Prices range from $0 to $105; 12 books cost $0 and 2 books cost $105. User ratings range from 3.3 to 4.9.
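The code that produced these figures is not shown in the report. A minimal base R sketch of how such ranges and counts can be obtained follows. It uses only the six rows from the `head()` output, so its numbers differ from the full-data figures quoted above; `books` is a hypothetical name.

```r
# Numeric columns of the six rows shown by head(); `books` is a hypothetical
# stand-in for the full data frame.
books <- data.frame(
  User.Rating = c(4.7, 4.6, 4.7, 4.7, 4.8, 4.4),
  Reviews = c(17350L, 2052L, 18979L, 21424L, 7665L, 12643L),
  Price = c(8L, 22L, 15L, 6L, 12L, 11L),
  Year = c(2016L, 2011L, 2018L, 2017L, 2019L, 2011L)
)

# Minimum and maximum of every column as a 2 x 4 matrix.
ranges <- sapply(books, range)
rownames(ranges) <- c("min", "max")
print(ranges)

# How many books sit at the lowest and at the highest price.
n_at_min_price <- sum(books$Price == min(books$Price))
n_at_max_price <- sum(books$Price == max(books$Price))
```

On the full data, the last two lines correspond to the counts of books priced at $0 and at $105.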

## # A tibble: 2 × 2
##   Genre        Name
##   <chr>       <int>
## 1 Fiction       240
## 2 Non Fiction   310
## [1] 248

There are 550 books, of which 240 are Fiction and 310 are Non Fiction. The data contains 248 distinct authors. Jeff Kinney has the most books in the list, with 12 over this period.
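The tibble above suggests the counts came from a dplyr pipeline, but that code is not shown. An equivalent base R sketch, again on the six `head()` rows and with the hypothetical name `books`, might be:

```r
# Author and Genre for the six rows shown by head(); `books` is a
# hypothetical stand-in for the full data frame.
books <- data.frame(
  Author = c("JJ Smith", "Stephen King", "Jordan B. Peterson", "George Orwell",
             "National Geographic Kids", "George R. R. Martin"),
  Genre = c("Non Fiction", "Fiction", "Non Fiction",
            "Fiction", "Non Fiction", "Fiction"),
  stringsAsFactors = FALSE
)

# Number of books per genre.
genre_counts <- table(books$Genre)
print(genre_counts)

# Number of distinct authors.
n_authors <- length(unique(books$Author))
print(n_authors)

# Books per author, most frequent first. In this six-row sample every author
# appears once, so the ordering only becomes informative on the full data.
author_counts <- sort(table(books$Author), decreasing = TRUE)
print(head(author_counts, 3))
```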

Visualizations

Summary

Fiction books make up 44% of the list and Non Fiction books 56%. From 2009 to 2019 the number of reviews increased. Ratings from 4.5 to 4.9 are given most often to Non Fiction books. In each year, the number of Non Fiction books is higher than the number of Fiction books. There is no linear relationship between price and rating, which means that a higher price does not result in a higher rating for either Fiction or Non Fiction books.
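The genre shares follow directly from the counts reported earlier (240 and 310 out of 550). The price and rating claim can be checked numerically with a Pearson correlation per genre. The sketch below shows both steps in base R. The correlation part runs on the six `head()` rows only, which is far too few to support any conclusion, so it illustrates the call rather than reproducing the result; `books` is a hypothetical name.

```r
# Genre shares from the counts reported in the analysis section.
genre_counts <- c(Fiction = 240, `Non Fiction` = 310)
genre_share <- round(100 * genre_counts / sum(genre_counts))
print(genre_share)

# Price, rating and genre for the six rows shown by head(); `books` is a
# hypothetical stand-in for the full data frame.
books <- data.frame(
  User.Rating = c(4.7, 4.6, 4.7, 4.7, 4.8, 4.4),
  Price = c(8L, 22L, 15L, 6L, 12L, 11L),
  Genre = c("Non Fiction", "Fiction", "Non Fiction",
            "Fiction", "Non Fiction", "Fiction"),
  stringsAsFactors = FALSE
)

# Pearson correlation between price and rating within each genre.
cor_by_genre <- sapply(split(books, books$Genre),
                       function(g) cor(g$Price, g$User.Rating))
print(cor_by_genre)
```

On the full data, coefficients close to zero in both genres would be consistent with the absence of a linear relationship described above.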