Create a new dataframe called books_small that includes title, authors, average_rating, and publication_date
books_small <- select(books, title, authors, average_rating, publication_date)
books_small
## # A tibble: 8,470 x 4
## title authors average_rating publication_date
## <chr> <chr> <dbl> <date>
## 1 "Harry Potter and the Half-Bloo… J.K. Rowling 4.57 2006-09-16
## 2 "Harry Potter and the Order of … J.K. Rowling 4.49 2004-09-01
## 3 "Harry Potter and the Chamber o… J.K. Rowling 4.42 2003-11-01
## 4 "Harry Potter and the Prisoner … J.K. Rowling 4.56 2004-05-01
## 5 "Harry Potter Boxed Set Books … J.K. Rowling 4.78 2004-09-13
## 6 "Unauthorized Harry Potter Book… W. Frederic… 3.74 2005-04-26
## 7 "Harry Potter Collection (Harry… J.K. Rowling 4.73 2005-09-12
## 8 "The Ultimate Hitchhiker's Guid… Douglas Ada… 4.38 2005-11-01
## 9 "The Ultimate Hitchhiker's Guid… Douglas Ada… 4.38 2002-04-30
## 10 "The Hitchhiker's Guide to the … Douglas Ada… 4.22 2004-08-03
## # … with 8,460 more rows
Create a new variable called tot_points that is the product of average_rating and ratings_count
books <- mutate(books, tot_points = average_rating*ratings_count)
# Print only the new column; books keeps all of its columns for the later steps
select(books, tot_points)
## # A tibble: 8,470 x 1
## tot_points
## <dbl>
## 1 9577303.
## 2 9667720.
## 3 27992.
## 4 10668508.
## 5 198026.
## 6 71.1
## 7 133585.
## 8 15891.
## 9 1093064.
## 10 20805.
## # … with 8,460 more rows
How many book titles begin with the word “The”?
count(filter(books, str_detect(title,"^The")))
## # A tibble: 1 x 1
## n
## <int>
## 1 2313
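Note that the pattern "^The" matches any title starting with the letters T-h-e, so titles beginning with "There" or "Theory" are counted too. Below is a minimal sketch of a stricter match using a word boundary. The toy tibble and the names toy, n_prefix, and n_word are invented for illustration; this was not run on the real books data, so the count above is left as is.

```r
library(dplyr)
library(stringr)

# Toy titles, made up for illustration (not taken from the books dataset)
toy <- tibble(title = c("The Hobbit", "There and Back", "Theory of Mind",
                        "A Tale", "The"))

# "^The" matches every title whose first three letters are "The"
n_prefix <- nrow(filter(toy, str_detect(title, "^The")))

# "^The\\b" also requires a word boundary right after "The",
# so "There ..." and "Theory ..." no longer match
n_word <- nrow(filter(toy, str_detect(title, "^The\\b")))

n_prefix
n_word
```

On the real data, the word-boundary version would presumably return a count no larger than the one reported above.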
Find all the books that were published in 2005 and that either had an average rating above 4.5 or more than 1,000 text reviews.
booksQ4 <- select(books, publication_date, average_rating, text_reviews_count)
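# Keep books published in 2005 with a high rating or many text reviews
# (the original arrange() call had no sort columns, so it did nothing and is dropped)
booksQ4 <- filter(booksQ4,
                  publication_date >= "2005-01-01" & publication_date <= "2005-12-31",
                  average_rating > 4.5 | text_reviews_count > 1000)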
booksQ4
## # A tibble: 154 x 3
## publication_date average_rating text_reviews_count
## <date> <dbl> <dbl>
## 1 2005-09-12 4.73 808
## 2 2005-08-30 3.42 1688
## 3 2005-11-15 3.87 1706
## 4 2005-12-27 3.93 2780
## 5 2005-09-27 3.77 2925
## 6 2005-09-13 4.56 299
## 7 2005-05-03 4.5 1427
## 8 2005-11-08 4.55 294
## 9 2005-07-05 4.52 419
## 10 2005-11-08 3.82 1726
## # … with 144 more rows
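An alternative way to express the "published in 2005" condition is to compare the year directly instead of using two date bounds. The sketch below assumes the lubridate package is installed and uses a small made-up tibble (toy, toy_2005 are hypothetical names), not the real dataset.

```r
library(dplyr)
library(lubridate)

# Toy data, invented for illustration only
toy <- tibble(
  publication_date   = as.Date(c("2004-12-31", "2005-01-01", "2005-06-15",
                                 "2005-12-31", "2006-01-01")),
  average_rating     = c(4.8, 4.6, 3.9, 4.0, 4.9),
  text_reviews_count = c(2000, 10, 1500, 50, 3000)
)

# Comma-separated conditions in filter() are combined with AND;
# the | inside the second condition is the OR from the question
toy_2005 <- filter(
  toy,
  year(publication_date) == 2005,
  average_rating > 4.5 | text_reviews_count > 1000
)

toy_2005
```

Both forms should select the same rows when publication_date is a Date column; the year() version just states the intent more directly.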
books <- mutate(books, tot_points = average_rating*ratings_count)
authors_top <- group_by(books, authors)
authors_top <- summarise(authors_top, top_ranked = sum(tot_points))
authors_top <- arrange(authors_top, desc(top_ranked))
head(authors_top, 5)
## # A tibble: 5 x 2
## authors top_ranked
## <chr> <dbl>
## 1 J.K. Rowling 40534919.
## 2 J.R.R. Tolkien 24143922
## 3 Dan Brown 16014699.
## 4 Stephen King 15680588.
## 5 Nicholas Sparks 12556063.
The goal of this code is to rank authors by the total review points they have accumulated. An author's total review points are the sum of average rating × ratings count over all of their books in this dataset. To find the top authors, I sorted the totals in descending order so that the authors with the most points come first, then used the head function to show the top five.
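The same ranking steps can also be written as a single pipeline. This is only a sketch of an equivalent style, run on a small made-up tibble (toy and authors_top_toy are hypothetical names) so the arithmetic is easy to check by hand.

```r
library(dplyr)

# Toy data, invented for illustration only
toy <- tibble(
  authors        = c("A", "A", "B", "C"),
  average_rating = c(4.0, 5.0, 3.0, 4.5),
  ratings_count  = c(100, 10, 200, 20)
)

authors_top_toy <- toy %>%
  mutate(tot_points = average_rating * ratings_count) %>%  # points per book
  group_by(authors) %>%                                    # one group per author
  summarise(top_ranked = sum(tot_points)) %>%              # total points per author
  arrange(desc(top_ranked))                                # highest total first

head(authors_top_toy, 2)
```

Here author A has 4.0 × 100 + 5.0 × 10 = 450 points, B has 600, and C has 90, so B ranks first and A second.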