Question 1

Create a new dataframe called books_small that includes title, authors, average_rating, and publication_date

books_small <- select(books, title, authors, average_rating, publication_date)
books_small 
## # A tibble: 8,470 x 4
##    title                            authors      average_rating publication_date
##    <chr>                            <chr>                 <dbl> <date>          
##  1 "Harry Potter and the Half-Bloo… J.K. Rowling           4.57 2006-09-16      
##  2 "Harry Potter and the Order of … J.K. Rowling           4.49 2004-09-01      
##  3 "Harry Potter and the Chamber o… J.K. Rowling           4.42 2003-11-01      
##  4 "Harry Potter and the Prisoner … J.K. Rowling           4.56 2004-05-01      
##  5 "Harry Potter Boxed Set  Books … J.K. Rowling           4.78 2004-09-13      
##  6 "Unauthorized Harry Potter Book… W. Frederic…           3.74 2005-04-26      
##  7 "Harry Potter Collection (Harry… J.K. Rowling           4.73 2005-09-12      
##  8 "The Ultimate Hitchhiker's Guid… Douglas Ada…           4.38 2005-11-01      
##  9 "The Ultimate Hitchhiker's Guid… Douglas Ada…           4.38 2002-04-30      
## 10 "The Hitchhiker's Guide to the … Douglas Ada…           4.22 2004-08-03      
## # … with 8,460 more rows

Question 2

Create a new variable called tot_points that is the product of average_rating and ratings_count

books <- mutate(books, tot_points = average_rating*ratings_count) 
# Print only the new column; books itself keeps all of its columns for later questions
select(books, tot_points)
## # A tibble: 8,470 x 1
##    tot_points
##         <dbl>
##  1  9577303. 
##  2  9667720. 
##  3    27992. 
##  4 10668508. 
##  5   198026. 
##  6       71.1
##  7   133585. 
##  8    15891. 
##  9  1093064. 
## 10    20805. 
## # … with 8,460 more rows

Question 3

How many book titles begin with the word “The”?

count(filter(books, str_detect(title,"^The")))
## # A tibble: 1 x 1
##       n
##   <int>
## 1  2313
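
Note that the pattern "^The" matches any title whose first three characters are "The", so titles beginning with words such as "There" or "Theory" are counted as well. The sketch below illustrates the difference a word boundary makes. It uses a made-up vector of titles (not taken from the books data) and assumes the stringr package is installed; the count reported above would likely change if the stricter pattern were used.

```r
library(stringr)

# Hypothetical titles for illustration only; not from the books data
titles <- c("The Hobbit", "There Will Be Blood", "Theory of Games",
            "A Tale of Two Cities", "The")

# "^The" only checks the first three characters of each title
prefix_match <- str_detect(titles, "^The")

# "^The\\b" additionally requires a word boundary right after "The"
word_match <- str_detect(titles, "^The\\b")

sum(prefix_match)  # includes "There Will Be Blood" and "Theory of Games"
sum(word_match)    # only titles whose first word is exactly "The"
```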

Question 4

Find all the books that were published in 2005 and that either had an average rating above 4.5 or more than 1,000 text reviews.

booksQ4 <- select(books, publication_date, average_rating, text_reviews_count)
booksQ4 <- filter(booksQ4,
                  publication_date >= as.Date("2005-01-01"),
                  publication_date <= as.Date("2005-12-31"),
                  average_rating > 4.5 | text_reviews_count > 1000)
booksQ4
## # A tibble: 154 x 3
##    publication_date average_rating text_reviews_count
##    <date>                    <dbl>              <dbl>
##  1 2005-09-12                 4.73                808
##  2 2005-08-30                 3.42               1688
##  3 2005-11-15                 3.87               1706
##  4 2005-12-27                 3.93               2780
##  5 2005-09-27                 3.77               2925
##  6 2005-09-13                 4.56                299
##  7 2005-05-03                 4.5                1427
##  8 2005-11-08                 4.55                294
##  9 2005-07-05                 4.52                419
## 10 2005-11-08                 3.82               1726
## # … with 144 more rows
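
The filter above combines two kinds of conditions: the date bounds must both hold (AND), while the rating and review conditions are alternatives (OR). A minimal sketch of that logic on a made-up tibble, assuming dplyr is installed; the rows and the names toy and toy_2005 are hypothetical and not part of the books data:

```r
library(dplyr)

# Hypothetical rows for illustration only; not from the books data
toy <- tibble(
  publication_date   = as.Date(c("2005-03-01", "2005-06-15",
                                 "2004-12-31", "2005-10-10")),
  average_rating     = c(4.6, 3.9, 4.8, 4.1),
  text_reviews_count = c(50, 1500, 2000, 200)
)

# Comma-separated conditions in filter() are combined with AND;
# `|` inside a single condition is OR
toy_2005 <- filter(
  toy,
  publication_date >= as.Date("2005-01-01"),
  publication_date <= as.Date("2005-12-31"),
  average_rating > 4.5 | text_reviews_count > 1000
)

toy_2005
```

Here the third row is dropped because it falls outside 2005, and the fourth is dropped because it meets neither the rating nor the review threshold.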

Question 5

books <- mutate(books, tot_points = average_rating * ratings_count)
authors_top <- group_by(books, authors)
authors_top <- summarise(authors_top, top_ranked = sum(tot_points))
authors_top <- arrange(authors_top, desc(top_ranked))
head(authors_top, 5)
## # A tibble: 5 x 2
##   authors         top_ranked
##   <chr>                <dbl>
## 1 J.K. Rowling     40534919.
## 2 J.R.R. Tolkien   24143922 
## 3 Dan Brown        16014699.
## 4 Stephen King     15680588.
## 5 Nicholas Sparks  12556063.

The objective of this code was to rank authors by the total review points they acquired. An author's total review points are the sum of average_rating x ratings_count over all of their books in this dataset. To find the top authors, I sorted the totals in descending order so that the authors with the most points come first, then used the head function to show the top five.
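
The same steps can also be chained with the pipe operator, which avoids reassigning an intermediate object at each stage. This is only a sketch on a made-up tibble, assuming dplyr is installed; the author names, values, and the names toy_books and toy_top are hypothetical and not part of the books data:

```r
library(dplyr)

# Hypothetical authors and values for illustration only; not from the books data
toy_books <- tibble(
  authors        = c("Author A", "Author A", "Author B", "Author C"),
  average_rating = c(4.0, 5.0, 3.0, 4.5),
  ratings_count  = c(100, 10, 200, 20)
)

toy_top <- toy_books %>%
  mutate(tot_points = average_rating * ratings_count) %>%  # points per book
  group_by(authors) %>%                                    # one group per author
  summarise(top_ranked = sum(tot_points)) %>%              # total points per author
  arrange(desc(top_ranked)) %>%                            # highest totals first
  head(2)                                                  # keep the top two

toy_top
```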