Webscraping Assignment

Author

Renato Chavez

Published

March 30, 2023

library(tidyverse)
library(ggplot2)
library(scales)

library('rvest')

url <- 'http://www.imdb.com/search/title?count=100&release_date=2016,2016&title_type=feature'
webpage <- read_html(url)
rank_data_html <- html_nodes(webpage, '.text-primary')
rank_data <- html_text(rank_data_html)
head(rank_data)
[1] "1." "2." "3." "4." "5." "6."
length(rank_data)
[1] 100
rank_data<-as.numeric(rank_data)
head(rank_data)
[1] 1 2 3 4 5 6
length(rank_data)
[1] 100
title_data_html <- html_nodes(webpage, '.lister-item-header a')
title_data <- html_text(title_data_html)
head(title_data)
[1] "The Magnificent Seven"        "Me Before You"               
[3] "Rogue One: A Star Wars Story" "Hidden Figures"              
[5] "Suicide Squad"                "Sing"                        
length(title_data)
[1] 100
description_data_html <- html_nodes(webpage, '.ratings-bar+ .text-muted')
description_data <- html_text(description_data_html)
head(description_data)
[1] "\nSeven gunmen from a variety of backgrounds are brought together by a vengeful young widow to protect her town from the private army of a destructive industrialist."                                                          
[2] "\nA girl in a small town forms an unlikely bond with a recently-paralyzed man she's taking care of."                                                                                                                            
[3] "\nIn a time of conflict, a group of unlikely heroes band together on a mission to steal the plans to the Death Star, the Empire's ultimate weapon of destruction."                                                              
[4] "\nThe story of a team of female African-American mathematicians who served a vital role in NASA during the early years of the U.S. space program."                                                                              
[5] "\nA secret government agency recruits some of the most dangerous incarcerated super-villains to form a defensive task force. Their first mission: save the world from the apocalypse."                                          
[6] "\nIn a city of humanoid animals, a hustling theater impresario's attempt to save his theater with a singing competition becomes grander than he anticipates even as its finalists find that their lives will never be the same."
length(description_data)
[1] 100
description_data<-gsub("\n","",description_data)
head(description_data)
[1] "Seven gunmen from a variety of backgrounds are brought together by a vengeful young widow to protect her town from the private army of a destructive industrialist."                                                          
[2] "A girl in a small town forms an unlikely bond with a recently-paralyzed man she's taking care of."                                                                                                                            
[3] "In a time of conflict, a group of unlikely heroes band together on a mission to steal the plans to the Death Star, the Empire's ultimate weapon of destruction."                                                              
[4] "The story of a team of female African-American mathematicians who served a vital role in NASA during the early years of the U.S. space program."                                                                              
[5] "A secret government agency recruits some of the most dangerous incarcerated super-villains to form a defensive task force. Their first mission: save the world from the apocalypse."                                          
[6] "In a city of humanoid animals, a hustling theater impresario's attempt to save his theater with a singing competition becomes grander than he anticipates even as its finalists find that their lives will never be the same."
length(description_data)
[1] 100
runtime_data_html <- html_nodes(webpage, '.text-muted .runtime')
runtime_data <- html_text(runtime_data_html)
head(runtime_data)
[1] "132 min" "106 min" "133 min" "127 min" "123 min" "108 min"
length(runtime_data)
[1] 100
runtime_data<-gsub(" min","",runtime_data)
runtime_data<-as.numeric(runtime_data)
head(runtime_data)
[1] 132 106 133 127 123 108
length(runtime_data)
[1] 100
genre_data_html <- html_nodes(webpage, '.genre')
genre_data <- html_text(genre_data_html)
head(genre_data)
[1] "\nAction, Adventure, Western            "
[2] "\nDrama, Romance            "            
[3] "\nAction, Adventure, Sci-Fi            " 
[4] "\nBiography, Drama, History            " 
[5] "\nAction, Adventure, Fantasy            "
[6] "\nAnimation, Comedy, Family            " 
length(genre_data)
[1] 100
genre_data<-gsub("\n","",genre_data)
genre_data<-gsub(" ","",genre_data)
genre_data<-gsub(",.*","",genre_data)
genre_data<-as.factor(genre_data)
head(genre_data)
[1] Action    Drama     Action    Biography Action    Animation
Levels: Action Adventure Animation Biography Comedy Crime Drama Horror
length(genre_data)
[1] 100
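The cleaning above keeps only the first genre IMDb lists for each film, so a film tagged "Action, Adventure, Sci-Fi" counts as Action. The following is a minimal base R sketch of the same idea, applied to two of the raw strings printed above. Instead of deleting every space, it trims the surrounding whitespace and then drops everything from the first comma onward.

```r
# Two raw genre strings copied from the scraped output above
raw_genres <- c("\nAction, Adventure, Western            ",
                "\nDrama, Romance            ")

# trimws() strips the leading newline and the trailing spaces;
# sub() then removes the first comma and everything after it
first_genre <- sub(",.*$", "", trimws(raw_genres))

# Store the result as a factor, as the main analysis does
first_genre <- factor(first_genre)
print(first_genre)
```

Trimming instead of deleting all spaces would also preserve any genre name that contained an internal space. For the genres printed above, both approaches give the same result.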
rating_data_html <- html_nodes(webpage,'.ratings-imdb-rating strong')
rating_data <- html_text(rating_data_html)
head(rating_data)
[1] "6.8" "7.4" "7.8" "7.8" "5.9" "7.1"
length(rating_data)
[1] 100
rating_data<-as.numeric(rating_data)
head(rating_data)
[1] 6.8 7.4 7.8 7.8 5.9 7.1
length(rating_data)
[1] 100
votes_data_html <- html_nodes(webpage,'.sort-num_votes-visible span:nth-child(2)')
votes_data <- html_text(votes_data_html)
head(votes_data)
[1] "217,105" "263,239" "651,912" "238,256" "695,436" "176,623"
length(votes_data)
[1] 100
votes_data<-gsub(",","",votes_data)
votes_data<-as.numeric(votes_data)
head(votes_data)
[1] 217105 263239 651912 238256 695436 176623
length(votes_data)
[1] 100
directors_data_html <- html_nodes(webpage,'.text-muted+ p a:nth-child(1)')
directors_data <- html_text(directors_data_html)
head(directors_data)
[1] "Antoine Fuqua"  "Thea Sharrock"  "Gareth Edwards" "Theodore Melfi"
[5] "David Ayer"     "Garth Jennings"
length(directors_data)
[1] 100
directors_data<-as.factor(directors_data)
actors_data_html <- html_nodes(webpage,'.lister-item-content .ghost+ a')
actors_data <- html_text(actors_data_html)
head(actors_data)
[1] "Denzel Washington"   "Emilia Clarke"       "Felicity Jones"     
[4] "Taraji P. Henson"    "Will Smith"          "Matthew McConaughey"
length(actors_data)
[1] 100
ratings_bar_data <- html_nodes(webpage,'.ratings-bar') %>%
  html_text2()

head(ratings_bar_data)
[1] "6.8\nRate this\n 1 2 3 4 5 6 7 8 9 10 6.8/10 X \n54 Metascore"
[2] "7.4\nRate this\n 1 2 3 4 5 6 7 8 9 10 7.4/10 X \n51 Metascore"
[3] "7.8\nRate this\n 1 2 3 4 5 6 7 8 9 10 7.8/10 X \n65 Metascore"
[4] "7.8\nRate this\n 1 2 3 4 5 6 7 8 9 10 7.8/10 X \n74 Metascore"
[5] "5.9\nRate this\n 1 2 3 4 5 6 7 8 9 10 5.9/10 X \n40 Metascore"
[6] "7.1\nRate this\n 1 2 3 4 5 6 7 8 9 10 7.1/10 X \n59 Metascore"
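# Capture the number right before "Metascore" (any number of digits);
# films without a Metascore yield NA, so the vector stays 100 long
metascore_data <- str_match(ratings_bar_data, "(\\d+) Metascore")[, 2] %>%
  as.numeric()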

length(metascore_data)
[1] 100
metascore_data
  [1] 54 51 65 74 40 59 94 65 71 81 81 78 84 79 62 66 70 56 NA 68 67 25 73 52 96
 [26] 44 64 55 99 76 88 44 75 36 41 47 51 72 65 57 69 48 66 32 81 72 74 51 65 66
 [51] 77 NA 71 42 81 33 58 65 48 57 67 62 79 80 32 42 46 21 NA 79 52 45 48 42 77
 [76] 77 34 73 33 46 60 NA 78 61 76 66 40 58 23 44 59 22 60 58 35 39 60 34 81 49
summary(metascore_data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  21.00   46.00   60.50   59.57   73.25   99.00       4 
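A likely reason for parsing the Metascore from the whole ratings-bar text, instead of selecting it directly, is that some films have no Metascore (four NAs above). Extracting the value film by film keeps an NA in the right position, so the vector still has 100 elements aligned with the other columns. The base R sketch below shows that per-item extraction on hypothetical strings shaped like the output above. The second string deliberately lacks a Metascore.

```r
# Hypothetical ratings-bar strings; the second film has no Metascore
bars <- c("6.8\nRate this\n 6.8/10 X \n54 Metascore",
          "7.1\nRate this\n 7.1/10 X \n",
          "7.8\nRate this\n 7.8/10 X \n65 Metascore")

# regexec() finds one match per string; regmatches() returns
# character(0) for strings that do not match
matches <- regmatches(bars, regexec("(\\d+) Metascore", bars))

# Element 2 of each match is the captured number; no match -> NA
metascore <- vapply(matches,
                    function(m) if (length(m) == 0) NA_real_ else as.numeric(m[2]),
                    numeric(1))
print(metascore)
```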
votes_bar_data <- html_nodes(webpage,'.sort-num_votes-visible') %>%
  html_text2()

head(votes_bar_data)
[1] "Votes: 217,105 | Gross: $93.43M"  "Votes: 263,239 | Gross: $56.25M" 
[3] "Votes: 651,912 | Gross: $532.18M" "Votes: 238,256 | Gross: $169.61M"
[5] "Votes: 695,436 | Gross: $325.10M" "Votes: 176,623 | Gross: $270.40M"
# Keep the "$...M" part, then drop the "M" suffix and the leading "$"
gross_data <- str_match(votes_bar_data, "\\$.+$")
gross_data <- gsub("M","",gross_data)
gross_data <- substring(gross_data, 2) %>%
  as.numeric()
length(gross_data)
[1] 100
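Gross earnings follow the same pattern. The number is read from the "Votes ... | Gross ..." text, so a film without a reported gross becomes NA instead of shifting later values. The following is a base R sketch that parses only the strings containing a gross figure, which avoids coercion warnings. It assumes the figure is always in millions with an "M" suffix, as in the output above, and the second string is hypothetical.

```r
# First string copied from the output above; the second is a
# hypothetical film with no reported gross
votes_bar <- c("Votes: 217,105 | Gross: $93.43M",
               "Votes: 44,358")

has_gross <- grepl("Gross: \\$", votes_bar)

# Start with all NA, then fill in only the strings that report a gross
gross <- rep(NA_real_, length(votes_bar))
gross[has_gross] <- as.numeric(
  sub(".*Gross: \\$([0-9.]+)M.*", "\\1", votes_bar[has_gross])
)
print(gross)
```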
movies_df<-data.frame(Rank = rank_data, Title = title_data, Description = description_data, 
                      Runtime = runtime_data, Genre = genre_data, Rating = rating_data, 
                      Director = directors_data, Actors = actors_data, 
                      Metascore = metascore_data, Votes = votes_data, 
                      Gross_Earning_in_Mil = gross_data)
str(movies_df)
'data.frame':   100 obs. of  11 variables:
 $ Rank                : num  1 2 3 4 5 6 7 8 9 10 ...
 $ Title               : chr  "The Magnificent Seven" "Me Before You" "Rogue One: A Star Wars Story" "Hidden Figures" ...
 $ Description         : chr  "Seven gunmen from a variety of backgrounds are brought together by a vengeful young widow to protect her town f"| __truncated__ "A girl in a small town forms an unlikely bond with a recently-paralyzed man she's taking care of." "In a time of conflict, a group of unlikely heroes band together on a mission to steal the plans to the Death St"| __truncated__ "The story of a team of female African-American mathematicians who served a vital role in NASA during the early "| __truncated__ ...
 $ Runtime             : num  132 106 133 127 123 108 128 108 139 116 ...
 $ Genre               : Factor w/ 8 levels "Action","Adventure",..: 1 7 1 4 1 3 5 1 4 7 ...
 $ Rating              : num  6.8 7.4 7.8 7.8 5.9 7.1 8 8 8.1 7.9 ...
 $ Director            : Factor w/ 99 levels "Aisling Walsh",..: 11 91 34 92 26 36 20 94 61 30 ...
 $ Actors              : chr  "Denzel Washington" "Emilia Clarke" "Felicity Jones" "Taraji P. Henson" ...
 $ Metascore           : num  54 51 65 74 40 59 94 65 71 81 ...
 $ Votes               : num  217105 263239 651912 238256 695436 ...
 $ Gross_Earning_in_Mil: num  93.4 56.2 532.2 169.6 325.1 ...
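`data.frame()` needs every column to have the same length. It stops with an error for most mismatches. However, it silently recycles a shorter column whose length divides the longest one. A selector that matched only 50 of the 100 films could therefore produce a misaligned table without complaint. The small guard below makes the check explicit before the frame is built. It uses hypothetical three-element columns in place of the real 100-element ones.

```r
# Hypothetical scraped columns; in the assignment each has 100 elements
cols <- list(Rank = c(1, 2, 3),
             Title = c("Film A", "Film B", "Film C"),
             Metascore = c(54, NA, 65))

col_lengths <- lengths(cols)
# Fail loudly if any scraped vector is longer or shorter than the rest
if (length(unique(col_lengths)) != 1) {
  stop("Column lengths differ: ",
       paste(names(col_lengths), col_lengths, sep = "=", collapse = ", "))
}

toy_df <- as.data.frame(cols, stringsAsFactors = FALSE)
str(toy_df)
```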

Question 1: Based on the above data, which movie from which genre had the longest runtime?

First, I sort the dataset by runtime in descending order so that the longest movies come first, and I keep only the top 10 for the plot.
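The sort-and-truncate step can be sketched in base R on a hypothetical toy data frame standing in for `movies_df`:

```r
# Hypothetical toy data standing in for movies_df
toy <- data.frame(Title = c("Film A", "Film B", "Film C", "Film D"),
                  Runtime = c(120, 163, 95, 147),
                  stringsAsFactors = FALSE)

# order(-Runtime) puts the longest films first; head() keeps the first n rows
longest_toy <- head(toy[order(-toy$Runtime), ], 2)
print(longest_toy)
```

In dplyr 1.0 or later, `slice_max(movies_df, Runtime, n = 10)` is a shorter alternative. Note that it keeps ties by default (`with_ties = TRUE`), so it can return more than 10 rows when several films share the 10th-longest runtime.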

longest <- movies_df %>% arrange(desc(Runtime)) %>%
  slice(1:10)
longest
   Rank                              Title
1    64                     American Honey
2    63                            Silence
3    55                        The Wailing
4    26 Batman v Superman: Dawn of Justice
5    72                          Brimstone
6    33         Captain America: Civil War
7    36                A Cure for Wellness
8    13                     The Handmaiden
9    24                  X-Men: Apocalypse
10   42                           13 Hours
                                                                                                                                                                                                                       Description
1               A teenage girl with nothing to lose joins a traveling magazine sales crew, and gets caught up in a whirlwind of hard partying, law bending and young love as she criss-crosses the Midwest with a band of misfits.
2                                                In the 17th century, two Portuguese Jesuit priests travel to Japan in an attempt to locate their mentor, who is rumored to have committed apostasy, and to propagate Catholicism.
3                                   Soon after a stranger arrives in a little village, a mysterious sickness starts spreading. A policeman, drawn into the incident, is forced to solve the mystery in order to save his daughter.
4                                                                    Fearing that the actions of Superman are left unchecked, Batman takes on the Man of Steel, while the world wrestles with what kind of a hero it really needs.
5                                                                                                                       From the moment the new Reverend climbs the pulpit, Liz knows that she and her family are in great danger.
6                                                                                                                               Political involvement in the Avengers' affairs causes a rift between Captain America and Iron Man.
7  An ambitious young executive is sent to retrieve his company's CEO from an idyllic but mysterious "wellness center" at a remote location in the Swiss Alps, but soon suspects that the spa's treatments are not what they seem.
8                                                                                                                   A woman is hired as a handmaiden to a Japanese heiress, but secretly she is involved in a plot to defraud her.
9                                                                                In the 1980s the X-Men must defeat an ancient all-powerful mutant, En Sabah Nur, who intends to thrive through bringing destruction to the world.
10                                                                                                                         During an attack on a U.S. compound in Libya, a security team struggles to make sense out of the chaos.
   Runtime     Genre Rating         Director          Actors Metascore  Votes
1      163 Adventure    7.0    Andrea Arnold      Sasha Lane        80  44358
2      161     Drama    7.1  Martin Scorsese Andrew Garfield        79 115844
3      156     Drama    7.4      Na Hong-jin    Jun Kunimura        81  72386
4      151    Action    6.4      Zack Snyder     Ben Affleck        44 708536
5      148     Drama    7.0 Martin Koolhoven      Guy Pearce        45  42812
6      147    Action    7.8    Anthony Russo     Chris Evans        75 803900
7      146     Drama    6.4   Gore Verbinski     Dane DeHaan        47 102863
8      145     Drama    8.1   Park Chan-wook     Kim Min-hee        84 154509
9      144    Action    6.9     Bryan Singer    James McAvoy        52 442940
10     144    Action    7.3      Michael Bay  John Krasinski        48 148447
   Gross_Earning_in_Mil
1                  0.66
2                  7.10
3                    NA
4                330.36
5                    NA
6                408.08
7                  8.11
8                  2.01
9                155.44
10                52.85
ggplot(longest, aes(Genre, Runtime, color = Title)) + 
  geom_point(aes(size = Rating), alpha = 1/2) + 
  ggtitle("Question 1") + 
  xlab("Genres") + 
  ylab("Runtime") + 
  scale_size_area() 

Answer to question 1:

Based on the sorted table and the plot, the movie with the longest runtime is American Honey, at 163 minutes. Its genre is Adventure, which is the first genre IMDb lists for it; only the first listed genre was kept during cleaning.

Question 2: Based on the above data, among movies with a runtime of 130 to 160 minutes, which genre has the most votes?

First, I keep only the movies whose runtime is at least 130 minutes and at most 160 minutes.
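The filter uses inclusive bounds, so films of exactly 130 or 160 minutes are kept. The following is a base R sketch on hypothetical runtimes placed just inside and just outside the range:

```r
# Hypothetical runtimes straddling the 130-160 minute boundaries
toy <- data.frame(Title = c("Film A", "Film B", "Film C", "Film D"),
                  Runtime = c(129, 130, 160, 161),
                  stringsAsFactors = FALSE)

# Both comparisons are inclusive, so 130 and 160 are kept
in_range <- toy[toy$Runtime >= 130 & toy$Runtime <= 160, ]
print(in_range$Title)
```

dplyr's `between(Runtime, 130, 160)` expresses the same inclusive range inside `filter()`.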

highest_votes <- movies_df
highest_votes1 <- filter(highest_votes, Runtime >= 130 & Runtime <= 160) 
ggplot(highest_votes1, aes(Runtime, Votes, color = Genre)) + 
  # a fixed size set outside aes() does not create a size legend
  geom_point(size = 4, alpha = 1/2) + 
  ggtitle("Question 2") + 
  xlab("Runtime") + 
  ylab("Votes") + 
  scale_y_continuous(labels = comma)

The filtered dataset now contains only movies with runtimes between 130 and 160 minutes. Next, I split it by genre and sum the votes within each genre. This gives a more precise answer than reading the plot alone.
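The per-genre totals can also be computed in a single grouped call instead of one filter per genre. This has the advantage that no genre present in the subset can be overlooked. The following is a base R sketch using `aggregate()` on hypothetical toy votes:

```r
# Hypothetical toy data standing in for highest_votes1
toy <- data.frame(Genre = c("Action", "Drama", "Action", "Horror"),
                  Votes = c(100, 40, 250, 30),
                  stringsAsFactors = FALSE)

# Sum votes within each genre; one output row per genre present
votes_by_genre <- aggregate(Votes ~ Genre, data = toy, FUN = sum)

# Sort so the genre with the most votes comes first
votes_by_genre <- votes_by_genre[order(-votes_by_genre$Votes), ]
print(votes_by_genre)
```

With dplyr, `highest_votes1 %>% group_by(Genre) %>% summarise(total_votes = sum(Votes))` is the equivalent grouped summary.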

highest_votes_action <- filter(highest_votes1, Genre == "Action")
highest_votes_bio <- filter(highest_votes1, Genre == "Biography")
highest_votes_drama <- filter(highest_votes1, Genre == "Drama")
highest_votes_adventure <- filter(highest_votes1, Genre == "Adventure")
highest_votes_horror <- filter(highest_votes1, Genre == "Horror")
highest_votes_anim <- filter(highest_votes1, Genre == "Animation")
sum(highest_votes_action$Votes)
[1] 2972840
sum(highest_votes_bio$Votes)
[1] 581180
sum(highest_votes_drama$Votes)
[1] 774923
sum(highest_votes_adventure$Votes)
[1] 485176
sum(highest_votes_horror$Votes)
[1] 278815
sum(highest_votes_anim$Votes)
[1] 87202

Answer to question 2:

As the plot suggested, Action is the genre with the most votes among movies with a runtime of 130 to 160 minutes. It has 2,972,840 votes in total, well ahead of Drama (774,923), the next highest of the genres summed above.

Question 3: Based on the above data, across all genres, which genre has the highest average gross earnings among movies with a runtime of 100 to 120 minutes?

I start by creating a subset with the movies whose runtime is between 100 and 120 minutes.

gross_earnings <- movies_df
gross_earnings1 <- filter(gross_earnings, Runtime >= 100 & Runtime <= 120) 

Then, as in Question 2, I plot the subset to get an overview of the data.

ggplot(gross_earnings1, aes(Runtime, Gross_Earning_in_Mil, color = Genre)) + 
  # a fixed size set outside aes() does not create a size legend
  geom_point(size = 4, alpha = 1/2) + 
  ggtitle("Question 3") + 
  xlab("Runtime") + 
  ylab("Gross Earnings in Millions") + 
  scale_y_continuous(labels = comma)
Warning: Removed 3 rows containing missing values (`geom_point()`).

Now I explore each genre in the filtered dataset. For each genre, I first list its movies. I then compute the genre's average gross earnings by dividing the sum of its movies' gross earnings (in millions) by the number of movies in that genre, and I store the result in a variable named a_g_e_<genre> (for example, a_g_e_action). Movies with a missing gross (NA) are skipped in the sum via na.rm = TRUE but still counted in the denominator, so they effectively contribute zero. Comparing these averages across genres answers the question.
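The averaging rule described above, together with the common alternative of averaging only the movies that report a gross, can be sketched with `tapply()` on hypothetical toy data. Computing the denominator with `length()` instead of typing a count by hand also avoids miscounting a genre's movies.

```r
# Hypothetical toy data, not scraped values
toy <- data.frame(
  Genre = c("Action", "Action", "Action", "Drama", "Drama"),
  Gross_Earning_in_Mil = c(100, NA, 50, 30, 10),
  stringsAsFactors = FALSE
)

# Rule described above: sum ignoring NA, divide by all movies in the genre
avg_na_as_zero <- tapply(toy$Gross_Earning_in_Mil, toy$Genre,
                         function(x) sum(x, na.rm = TRUE) / length(x))

# Alternative: average only the movies with a reported gross
avg_reported <- tapply(toy$Gross_Earning_in_Mil, toy$Genre, mean, na.rm = TRUE)

print(avg_na_as_zero)
print(avg_reported)
```

In this toy case, the two rules disagree only for the genre with a missing value. This is why the NA handling should be stated alongside the answer.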

action_count <- filter(gross_earnings1, Genre == "Action")
print(action_count)
   Rank                                            Title
1     8                                         Deadpool
2    17                                    The Nice Guys
3    23                                   Train to Busan
4    32                             The Legend of Tarzan
5    34                                 Assassin's Creed
6    38                                   Doctor Strange
7    56                                     The 5th Wave
8    62                                    The Bad Batch
9    65                     Independence Day: Resurgence
10   71                             Central Intelligence
11   74                                   The Great Wall
12   79                                        Allegiant
13   87 Teenage Mutant Ninja Turtles: Out of the Shadows
14   92                                      The Do-Over
15   93                    Ghostbusters: Answer the Call
16   95                       The Huntsman: Winter's War
17  100                 Resident Evil: The Final Chapter
                                                                                                                                                                                                     Description
1                                                                          A wisecracking mercenary gets experimented on and becomes immortal but ugly, and sets out to track down the man who ruined his looks.
2                                                                                    In 1970s Los Angeles, a mismatched pair of private eyes investigate a missing girl and the mysterious death of a porn star.
3                                                                                               While a zombie virus breaks out in South Korea, passengers struggle to survive on the train from Seoul to Busan.
4                                                             Tarzan, having acclimated to life in London, is called back to his former home in the jungle to investigate the activities at a mining encampment.
5                                                    Callum Lynch explores the memories of his ancestor Aguilar de Nerha and gains the skills of a Master Assassin, before taking on the secret Templar society.
6                                                                                     While on a journey of physical and spiritual healing, a brilliant neurosurgeon is drawn into the world of the mystic arts.
7                                                         Four waves of increasingly deadly alien attacks have left most of Earth in ruin. Cassie is on the run, desperately trying to save her younger brother.
8                                                                                                                                                 In a desert dystopia, a young woman is kidnapped by cannibals.
9                                                          Two decades after the first Independence Day invasion, Earth is faced with a new extra-Solar threat. But will mankind's new space defenses be enough?
10                                                     After he reconnects with an awkward pal from high school through Facebook, a mild-mannered accountant is lured into the world of international espionage.
11                                         In ancient China, a group of European mercenaries encounters a secret army that maintains and defends the Great Wall of China against a horde of monstrous creatures.
12                      After the earth-shattering revelations of Insurgent, Tris must escape with Four beyond the wall that encircles Chicago, to finally discover the shocking truth of the world around them.
13                                              The Turtles get into another battle with their enemy the Shredder, who has acquired new allies: the mutant thugs Bebop and Rocksteady and the alien being Krang.
14                                  Two down-on-their-luck guys decide to fake their own deaths and start over with new identities, only to find the people they're pretending to be are in even deeper trouble.
15 Following a ghost invasion of Manhattan, paranormal enthusiasts Erin Gilbert and Abby Yates, nuclear engineer Jillian Holtzmann, and subway worker Patty Tolan band together to stop the otherworldly threat.
16           Eric and fellow warrior Sara, raised as members of ice Queen Freya's army, try to conceal their forbidden love as they fight to survive the wicked intentions of both Freya and her sister Ravenna.
17       Alice returns to where the nightmare began: The Hive in Raccoon City, where the Umbrella Corporation is gathering its forces for a final strike against the only remaining survivors of the apocalypse.
   Runtime  Genre Rating                Director               Actors Metascore
1      108 Action    8.0              Tim Miller        Ryan Reynolds        65
2      116 Action    7.3             Shane Black        Russell Crowe        70
3      118 Action    7.6            Sang-ho Yeon             Gong Yoo        73
4      110 Action    6.2             David Yates  Alexander Skarsgård        44
5      115 Action    5.6           Justin Kurzel   Michael Fassbender        36
6      115 Action    7.5        Scott Derrickson Benedict Cumberbatch        72
7      112 Action    5.2              J Blakeson   Chloë Grace Moretz        33
8      118 Action    5.2       Ana Lily Amirpour      Suki Waterhouse        62
9      120 Action    5.2         Roland Emmerich       Liam Hemsworth        32
10     107 Action    6.3 Rawson Marshall Thurber       Dwayne Johnson        52
11     103 Action    5.9             Yimou Zhang           Matt Damon        42
12     120 Action    5.7        Robert Schwentke     Shailene Woodley        33
13     112 Action    5.9              Dave Green            Megan Fox        40
14     108 Action    5.7            Steven Brill         Adam Sandler        22
15     117 Action    6.8               Paul Feig     Melissa McCarthy        60
16     114 Action    6.1   Cedric Nicolas-Troyan      Chris Hemsworth        35
17     107 Action    5.5      Paul W.S. Anderson       Milla Jovovich        49
     Votes Gross_Earning_in_Mil
1  1057398               363.07
2   338304                36.26
3   230781                 2.13
4   180413               126.64
5   202192                54.65
6   758300               232.64
7   112405                34.92
8    33140                 0.18
9   182762               103.14
10  197847               127.44
11  139559                45.54
12  126380                66.18
13   95479                82.05
14   48323                   NA
15  235782               128.34
16  113351                48.39
17   96678                26.83
action_count1 <- mutate(action_count, a_g_e_action = (sum(action_count[, 'Gross_Earning_in_Mil'], na.rm = TRUE)) / nrow(action_count))
drama_count <- filter(gross_earnings1, Genre == "Drama")
print(drama_count)
  Rank                         Title
1    2                 Me Before You
2   10                       Arrival
3   21             Nocturnal Animals
4   29                     Moonlight
5   30           10 Cloverfield Lane
6   35                    Passengers
7   69 American Wrestler: The Wizard
                                                                                                                                                                                   Description
1                                                                                            A girl in a small town forms an unlikely bond with a recently-paralyzed man she's taking care of.
2                                                           A linguist works with the military to communicate with alien lifeforms after twelve mysterious spacecraft appear around the world.
3                                                              A wealthy art gallery owner is haunted by her ex-husband's novel, a violent thriller she interprets as a symbolic revenge tale.
4                         A young African-American man grapples with his identity and sexuality while experiencing the everyday struggles of childhood, adolescence, and burgeoning adulthood.
5                                                    A young woman is held in an underground bunker by a man who insists that a hostile event has left the surface of the Earth uninhabitable.
6                                                                     A malfunction in a sleeping pod on a spacecraft traveling to a distant colony planet wakes one passenger 90 years early.
7 In 1980, a teenage boy escapes the unrest in Iran only to face more hostility in America, due to the hostage crisis. Determined to fit in, he joins the school's floundering wrestling team.
  Runtime Genre Rating         Director            Actors Metascore  Votes
1     106 Drama    7.4    Thea Sharrock     Emilia Clarke        51 263239
2     116 Drama    7.9 Denis Villeneuve         Amy Adams        81 711635
3     116 Drama    7.5         Tom Ford         Amy Adams        67 287010
4     111 Drama    7.4    Barry Jenkins    Mahershala Ali        99 315736
5     103 Drama    7.2 Dan Trachtenberg      John Goodman        76 336944
6     116 Drama    7.0    Morten Tyldum Jennifer Lawrence        41 417349
7     117 Drama    7.1  Alex Ranarivelo     George Thomas        NA   2545
  Gross_Earning_in_Mil
1                56.25
2               100.55
3                10.66
4                27.85
5                72.08
6               100.01
7                   NA
drama_count1 <- mutate(drama_count, a_g_e_drama = (sum(drama_count[, 'Gross_Earning_in_Mil'], na.rm = TRUE)) / 7)
anim_count <- filter(gross_earnings1, Genre == "Animation")
print(anim_count)
  Rank      Title
1    6       Sing
2   11      Moana
3   12   Zootopia
4   14 Your Name.
                                                                                                                                                                                                                    Description
1 In a city of humanoid animals, a hustling theater impresario's attempt to save his theater with a singing competition becomes grander than he anticipates even as its finalists find that their lives will never be the same.
2                                                    In Ancient Polynesia, when a terrible curse incurred by the Demigod Maui reaches Moana's island, she answers the Ocean's call to seek out the Demigod to set things right.
3                                                                                             In a city of anthropomorphic animals, a rookie bunny cop and a cynical con artist fox must work together to uncover a conspiracy.
4                                                                                           Two strangers find themselves linked in a bizarre way. When a connection forms, will distance be the only thing to keep them apart?
  Runtime     Genre Rating       Director              Actors Metascore  Votes
1     108 Animation    7.1 Garth Jennings Matthew McConaughey        59 176623
2     107 Animation    7.6   Ron Clements     Auli'i Cravalho        81 346704
3     108 Animation    8.0   Byron Howard    Ginnifer Goodwin        78 511169
4     106 Animation    8.4 Makoto Shinkai    Ryunosuke Kamiki        79 279615
  Gross_Earning_in_Mil
1               270.40
2               248.76
3               341.27
4                 5.02
anim_count1 <- mutate(anim_count, a_g_e_anim = (sum(anim_count[, 'Gross_Earning_in_Mil'], na.rm = TRUE)) / 4)
horror_count <- filter(gross_earnings1, Genre == "Horror")
print(horror_count)
  Rank          Title
1   15          Split
2   48 The Neon Demon
                                                                                                                                                                                  Description
1                               Three girls are kidnapped by a man with a diagnosed 23 distinct personalities. They must try to escape before the apparent emergence of a frightful new 24th.
2 An aspiring model, Jesse, is new to Los Angeles. However, her beauty and youth, which generate intense fascination and jealousy within the fashion industry, may prove themselves sinister.
  Runtime  Genre Rating             Director       Actors Metascore  Votes
1     117 Horror    7.3   M. Night Shyamalan James McAvoy        62 512657
2     117 Horror    6.1 Nicolas Winding Refn Elle Fanning        51  98808
  Gross_Earning_in_Mil
1               138.29
2                 1.33
horror_count1 <- mutate(horror_count, a_g_e_horror = (sum(horror_count[, 'Gross_Earning_in_Mil'], na.rm = TRUE)) / 2)
bio_count <- filter(gross_earnings1, Genre == "Biography")
print(bio_count)
  Rank                Title
1   16          The Founder
2   41                 Lion
3   49               Maudie
4   60             War Dogs
5   90 Miracles from Heaven
                                                                                                                                                                                                            Description
1 The story of Ray Kroc, a salesman who turned two brothers' innovative fast food eatery, McDonald's, into the biggest restaurant business in the world, with a combination of ambition, persistence, and ruthlessness.
2                                               A five-year-old Indian boy is adopted by an Australian couple after getting lost hundreds of kilometers from home. 25 years later, he sets out to find his lost family.
3                                                               An arthritic Nova Scotia woman works as a housekeeper while she hones her skills as an artist and eventually becomes a beloved figure in the community.
4                        Loosely based on the true story of two young men, David Packouz and Efraim Diveroli, who won a three hundred million dollar contract from the Pentagon to arm America's allies in Afghanistan.
5                                                                                                                                                                Based on the incredible true story of the Beam family.
  Runtime     Genre Rating         Director          Actors Metascore  Votes
1     115 Biography    7.2 John Lee Hancock  Michael Keaton        66 157975
2     118 Biography    8.0      Garth Davis       Dev Patel        69 241265
3     115 Biography    7.6    Aisling Walsh   Sally Hawkins        65  19490
4     114 Biography    7.1    Todd Phillips      Jonah Hill        57 228673
5     109 Biography    7.1  Patricia Riggen Jennifer Garner        44  26648
  Gross_Earning_in_Mil
1                12.79
2                51.74
3                 6.17
4                43.03
5                61.71
bio_count1 <- mutate(bio_count, a_g_e_bio = (sum(bio_count[, 'Gross_Earning_in_Mil'], na.rm = TRUE)) / 5)
crime_count <- filter(gross_earnings1, Genre == "Crime")
print(crime_count)
  Rank                 Title
1   31    Hell or High Water
2   52   The Invisible Guest
3   59 The Girl on the Train
                                                                                                                                      Description
1                A divorced father and his ex-con older brother resort to a desperate scheme in order to save their family's ranch in West Texas.
2 A successful entrepreneur accused of murder and a witness preparation expert have less than three hours to come up with an impregnable defense.
3                           A divorcee becomes entangled in a missing persons investigation that promises to send shockwaves throughout her life.
  Runtime Genre Rating        Director      Actors Metascore  Votes
1     102 Crime    7.6 David Mackenzie  Chris Pine        88 237103
2     106 Crime    8.0     Oriol Paulo Mario Casas        NA 181191
3     112 Crime    6.5     Tate Taylor Emily Blunt        48 191474
  Gross_Earning_in_Mil
1                26.86
2                   NA
3                75.40
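# The Invisible Guest has no gross figure (NA): sum(na.rm = TRUE) skips it, but the divisor 3 still counts it,
# so this average treats that movie as earning 0. mean(..., na.rm = TRUE) would average only the other two.
crime_count1 <- mutate(crime_count, a_g_e_crime = (sum(crime_count[, 'Gross_Earning_in_Mil'], na.rm = TRUE)) / 3)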
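The choice of divisor matters whenever a gross figure is missing. A minimal base-R comparison using the three Crime values printed above, contrasting "divide by every row" with "average only the known values":

```r
# Gross earnings (millions) of the three Crime movies printed above;
# The Invisible Guest has no gross figure (NA).
crime_gross <- c(26.86, NA, 75.40)

# Approach used above: sum skips the NA, but we still divide by all 3 rows
avg_with_na_as_zero <- sum(crime_gross, na.rm = TRUE) / length(crime_gross)

# Alternative: average only the movies that have a gross figure
avg_known_only <- mean(crime_gross, na.rm = TRUE)

round(avg_with_na_as_zero, 5)  # 34.08667
avg_known_only                 # 51.13
```

Either convention can be defended, but it should be stated. Under the second one Crime rises to about 51.13, which is still far below the leading genre.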
adv_count <- filter(gross_earnings1, Genre == "Adventure")
print(adv_count)
  Rank                           Title
1   51                 The Jungle Book
2   98 Alice Through the Looking Glass
3   99       Hunt for the Wilderpeople
                                                                                                                                                                                                                                      Description
1                                           After a threat from the tiger Shere Khan forces him to flee the jungle, a man-cub named Mowgli embarks on a journey of self discovery with the help of panther Bagheera and free-spirited bear Baloo.
2 Alice is appointed to save her beloved Mad Hatter from deadly grief by travelling back to the past, but this means fatally harming Time himself, the noble clockwork man with the device needed to save the Hatter's family from the Red Queen.
3                                                                                                                            A national manhunt is ordered for a rebellious kid and his foster uncle who go missing in the wild New Zealand bush.
  Runtime     Genre Rating      Director         Actors Metascore  Votes
1     106 Adventure    7.4   Jon Favreau     Neel Sethi        77 281781
2     113 Adventure    6.2   James Bobin Mia Wasikowska        34 113439
3     101 Adventure    7.8 Taika Waititi      Sam Neill        81 134051
  Gross_Earning_in_Mil
1               364.00
2                77.04
3                 5.20
adv_count1 <- mutate(adv_count, a_g_e_adv = (sum(adv_count[, 'Gross_Earning_in_Mil'], na.rm = TRUE)) / 3)
comedy_count <- filter(gross_earnings1, Genre == "Comedy")
print(comedy_count)
  Rank                 Title
1   46     Captain Fantastic
2   68         Dirty Grandpa
3   76 The Edge of Seventeen
4   96              Why Him?
5   97              Bad Moms
                                                                                                                                                                                                                                       Description
1 In the forests of the Pacific Northwest, a father devoted to raising his six kids with a rigorous physical and intellectual education is forced to leave his paradise and enter the world, challenging his idea of what it means to be a parent.
2                                                                                       Right before his wedding, an uptight guy is tricked into driving his grandfather, a lecherous former Army Lieutenant Colonel, to Florida for Spring Break.
3                                                                                                                             High-school life gets even more unbearable for Nadine when her best friend, Krista, starts dating her older brother.
4                                                                               A holiday gathering threatens to go off the rails when Ned Fleming realizes that his daughter's Silicon Valley millionaire boyfriend is about to pop the question.
5                                             When three overworked and under-appreciated moms are pushed beyond their limits, they ditch their conventional responsibilities for a jolt of long overdue freedom, fun and comedic self-indulgence.
  Runtime  Genre Rating           Director           Actors Metascore  Votes
1     118 Comedy    7.8          Matt Ross  Viggo Mortensen        72 223708
2     102 Comedy    5.9          Dan Mazer   Robert De Niro        21 126942
3     104 Comedy    7.3 Kelly Fremon Craig Hailee Steinfeld        77 131460
4     111 Comedy    6.2       John Hamburg      Zoey Deutch        39 117176
5     100 Comedy    6.2          Jon Lucas       Mila Kunis        60 128981
  Gross_Earning_in_Mil
1                 5.88
2                35.59
3                14.43
4                60.32
5               113.26
comedy_count1 <- mutate(comedy_count, a_g_e_comedy = (sum(comedy_count[, 'Gross_Earning_in_Mil'], na.rm = TRUE)) / 5)

Now I will print all of the averages to see which genre has the highest.

action_count1[1 , c("a_g_e_action")]
[1] 147.84
drama_count1[1 , c("a_g_e_drama")] 
[1] 52.48571
anim_count1[1 , c("a_g_e_anim")] 
[1] 216.3625
horror_count1[1 , c("a_g_e_horror")] 
[1] 69.81
bio_count1[1 , c("a_g_e_bio")] 
[1] 35.088
crime_count1[1 , c("a_g_e_crime")]  
[1] 34.08667
adv_count1[1 , c("a_g_e_adv")] 
[1] 148.7467
comedy_count1[1 , c("a_g_e_comedy")] 
[1] 45.896
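The eight copy-pasted filter/mutate blocks, each with a hand-typed divisor, can be collapsed into one grouped summary. A minimal sketch with dplyr (already attached through tidyverse). With the real data you would pipe `gross_earnings1` instead; here `genre_sample` is a hypothetical stand-in built from the Animation, Adventure, and Crime rows printed in this section so the example runs on its own. Because it uses `mean(..., na.rm = TRUE)`, its Crime value differs from the hand-computed one above.

```r
library(dplyr)

# Hypothetical stand-in for gross_earnings1: rows copied from the output above
genre_sample <- data.frame(
  Genre = c("Animation", "Animation", "Animation", "Animation",
            "Adventure", "Adventure", "Adventure",
            "Crime", "Crime", "Crime"),
  Gross_Earning_in_Mil = c(270.40, 248.76, 341.27, 5.02,
                           364.00, 77.04, 5.20,
                           26.86, NA, 75.40)
)

# One row per genre: movie count and average gross (NAs skipped), highest first
genre_avgs <- genre_sample %>%
  group_by(Genre) %>%
  summarise(
    n_movies = n(),
    avg_gross = mean(Gross_Earning_in_Mil, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_gross))

genre_avgs
```

Since `n()` and `mean()` are computed per group, no divisor has to be typed by hand, and the top row of `genre_avgs` directly answers which genre has the highest average.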

Answer to question 3:

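The Animation genre has the highest average gross earnings among movies with a runtime of 100 to 120 minutes, at about $216.36 million, well ahead of Adventure ($148.75 million) and Action ($147.84 million). Of the four animated movies in this group, three are among the five highest-grossing movies in the dataset.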
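The "three of the top five" claim can be checked with code instead of by eye. A minimal sketch, assuming the full data frame is `gross_earnings1`; the hypothetical stand-in `sample_rows` below holds only the Animation, Adventure, and Horror grosses printed in this section, so its result covers those rows only (the Action and Drama movies are not included).

```r
library(dplyr)

# Hypothetical stand-in rows (genre and gross in millions) from the output above
sample_rows <- data.frame(
  Genre = c(rep("Animation", 4), rep("Adventure", 3), rep("Horror", 2)),
  Gross_Earning_in_Mil = c(270.40, 248.76, 341.27, 5.02,
                           364.00, 77.04, 5.20,
                           138.29, 1.33)
)

# Drop missing grosses, keep the five highest-grossing rows, then count per genre
top5 <- sample_rows %>%
  filter(!is.na(Gross_Earning_in_Mil)) %>%
  slice_max(Gross_Earning_in_Mil, n = 5)

count(top5, Genre)
```

Running the same pipeline on `gross_earnings1` (or on the full 100-movie table) would confirm or refute the claim for the whole dataset.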