This dataset on superhero TV shows was shared by Deepa Sharma.

  1. Data: This is a Kaggle dataset of superhero TV shows

https://www.kaggle.com/anoopkumarraut/superhero-tv-shows/data

  2. Possible Analysis: Find the highest-rated show overall and the highest-rated show for each release year.

Import libraries

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readr)
library(curl)
## Warning: package 'curl' was built under R version 4.1.3
## Using libcurl 7.64.1 with Schannel
## 
## Attaching package: 'curl'
## The following object is masked from 'package:readr':
## 
##     parse_date
##install.packages("curl")
library(ggplot2)
##install.packages("ggmap")
library(dplyr)
library(stringr)
library("magrittr")
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract

Load the CSV data from GitHub into an R data frame

df<- read.csv("https://raw.githubusercontent.com/deepasharma06/Data-607/main/Dataset_Superhero-TV-Shows.csv")
head(df)
##                  show_title imdb_rating release_year runtime
## 1                Peacemaker         8.5       2022-       40
## 2 The Legend of Vox Machina         8.6       2022-       30
## 3                 Daredevil         8.6    2015-2018      54
## 4                  The Boys         8.7       2019-       60
## 5              Raising Dion         7.2       2019-       50
## 6                    Titans         7.6       2018-       45
##                          genre parental_guideline imdb_votes
## 1    Action, Adventure, Comedy              TV-MA     60,116
## 2 Animation, Action, Adventure              TV-MA     13,128
## 3         Action, Crime, Drama              TV-MA   4,10,433
## 4         Action, Crime, Drama              TV-MA   3,47,831
## 5                Drama, Sci-Fi               TV-G     13,375
## 6     Action, Adventure, Crime              TV-MA     93,828
##                                                                                                                                                                                synopsis
## 1 Picking up where The Suicide Squad (2021) left off, Peacemaker returns home after recovering from his encounter with Bloodsport - only to discover that his freedom comes at a price.
## 2                                    In a desperate attempt to pay off a mounting bar tab, a band of misfits end up on a quest to save the realm of Exandria from dark, magical forces.
## 3                                                                                    A blind lawyer by day, vigilante by night. Matt Murdock fights the crime of New York as Daredevil.
## 4                                                                                           A group of vigilantes set out to take down corrupt superheroes who abuse their superpowers.
## 5                                                         A widowed single mom discovers that her son has super powers and tries to figure out how to raise him safely and responsibly.
## 6                                                                                                                             A team of young superheroes combat evil and other perils.

Keep only the columns needed for the analysis and drop the rest

library(tidyr)
df<- df[, c("show_title", "imdb_rating", "release_year", "genre")]
head(df)
##                  show_title imdb_rating release_year
## 1                Peacemaker         8.5       2022- 
## 2 The Legend of Vox Machina         8.6       2022- 
## 3                 Daredevil         8.6    2015-2018
## 4                  The Boys         8.7       2019- 
## 5              Raising Dion         7.2       2019- 
## 6                    Titans         7.6       2018- 
##                          genre
## 1    Action, Adventure, Comedy
## 2 Animation, Action, Adventure
## 3         Action, Crime, Drama
## 4         Action, Crime, Drama
## 5                Drama, Sci-Fi
## 6     Action, Adventure, Crime

Keep only the first 4 characters of release_year, so that the column holds the start year instead of a range

df$release_year <- substr(df$release_year,1,4)
head(df)
##                  show_title imdb_rating release_year
## 1                Peacemaker         8.5         2022
## 2 The Legend of Vox Machina         8.6         2022
## 3                 Daredevil         8.6         2015
## 4                  The Boys         8.7         2019
## 5              Raising Dion         7.2         2019
## 6                    Titans         7.6         2018
##                          genre
## 1    Action, Adventure, Comedy
## 2 Animation, Action, Adventure
## 3         Action, Crime, Drama
## 4         Action, Crime, Drama
## 5                Drama, Sci-Fi
## 6     Action, Adventure, Crime
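
As a small illustration of this step, the sketch below applies substr() to a few made-up values in the same format as release_year. The vector names `years_raw`, `start_year` and `start_year_int` are hypothetical, and the integer conversion at the end is an optional extra that the original code does not perform.

```r
# Hypothetical sample values in the same format as release_year
years_raw <- c("2022-", "2015-2018", "2019-", "TBA")

# Keep only the first four characters (the start year);
# strings shorter than four characters, such as "TBA", are returned unchanged
start_year <- substr(years_raw, 1, 4)
print(start_year)

# Optional: convert to integer; non-numeric text such as "TBA" becomes NA
# (as.integer() would normally warn about this, so the warning is suppressed)
start_year_int <- suppressWarnings(as.integer(start_year))
print(start_year_int)
```

Storing the year as an integer would let later steps sort or filter it numerically, but the rest of this report works with the character version.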

Replace any value of ‘Not-Rated’ in the imdb_rating column with NA

df$imdb_rating[df$imdb_rating == 'Not-Rated'] = NA
head(df)
##                  show_title imdb_rating release_year
## 1                Peacemaker         8.5         2022
## 2 The Legend of Vox Machina         8.6         2022
## 3                 Daredevil         8.6         2015
## 4                  The Boys         8.7         2019
## 5              Raising Dion         7.2         2019
## 6                    Titans         7.6         2018
##                          genre
## 1    Action, Adventure, Comedy
## 2 Animation, Action, Adventure
## 3         Action, Crime, Drama
## 4         Action, Crime, Drama
## 5                Drama, Sci-Fi
## 6     Action, Adventure, Crime

Replace any empty string in the imdb_rating column with NA

df$imdb_rating[df$imdb_rating == ''] = NA
head(df)
##                  show_title imdb_rating release_year
## 1                Peacemaker         8.5         2022
## 2 The Legend of Vox Machina         8.6         2022
## 3                 Daredevil         8.6         2015
## 4                  The Boys         8.7         2019
## 5              Raising Dion         7.2         2019
## 6                    Titans         7.6         2018
##                          genre
## 1    Action, Adventure, Comedy
## 2 Animation, Action, Adventure
## 3         Action, Crime, Drama
## 4         Action, Crime, Drama
## 5                Drama, Sci-Fi
## 6     Action, Adventure, Crime

Replace any value of ‘TBA’ in the release_year column with NA

df$release_year[df$release_year == 'TBA'] = NA
head(df)
##                  show_title imdb_rating release_year
## 1                Peacemaker         8.5         2022
## 2 The Legend of Vox Machina         8.6         2022
## 3                 Daredevil         8.6         2015
## 4                  The Boys         8.7         2019
## 5              Raising Dion         7.2         2019
## 6                    Titans         7.6         2018
##                          genre
## 1    Action, Adventure, Comedy
## 2 Animation, Action, Adventure
## 3         Action, Crime, Drama
## 4         Action, Crime, Drama
## 5                Drama, Sci-Fi
## 6     Action, Adventure, Crime
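
The three replacements above could possibly be handled while reading the file, using the `na.strings` argument of read.csv(). The sketch below is only an illustration under stated assumptions: `csv_text` and `demo` are hypothetical stand-ins for the real file, and `na.strings` applies to every column, which may or may not be what is wanted for the full dataset.

```r
# Hypothetical inline CSV standing in for the real file
csv_text <- "show_title,imdb_rating,release_year
Show A,8.5,2022-
Show B,Not-Rated,TBA
Show C,,2019-"

# Treat empty strings, 'Not-Rated' and 'TBA' as missing while reading
demo <- read.csv(text = csv_text,
                 na.strings = c("", "Not-Rated", "TBA"),
                 stringsAsFactors = FALSE)

print(demo)

# With the placeholder text gone, imdb_rating is read as a numeric column
str(demo)
```

A side effect of this approach is that imdb_rating no longer contains any text, so R can parse it as numeric rather than character.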

Remove all rows that contain NA in any column

df <- na.omit(df) 
head(df)
##                  show_title imdb_rating release_year
## 1                Peacemaker         8.5         2022
## 2 The Legend of Vox Machina         8.6         2022
## 3                 Daredevil         8.6         2015
## 4                  The Boys         8.7         2019
## 5              Raising Dion         7.2         2019
## 6                    Titans         7.6         2018
##                          genre
## 1    Action, Adventure, Comedy
## 2 Animation, Action, Adventure
## 3         Action, Crime, Drama
## 4         Action, Crime, Drama
## 5                Drama, Sci-Fi
## 6     Action, Adventure, Crime
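
Because na.omit() drops a row if any column is NA, it can be useful to check how many rows are affected. The sketch below uses a made-up data frame (`demo` and `clean` are hypothetical names) to count missing values per column and compare row counts before and after.

```r
# Hypothetical data frame with missing values in two columns
demo <- data.frame(
  show_title   = c("Show A", "Show B", "Show C", "Show D"),
  imdb_rating  = c("8.5", NA, "7.9", "8.1"),
  release_year = c("2022", "2021", NA, "2019"),
  stringsAsFactors = FALSE
)

# Count missing values per column before dropping anything
print(colSums(is.na(demo)))

# na.omit() keeps only the rows with no NA in any column
clean <- na.omit(demo)
cat("Rows before:", nrow(demo), " rows after:", nrow(clean), "\n")
```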

Sort the data in decreasing order of the imdb_rating column

df1 <- df[order(df$imdb_rating, decreasing = TRUE), ] 
head(df1)
##                          show_title imdb_rating release_year
## 23       Avatar: The Last Airbender         9.3         2005
## 36 Fullmetal Alchemist: Brotherhood         9.1         2009
## 51      Batman: The Animated Series           9         1992
## 53                     Cowboy Bebop         8.9         1998
## 4                          The Boys         8.7         2019
## 17                       Invincible         8.7         2021
##                           genre
## 23 Animation, Action, Adventure
## 36 Animation, Action, Adventure
## 51 Animation, Action, Adventure
## 53 Animation, Action, Adventure
## 4          Action, Crime, Drama
## 17 Animation, Action, Adventure

The table above shows that ‘Avatar: The Last Airbender’ is the highest-rated show in this dataset, with a rating of 9.3.
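
One caveat: the tibble output later in this report lists imdb_rating as `<chr>`, so the ordering here compares text rather than numbers. That gives the expected result while every rating has a single digit before the decimal point, but a value such as "10" would sort below "9". The sketch below uses made-up ratings (`ratings_chr` and `ratings_num` are hypothetical names) to show the difference and one possible fix, converting with as.numeric() before sorting.

```r
# Hypothetical ratings stored as text, as in the imdb_rating column
ratings_chr <- c("9.3", "10", "8.7", "9")

# Text ordering compares character by character, so "10" lands last
print(sort(ratings_chr, decreasing = TRUE))

# Numeric ordering after conversion puts 10 first
ratings_num <- as.numeric(ratings_chr)
print(sort(ratings_num, decreasing = TRUE))
```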

The following code also sorts the data in decreasing order of imdb_rating, but groups it by release_year and keeps only the top row of each group

df2 <- df %>%
  arrange(desc(imdb_rating)) %>%
  group_by(release_year) %>%
  slice(1:1) %>%
  arrange(desc(release_year))
head(df2)
## # A tibble: 6 x 4
## # Groups:   release_year [6]
##   show_title                imdb_rating release_year genre                      
##   <chr>                     <chr>       <chr>        <chr>                      
## 1 The Legend of Vox Machina 8.6         2022         Animation, Action, Adventu~
## 2 Invincible                8.7         2021         Animation, Action, Adventu~
## 3 Mashin Sentai Kiramager   8           2020         Action, Adventure, Comedy  
## 4 The Boys                  8.7         2019         Action, Crime, Drama       
## 5 Cinema Club               8.3         2018         Talk-Show                  
## 6 The Punisher              8.5         2017         Action, Crime, Drama

The above table shows the highest rated TV show for each year.
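
The same kind of "top row per group" result can also be sketched with slice_max(), which is available in dplyr 1.0.0 and later. The example below runs on a made-up data frame (`demo` and `top_per_year` are hypothetical names) with numeric ratings. It is an alternative formulation, not the method used in this report.

```r
library(dplyr)

# Hypothetical data: two shows in each of two years
demo <- data.frame(
  show_title   = c("Show A", "Show B", "Show C", "Show D"),
  imdb_rating  = c(8.5, 9.1, 7.9, 8.1),
  release_year = c("2021", "2021", "2020", "2020"),
  stringsAsFactors = FALSE
)

top_per_year <- demo %>%
  group_by(release_year) %>%
  # Keep the single highest-rated row per year; with_ties = FALSE
  # returns exactly one row even if two shows share the top rating
  slice_max(imdb_rating, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(release_year))

print(top_per_year)
```

Setting `with_ties = TRUE` (the default) would instead keep every show that shares the top rating in a year.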

Conclusion:

Based on this analysis, ‘Avatar: The Last Airbender’ is the highest-rated show in the dataset, with a rating of 9.3. The top-rated show for each release year is listed in the table above.

Reference:

YouTube. (2021, August 6). R select top n highest values by group (example) | extract head | reduce, rbind, dplyr & data.table. YouTube. Retrieved March 13, 2022, from https://www.youtube.com/watch?v=Vhb7cvfRB5k