The movies data set has 44010 rows about the amount of explicit content (drugs, language, sex, nudity, and violence) found in 1467 movies released since 1958. Each movie is represented by 30 rows (1 row = movie & tag_name type combo).
The relevant variables in the data set are:
## # A tibble: 44,010 × 12
## imdb_id name title_main title_subscript year rating run_time studio
## <chr> <chr> <chr> <chr> <int> <chr> <int> <chr>
## 1 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 2 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 3 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 4 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 5 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 6 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 7 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 8 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 9 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## 10 tt0052357 Vertigo Vertigo "" 1958 PG 7680 Universal
## # ℹ 44,000 more rows
## # ℹ 4 more variables: category <chr>, tag_name <chr>, occurrences <int>,
## # occur_duration <int>
Create a data set named movies2 that has the following rows:
Additionally, movies2 should only have the imdb_id, name, year, rating, run_time, studio, category, tag_name, occurrences, and occur_duration columns. Display the movies2 data set using the tibble() function.
## # A tibble: 17,124 × 10
## imdb_id name year rating run_time studio category tag_name occurrences
## <chr> <chr> <int> <chr> <int> <chr> <chr> <chr> <int>
## 1 tt0087231 The Fal… 1985 R 7920 Orion… violence disturb… 0
## 2 tt0087231 The Fal… 1985 R 7920 Orion… language sexual_… 3
## 3 tt0087231 The Fal… 1985 R 7920 Orion… violence graphic 1
## 4 tt0087231 The Fal… 1985 R 7920 Orion… immodes… nudity_… 17
## 5 tt0087231 The Fal… 1985 R 7920 Orion… language profani… 42
## 6 tt0087231 The Fal… 1985 R 7920 Orion… immodes… nudity_… 1
## 7 tt0087231 The Fal… 1985 R 7920 Orion… language racial_… 1
## 8 tt0087231 The Fal… 1985 R 7920 Orion… sexual sexual_… 0
## 9 tt0087231 The Fal… 1985 R 7920 Orion… drugs drugs_i… 9
## 10 tt0087231 The Fal… 1985 R 7920 Orion… violence gore 0
## # ℹ 17,114 more rows
## # ℹ 1 more variable: occur_duration <int>
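The column selection described above can be sketched with select(). This is a minimal illustration on a made-up two-row tibble (movies_toy and all of its values are hypothetical), and it covers only the column step, not the row conditions for movies2.

```r
library(dplyr)  # dplyr re-exports tibble() and the %>% pipe

# Hypothetical stand-in for the movies data; values are invented for illustration
movies_toy <- tibble(
  imdb_id         = c("tt0000001", "tt0000002"),
  name            = c("Movie A", "Movie B"),
  title_main      = c("Movie A", "Movie B"),
  title_subscript = c("", ""),
  year            = c(1958L, 1985L),
  rating          = c("PG", "R"),
  run_time        = c(7680L, 7920L),
  studio          = c("Studio X", "Studio Y"),
  category        = c("violence", "language"),
  tag_name        = c("gore", "profanity"),
  occurrences     = c(0L, 42L),
  occur_duration  = c(0L, 60L)
)

# Keep only the 10 requested columns (title_main and title_subscript are dropped)
movies2_toy <- movies_toy %>%
  select(imdb_id, name, year, rating, run_time, studio,
         category, tag_name, occurrences, occur_duration)

# Display the result as a tibble
tibble(movies2_toy)
```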
If you’re unable to complete question 1, you can use the “movies q2.csv” data set in Brightspace.
Change the run_time and category columns in the movies2 data set as follows:
Make sure to use the appropriate dplyr verb(s)!
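Changing existing columns is typically done with mutate(). The sketch below uses a made-up tibble and two placeholder changes that are assumptions, not the assignment's required ones: run_time is converted from seconds to minutes, and category is converted to a factor.

```r
library(dplyr)  # dplyr re-exports tibble() and the %>% pipe

# Hypothetical stand-in for movies2; values are invented for illustration
movies2_toy <- tibble(
  name     = c("Movie A", "Movie B"),
  run_time = c(7680L, 7920L),          # assumed to be in seconds
  category = c("violence", "language")
)

# mutate() overwrites a column when the new name matches an existing one
movies2_toy <- movies2_toy %>%
  mutate(
    run_time = run_time / 60,          # assumption: seconds -> minutes
    category = factor(category)        # placeholder change, not from the assignment
  )

movies2_toy
```

Because the result is assigned back to the same name, the changes persist in the data set rather than only appearing in the printed output.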
Show the 10 rows with the most occurrences. Display only the name of the movie, run_time, category, tag_name, and occurrences (the movies2 data set should still have all 10 columns).
## name run_time category tag_name occurrences
## 1 Uncut Gems 136 language profanity 883
## 2 The Wolf of Wall Street 180 language profanity 743
## 3 This Is the End 107 language profanity 586
## 4 Casino 179 language profanity 541
## 5 End of Watch 109 language profanity 502
## 6 8 Mile 111 language profanity 479
## 7 They Cloned Tyrone 122 language profanity 478
## 8 Cherry 140 language profanity 461
## 9 Reservoir Dogs 99 language profanity 457
## 10 Malcolm & Marie 106 language profanity 455
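One way to produce a display like the one above is slice_max() followed by select(), without assigning the result back to the original data set. The sketch uses a made-up tibble (all values hypothetical) and n = 2 instead of 10 to keep it short.

```r
library(dplyr)  # dplyr re-exports tibble() and the %>% pipe

# Hypothetical stand-in for movies2; values are invented for illustration
movies2_toy <- tibble(
  name        = c("Movie A", "Movie B", "Movie C", "Movie D"),
  run_time    = c(100, 120, 90, 110),
  studio      = c("Studio X", "Studio Y", "Studio X", "Studio Z"),
  category    = c("language", "violence", "language", "drugs"),
  tag_name    = c("profanity", "gore", "profanity", "drugs_tag"),
  occurrences = c(50L, 3L, 75L, 9L)
)

# Take the rows with the largest occurrences, then keep only the display columns.
# The result is stored separately, so movies2_toy keeps all of its columns.
top_rows <- movies2_toy %>%
  slice_max(occurrences, n = 2) %>%
  select(name, run_time, category, tag_name, occurrences)

top_rows
```

An equivalent approach is arrange(desc(occurrences)) followed by head(); slice_max() keeps tied rows by default, so it can return more than n rows when values are tied.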
What tag do all 10 movies with the most occurrences have?
The 10 rows with the most occurrences all have the profanity tag.
If you were unable to complete question 3, you can use the “movies q4.csv” data set for this question.
Using the movies_summary data set and the relevant dplyr verbs, create a graph that has the categories on the y-axis and the average number of occurrences on the x-axis, represented by a bar. See the graph in Brightspace!
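A graph of this kind can be sketched with group_by(), summarize(), and ggplot2's geom_col(). The tibble below is a made-up stand-in (the real columns of movies_summary are not shown here, so category and occurrences are assumed), and avg_occurrences is a name introduced for illustration.

```r
library(dplyr)    # dplyr re-exports tibble() and the %>% pipe
library(ggplot2)

# Hypothetical stand-in for movies_summary; values are invented for illustration
summary_toy <- tibble(
  category    = c("language", "language", "violence", "violence", "drugs"),
  occurrences = c(50, 75, 3, 1, 9)
)

# Average occurrences per category
avg_by_category <- summary_toy %>%
  group_by(category) %>%
  summarize(avg_occurrences = mean(occurrences), .groups = "drop")

# Horizontal bars: categories on the y-axis, averages on the x-axis
p <- ggplot(avg_by_category, aes(x = avg_occurrences, y = category)) +
  geom_col() +
  labs(x = "Average occurrences", y = "Category")

p
```

geom_col() is used rather than geom_bar() because the bar lengths come from an already computed column instead of row counts.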