For this WPA, we will analyze data from 20 randomly selected movies from a movies dataset. We will focus on 5 vectors:
m.names: The names of the movies
boxoffice: The worldwide boxoffice revenue
genre: The movie genre
time: The running length (in minutes)
rating: The (American) MPAA rating
Here are the vectors of data. Copy and paste this chunk into your R document before starting the questions.
m.names <- c("Baramgwa hamjje sarajida", "Sleepless in Seattle", "The Water Diviner",
"Fly Away Home", "The Three Musketeers", "Candyman: Farewell to Flesh",
"Honey I Blew Up the Kid", "Kingsman: The Secret Service", "Ajab Prem Ki Ghazab Kahani",
"A Bug's Life", "Courage Under Fire", "Dirty Pretty Things",
"In the Name of the Father", "Soul Plane", "Magnum Force", "About Time",
"House of Sand and Fog", "Bokura ga ita Zenpen", "Jackass 3D",
"Tropic Thunder - A Pirate's Tale")
boxoffice <- c(28686545, 218076024, 30864649, 35870837, 50375628, 13899536,
58662452, 404561724, 15906411, 363089431, 100748262, 14156753,
25096862, 14553807, 44680473, 89177486, 16157923, 26324268, 171685793,
191091250)
genre <- c("Action", "Romantic Comedy", "Drama", "Drama", "Adventure",
"Horror", "Comedy", "Action", "Comedy", "Adventure", "Drama",
"Drama", "Drama", "Comedy", "Action", "Romantic Comedy", "Drama",
"Drama", "Comedy", "Comedy")
time <- c(121, 100, 112, NA, NA, NA, NA, 129, NA, 96, 111, NA, NA, NA,
NA, 123, NA, 121, 93, 106)
rating <- c(NA, "PG", "R", "PG", "PG", "R", "PG", "R", NA, "G", "R",
"R", "R", "R", NA, "R", "R", NA, "R", "R")
Question 1 What is the name of the 10th movie in the list?
m.names[10]
## [1] "A Bug's Life"
Question 2 A) What are the genres of the first 5 movies in the list? B) What were the running times of those movies?
genre[1:5]
## [1] "Action" "Romantic Comedy" "Drama" "Drama"
## [5] "Adventure"
time[1:5]
## [1] 121 100 112 NA NA
Question 3 What is the name of every second movie in the m.names vector?
m.names[seq(2,length(m.names),2)]
## [1] "Sleepless in Seattle" "Fly Away Home"
## [3] "Candyman: Farewell to Flesh" "Kingsman: The Secret Service"
## [5] "A Bug's Life" "Dirty Pretty Things"
## [7] "Soul Plane" "About Time"
## [9] "Bokura ga ita Zenpen" "Tropic Thunder - A Pirate's Tale"
Question 4 Some joker changed the name of the movie “Tropic Thunder” to “Tropic Thunder - A Pirate’s Tale”. Using logical indexing, correct the name of this movie in the m.names vector
m.names[length(m.names)] <- "Tropic Thunder"
#or
m.names[m.names == "Tropic Thunder - A Pirate's Tale"] <- "Tropic Thunder"
Question 5 Change the genre names “Romantic Comedy” to “RomCom”. Change the genre name “Horror” to “Scary movie!!!”
genre[genre == "Romantic Comedy"] <- "RomCom"
genre[genre == "Horror"] <- "Scary movie!!!"
Question 6 Create a new vector called “boxoffice.millions” that has the box-office values in millions of dollars. For example, a value of 1000000 in the original boxoffice vector should be 1 in boxoffice.millions
boxoffice.millions <- boxoffice / 1000000
Question 7 What are the mean, median, and standard deviation of the box-office totals of all movies?
mean(boxoffice)
## [1] 95683306
median(boxoffice)
## [1] 40275655
sd(boxoffice)
## [1] 116422556
Question 8 What were the different movie genres, and how many movies are there of each genre? (hint: use table())
table(genre)
## genre
## Action Adventure Comedy Drama RomCom
## 3 2 5 7 2
## Scary movie!!!
## 1
Question 9 How many movies were Dramas? (hint: don’t use table(), use sum())
sum(genre == "Drama")
## [1] 7
Question 10 What was the box-office total, genre, running time, and rating of “A Bug’s Life”? (use indexing, don’t look up the values visually!)
boxoffice[m.names == "A Bug's Life"]
## [1] 363089431
genre[m.names == "A Bug's Life"]
## [1] "Adventure"
time[m.names == "A Bug's Life"]
## [1] 96
rating[m.names == "A Bug's Life"]
## [1] "G"
Question 11 Is the movie “Pirate’s of the Caribbean” in the list? (hint: use a combination of logical indexing and the sum() function)
sum(m.names == "Pirate’s of the Caribbean")
## [1] 0
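The sum() approach works because TRUE counts as 1 and FALSE as 0, so the sum is the number of matches. As a small sketch on a hypothetical toy.names vector (made up for illustration, not part of the assignment data), the count can be turned into a yes/no answer, and %in% gives the same answer directly:

```r
# Hypothetical toy data, not the assignment vectors
toy.names <- c("Jaws", "Alien", "Jaws")

# Number of elements equal to "Jaws" (TRUE counts as 1)
n.jaws <- sum(toy.names == "Jaws")

# A count above zero means the name is present
has.jaws <- n.jaws > 0

# %in% asks the membership question directly
has.heat <- "Heat" %in% toy.names

print(n.jaws)
print(has.jaws)
print(has.heat)
```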
Question 12 A) What were the names of the Comedy movies? B) What was the mean box office revenue of those movies?
comedy.movies <- m.names[genre == "Comedy"]
mean(boxoffice[genre == "Comedy"])
## [1] 90379943
#or
mean(boxoffice[m.names %in% comedy.movies])
## [1] 90379943
Question 13 What were the names of those movies that made at least $50 Million dollars?
m.names[boxoffice.millions >= 50]
## [1] "Sleepless in Seattle" "The Three Musketeers"
## [3] "Honey I Blew Up the Kid" "Kingsman: The Secret Service"
## [5] "A Bug's Life" "Courage Under Fire"
## [7] "About Time" "Jackass 3D"
## [9] "Tropic Thunder"
Question 14 A) Out of all the movies that were either Comedies or Dramas, what was the smallest box office revenue? (Hint: use the %in% function.) B) What was the name of that movie? (Hint: Use logical indexing based on what you found in part A).
min.rev <- min(boxoffice[genre %in% c("Comedy","Drama")])
m.names[boxoffice == min.rev]
## [1] "Dirty Pretty Things"
Question 15 A) What was the median movie time in minutes? B) What was the median movie time in hours?
median(time, na.rm = TRUE)
## [1] 111.5
median(time, na.rm = TRUE) / 60
## [1] 1.858333
Question 16: A) What were the names of the movies with an R rating? B) What was the mean boxoffice revenue of those movies?
# which() drops the movies whose rating is NA
r <- m.names[which(rating == "R")]
mean(boxoffice[m.names %in% r])
## [1] 97454004
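A subtlety when indexing by rating: the vector contains NA values, and comparing NA with "R" gives NA rather than FALSE, so plain logical indexing returns NA elements alongside the real matches. A minimal sketch, using hypothetical toy.names and toy.rating vectors made up for illustration, shows the effect and how which() avoids it:

```r
# Hypothetical toy data, not the assignment vectors
toy.names <- c("A", "B", "C", "D")
toy.rating <- c("R", NA, "PG", "R")

# The comparison is NA for the second movie, so the result has an NA element
with.na <- toy.names[toy.rating == "R"]

# which() returns only the positions where the comparison is TRUE
without.na <- toy.names[which(toy.rating == "R")]

print(with.na)
print(without.na)
```

The mean in part B is unaffected either way, because %in% never matches an NA against the (non-missing) movie names.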
Question 17: A) What were the names of the movies with either a G or a PG rating? B) What was the mean running time of those movies in minutes?
gpg <- m.names[rating %in% c("G","PG")]
mean(time[m.names %in% gpg], na.rm = TRUE)
## [1] 98
Question 18 A) What percent of movies were Dramas? B) What percent of the movies were over 100 minutes long?
mean(genre == "Drama")
## [1] 0.35
mean(time > 100, na.rm = TRUE)
## [1] 0.7
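Note that mean() of a logical vector returns a proportion between 0 and 1; multiplying by 100 converts it to the percent the question asks for. A small sketch on a hypothetical toy.genre vector (made up for illustration):

```r
# Hypothetical toy data, not the assignment vectors
toy.genre <- c("Drama", "Comedy", "Drama", "Action")

# Proportion of elements equal to "Drama"
prop.drama <- mean(toy.genre == "Drama")

# The same quantity expressed as a percent
pct.drama <- 100 * prop.drama

print(prop.drama)
print(pct.drama)
```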
Question 19 What were the names of the movies that made less than $30 Million dollars AND were Comedies?
m.names[(boxoffice.millions < 30) & (genre == "Comedy")]
## [1] "Ajab Prem Ki Ghazab Kahani" "Soul Plane"
Question 20 Look at the help menu for the order() function. Using this function, create a vector called “boxoffice.order” that indicates what the boxoffice rank of each movie is.
help(order)
# Positions of the movies from highest to lowest box-office revenue
boxoffice.order <- order(boxoffice, decreasing = TRUE)
Question 21 What were the names of the top five highest grossing movies?
m.names[boxoffice.order[1:5]]
## [1] "Kingsman: The Secret Service" "A Bug's Life"
## [3] "Sleepless in Seattle" "Tropic Thunder"
## [5] "Jackass 3D"
Question 22 What percent of the top 10 highest grossing movies were Comedies?
top10 <- boxoffice.order[1:10]
mean(genre[top10] == "Comedy")
## [1] 0.3
Question 23 Oops, it turns out there was another error in the dataset. No big deal. But all the boxoffice totals were wrong. The true boxoffice totals are below. Repeat all of your analyses using this new data. (Hint: Don’t re-type everything! Just copy and paste!!)
Question 24 Using the boxoffice.order vector you created before, sort the original vectors so that the data are in order of boxoffice revenue. Assign these vectors to new vectors with the subscript .o (e.g.; m.names.o, boxoffice.o, …)
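No worked answer is given for Question 24. As a sketch of the general approach (an illustration, not the official solution): order() returns the index positions that would sort a vector, and indexing every parallel vector with those same positions keeps each movie's values aligned. The toy.* vectors below are hypothetical stand-ins for the real data:

```r
# Hypothetical toy data, not the assignment vectors
toy.names <- c("A", "B", "C")
toy.boxoffice <- c(30, 10, 20)

# Positions of the elements from highest to lowest revenue
toy.order <- order(toy.boxoffice, decreasing = TRUE)

# Apply the same positions to every parallel vector
toy.names.o <- toy.names[toy.order]
toy.boxoffice.o <- toy.boxoffice[toy.order]

print(toy.names.o)
print(toy.boxoffice.o)
```

The same indexing step would apply to the genre, time, and rating vectors.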