For this WPA, we will analyze data from 20 randomly selected movies from a movies dataset. We will focus on 5 vectors: m.names, boxoffice, genre, time, and rating.
Here are the vectors of data. Copy and paste this chunk into your R document before starting the questions.
m.names <- c("Baramgwa hamjje sarajida", "Sleepless in Seattle", "The Water Diviner",
"Fly Away Home", "The Three Musketeers", "Candyman: Farewell to Flesh",
"Honey I Blew Up the Kid", "Kingsman: The Secret Service", "Ajab Prem Ki Ghazab Kahani",
"A Bug's Life", "Courage Under Fire", "Dirty Pretty Things",
"In the Name of the Father", "Soul Plane", "Magnum Force", "About Time",
"House of Sand and Fog", "Bokura ga ita Zenpen", "Jackass 3D",
"Tropic Thunder - A Pirate's Tale")
boxoffice <- c(28686545, 218076024, 30864649, 35870837, 50375628, 13899536,
58662452, 404561724, 15906411, 363089431, 100748262, 14156753,
25096862, 14553807, 44680473, 89177486, 16157923, 26324268, 171685793,
191091250)
genre <- c("Action", "Romantic Comedy", "Drama", "Drama", "Adventure",
"Horror", "Comedy", "Action", "Comedy", "Adventure", "Drama",
"Drama", "Drama", "Comedy", "Action", "Romantic Comedy", "Drama",
"Drama", "Comedy", "Comedy")
time <- c(121, 100, 112, NA, NA, NA, NA, 129, NA, 96, 111, NA, NA, NA,
NA, 123, NA, 121, 93, 106)
rating <- c(NA, "PG", "R", "PG", "PG", "R", "PG", "R", NA, "G", "R",
"R", "R", "R", NA, "R", "R", NA, "R", "R")
Question 1 What is the name of the 10th movie in the list?
Question 2 A) What are the genres of the first 5 movies in the list? B) What were the running times of those movies?
Question 3 What are the names of every second movie in the m.names vector?
Question 4 Some joker changed the name of the movie “Tropic Thunder” to “Tropic Thunder - A Pirate’s Tale”. Using logical indexing, correct the name of this movie in the m.names vector.
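As a reminder of how logical indexing can replace a value, here is a minimal sketch on a made-up vector. The `fruits` and `is.typo` names are hypothetical and not part of the assignment data.

```r
# Hypothetical toy vector, not part of the movie data
fruits <- c("apple", "bananana", "cherry")

# The comparison returns a logical vector: TRUE only where the typo is
is.typo <- fruits == "bananana"

# Assigning into the TRUE positions replaces only the matching element
fruits[is.typo] <- "banana"

fruits
```

The same pattern should apply to any character vector: build the logical test first, then assign the corrected value into the positions where the test is TRUE.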
Question 5 Change the genre name “Romantic Comedy” to “RomCom”. Change the genre name “Horror” to “Scary movie!!!”
Question 6 Create a new vector called “boxoffice.millions” that has the box-office values in millions of dollars. For example, a value of 1000000 in the original boxoffice vector should be 1 in boxoffice.millions.
Question 7 What are the mean, median, and standard deviation of the box-office totals of all movies?
Question 8 What were the different movie genres, and how many movies are there of each genre? (hint: use table())
Question 9 How many movies were Dramas? (hint: don’t use table(), use sum())
Question 10 What was the box-office total, genre, running time, and rating of “A Bug’s Life”? (use indexing, don’t look up the values visually!)
Question 11 Is the movie “Pirates of the Caribbean” in the list? (hint: use a combination of logical indexing and the sum() function)
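The sum() hints in Questions 9 and 11 rely on the fact that R treats TRUE as 1 and FALSE as 0 when a logical vector is summed. A minimal sketch on a made-up vector (the `pets` vector and the result names are hypothetical):

```r
# Hypothetical toy vector, not part of the movie data
pets <- c("cat", "dog", "cat", "fish")

# Summing a logical vector counts the TRUE values
n.cats <- sum(pets == "cat")

# A count of zero means the value does not appear at all
has.horse <- sum(pets == "horse") > 0

n.cats
has.horse
```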
Question 12 A) What were the names of the Comedy movies? B) What was the mean box office revenue of those movies?
Question 13 What were the names of the movies that made at least $50 million?
Question 14 A) Out of all the movies that were either Comedies or Dramas, what was the smallest box office revenue? (Hint: use the %in% function.) B) What was the name of that movie? (Hint: Use logical indexing based on what you found in part A).
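For readers who have not used %in% before, the following sketch shows one way it can be combined with logical indexing. All names here (`colors`, `sizes`, `keep`, `smallest`, `smallest.color`) are made up for illustration.

```r
# Hypothetical toy vectors, not part of the movie data
colors <- c("red", "blue", "green", "blue")
sizes <- c(4, 9, 2, 7)

# %in% tests each element for membership in a set of values
keep <- colors %in% c("red", "blue")

# Smallest size among the kept elements only
smallest <- min(sizes[keep])

# Look up the matching label with a second logical test
smallest.color <- colors[keep & sizes == smallest]

smallest
smallest.color
```

Combining `keep` with the size test in the last step guards against picking up an element from an excluded group that happens to have the same value.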
Question 15 A) What was the median movie time in minutes? B) What was the median movie time in hours?
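Note that the time vector contains NA values, and summary functions such as median() and mean() return NA when any input is missing unless na.rm = TRUE is set. A small sketch with a made-up vector (`durations` and the result names are hypothetical):

```r
# Hypothetical toy vector with a missing value
durations <- c(90, NA, 120, 150)

# Without na.rm, the missing value propagates to the result
med.raw <- median(durations)

# na.rm = TRUE drops missing values before computing the median
med.minutes <- median(durations, na.rm = TRUE)

# Convert minutes to hours by dividing by 60
med.hours <- med.minutes / 60

med.raw
med.minutes
med.hours
```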
Question 16 A) What were the names of the movies with an R rating? B) What was the mean box office revenue of those movies?
Question 17 A) What were the names of the movies with either a G or a PG rating? B) What was the mean running time of those movies in minutes?
Question 18 A) What percent of movies were Dramas? B) What percent of the movies were over 100 minutes long?
Question 19 What were the names of the movies that made less than $30 million AND were Comedies?
Question 20 Look at the help menu for the order() function. Using this function, create a vector called “boxoffice.order” that indicates the order of the movies by box-office revenue (that is, the indices that would sort the boxoffice vector).
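Because order() is easy to confuse with rank(), here is a minimal sketch of the difference on a made-up vector (`scores` and the result names are hypothetical). order() returns the positions that would sort the vector, whereas rank() returns each element's rank.

```r
# Hypothetical toy vector, not part of the movie data
scores <- c(30, 10, 20)

# Positions of the elements from smallest to largest
ord <- order(scores)

# Indexing with the result of order() sorts the vector
scores.sorted <- scores[ord]

# decreasing = TRUE gives positions from largest to smallest
ord.desc <- order(scores, decreasing = TRUE)

# rank() instead reports where each element stands
scores.rank <- rank(scores)

ord
scores.sorted
ord.desc
scores.rank
```

The same index vector can be reused to reorder any other vector of the same length, which keeps related vectors aligned.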
Question 21 What were the names of the top five highest grossing movies?
Question 22 What percent of the top 10 highest grossing movies were Comedies?
Question 23 Oops, it turns out there was another error in the dataset. No big deal. But all the boxoffice totals were wrong. The true boxoffice totals are below. Repeat all of your analyses using this new data. (Hint: Don’t re-type everything! Just copy and paste!!)
boxoffice.true <- c(43101797, 57312618, 45018557, 43176378, 49664862, 54979733,
52483124, 61740438, 57291732, 53867028, 48131527, 42715635, 48903075,
46933555, 45086429, 43385326, 50106648, 44155883, 45193453, 39162775
)
Question 24 Using the boxoffice.order vector you created before, sort the original vectors so that the data are in order of boxoffice revenue. Assign these vectors to new vectors with the suffix .o (e.g., m.names.o, boxoffice.o, …).