Justin Kaplan

Assignment 3

library(tidyverse)

movies <- read_csv("https://gist.githubusercontent.com/tiangechen/b68782efa49a16edaf07dc2cdaa855ea/raw/0c794a9717f18b094eabab2cd6a6b9a226903577/movies.csv")

1. rename(): (4 points)

Rename the “Film” column to “movie_title” and “Year” to “release_year”.

rename <- movies %>%
  rename(movie_title = Film, release_year = Year)

2. select(): (4 points)

Create a new dataframe with only the columns: movie_title, release_year, Genre, Profitability,

select <- rename %>% select(movie_title, release_year, Genre, Profitability)

3. filter(): (4 points)

Filter the dataset to include only movies released after 2000 with a Rotten Tomatoes % higher than 80.

filter <- rename %>%
  filter(release_year > 2000 & `Rotten Tomatoes %` > 80)
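
Because the column name Rotten Tomatoes % contains spaces and a symbol, R treats it as a non-syntactic name, so it has to be wrapped in backticks inside filter(). A minimal sketch on a made-up three-row tibble (the rows and the names toy and toy_filtered are illustrative assumptions, not part of movies.csv):

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; only the column names mirror the assignment
toy <- tibble(
  movie_title = c("A", "B", "C"),
  release_year = c(1999, 2008, 2010),
  `Rotten Tomatoes %` = c(90, 85, 40)
)

# Keep rows released after 2000 with a Rotten Tomatoes % above 80
toy_filtered <- toy %>%
  filter(release_year > 2000 & `Rotten Tomatoes %` > 80)

print(toy_filtered)
```

In this toy data only movie "B" satisfies both conditions, since "A" was released before 2000 and "C" scores below 80.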

4. mutate(): (4 points)

Add a new column called “Profitability_millions” that converts the Profitability to millions of dollars.

profitability <- rename %>% mutate(profitability_millions = Profitability * 1000000)

5. arrange(): (3 points)

Sort the filtered dataset by Rotten Tomatoes % in descending order, and then by Profitability in descending order.

arrange <- filter %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability))
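
When arrange() is given two keys, the second key only breaks ties in the first, and each key needs its own desc() to sort from high to low. A small sketch, assuming a made-up tibble (toy_scores and toy_sorted are hypothetical names, and the values are not taken from movies.csv):

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with a tie in Rotten Tomatoes %
toy_scores <- tibble(
  movie_title = c("A", "B", "C"),
  `Rotten Tomatoes %` = c(90, 90, 95),
  Profitability = c(2.5, 4.0, 1.0)
)

# Sort by score first, then use Profitability to order the tied rows
toy_sorted <- toy_scores %>%
  arrange(desc(`Rotten Tomatoes %`), desc(Profitability))

print(toy_sorted)
```

Here "C" comes first because it has the highest score, and "B" is placed ahead of "A" because the two tie at 90 and "B" has the higher Profitability.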

6. Combining functions: (3 points)

Use the pipe operator (%>%) to chain these operations together, starting with the original dataset and ending with a final dataframe that incorporates all the above transformations.

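final <- movies %>%
  # Step 1: rename the columns
  rename(movie_title = Film, release_year = Year) %>%
  # Step 2: keep the requested columns, plus Rotten Tomatoes %, which the later steps need
  select(movie_title, release_year, Genre, Profitability, `Rotten Tomatoes %`) %>%
  # Step 3: movies released after 2000 with a Rotten Tomatoes % above 80
  filter(release_year > 2000 & `Rotten Tomatoes %` > 80) %>%
  # Step 4: add the converted profitability column
  mutate(profitability_millions = Profitability * 1000000) %>%
  # Step 5: sort by score, then by profitability, both descending
  arrange(desc(`Rotten Tomatoes %`), desc(profitability_millions))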

7. Interpret question 6 (1 point)

The resulting data show that the best-reviewed movies are not always the most popular. This is shown by the contrast between Rotten Tomatoes % and profitability in the sorted results.