Term Project: Draft

Import data
Explain data
Visualize data
Correlation and regression analysis
Share interesting stories you found from the data
List names of all group members (both first and last name) at the top of the webpage.
Use the correct slug.

library(tidyverse)
horror_movies <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-22/horror_movies.csv") %>%
  # Cast column has multiple actors: separate them into multiple rows
  separate_rows(cast, sep = "\\|") %>%
  # Remove white spaces
  mutate(cast = str_trim(cast)) %>%
  # lump together least common factor levels
  mutate(cast = fct_lump(cast, 10)) %>%
  # filter out "Other"
  filter(cast != "Other")

# Actors appear to make a difference in review ratings. For example, Kaufman's movies tend to be rated higher than Roberts' movies. But is that always the case? Note the outlier: one of Roberts' movies is rated as high as any of Kaufman's.

horror_movies %>%
  mutate(cast = fct_reorder(cast, review_rating, na.rm = TRUE)) %>%
  ggplot(aes(cast, review_rating, fill = cast)) +
  geom_boxplot() +
  coord_flip() +
  theme(legend.position = "none") +
  labs(title = "Horror Movie Review Ratings by Top 10 Most Common Actors",
       y = "Horror Movie Review Ratings",
       x = NULL)


# The intercept represents the predicted rating for movies starring Bill Moseley (the reference level of cast)
horror_movies %>%
  lm(review_rating ~ cast, data = .) %>%
  summary()
## 
## Call:
## lm(formula = review_rating ~ cast, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2640 -0.9720 -0.1000  0.8777  5.0000 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.673333   0.364202  12.832   <2e-16 ***
## castBill Oberst Jr.  0.005238   0.524176   0.010   0.9920    
## castBrinke Stevens  -0.173333   0.499681  -0.347   0.7291    
## castDebbie Rochon    0.057101   0.468134   0.122   0.9031    
## castElissa Dowling   0.140000   0.515059   0.272   0.7861    
## castEric Roberts    -1.109333   0.460683  -2.408   0.0171 *  
## castKane Hodder      0.306667   0.515059   0.595   0.5524    
## castLloyd Kaufman    1.090667   0.460683   2.367   0.0190 *  
## castMaria Olsen     -0.345556   0.493132  -0.701   0.4844    
## castSuzi Lorraine   -0.242083   0.506948  -0.478   0.6336    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.411 on 173 degrees of freedom
##   (21 observations deleted due to missingness)
## Multiple R-squared:  0.1603, Adjusted R-squared:  0.1166 
## F-statistic:  3.67 on 9 and 173 DF,  p-value: 0.0003167
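
Because cast is a categorical predictor, each coefficient in the summary above is a difference from the intercept (the reference actor). As a small worked example, using only the estimates printed in that summary, the predicted ratings for Eric Roberts and Lloyd Kaufman can be recovered by adding their coefficients to the intercept. The variable names below are made up for illustration.

```r
# Estimates copied from the regression summary above
intercept    <- 4.673333   # (Intercept): reference level, Bill Moseley
coef_roberts <- -1.109333  # castEric Roberts
coef_kaufman <- 1.090667   # castLloyd Kaufman

# Predicted rating = intercept + actor coefficient
pred_roberts <- intercept + coef_roberts
pred_kaufman <- intercept + coef_kaufman

round(c(roberts = pred_roberts, kaufman = pred_kaufman), 3)
# roberts = 3.564, kaufman = 5.764
```

The gap between the two predictions is about 2.2 rating points, which matches the difference visible in the box plot.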

Import data

Hint: You can choose any data you like but can’t take one that is already taken by other groups.

library(tidyverse)
horror_movies <- read_csv("https://github.com/rfordatascience/tidytuesday/raw/master/data/2018/2018-10-23/movie_profit.csv")

Explain data

Hint: Source and description of data, and definition of variables. The data comes from the TidyTuesday repository (2018-10-23, movie_profit.csv). Each row is a movie, with its release date, title, production budget, domestic gross, worldwide gross, distributor, MPAA rating, and genre. We use it to analyze movie profits.

Visualize data

Hint: Create at least two plots.

ggplot(horror_movies, 
       aes(x = mpaa_rating, 
           y = worldwide_gross)) +
  geom_boxplot() +
  labs(title = "Worldwide gross by MPAA rating")

ggplot(horror_movies, 
       aes(x = genre, 
           y = production_budget)) +
  geom_point() +
  labs(title = "Production budget by genre")

Correlation and regression analysis

The first plot is a box plot of worldwide gross by MPAA rating. It shows that PG and PG-13 movies tend to earn the most of the five rating groups. There are too few G-rated movies, and too few movies with a missing rating (N/A), to compare them meaningfully. R-rated movies also earn money, but less than PG and PG-13 movies.
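
A box plot does not show how many movies sit behind each box, so a count and a median per rating can help back up the comparison. The following is a minimal sketch on hypothetical toy rows (the values are invented for illustration, not taken from movie_profit.csv); the real summary would use the data frame imported above.

```r
library(dplyr)

# Hypothetical toy rows standing in for the movie data (values are made up)
toy_movies <- tibble(
  mpaa_rating     = c("G", "PG", "PG", "PG-13", "PG-13", "PG-13", "R", "R", NA),
  worldwide_gross = c(50, 120, 80, 200, 150, 90, 40, 60, 10) * 1e6
)

# Number of movies and median worldwide gross for each rating group
rating_summary <- toy_movies %>%
  group_by(mpaa_rating) %>%
  summarise(n_movies     = n(),
            median_gross = median(worldwide_gross, na.rm = TRUE)) %>%
  arrange(desc(median_gross))

rating_summary
```

A group with a very small n_movies, like G in the toy rows, is one where the median should not be read too closely.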

The second plot is a scatter plot of production budget by genre. It shows that action and adventure movies have the largest budgets of the five genres. Comedy and horror are about even and have the smallest budgets. Drama falls in between.
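
One way to put a number on how production budget relates to worldwide gross is a correlation coefficient followed by a simple linear regression. This is only a sketch on hypothetical toy values (invented for illustration); with the real data, the same calls would take the production_budget and worldwide_gross columns of the imported data frame.

```r
# Hypothetical toy rows standing in for the movie data (values are made up)
toy_movies <- data.frame(
  production_budget = c(10, 20, 30, 40, 50) * 1e6,
  worldwide_gross   = c(25, 35, 80, 90, 140) * 1e6
)

# Pearson correlation between budget and worldwide gross,
# ignoring rows where either value is missing
budget_gross_cor <- cor(toy_movies$production_budget,
                        toy_movies$worldwide_gross,
                        use = "complete.obs")

# Simple linear regression: the slope is the predicted change in
# worldwide gross for each extra dollar of production budget
budget_gross_fit <- lm(worldwide_gross ~ production_budget, data = toy_movies)

budget_gross_cor
summary(budget_gross_fit)
```

A correlation near 1 and a positive slope would suggest that bigger budgets go with higher gross, though not that the budget causes it.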

Term Project: Draft

Jack Baumann, Joseph Panetta, Ben Hart
