library(tidyverse)
horror_movies <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-22/horror_movies.csv") %>%
  # Cast column has multiple actors: separate them into multiple rows
  separate_rows(cast, sep = "\\|") %>%
  # Remove white spaces
  mutate(cast = str_trim(cast)) %>%
  # lump together least common factor levels
  mutate(cast = fct_lump(cast, 10)) %>%
  # filter out "Other"
  filter(cast != "Other")
# It appears that actors make a difference in review ratings. For example, Lloyd Kaufman's movies tend to be rated higher than Eric Roberts' movies. But is that always the case? Note the outlier: one of Roberts' movies is rated as high as any of Kaufman's.

horror_movies %>%
  mutate(cast = fct_reorder(cast, review_rating, na.rm = TRUE)) %>%
  ggplot(aes(cast, review_rating, fill = cast)) +
  geom_boxplot() +
  coord_flip() +
  theme(legend.position = "none") +
  labs(title = "Horror Movie Review Ratings by Top 10 Most Common Actors",
       y = "Horror Movie Review Ratings",
       x = NULL)


# The intercept represents the predicted rating for movies in which Bill Moseley starred (the reference level of cast); every other coefficient is the difference from that baseline
horror_movies %>%
  lm(review_rating ~ cast, data = .) %>%
  summary()
## 
## Call:
## lm(formula = review_rating ~ cast, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2640 -0.9720 -0.1000  0.8777  5.0000 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.673333   0.364202  12.832   <2e-16 ***
## castBill Oberst Jr.  0.005238   0.524176   0.010   0.9920    
## castBrinke Stevens  -0.173333   0.499681  -0.347   0.7291    
## castDebbie Rochon    0.057101   0.468134   0.122   0.9031    
## castElissa Dowling   0.140000   0.515059   0.272   0.7861    
## castEric Roberts    -1.109333   0.460683  -2.408   0.0171 *  
## castKane Hodder      0.306667   0.515059   0.595   0.5524    
## castLloyd Kaufman    1.090667   0.460683   2.367   0.0190 *  
## castMaria Olsen     -0.345556   0.493132  -0.701   0.4844    
## castSuzi Lorraine   -0.242083   0.506948  -0.478   0.6336    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.411 on 173 degrees of freedom
##   (21 observations deleted due to missingness)
## Multiple R-squared:  0.1603, Adjusted R-squared:  0.1166 
## F-statistic:  3.67 on 9 and 173 DF,  p-value: 0.0003167
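
To see why the intercept is read as the predicted rating for one actor, here is a minimal sketch on a toy data frame. The actor labels and ratings are made up for illustration and are not taken from the horror movie data. With a single categorical predictor, the intercept equals the mean of the reference level, each other coefficient is a difference from that mean, and `relevel()` changes which level serves as the reference.

```r
# Toy data: made-up ratings for three hypothetical actors (not the real data)
toy <- data.frame(
  cast = factor(c("A", "A", "B", "B", "C", "C")),
  review_rating = c(4, 6, 3, 5, 7, 9)
)

fit <- lm(review_rating ~ cast, data = toy)

# Intercept = mean rating of the reference level ("A", the first factor level)
ref_mean <- mean(toy$review_rating[toy$cast == "A"])
all.equal(unname(coef(fit)["(Intercept)"]), ref_mean)

# Each other coefficient = that level's mean minus the reference mean
group_means <- tapply(toy$review_rating, toy$cast, mean)
all.equal(unname(coef(fit)["castB"]),
          unname(group_means["B"] - group_means["A"]))

# Pick a different baseline by releveling the factor before fitting
toy$cast <- relevel(toy$cast, ref = "C")
fit_c <- lm(review_rating ~ cast, data = toy)
coef(fit_c)
```

The same reading applies to the summary above: a coefficient such as `castEric Roberts` is a gap relative to the baseline actor, not a rating by itself.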

Import data

Hint: You can choose any data you like but can’t take one that is already taken by other groups.

library(tidyverse)
horror_movies <- read_csv("https://github.com/rfordatascience/tidytuesday/raw/master/data/2018/2018-10-23/movie_profit.csv")

Explain data

Hint: Source and description of data, and definition of variables.

The data comes from the TidyTuesday repository (2018-10-23, movie_profit.csv). For each movie it records the release date, movie name, production budget, domestic gross, worldwide gross, distributor, MPAA rating, and genre. It can be used to analyze how much movies earn relative to what they cost.

Visualize data

Hint: Create at least two plots.

ggplot(horror_movies, 
       aes(x = mpaa_rating, 
           y = worldwide_gross)) +
  geom_boxplot() +
  labs(title = "Worldwide gross by MPAA rating")

ggplot(horror_movies, 
       aes(x = genre, 
           y = production_budget)) +
  geom_point() +
  labs(title = "Production budget by genre")

Correlation and regression analysis

The first plot is a box plot of worldwide gross by MPAA rating. It shows that PG and PG-13 movies tend to earn the most of the five rating categories. There are too few G-rated movies to compare reliably, and the same goes for movies with a missing (N/A) rating. R-rated movies also earn money, but less than PG and PG-13 movies.

The second plot is a scatter plot of production budget by genre. It shows that action and adventure movies have the largest budgets of the five genres. Comedy and horror are about even and have the smallest budgets. Drama falls in between.
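
The heading above mentions correlation and regression, while the two paragraphs describe the plots. A minimal sketch of what such an analysis could look like for two numeric columns, `production_budget` and `worldwide_gross`, is given here. It runs on a small made-up data frame (`toy_movies`, a hypothetical stand-in) so that it is self-contained; the numbers are illustrative only and no result is claimed for the real data. Swap in the imported data frame to run it on the actual movies.

```r
# Made-up dollar values standing in for the imported data (assumption)
toy_movies <- data.frame(
  production_budget = c(1e6, 5e6, 2e7, 6e7, 1.5e8),
  worldwide_gross   = c(3e6, 4e6, 5e7, 1.8e8, 4e8)
)

# Pearson correlation between budget and worldwide gross,
# using only rows where both values are present
r <- cor(toy_movies$production_budget, toy_movies$worldwide_gross,
         use = "complete.obs")
r

# Simple linear regression: the slope is the predicted change in
# worldwide gross for each additional dollar of production budget
fit <- lm(worldwide_gross ~ production_budget, data = toy_movies)
summary(fit)
```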

Share interesting stories you found from the data

Looking at the data, I thought it was cool that I could look at dollar figures such as worldwide gross and production budget across different categories, like the two plots above. It was also cool to see which distributors make the most money from their movies, such as Disney. More specifically, it was cool to see the production budgets of movies I’ve seen before: Iron Man cost close to 170,000,000, while Halloween cost only about 325,000.

Hide the messages, but display the code and its results on the webpage.
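
For the instruction about hiding messages, one common approach is to set knitr chunk options once in a setup chunk near the top of the R Markdown file. This is a sketch that assumes the page is knitted with knitr; the source file's actual chunk options are not shown here.

```r
# Setup chunk: keep the code (echo) and its results visible, but suppress
# messages such as package startup notes and read_csv column specifications
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```

The same options can also be set on a single chunk header when only one chunk is noisy.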

List names of all group members (both first and last name) at the top of the webpage.

Use the correct slug.