DS Labs HW

Author

E Lott

I loaded all the packages needed for my visualization.

library(ggthemes)
library(ggplot2)
library(tidyverse)
library(RColorBrewer)
library(extrafont)
library("dslabs")
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
 [1] "make-admissions.R"                   
 [2] "make-brca.R"                         
 [3] "make-brexit_polls.R"                 
 [4] "make-calificaciones.R"               
 [5] "make-death_prob.R"                   
 [6] "make-divorce_margarine.R"            
 [7] "make-gapminder-rdas.R"               
 [8] "make-greenhouse_gases.R"             
 [9] "make-historic_co2.R"                 
[10] "make-mice_weights.R"                 
[11] "make-mnist_127.R"                    
[12] "make-mnist_27.R"                     
[13] "make-movielens.R"                    
[14] "make-murders-rda.R"                  
[15] "make-na_example-rda.R"               
[16] "make-nyc_regents_scores.R"           
[17] "make-olive.R"                        
[18] "make-outlier_example.R"              
[19] "make-polls_2008.R"                   
[20] "make-polls_us_election_2016.R"       
[21] "make-pr_death_counts.R"              
[22] "make-reported_heights-rda.R"         
[23] "make-research_funding_rates.R"       
[24] "make-results_us_election_2012.R"     
[25] "make-stars.R"                        
[26] "make-temp_carbon.R"                  
[27] "make-tissue-gene-expression.R"       
[28] "make-trump_tweets.R"                 
[29] "make-weekly_us_contagious_diseases.R"
[30] "save-gapminder-example-csv.R"        

I chose the data set I wanted to use, movielens.

data("movielens")

I filtered out every year except 1980 through 1989 and kept only the rows with a single-genre description (Drama, Comedy, Thriller, or Horror), dropping combined labels.

filter_genre <- movielens |>
  filter(year %in% 1980:1989,
         genres %in% c("Drama", "Comedy", "Thriller", "Horror"))
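The filter keeps a row only when both conditions hold. As a minimal base-R sketch of the same logic, assuming a small hypothetical toy data frame with made-up values (not the real movielens data), matching the integer year against 1980:1989 works in place of a list of quoted year strings, and multi-genre labels such as "Comedy|Drama" drop out because they match none of the four single genres. The names toy, single_genres, and toy_80s are hypothetical.

```r
# Hypothetical toy data standing in for movielens (made-up values, not real ratings)
toy <- data.frame(
  year   = c(1979L, 1980L, 1984L, 1984L, 1989L, 1990L),
  genres = factor(c("Drama", "Comedy", "Horror", "Comedy|Drama", "Thriller", "Drama")),
  rating = c(4, 3.5, 2, 4.5, 3, 5)
)

single_genres <- c("Drama", "Comedy", "Thriller", "Horror")

# Keep rows from 1980-1989 whose genre label is exactly one of the four genres;
# %in% compares the factor's labels, so the combined "Comedy|Drama" row is dropped
keep <- toy$year %in% 1980:1989 & toy$genres %in% single_genres
toy_80s <- toy[keep, ]
print(toy_80s)
```

Only the 1980 Comedy, 1984 Horror, and 1989 Thriller rows survive: 1979 and 1990 are outside the decade, and "Comedy|Drama" is not a single genre.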

Next, I grouped by year and genre and computed the average rating for each genre in each year.

new_filter_genre <- filter_genre |>
  group_by(year, genres) |>
  summarize(avg_rate = mean(rating), .groups = "drop_last")
new_filter_genre
# A tibble: 36 × 3
# Groups:   year [10]
    year genres   avg_rate
   <int> <fct>       <dbl>
 1  1980 Comedy       3.55
 2  1980 Drama        3.83
 3  1980 Horror       3.89
 4  1980 Thriller     3   
 5  1981 Comedy       3.6 
 6  1981 Drama        3.68
 7  1981 Horror       2.77
 8  1981 Thriller     3.17
 9  1982 Comedy       3.41
10  1982 Drama        3.85
# ℹ 26 more rows
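The group_by() and summarize() steps compute one mean rating per year-genre pair. A minimal base-R equivalent, assuming another small hypothetical data frame with made-up ratings, uses aggregate() with a formula. Unlike the dplyr result, which stays grouped by year, aggregate() returns a plain data frame. The names toy_80s and avg are hypothetical.

```r
# Hypothetical single-genre 1980s rows (made-up ratings for illustration)
toy_80s <- data.frame(
  year   = c(1980L, 1980L, 1980L, 1981L, 1981L),
  genres = c("Drama", "Drama", "Horror", "Drama", "Horror"),
  rating = c(4, 3, 5, 3.5, 2)
)

# One row per (year, genres) pair, holding the mean of that pair's ratings
avg <- aggregate(rating ~ year + genres, data = toy_80s, FUN = mean)
names(avg)[names(avg) == "rating"] <- "avg_rate"
print(avg)
```

The two 1980 Drama ratings (4 and 3) collapse into a single average of 3.5, which is the same reduction that produces each row of new_filter_genre.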

Then I plotted the average rating of the four genres across the decade.

plot1 <- new_filter_genre |>
  ggplot(aes(x = year, y = avg_rate, color = genres, group = genres)) +
  geom_line(position = "identity") +
  geom_point(position = "identity") +
  scale_color_brewer(palette = "Set2") +
  # I used https://stackoverflow.com/questions/70596445/how-to-avoid-default-conversion-of-year-into-decimals-when-plotting
  # to help me fix the decimal points in the years
  scale_x_continuous(breaks = ~ axisTicks(., log = FALSE)) +
  ylim(1, 5) +
  labs(title = "Ratings of Popular Genres in the 1980s",
       x = "Year", y = "Rating", color = "Genres") +
  theme_minimal(base_family = "serif")
plot1
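The breaks function passed to scale_x_continuous() receives the axis limits and returns the tick positions. As a rough illustration, assuming the limits are simply the data range 1980 to 1989, grDevices::axisTicks() with log = FALSE returns whole-number positions, which is why this fix removes the decimal years. Listing the breaks explicitly with breaks = 1980:1989 is an arguably simpler alternative; that is a suggestion, not what the code above does. The names limits, ticks, and explicit_breaks are hypothetical.

```r
# Assumed x-axis limits: the range of the year column in the plotted data
limits <- c(1980, 1989)

# axisTicks() (from base R's grDevices package) picks "pretty" tick positions
# for the given range; for a range of years these come out as whole numbers
ticks <- grDevices::axisTicks(limits, log = FALSE)
print(ticks)

# An explicit alternative: one break per year of the decade
explicit_breaks <- 1980:1989
print(explicit_breaks)
```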

I loaded highcharter to try an interactive visualization

library(highcharter)

I plotted the same visual as before, but now it is interactive. (This is my final visualization; please grade this one.)

highchart() |>
  hc_add_series(data = new_filter_genre,
                type = "line",
                hcaes(x = year, y = avg_rate, group = genres)) |>
  hc_colors(brewer.pal(4, "Dark2")) |>
  hc_xAxis(title = list(text = "Year")) |>
  # I used https://stackoverflow.com/questions/57468457/how-can-i-set-the-yaxis-limits-within-highchart-plot
  # to help me set limits on the y-axis
  hc_yAxis(title = list(text = "Average Rating"), min = 1, max = 5) |>
  hc_plotOptions(series = list(marker = list(symbol = "circle"))) |>
  hc_legend(align = "left", verticalAlign = "top") |>
  hc_title(text = "Average Ratings of Movies in the 80s by Genre") |>
  # I used https://stackoverflow.com/questions/56550264/how-do-i-access-highcharts-themes-modify-and-create-new-themes
  # to help me change the theme, and https://jkunst.com/highcharter/articles/themes.html#themes
  # to look at the different theme options and add a title
  hc_add_theme(hc_theme_538())
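With group = genres inside hcaes(), highcharter draws one line series per genre, and the four colors from brewer.pal(4, "Dark2") are assigned to those series. Conceptually, that corresponds to splitting the summarized table by genre, as in this base-R sketch on hypothetical made-up values. It illustrates the idea only and is not highcharter's internal code. The names summary_df and series are hypothetical.

```r
# Hypothetical summarized data in the shape of new_filter_genre (made-up values)
summary_df <- data.frame(
  year     = rep(1980:1982, times = 2),
  genres   = rep(c("Drama", "Horror"), each = 3),
  avg_rate = c(3.8, 3.7, 3.9, 3.5, 2.8, 3.2)
)

# One list element per genre; each holds the (year, avg_rate) points of one line
series <- split(summary_df[, c("year", "avg_rate")], summary_df$genres)
print(names(series))
print(series$Horror)
```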

Essay

I created my graph by choosing the movielens data set from DS Labs, which contains movie ratings. The data set lists movies of all genres from 1916 to 2016 along with their ratings (I'm assuming the ratings are out of 5 stars). I chose the 1980s because I wanted to see how genre ratings differed during that decade. It would be interesting to compare it with another decade, but I decided to stick with a single graph. I chose an interactive graph to challenge myself, and I'm glad it worked out. The only drawback is that some lines are close together, so it can be hard to hover over the one you want. First, I filtered the data to the 1980s and to the four single-genre labels (some movies are listed under several genres at once). Then I averaged the ratings for each genre in each year, so each line has one point per year. Finally, I graphed it.

I noticed an odd dip in horror ratings in 1984, so I wonder which movies came out that year that people disliked. I also found that thrillers had a sharp increase in average rating at the end of the 1980s, so I would like to see which movies drove that change. Drama and comedy ratings seemed stable the whole time. Also, none of these genres had an average rating above 4 in any year, so I wonder why that is. Overall, horror and thriller seem to have had the most drastic changes, and they completely switched places by the end of the 1980s.
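One way to put a number on "the most drastic changes" is the range (maximum minus minimum) of each genre's yearly average. A minimal base-R sketch, assuming a hypothetical data frame shaped like new_filter_genre with made-up values, so its output says nothing about the real data. The names yearly and spread are hypothetical.

```r
# Hypothetical yearly averages shaped like new_filter_genre (made-up values)
yearly <- data.frame(
  year     = rep(1985:1987, times = 2),
  genres   = rep(c("Comedy", "Thriller"), each = 3),
  avg_rate = c(3.4, 3.5, 3.4, 3.0, 3.2, 3.9)
)

# Spread of each genre's yearly averages: a larger value means a bigger swing
spread <- sapply(split(yearly$avg_rate, yearly$genres),
                 function(x) diff(range(x)))
print(sort(spread, decreasing = TRUE))
```

Run on the real new_filter_genre (after ungroup()), the same calculation would rank the four genres by how much their yearly averages moved during the decade.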