DS Labs

Author

N Diker

I started by loading the necessary libraries.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library("dslabs")
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))

 [1] "make-admissions.R"                   
 [2] "make-brca.R"                         
 [3] "make-brexit_polls.R"                 
 [4] "make-calificaciones.R"               
 [5] "make-death_prob.R"                   
 [6] "make-divorce_margarine.R"            
 [7] "make-gapminder-rdas.R"               
 [8] "make-greenhouse_gases.R"             
 [9] "make-historic_co2.R"                 
[10] "make-mice_weights.R"                 
[11] "make-mnist_127.R"                    
[12] "make-mnist_27.R"                     
[13] "make-movielens.R"                    
[14] "make-murders-rda.R"                  
[15] "make-na_example-rda.R"               
[16] "make-nyc_regents_scores.R"           
[17] "make-olive.R"                        
[18] "make-outlier_example.R"              
[19] "make-polls_2008.R"                   
[20] "make-polls_us_election_2016.R"       
[21] "make-pr_death_counts.R"              
[22] "make-reported_heights-rda.R"         
[23] "make-research_funding_rates.R"       
[24] "make-stars.R"                        
[25] "make-temp_carbon.R"                  
[26] "make-tissue-gene-expression.R"       
[27] "make-trump_tweets.R"                 
[28] "make-weekly_us_contagious_diseases.R"
[29] "save-gapminder-example-csv.R"

I browsed through the datasets in dslabs and decided to go with movielens, then loaded it with the data() function.

data("movielens")

I started by filtering the data to the year range I wanted (1970-1990), just to see which genres I was working with in that range.

years <- movielens %>%
  filter(year >= 1970 & year <= 1990)
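Filtering alone does not list the genres in the range; counting rows per genre does. The following is a minimal sketch on a small made-up data frame. The `toy` table and its values are hypothetical stand-ins that only reuse the `year` and `genres` column names from movielens.

```r
library(dplyr)

# Hypothetical stand-in for movielens: one row per user rating
toy <- data.frame(
  title  = c("A", "A", "B", "C", "D", "E"),
  year   = c(1975, 1975, 1982, 1990, 1995, 1969),
  genres = c("Drama", "Drama", "Comedy", "Drama", "Thriller", "Comedy"),
  rating = c(4, 3.5, 5, 2, 4.5, 3),
  stringsAsFactors = FALSE
)

# Keep 1970-1990 (between() includes both endpoints),
# then count rows per genre, most frequent first
genre_counts <- toy %>%
  filter(between(year, 1970, 1990)) %>%
  count(genres, sort = TRUE)

genre_counts
```

Swapping `toy` for `movielens` should give the same kind of table for the real data, with `n` counting ratings rather than distinct movies.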

Next, I filtered the data down to the genres I wanted to look at within the same year range (1970-1990).

genres <- movielens %>%
  filter(year >= 1970 & year <= 1990 & 
         genres %in% c("Adventure|Sci-Fi", "Comedy", "Crime|Drama", "Drama", "Thriller"))
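Note that `%in%` compares the whole genres string, so a movie labeled "Comedy|Drama|Romance" is not picked up by "Drama". The sketch below contrasts that exact match with a substring match on a hypothetical `toy` data frame (the titles and rows are made up for illustration). It uses base R's `grepl()`, which is one option among several.

```r
library(dplyr)

# Hypothetical rows using the same pipe-separated genre format
toy <- data.frame(
  title  = c("A", "B", "C", "D"),
  genres = c("Drama", "Crime|Drama", "Comedy|Drama|Romance", "Thriller"),
  stringsAsFactors = FALSE
)

# Exact match: only rows whose entire genres string equals "Drama"
exact <- toy %>%
  filter(genres %in% c("Drama"))

# Substring match: any row whose genres string contains "Drama"
# (fixed = TRUE treats the pattern as literal text, not a regex)
tagged <- toy %>%
  filter(grepl("Drama", genres, fixed = TRUE))

nrow(exact)
nrow(tagged)
```

Here the exact match keeps one row and the substring match keeps three. Which behavior is appropriate depends on whether "Drama" should mean pure dramas or anything tagged as drama.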

I mapped both point size and color to rating, which is what generates the legend, and used theme_bw() to add a border around the plot.

plot1 <- ggplot(genres, aes(x = genres, y = year, size = rating, color = rating)) + 
  geom_point(alpha = 0.7) +
  scale_color_gradient(low = "green", high = "hotpink") + 
  labs(title = "Rating of Movies Based on the Year and Genre",
       x = "Movie Genre",
       y = "Year of Release (1970-1990)") +
  theme_bw() 

plot1

For this graph, I used the movielens dataset. The dataset is a collection of user ratings of movies, and each movie is tagged with one or more genres. The release years of the movies range from 1902 to 2016. For my graph, I focused on movies released between 1970 and 1990 and included only five genres: adventure|sci-fi, comedy, crime|drama, drama, and thriller. I chose these genres simply because they interested me the most.

I began by filtering the dataset to movies released between 1970 and 1990, and then I filtered it again to keep only my chosen genres from that range. Then I created a graph that shows the ratings for the movies that fit the genre and year criteria. I made the legend reflect the ratings: the pinker a point is, the better the rating, and the greener it is, the worse the rating. I also scaled the size of the points by rating, which made the differences in ratings easier to see.

Based on the graph, there were few adventure|sci-fi movies released between 1970 and 1990. The movies with the highest ratings are generally concentrated in the drama and comedy categories.
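Because many points overlap in the plot, a per-genre summary is one way to check these observations numerically. This is a sketch on a hypothetical `toy` data frame, not output from movielens. The column names `n_ratings`, `n_movies`, and `mean_rating` are chosen here for illustration.

```r
library(dplyr)

# Hypothetical stand-in: one row per user rating, so titles can repeat
toy <- data.frame(
  title  = c("A", "A", "B", "C", "C", "D"),
  genres = c("Drama", "Drama", "Drama", "Comedy", "Comedy", "Adventure|Sci-Fi"),
  rating = c(4, 5, 3, 4, 2, 3.5),
  stringsAsFactors = FALSE
)

# Per genre: number of ratings, number of distinct movies, and mean rating
genre_summary <- toy %>%
  group_by(genres) %>%
  summarise(
    n_ratings   = n(),
    n_movies    = n_distinct(title),
    mean_rating = mean(rating),
    .groups     = "drop"
  ) %>%
  arrange(desc(mean_rating))

genre_summary
```

Applied to the filtered movielens data, `n_movies` would speak to how many movies each genre contains, and `mean_rating` to which genres were rated highest.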