Import data

# Load the packages used below, then read the Excel file
library(readxl)
library(ggplot2)
Summer_Movies <- read_excel("../00_data/Summer Movies.xlsx")
Summer_Movies
## # A tibble: 905 × 10
##    tconst   title_type primary_title original_title  year runtime_minutes genres
##    <chr>    <chr>      <chr>         <chr>          <dbl> <chr>           <chr> 
##  1 tt00114… movie      Midsummer Ma… Midsummer Mad…  1920 60              Drama 
##  2 tt00267… movie      A Midsummer … A Midsummer N…  1935 133             Comed…
##  3 tt00338… movie      The Teachers… Magistrarna p…  1941 86              Comedy
##  4 tt00373… movie      Summer Storm  Summer Storm    1944 106             Crime…
##  5 tt00384… movie      Centennial S… Centennial Su…  1946 102             Histo…
##  6 tt00387… tvMovie    A Midsummer … A Midsummer N…  1946 150             Drama…
##  7 tt00393… movie      One Swallow … En fluga gör …  1947 88              Comedy
##  8 tt00408… movie      Summer Holid… Summer Holiday  1948 93              Music…
##  9 tt00415… movie      In the Good … In the Good O…  1949 102             Comed…
## 10 tt00429… movie      Bountiful Su… Shchedroe leto  1951 87              Comed…
## # ℹ 895 more rows
## # ℹ 3 more variables: simple_title <chr>, average_rating <dbl>, num_votes <dbl>

State one question

Is a movie's release year correlated with its average rating?

Plot data

ggplot(data = Summer_Movies) + 
  geom_point(mapping = aes(x = year, y = average_rating))
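
Where many points overlap, semi-transparent points and a fitted line can make any trend easier to judge. The following is a minimal sketch, not the plot used above: `toy_movies` is a made-up stand-in for `Summer_Movies` with the same two column names, and the `alpha` value and the linear smoother are illustrative choices.

```r
library(ggplot2)

# Hypothetical stand-in for Summer_Movies (values are invented for illustration)
toy_movies <- data.frame(
  year = c(1950, 1965, 1980, 1995, 2005, 2010, 2015, 2020),
  average_rating = c(6.8, 6.1, 6.5, 5.9, 6.4, 6.6, 6.2, 6.3)
)

# Semi-transparent points reduce overplotting; geom_smooth adds a linear trend line
trend_plot <- ggplot(data = toy_movies,
                     mapping = aes(x = year, y = average_rating)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

trend_plot
```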

Interpret

The scatter plot suggests that the majority of the movies have an average rating between 6 and 7, across all years. Year does not appear to shift the ratings; it mainly increases the number of movies that fall within that range.
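
One way to put a number on this reading is to compute the Pearson correlation between year and rating, and to count movies per decade to see how unevenly the years are represented. The sketch below uses a small made-up data frame, `toy_movies`, in place of `Summer_Movies`; the values are invented and the results say nothing about the real data.

```r
# Hypothetical stand-in for Summer_Movies (values are invented for illustration)
toy_movies <- data.frame(
  year = c(1950, 1965, 1980, 1995, 2005, 2010, 2015, 2020),
  average_rating = c(6.8, 6.1, 6.5, 5.9, 6.4, 6.6, NA, 6.3)
)

# Pearson correlation, using only rows where both values are present
year_rating_cor <- cor(toy_movies$year, toy_movies$average_rating,
                       use = "complete.obs")

# Number of movies per decade (e.g. 1995 is assigned to 1990)
decade <- floor(toy_movies$year / 10) * 10
movies_per_decade <- table(decade)

year_rating_cor
movies_per_decade
```

With the real data, replacing `toy_movies` with `Summer_Movies` should work as long as `year` and `average_rating` are numeric, as the tibble output indicates. A coefficient close to 0 would be consistent with the reading that year has little relationship to rating.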