class: bg1, title-slide

.title[
# 150 Years of Cinema: A Computational Analysis
]
.subtitle[
##
]
.author[
### Saurabh Khanna
]
.institute[
### University of Amsterdam
]
.date[
### 18 October 2024
]

---
class: middle

<style type="text/css">
.bg1 {
  position: relative;
  z-index: 1;
}

.bg1::before {
  content: "";
  background-image: url('images/diverse-crowds.png');
  background-size: cover;
  position: absolute;
  top: 0px;
  right: 0px;
  bottom: 0px;
  left: 0px;
  opacity: 0.15;
  z-index: -1;
}
</style>

---
class: middle

# Background

- Part of our [(in)visible](https://theinvisiblelab.org/) work
- Invisible information: Theorize, quantify, boost
- Invisible films and/or Invisibility in films

---
class: inverse

# Internet Movie Database (IMDb)

---
class: inverse

# The Movie Database (TMDB)

---
class: inverse

# Open Movie Database (OMDB)

---
class: middle

# A cleaned union of the three databases

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Title Type </th>
   <th style="text-align:right;"> Total </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> movie </td>
   <td style="text-align:right;"> 684498 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> short </td>
   <td style="text-align:right;"> 1001698 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tvSeries </td>
   <td style="text-align:right;"> 265638 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> video </td>
   <td style="text-align:right;"> 293931 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> videoGame </td>
   <td style="text-align:right;"> 39139 </td>
  </tr>
</tbody>
</table>

Today's focus on _movie_.
---

# Variables in film data

```
##  [1] "Imdb Id"                          "Localized Title"
##  [3] "Cast"                             "Genres"
##  [5] "Runtimes"                         "Countries"
##  [7] "Country Codes"                    "Language Codes"
##  [9] "Color Info"                       "Aspect Ratio"
## [11] "Sound Mix"                        "Certificates"
## [13] "Original Air Date"                "Rating"
## [15] "Votes"                            "Cover Url"
## [17] "Plot Outline"                     "Languages"
## [19] "Title"                            "Year"
## [21] "Kind"                             "Original Title"
## [23] "Director"                         "Producer"
## [25] "Cinematographer"                  "Akas"
## [27] "Production Companies"             "Distributors"
## [29] "Plot"                             "Composer"
## [31] "Editor"                           "Assistant Director"
## [33] "Music Department"                 "Miscellaneous Crew"
## [35] "Special Effects"                  "Other Companies"
## [37] "Writer"                           "Production Design"
## [39] "Camera and Electrical Department" "Set Decoration"
## [41] "Costume Designer"                 "Costume Department"
## [43] "Art Direction"                    "Art Department"
## [45] "Box Office"                       "Videos"
## [47] "Thanks"                           "Synopsis"
## [49] "Editorial Department"             "Animation Department"
## [51] "Sound Crew"                       "Make Up"
## [53] "Production Manager"               "Stunt Performer"
## [55] "Transportation Department"        "Visual Effects"
## [57] "Location Management"              "Casting Director"
## [59] "Casting Department"               "Script Department"
## [61] "Top 250 Rank"                     "Special Effects Companies"
## [63] "Production Department"            "Bottom 100 Rank"
## [65] "Production Status"                "Production Status Updated"
## [67] "Electrical Department"            "Series Years"
```

---

# Films over time
---
class: middle, center

<img src="talk_18oct2024_files/figure-html/unnamed-chunk-9-1.png" width="504" />

---

# Films across geographies

---

<img src="talk_18oct2024_files/figure-html/unnamed-chunk-11-1.png" width="100%" />

---

# Genres

---

# Languages
---
class: middle

Indian films `\(\neq\)` Bollywood?

<img src="talk_18oct2024_files/figure-html/unnamed-chunk-14-1.png" width="504" />

---
class: inverse, middle

Can we assess representation of women (or of any other gender) in film?

---

# Bechdel-Wallace Test

Three criteria for a film to pass the Bechdel-Wallace Test:

- It has to have at least two women in it,
- Who talk to each other,
- About something besides a man.

<br/>

Dataset of 9593 films, each labelled by humans with the number of criteria it satisfies.

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> bechdel_score </th>
   <th style="text-align:right;"> n_films </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 918 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 2149 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1021 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 5505 </td>
  </tr>
</tbody>
</table>

---

# Average Bechdel Score by Genre
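---

The counts in the Bechdel-Wallace table can be summarised with a few lines of arithmetic. The sketch below is illustrative only: it is written in Python rather than taken from the original analysis, and the variable names are made up. It uses nothing beyond the four counts shown in the table.

```python
# Counts copied from the Bechdel-Wallace table: score -> number of films.
bechdel_counts = {0: 918, 1: 2149, 2: 1021, 3: 5505}

# Total number of labelled films.
total_films = sum(bechdel_counts.values())

# Share of films that satisfy all three criteria (score 3).
pass_share = bechdel_counts[3] / total_films

# Mean score, weighting each score by its film count.
mean_score = sum(score * n for score, n in bechdel_counts.items()) / total_films

print(f"films: {total_films}")
print(f"share passing all three criteria: {pass_share:.1%}")
print(f"mean score: {mean_score:.2f}")
```

The four counts add up to the 9593 films quoted for the dataset. A per-genre or per-country average would apply the same weighted mean within each group.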
---

# Average Bechdel Score by Country

---

<img src="talk_18oct2024_files/figure-html/unnamed-chunk-19-1.png" width="100%" />

---
class: inverse, middle

Can we create another metric for female representation?

---

# Ocean's 8

| Member Name           | Female | Credits Order |
|-----------------------|--------|---------------|
| Sandra Bullock        | 1      | 1             |
| Cate Blanchett        | 1      | 2             |
| Anne Hathaway         | 1      | 3             |
| Mindy Kaling          | 1      | 4             |
| Sarah Paulson         | 1      | 5             |
| Awkwafina             | 1      | 6             |
| Rihanna               | 1      | 7             |
| Helena Bonham Carter  | 1      | 8             |
| Richard Armitage      | 0      | 9             |
| James Corden          | 0      | 10            |

FRep Score `$$= \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{0}{9} + \frac{0}{10} \to \text{Rescale} =$$` 0.93

---

# Inception

| Member Name           | Female | Credits Order |
|-----------------------|--------|---------------|
| Leonardo DiCaprio     | 0      | 1             |
| Joseph Gordon-Levitt  | 0      | 2             |
| Ellen Page            | 1      | 3             |
| Tom Hardy             | 0      | 4             |
| Ken Watanabe          | 0      | 5             |
| Dileep Rao            | 0      | 6             |
| Cillian Murphy        | 0      | 7             |
| Tom Berenger          | 0      | 8             |
| Marion Cotillard      | 1      | 9             |
| Michael Caine         | 0      | 10            |

FRep Score `$$= \frac{0}{1} + \frac{0}{2} + \frac{1}{3} + \frac{0}{4} + \frac{0}{5} + \frac{0}{6} + \frac{0}{7} + \frac{0}{8} + \frac{1}{9} + \frac{0}{10} \to \text{Rescale} =$$` 0.15

---

# Trends in this score over time

<img src="talk_18oct2024_files/figure-html/unnamed-chunk-21-1.png" width="504" />

---

# Trends by genre

<img src="images/genres.jpeg" width="500" style="border:0;"/>

---

# Trends by country
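---

The FRep arithmetic for Ocean's 8 and Inception can be reproduced with a short sketch. The slides do not spell out the rescaling step, so the code assumes that "rescale" means dividing the rank-weighted sum by its largest attainable value (the sum when every credited cast member is female). That assumption reproduces both 0.93 and 0.15. The function name `frep_score` is illustrative, and Python is used here for convenience rather than being the language of the original analysis.

```python
def frep_score(female_flags):
    """Rank-weighted female representation, rescaled to the range 0 to 1.

    female_flags: list of 1/0 flags in credits order (first-billed first).
    Assumption: rescaling divides by the maximum possible weighted sum,
    i.e. the sum of 1/rank over all credited positions.
    """
    if not female_flags:
        raise ValueError("need at least one credited cast member")
    # Weight for credits position k is 1/k, so top billing counts most.
    weights = [1 / rank for rank in range(1, len(female_flags) + 1)]
    raw = sum(flag * weight for flag, weight in zip(female_flags, weights))
    return raw / sum(weights)


# Female flags copied from the two cast tables, in credits order.
oceans_8 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
inception = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0]

print(round(frep_score(oceans_8), 2))   # 0.93
print(round(frep_score(inception), 2))  # 0.15
```

Under this reading, a cast list with no women scores 0 and an all-female list scores 1, regardless of how many members are credited.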
---

# Checks?

---
class: middle

# Possibilities

- Fairness and bias (More: [(in)visible](https://theinvisiblelab.org/))
- Predictive models
  + Predict IMDb ratings for invisible films
  + Predict Bechdel-Wallace scores
  + Predict uniqueness of plots
  + *Encoding features is a task*
- Vision models: 300,000 high-resolution posters
- TV data
- [we can brainstorm]

---
class: middle

# Thank you!

Reach out to collaborate: [saurabh.khanna@uva.nl](mailto:saurabh.khanna@uva.nl)

Please do not disseminate the data.