Dataset Summary

The dataset that I have chose to analyze provides IMDB data for all both movie and television titles. This dataset includes variables such as (but not limited to) year, country, budget, rating, language, etc.

## color                         object
## director_name                 object
## num_critic_for_reviews       float64
## duration                     float64
## director_facebook_likes      float64
## actor_3_facebook_likes       float64
## actor_2_name                  object
## actor_1_facebook_likes       float64
##  gross                        object
## genres                        object
## actor_1_name                  object
## movie_title                   object
## num_voted_users                int64
## cast_total_facebook_likes      int64
## actor_3_name                  object
## facenumber_in_poster         float64
## plot_keywords                 object
## movie_imdb_link               object
## num_user_for_reviews         float64
## language                      object
## country                       object
## content_rating                object
## budget                       float64
## title_year                   float64
## actor_2_facebook_likes       float64
## imdb_score                   float64
## aspect_ratio                 float64
## movie_facebook_likes           int64
## dtype: object

Plot 1

The first graph that I created using this dataset illistrates the overall title count by title rating. The most popular rating by far was the Rating ‘R’, followed by ‘PG-13’ and ‘PG’. Over time, it appears more movies have been produced than television pictures (given the rating analysis).

## <BarContainer object of 10 artists>

Plot 2

The second graph that I created using this dataset illistrates the overall title count by title language. The most popular language was English, followed far behind by French and Spanish.

## <BarContainer object of 5 artists>

Plot 3

The third graph that I created using this dataset illistrates a histograph depicting the overall title count by year. The most popular decade was surely the 2010’s. The early 21st century has produced more movies than the entire 20th century.

## (array([  1.,   1.,   1.,   3.,   2.,   4.,   9.,   8.,   6.,  11.,   8.,
##         11.,   9.,  16.,  26.,  31.,  32.,  24.,  58.,  87.,  82., 122.,
##         95., 172., 519., 568., 604., 928., 676., 821.]), array([1916.        , 1919.33333333, 1922.66666667, 1926.        ,
##        1929.33333333, 1932.66666667, 1936.        , 1939.33333333,
##        1942.66666667, 1946.        , 1949.33333333, 1952.66666667,
##        1956.        , 1959.33333333, 1962.66666667, 1966.        ,
##        1969.33333333, 1972.66666667, 1976.        , 1979.33333333,
##        1982.66666667, 1986.        , 1989.33333333, 1992.66666667,
##        1996.        , 1999.33333333, 2002.66666667, 2006.        ,
##        2009.33333333, 2012.66666667, 2016.        ]), <a list of 30 Patch objects>)
## 
## C:\Users\dtmed\Anaconda3\lib\site-packages\numpy\lib\histograms.py:824: RuntimeWarning:
## 
## invalid value encountered in greater_equal
## 
## C:\Users\dtmed\Anaconda3\lib\site-packages\numpy\lib\histograms.py:825: RuntimeWarning:
## 
## invalid value encountered in less_equal
## (array([1900., 1920., 1940., 1960., 1980., 2000., 2020., 2040.]), <a list of 8 Text xticklabel objects>)
## (array([   0.,  200.,  400.,  600.,  800., 1000.]), <a list of 6 Text yticklabel objects>)

Plot 3

The fourth graph that I created using this dataset illistrates the overall title count by country. The United States clearly produces the most motion pictures followed by the UK.