Introduction

I scraped data from the IMDb website for the 200 most popular feature films released in 2019. The URL is: http://www.imdb.com/search/title?count=200&release_date=2019,2019&title_type=feature
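
As a small illustration of how the query string in that URL is put together, here is a minimal sketch that rebuilds it from its three parameters. The function name `build_search_url` and the constant `BASE_URL` are hypothetical names introduced here; only the parameter names and values come from the URL above.

```python
from urllib.parse import urlencode

# Base path taken from the URL quoted above.
BASE_URL = "http://www.imdb.com/search/title"


def build_search_url(year: int, count: int = 200) -> str:
    """Build the search URL for feature films released in `year` (hypothetical helper)."""
    params = {
        "count": count,
        "release_date": f"{year},{year}",  # start and end of the release window
        "title_type": "feature",
    }
    # safe="," keeps the comma in the date range unescaped, matching the URL above.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


url = build_search_url(2019)
print(url)
```

Changing `year` or `count` would presumably select a different release window or result size, although only the 2019 / 200 combination is used in this report.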

Specifically, I scraped 8 features from this page. They include:

The pre-processing steps applied after scraping include:

After pre-processing, the data is saved to a CSV file.
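
The save step could look roughly like the sketch below, which uses only Python's standard `csv` module. The column list is an assumption limited to the three features named later in this report (Rank, Genre, Rating), the rows are placeholder values rather than scraped data, and `write_movies_csv` is a hypothetical helper name.

```python
import csv
import io

# Assumed columns: only the features named in this report, not the full set of 8.
FIELDS = ["Rank", "Genre", "Rating"]


def write_movies_csv(rows, fileobj):
    """Write a header line followed by one CSV line per movie dict."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder rows for illustration only, not real scraped values.
sample_rows = [
    {"Rank": 1, "Genre": "Action", "Rating": 6.5},
    {"Rank": 2, "Genre": "Comedy", "Rating": 7.1},
]

buffer = io.StringIO()  # in-memory stand-in for a file on disk
write_movies_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function can be given a file opened with `open("movies.csv", "w", newline="")`, where `movies.csv` is a hypothetical file name.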

Packages needed

Question 1

What type of movie has great reviews?

To answer this question, I select two features to analyze the effect of genre on a movie's reviews. Specifically, I use Rating to indicate the quality of a movie and Genre to refer to its type, and plot the two features as a bar chart.
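
The counting behind such a bar chart can be sketched as follows, using only the standard library. This is an illustration under assumptions, not the notebook's own code: ratings are grouped into unit-wide bins keyed by their lower bound (so 6 stands for ratings from 6 up to, but not including, 7), the sample values are placeholders, and `count_by_genre_and_rating_bin` is a hypothetical name. Plotting is left out.

```python
import math
from collections import Counter


def count_by_genre_and_rating_bin(movies):
    """Count movies per (genre, rating bin); each bin is keyed by its lower bound."""
    counts = Counter()
    for genre, rating in movies:
        counts[(genre, math.floor(rating))] += 1
    return counts


# Placeholder (genre, rating) pairs for illustration only.
sample = [("Action", 6.4), ("Action", 6.9), ("Comedy", 6.1), ("Drama", 7.3)]
counts = count_by_genre_and_rating_bin(sample)

for (genre, rating_bin), total in sorted(counts.items()):
    print(genre, rating_bin, total)
```

Each `(genre, bin)` count would correspond to the height of one bar in the chart.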

We can observe that Action movies have the best reviews among all genres in terms of Rating score, because Action movies dominate the Rating range [6,7].

In addition, I analyze the number of movies per Genre among the top 50 movies by Rank. Specifically, I select the first 50 rows with their Genre and Rank, group them by Genre, and count the movies in each group. I use Rank here because it also reflects the reviews or quality of a movie to some degree, so the number of movies a genre has in the top 50 is another reasonable measure of how well that genre is received. The table and bar chart I created lead to the same conclusion as above: Action has the most movies in the top 50 with 13, followed by Comedy with 10.

In short, both analyses, from different angles, reach the same conclusion: Action movies have great reviews.

Genre Total
Action 13
Comedy 10
Drama 8
Animation 5
Biography 5
Adventure 4
Horror 3
Crime 1
Thriller 1
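
The tally in the table above can be reproduced along these lines. This is a minimal standard-library sketch rather than the notebook's own code: it assumes each movie carries a single Genre value (consistent with the totals above summing to 50), the sample rows are placeholders, and `genre_totals_in_top` is a hypothetical helper name.

```python
from collections import Counter


def genre_totals_in_top(movies, n=50):
    """Return (genre, total) pairs for the n best-ranked movies, largest total first."""
    top = sorted(movies, key=lambda movie: movie["Rank"])[:n]
    return Counter(movie["Genre"] for movie in top).most_common()


# Placeholder rows for illustration only; n=3 keeps the example small.
sample = [
    {"Rank": 3, "Genre": "Comedy"},
    {"Rank": 1, "Genre": "Action"},
    {"Rank": 2, "Genre": "Action"},
    {"Rank": 4, "Genre": "Drama"},
]
totals = genre_totals_in_top(sample, n=3)

for genre, total in totals:
    print(genre, total)
```

With the real data and the default `n=50`, the printed pairs would take the same Genre / Total shape as the table.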

Question 2