Name:

The file “IMDB Movies” contains a sample of movies from the 250 highest rated movies of all time. The Box_Office variable is measured in Millions of Dollars. Use this file to answer the following questions. Include your written responses with any charts and tables in this document. Upload your excel file to go with your quiz.

  1. What are the elements in the data set and determine if the data are from a population or a sample?

The elements in the data set are the individual movies. The data are a sample: the file contains only some of the 250 highest rated movies, not every movie ever made.

  1. Calculate the Average, Median and Standard Deviation for Rating and interpret each? Also, calculate and interpret the coefficient of variation. Is there a lot of variability in rating?
Avg_Rating Median_Rating SD_Rating CV
8.323 8.3 0.243 0.029

The average (8.323) and median (8.3) ratings are both close to 8.3, suggesting that the distribution of ratings is roughly symmetric. The data represent some of the highest rated and highest grossing movies in history. The standard deviation of 0.243 shows very little variability in the rating values. The coefficient of variation confirms this, since the standard deviation is only about 3% of the mean (CV = 0.029). This indicates very little variability in rating across movies.
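As a quick check on the coefficient of variation, the calculation can be sketched in Python using only the summary values reported in the table above. The function name `coefficient_of_variation` is just an illustrative helper, not something from the Excel workbook.

```python
# Summary values copied from the rating table above.
avg_rating = 8.323
sd_rating = 0.243


def coefficient_of_variation(sd, mean):
    """Return the standard deviation expressed as a fraction of the mean."""
    return sd / mean


cv_rating = coefficient_of_variation(sd_rating, avg_rating)
print(f"CV for rating: {cv_rating:.3f}")
```

A value this close to zero means the spread is tiny relative to the typical rating.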

  1. Create a table for the five number summary and create a box plot. From the five number summary, discuss if the rating variable has a lot of variation. Using the box plot (you don’t need to calculate this), determine how many outliers the ratings variable has?
Min Q1 Q2 Q3 Max
8 8.1 8.3 8.5 9.3
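The outlier question can be cross-checked against the five number summary with a short sketch. It assumes the box plot uses the common 1.5 × IQR rule for its whiskers, which may differ from the charting tool's exact convention. It can only show whether the minimum or maximum falls outside the fences, not how many outliers there are, since that needs the full data.

```python
# Five number summary copied from the table above.
minimum, q1, q2, q3, maximum = 8, 8.1, 8.3, 8.5, 9.3

# Interquartile range: the spread of the middle 50% of ratings.
iqr = q3 - q1

# Fences under the assumed 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(f"IQR = {iqr:.1f}, fences = ({lower_fence:.1f}, {upper_fence:.1f})")
print(f"Low outlier present: {has_low_outlier}")
print(f"High outlier present: {has_high_outlier}")
```

Under this assumption the fences are about 7.5 and 9.1, so the maximum of 9.3 sits above the upper fence while the minimum of 8 is inside the lower one.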

  1. Calculate the Average, Median, and Standard Deviation for the variable Box_Office. Between the Average and the Median, which measure do you think would provide a more relevant measure for the center of the distribution?
Avg_Box Median_Box SD_Box
282.154 120.073 401.591

The average box office revenue is $282 million, while the median box office revenue is $120 million. The large gap between those two values, with the mean well above the median, suggests that the data are strongly right skewed. The median would provide a better representation of the center of the distribution, since the mean is likely being pulled upward by a few extreme values.

  1. Is there a lot of variability between movies in terms of box office revenue? Explain your reasoning.

Yes. The standard deviation is around $400 million, which is larger than the mean of $282 million, so there is a lot of variation between movies in terms of box office revenue.
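One way to back up this answer is to compare the coefficient of variation for Box_Office with the one for Rating. The sketch below uses only the summary values reported in the two tables above.

```python
# Summary values copied from the Box_Office table (millions of dollars).
avg_box = 282.154
sd_box = 401.591

# CV for rating, as reported in the rating table.
cv_rating = 0.029

# Coefficient of variation: standard deviation relative to the mean.
cv_box = sd_box / avg_box

print(f"CV for box office: {cv_box:.2f}")
print(f"Ratio to rating CV: {cv_box / cv_rating:.0f}x")
```

A CV above 1 means the standard deviation exceeds the mean, which is far more relative variability than the 0.029 seen for rating.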

  1. What box office Revenue would put you in the top 10% of movies in terms of box office?

A film with box office revenue of about $775.09 million would be at the 90th percentile, which is the cutoff for the top 10% of movies in terms of box office.
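The 90th percentile depends on the full Box_Office column, which is not reproduced in this document. The sketch below only illustrates the calculation. It assumes linear interpolation between closest ranks, the convention used by Excel's PERCENTILE.INC, and the workbook may have used a different function. The list `example_revenues` is hypothetical and is NOT the quiz data.

```python
def percentile_inclusive(values, p):
    """Return the p-th percentile (0 <= p <= 100) of values.

    Uses linear interpolation between the two closest ranks.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted data.
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical revenues in millions of dollars (illustration only).
example_revenues = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
p90 = percentile_inclusive(example_revenues, 90)
print(f"90th percentile of the example data: {p90}")
```

Running the same function on the actual Box_Office values would give the cutoff for the top 10% under this interpolation rule.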

  1. Pick a movie from the list and calculate the z-score for box office revenue and interpret this value. Be sure to include the movie you selected and their box office revenue along with the z-score.

The z-score for The Dark Knight, a very popular movie, is 1.8. This means its box office revenue is 1.8 standard deviations above the mean box office revenue.

  1. Avengers: Endgame has the highest box office revenue of 2799 (2.799 billion dollars). Calculate the z-score for Endgame and interpret this value? Is this considered an outlier?

Avengers: Endgame has a z-score of 6.27, calculated as (2799 - 282.154) / 401.591. The box office revenue for this film is 6.27 standard deviations above the mean, which is well beyond the usual cutoff of 3 standard deviations. This would be considered an outlier.
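The z-score above can be reproduced with a few lines of Python, using the mean and standard deviation reported in the Box_Office table and the revenue given in the question. The helper `z_score` and the 3 standard deviation cutoff are written here as a sketch of the usual rule of thumb.

```python
# Summary values copied from the Box_Office table (millions of dollars).
avg_box = 282.154
sd_box = 401.591

# Revenue for Avengers: Endgame as given in the question.
endgame_revenue = 2799


def z_score(value, mean, sd):
    """Return how many standard deviations value lies from the mean."""
    return (value - mean) / sd


endgame_z = z_score(endgame_revenue, avg_box, sd_box)

# Rule of thumb: more than 3 standard deviations from the mean is an outlier.
is_outlier = abs(endgame_z) > 3

print(f"z-score: {endgame_z:.2f}, outlier: {is_outlier}")
```

The same function applies to any other movie in the list once its box office revenue is known.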