Data Columns
## 'data.frame': 5000 obs. of 15 variables:
## $ Rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Release.Group : chr "Mission: Impossible II" "Gladiator" "Cast Away" "What Women Want" ...
## $ X.Worldwide : num 5.46e+08 4.61e+08 4.30e+08 3.74e+08 3.50e+08 ...
## $ X.Domestic : num 2.15e+08 1.88e+08 2.34e+08 1.83e+08 1.38e+08 ...
## $ Domestic.. : num 39.4 40.8 54.4 48.9 39.4 75.4 50.3 55.6 53.1 53.3 ...
## $ X.Foreign : num 3.31e+08 2.73e+08 1.96e+08 1.91e+08 2.12e+08 ...
## $ Foreign.. : num 60.6 59.2 45.6 51.1 60.6 24.6 49.7 44.4 46.9 46.7 ...
## $ Year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
## $ Genres : chr "Adventure, Action, Thriller" "Action, Drama, Adventure" "Adventure, Drama" "Comedy, Romance" ...
## $ Rating : chr "6.126/10" "8.217/10" "7.663/10" "6.45/10" ...
## $ Vote_Count : num 6741 19032 11403 3944 2530 ...
## $ Original_Language : chr "en" "en" "en" "en" ...
## $ Production_Countries: chr "United States of America" "United Kingdom, United States of America" "United States of America" "United Kingdom, United States of America" ...
## $ Rating_Num : num 6.13 8.22 7.66 6.45 6.54 ...
## $ ROI : num 1.537 1.454 0.839 1.046 1.54 ...
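The `str()` output shows `Rating` stored as text such as "6.126/10" next to a numeric `Rating_Num`. The R code that built `Rating_Num` is not shown, so the following is only a minimal Python sketch, assuming the numeric value is the part before the slash. The helper name `parse_rating` is hypothetical.

```python
import math


def parse_rating(text):
    """Return the numerator of an 'x/10' rating string as a float.

    Returns NaN for None, strings without a slash, or a non-numeric numerator,
    so unparseable ratings become missing values instead of raising errors.
    """
    if text is None:
        return math.nan
    numerator, sep, _denominator = text.strip().partition("/")
    if not sep:
        return math.nan
    try:
        return float(numerator)
    except ValueError:
        return math.nan


ratings = ["6.126/10", "8.217/10", "7.663/10", "6.45/10", "n/a"]
print([parse_rating(r) for r in ratings])
```

Returning NaN rather than raising keeps one malformed rating from stopping the whole conversion; such rows then show up as missing values in later plots.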
Summarized Data
## X.Worldwide X.Domestic X.Foreign
## Min. :1.666e+06 Min. : 0 Min. :0.000e+00
## 1st Qu.:2.466e+07 1st Qu.: 92752 1st Qu.:1.371e+07
## Median :4.845e+07 Median : 17984212 Median :3.019e+07
## Mean :1.192e+08 Mean : 44725233 Mean :7.449e+07
## 3rd Qu.:1.198e+08 3rd Qu.: 53868472 3rd Qu.:7.212e+07
## Max. :2.799e+09 Max. :936662225 Max. :1.994e+09
## 1. Summary Statistics (Worldwide, Domestic, Foreign Revenue)
## These summary statistics give an overview of box office performance across the three revenue measures. All three are strongly right-skewed: in every column the mean exceeds the median (worldwide: 1.192e+08 vs 4.845e+07), because a small group of films earns extremely high grosses. In this data, worldwide gross is the sum of domestic and foreign gross (the means agree: 44,725,233 + 7.449e+07 ≈ 1.192e+08), so it is always at least as large as either component.
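To make the right-skew claim concrete, here is a minimal Python sketch of a `summary()`-style table plus a mean-versus-median check. It uses `statistics.quantiles(..., method="inclusive")`, which interpolates quartiles the same way as R's default `quantile()` (type 7). The gross values are invented toy numbers, not rows from the dataset.

```python
import statistics


def summarize(values):
    """Min, Q1, median, mean, Q3, max, with quartiles interpolated like R type 7."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": med,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical grosses: six ordinary films and one "blockbuster" outlier.
toy_gross = [2e6, 10e6, 25e6, 40e6, 60e6, 120e6, 900e6]
summary = summarize(toy_gross)
print(summary)
print("right-skewed (mean > median):", summary["mean"] > summary["median"])
```

A single large value pulls the mean far above the median while barely moving the quartiles, which is the same pattern visible in the summary table above.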
## [1] "Rank" "Release.Group" "X.Worldwide"
## [4] "X.Domestic" "Domestic.." "X.Foreign"
## [7] "Foreign.." "Year" "Genres"
## [10] "Rating" "Vote_Count" "Original_Language"
## [13] "Production_Countries" "Rating_Num" "ROI"
Top Production Countries
| Production_Countries | Total worldwide gross (USD) |
|---|---|
| United States of America | 341120951607 |
| United Kingdom, United States of America | 48020494544 |
| China | 20871818368 |
| Japan | 14406591108 |
| Canada, United States of America | 10577370647 |
| Germany, United States of America | 9866231882 |
| China, Hong Kong | 9380187253 |
| Unknown / Missing | 8755559928 |
| New Zealand, United States of America | 6963371265 |
| South Korea | 6767957256 |
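The table can be reproduced by grouping on the full `Production_Countries` string, so each distinct combination (for example "United Kingdom, United States of America") forms its own row. The following Python sketch shows that aggregation, assuming blank or missing entries were relabeled "Unknown / Missing"; `top_countries` and the rows are hypothetical.

```python
from collections import defaultdict


def top_countries(rows, n=10):
    """Sum worldwide gross per Production_Countries string and return the top n."""
    totals = defaultdict(float)
    for countries, worldwide in rows:
        # Treat None or whitespace-only entries as missing.
        if countries and countries.strip():
            key = countries.strip()
        else:
            key = "Unknown / Missing"
        totals[key] += worldwide
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


rows = [
    ("United States of America", 500e6),
    ("United Kingdom, United States of America", 300e6),
    ("United States of America", 200e6),
    ("", 50e6),
    (None, 25e6),
]
for name, total in top_countries(rows, n=3):
    print(f"{name}: {total:,.0f}")
```

Splitting the strings first would credit co-productions to every listed country instead; the table above keeps combinations intact.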
Histograms: Worldwide Gross (Full + Zoomed)
## 3. Histogram of Worldwide Gross
## Two histograms are shown: the full-range distribution and a zoomed-in version using the 90th percentile cutoff. The zoomed histogram reveals the structure of the lower and mid-range films, which are otherwise compressed in a full-scale plot due to extreme blockbuster outliers.
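The zoom step can be sketched as computing the 90th percentile and keeping only films at or below it. This minimal Python sketch assumes the zoomed plot filters rows rather than only narrowing the axis; `zoom_to_percentile` is a hypothetical helper and the inputs are toy values.

```python
import statistics


def zoom_to_percentile(values, pct=90):
    """Return the pct-th percentile (1..99) and the values at or below it.

    statistics.quantiles(..., n=100, method="inclusive") returns the 1st to 99th
    percentiles, interpolated like R's default quantile() (type 7).
    """
    cutoff = statistics.quantiles(values, n=100, method="inclusive")[pct - 1]
    return cutoff, [v for v in values if v <= cutoff]


toy_gross = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical values
cutoff, kept = zoom_to_percentile(toy_gross)
print(cutoff, kept)
```

Filtering drops the top 10% from the zoomed histogram entirely, so its bar heights describe only the remaining 90% of films.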
Histograms: Vote Count (Full + Zoomed)
## Warning (full and zoomed histograms): Removed 170 rows containing non-finite outside the scale range (`stat_bin()`).
## 4. Histogram of Vote Count
## Showing both the full-range and zoomed histograms makes the vote distribution easier to read. Most movies receive moderate vote totals, while a small number accumulate very high counts that would otherwise stretch the scale.
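ggplot2's `stat_bin()` drops missing or infinite values before binning, which is what the "Removed 170 rows" warnings report. The sketch below mirrors that idea in Python with fixed-width bins aligned to multiples of `binwidth`. It is not ggplot2's exact binning algorithm, and `bin_counts` and the toy vote counts are hypothetical.

```python
import math


def bin_counts(values, binwidth):
    """Count finite values into bins [k*binwidth, (k+1)*binwidth).

    Returns (counts keyed by left bin edge, number of dropped values), where
    None, NaN, and infinities are dropped, similar in spirit to stat_bin().
    """
    finite = [v for v in values if v is not None and math.isfinite(v)]
    dropped = len(values) - len(finite)
    counts = {}
    for v in finite:
        left = math.floor(v / binwidth) * binwidth
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items())), dropped


votes = [120, 450, 980, 1500, 25000, float("nan"), None, float("inf")]
counts, dropped = bin_counts(votes, binwidth=1000)
print(counts, "dropped:", dropped)
```

Reporting the dropped count alongside the bins makes it clear how many films a histogram silently leaves out.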
Boxplot: Ratings
## 5. Boxplot of Rating
## The boxplot shows that most films have mid-to-high ratings, with relatively few very low-rated films. The narrow interquartile range indicates that the middle half of ratings is tightly clustered, so audiences tend to score most films similarly.
Histograms: Ratings (Full + Zoomed)
## Warning (full and zoomed histograms): Removed 170 rows containing non-finite outside the scale range (`stat_bin()`).
## Histogram of Ratings
## The full-range histogram shows that movie ratings cluster around the mid-to-high range. The zoomed-in version removes extreme values and reveals a clearer distribution of common rating scores, indicating that most films tend to fall between moderate and strong audience approval.
Histograms: Domestic & Foreign Gross (Full + Zoomed)
## Histogram of Domestic Gross
## The full-range histogram shows that domestic grosses vary widely across movies, with only a few extremely high-earning films. The zoomed-in histogram highlights the majority of releases that earn moderate domestic totals, which would otherwise be overshadowed by blockbuster outliers.
## Histogram of Foreign Gross
## The full-range view displays significant skewness caused by major international blockbusters. The zoomed histogram makes it easier to see the distribution of typical foreign box office earnings for most films, which cluster at much lower values than the global hits.
Boxplot: ROI
## ROI Boxplots
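## The full-range plot shows the complete distribution, including extreme outliers. Because these outliers compress the interquartile range into a thin band, the zoomed-in plot restricts the view to the whisker fences (Q1 − 1.5×IQR to Q3 + 1.5×IQR), where the quartiles, median, and main body of the ROI distribution are clearly visible. Note that the ROI values in the `str()` output match X.Foreign / X.Domestic (for example, 3.31e+08 / 2.15e+08 ≈ 1.54 for Mission: Impossible II, listed as 1.537), so this column appears to measure foreign-to-domestic revenue rather than return on a production budget; films with a domestic gross of 0 (the column minimum) would make this ratio infinite.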
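The whisker fences can be computed directly from the quartiles. The following is a minimal Python sketch using `statistics.quantiles(..., method="inclusive")`, which should match R's default `quantile()` interpolation; `tukey_fences` and the toy ratios are hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


toy_roi = [0.3, 0.8, 1.0, 1.2, 1.5, 1.6, 2.0, 40.0]  # hypothetical ratios
lower, upper = tukey_fences(toy_roi)
outliers = [v for v in toy_roi if v < lower or v > upper]
print(lower, upper, outliers)
```

If the zoomed plot is built in ggplot2, `coord_cartesian(ylim = ...)` zooms without dropping points, whereas `ylim()` or scale limits remove out-of-range rows before the quartiles are computed and therefore change the box itself.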
Correlation: Worldwide Gross vs Vote Count
## [1] 0.7233687
## 6. Correlation Between Worldwide Gross and Vote Count
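## The correlation of 0.72 indicates a strong positive relationship: films earning higher worldwide grosses tend to receive more audience votes. This makes intuitive sense, since high-earning films reach larger audiences, more of whom go on to rate them. The relationship is far from perfect, though (r² ≈ 0.52, so gross linearly accounts for only about half of the variation in vote count), which leaves room for films whose vote totals are out of line with their gross, such as low-grossing films with strong niche audiences. Because both variables are heavily right-skewed, a few blockbusters may also drive much of this coefficient.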
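R's `cor()` computes Pearson's r by default. The minimal Python sketch below writes that formula out and squares the reported coefficient to get the share of variance one variable linearly explains in the other; the name `pearson` is our own.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = 0.7233687  # coefficient reported above
print(round(r ** 2, 3))  # 0.523
```

On skewed data like box office grosses, a rank-based coefficient (Spearman, `cor(x, y, method = "spearman")` in R) is a useful cross-check, since it is less sensitive to a handful of extreme films.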
Identify Top Two Genres
## Top two genres based on average worldwide gross: Science Fiction and Adventure
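Ranking genres requires splitting multi-genre strings such as "Adventure, Action, Thriller" so each film counts toward every genre it lists, then averaging gross per genre. Here is a minimal Python sketch of that idea; `top_genres`, its `min_films` parameter, and the gross values (in millions) are hypothetical.

```python
from collections import defaultdict


def top_genres(films, k=2, min_films=1):
    """Rank genres by mean gross; each film counts toward every genre it lists.

    Genres with fewer than min_films films are excluded from the ranking.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for genres, gross in films:
        for genre in (part.strip() for part in genres.split(",")):
            if genre:
                sums[genre] += gross
                counts[genre] += 1
    means = {g: sums[g] / counts[g] for g in sums if counts[g] >= min_films}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:k]


films = [
    ("Adventure, Action, Thriller", 300.0),
    ("Action, Drama, Adventure", 180.0),
    ("Adventure, Drama", 100.0),
    ("Comedy, Romance", 50.0),
    ("Science Fiction, Adventure", 400.0),
    ("Science Fiction", 260.0),
]
print(top_genres(films))               # a one-film genre can rank second
print(top_genres(films, min_films=2))  # requiring 2+ films changes the result
```

With toy data, a genre backed by a single hit can outrank a genre with many films, which is why a minimum group size is a reasonable safeguard when ranking by mean.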
Comparison Table: Top Two Genres
| Genre | Avg. worldwide gross (USD) | Avg. domestic gross (USD) | Avg. foreign gross (USD) | Avg. rating (/10) | Avg. vote count | Films |
|---|---|---|---|---|---|---|
| Adventure | 206702997 | 70203825 | 136499144 | 6.58 | 3415.5 | 810 |
| Science Fiction | 255367903 | 95353142 | 160014695 | 6.54 | 5393.4 | 514 |
## 7. Genre Comparison Table
## The table shows how the two highest-grossing genres differ. Science Fiction films average about 255.4 million worldwide versus 206.7 million for Adventure and draw more votes on average (about 5,393 vs 3,415), while average ratings are nearly identical (6.54 vs 6.58). Adventure has more films in the data (810 vs 514). Because genre strings were split, a film tagged with both genres presumably counts toward both groups.
Boxplot: Worldwide Gross by Top Two Genres
## 8. Boxplot: Worldwide Gross by Genre
## The boxplot shows differences in median gross and overall revenue spread between the two best-performing genres. This visualization supports the numerical comparison by showing which genre produces more consistently high-grossing films.
Histograms for Top Two Genres (Full + Zoomed)
## 10. Histograms for Genre: Science Fiction
## 11. Histograms for Genre: Adventure
T-Test Between Top Two Genres
##
## Welch Two Sample t-test
##
## data: genre1_data and genre2_data
## t = 0.55189, df = 950.84, p-value = 0.5812
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -26843830 47849422
## sample estimates:
## mean of x mean of y
## 255367903 244865107
## 9. T-Test on Worldwide Gross Between Top Genres
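## The t-test evaluates whether the average worldwide gross differs significantly between the two top genres. The p-value of 0.5812 is far above the usual 0.05 threshold, and the 95% confidence interval for the difference (about -26.8 million to 47.8 million) includes zero, so we cannot reject the hypothesis that the two genres have equal mean worldwide gross. The observed gap of about 10.5 million is consistent with random variation, so the higher Science Fiction average seen in the table and boxplot should not be read as a reliable difference. The mean of y reported here (244,865,107) also differs from the Adventure average in the comparison table (206,702,997), while the mean of x matches the Science Fiction average exactly, so the test and the table appear to use different Adventure subsets.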
Final Summary
## Summary and Insights
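## This analysis examined global box office performance using revenue, rating, and popularity measures. Both top genres average well above the overall mean worldwide gross of about 1.192e+08 (Science Fiction about 2.55e+08, Adventure about 2.07e+08), but the t-test returned a p-value of 0.5812, so the difference in mean worldwide gross between Science Fiction and Adventure is not statistically significant. Genre is associated with large differences in average gross relative to the full dataset, while the two leading genres cannot be distinguished from each other on this measure.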
Yearly Trend for Top Genres
## Trend Over Time by Year for Top Two Genres
## The line chart displays how the average worldwide gross for the two top genres has evolved over time. Changes in the slope and relative position of the lines show periods where one genre outperformed the other, illustrating how audience preferences and market conditions may have shifted across years.
## Summary of Trend Over Time
## The trend analysis shows how the box office performance of the two top genres shifted across the 25-year period. Because each line plots the average gross per film, a rise can reflect either bigger hits or fewer low-grossing releases in that genre and year, and years with only a handful of releases can swing sharply. Where one line sits above the other, that genre earned more per film in those years; crossover points mark years in which the ranking reversed. Read this way, the plot shows whether each genre's average commercial performance has been stable, rising, or declining over time.
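The yearly lines can be sketched as a mean of worldwide gross grouped by (genre, year), keeping the film count so that thin years are easy to spot. This minimal Python sketch assumes genre strings are split the same way as in the genre ranking; `yearly_means` and the toy rows (gross in millions) are hypothetical.

```python
from collections import defaultdict


def yearly_means(films, genres_of_interest):
    """Map (genre, year) -> (mean gross, film count) for the selected genres."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, genres, gross in films:
        for genre in (part.strip() for part in genres.split(",")):
            if genre in genres_of_interest:
                sums[(genre, year)] += gross
                counts[(genre, year)] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sorted(sums)}


films = [
    (2000, "Science Fiction, Adventure", 400.0),
    (2000, "Adventure, Drama", 100.0),
    (2001, "Science Fiction", 250.0),
    (2001, "Adventure", 150.0),
    (2001, "Adventure, Comedy", 50.0),
]
for (genre, year), (mean, n) in yearly_means(films, {"Science Fiction", "Adventure"}).items():
    print(genre, year, mean, n)
```

Plotting or at least inspecting the counts next to the means helps separate genuine shifts from years where one or two releases dominate the average.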
Summary of Analysis
## • The dataset includes 5,000 films from 2000–2024 with revenue, ratings, votes, and genre information.
## • Revenue variables (worldwide, domestic, foreign) show strong right-skewness driven by blockbuster outliers.
## • Full and zoomed histograms were used to highlight both overall distribution shape and detailed structure.
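## • A small number of production-country combinations (grouped by the full Production_Countries string) account for most global box office revenue.
## • Worldwide gross and vote count are positively correlated (r ≈ 0.72), consistent with wider audience reach bringing more engagement, although correlation alone does not establish that link.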
## • Genres were split and ranked by average worldwide gross to identify the top two best-performing genres.
## • These two genres were compared using consistent colors across tables, boxplots, histograms, and statistical tests.
## • A Welch two-sample t-test found no statistically significant difference in mean worldwide gross between the two genres (p = 0.5812).
## • Trend analysis across years illustrated how revenue patterns for the top genres evolved over time.
## • Overall, the analysis provides a concise overview of revenue patterns, audience behavior, and shifts in genre performance.