1 INTRODUCTION

1.1 Motivation:

The global video game industry, a multi-billion dollar market for years, presents an enticing opportunity for video game corporations to expand their operations and capture a larger share of this lucrative market. Consider a video game corporation eager to seize this opportunity; we, as the corporation’s market research team, are tasked with conducting an analysis of potential genres, platforms, and regions to inform the CEO’s investment decisions.

1.2 Tools and statistical techniques:

Python will be our primary tool for data visualization and most of the statistical calculations, enabling us to gain insights into sales trends and market preferences. In addition, R Markdown will be used to write the report that presents our analysis, and R performs the t-test in section 3.2.

To calculate the necessary statistical figures, we will employ a range of statistical techniques, including Analysis of Variance (ANOVA), Chi-square test of independence, and Hypothesis testing.

1.3 Objectives:

The analysis will focus on three primary objectives:

Determining the region with the highest overall game sales to prioritize market expansion efforts.
Testing whether games available on both platforms, PS3 and X360, have higher global sales on PS3 than on X360.
Assessing whether there is an association between platform and genre.

=> By addressing these objectives, the corporation will gain a clear understanding of the market landscape, enabling it to make informed decisions regarding game development and market expansion, ultimately maximizing its profit potential.

2 DATA EXPLORATION

2.1 Data description

Our original data is collected from https://bom.so/vR9cpq. Within the scope of this report, we focus only on variables that are under our company's control and that affect video game sales. The data frame contains 16598 observations on 11 variables after eliminating 5 unused columns. The image below shows a small part of the dataset to give an overall picture of the data we use.

Our data consists of 11 variables:

Rank: The ranking of the game by global sales.
Name: The name of the video game.
Platform: The platform on which the game is sold.
Genre: The official genre of the game.
Year: The year the game was released.
Publisher: The publisher of the game.
NA_Sales: Sales in North America (millions of units).
JP_Sales: Sales in Japan (millions of units).
EU_Sales: Sales in Europe (millions of units).
Other_Sales: Sales in the rest of the world, outside North America, Europe, and Japan (millions of units).
Global_Sales: Total global sales (millions of units).

Original display of summary:

##         Rank  ... Global_Sales
## 0          1  ...        82.74
## 1          2  ...        40.24
## 2          3  ...        35.82
## 3          4  ...        33.00
## 4          5  ...        31.37
## ...      ...  ...          ...
## 16593  16596  ...         0.01
## 16594  16597  ...         0.01
## 16595  16598  ...         0.01
## 16596  16599  ...         0.01
## 16597  16600  ...         0.01
## 
## [16598 rows x 11 columns]

2.2 Data analysis

This section visualizes the data to surface insights that will help address the report’s objectives.

2.2.1 Plot 1

The box plot depicts the sales distribution across four regions: North America, Europe, Japan, and Other. As evident from the chart, North America leads in video game sales, followed by Europe. Interestingly, the sales of the “Other” region are comparable to those of Japan.

2.2.2 Plot 2

The graph depicts the number of games released per year from 1980 to 2020. Overall, the number of games increased over time until 2010, when it peaked. After 2010, the number of games decreased markedly.

2.2.3 Plot 3

The line chart depicts the gross sales in various regions from 1978 to 2020. North America consistently outperformed all other regions in terms of sales, followed by Europe. In contrast, sales in other regions remained relatively stagnant until after 1995. Notably, sales across all regions experienced a surge from 2000 to 2016 compared to the preceding period.

2.2.4 Plot 4

The pie chart depicts the proportion of genres of video games released between 1978 and 2020. As evident from the chart, Action emerged as the most popular genre, accounting for the highest proportion of 19.6%. Sports followed behind, claiming the second-highest proportion of 14.9%. Notably, other genres held a significant share of 16.8%. Racing, as indicated in the pie chart, was the least preferred genre.

2.2.5 Plot 5

The IQR for the PC platform is relatively small, indicating that most PC games sell within a narrow range.
The IQR for the PlayStation 4 and PlayStation 3 platforms is larger, indicating that there is a greater variation in sales for these platforms.
The Xbox 360 platform has a higher median sales than the Xbox One platform, but the IQR for the Xbox One platform is smaller, indicating that most Xbox One games sell within a narrower range.
The Nintendo Wii U and PlayStation Vita platforms have the lowest median sales and the smallest IQRs, indicating that there is the least variation in sales for these platforms.
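
The median and IQR used in the comparison above can be computed directly. The following is a minimal sketch with made-up sales figures (the platform names and values are hypothetical, not taken from the dataset), using only the Python standard library:

```python
from statistics import median, quantiles

# Hypothetical global sales (millions of units) for two made-up platforms.
sales_by_platform = {
    "PlatformA": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    "PlatformB": [0.1, 0.5, 1.0, 2.0, 4.0, 6.0, 9.0],
}


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


for platform, values in sales_by_platform.items():
    print(platform, "median:", median(values), "IQR:", round(iqr(values), 2))
```

A small IQR (PlatformA here) means most games sell within a narrow range, while a large IQR (PlatformB) means sales vary widely from game to game.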

2.2.6 Plot 6

The chart illustrates the top 10 global best-selling video games, measured in millions of units sold. Wii Sports leads with 82.74 million copies sold, more than double the sales of Super Mario Bros. (40.24 million), the second best-selling game. Mario Kart Wii takes the third spot with 35.82 million copies sold, followed by a group of games with sales ranging from 28.31 to 33 million copies.

3 FURTHER ANALYSIS

3.1 The region with the highest gaming product sales

3.1.1 METHOD

3.1.1.1 Motivation

As a video game company, we would like to know which region has the greatest sales potential so that we can decide where to invest next. To compare North America, Europe, Japan, and other countries, we employ Analysis of Variance (ANOVA). ANOVA determines whether there are statistically significant differences in mean sales among these regions. If significant differences are detected, we then use Tukey’s test to pinpoint which regions differ.

Our objective is to assess whether mean sales are equal across the four regions.

3.1.1.2 Standard procedure

Step 1: Form the testing hypotheses as follows:

Null Hypothesis (H0): The means of all groups are equal. (M1 = M2 = M3 = M4)
Alternative Hypothesis (Ha): At least one group mean is different from the others.

Step 2: Collect data for each region.

Step 3: Conduct ANOVA Test:

The form of an analysis of variance table :

So we need to calculate all of the components within the variance table:

1. SSW: Sum of Squares Within

SSW = \(\sum_{i} \sum_{j} (X_{ij} - \overline{X}_{i})^2\)

\(\overline{X}_{i}\): is the mean of group i

\(X_{ij}\): is the j-th observation in group i

2. SSB: Sum of Squares Between

SSB = \(\sum_{i} n_{i} (\overline{X}_{i} - \overline{X})^2\)

\(n_{i}\): is the number of observations in group i

\(\overline{X}_{i}\): is the mean of group i

\(\overline{X}\): is the overall mean

3. SST : The total Sum of Squares

SST = \(\sum_{i=1}^{N} (X_{i} - \overline{X})^2\)

\(X_{i}\): is each individual observation

\(\overline{X}\): is the overall mean of all observations

N is the total number of observations

4. Degrees of Freedom for SSB

\(df_{between}\) = k - 1

k is the number of groups

5. Degrees of Freedom for SSW

\(df_{within}\) = N - k

N is the total number of observations, and k is the number of groups

6. Degrees of Freedom for SST

\(df_{total}\) = N - 1

N is the total number of observations

7. Mean Square for SSB (MSB)

MSB = \(\frac{SSB}{df_{between}}\)

8. Mean Square for SSW(MSW)

MSW = \(\frac{SSW}{df_{within}}\)

9. F- Statistic

F = \(\frac{MSB}{MSW}\)

10. P-value

P-value = \(P(X \geq F)\), where X follows an F distribution with \(df_{between}\) and \(df_{within}\) degrees of freedom

Step 4:

To determine whether the observed differences in group means are statistically significant, we employ a one-way analysis of variance (ANOVA) with a significance level of 0.05. If the ANOVA indicates a significant difference between group means, we proceed with Tukey’s post hoc test to identify the specific pairs of means that differ significantly from each other.
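
The components of the variance table can be traced on a tiny example. The sketch below uses three made-up groups (the group names and values are hypothetical) and follows the formulas above term by term; it stops at the F-statistic because the p-value needs an F distribution, which the standard library does not provide:

```python
from statistics import mean

# Hypothetical sales samples (millions of units) for three made-up groups.
groups = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [5.0, 6.0, 7.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations N

# Sums of squares: between groups, within groups, and total.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
ssw = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Degrees of freedom, mean squares, and the F-statistic.
df_between = k - 1
df_within = n_total - k
msb = ssb / df_between
msw = ssw / df_within
f_stat = msb / msw

print(f"SSB={ssb:.2f} SSW={ssw:.2f} SST={sst:.2f} F={f_stat:.2f}")
```

In this example SST equals SSB + SSW, which is a useful sanity check on any hand calculation. With SciPy available, the p-value could then be obtained as `scipy.stats.f.sf(f_stat, df_between, df_within)`.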

3.1.2 ANALYSIS

Step 1:

Hypothesis testing at significance level α = 0.05 as follows:

Ho: The means of all regions are equal: μ1 = μ2 = μ3 = μ4
Ha: At least one pair of means differs
- μ1: The mean for the sales of NA
- μ2: The mean for the sales of EU
- μ3: The mean for the sales of JP
- μ4: The mean for the sales of other countries.

Step 2:

In the following steps, we employ Python to perform calculations for data analysis and statistical modeling. The code is as follows:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

Next, we import the data from a CSV file into a pandas DataFrame:

df = pd.read_csv(r'C:\dara.csv')

Step 3:

We utilize Python code to load a dataset, reshape it into an appropriate format, fit a linear regression model, and subsequently conduct an analysis of variance (ANOVA) to assess the statistical significance of differences in sales across various regions:

print("DataFrame Head:")

## DataFrame Head:

print(df.head())

##    Rank                      Name Platform  ...  JP_Sales Other_Sales Global_Sales
## 0     1                Wii Sports      Wii  ...      3.77        8.46        82.74
## 1     2         Super Mario Bros.      NES  ...      6.81        0.77        40.24
## 2     3            Mario Kart Wii      Wii  ...      3.79        3.31        35.82
## 3     4         Wii Sports Resort      Wii  ...      3.28        2.96        33.00
## 4     5  Pokemon Red/Pokemon Blue       GB  ...     10.22        1.00        31.37
## 
## [5 rows x 11 columns]

# Melt the DataFrame
df_melt = pd.melt(df, value_vars=['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales'])
df_melt.columns = ['SalesRegion', 'SalesValue']

# Fit an OLS model
model = ols('SalesValue ~ C(SalesRegion)', data=df_melt).fit()

print("\nMelted DataFrame Head:")

## 
## Melted DataFrame Head:

print(df_melt.head())

##   SalesRegion  SalesValue
## 0    NA_Sales       41.49
## 1    NA_Sales       29.08
## 2    NA_Sales       15.85
## 3    NA_Sales       15.75
## 4    NA_Sales       11.27

# Perform ANOVA and display the ANOVA table
anova_table = sm.stats.anova_lm(model, typ=2)
print("\nANOVA Table:")

## 
## ANOVA Table:

print(anova_table)

##                       sum_sq       df           F  PR(>F)
## C(SalesRegion)    461.082116      3.0  583.513082     0.0
## Residual        17486.222913  66388.0         NaN     NaN
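
The F-statistic in the table can be cross-checked from the two sums of squares and their degrees of freedom. This is a small arithmetic sketch using only the numbers printed above; the variable names are ours:

```python
# Values copied from the ANOVA table above.
ss_between, df_between = 461.082116, 3
ss_within, df_within = 17486.222913, 66388

# The residual degrees of freedom should equal N - k, where each of the
# 16598 games contributes one observation per region (k = 4 regions).
n_games, k_regions = 16598, 4
n_total = n_games * k_regions
assert df_within == n_total - k_regions

# Mean squares and the resulting F-statistic.
msb = ss_between / df_between
msw = ss_within / df_within
f_stat = msb / msw
print(round(f_stat, 3))
```

The result should agree with the reported F of 583.513082 up to rounding of the printed sums of squares.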

Step 4:

We compare the p-value with the significance level:

According to the table, we have:

F-statistic: F = 583.513082

P-value: P is reported as 0.0, which is less than 0.05 (significance level)

=> Given that the p-value is less than the significance level (0.05), we reject the null hypothesis and conclude that mean sales differ significantly for at least one pair of regional groups.

=> To further investigate these differences and pinpoint which regional groups have significantly different sales performance, we use a statistical test called Tukey’s post hoc test. This test compares the mean sales of each pair of regional groups while controlling the overall (family-wise) significance level.

The Python code for this analysis is provided below:

# Check whether the ANOVA p-value is below the 0.05 significance level
if anova_table['PR(>F)'].iloc[0] < 0.05:
    # Perform Tukey's post hoc test (family-wise error rate 0.05 by default)
    tukey_result = pairwise_tukeyhsd(df_melt['SalesValue'], df_melt['SalesRegion'])
    print("\nTukey's Post Hoc Test:")
    print(tukey_result.summary())

## 
## Tukey's Post Hoc Test:
##    Multiple Comparison of Means - Tukey HSD, FWER=0.05    
## ==========================================================
##  group1     group2   meandiff p-adj  lower   upper  reject
## ----------------------------------------------------------
## EU_Sales    JP_Sales  -0.0689   0.0 -0.0833 -0.0544   True
## EU_Sales    NA_Sales    0.118   0.0  0.1035  0.1325   True
## EU_Sales Other_Sales  -0.0986   0.0 -0.1131 -0.0841   True
## JP_Sales    NA_Sales   0.1869   0.0  0.1724  0.2014   True
## JP_Sales Other_Sales  -0.0297   0.0 -0.0442 -0.0152   True
## NA_Sales Other_Sales  -0.2166   0.0 -0.2311 -0.2021   True
## ----------------------------------------------------------

Based on the Tukey table above, which reports simultaneous confidence intervals at a 95% family-wise confidence level (FWER = 0.05), and where each difference is the mean of group2 minus the mean of group1:

The 95% confidence interval for the difference between EU and JP is as follows:

μ3 - μ2: (-0.0833, -0.0544)

Because the confidence interval does not include 0, we conclude at the 95% confidence level that the sales results for EU and JP differ. The negative sign indicates that EU sales are greater than JP sales.

The 95% confidence interval for the difference between EU and NA is as follows:

μ1 - μ2: (0.1035, 0.1325)

Since the confidence interval does not include 0, there is also a difference in sales results between EU and NA at the 95% confidence level. The positive sign indicates that NA sales outperform those of the EU.

The 95% confidence interval for the difference between EU and other countries is as follows:

μ4 - μ2: (-0.1131, -0.0841)

As the confidence interval does not include 0, there is a difference in sales results between EU and other countries at the 95% confidence level. EU sales exceed those of other countries.

The 95% confidence interval for the difference between JP and NA is as follows:

μ1 - μ3: (0.1724, 0.2014)

As the confidence interval does not include 0, the sales results for JP and NA differ at the 95% confidence level. NA sales are larger than JP sales.

The 95% confidence interval for the difference between JP and other countries is as follows:

μ4 - μ3: (-0.0442, -0.0152)

Since the confidence interval does not include 0, the sales results for JP and other countries differ at the 95% confidence level. JP sales are greater than other countries’ sales.

The 95% confidence interval for the difference between NA and other countries is as follows:

μ4 - μ1: (-0.2311, -0.2021)

Because the confidence interval does not include 0, the sales results for NA and other countries also differ at the 95% confidence level. NA sales are higher than other countries’ sales.

=> NA is the region with the highest video game sales.
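
The pairwise reading of the Tukey table can also be done mechanically. The sketch below copies the six rows of the Tukey HSD table into a list (the `wins` tally is our own illustrative device, not part of statsmodels) and counts, for each region, how many comparisons it wins with an interval that excludes 0:

```python
# (group1, group2, meandiff, lower, upper) copied from the Tukey HSD table;
# meandiff is mean(group2) - mean(group1).
tukey_rows = [
    ("EU_Sales", "JP_Sales", -0.0689, -0.0833, -0.0544),
    ("EU_Sales", "NA_Sales", 0.1180, 0.1035, 0.1325),
    ("EU_Sales", "Other_Sales", -0.0986, -0.1131, -0.0841),
    ("JP_Sales", "NA_Sales", 0.1869, 0.1724, 0.2014),
    ("JP_Sales", "Other_Sales", -0.0297, -0.0442, -0.0152),
    ("NA_Sales", "Other_Sales", -0.2166, -0.2311, -0.2021),
]

wins = {}
for group1, group2, meandiff, lower, upper in tukey_rows:
    wins.setdefault(group1, 0)
    wins.setdefault(group2, 0)
    if lower > 0:      # whole interval above 0: group2 has the larger mean
        wins[group2] += 1
    elif upper < 0:    # whole interval below 0: group1 has the larger mean
        wins[group1] += 1

ranking = sorted(wins, key=wins.get, reverse=True)
print(ranking)  # ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']
```

Every interval excludes 0, so the ordering is unambiguous: NA wins all three of its comparisons, EU two, JP one, and Other none.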

Finally, to visualize the result after testing the hypothesis, we employ Python and obtain the chart as follows:

3.1.3 CONCLUSION

A one-way ANOVA showed a statistically significant difference in mean sales across the four regional groups. Tukey’s test then compared the regions pairwise and showed that every pair of means differs significantly. Specifically, NA outperforms the other three regions, with the highest mean sales. Consequently, we conclude that North America is the region with the strongest sales performance in the gaming product category.

3.2 A game which is available in both platforms, PS3 and X360, would have higher global sales on PS3 compared to X360

3.2.1 METHOD

3.2.1.1 Motivation

Our choice of comparing the PS3 and Xbox 360 stems from their dominance in sales, as evident from Plot 5 (in section 2.2 Data Analysis). Consequently, we focus our analysis on these two platforms, beginning with the computation of their mean sales.

In the table above, it is evident that the average global sales for PS3 surpassed those of X360. To assess the statistical significance of this discrepancy in global sales, a statistical test will be conducted.

The global sales figures from the two platforms were demonstrated to be independent. Additionally, with a substantial sample size of 1678, the central limit theorem comes into play, ensuring that the sampling distribution of the mean conforms to normality. Consequently, the independent t-test was employed to compare the means of global sales for PS3 and X360.

A presumption was made that the games available on both platforms, PS3 and X360, would have higher global sales on PS3 compared to X360.

To validate this assumption, a hypothesis test at the 5 percent significance level (95 percent confidence level) was conducted, with the null hypothesis stated as follows:

\(H_{0}\): \(\mu_{X360} - \mu_{PS3} \geq 0\)

and the alternative hypothesis given as

\(H_{A}\): \(\mu_{X360} - \mu_{PS3} < 0\)

Here \(\mu_{PS3}\) and \(\mu_{X360}\) represent the mean global sales for PS3 and X360, respectively. The alternative hypothesis posits that mean global sales for PS3 are greater than those for X360.

To gauge the likelihood of the sample results, a p-value (the observed level of significance for the test) was employed, with α set at 0.05. The null hypothesis is rejected if the p-value is less than α; otherwise, we fail to reject the null hypothesis.

3.2.1.2 Standard procedure

1. Formulate Hypotheses: Null Hypothesis (H₀): This is the default assumption that there is no effect or no difference. It is denoted by H₀. Alternative Hypothesis (H₁ or Ha): This is the statement you want to test. It can be one-sided (greater than or less than) or two-sided (not equal to). The choice depends on the nature of your research question. The alternative hypothesis is denoted by H₁ or Ha

2. Select Significance Level (α): The significance level (α) is the probability of rejecting the null hypothesis when it is true. Common choices are 0.05, 0.01, or 0.10.

3. Collect and Analyze Data: Collect a sample of data relevant to your hypothesis. Analyze the data using appropriate statistical methods (e.g., t-test, z-test) to obtain a test statistic.

4. Determine Critical Region: For a one-sided test, the critical region is located entirely in one tail of the distribution. The critical value is determined based on the significance level and the distribution of the test statistic (e.g., z-table or t-table).

5. Make a Decision: If the test statistic falls into the critical region, reject the null hypothesis. If the test statistic does not fall into the critical region, fail to reject the null hypothesis.

6. Draw a Conclusion: Based on the decision in step 5, draw a conclusion in the context of the problem. If the null hypothesis is rejected, you may conclude that there is enough evidence to support the alternative hypothesis.

7. Calculate the P-Value (Optional): Calculate the p-value associated with the test statistic. If the p-value is less than the significance level, you reject the null hypothesis. This step is optional but can be informative.

8. Interpret Results: Provide a conclusion in the context of the problem, including practical implications of the findings.
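
For step 3, the test statistic used in the analysis is that of Welch's two-sample t-test, which does not assume equal variances. The sketch below shows how the t-statistic and the Welch-Satterthwaite degrees of freedom are formed; the function name `welch_t` and the sample values are hypothetical and only illustrate the formula:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's two-sample t-test of mean(x) - mean(y)."""
    # Squared standard errors of the two sample means.
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical sales samples (millions of units), not taken from the dataset.
ps3_sales = [1.0, 2.0, 3.0, 4.0, 5.0]
x360_sales = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, df = welch_t(ps3_sales, x360_sales)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

For a one-sided alternative "PS3 greater than X360", a negative t like the one in this toy example gives no support to the alternative; the p-value would be the upper-tail probability of a t distribution with `df` degrees of freedom.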

3.2.2 ANALYSIS

Using R, the t-statistic was calculated with the following procedure:

Step 1: Import the required libraries (dplyr and readr).

Step 2: Import the data from a CSV file into R.

Step 3: Keep only the games that are available on both PS3 and X360.

library(dplyr)
library(readr)
df <- read_csv("C:/dara.csv")

## Rows: 16598 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Name, Platform, Year, Genre, Publisher
## dbl (6): Rank, NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Keep PS3 and X360 rows, then keep games that appear exactly twice
# (once per platform)
xy <- df[df$Platform %in% c("PS3", "X360"), ]
xy <- xy[ave(seq_along(xy$Platform), xy$Name, FUN = length) == 2, c("Name", "Platform", "Global_Sales")]

Extracting games in PS3

# Keep the PS3 rows, sorted by game name
PS3 <- xy %>% 
  filter(Platform == "PS3") %>% 
  arrange(Name)
tail(PS3)

## # A tibble: 6 × 3
##   Name                      Platform Global_Sales
##   <chr>                     <chr>           <dbl>
## 1 Zumba Fitness             PS3              0.59
## 2 [Prototype 2]             PS3              0.75
## 3 [Prototype]               PS3              1.24
## 4 de Blob 2                 PS3              0.21
## 5 nail'd                    PS3              0.12
## 6 pro evolution soccer 2011 PS3              2.42

Extracting Games in X360

# Keep the X360 rows, sorted by game name
X360 <- xy %>% 
  filter(Platform == "X360") %>% 
  arrange(Name)
tail(X360)

## # A tibble: 6 × 3
##   Name                      Platform Global_Sales
##   <chr>                     <chr>           <dbl>
## 1 Zumba Fitness             X360             2.39
## 2 [Prototype 2]             X360             0.8 
## 3 [Prototype]               X360             1.31
## 4 de Blob 2                 X360             0.15
## 5 nail'd                    X360             0.1 
## 6 pro evolution soccer 2011 X360             0.61

Step 4: The t-statistic was calculated:

# Perform Welch's two-sample t-test (unequal variances), one-sided: PS3 > X360
result <- t.test(PS3$Global_Sales, X360$Global_Sales,
                 alternative = "greater",
                 mu = 0,
                 paired = FALSE,
                 var.equal = FALSE,
                 conf.level = 0.95)

# Print the result
print(result)

## 
##  Welch Two Sample t-test
## 
## data:  PS3$Global_Sales and X360$Global_Sales
## t = 0.052176, df = 1679.8, p-value = 0.4792
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.1231136        Inf
## sample estimates:
## mean of x mean of y 
## 0.8663020 0.8622711

According to the R output, the t-statistic is 0.052176 and the associated p-value is 0.4792. The lower bound of the one-sided 95 percent confidence interval for the true difference in mean global sales (PS3 minus X360) is -0.1231136.

Step 5 & 6:

Since the p-value (0.4792) is greater than the significance level α (0.05), we fail to reject the null hypothesis Ho. In practical terms, this means there is no sufficient evidence to claim that the global sales for PS3 are larger than those for X360.

3.2.3 CONCLUSION

In conclusion, the statistical analysis indicates that there is insufficient evidence to reject the null hypothesis, suggesting that global sales for the PS3 are not significantly larger than those for the X360. While this finding provides valuable insights, it is essential for the company to complement this statistical perspective with a broader examination of market dynamics, consumer preferences, and competitive factors. Strategic considerations such as product enhancement, targeted marketing, and partnerships should be explored to address potential areas for improvement and enhance the overall competitiveness of the PS3 in the gaming industry.

3.3 There is an association between platform and genre.

3.3.1 METHOD

3.3.1.1 Motivation

As a video game corporation, determining the right platform for releasing a video game is crucial. By aligning the game genre with the preferred genres on the targeted platform, companies can maximize their reach, appeal to the right audience, and ultimately increase their profits.

We aim to test whether there is an association between platform and genre by employing the Chi-square test of independence. Consequently, we form the testing hypotheses as follows:

Ho: There is no association between platform and genre.
Ha: There is an association between platform and genre.

3.3.1.2 Standard procedure

Step 1: State the null hypothesis (H₀) and alternative hypothesis (Ha)

Define the null hypothesis, which assumes no association between the two categorical variables, and the alternative hypothesis, which proposes an association between them.

Step 2: Organize data into a contingency table

Create a contingency table to organize the observed frequency counts for each combination of categories from the two variables.

Step 3: Calculate the expected frequency counts:

Determine the expected frequency counts for each cell in the contingency table under the assumption of the null hypothesis. For each cell, the expected count is (row total × column total) / grand total.

Step 4: Calculate the chi-square statistic

The chi-square statistic is a measure of the difference between the observed counts and the expected counts.

\(\chi^{2} = \sum \frac{(O-E)^2}{E}\)

Where:

\(\chi^{2}\): is the chi-square test statistic
\(\sum\): is the summation operator (it means “take the sum of”), taken over all cells of the contingency table
O: is the observed frequency
E: is the expected frequency

Step 5: Determine the degrees of freedom

The degrees of freedom for the chi-square test are calculated as (number of rows - 1) x (number of columns - 1).

Step 6: Calculate p - value

Obtain the p-value using the calculated chi-square statistic (Χ²) and the degrees of freedom (df) from a chi-square distribution table or statistical software.

Step 7: Make a decision

Compare the p-value to the significance level (α), commonly set at 0.05.

If p ≤ α, reject the null hypothesis (H₀) and conclude that there is a statistically significant association between the two categorical variables.
If p > α, fail to reject the null hypothesis (H₀) and conclude that there is not enough evidence to support an association between the two variables.
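
Steps 2 to 5 can be followed on a small table. The sketch below uses a made-up 2 x 3 contingency table (the counts are hypothetical) and computes the expected counts, the chi-square statistic, and the degrees of freedom with plain Python:

```python
# Hypothetical 2x3 contingency table: rows are platforms, columns are genres.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi2, 4), dof)
```

The p-value is then the upper-tail probability of a chi-square distribution with `dof` degrees of freedom, which statistical software reports directly.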

3.3.2 ANALYSIS

Although the Chi-square test of independence can be performed manually, doing so is lengthy and error-prone for a dataset of this size. Instead, we use Python to perform the calculations.

The code is as follows:

from scipy.stats import chi2_contingency

# Build the Platform x Genre contingency table of game counts
ctab = pd.crosstab(df.Platform, df.Genre)
# Run the chi-square test of independence
chi2, p, dof, expected = chi2_contingency(ctab)
print(type(df))

## <class 'pandas.core.frame.DataFrame'>

print(f"Chi-square statistic: {chi2}")

## Chi-square statistic: 5909.978728846863

print(f"P-value: {p}")

## P-value: 0.0

print(f"Degrees of freedom: {dof}")

## Degrees of freedom: 330

print("Expected frequencies:")

## Expected frequencies:

print(expected)

## [[2.65711532e+01 1.03047355e+01 6.79503555e+00 1.39346307e+01
##   7.09953006e+00 4.66357392e+00 1.00082540e+01 1.19233643e+01
##   1.04970478e+01 6.94728281e+00 1.87985299e+01 5.45686227e+00]
##  [5.99349319e-01 2.32437643e-01 1.53271478e-01 3.14314978e-01
##   1.60139776e-01 1.05193397e-01 2.25750090e-01 2.68948066e-01
##   2.36775515e-01 1.56705627e-01 4.24026991e-01 1.23087119e-01]
##  [1.01689601e+02 3.94369201e+01 2.60050609e+01 5.33287746e+01
##   2.71703820e+01 1.78478130e+01 3.83022653e+01 4.56315219e+01
##   4.01729124e+01 2.65877214e+01 7.19432462e+01 2.08837812e+01]
##  [1.03887215e+01 4.02891915e+00 2.65670563e+00 5.44812628e+00
##   2.77575612e+00 1.82335221e+00 3.91300157e+00 4.66176648e+00
##   4.10410893e+00 2.71623087e+00 7.34980118e+00 2.13351006e+00]
##  [4.32130859e+02 1.67587541e+02 1.10508736e+02 2.26621099e+02
##   1.15460778e+02 7.58444391e+01 1.62765815e+02 1.93911556e+02
##   1.70715146e+02 1.12984757e+02 3.05723461e+02 8.87458127e+01]
##  [1.95787444e+01 7.59296301e+00 5.00686830e+00 1.02676226e+01
##   5.23123268e+00 3.43631763e+00 7.37450295e+00 8.78563682e+00
##   7.73466683e+00 5.11905049e+00 1.38515484e+01 4.02084589e+00]
##  [1.64221713e+02 6.36879142e+01 4.19963851e+01 8.61223039e+01
##   4.38782986e+01 2.88229907e+01 6.18555248e+01 7.36917701e+01
##   6.48764911e+01 4.29373418e+01 1.16183396e+02 3.37258706e+01]
##  [1.11079407e+02 4.30784432e+01 2.84063140e+01 5.82530425e+01
##   2.96792385e+01 1.94958429e+01 4.18390167e+01 4.98450416e+01
##   4.38823955e+01 2.90427762e+01 7.85863357e+01 2.28121460e+01]
##  [5.39414387e+00 2.09193879e+00 1.37944331e+00 2.82883480e+00
##   1.44125798e+00 9.46740571e-01 2.03175081e+00 2.42053259e+00
##   2.13097964e+00 1.41035064e+00 3.81624292e+00 1.10778407e+00]
##  [1.99783106e-01 7.74792144e-02 5.10904928e-02 1.04771659e-01
##   5.33799253e-02 3.50644656e-02 7.52500301e-02 8.96493553e-02
##   7.89251717e-02 5.22352091e-02 1.41342330e-01 4.10290396e-02]
##  [6.37308109e+01 2.47158694e+01 1.62978672e+01 3.34221593e+01
##   1.70281962e+01 1.11855645e+01 2.40047596e+01 2.85981444e+01
##   2.51771298e+01 1.66630317e+01 4.50882034e+01 1.30882636e+01]
##  [1.95787444e+01 7.59296301e+00 5.00686830e+00 1.02676226e+01
##   5.23123268e+00 3.43631763e+00 7.37450295e+00 8.78563682e+00
##   7.73466683e+00 5.11905049e+00 1.38515484e+01 4.02084589e+00]
##  [2.39739728e+00 9.29750572e-01 6.13085914e-01 1.25725991e+00
##   6.40559104e-01 4.20773587e-01 9.03000361e-01 1.07579226e+00
##   9.47102060e-01 6.26822509e-01 1.69610796e+00 4.92348476e-01]
##  [1.91791782e+02 7.43800458e+01 4.90468731e+01 1.00580793e+02
##   5.12447283e+01 3.36618870e+01 7.22400289e+01 8.60633811e+01
##   7.57681648e+01 5.01458007e+01 1.35688637e+02 3.93878781e+01]
##  [1.99783106e-01 7.74792144e-02 5.10904928e-02 1.04771659e-01
##   5.33799253e-02 3.50644656e-02 7.52500301e-02 8.96493553e-02
##   7.89251717e-02 5.22352091e-02 1.41342330e-01 4.10290396e-02]
##  [2.38940595e+02 9.26651404e+01 6.11042294e+01 1.25306904e+02
##   6.38423906e+01 4.19371009e+01 8.99990360e+01 1.07220629e+02
##   9.43945054e+01 6.24733100e+01 1.69045427e+02 4.90707314e+01]
##  [4.31731293e+02 1.67432582e+02 1.10406555e+02 2.26411556e+02
##   1.15354019e+02 7.57743102e+01 1.62615315e+02 1.93732257e+02
##   1.70557296e+02 1.12880287e+02 3.05440776e+02 8.86637547e+01]
##  [2.65511748e+02 1.02969876e+02 6.78992650e+01 1.39241535e+02
##   7.09419207e+01 4.66006748e+01 1.00007290e+02 1.19143993e+02
##   1.04891553e+02 6.94205928e+01 1.87843957e+02 5.45275937e+01]
##  [6.71271237e+01 2.60330160e+01 1.71664056e+01 3.52032775e+01
##   1.79356549e+01 1.17816604e+01 2.52840101e+01 3.01221834e+01
##   2.65188577e+01 1.75510302e+01 4.74910230e+01 1.37857573e+01]
##  [2.42336908e+02 9.39822870e+01 6.19727678e+01 1.27088023e+02
##   6.47498494e+01 4.25331968e+01 9.12782865e+01 1.08744668e+02
##   9.57362333e+01 6.33613086e+01 1.71448247e+02 4.97682251e+01]
##  [8.25104229e+01 3.19989155e+01 2.11003735e+01 4.32706953e+01
##   2.20459091e+01 1.44816243e+01 3.10782624e+01 3.70251838e+01
##   3.25960959e+01 2.15731413e+01 5.83743825e+01 1.69449934e+01]
##  [3.45624774e+01 1.34039041e+01 8.83865526e+00 1.81254970e+01
##   9.23472708e+00 6.06615255e+00 1.30182552e+01 1.55093385e+01
##   1.36540547e+01 9.03669117e+00 2.44522232e+01 7.09802386e+00]
##  [1.19869864e+00 4.64875286e-01 3.06542957e-01 6.28629955e-01
##   3.20279552e-01 2.10386794e-01 4.51500181e-01 5.37896132e-01
##   4.73551030e-01 3.13411254e-01 8.48053982e-01 2.46174238e-01]
##  [4.77481624e+01 1.85175322e+01 1.22106278e+01 2.50404266e+01
##   1.27578021e+01 8.38040728e+00 1.79847572e+01 2.14261959e+01
##   1.88631160e+01 1.24842150e+01 3.37808170e+01 9.80594047e+00]
##  [3.99566213e-01 1.54958429e-01 1.02180986e-01 2.09543318e-01
##   1.06759851e-01 7.01289312e-02 1.50500060e-01 1.79298711e-01
##   1.57850343e-01 1.04470418e-01 2.82684661e-01 8.20580793e-02]
##  [1.19869864e+00 4.64875286e-01 3.06542957e-01 6.28629955e-01
##   3.20279552e-01 2.10386794e-01 4.51500181e-01 5.37896132e-01
##   4.73551030e-01 3.13411254e-01 8.48053982e-01 2.46174238e-01]
##  [2.64712616e+02 1.02659959e+02 6.76949030e+01 1.38822448e+02
##   7.07284010e+01 4.64604169e+01 9.97062899e+01 1.18785396e+02
##   1.04575853e+02 6.92116520e+01 1.87278588e+02 5.43634775e+01]
##  [2.85689842e+01 1.10795277e+01 7.30594047e+00 1.49823473e+01
##   7.63332932e+00 5.01421858e+00 1.07607543e+01 1.28198578e+01
##   1.12862996e+01 7.46963490e+00 2.02119532e+01 5.86715267e+00]
##  [2.52725630e+02 9.80112062e+01 6.46294734e+01 1.32536149e+02
##   6.75256055e+01 4.43565490e+01 9.51912881e+01 1.13406435e+02
##   9.98403422e+01 6.60775395e+01 1.78798048e+02 5.19017351e+01]
##  [1.64621280e+02 6.38428726e+01 4.20985661e+01 8.63318472e+01
##   4.39850584e+01 2.88931197e+01 6.20060248e+01 7.38710688e+01
##   6.50343415e+01 4.30418123e+01 1.16466080e+02 3.38079287e+01]
##  [4.25538017e+01 1.65030727e+01 1.08822750e+01 2.23163634e+01
##   1.13699241e+01 7.46873117e+00 1.60282564e+01 1.90953127e+01
##   1.68110616e+01 1.11260995e+01 3.01059164e+01 8.73918544e+00]]

Since the reported p-value is 0.0 (too small to be displayed), which is less than the significance level of 0.05, we reject the null hypothesis and conclude that there is a statistically significant association between genre and platform. The observed association is unlikely to have occurred by chance alone.
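
One caveat is worth checking: a common rule of thumb for the chi-square approximation is that most expected counts should be at least 5, and several rows of the expected-frequency output are well below that. The sketch below is only an illustration of the check, using the first two rows of that output rounded to two decimals; applying it to the full `expected` array would be the natural next step:

```python
# First two rows of the expected-frequency output above, rounded to 2 decimals.
expected_rows = [
    [26.57, 10.30, 6.80, 13.93, 7.10, 4.66, 10.01, 11.92, 10.50, 6.95, 18.80, 5.46],
    [0.60, 0.23, 0.15, 0.31, 0.16, 0.11, 0.23, 0.27, 0.24, 0.16, 0.42, 0.12],
]

# Flatten the rows and collect the cells that fall below the rule of thumb.
cells = [e for row in expected_rows for e in row]
small = [e for e in cells if e < 5]
share_small = len(small) / len(cells)

print(f"{len(small)} of {len(cells)} cells have an expected count below 5")
```

If a large share of cells falls below 5 in the full table, one option (an assumption on our part, not something the analysis above does) is to merge rarely used platforms into a single category before running the test.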

3.3.3 CONCLUSION

Our statistical analysis reveals a statistically significant association between game genre and platform, which suggests that the choice of platform matters when releasing a game. In other words, the mix of genres differs across platforms. Because the test is based on the number of games released rather than on sales, it shows which genres are common on each platform, not which ones sell best there, so platform decisions for game releases still need careful consideration.

4 CONCLUSION

After carefully examining our statistical analysis, the market research team presents the following strategic recommendations to the executives:

North America has the highest mean sales of the four regions and is therefore the most promising region for future investment.
We found insufficient evidence that games available on both PS3 and X360 achieve higher global sales on PS3, so expected sales alone do not favor PS3 over X360.
Additionally, our analysis reveals a statistically significant association between platform and game genre. This suggests that certain game genres are more common on specific platforms. Therefore, matching a game's genre to a suitable platform should be part of release planning.

DATA ANALYSIS: VIDEO GAME MARKET