data <- read.csv("C:\\Users\\gajaw\\OneDrive\\Desktop\\STATS\\vgsales.csv")

Summary Of Global Sales Column

col1_summary <- summary(data$Global_Sales)

print(col1_summary)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.0600  0.1700  0.5374  0.4700 82.7400

Insight:
Global sales range from 0.01 million to 82.74 million, with a median of only 0.17 million. The mean (0.54 million) is higher than the third quartile (0.47 million), which indicates a strongly right-skewed distribution: a small percentage of games have far higher global sales than the bulk.

Significance: The wide gap between the median and the maximum suggests that a few blockbuster games account for a disproportionate share of total sales. Identifying the games in this group will make it easier to focus on the features or genres that sell well.

Further Question: What traits do these blockbuster games share (genre, publisher, platform, etc.) and how are they different from titles that have sold less?
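
One way to put a number on this concentration is to compute the share of total sales contributed by the best-selling fraction of games. The sketch below is illustrative only: `top_share` is a hypothetical helper, the toy vector is made up (it is not the dataset), and the 10% cutoff is an arbitrary choice.

```r
# Share of total sales that comes from the top `top_frac` of titles.
# `top_share` is a hypothetical helper, not part of the original analysis.
top_share <- function(sales, top_frac = 0.01) {
  sales <- sort(sales[!is.na(sales)], decreasing = TRUE)
  n_top <- max(1, ceiling(length(sales) * top_frac))  # always keep at least one title
  sum(sales[seq_len(n_top)]) / sum(sales)
}

# Toy values for illustration only (millions of units).
toy_sales <- c(40.00, 0.47, 0.17, 0.17, 0.06, 0.06, 0.01, 0.01, 0.01, 0.01)
top_10pct_share <- top_share(toy_sales, top_frac = 0.10)
print(top_10pct_share)
```

On the real data the equivalent call would be `top_share(data$Global_Sales, 0.01)` for the top 1% of titles.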

Summary Of Other Sales Column

col2_summary <- summary(data$Other_Sales)

print(col2_summary)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.00000  0.00000  0.01000  0.04806  0.04000 10.57000

Insight: Other_Sales values range from 0.00 to 10.57 million, with a median of 0.01 million and a third quartile of 0.04 million. With very few exceptions, games have relatively modest sales outside of the primary markets (NA, EU, and JP).

Significance: This suggests that the primary markets account for the great bulk of game sales: the mean of Other_Sales (0.048 million) is roughly 9% of the mean of Global_Sales (0.537 million). To enter these smaller markets more successfully, businesses might need tailored strategies.

Further Question: Which games have sold unusually well in these other regions, and what special qualities or tactics helped them succeed there?
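
A possible starting point for this question is to rank titles by the fraction of their global sales made in the other regions. The sketch below uses a made-up data frame whose column names mirror the dataset; `min_global` is a hypothetical cutoff, added so that very small titles do not dominate the ranking.

```r
# Toy data for illustration only; column names follow the dataset.
toy <- data.frame(
  Name         = c("Game A", "Game B", "Game C", "Game D"),
  Global_Sales = c(10.0, 2.0, 0.5, 0.02),
  Other_Sales  = c(1.0, 0.6, 0.0, 0.01)
)

min_global <- 0.1  # hypothetical cutoff, in millions
eligible <- toy[toy$Global_Sales >= min_global, ]

# Fraction of each title's global sales made outside NA, EU and JP.
eligible$Other_Share <- eligible$Other_Sales / eligible$Global_Sales
ranked <- eligible[order(eligible$Other_Share, decreasing = TRUE), ]
print(ranked)
```

Titles near the top of `ranked` are the ones whose genre, publisher, and platform would be worth comparing against the rest.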

Summary of a Categorical Column

# Categorical summary for column 'Genre': number of games per genre.
# table() already pairs each genre with its count; combining it with unique()
# misaligns the labels, because unique() returns genres in order of appearance.
col3_val_count <- as.data.frame(table(data$Genre), responseName = "Count")
names(col3_val_count)[1] <- "Genre"
print(col3_val_count)
##           Genre Count
## 1        Action  3316
## 2     Adventure  1286
## 3      Fighting   848
## 4          Misc  1739
## 5      Platform   886
## 6        Puzzle   582
## 7        Racing  1249
## 8  Role-Playing  1488
## 9       Shooter  1310
## 10   Simulation   867
## 11       Sports  2346
## 12     Strategy   681

Insight: The Action genre has the most games (3316), followed by Sports (2346), so Action is the most frequently released genre in the dataset.

Significance: Robust and steady market demand could explain why so many Action titles are released. Understanding the reasons behind this volume could be useful in creating new games or improving existing ones to appeal to this market.

Further Question: Does the larger number of Action games correspond to higher worldwide sales for this genre, or does release volume not translate into sales success?

Novel Questions to Investigate

Aggregate function for question 1 -

# Total (summed) global sales per genre, in millions
total_global_sales <- aggregate(Global_Sales ~ Genre, data = data, FUN = sum)
print(total_global_sales)
##           Genre Global_Sales
## 1        Action      1751.18
## 2     Adventure       239.04
## 3      Fighting       448.91
## 4          Misc       809.96
## 5      Platform       831.37
## 6        Puzzle       244.95
## 7        Racing       732.04
## 8  Role-Playing       927.37
## 9       Shooter      1037.37
## 10   Simulation       392.20
## 11       Sports      1330.93
## 12     Strategy       175.12
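
Totals favour genres with many releases, so it may help to divide each total by the number of games in that genre. The sketch below copies the totals from the output above and the per-genre counts from the categorical summary; `genre_stats` and `Sales_Per_Game` are names introduced here for illustration.

```r
# Totals and counts copied from the two genre tables in this report.
genre_stats <- data.frame(
  Genre = c("Action", "Adventure", "Fighting", "Misc", "Platform", "Puzzle",
            "Racing", "Role-Playing", "Shooter", "Simulation", "Sports", "Strategy"),
  Total_Sales = c(1751.18, 239.04, 448.91, 809.96, 831.37, 244.95,
                  732.04, 927.37, 1037.37, 392.20, 1330.93, 175.12),
  Games = c(3316, 1286, 848, 1739, 886, 582, 1249, 1488, 1310, 867, 2346, 681)
)

# Average global sales per title (millions), sorted from highest to lowest.
genre_stats$Sales_Per_Game <- round(genre_stats$Total_Sales / genre_stats$Games, 3)
genre_stats <- genre_stats[order(genre_stats$Sales_Per_Game, decreasing = TRUE), ]
print(genre_stats, row.names = FALSE)
```

On these figures Platform and Shooter come out ahead of Action per title, even though Action has the largest total. With the full data, `aggregate(Global_Sales ~ Genre, data = data, FUN = mean)` should give the same per-game averages directly, assuming no missing values.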

Insights:

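Top-Performing Genres Globally: Action (1751.18 million), Sports (1330.93 million), and Shooter (1037.37 million) each exceed 1,000 million in total sales.

Moderate-Performing Genres: Role-Playing (927.37 million), Platform (831.37 million), Misc (809.96 million), and Racing (732.04 million) fall between 700 and 1,000 million.

Lower-Performing Genres: Fighting (448.91 million), Simulation (392.20 million), Puzzle (244.95 million), Adventure (239.04 million), and Strategy (175.12 million) each total less than 500 million.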

Significance:

Further Questions:

Visual Summaries

Box plot visualization between 2 columns - Global sales & Genre

library(ggplot2)

ggplot(data, aes(x = Genre, y = Global_Sales)) +
  geom_boxplot(fill = "skyblue", color = "black") +
  labs(title = "Distribution of global sales by genre", x = "Genre", y = "Global sales in millions") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Insights:

Significance :

Further Questions:

Scatter Plot for correlation between columns - Global sales and other sales by Genre

ggplot(data, aes(x = Global_Sales, y = Other_Sales, color = Genre)) +
  geom_point(alpha = 0.7) +
  labs(title = "Correlation between global sales and other sales by Genre", x = "Global sales (in millions)", y = "Other sales (in millions)") +
  theme_minimal()

Insight:
The scatter plot indicates a positive association between global sales and Other_Sales (sales outside NA, EU, and JP), with the majority of titles selling poorly on both measures. Certain genres, such as “Action,” “Shooter,” and “Platform,” have outliers with unusually high sales, suggesting that individual titles in these genres reached a much wider audience.

Significance: Games that sell well globally typically also sell better in the other regions. Nonetheless, the majority of games don’t sell well anywhere.
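
If Global_Sales is the sum of the regional columns, Other_Sales included (an assumption here, not something the output above confirms), part of this association is built in. A rough check is to correlate Other_Sales with the remainder of global sales, using a rank correlation so that a few extreme titles do not dominate. The data frame below is made up, and `Rest_Sales` is a name introduced for this sketch.

```r
# Toy data for illustration only (millions of units).
toy <- data.frame(
  Global_Sales = c(40.00, 12.00, 5.00, 1.50, 0.47, 0.17, 0.06, 0.01),
  Other_Sales  = c(4.00, 1.50, 0.30, 0.20, 0.04, 0.01, 0.00, 0.00)
)

# Sales in the primary markets only, assuming Global = primary markets + Other.
toy$Rest_Sales <- toy$Global_Sales - toy$Other_Sales

# Spearman correlation is rank-based, so it is less sensitive to outliers.
rho_global <- cor(toy$Global_Sales, toy$Other_Sales, method = "spearman")
rho_rest   <- cor(toy$Rest_Sales, toy$Other_Sales, method = "spearman")
print(c(global = rho_global, rest = rho_rest))
```

If `rho_rest` stays high on the real data, the relationship is not just an artifact of Other_Sales being counted inside the global figure.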

Further Questions:

ggplot(data, aes(x = Year, y = Global_Sales, color = Genre, group = Genre)) +
  geom_line() +
  labs(title = "Trend of global sales over time by Genre", x = "Year", y = "Global sales in millions") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Insights:
Sales Fluctuate: Sales patterns vary greatly between genres, with some exhibiting sharp spikes; the clearest peak is in the “Sports” category. Because the plot draws individual game sales rather than yearly totals, these spikes reflect single best-selling titles.
Consistently Poor Sales: Some genres have had consistently low sales over time, which may indicate limited popularity or little growth.

Significance:
By identifying popular genres over time, this data can assist publishers and game developers in scheduling upcoming releases.
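
To compare genres year by year rather than title by title, one option is to total sales per year and genre before plotting. The sketch below uses made-up rows and assumes that `Year` may be read as text with non-numeric placeholders such as "N/A"; `yearly_sales` is a name introduced here.

```r
# Toy data for illustration only; column names follow the dataset.
toy <- data.frame(
  Year         = c("2006", "2006", "2006", "2007", "2007", "N/A"),
  Genre        = c("Sports", "Sports", "Action", "Sports", "Action", "Action"),
  Global_Sales = c(5.0, 1.0, 2.0, 3.0, 4.0, 0.5)
)

# Convert Year to integer; entries that are not numbers become NA.
toy$Year <- suppressWarnings(as.integer(as.character(toy$Year)))

# Sum sales per year and genre. The formula interface of aggregate()
# drops rows with NA in any of the variables by default.
yearly_sales <- aggregate(Global_Sales ~ Year + Genre, data = toy, FUN = sum)
print(yearly_sales)
```

Passing a table like `yearly_sales` to the same `ggplot()` call would draw one line per genre from yearly totals, which may change which genre appears to peak.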

Further Questions:

ggplot(data, aes(x = Genre, y = Global_Sales, fill = Publisher)) +
  geom_bar(stat = "identity") +
  labs(title = "Sales distribution by Genre and Publisher", x = "Genre", y = "Total global sales in millions")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Insights:
Diverse Publishers for Every Genre: Some genres have numerous publishers contributing to sales, which suggests stronger competition, while others have only a small number.
Sales Dominance: Publishers that account for most of the sales in a particular genre hold a strong market position in it.

Significance:
This can help identify which publishers are successful in specific genres and direct new publishers toward less competitive genres.
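
Dominance is hard to read from a stacked bar with many publishers, so a numeric summary may help: the share of each genre's sales held by its leading publisher. The sketch below uses made-up rows; `by_pub`, `Share`, and `leaders` are names introduced for illustration.

```r
# Toy data for illustration only; column names follow the dataset.
toy <- data.frame(
  Genre        = c("Sports", "Sports", "Sports", "Puzzle", "Puzzle"),
  Publisher    = c("Pub A", "Pub A", "Pub B", "Pub B", "Pub C"),
  Global_Sales = c(6.0, 2.0, 2.0, 1.0, 3.0)
)

# Total sales per genre and publisher.
by_pub <- aggregate(Global_Sales ~ Genre + Publisher, data = toy, FUN = sum)

# Each publisher's share of its genre's total sales.
genre_total <- ave(by_pub$Global_Sales, by_pub$Genre, FUN = sum)
by_pub$Share <- by_pub$Global_Sales / genre_total

# Keep the publisher with the largest share in each genre.
by_pub <- by_pub[order(by_pub$Genre, -by_pub$Share), ]
leaders <- by_pub[!duplicated(by_pub$Genre), ]
print(leaders, row.names = FALSE)
```

A genre whose leader holds a large `Share` is more concentrated; genres with a low leading share are spread across more publishers.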

Further Questions: