Steam is one of the largest digital game distribution platforms in the world.
This dataset lists the 100 most popular games on Steam in July 2024, together with their active player counts and prices.
The goals of this analysis are:

- Examine the distribution of player counts
- Identify the games with the highest player counts
- Build data visualizations in R
Dataset source: Top 100 Steam Games July 2024 (Kaggle)
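| Variable | Data type | Scale | Description |
|---|---|---|---|
| Rank | Numeric | Ordinal | Position in the top-100 list |
| Title | Categorical | Nominal | Game title |
| Genre | Categorical | Nominal | Game genre |
| Current_Players | Numeric | Ratio | Active players at the time of collection |
| Current_Players2 | Numeric | Ratio | Peak players in the last 24 hours |
| AllTime_Peak | Numeric | Ratio | All-time peak player count |
| Price_USD | Numeric | Ratio | Price in US dollars |
| price_category | Categorical | Nominal | Derived: "Free to Play" if the price is 0, otherwise "Paid" |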
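The preparation code converts the raw columns with as.integer() and as.numeric(). If the CSV stores player counts as text with thousands separators (for example "1,015,721") or prices as text (for example "$19.99" or "Free"), those conversions return NA with a warning. Whether the Kaggle file is actually formatted this way is an assumption; the two helpers below, parse_count() and parse_price(), are hypothetical names and not part of the original analysis.

```r
# Hypothetical helper: turn player counts stored as text, e.g. "1,015,721",
# into integers by keeping only the digits. Assumes counts are whole numbers
# and that "," or "." only ever appear as thousands separators.
parse_count <- function(x) {
  digits_only <- gsub("[^0-9]", "", as.character(x))
  # A string with no digits at all becomes NA; NA input stays NA
  as.integer(ifelse(digits_only == "", NA, digits_only))
}

# Hypothetical helper: turn prices stored as text, e.g. "$19.99", into numbers.
# Assumes "." is the decimal separator and treats the word "Free" as 0.
parse_price <- function(x) {
  x <- trimws(as.character(x))
  out <- suppressWarnings(as.numeric(gsub("[^0-9.]", "", x)))
  out[!is.na(x) & tolower(x) == "free"] <- 0
  out
}

parse_count(c("1,015,721", "702487", NA))
parse_price(c("$19.99", "Free", "0"))
```

If the columns do turn out to be text, these would stand in for the plain conversions inside mutate(), e.g. `current_players = parse_count(Current_Players)`; if read.csv() already returns numeric columns, they are unnecessary.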
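# Packages used below: dplyr (%>%, mutate, count, arrange, slice),
# plotly (plot_ly, layout), ggplot2 (static plots), scales (comma labels)
library(dplyr)
library(plotly)
library(ggplot2)
library(scales)

data <- read.csv("TOP 100 STEAM JULY 2024.csv", encoding = "latin1")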
# Standardise column names and types, and derive the price category
data <- data %>%
  mutate(
    rank = as.integer(Rank),
    title = as.character(Title),
    genre = as.character(Genre),
    current_players = as.integer(Current_Players),
    current_players2 = as.integer(Current_Players2),
    all_time_peak = as.integer(AllTime_Peak),
    price_usd = as.numeric(Price_USD),
    price_category = ifelse(price_usd == 0, "Free to Play", "Paid")
  )

# Number and percentage of games per genre
freq_genre <- data %>%
  count(genre) %>%
  mutate(persen = round(n / sum(n) * 100, 1))
# Donut chart of the genre distribution
plot_ly(
  freq_genre,
  labels = ~genre,
  values = ~n,
  type = "pie",
  hole = 0.45,
  textinfo = "none",
  hovertemplate = "<b>%{label}</b><br>Number of games: %{value}<br>Share: %{percent}<extra></extra>",
  marker = list(
    colors = c(
      "#FF6B6B","#4ECDC4","#45B7D1","#96CEB4","#FFEAA7",
      "#DDA0DD","#98D8C8","#F7DC6F","#BB8FCE","#F0B27A",
      "#82E0AA","#F1948A","#7FB3D3","#A9DFBF","#FAD7A0","#AED6F1"
    ),
    line = list(color = "white", width = 2.5)
  )
) %>%
  layout(
    title = list(
      text = "<b>Genre Distribution: Top 100 Steam Games, July 2024</b>",
      font = list(size = 16, color = "#1a1a2e")
    ),
    legend = list(orientation = "v", x = 1.02, y = 0.5),
    annotations = list(
      list(
        text = "<b>100<br>Games</b>",
        x = 0.5, y = 0.5,
        font = list(size = 16, color = "#1a1a2e"),
        showarrow = FALSE
      )
    )
  )

Interpretation: Shooter is the most common genre in the Top 100 Steam Games for July 2024, with **18 games (18%)**, followed by **RPG (14%)** and **Strategy (13%)**. Together these three genres account for **45%** of the dataset, suggesting that action- and strategy-oriented games make up the largest share of Steam's most-played titles. Genres such as MMO/MOBA, Action/Adventure, and Other are represented by only one game each, so they have a much smaller presence in this top-100 list.
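The 45% figure is simply the sum of the three largest genre shares (18 + 14 + 13). A base-R sketch of that calculation follows; genre_shares() is a hypothetical helper, and the example vector is made up for illustration rather than taken from the Steam data.

```r
# Count games per genre, convert the counts to percentages (rounded to one
# decimal, like the persen column), and add up the three largest shares.
genre_shares <- function(genre) {
  counts <- sort(table(genre), decreasing = TRUE)
  pct <- round(100 * counts / sum(counts), 1)
  list(
    percent = pct,
    top3_share = sum(pct[seq_len(min(3, length(pct)))])
  )
}

# Illustrative input only (made-up genres, not the Steam data)
example <- c(rep("Shooter", 5), rep("RPG", 3), "Strategy", "Other")
genre_shares(example)$top3_share
```

On the real data this would be called as `genre_shares(data$genre)`. Because each share is rounded before summing, the total can differ by a few tenths from rounding the combined share directly.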
# Ten games with the most current players
top10 <- data %>%
  arrange(desc(current_players)) %>%
  slice(1:10)

# Horizontal bar chart; geom_col() plots the values as given
ggplot(top10, aes(x = reorder(title, current_players), y = current_players)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Steam Games by Current Players",
       x = "Game",
       y = "Number of players")

Interpretation: Counter-Strike 2 ranks first with **1,015,721 active players**, far ahead of every other game. **Dota 2** is second with 702,487 players, followed by Banana (409,758), a simple casual game that went viral. Notably, 7 of the top 10 games are Free to Play, which suggests that the free-to-play model is very effective at attracting large numbers of players on Steam.
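The "7 of 10" count can be checked directly instead of read off the chart. A minimal base-R sketch, assuming the prepared column names current_players and price_usd and treating a price of 0 as Free to Play; free_in_top() is a hypothetical helper and the rows below are invented for illustration.

```r
# Sort by current players (largest first), keep the top n rows, and count
# how many of them are free (price_usd == 0). Missing prices are ignored.
free_in_top <- function(df, n = 10) {
  top <- head(df[order(df$current_players, decreasing = TRUE), ], n)
  sum(top$price_usd == 0, na.rm = TRUE)
}

# Made-up rows for illustration only
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  current_players = c(500, 1200, 300, 900),
  price_usd = c(0, 0, 19.99, 29.99)
)
free_in_top(toy, n = 2)
```

On the prepared data the call would be `free_in_top(data, n = 10)`.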
# Histogram of 24-hour peak players with mean and median reference lines
ggplot(data, aes(x = current_players2)) +
  geom_histogram(bins = 20, fill = "#4E79A7", color = "white", alpha = 0.85) +
  # Single reference values go outside aes(), so each line is drawn once
  geom_vline(xintercept = mean(data$current_players2, na.rm = TRUE),
             color = "red", linetype = "dashed", linewidth = 1) +
  geom_vline(xintercept = median(data$current_players2, na.rm = TRUE),
             color = "green", linetype = "dashed", linewidth = 1) +
  scale_x_continuous(labels = comma) +
  labs(
    title = "Histogram: Distribution of Peak Players (24h)",
    subtitle = "Red line = mean | Green line = median",
    x = "Peak players (24h)", y = "Frequency"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 13, face = "bold"))

Interpretation: The distribution of 24-hour peak players is right-skewed, with a skewness of **4.99**: most games have relatively few players, while a handful have very large player counts (outliers). The **mean (85,991)** is far above the median (37,380), which is consistent with a strongly skewed, non-normal distribution. This points to a wide gap between the most popular games and the rest of the list.
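The skewness of 4.99 is reported without the code that produced it. One common definition is the moment coefficient g1 = m3 / m2^(3/2), where mk is the k-th central moment with divisor n; other variants apply small-sample corrections, so the exact value depends on the formula, and the source does not say which one gave 4.99. A minimal sketch of g1 in base R (skewness_g1() is a hypothetical name):

```r
# Moment coefficient of skewness g1 = m3 / m2^(3/2), computed with divisor n.
# Positive values indicate a long right tail; 0 indicates symmetry.
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)  # second central moment
  m3 <- mean(d^3)  # third central moment
  m3 / m2^1.5
}

skewness_g1(c(1, 2, 3, 4, 5))      # symmetric sample
skewness_g1(c(1, 1, 2, 2, 3, 20))  # one large value creates a right tail
```

Applied to the dataset, the call would be `skewness_g1(data$current_players2)`.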
# Density of 24-hour peak players, split by price category
ggplot(data, aes(x = current_players2, fill = price_category)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c("Free to Play" = "#43e97b", "Paid" = "#f7971e")) +
  scale_x_continuous(labels = comma) +
  labs(
    title = "Density Plot: Peak Players by Price Category",
    x = "Peak players (24h)", y = "Density",
    fill = "Price category"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 13, face = "bold"))

Interpretation: The density curves show a clear difference between **Free to Play** and **Paid** games. Free games have a **mean of 187,617** and a **median of 59,668** peak players, well above paid games (**mean 46,469**, **median 32,578**). The Free to Play curve stretches further to the right, meaning that some free games have very large player bases. This suggests that the **Free to Play model is associated with larger player counts**; because the data are a single snapshot, they cannot show whether free games also retain players for longer.
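The group means and medians quoted above are not computed in the plotting code. One way to obtain them in base R is to split the values by category and summarise each group; group_summary() is a hypothetical helper, and the numbers below are invented so that one large value pulls a group mean above its median, as happens with the Free to Play games.

```r
# Mean and median of a numeric vector within each group.
# split() drops observations whose group is NA; missing values are ignored.
group_summary <- function(values, groups) {
  t(sapply(split(values, groups), function(v) {
    c(mean = mean(v, na.rm = TRUE), median = median(v, na.rm = TRUE))
  }))
}

# Made-up values for illustration only
toy_players <- c(100, 300, 2000, 50, 70, 90)
toy_price <- c("Free to Play", "Free to Play", "Free to Play", "Paid", "Paid", "Paid")
group_summary(toy_players, toy_price)
```

On the real data: `group_summary(data$current_players2, data$price_category)`.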
# Price distribution per genre, genres ordered by median price
ggplot(data, aes(x = reorder(genre, price_usd, FUN = median, na.rm = TRUE),
                 y = price_usd, fill = genre)) +
  geom_boxplot(alpha = 0.7, outlier.color = "red",
               outlier.shape = 16, outlier.size = 2) +
  coord_flip() +
  # RColorBrewer's "Paired" palette has only 12 colours; the default hue
  # scale covers any number of genres
  scale_fill_hue() +
  labs(
    title = "Boxplot: Game Price (USD) by Genre",
    x = "Genre", y = "Price (USD)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 13, face = "bold"),
    legend.position = "none"
  )

Interpretation: Sports/Racing has the highest median price at **$57.49**, followed by **Strategy ($39.99)** and **RPG ($39.99)**. In contrast, **Battle Royale, MMO/MOBA, Social/Casual, and Utility/Tool** all have a median price of $0 because they are dominated by Free to Play games. Sports/Racing also has the widest price range ($0 to $69.99), showing large price variation within that genre. The red points mark outliers: in several genres, one or two games are priced far above the genre median.
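The red points follow the usual boxplot rule: geom_boxplot() draws as separate points any value lying more than 1.5 times the interquartile range (IQR) beyond the first or third quartile (its default coef = 1.5). A minimal sketch of that rule, assuming R's default quantile() definition for the quartiles; tukey_outliers() is a hypothetical name.

```r
# Flag values outside the Tukey fences [Q1 - k * IQR, Q3 + k * IQR].
# Quartiles use quantile()'s default (type 7); NA inputs give NA flags.
tukey_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# A mostly free genre with one paid game: Q1 = Q3 = 0, so IQR = 0
prices <- c(0, 0, 0, 0, 59.99)
prices[tukey_outliers(prices)]
```

The example shows why a single paid game stands out as an outlier in genres whose median price is $0.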
# Mode: the most frequent value. If no value repeats, every count is 1 and
# which.max() returns the first one, so the result is then simply the first
# non-missing value of x.
mode_function <- function(x) {
  ux <- unique(na.omit(x))
  ux[which.max(tabulate(match(x, ux)))]
}

# The eight summary statistics for one numeric vector, rounded to 2 decimals
summarise_var <- function(x) {
  round(c(
    mean(x, na.rm = TRUE),
    median(x, na.rm = TRUE),
    mode_function(x),
    quantile(x, 0.25, na.rm = TRUE, names = FALSE),
    quantile(x, 0.75, na.rm = TRUE, names = FALSE),
    diff(range(x, na.rm = TRUE)),
    var(x, na.rm = TRUE),
    sd(x, na.rm = TRUE)
  ), 2)
}

tabel_stats <- data.frame(
  Ukuran = c("Mean", "Median", "Mode", "Q1 (first quartile)", "Q3 (third quartile)",
             "Range", "Variance", "Standard deviation"),
  Current_Players  = summarise_var(data$current_players),
  Peak_Players_24h = summarise_var(data$current_players2),
  AllTime_Peak     = summarise_var(data$all_time_peak),
  Price_USD        = summarise_var(data$price_usd)
)
knitr::kable(tabel_stats,
  col.names = c("Statistic", "Current Players",
                "Peak Players (24h)", "All-Time Peak", "Price (USD)"),
  align = c("l","r","r","r","r"),
  format.args = list(big.mark = ",")
)

| Statistic | Current Players | Peak Players (24h) | All-Time Peak | Price (USD) |
|---|---|---|---|---|
| Mean | 7.057893e+04 | 8.599056e+04 | 2.762827e+05 | 23.54 |
| Median | 3.067700e+04 | 3.737950e+04 | 1.042010e+05 | 19.99 |
| Mode | 1.015721e+06 | 1.276702e+06 | 1.818773e+06 | 0.00 |
| Q1 (first quartile) | 2.038625e+04 | 2.294275e+04 | 6.986800e+04 | 0.00 |
| Q3 (third quartile) | 6.255050e+04 | 6.582150e+04 | 2.482633e+05 | 39.99 |
| Range | 1.000112e+06 | 1.260827e+06 | 3.235328e+06 | 69.99 |
| Variance | 1.785016e+10 | 2.858351e+10 | 2.213774e+11 | 453.31 |
| Standard deviation | 1.336045e+05 | 1.690666e+05 | 4.705076e+05 | 21.29 |
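Interpretation: The mean number of current players is 70,579, with a standard deviation of 133,604, indicating very large variability across games. For every player variable the mean is well above the median, confirming a right-skewed distribution dominated by a few very popular games. The mode of price_usd is $0 (Free to Play), so $0 is the most common single price, and Q1 = $0 means that at least 25% of the games in the dataset are Free to Play. The Mode row for the three player variables should not be read as a typical value: most likely no player count repeats exactly, in which case mode_function returns the first value in the data (1,015,721 is Counter-Strike 2's current player count).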
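mode_function() as written returns the first value of its input whenever no value occurs more than once, because which.max() picks the first of several equal counts. A minimal variant that returns NA in that case is sketched below; mode_or_na() is a hypothetical name, and ties between several equally frequent values are still broken by first appearance.

```r
# Mode that returns NA when no value repeats, instead of silently
# returning the first element. NA values are dropped before counting.
mode_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  if (max(counts) < 2) return(NA)  # every value occurs once: no mode
  ux[which.max(counts)]
}

mode_or_na(c(5, 3, 9))             # all values unique
mode_or_na(c(0, 19.99, 0, 9.99))   # 0 occurs twice
```

With this version the Mode row would show NA for variables without repeated values, while the $0 mode for price_usd would be unaffected.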
Based on the analysis of the Top 100 Steam July 2024 dataset, the following conclusions can be drawn:
This analysis shows how data visualization can help reveal patterns in game popularity on the Steam platform.