| Nabila Chesaria Octavia Putri | 5052241006 |
| Amelia Widiastuti | 5052241007 |
| Agata Corinna Aulia Widyawati | 5052241036 |
The Billionaire dataset is a collection of information about the richest individuals in the world. It covers name, country of origin, total wealth, age, industry, global rank, and other relevant attributes. Analyzing it can yield insights such as which factors lie behind someone becoming a billionaire. These factors may stem from country of origin, industry sector, or possibly inherited wealth.
To find out what this data can answer, we formulated 3 main questions. The questions are as follows:
Before moving on to visualization, we first process the data until it is clean and ready to be analyzed.
Before the analysis stage, we pre-process the data to clean the raw records and put them into a format that is ready for further analysis.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(ggcorrplot)
billion = read.csv("FIX_BILLIONAIRE.csv")
glimpse(billion)
## Rows: 2,640
## Columns: 35
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, …
## $ finalWorth <int> 211000, 180000, 114000, 107…
## $ category <chr> "Fashion & Retail", "Automo…
## $ personName <chr> "Bernard Arnault & family",…
## $ age <int> 74, 51, 59, 78, 92, 67, 81,…
## $ country <chr> "France", "United States", …
## $ city <chr> "Paris", "Austin", "Medina"…
## $ source <chr> "LVMH", "Tesla, SpaceX", "A…
## $ industries <chr> "Fashion & Retail", "Automo…
## $ countryOfCitizenship <chr> "France", "United States", …
## $ organization <chr> "LVMH Moët Hennessy Louis V…
## $ selfMade <lgl> FALSE, TRUE, TRUE, TRUE, TR…
## $ status <chr> "U", "D", "D", "U", "D", "D…
## $ gender <chr> "M", "M", "M", "M", "M", "M…
## $ birthDate <chr> "3/5/1949 0:00", "6/28/1971…
## $ lastName <chr> "Arnault", "Musk", "Bezos",…
## $ firstName <chr> "Bernard", "Elon", "Jeff", …
## $ title <chr> "Chairman and CEO", "CEO", …
## $ date <chr> "4/4/2023 5:01", "4/4/2023 …
## $ state <chr> "", "Texas", "Washington", …
## $ residenceStateRegion <chr> "", "South", "West", "West"…
## $ birthYear <int> 1949, 1971, 1964, 1944, 193…
## $ birthMonth <int> 3, 6, 1, 8, 8, 10, 2, 1, 4,…
## $ birthDay <int> 5, 28, 12, 17, 30, 28, 14, …
## $ cpi_country <dbl> 110.05, 117.24, 117.24, 117…
## $ cpi_change_country <dbl> 1.1, 7.5, 7.5, 7.5, 7.5, 7.…
## $ gdp_country <chr> "$2,715,518,274,227 ", "$21…
## $ gross_tertiary_education_enrollment <dbl> 65.6, 88.2, 88.2, 88.2, 88.…
## $ gross_primary_education_enrollment_country <dbl> 102.5, 101.8, 101.8, 101.8,…
## $ life_expectancy_country <dbl> 82.5, 78.5, 78.5, 78.5, 78.…
## $ tax_revenue_country_country <dbl> 24.2, 9.6, 9.6, 9.6, 9.6, 9…
## $ total_tax_rate_country <dbl> 60.7, 36.6, 36.6, 36.6, 36.…
## $ population_country <int> 67059887, 328239523, 328239…
## $ latitude_country <dbl> 46.22764, 37.09024, 37.0902…
## $ longitude_country <dbl> 2.213749, -95.712891, -95.7…
Some values are empty strings ("") and should be treated as NA.
In addition, gdp_country and date are stored as character, although gdp_country should be numeric and date should be a date.
library(readr)
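billion <- billion %>%
  # raw values look like "4/4/2023 5:01"; birthDate values such as "6/28/1971" indicate month/day/year order
  mutate(date = as.Date(date, format = "%m/%d/%Y"))
billion <- billion %>%
  mutate(gdp_country = as.character(gdp_country), # make sure the column is character
         gdp_country = parse_number(gdp_country)) # strip the "$" sign and the commas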
glimpse(billion)
## Rows: 2,640
## Columns: 35
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, …
## $ finalWorth <int> 211000, 180000, 114000, 107…
## $ category <chr> "Fashion & Retail", "Automo…
## $ personName <chr> "Bernard Arnault & family",…
## $ age <int> 74, 51, 59, 78, 92, 67, 81,…
## $ country <chr> "France", "United States", …
## $ city <chr> "Paris", "Austin", "Medina"…
## $ source <chr> "LVMH", "Tesla, SpaceX", "A…
## $ industries <chr> "Fashion & Retail", "Automo…
## $ countryOfCitizenship <chr> "France", "United States", …
## $ organization <chr> "LVMH Moët Hennessy Louis V…
## $ selfMade <lgl> FALSE, TRUE, TRUE, TRUE, TR…
## $ status <chr> "U", "D", "D", "U", "D", "D…
## $ gender <chr> "M", "M", "M", "M", "M", "M…
## $ birthDate <chr> "3/5/1949 0:00", "6/28/1971…
## $ lastName <chr> "Arnault", "Musk", "Bezos",…
## $ firstName <chr> "Bernard", "Elon", "Jeff", …
## $ title <chr> "Chairman and CEO", "CEO", …
## $ date <date> 2023-04-04, 2023-04-04, 20…
## $ state <chr> "", "Texas", "Washington", …
## $ residenceStateRegion <chr> "", "South", "West", "West"…
## $ birthYear <int> 1949, 1971, 1964, 1944, 193…
## $ birthMonth <int> 3, 6, 1, 8, 8, 10, 2, 1, 4,…
## $ birthDay <int> 5, 28, 12, 17, 30, 28, 14, …
## $ cpi_country <dbl> 110.05, 117.24, 117.24, 117…
## $ cpi_change_country <dbl> 1.1, 7.5, 7.5, 7.5, 7.5, 7.…
## $ gdp_country <dbl> 2.715518e+12, 2.142770e+13,…
## $ gross_tertiary_education_enrollment <dbl> 65.6, 88.2, 88.2, 88.2, 88.…
## $ gross_primary_education_enrollment_country <dbl> 102.5, 101.8, 101.8, 101.8,…
## $ life_expectancy_country <dbl> 82.5, 78.5, 78.5, 78.5, 78.…
## $ tax_revenue_country_country <dbl> 24.2, 9.6, 9.6, 9.6, 9.6, 9…
## $ total_tax_rate_country <dbl> 60.7, 36.6, 36.6, 36.6, 36.…
## $ population_country <int> 67059887, 328239523, 328239…
## $ latitude_country <dbl> 46.22764, 37.09024, 37.0902…
## $ longitude_country <dbl> 2.213749, -95.712891, -95.7…
The columns now have the correct types, so the data can be cleaned further.
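The two conversions can be checked on literal values copied from the `glimpse()` output above. This is only a minimal sketch: the variable names are illustrative and not part of the original analysis. It also suggests that the raw dates are month-first, because a value such as `6/28/1971` cannot be read as day/month/year.

```r
library(readr)

# Literal values copied from the glimpse() output; the names are illustrative
gdp_raw   <- "$2,715,518,274,227 "
birth_raw <- "6/28/1971"

# parse_number() drops the currency symbol and the grouping commas
gdp_num <- parse_number(gdp_raw)

# Month-first parsing succeeds; day-first parsing gives NA for this value
birth_mdy <- as.Date(birth_raw, format = "%m/%d/%Y")
birth_dmy <- as.Date(birth_raw, format = "%d/%m/%Y")

print(gdp_num)
print(birth_mdy)
print(is.na(birth_dmy))
```

A silent NA from `as.Date()` is easy to miss, so counting `sum(is.na(billion$date))` after the conversion is a cheap sanity check.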
billion %>% duplicated() %>% sum()
## [1] 0
billion %>% filter(duplicated(.)) #show duplicated
## [1] rank
## [2] finalWorth
## [3] category
## [4] personName
## [5] age
## [6] country
## [7] city
## [8] source
## [9] industries
## [10] countryOfCitizenship
## [11] organization
## [12] selfMade
## [13] status
## [14] gender
## [15] birthDate
## [16] lastName
## [17] firstName
## [18] title
## [19] date
## [20] state
## [21] residenceStateRegion
## [22] birthYear
## [23] birthMonth
## [24] birthDay
## [25] cpi_country
## [26] cpi_change_country
## [27] gdp_country
## [28] gross_tertiary_education_enrollment
## [29] gross_primary_education_enrollment_country
## [30] life_expectancy_country
## [31] tax_revenue_country_country
## [32] total_tax_rate_country
## [33] population_country
## [34] latitude_country
## [35] longitude_country
## <0 rows> (or 0-length row.names)
billion <- billion %>%
mutate(across(where(is.character), ~na_if(., "")))
colSums(is.na(billion))
## rank
## 0
## finalWorth
## 0
## category
## 0
## personName
## 0
## age
## 65
## country
## 38
## city
## 72
## source
## 0
## industries
## 0
## countryOfCitizenship
## 0
## organization
## 2315
## selfMade
## 0
## status
## 0
## gender
## 0
## birthDate
## 76
## lastName
## 0
## firstName
## 3
## title
## 2301
## date
## 0
## state
## 1887
## residenceStateRegion
## 1893
## birthYear
## 76
## birthMonth
## 76
## birthDay
## 76
## cpi_country
## 184
## cpi_change_country
## 184
## gdp_country
## 164
## gross_tertiary_education_enrollment
## 182
## gross_primary_education_enrollment_country
## 181
## life_expectancy_country
## 182
## tax_revenue_country_country
## 183
## total_tax_rate_country
## 182
## population_country
## 164
## latitude_country
## 164
## longitude_country
## 164
na_cols <- billion %>%
select(where(is.character)) %>%
names()
billion <- billion %>%
mutate(across(all_of(na_cols), ~replace_na(., "Unknown")))
glimpse(billion)
## Rows: 2,640
## Columns: 35
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, …
## $ finalWorth <int> 211000, 180000, 114000, 107…
## $ category <chr> "Fashion & Retail", "Automo…
## $ personName <chr> "Bernard Arnault & family",…
## $ age <int> 74, 51, 59, 78, 92, 67, 81,…
## $ country <chr> "France", "United States", …
## $ city <chr> "Paris", "Austin", "Medina"…
## $ source <chr> "LVMH", "Tesla, SpaceX", "A…
## $ industries <chr> "Fashion & Retail", "Automo…
## $ countryOfCitizenship <chr> "France", "United States", …
## $ organization <chr> "LVMH Moët Hennessy Louis V…
## $ selfMade <lgl> FALSE, TRUE, TRUE, TRUE, TR…
## $ status <chr> "U", "D", "D", "U", "D", "D…
## $ gender <chr> "M", "M", "M", "M", "M", "M…
## $ birthDate <chr> "3/5/1949 0:00", "6/28/1971…
## $ lastName <chr> "Arnault", "Musk", "Bezos",…
## $ firstName <chr> "Bernard", "Elon", "Jeff", …
## $ title <chr> "Chairman and CEO", "CEO", …
## $ date <date> 2023-04-04, 2023-04-04, 20…
## $ state <chr> "Unknown", "Texas", "Washin…
## $ residenceStateRegion <chr> "Unknown", "South", "West",…
## $ birthYear <int> 1949, 1971, 1964, 1944, 193…
## $ birthMonth <int> 3, 6, 1, 8, 8, 10, 2, 1, 4,…
## $ birthDay <int> 5, 28, 12, 17, 30, 28, 14, …
## $ cpi_country <dbl> 110.05, 117.24, 117.24, 117…
## $ cpi_change_country <dbl> 1.1, 7.5, 7.5, 7.5, 7.5, 7.…
## $ gdp_country <dbl> 2.715518e+12, 2.142770e+13,…
## $ gross_tertiary_education_enrollment <dbl> 65.6, 88.2, 88.2, 88.2, 88.…
## $ gross_primary_education_enrollment_country <dbl> 102.5, 101.8, 101.8, 101.8,…
## $ life_expectancy_country <dbl> 82.5, 78.5, 78.5, 78.5, 78.…
## $ tax_revenue_country_country <dbl> 24.2, 9.6, 9.6, 9.6, 9.6, 9…
## $ total_tax_rate_country <dbl> 60.7, 36.6, 36.6, 36.6, 36.…
## $ population_country <int> 67059887, 328239523, 328239…
## $ latitude_country <dbl> 46.22764, 37.09024, 37.0902…
## $ longitude_country <dbl> 2.213749, -95.712891, -95.7…
bill_clean <- billion %>%
distinct() %>%
drop_na(where(is.numeric))
summary(bill_clean)
## rank finalWorth category personName
## Min. : 1 Min. : 1000 Length:2397 Length:2397
## 1st Qu.: 636 1st Qu.: 1500 Class :character Class :character
## Median :1272 Median : 2400 Mode :character Mode :character
## Mean :1276 Mean : 4759
## 3rd Qu.:1905 3rd Qu.: 4300
## Max. :2540 Max. :211000
## age country city source
## Min. : 18.00 Length:2397 Length:2397 Length:2397
## 1st Qu.: 56.00 Class :character Class :character Class :character
## Median : 65.00 Mode :character Mode :character Mode :character
## Mean : 64.96
## 3rd Qu.: 74.00
## Max. :101.00
## industries countryOfCitizenship organization selfMade
## Length:2397 Length:2397 Length:2397 Mode :logical
## Class :character Class :character Class :character FALSE:713
## Mode :character Mode :character Mode :character TRUE :1684
##
##
##
## status gender birthDate lastName
## Length:2397 Length:2397 Length:2397 Length:2397
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## firstName title date state
## Length:2397 Length:2397 Min. :2023-04-04 Length:2397
## Class :character Class :character 1st Qu.:2023-04-04 Class :character
## Mode :character Mode :character Median :2023-04-04 Mode :character
## Mean :2023-04-04
## 3rd Qu.:2023-04-04
## Max. :2023-04-04
## residenceStateRegion birthYear birthMonth birthDay
## Length:2397 Min. :1921 Min. : 1.000 Min. : 1.00
## Class :character 1st Qu.:1948 1st Qu.: 2.000 1st Qu.: 1.00
## Mode :character Median :1958 Median : 6.000 Median :11.00
## Mean :1957 Mean : 5.757 Mean :12.28
## 3rd Qu.:1967 3rd Qu.: 9.000 3rd Qu.:21.00
## Max. :2004 Max. :12.000 Max. :31.00
## cpi_country cpi_change_country gdp_country
## Min. : 99.55 Min. :-1.900 Min. :1.367e+10
## 1st Qu.:117.24 1st Qu.: 1.700 1st Qu.:1.736e+12
## Median :117.24 Median : 2.900 Median :1.991e+13
## Mean :127.90 Mean : 4.401 Mean :1.171e+13
## 3rd Qu.:125.08 3rd Qu.: 7.500 3rd Qu.:2.143e+13
## Max. :288.57 Max. :53.500 Max. :2.143e+13
## gross_tertiary_education_enrollment gross_primary_education_enrollment_country
## Min. : 4.00 Min. : 84.7
## 1st Qu.: 50.60 1st Qu.:100.2
## Median : 67.00 Median :101.8
## Mean : 67.47 Mean :102.9
## 3rd Qu.: 88.20 3rd Qu.:102.6
## Max. :136.60 Max. :142.1
## life_expectancy_country tax_revenue_country_country total_tax_rate_country
## Min. :54.3 Min. : 0.10 Min. : 9.90
## 1st Qu.:77.0 1st Qu.: 9.60 1st Qu.: 36.60
## Median :78.5 Median : 9.60 Median : 38.70
## Mean :78.1 Mean :12.58 Mean : 43.81
## 3rd Qu.:80.9 3rd Qu.:12.80 3rd Qu.: 59.10
## Max. :84.2 Max. :37.20 Max. :106.30
## population_country latitude_country longitude_country
## Min. :6.454e+05 Min. :-40.90 Min. :-106.35
## 1st Qu.:6.706e+07 1st Qu.: 35.86 1st Qu.: -95.71
## Median :3.282e+08 Median : 37.09 Median : 10.45
## Mean :5.103e+08 Mean : 34.78 Mean : 11.58
## 3rd Qu.:1.366e+09 3rd Qu.: 38.96 3rd Qu.: 104.20
## Max. :1.398e+09 Max. : 61.92 Max. : 174.89
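The cleaning steps above reduce the data from 2,640 rows to 2,397 rows. The sketch below replays the same sequence (`na_if()`, `replace_na()`, `distinct()`, `drop_na()`) on a small made-up tibble so the effect of each step is easy to follow. The toy columns and values are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data that mimics the structure of `billion`
toy <- tibble(
  personName = c("A", "B", "C", "D"),
  state      = c("", "Texas", "", "West"),
  age        = c(74L, NA, 59L, 40L),
  gdp        = c(1.5, 2.0, NA, 3.0)
)

toy_clean <- toy %>%
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%             # "" becomes NA
  mutate(across(where(is.character), ~ replace_na(.x, "Unknown"))) %>% # NA becomes "Unknown"
  distinct() %>%                                                       # drop exact duplicate rows
  drop_na(where(is.numeric))                                           # drop rows with a missing numeric value

print(toy_clean)
```

Rows B and C are removed because one of their numeric columns is missing, while the empty `state` of row A survives as "Unknown". The same trade-off applies to the real data: text gaps are kept, numeric gaps cost rows.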
# Final Worth
Outlier_FinalWorth <- ggplot(bill_clean, aes(x = "", y = finalWorth)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Final Worth", y = "Final Worth")
# Age
Outlier_Age <- ggplot(bill_clean, aes(x = "", y = age)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Age", y = "Age")
# CPI Country
Outlier_CpiCountry <- ggplot(bill_clean, aes(x = "", y = cpi_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot CPI Country", y = "CPI Country")
# CPI Change
Outlier_CpiChange <- ggplot(bill_clean, aes(x = "", y = cpi_change_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot CPI Change Country", y = "CPI Change Country")
# Gross Tertiary Education Enrollment
Outlier_GrossTertiary <- ggplot(bill_clean, aes(x = "", y = gross_tertiary_education_enrollment)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Gross Tertiary Education Enrollment", y = "Gross Tertiary Education Enrollment")
# Gross Primary Education Enrollment
Outlier_GrossPrimary <- ggplot(bill_clean, aes(x = "", y = gross_primary_education_enrollment_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Gross Primary Education Enrollment", y = "Gross Primary Education Enrollment")
# Life Expectancy
Outlier_LifeExpectancy <- ggplot(bill_clean, aes(x = "", y = life_expectancy_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Life Expectancy Country", y = "Life Expectancy Country")
# Tax Revenue
Outlier_TaxRevenue <- ggplot(bill_clean, aes(x = "", y = tax_revenue_country_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Tax Revenue Country", y = "Tax Revenue Country")
# Total Tax Rate
Outlier_TotalTaxRate <- ggplot(bill_clean, aes(x = "", y = total_tax_rate_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Total Tax Rate Country", y = "Total Tax Rate Country")
# Population
Outlier_Population <- ggplot(bill_clean, aes(x = "", y = population_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Population Country", y = "Population Country")
grid.arrange(
Outlier_FinalWorth,
Outlier_Age,
Outlier_CpiCountry,
Outlier_CpiChange,
Outlier_GrossTertiary,
Outlier_GrossPrimary,
Outlier_LifeExpectancy,
Outlier_TaxRevenue,
Outlier_TotalTaxRate,
Outlier_Population,
ncol = 4
)
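Boxplots mark points that lie more than 1.5 times the interquartile range beyond the quartiles. The same rule can be tallied numerically, which helps when the plots are hard to read side by side. The helper `count_outliers()` below is hypothetical (it is not part of the original analysis), and the toy values are made up.

```r
library(dplyr)

# Hypothetical helper: number of values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
count_outliers <- function(x) {
  q   <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  sum(x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr, na.rm = TRUE)
}

# Toy data: one extreme finalWorth value, no extreme ages
toy <- tibble(
  finalWorth = c(1000, 1500, 2400, 4300, 211000),
  age        = c(51, 59, 67, 74, 78)
)

outlier_counts <- toy %>%
  summarise(across(where(is.numeric), count_outliers))

print(outlier_counts)
```

On the real data the same call would be `bill_clean %>% summarise(across(where(is.numeric), count_outliers))`.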
After pre-processing the data, we move on to the analysis. We use the questions we formulated to investigate the factors behind someone becoming a billionaire.
We first look at how country and industry relate to a person's wealth.
bill_clean %>%
  count(country, sort = TRUE) %>%
  ggplot(aes(x = reorder(country, n), y = n)) +
  geom_col(fill = "tan1") +
  coord_flip() + # swap the x and y axes
  geom_text(
    aes(label = paste0(n, " (", scales::percent(n/sum(n), accuracy = 0.1), ")")),
    hjust = -0.05,
    size = 3.5
  ) +
  scale_y_continuous(expand = expansion(mult = c(0,0.25))) +
  labs(title = "Number of Billionaires per Country",
       x = "Country", y = "Number of Billionaires") +
  theme_minimal()
bill_clean %>%
  count(country, sort = TRUE) %>%
  head(10) %>%
  ggplot(aes(x = reorder(country, n), y = n)) +
  geom_col(fill = "tan1") +
  coord_flip() + # swap the x and y axes
  geom_text(
    aes(label = paste0(n, " (", scales::percent(n / sum(n), accuracy = 0.1), ")")),
    hjust = -0.05, size = 3.5
  ) +
  scale_y_continuous(expand = expansion(mult = c(0,0.15))) +
  labs(title = "Top 10 Countries with the Most Billionaires",
       x = "Country", y = "Number of Billionaires") +
  theme_minimal()
Both charts show that the United States and China rank at the top for the number of billionaires. This suggests that country may be related to a person's wealth.
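One detail worth keeping in mind when reading the percentages: in the top-10 chart, `n / sum(n)` appears to be evaluated after `head(10)`, so each label is a share of the top-10 total rather than of all billionaires. The sketch below contrasts the two shares on made-up data (the toy counts and the column names `share_all` and `share_top` are assumptions).

```r
library(dplyr)

# Hypothetical toy data: one row per billionaire
toy <- tibble(country = c(rep("United States", 5), rep("China", 3),
                          rep("India", 1), rep("France", 1)))

top2 <- toy %>%
  count(country, sort = TRUE) %>%
  mutate(share_all = n / sum(n)) %>% # share of all rows, computed before truncation
  slice_head(n = 2) %>%
  mutate(share_top = n / sum(n))     # share within the top 2 only

print(top2)
```

Computing the share before truncating keeps the labels comparable between the full chart and the top-10 chart.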
top_industries <- bill_clean %>%
  group_by(industries) %>%
  summarise(total_wealth = sum(finalWorth, na.rm = TRUE)) %>%
  arrange(desc(total_wealth)) %>%
  slice_head(n = 10) %>%
  mutate(pct = total_wealth / sum(total_wealth)) # share within the top 10 industries
ggplot(top_industries, aes(x = reorder(industries, total_wealth), y = total_wealth)) +
  geom_bar(stat = "identity", fill = "tan1") +
  geom_text(
    aes(label = paste0(
      comma(total_wealth, accuracy = 1),
      " (", percent(pct, accuracy = 0.1), ")"
    )),
    hjust = -0.05,
    size = 3.5
  ) +
  labs(
    title = "Top 10 Industries by Total Wealth",
    x = "Industry",
    y = "Total Wealth (Million USD)"
  ) +
  coord_flip() +
  theme_minimal() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.4))) # leave room for the labels at the right edge
This chart shows that Technology, Fashion & Retail, and Finance & Investments are the three industries with the largest total wealth. A possible explanation is recent progress, particularly in Technology, which has allowed these three industries to accumulate large total wealth.
top_countries_order <- bill_clean %>%
  count(country, sort = TRUE) %>%
  slice_max(n, n = 5)
# Keep the five top countries and count billionaires per country and industry
industry_country <- bill_clean %>%
  filter(country %in% top_countries_order$country, !is.na(industries)) %>%
  count(country, industries)
# Order industries by their total number of billionaires across these countries
# (wt = n sums the counts; a plain count() would only count the countries per industry)
industry_levels <- industry_country %>%
  count(industries, wt = n, sort = TRUE) %>%
  pull(industries)
# Fix the display order of industries and countries
industry_country <- industry_country %>%
  mutate(
    industries = factor(industries, levels = rev(industry_levels)),
    country = factor(country, levels = top_countries_order$country)
  )
# Faceted plot (countries ordered by rank)
ggplot(industry_country, aes(x = industries, y = n, fill = industries)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  geom_text(aes(label = n), hjust = 0, size = 2.5) +
  scale_x_discrete(expand = expansion(mult = c(0, 0.1))) +
  facet_wrap(~country, scales = "free_y") +
  labs(
    title = "Distribution of Billionaire Industries in the Top Countries",
    x = "Industry",
    y = "Number of Billionaires"
  ) +
  theme_bw(base_size = 10) +
  theme(
    strip.text = element_text(size = 10, face = "bold"),
    axis.text.x = element_text(size = 8),
    axis.text.y = element_text(size = 7)
  )
For each of the countries with the most billionaires, this chart shows which industries its billionaires most often work in. Anyone who hopes to become a billionaire can therefore see which locations and industries seem to offer better chances.
bill_clean %>%
  filter(!is.na(gender), !is.na(selfMade)) %>%
  ggplot(aes(x = selfMade, y = finalWorth, fill = gender)) +
  geom_boxplot() +
  scale_fill_manual(values = c("M" = "navy", "F" = "hotpink")) + # gender is coded "M" / "F"
  labs(title = "Wealth Distribution by Self-Made Status and Gender",
       x = "Wealth Status (selfMade)", y = "Wealth (Million USD)") +
  ylim(0, 10000) + # note: ylim() drops values above 10,000 before the boxplot statistics are computed
  theme_minimal()
The boxplot suggests that heirs (selfMade = FALSE) tend to hold more wealth than self-made billionaires. It also shows that wealth varies more among female heirs than among male billionaires, whether heirs or self-made. A self-made billionaire can therefore still reach wealth comparable to an heir's, but a difference between the two groups remains.
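A reading like this is easier to verify with a small table of medians and interquartile ranges per group. The sketch below shows one way to build such a table; the toy values are made up and do not reproduce the report's numbers.

```r
library(dplyr)

# Hypothetical toy data with the same column names as `bill_clean`
toy <- tibble(
  selfMade   = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  gender     = c("M", "M", "F", "M", "F", "F"),
  finalWorth = c(1000, 3000, 2000, 4000, 1500, 6500)
)

wealth_summary <- toy %>%
  group_by(selfMade, gender) %>%
  summarise(
    n       = n(),
    median  = median(finalWorth),
    iqr     = IQR(finalWorth),   # spread of the middle 50% of each group
    .groups = "drop"
  )

print(wealth_summary)
```

Unlike the plot with `ylim(0, 10000)`, this summary uses every observation, so the two can differ for groups with very large fortunes.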
bill_clean %>%
  group_by(industries, gender) %>%
  summarise(count = n(), .groups = "drop") %>% # drop grouping explicitly to avoid the summarise() message
  ggplot(aes(x = reorder(industries, count), y = count, fill = gender)) +
  scale_fill_manual(values = c("M" = "navy", "F" = "hotpink")) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = count),
            position = position_dodge(width = 0.9), # match the default dodge width of the bars
            hjust = -0.1, size = 2) +
  coord_flip() +
  labs(title = "Number of Billionaires by Industry and Gender", x = "Industry", y = "Count", fill = "Gender") +
  theme_minimal()
The chart shows that every sector is still dominated by men.
ggplot(bill_clean, aes(x = age, y = finalWorth, color = gender)) +
  geom_point(alpha = 0.9) +
  geom_smooth() + # the "+" was missing here, so labs() and theme_minimal() were never applied
  labs(title = "Relationship Between Age and Wealth by Gender",
       x = "Age", y = "Wealth (Million USD)") + # finalWorth is recorded in millions of USD
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
This indicates that age has little influence on total wealth. Most points fall between roughly 50 and 75 years, which is consistent with the mean age of about 65 in the summary above.
bill_clean %>%
  mutate(age_group = cut(age, breaks = c(0, 40, 60, 80, Inf), # open upper bound: the maximum age in the data is 101
                         labels = c("<=40", "41-60", "61-80", ">80"))) %>%
  filter(!is.na(industries)) %>%
  count(age_group, industries) %>%
  slice_max(n, n = 40) %>% # keep the 40 largest groups so the plot stays readable
  ggplot(aes(x = reorder(industries, n), y = n, fill = age_group)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Distribution of Billionaires by Age Group and Industry",
       x = "Industry",
       y = "Number of Billionaires",
       fill = "Age Group") +
  theme_minimal()
Here we can see that billionaires in finance are mostly aged 61-80, so they have possibly been in that sector for a long time. The sector with the most varied age groups is technology, which has been popular in recent years.
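The age groups depend on how `cut()` treats the boundaries. The summary above reports a maximum age of 101, so an upper break of 100 would leave that value without a group. The short sketch below compares a closed and an open upper bound on a few made-up ages.

```r
# Made-up ages that sit on or near the group boundaries
ages       <- c(18, 40, 41, 80, 81, 101)
age_labels <- c("<=40", "41-60", "61-80", ">80")

# Upper break at 100: the value 101 falls outside every interval and becomes NA
closed <- cut(ages, breaks = c(0, 40, 60, 80, 100), labels = age_labels)

# Upper break at Inf: every age above 80 lands in ">80"
open <- cut(ages, breaks = c(0, 40, 60, 80, Inf), labels = age_labels)

print(data.frame(ages, closed, open))
```

By default `cut()` uses right-closed intervals, so an age of exactly 40 belongs to "<=40" and an age of exactly 80 belongs to "61-80".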
vars_individu <- bill_clean %>%
select(finalWorth, age)
vars_negara <- bill_clean %>%
select(gdp_country, cpi_country, population_country,
total_tax_rate_country, tax_revenue_country_country)
cor_individu <- cor(vars_individu, use = "complete.obs")
cor_negara <- cor(vars_negara, use = "complete.obs")
ggcorrplot(cor_individu,
lab = TRUE,
type = "full",
colors = c("skyblue", "white", "firebrick"),
title = "Correlation of Individual-Level Variables",
lab_size = 4)
This is consistent with the scatterplot above: age has little relationship with wealth.
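Because wealth is heavily skewed (the maximum is 211,000 while the median is 2,400), a rank-based correlation can serve as a robustness check next to the default Pearson coefficient. The sketch below is only an illustration on made-up values; it is an added suggestion, not a step taken in the original analysis.

```r
# Hypothetical toy data with one extreme wealth value
age        <- c(35, 45, 55, 65, 75, 85)
finalWorth <- c(1200, 1000, 1500, 1400, 1300, 211000)

# Pearson works on the raw values and is strongly affected by the extreme point
pearson <- cor(age, finalWorth, method = "pearson")

# Spearman works on ranks, so the size of the extreme value does not matter
spearman <- cor(age, finalWorth, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

On the real data the equivalent call would be `cor(vars_individu, use = "complete.obs", method = "spearman")`, whose result can be passed to `ggcorrplot()` in the same way.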
ggcorrplot(cor_negara,
lab = TRUE,
type = "full",
colors = c("tan2", "white", "salmon"),
title = "Correlation of Country-Level Variables",
lab_size = 4)
Interpretation: