| Nabila Chesaria Octavia Putri | 5052241006 |
| Amelia Widiastuti | 5052241007 |
| Agata Corinna Aulia Widyawati | 5052241036 |
The Billionaire dataset is a collection of information about the world's richest individuals. It covers attributes such as name, country of origin, total net worth, age, industry, global rank, and other relevant details. Analyzing this data can yield interesting insights, such as the factors behind a person becoming a billionaire. These factors may include country of origin, industry sector, and the possibility of inherited wealth.
To find out what this data can answer, we formulated three main questions. They are as follows:
Before moving on to visualization, we first process the data until it is clean and ready to be analyzed.
Before the analysis stage, we pre-process the data to clean the raw data and prepare it in a format better suited to further analysis.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(ggcorrplot)
billion = read.csv("C:/Users/hp/Documents/AGATA/ITS/Semester 2/MatKul/VDE/eas/FIX_BILLIONAIRE.csv")
glimpse(billion)
## Rows: 2,640
## Columns: 35
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, …
## $ finalWorth <int> 211000, 180000, 114000, 107…
## $ category <chr> "Fashion & Retail", "Automo…
## $ personName <chr> "Bernard Arnault & family",…
## $ age <int> 74, 51, 59, 78, 92, 67, 81,…
## $ country <chr> "France", "United States", …
## $ city <chr> "Paris", "Austin", "Medina"…
## $ source <chr> "LVMH", "Tesla, SpaceX", "A…
## $ industries <chr> "Fashion & Retail", "Automo…
## $ countryOfCitizenship <chr> "France", "United States", …
## $ organization <chr> "LVMH Moët Hennessy Louis V…
## $ selfMade <lgl> FALSE, TRUE, TRUE, TRUE, TR…
## $ status <chr> "U", "D", "D", "U", "D", "D…
## $ gender <chr> "M", "M", "M", "M", "M", "M…
## $ birthDate <chr> "03/05/1949 00.00", "6/28/1…
## $ lastName <chr> "Arnault", "Musk", "Bezos",…
## $ firstName <chr> "Bernard", "Elon", "Jeff", …
## $ title <chr> "Chairman and CEO", "CEO", …
## $ date <chr> "04/04/2023 05.01", "04/04/…
## $ state <chr> "", "Texas", "Washington", …
## $ residenceStateRegion <chr> "", "South", "West", "West"…
## $ birthYear <int> 1949, 1971, 1964, 1944, 193…
## $ birthMonth <int> 3, 6, 1, 8, 8, 10, 2, 1, 4,…
## $ birthDay <int> 5, 28, 12, 17, 30, 28, 14, …
## $ cpi_country <dbl> 110.05, 117.24, 117.24, 117…
## $ cpi_change_country <dbl> 1.1, 7.5, 7.5, 7.5, 7.5, 7.…
## $ gdp_country <chr> "$2,715,518,274,227 ", "$21…
## $ gross_tertiary_education_enrollment <dbl> 65.6, 88.2, 88.2, 88.2, 88.…
## $ gross_primary_education_enrollment_country <dbl> 102.5, 101.8, 101.8, 101.8,…
## $ life_expectancy_country <dbl> 82.5, 78.5, 78.5, 78.5, 78.…
## $ tax_revenue_country_country <dbl> 24.2, 9.6, 9.6, 9.6, 9.6, 9…
## $ total_tax_rate_country <dbl> 60.7, 36.6, 36.6, 36.6, 36.…
## $ population_country <int> 67059887, 328239523, 328239…
## $ latitude_country <dbl> 46.22764, 37.09024, 37.0902…
## $ longitude_country <dbl> 2.213749, -95.712891, -95.7…
Some fields contain empty strings (""), which are effectively missing values (NA).
In addition, the gdp_country and date columns are stored as character: gdp_country should be numeric and date should be a date.
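To make the two conversions concrete, here is a minimal base-R sketch on a few hand-written values. The vectors `gdp_raw` and `date_raw` are hypothetical samples in the same shape as the columns shown by `glimpse()`, and `gsub()` stands in for `parse_number()` purely for illustration.

```r
# Hypothetical sample values shaped like the gdp_country and date columns
gdp_raw  <- c("$2,715,518,274,227 ", "$21,427,700,000,000 ", "")
date_raw <- "04/04/2023 05.01"

# Keep only digits and the decimal point, then convert to numeric
gdp_num <- as.numeric(gsub("[^0-9.]", "", gdp_raw))

# Parse the leading day/month/year; the trailing time part is ignored
date_parsed <- as.Date(date_raw, format = "%d/%m/%Y")

gdp_num          # the empty string becomes NA
date_parsed
```

An empty string converts to `NA`, which is why blank fields need to be handled explicitly during cleaning.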
library(readr)
billion <- billion %>%
mutate(date = as.Date(date, format = "%d/%m/%Y"))
billion <- billion %>%
  mutate(gdp_country = as.character(gdp_country), # make sure the type is character
         gdp_country = parse_number(gdp_country)) # strip the $ sign and commas
glimpse(billion)
## Rows: 2,640
## Columns: 35
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, …
## $ finalWorth <int> 211000, 180000, 114000, 107…
## $ category <chr> "Fashion & Retail", "Automo…
## $ personName <chr> "Bernard Arnault & family",…
## $ age <int> 74, 51, 59, 78, 92, 67, 81,…
## $ country <chr> "France", "United States", …
## $ city <chr> "Paris", "Austin", "Medina"…
## $ source <chr> "LVMH", "Tesla, SpaceX", "A…
## $ industries <chr> "Fashion & Retail", "Automo…
## $ countryOfCitizenship <chr> "France", "United States", …
## $ organization <chr> "LVMH Moët Hennessy Louis V…
## $ selfMade <lgl> FALSE, TRUE, TRUE, TRUE, TR…
## $ status <chr> "U", "D", "D", "U", "D", "D…
## $ gender <chr> "M", "M", "M", "M", "M", "M…
## $ birthDate <chr> "03/05/1949 00.00", "6/28/1…
## $ lastName <chr> "Arnault", "Musk", "Bezos",…
## $ firstName <chr> "Bernard", "Elon", "Jeff", …
## $ title <chr> "Chairman and CEO", "CEO", …
## $ date <date> 2023-04-04, 2023-04-04, 20…
## $ state <chr> "", "Texas", "Washington", …
## $ residenceStateRegion <chr> "", "South", "West", "West"…
## $ birthYear <int> 1949, 1971, 1964, 1944, 193…
## $ birthMonth <int> 3, 6, 1, 8, 8, 10, 2, 1, 4,…
## $ birthDay <int> 5, 28, 12, 17, 30, 28, 14, …
## $ cpi_country <dbl> 110.05, 117.24, 117.24, 117…
## $ cpi_change_country <dbl> 1.1, 7.5, 7.5, 7.5, 7.5, 7.…
## $ gdp_country <dbl> 2.715518e+12, 2.142770e+13,…
## $ gross_tertiary_education_enrollment <dbl> 65.6, 88.2, 88.2, 88.2, 88.…
## $ gross_primary_education_enrollment_country <dbl> 102.5, 101.8, 101.8, 101.8,…
## $ life_expectancy_country <dbl> 82.5, 78.5, 78.5, 78.5, 78.…
## $ tax_revenue_country_country <dbl> 24.2, 9.6, 9.6, 9.6, 9.6, 9…
## $ total_tax_rate_country <dbl> 60.7, 36.6, 36.6, 36.6, 36.…
## $ population_country <int> 67059887, 328239523, 328239…
## $ latitude_country <dbl> 46.22764, 37.09024, 37.0902…
## $ longitude_country <dbl> 2.213749, -95.712891, -95.7…
With the columns now in the correct types, the data can be cleaned further.
billion %>% duplicated() %>% sum()
## [1] 0
billion %>% filter(duplicated(.)) #show duplicated
## [1] rank
## [2] finalWorth
## [3] category
## [4] personName
## [5] age
## [6] country
## [7] city
## [8] source
## [9] industries
## [10] countryOfCitizenship
## [11] organization
## [12] selfMade
## [13] status
## [14] gender
## [15] birthDate
## [16] lastName
## [17] firstName
## [18] title
## [19] date
## [20] state
## [21] residenceStateRegion
## [22] birthYear
## [23] birthMonth
## [24] birthDay
## [25] cpi_country
## [26] cpi_change_country
## [27] gdp_country
## [28] gross_tertiary_education_enrollment
## [29] gross_primary_education_enrollment_country
## [30] life_expectancy_country
## [31] tax_revenue_country_country
## [32] total_tax_rate_country
## [33] population_country
## [34] latitude_country
## [35] longitude_country
## <0 rows> (or 0-length row.names)
billion <- billion %>%
mutate(across(where(is.character), ~na_if(., "")))
colSums(is.na(billion))
## rank
## 0
## finalWorth
## 0
## category
## 0
## personName
## 0
## age
## 65
## country
## 38
## city
## 72
## source
## 0
## industries
## 0
## countryOfCitizenship
## 0
## organization
## 2315
## selfMade
## 0
## status
## 0
## gender
## 0
## birthDate
## 76
## lastName
## 0
## firstName
## 3
## title
## 2301
## date
## 0
## state
## 1887
## residenceStateRegion
## 1893
## birthYear
## 76
## birthMonth
## 76
## birthDay
## 76
## cpi_country
## 184
## cpi_change_country
## 184
## gdp_country
## 164
## gross_tertiary_education_enrollment
## 182
## gross_primary_education_enrollment_country
## 181
## life_expectancy_country
## 182
## tax_revenue_country_country
## 183
## total_tax_rate_country
## 182
## population_country
## 164
## latitude_country
## 164
## longitude_country
## 164
na_cols <- billion %>%
select(where(is.character)) %>%
names()
billion <- billion %>%
mutate(across(all_of(na_cols), ~replace_na(., "Unknown")))
glimpse(billion)
## Rows: 2,640
## Columns: 35
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, …
## $ finalWorth <int> 211000, 180000, 114000, 107…
## $ category <chr> "Fashion & Retail", "Automo…
## $ personName <chr> "Bernard Arnault & family",…
## $ age <int> 74, 51, 59, 78, 92, 67, 81,…
## $ country <chr> "France", "United States", …
## $ city <chr> "Paris", "Austin", "Medina"…
## $ source <chr> "LVMH", "Tesla, SpaceX", "A…
## $ industries <chr> "Fashion & Retail", "Automo…
## $ countryOfCitizenship <chr> "France", "United States", …
## $ organization <chr> "LVMH Moët Hennessy Louis V…
## $ selfMade <lgl> FALSE, TRUE, TRUE, TRUE, TR…
## $ status <chr> "U", "D", "D", "U", "D", "D…
## $ gender <chr> "M", "M", "M", "M", "M", "M…
## $ birthDate <chr> "03/05/1949 00.00", "6/28/1…
## $ lastName <chr> "Arnault", "Musk", "Bezos",…
## $ firstName <chr> "Bernard", "Elon", "Jeff", …
## $ title <chr> "Chairman and CEO", "CEO", …
## $ date <date> 2023-04-04, 2023-04-04, 20…
## $ state <chr> "Unknown", "Texas", "Washin…
## $ residenceStateRegion <chr> "Unknown", "South", "West",…
## $ birthYear <int> 1949, 1971, 1964, 1944, 193…
## $ birthMonth <int> 3, 6, 1, 8, 8, 10, 2, 1, 4,…
## $ birthDay <int> 5, 28, 12, 17, 30, 28, 14, …
## $ cpi_country <dbl> 110.05, 117.24, 117.24, 117…
## $ cpi_change_country <dbl> 1.1, 7.5, 7.5, 7.5, 7.5, 7.…
## $ gdp_country <dbl> 2.715518e+12, 2.142770e+13,…
## $ gross_tertiary_education_enrollment <dbl> 65.6, 88.2, 88.2, 88.2, 88.…
## $ gross_primary_education_enrollment_country <dbl> 102.5, 101.8, 101.8, 101.8,…
## $ life_expectancy_country <dbl> 82.5, 78.5, 78.5, 78.5, 78.…
## $ tax_revenue_country_country <dbl> 24.2, 9.6, 9.6, 9.6, 9.6, 9…
## $ total_tax_rate_country <dbl> 60.7, 36.6, 36.6, 36.6, 36.…
## $ population_country <int> 67059887, 328239523, 328239…
## $ latitude_country <dbl> 46.22764, 37.09024, 37.0902…
## $ longitude_country <dbl> 2.213749, -95.712891, -95.7…
bill_clean <- billion %>%
distinct() %>%
drop_na(where(is.numeric))
summary(bill_clean)
## rank finalWorth category personName
## Min. : 1 Min. : 1000 Length:2397 Length:2397
## 1st Qu.: 636 1st Qu.: 1500 Class :character Class :character
## Median :1272 Median : 2400 Mode :character Mode :character
## Mean :1276 Mean : 4759
## 3rd Qu.:1905 3rd Qu.: 4300
## Max. :2540 Max. :211000
## age country city source
## Min. : 18.00 Length:2397 Length:2397 Length:2397
## 1st Qu.: 56.00 Class :character Class :character Class :character
## Median : 65.00 Mode :character Mode :character Mode :character
## Mean : 64.96
## 3rd Qu.: 74.00
## Max. :101.00
## industries countryOfCitizenship organization selfMade
## Length:2397 Length:2397 Length:2397 Mode :logical
## Class :character Class :character Class :character FALSE:713
## Mode :character Mode :character Mode :character TRUE :1684
##
##
##
## status gender birthDate lastName
## Length:2397 Length:2397 Length:2397 Length:2397
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## firstName title date state
## Length:2397 Length:2397 Min. :2023-04-04 Length:2397
## Class :character Class :character 1st Qu.:2023-04-04 Class :character
## Mode :character Mode :character Median :2023-04-04 Mode :character
## Mean :2023-04-04
## 3rd Qu.:2023-04-04
## Max. :2023-04-04
## residenceStateRegion birthYear birthMonth birthDay
## Length:2397 Min. :1921 Min. : 1.000 Min. : 1.00
## Class :character 1st Qu.:1948 1st Qu.: 2.000 1st Qu.: 1.00
## Mode :character Median :1958 Median : 6.000 Median :11.00
## Mean :1957 Mean : 5.757 Mean :12.28
## 3rd Qu.:1967 3rd Qu.: 9.000 3rd Qu.:21.00
## Max. :2004 Max. :12.000 Max. :31.00
## cpi_country cpi_change_country gdp_country
## Min. : 99.55 Min. :-1.900 Min. :1.367e+10
## 1st Qu.:117.24 1st Qu.: 1.700 1st Qu.:1.736e+12
## Median :117.24 Median : 2.900 Median :1.991e+13
## Mean :127.90 Mean : 4.401 Mean :1.171e+13
## 3rd Qu.:125.08 3rd Qu.: 7.500 3rd Qu.:2.143e+13
## Max. :288.57 Max. :53.500 Max. :2.143e+13
## gross_tertiary_education_enrollment gross_primary_education_enrollment_country
## Min. : 4.00 Min. : 84.7
## 1st Qu.: 50.60 1st Qu.:100.2
## Median : 67.00 Median :101.8
## Mean : 67.47 Mean :102.9
## 3rd Qu.: 88.20 3rd Qu.:102.6
## Max. :136.60 Max. :142.1
## life_expectancy_country tax_revenue_country_country total_tax_rate_country
## Min. :54.3 Min. : 0.10 Min. : 9.90
## 1st Qu.:77.0 1st Qu.: 9.60 1st Qu.: 36.60
## Median :78.5 Median : 9.60 Median : 38.70
## Mean :78.1 Mean :12.58 Mean : 43.81
## 3rd Qu.:80.9 3rd Qu.:12.80 3rd Qu.: 59.10
## Max. :84.2 Max. :37.20 Max. :106.30
## population_country latitude_country longitude_country
## Min. :6.454e+05 Min. :-40.90 Min. :-106.35
## 1st Qu.:6.706e+07 1st Qu.: 35.86 1st Qu.: -95.71
## Median :3.282e+08 Median : 37.09 Median : 10.45
## Mean :5.103e+08 Mean : 34.78 Mean : 11.58
## 3rd Qu.:1.366e+09 3rd Qu.: 38.96 3rd Qu.: 104.20
## Max. :1.398e+09 Max. : 61.92 Max. : 174.89
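The boxplots that follow flag points as outliers when they lie more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule is shown here; `count_outliers()` is a hypothetical helper and `worth` holds made-up values, not rows from the dataset.

```r
# Count values lying beyond 1.5 * IQR from the first or third quartile
count_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  sum(x < q[1] - coef * iqr | x > q[2] + coef * iqr)
}

# Made-up right-skewed values, loosely mimicking net worth in million USD
worth <- c(1000, 1200, 1500, 2400, 4300, 9000, 211000)
count_outliers(worth)
```

For heavily right-skewed variables such as net worth, many legitimate observations can fall outside the fences, so a flagged point is not necessarily a data error.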
# Final Worth
Outlier_FinalWorth <- ggplot(bill_clean, aes(x = "", y = finalWorth)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Final Worth", y = "Final Worth")
# Age
Outlier_Age <- ggplot(bill_clean, aes(x = "", y = age)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Age", y = "Age")
# CPI Country
Outlier_CpiCountry <- ggplot(bill_clean, aes(x = "", y = cpi_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot CPI Country", y = "CPI Country")
# CPI Change
Outlier_CpiChange <- ggplot(bill_clean, aes(x = "", y = cpi_change_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot CPI Change Country", y = "CPI Change Country")
# Gross Tertiary Education Enrollment
Outlier_GrossTertiary <- ggplot(bill_clean, aes(x = "", y = gross_tertiary_education_enrollment)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Gross Tertiary Education Enrollment", y = "Gross Tertiary Education Enrollment")
# Gross Primary Education Enrollment
Outlier_GrossPrimary <- ggplot(bill_clean, aes(x = "", y = gross_primary_education_enrollment_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Gross Primary Education Enrollment", y = "Gross Primary Education Enrollment")
# Life Expectancy
Outlier_LifeExpectancy <- ggplot(bill_clean, aes(x = "", y = life_expectancy_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Life Expectancy Country", y = "Life Expectancy Country")
# Tax Revenue
Outlier_TaxRevenue <- ggplot(bill_clean, aes(x = "", y = tax_revenue_country_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Tax Revenue Country", y = "Tax Revenue Country")
# Total Tax Rate
Outlier_TotalTaxRate <- ggplot(bill_clean, aes(x = "", y = total_tax_rate_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Total Tax Rate Country", y = "Total Tax Rate Country")
# Population
Outlier_Population <- ggplot(bill_clean, aes(x = "", y = population_country)) +
geom_boxplot(fill = "gray", outlier.color = "tan3") +
labs(title = "Boxplot Population Country", y = "Population Country")
grid.arrange(
Outlier_FinalWorth,
Outlier_Age,
Outlier_CpiCountry,
Outlier_CpiChange,
Outlier_GrossTertiary,
Outlier_GrossPrimary,
Outlier_LifeExpectancy,
Outlier_TaxRevenue,
Outlier_TotalTaxRate,
Outlier_Population,
ncol = 4
)
After pre-processing the data, we move on to the analysis. To guide it, we formulated several questions that help identify the factors behind a person becoming a billionaire.
We first look at how country and industry relate to a person's wealth.
bill_clean %>%
count(country, sort = TRUE) %>%
ggplot(aes(x = reorder(country, n), y = n)) +
geom_col(fill = "tan1") +
coord_flip() + # swap the x and y axes
geom_text(
aes(label = paste0(n, " (", scales::percent(n/sum(n), accuracy = 0.1), ")")),
hjust = -0.05,
size = 3.5
) +
scale_y_continuous(expand = expansion(mult = c(0,0.25))) +
labs(title = "Number of Billionaires by Country",
x = "Country", y = "Number of Billionaires") +
theme_minimal()
bill_clean %>%
count(country, sort = TRUE) %>%
head(10) %>%
ggplot(aes(x = reorder(country, n), y = n)) +
geom_col(fill = "tan1") +
coord_flip() + # swap the x and y axes
geom_text(
aes(label = paste0(n, " (", scales::percent(n / sum(n), accuracy = 0.1), ")")),
hjust = -0.05, size = 3.5
) +
scale_y_continuous(expand = expansion(mult = c(0,0.15))) +
labs(title = "Top 10 Countries by Number of Billionaires",
x = "Country", y = "Number of Billionaires") +
theme_minimal()
Both charts show that the United States and China have the most billionaires in the world.
This suggests that country may play a role in a person's wealth.
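When reading the percentage labels, note that a share computed after truncating to the top 10 is relative to those ten countries only, not to all billionaires. The sketch below illustrates the difference on made-up country labels; the letters and counts are hypothetical.

```r
# Made-up data: 12 countries with differing numbers of rows (100 rows in total)
country <- rep(LETTERS[1:12], times = c(30, 20, 10, 8, 7, 6, 5, 4, 4, 3, 2, 1))

# Named vector of counts per country, largest first
counts <- sort(vapply(split(country, country), length, integer(1)),
               decreasing = TRUE)

# Share of all rows: compute the proportion first, then keep the top 10
share_all <- head(counts / sum(counts), 10)

# Share within the top 10: keep the top 10 first, then compute the proportion
top10 <- head(counts, 10)
share_top10 <- top10 / sum(top10)

round(100 * share_all[["A"]], 1)
round(100 * share_top10[["A"]], 1)
```

The second figure is always at least as large as the first, because the denominator excludes every country outside the top 10.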
top_industries <- bill_clean %>%
group_by(industries) %>%
summarise(total_wealth = sum(finalWorth, na.rm = TRUE)) %>%
arrange(desc(total_wealth)) %>%
slice_head(n = 10) %>%
mutate(pct = total_wealth / sum(total_wealth))
ggplot(top_industries, aes(x = reorder(industries, total_wealth), y = total_wealth)) +
geom_bar(stat = "identity", fill = "tan1") +
geom_text(
aes(label = paste0(
comma(total_wealth, accuracy = 1),
" (", percent(pct, accuracy = 0.1), ")"
)),
hjust = -0.05,
size = 3.5
) +
labs(
title = "Top 10 Industries by Total Wealth",
x = "Industry",
y = "Total Wealth (Million USD)"
) +
coord_flip() +
theme_minimal() +
scale_y_continuous(expand = expansion(mult = c(0, 0.4))) # leave room on the right for labels
This chart shows that Technology, Fashion & Retail, and Finance & Investments are the three industries with the largest total wealth.
A likely reason is recent economic and technological progress, especially in Technology, which has allowed these three industries to accumulate so much wealth.
# Five countries with the most billionaires
top_countries_order <- bill_clean %>%
  count(country, sort = TRUE) %>%
  slice_max(n, n = 5)
# Keep those countries and count billionaires per country and industry
industry_country <- bill_clean %>%
  filter(country %in% top_countries_order$country, !is.na(industries)) %>%
  count(country, industries)
# Order industries by their total count across the five countries;
# wt = n sums the existing counts instead of counting rows
industry_levels <- industry_country %>%
  count(industries, wt = n) %>%
  arrange(desc(n)) %>%
  pull(industries)
# Convert to factors so industries and countries are plotted in ranked order
industry_country <- industry_country %>%
  mutate(
    industries = factor(industries, levels = rev(industry_levels)),
    country = factor(country, levels = top_countries_order$country)
  )
# Faceted plot (countries ordered by rank)
ggplot(industry_country, aes(x = industries, y = n, fill = industries)) +
geom_col(show.legend = FALSE) +
coord_flip() +
geom_text(aes(label = n), hjust = 0, size = 2.5) +
scale_x_discrete(expand = expansion(mult = c(0, 0.1))) +
facet_wrap(~country, scales = "free_y") +
labs(
title = "Industry Distribution of Billionaires in the Top Countries",
x = "Industry",
y = "Number of Billionaires"
) +
theme_bw(base_size = 10) +
theme(
strip.text = element_text(size = 10, face = "bold"),
axis.text.x = element_text(size = 8),
axis.text.y = element_text(size = 7)
)
This chart shows, for each of the countries with the most billionaires, which industries their billionaires are most commonly in.
It indicates which locations and industries appear to offer better odds for someone hoping to become a billionaire.
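One detail matters when ranking industries from a table that already holds counts: counting rows gives the number of countries in which an industry appears, whereas summing the count column gives the number of billionaires. In dplyr this is the difference between `count(industries)` and `count(industries, wt = n)`. The base-R sketch below shows the same distinction on a made-up table; `industry_country_demo` and its values are hypothetical.

```r
# Made-up pre-aggregated table: one row per (country, industry) with a count n
industry_country_demo <- data.frame(
  country    = c("US", "US", "China", "China", "India"),
  industries = c("Technology", "Finance", "Technology", "Manufacturing", "Finance"),
  n          = c(150, 200, 60, 90, 20)
)

# Counting rows: in how many countries does each industry appear?
rows_per_industry <- table(industry_country_demo$industries)

# Summing n: how many billionaires does each industry have in total?
total_per_industry <- tapply(industry_country_demo$n,
                             industry_country_demo$industries, sum)

rows_per_industry
sort(total_per_industry, decreasing = TRUE)
```

The two rankings can disagree, so the weighted total is the one to use when the intent is to order industries by size.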
Next, we look at whether self-made status (building a business from scratch) and gender affect a person's total wealth.
bill_clean %>%
  filter(!is.na(gender), !is.na(selfMade)) %>%
  ggplot(aes(x = selfMade, y = finalWorth, fill = gender)) +
  geom_boxplot() +
  scale_fill_manual(values = c("M" = "navy", "F" = "hotpink")) +
  labs(title = "Wealth Distribution by Self-Made Status and Gender",
       x = "Wealth Status (Self-Made)", y = "Net Worth (Million USD)") +
  ylim(0, 10000) + # rows above 10,000 are dropped (see the warning)
  theme_minimal()
## Warning: Removed 169 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
The boxplot compares the wealth distributions of the four groups. Most billionaires in the cleaned data are self-made (1,684 of 2,397 according to the summary above) rather than heirs. Within the self-made group, most are men, and women are far fewer. This reflects a gender gap in opportunities to build wealth.
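Because a boxplot shows distributions rather than group sizes, claims about how many people fall in each group are easier to check with a contingency table. The following is a minimal sketch on made-up rows; `demo` is a hypothetical stand-in for the `selfMade` and `gender` columns.

```r
# Made-up rows standing in for the selfMade and gender columns
demo <- data.frame(
  selfMade = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  gender   = c("M", "M", "F", "F", "M", "M", "F", "M")
)

# Number of people in each selfMade x gender combination
counts <- table(selfMade = demo$selfMade, gender = demo$gender)
counts

# Row-wise proportions: the gender split within each selfMade group
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Applied to the real data, the same two calls would give the group sizes that the text describes.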
bill_clean %>%
group_by(industries, gender) %>%
summarise(count = n()) %>%
ggplot(aes(x = reorder(industries, count), y = count, fill = gender)) +
scale_fill_manual(values = c("M" = "navy", "F" = "hotpink")) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = count),
position = position_dodge(width = 0.8),
hjust = -0.1, size = 2) +
coord_flip() +
labs(title = "Number of Billionaires by Industry and Gender", x = "Industry", y = "Count", fill = "Gender") +
theme_minimal()
## `summarise()` has grouped output by 'industries'. You can override using the
## `.groups` argument.
The chart shows that men far outnumber women among billionaires. In one industry (Telecom) there are no female billionaires at all.
Fashion & Retail and Food & Beverage have a higher proportion of female billionaires than other industries.
This suggests that the type of industry also affects gender representation.
Next, we look at whether age affects a person's wealth and how the two are related.
ggplot(bill_clean, aes(x = age, y = finalWorth, color = gender)) +
  geom_point(alpha = 0.9) +
  geom_smooth() +
  labs(title = "Age and Wealth by Gender",
       x = "Age",
       y = "Net Worth (Million USD)") +
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
The scatterplot suggests that age has little influence on total wealth. Most points fall between roughly 50 and 75 years, which is consistent with the mean age of about 65 in the summary above.
bill_clean %>%
  mutate(age_group = cut(age, breaks = c(0, 40, 60, 80, Inf), # Inf keeps ages above 100
                         labels = c("<=40", "41-60", "61-80", ">80"))) %>%
filter(!is.na(industries)) %>%
count(age_group, industries) %>%
slice_max(order_by = n, n = 40) %>%
ggplot(aes(x = reorder(industries, n), y = n, fill = age_group)) +
geom_col(position = "dodge") +
coord_flip() +
labs(
title = "Billionaires by Age Group and Industry",
x = "Industry",
y = "Number of Billionaires",
fill = "Age Group"
) +
theme_minimal()
Here we can see that billionaires in finance are mostly aged 61-80, so they have likely been in that sector for a long time.
The sector with the most varied age groups is technology, which has been especially popular in recent years.
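The age bands rely on `cut()`, which returns `NA` for any value outside the outermost breaks. Since the summary above reports a maximum age of 101, a finite upper break of 100 would leave that value ungrouped. The sketch below compares the two choices on made-up ages.

```r
ages <- c(18, 40, 41, 80, 81, 101)
band_labels <- c("<=40", "41-60", "61-80", ">80")

# Finite upper break: 101 lies outside (80, 100] and becomes NA
bounded <- as.character(
  cut(ages, breaks = c(0, 40, 60, 80, 100), labels = band_labels)
)

# Open-ended upper break: every age above 80 lands in the last band
open_ended <- as.character(
  cut(ages, breaks = c(0, 40, 60, 80, Inf), labels = band_labels)
)

bounded
open_ended
```

By default the intervals are closed on the right, so an age of exactly 40 belongs to the "<=40" band and 80 to "61-80".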
individu_vars <- bill_clean %>%
select(finalWorth, age)
negara_vars <- bill_clean %>%
select(gdp_country, cpi_country, population_country,
total_tax_rate_country, tax_revenue_country_country)
cor_individu <- cor(individu_vars, use = "complete.obs")
cor_negara <- cor(negara_vars, use = "complete.obs")
ggcorrplot(cor_individu,
lab = TRUE,
type = "full",
colors = c("skyblue", "white", "firebrick"),
title = "Correlation of Individual-Level Variables",
lab_size = 4)
This is consistent with the scatterplot above: age has little relationship with wealth.
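Net worth is strongly right-skewed (the summary above reports a mean of 4,759 against a median of 2,400), and Pearson correlation is sensitive to a few extreme values. As a cross-check, a rank-based Spearman correlation can be computed alongside it. The sketch below uses made-up numbers, so the resulting coefficients say nothing about the real data.

```r
# Made-up ages and net worth (million USD), including one extreme value
age   <- c(35, 48, 52, 59, 63, 67, 71, 74, 80, 92)
worth <- c(1500, 2400, 211000, 1100, 4300, 1000, 3200, 9000, 1800, 2600)

# Pearson measures linear association and is pulled by the extreme value
pearson <- cor(age, worth, method = "pearson")

# Spearman is the Pearson correlation of the ranks, so extremes have less leverage
spearman <- cor(age, worth, method = "spearman")

c(pearson = pearson, spearman = spearman)
```

If both coefficients stay close to zero on the real data, the conclusion that age and wealth are only weakly related does not depend on the handful of very large fortunes.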
ggcorrplot(cor_negara,
lab = TRUE,
type = "full",
colors = c("tan2", "white", "salmon"),
title = "Correlation of Country-Level Variables",
lab_size = 4)
Several points stand out from this chart: