“The term billionaire describes an individual with a net worth of at least one billion units in their native currency, such as dollars or euros. The assets can range from cash and cash equivalents to real estate, business, and personal property. Since 1987, Forbes has ranked the wealthiest global citizens according to their net worth.”
In this LBB project, we will use a dataset provided by Kaggle that contains statistics on the world’s billionaires, including information about their businesses, industries, and personal details. It provides insights into the wealth distribution, business sectors, and demographics of billionaires worldwide.
These insights will be summarized using data visualization to give us a better understanding of the billionaires' backgrounds.
#> 'data.frame': 2640 obs. of 35 variables:
#> $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ finalWorth : int 211000 180000 114000 107000 106000 104000 94500 93000 83400 80700 ...
#> $ category : chr "Fashion & Retail" "Automotive" "Technology" "Technology" ...
#> $ personName : chr "Bernard Arnault & family" "Elon Musk" "Jeff Bezos" "Larry Ellison" ...
#> $ age : int 74 51 59 78 92 67 81 83 65 67 ...
#> $ country : chr "France" "United States" "United States" "United States" ...
#> $ city : chr "Paris" "Austin" "Medina" "Lanai" ...
#> $ source : chr "LVMH" "Tesla, SpaceX" "Amazon" "Oracle" ...
#> $ industries : chr "Fashion & Retail" "Automotive" "Technology" "Technology" ...
#> $ countryOfCitizenship : chr "France" "United States" "United States" "United States" ...
#> $ organization : chr "LVMH Moët Hennessy Louis Vuitton" "Tesla" "Amazon" "Oracle" ...
#> $ selfMade : logi FALSE TRUE TRUE TRUE TRUE TRUE ...
#> $ status : chr "U" "D" "D" "U" ...
#> $ gender : chr "M" "M" "M" "M" ...
#> $ birthDate : chr "3/5/1949 0:00" "6/28/1971 0:00" "1/12/1964 0:00" "8/17/1944 0:00" ...
#> $ lastName : chr "Arnault" "Musk" "Bezos" "Ellison" ...
#> $ firstName : chr "Bernard" "Elon" "Jeff" "Larry" ...
#> $ title : chr "Chairman and CEO" "CEO" "Chairman and Founder" "CTO and Founder" ...
#> $ date : chr "4/4/2023 5:01" "4/4/2023 5:01" "4/4/2023 5:01" "4/4/2023 5:01" ...
#> $ state : chr "" "Texas" "Washington" "Hawaii" ...
#> $ residenceStateRegion : chr "" "South" "West" "West" ...
#> $ birthYear : int 1949 1971 1964 1944 1930 1955 1942 1940 1957 1956 ...
#> $ birthMonth : int 3 6 1 8 8 10 2 1 4 3 ...
#> $ birthDay : int 5 28 12 17 30 28 14 28 19 24 ...
#> $ cpi_country : num 110 117 117 117 117 ...
#> $ cpi_change_country : num 1.1 7.5 7.5 7.5 7.5 7.5 7.5 3.6 7.7 7.5 ...
#> $ gdp_country : chr "$2,715,518,274,227 " "$21,427,700,000,000 " "$21,427,700,000,000 " "$21,427,700,000,000 " ...
#> $ gross_tertiary_education_enrollment : num 65.6 88.2 88.2 88.2 88.2 88.2 88.2 40.2 28.1 88.2 ...
#> $ gross_primary_education_enrollment_country: num 102 102 102 102 102 ...
#> $ life_expectancy_country : num 82.5 78.5 78.5 78.5 78.5 78.5 78.5 75 69.4 78.5 ...
#> $ tax_revenue_country_country : num 24.2 9.6 9.6 9.6 9.6 9.6 9.6 13.1 11.2 9.6 ...
#> $ total_tax_rate_country : num 60.7 36.6 36.6 36.6 36.6 36.6 36.6 55.1 49.7 36.6 ...
#> $ population_country : int 67059887 328239523 328239523 328239523 328239523 328239523 328239523 126014024 1366417754 328239523 ...
#> $ latitude_country : num 46.2 37.1 37.1 37.1 37.1 ...
#> $ longitude_country : num 2.21 -95.71 -95.71 -95.71 -95.71 ...
The Dataset contains the following information:
- rank: The ranking of the billionaire in terms of wealth.
- finalWorth: The final net worth of the billionaire, in millions of U.S. dollars (for example, 211000 for the top-ranked entry).
- category: The category or industry in which the billionaire’s business operates.
- personName: The full name of the billionaire.
- age: The age of the billionaire.
- country: The country in which the billionaire resides.
- city: The city in which the billionaire resides.
- source: The source of the billionaire’s wealth.
- industries: The industries associated with the billionaire’s business interests.
- countryOfCitizenship: The country of citizenship of the billionaire.
- organization: The name of the organization or company associated with the billionaire.
- selfMade: Indicates whether the billionaire is self-made (True/False).
- status: “D” represents self-made billionaires (Founders/Entrepreneurs) and “U” indicates inherited or unearned wealth.
- gender: The gender of the billionaire.
- birthDate: The birthdate of the billionaire.
- lastName: The last name of the billionaire.
- firstName: The first name of the billionaire.
- title: The title or honorific of the billionaire.
- date: The date of data collection.
- state: The state in which the billionaire resides.
- residenceStateRegion: The region or state of residence of the billionaire.
- birthYear: The birth year of the billionaire.
- birthMonth: The birth month of the billionaire.
- birthDay: The birth day of the billionaire.
- cpi_country: Consumer Price Index (CPI) for the billionaire’s country.
- cpi_change_country: CPI change for the billionaire’s country.
- gdp_country: Gross Domestic Product (GDP) for the billionaire’s country.
- gross_tertiary_education_enrollment: Enrollment in tertiary education in the billionaire’s country.
- gross_primary_education_enrollment_country: Enrollment in primary education in the billionaire’s country.
- life_expectancy_country: Life expectancy in the billionaire’s country.
- tax_revenue_country_country: Tax revenue in the billionaire’s country.
- total_tax_rate_country: Total tax rate in the billionaire’s country.
- population_country: Population of the billionaire’s country.
- latitude_country: Latitude coordinate of the billionaire’s country.
- longitude_country: Longitude coordinate of the billionaire’s country.
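The description of status can be checked against selfMade, since the two columns are said to carry related information. Below is a minimal sketch in base R that uses only the first four rows visible in the str() output above. The data frame name `sample_rows` is hypothetical and exists only for this illustration. It cross-tabulates the two columns so that any combination not covered by the description stands out.

```r
# First four rows of personName, selfMade and status, copied from the str() output.
# `sample_rows` is a hypothetical name used only for this illustration.
sample_rows <- data.frame(
  personName = c("Bernard Arnault & family", "Elon Musk",
                 "Jeff Bezos", "Larry Ellison"),
  selfMade   = c(FALSE, TRUE, TRUE, TRUE),
  status     = c("U", "D", "D", "U"),
  stringsAsFactors = FALSE
)

# Cross-tabulate the two columns: rows are selfMade, columns are status
status_check <- table(selfMade = sample_rows$selfMade,
                      status   = sample_rows$status)
print(status_check)

# Rows flagged as self-made that nevertheless carry status "U"
mismatch <- sample_rows[sample_rows$selfMade & sample_rows$status == "U", ]
print(mismatch$personName)
```

Even in these four rows, one entry is marked self-made while carrying status "U", so the two-code description of status may be incomplete. On the full data, the same table could be built from df_billion$selfMade and df_billion$status.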
Based on the data description above, we need to change the following data types:
- birthDate and date -> date-time format
- all other character variables -> factor
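To illustrate what the date-time conversion does, here is a minimal sketch in base R. It uses as.POSIXct() with an explicit format string, which is a swapped-in stand-in and not the function used in this project. The sample strings are copied from the str() output above, and the malformed value is a hypothetical addition to show what a failed parse looks like.

```r
# Sample strings copied from the birthDate and date columns in the str() output,
# plus one deliberately malformed value (hypothetical) to show a failed parse
raw_dates <- c("3/5/1949 0:00", "6/28/1971 0:00", "4/4/2023 5:01", "not a date")

# Month/day/year hour:minute, interpreted as UTC (an assumption of this sketch)
parsed <- as.POSIXct(raw_dates, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Values that could not be parsed come back as NA
n_failed <- sum(is.na(parsed))
print(format(parsed, "%Y-%m-%d %H:%M"))
print(n_failed)
```

Counting the NA values before and after a conversion like this is a quick way to confirm that no date strings were silently lost.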
# mdy_hm() comes from lubridate; %>%, mutate(), across() and where() come from dplyr
library(lubridate)
library(dplyr)
# Change to date-time format
df_billion$birthDate <- mdy_hm(df_billion$birthDate)
df_billion$date <- mdy_hm(df_billion$date)
# Change all other character variables to factors
df_billion <- df_billion %>%
  mutate(across(where(is.character), as.factor))
# Re-check that the data types have been changed correctly
str(df_billion)
#> 'data.frame': 2640 obs. of 35 variables:
#> $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ finalWorth : int 211000 180000 114000 107000 106000 104000 94500 93000 83400 80700 ...
#> $ category : Factor w/ 18 levels "Automotive","Construction & Engineering",..: 5 1 17 17 6 17 12 18 3 17 ...
#> $ personName : Factor w/ 2638 levels "A. Jayson Adair",..: 222 587 988 1274 2392 239 1564 303 1634 2136 ...
#> $ age : int 74 51 59 78 92 67 81 83 65 67 ...
#> $ country : Factor w/ 79 levels "","Algeria","Andorra",..: 26 76 76 76 76 76 76 45 33 76 ...
#> $ city : Factor w/ 742 levels "","A Coruña",..: 496 29 403 334 478 403 455 411 434 262 ...
#> $ source : Factor w/ 906 levels "3D printing",..: 474 823 29 594 82 516 99 814 228 516 ...
#> $ industries : Factor w/ 18 levels "Automotive","Construction & Engineering",..: 5 1 17 17 6 17 12 18 3 17 ...
#> $ countryOfCitizenship : Factor w/ 77 levels "Algeria","Argentina",..: 23 74 74 74 74 74 74 42 31 74 ...
#> $ organization : Factor w/ 295 levels "","ABC Supply",..: 163 260 8 194 29 33 36 9 213 159 ...
#> $ selfMade : logi FALSE TRUE TRUE TRUE TRUE TRUE ...
#> $ status : Factor w/ 6 levels "D","E","N","R",..: 6 1 1 6 1 1 6 6 1 1 ...
#> $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
#> $ birthDate : POSIXct, format: "1949-03-05 00:00:00" "1971-06-28 00:00:00" ...
#> $ lastName : Factor w/ 1736 levels "Aarnio-Wihuri",..: 69 1049 149 420 216 513 176 1415 43 102 ...
#> $ firstName : Factor w/ 1771 levels "","A. Jayson",..: 143 342 638 855 1561 152 1012 187 1052 1420 ...
#> $ title : Factor w/ 98 levels "","Advisor","Athlete",..: 18 5 21 52 5 36 5 77 68 85 ...
#> $ date : POSIXct, format: "2023-04-04 05:01:00" "2023-04-04 05:01:00" ...
#> $ state : Factor w/ 46 levels "","Alabama","Arizona",..: 1 40 44 10 26 44 30 1 1 44 ...
#> $ residenceStateRegion : Factor w/ 6 levels "","Midwest","Northeast",..: 1 4 6 6 2 6 3 1 1 6 ...
#> $ birthYear : int 1949 1971 1964 1944 1930 1955 1942 1940 1957 1956 ...
#> $ birthMonth : int 3 6 1 8 8 10 2 1 4 3 ...
#> $ birthDay : int 5 28 12 17 30 28 14 28 19 24 ...
#> $ cpi_country : num 110 117 117 117 117 ...
#> $ cpi_change_country : num 1.1 7.5 7.5 7.5 7.5 7.5 7.5 3.6 7.7 7.5 ...
#> $ gdp_country : Factor w/ 69 levels "","$1,119,190,780,753 ",..: 22 26 26 26 26 26 26 3 21 26 ...
#> $ gross_tertiary_education_enrollment : num 65.6 88.2 88.2 88.2 88.2 88.2 88.2 40.2 28.1 88.2 ...
#> $ gross_primary_education_enrollment_country: num 102 102 102 102 102 ...
#> $ life_expectancy_country : num 82.5 78.5 78.5 78.5 78.5 78.5 78.5 75 69.4 78.5 ...
#> $ tax_revenue_country_country : num 24.2 9.6 9.6 9.6 9.6 9.6 9.6 13.1 11.2 9.6 ...
#> $ total_tax_rate_country : num 60.7 36.6 36.6 36.6 36.6 36.6 36.6 55.1 49.7 36.6 ...
#> $ population_country : int 67059887 328239523 328239523 328239523 328239523 328239523 328239523 126014024 1366417754 328239523 ...
#> $ latitude_country : num 46.2 37.1 37.1 37.1 37.1 ...
#> $ longitude_country : num 2.21 -95.71 -95.71 -95.71 -95.71 ...
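One side effect visible in the output above is that gdp_country, a currency-formatted string, becomes a factor with 69 levels and is not treated as a number. If GDP is needed for numeric analysis, a minimal base R sketch of the conversion follows. The helper name `parse_gdp` and the sample vector are hypothetical and not part of the original analysis.

```r
# Hypothetical sample mirroring the gdp_country format shown in the str() output
gdp_raw <- c("$2,715,518,274,227 ", "$21,427,700,000,000 ", "")

# Strip the dollar sign, thousands separators and whitespace, then convert.
# Empty strings become NA. gsub() coerces factors to character first,
# so the helper also accepts a factor column.
parse_gdp <- function(x) {
  cleaned <- gsub("[$,[:space:]]", "", x)
  cleaned[cleaned == ""] <- NA_character_
  as.numeric(cleaned)
}

gdp_num <- parse_gdp(gdp_raw)
print(gdp_num)
```

Applied to the real data, this would look like parse_gdp(df_billion$gdp_country), stored either in place or in a new column.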
# Count the NA values in each column
cols_names <- colnames(df_billion)
cols_NA_values <- colSums(is.na(df_billion))
# Temporary data frame holding each column name and its total number of NA values
temp_cols <- data.frame(cols_names, cols_NA_values)
# Keep only the columns that contain at least one NA value
cols_NA <- temp_cols[!(cols_NA_values == 0),]
cols_NA
The NA values above simply mean that no information is available for the corresponding columns. We will keep those rows regardless, because the other variables still provide useful insights.
Therefore, we will not remove any data from our dataset.
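Note that is.na() does not count empty strings, and the str() output above shows columns such as state and residenceStateRegion containing "" values. The following is a minimal base R sketch, on a hypothetical toy data frame, of counting blanks alongside NA values. `count_missing` is an illustrative helper and not part of the original analysis.

```r
# Hypothetical toy data with both NA values and empty strings
toy <- data.frame(
  state = c("", "Texas", "Washington", ""),
  age   = c(74, NA, 59, 78),
  city  = c("Paris", "Austin", "Medina", "Lanai"),
  stringsAsFactors = FALSE
)

# Count NA values and empty strings per column.
# as.character() makes the comparison work for factor columns too.
count_missing <- function(df) {
  n_na    <- colSums(is.na(df))
  n_blank <- sapply(df, function(col) sum(as.character(col) == "", na.rm = TRUE))
  data.frame(column = names(df), n_na = n_na, n_blank = n_blank,
             row.names = NULL)
}

missing_summary <- count_missing(toy)
print(missing_summary)
```

Running count_missing(df_billion) would show whether blank strings hide additional missing information that the NA check alone does not report.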
Using our dataset df_billion, let us do exploratory visualization on several things:
We will use dplyr functions to count how often each value of the variable country (the country where a billionaire resides) occurs, and show the top 10 countries with the most resident billionaires.
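The counting step described above can also be sketched without dplyr. The example below uses base R on a hypothetical toy vector of residence countries, so the numbers are illustrative only. `top_n_counts` is an assumed helper name.

```r
# Hypothetical toy data: residence country for eight people
toy_country <- c("United States", "China", "United States", "India",
                 "China", "United States", "France", "India")

# Count occurrences of each value and return the n most frequent
top_n_counts <- function(x, n = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  out <- data.frame(value = names(counts),
                    freq  = as.integer(counts),
                    stringsAsFactors = FALSE)
  head(out, n)
}

top3 <- top_n_counts(toy_country, n = 3)
print(top3)
```

The result has one row per value with its frequency, which is the same shape as the summarised table used for plotting.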
# Data Transformation
top10_country <- df_billion %>%
  group_by(country) %>%
  summarise(freq = n()) %>%
  ungroup() %>%
  arrange(-freq) %>%
  head(10) %>%
  # Adding label for tooltip
  mutate(label = glue(
    "Residence Country: {country}\nTotal: {comma(freq)} Billionaires"
  ))
# Making Static Plot
plot1 <- ggplot(data = top10_country,
                aes(x = freq,
                    y = reorder(country, freq),
                    color = freq,
                    text = label)) +
  geom_point(size = 3) +
  geom_segment(aes(x = 0,
                   xend = freq,
                   yend = country),
               size = 1.0) +
  scale_color_gradient(low = "lightblue",
                       high = "darkblue") +
  scale_x_continuous(labels = comma) +
  labs(title = "Top 10 Residence Countries with Most Billionaires",
       x = "Total Billionaires",
       y = "Residence Country") +
  theme_minimal() +
  theme(legend.position = "none")
# Creating interactive plot
ggplotly(plot1, tooltip = "text")
# Data Transformation
top10_industries <- df_billion %>%
  group_by(industries) %>%
  summarise(freq = n()) %>%
  ungroup() %>%
  arrange(-freq) %>%
  head(10) %>%
  # Adding label for tooltip
  mutate(label = glue(
    "Industry: {industries}\nTotal: {comma(freq)} Billionaires"
  ))
# Making Static Plot
plot2 <- ggplot(data = top10_industries,
                aes(x = freq,
                    y = reorder(industries, freq),
                    text = label)) +
  geom_col(mapping = aes(fill = freq)) +
  scale_fill_gradient(low = "lightblue",
                      high = "darkblue") +
  scale_x_continuous(labels = comma) +
  labs(title = "Top 10 Industries with Most Billionaires",
       x = "Total Billionaires",
       y = "Industry Category") +
  theme_minimal()
# Creating interactive plot
ggplotly(plot2, tooltip = "text")
# Data Transformation of Top 50 Ranked as World Billionaires
loca_top50 <- data.frame(rank = as.character(df_billion$rank),
                         Name = df_billion$personName,
                         Assets = df_billion$finalWorth,
                         Residence_Country = df_billion$country,
                         lat = df_billion$latitude_country,
                         lng = df_billion$longitude_country) %>%
  head(50)
# Plot the markers on the world map
# (finalWorth is in millions of U.S. dollars, hence the "M" suffix in the popup)
map <- leaflet()
map <- addTiles(map = map)
addMarkers(map = map, data = loca_top50, popup = glue("<b>Rank = {loca_top50$rank}</b><br>
Name = {loca_top50$Name}<br>
Assets = $ {comma(loca_top50$Assets)} M<br>
Country Residence = {loca_top50$Residence_Country}
"))
There are many more variations and insights that we could draw by combining variables in further visualizations. The visualizations above are only a starting point for more complex analysis. We hope they help introduce the general picture that the dataset offers.
Billionaires Statistics Dataset (2023): (https://www.kaggle.com/datasets/nelgiriyewithana/billionaires-statistics-dataset)
What Is a Billionaire?: (https://www.investopedia.com/terms/b/billionaire.asp)