Exploring the World’s Wealthiest Individuals

Author

Kittim

What is the Global Distribution and Wealth Status of Billionaires Worldwide?

Introduction

In this analysis, I explore a dataset of billionaires worldwide. Its variables describe the economic, demographic, and geographic characteristics of these individuals. Key variables include each billionaire’s rank, net worth, category of wealth, age, country of citizenship, and organization. The data was sourced from Forbes and other financial publications, and the dataset is owned by Nidula Elgiriyewithana. I will clean the data to address missing values, inconsistencies, and outliers. I chose this dataset because I want to understand how wealth is distributed globally and which factors are associated with a person appearing on the billionaire list. Patterns in the data can show how wealth accumulates and how it is distributed across industries, countries, and demographic groups.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggrepel)
Warning: package 'ggrepel' was built under R version 4.3.2
library(plotly)
Warning: package 'plotly' was built under R version 4.3.2

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(viridis)
Warning: package 'viridis' was built under R version 4.3.2
Loading required package: viridisLite
library(RColorBrewer)
library(leaflet)
Warning: package 'leaflet' was built under R version 4.3.2
library(sf)
Warning: package 'sf' was built under R version 4.3.2
Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
getwd()
[1] "C:/Users/mutho/Desktop/Fall 2023/Data 110/DATASETS/PROJECT 2"
billionaires_statistics_dataset <- read_csv("billionaires_statistics_dataset.csv")
Rows: 2640 Columns: 35
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): category, personName, country, city, source, industries, countryOf...
dbl (16): rank, finalWorth, age, birthYear, birthMonth, birthDay, cpi_countr...
lgl  (1): selfMade

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(billionaires_statistics_dataset)
# A tibble: 6 × 35
   rank finalWorth category     personName   age country city  source industries
  <dbl>      <dbl> <chr>        <chr>      <dbl> <chr>   <chr> <chr>  <chr>     
1     1     211000 Fashion & R… Bernard A…    74 France  Paris LVMH   Fashion &…
2     2     180000 Automotive   Elon Musk     51 United… Aust… Tesla… Automotive
3     3     114000 Technology   Jeff Bezos    59 United… Medi… Amazon Technology
4     4     107000 Technology   Larry Ell…    78 United… Lanai Oracle Technology
5     5     106000 Finance & I… Warren Bu…    92 United… Omaha Berks… Finance &…
6     6     104000 Technology   Bill Gates    67 United… Medi… Micro… Technology
# ℹ 26 more variables: countryOfCitizenship <chr>, organization <chr>,
#   selfMade <lgl>, status <chr>, gender <chr>, birthDate <chr>,
#   lastName <chr>, firstName <chr>, title <chr>, date <chr>, state <chr>,
#   residenceStateRegion <chr>, birthYear <dbl>, birthMonth <dbl>,
#   birthDay <dbl>, cpi_country <dbl>, cpi_change_country <dbl>,
#   gdp_country <chr>, gross_tertiary_education_enrollment <dbl>,
#   gross_primary_education_enrollment_country <dbl>, …

Removing the variables that I am not currently using with dplyr’s select() function.

billionaires <- billionaires_statistics_dataset |>
  select(-status, -selfMade, -gross_primary_education_enrollment_country, -gross_tertiary_education_enrollment, -residenceStateRegion, -state, -title, -birthDate, -birthYear, -birthMonth, -birthDay, -date)
head(billionaires) 
# A tibble: 6 × 23
   rank finalWorth category     personName   age country city  source industries
  <dbl>      <dbl> <chr>        <chr>      <dbl> <chr>   <chr> <chr>  <chr>     
1     1     211000 Fashion & R… Bernard A…    74 France  Paris LVMH   Fashion &…
2     2     180000 Automotive   Elon Musk     51 United… Aust… Tesla… Automotive
3     3     114000 Technology   Jeff Bezos    59 United… Medi… Amazon Technology
4     4     107000 Technology   Larry Ell…    78 United… Lanai Oracle Technology
5     5     106000 Finance & I… Warren Bu…    92 United… Omaha Berks… Finance &…
6     6     104000 Technology   Bill Gates    67 United… Medi… Micro… Technology
# ℹ 14 more variables: countryOfCitizenship <chr>, organization <chr>,
#   gender <chr>, lastName <chr>, firstName <chr>, cpi_country <dbl>,
#   cpi_change_country <dbl>, gdp_country <chr>, life_expectancy_country <dbl>,
#   tax_revenue_country_country <dbl>, total_tax_rate_country <dbl>,
#   population_country <dbl>, latitude_country <dbl>, longitude_country <dbl>
## Convert variable names to lowercase
names(billionaires) <- tolower(names(billionaires))

## Replace spaces with underscores in variable names
names(billionaires) <- gsub(" ", "_", names(billionaires))

# Display
head(billionaires)
# A tibble: 6 × 23
   rank finalworth category     personname   age country city  source industries
  <dbl>      <dbl> <chr>        <chr>      <dbl> <chr>   <chr> <chr>  <chr>     
1     1     211000 Fashion & R… Bernard A…    74 France  Paris LVMH   Fashion &…
2     2     180000 Automotive   Elon Musk     51 United… Aust… Tesla… Automotive
3     3     114000 Technology   Jeff Bezos    59 United… Medi… Amazon Technology
4     4     107000 Technology   Larry Ell…    78 United… Lanai Oracle Technology
5     5     106000 Finance & I… Warren Bu…    92 United… Omaha Berks… Finance &…
6     6     104000 Technology   Bill Gates    67 United… Medi… Micro… Technology
# ℹ 14 more variables: countryofcitizenship <chr>, organization <chr>,
#   gender <chr>, lastname <chr>, firstname <chr>, cpi_country <dbl>,
#   cpi_change_country <dbl>, gdp_country <chr>, life_expectancy_country <dbl>,
#   tax_revenue_country_country <dbl>, total_tax_rate_country <dbl>,
#   population_country <dbl>, latitude_country <dbl>, longitude_country <dbl>
glimpse(billionaires)
Rows: 2,640
Columns: 23
$ rank                        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
$ finalworth                  <dbl> 211000, 180000, 114000, 107000, 106000, 10…
$ category                    <chr> "Fashion & Retail", "Automotive", "Technol…
$ personname                  <chr> "Bernard Arnault & family", "Elon Musk", "…
$ age                         <dbl> 74, 51, 59, 78, 92, 67, 81, 83, 65, 67, 69…
$ country                     <chr> "France", "United States", "United States"…
$ city                        <chr> "Paris", "Austin", "Medina", "Lanai", "Oma…
$ source                      <chr> "LVMH", "Tesla, SpaceX", "Amazon", "Oracle…
$ industries                  <chr> "Fashion & Retail", "Automotive", "Technol…
$ countryofcitizenship        <chr> "France", "United States", "United States"…
$ organization                <chr> "LVMH Moët Hennessy Louis Vuitton", "Tesla…
$ gender                      <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M…
$ lastname                    <chr> "Arnault", "Musk", "Bezos", "Ellison", "Bu…
$ firstname                   <chr> "Bernard", "Elon", "Jeff", "Larry", "Warre…
$ cpi_country                 <dbl> 110.05, 117.24, 117.24, 117.24, 117.24, 11…
$ cpi_change_country          <dbl> 1.1, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 3.6, 7.…
$ gdp_country                 <chr> "$2,715,518,274,227", "$21,427,700,000,000…
$ life_expectancy_country     <dbl> 82.5, 78.5, 78.5, 78.5, 78.5, 78.5, 78.5, …
$ tax_revenue_country_country <dbl> 24.2, 9.6, 9.6, 9.6, 9.6, 9.6, 9.6, 13.1, …
$ total_tax_rate_country      <dbl> 60.7, 36.6, 36.6, 36.6, 36.6, 36.6, 36.6, …
$ population_country          <dbl> 67059887, 328239523, 328239523, 328239523,…
$ latitude_country            <dbl> 46.22764, 37.09024, 37.09024, 37.09024, 37…
$ longitude_country           <dbl> 2.213749, -95.712891, -95.712891, -95.7128…
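
The glimpse() output above shows gdp_country stored as text (for example "$2,715,518,274,227"), so it cannot be summed or plotted as a number until it is converted. The following is a minimal sketch of one way to convert it and to count missing values per column. It uses toy rows copied from the output above plus one hypothetical row added only to demonstrate missing values; the names toy, gdp_country_num, and na_counts are illustrative and not part of the original analysis.

```r
library(dplyr)

# Toy rows copied from the output above; the last row is hypothetical
# and only there to demonstrate missing values.
toy <- data.frame(
  personname  = c("Bernard Arnault & family", "Elon Musk", "Example Person"),
  country     = c("France", "United States", NA),
  gdp_country = c("$2,715,518,274,227", "$21,427,700,000,000", NA)
)

# Strip "$" and "," so the GDP string can be parsed as a number
toy <- toy %>%
  mutate(gdp_country_num = as.numeric(gsub("[$,]", "", gdp_country)))

# Count missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

The same two steps could be applied to the full billionaires tibble; the conversion keeps the original text column so the formatted value remains available for labels.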
## Top 20 billionaires based on final net worth 
top_20_billionaires <- billionaires %>%
  arrange(desc(finalworth)) %>%
  slice(1:20)

# Create a bar chart with colors representing the 'category' variable
ggplot(top_20_billionaires, aes(x = reorder(personname, -finalworth), y = finalworth, fill = category)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 20 Billionaires - Bar Chart of Final Net Worth by Industry",
       x = "Billionaires",
       y = "Final Net Worth",
       fill = " ") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Top 20 billionaires based on final net worth
top_20_billionaires <- billionaires %>%
  arrange(desc(finalworth)) %>%
  slice(1:20)

# Create a scatterplot with color showing the country and size showing final worth
ggplot(top_20_billionaires, aes(x = rank, y = finalworth, color = country, size = finalworth, label = firstname)) +
  geom_point() +
  geom_text_repel(box.padding = 0.5, segment.size = 0.2) +
  labs(title = "Top 20 Billionaires by Final Net Worth",
       x = "Rank",
       y = "Final Net Worth",
       color = "Country") +
  theme_minimal()

# Top 20 billionaires based on final net worth
top_20_billionaires <- billionaires %>%
  arrange(desc(finalworth)) %>%
  slice(1:20)

# Define a color palette using RColorBrewer
colors <- brewer.pal(n = length(unique(top_20_billionaires$country)), name = "Set1")

# Creating an interactive scatterplot with color showing the country and size showing final worth
plot_ly(top_20_billionaires, 
        type = "scatter",  ## Explicitly specify trace type
        mode = "markers",  ## Explicitly specify scatter mode
        x = ~rank, 
        y = ~finalworth, 
        color = ~country, 
        colors = colors,  ## color palette
        size = ~finalworth, 
        text = ~paste("Name: ", personname, "<br>Country: ", country, "<br>Rank: ", rank, "<br>Final Worth: ", finalworth),
        hoverinfo = "text",
        marker = list(sizemode = "diameter",
                      sizeref = 1.6)) %>%
layout(title = "Top 20 Billionaires Globally by Final Net Worth",
       xaxis = list(title = "Rank"),
       yaxis = list(title = "Final Net Worth (millions of USD)"),
       showlegend = TRUE,
       margin = list(t = 100))  # Adjust the top margin value
Warning: `line.width` does not currently support multiple values.
top_20_billionaires <- billionaires %>%
  arrange(desc(finalworth)) %>%
  slice(1:20)
head(top_20_billionaires)
# A tibble: 6 × 23
   rank finalworth category     personname   age country city  source industries
  <dbl>      <dbl> <chr>        <chr>      <dbl> <chr>   <chr> <chr>  <chr>     
1     1     211000 Fashion & R… Bernard A…    74 France  Paris LVMH   Fashion &…
2     2     180000 Automotive   Elon Musk     51 United… Aust… Tesla… Automotive
3     3     114000 Technology   Jeff Bezos    59 United… Medi… Amazon Technology
4     4     107000 Technology   Larry Ell…    78 United… Lanai Oracle Technology
5     5     106000 Finance & I… Warren Bu…    92 United… Omaha Berks… Finance &…
6     6     104000 Technology   Bill Gates    67 United… Medi… Micro… Technology
# ℹ 14 more variables: countryofcitizenship <chr>, organization <chr>,
#   gender <chr>, lastname <chr>, firstname <chr>, cpi_country <dbl>,
#   cpi_change_country <dbl>, gdp_country <chr>, life_expectancy_country <dbl>,
#   tax_revenue_country_country <dbl>, total_tax_rate_country <dbl>,
#   population_country <dbl>, latitude_country <dbl>, longitude_country <dbl>
## Group data by country and calculate counts
total_billionaires_by_country <- billionaires %>%
  group_by(country, longitude_country, latitude_country) %>%
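  mutate(
    total_billionaire_count = sum(!is.na(finalworth)),
    total_population = first(population_country),  ### country-level value, repeated on every row of the group
    total_gdp = first(gdp_country)                 ### country-level value, repeated on every row of the group
  ) %>%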
  distinct(country, longitude_country, latitude_country, .keep_all = TRUE)

## Create the Leaflet map
global_billionaires_map <- leaflet(data = total_billionaires_by_country) %>%
  setView(lng = 87.5, lat = 34.5, zoom = 1.4)  ### Adjust the central coordinates and zoom level

## Add OpenStreetMap tiles
global_billionaires_map <- global_billionaires_map %>%
  addTiles()

## Add circle markers for each country
global_billionaires_map <- global_billionaires_map %>%
  addCircleMarkers(
    lng = ~longitude_country,   ### 'longitude_country' contains longitude information
    lat = ~latitude_country,    ### 'latitude_country' contains latitude information
    radius = ~sqrt(total_billionaire_count) * 0.5,  ### Adjust the radius based on the billionaire count
    color = "darkgreen",
    fillOpacity = 0.5,
    popup = ~paste(
      "Country: ", country,
      "<br>Billionaire Count: ", total_billionaire_count,
      "<br>Population: ", total_population,
      "<br>GDP: ", total_gdp
    ),
    label = ~paste(country, ", ", total_billionaire_count, " billionaires, Pop: ", population_country, ", GDP: ", gdp_country)
  )
Warning in validateCoords(lng, lat, funcName): Data contains 11 rows with
either missing or invalid lat/lon values and will be ignored
## Display the map
global_billionaires_map

The bar chart shows the top 20 billionaires by final net worth. Each bar represents one billionaire, and its height indicates that person’s final net worth. The bars are colored by the ‘category’ variable, which gives a quick view of the industries behind the wealth of the top billionaires.
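
The industry comparison that the bar colors suggest can also be put into numbers. The following is a minimal sketch, assuming a small toy table built from rows shown in the head() output above; toy_top and worth_by_category are illustrative names, and the totals cover only these four rows, not the full top 20.

```r
library(dplyr)

# Toy rows copied from the head() output above (finalworth as stored in the dataset)
toy_top <- data.frame(
  personname = c("Bernard Arnault & family", "Elon Musk", "Jeff Bezos", "Bill Gates"),
  category   = c("Fashion & Retail", "Automotive", "Technology", "Technology"),
  finalworth = c(211000, 180000, 114000, 104000)
)

# Total net worth and head count per category, largest total first
worth_by_category <- toy_top %>%
  group_by(category) %>%
  summarise(
    billionaires = n(),
    total_worth  = sum(finalworth),
    .groups = "drop"
  ) %>%
  arrange(desc(total_worth))

worth_by_category
```

Replacing toy_top with top_20_billionaires would give the same summary for the data behind the chart.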

The scatterplots show the same top 20 billionaires. The x-axis is their rank, the y-axis is their final net worth, and each point is colored by country and sized by final net worth. In the interactive plotly version, hovering over a point reveals the billionaire’s name, country, rank, and final worth.
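
The net worth values in the hover text are easier to read in billions. The following is a small sketch under the assumption that finalworth is recorded in millions of USD, which matches the value 211000 for the top-ranked billionaire; worth_billion and hover_text are hypothetical column names introduced here for illustration.

```r
library(dplyr)

# Toy rows copied from the head() output above
toy_top <- data.frame(
  personname = c("Bernard Arnault & family", "Elon Musk"),
  rank       = c(1, 2),
  finalworth = c(211000, 180000)
)

# Assumption: finalworth is in millions of USD, so dividing by 1000 gives billions
toy_top <- toy_top %>%
  mutate(
    worth_billion = finalworth / 1000,
    hover_text = paste0(
      "Name: ", personname,
      "<br>Rank: ", rank,
      "<br>Final Worth: $", worth_billion, "B"
    )
  )

toy_top$hover_text
```

A column built this way could be passed to the text argument of plot_ly() in place of the inline paste() call.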

Finally, the map displays the distribution of billionaires around the world. Each country is represented by a dark green circle marker whose radius grows with the square root of the number of billionaires in that country. Hovering over a circle shows a label with the country, its count of billionaires, its population, and its GDP, and clicking a circle opens a popup. This interactive map offers a spatial perspective on the concentration of billionaires globally.
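
The per-country figures behind the map can also be computed as a one-row-per-country table. The following is a minimal sketch using summarise() on toy rows copied from the output above; it is an alternative formulation rather than the code used for the map, and toy, by_country, and marker_radius are illustrative names.

```r
library(dplyr)

# Toy rows copied from the head() and glimpse() output above
toy <- data.frame(
  personname         = c("Bernard Arnault & family", "Elon Musk", "Jeff Bezos", "Bill Gates"),
  country            = c("France", "United States", "United States", "United States"),
  population_country = c(67059887, 328239523, 328239523, 328239523),
  latitude_country   = c(46.22764, 37.09024, 37.09024, 37.09024),
  longitude_country  = c(2.213749, -95.712891, -95.712891, -95.712891)
)

# One row per country: billionaire count, country-level population,
# and the marker radius formula used on the map
by_country <- toy %>%
  group_by(country, latitude_country, longitude_country) %>%
  summarise(
    billionaire_count = n(),
    population        = first(population_country),
    .groups = "drop"
  ) %>%
  mutate(marker_radius = sqrt(billionaire_count) * 0.5) %>%
  arrange(desc(billionaire_count))

by_country
```

Because population is a country-level value repeated on every row, first() is enough to carry it into the summary.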