Introduction

My data set comes from the CORGIS Dataset Project and covers billionaires around the world across several years. I focus specifically on the year 2014. The data set has many variables, but the main ones I used are year, company_sector, demographics_age, location_region, and wealth_worth_in_billions. year is the year the row describes, company_sector is the industry the billionaire's company operates in, demographics_age is the age of the billionaire, location_region is the region of the world where the billionaire lives, and wealth_worth_in_billions is how much the billionaire is worth, in billions. I plan to explore the relationship between age and wealth among technology billionaires, as well as how total billionaire wealth differs by region, in the year 2014.

Loading Libraries & Data Set & Observing The Structure

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
 
setwd("~/Documents/EC/Spring 2026/DATA 110/Project 1")
 
billionares <- read_csv("billionaires.csv")
## Rows: 2614 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): name, company.name, company.relationship, company.sector, company....
## dbl  (6): rank, year, company.founded, demographics.age, location.gdp, wealt...
## lgl  (3): wealth.how.from emerging, wealth.how.was founder, wealth.how.was p...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(billionares)
## spc_tbl_ [2,614 × 22] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ name                    : chr [1:2614] "Bill Gates" "Bill Gates" "Bill Gates" "Warren Buffett" ...
##  $ rank                    : num [1:2614] 1 1 1 2 2 2 3 3 3 4 ...
##  $ year                    : num [1:2614] 1996 2001 2014 1996 2001 ...
##  $ company.founded         : num [1:2614] 1975 1975 1975 1962 1962 ...
##  $ company.name            : chr [1:2614] "Microsoft" "Microsoft" "Microsoft" "Berkshire Hathaway" ...
##  $ company.relationship    : chr [1:2614] "founder" "founder" "founder" "founder" ...
##  $ company.sector          : chr [1:2614] "Software" "Software" "Software" "Finance" ...
##  $ company.type            : chr [1:2614] "new" "new" "new" "new" ...
##  $ demographics.age        : num [1:2614] 40 45 58 65 70 74 0 48 77 68 ...
##  $ demographics.gender     : chr [1:2614] "male" "male" "male" "male" ...
##  $ location.citizenship    : chr [1:2614] "United States" "United States" "United States" "United States" ...
##  $ location.country code   : chr [1:2614] "USA" "USA" "USA" "USA" ...
##  $ location.gdp            : num [1:2614] 8.10e+12 1.06e+13 0.00 8.10e+12 1.06e+13 ...
##  $ location.region         : chr [1:2614] "North America" "North America" "North America" "North America" ...
##  $ wealth.type             : chr [1:2614] "founder non-finance" "founder non-finance" "founder non-finance" "founder non-finance" ...
##  $ wealth.worth in billions: num [1:2614] 18.5 58.7 76 15 32.3 72 13.1 30.4 64 12.7 ...
##  $ wealth.how.category     : chr [1:2614] "New Sectors" "New Sectors" "New Sectors" "Traded Sectors" ...
##  $ wealth.how.from emerging: logi [1:2614] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ wealth.how.industry     : chr [1:2614] "Technology-Computer" "Technology-Computer" "Technology-Computer" "Consumer" ...
##  $ wealth.how.inherited    : chr [1:2614] "not inherited" "not inherited" "not inherited" "not inherited" ...
##  $ wealth.how.was founder  : logi [1:2614] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ wealth.how.was political: logi [1:2614] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   name = col_character(),
##   ..   rank = col_double(),
##   ..   year = col_double(),
##   ..   company.founded = col_double(),
##   ..   company.name = col_character(),
##   ..   company.relationship = col_character(),
##   ..   company.sector = col_character(),
##   ..   company.type = col_character(),
##   ..   demographics.age = col_double(),
##   ..   demographics.gender = col_character(),
##   ..   location.citizenship = col_character(),
##   ..   `location.country code` = col_character(),
##   ..   location.gdp = col_double(),
##   ..   location.region = col_character(),
##   ..   wealth.type = col_character(),
##   ..   `wealth.worth in billions` = col_double(),
##   ..   wealth.how.category = col_character(),
##   ..   `wealth.how.from emerging` = col_logical(),
##   ..   wealth.how.industry = col_character(),
##   ..   wealth.how.inherited = col_character(),
##   ..   `wealth.how.was founder` = col_logical(),
##   ..   `wealth.how.was political` = col_logical()
##   .. )
##  - attr(*, "problems")=<externalptr>
head(billionares)
## # A tibble: 6 × 22
##   name              rank  year company.founded company.name company.relationship
##   <chr>            <dbl> <dbl>           <dbl> <chr>        <chr>               
## 1 Bill Gates           1  1996            1975 Microsoft    founder             
## 2 Bill Gates           1  2001            1975 Microsoft    founder             
## 3 Bill Gates           1  2014            1975 Microsoft    founder             
## 4 Warren Buffett       2  1996            1962 Berkshire H… founder             
## 5 Warren Buffett       2  2001            1962 Berkshire H… founder             
## 6 Carlos Slim Helu     2  2014            1990 Telmex       founder             
## # ℹ 16 more variables: company.sector <chr>, company.type <chr>,
## #   demographics.age <dbl>, demographics.gender <chr>,
## #   location.citizenship <chr>, `location.country code` <chr>,
## #   location.gdp <dbl>, location.region <chr>, wealth.type <chr>,
## #   `wealth.worth in billions` <dbl>, wealth.how.category <chr>,
## #   `wealth.how.from emerging` <lgl>, wealth.how.industry <chr>,
## #   wealth.how.inherited <chr>, `wealth.how.was founder` <lgl>, …

Cleaning The Data Set

names(billionares) <- gsub("[(). \\-]", "_", names(billionares))
names(billionares) <- gsub("_$", "", names(billionares))
names(billionares) <- tolower(names(billionares))

head(billionares)
## # A tibble: 6 × 22
##   name              rank  year company_founded company_name company_relationship
##   <chr>            <dbl> <dbl>           <dbl> <chr>        <chr>               
## 1 Bill Gates           1  1996            1975 Microsoft    founder             
## 2 Bill Gates           1  2001            1975 Microsoft    founder             
## 3 Bill Gates           1  2014            1975 Microsoft    founder             
## 4 Warren Buffett       2  1996            1962 Berkshire H… founder             
## 5 Warren Buffett       2  2001            1962 Berkshire H… founder             
## 6 Carlos Slim Helu     2  2014            1990 Telmex       founder             
## # ℹ 16 more variables: company_sector <chr>, company_type <chr>,
## #   demographics_age <dbl>, demographics_gender <chr>,
## #   location_citizenship <chr>, location_country_code <chr>,
## #   location_gdp <dbl>, location_region <chr>, wealth_type <chr>,
## #   wealth_worth_in_billions <dbl>, wealth_how_category <chr>,
## #   wealth_how_from_emerging <lgl>, wealth_how_industry <chr>,
## #   wealth_how_inherited <chr>, wealth_how_was_founder <lgl>, …
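
As a quick check of the renaming step above, the same three substitutions can be run on a few column names copied from the raw file. This is only a sketch: raw_names and clean_names are names introduced here for illustration, not objects from the analysis.

```r
# A few raw column names copied from the str() output of the raw data
raw_names <- c("company.sector", "location.country code", "wealth.worth in billions")

# Same three steps as the cleaning code: replace punctuation and spaces
# with "_", drop any trailing "_", then lowercase everything
clean_names <- gsub("[(). \\-]", "_", raw_names)
clean_names <- gsub("_$", "", clean_names)
clean_names <- tolower(clean_names)

print(clean_names)
```

Each period or space becomes an underscore, so the cleaned names can be typed in filter() and aes() without backticks.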

Filtering The Data Set To Technology Companies In The Year 2014 (The Most Recent Year, With The Most Observations) & Removing Demographic Ages of 0

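billionares_2014_tech <- billionares |>
  filter(year == 2014) |>                    # year is numeric, so compare it to a number
  filter(company_sector == "technology") |>  # keep only technology companies
  filter(demographics_age != 0)              # drop rows where the age is recorded as 0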
billionares_2014_tech
## # A tibble: 24 × 22
##    name             rank  year company_founded company_name company_relationship
##    <chr>           <dbl> <dbl>           <dbl> <chr>        <chr>               
##  1 Larry Page         17  2014            1998 Google       founder             
##  2 Jeff Bezos         18  2014            1995 Amazon       founder             
##  3 Sergey Brin        19  2014            1998 Google       founder             
##  4 Mark Zuckerberg    21  2014            2004 Facebook     founder             
##  5 Steve Ballmer      36  2014            1975 Microsoft    CEO                 
##  6 Masayoshi Son      42  2014            1981 Softbank     founder/CEO         
##  7 Michael Dell       48  2014            1984 Dell         founder             
##  8 Paul Allen         56  2014            1975 Microsoft    founder             
##  9 Laurene Powell…    73  2014            1976 Apple        relation            
## 10 Shiv Nadar        102  2014            1976 HCL          founder             
## # ℹ 14 more rows
## # ℹ 16 more variables: company_sector <chr>, company_type <chr>,
## #   demographics_age <dbl>, demographics_gender <chr>,
## #   location_citizenship <chr>, location_country_code <chr>,
## #   location_gdp <dbl>, location_region <chr>, wealth_type <chr>,
## #   wealth_worth_in_billions <dbl>, wealth_how_category <chr>,
## #   wealth_how_from_emerging <lgl>, wealth_how_industry <chr>, …

Scatterplot of Age and Wealth in Billions of Billionaires With Technology Companies in 2014

scatterplot_age_wealth <- ggplot(billionares_2014_tech, aes(x = demographics_age, y = wealth_worth_in_billions)) +
  labs(title = "Correlation Between Age and Wealth Worth in Billions of Technology Billionaires in 2014",
       caption = "Source: CORGIS Dataset Project",
       x = "Age of Billionaires",
       y = "Wealth Worth (Billions)") +
  theme_minimal(base_size = 8) +
  geom_smooth(method = "lm", formula = y ~ x)  # fitted regression line with its confidence band
scatterplot_age_wealth + geom_point(size = 1)  # add one point per billionaire

Linear Model of Age and Wealth in Billions of Billionaires With Technology Companies in 2014

lm_wealth_age <- lm(wealth_worth_in_billions ~ demographics_age, data = billionares_2014_tech)
summary(lm_wealth_age)
## 
## Call:
## lm(formula = wealth_worth_in_billions ~ demographics_age, data = billionares_2014_tech)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.2166  -9.0618   0.0035   5.7859  17.8756 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       35.0462     8.7664   3.998 0.000606 ***
## demographics_age  -0.4184     0.1552  -2.695 0.013218 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.163 on 22 degrees of freedom
## Multiple R-squared:  0.2482, Adjusted R-squared:  0.2141 
## F-statistic: 7.264 on 1 and 22 DF,  p-value: 0.01322
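
To help read the summary above, the reported numbers can be turned into a correlation coefficient and an approximate 95% confidence interval for the slope. This is a sketch that re-types the values printed by summary() rather than refitting the model; the variable names are introduced here for illustration.

```r
# Values copied from the summary() output above
slope     <- -0.4184  # estimate for demographics_age
std_error <- 0.1552   # standard error of that estimate
r_squared <- 0.2482   # multiple R-squared
resid_df  <- 22       # residual degrees of freedom

# In a one-predictor linear model, the correlation is the square root of
# R-squared, carrying the sign of the slope
r <- sign(slope) * sqrt(r_squared)

# 95% confidence interval for the slope: estimate +/- t critical value * SE
t_crit   <- qt(0.975, df = resid_df)
slope_ci <- slope + c(-1, 1) * t_crit * std_error

print(round(r, 3))
print(round(slope_ci, 3))
```

The correlation comes out at roughly -0.5 and the interval stays entirely below zero, which agrees with the p-value of 0.013 for the slope. With the fitted model in memory, confint(lm_wealth_age) should give the interval directly.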

Filtering the Data Set for My Bar Graph Visualization By Focusing on the Year 2014 and Removing Ages Equal to 0

billionares_2014 <- billionares |>
  filter(year == 2014) |>        # year is numeric, so compare it to a number
  filter(demographics_age != 0)  # drop rows where the age is recorded as 0
billionares_2014
## # A tibble: 1,590 × 22
##    name             rank  year company_founded company_name company_relationship
##    <chr>           <dbl> <dbl>           <dbl> <chr>        <chr>               
##  1 Bill Gates          1  2014            1975 Microsoft    founder             
##  2 Carlos Slim He…     2  2014            1990 Telmex       founder             
##  3 Amancio Ortega      3  2014            1975 Zara         founder             
##  4 Warren Buffett      4  2014            1839 Berkshire H… founder             
##  5 Larry Ellison       5  2014            1977 Oracle       founder             
##  6 Charles Koch        6  2014            1940 Koch indust… relation            
##  7 David Koch          6  2014            1940 Koch indust… relation            
##  8 Sheldon Adelson     8  2014            1952 Las Vegas S… founder             
##  9 Christy Walton      9  2014            1962 Walmart      relation            
## 10 Jim Walton         10  2014            1962 Walmart      relation            
## # ℹ 1,580 more rows
## # ℹ 16 more variables: company_sector <chr>, company_type <chr>,
## #   demographics_age <dbl>, demographics_gender <chr>,
## #   location_citizenship <chr>, location_country_code <chr>,
## #   location_gdp <dbl>, location_region <chr>, wealth_type <chr>,
## #   wealth_worth_in_billions <dbl>, wealth_how_category <chr>,
## #   wealth_how_from_emerging <lgl>, wealth_how_industry <chr>, …

Bar Graph of Wealth Worth in Billions of Billionaires Based on Region

# geom_col() stacks one segment per billionaire, so each bar's length is the region's total wealth
ggplot(data = billionares_2014, aes(x = location_region, y = wealth_worth_in_billions, fill = location_region)) +
  coord_flip() +
  geom_col(alpha = 0.5) +
  labs(x = "Region", y = "Wealth Worth in Billions",
       title = "Wealth Worth in Billions of Billionaires Based on Region",
       caption = "From the CORGIS Dataset Project")
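
Because geom_col() stacks one segment per billionaire, the bar lengths in the graph above are regional totals. An alternative sketch is to compute those totals explicitly with group_by() and summarise() before plotting, which also makes the numbers available as a table. The small data frame below is made up purely for illustration and is not taken from billionaires.csv; toy_billionaires, region_totals, and region_plot are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Made-up rows for illustration only (not real values from the data set)
toy_billionaires <- tibble(
  location_region = c("North America", "North America", "Europe", "East Asia"),
  wealth_worth_in_billions = c(10, 5, 4, 3)
)

# One row per region with the summed wealth
region_totals <- toy_billionaires |>
  group_by(location_region) |>
  summarise(total_wealth = sum(wealth_worth_in_billions), .groups = "drop")

print(region_totals)

# Bar graph of the totals, with regions ordered by total wealth
region_plot <- ggplot(region_totals,
                      aes(x = reorder(location_region, total_wealth), y = total_wealth)) +
  geom_col() +
  coord_flip() +
  labs(x = "Region", y = "Total Wealth (Billions)")
```

Printing region_plot draws the chart. Swapping toy_billionaires for billionares_2014 would apply the same steps to the filtered 2014 data.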

Conclusion

I mainly cleaned my data set by removing the age values of 0 and replacing the “.” and spaces in variable names with “_”. I filtered two data sets from the main one using the dplyr library, one for the linear model and one for the bar graph. The scatter plot shows the relationship between the age of technology billionaires and their wealth in billions. The fitted slope of -0.418 means that each additional year of age is associated with about 0.42 billion less wealth, and with a p-value of 0.013 this negative relationship is statistically significant at the 0.05 level. However, with an adjusted R-squared of 0.214, age explains only about a fifth of the variation in wealth, and the model is based on just 24 billionaires, so the evidence for a relationship is modest rather than strong. My second visualization is a bar graph of total wealth in billions by the region where the billionaires live. In this visualization, we can clearly see the total wealth of each region, showing that North America is the richest region for billionaires in 2014. This does not surprise me, as I believe many companies and billionaires are established and living in North America. I wish I could have studied all billionaires for my linear model, not just technology billionaires, but unfortunately the graph would look too cluttered if I were to do so.