My data set comes from the CORGIS Dataset Project and describes billionaires around the world across several different years. I focus specifically on the year 2014. The data set has many variables, but the main ones I used are year, company_sector, demographics_age, location_region, and wealth_worth_in_billions. Year is the year that the row describes, company_sector is the industry sector of the billionaire's company, demographics_age is the billionaire's age, location_region is the world region (such as North America) where the billionaire is located, and wealth_worth_in_billions is the billionaire's net worth in billions. I plan to explore the relationship between age and wealth among technology-sector billionaires, as well as how total billionaire wealth differs by region, in 2014.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
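# dplyr and ggplot2 are already attached by library(tidyverse) above, so no separate library() calls are needed
# setwd() uses a path specific to the author's machine; change it before running elsewhere
setwd("~/Documents/EC/Spring 2026/DATA 110/Project 1")
billionares <- read_csv("billionaires.csv")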
## Rows: 2614 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): name, company.name, company.relationship, company.sector, company....
## dbl (6): rank, year, company.founded, demographics.age, location.gdp, wealt...
## lgl (3): wealth.how.from emerging, wealth.how.was founder, wealth.how.was p...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(billionares)
## spc_tbl_ [2,614 × 22] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ name : chr [1:2614] "Bill Gates" "Bill Gates" "Bill Gates" "Warren Buffett" ...
## $ rank : num [1:2614] 1 1 1 2 2 2 3 3 3 4 ...
## $ year : num [1:2614] 1996 2001 2014 1996 2001 ...
## $ company.founded : num [1:2614] 1975 1975 1975 1962 1962 ...
## $ company.name : chr [1:2614] "Microsoft" "Microsoft" "Microsoft" "Berkshire Hathaway" ...
## $ company.relationship : chr [1:2614] "founder" "founder" "founder" "founder" ...
## $ company.sector : chr [1:2614] "Software" "Software" "Software" "Finance" ...
## $ company.type : chr [1:2614] "new" "new" "new" "new" ...
## $ demographics.age : num [1:2614] 40 45 58 65 70 74 0 48 77 68 ...
## $ demographics.gender : chr [1:2614] "male" "male" "male" "male" ...
## $ location.citizenship : chr [1:2614] "United States" "United States" "United States" "United States" ...
## $ location.country code : chr [1:2614] "USA" "USA" "USA" "USA" ...
## $ location.gdp : num [1:2614] 8.10e+12 1.06e+13 0.00 8.10e+12 1.06e+13 ...
## $ location.region : chr [1:2614] "North America" "North America" "North America" "North America" ...
## $ wealth.type : chr [1:2614] "founder non-finance" "founder non-finance" "founder non-finance" "founder non-finance" ...
## $ wealth.worth in billions: num [1:2614] 18.5 58.7 76 15 32.3 72 13.1 30.4 64 12.7 ...
## $ wealth.how.category : chr [1:2614] "New Sectors" "New Sectors" "New Sectors" "Traded Sectors" ...
## $ wealth.how.from emerging: logi [1:2614] TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ wealth.how.industry : chr [1:2614] "Technology-Computer" "Technology-Computer" "Technology-Computer" "Consumer" ...
## $ wealth.how.inherited : chr [1:2614] "not inherited" "not inherited" "not inherited" "not inherited" ...
## $ wealth.how.was founder : logi [1:2614] TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ wealth.how.was political: logi [1:2614] TRUE TRUE TRUE TRUE TRUE TRUE ...
## - attr(*, "spec")=
## .. cols(
## .. name = col_character(),
## .. rank = col_double(),
## .. year = col_double(),
## .. company.founded = col_double(),
## .. company.name = col_character(),
## .. company.relationship = col_character(),
## .. company.sector = col_character(),
## .. company.type = col_character(),
## .. demographics.age = col_double(),
## .. demographics.gender = col_character(),
## .. location.citizenship = col_character(),
## .. `location.country code` = col_character(),
## .. location.gdp = col_double(),
## .. location.region = col_character(),
## .. wealth.type = col_character(),
## .. `wealth.worth in billions` = col_double(),
## .. wealth.how.category = col_character(),
## .. `wealth.how.from emerging` = col_logical(),
## .. wealth.how.industry = col_character(),
## .. wealth.how.inherited = col_character(),
## .. `wealth.how.was founder` = col_logical(),
## .. `wealth.how.was political` = col_logical()
## .. )
## - attr(*, "problems")=<externalptr>
head(billionares)
## # A tibble: 6 × 22
## name rank year company.founded company.name company.relationship
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 Bill Gates 1 1996 1975 Microsoft founder
## 2 Bill Gates 1 2001 1975 Microsoft founder
## 3 Bill Gates 1 2014 1975 Microsoft founder
## 4 Warren Buffett 2 1996 1962 Berkshire H… founder
## 5 Warren Buffett 2 2001 1962 Berkshire H… founder
## 6 Carlos Slim Helu 2 2014 1990 Telmex founder
## # ℹ 16 more variables: company.sector <chr>, company.type <chr>,
## # demographics.age <dbl>, demographics.gender <chr>,
## # location.citizenship <chr>, `location.country code` <chr>,
## # location.gdp <dbl>, location.region <chr>, wealth.type <chr>,
## # `wealth.worth in billions` <dbl>, wealth.how.category <chr>,
## # `wealth.how.from emerging` <lgl>, wealth.how.industry <chr>,
## # wealth.how.inherited <chr>, `wealth.how.was founder` <lgl>, …
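# Replace parentheses, periods, spaces, and hyphens in column names with underscores
names(billionares) <- gsub("[(). \\-]", "_", names(billionares))
# Drop any trailing underscore left behind
names(billionares) <- gsub("_$", "", names(billionares))
# Convert all column names to lower case
names(billionares) <- tolower(names(billionares))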
head(billionares)
## # A tibble: 6 × 22
## name rank year company_founded company_name company_relationship
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 Bill Gates 1 1996 1975 Microsoft founder
## 2 Bill Gates 1 2001 1975 Microsoft founder
## 3 Bill Gates 1 2014 1975 Microsoft founder
## 4 Warren Buffett 2 1996 1962 Berkshire H… founder
## 5 Warren Buffett 2 2001 1962 Berkshire H… founder
## 6 Carlos Slim Helu 2 2014 1990 Telmex founder
## # ℹ 16 more variables: company_sector <chr>, company_type <chr>,
## # demographics_age <dbl>, demographics_gender <chr>,
## # location_citizenship <chr>, location_country_code <chr>,
## # location_gdp <dbl>, location_region <chr>, wealth_type <chr>,
## # wealth_worth_in_billions <dbl>, wealth_how_category <chr>,
## # wealth_how_from_emerging <lgl>, wealth_how_industry <chr>,
## # wealth_how_inherited <chr>, wealth_how_was_founder <lgl>, …
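The three renaming steps above can be collected into one small helper, which makes the rule easy to reuse or check on a few sample names. This is a minimal base R sketch; `clean_column_names` is a hypothetical name, not a function from any package.

```r
# Hypothetical helper that applies the same three renaming steps used above
clean_column_names <- function(x) {
  x <- gsub("[(). \\-]", "_", x)  # parentheses, periods, spaces, hyphens -> "_"
  x <- gsub("_$", "", x)          # drop a trailing underscore
  tolower(x)                      # lower-case everything
}

clean_column_names(c("company.sector", "location.country code", "wealth.worth in billions"))
# → "company_sector" "location_country_code" "wealth_worth_in_billions"
```

Running `names(billionares) <- clean_column_names(names(billionares))` should produce the same names as the three separate lines.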
# Keep 2014 technology-sector billionaires; an age of 0 marks a missing age, so drop those rows
billionares_2014_tech <- billionares |>
  filter(year == 2014,
         company_sector == "technology",
         demographics_age != 0)
billionares_2014_tech
## # A tibble: 24 × 22
## name rank year company_founded company_name company_relationship
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 Larry Page 17 2014 1998 Google founder
## 2 Jeff Bezos 18 2014 1995 Amazon founder
## 3 Sergey Brin 19 2014 1998 Google founder
## 4 Mark Zuckerberg 21 2014 2004 Facebook founder
## 5 Steve Ballmer 36 2014 1975 Microsoft CEO
## 6 Masayoshi Son 42 2014 1981 Softbank founder/CEO
## 7 Michael Dell 48 2014 1984 Dell founder
## 8 Paul Allen 56 2014 1975 Microsoft founder
## 9 Laurene Powell… 73 2014 1976 Apple relation
## 10 Shiv Nadar 102 2014 1976 HCL founder
## # ℹ 14 more rows
## # ℹ 16 more variables: company_sector <chr>, company_type <chr>,
## # demographics_age <dbl>, demographics_gender <chr>,
## # location_citizenship <chr>, location_country_code <chr>,
## # location_gdp <dbl>, location_region <chr>, wealth_type <chr>,
## # wealth_worth_in_billions <dbl>, wealth_how_category <chr>,
## # wealth_how_from_emerging <lgl>, wealth_how_industry <chr>, …
# Scatter plot of age vs. net worth with a least-squares line and its 95% confidence band
scatterplot_age_wealth <- ggplot(billionares_2014_tech,
                                 aes(x = demographics_age, y = wealth_worth_in_billions)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  geom_point(size = 1) +
  labs(title = "Age vs. Net Worth of Technology Billionaires in 2014",
       caption = "Source: CORGIS Dataset Project",
       x = "Age (years)",
       y = "Net Worth (Billions)") +
  theme_minimal(base_size = 8)
scatterplot_age_wealth
lm_wealth_age <- lm(wealth_worth_in_billions ~ demographics_age, data = billionares_2014_tech)
summary(lm_wealth_age)
##
## Call:
## lm(formula = wealth_worth_in_billions ~ demographics_age, data = billionares_2014_tech)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.2166 -9.0618 0.0035 5.7859 17.8756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.0462 8.7664 3.998 0.000606 ***
## demographics_age -0.4184 0.1552 -2.695 0.013218 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.163 on 22 degrees of freedom
## Multiple R-squared: 0.2482, Adjusted R-squared: 0.2141
## F-statistic: 7.264 on 1 and 22 DF, p-value: 0.01322
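The key numbers in the summary above are linked by standard formulas for simple linear regression, so they can be checked by hand. The sketch below plugs the rounded values from the printed output (24 billionaires, one predictor) into those formulas using base R; the variable names are illustrative only.

```r
# Rounded values copied from summary(lm_wealth_age)
n     <- 24       # observations: 22 residual degrees of freedom + 2 coefficients
k     <- 1        # number of predictors (demographics_age)
r2    <- 0.2482   # multiple R-squared
slope <- -0.4184  # estimated change in net worth (billions) per year of age
se    <- 0.1552   # standard error of the slope

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With one predictor, R-squared is the squared Pearson correlation,
# and the correlation takes the sign of the slope
r <- sign(slope) * sqrt(r2)

# t statistic for the slope and its two-sided p-value on n - k - 1 df
t_stat  <- slope / se
p_value <- 2 * pt(-abs(t_stat), df = n - k - 1)

# In simple regression the overall F statistic equals t squared
f_stat <- t_stat^2

round(c(adj_r2 = adj_r2, r = r, t = t_stat, p = p_value, F = f_stat), 4)
```

These work out to roughly 0.214, -0.498, -2.70, 0.013, and 7.27, which match the printed summary up to rounding. The correlation of about -0.5 and the significant slope describe a moderate negative association. The R-squared shows that age still leaves most of the variation in wealth unexplained.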
# All 2014 billionaires with a recorded (non-zero) age
billionares_2014 <- billionares |>
  filter(year == 2014, demographics_age != 0)
billionares_2014
## # A tibble: 1,590 × 22
## name rank year company_founded company_name company_relationship
## <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 Bill Gates 1 2014 1975 Microsoft founder
## 2 Carlos Slim He… 2 2014 1990 Telmex founder
## 3 Amancio Ortega 3 2014 1975 Zara founder
## 4 Warren Buffett 4 2014 1839 Berkshire H… founder
## 5 Larry Ellison 5 2014 1977 Oracle founder
## 6 Charles Koch 6 2014 1940 Koch indust… relation
## 7 David Koch 6 2014 1940 Koch indust… relation
## 8 Sheldon Adelson 8 2014 1952 Las Vegas S… founder
## 9 Christy Walton 9 2014 1962 Walmart relation
## 10 Jim Walton 10 2014 1962 Walmart relation
## # ℹ 1,580 more rows
## # ℹ 16 more variables: company_sector <chr>, company_type <chr>,
## # demographics_age <dbl>, demographics_gender <chr>,
## # location_citizenship <chr>, location_country_code <chr>,
## # location_gdp <dbl>, location_region <chr>, wealth_type <chr>,
## # wealth_worth_in_billions <dbl>, wealth_how_category <chr>,
## # wealth_how_from_emerging <lgl>, wealth_how_industry <chr>, …
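# geom_col() stacks one segment per billionaire, so each bar's length is the region's total net worth;
# the fill legend would only repeat the axis labels, so it is hidden
ggplot(data = billionares_2014,
       aes(x = location_region, y = wealth_worth_in_billions, fill = location_region)) +
  geom_col(alpha = 0.5, show.legend = FALSE) +
  coord_flip() +
  labs(x = "Region", y = "Total Net Worth (Billions)",
       title = "Total Wealth of Billionaires by Region, 2014",
       caption = "Source: CORGIS Dataset Project")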
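The bar chart stacks one segment per billionaire, so each bar's length is the sum of net worth within that region. The minimal base R sketch below computes the same totals as a table. `region_totals` is a hypothetical helper, and it assumes the cleaned column names location_region and wealth_worth_in_billions.

```r
# Hypothetical helper: total net worth per region, largest first.
# This is the same sum that the stacked geom_col() bars display.
region_totals <- function(df) {
  totals <- aggregate(wealth_worth_in_billions ~ location_region,
                      data = df, FUN = sum)
  totals <- totals[order(totals$wealth_worth_in_billions, decreasing = TRUE), ]
  rownames(totals) <- NULL  # renumber rows after sorting
  totals
}
```

Calling `region_totals(billionares_2014)` should list the regions in the order of their bar lengths. This gives a numeric check on which region is largest without having to read the values off the plot.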
I cleaned the data set mainly by removing rows with an age of 0, which marks a missing age. I also replaced the periods, spaces, and other punctuation in the column names with underscores and made them lower case, so that names such as "wealth.worth in billions" became wealth_worth_in_billions. Using dplyr, I then filtered two subsets from the main data set: 2014 technology billionaires for the linear model, and all 2014 billionaires for the bar graph.
The scatter plot shows the relationship between age and net worth for the 24 technology-sector billionaires. The fitted slope is about -0.42, meaning each additional year of age is associated with roughly 0.42 billion less in net worth. This slope is statistically significant at the 5% level (p ≈ 0.013). However, the relationship is weak. Age accounts for only about a fifth to a quarter of the variation in wealth (R-squared 0.248, adjusted R-squared 0.214), and the sample is small, so age alone is not a strong predictor of how wealthy a technology billionaire is.
My second visualization is a bar graph that compares the total wealth of billionaires in each region. It shows that North America had the largest combined billionaire wealth in 2014. This does not surprise me, as I believe many large companies and billionaires are based in North America. I would have liked to fit the linear model to all billionaires rather than only technology billionaires, but with about 1,590 billionaires in 2014, the scatter plot would have been too cluttered to read.