2025-11-02

Description of the Dataset

  • This presentation draws on the World Bank “Entrepreneurship Database”, which collected data on registered firms in 180 economies from 2006 to 2022. These data are then compared with economic data from the World Bank’s “Prosperity” indicators on its Open Data webpage. The goal of this project is to investigate the correlation between entrepreneurship rates and selected economic indicators.

  • Data Sources:

2025, “Entrepreneurship Database”, World Bank Group, https://www.worldbank.org/en/programs/entrepreneurship/total-number-of-firms (Accessed November 2nd, 2025)

2025, “Prosperity”, World Bank Group, https://data360.worldbank.org/en/prosperity (Accessed November 2nd, 2025)

Brief Summary

The first part of this presentation examines the Entrepreneurship Database. Its primary indicator is “new business density” (NBD): the number of newly registered limited liability corporations per year, normalized by the working-age population (ages 15-64) and expressed per 1,000 people.
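As a minimal illustration of the definition above, the sketch below computes NBD for two economies. It assumes the World Bank convention of scaling by 1,000 working-age people (ages 15-64); the function name, country names, and counts are hypothetical and are not taken from the database.

```r
# New Business Density (NBD): newly registered limited liability corporations
# per 1,000 working-age people (ages 15-64).
# new_business_density() is a hypothetical helper written for illustration.
new_business_density <- function(new_llcs, pop_15_64) {
  new_llcs / pop_15_64 * 1000
}

# Hypothetical example data (not from the World Bank database)
example <- data.frame(
  country   = c("Country A", "Country B"),
  new_llcs  = c(12000, 4500),
  pop_15_64 = c(3000000, 1500000)
)

# Compute NBD for each row and show the result
example$NBD <- new_business_density(example$new_llcs, example$pop_15_64)
print(example)
```

Normalizing by population is what allows a small economy and a large one to be compared on the same scale.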

The second part of this presentation uses the following economic indicator from the “Prosperity” database:

  • Gini index (measuring economic inequality)

The rate of New Business Density over a given time period is then compared with the Gini index.
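To make the inequality measure concrete, here is a minimal sketch of the textbook Gini formula based on mean absolute differences, scaled to 0-100 as the World Bank reports it. It is an assumption-laden simplification: World Bank estimates are built from household survey data with sampling weights, so this is not their exact procedure, and `gini_index()` is a hypothetical helper.

```r
# Gini index from a vector of incomes, using the mean absolute difference:
#   G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x))
# Multiplied by 100 so that 0 = perfect equality and values near 100 = extreme inequality.
gini_index <- function(x) {
  n <- length(x)
  abs_diff_sum <- sum(abs(outer(x, x, "-")))  # all ordered pairs (i, j)
  100 * abs_diff_sum / (2 * n^2 * mean(x))
}

gini_index(c(10, 10, 10, 10))  # everyone has the same income
gini_index(c(0, 0, 0, 40))     # one person holds all income
```

With four people, the second case gives 75 rather than 100, because this unweighted formula reaches its maximum at (n - 1) / n.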

ggplot2 Time Series

Pie Chart Comparison of Entrepreneurship Rates

The following pie charts compare regions based on whether their countries have a low, medium, or high rate of entrepreneurship as measured by New Business Density. Each country’s rate is averaged over the full 2006-2022 period.
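The categorization described above can be sketched as follows: average each country’s NBD over the period, bin the averages, and count countries per bin within each region. The presentation does not state its thresholds, so the cut points 2 and 6 are assumptions, and the countries, regions, and values are hypothetical.

```r
# Hypothetical yearly NBD observations (illustrative values only)
nbd_long <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 2),
  region  = rep(c("Europe", "Europe", "Africa", "Africa"), each = 2),
  year    = rep(c(2006, 2022), times = 4),
  NBD     = c(0.5, 1.5, 3, 5, 8, 12, 0.2, 0.4)
)

# 1. Average each country's NBD over the whole period
avg_nbd <- aggregate(NBD ~ country + region, data = nbd_long, FUN = mean)

# 2. Bin the averages; the cut points 2 and 6 are assumed, not from the source
avg_nbd$category <- cut(avg_nbd$NBD,
                        breaks = c(-Inf, 2, 6, Inf),
                        labels = c("Low", "Medium", "High"))

# 3. Count countries per category within each region (one pie per region)
counts <- table(avg_nbd$region, avg_nbd$category)
print(counts)

# Base R pie chart for a single region
pie(counts["Europe", ], main = "Europe: NBD categories")
```

Averaging first and binning second means each country contributes exactly one slice count, regardless of how many years of data it has.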

Plotly 3D plot

The following plot compares year, New Business Density, and the GINI coefficient across the countries in the dataset for the 2006-2022 period.

Analysis of 3D plot

The 3D plot above offers a few key insights into the relationship between a country’s GINI coefficient and its rate of entrepreneurship as measured by New Business Density.

  1. For some countries, such as Luxembourg, Estonia, and the United Kingdom, GINI and NBD appear to move independently. Each of these nations saw relatively little change in its GINI index alongside notable changes in its NBD rate.

  2. There appears to be a correlation between lower economic inequality and higher rates of entrepreneurship. The strength of this correlation can be tested with a linear regression of NBD on GINI.
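A useful fact when testing this kind of correlation: in a simple regression with one predictor, the t-test on the slope is identical to the Pearson correlation test, and R-squared equals the squared correlation. The sketch below demonstrates this on simulated data; the data frame `sim` and its values are hypothetical, and only the column names mirror the presentation.

```r
# Simulated country averages (hypothetical, for illustration only)
set.seed(42)
sim <- data.frame(avg_GINI = runif(50, 25, 55))
sim$avg_NBD <- 5 - 0.07 * sim$avg_GINI + rnorm(50, sd = 2)

# Pearson correlation test between inequality and entrepreneurship
ct <- cor.test(sim$avg_GINI, sim$avg_NBD)

# Simple linear regression of NBD (dependent) on GINI (independent)
fit <- lm(avg_NBD ~ avg_GINI, data = sim)
slope_row <- summary(fit)$coefficients["avg_GINI", ]

# The two t statistics match, and R-squared equals the squared correlation
c(cor_t = unname(ct$statistic), slope_t = unname(slope_row["t value"]))
c(r_squared = summary(fit)$r.squared, cor_squared = unname(ct$estimate)^2)
```

Either test answers the same question here; the regression additionally gives the slope in NBD units per GINI point.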

ggplot2 Simple Linear Regression Code

To test for a correlation between economic inequality and the rate of entrepreneurship, I ran a simple linear regression. I averaged each country’s NBD rate and GINI coefficient over the 2006-2022 period and regressed average NBD on average GINI, so the GINI coefficient is the independent variable (x-axis) and NBD is the dependent variable (y-axis).
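The averaging and merging step described above might look like the sketch below. Only `merged_data`, `avg_NBD`, and `avg_GINI` come from the presentation; the input data frames, their column names, and their values are hypothetical. The inner join is an assumption about how countries lacking either indicator were handled, which could explain why the regression output reports 135 observations (133 residual degrees of freedom plus 2 parameters) rather than 180.

```r
# Hypothetical long-format inputs: one row per country-year
nbd_long <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  year    = c(2006, 2022, 2006, 2022, 2010),
  NBD     = c(2.0, 4.0, 1.0, 1.5, 6.0)
)
gini_long <- data.frame(
  country = c("A", "A", "B", "C", "D"),
  year    = c(2006, 2022, 2010, 2012, 2010),
  GINI    = c(30, 32, 40, 36, 45)
)

# Average each indicator per country over the available years
avg_nbd  <- aggregate(NBD ~ country, data = nbd_long, FUN = mean)
avg_gini <- aggregate(GINI ~ country, data = gini_long, FUN = mean)
names(avg_nbd)[2]  <- "avg_NBD"
names(avg_gini)[2] <- "avg_GINI"

# Inner join: keep only countries that have both indicators
merged_data <- merge(avg_nbd, avg_gini, by = "country")

# Regress average NBD (dependent) on average GINI (independent)
model <- lm(avg_NBD ~ avg_GINI, data = merged_data)
print(merged_data)
```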

# Load ggplot2 for plotting
library(ggplot2)

# 2D scatterplot of country averages: GINI (independent variable) on the
# x-axis, NBD (dependent variable) on the y-axis, with an OLS fit line
ggplot(merged_data, aes(x = avg_GINI, y = avg_NBD)) +
  geom_point(color = "steelblue", size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "darkred") +
  labs(
    title = "Average New Business Density vs Average GINI (2006–2022)",
    x = "Average GINI (Inequality Index)",
    y = "Average New Business Density Rate",
    caption = "Source: World Bank Group"
  ) +
  theme_minimal()

ggplot2 Simple Linear Regression Graph

The following plot compares each country’s average GINI coefficient with its average New Business Density over the 2006-2022 period.

Statistical Analysis

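Based on the analyses in this project, there appears to be at most a weak correlation between the rate of entrepreneurship and inequality as measured by the GINI index. In the regression below, each one-point increase in average GINI is associated with a 0.074 decrease in average NBD, but this slope is not statistically significant at the 5% level (p = 0.073), and GINI explains only about 2.4% of the cross-country variation in NBD (R-squared = 0.024). This suggests that economic inequality is likely not a major factor in entrepreneurship, although this simple model does not control for other economic variables.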

## 
## Call:
## lm(formula = avg_NBD ~ avg_GINI, data = merged_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3604 -2.1918 -1.3212  0.6592 17.0925 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.51027    1.56750   3.515 0.000601 ***
## avg_GINI    -0.07432    0.04119  -1.804 0.073444 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.652 on 133 degrees of freedom
## Multiple R-squared:  0.02389,    Adjusted R-squared:  0.01655 
## F-statistic: 3.256 on 1 and 133 DF,  p-value: 0.07344
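The regression output above can be cross-checked by hand. With a single predictor, the slope’s t statistic is the estimate divided by its standard error, the F statistic equals t squared, and R-squared equals F / (F + residual df). The sketch below recomputes these from the printed values; small differences in the last digit come from rounding in the printout.

```r
# Values copied from the regression output above
estimate <- -0.07432
std_err  <- 0.04119
df_resid <- 133

t_value  <- estimate / std_err                 # t statistic for the slope
p_value  <- 2 * pt(-abs(t_value), df_resid)    # two-sided p-value
f_stat   <- t_value^2                          # F on 1 and 133 df equals t^2
r_sq     <- f_stat / (f_stat + df_resid)       # R-squared recovered from F
n_obs    <- df_resid + 2                       # observations = residual df + 2 parameters
adj_r_sq <- 1 - (1 - r_sq) * (n_obs - 1) / df_resid

round(c(t = t_value, p = p_value, F = f_stat,
        R2 = r_sq, adjR2 = adj_r_sq, n = n_obs), 4)
```

The last line also shows that 135 countries entered the regression.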

Conclusion

Thank you! I hope that this helps those who are interested in the correlation between business creation and economic inequality.