Principles of Data Visualization
and Introduction to ggplot2
“I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine.”
Create 3 graphs showing:
- the distribution of companies in the dataset by state
- how many people are employed by companies in different industries in the state with the 3rd most companies in the dataset
- which industries generate the most revenue per employee
Graphs 2 and 3 should show the distribution per industry.
Graph 1: Number of companies by state among Inc.’s fastest-growing companies
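
A count-by-state bar chart of this kind can be sketched as follows. This is a minimal illustration, not the code behind Graph 1: the data frame `inc` and its rows are hypothetical stand-ins, and only the `State` column name is taken from the dataset output shown in this document.

```r
library(ggplot2)

# Hypothetical stand-in for the Inc. 5000 data; only the State column matters here.
inc <- data.frame(
  Name  = c("A", "B", "C", "D", "E", "F"),
  State = c("CA", "CA", "CA", "TX", "TX", "NY")
)

# Count companies per state.
state_counts <- as.data.frame(table(State = inc$State), responseName = "n")

# Order the factor levels by count so the largest bar ends up at the top
# once the axes are flipped.
state_counts$State <- reorder(state_counts$State, state_counts$n)

# The state with the 3rd most companies (needed for the second graph).
third_state <- as.character(state_counts$State[order(-state_counts$n)][3])

p1 <- ggplot(state_counts, aes(x = State, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "State", y = "Number of companies")
```

Flipping the coordinates keeps the state labels readable when there are around fifty of them, which is one reason to prefer horizontal bars here.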

Graph 2: To show the distribution of employees by industry, the first thought is to draw boxplots. In this dataset, however, boxplots mask the true distribution of the companies. ‘Environmental Services’ illustrates the problem: only 2 New York companies fall in this industry, yet a boxplot makes it appear that there may be far more.
## # A tibble: 2 x 12
## # Groups: Industry [1]
## Rank Name Growth_Rate Revenue Industry Employees City State
## <int> <fct> <dbl> <dbl> <fct> <int> <fct> <fct>
## 1 3661 Environment~ 0.81 45100000 Environme~ 250 Syra~ NY
## 2 4170 Creative En~ 0.62 5300000 Environme~ 60 New ~ NY
## # ... with 4 more variables: firstquart <dbl>, median_by_ind <dbl>,
## # thirdquart <dbl>, outliers <dbl>
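
One possible way to keep small groups honest is to draw every company as a point on top of the box. The sketch below is an assumption about how this could be done, not the approach used for Graph 2. The data frame `ny` is hypothetical: the two Environmental Services employee counts come from the output above, and the Software rows are invented for contrast.

```r
library(ggplot2)

# Hypothetical New York subset. The two Environmental Services values match the
# output above; the Software values are made up.
ny <- data.frame(
  Industry  = c(rep("Environmental Services", 2), rep("Software", 6)),
  Employees = c(250, 60, 15, 22, 40, 75, 120, 300)
)

# Group sizes: a boxplot alone gives no hint that one group has only 2 points.
n_by_industry <- table(ny$Industry)

p2 <- ggplot(ny, aes(x = Industry, y = Employees)) +
  geom_boxplot(outlier.shape = NA) +      # hide outlier marks to avoid drawing points twice
  geom_jitter(width = 0.1, height = 0) +  # one point per company, jittered sideways only
  scale_y_log10() +                       # employee counts are heavily right-skewed
  coord_flip()
```

With the points visible, a two-company industry reads as two dots rather than as a full box with whiskers.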

Graph 3: To create an interactive graph of the top industries by revenue per employee that also carries distribution information, we created a new variable called the profitability variance ratio. Modeled on the Sharpe ratio, it measures the excess return per worker per unit of standard deviation. We ordered the industries by this ratio. An interactive tooltip shows each industry’s revenue per employee, its profitability variance ratio, and its top company by revenue per employee.
Ultimately, this is not a good measure for investment. It surfaced many companies with high costs and complicated machinery; a better measure would be based on profit ratios. Our average revenue per employee was computed from industry totals, which yields an employee-weighted measure rather than a company-weighted average. An investor should also be cognizant of the size of the company he or she is considering for investment.
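
The difference between the two averages, and a Sharpe-style ratio built on them, can be sketched as below. The exact formula for the profitability variance ratio is not given here, so the version in this sketch is an assumption: employee-weighted revenue per employee divided by the standard deviation of company-level revenue per employee, with no benchmark subtracted. The data and the summary column names are hypothetical.

```r
# Hypothetical toy data; figures are invented for illustration.
inc <- data.frame(
  Industry  = c("Energy", "Energy", "Software", "Software"),
  Revenue   = c(1e6, 9e6, 2e6, 4e6),
  Employees = c(10, 30, 10, 40)
)
inc$rev_per_emp <- inc$Revenue / inc$Employees

# Summarize each industry separately, then stack the one-row results.
by_ind <- split(inc, inc$Industry)
summary_by_ind <- do.call(rbind, lapply(by_ind, function(d) {
  data.frame(
    Industry          = d$Industry[1],
    # Industry totals: large companies dominate this average.
    employee_weighted = sum(d$Revenue) / sum(d$Employees),
    # Mean of company-level ratios: every company counts equally.
    company_weighted  = mean(d$rev_per_emp),
    sd_rev_per_emp    = sd(d$rev_per_emp)
  )
}))

# ASSUMED definition of the ratio (Sharpe-like, no benchmark subtracted).
summary_by_ind$ratio <-
  summary_by_ind$employee_weighted / summary_by_ind$sd_rev_per_emp
```

In the toy Software group the employee-weighted figure is 120,000 while the company-weighted figure is 150,000, because the larger company has the lower revenue per employee.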
Code to set up a tooltip with multiple values comes from:
https://davidgohel.github.io/ggiraph/articles/offcran/examples.html
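
A tooltip with several values can be built by packing them into a single string column and mapping that column to the `tooltip` aesthetic. The following is a minimal sketch under stated assumptions, not the code from the page linked above: the data frame `top`, its column names, and the company names are hypothetical, and the interactive part runs only if the ggiraph package is installed.

```r
library(ggplot2)

# Hypothetical industry summary; values and names are invented for illustration.
top <- data.frame(
  Industry    = c("Energy", "Software"),
  rev_per_emp = c(250000, 120000),
  ratio       = c(1.8, 1.7),
  top_company = c("Acme Power", "Example Soft")
)

# Pack several values into one tooltip string; ggiraph tooltips are HTML,
# so <br/> starts a new line.
top$tip <- sprintf(
  "Revenue per employee: $%s<br/>Ratio: %.2f<br/>Top company: %s",
  formatC(top$rev_per_emp, format = "d", big.mark = ","),
  top$ratio,
  top$top_company
)

if (requireNamespace("ggiraph", quietly = TRUE)) {
  p3 <- ggplot(top, aes(x = reorder(Industry, ratio), y = ratio)) +
    ggiraph::geom_col_interactive(aes(tooltip = tip, data_id = Industry)) +
    coord_flip() +
    labs(x = "Industry", y = "Ratio")
  widget <- ggiraph::girafe(ggobj = p3)  # print `widget` to view the interactive chart
}
```

Building the string ahead of time keeps the `aes()` call short and makes the tooltip text easy to inspect before plotting.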