library(ggplot2)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Principles of Data Visualization and Introduction to ggplot2

I have provided you with data about the 5,000 fastest-growing companies in the US, as compiled by Inc. magazine. Let's read this in:
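The chunk that loads the data is not shown in this render, so the following is only a minimal sketch of how the read could look. It parses six rows copied from the preview in this document; the object name `inc` and the commented-out file name are assumptions, not the original code.

```r
# Six rows copied from the preview in this document; the real file has 5,000.
csv_text <- 'Rank,Name,Growth_Rate,Revenue,Industry,Employees,City,State
1,Fuhu,421.48,1.179e+08,Consumer Products & Services,104,El Segundo,CA
2,FederalConference.com,248.31,4.960e+07,Government Services,51,Dumfries,VA
3,The HCI Group,245.45,2.550e+07,Health,132,Jacksonville,FL
4,Bridger,233.08,1.900e+09,Energy,50,Addison,TX
5,DataXu,213.37,8.700e+07,Advertising & Marketing,220,Boston,MA
6,MileStone Community Builders,179.38,4.570e+07,Real Estate,63,Austin,TX'

# stringsAsFactors = TRUE reproduces the factor columns seen in the summaries
# (it stopped being the default in R 4.0).
inc <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# For the full dataset, point read.csv() at the file instead, for example:
# inc <- read.csv("inc5000_data.csv", stringsAsFactors = TRUE)  # hypothetical file name
```

Once `inc` holds the full file, `head(inc)` and `summary(inc)` give output of the kind shown in the preview.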

And let's preview this data:

##   Rank                         Name Growth_Rate   Revenue
## 1    1                         Fuhu      421.48 1.179e+08
## 2    2        FederalConference.com      248.31 4.960e+07
## 3    3                The HCI Group      245.45 2.550e+07
## 4    4                      Bridger      233.08 1.900e+09
## 5    5                       DataXu      213.37 8.700e+07
## 6    6 MileStone Community Builders      179.38 4.570e+07
##                       Industry Employees         City State
## 1 Consumer Products & Services       104   El Segundo    CA
## 2          Government Services        51     Dumfries    VA
## 3                       Health       132 Jacksonville    FL
## 4                       Energy        50      Addison    TX
## 5      Advertising & Marketing       220       Boston    MA
## 6                  Real Estate        63       Austin    TX
##       Rank                          Name       Growth_Rate     
##  Min.   :   1   (Add)ventures         :   1   Min.   :  0.340  
##  1st Qu.:1252   @Properties           :   1   1st Qu.:  0.770  
##  Median :2502   1-Stop Translation USA:   1   Median :  1.420  
##  Mean   :2502   110 Consulting        :   1   Mean   :  4.612  
##  3rd Qu.:3751   11thStreetCoffee.com  :   1   3rd Qu.:  3.290  
##  Max.   :5000   123 Exteriors         :   1   Max.   :421.480  
##                 (Other)               :4995                    
##     Revenue                                  Industry      Employees      
##  Min.   :2.000e+06   IT Services                 : 733   Min.   :    1.0  
##  1st Qu.:5.100e+06   Business Products & Services: 482   1st Qu.:   25.0  
##  Median :1.090e+07   Advertising & Marketing     : 471   Median :   53.0  
##  Mean   :4.822e+07   Health                      : 355   Mean   :  232.7  
##  3rd Qu.:2.860e+07   Software                    : 342   3rd Qu.:  132.0  
##  Max.   :1.010e+10   Financial Services          : 260   Max.   :66803.0  
##                      (Other)                     :2358   NA's   :12       
##             City          State     
##  New York     : 160   CA     : 701  
##  Chicago      :  90   TX     : 387  
##  Austin       :  88   NY     : 311  
##  Houston      :  76   VA     : 283  
##  San Francisco:  75   FL     : 282  
##  Atlanta      :  74   IL     : 273  
##  (Other)      :4438   (Other):2764

Think a bit about what these summaries mean. Use the space below to add any further non-visual exploratory information that you think helps you understand this data:
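As a hedged sketch of the kind of non-visual summaries that could go here, the block below uses dplyr to compute total revenue, company counts per industry, and per-industry totals. The rows are made-up toy data that only share column names with the Inc. file, and the summary column names are illustrative assumptions.

```r
library(dplyr)

# Made-up toy rows (illustration only); column names follow the Inc. data.
inc <- data.frame(
  Industry  = c("Energy", "Energy", "Health", "Health", "Software"),
  State     = c("TX", "TX", "FL", "NY", "CA"),
  Revenue   = c(5e6, 3e6, 2e6, 4e6, 1e6),
  Employees = c(50, NA, 20, 40, 10)
)

# Total revenue across all companies
total_revenue <- sum(inc$Revenue)

# Number of companies per industry, largest first
industry_counts <- inc %>% count(Industry, sort = TRUE)

# Per-industry totals and averages, ordered by total revenue
by_industry <- inc %>%
  group_by(Industry) %>%
  summarise(
    n               = n(),
    total_revenue   = sum(Revenue),
    avg_revenue     = mean(Revenue),
    # na.rm = TRUE keeps a single missing Employees value
    # from turning the whole industry total into NA
    total_employees = sum(Employees, na.rm = TRUE)
  ) %>%
  arrange(desc(total_revenue))

by_industry
```

Swapping `Industry` for `State` in `group_by()` gives the corresponding per-state totals.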

## [1] 241160900000
## # A tibble: 25 x 2
## # Groups:   Industry [25]
##                        Industry     n
##                          <fctr> <int>
##  1                  IT Services   733
##  2 Business Products & Services   482
##  3      Advertising & Marketing   471
##  4                       Health   355
##  5                     Software   342
##  6           Financial Services   260
##  7                Manufacturing   256
##  8 Consumer Products & Services   203
##  9                       Retail   203
## 10          Government Services   202
## # ... with 15 more rows
## # A tibble: 25 x 2
##                        Industry `Revenue by Industry`
##                          <fctr>                 <dbl>
##  1 Business Products & Services           26367900000
##  2                  IT Services           20681300000
##  3                       Health           17863400000
##  4 Consumer Products & Services           14956400000
##  5   Logistics & Transportation           14840500000
##  6                       Energy           13771600000
##  7                 Construction           13174300000
##  8           Financial Services           13150900000
##  9              Food & Beverage           12911300000
## 10                Manufacturing           12684000000
## # ... with 15 more rows
##                        Industry   n Revenue by Industry
## 1             Computer Hardware  44         11885700000
## 2                        Energy 109         13771600000
## 3               Food & Beverage 131         12911300000
## 4    Logistics & Transportation 155         14840500000
## 5  Consumer Products & Services 203         14956400000
## 6                  Construction 187         13174300000
## 7            Telecommunications 129          7334400000
## 8  Business Products & Services 482         26367900000
## 9                      Security  73          3812800000
## 10       Environmental Services  51          2638800000
## 11           Financial Services 260         13150900000
## 12                       Retail 203         10257400000
## 13                       Health 355         17863400000
## 14                Manufacturing 256         12684000000
## 15         Travel & Hospitality  62          2931600000
## 16              Human Resources 196          9246100000
## 17                    Insurance  50          2337900000
## 18                  Engineering  74          2532500000
## 19                        Media  54          1742400000
## 20                  Real Estate  96          2965700000
## 21          Government Services 202          6009100000
## 22                  IT Services 733         20681300000
## 23                     Software 342          8140600000
## 24      Advertising & Marketing 471          7785000000
## 25                    Education  83          1139300000
##    Avg. Rev. By Industry
## 1              270129545
## 2              126344954
## 3               98559542
## 4               95745161
## 5               73676847
## 6               70450802
## 7               56855814
## 8               54705187
## 9               52230137
## 10              51741176
## 11              50580385
## 12              50529064
## 13              50319437
## 14              49546875
## 15              47283871
## 16              47173980
## 17              46758000
## 18              34222973
## 19              32266667
## 20              30892708
## 21              29748020
## 22              28214598
## 23              23802924
## 24              16528662
## 25              13726506
## # A tibble: 52 x 2
##     State `Revenue by State`
##    <fctr>              <dbl>
##  1     IL        33244300000
##  2     CA        23457900000
##  3     TX        22164200000
##  4     NY        18260400000
##  5     OH        12786600000
##  6     FL        10610300000
##  7     NC         9258500000
##  8     VA         8667700000
##  9     MI         7805800000
## 10     WI         7296600000
## # ... with 42 more rows
## # A tibble: 25 x 2
##                        Industry `Employees By Industry`
##                          <fctr>                   <int>
##  1              Human Resources                  226980
##  2           Financial Services                   47693
##  3 Consumer Products & Services                   45464
##  4                     Security                   41059
##  5      Advertising & Marketing                   39731
##  6                       Retail                   37068
##  7                 Construction                   29099
##  8                       Energy                   26437
##  9          Government Services                   26185
## 10         Travel & Hospitality                   23035
## # ... with 15 more rows
## # A tibble: 798 x 3
## # Groups:   State [52]
##     State                     Industry     n
##    <fctr>                       <fctr> <int>
##  1     CA      Advertising & Marketing    91
##  2     VA          Government Services    83
##  3     CA                  IT Services    82
##  4     CA Business Products & Services    69
##  5     VA                  IT Services    69
##  6     CA                     Software    65
##  7     NY      Advertising & Marketing    57
##  8     TX                  IT Services    54
##  9     IL                  IT Services    48
## 10     CA           Financial Services    44
## # ... with 788 more rows
## # A tibble: 25 x 4
##                        Industry `Avg. Growth Rate` `Min Growth Rate`
##                          <fctr>              <dbl>             <dbl>
##  1                       Energy           9.603303              0.35
##  2 Consumer Products & Services           8.776108              0.35
##  3                  Real Estate           7.746667              0.35
##  4          Government Services           7.238168              0.35
##  5      Advertising & Marketing           6.225478              0.35
##  6                       Retail           6.184729              0.34
##  7           Financial Services           5.435308              0.34
##  8                     Software           5.020643              0.35
##  9                       Health           4.856394              0.35
## 10                        Media           4.374074              0.41
## # ... with 15 more rows, and 1 more variables: `Max Growth Rate` <dbl>
## # A tibble: 25 x 4
##                        Industry `Avg. No. Employees` `Min No. Employees`
##                          <fctr>                <dbl>               <dbl>
##  1              Human Resources            1158.0612                   4
##  2                     Security             562.4521                   7
##  3         Travel & Hospitality             371.5323                   3
##  4                  Engineering             276.1486                  11
##  5                       Energy             242.5413                   2
##  6 Consumer Products & Services             223.9606                   1
##  7            Computer Hardware             220.7727                   6
##  8       Environmental Services             199.1176                   4
##  9           Financial Services             183.4346                   5
## 10                       Retail             182.6010                   2
## # ... with 15 more rows, and 1 more variables: `Max No. Employees` <dbl>

Question 1

Create a graph that shows the distribution of companies in the dataset by state (i.e., how many are in each state). There are a lot of states, so consider which axis you should use. This visualization will ultimately be consumed on a ‘portrait’-oriented screen (i.e., taller than wide), which should further guide your layout choices.
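One layout that suits a portrait screen is a horizontal bar chart with the states on the vertical axis, sorted by count. The block below is a minimal sketch of that idea on made-up toy data; it is one possible answer, not a reference solution.

```r
library(ggplot2)

# Made-up toy data (illustration only): one row per company.
inc <- data.frame(State = c("CA", "CA", "CA", "TX", "TX", "NY"))

# Count companies per state; responseName sets the name of the count column.
state_counts <- as.data.frame(table(State = inc$State), responseName = "n")

# reorder() sorts the bars by count; coord_flip() puts the states on the
# vertical axis so that a long list of labels stays readable on a tall screen.
p <- ggplot(state_counts, aes(x = reorder(State, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "State", y = "Number of companies")

p
```

With roughly 50 states, the flipped layout lets each label sit on its own line instead of crowding a horizontal axis.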

Question 2

Let's dig in on the state with the third-most companies in the dataset. Imagine you work for the state and are interested in how many people are employed by companies in different industries. Create a plot that shows the average and/or median employment by industry for companies in this state (use only cases with full data; see R’s complete.cases() function). In addition, your graph should show how variable the ranges are, and you should deal with outliers.
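According to the State summary earlier in this document, the third-largest count belongs to NY (311 companies, after CA and TX). The sketch below shows one way to approach the plot: keep complete cases, draw a box plot per industry (median and interquartile range), hide the outlier points, and zoom the axis to the largest upper whisker. The toy rows are made up, and choosing the limit with `boxplot.stats()` is an assumption about how to handle outliers, not a prescribed method.

```r
library(ggplot2)

# Made-up toy data (illustration only); one TX row shows the state filter.
inc <- data.frame(
  State     = c(rep("NY", 11), "TX"),
  Industry  = c(rep("Media", 5), rep("Software", 6), "Energy"),
  Employees = c(10, 20, 30, 40, NA, 15, 20, 25, 30, 40, 3000, 50)
)

# Keep the target state, then drop rows with any missing value.
ny <- inc[inc$State == "NY", ]
ny <- ny[complete.cases(ny), ]

# Largest upper whisker across industries, used as the zoom limit.
# boxplot.stats() computes hinges slightly differently from geom_boxplot(),
# so treat this as an approximate limit.
upper <- max(tapply(ny$Employees, ny$Industry,
                    function(x) boxplot.stats(x)$stats[5]))

# outlier.shape = NA hides outlier points; the ylim in coord_flip() zooms
# without dropping data, so medians and quartiles still use every row.
p <- ggplot(ny, aes(x = Industry, y = Employees)) +
  geom_boxplot(outlier.shape = NA) +
  coord_flip(ylim = c(0, upper)) +
  labs(x = "Industry", y = "Employees per company")

p
```

Zooming with the coordinate system, rather than filtering rows or setting scale limits, leaves the summary statistics unchanged; only the view is clipped.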

Question 3

Now imagine you work for an investor and want to see which industries generate the most revenue per employee. Create a chart that makes this information clear. Once again, the distribution per industry should be shown.
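A possible approach, sketched on made-up toy data, is to compute revenue per employee for each company and show its distribution per industry as a box plot on a log scale. The block also computes the aggregate ratio (total revenue over total employees) after dropping incomplete rows; summing `Employees` without removing missing values returns `NA`, which may be why some industries show `NA` in the table in this document. The column name `rev_per_emp` is an illustrative assumption.

```r
library(ggplot2)

# Made-up toy data (illustration only).
inc <- data.frame(
  Industry  = rep(c("Energy", "Education"), each = 3),
  Revenue   = c(9e6, 6e6, 2e6, 1e6, 2e6, 3e6),
  Employees = c(10, 20, NA, 10, 10, 30)
)

# Drop rows with missing values, then compute the per-company ratio.
inc_cc <- inc[complete.cases(inc), ]
inc_cc$rev_per_emp <- inc_cc$Revenue / inc_cc$Employees

# Aggregate view: total revenue divided by total employees per industry.
agg <- aggregate(cbind(Revenue, Employees) ~ Industry, data = inc_cc, FUN = sum)
agg$rev_per_emp <- agg$Revenue / agg$Employees

# Distribution view: one box per industry, ordered by median ratio.
# The log scale keeps a few very large ratios from flattening the other boxes.
p <- ggplot(inc_cc, aes(x = reorder(Industry, rev_per_emp, FUN = median),
                        y = rev_per_emp)) +
  geom_boxplot() +
  scale_y_log10() +
  coord_flip() +
  labs(x = "Industry", y = "Revenue per employee (log scale)")

p
```

The aggregate ratio and the median of the per-company ratios can rank industries differently, so it is worth stating which one the chart shows.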

##                        Industry Employees By Industry Revenue by Industry
## 1             Computer Hardware                  9714         11885700000
## 2                        Energy                 26437         13771600000
## 3                  Construction                 29099         13174300000
## 4  Consumer Products & Services                 45464         14956400000
## 5                     Insurance                  7339          2337900000
## 6                        Retail                 37068         10257400000
## 7            Financial Services                 47693         13150900000
## 8        Environmental Services                 10155          2638800000
## 9           Government Services                 26185          6009100000
## 10      Advertising & Marketing                 39731          7785000000
## 11                        Media                  9532          1742400000
## 12                    Education                  7685          1139300000
## 13         Travel & Hospitality                 23035          2931600000
## 14                  Engineering                 20435          2532500000
## 15                     Security                 41059          3812800000
## 16              Human Resources                226980          9246100000
## 17 Business Products & Services                    NA         26367900000
## 18              Food & Beverage                    NA         12911300000
## 19                       Health                    NA         17863400000
## 20                  IT Services                    NA         20681300000
## 21   Logistics & Transportation                    NA         14840500000
## 22                Manufacturing                    NA         12684000000
## 23                  Real Estate                    NA          2965700000
## 24                     Software                    NA          8140600000
## 25           Telecommunications                    NA          7334400000
##    Revenue Per Employee By Industry
## 1                        1223563.93
## 2                         520921.44
## 3                         452740.64
## 4                         328972.37
## 5                         318558.39
## 6                         276718.46
## 7                         275740.67
## 8                         259852.29
## 9                         229486.35
## 10                        195942.71
## 11                        182794.80
## 12                        148249.84
## 13                        127267.20
## 14                        123929.53
## 15                         92861.49
## 16                         40735.31
## 17                               NA
## 18                               NA
## 19                               NA
## 20                               NA
## 21                               NA
## 22                               NA
## 23                               NA
## 24                               NA
## 25                               NA