Data Importing and Cleaning/Tidying

Importing the Data (CPI Data and VC Investment Data from 1913 to 2014)

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following objects are masked from 'package:plyr':
## 
##     arrange, mutate, rename, summarise
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
## Loading required package: viridisLite
## 
## Attaching package: 'ggpubr'
## The following object is masked from 'package:plyr':
## 
##     mutate
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
## 
## Attaching package: 'ggmap'
## The following object is masked from 'package:plotly':
## 
##     wind
## Warning in readLines(json_file): incomplete final line found on 'https://
## datahub.io/core/cpi-us/datapackage.json'
## [1] "validation_report" "cpiai_csv"         "cpiai_json"       
## [4] "cpi-us_zip"        "cpiai"

Structure, head, and tail of the data

## 'data.frame':    49438 obs. of  34 variables:
##  $ name                : chr  "#waywire" "&TV Communications" "'Rock' Your Paper" "(In)Touch Network" ...
##  $ category_list       : chr  "|Entertainment|Politics|Social Media|News|" "|Games|" "|Publishing|Education|" "|Electronics|Guides|Coffee|Restaurants|Music|iPhone|Apps|Mobile|iOS|E-Commerce|" ...
##  $ market              : chr  " News " " Games " " Publishing " " Electronics " ...
##  $ funding_total_usd   : chr  " 17,50,000 " " 40,00,000 " "40,000" " 15,00,000 " ...
##  $ status              : chr  "acquired" "operating" "operating" "operating" ...
##  $ country_code        : chr  "USA" "USA" "EST" "GBR" ...
##  $ state_code          : chr  "NY" "CA" NA NA ...
##  $ region              : chr  "New York City" "Los Angeles" "Tallinn" "London" ...
##  $ city                : chr  "New York" "Los Angeles" "Tallinn" "London" ...
##  $ funding_rounds      : int  1 2 1 1 2 1 1 1 1 1 ...
##  $ founded_at          : chr  "6/1/12" NA "10/26/12" "4/1/11" ...
##  $ founded_month       : chr  "2012-06" NA "2012-10" "2011-04" ...
##  $ founded_quarter     : chr  "2012-Q2" NA "2012-Q4" "2011-Q2" ...
##  $ founded_year        : int  2012 NA 2012 2011 2014 2011 NA 2007 2010 NA ...
##  $ first_funding_at    : chr  "6/30/12" "6/4/10" "8/9/12" "4/1/11" ...
##  $ last_funding_at     : chr  "6/30/12" "9/23/10" "8/9/12" "4/1/11" ...
##  $ seed                : int  1750000 0 40000 1500000 0 0 0 0 0 41250 ...
##  $ venture             : num  0e+00 4e+06 0e+00 0e+00 0e+00 7e+06 0e+00 2e+06 0e+00 0e+00 ...
##  $ equity_crowdfunding : int  0 0 0 0 60000 0 0 0 0 0 ...
##  $ undisclosed         : int  0 0 0 0 0 0 4912393 0 0 0 ...
##  $ convertible_note    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ debt_financing      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ angel               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ grant               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ private_equity      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ post_ipo_equity     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ post_ipo_debt       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ secondary_market    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ product_crowdfunding: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ round_A             : int  0 0 0 0 0 0 0 2000000 0 0 ...
##  $ round_B             : int  0 0 0 0 0 7000000 0 0 0 0 ...
##  $ round_C             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ round_D             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ round_E             : int  0 0 0 0 0 0 0 0 0 0 ...
##                 name
## 1           #waywire
## 2 &TV Communications
## 3  'Rock' Your Paper
## 4  (In)Touch Network
## 5 -R- Ranch and Mine
## 6      .Club Domains
##                                                                     category_list
## 1                                      |Entertainment|Politics|Social Media|News|
## 2                                                                         |Games|
## 3                                                          |Publishing|Education|
## 4 |Electronics|Guides|Coffee|Restaurants|Music|iPhone|Apps|Mobile|iOS|E-Commerce|
## 5                                                   |Tourism|Entertainment|Games|
## 6                                                                      |Software|
##          market funding_total_usd    status country_code state_code
## 1         News         17,50,000   acquired          USA         NY
## 2        Games         40,00,000  operating          USA         CA
## 3   Publishing             40,000 operating          EST       <NA>
## 4  Electronics         15,00,000  operating          GBR       <NA>
## 5      Tourism             60,000 operating          USA         TX
## 6     Software         70,00,000       <NA>          USA         FL
##           region         city funding_rounds founded_at founded_month
## 1  New York City     New York              1     6/1/12       2012-06
## 2    Los Angeles  Los Angeles              2       <NA>          <NA>
## 3        Tallinn      Tallinn              1   10/26/12       2012-10
## 4         London       London              1     4/1/11       2011-04
## 5         Dallas   Fort Worth              2     1/1/14       2014-01
## 6 Ft. Lauderdale Oakland Park              1   10/10/11       2011-10
##   founded_quarter founded_year first_funding_at last_funding_at    seed venture
## 1         2012-Q2         2012          6/30/12         6/30/12 1750000   0e+00
## 2            <NA>           NA           6/4/10         9/23/10       0   4e+06
## 3         2012-Q4         2012           8/9/12          8/9/12   40000   0e+00
## 4         2011-Q2         2011           4/1/11          4/1/11 1500000   0e+00
## 5         2014-Q1         2014          8/17/14         9/26/14       0   0e+00
## 6         2011-Q4         2011          5/31/13         5/31/13       0   7e+06
##   equity_crowdfunding undisclosed convertible_note debt_financing angel grant
## 1                   0           0                0              0     0     0
## 2                   0           0                0              0     0     0
## 3                   0           0                0              0     0     0
## 4                   0           0                0              0     0     0
## 5               60000           0                0              0     0     0
## 6                   0           0                0              0     0     0
##   private_equity post_ipo_equity post_ipo_debt secondary_market
## 1              0               0             0                0
## 2              0               0             0                0
## 3              0               0             0                0
## 4              0               0             0                0
## 5              0               0             0                0
## 6              0               0             0                0
##   product_crowdfunding round_A round_B round_C round_D round_E
## 1                    0       0       0       0       0       0
## 2                    0       0       0       0       0       0
## 3                    0       0       0       0       0       0
## 4                    0       0       0       0       0       0
## 5                    0       0       0       0       0       0
## 6                    0       0 7000000       0       0       0
##                                name
## 49433                    Zytoprotec
## 49434                         Zzish
## 49435 ZZNode Science and Technology
## 49436         Zzzzapp Wireless ltd.
## 49437                 [a]list games
## 49438                         [x+1]
##                                                       category_list
## 49433                                               |Biotechnology|
## 49434 |Analytics|Gamification|Developer APIs|iOS|Android|Education|
## 49435                                         |Enterprise Software|
## 49436                 |Web Development|Advertising|Wireless|Mobile|
## 49437                                                       |Games|
## 49438                                         |Enterprise Software|
##                      market funding_total_usd    status country_code state_code
## 49433        Biotechnology         26,86,600  operating          AUT       <NA>
## 49434            Education          3,20,000  operating          GBR       <NA>
## 49435  Enterprise Software         15,87,301  operating          CHN       <NA>
## 49436      Web Development             97,398 operating          HRV       <NA>
## 49437                Games         93,00,000  operating         <NA>       <NA>
## 49438  Enterprise Software       4,50,00,000  operating          USA         NY
##              region               city funding_rounds founded_at founded_month
## 49433        Vienna Gerasdorf Bei Wien              1     1/1/07       2007-01
## 49434        London             London              1    1/28/13       2013-01
## 49435       Beijing            Beijing              1       <NA>          <NA>
## 49436         Split              Split              5    5/13/12       2012-05
## 49437          <NA>               <NA>              1       <NA>          <NA>
## 49438 New York City           New York              4     1/1/99       1999-01
##       founded_quarter founded_year first_funding_at last_funding_at    seed
## 49433         2007-Q1         2007          1/29/13         1/29/13       0
## 49434         2013-Q1         2013          3/24/14         3/24/14  320000
## 49435            <NA>           NA           4/1/12          4/1/12       0
## 49436         2012-Q2         2012          11/1/11         9/10/14   71525
## 49437            <NA>           NA         11/21/11        11/21/11 9300000
## 49438         1999-Q1         1999           6/1/08          4/4/13       0
##        venture equity_crowdfunding undisclosed convertible_note debt_financing
## 49433  2686600                   0           0                0        0.0e+00
## 49434        0                   0           0                0        0.0e+00
## 49435  1587301                   0           0                0        0.0e+00
## 49436        0                   0           0            25873        0.0e+00
## 49437        0                   0           0                0        0.0e+00
## 49438 28000000                   0           0                0        1.7e+07
##       angel grant private_equity post_ipo_equity post_ipo_debt secondary_market
## 49433     0     0              0               0             0                0
## 49434     0     0              0               0             0                0
## 49435     0     0              0               0             0                0
## 49436     0     0              0               0             0                0
## 49437     0     0              0               0             0                0
## 49438     0     0              0               0             0                0
##       product_crowdfunding  round_A  round_B round_C round_D round_E
## 49433                    0  2686600        0       0       0       0
## 49434                    0        0        0       0       0       0
## 49435                    0  1587301        0       0       0       0
## 49436                    0        0        0       0       0       0
## 49437                    0        0        0       0       0       0
## 49438                    0 16000000 10000000       0       0       0

Mutate character variables into factors and dates

## Warning: Problem with `mutate()` input `founded_at`.
## x  68 failed to parse.
## ℹ Input `founded_at` is `mdy(founded_at)`.
## Warning: 68 failed to parse.
## Warning: Problem with `mutate()` input `first_funding_at`.
## x  10 failed to parse.
## ℹ Input `first_funding_at` is `mdy(first_funding_at)`.
## Warning: 10 failed to parse.
## Warning: Problem with `mutate()` input `last_funding_at`.
## x  6 failed to parse.
## ℹ Input `last_funding_at` is `mdy(last_funding_at)`.
## Warning: 6 failed to parse.
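
The warnings above show the date columns being parsed with `mdy()` inside `mutate()`. As a minimal sketch of the same step, the base-R version below converts a few rows modelled on the head output. The toy data frame `vc_toy`, the invented bad value `"13/45/12"`, and the use of `as.Date()` in place of `mdy()` are all assumptions made to keep the example free of package dependencies.

```r
# Toy rows modelled on the head() output; "13/45/12" is an invented bad value.
vc_toy <- data.frame(
  status           = c("acquired", "operating", "operating", NA),
  country_code     = c("USA", "USA", "EST", "GBR"),
  founded_at       = c("6/1/12", NA, "10/26/12", "13/45/12"),
  first_funding_at = c("6/30/12", "6/4/10", "8/9/12", "4/1/11"),
  stringsAsFactors = FALSE
)

# Character columns with a small set of repeated values become factors.
factor_cols <- c("status", "country_code")
vc_toy[factor_cols] <- lapply(vc_toy[factor_cols], factor)

# Month/day/two-digit-year strings become Date objects.
# Strings that do not match the format become NA; as.Date() does this
# silently, whereas lubridate's mdy() also warns "N failed to parse".
date_cols <- c("founded_at", "first_funding_at")
vc_toy[date_cols] <- lapply(vc_toy[date_cols], as.Date, format = "%m/%d/%y")

str(vc_toy)
```

Counting `NA` values before and after the conversion is a quick way to see how many strings failed to parse, since values that were already missing stay missing.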

Identify date issues
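
The report does not show how date issues are defined, so the following is only one plausible check, not the author's rule. It relies on the fact that two-digit years are ambiguous: with the `%y` format, base R maps 00 to 68 onto 20xx, so an old date can be parsed as a future one. The cutoff date, the invented value `"1/1/50"`, and the reuse of the column name `fund_date_issue` are assumptions.

```r
# Two-digit years are ambiguous: "%y" maps 00-68 to 20xx and 69-99 to 19xx.
raw_dates <- c("6/30/12", "1/1/50", "4/4/13", NA)  # "1/1/50" is an invented example
first_funding_at <- as.Date(raw_dates, format = "%m/%d/%y")

# Hypothetical rule: dates after the end of 2014 are suspect,
# assuming the data stops in 2014 as the section title states.
cutoff <- as.Date("2014-12-31")
fund_date_issue <- !is.na(first_funding_at) & first_funding_at > cutoff

data.frame(raw_dates, first_funding_at, fund_date_issue)
```

Missing dates are deliberately not flagged here; whether they should count as an issue is a separate decision.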

Remove commas from the total funding column and extract the month and quarter from the founded_month and founded_quarter columns

## Warning: NAs introduced by coercion
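
A rough base-R sketch of this step, using values copied from the structure output. The `" - "` entry is an assumed placeholder for missing funding; something non-numeric of that kind would explain the coercion warning, but the report does not show the offending values. The names `funding_raw`, `month_num`, and `quarter` are hypothetical.

```r
funding_raw     <- c(" 17,50,000 ", "40,000", " - ")  # " - " is an assumed placeholder
founded_month   <- c("2012-06", NA, "2012-10")
founded_quarter <- c("2012-Q2", NA, "2012-Q4")

# Strip surrounding whitespace and every comma, then convert to numeric.
# Non-numeric leftovers become NA; without suppressWarnings() this line
# raises "NAs introduced by coercion".
cleaned <- gsub(",", "", trimws(funding_raw), fixed = TRUE)
funding_total_usd <- suppressWarnings(as.numeric(cleaned))

# "YYYY-MM" -> integer month; "YYYY-Qn" -> "Qn". Missing values stay NA.
month_num <- as.integer(substr(founded_month, 6, 7))
quarter   <- sub("^[0-9]{4}-", "", founded_quarter)

data.frame(funding_total_usd, month_num, quarter)
```

Removing every comma regardless of position means the unusual digit grouping in the raw column (for example `17,50,000`) does not affect the result.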

Change the integer founded_month variable to a factor whose levels are the month names
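
One way to do this in base R is to index the built-in `month.name` constant and fix the level order explicitly, so that months sort by calendar position rather than alphabetically. The input vector `month_num` is a hypothetical stand-in for the extracted month numbers.

```r
month_num <- c(6L, NA, 10L, 1L)  # hypothetical month numbers, NA for unknown

# month.name is a built-in character vector: "January", ..., "December".
# Passing it as `levels` keeps all 12 months in calendar order,
# including months that do not occur in the data.
founded_month <- factor(month.name[month_num], levels = month.name)

levels(founded_month)
table(founded_month, useNA = "ifany")
```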

Create a categorical variable from the numeric venture variable

## [1] 2.200e+01 2.351e+09
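
The range printed above runs from 22 to roughly 2.351 billion, and labels such as "0 to 117 million" appear later in the output, which suggests equal-width bins. The sketch below builds such bins with `cut()`. The bin width of 117 million, the label format, and the toy `venture` vector are assumptions, not the report's actual break points.

```r
# Endpoints taken from the printed range (the maximum is rounded there);
# the middle values are copied from rows shown elsewhere in the output.
venture <- c(22, 4e6, 7e6, 47215715, 2.351e9)

bin_width <- 117e6                                 # assumed width, in dollars
n_bins    <- ceiling(max(venture) / bin_width)     # enough bins to cover the maximum
breaks    <- bin_width * 0:n_bins
labels    <- paste(head(breaks, -1) / 1e6, "to", breaks[-1] / 1e6, "million")

# include.lowest = TRUE puts a value of exactly 0 into the first bin.
venture_cats <- cut(venture, breaks = breaks, labels = labels, include.lowest = TRUE)

data.frame(venture, venture_cats)
```

With a distribution this skewed, nearly all observations fall into the first bin, so quantile-based or log-scale breaks may be worth considering as an alternative.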

Take a sample of 500 observations of venture funding and adjust the nominal amounts for inflation so that funding from any period is expressed in 2014 prices

## [1]         0 319412108
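
The standard adjustment multiplies each nominal amount by the ratio of the base-period CPI to the CPI of the period in which the funding occurred. The sketch below applies that ratio to two amounts. The three index values, the choice of January 2014 as the base month, and the names `deals`, `cpi`, and `base_month` are assumptions; in practice the index would come from the imported CPI series.

```r
# Hypothetical rows: nominal dollars and the month of the funding date.
deals <- data.frame(
  venture = c(9329636, 1700000),
  month   = c("2007-12", "2010-05"),
  stringsAsFactors = FALSE
)

# Assumed CPI index values keyed by "YYYY-MM"; replace with the imported series.
cpi <- c("2007-12" = 210.036, "2010-05" = 218.178, "2014-01" = 233.916)
base_month <- "2014-01"  # assumed reference month for "2014 prices"

# real = nominal * CPI(base month) / CPI(funding month)
deals$real_venture <- deals$venture * cpi[[base_month]] / unname(cpi[deals$month])

deals
```

With these assumed index values the results land within about a dollar of the real_venture figures shown for the same two amounts in the sampled output further down, which suggests, but does not confirm, that the report applies a similar monthly ratio. Drawing the 500-row sample itself can be done with `sample()` after `set.seed()` so that the subset is reproducible.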

Exploratory Data Analysis

##                 name                  category_list        market
## 1 &TV Communications                        |Games|        Games 
## 2      .Club Domains                     |Software|     Software 
## 3            0-6.com                  |Curated Web|  Curated Web 
## 4    10 Minutes With                    |Education|    Education 
## 5    1000museums.com                  |Curated Web|  Curated Web 
## 6         1001 Menus |Local Businesses|Restaurants|  Restaurants 
##   funding_total_usd    status country_code state_code         region
## 1           4000000 operating          USA         CA    Los Angeles
## 2           7000000      <NA>          USA         FL Ft. Lauderdale
## 3           2000000 operating         <NA>       <NA>           <NA>
## 4           4400000 operating          GBR       <NA>         London
## 5           4962651 operating          USA         WA        Seattle
## 6           4059079 operating          FRA       <NA>          Paris
##           city funding_rounds founded_at founded_month founded_quarter
## 1  Los Angeles              2       <NA>          <NA>            <NA>
## 2 Oakland Park              1 2011-10-10       October              Q4
## 3         <NA>              1 2007-01-01       January              Q1
## 4       London              2 2013-01-01       January              Q1
## 5     Bellevue              6 2008-01-01       January              Q1
## 6        Paris              4 2010-11-20      November              Q4
##   founded_year first_funding_at last_funding_at   seed venture
## 1           NA       2010-06-04      2010-09-23      0 4000000
## 2         2011       2011-05-31      2011-05-31      0 7000000
## 3         2007       2007-03-19      2007-03-19      0 2000000
## 4         2013       2013-01-01      2013-10-09 400000 4000000
## 5         2008       2008-10-14      2008-09-19      0 3814772
## 6         2010       2010-12-15      2010-11-13 522169 3536910
##   equity_crowdfunding undisclosed convertible_note debt_financing angel grant
## 1                   0           0                0              0     0     0
## 2                   0           0                0              0     0     0
## 3                   0           0                0              0     0     0
## 4                   0           0                0              0     0     0
## 5                   0           0                0        1147879     0     0
## 6                   0           0                0              0     0     0
##   private_equity post_ipo_equity post_ipo_debt secondary_market
## 1              0               0             0                0
## 2              0               0             0                0
## 3              0               0             0                0
## 4              0               0             0                0
## 5              0               0             0                0
## 6              0               0             0                0
##   product_crowdfunding round_A round_B round_C round_D round_E fund_date_issue
## 1                    0       0       0       0       0       0           FALSE
## 2                    0       0 7000000       0       0       0           FALSE
## 3                    0 2000000       0       0       0       0           FALSE
## 4                    0 4000000       0       0       0       0           FALSE
## 5                    0       0       0       0       0       0           FALSE
## 6                    0 3536910       0       0       0       0           FALSE
##   markets3     venture_cats real_venture  time_period
## 1    Other 0 to 117 million            0         <NA>
## 2    Other 0 to 117 million            0 1990 to 2014
## 3 Internet 0 to 117 million            0 1990 to 2014
## 4    Other 0 to 117 million            0 1990 to 2014
## 5 Internet 0 to 117 million            0 1990 to 2014
## 6    Other 0 to 117 million            0 1990 to 2014
##                             name                         category_list
## 1                        Sunbeam                       |Biotechnology|
## 2          SuperOx Wastewater Co                       |Manufacturing|
## 3 Brain Tunnelgenix Technologies                       |Biotechnology|
## 4                    Audioscribe                                  <NA>
## 5                    Canopy Labs           |Lead Generation|Analytics|
## 6                          Manta |Professional Networking|Curated Web|
##              market funding_total_usd    status country_code state_code
## 1    Biotechnology            9329636 operating          USA         FL
## 2    Manufacturing            1700000 operating          USA         TX
## 3    Biotechnology            2563168 operating          USA         CT
## 4              <NA>           1499800 operating         <NA>       <NA>
## 5  Lead Generation            2064000      <NA>          USA         CA
## 6      Curated Web           47215715 operating          USA         OH
##           region            city funding_rounds founded_at founded_month
## 1 Ft. Lauderdale Fort Lauderdale              1 2007-01-01       January
## 2        Houston         Houston              1 2010-01-01       January
## 3       Hartford      Bridgeport              3 2006-01-01       January
## 4           <NA>            <NA>              2       <NA>          <NA>
## 5    SF Bay Area   San Francisco              2 2012-01-01       January
## 6 Columbus, Ohio        Columbus              2 2005-09-01     September
##   founded_quarter founded_year first_funding_at last_funding_at    seed
## 1              Q1         2007       2007-12-11      2007-12-11       0
## 2              Q1         2010       2010-05-24      2010-05-24       0
## 3              Q1         2006       2006-04-16      2006-09-17  100000
## 4            <NA>           NA       2012-07-01      2012-10-01       0
## 5              Q1         2012       2012-12-13      2012-03-21 1500000
## 6              Q3         2005       2005-01-04      2005-04-02       0
##    venture equity_crowdfunding undisclosed convertible_note debt_financing
## 1  9329636                   0           0                0              0
## 2  1700000                   0           0                0              0
## 3  2213168                   0           0                0              0
## 4  1499800                   0           0                0              0
## 5   564000                   0           0                0              0
## 6 47215715                   0           0                0              0
##   angel grant private_equity post_ipo_equity post_ipo_debt secondary_market
## 1     0     0              0               0             0                0
## 2     0     0              0               0             0                0
## 3     0     0              0               0             0                0
## 4     0     0              0               0             0                0
## 5     0     0              0               0             0                0
## 6     0     0              0               0             0                0
##   product_crowdfunding round_A round_B round_C round_D round_E fund_date_issue
## 1                    0       0       0       0       0       0           FALSE
## 2                    0       0       0       0       0       0           FALSE
## 3               250000       0       0       0       0       0           FALSE
## 4                    0       0       0       0       0       0           FALSE
## 5                    0       0       0       0       0       0           FALSE
## 6                    0       0       0       0       0       0           FALSE
##                  markets3     venture_cats real_venture  time_period
## 1 Health/Medicine/Biotech  0 to 32 million     10390367 1990 to 2014
## 2                   Other  0 to 32 million      1822627 1990 to 2014
## 3 Health/Medicine/Biotech  0 to 32 million      2569207 1990 to 2014
## 4                    <NA>  0 to 32 million      1528805         <NA>
## 5                   Other  0 to 32 million       574599 1990 to 2014
## 6                Internet 32 to 64 million     57915632 1990 to 2014
##              name                   category_list          market
## 1  Crocodile Gold                 |Manufacturing|  Manufacturing 
## 2        Monitise                        |Mobile|         Mobile 
## 3      Vaccinogen                 |Biotechnology|  Biotechnology 
## 4 Engage Mobility          |App Marketing|Mobile|  App Marketing 
## 5      Plaza Bank |Non Profit|E-Commerce|Banking|        Banking 
## 6          Insmed                 |Biotechnology|  Biotechnology 
##   funding_total_usd    status country_code state_code     region
## 1          18000000      <NA>          CAN         ON    Toronto
## 2         232329416 operating          GBR       <NA>     London
## 3          99051350 operating          USA         MD Hagerstown
## 4           1000000 operating          USA         FL    Orlando
## 5           4200000 operating          USA         WA    Seattle
## 6         100000000 operating          USA         NJ     Newark
##                city funding_rounds founded_at founded_month founded_quarter
## 1           Toronto              1 2009-01-01       January              Q1
## 2            London              4 2003-01-01       January              Q1
## 3         Frederick              5 2007-01-01       January              Q1
## 4           Orlando              1 2012-11-01      November              Q4
## 5           Seattle              1       <NA>          <NA>            <NA>
## 6 Monmouth Junction              1       <NA>          <NA>            <NA>
##   founded_year first_funding_at last_funding_at seed  venture
## 1         2009       2009-02-20      2009-02-20    0        0
## 2         2003       2003-07-13      2003-11-27    0  3564000
## 3         2007       2007-12-03      2007-08-25    0 18251350
## 4         2012       2012-04-15      2012-04-15    0        0
## 5           NA       2014-07-03      2014-07-03    0        0
## 6           NA       2014-08-14      2014-08-14    0        0
##   equity_crowdfunding undisclosed convertible_note debt_financing angel grant
## 1                   0           0                0          0e+00     0     0
## 2                   0           0                0          0e+00     0     0
## 3                   0           0                0          8e+05     0     0
## 4                   0           0                0          0e+00     0     0
## 5                   0           0                0          0e+00     0     0
## 6                   0           0                0          0e+00     0     0
##   private_equity post_ipo_equity post_ipo_debt secondary_market
## 1              0        19842728             0                0
## 2              0       290983638             0                0
## 3              0        89095584             0                0
## 4              0         1016650             0                0
## 5              0         4200000             0                0
## 6              0       100000000             0                0
##   product_crowdfunding round_A round_B round_C round_D round_E fund_date_issue
## 1                    0       0       0       0       0       0           FALSE
## 2                    0       0       0       0       0       0           FALSE
## 3                    0       0       0       0       0       0           FALSE
## 4                    0       0       0       0       0       0           FALSE
## 5                    0       0       0       0       0       0           FALSE
## 6                    0       0       0       0       0       0           FALSE
##                  markets3  time_period
## 1                   Other 1990 to 2014
## 2       Apps/Social Media 1990 to 2014
## 3 Health/Medicine/Biotech 1990 to 2014
## 4                   Other 1990 to 2014
## 5                 Finance         <NA>
## 6 Health/Medicine/Biotech         <NA>

Histograms

## Warning: Removed 13 rows containing non-finite values (stat_bin).
## Warning: Removed 24 rows containing missing values (geom_bar).

## Warning: Removed 411 rows containing non-finite values (stat_bin).
## Warning: Removed 16 rows containing missing values (geom_bar).

Bar charts

Heatmaps

Boxplots

## Warning: Removed 13 rows containing non-finite values (stat_boxplot).

## Warning: Removed 69 rows containing non-finite values (stat_boxplot).

## Warning: Removed 12 rows containing non-finite values (stat_boxplot).

## Warning: Removed 44 rows containing non-finite values (stat_boxplot).

Scatterplots

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 5430 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 123 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 117 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 117 rows containing non-finite values (stat_smooth).
## Warning: Removed 5430 rows containing non-finite values (stat_binhex).

## Warning: Removed 85 rows containing missing values (geom_point).

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 85 rows containing non-finite values (stat_smooth).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 1997.9
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 15.075
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 25.756

## Warning: Removed 95 rows containing missing values (geom_point).

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 85 rows containing non-finite values (stat_smooth).

## Warning: Removed 117 rows containing non-finite values (stat_binhex).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 117 rows containing non-finite values (stat_smooth).

## Warning: Removed 117 rows containing missing values (geom_point).

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 117 rows containing non-finite values (stat_smooth).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 1997
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 7.04
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.0816
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 1992.9
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 10.08
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 122.77
## Warning in sqrt(sum.squares/one.delta): NaNs produced

## Warning: Removed 117 rows containing missing values (geom_point).

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 117 rows containing non-finite values (stat_smooth).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 1958.8
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 44.25
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 39.062

##                 name     market funding_total_usd    status country_code
## 1           #waywire      News            1750000  acquired          USA
## 2 &TV Communications     Games            4000000 operating          USA
## 3 -R- Ranch and Mine   Tourism              60000 operating          USA
## 4      .Club Domains  Software            7000000      <NA>          USA
## 5   004 Technologies  Software                 NA operating          USA
## 6            1-4 All  Software                 NA operating          USA
##   state_code              city funding_rounds founded_month founded_year
## 1         NY          New York              1          June         2012
## 2         CA       Los Angeles              2          <NA>           NA
## 3         TX        Fort Worth              2       January         2014
## 4         FL      Oakland Park              1       October         2011
## 5         IL         Champaign              1       January         2010
## 6         NC Connellys Springs              1          <NA>           NA
##   venture markets3
## 1       0    Other
## 2 4000000    Other
## 3       0    Other
## 4 7000000    Other
## 5       0    Other
## 6       0    Other
##                name      market funding_total_usd    status country_code
## 1   10 Minutes With  Education            4400000 operating          GBR
## 2          100e.com  Education            4500000 operating          CHN
## 3           100Plus  Analytics            1250000  acquired          USA
## 4 115 network disks  Education                 NA operating         <NA>
## 5            17u.cn     Travel           84440319 operating          CHN
## 6         1calendar  Education              40000 operating          DNK
##   state_code          city funding_rounds founded_month founded_year  venture
## 1       <NA>        London              2       January         2013  4000000
## 2       <NA>       Beijing              2          <NA>           NA  4500000
## 3         CA San Francisco              2     September         2011   500000
## 4       <NA>          <NA>              1          <NA>           NA        0
## 5       <NA>        Suzhou              3       January         2004 84440319
## 6       <NA>    Copenhagen              1       January         2009        0
##                      markets3
## 1                       Other
## 2                       Other
## 3 Big Data Analytics/Security
## 4                       Other
## 5                       Other
## 6                       Other
##               name      market funding_total_usd    status country_code
## 1    .Club Domains   Software            7000000      <NA>          USA
## 2 004 Technologies   Software                 NA operating          USA
## 3          1-4 All   Software                 NA operating          USA
## 4          100Plus  Analytics            1250000  acquired          USA
## 5         1010data   Software           35000000 operating          USA
## 6       10X10 Room   Software              77500 operating          USA
##   state_code              city funding_rounds founded_month founded_year
## 1         FL      Oakland Park              1       October         2011
## 2         IL         Champaign              1       January         2010
## 3         NC Connellys Springs              1          <NA>           NA
## 4         CA     San Francisco              2     September         2011
## 5         NY          New York              1       January         2000
## 6         MA         Lexington              1       January         2010
##    venture                    markets3
## 1  7000000                       Other
## 2        0                       Other
## 3        0                       Other
## 4   500000 Big Data Analytics/Security
## 5 35000000                       Other
## 6        0                       Other

## `summarise()` regrouping output by 'market' (override with `.groups` argument)
## # A tibble: 6 x 3
## # Groups:   market [2]
##   market        status    start_ups
##   <fct>         <fct>         <int>
## 1 " Analytics " acquired         42
## 2 " Analytics " closed            7
## 3 " Analytics " operating       323
## 4 " Analytics " <NA>              4
## 5 " Big Data "  acquired          7
## 6 " Big Data "  operating        83

## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 6 x 3
##   state_code compstate     Prob
##   <fct>          <int>    <dbl>
## 1 AK                12 0.000417
## 2 AL               105 0.00365 
## 3 AR               177 0.00615 
## 4 AZ               327 0.0114  
## 5 CA              9917 0.344   
## 6 CO               723 0.0251
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 6 x 3
##   funding_rounds tot_num_rounds   Prob
##            <int>          <int>  <dbl>
## 1              1          16908 0.587 
## 2              2           5696 0.198 
## 3              3           2806 0.0975
## 4              4           1562 0.0542
## 5              5            822 0.0285
## 6              6            466 0.0162

## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 223 x 2
##    country_code    long
##    <chr>          <dbl>
##  1 ABW           -70.0 
##  2 AFG            67.4 
##  3 AGO            16.2 
##  4 ALB            21.9 
##  5 AND             1.54
##  6 ARE            55.5 
##  7 ARG           -63.9 
##  8 ARM            44.8 
##  9 ASM          -171.  
## 10 ATG           -61.8 
## # … with 213 more rows
## # A tibble: 223 x 2
##    country_code   lat
##    <chr>        <dbl>
##  1 ABW           12.5
##  2 AFG           34.6
##  3 AGO          -11.7
##  4 ALB           40.8
##  5 AND           42.5
##  6 ARE           25.1
##  7 ARG          -34.2
##  8 ARM           40.3
##  9 ASM          -14.3
## 10 ATG           17.1
## # … with 213 more rows
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=0,0&zoom=1&size=640x640&scale=1&maptype=terrain&key=xxx
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Warning: Removed 1 rows containing missing values (geom_rect).
## Warning: Removed 7395 rows containing missing values (geom_point).

## Source : https://maps.googleapis.com/maps/api/staticmap?center=0,0&zoom=1&size=640x640&scale=1&maptype=terrain&key=xxx
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Warning: Removed 1 rows containing missing values (geom_rect).
## Warning: Removed 7878 rows containing missing values (geom_point).

## Warning: Removed 7075 rows containing missing values (position_stack).