library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggridges)

Observations on the data provided by Enterprise Health:

Observation 1 There are 182 lost deals, 53 won deals, and 82 deals in progress. The ratio of wins to losses is 53:182, which is a win rate of 53/235 ≈ 22.6% among closed deals. If the open deals close at the same rate, about 18.5 of the 82 deals currently in the pipeline will be wins and about 63.5 will be losses.
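The projection in Observation 1 can be reproduced with a few lines of base R. This is a minimal sketch: the counts are the ones quoted above, the variable names are illustrative, and it assumes that open deals will close at the same rate as past deals.

```r
# Counts quoted in Observation 1
won  <- 53
lost <- 182
open <- 82

# Historical win rate among closed deals
win_rate <- won / (won + lost)

# Expected outcomes for the open pipeline, assuming the same win rate holds
expected_wins   <- open * win_rate
expected_losses <- open - expected_wins

round(c(win_rate = win_rate,
        expected_wins = expected_wins,
        expected_losses = expected_losses), 3)
```

The estimate is only as good as the assumption that the open deals resemble the closed ones.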

Observation 2 Comparing the annual recurring revenue of closed and open deals, most deals in both groups fall in the 3,000 to 30,000 range. Closed deals, however, also show a consistent cluster in the 30K to 60K range, while only a few open deals reach that range. I assume this means deals often go up in price before they close.

Observation 3 The ridge plots of annual recurring revenue (ARR) for won and lost deals are very similar, which leads me to believe that ARR is not a deal breaker in most cases.

Observation 4 Most deal owners win about 17% of their closed deals, while KJ is closer to 30%, which leads me to believe that more experienced deal owners improve their win percentage over time. Scott and Jeff are outliers with higher percentages, likely because they have closed only a few deals so far. Matt, despite having many closed deals, has a surprisingly low win percentage.
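The win percentages in Observation 4 are read off a stacked bar chart, but they could also be tabulated directly. The sketch below uses a small hypothetical data frame (`toy_closed`, with made-up owners A, B, and C) in place of the real export. Only the column names `Deal_owner` and `Deal_Stage` and the stage labels come from this report.

```r
library(dplyr)

# Hypothetical closed deals; owners and outcomes are invented for illustration
toy_closed <- data.frame(
  Deal_owner = c("A", "A", "A", "B", "B", "C"),
  Deal_Stage = c("Closed won", "Closed lost", "Closed lost",
                 "Closed won", "Closed won", "Closed lost")
)

# Win percentage per owner, with the number of closed deals behind it
win_rates <- toy_closed %>%
  group_by(Deal_owner) %>%
  summarise(
    closed_deals = n(),
    win_pct = 100 * mean(Deal_Stage == "Closed won"),
    .groups = "drop"
  ) %>%
  arrange(desc(win_pct))

win_rates
```

Keeping `closed_deals` next to `win_pct` makes it easier to spot owners whose high percentage rests on only a handful of deals.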

Observation 5 Most deals currently in the pipeline are in either the Gain Sponsorship or the Proposal stage, while only a minority are in the Contract stage.
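A count per stage gives the same picture as the bar chart in numeric form. This is a sketch on hypothetical rows (`toy_open`). The stage names match the ones used in this report, but the counts are invented.

```r
library(dplyr)

# Hypothetical open-pipeline rows; only the stage names come from the report
toy_open <- data.frame(
  Deal_Stage = c("Proposal", "Gain Sponsorship", "Proposal", "Contract",
                 "Gain Sponsorship", "Qualify", "Demo", "Proposal")
)

# Deals per stage, largest first, with each stage's share of the pipeline
stage_share <- toy_open %>%
  count(Deal_Stage, sort = TRUE) %>%
  mutate(share_pct = 100 * n / sum(n))

stage_share
```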

## Data Comparisons
library(readxl)
Enterprise <- read_excel("Hubspot_Data_for_Class_1.xlsx")

# Closed deals: already won or lost
Enterprise_Closed <- filter(Enterprise, Deal_Stage %in% c("Closed lost", "Closed won"))

# Open deals: every stage still in the pipeline
Enterprise_Open <- filter(Enterprise, Deal_Stage %in% c("Qualify", "Proposal", "Gain Sponsorship", "Demo", "Contract"))


# Number of deals in each stage
ggplot(Enterprise, aes(x = Deal_Stage)) +
  geom_bar()

# Distribution of annual recurring revenue for closed deals
ggplot(Enterprise_Closed, aes(x = Annual_Recurring_Rev)) +
  geom_histogram(binwidth = 50000)
## Warning: Removed 49 rows containing non-finite values (stat_bin).

# Distribution of annual recurring revenue for open deals
ggplot(Enterprise_Open, aes(x = Annual_Recurring_Rev)) +
  geom_histogram(binwidth = 50000)
## Warning: Removed 6 rows containing non-finite values (stat_bin).

# Ridge plot of annual recurring revenue: won vs. lost deals
ggplot(Enterprise_Closed, aes(x = Annual_Recurring_Rev, y = Deal_Stage, fill = Deal_Stage, color = Deal_Stage)) +
  geom_density_ridges(alpha = 0.5)
## Picking joint bandwidth of 38900
## Warning: Removed 49 rows containing non-finite values (stat_density_ridges).

# Share of won vs. lost deals for each deal owner
ggplot(Enterprise_Closed, aes(fill = Deal_Stage, x = Deal_owner)) +
  geom_bar(position = "fill")

# Open deals per deal owner, colored by stage
ggplot(Enterprise_Open, aes(fill = Deal_Stage, x = Deal_owner)) +
  geom_bar()

# Violin plot of annual recurring revenue: won vs. lost deals
ggplot(Enterprise_Closed, aes(x = Deal_Stage, y = Annual_Recurring_Rev)) +
  geom_violin()
## Warning: Removed 49 rows containing non-finite values (stat_ydensity).