Importing the given csv file and exploring/studying it to gain insight about the data.
## New names:
## * `` -> ...1
## Rows: 685927 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): customer_email
## dbl (3): ...1, net_revenue, year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
| …1 | customer_email | net_revenue | year |
|---|---|---|---|
| 0 | nhknapwsbx@gmail.com | 249.92 | 2015 |
| 1 | joiuzbvcpn@gmail.com | 87.61 | 2015 |
| 2 | ukkjctepxt@gmail.com | 168.38 | 2015 |
| 3 | gykatilzrt@gmail.com | 62.40 | 2015 |
| 4 | mmsgsrtxah@gmail.com | 43.08 | 2015 |
| 5 | mobvusnzfr@gmail.com | 39.44 | 2015 |
| 6 | kxwkqvacsv@gmail.com | 171.46 | 2015 |
| 7 | iqmnugdcjs@gmail.com | 111.99 | 2015 |
| 8 | baiikostmd@gmail.com | 170.70 | 2015 |
| 9 | lfeafnigbu@gmail.com | 54.99 | 2015 |
Next, we check if the dataset contains any missing values. This dataset does not contain any missing values.
## ...1 customer_email net_revenue year
## 0 0 0 0
Now we change the name of the first column from ‘..1’ to ‘index’ to enhance readability and understanding as well.
The first five rows of the dataset look likke this.
| index | customer_email | net_revenue | year |
|---|---|---|---|
| 0 | nhknapwsbx@gmail.com | 249.92 | 2015 |
| 1 | joiuzbvcpn@gmail.com | 87.61 | 2015 |
| 2 | ukkjctepxt@gmail.com | 168.38 | 2015 |
| 3 | gykatilzrt@gmail.com | 62.40 | 2015 |
| 4 | mmsgsrtxah@gmail.com | 43.08 | 2015 |
We now find all the unique values in the column ‘year’
## [1] 2015 2016 2017
We now count the number of values corresponding to each year, in the order 2017, 2016 and 2015.
## [1] 231294
## [1] 204646
## [1] 249987
## # A tibble: 3 × 2
## year Revenue
## <dbl> <dbl>
## 1 2015 29036749.
## 2 2016 25730944.
## 3 2017 31417495.
## # A tibble: 1 × 1
## Revenue
## <dbl>
## 1 21769213.
## # A tibble: 1 × 1
## Revenue
## <dbl>
## 1 17206367.
## Joining, by = "customer_email"
## Joining, by = "customer_email"
## [1] "Existing customer growth for 2017 = 63857.0600000005"
## Joining, by = "customer_email"
## Joining, by = "customer_email"
## [1] "Existing customer growth for 2016 = 39043.6499999985"
## [1] "Revenue lost due to attrition for 2017 = 5686551.44"
## [1] "Revenue lost due to attrition for 2016 = -3305805.6"
A negative value indicates that there was rather an increase in the total revenue from the year 2015 to 2016
Existing customer revenue current year (2017)
## Joining, by = "customer_email"
## # A tibble: 1 × 1
## Revenue_2017
## <dbl>
## 1 9648282.
Existing customer revenue prior year (2016)
## Joining, by = "customer_email"
## # A tibble: 1 × 1
## Revenue_2016
## <dbl>
## 1 9584425.
Existing customer revenue current year (2016)
## Joining, by = "customer_email"
## # A tibble: 1 × 1
## Revenue_2016
## <dbl>
## 1 8524577.
Existing customer revenue prior year (2015)
## Joining, by = "customer_email"
## # A tibble: 1 × 1
## Revenue_2015
## <dbl>
## 1 8485533.
## [1] "Total customers current year (2017) = 249987"
## [1] "Total customers previous year (2016) = 204646"
## [1] "Total customers previous year (2015) = 231294"
## [1] "Total number of new customers in 2017 from 2016= 45341"
## [1] "Total number of new customers in 2016 from 2015= -26648"
A negative value indicates a loss of customers in 2016 from 2015
## Joining, by = "customer_email"
## # A tibble: 173,449 × 1
## customer_email
## <chr>
## 1 qvjfterwnk@gmail.com
## 2 fdkiioqtli@gmail.com
## 3 vpsfdhgrfo@gmail.com
## 4 pnnenkxlpj@gmail.com
## 5 qkjragcrsd@gmail.com
## 6 sqashwoqai@gmail.com
## 7 thcyofilac@gmail.com
## 8 ylqsflvinx@gmail.com
## 9 actvovcfwt@gmail.com
## 10 wacguvfcpu@gmail.com
## # … with 173,439 more rows
## [1] "Total number of customers lost in 2016 from 2015 = 26648"
## [1] "Emails of customers lost in 2016"
## Joining, by = "customer_email"
## # A tibble: 163,539 × 1
## customer_email
## <chr>
## 1 nhknapwsbx@gmail.com
## 2 joiuzbvcpn@gmail.com
## 3 ukkjctepxt@gmail.com
## 4 gykatilzrt@gmail.com
## 5 mmsgsrtxah@gmail.com
## 6 mobvusnzfr@gmail.com
## 7 kxwkqvacsv@gmail.com
## 8 iqmnugdcjs@gmail.com
## 9 utycjynusf@gmail.com
## 10 dptyjlaniw@gmail.com
## # … with 163,529 more rows
Here are two plots:
If given more time, I’d work more on the visualizations and explore deeper relationships between the existing customers and new customers wrt years and calculate yearly profit, loss and would gain deeper insight about the data.
PS: I’m extremely thankful to Stout for giving me this opportunity and I have thoroughly enjoyed working on this. I could not work more on this assessment because I have 2 midterms this week.