This is my practical exam report for DataCamp’s Data Analyst Professional Certificate.
The Pens and Printers Company data are imported from “https://s3.amazonaws.com/talent-assets.datacamp.com/product_sales.csv”.
The data set consists of 15,000 rows and 8 columns: ‘week’, ‘sales_method’, ‘customer_id’, ‘nb_sold’, ‘revenue’, ‘years_as_customer’, ‘nb_site_visits’ and ‘state’.
‘week’ ranges from 1 to 6, and there are 3 sales methods: “Email”, “Call” and the combination “Email + Call”. However, 23 orders are labelled “em + call”, which presumably means “Email + Call”, and 10 are labelled “email”; these are converted to “Email + Call” and “Email” respectively.
There are 15,000 distinct customers from 50 states.
A total of 151,270 products were sold during the 6 weeks of the sales campaign, with a min of 7, mean of 10.08, median of 10 and max of 16 per order.
The total revenue is 1,308,138 (missing values excluded), with a min of 32.54, median of 89.50, mean of 93.93 and max of 238.32.
There are 1,074 missing values in ‘revenue’, about 7.2% of rows, so the missing values are replaced by the mean of ‘revenue’, 93.93.
‘years_as_customer’ has a min of 0, median of 3, mean of 4.966 and max of 63.
The total of ‘nb_site_visits’ is 374,863, with a min of 12, median of 25, mean of 24.99 and max of 41.
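The missing-value count and share quoted above can be reproduced column by column. The following is a minimal sketch in base R; the data frame `toy` and its values are made up for illustration and are not the real data.

```r
# Toy data standing in for the real table (values are made up).
toy <- data.frame(
  nb_sold = c(10, 15, 11, 9),
  revenue = c(NA, 225.47, 52.55, NA)
)

# Number of missing values in each column.
na_count <- colSums(is.na(toy))

# Share of rows with a missing value, in percent.
na_share <- round(100 * na_count / nrow(toy), 1)

print(na_count)
print(na_share)

# The same arithmetic for the figures in this report: 1,074 of 15,000 rows.
report_share <- round(100 * 1074 / 15000, 1)
print(report_share)  # 7.2
```

Running `colSums(is.na(df))` on the imported data would give the per-column counts for the real table.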
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
df <- read_csv("https://s3.amazonaws.com/talent-assets.datacamp.com/product_sales.csv")
## Rows: 15000 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): sales_method, customer_id, state
## dbl (5): week, nb_sold, revenue, years_as_customer, nb_site_visits
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df,5)
## # A tibble: 5 × 8
## week sales_method customer_id nb_sold revenue years_as_customer
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2 Email 2e72d641-95ac-497b-bbf8-… 10 NA 0
## 2 6 Email + Call 3998a98d-70f5-44f7-942e-… 15 225. 1
## 3 5 Call d1de9884-8059-4065-b10f-… 11 52.6 6
## 4 4 Email 78aa75a4-ffeb-4817-b1d0-… 11 NA 3
## 5 3 Email 10e6d446-10a5-42e5-8210-… 9 90.5 0
## # ℹ 2 more variables: nb_site_visits <dbl>, state <chr>
View(df)
glimpse(df)
## Rows: 15,000
## Columns: 8
## $ week <dbl> 2, 6, 5, 4, 3, 6, 4, 1, 5, 5, 3, 2, 5, 2, 5, 4, 2, 6…
## $ sales_method <chr> "Email", "Email + Call", "Call", "Email", "Email", "…
## $ customer_id <chr> "2e72d641-95ac-497b-bbf8-4861764a7097", "3998a98d-70…
## $ nb_sold <dbl> 10, 15, 11, 11, 9, 13, 11, 10, 11, 11, 9, 9, 11, 10,…
## $ revenue <dbl> NA, 225.47, 52.55, NA, 90.49, 65.01, 113.38, 99.94, …
## $ years_as_customer <dbl> 0, 1, 6, 3, 0, 10, 9, 1, 10, 7, 4, 2, 2, 1, 1, 2, 6,…
## $ nb_site_visits <dbl> 24, 28, 26, 25, 28, 24, 28, 22, 31, 23, 28, 23, 30, …
## $ state <chr> "Arizona", "Kansas", "Wisconsin", "Indiana", "Illino…
summary(df)
## week sales_method customer_id nb_sold
## Min. :1.000 Length:15000 Length:15000 Min. : 7.00
## 1st Qu.:2.000 Class :character Class :character 1st Qu.: 9.00
## Median :3.000 Mode :character Mode :character Median :10.00
## Mean :3.098 Mean :10.08
## 3rd Qu.:5.000 3rd Qu.:11.00
## Max. :6.000 Max. :16.00
##
## revenue years_as_customer nb_site_visits state
## Min. : 32.54 Min. : 0.000 Min. :12.00 Length:15000
## 1st Qu.: 52.47 1st Qu.: 1.000 1st Qu.:23.00 Class :character
## Median : 89.50 Median : 3.000 Median :25.00 Mode :character
## Mean : 93.93 Mean : 4.966 Mean :24.99
## 3rd Qu.:107.33 3rd Qu.: 7.000 3rd Qu.:27.00
## Max. :238.32 Max. :63.000 Max. :41.00
## NA's :1074
str(df)
## spc_tbl_ [15,000 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ week : num [1:15000] 2 6 5 4 3 6 4 1 5 5 ...
## $ sales_method : chr [1:15000] "Email" "Email + Call" "Call" "Email" ...
## $ customer_id : chr [1:15000] "2e72d641-95ac-497b-bbf8-4861764a7097" "3998a98d-70f5-44f7-942e-789bb8ad2fe7" "d1de9884-8059-4065-b10f-86eef57e4a44" "78aa75a4-ffeb-4817-b1d0-2f030783c5d7" ...
## $ nb_sold : num [1:15000] 10 15 11 11 9 13 11 10 11 11 ...
## $ revenue : num [1:15000] NA 225.5 52.5 NA 90.5 ...
## $ years_as_customer: num [1:15000] 0 1 6 3 0 10 9 1 10 7 ...
## $ nb_site_visits : num [1:15000] 24 28 26 25 28 24 28 22 31 23 ...
## $ state : chr [1:15000] "Arizona" "Kansas" "Wisconsin" "Indiana" ...
## - attr(*, "spec")=
## .. cols(
## .. week = col_double(),
## .. sales_method = col_character(),
## .. customer_id = col_character(),
## .. nb_sold = col_double(),
## .. revenue = col_double(),
## .. years_as_customer = col_double(),
## .. nb_site_visits = col_double(),
## .. state = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
table(df$sales_method)
##
## Call em + call email Email Email + Call
## 4962 23 10 7456 2549
df %>% summarize(n_customers = n_distinct(customer_id))
## # A tibble: 1 × 1
## n_customers
## <int>
## 1 15000
After cleaning the data by replacing the incorrect values in ‘sales_method’, there are 3 sales methods, “Email”, “Call” and “Email + Call”, and no more “em + call” or “email” labels.
I replace the missing values of ‘revenue’ with the mean of 93.93. The total ‘revenue’ is now 1,409,019 instead of 1,308,138, and the median is 91.86 instead of 89.50.
Pens and Printers was founded in 1984, but the summary shows a max ‘years_as_customer’ of 63, which must be an error. Another customer has a ‘years_as_customer’ of 47. I replace these two erroneous values with 5, the rounded mean of the column.
df_1 <- df %>% mutate(sales_method = case_when(
sales_method == "em + call" ~ "Email + Call",
sales_method == "email" ~ "Email", TRUE ~ sales_method))
df_2 <- df_1 %>% mutate(revenue = replace_na(revenue, 93.93) )
table(df_2$sales_method) # check if errors are corrected
##
## Call Email Email + Call
## 4962 7466 2572
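An alternative to spelling out each case is a lookup table from raw label to canonical label. This is only a sketch under assumptions: `label_map` and `fix_labels` are hypothetical names, and the toy vector `raw` is made up to mirror the labels seen in the data.

```r
# Hypothetical lookup table: raw label -> canonical label.
label_map <- c(
  "em + call" = "Email + Call",
  "email"     = "Email"
)

# Toy vector with the same kinds of labels as the raw data.
raw <- c("Email", "em + call", "Call", "email", "Email + Call")

# Replace labels found in the lookup; leave all others unchanged.
fix_labels <- function(x, map) {
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

cleaned <- fix_labels(raw, label_map)
print(table(cleaned))
```

A lookup keeps the list of corrections in one place, which may be easier to extend if more misspelled labels turn up.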
summary(df_2$revenue) # check if errors are corrected
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 32.54 53.04 91.86 93.93 106.07 238.32
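The new revenue total can be checked by hand: each of the 1,074 missing values contributes the imputed mean. The sketch below redoes that arithmetic and then uses a made-up vector `x` to illustrate a general property of mean imputation: the mean is preserved while the spread shrinks, which is consistent with the quartiles moving closer together in the summary above.

```r
n_missing    <- 1074     # missing revenue values, from summary(df)
mean_revenue <- 93.93    # mean of the observed revenue values
total_before <- 1308138  # total revenue with missing values excluded

total_after <- total_before + n_missing * mean_revenue
print(round(total_after))  # 1409019

# Toy illustration (made-up values): mean imputation keeps the mean
# but reduces the standard deviation.
x <- c(40, 60, NA, 100, NA)
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

print(c(mean_observed = mean(x, na.rm = TRUE), mean_imputed = mean(x_imputed)))
print(c(sd_observed = sd(x, na.rm = TRUE), sd_imputed = sd(x_imputed)))
```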
df_e <- df_2 %>% filter(years_as_customer > 40)
head(df_e) # check whether implausible values occur
## # A tibble: 2 × 8
## week sales_method customer_id nb_sold revenue years_as_customer
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2 Email 18919515-a618-430c-9a05-… 10 97.2 63
## 2 4 Call 2ea97d34-571d-4e1b-95be-… 10 50.5 47
## # ℹ 2 more variables: nb_site_visits <dbl>, state <chr>
# final cleaned data
df_cln <- df_2 %>% mutate( years_as_customer = ifelse(years_as_customer >40, 5, years_as_customer) )
summary(df_cln)
## week sales_method customer_id nb_sold
## Min. :1.000 Length:15000 Length:15000 Min. : 7.00
## 1st Qu.:2.000 Class :character Class :character 1st Qu.: 9.00
## Median :3.000 Mode :character Mode :character Median :10.00
## Mean :3.098 Mean :10.08
## 3rd Qu.:5.000 3rd Qu.:11.00
## Max. :6.000 Max. :16.00
## revenue years_as_customer nb_site_visits state
## Min. : 32.54 Min. : 0.000 Min. :12.00 Length:15000
## 1st Qu.: 53.04 1st Qu.: 1.000 1st Qu.:23.00 Class :character
## Median : 91.86 Median : 3.000 Median :25.00 Mode :character
## Mean : 93.93 Mean : 4.959 Mean :24.99
## 3rd Qu.:106.07 3rd Qu.: 7.000 3rd Qu.:27.00
## Max. :238.32 Max. :39.000 Max. :41.00
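The cut-off of 40 used for ‘years_as_customer’ can be derived from the founding year rather than hard-coded. A minimal sketch, assuming the analysis year is 2024 (the report does not state it) and using a made-up vector `years`:

```r
founding_year <- 1984
analysis_year <- 2024  # ASSUMPTION: the report does not state the analysis year

# Nobody can have been a customer for longer than the company has existed.
max_plausible_years <- analysis_year - founding_year

years <- c(0, 1, 6, 63, 47, 39)  # toy values, including the two outliers
replacement <- 5                 # rounded mean, as used in the report

years_clean <- ifelse(years > max_plausible_years, replacement, years)
print(max_plausible_years)
print(years_clean)
```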
Number of customers by sales method. ‘Email’ only is the method that reached the most customers, followed by ‘Call’ and then the combination ‘Email + Call’. However, the number of customers contacted by ‘Email’ only gradually decreased over the weeks.
The numbers of customers by sales method are as follows:
Call: 4962 (33.08%)
Email: 7466 (49.77%)
Email + Call: 2572 (17.15%)
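The percentages above follow directly from the counts. A short sketch in base R, with the counts copied from the cleaned `table()` output:

```r
# Customer counts per sales method, copied from the cleaned table.
counts <- c("Call" = 4962, "Email" = 7466, "Email + Call" = 2572)

# prop.table() divides each count by the total; convert to percent.
shares <- round(100 * prop.table(counts), 2)
print(shares)
```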
df_cln %>% ggplot(aes(sales_method, fill = sales_method)) +
  geom_bar(stat = "count") +
  scale_y_continuous(labels = scales::comma) +
  labs(x= "Sale Methods", y="Number of customers",
       title = "Number of Customers By Sale Methods", subtitle = "Data source: Pens and Printers Company") +
  # after_stat(count) replaces the ..count.. notation deprecated in ggplot2 3.4.0
  stat_count(geom = "text", colour = "white", size = 3.5,
             aes(label = after_stat(count)), position = position_stack(vjust = 0.5))
The number of customers decreases by week.
df_cln %>%
ggplot(aes(week, fill = factor(week))) +
geom_bar(stat = "count") +
scale_x_continuous(breaks = 1:6, labels = 1:6) +
scale_y_continuous(labels = scales::comma) +
labs(x= "Week", y="Number of customers",
title = "Number of Customers By Week", subtitle = "Data source: Pens and Printers Company") +
  # after_stat(count) replaces the ..count.. notation deprecated in ggplot2 3.4.0
  stat_count(geom = "text", colour = "white", size = 3.5,
             aes(label = after_stat(count)), position = position_stack(vjust = 0.5))
The number of customers contacted by every sales method decreases sharply in week 6. As a result, the number of orders and the total revenue shown later drop as well.
The number of customers changes differently by week for each sales method. The number of customers with “Email” decreases steadily until the last week. The numbers of customers with “Call” and “Email + Call” increase steadily until week 5, then drop suddenly in week 6. The combined method follows the same pattern as “Call”, but at a lower level.
df_cln %>% ggplot(aes(week, fill= sales_method)) +
geom_bar(position = position_dodge(width = 0.8)) +
scale_x_continuous(breaks = 1:6, labels = 1:6) +
scale_y_continuous(labels = scales::comma) +
labs(x= "Week", y="Number of customers",
title = "Number of Customers By Sale Methods and Week",
subtitle = "Data source: Pens and Printers Company")
The number of products sold per week increases sharply in weeks 4 and 5, even though the number of customers does not increase. In these weeks, the number of contacts by the combination of email and call increased.
week_sold <- df_cln %>%
group_by(week) %>%
summarize(num_sold = sum(nb_sold))
ggplot(week_sold, aes(week, num_sold)) +
  geom_line(color = "blue", linewidth = 0.7) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  scale_x_continuous(breaks = 1:6, labels = 1:6) +
  scale_y_continuous(labels = scales::comma) +
  labs(x= "Week", y="Number of products sold", title = "Number of Products Sold By Week",
       subtitle = "Data source: Pens and Printers Company")
The number of site visits per week follows the same pattern as the number of products sold per week.
wk_visits <- df_cln %>%
group_by(week) %>%
summarize(num_visits = sum(nb_site_visits))
ggplot(wk_visits, aes(week, num_visits)) +
  geom_line(color = "green", linewidth = 0.7) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
scale_x_continuous(breaks = 1:6, labels = 1:6) +
scale_y_continuous(labels = scales::comma) +
labs(x= "Week", y="Site visits", title = "Site Visits By Week",
subtitle = "Data source: Pens and Printers Company")
The mean number of years as a customer decreases by week. Most loyal customers, those with 5 or more years as customers, placed their orders in the first 3 weeks of the sales campaign.
wk_years <- df_cln %>%
group_by(week) %>%
summarize(mean_years = mean(years_as_customer))
ggplot(wk_years, aes(week, mean_years)) +
  geom_line(color = "pink", linewidth = 1) + # alpha = 3 dropped: alpha must lie between 0 and 1
scale_x_continuous(breaks = 1:6, labels = 1:6) +
labs(x= "Week", y="Mean years as customers", title = "Mean Years as Customer By Week",
subtitle = "Data source: Pens and Printers Company")
Number of products sold by state.
The top 20 states by number of products bought are listed below.
California, Texas, New York, Florida, Illinois, Pennsylvania, Ohio, Michigan, Georgia and North Carolina are the top 10 states by number of products bought.
st_sold <- df_cln %>%
group_by(state) %>%
summarize(sum_state_sold = sum(nb_sold)) %>%
arrange(desc(sum_state_sold))
st_sold[1:20,]
## # A tibble: 20 × 2
## state sum_state_sold
## <chr> <dbl>
## 1 California 18859
## 2 Texas 11957
## 3 New York 9734
## 4 Florida 9201
## 5 Illinois 6143
## 6 Pennsylvania 5979
## 7 Ohio 5699
## 8 Michigan 4998
## 9 Georgia 4930
## 10 North Carolina 4559
## 11 New Jersey 4338
## 12 Virginia 3790
## 13 Indiana 3558
## 14 Washington 3424
## 15 Tennessee 3414
## 16 Arizona 3238
## 17 Missouri 3122
## 18 Massachusetts 2913
## 19 Maryland 2669
## 20 Wisconsin 2528
Revenue by state.
The top 20 states by revenue are listed below.
California, Texas, New York, Florida, Illinois, Pennsylvania, Ohio, Michigan, Georgia and North Carolina are the top 10 states by revenue.
st_revenue <- df_cln %>%
group_by(state) %>%
summarize(sum_state_revenue = sum(revenue)) %>%
arrange(desc(sum_state_revenue))
st_revenue[1:20,]
## # A tibble: 20 × 2
## state sum_state_revenue
## <chr> <dbl>
## 1 California 173534.
## 2 Texas 113621.
## 3 New York 89442.
## 4 Florida 84978.
## 5 Illinois 56500.
## 6 Pennsylvania 55822.
## 7 Ohio 52332.
## 8 Michigan 47431.
## 9 Georgia 46150.
## 10 North Carolina 41142.
## 11 New Jersey 39533.
## 12 Virginia 36192.
## 13 Indiana 33160.
## 14 Washington 32841.
## 15 Tennessee 30701.
## 16 Arizona 29643.
## 17 Missouri 28208.
## 18 Massachusetts 27480.
## 19 Maryland 24480.
## 20 Wisconsin 23680.
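To put the ranking in perspective, the share of total revenue coming from the top 10 states can be computed from the table above. A sketch with the printed (rounded) figures and the total revenue of 1,409,019 after imputation:

```r
# Revenue of the top 10 states, copied from the printed table (rounded).
top10_revenue <- c(
  "California" = 173534, "Texas" = 113621, "New York" = 89442,
  "Florida" = 84978, "Illinois" = 56500, "Pennsylvania" = 55822,
  "Ohio" = 52332, "Michigan" = 47431, "Georgia" = 46150,
  "North Carolina" = 41142
)
total_revenue <- 1409019  # total revenue after imputation

top10_share <- sum(top10_revenue) / total_revenue
print(round(100 * top10_share, 1))  # about 54 percent
```

Under these figures, the top 10 states account for a little over half of total revenue.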
Revenue per week follows a pattern similar to that of the number of products sold and the number of site visits.
wk_revenue <- df_cln %>%
group_by(week) %>%
summarize(total_revenue = sum(revenue))
ggplot(wk_revenue, aes(week, total_revenue)) +
  geom_line(color = "red", linewidth = 0.7) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
scale_x_continuous(breaks = 1:6, labels = 1:6) +
scale_y_continuous(labels = scales::comma)+
labs(x= "Week", y="Revenue", title = "Revenue By Week",
subtitle = "Data source: Pens and Printers Company")
Total revenue is 1,409,019. It is high in the first week, decreases until week 3, then increases again until week 5. It drops sharply in week 6.
Revenue by sales method:
“Email”: 723,415.8
“Email + Call”: 441,038.3
“Call”: 244,564.8
Total revenue by week and sales method: revenue from the email method decreases sharply over time, while revenue from the call method does not increase significantly.
The graph shows that the combination of email and call is the best sales method for increasing revenue. Therefore, emails combined with calls is the sales method recommended for the company.
sum(df_cln$revenue)
## [1] 1409019
df_cln %>% group_by(sales_method) %>%
summarize(sum_revenue = sum(revenue))
## # A tibble: 3 × 2
## sales_method sum_revenue
## <chr> <dbl>
## 1 Call 244565.
## 2 Email 723416.
## 3 Email + Call 441038.
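Total revenue per method depends heavily on how many customers each method reached, so revenue per customer is a useful companion figure. A sketch using the totals above and the customer counts from the cleaned `table()` output (revenue includes the imputed values):

```r
# Total revenue and number of customers per sales method (from the report).
revenue   <- c("Call" = 244564.8, "Email" = 723415.8, "Email + Call" = 441038.3)
customers <- c("Call" = 4962,     "Email" = 7466,     "Email + Call" = 2572)

# Average revenue per customer for each method.
per_customer <- revenue / customers
print(round(per_customer, 2))
```

On these figures, ‘Email + Call’ brings in about 171 per customer, compared with about 97 for ‘Email’ and about 49 for ‘Call’.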
wk_sale_revenue <- df_cln %>%
group_by(week, sales_method) %>%
summarize(total_revenue = sum(revenue), .groups = 'drop')
ggplot(wk_sale_revenue, aes(week, total_revenue, color = sales_method)) +
geom_line() +
scale_x_continuous(breaks = 1:6, labels = 1:6) +
scale_y_continuous(labels = scales::comma)+
labs(x= "Week", y="Revenue", title = "Revenue By Week and Sale Methods",
subtitle = "Data source: Pens and Printers Company")
I consider revenue the main metric for monitoring company performance, so I want to further investigate the effect of the sales methods on revenue.
The difference in revenue between the sales methods is significant (one-way ANOVA, p-value < 2e-16).
According to the TukeyHSD test, the largest difference in mean revenue is between the combined ‘Email + Call’ approach and ‘Call’ only (122.19), then between the combined method and ‘Email’ only (74.58), then between ‘Email’ only and ‘Call’ only (47.61).
It is the ‘Email + Call’ method, with the largest differences from the other methods, that drives the overall increase in products sold and ‘revenue’ in weeks 4 and 5. In these weeks, ‘Email’ contacts decrease and ‘Call’ contacts increase a little.
Therefore the combined ‘Email + Call’ and ‘Email’ only are better approaches than ‘Call’ only.
model1 <- aov(revenue ~ sales_method, data = df_cln)
summary(model1)
## Df Sum Sq Mean Sq F value Pr(>F)
## sales_method 2 25421358 12710679 32246 <2e-16 ***
## Residuals 14997 5911407 394
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(model1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = revenue ~ sales_method, data = df_cln)
##
## $sales_method
## diff lwr upr p adj
## Email-Call 47.60714 46.75479 48.45949 0
## Email + Call-Call 122.18922 121.05855 123.31990 0
## Email + Call-Email 74.58208 73.51811 75.64606 0
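The F value in the ANOVA table is the ratio of the two mean squares, and the sums of squares also give the share of variance explained by ‘sales_method’ (eta squared). A sketch that redoes this arithmetic from the printed table; note that the underlying revenue includes the imputed values:

```r
# Sums of squares and degrees of freedom, copied from summary(model1).
ss_method <- 25421358; df_method <- 2
ss_resid  <- 5911407;  df_resid  <- 14997

# Mean squares and the F statistic.
ms_method <- ss_method / df_method
ms_resid  <- ss_resid / df_resid
f_value   <- ms_method / ms_resid

# Eta squared: share of total variation explained by sales_method.
eta_sq <- ss_method / (ss_method + ss_resid)

print(round(f_value))    # 32246
print(round(eta_sq, 3))  # 0.811
```

On these numbers, ‘sales_method’ alone accounts for roughly 81% of the variation in revenue.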
Revenue is the dependent variable in a regression model that investigates the effect of the other, independent variables on revenue.
Before building the model, I evaluate the correlation between variables.
The results show no strong correlation among the independent variables.
library(GGally)
## Warning: package 'GGally' was built under R version 4.3.2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
df_cor <- df_cln %>% select(-customer_id, - state, - week)
ggpairs(df_cor)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
A glm model is built with ‘revenue’ as the dependent variable and the 4 other variables above as independent variables.
The summary shows that ‘nb_site_visits’ and ‘years_as_customer’ are not significant (p-values > 0.05).
AIC: 125284
Therefore, a reduced model is built without these two variables.
model2 <- glm(revenue ~ sales_method + nb_sold + nb_site_visits + years_as_customer, data = df_cln, family = "gaussian")
summary(model2)
##
## Call:
## glm(formula = revenue ~ sales_method + nb_sold + nb_site_visits +
## years_as_customer, family = "gaussian", data = df_cln)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -25.64682 1.06564 -24.067 <2e-16 ***
## sales_methodEmail 45.87221 0.28915 158.644 <2e-16 ***
## sales_methodEmail + Call 100.65211 0.44627 225.543 <2e-16 ***
## nb_sold 7.96411 0.09449 84.282 <2e-16 ***
## nb_site_visits -0.03557 0.04221 -0.843 0.399
## years_as_customer 0.01296 0.02580 0.502 0.615
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 248.1067)
##
## Null deviance: 31332766 on 14999 degrees of freedom
## Residual deviance: 3720113 on 14994 degrees of freedom
## AIC: 125284
##
## Number of Fisher Scoring iterations: 2
The reduced glm model is slightly better, with a smaller AIC of 125281.
model3 <- glm(revenue ~ sales_method + nb_sold, data = df_cln, family = "gaussian")
summary(model3)
##
## Call:
## glm(formula = revenue ~ sales_method + nb_sold, family = "gaussian",
## data = df_cln)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26.07568 0.83252 -31.32 <2e-16 ***
## sales_methodEmail 45.86657 0.28909 158.66 <2e-16 ***
## sales_methodEmail + Call 100.66652 0.44599 225.72 <2e-16 ***
## nb_sold 7.92490 0.08433 93.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 248.0896)
##
## Null deviance: 31332766 on 14999 degrees of freedom
## Residual deviance: 3720351 on 14996 degrees of freedom
## AIC: 125281
##
## Number of Fisher Scoring iterations: 2
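For a Gaussian model with unit weights, the AIC can be recomputed from the residual deviance, the number of observations and the number of coefficients, which shows where the small difference between the two models comes from. A sketch; the helper name `gaussian_aic` is made up for this example:

```r
# AIC of a Gaussian model with unit weights.
# rss: residual deviance (residual sum of squares)
# n: number of observations
# n_coef: number of regression coefficients (the variance adds one parameter)
gaussian_aic <- function(rss, n, n_coef) {
  n * (log(2 * pi * rss / n) + 1) + 2 * (n_coef + 1)
}

n <- 15000
aic_full    <- gaussian_aic(3720113, n, n_coef = 6)  # model2
aic_reduced <- gaussian_aic(3720351, n, n_coef = 4)  # model3

print(round(c(full = aic_full, reduced = aic_reduced)))
```

The reduced model fits marginally worse (a slightly larger residual deviance) but saves two parameters, so its AIC ends up about 3 points lower.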
However, ‘nb_sold’ has a relatively small effect on ‘revenue’, with an estimated coefficient of 7.92490 in the reduced model. Moreover, it is fairly strongly correlated with ‘revenue’ (r = 0.662). It is not something the sales team directly controls; it acts as a covariate of ‘revenue’.
Therefore, the model I want to build uses ‘sales_method’ only.
Its AIC of 132225 is bigger than that of the previous models; however, this model is more practical.
model4 <- glm(revenue ~ sales_method , data = df_cln, family = "gaussian")
summary(model4)
##
## Call:
## glm(formula = revenue ~ sales_method, family = "gaussian", data = df_cln)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.2875 0.2818 174.9 <2e-16 ***
## sales_methodEmail 47.6071 0.3636 130.9 <2e-16 ***
## sales_methodEmail + Call 122.1892 0.4824 253.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 394.1727)
##
## Null deviance: 31332766 on 14999 degrees of freedom
## Residual deviance: 5911407 on 14997 degrees of freedom
## AIC: 132225
##
## Number of Fisher Scoring iterations: 2
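With ‘sales_method’ as the only predictor, the coefficients translate directly into mean revenue per method: the intercept is the mean for the reference level ‘Call’, and each other coefficient is the difference from that mean. A sketch using the printed estimates:

```r
# Coefficients copied from summary(model4).
coefs <- c(
  "(Intercept)"              = 49.2875,
  "sales_methodEmail"        = 47.6071,
  "sales_methodEmail + Call" = 122.1892
)

# Predicted mean revenue per sales method.
mean_call       <- coefs[["(Intercept)"]]
mean_email      <- mean_call + coefs[["sales_methodEmail"]]
mean_email_call <- mean_call + coefs[["sales_methodEmail + Call"]]

print(c(Call = mean_call, Email = mean_email, "Email + Call" = mean_email_call))

# The gap between the two non-reference levels matches the TukeyHSD
# difference for 'Email + Call' versus 'Email' (74.58).
print(mean_email_call - mean_email)
```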
There are 15,000 customers involved in this campaign: 7,466 for “Email” only, 4,962 for “Call” only and 2,572 for the combined “Email + Call”.
Total revenue is 1,409,019. It is high in the first week, decreases until week 3, then increases again until week 5. It drops sharply in week 6.
Revenue by sales method:
“Email”: 723,415.8
“Email + Call”: 441,038.3
“Call”: 244,564.8
Revenue from “Email” only is high in the first week, then decreases steadily until the last week of the campaign.
Revenue from “Call” only is low in the first week, then increases slowly and drops back in the last week.
Revenue from the combined method, “Email + Call”, is low in the first week, then increases following the pattern of the “Call” only method, but at a lower level than “Call” only.
In week 6, revenue from all sales methods drops quickly.
The numbers of customers differ considerably across the three sales methods. The combined method has an especially small number of customers, but its revenue is fairly high.
The top 10 states by revenue are California, Texas, New York, Florida, Illinois, Pennsylvania, Ohio, Michigan, Georgia and North Carolina.
The combined “Email + Call” and “Email” only are the most valuable sales methods, bringing in the most revenue as well as the most products sold for the company.
“Email” only is the method recommended for continued use.
More data are needed to prove the superiority of the combined ‘Email + Call’ method.