This task focuses on the Exploratory Data Analysis (EDA), forecasting, and the resulting business insights for Rec Corp Ltd, a hypothetical company. The dataset used for analysis was generated using Python and sourced from Kaggle. The primary objectives were to conduct EDA, clean the data, and answer business questions to gain insights. Additionally, forecasting techniques were applied to predict future trends and support decision-making processes.The EDA process involved data cleaning, descriptive statistics, and visualizations to understand the dataset’s characteristics and identify patterns.In addition to EDA, forecasting techniques were employed to predict future trends and outcomes. These forecasts provided Rec Corp Ltd with valuable information to make informed business decisions, allocate resources effectively, and implement strategies to mitigate risks.The combination of EDA and forecasting enabled Rec Corp Ltd to gain a comprehensive understanding of their dataset, identify patterns, and anticipate future trends. By leveraging R and drawing on techniques from Kaggle, Rec Corp Ltd was able to extract meaningful insights and make data-driven decisions.In conclusion, the EDA and forecasting conducted for Rec Corp Ltd showcased the power of data analysis in driving business insights. By leveraging R, Kaggle, and various analytical techniques, Rec Corp Ltd gained a deeper understanding of their dataset, identified key trends, and made informed business decisions. This analysis serves as a foundation for Rec Corp Ltd to optimize their operations, enhance safety measures, and achieve their business goals.
REC corp LTD. is small-scaled business venture established in India. They have been selling four products for over 10 years. Exploratory Data Analysis (EDA) and forecasting are powerful techniques that provide valuable insights for businesses. In the case of Rec Corp Ltd, a hypothetical company, EDA was conducted to understand the provided dataset, followed by forecasting to predict future trends and make informed business decisions. Rec Corp Ltd gained a comprehensive understanding of the dataset, identified patterns, and utilized forecasting to anticipate future outcomes.
The products are P1, P2, P3 and P4.
Q1- Total unit sales of product 1
Q2- Total unit sales of product 2
Q3- Total unit sales of product 3
Q4- Total unit sales of product 4
S1- Total revenue from product 1
S2- Total revenue from product 2
S3- Total revenue from product 3
S4- Total revenue from product 4
Example :
On 13-06-2010 , product 1 had been brought by 5422 people and INR 17187.74 had been generated in revenue from product 1
Is there any trend in the sales of all four products during certain months?
Out of all four products , which product has seen the highest sales in all the given years?
The company has all it’s retail centers closed on the 31st of December every year. An estimate on number of units of each product that could be sold on 31st of Dec , every year , if all their retail centers were kept open.
The CEO is considering an idea to drop the production of any one of the products. Suggestion on whether his idea would result in a massive setback for the company.
The CEO would also like to predict the sales and revenues for the year 2024. Give a yearly estimate with the best possible accuracy.
The dataset is from Kaggle https://www.kaggle.com/datasets/ksabishek/product-sales-data
NOTE: This is a hypothetical dataset generated using python for educational purposes. It bears no resemblance to any real firm. Any similarity is a matter of coincidence.
## x date day month year q_p1 q_p2 q_p3 q_p4 s_p1 s_p2 s_p3 s_p4
## 0 0 0 0 0 0 0 0 0 0 0 0 0
## [1] 0
## 'data.frame': 4600 obs. of 13 variables:
## $ x : int 0 1 2 3 4 5 6 7 8 9 ...
## $ date : chr "13/06/2010" "14/06/2010" "15/06/2010" "16/06/2010" ...
## $ day : int 13 14 15 16 17 18 19 20 21 22 ...
## $ month: int 6 6 6 6 6 6 6 6 6 6 ...
## $ year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## $ q_p1 : int 5422 7047 1572 5657 3668 2898 6912 5209 6322 6865 ...
## $ q_p2 : int 3725 779 2082 2399 3207 2539 1470 2550 852 414 ...
## $ q_p3 : int 576 3578 595 3140 2184 311 1576 3415 3646 3902 ...
## $ q_p4 : int 907 1574 1145 1672 708 1513 1608 842 1377 562 ...
## $ s_p1 : num 17188 22339 4983 17933 11628 ...
## $ s_p2 : num 23617 4939 13200 15210 20332 ...
## $ s_p3 : num 3122 19393 3225 17019 11837 ...
## $ s_p4 : num 6467 11223 8164 11921 5048 ...
Date Column
## x date day month year q_p1 q_p2 q_p3 q_p4 s_p1 s_p2 s_p3 s_p4
## 0 26 0 0 0 0 0 0 0 0 0 0 0
## [1] 110 171 474 535 837 898 1201 1262 1565 1626 1929 1990 2292 2353 2656
## [16] 2717 3020 3081 3384 3445 3747 3808 4111 4172 4475 4536
Date column has 26 Na’s
Na’s are due to error in the day of month for example the month of September in 2010 does not end on 31st
Removing NA’s in date column
## x date day month year q_p1 q_p2 q_p3 q_p4 s_p1 s_p2 s_p3 s_p4
## 0 0 0 0 0 0 0 0 0 0 0 0 0
Column names
## [1] "date" "day" "month" "year" "q_p1" "q_p2" "q_p3" "q_p4" "s_p1"
## [10] "s_p2" "s_p3" "s_p4"
trend analysis
Observation
In the last quarter of the year 2022, product 1 sales showed a re-occurrence pattern from that of 2021.
In the last quarter of the year 2022, there was a decrease in product 2 and product 3 sales as compared to 2021.
In the last quarter of the year 2022, product 4 sales showed a re-occurrence pattern with that of 2021.
## q_p1 q_p2 q_p3 q_p4
## 18860169 9741271 14379603 5139979
Observation
Product 1 has seen the highest sales in all the given years 18 860 169 million sales
Suggestion
The company should keep on selling product 1 the more. Product 4 has lowest sales and the company should do a market on product 4.
##
## "units to be sold on 31 dec 2010 for product 1: 4315 sales"
## "units to be sold on 31 dec 2010 for product 2: 2191 sales"
## "units to be sold on 31 dec 2010 for product 3: 3524 sales"
## "units to be sold on 31 dec 2010 for product 4: 1098 sales"
##
## "units to be sold on 31 dec 2011 for product 1: 4363 sales"
## "units to be sold on 31 dec 2011 for product 2: 1876 sales"
## "units to be sold on 31 dec 2011 for product 3: 3716 sales"
## "units to be sold on 31 dec 2011 for product 4: 1166 sales"
##
## "units to be sold on 31 dec 2012 for product 1: 4465 sales"
## "units to be sold on 31 dec 2012 for product 2: 1714 sales"
## "units to be sold on 31 dec 2012 for product 3: 2360 sales"
## "units to be sold on 31 dec 2012 for product 4: 1178 sales"
##
## "units to be sold on 31 dec 2013 for product 1: 3471 sales"
## "units to be sold on 31 dec 2013 for product 2: 2353 sales"
## "units to be sold on 31 dec 2013 for product 3: 2615 sales"
## "units to be sold on 31 dec 2013 for product 4: 1086 sales"
##
## "units to be sold on 31 dec 2014 for product 1: 4409 sales"
## "units to be sold on 31 dec 2014 for product 2: 1774 sales"
## "units to be sold on 31 dec 2014 for product 3: 3059 sales"
## "units to be sold on 31 dec 2014 for product 4: 1051 sales"
##
## "units to be sold on 31 dec 2015 for product 1: 4719 sales"
## "units to be sold on 31 dec 2015 for product 2: 2584 sales"
## "units to be sold on 31 dec 2015 for product 3: 2759 sales"
## "units to be sold on 31 dec 2015 for product 4: 1105 sales"
##
## "units to be sold on 31 dec 2016 for product 1: 4440 sales"
## "units to be sold on 31 dec 2016 for product 2: 2194 sales"
## "units to be sold on 31 dec 2016 for product 3: 2846 sales"
## "units to be sold on 31 dec 2016 for product 4: 1160 sales"
##
## "units to be sold on 31 dec 2017 for product 1: 4114 sales"
## "units to be sold on 31 dec 2017 for product 2: 2289 sales"
## "units to be sold on 31 dec 2017 for product 3: 3228 sales"
## "units to be sold on 31 dec 2017 for product 4: 1123 sales"
##
## "units to be sold on 31 dec 2018 for product 1: 4413 sales"
## "units to be sold on 31 dec 2018 for product 2: 1888 sales"
## "units to be sold on 31 dec 2018 for product 3: 2923 sales"
## "units to be sold on 31 dec 2018 for product 4: 1192 sales"
##
## "units to be sold on 31 dec 2019 for product 1: 4204 sales"
## "units to be sold on 31 dec 2019 for product 2: 2073 sales"
## "units to be sold on 31 dec 2019 for product 3: 3097 sales"
## "units to be sold on 31 dec 2019 for product 4: 1117 sales"
##
## "units to be sold on 31 dec 2020 for product 1: 3617 sales"
## "units to be sold on 31 dec 2020 for product 2: 1985 sales"
## "units to be sold on 31 dec 2020 for product 3: 3170 sales"
## "units to be sold on 31 dec 2020 for product 4: 1079 sales"
##
## "units to be sold on 31 dec 2021 for product 1: 3586 sales"
## "units to be sold on 31 dec 2021 for product 2: 2588 sales"
## "units to be sold on 31 dec 2021 for product 3: 3048 sales"
## "units to be sold on 31 dec 2021 for product 4: 1120 sales"
##
## "units to be sold on 31 dec 2022 for product 1: 4328 sales"
## "units to be sold on 31 dec 2022 for product 2: 2051 sales"
## "units to be sold on 31 dec 2022 for product 3: 3644 sales"
## "units to be sold on 31 dec 2022 for product 4: 967 sales"
Suggestion
The company should not close the retail stores centers every year on the 31st of December because approximately 10 000 total sales can be sold from the 4 products per day generating a total revenue of approximately 50 000 per day
Comparative Analysis
Observation
If the CEO wishes to drop product 3 this would result in a loss of $77 937 448.26 revenues and this would cause a massive setback on the company as most of the revenue comes from product 3.
Suggestion
The CEO should not drop any of the product
date | estimated product 1 sales | estimated product 2 sales | estimated product 3 sales | estimated product 4 sales | estimated product 1 revenues | estimated product 2 revenues | estimated product 3 revenues | estimated product 4 revenues |
---|---|---|---|---|---|---|---|---|
2023-03-03 | 4520.020 | 2130.789 | 3085.457 | 1097.219 | 14325.68 | 13510.12 | 16718.45 | 7823.329 |
2023-04-03 | 4161.116 | 2080.522 | 3172.557 | 1125.876 | 13187.65 | 13191.49 | 17190.38 | 8027.480 |
2023-05-03 | 3934.930 | 2210.566 | 3504.955 | 1061.941 | 12470.36 | 14015.89 | 18992.14 | 7572.036 |
2023-06-03 | 4078.030 | 2097.167 | 3294.353 | 1113.916 | 12923.75 | 13297.00 | 17850.69 | 7942.433 |
2023-07-03 | 4343.515 | 2083.140 | 3047.654 | 1157.682 | 13766.73 | 13208.23 | 16515.38 | 8253.264 |
2023-08-03 | 4187.899 | 2318.075 | 2908.288 | 1111.065 | 13273.06 | 14697.47 | 15759.73 | 7921.419 |
2023-09-03 | 3635.496 | 2106.909 | 3255.463 | 1141.260 | 11521.87 | 13358.78 | 17641.38 | 8137.282 |
2023-10-03 | 4089.753 | 1959.460 | 3136.658 | 1145.276 | 12961.31 | 12424.12 | 16996.94 | 8165.761 |
2023-11-03 | 4309.246 | 2085.401 | 3214.908 | 1037.070 | 13656.93 | 13222.40 | 17421.06 | 7395.221 |
2023-12-03 | 4029.264 | 2103.981 | 3092.699 | 1126.864 | 12768.91 | 13340.18 | 16758.13 | 8035.147 |
2024-01-03 | 3897.355 | 2307.074 | 3214.711 | 1084.184 | 12350.70 | 14627.64 | 17419.80 | 7731.302 |
2024-02-03 | 4033.006 | 2190.737 | 3196.461 | 1149.421 | 12780.57 | 13890.18 | 17319.60 | 8195.924 |
Relationship between sales and revenue
Observation
The sales of product 1, product 2, product 3, product 4 and the revenues of product 1, product 2, product 3, product 4 have a correlation 1 respectively indicating a perfect positive linear relationship.
There is a strong linear association between the sales and revenues meaning changes in sales are directly and proportionally reflected in the changes in revenues and the converse is true.
Suggestion
The company should increase the production of product 1 because an increase in the sales of product 1 means an increase in the revenues of product 1 in a perfectly manner.
If the company faces a decrease in the production of product 1, this means that a decrease in the sales of product 1 will result in the decrease of the revenues of product.
However, correlation does not imply cause and effect between variables, there might be other contributing factors to be considered.
knitr::opts_chunk$set(echo = F, message = F, warning = F)
sales=read.csv(file.choose()) #importing dataset
library(janitor)
sales=clean_names(sales) #cleaning variable names
colSums(is.na(sales))#checking for missing values in the dataset
anyDuplicated.default(sales) #checking duplicate entries
str(sales)
library(lubridate)
sales$date=dmy(sales$date) #parsing dates
colSums(is.na(sales)) #checking any na's in date column
which(is.na(sales$date))
sales=na.omit(sales)
colSums(is.na(sales))
sales=sales[2:13] #dropping x variable #Keeping important variables
colnames(sales) #variable names
#trend analysis
library(ggplot2)
library(dplyr)
s2022=sales%>% filter(month==c(9:12) & year==2022) %>%
ggplot(aes(x=date),group=1,)+
geom_line(aes(y=q_p1,color="product 1 sales"),
stat = "identity") +
geom_line(aes(y=q_p2,color="product 2 sales"),
stat = "identity")+
geom_line(aes(y=q_p3,color="product 3 sales"),
stat = "identity") +
geom_line(aes(y=q_p4,color="product 4 sales"),
stat="identity") + theme_bw() +
labs(color="",title = "Trend analysis of products",
subtitle = "Year: 2022",x="Months",y="Frequency")+
theme(legend.position = "right")
s2021=sales %>% filter(month==c(9:12) & year==2021) %>%
ggplot(aes(x=date),group=1,)+
geom_line(aes(y=q_p1,color="product 1 sales"),
stat = "identity") +
geom_line(aes(y=q_p2,color="product 2 sales"),
stat = "identity")+
geom_line(aes(y=q_p3,color="product 3 sales"),
stat = "identity") +
geom_line(aes(y=q_p4,color="product 4 sales"),
stat="identity") + theme_bw() +
labs(color="",title = "Trend analysis of products",
subtitle = "Year: 2021",x="Months",y="Frequency")+
theme(legend.position = "right")
library(gridExtra)
grid.arrange(s2022, s2021 ,ncol=1)
#count of sales
t=sales %>% select(q_p1,q_p2,q_p3,q_p4) %>%
colSums()
t
#estimating number of units
dec_2010=sales%>%
filter(date>="2010-12-01" & date<="2010-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
a1=head(paste("units to be sold on 31 dec 2010 for product 1: " ,
round(dec_2010$mean_q1*1), "sales"),1)
a2=head(paste("units to be sold on 31 dec 2010 for product 2: " ,
round(dec_2010$mean_q2*1), "sales"),1)
a3=head(paste("units to be sold on 31 dec 2010 for product 3: " ,
round(dec_2010$mean_q3*1), "sales"),1)
a4=head(paste("units to be sold on 31 dec 2010 for product 4: " ,
round(dec_2010$mean_q4*1), "sales"),1)
aa=rbind(a1,a2,a3,a4)
rownames(aa)=c("","","","")
colnames(aa)=""
aa
###############################################################
dec_2011=sales%>%
filter(date>="2011-12-01" & date<="2011-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
b1=head(paste("units to be sold on 31 dec 2011 for product 1: " ,
round(dec_2011$mean_q1*1), "sales"),1)
b2=head(paste("units to be sold on 31 dec 2011 for product 2: " ,
round(dec_2011$mean_q2*1), "sales"),1)
b3=head(paste("units to be sold on 31 dec 2011 for product 3: " ,
round(dec_2011$mean_q3*1), "sales"),1)
b4=head(paste("units to be sold on 31 dec 2011 for product 4: " ,
round(dec_2011$mean_q4*1), "sales"),1)
bb=rbind(b1,b2,b3,b4)
rownames(bb)=c("","","","")
colnames(bb)=""
bb
############################################################
dec_2012=sales%>%
filter(date>="2012-12-01" & date<="2012-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
c1=head(paste("units to be sold on 31 dec 2012 for product 1: " ,
round(dec_2012$mean_q1*1), "sales"),1)
c2=head(paste("units to be sold on 31 dec 2012 for product 2: " ,
round(dec_2012$mean_q2*1), "sales"),1)
c3=head(paste("units to be sold on 31 dec 2012 for product 3: " ,
round(dec_2012$mean_q3*1), "sales"),1)
c4=head(paste("units to be sold on 31 dec 2012 for product 4: " ,
round(dec_2012$mean_q4*1), "sales"),1)
cc=rbind(c1,c2,c3,c4)
rownames(cc)=c("","","","")
colnames(cc)=""
cc
############################################################
dec_2013=sales%>%
filter(date>="2013-12-01" & date<="2013-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
d1=head(paste("units to be sold on 31 dec 2013 for product 1: " ,
round(dec_2013$mean_q1*1), "sales"),1)
d2=head(paste("units to be sold on 31 dec 2013 for product 2: " ,
round(dec_2013$mean_q2*1), "sales"),1)
d3=head(paste("units to be sold on 31 dec 2013 for product 3: " ,
round(dec_2013$mean_q3*1), "sales"),1)
d4=head(paste("units to be sold on 31 dec 2013 for product 4: " ,
round(dec_2013$mean_q4*1), "sales"),1)
dd=rbind(d1,d2,d3,d4)
rownames(dd)=c("","","","")
colnames(dd)=""
dd
################################################################
dec_2014=sales%>%
filter(date>="2014-12-01" & date<="2014-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
e1=head(paste("units to be sold on 31 dec 2014 for product 1: " ,
round(dec_2014$mean_q1*1), "sales"),1)
e2=head(paste("units to be sold on 31 dec 2014 for product 2: " ,
round(dec_2014$mean_q2*1), "sales"),1)
e3=head(paste("units to be sold on 31 dec 2014 for product 3: " ,
round(dec_2014$mean_q3*1), "sales"),1)
e4=head(paste("units to be sold on 31 dec 2014 for product 4: " ,
round(dec_2014$mean_q4*1), "sales"),1)
ee=rbind(e1,e2,e3,e4)
rownames(ee)=c("","","","")
colnames(ee)=""
ee
###################################################################
dec_2015=sales%>%
filter(date>="2015-12-01" & date<="2015-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
f1=head(paste("units to be sold on 31 dec 2015 for product 1: " ,
round(dec_2015$mean_q1*1), "sales"),1)
f2=head(paste("units to be sold on 31 dec 2015 for product 2: " ,
round(dec_2015$mean_q2*1), "sales"),1)
f3=head(paste("units to be sold on 31 dec 2015 for product 3: " ,
round(dec_2015$mean_q3*1), "sales"),1)
f4=head(paste("units to be sold on 31 dec 2015 for product 4: " ,
round(dec_2015$mean_q4*1), "sales"),1)
ff=rbind(f1,f2,f3,f4)
rownames(ff)=c("","","","")
colnames(ff)=""
ff
###############################################################
dec_2016=sales%>%
filter(date>="2016-12-01" & date<="2016-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
g1=head(paste("units to be sold on 31 dec 2016 for product 1: " ,
round(dec_2016$mean_q1*1), "sales"),1)
g2=head(paste("units to be sold on 31 dec 2016 for product 2: " ,
round(dec_2016$mean_q2*1), "sales"),1)
g3=head(paste("units to be sold on 31 dec 2016 for product 3: " ,
round(dec_2016$mean_q3*1), "sales"),1)
g4=head(paste("units to be sold on 31 dec 2016 for product 4: " ,
round(dec_2016$mean_q4*1), "sales"),1)
gg=rbind(g1,g2,g3,g4)
rownames(gg)=c("","","","")
colnames(gg)=""
gg
################################################################
dec_2017=sales%>%
filter(date>="2017-12-01" & date<="2017-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
h1=head(paste("units to be sold on 31 dec 2017 for product 1: " ,
round(dec_2017$mean_q1*1), "sales"),1)
h2=head(paste("units to be sold on 31 dec 2017 for product 2: " ,
round(dec_2017$mean_q2*1), "sales"),1)
h3=head(paste("units to be sold on 31 dec 2017 for product 3: " ,
round(dec_2017$mean_q3*1), "sales"),1)
h4=head(paste("units to be sold on 31 dec 2017 for product 4: " ,
round(dec_2017$mean_q4*1), "sales"),1)
hh=rbind(h1,h2,h3,h4)
rownames(hh)=c("","","","")
colnames(hh)=""
hh
###############################################################
dec_2018=sales%>%
filter(date>="2018-12-01" & date<="2018-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
j1=head(paste("units to be sold on 31 dec 2018 for product 1: " ,
round(dec_2018$mean_q1*1), "sales"),1)
j2=head(paste("units to be sold on 31 dec 2018 for product 2: " ,
round(dec_2018$mean_q2*1), "sales"),1)
j3=head(paste("units to be sold on 31 dec 2018 for product 3: " ,
round(dec_2018$mean_q3*1), "sales"),1)
j4=head(paste("units to be sold on 31 dec 2018 for product 4: " ,
round(dec_2018$mean_q4*1), "sales"),1)
jj=rbind(j1,j2,j3,j4)
rownames(jj)=c("","","","")
colnames(jj)=""
jj
##############################################################
dec_2019=sales%>%
filter(date>="2019-12-01" & date<="2019-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
k1=head(paste("units to be sold on 31 dec 2019 for product 1: " ,
round(dec_2019$mean_q1*1), "sales"),1)
k2=head(paste("units to be sold on 31 dec 2019 for product 2: " ,
round(dec_2019$mean_q2*1), "sales"),1)
k3=head(paste("units to be sold on 31 dec 2019 for product 3: " ,
round(dec_2019$mean_q3*1), "sales"),1)
k4=head(paste("units to be sold on 31 dec 2019 for product 4: " ,
round(dec_2019$mean_q4*1), "sales"),1)
kk=rbind(k1,k2,k3,k4)
rownames(kk)=c("","","","")
colnames(kk)=""
kk
#################################################################
dec_2020=sales%>%
filter(date>="2020-12-01" & date<="2020-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
l1=head(paste("units to be sold on 31 dec 2020 for product 1: " ,
round(dec_2020$mean_q1*1), "sales"),1)
l2=head(paste("units to be sold on 31 dec 2020 for product 2: " ,
round(dec_2020$mean_q2*1), "sales"),1)
l3=head(paste("units to be sold on 31 dec 2020 for product 3: " ,
round(dec_2020$mean_q3*1), "sales"),1)
l4=head(paste("units to be sold on 31 dec 2020 for product 4: " ,
round(dec_2020$mean_q4*1), "sales"),1)
ll=rbind(l1,l2,l3,l4)
rownames(ll)=c("","","","")
colnames(ll)=""
ll
##################################################################
dec_2021=sales%>%
filter(date>="2021-12-01" & date<="2021-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
m1=head(paste("units to be sold on 31 dec 2021 for product 1: " ,
round(dec_2021$mean_q1*1), "sales"),1)
m2=head(paste("units to be sold on 31 dec 2021 for product 2: " ,
round(dec_2021$mean_q2*1), "sales"),1)
m3=head(paste("units to be sold on 31 dec 2021 for product 3: " ,
round(dec_2021$mean_q3*1), "sales"),1)
m4=head(paste("units to be sold on 31 dec 2021 for product 4: " ,
round(dec_2021$mean_q4*1), "sales"),1)
mm=rbind(m1,m2,m3,m4)
rownames(mm)=c("","","","")
colnames(mm)=""
mm
####################################################################
dec_2022=sales%>%
filter(date>="2022-12-01" & date<="2022-12-30") %>%
select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
mutate(mean_q4=mean(q_p4))
n1=head(paste("units to be sold on 31 dec 2022 for product 1: " ,
round(dec_2022$mean_q1*1), "sales"),1)
n2=head(paste("units to be sold on 31 dec 2022 for product 2: " ,
round(dec_2022$mean_q2*1), "sales"),1)
n3=head(paste("units to be sold on 31 dec 2022 for product 3: " ,
round(dec_2022$mean_q3*1), "sales"),1)
n4=head(paste("units to be sold on 31 dec 2022 for product 4: " ,
round(dec_2022$mean_q4*1), "sales"),1)
nn=rbind(n1,n2,n3,n4)
rownames(nn)=c("","","","")
colnames(nn)=""
nn
#Comparative analysis
revenues=sales %>%
select(s_p1,s_p2,s_p3,s_p4) %>%
colSums()
nsales=sales %>%
select(q_p1,q_p2,q_p3,q_p4) %>%
colSums()
product_name=c("product 1","product 2","product 3","product 4")
df=as.data.frame(cbind(product_name,revenues,nsales))
bar_sales=df %>% ggplot(aes(product_name,nsales,fill=product_name))+
geom_bar(stat="identity") + theme_bw() +
labs(title="Distribution of total sales ",y="sales",
x="type of product")
bar_revenues=df %>% ggplot(aes(product_name,y=revenues,fill=product_name))+
geom_bar(stat="identity")+ theme_bw() +
labs(title="Distribution of total revenues ",y="revenues",
x="type of product")
grid.arrange(bar_sales,bar_revenues,ncol=1)
#forecast
library(prophet)
sales_p1= sales %>%
select(date,q_p1)
sales_p1=sales_p1 %>%
rename(ds=date,y=q_p1)
model_p1=prophet(sales_p1)
future_p_1_sales=make_future_dataframe(model_p1,periods = 12,freq = "month")
forecasted_p1=predict(model_p1,future_p_1_sales)
forecast_p1_sales=forecasted_p1 %>%
filter(ds>="2023-03-03") %>%
select(ds,yhat) %>%
rename("date"=ds,"estimated product 1 sales"=yhat)
##############################################################
sales_p2= sales %>%
select(date,q_p2)
sales_p2=sales_p2 %>%
rename(ds=date,y=q_p2)
model_p2=prophet(sales_p2)
future_p_2_sales=make_future_dataframe(model_p2,periods = 12,freq = "month")
forecasted_p2=predict(model_p2,future_p_2_sales)
forecast_p2_sales=forecasted_p2 %>%
filter(ds>="2023-03-03") %>%
select(yhat) %>%
rename("estimated product 2 sales"=yhat)
##########################################################################
sales_p3= sales %>%
select(date,q_p3)
sales_p3=sales_p3 %>%
rename(ds=date,y=q_p3)
model_p3=prophet(sales_p3)
future_p_3_sales=make_future_dataframe(model_p3,periods = 12,freq = "month")
forecasted_p3=predict(model_p3,future_p_3_sales)
forecast_p3_sales=forecasted_p3 %>%
filter(ds>="2023-03-03") %>%
select(yhat) %>%
rename("estimated product 3 sales"=yhat)
##########################################################################
sales_p4= sales %>%
select(date,q_p4)
sales_p4=sales_p4 %>%
rename(ds=date,y=q_p4)
model_p4=prophet(sales_p4)
future_p_4_sales=make_future_dataframe(model_p4,periods = 12,freq = "month")
forecasted_p4=predict(model_p4,future_p_4_sales)
forecast_p4_sales=forecasted_p4 %>%
filter(ds>="2023-03-03") %>%
select(yhat) %>%
rename("estimated product 4 sales"=yhat)
####################################################################
#ESTIMATED REVENUES
revenues_p1= sales %>%
select(date,s_p1)
revenues_p1=revenues_p1 %>%
rename(ds=date,y=s_p1)
model_revenues_p1=prophet(revenues_p1)
future_p_1_revenues=make_future_dataframe(model_revenues_p1,periods = 12,freq = "month")
forecasted_p1_revenues=predict(model_revenues_p1,future_p_1_revenues)
forecast_p1_revenues=forecasted_p1_revenues %>%
filter(ds>="2023-03-03") %>%
select(yhat) %>%
rename("estimated product 1 revenues"=yhat)
##################################################################3
revenues_p2= sales %>%
select(date,s_p2)
revenues_p2=revenues_p2 %>%
rename(ds=date,y=s_p2)
model_revenues_p2=prophet(revenues_p2)
future_p_2_revenues=make_future_dataframe(model_revenues_p2,periods = 12,freq = "month")
forecasted_p2_revenues=predict(model_revenues_p2,future_p_2_revenues)
forecast_p2_revenues=forecasted_p2_revenues %>%
filter(ds>="2023-03-03") %>%
select(yhat) %>%
rename("estimated product 2 revenues"=yhat)
######################################################################
revenues_p3= sales %>%
select(date,s_p3)
revenues_p3=revenues_p3 %>%
rename(ds=date,y=s_p3)
model_revenues_p3=prophet(revenues_p3)
future_p_3_revenues=make_future_dataframe(model_revenues_p3,periods = 12,freq = "month")
forecasted_p3_revenues=predict(model_revenues_p3,future_p_3_revenues)
forecast_p3_revenues=forecasted_p3_revenues %>%
filter(ds>="2023-03-03") %>%
select(yhat) %>%
rename("estimated product 3 revenues"=yhat)
##########################################################################3
revenues_p4= sales %>%
select(date,s_p4)
revenues_p4=revenues_p4 %>%
rename(ds=date,y=s_p4)
model_revenues_p4=prophet(revenues_p4)
future_p_4_revenues=make_future_dataframe(model_revenues_p4,periods = 12,freq = "month")
forecasted_p4_revenues=predict(model_revenues_p4,future_p_4_revenues)
forecast_p4_revenues=forecasted_p4_revenues %>%
filter(ds>="2023-03-03") %>%
select(yhat) %>%
rename("estimated product 4 revenues"=yhat)
#####################################################################
summary_forecast=cbind(forecast_p1_sales,forecast_p2_sales,
forecast_p3_sales,forecast_p4_sales,
forecast_p1_revenues,forecast_p2_revenues,
forecast_p3_revenues,forecast_p4_revenues)
library(knitr)
kable(summary_forecast)
#Correlation
library(corrplot)
co=sales %>%
select(q_p1,q_p2,q_p3,q_p4,
s_p1,s_p2,s_p3,s_p4)
co_matrix=cor(co)
corrplot(co_matrix,method = "number",col="black",
type = "lower",bg="gold2",number.digits = 3,
number.cex = 0.6,tl.col ="red2",title = "Correlation",
mar = c(2,2,2,2))