OVERVIEW

This task focuses on the Exploratory Data Analysis (EDA), forecasting, and the resulting business insights for Rec Corp Ltd, a hypothetical company. The dataset used for analysis was generated using Python and sourced from Kaggle. The primary objectives were to conduct EDA, clean the data, and answer business questions to gain insights. Additionally, forecasting techniques were applied to predict future trends and support decision-making processes.The EDA process involved data cleaning, descriptive statistics, and visualizations to understand the dataset’s characteristics and identify patterns.In addition to EDA, forecasting techniques were employed to predict future trends and outcomes. These forecasts provided Rec Corp Ltd with valuable information to make informed business decisions, allocate resources effectively, and implement strategies to mitigate risks.The combination of EDA and forecasting enabled Rec Corp Ltd to gain a comprehensive understanding of their dataset, identify patterns, and anticipate future trends. By leveraging R and drawing on techniques from Kaggle, Rec Corp Ltd was able to extract meaningful insights and make data-driven decisions.In conclusion, the EDA and forecasting conducted for Rec Corp Ltd showcased the power of data analysis in driving business insights. By leveraging R, Kaggle, and various analytical techniques, Rec Corp Ltd gained a deeper understanding of their dataset, identified key trends, and made informed business decisions. This analysis serves as a foundation for Rec Corp Ltd to optimize their operations, enhance safety measures, and achieve their business goals.

INTRODUCTION

REC corp LTD. is small-scaled business venture established in India. They have been selling four products for over 10 years. Exploratory Data Analysis (EDA) and forecasting are powerful techniques that provide valuable insights for businesses. In the case of Rec Corp Ltd, a hypothetical company, EDA was conducted to understand the provided dataset, followed by forecasting to predict future trends and make informed business decisions. Rec Corp Ltd gained a comprehensive understanding of the dataset, identified patterns, and utilized forecasting to anticipate future outcomes.

KEY TERMS

The products are P1, P2, P3 and P4.

Q1- Total unit sales of product 1

Q2- Total unit sales of product 2

Q3- Total unit sales of product 3

Q4- Total unit sales of product 4

S1- Total revenue from product 1

S2- Total revenue from product 2

S3- Total revenue from product 3

S4- Total revenue from product 4

Example :

On 13-06-2010 , product 1 had been brought by 5422 people and INR 17187.74 had been generated in revenue from product 1

BUSINESS QUESTIONS

  1. Is there any trend in the sales of all four products during certain months?

  2. Out of all four products , which product has seen the highest sales in all the given years?

  3. The company has all it’s retail centers closed on the 31st of December every year. An estimate on number of units of each product that could be sold on 31st of Dec , every year , if all their retail centers were kept open.

  4. The CEO is considering an idea to drop the production of any one of the products. Suggestion on whether his idea would result in a massive setback for the company.

  5. The CEO would also like to predict the sales and revenues for the year 2024. Give a yearly estimate with the best possible accuracy.

ABOUT DATASET

The dataset is from Kaggle https://www.kaggle.com/datasets/ksabishek/product-sales-data

NOTE: This is a hypothetical dataset generated using python for educational purposes. It bears no resemblance to any real firm. Any similarity is a matter of coincidence.

IMPORTING DATASET

DATA CLEANING

##     x  date   day month  year  q_p1  q_p2  q_p3  q_p4  s_p1  s_p2  s_p3  s_p4 
##     0     0     0     0     0     0     0     0     0     0     0     0     0
## [1] 0

DATA DESCRIPTION

## 'data.frame':    4600 obs. of  13 variables:
##  $ x    : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ date : chr  "13/06/2010" "14/06/2010" "15/06/2010" "16/06/2010" ...
##  $ day  : int  13 14 15 16 17 18 19 20 21 22 ...
##  $ month: int  6 6 6 6 6 6 6 6 6 6 ...
##  $ year : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ q_p1 : int  5422 7047 1572 5657 3668 2898 6912 5209 6322 6865 ...
##  $ q_p2 : int  3725 779 2082 2399 3207 2539 1470 2550 852 414 ...
##  $ q_p3 : int  576 3578 595 3140 2184 311 1576 3415 3646 3902 ...
##  $ q_p4 : int  907 1574 1145 1672 708 1513 1608 842 1377 562 ...
##  $ s_p1 : num  17188 22339 4983 17933 11628 ...
##  $ s_p2 : num  23617 4939 13200 15210 20332 ...
##  $ s_p3 : num  3122 19393 3225 17019 11837 ...
##  $ s_p4 : num  6467 11223 8164 11921 5048 ...

Date Column

##     x  date   day month  year  q_p1  q_p2  q_p3  q_p4  s_p1  s_p2  s_p3  s_p4 
##     0    26     0     0     0     0     0     0     0     0     0     0     0
##  [1]  110  171  474  535  837  898 1201 1262 1565 1626 1929 1990 2292 2353 2656
## [16] 2717 3020 3081 3384 3445 3747 3808 4111 4172 4475 4536

Removing NA’s in date column

##     x  date   day month  year  q_p1  q_p2  q_p3  q_p4  s_p1  s_p2  s_p3  s_p4 
##     0     0     0     0     0     0     0     0     0     0     0     0     0

Column names

##  [1] "date"  "day"   "month" "year"  "q_p1"  "q_p2"  "q_p3"  "q_p4"  "s_p1" 
## [10] "s_p2"  "s_p3"  "s_p4"

QN1

Is there any trend in the sales of all four products during certain months?

trend analysis

trend analysis

  • Observation

    In the last quarter of the year 2022, product 1 sales showed a re-occurrence pattern from that of 2021.

    In the last quarter of the year 2022, there was a decrease in product 2 and product 3 sales as compared to 2021.

    In the last quarter of the year 2022, product 4 sales showed a re-occurrence pattern with that of 2021.

QN2

Out of all four products , which product has seen the highest sales in all the given years?

##     q_p1     q_p2     q_p3     q_p4 
## 18860169  9741271 14379603  5139979
  • Observation

    Product 1 has seen the highest sales in all the given years 18 860 169 million sales

  • Suggestion

    The company should keep on selling product 1 the more. Product 4 has lowest sales and the company should do a market on product 4.

QN3

The company has all its retail centers closed on the 31st of December every year. An estimate on number of units of each product that could be sold on 31st of Dec , every year , if all their retail centers were kept open.

##                                                              
##  "units to be sold on 31 dec 2010 for product 1:  4315 sales"
##  "units to be sold on 31 dec 2010 for product 2:  2191 sales"
##  "units to be sold on 31 dec 2010 for product 3:  3524 sales"
##  "units to be sold on 31 dec 2010 for product 4:  1098 sales"
##                                                              
##  "units to be sold on 31 dec 2011 for product 1:  4363 sales"
##  "units to be sold on 31 dec 2011 for product 2:  1876 sales"
##  "units to be sold on 31 dec 2011 for product 3:  3716 sales"
##  "units to be sold on 31 dec 2011 for product 4:  1166 sales"
##                                                              
##  "units to be sold on 31 dec 2012 for product 1:  4465 sales"
##  "units to be sold on 31 dec 2012 for product 2:  1714 sales"
##  "units to be sold on 31 dec 2012 for product 3:  2360 sales"
##  "units to be sold on 31 dec 2012 for product 4:  1178 sales"
##                                                              
##  "units to be sold on 31 dec 2013 for product 1:  3471 sales"
##  "units to be sold on 31 dec 2013 for product 2:  2353 sales"
##  "units to be sold on 31 dec 2013 for product 3:  2615 sales"
##  "units to be sold on 31 dec 2013 for product 4:  1086 sales"
##                                                              
##  "units to be sold on 31 dec 2014 for product 1:  4409 sales"
##  "units to be sold on 31 dec 2014 for product 2:  1774 sales"
##  "units to be sold on 31 dec 2014 for product 3:  3059 sales"
##  "units to be sold on 31 dec 2014 for product 4:  1051 sales"
##                                                              
##  "units to be sold on 31 dec 2015 for product 1:  4719 sales"
##  "units to be sold on 31 dec 2015 for product 2:  2584 sales"
##  "units to be sold on 31 dec 2015 for product 3:  2759 sales"
##  "units to be sold on 31 dec 2015 for product 4:  1105 sales"
##                                                              
##  "units to be sold on 31 dec 2016 for product 1:  4440 sales"
##  "units to be sold on 31 dec 2016 for product 2:  2194 sales"
##  "units to be sold on 31 dec 2016 for product 3:  2846 sales"
##  "units to be sold on 31 dec 2016 for product 4:  1160 sales"
##                                                              
##  "units to be sold on 31 dec 2017 for product 1:  4114 sales"
##  "units to be sold on 31 dec 2017 for product 2:  2289 sales"
##  "units to be sold on 31 dec 2017 for product 3:  3228 sales"
##  "units to be sold on 31 dec 2017 for product 4:  1123 sales"
##                                                              
##  "units to be sold on 31 dec 2018 for product 1:  4413 sales"
##  "units to be sold on 31 dec 2018 for product 2:  1888 sales"
##  "units to be sold on 31 dec 2018 for product 3:  2923 sales"
##  "units to be sold on 31 dec 2018 for product 4:  1192 sales"
##                                                              
##  "units to be sold on 31 dec 2019 for product 1:  4204 sales"
##  "units to be sold on 31 dec 2019 for product 2:  2073 sales"
##  "units to be sold on 31 dec 2019 for product 3:  3097 sales"
##  "units to be sold on 31 dec 2019 for product 4:  1117 sales"
##                                                              
##  "units to be sold on 31 dec 2020 for product 1:  3617 sales"
##  "units to be sold on 31 dec 2020 for product 2:  1985 sales"
##  "units to be sold on 31 dec 2020 for product 3:  3170 sales"
##  "units to be sold on 31 dec 2020 for product 4:  1079 sales"
##                                                              
##  "units to be sold on 31 dec 2021 for product 1:  3586 sales"
##  "units to be sold on 31 dec 2021 for product 2:  2588 sales"
##  "units to be sold on 31 dec 2021 for product 3:  3048 sales"
##  "units to be sold on 31 dec 2021 for product 4:  1120 sales"
##                                                              
##  "units to be sold on 31 dec 2022 for product 1:  4328 sales"
##  "units to be sold on 31 dec 2022 for product 2:  2051 sales"
##  "units to be sold on 31 dec 2022 for product 3:  3644 sales"
##  "units to be sold on 31 dec 2022 for product 4:  967 sales"
  • Suggestion

    The company should not close the retail stores centers every year on the 31st of December because approximately 10 000 total sales can be sold from the 4 products per day generating a total revenue of approximately 50 000 per day

QN4

The CEO is considering an idea to drop the production of any one of the products. Suggestion on whether his idea would result in a massive setback for the company.

Comparative Analysis

Comparative Analysis

  • Observation

    If the CEO wishes to drop product 3 this would result in a loss of $77 937 448.26 revenues and this would cause a massive setback on the company as most of the revenue comes from product 3.

  • Suggestion

    The CEO should not drop any of the product

QN5

The CEO would also like to predict the sales and revenues for the year 2024. A yearly estimate with the best possible accuracy.

date estimated product 1 sales estimated product 2 sales estimated product 3 sales estimated product 4 sales estimated product 1 revenues estimated product 2 revenues estimated product 3 revenues estimated product 4 revenues
2023-03-03 4520.020 2130.789 3085.457 1097.219 14325.68 13510.12 16718.45 7823.329
2023-04-03 4161.116 2080.522 3172.557 1125.876 13187.65 13191.49 17190.38 8027.480
2023-05-03 3934.930 2210.566 3504.955 1061.941 12470.36 14015.89 18992.14 7572.036
2023-06-03 4078.030 2097.167 3294.353 1113.916 12923.75 13297.00 17850.69 7942.433
2023-07-03 4343.515 2083.140 3047.654 1157.682 13766.73 13208.23 16515.38 8253.264
2023-08-03 4187.899 2318.075 2908.288 1111.065 13273.06 14697.47 15759.73 7921.419
2023-09-03 3635.496 2106.909 3255.463 1141.260 11521.87 13358.78 17641.38 8137.282
2023-10-03 4089.753 1959.460 3136.658 1145.276 12961.31 12424.12 16996.94 8165.761
2023-11-03 4309.246 2085.401 3214.908 1037.070 13656.93 13222.40 17421.06 7395.221
2023-12-03 4029.264 2103.981 3092.699 1126.864 12768.91 13340.18 16758.13 8035.147
2024-01-03 3897.355 2307.074 3214.711 1084.184 12350.70 14627.64 17419.80 7731.302
2024-02-03 4033.006 2190.737 3196.461 1149.421 12780.57 13890.18 17319.60 8195.924
  • Observation

CORRELATION

Is there any relationship between sales and revenues ?

Relationship between sales and revenue

Relationship between sales and revenue

  • Observation

    The sales of product 1, product 2, product 3, product 4 and the revenues of product 1, product 2, product 3, product 4 have a correlation 1 respectively indicating a perfect positive linear relationship.

    There is a strong linear association between the sales and revenues meaning changes in sales are directly and proportionally reflected in the changes in revenues and the converse is true.

  • Suggestion

    The company should increase the production of product 1 because an increase in the sales of product 1 means an increase in the revenues of product 1 in a perfectly manner.

    If the company faces a decrease in the production of product 1, this means that a decrease in the sales of product 1 will result in the decrease of the revenues of product.

    However, correlation does not imply cause and effect between variables, there might be other contributing factors to be considered.


CODE APPENDIX

knitr::opts_chunk$set(echo = F, message = F, warning = F)
sales=read.csv(file.choose()) #importing dataset
library(janitor)
sales=clean_names(sales) #cleaning variable names
colSums(is.na(sales))#checking for missing values in the dataset
anyDuplicated.default(sales) #checking duplicate entries

str(sales)

library(lubridate) 

sales$date=dmy(sales$date) #parsing dates

colSums(is.na(sales)) #checking any na's in date column

which(is.na(sales$date)) 

sales=na.omit(sales)

colSums(is.na(sales))

sales=sales[2:13] #dropping x variable #Keeping important variables
colnames(sales) #variable names
#trend analysis
library(ggplot2)

library(dplyr)

s2022=sales%>% filter(month==c(9:12) & year==2022) %>%
  ggplot(aes(x=date),group=1,)+ 
  geom_line(aes(y=q_p1,color="product 1 sales"),
            stat = "identity") +
  geom_line(aes(y=q_p2,color="product 2 sales"),
            stat = "identity")+
  geom_line(aes(y=q_p3,color="product 3 sales"),
            stat = "identity") +
  geom_line(aes(y=q_p4,color="product 4 sales"),
            stat="identity") + theme_bw() +
  labs(color="",title = "Trend analysis of products",
       subtitle = "Year: 2022",x="Months",y="Frequency")+
  theme(legend.position = "right")


s2021=sales %>% filter(month==c(9:12) & year==2021) %>%
  ggplot(aes(x=date),group=1,)+ 
  geom_line(aes(y=q_p1,color="product 1 sales"),
            stat = "identity") +
  geom_line(aes(y=q_p2,color="product 2 sales"),
            stat = "identity")+
  geom_line(aes(y=q_p3,color="product 3 sales"),
            stat = "identity") +
  geom_line(aes(y=q_p4,color="product 4 sales"),
            stat="identity") + theme_bw() +
  labs(color="",title = "Trend analysis of products",
       subtitle = "Year: 2021",x="Months",y="Frequency")+
  theme(legend.position = "right")

library(gridExtra)

grid.arrange(s2022, s2021 ,ncol=1)
#count of sales
t=sales %>% select(q_p1,q_p2,q_p3,q_p4) %>%
  colSums()
t
#estimating number of units
dec_2010=sales%>%
  filter(date>="2010-12-01" & date<="2010-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

a1=head(paste("units to be sold on 31 dec 2010 for product 1: " ,
      round(dec_2010$mean_q1*1), "sales"),1)

a2=head(paste("units to be sold on 31 dec 2010 for product 2: " ,
      round(dec_2010$mean_q2*1), "sales"),1)

a3=head(paste("units to be sold on 31 dec 2010 for product 3: " ,
      round(dec_2010$mean_q3*1), "sales"),1)

a4=head(paste("units to be sold on 31 dec 2010 for product 4: " ,
      round(dec_2010$mean_q4*1), "sales"),1)

aa=rbind(a1,a2,a3,a4)

rownames(aa)=c("","","","")
colnames(aa)=""
aa 

###############################################################

dec_2011=sales%>%
  filter(date>="2011-12-01" & date<="2011-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

b1=head(paste("units to be sold on 31 dec 2011 for product 1: " ,
      round(dec_2011$mean_q1*1), "sales"),1)

b2=head(paste("units to be sold on 31 dec 2011 for product 2: " ,
      round(dec_2011$mean_q2*1), "sales"),1)

b3=head(paste("units to be sold on 31 dec 2011 for product 3: " ,
      round(dec_2011$mean_q3*1), "sales"),1)

b4=head(paste("units to be sold on 31 dec 2011 for product 4: " ,
      round(dec_2011$mean_q4*1), "sales"),1)

bb=rbind(b1,b2,b3,b4)

rownames(bb)=c("","","","")
colnames(bb)=""
bb 

############################################################

dec_2012=sales%>%
  filter(date>="2012-12-01" & date<="2012-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

c1=head(paste("units to be sold on 31 dec 2012 for product 1: " ,
      round(dec_2012$mean_q1*1), "sales"),1)

c2=head(paste("units to be sold on 31 dec 2012 for product 2: " ,
      round(dec_2012$mean_q2*1), "sales"),1)

c3=head(paste("units to be sold on 31 dec 2012 for product 3: " ,
      round(dec_2012$mean_q3*1), "sales"),1)

c4=head(paste("units to be sold on 31 dec 2012 for product 4: " ,
      round(dec_2012$mean_q4*1), "sales"),1)

cc=rbind(c1,c2,c3,c4)

rownames(cc)=c("","","","")
colnames(cc)=""
cc

############################################################

dec_2013=sales%>%
  filter(date>="2013-12-01" & date<="2013-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

d1=head(paste("units to be sold on 31 dec 2013 for product 1: " ,
      round(dec_2013$mean_q1*1), "sales"),1)

d2=head(paste("units to be sold on 31 dec 2013 for product 2: " ,
      round(dec_2013$mean_q2*1), "sales"),1)

d3=head(paste("units to be sold on 31 dec 2013 for product 3: " ,
      round(dec_2013$mean_q3*1), "sales"),1)

d4=head(paste("units to be sold on 31 dec 2013 for product 4: " ,
      round(dec_2013$mean_q4*1), "sales"),1)

dd=rbind(d1,d2,d3,d4)

rownames(dd)=c("","","","")
colnames(dd)=""
dd

################################################################

dec_2014=sales%>%
  filter(date>="2014-12-01" & date<="2014-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

e1=head(paste("units to be sold on 31 dec 2014 for product 1: " ,
      round(dec_2014$mean_q1*1), "sales"),1)

e2=head(paste("units to be sold on 31 dec 2014 for product 2: " ,
      round(dec_2014$mean_q2*1), "sales"),1)

e3=head(paste("units to be sold on 31 dec 2014 for product 3: " ,
      round(dec_2014$mean_q3*1), "sales"),1)

e4=head(paste("units to be sold on 31 dec 2014 for product 4: " ,
      round(dec_2014$mean_q4*1), "sales"),1)

ee=rbind(e1,e2,e3,e4)

rownames(ee)=c("","","","")
colnames(ee)=""
ee

###################################################################

dec_2015=sales%>%
  filter(date>="2015-12-01" & date<="2015-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

f1=head(paste("units to be sold on 31 dec 2015 for product 1: " ,
      round(dec_2015$mean_q1*1), "sales"),1)

f2=head(paste("units to be sold on 31 dec 2015 for product 2: " ,
      round(dec_2015$mean_q2*1), "sales"),1)

f3=head(paste("units to be sold on 31 dec 2015 for product 3: " ,
      round(dec_2015$mean_q3*1), "sales"),1)

f4=head(paste("units to be sold on 31 dec 2015 for product 4: " ,
      round(dec_2015$mean_q4*1), "sales"),1)

ff=rbind(f1,f2,f3,f4)

rownames(ff)=c("","","","")
colnames(ff)=""
ff

###############################################################

dec_2016=sales%>%
  filter(date>="2016-12-01" & date<="2016-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

g1=head(paste("units to be sold on 31 dec 2016 for product 1: " ,
      round(dec_2016$mean_q1*1), "sales"),1)

g2=head(paste("units to be sold on 31 dec 2016 for product 2: " ,
      round(dec_2016$mean_q2*1), "sales"),1)

g3=head(paste("units to be sold on 31 dec 2016 for product 3: " ,
      round(dec_2016$mean_q3*1), "sales"),1)

g4=head(paste("units to be sold on 31 dec 2016 for product 4: " ,
      round(dec_2016$mean_q4*1), "sales"),1)

gg=rbind(g1,g2,g3,g4)

rownames(gg)=c("","","","")
colnames(gg)=""
gg

################################################################

dec_2017=sales%>%
  filter(date>="2017-12-01" & date<="2017-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

h1=head(paste("units to be sold on 31 dec 2017 for product 1: " ,
      round(dec_2017$mean_q1*1), "sales"),1)

h2=head(paste("units to be sold on 31 dec 2017 for product 2: " ,
      round(dec_2017$mean_q2*1), "sales"),1)

h3=head(paste("units to be sold on 31 dec 2017 for product 3: " ,
      round(dec_2017$mean_q3*1), "sales"),1)

h4=head(paste("units to be sold on 31 dec 2017 for product 4: " ,
      round(dec_2017$mean_q4*1), "sales"),1)

hh=rbind(h1,h2,h3,h4)

rownames(hh)=c("","","","")
colnames(hh)=""
hh

###############################################################

dec_2018=sales%>%
  filter(date>="2018-12-01" & date<="2018-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

j1=head(paste("units to be sold on 31 dec 2018 for product 1: " ,
      round(dec_2018$mean_q1*1), "sales"),1)

j2=head(paste("units to be sold on 31 dec 2018 for product 2: " ,
      round(dec_2018$mean_q2*1), "sales"),1)

j3=head(paste("units to be sold on 31 dec 2018 for product 3: " ,
      round(dec_2018$mean_q3*1), "sales"),1)

j4=head(paste("units to be sold on 31 dec 2018 for product 4: " ,
      round(dec_2018$mean_q4*1), "sales"),1)

jj=rbind(j1,j2,j3,j4)

rownames(jj)=c("","","","")
colnames(jj)=""
jj

##############################################################

dec_2019=sales%>%
  filter(date>="2019-12-01" & date<="2019-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))


k1=head(paste("units to be sold on 31 dec 2019 for product 1: " ,
      round(dec_2019$mean_q1*1), "sales"),1)

k2=head(paste("units to be sold on 31 dec 2019 for product 2: " ,
      round(dec_2019$mean_q2*1), "sales"),1)

k3=head(paste("units to be sold on 31 dec 2019 for product 3: " ,
      round(dec_2019$mean_q3*1), "sales"),1)

k4=head(paste("units to be sold on 31 dec 2019 for product 4: " ,
      round(dec_2019$mean_q4*1), "sales"),1)

kk=rbind(k1,k2,k3,k4)

rownames(kk)=c("","","","")
colnames(kk)=""
kk

#################################################################

dec_2020=sales%>%
  filter(date>="2020-12-01" & date<="2020-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

l1=head(paste("units to be sold on 31 dec 2020 for product 1: " ,
      round(dec_2020$mean_q1*1), "sales"),1)

l2=head(paste("units to be sold on 31 dec 2020 for product 2: " ,
      round(dec_2020$mean_q2*1), "sales"),1)

l3=head(paste("units to be sold on 31 dec 2020 for product 3: " ,
      round(dec_2020$mean_q3*1), "sales"),1)

l4=head(paste("units to be sold on 31 dec 2020 for product 4: " ,
      round(dec_2020$mean_q4*1), "sales"),1)

ll=rbind(l1,l2,l3,l4)

rownames(ll)=c("","","","")
colnames(ll)=""
ll

##################################################################

dec_2021=sales%>%
  filter(date>="2021-12-01" & date<="2021-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

m1=head(paste("units to be sold on 31 dec 2021 for product 1: " ,
      round(dec_2021$mean_q1*1), "sales"),1)

m2=head(paste("units to be sold on 31 dec 2021 for product 2: " ,
      round(dec_2021$mean_q2*1), "sales"),1)

m3=head(paste("units to be sold on 31 dec 2021 for product 3: " ,
      round(dec_2021$mean_q3*1), "sales"),1)

m4=head(paste("units to be sold on 31 dec 2021 for product 4: " ,
      round(dec_2021$mean_q4*1), "sales"),1)

mm=rbind(m1,m2,m3,m4)

rownames(mm)=c("","","","")
colnames(mm)=""
mm

####################################################################

dec_2022=sales%>%
  filter(date>="2022-12-01" & date<="2022-12-30") %>%
  select(1,5:8) %>% mutate(mean_q1=mean(q_p1)) %>%
  mutate(mean_q2=mean(q_p2)) %>% mutate(mean_q3=mean(q_p3)) %>%
  mutate(mean_q4=mean(q_p4))

n1=head(paste("units to be sold on 31 dec 2022 for product 1: " ,
      round(dec_2022$mean_q1*1), "sales"),1)

n2=head(paste("units to be sold on 31 dec 2022 for product 2: " ,
      round(dec_2022$mean_q2*1), "sales"),1)

n3=head(paste("units to be sold on 31 dec 2022 for product 3: " ,
      round(dec_2022$mean_q3*1), "sales"),1)

n4=head(paste("units to be sold on 31 dec 2022 for product 4: " ,
      round(dec_2022$mean_q4*1), "sales"),1)

nn=rbind(n1,n2,n3,n4)

rownames(nn)=c("","","","")
colnames(nn)=""
nn

#Comparative analysis
revenues=sales %>%
  select(s_p1,s_p2,s_p3,s_p4) %>%
  colSums()

nsales=sales %>%
  select(q_p1,q_p2,q_p3,q_p4) %>%
  colSums()

product_name=c("product 1","product 2","product 3","product 4")

df=as.data.frame(cbind(product_name,revenues,nsales))

bar_sales=df %>% ggplot(aes(product_name,nsales,fill=product_name))+
  geom_bar(stat="identity") + theme_bw() +
  labs(title="Distribution of total sales ",y="sales",
       x="type of product")

bar_revenues=df %>% ggplot(aes(product_name,y=revenues,fill=product_name))+
  geom_bar(stat="identity")+ theme_bw() +
  labs(title="Distribution of total revenues ",y="revenues",
       x="type of product")

grid.arrange(bar_sales,bar_revenues,ncol=1)

#forecast

library(prophet)

sales_p1= sales %>%
  select(date,q_p1)

sales_p1=sales_p1 %>%
  rename(ds=date,y=q_p1)

model_p1=prophet(sales_p1)

future_p_1_sales=make_future_dataframe(model_p1,periods = 12,freq = "month")

forecasted_p1=predict(model_p1,future_p_1_sales)

forecast_p1_sales=forecasted_p1 %>%
  filter(ds>="2023-03-03") %>%
  select(ds,yhat) %>%
  rename("date"=ds,"estimated product 1 sales"=yhat)

##############################################################

sales_p2= sales %>%
  select(date,q_p2)

sales_p2=sales_p2 %>%
  rename(ds=date,y=q_p2)

model_p2=prophet(sales_p2)

future_p_2_sales=make_future_dataframe(model_p2,periods = 12,freq = "month")

forecasted_p2=predict(model_p2,future_p_2_sales)

forecast_p2_sales=forecasted_p2 %>%
  filter(ds>="2023-03-03") %>%
  select(yhat) %>%
  rename("estimated product 2 sales"=yhat)

##########################################################################

sales_p3= sales %>%
  select(date,q_p3)

sales_p3=sales_p3 %>%
  rename(ds=date,y=q_p3)

model_p3=prophet(sales_p3)

future_p_3_sales=make_future_dataframe(model_p3,periods = 12,freq = "month")

forecasted_p3=predict(model_p3,future_p_3_sales)

forecast_p3_sales=forecasted_p3 %>%
  filter(ds>="2023-03-03") %>%
  select(yhat) %>%
  rename("estimated product 3 sales"=yhat)

##########################################################################

sales_p4= sales %>%
  select(date,q_p4)

sales_p4=sales_p4 %>%
  rename(ds=date,y=q_p4)

model_p4=prophet(sales_p4)

future_p_4_sales=make_future_dataframe(model_p4,periods = 12,freq = "month")

forecasted_p4=predict(model_p4,future_p_4_sales)

forecast_p4_sales=forecasted_p4 %>%
  filter(ds>="2023-03-03") %>%
  select(yhat) %>%
  rename("estimated product 4 sales"=yhat)

####################################################################

#ESTIMATED REVENUES

revenues_p1= sales %>%
  select(date,s_p1)

revenues_p1=revenues_p1 %>%
  rename(ds=date,y=s_p1)

model_revenues_p1=prophet(revenues_p1)

future_p_1_revenues=make_future_dataframe(model_revenues_p1,periods = 12,freq = "month")

forecasted_p1_revenues=predict(model_revenues_p1,future_p_1_revenues)

forecast_p1_revenues=forecasted_p1_revenues %>%
  filter(ds>="2023-03-03") %>%
  select(yhat) %>%
  rename("estimated product 1 revenues"=yhat)

##################################################################3

revenues_p2= sales %>%
  select(date,s_p2)

revenues_p2=revenues_p2 %>%
  rename(ds=date,y=s_p2)

model_revenues_p2=prophet(revenues_p2)

future_p_2_revenues=make_future_dataframe(model_revenues_p2,periods = 12,freq = "month")

forecasted_p2_revenues=predict(model_revenues_p2,future_p_2_revenues)

forecast_p2_revenues=forecasted_p2_revenues %>%
  filter(ds>="2023-03-03") %>%
  select(yhat) %>%
  rename("estimated product 2 revenues"=yhat)

######################################################################


revenues_p3= sales %>%
  select(date,s_p3)

revenues_p3=revenues_p3 %>%
  rename(ds=date,y=s_p3)

model_revenues_p3=prophet(revenues_p3)

future_p_3_revenues=make_future_dataframe(model_revenues_p3,periods = 12,freq = "month")

forecasted_p3_revenues=predict(model_revenues_p3,future_p_3_revenues)

forecast_p3_revenues=forecasted_p3_revenues %>%
  filter(ds>="2023-03-03") %>%
  select(yhat) %>%
  rename("estimated product 3 revenues"=yhat)

##########################################################################3



revenues_p4= sales %>%
  select(date,s_p4)

revenues_p4=revenues_p4 %>%
  rename(ds=date,y=s_p4)

model_revenues_p4=prophet(revenues_p4)

future_p_4_revenues=make_future_dataframe(model_revenues_p4,periods = 12,freq = "month")

forecasted_p4_revenues=predict(model_revenues_p4,future_p_4_revenues)

forecast_p4_revenues=forecasted_p4_revenues %>%
  filter(ds>="2023-03-03") %>%
  select(yhat) %>%
  rename("estimated product 4 revenues"=yhat)

#####################################################################

summary_forecast=cbind(forecast_p1_sales,forecast_p2_sales,
                       forecast_p3_sales,forecast_p4_sales,
                       forecast_p1_revenues,forecast_p2_revenues,
                       forecast_p3_revenues,forecast_p4_revenues)

library(knitr)

kable(summary_forecast)
#Correlation
library(corrplot)

co=sales %>%
  select(q_p1,q_p2,q_p3,q_p4,
                 s_p1,s_p2,s_p3,s_p4)

co_matrix=cor(co)

corrplot(co_matrix,method = "number",col="black",
         type = "lower",bg="gold2",number.digits = 3,
         number.cex = 0.6,tl.col ="red2",title = "Correlation",
         mar = c(2,2,2,2))