##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Before uploading the data bases, I did a short cleaning where I standardized the language to English, rows and column. The names of the products had different labels since they are sold under different names on Mexico and the US. I change the months into numbers since it is easier to work that way for visualization and I renamed the columns of the wages databases to Workers and Wages because when you insert a database with a column name like Wages (USD), R takes them as Wages.(USD) and then when you try to rename them, you have problem because r is searching for a function.
Sales_US = read.xlsx("C:\\Users\\diego\\Desktop\\Data Analysis\\US clean.xlsx")
Workers_US = read.xlsx("C:\\Users\\diego\\Desktop\\Data Analysis\\US Workers.xlsx")
Sales_Mex = read.xlsx("C:\\Users\\diego\\Desktop\\Data Analysis\\Mexico Clean.xlsx")
Workers_Mex = read.xlsx("C:\\Users\\diego\\Desktop\\Data Analysis\\Mexico Workers.xlsx")## Rows: 36
## Columns: 8
## $ Month <dbl> 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, …
## $ Region <chr> "Region 1", "Region 2", "Region 3", "Region 1", "Region 2", …
## $ CrunchyCo <dbl> 76497, 66952, 65904, 80363, 78209, 75641, 72684, 64834, 7844…
## $ SpicyCo <dbl> 85593, 112069, 104656, 90866, 71201, 76728, 90539, 77073, 89…
## $ SaltCo <dbl> 88029, 68042, 48051, 67132, 63599, 91169, 69360, 90898, 8679…
## $ BBQCo <dbl> 41790, 37184, 15998, 45087, 52459, 48019, 45939, 46842, 2046…
## $ SalsaCo <dbl> 60876, 12250, 44202, 61019, 46366, 30994, 35893, 52129, 4860…
## $ CheeseCo <dbl> 1341.788, 1201.579, 2461.596, 1577.102, 3958.764, 7227.917, …
## Rows: 40
## Columns: 4
## $ Month <chr> "January", "January", "January", "February", "February", "Febr…
## $ Region <chr> "Region 1", "Region 2", "Region 3", "Region 1", "Region 2", "R…
## $ Workers <dbl> 119.00, 119.84, 137.14, 150.65, 206.00, 249.35, 262.27, 304.05…
## $ Wage <dbl> 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, …
## Rows: 36
## Columns: 9
## $ Row <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ Month <dbl> 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, …
## $ Region <chr> "Region 1", "Region 2", "Region 3", "Region 1", "Region 2", …
## $ CrunchyCo <dbl> 4497, 3747, 6664, 6620, 4514, 7194, 8865, 8934, 9031, 9737, …
## $ SpicyCo <dbl> 4448, 6385, 5239, 7928, 7650, 12709, 10368, 12499, 11857, 10…
## $ SaltCo <dbl> NA, NA, 3518, 4012, 5599, 4550, 4762, 6086, 4698, 4503, 5276…
## $ BBQCo <dbl> NA, NA, 1737, 3739, 3028, 2844, 3798, 1198, 2432, 2400, 1515…
## $ SalsaCo <dbl> 5949, 7781, 7322, 7401, 5771, 6597, 8494, 8791, 6566, 8848, …
## $ CheeseCo <dbl> 3472.779, 1591.780, 3164.546, 8544.659, 8321.350, 1434.584, …
## Rows: 20
## Columns: 2
## $ Workers <dbl> 54, 59, 65, 80, 84, 93, 111, 137, 132, 121, 109, 99, 93, 77, 6…
## $ Wage <dbl> 5000, 8000, 11000, 14000, 17000, 20000, 23000, 26000, 29000, 3…
## 'data.frame': 36 obs. of 8 variables:
## $ Month : num 1 1 1 2 2 2 3 3 3 4 ...
## $ Region : chr "Region 1" "Region 2" "Region 3" "Region 1" ...
## $ CrunchyCo: num 76497 66952 65904 80363 78209 ...
## $ SpicyCo : num 85593 112069 104656 90866 71201 ...
## $ SaltCo : num 88029 68042 48051 67132 63599 ...
## $ BBQCo : num 41790 37184 15998 45087 52459 ...
## $ SalsaCo : num 60876 12250 44202 61019 46366 ...
## $ CheeseCo : num 1342 1202 2462 1577 3959 ...
## 'data.frame': 40 obs. of 4 variables:
## $ Month : chr "January" "January" "January" "February" ...
## $ Region : chr "Region 1" "Region 2" "Region 3" "Region 1" ...
## $ Workers: num 119 120 137 151 206 ...
## $ Wage : num 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 ...
## 'data.frame': 36 obs. of 9 variables:
## $ Row : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Month : num 1 1 1 2 2 2 3 3 3 4 ...
## $ Region : chr "Region 1" "Region 2" "Region 3" "Region 1" ...
## $ CrunchyCo: num 4497 3747 6664 6620 4514 ...
## $ SpicyCo : num 4448 6385 5239 7928 7650 ...
## $ SaltCo : num NA NA 3518 4012 5599 ...
## $ BBQCo : num NA NA 1737 3739 3028 ...
## $ SalsaCo : num 5949 7781 7322 7401 5771 ...
## $ CheeseCo : num 3473 1592 3165 8545 8321 ...
## 'data.frame': 20 obs. of 2 variables:
## $ Workers: num 54 59 65 80 84 93 111 137 132 121 ...
## $ Wage : num 5000 8000 11000 14000 17000 20000 23000 26000 29000 32000 ...
We have the data of the sales by region and month We have the information of the wages by region and month in the US but not in Mexico We have an extra column of row on data base of Mexico We have some NA on the Mexico’s database
## Month Region CrunchyCo SpicyCo SaltCo BBQCo SalsaCo CheeseCo
## 0 0 0 0 0 0 0 3
## Month Region Workers Wage
## 4 4 0 0
## Row Month Region CrunchyCo SpicyCo SaltCo BBQCo SalsaCo
## 0 0 0 1 0 2 2 0
## CheeseCo
## 3
## Workers Wage
## 0 0
We have 4 Na’s on month and region on the us workers database, I’m gonna skip those Na’s for now since I need to do a further analysis of the data and I’m probably gonna erase those columns since I don’t have that information on the workers database from Mexico.
We have NA’S on the sales of the Mexican database and now in the US data base, those need to be taken care of now, there are several ways to do this, but the most accurate in this case is probably to fill those Na’s with the average from the sales of that product to avoid having and impact on the statistics
Sales_Mex_Clean <- Sales_Mex
Sales_Mex_Clean$CrunchyCo[is.na(Sales_Mex_Clean$CrunchyCo)] <- mean(Sales_Mex_Clean$CrunchyCo, na.rm=TRUE)
Sales_Mex_Clean$SaltCo[is.na(Sales_Mex_Clean$SaltCo)] <- mean(Sales_Mex_Clean$SaltCo, na.rm=TRUE)
Sales_Mex_Clean$BBQCo[is.na(Sales_Mex_Clean$BBQCo)] <- mean(Sales_Mex_Clean$BBQCo, na.rm=TRUE)
Sales_Mex_Clean$CheeseCo[is.na(Sales_Mex_Clean$CheeseCo)] <- mean(Sales_Mex_Clean$CheeseCo , na.rm=TRUE)
sapply(Sales_Mex_Clean, function(x)sum (is.na(x)))## Row Month Region CrunchyCo SpicyCo SaltCo BBQCo SalsaCo
## 0 0 0 0 0 0 0 0
## CheeseCo
## 0
Sales_US_Clean <- Sales_US
Sales_US_Clean$CheeseCo[is.na(Sales_US_Clean$CheeseCo)] <- mean(Sales_US_Clean$CheeseCo , na.rm=TRUE)
sapply(Sales_US_Clean, function(x)sum (is.na(x)))## Month Region CrunchyCo SpicyCo SaltCo BBQCo SalsaCo CheeseCo
## 0 0 0 0 0 0 0 0
Now we replace those values with the average and we start doing some statistical analysis with the sales of both US and Mexico
MX_SALESxMONTH <- Sales_Mex_Clean %>% group_by(Month)
MX_SALESxMONTH <- MX_SALESxMONTH %>%
mutate(Total = CrunchyCo + SpicyCo + SaltCo + BBQCo + SalsaCo + CheeseCo)
#Create a column with total sales
MX_SALESxMONTH %>%
summarize(mean(CrunchyCo),
sd(CrunchyCo),
median(CrunchyCo),
sum(CrunchyCo),
quantile(CrunchyCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(CrunchyCo)` `sd(CrunchyCo)` `median(CrunchyCo)` `sum(CrunchyCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 4969. 1515. 4497 14908
## 2 2 6109. 1411. 6620 18328
## 3 3 8943. 83.4 8934 26830
## 4 4 9074 579. 8817 27222
## 5 5 7890 1300. 7793 23670
## 6 6 8327. 2539. 9695 24982
## 7 7 6399 863. 6313 19197
## 8 8 6877. 1319. 6897 20630
## 9 9 8036 2037. 8402 24108
## 10 10 8082. 158. 8070 24245
## 11 11 8000. 852. 7784 23999
## 12 12 8551. 820. 9017 25654.
## # ℹ 2 more variables: `quantile(CrunchyCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxMONTH %>%
summarize(mean(SpicyCo),
sd(SpicyCo),
median(SpicyCo),
sum(SpicyCo),
quantile(SpicyCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(SpicyCo)` `sd(SpicyCo)` `median(SpicyCo)` `sum(SpicyCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 5357. 974. 5239 16072
## 2 2 9429 2844. 7928 28287
## 3 3 11575. 1093. 11857 34724
## 4 4 10282. 2599. 10220 30847
## 5 5 8521. 1195. 7842 25562
## 6 6 9751. 1662. 9924 29252
## 7 7 10701 1123. 10150 32103
## 8 8 12149. 1971. 12749 36446
## 9 9 8133. 1475. 7400 24398
## 10 10 10805. 1641. 10172 32416
## 11 11 10830. 2038. 11889 32491
## 12 12 10858 1655. 11410 32574
## # ℹ 2 more variables: `quantile(SpicyCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxMONTH %>%
summarize(mean(SaltCo),
sd(SaltCo),
median(SaltCo),
sum(SaltCo),
quantile(SaltCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(SaltCo)` `sd(SaltCo)` `median(SaltCo)` `sum(SaltCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 4684. 1010. 5267. 14052.
## 2 2 4720. 807. 4550 14161
## 3 3 5182 784. 4762 15546
## 4 4 5140 581. 5276 15420
## 5 5 5841. 442. 5672 17522
## 6 6 5172. 1337. 4568 15517
## 7 7 6232. 1161. 6836 18695
## 8 8 5474 1831. 4445 16422
## 9 9 5713 860. 6028 17139
## 10 10 5469. 1291. 5131 16406
## 11 11 4418 50.5 4410 13254
## 12 12 5158 1113. 5030 15474
## # ℹ 2 more variables: `quantile(SaltCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxMONTH %>%
summarize(mean(BBQCo),
sd(BBQCo),
median(BBQCo),
sum(BBQCo),
quantile(BBQCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(BBQCo)` `sd(BBQCo)` `median(BBQCo)` `sum(BBQCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2456. 622. 2815. 7367.
## 2 2 3204. 473. 3028 9611
## 3 3 2476 1301. 2432 7428
## 4 4 1680. 654. 1515 5039
## 5 5 3444. 857. 3115 10331
## 6 6 2279. 695. 2016 6838
## 7 7 2564. 1127. 1920 7691
## 8 8 3019. 1629. 3774 9058
## 9 9 3426. 984. 3920 10277
## 10 10 4147. 1111. 4592 12440
## 11 11 3307 1014. 3593 9921
## 12 12 1777 281. 1870 5331
## # ℹ 2 more variables: `quantile(BBQCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxMONTH %>%
summarize(mean(SalsaCo),
sd(SalsaCo),
median(SalsaCo),
sum(SalsaCo),
quantile(SalsaCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(SalsaCo)` `sd(SalsaCo)` `median(SalsaCo)` `sum(SalsaCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 7017. 953. 7322 21052
## 2 2 6590. 815. 6597 19769
## 3 3 7950. 1208. 8494 23851
## 4 4 8091 1274. 8805 24273
## 5 5 8051 2269. 7724 24153
## 6 6 7760. 795. 7950 23281
## 7 7 8762 1324. 8223 26286
## 8 8 9003. 2590. 9900 27010
## 9 9 6782. 1521. 7201 20345
## 10 10 7994 2053. 7845 23982
## 11 11 7640. 1730. 7701 22921
## 12 12 7021 1319. 7037 21063
## # ℹ 2 more variables: `quantile(SalsaCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxMONTH %>%
summarize(mean(CheeseCo),
sd(CheeseCo),
median(CheeseCo),
sum(CheeseCo),
quantile(CheeseCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(CheeseCo)` `sd(CheeseCo)` `median(CheeseCo)` `sum(CheeseCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2743. 1009. 3165. 8229.
## 2 2 6100. 4042. 8321. 18301.
## 3 3 3496. 1449. 2826. 10488.
## 4 4 6799. 4114. 7333. 20397.
## 5 5 5284. 3631. 4501. 15852.
## 6 6 10234. 320. 10091. 30703.
## 7 7 5780. 3249. 4170. 17340.
## 8 8 7565. 4072. 7507. 22696.
## 9 9 10343. 1365. 10034. 31028.
## 10 10 5820. 5319. 5199. 17459.
## 11 11 3688. 3257. 2938. 11063.
## 12 12 6168. 0 6168. 18505.
## # ℹ 2 more variables: `quantile(CheeseCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxMONTH %>%
summarize(mean(Total),
sd(Total),
median(Total),
sum(Total),
quantile(Total, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(Total)` `sd(Total)` `median(Total)` `sum(Total)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 27226. 674. 27586. 81679.
## 2 2 36152. 1826. 35329. 108457.
## 3 3 39622. 2265. 40334. 118867.
## 4 4 41066. 3289. 40414. 123198.
## 5 5 39030. 4682. 37611. 117090.
## 6 6 43524. 1078. 43296. 130573.
## 7 7 40437. 2568. 40634. 121312.
## 8 8 44087. 2359. 45293. 132262.
## 9 9 42432. 4396. 40884. 127295.
## 10 10 42316. 5023. 42550. 126948.
## 11 11 37883. 7586. 39532. 113649.
## 12 12 39534. 2748. 40927. 118601.
## # ℹ 2 more variables: `quantile(Total, (0.9))` <dbl>, `n()` <int>
ggplot(data = MX_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = CrunchyCo)) +
geom_col() + labs(title = 'Sales of CrunchyCo by Month in Mexico') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = MX_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = SpicyCo)) +
geom_col() + labs(title = 'Sales of SpicyCo by Month in Mexico') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = MX_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = SaltCo)) +
geom_col() + labs(title = 'Sales of SaltCo by Month in Mexico') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = MX_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = BBQCo)) +
geom_col() + labs(title = 'Sales of BBQCo by Month in Mexico') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = MX_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = SalsaCo)) +
geom_col() + labs(title = 'Sales of SalsaCo by Month in Mexico') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = MX_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = CheeseCo)) +
geom_col() + labs(title = 'Sales of CheeseCo by Month in Mexico') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
CrunchyCo: It is sell the most is April (27,222) and the least in January (14,908) SpicyCo: It is sell the most in August (36,446) and the least in January (16,072) SaltCo: It is sell the most in July (18,695) and the least in November(13,254) BBQCo: It is sell the most in October (12,440) and the least in April(5,039) SalsaCo: It is sell the most in August (27,010) and the least in February (19,769) CheeseCo: It is sell the most in September (31,028) and the least in January (8,229)
ggplot(data = Lastmonthsales_MX, mapping = aes(x = Month, y = Total)) +
geom_col() + labs(title = 'Total sales in the last 3 Months in Mexico') MX_SALESxREGION <- Sales_Mex_Clean %>% group_by(Region)
MX_SALESxREGION <- MX_SALESxREGION %>%
mutate(Total = CrunchyCo + SpicyCo + SaltCo + BBQCo + SalsaCo)
#Create a column with total sales
MX_SALESxREGION %>%
summarize(mean(CrunchyCo),
sd(CrunchyCo),
median(CrunchyCo),
sum(CrunchyCo),
quantile(CrunchyCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(CrunchyCo)` `sd(CrunchyCo)` `median(CrunchyCo)` `sum(CrunchyCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region… 7784. 1694. 8058. 93406
## 2 Region… 6971. 1942. 7443 83650
## 3 Region… 8060. 1063. 7788. 96717.
## # ℹ 2 more variables: `quantile(CrunchyCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxREGION %>%
summarize(mean(SpicyCo),
sd(SpicyCo),
median(SpicyCo),
sum(SpicyCo),
quantile(SpicyCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(SpicyCo)` `sd(SpicyCo)` `median(SpicyCo)` `sum(SpicyCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 9319. 2211. 9936. 111825
## 2 Region 2 9897. 2366. 10746. 118768
## 3 Region 3 10382. 2439. 10025 124579
## # ℹ 2 more variables: `quantile(SpicyCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxREGION %>%
summarize(mean(SaltCo),
sd(SaltCo),
median(SaltCo),
sum(SaltCo),
quantile(SaltCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(SaltCo)` `sd(SaltCo)` `median(SaltCo)` `sum(SaltCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 5078. 863. 4896 60935.
## 2 Region 2 5035. 712. 5080. 60421.
## 3 Region 3 5688. 1271. 5985 68252
## # ℹ 2 more variables: `quantile(SaltCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxREGION %>%
summarize(mean(BBQCo),
sd(BBQCo),
median(BBQCo),
sum(BBQCo),
quantile(BBQCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(BBQCo)` `sd(BBQCo)` `median(BBQCo)` `sum(BBQCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 2908. 1145. 2807. 34902.
## 2 Region 2 2890. 1024. 2955 34676.
## 3 Region 3 2646. 1116. 2306. 31754
## # ℹ 2 more variables: `quantile(BBQCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxREGION %>%
summarize(mean(SalsaCo),
sd(SalsaCo),
median(SalsaCo),
sum(SalsaCo),
quantile(SalsaCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(SalsaCo)` `sd(SalsaCo)` `median(SalsaCo)` `sum(SalsaCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 7711. 1366. 7886. 92529
## 2 Region 2 8120. 1803. 7872. 97443
## 3 Region 3 7334. 1317. 7119 88014
## # ℹ 2 more variables: `quantile(SalsaCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxREGION %>%
summarize(mean(CheeseCo),
sd(CheeseCo),
median(CheeseCo),
sum(CheeseCo),
quantile(CheeseCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(CheeseCo)` `sd(CheeseCo)` `median(CheeseCo)` `sum(CheeseCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 6561. 3350. 6711. 78736.
## 2 Region 2 6745. 3668. 7245. 80939.
## 3 Region 3 5199. 3487. 4685. 62385.
## # ℹ 2 more variables: `quantile(CheeseCo, (0.9))` <dbl>, `n()` <int>
MX_SALESxREGION %>%
summarize(mean(Total),
sd(Total),
median(Total),
sum(Total),
quantile(Total, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(Total)` `sd(Total)` `median(Total)` `sum(Total)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 32800. 3913. 33334. 393597.
## 2 Region 2 32913. 4165. 34442. 394958.
## 3 Region 3 34110. 3912. 34872. 409316.
## # ℹ 2 more variables: `quantile(Total, (0.9))` <dbl>, `n()` <int>
ggplot(data = MX_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = CrunchyCo)) +
geom_col() + labs(title = 'Sales of CrunchyCo by Region in Mexico') ggplot(data = MX_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = SpicyCo)) +
geom_col() + labs(title = 'Sales of SpicyCo by Region in Mexico')ggplot(data = MX_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = SaltCo)) +
geom_col() + labs(title = 'Sales of SaltCo by Region in Mexico') ggplot(data = MX_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = BBQCo)) +
geom_col() + labs(title = 'Sales of BBQCo by Region in Mexico') ggplot(data = MX_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = SalsaCo)) +
geom_col() + labs(title = 'Sales of SalsaCo by Region in Mexico') ggplot(data = MX_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = CheeseCo)) +
geom_col() + labs(title = 'Sales of CheeseCo by Region in Mexico') CrunchyCo: It is sell the most in Region 3 (96,717) and the least in Region 2 (83,650) SpicyCo: It is sell the most in Region 3 (124,579) and the least in Region 1 (111,825) SaltCo: It is sell the most in Region 3 (68,252) and the least in Region 2 (60,421) BBQCo: It is sell the most in Region 1 (34,902) and the least in Region 3 (31,754) SalsaCo: It is sell the most in Region 2 (97,443) and the least in Region 3 (88,014) CheeseCo: It is sell the most in Region 2 (80,938) and the least in Region 3 (62,385)
area_graph_with_points <- ggplot(MX_SALESxMONTH %>% group_by(Month) , aes(x = Month, y = Total, fill = Region, color = Region)) +
geom_line(size = 1) + # Add a line on top of the areas
geom_point(size = 3, shape = 21, fill = "white") + # Add points with a white fill for contrast
labs(title = "Total Sales by Month and Region in Mexico", x = "Month", y = "Total") +
theme_minimal() + scale_x_discrete(limit = c(1:12)) # Minimalist theme for a clean look## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
We can see that Region 2 is probably the most versatile region because of the unexplanable pattern it shows We can also observe that in April the sales in region 2 where considerable lower than in the other regions, furthermore in August which was one of the months with the higher sales, region 1 dint contributed as much as the other regions to those sales We can also observed that in November, region 3 had considerable lower sales than the other regions
US_SALESxMONTH <- Sales_US_Clean %>% group_by(Month)
US_SALESxMONTH <- US_SALESxMONTH %>%
mutate(Total = CrunchyCo + SpicyCo + SaltCo + BBQCo + SalsaCo)
#Create a column with total sales
US_SALESxMONTH %>%
summarize(mean(CrunchyCo),
sd(CrunchyCo),
median(CrunchyCo),
sum(CrunchyCo),
quantile(CrunchyCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(CrunchyCo)` `sd(CrunchyCo)` `median(CrunchyCo)` `sum(CrunchyCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 69784. 5837. 66952 209353
## 2 2 78071 2364. 78209 234213
## 3 3 71986 6830. 72684 215958
## 4 4 73843. 739. 74063 221530
## 5 5 72723. 3276. 70877 218168
## 6 6 72101. 13925. 68731 216302
## 7 7 81189 20514. 73836 243567
## 8 8 76304. 10106. 73736 228911
## 9 9 72422. 10622. 77082 217267
## 10 10 72624. 10813. 67819 217871
## 11 11 62118. 9934. 60375 186353
## 12 12 88816 15121. 85599 266448
## # ℹ 2 more variables: `quantile(CrunchyCo, (0.9))` <dbl>, `n()` <int>
US_SALESxMONTH %>%
summarize(mean(SpicyCo),
sd(SpicyCo),
median(SpicyCo),
sum(SpicyCo),
quantile(SpicyCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(SpicyCo)` `sd(SpicyCo)` `median(SpicyCo)` `sum(SpicyCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 100773. 13659. 104656 302318
## 2 2 79598. 10142. 76728 238795
## 3 3 85822 7585. 89854 257466
## 4 4 88752 15054. 93874 266256
## 5 5 74417 20442. 70298 223251
## 6 6 92024. 5073. 89749 276073
## 7 7 86320. 8028. 83132 258961
## 8 8 86049. 3525. 84722 258146
## 9 9 76433. 4797. 76043 229298
## 10 10 84993. 9965. 83199 254980
## 11 11 77520. 5206. 77088 232561
## 12 12 92927. 14535. 90252 278780
## # ℹ 2 more variables: `quantile(SpicyCo, (0.9))` <dbl>, `n()` <int>
US_SALESxMONTH %>%
summarize(mean(SaltCo),
sd(SaltCo),
median(SaltCo),
sum(SaltCo),
quantile(SaltCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(SaltCo)` `sd(SaltCo)` `median(SaltCo)` `sum(SaltCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 68041. 19989. 68042 204122
## 2 2 73967. 15002. 67132 221900
## 3 3 82351. 11436. 86796 247054
## 4 4 83283 6678. 84979 249849
## 5 5 63126. 9868. 62715 189377
## 6 6 65820. 17008. 61247 197459
## 7 7 78669. 1345. 79442 236006
## 8 8 69469 8442. 71410 208407
## 9 9 69460. 17114. 64726 208381
## 10 10 72080 12306. 78737 216240
## 11 11 68578. 8631. 68213 205733
## 12 12 70672. 9142. 72167 212015
## # ℹ 2 more variables: `quantile(SaltCo, (0.9))` <dbl>, `n()` <int>
US_SALESxMONTH %>%
summarize(mean(BBQCo),
sd(BBQCo),
median(BBQCo),
sum(BBQCo),
quantile(BBQCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(BBQCo)` `sd(BBQCo)` `median(BBQCo)` `sum(BBQCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 31657. 13756. 37184 94972
## 2 2 48522. 3712. 48019 145565
## 3 3 37747. 14977. 45939 113242
## 4 4 35274 14293. 38690 105822
## 5 5 29982. 13746. 23468 89945
## 6 6 18594. 4115. 17039 55781
## 7 7 51288. 5749. 50682 153863
## 8 8 43500. 6576. 45055 130501
## 9 9 23086. 10839. 17077 69259
## 10 10 31293. 12852. 36450 93878
## 11 11 38278. 11784. 37051 114835
## 12 12 32023. 13801. 25704 96070
## # ℹ 2 more variables: `quantile(BBQCo, (0.9))` <dbl>, `n()` <int>
US_SALESxMONTH %>%
summarize(mean(SalsaCo),
sd(SalsaCo),
median(SalsaCo),
sum(SalsaCo),
quantile(SalsaCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(SalsaCo)` `sd(SalsaCo)` `median(SalsaCo)` `sum(SalsaCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 39109. 24710. 44202 117328
## 2 2 46126. 15014. 46366 138379
## 3 3 45542. 8540. 48605 136627
## 4 4 38623 13812. 33381 115869
## 5 5 60102. 4200. 58823 180305
## 6 6 40827 16191. 32548 122481
## 7 7 62382. 9954. 61760 187145
## 8 8 48769. 13442. 49887 146306
## 9 9 51045. 10800. 54254 153136
## 10 10 47125. 17107. 47325 141376
## 11 11 36962 1286. 36809 110886
## 12 12 55992 8301. 59500 167976
## # ℹ 2 more variables: `quantile(SalsaCo, (0.9))` <dbl>, `n()` <int>
US_SALESxMONTH %>%
summarize(mean(CheeseCo),
sd(CheeseCo),
median(CheeseCo),
sum(CheeseCo),
quantile(CheeseCo, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(CheeseCo)` `sd(CheeseCo)` `median(CheeseCo)` `sum(CheeseCo)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1668. 691. 1342. 5005.
## 2 2 4255. 2837. 3959. 12764.
## 3 3 6033. 1851. 5670. 18100.
## 4 4 2756. 938. 2312. 8267.
## 5 5 8417. 691. 8596. 25251.
## 6 6 6222. 4289. 4276. 18666.
## 7 7 4716. 2698. 4720. 14147.
## 8 8 8467. 3243. 9749. 25401.
## 9 9 10940. 1317. 11541. 32820.
## 10 10 9293. 2871. 10117. 27878.
## 11 11 8273. 4805. 10913. 24818.
## 12 12 6458. 0 6458. 19374.
## # ℹ 2 more variables: `quantile(CheeseCo, (0.9))` <dbl>, `n()` <int>
US_SALESxMONTH %>%
summarize(mean(Total),
sd(Total),
median(Total),
sum(Total),
quantile(Total, (.90)),
n())## # A tibble: 12 × 7
## Month `mean(Total)` `sd(Total)` `median(Total)` `sum(Total)`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 309364. 38629. 296497 928093
## 2 2 326284 16634. 322551 978852
## 3 3 323449 8702. 324156 970347
## 4 4 319775. 28882. 327620 959326
## 5 5 300349. 25559. 300506 901046
## 6 6 289365. 32886. 287798 868096
## 7 7 359847. 43009. 339845 1079542
## 8 8 324090. 13382. 330678 972271
## 9 9 292447 22249. 297147 877341
## 10 10 308115 55838. 307264 924345
## 11 11 283456 13589. 284327 850368
## 12 12 340430. 40835. 321929 1021289
## # ℹ 2 more variables: `quantile(Total, (0.9))` <dbl>, `n()` <int>
ggplot(data = US_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = CrunchyCo)) +
geom_col() + labs(title = 'Sales of CrunchyCo by Month in US') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = US_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = SpicyCo)) +
geom_col() + labs(title = 'Sales of SpicyCo by Month in US') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = US_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = SaltCo)) +
geom_col() + labs(title = 'Sales of SaltCo by Month in US') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = US_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = BBQCo)) +
geom_col() + labs(title = 'Sales of BBQCo by Month in US') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = US_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = SalsaCo)) +
geom_col() + labs(title = 'Sales of SalsaCo by Month in US') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
ggplot(data = US_SALESxMONTH %>% group_by(Month), mapping = aes(x = Month, y = CheeseCo)) +
geom_col() + labs(title = 'Sales of CheeseCo by Month in US') + scale_x_discrete(limit = c(1:12))## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
CrunchyCo: It is sell the most is December (266,448) and the least in November (186,353) SpicyCo: It is sell the most in January (302,318) and the least in May (223,251) SaltCo: It is sell the most in April (249,849) and the least in May(189,377) BBQCo: It is sell the most in July (153,863) and the least in June(55,781) SalsaCo: It is sell the most in July (187,145) and the least in November (110,886) CheeseCo: It is sell the most in September (32,819) and the least in November (5,005)
Sales are much lower on the new product
ggplot(data = Lastmonthsales_US, mapping = aes(x = Month, y = Total)) +
geom_col() + labs(title = 'Total sales in the last 3 Months in US') US_SALESxREGION <- Sales_US_Clean %>% group_by(Region)
US_SALESxREGION <- US_SALESxREGION %>%
mutate(Total = CrunchyCo + SpicyCo + SaltCo + BBQCo + SalsaCo)
#Create a column with total sales
US_SALESxREGION %>%
summarize(mean(CrunchyCo),
sd(CrunchyCo),
median(CrunchyCo),
sum(CrunchyCo),
quantile(CrunchyCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(CrunchyCo)` `sd(CrunchyCo)` `median(CrunchyCo)` `sum(CrunchyCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region… 75151. 12333. 73374. 901816
## 2 Region… 74237. 9608. 75005 890841
## 3 Region… 73607 11415. 72914. 883284
## # ℹ 2 more variables: `quantile(CrunchyCo, (0.9))` <dbl>, `n()` <int>
US_SALESxREGION %>%
summarize(mean(SpicyCo),
sd(SpicyCo),
median(SpicyCo),
sum(SpicyCo),
quantile(SpicyCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(SpicyCo)` `sd(SpicyCo)` `median(SpicyCo)` `sum(SpicyCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 81825. 10930. 84486. 981901
## 2 Region 2 88601. 11810. 83927 1063207
## 3 Region 3 85981. 12505. 86474 1031777
## # ℹ 2 more variables: `quantile(SpicyCo, (0.9))` <dbl>, `n()` <int>
US_SALESxREGION %>%
summarize(mean(SaltCo),
sd(SaltCo),
median(SaltCo),
sum(SaltCo),
quantile(SaltCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(SaltCo)` `sd(SaltCo)` `median(SaltCo)` `sum(SaltCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 74525. 9836. 74398. 894298
## 2 Region 2 71954. 9513. 72680 863442
## 3 Region 3 69900. 16126. 68671 838803
## # ℹ 2 more variables: `quantile(SaltCo, (0.9))` <dbl>, `n()` <int>
US_SALESxREGION %>%
summarize(mean(BBQCo),
sd(BBQCo),
median(BBQCo),
sum(BBQCo),
quantile(BBQCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(BBQCo)` `sd(BBQCo)` `median(BBQCo)` `sum(BBQCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 36385. 13983. 39120 436623
## 2 Region 2 35847. 13290. 37937 430165
## 3 Region 3 33079. 13172. 31378. 396945
## # ℹ 2 more variables: `quantile(BBQCo, (0.9))` <dbl>, `n()` <int>
US_SALESxREGION %>%
summarize(mean(SalsaCo),
sd(SalsaCo),
median(SalsaCo),
sum(SalsaCo),
quantile(SalsaCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(SalsaCo)` `sd(SalsaCo)` `median(SalsaCo)` `sum(SalsaCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 51162. 13959. 54694. 613950
## 2 Region 2 48917. 14824. 53504. 587003
## 3 Region 3 43072. 12169. 41604. 516861
## # ℹ 2 more variables: `quantile(SalsaCo, (0.9))` <dbl>, `n()` <int>
US_SALESxREGION %>%
summarize(mean(CheeseCo),
sd(CheeseCo),
median(CheeseCo),
sum(CheeseCo),
quantile(CheeseCo, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(CheeseCo)` `sd(CheeseCo)` `median(CheeseCo)` `sum(CheeseCo)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 7407. 3859. 8004. 88878.
## 2 Region 2 5475. 3049. 5195. 65702.
## 3 Region 3 6493. 3423. 6843. 77910.
## # ℹ 2 more variables: `quantile(CheeseCo, (0.9))` <dbl>, `n()` <int>
US_SALESxREGION %>%
summarize(mean(Total),
sd(Total),
median(Total),
sum(Total),
quantile(Total, (.90)),
n())## # A tibble: 3 × 7
## Region `mean(Total)` `sd(Total)` `median(Total)` `sum(Total)`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Region 1 319049 34997. 307978 3828588
## 2 Region 2 319555. 26825. 324418 3834658
## 3 Region 3 305639. 39520. 317261 3667670
## # ℹ 2 more variables: `quantile(Total, (0.9))` <dbl>, `n()` <int>
ggplot(data = US_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = CrunchyCo)) +
geom_col() + labs(title = 'Sales of CrunchyCo by Region in US') ggplot(data = US_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = SpicyCo)) +
geom_col() + labs(title = 'Sales of SpicyCo by Region in US')ggplot(data = US_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = SaltCo)) +
geom_col() + labs(title = 'Sales of SaltCo by Region in US') ggplot(data = US_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = BBQCo)) +
geom_col() + labs(title = 'Sales of BBQCo by Region in US') ggplot(data = US_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = SalsaCo)) +
geom_col() + labs(title = 'Sales of SalsaCo by Region in US') ggplot(data = US_SALESxREGION %>% group_by(Region), mapping = aes(x = Region, y = CheeseCo)) +
geom_col() + labs(title = 'Sales of CheeseCo by Region in US') CrunchyCo: It is sell the most in Region 1 (901,816) and the least in Region 3 (883,284) SpicyCo: It is sell the most in Region 2 (1,063,207) and the least in Region 1 (981,901) SaltCo: It is sell the most in Region 1 (894,298) and the least in Region 3 (838,803) BBQCo: It is sell the most in Region 1 (436,623) and the least in Region 3 (396,945) SalsaCo: It is sell the most in Region 1 (613,950) and the least in Region 3 (516,861) CheeseCo: It is sell the most in Region 1 (88,878) and the least in Region 2 (65,702)
area_graph_with_points <- ggplot(US_SALESxMONTH %>% group_by(Month) , aes(x = Month, y = Total, fill = Region, color = Region)) +
geom_line(size = 1) + # Add a line on top of the areas
geom_point(size = 3, shape = 21, fill = "white") + # Add points with a white fill for contrast
labs(title = "Total Sales by Month and Region in Mexico", x = "Month", y = "Total") +
theme_minimal() + scale_x_discrete(limit = c(1:12)) # Minimalist theme for a clean look## Warning in scale_x_discrete(limit = c(1:12)): Continuous limits supplied to discrete scale.
## ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
We can observe that in July, which was the month with the most sales, Region 1 was the one with the highest contribution, we can also observed this is the highest sale point in the whole graph. We can also see that in October, region 2 made significant more sales than the other 2 regions and in December, region 3 made more sales. Lastly we can’t observed any significant pattern or behavior but it help us visualize the distribution of the sales