Pepper Pirate Paradise Ltd. Analysis
Date : 23/03/2024
Author : Omar Soub
Introduction
- In this chapter, we will use the R programming language to conduct descriptive and inferential analyses of the prices of bell peppers of different colors (green, red, yellow) to support decision making.
Data Set and Data types view
Data Set (first six rows) :
| week_start_dt | week_end_dt | vietnam_season | p_color | price | total_volume | brazil | india | vietnam | indonesia | china | brazil_season | indonesia_season | india_season | china_season | jordan_max_price | jordan_min_price | demand | supply |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-12-28 | 2016-01-03 | 2 | green | 6.599075 | 1596040 | 10793.5 | 65658.1 | 1519589 | 0 | 0 | False | False | False | False | 6.625 | 6.325 | 0.50 | 0.1650 |
| 2015-12-28 | 2016-01-03 | 2 | red | 7.175335 | 1596040 | 10793.5 | 65658.1 | 1519589 | 0 | 0 | False | False | False | False | 7.525 | 7.125 | 0.51 | 0.1683 |
| 2015-12-28 | 2016-01-03 | 2 | yellow | 7.300575 | 1596040 | 10793.5 | 65658.1 | 1519589 | 0 | 0 | False | False | False | False | 7.425 | 7.025 | 16.55 | 5.4615 |
| 2016-01-04 | 2016-01-10 | 2 | yellow | 7.379675 | 2295578 | 5677.8 | 15274.4 | 2274626 | 0 | 0 | False | False | False | False | 7.525 | 7.025 | 271.16 | 89.4828 |
| 2016-01-04 | 2016-01-10 | 2 | red | 7.175335 | 2295578 | 5677.8 | 15274.4 | 2274626 | 0 | 0 | False | False | False | False | 7.625 | 7.125 | 42.33 | 13.9689 |
| 2016-01-04 | 2016-01-10 | 2 | green | 6.599075 | 2295578 | 5677.8 | 15274.4 | 2274626 | 0 | 0 | False | False | False | False | 6.625 | 6.325 | 0.58 | 0.1914 |
data set dimensions
## [1] 1215 19
Data types and some details using glimpse function
## Rows: 1,215
## Columns: 19
## $ week_start_dt <chr> "2015-12-28", "2015-12-28", "2015-12-28", "2016-01-04…
## $ week_end_dt <chr> "2016-01-03", "2016-01-03", "2016-01-03", "2016-01-10…
## $ vietnam_season <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ p_color <chr> "green", "red", "yellow", "yellow", "red", "green", "…
## $ price <dbl> 6.599075, 7.175335, 7.300575, 7.379675, 7.175335, 6.5…
## $ total_volume <dbl> 1596040, 1596040, 1596040, 2295578, 2295578, 2295578,…
## $ brazil <dbl> 10793.5, 10793.5, 10793.5, 5677.8, 5677.8, 5677.8, 26…
## $ india <dbl> 65658.1, 65658.1, 65658.1, 15274.4, 15274.4, 15274.4,…
## $ vietnam <dbl> 1519589, 1519589, 1519589, 2274626, 2274626, 2274626,…
## $ indonesia <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ china <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ brazil_season <chr> "False", "False", "False", "False", "False", "False",…
## $ indonesia_season <chr> "False", "False", "False", "False", "False", "False",…
## $ india_season <chr> "False", "False", "False", "False", "False", "False",…
## $ china_season <chr> "False", "False", "False", "False", "False", "False",…
## $ jordan_max_price <dbl> 6.625, 7.525, 7.425, 7.525, 7.625, 6.625, 7.525, 7.62…
## $ jordan_min_price <dbl> 6.325, 7.125, 7.025, 7.025, 7.125, 6.325, 7.125, 7.32…
## $ demand <dbl> 0.50, 0.51, 16.55, 271.16, 42.33, 0.58, 712.22, 28.52…
## $ supply <dbl> 0.1650, 0.1683, 5.4615, 89.4828, 13.9689, 0.1914, 235…
- Converting (week_start_dt, week_end_dt) to dates, and converting (vietnam_season, p_color, brazil_season, indonesia_season, india_season, china_season) to factors
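The conversion step described above could be sketched in base R as follows. This is only an illustration under stated assumptions: the `raw` data frame is a hypothetical three-row sample in the shape of the original data (with only one of the season flags included), and the report itself may have used other functions for this step.

```r
# Hypothetical three-row sample shaped like the raw data (not the real file)
raw <- data.frame(
  week_start_dt  = c("2015-12-28", "2015-12-28", "2016-01-04"),
  week_end_dt    = c("2016-01-03", "2016-01-03", "2016-01-10"),
  vietnam_season = c(2L, 2L, 2L),
  p_color        = c("green", "red", "yellow"),
  brazil_season  = c("False", "False", "True"),
  stringsAsFactors = FALSE
)

# Columns to convert (a subset of the columns listed in the report)
date_cols   <- c("week_start_dt", "week_end_dt")
factor_cols <- c("vietnam_season", "p_color", "brazil_season")

df <- raw
df[date_cols]   <- lapply(df[date_cols], as.Date, format = "%Y-%m-%d")
df[factor_cols] <- lapply(df[factor_cols], factor)

str(df)
```

After this step the date columns support date arithmetic and the grouping columns are treated as categories rather than as free text or integers.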
Data types and some details using skim function
Data summary
| Name | Piped data |
| Number of rows | 1215 |
| Number of columns | 19 |
| Column type frequency: | |
| Date | 2 |
| factor | 6 |
| numeric | 11 |
| Group variables | None |
Variable type: Date
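| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| week_start_dt | 0 | 1 | 2015-12-28 | 2023-09-25 | 2019-11-11 | 405 |
| week_end_dt | 0 | 1 | 2016-01-03 | 2023-10-01 | 2019-11-17 | 405 |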
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| vietnam_season | 0 | 1 | FALSE | 3 | 2: 486, 1: 414, 3: 315 |
| p_color | 0 | 1 | FALSE | 3 | gre: 405, red: 405, yel: 405 |
| brazil_season | 0 | 1 | FALSE | 2 | Fal: 690, Tru: 525 |
| indonesia_season | 0 | 1 | FALSE | 2 | Fal: 903, Tru: 312 |
| india_season | 0 | 1 | FALSE | 2 | Fal: 912, Tru: 303 |
| china_season | 0 | 1 | FALSE | 2 | Fal: 801, Tru: 414 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| price | 0 | 1 | 8.01 | 1.37 | 6.30 | 7.00 | 7.60 | 8.60 | 14.08 |
| total_volume | 0 | 1 | 2244701.29 | 482013.00 | 325471.00 | 1938245.70 | 2254902.30 | 2541122.80 | 3555978.80 |
| brazil | 0 | 1 | 245303.03 | 238532.96 | 0.00 | 10608.60 | 175493.10 | 443550.60 | 877647.80 |
| india | 0 | 1 | 28744.10 | 55904.81 | 0.00 | 0.00 | 0.00 | 31022.60 | 348997.40 |
| vietnam | 0 | 1 | 1806529.53 | 573830.71 | 220121.50 | 1402233.80 | 1785218.90 | 2204094.80 | 3549175.00 |
| indonesia | 0 | 1 | 154948.08 | 243242.28 | 0.00 | 0.00 | 2150.30 | 278469.20 | 1061050.80 |
| china | 0 | 1 | 8487.64 | 21366.04 | 0.00 | 0.00 | 0.00 | 7257.40 | 210920.20 |
| jordan_max_price | 0 | 1 | 8.51 | 1.49 | 6.22 | 7.42 | 8.03 | 9.22 | 14.22 |
| jordan_min_price | 0 | 1 | 7.93 | 1.37 | 6.03 | 6.92 | 7.53 | 8.53 | 13.72 |
| demand | 0 | 1 | 229.51 | 225.82 | 0.50 | 75.82 | 117.73 | 391.72 | 1342.04 |
| supply | 0 | 1 | 94.53 | 113.63 | 0.16 | 18.06 | 50.44 | 118.67 | 616.65 |

Define categorical variables
## [1] "vietnam_season" "p_color" "brazil_season" "indonesia_season"
## [5] "india_season" "china_season"
Define numerical variables
## [1] "price" "total_volume" "brazil" "india"
## [5] "vietnam" "indonesia" "china" "jordan_max_price"
## [9] "jordan_min_price" "demand" "supply"
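One way to derive the two name vectors shown above is to test each column's type after the factor conversion. The sketch below uses base R and a hypothetical four-column `df`; the names `cat_vars` and `num_vars` are illustrative and not taken from the original code.

```r
# Hypothetical miniature data frame with two factors and two numeric columns
df <- data.frame(
  p_color       = factor(c("green", "red", "yellow")),
  brazil_season = factor(c("False", "True", "False")),
  price         = c(6.6, 7.2, 7.3),
  demand        = c(0.50, 0.51, 16.55)
)

# Names of the factor (categorical) columns
cat_vars <- names(df)[sapply(df, is.factor)]

# Names of the numeric columns (factors are not numeric, so they are excluded)
num_vars <- names(df)[sapply(df, is.numeric)]

print(cat_vars)
print(num_vars)
```

Keeping the two name vectors separate makes it easy to pass `df[cat_vars]` or `df[num_vars]` to summary and plotting functions later on.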
Categorical variables EDA
Categorical variables distribution
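| vietnam_season | p_color | brazil_season | indonesia_season | india_season | china_season |
|---|---|---|---|---|---|
| 1:414 | green :405 | False:690 | False:903 | False:912 | False:801 |
| 2:486 | red :405 | True :525 | True :312 | True :303 | True :414 |
| 3:315 | yellow:405 | NA | NA | NA | NA |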
Categorical variables Visualization


Numerical variables EDA
Numerical variables distribution
| Statistic | price | total_volume | brazil | india | vietnam | indonesia | china | jordan_max_price | jordan_min_price | demand | supply |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. | 6.300 | 325471 | 0 | 0 | 220122 | 0 | 0 | 6.225 | 6.025 | 0.50 | 0.165 |
| 1st Qu. | 7.000 | 1938246 | 10609 | 0 | 1402234 | 0 | 0 | 7.425 | 6.925 | 75.82 | 18.057 |
| Median | 7.600 | 2254902 | 175493 | 0 | 1785219 | 2150 | 0 | 8.025 | 7.525 | 117.73 | 50.440 |
| Mean | 8.009 | 2244701 | 245303 | 28744 | 1806530 | 154948 | 8488 | 8.509 | 7.927 | 229.51 | 94.526 |
| 3rd Qu. | 8.601 | 2541123 | 443551 | 31023 | 2204095 | 278469 | 7257 | 9.225 | 8.525 | 391.73 | 118.673 |
| Max. | 14.085 | 3555979 | 877648 | 348997 | 3549175 | 1061051 | 210920 | 14.225 | 13.725 | 1342.04 | 616.654 |
Numerical variables Visualization




Visualizing The Relation between The numerical variables :

Visualizing The Relation between The Price variable and other numerical variables :

We can see from the above charts that the relationships between the variables are not linear, except for the relationship between the supply and demand variables.
The correlation between The numerical variables :
- As the data are not normally distributed, we will use Spearman's rank correlation
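To illustrate why a rank-based coefficient suits non-linear but monotone relationships, here is a small base R sketch. The vectors `x`, `y` and `z` are made-up toy values, not columns of the pepper data set.

```r
# Toy data: y grows monotonically with x, but not linearly
x <- 1:10
y <- x^3
z <- c(3, 1, 4, 1.5, 5, 9, 2, 6, 5.5, 3.5)

# Pearson measures linear association; Spearman works on ranks,
# so it equals 1 for any strictly increasing relationship
pearson_xy  <- cor(x, y, method = "pearson")
spearman_xy <- cor(x, y, method = "spearman")

# Passing a data frame returns the full correlation matrix
cor_mat <- cor(data.frame(x, y, z), method = "spearman")

print(c(pearson = pearson_xy, spearman = spearman_xy))
print(round(cor_mat, 3))
```

The same `cor(..., method = "spearman")` call applied to the numeric columns of a data frame yields a symmetric matrix with ones on the diagonal.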
| | price | total_volume | brazil | india | vietnam | indonesia | china | jordan_max_price | jordan_min_price | demand | supply |
|---|---|---|---|---|---|---|---|---|---|---|---|
| price | 1.0000000 | -0.1122322 | 0.2782183 | -0.0264911 | -0.3517192 | 0.3648408 | 0.0641676 | 0.9720657 | 0.9714315 | 0.4298274 | 0.3163057 |
| total_volume | -0.1122322 | 1.0000000 | 0.1855243 | -0.3535011 | 0.6964246 | 0.0542286 | 0.3501470 | -0.1220329 | -0.0969187 | 0.1125441 | 0.2829057 |
| brazil | 0.2782183 | 0.1855243 | 1.0000000 | -0.6312192 | -0.3813353 | 0.5656901 | 0.0864702 | 0.2423081 | 0.2533731 | 0.0296134 | 0.0247692 |
| india | -0.0264911 | -0.3535011 | -0.6312192 | 1.0000000 | 0.0638375 | -0.4172503 | -0.2993819 | 0.0003487 | -0.0140451 | -0.0934378 | -0.1787898 |
| vietnam | -0.3517192 | 0.6964246 | -0.3813353 | 0.0638375 | 1.0000000 | -0.5616759 | 0.0860247 | -0.3545445 | -0.3258363 | 0.1054322 | 0.2330524 |
| indonesia | 0.3648408 | 0.0542286 | 0.5656901 | -0.4172503 | -0.5616759 | 1.0000000 | 0.2968556 | 0.3586799 | 0.3509917 | -0.0014067 | 0.0182960 |
| china | 0.0641676 | 0.3501470 | 0.0864702 | -0.2993819 | 0.0860247 | 0.2968556 | 1.0000000 | 0.0652780 | 0.0698367 | 0.0123709 | 0.3439685 |
| jordan_max_price | 0.9720657 | -0.1220329 | 0.2423081 | 0.0003487 | -0.3545445 | 0.3586799 | 0.0652780 | 1.0000000 | 0.9811579 | 0.4037138 | 0.2897715 |
| jordan_min_price | 0.9714315 | -0.0969187 | 0.2533731 | -0.0140451 | -0.3258363 | 0.3509917 | 0.0698367 | 0.9811579 | 1.0000000 | 0.4340305 | 0.3253103 |
| demand | 0.4298274 | 0.1125441 | 0.0296134 | -0.0934378 | 0.1054322 | -0.0014067 | 0.0123709 | 0.4037138 | 0.4340305 | 1.0000000 | 0.7224376 |
| supply | 0.3163057 | 0.2829057 | 0.0247692 | -0.1787898 | 0.2330524 | 0.0182960 | 0.3439685 | 0.2897715 | 0.3253103 | 0.7224376 | 1.0000000 |
- Visualizing The correlation

E. inferential analysis
- As our target is to understand the price variable for each pepper color (p_color), we will focus only on these two variables
E.1: Price by pepper color (p_color)
- E.1.1 : Price by pepper color - Basic description
| p_color | n | mean | sd | min | Q1 | median | Q3 | max |
|---|---|---|---|---|---|---|---|---|
| green | 405 | 7.215042 | 0.8224482 | 6.299625 | 6.699645 | 6.900000 | 7.539580 | 12.78481 |
| red | 405 | 7.882143 | 1.1570314 | 6.449990 | 7.100440 | 7.529990 | 8.350220 | 13.68538 |
| yellow | 405 | 8.929220 | 1.4515797 | 7.031635 | 7.799655 | 8.600075 | 9.899575 | 14.08459 |
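A per-group description like the one above can be produced in several ways. The following base R sketch is one possibility, using a hypothetical 12-row data frame and an illustrative helper named `describe`; it is not the code that generated the table in this report.

```r
# Hypothetical prices, four observations per color
df <- data.frame(
  p_color = factor(rep(c("green", "red", "yellow"), each = 4)),
  price   = c(6.3, 6.7, 6.9, 7.5,
              6.4, 7.1, 7.5, 8.4,
              7.0, 7.8, 8.6, 9.9)
)

# Illustrative helper: count, mean, sd and five-number summary of a vector
describe <- function(x) {
  c(n      = length(x),
    mean   = mean(x),
    sd     = sd(x),
    min    = min(x),
    q1     = quantile(x, 0.25, names = FALSE),
    median = median(x),
    q3     = quantile(x, 0.75, names = FALSE),
    max    = max(x))
}

# Split price by color, describe each group, one row per color
by_color <- t(sapply(split(df$price, df$p_color), describe))
print(round(by_color, 3))
```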

- E.1.2 : Applying kruskal.test (suitable for a grouping variable with more than two groups) to find out whether there are significant differences in price distribution among the p_color groups :
##
## Kruskal-Wallis rank sum test
##
## data: price by p_color
## Kruskal-Wallis chi-squared = 403.28, df = 2, p-value < 2.2e-16
As the p-value is less than 0.05, we reject the null hypothesis, indicating that there are significant differences in price distribution among the p_color groups.
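For readers who want to reproduce this kind of test, a minimal base R sketch follows. It runs `kruskal.test` on hypothetical toy prices (five per color, deliberately well separated), so the statistic differs from the one reported above.

```r
# Hypothetical prices: three colors, five observations each, no ties
df <- data.frame(
  p_color = factor(rep(c("green", "red", "yellow"), each = 5)),
  price   = c(6.3, 6.5, 6.7, 6.9, 7.1,
              7.2, 7.4, 7.6, 7.8, 8.0,
              8.2, 8.5, 8.8, 9.1, 9.4)
)

# Kruskal-Wallis rank sum test of price across the color groups
kt <- kruskal.test(price ~ p_color, data = df)
print(kt)

# Decision at the 5% level: reject H0 (same distribution in all groups)?
reject_h0 <- kt$p.value < 0.05
print(reject_h0)
```

The degrees of freedom equal the number of groups minus one, which is why the report's output shows df = 2 for three colors.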
- E.1.3 :Visualizing The Statistical test

- E.1.4 : To examine these differences in detail, we will apply pairwise.wilcox.test
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: df$price and df$p_color
##
##        green  red
## red    <2e-16 -
## yellow <2e-16 <2e-16
##
## P value adjustment method: bonferroni
As we can see, there is a significant difference in price distribution for every pair of groups, as each adjusted p-value is less than 0.05
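The pairwise follow-up can be sketched in base R as below, again on hypothetical toy prices rather than the real data. With such a small sample and no ties, R computes exact p-values, whereas the output above comes from the normal approximation with continuity correction.

```r
# Hypothetical prices: three colors, five observations each, no ties
df <- data.frame(
  p_color = factor(rep(c("green", "red", "yellow"), each = 5)),
  price   = c(6.3, 6.5, 6.7, 6.9, 7.1,
              7.2, 7.4, 7.6, 7.8, 8.0,
              8.2, 8.5, 8.8, 9.1, 9.4)
)

# Pairwise Wilcoxon rank sum tests with Bonferroni-adjusted p-values
pw <- pairwise.wilcox.test(df$price, df$p_color,
                           p.adjust.method = "bonferroni")
print(pw)

# Lower-triangular matrix of adjusted p-values (unused cells are NA)
all_pairs_differ <- all(pw$p.value < 0.05, na.rm = TRUE)
print(all_pairs_differ)
```

The Bonferroni adjustment multiplies each raw p-value by the number of comparisons (three here), capped at 1, which keeps the family-wise error rate at or below the chosen level.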
Final Conclusions:
- 1. Price distributions differ significantly between bell pepper colors (green, red, yellow) in the data set, with median prices rising from green (6.90) to red (7.53) to yellow (8.60).



## * correlation type : generic
## * variable type : numeric
## * correlation method : spearman
##
## * Matrix of Correlation
## price total_volume brazil india vietnam
## price 1.00000000 -0.11773512 0.39102760 -0.02911878 -0.42545151
## total_volume -0.11773512 1.00000000 0.18552426 -0.35350112 0.69642455
## brazil 0.39102760 0.18552426 1.00000000 -0.63121918 -0.38133535
## india -0.02911878 -0.35350112 -0.63121918 1.00000000 0.06383747
## vietnam -0.42545151 0.69642455 -0.38133535 0.06383747 1.00000000
## indonesia 0.44305817 0.05422860 0.56569009 -0.41725026 -0.56167595
## china 0.04146590 0.35014697 0.08647015 -0.29938190 0.08602471
## jordan_max_price 0.95870207 -0.13044284 0.36527989 -0.02090337 -0.43071881
## jordan_min_price 0.95948166 -0.08864000 0.37605574 -0.02631894 -0.38687813
## demand 0.23520333 0.09411858 0.13588593 -0.10557995 0.05014064
## supply -0.05711386 0.31601899 0.04627478 -0.24572925 0.24080334
## indonesia china jordan_max_price jordan_min_price
## price 0.44305817 0.04146590 0.95870207 0.95948166
## total_volume 0.05422860 0.35014697 -0.13044284 -0.08864000
## brazil 0.56569009 0.08647015 0.36527989 0.37605574
## india -0.41725026 -0.29938190 -0.02090337 -0.02631894
## vietnam -0.56167595 0.08602471 -0.43071881 -0.38687813
## indonesia 1.00000000 0.29685559 0.43969053 0.43366220
## china 0.29685559 1.00000000 0.03600303 0.03418858
## jordan_max_price 0.43969053 0.03600303 1.00000000 0.97063336
## jordan_min_price 0.43366220 0.03418858 0.97063336 1.00000000
## demand 0.09461007 -0.02611732 0.25008227 0.26102531
## supply 0.06695820 0.49385444 -0.03733959 -0.02360919
## demand supply
## price 0.23520333 -0.05711386
## total_volume 0.09411858 0.31601899
## brazil 0.13588593 0.04627478
## india -0.10557995 -0.24572925
## vietnam 0.05014064 0.24080334
## indonesia 0.09461007 0.06695820
## china -0.02611732 0.49385444
## jordan_max_price 0.25008227 -0.03733959
## jordan_min_price 0.26102531 -0.02360919
## demand 1.00000000 0.25795877
## supply 0.25795877 1.00000000