Pepper Pirate Paradise Ltd. Analysis

Date : 23/03/2024

Author : Omar Soub


Introduction


Data Set and Data types view


Data set (first six rows):

week_start_dt week_end_dt vietnam_season p_color price total_volume brazil india vietnam indonesia china brazil_season indonesia_season india_season china_season jordan_max_price jordan_min_price demand supply
2015-12-28 2016-01-03 2 green 6.599075 1596040 10793.5 65658.1 1519589 0 0 False False False False 6.625 6.325 0.50 0.1650
2015-12-28 2016-01-03 2 red 7.175335 1596040 10793.5 65658.1 1519589 0 0 False False False False 7.525 7.125 0.51 0.1683
2015-12-28 2016-01-03 2 yellow 7.300575 1596040 10793.5 65658.1 1519589 0 0 False False False False 7.425 7.025 16.55 5.4615
2016-01-04 2016-01-10 2 yellow 7.379675 2295578 5677.8 15274.4 2274626 0 0 False False False False 7.525 7.025 271.16 89.4828
2016-01-04 2016-01-10 2 red 7.175335 2295578 5677.8 15274.4 2274626 0 0 False False False False 7.625 7.125 42.33 13.9689
2016-01-04 2016-01-10 2 green 6.599075 2295578 5677.8 15274.4 2274626 0 0 False False False False 6.625 6.325 0.58 0.1914

Data set dimensions (rows, columns)

## [1] 1215   19

Data types and some details using the glimpse function

## Rows: 1,215
## Columns: 19
## $ week_start_dt    <chr> "2015-12-28", "2015-12-28", "2015-12-28", "2016-01-04…
## $ week_end_dt      <chr> "2016-01-03", "2016-01-03", "2016-01-03", "2016-01-10…
## $ vietnam_season   <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ p_color          <chr> "green", "red", "yellow", "yellow", "red", "green", "…
## $ price            <dbl> 6.599075, 7.175335, 7.300575, 7.379675, 7.175335, 6.5…
## $ total_volume     <dbl> 1596040, 1596040, 1596040, 2295578, 2295578, 2295578,…
## $ brazil           <dbl> 10793.5, 10793.5, 10793.5, 5677.8, 5677.8, 5677.8, 26…
## $ india            <dbl> 65658.1, 65658.1, 65658.1, 15274.4, 15274.4, 15274.4,…
## $ vietnam          <dbl> 1519589, 1519589, 1519589, 2274626, 2274626, 2274626,…
## $ indonesia        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ china            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ brazil_season    <chr> "False", "False", "False", "False", "False", "False",…
## $ indonesia_season <chr> "False", "False", "False", "False", "False", "False",…
## $ india_season     <chr> "False", "False", "False", "False", "False", "False",…
## $ china_season     <chr> "False", "False", "False", "False", "False", "False",…
## $ jordan_max_price <dbl> 6.625, 7.525, 7.425, 7.525, 7.625, 6.625, 7.525, 7.62…
## $ jordan_min_price <dbl> 6.325, 7.125, 7.025, 7.025, 7.125, 6.325, 7.125, 7.32…
## $ demand           <dbl> 0.50, 0.51, 16.55, 271.16, 42.33, 0.58, 712.22, 28.52…
## $ supply           <dbl> 0.1650, 0.1683, 5.4615, 89.4828, 13.9689, 0.1914, 235…
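The glimpse output stores the two date columns and the season flags as character, and vietnam_season as integer. A minimal base R sketch of converting such columns to Date and factor follows. It assumes a small hand-typed sample: `pepper_sample` is a hypothetical name, and only a few of the 19 columns are reproduced.

```r
# Illustrative sample typed from the first rows shown above.
# `pepper_sample` is a hypothetical name, not the report's own object.
pepper_sample <- data.frame(
  week_start_dt  = c("2015-12-28", "2015-12-28", "2016-01-04"),
  week_end_dt    = c("2016-01-03", "2016-01-03", "2016-01-10"),
  vietnam_season = c(2L, 2L, 2L),
  p_color        = c("green", "red", "yellow"),
  brazil_season  = c("False", "False", "False"),
  price          = c(6.599075, 7.175335, 7.379675),
  stringsAsFactors = FALSE
)

# Parse ISO dates (yyyy-mm-dd) into Date objects.
date_cols <- c("week_start_dt", "week_end_dt")
pepper_sample[date_cols] <- lapply(pepper_sample[date_cols], as.Date,
                                   format = "%Y-%m-%d")

# Treat the season code, the color and the season flag as categorical.
factor_cols <- c("vietnam_season", "p_color", "brazil_season")
pepper_sample[factor_cols] <- lapply(pepper_sample[factor_cols], factor)

# Inspect the resulting column types.
str(pepper_sample)
```

After a conversion like this, a type-aware summary reports the date columns as Date and the season and color columns as factor.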

Data types and some details using the skim function (date columns converted to Date; season and color columns converted to factor)

Data summary
Name Piped data
Number of rows 1215
Number of columns 19
_______________________
Column type frequency:
Date 2
factor 6
numeric 11
________________________
Group variables None

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
week_start_dt 0 1 2015-12-28 2023-09-25 2019-11-11 405
week_end_dt 0 1 2016-01-03 2023-10-01 2019-11-17 405

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
vietnam_season 0 1 FALSE 3 2: 486, 1: 414, 3: 315
p_color 0 1 FALSE 3 gre: 405, red: 405, yel: 405
brazil_season 0 1 FALSE 2 Fal: 690, Tru: 525
indonesia_season 0 1 FALSE 2 Fal: 903, Tru: 312
india_season 0 1 FALSE 2 Fal: 912, Tru: 303
china_season 0 1 FALSE 2 Fal: 801, Tru: 414

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
price 0 1 8.01 1.37 6.30 7.00 7.60 8.60 14.08 ▇▃▂▁▁
total_volume 0 1 2244701.29 482013.00 325471.00 1938245.70 2254902.30 2541122.80 3555978.80 ▁▂▇▇▂
brazil 0 1 245303.03 238532.96 0.00 10608.60 175493.10 443550.60 877647.80 ▇▂▃▂▁
india 0 1 28744.10 55904.81 0.00 0.00 0.00 31022.60 348997.40 ▇▁▁▁▁
vietnam 0 1 1806529.53 573830.71 220121.50 1402233.80 1785218.90 2204094.80 3549175.00 ▁▆▇▅▁
indonesia 0 1 154948.08 243242.28 0.00 0.00 2150.30 278469.20 1061050.80 ▇▂▁▁▁
china 0 1 8487.64 21366.04 0.00 0.00 0.00 7257.40 210920.20 ▇▁▁▁▁
jordan_max_price 0 1 8.51 1.49 6.22 7.42 8.03 9.22 14.22 ▇▇▃▁▁
jordan_min_price 0 1 7.93 1.37 6.03 6.92 7.53 8.53 13.72 ▇▅▂▁▁
demand 0 1 229.51 225.82 0.50 75.82 117.73 391.72 1342.04 ▇▂▁▁▁
supply 0 1 94.53 113.63 0.16 18.06 50.44 118.67 616.65 ▇▁▁▁▁


Define categorical variables

## [1] "vietnam_season"   "p_color"          "brazil_season"    "indonesia_season"
## [5] "india_season"     "china_season"

Define numerical variables

##  [1] "price"            "total_volume"     "brazil"           "india"           
##  [5] "vietnam"          "indonesia"        "china"            "jordan_max_price"
##  [9] "jordan_min_price" "demand"           "supply"
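One way to build the two name lists above is to test each column's type. The sketch below uses base R on a hypothetical two-row stand-in (`demo`), so the object and its contents are assumptions for illustration only.

```r
# Hypothetical stand-in with one Date, one factor and two numeric columns.
demo <- data.frame(
  week_start_dt = as.Date(c("2015-12-28", "2016-01-04")),
  p_color       = factor(c("green", "red")),
  price         = c(6.599075, 7.175335),
  demand        = c(0.50, 42.33)
)

# Names of the factor columns and of the numeric columns.
# Date columns are neither factor nor numeric, so they fall in no list.
categorical_vars <- names(demo)[sapply(demo, is.factor)]
numerical_vars   <- names(demo)[sapply(demo, is.numeric)]

print(categorical_vars)
print(numerical_vars)
```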

Categorical variables EDA

Categorical variables distribution

vietnam_season p_color brazil_season indonesia_season india_season china_season
1:414 green :405 False:690 False:903 False:912 False:801
2:486 red :405 True :525 True :312 True :303 True :414
3:315 yellow:405 NA NA NA NA

Categorical variables Visualization



Numerical variables EDA

Numerical variables distribution

price total_volume brazil india vietnam indonesia china jordan_max_price jordan_min_price demand supply
Min. : 6.300 Min. : 325471 Min. : 0 Min. : 0 Min. : 220122 Min. : 0 Min. : 0 Min. : 6.225 Min. : 6.025 Min. : 0.50 Min. : 0.165
1st Qu.: 7.000 1st Qu.:1938246 1st Qu.: 10609 1st Qu.: 0 1st Qu.:1402234 1st Qu.: 0 1st Qu.: 0 1st Qu.: 7.425 1st Qu.: 6.925 1st Qu.: 75.82 1st Qu.: 18.057
Median : 7.600 Median :2254902 Median :175493 Median : 0 Median :1785219 Median : 2150 Median : 0 Median : 8.025 Median : 7.525 Median : 117.73 Median : 50.440
Mean : 8.009 Mean :2244701 Mean :245303 Mean : 28744 Mean :1806530 Mean : 154948 Mean : 8488 Mean : 8.509 Mean : 7.927 Mean : 229.51 Mean : 94.526
3rd Qu.: 8.601 3rd Qu.:2541123 3rd Qu.:443551 3rd Qu.: 31023 3rd Qu.:2204095 3rd Qu.: 278469 3rd Qu.: 7257 3rd Qu.: 9.225 3rd Qu.: 8.525 3rd Qu.: 391.73 3rd Qu.:118.673
Max. :14.085 Max. :3555979 Max. :877648 Max. :348997 Max. :3549175 Max. :1061051 Max. :210920 Max. :14.225 Max. :13.725 Max. :1342.04 Max. :616.654

Numerical variables Visualization


Visualizing the relationships between the numerical variables:



Visualizing the relationship between the price variable and the other numerical variables:

The charts above show that the relationships between the variables are not linear, except for the relationship between supply and demand.


The correlation between the numerical variables:

price total_volume brazil india vietnam indonesia china jordan_max_price jordan_min_price demand supply
price 1.0000000 -0.1122322 0.2782183 -0.0264911 -0.3517192 0.3648408 0.0641676 0.9720657 0.9714315 0.4298274 0.3163057
total_volume -0.1122322 1.0000000 0.1855243 -0.3535011 0.6964246 0.0542286 0.3501470 -0.1220329 -0.0969187 0.1125441 0.2829057
brazil 0.2782183 0.1855243 1.0000000 -0.6312192 -0.3813353 0.5656901 0.0864702 0.2423081 0.2533731 0.0296134 0.0247692
india -0.0264911 -0.3535011 -0.6312192 1.0000000 0.0638375 -0.4172503 -0.2993819 0.0003487 -0.0140451 -0.0934378 -0.1787898
vietnam -0.3517192 0.6964246 -0.3813353 0.0638375 1.0000000 -0.5616759 0.0860247 -0.3545445 -0.3258363 0.1054322 0.2330524
indonesia 0.3648408 0.0542286 0.5656901 -0.4172503 -0.5616759 1.0000000 0.2968556 0.3586799 0.3509917 -0.0014067 0.0182960
china 0.0641676 0.3501470 0.0864702 -0.2993819 0.0860247 0.2968556 1.0000000 0.0652780 0.0698367 0.0123709 0.3439685
jordan_max_price 0.9720657 -0.1220329 0.2423081 0.0003487 -0.3545445 0.3586799 0.0652780 1.0000000 0.9811579 0.4037138 0.2897715
jordan_min_price 0.9714315 -0.0969187 0.2533731 -0.0140451 -0.3258363 0.3509917 0.0698367 0.9811579 1.0000000 0.4340305 0.3253103
demand 0.4298274 0.1125441 0.0296134 -0.0934378 0.1054322 -0.0014067 0.0123709 0.4037138 0.4340305 1.0000000 0.7224376
supply 0.3163057 0.2829057 0.0247692 -0.1787898 0.2330524 0.0182960 0.3439685 0.2897715 0.3253103 0.7224376 1.0000000
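A correlation matrix of this shape can be computed with `cor()` from base R. The sketch below is illustrative only: it uses four columns typed from the six rows shown at the top of the report (`six_rows` is a hypothetical name), and it uses the default Pearson method as an assumption, since the report does not state which method produced the matrix above.

```r
# Four numeric columns copied from the six displayed rows.
six_rows <- data.frame(
  price            = c(6.599075, 7.175335, 7.300575, 7.379675, 7.175335, 6.599075),
  jordan_max_price = c(6.625, 7.525, 7.425, 7.525, 7.625, 6.625),
  demand           = c(0.50, 0.51, 16.55, 271.16, 42.33, 0.58),
  supply           = c(0.1650, 0.1683, 5.4615, 89.4828, 13.9689, 0.1914)
)

# cor() on a data frame returns the full pairwise matrix.
# The default method is "pearson" (assumed here, not confirmed by the report).
cor_matrix <- cor(six_rows)

print(round(cor_matrix, 3))
```

Six rows are far too few to reproduce the reported coefficients; the point is only the shape of the call and of the result.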


E. Inferential analysis

E.1: Price by pepper color (p_color)

p_color count mean sd min Q1 median Q3 max
green 405 7.215042 0.8224482 6.299625 6.699645 6.900000 7.539580 12.78481
red 405 7.882143 1.1570314 6.449990 7.100440 7.529990 8.350220 13.68538
yellow 405 8.929220 1.4515797 7.031635 7.799655 8.600075 9.899575 14.08459
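A per-color summary with the same columns as the table above can be sketched in base R by splitting price by color and applying one summary function per group. The data frame `prices` and the helper `summarise_price` are hypothetical names, and the six values are copied from the rows displayed at the top of the report, so the numbers will not match the full-data table.

```r
# Six prices and colors copied from the displayed rows.
prices <- data.frame(
  p_color = factor(c("green", "red", "yellow", "yellow", "red", "green")),
  price   = c(6.599075, 7.175335, 7.300575, 7.379675, 7.175335, 6.599075)
)

# Hypothetical helper returning the statistics used in the table.
summarise_price <- function(x) {
  c(count  = length(x),
    mean   = mean(x),
    sd     = sd(x),
    min    = min(x),
    Q1     = quantile(x, 0.25, names = FALSE),
    median = median(x),
    Q3     = quantile(x, 0.75, names = FALSE),
    max    = max(x))
}

# One row per color, one column per statistic.
by_color <- t(sapply(split(prices$price, prices$p_color), summarise_price))

print(by_color)
```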

## 
##  Kruskal-Wallis rank sum test
## 
## data:  price by p_color
## Kruskal-Wallis chi-squared = 403.28, df = 2, p-value < 2.2e-16

Because the p-value is less than 0.05, we reject the null hypothesis that the price distribution is the same in all p_color groups: at least one group differs significantly from the others.
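The test and the decision rule above can be sketched with `kruskal.test()` from base R. The data here are simulated, not the report's: `sim` is a hypothetical data frame with 405 normally distributed prices per color, using rounded group means and standard deviations from the summary table as assumptions.

```r
set.seed(1)  # make the simulated sample reproducible

# Simulated stand-in for the real data (normality is an assumption;
# the real prices need not be normal).
sim <- data.frame(
  p_color = factor(rep(c("green", "red", "yellow"), each = 405)),
  price   = c(rnorm(405, mean = 7.22, sd = 0.82),
              rnorm(405, mean = 7.88, sd = 1.16),
              rnorm(405, mean = 8.93, sd = 1.45))
)

# Kruskal-Wallis rank sum test of price across the three color groups.
kw <- kruskal.test(price ~ p_color, data = sim)
print(kw)

# Decision at the 5% level: TRUE means the null hypothesis is rejected.
reject_null <- kw$p.value < 0.05
print(reject_null)
```

A rank-based test is a natural choice when the group distributions are skewed, because it does not rely on the normality assumption of a one-way ANOVA.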

## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  df$price and df$p_color 
## 
##        green  red   
## red    <2e-16 -     
## yellow <2e-16 <2e-16
## 
## P value adjustment method: bonferroni

All three pairwise comparisons have Bonferroni-adjusted p-values below 0.05, so the price distribution differs significantly between every pair of p_color groups.
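The pairwise follow-up can be sketched with `pairwise.wilcox.test()` from base R, again on simulated data (`sim` is a hypothetical stand-in built from rounded group means and standard deviations, not the report's data). The last lines show what the Bonferroni adjustment does: each raw p-value is multiplied by the number of comparisons (three here) and capped at 1.

```r
set.seed(1)  # make the simulated sample reproducible

# Simulated stand-in for the real data (an assumption for illustration).
sim <- data.frame(
  p_color = factor(rep(c("green", "red", "yellow"), each = 405)),
  price   = c(rnorm(405, mean = 7.22, sd = 0.82),
              rnorm(405, mean = 7.88, sd = 1.16),
              rnorm(405, mean = 8.93, sd = 1.45))
)

# Pairwise Wilcoxon rank sum tests with Bonferroni-adjusted p-values.
pw <- pairwise.wilcox.test(sim$price, sim$p_color,
                           p.adjust.method = "bonferroni")
print(pw)

# The same adjustment done by hand on the unadjusted p-values.
raw <- pairwise.wilcox.test(sim$price, sim$p_color,
                            p.adjust.method = "none")$p.value
manual_bonferroni <- pmin(raw * 3, 1)
print(manual_bonferroni)
```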


Final Conclusions:


## * correlation type   : generic 
## * variable type      : numeric 
## * correlation method : spearman 
## 
## * Matrix of Correlation
##                        price total_volume      brazil       india     vietnam
## price             1.00000000  -0.11773512  0.39102760 -0.02911878 -0.42545151
## total_volume     -0.11773512   1.00000000  0.18552426 -0.35350112  0.69642455
## brazil            0.39102760   0.18552426  1.00000000 -0.63121918 -0.38133535
## india            -0.02911878  -0.35350112 -0.63121918  1.00000000  0.06383747
## vietnam          -0.42545151   0.69642455 -0.38133535  0.06383747  1.00000000
## indonesia         0.44305817   0.05422860  0.56569009 -0.41725026 -0.56167595
## china             0.04146590   0.35014697  0.08647015 -0.29938190  0.08602471
## jordan_max_price  0.95870207  -0.13044284  0.36527989 -0.02090337 -0.43071881
## jordan_min_price  0.95948166  -0.08864000  0.37605574 -0.02631894 -0.38687813
## demand            0.23520333   0.09411858  0.13588593 -0.10557995  0.05014064
## supply           -0.05711386   0.31601899  0.04627478 -0.24572925  0.24080334
##                    indonesia       china jordan_max_price jordan_min_price
## price             0.44305817  0.04146590       0.95870207       0.95948166
## total_volume      0.05422860  0.35014697      -0.13044284      -0.08864000
## brazil            0.56569009  0.08647015       0.36527989       0.37605574
## india            -0.41725026 -0.29938190      -0.02090337      -0.02631894
## vietnam          -0.56167595  0.08602471      -0.43071881      -0.38687813
## indonesia         1.00000000  0.29685559       0.43969053       0.43366220
## china             0.29685559  1.00000000       0.03600303       0.03418858
## jordan_max_price  0.43969053  0.03600303       1.00000000       0.97063336
## jordan_min_price  0.43366220  0.03418858       0.97063336       1.00000000
## demand            0.09461007 -0.02611732       0.25008227       0.26102531
## supply            0.06695820  0.49385444      -0.03733959      -0.02360919
##                       demand      supply
## price             0.23520333 -0.05711386
## total_volume      0.09411858  0.31601899
## brazil            0.13588593  0.04627478
## india            -0.10557995 -0.24572925
## vietnam           0.05014064  0.24080334
## indonesia         0.09461007  0.06695820
## china            -0.02611732  0.49385444
## jordan_max_price  0.25008227 -0.03733959
## jordan_min_price  0.26102531 -0.02360919
## demand            1.00000000  0.25795877
## supply            0.25795877  1.00000000
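The output above reports the Spearman method. Spearman's coefficient is the Pearson correlation computed on ranks, which is why it captures monotonic rather than strictly linear association. A minimal base R sketch follows, using two columns typed from the six rows displayed at the top of the report (`x` and `y` are hypothetical names, and six rows will not reproduce the reported values).

```r
# price and demand from the six displayed rows.
x <- c(6.599075, 7.175335, 7.300575, 7.379675, 7.175335, 6.599075)
y <- c(0.50, 0.51, 16.55, 271.16, 42.33, 0.58)

# Spearman correlation computed directly.
rho <- cor(x, y, method = "spearman")

# The same value obtained as Pearson correlation of the ranks
# (ties receive average ranks in both computations).
rho_from_ranks <- cor(rank(x), rank(y), method = "pearson")

print(c(spearman = rho, pearson_on_ranks = rho_from_ranks))
```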