Correspondence Analysis & Wine
thematic::thematic_rmd(
bg = "#FDFEFE",
fg = "#2E7874",
accent = "#067A5C",
font = font_spec("Roboto Condensed", scale = 1),
qualitative = paletteer::paletteer_d("dutchmasters::pearl_earring"),
sequential = sequential_gradient(0.5, 0.75))
Idea
This project has a practical goal: to answer the vital and everlasting question of where the best place to buy a bottle of wine is. Which store should you go to if you have specific preferences for this drink? And what will you find in the shop once you get there?
In these challenging times, it is becoming difficult to find a bottle of dry white wine from Portugal. Also, if you are planning to open your own alcohol business, then some of my conclusions may be useful to you.
I chose 3 alcohol distributors (AMwine, “Ароматный мир”; K&B, “Красное и Белое”; Winelab, “Винлаб”) based on 3 factors:
- They are the leaders in the market
- They have a satisfactory filtering system on their website
- The shops are located in St. Petersburg
I chose the wine categories and countries that interest me most and for which each store offers at least 1 option. As a result, the table contains no NAs, which makes the stores easier to compare.
Data
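library(readxl)      # read_excel()
library(kableExtra)  # kbl(), kable_styling(), %>%
wine <- read_excel("~/data1/WINE.xlsx")
# The first, unnamed column (read in as `...1`) holds the wine labels
row <- wine$...1
col <- names(wine[2:4])
# Drop the label column and turn the counts into a matrix
winemat <- wine[-1]
names(winemat) <- NULL
winemat <- as.matrix(winemat)
colnames(winemat) <- col
row.names(winemat) <- row
winemat <- as.table(winemat)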
kbl(winemat, align = "lll") %>%
kable_styling(bootstrap_options = c("striped","hover", "condensed"), font_size = 6, full_width = F)

| | K&B | Winelab | Amwine |
|---|---|---|---|
| Portugal R | 7 | 2 | 26 |
| Portugal W | 8 | 2 | 15 |
| France R | 12 | 43 | 40 |
| France W | 10 | 35 | 26 |
| Italy R | 30 | 67 | 58 |
| Italy W | 22 | 49 | 63 |
| Spain R | 29 | 48 | 83 |
| Spain W | 17 | 19 | 43 |
| Georgia R | 26 | 22 | 22 |
| Georgia W | 7 | 12 | 7 |
| Russia R | 40 | 35 | 46 |
| Russia W | 30 | 35 | 41 |
| Chile W | 8 | 13 | 10 |
| Chile R | 7 | 21 | 19 |
| South Africa W | 5 | 5 | 10 |
| South Africa R | 3 | 4 | 19 |
| Argentina W | 2 | 5 | 2 |
| Argentina R | 4 | 7 | 10 |
W stands for white wine, R for red wine.
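For readers without access to `WINE.xlsx`, the same contingency table can be rebuilt from the counts shown above. This is a minimal sketch in base R; the object names `stores`, `wines` and `counts` are my own and are not part of the original script.

```r
# Rebuild the contingency table from the printed counts (rows = wines, columns = stores)
stores <- c("K&B", "Winelab", "Amwine")
wines <- c("Portugal R", "Portugal W", "France R", "France W", "Italy R", "Italy W",
           "Spain R", "Spain W", "Georgia R", "Georgia W", "Russia R", "Russia W",
           "Chile W", "Chile R", "South Africa W", "South Africa R",
           "Argentina W", "Argentina R")
counts <- c(7, 2, 26,   8, 2, 15,   12, 43, 40,  10, 35, 26,  30, 67, 58,  22, 49, 63,
            29, 48, 83, 17, 19, 43, 26, 22, 22,  7, 12, 7,    40, 35, 46,  30, 35, 41,
            8, 13, 10,  7, 21, 19,  5, 5, 10,    3, 4, 19,    2, 5, 2,     4, 7, 10)

# byrow = TRUE because the counts are typed in row by row, one wine at a time
winemat <- matrix(counts, ncol = 3, byrow = TRUE, dimnames = list(wines, stores))
winemat <- as.table(winemat)

colSums(winemat)  # number of wines per store
sum(winemat)      # total number of wines in the table
```

The resulting `winemat` has the same shape as the object built from the Excel file, so the rest of the analysis can be run on it unchanged.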
Visualization of contingency table
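library(gplots)  # balloonplot()
balloonplot(
t(winemat),
main = "Wine in Shops",
xlab = "",
ylab = "",
label = FALSE,
show.margins = FALSE,
colmar=3,
rowmar=1,
text.size=1)
At first glance, it seems that one should not go to K&B at all: it stocks the fewest wines overall (267 in the table, against 424 at Winelab and 540 at AMwine). Let’s explore this further.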
Chi-square test
chisq.test(winemat)
##
## Pearson's Chi-squared test
##
## data: winemat
## X-squared = 95.836, df = 34, p-value = 8.453e-08
#chisq.test(winemat)$res
k <- chisq.test(winemat)$stdres
kbl(k, align = "ccc") %>%
kable_styling(bootstrap_options = c("striped","hover", "condensed"), font_size = 8, full_width = F)

| | K&B | Winelab | Amwine |
|---|---|---|---|
| Portugal R | -0.2460745 | -3.6287763 | 3.6792871 |
| Portugal W | 1.2637504 | -2.8111457 | 1.6423614 |
| France R | -2.2299898 | 2.3102122 | -0.3601715 |
| France W | -1.6017782 | 2.7130595 | -1.2676898 |
| Italy R | -0.7544155 | 2.4611247 | -1.7302022 |
| Italy W | -1.5685512 | 0.5480167 | 0.7779573 |
| Spain R | -1.1729490 | -1.2681246 | 2.1885310 |
| Spain W | -0.0380543 | -2.0095176 | 1.9559179 |
| Georgia R | 3.2303050 | -0.5466173 | -2.1594430 |
| Georgia W | 0.6544395 | 1.2700731 | -1.7597563 |
| Russia R | 3.1953219 | -1.3451606 | -1.3657038 |
| Russia W | 1.7278889 | -0.3228937 | -1.1258705 |
| Chile W | 0.5632998 | 0.8891029 | -1.3192448 |
| Chile R | -1.1527194 | 1.5060045 | -0.4847743 |
| South Africa W | 0.3621641 | -0.8960768 | 0.5572922 |
| South Africa R | -1.2694172 | -2.0670880 | 3.0337381 |
| Argentina W | 0.0389072 | 1.3377738 | -1.3133650 |
| Argentina R | -0.2963147 | -0.1079913 | 0.3495123 |
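The chi-square test rejects independence (p ≈ 8.5e-08), so the row and column variables are statistically significantly associated: each shop’s catalog leans toward particular types of wine. The standardized residuals show where. Values beyond roughly ±2 mark the cells that deviate most from what independence would predict.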
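To make the test less of a black box, the statistic, the degrees of freedom and the standardized residuals can be recomputed by hand. The sketch below uses base R only and re-enters the counts from the table so that it runs on its own; the variable names are my own. The residual formula is the adjusted one that `chisq.test()` reports as `$stdres`.

```r
# Counts from the contingency table, row by row (columns: K&B, Winelab, Amwine)
winemat <- matrix(
  c(7, 2, 26,   8, 2, 15,   12, 43, 40,  10, 35, 26,  30, 67, 58,  22, 49, 63,
    29, 48, 83, 17, 19, 43, 26, 22, 22,  7, 12, 7,    40, 35, 46,  30, 35, 41,
    8, 13, 10,  7, 21, 19,  5, 5, 10,    3, 4, 19,    2, 5, 2,     4, 7, 10),
  ncol = 3, byrow = TRUE
)

n <- sum(winemat)
row_tot <- rowSums(winemat)
col_tot <- colSums(winemat)

# Expected counts under independence: row total * column total / grand total
expected <- outer(row_tot, col_tot) / n

# Pearson statistic, degrees of freedom and p-value
chi2 <- sum((winemat - expected)^2 / expected)
df <- (nrow(winemat) - 1) * (ncol(winemat) - 1)
p_value <- pchisq(chi2, df, lower.tail = FALSE)

# Standardized (adjusted) residuals
stdres <- (winemat - expected) /
  sqrt(expected * outer(1 - row_tot / n, 1 - col_tot / n))

min(expected)  # smallest expected count, worth checking against the usual "at least 5" rule
```

Some expected counts fall below 5 (Argentina W at K&B, for example), so the asymptotic p-value is an approximation. `chisq.test(winemat, simulate.p.value = TRUE)` is one way to cross-check it.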
corrplot(t(chisq.test(winemat)$stdres), is.corr=FALSE)
Now we see that K&B does have categories in which it stands out: Georgian and Russian red wine are over-represented in its catalog (standardized residuals above 3). Winelab, while a good provider of French wines, is not the best option for my favorite dry white wine from Portugal. Interestingly, the geography of the products differs considerably between the stores’ catalogs.
CA
res.ca <- CA(winemat, graph = FALSE)
print(res.ca)
## **Results of the Correspondence Analysis (CA)**
## The row variable has 18 categories; the column variable has 3 categories
## The chi square of independence between the two variables is equal to 95.83634 (p-value = 8.453298e-08 ).
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$col" "results for the columns"
## 3 "$col$coord" "coord. for the columns"
## 4 "$col$cos2" "cos2 for the columns"
## 5 "$col$contrib" "contributions of the columns"
## 6 "$row" "results for the rows"
## 7 "$row$coord" "coord. for the rows"
## 8 "$row$cos2" "cos2 for the rows"
## 9 "$row$contrib" "contributions of the rows"
## 10 "$call" "summary called parameters"
## 11 "$call$marge.col" "weights of the columns"
## 12 "$call$marge.row" "weights of the rows"
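If the data were random, every axis would be expected to account for an equal share of the variance:
Rows: 1/17 (wine types - 1) ≈ 5.9%
Columns: 1/2 (stores - 1) = 50%
An axis is worth keeping if it explains more than the larger of these two shares, i.e. more than 50%.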
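The rule of thumb above is plain arithmetic, sketched here in base R. The percentages in `pct` are copied from the `res.ca$eig` output in this section; the variable names are my own.

```r
n_rows <- 18  # wine categories
n_cols <- 3   # stores

# Share of variance each axis would carry if the data were random, in percent
row_share <- 100 / (n_rows - 1)
col_share <- 100 / (n_cols - 1)

# An axis counts as important if it exceeds the larger of the two shares
threshold <- max(row_share, col_share)

# Percentages of variance reported by res.ca$eig
pct <- c(dim1 = 60.50428, dim2 = 39.49572)
pct > threshold
```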
res.ca$eig
## eigenvalue percentage of variance cumulative percentage of variance
## dim 1 0.04710405 60.50428 60.50428
## dim 2 0.03074837 39.49572 100.00000
fviz_screeplot(res.ca) +
geom_hline(yintercept = 50, linetype = 2, color = "#A65141FF")
According to this graph, only the first dimension (60.5% of the variance) clears the 50% line, so strictly speaking only 1 dimension should be used in the solution. However, with 3 stores there are at most 2 dimensions anyway, and we still need to look at all our stores, so we will continue with the two-dimensional solution.
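The eigenvalues can be cross-checked without FactoMineR. The sketch below is the textbook computation of CA through a singular value decomposition, written in base R with the counts re-entered so that it runs on its own. It is not a description of how `CA()` is implemented internally, and the variable names are my own.

```r
# Counts from the contingency table, row by row (columns: K&B, Winelab, Amwine)
winemat <- matrix(
  c(7, 2, 26,   8, 2, 15,   12, 43, 40,  10, 35, 26,  30, 67, 58,  22, 49, 63,
    29, 48, 83, 17, 19, 43, 26, 22, 22,  7, 12, 7,    40, 35, 46,  30, 35, 41,
    8, 13, 10,  7, 21, 19,  5, 5, 10,    3, 4, 19,    2, 5, 2,     4, 7, 10),
  ncol = 3, byrow = TRUE
)

P <- winemat / sum(winemat)  # correspondence matrix (relative frequencies)
r <- rowSums(P)              # row masses
cm <- colSums(P)             # column masses

# Standardized deviations from independence
S <- (P - outer(r, cm)) / sqrt(outer(r, cm))

# A table with I rows and J columns has at most min(I, J) - 1 dimensions
max_dims <- min(dim(winemat)) - 1

# Eigenvalues are the squared singular values of S
eig <- svd(S)$d[seq_len(max_dims)]^2
pct <- 100 * eig / sum(eig)

cbind(eigenvalue = eig, percentage = pct, cumulative = cumsum(pct))
```

The eigenvalues sum to the total inertia, which equals the chi-square statistic divided by the number of wines in the table.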
#fviz_ca_biplot(res.ca, repel = TRUE, col.row = "#394165FF", title = "CA Biplot for stores and wine")
#fviz_ca_biplot(res.ca, repel = TRUE,
# map = "colprincipal",
# arrow = c(TRUE, TRUE), col.col = "#A65141FF", col.row = "#394165FF" )
(Adding pictures in R turned out to be criminally time-consuming, so I added them in another app.)
In the biplot, the angle between a store’s arrow and a wine’s arrow is acute for K&B with Russian red and white wine and Georgian red wine; for AMwine with red wine from South Africa and Spain; and for Winelab with red Italian wine and white Argentine wine.
An acute angle signals a positive association between the store and the wine. Overall, this gave me, as a buyer, a picture of each store’s assortment and of what I will find on the shelves when I go there.
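Since the biplot images were added outside R, the angles can also be checked numerically. The sketch below assumes the asymmetric map from the commented-out `map = "colprincipal"` call (stores in principal coordinates, wines in standard coordinates) and derives the coordinates with a base R singular value decomposition rather than from `res.ca`. `angle_deg` is a hypothetical helper of my own.

```r
stores <- c("K&B", "Winelab", "Amwine")
wines <- c("Portugal R", "Portugal W", "France R", "France W", "Italy R", "Italy W",
           "Spain R", "Spain W", "Georgia R", "Georgia W", "Russia R", "Russia W",
           "Chile W", "Chile R", "South Africa W", "South Africa R",
           "Argentina W", "Argentina R")
counts <- c(7, 2, 26,   8, 2, 15,   12, 43, 40,  10, 35, 26,  30, 67, 58,  22, 49, 63,
            29, 48, 83, 17, 19, 43, 26, 22, 22,  7, 12, 7,    40, 35, 46,  30, 35, 41,
            8, 13, 10,  7, 21, 19,  5, 5, 10,    3, 4, 19,    2, 5, 2,     4, 7, 10)
winemat <- matrix(counts, ncol = 3, byrow = TRUE, dimnames = list(wines, stores))

# Masses and standardized deviations from independence
P <- winemat / sum(winemat)
r <- rowSums(P)
cm <- colSums(P)
S <- (P - outer(r, cm)) / sqrt(outer(r, cm))
dec <- svd(S)
k <- 1:2  # with 3 stores there are only 2 non-trivial dimensions

# Wines in standard coordinates, stores in principal coordinates
row_std <- dec$u[, k] / sqrt(r)
col_pri <- sweep(dec$v[, k], 2, dec$d[k], "*") / sqrt(cm)
rownames(row_std) <- wines
rownames(col_pri) <- stores

# Angle in degrees between two arrows drawn from the origin
angle_deg <- function(a, b) {
  cosine <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  acos(max(-1, min(1, cosine))) * 180 / pi
}

angle_deg(row_std["Georgia R", ], col_pri["K&B", ])       # acute: over-represented at K&B
angle_deg(row_std["Portugal R", ], col_pri["Winelab", ])  # obtuse: under-represented at Winelab
```

Because the two dimensions carry all of the variance here, an angle is acute exactly when the store stocks more of that wine than independence would predict.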
Contributions of rows
#corrplot(t(res.ca$row$contrib), is.corr = FALSE)
The rows contributing most to Dim.1 are red wine from Portugal and white wine from South Africa, while for Dim.2 they are Georgian and Russian red wine.
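Because the `corrplot()` call is commented out, here is a way to list the top contributors directly. The sketch uses the standard CA definition of a row contribution (row mass times squared principal coordinate, divided by the eigenvalue), which reduces to the squared left singular vectors. I assume this matches what `res.ca$row$contrib` holds; the variable names are my own.

```r
stores <- c("K&B", "Winelab", "Amwine")
wines <- c("Portugal R", "Portugal W", "France R", "France W", "Italy R", "Italy W",
           "Spain R", "Spain W", "Georgia R", "Georgia W", "Russia R", "Russia W",
           "Chile W", "Chile R", "South Africa W", "South Africa R",
           "Argentina W", "Argentina R")
counts <- c(7, 2, 26,   8, 2, 15,   12, 43, 40,  10, 35, 26,  30, 67, 58,  22, 49, 63,
            29, 48, 83, 17, 19, 43, 26, 22, 22,  7, 12, 7,    40, 35, 46,  30, 35, 41,
            8, 13, 10,  7, 21, 19,  5, 5, 10,    3, 4, 19,    2, 5, 2,     4, 7, 10)
winemat <- matrix(counts, ncol = 3, byrow = TRUE, dimnames = list(wines, stores))

P <- winemat / sum(winemat)
r <- rowSums(P)
cm <- colSums(P)
S <- (P - outer(r, cm)) / sqrt(outer(r, cm))

# Row contributions in percent: squared left singular vectors of S
contrib <- 100 * svd(S)$u[, 1:2]^2
dimnames(contrib) <- list(wines, c("Dim 1", "Dim 2"))

# Three largest contributors per dimension
apply(contrib, 2, function(x) names(sort(x, decreasing = TRUE))[1:3])
```

Each column of `contrib` sums to 100, so a row contributing more than 100/18 ≈ 5.6% is above average.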
Column representation
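fviz_cos2(res.ca, choice = "col", axes = 1:2)
All our distributors are represented well. This is expected: the two dimensions together account for 100% of the variance, so each store’s cos2 summed over both axes equals 1.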
Conclusion:
1) Strong associations:
K&B: Russian red and white wine, Georgian red wine
AMwine: red wine from South Africa and Spain
Winelab: red Italian wine and white Argentine wine
2) In general, CA and visualization are enough to get a picture of the assortments of the selected stores.
3) Do I think that the first 20 lines of code and the balloon plot would have been enough to reach similar conclusions? Yes.
4) It seems to me that, with the same amount of effort, it would have been possible to come up with an approach that both compares the stores more thoroughly and describes them at the same time.
5) However, it is interesting that every store’s strongest associations include red wine. I don’t like it that much. So if I wanted to open my own wine store, building the marketing campaign around lots and lots of good white wine would not be a bad idea!