The data contains users' responses to three soft drinks: Pepsi, 7Up and Coke. Each brand was assessed on attributes such as Good for snacks, Good with meals and Thirst quenching.
library(knitr)
data = read.csv("soft_drinks.csv")
knitr::kable(data, caption = "Percentage of users who believe brand possesses the attribute.")
| Attribute | X7up | Coke | Pepsi |
|---|---|---|---|
| Good for snacks | 39 | 62 | 61 |
| Good with meals | 32 | 47 | 44 |
| For active, vital people | 38 | 60 | 66 |
| A drink my friends like | 30 | 55 | 53 |
| A good buy | 28 | 38 | 50 |
| A big bottle | 16 | 39 | 58 |
| Thirst quenching | 60 | 30 | 28 |
| Good tasting | 58 | 62 | 59 |
| For mixing | 66 | 18 | 4 |
Correlations are a good way to study the similarities and differences between the drinks.
correlations = data.frame(round(cor(data[, -1]), 2))
knitr::kable(correlations, caption = "Correlations between brands")
| | X7up | Coke | Pepsi |
|---|---|---|---|
| X7up | 1.00 | -0.30 | -0.63 |
| Coke | -0.30 | 1.00 | 0.87 |
| Pepsi | -0.63 | 0.87 | 1.00 |
We can observe that:
* Coke and Pepsi have a strong positive correlation (0.87), indicating that users believe that they possess (or don’t possess) similar attributes
* 7Up and Pepsi have a fairly strong negative correlation (-0.63), indicating that they are perceived to possess dissimilar attributes
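As a rough check on how much weight these correlations can bear, the sketch below retypes the table values into a data frame and runs Pearson correlation tests with `cor.test` from base R. The data frame name `drinks` and the two `test_*` variables are illustrative names, not part of the original analysis, and the values are copied by hand from the table rather than read from `soft_drinks.csv`.

```r
# Values typed in from the table above (assumption: same row order as the table).
drinks <- data.frame(
  X7up  = c(39, 32, 38, 30, 28, 16, 60, 58, 66),
  Coke  = c(62, 47, 60, 55, 38, 39, 30, 62, 18),
  Pepsi = c(61, 44, 66, 53, 50, 58, 28, 59, 4)
)

# Correlation matrix, rounded to two decimals as in the table.
cors <- round(cor(drinks), 2)
print(cors)

# Pearson correlation tests; each is based on only n = 9 attributes.
test_7up_pepsi  <- cor.test(drinks$X7up, drinks$Pepsi)
test_coke_pepsi <- cor.test(drinks$Coke, drinks$Pepsi)

print(test_7up_pepsi$p.value)
print(test_coke_pepsi$p.value)
```

With only nine attributes per brand, the 7Up and Pepsi correlation falls short of the conventional 5% level while the Coke and Pepsi correlation clears it, so the correlations are best read as descriptive summaries rather than firm statistical findings.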
The following figure shows correlations visually:
7Up scores high on the bottom three attributes (Thirst quenching, Good tasting and For mixing), while Pepsi and Coke score high on Good for snacks, For active, vital people and A drink my friends like.
In factor analysis, we are interested in finding latent factors common to certain brands. For instance, all soft drinks could be influenced by two factors (say, a cola-like taste and how well they mix with other drinks).
If that were true, factor analysis could reveal it.
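In this case we cannot fit two latent factors: with only three observed variables (the three brands), a two-factor model has more free parameters than the correlation matrix can determine. We can still run a one-factor analysis and observe how similar or different the brands are to each other.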
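The limit on the number of factors can be checked by counting degrees of freedom. The sketch below uses the standard count for a maximum-likelihood factor model with `p` observed variables and `k` factors; the helper name `factor_model_dof` is made up for illustration. A negative result means the model cannot be fitted.

```r
# Degrees of freedom of a factor model with p observed variables
# and k factors: ((p - k)^2 - (p + k)) / 2.
factor_model_dof <- function(p, k) {
  ((p - k)^2 - (p + k)) / 2
}

p <- 3  # three brands: 7Up, Coke and Pepsi
for (k in 1:2) {
  cat("factors =", k, " degrees of freedom =", factor_model_dof(p, k), "\n")
}
```

With three variables, one factor leaves zero degrees of freedom (the model is just identified) and two factors give a negative count, which is why only a one-factor model is available here.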
We run factor analysis with one factor and observe how strongly each brand loads on that factor.
f = factanal(data[, -1], factors = 1, na.action = na.omit)
f$loadings
##
## Loadings:
## Factor1
## X7up -0.626
## Coke 0.875
## Pepsi 0.998
##
## Factor1
## SS loadings 2.153
## Proportion Var 0.718
The loadings of Coke and Pepsi are strongly positive (0.875 and 0.998), while that of 7Up is negative (-0.626). This indicates that the main governing factor (in this dataset) appears to be something on which Coke and Pepsi score high and 7Up scores low.
Looking at the attributes, it seems that the latent factor measures how much others like the drink and how well it is suited to having with snacks and meals.
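To see how the summary rows of the output relate to the loadings, the sketch below recomputes them from the printed values. The variable names (`lambda`, `communality`, `uniqueness`) are illustrative, and because the loadings are copied from the rounded output, the results may differ from the printed ones in the third decimal.

```r
# Loadings copied from the factanal output above (rounded to three decimals).
lambda <- c(X7up = -0.626, Coke = 0.875, Pepsi = 0.998)

communality <- lambda^2        # share of each brand's variance explained by the factor
uniqueness  <- 1 - communality # share left unexplained

ss_loadings    <- sum(communality)             # corresponds to the "SS loadings" row
proportion_var <- ss_loadings / length(lambda) # corresponds to the "Proportion Var" row

print(round(cbind(loading = lambda, communality, uniqueness), 3))
cat("SS loadings:", round(ss_loadings, 3),
    " Proportion Var:", round(proportion_var, 3), "\n")
```

Pepsi's uniqueness comes out very close to zero, which often signals a boundary solution in maximum-likelihood factor analysis, so the Pepsi loading is probably best treated with some caution on a dataset this small.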