Factor Analysis

The data contain the responses of users to three soft drinks: Pepsi, 7Up and Coke. Each value is the percentage of users who believe a brand possesses a given attribute, such as Good for snacks, Good with meals or Thirst quenching.



library(knitr)
data = read.csv("soft_drinks.csv")
knitr::kable(data, caption = "Percentage of users who believe brand possesses the attribute.")
Percentage of users who believe brand possesses the attribute.

|Attribute                | X7up| Coke| Pepsi|
|:------------------------|----:|----:|-----:|
|Good for snacks          |   39|   62|    61|
|Good with meals          |   32|   47|    44|
|For active, vital people |   38|   60|    66|
|A drink my friends like  |   30|   55|    53|
|A good buy               |   28|   38|    50|
|A big bottle             |   16|   39|    58|
|Thirst quenching         |   60|   30|    28|
|Good tasting             |   58|   62|    59|
|For mixing               |   66|   18|     4|
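For readers who do not have `soft_drinks.csv` at hand, the table can be rebuilt by hand. The sketch below assumes the file holds exactly the values printed above; the object name `soft_drinks` is only illustrative. The brand column is named `X7up` because `read.csv` prefixes names that start with a digit.

```r
# Rebuild the table by hand; all values are copied from the printed table.
# `soft_drinks` is an illustrative name, not one used elsewhere in the text.
soft_drinks <- data.frame(
  Attribute = c("Good for snacks", "Good with meals",
                "For active, vital people", "A drink my friends like",
                "A good buy", "A big bottle",
                "Thirst quenching", "Good tasting", "For mixing"),
  X7up  = c(39, 32, 38, 30, 28, 16, 60, 58, 66),
  Coke  = c(62, 47, 60, 55, 38, 39, 30, 62, 18),
  Pepsi = c(61, 44, 66, 53, 50, 58, 28, 59, 4),
  stringsAsFactors = FALSE
)

# One row per attribute, one numeric column per brand.
str(soft_drinks)
```

Assigning this data frame to `data` should let the remaining steps run without the CSV file.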

Correlations

Correlating the brands' scores across the nine attributes is a simple way to see how similarly the drinks are perceived.

correlations = data.frame(round(cor(data[, -1]), 2))
knitr::kable(correlations, caption = "Correlations between brands")
Correlations between brands

|      |  X7up|  Coke| Pepsi|
|:-----|-----:|-----:|-----:|
|X7up  |  1.00| -0.30| -0.63|
|Coke  | -0.30|  1.00|  0.87|
|Pepsi | -0.63|  0.87|  1.00|

We can observe that:
* Coke and Pepsi have a strong positive correlation (0.87), indicating that users believe they possess (or lack) similar attributes
* 7up and Pepsi have a moderately strong negative correlation (-0.63), indicating that they are perceived to possess dissimilar attributes
* 7up and Coke have only a weak negative correlation (-0.30)
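Each correlation here rests on only nine attribute values, so it helps to see how much uncertainty is attached to it. The sketch below uses `cor.test` from base R on the values copied from the table. It treats the nine attributes as independent observations, which is an assumption made for illustration rather than something the data guarantee.

```r
# Scores copied from the table above; each vector holds nine attribute values.
x7up  <- c(39, 32, 38, 30, 28, 16, 60, 58, 66)
coke  <- c(62, 47, 60, 55, 38, 39, 30, 62, 18)
pepsi <- c(61, 44, 66, 53, 50, 58, 28, 59, 4)

# Two-sided Pearson correlation tests; with n = 9 there are 7 degrees of freedom.
# Assumption: the nine attributes are treated as independent observations.
test_coke_pepsi <- cor.test(coke, pepsi)
test_7up_pepsi  <- cor.test(x7up, pepsi)

# Estimated correlation and p-value for each pair.
round(c(r = unname(test_coke_pepsi$estimate), p = test_coke_pepsi$p.value), 3)
round(c(r = unname(test_7up_pepsi$estimate),  p = test_7up_pepsi$p.value), 3)
```

With so few points, the Coke and Pepsi correlation is clearly distinguishable from zero, while the 7up and Pepsi correlation is borderline at the usual 5% level, so the negative relationship is best read as suggestive.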


The following figure shows correlations visually:



7up scores high on the bottom three attributes in the table (Thirst quenching, Good tasting and For mixing), although on Good tasting all three brands score about the same (58 to 62). Pepsi and Coke score high on Good for snacks, For active, vital people and A drink my friends like.
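The same comparison can be made numerically. The sketch below computes, for each attribute, the gap between the 7up score and the average of the two colas. The name `gap` and the choice of the cola average as a baseline are illustrative assumptions, not part of the original analysis.

```r
# Scores copied from the table above, with attributes as row names.
scores <- data.frame(
  X7up  = c(39, 32, 38, 30, 28, 16, 60, 58, 66),
  Coke  = c(62, 47, 60, 55, 38, 39, 30, 62, 18),
  Pepsi = c(61, 44, 66, 53, 50, 58, 28, 59, 4),
  row.names = c("Good for snacks", "Good with meals",
                "For active, vital people", "A drink my friends like",
                "A good buy", "A big bottle",
                "Thirst quenching", "Good tasting", "For mixing")
)

# Gap between 7up and the average of the two colas, per attribute.
gap <- scores$X7up - (scores$Coke + scores$Pepsi) / 2
names(gap) <- rownames(scores)

# Attributes sorted from "most 7up" to "most cola".
sort(gap, decreasing = TRUE)

# Attributes on which 7up is ahead of the cola average.
names(gap)[gap > 0]
```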

Factor Analysis

In factor analysis, we are interested in finding latent factors common to certain brands. For instance, perceptions of all soft drinks could be driven by two factors (say, cola-like taste and how well the drink mixes with other drinks).

If that were the case, factor analysis could reveal it.

In this case, we cannot extract two latent factors: with only three observed variables (the brands), a one-factor model already uses up all the available degrees of freedom. We can still run factor analysis and observe how similar or different the brands are to each other.

We run factor analysis with one factor and observe how strongly each brand loads on it.
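The limit on the number of factors comes from a degrees-of-freedom count. For p observed variables and k factors, the usual count is ((p - k)^2 - (p + k)) / 2, and a model can only be fitted when this is not negative. The sketch below evaluates it for three brands; `factor_dof` is a hypothetical helper written for this illustration, and the data values are copied from the table.

```r
# Hypothetical helper: degrees of freedom of a k-factor model on p variables.
factor_dof <- function(p, k) ((p - k)^2 - (p + k)) / 2

dof_one <- factor_dof(3, 1)  # one factor, three brands
dof_two <- factor_dof(3, 2)  # two factors, three brands
c(one_factor = dof_one, two_factors = dof_two)

# Brand scores copied from the table, one column per brand.
brands <- cbind(
  X7up  = c(39, 32, 38, 30, 28, 16, 60, 58, 66),
  Coke  = c(62, 47, 60, 55, 38, 39, 30, 62, 18),
  Pepsi = c(61, 44, 66, 53, 50, 58, 28, 59, 4)
)

# Asking factanal for two factors is expected to stop with an error,
# which is captured here as a message instead of aborting the script.
two_factor <- tryCatch(
  factanal(brands, factors = 2),
  error = function(e) conditionMessage(e)
)
two_factor
```

A one-factor model on three variables has zero degrees of freedom, so it can be estimated but leaves nothing over for testing how well it fits.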

f = factanal(data[, -1], factors = 1, na.action = na.omit)
f$loadings
## 
## Loadings:
##       Factor1
## X7up  -0.626 
## Coke   0.875 
## Pepsi  0.998 
## 
##                Factor1
## SS loadings      2.153
## Proportion Var   0.718

The loadings of Coke and Pepsi are strongly positive (0.875 and 0.998), while that of 7up is negative (-0.626). This indicates that the main governing factor (in this dataset) is something with which Coke and Pepsi are positively associated and 7up negatively. This single factor accounts for about 72% of the total variance (Proportion Var 0.718).
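The summary rows of the output follow directly from the loadings. The sketch below recomputes them from the printed (rounded) loadings rather than from the fitted object, so small rounding differences are possible. The variable names are illustrative.

```r
# Loadings as printed in the output above (rounded to three decimals).
loadings <- c(X7up = -0.626, Coke = 0.875, Pepsi = 0.998)

# Communality: share of each brand's variance explained by the factor.
communality <- loadings^2

# Uniqueness: share of variance the factor leaves unexplained.
uniqueness <- 1 - communality

# "SS loadings" is the sum of squared loadings.
ss_loadings <- sum(communality)

# "Proportion Var" divides that sum by the number of variables.
proportion_var <- ss_loadings / length(loadings)

round(rbind(communality, uniqueness), 3)
round(c(ss_loadings = ss_loadings, proportion_var = proportion_var), 3)
```

Read this way, the factor explains nearly all of the variance in the Pepsi scores, most of it for Coke, and well under half of it for 7up.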

Looking at the attributes, it seems that the latent factor measures how much others like the drink and how well it is suited to having with snacks and meals.
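One rough way to check this reading is to see which attributes sit at the high and low ends of the factor. The sketch below builds a loading-weighted composite of the standardised brand scores. This is a simple proxy chosen for illustration, not the factor scores that `factanal` can compute, and the loadings are the rounded values printed above.

```r
# Scores copied from the table above, with attributes as row names.
scores <- data.frame(
  X7up  = c(39, 32, 38, 30, 28, 16, 60, 58, 66),
  Coke  = c(62, 47, 60, 55, 38, 39, 30, 62, 18),
  Pepsi = c(61, 44, 66, 53, 50, 58, 28, 59, 4),
  row.names = c("Good for snacks", "Good with meals",
                "For active, vital people", "A drink my friends like",
                "A good buy", "A big bottle",
                "Thirst quenching", "Good tasting", "For mixing")
)

# Rounded loadings from the one-factor output.
loadings <- c(X7up = -0.626, Coke = 0.875, Pepsi = 0.998)

# Standardise each brand column (mean 0, standard deviation 1).
z <- scale(as.matrix(scores))

# Assumption: a loading-weighted sum serves as a rough stand-in for factor scores.
composite <- drop(z[, names(loadings)] %*% loadings)

# Attributes ordered from the positive end of the factor to the negative end.
round(sort(composite, decreasing = TRUE), 2)
```

Attributes near the top of this ordering are the ones the factor is most aligned with, and those near the bottom are the ones on which 7up stands apart from the colas.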