A researcher wants to determine the underlying benefits consumers seek from the purchase of a shampoo. A sample of 30 respondents was interviewed. The respondents were asked to indicate their degree of agreement with six statements (x1 to x6, listed in the interpretation of the factors further down) on a 7-point scale (1: strongly disagree, 7: strongly agree).
We first load the data and look at the first few rows and the dimensions.
head(x)
##   Respondent x1 x2 x3 x4 x5 x6
## 1          1  7  3  6  4  2  4
## 2          2  1  3  2  4  5  4
## 3          3  6  2  7  4  1  3
## 4          4  4  5  4  6  2  5
## 5          5  1  2  2  3  6  2
## 6          6  6  3  6  4  2  4
dim(x)
## [1] 30 7
We drop the Respondent identifier column and compute the correlation matrix of the six items.
x <- x[, 2:7]
cor(x)
##              x1          x2          x3           x4           x5
## x1  1.000000000 -0.05321785  0.87309020 -0.086162233 -0.857636627
## x2 -0.053217850  1.00000000 -0.15502002  0.572212066  0.019745647
## x3  0.873090198 -0.15502002  1.00000000 -0.247787899 -0.777848036
## x4 -0.086162233  0.57221207 -0.24778790  1.000000000 -0.006581882
## x5 -0.857636627  0.01974565 -0.77784804 -0.006581882  1.000000000
## x6  0.004168129  0.64046495 -0.01806881  0.640464946 -0.136402944
##              x6
## x1  0.004168129
## x2  0.640464946
## x3 -0.018068814
## x4  0.640464946
## x5 -0.136402944
## x6  1.000000000
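The correlation matrix shows two clusters of items. x1 and x3 are strongly positively correlated (0.87), and both are strongly negatively correlated with x5 (-0.86 and -0.78). x2, x4 and x6 are moderately positively correlated with one another (0.57 to 0.64). Correlations across the two groups are weak (all below 0.25 in absolute value).
We will run a Principal Component Analysis (PCA) to determine the number of factors. We will check the PCA summary, the bar plot of component variances, the scree plot and the biplot.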
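Reading clusters off a printed matrix by eye gets harder as the number of items grows, so here is a minimal sketch that lists the strongly correlated pairs. It rebuilds the matrix from the values printed above, rounded to four decimals, so that it runs on its own. The cutoff of 0.5 and the names `R`, `idx` and `strong` are our own choices for illustration, not part of the original analysis.

```r
# Correlation matrix rebuilt from the printed cor(x) output (rounded to 4 decimals)
vars <- paste0("x", 1:6)
R <- diag(6)
dimnames(R) <- list(vars, vars)
R[lower.tri(R)] <- c(
  -0.0532,  0.8731, -0.0862, -0.8576,  0.0042,  # x1 with x2..x6
  -0.1550,  0.5722,  0.0197,  0.6405,           # x2 with x3..x6
  -0.2478, -0.7778, -0.0181,                    # x3 with x4..x6
  -0.0066,  0.6405,                             # x4 with x5, x6
  -0.1364                                       # x5 with x6
)
R[upper.tri(R)] <- t(R)[upper.tri(R)]           # mirror to make R symmetric

# Keep each pair once (lower triangle) where |r| exceeds the assumed cutoff 0.5
idx <- which(abs(R) > 0.5 & lower.tri(R), arr.ind = TRUE)
strong <- data.frame(
  var1 = vars[idx[, "col"]],
  var2 = vars[idx[, "row"]],
  r    = R[idx]
)
strong[order(-abs(strong$r)), ]
```

With the data loaded, `cor(x)` can be used in place of the hand-built `R`.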
x.pca <- princomp(x)
summary(x.pca)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4
## Standard deviation     3.1971521 2.0467225 0.95990875 0.84064381
## Proportion of Variance 0.6040845 0.2475649 0.05445415 0.04176333
## Cumulative Proportion  0.6040845 0.8516494 0.90610355 0.94786689
##                            Comp.5    Comp.6
## Standard deviation     0.76642000 0.5429094
## Proportion of Variance 0.03471401 0.0174191
## Cumulative Proportion  0.98258090 1.0000000
plot(x.pca)
screeplot(x.pca, type = "line")
biplot(x.pca)
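Based on the summary and plots, two components appear sufficient: together they account for about 85% of the total variance, and each remaining component adds less than 6%. So we will do factor analysis with 2 factors.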
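To make the link between the standard deviations and the variance proportions explicit, the sketch below recomputes the "Proportion of Variance" and "Cumulative Proportion" rows from the standard deviations printed in the summary. The 80% retention threshold and the variable names are assumptions made for illustration. It is one common rule of thumb, not the rule used in the original analysis.

```r
# Component standard deviations copied from summary(x.pca)
sdev <- c(3.1971521, 2.0467225, 0.95990875, 0.84064381, 0.76642000, 0.5429094)
names(sdev) <- paste0("Comp.", 1:6)

variance <- sdev^2                      # variance carried by each component
prop_var <- variance / sum(variance)    # share of the total variance
cum_var  <- cumsum(prop_var)            # running total of those shares
round(rbind(prop_var, cum_var), 4)

# Smallest number of components reaching the assumed 80% threshold
n_keep <- which(cum_var >= 0.80)[1]
n_keep
```

With the fitted object available, `x.pca$sdev` can replace the hand-typed vector.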
x.fa <- factanal(x, factors = 2, rotation = "varimax", scores = "regression")
x.fa
##
## Call:
## factanal(x = x, factors = 2, scores = "regression", rotation = "varimax")
##
## Uniquenesses:
## x1 x2 x3 x4 x5 x6
## 0.063 0.437 0.174 0.378 0.205 0.309
##
## Loadings:
##    Factor1 Factor2
## x1  0.968
## x2          0.749
## x3  0.898  -0.140
## x4          0.784
## x5 -0.887
## x6          0.830
##
##                Factor1 Factor2
## SS loadings      2.542   1.892
## Proportion Var   0.424   0.315
## Cumulative Var   0.424   0.739
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 5.21 on 4 degrees of freedom.
## The p-value is 0.266
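The uniquenesses and the variance table in this output are two views of the same quantity. Because factanal works on standardized variables, each item's communality (the share of its variance explained by the factors) is one minus its uniqueness, and the average communality equals the "Cumulative Var" of the last factor. A short sketch using the printed values (the names `uniq`, `comm` and `total_explained` are ours):

```r
# Uniquenesses copied from the factanal output
uniq <- c(x1 = 0.063, x2 = 0.437, x3 = 0.174, x4 = 0.378, x5 = 0.205, x6 = 0.309)

comm <- 1 - uniq          # communality of each item
round(comm, 3)

# Average communality = share of total variance explained by both factors
total_explained <- sum(comm) / length(comm)
round(total_explained, 3)
```

The result agrees with the 0.739 reported under "Cumulative Var", and `sum(comm)` agrees with the two SS loadings added together (2.542 + 1.892). With the fitted object, the same communalities should come from `1 - x.fa$uniquenesses`.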
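Factor1 loads heavily on x1, x3 and x5 (negatively on x5), while Factor2 loads on x2, x4 and x6. The chi-square test (p = 0.266) does not reject the hypothesis that 2 factors are sufficient.
We interpret Factor1 from the items that load on it: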
x1 = It is important to buy a shampoo that prevents hair fall.
x3 = A shampoo should strengthen the roots of your hair.
x5 = Prevention of hair splitting is not important as far as shampoo is concerned.
So Factor1 represents hair care related benefits. Because x5 is worded negatively, its negative loading points in the same direction as x1 and x3.
For Factor2:
x2 = I like a shampoo that gives shiny hair.
x4 = I prefer shampoo that decelerates the greying of hair.
x6 = A shampoo should make hair attractive.
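So Factor2 represents benefits related to the look and attractiveness of hair.
We also extract the factor scores for each respondent, estimated by the regression method requested in the factanal call.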
head(x.fa$scores)
##         Factor1    Factor2
## [1,]  1.3045863 -0.2412923
## [2,] -1.2951658 -0.2556460
## [3,]  1.1628756 -0.7569023
## [4,]  0.1747302  1.0107855
## [5,] -1.4279521 -1.3607723
## [6,]  0.9864023 -0.2510835
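The scores are centered on the sample mean, so a positive score means the respondent values that benefit more than the average respondent does. As a minimal sketch of how the scores might be used, the code below takes the six rows printed above and flags who is above average on each factor. The column names `care` and `looks` are hypothetical labels taken from our interpretation of the factors.

```r
# First six factor scores copied from head(x.fa$scores)
scores <- matrix(
  c( 1.3045863, -0.2412923,
    -1.2951658, -0.2556460,
     1.1628756, -0.7569023,
     0.1747302,  1.0107855,
    -1.4279521, -1.3607723,
     0.9864023, -0.2510835),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("Factor1", "Factor2"))
)

# Hypothetical labels: Factor1 = hair care, Factor2 = hair looks
seg <- data.frame(
  respondent      = 1:6,
  care            = scores[, "Factor1"],
  looks           = scores[, "Factor2"],
  care_above_avg  = scores[, "Factor1"] > 0,
  looks_above_avg = scores[, "Factor2"] > 0
)
seg
```

On the full data, `x.fa$scores` can replace the hand-typed matrix so that all 30 respondents are covered.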