Source of the content: http://r-marketing.r-forge.r-project.org/Instructor/Intro%20Factor%20Analysis/intro-factor-analysis.pdf
Factor Analysis: Basic Framework
From the original variables, factor analysis (FA) tries to find a smaller number of derived variables (factors) that meet these conditions:
1. Maximally capture the correlations among the original variables (after accounting for error)
2. Each factor is associated clearly with a subset of the variables
3. Each variable is associated clearly with (ideally) only one factor
4. The factors are maximally differentiated from one another
These conditions are rarely met perfectly in practice, but when they are approximated, the solution is close to “simple structure” and is easy to interpret.
Another way to look at FA is that it seeks latent variables. A latent variable is an unobservable data generating process — such as a mental state — that is manifested in measurable quantities (such as survey items).
The product interest survey was designed to assess three latent variables:
General interest in a product category
Detailed interest in specific features
Interest in the product as an “image” product
Each of those is assessed with multiple items because any single item is imperfect.
People often confuse exploratory factor analysis (EFA) with confirmatory factor analysis (CFA). In brief: EFA discovers how many factors underlie the data and which variables load on them, while CFA tests whether a pre-specified factor structure fits the data.
KEY TERMS IN FACTOR ANALYSIS
- Latent variable: a presumed cognitive or data-generating process that leads to observable data; often a theoretical construct. Example: product interest. Diagram symbol: a circle or oval, such as F1.
- Factor: a dimensional reduction that estimates a latent variable and its relationship to manifest variables. Example: InterestFactor.
- Loading: the strength of the relationship between a factor and a variable; ranges from -1.0 to 1.0, like Pearson's r. Example: F1 → v1 = 0.45.
A typical factor analysis workflow is shown in the accompanying diagram.
Let us perform factor analysis on a dataset. The dataset contains 11 items of simulated product interest and engagement (PIES) data, rated on a 7-point Likert-type scale. We will determine the appropriate number of factors and the variables' loadings on them.
Some items' scoring was transformed before analysis.
The dataset dimensions (observations, items):

[1] 3600 11

The 11 items are: NotImportant, NeverThink, VeryInterested, LookFeatures, InvestigateDepth, SomeAreBetter, LearnAboutOptions, OthersOpinion, ExpressesPerson, TellAbout and MatchImage.
NotImportant | NeverThink | VeryInterested | LookFeatures |
---|---|---|---|
Min. :1.000 | Min. :1.000 | Min. :1.00 | Min. :1.000 |
1st Qu.:4.000 | 1st Qu.:3.000 | 1st Qu.:3.00 | 1st Qu.:3.000 |
Median :4.000 | Median :4.000 | Median :4.00 | Median :4.000 |
Mean :4.339 | Mean :4.104 | Mean :4.11 | Mean :4.039 |
3rd Qu.:5.000 | 3rd Qu.:5.000 | 3rd Qu.:5.00 | 3rd Qu.:5.000 |
Max. :7.000 | Max. :7.000 | Max. :7.00 | Max. :7.000 |
InvestigateDepth | SomeAreBetter | LearnAboutOptions | OthersOpinion |
---|---|---|---|
Min. :1.000 | Min. :1.000 | Min. :1.000 | Min. :1.000 |
1st Qu.:3.000 | 1st Qu.:3.000 | 1st Qu.:3.000 | 1st Qu.:3.000 |
Median :4.000 | Median :4.000 | Median :4.000 | Median :4.000 |
Mean :3.999 | Mean :3.922 | Mean :3.872 | Mean :3.904 |
3rd Qu.:5.000 | 3rd Qu.:5.000 | 3rd Qu.:5.000 | 3rd Qu.:5.000 |
Max. :7.000 | Max. :7.000 | Max. :7.000 | Max. :7.000 |
ExpressesPerson | TellAbout | MatchImage |
---|---|---|
Min. :1.000 | Min. :1.0 | Min. :1.000 |
1st Qu.:3.000 | 1st Qu.:3.0 | 1st Qu.:3.000 |
Median :4.000 | Median :4.0 | Median :4.000 |
Mean :4.023 | Mean :3.9 | Mean :3.853 |
3rd Qu.:5.000 | 3rd Qu.:5.0 | 3rd Qu.:4.250 |
Max. :7.000 | Max. :7.0 | Max. :7.000 |
There is usually no definitive answer; choosing the number of factors is partly a matter of usefulness.
Generally, look for consensus among:
- Theory: how many do you expect?
- Correlation matrix: how many seem to be there?
- Eigenvalues: how many Factors have Eigenvalue > 1?
- Eigenvalue scree plot: where is the “bend” in extraction?
- Parallel analysis and acceleration (a parallel analysis sketch follows this list)
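A minimal sketch of parallel analysis with psych::fa.parallel (assuming the psych package and the `data` data frame used throughout this case study):

```r
library(psych)
# Compares observed eigenvalues with those from random data of the same size;
# retain factors whose observed eigenvalue exceeds the random benchmark
fa.parallel(data, fa = "fa")
```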
In factor analysis, an eigenvalue is the proportion of total shared (i.e., non-error) variance explained by each factor.
A factor is only useful if it explains more variance than a single variable contributes . . . and thus has an eigenvalue > 1.0.
The eigenvalues of the item correlation matrix are:

3.661, 1.642, 1.275, 0.6881, 0.5801, 0.572, 0.5608, 0.5388, 0.529, 0.4834 and 0.4701

Three eigenvalues exceed 1.0, so this rule of thumb suggests 3 factors in the data.
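For reference, these eigenvalues come directly from the item correlation matrix; a minimal sketch mirroring the dashboard's own code:

```r
ev <- eigen(cor(data))$values  # eigenvalues of the 11 x 11 item correlation matrix
sum(ev > 1)                    # Kaiser criterion: counts 3 factors here
```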
EFA can be thought of as slicing a pizza. The same material (variance) can be carved up in ways that are mathematically identical but more or less useful for a given situation.
Key decision: do you want the extracted factors to be correlated or not? In FA jargon, orthogonal or oblique?
By default, EFA looks for orthogonal factors with r = 0 correlation. This maximizes interpretability, so orthogonal rotation is recommended in most cases, at least to start.
Some rotation options:
- varimax (default): orthogonal rotation that aims for clear factor/variable structure. Generally recommended.
- oblimin (oblique): finds correlated factors while aiming for interpretability. Recommended if you want an oblique solution.
- promax (oblique): finds correlated factors similarly, but is computationally different. A recommended alternative if oblimin is unavailable or has difficulty.
- Many others: dozens have been developed. They are useful mostly when you're very concerned about psychometrics.
Let us fit the model with orthogonal rotation.
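The output below was produced with psych::fa, as in the dashboard source further down:

```r
library(psych)
data.fa <- fa(data, nfactors = 3, rotate = "varimax")  # 3-factor EFA, varimax rotation
summary(data.fa)  # brief fit summary
data.fa           # full print: loadings, variance explained, fit statistics
```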
Factor analysis with Call: fa(r = data, nfactors = 3, rotate = "varimax")

Test of the hypothesis that 3 factors are sufficient.
The degrees of freedom for the model is 25 and the objective function was 0
The number of observations was 3600 with Chi Square = 17.17 with prob < 0.88

The root mean square of the residuals (RMSR) is 0
The df corrected root mean square of the residuals is 0.01

Tucker Lewis Index of factoring reliability = 1.002
RMSEA index = 0 and the 90 % confidence intervals are 0 0.007
BIC = -187.54

Factor Analysis using method = minres
Call: fa(r = data, nfactors = 3, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 MR3 h2 u2 com
NotImportant 0.14 0.12 0.67 0.49 0.51 1.1
NeverThink 0.10 0.10 0.61 0.40 0.60 1.1
VeryInterested 0.28 0.36 0.48 0.44 0.56 2.5
LookFeatures 0.15 0.61 0.10 0.40 0.60 1.2
InvestigateDepth 0.13 0.72 0.10 0.54 0.46 1.1
SomeAreBetter 0.07 0.52 0.09 0.28 0.72 1.1
LearnAboutOptions 0.13 0.68 0.15 0.50 0.50 1.2
OthersOpinion 0.67 0.14 0.13 0.48 0.52 1.2
ExpressesPerson 0.71 0.14 0.13 0.53 0.47 1.1
TellAbout 0.65 0.12 0.14 0.46 0.54 1.2
MatchImage 0.63 0.13 0.08 0.42 0.58 1.1
MR1 MR2 MR3
SS loadings 1.94 1.83 1.17
Proportion Var 0.18 0.17 0.11
Cumulative Var 0.18 0.34 0.45
Proportion Explained 0.39 0.37 0.24
Cumulative Proportion 0.39 0.76 1.00
Mean item complexity = 1.3
Test of the hypothesis that 3 factors are sufficient.
The degrees of freedom for the null model are 55 and the objective function was 2.76 with Chi Square of 9905.74
The degrees of freedom for the model are 25 and the objective function was 0
The root mean square of the residuals (RMSR) is 0
The df corrected root mean square of the residuals is 0.01
The harmonic number of observations is 3600 with the empirical chi square 9.78 with prob < 1
The total number of observations was 3600 with Likelihood Chi Square = 17.17 with prob < 0.88
Tucker Lewis Index of factoring reliability = 1.002
RMSEA index = 0 and the 90 % confidence intervals are 0 0.007
BIC = -187.54
Fit based upon off diagonal values = 1
Measures of factor score adequacy
MR1 MR2 MR3
Correlation of (regression) scores with factors 0.87 0.86 0.79
Multiple R square of scores with factors 0.75 0.73 0.62
Minimum correlation of possible factor scores 0.50 0.46 0.24
Loadings above 0.30 (blanks indicate loadings of 0.30 or less):

| | MR1 | MR2 | MR3 |
|---|---|---|---|
| NotImportant | | | 0.674 |
| NeverThink | | | 0.614 |
| VeryInterested | | 0.362 | 0.476 |
| LookFeatures | | 0.607 | |
| InvestigateDepth | | 0.716 | |
| SomeAreBetter | | 0.518 | |
| LearnAboutOptions | | 0.678 | |
| OthersOpinion | 0.665 | | |
| ExpressesPerson | 0.706 | | |
| TellAbout | 0.655 | | |
| MatchImage | 0.633 | | |
The first six respondents' factor scores:

| | ImageF | FeatureF | GeneralF |
|---|---|---|---|
| 1 | 0.5111 | -1.241 | 0.7928 |
| 2 | -0.06877 | 0.2804 | 0.6641 |
| 3 | -0.3027 | -0.1043 | -0.8785 |
| 4 | -0.8661 | -1.106 | 0.4226 |
| 5 | -0.692 | -0.08929 | -0.403 |
| 6 | 1.534 | -0.3898 | -0.06599 |
The last six respondents' factor scores:

| | ImageF | FeatureF | GeneralF |
|---|---|---|---|
3595 | -0.0229 | -0.204 | -0.08209 |
3596 | 0.2773 | 0.4208 | 0.4915 |
3597 | 1.938 | -1.261 | 0.3646 |
3598 | -0.4837 | 0.4699 | 1.331 |
3599 | -0.5671 | 0.9789 | 0.2916 |
3600 | -0.9219 | 0.6484 | 1.162 |
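These per-respondent scores come from the $scores element of the fitted fa object, as in the dashboard source below:

```r
fa.scores <- data.frame(data.fa$scores)  # one row of factor scores per respondent
names(fa.scores) <- c("ImageF", "FeatureF", "GeneralF")
head(fa.scores)
tail(fa.scores)
```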
Confirmatory Factor Analysis
CFA is a special case of structural equation modeling (SEM), applied to latent variable assessment, usually for surveys and similar data. It is used to:
1. Assess the structure of survey scales — do items load where one would hope?
2. Evaluate the fit/appropriateness of a factor model — is a proposed model better than alternatives?
3. Evaluate the weights of items relative to one another and to a scale — do they contribute equally?
4. Model other effects, such as method effects and hierarchical relationships.
Steps in CFA:
1. Define your hypothesized/favored model, with relationships of latent variables to manifest variables.
2. Define one or more alternative models that are reasonable, but which you believe are inferior.
3. Fit the models to your data.
4. Determine whether your model is good enough (fit indices, paths).
5. Determine whether your model is better than the alternatives.
6. Interpret your model.
Let's fit a 3-factor model to our data and compare it with the 1-factor model shown below.
Model fit indices are measures used to evaluate and compare CFA models. The most frequently used are:
Global fit indices
Example: Comparative Fit Index (CFI). Attempts to assess “absolute” fit vs. the data. Not a very good measure on its own, but it sets a minimum bar: want fit > 0.90.
Approximation error and residuals
Example: Standardized Root Mean Square Residual (SRMR), the difference between the data's covariance matrix and the fitted model's matrix. Want SRMR < 0.08. For the Root Mean Square Error of Approximation, want Lower-CI(RMSEA) < 0.05.
Information criteria
Example: Akaike Information Criterion (AIC). Assesses the model's fit vs. the observed data. There is no absolute interpretation, but lower is better; a difference of 10 or more is large.
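In lavaan, these indices can be extracted from a fitted model with fitMeasures(); a minimal sketch (fit3 is the 3-factor model fitted below):

```r
# after fitting, e.g., fit3 <- cfa(Model3, data = data)
fitMeasures(fit3, c("cfi", "tli", "rmsea", "rmsea.ci.lower", "srmr", "aic", "bic"))
```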
library(lavaan)
Model3 <- " General =~ NotImportant + NeverThink + VeryInterested
Feature =~ LookFeatures + InvestigateDepth + SomeAreBetter + LearnAboutOptions
Image =~ OthersOpinion + ExpressesPerson + TellAbout + MatchImage"
fit3 <- cfa(Model3, data=data)
pander(summary(fit3, fit.measures=TRUE))
lavaan 0.6-6 ended normally after 36 iterations

  Estimator: ML
  Optimization method: NLMINB
  Number of free parameters: 25
  Number of observations: 3600

Model Test User Model:
  Test statistic: 287.649
  Degrees of freedom: 41
  P-value (Chi-square): 0.000

Model Test Baseline Model:
  Test statistic: 9920.901
  Degrees of freedom: 55
  P-value: 0.000

User Model versus Baseline Model:
  Comparative Fit Index (CFI): 0.975
  Tucker-Lewis Index (TLI): 0.966

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0): -52885.888
  Loglikelihood unrestricted model (H1): -52742.064
  Akaike (AIC): 105821.776
  Bayesian (BIC): 105976.494
  Sample-size adjusted Bayesian (BIC): 105897.056

Root Mean Square Error of Approximation:
  RMSEA: 0.041
  90 percent confidence interval: 0.036 to 0.045
  P-value RMSEA <= 0.05: 1.000

Standardized Root Mean Square Residual:
  SRMR: 0.030

Parameter Estimates:
  Standard errors: Standard
  Information: Expected
  Information saturated (h1) model: Structured

Latent Variables (Estimate, Std.Err, z-value, P(>|z|)):
  General =~
    NotImportant: 1.000
    NeverThink: 0.948, 0.042, 22.415, 0.000
    VeryInterested: 1.305, 0.052, 25.268, 0.000
  Feature =~
    LookFeatures: 1.000
    InvestigateDepth: 1.168, 0.037, 31.168, 0.000
    SomeAreBetter: 0.822, 0.033, 25.211, 0.000
    LearnAboutOptions: 1.119, 0.036, 31.022, 0.000
  Image =~
    OthersOpinion: 1.000
    ExpressesPerson: 0.963, 0.028, 34.657, 0.000
    TellAbout: 0.908, 0.027, 33.146, 0.000
    MatchImage: 0.850, 0.027, 31.786, 0.000

Covariances (Estimate, Std.Err, z-value, P(>|z|)):
  General ~~ Feature: 0.217, 0.012, 17.561, 0.000
  General ~~ Image: 0.231, 0.013, 17.348, 0.000
  Feature ~~ Image: 0.202, 0.013, 15.650, 0.000

Variances (Estimate, Std.Err, z-value, P(>|z|)):
  .NotImportant: 0.657, 0.020, 33.498, 0.000
  .NeverThink: 0.796, 0.022, 35.967, 0.000
  .VeryInterested: 0.463, 0.022, 21.479, 0.000
  .LookFeatures: 0.657, 0.019, 33.973, 0.000
  .InvestigateDepth: 0.554, 0.019, 28.588, 0.000
  .SomeAreBetter: 0.779, 0.021, 37.701, 0.000
  .LearnAboutOptions: 0.533, 0.018, 29.199, 0.000
  .OthersOpinion: 0.640, 0.020, 32.071, 0.000
  .ExpressesPerson: 0.476, 0.016, 29.501, 0.000
  .TellAbout: 0.560, 0.017, 32.697, 0.000
  .MatchImage: 0.599, 0.017, 34.500, 0.000
  General: 0.337, 0.021, 15.799, 0.000
  Feature: 0.446, 0.024, 18.684, 0.000
  Image: 0.591, 0.028, 21.092, 0.000
FIT:
npar | fmin | chisq | df | pvalue | baseline.chisq | baseline.df |
---|---|---|---|---|---|---|
25 | 0.03995 | 287.6 | 41 | 0 | 9921 | 55 |
baseline.pvalue | cfi | tli | logl | unrestricted.logl | aic |
---|---|---|---|---|---|
0 | 0.975 | 0.9665 | -52886 | -52742 | 105822 |
bic | ntotal | bic2 | rmsea | rmsea.ci.lower | rmsea.ci.upper |
---|---|---|---|---|---|
105976 | 3600 | 105897 | 0.04088 | 0.03649 | 0.0454 |
rmsea.pvalue | srmr |
---|---|
0.9996 | 0.03033 |
PE:
lhs | op | rhs | exo | est | se | z | pvalue |
---|---|---|---|---|---|---|---|
General | =~ | NotImportant | 0 | 1 | 0 | NA | NA |
General | =~ | NeverThink | 0 | 0.9484 | 0.04231 | 22.41 | 0 |
General | =~ | VeryInterested | 0 | 1.305 | 0.05165 | 25.27 | 0 |
Feature | =~ | LookFeatures | 0 | 1 | 0 | NA | NA |
Feature | =~ | InvestigateDepth | 0 | 1.168 | 0.03748 | 31.17 | 0 |
Feature | =~ | SomeAreBetter | 0 | 0.8216 | 0.03259 | 25.21 | 0 |
Feature | =~ | LearnAboutOptions | 0 | 1.119 | 0.03606 | 31.02 | 0 |
Image | =~ | OthersOpinion | 0 | 1 | 0 | NA | NA |
Image | =~ | ExpressesPerson | 0 | 0.9629 | 0.02778 | 34.66 | 0 |
Image | =~ | TellAbout | 0 | 0.9075 | 0.02738 | 33.15 | 0 |
Image | =~ | MatchImage | 0 | 0.8499 | 0.02674 | 31.79 | 0 |
NotImportant | ~~ | NotImportant | 0 | 0.6575 | 0.01963 | 33.5 | 0 |
NeverThink | ~~ | NeverThink | 0 | 0.7964 | 0.02214 | 35.97 | 0 |
VeryInterested | ~~ | VeryInterested | 0 | 0.4631 | 0.02156 | 21.48 | 0 |
LookFeatures | ~~ | LookFeatures | 0 | 0.6568 | 0.01933 | 33.97 | 0 |
InvestigateDepth | ~~ | InvestigateDepth | 0 | 0.5543 | 0.01939 | 28.59 | 0 |
SomeAreBetter | ~~ | SomeAreBetter | 0 | 0.7794 | 0.02067 | 37.7 | 0 |
LearnAboutOptions | ~~ | LearnAboutOptions | 0 | 0.5325 | 0.01824 | 29.2 | 0 |
OthersOpinion | ~~ | OthersOpinion | 0 | 0.6399 | 0.01995 | 32.07 | 0 |
ExpressesPerson | ~~ | ExpressesPerson | 0 | 0.4761 | 0.01614 | 29.5 | 0 |
TellAbout | ~~ | TellAbout | 0 | 0.56 | 0.01713 | 32.7 | 0 |
MatchImage | ~~ | MatchImage | 0 | 0.5991 | 0.01737 | 34.5 | 0 |
General | ~~ | General | 0 | 0.3366 | 0.0213 | 15.8 | 0 |
Feature | ~~ | Feature | 0 | 0.4462 | 0.02388 | 18.68 | 0 |
Image | ~~ | Image | 0 | 0.5908 | 0.02801 | 21.09 | 0 |
General | ~~ | Feature | 0 | 0.217 | 0.01236 | 17.56 | 0 |
General | ~~ | Image | 0 | 0.2312 | 0.01333 | 17.35 | 0 |
Feature | ~~ | Image | 0 | 0.2023 | 0.01292 | 15.65 | 0 |
Model1 <- " Int =~ NotImportant + NeverThink + VeryInterested + LookFeatures + InvestigateDepth + SomeAreBetter + LearnAboutOptions + OthersOpinion + ExpressesPerson + TellAbout + MatchImage"
fit1 <- cfa(Model1, data=data)
pander(summary(fit1, fit.measures=TRUE))
lavaan 0.6-6 ended normally after 33 iterations

  Estimator: ML
  Optimization method: NLMINB
  Number of free parameters: 22
  Number of observations: 3600

Model Test User Model:
  Test statistic: 3284.581
  Degrees of freedom: 44
  P-value (Chi-square): 0.000

Model Test Baseline Model:
  Test statistic: 9920.901
  Degrees of freedom: 55
  P-value: 0.000

User Model versus Baseline Model:
  Comparative Fit Index (CFI): 0.672
  Tucker-Lewis Index (TLI): 0.589

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0): -54384.354
  Loglikelihood unrestricted model (H1): -52742.064
  Akaike (AIC): 108812.709
  Bayesian (BIC): 108948.860
  Sample-size adjusted Bayesian (BIC): 108878.955

Root Mean Square Error of Approximation:
  RMSEA: 0.143
  90 percent confidence interval: 0.139 to 0.147
  P-value RMSEA <= 0.05: 0.000

Standardized Root Mean Square Residual:
  SRMR: 0.102

Parameter Estimates:
  Standard errors: Standard
  Information: Expected
  Information saturated (h1) model: Structured

Latent Variables (Estimate, Std.Err, z-value, P(>|z|)):
  Int =~
    NotImportant: 1.000
    NeverThink: 0.913, 0.058, 15.851, 0.000
    VeryInterested: 1.475, 0.071, 20.794, 0.000
    LookFeatures: 1.251, 0.066, 19.028, 0.000
    InvestigateDepth: 1.355, 0.069, 19.534, 0.000
    SomeAreBetter: 0.979, 0.059, 16.673, 0.000
    LearnAboutOptions: 1.341, 0.068, 19.734, 0.000
    OthersOpinion: 1.553, 0.076, 20.502, 0.000
    ExpressesPerson: 1.465, 0.070, 20.789, 0.000
    TellAbout: 1.405, 0.069, 20.335, 0.000
    MatchImage: 1.312, 0.066, 19.815, 0.000

Variances (Estimate, Std.Err, z-value, P(>|z|)):
  .NotImportant: 0.821, 0.020, 40.286, 0.000
  .NeverThink: 0.955, 0.023, 40.895, 0.000
  .VeryInterested: 0.660, 0.018, 36.603, 0.000
  .LookFeatures: 0.832, 0.021, 39.117, 0.000
  .InvestigateDepth: 0.845, 0.022, 38.601, 0.000
  .SomeAreBetter: 0.915, 0.023, 40.584, 0.000
  .LearnAboutOptions: 0.780, 0.020, 38.362, 0.000
  .OthersOpinion: 0.813, 0.022, 37.195, 0.000
  .ExpressesPerson: 0.652, 0.018, 36.614, 0.000
  .TellAbout: 0.705, 0.019, 37.490, 0.000
  .MatchImage: 0.728, 0.019, 38.259, 0.000
  Int: 0.173, 0.015, 11.794, 0.000
FIT:
npar | fmin | chisq | df | pvalue | baseline.chisq | baseline.df |
---|---|---|---|---|---|---|
22 | 0.4562 | 3285 | 44 | 0 | 9921 | 55 |
baseline.pvalue | cfi | tli | logl | unrestricted.logl | aic |
---|---|---|---|---|---|
0 | 0.6715 | 0.5894 | -54384 | -52742 | 108813 |
bic | ntotal | bic2 | rmsea | rmsea.ci.lower | rmsea.ci.upper |
---|---|---|---|---|---|
108949 | 3600 | 108879 | 0.143 | 0.1389 | 0.1472 |
rmsea.pvalue | srmr |
---|---|
0 | 0.1019 |
PE:
lhs | op | rhs | exo | est | se | z | pvalue |
---|---|---|---|---|---|---|---|
Int | =~ | NotImportant | 0 | 1 | 0 | NA | NA |
Int | =~ | NeverThink | 0 | 0.9127 | 0.05758 | 15.85 | 0 |
Int | =~ | VeryInterested | 0 | 1.475 | 0.07095 | 20.79 | 0 |
Int | =~ | LookFeatures | 0 | 1.251 | 0.06573 | 19.03 | 0 |
Int | =~ | InvestigateDepth | 0 | 1.355 | 0.06937 | 19.53 | 0 |
Int | =~ | SomeAreBetter | 0 | 0.9794 | 0.05874 | 16.67 | 0 |
Int | =~ | LearnAboutOptions | 0 | 1.341 | 0.06795 | 19.73 | 0 |
Int | =~ | OthersOpinion | 0 | 1.553 | 0.07575 | 20.5 | 0 |
Int | =~ | ExpressesPerson | 0 | 1.465 | 0.07049 | 20.79 | 0 |
Int | =~ | TellAbout | 0 | 1.405 | 0.06908 | 20.34 | 0 |
Int | =~ | MatchImage | 0 | 1.312 | 0.06622 | 19.82 | 0 |
NotImportant | ~~ | NotImportant | 0 | 0.821 | 0.02038 | 40.29 | 0 |
NeverThink | ~~ | NeverThink | 0 | 0.9549 | 0.02335 | 40.89 | 0 |
VeryInterested | ~~ | VeryInterested | 0 | 0.6598 | 0.01802 | 36.6 | 0 |
LookFeatures | ~~ | LookFeatures | 0 | 0.8322 | 0.02127 | 39.12 | 0 |
InvestigateDepth | ~~ | InvestigateDepth | 0 | 0.8453 | 0.0219 | 38.6 | 0 |
SomeAreBetter | ~~ | SomeAreBetter | 0 | 0.9146 | 0.02254 | 40.58 | 0 |
LearnAboutOptions | ~~ | LearnAboutOptions | 0 | 0.7797 | 0.02032 | 38.36 | 0 |
OthersOpinion | ~~ | OthersOpinion | 0 | 0.8134 | 0.02187 | 37.19 | 0 |
ExpressesPerson | ~~ | ExpressesPerson | 0 | 0.6522 | 0.01781 | 36.61 | 0 |
TellAbout | ~~ | TellAbout | 0 | 0.7051 | 0.01881 | 37.49 | 0 |
MatchImage | ~~ | MatchImage | 0 | 0.7279 | 0.01903 | 38.26 | 0 |
Int | ~~ | Int | 0 | 0.1731 | 0.01468 | 11.79 | 0 |
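To compare the two models formally: the 1-factor model is nested in the 3-factor model, so lavaan's anova() method gives a chi-square difference test; the AIC and BIC gaps above (roughly 3,000 in favor of the 3-factor model) point the same way. A minimal sketch:

```r
anova(fit1, fit3)  # likelihood-ratio (chi-square difference) test of the nested models
```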
=============================================
---
title: "Factor Analysis Case Study"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: fill
social : ["facebook","twitter","linkedin", "menu"]
source_code: embed
---
Introduction {data-navmenu="LearnMore"}
======================================
Source of the content: http://r-marketing.r-forge.r-project.org/Instructor/Intro%20Factor%20Analysis/intro-factor-analysis.pdf
Factor Analysis: Basic Framework
From the original variables, factor analysis (FA) tries to find a smaller number of derived variables (factors) that meet these conditions:
1. Maximally capture the correlations among the original variables (after accounting for error)
2. Each factor is associated clearly with a subset of the variables
3. Each variable is associated clearly with (ideally) only one factor
4. The factors are maximally differentiated from one another
These conditions are rarely met perfectly in practice, but when they are approximated, the solution is close to “simple structure” and is easy to interpret.
Another way to look at FA is that it seeks latent variables. A latent
variable is an unobservable data generating process — such as a
mental state — that is manifested in measurable quantities (such as
survey items).
The product interest survey was designed to assess three latent
variables:
General interest in a product category
Detailed interest in specific features
Interest in the product as an “image” product
Each of those is assessed with multiple items because any single
item is imperfect.
People often confuse exploratory factor analysis (EFA) with confirmatory factor analysis (CFA). In brief: EFA discovers how many factors underlie the data and which variables load on them, while CFA tests whether a pre-specified factor structure fits the data.
KEY TERMS IN FACTOR ANALYSIS
- Latent variable: a presumed cognitive or data-generating process that leads to observable data; often a theoretical construct. Example: product interest. Diagram symbol: a circle or oval, such as F1.
- Factor: a dimensional reduction that estimates a latent variable and its relationship to manifest variables. Example: InterestFactor.
- Loading: the strength of the relationship between a factor and a variable; ranges from -1.0 to 1.0, like Pearson's r. Example: F1 → v1 = 0.45.
A typical factor analysis workflow is shown in the accompanying diagram.
Let us perform factor analysis on a dataset. The dataset contains 11 items of simulated product interest and engagement (PIES) data, rated on a 7-point Likert-type scale. We will determine the appropriate number of factors and the variables' loadings on them.
Some items' scoring was transformed before analysis.
Dataset {data-navmenu="LearnMore"}
===============================================================
Column {.tabset}
-----------------------------------------
### Data Table
```{r}
library(pander)
data=read.csv("factoranalysis.csv", header=T, stringsAsFactors = FALSE)
colnames(data)=c("NotImportant","NeverThink","VeryInterested","LookFeatures","InvestigateDepth","SomeAreBetter","LearnAboutOptions","OthersOpinion","ExpressesPerson","TellAbout","MatchImage")
DT::datatable(data, caption="Data View of simulated product interest and engagement data",
filter="top")
```
### Dimension
```{r}
dim(data)
```
### Variable Names
```{r}
pander(colnames(data))
```
### Summary
```{r}
pander(summary(data))
```
### Correlation Plot
```{r}
corrplot::corrplot(cor(data), diag = FALSE)
```
Determining Number of Factors {data-navmenu="LearnMore"}
==========================================================
There is usually no definitive answer; choosing the number of factors is partly a matter of usefulness.
Generally, look for consensus among:
- Theory: how many do you expect?
- Correlation matrix: how many seem to be there?
- Eigenvalues: how many Factors have Eigenvalue > 1?
- Eigenvalue scree plot: where is the “bend” in extraction?
- Parallel analysis and acceleration (see the Parallel Analysis tab)
Column {.tabset}
-------------------------------------
### Eigenvalue
In factor analysis, an eigenvalue is the proportion of total shared (i.e., non-error) variance explained by each factor.
A factor is only useful if it explains more variance than a single variable contributes . . . and thus has an eigenvalue > 1.0.
```{r}
pander(eigen(cor(data))$values)
```
Three eigenvalues exceed 1.0, so this rule of thumb suggests 3 factors in the data.
### ScreePlot
```{r}
pc <- princomp(data, cor=TRUE)  # note: prcomp() has no 'cor' argument; princomp(cor=TRUE) works on the correlation matrix
screeplot(pc, type="lines")
```
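### Parallel Analysis

Parallel analysis compares the observed eigenvalues with eigenvalues of random data of the same dimensions; a minimal sketch, assuming the psych package is installed:

```{r}
psych::fa.parallel(data, fa="fa")
```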
Factor Rotation Model {data-navmenu="LearnMore"}
=========================================
EFA can be thought of as slicing a pizza. The same material (variance) can be carved up in ways that are mathematically identical but more or less useful for a given situation.
Key decision: do you want the extracted factors to be correlated or not? In FA jargon, orthogonal or oblique?
By default, EFA looks for orthogonal factors with r = 0 correlation. This maximizes interpretability, so orthogonal rotation is recommended in most cases, at least to start.
Some rotation options:
- varimax (default): orthogonal rotation that aims for clear factor/variable structure. Generally recommended.
- oblimin (oblique): finds correlated factors while aiming for interpretability. Recommended if you want an oblique solution.
- promax (oblique): finds correlated factors similarly, but is computationally different. A recommended alternative if oblimin is unavailable or has difficulty.
- Many others: dozens have been developed. They are useful mostly when you're very concerned about psychometrics.
Fitting the Model {data-navmenu="LearnMore"}
===================================================
Column {.tabset}
---------------------------------------
### Fitting the Model
Let us fit the model with orthogonal rotation.
```{r echo=TRUE}
library(psych)
data.fa <- fa(data, nfactors=3, rotate="varimax")
pander(summary(data.fa))
```
Summary of the Model
```{r echo=TRUE}
data.fa
```
### Factor Loadings on Manifest Variables
Generally, a factor loading on a manifest variable is considered meaningful when its absolute value exceeds 0.30; smaller loadings are blanked out below:
```{r echo=FALSE}
L <- round(data.fa$loadings, digits = 3)
struct <- ifelse(abs(L) > 0.3, as.numeric(L), ' ')  # blank out loadings with |value| <= 0.3
pander(struct)
```
### Visualization
```{r echo=FALSE}
fa.diagram(data.fa)
```
Now we can name these factors based on which items load on each. Suggestions?
You can also repeat the previous step with different rotations and compare the results in terms of interpretability, as in the sketch below.
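For example (a sketch, not evaluated here; oblimin requires the GPArotation package):

```{r echo=TRUE, eval=FALSE}
fa(data, nfactors=3, rotate="oblimin")  # oblique: allows correlated factors
```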
### Use the Factor Scores for Each Respondent
```{r}
fa.scores <- data.frame(data.fa$scores)
names(fa.scores) <- c("ImageF", "FeatureF", "GeneralF")
pander(head(fa.scores))
pander(tail(fa.scores))
```
Confirmatory Factor Analysis {data-navmenu="LearnMore"}
========================================
Column {.tabset}
----------------------------------------------
### CFA Introduction
Confirmatory Factor Analysis
CFA is a special case of structural equation modeling (SEM), applied to latent variable assessment, usually for surveys and similar data. It is used to:
1. Assess the structure of survey scales — do items load where one would hope?
2. Evaluate the fit/appropriateness of a factor model — is a proposed model better than alternatives?
3. Evaluate the weights of items relative to one another and to a scale — do they contribute equally?
4. Model other effects, such as method effects and hierarchical relationships.
Steps in CFA:
1. Define your hypothesized/favored model, with relationships of latent variables to manifest variables.
2. Define one or more alternative models that are reasonable, but which you believe are inferior.
3. Fit the models to your data.
4. Determine whether your model is good enough (fit indices, paths).
5. Determine whether your model is better than the alternatives.
6. Interpret your model.
Let's fit a 3-factor model to our data and compare it with the 1-factor model in the next tab.
### Model Fit Measures
Model fit indices are measures used to evaluate and compare CFA models. The most frequently used are:
Global fit indices
Example: Comparative Fit Index (CFI). Attempts to assess “absolute” fit vs. the data. Not a very good measure on its own, but it sets a minimum bar: want fit > 0.90.
Approximation error and residuals
Example: Standardized Root Mean Square Residual (SRMR), the difference between the data's covariance matrix and the fitted model's matrix. Want SRMR < 0.08. For the Root Mean Square Error of Approximation, want Lower-CI(RMSEA) < 0.05.
Information criteria
Example: Akaike Information Criterion (AIC). Assesses the model's fit vs. the observed data. There is no absolute interpretation, but lower is better; a difference of 10 or more is large.
### 3 Factor Model
```{r echo=TRUE}
library(lavaan)
Model3 <- " General =~ NotImportant + NeverThink + VeryInterested
Feature =~ LookFeatures + InvestigateDepth + SomeAreBetter + LearnAboutOptions
Image =~ OthersOpinion + ExpressesPerson + TellAbout + MatchImage"
fit3 <- cfa(Model3, data=data)
pander(summary(fit3, fit.measures=TRUE))
```
### 1 Factor Model
```{r echo=TRUE}
Model1 <- " Int =~ NotImportant + NeverThink + VeryInterested + LookFeatures + InvestigateDepth + SomeAreBetter + LearnAboutOptions + OthersOpinion + ExpressesPerson + TellAbout + MatchImage"
fit1 <- cfa(Model1, data=data)
pander(summary(fit1, fit.measures=TRUE))
```
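### Model Comparison

The 1-factor model is nested in the 3-factor model, so a chi-square difference test applies; lavaan's anova() method computes it, and AIC/BIC can also be compared directly (lower is better). A minimal sketch:

```{r}
anova(fit1, fit3)
```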
### Model Paths
```{r}
semPlot::semPaths(fit3, "std")
```
About Us {data-navmenu="LearnMore"}
=====================================================
### DASHBOARD PREPARED BY (contact for machine learning training, coaching, consulting & the complete analysis of this case study)
* Dr AMITA SHARMA
Post Doc from Erasmus University, Rotterdam, the Netherlands
Assistant Professor
Institute of Agri Business Management,
Swami Keshwanand Rajasthan Agricultural University,
Bikaner (Raj), India
Blog: www.thinkingai.in
* ARUN KUMAR SHARMA
Machine Learning Enthusiast
13 years of financial services marketing experience
Blogger, writer and machine learning consultant
Certified Business Analytics Professional
Certified in Predictive Analytics, Indian Institute of Management (IIMx Bangalore)
Certified in Macroeconomic Forecasting, International Monetary Fund (IMFx)
Certified in Text Analytics, openSAP
Email: aks10000@gmail.com
Tel: 9468567418