Introduction



Source of the content: http://r-marketing.r-forge.r-project.org/Instructor/Intro%20Factor%20Analysis/intro-factor-analysis.pdf

Factor Analysis: Basic Framework

From the original variables, factor analysis (FA) tries to find a smaller number of derived variables (factors) that meet these conditions:

1. Maximally capture the correlations among the original variables (after accounting for error)
2. Each factor is associated clearly with a subset of the variables
3. Each variable is associated clearly with (ideally) only one factor
4. The factors are maximally differentiated from one another


These conditions are rarely met perfectly in practice, but when they are approximated, the solution comes close to "simple structure" and is very interpretable.

Another way to look at FA is that it seeks latent variables. A latent variable is an unobservable data generating process — such as a mental state — that is manifested in measurable quantities (such as survey items).

The product interest survey was designed to assess three latent variables:

- General interest in a product category
- Detailed interest in specific features
- Interest in the product as an "image" product

Each of those is assessed with multiple items because any single item is imperfect.



People often confuse the terms factor analysis / exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The following table shows the difference between the two:



KEY TERMS IN FACTOR ANALYSIS

Latent variable: a presumed cognitive or data generating process that leads to observable data. This is often a theoretical construct.
Example: Product interest. Symbol: circle/oval, such as F1.

Factor: a dimensional reduction that estimates a latent variable and its relationship to manifest variables.
Example: InterestFactor.

Loading: the strength of relationship between a factor and a variable.
Example: F1 → v1 = 0.45.
Ranges over [-1.0, 1.0], the same as Pearson's r.


The following diagram shows a typical workflow of factor analysis:


Let us do factor analysis on a dataset. The dataset contains 11 items of simulated product interest and engagement data (PIES), rated on a 7-point Likert-type scale. We will determine the right number of factors and their variable loadings.

Some items' scores are transformed in the following way:




Dataset


Data Table
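The interactive data table cannot be reproduced in this text rendering; it is generated by the following chunk (from the dashboard's embedded source), which also loads the data used throughout:

```r
library(pander)

# Load the simulated PIES survey responses and assign readable item names
data <- read.csv("factoranalysis.csv", header = TRUE, stringsAsFactors = FALSE)
colnames(data) <- c("NotImportant", "NeverThink", "VeryInterested",
                    "LookFeatures", "InvestigateDepth", "SomeAreBetter",
                    "LearnAboutOptions", "OthersOpinion", "ExpressesPerson",
                    "TellAbout", "MatchImage")

# Interactive, filterable table of the raw responses
DT::datatable(data,
              caption = "Data view of simulated product interest and engagement data",
              filter = "top")
```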

Dimension

[1] 3600   11

Variable Names

NotImportant, NeverThink, VeryInterested, LookFeatures, InvestigateDepth, SomeAreBetter, LearnAboutOptions, OthersOpinion, ExpressesPerson, TellAbout and MatchImage

Summary

Item                 Min.  1st Qu.  Median   Mean  3rd Qu.  Max.
NotImportant            1        4       4  4.339        5     7
NeverThink              1        3       4  4.104        5     7
VeryInterested          1        3       4  4.11         5     7
LookFeatures            1        3       4  4.039        5     7
InvestigateDepth        1        3       4  3.999        5     7
SomeAreBetter           1        3       4  3.922        5     7
LearnAboutOptions       1        3       4  3.872        5     7
OthersOpinion           1        3       4  3.904        5     7
ExpressesPerson         1        3       4  4.023        5     7
TellAbout               1        3       4  3.9          5     7
MatchImage              1        3       4  3.853     4.25     7

Correlation Plot
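The correlation plot (an image in the dashboard) is produced by the following chunk (from the dashboard's embedded source). Blocks of highly correlated items hint at the factors to come:

```r
# Pairwise item correlations, upper/lower triangle without the diagonal
corrplot::corrplot(cor(data), diag = FALSE)
```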




Determining Number of Factors




There is usually not a definitive answer. Choosing the number of factors is partly a matter of usefulness.
Generally, look for consensus among:
- Theory: how many do you expect?
- Correlation matrix: how many seem to be there?
- Eigenvalues: how many factors have eigenvalue > 1?
- Eigenvalue scree plot: where is the "bend" in extraction?
- Parallel analysis and acceleration (see the sketch below)
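The dashboard does not include a parallel-analysis chunk; a minimal sketch using psych::fa.parallel (assuming the data frame loaded above) would be:

```r
library(psych)

# Parallel analysis: compares observed eigenvalues against eigenvalues of
# random data of the same size; factors above the random line are retained
fa.parallel(data, fm = "minres", fa = "both")
```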



Eigenvalue

In factor analysis, an eigenvalue is the proportion of total shared (i.e., non-error) variance explained by each factor.

A factor is only useful if it explains more than one variable's worth of variance, and thus has eigenvalue > 1.0.
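The eigenvalues below are computed from the item correlation matrix (chunk from the dashboard's embedded source):

```r
# Eigenvalues of the correlation matrix, in decreasing order
eigen(cor(data))$values
```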

3.661, 1.642, 1.275, 0.6881, 0.5801, 0.572, 0.5608, 0.5388, 0.529, 0.4834 and 0.4701

This rule of thumb suggests 3 factors in the data.

Scree Plot
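The scree plot comes from a principal components extraction. The embedded source calls prcomp(data, cor=TRUE), but cor is an argument of princomp, not prcomp; a corrected sketch:

```r
# Principal components on the correlation matrix, then a line scree plot;
# look for the "bend" where additional components add little variance
pc <- princomp(data, cor = TRUE)
screeplot(pc, type = "lines")
```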

Factor Rotation Model




EFA can be thought of as slicing a pizza: the same material (variance) can be carved up in ways that are mathematically identical, but more or less useful for a given situation.

Key decision: do you want the extracted factors to be correlated or not? In FA jargon, orthogonal or oblique?

By default, EFA looks for orthogonal factors with r = 0 correlation. This maximizes interpretability, so orthogonal rotation is recommended in most cases, at least to start.



Some rotation options:

- varimax (default): orthogonal rotation that aims for a clear factor/variable structure. Generally recommended.
- oblimin (oblique): finds correlated factors while aiming for interpretability. Recommended if you want an oblique solution.
- promax (oblique): also finds correlated factors, but is computationally different. A good alternative if oblimin is not available or has difficulty.
- Many others: dozens have been developed. They are useful mostly when you're very concerned about psychometrics.
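For example, an oblique solution can be requested simply by changing the rotate argument (a sketch, not part of the original dashboard; oblimin requires the GPArotation package):

```r
library(psych)

# Oblique rotation: factors are allowed to correlate
data.fa.ob <- fa(data, nfactors = 3, rotate = "oblimin")
data.fa.ob$Phi   # estimated factor intercorrelations
```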


Fitting the Model


Let us fit the model with orthogonal rotation.
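The model is fit with psych::fa (chunk from the dashboard's embedded source):

```r
library(psych)

# Extract 3 factors with orthogonal (varimax) rotation
data.fa <- fa(data, nfactors = 3, rotate = "varimax")
summary(data.fa)
```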

Factor analysis with Call: fa(r = data, nfactors = 3, rotate = "varimax")

Test of the hypothesis that 3 factors are sufficient. The degrees of freedom for the model are 25 and the objective function was 0. The number of observations was 3600, with Chi Square = 17.17 with prob < 0.88.

The root mean square of the residuals (RMSR) is 0. The df-corrected root mean square of the residuals is 0.01.

Tucker Lewis Index of factoring reliability = 1.002. RMSEA index = 0 and the 90% confidence intervals are 0 0.007. BIC = -187.54.


Summary of the Model

Factor Analysis using method =  minres
Call: fa(r = data, nfactors = 3, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                   MR1  MR2  MR3   h2   u2 com
NotImportant      0.14 0.12 0.67 0.49 0.51 1.1
NeverThink        0.10 0.10 0.61 0.40 0.60 1.1
VeryInterested    0.28 0.36 0.48 0.44 0.56 2.5
LookFeatures      0.15 0.61 0.10 0.40 0.60 1.2
InvestigateDepth  0.13 0.72 0.10 0.54 0.46 1.1
SomeAreBetter     0.07 0.52 0.09 0.28 0.72 1.1
LearnAboutOptions 0.13 0.68 0.15 0.50 0.50 1.2
OthersOpinion     0.67 0.14 0.13 0.48 0.52 1.2
ExpressesPerson   0.71 0.14 0.13 0.53 0.47 1.1
TellAbout         0.65 0.12 0.14 0.46 0.54 1.2
MatchImage        0.63 0.13 0.08 0.42 0.58 1.1

                       MR1  MR2  MR3
SS loadings           1.94 1.83 1.17
Proportion Var        0.18 0.17 0.11
Cumulative Var        0.18 0.34 0.45
Proportion Explained  0.39 0.37 0.24
Cumulative Proportion 0.39 0.76 1.00

Mean item complexity =  1.3
Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are  55  and the objective function was  2.76 with Chi Square of  9905.74
The degrees of freedom for the model are 25  and the objective function was  0 

The root mean square of the residuals (RMSR) is  0 
The df corrected root mean square of the residuals is  0.01 

The harmonic number of observations is  3600 with the empirical chi square  9.78  with prob <  1 
The total number of observations was  3600  with Likelihood Chi Square =  17.17  with prob <  0.88 

Tucker Lewis Index of factoring reliability =  1.002
RMSEA index =  0  and the 90 % confidence intervals are  0 0.007
BIC =  -187.54
Fit based upon off diagonal values = 1
Measures of factor score adequacy             
                                                   MR1  MR2  MR3
Correlation of (regression) scores with factors   0.87 0.86 0.79
Multiple R square of scores with factors          0.75 0.73 0.62
Minimum correlation of possible factor scores     0.50 0.46 0.24

Factor Loadings on Manifest Variables



Generally, a factor's loading on a manifest variable is considered meaningful when it exceeds 0.30; smaller loadings are suppressed in the table below:

                     MR1    MR2    MR3
NotImportant                      0.674
NeverThink                        0.614
VeryInterested             0.362  0.476
LookFeatures               0.607
InvestigateDepth           0.716
SomeAreBetter              0.518
LearnAboutOptions          0.678
OthersOpinion       0.665
ExpressesPerson     0.706
TellAbout           0.655
MatchImage          0.633
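The thresholded table was produced by rounding the loadings and blanking out values at or below 0.30 (chunk from the dashboard's embedded source):

```r
# Show only loadings above the 0.30 rule-of-thumb threshold
L <- round(data.fa$loadings, digits = 3)
struct <- ifelse(L > 0.3, as.numeric(L), ' ')
pander::pander(struct)
```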

Visualization
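The factor diagram (an image in the dashboard) is drawn with psych::fa.diagram (chunk from the dashboard's embedded source):

```r
# Path diagram linking each factor to its strongest manifest variables
fa.diagram(data.fa)
```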



Now we can give names to these factors. Suggest some! Based on which items load on each factor, natural candidates are the three constructs the survey was designed to measure: Image, Feature, and General interest.

In the sixth step, you can repeat the previous step with different rotations and compare the results from the point of view of model interpretation.


Use the Factor Scores for Each Respondent
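Factor scores are extracted from the fitted object and renamed (chunk from the dashboard's embedded source):

```r
# One row of scores per respondent, one column per factor
fa.scores <- data.frame(data.fa$scores)
names(fa.scores) <- c("ImageF", "FeatureF", "GeneralF")
head(fa.scores)
tail(fa.scores)
```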


First six respondents:

      ImageF   FeatureF  GeneralF
1     0.5111   -1.241     0.7928
2    -0.06877   0.2804    0.6641
3    -0.3027   -0.1043   -0.8785
4    -0.8661   -1.106     0.4226
5    -0.692    -0.08929  -0.403
6     1.534    -0.3898   -0.06599

Last six respondents:

      ImageF   FeatureF  GeneralF
3595 -0.0229   -0.204    -0.08209
3596  0.2773    0.4208    0.4915
3597  1.938    -1.261     0.3646
3598 -0.4837    0.4699    1.331
3599 -0.5671    0.9789    0.2916
3600 -0.9219    0.6484    1.162

Confirmatory Factor Analysis


CFA Introduction





CFA is a special case of structural equation modeling (SEM), applied to latent variable assessment, usually for surveys and similar data. It is typically used to:

1. Assess the structure of survey scales — do items load where one would hope?

2. Evaluate the fit / appropriateness of a factor model — is a proposed model better than alternatives?

3. Evaluate the weights of items relative to one another and a scale — do they contribute equally?

4. Model other effects, such as method effects and hierarchical relationships.


Steps in CFA


1. Define your hypothesized/favored model with relationships of latent variables to manifest variables.

2. Define one or more alternative models that are reasonable, but which you believe are inferior.

3. Fit the models to your data.

4. Determine whether your model is good enough (fit indices, paths).

5. Determine whether your model is better than the alternatives.

6. Interpret your model.




Let's fit a 3-factor model to our data and compare it with a 1-factor model; both are specified below.
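The two competing models are specified in lavaan syntax (chunks from the dashboard's embedded source, lightly reformatted):

```r
library(lavaan)

# 3-factor model: items load on the latent variable they were written for
Model3 <- "
  General =~ NotImportant + NeverThink + VeryInterested
  Feature =~ LookFeatures + InvestigateDepth + SomeAreBetter + LearnAboutOptions
  Image   =~ OthersOpinion + ExpressesPerson + TellAbout + MatchImage
"

# 1-factor alternative: a single latent interest factor for all 11 items
Model1 <- "
  Int =~ NotImportant + NeverThink + VeryInterested + LookFeatures +
         InvestigateDepth + SomeAreBetter + LearnAboutOptions +
         OthersOpinion + ExpressesPerson + TellAbout + MatchImage
"
```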





Model Fit Measures




Model fit indices are measures used to evaluate and compare CFA models.

The following fit indices are most frequently used:

Global fit indices

Example: Comparative Fit Index (CFI). Global indices attempt to assess "absolute" fit to the data. They are not very good measures on their own, but they set a minimum bar: want fit > 0.90.

Approximation error and residuals

Example: Standardized Root Mean Square Residual (SRMR). Difference between the data’s covariance matrix and the fitted model’s matrix. Want SRMR < 0.08. For Root Mean Square Error of Approximation, want Lower-CI(RMSEA) < 0.05.

Information Criteria

Example: Akaike Information Criterion (AIC). Assesses the model’s fit vs. the observed data. No absolute interpretation, but lower is better. Difference of 10 or more is large.


3 Factor Model
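The 3-factor model is fit and summarized with fit measures (chunk from the dashboard's embedded source):

```r
# Fit the hypothesized 3-factor CFA and report global fit indices
fit3 <- cfa(Model3, data = data)
summary(fit3, fit.measures = TRUE)
```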



lavaan 0.6-6 ended normally after 36 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of free parameters                         25

  Number of observations                          3600

Model Test User Model:

  Test statistic                               287.649
  Degrees of freedom                                41
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                              9920.901
  Degrees of freedom                                55
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.975
  Tucker-Lewis Index (TLI)                       0.966

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -52885.888
  Loglikelihood unrestricted model (H1)     -52742.064

  Akaike (AIC)                              105821.776
  Bayesian (BIC)                            105976.494
  Sample-size adjusted Bayesian (BIC)       105897.056

Root Mean Square Error of Approximation:

  RMSEA                                          0.041
  90 Percent confidence interval - lower         0.036
  90 Percent confidence interval - upper         0.045
  P-value RMSEA <= 0.05                          1.000

Standardized Root Mean Square Residual:

  SRMR                                           0.030

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  General =~
    NotImportant      1.000
    NeverThink        0.948    0.042   22.415    0.000
    VeryInterested    1.305    0.052   25.268    0.000
  Feature =~
    LookFeatures      1.000
    InvestigatDpth    1.168    0.037   31.168    0.000
    SomeAreBetter     0.822    0.033   25.211    0.000
    LearnAbotOptns    1.119    0.036   31.022    0.000
  Image =~
    OthersOpinion     1.000
    ExpressesPersn    0.963    0.028   34.657    0.000
    TellAbout         0.908    0.027   33.146    0.000
    MatchImage        0.850    0.027   31.786    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  General ~~
    Feature           0.217    0.012   17.561    0.000
    Image             0.231    0.013   17.348    0.000
  Feature ~~
    Image             0.202    0.013   15.650    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .NotImportant      0.657    0.020   33.498    0.000
   .NeverThink        0.796    0.022   35.967    0.000
   .VeryInterested    0.463    0.022   21.479    0.000
   .LookFeatures      0.657    0.019   33.973    0.000
   .InvestigatDpth    0.554    0.019   28.588    0.000
   .SomeAreBetter     0.779    0.021   37.701    0.000
   .LearnAbotOptns    0.533    0.018   29.199    0.000
   .OthersOpinion     0.640    0.020   32.071    0.000
   .ExpressesPersn    0.476    0.016   29.501    0.000
   .TellAbout         0.560    0.017   32.697    0.000
   .MatchImage        0.599    0.017   34.500    0.000
    General           0.337    0.021   15.799    0.000
    Feature           0.446    0.024   18.684    0.000
    Image             0.591    0.028   21.092    0.000


1 Factor Model
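The competing 1-factor model is fit the same way (chunk from the dashboard's embedded source):

```r
# Fit the single-factor alternative for comparison
fit1 <- cfa(Model1, data = data)
summary(fit1, fit.measures = TRUE)
```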



lavaan 0.6-6 ended normally after 33 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of free parameters                         22

  Number of observations                          3600

Model Test User Model:

  Test statistic                              3284.581
  Degrees of freedom                                44
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                              9920.901
  Degrees of freedom                                55
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.672
  Tucker-Lewis Index (TLI)                       0.589

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -54384.354
  Loglikelihood unrestricted model (H1)     -52742.064

  Akaike (AIC)                              108812.709
  Bayesian (BIC)                            108948.860
  Sample-size adjusted Bayesian (BIC)       108878.955

Root Mean Square Error of Approximation:

  RMSEA                                          0.143
  90 Percent confidence interval - lower         0.139
  90 Percent confidence interval - upper         0.147
  P-value RMSEA <= 0.05                          0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.102

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  Int =~
    NotImportant      1.000
    NeverThink        0.913    0.058   15.851    0.000
    VeryInterested    1.475    0.071   20.794    0.000
    LookFeatures      1.251    0.066   19.028    0.000
    InvestigatDpth    1.355    0.069   19.534    0.000
    SomeAreBetter     0.979    0.059   16.673    0.000
    LearnAbotOptns    1.341    0.068   19.734    0.000
    OthersOpinion     1.553    0.076   20.502    0.000
    ExpressesPersn    1.465    0.070   20.789    0.000
    TellAbout         1.405    0.069   20.335    0.000
    MatchImage        1.312    0.066   19.815    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .NotImportant      0.821    0.020   40.286    0.000
   .NeverThink        0.955    0.023   40.895    0.000
   .VeryInterested    0.660    0.018   36.603    0.000
   .LookFeatures      0.832    0.021   39.117    0.000
   .InvestigatDpth    0.845    0.022   38.601    0.000
   .SomeAreBetter     0.915    0.023   40.584    0.000
   .LearnAbotOptns    0.780    0.020   38.362    0.000
   .OthersOpinion     0.813    0.022   37.195    0.000
   .ExpressesPersn    0.652    0.018   36.614    0.000
   .TellAbout         0.705    0.019   37.490    0.000
   .MatchImage        0.728    0.019   38.259    0.000
    Int               0.173    0.015   11.794    0.000

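The 3-factor model is clearly better: CFI 0.975 vs. 0.672, SRMR 0.030 vs. 0.102, and an AIC roughly 2,990 points lower. A likelihood-ratio (chi-square difference) test of the nested models can confirm this (a sketch, not in the original dashboard):

```r
# Chi-square difference test for the nested 1- vs. 3-factor models
anova(fit3, fit1)
```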

Model Paths
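The path diagram of the 3-factor model, with standardized estimates, is drawn with semPlot (chunk from the dashboard's embedded source):

```r
# Standardized solution: thicker paths indicate stronger loadings
semPlot::semPaths(fit3, "std")
```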

About Us

DASHBOARD PREPARED BY (CONTACT FOR MACHINE LEARNING TRAINING, COACHING, CONSULTING & COMPLETE ANALYSIS OF THIS CASE STUDY)

* Dr AMITA SHARMA — Post Doc from Erasmus University, Rotterdam, the Netherlands; Assistant Professor, Institute of Agri Business Management, Swami Keshwanand Rajasthan Agricultural University, Bikaner (Raj.), India. Blog: www.thinkingai.in

* ARUN KUMAR SHARMA — Machine Learning Enthusiast; 13 years of financial services marketing experience; blogger, writer, and machine learning consultant. Certified Business Analytics Professional; certified in Predictive Analytics, Indian Institute of Management (IIMx Bangalore); certified in Macroeconomic Forecasting, International Monetary Fund (IMFx); certified in Text Analytics, openSAP. Email: aks10000@gmail.com. Tel: 9468567418
