Import Data

library(rio)
library(psych)
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(corpcor)
library(GPArotation)
GeoScience <- read.csv("/Users/Lorraine/Desktop/Geoscience.csv")
Geo <- GeoScience[, 1:13]

1. How many factors should be interpreted? Report the relevant statistics (e.g., scree plot, parallel analysis, percent of variance) and rationale for your conclusion (based on the statistics and interpretability of the solution).

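# Scree plot: eigenvalues from an unrotated solution with all 13 components
pc1 <- principal(Geo, nfactors = 13, rotate = "none")
plot(pc1$values, type = "b")

# Parallel analysis: principal-axis factoring, 500 simulated data sets
fa.parallel(Geo, fm = "pa", fa = "fa", n.iter = 500)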

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA

The scree plot, based on the full 13-component solution, suggested retaining the first 4 or 5 factors, since the slope is relatively steep across them. However, the parallel analysis suggested retaining 3 factors, because only the first 3 factors explain more variance than the corresponding factors from simulated data. Therefore, I retained 3 factors. The 3-factor solution reported under Question 2 accounts for 38% of the total variance.
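The comparison that parallel analysis makes can be sketched in base R. This is a minimal illustration of the principal-components variant of Horn's parallel analysis, not a reproduction of what `fa.parallel(..., fa = "fa")` computes (that call works with factor eigenvalues from a reduced correlation matrix). The data set `toy` is simulated and hypothetical; it is not the Geoscience data, and the names `n_iter`, `sim_mean`, and `n_retain` are chosen here for illustration only.

```r
# Sketch of Horn's parallel analysis (components variant), base R only.
# `toy` is simulated, hypothetical data with one common factor.
set.seed(1)
n <- 137   # same sample size as the report, used only to size the toy data
p <- 6     # hypothetical number of items
f <- rnorm(n)
toy <- sapply(1:p, function(j) 0.7 * f + rnorm(n, sd = 0.7))

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(toy))$values

# Eigenvalues from uncorrelated random normal data of the same dimensions
n_iter <- 200
sim_eig <- replicate(n_iter, eigen(cor(matrix(rnorm(n * p), n, p)))$values)
sim_mean <- rowMeans(sim_eig)

# Retain leading dimensions whose observed eigenvalue exceeds the simulated mean;
# cumprod() stops the count at the first dimension that fails the comparison
n_retain <- sum(cumprod(obs_eig > sim_mean))
n_retain
```

Some implementations compare against the 95th percentile of the simulated eigenvalues rather than the mean; swapping `rowMeans(sim_eig)` for `apply(sim_eig, 1, quantile, 0.95)` would give that stricter rule.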

2. Turn in the table of factor loadings. Interpret the loadings in the Pattern Matrix. Mark the items used in your interpretation.

FAOB <- fa(Geo, fm="pa", nfactors = 3, rotate = "promax")
print.psych(FAOB, cut = .3, sort = TRUE)
## Factor Analysis using method =  pa
## Call: fa(r = Geo, nfactors = 3, rotate = "promax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        item   PA1   PA2   PA3     h2   u2 com
## item07    7  0.83             0.7113 0.29 1.2
## item06    6  0.83             0.5865 0.41 1.2
## item03    3  0.68             0.5195 0.48 1.4
## item08    8  0.43             0.2420 0.76 1.3
## item12   12                   0.0525 0.95 1.9
## item09    9        0.83       0.7100 0.29 1.1
## item10   10        0.74       0.5205 0.48 1.1
## item11   11        0.55       0.2715 0.73 1.1
## item13   13        0.39       0.2419 0.76 1.6
## item02    2              0.71 0.5614 0.44 1.1
## item01    1              0.55 0.3363 0.66 1.3
## item04    4              0.35 0.1752 0.82 1.4
## item05    5                   0.0089 0.99 1.3
## 
##                        PA1  PA2  PA3
## SS loadings           2.04 1.74 1.16
## Proportion Var        0.16 0.13 0.09
## Cumulative Var        0.16 0.29 0.38
## Proportion Explained  0.41 0.35 0.23
## Cumulative Proportion 0.41 0.77 1.00
## 
##  With factor correlations of 
##      PA1  PA2  PA3
## PA1 1.00 0.34 0.23
## PA2 0.34 1.00 0.14
## PA3 0.23 0.14 1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 3 factors are sufficient.
## 
## The degrees of freedom for the null model are  78  and the objective function was  3.27 with Chi Square of  427.43
## The degrees of freedom for the model are 42  and the objective function was  0.61 
## 
## The root mean square of the residuals (RMSR) is  0.05 
## The df corrected root mean square of the residuals is  0.07 
## 
## The harmonic number of observations is  137 with the empirical chi square  61.69  with prob <  0.025 
## The total number of observations was  137  with Likelihood Chi Square =  78.4  with prob <  0.00056 
## 
## Tucker Lewis Index of factoring reliability =  0.803
## RMSEA index =  0.084  and the 90 % confidence intervals are  0.052 0.107
## BIC =  -128.24
## Fit based upon off diagonal values = 0.94
## Measures of factor score adequacy             
##                                                    PA1  PA2  PA3
## Correlation of (regression) scores with factors   0.92 0.90 0.83
## Multiple R square of scores with factors          0.84 0.81 0.69
## Minimum correlation of possible factor scores     0.68 0.63 0.38

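The pattern matrix shows that items 3, 6, 7, and 8 load on Factor 1; items 9, 10, 11, and 13 load on Factor 2; and items 1, 2, and 4 load on Factor 3. Items 5 and 12 have no loading of .30 or higher on any factor (communalities of 0.009 and 0.053, respectively), so they were not used in the interpretation.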
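As a check on this reading of the table, the printed loadings can be re-entered by hand and the item assignment derived from them. The sketch below uses only the values shown in the output above; loadings suppressed by `cut = .3` are unknown and are stored as `NA`. The object names (`L`, `assigned`, `unassigned`, `prop_var`) are illustrative. The last lines also recompute the "Proportion Var" row, assuming it equals each factor's SS loadings divided by the 13 items analyzed.

```r
# Loadings exactly as printed (cut = .3); suppressed cells are NA
items <- c("item07", "item06", "item03", "item08", "item12",
           "item09", "item10", "item11", "item13",
           "item02", "item01", "item04", "item05")
L <- matrix(NA_real_, nrow = 13, ncol = 3,
            dimnames = list(items, c("PA1", "PA2", "PA3")))
L[c("item07", "item06", "item03", "item08"), "PA1"] <- c(0.83, 0.83, 0.68, 0.43)
L[c("item09", "item10", "item11", "item13"), "PA2"] <- c(0.83, 0.74, 0.55, 0.39)
L[c("item02", "item01", "item04"), "PA3"] <- c(0.71, 0.55, 0.35)

# Assign each item to the factor with its largest printed loading;
# items with no printed loading stay unassigned (NA)
assigned <- apply(L, 1, function(r) {
  if (all(is.na(r))) NA_character_ else names(which.max(abs(r)))
})
by_factor <- split(names(assigned), assigned)
unassigned <- names(assigned)[is.na(assigned)]
by_factor
unassigned

# Proportion of total variance per factor: SS loadings / number of items
ss_loadings <- c(PA1 = 2.04, PA2 = 1.74, PA3 = 1.16)
prop_var <- ss_loadings / 13
round(prop_var, 2)
round(cumsum(prop_var), 2)
```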

3. Assign a label to each factor based on the content of the items with strong loadings. Explain the rationale for your label based on what is common among the items (note: your explanation should go beyond simply listing the item content).

I would label Factor 1 “Career interest in science” because all 4 items capture respondents’ attitudes toward a career in science or geoscience. Factor 2 should be labeled “Outdoor enjoyment” because the items loading on it measure individuals’ interest in outdoor activities. Factor 3 should be labeled “Knowledge in science” because all items loading on it capture people’s skills and knowledge in science.

4. Report the correlations among the factors.

##      PA1  PA2  PA3
## PA1 1.00 0.34 0.23
## PA2 0.34 1.00 0.14
## PA3 0.23 0.14 1.00

The factors are not strongly correlated: Factor 1 and Factor 2 are moderately correlated (r = .34), while Factor 1 and Factor 3 (r = .23) and Factor 2 and Factor 3 (r = .14) are only weakly correlated.
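To make the size of these correlations concrete, the matrix can be re-entered from the printed output and each unique pair listed with its squared correlation, which is the proportion of variance the two factors share. This is a small base-R sketch; `Phi`, `pairs_tbl`, and `shared_var` are names chosen here for illustration.

```r
# Factor correlation matrix copied from the fa() output above
fnames <- c("PA1", "PA2", "PA3")
Phi <- matrix(c(1.00, 0.34, 0.23,
                0.34, 1.00, 0.14,
                0.23, 0.14, 1.00),
              nrow = 3, byrow = TRUE, dimnames = list(fnames, fnames))

# One row per unique factor pair (upper triangle only)
idx <- which(upper.tri(Phi), arr.ind = TRUE)
pairs_tbl <- data.frame(
  f1 = fnames[idx[, "row"]],
  f2 = fnames[idx[, "col"]],
  r  = Phi[upper.tri(Phi)]
)

# Squared correlation = proportion of variance shared by the two factors
pairs_tbl$shared_var <- round(pairs_tbl$r^2, 2)
pairs_tbl
```

Even the largest correlation (PA1 with PA2) corresponds to roughly 12% shared variance, which is consistent with describing the factors as distinct but not fully independent.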

5. Using the results, identify the three (3) best items to represent each subscale.

Items that best represent Factor 1: items 3, 6, and 7

Items that best represent Factor 2: items 9, 10, and 11

Items that best represent Factor 3: items 1, 2, and 4
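One way to arrive at this selection is to rank the items on each factor by the absolute size of their pattern loading and keep the top three. The sketch below assumes that rule and uses only the loadings printed in the output for Question 2; `load_tbl` and `top3` are illustrative names.

```r
# Printed pattern loadings (values at or above the .3 cutoff only)
load_tbl <- data.frame(
  item    = c(7, 6, 3, 8, 9, 10, 11, 13, 2, 1, 4),
  factor  = rep(c("PA1", "PA2", "PA3"), times = c(4, 4, 3)),
  loading = c(0.83, 0.83, 0.68, 0.43, 0.83, 0.74, 0.55, 0.39, 0.71, 0.55, 0.35)
)

# For each factor, order items by |loading| and keep the three largest
top3 <- lapply(split(load_tbl, load_tbl$factor), function(d) {
  d <- d[order(-abs(d$loading)), ]
  sort(head(d$item, 3))
})
top3
```

Loading size is only one criterion; item content and communality could also be weighed, particularly for Factor 3, where item 4 has a loading of just 0.35.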