The aim of this project is to reduce the dimensionality of the music and movie preferences of the people who took part in a survey, using Principal Component Analysis (PCA).
The dataset used in this project consists of 1010 observations, where each row represents a person to whom the survey was administered, and 150 columns.
The variables (columns) can be split into the following groups:
I decided to focus my analysis only on music and movie preferences.
## 'data.frame': 1010 obs. of 150 variables:
## $ Music : int 5 4 5 5 5 5 5 5 5 5 ...
## $ Slow.songs.or.fast.songs : int 3 4 5 3 3 3 5 3 3 3 ...
## $ Dance : int 2 2 2 2 4 2 5 3 3 2 ...
## $ Folk : int 1 1 2 1 3 3 3 2 1 5 ...
## $ Country : int 2 1 3 1 2 2 1 1 1 2 ...
## $ Classical.music : int 2 1 4 1 4 3 2 2 2 2 ...
## $ Musical : int 1 2 5 1 3 3 2 2 4 5 ...
## $ Pop : int 5 3 3 2 5 2 5 4 3 3 ...
## $ Rock : int 5 5 5 2 3 5 3 5 5 5 ...
## $ Metal.or.Hardrock : int 1 4 3 1 1 5 1 1 5 2 ...
## $ Punk : int 1 4 4 4 2 3 1 2 1 3 ...
## $ Hiphop..Rap : int 1 1 1 2 5 4 3 3 1 2 ...
## $ Reggae..Ska : int 1 3 4 2 3 3 1 2 2 4 ...
## $ Swing..Jazz : int 1 1 3 1 2 4 1 2 2 4 ...
## $ Rock.n.roll : int 3 4 5 2 1 4 2 3 2 4 ...
## $ Alternative : int 1 4 5 5 2 5 3 1 NA 4 ...
## $ Latino : int 1 2 5 1 4 3 3 2 1 5 ...
## $ Techno..Trance : int 1 1 1 2 2 1 5 3 1 1 ...
## $ Opera : int 1 1 3 1 2 3 2 2 1 2 ...
## $ Movies : int 5 5 5 5 5 5 4 5 5 5 ...
## $ Horror : int 4 2 3 4 4 5 2 4 1 2 ...
## $ Thriller : int 2 2 4 4 4 5 1 4 5 1 ...
## $ Comedy : int 5 4 4 3 5 5 5 5 5 5 ...
## $ Romantic : int 4 3 2 3 2 2 3 2 4 5 ...
## $ Sci.fi : int 4 4 4 4 3 3 1 3 4 1 ...
## $ War : int 1 1 2 3 3 3 3 3 5 3 ...
## $ Fantasy.Fairy.tales : int 5 3 5 1 4 4 5 4 4 4 ...
## $ Animated : int 5 5 5 2 4 3 5 4 4 4 ...
## $ Documentary : int 3 4 2 5 3 3 3 3 5 4 ...
## $ Western : int 1 1 2 1 1 2 1 1 1 1 ...
## $ Action : int 2 4 1 2 4 4 2 3 1 2 ...
## $ History : int 1 1 1 4 3 5 3 5 3 3 ...
## $ Psychology : int 5 3 2 4 2 3 3 2 2 2 ...
## $ Politics : int 1 4 1 5 3 4 1 3 1 3 ...
## $ Mathematics : int 3 5 5 4 2 2 1 1 1 3 ...
## $ Physics : int 3 2 2 1 2 3 1 1 1 1 ...
## $ Internet : int 5 4 4 3 2 4 2 5 1 5 ...
## $ PC : int 3 4 2 1 2 4 1 4 1 1 ...
## $ Economy.Management : int 5 5 4 2 2 1 3 1 1 4 ...
## $ Biology : int 3 1 1 3 3 4 5 2 3 2 ...
## $ Chemistry : int 3 1 1 3 3 4 5 2 1 1 ...
## $ Reading : int 3 4 5 5 5 3 3 2 5 4 ...
## $ Geography : int 3 4 2 4 2 3 3 3 1 4 ...
## $ Foreign.languages : int 5 5 5 4 3 4 4 4 1 5 ...
## $ Medicine : int 3 1 2 2 3 4 5 1 1 1 ...
## $ Law : int 1 2 3 5 2 3 3 2 1 1 ...
## $ Cars : int 1 2 1 1 3 5 4 1 1 1 ...
## $ Art.exhibitions : int 1 2 5 5 1 2 1 1 1 4 ...
## $ Religion : int 1 1 5 4 4 2 1 2 2 4 ...
## $ Countryside..outdoors : int 5 1 5 1 4 5 4 2 4 4 ...
## $ Dancing : int 3 1 5 1 1 1 3 1 1 5 ...
## $ Musical.instruments : int 3 1 5 1 3 5 2 1 2 3 ...
## $ Writing : int 2 1 5 3 1 1 1 1 1 1 ...
## $ Passive.sport : int 1 1 5 1 3 5 5 4 4 4 ...
## $ Active.sport : int 5 1 2 1 1 4 3 5 1 4 ...
## $ Gardening : int 5 1 1 1 4 2 3 1 1 1 ...
## $ Celebrities : int 1 2 1 2 3 1 1 3 5 2 ...
## $ Shopping : int 4 3 4 4 3 2 3 3 2 4 ...
## $ Science.and.technology : int 4 3 2 3 3 3 4 2 1 3 ...
## $ Theatre : int 2 2 5 1 2 1 3 2 5 5 ...
## $ Fun.with.friends : int 5 4 5 2 4 3 5 4 4 5 ...
## $ Adrenaline.sports : int 4 2 5 1 2 3 1 2 1 2 ...
## $ Pets : int 4 5 5 1 1 2 5 5 1 2 ...
## $ Flying : int 1 1 1 2 1 3 1 3 2 4 ...
## $ Storm : int 1 1 1 1 2 2 3 2 3 5 ...
## $ Darkness : int 1 1 1 1 1 2 2 4 1 4 ...
## $ Heights : int 1 2 1 3 1 2 1 3 5 5 ...
## $ Spiders : int 1 1 1 5 1 1 1 1 5 3 ...
## $ Snakes : int 5 1 1 5 1 2 5 5 5 4 ...
## $ Rats : int 3 1 1 5 2 2 1 3 2 4 ...
## $ Ageing : int 1 3 1 4 2 1 4 1 2 3 ...
## $ Dangerous.dogs : int 3 1 1 5 4 1 1 2 3 5 ...
## $ Fear.of.public.speaking : int 2 4 2 5 3 3 1 4 4 3 ...
## $ Smoking : chr "never smoked" "never smoked" "tried smoking" "former smoker" ...
## $ Alcohol : chr "drink a lot" "drink a lot" "drink a lot" "drink a lot" ...
## $ Healthy.eating : int 4 3 3 3 4 2 4 2 1 3 ...
## $ Daily.events : int 2 3 1 4 3 2 3 3 1 4 ...
## $ Prioritising.workload : int 2 2 2 4 1 2 5 1 2 2 ...
## $ Writing.notes : int 5 4 5 4 2 3 5 3 1 2 ...
## $ Workaholism : int 4 5 3 5 3 3 5 2 4 3 ...
## $ Thinking.ahead : int 2 4 5 3 5 3 3 4 2 3 ...
## $ Final.judgement : int 5 1 3 1 5 1 3 3 5 5 ...
## $ Reliability : int 4 4 4 3 5 3 4 3 5 4 ...
## $ Keeping.promises : int 4 4 5 4 4 4 5 3 4 5 ...
## $ Loss.of.interest : int 1 3 1 5 2 3 3 1 1 3 ...
## $ Friends.versus.money : int 3 4 5 2 3 2 4 4 4 4 ...
## $ Funniness : int 5 3 2 1 3 3 4 4 2 3 ...
## $ Fake : int 1 2 4 1 2 1 1 2 2 1 ...
## $ Criminal.damage : int 1 1 1 5 1 4 2 1 1 2 ...
## $ Decision.making : int 3 2 3 5 3 2 2 3 4 5 ...
## $ Elections : int 4 5 5 5 5 5 5 5 1 5 ...
## $ Self.criticism : int 1 4 4 5 5 4 3 3 3 4 ...
## $ Judgment.calls : int 3 4 4 4 5 4 5 5 2 5 ...
## $ Hypochondria : int 1 1 1 3 1 1 1 2 2 1 ...
## $ Empathy : int 3 2 5 3 3 4 4 1 5 4 ...
## $ Eating.to.survive : int 1 1 5 1 1 2 1 2 1 1 ...
## $ Giving : int 4 2 5 1 3 3 5 3 1 4 ...
## $ Compassion.to.animals : int 5 4 4 2 3 5 5 5 4 5 ...
## $ Borrowed.stuff : int 4 3 2 5 4 5 5 2 5 4 ...
## [list output truncated]
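music  <- data[, 1:19]   # music preferences: columns 1-19 (Music ... Opera)
movies <- data[, 20:31]  # movie preferences: columns 20-31 (Movies ... Action)
I check whether missing values are present in the variables of interest with summary().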
To remove them, I use drop_na() from the tidyr package.
## Music Slow.songs.or.fast.songs Dance
## "NA's :3 " "NA's :2 " "NA's :4 "
## Folk Country Classical.music
## "NA's :5 " "NA's :5 " "NA's :7 "
## Musical Pop Rock
## "NA's :2 " "NA's :3 " "NA's :6 "
## Metal.or.Hardrock Punk Hiphop..Rap
## "NA's :3 " "NA's :8 " "NA's :4 "
## Reggae..Ska Swing..Jazz Rock.n.roll
## "NA's :7 " "NA's :6 " "NA's :7 "
## Alternative Latino Techno..Trance
## "NA's :7 " "NA's :8 " "NA's :7 "
## Opera
## "NA's :1 "
## Movies Horror Thriller Comedy
## "NA's :6 " "NA's :2 " "NA's :1 " "NA's :3 "
## Romantic Sci.fi War Fantasy.Fairy.tales
## "NA's :3 " "NA's :2 " "NA's :2 " "NA's :3 "
## Animated Documentary Western Action
## "NA's :3 " "NA's :8 " "NA's :4 " "NA's :2 "
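The same missing-value step can be sketched in base R. This is a minimal sketch on a hypothetical toy data frame `ratings`: `colSums(is.na(...))` gives the per-column NA counts, and `complete.cases()` keeps only the fully observed rows, which is what `drop_na()` does when it is called without column arguments.

```r
# Hypothetical toy ratings (1-5 scale) with a few missing answers
ratings <- data.frame(
  Pop  = c(5, 3, NA, 2, 4),
  Rock = c(5, NA, 5, 2, 3),
  Folk = c(1, 1, 2, 1, NA)
)

# Number of missing values per column (what summary() reports as "NA's")
na_per_column <- colSums(is.na(ratings))
print(na_per_column)

# Keep only the rows with no missing value in any column,
# the base-R counterpart of tidyr::drop_na(ratings)
ratings_complete <- ratings[complete.cases(ratings), ]
print(ratings_complete)
```

Dropping whole rows is the simplest option; with only a handful of NAs per column, as in the summary above, it removes few respondents.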
As the plot below shows, some variables are positively correlated with each other, while others are negatively correlated.
For example, it is easy to see that Opera is highly correlated with Classical.music, and Punk with Metal.or.Hardrock.
On the other hand, Metal.or.Hardrock is negatively correlated with Pop.
Similarly, the plot below illustrates the correlations between the variables in the movie data.
For instance, the two most strongly correlated movie genres are Animated and Fantasy.Fairy.tales.
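Reading the strongest pairs off a correlation plot can also be done numerically. The sketch below uses a hypothetical helper, `strongest_pairs()`, that returns the most positive and the most negative pair in a correlation matrix; it is demonstrated on simulated columns, and on the real data it could be called on `music` or `movies` to check the pairs named above.

```r
# Hypothetical helper: most positive and most negative pair in a correlation matrix
strongest_pairs <- function(df) {
  r <- cor(df, use = "pairwise.complete.obs")  # NAs handled pair by pair
  r[lower.tri(r, diag = TRUE)] <- NA           # keep each pair only once
  hi <- which(r == max(r, na.rm = TRUE), arr.ind = TRUE)[1, ]
  lo <- which(r == min(r, na.rm = TRUE), arr.ind = TRUE)[1, ]
  data.frame(
    var1 = rownames(r)[c(hi[1], lo[1])],
    var2 = colnames(r)[c(hi[2], lo[2])],
    r    = c(r[hi[1], hi[2]], r[lo[1], lo[2]]),
    row.names = c("most positive", "most negative")
  )
}

# Simulated ratings with one positive and one negative relationship
set.seed(1)
n <- 500
classical <- rnorm(n)
metal     <- rnorm(n)
toy <- data.frame(
  Classical = classical,
  Opera     = classical + rnorm(n, sd = 0.3),
  Metal     = metal,
  Pop       = -metal + rnorm(n, sd = 0.5)
)
strongest_pairs(toy)
```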
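Principal component analysis (PCA) reduces the complexity of high-dimensional data while retaining its main trends and patterns.
It does this by projecting the data onto a smaller number of new variables, the principal components: uncorrelated linear combinations of the original variables, ordered by the share of variance they explain, which act as summaries of the original features.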
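To make the definition concrete, here is a base-R sketch on simulated data showing that PCA of standardized variables is the eigen-decomposition of their correlation matrix: the eigenvalues are the variances of the components (`prcomp()`'s `sdev^2`) and the eigenvectors are the loadings (`prcomp()`'s `rotation`), up to the sign of each column. The helper `pca_by_eigen()` is hypothetical.

```r
# PCA of standardized variables = eigen-decomposition of the correlation matrix
pca_by_eigen <- function(x) {
  z <- scale(x)                         # center and scale each column
  e <- eigen(cor(x), symmetric = TRUE)  # eigenvalues sorted in decreasing order
  list(
    eigenvalues = e$values,             # variances of the components
    loadings    = e$vectors,            # one column per component
    scores      = z %*% e$vectors       # coordinates of each observation
  )
}

set.seed(42)
x <- matrix(rnorm(300 * 4), ncol = 4)
x[, 2] <- x[, 1] + rnorm(300, sd = 0.5)  # introduce some correlation

mine <- pca_by_eigen(x)
ref  <- prcomp(x, center = TRUE, scale. = TRUE)

# Eigenvalues match the squared standard deviations reported by prcomp
all.equal(mine$eigenvalues, ref$sdev^2)
# Loadings match up to the sign of each column
all.equal(abs(mine$loadings), abs(unname(ref$rotation)))
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is why an eigenvalue of 1 marks a component that explains as much variance as a single original variable.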
PCA may be influenced by two elements, which should be addressed:
The Kaiser-Guttman stopping rule is one way of deciding how many components to retain.
Under this rule, components with an eigenvalue greater than 1 are kept.
It is often combined with a scree test, in which the eigenvalues (vertical axis) are plotted against the component number (horizontal axis).
The components are ordered from the largest to the smallest eigenvalue, and the number of components is chosen with the elbow rule:
we keep the components that precede the point where the curve of eigenvalues levels off.
The other technique looks at the cumulative percentage of variance explained: enough components are retained to explain roughly 80-90% of the total variance.
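The two numeric rules fit in a few lines of base R. The sketch below (the helper `n_components()` is hypothetical) counts the eigenvalues above 1 and finds the first component at which the cumulative share of variance reaches a threshold. The eigenvalues are those of the music PCA, copied from the `get_eigenvalue(pca1)` output later in this section; the elbow of the scree plot is a visual judgement and is not automated here.

```r
# Apply two stopping rules to eigenvalues sorted from largest to smallest
n_components <- function(eigenvalues, threshold = 0.80) {
  cum_var <- cumsum(eigenvalues) / sum(eigenvalues)  # cumulative share of variance
  c(
    kaiser   = sum(eigenvalues > 1),              # Kaiser-Guttman: eigenvalue > 1
    variance = which(cum_var >= threshold)[1]     # first k reaching the threshold
  )
}

# Eigenvalues of the music PCA, as printed by get_eigenvalue(pca1)
music_ev <- c(3.8188378, 2.6525925, 2.0173416, 1.1253978, 1.0933675, 1.0341125,
              0.8897612, 0.8479299, 0.8087956, 0.6510723, 0.6375536, 0.5769631,
              0.5090878, 0.4415476, 0.4340003, 0.4006642, 0.3794989, 0.3509148,
              0.3305612)
n_components(music_ev)
```

With the default 80% threshold this returns 6 for the Kaiser-Guttman rule and 11 for the variance criterion, which is exactly the disagreement discussed below.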
pca1<-prcomp(music, center=TRUE, scale.=TRUE)
summary(pca1)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.954 1.6287 1.4203 1.06085 1.04564 1.01691 0.94327
## Proportion of Variance 0.201 0.1396 0.1062 0.05923 0.05755 0.05443 0.04683
## Cumulative Proportion 0.201 0.3406 0.4468 0.50601 0.56355 0.61798 0.66481
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.92083 0.89933 0.80689 0.79847 0.75958 0.71350 0.66449
## Proportion of Variance 0.04463 0.04257 0.03427 0.03356 0.03037 0.02679 0.02324
## Cumulative Proportion 0.70944 0.75201 0.78627 0.81983 0.85020 0.87699 0.90023
## PC15 PC16 PC17 PC18 PC19
## Standard deviation 0.65879 0.63298 0.61603 0.59238 0.5749
## Proportion of Variance 0.02284 0.02109 0.01997 0.01847 0.0174
## Cumulative Proportion 0.92307 0.94416 0.96413 0.98260 1.0000
Although the six components with a standard deviation greater than 1 (that is, an eigenvalue greater than 1) suggest keeping 6 components, the following graph shows that together these 6 components explain only about 62% of the variance, well short of 80%.
fviz_eig(pca1, addlabels = TRUE)
Indeed, by the cumulative-variance criterion, at least 11 principal components are needed to explain 80% of the variation in the data (81.98% with 11 components).
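Given these discordant results, I investigated the number of components with a further method: parallel analysis.
Each observed eigenvalue is compared with the 95th percentile of the corresponding eigenvalue obtained from random data with the same number of variables and a comparable sample size.
If an observed eigenvalue is higher than the value returned by the function, there is support for retaining that component.
In my case, the first 3 components clearly exceed their thresholds, while the 4th (1.125 against 1.164) falls just short and is borderline, giving 4 components (3 + 1 borderline).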
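For intuition, parallel analysis can be sketched in base R: simulate many random data sets of the same size, compute the eigenvalues of each correlation matrix, and use the 95th percentile of each ordered eigenvalue as the retention threshold. This sketch assumes normally distributed random data; `parallel_analysis()` and `n_retained()` are hypothetical helpers, and their thresholds will differ slightly from those of `hornpa()`.

```r
# Horn's parallel analysis for PCA, sketched in base R
parallel_analysis <- function(n, k, reps = 500, percentile = 0.95, seed = 123) {
  set.seed(seed)
  # k x reps matrix: each column holds the sorted eigenvalues of one random data set
  sims <- replicate(reps, {
    random_data <- matrix(rnorm(n * k), nrow = n, ncol = k)
    eigen(cor(random_data), symmetric = TRUE, only.values = TRUE)$values
  })
  data.frame(
    component = seq_len(k),
    mean      = rowMeans(sims),
    threshold = apply(sims, 1, quantile, probs = percentile, names = FALSE)
  )
}

# Number of leading components whose observed eigenvalue beats its threshold
n_retained <- function(observed, pa) {
  sum(cumprod(observed > pa$threshold))
}

# Small demonstration: one strong factor among 6 variables, 300 rows
set.seed(1)
f <- rnorm(300)
obs_data <- cbind(f + rnorm(300, sd = 0.5), f + rnorm(300, sd = 0.5),
                  f + rnorm(300, sd = 0.5), matrix(rnorm(900), ncol = 3))
observed <- eigen(cor(obs_data), symmetric = TRUE, only.values = TRUE)$values
pa <- parallel_analysis(n = 300, k = 6, reps = 200)
n_retained(observed, pa)
```

On the survey data, `n_retained(pca1$sdev^2, parallel_analysis(n = nrow(music), k = ncol(music)))` would count the leading music components that beat their random-data threshold.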
(eig.val <- get_eigenvalue(pca1))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 3.8188378 20.099146 20.09915
## Dim.2 2.6525925 13.961013 34.06016
## Dim.3 2.0173416 10.617587 44.67775
## Dim.4 1.1253978 5.923146 50.60089
## Dim.5 1.0933675 5.754566 56.35546
## Dim.6 1.0341125 5.442697 61.79816
## Dim.7 0.8897612 4.682954 66.48111
## Dim.8 0.8479299 4.462789 70.94390
## Dim.9 0.8087956 4.256819 75.20072
## Dim.10 0.6510723 3.426696 78.62741
## Dim.11 0.6375536 3.355545 81.98296
## Dim.12 0.5769631 3.036648 85.01961
## Dim.13 0.5090878 2.679409 87.69902
## Dim.14 0.4415476 2.323935 90.02295
## Dim.15 0.4340003 2.284212 92.30716
## Dim.16 0.4006642 2.108759 94.41592
## Dim.17 0.3794989 1.997363 96.41328
## Dim.18 0.3509148 1.846920 98.26020
## Dim.19 0.3305612 1.739796 100.00000
hornpa(k=19,size=1000,reps=500,seed=123)
##
## Parallel Analysis Results
##
## Method: pca
## Number of variables: 19
## Sample size: 1000
## Number of correlation matrices: 500
## Seed: 123
## Percentile: 0.95
##
## Compare your observed eigenvalues from your original dataset to the 95 percentile in the table below generated using random data. If your eigenvalue is greater than the percentile indicated (not the mean), you have support to retain that factor/component.
##
## Component Mean 0.95
## 1 1.247 1.292
## 2 1.205 1.239
## 3 1.169 1.194
## 4 1.139 1.164
## 5 1.112 1.136
## 6 1.085 1.108
## 7 1.062 1.083
## 8 1.039 1.057
## 9 1.016 1.034
## 10 0.995 1.014
## 11 0.973 0.991
## 12 0.951 0.971
## 13 0.929 0.948
## 14 0.908 0.926
## 15 0.885 0.905
## 16 0.862 0.881
## 17 0.837 0.857
## 18 0.809 0.834
## 19 0.775 0.804
Overall, the music results suggest considering 5 components.
Here, I use k-means to cluster the respondents according to their music preferences and plot the clusters.
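A rough base-R counterpart of this step runs `kmeans()` on the standardized ratings and colors the respondents by cluster on the first two principal components. The helper `cluster_on_pcs()` is hypothetical, standardizing the ratings and using `nstart = 25` are my assumptions, and the result need not match `eclust()` exactly.

```r
# k-means on standardized ratings, drawn on the first two principal components
cluster_on_pcs <- function(x, k, nstart = 25, seed = 123, draw = TRUE) {
  z <- scale(x)                            # standardize each rating column
  set.seed(seed)                           # k-means starting centers are random
  km <- kmeans(z, centers = k, nstart = nstart)
  if (draw) {
    pcs <- prcomp(z)$x[, 1:2]              # project respondents onto PC1 and PC2
    plot(pcs, col = km$cluster, pch = 19,
         main = paste("k-means clusters, k =", k))
  }
  invisible(km)
}

# Usage on the survey data (after drop_na): km_music <- cluster_on_pcs(music, k = 5)
```

Several random starts (`nstart`) make it less likely that k-means stops at a poor local optimum.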
km1<-eclust(music, k=5)
autoplot(pca1, loadings=TRUE, loadings.colour='blue', loadings.label=TRUE, loadings.label.size=5)
As I did with the music preferences, I chose the number of movie components with three different methods: the Kaiser-Guttman rule (with the scree plot), the cumulative percentage of variance explained, and parallel analysis.
Moreover, I only slightly changed the pre-processing before running the Principal Component Analysis, without obtaining different results.
Only the first 4 eigenvalues are greater than 1, the 4th only marginally (1.0007).
pca2 <- prcomp(movies, center=TRUE, scale=TRUE)
fviz_eig(pca2, choice='eigenvalue', addlabels = TRUE)
Comparing the scree plot with the Cumulative Proportion row of the summary, at least 8 components are needed to explain 80% of the variance (84.4% with 8 components); reaching 85% would require 9.
summary(pca2)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.5938 1.4522 1.2083 1.00035 0.89769 0.88677 0.86561
## Proportion of Variance 0.2117 0.1757 0.1217 0.08339 0.06715 0.06553 0.06244
## Cumulative Proportion 0.2117 0.3874 0.5091 0.59250 0.65965 0.72518 0.78762
## PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.82154 0.77391 0.74671 0.64831 0.54479
## Proportion of Variance 0.05624 0.04991 0.04646 0.03503 0.02473
## Cumulative Proportion 0.84387 0.89378 0.94024 0.97527 1.00000
Regarding parallel analysis, comparing my eigenvalues with the 95th-percentile column shows that the first 3 components clearly exceed their thresholds, while the 4th (1.001 against 1.093) falls short and is borderline, so 4 components (3 + 1 borderline) should be kept.
(eig.val2 <- get_eigenvalue(pca2))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 2.5403240 21.169367 21.16937
## Dim.2 2.1090111 17.575093 38.74446
## Dim.3 1.4599617 12.166348 50.91081
## Dim.4 1.0006963 8.339136 59.24994
## Dim.5 0.8058430 6.715359 65.96530
## Dim.6 0.7863557 6.552964 72.51827
## Dim.7 0.7492737 6.243948 78.76221
## Dim.8 0.6749285 5.624404 84.38662
## Dim.9 0.5989331 4.991109 89.37773
## Dim.10 0.5575788 4.646490 94.02422
## Dim.11 0.4203027 3.502523 97.52674
## Dim.12 0.2967911 2.473259 100.00000
hornpa(k=12,size=1000,reps=500,seed=123)
##
## Parallel Analysis Results
##
## Method: pca
## Number of variables: 12
## Sample size: 1000
## Number of correlation matrices: 500
## Seed: 123
## Percentile: 0.95
##
## Compare your observed eigenvalues from your original dataset to the 95 percentile in the table below generated using random data. If your eigenvalue is greater than the percentile indicated (not the mean), you have support to retain that factor/component.
##
## Component Mean 0.95
## 1 1.180 1.224
## 2 1.135 1.167
## 3 1.099 1.127
## 4 1.067 1.093
## 5 1.038 1.060
## 6 1.010 1.030
## 7 0.984 1.004
## 8 0.957 0.978
## 9 0.929 0.951
## 10 0.900 0.924
## 11 0.870 0.896
## 12 0.831 0.863
Overall, the movie results suggest considering 4 components.
Finally, I divided the respondents into 4 clusters based on their movie preferences and plotted them.
km2<-eclust(movies, k=4)
autoplot(pca2, loadings=TRUE, loadings.colour='darkred', loadings.label=TRUE, loadings.label.size=5)
The plots below show the correlation between the original variables and the principal components, and how strongly each variable contributes to each component.
Variables grouped together are positively correlated, while variables on opposite sides of the origin are negatively correlated.
The length of each arrow indicates how strongly the variable is associated with, and represented by, the components on the two axes.
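The arrows in these plots can be computed directly. For a PCA on standardized data, the coordinate of a variable on a component is its correlation with that component, which equals the loading multiplied by the component's standard deviation. The sketch below (hypothetical helper `variable_coords()`, simulated data) checks this against `cor()`. Since each variable's squared coordinates sum to 1 over all components, an arrow that reaches the unit circle is fully represented by the two components shown.

```r
# Correlations between variables and components: loading * sd of the component
variable_coords <- function(pca) {
  sweep(pca$rotation, 2, pca$sdev, `*`)  # multiply column j by sdev[j]
}

set.seed(3)
x <- matrix(rnorm(200 * 4), ncol = 4, dimnames = list(NULL, c("A", "B", "C", "D")))
x[, "B"] <- x[, "A"] + rnorm(200, sd = 0.5)
pca <- prcomp(x, center = TRUE, scale. = TRUE)

coords <- variable_coords(pca)
# Same numbers as the correlations between the original variables and the scores
all.equal(unname(coords), unname(cor(x, pca$x)))
# Squared coordinates of each variable sum to 1 over all components
rowSums(coords^2)
```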
The plot of PC1 against PC2 for music.
fviz_pca_var(pca1, col.var = "navy", repel = TRUE, axes = c(1, 2)) +
labs(title="Principal component analysis - Music", x="PC1", y="PC2")
The plot of PC2 against PC3 for music.
fviz_pca_var(pca1, col.var = "navy", repel = TRUE, axes = c(2, 3)) +
labs(title="Principal component analysis - Music", x="PC2", y="PC3")
The plot of PC3 against PC4 for music.
fviz_pca_var(pca1, col.var = "navy", repel = TRUE, axes = c(3, 4)) +
labs(title="Principal component analysis - Music", x="PC3", y="PC4")
The plot of PC4 against PC5 for music.
fviz_pca_var(pca1, col.var = "navy", repel = TRUE, axes = c(4, 5)) +
labs(title="Principal component analysis - Music", x="PC4", y="PC5")
The plot of PC1 against PC2 for movies.
fviz_pca_var(pca2, col.var = "darkred", repel = TRUE, axes = c(1, 2)) +
labs(title="Principal component analysis - Movie", x="PC1", y="PC2")
The plot of PC2 against PC3 for movies.
fviz_pca_var(pca2, col.var = "darkred", repel = TRUE, axes = c(2, 3)) +
labs(title="Principal component analysis - Movie", x="PC2", y="PC3")
The plot of PC3 against PC4 for movies.
fviz_pca_var(pca2, col.var = "darkred", repel = TRUE, axes = c(3, 4)) +
labs(title="Principal component analysis - Movie", x="PC3", y="PC4")
Continuing with the analysis, the plot below is not very helpful for visualizing the components.
On the other hand, the loadings table can be useful for interpreting the first five dimensions in terms of the following music genres:
fviz_pca_var(pca1, col.var = "contrib", col.circle = "blue", repel = TRUE, ggtheme = theme_minimal())
pca1$rotation[, 1:5]
## PC1 PC2 PC3 PC4
## Music -0.07910658 -0.06773298 0.20146291 -0.254868321
## Slow.songs.or.fast.songs 0.07401692 -0.02118718 0.32501523 -0.412353666
## Dance 0.08912409 -0.41141203 0.24874411 -0.185907411
## Folk -0.24907735 -0.19557249 -0.18807192 -0.116988542
## Country -0.22796452 -0.15293615 -0.10319001 -0.007203249
## Classical.music -0.33323030 -0.10626120 -0.23209176 -0.261992839
## Musical -0.21629296 -0.27823414 -0.18778344 0.018861734
## Pop 0.08175356 -0.37694117 0.11266663 0.014169543
## Rock -0.31414063 0.16180183 0.25068801 -0.034958632
## Metal.or.Hardrock -0.27074221 0.27896234 0.20058940 -0.161250816
## Punk -0.25452724 0.23314253 0.32438800 0.031380326
## Hiphop..Rap 0.13882985 -0.28639299 0.31573766 0.191871206
## Reggae..Ska -0.16549974 -0.14036770 0.32986698 0.461575056
## Swing..Jazz -0.32830852 -0.16832127 0.02723633 0.194691467
## Rock.n.roll -0.35014029 -0.01727067 0.16444190 0.147014209
## Alternative -0.29025469 0.13063016 0.16933259 -0.050340112
## Latino -0.11330156 -0.40114229 -0.02427954 0.183821427
## Techno..Trance 0.08787564 -0.22289556 0.28282930 -0.451521882
## Opera -0.28840758 -0.13001414 -0.30035467 -0.264047130
## PC5
## Music 0.36421487
## Slow.songs.or.fast.songs 0.03391737
## Dance -0.02388895
## Folk -0.22087379
## Country -0.18740981
## Classical.music -0.10652367
## Musical 0.32763451
## Pop 0.45330851
## Rock 0.33262216
## Metal.or.Hardrock -0.03814476
## Punk 0.01845789
## Hiphop..Rap -0.21104231
## Reggae..Ska -0.28975923
## Swing..Jazz -0.14527963
## Rock.n.roll 0.15707578
## Alternative -0.11861334
## Latino 0.13328062
## Techno..Trance -0.35924519
## Opera -0.09968505
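A loadings table is easier to read when each column is sorted by the absolute size of its loadings. The helper `top_loadings()` below is a hypothetical sketch that does this for the first few components of a `prcomp` object; the sign of each loading gives the direction of the association. It is demonstrated on a small toy rotation matrix standing in for `pca1$rotation`.

```r
# For each of the first n_pc components, the n_top variables with the largest |loading|
top_loadings <- function(pca, n_pc = 5, n_top = 3) {
  pcs <- seq_len(n_pc)
  out <- lapply(pcs, function(j) {
    l <- pca$rotation[, j]
    l[order(abs(l), decreasing = TRUE)][seq_len(n_top)]
  })
  names(out) <- colnames(pca$rotation)[pcs]
  out
}

# Toy rotation matrix standing in for pca1$rotation
toy_pca <- list(rotation = matrix(
  c(0.1, -0.7,  0.3,
    0.5,  0.2, -0.6),
  ncol = 2,
  dimnames = list(c("A", "B", "C"), c("PC1", "PC2"))
))
top_loadings(toy_pca, n_pc = 2, n_top = 2)
```

Reading the table above in this way, PC1 is dominated by Rock.n.roll, Classical.music and Swing..Jazz, and PC2 by Dance, Latino and Pop, all with negative loadings.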
Let's visualize the contributions of the variables to the first 4 principal components (4 rather than 5, for easier reading).
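The contributions can also be computed by hand. For a PCA on standardized data, the contribution (in percent) of a variable to a component can be taken as 100 times its squared loading, so the contributions to each component add up to 100; a variable above 100/p percent (about 5.3% for the 19 music variables) contributes more than it would under a uniform split. `contributions()` is a hypothetical helper, and it is my assumption that it matches the values `fviz_contrib()` plots.

```r
# Percentage contribution of each variable to each component
contributions <- function(pca) {
  100 * pca$rotation^2    # each column sums to 100 (loadings have unit length)
}

set.seed(5)
x <- matrix(rnorm(300 * 5), ncol = 5)
pca <- prcomp(x, center = TRUE, scale. = TRUE)
ctb <- contributions(pca)
colSums(ctb)             # 100 for every component
round(ctb[, 1], 1)       # contributions to PC1
100 / ncol(x)            # uniform reference value
```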
var<-get_pca_var(pca1)
a<-fviz_contrib(pca1, "var", axes=1, xtickslab.rt=90)
b<-fviz_contrib(pca1, "var", axes=2, xtickslab.rt=90)
c<-fviz_contrib(pca1, "var", axes=3, xtickslab.rt=90)
d<-fviz_contrib(pca1, "var", axes=4, xtickslab.rt=90)
e<-fviz_contrib(pca1, "var", axes=5, xtickslab.rt=90)
grid.arrange(a,b,c,d,top='Contribution to the first 4 Principal Components')
The plot below shows the contribution of each movie genre: the redder the color, the larger the contribution.
The following table helps interpret the first 4 dimensions in terms of movie genres:
fviz_pca_var(pca2, col.var="contrib",
col.circle = "darkred",
gradient.cols=c("yellow2","red2"),
repel = TRUE, ggtheme = theme_minimal())
pca2$rotation[, 1:4]
## PC1 PC2 PC3 PC4
## Movies -0.17247372 0.30093113 -0.29457835 0.173533210
## Horror -0.29873279 0.04089540 -0.48318125 -0.366010065
## Thriller -0.39067146 0.04516843 -0.38457768 -0.330003569
## Comedy 0.02472672 0.35412470 -0.24791208 0.496728998
## Romantic 0.26740368 0.35425303 -0.10363250 0.282676671
## Sci.fi -0.37406413 0.13120295 0.03958551 0.211536845
## War -0.40131774 0.01563640 0.24713217 -0.001577096
## Fantasy.Fairy.tales 0.14371341 0.55280375 0.15376344 -0.270634022
## Animated 0.06584556 0.54609887 0.12180583 -0.345482098
## Documentary -0.18210840 0.14934485 0.48563533 -0.176469524
## Western -0.36425924 0.04084033 0.34900396 0.146949337
## Action -0.40552911 0.09788688 0.02987917 0.332096560
var2<-get_pca_var(pca2)
a2<-fviz_contrib(pca2, "var", axes=1, xtickslab.rt=90, color = "darkred", fill = "yellow")
b2<-fviz_contrib(pca2, "var", axes=2, xtickslab.rt=90,color = "darkred", fill = "yellow")
c2<-fviz_contrib(pca2, "var", axes=3, xtickslab.rt=90,color = "darkred", fill = "yellow")
d2<-fviz_contrib(pca2, "var", axes=4, xtickslab.rt=90,color = "darkred", fill = "yellow")
grid.arrange(a2,b2,c2,d2,top='Contribution to the first 4 Principal Components')