A simple example of principal component analysis in R

小明的数据分析笔记本

2022-08-02

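This post was first published on the WeChat public account 小明的数据分析笔记本 on November 30, 2019.

The dataset contains 178 samples and 14 variables. The first variable is the type of wine; the other 13 are measurements taken on each wine.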

Reading the dataset

# df<-read.delim("https://gist.githubusercontent.com/tijptjik/9408623/raw/b237fa5848349a14a14e5d4107dc7897c21951f5/wine.csv",
#                sep = ",",
#                header = TRUE)
# readr::write_csv(df,file = "Wine.csv")

df<-read.delim("Wine.csv",
               sep = ",",
               header = TRUE)
dim(df)
## [1] 178  14
library(knitr)
kable(df[1:6,])
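| Wine | Alcohol | Malic.acid | Ash  | Acl  | Mg  | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue  | OD   | Proline |
|-----:|--------:|-----------:|-----:|-----:|----:|--------:|-----------:|---------------------:|--------:|----------:|-----:|-----:|--------:|
| 1    | 14.23   | 1.71       | 2.43 | 15.6 | 127 | 2.80    | 3.06       | 0.28                 | 2.29    | 5.64      | 1.04 | 3.92 | 1065    |
| 1    | 13.20   | 1.78       | 2.14 | 11.2 | 100 | 2.65    | 2.76       | 0.26                 | 1.28    | 4.38      | 1.05 | 3.40 | 1050    |
| 1    | 13.16   | 2.36       | 2.67 | 18.6 | 101 | 2.80    | 3.24       | 0.30                 | 2.81    | 5.68      | 1.03 | 3.17 | 1185    |
| 1    | 14.37   | 1.95       | 2.50 | 16.8 | 113 | 3.85    | 3.49       | 0.24                 | 2.18    | 7.80      | 0.86 | 3.45 | 1480    |
| 1    | 13.24   | 2.59       | 2.87 | 21.0 | 118 | 2.80    | 2.69       | 0.39                 | 1.82    | 4.32      | 1.04 | 2.93 | 735     |
| 1    | 14.20   | 1.76       | 2.45 | 15.2 | 112 | 3.27    | 3.39       | 0.34                 | 1.97    | 6.75      | 1.05 | 2.85 | 1450    |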
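
The six rows above already show that the variables sit on very different scales: Proline runs from the hundreds into the thousands, while Hue stays close to 1. As a rough sketch of why this matters for PCA, the snippet below re-enters four of those columns by hand and compares a PCA with and without standardization. It uses only the six printed rows, so the numbers are illustrative and are not results for the full dataset; `wine_head`, `col_sd`, `pca_raw` and `pca_scaled` are names made up for this illustration.

```r
# Four columns copied from the six rows printed above
# (illustration only; not the full 178-sample dataset)
wine_head <- data.frame(
  Alcohol = c(14.23, 13.20, 13.16, 14.37, 13.24, 14.20),
  Mg      = c(127, 100, 101, 113, 118, 112),
  Hue     = c(1.04, 1.05, 1.03, 0.86, 1.04, 1.05),
  Proline = c(1065, 1050, 1185, 1480, 735, 1450)
)

# Standard deviation of each column; Proline is far larger than Hue
col_sd <- sapply(wine_head, sd)
print(round(col_sd, 3))

# Without scaling, PC1 is essentially the Proline axis
pca_raw <- prcomp(wine_head, scale. = FALSE)
print(round(pca_raw$rotation[, "PC1"], 4))

# With scaling, every variable enters with unit variance
pca_scaled <- prcomp(wine_head, scale. = TRUE)
print(round(pca_scaled$rotation[, "PC1"], 4))
```

In the unscaled fit the PC1 loading of Proline is close to 1 in absolute value, which is the usual argument for setting `scale. = TRUE` when the variables have different units.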

Principal component analysis

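# Treat the wine type as a categorical grouping variable
df$Wine <- as.factor(df$Wine)
# Run PCA on the 13 measurement columns; scale. = TRUE standardizes each one to unit variance
winepca <- prcomp(df[, 2:14], scale. = TRUE)
library(factoextra)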
# Scree plot: percentage of variance explained by each principal component
fviz_eig(winepca, addlabels = TRUE)
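
The percentages shown in the scree plot can also be computed directly from the `sdev` component of a `prcomp` object. The following is a minimal base-R sketch. So that it runs without Wine.csv, it uses R's built-in `iris` data as a stand-in (an assumption made purely for illustration); the same lines should apply to `winepca`. The names `eigenvalues`, `var_percent`, `cum_percent` and `scree` are introduced here.

```r
# Stand-in data: the four numeric columns of the built-in iris dataset
# (assumption for illustration; use df[, 2:14] for the wine data)
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Eigenvalues of the correlation matrix are the squared standard deviations
eigenvalues <- pca$sdev^2

# Share of the total variance carried by each component, in percent
var_percent <- 100 * eigenvalues / sum(eigenvalues)
cum_percent <- cumsum(var_percent)

scree <- data.frame(
  component   = paste0("PC", seq_along(eigenvalues)),
  eigenvalue  = round(eigenvalues, 3),
  var_percent = round(var_percent, 1),
  cum_percent = round(cum_percent, 1)
)
print(scree)
```

With `scale. = TRUE` the eigenvalues sum to the number of variables, so for the 13 wine measurements they would sum to 13.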

# Plot the samples on the first two components, colored by wine type
fviz_pca_ind(winepca,
             col.ind = df$Wine,
             addEllipses = TRUE,
             geom = "point",
             legend.title = "")
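
The points in the individuals plot are the samples' coordinates on the first two components, which `prcomp` stores in its `x` matrix. Below is a hedged base-R sketch of pulling those coordinates out and summarizing them per group. It again uses the built-in `iris` data as a stand-in for the wine data (an illustrative assumption), and `scores` and `centroids` are names introduced here.

```r
# Stand-in data (assumption): iris instead of the wine measurements
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Coordinates of each sample on PC1 and PC2, with its group label attached
scores <- data.frame(pca$x[, 1:2], group = iris$Species)
print(head(scores))

# Mean position of each group in the PC1-PC2 plane
centroids <- aggregate(cbind(PC1, PC2) ~ group, data = scores, FUN = mean)
print(centroids)

# Loadings: weight of each original variable in PC1 and PC2
print(round(pca$rotation[, 1:2], 3))
```

For the wine data the equivalent would be `data.frame(winepca$x[, 1:2], group = df$Wine)`; well-separated group centroids correspond to clusters that look distinct in the plot.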
