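This is the post published on November 30, 2019 by the WeChat public account 小明的数据分析笔记本.
The dataset has 178 samples and 14 variables. The first variable is the wine class; the remaining 13 are measurements of the wine.
Read the dataset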
# df<-read.delim("https://gist.githubusercontent.com/tijptjik/9408623/raw/b237fa5848349a14a14e5d4107dc7897c21951f5/wine.csv",
# sep = ",",
# header = TRUE)
# readr::write_csv(df,file = "Wine.csv")
df<-read.delim("Wine.csv",
sep = ",",
header = TRUE)
dim(df)
## [1] 178 14
library(knitr)
## Warning: package 'knitr' was built under R version 4.0.5
kable(df[1:6,])

| Wine | Alcohol | Malic.acid | Ash | Acl | Mg | Phenols | Flavanoids | Nonflavanoid.phenols | Proanth | Color.int | Hue | OD | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 |
| 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 1 | 14.37 | 1.95 | 2.50 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 |
| 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| 1 | 14.20 | 1.76 | 2.45 | 15.2 | 112 | 3.27 | 3.39 | 0.34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450 |
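The preview already shows that the variables live on very different scales (Proline is in the hundreds to thousands, Hue is close to 1), which is the usual reason to standardize before PCA. Here is a minimal sketch of that check, using only three columns of the six preview rows typed in by hand; the names `preview`, `sds`, and `scaled_sds` are made up for illustration and are not in the original post.

```r
# First six rows of three columns, copied from the preview table above.
preview <- data.frame(
  Alcohol = c(14.23, 13.20, 13.16, 14.37, 13.24, 14.20),
  Hue     = c(1.04, 1.05, 1.03, 0.86, 1.04, 1.05),
  Proline = c(1065, 1050, 1185, 1480, 735, 1450)
)

# Standard deviation of each column on its original scale.
sds <- sapply(preview, sd)
print(sds)

# After standardizing with scale(), every column has standard deviation 1.
scaled_sds <- apply(scale(preview), 2, sd)
print(scaled_sds)
```

On the full data the same check could be run as `sapply(df[, 2:14], sd)`, assuming `df` has been read as above.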
Principal component analysis
df$Wine<-as.factor(df$Wine)
winepca<-prcomp(df[,2:14],scale. = TRUE)
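To see what a `prcomp()` result holds, the sketch below computes the share of variance explained by each component from `sdev`; these are the percentages that a scree plot shows. It uses the built-in `iris` data as a stand-in so that it runs without `Wine.csv`. That substitution, and the names `x`, `pca`, `var_explained`, and `eig`, are assumptions for illustration only.

```r
# Stand-in data: the four numeric columns of the built-in iris data set
# (the post itself uses df[, 2:14] from Wine.csv).
x <- iris[, 1:4]

# Same call as in the post: center and scale each variable before PCA.
pca <- prcomp(x, scale. = TRUE)

# Proportion of total variance carried by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * var_explained, 1))

# With scale. = TRUE, the component variances equal the eigenvalues
# of the correlation matrix of the original variables.
eig <- eigen(cor(x))$values
print(all.equal(pca$sdev^2, eig))
```

Replacing `x` with `df[, 2:14]` should give the corresponding numbers for the wine data; `summary(winepca)` reports the same proportions.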
library(factoextra)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.0.5
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
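# Scree plot: percentage of variance explained by each component
fviz_eig(winepca, addlabels = TRUE)
# Samples on the first two components, colored by wine class
fviz_pca_ind(winepca,
col.ind = df$Wine,
addEllipses = TRUE,
geom = "point",
legend.title = "")
You are welcome to follow my WeChat public account: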
小明的数据分析笔记本