setwd("C:/Users/石源方/Desktop/数据搬家/华理工/班级-华/各科课程作业/高等生物信息学-注意PDF格式/24-12-12-practice5_Proteomic")
# Load the required libraries
library(pheatmap)  # draws the heatmap
library(dplyr)     # data manipulation (mutate, filter, %>%)
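# Read the CSV files
proteomics_data <- read.csv("proteomics_data.csv", row.names = 1)
# Keep Sample_ID as character so it can index columns by name
# (before R 4.0.0, read.csv() converted strings to factors by default)
sample_group <- read.csv("sample_group.csv", stringsAsFactors = FALSE)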
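# Keep the sample table consistent with the expression matrix
sample_group <- sample_group %>%
  mutate(Group = factor(Group, levels = c("BPH", "TA1", "TA2"))) %>%  # set the group order
  filter(Sample_ID %in% colnames(proteomics_data))                    # keep samples present in the matrix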
# Reorder the columns of the expression matrix to match the sample table
proteomics_data <- proteomics_data[, sample_group$Sample_ID]
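# Normalization: Z-score each protein (row) across samples.
# scale() works on columns, so transpose, scale, and transpose back.
normalized_data <- t(scale(t(proteomics_data)))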
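# Optional safeguard (not part of the original script): a protein whose values
# are constant across all samples has zero standard deviation, so scale()
# returns NaN for its whole row, and row clustering in pheatmap() can then fail.
# drop_nonfinite_rows() is a hypothetical helper that removes rows without any
# finite value; rows with only some missing values are kept.
drop_nonfinite_rows <- function(mat) {
  keep <- rowSums(is.finite(mat)) > 0
  if (any(!keep)) {
    message("Dropping ", sum(!keep), " protein(s) with no finite z-scores")
  }
  mat[keep, , drop = FALSE]
}
# Example use: normalized_data <- drop_nonfinite_rows(normalized_data)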
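# Start with an empty vector that collects the clustered column names of each group
ordered_columns <- c()
# Cluster the samples hierarchically within each group
for (grp in levels(sample_group$Group)) {
  # Select the samples that belong to the current group
  current_samples <- sample_group$Sample_ID[sample_group$Group == grp]
  # hclust() needs at least two samples; keep smaller groups in their input order
  if (length(current_samples) < 2) {
    ordered_columns <- c(ordered_columns, current_samples)
    next
  }
  current_matrix <- normalized_data[, current_samples, drop = FALSE]
  # Compute Euclidean distances between the samples of this group, then cluster them
  dist_matrix <- dist(t(current_matrix), method = "euclidean")
  cluster_result <- hclust(dist_matrix, method = "complete")
  # Append the samples in their clustered order
  ordered_columns <- c(ordered_columns, current_samples[cluster_result$order])
}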
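# Optional check (not part of the original script): every sample in
# sample_group should end up in ordered_columns. A sample whose Group is not
# one of the declared levels gets an NA group from factor() and cannot be
# placed by the loop. missing_samples() is a hypothetical helper that lists
# the IDs absent from the clustered order.
missing_samples <- function(expected, ordered) {
  setdiff(as.character(expected), as.character(ordered))
}
# Example use: missing_samples(sample_group$Sample_ID, ordered_columns)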
# Reorder the whole data matrix by the clustered sample order
normalized_data <- normalized_data[, ordered_columns]
# Rebuild the column annotation (Group information)
annotation_col <- data.frame(Group = sample_group$Group)
rownames(annotation_col) <- sample_group$Sample_ID
annotation_col <- annotation_col[ordered_columns, , drop = FALSE]  # arrange in the new order
# Define the annotation colors (one color per group)
group_colors <- list(Group = c("BPH" = "#1F78B4", "TA1" = "#E31A1C", "TA2" = "#33A02C"))
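# Optional sketch (not part of the original script): with a blue-white-red
# palette, white marks a z-score of 0 only if the color breaks are symmetric
# around zero. symmetric_breaks() is a hypothetical helper; pheatmap()'s
# `breaks` argument expects one more value than there are colors.
symmetric_breaks <- function(mat, n_colors = 100) {
  limit <- max(abs(mat), na.rm = TRUE)  # largest absolute z-score
  seq(-limit, limit, length.out = n_colors + 1)
}
# Example use: pass breaks = symmetric_breaks(normalized_data, 100) to pheatmap()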
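# Draw the heatmap
pheatmap(
  normalized_data,                                           # Z-scored protein expression matrix
  annotation_col = annotation_col,                           # group label of each sample
  annotation_colors = group_colors,                          # colors for the group labels
  color = colorRampPalette(c("blue", "white", "red"))(100),  # heatmap color scale
  scale = "none",                                            # data are already Z-scored, so no further scaling
  cluster_cols = FALSE,                                      # keep the per-group column order computed above
  cluster_rows = TRUE,                                       # cluster the proteins (rows)
  show_rownames = FALSE,                                     # hide protein names
  show_colnames = FALSE,                                     # hide sample names
  fontsize_row = 8,                                          # row-label font size (only used if row names are shown)
  main = "Hierarchical Clustering Heatmap by Group"          # plot title
)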
