Korelasi merupakan pola hubungan linier antar peubah numerik yang dapat diketahui tingkat keerannya melalui koefisien \(r\). Nilai koefisien \(r\) berada pada rentang \(-1 \leq r \leq 1\). Nilai \(r\) emakin mendekati 1 ataupun -1 maka semakin erat hubungan linier antar peubah numerik tersebut. Sedangkan nilai positif maupun negatif menunjukkan arah hubungan antar peubah numerik tersebut. Nilai \(r\) positif berarti nilai rataan antar peubah yang searah, sedangkan nilai \(r\) yang negatif menunjukkan nilai rataan antar peubah yang berlawanan.
Berikut adalah korelasi yang dapat dilakukan pada dataset
soil yang tersedia pada paket agricolae.
Sebelum melakukan analisis korelasi, dapat dilakukan mengeksplorasi data
terlebih dahulu.
library(agricolae)
data(soil)
data_soil <- soil
str(data_soil)
## 'data.frame': 13 obs. of 23 variables:
## $ place: Factor w/ 13 levels "Chmar","Chz",..: 11 9 10 12 13 3 4 5 2 1 ...
## $ pH : num 3.8 7.2 8.4 4.8 4.4 7.6 8.3 8.4 6.1 4.4 ...
## $ EC : num 0.72 1 0.32 0.83 0.49 1.09 3.67 7.17 0.15 0.18 ...
## $ CaCO3: num 0 1.4 9.6 0 0 0.9 7.6 2.7 0 0 ...
## $ MO : num 1.7 2.6 2 1.9 2 1.3 1.7 1.1 1.6 3.8 ...
## $ CIC : num 5.76 20 18.88 17.28 6.4 ...
## $ P : num 8.1 17.5 17.6 15.8 7.2 8.9 20 50 8.5 44.1 ...
## $ K : num 28 166 124 86 30 350 537 414 84 240 ...
## $ sand : num 80 26 44 58 78 48 36 50 52 68 ...
## $ slime: num 16 42 32 30 14 38 36 28 28 26 ...
## $ clay : num 4 32 24 12 8 14 28 22 20 6 ...
## $ Ca : num 0.74 17.21 14.91 12.49 0.76 ...
## $ Mg : num 0.15 2.06 3.37 1.9 0.13 1.79 4.44 2.67 1.54 0.56 ...
## $ K2 : num 0.1 0.47 0.36 0.26 0.09 0.88 1.42 0.76 0.25 0.56 ...
## $ Na : num 0.27 0.26 0.24 0.32 0.26 0.37 0.62 0.92 0.29 0.27 ...
## $ Al_H : num 1.2 0 0 0.4 0.9 0 0 0 0 2.6 ...
## $ K_Mg : num 0.667 0.228 0.107 0.137 0.692 ...
## $ Ca_Mg: num 4.93 8.35 4.42 6.57 5.85 ...
## $ B : num 0.9 0.2 0.4 0.5 0.4 2.5 3.1 4.9 0.4 1.1 ...
## $ Cu : num 0.4 12.1 2.5 3 0.6 5.1 1.7 1.5 1.9 2.3 ...
## $ Fe : num 295.4 12.9 23.2 352.4 334.4 ...
## $ Mn : num 1.3 0.5 7.9 5.1 1.8 2.1 2.1 4.6 20.6 7.6 ...
## $ Zn : num 2.1 62.2 6.7 7 1.7 3.1 8.3 6.3 3.2 4.5 ...
summary(data_soil)
## place pH EC CaCO3 MO
## Chmar :1 Min. :3.800 Min. :0.090 Min. :0.000 Min. :1.100
## Chz :1 1st Qu.:4.700 1st Qu.:0.210 1st Qu.:0.000 1st Qu.:1.700
## Cnt1 :1 Median :6.100 Median :0.720 Median :0.000 Median :1.900
## Cnt2 :1 Mean :6.154 Mean :1.289 Mean :1.708 Mean :2.254
## Cnt3 :1 3rd Qu.:7.600 3rd Qu.:1.000 3rd Qu.:1.400 3rd Qu.:2.500
## Hco1 :1 Max. :8.400 Max. :7.170 Max. :9.600 Max. :5.400
## (Other):7
## CIC P K sand
## Min. : 5.76 Min. : 3.60 Min. : 28.0 Min. :26.00
## 1st Qu.:13.60 1st Qu.: 8.50 1st Qu.: 65.0 1st Qu.:44.00
## Median :17.28 Median :17.50 Median :124.0 Median :48.00
## Mean :16.17 Mean :20.65 Mean :202.5 Mean :51.69
## 3rd Qu.:20.00 3rd Qu.:20.10 3rd Qu.:350.0 3rd Qu.:58.00
## Max. :25.60 Max. :50.00 Max. :537.0 Max. :80.00
##
## slime clay Ca Mg
## Min. :14.00 Min. : 4.00 Min. : 0.740 Min. :0.130
## 1st Qu.:28.00 1st Qu.:10.00 1st Qu.: 7.440 1st Qu.:1.540
## Median :32.00 Median :14.00 Median : 9.570 Median :2.060
## Mean :31.85 Mean :16.46 Mean : 9.296 Mean :2.422
## 3rd Qu.:38.00 3rd Qu.:24.00 3rd Qu.:12.490 3rd Qu.:3.370
## Max. :46.00 Max. :32.00 Max. :17.210 Max. :6.250
##
## K2 Na Al_H K_Mg
## Min. :0.0900 Min. :0.2400 Min. :0.0000 Min. :0.0240
## 1st Qu.:0.1900 1st Qu.:0.2600 1st Qu.:0.0000 1st Qu.:0.1368
## Median :0.3600 Median :0.2700 Median :0.0000 Median :0.2846
## Mean :0.5077 Mean :0.3569 Mean :0.5308 Mean :0.3574
## 3rd Qu.:0.7600 3rd Qu.:0.3200 3rd Qu.:0.9000 3rd Qu.:0.4916
## Max. :1.4200 Max. :0.9200 Max. :2.6000 Max. :1.0000
##
## Ca_Mg B Cu Fe
## Min. :1.848 Min. :0.000 Min. : 0.400 Min. : 4.3
## 1st Qu.:3.514 1st Qu.:0.400 1st Qu.: 1.500 1st Qu.: 16.3
## Median :4.933 Median :0.500 Median : 2.300 Median : 295.4
## Mean :4.712 Mean :1.185 Mean : 3.131 Mean : 444.3
## 3rd Qu.:5.846 3rd Qu.:1.100 3rd Qu.: 3.000 3rd Qu.: 352.4
## Max. :8.354 Max. :4.900 Max. :12.100 Max. :1715.0
##
## Mn Zn
## Min. : 0.500 Min. : 1.700
## 1st Qu.: 2.100 1st Qu.: 2.900
## Median : 4.600 Median : 4.500
## Mean : 5.992 Mean : 8.892
## 3rd Qu.: 7.900 3rd Qu.: 6.700
## Max. :20.600 Max. :62.200
##
Kita bisa membuat boxplot untuk setiap peubah numerik seperti pada
pH, P, K, sand, dan
clay sebagai berikut.
new_data <- data_soil[, c("place", "pH", "P", "K", "sand", "clay")]
str(new_data)
## 'data.frame': 13 obs. of 6 variables:
## $ place: Factor w/ 13 levels "Chmar","Chz",..: 11 9 10 12 13 3 4 5 2 1 ...
## $ pH : num 3.8 7.2 8.4 4.8 4.4 7.6 8.3 8.4 6.1 4.4 ...
## $ P : num 8.1 17.5 17.6 15.8 7.2 8.9 20 50 8.5 44.1 ...
## $ K : num 28 166 124 86 30 350 537 414 84 240 ...
## $ sand : num 80 26 44 58 78 48 36 50 52 68 ...
## $ clay : num 4 32 24 12 8 14 28 22 20 6 ...
library(ggpubr)
## Loading required package: ggplot2
# Bar plot (bp)
bp1 <- ggbarplot(new_data, x = "place", y = "clay",
fill = "place", # change fill color by cyl
color = "white", # Set bar border colors to white
palette = "jco", # jco journal color palett. see ?ggpar
sort.val = "asc", # Sort the value in ascending order
sort.by.groups = TRUE, # Sort inside each group
x.text.angle = 90 # Rotate vertically x axis texts
) + font("x.text", size = 8)
bp2 <- ggbarplot(new_data, x = "place", y = "sand",
fill = "place", # change fill color by cyl
color = "white", # Set bar border colors to white
palette = "jco", # jco journal color palett. see ?ggpar
sort.val = "asc", # Sort the value in ascending order
sort.by.groups = TRUE, # Sort inside each group
x.text.angle = 90 # Rotate vertically x axis texts
) + font("x.text", size = 8)
bp3 <- ggbarplot(new_data, x = "place", y = "P",
fill = "place", # change fill color by cyl
color = "white", # Set bar border colors to white
palette = "jco", # jco journal color palett. see ?ggpar
sort.val = "asc", # Sort the value in ascending order
sort.by.groups = TRUE, # Sort inside each group
x.text.angle = 90 # Rotate vertically x axis texts
) + font("x.text", size = 8)
bp4 <- ggbarplot(new_data, x = "place", y = "pH",
fill = "place", # change fill color by cyl
color = "white", # Set bar border colors to white
palette = "jco", # jco journal color palett. see ?ggpar
sort.val = "asc", # Sort the value in ascending order
sort.by.groups = TRUE, # Sort inside each group
x.text.angle = 90 # Rotate vertically x axis texts
) + font("x.text", size = 8)
bp1
## Warning: This manual palette can handle a maximum of 10 values. You have
## supplied 13
bp2
## Warning: This manual palette can handle a maximum of 10 values. You have
## supplied 13
bp3
## Warning: This manual palette can handle a maximum of 10 values. You have
## supplied 13
bp4
## Warning: This manual palette can handle a maximum of 10 values. You have
## supplied 13
Selanjutnya kita bisa langsung melakukan korelasi dari seluruh peubah numerik dengan menggunakan heatmap. Berikut coding R nya.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
# Fungsi untuk menghitung p-value dari korelasi
cor.mtest <- function(mat, ...) {
mat <- as.matrix(mat)
n <- ncol(mat)
p.mat <- matrix(NA, n, n)
diag(p.mat) <- 0
for (i in 1:(n - 1)) {
for (j in (i + 1):n) {
tmp <- cor.test(mat[, i], mat[, j], ...)
p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
}
}
colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
p.mat
}
# Pilih hanya kolom numerik
new <- drop(data_soil[,-1])
d1_numeric <- new
# Hitung matriks korelasi dan p-value
cor_matrix <- cor(d1_numeric, use = "complete.obs")
p_matrix <- cor.mtest(d1_numeric)
# Ubah matriks korelasi dan p-value menjadi format panjang
cor_df <- melt(cor_matrix)
p_df <- melt(p_matrix)
# Gabungkan data korelasi dan p-value
cor_df <- cor_df %>%
rename(correlation = value) %>%
mutate(p_value = p_df$value)
# Tambahkan kolom signifikansi
cor_df <- cor_df %>%
mutate(signif = case_when(
p_value < 0.01 ~ "**",
p_value < 0.05 ~ "*",
TRUE ~ ""
))
# Gabungkan nilai korelasi dengan tanda signifikan
cor_df <- cor_df %>%
mutate(label = ifelse(Var1 == Var2, "", paste0(round(correlation, 2), signif))) # Hilangkan diagonal
# Buat heatmap dengan nilai korelasi dan signifikansi
pr1 <- ggplot(cor_df, aes(x = Var1, y = Var2, fill = correlation)) +
geom_tile(color = "white") +
geom_text(aes(label = label), color = "black", size = 3) +
scale_fill_gradient2(low = "red", mid = "darkslategray1", high = "blue",
midpoint = 0) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Heatmap Korelasi",
x = "Variabel X",
y = "Variabel Y",
fill = "r")
print(pr1)
# Fungsi untuk menghitung p-value dari korelasi
cor.mtest <- function(mat, ...) {
mat <- as.matrix(mat)
n <- ncol(mat)
p.mat <- matrix(NA, n, n)
diag(p.mat) <- 0
for (i in 1:(n - 1)) {
for (j in (i + 1):n) {
tmp <- cor.test(mat[, i], mat[, j], ...)
p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
}
}
colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
p.mat
}
# Pilih hanya kolom numerik
new <- drop(data_soil[,-1])
new <- new[,c("pH", "CaCO3", "P", "K", "Mg", "Na", "Al_H", "Cu", "Fe", "Zn")]
d1_numeric <- new
# Hitung matriks korelasi dan p-value
cor_matrix <- cor(d1_numeric, use = "complete.obs")
p_matrix <- cor.mtest(d1_numeric)
# Ubah matriks korelasi dan p-value menjadi format panjang
cor_df <- melt(cor_matrix)
p_df <- melt(p_matrix)
# Gabungkan data korelasi dan p-value
cor_df <- cor_df %>%
rename(correlation = value) %>%
mutate(p_value = p_df$value)
# Tambahkan kolom signifikansi
cor_df <- cor_df %>%
mutate(signif = case_when(
p_value < 0.01 ~ "**",
p_value < 0.05 ~ "*",
TRUE ~ ""
))
# Gabungkan nilai korelasi dengan tanda signifikan
cor_df <- cor_df %>%
mutate(label = ifelse(Var1 == Var2, "", paste0(round(correlation, 2), signif))) # Hilangkan diagonal
# Buat heatmap dengan nilai korelasi dan signifikansi
pr2 <- ggplot(cor_df, aes(x = Var1, y = Var2, fill = correlation)) +
geom_tile(color = "white") +
geom_text(aes(label = label), color = "black", size = 3) +
scale_fill_gradient2(low = "blue", mid = "darkslategray3", high = "skyblue",
midpoint = 0) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Heatmap Korelasi Data soil",
x = "Variabel X",
y = "Variabel Y",
fill = "r")
print(pr2)
Berdasarkan heatmap yang dibuat menunjukkan koefisien korelasi dari
peubah numerik berada pada rentang -0.2 hingga 0.9 untuk seluruh peubah
respon. Pada plot kedua, menunjukkan nilai r dari pH dengan
Al_H sebesar -0.65 dan signifikan. Hal ini menunjukkan
hubungan linier yang berlawanan dan kuat. Artinya rerata kenaikan
pH tidak diikuti oleh rerata Al_H. Selain itu,
nilai r yang positif, kuat dan signifikan ditunjukkan oleh
Zn dan Cu sehingga keduanya memiliki hubungan
linier yang searah sehingga rerata kenaikan Zn diikuti oleh
Cu.