In this article I will use cluster methods and focus on image clustering. For this purpose I chose the night map of United States. The aim of the study is to calculate percentage night light area of USA.
Firstly, I am loading useful libraries and my picture.
library('cluster')
library('jpeg')
library('ggplot2')
image <- readJPEG("light2.jpg")
Now I can print my original image.
plot(1, type="n", main='USA night map')
rasterImage(image, 0.6, 0.6, 1.4, 1.4)
Letâs check the dimension first.
dm1 <- dim(image)
dm1[1:2]
## [1] 630 985
Size of my image is equal to 630x985.
In the next step I will change my image type from jpg to rgb. RGB is color model based on assumption that every colour is a mix of three basic colours: red, green and blue with diffrent intensity.
rgbImage1<-data.frame(x=rep(1:dm1[2],
each=dm1[1]),
y=rep(dm1[1]:1, dm1[2]),
r.value=as.vector(image[,,1]),
g.value=as.vector(image[,,2]),
b.value=as.vector(image[,,3]))
We get data frame that have information about every pixel of our picture.
Now I can display the same picture but in RGB.
plot(y~x, data=rgbImage1, main="USA night map",
col=rgb(rgbImage1[c("r.value", "g.value", "b.value")]), asp=1, pch=".")
My data is quite large so I will use CLARA to reduce computing time. CLARA is extension to k-medoids method but it is created for large dataset.
n1 <- c()
for (i in 1:10) {
cl <- clara(rgbImage1[, c("r.value", "g.value", "b.value")], i)
n1[i] <- cl$silinfo$avg.width
}
plot(n1, type = 'l',
main = "Optimal number of clusters for my Image",
xlab = "Number of clusters",
ylab = "Average silhouette",
col = "blue")
Recomended number of clusters for my image is 2. Letâs print it now.
image = rgbImage1[, c("r.value", "g.value", "b.value")]
clara <- clara(image, 2)
plot(silhouette(clara))
colours<-rgb(clara$medoids[clara$clustering, ])
plot(rgbImage1$y~rgbImage1$x, col=colours, pch=".", cex=2, asp=1, main="2 colours")
As we can see on the picture, 2 clusters can tell us only about land and ocean area. We canât see the âlight placesâ. Thatâs why I will use (not recomended) 3 clusters.
image = rgbImage1[, c("r.value", "g.value", "b.value")]
clara <- clara(image, 3)
plot(silhouette(clara))
colours<-rgb(clara$medoids[clara$clustering, ])
plot(rgbImage1$y~rgbImage1$x, col=colours, pch=".", cex=2, asp=1, main="3 colours")
3 clusters help us to reach our goal. Now we can see 3 colours which represent: light places of USA, the USA land and oceans.
dominantColours <- as.data.frame(table(colours))
max_col <- max(dominantColours$Freq)/sum(dominantColours$Freq)
min_col <- min(dominantColours$Freq)/sum(dominantColours$Freq)
medium_col <- 1-max_col - min_col
dominantColours$distribution <- round((c(max_col, medium_col, min_col) * 100), 2)
dominantColours
## colours Freq distribution
## 1 #05050F 409206 65.94
## 2 #131222 201308 32.44
## 3 #B9A89E 10036 1.62
dominantColours$colours <- as.character(dominantColours$colours)
pie(dominantColours$Freq, labels = dominantColours$distribution,
col = dominantColours$colours,
xlab = "Colours",
ylab = "Frequency")
plot(image)
USAarea <- sum(dominantColours$Freq[dominantColours$colours=='#131222' | dominantColours$colours=='#B9A89E' ])
USAlightarea <- dominantColours$Freq[dominantColours$colours=='#B9A89E']
USAlightarea/USAarea
## [1] 0.04748656
In this article, I used clustering method on image to calculate percentage of light area of USA during night. I used 3 clusters to isolate 3 areas: light, land and ocean. I calculate that 4,7% of USA is light area.