In this paper I use an unsupervised learning tool to cluster images. The European Union has developed considerably over the last few decades, and one visible sign of this is the night-time illumination of the terrain, which reflects the technological development of an area. The goal of this paper is to investigate by how much the area illuminated at night over Europe has increased. For this purpose I use the CLARA clustering method, which is well suited to large datasets because it runs the k-medoids PAM algorithm on sub-samples of the data rather than on the full dataset.
Let’s start by loading two PNG files containing satellite images of Europe by night, one for 1992 and one for 2016, and see how they look without any transformations:
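To make the idea of CLARA concrete before turning to the images, here is a minimal sketch on synthetic data. The data, the object names `toy` and `fit`, and the group parameters are assumptions made purely for illustration; only `clara()` from the `cluster` package is used in the same way as in the rest of this paper.

```r
library(cluster)

set.seed(1)  # makes the synthetic rnorm() data below repeatable

# Synthetic data (illustrative assumption): three well-separated groups
# of 300 points each in an RGB-like space
toy <- rbind(
  matrix(rnorm(300 * 3, mean = 0.1, sd = 0.02), ncol = 3),
  matrix(rnorm(300 * 3, mean = 0.5, sd = 0.02), ncol = 3),
  matrix(rnorm(300 * 3, mean = 0.9, sd = 0.02), ncol = 3)
)
colnames(toy) <- c("r.value", "g.value", "b.value")

# CLARA runs PAM on a few sub-samples and keeps the best set of medoids
fit <- clara(toy, k = 3, samples = 5)

fit$medoids            # one representative observation per cluster
table(fit$clustering)  # number of observations assigned to each cluster
```

Because only small sub-samples are passed to PAM, the cost stays manageable even when every pixel of an image is treated as an observation.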
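library(png)      # readPNG
library(ggplot2)  # ggplot, theme, margin
library(ggpubr)   # background_image, ggarrange
library(cluster)  # clara, silhouette
# Read both images as arrays of pixel intensities
euro_92 <- readPNG("europe_1990v2.PNG")
euro_16 <- readPNG("europe_2010v2.PNG")
# Draw each image as the background of an empty plot
im_A <- ggplot() +
  background_image(euro_92) +
  theme(plot.margin = margin(t = 1, l = 1, r = 1, b = 1, unit = "cm"))
im_B <- ggplot() +
  background_image(euro_16) +
  theme(plot.margin = margin(t = 1, l = 1, r = 1, b = 1, unit = "cm"))
# Show the two years side by side
ggarrange(im_A, im_B,
          labels = c("1992", "2016"),
          ncol = 2, nrow = 1)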
Even without any transformations, the 2016 image looks considerably brighter than the 1992 one, which suggests that Europe has developed a lot over these two decades.
Let's now check that we are dealing with a large dataset. We treat each image pixel as one observation.
dm92 <- dim(euro_92)
dm16 <- dim(euro_16)
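Dimensions of the 1992 image: 943, 1841
Dimensions of the 2016 image: 782, 1115
The two images are not the same size: 943 × 1841 gives about 1.74 million pixels for 1992, while 782 × 1115 gives about 0.87 million pixels for 2016. This must be taken into account when calculating the increase in illumination, for example by comparing shares of pixels rather than raw pixel counts.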
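Because the comparison depends on image size, it can help to make the pixel counts explicit. The sketch below uses the dimensions reported above; the names `npix92`, `npix16` and the helper `pixel_share()` are hypothetical and only illustrate why shares are easier to compare than counts.

```r
# Dimensions as reported above (height, width)
dims92 <- c(943, 1841)
dims16 <- c(782, 1115)

# Number of observations (pixels) per image
npix92 <- prod(dims92)
npix16 <- prod(dims16)
npix92 / npix16  # the 1992 image has roughly twice as many pixels

# Hypothetical helper: turn per-cluster pixel counts into shares of the image,
# so that images of different sizes can be compared
pixel_share <- function(counts) counts / sum(counts)

# Made-up counts, for illustration only
pixel_share(c(water = 500, dark_land = 400, lit_land = 100))
```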
Now let’s reshape each image into a data frame with one row per pixel, holding its coordinates and RGB values, and see how it looks:
rgbEuro92 <- data.frame(
  x = rep(1:dm92[2], each = dm92[1]),
  y = rep(dm92[1]:1, dm92[2]),
  r.value = as.vector(euro_92[, , 1]),
  g.value = as.vector(euro_92[, , 2]),
  b.value = as.vector(euro_92[, , 3]))
rgbEuro16 <- data.frame(
  x = rep(1:dm16[2], each = dm16[1]),
  y = rep(dm16[1]:1, dm16[2]),
  r.value = as.vector(euro_16[, , 1]),
  g.value = as.vector(euro_16[, , 2]),
  b.value = as.vector(euro_16[, , 3]))
plot(y ~ x, data = rgbEuro92, main = "Europe 1992",
     col = rgb(rgbEuro92[c("r.value", "g.value", "b.value")]), asp = 1, pch = ".")
plot(y ~ x, data = rgbEuro16, main = "Europe 2016",
     col = rgb(rgbEuro16[c("r.value", "g.value", "b.value")]), asp = 1, pch = ".")
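The reshaping above relies on R storing arrays column by column. A toy sketch may make the mapping between array positions and plot coordinates easier to follow; the 2 × 3 array `img` and the data frame `toy_px` are assumptions made up for this illustration.

```r
# Toy "image" (illustrative assumption): 2 rows x 3 columns x 3 channels
img <- array(seq(0, 1, length.out = 18), dim = c(2, 3, 3))
h <- dim(img)[1]  # image height (rows)
w <- dim(img)[2]  # image width (columns)

# Same construction as for the satellite images
toy_px <- data.frame(
  x = rep(1:w, each = h),  # column index, repeated for every row
  y = rep(h:1, w),         # row index flipped so that row 1 is plotted at the top
  r.value = as.vector(img[, , 1]),
  g.value = as.vector(img[, , 2]),
  b.value = as.vector(img[, , 3])
)

# as.vector() walks down each column first, so the third data-frame row
# corresponds to the pixel in array row 1, column 2
toy_px[3, ]
img[1, 2, ]
```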
As mentioned before, this study uses the CLARA method. We need to determine the optimal number of clusters for both images, and before proceeding with the analysis it is useful to think about the number of clusters we expect. On the one hand, we can compare the average silhouette width across different numbers of clusters. The silhouette is a method for interpreting and validating the consistency of data clusters. Its value ranges from -1 to +1, where a high value indicates that an object is well matched to its own cluster and poorly matched to neighboring clusters. So the higher the average silhouette width, the better the clustering. On the other hand, we need to consider the object of our analysis.
Ideally, the silhouette method would point to three clusters, which would correspond to water, unlit land, and lit land.
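The silhouette width described above can be computed by hand for a small example. The following is a minimal sketch on made-up one-dimensional data (the vectors `x` and `cl` are assumptions for illustration), compared against `silhouette()` from the `cluster` package.

```r
library(cluster)

# Toy data (illustrative assumption): two tight groups on a line
x  <- c(1, 2, 3, 10, 11, 12)
cl <- rep(1:2, each = 3)   # cluster label of each observation
d  <- as.matrix(dist(x))   # pairwise distances

# Silhouette of observation i: s(i) = (b - a) / max(a, b), where
#   a = mean distance to the other members of its own cluster
#   b = mean distance to the members of the nearest other cluster
sil_by_hand <- sapply(seq_along(x), function(i) {
  own <- setdiff(which(cl == cl[i]), i)
  a <- mean(d[i, own])
  b <- min(sapply(setdiff(unique(cl), cl[i]),
                  function(k) mean(d[i, cl == k])))
  (b - a) / max(a, b)
})

# The same quantity from the cluster package
sil_pkg <- silhouette(cl, dist(x))[, "sil_width"]

mean(sil_by_hand)  # average silhouette width of this toy clustering
```

Values close to +1, as in this toy case, mean that observations sit much closer to their own cluster than to the neighboring one.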
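# Average silhouette width for k = 2..10; a single cluster has no silhouette,
# so position 1 stays NA. clara() reports the silhouette of its best sub-sample.
n92 <- rep(NA_real_, 10)
for (i in 2:10) {
  cl92 <- clara(rgbEuro92[, c("r.value", "g.value", "b.value")], i)
  n92[i] <- cl92$silinfo$avg.width
}
n16 <- rep(NA_real_, 10)
for (i in 2:10) {
  cl16 <- clara(rgbEuro16[, c("r.value", "g.value", "b.value")], i)
  n16[i] <- cl16$silinfo$avg.width
}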
plot(n92, type = "l",
     main = "Optimal number of clusters for year 1992",
     xlab = "Number of clusters",
     ylab = "Average silhouette",
     col = "blue")
plot(n16, type = "l",
     main = "Optimal number of clusters for year 2016",
     xlab = "Number of clusters",
     ylab = "Average silhouette",
     col = "red")
As can be seen, the average silhouette width is highest for seven clusters in 1992 and for two clusters in 2016. However, seven colors would significantly complicate the interpretation of the results, and two clusters are not enough because they would not separate land from water. I therefore chose three clusters in both cases, where the average silhouette width is still high enough:
Europe1992 <- rgbEuro92[, c("r.value", "g.value", "b.value")]
Europe2016 <- rgbEuro16[, c("r.value", "g.value", "b.value")]
clara923 <- clara(Europe1992, 3)
clara163 <- clara(Europe2016, 3)
plot(silhouette(clara923))
plot(silhouette(clara163))
Let’s see how the clustering looks when applied to the images, with each pixel drawn in the color of its cluster medoid:
colours923<-rgb(clara923$medoids[clara923$clustering, ])
colours163<-rgb(clara163$medoids[clara163$clustering, ])
plot(rgbEuro92$y~rgbEuro92$x, col=colours923, pch=".", cex=2, asp=1, main="Europe 1992 - 3 colours")
plot(rgbEuro16$y~rgbEuro16$x, col=colours163, pch=".", cex=2, asp=1, main="Europe 2016 - 3 colours")
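# Share of pixels per cluster colour, 1992
dominantColours92 <- as.data.frame(table(colours923))
max_col92 <- max(dominantColours92$Freq) / sum(dominantColours92$Freq)
min_col92 <- min(dominantColours92$Freq) / sum(dominantColours92$Freq)
medium_col92 <- 1 - max_col92 - min_col92
dominantColours92$colours <- as.character(dominantColours92$colours923)
# table() orders rows by colour code, so sort by frequency first;
# otherwise the shares below would not line up with their colours
dominantColours92 <- dominantColours92[order(dominantColours92$Freq, decreasing = TRUE), ]
dominantColours92$distribution <- round(c(max_col92, medium_col92, min_col92) * 100, 2)
# Share of pixels per cluster colour, 2016
dominantColours16 <- as.data.frame(table(colours163))
max_col16 <- max(dominantColours16$Freq) / sum(dominantColours16$Freq)
min_col16 <- min(dominantColours16$Freq) / sum(dominantColours16$Freq)
medium_col16 <- 1 - max_col16 - min_col16
dominantColours16$colours <- as.character(dominantColours16$colours163)
dominantColours16 <- dominantColours16[order(dominantColours16$Freq, decreasing = TRUE), ]
dominantColours16$distribution <- round(c(max_col16, medium_col16, min_col16) * 100, 2)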
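In the code above, the clusters are told apart by how many pixels they contain. A possible alternative, shown here only as a hedged sketch and not as the method used in this study, is to identify the lit cluster by the brightness of its medoid. The medoid values and labels below are made up for illustration; in the real analysis they would come from `clara923$medoids` and `clara923$clustering`.

```r
# Hypothetical medoids (assumption): one row per cluster, RGB values in [0, 1]
medoids <- rbind(
  c(0.02, 0.03, 0.10),  # very dark, e.g. water
  c(0.10, 0.12, 0.15),  # dark, e.g. unlit land
  c(0.85, 0.80, 0.55)   # bright, e.g. lit land
)

# Hypothetical cluster label for each of ten pixels
clustering <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

# Simple brightness proxy: mean of the three channels of each medoid
brightness <- rowMeans(medoids)

# The brightest medoid is taken to represent the lit cluster
lit_cluster <- which.max(brightness)

# Share of all pixels that fall into the lit cluster
lit_share <- mean(clustering == lit_cluster)
lit_share
```

Labelling clusters by brightness avoids relying on the assumption that the lit cluster is always the smallest one, which may not hold for every image.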
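% light of land in 1992 = 8.73 %
% light of land in 2016 = 54.12 %
2016 value relative to 1992 = 619.93 % (an increase of about 519.93 %)
This study used image clustering to examine how the lighting in Europe changed over the 24 years between 1992 and 2016. The calculated lit share of land in 2016 is 619.93% of the 1992 value (54.12 / 8.73 ≈ 6.2), which corresponds to an increase of about 519.93%.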
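The arithmetic behind this figure can be checked directly from the two reported shares. The sketch below uses only the rounded values printed above, so it may differ slightly from a calculation on unrounded values; the variable names are assumptions for illustration.

```r
lit92 <- 8.73   # % of land lit in 1992, as reported above
lit16 <- 54.12  # % of land lit in 2016, as reported above

# How large the 2016 value is relative to the 1992 value
ratio <- lit16 / lit92
pct_of_1992 <- 100 * ratio

# Relative increase: the change divided by the starting value
pct_increase <- 100 * (lit16 - lit92) / lit92

round(c(ratio = ratio,
        pct_of_1992 = pct_of_1992,
        pct_increase = pct_increase), 2)
```

The value 619.93% is the 2016 share expressed as a percentage of the 1992 share, while the relative increase is 100 percentage points lower.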
However, it should be noted that this calculation is not very accurate and has several flaws. First, clustering is by design a simplification. The share of the dominant color (which represents water in the image) increased from 56% to 60%. This could be explained, for example, by the emergence of more roads that were classified as the darkest color, the color of water at night. The measurements may also be inaccurate because the time at which the pictures were taken is unknown: cities glow quite differently at 10 pm than at 4 am. Nevertheless, I consider the study successful and very interesting, as it shows how Europe has developed in the last few decades.
Sources of the images: https://www.nightearth.com/
Google Earth