University of Warsaw - Faculty of Economic Sciences
What inspires me to do such a project?
Unsupervised learning is a strong tool not only for analysing numerical data but also for image-based analysis, which is rather rare among the statistical techniques we can use. Clustering the snow-covered areas of Poland on December 31st, 2021 and January 1st, 2022 is interesting because comparing the clusters shows how the snow surface changed within 24 hours. Based only on satellite pictures, we can describe changes in nature on a massive scale.
The data used in this analysis come from https://meteologix.com/dj/model-charts/swisshd-eu/poland/snow-depth.html?fbclid=IwAR2hpTKTwoVyVilgLkV1O_3QugqqVZ6yDXq6JWn_Tp2XfY-NNiKlD16MUoc, a site providing weather-related data based on satellite images. I used two images: the first from December 31st, 2021 at 12:00 and the second from January 1st, 2022 at 12:00.
Libraries
These images are hard to cluster as they are, because they show more than just Poland.
Therefore every area outside Poland is replaced with green, so that it ends up in a separate cluster. This way no area that I am not interested in enters the analysis.
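The masking step can be sketched as follows. This is only an illustration under stated assumptions: the tiny `pixels` table, the `inside` mask and the helper `mask_outside()` are hypothetical, and the channels are assumed to be scaled to [0, 1]. Only the column names `r.value`, `g.value`, `b.value` and the green `#23B14D` come from this analysis.

```r
# Hypothetical pixel table: one row per pixel, channels scaled to [0, 1].
pixels <- data.frame(
  x = rep(1:3, times = 2),
  y = rep(1:2, each = 3),
  r.value = c(0.97, 0.24, 0.46, 0.97, 0.67, 0.24),
  g.value = c(0.97, 0.57, 0.73, 0.97, 0.66, 0.57),
  b.value = c(0.98, 0.91, 1.00, 0.98, 0.79, 0.91)
)

# Hypothetical mask: TRUE where the pixel lies inside Poland.
inside <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)

# Marker green #23B14D expressed on the same [0, 1] scale.
green <- col2rgb("#23B14D")[, 1] / 255

# Overwrite the three channels of every pixel outside the mask.
mask_outside <- function(df, inside, fill) {
  df[!inside, "r.value"] <- fill[["red"]]
  df[!inside, "g.value"] <- fill[["green"]]
  df[!inside, "b.value"] <- fill[["blue"]]
  df
}

masked <- mask_outside(pixels, inside, green)
```

Because every outside pixel gets exactly the same colour, the clustering can isolate it as one cluster that is easy to drop later.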
```r
library(cluster) # provides clara()

n1 <- rep(NA_real_, 10) # vector to save results, one slot per number of clusters
for (i in 2:10) { # number of clusters to consider; the silhouette is undefined for a single cluster
  c1 <- clara(rgbImage0[, c("r.value", "g.value", "b.value")], i)
  n1[i] <- c1$silinfo$avg.width # saving average silhouette width to vector
}
plot(n1, type = 'l', main = "Optimal number of clusters", xlab = "Number of clusters", ylab = "Average silhouette", col = "blue")
abline(h = (1:30) * 5 / 100, lty = 3, col = "grey50") # horizontal grid lines every 0.05
abline(v = 5, col = 'darkgreen') # chosen number of clusters
```

So for December 31st it is optimal to choose 5 colour clusters, which is logical: the map shows many shades representing different amounts of snow. One of the colours is the ‘outside green’.
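The visual reading of the silhouette curve can be cross-checked numerically by taking the number of clusters with the highest average silhouette width. The sketch below runs on a synthetic stand-in (`toy`, three artificial colour groups), not on the real pixel data, so the result says nothing about the actual images; the names `centres`, `toy`, `ks`, `sil` and `best_k` are illustrative.

```r
library(cluster) # provides clara()

set.seed(1)
# Hypothetical stand-in for the pixel table: three well-separated colour groups.
centres <- rbind(c(0.14, 0.69, 0.30), c(0.67, 0.66, 0.79), c(0.97, 0.97, 0.98))
noise <- matrix(rnorm(1800, sd = 0.01), ncol = 3)
toy <- as.data.frame(centres[rep(1:3, each = 200), ] + noise)
names(toy) <- c("r.value", "g.value", "b.value")

# Average silhouette width for each candidate number of clusters.
ks <- 2:6
sil <- sapply(ks, function(k) clara(toy, k)$silinfo$avg.width)

# The candidate with the highest average silhouette width.
best_k <- ks[which.max(sil)]
```

On real images the maximum is not always the most useful choice, which is why looking at the plot remains worthwhile.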
```r
n1 <- rep(NA_real_, 10) # vector to save results, one slot per number of clusters
for (i in 2:10) { # number of clusters to consider; the silhouette is undefined for a single cluster
  c1 <- clara(rgbImage1[, c("r.value", "g.value", "b.value")], i)
  n1[i] <- c1$silinfo$avg.width # saving average silhouette width to vector
}
plot(n1, type = 'l', main = "Optimal number of clusters", xlab = "Number of clusters", ylab = "Average silhouette", col = "blue")
abline(h = (1:30) * 5 / 100, lty = 3, col = "grey50") # horizontal grid lines every 0.05
abline(v = 3, col = 'darkgreen') # chosen number of clusters
```

As is visible, the new year in Poland started with a warm morning, because there was not much snow left. Therefore we can distinguish only 3 clusters, that is, 3 colours.
December 31st
| rgb_code1 | colour_name1 |
|---|---|
| #23B14D | green |
| #AAA9C9 | faded blue |
| #F8F8FA | white |
| #3C92E7 | blue |
| #75BAFF | light blue |
January 1st
| rgb_code2 | colour_name2 |
|---|---|
| #23B14D | green |
| #A9A8C7 | faded blue |
| #F5F5F7 | white |
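Hex codes like the ones in these tables can be obtained from cluster centres with base R's `rgb()`. A minimal sketch, assuming the medoids are stored on a 0-255 scale; the `medoids` matrix is hypothetical and its values are simply the January 1st hex codes decoded back to integers.

```r
# Hypothetical medoids on a 0-255 scale, one row per cluster.
medoids <- rbind(
  c(35, 177, 77),   # green
  c(169, 168, 199), # faded blue
  c(245, 245, 247)  # white
)
colnames(medoids) <- c("r.value", "g.value", "b.value")

# rgb() turns the three channels into "#RRGGBB" strings.
rgb_code <- rgb(medoids[, "r.value"], medoids[, "g.value"], medoids[, "b.value"],
                maxColorValue = 255)
```

If the channels are stored on a [0, 1] scale instead, `maxColorValue = 1` (the default) is the matching setting.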
Visualisation is always a nice thing to do. Besides that, the aim of this analysis is to measure how much less area (or more, although we can see that it is rather less) is covered by snow after only 24 hours.
| Colour | Freq (%) |
|---|---|
| #3C92E7 | 92.24624 |
| #F8F8FA | 7.75376 |
| Colour | Freq (%) |
|---|---|
| #A9A8C7 | 38.09902 |
| #F5F5F7 | 61.90098 |
The clustering lets us measure that the snow-covered area in Poland dropped from 92.2% on December 31st at 12:00 to 38.1% on January 1st at 12:00. It’s only 24 hours, but the effect is very surprising! Definitely a warm start to 2022!
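The arithmetic behind this comparison can be sketched as follows. The helper `share_table()` and the toy `colour` vector are hypothetical and only illustrate how percentage shares can be computed once the marker-green pixels are dropped; the two snow shares are copied from the frequency tables above.

```r
# Percentage share of each colour among the pixels inside Poland.
share_table <- function(colour, outside = "#23B14D") {
  inside <- colour[colour != outside] # drop the marker-green pixels
  100 * prop.table(table(inside))
}

# Toy example: 5 outside pixels, 3 blue and 1 white inside.
colour <- c(rep("#23B14D", 5), rep("#3C92E7", 3), "#F8F8FA")
toy_share <- share_table(colour)

# Snow shares (%) copied from the two frequency tables.
snow_dec31 <- 92.24624
snow_jan01 <- 38.09902

drop_pp <- snow_dec31 - snow_jan01     # drop in percentage points
drop_rel <- 100 * drop_pp / snow_dec31 # decline relative to December 31st, in %
```

With these inputs the drop is about 54.1 percentage points, which means that roughly 58.7% of the area covered on December 31st was no longer covered a day later.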