The data produced by the Internet of Things (IoT) is enormous, and data mining techniques can extract hidden information of high business value from it. Smart cities rely heavily on IoT. Air pollution is increasing rapidly in smart cities and has adverse effects on human health. Its sources are many, including road traffic and industrial gases. In this study we use K-means clustering to find the healthiest areas of a smart city, those most suitable for living. The dataset is generated from the City Plus project. The data is large and dynamic because of the number of sensors deployed in the same location and their measurement frequency. It covers five air pollutants: ozone, sulfur dioxide, nitrogen dioxide, carbon monoxide, and particulate matter. The data set has three more fields: longitude, latitude, and timestamp.
Get the data
Get the average ozone and particulate matter concentrations grouped by longitude and latitude
x1 = sqldf("select longitude, latitude, avg(ozone) as avg_ozone, avg(particullate_matter) as particulate from tbl group by longitude,latitude")
## Loading required package: tcltk
## Warning: Quoted identifiers should have class SQL, use DBI::SQL() if the
## caller performs the quoting.
head(x1)
## longitude latitude avg_ozone particulate
## 1 10.06279 56.10071 114.87796 106.22234
## 2 10.06473 56.15245 97.82055 104.07477
## 3 10.06613 56.12138 123.91216 91.03358
## 4 10.07239 56.15457 133.58848 122.29269
## 5 10.07289 56.15469 140.92253 125.03375
## 6 10.07666 56.18959 102.11965 93.42367
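The grouping query above can also be sketched in base R, which may help readers who do not have sqldf installed. This is only an illustration: the small `tbl` below is a made-up stand-in for the real sensor table (its readings are invented), and `x1_base` is a hypothetical name. The column spelling `particullate_matter` follows the query above.

```r
# Toy stand-in for the sensor table `tbl`; the readings are invented for illustration.
tbl <- data.frame(
  longitude = c(10.06279, 10.06279, 10.06473, 10.06473),
  latitude  = c(56.10071, 56.10071, 56.15245, 56.15245),
  ozone     = c(110, 120, 95, 101),
  particullate_matter = c(100, 112, 104, 106)  # spelling follows the SQL query
)

# Average both pollutants per sensor location (one row per longitude/latitude pair).
x1_base <- aggregate(
  cbind(avg_ozone = ozone, particulate = particullate_matter) ~ longitude + latitude,
  data = tbl,
  FUN = mean
)
print(x1_base)
```

Like the SQL version, this returns one row per distinct location with the two averaged columns.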
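Here we apply the K-means algorithm for k = 4 through 10. The objective is to group the sensor locations into clusters and identify the cleanest areas in the city based on ozone concentration. Because kmeans() picks random starting centers and no seed is set here, the centers shown below can differ from run to run.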
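As a minimal sketch, the same runs over several values of k could be made repeatable and collected in a list instead of separate variables. The six rows are copied from the head(x1) output above; the seed value, `nstart = 25`, and the names `x1_sample`, `ks`, `fits`, and `min_ozone_by_k` are assumptions made for this illustration, and the range of k is reduced because six rows cannot support 4 through 10 clusters.

```r
# First six rows of x1, copied from the head(x1) output above.
x1_sample <- data.frame(
  longitude   = c(10.06279, 10.06473, 10.06613, 10.07239, 10.07289, 10.07666),
  latitude    = c(56.10071, 56.15245, 56.12138, 56.15457, 56.15469, 56.18959),
  avg_ozone   = c(114.87796, 97.82055, 123.91216, 133.58848, 140.92253, 102.11965),
  particulate = c(106.22234, 104.07477, 91.03358, 122.29269, 125.03375, 93.42367)
)

set.seed(42)  # assumption: any fixed seed makes the random starts repeatable
ks <- 2:4     # reduced range, since only six rows are available here

# One kmeans fit per k; nstart = 25 keeps the best of 25 random starts.
fits <- lapply(ks, function(k) kmeans(x1_sample, centers = k, nstart = 25))
names(fits) <- paste0("k", ks)

# Lowest cluster-center ozone value for each k.
min_ozone_by_k <- sapply(fits, function(fit) min(fit$centers[, "avg_ozone"]))
print(min_ozone_by_k)
```

On the full x1 table, `ks` would be set to `4:10` to match the runs in this section.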
# k = 4
result4<-kmeans(x1,4)
result4$centers
## longitude latitude avg_ozone particulate
## 1 10.18229 56.16382 113.46153 117.07341
## 2 10.17615 56.16417 109.35957 99.12508
## 3 10.17705 56.17164 130.43995 111.18956
## 4 10.17102 56.16822 94.32454 117.74445
# k = 5
result5<-kmeans(x1,5)
result5$centers
# k = 6
result6<-kmeans(x1,6)
result6$centers
## longitude latitude avg_ozone particulate
## 1 10.17961 56.16234 112.00932 111.20974
## 2 10.18238 56.16182 117.05756 97.88561
## 3 10.15925 56.16820 96.69646 101.42041
## 4 10.18186 56.16410 113.37158 123.31061
## 5 10.18123 56.17512 131.52896 114.22018
## 6 10.17362 56.17085 93.45754 122.38768
# k = 7
result7<-kmeans(x1,7)
result7$centers
## longitude latitude avg_ozone particulate
## 1 10.17825 56.16944 118.60557 109.5562
## 2 10.18495 56.15798 115.86044 96.2054
## 3 10.17523 56.16222 105.50605 114.7037
## 4 10.17218 56.16558 88.92402 124.2009
## 5 10.18488 56.16478 115.13358 123.6108
## 6 10.18290 56.17964 135.92837 115.6180
## 7 10.16004 56.16695 95.85376 100.3781
# k = 8
result8<-kmeans(x1,8)
result8$centers
## longitude latitude avg_ozone particulate
## 1 10.15406 56.16399 95.34281 100.1931
## 2 10.17243 56.17007 100.94855 122.3536
## 3 10.18082 56.15508 108.38938 111.1353
## 4 10.18310 56.16506 115.70587 123.3082
## 5 10.18495 56.15798 115.86044 96.2054
## 6 10.18440 56.17986 136.74576 116.2542
## 7 10.17692 56.17476 120.45539 109.7785
## 8 10.18032 56.17061 81.92253 121.5084
# k = 9
result9<-kmeans(x1,9)
result9$centers
## longitude latitude avg_ozone particulate
## 1 10.17410 56.15099 114.33345 89.32272
## 2 10.14908 56.16746 95.39273 101.36787
## 3 10.18505 56.17144 117.82433 113.06978
## 4 10.17218 56.16558 88.92402 124.20092
## 5 10.18413 56.16805 111.10087 103.77677
## 6 10.18538 56.16045 115.05959 125.62627
## 7 10.18209 56.17875 137.23991 119.00751
## 8 10.17870 56.16995 127.04528 103.73866
## 9 10.17588 56.15930 105.29682 116.19068
# k = 10
result10<-kmeans(x1,10)
result10$centers
## longitude latitude avg_ozone particulate
## 1 10.15168 56.16368 94.99104 101.03512
## 2 10.18366 56.16245 115.32841 124.68005
## 3 10.17883 56.15726 106.61785 112.47661
## 4 10.18032 56.17061 81.92253 121.50843
## 5 10.17410 56.15099 114.33345 89.32272
## 6 10.17052 56.17031 100.98485 123.39569
## 7 10.18328 56.17163 118.22879 112.93678
## 8 10.18570 56.16588 111.94914 102.53156
## 9 10.18209 56.17875 137.23991 119.00751
## 10 10.17870 56.16995 127.04528 103.73866
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=56.166151,10.176887&zoom=12&size=640x640&scale=1&maptype=satellite&language=en-EN&sensor=false
The lowest cluster-center ozone concentration, 81.92253, appears when k = 10 (cluster 4); the k = 8 run contains the same center as its cluster 8. The cleanest area is therefore found at:
Longitude:
long
## longitude
## 4 10.18032
Latitude:
lati
## latitude
## 4 56.17061
Minimum ozone level:
min_ozone_level
## [1] 81.92253
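The code that defines long, lati, and min_ozone_level is not shown above. One way they could be derived is sketched below, using the k = 10 centers copied from the result10$centers output; the names `centers10` and `cleanest` are hypothetical, and with a live fit `as.data.frame(result10$centers)` would take the place of the hand-typed table.

```r
# Cluster centers for k = 10, copied from the result10$centers output above.
centers10 <- data.frame(
  longitude   = c(10.15168, 10.18366, 10.17883, 10.18032, 10.17410,
                  10.17052, 10.18328, 10.18570, 10.18209, 10.17870),
  latitude    = c(56.16368, 56.16245, 56.15726, 56.17061, 56.15099,
                  56.17031, 56.17163, 56.16588, 56.17875, 56.16995),
  avg_ozone   = c(94.99104, 115.32841, 106.61785, 81.92253, 114.33345,
                  100.98485, 118.22879, 111.94914, 137.23991, 127.04528),
  particulate = c(101.03512, 124.68005, 112.47661, 121.50843, 89.32272,
                  123.39569, 112.93678, 102.53156, 119.00751, 103.73866)
)

# Index of the cluster whose center has the lowest average ozone.
cleanest <- which.min(centers10$avg_ozone)

# drop = FALSE keeps the column header and row label when printing.
long <- centers10[cleanest, "longitude", drop = FALSE]
lati <- centers10[cleanest, "latitude", drop = FALSE]
min_ozone_level <- min(centers10$avg_ozone)
```

Selecting the row with `which.min` ties the reported coordinates to the same cluster that supplies the minimum ozone value.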