The volume of data produced by the IoT (Internet of Things) is enormous, and data mining techniques can be used to extract hidden information of high business value. Smart cities depend heavily on IoT. Air pollution is increasing rapidly in smart cities and has adverse effects on human health. It has many sources, including road traffic and industrial gases. In this study we use K-means clustering to find the healthiest areas of a smart city, that is, the areas most suitable for living. The dataset is generated from the City Plus project. The data is large and dynamic because of the number of sensors deployed at each location and their measurement frequency. It covers five air pollutants: ozone, sulfur dioxide, nitrogen dioxide, carbon monoxide and particulate matter. The dataset has three more fields: longitude, latitude and timestamp.

Get the data
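
The loading step itself is not shown in this report. The following is a minimal sketch of how the table `tbl` used below might be read, assuming the readings are stored in a CSV file. The function name `load_pollution` and the demo file are hypothetical; only the four column names come from the query in the next section.

```r
# Hypothetical loader: reads a CSV and checks that the columns used by the
# later SQL query are present. The CSV format is an assumption.
load_pollution <- function(path) {
  tbl <- read.csv(path, stringsAsFactors = FALSE)
  needed <- c("longitude", "latitude", "ozone", "particullate_matter")
  missing_cols <- setdiff(needed, names(tbl))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  tbl
}

# Demo on a tiny synthetic file (illustrative values, not project data).
demo_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    longitude = c(10.1, 10.2),
    latitude = c(56.1, 56.2),
    ozone = c(100, 80),
    particullate_matter = c(50, 60)
  ),
  demo_path,
  row.names = FALSE
)
tbl <- load_pollution(demo_path)
```

With the real data, `demo_path` would be replaced by the path to the downloaded file.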

Get the average ozone and particulate matter concentrations grouped by longitude and latitude

x1 = sqldf("select longitude, latitude, avg(ozone) as avg_ozone, avg(particullate_matter) as particulate from tbl group by longitude,latitude")
## Loading required package: tcltk
## Warning: Quoted identifiers should have class SQL, use DBI::SQL() if the
## caller performs the quoting.
head(x1)
##   longitude latitude avg_ozone particulate
## 1  10.06279 56.10071 114.87796   106.22234
## 2  10.06473 56.15245  97.82055   104.07477
## 3  10.06613 56.12138 123.91216    91.03358
## 4  10.07239 56.15457 133.58848   122.29269
## 5  10.07289 56.15469 140.92253   125.03375
## 6  10.07666 56.18959 102.11965    93.42367
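
For readers without `sqldf`, the same grouping can be sketched in base R with `aggregate()`. The data frame below is a small synthetic stand-in for `tbl`, and the name `x1_base` is introduced here only for illustration.

```r
# Synthetic stand-in for tbl (illustrative values, not project data).
tbl <- data.frame(
  longitude = c(10.1, 10.1, 10.2, 10.2),
  latitude = c(56.1, 56.1, 56.2, 56.2),
  ozone = c(100, 120, 80, 90),
  particullate_matter = c(50, 70, 60, 40)
)

# Mean ozone and particulate matter per (longitude, latitude) pair.
x1_base <- aggregate(
  cbind(ozone, particullate_matter) ~ longitude + latitude,
  data = tbl,
  FUN = mean
)

# Rename the averaged columns to match the SQL aliases.
names(x1_base)[3:4] <- c("avg_ozone", "particulate")
x1_base
```

The result has one row per sensor location, with the same four columns as `x1`.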

K-means

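Here we apply the K-means algorithm for k = 4 through 10. Each run clusters on all four columns of x1 (longitude, latitude, average ozone and average particulate matter). The objective is to group the locations into clusters and identify the cleanest areas in the city based on ozone concentration.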
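
Because `kmeans()` starts from randomly chosen centers, the centers change from run to run unless the random seed is fixed. The sketch below shows one way to make the runs repeatable and to loop over k instead of repeating the call. It uses synthetic data in place of `x1`; the seed, the `nstart` value and the names `x1_demo`, `fits` and `min_ozone_by_k` are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed so the random starts are repeatable

# Synthetic stand-in for x1 (illustrative values, not project data).
x1_demo <- data.frame(
  longitude = runif(60, 10.06, 10.25),
  latitude = runif(60, 56.10, 56.23),
  avg_ozone = runif(60, 80, 140),
  particulate = runif(60, 85, 130)
)

# Fit K-means for k = 4..10; nstart = 25 keeps the best of 25 random starts.
ks <- 4:10
fits <- lapply(ks, function(k) kmeans(x1_demo, centers = k, nstart = 25))
names(fits) <- paste0("k", ks)

# Lowest cluster-center ozone value for each k.
min_ozone_by_k <- sapply(fits, function(f) min(f$centers[, "avg_ozone"]))
min_ozone_by_k
```

In the analysis itself, `x1` would take the place of `x1_demo`. Note that the columns are clustered unscaled, so the pollutant averages, which vary by tens of units, weigh far more in the distance than longitude and latitude, which vary by less than 0.2.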

# k = 4
result4<-kmeans(x1,4)
result4$centers
##   longitude latitude avg_ozone particulate
## 1  10.18229 56.16382 113.46153   117.07341
## 2  10.17615 56.16417 109.35957    99.12508
## 3  10.17705 56.17164 130.43995   111.18956
## 4  10.17102 56.16822  94.32454   117.74445
# k = 5
result5<-kmeans(x1,5)
result5$centers
## (centers for k = 5 are not shown; the original run printed result4$centers here, repeating the k = 4 output)
# k = 6
result6<-kmeans(x1,6)
result6$centers
##   longitude latitude avg_ozone particulate
## 1  10.17961 56.16234 112.00932   111.20974
## 2  10.18238 56.16182 117.05756    97.88561
## 3  10.15925 56.16820  96.69646   101.42041
## 4  10.18186 56.16410 113.37158   123.31061
## 5  10.18123 56.17512 131.52896   114.22018
## 6  10.17362 56.17085  93.45754   122.38768
# k = 7
result7<-kmeans(x1,7)
result7$centers
##   longitude latitude avg_ozone particulate
## 1  10.17825 56.16944 118.60557    109.5562
## 2  10.18495 56.15798 115.86044     96.2054
## 3  10.17523 56.16222 105.50605    114.7037
## 4  10.17218 56.16558  88.92402    124.2009
## 5  10.18488 56.16478 115.13358    123.6108
## 6  10.18290 56.17964 135.92837    115.6180
## 7  10.16004 56.16695  95.85376    100.3781
# k = 8
result8<-kmeans(x1,8)
result8$centers
##   longitude latitude avg_ozone particulate
## 1  10.15406 56.16399  95.34281    100.1931
## 2  10.17243 56.17007 100.94855    122.3536
## 3  10.18082 56.15508 108.38938    111.1353
## 4  10.18310 56.16506 115.70587    123.3082
## 5  10.18495 56.15798 115.86044     96.2054
## 6  10.18440 56.17986 136.74576    116.2542
## 7  10.17692 56.17476 120.45539    109.7785
## 8  10.18032 56.17061  81.92253    121.5084
# k = 9
result9<-kmeans(x1,9)
result9$centers
##   longitude latitude avg_ozone particulate
## 1  10.17410 56.15099 114.33345    89.32272
## 2  10.14908 56.16746  95.39273   101.36787
## 3  10.18505 56.17144 117.82433   113.06978
## 4  10.17218 56.16558  88.92402   124.20092
## 5  10.18413 56.16805 111.10087   103.77677
## 6  10.18538 56.16045 115.05959   125.62627
## 7  10.18209 56.17875 137.23991   119.00751
## 8  10.17870 56.16995 127.04528   103.73866
## 9  10.17588 56.15930 105.29682   116.19068
# k = 10
result10<-kmeans(x1,10)
result10$centers
##    longitude latitude avg_ozone particulate
## 1   10.15168 56.16368  94.99104   101.03512
## 2   10.18366 56.16245 115.32841   124.68005
## 3   10.17883 56.15726 106.61785   112.47661
## 4   10.18032 56.17061  81.92253   121.50843
## 5   10.17410 56.15099 114.33345    89.32272
## 6   10.17052 56.17031 100.98485   123.39569
## 7   10.18328 56.17163 118.22879   112.93678
## 8   10.18570 56.16588 111.94914   102.53156
## 9   10.18209 56.17875 137.23991   119.00751
## 10  10.17870 56.16995 127.04528   103.73866

Visualizing the cluster
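
The plot for this step is not reproduced in the text. As a rough sketch of what such a plot could look like, the code below draws sensor locations colored by cluster using base R graphics. This is a stand-in technique, not necessarily the plotting method used in the original analysis, and the data is synthetic.

```r
set.seed(1)  # arbitrary seed for a repeatable example

# Synthetic stand-in for x1 (illustrative values, not project data).
x1_demo <- data.frame(
  longitude = runif(40, 10.06, 10.25),
  latitude = runif(40, 56.10, 56.23),
  avg_ozone = runif(40, 80, 140),
  particulate = runif(40, 85, 130)
)
fit <- kmeans(x1_demo, centers = 4, nstart = 10)

# Scatter plot of locations, one color per cluster.
plot(
  x1_demo$longitude, x1_demo$latitude,
  col = fit$cluster, pch = 19,
  xlab = "Longitude", ylab = "Latitude",
  main = "K-means clusters (k = 4, synthetic data)"
)

# Mark the cluster centers with crosses.
points(
  fit$centers[, "longitude"], fit$centers[, "latitude"],
  pch = 4, cex = 2, lwd = 2
)
```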

Mapping the cluster

## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=56.166151,10.176887&zoom=12&size=640x640&scale=1&maptype=satellite&language=en-EN&sensor=false

Conclusion

Among the runs shown, the lowest cluster-center ozone concentration is 81.92253, obtained at K = 10 (the same center also appears at K = 8). The cleanest area is therefore found at:

Longitude:

long
##   longitude
## 4  10.18032

Latitude:

lati
##   latitude
## 4 56.17061

Minimum ozone level:

min_ozone_level
## [1] 81.92253
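
The code that defines `long`, `lati` and `min_ozone_level` is not shown above. One way they could be derived is sketched below, using the k = 10 centers exactly as printed earlier. The names `centers10` and `best` are hypothetical; in the analysis, `centers10` would presumably be `as.data.frame(result10$centers)`.

```r
# Centers for k = 10, copied from the printed output above.
centers10 <- data.frame(
  longitude = c(10.15168, 10.18366, 10.17883, 10.18032, 10.17410,
                10.17052, 10.18328, 10.18570, 10.18209, 10.17870),
  latitude = c(56.16368, 56.16245, 56.15726, 56.17061, 56.15099,
               56.17031, 56.17163, 56.16588, 56.17875, 56.16995),
  avg_ozone = c(94.99104, 115.32841, 106.61785, 81.92253, 114.33345,
                100.98485, 118.22879, 111.94914, 137.23991, 127.04528),
  particulate = c(101.03512, 124.68005, 112.47661, 121.50843, 89.32272,
                  123.39569, 112.93678, 102.53156, 119.00751, 103.73866)
)

# Row index of the cluster with the lowest average ozone.
best <- which.min(centers10$avg_ozone)

# drop = FALSE keeps one-cell data frames, matching the printed shape above.
long <- centers10[best, "longitude", drop = FALSE]
lati <- centers10[best, "latitude", drop = FALSE]
min_ozone_level <- centers10$avg_ozone[best]
```

Selecting the row with `which.min()` ties the reported coordinates directly to the minimum, rather than reading them off the table by hand.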