INTRODUCTION

Data visualization adds another dimension to understanding data and its implications. Recently I came across census-related datasets that highlighted the mean centroid, population and county-level data for various US states. I then found a similar set of datasets on Kaggle for India, my motherland. A user had posted a similar visualization carried out in Python, and I want to take a similar approach in RStudio.

The dataset is available on Kaggle.

DATA PREPARATION AND PROCESSING

The original datasets are in CSV format. There are three datasets to work with:
1) State Wise Centroids from 2001
2) State Wise Centroids from 2011
3) District Wise Centroid and Population

We will use the read_csv function from the readr package to import them into R. The exact call will vary depending on the file type and on whether the dataset is stored on your device or has to be downloaded from an external site.

library(readr)
# Import the 2001 state-level centroids
centroid_2001 <- read_csv("~/Desktop/indian-census-data-with-geospatial-indexing/state wise centroids_2001.csv")
## Parsed with column specification:
## cols(
##   State = col_character(),
##   Longitude = col_double(),
##   Latitude = col_double()
## )
# Import the 2011 state-level centroids (readr is already loaded above)
centroid_2011 <- read_csv("~/Desktop/indian-census-data-with-geospatial-indexing/state wise centroids_2011.csv")
## Parsed with column specification:
## cols(
##   State = col_character(),
##   Longitude = col_double(),
##   Latitude = col_double()
## )
# Import the district-level centroids (readr is already loaded above)
d_centroid_population <- read_csv("~/Desktop/indian-census-data-with-geospatial-indexing/district wise centroids.csv")
## Parsed with column specification:
## cols(
##   State = col_character(),
##   District = col_character(),
##   Latitude = col_double(),
##   Longitude = col_double()
## )
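The "Parsed with column specification" messages above show the column types readr guessed. As a minimal sketch, the same types can be declared explicitly so that a malformed file raises a parsing problem instead of being silently read as text. The temporary file and its two rows are hypothetical placeholders, not values from the Kaggle data, and `state_cols` and `example_centroids` are names introduced only for this illustration.

```r
library(readr)

# Hypothetical stand-in file so the sketch runs anywhere;
# replace `tmp` with the real CSV path when working with the census data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("State,Longitude,Latitude",
             "Example State A,77.5,12.9",
             "Example State B,88.3,22.6"), tmp)

# Declare the column types up front instead of letting readr guess them.
state_cols <- cols(
  State = col_character(),
  Longitude = col_double(),
  Latitude = col_double()
)

example_centroids <- read_csv(tmp, col_types = state_cols)

# Basic sanity checks: expected columns and coordinates within valid ranges.
stopifnot(
  identical(names(example_centroids), c("State", "Longitude", "Latitude")),
  all(example_centroids$Longitude >= -180 & example_centroids$Longitude <= 180),
  all(example_centroids$Latitude >= -90 & example_centroids$Latitude <= 90)
)
```

Passing `col_types` also suppresses the column specification message, which keeps the rendered report shorter.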

A summary of each dataset will help us better understand its structure and constituent variables.

summary(centroid_2001)
##     State             Longitude        Latitude    
##  Length:35          Min.   :72.19   Min.   :10.30  
##  Class :character   1st Qu.:75.84   1st Qu.:19.79  
##  Mode  :character   Median :78.88   Median :23.71  
##                     Mean   :81.72   Mean   :22.75  
##                     3rd Qu.:88.33   3rd Qu.:27.20  
##                     Max.   :94.66   Max.   :33.63
summary(centroid_2011)
##     State             Longitude        Latitude    
##  Length:35          Min.   :72.20   Min.   :10.33  
##  Class :character   1st Qu.:75.84   1st Qu.:19.80  
##  Mode  :character   Median :78.85   Median :23.71  
##                     Mean   :81.73   Mean   :22.75  
##                     3rd Qu.:88.33   3rd Qu.:27.20  
##                     Max.   :94.55   Max.   :33.65
summary(d_centroid_population)
##     State             District            Latitude        Longitude    
##  Length:594         Length:594         Min.   : 7.835   Min.   :68.82  
##  Class :character   Class :character   1st Qu.:20.571   1st Qu.:76.41  
##  Mode  :character   Mode  :character   Median :24.479   Median :79.43  
##                                        Mean   :23.216   Mean   :81.10  
##                                        3rd Qu.:26.862   3rd Qu.:85.31  
##                                        Max.   :34.599   Max.   :96.60
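The 2001 and 2011 summaries are nearly identical, which suggests the state centroids barely moved. One way to check this state by state is to join the two tables on State and compute the great-circle distance between each pair of centroids. The sketch below uses the standard haversine formula with a mean Earth radius of 6371 km. The two toy data frames are placeholders for illustration only, not values from the census files, and `haversine_km` is a helper defined here rather than a function from any package.

```r
# Great-circle (haversine) distance in km between two sets of lon/lat points.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Placeholder rows for illustration only; NOT values from the census files.
toy_2001 <- data.frame(State = c("A", "B"),
                       Longitude = c(77.0, 88.0),
                       Latitude = c(13.0, 22.0))
toy_2011 <- data.frame(State = c("A", "B"),
                       Longitude = c(77.0, 88.1),
                       Latitude = c(13.0, 22.0))

# Match states by name, then measure how far each centroid moved.
shift <- merge(toy_2001, toy_2011, by = "State",
               suffixes = c("_2001", "_2011"))
shift$moved_km <- haversine_km(shift$Longitude_2001, shift$Latitude_2001,
                               shift$Longitude_2011, shift$Latitude_2011)

# Largest shifts first.
shift[order(-shift$moved_km), c("State", "moved_km")]
```

With the real data, `toy_2001` and `toy_2011` would be replaced by `centroid_2001` and `centroid_2011`, assuming the State names are spelled identically in both files.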

In RStudio, the leaflet package helps create interactive maps. After loading it, we can build an interactive map for each dataset:

library(leaflet)

Markers pin each row of a dataset to its location on the map, and a popup shows the column we want to highlight when a marker is clicked. In other words, a map cannot display every column at once, so we choose the one that matters most for each map.
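If more than one column is worth showing, a popup can carry a short HTML string assembled from several columns. The following is a minimal sketch of that idea, not the approach used for the maps in this post. The data frame holds placeholder rows rather than census values, and `popup_text` is a column name introduced for this example.

```r
library(leaflet)

# Placeholder rows for illustration only; NOT values from the census files.
toy_centroids <- data.frame(
  State = c("Example State A", "Example State B"),
  Longitude = c(77.0, 88.0),
  Latitude = c(13.0, 22.0)
)

# Build one HTML popup string per row from several columns.
toy_centroids$popup_text <- paste0(
  "<b>", toy_centroids$State, "</b><br>",
  "Lon: ", round(toy_centroids$Longitude, 2),
  ", Lat: ", round(toy_centroids$Latitude, 2)
)

# Name the coordinate columns explicitly so leaflet does not have to guess them.
leaflet(toy_centroids) %>%
  addTiles() %>%
  addMarkers(lng = ~Longitude, lat = ~Latitude, popup = ~popup_text)
```

Because popups are rendered as HTML, names containing characters such as `<` or `&` would need escaping first, for example with `htmltools::htmlEscape`.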

The first map highlights the centroids of the Indian states from the census data collected in 2001.

# Show the state name on click and cluster nearby markers when zoomed out
centroid_2001 %>%
   leaflet() %>%
   addTiles() %>%
   addMarkers(popup = ~State, clusterOptions = markerClusterOptions())
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively

The second map highlights the centroids of the Indian states from the census data collected in 2011.

# Same map as above, built from the 2011 centroids
centroid_2011 %>%
   leaflet() %>%
   addTiles() %>%
   addMarkers(popup = ~State, clusterOptions = markerClusterOptions())
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively

Judging by the maps alone, the state centroid locations changed very little over the 10 years. However, population growth, migration and economic change almost certainly shifted population dynamics during that period, and those shifts are likely felt most at the district level. A district-level map gives a closer look.

The following map highlights the district-level centroids:

# Show the district name on click and cluster nearby markers when zoomed out
d_centroid_population %>%
   leaflet() %>%
   addTiles() %>%
   addMarkers(popup = ~District, clusterOptions = markerClusterOptions())
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
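The cluster bubbles on the district map show how many district centroids fall within an area at the current zoom level. A simple tabulation of districts per state gives the same information in table form. This is a rough sketch using base R only. The toy data frame is a placeholder for `d_centroid_population`, and `n_districts` is a column name introduced here.

```r
# Placeholder rows for illustration only; NOT values from the census files.
toy_districts <- data.frame(
  State = c("A", "A", "A", "B", "B"),
  District = c("a1", "a2", "a3", "b1", "b2")
)

# Count districts per state.
district_counts <- as.data.frame(
  table(State = toy_districts$State),
  responseName = "n_districts",
  stringsAsFactors = FALSE
)

# States with the most districts first.
district_counts <- district_counts[order(-district_counts$n_districts), ]
district_counts
```

Note that this counts districts, not people. Relating the counts to population would require a population column, which the column specification printed for the district file does not list.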

PURPOSE AND RESULT

The maps created with leaflet help visualize the central hubs in each state of India through the lens of census data collected in 2001 and 2011. A closer look at the district level gives more detailed insight into the places where the population is denser and affects the economy more than other places within the state.