Data visualization adds another layer of meaning to data and makes its implications easier to understand. Recently I came across census-related datasets that highlighted the mean centroid, population, and county-level data for various US states. I then found a similar set of datasets on Kaggle for India, my motherland. A Kaggle user had already posted a similar visualization in Python; I want to take a similar approach in R, using RStudio.
The dataset is available on Kaggle.
The original datasets are in CSV format. There are three datasets to work with:
1) State Wise Centroids from 2001
2) State Wise Centroids from 2011
3) District Wise Centroid and Population
We will use the read_csv function from the readr package to import them into R. The call will vary depending on the file type and on whether the dataset is stored on your device or has to be downloaded from an external site.
library(readr)
centroid_2001 <- read_csv("~/Desktop/indian-census-data-with-geospatial-indexing/state wise centroids_2001.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## Longitude = col_double(),
## Latitude = col_double()
## )
library(readr)
centroid_2011 <- read_csv("~/Desktop/indian-census-data-with-geospatial-indexing/state wise centroids_2011.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## Longitude = col_double(),
## Latitude = col_double()
## )
library(readr)
d_centroid_population <- read_csv("~/Desktop/indian-census-data-with-geospatial-indexing/district wise centroids.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## District = col_character(),
## Latitude = col_double(),
## Longitude = col_double()
## )
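Each import above prints the column specification that readr guessed. As a minimal sketch, the same specification can be passed explicitly through `col_types`, which silences the message and makes the import fail loudly if a column ever changes type. The helper name `read_state_centroids` and the one-row stand-in file are assumptions made for illustration; they are not part of the Kaggle dataset.

```r
library(readr)

# Hypothetical helper (not in the original post): read one state-level
# centroid file with an explicit column specification.
read_state_centroids <- function(path) {
  read_csv(
    path,
    col_types = cols(
      State = col_character(),
      Longitude = col_double(),
      Latitude = col_double()
    )
  )
}

# Stand-in file with made-up coordinates, used only so the sketch runs
# without the Kaggle download.
demo_path <- tempfile(fileext = ".csv")
writeLines(c("State,Longitude,Latitude", "Example State,78.5,22.5"), demo_path)

demo_centroids <- read_state_centroids(demo_path)
```

With the real files, the same helper could be called on each of the two state-level paths used above.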
A summary of each dataset will help us better understand its structure and constituent variables.
summary(centroid_2001)
## State Longitude Latitude
## Length:35 Min. :72.19 Min. :10.30
## Class :character 1st Qu.:75.84 1st Qu.:19.79
## Mode :character Median :78.88 Median :23.71
## Mean :81.72 Mean :22.75
## 3rd Qu.:88.33 3rd Qu.:27.20
## Max. :94.66 Max. :33.63
summary(centroid_2011)
## State Longitude Latitude
## Length:35 Min. :72.20 Min. :10.33
## Class :character 1st Qu.:75.84 1st Qu.:19.80
## Mode :character Median :78.85 Median :23.71
## Mean :81.73 Mean :22.75
## 3rd Qu.:88.33 3rd Qu.:27.20
## Max. :94.55 Max. :33.65
summary(d_centroid_population)
## State District Latitude Longitude
## Length:594 Length:594 Min. : 7.835 Min. :68.82
## Class :character Class :character 1st Qu.:20.571 1st Qu.:76.41
## Mode :character Mode :character Median :24.479 Median :79.43
## Mean :23.216 Mean :81.10
## 3rd Qu.:26.862 3rd Qu.:85.31
## Max. :34.599 Max. :96.60
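Because `summary()` only reports the length of character columns, a couple of extra checks can say more about the district table. The sketch below counts districts per state and missing values per column using base R. It runs on a toy data frame with made-up rows so that it is self-contained; with the real data, `d_centroid_population` would take the place of `demo_districts`.

```r
# Toy stand-in for d_centroid_population (made-up rows, not real census data).
demo_districts <- data.frame(
  State = c("State A", "State A", "State B"),
  District = c("District 1", "District 2", "District 3"),
  Latitude = c(20.1, 21.4, 27.9),
  Longitude = c(75.2, 76.0, 88.3)
)

# Number of districts per state, largest first.
districts_per_state <- sort(table(demo_districts$State), decreasing = TRUE)

# Number of missing values in each column.
missing_per_column <- colSums(is.na(demo_districts))
```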
In R, the leaflet package helps create interactive maps. After loading it, we can build an interactive map for each of the datasets above:
library(leaflet)
Markers pinpoint each centroid on the map, and the popup shows the one column we choose to highlight. In simple words, a map this small cannot display every column at once.
The first map highlights the centroids of the Indian states from the census data collected in 2001.
centroid_2001 %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(popup = centroid_2001$State, clusterOptions = markerClusterOptions())
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
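The "Assuming ..." message appears because leaflet guesses the coordinate columns from their names. A minimal sketch of the same kind of map with the columns named explicitly through the `lng` and `lat` arguments is shown below; it also uses the formula form `~State` for the popup instead of `centroid_2001$State`. The data frame `demo_states` holds made-up values and stands in for `centroid_2001`.

```r
library(leaflet)

# Toy stand-in for centroid_2001 (made-up values, not real census data).
demo_states <- data.frame(
  State = c("State A", "State B"),
  Longitude = c(75.8, 88.3),
  Latitude = c(19.8, 27.2)
)

# Same pipeline as above, but with the coordinate columns stated explicitly,
# so leaflet does not have to guess them.
state_map <- leaflet(demo_states) %>%
  addTiles() %>%
  addMarkers(
    lng = ~Longitude,
    lat = ~Latitude,
    popup = ~State,
    clusterOptions = markerClusterOptions()
  )
```

Printing `state_map` in RStudio displays the map in the Viewer pane.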
The second map highlights the centroids of the Indian states from the census data collected in 2011.
centroid_2011 %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(popup = centroid_2011$State, clusterOptions = markerClusterOptions())
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
Judging by the maps alone, the state centroids barely moved over the 10 years. Population growth, migration, and economic change almost certainly shifted population dynamics during that period, though, and those shifts are likely to be felt more at the district level.
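The visual impression of the state centroids can also be checked numerically. The sketch below joins two state-level tables on `State` and computes how far each centroid moved, using the haversine great-circle formula with a mean Earth radius of 6371 km. The function name `haversine_km` and the two toy data frames are assumptions for illustration; with the real data, `centroid_2001` and `centroid_2011` would take their place. One caveat: a state whose name is spelled differently in the two files would drop out of the merge.

```r
# Toy stand-ins for centroid_2001 and centroid_2011 (made-up values).
demo_2001 <- data.frame(
  State = c("State A", "State B"),
  Longitude = c(75.80, 88.30),
  Latitude = c(19.80, 27.20)
)
demo_2011 <- data.frame(
  State = c("State A", "State B"),
  Longitude = c(75.85, 88.30),
  Latitude = c(19.80, 27.25)
)

# Inner join on State; coordinate columns get a year suffix.
shift <- merge(demo_2001, demo_2011, by = "State", suffixes = c("_2001", "_2011"))

# Hypothetical helper: haversine great-circle distance in kilometres.
haversine_km <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(sqrt(pmin(1, a)))
}

shift$shift_km <- haversine_km(
  shift$Longitude_2001, shift$Latitude_2001,
  shift$Longitude_2011, shift$Latitude_2011
)

# States ordered by how far their centroid moved.
shift_ranked <- shift[order(-shift$shift_km), c("State", "shift_km")]
```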
This map highlights the district-level centroids:
d_centroid_population %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(popup = d_centroid_population$District, clusterOptions = markerClusterOptions())
## Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
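With several hundred districts, a popup that shows only the district name can be hard to place. One possible variation, sketched below, combines the district and state names in the popup and swaps the default pins for smaller circle markers via `addCircleMarkers`. The `radius` value and the toy data frame `demo_districts` are illustrative assumptions rather than choices from the original analysis.

```r
library(leaflet)

# Toy stand-in for d_centroid_population (made-up rows, not real census data).
demo_districts <- data.frame(
  State = c("State A", "State A", "State B"),
  District = c("District 1", "District 2", "District 3"),
  Latitude = c(20.1, 21.4, 27.9),
  Longitude = c(75.2, 76.0, 88.3)
)

# Circle markers with a "District, State" popup; markers are still clustered.
district_map <- leaflet(demo_districts) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~Longitude,
    lat = ~Latitude,
    popup = ~paste0(District, ", ", State),
    radius = 4,
    clusterOptions = markerClusterOptions()
  )
```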
The maps created with leaflet make it easier to see the central point of each Indian state through the lens of the census data collected in 2001 and 2011. A closer look at the district level offers more detail on where a state's population is likely denser and, in turn, likely to affect the economy more than other places within the state.