Develop a script in R to create a choropleth map showing literacy rates across Indian states using spatial visualization.
What we will do
Load required libraries
Load and inspect dataset
Perform exploratory data analysis
Load India map (spatial data)
Merge dataset with map
Create choropleth map
Step 1: Load Required Libraries
In this step, we load the required libraries.
ggplot2 is used for creating visualizations. Its map_data() function also turns map outlines into data frames.
dplyr helps with data manipulation.
maps provides the built-in map outlines (such as "world") that map_data() reads.
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.5.3
library(dplyr)
Warning: package 'dplyr' was built under R version 4.5.3
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(maps)
Warning: package 'maps' was built under R version 4.5.3
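If one of these library() calls stops with "there is no package called ...", that package is not installed yet. The following minimal base-R sketch lists which of the three packages used here are missing. The names required_pkgs and missing_pkgs are our own, and the sketch only prints the install command rather than running it.

```r
# Packages this tutorial attaches with library()
required_pkgs <- c("ggplot2", "dplyr", "maps")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Show the command instead of installing anything silently
  message("Missing packages; run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```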
Step 2: Load Dataset
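Here, we create a small sample dataset of literacy rates (in percent) for six states directly with data.frame() and print it to preview the values. The state names are written in lowercase so they can later be matched against the lowercase region names in the map data. With real data, you would typically load a file using read.csv() and inspect it with head() and str().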
data <- data.frame(
  State = c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar"),
  Literacy_2001 = c(60, 67, 73, 90, 77, 47),
  Literacy_2011 = c(67, 75, 80, 94, 83, 63)
)
data
State Literacy_2001 Literacy_2011
1 andhra pradesh 60 67
2 karnataka 67 75
3 tamil nadu 73 80
4 kerala 90 94
5 maharashtra 77 83
6 bihar 47 63
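The step description mentions reading the table from a CSV file with read.csv() and inspecting it with head() and str(). The following sketch shows that workflow. It assumes a CSV with the same three columns, and to keep the example self-contained it first writes the sample table to a temporary file. csv_path is our own name; point it at your own file instead.

```r
# Sample table from this step, written to a temporary CSV to stand in for a real file
literacy <- data.frame(
  State = c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar"),
  Literacy_2001 = c(60, 67, 73, 90, 77, 47),
  Literacy_2011 = c(67, 75, 80, 94, 83, 63)
)
csv_path <- tempfile(fileext = ".csv")   # replace with the path to your own file
write.csv(literacy, csv_path, row.names = FALSE)

# Read it back and inspect it, as you would with real data
data_csv <- read.csv(csv_path)
head(data_csv)   # first rows
str(data_csv)    # column names and types
```

read.csv() stores whole-number columns as integers. Since R 4.0, it also reads text columns as character rather than factors. str() shows both.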
Step 3: Exploratory Data Analysis
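We summarize the dataset to see the range of literacy values. If any column contained missing values, summary() would also report an NA's count for it. We then add a Growth column: the change in literacy rate, in percentage points, from 2001 to 2011.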
summary(data)
    State           Literacy_2001   Literacy_2011
 Length:6           Min.   :47.00   Min.   :63.00
 Class :character   1st Qu.:61.75   1st Qu.:69.00
 Mode  :character   Median :70.00   Median :77.50
                    Mean   :69.00   Mean   :77.00
                    3rd Qu.:76.00   3rd Qu.:82.25
                    Max.   :90.00   Max.   :94.00
data$Growth <- data$Literacy_2011 - data$Literacy_2001
data
State Literacy_2001 Literacy_2011 Growth
1 andhra pradesh 60 67 7
2 karnataka 67 75 8
3 tamil nadu 73 80 7
4 kerala 90 94 4
5 maharashtra 77 83 6
6 bihar 47 63 16
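summary() reveals missing values only indirectly, through an NA's row. The following base-R sketch counts missing values per column explicitly and ranks states by their gain. It rebuilds the sample table so it runs on its own. The names na_counts and ranked are our own.

```r
data <- data.frame(
  State = c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar"),
  Literacy_2001 = c(60, 67, 73, 90, 77, 47),
  Literacy_2011 = c(67, 75, 80, 94, 83, 63)
)

# Number of missing values in each column (all zeros means nothing is missing)
na_counts <- colSums(is.na(data))
print(na_counts)

# Percentage-point change between the two census years
data$Growth <- data$Literacy_2011 - data$Literacy_2001

# States ordered from largest to smallest gain
ranked <- data[order(data$Growth, decreasing = TRUE), c("State", "Growth")]
print(ranked)
```

The mean growth equals the difference between the two column means from summary(), 77 - 69 = 8 points, because the mean of a difference is the difference of the means.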
Step 4: Load India Map
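We use the world map that ships with the maps package, convert it to a data frame with ggplot2's map_data(), and keep only the rows whose region is India. Note that this dataset contains only India's national outline (its subregion column lists islands such as Great Nicobar), not state boundaries.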
# Load world map data (polygons stored as long/lat points)
india_map <- map_data("world")
# Convert region names to lowercase for matching
india_map$region <- tolower(india_map$region)
# Keep India only
india_map <- india_map[india_map$region == "india", ]
head(india_map)
long lat group order region subregion
49786 93.89004 6.831055 826 50611 india Great Nicobar
49787 93.82881 6.748682 826 50612 india Great Nicobar
49788 93.70928 7.000683 826 50613 india Great Nicobar
49789 93.65800 7.016065 826 50614 india Great Nicobar
49790 93.65635 7.136231 826 50615 india Great Nicobar
49791 93.68418 7.183593 826 50616 india Great Nicobar
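Before merging, check that the map actually contains the keys you plan to join on. In the filtered world map, every row has the region value "india", so none of the six state names can match. The following minimal base-R sketch performs such a check. check_join_keys() is a hypothetical helper of our own, and the map keys are passed as a plain vector so the example runs without the map data.

```r
# Hypothetical helper: compare join keys after normalising case and whitespace
check_join_keys <- function(data_keys, map_keys) {
  data_keys <- unique(tolower(trimws(data_keys)))
  map_keys  <- unique(tolower(trimws(map_keys)))
  list(
    matched   = intersect(data_keys, map_keys),
    unmatched = setdiff(data_keys, map_keys)
  )
}

states <- c("andhra pradesh", "karnataka", "tamil nadu",
            "kerala", "maharashtra", "bihar")

# Stand-in for india_map$region: after filtering, every value is "india"
map_regions <- rep("india", 5)

result <- check_join_keys(states, map_regions)
print(result$matched)    # empty: no state name occurs in the map
print(result$unmatched)  # all six states, so each would get NA after the merge
```

With the objects from this tutorial, the equivalent call would be check_join_keys(data$State, india_map$region).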
Step 5: Merge Data with Map
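In this step, we combine the literacy dataset with the spatial map data.
This is done with base R's merge(), matching the map's region column against the State column. Setting all.x = TRUE keeps every map row, even those without a matching state (they receive NA), which matches the behavior of dplyr's left_join().
This step is crucial because it links data values to geographic regions.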
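# Convert state names to lowercase
data$State <- tolower(data$State)
# Merge map and data; all.x = TRUE keeps map rows without a matching state (as NA)
merged_data <- merge(india_map, data, by.x = "region", by.y = "State", all.x = TRUE)
# merge() sorts rows by the key, so restore the original drawing order of the points
merged_data <- merged_data[order(merged_data$order), ]
head(merged_data)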
region long lat group order subregion Literacy_2001
1 india 93.89004 6.831055 826 50611 Great Nicobar NA
2 india 93.82881 6.748682 826 50612 Great Nicobar NA
3 india 93.70928 7.000683 826 50613 Great Nicobar NA
4 india 93.65800 7.016065 826 50614 Great Nicobar NA
5 india 93.65635 7.136231 826 50615 Great Nicobar NA
6 india 93.68418 7.183593 826 50616 Great Nicobar NA
Literacy_2011 Growth
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
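By default (sort = TRUE), merge() returns rows sorted by the join column rather than by the map's order column. geom_polygon() draws each outline in row order, so re-sorting on order after the merge is a common safeguard. The following base-R sketch uses two hypothetical square regions, listed so that alphabetical sorting moves the second one to the front. All names in it are our own.

```r
# Two hypothetical square regions; "state b" is listed first on purpose
toy_map <- data.frame(
  long   = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat    = c(0, 0, 1, 1, 0, 0, 1, 1),
  group  = rep(1:2, each = 4),
  order  = 1:8,
  region = rep(c("state b", "state a"), each = 4)
)
toy_data <- data.frame(State = c("state a", "state b"), Literacy_2011 = c(70, 85))

# Same call pattern as the merge in this step
toy_merged <- merge(toy_map, toy_data, by.x = "region", by.y = "State", all.x = TRUE)
order_after_merge <- toy_merged$order
print(order_after_merge)   # rows now follow the region name, not the drawing order

# Restore the original point sequence before plotting
toy_merged <- toy_merged[order(toy_merged$order), ]
print(toy_merged$order)
```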
Step 6: Create Choropleth Map
Map for Literacy Rate (2011)
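Here, we create a choropleth map in which each region is colored by its 2011 literacy rate. Darker shades indicate higher literacy and lighter shades lower literacy, while regions without matching data are drawn in grey (na.value). Because the world map contains only India's outline, the merge above produced NA for every row, so this code draws India entirely in grey. Coloring individual states requires a state-level boundary layer whose region names match the State column.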
ggplot(merged_data, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = Literacy_2011), color = "black") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey50") +
  coord_quickmap() +  # keep a sensible aspect ratio for long/lat data
  theme_minimal() +
  labs(title = "Literacy Rate in India (2011)", fill = "Literacy %")
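To color individual states, the map needs one boundary per state, with names that match the State column. A common route is the sf package: read a state-boundary shapefile or GeoJSON with st_read(), merge the data by name, and draw it with geom_sf(). The following sketch shows that pattern. In place of a real file, it uses two hypothetical square regions built in code; the region names, the state_name column, and make_square() are all our own.

```r
library(sf)
library(ggplot2)

# Build a closed square ring starting at (x0, 0); purely illustrative geometry
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two hypothetical "states" with a name column, standing in for a real
# state-boundary layer that you would read with st_read()
states_sf <- st_sf(
  state_name = c("state x", "state y"),
  geometry = st_sfc(make_square(0), make_square(2), crs = 4326)
)

toy_data <- data.frame(State = c("state x", "state y"), Literacy_2011 = c(62, 91))

# merge() on an sf object keeps the geometry column and returns an sf object
states_sf <- merge(states_sf, toy_data, by.x = "state_name", by.y = "State", all.x = TRUE)

p <- ggplot(states_sf) +
  geom_sf(aes(fill = Literacy_2011), color = "black") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey50") +
  theme_minimal() +
  labs(title = "Toy choropleth with sf (hypothetical regions)", fill = "Literacy %")
print(p)
```

For real data, replace the toy layer with st_read() on a state-boundary file you have obtained, and use the column in that file that holds state names as by.x. Both the path and the column name depend on the source. Unlike the long/lat data frame, each sf row carries a complete polygon, so the row reordering done by merge() does no harm here.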