Anika and Likita
2026-04-15
Problem statement:
Develop a script in R to create a choropleth map showing literacy rates across Indian states using spatial visualization.
What we will do:
Load required libraries
Load and inspect dataset
Perform exploratory data analysis
Load India map (spatial data)
Merge dataset with map
Create choropleth map
Step 1: Load Required Libraries
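In this step, we load the required libraries.
ggplot2 is used for creating visualizations. Its map_data() function also supplies the map outlines used in Step 4.
dplyr helps in data manipulation.
maps provides the built-in world map data that map_data() reads.
sf (for handling spatial data) and readr (for reading data files) are not needed here, because the script uses built-in map data and a dataset typed in directly.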
```{r}
library(ggplot2)  # plotting and map_data()
library(dplyr)    # data manipulation
library(maps)     # built-in map databases read by map_data()
```
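If any of these packages is missing, the library() calls stop with an error. The sketch below checks availability first. It assumes that the three packages loaded above are the only ones the script needs. The name missing_pkgs is introduced here for illustration.

```{r}
# Packages this notebook loads in Step 1
pkgs <- c("ggplot2", "dplyr", "maps")

# requireNamespace() returns FALSE (quietly) for packages that are not installed
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  message("Missing packages, install with install.packages(): ",
          paste(missing_pkgs, collapse = ", "))
}
```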
Step 2: Load Dataset
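Here, we create a small dataset of literacy rates (percent literate in 2001 and 2011) for six states directly with data.frame(), then print it to check the variables and their values. State names are entered in lowercase so they can later be matched to map region names. With an external data file, read.csv() followed by head() and str() would serve the same purpose.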
```{r}
# Literacy rates (%) for six states in 2001 and 2011
data <- data.frame(
  State = c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar"),
  Literacy_2001 = c(60, 67, 73, 90, 77, 47),
  Literacy_2011 = c(67, 75, 80, 94, 83, 63)
)

data
```
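The plan in Step 2 mentions reading the data from a file with read.csv() and inspecting it with head() and str(). The sketch below shows that workflow under one assumption: because no data file ships with this notebook, it writes the same six rows to a temporary CSV (via tempfile()) and reads them back. With a real file, csv_path would point to that file instead. The names literacy_df, csv_path and data_from_csv are introduced here for illustration.

```{r}
# Same six rows as above, stored under a separate name
literacy_df <- data.frame(
  State = c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar"),
  Literacy_2001 = c(60, 67, 73, 90, 77, 47),
  Literacy_2011 = c(67, 75, 80, 94, 83, 63)
)

# Temporary CSV standing in for a real data file
csv_path <- tempfile(fileext = ".csv")
write.csv(literacy_df, csv_path, row.names = FALSE)

# Read it back the way an external file would be loaded
data_from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
head(data_from_csv)  # first rows
str(data_from_csv)   # column names and types
```

read.csv() stores whole-number columns as integers, so str() reports the literacy columns as int rather than num.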
Step 3: Exploratory Data Analysis
We summarise each column, count the missing values, and compute Growth, the change in literacy rate (in percentage points) between 2001 and 2011.
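```{r}
# Summary statistics for each column
summary(data)

# Number of missing values per column
colSums(is.na(data))

# Change in literacy rate from 2001 to 2011 (percentage points)
data$Growth <- data$Literacy_2011 - data$Literacy_2001
data
```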
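dplyr is loaded in Step 1 but not used elsewhere. The sketch below computes the same Growth column with mutate() and ranks the states with arrange(). The name literacy_df is introduced here so that the notebook's data object is left untouched.

```{r}
library(dplyr)

literacy_df <- data.frame(
  State = c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar"),
  Literacy_2001 = c(60, 67, 73, 90, 77, 47),
  Literacy_2011 = c(67, 75, 80, 94, 83, 63)
)

growth_rank <- literacy_df %>%
  mutate(Growth = Literacy_2011 - Literacy_2001) %>%  # percentage-point change
  arrange(desc(Growth))                               # largest gain first

growth_rank
```

With these values, Bihar shows the largest gain (16 points) and Kerala the smallest (4 points).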
Step 4: Load India Map
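We use the world map bundled with the maps package, loaded through ggplot2's map_data(), and keep only the polygons for India. This map contains only India's national outline, not state boundaries. Its region column therefore holds the country name rather than state names.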
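```{r}
# Load world map data (from the maps package) as a data frame
india_map <- map_data("world")

# Lowercase the region names and keep only India's polygons
india_map$region <- tolower(india_map$region)
india_map <- india_map[india_map$region == "india", ]

head(india_map)
```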
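The diagnostic below assumes the maps package is installed. It lists the region names in the India subset and compares them with the state names in our dataset. An empty intersection means the join in Step 5 cannot attach any literacy value to the map. The names india_outline, states and matches are introduced here for illustration.

```{r}
library(ggplot2)  # map_data() reads the maps package's databases

india_outline <- map_data("world")
india_outline <- india_outline[tolower(india_outline$region) == "india", ]

# Region names present in the India subset: only the country itself
unique(india_outline$region)

# State names from the literacy dataset (as entered in Step 2)
states <- c("andhra pradesh", "karnataka", "tamil nadu",
            "kerala", "maharashtra", "bihar")

# States that could be joined to the map: none, since there are no state polygons
matches <- intersect(tolower(unique(india_outline$region)), states)
matches
```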
Step 5: Merge Data with Map
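In this step, we combine the literacy dataset with the spatial map data using merge(). The merge matches the map's region column to the State column of our dataset. This step is crucial because it links data values to geographic regions. Two details matter. First, names must be spelled identically in both tables, which is why they are lowercased. Second, merge() sorts rows by the key, so the map points must be put back in their original order (the order column) before plotting. Otherwise the polygons are drawn incorrectly.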
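```{r}
# Convert state names to lowercase so they match the map's region names
data$State <- tolower(data$State)

# Keep every map point (all.x = TRUE); points without a match get NA literacy
merged_data <- merge(india_map, data, by.x = "region", by.y = "State", all.x = TRUE)

# merge() sorts rows by the key, so restore the polygon drawing order
merged_data <- merged_data[order(merged_data$order), ]

head(merged_data)
```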
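The same join can also be written with dplyr's left_join(). The sketch below applies it to a hypothetical two-triangle map (toy_map) that uses the same long/lat/group/order/region layout that map_data() returns. It then uses anti_join() to list the dataset states that found no polygon. The names toy_map, toy_literacy, merged and unmatched are introduced here for illustration.

```{r}
library(dplyr)

# Hypothetical toy map: one triangle per "state", in map_data() layout
toy_map <- data.frame(
  long   = c(0, 1, 0, 2, 3, 2),
  lat    = c(0, 0, 1, 0, 0, 1),
  group  = c(1, 1, 1, 2, 2, 2),
  order  = 1:6,
  region = c("kerala", "kerala", "kerala", "bihar", "bihar", "bihar")
)

# Literacy values from Step 2; maharashtra has no polygon in toy_map
toy_literacy <- data.frame(
  State = c("kerala", "bihar", "maharashtra"),
  Literacy_2011 = c(94, 63, 83)
)

# Attach literacy to every map point, then keep points in drawing order
merged <- toy_map %>%
  left_join(toy_literacy, by = c("region" = "State")) %>%
  arrange(order)

# Dataset rows whose state name matched no map region
unmatched <- toy_literacy %>%
  anti_join(toy_map, by = c("State" = "region"))

merged
unmatched
```

Unlike merge(), left_join() keeps the row order of toy_map, so arrange(order) is only a safeguard here. After merge(), the re-sort is required.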
Step 6: Create Choropleth Map
Map for Literacy Rate (2011)
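Here, we create a choropleth map in which each polygon is filled according to the 2011 literacy rate joined to it in Step 5. Darker shades indicate higher literacy and lighter shades indicate lower literacy. Polygons with no matching value are drawn in grey (na.value = "grey50"). The country-level world map has no state polygons for India, so the map appears grey unless a state-level boundary layer is used.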
```{r}
ggplot(merged_data, aes(x = long, y = lat, group = group)) +
  # Fill each polygon by its 2011 literacy rate, with black borders
  geom_polygon(aes(fill = Literacy_2011), color = "black") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey50") +
  coord_quickmap() +  # keep a sensible longitude/latitude aspect ratio
  theme_minimal() +
  labs(
    title = "Literacy Rate in India (2011)",
    fill = "Literacy %"
  )
```
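A state-level choropleth needs a boundary layer whose features are states. The sketch below shows the sf route with geom_sf() and assumes the sf package is installed. The two squares are hypothetical stand-ins for real state polygons. In practice, they might come from a state-level source such as rnaturalearth::ne_states(country = "India", returnclass = "sf"). That source is an assumption: it is not used in this notebook, it needs the rnaturalearthhires data package, and its name column would have to be lowercased to match State. The helper square() and the other object names are illustrative.

```{r}
library(sf)
library(ggplot2)
library(dplyr)

# Hypothetical helper: closed unit-square polygon with lower-left corner (x, y)
square <- function(x, y) {
  st_polygon(list(rbind(c(x, y), c(x + 1, y), c(x + 1, y + 1), c(x, y + 1), c(x, y))))
}

# Toy "state" layer: two made-up squares standing in for real boundaries
states_sf <- st_sf(
  name = c("kerala", "bihar"),
  geometry = st_sfc(square(0, 0), square(1, 0), crs = 4326)
)

toy_lit <- data.frame(State = c("kerala", "bihar"), Literacy_2011 = c(94, 63))

# One row per state, so the join keeps each geometry intact (no re-sorting needed)
states_joined <- left_join(states_sf, toy_lit, by = c("name" = "State"))

p <- ggplot(states_joined) +
  geom_sf(aes(fill = Literacy_2011), color = "black") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey50") +
  theme_minimal() +
  labs(title = "Literacy Rate (toy polygons)", fill = "Literacy %")
p
```

Because each state is a single sf feature, this approach does not need the group and order bookkeeping that geom_polygon() requires.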