LA

Author

Anika and Likita

Problem statement:

Develop a script in R to create a choropleth map showing literacy rates across Indian states using spatial visualization.

What will we do?

  1. Load required libraries

  2. Load and inspect dataset

  3. Perform exploratory data analysis

  4. Load India map (spatial data)

  5. Merge dataset with map

  6. Create choropleth map

Step 1: Load required libraries

In this step, we load the required libraries.

  • ggplot2 is used for creating visualizations.

  • dplyr helps in data manipulation.

  • maps provides the map outline data that ggplot2's map_data() converts into a data frame.

library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.5.3
library(dplyr)
Warning: package 'dplyr' was built under R version 4.5.3

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(maps)
Warning: package 'maps' was built under R version 4.5.3

Step 2: Load Dataset

Here, we build a small dataset of literacy rates for six states directly with data.frame(), holding the rates for 2001 and 2011, and print it to check the values. State names are written in lowercase so that they can later serve as join keys.

data <- data.frame(
  State = c("andhra pradesh","karnataka","tamil nadu","kerala","maharashtra","bihar"),
  Literacy_2001 = c(60,67,73,90,77,47),
  Literacy_2011 = c(67,75,80,94,83,63)
)

data
           State Literacy_2001 Literacy_2011
1 andhra pradesh            60            67
2      karnataka            67            75
3     tamil nadu            73            80
4         kerala            90            94
5    maharashtra            77            83
6          bihar            47            63
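
If the literacy figures were stored in a CSV file instead, they could be loaded with read.csv() and inspected with head() and str(). The sketch below is only an illustration: it writes the same six rows to a temporary file and reads them back. The temporary file and the names literacy, csv_path and csv_data are assumptions, not part of the original script.

```r
# Illustrative only: write the six-row table to a temporary CSV, then read it back
literacy <- data.frame(
  State = c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar"),
  Literacy_2001 = c(60, 67, 73, 90, 77, 47),
  Literacy_2011 = c(67, 75, 80, 94, 83, 63)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(literacy, csv_path, row.names = FALSE)

# Load the file and look at the first rows and the column types
csv_data <- read.csv(csv_path, stringsAsFactors = FALSE)
head(csv_data)
str(csv_data)
```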

Step 3: Exploratory Data Analysis

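We summarize each column with summary(), count missing values per column with colSums(is.na(data)), and add a Growth column holding the change in literacy rate (in percentage points) between 2001 and 2011.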

summary(data)
    State           Literacy_2001   Literacy_2011  
 Length:6           Min.   :47.00   Min.   :63.00  
 Class :character   1st Qu.:61.75   1st Qu.:69.00  
 Mode  :character   Median :70.00   Median :77.50  
                    Mean   :69.00   Mean   :77.00  
                    3rd Qu.:76.00   3rd Qu.:82.25  
                    Max.   :90.00   Max.   :94.00  
colSums(is.na(data))
        State Literacy_2001 Literacy_2011 
            0             0             0 
# Add growth column
data$Growth <- data$Literacy_2011 - data$Literacy_2001
data
           State Literacy_2001 Literacy_2011 Growth
1 andhra pradesh            60            67      7
2      karnataka            67            75      8
3     tamil nadu            73            80      7
4         kerala            90            94      4
5    maharashtra            77            83      6
6          bihar            47            63     16
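
To see which states improved most, the table can be sorted by Growth. This is a minimal base R sketch that rebuilds the same six rows so it runs on its own; the names literacy and ranked are assumptions.

```r
# Rebuild the six-row table from Step 2
literacy <- data.frame(
  State = c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar"),
  Literacy_2001 = c(60, 67, 73, 90, 77, 47),
  Literacy_2011 = c(67, 75, 80, 94, 83, 63)
)

# Change in literacy rate, in percentage points
literacy$Growth <- literacy$Literacy_2011 - literacy$Literacy_2001

# Sort from largest to smallest gain
ranked <- literacy[order(-literacy$Growth), c("State", "Growth")]
ranked
```

In this data, Bihar starts from the lowest 2001 rate and shows the largest gain (16 points), while Kerala starts from the highest rate and gains the least (4 points).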

Step 4: Load India Map

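We load the world map with map_data("world"), which reads from the maps package, and keep only the rows for India. In this data the region column holds country names and subregion holds named sub-areas such as islands, so the result is India's national outline rather than state boundaries.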

# Load map data
india_map <- map_data("world")

# Convert to lowercase for matching
india_map$region <- tolower(india_map$region)

# Filter India only
india_map <- india_map[india_map$region == "india", ]

head(india_map)
          long      lat group order region     subregion
49786 93.89004 6.831055   826 50611  india Great Nicobar
49787 93.82881 6.748682   826 50612  india Great Nicobar
49788 93.70928 7.000683   826 50613  india Great Nicobar
49789 93.65800 7.016065   826 50614  india Great Nicobar
49790 93.65635 7.136231   826 50615  india Great Nicobar
49791 93.68418 7.183593   826 50616  india Great Nicobar

Step 5: Merge Data with Map

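In this step, we combine the literacy dataset with the map data.
This is done using merge() with all.x = TRUE (a left join), matching the map's region column against State.
The join links data values to geographic regions only where the two keys agree. Here every map row has region "india" while State holds state names, so no rows match and the literacy columns are NA, as the output below shows. Coloring individual states requires map data whose polygons are keyed by state name.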

# Convert state names to lowercase
data$State <- tolower(data$State)

# Merge map and data
merged_data <- merge(india_map, data, by.x = "region", by.y = "State", all.x = TRUE)

head(merged_data)
  region     long      lat group order     subregion Literacy_2001
1  india 93.89004 6.831055   826 50611 Great Nicobar            NA
2  india 93.82881 6.748682   826 50612 Great Nicobar            NA
3  india 93.70928 7.000683   826 50613 Great Nicobar            NA
4  india 93.65800 7.016065   826 50614 Great Nicobar            NA
5  india 93.65635 7.136231   826 50615 Great Nicobar            NA
6  india 93.68418 7.183593   826 50616 Great Nicobar            NA
  Literacy_2011 Growth
1            NA     NA
2            NA     NA
3            NA     NA
4            NA     NA
5            NA     NA
6            NA     NA
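
Before merging, it can help to check which keys actually match. This is a small sketch using stand-ins for the two key columns as they appear in this document; map_keys, data_keys and unmatched are hypothetical names.

```r
# Keys as they appear in this document: the map has one region, the data has six states
map_keys <- c("india")
data_keys <- c("andhra pradesh", "karnataka", "tamil nadu", "kerala", "maharashtra", "bihar")

# States in the data with no matching region in the map
unmatched <- setdiff(data_keys, map_keys)
unmatched

# Number of keys the two sides share
length(intersect(data_keys, map_keys))
```

With the real objects, the same check would be setdiff(data$State, unique(india_map$region)). A non-empty result means those states receive no polygon in the merged table.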

Step 6: Create Choropleth Map

Map for Literacy Rate (2011)

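Here, we draw the map with geom_polygon() and map Literacy_2011 to the fill color, so darker shades indicate higher literacy and lighter shades indicate lower literacy. Because Literacy_2011 is NA for every row of merged_data, the polygons are drawn in the na.value color (grey50). Per-state shading appears only once state-level boundaries are merged in.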

ggplot(merged_data, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = Literacy_2011), color = "black") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey50") +
  theme_minimal() +
  labs(
    title = "Literacy Rate in India (2011)",
    fill = "Literacy %"
  )
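
To illustrate how the fill behaves once the map is keyed by state, the sketch below uses made-up unit squares in place of real state boundaries. The squares, the "unmatched state" entry, and the names make_square, toy_map, toy_data and toy_merged are all assumptions for illustration. Real boundaries would come from a state-level shapefile, which this document does not include.

```r
library(ggplot2)

# Build a unit square as a stand-in for one state's polygon (NOT a real boundary)
make_square <- function(name, x0, y0) {
  data.frame(
    region = name,
    long = c(x0, x0 + 1, x0 + 1, x0),
    lat = c(y0, y0, y0 + 1, y0 + 1),
    order = 1:4
  )
}

toy_map <- rbind(
  make_square("kerala", 0, 0),
  make_square("karnataka", 0, 1),
  make_square("unmatched state", 1, 1)
)

# Literacy values taken from the dataset in Step 2
toy_data <- data.frame(State = c("kerala", "karnataka"), Literacy_2011 = c(94, 75))

# Left join on the state name, then restore vertex order within each polygon,
# because merge() does not promise to keep the original row order
toy_merged <- merge(toy_map, toy_data, by.x = "region", by.y = "State", all.x = TRUE)
toy_merged <- toy_merged[order(toy_merged$region, toy_merged$order), ]

p <- ggplot(toy_merged, aes(x = long, y = lat, group = region)) +
  geom_polygon(aes(fill = Literacy_2011), color = "black") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey50") +
  coord_equal() +
  theme_minimal() +
  labs(title = "Toy example: fill by state key", fill = "Literacy %")
p
```

The two matched squares are shaded by their literacy values, and the unmatched one falls back to the na.value grey, which is what happens to the whole map when no keys match.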