LA-1

Author

Asmita DV (1NT23IS068) / Himanshu Singh (1NT23IS084), Section B

Team 11:

Plot a dumbbell chart comparing male vs female literacy rates across states.

Step 1: Install and Load Required Libraries

library(ggplot2)
library(dplyr)

library(tidyr)
library(ggalt)
library(readr) 
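The code in Step 1 only loads the packages and does not show the install half. Here is a minimal sketch that installs only the packages that are not yet available. It assumes all five can be fetched from the CRAN mirror given in `repos`. That URL and the helper name `find_missing` are illustrative choices, not part of the original notebook.

```r
# Packages used in this notebook
required_pkgs <- c("ggplot2", "dplyr", "tidyr", "ggalt", "readr")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
find_missing <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only what is missing (repos URL is an assumption; any CRAN mirror works)
to_install <- find_missing(required_pkgs)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Checking first avoids reinstalling packages on every run of the notebook.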

Step 2: Read the dataset

df <- read_csv("datafile.csv")
Rows: 35 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): All India/State/Union Territory
dbl (12): 1991 - Male, 1991 - Female, 1991 - Persons, 2001 - Male, 2001 - Fe...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df)  # show the first 6 rows of the dataset
# A tibble: 6 × 13
  All India/State/Union Territo…¹ `1991 - Male` `1991 - Female` `1991 - Persons`
  <chr>                                   <dbl>           <dbl>            <dbl>
1 <NA>                                       NA              NA               NA
2 Andhra Pradesh                             44              55               33
3 Arunachal Pradesh                          42              52               30
4 Assam                                      53              62               43
5 Bihar                                      38              51               22
6 Chhatisgarh                                43              58               28
# ℹ abbreviated name: ¹​`All India/State/Union Territory`
# ℹ 9 more variables: `2001 - Male` <dbl>, `2001 - Female` <dbl>,
#   `2001 - Persons` <dbl>, `2011 - Rural - Male` <dbl>,
#   `2011 - Rural - Female` <dbl>, `2011 - Rural - Person` <dbl>,
#   `2011 - Urban - Male` <dbl>, `2011 - Urban - Female` <dbl>,
#   `2011 - Urban - Persons` <dbl>
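In the `head(df)` output above, every `1991 - Persons` value is lower than both the `1991 - Male` and `1991 - Female` values. For Andhra Pradesh, for example, it is 33 against 44 and 55. A combined literacy rate should fall between the two gender-specific rates, so the column headers may not line up with the values. If the 2001 columns are affected the same way, the chart in Step 4 would not actually compare female with male rates. It therefore seems worth checking the headers against the original data source.

The sketch below is a quick base-R check on the five state rows printed above. The values are copied from that output.

```r
# First five state rows of the 1991 columns, as printed by head(df) above
check <- data.frame(
  state   = c("Andhra Pradesh", "Arunachal Pradesh", "Assam", "Bihar", "Chhatisgarh"),
  male    = c(44, 42, 53, 38, 43),
  female  = c(55, 52, 62, 51, 58),
  persons = c(33, 30, 43, 22, 28)
)

# A combined rate should lie between the two gender-specific rates
check$consistent <- with(check,
  persons >= pmin(male, female) & persons <= pmax(male, female))

print(check[, c("state", "consistent")])
```

You can run the same test on the full `df` using the `2001 - Male`, `2001 - Female` and `2001 - Persons` columns.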

Step 3: Extract relevant columns and rename for simplicity

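data <- df %>%
  # Keep the state name and the 2001 male/female literacy columns, with short names
  select(state = `All India/State/Union Territory`,
         male = `2001 - Male`,
         female = `2001 - Female`) %>%
  # Coerce to numeric in case either column was read as text
  mutate(
    male = as.numeric(male),
    female = as.numeric(female)
  ) %>%
  # Drop rows with a missing rate (e.g. the empty first row shown by head(df))
  filter(!is.na(male), !is.na(female))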
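By default the states in the chart appear in alphabetical order, which makes large and small gaps hard to compare. One option is to add the male-minus-female gap and reorder the `state` factor by it. The sketch below applies this idea to hypothetical toy values shaped like the `data` produced above. The state names and rates are placeholders, not census figures.

```r
library(dplyr)

# Hypothetical toy data with the same columns as `data` from Step 3
data <- tibble(
  state  = c("State A", "State B", "State C"),
  male   = c(70, 60, 80),
  female = c(50, 55, 65)
)

data <- data %>%
  mutate(
    gap   = male - female,        # gender gap in percentage points
    state = reorder(state, gap)   # factor levels ordered by gap (smallest first)
  ) %>%
  arrange(desc(gap))              # largest gap first when printing

print(data)
```

If you pass this `data` to the Step 4 `ggplot()` call, the states are listed by gap size, with the largest gap at the top.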

Step 4: Plot dumbbell chart

# Create the dumbbell chart using ggplot
ggplot(data, aes(y = state, x = female, xend = male)) +  # Set y-axis as state, x as female rate, xend as male rate

  # Add dumbbell segments: lines connecting female and male values for each state
  geom_dumbbell(
    size = 2,                 # Thickness of the dumbbell line
    colour = "gray80",        # Color of the connecting line
    colour_x = "#FF69B4",     # Color of the x point (female) - pink
    colour_xend = "#1E90FF"   # Color of the xend point (male) - blue
  ) +

  # Add titles and axis labels
  labs(
    title = "Dumbbell Chart: Literacy Rate by Gender (2001)",  # Main title of the chart
    subtitle = "Comparing Female vs Male Literacy Rates Across Indian States",  # Subtitle
    x = "Literacy Rate (%)",   # Label for x-axis
    y = "State"                # Label for y-axis
  ) +

  # Apply a minimal, clean theme
  theme_minimal()
Warning: Using the `size` aesthetic with geom_segment was deprecated in ggplot2 3.4.0.
ℹ Please use the `linewidth` aesthetic instead.
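`geom_dumbbell()` colours the two points through the fixed `colour_x` and `colour_xend` parameters. As a result, the chart above has no legend showing which colour is female and which is male. `tidyr` is loaded in Step 1 but not otherwise used, and it makes an alternative possible. You can reshape the data to one row per state and gender, then draw the dumbbell with plain `geom_segment()` and `geom_point()`, mapping colour to gender.

The sketch below uses hypothetical toy values (not the census data) and assumes ggplot2 3.4.0 or later for `linewidth`. Setting the bar width through `linewidth` also sidesteps the `size` deprecation warning shown above.

```r
library(ggplot2)
library(tidyr)

# Hypothetical toy data with the same columns as `data` from Step 3
data <- data.frame(
  state  = c("State A", "State B", "State C"),
  male   = c(70, 60, 80),
  female = c(50, 55, 65)
)

# One row per state and gender, so colour can be mapped and shown in a legend
long <- pivot_longer(data, cols = c(female, male),
                     names_to = "gender", values_to = "rate")

p <- ggplot() +
  # Gray bar connecting the female and male rates for each state
  geom_segment(data = data,
               aes(x = female, xend = male, y = state, yend = state),
               colour = "gray80", linewidth = 2) +
  # One point per gender; mapping colour to gender produces the legend
  geom_point(data = long, aes(x = rate, y = state, colour = gender), size = 3) +
  scale_colour_manual(values = c(female = "#FF69B4", male = "#1E90FF")) +
  labs(x = "Literacy Rate (%)", y = "State", colour = "Gender") +
  theme_minimal()

print(p)
```

This keeps the same pink/blue scheme as the `geom_dumbbell()` version but relies only on core ggplot2 geoms.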