LA Assignment

Author

Prem & Kiran T L

Introduction

This report analyzes district-level population data from the India Census dataset using R. The objectives are to:

  • Aggregate district-level data into state-level data
  • Visualize population distribution
  • Compare male and female populations
  • Identify top states by population

Load Required Library

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.5.3
library(maps)
Warning: package 'maps' was built under R version 4.5.3
library(viridis)
Warning: package 'viridis' was built under R version 4.5.3
Loading required package: viridisLite

Attaching package: 'viridis'
The following object is masked from 'package:maps':

    unemp

Loading Dataset

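# Choose the census CSV interactively; pass a fixed path to read.csv() instead for a reproducible run
data <- read.csv(file.choose())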
str(data)
'data.frame':   640 obs. of  25 variables:
 $ District_code       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ State_name          : chr  "JAMMU AND KASHMIR" "JAMMU AND KASHMIR" "JAMMU AND KASHMIR" "JAMMU AND KASHMIR" ...
 $ District_name       : chr  "Kupwara" "Badgam" "Leh(Ladakh)" "Kargil" ...
 $ Population          : int  870354 753745 133487 140802 476835 642415 616435 1008039 392232 1236829 ...
 $ Male                : int  474190 398041 78971 77785 251899 345351 326109 534733 207680 651124 ...
 $ Female              : int  396164 355704 54516 63017 224936 297064 290326 473306 184552 585705 ...
 $ Literate            : int  439654 335649 93770 86236 261724 364109 389204 545149 185979 748584 ...
 $ Workers             : int  229064 214866 75079 51873 161393 290912 200431 304200 149317 407188 ...
 $ Male_Workers        : int  190899 162578 53265 39839 117677 184752 161548 249581 101380 333151 ...
 $ Female_Workers      : int  38165 52288 21814 12034 43716 106160 38883 54619 47937 74037 ...
 $ Cultivator_Workers  : int  34680 55299 20869 8266 54264 136527 69533 57495 28232 12228 ...
 $ Agricultural_Workers: int  56759 36630 1645 3763 31583 24016 21566 62246 32882 10408 ...
 $ Household_Workers   : int  7946 29102 1020 1222 3930 4656 3952 15084 20484 20095 ...
 $ Hindus              : int  37128 10110 22882 10341 32604 221880 540063 30621 8439 42540 ...
 $ Muslims             : int  823286 736054 19057 108239 431279 402879 64234 959185 382006 1177342 ...
 $ Christians          : int  1700 1489 658 604 958 983 1828 1497 572 2746 ...
 $ Sikhs               : int  5600 5559 1092 1171 11188 15513 9551 14770 555 12187 ...
 $ Buddhists           : int  66 47 88635 20126 83 189 24 140 44 285 ...
 $ Jains               : int  39 6 103 28 10 26 16 29 17 74 ...
 $ Secondary_Education : int  74948 66459 16265 16938 46062 65921 91522 107837 35630 176409 ...
 $ Higher_Education    : int  39709 41367 8923 9826 29517 35804 47694 57932 18644 132727 ...
 $ Graduate_Education  : int  21751 27950 6197 3077 13962 18576 24330 48285 12721 121856 ...
 $ Age_Group_0_29      : int  600759 503223 70703 87532 304979 404903 357864 636524 252378 693238 ...
 $ Age_Group_30_49     : int  178435 160933 41515 35561 109818 153165 160123 239659 90465 351561 ...
 $ Age_Group_50        : int  89679 88978 21019 17488 61334 83319 97684 130513 48802 190330 ...
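
Before aggregating, it may be worth checking the loaded columns for missing values and confirming that Male and Female add up to Population in every district. The following is a minimal sketch on a small made-up data frame; the name census_toy and all of its values are illustrative and are not taken from the dataset.

```r
# Hypothetical three-row stand-in for the census data frame (values are made up)
census_toy <- data.frame(
  State_name = c("A", "A", "B"),
  Population = c(100L, 250L, 80L),
  Male       = c(52L, 130L, 41L),
  Female     = c(48L, 120L, 39L)
)

# Number of missing values in each column
na_counts <- colSums(is.na(census_toy))

# TRUE for any district where Male + Female differs from Population
mismatch <- census_toy$Male + census_toy$Female != census_toy$Population

na_counts
sum(mismatch)
```

If either check reported a non-zero count on the real data, sum() in the aggregation step would need na.rm = TRUE or the affected rows would need a closer look.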

Aggregate Data (District → State)

# Sum the district rows within each state to get one row per state
state_data <- data %>%
  group_by(State_name) %>%
  summarise(
    Total_Population = sum(Population),
    Total_Male = sum(Male),
    Total_Female = sum(Female)
  )
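
To make the aggregation concrete, here is a minimal sketch of the same group_by() and summarise() pattern on a made-up data frame. The names toy and toy_state, the values, and the extra Sex_Ratio column (females per 1,000 males) are assumptions added for illustration; they are not part of the original analysis.

```r
library(dplyr)

# Hypothetical district-level rows for two states (values are made up)
toy <- data.frame(
  State_name = c("A", "A", "B"),
  Population = c(100, 250, 80),
  Male       = c(52, 130, 41),
  Female     = c(48, 120, 39)
)

toy_state <- toy %>%
  group_by(State_name) %>%
  summarise(
    Total_Population = sum(Population),
    Total_Male       = sum(Male),
    Total_Female     = sum(Female)
  ) %>%
  # Illustrative derived column: females per 1,000 males
  mutate(Sex_Ratio = 1000 * Total_Female / Total_Male)

# The state totals should add back up to the district total
sum(toy_state$Total_Population) == sum(toy$Population)
```

Comparing the sum of state totals against the sum of district totals is a cheap way to confirm that no rows were dropped during grouping.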

Bar Chart (Population by State)

ggplot(state_data, aes(x = reorder(State_name, Total_Population),
                       y = Total_Population)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Population by State (India Census 2011)",
    x = "State",
    y = "Population"
  ) +
  theme_minimal()

Pie Chart (Male vs Female Population)

mf_data <- data.frame(
  Category = c("Male", "Female"),
  Population = c(sum(data$Male), sum(data$Female))
)

ggplot(mf_data, aes(x = "", y = Population, fill = Category)) +
  geom_col(width = 1) +
  coord_polar("y") +
  theme_void() +
  labs(title = "Male vs Female Population")
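
A pie chart without labels leaves the reader to estimate the split by eye. One possible refinement, sketched here on made-up counts, is to compute each category's percentage share and print it inside its slice. The names mf_toy, Share, and Label are hypothetical, and the numbers are not census figures.

```r
library(ggplot2)

# Hypothetical male/female totals (values are made up)
mf_toy <- data.frame(
  Category   = c("Male", "Female"),
  Population = c(520, 480)
)

# Percentage share of each category, rounded to one decimal place
mf_toy$Share <- round(100 * mf_toy$Population / sum(mf_toy$Population), 1)
mf_toy$Label <- paste0(mf_toy$Share, "%")

p <- ggplot(mf_toy, aes(x = "", y = Population, fill = Category)) +
  geom_col(width = 1) +
  # Place each label at the middle of its stacked segment
  geom_text(aes(label = Label), position = position_stack(vjust = 0.5)) +
  coord_polar("y") +
  theme_void() +
  labs(title = "Male vs Female Population (toy data)")

p
```

Because the labels use position_stack(), they follow the same stacking as the bars and stay centered in their slices after the polar transformation.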

Ranking Plot (Top States by Population)

# Keep the 10 most populous states, sorted from largest to smallest
# (slice_max() supersedes top_n() in current dplyr)
top_states <- state_data %>%
  slice_max(Total_Population, n = 10)

ggplot(top_states,
       aes(x = reorder(State_name, Total_Population),
           y = Total_Population)) +
  geom_segment(aes(xend = State_name, yend = 0)) +
  geom_point(size = 4, color = "red") +
  coord_flip() +
  labs(
    title = "Top 10 States by Population",
    x = "State",
    y = "Population"
  ) +
  theme_minimal()
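
A top-N selection can return more than N rows when totals are tied. The sketch below uses made-up totals to show how dplyr's slice_max() behaves with and without ties; the data frame toy_totals and its values are hypothetical.

```r
library(dplyr)

# Hypothetical state totals with a tie for second place (values are made up)
toy_totals <- data.frame(
  State_name       = c("A", "B", "C", "D"),
  Total_Population = c(300, 500, 300, 100)
)

# Default behaviour keeps every row tied with the n-th value
top2_with_ties <- toy_totals %>%
  slice_max(Total_Population, n = 2)

# with_ties = FALSE returns exactly n rows
top2_exact <- toy_totals %>%
  slice_max(Total_Population, n = 2, with_ties = FALSE)

nrow(top2_with_ties)
nrow(top2_exact)
```

Ties are unlikely for census totals, but setting with_ties explicitly makes the intended row count clear to anyone rereading the code.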

Conclusion

This analysis demonstrates:

  • Aggregating district-level records to state level with dplyr
  • Visualizing the aggregated data with ggplot2
  • Insights into:
    • State-wise population distribution
    • The split between male and female populations
    • The ten most populous states