Introduction
This report analyzes population data from the India Census 2011 dataset using R. The objectives are to:
- Aggregate district-level data into state-level data
- Visualize population distribution
- Compare male and female populations
- Identify top states by population
Load Required Libraries
library(dplyr)    # data manipulation (group_by, summarise)
library(ggplot2)  # plotting
library(maps)     # loaded in the original session
library(viridis)  # colour palettes; loaded in the original session
Loading Dataset
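# file.choose() opens an interactive file picker; select the census CSV file
data <- read.csv(file.choose())
# Inspect the structure of the loaded data frame
str(data)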
'data.frame': 640 obs. of 25 variables:
$ District_code : int 1 2 3 4 5 6 7 8 9 10 ...
$ State_name : chr "JAMMU AND KASHMIR" "JAMMU AND KASHMIR" "JAMMU AND KASHMIR" "JAMMU AND KASHMIR" ...
$ District_name : chr "Kupwara" "Badgam" "Leh(Ladakh)" "Kargil" ...
$ Population : int 870354 753745 133487 140802 476835 642415 616435 1008039 392232 1236829 ...
$ Male : int 474190 398041 78971 77785 251899 345351 326109 534733 207680 651124 ...
$ Female : int 396164 355704 54516 63017 224936 297064 290326 473306 184552 585705 ...
$ Literate : int 439654 335649 93770 86236 261724 364109 389204 545149 185979 748584 ...
$ Workers : int 229064 214866 75079 51873 161393 290912 200431 304200 149317 407188 ...
$ Male_Workers : int 190899 162578 53265 39839 117677 184752 161548 249581 101380 333151 ...
$ Female_Workers : int 38165 52288 21814 12034 43716 106160 38883 54619 47937 74037 ...
$ Cultivator_Workers : int 34680 55299 20869 8266 54264 136527 69533 57495 28232 12228 ...
$ Agricultural_Workers: int 56759 36630 1645 3763 31583 24016 21566 62246 32882 10408 ...
$ Household_Workers : int 7946 29102 1020 1222 3930 4656 3952 15084 20484 20095 ...
$ Hindus : int 37128 10110 22882 10341 32604 221880 540063 30621 8439 42540 ...
$ Muslims : int 823286 736054 19057 108239 431279 402879 64234 959185 382006 1177342 ...
$ Christians : int 1700 1489 658 604 958 983 1828 1497 572 2746 ...
$ Sikhs : int 5600 5559 1092 1171 11188 15513 9551 14770 555 12187 ...
$ Buddhists : int 66 47 88635 20126 83 189 24 140 44 285 ...
$ Jains : int 39 6 103 28 10 26 16 29 17 74 ...
$ Secondary_Education : int 74948 66459 16265 16938 46062 65921 91522 107837 35630 176409 ...
$ Higher_Education : int 39709 41367 8923 9826 29517 35804 47694 57932 18644 132727 ...
$ Graduate_Education : int 21751 27950 6197 3077 13962 18576 24330 48285 12721 121856 ...
$ Age_Group_0_29 : int 600759 503223 70703 87532 304979 404903 357864 636524 252378 693238 ...
$ Age_Group_30_49 : int 178435 160933 41515 35561 109818 153165 160123 239659 90465 351561 ...
$ Age_Group_50 : int 89679 88978 21019 17488 61334 83319 97684 130513 48802 190330 ...
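Before aggregating, it can help to confirm that the columns are internally consistent. The following is a minimal sketch, not part of the original analysis: it rebuilds the first three rows shown in the structure output above as a small data frame (the name `sample_rows` is an assumption made for illustration) and checks for missing values and for districts where Male plus Female does not equal Population. The same checks could be run on the full `data` object.

```r
# Stand-in for the first three rows shown in the str() output above;
# in the report itself, the full 640-row `data` frame would be used instead.
sample_rows <- data.frame(
  District_name = c("Kupwara", "Badgam", "Leh(Ladakh)"),
  Population    = c(870354, 753745, 133487),
  Male          = c(474190, 398041, 78971),
  Female        = c(396164, 355704, 54516)
)

# Count missing values in each column
na_counts <- colSums(is.na(sample_rows))

# Flag districts where Male + Female does not add up to Population
mismatch <- sample_rows$Male + sample_rows$Female != sample_rows$Population

print(na_counts)
print(sum(mismatch))  # number of inconsistent rows
```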
Aggregate Data (District → State)
# Sum district-level counts within each state
state_data <- data %>%
  group_by(State_name) %>%
  summarise(
    Total_Population = sum(Population),
    Total_Male       = sum(Male),
    Total_Female     = sum(Female)
  )
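To see what the aggregation step does, the same pattern can be run on a tiny example. This is a sketch under stated assumptions: the data frame `toy` and its values are made up for illustration and are not census figures. It also adds `.groups = "drop"`, an optional `summarise()` argument that returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data: two made-up states, three made-up districts
toy <- data.frame(
  State_name = c("A", "A", "B"),
  Population = c(100, 200, 50),
  Male       = c(55, 100, 26),
  Female     = c(45, 100, 24)
)

# Same group_by() + summarise() pattern as the state-level aggregation
toy_state <- toy %>%
  group_by(State_name) %>%
  summarise(
    Total_Population = sum(Population),
    Total_Male       = sum(Male),
    Total_Female     = sum(Female),
    .groups = "drop"
  )

# Sanity check: state totals must add up to the district totals
stopifnot(sum(toy_state$Total_Population) == sum(toy$Population))
print(toy_state)
```

A check like the final `stopifnot()` is a cheap way to confirm that no rows were lost or double counted during grouping.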
Bar Chart (Population by State)
# Horizontal bars, ordered from least to most populous state
ggplot(state_data, aes(x = reorder(State_name, Total_Population),
                       y = Total_Population)) +
  geom_col(fill = "steelblue") +  # equivalent to geom_bar(stat = "identity")
  coord_flip() +
  labs(
    title = "Population by State (India Census 2011)",
    x = "State",
    y = "Population"
  ) +
  theme_minimal()
Pie Chart (Male vs Female Population)
# National totals for each sex
mf_data <- data.frame(
  Category   = c("Male", "Female"),
  Population = c(sum(data$Male), sum(data$Female))
)
# A single stacked bar wrapped into polar coordinates gives a pie chart
ggplot(mf_data, aes(x = "", y = Population, fill = Category)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  labs(title = "Male vs Female Population")
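A pie chart with two nearly equal slices is hard to read precisely, so the comparison can also be expressed as numbers. The sketch below is an illustration rather than part of the original analysis: it uses only the three district rows visible in the structure output earlier (not the national totals), and the names `mf_share` and `sex_ratio` are assumptions. The sex ratio is computed as females per 1,000 males, the convention used in Indian census reporting.

```r
# Male and female counts for the three districts shown in the str() output;
# with the full dataset, sum(data$Male) and sum(data$Female) would be used.
male_total   <- sum(c(474190, 398041, 78971))
female_total <- sum(c(396164, 355704, 54516))

mf_share <- data.frame(
  Category   = c("Male", "Female"),
  Population = c(male_total, female_total)
)

# Percentage share of each category
mf_share$Percent <- round(100 * mf_share$Population / sum(mf_share$Population), 1)

# Females per 1,000 males
sex_ratio <- round(1000 * female_total / male_total)

print(mf_share)
print(sex_ratio)
```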
Ranking Plot (Top States by Population)
# slice_max() supersedes top_n(); it keeps the 10 largest rows, sorted descending
top_states <- state_data %>%
  slice_max(Total_Population, n = 10)
# Lollipop chart: segments are drawn first so the points sit on top of them
ggplot(top_states,
       aes(x = reorder(State_name, Total_Population),
           y = Total_Population)) +
  geom_segment(aes(xend = State_name, yend = 0)) +
  geom_point(size = 4, color = "red") +
  coord_flip() +
  labs(
    title = "Top 10 States by Population",
    x = "State",
    y = "Population"
  ) +
  theme_minimal()
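A ranking is easier to interpret alongside the share of the total that the top group holds. The following is a minimal sketch on made-up numbers: `toy_state`, `top2`, and `top_share` are hypothetical names and values, not census results. It uses `slice_max()`, the current dplyr function for selecting the rows with the largest values.

```r
library(dplyr)

# Hypothetical state totals, for illustration only
toy_state <- data.frame(
  State_name       = c("A", "B", "C", "D"),
  Total_Population = c(400, 300, 200, 100)
)

# Keep the two most populous states (n = 10 would be used on the real data)
top2 <- toy_state %>%
  slice_max(Total_Population, n = 2)

# Fraction of the overall population that lives in the selected states
top_share <- sum(top2$Total_Population) / sum(toy_state$Total_Population)

print(top2)
print(top_share)
```

Applied to `state_data` with `n = 10`, the same calculation would show how concentrated the population is in the ten largest states.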
Conclusion
This analysis demonstrates:
- Data aggregation from district level to state level using dplyr
- Visualization with ggplot2 (bar chart, pie chart, and ranking plot)
- Insights into:
  - State-wise population distribution
  - Male and female population comparison
  - The most populous states