require(knitr)
## Loading required package: knitr
opts_knit$set(root.dir = normalizePath('../'))
knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
echo=FALSE, warning=FALSE, message=FALSE)
# Libraries
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(lubridate)
library(scales)
We will use the data generated in the previous report to plot some statistics for Bangalore.
## Source: local data frame [894 x 3]
## Groups: Age [?]
##
## Age Dates Count
## (int) (time) (int)
## 1 18 2012-07-01 19318
## 2 18 2012-12-01 2010
## 3 18 2013-01-01 32637
## 4 18 2013-04-01 58098
## 5 18 2013-05-01 58276
## 6 18 2013-10-01 161829
## 7 18 2014-01-01 191720
## 8 18 2014-03-01 176060
## 9 18 2014-10-01 179034
## 10 18 2015-01-01 171810
## .. ... ... ...
The 2011 Census gave the following data for Bangalore:
City population of 84,43,675, of which the male population is 43,91,723 and the female population is 40,51,952.
The voters list stays within a narrow range and only briefly, in 2014, touches our expected value of 52 lakh, so we cannot be sure how inaccurate the data is.
The rest of the analysis assumes that the inaccuracy is uniformly spread, behaves like white noise, and cancels itself out.
The voters can be classified into the following categories:
| Age Range | Category |
|---|---|
| 0-18 | Minor |
| 18-22 | Student |
| 22-27 | Single Worker |
| 27-30 | Newly Married |
| 30-40 | Married |
| 40-50 | Mid-Level |
| 50-60 | Senior-Level |
| 60+ | Retired |
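The classification in the table can be sketched with base R's `cut()`. This is only an illustration: the `ages` vector is made up, and because the ranges in the table share their end points (18, 22, 27, ...), the sketch assumes each range includes its lower bound and excludes its upper bound.

```r
# Hypothetical ages for illustration; the real data has an Age column.
ages <- c(5, 18, 21, 25, 28, 35, 45, 55, 70)

# Assumption: each range is closed on the left and open on the right,
# so an 18-year-old is a "Student" and a 22-year-old a "Single Worker".
breaks <- c(0, 18, 22, 27, 30, 40, 50, 60, Inf)
labels <- c("Minor", "Student", "Single Worker", "Newly Married",
            "Married", "Mid-Level", "Senior-Level", "Retired")

# right = FALSE gives intervals of the form [lower, upper).
category <- cut(ages, breaks = breaks, labels = labels, right = FALSE)
data.frame(Age = ages, Category = category)
```

If the upper bound should belong to the lower category instead, switching to `right = TRUE` (with `include.lowest = TRUE`) would move the boundary ages down one category.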
Children aged 0-18 are not part of this database. There are three ways we can interpolate this data.
To check how far off our reasoning is, let us compare it with the census age data for Bangalore Urban District. While the numbers in the census data are different, this data also shows a sharp discontinuity around the age of 15-16 years. Another interesting pattern common to both the census data and the 2012 voters list is the regular spike in age. In the case of the census data, this spike occurs every 10 years and may indicate approximation by the enumerator…

* Average of 2 children per family.
* Mother's age at first childbirth: 26
* Mother's age at second childbirth: 29
* Mother's age at third childbirth: 32

(Note: The average numbers for India are not available online. The estimates for western developed countries are 29-30 for first childbirth, so slightly lower numbers are used here.)
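One way the childbirth assumptions above could be turned into an estimate of minors is sketched here. It is not necessarily the method used to produce the figures in this report: the `women` data frame and its counts are hypothetical, and the sketch assumes every woman has exactly two children, born when she was 26 and 29.

```r
# Hypothetical counts of women voters by age (not taken from the report's data).
women <- data.frame(Age = c(26, 30, 40, 50), Count = c(100, 100, 100, 100))

# Assumed mother's age at each birth; only the first two are used
# because the assumed average is 2 children per family.
birth_ages <- c(26, 29)

# For every woman's age, derive the implied age of each child.
children <- do.call(rbind, lapply(birth_ages, function(b) {
  data.frame(ChildAge = women$Age - b, Count = women$Count)
}))

# Keep only children who are already born (age >= 0) and still minors (age < 18).
minors <- children[children$ChildAge >= 0 & children$ChildAge < 18, ]

# Total estimated minors per child age.
minors_by_age <- aggregate(Count ~ ChildAge, data = minors, FUN = sum)
minors_by_age
```

With these made-up counts, the 50-year-old women contribute no minors, since their implied children are 24 and 21 years old.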
Sanity check: are minors 40% of the population? Total population in database = $8.8459967 \times 10^{7}$. Minors = $2.8257047 \times 10^{7}$.
Minors percentage = 31.9433162%, which is far less than the expected 40%.
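The sanity check is a single ratio and can be reproduced from the two totals quoted above. The variable names are chosen for this illustration only.

```r
# Totals as quoted in the text above.
total_population <- 8.8459967e7
minors_total <- 2.8257047e7

# Share of minors in the database, in percent.
minors_pct <- 100 * minors_total / total_population

# Shortfall against the expected 40%, in percentage points.
shortfall <- 40 - minors_pct

round(c(minors_pct = minors_pct, shortfall = shortfall), 2)
```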
Replotting the population summary graphics, we see a lack of smooth transition between the minors and the voting population. This may be due to two possible reasons.