There is renewed debate in India about the role of Hindi in national discourse. Political leaders from the new majority BJP Government are said to favour a tilt towards Hindi in social media and, gradually, in other aspects of official communications. Politicians from non-Hindi-speaking states, especially Tamil Nadu, are up in arms at the apparent downgrading of their languages.
In the middle of this debate, can the facts shed some light on the problem?
In this short paper I examine some official statistics on literacy levels in Indian states.
I took publicly available 2011 census data to create a data set on illiteracy by state. The Women and Men columns hold the number of illiterate people, and Women.Rate and Men.Rate hold the literacy rates in percent. The full data set is at the end of this document as an appendix.
## 'data.frame': 35 obs. of 7 variables:
## $ State.Code: Factor w/ 35 levels "ANI","ANP","ARP",..: 33 20 5 35 2 23 31 29 17 11 ...
## $ State.Name: Factor w/ 35 levels "Andaman and Nicobar Islands",..: 33 21 5 35 2 20 31 29 17 12 ...
## $ Hindi.Belt: Factor w/ 2 levels "N","Y": 2 1 2 1 1 2 1 2 1 1 ...
## $ Women : num 46357801 16306087 26705906 14865603 17413013 ...
## $ Men : num 23743391 6769257 15631200 8570593 10660658 ...
## $ Women.Rate: num 51.4 69.9 46.4 66.6 58.7 ...
## $ Men.Rate : num 77.3 88.4 71.2 81.7 74.9 ...
States are characterised as either Hindi-belt or not.
The 9 Hindi-belt states are:
## [1] Uttar Pradesh Bihar Madhya Pradesh Rajasthan
## [5] Jharkhand Chhattisgarh Haryana Uttarakhand
## [9] Himachal Pradesh
## 35 Levels: Andaman and Nicobar Islands ... West Bengal
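The command that produced this listing is not shown in the text. A minimal sketch of one way to get it, assuming a data frame with the same State.Name and Hindi.Belt columns, is to subset on the flag. The four-row `states` frame below is a hypothetical stand-in built from the first rows of the appendix, not the full data set.

```r
# Hypothetical four-row stand-in for the full 35-row data set.
states <- data.frame(
  State.Name = c("Uttar Pradesh", "Maharashtra", "Bihar", "West Bengal"),
  Hindi.Belt = c("Y", "N", "Y", "N"),
  stringsAsFactors = TRUE
)

# Keep only the names of states flagged as Hindi-belt.
hindi_states <- as.character(states$State.Name[states$Hindi.Belt == "Y"])
print(hindi_states)
```

Run against the full data set, the same subset would return the nine names listed above.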
Create a new column Language.Group with the value “Hindi” where Hindi.Belt = Y and “Other” where Hindi.Belt = N.
## 'data.frame': 35 obs. of 8 variables:
## $ State.Code : Factor w/ 35 levels "ANI","ANP","ARP",..: 33 20 5 35 2 23 31 29 17 11 ...
## $ State.Name : Factor w/ 35 levels "Andaman and Nicobar Islands",..: 33 21 5 35 2 20 31 29 17 12 ...
## $ Hindi.Belt : Factor w/ 2 levels "N","Y": 2 1 2 1 1 2 1 2 1 1 ...
## $ Women : num 46357801 16306087 26705906 14865603 17413013 ...
## $ Men : num 23743391 6769257 15631200 8570593 10660658 ...
## $ Women.Rate : num 51.4 69.9 46.4 66.6 58.7 ...
## $ Men.Rate : num 77.3 88.4 71.2 81.7 74.9 ...
## $ Language.Group: chr "Hindi" "Other" "Hindi" "Other" ...
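The code for this step is not shown either. One plausible way to derive the column, sketched here on a hypothetical four-row stand-in rather than the full data set, is a vectorised `ifelse()` on the Hindi.Belt flag. It yields a character column, which matches the `chr` type in the structure listing.

```r
# Hypothetical four-row stand-in for the full 35-row data set.
states <- data.frame(
  State.Name = c("Uttar Pradesh", "Maharashtra", "Bihar", "West Bengal"),
  Hindi.Belt = factor(c("Y", "N", "Y", "N"))
)

# Map the Y/N flag to a readable label; ifelse() returns a character vector.
states$Language.Group <- ifelse(states$Hindi.Belt == "Y", "Hindi", "Other")
str(states)
```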
The following plots show the illiterate population by gender and language group, and then the female literacy rate in each of the 35 states and Union Territories in India. The last two show the same literacy data with the states distinguished by category: Hindi-majority states and others.
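# ddply() and summarise() come from plyr, melt() from reshape2, ggplot() from ggplot2
library(plyr)
library(reshape2)
library(ggplot2)
dfwide <- ddply(data, c("Language.Group"), summarise,
                Women = sum(Women),
                Men = sum(Men))
str(dfwide)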
## 'data.frame': 2 obs. of 3 variables:
## $ Language.Group: chr "Hindi" "Other"
## $ Women : num 1.27e+08 1.06e+08
## $ Men : num 64389195 55358033
dfwide
## Language.Group Women Men
## 1 Hindi 126710803 64389195
## 2 Other 105538381 55358033
dflong <- melt(dfwide, id.vars=c("Language.Group"),
measure.vars = c("Women", "Men"),
variable.name = "Gender.Group",
value.name = "Number.Illiterate")
dflong
## Language.Group Gender.Group Number.Illiterate
## 1 Hindi Women 126710803
## 2 Other Women 105538381
## 3 Hindi Men 64389195
## 4 Other Men 55358033
ggplot(dflong, aes(x=Language.Group, y=Number.Illiterate/10^6, fill = Gender.Group )) +
ggtitle("Illiterate population by Gender Group and main State Language" )+
geom_bar(position = "dodge", stat="identity", colour="black")+
ylab("Number of illiterate people, millions")+
xlab("State language")+
geom_text(aes(label=round(Number.Illiterate/10^6)), vjust=1.5, colour="white", position=position_dodge(.9), size=6)
## ymax not defined: adjusting position using y instead
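The totals in the bar chart can also be read as shares. As a rough check, using only the four totals printed above (the `totals` and `Women.Share` names are introduced here for illustration), women make up about two thirds of the illiterate population in both language groups.

```r
# Totals copied from the dfwide output above.
totals <- data.frame(
  Language.Group = c("Hindi", "Other"),
  Women = c(126710803, 105538381),
  Men = c(64389195, 55358033)
)

# Share of the illiterate population that is female, per language group.
totals$Women.Share <- totals$Women / (totals$Women + totals$Men)
print(round(totals$Women.Share, 3))
```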
data.women <- data[with(data, order(Language.Group, Women.Rate)), ]
#ggplot(data.women, aes(x=reorder(State.Name, Women.Rate), y=Women.Rate))+
# geom_bar(stat="identity")
ggplot(data.women, aes(y=reorder(State.Name, Women.Rate), x=Women.Rate))+
geom_point(size=3)+ xlim(20, 100)+theme_bw()+
xlab("Literacy rate in girls over 7 and women")+
ylab("State")+
theme(panel.grid.major.x=element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour="grey50", linetype="dashed"))
nameorder <- data.women$State.Name[order(data.women$Language.Group, data.women$Women.Rate)]
data.women$State.Name <- factor(data.women$State.Name, levels=nameorder)
ggplot(data.women, aes(x=Women.Rate, y=State.Name))+
xlab("Literacy rate in girls over 7 and women")+
ylab("State")+
geom_segment(aes(yend=State.Name), xend=0, colour="grey50")+
geom_point(size=3, aes(colour=Language.Group))+xlim(20,100)+
scale_colour_brewer(palette="Set1", limits=c("Other","Hindi"))+
theme_bw()+
theme(panel.grid.major.y = element_blank(),
legend.position=c(1,0.55),
legend.justification=c(1,0.5))
ggplot(data.women, aes(x=Women.Rate, y=State.Name))+
xlab("Literacy rate in girls over 7 and women")+
ylab("State")+
geom_segment(aes(yend=State.Name), xend=0, colour="grey50")+
geom_point(size=3, aes(colour=Language.Group))+xlim(20,100)+
scale_colour_brewer(palette="Set1", limits=c("Other","Hindi"), guide = "none")+
theme_bw()+
theme(panel.grid.major.y = element_blank())+
facet_grid(Language.Group~., scales = "free_y", space = "free_y")
The charts show clearly that female literacy rates are low in many states. With two exceptions (Himachal Pradesh and Uttarakhand), the Hindi-speaking states have rates below 60%. The remaining seven include the large, populous states of Uttar Pradesh, Bihar, Madhya Pradesh and Rajasthan.
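The 60% threshold can be checked directly against the appendix. The sketch below copies the Women.Rate values for the nine Hindi-belt states from the table at the end of this document and splits them at 60. The `hindi`, `below_60` and `at_or_above_60` names are introduced here for illustration.

```r
# Female literacy rates for the nine Hindi-belt states, copied from the appendix.
hindi <- data.frame(
  State.Name = c("Uttar Pradesh", "Bihar", "Madhya Pradesh", "Rajasthan",
                 "Jharkhand", "Chhattisgarh", "Haryana", "Uttarakhand",
                 "Himachal Pradesh"),
  Women.Rate = c(51.36, 46.40, 54.49, 47.76, 52.04, 59.58, 56.91, 67.06, 73.51),
  stringsAsFactors = FALSE
)

# Split the states at the 60% threshold used in the text.
below_60 <- hindi$State.Name[hindi$Women.Rate < 60]
at_or_above_60 <- hindi$State.Name[hindi$Women.Rate >= 60]

print(length(below_60))
print(at_or_above_60)
```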
22 June 2014
Acknowledgements: I learned a great deal from Winston Chang's R Graphics Cookbook (Dec 2012).
## Appendix
The full data set is here:
data
## State.Code State.Name Hindi.Belt Women Men
## 1 UPR Uttar Pradesh Y 46357801 23743391
## 2 MAH Maharashtra N 16306087 6769257
## 3 BHR Bihar Y 26705906 15631200
## 4 WBN West Bengal N 14865603 8570593
## 5 ANP Andhra Pradesh N 17413013 10660658
## 6 MPR Madhya Pradesh Y 15935702 7999856
## 7 TMN Tamil Nadu N 9669929 4782090
## 8 RJN Rajasthan Y 17236206 7398822
## 9 KNK Karnataka N 10241055 5428285
## 10 GJR Gujarat N 10619657 4488096
## 11 ORS Orissa N 7794958 3904726
## 12 KRL Kerala N 132060 623558
## 13 KHK Jharkhand Y 7699390 3921998
## 14 ASM Assam N 5649197 3530151
## 15 PNB Punjab N 4911025 2863640
## 16 CHG Chhattisgarh Y 5139347 2531425
## 17 HRY Haryana Y 5110244 2150624
## 18 DLI Delhi N 2429956 814233
## 19 JNK Jammu and Kashmir N 3003029 1543596
## 20 UTR Uttarakhand Y 1629901 647414
## 21 HPR Himachal Pradesh Y 896307 364464
## 22 TRP Tripura N 378248 158766
## 23 MGH Meghalaya N 414838 358741
## 24 MNP Manipur N 361865 179876
## 25 NGL Nagaland N 286075 176743
## 26 GOA Goa N 128322 54335
## 27 ARP Arunachal Pradesh N 311290 195992
## 28 PDC Puducherry N 101332 53545
## 29 MZR Mizoram N 71970 36925
## 30 CNG Chandigarh N 167115 58114
## 31 SKM Sikkim N 96636 43451
## 32 ANI Andaman and Nicobar Islands N 51395 19739
## 33 DNH Dadra and Nagar Haveli N 78475 28733
## 34 DND Daman and Diu N 49827 12719
## 35 LKD Lakshadweep N 5425 1471
## Women.Rate Men.Rate Language.Group
## 1 51.36 77.28 Hindi
## 2 69.87 88.38 Other
## 3 46.40 71.20 Hindi
## 4 66.57 81.69 Other
## 5 58.68 74.88 Other
## 6 54.49 78.73 Hindi
## 7 73.14 86.77 Other
## 8 47.76 79.19 Hindi
## 9 66.01 82.47 Other
## 10 63.31 85.75 Other
## 11 62.46 81.59 Other
## 12 99.76 96.11 Other
## 13 52.04 76.84 Hindi
## 14 63.00 77.85 Other
## 15 62.52 80.44 Other
## 16 59.58 80.27 Hindi
## 17 56.91 84.06 Hindi
## 18 68.85 90.94 Other
## 19 49.12 76.75 Other
## 20 67.06 87.40 Hindi
## 21 73.51 89.53 Hindi
## 22 78.98 91.53 Other
## 23 71.88 75.95 Other
## 24 71.73 86.06 Other
## 25 70.01 82.75 Other
## 26 82.16 92.65 Other
## 27 53.52 72.55 Other
## 28 84.05 91.26 Other
## 29 86.72 93.35 Other
## 30 64.81 89.99 Other
## 31 66.39 86.55 Other
## 32 71.08 90.27 Other
## 33 47.67 85.17 Other
## 34 46.37 91.54 Other
## 35 82.69 95.56 Other