Literacy in India

Introduction

There is renewed debate in India about the role of Hindi in national discourse. Political leaders from the new majority BJP Government are said to favour a tilt towards Hindi in social media and, gradually, in other aspects of official communication. Politicians from states where Hindi is not the main language, especially Tamil Nadu, are up in arms at the apparent downgrading of their languages.

In the middle of this debate, can the facts shed some light on the problem?

In this short paper I examine some official statistics on literacy levels in Indian states.

Data Sources and preparation

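I took publicly available 2011 census data to create a data set on illiteracy by state. In it, Women and Men hold the number of illiterate people, while Women.Rate and Men.Rate hold literacy rates in percent. The full data set is at the end of this document as an appendix.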

## 'data.frame':    35 obs. of  7 variables:
##  $ State.Code: Factor w/ 35 levels "ANI","ANP","ARP",..: 33 20 5 35 2 23 31 29 17 11 ...
##  $ State.Name: Factor w/ 35 levels "Andaman and Nicobar Islands",..: 33 21 5 35 2 20 31 29 17 12 ...
##  $ Hindi.Belt: Factor w/ 2 levels "N","Y": 2 1 2 1 1 2 1 2 1 1 ...
##  $ Women     : num  46357801 16306087 26705906 14865603 17413013 ...
##  $ Men       : num  23743391 6769257 15631200 8570593 10660658 ...
##  $ Women.Rate: num  51.4 69.9 46.4 66.6 58.7 ...
##  $ Men.Rate  : num  77.3 88.4 71.2 81.7 74.9 ...

States are characterised as either Hindi-belt or not.

The 9 Hindi-belt states are:

## [1] Uttar Pradesh    Bihar            Madhya Pradesh   Rajasthan       
## [5] Jharkhand        Chhattisgarh     Haryana          Uttarakhand     
## [9] Himachal Pradesh
## 35 Levels: Andaman and Nicobar Islands ... West Bengal

Create a new column, Language.Group, with the value “Hindi” where Hindi.Belt = Y and “Other” where Hindi.Belt = N.

## 'data.frame':    35 obs. of  8 variables:
##  $ State.Code    : Factor w/ 35 levels "ANI","ANP","ARP",..: 33 20 5 35 2 23 31 29 17 11 ...
##  $ State.Name    : Factor w/ 35 levels "Andaman and Nicobar Islands",..: 33 21 5 35 2 20 31 29 17 12 ...
##  $ Hindi.Belt    : Factor w/ 2 levels "N","Y": 2 1 2 1 1 2 1 2 1 1 ...
##  $ Women         : num  46357801 16306087 26705906 14865603 17413013 ...
##  $ Men           : num  23743391 6769257 15631200 8570593 10660658 ...
##  $ Women.Rate    : num  51.4 69.9 46.4 66.6 58.7 ...
##  $ Men.Rate      : num  77.3 88.4 71.2 81.7 74.9 ...
##  $ Language.Group: chr  "Hindi" "Other" "Hindi" "Other" ...
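The code for this step is not shown in the report. A minimal sketch of how the column could be derived, assuming a base R `ifelse` on the `Hindi.Belt` flag, follows. The `sample_data` frame is a hypothetical five-row excerpt of the appendix table, used only so the example runs on its own.

```r
# Hypothetical five-row excerpt of the appendix; the full data set has 35 rows.
sample_data <- data.frame(
  State.Name = c("Uttar Pradesh", "Maharashtra", "Bihar",
                 "West Bengal", "Andhra Pradesh"),
  Hindi.Belt = factor(c("Y", "N", "Y", "N", "N")),
  stringsAsFactors = FALSE
)

# Map the Y/N flag to a readable character label.
sample_data$Language.Group <- ifelse(sample_data$Hindi.Belt == "Y",
                                     "Hindi", "Other")

# List the Hindi-belt states present in this excerpt.
hindi_states <- sample_data$State.Name[sample_data$Hindi.Belt == "Y"]

print(sample_data)
print(hindi_states)
```

Because `ifelse` returns a character vector here, the new column is of type chr rather than a factor, which is consistent with the structure listing above.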

Results

The first plot shows the total number of illiterate people by language group and gender. The following plots show the female literacy rate in each of the 35 states and Union Territories in India. The last two show the same data separately for the two categories of states: Hindi-majority states and others.

Plot 1 - totals of illiterate people by language group

library(plyr)      # ddply
library(reshape2)  # melt
library(ggplot2)   # ggplot
dfwide <- ddply(data, c("Language.Group"), summarise,
            Women = sum(Women),
            Men = sum(Men))
str(dfwide)
## 'data.frame':    2 obs. of  3 variables:
##  $ Language.Group: chr  "Hindi" "Other"
##  $ Women         : num  1.27e+08 1.06e+08
##  $ Men           : num  64389195 55358033
dfwide
##   Language.Group     Women      Men
## 1          Hindi 126710803 64389195
## 2          Other 105538381 55358033
dflong <- melt(dfwide, id.vars=c("Language.Group"),
               measure.vars = c("Women", "Men"),
               variable.name = "Gender.Group",
               value.name = "Number.Illiterate")
dflong
##   Language.Group Gender.Group Number.Illiterate
## 1          Hindi        Women         126710803
## 2          Other        Women         105538381
## 3          Hindi          Men          64389195
## 4          Other          Men          55358033
ggplot(dflong, aes(x=Language.Group, y=Number.Illiterate/10^6, fill=Gender.Group))+
        ggtitle("Illiterate population by Gender Group and main State Language")+
        geom_bar(position="dodge", stat="identity", colour="black")+
        ylab("Number of illiterate people, millions")+
        xlab("State language")+
        geom_text(aes(label=round(Number.Illiterate/10^6)), vjust=1.5, colour="white", position=position_dodge(.9), size=6)
## ymax not defined: adjusting position using y instead

plot of chunk plot1
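The totals plotted above can also be read as shares. The sketch below is an illustration, not part of the original analysis. It re-enters the four totals printed in `dfwide` and computes the percentage of illiterate people in each language group who are women. The names `totals` and `Women.Share` are introduced here for illustration only.

```r
# Totals copied from the dfwide output above.
totals <- data.frame(
  Language.Group = c("Hindi", "Other"),
  Women = c(126710803, 105538381),
  Men   = c(64389195, 55358033),
  stringsAsFactors = FALSE
)

# Women as a percentage of all illiterate people in each group.
totals$Women.Share <- round(100 * totals$Women / (totals$Women + totals$Men), 1)

print(totals)
```

In both groups women account for roughly two thirds of the illiterate population, so the gender gap visible in the bar chart is of similar relative size in Hindi-belt and other states.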

Plot 2

data.women <- data[with(data, order(Language.Group, Women.Rate)), ]
#ggplot(data.women, aes(x=reorder(State.Name, Women.Rate), y=Women.Rate))+
#        geom_bar(stat="identity")
ggplot(data.women, aes(y=reorder(State.Name, Women.Rate), x=Women.Rate))+
        geom_point(size=3)+ xlim(20, 100)+theme_bw()+
        xlab("Literacy rate in girls over 7 and women")+
        ylab("State")+
        theme(panel.grid.major.x=element_blank(), 
        panel.grid.minor.x = element_blank(), 
        panel.grid.major.y = element_line(colour="grey50", linetype="dashed"))

plot of chunk Plot2

Plot 3

nameorder <- data.women$State.Name[order(data.women$Language.Group, data.women$Women.Rate)]
data.women$State.Name <- factor(data.women$State.Name, levels=nameorder)
ggplot(data.women, aes(x=Women.Rate, y=State.Name))+
        xlab("Literacy rate in girls over 7 and women")+
        ylab("State")+
        geom_segment(aes(yend=State.Name), xend=0, colour="grey50")+
        geom_point(size=3, aes(colour=Language.Group))+xlim(20,100)+
        scale_colour_brewer(palette="Set1", limits=c("Other","Hindi"))+
        theme_bw()+
        theme(panel.grid.major.y = element_blank(),
              legend.position=c(1,0.55),
              legend.justification=c(1,0.5))

plot of chunk Plot3

Plot 4 - using facets

ggplot(data.women, aes(x=Women.Rate, y=State.Name))+
        xlab("Literacy rate in girls over 7 and women")+
        ylab("State")+
        geom_segment(aes(yend=State.Name), xend=0, colour="grey50")+
        geom_point(size=3, aes(colour=Language.Group))+xlim(20,100)+
        scale_colour_brewer(palette="Set1", limits=c("Other","Hindi"), guide="none")+
        theme_bw()+
        theme(panel.grid.major.y = element_blank())+
        facet_grid(Language.Group~., scales = "free_y", space = "free_y")

plot of chunk plot4

Conclusions

The charts show clearly that female literacy rates are low in many states. With two exceptions (Himachal Pradesh and Uttarakhand), the Hindi-speaking states have rates below 60%. These seven include the large, populous states of Uttar Pradesh, Bihar, Madhya Pradesh and Rajasthan.
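The count in this conclusion can be checked directly against the appendix. The sketch below re-enters the female literacy rates of the nine Hindi-belt states from the appendix table and splits them at the 60% threshold. The names `hindi`, `below_60` and `at_or_above_60` are introduced here for illustration.

```r
# Women.Rate values for the nine Hindi-belt states, copied from the appendix.
hindi <- data.frame(
  State.Name = c("Uttar Pradesh", "Bihar", "Madhya Pradesh", "Rajasthan",
                 "Jharkhand", "Chhattisgarh", "Haryana", "Uttarakhand",
                 "Himachal Pradesh"),
  Women.Rate = c(51.36, 46.40, 54.49, 47.76, 52.04, 59.58, 56.91, 67.06, 73.51),
  stringsAsFactors = FALSE
)

# Split the states at the 60% threshold used in the text.
below_60       <- hindi$State.Name[hindi$Women.Rate < 60]
at_or_above_60 <- hindi$State.Name[hindi$Women.Rate >= 60]

print(length(below_60))
print(at_or_above_60)
```

Chhattisgarh, at 59.58%, falls only just under the threshold, so the split is sensitive to where the line is drawn.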

Jayenaar

22 June 2014

Acknowledgements: I learned a great deal from Winston Chang, R Graphics Cookbook, Dec 2012.

Appendix

The full data set is here:

data
##    State.Code                  State.Name Hindi.Belt    Women      Men
## 1         UPR               Uttar Pradesh          Y 46357801 23743391
## 2         MAH                 Maharashtra          N 16306087  6769257
## 3         BHR                       Bihar          Y 26705906 15631200
## 4         WBN                 West Bengal          N 14865603  8570593
## 5         ANP              Andhra Pradesh          N 17413013 10660658
## 6         MPR              Madhya Pradesh          Y 15935702  7999856
## 7         TMN                  Tamil Nadu          N  9669929  4782090
## 8         RJN                   Rajasthan          Y 17236206  7398822
## 9         KNK                   Karnataka          N 10241055  5428285
## 10        GJR                     Gujarat          N 10619657  4488096
## 11        ORS                      Orissa          N  7794958  3904726
## 12        KRL                      Kerala          N   132060   623558
## 13        KHK                   Jharkhand          Y  7699390  3921998
## 14        ASM                       Assam          N  5649197  3530151
## 15        PNB                      Punjab          N  4911025  2863640
## 16        CHG                Chhattisgarh          Y  5139347  2531425
## 17        HRY                     Haryana          Y  5110244  2150624
## 18        DLI                       Delhi          N  2429956   814233
## 19        JNK           Jammu and Kashmir          N  3003029  1543596
## 20        UTR                 Uttarakhand          Y  1629901   647414
## 21        HPR            Himachal Pradesh          Y   896307   364464
## 22        TRP                     Tripura          N   378248   158766
## 23        MGH                   Meghalaya          N   414838   358741
## 24        MNP                     Manipur          N   361865   179876
## 25        NGL                    Nagaland          N   286075   176743
## 26        GOA                         Goa          N   128322    54335
## 27        ARP           Arunachal Pradesh          N   311290   195992
## 28        PDC                  Puducherry          N   101332    53545
## 29        MZR                     Mizoram          N    71970    36925
## 30        CNG                  Chandigarh          N   167115    58114
## 31        SKM                      Sikkim          N    96636    43451
## 32        ANI Andaman and Nicobar Islands          N    51395    19739
## 33        DNH      Dadra and Nagar Haveli          N    78475    28733
## 34        DND               Daman and Diu          N    49827    12719
## 35        LKD                 Lakshadweep          N     5425     1471
##    Women.Rate Men.Rate Language.Group
## 1       51.36    77.28          Hindi
## 2       69.87    88.38          Other
## 3       46.40    71.20          Hindi
## 4       66.57    81.69          Other
## 5       58.68    74.88          Other
## 6       54.49    78.73          Hindi
## 7       73.14    86.77          Other
## 8       47.76    79.19          Hindi
## 9       66.01    82.47          Other
## 10      63.31    85.75          Other
## 11      62.46    81.59          Other
## 12      99.76    96.11          Other
## 13      52.04    76.84          Hindi
## 14      63.00    77.85          Other
## 15      62.52    80.44          Other
## 16      59.58    80.27          Hindi
## 17      56.91    84.06          Hindi
## 18      68.85    90.94          Other
## 19      49.12    76.75          Other
## 20      67.06    87.40          Hindi
## 21      73.51    89.53          Hindi
## 22      78.98    91.53          Other
## 23      71.88    75.95          Other
## 24      71.73    86.06          Other
## 25      70.01    82.75          Other
## 26      82.16    92.65          Other
## 27      53.52    72.55          Other
## 28      84.05    91.26          Other
## 29      86.72    93.35          Other
## 30      64.81    89.99          Other
## 31      66.39    86.55          Other
## 32      71.08    90.27          Other
## 33      47.67    85.17          Other
## 34      46.37    91.54          Other
## 35      82.69    95.56          Other