Speech style and education distinguish the grammatical classes of (ING)

Sarah Horwitz

Investigation of the variable (ING) (workin’ ~ working) has propelled the study of language variation and change (Fischer 1958, Labov 1966/1982, Trudgill 1974, Houston 1985, Campbell-Kibler 2007). While (ING)’s stability and salience (Labov 2001a:86) cannot be ignored, researchers emphasize that our knowledge of “…broad cross-variety patterns of [ING] usage” (Tagliamonte 2004:394) remains scant. We analyze constraints on (ING) across grammatical categories in Philadelphia English and show that nominal, verbal and quantifier (ING) are conditioned differently. We argue that the differing stylistic conditioning of these categories supports a conception of (ING) as more than one variable. We also suggest that (ING) may be less stable than widely believed.

#insert 3 graphs of the model results, showing that nominal, verbal and quantifier (ING) are conditioned differently (plotting code still in progress)

Data come from a roughly age/sex-balanced sample of 40 speakers from the Philadelphia Neighborhood Corpus (Labov & Rosenfelder 2011). Every instance of (ING) was coded for pronunciation (apical [ɪn] or velar [ɪŋ]) and grammatical class: nominal (including monomorphemes and nominal gerunds), verbal (participles and progressives) or quantifier (something/nothing).


##   code newgram2 index    N
## 1    0       gr     0  353
## 2    0        m     0   43
## 3    0        p     0 1465
## 4    0        s     0  289
## 5    1       gr     1  530
## 6    1        m     1   89
## 7    1        p     1  700
## 8    1        s     1  135
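The following Python sketch checks the counts table above. It collapses the four `newgram2` codes into the three grammatical classes and computes the share of code-1 tokens in each class. It rests on two assumptions that the output does not state:

- `code` 1 marks velar [ɪŋ] and 0 marks apical [ɪn].
- `gr`, `m`, `p` and `s` stand for nominal gerunds, monomorphemes, participles/progressives and something/nothing, respectively.

The names `COUNTS`, `CLASS_OF` and `velar_rates` are illustrative.

```python
# Token counts copied from the table above: (code, newgram2) -> N.
# ASSUMPTION: code 1 = velar [ɪŋ], code 0 = apical [ɪn].
COUNTS = {
    (0, "gr"): 353, (0, "m"): 43, (0, "p"): 1465, (0, "s"): 289,
    (1, "gr"): 530, (1, "m"): 89, (1, "p"): 700, (1, "s"): 135,
}

# ASSUMPTION (hypothetical reading of the labels): gr = nominal gerund,
# m = monomorpheme, p = participle/progressive, s = something/nothing.
CLASS_OF = {"gr": "nominal", "m": "nominal", "p": "verbal", "s": "quantifier"}


def velar_rates(counts):
    """Return {class: (code-1 tokens, all tokens, code-1 proportion)}."""
    velar, total = {}, {}
    for (code, gram), n in counts.items():
        cls = CLASS_OF[gram]
        total[cls] = total.get(cls, 0) + n
        velar[cls] = velar.get(cls, 0) + (n if code == 1 else 0)
    return {cls: (velar[cls], total[cls], velar[cls] / total[cls]) for cls in total}


if __name__ == "__main__":
    for cls, (v, n, rate) in velar_rates(COUNTS).items():
        print(f"{cls:10s} {v:5d}/{n:5d} = {rate:.3f}")
```

Under this reading, the class totals (1015 nominal, 2165 verbal, 424 quantifier tokens) equal the number of observations in the three separate models below (null-deviance degrees of freedom plus one). Nominal (ING) comes out at roughly 61% velar, against roughly 32% for both verbal and quantifier (ING).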

We adopt Labov’s (2001b) Style Decision Tree to classify speech in the sociolinguistic interview into one of eight contextual styles. For analysis, we further group these styles as “Careful” or “Casual”.

## Source: local data frame [8 x 4]
## Groups: style
## 
##   style Bin_style  ing.rate    N
## 1     C   Careful 0.4241908 1174
## 2     G    Casual 0.3828125  128
## 3     K    Casual 0.4782609   23
## 4     L   Careful 0.2631579   38
## 5     N    Casual 0.2828125  640
## 6     R   Careful 0.4497817  458
## 7     S   Careful 0.3506098  328
## 8     T    Casual 0.4711656  815
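The style table reports one rate per contextual style. The sketch below pools these rates, weighted by token count, into the binary Careful/Casual grouping used in the models. It assumes that `ing.rate` is the proportion of code-1 tokens in each style, so that `ing.rate * N` recovers an integer count. `STYLE_TABLE` and `pooled_rates` are hypothetical names.

```python
# Rows copied from the style table above: style -> (Bin_style, ing.rate, N).
STYLE_TABLE = {
    "C": ("Careful", 0.4241908, 1174),
    "G": ("Casual", 0.3828125, 128),
    "K": ("Casual", 0.4782609, 23),
    "L": ("Careful", 0.2631579, 38),
    "N": ("Casual", 0.2828125, 640),
    "R": ("Careful", 0.4497817, 458),
    "S": ("Careful", 0.3506098, 328),
    "T": ("Casual", 0.4711656, 815),
}


def pooled_rates(table):
    """Token-weighted rate for each binary style: {Bin_style: (hits, N, rate)}."""
    hits, totals = {}, {}
    for binned, rate, n in table.values():
        # ASSUMPTION: rate * n is (up to rounding) an integer count of code-1 tokens.
        hits[binned] = hits.get(binned, 0) + round(rate * n)
        totals[binned] = totals.get(binned, 0) + n
    return {b: (hits[b], totals[b], hits[b] / totals[b]) for b in totals}


if __name__ == "__main__":
    for binned, (h, n, rate) in pooled_rates(STYLE_TABLE).items():
        print(f"{binned:8s} {h:4d}/{n:4d} = {rate:.3f}")
```

This gives about 41% (829/1998) for Careful speech and 39% (625/1606) for Casual speech. The recovered counts sum to 1454, the same as the total of code-1 tokens in the counts table above. That match supports reading `ing.rate` as the code-1 proportion.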

We fit a logistic regression model predicting the (ING) variant from:

- birth year
- preceding segment
- following segment
- speaker gender
- style
- grammatical class
- level of educational attainment
- lexical frequency

We also tested for birth year/education and speaker gender/style interactions. Every predictor except style has a significant main effect; style is marginal (p = 0.055). The birth year/education interaction is significant.


## 
## Call:
## glm(formula = code ~ birthyear * school.cat + Pre_Seg + Post_Seg + 
##     sex * Bin_style + logfreq + newgram2, family = "binomial", 
##     data = ing.lsa3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4303  -0.8758  -0.5642   1.0024   2.2720  
## 
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)                      -19.005397   8.829386  -2.153   0.0314
## birthyear                          0.011387   0.004583   2.485   0.0130
## school.catJust HS                 45.341642  10.748279   4.219 2.46e-05
## school.catSome college            23.570086  10.527596   2.239   0.0252
## Pre_Segcoronal.obs                -0.827948   0.198299  -4.175 2.98e-05
## Pre_Segother                      -1.411220   0.187385  -7.531 5.03e-14
## Pre_Segvelar.N                    -2.464285   0.551067  -4.472 7.75e-06
## Post_Segpause                      0.573933   0.091805   6.252 4.06e-10
## Post_Segvelar.C                    0.053211   0.167948   0.317   0.7514
## sexm                              -1.067946   0.104618 -10.208  < 2e-16
## Bin_styleCasual                   -0.202489   0.105464  -1.920   0.0549
## logfreq                           -0.146180   0.021461  -6.811 9.67e-12
## newgram2m                          0.061601   0.230241   0.268   0.7890
## newgram2p                         -0.987186   0.091636 -10.773  < 2e-16
## newgram2s                         -0.990727   0.170694  -5.804 6.47e-09
## birthyear:school.catJust HS       -0.023457   0.005554  -4.224 2.40e-05
## birthyear:school.catSome college  -0.011813   0.005432  -2.175   0.0297
## sexm:Bin_styleCasual               0.156661   0.158085   0.991   0.3217
##                                     
## (Intercept)                      *  
## birthyear                        *  
## school.catJust HS                ***
## school.catSome college           *  
## Pre_Segcoronal.obs               ***
## Pre_Segother                     ***
## Pre_Segvelar.N                   ***
## Post_Segpause                    ***
## Post_Segvelar.C                     
## sexm                             ***
## Bin_styleCasual                  .  
## logfreq                          ***
## newgram2m                           
## newgram2p                        ***
## newgram2s                        ***
## birthyear:school.catJust HS      ***
## birthyear:school.catSome college *  
## sexm:Bin_styleCasual                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4852.6  on 3598  degrees of freedom
## Residual deviance: 4073.2  on 3581  degrees of freedom
## AIC: 4109.2
## 
## Number of Fisher Scoring iterations: 4
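The full model includes `birthyear * school.cat`, so the birth-year effect is not a single number. Each education group has its own slope. Also, because `birthyear` is not centered, the large `school.cat` estimates are contrasts at birth year 0.

The Python sketch below derives both quantities from the printed coefficients. It assumes that the omitted reference level of `school.cat` is "Not HS", the level name that appears in the separate models. The year 1950 is an arbitrary illustrative birth year, and all names are hypothetical.

```python
import math

# Full-model coefficients (log-odds of code 1), copied from the glm output above.
BIRTHYEAR = 0.011387
SCHOOL_MAIN = {"Just HS": 45.341642, "Some college": 23.570086}
SCHOOL_X_BIRTHYEAR = {"Just HS": -0.023457, "Some college": -0.011813}
# ASSUMPTION: the omitted reference level of school.cat is "Not HS".
REFERENCE = "Not HS"


def birthyear_slope(group):
    """Change in log-odds of code 1 per one-year increase in birth year."""
    # The reference group has no interaction term, so it gets the bare slope.
    return BIRTHYEAR + SCHOOL_X_BIRTHYEAR.get(group, 0.0)


def education_contrast(group, year):
    """Log-odds difference between `group` and the reference level at `year`."""
    # birthyear is uncentered, so SCHOOL_MAIN alone is the contrast at year 0.
    return SCHOOL_MAIN[group] + SCHOOL_X_BIRTHYEAR[group] * year


if __name__ == "__main__":
    for g in (REFERENCE, "Just HS", "Some college"):
        decade = 10 * birthyear_slope(g)
        print(f"{g:12s} per decade: {decade:+.3f} log-odds, odds x{math.exp(decade):.3f}")
    for g in ("Just HS", "Some college"):
        # 1950 is an arbitrary illustrative birth year, not a value from the data.
        print(f"{g:12s} vs {REFERENCE} at 1950: {education_contrast(g, 1950):+.3f}")
```

Per decade of birth year, the log-odds of code 1:

- rise by about 0.114 for the reference group;
- fall by about 0.121 for "Just HS" speakers;
- are nearly flat (about -0.004) for "Some college" speakers.

At the illustrative birth year of 1950, "Just HS" speakers sit about 0.40 log-odds below the reference group and "Some college" speakers about 0.53 above it.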

#explain why a fixed-effects logistic regression (glm) is used instead of a mixed-effects model (glmer)

Following Tagliamonte (2004), we fit a separate model for each grammatical class using the full model’s predictors. For nominal and verbal (ING), preceding segment, following segment, speaker gender, grammatical subclass and lexical frequency are significant. Birth year, education, style and both interactions, however, are significant only for verbal (ING).

Nominal (ING)

## 
## Call:
## glm(formula = code ~ birthyear + Pre_Seg + Post_Seg + sex + Bin_style + 
##     school.cat, family = "binomial", data = ing.lsa.nom)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7251  -1.0595   0.4884   0.9360   1.7418  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            11.622221   6.965703   1.668  0.09522 .  
## birthyear              -0.004882   0.003558  -1.372  0.17010    
## Pre_Segcoronal.obs     -0.925760   0.292942  -3.160  0.00158 ** 
## Pre_Segother           -2.168827   0.271319  -7.994 1.31e-15 ***
## Pre_Segvelar.N         -2.330654   0.968428  -2.407  0.01610 *  
## Post_Segpause           1.118219   0.182534   6.126 9.01e-10 ***
## Post_Segvelar.C         0.161780   0.285289   0.567  0.57066    
## sexm                   -0.756193   0.147513  -5.126 2.96e-07 ***
## Bin_styleCasual        -0.076404   0.149253  -0.512  0.60872    
## school.catNot HS       -0.379256   0.193057  -1.964  0.04947 *  
## school.catSome college  0.540211   0.174830   3.090  0.00200 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1357.7  on 1014  degrees of freedom
## Residual deviance: 1150.6  on 1004  degrees of freedom
## AIC: 1172.6
## 
## Number of Fisher Scoring iterations: 4

Verbal (ING)

## 
## Call:
## glm(formula = code ~ birthyear + Pre_Seg + Post_Seg + sex + Bin_style + 
##     school.cat, family = "binomial", data = ing.lsa.verb)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7620  -0.8609  -0.5940   1.0920   2.2093  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -0.5338711  4.9889810  -0.107  0.91478    
## birthyear               0.0002321  0.0025501   0.091  0.92749    
## Pre_Segcoronal.obs     -0.2214000  0.2694499  -0.822  0.41126    
## Pre_Segother           -0.7193356  0.2584618  -2.783  0.00538 ** 
## Pre_Segvelar.N         -1.8218400  0.6899855  -2.640  0.00828 ** 
## Post_Segpause           0.6193800  0.1195548   5.181 2.21e-07 ***
## Post_Segvelar.C        -0.1773273  0.2332191  -0.760  0.44705    
## sexm                   -1.1218894  0.1028091 -10.912  < 2e-16 ***
## Bin_styleCasual        -0.2453234  0.1002576  -2.447  0.01441 *  
## school.catNot HS        0.2785092  0.1379976   2.018  0.04357 *  
## school.catSome college  0.9969303  0.1191992   8.364  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2725.1  on 2164  degrees of freedom
## Residual deviance: 2448.4  on 2154  degrees of freedom
## AIC: 2470.4
## 
## Number of Fisher Scoring iterations: 4

Quantifier (ING)

## 
## Call:
## glm(formula = code ~ birthyear + Post_Seg + sex + Bin_style + 
##     school.cat, family = "binomial", data = ing.lsa.quant)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8524  -0.8132  -0.5362   0.9875   2.1383  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            26.370104  12.355885   2.134   0.0328 *  
## birthyear              -0.013918   0.006323  -2.201   0.0277 *  
## Post_Segpause          -0.371417   0.267237  -1.390   0.1646    
## Post_Segvelar.C         1.089798   0.527798   2.065   0.0389 *  
## sexm                   -1.144259   0.237424  -4.819 1.44e-06 ***
## Bin_styleCasual        -0.046640   0.236920  -0.197   0.8439    
## school.catNot HS        0.274444   0.320272   0.857   0.3915    
## school.catSome college  1.670657   0.286623   5.829 5.58e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 530.56  on 423  degrees of freedom
## Residual deviance: 454.64  on 416  degrees of freedom
## AIC: 470.64
## 
## Number of Fisher Scoring iterations: 4
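The three class-specific outputs above can be used to check the claim that style is significant only for verbal (ING). In glm's output, `z value` is the estimate divided by its standard error, and `Pr(>|z|)` is the two-sided normal tail probability. The sketch below recomputes both for `Bin_styleCasual`, along with the corresponding odds ratio. `STYLE_COEF` and `wald_test` are illustrative names.

```python
import math

# Bin_styleCasual (estimate, std. error) copied from each separate glm above.
STYLE_COEF = {
    "nominal": (-0.076404, 0.149253),
    "verbal": (-0.2453234, 0.1002576),
    "quantifier": (-0.046640, 0.236920),
}


def wald_test(estimate, se):
    """Two-sided Wald z test, matching glm's `z value` and `Pr(>|z|)` columns."""
    z = estimate / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal Phi.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    for cls, (est, se) in STYLE_COEF.items():
        z, p = wald_test(est, se)
        print(f"{cls:10s} OR(Casual) = {math.exp(est):.3f}  z = {z:+.3f}  p = {p:.4f}")
```

This reproduces the printed p-values (about 0.61, 0.014 and 0.84). A significant effect in one separately fitted model and a non-significant one in another is not itself a test that the effects differ. A grammatical class by style interaction in a pooled model would test that directly.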

In the full model, the speaker gender/style interaction is not significant (p = 0.32). Its direction, however, suggests that women style-shift more than men do.
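The full model's coefficients show the size of the gender difference in style-shifting. For the reference sex, the Casual effect is `Bin_styleCasual` alone; for men, `sexm:Bin_styleCasual` is added to it. The sketch below assumes that women are the reference level, since only `sexm` is printed. The labels "f" and "m" and the helper `casual_shift` are hypothetical.

```python
# Full-model coefficients copied from the glm output above.
BIN_STYLE_CASUAL = -0.202489   # Casual vs Careful for the reference sex
SEXM_X_CASUAL = 0.156661       # sexm:Bin_styleCasual adjustment for men


def casual_shift(sex):
    """Careful-to-Casual change in log-odds of code 1 for sex "f" or "m".

    ASSUMPTION: women ("f") are the reference level, since only `sexm` is printed.
    """
    if sex not in ("f", "m"):
        raise ValueError(f"unknown sex level: {sex!r}")
    return BIN_STYLE_CASUAL + (SEXM_X_CASUAL if sex == "m" else 0.0)


if __name__ == "__main__":
    for sex in ("f", "m"):
        print(f"{sex}: {casual_shift(sex):+.3f} log-odds")
```

This gives a Careful-to-Casual shift of about -0.20 log-odds for women and about -0.05 for men. The interaction term itself, however, has p = 0.32 in the full-model output.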

#insert a graph showing the speaker gender/style interaction

For quantifier (ING), birth year, education, speaker gender and the birth year/education interaction are significant. This suggests that external factors condition the three grammatical classes differently. It also suggests that internal factors condition quantifier (ING) differently from nominal and verbal (ING). These results support the divergent “social profiles” of nominal and verbal (ING) (Tagliamonte 2004:399; Tamminga 2014). They also support the contrastive behavior of quantifier (ING) observed by Labov (2001a:88).

It is striking that stylistic constraints are what primarily distinguish the grammatical categories. These categories have not consistently been differentiated in the literature on (ING)’s indexical meaning (Podesva 2007, Kiesling 2008, Eckert 2012). This evidence supports the claim that (ING) is more than one variable (Tagliamonte 2004:400, Tamminga 2014:5). Verbal and quantifier (ING) show birth year/education interactions: rates of velar [ɪŋ] decrease across birth years for speakers who only attended high school. This pattern reflects the changing prevalence of college education, as finishing high school is no longer a definitive social achievement.

Yet the significant main effect of birth year, in addition to the interaction (as in Horvath 1985 and Labov 1972), runs counter to (ING)’s well-documented stability (summarized in Labov 2001a:86) and suggests a change in progress. Analysis of larger, more varied corpora may be necessary to understand (ING)’s socio-indexical meanings more fully.

References

#insert references