Speech style and education distinguish the grammatical classes of (ING)

Sarah Horwitz

The stability and salience of variable (ING) (workin’ ~ working) (Labov 2001a:86) have foregrounded (ING)’s role in the study of language variation and change (Fischer 1958, Labov 1966/1982, Trudgill 1974, Campbell-Kibler 2007, Houston 1985), yet researchers emphasize that our knowledge of “…broad cross-variety patterns of [ING] usage” (Tagliamonte 2004:394) remains scant. We analyze constraints on (ING) across grammatical categories in Philadelphia English and show that nominal, verbal and quantifier (ING) are conditioned differently. We argue that the divergent stylistic conditioning of these categories supports a conception of (ING) as more than one variable, and we suggest that (ING) may be less stable than widely believed.

#insert 3 graphs, showing "nominal, verbal and quantifier (ING) are conditioned differently"
#coding is still a work in progress; trying to make graphs that show results of models
#i had to remove my in-progress coding because the document wouldn't run with it in

While surprising, these findings resemble the grammatically differentiated constraint ranking for (ING) that Tagliamonte observed in her 2004 study of York English. Of the internal and external factors Tagliamonte evaluated, grammatical category had the strongest influence on patterns of (ING) usage (Tagliamonte 2004:398). Subsequent analysis of the constraint rankings for nominal and verbal (ING) revealed that “NOUNS and VERBS have entirely separate and unique linguistic and social profiles” (Tagliamonte 2004:399). Although the factors found to be significant differ between Tagliamonte’s study and ours, in both studies, of two distinct varieties of English, grammatical category was the primary conditioner of variation within (ING). The significant role of grammar in mediating variation within (ING) is not new (see Houston 1985; Marsh 1866), but, as Tagliamonte emphasizes, this pattern has not received sufficient attention.

Data come from a roughly age/sex-balanced sample of 40 speakers from the Philadelphia Neighborhood Corpus (Labov & Rosenfelder 2011). Every instance of (ING) was coded for pronunciation (apical [ɪn] or velar [ɪŋ]) and grammatical class: nominal (including monomorphemes and nominal gerunds), verbal (participles and progressives) or quantifier (something/nothing).

Our grammatical coding of (ING) follows the classificatory model developed by Tamminga (2014:44-49), and the examples that follow are Tamminga’s. Monomorphemic (ING) refers to roots, or items that are stored as whole objects in the lexicon. Examples include morning, pudding, awning, ceiling and during.

Nominal gerunds refer to words that can be split into a nominal or adjectival root and an -ing suffix, but whose meanings derive from both parts of the word. Examples include building (as in an apartment building), housing, clothing and lining.

Participles, like progressives, are words partially composed of a verbal head. Unlike progressives, however, participles can be substituted with nouns. An example of a participle is coding (which can be substituted with the noun “work”: “She does computer coding” is akin to “She does computer work”). An example of a progressive is studying.

Finally, something/nothing quantifiers are notable because they can be articulated in more reduced ways, with glottal stops and syllabic nasals, in addition to being realized with (ING)’s common apical and velar variants.
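In the analysis code, these classes are stored as single-letter `newgram` codes, and the per-class models group them as nominal (m, plus g and r collapsed into gr), verbal (p) and quantifier (s). As a minimal sketch assuming exactly that grouping, a named lookup vector makes the mapping explicit; `gram.class` and `classify_ing()` are hypothetical names, not part of the original analysis.

```r
# Hypothetical lookup from newgram codes to the three broad classes,
# mirroring the subsets used for the per-class models
# (nominal = m, g, r; verbal = p; quantifier = s)
gram.class <- c(m = "nominal", g = "nominal", r = "nominal",
                p = "verbal", s = "quantifier")

classify_ing <- function(newgram) {
  # as.character() guards against factors, which would otherwise index by position;
  # an unknown code indexes a missing name and so comes back as NA
  unname(gram.class[as.character(newgram)])
}

classify_ing(c("m", "g", "r", "p", "s", "x"))
```

Returning a true `NA` for unexpected codes means a later `!is.na()` filter drops them explicitly instead of silently assigning them to a class.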

# Working in dplyr
library(dplyr)

# Accessing data file
ing.lsa <- read.csv("ingstyle_fin.csv")

# Collapsing 8 stylistic categories into 3 style codes in Bin_style
ing.lsa$Bin_style <- ifelse(ing.lsa$style %in% c("C","L","R","S"), "Careful",
                           ifelse(ing.lsa$style %in% c("G","K","N","T"), "Casual", 
                                  "NA"))

# Filter so "NA"s are taken out of code column
ing.lsa <- filter(ing.lsa, !code=="NA")

# Simplifying Pre_seg and Fol_seg columns ---------------------------

# Simplify phonological environment based on previous analysis
ing.lsa$Pre_Seg <- "other"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="velar" & ing.lsa$preseg_manner=="nasal"
  ] <- "velar.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner=="nasal"
  ] <- "coronal.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner %in% c("fricative","stop/aff")
  ] <- "coronal.obs"

ing.lsa$Post_Seg <- "other"

ing.lsa$Post_Seg[
  ing.lsa$folseg_place=="velar" & ing.lsa$folseg_manner %in% c("fricative",
                                                             "nasal",
                                                             "stop/aff",
                                                             "liquid/glide")
  ] <- "velar.C"

ing.lsa$Post_Seg[
  ing.lsa$folseg_manner=="pause"
  ] <- "pause"

# Collapsing grammatical categories ---------------------------

# Recode gram values into multiple categories; brackets index rows
ing.lsa$newgram2[ing.lsa$newgram %in% c("m")] <- "m"
ing.lsa$newgram2[ing.lsa$newgram %in% c("s")] <- "s"
ing.lsa$newgram2[ing.lsa$newgram %in% c("g","r")] <- "gr"
ing.lsa$newgram2[ing.lsa$newgram %in% c("p")] <- "p"

# Filter so "NA"s are taken out of newgram2 column
ing.lsa <- filter(ing.lsa, !newgram2=="NA")

# Filter so "NA"s are taken out of school.cat column
ing.lsa <- filter(ing.lsa, !school.cat=="NA")

# Filter so "NA"s are taken out of code column
ing.lsa2 <- subset(ing.lsa, code %in% c(0,1))

# Reordering educational attainment values
ing.lsa2$school.cat <- factor(ing.lsa2$school.cat, c("Not HS","Just HS","Some college"))

# Turning ing.lsa2 into dataframe
ing.lsa2 <- data.frame(ing.lsa2)

# Identifying the dataframe
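# Counting tokens of each variant within each grammatical class
ing.lsa3 <- ing.lsa2 %>%
  
  # Group by variant (code) and grammatical class
  group_by(code, newgram2) %>%
  
  # Summarize: N is the token count; since rows are grouped by code, index just equals code
  summarise(index=mean(code),N=n())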

# Turning ing.lsa3 into dataframe
ing.lsa3 <- data.frame(ing.lsa3)

# Peek at first 8 rows of dataframe
head(ing.lsa3, 8)

##   code newgram2 index    N
## 1    0       gr     0  353
## 2    0        m     0   43
## 3    0        p     0 1465
## 4    0        s     0  289
## 5    1       gr     1  530
## 6    1        m     1   89
## 7    1        p     1  700
## 8    1        s     1  135
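Because these data are grouped by `code` as well as `newgram2`, the table gives token counts rather than rates. A minimal sketch of turning those counts into a per-class rate, assuming `code = 1` marks the velar variant [ɪŋ] (consistent with the `ing.rate` column name used later); the numbers are copied from the printed output above, not recomputed from the corpus file.

```r
# Token counts copied from the head(ing.lsa3, 8) output above
counts <- data.frame(
  code     = c(0, 0, 0, 0, 1, 1, 1, 1),
  newgram2 = c("gr", "m", "p", "s", "gr", "m", "p", "s"),
  N        = c(353, 43, 1465, 289, 530, 89, 700, 135)
)

# Tokens coded 1 and total tokens within each grammatical class
velar <- tapply(counts$N * counts$code, counts$newgram2, sum)
total <- tapply(counts$N, counts$newgram2, sum)

# Proportion of tokens coded 1 per class
ing.rate.by.class <- round(velar / total, 3)
ing.rate.by.class
```

On these counts, nominal tokens (gr, m) are about 60% and 67% [ɪŋ], while verbal (p) and quantifier (s) tokens are both about 32%. With the full data loaded, `group_by(newgram2) %>% summarise(rate = mean(code), N = n())` should give the same rates directly.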

(ING)’s nominal/verbal distinction can be traced to the suffix’s evolution during the Middle English period (Tagliamonte 2004). Modern -ing is the result of the combination of at least two grammatical forms: the present participle, which in Old English was -ende, and the verbal noun, which in Old English was formed with the -ung suffix. In Middle English, a sequence of reduction processes gradually replaced -ende with -end, which then evolved to -en and finally to -in. Concurrently, the spellings “ing” and “ynge” began to replace the Old English -ung spelling. An explosion of variation in (ING) usage, tied to social evaluation, followed once the -ing and -in suffixes became commonly understood as variants of the same (ING) suffix.

Our study probes one of the emergent dimensions of variation within (ING): style. We adopt Labov’s (2001b) Style Decision Tree to classify speech in the sociolinguistic interview into one of eight contextual styles, which we further group into “Careful” and “Casual” for analysis. We expect higher rates of [ɪŋ] in Careful speech than in Casual speech, as [ɪŋ] is the standard variant of (ING) and carries indexical meanings including formality, effortfulness, articulateness and education (Campbell-Kibler 2007). Conversely, we expect the nonstandard variant [ɪn], which has been found to index inarticulateness/unpretentiousness, relaxation and lack of education (Campbell-Kibler 2007), to be more frequent in Casual speech than in Careful speech.

The Style Decision Tree
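The analysis code assigns the eight Style Decision Tree codes to Careful (R, L, S, C) and Casual (N, G, K, T) with nested `ifelse()` calls, which turn any other value into the string "NA" rather than a missing value. As a hedged alternative, a named lookup vector gives the same grouping and a true `NA` for unexpected codes; `style.group` and `group_style()` are hypothetical names, and the letters are used exactly as they appear in the data without assuming what each abbreviates.

```r
# Hypothetical lookup: Style Decision Tree codes grouped as in the analysis code
style.group <- c(R = "Careful", L = "Careful", S = "Careful", C = "Careful",
                 N = "Casual",  G = "Casual",  K = "Casual",  T = "Casual")

group_style <- function(style) {
  # as.character() avoids positional indexing if style was read in as a factor;
  # codes outside the eight categories come back as a real NA, not the string "NA"
  unname(style.group[as.character(style)])
}

group_style(c("R", "T", "N", "Q"))
```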

## Fix coding so it works

## Some of the coding shown below is the same as coding presented earlier

# Working in dplyr
library(dplyr)

# Accessing data file
ing.lsa <- read.csv("ingstyle_fin.csv")

# Collapsing 8 stylistic categories into 3 style codes in Bin_style
ing.lsa$Bin_style <- ifelse(ing.lsa$style %in% c("C","L","R","S"), "Careful",
                           ifelse(ing.lsa$style %in% c("G","K","N","T"), "Casual", 
                                  "NA"))

# Filter so "NA"s are taken out of code column
ing.lsa <- filter(ing.lsa, !code=="NA")

# Simplifying Pre_seg and Fol_seg columns ---------------------------

# Simplify phonological environment based on previous analysis
ing.lsa$Pre_Seg <- "other"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="velar" & ing.lsa$preseg_manner=="nasal"
  ] <- "velar.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner=="nasal"
  ] <- "coronal.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner %in% c("fricative","stop/aff")
  ] <- "coronal.obs"

ing.lsa$Post_Seg <- "other"

ing.lsa$Post_Seg[
  ing.lsa$folseg_place=="velar" & ing.lsa$folseg_manner %in% c("fricative",
                                                             "nasal",
                                                             "stop/aff",
                                                             "liquid/glide")
  ] <- "velar.C"

ing.lsa$Post_Seg[
  ing.lsa$folseg_manner=="pause"
  ] <- "pause"

# Collapsing grammatical categories ---------------------------

# Recode gram values into multiple categories; brackets index rows
ing.lsa$newgram2[ing.lsa$newgram %in% c("m")] <- "m"
ing.lsa$newgram2[ing.lsa$newgram %in% c("s")] <- "s"
ing.lsa$newgram2[ing.lsa$newgram %in% c("g","r")] <- "gr"
ing.lsa$newgram2[ing.lsa$newgram %in% c("p")] <- "p"

# Filter so "NA"s are taken out of newgram2 column
ing.lsa <- filter(ing.lsa, !newgram2=="NA")

# Filter so "NA"s are taken out of school.cat column
ing.lsa <- filter(ing.lsa, !school.cat=="NA")

# Filter so "NA"s are taken out of code column
ing.lsa2 <- subset(ing.lsa, code %in% c(0,1))

# Reordering educational attainment values
ing.lsa2$school.cat <- factor(ing.lsa2$school.cat, c("Not HS","Just HS","Some college"))

# Reordering Casual and Careful style values (each level listed once)
ing.lsa2$style <- factor(ing.lsa2$style, levels = c("R","L","S","C","N","G","K","T"))

# Turning ing.lsa2 into dataframe
ing.lsa2 <- data.frame(ing.lsa2)

## Step 1 of replicating Tagliamonte: get mean rates across styles (Careful & Casual)

# Convert to a tibble (as_tibble() replaces the deprecated tbl_df())
ing.lsa2 <- as_tibble(ing.lsa2)

# Compute the rate and token count for each style
ing.lsa.style <- ing.lsa2 %>%
  
  # Group tokens by style and by its Careful/Casual grouping
  group_by(style, Bin_style) %>%
  
  # Summarize: ing.rate is the proportion of tokens coded 1; N is the token count
  summarise(ing.rate=mean(code),N=n())

# View ing.lsa.style
View(ing.lsa.style)

# Peek at first 8 rows of dataframe
head(ing.lsa.style, 8)

## Source: local data frame [8 x 4]
## Groups: style
## 
##   style Bin_style  ing.rate    N
## 1     R   Careful 0.4497817  458
## 2     L   Careful 0.2631579   38
## 3     S   Careful 0.3506098  328
## 4     C   Careful 0.4241908 1174
## 5     N    Casual 0.2828125  640
## 6     G    Casual 0.3828125  128
## 7     K    Casual 0.4782609   23
## 8     T    Casual 0.4711656  815
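The per-style rates above can be pooled into the Careful/Casual contrast that the model tests. The sketch below copies the printed rates and token counts and computes a token-weighted mean for each group; weighting by `N` matters because the styles differ greatly in size, from 23 tokens (K) to 1174 (C).

```r
# Per-style rates and token counts copied from the ing.lsa.style output above
style.rates <- data.frame(
  style     = c("R", "L", "S", "C", "N", "G", "K", "T"),
  Bin_style = rep(c("Careful", "Casual"), each = 4),
  ing.rate  = c(0.4497817, 0.2631579, 0.3506098, 0.4241908,
                0.2828125, 0.3828125, 0.4782609, 0.4711656),
  N         = c(458, 38, 328, 1174, 640, 128, 23, 815)
)

# Token-weighted pooled rate per style group: sum(rate * N) / sum(N)
pooled <- sapply(split(style.rates, style.rates$Bin_style),
                 function(d) sum(d$ing.rate * d$N) / sum(d$N))
round(pooled, 3)
```

On these figures, Careful speech shows roughly 41.5% tokens coded 1 versus 38.9% in Casual speech. Pooling this way should match `mean(code)` computed directly over each `Bin_style` group in the raw data.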

We fit a logistic regression model predicting the (ING) variant from birth year, preceding segment, following segment, speaker gender, style, grammatical class, educational attainment and lexical frequency, and we tested for birth year/education and speaker gender/style interactions. Every predictor except style yielded a significant main effect (style was marginal, p = 0.055), and the birth year/education interaction was significant.

## Again, some of the coding shown below replicates coding presented earlier

# Working in dplyr
library(dplyr)

# Working in lme4 (regressions)
library(lme4)

# Accessing data file
ing.lsa <- read.csv("ingstyle_fin.csv")

# Collapsing 8 stylistic categories into 3 style codes in Bin_style
ing.lsa$Bin_style <- ifelse(ing.lsa$style %in% c("C","L","R","S"), "Careful",
                           ifelse(ing.lsa$style %in% c("G","K","N","T"), "Casual", 
                                  "NA"))

# Filter so "NA"s are taken out of code column
ing.lsa <- filter(ing.lsa, !code=="NA")

# Simplifying Pre_seg and Fol_seg columns

# Simplify phonological environment based on previous analysis ---------------------------
ing.lsa$Pre_Seg <- "other"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="velar" & ing.lsa$preseg_manner=="nasal"
  ] <- "velar.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner=="nasal"
  ] <- "coronal.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner %in% c("fricative","stop/aff")
  ] <- "coronal.obs"

ing.lsa$Post_Seg <- "other"

ing.lsa$Post_Seg[
  ing.lsa$folseg_place=="velar" & ing.lsa$folseg_manner %in% c("fricative",
                                                             "nasal",
                                                             "stop/aff",
                                                             "liquid/glide")
  ] <- "velar.C"

ing.lsa$Post_Seg[
  ing.lsa$folseg_manner=="pause"
  ] <- "pause"

# Collapsing grammatical categories ---------------------------

# Recode gram values into multiple categories; brackets index rows
ing.lsa$newgram2[ing.lsa$newgram %in% c("m")] <- "m"
ing.lsa$newgram2[ing.lsa$newgram %in% c("s")] <- "s"
ing.lsa$newgram2[ing.lsa$newgram %in% c("g","r")] <- "gr"
ing.lsa$newgram2[ing.lsa$newgram %in% c("p")] <- "p"

# Filter so "NA"s are taken out of newgram2 column
ing.lsa <- filter(ing.lsa, !newgram2=="NA")

# Filter so "NA"s are taken out of school.cat column
ing.lsa <- filter(ing.lsa, !school.cat=="NA")

# Filter so "NA"s are taken out of code column
ing.lsa2 <- subset(ing.lsa, code %in% c(0,1))

# Reordering educational attainment values
ing.lsa2$school.cat <- factor(ing.lsa2$school.cat, c("Not HS","Just HS","Some college"))

# Adding a "logfreq" column: natural log of each word's SUBTLEX count
ing.lsa2$logfreq <- log(ing.lsa2$subtlex.count)

# Turning ing.lsa2 into dataframe
ing.lsa2 <- data.frame(ing.lsa2)

# Dropping words with a zero count (log(0) is -Inf) or a missing count
ing.lsa3 <- subset(ing.lsa2, is.finite(logfreq))

# Turning ing.lsa3 into dataframe
ing.lsa3 <- data.frame(ing.lsa3)

# Making model: birthyear * education, sex * style, logfreq
# Model includes all grammatical categories
mod.ing.lsa.all.fin <- glm(code ~ birthyear * school.cat + Pre_Seg + Post_Seg + sex * 
                           Bin_style + logfreq + newgram2, ing.lsa3, family = "binomial")

Logistic Regression Model Results

# Seeing the results of mod.ing.lsa.all.fin
summary(mod.ing.lsa.all.fin)

## 
## Call:
## glm(formula = code ~ birthyear * school.cat + Pre_Seg + Post_Seg + 
##     sex * Bin_style + logfreq + newgram2, family = "binomial", 
##     data = ing.lsa3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4303  -0.8758  -0.5642   1.0024   2.2720  
## 
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)                      -19.005397   8.829386  -2.153   0.0314
## birthyear                          0.011387   0.004583   2.485   0.0130
## school.catJust HS                 45.341642  10.748279   4.219 2.46e-05
## school.catSome college            23.570086  10.527596   2.239   0.0252
## Pre_Segcoronal.obs                -0.827948   0.198299  -4.175 2.98e-05
## Pre_Segother                      -1.411220   0.187385  -7.531 5.03e-14
## Pre_Segvelar.N                    -2.464285   0.551067  -4.472 7.75e-06
## Post_Segpause                      0.573933   0.091805   6.252 4.06e-10
## Post_Segvelar.C                    0.053211   0.167948   0.317   0.7514
## sexm                              -1.067946   0.104618 -10.208  < 2e-16
## Bin_styleCasual                   -0.202489   0.105464  -1.920   0.0549
## logfreq                           -0.146180   0.021461  -6.811 9.67e-12
## newgram2m                          0.061601   0.230241   0.268   0.7890
## newgram2p                         -0.987186   0.091636 -10.773  < 2e-16
## newgram2s                         -0.990727   0.170694  -5.804 6.47e-09
## birthyear:school.catJust HS       -0.023457   0.005554  -4.224 2.40e-05
## birthyear:school.catSome college  -0.011813   0.005432  -2.175   0.0297
## sexm:Bin_styleCasual               0.156661   0.158085   0.991   0.3217
##                                     
## (Intercept)                      *  
## birthyear                        *  
## school.catJust HS                ***
## school.catSome college           *  
## Pre_Segcoronal.obs               ***
## Pre_Segother                     ***
## Pre_Segvelar.N                   ***
## Post_Segpause                    ***
## Post_Segvelar.C                     
## sexm                             ***
## Bin_styleCasual                  .  
## logfreq                          ***
## newgram2m                           
## newgram2p                        ***
## newgram2s                        ***
## birthyear:school.catJust HS      ***
## birthyear:school.catSome college *  
## sexm:Bin_styleCasual                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4852.6  on 3598  degrees of freedom
## Residual deviance: 4073.2  on 3581  degrees of freedom
## AIC: 4109.2
## 
## Number of Fisher Scoring iterations: 4
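The coefficients above are on the log-odds scale. As a small sketch, exponentiating a few of them (copied from the printed summary) gives odds ratios relative to each factor's reference level; with the fitted object in memory, `exp(coef(mod.ing.lsa.all.fin))` would do this for every term at once.

```r
# Selected estimates copied from summary(mod.ing.lsa.all.fin) above (log-odds of code = 1)
est <- c(sexm            = -1.067946,
         newgram2p       = -0.987186,
         Bin_styleCasual = -0.202489,
         Post_Segpause   =  0.573933)

# exp() turns each log-odds coefficient into an odds ratio vs. its reference level
odds.ratios <- round(exp(est), 2)
odds.ratios
```

Because the model includes a `sex * Bin_style` interaction, `sexm` is the sex difference within Careful speech and `Bin_styleCasual` is the style difference for the reference sex, not averages over both. Read this way, and assuming the reference sex level is female speakers, men's odds of a token coded 1 in Careful speech are about a third of women's; verbal (p) tokens have about 0.37 times the odds of the reference class (gr), and a following pause raises the odds by about 1.78 relative to "other".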

Explain why log reg model instead of glmer

Following Tagliamonte (2004), we then fit separate models for each grammatical class using the full model’s predictors. For nominal and verbal (ING), preceding segment, following segment, speaker gender, grammar and lexical frequency were significant, while birth year, education, style and both interactions were significant only for verbal (ING).
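The per-class models in the analysis code are fit one at a time with `subset()` and `glm()`. A compact sketch of the same idea fits one logistic regression per class with `lapply()` over a named list of `newgram2` codes. The data here are simulated toy values standing in for `ing.lsa2`, not corpus data, and the two-predictor formula is a placeholder; `toy`, `class.codes` and `fits` are hypothetical names.

```r
# Simulated toy data standing in for ing.lsa2 (NOT corpus data):
# one row per token, code = 1 for the variant being modeled
set.seed(1)
n <- 600
toy <- data.frame(
  newgram2  = sample(c("m", "gr", "p", "s"), n, replace = TRUE),
  sex       = sample(c("f", "m"), n, replace = TRUE),
  Bin_style = sample(c("Careful", "Casual"), n, replace = TRUE)
)
toy$code <- rbinom(n, 1, ifelse(toy$newgram2 %in% c("m", "gr"), 0.6, 0.3))

# Same class groupings as the nominal, verbal and quantifier subsets
class.codes <- list(nominal = c("m", "gr"), verbal = "p", quantifier = "s")

# Fit one logistic regression per class with a shared placeholder formula
fits <- lapply(class.codes, function(codes) {
  glm(code ~ sex + Bin_style, family = binomial,
      data = subset(toy, newgram2 %in% codes))
})

# Coefficients side by side: one column per grammatical class
round(sapply(fits, coef), 3)
```

Keeping one formula across classes keeps the coefficient tables directly comparable; the quantifier model in the analysis drops `Pre_Seg`, so in practice the formula may need to vary by class.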

Preparing data to be subset by grammatical category

# Working in dplyr (filter) and lme4; glm() itself comes from base R's stats package
library(dplyr)
library(lme4)

# Accessing data file
ing.lsa <- read.csv("ingstyle_fin.csv")

# Collapsing 8 stylistic categories into 3 style codes in Bin_style
ing.lsa$Bin_style <- ifelse(ing.lsa$style %in% c("C","L","R","S"), "Careful",
                            ifelse(ing.lsa$style %in% c("G","K","N","T"), "Casual", 
                                   "NA"))

# Filter so "NA"s are taken out of code column
ing.lsa <- filter(ing.lsa, !code=="NA")

# Simplifying Pre_seg and Fol_seg columns ---------------------------

# Simplify phonological environment based on previous analysis
ing.lsa$Pre_Seg <- "other"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="velar" & ing.lsa$preseg_manner=="nasal"
  ] <- "velar.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner=="nasal"
  ] <- "coronal.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner %in% c("fricative","stop/aff")
  ] <- "coronal.obs"

ing.lsa$Post_Seg <- "other"

ing.lsa$Post_Seg[
  ing.lsa$folseg_place=="velar" & ing.lsa$folseg_manner %in% c("fricative",
                                                               "nasal",
                                                               "stop/aff",
                                                               "liquid/glide")
  ] <- "velar.C"

ing.lsa$Post_Seg[
  ing.lsa$folseg_manner=="pause"
  ] <- "pause"

# Collapsing grammatical categories ---------------------------

# Recode gram values into multiple categories; brackets index rows
ing.lsa$newgram2[ing.lsa$newgram %in% c("m")] <- "m"
ing.lsa$newgram2[ing.lsa$newgram %in% c("s")] <- "s"
ing.lsa$newgram2[ing.lsa$newgram %in% c("g","r")] <- "gr"
ing.lsa$newgram2[ing.lsa$newgram %in% c("p")] <- "p"

# Filter so "NA"s are taken out of newgram2 column
ing.lsa <- filter(ing.lsa, !newgram2=="NA")

# Keep only tokens coded 0 or 1
ing.lsa2 <- subset(ing.lsa, code %in% c(0,1))

# Drop tokens with a missing school.cat value (subset ing.lsa2, not ing.lsa, so the code filter is kept)
ing.lsa2 <- subset(ing.lsa2, school.cat %in% c("Just HS","Not HS","Some college"))

Nominal (ING)

# Subsetting data by grammatical cat - NOMINAL
ing.lsa.nom <- subset(ing.lsa2, newgram2 %in% c("m","gr"))

# View ing.lsa.nom
View(ing.lsa.nom)

# Creating first model - NOMINAL
mod.ing.lsa.nom <- glm(code ~ birthyear + Pre_Seg + Post_Seg + sex + Bin_style + 
                         school.cat, ing.lsa.nom, family = "binomial")

# Seeing results of nominal model
summary(mod.ing.lsa.nom)

## 
## Call:
## glm(formula = code ~ birthyear + Pre_Seg + Post_Seg + sex + Bin_style + 
##     school.cat, family = "binomial", data = ing.lsa.nom)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7251  -1.0595   0.4884   0.9360   1.7418  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            11.622221   6.965703   1.668  0.09522 .  
## birthyear              -0.004882   0.003558  -1.372  0.17010    
## Pre_Segcoronal.obs     -0.925760   0.292942  -3.160  0.00158 ** 
## Pre_Segother           -2.168827   0.271319  -7.994 1.31e-15 ***
## Pre_Segvelar.N         -2.330654   0.968428  -2.407  0.01610 *  
## Post_Segpause           1.118219   0.182534   6.126 9.01e-10 ***
## Post_Segvelar.C         0.161780   0.285289   0.567  0.57066    
## sexm                   -0.756193   0.147513  -5.126 2.96e-07 ***
## Bin_styleCasual        -0.076404   0.149253  -0.512  0.60872    
## school.catNot HS       -0.379256   0.193057  -1.964  0.04947 *  
## school.catSome college  0.540211   0.174830   3.090  0.00200 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1357.7  on 1014  degrees of freedom
## Residual deviance: 1150.6  on 1004  degrees of freedom
## AIC: 1172.6
## 
## Number of Fisher Scoring iterations: 4

Verbal (ING)

# Subsetting data by grammatical cat - VERBAL
ing.lsa.verb <- subset(ing.lsa2, newgram2 %in% c("p"))

# View ing.lsa.verb
View(ing.lsa.verb)

# Creating second model - VERBAL
mod.ing.lsa.verb <- glm(code ~ birthyear + Pre_Seg + Post_Seg + sex + Bin_style + 
                          school.cat, ing.lsa.verb, family = "binomial")

# Seeing results of verbal model
summary(mod.ing.lsa.verb)

## 
## Call:
## glm(formula = code ~ birthyear + Pre_Seg + Post_Seg + sex + Bin_style + 
##     school.cat, family = "binomial", data = ing.lsa.verb)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7620  -0.8609  -0.5940   1.0920   2.2093  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -0.5338711  4.9889810  -0.107  0.91478    
## birthyear               0.0002321  0.0025501   0.091  0.92749    
## Pre_Segcoronal.obs     -0.2214000  0.2694499  -0.822  0.41126    
## Pre_Segother           -0.7193356  0.2584618  -2.783  0.00538 ** 
## Pre_Segvelar.N         -1.8218400  0.6899855  -2.640  0.00828 ** 
## Post_Segpause           0.6193800  0.1195548   5.181 2.21e-07 ***
## Post_Segvelar.C        -0.1773273  0.2332191  -0.760  0.44705    
## sexm                   -1.1218894  0.1028091 -10.912  < 2e-16 ***
## Bin_styleCasual        -0.2453234  0.1002576  -2.447  0.01441 *  
## school.catNot HS        0.2785092  0.1379976   2.018  0.04357 *  
## school.catSome college  0.9969303  0.1191992   8.364  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2725.1  on 2164  degrees of freedom
## Residual deviance: 2448.4  on 2154  degrees of freedom
## AIC: 2470.4
## 
## Number of Fisher Scoring iterations: 4

Quantifier (ING)

# Subsetting data by grammatical cat - QUANTIFIER
ing.lsa.quant <- subset(ing.lsa2, newgram2 %in% c("s"))

# View ing.lsa.quant
View(ing.lsa.quant)

# Creating third model - QUANTIFIER
# Pre_Seg is omitted: in something/nothing the suffix always follows the same segment (/θ/)
mod.ing.lsa.quant <- glm(code ~ birthyear + Post_Seg + sex + Bin_style + 
                           school.cat, ing.lsa.quant, family = "binomial")

# Seeing results of quantifier model
summary(mod.ing.lsa.quant)

## 
## Call:
## glm(formula = code ~ birthyear + Post_Seg + sex + Bin_style + 
##     school.cat, family = "binomial", data = ing.lsa.quant)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8524  -0.8132  -0.5362   0.9875   2.1383  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            26.370104  12.355885   2.134   0.0328 *  
## birthyear              -0.013918   0.006323  -2.201   0.0277 *  
## Post_Segpause          -0.371417   0.267237  -1.390   0.1646    
## Post_Segvelar.C         1.089798   0.527798   2.065   0.0389 *  
## sexm                   -1.144259   0.237424  -4.819 1.44e-06 ***
## Bin_styleCasual        -0.046640   0.236920  -0.197   0.8439    
## school.catNot HS        0.274444   0.320272   0.857   0.3915    
## school.catSome college  1.670657   0.286623   5.829 5.58e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 530.56  on 423  degrees of freedom
## Residual deviance: 454.64  on 416  degrees of freedom
## AIC: 470.64
## 
## Number of Fisher Scoring iterations: 4

The speaker gender/style interaction revealed that women style-shift more dramatically than men do.
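Because the full model includes a `sex * Bin_style` term, the Careful-to-Casual shift for each sex is the style coefficient plus, for men, the interaction coefficient. A minimal sketch using estimates copied from the full model's summary, assuming the reference sex level is female (`f`):

```r
# Estimates copied from summary(mod.ing.lsa.all.fin) (log-odds of code = 1)
b.casual      <- -0.202489  # Bin_styleCasual: Casual vs Careful, reference sex
b.sexm.casual <-  0.156661  # sexm:Bin_styleCasual: extra Casual shift for men

# Careful-to-Casual shift for each sex ("f" assumed to be the reference level)
shift <- c(f = b.casual, m = b.casual + b.sexm.casual)
round(shift, 3)
```

The women's shift (about -0.20 log-odds) is roughly four times the men's (about -0.05), although the interaction term that separates them has p = 0.32 in that model.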

# Working in dplyr
library(dplyr)

# Working in ggplot (graphing)
library(ggplot2)

# Accessing data file
ing.lsa <- read.csv("ingstyle_fin.csv")

# Collapsing 8 stylistic categories into 3 style codes in Bin_style
ing.lsa$Bin_style <- ifelse(ing.lsa$style %in% c("C","L","R","S"), "Careful",
                            ifelse(ing.lsa$style %in% c("G","K","N","T"), "Casual", 
                                   "NA"))

# Collapsing grammatical categories ---------------------------

#recode gram values into multiple categories; brackets index rows
ing.lsa$newgram2[ing.lsa$newgram %in% c("m")] <- "m"
ing.lsa$newgram2[ing.lsa$newgram %in% c("s")] <- "s"
ing.lsa$newgram2[ing.lsa$newgram %in% c("g","r")] <- "gr"
ing.lsa$newgram2[ing.lsa$newgram %in% c("p")] <- "p"

# Filter so "NA"s are taken out of newgram2 column
ing.lsa <- filter(ing.lsa, !newgram2=="NA")

# Filter so "NA"s are taken out of school.cat column
ing.lsa <- filter(ing.lsa, !school.cat=="NA")

# Filter so "NA"s are taken out of code column
ing.lsa2 <- subset(ing.lsa, code %in% c(0,1))

# Reordering education attainment values
ing.lsa2$school.cat <- factor(ing.lsa2$school.cat, c("Not HS","Just HS","Some college"))

# Reordering Casual and Careful style values along x-axis (each level listed once)
ing.lsa2$style <- factor(ing.lsa2$style, levels = c("R","L","S","C","N","G","K","T"))

# Creating graph: bar height is the mean of code (the rate of tokens coded 1) for each style
ggplot(ing.lsa2, aes(style, code)) + 
  stat_summary(fun = mean, geom = "bar", aes(fill = sex)) + 
  facet_grid(sex ~ Bin_style, scales = "free_x") + 
  labs(y = "/ing/ rate") + 
  ggtitle("Stylistic Differentiation of (ING) by sex\nfor eight categories of the Style Decision Tree")

For quantifier (ING), birth year, education, speaker gender and the birth year/education interaction are significant. This suggests that external factors condition each of the three grammatical classes differently and that internal factors condition quantifier (ING) differently from nominal and verbal (ING). These results support both the divergent “social profiles” of nominal and verbal (ING) (Tagliamonte 2004:399; Tamminga 2014) and the contrastive behavior of quantifier (ING) observed by Labov (2001a:88).

It is striking that stylistic constraints are what primarily distinguish the grammatical categories, since these categories have not consistently been differentiated in the literature on (ING)’s indexical meaning (Eckert 2012, Podesva 2007, Kiesling 2008). This evidence supports the claim that (ING) is more than one variable (Tagliamonte 2004:400, Tamminga 2014:5). Verbal and quantifier (ING) showed birth year/education interactions: [ɪŋ] rates decrease across birth years for speakers whose education ended with high school. This pattern plausibly reflects the changing prevalence of college education, as finishing high school is no longer a definitive social achievement.
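The interaction can be read off the full model's coefficients: each education level's birth-year slope is the `birthyear` main effect plus that level's interaction term, with Not HS as the reference level (set by the `factor()` call before that model). A minimal sketch using the printed estimates; it ignores standard errors, so it shows point estimates only.

```r
# Estimates copied from summary(mod.ing.lsa.all.fin) (log-odds per year of birth)
b.birthyear <- 0.011387                  # slope for the reference level, Not HS
b.inter <- c("Just HS" = -0.023457,      # birthyear:school.catJust HS
             "Some college" = -0.011813) # birthyear:school.catSome college

# Slope for each education level: main effect plus that level's interaction term
slopes <- c("Not HS" = b.birthyear, b.birthyear + b.inter)
round(slopes, 4)

# Odds ratio for a ten-year difference in birth year
round(exp(10 * slopes), 3)
```

The Just HS slope comes out negative (about -0.012 log-odds per year, an odds ratio of roughly 0.89 per decade), in line with the declining rates for high-school-only speakers, while the Some college slope is close to zero.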

# Working in dplyr
library(dplyr)

# Working in ggplot (graphing)
library(ggplot2)

# Accessing data file
ing.lsa <- read.csv("ingstyle_fin.csv")

# Collapsing 8 stylistic categories into 3 style codes in Bin_style
ing.lsa$Bin_style <- ifelse(ing.lsa$style %in% c("C","L","R","S"), "Careful",
                           ifelse(ing.lsa$style %in% c("G","K","N","T"), "Casual", 
                                  "NA"))

# Filter so "NA"s are taken out of code column
ing.lsa <- filter(ing.lsa, !code=="NA")

# Simplifying Pre_seg and Fol_seg columns ---------------------------

# Simplify phonological environment based on previous analysis
ing.lsa$Pre_Seg <- "other"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="velar" & ing.lsa$preseg_manner=="nasal"
  ] <- "velar.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner=="nasal"
  ] <- "coronal.N"

ing.lsa$Pre_Seg[
  ing.lsa$preseg_place=="coronal" & ing.lsa$preseg_manner %in% c("fricative","stop/aff")
  ] <- "coronal.obs"

ing.lsa$Post_Seg <- "other"

ing.lsa$Post_Seg[
  ing.lsa$folseg_place=="velar" & ing.lsa$folseg_manner %in% c("fricative",
                                                             "nasal",
                                                             "stop/aff",
                                                             "liquid/glide")
  ] <- "velar.C"

ing.lsa$Post_Seg[
  ing.lsa$folseg_manner=="pause"
  ] <- "pause"

# Collapsing grammatical categories ---------------------------

# Recode gram values into multiple categories; brackets index rows
ing.lsa$newgram2[ing.lsa$newgram %in% c("m")] <- "m"
ing.lsa$newgram2[ing.lsa$newgram %in% c("s")] <- "s"
ing.lsa$newgram2[ing.lsa$newgram %in% c("g","r")] <- "gr"
ing.lsa$newgram2[ing.lsa$newgram %in% c("p")] <- "p"

# Filter so "NA"s are taken out of newgram2 column
ing.lsa <- filter(ing.lsa, !newgram2=="NA")

# Filter so "NA"s are taken out of school.cat column
ing.lsa <- filter(ing.lsa, !school.cat=="NA")

# Filter so "NA"s are taken out of code column
ing.lsa2 <- subset(ing.lsa, code %in% c(0,1)) # subset function indicates which values to keep in column

# Reordering educational attainment values
ing.lsa2$school.cat <- factor(ing.lsa2$school.cat, c("Not HS","Just HS","Some college"))

# Turning ing.lsa2 into dataframe
ing.lsa2 <- data.frame(ing.lsa2)

# Creating graph: each point is one token (code 0 or 1); lines are linear fits per education level
ggplot(ing.lsa2, aes(birthyear, code, color = school.cat)) + 
  geom_point() + 
  stat_smooth(aes(fill = school.cat), method = "lm") + 
  labs(x = "Birth year", y = "/ing/ usage (proportion of tokens)", 
       color = "Educational attainment", fill = "Educational attainment") + 
  ggtitle("/ing/ rate by education over time")

That the rate of [ɪŋ] use is changing over time in step with educational attainment is striking, and it reinforces how deeply patterns of language use are entrenched in broader social phenomena. This is interesting given (ING)’s purported stability (summarized in Labov 2001a:86), but more significantly because it implies that further investigations of (ING) can be used to probe socially constructed realities that are evolving below the level of consciousness. Eckert (2012:94) identifies “indexical mutability” as the core property of linguistic variables in the third wave of variation study. That (ING) usage changes over time suggests it is not immune to the processes of bricolage by which speakers reconstruct variables to reflect their community’s changing social concerns, highlighting (ING)’s continued relevance to third-wave studies of variation. Subsequent studies of (ING) thus have the potential to deepen our understanding of indexicality and of processes of speaker agency, such as stance (Kiesling 2008).

Birth year’s significant main effect, together with the birth year/education interaction (as in Horvath 1985 and Labov 1972), runs counter to (ING)’s widely reported stability (summarized in Labov 2001a:86) and suggests a change in progress. Analysis of larger, more varied corpora may be necessary to further understand (ING)’s socio-indexical meanings.

References

Campbell-Kibler, K. (2007). Accent, (ING), and the social logic of listener perceptions. American Speech, 82(1), 32-64.
Eckert, P. (2012). Three waves of variation study: the emergence of meaning in the study of sociolinguistic variation. Annual Review of Anthropology, 41, 87-100.
Fischer, J. (1958). Social influences on the choice of a linguistic variant. Word, 14(1), 47-56.
Horvath, B. M. (1985). Variation in Australian English. Cambridge: Cambridge University Press.
Houston, A. (1985). Continuity and change in English morphology: the variable (ING). PhD thesis, University of Pennsylvania.
Kiesling, S. F. (2008). Style as stance: Stance as the explanation for patterns of sociolinguistic variation. In A. Jaffe (Ed.), Sociolinguistic Perspectives on Stance (pp. 171-194). Oxford University Press.
Labov, W. (1966/1982). The Social Stratification of English in New York City. Washington, D.C.: Center for Applied Linguistics.
Labov, W. (1972). Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Labov, W. (2001a). Principles of Linguistic Change, Volume 2: Social Factors. Blackwell Publishers.
Labov, W. (2001b). The anatomy of style-shifting. In Eckert & Rickford (Eds.), Style and Sociolinguistic Variation. Cambridge University Press.
Labov, W. & Rosenfelder, I. (2011). The Philadelphia Neighborhood Corpus.
Podesva, R. (2007). Three sources of stylistic meaning. Texas Linguistic Forum, 51.
Tagliamonte, S. A. (2004). Someth[in]’s go[ing] on!: Variable ing at ground zero. In Gunnarsson et al. (Eds.), Language Variation in Europe: Papers from the Second International Conference on Language Variation in Europe, ICLaVE 2, Uppsala, Sweden, June 12-14, 2003.
Tamminga, M. (2014). Persistence in the Production of Linguistic Variation. PhD dissertation, University of Pennsylvania.
Trudgill, P. (1974). The Social Differentiation of English in Norwich. Cambridge: Cambridge University Press.