Introduction: Replicating Labov’s 2001 study

The values for style-shift were calculated by subtracting each speaker’s index for Careful speech from their index for Casual speech. The index is the mean rate at which the linguistic variant was used. For (DH), a higher index means the nonstandard (stop) form was used more frequently, and for (ING), a higher index means the nonstandard (apical) form was used more frequently (even though it was reported in the opposite way). The sign of a value gives the direction of the style-shift. Positive values indicate that a speaker used more of the nonstandard variant in Casual speech than in Careful speech, while negative values indicate the opposite pattern.
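As a small worked illustration of the calculation described above, the sketch below computes style-shift values for three hypothetical speakers. The data frame `indices` and its numbers are invented for this example and are not taken from the corpus; the factor of 100 only puts the values on a percentage-like scale.

```r
# Hypothetical per-speaker indices (invented values, not corpus data).
indices <- data.frame(
  speaker = c("A", "B", "C"),
  Casual  = c(0.80, 0.40, 0.55),
  Careful = c(0.50, 0.60, 0.55)
)

# Style-shift = Casual index minus Careful index, scaled by 100.
indices$style_shift <- (indices$Casual - indices$Careful) * 100

# Label the direction of each speaker's shift from the sign of the value.
indices$direction <- ifelse(indices$style_shift > 0, "positive",
                     ifelse(indices$style_shift < 0, "negative", "none"))

print(indices)
```

Speaker A uses more of the nonstandard form in Casual speech (a positive shift), speaker B shows the reverse pattern (a negative shift), and speaker C does not shift at all.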

DH: Visual Analysis

Labov found dramatic style-shifting with the (DH) variable, and his analysis emphasizes that the majority of speakers shifted in the positive direction. His modal value fell between 25 and 30. No strong difference by gender appeared. However, Labov noted that women were concentrated among the speakers with the highest style-shift values (7 of the 8 values over 85 belong to women) and that men were concentrated among the speakers with the most negative values (6 of the 8 negative values belong to men).

Replicating Labov 2001 figure 5.2
#working in dplyr
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#opening data file
dh <- read.csv("dhstyle_fin.csv")

#peeking at first lines of data frame
head(dh)
##                File Segment Position Code1 Labov Code2 Seg_Start Seg_End
## 1 PH00-1-1-JStevens      DH    Start     9    NA    NA    36.613  36.643
## 2 PH00-1-1-JStevens      DH    Start     9    NA    NA    44.653  44.682
## 3 PH00-1-1-JStevens      DH    Start     9    NA    NA    48.302  48.333
## 4 PH00-1-1-JStevens      DH    Start     9    NA    NA    52.773  52.803
## 5 PH00-1-1-JStevens      DH    Start     0    NA    NA    63.138  63.168
## 6 PH00-1-1-JStevens      DH    Start     2    NA    NA    76.323  76.353
##     Word Word_Start Word_End Pre_Seg Pre_Seg2 Pre_Seg_Start Pre_Seg_End
## 1    THE     36.613   36.673       S       NA        36.543      36.613
## 2   THIS     44.653   44.792      AH       NA        44.453      44.653
## 3    THE     48.302   48.363       S       NA        48.273      48.302
## 4 THAT'S     52.773   52.983      sp       NA        52.633      52.773
## 5    THE     63.138   63.198       K       NA        63.108      63.138
## 6    THE     76.323   76.383       N       NA        76.273      76.323
##   Post_Seg Post_Seg2 Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1       AH        NA         36.643       36.673  1.950             3.590
## 2       AH        NA         44.682       44.713  1.940             2.577
## 3       AH        NA         48.333       48.363  1.081             6.475
## 4       AE        NA         52.803       52.853  1.500             3.333
## 5       AH        NA         63.168       63.198  1.620             4.938
## 6       AH        NA         76.353       76.383  2.250             3.556
##   Age Age2 Birthyear sex Ethnicity School code Gram PrevING PrevWord
## 1  21   NA      1979   m       i/r     14   NA   NA      NA       na
## 2  21   NA      1979   m       i/r     14   NA   NA       9      THE
## 3  21   NA      1979   m       i/r     14   NA   NA       9     THIS
## 4  21   NA      1979   m       i/r     14   NA   NA       9      THE
## 5  21   NA      1979   m       i/r     14   NA   NA       9   THAT'S
## 6  21   NA      1979   m       i/r     14   NA   NA       0      THE
##        X    Lag    X.1 style Bin_style Or_style
## 1 19.093 36.673 34.910     R        NA       NA
## 2  8.119 44.792 44.080     C        NA       NA
## 3  3.571 48.363 46.390     C        NA       NA
## 4  4.620 52.983 52.460     C        NA       NA
## 5 10.215 63.198 61.655     C        NA       NA
## 6 13.185 76.383 73.190     C        NA       NA
#seeing summary of data frame
summary(dh)
##                  File       Segment     Position         Code1      
##  PH92-2-3-JSantori :  924   DH:18974   Start:18974   Min.   :0.000  
##  PH06-2-2-Patrick  :  712                            1st Qu.:1.000  
##  PH06-2-7-Samantha :  626                            Median :1.000  
##  PH06-1-6-JMcPhee  :  611                            Mean   :1.578  
##  PH73-5-2-CKay     :  600                            3rd Qu.:1.000  
##  PH82-1-10-MCollins:  600                            Max.   :9.000  
##  (Other)           :14901                                           
##   Labov          Code2           Seg_Start           Seg_End        
##  Mode:logical   Mode:logical   Min.   :   1.953   Min.   :   2.003  
##  NA's:18974     NA's:18974     1st Qu.: 718.413   1st Qu.: 718.443  
##                                Median :1383.079   Median :1383.153  
##                                Mean   :1461.399   Mean   :1461.448  
##                                3rd Qu.:2102.733   3rd Qu.:2102.765  
##                                Max.   :4476.403   Max.   :4476.453  
##                                                                     
##       Word        Word_Start          Word_End           Pre_Seg    
##  THE    :7128   Min.   :   1.953   Min.   :   2.093   sp     :5560  
##  THAT   :2794   1st Qu.: 718.413   1st Qu.: 718.473   N      :2743  
##  THEY   :2652   Median :1383.079   Median :1383.341   Z      :1145  
##  THERE  :1185   Mean   :1461.399   Mean   :1461.554   V      :1018  
##  THIS   :1056   3rd Qu.:2102.733   3rd Qu.:2102.810   L      : 927  
##  THAT'S : 864   Max.   :4476.403   Max.   :4476.533   K      : 868  
##  (Other):3295                                         (Other):6713  
##  Pre_Seg2       Pre_Seg_Start       Pre_Seg_End          Post_Seg   
##  Mode:logical   Min.   :   1.893   Min.   :   1.953   AH     :8252  
##  NA's:18974     1st Qu.: 718.343   1st Qu.: 718.413   EH     :3148  
##                 Median :1382.614   Median :1383.079   EY     :2785  
##                 Mean   :1461.240   Mean   :1461.399   AE     :2760  
##                 3rd Qu.:2102.519   3rd Qu.:2102.733   IY     :1603  
##                 Max.   :4476.333   Max.   :4476.403   IH     : 254  
##                                                       (Other): 172  
##  Post_Seg2      Post_Seg_Start      Post_Seg_End          Window      
##  Mode:logical   Min.   :   2.003   Min.   :   2.063   Min.   : 0.470  
##  NA's:18974     1st Qu.: 718.443   1st Qu.: 718.473   1st Qu.: 1.240  
##                 Median :1383.153   Median :1383.329   Median : 1.530  
##                 Mean   :1461.448   Mean   :1461.519   Mean   : 1.793  
##                 3rd Qu.:2102.765   3rd Qu.:2102.810   3rd Qu.: 1.931  
##                 Max.   :4476.453   Max.   :4476.533   Max.   :53.157  
##                                                                       
##  Vowels_per_Second      Age          Age2           Birthyear    sex      
##  Min.   : 0.061    Min.   :18.00   Mode:logical   Min.   :1895   f:10144  
##  1st Qu.: 2.928    1st Qu.:30.00   NA's:18974     1st Qu.:1925   m: 8830  
##  Median : 4.196    Median :45.00                  Median :1944            
##  Mean   : 4.264    Mean   :46.91                  Mean   :1943            
##  3rd Qu.: 5.484    3rd Qu.:68.00                  3rd Qu.:1962            
##  Max.   :13.725    Max.   :78.00                  Max.   :1984            
##                                                                           
##    Ethnicity        School       code           Gram        
##  i      :9781   12     :7059   Mode:logical   Mode:logical  
##  r      :4219   16     :4466   NA's:18974     NA's:18974    
##  r/o    : 795   0      :1560                                
##  w/p/J  : 588   8      :1465                                
##  i/w/g  : 580   11     :1302                                
##  w      : 555   9      : 924                                
##  (Other):2456   (Other):2198                                
##     PrevING         PrevWord          X                Lag          
##  Min.   :0.000   THE    :6775   Min.   : -6.801   Min.   :   2.093  
##  1st Qu.:1.000   THAT   :2606   1st Qu.:  1.143   1st Qu.: 718.473  
##  Median :1.000   THEY   :2546   Median :  2.810   Median :1383.341  
##  Mean   :1.571   THERE  :1119   Mean   :  5.130   Mean   :1461.554  
##  3rd Qu.:1.000   THIS   :1000   3rd Qu.:  6.433   3rd Qu.:2102.810  
##  Max.   :9.000   (Other):4906   Max.   :344.800   Max.   :4476.533  
##  NA's   :1002    NA's   :  22   NA's   :22                          
##       X.1              style      Bin_style      Or_style      
##  Min.   :   1.88   C      :6548   Mode:logical   Mode:logical  
##  1st Qu.: 717.05   T      :4707   NA's:18974     NA's:18974    
##  Median :1381.71   R      :3167                                
##  Mean   :1459.96   N      :1793                                
##  3rd Qu.:2101.21   S      :1734                                
##  Max.   :4476.32   G      : 646                                
##                    (Other): 379
# Step 1 - Reorganizing the data frame

#collapsing style codes into 3 styles (car, cas, na) in Bin_style (%in%, not %>%)
dh$Bin_style <- ifelse(dh$style %in% c("C","R","L","S"), "Careful",
                       ifelse(dh$style %in% c("N","G","K","T"), "Casual", "NA"))

#filtering out tokens with no Careful/Casual style so they don't appear in the table
dh <- filter(dh, Bin_style != "NA")

#collapsing/reconfiguring Code1 values in "Labov" so they match L's 2001 coding
#that is, creating labov's index so can properly replicate his study
dh$Labov[dh$Code1 %in% c("1")] <- "0"
dh$Labov[dh$Code1 %in% c("2","9")] <- "1"
dh$Labov[dh$Code1 %in% c("0")] <- "2"

#making the values in the Labov column numeric
dh$Labov <- as.numeric(dh$Labov)

head(dh)
##                File Segment Position Code1 Labov Code2 Seg_Start Seg_End
## 1 PH00-1-1-JStevens      DH    Start     9     1    NA    36.613  36.643
## 2 PH00-1-1-JStevens      DH    Start     9     1    NA    44.653  44.682
## 3 PH00-1-1-JStevens      DH    Start     9     1    NA    48.302  48.333
## 4 PH00-1-1-JStevens      DH    Start     9     1    NA    52.773  52.803
## 5 PH00-1-1-JStevens      DH    Start     0     2    NA    63.138  63.168
## 6 PH00-1-1-JStevens      DH    Start     2     1    NA    76.323  76.353
##     Word Word_Start Word_End Pre_Seg Pre_Seg2 Pre_Seg_Start Pre_Seg_End
## 1    THE     36.613   36.673       S       NA        36.543      36.613
## 2   THIS     44.653   44.792      AH       NA        44.453      44.653
## 3    THE     48.302   48.363       S       NA        48.273      48.302
## 4 THAT'S     52.773   52.983      sp       NA        52.633      52.773
## 5    THE     63.138   63.198       K       NA        63.108      63.138
## 6    THE     76.323   76.383       N       NA        76.273      76.323
##   Post_Seg Post_Seg2 Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1       AH        NA         36.643       36.673  1.950             3.590
## 2       AH        NA         44.682       44.713  1.940             2.577
## 3       AH        NA         48.333       48.363  1.081             6.475
## 4       AE        NA         52.803       52.853  1.500             3.333
## 5       AH        NA         63.168       63.198  1.620             4.938
## 6       AH        NA         76.353       76.383  2.250             3.556
##   Age Age2 Birthyear sex Ethnicity School code Gram PrevING PrevWord
## 1  21   NA      1979   m       i/r     14   NA   NA      NA       na
## 2  21   NA      1979   m       i/r     14   NA   NA       9      THE
## 3  21   NA      1979   m       i/r     14   NA   NA       9     THIS
## 4  21   NA      1979   m       i/r     14   NA   NA       9      THE
## 5  21   NA      1979   m       i/r     14   NA   NA       9   THAT'S
## 6  21   NA      1979   m       i/r     14   NA   NA       0      THE
##        X    Lag    X.1 style Bin_style Or_style
## 1 19.093 36.673 34.910     R   Careful       NA
## 2  8.119 44.792 44.080     C   Careful       NA
## 3  3.571 48.363 46.390     C   Careful       NA
## 4  4.620 52.983 52.460     C   Careful       NA
## 5 10.215 63.198 61.655     C   Careful       NA
## 6 13.185 76.383 73.190     C   Careful       NA
summary(dh)
##                  File       Segment     Position         Code1      
##  PH92-2-3-JSantori :  924   DH:18974   Start:18974   Min.   :0.000  
##  PH06-2-2-Patrick  :  712                            1st Qu.:1.000  
##  PH06-2-7-Samantha :  626                            Median :1.000  
##  PH06-1-6-JMcPhee  :  611                            Mean   :1.578  
##  PH73-5-2-CKay     :  600                            3rd Qu.:1.000  
##  PH82-1-10-MCollins:  600                            Max.   :9.000  
##  (Other)           :14901                                           
##      Labov         Code2           Seg_Start           Seg_End        
##  Min.   :0.0000   Mode:logical   Min.   :   1.953   Min.   :   2.003  
##  1st Qu.:0.0000   NA's:18974     1st Qu.: 718.413   1st Qu.: 718.443  
##  Median :0.0000                  Median :1383.079   Median :1383.153  
##  Mean   :0.5898                  Mean   :1461.399   Mean   :1461.448  
##  3rd Qu.:1.0000                  3rd Qu.:2102.733   3rd Qu.:2102.765  
##  Max.   :2.0000                  Max.   :4476.403   Max.   :4476.453  
##                                                                       
##       Word        Word_Start          Word_End           Pre_Seg    
##  THE    :7128   Min.   :   1.953   Min.   :   2.093   sp     :5560  
##  THAT   :2794   1st Qu.: 718.413   1st Qu.: 718.473   N      :2743  
##  THEY   :2652   Median :1383.079   Median :1383.341   Z      :1145  
##  THERE  :1185   Mean   :1461.399   Mean   :1461.554   V      :1018  
##  THIS   :1056   3rd Qu.:2102.733   3rd Qu.:2102.810   L      : 927  
##  THAT'S : 864   Max.   :4476.403   Max.   :4476.533   K      : 868  
##  (Other):3295                                         (Other):6713  
##  Pre_Seg2       Pre_Seg_Start       Pre_Seg_End          Post_Seg   
##  Mode:logical   Min.   :   1.893   Min.   :   1.953   AH     :8252  
##  NA's:18974     1st Qu.: 718.343   1st Qu.: 718.413   EH     :3148  
##                 Median :1382.614   Median :1383.079   EY     :2785  
##                 Mean   :1461.240   Mean   :1461.399   AE     :2760  
##                 3rd Qu.:2102.519   3rd Qu.:2102.733   IY     :1603  
##                 Max.   :4476.333   Max.   :4476.403   IH     : 254  
##                                                       (Other): 172  
##  Post_Seg2      Post_Seg_Start      Post_Seg_End          Window      
##  Mode:logical   Min.   :   2.003   Min.   :   2.063   Min.   : 0.470  
##  NA's:18974     1st Qu.: 718.443   1st Qu.: 718.473   1st Qu.: 1.240  
##                 Median :1383.153   Median :1383.329   Median : 1.530  
##                 Mean   :1461.448   Mean   :1461.519   Mean   : 1.793  
##                 3rd Qu.:2102.765   3rd Qu.:2102.810   3rd Qu.: 1.931  
##                 Max.   :4476.453   Max.   :4476.533   Max.   :53.157  
##                                                                       
##  Vowels_per_Second      Age          Age2           Birthyear    sex      
##  Min.   : 0.061    Min.   :18.00   Mode:logical   Min.   :1895   f:10144  
##  1st Qu.: 2.928    1st Qu.:30.00   NA's:18974     1st Qu.:1925   m: 8830  
##  Median : 4.196    Median :45.00                  Median :1944            
##  Mean   : 4.264    Mean   :46.91                  Mean   :1943            
##  3rd Qu.: 5.484    3rd Qu.:68.00                  3rd Qu.:1962            
##  Max.   :13.725    Max.   :78.00                  Max.   :1984            
##                                                                           
##    Ethnicity        School       code           Gram        
##  i      :9781   12     :7059   Mode:logical   Mode:logical  
##  r      :4219   16     :4466   NA's:18974     NA's:18974    
##  r/o    : 795   0      :1560                                
##  w/p/J  : 588   8      :1465                                
##  i/w/g  : 580   11     :1302                                
##  w      : 555   9      : 924                                
##  (Other):2456   (Other):2198                                
##     PrevING         PrevWord          X                Lag          
##  Min.   :0.000   THE    :6775   Min.   : -6.801   Min.   :   2.093  
##  1st Qu.:1.000   THAT   :2606   1st Qu.:  1.143   1st Qu.: 718.473  
##  Median :1.000   THEY   :2546   Median :  2.810   Median :1383.341  
##  Mean   :1.571   THERE  :1119   Mean   :  5.130   Mean   :1461.554  
##  3rd Qu.:1.000   THIS   :1000   3rd Qu.:  6.433   3rd Qu.:2102.810  
##  Max.   :9.000   (Other):4906   Max.   :344.800   Max.   :4476.533  
##  NA's   :1002    NA's   :  22   NA's   :22                          
##       X.1              style       Bin_style         Or_style      
##  Min.   :   1.88   C      :6548   Length:18974       Mode:logical  
##  1st Qu.: 717.05   T      :4707   Class :character   NA's:18974    
##  Median :1381.71   R      :3167   Mode  :character                 
##  Mean   :1459.96   N      :1793                                    
##  3rd Qu.:2101.21   S      :1734                                    
##  Max.   :4476.32   G      : 646                                    
##                    (Other): 379
#checking what's in Labov column (hopefully 0,1,2)
unique(dh$Labov)
## [1] 1 2 0
# Step 2 - begin replicating Labov 2001 by getting mean rates for data across Careful & Casual styles

#turning into dplyr data frame
dh <- tbl_df(dh)

#say what data frame
dh.style.fin <- dh %>%
  
  #will create data in long format
  group_by(Bin_style) %>%
  
  #summarize; add dh stopping rate and N columns
  summarise(non_stop=mean(Labov),N=n())

View(dh.style.fin)

# Step 3 - get mean rates across stylistic category by *individual speaker*

#turning into dplyr data frame
dh <- tbl_df(dh)

#say what data frame
dh.spkr.style <- dh %>%
  
  #will create data in long format
  group_by(File, Bin_style) %>%
  
  #summarize; add non-(dh) stopping rate and N columns
  summarise(non_stop=mean(Labov),N=n())

View(dh.spkr.style)

# Step 4 - getting a column with size of each individual speaker's style shift

#turning into dplyr data frame
dh <- tbl_df(dh)

#say what data frame
dh.spkr.style2 <- dh %>%
  
  #will create data in long format
  group_by(File, Bin_style, sex, Age, Birthyear) %>%
  
  #summarize; add non-(dh) stopping rate and N columns
  summarise(non_stop=mean(Labov),N=n())

View(dh.spkr.style2)

#opening reshape2 package (convert between long and wide format data)
library(reshape2)

#making new dataframe - each spkr is 1 row and car and cas rates are 2 separate columns
dh.spkr.style3 <- dcast(dh.spkr.style2, File + sex + Age + Birthyear ~ Bin_style, 
                        value.var = "non_stop")

#create style_shift column by subtracting Careful mean from Casual mean
dh.spkr.style3$DH_style_shift <- dh.spkr.style3$Casual - dh.spkr.style3$Careful

#multiplying style_shift values by 100 so they align with Labov's values
dh.spkr.style3$DH_style_shift <- dh.spkr.style3$DH_style_shift * 100

#create Age2 column that groups speakers into younger (<30) or older (≥30)
dh.spkr.style3$Age2 <- ifelse(dh.spkr.style3$Age < 30, "younger", "older")

View(dh.spkr.style3)

#create N column with the total number of DH occurrences per speaker

#say what data frame
dh.spkr.style4 <- dh %>%
  
  #group by speaker so N is counted per speaker, not for the whole data set
  group_by(File) %>%
  
  #summarize; add N column
  summarise(N=n())

# Step 5 - replicating Labov figure 5.2 (histogram, Cas index - Car index, for ind spkr)

#opening ggplot2
library(ggplot2)

#making the graph - histogram with degree of ind style shift by gender
#binwidth = 5 sets bins 5 index points wide (stat_bin is not a histogram argument)
ggplot(dh.spkr.style3, aes(DH_style_shift)) + 
  geom_histogram(binwidth=5, aes(fill=sex), position="dodge") + 
  ggtitle("Style-shifting of (DH) variable in Philadelphia")

Unlike Labov, I did not find dramatic style-shifting for (DH). My modal value fell around 5. However, like Labov I found no strong difference by gender. It is interesting to observe that in my data, women are concentrated among speakers with the most extreme negative values (values below -5) and men among those with the most extreme positive values (values of 10 and greater). This relative concentration is the opposite of what Labov found, as he noted women were concentrated among speakers with the most positive values and men among those with the most negative. It is important to note that I had fewer subjects than Labov. A larger sample size would increase confidence in the findings.

Replicating Labov 2001 figure 5.5

Labov found no overall age effect, as indicated by the overall regression line (the straight solid line in the middle of the graph). Labov discounts the significance of the partial regressions for males and females (the dashed lines, which show a slight tendency for males to shift more as they age and for females to shift less), given that the correlation of age with (DH) shift for males is .08 and for females is -.03. He draws attention to a clump of young speakers in the bottom left corner of the graph who display behavior contrary to the prevalent societal norm, in that these speakers have very negative (DH) shifts. No comparable negative clustering occurs among the older speakers. Labov interprets this pattern as confirming that children begin acquiring a community’s sociolinguistic norms at an early age (albeit gradually). He goes on to claim that age and socioeconomic class influence this acquisition.
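The partial regressions and correlations mentioned above can be sketched in base R. This is only an illustration under assumptions: the data frame `shifts` and its values are invented (they are not Labov’s figures or mine), and the column names simply mirror those used elsewhere in this report.

```r
# Hypothetical speakers (invented values, for illustration only).
shifts <- data.frame(
  sex            = c("f", "f", "f", "f", "m", "m", "m", "m"),
  Age            = c(20, 35, 50, 65, 20, 35, 50, 65),
  DH_style_shift = c(30, 27, 28, 23, 10, 15, 13, 22)
)

# Split the speakers by sex so each group gets its own statistics.
by_sex <- split(shifts, shifts$sex)

# Pearson correlation of age with style-shift, separately for each sex.
cor_by_sex <- sapply(by_sex, function(d) cor(d$Age, d$DH_style_shift))

# Slope of the partial (sex-specific) regression of style-shift on age.
slope_by_sex <- sapply(by_sex, function(d) {
  unname(coef(lm(DH_style_shift ~ Age, data = d))["Age"])
})

print(cor_by_sex)
print(slope_by_sex)
```

A correlation near zero, such as the .08 and -.03 Labov reports, means that the corresponding dashed line carries little information, whatever its slope.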

# Step 6 - replicating Labov fig 5.5 - distribution of style shift by age and gender

#making the graph - scatterplot with dist of ind style shift by age and gender
ggplot(dh.spkr.style3, aes(Age, DH_style_shift)) + geom_point(aes(color=sex)) + 
  stat_smooth(method = "lm") + ggtitle("Distribution of (DH) style-shift by age and sex")

My graph of the distribution of (DH) style-shift by age and sex closely resembles Labov’s. The primary difference lies in the range of values on each graph’s y-axis (that is, the degree of the (DH) style-shift). Since I did not find as dramatic a style-shift for (DH) as Labov did, it makes sense that my range is smaller. The dramatic style-shifting observed by Labov occurred only in the positive direction, not the negative. It is interesting that the lower end of my range is similar to the lower end of Labov’s range; both fall between approximately -30 and -20. This suggests a consistent tendency for speakers to style-shift by decreasing their use of nonstandard variants as they move from Casual to Careful speech. This tendency is fairly intuitive given the saliency of the “casual” social meaning accorded to nonstandard variants and the more “formal” meaning accorded to standard forms. Like Labov, I did not find a strong difference by age or sex. My overall regression line is slightly more sloped than Labov’s, indicating that a speaker’s tendency to style-shift increases slightly with age. My sample represents an older portion of the population than Labov’s, so it is not possible to gauge from my data how adolescent patterns of style-shifting diverge from (or converge with) the community’s normal patterns. The overall regression line is centered on zero and is only slightly sloped, suggesting that any age effect is very small. That my regression line is centered on zero and Labov’s on 35 reinforces the different degrees of style-shifting each of us observed. It is interesting to note that most of the outlying points of dramatic style-shifting occurred for younger speakers, and that the points move closer to the regression line as age increases.

Replicating Labov 2001 figure 5.7

Labov interprets this distribution of index scores across the nuanced categories of the decision tree as validation of each category’s ability to contribute to identifying stylistic variation. Although he questions the decision tree’s effectiveness, Labov does not think the decision tree could be improved if any existing category were discarded. He notes that (DH) stylistic differences are considerably larger than (ING) stylistic differences (even acknowledging the scale covers twice the range). He points out that for (DH), Soapbox is most differentiated from the mean in Careful speech, and that in Casual speech Narrative and Kids have the highest index scores but all four Casual speech categories are well above the mean level.

# Step 7 - replicating fig 5.7 - stylistic differentiation for 8 cat of dec tree

#turning into dplyr data frame
dh <- tbl_df(dh)

#say what data frame
dh.style.fin2 <- dh %>%
  
  #will create data in long format
  group_by(style, Bin_style) %>%
  
  #summarize; add dh stopping rate and N columns
  summarise(index=mean(Labov),N=n())

#multiplying index values by 100 so correspond to Labov's values
dh.style.fin2$index <- dh.style.fin2$index * 100

#reordering style values on x-axis so match Labov 2001 order
dh.style.fin2$style <- factor(dh.style.fin2$style, c("S","L","R","C","N","K","T","G"))

View(dh.style.fin2)

#replicating figure 5.7 as a bar graph - index score arr. by orig 8 style values
ggplot(dh.style.fin2, aes(style,index)) + 
  geom_bar(stat = "identity") + 
  facet_wrap(~ Bin_style) + 
  #"\n" breaks the title without carrying the code's indentation into the plot
  ggtitle("Stylistic Differentiation of (DH) for\neight categories of the Style Decision Tree")

My graph shows a fairly consistent index of stylistic differentiation of (DH) across the nuanced categories of the decision tree, for both Careful and Casual styles. For the Careful styles, as in Labov’s data, the index value for Soapbox diverges most from the other Careful values. However, even it does not stray dramatically from 60, the index value around which the values for the other stylistic categories fell. Interestingly, in my data the index value for Soapbox is higher than the other Careful index values, while in Labov’s data it was lower. For the Casual styles, the Kids value diverges most from those of the other style categories, but it also does not stray far beyond 60. It is striking that each stylistic category’s index value fell around 60. This suggests that each category indicates style comparably to the other categories of the decision tree and, as Labov mentioned, gives no reason to eliminate any category from the decision tree. It also indicates that perhaps another method entirely is needed to better gauge stylistic variation in speech. While the index values for Careful and Casual styles consistently fell around 60 in my data, a clear difference in the mean index value appeared between the Careful and Casual styles in Labov’s data. In Labov’s study, the index values clustered around 45 for the Careful styles and slightly above 90 for the Casual styles. This calls into question the coding practices used in each study and reinforces the need to develop a more reliable method for quantitatively gauging stylistic variation.

Replicating Labov 2001 figure 5.8

Labov notes that the four Careful subcategories are clearly differentiated from the four Casual subcategories in this representation of (DH).

# Step 8 - replicating Labov's fig 5.8 (stylistic diff by sex for 8 cat of dec tree)

#turning into dplyr data frame
dh <- tbl_df(dh)

#say what data frame
dh.style.fin3 <- dh %>%
  
  #will create data in long format
  group_by(style, Bin_style, sex) %>%
  
  #summarize; add dh stopping rate and N columns
  summarise(index=mean(Labov),N=n())

#multiplying index values by 100 so correspond to Labov's values
dh.style.fin3$index <- dh.style.fin3$index * 100

#reordering style values on x-axis so match Labov 2001 order
dh.style.fin3$style <- factor(dh.style.fin3$style, c("S","L","R","C","N","K","T","G"))

View(dh.style.fin3)

#replicating figure 5.8 as bar graph - index score arr. by orig 8 style values and diff by sex
ggplot(dh.style.fin3, aes(style,index)) + 
  geom_bar(stat = "identity", aes(fill=sex)) + 
  #only one facet specification applies, so facet_grid alone is kept
  facet_grid(sex ~ Bin_style) + 
  ggtitle("Stylistic Differentiation of (DH) by sex for\neight categories of the Style Decision Tree")

My graph indicates that (DH) stylistic differentiation varies more across sex than across style category. Within each sex, index values are similar across the Careful and Casual styles. The index values for Careful speech for women fall between around 40 and 80, with Soapbox having the highest value (around 80), followed by Response and Careful (each around 50) and finally Language (around 40). The index values for Casual speech for women are similarly spread out and also fall between around 40 and 80. For Casual speech, Kids has the highest index value (around 80), followed by Group (around 60) and then Narrative (around 50) and Tangent (around 40). There is no clear pattern of relatively higher, or lower, index values occurring among more objective, or subjective, categories (“objective” categories including Narrative, Language and Group and “subjective” categories including Kids, Tangent and Careful). Male speakers likewise patterned more consistently across style categories than across gender. For both the Careful and Casual style categories, the male index values fall between 60 and 70. It is very interesting to compare these results to Labov’s, for his primary differentiation of index values occurred across style category, not sex. For Labov, both men and women tended to have lower index values in the Careful category, with values clustering around 40, while in the Casual category speakers of both sexes tended to have higher values, with the women’s values clustering around 80 and the men’s between 100 and 120. Whether sex and style interact remains an open question: male values usually seem higher, but not always.

Replicating Labov 2001 figure 5.6
# Step 9 - replicating Labov fig 5.6 - variable style-shift by size of data set and age

#issue with coding so had to remove

DH: Logistic Regression Analysis

My logistic regression for (DH) did not find Style to be a significant predictor of the speech variant used. My original hypothesis, that the percent frequency of the non-standard [d] variant would be greater in “casual” than in “careful” contexts, was not supported. I am unable to reject the null hypothesis that different stylistic contexts have no effect on the variant ([dh] or [d]) a speaker realizes. These results are striking, because Labov found dramatic evidence of style-shifting for the (DH) variable with data obtained from the same corpus as mine. Although style was a weaker predictor of speech variant than I anticipated, these findings are not enough to discount the importance of style in speech. It is possible that the lack of style-shifting I found can be attributed to shortcomings of coding with the decision tree method. Methods that better accommodate both the range of indexical meanings presented by every variable and each individual speaker’s unique use of these meanings to create situationally relevant stances may predict stylistic variation in speech better. Sex and birthyear were found to be significant predictors of the linguistic variant used, which underscores how external (social) factors do influence linguistic variation. Labov also found a significant and sizeable gender effect.
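To make the kind of model discussed above concrete, here is a minimal sketch of a logistic regression with style, sex and birth year as predictors. It is not the model actually fitted for this report: the data are simulated, the names `tokens`, `stop` and `Birthyear_c` are hypothetical, and base R’s `glm` stands in for a mixed-effects model (with lme4, one plausible form would add a by-speaker random intercept such as `(1 | File)` in a `glmer` call).

```r
set.seed(1)
n <- 400

# Simulated tokens (invented data, not the corpus).
tokens <- data.frame(
  style     = sample(c("Careful", "Casual"), n, replace = TRUE),
  sex       = sample(c("f", "m"), n, replace = TRUE),
  Birthyear = sample(1895:1984, n, replace = TRUE)
)

# Simulated outcome: 1 = stop [d], 0 = fricative [dh].
# Only a sex effect is built in; style has no true effect here.
p <- plogis(-0.5 + 1.0 * (tokens$sex == "m"))
tokens$stop <- rbinom(n, 1, p)

# Centre birth year so the intercept refers to an average-aged speaker.
tokens$Birthyear_c <- tokens$Birthyear - mean(tokens$Birthyear)

# Fixed-effects logistic regression (a simplification of a mixed model).
model <- glm(stop ~ style + sex + Birthyear_c,
             data = tokens, family = binomial)

# Coefficient table: estimate, standard error, z value and p value.
coefs <- summary(model)$coefficients
print(coefs)
```

In the coefficient table, the row `styleCasual` is the estimated change in the log-odds of the stop variant in Casual relative to Careful speech; a p value well above .05 on that row corresponds to failing to reject the null hypothesis for style.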

#accessing lme4 (package needed to run the regression)
library(lme4)
## Loading required package: Matrix
#accessing data file
dh.lr <- read.csv("dhstyle_fin.csv")

#peeking at first lines of data file
head(dh.lr)
##                File Segment Position Code1 Labov Code2 Seg_Start Seg_End
## 1 PH00-1-1-JStevens      DH    Start     9    NA    NA    36.613  36.643
## 2 PH00-1-1-JStevens      DH    Start     9    NA    NA    44.653  44.682
## 3 PH00-1-1-JStevens      DH    Start     9    NA    NA    48.302  48.333
## 4 PH00-1-1-JStevens      DH    Start     9    NA    NA    52.773  52.803
## 5 PH00-1-1-JStevens      DH    Start     0    NA    NA    63.138  63.168
## 6 PH00-1-1-JStevens      DH    Start     2    NA    NA    76.323  76.353
##     Word Word_Start Word_End Pre_Seg Pre_Seg2 Pre_Seg_Start Pre_Seg_End
## 1    THE     36.613   36.673       S       NA        36.543      36.613
## 2   THIS     44.653   44.792      AH       NA        44.453      44.653
## 3    THE     48.302   48.363       S       NA        48.273      48.302
## 4 THAT'S     52.773   52.983      sp       NA        52.633      52.773
## 5    THE     63.138   63.198       K       NA        63.108      63.138
## 6    THE     76.323   76.383       N       NA        76.273      76.323
##   Post_Seg Post_Seg2 Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1       AH        NA         36.643       36.673  1.950             3.590
## 2       AH        NA         44.682       44.713  1.940             2.577
## 3       AH        NA         48.333       48.363  1.081             6.475
## 4       AE        NA         52.803       52.853  1.500             3.333
## 5       AH        NA         63.168       63.198  1.620             4.938
## 6       AH        NA         76.353       76.383  2.250             3.556
##   Age Age2 Birthyear sex Ethnicity School code Gram PrevING PrevWord
## 1  21   NA      1979   m       i/r     14   NA   NA      NA       na
## 2  21   NA      1979   m       i/r     14   NA   NA       9      THE
## 3  21   NA      1979   m       i/r     14   NA   NA       9     THIS
## 4  21   NA      1979   m       i/r     14   NA   NA       9      THE
## 5  21   NA      1979   m       i/r     14   NA   NA       9   THAT'S
## 6  21   NA      1979   m       i/r     14   NA   NA       0      THE
##        X    Lag    X.1 style Bin_style Or_style
## 1 19.093 36.673 34.910     R        NA       NA
## 2  8.119 44.792 44.080     C        NA       NA
## 3  3.571 48.363 46.390     C        NA       NA
## 4  4.620 52.983 52.460     C        NA       NA
## 5 10.215 63.198 61.655     C        NA       NA
## 6 13.185 76.383 73.190     C        NA       NA
#collapsing 8 stylistic categories into 3 style codes in Bin_style
dh.lr$Bin_style <- ifelse(dh.lr$style %in% c("C","L","R","S"), "Careful",
                          ifelse(dh.lr$style %in% c("G","K","N","T"), "Casual", 
                                 "NA"))

#filtering NAs from Bin_style column so don't appear in table
dh.lr <- filter(dh.lr, !Bin_style=="NA")

#configuring Code2 column for regression (DV only 0 - nstd or 1 - std)
#excluding 9's

#collapsing values in Code2 column
dh.lr$Code2[dh.lr$Code1 %in% c("0")] <- "0"
dh.lr$Code2[dh.lr$Code1 %in% c("1","2")] <- "1"

#making the values in Code2 column numeric
dh.lr$Code2 <- as.numeric(dh.lr$Code2)

#checking values in Code2 column
unique(dh.lr$Code2)
## [1] NA  0  1
#filtering NA's from Code2 column (is.na() tests for missing values directly)
dh.lr <- filter(dh.lr, !is.na(Code2))
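As a small aside on missing values in R, the sketch below shows how a real `NA` behaves when compared with the string "NA", and how `is.na()` tests for missingness directly. The vector `x` is a toy stand-in, not corpus data. Because dplyr's `filter()` drops rows where the condition evaluates to `NA`, a comparison against "NA" can still remove missing rows, but only indirectly.

```r
# Toy vector standing in for a recoded column (values are illustrative)
x <- c(NA, 0, 1, 1, NA, 0)

# Comparing with the string "NA" never gives TRUE for a real missing value:
# the comparison itself is NA for those elements
cmp <- x == "NA"
print(cmp)

# is.na() identifies missing values directly
keep <- !is.na(x)
print(x[keep])
```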

#recode Pre_Seg column into 2 categories in Pre_Seg2
dh.lr$Pre_Seg2 <- ifelse(dh.lr$Pre_Seg %in% c("AA","AE","AH","AO","AW","AY","EH",
                                              "ER","EY","IH","IY","OW","OY","UH","UW"), 
                         "vowel", ifelse(dh.lr$Pre_Seg %in% c("br","lg","ls","ns","sp"), 
                                         "NA", "consonant"))

#recode Post_Seg column into 2 categories in Post_Seg2
dh.lr$Post_Seg2 <- ifelse(dh.lr$Post_Seg %in% c("AE","AH","AY","EH","EY",
                                                "IH","IY","OW"), 
                          "vowel", ifelse(dh.lr$Post_Seg %in% c("sp"), 
                                          "NA", "consonant"))

#peeking at first lines of file
head(dh.lr)
##                File Segment Position Code1 Labov Code2 Seg_Start Seg_End
## 1 PH00-1-1-JStevens      DH    Start     0    NA     0    63.138  63.168
## 2 PH00-1-1-JStevens      DH    Start     2    NA     1    76.323  76.353
## 3 PH00-1-1-JStevens      DH    Start     2    NA     1    87.663  87.763
## 4 PH00-1-1-JStevens      DH    Start     1    NA     1    96.648  96.708
## 5 PH00-1-1-JStevens      DH    Start     0    NA     0   102.113 102.142
## 6 PH00-1-1-JStevens      DH    Start     0    NA     0   117.073 117.123
##    Word Word_Start Word_End Pre_Seg  Pre_Seg2 Pre_Seg_Start Pre_Seg_End
## 1   THE     63.138   63.198       K consonant        63.108      63.138
## 2   THE     76.323   76.383       N consonant        76.273      76.323
## 3   THE     87.663   87.973      sp        NA        87.643      87.663
## 4 THERE     96.648   96.767      OW     vowel        96.538      96.648
## 5  THEN    102.113  102.203      sp        NA       101.468     102.113
## 6  THEN    117.073  117.212      sp        NA       116.403     117.073
##   Post_Seg Post_Seg2 Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1       AH     vowel         63.168       63.198  1.620             4.938
## 2       AH     vowel         76.353       76.383  2.250             3.556
## 3       AH     vowel         87.763       87.973  1.520             7.895
## 4       EH     vowel         96.708       96.738  1.019             7.851
## 5       EH     vowel        102.142      102.173  1.425             4.211
## 6       EH     vowel        117.123      117.153  2.445             2.045
##   Age Age2 Birthyear sex Ethnicity School code Gram PrevING PrevWord
## 1  21   NA      1979   m       i/r     14   NA   NA       9   THAT'S
## 2  21   NA      1979   m       i/r     14   NA   NA       0      THE
## 3  21   NA      1979   m       i/r     14   NA   NA      NA       na
## 4  21   NA      1979   m       i/r     14   NA   NA       2      THE
## 5  21   NA      1979   m       i/r     14   NA   NA       1    THERE
## 6  21   NA      1979   m       i/r     14   NA   NA       0     THEN
##        X     Lag     X.1 style Bin_style Or_style
## 1 10.215  63.198  61.655     C   Careful       NA
## 2 13.185  76.383  73.190     C   Careful       NA
## 3  4.250  87.973  83.950     R   Careful       NA
## 4  8.794  96.767  96.165     C   Careful       NA
## 5  5.436 102.203 102.100     C   Careful       NA
## 6 15.009 117.212 116.030     C   Careful       NA
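The vowel / pause / consonant recoding visible in the Pre_Seg2 and Post_Seg2 columns above could also be written as one reusable function, so that both columns share the same label sets. This is only a sketch: `classify_segment` is a hypothetical helper, and its default vowel and pause lists are assumptions copied from the preceding-segment recoding.

```r
# Hypothetical helper: classify segment labels as vowel, pause ("NA") or consonant
# Default label sets are assumptions taken from the Pre_Seg recoding
classify_segment <- function(seg,
                             vowels = c("AA","AE","AH","AO","AW","AY","EH",
                                        "ER","EY","IH","IY","OW","OY","UH","UW"),
                             pauses = c("br","lg","ls","ns","sp"),
                             pause_label = "NA") {
  seg <- as.character(seg)
  # Everything is a consonant unless it matches one of the two sets
  out <- rep("consonant", length(seg))
  out[seg %in% vowels] <- "vowel"
  out[seg %in% pauses] <- pause_label
  out
}

# Example with three illustrative labels
example_classes <- classify_segment(c("AH", "sp", "K"))
print(example_classes)
```

Keeping the string "NA" as the pause label mirrors the recoding above, where pauses stay in the model as their own level; passing `pause_label = NA_character_` would instead make them true missing values.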
#summary of file contents -> ensure everything in correct format
summary(dh.lr)
##                  File       Segment     Position         Code1       
##  PH92-2-3-JSantori :  805   DH:17405   Start:17405   Min.   :0.0000  
##  PH06-2-2-Patrick  :  662                            1st Qu.:1.0000  
##  PH06-1-6-JMcPhee  :  607                            Median :1.0000  
##  PH73-5-2-CKay     :  586                            Mean   :0.9092  
##  PH82-1-10-MCollins:  582                            3rd Qu.:1.0000  
##  PH06-2-4-Brooke   :  570                            Max.   :2.0000  
##  (Other)           :13593                                            
##   Labov             Code2          Seg_Start           Seg_End        
##  Mode:logical   Min.   :0.0000   Min.   :   1.953   Min.   :   2.003  
##  NA's:17405     1st Qu.:1.0000   1st Qu.: 716.133   1st Qu.: 716.183  
##                 Median :1.0000   Median :1381.053   Median :1381.113  
##                 Mean   :0.7855   Mean   :1461.255   Mean   :1461.304  
##                 3rd Qu.:1.0000   3rd Qu.:2100.763   3rd Qu.:2100.812  
##                 Max.   :1.0000   Max.   :4476.403   Max.   :4476.453  
##                                                                       
##       Word        Word_Start          Word_End           Pre_Seg    
##  THE    :6727   Min.   :   1.953   Min.   :   2.093   sp     :5284  
##  THEY   :2596   1st Qu.: 716.133   1st Qu.: 716.223   N      :2486  
##  THAT   :2494   Median :1381.053   Median :1381.173   V      : 924  
##  THERE  :1043   Mean   :1461.255   Mean   :1461.411   Z      : 863  
##  THIS   : 963   3rd Qu.:2100.763   3rd Qu.:2100.961   L      : 838  
##  THAT'S : 621   Max.   :4476.403   Max.   :4476.533   ER     : 779  
##  (Other):2961                                         (Other):6231  
##    Pre_Seg2         Pre_Seg_Start       Pre_Seg_End          Post_Seg   
##  Length:17405       Min.   :   1.893   Min.   :   1.953   AH     :7491  
##  Class :character   1st Qu.: 716.078   1st Qu.: 716.133   EH     :2919  
##  Mode  :character   Median :1380.818   Median :1381.053   EY     :2728  
##                     Mean   :1461.095   Mean   :1461.255   AE     :2338  
##                     3rd Qu.:2100.633   3rd Qu.:2100.763   IY     :1538  
##                     Max.   :4476.333   Max.   :4476.403   IH     : 226  
##                                                           (Other): 165  
##   Post_Seg2         Post_Seg_Start      Post_Seg_End          Window      
##  Length:17405       Min.   :   2.003   Min.   :   2.063   Min.   : 0.470  
##  Class :character   1st Qu.: 716.183   1st Qu.: 716.223   1st Qu.: 1.249  
##  Mode  :character   Median :1381.113   Median :1381.143   Median : 1.539  
##                     Mean   :1461.304   Mean   :1461.377   Mean   : 1.782  
##                     3rd Qu.:2100.812   3rd Qu.:2100.893   3rd Qu.: 1.930  
##                     Max.   :4476.453   Max.   :4476.533   Max.   :53.157  
##                                                                           
##  Vowels_per_Second      Age          Age2           Birthyear    sex     
##  Min.   : 0.061    Min.   :18.00   Mode:logical   Min.   :1895   f:9569  
##  1st Qu.: 2.941    1st Qu.:29.00   NA's:17405     1st Qu.:1925   m:7836  
##  Median : 4.167    Median :45.00                  Median :1944           
##  Mean   : 4.249    Mean   :46.63                  Mean   :1943           
##  3rd Qu.: 5.446    3rd Qu.:65.00                  3rd Qu.:1962           
##  Max.   :13.725    Max.   :78.00                  Max.   :1984           
##                                                                          
##    Ethnicity        School       code           Gram        
##  i      :8765   12     :6480   Mode:logical   Mode:logical  
##  r      :3974   16     :4248   NA's:17405     NA's:17405    
##  r/o    : 760   0      :1495                                
##  w/p/J  : 570   8      :1311                                
##  w      : 521   11     :1184                                
##  r/p    : 515   9      : 805                                
##  (Other):2300   (Other):1882                                
##     PrevING         PrevWord          X                Lag          
##  Min.   :0.000   THE    :6249   Min.   : -6.801   Min.   :   2.093  
##  1st Qu.:1.000   THEY   :2399   1st Qu.:  1.149   1st Qu.: 716.223  
##  Median :1.000   THAT   :2364   Median :  2.800   Median :1381.173  
##  Mean   :1.499   THERE  :1019   Mean   :  5.116   Mean   :1461.411  
##  3rd Qu.:1.000   THIS   : 923   3rd Qu.:  6.411   3rd Qu.:2100.961  
##  Max.   :9.000   (Other):4435   Max.   :344.800   Max.   :4476.533  
##  NA's   :913     NA's   :  16   NA's   :16                          
##       X.1              style       Bin_style         Or_style      
##  Min.   :   1.88   C      :6000   Length:17405       Mode:logical  
##  1st Qu.: 714.64   T      :4414   Class :character   NA's:17405    
##  Median :1379.53   R      :2860   Mode  :character                 
##  Mean   :1459.83   N      :1619                                    
##  3rd Qu.:2099.71   S      :1590                                    
##  Max.   :4476.32   G      : 579                                    
##                    (Other): 343
#creating the model (9's excluded)
mod.dh <- glm(Code2 ~ sex + Birthyear + Pre_Seg2 + Post_Seg2 + Bin_style,
              dh.lr, family = "binomial")

#seeing the results of the model
summary(mod.dh)
## 
## Call:
## glm(formula = Code2 ~ sex + Birthyear + Pre_Seg2 + Post_Seg2 + 
##     Bin_style, family = "binomial", data = dh.lr)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3142   0.4082   0.5557   0.7362   1.1207  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -2.368e+01  1.970e+02  -0.120   0.9043    
## sexm            -5.196e-01  3.805e-02 -13.658   <2e-16 ***
## Birthyear        1.814e-02  8.815e-04  20.582   <2e-16 ***
## Pre_Seg2NA      -5.969e-01  4.277e-02 -13.956   <2e-16 ***
## Pre_Seg2vowel   -9.650e-02  5.072e-02  -1.903   0.0571 .  
## Post_Seg2NA      6.787e-02  2.274e+02   0.000   0.9998    
## Post_Seg2vowel  -9.776e+00  1.970e+02  -0.050   0.9604    
## Bin_styleCasual  6.697e-02  3.918e-02   1.709   0.0874 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 18098  on 17404  degrees of freedom
## Residual deviance: 17205  on 17397  degrees of freedom
## AIC: 17221
## 
## Number of Fisher Scoring iterations: 10
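The coefficients above are on the log-odds scale. As a rough sketch, they can be converted to odds ratios with 95% Wald intervals; the estimates and standard errors below are copied by hand from the summary output, and the object names `est`, `se` and `or_table` are mine.

```r
# Estimates and standard errors copied from the summary(mod.dh) output above
est <- c(sexm = -0.5196, Birthyear = 0.01814,
         Bin_styleCasual = 0.06697, Post_Seg2vowel = -9.776)
se  <- c(sexm = 0.03805, Birthyear = 0.0008815,
         Bin_styleCasual = 0.03918, Post_Seg2vowel = 197.0)

# Odds ratios and 95% Wald intervals on the odds scale
z <- qnorm(0.975)
or_table <- data.frame(
  odds_ratio = exp(est),
  lower      = exp(est - z * se),
  upper      = exp(est + z * se)
)
print(signif(or_table, 3))
```

The interval for Bin_styleCasual spans 1, which matches its non-significant p-value, while the interval for sexm lies entirely below 1. The extremely wide interval for Post_Seg2vowel reflects its very large standard error; this often happens when a predictor level is nearly constant or almost perfectly separates the outcome, so a cross-tabulation of Post_Seg2 against Code2 would be worth checking.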

ING: Visual Analysis

Note: The data represented in the following graphs and analyses are a subset of the original (ING) data frame containing only (ING) progressives. This subset was used in place of the entire dataset to keep the replication consistent with Labov, who notes on page 94 of Labov 2001 that he analyzed only data from progressives.

Labov found less pronounced style-shifting for (ING) than he did for (DH). His modal value for (ING) fell close to zero. Once again, while some style-shift values occur below zero, the majority of speakers had positive style-shift values.
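To make "modal value" and "majority positive" concrete, the sketch below bins a set of style-shift values and reports the most populated bin and the share of positive values. The values in `shift` are hypothetical, not corpus data, and the 5-point bin width is an assumption.

```r
# Hypothetical per-speaker style-shift values (illustrative only)
shift <- c(-12, -4, -2, 1, 2, 3, 4, 6, 8, 15, 22, 31)

# Bin into 5-point intervals, as a histogram would, without plotting
h <- hist(shift, breaks = seq(-15, 35, by = 5), plot = FALSE)

# Modal bin = interval with the highest count
i <- which.max(h$counts)
modal_bin <- c(h$breaks[i], h$breaks[i + 1])
print(modal_bin)

# Share of speakers with a positive style-shift
share_positive <- mean(shift > 0)
print(share_positive)
```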

Replicating Labov 2001 figure 5.2
#working in dplyr
library(dplyr)

#opening data file
ing <- read.csv("ingstyle_fin.csv")

#peeking at first lines of data frame
head(ing)
##                File Segment Position code Gram Seg_Start Seg_End
## 1 PH00-1-1-JStevens     IH0      End    0    o    85.063  85.102
## 2 PH00-1-1-JStevens     IH0      End    0    p   112.708 112.738
## 3 PH00-1-1-JStevens     IH0      End    1    p   114.648 114.688
## 4 PH00-1-1-JStevens     IH0      End    0    p   127.878 127.907
## 5 PH00-1-1-JStevens     IH0      End    0    o   147.712 147.743
## 6 PH00-1-1-JStevens     IH0      End    1    n   148.553 148.603
##         word Word_Start Word_End Pre_Seg Pre_Seg_Start Pre_Seg_End
## 1    WORKING     84.912   85.132       K        85.023      85.063
## 2     FIXING    112.468  112.768       S       112.658     112.708
## 3 NETWORKING    114.308  114.798       K       114.588     114.648
## 4    WORKING    127.687  127.937       K       127.808     127.878
## 5    WAITING    147.593  147.773       T       147.682     147.712
## 6    OPENING    148.413  148.653       N       148.522     148.553
##   Post_Seg Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1        S         85.132       85.273  1.280             5.469
## 2        K        112.768      112.808  3.060             2.614
## 3       sp        114.798      115.138  3.635             2.201
## 4       AW        127.937      128.028  1.860             3.226
## 5        T        147.773      147.812  1.310             8.397
## 6       sp        148.653      148.673  1.430             6.993
##   preseg_place preseg_manner folseg_place folseg_manner sex birthyear age
## 1        velar      stop/aff      coronal     fricative   m      1979  21
## 2      coronal     fricative        velar      stop/aff   m      1979  21
## 3        velar      stop/aff         none         pause   m      1979  21
## 4        velar      stop/aff        velar         vowel   m      1979  21
## 5      coronal      stop/aff      coronal      stop/aff   m      1979  21
## 6      coronal         nasal         none         pause   m      1979  21
##   school school.calc   school.cat newgram newgram2 PrevING   PrevWord
## 1     14           2 Some college       p       NA       1         na
## 2     14           2 Some college       g       NA       0    WORKING
## 3     14           2 Some college       g       NA       0     FIXING
## 4     14           2 Some college       p       NA       1 NETWORKING
## 5     14           2 Some college       p       NA       0    WORKING
## 6     14           2 Some college       r       NA       0    WAITING
##   PrevGram    Lag subtlex.count   start     end style Bin_style Or_style
## 1     <NA>  1.409         12775  84.912  83.950     R        NA       NA
## 2        p 27.636           533 112.468 107.275     C        NA       NA
## 3        g  2.030            27 114.308 107.275     C        NA       NA
## 4        g 13.139         12775 127.687 126.505     C        NA       NA
## 5        p  8.600         10767 147.593 146.840     C        NA       NA
## 6        p  0.880          1960 148.413 146.840     C        NA       NA
#seeing summary of data frame
summary(ing)
##                     File      Segment    Position        code       
##  PH06-2-2-Patrick     : 171   IH0:3816   End:3816   Min.   :0.0000  
##  PH82-1-10-MCollins   : 163                         1st Qu.:0.0000  
##  PH06-1-6-JMcPhee     : 151                         Median :0.0000  
##  PH92-2-3-JSantori    : 142                         Mean   :0.4267  
##  PH06-2-5-Sophia      : 141                         3rd Qu.:1.0000  
##  PH79-3-5-VSarsparilla: 141                         Max.   :1.0000  
##  (Other)              :2907                         NA's   :17      
##  Gram       Seg_Start           Seg_End                word     
##  a: 102   Min.   :   9.493   Min.   :   9.542   SOMETHING: 329  
##  d:  49   1st Qu.: 748.056   1st Qu.: 748.086   GOING    : 285  
##  g: 188   Median :1436.973   Median :1437.013   DOING    : 165  
##  n: 241   Mean   :1498.944   Mean   :1498.984   GETTING  : 150  
##  o:1722   3rd Qu.:2204.564   3rd Qu.:2204.598   BEING    : 114  
##  p:1085   Max.   :4469.943   Max.   :4469.993   COMING   : 109  
##  s: 429                                         (Other)  :2664  
##    Word_Start          Word_End           Pre_Seg     Pre_Seg_Start     
##  Min.   :   9.373   Min.   :   9.642   K      : 570   Min.   :   9.413  
##  1st Qu.: 747.867   1st Qu.: 748.124   T      : 463   1st Qu.: 747.979  
##  Median :1436.683   Median :1437.118   TH     : 431   Median :1436.832  
##  Mean   :1498.694   Mean   :1499.053   OW     : 380   Mean   :1498.868  
##  3rd Qu.:2204.352   3rd Qu.:2204.640   N      : 235   3rd Qu.:2204.477  
##  Max.   :4469.703   Max.   :4470.052   V      : 229   Max.   :4469.793  
##                                        (Other):1508                     
##   Pre_Seg_End          Post_Seg    Post_Seg_Start      Post_Seg_End     
##  Min.   :   9.493   sp     : 849   Min.   :   9.642   Min.   :   9.782  
##  1st Qu.: 748.056   AH     : 525   1st Qu.: 748.124   1st Qu.: 748.200  
##  Median :1436.973   DH     : 337   Median :1437.118   Median :1437.208  
##  Mean   :1498.944   T      : 248   Mean   :1499.053   Mean   :1499.164  
##  3rd Qu.:2204.564   IH     : 134   3rd Qu.:2204.640   3rd Qu.:2204.678  
##  Max.   :4469.943   W      : 122   Max.   :4470.052   Max.   :4470.133  
##                     (Other):1601                                        
##      Window       Vowels_per_Second  preseg_place       preseg_manner 
##  Min.   : 0.570   Min.   : 0.148    coronal:2139   fricative   : 827  
##  1st Qu.: 1.300   1st Qu.: 3.515    labial :1019   liquid/glide: 246  
##  Median : 1.609   Median : 4.790    velar  : 658   nasal       : 405  
##  Mean   : 1.812   Mean   : 4.837                   stop/aff    :1359  
##  3rd Qu.: 1.970   3rd Qu.: 6.034                   vowel       : 979  
##  Max.   :33.877   Max.   :13.115                                      
##                                                                       
##   folseg_place       folseg_manner  sex        birthyear   
##  coronal:1405   fricative   : 559   f:1951   Min.   :1895  
##  labial : 517   liquid/glide: 504   m:1865   1st Qu.:1930  
##  none   : 863   nasal       : 146            Median :1944  
##  velar  :1031   pause       : 863            Mean   :1947  
##                 stop/aff    : 581            3rd Qu.:1970  
##                 vowel       :1163            Max.   :1984  
##                                                            
##       age            school       school.calc              school.cat  
##  Min.   :18.00   Min.   : 0.00   Min.   :-12.0000   Just HS     :1447  
##  1st Qu.:29.00   1st Qu.:11.00   1st Qu.: -1.0000   Not HS      :1012  
##  Median :43.00   Median :12.00   Median :  0.0000   Some college:1180  
##  Mean   :45.03   Mean   :12.03   Mean   :  0.0294   NA's        : 177  
##  3rd Qu.:62.00   3rd Qu.:14.00   3rd Qu.:  2.0000                      
##  Max.   :78.00   Max.   :16.00   Max.   :  4.0000                      
##                  NA's   :177     NA's   :177                           
##     newgram     newgram2          PrevING            PrevWord   
##  exclude:  18   Mode:logical   Min.   :0.0000   na       : 629  
##  g      : 625   NA's:3816      1st Qu.:0.0000   SOMETHING: 266  
##  m      : 138                  Median :1.0000   GOING    : 241  
##  p      :2279                  Mean   :0.5169   DOING    : 142  
##  r      : 324                  3rd Qu.:1.0000   GETTING  : 124  
##  s      : 432                  Max.   :1.0000   (Other)  :2397  
##                                NA's   :30       NA's     :  17  
##     PrevGram         Lag          subtlex.count        start         
##  exclude:  16   Min.   : -2.666   Min.   :     0   Min.   :   9.373  
##  g      : 517   1st Qu.:  3.257   1st Qu.:  1960   1st Qu.: 747.867  
##  m      : 112   Median :  9.977   Median :  9730   Median :1436.683  
##  p      :1921   Mean   : 20.314   Mean   : 24886   Mean   :1498.694  
##  r      : 257   3rd Qu.: 26.028   3rd Qu.: 26878   3rd Qu.:2204.352  
##  s      : 346   Max.   :401.690   Max.   :108288   Max.   :4469.703  
##  NA's   : 647   NA's   :17                                           
##       end              style      Bin_style      Or_style      
##  Min.   :   9.04   C      :1212   Mode:logical   Mode:logical  
##  1st Qu.: 746.63   T      : 922   NA's:3816      NA's:3816     
##  Median :1435.05   N      : 665                                
##  Mean   :1497.00   R      : 475                                
##  3rd Qu.:2203.72   S      : 352                                
##  Max.   :4468.66   G      : 128                                
##                    (Other):  62
# Step 1 - Reorganizing the data frame

#collapsing style codes into 3 styles (car, cas, na) in Bin_style (%in%, not %>%)
ing$Bin_style <- ifelse(ing$style %in% c("C","R","L","S"), "Careful",
                        ifelse(ing$style %in% c("N","G","K","T"), "Casual", "NA"))

#filter so NAs are taken out of code column
ing <- filter(ing, !is.na(code))

#filter so rows labelled "NA" in Bin_style don't appear in table
ing <- filter(ing, Bin_style != "NA")

#seeing values in "Bin_style" column
unique(ing$Bin_style)
## [1] "Careful" "Casual"
#subsetting data so only working with progressives (like Labov)
ing.prog <- subset(ing, Gram %in% c("o"))

View(ing.prog)

head(ing.prog)
##                 File Segment Position code Gram Seg_Start Seg_End     word
## 1  PH00-1-1-JStevens     IH0      End    0    o    85.063  85.102  WORKING
## 5  PH00-1-1-JStevens     IH0      End    0    o   147.712 147.743  WAITING
## 7  PH00-1-1-JStevens     IH0      End    0    o   152.518 152.548   MAKING
## 10 PH00-1-1-JStevens     IH0      End    0    o   331.826 331.866  WORKING
## 12 PH00-1-1-JStevens     IH0      End    0    o   353.953 353.983    DOING
## 17 PH00-1-1-JStevens     IH0      End    0    o   454.528 454.558 THROWING
##    Word_Start Word_End Pre_Seg Pre_Seg_Start Pre_Seg_End Post_Seg
## 1      84.912   85.132       K        85.023      85.063        S
## 5     147.593  147.773       T       147.682     147.712        T
## 7     152.328  152.578       K       152.467     152.518        N
## 10    331.555  331.936       K       331.755     331.826       sp
## 12    353.823  354.042      UW       353.853     353.953        G
## 17    454.248  454.598      OW       454.418     454.528        B
##    Post_Seg_Start Post_Seg_End Window Vowels_per_Second preseg_place
## 1          85.132       85.273  1.280             5.469        velar
## 5         147.773      147.812  1.310             8.397      coronal
## 7         152.578      152.668  1.260             6.349        velar
## 10        331.936      332.066  1.682             2.973        velar
## 12        354.042      354.073  2.576             1.941       labial
## 17        454.598      454.648  2.370             3.376       labial
##    preseg_manner folseg_place folseg_manner sex birthyear age school
## 1       stop/aff      coronal     fricative   m      1979  21     14
## 5       stop/aff      coronal      stop/aff   m      1979  21     14
## 7       stop/aff      coronal         nasal   m      1979  21     14
## 10      stop/aff         none         pause   m      1979  21     14
## 12         vowel        velar      stop/aff   m      1979  21     14
## 17         vowel       labial      stop/aff   m      1979  21     14
##    school.calc   school.cat newgram newgram2 PrevING  PrevWord PrevGram
## 1            2 Some college       p       NA       1        na     <NA>
## 5            2 Some college       p       NA       0   WORKING        p
## 7            2 Some college       p       NA       1   OPENING        r
## 10           2 Some college       p       NA       0   HANGING        p
## 12           2 Some college       p       NA       0 SOMETHING        s
## 17           2 Some college       p       NA       1   CALLING        p
##       Lag subtlex.count   start     end style Bin_style Or_style
## 1   1.409         12775  84.912  83.950     R   Careful       NA
## 5   8.600         10767 147.593 146.840     C   Careful       NA
## 7   3.925         11349 152.328 151.035     C   Careful       NA
## 10  6.584         12775 331.555 330.573     R   Careful       NA
## 12 12.845         52492 353.823 353.310     T    Casual       NA
## 17  2.041          1488 454.248 450.665     C   Careful       NA
summary(ing.prog)
##                     File      Segment    Position        code       
##  PH82-1-10-MCollins   :  81   IH0:1705   End:1705   Min.   :0.0000  
##  PH06-2-5-Sophia      :  78                         1st Qu.:0.0000  
##  PH79-3-5-VSarsparilla:  68                         Median :0.0000  
##  PH10-2-3-Vince       :  67                         Mean   :0.3501  
##  PH92-2-3-JSantori    :  67                         3rd Qu.:1.0000  
##  PH06-1-4-KSwanson    :  63                         Max.   :1.0000  
##  (Other)              :1281                                         
##  Gram       Seg_Start           Seg_End              word     
##  a:   0   Min.   :   9.493   Min.   :   9.542   GOING  : 193  
##  d:   0   1st Qu.: 786.113   1st Qu.: 786.142   DOING  : 137  
##  g:   0   Median :1436.123   Median :1436.153   GETTING: 101  
##  n:   0   Mean   :1489.813   Mean   :1489.852   SAYING :  88  
##  o:1705   3rd Qu.:2185.921   3rd Qu.:2185.951   TALKING:  81  
##  p:   0   Max.   :4469.943   Max.   :4469.993   COMING :  66  
##  s:   0                                         (Other):1039  
##    Word_Start          Word_End           Pre_Seg    Pre_Seg_Start     
##  Min.   :   9.373   Min.   :   9.642   K      :334   Min.   :   9.413  
##  1st Qu.: 785.922   1st Qu.: 786.262   T      :256   1st Qu.: 786.022  
##  Median :1436.023   Median :1436.183   OW     :232   Median :1436.093  
##  Mean   :1489.577   Mean   :1489.918   UW     :141   Mean   :1489.732  
##  3rd Qu.:2185.721   3rd Qu.:2186.021   EY     :139   3rd Qu.:2185.741  
##  Max.   :4469.703   Max.   :4470.052   V      :100   Max.   :4469.793  
##                                        (Other):503                     
##   Pre_Seg_End          Post_Seg   Post_Seg_Start      Post_Seg_End     
##  Min.   :   9.493   sp     :331   Min.   :   9.642   Min.   :   9.782  
##  1st Qu.: 786.113   AH     :265   1st Qu.: 786.262   1st Qu.: 786.292  
##  Median :1436.123   DH     :145   Median :1436.183   Median :1436.393  
##  Mean   :1489.813   T      :130   Mean   :1489.918   Mean   :1490.025  
##  3rd Qu.:2185.921   AA     : 70   3rd Qu.:2186.021   3rd Qu.:2186.141  
##  Max.   :4469.943   IH     : 62   Max.   :4470.052   Max.   :4470.133  
##                     (Other):702                                        
##      Window       Vowels_per_Second  preseg_place      preseg_manner
##  Min.   : 0.610   Min.   : 0.148    coronal:743   fricative   :162  
##  1st Qu.: 1.231   1st Qu.: 3.774    labial :591   liquid/glide: 89  
##  Median : 1.510   Median : 5.040    velar  :371   nasal       :150  
##  Mean   : 1.761   Mean   : 5.086                  stop/aff    :710  
##  3rd Qu.: 1.880   3rd Qu.: 6.349                  vowel       :594  
##  Max.   :33.877   Max.   :13.115                                    
##                                                                     
##   folseg_place      folseg_manner sex       birthyear         age       
##  coronal:611   fricative   :237   f:940   Min.   :1895   Min.   :18.00  
##  labial :252   liquid/glide:207   m:765   1st Qu.:1930   1st Qu.:29.00  
##  none   :336   nasal       : 86           Median :1944   Median :41.00  
##  velar  :506   pause       :336           Mean   :1946   Mean   :44.63  
##                stop/aff    :271           3rd Qu.:1965   3rd Qu.:62.00  
##                vowel       :568           Max.   :1984   Max.   :78.00  
##                                                                         
##      school       school.calc             school.cat     newgram    
##  Min.   : 0.00   Min.   :-12.000   Just HS     :672   exclude:   0  
##  1st Qu.:11.00   1st Qu.: -1.000   Not HS      :462   g      :   0  
##  Median :12.00   Median :  0.000   Some college:482   m      :   0  
##  Mean   :11.85   Mean   : -0.146   NA's        : 89   p      :1705  
##  3rd Qu.:13.00   3rd Qu.:  1.000                      r      :   0  
##  Max.   :16.00   Max.   :  4.000                      s      :   0  
##  NA's   :89      NA's   :89                                         
##  newgram2          PrevING            PrevWord       PrevGram  
##  Mode:logical   Min.   :0.0000   na       : 353   exclude:  8  
##  NA's:1705      1st Qu.:0.0000   GOING    :  94   g      :160  
##                 Median :1.0000   SOMETHING:  85   m      : 29  
##                 Mean   :0.5234   DOING    :  71   p      :950  
##                 3rd Qu.:1.0000   GETTING  :  53   r      : 86  
##                 Max.   :1.0000   (Other)  :1040   s      :110  
##                 NA's   :16       NA's     :   9   NA's   :362  
##       Lag          subtlex.count        start               end         
##  Min.   : -2.666   Min.   :     0   Min.   :   9.373   Min.   :   9.04  
##  1st Qu.:  3.328   1st Qu.:  2735   1st Qu.: 785.922   1st Qu.: 785.31  
##  Median :  9.621   Median : 12263   Median :1436.023   Median :1430.98  
##  Mean   : 20.233   Mean   : 24686   Mean   :1489.577   Mean   :1487.92  
##  3rd Qu.: 24.334   3rd Qu.: 25385   3rd Qu.:2185.721   3rd Qu.:2184.52  
##  Max.   :401.690   Max.   :108288   Max.   :4469.703   Max.   :4468.66  
##  NA's   :9                                                              
##      style      Bin_style         Or_style      
##  C      :469   Length:1705        Mode:logical  
##  N      :407   Class :character   NA's:1705     
##  T      :372   Mode  :character                 
##  R      :219                                    
##  S      :126                                    
##  G      : 85                                    
##  (Other): 27
# Step 2 - begin replicating Labov 2001 by getting mean rates for data across Careful & Casual styles

#turn into dplyr dataframe
ing.prog <- tbl_df(ing.prog)

#say what dataframe
ing.prog.style.fin <- ing.prog %>%
  
  #will create data in long format
  group_by(Bin_style) %>%
  
  #summarize; add the (ING) rate and N columns
  summarise(ing.rate=mean(code),N=n())

View(ing.prog.style.fin)

# Step 3 - get mean rates across stylistic category by *individual speaker*

#turn into dplyr dataframe
ing.prog <- tbl_df(ing.prog)

#say what dataframe
ing.prog.spkr.style <- ing.prog %>%
  
  #will create data in long format
  group_by(File, Bin_style) %>%
  
  #summarize; add the rate and N column 
  summarise(ing.rate=mean(code),N=n())

View(ing.prog.spkr.style)

# Step 4 - getting a column with size of each individual speaker's style shift

#turning into dplyr data frame
ing.prog <- tbl_df(ing.prog)

#say what data frame
ing.prog.spkr.style2 <- ing.prog %>%
  
  #will create data in long format
  group_by(File, Bin_style, sex, age, birthyear) %>%
  
  #summarize; add ing.rate column
  summarise(ing.rate=mean(code))

View(ing.prog.spkr.style2)

#opening reshape2 package
library(reshape2)

#making new dataframe - each spkr is 1 row and car and cas rates are 2 separate columns
ing.prog.spkr.style3 <- dcast(ing.prog.spkr.style2, File + sex + age + birthyear ~ Bin_style,
                              value.var = "ing.rate")

#create style_shift column by subtracting Careful mean from Casual mean
ing.prog.spkr.style3$ING_style_shift <- ing.prog.spkr.style3$Casual - ing.prog.spkr.style3$Careful

#multiplying style_shift values by -100 so align with Labov's (in mag & direction)
ing.prog.spkr.style3$ING_style_shift <- ing.prog.spkr.style3$ING_style_shift * -100

#create age2 column that groups speakers into younger (<30) or older (≥30)
ing.prog.spkr.style3$age2 <- ifelse(ing.prog.spkr.style3$age < 30, "younger", "older")

View(ing.prog.spkr.style3)
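
The arithmetic in Step 4 can be checked by hand on a small example. The sketch below uses a made-up data frame (`toy`, with hypothetical speakers `spkA` and `spkB`) rather than the real `ing.prog` data, and base R's `aggregate()` and `reshape()` in place of dplyr and reshape2, so it illustrates the calculation rather than reproducing the pipeline above.

```r
# Hypothetical toy data: two speakers, four tokens each, coded 0/1
toy <- data.frame(
  File      = rep(c("spkA", "spkB"), each = 4),
  Bin_style = rep(c("Careful", "Careful", "Casual", "Casual"), times = 2),
  code      = c(0, 1, 1, 1,   # spkA: Careful mean 0.5, Casual mean 1.0
                1, 1, 0, 1)   # spkB: Careful mean 1.0, Casual mean 0.5
)

# Mean of `code` per speaker and style (long format)
rates <- aggregate(code ~ File + Bin_style, data = toy, FUN = mean)

# One row per speaker, with the Careful and Casual rates in separate columns
wide <- reshape(rates, idvar = "File", timevar = "Bin_style", direction = "wide")

# Casual mean minus Careful mean, then multiplied by -100 as in Step 4
wide$ING_style_shift <- (wide$code.Casual - wide$code.Careful) * -100

print(wide)
```

A speaker whose rate is higher in Casual than in Careful speech (`spkA` here) ends up with a negative style-shift value after the multiplication by -100, and a speaker with the reverse pattern (`spkB`) ends up with a positive one.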

# Step 5 - replicating Labov figure 5.2 (histogram, Cas index - Car index, for ind spkr)

#opening ggplot2
library(ggplot2)

#making the graph - histogram with degree of ind style shift by gender
ggplot(ing.prog.spkr.style3, aes(ING_style_shift)) + 
  geom_histogram(binwidth=5, aes(fill=sex), position="dodge") +
  ggtitle("Style-shifting of (ING) variable in Philadelphia")

I found slightly more style-shifting than Labov did for the (ING) variable, though the shift I observed was still not dramatic and occurred in the negative direction (the opposite of Labov’s). My modal value fell around -10. Labov does not remark on a difference by sex, and in my data speakers of each sex were spread fairly evenly across the range of style-shift values. The one exception is at the extremes: my most negative values belong to male speakers and my most positive values to female speakers, the opposite of what I found with (DH). Otherwise, no clear difference by gender emerges.

Replicating Labov 2001 figure 5.5
# Step 6 - replicating Labov fig 5.5 - distribution of style shift by age and gender

#making the graph - scatterplot with dist of ind style shift by age and gender
ggplot(ing.prog.spkr.style3, aes(age, ING_style_shift)) + geom_point(aes(color=sex)) + 
  stat_smooth(method = "lm") + ggtitle("Distribution of (ING) style-shift by age and sex")
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).

My values for the degree of (ING) style-shift range from approximately -30 to 20, consistent with the range of values I obtained for the degree of (DH) style-shift. For (ING), the points move further away from the overall regression line as age increases, while the reverse pattern was found with the (DH) data. As with (DH), the overall regression line for (ING) rises slightly with age, which suggests a small tendency for speakers to increase their degree of (ING) style-shifting as they age. However, the slope is not steep, which indicates that any age effect on the degree of (ING) style-shifting is fairly small. The overall regression line for (ING) is centered around -10, reflecting the slightly greater degree of style-shifting observed for (ING) than for (DH). It is interesting that the (ING) modal value is negative, indicating that style-shifting for the (ING) variable runs opposite to the normal direction of style-shifting in society. The lack of a strong correlation between degree of style-shifting and age, and between degree of style-shifting and sex, reinforces the stable nature of the (ING) variable. Among older speakers, females tend to exhibit more extreme style-shifting in the positive direction, and males more extreme style-shifting in the negative direction. Labov does not provide a graph showing the relationship between age, gender and degree of (ING) style-shift.
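
The line drawn by stat_smooth(method = "lm") is an ordinary least-squares fit, so the size of the age effect can also be read off numerically with lm(). The sketch below uses a small made-up data frame (`toy`, with hypothetical ages and style-shift values), not the real speaker data; with the real data the same call would be made on ing.prog.spkr.style3.

```r
# Hypothetical toy data standing in for ing.prog.spkr.style3
toy <- data.frame(
  age             = c(20, 30, 40, 50, 60, 70),
  ING_style_shift = c(-14, -12, -11, -9, -8, -6)
)

# Same model that stat_smooth(method = "lm") fits: style shift as a linear function of age
fit <- lm(ING_style_shift ~ age, data = toy)

# Intercept and slope; the slope is the change in style-shift per year of age
print(coef(fit))

# 95% confidence interval for the slope, and the share of variance explained by age
print(confint(fit, "age"))
print(summary(fit)$r.squared)
```

A slope whose confidence interval includes zero, or a low R-squared, would be the numerical counterpart of describing the age effect as small.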

Replicating Labov 2001 figure 5.7

As explained above, Labov interprets this distribution of index scores across the nuanced categories of the decision tree as validation of each category’s ability to contribute to identifying stylistic variation. Labov notes that there are not large differences among the Careful speech subcategories for (ING), and that all of the Careful speech values fall below the (ING) mean value. He notes that three of the four Casual [sic] speech categories are above the (ING) mean value, and that Narrative is at the mean level. Labov introduces the notion of the objectivity of the style-coding process in pointing out that the Tangent value falls farthest from the (ING) mean value, and that Tangent is the least objective coding decision. He claims that this increased subjectivity might increase the likelihood that the linguistic variants used would bias the coder’s decision. Ideally, a coder would not be exposed to any linguistic information while they are coding. However, the presence of potentially biasing information was an issue I encountered while coding, revealing a shortcoming of the interview transcription process and perhaps also of the decision tree style-coding method.

# Step 7 - replicating fig 5.7 - stylistic differentiation for 8 cat of dec tree

#turning into dplyr data frame
ing.prog <- tbl_df(ing.prog)

#say what data frame
ing.prog.style.fin2 <- ing.prog %>%
  
  #will create data in long format
  group_by(style, Bin_style) %>%
  
  #summarize; add (ING) index and N columns
  summarise(index=mean(code),N=n())

#multiplying index values by 100 so correspond to Labov's values
ing.prog.style.fin2$index <- ing.prog.style.fin2$index * 100

#reordering style values on x-axis so match Labov 2001 order
ing.prog.style.fin2$style <- factor(ing.prog.style.fin2$style, c("S","L","R","C","N","K","T","G"))

View(ing.prog.style.fin2)

#replicating figure 5.7 as a bar graph - index score arr. by orig 8 style values
ggplot(ing.prog.style.fin2, aes(style,index)) + geom_bar(stat = "identity") + facet_wrap(~ Bin_style) + 
  ggtitle("Stylistic Differentiation of (ING) for eight categories of the Style Decision Tree")

My graph indicates that within each broad stylistic category (Careful and Casual), the index values for the individual stylistic categories fall within a similar range: between around 30 and 45 for Careful speech, and slightly wider, between around 30 and 50, for Casual speech. For Careful, the value for Language is noticeably lower than those of the other three categories, which fall around 40. I noticed while coding that some instances of speech in the Language style might have been better coded as Narrative or Tangent, and this inconsistency in the tokens coded as Language could explain its lower index. For Casual, the categories are more spread out from one another: Kids and Tangent have the highest values (around 50), Group falls slightly lower (around 40), and Narrative falls lowest (around 30). It is interesting that Narrative had the lowest index value in Casual speech, since Labov emphasized the category’s ability to capture casual/vernacular speech and placed the greatest emphasis on identifying instances of Narrative; this result raises questions about the style-coding process. The index value ranges are again very similar across the Careful-Casual distinction. This seems to indicate that each category contributes something comparable to the task of identifying stylistic variation in speech, and that for better classification an entirely different method may be necessary. Labov’s index values differed more noticeably between the broad Careful and Casual categories: he found a slightly greater difference (approximately 15) in relative clustering between the two categories than I did, but a much smaller difference than was found with (DH).

Replicating Labov 2001 figure 5.8

Labov notes that the (ING) values for men are generally higher than those for women, which he finds unsurprising given the regression analysis of the larger data set. He says that the difference by sex is more consistent in Casual speech (where the two sets have parallel paths) than in Careful speech. Labov draws attention to the fact that within these stylistic subsets, Soapbox is not differentiated from the Casual speech styles for women, while it is differentiated for men. He also mentions that men’s Careful and Casual speech is not differentiated between the Response and Narrative categories.

# Step 8 - replicating Labov's fig 5.8 (stylistic diff by sex for 8 cat of dec tree)

#turning into dplyr data frame
ing.prog <- tbl_df(ing.prog)

#say what data frame
ing.prog.style.fin3 <- ing.prog %>%
  
  #will create data in long format
  group_by(style, Bin_style, sex) %>%
  
  #summarize; add (ING) index and N columns
  summarise(index=mean(code),N=n())

#multiplying index values by 100 so correspond to Labov's values
ing.prog.style.fin3$index <- ing.prog.style.fin3$index * 100

#reordering style values on x-axis so match Labov 2001 order
ing.prog.style.fin3$style <- factor(ing.prog.style.fin3$style, c("S","L","R","C","N","K","T","G"))

View(ing.prog.style.fin3)

#replicating figure 5.8 as bar graph - index score arr. by orig 8 style values and diff by sex
ggplot(ing.prog.style.fin3, aes(style,index)) + geom_bar(stat = "identity", aes(fill=sex)) +
  facet_grid(sex ~ Bin_style) + ggtitle("Stylistic Differentiation of (ING) by sex for eight categories of the Style Decision Tree")

My graph again indicates that (ING) index values vary more across sex than across style category. That is, within each sex, the index values are similar across the Careful and Casual styles. The index value for women was fairly consistent across stylistic categories (with the exception of Language), at around 50. The index values for men tended to be lower than those for women, and were also fairly consistent across stylistic categories, clustering around 30 for both Careful and Casual (with the exception of Kids). There was thus one outlier stylistic category for each sex: Language for women and Kids for men. There is no clear relationship between these outliers and Labov’s classification of their objectivity, for he classed Language as relatively objective but Kids as relatively subjective. My results also differ from Labov’s: as with (DH), Labov’s primary differentiator of index value was stylistic category, not sex. For Labov, men and women within a given stylistic category had index values more similar to each other than each sex’s values were across stylistic categories.

ING: Logistical Analysis

This logistic regression was designed to measure an interaction between the effects of grammar and style. In the output below, the interaction between the “participle” grammatical category and style is significant (estimate = -0.54, p = 0.009). Labov only analyzed instances of (ING) in participle form. There was no significant interaction between style and the other grammatical categories I analyzed (“gerund”, “mono-morpheme”, “root-attached”, and “something/nothing”). As with the (DH) variable, the main effect of Style was not a significant predictor of the speech variant used (p = 0.93). That is, my original hypothesis that the percent frequency of the non-standard [-in’] variant would be greater when measured across “casual” than “careful” stylistic contexts was not supported as a general effect in this model. It is important to note that sex, preceding segment, and following segment for “pause” were all found to be significant predictors of variant used, emphasizing the role that internal and external factors can have on the production of speech variables. Because the model reported convergence warnings, these estimates should be interpreted with caution.

#accessing lme4 (program needed to run regression)
library(lme4)

#accessing data file
ing.lr <- read.csv("ingstyle_fin.csv")

#peeking at first lines of data file
head(ing.lr)
##                File Segment Position code Gram Seg_Start Seg_End
## 1 PH00-1-1-JStevens     IH0      End    0    o    85.063  85.102
## 2 PH00-1-1-JStevens     IH0      End    0    p   112.708 112.738
## 3 PH00-1-1-JStevens     IH0      End    1    p   114.648 114.688
## 4 PH00-1-1-JStevens     IH0      End    0    p   127.878 127.907
## 5 PH00-1-1-JStevens     IH0      End    0    o   147.712 147.743
## 6 PH00-1-1-JStevens     IH0      End    1    n   148.553 148.603
##         word Word_Start Word_End Pre_Seg Pre_Seg_Start Pre_Seg_End
## 1    WORKING     84.912   85.132       K        85.023      85.063
## 2     FIXING    112.468  112.768       S       112.658     112.708
## 3 NETWORKING    114.308  114.798       K       114.588     114.648
## 4    WORKING    127.687  127.937       K       127.808     127.878
## 5    WAITING    147.593  147.773       T       147.682     147.712
## 6    OPENING    148.413  148.653       N       148.522     148.553
##   Post_Seg Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1        S         85.132       85.273  1.280             5.469
## 2        K        112.768      112.808  3.060             2.614
## 3       sp        114.798      115.138  3.635             2.201
## 4       AW        127.937      128.028  1.860             3.226
## 5        T        147.773      147.812  1.310             8.397
## 6       sp        148.653      148.673  1.430             6.993
##   preseg_place preseg_manner folseg_place folseg_manner sex birthyear age
## 1        velar      stop/aff      coronal     fricative   m      1979  21
## 2      coronal     fricative        velar      stop/aff   m      1979  21
## 3        velar      stop/aff         none         pause   m      1979  21
## 4        velar      stop/aff        velar         vowel   m      1979  21
## 5      coronal      stop/aff      coronal      stop/aff   m      1979  21
## 6      coronal         nasal         none         pause   m      1979  21
##   school school.calc   school.cat newgram newgram2 PrevING   PrevWord
## 1     14           2 Some college       p       NA       1         na
## 2     14           2 Some college       g       NA       0    WORKING
## 3     14           2 Some college       g       NA       0     FIXING
## 4     14           2 Some college       p       NA       1 NETWORKING
## 5     14           2 Some college       p       NA       0    WORKING
## 6     14           2 Some college       r       NA       0    WAITING
##   PrevGram    Lag subtlex.count   start     end style Bin_style Or_style
## 1     <NA>  1.409         12775  84.912  83.950     R        NA       NA
## 2        p 27.636           533 112.468 107.275     C        NA       NA
## 3        g  2.030            27 114.308 107.275     C        NA       NA
## 4        g 13.139         12775 127.687 126.505     C        NA       NA
## 5        p  8.600         10767 147.593 146.840     C        NA       NA
## 6        p  0.880          1960 148.413 146.840     C        NA       NA
#collapsing 8 stylistic categories into 3 style codes in Bin_style
ing.lr$Bin_style <- ifelse(ing.lr$style %in% c("C","L","R","S"), "Careful",
                           ifelse(ing.lr$style %in% c("G","K","N","T"), "Casual", 
                                  "NA"))

#filter so NAs are taken out of code column
ing.lr <- filter(ing.lr, !is.na(code))

#filter so rows labelled "NA" in Bin_style (neither Careful nor Casual) don't appear in table
ing.lr <- filter(ing.lr, Bin_style != "NA")

#simplifying preseg and folseg columns

# simplify phonological environment based on previous analysis
ing.lr$Pre_Seg <- "other"

ing.lr$Pre_Seg[
  ing.lr$preseg_place=="velar" & ing.lr$preseg_manner=="nasal"
  ] <- "velar.N"

ing.lr$Pre_Seg[
  ing.lr$preseg_place=="coronal" & ing.lr$preseg_manner=="nasal"
  ] <- "coronal.N"

ing.lr$Pre_Seg[
  ing.lr$preseg_place=="coronal" & ing.lr$preseg_manner %in% c("fricative","stop/aff")
  ] <- "coronal.obs"

ing.lr$Post_Seg <- "other"

ing.lr$Post_Seg[
  ing.lr$folseg_place=="velar" & ing.lr$folseg_manner %in% c("fricative",
                                                             "nasal",
                                                             "stop/aff",
                                                             "liquid/glide")
  ] <- "velar.C"

ing.lr$Post_Seg[
  ing.lr$folseg_manner=="pause"
  ] <- "pause"

#collapsing grammatical categories

#recode values into multiple categories; brackets index rows
ing.lr$newgram2[ing.lr$newgram %in% c("m","s")] <- "ms"
ing.lr$newgram2[ing.lr$newgram %in% c("g","r")] <- "gr"
ing.lr$newgram2[ing.lr$newgram %in% c("p")] <- "p"

#peeking at first lines of file
head(ing.lr)
##                File Segment Position code Gram Seg_Start Seg_End
## 1 PH00-1-1-JStevens     IH0      End    0    o    85.063  85.102
## 2 PH00-1-1-JStevens     IH0      End    0    p   112.708 112.738
## 3 PH00-1-1-JStevens     IH0      End    1    p   114.648 114.688
## 4 PH00-1-1-JStevens     IH0      End    0    p   127.878 127.907
## 5 PH00-1-1-JStevens     IH0      End    0    o   147.712 147.743
## 6 PH00-1-1-JStevens     IH0      End    1    n   148.553 148.603
##         word Word_Start Word_End     Pre_Seg Pre_Seg_Start Pre_Seg_End
## 1    WORKING     84.912   85.132       other        85.023      85.063
## 2     FIXING    112.468  112.768 coronal.obs       112.658     112.708
## 3 NETWORKING    114.308  114.798       other       114.588     114.648
## 4    WORKING    127.687  127.937       other       127.808     127.878
## 5    WAITING    147.593  147.773 coronal.obs       147.682     147.712
## 6    OPENING    148.413  148.653   coronal.N       148.522     148.553
##   Post_Seg Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1    other         85.132       85.273  1.280             5.469
## 2  velar.C        112.768      112.808  3.060             2.614
## 3    pause        114.798      115.138  3.635             2.201
## 4    other        127.937      128.028  1.860             3.226
## 5    other        147.773      147.812  1.310             8.397
## 6    pause        148.653      148.673  1.430             6.993
##   preseg_place preseg_manner folseg_place folseg_manner sex birthyear age
## 1        velar      stop/aff      coronal     fricative   m      1979  21
## 2      coronal     fricative        velar      stop/aff   m      1979  21
## 3        velar      stop/aff         none         pause   m      1979  21
## 4        velar      stop/aff        velar         vowel   m      1979  21
## 5      coronal      stop/aff      coronal      stop/aff   m      1979  21
## 6      coronal         nasal         none         pause   m      1979  21
##   school school.calc   school.cat newgram newgram2 PrevING   PrevWord
## 1     14           2 Some college       p        p       1         na
## 2     14           2 Some college       g       gr       0    WORKING
## 3     14           2 Some college       g       gr       0     FIXING
## 4     14           2 Some college       p        p       1 NETWORKING
## 5     14           2 Some college       p        p       0    WORKING
## 6     14           2 Some college       r       gr       0    WAITING
##   PrevGram    Lag subtlex.count   start     end style Bin_style Or_style
## 1     <NA>  1.409         12775  84.912  83.950     R   Careful       NA
## 2        p 27.636           533 112.468 107.275     C   Careful       NA
## 3        g  2.030            27 114.308 107.275     C   Careful       NA
## 4        g 13.139         12775 127.687 126.505     C   Careful       NA
## 5        p  8.600         10767 147.593 146.840     C   Careful       NA
## 6        p  0.880          1960 148.413 146.840     C   Careful       NA
#summary of file contents -> ensure everything in correct format
summary(ing.lr)
##                     File      Segment    Position        code       
##  PH06-2-2-Patrick     : 171   IH0:3799   End:3799   Min.   :0.0000  
##  PH82-1-10-MCollins   : 163                         1st Qu.:0.0000  
##  PH06-1-6-JMcPhee     : 145                         Median :0.0000  
##  PH92-2-3-JSantori    : 142                         Mean   :0.4267  
##  PH06-2-5-Sophia      : 141                         3rd Qu.:1.0000  
##  PH79-3-5-VSarsparilla: 140                         Max.   :1.0000  
##  (Other)              :2897                                         
##  Gram       Seg_Start           Seg_End                word     
##  a: 102   Min.   :   9.493   Min.   :   9.542   SOMETHING: 329  
##  d:  49   1st Qu.: 746.888   1st Qu.: 746.927   GOING    : 285  
##  g: 188   Median :1436.708   Median :1436.758   DOING    : 165  
##  n: 241   Mean   :1497.028   Mean   :1497.068   GETTING  : 150  
##  o:1705   3rd Qu.:2200.831   3rd Qu.:2200.885   BEING    : 114  
##  p:1085   Max.   :4469.943   Max.   :4469.993   COMING   : 109  
##  s: 429                                         (Other)  :2647  
##    Word_Start          Word_End          Pre_Seg         
##  Min.   :   9.373   Min.   :   9.642   Length:3799       
##  1st Qu.: 746.652   1st Qu.: 746.967   Class :character  
##  Median :1436.358   Median :1436.878   Mode  :character  
##  Mean   :1496.778   Mean   :1497.137                     
##  3rd Qu.:2200.555   3rd Qu.:2200.915                     
##  Max.   :4469.703   Max.   :4470.052                     
##                                                          
##  Pre_Seg_Start       Pre_Seg_End         Post_Seg        
##  Min.   :   9.413   Min.   :   9.493   Length:3799       
##  1st Qu.: 746.808   1st Qu.: 746.888   Class :character  
##  Median :1436.498   Median :1436.708   Mode  :character  
##  Mean   :1496.952   Mean   :1497.028                     
##  3rd Qu.:2200.755   3rd Qu.:2200.831                     
##  Max.   :4469.793   Max.   :4469.943                     
##                                                          
##  Post_Seg_Start      Post_Seg_End          Window       Vowels_per_Second
##  Min.   :   9.642   Min.   :   9.782   Min.   : 0.570   Min.   : 0.148   
##  1st Qu.: 746.967   1st Qu.: 747.044   1st Qu.: 1.300   1st Qu.: 3.510   
##  Median :1436.878   Median :1437.028   Median : 1.610   Median : 4.788   
##  Mean   :1497.137   Mean   :1497.248   Mean   : 1.813   Mean   : 4.833   
##  3rd Qu.:2200.915   3rd Qu.:2200.965   3rd Qu.: 1.970   3rd Qu.: 6.028   
##  Max.   :4470.052   Max.   :4470.133   Max.   :33.877   Max.   :13.115   
##                                                                          
##   preseg_place       preseg_manner   folseg_place       folseg_manner 
##  coronal:2139   fricative   : 827   coronal:1388   fricative   : 559  
##  labial :1002   liquid/glide: 246   labial : 517   liquid/glide: 504  
##  velar  : 658   nasal       : 405   none   : 863   nasal       : 146  
##                 stop/aff    :1359   velar  :1031   pause       : 863  
##                 vowel       : 962                  stop/aff    : 564  
##                                                    vowel       :1163  
##                                                                       
##  sex        birthyear         age            school     
##  f:1937   Min.   :1895   Min.   :18.00   Min.   : 0.00  
##  m:1862   1st Qu.:1930   1st Qu.:29.00   1st Qu.:11.00  
##           Median :1944   Median :43.00   Median :12.00  
##           Mean   :1946   Mean   :45.07   Mean   :12.02  
##           3rd Qu.:1965   3rd Qu.:62.00   3rd Qu.:14.00  
##           Max.   :1984   Max.   :78.00   Max.   :16.00  
##                                          NA's   :177    
##   school.calc               school.cat      newgram       newgram2        
##  Min.   :-12.00000   Just HS     :1442   exclude:  18   Length:3799       
##  1st Qu.: -1.00000   Not HS      :1012   g      : 625   Class :character  
##  Median :  0.00000   Some college:1168   m      : 138   Mode  :character  
##  Mean   :  0.01712   NA's        : 177   p      :2262                     
##  3rd Qu.:  2.00000                       r      : 324                     
##  Max.   :  4.00000                       s      : 432                     
##  NA's   :177                                                              
##     PrevING            PrevWord       PrevGram         Lag         
##  Min.   :0.0000   na       : 622   exclude:  16   Min.   : -2.666  
##  1st Qu.:0.0000   SOMETHING: 263   g      : 515   1st Qu.:  3.250  
##  Median :1.0000   GOING    : 240   m      : 112   Median :  9.973  
##  Mean   :0.5155   DOING    : 142   p      :1916   Mean   : 20.292  
##  3rd Qu.:1.0000   GETTING  : 123   r      : 257   3rd Qu.: 25.974  
##  Max.   :1.0000   (Other)  :2392   s      : 343   Max.   :401.690  
##  NA's   :28       NA's     :  17   NA's   : 640   NA's   :17       
##  subtlex.count        start               end              style     
##  Min.   :     0   Min.   :   9.373   Min.   :   9.04   C      :1206  
##  1st Qu.:  2181   1st Qu.: 746.652   1st Qu.: 745.57   T      : 916  
##  Median :  9730   Median :1436.358   Median :1434.12   N      : 665  
##  Mean   : 24997   Mean   :1496.778   Mean   :1495.09   R      : 472  
##  3rd Qu.: 26878   3rd Qu.:2200.555   3rd Qu.:2199.34   S      : 350  
##  Max.   :108288   Max.   :4469.703   Max.   :4468.66   G      : 128  
##                                                        (Other):  62  
##   Bin_style         Or_style      
##  Length:3799        Mode:logical  
##  Class :character   NA's:3799     
##  Mode  :character                 
##                                   
##                                   
##                                   
## 
View(ing.lr)

#now random effects are included in the models

#creating model 1 - testing for interaction
mod1.ing <- glmer(code ~ sex + birthyear + Pre_Seg + Post_Seg + newgram2 * Bin_style + 
                    (1|File), ing.lr, family = "binomial")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
#creating model 2 - not testing for interaction
mod2.ing <- glmer(code ~ sex + birthyear + Pre_Seg + Post_Seg + newgram2 + Bin_style + 
                    (1|File), ing.lr, family = "binomial")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
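
Both models above report that they failed to converge. Two commonly suggested remedies for warnings of this kind are rescaling continuous predictors (birthyear is entered here as a raw four-digit year) and trying a different optimizer. The sketch below shows both on simulated data; the data frame `toy`, the column `birthyear.z`, and the model name `mod.sketch` are hypothetical, and nothing here has been run on ing.lr, so whether these steps remove the warnings for the real models is an open question.

```r
# Hypothetical stand-in for ing.lr: 20 simulated speakers with 30 tokens each
set.seed(1)
n.spk <- 20
n.tok <- 30
spk.birthyear <- sample(1895:1984, n.spk, replace = TRUE)
spk.effect    <- rnorm(n.spk, sd = 1)   # simulated speaker-level intercepts

toy <- data.frame(
  File      = rep(paste0("spk", seq_len(n.spk)), each = n.tok),
  birthyear = rep(spk.birthyear, each = n.tok)
)
toy$code <- rbinom(nrow(toy), 1, plogis(-0.3 + rep(spk.effect, each = n.tok)))

# Remedy 1: center birthyear on its mean and scale it to unit standard deviation
toy$birthyear.z <- as.numeric(scale(toy$birthyear))

# Remedy 2: ask glmer to use the bobyqa optimizer (only runs if lme4 is installed)
if (requireNamespace("lme4", quietly = TRUE)) {
  mod.sketch <- lme4::glmer(code ~ birthyear.z + (1 | File), data = toy,
                            family = "binomial",
                            control = lme4::glmerControl(optimizer = "bobyqa"))
  print(summary(mod.sketch))
}
```

With a rescaled predictor, the birthyear coefficient describes the change in log-odds per standard deviation of birth year rather than per year, so it is not directly comparable to the estimates reported in this section.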
#seeing the results of mod 1
summary(mod1.ing)
## Warning in vcov.merMod(object, correlation = correlation, sigm = sig): variance-covariance matrix computed from finite-difference Hessian is
## not positive definite or contains NA values: falling back to var-cov estimated from RX
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## code ~ sex + birthyear + Pre_Seg + Post_Seg + newgram2 * Bin_style +  
##     (1 | File)
##    Data: ing.lr
## 
##      AIC      BIC   logLik deviance df.resid 
##   3640.1   3727.4  -1806.0   3612.1     3767 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.7060 -0.5473 -0.2824  0.5122  5.8853 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  File   (Intercept) 2.339    1.529   
## Number of obs: 3781, groups:  File, 40
## 
## Fixed effects:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 2.0294630 21.3842071   0.095  0.92439    
## sexm                       -1.1511843  0.4950596  -2.325  0.02005 *  
## birthyear                   0.0003583  0.0109941   0.033  0.97400    
## Pre_Segcoronal.obs         -1.3318672  0.1902577  -7.000 2.55e-12 ***
## Pre_Segother               -2.1171236  0.1920858 -11.022  < 2e-16 ***
## Pre_Segvelar.N             -2.4733188  0.5728748  -4.317 1.58e-05 ***
## Post_Segpause               0.7267724  0.1009590   7.199 6.08e-13 ***
## Post_Segvelar.C             0.2108500  0.1844139   1.143  0.25289    
## newgram2ms                 -1.3236769  0.1822270  -7.264 3.76e-13 ***
## newgram2p                  -1.0047381  0.1315023  -7.640 2.16e-14 ***
## Bin_styleCasual            -0.0144201  0.1728724  -0.083  0.93352    
## newgram2ms:Bin_styleCasual -0.0492485  0.2716102  -0.181  0.85612    
## newgram2p:Bin_styleCasual  -0.5362775  0.2043097  -2.625  0.00867 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) sexm   brthyr Pr_Sg. Pr_Sgt Pr_S.N Pst_Sg Ps_S.C
## sexm        -0.026                                                 
## birthyear   -1.000  0.015                                          
## Pr_Sgcrnl.b -0.018  0.009  0.011                                   
## Pre_Segothr -0.016  0.013  0.008  0.865                            
## Pre_Sgvlr.N -0.004  0.004  0.001  0.290  0.310                     
## Post_Segpas -0.012 -0.003  0.011 -0.011 -0.012  0.029              
## Pst_Sgvlr.C -0.004 -0.003  0.003  0.012  0.005 -0.001  0.140       
## newgram2ms   0.001  0.001 -0.004 -0.025  0.132  0.046 -0.069  0.010
## newgram2p    0.003  0.012 -0.007 -0.028 -0.030 -0.019 -0.002  0.028
## Bin_stylCsl -0.010  0.013  0.007  0.002  0.019  0.020 -0.025 -0.028
## nwgrm2m:B_C -0.001  0.001  0.002  0.062  0.037  0.010  0.042  0.017
## nwgrm2p:B_C  0.000 -0.001  0.002  0.018  0.008  0.001  0.011 -0.003
##             nwgrm2m nwgrm2p Bn_stC nwgrm2m:B_C
## sexm                                          
## birthyear                                     
## Pr_Sgcrnl.b                                   
## Pre_Segothr                                   
## Pre_Sgvlr.N                                   
## Post_Segpas                                   
## Pst_Sgvlr.C                                   
## newgram2ms                                    
## newgram2p    0.452                            
## Bin_stylCsl  0.349   0.480                    
## nwgrm2m:B_C -0.614  -0.307  -0.612            
## nwgrm2p:B_C -0.295  -0.632  -0.811  0.524     
## convergence code: 0
## unable to evaluate scaled gradient
## Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
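
Since mod1.ing and mod2.ing differ only in the grammar-by-style interaction, the two can be compared with a likelihood ratio test. The sketch below does the arithmetic by hand from the deviance and residual degrees of freedom printed in the two model summaries in this section; the variable names are hypothetical. Calling anova(mod1.ing, mod2.ing) on the fitted models would be the usual way to run the same comparison. Because both models reported convergence failures, the result is a rough indication only.

```r
# Values copied from the summary() output of the two models
dev.mod1 <- 3612.1   # deviance of mod1.ing (with newgram2 * Bin_style interaction)
dev.mod2 <- 3620.9   # deviance of mod2.ing (main effects only)
dfres.mod1 <- 3767   # residual df of mod1.ing
dfres.mod2 <- 3769   # residual df of mod2.ing

# Likelihood ratio statistic: the drop in deviance from adding the interaction terms
lr.stat <- dev.mod2 - dev.mod1

# Number of extra parameters in the interaction model
df.diff <- dfres.mod2 - dfres.mod1

# Upper-tail chi-squared probability for the statistic
p.value <- pchisq(lr.stat, df = df.diff, lower.tail = FALSE)

print(c(lr.stat = lr.stat, df = df.diff, p.value = p.value))
```

A small p-value here would favor keeping the interaction terms, which is consistent with the lower AIC reported for mod1.ing (3640.1 versus 3644.9).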

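This logistic regression was not designed to measure an interaction between the effects of grammar and style. In contrast to the interaction model, the output below shows Style as a significant predictor of the speech variant used (Casual estimate = -0.33, p < 0.001), so for this model the null hypothesis of no style effect would be rejected. The coefficient is negative, meaning that the variant coded as 1 was less likely in Casual than in Careful speech; my original hypothesis, that the percent frequency of the non-standard [-in’] variant would be greater in “casual” than in “careful” stylistic contexts, should be evaluated against the direction of this coefficient. It is important to note that sex, preceding segment, following segment for “pause”, and grammar were all found to be significant predictors of the speech variant used. Because this model also reported convergence warnings, these results should be interpreted with caution.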

#seeing the results of mod 2
summary(mod2.ing)
## Warning in vcov.merMod(object, correlation = correlation, sigm = sig): variance-covariance matrix computed from finite-difference Hessian is
## not positive definite or contains NA values: falling back to var-cov estimated from RX
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## code ~ sex + birthyear + Pre_Seg + Post_Seg + newgram2 + Bin_style +  
##     (1 | File)
##    Data: ing.lr
## 
##      AIC      BIC   logLik deviance df.resid 
##   3644.9   3719.7  -1810.4   3620.9     3769 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.7239 -0.5469 -0.2854  0.5279  5.5580 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  File   (Intercept) 2.322    1.524   
## Number of obs: 3781, groups:  File, 40
## 
## Fixed effects:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         2.1165813 21.3102841   0.099 0.920883    
## sexm               -1.1557525  0.4933465  -2.343 0.019146 *  
## birthyear           0.0003824  0.0109560   0.035 0.972154    
## Pre_Segcoronal.obs -1.3430523  0.1900666  -7.066 1.59e-12 ***
## Pre_Segother       -2.1296033  0.1922150 -11.079  < 2e-16 ***
## Pre_Segvelar.N     -2.4868536  0.5765801  -4.313 1.61e-05 ***
## Post_Segpause       0.7255888  0.1006942   7.206 5.77e-13 ***
## Post_Segvelar.C     0.2048605  0.1834546   1.117 0.264130    
## newgram2ms         -1.3313415  0.1433271  -9.289  < 2e-16 ***
## newgram2p          -1.2322022  0.1016017 -12.128  < 2e-16 ***
## Bin_styleCasual    -0.3305277  0.0933410  -3.541 0.000398 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) sexm   brthyr Pr_Sg. Pr_Sgt Pr_S.N Pst_Sg Ps_S.C
## sexm        -0.026                                                 
## birthyear   -1.000  0.015                                          
## Pr_Sgcrnl.b -0.019  0.009  0.011                                   
## Pre_Segothr -0.016  0.013  0.008  0.865                            
## Pre_Sgvlr.N -0.004  0.005  0.001  0.289  0.308                     
## Post_Segpas -0.012 -0.003  0.011 -0.013 -0.013  0.028              
## Pst_Sgvlr.C -0.004 -0.003  0.003  0.010  0.004 -0.001  0.140       
## newgram2ms   0.000  0.002 -0.004  0.018  0.198  0.067 -0.054  0.024
## newgram2p    0.003  0.015 -0.006 -0.022 -0.033 -0.024  0.005  0.033
## Bin_stylCsl -0.019  0.023  0.017  0.059  0.066  0.043 -0.013 -0.047
##             nwgrm2m nwgrm2p
## sexm                       
## birthyear                  
## Pr_Sgcrnl.b                
## Pre_Segothr                
## Pre_Sgvlr.N                
## Post_Segpas                
## Pst_Sgvlr.C                
## newgram2ms                 
## newgram2p    0.458         
## Bin_stylCsl -0.010  -0.071 
## convergence code: 0
## unable to evaluate scaled gradient
## Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
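
The fixed-effect estimates in this summary are on the log-odds scale, which can be hard to read directly. As a minimal sketch, the estimates and standard errors for a few rows are copied by hand from the output above and exponentiated to give odds ratios with approximate 95% Wald intervals. The object and column names are hypothetical, and given the convergence warning the intervals are illustrative only.

```r
# Estimates and standard errors copied by hand from the mod2.ing summary.
coefs <- data.frame(
  term = c("sexm", "Post_Segpause", "newgram2ms", "newgram2p",
           "Bin_styleCasual"),
  estimate = c(-1.1557525, 0.7255888, -1.3313415, -1.2322022, -0.3305277),
  std_error = c(0.4933465, 0.1006942, 0.1433271, 0.1016017, 0.0933410)
)

# Exponentiate log-odds to odds ratios; Wald limits use the normal quantile.
z_crit <- qnorm(0.975)
coefs$odds_ratio <- exp(coefs$estimate)
coefs$ci_low <- exp(coefs$estimate - z_crit * coefs$std_error)
coefs$ci_high <- exp(coefs$estimate + z_crit * coefs$std_error)

print(coefs[, c("term", "odds_ratio", "ci_low", "ci_high")], digits = 3)
```

An odds ratio below 1 means that level lowers the odds of the variant coded as the outcome, relative to the reference level. For Bin_styleCasual the ratio is roughly 0.72, and its interval lies entirely below 1.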

An ANOVA was conducted to compare the two logistic regressions run for (ING), to see whether the more complex model, which tests for an interaction between grammar and style, fits the data enough better to justify its increased complexity. The difference in deviance between the two models was -7.2861. This does not indicate a significant improvement from the interaction model, and so model two (the model not testing for a grammar-style interaction) will be used to analyze this data set.
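
A comparison of this kind is a likelihood-ratio test: the drop in deviance from the simpler to the more complex model is referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters (two here, for the two interaction terms). The helper below is a hypothetical sketch run on made-up deviances, not on the values from these models. It also flags the case where the more complex model has the higher deviance, which should not happen for nested maximum-likelihood fits and typically points to an optimization problem.

```r
# Hypothetical helper: likelihood-ratio test from two deviances.
# dev_simple  - deviance of the model without the extra terms
# dev_complex - deviance of the model with the extra terms
# df_diff     - number of extra parameters in the complex model
lr_test <- function(dev_simple, dev_complex, df_diff) {
  stat <- dev_simple - dev_complex
  if (stat < 0) {
    # The nested model fits better than the model that contains it.
    warning("complex model has higher deviance; check convergence")
    return(list(statistic = stat, df = df_diff, p_value = NA_real_))
  }
  p_value <- pchisq(stat, df = df_diff, lower.tail = FALSE)
  list(statistic = stat, df = df_diff, p_value = p_value)
}

# Toy values only (NOT taken from mod1.ing or mod2.ing).
toy <- lr_test(dev_simple = 100, dev_complex = 95, df_diff = 2)
print(toy)
```

With fitted lme4 models, calling anova() on the two model objects performs this comparison directly and reports the chi-square statistic, degrees of freedom, and p-value.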