Style-shift values were calculated by subtracting each speaker’s index for Careful speech from their index for Casual speech. The index is the mean rate at which the linguistic variant was used. For (DH), a higher index means the nonstandard (stop) form was used more frequently; for (ING), a higher index means the nonstandard (apical) form was used more frequently (even though it was reported in the opposite way). The sign of a value gives the direction of the style-shift. Positive values indicate that a speaker used more of the nonstandard variant in Casual speech than in Careful speech, while negative values indicate the opposite pattern.
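The subtraction described above can be sketched on a tiny made-up data set. This is only an illustration: the data frame `toy`, its column names, and its values are hypothetical and are not taken from the corpus.

```r
# Hypothetical toy data: one row per token, with a 0/1/2 index value
toy <- data.frame(
  speaker = c("A", "A", "A", "A", "B", "B", "B", "B"),
  style   = c("Casual", "Casual", "Careful", "Careful",
              "Casual", "Casual", "Careful", "Careful"),
  index   = c(2, 1, 1, 0, 0, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Mean index per speaker and style (rows = speakers, columns = styles)
means <- tapply(toy$index, list(toy$speaker, toy$style), mean)

# Style-shift = Casual index - Careful index, scaled by 100
style_shift <- (means[, "Casual"] - means[, "Careful"]) * 100
print(style_shift)
```

In this sketch speaker A shifts in the positive direction (more nonstandard forms in Casual speech) and speaker B in the negative direction.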
Labov found dramatic style-shifting with the (DH) variable, and his analysis emphasizes that the majority of speakers shifted in the positive direction. His modal value fell between 25 and 30. No strong difference by gender appeared. However, Labov noted that women were concentrated among the speakers with the highest style-shift values (7 of the 8 values over 85 belong to women) and that men were concentrated among the speakers with the most negative values (6 of the 8 negative values belong to men).
#working in dplyr
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#opening data file
dh <- read.csv("dhstyle_fin.csv")
#peeking at first lines of data frame
head(dh)
## File Segment Position Code1 Labov Code2 Seg_Start Seg_End
## 1 PH00-1-1-JStevens DH Start 9 NA NA 36.613 36.643
## 2 PH00-1-1-JStevens DH Start 9 NA NA 44.653 44.682
## 3 PH00-1-1-JStevens DH Start 9 NA NA 48.302 48.333
## 4 PH00-1-1-JStevens DH Start 9 NA NA 52.773 52.803
## 5 PH00-1-1-JStevens DH Start 0 NA NA 63.138 63.168
## 6 PH00-1-1-JStevens DH Start 2 NA NA 76.323 76.353
## Word Word_Start Word_End Pre_Seg Pre_Seg2 Pre_Seg_Start Pre_Seg_End
## 1 THE 36.613 36.673 S NA 36.543 36.613
## 2 THIS 44.653 44.792 AH NA 44.453 44.653
## 3 THE 48.302 48.363 S NA 48.273 48.302
## 4 THAT'S 52.773 52.983 sp NA 52.633 52.773
## 5 THE 63.138 63.198 K NA 63.108 63.138
## 6 THE 76.323 76.383 N NA 76.273 76.323
## Post_Seg Post_Seg2 Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1 AH NA 36.643 36.673 1.950 3.590
## 2 AH NA 44.682 44.713 1.940 2.577
## 3 AH NA 48.333 48.363 1.081 6.475
## 4 AE NA 52.803 52.853 1.500 3.333
## 5 AH NA 63.168 63.198 1.620 4.938
## 6 AH NA 76.353 76.383 2.250 3.556
## Age Age2 Birthyear sex Ethnicity School code Gram PrevING PrevWord
## 1 21 NA 1979 m i/r 14 NA NA NA na
## 2 21 NA 1979 m i/r 14 NA NA 9 THE
## 3 21 NA 1979 m i/r 14 NA NA 9 THIS
## 4 21 NA 1979 m i/r 14 NA NA 9 THE
## 5 21 NA 1979 m i/r 14 NA NA 9 THAT'S
## 6 21 NA 1979 m i/r 14 NA NA 0 THE
## X Lag X.1 style Bin_style Or_style
## 1 19.093 36.673 34.910 R NA NA
## 2 8.119 44.792 44.080 C NA NA
## 3 3.571 48.363 46.390 C NA NA
## 4 4.620 52.983 52.460 C NA NA
## 5 10.215 63.198 61.655 C NA NA
## 6 13.185 76.383 73.190 C NA NA
#seeing summary of data frame
summary(dh)
## File Segment Position Code1
## PH92-2-3-JSantori : 924 DH:18974 Start:18974 Min. :0.000
## PH06-2-2-Patrick : 712 1st Qu.:1.000
## PH06-2-7-Samantha : 626 Median :1.000
## PH06-1-6-JMcPhee : 611 Mean :1.578
## PH73-5-2-CKay : 600 3rd Qu.:1.000
## PH82-1-10-MCollins: 600 Max. :9.000
## (Other) :14901
## Labov Code2 Seg_Start Seg_End
## Mode:logical Mode:logical Min. : 1.953 Min. : 2.003
## NA's:18974 NA's:18974 1st Qu.: 718.413 1st Qu.: 718.443
## Median :1383.079 Median :1383.153
## Mean :1461.399 Mean :1461.448
## 3rd Qu.:2102.733 3rd Qu.:2102.765
## Max. :4476.403 Max. :4476.453
##
## Word Word_Start Word_End Pre_Seg
## THE :7128 Min. : 1.953 Min. : 2.093 sp :5560
## THAT :2794 1st Qu.: 718.413 1st Qu.: 718.473 N :2743
## THEY :2652 Median :1383.079 Median :1383.341 Z :1145
## THERE :1185 Mean :1461.399 Mean :1461.554 V :1018
## THIS :1056 3rd Qu.:2102.733 3rd Qu.:2102.810 L : 927
## THAT'S : 864 Max. :4476.403 Max. :4476.533 K : 868
## (Other):3295 (Other):6713
## Pre_Seg2 Pre_Seg_Start Pre_Seg_End Post_Seg
## Mode:logical Min. : 1.893 Min. : 1.953 AH :8252
## NA's:18974 1st Qu.: 718.343 1st Qu.: 718.413 EH :3148
## Median :1382.614 Median :1383.079 EY :2785
## Mean :1461.240 Mean :1461.399 AE :2760
## 3rd Qu.:2102.519 3rd Qu.:2102.733 IY :1603
## Max. :4476.333 Max. :4476.403 IH : 254
## (Other): 172
## Post_Seg2 Post_Seg_Start Post_Seg_End Window
## Mode:logical Min. : 2.003 Min. : 2.063 Min. : 0.470
## NA's:18974 1st Qu.: 718.443 1st Qu.: 718.473 1st Qu.: 1.240
## Median :1383.153 Median :1383.329 Median : 1.530
## Mean :1461.448 Mean :1461.519 Mean : 1.793
## 3rd Qu.:2102.765 3rd Qu.:2102.810 3rd Qu.: 1.931
## Max. :4476.453 Max. :4476.533 Max. :53.157
##
## Vowels_per_Second Age Age2 Birthyear sex
## Min. : 0.061 Min. :18.00 Mode:logical Min. :1895 f:10144
## 1st Qu.: 2.928 1st Qu.:30.00 NA's:18974 1st Qu.:1925 m: 8830
## Median : 4.196 Median :45.00 Median :1944
## Mean : 4.264 Mean :46.91 Mean :1943
## 3rd Qu.: 5.484 3rd Qu.:68.00 3rd Qu.:1962
## Max. :13.725 Max. :78.00 Max. :1984
##
## Ethnicity School code Gram
## i :9781 12 :7059 Mode:logical Mode:logical
## r :4219 16 :4466 NA's:18974 NA's:18974
## r/o : 795 0 :1560
## w/p/J : 588 8 :1465
## i/w/g : 580 11 :1302
## w : 555 9 : 924
## (Other):2456 (Other):2198
## PrevING PrevWord X Lag
## Min. :0.000 THE :6775 Min. : -6.801 Min. : 2.093
## 1st Qu.:1.000 THAT :2606 1st Qu.: 1.143 1st Qu.: 718.473
## Median :1.000 THEY :2546 Median : 2.810 Median :1383.341
## Mean :1.571 THERE :1119 Mean : 5.130 Mean :1461.554
## 3rd Qu.:1.000 THIS :1000 3rd Qu.: 6.433 3rd Qu.:2102.810
## Max. :9.000 (Other):4906 Max. :344.800 Max. :4476.533
## NA's :1002 NA's : 22 NA's :22
## X.1 style Bin_style Or_style
## Min. : 1.88 C :6548 Mode:logical Mode:logical
## 1st Qu.: 717.05 T :4707 NA's:18974 NA's:18974
## Median :1381.71 R :3167
## Mean :1459.96 N :1793
## 3rd Qu.:2101.21 S :1734
## Max. :4476.32 G : 646
## (Other): 379
# Step 1 - Reorganizing the data frame
#collapsing style codes into 3 styles (car, cas, na) in Bin_style (%in%, not %>%)
dh$Bin_style <- ifelse(dh$style %in% c("C","R","L","S"), "Careful",
ifelse(dh$style %in% c("N","G","K","T"), "Casual", "NA"))
#filtering NAs from Bin_style column so don't appear in table
dh <- filter(dh, !Bin_style=="NA")
#collapsing/reconfiguring Code1 values in "Labov" so they match L's 2001 coding
#that is, creating labov's index so can properly replicate his study
dh$Labov[dh$Code1 %in% c("1")] <- "0"
dh$Labov[dh$Code1 %in% c("2","9")] <- "1"
dh$Labov[dh$Code1 %in% c("0")] <- "2"
#making the values in the Labov column numeric
dh$Labov <- as.numeric(dh$Labov)
head(dh)
## File Segment Position Code1 Labov Code2 Seg_Start Seg_End
## 1 PH00-1-1-JStevens DH Start 9 1 NA 36.613 36.643
## 2 PH00-1-1-JStevens DH Start 9 1 NA 44.653 44.682
## 3 PH00-1-1-JStevens DH Start 9 1 NA 48.302 48.333
## 4 PH00-1-1-JStevens DH Start 9 1 NA 52.773 52.803
## 5 PH00-1-1-JStevens DH Start 0 2 NA 63.138 63.168
## 6 PH00-1-1-JStevens DH Start 2 1 NA 76.323 76.353
## Word Word_Start Word_End Pre_Seg Pre_Seg2 Pre_Seg_Start Pre_Seg_End
## 1 THE 36.613 36.673 S NA 36.543 36.613
## 2 THIS 44.653 44.792 AH NA 44.453 44.653
## 3 THE 48.302 48.363 S NA 48.273 48.302
## 4 THAT'S 52.773 52.983 sp NA 52.633 52.773
## 5 THE 63.138 63.198 K NA 63.108 63.138
## 6 THE 76.323 76.383 N NA 76.273 76.323
## Post_Seg Post_Seg2 Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1 AH NA 36.643 36.673 1.950 3.590
## 2 AH NA 44.682 44.713 1.940 2.577
## 3 AH NA 48.333 48.363 1.081 6.475
## 4 AE NA 52.803 52.853 1.500 3.333
## 5 AH NA 63.168 63.198 1.620 4.938
## 6 AH NA 76.353 76.383 2.250 3.556
## Age Age2 Birthyear sex Ethnicity School code Gram PrevING PrevWord
## 1 21 NA 1979 m i/r 14 NA NA NA na
## 2 21 NA 1979 m i/r 14 NA NA 9 THE
## 3 21 NA 1979 m i/r 14 NA NA 9 THIS
## 4 21 NA 1979 m i/r 14 NA NA 9 THE
## 5 21 NA 1979 m i/r 14 NA NA 9 THAT'S
## 6 21 NA 1979 m i/r 14 NA NA 0 THE
## X Lag X.1 style Bin_style Or_style
## 1 19.093 36.673 34.910 R Careful NA
## 2 8.119 44.792 44.080 C Careful NA
## 3 3.571 48.363 46.390 C Careful NA
## 4 4.620 52.983 52.460 C Careful NA
## 5 10.215 63.198 61.655 C Careful NA
## 6 13.185 76.383 73.190 C Careful NA
summary(dh)
## File Segment Position Code1
## PH92-2-3-JSantori : 924 DH:18974 Start:18974 Min. :0.000
## PH06-2-2-Patrick : 712 1st Qu.:1.000
## PH06-2-7-Samantha : 626 Median :1.000
## PH06-1-6-JMcPhee : 611 Mean :1.578
## PH73-5-2-CKay : 600 3rd Qu.:1.000
## PH82-1-10-MCollins: 600 Max. :9.000
## (Other) :14901
## Labov Code2 Seg_Start Seg_End
## Min. :0.0000 Mode:logical Min. : 1.953 Min. : 2.003
## 1st Qu.:0.0000 NA's:18974 1st Qu.: 718.413 1st Qu.: 718.443
## Median :0.0000 Median :1383.079 Median :1383.153
## Mean :0.5898 Mean :1461.399 Mean :1461.448
## 3rd Qu.:1.0000 3rd Qu.:2102.733 3rd Qu.:2102.765
## Max. :2.0000 Max. :4476.403 Max. :4476.453
##
## Word Word_Start Word_End Pre_Seg
## THE :7128 Min. : 1.953 Min. : 2.093 sp :5560
## THAT :2794 1st Qu.: 718.413 1st Qu.: 718.473 N :2743
## THEY :2652 Median :1383.079 Median :1383.341 Z :1145
## THERE :1185 Mean :1461.399 Mean :1461.554 V :1018
## THIS :1056 3rd Qu.:2102.733 3rd Qu.:2102.810 L : 927
## THAT'S : 864 Max. :4476.403 Max. :4476.533 K : 868
## (Other):3295 (Other):6713
## Pre_Seg2 Pre_Seg_Start Pre_Seg_End Post_Seg
## Mode:logical Min. : 1.893 Min. : 1.953 AH :8252
## NA's:18974 1st Qu.: 718.343 1st Qu.: 718.413 EH :3148
## Median :1382.614 Median :1383.079 EY :2785
## Mean :1461.240 Mean :1461.399 AE :2760
## 3rd Qu.:2102.519 3rd Qu.:2102.733 IY :1603
## Max. :4476.333 Max. :4476.403 IH : 254
## (Other): 172
## Post_Seg2 Post_Seg_Start Post_Seg_End Window
## Mode:logical Min. : 2.003 Min. : 2.063 Min. : 0.470
## NA's:18974 1st Qu.: 718.443 1st Qu.: 718.473 1st Qu.: 1.240
## Median :1383.153 Median :1383.329 Median : 1.530
## Mean :1461.448 Mean :1461.519 Mean : 1.793
## 3rd Qu.:2102.765 3rd Qu.:2102.810 3rd Qu.: 1.931
## Max. :4476.453 Max. :4476.533 Max. :53.157
##
## Vowels_per_Second Age Age2 Birthyear sex
## Min. : 0.061 Min. :18.00 Mode:logical Min. :1895 f:10144
## 1st Qu.: 2.928 1st Qu.:30.00 NA's:18974 1st Qu.:1925 m: 8830
## Median : 4.196 Median :45.00 Median :1944
## Mean : 4.264 Mean :46.91 Mean :1943
## 3rd Qu.: 5.484 3rd Qu.:68.00 3rd Qu.:1962
## Max. :13.725 Max. :78.00 Max. :1984
##
## Ethnicity School code Gram
## i :9781 12 :7059 Mode:logical Mode:logical
## r :4219 16 :4466 NA's:18974 NA's:18974
## r/o : 795 0 :1560
## w/p/J : 588 8 :1465
## i/w/g : 580 11 :1302
## w : 555 9 : 924
## (Other):2456 (Other):2198
## PrevING PrevWord X Lag
## Min. :0.000 THE :6775 Min. : -6.801 Min. : 2.093
## 1st Qu.:1.000 THAT :2606 1st Qu.: 1.143 1st Qu.: 718.473
## Median :1.000 THEY :2546 Median : 2.810 Median :1383.341
## Mean :1.571 THERE :1119 Mean : 5.130 Mean :1461.554
## 3rd Qu.:1.000 THIS :1000 3rd Qu.: 6.433 3rd Qu.:2102.810
## Max. :9.000 (Other):4906 Max. :344.800 Max. :4476.533
## NA's :1002 NA's : 22 NA's :22
## X.1 style Bin_style Or_style
## Min. : 1.88 C :6548 Length:18974 Mode:logical
## 1st Qu.: 717.05 T :4707 Class :character NA's:18974
## Median :1381.71 R :3167 Mode :character
## Mean :1459.96 N :1793
## 3rd Qu.:2101.21 S :1734
## Max. :4476.32 G : 646
## (Other): 379
#checking what's in Labov column (hopefully 0,1,2)
unique(dh$Labov)
## [1] 1 2 0
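The recoding in Step 1 can also be written as a single lookup, which keeps the whole mapping in one place. The sketch below uses a short made-up vector `code1`; only the mapping itself (1 to 0, 2 and 9 to 1, 0 to 2) mirrors the assignments above, and the names `lookup` and `labov_index` are hypothetical.

```r
# Hypothetical raw codes as they might appear in a Code1-style column
code1 <- c(9, 9, 0, 2, 1, 1)

# Lookup table: names are raw codes, values are index scores
# (1 -> 0, 2 and 9 -> 1, 0 -> 2, mirroring the assignments above)
lookup <- c("1" = 0, "2" = 1, "9" = 1, "0" = 2)

# Index by name; unname() drops the code labels from the result
labov_index <- unname(lookup[as.character(code1)])
print(labov_index)
```

Because the result is already numeric, no separate `as.numeric()` step is needed with this approach.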
# Step 2 - begin replicating Labov 2001 by getting mean rates for data across Careful & Casual styles
#turning into dplyr data frame
dh <- as_tibble(dh)
#say what data frame
dh.style.fin <- dh %>%
#will create data in long format
group_by(Bin_style) %>%
#summarize; add dh stopping rate and N columns
summarise(non_stop=mean(Labov),N=n())
View(dh.style.fin)
# Step 3 - get mean rates across stylistic category by *individual speaker*
#turning into dplyr data frame
dh <- as_tibble(dh)
#say what data frame
dh.spkr.style <- dh %>%
#will create data in long format
group_by(File, Bin_style) %>%
#summarize; add non-(dh) stopping rate and N columns
summarise(non_stop=mean(Labov),N=n())
View(dh.spkr.style)
# Step 4 - getting a column with size of each individual speaker's style shift
#turning into dplyr data frame
dh <- as_tibble(dh)
#say what data frame
dh.spkr.style2 <- dh %>%
#will create data in long format
group_by(File, Bin_style, sex, Age, Birthyear) %>%
#summarize; add non-(dh) stopping rate and N columns
summarise(non_stop=mean(Labov),N=n())
View(dh.spkr.style2)
#opening reshape2 package (convert between long and wide format data)
library(reshape2)
#making new dataframe - each spkr is 1 row and car and cas rates are 2 separate columns
dh.spkr.style3 <- dcast(dh.spkr.style2, File + sex + Age + Birthyear ~ Bin_style,
value.var = "non_stop")
#create style_shift column by subtracting Careful mean from Casual mean
dh.spkr.style3$DH_style_shift <- dh.spkr.style3$Casual - dh.spkr.style3$Careful
#multiplying style_shift values by 100 so they align with Labov's values
dh.spkr.style3$DH_style_shift <- dh.spkr.style3$DH_style_shift * 100
#create Age2 column that groups speakers into younger (<30) or older (≥30)
dh.spkr.style3$Age2 <- ifelse(dh.spkr.style3$Age < 30, "younger", "older")
View(dh.spkr.style3)
#create N column with the total number of DH occurrences per speaker
#say what data frame
dh.spkr.style4 <- dh %>%
#group by speaker so n() counts tokens per speaker, not overall
group_by(File) %>%
#summarize; add N column
summarise(N=n())
# Step 5 - replicating Labov figure 5.2 (histogram, Cas index - Car index, for ind spkr)
#opening ggplot2
library(ggplot2)
#making the graph - histogram with degree of ind style shift by gender
ggplot(dh.spkr.style3, aes(DH_style_shift)) +
#binwidth sets the bin size (stat_bin is not a geom_histogram argument; 5 is the assumed intent)
geom_histogram(binwidth=5, aes(fill=sex), position="dodge") +
ggtitle("Style-shifting of (DH) variable in Philadelphia")
Unlike Labov, I did not find dramatic style-shifting for (DH). My modal value fell around 5. However, like Labov I found no strong difference by gender. It is interesting to observe that in my data, women are concentrated among speakers with the most extreme negative values (values below -5) and men among those with the most extreme positive values (values of 10 and greater). This relative concentration is the opposite of what Labov found, as he noted women were concentrated among speakers with the most positive values and men among those with the most negative. It is important to note that I had fewer subjects than Labov. A larger sample size would increase confidence in the findings.
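The concentration of each sex in the tails of the distribution can be checked with a simple cross-tabulation rather than by eye. The following is a minimal sketch on invented values; the data frame `spk` and its numbers are hypothetical, and the cut points (-5 and 10) are taken from the thresholds mentioned above.

```r
# Hypothetical per-speaker style-shift values (not the study's data)
spk <- data.frame(
  sex   = c("f", "f", "f", "m", "m", "m", "f", "m"),
  shift = c(-12, -8, 3, 15, 11, 4, 6, -2),
  stringsAsFactors = FALSE
)

# Label each speaker by region of the distribution
# right = FALSE makes the intervals [-Inf, -5), [-5, 10), [10, Inf)
spk$region <- cut(spk$shift,
                  breaks = c(-Inf, -5, 10, Inf),
                  labels = c("below -5", "middle", "10 and above"),
                  right  = FALSE)

# Cross-tabulate sex against region
tab <- table(spk$sex, spk$region)
print(tab)
```

With real data, the same table would show at a glance whether one sex dominates either extreme.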
Labov found no overall-age effect, indicated by the overall regression line (straight solid line in the middle of the graph). Labov discounts the significance of the partial regressions for males and females (the dashed lines, which show a slight tendency for males to shift more as they age and for females to shift less), given that the correlation of age with (DH) shift for males is .08 and for females is -.03. He draws attention to a clump of young speakers in the bottom left corner of the graph that display behavior contrary to the prevalent societal norm, in that these speakers have very negative (DH) shifts. No comparable negative clustering occurs among the older speakers. Labov interprets this pattern as affirming the fact that children begin acquiring a community’s sociolinguistic norms at an early age (albeit gradually). He goes on to claim that age and socioeconomic class influence this acquisition.
# Step 6 - replicating Labov fig 5.5 - distribution of style shift by age and gender
#making the graph - scatterplot with dist of ind style shift by age and gender
ggplot(dh.spkr.style3, aes(Age, DH_style_shift)) + geom_point(aes(color=sex)) +
stat_smooth(method = "lm") + ggtitle("Distribution of (DH) style-shift by age and sex")
My graph of the distribution of (DH) style-shift by age and sex closely resembles Labov’s. The primary difference lies in the range of values on each graph’s y-axis (that is, the degree of the (DH) style-shift). Since I did not find as dramatic a style-shift for (DH) as Labov did, it makes sense that my range is smaller. The dramatic style-shifting observed by Labov occurred only in the positive direction, not the negative. It is interesting that the lower end of my range is similar to the lower end of Labov’s range; both fall between approximately -30 and -20. This suggests a consistent tendency for speakers to style-shift by decreasing their use of nonstandard variants as they move from Casual to Careful speech. This tendency is fairly intuitive given the salience of the “casual” social meaning accorded to nonstandard variants and the more “formal” meaning accorded to standard forms.
Like Labov, I did not find a strong difference by age or sex. My overall regression line is slightly more sloped than Labov’s, indicating that a speaker’s tendency to style-shift increases slightly with age, but the line is centered on zero and its slope is shallow, suggesting that any age effect is very small and unlikely to be significant. That my regression line is centered on zero and Labov’s on 35 reinforces the different degrees of style-shifting each of us observed. My sample represents an older portion of the population than Labov’s, so it is not possible to gauge adolescent patterns of style-shifting and their divergence from (or convergence with) the community’s normal patterns from my data. It is interesting to note that most of the outlying points of dramatic style-shifting occurred for younger speakers, and that the points become closer to the regression line as age increases.
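The size of an age effect can be quantified rather than read off the plot, for example with the slope of a linear fit and with per-sex correlations of the kind Labov reports. The sketch below runs on invented values; the data frame `spk` is hypothetical and the numbers it prints say nothing about the actual corpus.

```r
# Hypothetical per-speaker data (not the study's data)
spk <- data.frame(
  Age   = c(18, 22, 29, 35, 44, 52, 61, 70),
  sex   = c("f", "m", "f", "m", "f", "m", "f", "m"),
  shift = c(-10, 12, 4, -3, 2, 5, 1, 6),
  stringsAsFactors = FALSE
)

# Overall linear fit: the Age slope estimates the change in shift per year
fit <- lm(shift ~ Age, data = spk)
print(coef(summary(fit)))   # estimate, std. error, t value, p value

# Correlation of age with shift, computed separately for each sex
r_by_sex <- sapply(split(spk, spk$sex), function(d) cor(d$Age, d$shift))
print(r_by_sex)
```

The p value in the `Age` row of the coefficient table is what would support or undercut a claim that the age effect is not significant.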
Labov interprets this distribution of index scores across the nuanced categories of the decision tree as validation of each category’s ability to contribute to identifying stylistic variation. Although he questions the decision tree’s effectiveness, Labov does not think the decision tree could be improved if any existing category were discarded. He notes that (DH) stylistic differences are considerably larger than (ING) stylistic differences (even acknowledging the scale covers twice the range). He points out that for (DH), Soapbox is most differentiated from the mean in Careful speech, and that in Casual speech Narrative and Kids have the highest index scores but all four Casual speech categories are well above the mean level.
# Step 7 - replicating fig 5.7 - stylistic differentiation for 8 cat of dec tree
#turning into dplyr data frame
dh <- as_tibble(dh)
#say what data frame
dh.style.fin2 <- dh %>%
#will create data in long format
group_by(style, Bin_style) %>%
#summarize; add dh stopping rate and N columns
summarise(index=mean(Labov),N=n())
#multiplying index values by 100 so correspond to Labov's values
dh.style.fin2$index <- dh.style.fin2$index * 100
#reordering style values on x-axis so match Labov 2001 order
dh.style.fin2$style <- factor(dh.style.fin2$style, c("S","L","R","C","N","K","T","G"))
View(dh.style.fin2)
#replicating figure 5.7 as a bar graph - index score arr. by orig 8 style values
ggplot(dh.style.fin2, aes(style,index)) +
geom_bar(stat = "identity") +
facet_wrap(~ Bin_style) +
ggtitle("Stylistic Differentiation of (DH) for
eight categories of the Style Decision Tree")
My graph shows a fairly consistent index of stylistic differentiation of (DH) across the nuanced categories of the decision tree, for both Careful and Casual styles. For the Careful styles, as occurred in Labov’s data, the index value for Soapbox diverges most from the other Careful values. However, even this value does not stray dramatically from 60, the index value around which the values for the other stylistic categories fell. Interestingly, in my data the index value for Soapbox is higher than the other Careful index values, while in Labov’s data it was lower than the other values. For the Casual styles, the Kids value diverges most from those of the other style categories, but it also does not stray far beyond 60. It is striking that each stylistic category’s index value fell around 60. This suggests that each category indicates style comparably to the other categories of the decision tree and, as Labov mentioned, does not give a reason to eliminate any category from the decision tree. It also indicates that perhaps another method entirely is needed to better gauge stylistic variation in speech. While the index values for Careful and Casual styles consistently fell around 60 in my data, a clear difference in the mean index value appeared across the Careful and Casual styles in Labov’s data. In Labov’s study, the index values clustered around 45 for the Careful styles and slightly above 90 for the Casual styles. This calls into question the coding practices used in each study and reinforces the need to develop a more reliable method for quantitatively gauging stylistic variation.
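Which category diverges most from the others can be computed directly as a deviation from the mean of its Careful or Casual group. The following is a rough sketch with invented index values; the data frame `cat_idx` is hypothetical, and the group mean here is an unweighted mean of the category indices, which is an assumption rather than Labov's procedure.

```r
# Hypothetical category index values (not the study's results)
cat_idx <- data.frame(
  style     = c("S", "L", "R", "C", "N", "K", "T", "G"),
  Bin_style = rep(c("Careful", "Casual"), each = 4),
  index     = c(70, 55, 60, 58, 62, 68, 59, 61),
  stringsAsFactors = FALSE
)

# Unweighted mean of the category indices within each binary style
cat_idx$group_mean <- ave(cat_idx$index, cat_idx$Bin_style)

# Signed deviation of each category from its group mean
cat_idx$deviation <- cat_idx$index - cat_idx$group_mean

# Category that diverges most within each binary style
most <- sapply(split(cat_idx, cat_idx$Bin_style),
               function(d) d$style[which.max(abs(d$deviation))])
print(most)
```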
Labov notes that the four Careful subcategories are clearly differentiated from the four Casual subcategories in this representation of (DH).
# Step 8 - replicating Labov's fig 5.8 (stylistic diff by sex for 8 cat of dec tree)
#turning into dplyr data frame
dh <- as_tibble(dh)
#say what data frame
dh.style.fin3 <- dh %>%
#will create data in long format
group_by(style, Bin_style, sex) %>%
#summarize; add dh stopping rate and N columns
summarise(index=mean(Labov),N=n())
#multiplying index values by 100 so correspond to Labov's values
dh.style.fin3$index <- dh.style.fin3$index * 100
#reordering style values on x-axis so match Labov 2001 order
dh.style.fin3$style <- factor(dh.style.fin3$style, c("S","L","R","C","N","K","T","G"))
View(dh.style.fin3)
#replicating figure 5.8 as bar graph - index score arr. by orig 8 style values and diff by sex
ggplot(dh.style.fin3, aes(style,index)) +
geom_bar(stat = "identity", aes(fill=sex)) +
#facet_grid alone defines the panels (a preceding facet_wrap would be overridden)
facet_grid(sex ~ Bin_style) +
ggtitle("Stylistic Differentiation of (DH) by sex for
eight categories of the Style Decision Tree")
My graph indicates that (DH) stylistic differentiation varies more across sex than across style category. Within each sex, the index values are similar across the Careful and Casual styles. For women, the index values for Careful speech fall between around 40 and 80, with Soapbox having the highest value (around 80), followed by Response and Careful (each around 50) and finally Language (around 40). The index values for Casual speech for women are similarly spread out and also fall between around 40 and 80: Kids has the highest index value (around 80), followed by Group (around 60), Narrative (around 50) and Tangent (around 40). There is no clear pattern of relatively higher, or lower, index values occurring among more objective, or subjective, categories (“objective” categories including Narrative, Language and Group and “subjective” categories including Kids, Tangent and Careful). Male speakers also differed less across style than they differed from female speakers. For both the Careful and Casual style categories, the male index value falls between 60 and 70. It is very interesting to compare these results to Labov’s, for his primary categorical differentiation of index value occurred across style category, not sex. For Labov, both men and women tended to have lower index values in the Careful category, with values clustering around 40, while in the Casual category speakers of both sexes tended to have higher values, with the women’s values clustering around 80 and the men’s between 100 and 120. Whether sex and style interact remains an open question: male values usually seem higher, but not always.
# Step 9 - replicating Labov fig 5.6 - variable style-shift by size of data set and age
#issue with coding so had to remove
My logistic regression for (DH) did not find Style to be a significant predictor of the speech variant used (p = 0.087 for Casual relative to Careful). My original hypothesis, that the percent frequency of the nonstandard [d] variant would be greater in “casual” than in “careful” contexts, was not supported. I am unable to reject the null hypothesis that stylistic context has no effect on the variant ([dh] or [d]) a speaker realizes. These results are striking, because Labov found dramatic evidence of style-shifting for the (DH) variable with data obtained from the same corpus as mine. Although style was a weaker predictor of speech variant than I anticipated, these findings are not enough to discount the importance of style in speech. It is possible that the lack of style-shifting I found can be attributed to shortcomings of coding with the decision tree method, and that methods that better accommodate both the range of indexical meanings presented by every variable and every individual speaker’s unique use of these meanings to create situationally relevant stances will better predict stylistic variation in speech. Sex and birthyear were found to be significant predictors of the linguistic variant used, which underscores how external (social) factors influence linguistic variation. Labov also found a significant and sizeable gender effect.
#loading lme4 (glm() below comes from base R's stats package; lme4 is only needed for mixed-effects models)
library(lme4)
## Loading required package: Matrix
#accessing data file
dh.lr <- read.csv("dhstyle_fin.csv")
#peeking at first lines of data file
head(dh.lr)
## File Segment Position Code1 Labov Code2 Seg_Start Seg_End
## 1 PH00-1-1-JStevens DH Start 9 NA NA 36.613 36.643
## 2 PH00-1-1-JStevens DH Start 9 NA NA 44.653 44.682
## 3 PH00-1-1-JStevens DH Start 9 NA NA 48.302 48.333
## 4 PH00-1-1-JStevens DH Start 9 NA NA 52.773 52.803
## 5 PH00-1-1-JStevens DH Start 0 NA NA 63.138 63.168
## 6 PH00-1-1-JStevens DH Start 2 NA NA 76.323 76.353
## Word Word_Start Word_End Pre_Seg Pre_Seg2 Pre_Seg_Start Pre_Seg_End
## 1 THE 36.613 36.673 S NA 36.543 36.613
## 2 THIS 44.653 44.792 AH NA 44.453 44.653
## 3 THE 48.302 48.363 S NA 48.273 48.302
## 4 THAT'S 52.773 52.983 sp NA 52.633 52.773
## 5 THE 63.138 63.198 K NA 63.108 63.138
## 6 THE 76.323 76.383 N NA 76.273 76.323
## Post_Seg Post_Seg2 Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1 AH NA 36.643 36.673 1.950 3.590
## 2 AH NA 44.682 44.713 1.940 2.577
## 3 AH NA 48.333 48.363 1.081 6.475
## 4 AE NA 52.803 52.853 1.500 3.333
## 5 AH NA 63.168 63.198 1.620 4.938
## 6 AH NA 76.353 76.383 2.250 3.556
## Age Age2 Birthyear sex Ethnicity School code Gram PrevING PrevWord
## 1 21 NA 1979 m i/r 14 NA NA NA na
## 2 21 NA 1979 m i/r 14 NA NA 9 THE
## 3 21 NA 1979 m i/r 14 NA NA 9 THIS
## 4 21 NA 1979 m i/r 14 NA NA 9 THE
## 5 21 NA 1979 m i/r 14 NA NA 9 THAT'S
## 6 21 NA 1979 m i/r 14 NA NA 0 THE
## X Lag X.1 style Bin_style Or_style
## 1 19.093 36.673 34.910 R NA NA
## 2 8.119 44.792 44.080 C NA NA
## 3 3.571 48.363 46.390 C NA NA
## 4 4.620 52.983 52.460 C NA NA
## 5 10.215 63.198 61.655 C NA NA
## 6 13.185 76.383 73.190 C NA NA
#collapsing 8 stylistic categories into 3 style codes in Bin_style
dh.lr$Bin_style <- ifelse(dh.lr$style %in% c("C","L","R","S"), "Careful",
ifelse(dh.lr$style %in% c("G","K","N","T"), "Casual",
"NA"))
#filtering NAs from Bin_style column so don't appear in table
dh.lr <- filter(dh.lr, !Bin_style=="NA")
#configuring Code2 column for regression (DV only 0 - nstd or 1 - std)
#excluding 9's
#collapsing values in Code2 column
dh.lr$Code2[dh.lr$Code1 %in% c("0")] <- "0"
dh.lr$Code2[dh.lr$Code1 %in% c("1","2")] <- "1"
#making the values in Code2 column numeric
dh.lr$Code2 <- as.numeric(dh.lr$Code2)
#checking values in Code2 column
unique(dh.lr$Code2)
## [1] NA 0 1
#filtering NA's from Code2 column (is.na() tests for missing values directly)
dh.lr <- filter(dh.lr, !is.na(Code2))
#recode Pre_Seg column into 2 categories in Pre_Seg2
dh.lr$Pre_Seg2 <- ifelse(dh.lr$Pre_Seg %in% c("AA","AE","AH","AO","AW","AY","EH",
"ER","EY","IH","IY","OW","OY","UH","UW"),
"vowel", ifelse(dh.lr$Pre_Seg %in% c("br","lg","ls","ns","sp"),
"NA", "consonant"))
#recode Post_Seg column into 2 categories in Post_Seg2
dh.lr$Post_Seg2 <- ifelse(dh.lr$Post_Seg %in% c("AE","AH","AY","EH","EY",
"IH","IY","OW"), "vowel", ifelse(dh.lr$Post_Seg %in% c("sp"), "NA", "consonant"))
#peeking at first lines of file
head(dh.lr)
## File Segment Position Code1 Labov Code2 Seg_Start Seg_End
## 1 PH00-1-1-JStevens DH Start 0 NA 0 63.138 63.168
## 2 PH00-1-1-JStevens DH Start 2 NA 1 76.323 76.353
## 3 PH00-1-1-JStevens DH Start 2 NA 1 87.663 87.763
## 4 PH00-1-1-JStevens DH Start 1 NA 1 96.648 96.708
## 5 PH00-1-1-JStevens DH Start 0 NA 0 102.113 102.142
## 6 PH00-1-1-JStevens DH Start 0 NA 0 117.073 117.123
## Word Word_Start Word_End Pre_Seg Pre_Seg2 Pre_Seg_Start Pre_Seg_End
## 1 THE 63.138 63.198 K consonant 63.108 63.138
## 2 THE 76.323 76.383 N consonant 76.273 76.323
## 3 THE 87.663 87.973 sp NA 87.643 87.663
## 4 THERE 96.648 96.767 OW vowel 96.538 96.648
## 5 THEN 102.113 102.203 sp NA 101.468 102.113
## 6 THEN 117.073 117.212 sp NA 116.403 117.073
## Post_Seg Post_Seg2 Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1 AH vowel 63.168 63.198 1.620 4.938
## 2 AH vowel 76.353 76.383 2.250 3.556
## 3 AH vowel 87.763 87.973 1.520 7.895
## 4 EH vowel 96.708 96.738 1.019 7.851
## 5 EH vowel 102.142 102.173 1.425 4.211
## 6 EH vowel 117.123 117.153 2.445 2.045
## Age Age2 Birthyear sex Ethnicity School code Gram PrevING PrevWord
## 1 21 NA 1979 m i/r 14 NA NA 9 THAT'S
## 2 21 NA 1979 m i/r 14 NA NA 0 THE
## 3 21 NA 1979 m i/r 14 NA NA NA na
## 4 21 NA 1979 m i/r 14 NA NA 2 THE
## 5 21 NA 1979 m i/r 14 NA NA 1 THERE
## 6 21 NA 1979 m i/r 14 NA NA 0 THEN
## X Lag X.1 style Bin_style Or_style
## 1 10.215 63.198 61.655 C Careful NA
## 2 13.185 76.383 73.190 C Careful NA
## 3 4.250 87.973 83.950 R Careful NA
## 4 8.794 96.767 96.165 C Careful NA
## 5 5.436 102.203 102.100 C Careful NA
## 6 15.009 117.212 116.030 C Careful NA
#summary of file contents -> ensure everything in correct format
summary(dh.lr)
## File Segment Position Code1
## PH92-2-3-JSantori : 805 DH:17405 Start:17405 Min. :0.0000
## PH06-2-2-Patrick : 662 1st Qu.:1.0000
## PH06-1-6-JMcPhee : 607 Median :1.0000
## PH73-5-2-CKay : 586 Mean :0.9092
## PH82-1-10-MCollins: 582 3rd Qu.:1.0000
## PH06-2-4-Brooke : 570 Max. :2.0000
## (Other) :13593
## Labov Code2 Seg_Start Seg_End
## Mode:logical Min. :0.0000 Min. : 1.953 Min. : 2.003
## NA's:17405 1st Qu.:1.0000 1st Qu.: 716.133 1st Qu.: 716.183
## Median :1.0000 Median :1381.053 Median :1381.113
## Mean :0.7855 Mean :1461.255 Mean :1461.304
## 3rd Qu.:1.0000 3rd Qu.:2100.763 3rd Qu.:2100.812
## Max. :1.0000 Max. :4476.403 Max. :4476.453
##
## Word Word_Start Word_End Pre_Seg
## THE :6727 Min. : 1.953 Min. : 2.093 sp :5284
## THEY :2596 1st Qu.: 716.133 1st Qu.: 716.223 N :2486
## THAT :2494 Median :1381.053 Median :1381.173 V : 924
## THERE :1043 Mean :1461.255 Mean :1461.411 Z : 863
## THIS : 963 3rd Qu.:2100.763 3rd Qu.:2100.961 L : 838
## THAT'S : 621 Max. :4476.403 Max. :4476.533 ER : 779
## (Other):2961 (Other):6231
## Pre_Seg2 Pre_Seg_Start Pre_Seg_End Post_Seg
## Length:17405 Min. : 1.893 Min. : 1.953 AH :7491
## Class :character 1st Qu.: 716.078 1st Qu.: 716.133 EH :2919
## Mode :character Median :1380.818 Median :1381.053 EY :2728
## Mean :1461.095 Mean :1461.255 AE :2338
## 3rd Qu.:2100.633 3rd Qu.:2100.763 IY :1538
## Max. :4476.333 Max. :4476.403 IH : 226
## (Other): 165
## Post_Seg2 Post_Seg_Start Post_Seg_End Window
## Length:17405 Min. : 2.003 Min. : 2.063 Min. : 0.470
## Class :character 1st Qu.: 716.183 1st Qu.: 716.223 1st Qu.: 1.249
## Mode :character Median :1381.113 Median :1381.143 Median : 1.539
## Mean :1461.304 Mean :1461.377 Mean : 1.782
## 3rd Qu.:2100.812 3rd Qu.:2100.893 3rd Qu.: 1.930
## Max. :4476.453 Max. :4476.533 Max. :53.157
##
## Vowels_per_Second Age Age2 Birthyear sex
## Min. : 0.061 Min. :18.00 Mode:logical Min. :1895 f:9569
## 1st Qu.: 2.941 1st Qu.:29.00 NA's:17405 1st Qu.:1925 m:7836
## Median : 4.167 Median :45.00 Median :1944
## Mean : 4.249 Mean :46.63 Mean :1943
## 3rd Qu.: 5.446 3rd Qu.:65.00 3rd Qu.:1962
## Max. :13.725 Max. :78.00 Max. :1984
##
## Ethnicity School code Gram
## i :8765 12 :6480 Mode:logical Mode:logical
## r :3974 16 :4248 NA's:17405 NA's:17405
## r/o : 760 0 :1495
## w/p/J : 570 8 :1311
## w : 521 11 :1184
## r/p : 515 9 : 805
## (Other):2300 (Other):1882
## PrevING PrevWord X Lag
## Min. :0.000 THE :6249 Min. : -6.801 Min. : 2.093
## 1st Qu.:1.000 THEY :2399 1st Qu.: 1.149 1st Qu.: 716.223
## Median :1.000 THAT :2364 Median : 2.800 Median :1381.173
## Mean :1.499 THERE :1019 Mean : 5.116 Mean :1461.411
## 3rd Qu.:1.000 THIS : 923 3rd Qu.: 6.411 3rd Qu.:2100.961
## Max. :9.000 (Other):4435 Max. :344.800 Max. :4476.533
## NA's :913 NA's : 16 NA's :16
## X.1 style Bin_style Or_style
## Min. : 1.88 C :6000 Length:17405 Mode:logical
## 1st Qu.: 714.64 T :4414 Class :character NA's:17405
## Median :1379.53 R :2860 Mode :character
## Mean :1459.83 N :1619
## 3rd Qu.:2099.71 S :1590
## Max. :4476.32 G : 579
## (Other): 343
#creating the model (9's excluded)
mod.dh <- glm(Code2 ~ sex + Birthyear + Pre_Seg2 + Post_Seg2 + Bin_style,
dh.lr, family = "binomial")
#seeing the results of the model
summary(mod.dh)
##
## Call:
## glm(formula = Code2 ~ sex + Birthyear + Pre_Seg2 + Post_Seg2 +
## Bin_style, family = "binomial", data = dh.lr)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3142 0.4082 0.5557 0.7362 1.1207
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.368e+01 1.970e+02 -0.120 0.9043
## sexm -5.196e-01 3.805e-02 -13.658 <2e-16 ***
## Birthyear 1.814e-02 8.815e-04 20.582 <2e-16 ***
## Pre_Seg2NA -5.969e-01 4.277e-02 -13.956 <2e-16 ***
## Pre_Seg2vowel -9.650e-02 5.072e-02 -1.903 0.0571 .
## Post_Seg2NA 6.787e-02 2.274e+02 0.000 0.9998
## Post_Seg2vowel -9.776e+00 1.970e+02 -0.050 0.9604
## Bin_styleCasual 6.697e-02 3.918e-02 1.709 0.0874 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 18098 on 17404 degrees of freedom
## Residual deviance: 17205 on 17397 degrees of freedom
## AIC: 17221
##
## Number of Fisher Scoring iterations: 10
The data represented in the following graphs and analyses are a subset of the original (ING) data frame that contains only (ING) progressives. This subset was used in place of the entire dataset to keep the replication consistent with Labov, who notes on page 94 of Labov 2001 that he analyzed only data from progressives.
Labov found less pronounced style-shifting for (ING) than he did for (DH). His modal value for (ING) fell close to zero. Once again, while some style-shift values occur below zero, the majority of speakers had positive style-shift values.
#working in dplyr
library(dplyr)
#opening data file
ing <- read.csv("ingstyle_fin.csv")
#peeking at first lines of data frame
head(ing)
## File Segment Position code Gram Seg_Start Seg_End
## 1 PH00-1-1-JStevens IH0 End 0 o 85.063 85.102
## 2 PH00-1-1-JStevens IH0 End 0 p 112.708 112.738
## 3 PH00-1-1-JStevens IH0 End 1 p 114.648 114.688
## 4 PH00-1-1-JStevens IH0 End 0 p 127.878 127.907
## 5 PH00-1-1-JStevens IH0 End 0 o 147.712 147.743
## 6 PH00-1-1-JStevens IH0 End 1 n 148.553 148.603
## word Word_Start Word_End Pre_Seg Pre_Seg_Start Pre_Seg_End
## 1 WORKING 84.912 85.132 K 85.023 85.063
## 2 FIXING 112.468 112.768 S 112.658 112.708
## 3 NETWORKING 114.308 114.798 K 114.588 114.648
## 4 WORKING 127.687 127.937 K 127.808 127.878
## 5 WAITING 147.593 147.773 T 147.682 147.712
## 6 OPENING 148.413 148.653 N 148.522 148.553
## Post_Seg Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1 S 85.132 85.273 1.280 5.469
## 2 K 112.768 112.808 3.060 2.614
## 3 sp 114.798 115.138 3.635 2.201
## 4 AW 127.937 128.028 1.860 3.226
## 5 T 147.773 147.812 1.310 8.397
## 6 sp 148.653 148.673 1.430 6.993
## preseg_place preseg_manner folseg_place folseg_manner sex birthyear age
## 1 velar stop/aff coronal fricative m 1979 21
## 2 coronal fricative velar stop/aff m 1979 21
## 3 velar stop/aff none pause m 1979 21
## 4 velar stop/aff velar vowel m 1979 21
## 5 coronal stop/aff coronal stop/aff m 1979 21
## 6 coronal nasal none pause m 1979 21
## school school.calc school.cat newgram newgram2 PrevING PrevWord
## 1 14 2 Some college p NA 1 na
## 2 14 2 Some college g NA 0 WORKING
## 3 14 2 Some college g NA 0 FIXING
## 4 14 2 Some college p NA 1 NETWORKING
## 5 14 2 Some college p NA 0 WORKING
## 6 14 2 Some college r NA 0 WAITING
## PrevGram Lag subtlex.count start end style Bin_style Or_style
## 1 <NA> 1.409 12775 84.912 83.950 R NA NA
## 2 p 27.636 533 112.468 107.275 C NA NA
## 3 g 2.030 27 114.308 107.275 C NA NA
## 4 g 13.139 12775 127.687 126.505 C NA NA
## 5 p 8.600 10767 147.593 146.840 C NA NA
## 6 p 0.880 1960 148.413 146.840 C NA NA
#seeing summary of data frame
summary(ing)
## File Segment Position code
## PH06-2-2-Patrick : 171 IH0:3816 End:3816 Min. :0.0000
## PH82-1-10-MCollins : 163 1st Qu.:0.0000
## PH06-1-6-JMcPhee : 151 Median :0.0000
## PH92-2-3-JSantori : 142 Mean :0.4267
## PH06-2-5-Sophia : 141 3rd Qu.:1.0000
## PH79-3-5-VSarsparilla: 141 Max. :1.0000
## (Other) :2907 NA's :17
## Gram Seg_Start Seg_End word
## a: 102 Min. : 9.493 Min. : 9.542 SOMETHING: 329
## d: 49 1st Qu.: 748.056 1st Qu.: 748.086 GOING : 285
## g: 188 Median :1436.973 Median :1437.013 DOING : 165
## n: 241 Mean :1498.944 Mean :1498.984 GETTING : 150
## o:1722 3rd Qu.:2204.564 3rd Qu.:2204.598 BEING : 114
## p:1085 Max. :4469.943 Max. :4469.993 COMING : 109
## s: 429 (Other) :2664
## Word_Start Word_End Pre_Seg Pre_Seg_Start
## Min. : 9.373 Min. : 9.642 K : 570 Min. : 9.413
## 1st Qu.: 747.867 1st Qu.: 748.124 T : 463 1st Qu.: 747.979
## Median :1436.683 Median :1437.118 TH : 431 Median :1436.832
## Mean :1498.694 Mean :1499.053 OW : 380 Mean :1498.868
## 3rd Qu.:2204.352 3rd Qu.:2204.640 N : 235 3rd Qu.:2204.477
## Max. :4469.703 Max. :4470.052 V : 229 Max. :4469.793
## (Other):1508
## Pre_Seg_End Post_Seg Post_Seg_Start Post_Seg_End
## Min. : 9.493 sp : 849 Min. : 9.642 Min. : 9.782
## 1st Qu.: 748.056 AH : 525 1st Qu.: 748.124 1st Qu.: 748.200
## Median :1436.973 DH : 337 Median :1437.118 Median :1437.208
## Mean :1498.944 T : 248 Mean :1499.053 Mean :1499.164
## 3rd Qu.:2204.564 IH : 134 3rd Qu.:2204.640 3rd Qu.:2204.678
## Max. :4469.943 W : 122 Max. :4470.052 Max. :4470.133
## (Other):1601
## Window Vowels_per_Second preseg_place preseg_manner
## Min. : 0.570 Min. : 0.148 coronal:2139 fricative : 827
## 1st Qu.: 1.300 1st Qu.: 3.515 labial :1019 liquid/glide: 246
## Median : 1.609 Median : 4.790 velar : 658 nasal : 405
## Mean : 1.812 Mean : 4.837 stop/aff :1359
## 3rd Qu.: 1.970 3rd Qu.: 6.034 vowel : 979
## Max. :33.877 Max. :13.115
##
## folseg_place folseg_manner sex birthyear
## coronal:1405 fricative : 559 f:1951 Min. :1895
## labial : 517 liquid/glide: 504 m:1865 1st Qu.:1930
## none : 863 nasal : 146 Median :1944
## velar :1031 pause : 863 Mean :1947
## stop/aff : 581 3rd Qu.:1970
## vowel :1163 Max. :1984
##
## age school school.calc school.cat
## Min. :18.00 Min. : 0.00 Min. :-12.0000 Just HS :1447
## 1st Qu.:29.00 1st Qu.:11.00 1st Qu.: -1.0000 Not HS :1012
## Median :43.00 Median :12.00 Median : 0.0000 Some college:1180
## Mean :45.03 Mean :12.03 Mean : 0.0294 NA's : 177
## 3rd Qu.:62.00 3rd Qu.:14.00 3rd Qu.: 2.0000
## Max. :78.00 Max. :16.00 Max. : 4.0000
## NA's :177 NA's :177
## newgram newgram2 PrevING PrevWord
## exclude: 18 Mode:logical Min. :0.0000 na : 629
## g : 625 NA's:3816 1st Qu.:0.0000 SOMETHING: 266
## m : 138 Median :1.0000 GOING : 241
## p :2279 Mean :0.5169 DOING : 142
## r : 324 3rd Qu.:1.0000 GETTING : 124
## s : 432 Max. :1.0000 (Other) :2397
## NA's :30 NA's : 17
## PrevGram Lag subtlex.count start
## exclude: 16 Min. : -2.666 Min. : 0 Min. : 9.373
## g : 517 1st Qu.: 3.257 1st Qu.: 1960 1st Qu.: 747.867
## m : 112 Median : 9.977 Median : 9730 Median :1436.683
## p :1921 Mean : 20.314 Mean : 24886 Mean :1498.694
## r : 257 3rd Qu.: 26.028 3rd Qu.: 26878 3rd Qu.:2204.352
## s : 346 Max. :401.690 Max. :108288 Max. :4469.703
## NA's : 647 NA's :17
## end style Bin_style Or_style
## Min. : 9.04 C :1212 Mode:logical Mode:logical
## 1st Qu.: 746.63 T : 922 NA's:3816 NA's:3816
## Median :1435.05 N : 665
## Mean :1497.00 R : 475
## 3rd Qu.:2203.72 S : 352
## Max. :4468.66 G : 128
## (Other): 62
# Step 1 - Reorganizing the data frame
#collapsing style codes into 3 styles (car, cas, na) in Bin_style (%in%, not %>%)
ing$Bin_style <- ifelse(ing$style %in% c("C","R","L","S"), "Careful",
ifelse(ing$style %in% c("N","G","K","T"), "Casual", "NA"))
#filter so rows with a missing (NA) value in the code column are taken out
ing <- filter(ing, !is.na(code))
#filter so tokens without a Careful/Casual label (the string "NA" assigned above) are taken out
ing <- filter(ing, Bin_style != "NA")
#seeing values in "Bin_style" column
unique(ing$Bin_style)
## [1] "Careful" "Casual"
#subsetting data so only working with progressives (like Labov)
ing.prog <- subset(ing, Gram %in% c("o"))
View(ing.prog)
head(ing.prog)
## File Segment Position code Gram Seg_Start Seg_End word
## 1 PH00-1-1-JStevens IH0 End 0 o 85.063 85.102 WORKING
## 5 PH00-1-1-JStevens IH0 End 0 o 147.712 147.743 WAITING
## 7 PH00-1-1-JStevens IH0 End 0 o 152.518 152.548 MAKING
## 10 PH00-1-1-JStevens IH0 End 0 o 331.826 331.866 WORKING
## 12 PH00-1-1-JStevens IH0 End 0 o 353.953 353.983 DOING
## 17 PH00-1-1-JStevens IH0 End 0 o 454.528 454.558 THROWING
## Word_Start Word_End Pre_Seg Pre_Seg_Start Pre_Seg_End Post_Seg
## 1 84.912 85.132 K 85.023 85.063 S
## 5 147.593 147.773 T 147.682 147.712 T
## 7 152.328 152.578 K 152.467 152.518 N
## 10 331.555 331.936 K 331.755 331.826 sp
## 12 353.823 354.042 UW 353.853 353.953 G
## 17 454.248 454.598 OW 454.418 454.528 B
## Post_Seg_Start Post_Seg_End Window Vowels_per_Second preseg_place
## 1 85.132 85.273 1.280 5.469 velar
## 5 147.773 147.812 1.310 8.397 coronal
## 7 152.578 152.668 1.260 6.349 velar
## 10 331.936 332.066 1.682 2.973 velar
## 12 354.042 354.073 2.576 1.941 labial
## 17 454.598 454.648 2.370 3.376 labial
## preseg_manner folseg_place folseg_manner sex birthyear age school
## 1 stop/aff coronal fricative m 1979 21 14
## 5 stop/aff coronal stop/aff m 1979 21 14
## 7 stop/aff coronal nasal m 1979 21 14
## 10 stop/aff none pause m 1979 21 14
## 12 vowel velar stop/aff m 1979 21 14
## 17 vowel labial stop/aff m 1979 21 14
## school.calc school.cat newgram newgram2 PrevING PrevWord PrevGram
## 1 2 Some college p NA 1 na <NA>
## 5 2 Some college p NA 0 WORKING p
## 7 2 Some college p NA 1 OPENING r
## 10 2 Some college p NA 0 HANGING p
## 12 2 Some college p NA 0 SOMETHING s
## 17 2 Some college p NA 1 CALLING p
## Lag subtlex.count start end style Bin_style Or_style
## 1 1.409 12775 84.912 83.950 R Careful NA
## 5 8.600 10767 147.593 146.840 C Careful NA
## 7 3.925 11349 152.328 151.035 C Careful NA
## 10 6.584 12775 331.555 330.573 R Careful NA
## 12 12.845 52492 353.823 353.310 T Casual NA
## 17 2.041 1488 454.248 450.665 C Careful NA
summary(ing.prog)
## File Segment Position code
## PH82-1-10-MCollins : 81 IH0:1705 End:1705 Min. :0.0000
## PH06-2-5-Sophia : 78 1st Qu.:0.0000
## PH79-3-5-VSarsparilla: 68 Median :0.0000
## PH10-2-3-Vince : 67 Mean :0.3501
## PH92-2-3-JSantori : 67 3rd Qu.:1.0000
## PH06-1-4-KSwanson : 63 Max. :1.0000
## (Other) :1281
## Gram Seg_Start Seg_End word
## a: 0 Min. : 9.493 Min. : 9.542 GOING : 193
## d: 0 1st Qu.: 786.113 1st Qu.: 786.142 DOING : 137
## g: 0 Median :1436.123 Median :1436.153 GETTING: 101
## n: 0 Mean :1489.813 Mean :1489.852 SAYING : 88
## o:1705 3rd Qu.:2185.921 3rd Qu.:2185.951 TALKING: 81
## p: 0 Max. :4469.943 Max. :4469.993 COMING : 66
## s: 0 (Other):1039
## Word_Start Word_End Pre_Seg Pre_Seg_Start
## Min. : 9.373 Min. : 9.642 K :334 Min. : 9.413
## 1st Qu.: 785.922 1st Qu.: 786.262 T :256 1st Qu.: 786.022
## Median :1436.023 Median :1436.183 OW :232 Median :1436.093
## Mean :1489.577 Mean :1489.918 UW :141 Mean :1489.732
## 3rd Qu.:2185.721 3rd Qu.:2186.021 EY :139 3rd Qu.:2185.741
## Max. :4469.703 Max. :4470.052 V :100 Max. :4469.793
## (Other):503
## Pre_Seg_End Post_Seg Post_Seg_Start Post_Seg_End
## Min. : 9.493 sp :331 Min. : 9.642 Min. : 9.782
## 1st Qu.: 786.113 AH :265 1st Qu.: 786.262 1st Qu.: 786.292
## Median :1436.123 DH :145 Median :1436.183 Median :1436.393
## Mean :1489.813 T :130 Mean :1489.918 Mean :1490.025
## 3rd Qu.:2185.921 AA : 70 3rd Qu.:2186.021 3rd Qu.:2186.141
## Max. :4469.943 IH : 62 Max. :4470.052 Max. :4470.133
## (Other):702
## Window Vowels_per_Second preseg_place preseg_manner
## Min. : 0.610 Min. : 0.148 coronal:743 fricative :162
## 1st Qu.: 1.231 1st Qu.: 3.774 labial :591 liquid/glide: 89
## Median : 1.510 Median : 5.040 velar :371 nasal :150
## Mean : 1.761 Mean : 5.086 stop/aff :710
## 3rd Qu.: 1.880 3rd Qu.: 6.349 vowel :594
## Max. :33.877 Max. :13.115
##
## folseg_place folseg_manner sex birthyear age
## coronal:611 fricative :237 f:940 Min. :1895 Min. :18.00
## labial :252 liquid/glide:207 m:765 1st Qu.:1930 1st Qu.:29.00
## none :336 nasal : 86 Median :1944 Median :41.00
## velar :506 pause :336 Mean :1946 Mean :44.63
## stop/aff :271 3rd Qu.:1965 3rd Qu.:62.00
## vowel :568 Max. :1984 Max. :78.00
##
## school school.calc school.cat newgram
## Min. : 0.00 Min. :-12.000 Just HS :672 exclude: 0
## 1st Qu.:11.00 1st Qu.: -1.000 Not HS :462 g : 0
## Median :12.00 Median : 0.000 Some college:482 m : 0
## Mean :11.85 Mean : -0.146 NA's : 89 p :1705
## 3rd Qu.:13.00 3rd Qu.: 1.000 r : 0
## Max. :16.00 Max. : 4.000 s : 0
## NA's :89 NA's :89
## newgram2 PrevING PrevWord PrevGram
## Mode:logical Min. :0.0000 na : 353 exclude: 8
## NA's:1705 1st Qu.:0.0000 GOING : 94 g :160
## Median :1.0000 SOMETHING: 85 m : 29
## Mean :0.5234 DOING : 71 p :950
## 3rd Qu.:1.0000 GETTING : 53 r : 86
## Max. :1.0000 (Other) :1040 s :110
## NA's :16 NA's : 9 NA's :362
## Lag subtlex.count start end
## Min. : -2.666 Min. : 0 Min. : 9.373 Min. : 9.04
## 1st Qu.: 3.328 1st Qu.: 2735 1st Qu.: 785.922 1st Qu.: 785.31
## Median : 9.621 Median : 12263 Median :1436.023 Median :1430.98
## Mean : 20.233 Mean : 24686 Mean :1489.577 Mean :1487.92
## 3rd Qu.: 24.334 3rd Qu.: 25385 3rd Qu.:2185.721 3rd Qu.:2184.52
## Max. :401.690 Max. :108288 Max. :4469.703 Max. :4468.66
## NA's :9
## style Bin_style Or_style
## C :469 Length:1705 Mode:logical
## N :407 Class :character NA's:1705
## T :372 Mode :character
## R :219
## S :126
## G : 85
## (Other): 27
# Step 2 - begin replicating Labov 2001 by getting mean rates for data across Careful & Casual styles
#turn into dplyr dataframe
ing.prog <- tbl_df(ing.prog)
#say what dataframe
ing.prog.style.fin <- ing.prog %>%
#will create data in long format
group_by(Bin_style) %>%
#summarize; add the (ING) rate and N columns
summarise(ing.rate=mean(code),N=n())
View(ing.prog.style.fin)
# Step 3 - get mean rates across stylistic category by *individual speaker*
#turn into dplyr dataframe
ing.prog <- tbl_df(ing.prog)
#say what dataframe
ing.prog.spkr.style <- ing.prog %>%
#will create data in long format
group_by(File, Bin_style) %>%
#summarize; add the rate and N column
summarise(ing.rate=mean(code),N=n())
View(ing.prog.spkr.style)
# Step 4 - getting a column with size of each individual speaker's style shift
#turning into dplyr data frame
ing.prog <- tbl_df(ing.prog)
#say what data frame
ing.prog.spkr.style2 <- ing.prog %>%
#will create data in long format
group_by(File, Bin_style, sex, age, birthyear) %>%
#summarize; add ing.rate column (N is not needed for the reshape below)
summarise(ing.rate=mean(code))
View(ing.prog.spkr.style2)
#opening reshape2 package
library(reshape2)
#making new dataframe - each spkr is 1 row and car and cas rates are 2 separate columns
ing.prog.spkr.style3 <- dcast(ing.prog.spkr.style2, File + sex + age + birthyear ~ Bin_style,
value.var = "ing.rate")
#create style_shift column by subtracting Careful mean from Casual mean
ing.prog.spkr.style3$ING_style_shift <- ing.prog.spkr.style3$Casual - ing.prog.spkr.style3$Careful
#multiplying style_shift values by -100 so align with Labov's (in mag & direction)
ing.prog.spkr.style3$ING_style_shift <- ing.prog.spkr.style3$ING_style_shift * -100
#create age2 column that groups speakers into younger (<30) or older (≥30)
ing.prog.spkr.style3$age2 <- ifelse(ing.prog.spkr.style3$age < 30, "younger", "older")
View(ing.prog.spkr.style3)
# Step 5 - replicating Labov figure 5.2 (histogram, Cas index - Car index, for ind spkr)
#opening ggplot2
library(ggplot2)
#making the graph - histogram with degree of ind style shift by gender
#bin width is left at the default; pass binwidth = x to geom_histogram() to set it
ggplot(ing.prog.spkr.style3, aes(ING_style_shift)) +
geom_histogram(aes(fill=sex), position="dodge") +
ggtitle("Style-shifting of (ING) variable in Philadelphia")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
I found slightly more style-shifting than Labov did for the (ING) variable, though the shift I observed was still not dramatic and ran in the negative direction (the opposite of Labov’s shift). My modal value fell around -10. Labov does not remark upon a difference by sex, and in my data speakers of each sex were spread fairly evenly along the range of style-shift values. The one exception lies at the extremes: for (ING), my most negative values belong to male speakers and my most positive values to females, the opposite of what I found with (DH). A clear difference by gender does not otherwise emerge.
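The modal value and the sex of the most extreme speakers can also be read off numerically rather than from the histogram. The following is a minimal sketch on made-up values: the data frame `shift` and its numbers are hypothetical and only the column name `ING_style_shift` is borrowed from the analysis above.

```r
# Hypothetical per-speaker style-shift values (NOT the study's data)
shift <- data.frame(
  sex = c("f", "m", "f", "m", "f", "m", "f", "m"),
  ING_style_shift = c(-12, -8, -11, -27, 18, -9, 3, -14),
  stringsAsFactors = FALSE
)

# Bin the values into 5-point-wide intervals and count speakers per bin
breaks <- seq(-30, 20, by = 5)
shift$bin <- cut(shift$ING_style_shift, breaks = breaks, right = FALSE)
bin_counts <- table(shift$bin)

# The modal bin is the interval that holds the most speakers
modal_bin <- names(bin_counts)[which.max(bin_counts)]

# Sex of the speakers with the most negative and most positive shift
sex_min <- shift$sex[which.min(shift$ING_style_shift)]
sex_max <- shift$sex[which.max(shift$ING_style_shift)]
```

Applied to `ing.prog.spkr.style3`, the same steps would need `na.rm`-style handling for speakers who lack a value in one of the two styles.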
# Step 6 - replicating Labov fig 5.5 - distribution of style shift by age and gender
#making the graph - scatterplot with dist of ind style shift by age and gender
ggplot(ing.prog.spkr.style3, aes(age, ING_style_shift)) + geom_point(aes(color=sex)) +
stat_smooth(method = "lm") + ggtitle("Distribution of (ING) style-shift by age and sex")
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
My values for the degree of (ING) style-shift range from approximately -30 to 20, consistent with the range of values I obtained for the degree of (DH) style-shift. For (ING), the points move further from the overall regression line as age increases, while the reverse pattern was found with the (DH) data. As with (DH), the overall (ING) regression line is slightly sloped, which suggests a small tendency for speakers to increase their degree of (ING) style-shifting as they age. The slope is not dramatic, however, so any age effect on the degree of (ING) style-shifting is fairly small. The overall regression line for (ING) is centered around -10, reflecting the slightly greater degree of style-shifting observed for (ING) than for (DH). The negative modal value indicates that style-shifting for the (ING) variable runs opposite to the normal direction of style-shifting in society. The lack of a strong correlation between degree of style-shifting and either age or sex reinforces the stable nature of the (ING) variable. Among older speakers, females tend to exhibit more extreme style-shifting in the positive direction, and males more extreme style-shifting in the negative direction. Labov does not provide a graph showing the relationship between age, gender and degree of (ING) style-shift.
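Both observations about the scatterplot (the size of the slope and the widening scatter among older speakers) can be checked numerically. The sketch below uses invented values in a hypothetical data frame `spk`; it is one possible check, not the procedure used to produce the figure.

```r
# Hypothetical speakers (NOT the study's data): age and style-shift value
spk <- data.frame(
  age = c(18, 22, 29, 35, 43, 51, 62, 70, 78),
  ING_style_shift = c(-11, -9, -12, -6, -15, -2, -20, 4, -25)
)

# Ordinary least-squares line, as drawn by stat_smooth(method = "lm")
fit <- lm(ING_style_shift ~ age, data = spk)
slope <- unname(coef(fit)["age"])  # change in style-shift per year of age

# Absolute residuals give each speaker's distance from the fitted line;
# a positive correlation with age means the scatter widens with age
spread_vs_age <- cor(spk$age, abs(resid(fit)))
```

A slope near zero together with a clearly positive `spread_vs_age` would correspond to the pattern described for (ING): little overall age effect, but more divergent behavior among older speakers.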
As explained above, Labov interprets this distribution of index scores across the nuanced categories of the decision tree as validation of each category’s ability to contribute to identifying stylistic variation. Labov notes that there are not large differences among the Careful speech subcategories for (ING), and that all of the Careful speech values fall below the (ING) mean value. He notes that three of the four Casual [sic] speech categories are above the (ING) mean value, and that Narrative is at the mean level. Labov introduces the notion of the objectivity of the style-coding process in pointing out that the Tangent value falls farthest from the (ING) mean value, and that Tangent is the least objective coding decision. He claims that this increased subjectivity might increase the likelihood that the linguistic variants used would bias the coder’s decision. Ideally, a coder would not be exposed to any linguistic information while they are coding. However, the presence of potentially biasing information was an issue I encountered while coding, revealing a shortcoming of the interview transcription process and perhaps also of the decision tree style-coding method.
# Step 7 - replicating fig 5.7 - stylistic differentiation for 8 cat of dec tree
#turning into dplyr data frame
ing.prog <- tbl_df(ing.prog)
#say what data frame
ing.prog.style.fin2 <- ing.prog %>%
#will create data in long format
group_by(style, Bin_style) %>%
#summarize; add (ING) index and N columns
summarise(index=mean(code),N=n())
#multiplying index values by 100 so correspond to Labov's values
ing.prog.style.fin2$index <- ing.prog.style.fin2$index * 100
#reordering style values on x-axis so match Labov 2001 order
ing.prog.style.fin2$style <- factor(ing.prog.style.fin2$style, c("S","L","R","C","N","K","T","G"))
View(ing.prog.style.fin2)
#replicating figure 5.7 as a bar graph - index score arr. by orig 8 style values
ggplot(ing.prog.style.fin2, aes(style,index)) + geom_bar(stat = "identity") + facet_wrap(~ Bin_style) +
ggtitle("Stylistic Differentiation of (ING) for eight categories of the Style Decision Tree")
My graph indicates that within each broad stylistic category (Careful and Casual), the index values for the individual categories fall within a limited range. The two ranges are similar: roughly 30 to 45 for Careful speech and a slightly wider 30 to 50 for Casual speech. For Careful, the value for Language is noticeably lower than those of the other three categories, which fall around 40. I noticed while coding that some instances of speech in the Language style might have been better coded as Narrative or Tangent, and this inconsistency could explain Language’s lower index. For Casual, the categories are more spread out from one another: Kids and Tangent have the highest values (around 50), Group falls slightly lower (around 40), and Narrative falls lowest (around 30). That Narrative has the lowest index in Casual speech is unexpected, since Labov emphasized the category’s ability to capture casual/vernacular speech and placed the greatest emphasis on identifying instances of Narrative; the result raises questions about the style-coding process.
The index ranges are therefore very similar on either side of the Careful-Casual distinction. This seems to indicate that each category contributes something comparable to the task of identifying stylistic variation in speech, and that better classification may require an entirely different method. The index values Labov found were more noticeably different between the broad Careful and Casual categories: he found a slightly greater difference in relative clustering between the two (approximately 15) than I did, but a much smaller difference than was found with (DH).
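The comparison between the spread inside each broad category and the gap between the two categories can be made explicit. This is a minimal sketch with invented index values (the data frame `idx` is hypothetical and does not reproduce my results); only the style codes and the `Bin_style` labels follow the analysis above.

```r
# Hypothetical index values per fine-grained style (NOT the study's results)
idx <- data.frame(
  style     = c("S", "L", "R", "C", "N", "K", "T", "G"),
  Bin_style = c(rep("Careful", 4), rep("Casual", 4)),
  index     = c(42, 31, 40, 44, 30, 50, 49, 41)
)

# Spread (max - min) of the index within each broad category
spread <- tapply(idx$index, idx$Bin_style, function(x) diff(range(x)))

# Gap between the mean index of the two broad categories
means <- tapply(idx$index, idx$Bin_style, mean)
gap <- unname(means["Casual"] - means["Careful"])
```

When `gap` is small relative to both entries of `spread`, as in these made-up numbers, the fine-grained categories overlap across the Careful-Casual boundary instead of clustering on either side of it.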
Labov notes that the (ING) values for men are generally higher than those for women, which he finds unsurprising given the regression analysis of the larger data set. He says that the difference by sex is more consistent in Casual speech (where the two sets have parallel paths) than in Careful speech. Labov draws attention to the fact that within these stylistic subsets, Soapbox is not differentiated from the Casual speech styles for women, while it is differentiated for men. He also mentions that men’s Careful and Casual speech is not differentiated between the Response and Narrative categories.
# Step 8 - replicating Labov's fig 5.8 (stylistic diff by sex for 8 cat of dec tree)
#turning into dplyr data frame
ing.prog <- tbl_df(ing.prog)
#say what data frame
ing.prog.style.fin3 <- ing.prog %>%
#will create data in long format
group_by(style, Bin_style, sex) %>%
#summarize; add (ING) index and N columns
summarise(index=mean(code),N=n())
#multiplying index values by 100 so correspond to Labov's values
ing.prog.style.fin3$index <- ing.prog.style.fin3$index * 100
#reordering style values on x-axis so match Labov 2001 order
ing.prog.style.fin3$style <- factor(ing.prog.style.fin3$style, c("S","L","R","C","N","K","T","G"))
View(ing.prog.style.fin3)
#replicating figure 5.8 as bar graph - index score arr. by orig 8 style values and diff by sex
ggplot(ing.prog.style.fin3, aes(style,index)) + geom_bar(stat = "identity", aes(fill=sex)) +
facet_grid(sex ~ Bin_style) + ggtitle("Stylistic Differentiation of (ING) by sex for eight categories of the Style Decision Tree")
My graph again indicates that (ING) stylistic differentiation varies more across sex than across style category. That is, within each sex the index values are similar across the Careful and Casual styles. The index value for women was fairly consistent across stylistic categories (with the exception of Language), at around 50. The index values for men tended to be lower than those for women and were likewise fairly consistent, clustering around 30 in both Careful and Casual speech (with the exception of Kids). Each sex thus had one outlier category: Language for women and Kids for men. There is no clear relationship between these outliers and Labov’s classification of the categories’ objectivity, for he classed Language as relatively objective but Kids as relatively subjective. My results also contrast with Labov’s: as with (DH), Labov’s primary differentiator of index value was stylistic category, not sex. For Labov, men and women within a given stylistic category had index values more similar to each other than either sex’s values were across stylistic categories.
This logistic regression was designed to measure an interaction between the effects of grammar and style. The interaction between the “participle” grammatical category and style approached but did not reach significance. Labov only analyzed instances of (ING) in participle form. There was no interaction between style and the other grammatical categories I analyzed (“gerund”, “mono-morpheme”, “root-attached”, and “something/nothing”). As with the (DH) variable, my logistic regression for (ING) did not find Style to be a significant predictor of the variant used. That is, my original hypothesis that the non-standard [-in’] variant would be more frequent in “casual” than in “careful” stylistic contexts was not supported, and I fail to reject the null hypothesis that stylistic context has no effect on the variant ([-ing] or [-in’]) a speaker realizes. It is important to note that sex, preceding segment, and a following “pause” were all found to be significant predictors of the variant used, emphasizing the role that internal and external factors can play in the production of speech variables.
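For readers unfamiliar with interaction terms, the following sketch shows how a grammar-by-style interaction can be specified and tested with `glm()`. It runs on simulated tokens and is not the model reported here: the data frame `sim`, the sample size, and the response probabilities are invented, and the column names are borrowed from the data frame only for readability. The response is generated with a sex effect but no style effect, and 1 is assumed to mark the non-standard variant.

```r
set.seed(1)
n <- 400

# Simulated tokens (NOT the study's data)
sim <- data.frame(
  newgram2  = sample(c("p", "gr", "ms"), n, replace = TRUE),
  Bin_style = sample(c("Careful", "Casual"), n, replace = TRUE),
  sex       = sample(c("f", "m"), n, replace = TRUE),
  stringsAsFactors = TRUE
)

# Binary response: assumed 1 = non-standard variant, 0 = standard variant
sim$code <- rbinom(n, size = 1, prob = ifelse(sim$sex == "m", 0.5, 0.35))

# newgram2 * Bin_style expands to both main effects plus their interaction
mod <- glm(code ~ newgram2 * Bin_style + sex, data = sim, family = binomial)

# Likelihood-ratio test of the interaction against the additive model
mod_add <- glm(code ~ newgram2 + Bin_style + sex, data = sim, family = binomial)
lrt <- anova(mod_add, mod, test = "Chisq")
```

In `summary(mod)`, rows whose names contain a colon (for example `newgram2p:Bin_styleCasual`) are the interaction terms; the likelihood-ratio test in `lrt` asks whether those terms jointly improve the fit.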
#loading lme4 (package for fitting regression models)
library(lme4)
#accessing data file
ing.lr <- read.csv("ingstyle_fin.csv")
#peeking at first lines of data file
head(ing.lr)
## File Segment Position code Gram Seg_Start Seg_End
## 1 PH00-1-1-JStevens IH0 End 0 o 85.063 85.102
## 2 PH00-1-1-JStevens IH0 End 0 p 112.708 112.738
## 3 PH00-1-1-JStevens IH0 End 1 p 114.648 114.688
## 4 PH00-1-1-JStevens IH0 End 0 p 127.878 127.907
## 5 PH00-1-1-JStevens IH0 End 0 o 147.712 147.743
## 6 PH00-1-1-JStevens IH0 End 1 n 148.553 148.603
## word Word_Start Word_End Pre_Seg Pre_Seg_Start Pre_Seg_End
## 1 WORKING 84.912 85.132 K 85.023 85.063
## 2 FIXING 112.468 112.768 S 112.658 112.708
## 3 NETWORKING 114.308 114.798 K 114.588 114.648
## 4 WORKING 127.687 127.937 K 127.808 127.878
## 5 WAITING 147.593 147.773 T 147.682 147.712
## 6 OPENING 148.413 148.653 N 148.522 148.553
## Post_Seg Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1 S 85.132 85.273 1.280 5.469
## 2 K 112.768 112.808 3.060 2.614
## 3 sp 114.798 115.138 3.635 2.201
## 4 AW 127.937 128.028 1.860 3.226
## 5 T 147.773 147.812 1.310 8.397
## 6 sp 148.653 148.673 1.430 6.993
## preseg_place preseg_manner folseg_place folseg_manner sex birthyear age
## 1 velar stop/aff coronal fricative m 1979 21
## 2 coronal fricative velar stop/aff m 1979 21
## 3 velar stop/aff none pause m 1979 21
## 4 velar stop/aff velar vowel m 1979 21
## 5 coronal stop/aff coronal stop/aff m 1979 21
## 6 coronal nasal none pause m 1979 21
## school school.calc school.cat newgram newgram2 PrevING PrevWord
## 1 14 2 Some college p NA 1 na
## 2 14 2 Some college g NA 0 WORKING
## 3 14 2 Some college g NA 0 FIXING
## 4 14 2 Some college p NA 1 NETWORKING
## 5 14 2 Some college p NA 0 WORKING
## 6 14 2 Some college r NA 0 WAITING
## PrevGram Lag subtlex.count start end style Bin_style Or_style
## 1 <NA> 1.409 12775 84.912 83.950 R NA NA
## 2 p 27.636 533 112.468 107.275 C NA NA
## 3 g 2.030 27 114.308 107.275 C NA NA
## 4 g 13.139 12775 127.687 126.505 C NA NA
## 5 p 8.600 10767 147.593 146.840 C NA NA
## 6 p 0.880 1960 148.413 146.840 C NA NA
#collapsing 8 stylistic categories into 3 style codes in Bin_style
ing.lr$Bin_style <- ifelse(ing.lr$style %in% c("C","L","R","S"), "Careful",
ifelse(ing.lr$style %in% c("G","K","N","T"), "Casual",
"NA"))
#filter so rows with a missing (NA) value in the code column are taken out
ing.lr <- filter(ing.lr, !is.na(code))
#filter so tokens without a Careful/Casual label (the string "NA" assigned above) are taken out
ing.lr <- filter(ing.lr, Bin_style != "NA")
#simplifying preseg and folseg columns
# simplify phonological environment based on previous analysis
ing.lr$Pre_Seg <- "other"
ing.lr$Pre_Seg[
ing.lr$preseg_place=="velar" & ing.lr$preseg_manner=="nasal"
] <- "velar.N"
ing.lr$Pre_Seg[
ing.lr$preseg_place=="coronal" & ing.lr$preseg_manner=="nasal"
] <- "coronal.N"
ing.lr$Pre_Seg[
ing.lr$preseg_place=="coronal" & ing.lr$preseg_manner %in% c("fricative","stop/aff")
] <- "coronal.obs"
ing.lr$Post_Seg <- "other"
ing.lr$Post_Seg[
ing.lr$folseg_place=="velar" & ing.lr$folseg_manner %in% c("fricative",
"nasal",
"stop/aff",
"liquid/glide")
] <- "velar.C"
ing.lr$Post_Seg[
ing.lr$folseg_manner=="pause"
] <- "pause"
#collapsing grammatical categories
#recode values into multiple categories; brackets index rows
ing.lr$newgram2[ing.lr$newgram %in% c("m","s")] <- "ms"
ing.lr$newgram2[ing.lr$newgram %in% c("g","r")] <- "gr"
ing.lr$newgram2[ing.lr$newgram %in% c("p")] <- "p"
#peeking at first lines of file
head(ing.lr)
## File Segment Position code Gram Seg_Start Seg_End
## 1 PH00-1-1-JStevens IH0 End 0 o 85.063 85.102
## 2 PH00-1-1-JStevens IH0 End 0 p 112.708 112.738
## 3 PH00-1-1-JStevens IH0 End 1 p 114.648 114.688
## 4 PH00-1-1-JStevens IH0 End 0 p 127.878 127.907
## 5 PH00-1-1-JStevens IH0 End 0 o 147.712 147.743
## 6 PH00-1-1-JStevens IH0 End 1 n 148.553 148.603
## word Word_Start Word_End Pre_Seg Pre_Seg_Start Pre_Seg_End
## 1 WORKING 84.912 85.132 other 85.023 85.063
## 2 FIXING 112.468 112.768 coronal.obs 112.658 112.708
## 3 NETWORKING 114.308 114.798 other 114.588 114.648
## 4 WORKING 127.687 127.937 other 127.808 127.878
## 5 WAITING 147.593 147.773 coronal.obs 147.682 147.712
## 6 OPENING 148.413 148.653 coronal.N 148.522 148.553
## Post_Seg Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## 1 other 85.132 85.273 1.280 5.469
## 2 velar.C 112.768 112.808 3.060 2.614
## 3 pause 114.798 115.138 3.635 2.201
## 4 other 127.937 128.028 1.860 3.226
## 5 other 147.773 147.812 1.310 8.397
## 6 pause 148.653 148.673 1.430 6.993
## preseg_place preseg_manner folseg_place folseg_manner sex birthyear age
## 1 velar stop/aff coronal fricative m 1979 21
## 2 coronal fricative velar stop/aff m 1979 21
## 3 velar stop/aff none pause m 1979 21
## 4 velar stop/aff velar vowel m 1979 21
## 5 coronal stop/aff coronal stop/aff m 1979 21
## 6 coronal nasal none pause m 1979 21
## school school.calc school.cat newgram newgram2 PrevING PrevWord
## 1 14 2 Some college p p 1 na
## 2 14 2 Some college g gr 0 WORKING
## 3 14 2 Some college g gr 0 FIXING
## 4 14 2 Some college p p 1 NETWORKING
## 5 14 2 Some college p p 0 WORKING
## 6 14 2 Some college r gr 0 WAITING
## PrevGram Lag subtlex.count start end style Bin_style Or_style
## 1 <NA> 1.409 12775 84.912 83.950 R Careful NA
## 2 p 27.636 533 112.468 107.275 C Careful NA
## 3 g 2.030 27 114.308 107.275 C Careful NA
## 4 g 13.139 12775 127.687 126.505 C Careful NA
## 5 p 8.600 10767 147.593 146.840 C Careful NA
## 6 p 0.880 1960 148.413 146.840 C Careful NA
#summary of file contents -> ensure everything in correct format
summary(ing.lr)
## File Segment Position code
## PH06-2-2-Patrick : 171 IH0:3799 End:3799 Min. :0.0000
## PH82-1-10-MCollins : 163 1st Qu.:0.0000
## PH06-1-6-JMcPhee : 145 Median :0.0000
## PH92-2-3-JSantori : 142 Mean :0.4267
## PH06-2-5-Sophia : 141 3rd Qu.:1.0000
## PH79-3-5-VSarsparilla: 140 Max. :1.0000
## (Other) :2897
## Gram Seg_Start Seg_End word
## a: 102 Min. : 9.493 Min. : 9.542 SOMETHING: 329
## d: 49 1st Qu.: 746.888 1st Qu.: 746.927 GOING : 285
## g: 188 Median :1436.708 Median :1436.758 DOING : 165
## n: 241 Mean :1497.028 Mean :1497.068 GETTING : 150
## o:1705 3rd Qu.:2200.831 3rd Qu.:2200.885 BEING : 114
## p:1085 Max. :4469.943 Max. :4469.993 COMING : 109
## s: 429 (Other) :2647
## Word_Start Word_End Pre_Seg
## Min. : 9.373 Min. : 9.642 Length:3799
## 1st Qu.: 746.652 1st Qu.: 746.967 Class :character
## Median :1436.358 Median :1436.878 Mode :character
## Mean :1496.778 Mean :1497.137
## 3rd Qu.:2200.555 3rd Qu.:2200.915
## Max. :4469.703 Max. :4470.052
##
## Pre_Seg_Start Pre_Seg_End Post_Seg
## Min. : 9.413 Min. : 9.493 Length:3799
## 1st Qu.: 746.808 1st Qu.: 746.888 Class :character
## Median :1436.498 Median :1436.708 Mode :character
## Mean :1496.952 Mean :1497.028
## 3rd Qu.:2200.755 3rd Qu.:2200.831
## Max. :4469.793 Max. :4469.943
##
## Post_Seg_Start Post_Seg_End Window Vowels_per_Second
## Min. : 9.642 Min. : 9.782 Min. : 0.570 Min. : 0.148
## 1st Qu.: 746.967 1st Qu.: 747.044 1st Qu.: 1.300 1st Qu.: 3.510
## Median :1436.878 Median :1437.028 Median : 1.610 Median : 4.788
## Mean :1497.137 Mean :1497.248 Mean : 1.813 Mean : 4.833
## 3rd Qu.:2200.915 3rd Qu.:2200.965 3rd Qu.: 1.970 3rd Qu.: 6.028
## Max. :4470.052 Max. :4470.133 Max. :33.877 Max. :13.115
##
## preseg_place preseg_manner folseg_place folseg_manner
## coronal:2139 fricative : 827 coronal:1388 fricative : 559
## labial :1002 liquid/glide: 246 labial : 517 liquid/glide: 504
## velar : 658 nasal : 405 none : 863 nasal : 146
## stop/aff :1359 velar :1031 pause : 863
## vowel : 962 stop/aff : 564
## vowel :1163
##
## sex birthyear age school
## f:1937 Min. :1895 Min. :18.00 Min. : 0.00
## m:1862 1st Qu.:1930 1st Qu.:29.00 1st Qu.:11.00
## Median :1944 Median :43.00 Median :12.00
## Mean :1946 Mean :45.07 Mean :12.02
## 3rd Qu.:1965 3rd Qu.:62.00 3rd Qu.:14.00
## Max. :1984 Max. :78.00 Max. :16.00
## NA's :177
## school.calc school.cat newgram newgram2
## Min. :-12.00000 Just HS :1442 exclude: 18 Length:3799
## 1st Qu.: -1.00000 Not HS :1012 g : 625 Class :character
## Median : 0.00000 Some college:1168 m : 138 Mode :character
## Mean : 0.01712 NA's : 177 p :2262
## 3rd Qu.: 2.00000 r : 324
## Max. : 4.00000 s : 432
## NA's :177
## PrevING PrevWord PrevGram Lag
## Min. :0.0000 na : 622 exclude: 16 Min. : -2.666
## 1st Qu.:0.0000 SOMETHING: 263 g : 515 1st Qu.: 3.250
## Median :1.0000 GOING : 240 m : 112 Median : 9.973
## Mean :0.5155 DOING : 142 p :1916 Mean : 20.292
## 3rd Qu.:1.0000 GETTING : 123 r : 257 3rd Qu.: 25.974
## Max. :1.0000 (Other) :2392 s : 343 Max. :401.690
## NA's :28 NA's : 17 NA's : 640 NA's :17
## subtlex.count start end style
## Min. : 0 Min. : 9.373 Min. : 9.04 C :1206
## 1st Qu.: 2181 1st Qu.: 746.652 1st Qu.: 745.57 T : 916
## Median : 9730 Median :1436.358 Median :1434.12 N : 665
## Mean : 24997 Mean :1496.778 Mean :1495.09 R : 472
## 3rd Qu.: 26878 3rd Qu.:2200.555 3rd Qu.:2199.34 S : 350
## Max. :108288 Max. :4469.703 Max. :4468.66 G : 128
## (Other): 62
## Bin_style Or_style
## Length:3799 Mode:logical
## Class :character NA's:3799
## Mode :character
##
##
##
##
View(ing.lr)
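#random effects are now included in the models
#glmer() comes from the lme4 package; load it if it is not already attached
library(lme4)
#creating model 1 - testing for a grammar-by-style interaction
mod1.ing <- glmer(code ~ sex + birthyear + Pre_Seg + Post_Seg + newgram2 * Bin_style +
(1|File), data = ing.lr, family = "binomial")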
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
#creating model 2 - main effects only, no grammar-by-style interaction
mod2.ing <- glmer(code ~ sex + birthyear + Pre_Seg + Post_Seg + newgram2 + Bin_style +
(1|File), data = ing.lr, family = "binomial")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
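Both fits above stop with the same convergence warning. One possible contributor, and this is a guess rather than a diagnosis, is that birthyear is on a very different scale from the other predictors: it runs from 1895 to 1984, and the model summaries report an intercept standard error of about 21 and a correlation of -1.000 between the intercept and birthyear. The sketch below uses simulated data and an ordinary glm() (not the glmer() models; all object names are hypothetical) to illustrate how centering a year variable removes that near-perfect correlation without changing the slope.

```r
set.seed(1)

# Simulated stand-in data: birth years on the same range as the real sample
n <- 500
birthyear <- sample(1895:1984, n, replace = TRUE)
y <- rbinom(n, 1, plogis(0.01 * (birthyear - 1946)))
toy <- data.frame(y = y,
                  birthyear = birthyear,
                  birthyear.c = birthyear - mean(birthyear))  # centered copy

# Same logistic model, once with raw years and once with centered years
fit.raw <- glm(y ~ birthyear, data = toy, family = binomial)
fit.ctr <- glm(y ~ birthyear.c, data = toy, family = binomial)

# Correlation between the intercept and slope estimates in each fit
cor.raw <- cov2cor(vcov(fit.raw))[1, 2]
cor.ctr <- cov2cor(vcov(fit.ctr))[1, 2]

print(c(raw = cor.raw, centered = cor.ctr))
print(c(slope.raw = coef(fit.raw)[[2]], slope.centered = coef(fit.ctr)[[2]]))
```

In the real analysis the analogous step would be to add a centered (or centered and scaled) birth-year column to ing.lr and refit both models, possibly also trying a different optimizer through glmerControl(optimizer = "bobyqa"). Whether either change clears the warning would have to be checked on the actual data.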
#seeing the results of mod 1
summary(mod1.ing)
## Warning in vcov.merMod(object, correlation = correlation, sigm = sig): variance-covariance matrix computed from finite-difference Hessian is
## not positive definite or contains NA values: falling back to var-cov estimated from RX
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## code ~ sex + birthyear + Pre_Seg + Post_Seg + newgram2 * Bin_style +
## (1 | File)
## Data: ing.lr
##
## AIC BIC logLik deviance df.resid
## 3640.1 3727.4 -1806.0 3612.1 3767
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.7060 -0.5473 -0.2824 0.5122 5.8853
##
## Random effects:
## Groups Name Variance Std.Dev.
## File (Intercept) 2.339 1.529
## Number of obs: 3781, groups: File, 40
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.0294630 21.3842071 0.095 0.92439
## sexm -1.1511843 0.4950596 -2.325 0.02005 *
## birthyear 0.0003583 0.0109941 0.033 0.97400
## Pre_Segcoronal.obs -1.3318672 0.1902577 -7.000 2.55e-12 ***
## Pre_Segother -2.1171236 0.1920858 -11.022 < 2e-16 ***
## Pre_Segvelar.N -2.4733188 0.5728748 -4.317 1.58e-05 ***
## Post_Segpause 0.7267724 0.1009590 7.199 6.08e-13 ***
## Post_Segvelar.C 0.2108500 0.1844139 1.143 0.25289
## newgram2ms -1.3236769 0.1822270 -7.264 3.76e-13 ***
## newgram2p -1.0047381 0.1315023 -7.640 2.16e-14 ***
## Bin_styleCasual -0.0144201 0.1728724 -0.083 0.93352
## newgram2ms:Bin_styleCasual -0.0492485 0.2716102 -0.181 0.85612
## newgram2p:Bin_styleCasual -0.5362775 0.2043097 -2.625 0.00867 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) sexm brthyr Pr_Sg. Pr_Sgt Pr_S.N Pst_Sg Ps_S.C
## sexm -0.026
## birthyear -1.000 0.015
## Pr_Sgcrnl.b -0.018 0.009 0.011
## Pre_Segothr -0.016 0.013 0.008 0.865
## Pre_Sgvlr.N -0.004 0.004 0.001 0.290 0.310
## Post_Segpas -0.012 -0.003 0.011 -0.011 -0.012 0.029
## Pst_Sgvlr.C -0.004 -0.003 0.003 0.012 0.005 -0.001 0.140
## newgram2ms 0.001 0.001 -0.004 -0.025 0.132 0.046 -0.069 0.010
## newgram2p 0.003 0.012 -0.007 -0.028 -0.030 -0.019 -0.002 0.028
## Bin_stylCsl -0.010 0.013 0.007 0.002 0.019 0.020 -0.025 -0.028
## nwgrm2m:B_C -0.001 0.001 0.002 0.062 0.037 0.010 0.042 0.017
## nwgrm2p:B_C 0.000 -0.001 0.002 0.018 0.008 0.001 0.011 -0.003
## nwgrm2m nwgrm2p Bn_stC nwgrm2m:B_C
## sexm
## birthyear
## Pr_Sgcrnl.b
## Pre_Segothr
## Pre_Sgvlr.N
## Post_Segpas
## Pst_Sgvlr.C
## newgram2ms
## newgram2p 0.452
## Bin_stylCsl 0.349 0.480
## nwgrm2m:B_C -0.614 -0.307 -0.612
## nwgrm2p:B_C -0.295 -0.632 -0.811 0.524
## convergence code: 0
## unable to evaluate scaled gradient
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
This logistic regression was designed to test for an interaction between the effects of grammar and style. Once again, the main effect of style was not a significant predictor of the variant used (Bin_styleCasual: estimate = -0.014, p = 0.934). That is, my original hypothesis that the non-standard [-in’] variant would be more frequent in “casual” than in “careful” stylistic contexts was not supported, and I fail to reject the null hypothesis. It is important to note that sex, preceding segment, a following pause, and grammatical category were all significant predictors of the variant used, and that one interaction term (newgram2p:Bin_styleCasual, p = 0.009) also reached significance. Because the model was fit with a convergence warning, these estimates should be read with some caution.
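The coefficients in the table are on the log-odds scale, which makes them hard to read directly. As a rough aid, the sketch below copies a few estimates from the mod1.ing summary by hand and converts them: exp() turns a log-odds coefficient into an odds ratio, and adding an interaction term to the style coefficient gives the effect of Casual versus Careful within each grammatical category. It assumes, from the levels shown in the output, that "gr" and "Careful" are the reference levels, and all effects are in terms of whichever variant is coded 1.

```r
# Fixed-effect estimates (log-odds) copied from the mod1.ing summary
est <- c(sexm = -1.1511843,
         Post_Segpause = 0.7267724,
         Bin_styleCasual = -0.0144201,
         "newgram2ms:Bin_styleCasual" = -0.0492485,
         "newgram2p:Bin_styleCasual" = -0.5362775)

# Odds ratios: exp() of a log-odds coefficient
odds.ratio <- exp(est[c("sexm", "Post_Segpause")])

# Effect of Casual (vs. Careful) within each grammatical category:
# the reference level (gr) is the style coefficient alone; the other
# levels add their own interaction term to it
style.effect <- c(
  gr = est[["Bin_styleCasual"]],
  ms = est[["Bin_styleCasual"]] + est[["newgram2ms:Bin_styleCasual"]],
  p  = est[["Bin_styleCasual"]] + est[["newgram2p:Bin_styleCasual"]]
)

print(round(odds.ratio, 3))
print(round(style.effect, 3))
```

With the fitted model in memory, exp(fixef(mod1.ing)) would give the same odds ratios without retyping the numbers.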
#seeing the results of mod 2
summary(mod2.ing)
## Warning in vcov.merMod(object, correlation = correlation, sigm = sig): variance-covariance matrix computed from finite-difference Hessian is
## not positive definite or contains NA values: falling back to var-cov estimated from RX
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## code ~ sex + birthyear + Pre_Seg + Post_Seg + newgram2 + Bin_style +
## (1 | File)
## Data: ing.lr
##
## AIC BIC logLik deviance df.resid
## 3644.9 3719.7 -1810.4 3620.9 3769
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.7239 -0.5469 -0.2854 0.5279 5.5580
##
## Random effects:
## Groups Name Variance Std.Dev.
## File (Intercept) 2.322 1.524
## Number of obs: 3781, groups: File, 40
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.1165813 21.3102841 0.099 0.920883
## sexm -1.1557525 0.4933465 -2.343 0.019146 *
## birthyear 0.0003824 0.0109560 0.035 0.972154
## Pre_Segcoronal.obs -1.3430523 0.1900666 -7.066 1.59e-12 ***
## Pre_Segother -2.1296033 0.1922150 -11.079 < 2e-16 ***
## Pre_Segvelar.N -2.4868536 0.5765801 -4.313 1.61e-05 ***
## Post_Segpause 0.7255888 0.1006942 7.206 5.77e-13 ***
## Post_Segvelar.C 0.2048605 0.1834546 1.117 0.264130
## newgram2ms -1.3313415 0.1433271 -9.289 < 2e-16 ***
## newgram2p -1.2322022 0.1016017 -12.128 < 2e-16 ***
## Bin_styleCasual -0.3305277 0.0933410 -3.541 0.000398 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) sexm brthyr Pr_Sg. Pr_Sgt Pr_S.N Pst_Sg Ps_S.C
## sexm -0.026
## birthyear -1.000 0.015
## Pr_Sgcrnl.b -0.019 0.009 0.011
## Pre_Segothr -0.016 0.013 0.008 0.865
## Pre_Sgvlr.N -0.004 0.005 0.001 0.289 0.308
## Post_Segpas -0.012 -0.003 0.011 -0.013 -0.013 0.028
## Pst_Sgvlr.C -0.004 -0.003 0.003 0.010 0.004 -0.001 0.140
## newgram2ms 0.000 0.002 -0.004 0.018 0.198 0.067 -0.054 0.024
## newgram2p 0.003 0.015 -0.006 -0.022 -0.033 -0.024 0.005 0.033
## Bin_stylCsl -0.019 0.023 0.017 0.059 0.066 0.043 -0.013 -0.047
## nwgrm2m nwgrm2p
## sexm
## birthyear
## Pr_Sgcrnl.b
## Pre_Segothr
## Pre_Sgvlr.N
## Post_Segpas
## Pst_Sgvlr.C
## newgram2ms
## newgram2p 0.458
## Bin_stylCsl -0.010 -0.071
## convergence code: 0
## unable to evaluate scaled gradient
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
An ANOVA was conducted to compare the two logistic regressions run for (ING), to see whether the more complex model, which tests for an interaction between grammar and style, fits the data enough better to justify its added complexity. The difference in deviance between the two models was -7.2861. This does not indicate a significant difference between the two models, and so model two (the model without a grammar-style interaction) will be used to analyze this data set.
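As a cross-check on this comparison, the likelihood-ratio test can also be worked out by hand from the deviance and residual degrees of freedom printed in the two model summaries (3612.1 on 3767 df for the interaction model, 3620.9 on 3769 df for the main-effects model), which differ by about 8.8 on 2 degrees of freedom. The sketch below does that arithmetic. It relies on rounded figures copied from the printed output and is not a substitute for the anova() table itself; since both fits also carry convergence warnings, the result is tentative.

```r
# Deviance and residual df read off the two model summaries
dev.mod1 <- 3612.1         # interaction model (mod1.ing)
dev.mod2 <- 3620.9         # main-effects model (mod2.ing)
df.diff  <- 3769 - 3767    # two extra interaction terms in mod1.ing

# Likelihood-ratio statistic and its chi-squared p-value
lr.stat <- dev.mod2 - dev.mod1
p.value <- pchisq(lr.stat, df = df.diff, lower.tail = FALSE)

print(c(chisq = lr.stat, df = df.diff, p = p.value))
```

If this hand calculation and the anova(mod2.ing, mod1.ing) output disagree with the figure quoted above, the table and the choice between the two models would be worth revisiting.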