Gender identity and lexical variation in social media - Theory part

Dmitriy Tsimokha
10/09/18

Brief info

Scientists: David Bamman (Pennsylvania), Jacob Eisenstein (Georgia) & Tyler Schnoebelen (California)

Study of the relationship between:

  • gender
  • linguistic style
  • social networks

Data: a novel corpus of 14,000 Twitter users

Previous work

Features of this qualitative research (in comparison with previous quantitative work):

  • clustering reveals many possible alignments between linguistic resources and gender
  • significant correlation between the use of mainstream gendered language and the homophily of the social network
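
A minimal sketch of how such a correlation could be computed. It assumes homophily is measured as the share of a user's connections with the same gender label, and that each user has some score for how closely their language matches the mainstream style for their gender. The data, the `style_score` field, and the function names are invented for illustration and are not taken from the study; the sketch also does not test significance.

```python
import math

# Invented toy data (not from the study). "ties" holds the gender labels of a
# user's connections; "style_score" is a hypothetical measure of how strongly
# the user's language matches the mainstream style for their gender.
USERS = [
    {"gender": "F", "ties": ["F", "F", "F", "M"], "style_score": 0.9},
    {"gender": "F", "ties": ["F", "M", "M", "M"], "style_score": 0.4},
    {"gender": "M", "ties": ["M", "M", "F", "M"], "style_score": 0.8},
    {"gender": "M", "ties": ["F", "F", "M", "F"], "style_score": 0.3},
]


def homophily(user):
    """Fraction of a user's connections that share the user's gender label."""
    same = sum(1 for g in user["ties"] if g == user["gender"])
    return same / len(user["ties"])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


h = [homophily(u) for u in USERS]
r = pearson(h, [u["style_score"] for u in USERS])
print(h, round(r, 3))
```

In this toy data the users with more same-gender connections also have higher style scores, so `r` comes out positive; with real data the sign and strength would have to be estimated and tested.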

Background

For sociolinguists, gender is constructed, maintained, and disrupted by linguistic practices.

The instrumentalist paradigm emphasizes prediction of latent attributes from text:

  • In this methodology, accurate predictions justify a post hoc analysis to identify the words that are the most effective predictors; these words are then assembled into groups.
  • This reverses the direction of earlier corpus-based work in which word classes are defined in advance, and then compared quantitatively across genders.
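
The predict-then-inspect workflow described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a small multinomial naive Bayes model, which is a stand-in and not the classifier used in any of the cited studies, and the documents and the neutral labels "A" and "B" are invented.

```python
import math
from collections import Counter

# Invented toy corpus: (label, text). Labels "A"/"B" and texts are illustrative.
DOCS = [
    ("A", "omg nooo waaay hmm so cute"),
    ("A", "hmm ah love this omg"),
    ("B", "yeah the game was good"),
    ("B", "yea new release linked here yeah"),
]


def train(docs, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    counts = {}         # label -> Counter of word counts
    n_docs = Counter()  # label -> number of documents
    for label, text in docs:
        counts.setdefault(label, Counter()).update(text.split())
        n_docs[label] += 1
    vocab = sorted({w for c in counts.values() for w in c})
    log_prior = {lab: math.log(n / len(docs)) for lab, n in n_docs.items()}
    log_like = {}
    for label, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        log_like[label] = {w: math.log((c[w] + alpha) / total) for w in vocab}
    return vocab, log_prior, log_like


def predict(model, text):
    """Step 1: predict the latent label; out-of-vocabulary words are ignored."""
    _, log_prior, log_like = model
    scores = {}
    for label in log_prior:
        known = [w for w in text.split() if w in log_like[label]]
        scores[label] = log_prior[label] + sum(log_like[label][w] for w in known)
    return max(scores, key=scores.get)


def top_predictors(model, label, other, k=3):
    """Step 2 (post hoc): rank words by log-likelihood ratio, label vs other."""
    vocab, _, log_like = model
    ratio = {w: log_like[label][w] - log_like[other][w] for w in vocab}
    return sorted(vocab, key=lambda w: ratio[w], reverse=True)[:k]


model = train(DOCS)
print(predict(model, "omg hmm"))
print(top_predictors(model, "B", "A"))
```

The word groups fall out of the fitted model in the second step; nothing about the word classes is decided before training, which is the reversal relative to earlier corpus-based work.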

Argamon et al. (2007)

Assembled 19,320 English blogs (681,288 posts, 140 million words) and built a predictive model of gender that achieved 80.5 percent accuracy.

A post-hoc factor analysis found that:

  • content-related factors are used more often by men
  • while style-related factors are used more by women.

Rao et al. (2010)

Trained a classifier on a dataset of posts (‘tweets’) by 1,000 authors

They found that:

  • women used more emoticons, ellipses (...), expressive lengthening (nooo waaay), complex punctuation (!! and ?!), and transcriptions of backchannels (ah, hmm)
  • the only words strongly associated with men were affirmations like yeah and yea

Burger et al. (2011)

Identified author gender by linking 184,000 Twitter accounts to blog profiles with gender metadata

They found that automatic prediction of author gender is more accurate than the judgments of human raters.

‘informativeness’ and ‘involvement’

  • the involvement dimension consists of linguistic resources that create interactions between speakers and their audiences
  • the informational dimension consists of resources that communicate propositional content

Informational word classes were found to be used preferentially by men. Involvement and interaction are associated with women.

  • males are seen as preferring a ‘formal’ and ‘explicit’ style
  • while females are seen as preferring a style that is more ‘deictic’ and ‘contextual’

Herring and Paolillo (2006)

  • women were more likely to write ‘diary’ blogs
  • men were more likely to write ‘filter’ blogs, linking to external content.

The involvement and informational word classes were associated with these genres, and the genres were in turn associated with gender. But, within each genre, there were no significant gender differences in the frequency of the word classes.

Eckert and McConnell-Ginet (1995)

Examined the interaction between gender and the local categories of school-oriented ‘jocks’ and anti-school ‘burnouts’:

  • boys were less standard than girls in general
  • the most non-standard language was employed by a group of ‘burned-out burnout’ girls

Eckert (2008)

The social meaning of linguistic variables depends crucially on the social and linguistic context in which they are deployed.

Rather than being a direct reflection of gender or class, variables like ING/IN can be seen as reflecting a field of meanings: educated/uneducated, effortful/easygoing, articulate/inarticulate, pretentious/unpretentious, formal/relaxed, and so on.

This view has roots in Butler’s (1990: 179) casting of gender as a stylized repetition of acts, creating a relationship between (at least) an individual, an audience, and a topic

Final

Gender and other social categories are performances, and these categories are performed differently in different situations.

The interaction between language and gender is mediated by situational contexts.

Each of these studies demonstrates a richness of interactions between language, gender, and situational context.