Documentation

Use this R Markdown script to scrape Linguist List for job trends with the following libraries:

Use the code as you like, but please give a shout-out if you use this script. Comments are most welcome!

The script is maintained in a GitHub repository (https://github.com/jaharris/linglist-scrape). To clone the repo, type the following git command into your terminal:

> git clone https://github.com/jaharris/linglist-scrape.git scrape

Methodology

The script downloads the Linguist List job posting archives for the years specified below. After some reformatting, it removes all but tenure track job postings and categorizes the jobs according to keywords listed in the posting. The method for categorization largely follows previous efforts by Chris Potts, Heidi Harley, Stephanie Shih, and Rebecca Starr (see the Language Log postings on the 2008 data, 2009 data, and 2009-2012 data).
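
As a rough illustration of the filtering and categorization step, the sketch below uses base R on a toy data frame. The column names (year, text), the example postings, and the regular expression are assumptions made for illustration; the actual script may match tenure track postings and keywords differently.

```r
# Hypothetical postings; the real script's columns and text format may differ.
postings <- data.frame(
  year = c(2018, 2018, 2019),
  text = c(
    "Assistant Professor, tenure track. Specialty areas: Syntax; Semantics",
    "Visiting Lecturer. Specialty areas: Phonology",
    "Tenure-track position. Specialty areas: Phonetics"
  ),
  stringsAsFactors = FALSE
)

fields <- c("phonetics", "phonology", "syntax", "semantics")

# Keep postings that mention "tenure track" or "tenure-track" (case-insensitive).
is_tt <- grepl("tenure[- ]track", postings$text, ignore.case = TRUE)
tt <- postings[is_tt, ]

# Add one logical column per field: TRUE when the keyword appears in the posting.
for (f in fields) {
  tt[[f]] <- grepl(f, tt$text, ignore.case = TRUE)
}
```

A posting that lists several keywords gets TRUE in several columns, which is where the double counting mentioned in the caveat comes from.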

Reporting is currently limited to the following fields, in line with the procedure discussed here.

##  [1] "phonetics"         "phonology"         "morphology"       
##  [4] "syntax"            "semantics"         "historical"       
##  [7] "sociolinguistics"  "psycholinguistics" "langcog"          
## [10] "computational"     "acquisition"       "fieldwork"

Important caveat: The numbers reported here have been compiled without reading the actual post, and are limited to the keywords listed in the post itself. I make no claims regarding the accuracy of the categorization. In addition, some job postings may be classified within multiple fields (e.g., syntax and morphology), leading to double counting (but see my attempt to normalize the categorization of the postings).

Output

Three kinds of file are produced by the script: a summary document in html, data files in csv format, and plots in pdf format.

Data

The script produces three csv files with data from the period of interest:

  1. All the jobs data after post-processing (including the original job listing),
  2. All the tenure track jobs in the fields of interest, and
  3. A basic summary of the number of jobs listed per year in each field.
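
A minimal sketch of how three such files could be written with base R. The data frames, column names, and file names here are hypothetical and not taken from the script; the files go to a temporary directory so nothing is overwritten.

```r
# Hypothetical stand-in for the post-processed jobs data.
all_jobs <- data.frame(
  year = c(2018, 2019, 2019),
  field = c("syntax", "syntax", "phonology"),
  tenure_track = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Tenure track subset and a count of jobs per year in each field.
tt_jobs <- all_jobs[all_jobs$tenure_track, ]
summary_counts <- as.data.frame(table(year = tt_jobs$year, field = tt_jobs$field))

# Write to a temporary directory (file names are made up for this sketch).
out_dir <- tempdir()
write.csv(all_jobs, file.path(out_dir, "all_jobs.csv"), row.names = FALSE)
write.csv(tt_jobs, file.path(out_dir, "tt_jobs.csv"), row.names = FALSE)
write.csv(summary_counts, file.path(out_dir, "summary.csv"), row.names = FALSE)
```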

Plots

The script outputs a plot summarizing the average number of jobs posted for each field and a plot showing how many jobs were advertised in a single year. Plots are printed directly to pdf format.
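
For readers who want to see the pdf step in isolation, here is a small sketch using base R graphics. This is a swapped-in technique for illustration, since the script's own plotting code is not shown here; the output file name is hypothetical. The totals are taken from the Total column of the tenure track table for three of the fields.

```r
# Totals for 2004 to 2019, copied from the Total column of the table.
totals <- c(syntax = 197, phonology = 191, sociolinguistics = 159)

# Average number of postings per year over the period.
n_years <- length(2004:2019)
avg_per_year <- totals / n_years

# Print the plot directly to a pdf file in a temporary directory.
out_file <- file.path(tempdir(), "avg_jobs_per_field.pdf")
pdf(out_file)
barplot(avg_per_year, ylab = "Average postings per year", las = 2)
invisible(dev.off())
```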

Set years here

Enter years to search in the R markdown file here.

start = 2004
end = 2019
dates <- start:end

Set to search Linguist List from 2004 to 2019.

NB: The start date cannot be before 2000. Note also that there are very few job advertisements before 2004. The remainder of the script should run without additional input.
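
The constraint on the start year can be checked before any scraping begins. The guard below is an illustration only and is not part of the original script; it assumes 2000 is the earliest year that can be searched, as stated above.

```r
start <- 2004
end <- 2019

# Stop early if the range is invalid (assumption: 2000 is the earliest usable year).
stopifnot(start >= 2000, start <= end)

dates <- start:end
```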

You can also decide whether to include jobs with Applied Linguistics in the listing by changing the no.applied variable to FALSE.

no.applied <- TRUE
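
A sketch of how such a switch might be applied, assuming that no.applied set to TRUE means postings mentioning Applied Linguistics are dropped. The jobs data frame and its keywords column are hypothetical.

```r
no.applied <- TRUE

# Hypothetical postings with their listed keywords.
jobs <- data.frame(
  id = 1:3,
  keywords = c("Syntax", "Applied Linguistics; Phonology", "Semantics"),
  stringsAsFactors = FALSE
)

# Drop postings that list Applied Linguistics when the switch is on.
if (no.applied) {
  jobs <- jobs[!grepl("applied linguistics", jobs$keywords, ignore.case = TRUE), ]
}
```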

Tenure track job listings from 2004 to 2019.

Average number of jobs posted in period by year.
Field 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 Total
acquisition 4 6 6 4 5 1 2 7 10 10 5 8 22 21 15 12 138
computational 9 10 4 9 6 4 7 4 5 11 3 13 15 13 21 18 152
fieldwork 0 0 2 0 2 2 3 0 0 3 1 2 6 2 0 2 25
historical 1 2 1 3 5 2 1 3 3 6 2 1 3 4 2 5 44
langcog 1 4 2 1 5 4 4 6 4 7 4 6 2 4 6 6 66
morphology 0 1 1 1 0 2 3 0 2 3 3 3 5 2 2 6 34
phonetics 2 8 11 7 8 4 6 4 13 11 6 14 7 8 3 8 120
phonology 4 17 11 9 10 8 14 14 15 13 7 18 18 10 9 14 191
psycholinguistics 6 4 5 2 5 3 5 8 9 11 3 12 12 6 6 14 111
semantics 6 9 6 6 5 9 6 5 11 8 7 4 8 11 7 6 114
sociolinguistics 6 12 11 10 8 7 11 9 13 12 8 9 11 8 14 10 159
syntax 4 10 15 8 12 10 15 6 12 15 9 15 11 17 18 20 197
Total 43 83 75 60 71 56 77 66 97 110 58 105 120 106 103 121 1351



Hover over the boxes to view more information about the data:





Click on the legend to toggle the visibility of a subfield:



Normalizing the postings

Some of the job postings have multiple keywords. I’ve normalized these postings by dividing each posting by the total number of fields it references. For example, a posting listed with both semantics and psycholinguistics as keywords contributes only 0.5 to the total for each of those two fields.
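
The weighting can be sketched in a few lines of base R. The toy data frame and its logical field columns are assumptions for illustration, not the script's actual data structure.

```r
# Hypothetical tenure track postings with one logical column per field.
tt <- data.frame(
  year = c(2019, 2019, 2019),
  semantics = c(TRUE, TRUE, FALSE),
  psycholinguistics = c(TRUE, FALSE, FALSE),
  syntax = c(FALSE, FALSE, TRUE)
)
fields <- c("semantics", "psycholinguistics", "syntax")

# Logical matrix of field membership, one row per posting.
m <- as.matrix(tt[fields])

# Divide each row by the number of fields that posting references,
# so every posting contributes a total weight of 1.
# (Assumes each posting lists at least one field.)
weights <- m / rowSums(m)

# Normalized totals per field.
normalized <- colSums(weights)
```

Because each posting's weights sum to 1, the normalized totals add up to the number of postings, which is why the Total row of the normalized table is smaller than that of the raw counts.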



Normalized average number of jobs posted in period by year.
Field 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 Total
acquisition 3.5 5.5 5 4 4 1 2 5.33 9 6.08 4.25 4 15.23 12.75 11.75 7.67 101.07
computational 8.5 9 4 8.5 6 3.5 6.5 4 3.08 8.08 1.83 8.62 9.95 7.75 20.33 11.95 121.6
fieldwork 0 0 2 0 2 1.5 3 0 0 3 0.33 1.5 4.45 1.33 0 1.2 20.32
historical 1 2 1 3 5 1 1 2.33 2 3.67 1 1 2 1.67 0.83 3 31.5
langcog 1 2.5 1.5 1 4.5 3.33 3 4.5 2.83 2.75 2.83 2.42 1.5 2 2.92 3.17 41.75
morphology 0 1 1 0.33 0 1.25 1.5 0 0.83 1.33 1 0.95 1.32 0.67 1.33 2 14.52
phonetics 2 5.5 8.5 5 6.5 2.83 3.17 1.83 9.75 5.17 3.83 6.33 3.83 4.5 2 3.53 74.28
phonology 3.5 15 6.83 6 8.5 4.42 9 10.5 10.5 6.92 5.17 10.95 9.23 5.5 6.83 6.03 124.88
psycholinguistics 5.5 3 4.5 2 4.5 2.5 5 6 6.83 5.5 1.75 8.92 6.83 4.83 3.83 7.95 79.45
semantics 4.5 8 4.83 3.33 4 5.25 2.67 3.83 8.08 5.08 5.75 1.67 4.28 8.08 3.92 3.33 76.62
sociolinguistics 5.5 11.5 11 8.83 8 6.5 11 7.5 9.5 8.42 6.33 4.37 6.58 4.67 11.83 6.08 127.62
syntax 3 9 13.83 6 10 5.92 11.17 4.17 7.58 10 5.92 10.28 4.78 12.25 13.42 11.08 138.4
Total 38 72 64 48 63 39 59 50 70 66 40 61 70 66 79 67 952