Use this R Markdown script to scrape Linguist List for job trends with the following libraries:
Use the code as you like, but please just give a shout out if you use this script. Comments are most welcome!
The script is maintained in a GitHub repository (https://github.com/jaharris/linglist-scrape). To clone the repo, type the following git command into your terminal:
> git clone https://github.com/jaharris/linglist-scrape.git scrape
The script downloads the Linguist List job posting archives for the years specified below. After some reformatting, it removes all but tenure-track job postings and categorizes the jobs according to the keywords listed in each posting. The method for categorization largely follows previous efforts by Chris Potts, Heidi Harley, Stephanie Shih, and Rebecca Starr (see the Language Log postings on the 2008 data, 2009 data, and 2009-2012 data).
The fields currently reported are limited to the following, closely following the procedure discussed here.
## [1] "phonetics" "phonology" "morphology"
## [4] "syntax" "semantics" "historical"
## [7] "sociolinguistics" "psycholinguistics" "langcog"
## [10] "computational" "acquisition" "fieldwork"
Important caveat: The numbers reported here have been compiled without reading the actual posts, and are limited to the keywords listed in each post itself. I make no claims regarding the accuracy of the categorization. In addition, some job postings may be classified within multiple fields (e.g., syntax and morphology), leading to double counting (but see my attempt to normalize the categorization of the postings).
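To make the double-counting issue concrete, here is a minimal sketch of keyword-based categorization. It is not the script's actual code: the `postings` data frame, its `keywords` column, and the use of the field names as literal search terms are all assumptions made for illustration.

```r
# Hypothetical postings; the real script builds its data from the Linguist List archives.
postings <- data.frame(
  id = 1:3,
  keywords = c("Syntax; Morphology", "Semantics", "Phonology; Syntax"),
  stringsAsFactors = FALSE
)

# A subset of the reported fields, used here as literal search terms (an assumption).
fields <- c("phonology", "morphology", "syntax", "semantics")

# One logical column per field: TRUE when the field name occurs in the keywords.
matches <- sapply(fields, function(f) {
  grepl(f, tolower(postings$keywords), fixed = TRUE)
})

# Raw counts per field. Postings 1 and 3 each list two fields,
# so three postings contribute five counts in total.
colSums(matches)
```

Because each posting is counted once per matching field, the raw field totals can exceed the number of distinct postings.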
Three kinds of file are produced by the script: a summary document in html, data files in csv format, and plots in pdf format.
The script produces three csv files with data from the period of interest:
The script outputs a plot summarizing the average number of jobs posted for each field and a plot showing how many jobs were advertised in a single year. Plots are printed directly to pdf format.
Enter the years to search in the R Markdown file here.
start = 2004
end = 2019
dates <- start:end
Set to search Linguist List from 2004 to 2019.
NB. The start date cannot be before 2000. Note also that there are very few job advertisements before 2004. The remainder of the script should run without additional input.
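The year constraint can be checked before any scraping starts. The following is a minimal sketch and not part of the original script; it reuses the `start`, `end`, and `dates` names from the settings and adds a guard with `stopifnot`, which is my own addition.

```r
# Year range to search (same values as in the settings above).
start <- 2004
end <- 2019

# Guard (an addition, not in the original script): stop early if the start
# year precedes 2000 or the range is reversed.
stopifnot(start >= 2000, start <= end)

dates <- start:end
length(dates)  # number of years that will be searched
```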
You can also decide whether to include jobs with Applied Linguistics in the listing by changing the `no.applied` variable to `FALSE`.
no.applied <- TRUE
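As a rough illustration of what such a switch might do, the sketch below drops postings whose keywords mention Applied Linguistics when `no.applied` is `TRUE`. The `postings` data frame, the `keywords` column, and the matching rule are hypothetical; the script's own filtering may differ.

```r
no.applied <- TRUE

# Hypothetical postings used only for this example.
postings <- data.frame(
  keywords = c("Syntax", "Applied Linguistics; Phonetics", "Semantics"),
  stringsAsFactors = FALSE
)

# Flag postings that mention Applied Linguistics (case-insensitive match).
is.applied <- grepl("applied linguistics", postings$keywords, ignore.case = TRUE)

# Keep everything when no.applied is FALSE; otherwise drop the flagged rows.
kept <- if (no.applied) postings[!is.applied, , drop = FALSE] else postings
nrow(kept)
```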
Average number of jobs posted in period by year.

| Field | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| acquisition | 4 | 6 | 6 | 4 | 5 | 1 | 2 | 7 | 10 | 10 | 5 | 8 | 22 | 21 | 15 | 12 | 138 |
| computational | 9 | 10 | 4 | 9 | 6 | 4 | 7 | 4 | 5 | 11 | 3 | 13 | 15 | 13 | 21 | 18 | 152 |
| fieldwork | 0 | 0 | 2 | 0 | 2 | 2 | 3 | 0 | 0 | 3 | 1 | 2 | 6 | 2 | 0 | 2 | 25 |
| historical | 1 | 2 | 1 | 3 | 5 | 2 | 1 | 3 | 3 | 6 | 2 | 1 | 3 | 4 | 2 | 5 | 44 |
| langcog | 1 | 4 | 2 | 1 | 5 | 4 | 4 | 6 | 4 | 7 | 4 | 6 | 2 | 4 | 6 | 6 | 66 |
| morphology | 0 | 1 | 1 | 1 | 0 | 2 | 3 | 0 | 2 | 3 | 3 | 3 | 5 | 2 | 2 | 6 | 34 |
| phonetics | 2 | 8 | 11 | 7 | 8 | 4 | 6 | 4 | 13 | 11 | 6 | 14 | 7 | 8 | 3 | 8 | 120 |
| phonology | 4 | 17 | 11 | 9 | 10 | 8 | 14 | 14 | 15 | 13 | 7 | 18 | 18 | 10 | 9 | 14 | 191 |
| psycholinguistics | 6 | 4 | 5 | 2 | 5 | 3 | 5 | 8 | 9 | 11 | 3 | 12 | 12 | 6 | 6 | 14 | 111 |
| semantics | 6 | 9 | 6 | 6 | 5 | 9 | 6 | 5 | 11 | 8 | 7 | 4 | 8 | 11 | 7 | 6 | 114 |
| sociolinguistics | 6 | 12 | 11 | 10 | 8 | 7 | 11 | 9 | 13 | 12 | 8 | 9 | 11 | 8 | 14 | 10 | 159 |
| syntax | 4 | 10 | 15 | 8 | 12 | 10 | 15 | 6 | 12 | 15 | 9 | 15 | 11 | 17 | 18 | 20 | 197 |
| Total | 43 | 83 | 75 | 60 | 71 | 56 | 77 | 66 | 97 | 110 | 58 | 105 | 120 | 106 | 103 | 121 | 1351 |
Hover over the boxes to view more information about the data:
Click on the legend to toggle the visibility of a subfield:
Some of the job postings have multiple keywords. I’ve normalized these postings by dividing each job by how many total fields it references. For example, if a posting were listed with both semantics and psycholinguistics as keywords, the posting would only contribute 0.5 to the overall total of jobs in each field.
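The normalization described above can be sketched in a few lines of R. This is an illustration under assumptions rather than the script's own code: the `matches` matrix (rows are postings, columns are fields) is hypothetical and mirrors the semantics and psycholinguistics example.

```r
# Hypothetical field-membership matrix: one row per posting, one column per field.
matches <- rbind(
  c(semantics = TRUE,  psycholinguistics = TRUE,  syntax = FALSE),  # two fields
  c(semantics = FALSE, psycholinguistics = FALSE, syntax = TRUE)    # one field
)

# Divide each row by the number of fields the posting references.
# Postings with no matching field would give 0/0 here, so they are assumed
# to have been dropped beforehand.
weights <- matches / rowSums(matches)

# Normalized counts per field: the first posting adds 0.5 to each of its two fields.
colSums(weights)
```

With this weighting each posting contributes exactly 1 in total, so the normalized field counts sum to the number of distinct postings rather than to the number of keyword matches.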
Normalized average number of jobs posted in period by year.

| Field | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| acquisition | 3.5 | 5.5 | 5 | 4 | 4 | 1 | 2 | 5.33 | 9 | 6.08 | 4.25 | 4 | 15.23 | 12.75 | 11.75 | 7.67 | 101.07 |
| computational | 8.5 | 9 | 4 | 8.5 | 6 | 3.5 | 6.5 | 4 | 3.08 | 8.08 | 1.83 | 8.62 | 9.95 | 7.75 | 20.33 | 11.95 | 121.6 |
| fieldwork | 0 | 0 | 2 | 0 | 2 | 1.5 | 3 | 0 | 0 | 3 | 0.33 | 1.5 | 4.45 | 1.33 | 0 | 1.2 | 20.32 |
| historical | 1 | 2 | 1 | 3 | 5 | 1 | 1 | 2.33 | 2 | 3.67 | 1 | 1 | 2 | 1.67 | 0.83 | 3 | 31.5 |
| langcog | 1 | 2.5 | 1.5 | 1 | 4.5 | 3.33 | 3 | 4.5 | 2.83 | 2.75 | 2.83 | 2.42 | 1.5 | 2 | 2.92 | 3.17 | 41.75 |
| morphology | 0 | 1 | 1 | 0.33 | 0 | 1.25 | 1.5 | 0 | 0.83 | 1.33 | 1 | 0.95 | 1.32 | 0.67 | 1.33 | 2 | 14.52 |
| phonetics | 2 | 5.5 | 8.5 | 5 | 6.5 | 2.83 | 3.17 | 1.83 | 9.75 | 5.17 | 3.83 | 6.33 | 3.83 | 4.5 | 2 | 3.53 | 74.28 |
| phonology | 3.5 | 15 | 6.83 | 6 | 8.5 | 4.42 | 9 | 10.5 | 10.5 | 6.92 | 5.17 | 10.95 | 9.23 | 5.5 | 6.83 | 6.03 | 124.88 |
| psycholinguistics | 5.5 | 3 | 4.5 | 2 | 4.5 | 2.5 | 5 | 6 | 6.83 | 5.5 | 1.75 | 8.92 | 6.83 | 4.83 | 3.83 | 7.95 | 79.45 |
| semantics | 4.5 | 8 | 4.83 | 3.33 | 4 | 5.25 | 2.67 | 3.83 | 8.08 | 5.08 | 5.75 | 1.67 | 4.28 | 8.08 | 3.92 | 3.33 | 76.62 |
| sociolinguistics | 5.5 | 11.5 | 11 | 8.83 | 8 | 6.5 | 11 | 7.5 | 9.5 | 8.42 | 6.33 | 4.37 | 6.58 | 4.67 | 11.83 | 6.08 | 127.62 |
| syntax | 3 | 9 | 13.83 | 6 | 10 | 5.92 | 11.17 | 4.17 | 7.58 | 10 | 5.92 | 10.28 | 4.78 | 12.25 | 13.42 | 11.08 | 138.4 |
| Total | 38 | 72 | 64 | 48 | 63 | 39 | 59 | 50 | 70 | 66 | 40 | 61 | 70 | 66 | 79 | 67 | 952 |