Chris Bail
Department of Sociology
Duke University
A nascent field that uses computer-based techniques to analyze the rapidly increasing amount of data produced via digital sources or currently being archived in digital form.
Because many of the most pressing questions in sociology require vast amounts of data about social relationships between large groups of people that are rich in qualitative, longitudinal detail.
If we don't do it, other fields will!
Computational methods not only require considerable technical knowledge about how to code, collect unstructured data, and analyze large amounts of data; they also introduce a range of new theoretical, ethical, and logistical issues.
Computer science courses don't anticipate the types of questions social scientists might ask. As a result, they a) introduce many unnecessary concepts and b) do a poor job of explaining how computer programming tools might be used by social scientists. In addition, c) computer scientists who engage social science theories usually produce lackluster research.
No. In my experience, learning to code is 5% intelligence, 95% endurance.
This course assumes NO prior experience coding… or your money back.
Unspoken/tacit knowledge among coders
Learning to think like a computer
Assuming failings are your fault
Expecting too much of computers
Becoming impatient while learning elementary concepts
We will learn to code together. You will do what I do on my laptop
We will work through messy, “real-world” examples that you might encounter.
We will tie our work in this class to your own research/publications
A little bit about me…
Please tell me your name, your disciplinary background, and why you are interested in taking this course. Please also tell me if you have any experience with the R programming language, or any other programming language.
This class may be a bit more didactic than other courses you have taken.
Each class will begin with hands-on instruction in coding, followed by individual exercises.
I have assigned ungraded homework assignments, each of which is designed to help you stitch together data collection and analysis for a final paper.
I HOPE this paper will complement something you are already doing (MA paper, Dissertation Proposal, Dissertation Article, etc.)
I have also assigned readings which we will discuss in the context of coding together. These readings are intended to illustrate potential applications of the methods, and inspire you to think about how the data might be used.
My deeply pragmatic teaching philosophy for this course requires you to give me constant input, and ask for feedback on your work whenever you need it. I am happy to look at homework assignments with you in office hours, or read an outline or first draft of your final paper if you give me enough time to do so.
Pros
1) Free; 2) Large user-base; 3) Always at the edge of innovation
Cons
1) Disorganized; 2) Steep learning curve/domain knowledge required; 3) No “manual” or customer support
Access to state-of-the-art statistics
Impressive visualization capabilities
Faster and more efficient than STATA/SPSS/SAS
Fully fledged programming language that can interface with other powerful languages (e.g. Python, C++)
R can scrape data from the web.
R is the language most likely to become the lingua franca of computational sociology because it combines statistics, visualization, and programming capabilities like no other language.
All object-oriented languages have steep learning curves.
This is because R is not a “stand alone” software package that can be downloaded and used “off the shelf.”
R is difficult for STATA/SPSS/SAS users to learn because you must master a variety of concepts that are unique to object-oriented languages.
These basic concepts may seem irrelevant to your work right now, but they will become extremely useful later because most types of analysis in R (and computational sociology more broadly) are more complicated than they seem at first.
There is so much to learn that I recommend not focusing on memorizing syntax or “facts” about R.
Instead, try to get a general sense of how things work, and learn how to look up the details when you need them.
First, install R Terminal: http://cran.rstudio.com/
Second, install RStudio: http://www.rstudio.com/products/rstudio/download/
Choose the appropriate version of each (Mac/PC/Linux)
Please visit this link to download the files we will use for the remainder of the class: http://bit.ly/2c1E96s
Download the entire folder to a location you will remember. The Desktop is OK, but you may prefer to create a new folder.
Start by opening the file entitled “Class # 1 R Code.R”
If you want to follow along with this presentation, open this .html file in a browser:
Computational Social (Class #1).html
I recommend opening a “new script” using the “File/New” drop-down menu in RStudio and typing in the code on the slides that follow.
I have tried to “take notes” for you, in the form of an annotated script within the Dropbox entitled “Class # 1 R Code.R”
I recommend that you type the code yourself into the new file in order to “learn by doing,” etc.
This line tells you what your “working directory” is:
getwd()
[1] "/Users/christopherandrewbail/Desktop/Dropbox/TEACHING/Computational Soc Fall 2015/Course Dropbox"
To set your working directory to the desktop, type:
setwd("~/Desktop")
Sometimes, you will need to specify the entire file path:
setwd("/Users/christopherandrewbail/Desktop/Dropbox/TEACHING/Computational Soc Fall 2015/Course Dropbox")
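If you change your working directory in a script, it can help to save the old one first so you can switch back. The lines below are a minimal sketch of that pattern. The names original_dir and new_dir are made up for this illustration, and tempdir() (a scratch folder that R creates for each session) stands in for a folder such as the Desktop so that the code runs on any computer.

```r
# Remember where we started so we can return afterwards.
original_dir <- getwd()

# tempdir() is a scratch folder R creates for each session; it stands in
# here for a folder such as "~/Desktop".
setwd(tempdir())
new_dir <- getwd()
print(new_dir)

# Return to the original working directory.
setwd(original_dir)
```

After the last line runs, getwd() should report the same folder you started in.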
Next, let's take a look at what documents are in your working directory:
list.files()
[1] "Class # 1 R Code.R" "Income By Race.xlsx" "Introduction to R Day 1.html"
[4] "Introduction to R Day 2.html" "OECD Health Data" "Pew Data.Rdata"
[7] "R Cheatsheets" "Sample Stata Data.dta" "Sample_CSV_Data.csv"
[10] "Sample_Pew_data.csv" "Syllabus (Computational Sociology).docx"
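list.files() can also look inside a folder other than the working directory, and its pattern argument narrows the listing to file names that match. Here is a small sketch that assumes nothing about your own files: it creates a scratch folder with three empty files whose names are invented for the example, then lists them.

```r
# Create a scratch folder with three empty example files (hypothetical names).
scratch <- file.path(tempdir(), "list_files_demo")
dir.create(scratch, showWarnings = FALSE)
invisible(file.create(file.path(scratch, c("notes.R", "survey.csv", "panel.csv"))))

# List everything in the scratch folder.
all_files <- list.files(scratch)
print(all_files)

# List only the files whose names end in ".csv".
csv_files <- list.files(scratch, pattern = "\\.csv$")
print(csv_files)
```

The pattern is a regular expression: the doubled backslash makes the dot literal, and the dollar sign anchors the match to the end of the file name.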
A very basic operation:
2+2
[1] 4
Now let's create our first object or variable in R.
To do this, you need to use the <- operator
my_number<-2
If you type my_number and run the line (control+enter)
you will see it has a value of “2”:
my_number
[1] 2
We could have also accomplished this by writing my_number=2
Note that my_number now appears in the upper-right hand pane
of RStudio
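The objects shown in that pane can also be inspected from the console. As a rough sketch, the lines below assign the same value with both operators, check that the results match, and then list and remove objects. The name my_other_number is invented for this comparison.

```r
# Both lines assign the value 2; `<-` is the more conventional style in R.
my_number <- 2
my_other_number = 2   # hypothetical second variable, for comparison only

# identical() checks that the two objects hold exactly the same value.
same_value <- identical(my_number, my_other_number)
print(same_value)

# ls() lists the names of the objects currently defined in your workspace.
print(ls())

# rm() deletes an object you no longer need.
rm(my_other_number)
```

Once rm() has run, my_other_number should disappear from the upper-right pane as well.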
Let's try some more basic operations:
2*my_number
[1] 4
2+my_number
[1] 4
… And a few more:
my_number/3
[1] 0.6666667
my_number^3
[1] 8
If we want to store the results of these basic
operations, we could use the <- operator again:
my_new_number<-2*my_number
When naming variables or objects in R, try to
avoid names that may cause confusion because they are
already used by R functions. For example, don't name a
variable mean or median.
It is also good practice to give your variables or objects clearly distinct names so you don't accidentally reuse one in your code.
Also, keep in mind that R is case sensitive. If one letter is accidentally capitalized, your command won't work.
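Both naming points can be checked in the console. The sketch below is only an illustration: My_Number is a deliberately mis-capitalized name, and tryCatch() is used so that the error message is captured as text rather than stopping the script.

```r
my_number <- 2

# R treats my_number and My_Number as two different names.
exists("my_number")   # TRUE: this object was created above
exists("My_Number")   # FALSE, unless you created an object with this exact name earlier

# tryCatch() captures the error message instead of halting the script.
result <- tryCatch(My_Number * 2, error = function(e) conditionMessage(e))
print(result)

# mean is already the name of a built-in function, so avoid reusing it.
is.function(mean)
```

The captured message names the object R could not find, which is usually the quickest clue that a letter was capitalized by mistake.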
We can also create character or string variables by using either double or single quotation marks:
my_name<-"Georg Simmel"
If we want to see the variable, we can use this command:
print(my_name)
[1] "Georg Simmel"
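A few built-in functions are handy for checking what a string variable contains. This is a brief sketch, not an exhaustive tour. The names same_name and greeting are invented for the example.

```r
my_name <- "Georg Simmel"
same_name <- 'Georg Simmel'   # hypothetical variable written with single quotes

# Single and double quotation marks produce exactly the same string.
identical(my_name, same_name)

# class() reports the type of the object; nchar() counts its characters.
class(my_name)   # "character"
nchar(my_name)   # 12 (the space counts as a character)

# paste() joins strings together, separated by a space by default.
greeting <- paste("Hello,", my_name)
print(greeting)
```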
Computational sociology is extremely exciting
But it requires an entirely new way of thinking about and doing sociology
The first steps of coding that we covered today are very boring
Be patient, even if it seems like most of what we did today is irrelevant to the type of sociology you do or want to do.
1) Open RStudio and create a new .R file. Set your working directory to a new folder on your computer (any one will do). Create two numeric variables and multiply them by each other. Create a new string variable that consists of your first and last name.
2) Read Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A.-L., Brewer, D., … Van Alstyne, M. (2009). Computational Social Science. Science, 323(5915), 721–723.
3) Read King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 331(6018), 719–721.