Computational Sociology

Chris Bail
Department of Sociology
Duke University

What is Computational Sociology?

A nascent field that uses computer-based techniques to analyze the rapidly increasing amount of data produced via digital sources or currently being archived in digital form.

Why Do We Need Another Subfield?

Because many of the most pressing questions in sociology require vast amounts of data about social relationships among large groups of people, rich in qualitative and longitudinal detail.

If we don't do it, other fields will!

Why Do We Need a New Type of Course?

Computational methods require considerable technical knowledge about how to code, collect unstructured data, and analyze large amounts of data. They also introduce a range of new theoretical, ethical, and logistical issues.

Shouldn't We Learn from Computer Scientists?

Computer science courses don't anticipate the types of questions social scientists might ask. As a result, they a) introduce many unnecessary concepts and b) do a poor job of explaining how computer programming tools might be used by social scientists. In addition, c) computer scientists who engage social science theories usually produce lackluster research.

Isn't Coding Difficult?

No. In my experience, learning to code is 5% intelligence, 95% endurance.

This course assumes NO prior experience coding… or your money back.

Obstacles to Learning

Unspoken/tacit knowledge among coders

Obstacles to Learning

Learning to think like a computer

Obstacles to Learning

Assuming failings are your fault

Obstacles to Learning

Expecting too much of computers

Obstacles to Learning

Becoming impatient while learning elementary concepts

Active Learning

We will learn to code together. You will do what I do on my laptop.

We will work through messy, “real-world” examples that you might encounter.

We will tie our work in this class to your own research/publications.

INTRODUCTIONS

Introductions

A little bit about me…

Introductions

Please tell me your name, your disciplinary background, and why you are interested in taking this course. Please also tell me if you have any experience with the R programming language, or any other programming language.

HOUSEKEEPING

Class Structure

This class may be a bit more didactic than other courses you have taken.

Each class will begin with hands-on instruction in coding, followed by individual exercises.

Class Structure

I have assigned ungraded homework assignments, each of which is designed to help you stitch together data collection and analysis for a final paper.

I HOPE this paper will complement something you are already doing (MA paper, Dissertation Proposal, Dissertation Article, etc.)

Class Structure

I have also assigned readings which we will discuss in the context of coding together. These readings are intended to illustrate potential applications of the methods, and inspire you to think about how the data might be used.

Laissez-Faire

My deeply pragmatic teaching philosophy for this course requires you to give me constant input, and ask for feedback on your work whenever you need it. I am happy to look at homework assignments with you in office hours, or read an outline or first draft of your final paper if you give me enough time to do so.

Please Give Me Feedback

Link to anonymous web survey:

http://goo.gl/forms/QJVxr8msK0

INTRODUCTION TO R

Bell Labs: The "S" Language


R is Open Source

Open Source Software

Pros

1) Free; 2) Large user-base; 3) Always at the edge of innovation

Cons

1) Disorganized; 2) Steep learning curve/domain knowledge required; 3) No “manual” or customer support


Google Scholar Mentions


Why R?

Access to state-of-the-art statistics

Why R?

Impressive visualization capabilities

Why R?

Faster and more efficient than Stata/SPSS/SAS

Why R?

Fully fledged programming language that can interface with other powerful languages (e.g. Python, C++)

Why R?

R can scrape data from the web.

Why R?

R is the language most likely to become the lingua franca of computational sociology because it combines statistics, visualization, and programming capabilities like no other language.

Beautiful aRt

R even made this Presentation!


R is Object-Oriented


R's Learning Curve

All object-oriented languages have steep learning curves.

R's Learning Curve

This is because R is not a “stand alone” software package that can be downloaded and used “off the shelf.”

R's Learning Curve

R is difficult for Stata/SPSS/SAS users to learn because you must master a variety of concepts that are unique to object-oriented languages.

R's Learning Curve

These basic concepts may seem irrelevant to your work right now, but they will become extremely useful later because most types of analysis in R (and computational sociology more broadly) are more complicated than they seem at first.

R's Learning Curve

There is so much to learn that I recommend not focusing on memorizing syntax or “facts” about R.

Instead, try to get a general sense of how things work and learn how to look up such details when you need them.

Let's Try to Avoid This:


STARTING TO WORK WITH R

Ways to Use R


Let's Install R & RStudio

First, install R Terminal: http://cran.rstudio.com/
Second, install RStudio: http://www.rstudio.com/products/rstudio/download/
Choose the appropriate version of each (Mac/PC/Linux)

Let's Install R & RStudio

You may wish to set RStudio as the “default program” for opening files with the extensions “.R” and “.Rdata”
- on a Mac, “right click” on the file, then “Get Info”, then “Open with”
- on Windows, select “Default Programs” from the Start Menu

Our Class Dropbox

Please visit this link to download the files we will use for the remainder of the class: http://bit.ly/2c1E96s
Download the entire folder to a location you will remember. The Desktop is OK, but you may prefer to create a new folder.
Start by opening the file entitled “Class # 1 R Code.R”
- double click, and remember that you want to open it in RStudio, not R

Our Class Dropbox

If you want to follow along with this presentation, open this .html file in a browser:

Computational Social (Class #1).html

Getting to Know RStudio

How to Follow Along

I recommend opening a “new script” using the “File/New” drop-down menu in RStudio and typing in the code on the slides that follow.

I have tried to “take notes” for you, in the form of an annotated script within the Dropbox entitled “Class # 1 R Code.R”

I recommend that you type the code yourself into the new file in order to “learn by doing,” etc.

SETTING YOUR WORKING DIRECTORY

Setting Your Working Directory

This line tells you what your “working directory” is:

getwd()

[1] "/Users/christopherandrewbail/Desktop/Dropbox/TEACHING/Computational Soc Fall 2015/Course Dropbox"

Setting Your Working Directory

To set your working directory to the desktop, type:

setwd("~/Desktop")

Sometimes, you will need to specify the entire file path:

setwd("/Users/christopherandrewbail/Desktop/Dropbox/TEACHING/Computational Soc Fall 2015/Course Dropbox")
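If you are unsure whether a folder exists, you can check before switching to it. The following is a minimal sketch: the folder is a temporary one created only for this demonstration (an assumption, not one of the course folders), so substitute your own path.

```r
# Create a throwaway folder for the demonstration (hypothetical; use your own path).
demo_dir <- file.path(tempdir(), "working_directory_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Remember where we started so we can return afterwards.
old_dir <- getwd()

# Only switch if the folder really exists; setwd() stops with an error otherwise.
if (dir.exists(demo_dir)) {
  setwd(demo_dir)
}

# Confirm the change by looking at the last part of the current path.
current_folder <- basename(getwd())
print(current_folder)

# Go back to the original working directory.
setwd(old_dir)
```

Using file.path() to build the path means you do not have to worry about whether your computer separates folders with “/” or “\”.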

Setting Your Working Directory

Next, let's take a look at what documents are in your working directory:

list.files()

 [1] "Class # 1 R Code.R"                      "Income By Race.xlsx"                     "Introduction to R Day 1.html"
 [4] "Introduction to R Day 2.html"            "OECD Health Data"                        "Pew Data.Rdata"
 [7] "R Cheatsheets"                           "Sample Stata Data.dta"                   "Sample_CSV_Data.csv"
[10] "Sample_Pew_data.csv"                     "Syllabus (Computational Sociology).docx"
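When a folder holds many files, it can help to list only those of one type. The sketch below builds a small temporary folder so that it runs anywhere; the folder and file names are hypothetical and are not the course files.

```r
# Build a small throwaway folder so the example is self-contained
# (the file names below are hypothetical).
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("survey.csv", "notes.txt", "income.csv")))

# List everything in the folder.
all_files <- list.files(demo_dir)
print(all_files)

# List only the files whose names end in ".csv".
# The pattern is a regular expression: "\\." is a literal dot, "$" means "end of name".
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
print(csv_files)
```

Calling list.files() with no arguments, as on the slide, looks in the current working directory; passing a path as the first argument looks in that folder instead.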

BASIC OPERATIONS IN R

Basic Operations in R

A very basic operation:

2+2

[1] 4

Basic Operations in R

Now let's create our first object or variable in R. To do this, you need to use the <- operator:

my_number<-2

Basic Operations in R

If you type my_number and run the line (control+enter) you will see it has a value of “2”:

my_number

[1] 2

We could have also accomplished this by writing my_number=2, although most R code uses <- for assignment.
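A short sketch comparing the two ways of assigning a value. The name my_other_number is made up for this illustration.

```r
# Assign with the arrow operator.
my_number <- 2

# Assign the same value with the equals sign (my_other_number is a hypothetical name).
my_other_number = 2

# Both objects hold the same value.
same_value <- identical(my_number, my_other_number)
print(same_value)

# ls() lists the names of the objects currently stored in your workspace.
ls()
```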

Basic Operations in R

Note that my_number now appears in the upper-right pane of RStudio.

Basic Operations in R

Let's try some more basic operations:

2*my_number

[1] 4

2+my_number

[1] 4

Basic Operations in R

… And a few more:

my_number/3

[1] 0.6666667

my_number^3

[1] 8
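A few more operators you may find useful, shown as a sketch that continues with the same my_number object. The comments give the value each line returns.

```r
my_number <- 2

# Subtraction
my_number - 5        # -3

# Parentheses control the order of operations
(my_number + 1) * 3  # 9
my_number + 1 * 3    # 5

# Integer division and remainder
7 %/% my_number      # 3
7 %% my_number       # 1

# Square root
sqrt(16)             # 4
```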

Basic Operations in R

If we want to store the results of these basic operations, we could use the <- operator again:

my_new_number<-2*my_number
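The sketch below illustrates two things that often surprise new users: assigning a value prints nothing, and a stored result is not recalculated when the object it was built from changes later. The name my_sum is made up for this illustration.

```r
my_number <- 2
my_new_number <- 2 * my_number

# Assignment prints nothing; type the name to see the stored value.
my_new_number

# Stored objects can be used to build further objects (my_sum is a hypothetical name).
my_sum <- my_number + my_new_number

# Reassigning a name overwrites the old value without any warning.
my_number <- 10

# my_new_number keeps the value computed earlier; it is not recalculated.
my_new_number
```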

Basic Operations in R

When naming variables or objects in R, try to avoid names that are identical to existing commands, because they make your code confusing. For example, don't name a variable mean or median.
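One way to check whether a name is already in use is the exists() function. This is only a sketch; the name mean_income and the numbers are invented for the example.

```r
# exists() reports whether a name is already in use.
exists("mean")      # TRUE: mean() is a built-in function
exists("median")    # TRUE: median() is a built-in function

# A descriptive name avoids the clash (the name and values are hypothetical).
mean_income <- mean(c(30000, 45000, 60000))
mean_income
```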

Basic Operations in R

It is also good practice to give your variables or objects clearly distinct names so you don't accidentally reuse a name and overwrite an earlier object.

Basic Operations in R

Also, keep in mind that R is case sensitive. If one letter is accidentally capitalized, your command won't work.
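A small sketch of what case sensitivity looks like in practice. It uses tryCatch(), which we have not covered, only so that the error is captured as text instead of stopping the script.

```r
my_number <- 2

# Asking for "My_Number" fails because no object has that exact name.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  My_Number,
  error = function(e) conditionMessage(e)
)
print(result)

# The correctly spelled name works.
my_number
```

If you run My_Number on its own line, R stops and reports that it cannot find an object with that name.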

Basic Operations in R

We can also create character or string variables by using either double or single quotation marks:

my_name<-"Georg Simmel"

If we want to see the variable, we can use this command:

print(my_name)

[1] "Georg Simmel"
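A few more things you can do with string variables, as a sketch. The names name_double, name_single, first_name, last_name, and full_name are made up for this illustration.

```r
# Single and double quotation marks create the same string.
name_double <- "Georg Simmel"
name_single <- 'Georg Simmel'
identical(name_double, name_single)

# Count the characters in a string (the space counts too).
nchar(name_double)

# Combine strings with paste(); sep sets what goes between the pieces.
first_name <- "Georg"
last_name <- "Simmel"
full_name <- paste(first_name, last_name, sep = " ")
print(full_name)

# Anything inside quotation marks is stored as text, even a number.
class("2")
```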

QUESTIONS?

Wrapping Up

Computational sociology is extremely exciting

Wrapping Up

But it requires an entirely new way of thinking about and doing sociology

Wrapping Up

The first steps of coding that we covered today are very boring

Wrapping Up

Be patient, even if it seems like most of what we did today is irrelevant to the type of sociology you do or want to do.

Homework

1) Open RStudio and create a new .R file. Set your working directory to a new folder on your computer (any one will do). Create two numeric variables and multiply them by each other. Create a new string variable that consists of your first and last name.

2) Read Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A.-L., Brewer, D., … Van Alstyne, M. (2009). Computational Social Science. Science, 323(5915), 721–723.

3) Read King, G. (2011). Ensuring the Data-Rich Future of the Social Sciences. Science, 331(6018), 719–721.