2026-03-30

General Class Structure

  • Monday: Theory
  • Warm-up Activity (10 min)
  • Discussion on readings (40 min)
  • Research design work (25 min)
  • Project questions (remainder)

General Class Structure

  • Wednesday: Laboratory
  • Warm-up Activity (10 min)
  • New material + discussions (40 min)
  • Additional material + discussions (25 min)
  • Problem set / final project questions (remainder)

Today’s Class

  • Intros
  • Thinking like data scientists
  • Activity
  • Intro to R, RStudio, Python, Anaconda, Jupyter Notebooks, and Julia
  • Interactive Coding Demo in R
  • Installing R and RStudio

Wednesday’s Class

  • On Zoom
  • Syllabus
  • Intro to coding in R

Office Hours

  • Office Hours: Fridays, 1:30pm-3:00pm

Introductions

  • Name
  • Program
  • One reason you are taking this class
  • Do you see yourself as a data scientist or social scientist?

What is computational social science?

  • In groups

Let’s Consider a Study

  • computation, machine learning, new method
  • social importance, poverty

Predicting Poverty

  • Far more detailed than previous estimates

Predicting Poverty

  • Performed well on accuracy, and was 10x faster and 50x cheaper than a census

Predicting Poverty

  • In groups: what are potential benefits?
  • In groups: what are potential trade-offs?
  • new information!
  • could reduce inequality?
  • privacy
  • accuracy
  • generalizability?
  • errors
  • unknown how it will be used

Let’s Begin

  • Salganik: The best place to start is research design.
  • Research question (e.g. what is poverty in Rwandan neighborhoods?)
  • Data (e.g. cell phone data)

Research Design Activity

  • Task: design a research study to estimate the number of self-described “data scientists” and “social scientists” at Stanford
  • What is the outcome of interest and/or research question?
  • What data will you use? (ready made vs. custom made)
  • Be creative!

Research Design Activity

  • Ready-Made Data: LinkedIn, GitHub, etc.
  • Custom-Made Data: Surveys, compilations of student records, etc.
  • Things to consider:
  • What can we infer from online information or class schedules?
  • Can people be both? neither?
  • How costly are our methods?
  • Privacy/ethics concerns

Tools for Computational Social Science

Introducing R

  • Developed by statisticians
  • Lots of resources
  • Flexible
  • Amazing graphics
  • Open source, free
  • Syntax challenges from different packages

Introducing RStudio

  • Environment for working in R

Introducing Python

  • Developed by computer scientists
  • Lots of resources
  • Flexible and more popular than R
  • Open source, free
  • Similar to R, relies on various packages

Introducing Anaconda

  • Environment for launching multiple programs, including Jupyter Notebooks

Introducing Jupyter Notebooks

  • Flexible and intuitive way to integrate code and text
  • Similar to R Markdown in function

Introducing R Markdown

  • Creates nice PDF and HTML documents, and more
  • Recommended way to turn in assignments for this class (generate PDF or HTML files)

Introducing Julia

  • Quickly becoming popular in data science! Why?

Trying Out R

Before Next Class

  • Install R and RStudio
  • Optional: watch video linked at the end of exercise (~14min)
  • Optional: look at the syllabus and problem set 1 (linked on Canvas)
  • See you Wednesday on Zoom!

Class Plan

  • Syllabus (20 min)
  • Break (5 min)
  • Coding! Coding! Coding! (50 min)
  • Introduce Problem Set (Remainder)

Syllabus

Starting an R Script

#####################################################
## title: a new R script!
## author: you!
## purpose: to try out R
## date: today's date
#####################################################

# you can start coding below

Using Base R

# what does R do with numbers?
2
## [1] 2
# what if we try adding or multiplying?
4+3
## [1] 7
53*2
## [1] 106

Using Base R

# what if we put all the previous results together?
c(2, 3, 4+3, 53*2, 90*4/5)
## [1]   2   3   7 106  72

(Some) types of data in R

  • Numeric
  • Character
  • Logical
  • Integer
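
One way to see these types in practice is to ask R for the class of a value. The sketch below uses example values chosen here for illustration; class() is a base R function.

```r
# class() reports how R is storing a value
class(2.5)     # a number with a decimal part
## [1] "numeric"
class("two")   # text inside quotes
## [1] "character"
class(TRUE)    # a logical value (TRUE or FALSE)
## [1] "logical"
class(2L)      # the L suffix asks R for an integer
## [1] "integer"

# a plain 2 (no L suffix) is stored as numeric, not integer
class(2)
## [1] "numeric"
```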

Explore on your own

  • Pick a number! Try running this number as different types of data (as.integer, as.numeric, as.character, as.logical)
  • Try using comparison operators (<, >, ==, !=, >=, <=) with different character and numeric values
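
A possible starting point for this exploration, assuming the number 7 and a few made-up character values (these are illustrations, not required answers):

```r
number <- 7

as.integer(number)     # 7 stored as an integer
as.character(number)   # "7": now text rather than a number
as.logical(number)     # TRUE: any nonzero number becomes TRUE
as.logical(0)          # FALSE: zero is the only number that becomes FALSE

# comparisons return logical values
number > 3             # TRUE
number == 7            # TRUE
number != 7            # FALSE

# character values are compared alphabetically
"apple" < "banana"     # TRUE

# mixing types: R converts the number to text first,
# so this compares "10" with "9" as text, not as numbers
10 < "9"               # TRUE
```

The last line is worth a second look: a comparison between a number and a character value is done on text, which can give surprising results.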

Using Base R

# what if we arrange the same values in a matrix?
matrix(c(2, 3, 4+3, 53*2, 90*4/5),
       nrow = 5)
##      [,1]
## [1,]    2
## [2,]    3
## [3,]    7
## [4,]  106
## [5,]   72

A Note on ?’s

  • The ? sends you to the help page of whatever function follows it
  • Try it! Type ?matrix and then press Ctrl + Enter

Using Base R

  • Try creating your own matrix!
  • Use the matrix() function
  • Add numbers by putting c(your numbers here) inside
  • Can you create a \(3 \times 3\) matrix?
  • Note: ? will give you more info on how a function works (e.g. ?matrix)
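
One possible answer to the exercise, sketched with the numbers 1 through 9 (any nine values would work). The names m and m_by_row are made up for this example; nrow, ncol, and byrow are arguments of matrix().

```r
# nine numbers arranged into 3 rows and 3 columns
# by default R fills the matrix one column at a time
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9),
            nrow = 3, ncol = 3)
m
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

# byrow = TRUE fills one row at a time instead
m_by_row <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                   nrow = 3, byrow = TRUE)

# dim() reports the number of rows and columns
dim(m)
## [1] 3 3
```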

Using Base R

list(c(2, 3, 4+3),
     53*2,
     90*4/5)
## [[1]]
## [1] 2 3 7
## 
## [[2]]
## [1] 106
## 
## [[3]]
## [1] 72

(Some) Structures of data in R

  • Vectors
  • Matrices
  • Lists
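
A short side-by-side sketch of the three structures, using made-up values and the hypothetical labels v, m, and l. It also shows how to pull a single element out of each one.

```r
v <- c(2, 3, 7)                           # vector: one dimension, one type
m <- matrix(c(2, 3, 7, 106), nrow = 2)    # matrix: rows and columns, one type
l <- list(c(2, 3, 7), "penguin", TRUE)    # list: elements can differ in type and length

v[2]       # second element of the vector
## [1] 3
m[2, 1]    # row 2, column 1 of the matrix
## [1] 3
l[[2]]     # second element of the list
## [1] "penguin"

# a vector holds only one type, so mixing numbers and text
# turns every element into text
class(c(2, "three"))
## [1] "character"
```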

Using Base R

  • An important part of coding is assigning values to labels
  • We can assign vectors, matrices, lists, and more
  • This is helpful for storing them and using them later
# assign a vector of numbers to the label "penguin"
penguin <- c(2, 3, 4+3, 53*2, 90*4/5)

# now take a look at the result
penguin
## [1]   2   3   7 106  72

Using Base R

  • Something else we can do is apply functions to values
# let's have R sum the numbers
sum(c(2, 3, 4+3, 53*2, 90*4/5))
## [1] 190

Using Base R

  • Take a minute to
  1. Assign a series of numbers to a label
  2. Sum this label
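
A sketch of what these two steps could look like. The label scores and its values are invented for this example.

```r
# 1. assign a series of numbers to a label
scores <- c(4, 8, 15, 16, 23, 42)

# 2. sum the label
sum(scores)
## [1] 108

# other functions work on the same label
mean(scores)
## [1] 18
```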

Using Base R

  • We can also write our own functions!
# we'll call our function 'addition'
addition <- function(x, y){
  return(x+y)
}

Using Base R

addition(2, 2)
## [1] 4

Using Base R

  • As an exercise, try writing your own function
  • Start with just one argument (function(x))
  • Then try adding more args (function(x,y))
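
Two example functions along these lines, one with a single argument and one with two. The names double_it and scale_by are hypothetical, and the default value for y is an optional extra beyond what the exercise asks for.

```r
# one argument: double whatever number is passed in
double_it <- function(x){
  return(x * 2)
}

double_it(5)
## [1] 10

# two arguments: y has a default value, so it can be left out
scale_by <- function(x, y = 10){
  return(x * y)
}

scale_by(3)      # uses the default y = 10
## [1] 30
scale_by(3, 2)   # overrides the default
## [1] 6
```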

Using Base R

  • Loops iterate the same procedure through a series of values
# try printing each of the first 10 numbers

for(i in 1:10){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Using Base R

  • As an exercise, try writing your own loop
  • What happens if you use something other than i?
  • What happens if you use something other than 1:10?
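
A sketch answering both questions, with made-up values. The loop variable can have any name, and the loop can run over any vector, not just 1:10.

```r
# the loop variable does not have to be called i,
# and the values do not have to be numbers
for(word in c("data", "social", "science")){
  print(word)
}
## [1] "data"
## [1] "social"
## [1] "science"

# loops can also build up a result step by step
total <- 0
for(n in seq(2, 10, by = 2)){   # 2, 4, 6, 8, 10
  total <- total + n
}
total
## [1] 30
```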

Using Base R

# vector with 1-10
ten <- c(1:10)

# try adding one
ten + 1
##  [1]  2  3  4  5  6  7  8  9 10 11

Using Base R

# vector with 1-10
ten <- c(1:10)

sapply(ten, function(x) x + 1)
##  [1]  2  3  4  5  6  7  8  9 10 11
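
The last two slides reach the same result in two ways. As a sketch, the comparison below adds a third version written as a for loop and checks that all three agree. The names vectorized, applied, and looped are made up for this example.

```r
ten <- 1:10

# 1. vectorized: R adds one to every element at once
vectorized <- ten + 1

# 2. sapply: apply a small function to each element
applied <- sapply(ten, function(x) x + 1)

# 3. for loop: fill in an empty vector one position at a time
looped <- numeric(length(ten))
for(i in seq_along(ten)){
  looped[i] <- ten[i] + 1
}

# all three approaches give the same values
all(vectorized == applied, applied == looped)
## [1] TRUE
```

For simple arithmetic like this, the vectorized form is usually the shortest to write and read.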

Using Base R

  • A note on coding: we are speaking a language! Communication is key
  • Meaning: use lots of comments (#’s) to explain what your code does

A Note on our Role as Data Scientists

Data Science by Hand?

  • Data science can be done with small data, and limited resources
  • What is important is that the analyst thinks carefully about decisions and message

John Snow and the 1854 Cholera Outbreak

Napoleon’s March on Moscow

Du Bois on McIntosh County

Du Bois on Black Wealth

Getting Started with R

  • Could we do the following by hand?

Problem Set Questions