2023-06-22

General Class Structure

  • Tiny Desk & Activity (10 min)
  • New material + discussions (50 min)
  • Break (5 min)
  • Additional material + discussions (25 min?)
  • Problem set / final project questions (remainder)

Today’s Class

  • Intros
  • Groups
  • Definitions
  • Syllabus
  • Thinking like data scientists
  • Activity
  • Intro to R, RStudio, Python, Anaconda, Jupyter Notebooks, and Julia (if time)

Introductions

  • Name
  • Program
  • One reason you are taking this class
  • Favorite animal (or plant)

Week 1 Groups!

print.data.frame(groups)
##                 group 1                 group 2                   group 3
## 1 Widodo, Ignazio Marco Alsayegh, Aisha E H M I Saccone, Alexander Connor
## 2         Cai, Qingyuan             Tian, Zerui              Shah, Jainam
## 3  Dotson, Bianca Ciara      Jun, Ernest Ng Wei                          
## 4   Albertini, Federico           Lim, Fang Jan  Huynh Le Hue Tam, Vivian
##                    group 4                group 5                 group 6
## 1 Crawford, John Alexander           Gupta, Umang Spindler, Laine Addison
## 2                Su, Barry Cortez, Hugo Alexander           Ning, Zhi Yan
## 3      Premkrishna, Shrish        Tan, Zheng Yang            Ng, Michelle
## 4                                 Knutson, Blue C   Leong, Wen Hou Lester
##               group 7
## 1    Wan Rosli, Nadia
## 2                    
## 3 Andrew Yu Ming Xin,
## 4     Gnanam, Akash Y

What is data science?

  • In groups, discuss

What is climate change?

  • In groups, discuss

Climate change

Climate change or climate crisis?

Syllabus

Analytics for a Changing Climate: Intro to Social Data Science

Source: Bing AI art generator

Data Science by Hand?

  • Data science can be done with small data and limited resources
  • What matters is that the analyst thinks carefully about their decisions and their message

John Snow and the 1854 Cholera Outbreak

Napoleon’s March on Moscow

Du Bois on McIntosh County

Du Bois on Black Wealth

Data Activity

  • Task: communicate a message about the data using a visual approach of your choosing
  • Be creative! Think about your message

Getting Started with R

  • Could we do the following by hand?

Introducing R

  • Developed by statisticians
  • Lots of resources
  • Flexible
  • Amazing graphics
  • Open source, free
  • Syntax challenges from different packages

Introducing RStudio

  • Environment for working in R

Introducing Python

  • Developed by computer scientists
  • Lots of resources
  • Flexible and more popular than R
  • Open source, free
  • Similar to R, relies on various packages

Introducing Anaconda

  • Environment for launching multiple programs, including Jupyter Notebooks

Introducing Jupyter Notebooks

  • Flexible and intuitive way to integrate code and text
  • Similar to R Markdown in function

Introducing R Markdown

  • Creates nice PDF and HTML documents, and more
  • Recommended way to turn in assignments for this class (generate PDF or HTML files)

Introducing Julia

  • Quickly becoming popular in data science! Why?

Returning to the Data Activity

  • Same groups!
  • Wrap up in 5 min, share out for 5 min
  • Task: communicate a message about the data using a visual approach of your choosing
  • Be creative! Think about your message

Class Plan

  • Data activity (10 min)
  • Coding! Coding! Coding! (50 min)
  • Break (5 min)
  • Readings & Climate Discussion (25 min)
  • Introduce Problem Set (Remainder)

Introducing R

  • Developed by statisticians
  • Lots of resources
  • Flexible
  • Amazing graphics
  • Open source, free
  • Syntax challenges from different packages

Introducing RStudio

  • Environment for working in R

Starting an R Script

#####################################################
## title: a new R script!
## author: you!
## purpose: to try out R
## date: today's date
#####################################################

# you can start coding below

Using Base R

# what does R do with numbers?
2
## [1] 2
# what if we try adding or multiplying?
4+3
## [1] 7
53*2
## [1] 106

Using Base R

# what if we put all the previous results together?
c(2, 3, 4+3, 53*2, 90*4/5)
## [1]   2   3   7 106  72

(Some) types of data in R

  • Numeric
  • Character
  • Logical
  • Integer

Explore on your own

  • Pick a number! Try converting this number to different types of data (as.integer, as.numeric, as.character, as.logical)
  • Try using comparison operators (<, >, ==, !=, >=, <=) with different character and numeric values
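
One way this exploration might look is sketched below. The number 3.7 and the label x are arbitrary choices for illustration and are not part of the exercise.

```r
# pick a number and convert it to different types
x <- 3.7

as.integer(x)      # drops the decimal part: 3
as.numeric("3.7")  # text that looks like a number becomes numeric: 3.7
as.character(x)    # the number becomes text: "3.7"
as.logical(x)      # any nonzero number becomes TRUE
as.logical(0)      # zero becomes FALSE

# class() reports the type of a value
class(x)       # "numeric"
class(3L)      # "integer" (the L suffix marks an integer)
class("3.7")   # "character"
class(TRUE)    # "logical"

# comparison operators return logical values
x > 2          # TRUE
x == 3.7       # TRUE
x != 4         # TRUE

# character values are compared as text (ordering depends on your locale)
"apple" < "banana"

# mixing types: the number is converted to text before comparing
"10" < 9
```

The last line is worth a second look: R compares the text "10" with the text "9" rather than the numbers 10 and 9.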

Using Base R

# what if we arrange the same values in a matrix with 5 rows?
matrix(c(2, 3, 4+3, 53*2, 90*4/5),
       nrow = 5)
##      [,1]
## [1,]    2
## [2,]    3
## [3,]    7
## [4,]  106
## [5,]   72

A Note on ?’s

  • The ? sends you to the help page of whatever function follows it
  • Try it! Type ?matrix and then hit Ctrl + Enter

Using Base R

  • Try creating your own matrix!
  • Use the matrix() function
  • Add numbers by putting c(your numbers here) inside
  • Can you create a \(3 \times 3\) matrix?
  • Note: ? will give you more info on how a function works (e.g., ?matrix)
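
One possible answer to the matrix exercise is sketched below. The labels m and m_by_row and the values 1 through 9 are arbitrary choices for illustration.

```r
# nine values arranged in 3 rows and 3 columns
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9),
            nrow = 3,
            ncol = 3)
m

# check the dimensions: rows, then columns
dim(m)

# by default matrix() fills column by column;
# byrow = TRUE fills row by row instead
m_by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
m_by_row
```

Comparing m and m_by_row shows how the fill order changes where each number lands.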

Using Base R

list(c(2, 3, 4+3),
     53*2,
     90*4/5)
## [[1]]
## [1] 2 3 7
## 
## [[2]]
## [1] 106
## 
## [[3]]
## [1] 72

(Some) Structures of data in R

  • Vectors
  • Matrices
  • Lists
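
A minimal sketch contrasting the three structures follows. The labels v, m, l and mixed are made up for illustration.

```r
# vector: one dimension, every element has the same type
v <- c(2, 3, 7)

# matrix: rows and columns, every element has the same type
m <- matrix(1:6, nrow = 2)

# list: elements can differ in type and length
l <- list(v, "penguin", TRUE)

# pull out single elements
v[2]      # second element of the vector
m[2, 3]   # row 2, column 3 of the matrix
l[[2]]    # second element of the list

# mixing types in a vector converts everything to one type
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
```

Note the double brackets for lists: l[[2]] returns the element itself, while l[2] returns a shorter list containing it.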

Using Base R

  • An important part of coding is assigning values to labels
  • We can assign vectors, matrices, lists, and more
  • This is helpful for storing them and using them later
# assign a vector of numbers to the label "penguin"
penguin <- c(2, 3, 4+3, 53*2, 90*4/5)

# now take a look at the result
penguin
## [1]   2   3   7 106  72

Using Base R

  • Something else we can do is apply functions to values
# let's have R sum the numbers
sum(c(2, 3, 4+3, 53*2, 90*4/5))
## [1] 190

Using Base R

  • Take a minute to
  1. Assign a series of numbers to a label
  2. Sum this label
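
One possible answer, using an arbitrary label (walrus) and arbitrary numbers chosen for illustration:

```r
# 1. assign a series of numbers to a label
walrus <- c(10, 20, 30.5)

# 2. sum this label
walrus_total <- sum(walrus)
walrus_total

# other functions work the same way
mean(walrus)
length(walrus)
```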

Using Base R

  • We can also write our own functions!
# we'll call our function 'addition'
addition <- function(x, y){
  return(x+y)
}

Using Base R

addition(2, 2)
## [1] 4

Using Base R

  • As an exercise, try writing your own function
  • Start with just one argument (function(x))
  • Then try adding more args (function(x,y))
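
A sketch of what this exercise might produce is below. The function names f_to_c and temp_diff are hypothetical examples, not functions from the course materials.

```r
# one argument: convert a temperature from Fahrenheit to Celsius
f_to_c <- function(f) {
  return((f - 32) * 5 / 9)
}

f_to_c(212)          # boiling point of water in Celsius
f_to_c(c(32, 212))   # functions also work on whole vectors

# two arguments: the difference between two temperatures
temp_diff <- function(later, earlier) {
  return(later - earlier)
}

temp_diff(51, 52.5)
```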

Using Base R

  • Loops iterate the same procedure through a series of values
# try printing each of the first 10 numbers

for(i in 1:10){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Using Base R

  • As an exercise, try writing your own loop
  • What happens if you use something other than i?
  • What happens if you use something other than 1:10?
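
The sketch below tries both variations. The loop variable names (year, value, k) and the values are arbitrary choices for illustration.

```r
# the loop variable can have any name, and the values
# do not have to be 1:10
for (year in c(1875, 1880, 1884)) {
  print(paste("Year:", year))
}

# loops can also build up a result step by step
total <- 0
for (value in c(2, 3, 7)) {
  total <- total + value
}
total

# seq_along() gives the positions 1, 2, 3, ... of a vector
animals <- c("penguin", "walrus", "otter")
for (k in seq_along(animals)) {
  print(paste(k, animals[k]))
}
```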

Using Base R

# vector with 1-10
ten <- c(1:10)

# try adding one
ten + 1
##  [1]  2  3  4  5  6  7  8  9 10 11

Using Base R

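# vector with 1-10
ten <- c(1:10)

# sapply() applies a function to each element of the vector;
# here it gives the same result as ten + 1
sapply(ten, function(x) x + 1)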
##  [1]  2  3  4  5  6  7  8  9 10 11
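
The sketch below puts three ways of adding one to each element side by side and checks that they agree. The labels vectorized, with_sapply and with_loop are made up for illustration.

```r
ten <- 1:10

# 1. vectorized arithmetic: R adds 1 to every element at once
vectorized <- ten + 1

# 2. sapply(): apply a function to each element
with_sapply <- sapply(ten, function(x) x + 1)

# 3. a for loop: fill in an empty vector one position at a time
with_loop <- numeric(length(ten))
for (i in seq_along(ten)) {
  with_loop[i] <- ten[i] + 1
}

# all three approaches give the same answer
identical(vectorized, with_sapply)
identical(vectorized, with_loop)
```

For simple arithmetic like this, the vectorized version is usually the shortest to write and read.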

Using Base R

  • A note on coding: we are speaking a language! Communication is key
  • Meaning: use lots of comments (lines starting with #)

A Climate Example

  • Let’s start to look at what R can do with data
  • But first, we need to understand packages
  • Most packages we will use are available on CRAN
# first, we want to install the readr package
# note that I have commented out the following line,
# but you should uncomment it the first time you run this code
# install.packages("readr")


# read as csv
library(readr)
temps <- read_csv("G:/My Drive/Data_Disasters/Course_site/Data/temps.csv")
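
An alternative to commenting and uncommenting the install line is to install a package only when it is missing. The sketch below assumes a hypothetical helper named ensure_package and a hypothetical relative file path. Neither comes from the course materials.

```r
# hypothetical helper: install a package only if it is not already installed
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # report whether the package is now available
  return(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is installed here
ensure_package("stats")

# for this class you could run:
# ensure_package("readr")

# before reading a file, it can help to check that the path exists
# (this relative path is an assumption; use wherever you saved temps.csv)
temps_path <- file.path("Data", "temps.csv")
file.exists(temps_path)
```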

A Climate Example

  • How can we look at our data?
## # A tibble: 10 × 2
##    Year  `Annual Average Temperature (F)`
##    <chr>                            <dbl>
##  1 1875                              52.5
##  2 1876                              51.5
##  3 1877                              52  
##  4 1878                              52.5
##  5 1879                              52.7
##  6 1880                              48.6
##  7 1881                              52.3
##  8 1882                              49.6
##  9 1883                              50.8
## 10 1884                              51
  • Also, the View() function
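
Some common ways to look at a data set are sketched below. So that the code runs without the csv file, it builds a small stand-in for temps from the ten rows printed above. With the real file, you would use the temps object created by read_csv() instead.

```r
# stand-in for temps, built from the ten rows shown on this slide
temps <- data.frame(
  Year = as.character(1875:1884),
  `Annual Average Temperature (F)` = c(52.5, 51.5, 52, 52.5, 52.7,
                                       48.6, 52.3, 49.6, 50.8, 51),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# first rows of the data
head(temps, 10)

# number of rows and columns, and the column names
dim(temps)
names(temps)

# structure: the type of each column
str(temps)

# quick summary of each column
summary(temps)

# in RStudio, View() opens the data in a spreadsheet-style viewer
# View(temps)
```

Note that Year is stored as character rather than numeric, which matters as soon as we want to sort or plot by year.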

A Climate Example

  • What can we do with these data?
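
As a starting point, the sketch below summarizes and plots the ten years printed earlier. It rebuilds a small stand-in for temps from those rows so that it runs on its own. The label temp_f and the axis labels are choices made for illustration.

```r
# stand-in for temps, built from the ten rows printed earlier
temps <- data.frame(
  Year = as.character(1875:1884),
  `Annual Average Temperature (F)` = c(52.5, 51.5, 52, 52.5, 52.7,
                                       48.6, 52.3, 49.6, 50.8, 51),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Year was read as character, so convert it to a number first
temps$Year <- as.integer(temps$Year)

# pull out the temperature column (the name has spaces, so use [[ ]])
temp_f <- temps[["Annual Average Temperature (F)"]]

# simple summaries
avg_temp <- mean(temp_f)
avg_temp
range(temp_f)

# which year was warmest in this small sample?
warmest_year <- temps$Year[which.max(temp_f)]
warmest_year

# a basic line plot of temperature over time
plot(temps$Year, temp_f,
     type = "l",
     xlab = "Year",
     ylab = "Annual average temperature (F)")
```

Ten years is far too short to say anything about climate. The same steps apply to the full data set.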

Readings

  • In small groups, discuss:
  • What is Environmental Justice?
  • What does EJ mean in the context of a changing climate?
  • How should social scientists consider these frameworks?

A Climate Framework

  • Holocene: ~10,000 years of stratigraphic stability prior to now
  • Anthropocene: the current period, in which humans are primarily driving planetary changes
  • “a time interval marked by rapid but profound and far-reaching change to the Earth’s geology, currently driven by various forms of human impact.” (Zalasiewicz et al., 2017)

A Climate Framework

  • What are Planetary Boundaries?
  • “thresholds which, if crossed, could generate unacceptable environmental change” (Rockström et al., 2009).

A Climate Framework

  • But what about justice?

“the initial environmental justice spark sprang from a Warren County, North Carolina, protest. In 1982, a small, predominately African-American community was designated to host a hazardous waste landfill. This landfill would accept PCB-contaminated soil that resulted from illegal dumping of toxic waste along roadways. After removing the contaminated soil, the state of North Carolina considered a number of potential sites to host the landfill, but ultimately settled on this small African-American community.”

A Climate Framework

  • What is Environmental Justice?
  • “equitable exposure to environmental good and harm” (Wolch et al., 2014).
  • Very prevalent! (White House, Patagonia, etc.)
  • Not so recent (1992: Bush Sr. created an Environmental Equity Working Group in response to the EJ movement)

The EJ Atlas

EJ and Climate Change

  • Scientists have updated the planetary boundaries framework to include “just” boundaries
  • An example: “At 1.0°C global warming, tens of millions of people were exposed to wet bulb temperature extremes (Fig. 2), raising concerns of inter- and intragenerational justice. At 1.5°C warming, more than 200 million people, disproportionately those already vulnerable, poor and marginalized (intragenerational injustice), could be exposed to unprecedented mean annual temperatures, and more than 500 million could be exposed to long-term sea-level rise” (Rockström et al., 2023).

A Note on our Role as Data Scientists

Problem Set Questions