Class 5

Elisabeth Gawthrop

Today

  • Check-ins: name, how you’re doing, a win for this week

  • Announcements

  • Quiz

  • Go through quiz answers and discuss

  • Review homework

Break

  • Finding + joining data

Break

  • Group time to discuss story ideas so far

  • Let’s look for some data

  • Check out

Announcements

  • Grading

  • Extra credit opportunity: 10 points if you attend at least one of the four sessions of the Arnolt Center’s AI symposium on Thursday. I will be there, so come say hi and I will give you the points. I especially encourage you to attend the 4:15–5:30 p.m. session: Work smarter: Leveraging AI for investigative journalism.

  • Please upload the CSVs you imported in your notebook to the Notebook_04 Canvas assignment, so that I can run the notebook with your real data. You won’t be penalized for the ‘late’ submission.

Class 5 Quiz

  1. Which function did you use in your homework along with group_by() and mathematical functions (such as sum()) to calculate a statistic for various groupings of data within a dataframe? (1 pt)

    a: calculate()
    b: summarise()
    c: stats()


  2. In R, with boolean data, also known as logical data, FALSE is equivalent to which number? (1 pt)

    a: -1
    b: 0
    c: 1

  3. In “What data can’t do”, how does Hannah Fry repeatedly refer to the complexities of data and math? (1 pt)

    a: It’s counting things
    b: It’s writing equations
    c: It’s using a supercomputer

  4. True or false: In addition to other causes, sea level rise is sometimes caused in part by sinking land masses. (1 pt)

Quiz cont’d

  5. Below is a quote from “Our very strange search for ‘sea level’” by Brooke Jarvis. Do you think all data is a social and historical construct? Why or why not? (3 pts)

    Like the metre, the minute, or the meridian that runs through Greenwich, England, "sea level" is best thought of as a social and historical construct, the result of an inherently arbitrary decision taken by generations of people doing their best to make sense of a strange and chaotic world.

  6. Did the two articles make you think about data in a different way? How so? Or if not, how did they reinforce what you already thought? (3 pts)

Review and discussion

  • Reading reactions
  • Homework review

Break

Finding data

  • Government - local, state, federal
    • Open data portals
    • If there’s a form, there’s a dataset
    • Records requests
  • Non-profit organizations
  • Media-centered orgs like Big Local News
  • For-profit companies sometimes offer some data that’s free and some that’s paid
  • See what data is cited by academic or other research papers
  • Get creative - scraping websites, extracting data from PDFs, collecting it yourself

Census

Joining data

  • By default, a join function joins on identical columns - meaning columns whose name is the same in both datasets

  • Or you can use the by argument to tell it that tmax in the x dataframe should be matched to T.MAX in the y dataframe

    • For now, I would suggest changing the column names for the ones you want to match up before you join
  • If your matching columns aren’t truly identical (different values within the column, or the same value repeated), the join will do unexpected things, like adding rows or columns or filling in NA
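
To make the points above concrete, here is a minimal sketch in R. The dataframes, city names, and values are made up for illustration; only the tmax and T.MAX column names come from the example above. It shows the two ways to match differently named columns, and what can happen when the values in those columns don't line up.

```r
library(dplyr)

# Made-up data for illustration only.
x <- data.frame(city = c("A", "B", "C"), tmax = c(30, 35, 40))
y <- data.frame(T.MAX = c(30, 35), label = c("cool", "warm"))

# Option 1: tell the join that tmax in x corresponds to T.MAX in y.
joined_by <- left_join(x, y, by = c("tmax" = "T.MAX"))

# Option 2: rename first, then let the join find the shared column name.
# With no `by`, dplyr prints a message saying which column it joined on.
y_renamed <- rename(y, tmax = T.MAX)
joined_renamed <- left_join(x, y_renamed)

# Values that don't match: city C has tmax 40, which never appears in y,
# so its label is filled in with NA.
joined_by

# Repeated values: 30 appears twice in y_dup, so city A's row is duplicated
# and the result has more rows than x.
y_dup <- data.frame(tmax = c(30, 30, 35), label = c("cool", "chilly", "warm"))
joined_dup <- left_join(x, y_dup, by = "tmax")
nrow(x)           # 3
nrow(joined_dup)  # 4
```

Comparing nrow() before and after a join is a quick way to catch this kind of surprise.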

Types of joins

Left, right, inner, full - all are dplyr functions: left_join(), right_join(), inner_join(), full_join()

Find it in help under “mutate-joins”

Image here + read another person explain joins: https://medium.com/@imanjokko/data-analysis-in-r-series-vi-joining-data-using-dplyr-fc0a83f0f064
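
As a rough sketch of how the four joins differ, the example below runs each one on the same pair of small, made-up dataframes (the id, name, and score columns are hypothetical). The only thing that changes between them is which rows are kept.

```r
library(dplyr)

# Made-up data: ids 2 and 3 appear in both dataframes,
# id 1 only in x, id 4 only in y.
x <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))
y <- data.frame(id = c(2, 3, 4), score = c(10, 20, 30))

left_result  <- left_join(x, y, by = "id")   # every row of x; NA score for id 1
right_result <- right_join(x, y, by = "id")  # every row of y; NA name for id 4
inner_result <- inner_join(x, y, by = "id")  # only ids found in both: 2 and 3
full_result  <- full_join(x, y, by = "id")   # every id from either: 1, 2, 3, 4

nrow(left_result)   # 3
nrow(right_result)  # 3
nrow(inner_result)  # 2
nrow(full_result)   # 4
```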

Join examples

Head over to class5_notebook.Rmd

Break

Story ideas so far

Check in with your group

Coding lab

Let’s find and read in some data

Tasks before next week

  • Reading to be announced - will send an email via Canvas

  • Coding notebook: Due Feb 17 @ 5PM (on Canvas)

  • Story-based checkpoint:

    • List five potential datasets you might use in your story (in Potential Data Sources section)

    • Choose two of those datasets to use in the Selected Data Sources section, and complete the fields for those two datasets (link, methodology, who gathered the data, etc).

Checkout

  1. Do you feel behind, on-track, or ahead?
  2. About how long did the reading take you?
  3. About how long did the coding notebook take you?
  4. About how long did the story-based assignment take you (including research time)?
  5. Any other feedback, notes on how you’re feeling, etc?