2017-05-17

Introduction

Outline of Workshop

Our Textbook

ModernDive and Our Syllabi

  • ModernDive's guiding philosophies are deeply intertwined with our syllabi
  • Hard to speak of ModernDive in isolation from syllabi and vice-versa
  • Numbers are numbers, but data has context…

Context & Background

Albert's Course

  • Introduction to Statistical & Data Sciences: Webpage and GitHub Repo
  • Administrative:
    • Chief non-econ/bio stats service class at Middlebury
    • 12 weeks, each with 3h "lecture" + 1h "lab"
  • Students:
    • ~24 students/section of all years/backgrounds. Only stats class many will take
    • Background: Many had AP stats, some with programming
    • All had laptops that they brought every day

Albert's Syllabus

  • Topic List
    • First half is data science: data visualization, manipulation, importing
    • Second half is intro stats: sampling, hypothesis tests, CI, regression
  • Evaluation
    • 10%: weekly problem sets
    • 10%: engagement
    • 45%: 3 midterms (last during finals week)
    • 35%: Final projects

Albert's Typical Classtime

  • First 10-15min: Priming topic, either via slides or chalk talk
  • Remainder: Students read over text & do Learning Checks in groups and without direct instructor guidance.

Chester: Social Statistics

What is Different?

What are we doing that's different and why?

  1. Data first! Start with data science via tidyverse, then stats.
  2. Replacing the mathematical/analytic with computational/simulation-based whenever possible.
  3. The above necessitates algorithmic thinking, computational logic and some coding/programming.
  4. Complete reproducibility

1) Data First!

Actual dialogue I had with a student:

1) Data First!

Cobb (TAS 2015): Minimizing prerequisites to research. In other words, focus on the entirety of Wickham/Grolemund's pipeline…

1) Data First!

… and not just this part.

1) Data First!

Furthermore, use the data science tools that a data scientist would use. Example: tidyverse

1) Data First!

What does this buy us?

  • Context for asking scientific questions

2) Math vs Computers

Cobb (TAS 2015): Two possible "computational engines" for statistics, in particular relating to sampling:

  • Mathematics: formulas, probability theory, large-sample approximations, central limit theorem
  • Computers: simulations, resampling methods
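
A minimal sketch of the two engines side by side, estimating the standard error of a sample mean once with the textbook formula and once by resampling. It is written in Python purely for illustration (the course itself works in R); the data are simulated, and the names `formula_se` and `bootstrap_se` are hypothetical helpers, not part of ModernDive.

```python
import random
import statistics


def formula_se(sample):
    """'Mathematics' engine: standard error of the mean as s / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


def bootstrap_se(sample, n_resamples=2000, seed=0):
    """'Computers' engine: standard error of the mean via the bootstrap.

    Resample the data with replacement many times, compute the mean of
    each resample, and take the standard deviation of those means.
    """
    rng = random.Random(seed)
    n = len(sample)
    means = [
        statistics.fmean(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(means)


# Hypothetical data: 50 simulated draws, not taken from any course dataset.
data_rng = random.Random(42)
sample = [data_rng.gauss(10, 2) for _ in range(50)]

print(f"formula SE:   {formula_se(sample):.3f}")
print(f"bootstrap SE: {bootstrap_se(sample):.3f}")
```

The two estimates should land close to each other, which is the point of offering students the choice: the resampling route reaches a similar answer without the probability theory behind the formula.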

2) Math vs Computers

We present students with a choice for our "engine":

Either we use this… [Drawing]
Or we use this… [Drawing]


  • Almost all are thrilled to do the latter
  • Leave "bread crumbs" for more advanced math/stats courses

3) Algorithms, Computation, & Coding

4) Complete Reproducibility

Let's Dive In!

Insert appropriate image

Getting Started

R

  • Chester's Book: Ask how he wants it pitched
  • DataCamp: Intro to R

RStudio vs RStudio Server