USCOTS 2017: A Fully Customizable Textbook for Intro Stats/Data Science

2017-05-17

Introduction

Outline of Workshop

Our Textbook

An Introduction to Statistical and Data Sciences via R
Webpage: http://moderndive.com. GitHub Repo

ModernDive and Our Syllabi

ModernDive's guiding philosophies are deeply intertwined with our syllabi
Hard to speak of ModernDive in isolation from syllabi and vice versa
Numbers are numbers, but data has context…

Context & Background

Albert's Course

Introduction to Statistical & Data Sciences: Webpage and GitHub Repo
Administrative:
- Chief non-econ/bio stats service class at Middlebury
- 12 weeks each with 3h "lecture" + 1h "lab"
Students:
- ~24 students/section of all years/backgrounds. Only stats class many will take
- Background: Many had AP stats, some with programming
- All had laptops that they brought every day

Albert's Syllabus

Topic List
- First half is data science: data visualization, manipulation, importing
- Second half is intro stats: sampling, hypothesis tests, CI, regression
Evaluation
- 10%: weekly problem sets
- 10%: engagement
- 45%: 3 midterms (last during finals week)
- 35%: Final projects

Albert's Typical Classtime

First 10-15min: Priming topic, either via slides or chalk talk
Remainder: Students read over text & do Learning Checks in groups and without direct instructor guidance.

Chester: Social Statistics

What is Different?

What are we doing that's different and why?

Data first! Start with data science via tidyverse, then stats.
Replacing the mathematical/analytic with computational/simulation-based whenever possible.
The above necessitates algorithmic thinking, computational logic and some coding/programming.
Complete reproducibility

1) Data First!

Actual dialogue I had with a student:

1) Data First!

Cobb (TAS 2015): Minimizing prerequisites to research. In other words, focus on entirety of Wickham/Grolemund's pipeline…

1) Data First!

… and not just this part.

1) Data First!

Furthermore, use data science tools that a data scientist would use. Example: tidyverse
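To give a flavor of what "tools a data scientist would use" can look like, here is a minimal sketch of a tidyverse-style pipeline. It assumes the dplyr package is installed and uses R's built-in mtcars data purely as a stand-in; neither the dataset nor the variable name mpg_by_cyl comes from the workshop materials.

```r
# Illustrative only: mtcars is a stand-in dataset, not one used in the workshop.
library(dplyr)

# Group cars by number of cylinders, then summarize fuel economy per group.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    n_cars   = n(),        # number of cars in each group
    mean_mpg = mean(mpg)   # average miles per gallon in each group
  ) %>%
  arrange(desc(mean_mpg))  # most fuel-efficient group first

mpg_by_cyl
```

Each verb does one job, so the pipeline reads top to bottom as a sentence: take the data, group it, summarize it, sort it.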

1) Data First!

What does this buy us?

Context for asking scientific questions

2) Math vs Computers

Cobb (TAS 2015): Two possible "computational engines" for statistics, in particular relating to sampling:

Mathematics: formulas, probability theory, large-sample approximations, central limit theorem
Computers: simulations, resampling methods
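The two engines can be put side by side on one small problem. The sketch below is only an illustration under stated assumptions: the sample is simulated (made-up data, not from the course), and the bootstrap stands in for "resampling methods" in general. It estimates the standard error of a sample mean once with the formula and once by resampling, using base R only.

```r
# Hypothetical sample: 50 draws from a skewed population (made-up data).
set.seed(2017)
x <- rexp(50, rate = 1 / 10)

# "Mathematics" engine: standard error of the mean via the formula s / sqrt(n).
se_formula <- sd(x) / sqrt(length(x))

# "Computers" engine: bootstrap the mean by resampling x with replacement.
n_reps <- 5000
boot_means <- replicate(
  n_reps,
  mean(sample(x, size = length(x), replace = TRUE))
)
se_boot <- sd(boot_means)

# Compare the two standard error estimates.
c(formula = se_formula, bootstrap = se_boot)

# A 95% percentile interval for the mean, obtained without any formula.
quantile(boot_means, probs = c(0.025, 0.975))
```

The two standard errors should land close to each other, and the resampling version extends to statistics, such as the median, for which no simple formula is available.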

2) Math vs Computers

We present students with a choice for our "engine":

Either we use this…	Or we use this…

3) Algorithms, Computation, & Coding

4)

Let's Dive In!

Getting Started

R

RStudio vs RStudio Server