Mon Sep 12, 2016

Course Title

  • In catalog: Introduction to Statistical Sciences
  • New: Introduction to Statistical and Data Sciences

What is Data Science?

Data Science

  • Example domains: biology, economics, physics, sociology, etc.
  • So why the title switch?

Dialogue with Student

Course Objective #1

Have students engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices.

  • Cobb: Minimizing prerequisites to research
  • Not necessarily publishing in top journals, but answering scientific questions with data.
  • Difficult to do research without understanding stats, however

Data/Science Research Pipeline

We will, as best we can, perform all this:

Data/Science Research Pipeline

And not just this, as in many previous intro stats courses:

Course Objective #2

Foster a conceptual understanding of statistical topics and methods using simulation/resampling and real data whenever possible, rather than mathematical formulae.

  • Whenever we can, use real data
  • "Classroom data are like teddy bears and real data are like a grizzly bear with salmon blood dripping out its mouth."
  • Example data set: nycflights
  • There are two "engines" that can make statistics "work"
    • Mathematics: formulas, approximations, etc
    • Computers: simulations, random number generation

The "Engine" of Statistics

In this course, computers and not math will be the "engine". What does this mean?

  • Less of this:
    Drawing
  • But more of this:
    Drawing

Programming/Coding

  • Previous programming/coding experience is not a prerequisite to this course
  • This course is NOT a course on programming, coding, nor computer science
  • You will however be exposed to basic computational and algorithmic thinking

Course Objective #3

Blur the traditional lecture/lab dichotomy of introductory statistics courses by incorporating more computational and algorithmic thinking into the syllabus.

  • Completely separate lecture and labs is a legacy of a time before
    Drawing

RStudio Server

  • Not all laptops are created equal: operating system, processing power, age
  • RStudio Server: cloud-based version of RStudio where all processing is done on Middlebury servers
  • go/rstudio (on campus or via VPN)

Thursday Lab Time

Undirected time to

  • Set aside to help people get up to speed with computing
  • It is like learning a language.
  • Learn from each other.
  • Collaborate on problem sets (due on Fridays)

Course Objective #5

Develop statistical literacy by, among other ways, tying in the curriculum to current events, demonstrating the importance statistics plays in society.

  • H.G. Wells (paraphrased): "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
  • Me: "Sure, it's easy to lie with statistics. But it's also hard to tell the truth without them."

Data Visualization and EDA

  • EDA = Exploratory data analysis
  • "Data visualization is a gateway drug to statistics."
  • Prez from Season 4 of "The Wire":

Final Project

  • Final exam
  • Capstone experience to align this topics and principles of this course with how research and learning is done in practice.
  • Work on interpersonal and collaborative skills. No textbook on that!