Lecture 1 - Course Introduction and Types of Data

Penelope Pooler Eisenbies

Housekeeping

  • Introduction 👋

  • Syllabus 📃

  • Navigating Slides ⬇️

  • Today’s plan 📋

    • Navigating R and RStudio 🪄

    • General Introduction to Statistics and Analytics 📈

    • Types of Data 🧮

A little about me… 👋


I grew up here and went to SU and

  • then I …

    • worked in Scotland, Slovakia, Lithuania, Chile.
    • traveled all over…
    • went to graduate school in Oregon and Virginia.
    • worked in federal gov’t and private sector.
  • Now I …

    • still do consulting and some research in analytics.
    • also helped create the Business Analytics major here at Whitman.

Using R and RStudio 🪄

  • You have two options to facilitate your introduction to R and RStudio:

    • Option 1: Download and Install R and RStudio on your laptop.
    • Option 2: Start with a free Posit Cloud account and then transition to Option 1.


  • If you are comfortable with coding: Start with Option 1, but still sign up for a Posit Cloud account.
  • If you are nervous about coding: Choose Option 2.
  • For both options: I can help with download/install issues during office hours.


  • What I do: I maintain a Posit Cloud account for helping students, but I do most of my work on my laptop.

What is Statistics, Analytics, etc.? 📈

  • Statistics (the discipline) allows us to answer questions about (almost) anything we want to know that we can collect data for.
  • We start with a POPULATION that we have a question about.

  • We select a subset from that population, called a SAMPLE.

  • We collect data from the SAMPLE and summarize it, to get an ESTIMATE.

  • That ESTIMATE answers our question about our POPULATION.
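
A minimal R sketch of the POPULATION, SAMPLE, ESTIMATE chain above. Everything here is simulated: the population size, the 35% rate, the sample size of 200, and all object names are made-up assumptions for illustration, not real data.

```r
# Hypothetical POPULATION: 10,000 people; 1 = has a Bachelor's degree, 0 = does not.
# The 35% rate is an assumption chosen only for this illustration.
set.seed(261)  # arbitrary seed so the sketch is reproducible
population <- rbinom(n = 10000, size = 1, prob = 0.35)

# SAMPLE: randomly select 200 people from the POPULATION (without replacement).
sample_data <- sample(population, size = 200)

# ESTIMATE: the proportion in the SAMPLE, used to answer our question
# about the POPULATION.
estimate <- mean(sample_data)

# In practice we never see the whole population; the simulation lets us compare.
population_value <- mean(population)

cat("Estimate from the sample:", estimate, "\n")
cat("Value in the full population:", population_value, "\n")
```

Rerunning with a different seed gives a slightly different ESTIMATE each time, which is why the way the SAMPLE is selected matters.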

Statistics, Analytics, Data Science…huh

  • Statistics is a discipline (what we study) and a statistic is an estimate based on data.

    • This dual meaning confuses people.

    • A newer term to describe the discipline of organizing, analyzing, and presenting data is Analytics.

  • How do these terms, Statistics and Analytics, differ?

    • It depends on who you are talking to.
    • Analytics (like the new Business Analytics major at Whitman) is the more modern term.
    • Another (overlapping) term is Data Science, which is similar but more encompassing.
  • Why so many terms? Good Question!

  • My Opinion: People are unsure about this field and its methods, which leads to confusion.

Where do YOU fit into Statistics and Analytics?

  • Data, statistics, and analytics are essential to everyday life.

    • This is especially true for management and business professionals.
    • Understanding the pandemic, politics, sports, weather, and investments requires data skills.
  • Where you (students) are needed:

    • You are needed to understand data and communicate statistical information correctly and honestly to your peers and the world at large.
  • Before you ask…

    • Yes, you could google "How do I communicate statistics?"
    • Yes, you could type a related question into ChatGPT.
    • This class will provide tools and information that those searches don’t provide.
    • More to come on how to use ChatGPT effectively and ethically.

What percent of people in each county in the United States have a Bachelor’s degree?

  • The POPULATION is all people in the USA.

  • Within that population, we have a SUBPOPULATION for each county.

  • To answer this question, should we talk to EVERY person in every county?

    • NO! Instead, the American Community Survey uses an established sample design to obtain representative data from each county.

    • The SAMPLE is the group of people SELECTED in each county who complete the survey.

    • The ESTIMATES from each county’s SAMPLE represent that county’s POPULATION.
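
The idea of one ESTIMATE per county SAMPLE can be sketched in R as follows. This is only an illustration under stated assumptions: the three county names, the sample size of 50 per county, and the degree rates are invented, and none of this is American Community Survey data or its actual sample design.

```r
# Hypothetical survey responses for three made-up counties (NOT real ACS data).
set.seed(1)  # arbitrary seed so the sketch is reproducible
survey <- data.frame(
  county    = rep(c("County A", "County B", "County C"), each = 50),
  # 1 = respondent has a Bachelor's degree; the rates below are assumptions.
  bachelors = rbinom(150, size = 1, prob = rep(c(0.25, 0.40, 0.55), each = 50))
)

# One ESTIMATE per county: percent of sampled people with a Bachelor's degree.
county_estimates <- aggregate(
  bachelors ~ county,
  data = survey,
  FUN  = function(x) 100 * mean(x)
)
names(county_estimates)[2] <- "pct_bachelors"

print(county_estimates)
```

Each row summarizes that county's SAMPLE and stands in for that county's SUBPOPULATION.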


Lecture 1 In-class Exercises - Q1

Session ID: mas261f23


You are writing an article about freshman students at SU to find out how far they are from home and how long it took them to travel to campus.

You randomly select 100 freshman students and ask them some questions to collect your data.


The POPULATION of interest is


A. All students at SU

B. All freshman students at SU

C. All freshman students in the USA

D. The 100 students you selected