Tuesday, July 22, 2014

Course Outline

  1. Data is everywhere!

  2. Collecting data

  3. Summarizing data

  4. Using data to make inferences

Why statistics?

Statistics is the art and science of learning from data. Statisticians combine mathematical principles with subject knowledge, using data to make decisions about the larger world around them.

  • Where does all this data come from?

  • In today's world, data is everywhere!

Activity 1: At your computers, load your Facebook profile into Wolfram Alpha. What data do you get? How is it represented?

Collecting data

  • The first step in any statistical investigation is defining a research question.
  • RQ: How many Nebraska female high school students plan to pursue a college degree in one of the STEM fields (science, technology, engineering, and mathematics)?
  • What would we need to know to answer this research question?

Collecting data

  • Has anyone studied this research question before?
  • Do a quick Google search for female enrollment in STEM majors. What information did you find?
  • How would you collect data?

Collecting data

A random sample is one in which every member of the population of interest has the same chance of being selected. Random samples tend to be representative of the entire population and free of selection bias.

  • For our research question, what's the population of interest?

  • How could we take a random sample?

  • Is our group of students a random sample?
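
As a minimal sketch of the definition above, the snippet below draws a simple random sample from a list of ID numbers. The sampling frame (`student_ids`), the sample size, and the seed are all hypothetical; a real study would first need an actual list of the population.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: ID numbers standing in for 1,000 students.
student_ids = list(range(1, 1001))
sample = simple_random_sample(student_ids, 25, seed=2014)
print(sorted(sample))
```

A group of students who happened to sign up for the same class would not be produced by a procedure like this, which is one way to think about the last question above.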

Collecting data

A random variable is a characteristic (may be categorical or numerical) that is recorded for subjects in our sample. Random variables change from subject to subject.

  • In your search, what random variables were measured?

  • What random variables would you measure if you designed your own study?

Collecting data

  • Let's collect some data to use a little later!

  • Complete the Google Docs form to submit your own data.

Summarizing data

  • Statisticians usually work with data in a table (Excel spreadsheet, CSV file, etc.).

  • Graphical and numerical summaries make the data more accessible and are easy to produce in most statistical software.

  • Bad graphics are everywhere!

Summarizing data

Example: What's misleading about the graph below, published by UNL?

Summarizing data

Example: What about this Fox News pie chart?

  • In fact, there's an entire Tumblr for bad graphs!

Summarizing data

A numerical summary of a data set uses a single number to describe or characterize a random variable.

Common numerical summaries include:

  • Mean
  • Median
  • Mode
  • Standard deviation
  • Range
  • Sample proportion

  • Can you think of any others?

  • Which are appropriate for numerical data? Which are appropriate for categorical data?
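
The summaries listed above can be sketched with Python's standard `statistics` module. The two small data sets are made up for illustration and are not the class data; the variable names are hypothetical as well.

```python
import statistics

# Hypothetical values for illustration only; not the class data.
hours_of_sleep = [6, 7, 7, 8, 5, 7, 9, 6]                           # numerical variable
plans_stem = ["yes", "no", "yes", "yes", "no", "no", "yes", "yes"]  # categorical variable

summaries = {
    "mean": statistics.mean(hours_of_sleep),
    "median": statistics.median(hours_of_sleep),
    "mode": statistics.mode(hours_of_sleep),
    "standard deviation": statistics.stdev(hours_of_sleep),  # sample SD (divides by n - 1)
    "range": max(hours_of_sleep) - min(hours_of_sleep),
}

# For a categorical variable, a sample proportion is the usual single-number summary.
sample_proportion = plans_stem.count("yes") / len(plans_stem)

for name, value in summaries.items():
    print(f"{name}: {value:.3f}")
print(f"sample proportion of 'yes': {sample_proportion:.3f}")
```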

Summarizing data

A graphical summary of a data set uses pictures to describe a random variable.

Common graphical summaries include:

  • Bar charts
  • Pie charts
  • Histograms
  • Dot plots
  • Scatterplots

  • Can you think of any others?

  • Which are appropriate for numerical data? Which are appropriate for categorical data?
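
As a rough sketch of what a bar chart does, the function below counts each category and draws one `#` per observation using plain text. The data and the function name `text_bar_chart` are hypothetical; in practice a tool such as StatCrunch or Excel would draw the chart.

```python
from collections import Counter

def text_bar_chart(values):
    """Return one line per category, with one '#' per observation."""
    counts = Counter(values)
    width = max(len(str(category)) for category in counts)
    return [
        f"{str(category):<{width}} | {'#' * count}"
        for category, count in sorted(counts.items())
    ]

# Hypothetical categorical data for illustration only.
favorite_subject = ["math", "art", "math", "science", "math", "science"]
for line in text_bar_chart(favorite_subject):
    print(line)
```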

Summarizing data

Activity 2: Follow the instructions to load our class data into StatCrunch. In teams of 3-4, choose a random variable and prepare a poster summarizing that variable.

Your poster should include:

  • Appropriate numerical summaries

  • Appropriate graphical summaries

  • A potential research question this data could answer

In about 20-25 minutes, one team member will present the poster to the group.

Using data to make inferences

Statistical inference is the process of using data collected from a sample to make informed decisions about the population from which the sample came.

  • Classical statistics relies on tools from mathematics and probability theory.

  • Modern statisticians have a wide range of tools available… like simulation!

Using data to make inferences

In a simulation, statisticians use random chance mechanisms (like rolling a die, drawing a card, or generating random numbers on a computer) to repeat the original experiment or to test a hypothesis about the "real world".

  • Our goal is to see whether data we've observed is consistent with a certain random scenario.

Using data to make inferences

Helper or Hinderer?

A 2007 issue of Nature reported a study investigating whether infants take into account an individual's actions towards others when evaluating that individual as appealing or aversive.

  • One study component used 16 six-month-old infants as subjects.

  • Infants were shown a "climber" character that couldn't make it up a hill in two tries.

  • They were then shown two scenarios for the next try:

  1. The "climber" is "helped" up the hill by a blue square character.
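  2. The "climber" is "hindered" (pushed back down the hill) by a yellow triangle character.

  • Each infant was then offered both characters as toys, and the researchers recorded which one the infant chose.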

Using data to make inferences

Activity 3: In your teams, simulate the "helper-hinderer" experiment.

  • Record how many times the "helper" character was chosen.

  • Assuming that the babies were choosing a character by random chance, and had no attraction to one character or the other, how many times would you expect to see the "helper" chosen?
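
One way to sketch a single run of this activity on a computer is to let each infant's choice be a fair coin flip. The function name, its parameters, and the seed are hypothetical; coins or dice in class do the same job.

```python
import random

def simulate_experiment(n_infants=16, p_helper=0.5, rng=random):
    """Count how many of n_infants pick the helper when each choice is a fair coin flip."""
    return sum(rng.random() < p_helper for _ in range(n_infants))

# Under pure chance, the long-run average number of helper choices is n * p.
expected_helpers = 16 * 0.5

one_run = simulate_experiment(rng=random.Random(1))  # seeded so the run can be repeated
print("expected under chance:", expected_helpers)
print("helper chosen in this run:", one_run)
```

A single run will rarely land exactly on the expected value of 8; it only shows one outcome that random chance could produce.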

Using data to make inferences

  • Suppose the researchers observed 14/16 = 87.5% of babies choosing the "helper" toy.

  • How likely do you think that is, if the babies are choosing by random chance?
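
Besides simulating, the chance of a result this extreme can be computed exactly, assuming the 16 choices are independent and each is a 50/50 guess (a binomial model, which is an assumption added here rather than something the study states). The function name is hypothetical.

```python
from math import comb

def prob_at_least(k, n=16, p=0.5):
    """Binomial probability of k or more 'helper' choices out of n."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_value = prob_at_least(14)
print(f"P(14 or more of 16 choose the helper by chance) = {p_value:.4f}")  # 0.0021
```

Under these assumptions, 14 or more helper choices would happen in roughly 2 out of every 1,000 repetitions of the experiment.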

Using data to make inferences

  • What would happen if we repeated the simulation lots of times?
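
A minimal sketch of repeating the chance experiment many times is shown below. The number of repetitions, the seed, and the text-based plot are illustrative choices, not the settings behind the original slide's figure.

```python
import random
from collections import Counter

def simulate_many(n_reps=10_000, n_infants=16, p_helper=0.5, seed=None):
    """Repeat the chance experiment n_reps times and tally the helper counts."""
    rng = random.Random(seed)
    results = Counter()
    for _ in range(n_reps):
        helpers = sum(rng.random() < p_helper for _ in range(n_infants))
        results[helpers] += 1
    return results

results = simulate_many(seed=22)
n_reps = sum(results.values())

# Share of simulated experiments at least as extreme as 14 helper choices.
share_extreme = sum(c for helpers, c in results.items() if helpers >= 14) / n_reps

# Crude text histogram: one '*' per 50 simulated experiments.
for helpers in range(17):
    print(f"{helpers:2d} {'*' * (results[helpers] // 50)}")
print("share with 14 or more helpers:", share_extreme)
```

Most simulated experiments pile up near 8 helper choices, and results of 14 or more are rare, which is the pattern to compare against what the researchers observed.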

Using data to make inferences

Activity 4: In your teams, design a simulation study to decide whether our class sample provides evidence that a majority of Nebraska high school girls plan to pursue a college degree in STEM.

  • Make a hypothesis about the "true value" of your random variable.

  • Decide what you'll use to simulate the random chance. You can use dice, cards, Excel, or some other item.

  • If time permits, carry out a small simulation (15-20 random experiments).

  • How did the simulation you did compare to the data we collected?

Each team should make a poster describing its simulation. Toward the end of class, one member of each team (not the previous presenter) should describe what the team did.

What makes a good statistician?

Do you…

  • Have an interest in developing mathematical skills?

  • Like critical thinking problems?

  • Enjoy real-life applications and working with data?

  • Want to develop communication skills and learn about new topic areas?

Then consider a degree in statistics!

Statistics at Nebraska