Data is everywhere!
Collecting data
Summarizing data
Using data to make inferences
Tuesday, July 22, 2014
Statistics is the art and science of learning from data. Statisticians combine mathematical principles and subject knowledge to use data to make decisions about the larger world around them.
Where does all this data come from?
In today's world, data is everywhere!
Activity 1: At your computers, load your Facebook profile into Wolfram Alpha. What data do you get? How is it represented?
A random sample is one in which every member of the population of interest has the same chance of being selected. Random samples tend to be representative of the entire population and free of selection bias.
For our research question, what's the population of interest?
How could we take a random sample?
Is our group of students a random sample?
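One way to picture "the same chance of being selected" is to let a computer do the drawing. The sketch below assumes a made-up population of 500 student ID numbers and a sample size of 25; neither number comes from our research question.

```python
import random

# Hypothetical population: ID numbers 1..500 (an assumption for illustration).
population = list(range(1, 501))

# A fixed seed makes the example reproducible; drop it for a fresh draw each time.
rng = random.Random(2014)

# Draw 25 IDs without replacement; every ID is equally likely to be picked.
sample = rng.sample(population, k=25)

print(sorted(sample))
```

A group that simply signed up for a class was not chosen this way, which is worth keeping in mind when answering the last question above.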
A random variable is a characteristic (categorical or numerical) that is recorded for each subject in our sample. Its value changes from subject to subject.
In your search, what random variables were measured?
What random variables would you measure if you designed your own study?
Let's collect some data to use a little later!
Complete the Google docs form to submit your own data.
Statisticians usually work with data in a table (Excel spreadsheet, CSV file, etc.).
Graphical and numerical summaries make the data more accessible and are easy to produce in most statistical software.
Bad graphics are everywhere!
Example: What's misleading about the graph below, published by UNL?
Example: What about this Fox News pie chart?
A numerical summary of a data set uses a single number to describe or characterize a random variable.
Common numerical summaries include:
Sample proportion
Can you think of any others?
Which are appropriate for numerical data? Which are appropriate for categorical data?
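As a rough illustration, here is how a few numerical summaries could be computed with Python's standard library. The responses are invented for this example and are not our class data. The mean, median, and standard deviation are shown as possible answers to "any others?".

```python
import statistics

# Hypothetical responses (made up for illustration, not the class data).
hours_of_sleep = [7, 8, 6, 7.5, 9, 6.5, 7, 8]  # numerical variable
owns_pet = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]  # categorical variable

# Summaries that only make sense for numerical data.
mean_sleep = statistics.mean(hours_of_sleep)
median_sleep = statistics.median(hours_of_sleep)
sd_sleep = statistics.stdev(hours_of_sleep)  # sample standard deviation

# Sample proportion: the usual summary for a categorical variable.
proportion_yes = owns_pet.count("yes") / len(owns_pet)

print(mean_sleep, median_sleep, round(sd_sleep, 2), proportion_yes)
```

Note that a mean of the "yes"/"no" answers is not defined, while a proportion of "yes" answers is, which is one way to approach the question above.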
A graphical summary of a data set uses pictures to describe a random variable.
Common graphical summaries include:
Scatterplots
Can you think of any others?
Which are appropriate for numerical data? Which are appropriate for categorical data?
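For a categorical variable, a bar chart is one natural picture. The sketch below builds a text-only version from invented responses, so it runs without any plotting software. The variable name and the data are assumptions for illustration.

```python
from collections import Counter

# Hypothetical categorical responses (made up for illustration).
favorite_subject = ["math", "science", "math", "art", "science", "math", "history"]

# Tally how many times each category appears.
counts = Counter(favorite_subject)

# Print one row per category, most common first, with one '#' per response.
for category, count in counts.most_common():
    print(f"{category:<8} {'#' * count} ({count})")
```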
Usually statisticians rely on statistical software.
Lots of options exist (paid and free); we'll use StatCrunch today.
Activity 2: Follow the instructions to load our class data into StatCrunch. In teams of 3-4, choose a random variable and prepare a poster summarizing that variable.
Your poster should include:
Appropriate numerical summaries
Appropriate graphical summaries
A potential research question this data could answer
In about 20-25 minutes, one team member will present the poster to the group.
Statistical inference is the process of using data collected from a sample to make informed decisions about the population from which the sample came.
Classical statistics relies on tools from mathematics and probability theory.
Modern statisticians have a wide range of tools available… like simulation!
In a simulation, statisticians use random chance mechanisms (like rolling a die, drawing a card, computerized methods, etc.) to repeat the original experiment or to test a hypothesis about the "real world".
Helper or Hinderer?
A 2007 issue of Nature reported a study investigating whether infants take into account an individual's actions towards others when evaluating that individual as appealing or aversive.
One study component used sixteen 6-month-old infants as subjects.
Infants were shown a "climber" character that couldn't make it up a hill in two tries.
They were then shown two scenarios for the next try:
Activity 3: In your teams, simulate the "helper-hinderer" experiment.
Record how many times the "helper" character was chosen.
Assuming that the babies were choosing a character by random chance, and had no attraction to one character or the other, how many times would you expect to see the "helper" chosen?
Suppose the researchers observed 14/16 = 87.5% of babies choosing the "helper" toy.
How likely do you think that is, if the babies are choosing by random chance?
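The same experiment can be repeated many times on a computer. The sketch below treats each of the 16 infants as a fair coin flip, counts how many "choose the helper", and repeats that 10,000 times. The number of repetitions and the seed are arbitrary choices, not part of the study.

```python
import random
from math import comb

N_INFANTS = 16   # from the study description
OBSERVED = 14    # infants who chose the helper
REPS = 10_000    # number of simulated experiments (an arbitrary choice)

rng = random.Random(1)  # fixed seed so the run is reproducible

def simulate_once(n=N_INFANTS):
    """Count 'helper' choices when each infant picks by a fair coin flip."""
    return sum(rng.random() < 0.5 for _ in range(n))

results = [simulate_once() for _ in range(REPS)]

# Under pure chance we expect half of the infants to pick the helper.
expected_helpers = N_INFANTS * 0.5

# Fraction of simulated experiments with 14 or more helper choices.
simulated_p = sum(r >= OBSERVED for r in results) / REPS

# Exact binomial probability of 14 or more out of 16, for comparison.
exact_p = sum(comb(N_INFANTS, k) for k in range(OBSERVED, N_INFANTS + 1)) / 2**N_INFANTS

print(expected_helpers, simulated_p, exact_p)
```

The exact calculation gives 137/65536, or about 0.2%, so a result of 14 or more helpers would be very unusual if the infants were choosing purely by chance.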
Activity 4: In your teams, design a simulation study to decide whether our class sample provides evidence that a majority of Nebraska high school girls plan to pursue a college degree in STEM.
Make a hypothesis about the "true value" of your random variable.
Decide what you'll use to simulate the random chance. You can use dice, cards, Excel, or some other item.
If time allows, carry out a small simulation (15-20 random experiments).
How did the simulation you did compare to the data we collected?
Each team should make a poster describing its simulation. Toward the end of class, one member of each team (not the previous presenter) should describe what the team did.
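If a team chooses a computer instead of dice or cards, the steps above might look like the sketch below. The class size (20) and the number of students planning a STEM degree (13) are placeholders, not our actual data, and the hypothesis "no majority" is encoded as a 50% chance per student.

```python
import random

def simulate_counts(n_students, p_null, reps, seed=None):
    """Simulate `reps` classes; return the number of 'yes' answers in each."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_null for _ in range(n_students))
        for _ in range(reps)
    ]

# Placeholder values: replace these with the numbers from the class data.
n_students = 20
observed_yes = 13

# A small simulation of 20 random experiments, as suggested in the activity.
counts = simulate_counts(n_students, p_null=0.5, reps=20, seed=7)

# How often did chance alone produce a result at least as large as ours?
at_least_as_large = sum(c >= observed_yes for c in counts)

print(sorted(counts))
print(f"{at_least_as_large} of {len(counts)} simulated classes had "
      f"{observed_yes} or more 'yes' answers")
```

With only 20 repetitions the answer will vary noticeably from run to run, so raising `reps` gives a steadier picture. Whatever the result, it only says something about Nebraska high school girls if our class can be treated as a random sample of them.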
Do you…
Have an interest in developing mathematical skills?
Like critical thinking problems?
Enjoy real-life applications and working with data?
Want to develop communication skills and learn about new topic areas?
Then consider a degree in statistics!
Master's degree in Statistics: 2 years after your bachelor's degree.
Undergraduate minor in Statistics.
Undergraduate major? COMING SOON!
Website: Department of Statistics
Email: schwab.aimee@huskers.unl.edu