Part 1: Getting Started in Statistics

Aimee Schwab
June 10, 2013

One of my goals for this class is to improve your “statistical literacy” and critical thinking about facts and figures presented to you in everyday life. To do this, we need to start with a few definitions.

Statistics is the art and science of learning from data.

Statisticians go through a three-step process. We…

  1. Design studies
  2. Analyze the data collected
  3. Translate the data into knowledge about the world around us

Population: the total set of subjects in which we are interested

Sample: the subset of the population for which we have (or plan to collect) data

  • Taking a sample gives us information about the population without having to collect data on every subject.

Example: For each scenario, identify the population and describe the sample.

  1. A 2010 Gallup poll asked 1087 adults, “Do you have occasion to use alcoholic beverages such as liquor, wine, or beer, or are you a total abstainer?”
  2. Nielsen Media Research surveys 5000 randomly selected households and finds that among the TV sets in use, 19% are tuned to 60 Minutes.
  3. A graduate student at UNL conducts a research project about how adult Americans communicate. She mails a survey to 500 adults. She asks them to mail back a response to the question: “Do you prefer to use email or snail mail?” She gets back 65 responses, with 42 of them indicating a preference for snail mail.

Random Sampling: a sampling method in which every member of the population has the same chance of being included in that sample.

There are lots of ways to get a random sample! Random samples tend to be representative of the population, so we can draw conclusions from them.
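One of those ways, a simple random sample, can be sketched in a few lines of Python. This is only an illustration: the population here is a hypothetical list of 5000 household ID numbers, and the sample size of 100 and the seed are arbitrary choices.

```python
import random

# Hypothetical population: ID numbers for 5000 households (illustrative only).
population = list(range(1, 5001))

rng = random.Random(2013)  # fixed seed so the draw is reproducible

# random.sample draws without replacement, and every household has the
# same chance of being chosen.
sample = rng.sample(population, k=100)

print(len(sample))       # 100
print(len(set(sample)))  # 100, so no household was picked twice
```

Because the computer does the choosing, nobody's judgment or convenience decides who ends up in the sample.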

One way to draw conclusions from data is to calculate certain values.

Parameter: a numerical summary of the population

Statistic: a numerical summary of the sample

Ideally, we can use a sample statistic to help us find a “best estimate” for the population parameter.
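The difference between the two can be seen in a small simulation. The sketch below assumes a made-up population of 2000 students whose textbook counts are generated at random; in real studies we almost never have the whole population, which is exactly why the statistic matters.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical population: textbook counts for 2000 students (made-up values).
population = [rng.randint(0, 8) for _ in range(2000)]

# Parameter: a summary of the whole population (usually unknown in practice).
parameter = mean(population)

# Statistic: the same summary computed from a random sample of 50 students.
sample = rng.sample(population, k=50)
statistic = mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two numbers will usually be close but not identical. The statistic is our estimate of the parameter, not a copy of it.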

Example: Identify whether the scenario is talking about a parameter or a statistic.

  1. Based on a sample of 877 executives, 45% of them would not hire someone with a typo on their job application.
  2. The average number of textbooks purchased by students in this class this semester is 4.2.
  3. A sample of students is selected and the average amount of time waiting in line to buy textbooks is 0.65 hours.
  4. In a study of all 2223 passengers aboard the Titanic, it was found that 702 survived when it sank.

Example: Internet sites report that about 13% of Americans are left handed. Is this true at UNL? During a chemistry exam, the instructor walks around the room and counts 15 left-handed students out of 98 in the class. Identify the following:

  • variable of interest
  • population
  • sample
  • parameter
  • statistic
  • Was random sampling used in this example? Why or why not?

Solution:

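  • variable of interest: whether a person is left-handed
  • population: all Americans
  • sample: the 98 students in the UNL chemistry class
  • parameter: 13%, the reported proportion of Americans who are left-handed
  • statistic: 15/98, or about 15.3%, the proportion of left-handed students in the class
  • Random sampling was not used. The instructor counted the students who happened to be in one class, so this is not a random sample of all Americans! At best, we could cautiously generalize these results to UNL students.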
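The arithmetic behind this example is short enough to check by hand, but here it is as a quick Python sketch. The variable names are only for illustration.

```python
left_handed = 15
class_size = 98
reported = 0.13  # proportion reported for all Americans

# Sample proportion of left-handed students in the class.
statistic = left_handed / class_size

print(f"sample proportion: {statistic:.3f}")                           # 0.153
print(f"difference from reported value: {statistic - reported:+.3f}")  # +0.023
```

The class proportion is a little above 13%, but a gap this small from a single class is not, by itself, evidence that UNL differs from the country as a whole.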

So how can we make conclusions from data?

Descriptive statistics: methods for summarizing data collected from a sample or a population.

  • Descriptive statistics usually consist of graphs and numerical summaries such as the mean.

Descriptive statistics allow us to summarize what's happening in our sample or population. If our sample is small and our population is large, the sample may not reflect the population very closely, so any conclusion we draw about the population comes with uncertainty!
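A rough way to see this uncertainty is to draw many random samples and watch how much the sample proportion moves around. The sketch below assumes a hypothetical population of 100,000 people in which exactly 13% are left-handed; the sample sizes (20 and 2000) and the number of repeats are arbitrary choices.

```python
import random

rng = random.Random(42)

# Hypothetical population of 100,000 people, 13% of whom are left-handed.
# 1 = left-handed, 0 = not left-handed.
population = [1] * 13_000 + [0] * 87_000

def spread_of_sample_proportions(n, repeats=200):
    """Draw `repeats` random samples of size n; return the smallest and largest sample proportion."""
    props = [sum(rng.sample(population, k=n)) / n for _ in range(repeats)]
    return min(props), max(props)

for n in (20, 2000):
    low, high = spread_of_sample_proportions(n)
    print(f"n = {n:>4}: sample proportions ranged from {low:.3f} to {high:.3f}")
```

With samples of 20, the proportions bounce around a lot; with samples of 2000, they stay much closer to the true 13%.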

Inferential statistics: methods of making decisions about a population, based on data obtained from a sample of that population.

  • In inference, we don't have data from the entire population. We need to make some assumptions to relate results from the sample to the population as a whole.

In practice, both descriptive and inferential methods are used on the same data. Descriptive methods are incredibly straightforward, and most of you have used them before. The majority of our time will be spent on inference.