Aimee Schwab
June 13, 2013
Where We're Headed…
Disclaimer: most of the examples in this section are geared toward studies using humans as subjects. Why? As a whole, we tend to be most familiar with this type of research, and it mirrors what you'll be asked to do for your project. These ideas (especially experimental studies) can be extended to any desired subjects: cars, crops, animals.
When we design a study, we need to decide which variables we're most interested in. There are two major types of variables:
Response variable: the outcome variable on which comparisons are made
Explanatory variable: the variable that defines the groups (categorical) or changes in numerical values (quantitative) to be compared with respect to values for the response variable
The response variables are what we want to make a statement about. For example, I might want to design a survey to investigate what makes a person more likely to vote Republican or Democrat in the upcoming election. My response variable would be their voting preference - Republican or Democrat. You can sometimes think of the response as the final outcome.
There are many explanatory variables I could use. I could record a person's age, gender, income, education level, or marital status as explanatory variables. These explanatory variables might have an effect on who my subject is more likely to vote for. Changes in the explanatory variable help explain changes in the response variable!
The first part of any study or experiment is the design stage. Before we can collect our data, we have to consider what methods to use. Some methods will be more appropriate than others, depending on the variable we are ultimately interested in.
Experimental study: a researcher assigns subjects to certain experimental conditions, then observes outcomes of the response variable
Observational study: a researcher observes values of the response variable and explanatory variables for the sample subjects without manipulating those subjects in any way
Example: For each scenario below, determine whether it is an observational or experimental study.
On the previous slide, 1 and 4 are experimental studies. 2 and 3 are observational studies.
Caution: It is very difficult to determine causation from observational studies, because of possible confounding effects between variables. Two variables are confounded when their effects on a response variable cannot be distinguished from each other. Unfortunately, it is sometimes unethical or very difficult to conduct experiments. Even so, observational studies, when done properly, can provide good data.
Biased samples systematically favor certain outcomes over others. There are two relatively common ways researchers sample a population that are actually biased. If possible, you should avoid using these types of samples.
Volunteer sample: subjects volunteer to participate in the study. Typically, one segment of the population is more likely to volunteer than others, so these samples may be biased.
Convenience sample: the researcher includes subjects who are most convenient in the study instead of choosing a random sample
Example: Identify the following scenarios as a volunteer sample or a convenience sample. Explain your choice.
An NBC television news reporter gets a reaction to a breaking story by polling people as they pass in front of his studio.
The BBC requested viewers to call the network and indicate their favorite poem. Of more than 7500 callers, more than twice as many voted for Rudyard Kipling's If than for any other poem.
1 is convenience, 2 is volunteer.
Random sample: subjects are chosen at random to participate in the study or experiment.
Random samples have some huge advantages! Random sampling lets us avoid selection bias - subjects aren't chosen to be in the sample based on convenience alone, nor is one group more likely to be chosen.
Random samples also allow us to use probability to analyze our results and make inferences about the data. All statistical methods we'll use in this class assume that data comes from a random sample.
Simple random sample: a sample of \( n \) subjects from a population in which each possible sample of that size has the same chance of being selected
Sampling frame: a list of subjects in the population from which the sample is taken
For example, if I wanted to randomly sample 10 students from Stat 218, I could use the class roster as my sampling frame.
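The roster example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster of 40 placeholder names is made up, and the seed is fixed only so the same sample comes out each time.

```python
import math
import random

# Hypothetical sampling frame: a class roster of 40 students (placeholder names).
roster = [f"Student {i}" for i in range(1, 41)]

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct subjects so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

sample = simple_random_sample(roster, 10, seed=218)
print(sample)

# Number of different samples of size 10 that could be drawn from this roster.
# In a simple random sample, each of them has the same chance of being selected.
print(math.comb(len(roster), 10))
```

Note that `random.sample` draws without replacement, so no student can appear in the sample twice.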
For the social sciences, most often we will use sample surveys to collect the data. Data for sample surveys is collected in one of three ways, depending on the goals of the study and the budget of the researcher.
Personal interview: an interviewer asks prepared questions and records the subject's responses.
Subjects are most likely to agree to participate in a personal interview, but these are the most costly to administer. If the research topic is sensitive (such as alcohol and drug use), a subject may be less likely to answer truthfully.
Telephone interview: an interviewer asks prepared questions over the phone (like a Gallup survey), and records the subject's spoken responses.
These are cheaper since there is no travel involved, but subjects are less likely to participate. Interviews over the phone have to be short, or the subject may decide to cut them short.
Questionnaire: subjects are requested to fill out a questionnaire that's sent to them by email, traditional mail, or by some other means.
These are the cheapest surveys to administer, but subjects are the least likely to participate.
Telephone interviews are the most commonly used method in major national polls, and are probably what you're most familiar with.
While sample surveys are the most efficient method to collect some types of data, there are many sources of potential bias from survey sampling.
Undercoverage: the sample lacks representation from part of the population
Nonresponse bias: some sampled subjects cannot be reached or refuse to participate
Response bias: subjects may give the response that they think is socially acceptable, not necessarily the response that is true
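To see how undercoverage can distort a result, here is a small simulation. Everything in it is made up for illustration: the population size, the 30% of people the survey method can reach, and the assumption that reachable people answer "yes" more often than everyone else.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 people. Assumption: the 30% of people
# our survey method can reach say "yes" more often (50%) than the rest (10%).
population = []
for _ in range(10_000):
    reachable = rng.random() < 0.3
    says_yes = rng.random() < (0.5 if reachable else 0.1)
    population.append((reachable, says_yes))

def proportion_yes(people):
    """Fraction of the given people who answer yes."""
    return sum(says_yes for _, says_yes in people) / len(people)

true_rate = proportion_yes(population)

# Undercoverage: the sample is drawn only from the reachable part of the population.
covered_only = [p for p in population if p[0]]
undercovered_estimate = proportion_yes(rng.sample(covered_only, 500))

# Random sample from the whole population, for comparison.
random_estimate = proportion_yes(rng.sample(population, 500))

print(f"true: {true_rate:.2f}")
print(f"undercovered sample: {undercovered_estimate:.2f}")
print(f"random sample: {random_estimate:.2f}")
```

Under these assumptions, the undercovered sample overstates the "yes" rate no matter how many people are surveyed, while the random sample lands close to the true value.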
Example 1: A highly conservative website asks its readers, “Do you support gay marriage?” 99% of the website's respondents said that they do NOT support gay marriage. Do you believe that 99% of Americans do not support gay marriage? How might this survey be biased? What could you change to eliminate this bias?
Choose another website.
Example 2: Read the two versions of the question below. Which do you think will get more “yes” answers and why? What would be a better question to include on a survey?
“Do you believe guns should be banned?”
Example 3: In Part 1, we identified the population and sample for this scenario. What type of sample survey is this? How might the survey be biased? Are the results believable?
Example 4: We are interested in the percentage of households that still bake bread the old-fashioned way. To answer this question, a researcher makes random phone calls between 9 am and 5 pm. How might this survey be biased?
Example 5: A researcher wants to conduct a survey concerning students' sexual habits. How could each of the following influence student responses?
There are also other types of random sampling that we can use. Most of these are used in survey sampling.
Cluster sample: the population is divided into a large number of clusters, such as counties or city blocks. A simple random sample of the clusters is selected, and all subjects in those particular clusters are sampled.
Stratified random sample: the population is divided into separate groups called strata. A simple random sample is taken from each of the strata.
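The difference between the two designs is easier to see side by side. The sketch below uses a made-up population of 30 households in 6 city blocks, with each household labelled as a renter or an owner; the field names (`block`, `tenure`) and the sample sizes are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 6 city blocks with 5 households each.
# Each household belongs to one stratum: "renter" or "owner".
population = [
    {"id": f"block{b}-house{h}", "block": b, "tenure": "renter" if h <= 2 else "owner"}
    for b in range(1, 7)
    for h in range(1, 6)
]

def cluster_sample(units, cluster_key, n_clusters, rng):
    """Take a simple random sample of clusters, then keep every unit in them."""
    clusters = sorted({u[cluster_key] for u in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if u[cluster_key] in chosen]

def stratified_sample(units, stratum_key, n_per_stratum, rng):
    """Take a separate simple random sample from each stratum."""
    sample = []
    for stratum in sorted({u[stratum_key] for u in units}):
        members = [u for u in units if u[stratum_key] == stratum]
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

by_cluster = cluster_sample(population, "block", 2, rng)    # 2 whole blocks
by_stratum = stratified_sample(population, "tenure", 4, rng)  # 4 renters + 4 owners

print([u["id"] for u in by_cluster])
print([u["id"] for u in by_stratum])
```

In the cluster sample, randomness decides which blocks are visited and everyone in them is included. In the stratified sample, every stratum is guaranteed to be represented and randomness decides who is picked within each one.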
Example: For each sample, identify whether the researcher used cluster sampling or stratified random sampling. Can you explain why a specialized sampling design is better than a simple random sample for these cases?
From before, remember that an experimental study is when we actually do something to people, animals, or objects in order to observe the response. When we do this, we are interested in:
Treatments: experimental conditions that are randomly assigned to each experimental unit (subject)
Explanatory variable: the treatment that was assigned to that particular experimental unit (subject)
Response variable: the outcome variable of interest. We want to compare the effects of each treatment on the response variable
Suppose that I wanted to do a study on whether antidepressants help people quit smoking. My experimental units (subjects) could be adults who were 18 or over and had smoked 5 or more cigarettes per day for the previous year. My treatments could be two different antidepressants, A and B. The explanatory variable would then be whether the subject was on antidepressant A or antidepressant B. The response variable would be whether the subject had successfully quit smoking at the end of the experiment.
What specifically makes this an experiment, and not an observational study? I've manipulated my subjects, and assigned them to one drug or the other! They did not choose the treatment that they received.
In an experimental study, treatments should be randomly assigned to each experimental unit. Random assignment has a couple of benefits:
Example: We are interested in studying the effects of diet on weight loss. What are the response and explanatory variables? What are the experimental units? What treatments could we apply?
To avoid confounding effects in our experiments, we try to make sure the conditions are as similar as possible for all variables except the factors we are studying. In a laboratory setting we can control nearly everything, but in many real-world situations that level of control is not realistic.
In some experiments, especially medical studies, we add a placebo (dummy treatment) to the experiment. This is because of the psychological effect that the placebo can have, called the placebo effect. Some people really do get better with a dummy treatment, because they believe they are getting an active treatment.
Control group: the group using the placebo treatment. Helps determine the true effect of the treatment by giving the researchers a baseline response to compare to.
Blind study: subjects are unaware which treatment they are receiving.
Double blind study: neither the subjects nor those interacting with them know which treatment each subject is receiving.
When we're setting up an experiment, it's important to make sure that we are using a valid randomization method. We can create random samples by using a mechanical method to select subjects and assign them to treatments.
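One simple randomization method is to shuffle the list of subjects and then deal them out to the treatments in turn. The sketch below does this for the antidepressant study; the 20 subject labels and the seed are made up for illustration, and this is one reasonable approach rather than the only valid one.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the original list is left untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects for illustration only.
subjects = [f"Subject {i}" for i in range(1, 21)]
groups = randomly_assign(subjects, ["Antidepressant A", "Antidepressant B"], seed=1)

for treatment, members in groups.items():
    print(treatment, members)
```

Because chance alone decides who ends up in each group, neither the researcher nor the subjects can steer particular people toward a particular treatment.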
Replication: when several experimental units are assigned to each treatment.
Why is replication desirable? Each experimental unit will react a little differently to the treatments. By using as many experimental units as we can, we reduce the chance of any treatment getting “lucky”. The effects of any particular experimental unit will be averaged out by the rest.
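A quick simulation shows the averaging-out effect. The numbers here are assumptions for illustration only: each unit's response is drawn from a normal distribution with mean 30 and standard deviation 10, and we repeat the imaginary experiment 2000 times for each group size.

```python
import random
import statistics

def spread_of_group_means(units_per_treatment, n_repeats=2000, seed=0):
    """Simulate many experiments and return how much one group's mean varies."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        # Hypothetical responses: true mean 30, unit-to-unit standard deviation 10.
        responses = [rng.gauss(30, 10) for _ in range(units_per_treatment)]
        means.append(statistics.mean(responses))
    return statistics.stdev(means)

for n in (2, 8, 32):
    print(n, round(spread_of_group_means(n), 2))
```

The printed spread shrinks as the number of units per treatment grows, which is why a treatment is less likely to get "lucky" in a well-replicated experiment.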
Example: Eight people with headaches are given one of Advil, Tylenol, Excedrin, or a placebo. The time until they report the pain is gone is recorded for each person.
Differences between two or more treatments are called statistically significant if they are too large to be reasonably attributed to chance alone.
When a study reports a statistically significant result, it means the researchers found good evidence of a real difference, which supports their hypothesis.
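One way to make "too large to be attributed to chance" concrete is a permutation test: shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. This is a sketch of that one technique, not necessarily the method used later in this class, and the response times below are invented for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate how often random relabeling gives a mean difference at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the treatment labels were handed out by chance
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles

# Made-up data: minutes until pain relief for a drug group and a placebo group.
drug = [22, 25, 19, 24, 21, 23, 20, 26]
placebo = [31, 35, 28, 33, 30, 36, 29, 34]

p = permutation_p_value(drug, placebo)
print(p)
```

A small value means that random relabeling almost never produces a gap as big as the observed one, so chance alone is a poor explanation for the difference.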