The goal of data science is to gain knowledge and insights from data. The ultimate goal of the course is to prepare you to use the tools of data science to conduct an independent analysis of your own.
This is your opportunity to apply everything we learn in class to a topic that you personally care about.
Schedule
The final project will be a series of small assignments. Because this is a shortened summer course, we will do much of this in class and there will be no other homework. However, that also means it is especially important to stay on schedule for the final project.
- 7/8: In-class - introduce project, start thinking about research question.
- 7/12: Due: Draft of research question. In-class - discussion and review.
- 7/13: In-class - search for datasets.
- 7/15: Due: Research question and data source.
- 7/19: In-class - read your dataset into R, start manipulation.
- 7/20: In-class - more data manipulation, work on plots.
- 7/22: Due: at least one plot from your analysis.
- 7/26: In-class - work on final project.
- 7/27: In-class - work on final project.
- 7/29: Due: final written report.
Step 1: Question
The core of your project is a research question - it is the question that you hope to answer. Of course, large and important questions cannot always be answered in a single project, but you can still make an argument and contribute evidence.
Generally, a good research question explores a systematic relationship between variables. Often, you will state a hypothesis about the relationship between an explanatory variable and a dependent variable.
- explanatory variable: a variable that, according to your hypothesis, helps to explain your outcome. Often also called an independent variable.
- dependent variable: your outcome.
Examples
Here are a few real examples of research questions that my past students have answered. Most of these relate to government and politics in some way because those are the courses I teach, but you should feel free to explore any kind of question you want:
- How do international standards for sex education (EV) relate to health outcomes for women and girls (DV)?
- Does NFL team success (EV) explain ticket prices (DV)?
- How have patterns of language (DV) changed over the past 30 years (EV) in modern song lyrics?
- Do campaign expenditures (EV) meaningfully influence Senate elections (DV)?
- How have crime rates and patterns (DV) changed over time (EV) in New York City?
- Does economic inequality (EV) increase patterns of terrorism (DV)?
Your research question must include:
- Explanatory variable: describe what it is, how it is measured. According to your theory, this is related to your outcome in some way.
- Dependent variable: your outcome and how it is measured.
- Hypothesis: a plausible theory about how your explanatory variable relates to your dependent variable. For example, an NFL team’s success on the field (EV) may bring the team more fans and make attending its games more attractive. This could lead to higher ticket prices (DV).
Step 2: Data Source
Once you have some idea of the question you would like to answer, you should start looking for a data source. Here are some places to start that have been useful for my students in the past. Don’t limit yourself to these sources alone - be bold with your research question, and come talk to me if you can’t find a suitable dataset. We’ll find (or make) one!
Step 3: Analyses
Once you have your research question and dataset, you will start conducting your analyses. Most of you will have to do some data cleaning first. Since RMarkdown files can hold code, text, and plots, you will do all of your work in an RMarkdown file so that it is easily reproducible.
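As a minimal sketch of what a first cleaning pass might look like, the base R chunk below reads a small table, converts a price column that was stored as text into numbers, and drops rows with missing values. The team labels, the column names (`win_pct`, `ticket_price`), and every value are invented for illustration; your own dataset will likely need different steps.

```r
# Toy data invented for illustration only (not real teams or prices).
# In your project, replace this with read.csv("your_file.csv").
raw <- read.csv(
  text = c(
    "team,win_pct,ticket_price",
    "A,0.75,$120",
    "B,0.50,$95",
    "C,0.25,",
    "D,0.60,$110"
  ),
  na.strings = c("", "NA"),   # treat empty cells as missing
  stringsAsFactors = FALSE
)

clean <- raw

# Prices were read as text because of the "$"; strip it and convert to numeric.
clean$ticket_price <- as.numeric(gsub("$", "", clean$ticket_price, fixed = TRUE))

# Keep only the rows with no missing values.
clean <- clean[complete.cases(clean), ]

# Check the column types and the number of rows that remain.
str(clean)
```

Placed inside an RMarkdown code chunk, these steps are rerun every time the file is knit, so the cleaned data can always be traced back to the raw file.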
Step 4: Final Report
Your final report will have the following sections:
- Introduction: introduce your research question and hypotheses. Discuss your research question in terms of your explanatory variable and dependent variable. Why does this matter for the real world?
- Data: discuss your data source and how you acquired it (and cite your source). Discuss how your explanatory and dependent variables are measured and presented in your dataset. Design at least one table or plot that describes your data.
- Results: analyze your research question. This section should explore the relationship between your explanatory and dependent variables. Discuss what you find.
- Conclusion: this section should (1) summarize your results, (2) discuss whether you find any support for your hypothesis, (3) critically analyze the potential limitations of your analysis, and (4) discuss a potential plan for future improvement.
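To make the Data and Results sections more concrete, here is one possible sketch in base R. It uses `mtcars`, a dataset that ships with R, with car weight (`wt`) standing in for the explanatory variable and fuel economy (`mpg`) for the dependent variable. The choice of variables, the descriptive statistics, and the straight-line fit are assumptions made for illustration, not requirements; your own analysis may call for different tools.

```r
# mtcars ships with R; here wt (weight) stands in for the EV and mpg for the DV.
ev <- mtcars$wt
dv <- mtcars$mpg

# Data section: a simple descriptive table of both variables.
desc <- data.frame(
  variable = c("wt (EV)", "mpg (DV)"),
  mean     = c(mean(ev), mean(dv)),
  sd       = c(sd(ev), sd(dv)),
  min      = c(min(ev), min(dv)),
  max      = c(max(ev), max(dv))
)
print(desc)

# Results section: scatterplot of DV against EV with a fitted straight line.
fit <- lm(mpg ~ wt, data = mtcars)
plot(ev, dv, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)

# Correlation and fitted coefficients summarize direction and strength.
r <- cor(ev, dv)
print(r)
print(coef(fit))
```

A negative correlation and slope here would be read as heavier cars tending to get fewer miles per gallon. In your report, tie whatever you find back to your hypothesis and discuss what a pattern like this can and cannot tell you.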