Your final project is to do a novel data analysis to answer a question and write about it. This can be interpreted broadly; the requirements are discussed below.
The rough outline of the project is: Start with a question. Find data that might get at that question. Play around with the data. Attempt to answer the question. Iterate. Communicate.
Your project should have one significant aspect to it. Examples might include,
You will work in groups of 3 people, assigned by the instructor. See below for grading details and the group work policies.
There are two Final Project deliverables: a blog post and the analysis code/data. The final project is due Thursday, December 7th at 11:00pm.
Write a blog post in R Markdown aimed at a general audience (think 538).
All analysis code should be well documented. The main technical results (plots, regressions, etc.) should be written up in a well-documented supporting technical document (using R Markdown). You might also include R scripts for cleaning data or helper functions.
Your data set (both raw and processed) should be submitted as well. All documents and files must be collected in one directory and zipped for submission.
An important part of your project is your process book. Your process book details the steps you took in developing your solution, including how you collected the data, the alternative solutions you tried, the statistical methods you used, and the insights you got. Equally important to your final results is how you got there! Your process book is the place where you describe and document the space of possibilities you explored at each step of your project. We strongly advise you to include many visualizations in your process book. Your process book should include the following topics. Depending on your project type, the amount of discussion you devote to each of them will vary:
Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.
Related Work: Anything that inspired you, such as a paper, a web site, or something we discussed in class.
Initial Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?
Data: Source, scraping method, cleanup, storage, etc.
Exploratory Data Analysis: What visualizations did you use to look at your data in different ways? What are the different statistical methods you considered? Justify the decisions you made, and show any major changes to your ideas. How did you reach these conclusions?
Final Analysis: What did you learn about the data? How did you answer the questions? How can you justify your answers?
Presentation: Present your final results in a compelling and engaging way using text, visualizations, images, and videos on your project web site.
Describe the storytelling elements and goals in your process book and show us sketches and screenshots of different web site iterations. As this will be your only chance to describe your project in detail, make sure that your process book is a standalone document that fully describes your process and results.
You can find a large amount of data online. I encourage you to “gather your own data online” by doing something like scraping Twitter. Or, if you are feeling lazy, you can choose from some suggestions below.
There are some obvious places to look like data.gov. I’ve put together a collection (of collections) of interesting datasets you can find online: here.
If you are already doing research with a dataset, you are welcome to use it, but you have to do something new for this final project. That is, the work you submit cannot be part of any other course.
Your team’s grade will be 50% blog post and 50% analysis code. Your individual grade will be weighted by your team members’ reviews.
The project will be graded on
These are some examples of interesting analyses. Many of these examples would take longer than you have for the final project.