For your final project, you will take a dataset, explore it, tinker with it, and tell a nuanced story about it using at least three charts. I want this project to be as useful for you and your future career as possible - you’ll hopefully want to show off your final project in a portfolio or during job interviews.
Accordingly, you have some choice in what data you can use for this project. I’ve found several different high-quality datasets online.
You do not have to choose a dataset in your given emphasis. Choose whatever one you are most interested in or will have the most fun with.
Nonprofit / Business management
U.S. Charities and Non-profits: All of the charities and nonprofits registered with the IRS
515K Hotel Reviews Data in Europe: 515,000 customer reviews and scoring of 1,493 luxury hotels across Europe
Government management
Deadly traffic accidents in the UK (2015): List of all traffic-related deaths in the UK in 2015. Source: data.gov.uk
Firefighter Fatalities in the United States: Name, rank, and cause of death for all firefighters killed since 2000. Source: FEMA
Federal Emergencies and Disasters, 1953–Present: Every federal emergency or disaster declared by the President of the United States since 1953. Source: FEMA
Global Terrorism Database (1970–2016): 170,000 terrorist attacks worldwide, 1970-2016. Source: National Consortium for the Study of Terrorism and Responses to Terrorism (START), University of Maryland
Datasets for Visualization - Tidy Tuesday: Real-world data that can be used for wrangling and visualization.
For the final project, you as a group will write a memo using R Markdown to introduce, frame, and describe your story and figure. Use the final project template to get started. You should include the following in the memo:
There will be no final exam - this final project is the final exam.
This final project is worth 30% of your final grade.
The skills covered in this course are rooted in design rules and principles rather than formulas and equations. As such, the application of these principles to a real data problem is one of the best ways to learn and assess mastery of these skills. I guarantee you one day you will need to apply these principles to communicate an idea or a story to audiences, so let’s make sure you have at least one chance to practice before the stakes are higher.
Your final project should illustrate your ability to transform raw data into insights by making the non-visible visible, showing clear trends or patterns, and / or identifying outliers or missing information. The specific skills involved in achieving this goal include all of the course learning objectives listed on our E-class page.
You may work on your final project in a team of two students of your choosing. If you choose to work in a team, you must finalize your team before our class meeting on November 14th.
Use the final project template for your analysis and report. Your final report should be written as a .Rmd file that compiles to an HTML webpage. Publish your compiled page online (e.g. via RPubs, GitHub, etc.), then submit your entire project folder (including your .Rmd file, data files, image files, etc.) as a single .zip file on E-class by the due date. Also include a link to the published report page in your E-class submission. For students in teams, only one person from your team should submit the file.
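If the final project template does not already set it, the HTML output format is controlled by the YAML header at the top of the .Rmd file. Here is a minimal sketch; the title and author values are placeholders chosen for illustration, not values taken from the template:

```yaml
---
title: "Final Project"            # placeholder title (assumption)
author: "Student A, Student B"    # placeholder names (assumption)
output: html_document             # knit the report to an HTML page
---
```

Knitting a file with a header like this in RStudio produces an .html file next to the .Rmd, which is the page you publish and link to in your submission.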
We will use this rubric to grade your report.
Your final project should be a fully reproducible product and available online as an HTML webpage. It should include text, data, code, and plots. Students working in teams must submit a short review of their teammates’ contributions to help ensure that the workload and grading for team members are equitable (see task 5 below). Below is a list of specific items your report should include (check the rubric to see their relative weighting).
State your research question and motivate why it is important / why the reader should care.
Describe your data:
On E-class, under the assignment titled “Team Member Review [Final Report]”, submit a short description of the specific contributions of each member of your team. For example, “Student A described the data and wrote the documentation for it. Student B led the data cleaning process and did much of the initial data exploration. Student C helped write code for the main visualizations.” These reviews will be kept confidential and compared to ensure that the workload and grading for team members are equitable. Team members who do not make meaningful contributions to their projects will receive a lower grade than their teammates.
You can download a full example of what a final project might look like (but don’t make your final visualization look exactly like this; show some creativity!).
Here are a couple of examples of exploratory data analysis and visualization:
Does the news reflect what we die from?, by Hannah Ritchie: This is an interesting original analysis from Our World in Data. The analysis reveals a large disconnect between what we see in the news and the day-to-day reality in terms of what we die from. This analysis also uses some interactive graphics so that the reader can view different versions of the same plot, such as scrolling through the plot over time.
Are first babies more likely to be late?, by Allen Downey: This is another relatively short analysis, but the author does a good job of documenting his data sources, stating the assumptions he makes, and describing what data were dropped from the analysis (i.e. babies born via C-section).
This assignment is inspired by and/or adapted from other sources, including: