Final Project Overview

For your final project, you will take a dataset, explore it, tinker with it, and tell a nuanced story about it using at least three charts. I want this project to be as useful for you and your future career as possible - you’ll hopefully want to show off your final project in a portfolio or during job interviews.

Accordingly, you have some choice in what data you can use for this project. I’ve found several different high-quality datasets online.

You do not have to choose a dataset in your given emphasis. Choose whatever one you are most interested in or will have the most fun with.

Nonprofit / Business management

Government management

Datasets for Visualization - Tidy Tuesday: Real-world data that can be used for wrangling and visualization.

Instructions

For the final project, your group will write a memo using R Markdown to introduce, frame, and describe your story and figures. Use the final project template to get started. The Tasks section below lists what the memo should include.

Due: December 9th by 1:30 pm

There will be no final exam - this final project is the final exam.

Weight:

This final project is worth 30% of your final grade.

Purpose:

The skills covered in this course are rooted in design rules and principles rather than formulas and equations. As such, the application of these principles to a real data problem is one of the best ways to learn and assess mastery of these skills. I guarantee you one day you will need to apply these principles to communicate an idea or a story to audiences, so let’s make sure you have at least one chance to practice before the stakes are higher.

Skills & Knowledge:

Your final project should illustrate your ability to transform raw data into insights by making the non-visible visible, showing clear trends or patterns, and / or identifying outliers or missing information. The specific skills involved in achieving this goal include all of the course learning objectives listed on our E-class page.

Teams:

You will work on your final project as a team of 2 to 3 students of your choosing. If you choose to work in a team, you must finalize your team members before our class meeting on November 14th.

Submission Details:

Use the final project template for your analysis and report. Your final report should be written as a .Rmd file that compiles to an HTML webpage. Publish your compiled page online (e.g. via RPubs, GitHub, etc.), then submit your entire project folder (including your .Rmd file, data files, image files, etc.) as a single .zip file on E-class by the due date. Also include a link to the published report page in your E-class submission. For students in teams, only one person from your team should submit the file.

Assessment:

We will use this rubric to grade your report.

Tasks:

Your final project should be a fully reproducible product, available online as an HTML webpage. It should include text, data, code, and plots. Students working in teams must submit a short review of their teammates’ contributions to help ensure that the workload and grading for team members are equitable (see task 5 below). Below is a list of specific items your report should include (check the rubric to see their relative weighting).

  1. Follow these formatting rules:
  • As the report will compile to an HTML webpage, there is no length requirement; your report should be detailed enough to address the requirements listed below and concise enough that it uses no more words than necessary.
  • In your markdown YAML header, include the project title and the name(s) of the student(s) involved in the project so that they appear at the top of the rendered HTML page.
  • Your report should be fully reproducible - all data formatting and charts should be written in code chunks and rendered when you compile your .Rmd file to an HTML webpage.
  • Your report should be written in a narrative format (i.e. using coherent paragraphs rather than a series of bullet points). You may use headings where appropriate to break up your report into sections.
  • Proofread your HTML webpage before you submit - double check for spelling and formatting errors, especially in rendered charts and tables!
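
As a rough illustration of the YAML header rule above, a header might look like the sketch below. The title and author names are placeholders, and `html_document` is just one common output format; the final project template may already set these fields differently.

```yaml
---
title: "Placeholder Project Title"
author: "Student A, Student B, Student C"
output: html_document
---
```

With a header like this, the title and author names should appear at the top of the page when the .Rmd file is knit.
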
  2. State your research question and motivate why it is important / why the reader should care.

  3. Describe your data:

  • Download a dataset and explore it. Many of these datasets are large and will not open (well) in Excel, so you’ll need to load the CSV file into R with read_csv(). Most of these datasets have nice categorical variables that you can use for grouping and summarizing, and many have time components too, so you can look at trends. Your past lecture scripts and homework assignments will come in handy here.
  • Articulate the main variables of interest in your project, and justify your choice of variables.
  • Provide descriptive statistics for your relevant variables. These can be a mix of graphs and summary tables.
  • You don’t have to summarize everything in every data set - just the variables that are relevant to your analysis.
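
The loading and summarizing steps above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the required approach: the toy data, the column names (`region`, `year`, `budget`), and the temporary file are all hypothetical stand-ins for whichever dataset you download.

```r
library(readr)
library(dplyr)

# Hypothetical toy data standing in for a downloaded dataset.
# In your project, point read_csv() at your real CSV file instead.
toy_path <- tempfile(fileext = ".csv")
write_csv(
  tibble(
    region = c("North", "North", "South", "South"),
    year   = c(2020, 2021, 2020, 2021),
    budget = c(100, 120, 80, NA)
  ),
  toy_path
)

# Load the CSV file into R
projects <- read_csv(toy_path)

# Descriptive statistics for one variable of interest,
# grouped by a categorical variable
budget_summary <- projects %>%
  group_by(region) %>%
  summarize(
    n_rows      = n(),
    n_missing   = sum(is.na(budget)),
    mean_budget = mean(budget, na.rm = TRUE),
    .groups     = "drop"
  )

budget_summary
```

Reporting the number of missing values alongside the mean is one way to make gaps in the data visible to the reader.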
  4. Describe your results:
  • Find a story in the data. Explore that story and make sure it’s true and insightful.
  • Display charts that either support or contradict your expected answer to your research question, or that illustrate what else you would need in order to address it.
  • Write narrative text around your charts to explain what the charts show and their significance towards addressing your research question. This should read as a continuous story rather than as a reply to each of the requirements described here.
  • Your plot type choices should highlight the main point(s) you want to make or clearly show the relationship you want to emphasize. Basically, they should answer the question “what do the data say about my research question?”
  • Your charts should be polished, following the design principles we have covered in class.
  • Export the charts you created into the folder named “images”.
  • You must include at least three different polished charts (i.e. don’t just make three scatterplots).
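
One possible way to export a chart into the “images” folder is sketched below. The data, the plot, and the file name `trend_plot.png` are hypothetical; only the folder name comes from the instructions above.

```r
library(ggplot2)

# Hypothetical toy data; replace with your own cleaned dataset
trend <- data.frame(
  year  = c(2019, 2020, 2021, 2022),
  count = c(12, 18, 15, 24)
)

# Build the chart and store it in an object so it can be saved explicitly
trend_plot <- ggplot(trend, aes(x = year, y = count)) +
  geom_line() +
  geom_point() +
  labs(title = "Hypothetical yearly counts", x = "Year", y = "Count") +
  theme_minimal()

# Create the images folder if it does not exist yet, then save the chart
dir.create("images", showWarnings = FALSE)
ggsave(
  file.path("images", "trend_plot.png"),
  plot   = trend_plot,
  width  = 6,
  height = 4,
  dpi    = 300
)
```

Passing the plot object to `ggsave()` explicitly, rather than relying on the last plot drawn, keeps the export reproducible when the report contains several charts.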
  5. For students working in teams:

On E-class under the assignment titled “Team Member Review [Final Report]”, submit a short description of the specific contributions of each member of your team. For example, “Student A described the data and wrote the documentation for it. Student B led the data cleaning process and did much of the initial data exploration. Student C helped write code for the main visualizations.” These reviews will be kept confidential and compared to check that the workload and grading for team members are equitable. Team members who do not make meaningful contributions to their projects will receive a lower grade than their teammates.

Examples:

You can download a full example of what a final project might look like (but don’t make your final visualization look exactly like this; show some creativity!).

Here are a couple examples of exploratory data analysis and visualization:

  • Does the news reflect what we die from?, by Hannah Ritchie: This is an interesting original analysis from Our World in Data. The analysis reveals a large disconnect between what we see in the news and the day-to-day reality in terms of what we die from. This analysis also uses some interactive graphics so that the reader can view different versions of the same plot, such as scrolling through the plot over time.

  • Are first babies more likely to be late?, by Allen Downey: This is another relatively short analysis, but the author does a good job of documenting his data sources, stating the assumptions he makes, and describing what data were dropped from the analysis (i.e. babies born via C-section).

Sources:

This assignment is inspired and/or modified from other sources, including: