This document outlines three opportunities to earn additional points:

  1. Bonus analysis notebook: Due the Wednesday after spring break at 11PM. (On Canvas)

  2. Tidy Tuesdays: Due April 29 at 5PM. (Canvas assignment to come, but you can get started with the instructions below)

  3. JedR write-up: Due April 29 at 5PM. (Canvas assignment to come, but you can get started with the instructions below)

Bonus analysis notebook

There are two prompts worth 10 points each and an opportunity to pose and answer your own question. These build on the dataset we’ve been working with for the last couple of weeks. In case the dataset wasn’t quite right in your last homework, I have attached it via email, so work from that CSV.

Tidy Tuesday - creating mini data stories

Tidy Tuesday is a social data activity where every Tuesday people work from the same dataset to tidy/clean the data and produce and publish visualizations. Working professionals use it to stay current and keep thinking creatively, and as a student it’s also a great way to stretch yourself a little.

Typically, once the dataset is released, people write code to get the data into shape, then create a visualization and post it, along with their code, on some form of social media with the hashtag #TidyTuesday. There are more instructions here, where you will also find each week’s data, both current and archived: https://github.com/rfordatascience/tidytuesday/tree/main
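
If you want a head start on pulling a week’s data into R, the tidytuesdayR package is one common route. Here is a minimal sketch; the date and dataset name are placeholders, so check the repository’s readme for the week you actually pick:

    # Minimal sketch of loading one week's Tidy Tuesday data in R.
    # The date and the dataset name below are placeholders.
    library(tidytuesdayR)

    tuesdata <- tt_load("2023-04-04")  # loads everything released that week
    my_data <- tuesdata$soccer         # each week's datasets are elements of the returned list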

So, you have the opportunity to earn up to 30 points for each Tidy Tuesday you complete. You may complete up to three. You do not have to turn around a Tidy Tuesday the same week it is posted. You can look through the archives and find a dataset you’re interested in.

You should browse through #TidyTuesday on social media to see what others have done, understand how it works, and draw inspiration. Twitter/X probably has the most examples. But it goes without saying: copying someone else’s code or project is not okay.

Some of the projects people have done may seem intimidating. I’m not looking for anything super fancy. It would be nice to have a finding that tells a little story, not just a ‘water is wet’ result that you would never consider newsy. But the main purpose is simply to get practice with coding and putting together something coherent.

We haven’t covered visualization yet but will soon; you can get started with a project anyway. Many of the visualizations on #TidyTuesday will have been produced in R/ggplot and then perhaps polished in Illustrator, but you are welcome (and encouraged) to produce your visualization in Datawrapper or Flourish.

We also haven’t covered uploading your code to GitHub yet, and that will be part of this assignment. If you are otherwise ready to go and that’s your last remaining step, let me know.

I would encourage you to check in with me before you post. Certainly feel free to check in any time you have a question or get stuck, but also think of me as an editor: you would typically have someone else look at your work before publishing.

Okay, so that’s most of the information you need, I think. Here are the specific steps required:

10 points:

  • Use R to clean and analyze a Tidy Tuesday dataset. You can join it with other data sourced elsewhere too if you want to.

  • Use R to get the dataset into usable shape for visualization; in other words, export a CSV you’ve created in R and use that directly for your viz without any further manipulation. (A short sketch of what this might look like follows this list.)
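
To make those two bullets concrete, here is a rough sketch of the shape I have in mind, with made-up column names; your actual cleaning will depend on the dataset you choose:

    # Rough sketch: clean/summarise in R, then export a viz-ready CSV.
    # The column names (category, value) are made up for illustration.
    library(dplyr)
    library(readr)

    viz_ready <- my_data |>
      filter(!is.na(value)) |>               # drop rows missing the measure
      group_by(category) |>
      summarise(avg_value = mean(value)) |>  # one row per category
      arrange(desc(avg_value))

    write_csv(viz_ready, "tidytuesday_viz.csv")  # use this file directly in your viz tool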

10 points:

  • Use Datawrapper or Flourish (or R or another tool of your choice) to create a visualization that highlights an interesting finding from your analysis. (If you stay in R, see the brief ggplot sketch below.)
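
If you do stay in R for the chart, something as simple as this is plenty (it builds on the viz_ready summary sketched above):

    # Placeholder ggplot sketch: a simple bar chart of the summary table.
    library(ggplot2)

    ggplot(viz_ready, aes(x = reorder(category, avg_value), y = avg_value)) +
      geom_col() +
      coord_flip() +
      labs(title = "Your finding in one sentence", x = NULL, y = "Average value")

    ggsave("tidytuesday_chart.png", width = 8, height = 5)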

5 points:

  • Communicate your mini data story: publish your visualization along with ~3 sentences (or more) of explanation about the finding. This has to be publicly available. You can create a social media account just for this purpose if you don’t want it on your regular account or if you don’t have one. (But it’s also a good opportunity to show potential employers what you can do!)

5 points:

  • Publish your code, including all data required to run your code, on GitHub.

I will also set up a Canvas assignment where you can submit links to what you have published and formally “turn in” your work.

JedR write-up

At NICAR, I learned about this work-in-progress code-practice site from a prof at UT Austin: https://utdata.github.io/jedr-academy/trials/. It uses the starwars dataset, a practice dataset that’s built into the dplyr package.
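
If you’d like to peek at that dataset before you start the trials, it loads with dplyr; for example:

    # The starwars practice dataset ships with dplyr -- no download needed.
    library(dplyr)

    glimpse(starwars)         # column names and types
    count(starwars, species)  # a quick tally by species, for example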

We have covered most of what’s in it, or at least started to, with the exception of plotting. It could be useful as a way to practice what you’ve learned, or maybe it will present the material in a new way that helps you understand it better.

If you complete the exercises (called trials), a Canvas assignment will ask you a few feedback questions: whether and how it was helpful, whether you would have appreciated something like this earlier in the learning process, whether it works best as a way to practice, etc. I’ll post that assignment soon, but it won’t be due until our last class.