This is the capstone research project for this class, blending statistical analysis and communication. The Upshot, a New York Times section, is widely read by academics and non-academics. Each post is generally well written, asks an interesting question, makes a key point, and gets to that point fast. It uses statistical analysis and visualizations as needed to drive the key point home and address likely reader questions. A non-specialist could read an Upshot post and come away smarter. Whatever you do later on, this is (probably) the kind of output you’ll need or want to produce to convey quantitative insights to a broad audience. (Recommended time: similar to what you would invest in a final research paper or project.)
I recommend starting the blog post early.
The blog post is an assignment where there is more subjectivity involved in its evaluation. To adhere to the principles of labor-based grading, I will allow for multiple submissions of the blog post. The idea here is that you can elicit feedback and evaluation on where on the effort tier the post stands and can choose to put in more time and effort to raise it another tier.
The effort tiers here are cumulative, i.e., to get a “complete” you must satisfy both the “basic” and “complete” requirements. This assignment is inspired by Professor Wolcott, and many of the criteria below are borrowed from her rubric. Blog posts at any level must include a bibliography.
Basic: At a minimum, a blog post should have a clearly defined and stated purpose/question. It should be obvious to the intelligent layperson reader why they would be investing time in your post. There should be at least one data visualization/graphic. The visualization should satisfy the following criteria:
See “Visualizations” down below for some helpful advice.
Complete: To be a more complete representation of your learning from this class, a blog post should include some data analysis. The post should contain some basic data description, robustness checks, a regression with appropriate interpretation/presentation of the results, and a “Statistical appendix” explaining the messy details for the interested reader. The data description should be enough that your reader understands what’s in the data and where it comes from (words are fine in the main text). The robustness checks should be enough to answer some of the questions the reader may have about the generalizability/quality of the finding(s) (one or two in the main text is enough). If you make any causal claims, you should explain the necessary assumptions clearly in words; the reader should be able to understand these assumptions and when they might be violated. The results should be presented in a way that a reader who hasn’t taken ECON 210 can follow (e.g., don’t just drop a table of R/Stata output in there; explain in words and maybe make a picture).
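As a minimal sketch of turning a regression into words rather than pasting a table, the R code below uses a hypothetical state-level data frame in the spirit of example B further down this page. The column names `legal` and `od_rate` and every number are invented for illustration; they are not real data.

```r
# Hypothetical toy data invented for illustration; these are NOT real figures.
states <- data.frame(
  state   = paste("State", 1:6),
  legal   = c(1, 1, 1, 0, 0, 0),      # X: 1 if recreational marijuana is legal
  od_rate = c(22, 25, 19, 18, 21, 17) # Y: overdoses per 100,000 residents
)

# Bivariate OLS regression of Y on X.
fit <- lm(od_rate ~ legal, data = states)

# Slope on the dummy (equals the difference in group means) and its 95% CI.
b  <- coef(fit)[["legal"]]
ci <- confint(fit, "legal", level = 0.95)

# Turn the estimate into a plain-language sentence instead of pasting a table.
direction <- if (b >= 0) "more" else "fewer"
msg <- sprintf(
  "On average, states that legalized had %.1f %s overdoses per 100,000 residents (95%% CI: %.1f to %.1f).",
  abs(b), direction, ci[1, 1], ci[1, 2]
)
cat(msg, "\n")
# This is an association; reading it causally requires the assumptions described above.
```

With this toy data the slope is 10/3, about 3.3, which is exactly the gap between the two group means (22 vs. about 18.7). The sentence, not the coefficient table, is what belongs in the main text; the full output can go in the statistical appendix.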
Extensive: To receive an “extensive” appraisal, you must (1) incorporate evidence outside the data, (2) write well, (3) do the analysis in both R and Stata, and (4) provide a replication package for your results. (1) involves choosing an appropriate set of peer-reviewed papers and integrating them seamlessly into your argument: summarize key points or issues in the cited sources, critically analyze those ideas, and relate them to the post’s purpose. (2) involves controlling pace, rhythm, and variety; words should be apt and precise, and sentences should flow smoothly and clearly open, develop, and close topics. Use the active voice. I recognize that “writing well” is subjective. I encourage you to go to the Writing Center for help with writing. On (4): a replication package is a folder (a OneDrive folder) that contains:
Generate a link to this replication package folder and include it in the statistical appendix of your blog post.
You are welcome to chat with me (office hours, schedule an appointment, Slack) about your post.
To get feedback on a submission, just submit your post on Canvas and send me a DM on Slack. Allow about 72 hours for a response. If there is a surge of submissions in the final weeks, I may have to extend the response time (all the more reason to start early!).
Note that there is a final deadline for the posts. This date is on Canvas. After this date, no resubmissions will be possible.
Aim for the style of this NY Times post, with the caveat that you’re reporting on your own analysis, not someone else’s.
Start writing down a few topics/subjects that you have interest in. Read about them. Keep a notepad with possible research questions.
Typically, a good research question to ask is specific and has an independent variable (X) and a dependent variable (Y).
Here are some examples:
A: What is the effect of exercise on mental health?
B: What is the effect of legalization of recreational marijuana on drug overdoses?
C: Is there racial inequality in criminal sentencing?
For A and B, you can write down a causal graph, while C is probably more of a descriptive analysis.
Following examples A, B, and C from above, here are some ideas:
A: Exercise = X and will be measured as the number of hours per week; Mental Health = Y and will be measured using a mental health scale
B: Legalization = X and will equal 1 if a state has legalized recreational marijuana and 0 if not; Drug Overdose = Y and will be the number of overdoses per 100,000 residents
C: Race = X and will be 1 if a defendant is a person of color and 0 otherwise; Sentencing = Y and will be the number of months to which a person is sentenced
Identifying the unit of observation is really important and will help you in your search for data. Remember, the unit of observation tells you what each row in your dataset is. Following examples A, B, and C from above:
A: Individual-level
B: State-level
C: Individual- or county-level
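To make X, Y, and the unit of observation concrete, here is a minimal R sketch of example B. It assumes hypothetical raw columns `state`, `legalized`, `overdoses`, and `population`; all names and numbers are invented for illustration. It builds X and Y as defined above and checks that each row is one state.

```r
# Hypothetical raw data: one row per state (the unit of observation for example B).
# All names and numbers are invented for illustration.
raw <- data.frame(
  state      = c("State A", "State B", "State C"),
  legalized  = c(TRUE, FALSE, TRUE),          # has the state legalized recreational marijuana?
  overdoses  = c(1500, 800, 300),             # count of overdoses
  population = c(5000000, 2000000, 1200000)   # resident population
)

# X: legalization dummy, 1 if legalized and 0 if not.
raw$legal <- as.integer(raw$legalized)

# Y: overdoses per 100,000 residents.
raw$od_per_100k <- raw$overdoses / raw$population * 100000

# No state appears twice, so each row really is one state.
stopifnot(!anyDuplicated(raw$state))
print(raw[, c("state", "legal", "od_per_100k")])
```

If your data instead had one row per state per year, the unit of observation would be state-year, and the duplicate check would need to use both columns.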
There are many ways to find data. To be honest, I usually do a first pass on Google. Here are some additional ideas:
After you have done some preliminary searches and aren’t finding what you want, you can schedule an appointment with Ryan Clement directly.
To make this appointment as helpful as possible, you should let Ryan know:
Ryan is amazing and thus popular, especially late in the semester when everyone is working on a final project. You will probably increase your chances of meeting with Ryan if you plan accordingly.
It is very easy to procrastinate. I will have milestones in the form of periodic quizzes to check in on your progress.
Professor Rao, Bea Lea, and I recommend watching both videos. They should put you in a good position to create a strong data visual.
A brief introduction to data visualization.
Advice on using ggplot to create strong visualizations
A knitted HTML file. Why? I would like to be able to create a portfolio of posts. Note that I will ask for permission before including any post in a portfolio. This collection may be presented to the college community as examples of the amazing work that students are doing.
Include it with the main text. Your final post should be a single file with the main text, figures, bibliography, and any appendices.
Your code and warning messages should not be visible in your final HTML post. Typically, you have two types of code chunks:
- Chunks that manipulate the data (e.g., load, mutate, join, merge). You don’t want to see the code, messages, or output. Simply add include=FALSE to the chunk header; the code still runs.
- Chunks that generate a result (e.g., a ggplot). You don’t want to see the code or messages, but you do want to see the result. Add echo=FALSE, message=FALSE, warning=FALSE to the chunk header.
See this post for more details.
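As a minimal sketch, the two chunk types might look like this inside an R Markdown (.Rmd) file. The chunk labels, the file name `mydata.csv`, and its columns `legalized` and `od_per_100k` are hypothetical placeholders (borrowed from example B above), and the chunks assume the dplyr and ggplot2 packages are installed.

````markdown
```{r load-data, include=FALSE}
# Data-manipulation chunk: runs, but neither code nor output appears in the HTML.
library(dplyr)
df <- read.csv("mydata.csv")                      # hypothetical file name
df <- mutate(df, legal = as.integer(legalized))   # assumes a logical column `legalized`
```

```{r overdose-plot, echo=FALSE, message=FALSE, warning=FALSE}
# Result chunk: code, messages, and warnings are hidden; only the figure is shown.
library(ggplot2)
ggplot(df, aes(x = factor(legal), y = od_per_100k)) +
  geom_boxplot() +
  labs(x = "Recreational marijuana legal (1 = yes)",
       y = "Overdoses per 100,000 residents")
```
````

An alternative is a setup chunk near the top of the file that calls `knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)`, which makes these the defaults for every chunk so you don’t have to repeat them.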
An example markdown file and html output using Problem Set 9 data.
Think of your post in layers. The main text is for a general audience who cares a lot about your question and findings, but not so much about your methods and the details. The statistical appendix is for a more specialized audience who cares a lot about your methods and the details as well, but doesn’t want to see your code. The replication package is for a still-more specialized audience who wants to have the data and recreate your analysis themselves. Only the final layer of readers cares about your code, and even they don’t want to see it woven into your text.
At the Basic tier, you should describe your data well enough that a reader who is not steeped in your question understands what the variables are, how they were collected/constructed, what different values mean, and some of the issues which might come up in applying the data to your question. You should tell the reader how many observations you have, what time periods/regions they cover, what the unit of observation is, and what the units of measurement are. You can express this in 1-2 paragraphs.
At the Complete tier, you should include summary statistics of the variables in your statistical appendix. If many observations are missing, you should describe briefly whether this is likely to be a problem for your analysis. If you had to construct measures for your study, you should describe how you constructed them and what assumptions your procedure entailed. You can express all this in a table and 1-3 paragraphs (though if you need more space, you may use it).
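One way to build that summary table and check missingness is sketched below in base R. The data frame, its columns `hours_exercise` and `mh_score` (echoing example A above), and all values are hypothetical.

```r
# Hypothetical data frame with some missing values (invented numbers).
df <- data.frame(
  hours_exercise = c(2, 5, NA, 3, 7, 0),     # X: hours of exercise per week
  mh_score       = c(14, 18, 16, NA, 21, 11) # Y: mental health scale score
)

# Summary statistics: non-missing N, mean, SD, min, and max for each variable.
stats <- data.frame(
  variable  = names(df),
  n         = sapply(df, function(v) sum(!is.na(v))),
  mean      = sapply(df, mean, na.rm = TRUE),
  sd        = sapply(df, sd, na.rm = TRUE),
  min       = sapply(df, min, na.rm = TRUE),
  max       = sapply(df, max, na.rm = TRUE),
  row.names = NULL
)
print(stats, digits = 3)

# Share of observations missing for each variable, to discuss in the appendix.
missing_share <- colMeans(is.na(df))
print(missing_share)
```

Here each variable is missing for 1 of 6 rows; in your write-up you would then say whether those missing rows are likely to differ systematically from the rest.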
Your main text should be no more than a 15-20 minute read, including time spent looking at figures. The average adult (apparently) reads at 225-250 words per minute. Suppose you include 2 figures, and each takes (no more than) 3 minutes to interpret and digest. That gives you a total budget of 3150 words, maximum.
Do not make it too long. I would recommend targeting 15 minutes for the main text, giving you a tighter budget of around 2025 words (including 2 figures). You can always put more detail in your appendix and refer to it (e.g. “This is robust to alternative definitions of my outcome variable; see Appendix for details.”)
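The word budgets above come from subtracting figure time from total reading time and multiplying by the conservative 225 words-per-minute rate. With $T$ the reading time in minutes and $F$ the number of figures (3 minutes each):

```latex
\begin{aligned}
W &= 225\,(T - 3F) \\
W_{\max} &= 225\,(20 - 3 \cdot 2) = 225 \times 14 = 3150 \text{ words} \\
W_{\text{target}} &= 225\,(15 - 3 \cdot 2) = 225 \times 9 = 2025 \text{ words}
\end{aligned}
```

By the same arithmetic, each additional figure costs about $3 \times 225 = 675$ words of text.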
You have some creative leeway here, but please make sure the flow is readable and clear. You don’t need to have an “Introduction” section header, but using section headers is a good idea. Use paragraphs. Your title should be informative. Your main question should be stated clearly within the first couple hundred words (I should not have to wait more than a minute to learn what your question is).
Yes, I expect you to use R or Stata for graphs in the blog post. A big part of the class is learning to use statistical programming software to produce high-quality analysis and graphics; you should incorporate that learning into your final project.