Lauren Goldman, Mai Kubin, Oshin Pandey

Literature Review

This section has a maximum of 10 points.

  • Find at least 3 research papers on your topic. State their findings and discuss the variables they control for, as well as the method used to establish a causal relationship.
  • State basic objectives and explain why the topic is important
  • State your contribution: what has been done in the literature and how you add to this
  • Grab the reader’s attention by presenting simple statistics, paradoxical evidence, topical examples, or challenges to common wisdom
  • Give a short summary of results

Data and Summary Statistics

This section has a maximum of 15 points. To earn the maximum number of points, you should address the points below, and your code should be clearly commented (explain what each line of code produces).

  • Briefly explain your data and methods

  • Define the outcome and the explanatory variables

  • Name the sources of your data and how they can be obtained

  • Present summary statistics and the sample size for the variables used in the analysis. Add a table note that explains variable definitions, units of measurement, and the like.

  • In order to obtain the maximum number of points for your analysis, you need to present summary statistics for all the variables used in a table. This should include the mean, standard deviation, minimum, maximum, and number of observations.

  • For all categorical variables, such as race and gender, you need to present them by category.

  • For trending variables, growth rates or graphs are more appropriate

  • Make sure that all variables are clearly labeled, and that you have the same number of observations for all the variables in your regression.
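The summary table described above can be sketched as follows. This is a minimal illustration in Python using only the standard library; the variable names and values are hypothetical toy data, not taken from any real source, and the same table can be produced with whatever software the project uses.

```python
import statistics

# Hypothetical toy data: each key is a variable, each value its observations.
data = {
    "colGPA": [3.0, 3.4, 2.8, 3.6, 3.2],
    "alcohol": [2.0, 0.0, 5.0, 1.0, 2.0],
}


def summarize(values):
    """Return mean, sample standard deviation, min, max, and N for one variable."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
        "min": min(values),
        "max": max(values),
        "n": len(values),
    }


# One summary row per variable
summary = {name: summarize(values) for name, values in data.items()}

# Print a plain-text table with one row per variable
print(f"{'variable':<10}{'mean':>8}{'sd':>8}{'min':>8}{'max':>8}{'n':>5}")
for name, s in summary.items():
    print(
        f"{name:<10}{s['mean']:>8.2f}{s['sd']:>8.2f}"
        f"{s['min']:>8.2f}{s['max']:>8.2f}{s['n']:>5d}"
    )

# Check that every variable has the same number of observations
same_n = len({s["n"] for s in summary.values()}) == 1
print("Same N for all variables:", same_n)
```

The final check mirrors the requirement that all regression variables share the same number of observations; if it fails, missing values have probably not been handled consistently.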

Graphs

This section has a maximum of 10 points.

  • To earn the maximum number of points, you need to create a graph of your variable of interest using ggplot. The graph should have a title, a legend, axis labels, and a figure caption.

  • Bar Charts: Useful for comparing categorical data. They can show the frequency or proportion of categories

  • Histograms: Suitable for displaying the distribution of a continuous variable

  • Line Graphs: Ideal for showing trends over time

  • Scatter Plots: Used to examine the relationship between two continuous variables. They can help identify correlations or patterns.

Methodology

This section has a maximum of 40 points. To earn the maximum number of points, you need to address the following:

  • Specify the population model you have in mind

  • Specify your model

  • Example: Effects of alcohol consumption on college GPA \[\mathit{colGPA} = \beta_0 + \beta_1 \mathit{alcohol} + \beta_2 \mathit{hsGPA} + \beta_3 \mathit{SAT} + \beta_4 \mathit{female} + u\]

  • Justify your specification choices

    • A convincing discussion of which variables to control for is essential. Are your results causal or not? Discuss potential concerns, such as omitted variables, along with the consequences of those problems.
  • Present your results in a table that includes coefficients, standard errors, R-squared, and the number of observations. Add control variables one by one, and report the results in separate columns along with stars representing the significance levels. Ensure that the number of observations remains constant across columns. If it does not, you may not have cleaned your data correctly.

  • Write about the magnitude and interpretation of your coefficients. Are they statistically significant? Do they have the expected sign? An unexpected sign may indicate a specification problem, for example, omitted variables.

  • When using OLS: Discuss why the exogeneity assumptions hold

  • When using IV/2SLS: Explain why your instrumental variables fulfill the IV assumptions

  • When using panel methods: Explain what the unobserved individual specific effects stand for, and how they are removed/accounted for

  • If using OLS, discuss the ideal quasi-experimental design to estimate a causal effect
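
To make the omitted-variables discussion concrete, here is a simplified two-regressor version of the alcohol example above. It is only a sketch: it assumes the true model contains just alcohol and hsGPA and that the usual OLS assumptions hold for that model.

\[\mathit{colGPA} = \beta_0 + \beta_1 \mathit{alcohol} + \beta_2 \mathit{hsGPA} + u\]

If hsGPA is left out and colGPA is regressed on alcohol alone, the resulting slope \(\tilde{\beta}_1\) satisfies

\[E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1\]

where \(\tilde{\delta}_1\) is the slope from regressing hsGPA on alcohol. The bias term \(\beta_2 \tilde{\delta}_1\) is zero only if hsGPA has no effect on colGPA (\(\beta_2 = 0\)) or is uncorrelated with alcohol in the sample (\(\tilde{\delta}_1 = 0\)). Its sign is the product of the signs of \(\beta_2\) and \(\tilde{\delta}_1\), which is why an unexpected coefficient sign can point to an omitted variable.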

Results

This section has a maximum of 20 points.

  • Use tables to present regression results
  • Include standard errors, R-squared, the overall F-statistic, and the number of observations
  • Use asterisks to denote the statistical significance level (∗ ≡ 10%, ∗∗ ≡ 5%, ∗∗∗ ≡ 1%)
  • Discuss economic significance
  • If coefficients do not have the expected signs, this may indicate there is a specification problem, for example, omitted variables
  • Relate differences between the results from different methods to the differences in the assumptions underlying these methods
  • If results can be shown using figures, it is a good idea to use them
  • All tables and figures should have notes to make them self-explanatory
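
The star convention above can be sketched as a small helper. This is an illustrative Python snippet; the function names `significance_stars` and `format_cell` are hypothetical, and the coefficient, standard error, and p-value in the usage line are made-up numbers, not results.

```python
def significance_stars(p_value):
    """Map a p-value to the star convention: * 10%, ** 5%, *** 1%."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


def format_cell(coefficient, std_error, p_value):
    """Format one table cell as 'coefficient + stars (standard error)'."""
    stars = significance_stars(p_value)
    return f"{coefficient:.3f}{stars} ({std_error:.3f})"


# Usage with made-up numbers: a coefficient significant at the 5% level
print(format_cell(-0.250, 0.100, 0.03))
```

Regression table packages usually apply such a rule automatically, so it is worth checking that their default thresholds match the ones stated in the table note.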

Conclusion

This section has a maximum of 5 points.

  • Briefly summarize (in one or a few sentences) your main results
  • State your key implications for theory, literature, and policy
  • Suggest directions for further research
  • Keep your conclusions section short: 2–3 paragraphs