This section has a maximum of 10 points.
This section has a maximum of 15 points. To earn the maximum number of points, you should answer the following three questions and comment your code clearly, stating what each line of code produces.
Briefly explain your data and methods
Define the outcome and the explanatory variables
Name the sources of your data and how they can be obtained
Present summary statistics and the sample size for the variables used in the analysis. Add a table note that explains variable definitions, units of measurement, and similar details.
To obtain the maximum number of points for your analysis, present summary statistics in a table for all variables used. The table should include the mean, standard deviation, minimum, maximum, and number of observations.
For categorical variables, such as race and gender, report the statistics by category.
For trending variables, growth rates or graphs are more appropriate.
Make sure that all variables are clearly labeled, and that you have the same number of observations for all the variables in your regression.
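As an illustration of the requirements above, here is a minimal sketch in base R. It assumes the built-in `mtcars` data set as a stand-in for your own data; the variable choices and category labels are purely illustrative.

```r
# Hypothetical example data: the built-in mtcars data set stands in for your own.
df <- mtcars[, c("mpg", "hp", "wt", "am")]

# Keep only rows with no missing values in any analysis variable,
# so every variable (and every regression) uses the same sample.
df <- df[complete.cases(df), ]

# Continuous variables: one row per variable with mean, SD, min, max, and N.
num_vars <- c("mpg", "hp", "wt")
summ <- data.frame(
  variable = num_vars,
  mean = sapply(df[num_vars], mean),
  sd   = sapply(df[num_vars], sd),
  min  = sapply(df[num_vars], min),
  max  = sapply(df[num_vars], max),
  n    = sapply(df[num_vars], function(x) sum(!is.na(x))),
  row.names = NULL
)
print(summ)

# Categorical variable: label the categories, then report counts and shares.
df$am <- factor(df$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
am_counts <- table(df$am)
am_shares <- prop.table(am_counts)
print(am_counts)
print(round(am_shares, 3))
```

Dropping incomplete rows once, before computing anything, is one simple way to guarantee that the summary table and the regressions share the same number of observations.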
This section has a maximum of 10 points.
To earn the maximum number of points, you need to create a graph of your variable of interest using ggplot. The graph should have a title, a legend, axis labels, and a figure caption.
Bar Charts: Useful for comparing categorical data. They can show the frequency or proportion of categories.
Histograms: Suitable for displaying the distribution of a continuous variable.
Line Graphs: Ideal for showing trends over time.
Scatter Plots: Used to examine the relationship between two continuous variables. They can help identify correlations or patterns.
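A graph that meets these requirements could look like the following sketch. It assumes the `ggplot2` package is installed and again uses the built-in `mtcars` data as a placeholder; the title, labels, and caption text are illustrative only.

```r
library(ggplot2)

# Hypothetical example: mtcars stands in for your own data.
df <- mtcars
df$am <- factor(df$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Scatter plot of two continuous variables, colored by a categorical one.
p <- ggplot(df, aes(x = wt, y = mpg, color = am)) +
  geom_point() +
  labs(
    title   = "Fuel efficiency and vehicle weight",
    x       = "Weight (1,000 lbs)",
    y       = "Miles per gallon",
    color   = "Transmission",  # legend title
    caption = "Source: built-in mtcars data set (1974 Motor Trend)."
  )

print(p)
```

Here `labs()` sets the title, axis labels, legend title, and caption in one place. If the report is written in R Markdown, the figure caption can alternatively be set with the `fig.cap` chunk option.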
This section has a maximum of 40 points. To earn the maximum number of points, you need to answer the following questions:
Specify the population model you have in mind
Specify your model
Example: Effects of alcohol consumption on college GPA: \[\text{colGPA} = \beta_0 + \beta_1\,\text{alcohol} + \beta_2\,\text{hsGPA} + \beta_3\,\text{SAT} + \beta_4\,\text{female} + u\]
Justify your specification choices
Present your results in a table that includes coefficients, standard errors, R-squared, and the number of observations. Add control variables one by one, and report the results in separate columns along with stars representing the significance levels. Ensure that the number of observations remains constant across columns. If it does not, you may not have cleaned your data correctly.
Write about the magnitude and interpretation of your coefficients. Are they statistically significant? Do they have the expected sign? An unexpected sign may indicate a specification problem, for example omitted variables.
When using OLS: discuss why exogeneity assumptions hold
When using IV/2SLS: Explain why your instrumental variables fulfill the IV assumptions
When using panel methods: Explain what the unobserved individual-specific effects stand for, and how they are removed or accounted for
If using OLS, discuss the ideal quasi-experimental design to estimate a causal effect
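The instruction to add control variables one by one while keeping the sample constant can be sketched in base R as follows. This is only an illustration under assumed names: `mtcars` stands in for your data, with `mpg` as the outcome, `wt` as the variable of interest, and `hp` and `am` as controls.

```r
# Hypothetical example: mtcars stands in for your own data;
# mpg is the outcome, wt the variable of interest, hp and am the controls.
vars <- c("mpg", "wt", "hp", "am")

# Fix the estimation sample once so every column uses the same observations.
df <- mtcars[complete.cases(mtcars[, vars]), vars]

# Add control variables one at a time.
m1 <- lm(mpg ~ wt, data = df)
m2 <- lm(mpg ~ wt + hp, data = df)
m3 <- lm(mpg ~ wt + hp + am, data = df)
models <- list(m1, m2, m3)

# The number of observations should be identical across columns.
n_obs <- sapply(models, nobs)
print(n_obs)

# R-squared for each specification.
r2 <- sapply(models, function(m) summary(m)$r.squared)
print(round(r2, 3))

# Coefficients, standard errors, t-statistics, and p-values for the full model.
print(coef(summary(m3)))
```

Restricting the data to complete cases on the full set of variables before estimating any model keeps the number of observations the same in every column. Packages such as `stargazer` or `modelsummary` can then arrange the models side by side with significance stars.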
This section has a maximum of 20 points.
This section has a maximum of 5 points.