Final Exam Part II (60 points)
Note: For Final Exam Part II, I accept only typed answers in PDF format. You are free to use LaTeX, Word, R Markdown, or any other tool to generate the PDF file.
You can find answers to all the questions in the course materials that are available online. You may also use other sources such as Google, Wikipedia, and ChatGPT to find more information. But keep in mind that Wikipedia and ChatGPT may not give accurate answers (I am quite sure that ChatGPT currently gives misleading answers to DiD and IV questions), so use them at your own risk.
Explain the training–validation–test datasets workflow used in the practice of statistical/machine learning. (10 points)
Some statisticians believe that “people cannot simply use multiple linear regression and observational data for causal analysis.” Do you agree with them? What is the possible harm in interpreting the coefficients of a multiple linear regression model causally? (10 points)
What are the differences between an estimator, an estimate and an estimand? Explain using a regression example. (10 points)
Explain the three restrictions of a valid IV: relevance, exclusion, and as-good-as-random assignment. Which restriction has been “ignored” most in previous empirical research? (10 points)
What does the LATE (Local Average Treatment Effect) theorem say? What additional assumption (beyond the three restrictions in the previous question) do we need in interpreting the LATE estimand? (10 points)
What is a difference-in-differences estimator? Explain the parallel trends and no anticipation assumptions used in interpreting the difference-in-differences estimand. (10 points)