Project Block 2: Advanced Statistics
1 Part 1 - Advanced topics in multiple regression
In this part, you have to improve the multiple regression model you ended up with in Block 1 of the course (here is a description of the previous project).
The topics we will learn and apply in this part are the following:
Multiple regression with categorical variables
Multiple regression with interaction effects
You have to do the following:
Make sure that all the variable calculations and descriptive statistics are correct and complete (address the feedback you received for this part). Make any corrections and complete whatever you missed in the previous Project 1.
1.1 Multiple Regression - advanced topics
1.1.1 Calculation of Variables
For the complete historical dataset of annual fiscal years (fiscalmonth=12 for all quarter-years), you have to calculate the following new independent variables:
Firm size as a categorical variable. For each quarter, you have to label firms in 3 equal-sized groups (small, medium, big) according to the market value of the firms.
Calculate the corresponding dummy (binary) variables for firm size following the dummy encoding method (not one-hot encoding).
Remember that the dependent variable is the annual stock return one quarter in the future.
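The size labels and dummy variables described above can be sketched as follows. This is only a minimal illustration on made-up data: the column names `firm`, `q`, and `marketvalue` are assumptions, so replace them with the names in your own dataset. Dummy encoding keeps k-1 columns for k categories; here "small" is dropped and acts as the reference group.

```python
import numpy as np
import pandas as pd

# Synthetic panel: 9 firms in each of 2 quarters (made-up values, hypothetical column names).
rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "firm": np.tile(np.arange(9), 2),
    "q": np.repeat(["2020q1", "2020q2"], 9),
    "marketvalue": rng.lognormal(mean=10, sigma=1, size=18),
})

# Tercile code (0, 1, 2) of market value, computed separately within each quarter.
tercile = panel.groupby("q")["marketvalue"].transform(
    lambda s: pd.qcut(s, 3, labels=False)
)
panel["size"] = pd.Categorical(
    tercile.map({0: "small", 1: "medium", 2: "big"}),
    categories=["small", "medium", "big"],
)

# Dummy encoding: drop the first category ("small"), which becomes the reference group.
dummies = pd.get_dummies(panel["size"], prefix="size", drop_first=True, dtype=int)
panel = pd.concat([panel, dummies], axis=1)
print(panel.head())
```

A small firm has `size_medium = 0` and `size_big = 0`, which is why the coefficients of the two dummies are read as differences with respect to small firms.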
1.1.2 Multiple regression model
Make sure you propose remedial measures for high-leverage (extreme) values of variables and possible outliers.
Run a first multiple regression model to examine whether the financial ratios and firm size explain/predict future annual stock returns (one quarter later).
Interpret your model
State the 3 regression equations for each size group (small, medium and big size)
Interpret each coefficient (its beta and its statistical significance). Remember that the coefficients of categorical dummies have a special interpretation.
Add interaction terms between firm size and earnings per share deflated by price. In addition, add the financial leverage ratio along with its quadratic effect. Re-run the regression and INTERPRET the model.
State the regression equations for each size group
Interpret the regression coefficients and their level of significance. Note that you have coefficients for direct effects and interaction effects.
Did the R-squared improve?
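The two regression steps above can be sketched as follows. This is a hedged illustration on synthetic data, not the expected solution: the variable names (`epsp`, `finlev`, `f_ret`, `medium`, `big`) are hypothetical, the numbers are made up, and clipping at the 1st and 99th percentiles is only one possible remedial measure for extreme values. The last loop shows how the equation of each size group follows from the dummy and interaction coefficients.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for the real panel (all values are made up).
rng = np.random.default_rng(42)
n = 600
df = pd.DataFrame({
    "epsp": rng.normal(0.05, 0.10, n),     # earnings per share deflated by price
    "finlev": rng.uniform(0.0, 1.0, n),    # financial leverage ratio
    "size": rng.choice(["small", "medium", "big"], n),
})
df["f_ret"] = (0.02 + 0.8 * df["epsp"] + 0.05 * (df["size"] == "big")
               + rng.normal(0, 0.2, n))    # future stock return

# One possible remedial measure: winsorize epsp at the 1st and 99th percentiles.
low, high = df["epsp"].quantile([0.01, 0.99])
df["epsp"] = df["epsp"].clip(low, high)

# Dummy encoding with "small" as the reference group, plus the quadratic term.
df["medium"] = (df["size"] == "medium").astype(int)
df["big"] = (df["size"] == "big").astype(int)
df["finlev_sq"] = df["finlev"] ** 2

# Model 1: ratios plus size dummies. Model 2: adds interactions and the quadratic effect.
m1 = smf.ols("f_ret ~ epsp + finlev + medium + big", data=df).fit()
m2 = smf.ols(
    "f_ret ~ epsp + medium + big + epsp:medium + epsp:big + finlev + finlev_sq",
    data=df,
).fit()

# Intercept and epsp slope of each size group in model 2.
b = m2.params
equations = {}
for group in ["small", "medium", "big"]:
    shift = 0.0 if group == "small" else b[group]
    tilt = 0.0 if group == "small" else b[f"epsp:{group}"]
    equations[group] = (b["Intercept"] + shift, b["epsp"] + tilt)
    print(f"{group:>6}: intercept = {equations[group][0]:.4f}, "
          f"epsp slope = {equations[group][1]:.4f}")

print(f"R-squared: {m1.rsquared:.4f} -> {m2.rsquared:.4f}")
print(f"Adjusted R-squared: {m1.rsquared_adj:.4f} -> {m2.rsquared_adj:.4f}")
```

Because model 2 contains every regressor of model 1, its R-squared cannot be lower; the adjusted R-squared is the fairer comparison when judging whether the added terms help.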
1.1.3 Exploring other remedial measures
To be announced
2 Part 2 - Forecasting the IGAE index for Mexico
You have to design an ARIMA-SARIMA model to forecast the “Indicador Global de la Actividad Económica” (IGAE) index. INEGI publishes a monthly index of the general economic activity for each industry and for the whole economy. You can find and download this index by Googling it (“inegi bie igae”).
Download the csv file and import it in Python.
You have to do the following:
Calibrate an ARIMA-SARIMA model for this index. Follow the calibration steps explained in class (HERE is a document with the calibration process).
You have to document your data management steps and the calibration process
You have to include a dummy variable as an exogenous variable (X variable) in the model to consider the impact of economic recessions over time. Based on the largest historical % declines, decide which months are considered crisis months.
You have to CLEARLY INTERPRET the final calibrated model in YOUR OWN WORDS. Make sure you interpret:
The coefficients of the autoregressive p term(s) and the seasonal autoregressive P term(s) and their statistical significance
The coefficient of the exogenous crisis variable and its significance
You have to forecast the industrial index for Querétaro up to 2030. What is the expected average annual growth for 6-7 years? Make your own assumption about the future crisis effect.
3 Evaluation criteria
This is an individual assignment. Each student must submit original work. Please avoid anything that could be interpreted as plagiarism.
For each deliverable you have to submit a Jupyter Notebook (.ipynb and .html files).
The evaluation criteria will be:
Part | Weight |
---|---|
Multiple regression topics | 60% |
Time series forecasting models | 40% |
Each part will be graded as follows:
Section | Weight | Notes |
---|---|---|
Data management and Descriptive statistics | 40% | Document your work. You have to explain what you did and you also have to clearly respond to each of the business questions |
Statistical modeling | 50% | Document your work. Make sure you provide a very clear interpretation of your models. Remember that you must interpret each coefficient and its corresponding statistical significance |
Conclusion | 10% | Provide a concise conclusion of your analysis according to the results of your models. Make sure you respond to the main business questions |