Evidence_v1, Algorithms and Data Analysis
Download the same datasets we used in the workshops, using the code from Workshop 3 (https://rpubs.com/cdorante/fz2022p_w3).
You can go to Workshop 2 (https://rpubs.com/cdorante/fz2022p_w2) to see the data dictionary for each dataset.
As in the previous workshop, merge the two datasets (usdata and usfirms) using a left join. Remember that usdata, the panel dataset with historical annual financial data for all US firms, is the left dataset, and usfirms, a cross-sectional dataset with general information about the firms of the S&P500 index, is the right dataset.
You MUST keep ONLY firm-years with status='active'.
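A minimal pandas sketch of the left join and the status filter. It assumes the ticker column is called `firm` in usdata and `empresa` in usfirms (check the Workshop 2 data dictionary for the actual names), and it uses tiny hypothetical tables instead of the course files.

```python
import pandas as pd

def merge_and_filter(usdata: pd.DataFrame, usfirms: pd.DataFrame) -> pd.DataFrame:
    """Left-join firm information onto the panel and keep active firms only.

    ASSUMPTION: the ticker column is 'firm' in usdata and 'empresa' in usfirms.
    """
    # how="left" keeps every firm-year of usdata (the left dataset)
    merged = pd.merge(usdata, usfirms, how="left",
                      left_on="firm", right_on="empresa")
    # Keep only firm-years whose firm status is 'active'
    return merged[merged["status"] == "active"].copy()

# Hypothetical inputs with made-up values (not the course data)
usdata = pd.DataFrame({"firm": ["A", "A", "B"], "year": [2020, 2021, 2021],
                       "revenue": [100.0, 120.0, 50.0]})
usfirms = pd.DataFrame({"empresa": ["A", "B"], "status": ["active", "cancelled"],
                        "naics1": ["Manufacturing", "Retail"]})

active = merge_and_filter(usdata, usfirms)
print(active)
```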
1 Calculating financial variables
For the variable calculation, you MUST EXPLAIN IN YOUR OWN WORDS (IN CAPITAL LETTERS) how you created the variables. If you use a Gemini or ChatGPT prompt, write the prompt you used in quotes.
Using the merged dataset, write the code to calculate the following financial variables and financial ratios for all firm-years:
1. Create financial variables
Gross profit (grossprofit) = Revenue - Cost of goods sold (cogs)
Earnings before interest and taxes (ebit) = Gross profit - Sales & general administrative expenses (sgae) - depreciation
Net Income (netincome) = ebit + otherincome + extraordinaryitems - finexp (financial expenses) - incometax
Annual market return: calculate the annual return for all firm-years as the continuously compounded percentage change of the adjusted stock price (adjprice). Because this is a panel dataset, be careful that the return of each firm's first year does not use the stock price of another firm. Hint: use the shift function grouped by firm so that the annual return of each firm is calculated only with that firm's own stock prices.
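The three formulas and the grouped shift can be sketched as follows. This assumes the panel has `firm` and `year` columns, that each firm's years are consecutive (shift(1) takes the previous row, not literally year t-1), and it names the return column `r`, which is a hypothetical name. The return is in decimal form; multiply by 100 for a percentage.

```python
import numpy as np
import pandas as pd

def add_financial_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Add grossprofit, ebit, netincome and the annual log return (r)."""
    # Sort so that shift(1) inside each firm refers to the previous year
    df = df.sort_values(["firm", "year"]).copy()
    df["grossprofit"] = df["revenue"] - df["cogs"]
    df["ebit"] = df["grossprofit"] - df["sgae"] - df["depreciation"]
    df["netincome"] = (df["ebit"] + df["otherincome"] + df["extraordinaryitems"]
                       - df["finexp"] - df["incometax"])
    # Previous-year price of the SAME firm; NaN in each firm's first year
    prev_price = df.groupby("firm")["adjprice"].shift(1)
    # Continuously compounded return: ln(P_t) - ln(P_{t-1})
    df["r"] = np.log(df["adjprice"]) - np.log(prev_price)
    return df

# Hypothetical two-firm panel with made-up numbers, deliberately unsorted
panel = pd.DataFrame({
    "firm": ["B", "A", "A", "B"],
    "year": [2020, 2020, 2021, 2021],
    "revenue": [200.0, 100.0, 120.0, 210.0],
    "cogs": [120.0, 60.0, 70.0, 130.0],
    "sgae": [30.0, 20.0, 22.0, 31.0],
    "depreciation": [10.0, 5.0, 6.0, 11.0],
    "otherincome": [1.0, 0.0, 2.0, 1.0],
    "extraordinaryitems": [0.0, 0.0, 0.0, 0.0],
    "finexp": [4.0, 2.0, 2.0, 4.0],
    "incometax": [8.0, 3.0, 5.0, 9.0],
    "adjprice": [20.0, 10.0, 11.0, 18.0],
})
out = add_financial_variables(panel)
print(out[["firm", "year", "grossprofit", "ebit", "netincome", "r"]])
```

Without the groupby, the 2020 row of firm B would be computed with the 2021 price of firm A.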
2. Using the same panel dataset, create columns for the following financial ratios:
Here you can use the shift function to get the value of total assets one year earlier. Make sure you group by firm so that you do not use the totalassets of another firm to calculate the roa of a firm.
- Return on Assets (roa):
roa_{t}=\frac{netincome_{t}}{totalassets_{t-1}}
- Operational earnings per share (oeps): oeps = ebit / sharesoutstanding
- Operational EPS deflated by stock price (oepsp): oepsp = oeps / originalprice
- Cash flow to assets ratio (cfr):
cfr_{t}=\frac{cashflowoper_{t}}{totalassets_{t-1}}

You have to winsorize the price-deflated operational EPS (oepsp) at the 1st and 99th percentiles, and name the result epspw.
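A sketch of the four ratios and the winsorization, assuming `firm` and `year` columns and consecutive years per firm. Winsorizing here means clipping oepsp to its 1st and 99th percentiles computed over all firm-years pooled (an assumption; a per-year cutoff is another option). Infinite values from a zero share count or price are turned into NaN before the percentiles are computed. The panel below is synthetic.

```python
import numpy as np
import pandas as pd

def add_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Add roa, cfr, oeps, oepsp and the winsorized epspw."""
    df = df.sort_values(["firm", "year"]).copy()
    # Total assets of the SAME firm one year earlier (NaN in each firm's first year)
    lag_assets = df.groupby("firm")["totalassets"].shift(1)
    df["roa"] = df["netincome"] / lag_assets
    df["cfr"] = df["cashflowoper"] / lag_assets
    df["oeps"] = df["ebit"] / df["sharesoutstanding"]
    df["oepsp"] = (df["oeps"] / df["originalprice"]).replace([np.inf, -np.inf], np.nan)
    # Pooled 1st and 99th percentiles (NaN values are ignored by quantile)
    lo, hi = df["oepsp"].quantile([0.01, 0.99])
    # Values below lo become lo, values above hi become hi
    df["epspw"] = df["oepsp"].clip(lower=lo, upper=hi)
    return df

# Hypothetical synthetic panel: 20 firms x 5 years
rng = np.random.default_rng(0)
n_firms, n_years = 20, 5
toy = pd.DataFrame({
    "firm": np.repeat([f"F{i}" for i in range(n_firms)], n_years),
    "year": np.tile(np.arange(2015, 2015 + n_years), n_firms),
})
for col in ["netincome", "cashflowoper", "ebit"]:
    toy[col] = rng.normal(10, 5, len(toy))
toy["totalassets"] = rng.uniform(100, 200, len(toy))
toy["sharesoutstanding"] = rng.uniform(1, 10, len(toy))
toy["originalprice"] = rng.uniform(5, 50, len(toy))

ratios = add_ratios(toy)
print(ratios[["oepsp", "epspw"]].describe())
```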
2 DESIGN A MACHINE LEARNING MODEL
Prepare the data to design and run a machine learning model that predicts whether a stock's annual return beats the average annual return of its industry in the same year. Use naics1 as the industry.
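One way to build the dependent variable, sketched under these assumptions: the annual return column is called `r`, the industry average is the equal-weighted mean of all firms in the same naics1 and year (including the firm itself), and the 0/1 target is named `beat`. Both `r` and `beat` are hypothetical names. Firm-years without a return get a missing label instead of 0.

```python
import pandas as pd

def add_target(df: pd.DataFrame, ret_col: str = "r") -> pd.DataFrame:
    """Add the industry-year mean return and a 0/1 'beat' indicator."""
    df = df.copy()
    # Mean annual return of each industry (naics1) in each year, broadcast to every row
    df["ind_ret"] = df.groupby(["year", "naics1"])[ret_col].transform("mean")
    # 1.0 if the firm's return is above its industry-year average, 0.0 otherwise
    beats = (df[ret_col] > df["ind_ret"]).astype(float)
    # Leave the label missing (NaN) where the return itself is missing
    df["beat"] = beats.where(df[ret_col].notna())
    return df

# Hypothetical single-year example
toy = pd.DataFrame({
    "firm": ["A", "B", "C", "D", "E"],
    "year": [2021] * 5,
    "naics1": ["Retail", "Retail", "Retail", "Mining", "Mining"],
    "r": [0.10, 0.02, None, -0.05, 0.01],
})
labeled = add_target(toy)
print(labeled)
```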
For the model, use the following explanatory variables (X predictors):
epspw (winsorized)
Fscore = F1 + F2 + F3 + F4
You can calculate the F accounting signals as:
F1 = 1 if roa > 0; = 0 otherwise
F2 = 1 if cfr > 0; = 0 otherwise
F3 = 1 if the change in roa (roa at t minus roa at t-1) is positive; = 0 otherwise
F4 = 1 if cfr > roa; = 0 otherwise
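The four signals map directly to boolean comparisons. This sketch assumes `firm`, `year`, `roa` and `cfr` columns; the change in roa is taken within each firm with a grouped diff. Note the design choice: any comparison involving a missing value evaluates to False, so a firm-year with missing roa, cfr or previous roa gets 0 for that signal. Dropping those rows instead is an equally defensible choice.

```python
import pandas as pd

def add_fscore(df: pd.DataFrame) -> pd.DataFrame:
    """Add the F1..F4 accounting signals and Fscore (0 to 4)."""
    df = df.sort_values(["firm", "year"]).copy()
    df["F1"] = (df["roa"] > 0).astype(int)
    df["F2"] = (df["cfr"] > 0).astype(int)
    # roa_t - roa_{t-1} within the same firm (NaN in each firm's first year)
    droa = df.groupby("firm")["roa"].diff()
    df["F3"] = (droa > 0).astype(int)
    df["F4"] = (df["cfr"] > df["roa"]).astype(int)
    df["Fscore"] = df[["F1", "F2", "F3", "F4"]].sum(axis=1)
    return df

# Hypothetical panel, deliberately unsorted
toy = pd.DataFrame({
    "firm": ["B", "A", "B", "A", "A"],
    "year": [2021, 2019, 2020, 2021, 2020],
    "roa": [0.05, None, 0.04, 0.03, 0.05],
    "cfr": [0.00, None, 0.06, -0.01, 0.08],
})
scored = add_fscore(toy)
print(scored)
```

Firm B's first year gets F3 = 0 because the grouped diff does not look back at firm A's last roa.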
Design and run a logistic regression to examine whether earnings per share deflated by price, winsorized (epspw), and Fscore are related to the probability that a firm's annual stock return is higher than its industry average in the same year.
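A sketch of the full-sample logistic regression with the statsmodels formula API. The 0/1 dependent variable is called `beat` here, which is a hypothetical name, and the data are synthetic with an invented data-generating process, only so the block runs on its own.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_full_logit(df: pd.DataFrame):
    """Logit of beat (1 = beats industry-year average) on epspw and Fscore."""
    data = df.dropna(subset=["beat", "epspw", "Fscore"])
    # disp=0 suppresses the optimizer's convergence messages
    return smf.logit("beat ~ epspw + Fscore", data=data).fit(disp=0)

# Hypothetical synthetic data (invented coefficients, illustration only)
rng = np.random.default_rng(42)
n = 2000
demo = pd.DataFrame({"epspw": rng.normal(0.0, 0.1, n),
                     "Fscore": rng.integers(0, 5, n)})
logit_p = -1.0 + 2.0 * demo["epspw"] + 0.5 * demo["Fscore"]
demo["beat"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit_p))).astype(int)

res = fit_full_logit(demo)
print(res.summary())
print(np.exp(res.params))  # odds ratios
```

Each beta is the change in the log-odds of beating the industry for a one-unit increase in that variable, holding the other constant; exp(beta) is the corresponding multiplicative change in the odds. With real data, look at the sign, the size and the p-value of each coefficient.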
In addition, you must run the corresponding MACHINE LEARNING model.
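A minimal sketch of the machine learning version: split the firm-years into training and testing samples, estimate the logit with the training sample only, and classify the testing sample with a probability cutoff. The 80/20 random split, the 0.5 cutoff, the seed and the `beat` column name are all assumptions; because this is a panel, splitting by year (train on earlier years, test on later ones) is a reasonable alternative. The data are synthetic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.model_selection import train_test_split

def train_and_predict(df: pd.DataFrame, test_size: float = 0.2,
                      cutoff: float = 0.5, seed: int = 123):
    """Fit the logit on a training sample and classify the testing sample."""
    data = df.dropna(subset=["beat", "epspw", "Fscore"])
    # 1) Random split of firm-years into training and testing samples
    train, test = train_test_split(data, test_size=test_size, random_state=seed)
    # 2) Estimate the logistic model with the training sample only
    model = smf.logit("beat ~ epspw + Fscore", data=train).fit(disp=0)
    # 3) Predicted probabilities for the testing sample, aligned to its index
    prob = pd.Series(np.asarray(model.predict(test)), index=test.index)
    # 4) Classify: predict 1 (beats industry) when probability >= cutoff
    y_pred = (prob >= cutoff).astype(int)
    y_test = test["beat"].astype(int)
    return model, y_test, y_pred

# Hypothetical synthetic data standing in for the prepared panel
rng = np.random.default_rng(7)
n = 2000
demo = pd.DataFrame({"epspw": rng.normal(0.0, 0.1, n),
                     "Fscore": rng.integers(0, 5, n)})
p = 1 / (1 + np.exp(-(-1.0 + 2.0 * demo["epspw"] + 0.5 * demo["Fscore"])))
demo["beat"] = (rng.uniform(size=n) < p).astype(int)

model, y_test, y_pred = train_and_predict(demo)
print(model.params)
```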
You have to EXPLAIN the following IN YOUR OWN WORDS:
1) HOW YOU CREATED THE ACCOUNTING F SIGNALS (VARIABLES)
2) EXPLAIN THE CODE YOU USED TO RUN THE LOGISTIC MODEL
3) RUN THE FIRST VERSION OF THE MODEL WITH ALL OBSERVATIONS (BEFORE THE MACHINE LEARNING MODEL), AND INTERPRET THE beta COEFFICIENTS OF epspw AND Fscore IN YOUR OWN WORDS
4) EXPLAIN THE STEPS YOU FOLLOWED TO RUN THE MACHINE LEARNING MODEL
5) Show the Confusion Matrix. Just MENTION how many cases your model correctly predicted.
6) Calculate AND INTERPRET the following ratios:
6.1) Precision
6.2) Sensitivity
6.3) Specificity
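Given the actual and predicted labels of the testing sample, the confusion matrix and the three ratios can be computed as below. The labels here are a small hypothetical example; passing labels=[0, 1] to scikit-learn's confusion_matrix fixes the layout (rows = actual, columns = predicted), so ravel() returns tn, fp, fn, tp in that order. If a denominator is zero, the ratio comes out as nan.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def classification_ratios(y_true, y_pred):
    """Correct predictions, precision, sensitivity and specificity."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    return {
        "confusion_matrix": cm,
        "correct": int(tp + tn),                  # diagonal of the matrix
        # Of the firm-years predicted to beat the industry, share that really did
        "precision": tp / (tp + fp),
        # Of the firm-years that really beat the industry, share the model caught
        "sensitivity": tp / (tp + fn),
        # Of the firm-years that did NOT beat the industry, share predicted as 0
        "specificity": tn / (tn + fp),
    }

# Hypothetical actual and predicted labels
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 0])
ratios = classification_ratios(y_true, y_pred)
print(ratios["confusion_matrix"])
print({k: v for k, v in ratios.items() if k != "confusion_matrix"})
```

Sensitivity is the same quantity scikit-learn calls recall, and specificity equals the recall of class 0.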
ONLY KEEP THE Python CODE YOU NEED FOR THIS EVIDENCE. EXTRA CODE CAN BE PENALIZED.
Remember that you have to submit your Google Colab LINK, and you have to SHARE it with me (cdorante@tec.mx).