Evidence_v2, Algorithms and Data Analysis
Download the 2 datasets we have used in the workshops. Use the code provided in Workshop 3 (https://rpubs.com/cdorante/fz2022p_w3).
You can go to Workshop 2 (https://rpubs.com/cdorante/fz2022p_w2) to see the data dictionary for each dataset.
As in the previous workshop, merge the 2 datasets (usdata and usfirms) using a left-join. Remember that usdata is the left dataset: a panel dataset with historical annual financial data for all US firms. The right dataset is usfirms: a cross-sectional dataset with general information about firms from the S&P500 index.
You MUST keep ONLY firm-years with status=‘active’.
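As a minimal sketch of the merge and filter steps, the example below uses two tiny made-up DataFrames in place of the real files. The key column name `firm` and the location of `status` in usfirms are assumptions; check the data dictionary in Workshop 2 for the real column names.

```python
import pandas as pd

# Toy stand-ins for the two workshop datasets (all values are made up).
usdata = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "C"],   # key column name is an assumption
    "year": [2020, 2021, 2020, 2021, 2021],
    "revenue": [100.0, 120.0, 50.0, 55.0, 10.0],
})
usfirms = pd.DataFrame({
    "firm": ["A", "B"],
    "status": ["active", "cancelled"],
    "naics1": ["Manufacturing", "Retail Trade"],
})

# Left join: every row of usdata is kept and the usfirms columns are
# attached by firm. Firms missing from usfirms get NaN in those columns.
merged = pd.merge(usdata, usfirms, on="firm", how="left")

# Keep only firm-years with status == 'active'.
# Rows with NaN status (no match in usfirms) are dropped as well.
active = merged[merged["status"] == "active"].reset_index(drop=True)
print(active)
```

If the key has a different name in each dataset, `pd.merge` accepts `left_on` and `right_on` instead of `on`.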
1 Calculating financial variables
For the variable calculation, you MUST EXPLAIN WITH YOUR OWN WORDS, IN CAPITAL LETTERS, how you created the variables. If you use a Gemini prompt, indicate in quotes what your prompt was.
Using the merged dataset, you have to write the code to calculate the following financial variables and financial ratios for all firm-years:
1. Create financial variables
Total Revenue (totalsales) = revenue + extraordinaryitems + otherincome
Gross profit (grossprofit) = revenue - Cost of Goods Sold (cogs)
Shareholder Equity (equity) = totalassets - totalliabilities
Avg Assets (avgassets) = arithmetic average of totalassets at the year t and totalassets at t-1
Annual market return (stockreturn): calculate the annual return for all firm-years as the continuously compounded percentage return of the adjusted stock price (adjprice).
Consider that you have panel data, so be careful when calculating returns: the first year of each firm must not use data from another firm.
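The variable definitions above can be sketched as follows, on a small made-up panel. The column names `firm`, `year` and `totalliabilities` are assumptions (check the data dictionary), and `lag_totalassets` is a helper column introduced only for this illustration. The continuously compounded return is computed as the difference of log adjusted prices within each firm.

```python
import numpy as np
import pandas as pd

# Toy panel with two firms (all values are made up).
df = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "year": [2019, 2020, 2021, 2020, 2021],
    "revenue": [100.0, 110.0, 120.0, 50.0, 60.0],
    "extraordinaryitems": [0.0, 1.0, 0.0, 0.0, 2.0],
    "otherincome": [2.0, 2.0, 3.0, 1.0, 1.0],
    "cogs": [60.0, 65.0, 70.0, 30.0, 33.0],
    "totalassets": [200.0, 220.0, 240.0, 80.0, 100.0],
    "totalliabilities": [120.0, 125.0, 130.0, 50.0, 55.0],
    "adjprice": [10.0, 11.0, 12.1, 20.0, 18.0],
})

# Sort so that "previous row within the firm" means "previous year".
df = df.sort_values(["firm", "year"]).reset_index(drop=True)

df["totalsales"] = df["revenue"] + df["extraordinaryitems"] + df["otherincome"]
df["grossprofit"] = df["revenue"] - df["cogs"]
df["equity"] = df["totalassets"] - df["totalliabilities"]

# Lag inside each firm: the first year of every firm gets NaN,
# so no value leaks in from the firm listed just above it.
df["lag_totalassets"] = df.groupby("firm")["totalassets"].shift(1)
df["avgassets"] = (df["totalassets"] + df["lag_totalassets"]) / 2

# Continuously compounded annual return: log(P_t) - log(P_{t-1}) by firm.
lag_price = df.groupby("firm")["adjprice"].shift(1)
df["stockreturn"] = np.log(df["adjprice"]) - np.log(lag_price)

print(df[["firm", "year", "totalsales", "equity", "avgassets", "stockreturn"]])
```

Note that `shift(1)` takes the previous row of the firm, which is the previous year only if the firm has no gaps in its years.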
2. Using the same panel dataset, create columns for the following financial ratios:
Here you can use the shift function to get the value of total assets one year ago. Make sure that you group by firm so you do not use the totalassets from another firm to calculate the ratios of a firm.
Operational Earnings per share (oeps): ebit / sharesoutstanding
Operational eps deflated by stock price (oepsp): oeps / originalprice
Winsorized oepsp (oepspw) at the 1 and 99 percentiles
Financial leverage (leverage): longdebt / avgassets
Current Ratio (currentratio): currentassets / currentliabilities
GrossProfit-to-Asset ratio (gprofitratio) as:
gprofitratio=\frac{grossprofit_{t}}{totalassets_{t-1}}
Asset turnover ratio (ato) as:
ato=\frac{totalsales_{t}}{totalassets_{t-1}}
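One possible way to build the ratios listed above is sketched below on made-up data. The column names `firm` and `year` are assumptions, and the winsorization is done here by clipping at the 1st and 99th percentiles of the whole column, which is only one of several ways to winsorize.

```python
import pandas as pd

# Toy panel (all values are made up); grossprofit and totalsales are
# assumed to have been created already in the previous step.
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "ebit": [20.0, 24.0, 5.0, -2.0],
    "sharesoutstanding": [10.0, 10.0, 4.0, 4.0],
    "originalprice": [40.0, 48.0, 25.0, 20.0],
    "longdebt": [50.0, 44.0, 30.0, 36.0],
    "currentassets": [60.0, 66.0, 20.0, 18.0],
    "currentliabilities": [30.0, 30.0, 10.0, 12.0],
    "totalassets": [200.0, 240.0, 80.0, 100.0],
    "grossprofit": [40.0, 50.0, 20.0, 27.0],
    "totalsales": [102.0, 123.0, 51.0, 63.0],
}).sort_values(["firm", "year"]).reset_index(drop=True)

# Total assets one year ago, computed within each firm.
lag_assets = df.groupby("firm")["totalassets"].shift(1)
df["avgassets"] = (df["totalassets"] + lag_assets) / 2

df["oeps"] = df["ebit"] / df["sharesoutstanding"]
df["oepsp"] = df["oeps"] / df["originalprice"]

# Winsorize: values below the 1st percentile are set to the 1st
# percentile, values above the 99th are set to the 99th.
lo, hi = df["oepsp"].quantile([0.01, 0.99])
df["oepspw"] = df["oepsp"].clip(lower=lo, upper=hi)

df["leverage"] = df["longdebt"] / df["avgassets"]
df["currentratio"] = df["currentassets"] / df["currentliabilities"]
df["gprofitratio"] = df["grossprofit"] / lag_assets   # grossprofit_t / totalassets_{t-1}
df["ato"] = df["totalsales"] / lag_assets             # totalsales_t / totalassets_{t-1}

print(df[["firm", "year", "oepspw", "leverage", "currentratio", "gprofitratio", "ato"]])
```

The first year of each firm has no lagged assets, so leverage, gprofitratio and ato are NaN for those rows.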
2 DESIGN A MACHINE LEARNING MODEL
Prepare the data to design and run a machine learning model to predict whether a stock's annual return beats its industry's average return in the same year. For the industry average return, calculate the average of stockreturn by industry-year. Use naics1 as the industry.
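A minimal sketch of the dependent variable, assuming the panel already has stockreturn and naics1. The names `indret` and `beat` are hypothetical names chosen for this illustration, and the data is made up.

```python
import pandas as pd

# Toy firm-years (all values are made up).
df = pd.DataFrame({
    "firm": ["A", "B", "C", "D", "A", "B"],
    "year": [2020, 2020, 2020, 2020, 2021, 2021],
    "naics1": ["Manufacturing", "Manufacturing", "Retail Trade",
               "Retail Trade", "Manufacturing", "Manufacturing"],
    "stockreturn": [0.10, -0.02, 0.03, 0.07, 0.20, 0.30],
})

# Drop firm-years without a return; otherwise the comparison below
# would silently label them as 0.
df = df.dropna(subset=["stockreturn"]).copy()

# Average return of each industry in each year, repeated on every row
# of that industry-year (transform keeps the original shape).
df["indret"] = df.groupby(["naics1", "year"])["stockreturn"].transform("mean")

# 1 if the firm beat its industry average that year, 0 otherwise.
df["beat"] = (df["stockreturn"] > df["indret"]).astype(int)
print(df)
```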
For the model, use the following explanatory variables (as X predictors)
oepspw (winsorized)
Fscore = F1 + F2 + F3 + F4 + F5
F1, F2, F3, F4, F5 are accounting binary signals that measure annual firm improvements (=1 means improvement; 0 means no improvement) about important indicators related to firm performance, leverage or efficiency.
You can calculate the F accounting signals as:
F1 = 1 if the annual difference of leverage (leverage at t minus leverage at t-1) is negative (<0); =0 otherwise
F2 = 1 if the annual difference of currentratio is positive (>0); =0 otherwise
F3 = 1 if the annual difference of equity is positive (>0); =0 otherwise
F4 = 1 if the annual difference of gprofitratio is positive (>0); =0 otherwise
F5 = 1 if the annual difference of ato is positive (>0); =0 otherwise
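The F signals above can be sketched with a within-firm difference, as in the example below on made-up data. Setting Fscore to missing for the first year of each firm (where no annual difference exists) is an assumption of this sketch, not a rule stated in the instructions.

```python
import pandas as pd

# Toy panel (all values are made up); the ratios are assumed to exist already.
df = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "year": [2019, 2020, 2021, 2020, 2021],
    "leverage": [0.30, 0.25, 0.28, 0.40, 0.35],
    "currentratio": [1.5, 1.8, 1.7, 2.0, 2.2],
    "equity": [80.0, 95.0, 110.0, 30.0, 28.0],
    "gprofitratio": [0.20, 0.22, 0.21, 0.30, 0.33],
    "ato": [0.50, 0.55, 0.60, 0.60, 0.58],
}).sort_values(["firm", "year"]).reset_index(drop=True)

cols = ["leverage", "currentratio", "equity", "gprofitratio", "ato"]

# Annual difference (value at t minus value at t-1) within each firm.
diffs = df.groupby("firm")[cols].diff()

df["F1"] = (diffs["leverage"] < 0).astype(int)       # lower leverage is an improvement
df["F2"] = (diffs["currentratio"] > 0).astype(int)
df["F3"] = (diffs["equity"] > 0).astype(int)
df["F4"] = (diffs["gprofitratio"] > 0).astype(int)
df["F5"] = (diffs["ato"] > 0).astype(int)

# Sum of the five signals; left as NaN when any difference is missing
# (assumption: a firm's first year has no defined Fscore).
has_all_diffs = diffs.notna().all(axis=1)
df["Fscore"] = df[["F1", "F2", "F3", "F4", "F5"]].sum(axis=1).where(has_all_diffs)

print(df[["firm", "year", "F1", "F2", "F3", "F4", "F5", "Fscore"]])
```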
1. You have to EXPLAIN WITH YOUR OWN WORDS HOW YOU CREATED THE ACCOUNTING F SIGNALS (VARIABLES)
2. Design and run a logistic regression to examine whether winsorized earnings per share deflated by price (oepspw) and Fscore are related to the probability that the annual stock return is higher than its industry average return in the corresponding year.
EXPLAIN THE CODE YOU USED TO RUN THE LOGISTIC MODEL
Show the regression output with all observations (BEFORE THE MACHINE LEARNING MODEL), AND INTERPRET THE beta COEFFICIENTS OF oepspw and Fscore WITH YOUR WORDS
3. You must run the corresponding MACHINE LEARNING model.
EXPLAIN THE STEPS YOU FOLLOWED TO RUN THE MACHINE LEARNING MODEL
Show the Confusion Matrix. Just MENTION how many cases your model correctly predicted.
Calculate AND INTERPRET the following ratios:
Precision
Sensitivity
Specificity
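The machine learning steps and the three ratios can be sketched as below with scikit-learn on simulated data. The 70/30 split, the `random_state` values and the column name `beat` are assumptions of this example. Note that scikit-learn's `LogisticRegression` applies an L2 penalty by default, so its coefficients can differ slightly from an unpenalized logistic regression.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Simulated data standing in for the real firm-years (not real results).
rng = np.random.default_rng(1)
n = 2000
data = pd.DataFrame({
    "oepspw": rng.normal(0.05, 0.05, n),
    "Fscore": rng.integers(0, 6, n),
})
true_logit = -1.0 + 3.0 * data["oepspw"] + 0.4 * data["Fscore"]
data["beat"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = data[["oepspw", "Fscore"]]
y = data["beat"]

# Step 1: split into training and testing samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Step 2: estimate the model with the training sample only.
model = LogisticRegression()
model.fit(X_train, y_train)

# Step 3: predict the testing sample (default cutoff is probability 0.5).
y_pred = model.predict(X_test)

# Step 4: confusion matrix. Rows are actual classes, columns are
# predicted classes, so ravel() returns tn, fp, fn, tp for labels [0, 1].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)

correct = tp + tn                 # cases predicted correctly
precision = tp / (tp + fp)        # of the predicted winners, share that won
sensitivity = tp / (tp + fn)      # of the actual winners, share detected
specificity = tn / (tn + fp)      # of the actual losers, share detected
print(correct, precision, sensitivity, specificity)
```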
ONLY KEEP THE Python CODE YOU NEED FOR THIS EVIDENCE. Extra CODE CAN BE PENALIZED
Remember that you have to submit your Google Colab LINK, and you have to SHARE it with me (cdorante@tec.mx).