Business case: Analyzing the US financial market

Author

Alberto Dorantes D.

Published

August 1, 2024

Abstract
This is business case prepared for the Statistics Module (Bloque 1) of the Advanced AI concentration

1 Case Description

You were hired as a data scientist in an important investment company in the department of financial analysis. The firm has been doing financial analysis and financial forecast for several years. You were hired to come up with better financial analytic models to design financial portfolios.

You have to analyze historical quarterly financial statements of real US public firms listed in the New York Exchange and NASDAQ. Download the following csv datasets:

Download them to your Google Drive folder to easily import it into Colab.

The first dataset (dataus2024q2) contains the historical financial data of the firms, while the second dataset (firmsus2024) is a catalog of all firms along with the corresponding industry type and status (active or cancelled).

The dataus2024q2 dataset has a panel-data (also called long format) structure. Each row has financial information for one US firm and 1 period (a quarter). All $ amounts are in thousands (’1000s). Here is a data dictionary of the columns:

Data dictionary of historical quarterly financial data.
Variable Description
firm Unique code of the company (also called ticker)
q Quarter date
fiscalmonth Month of the year when the firm closes a fiscal year
revenue Total sales of the firm from the first fiscal quarter to the current quarter
cogs Cost of good sold - variable costs of the products sold - from the first fiscal quarter to the current quarter
sgae Sales and general administrative expenses - from the first fiscal quarter to the current quarter
otherincome Other operational income/expenses that are not directly from the core operations of the firm - from the first fiscal quarter to the current quarter
extraordinaryitems Extra income/expenses not related to regular operations - from the first fiscal quarter to the current quarter
finexp Financial expenses - interest expenses paid (generated from loans) - from the first fiscal quarter to the current quarter
incometax Income tax from the first fiscal quarter to the current quarter
totalassets Total assets of the firm at the end of the quarter
currentassets Current assets of the firm at the end of the quarter
totalliabilities Total liabilities of the firm at the end of the quarter
currentliabilities Current liabilities of the firm at the end of the quarter
longdebt Balance of long-term financial debt (loans to pay longer than 1 year)
adjprice Stock adjusted price at the end of the quarter; adjusted for stock splits and dividend payments; used to calculate stock returns
originalprice Historical stock price (not adjusted); used to calculate historical market value
sharesoutstanding Historical number of shares available in the market
fixedassets Fixed assets value at the end of the quarter
year Calendar year
yearf Fiscal year - this depends on when the firm ends its fiscal year; if fiscalmonth=12 in the quarter 3, then the fiscal year will start in Q4 of a year and ends in the Q3 of the following year

For each firm there are many rows that represent historical quarterly financial data. All firms have quarters from Q1 2000 to Q2 2024. Not all firms have existed since 2000, so if the first quarters are empty that means that the firm did not exist in the US financial market in those quarters. Then, it is possible to know when each firm went public to issue shares in the financial market: the first quarter with some non-empty data.

Each firm has defined the month of the year used to close a fiscal year. For example, Apple closes the fiscal year at the end of Quarter 3 (end of September) of any year. Then, for Apple, in the Q3 of 2022, there will be a 12 for the fiscalmonth variable. In this case, Apple starts its fiscal year in the Q4 of each year and ends in the Q3 of the following year. Most of the firms (about 80%) close fiscal year in December, so these firms will have a 12 in the Q4 of each year.

The variables related to sales and expenses are cumulative for each fiscal year. For example, Apple sold about $117 billion in the last calendar quarter (Q4) of 2022, but this is the first fiscal quarter for Apple. For Q1 (calendar) 2023 (which is the 2nd fiscal quarter), Apple has about $212 billion in the revenue variable, meaning that considering fiscal quarter 1 and 2, Apple has sold $212 billion. For Q2 2023 Apple has about $293 billion, meaning that the cumulative revenue of fiscal Q1, Q2 and Q3 is about $293 billion. Then, if you select rows with fiscalmonth=12, then you will be selecting those quarters with annual financial information for each firm!

The firmsus2024.csv is a catalog of all active and cancelled US firms:

Variable Description
firm Unique code of the company (also called ticker)
name Name of the firm
status Status of the firm: active or cancelled
partind Percent participation in the S&P500 market index
naics1 North American Industry Classification Code - Level 1
naics2 North American Industry Classification Code - Level 2
SectorEconomatica Economatica Industry classification

The main objectives of your analysis are

  1. To learn about the firm composition of whole US financial market (using all firms), and

  2. Learn about the basics of financial analysis with a focus on statistics

  3. Identify patterns by looking at satistical relationships among financial variables

You have to select firms from 1 industry and then learn which financial factors/variables might be related to future stock return.

In the following sections you will find specific directions to provide descriptive statistics and do a statistical analysis to aim the objectives.

1.1 Calculation of financial variables and ratios

As a dependent variable, you have to use annual stock return in continuously compounded. You have to calculate:

  • contemporary continuous compounded returns (cc returns) for each firm-quarter.

  • future annual stock cc return (one-quarter ahead)

For independent/explanatory variables, you have to calculate at least 4 financial ratios that you believe can be used to predict future stock return.

For control variables, you have to use the following 2 financial ratios:

  • Earnings per share divided by price

  • Book-to-market ratio

Control variables are usually included as independent variables. Control variables are those variables that have been found to be statistically related to the dependent variable of your model. By including control varialbes, the results of your model will be more robust.

You have to download the document “Financial Ratios- The definitive guide” from:

https://corporatefinanceinstitute.com/resources/accounting/financial-ratios-definitive-guide/

You have to fill-in a form and download the pdf guide.

1.2 Firm Composition of the US financial market

Considering the most recent financial quarter of the dataset:

  • Show how many active firms by industry there are in the sample

  • Show a table that shows statistics about the book value and market value by industry. What can you say about the typical firm size in terms of market value and book value? How much these variables change within each industry? How firm size (in market value) is distributed? You have to do the right descriptive statistics

  • Which are the biggest 10 US firms in terms of market value and how far they are from the typical market size of a US firm?

  • Which are the biggest 10 US firms in terms of book value and how far they are from the typical book size of a US firm?

1.3 Evolution of the US financial market

Considering the whole history of financial data for all firms, and only annual fiscal information (selecting fiscalmonth=12) :

  • How the total market value of the US market has changed over the years? Do a table and/or graph and explain

  • How total revenue and total net income of all US firms has changed over the years? Do a table and/or graph and explain

1.4 Learing about one industry

The following analysis must be done after selecting a set of firms that belong to one industry.

  • You have to select all firms that belong to 1 industry. Each student will be randomly assigned 1 industry. The industries that will be considered are (financial services are not considered):

    • Manufacturing industries

    • Commercial industries (retail and wholesale)

    • Service industries

1.4.1 Descriptive statistics for your industry

Using the subsample of your industry and using annual fiscal information (selecting rows with fiscalmonth=12), and the most recent complete year (2023), do the following:

  • Show descriptive statistics for future annual log return, total assets, revenue, net income, control variables, and all your selected financial ratios. Provide a clear interpretation of the descriptive statistics of revenue and 1 financial ratio

1.4.2 Correlation matrix

Using annual fiscal data (selecting rows with fiscalmonth=12), and all years:

  • Show a correlation matrix with the same variables you used for descriptive statistics above, but in the case of totalassets, revenue and netincome, use the logarithmic transformation of their values. Which variables have strongest and significant linear relationship with stock returns (annual returns)? Explain

1.4.3 Regression Model

For each financial ratio, do the following:

  • Do a histogram and interpret it

  • Do plots to visualize the possible relationship each factor might have with quarterly returns (the dependent variable).

  • Design and run a first version of a multiple regression model to examine whether your selected ratios can explain/predict future annual stock returns.

  • You must check for possible multicollinearity problems. Briefly explain what is multicollinearity, run and interpret the corresponding test

  • Interpret your model

    • Interpret with your own words the results of each coefficient (beta and their statistical significance)

    • Interpret the R-squared of the model

    • Write a conclusion of your results

2 Deliverables

This case must be done by each student and the work must be original. Please avoid possible interpretations of plagiarism.

You have to submit a Jupyter Notebook (.ipynb and .html files) document

3 Evaluation criteria

The evaluation criteria will be:

Section Weight Notes
Data management and Descriptive statistics 40% Document your work. You have to explain what you did for data management, descriptive statistics, and also you have to clearly respond to each of the business questions
Statistical modeling 50% Document your work. Explain what you do in terms of variable calculation, correlation matrix and regression analysis. Make sure you provide a very clear interpretation of your model. Remember that you must interpret each coefficient and their corresponding statistical significance
Conclusion 10% Provide a concise conclusion of your analysis according to the result of your models. Make sure you respond the main business questions