Final Workshop, Business Analytics for Decision Making
1 Case Description
You have been hired as a data scientist in the financial analysis department of a major mutual fund firm (investment company). The firm has been doing financial analysis and financial forecasting for several years. Your task is to develop alternative approaches to descriptive, diagnostic and predictive analytics and to combine them into a general approach.
You have to analyze the historical annual financial statements of all US public firms listed on the New York Stock Exchange and NASDAQ since 2000. You will receive two CSV files. The first dataset (usdata.csv) contains the historical financial data of the firms, while the second dataset (usfirms.csv) is a catalog of all firms along with their industry type and status (active or cancelled).
The usdata dataset has a panel-data structure (also called long format): each row holds the financial information of one US firm for one period (a year). All dollar amounts are in thousands ($1000s).
You can find a description of each variable in the dictionary.csv file.
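As a starting point, loading the two files and attaching the catalog to the panel might look like the sketch below. The column names (firm, year, revenue, ebit, industry, status) are hypothetical stand-ins, and the two small in-memory CSVs only imitate the real files; check dictionary.csv for the actual variable names.

```python
import io
import pandas as pd

# Tiny stand-ins for usdata.csv and usfirms.csv; all column names are hypothetical.
usdata_csv = io.StringIO(
    "firm,year,revenue,ebit\n"
    "AAA,2022,1000,150\n"
    "AAA,2023,1200,180\n"
    "BBB,2022,500,-20\n"
    "BBB,2023,450,10\n"
)
usfirms_csv = io.StringIO(
    "firm,industry,status\n"
    "AAA,Manufacturing,active\n"
    "BBB,Retail,cancelled\n"
)

usdata = pd.read_csv(usdata_csv)    # in practice: pd.read_csv("usdata.csv")
usfirms = pd.read_csv(usfirms_csv)  # in practice: pd.read_csv("usfirms.csv")

# Panel check: each (firm, year) pair should appear exactly once.
assert not usdata.duplicated(subset=["firm", "year"]).any()

# Attach industry and status to every firm-year row.
# validate raises an error if a firm appears twice in the catalog.
panel = usdata.merge(usfirms, on="firm", how="left", validate="many_to_one")

# Amounts are stored in thousands of dollars; convert only when needed for display.
panel["revenue_usd"] = panel["revenue"] * 1_000

print(panel)
```

A left merge keeps every firm-year row even when a firm is missing from the catalog, which makes such gaps visible as missing industry values instead of silently dropping data.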
The main objectives of your analysis are to a) learn about the firm composition of the whole US financial market (using all firms), b) learn which financial factors/indicators are related to stock return, and c) propose an investment portfolio strategy based on your statistical analysis.
2 Descriptive statistics
Using the most recent year (2023), you have to show the composition of the US market in terms of the number of firms by industry and the typical firm size, measured by market value.
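One possible way to build this composition table is sketched below. The data frame and its column names (industry, marketvalue) are hypothetical placeholders. The median is reported next to the mean as a measure of the "typical" firm because market values are usually heavily right-skewed.

```python
import pandas as pd

# Hypothetical firm-year rows already merged with the firm catalog.
# marketvalue is in thousands of dollars; the numbers are made up.
panel = pd.DataFrame({
    "firm": ["AAA", "BBB", "CCC", "DDD", "AAA"],
    "year": [2023, 2023, 2023, 2023, 2022],
    "industry": ["Manufacturing", "Retail", "Retail", "Retail", "Manufacturing"],
    "marketvalue": [5_000_000.0, 200_000.0, 800_000.0, None, 4_000_000.0],
})

# Keep 2023 rows that actually report a market value.
latest = panel[panel["year"] == 2023].dropna(subset=["marketvalue"])

composition = (
    latest.groupby("industry")
    .agg(
        n_firms=("firm", "nunique"),
        median_mv=("marketvalue", "median"),
        mean_mv=("marketvalue", "mean"),
    )
    .sort_values("n_firms", ascending=False)
)
print(composition)
```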
In addition, you have to show how the US financial market has created wealth over the years using the following variables:
Total sales
Operating earnings (EBIT)
Net Income
Market value
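A minimal sketch of the yearly aggregation for these four variables follows. It assumes hypothetical column names (revenue, ebit, netincome, marketvalue) and uses made-up numbers in place of the real panel.

```python
import pandas as pd

# Hypothetical firm-year panel (amounts in thousands of dollars, made-up values).
panel = pd.DataFrame({
    "firm": ["AAA", "AAA", "BBB", "BBB"],
    "year": [2022, 2023, 2022, 2023],
    "revenue": [1000.0, 1200.0, 500.0, 450.0],
    "ebit": [150.0, 180.0, -20.0, 10.0],
    "netincome": [100.0, 120.0, -30.0, 5.0],
    "marketvalue": [4000.0, 5000.0, 900.0, 800.0],
})

wealth_vars = ["revenue", "ebit", "netincome", "marketvalue"]

# Market-wide totals per year; min_count=1 leaves a year as NaN
# when no firm reports the variable, instead of showing 0.
market = panel.groupby("year")[wealth_vars].sum(min_count=1)

# Year-over-year growth of each aggregate.
growth = market.pct_change()

print(market)
print(growth)
```

Plotting market (for example with market.plot(subplots=True)) then gives one line per variable over the years.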
Also, you have to do a DuPont Analysis (using the operational ratios) over the years for the whole US financial market, and highlight the important insights.
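The sketch below shows one reading of the operational DuPont decomposition, assumed here to be operating return on assets (EBIT / total assets) split into operating profit margin (EBIT / sales) and asset turnover (sales / total assets). The classic three-factor version instead decomposes return on equity into net margin, asset turnover and the equity multiplier. Column names and numbers are hypothetical, and the ratios are computed from market-wide yearly totals.

```python
import pandas as pd

# Hypothetical market-wide yearly totals (thousands of dollars, made-up values).
market = pd.DataFrame(
    {
        "revenue": [1500.0, 1650.0],
        "ebit": [130.0, 190.0],
        "totalassets": [3000.0, 3300.0],
    },
    index=pd.Index([2022, 2023], name="year"),
)

dupont = pd.DataFrame({
    # How many dollars of operating earnings each dollar of sales generates.
    "profit_margin": market["ebit"] / market["revenue"],
    # How many dollars of sales each dollar of assets generates.
    "asset_turnover": market["revenue"] / market["totalassets"],
})

# Operating ROA is the product of the two ratios.
dupont["roa_operating"] = dupont["profit_margin"] * dupont["asset_turnover"]

print(dupont)
```

Reading the two components side by side shows whether a change in operating ROA comes from margins or from how intensively assets are used.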
3 Analysis
You have to come up with 2 multiple regression models to understand how 1) current stock return and 2) future stock return are related to:
The DuPont ratios
Earnings Yield (Earnings per share deflated by price)
Book-to-market ratio
Firm size - use a categorical variable to classify the firms into 3 groups: small, medium and large, based on the firm market value in each year.
In both models, you have to control for time (years).
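The variable construction and the two regressions could be sketched as follows. Everything here is an assumption made for illustration: the panel is randomly generated, the column names are hypothetical, returns are simple annual returns from a price column, the DuPont ratios are the operating margin and asset turnover, size groups are yearly terciles of market value, and statsmodels' formula interface is used for the estimation. The sketch also assumes no gaps in each firm's yearly series.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic placeholder panel: 60 firms x 6 years. Column names are hypothetical.
rng = np.random.default_rng(0)
firms = [f"F{i:02d}" for i in range(60)]
df = pd.DataFrame([(f, y) for f in firms for y in range(2018, 2024)],
                  columns=["firm", "year"])
n = len(df)
df["price"] = rng.uniform(5, 100, n)
df["eps"] = rng.normal(2, 1, n)
df["revenue"] = rng.uniform(100, 1000, n)
df["ebit"] = df["revenue"] * rng.normal(0.10, 0.05, n)
df["totalassets"] = rng.uniform(200, 2000, n)
df["bookvalue"] = rng.uniform(50, 500, n)
df["marketvalue"] = rng.uniform(100, 5000, n)

df = df.sort_values(["firm", "year"]).reset_index(drop=True)

# Current annual return and next-year (future) return, computed within each firm.
df["ret"] = df["price"] / df.groupby("firm")["price"].shift(1) - 1
df["f_ret"] = df.groupby("firm")["ret"].shift(-1)

# Explanatory variables.
df["pm"] = df["ebit"] / df["revenue"]            # operating profit margin
df["ato"] = df["revenue"] / df["totalassets"]    # asset turnover
df["ey"] = df["eps"] / df["price"]               # earnings yield
df["bmr"] = df["bookvalue"] / df["marketvalue"]  # book-to-market ratio

# Size terciles recomputed within each year: 0 = small, 1 = medium, 2 = large.
codes = df.groupby("year")["marketvalue"].transform(
    lambda s: pd.qcut(s, 3, labels=False)
)
df["size_group"] = codes.map({0: "small", 1: "medium", 2: "large"})

# C(year) adds one dummy per year, which controls for time.
rhs = "pm + ato + ey + bmr + C(size_group, Treatment(reference='small')) + C(year)"
cols = ["pm", "ato", "ey", "bmr", "size_group", "year"]

# Drop incomplete rows explicitly so the year dummies match the years actually used.
current = df.dropna(subset=["ret"] + cols)
future = df.dropna(subset=["f_ret"] + cols)

m_current = smf.ols("ret ~ " + rhs, data=current).fit()
m_future = smf.ols("f_ret ~ " + rhs, data=future).fit()

print(m_current.summary().tables[1])
print(m_future.summary().tables[1])
```

With real data, ratios such as earnings yield and book-to-market often contain extreme values, so winsorizing them before estimation is a common additional preparation step.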
What do you have to do?
You have to explain what each variable/indicator means. For Earnings Yield and Book-to-market ratio, you have to explain what previous research has found about the relationship between these ratios and stock returns.
You have to clearly explain all your data preparation steps and variable calculations, and also how you ran the multiple regression models.
You MUST clearly INTERPRET the regression models in terms of the beta coefficients and their significance. Also, you have to draw conclusions from these interpretations that you can use to make decisions related to an investment strategy.
You have to predict annual returns for 2024 for all US firms and keep these predictions in case you want to use them for your investment strategy.
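A minimal sketch of the forecasting step is given below, under several assumptions: the data are randomly generated, the column names (ey, bmr, ret) are hypothetical, and only two predictors are used to keep it short. Because a year dummy for 2024 cannot be estimated from past data, this sketch leaves the year dummies out of the forecasting equation, which is one simple convention and not the only one. Plain NumPy least squares is used here; a fitted statsmodels model with its predict method would work equally well.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder panel: 40 firms x 5 years (2019-2023).
rng = np.random.default_rng(1)
firms = [f"F{i:02d}" for i in range(40)]
df = pd.DataFrame([(f, y) for f in firms for y in range(2019, 2024)],
                  columns=["firm", "year"])
df["ey"] = rng.normal(0.05, 0.03, len(df))   # earnings yield
df["bmr"] = rng.uniform(0.2, 1.5, len(df))   # book-to-market ratio
df["ret"] = 0.02 + 0.8 * df["ey"] + 0.03 * df["bmr"] + rng.normal(0, 0.1, len(df))

# Future return: next year's return of the same firm.
df = df.sort_values(["firm", "year"]).reset_index(drop=True)
df["f_ret"] = df.groupby("firm")["ret"].shift(-1)

features = ["ey", "bmr"]

# Estimate on firm-years whose next-year return is already known (up to 2022).
train = df.dropna(subset=["f_ret"])
X_train = np.column_stack([np.ones(len(train)), train[features].to_numpy()])
beta, *_ = np.linalg.lstsq(X_train, train["f_ret"].to_numpy(), rcond=None)

# Apply the estimated coefficients to the 2023 indicators to predict 2024 returns.
latest = df[df["year"] == 2023].copy()
X_new = np.column_stack([np.ones(len(latest)), latest[features].to_numpy()])
latest["pred_ret_2024"] = X_new @ beta

predictions = latest[["firm", "pred_ret_2024"]].sort_values(
    "pred_ret_2024", ascending=False
)
print(predictions.head())
```

Saving predictions (for example with predictions.to_csv) keeps the ranking available for the portfolio selection later on.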
You have to use at least 2 macro-economic variables to examine whether these variables are related to the stock return of the US market (or the S&P index). You can select any of the following:
US GDP
Inflation
Interest rates
Unemployment rates
You have to estimate at least 2 models to understand whether 2 of these variables are related to the performance of the US financial market.
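The two macro-economic models could be as simple as one regression of the annual market return on each selected variable, as in the sketch below. The series are randomly generated placeholders, not real GDP, inflation or index data, and the column names are hypothetical. scipy.stats.linregress is used because each model has a single explanatory variable.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Randomly generated placeholder series, NOT real macro-economic data.
rng = np.random.default_rng(2)
years = np.arange(2000, 2024)
macro = pd.DataFrame({
    "year": years,
    "gdp_growth": rng.normal(0.02, 0.02, len(years)),
    "inflation": rng.normal(0.025, 0.01, len(years)),
})
macro["market_ret"] = (
    0.05 + 1.5 * macro["gdp_growth"] + rng.normal(0, 0.15, len(years))
)

# One simple regression per macro variable: market_ret = a + b * variable.
results = {}
for var in ["gdp_growth", "inflation"]:
    fit = stats.linregress(macro[var], macro["market_ret"])
    results[var] = {
        "slope": fit.slope,
        "pvalue": fit.pvalue,     # two-sided test that the slope is zero
        "r2": fit.rvalue ** 2,
    }

summary = pd.DataFrame(results).T
print(summary)
```

With only about two dozen annual observations, the tests have little power, so a non-significant slope is weak evidence of no relationship.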
In addition, research the Industry Rotation Theory, which examines the relationship between sector performance and the economic cycle.
With your findings about the relationship between the macro-economic variables and the US financial market, and the suggestions of the Industry Rotation Theory, you have to come up with a proposal to select specific industries that you would use to pick your US stocks.
You have to come up with a proposal for an investment strategy composed of US stocks and the Treasury Bill, based on all your analysis.
You can test your proposal using 2024 as a benchmark year. You have to propose a portfolio starting in 2025, assuming that you will hold it for the long term (with annual rebalancing).
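The return of such a portfolio can be checked with a few lines, as in the sketch below. All numbers are made-up placeholders, the tickers are hypothetical, the stock part is assumed to be equal-weighted, and the 70/30 split between stocks and the Treasury Bill is an arbitrary example. Weights are assumed to be reset to their targets at the start of each year.

```python
import pandas as pd

# Made-up annual returns for two hypothetical stocks and the Treasury Bill.
stock_returns = pd.DataFrame(
    {"AAA": [0.10, -0.05], "BBB": [0.20, 0.02]},
    index=pd.Index([2024, 2025], name="year"),
)
tbill_return = pd.Series([0.05, 0.04], index=stock_returns.index)

w_stocks = 0.7  # share of the portfolio in stocks; the rest goes to the T-Bill

# Equal-weighted stock leg, reset to equal weights every year.
stock_leg = stock_returns.mean(axis=1)

# Annual portfolio return with weights reset to target each year.
portfolio = w_stocks * stock_leg + (1 - w_stocks) * tbill_return

# Growth of 1 dollar invested at the start of the first year.
cumulative = (1 + portfolio).cumprod()

print(portfolio)
print(cumulative)
```

Running the same calculation for a market index over the same years gives a benchmark against which the proposal can be compared.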
Deliverable:
A PDF or Word document of about 8-15 pages (including figures, but not the Python code)
A Jupyter notebook with the details of your data analysis (a Google Colab link). This notebook has to contain the detailed explanation of the data preparation and the Python code. The interpretations and analysis can be included here, but they also have to be in the document.
An Excel file (if you used it for your analysis)