This is an INDIVIDUAL workshop. In this workshop we practice algorithmic thinking and programming with loops and functions, and we introduce programming for data management. We practice with financial problems such as bond valuation, calculation of financial ratios, and doing summary tables of financial variables.
1 Loops
The for statement is used to repeat certain tasks a specific number of times. We can loop over a sequence of numbers, or we can loop over specific elements of a vector.
In a similar manner, it is possible to program a loop with while statement. The main difference is that you need to specify what is the exit condition at the beginning of the loop. The while loop is useful when we do not know exactly how many iterations are needed to find a result.
Here is an example. If you want to know how many years you need to keep an investment with a fixed interest rate in order to duplicate your initial investment:
# Define variablesAPR =0.10# Annual Percentage Rate = 10%INV =100# Initial investment = $100MULTIPLE =2# Multiple = 2 (investment should double)BALANCE = INV # Start balance equal to initial investmentyear =0# Start with year = 0# Loop until balance reaches multiple of initial investmentwhile BALANCE < MULTIPLE * INV: year +=1# Increase year by 1 BALANCE = BALANCE * (1+ APR) # Apply growth factor# Print resultsprint(f"To multiply your investment times {MULTIPLE}, at an interest rate of {100*APR}%, you need {year} years.")
To multiply your investment times 2, at an interest rate of 10.0%, you need 8 years.
print(f"Your balance after {year} years will be ${BALANCE:.2f}")
Your balance after 8 years will be $214.36
It is up to you to practice with the different types of loops. When you have to write a loop, you first have to understand in detail what you are looking for. Then, you have to identify the logic you need to do the iterations and which type of calculations you need to do.
It is very recommended first write a pseudo-code with your own words specifying the details of your logic. The pseudo-code is just a step of sequential tasks with your own words. You can use some names of commands, but if you do not remember, you can just write what you want to do line by line, and specify what are the iterations and conditions of your algorithm.
Once you have your pseudo-code, you can use it as your prompt for your favorite LLM (such as Gemini, chatGPT, etc)
The more structured and clear you write your prompts to generate code, the more effective you will be writing code to automate any interesting and complex task in the context of Finance.
2 Looping vs vectorization
Interestingly, Python and other languages such as R, are designed to easily perform repetitive tasks for data frames, vectors and matrices without using a loop.
In Python, unlike other traditional computer languages such as C, there is an alternative way to do loops without writing loops!
This sounds weird. The way you can do a loop without writing a loop is using “vectorization”. Python can perform vectorization in a natural way. For example, if you want to do a sum of 2 vectors or matrices in a traditional language, you might think in a loop to do an element-by-element operation.
Imagine you want calculate the net present value of a new project in your company. The project involves 2 new products, A and B. Each product will bring its own income. You already estimated the expected annual free cash flow (income minus all expenses) of both products for the next 5 years. The following table shows the expected free cash flow for each year for each product (in millions MXP )
Product
Year 1
Year 2
Year 3
Year 4
Year 5
A
3
3
4
6
8
B
-2
-3
-1
3
5
The discount rate (or cost of capital) for this project (annual) is 15%. The initial investment of the project is 10 million pesos.
How can you estimate the present value of these cash flows?
You first have to add the cash flows of each product and get an annual free cash flow for the project. You can use a loop that for each year, you do the sum of the 2 corresponding free-cash flows. Something like:
# Define vectors of cash flows as Python listsA = [3, 3, 4, 6, 8]B = [-2, -3, -1, 3, 5]# Initialize list (with zeros) of same length of A sumcashflows = [0] *len(A)#sumcashflows = {}# Loop through indicesfor i inrange(len(A)):# I sum the cash flows of A and B for the time period i, and store it in sumcashflows: sumcashflows[i] = A[i] + B[i]# I print the resulting stream of cash flows of the project:print(sumcashflows)
[1, 0, 3, 9, 13]
Now, instead of using a loop, we can use vectorization in a natural way to do the sum of both vectors:
import numpy as npA = np.array([3, 3, 4, 6, 8])B = np.array([-2, -3, -1, 3, 5])# Element-wise sum using vectorization:sumcashflows = A + Bprint(sumcashflows)
[ 1 0 3 9 13]
I used numpy arrays numpy library) instead of lists since lists cannot be vectorized.
Check that the statement A + B performs the vectorization by summing element by element of each array. Internally, this vectorization of arrays actually does a kind of loop, but we do not need to program the loop!!
Data frames (from pandas library) are also vectorized since the operations you do can be done to all columns or all rows.
3 Functions
In any programming language we write an algorithm to automate a process that receive a set of inputs, process the inputs with specific steps, and then generate or return an output or result, which can be one number, a list of numbers, a dataset or any set of bits of information.
If we find that we need to run that algorithm more than one time with different input values, then it is a good idea to write a function that automate the process, but we can easily re-use the function to solve similar problems but with different input values.
In a function the input variables are defined as parameters, and the output is explicitly return at the end of the function.
Here is an example:
Imagine a function that receive a number n and return the factorial of that number. I can write this function in Python as follows:
def factorial(n):# There are 3 possibilities about n: n can be negative, zero or positive# I will use a conditional if to calculate the value of the factorial of nif n <0:# If the number is negative, I send a message that the factorial cannot be calculated and return nothing (None)print("The parameter n must be positive.")returnNoneelse: # Here n can be zero or positive# If n = 0, then I return 1. Why? by definition, in mathematics the # factorial of 0 is 1:if n==0: return1else: # if n>0 I calculate the factorial as the multiplication from 1 to n # Step 1: I start with a variable result = 1 result =1# Step 2: I do a for loop to iterate from 1 to n, and for each iteration I multiply result times the iterator (counter) for i inrange(1, n+1): result = result * i# Step 4: I return the result variable return result
Once I defined my function, I can call my function more than one time. For example:
num=-5factnum = factorial(num)
The parameter n must be positive.
print(f"The factorial of {num} is {factnum}")
The factorial of -5 is None
num =5print(f"The factorial of {num} is {factorial(num)}")
The factorial of 5 is 120
num =10print(f"The factorial of {num} is {factorial(num)}")
The factorial of 10 is 3628800
Many times it is useful to call a function from a loop. Let’s work with the following Bond valuation problem.
Imagine I want to write a function to value a bond, and then see how sensitive the bond is to changes in market interests rates.
I start by writing a function to calcualte the value (price) of a bond. I will use the following parameters as inputs:
Parameters (inputs) of my function:
Year-to-maturity: # of years of the contract
Annual interest rate
Frequency: # of times a year for the coupon payments
Annual coupon rate: % of the Face Value to calculate the annual coupon amount
Face value: Principal of the bond
Output:
Bond price / value
Instead of writing the steps before, I will write the steps within my function as comments:
def bond_value(years_to_maturity, annual_interest_rate, frequency, annual_coupon_rate, face_value):""" Calculates the value of a bond. Arguments: years_to_maturity: The number of years until the bond matures. annual_interest_rate: The annual yield to maturity (discount rate). frequency: The number of coupon payments per year. annual_coupon_rate: The annual coupon rate as a decimal. face_value: The face value (par value) of the bond. Returns: The calculated value of the bond. """# Calculate periodic interest rate and number of periods periodic_interest_rate = annual_interest_rate / frequency number_of_periods = years_to_maturity * frequency# Calculate periodic coupon payment periodic_coupon_payment = (annual_coupon_rate * face_value) / frequency# Calculate the present value of coupon payments (annuity) pv_coupon_payments = periodic_coupon_payment * (1- (1+ periodic_interest_rate)**(-number_of_periods)) / periodic_interest_rate# Calculate the present value of the face value pv_face_value = face_value / (1+ periodic_interest_rate)**number_of_periods# Total bond value is the sum of present values of coupon payments and face value total_bond_value = pv_coupon_payments + pv_face_valuereturn total_bond_value# Example usage:# Bond with 10 years to maturity, 5% annual interest rate, semi-annual payments,# 6% annual coupon rate, and $1000 face value.years =10interest_rate =0.05freq =2coupon_rate =0.06face =1000bond_val = bond_value(years, interest_rate, freq, coupon_rate, face)print(f"The value of the bond is: ${bond_val:,.2f}")
The value of the bond is: $1,077.95
As you see I can use this function to value almost any type of bond. Now, imagine that I want to see how much the value of a specific bond changes when market interest rate changes.
Consider the calculation of a bond price with the following parameters: facevalue = 1,000, frequency = 4, coupon rate = 10%, and year-to-maturity = 10 years. Now, with these characteristics, I want to see how much the bond value changes if interests change from 4% to 20%.
I can write a loop to calculate the bond value by changing the interest rate from 4% to 20% jumping the interest rate by 0.5%. For each iteration save the interest rate and the bond value. At the end of the loop, do a plot to see how the bond price changes when the interest rate changes:
import matplotlib.pyplot as plt# Bond parametersface_value =1000frequency =4annual_coupon_rate =0.10years_to_maturity =10# Interest rate rangestart_interest_rate =0.04end_interest_rate =0.20step_interest_rate =0.005# Lists to store resultsinterest_rates = []bond_values = []# Loop through interest rates and calculate bond valueinterest_rate = start_interest_ratewhile interest_rate <= end_interest_rate: bond_val = bond_value(years_to_maturity, interest_rate, frequency, annual_coupon_rate, face_value) interest_rates.append(interest_rate) bond_values.append(bond_val) interest_rate += step_interest_rate# Plot the resultsplt.figure(figsize=(10, 6))plt.plot(interest_rates, bond_values)plt.xlabel("Annual Interest Rate")plt.ylabel("Bond Value")plt.title("Bond Value vs. Annual Interest Rate")plt.grid(True)plt.show()
4 CHALLENGE 1
You are planning to save for retirement over the next 30 years. To do this, you will invest $6,000 a month in a stock account and $3,000 a month in a bond account. The return of the stock account is expected to be 12% compounded monthly, and the bond account will pay 5% annual interest rate also compounded monthly. When you retire, you will combine your money into an account with a 9% annual return. How much can you withdraw each month from your account assuming a 25-year withdrawal period? At the end of 25 years your balance must be zero. Assume that all payments are done at the end of each month.
You have to:
Write/draw your mental map to visualize this problem with your own lens. You can do this in a paper, take a picture and include it in your Colab, or you can use any note software (like OneNote) to do this and paste it as image.
Write down the input and the output
Write the logical steps you propose to solve the problem. Use your own words. Write short and clear sentences for each step.
Paste your inputs, output, and your steps in Gemini to generate Python code. Make sure that your solution is correct. You can check a solution in Excel or carefully review the Python solution.
Make sure you understand the generated Python code. Write down any question about Python code to the next class.
5 CHALLENGE 1 SOLUTION
Remember that the first step to write code for an algorithm is to make sure you clearly understand the problem at 100%. In this case, I tried to visualize the cash streams of my investment for the following 30 years, and then the cash withdrawal for the folowing 25 years for my retirement (after the first 30 years).
I imagined my payments as cash flow streams for the first 30 years (360 months) as follows:
month
1
2
3
…
360
stock saving
6000
6000
6000
…
6000
bond saving
3000
3000
3000
…
3000
After these first 360 months, I will end up with an ending balance for eacvh investment account. I can visualize this ending balance for each investment as the Future value of the investment.
Then, I can calculate the future value (in year 30) of these 2 investments. I make the assumption that I pay the savings at the end of each month. Since I know how to get the present value of an annuity, then I can first get the present value and then I can just calculate its corresponding future value by multiplying this present value times the factor (1+R/12)^{12*30}.
For each investment I first calculate the present value of each monthly payment, and then sum all of them. Once I have this present value of the investment, then I can just calcuate its corresponding future value in the year 30 (month 360). I can visualize this in tables:
For stock saving, the expected APR is 12% compounded monthly:
Monthly rate = 0.12/12
month
1
2
…
360
saving
6000
6000
…
6000
pv(saving)
=6000(1+0.12/12)
=6000(1+0.12/12)^2
…
6000(1+0.12/12)^360
The PV of this saving will be the sum of all present values. Once I have this PV, then I calculate its future value easily with the formula:
FV_{t}=PV*(1+R)^{t}
In this case, t will be 360 and R will be (0.12/12)
The same logic follows for the bond investment. And at the end I sum the 2 future values to get the ending balance of my investment after 30 years. From year 31 I will be enjoying my retirement savings for the following 25 years. Then I need to calculate the withdrawal amount per month from the first month of year 31 to endup with balance zero at the end of the 25 years.
With this conceptualization in mind, I start writing a function for the present value. I will use the approach of vectorization (instead of loops) trying to follow my logic.
import numpy as npdef pv_annuity(C, Y, APR, p):""" Computes the present value of an annuity. Parameters: C (float): Payment per period (e.g., monthly payment) Y (int): Number of years APR (float): Annual percentage rate (e.g., 0.12 for 12%) p (int): Number of compounding periods per year (e.g., 12 for monthly) Returns: float: Present value of the annuity """# Total number of payments n_periods = Y * p# Create arrays for payments and time periods cf = np.repeat(C, n_periods) months = np.arange(1, n_periods +1)# Compute present value of each cash flow pvcf = cf / (1+ APR / p) ** months# Return the sum of all discounted cash flowsreturn np.sum(pvcf)# Example usage:mystock_pv = pv_annuity(C=6000, Y=30, APR=0.12, p=12)mybond_pv = pv_annuity(C=3000, Y=30, APR=0.05, p=12)# Calculate their respective future values at the end of month 360mystock_fv = mystock_pv * (1+0.12/12) ** (30*12)mybond_fv = mybond_pv * (1+0.05/12) ** (30*12)# Print results with formattingprint(f"The future value of the stock account will be: ${mystock_fv:,.2f}")
The future value of the stock account will be: $20,969,784.80
print(f"The future value of the bond account will be: ${mybond_fv:,.2f}")
The future value of the bond account will be: $2,496,775.91
# Total future valuemy_fv = mystock_fv + mybond_fvprint(f"My end balance after 30 years will be: ${my_fv:,.2f}")
My end balance after 30 years will be: $23,466,560.70
I could also use the annuity formula to calculate the present value of an annuity instead of calculating it using vectorization. Here is another approach for an alternative function to get the present value of an annuity. The formula for annuity is:
PV=\frac{C}{R}*\left[1-\frac{1}{(1+R)^{N}}\right]
Then I write a function to calculate this:
import numpy as npdef pv_annuity2(C, Y, APR, p):""" Computes the present value of an annuity using the closed-form formula. Parameters: C (float): Payment per period (e.g., monthly payment) Y (int): Number of years APR (float): Annual percentage rate (e.g., 0.12 for 12%) p (int): Number of compounding periods per year (e.g., 12 for monthly) Returns: float: Present value of the annuity """ rate = APR / p n_periods = Y * p pv = C / rate * (1-1/ (1+ rate) ** n_periods)return pv# Example usage:mystock_pv2 = pv_annuity2(C=6000, Y=30, APR=0.12, p=12)mybond_pv2 = pv_annuity2(C=3000, Y=30, APR=0.05, p=12)# Calculate their respective future values at the end of month 360mystock_fv2 = mystock_pv2 * (1+0.12/12) ** (30*12)mybond_fv2 = mybond_pv2 * (1+0.05/12) ** (30*12)# Total future valuemy_fv2 = mystock_fv2 + mybond_fv2# Print resultsprint(f"The future value of the stock account will be: ${mystock_fv2:,.2f}")
The future value of the stock account will be: $20,969,784.80
print(f"The future value of the bond account will be: ${mybond_fv2:,.2f}")
The future value of the bond account will be: $2,496,775.91
print(f"My end balance after 30 years will be: ${my_fv2:,.2f}")
My end balance after 30 years will be: $23,466,560.70
I got the same result for the end balance as in my first approach.
Once I have the end balance after 30 years, then I can calculate how much money I can withdraw from my investment each month in order to finish my savings after 25 years. I will assume that I will be taking my money out of my account at the end of each month.
I can visualize this problem as another annuity where I know the present value that is my end balance after the 30 years, the future value will be zero, the annual rate is 9% and the number of years is 25. The only unknown is the monthly payment. The annuity formula is:
PV=\frac{C}{R}*\left[1-\frac{1}{(1+R)^{N}}\right]
C is the fixed payment of the annuity. In this problem, C represents the monthly withrawal. PV represents the value of my investment after 30 years. I treat this as present value since I am imagining myself 30 years from now that I already have my end balance of my investment. R is the monthly rate and N is the total number of months after 25 years.
Getting C from this formula:
C=\frac{PV*R}{\left[1-\frac{1}{(1+R)^{N}}\right]}
I can easily calculate the monthly withdrawal C in Python:
# Given parametersAPR =0.09# Annual percentage rate (9%)Y =25# Number of yearsp =12# Periods per year# Assuming 'myfv' was already computed previously (from your annuity example)# For example:# my_fv = mystock_fv2 + mybond_fv2# Compute the monthly withdrawalwdraw = my_fv * (APR / p) / (1- (1/ (1+ APR / p) ** (Y * p)))print("The monthly withdrawal will be:")
The monthly withdrawal will be:
print(f"${wdraw:,.2f}")
$196,930.52
I can test my result using the annuity function. I can calculate the present value of an annuity with a fixed payment equal to the withdrawal amount I calculated. If this present value is equal to the investment I would have at year 30, then the withraw calculation will be correct:
# Compute the required investment amount for the stock and bond accountsStock_and_Bond_Inv = pv_annuity(C=wdraw, Y=25, APR=APR, p=12)print("The calculated investment of the stock and bond accounts at year 30 will be:")
The calculated investment of the stock and bond accounts at year 30 will be:
print(f"${Stock_and_Bond_Inv:,.2f}")
$23,466,560.70
Since I got the same investment as the one I had calculated before, then I test that the withdrawal amount is correct.
6 Data structures
In Finance there are basically the following dataset structures or data types:
Time-series: in this structure you can have many periods, and information for one “subject” (subject can be company, index, industry, etc). Also, you can have more than one subject, but the information is placed as new column for each subject. For example: the dataset created after running getsymbols:
Cross-sectional structure: in this structure, you usually have many “subjects”, for ONLY ONE period For example, key performance indicators of Mexican companies for the last quarter of 2016:
Panel-data structure: it is combination of time-series and cross-sectional. In this structure, we have more than one subject, and for each subject we can have more than one period. Example:
For financial analysis, data management is a very important process. Most of the time, before designing and running a financial / econometric model, it is needed to do data wrangling: data collection/importing, data cleanning, data merging, data transformation.
In this example we will learn how to merge two datasets, and then do summary tables that gives us understanding of the data.
We will merge 2 dataset: a cross-sectional dataset with general information of firms from the S&P500 Market index, and a panel dataset with annual financial information for US public firms (A public firm is a company that issue shares in the market)
Download the following csv datasets:
List of all US public firms with general information of each firm
Download and import these 2 datasets with the following code:
import pandas as pdimport requestsheaders = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}# Download the first fileurl1 ='https://www.apradie.com/datos/usfirms.csv'response1 = requests.get(url1, headers=headers)withopen('usfirms.csv', 'wb') as f: f.write(response1.content)
633061
# Download the second fileurl2 ='https://www.apradie.com/datos/usdata.csv'response2 = requests.get(url2, headers=headers)withopen('usdata.csv', 'wb') as f: f.write(response2.content)
11317275
# Import as data framesusfirms = pd.read_csv('usfirms.csv')usdata = pd.read_csv('usdata.csv')# Display the first few rows of each DataFrame to confirmprint("usfirms DataFrame:")
usfirms DataFrame:
usfirms.head()
empresa ... SectorEconomatica
0 A ... Electric Electron
1 AA ... Basic & Fab Metal
2 AABA_old ... Software & Data
3 AAC_old ... Other
4 AAIC_old ... Funds
[5 rows x 7 columns]
print("\nusdata DataFrame:")
usdata DataFrame:
usdata.head()
firm year revenue ... originalprice sharesoutstanding fixedassets
0 A 2000 10773000.0 ... 54.75 456366.381 1741000.0
1 A 2001 8396000.0 ... 28.51 463695.160 1848000.0
2 A 2002 6010000.0 ... 17.96 467024.421 1579000.0
3 A 2003 6056000.0 ... 29.24 476149.083 1447000.0
4 A 2004 7181000.0 ... 24.10 486841.087 1258000.0
[5 rows x 28 columns]
The first dataset contains a catalog of all US firms along with the corresponding industry type and status (active or cancelled), while the second dataset contains the historical financial data of the firms.
The usdata is a panel-dataset, which is called to be a long-format dataset. Each row has financial information for one US firm and 1 period (a year). All $ amounts are in thousands (’1000s).
Here is a data dictionary of the columns:
Data dictionary of historical annual financial data.
Variable
Description
firm
Firm ticker - Unique code of the company
year
Fiscal Year
revenue
Total sales of the firm in the fiscal year
cogs
Cost of good sold in the fiscal year - variable costs of the products sold
sgae
Sales and general administrative expenses in the fiscal year
depreciation
Depreciation and ammortization in the fiscal year
otherincome
Other operational income/expenses that are not directly from the core operations of the firm
extraordinaryitems
Extra income/expenses not related to regular operations
finexp
Financial expenses - interest expenses paid in the year (generated from loans)
incometax
Income tax paid in the year
cashflowoper
Cash flow generated from operations during the year
cashinbanks
Cash balance in banks at the end of the year
totalassets
Total assets of the firm at the end of the year
currentassets
Current assets of the firm at the end of the year
accountsreceivables
Accounts receivables at the end of the year
inventory
Inventory at the end of the year
totalliabilities
Total liabilities of the firm at the end of the year
currentliabilities
Current liabilities of the firm at the end of the year
accountspayable
Accounts payable at the end of the year
shortdebt
Short-term financial debt at the end of the year
longdebt
Balance of long-term financial debt (loans to pay longer than 1 year)
retainedearnings
Retained earnings in the year
adjprice
Stock adjusted price at the end of the year; adjusted for stock splits and dividend payments; used to calculate stock returns
originalprice
Historical stock price (not adjusted); used to calculate historical market value
sharesoutstanding
Historical number of shares available in the market
fixedassets
Fixed assets value at the end of the year
Each row of this dataset has yearly financial data of one firm in one year. All firms have years 2000 to 2023. Not all firms have existed since 2000. Then, it is possible to know when each firm went public to issue shares in the financial market: the first year with some non-empty data.
Earnings before interest and Taxes (ebit) and Net Income (netincome) must be calculated as:
The usfirms.csv is a catalog of all active and cancelled US firms:
Variable
Description
empresa
Firm ticker - Unique code of the company
Nombre
Name of the firm
status
Status of the firm: active or cancelled
partind
Percent participation in the S&P500 market index
naics1
North American Industry Classification Code - Level 1
naics2
North American Industry Classification Code - Level 2
SectorEconomatica
Economatica Industry classification
In the dataus dataset we have annual financial data. For income-statement variables, the information is the amount generated in the corresponding year fiscal year. For balance-sheet variables, the information is the value of the variable at the end of the year (since the beginning when the firm was created).
8 Data merging
In the usdata panel dataset there is no information about firm name, industry, status (active or cancelled), etc. That information is in the usfirms dataset. So, we start by integrating this information into the panel data.
Both datasets have a common/match column: firm ticker, but the column names for firm ticker in the datasets are different. the firm ticker column is called empresa for the usfirms, and firm for the usdata.
Merge the usfirm into the usdata indicating that empresa and firm columns are the matching columns. Only pull the following columns from the usfirm: Nombre, status, naics1 (industry naics level 1) and SectorEconomatica.
You can do this with the following code:
merged_data = pd.merge(usdata, usfirms[['empresa', 'Nombre', 'status', 'naics1']], left_on='firm', right_on='empresa', how='left')# Drop the redundant 'empresa' column from the merged DataFramemerged_data = merged_data.drop('empresa', axis=1)# Display the first few rows of the merged DataFrame to confirmprint("Merged DataFrame:")
Merged DataFrame:
merged_data.head()
firm year revenue ... Nombre status naics1
0 A 2000 10773000.0 ... Agilent Technologies, Inc active Manufacturing
1 A 2001 8396000.0 ... Agilent Technologies, Inc active Manufacturing
2 A 2002 6010000.0 ... Agilent Technologies, Inc active Manufacturing
3 A 2003 6056000.0 ... Agilent Technologies, Inc active Manufacturing
4 A 2004 7181000.0 ... Agilent Technologies, Inc active Manufacturing
[5 rows x 31 columns]
This type of merging is called left-join merging since we use the left dataset as base to pull data from the right dataset.
When doing a left-join merging of 2 dataset, it is very useful to visualize the left and the right datasets. The left dataset in this case is the usdata, and the right dataset is the usfirms. The left dataset is the base dataset, and we pull columns from the right dataset according to the matching columns.
When we do a vlookup in Excel, we do a left-join considering the left dataset the current sheet, and the right dataset the sheet you indicate to pull a column.
When we do a left-join, the resulting merged dataset must have the same number of rows than the left dataset. It is alwasy recommended to check the number of rows and columns of the resulting merged dataset to make sure that the merged opration was performed correctly:
We can use the shape method of pandas to show the # of rows and columns for each dataset:
print("Dimention of the usdata: ")
Dimention of the usdata:
print(usdata.shape)
(62453, 28)
print("Dimention of the usfirm: ")
Dimention of the usfirm:
print(usfirms.shape)
(5691, 7)
print("Dimention of the merged_data: ")
Dimention of the merged_data:
print(merged_data.shape)
(62453, 31)
We see that the merged_data has the same # of rows than the left dataset usdata. The # of columns of the merged_data is 28 since we pulled 3 columns from the right dataset. Then, we can continue with our data analysis.
9 CHALLENGE 2: Basic Data analysis and visualization
Using the merged dataset, create a dataset selecting historical information for Apple. Call this dataset apple_financials. The firm code (ticker) for Apple Inc Corporation is AAPL.
Using apple_financials, calculate a column for gross profit and ebit:
grossprofit = revenue - cogs
ebit = grossprofit - sgae - depreciation
Calculate the annual growth for all years for the following: 1) revenue, 2)grossprofit, and 3)ebit
Design and show a plot to visualize how revenue, gross profit and ebit have been growing over the years
Show a table with the annual growth for revenue, gross profit and ebit for all years
Design and show a plot to visualize annual growth variables from the previous table
Create a new column for market value:
marketvalue = originalprice * sharesoutstanding
For revenue, grossprofit, ebit, and marketvalue create a growth index starting with 1.0 in the first year. In other words, the index of each variable must represent how many times the variable has grown with respect to its value in the first year,
Show a plot to see the growth indexes over the years.
Use an LLM to generate the corresponding code to the previous steps. Make sure you understand the Python code. You can ask the LLM to explain the code line by line.
Fix any error generated by the LLM, or any calculation that can be improved, and re-generate the code.
Provide an insight about the previous plots. What can you say about Apple performance in terms of revenue, gross profit, ebit, and marketvalue? Explain with your words.
(Optional) Do an online check, find the most recent 10-K report (annual financial statement) of Apple, and compare 2023 annual sales and EBIT with what you got. You must get the same amount for the annual sales reported at the end of fiscal year (for Apple, the fiscal year is Q3).
10 CHALLENGE 2 - SOLUTION
After reading the challenge, I wrote the following prompt to Gemini:
“Analyze the financial performance of Apple (AAPL) using the merged_data dataset. Filter the data to create a new dataset apple_financials containing only Apple’s information. Calculate ‘grossprofit’ and ‘ebit’ using the formulas: grossprofit = revenue - cogs and ebit = grossprofit - sgae - depreciation. Calculate the annual growth for ‘revenue’, ‘grossprofit’, and ‘ebit’. Visualize the trend of ‘revenue’, ‘gross profit’, and ‘ebit’ over the years, and also visualize their annual growth rates. Finally, display a table showing the annual growth rates for ‘revenue’, ‘gross profit’, and ‘ebit’.”
And Gemini proposed subtasks to solve the problem step by step, and I got the following explanations and code:
Filter data for appleSubtask:Create a new DataFrame apple_financials containing only the data for Apple (ticker ‘AAPL’) from the merged_data DataFrame.
Reasoning: Filter the merged_data DataFrame to select rows where the ‘firm’ column is equal to ‘AAPL’ and store the result in apple_financials, then display the head of the new dataframe.
firm year revenue ... Nombre status naics1
163 AAPL 2000 7983000.0 ... Apple Inc active Manufacturing
164 AAPL 2001 5363000.0 ... Apple Inc active Manufacturing
165 AAPL 2002 5742000.0 ... Apple Inc active Manufacturing
166 AAPL 2003 6207000.0 ... Apple Inc active Manufacturing
167 AAPL 2004 8279000.0 ... Apple Inc active Manufacturing
[5 rows x 31 columns]
Calculate gross profit and ebitSubtask:Add new columns for ‘grossprofit’ and ‘ebit’ to the apple_financials DataFrame based on the provided formulas.
Reasoning: Calculate ‘grossprofit’ and ‘ebit’ and add them as new columns to the apple_financials DataFrame.
Calculate annual growthSubtask:Calculate the year-over-year growth rate for ‘revenue’, ‘grossprofit’, and ‘ebit’ in the apple_financials DataFrame.
Reasoning: Calculate the year-over-year percentage change for ‘revenue’, ‘grossprofit’, and ‘ebit’ and store them in new columns, then display the updated dataframe.
Visualize revenue, gross profit, and ebitSubtask:Create a plot to visualize the trend of ‘revenue’, ‘grossprofit’, and ‘ebit’ over the years for Apple.
import matplotlib.pyplot as pltplt.figure(figsize=(12, 6))plt.plot(apple_financials['year'], apple_financials['revenue'], label='Revenue')plt.plot(apple_financials['year'], apple_financials['grossprofit'], label='Gross Profit')plt.plot(apple_financials['year'], apple_financials['ebit'], label='EBIT')plt.xlabel('Year')plt.ylabel('Amount')plt.title('Financial Performance of Apple (AAPL) Over Time')plt.legend()plt.show()
Display annual growth tableSubtask:Show a table containing the calculated annual growth rates for ‘revenue’, ‘grossprofit’, and ‘ebit’.
Reasoning: Select the relevant columns and display them as a table to show the annual growth rates.
I see that there are negative values for EBIT for some years.
Interestingly, the traditional way to calculate percentage change of any value only applies when both, the current and previous values are positive! I recommend to try with simple examples to see what happens when you calculate % change when a variable moves from negative to positive, or viceversa, or when both are negative. You will get weird results that does not make sense!
Then, I have to make a decision for these cases since that can affect both, the plots and any other summary I might need to do like calculating average annual growth. I will assign a null value for these cases to avoid confusion with plots and avoid using these non-sense values in case I do annual average % change of the variables.
Check that these cases were NOT DETECTED by Gemini LLM! Remember that you are the boss here, not the LLM’s!
I do the following code to assign null values for these cases:
# Set growth to NaN when the current or previous value of the variable are negative. #I do this for both EBIT and grossprofitapple_financials.loc[(apple_financials['grossprofit']<0) | (apple_financials['grossprofit'].shift(1)<0), 'grossprofit_growth'] =Noneapple_financials.loc[(apple_financials['ebit']<0) | (apple_financials['ebit'].shift(1)<0), 'ebit_growth'] =None
I check the values to make sure that the code run correctly:
plt.figure(figsize=(12, 6))plt.plot(apple_financials['year'], apple_financials['revenue_growth'], label='Revenue Growth')plt.plot(apple_financials['year'], apple_financials['grossprofit_growth'], label='Gross Profit Growth')plt.plot(apple_financials['year'], apple_financials['ebit_growth'], label='EBIT Growth')plt.xlabel('Year')plt.ylabel('Growth Rate (%)')plt.title('Annual Growth Rates of Revenue, Gross Profit, and EBIT for Apple (AAPL)')plt.legend()plt.show()
I can now appreciate the annual growth of all variables over time.
The following is the summary given by Gemini:
Summary:Data Analysis Key Findings
The analysis successfully filtered the financial data to include only Apple (AAPL).
‘Grossprofit’ and ‘ebit’ were calculated using the provided formulas: grossprofit = revenue - cogs and ebit = grossprofit - sgae - depreciation.
Annual growth rates for ‘revenue’, ‘grossprofit’, and ‘ebit’ were calculated as percentage changes from the previous year.
The trend of ‘revenue’, ‘gross profit’, and ‘ebit’ over the years shows a general upward trajectory.
A table displaying the annual growth rates for ‘revenue’, ‘gross profit’, and ‘ebit’ was generated, showing the year-over-year percentage changes.
A plot visualizing the annual growth rates of ‘revenue’, ‘gross profit’, and ‘ebit’ was created, illustrating the volatility and trends in growth over the years.
Insights or Next Steps
Analyze the periods with significant fluctuations in growth rates to understand the underlying business drivers or external factors.
Compare Apple’s financial performance and growth rates against industry benchmarks or key competitors to gain further insights into its relative position.
Task
Analyze how Apple’s market value has grown over time and compare it with its revenue, gross profit, and EBIT trend/growth using the apple_financials dataset. For market value, multiply the sharesoutstanding and originalprice columns. Show plots, statistics, and insights about Apple’s growth in terms of market perception (value) and fundamentals (revenue, profit).
Calculate market value
Subtask:
Create a new column ‘market_value’ in the apple_financials DataFrame by multiplying ‘sharesoutstanding’ and ‘originalprice’.
Reasoning: Calculate ‘market_value’ by multiplying ‘sharesoutstanding’ and ‘originalprice’ and add it as a new column to the apple_financials DataFrame, then display the head of the updated dataframe.
firm year revenue ... ebit_growth market_value market_value_growth
163 AAPL 2000 7983000.0 ... NaN 8.367953e+06 NaN
164 AAPL 2001 5363000.0 ... NaN 5.441100e+06 -34.976932
165 AAPL 2002 5742000.0 ... NaN 5.203846e+06 -4.360410
166 AAPL 2003 6207000.0 ... NaN 7.511019e+06 44.335934
167 AAPL 2004 8279000.0 ... NaN 1.503210e+07 100.133916
[5 rows x 38 columns]
Visualize market value trend
Subtask:
Create a plot to visualize the trend of ‘market_value’ over the years, potentially alongside revenue, gross profit, and EBIT for comparison.
Reasoning: Create a plot to visualize the trend of ‘market_value’ over the years, potentially alongside revenue, gross profit, and EBIT for comparison.
plt.figure(figsize=(12, 6))plt.plot(apple_financials['year'], apple_financials['revenue'], label='Revenue')plt.plot(apple_financials['year'], apple_financials['grossprofit'], label='Gross Profit')plt.plot(apple_financials['year'], apple_financials['ebit'], label='EBIT')plt.plot(apple_financials['year'], apple_financials['market_value'], label='Market Value')plt.xlabel('Year')plt.ylabel('Amount')plt.title('Financial and Market Value Trend of Apple (AAPL) Over Time')plt.legend()plt.show()
I see that market value has grown a lot over time compared to revenue, ebit and gross profit.
Visualize market value growthSubtask:Create a plot to visualize the annual growth rate of ‘market_value’, potentially alongside the growth rates of revenue, gross profit, and EBIT for comparison.
Reasoning: Create a plot to visualize the annual growth rates of ‘market_value’, ‘revenue’, ‘grossprofit’, and ‘ebit’ over the years for comparison.
plt.figure(figsize=(12, 6))plt.plot(apple_financials['year'], apple_financials['market_value_growth'], label='Market Value Growth')plt.plot(apple_financials['year'], apple_financials['revenue_growth'], label='Revenue Growth')plt.plot(apple_financials['year'], apple_financials['grossprofit_growth'], label='Gross Profit Growth')plt.plot(apple_financials['year'], apple_financials['ebit_growth'], label='EBIT Growth')plt.xlabel('Year')plt.ylabel('Growth Rate (%)')plt.title('Annual Growth Rates of Market Value, Revenue, Gross Profit, and EBIT for Apple (AAPL)')plt.legend()plt.show()
Analyze and summarizeSubtask:Provide statistics and insights about the relationship between Apple’s market value growth and the growth of its fundamental financial metrics.
Reasoning: Calculate descriptive statistics and the correlation matrix for the growth rate columns and then analyze them to provide insights.
print("\nAnalysis of Growth Rates and Market Value Relationship:")
Analysis of Growth Rates and Market Value Relationship:
print("The descriptive statistics provide an overview of the central tendency, dispersion, and shape of the distribution for each growth rate. Large standard deviations indicate significant volatility in annual growth.")
The descriptive statistics provide an overview of the central tendency, dispersion, and shape of the distribution for each growth rate. Large standard deviations indicate significant volatility in annual growth.
print("\nThe correlation matrix shows the linear relationship between the growth rates. A high positive correlation suggests that when one metric grows, the other tends to grow as well. A high negative correlation suggests an inverse relationship.")
The correlation matrix shows the linear relationship between the growth rates. A high positive correlation suggests that when one metric grows, the other tends to grow as well. A high negative correlation suggests an inverse relationship.
print("\nBy examining these statistics and correlations, we can gain insights into how closely market value growth tracks the growth of fundamental financial metrics like revenue, gross profit, and EBIT.")
By examining these statistics and correlations, we can gain insights into how closely market value growth tracks the growth of fundamental financial metrics like revenue, gross profit, and EBIT.
print("\nObserving periods where market value growth significantly diverges from fundamental growth (e.g., market value growing much faster or slower than profits) can indicate factors beyond current financial performance influencing market perception, such as future expectations, market sentiment, or macroeconomic conditions.")
Observing periods where market value growth significantly diverges from fundamental growth (e.g., market value growing much faster or slower than profits) can indicate factors beyond current financial performance influencing market perception, such as future expectations, market sentiment, or macroeconomic conditions.
Summary given by Gemini:
Summary:
Data Analysis Key FindingsApple’s market value was calculated by multiplying sharesoutstanding and originalprice.Year-over-year growth rates for market value, revenue, gross profit, and EBIT were calculated.Visualizations showed the trends of market value, revenue, gross profit, and EBIT over time, as well as their annual growth rates.Descriptive statistics highlighted the volatility in the annual growth rates of all metrics, as indicated by large standard deviations.A positive correlation exists between market value growth and the growth of fundamental financial metrics (revenue, gross profit, and EBIT). The strongest correlation was observed between market value growth and EBIT growth (approximately 0.769).
Insights or Next StepsThe analysis suggests that Apple’s market value growth is positively related to its fundamental financial performance, particularly EBIT growth.Further investigation into periods where market value growth significantly diverges from fundamental growth could reveal other factors influencing market perception, such as investor sentiment or new product cycles.
TaskAnalyze the growth of ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’ for Apple over time using the apple_financials dataset. Calculate a growth index for each variable, starting at 1 in the first year. Visualize these growth indices over time and provide insights based on the plot.
Calculate growth indicesSubtask:Calculate a growth index for ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’, starting with an index of 1 in the first year.
Reasoning: Select the necessary columns, set ‘year’ as the index, and calculate the growth index for each selected financial metric and market value by normalizing to the first year’s value, then display the head of the resulting dataframe.
growth_indices = apple_financials[['year', 'revenue', 'grossprofit', 'ebit', 'market_value']].copy()growth_indices.set_index('year', inplace=True)for col in ['revenue', 'grossprofit', 'ebit', 'market_value']: growth_indices[f'{col}_index'] = growth_indices[col] / growth_indices[col].iloc[0]growth_indices.head()
Visualize growth indicesSubtask:Create a plot to visualize how each of these growth indices has changed over time.
Reasoning: Create a plot to visualize how each of these growth indices has changed over time as instructed.
plt.figure(figsize=(12, 6))plt.plot(growth_indices.index, growth_indices['revenue_index'], label='Revenue Index')plt.plot(growth_indices.index, growth_indices['grossprofit_index'], label='Gross Profit Index')plt.plot(growth_indices.index, growth_indices['ebit_index'], label='EBIT Index')plt.plot(growth_indices.index, growth_indices['market_value_index'], label='Market Value Index')plt.xlabel('Year')plt.ylabel('Growth Index (Normalized to 2000)')plt.title('Growth of Financial Metrics and Market Value for Apple (AAPL) - Indexed to 2000')plt.legend()plt.show()
Comparing with a previous plot of the variables (not their growth indexes), we see EBIT has grown at a faster rate than grossprofit and revenue! this is against the perception of the previous plot where revenue looks that is growing much faster than grossprofit and EBIT!
Let’s continue with Gemini analysis:
Analyze growth indicesSubtask:Provide insights based on the plot of the growth indices.
Summary:Data Analysis Key FindingsThe growth index for each variable (‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’) was calculated by normalizing the values to the first year (2000), where the index starts at 1.The plot of the growth indices over time reveals a significantly steeper growth trajectory for ‘market_value’ compared to ‘revenue’, ‘grossprofit’, and ‘ebit’.While all metrics show positive growth over the period, the market value’s growth far outpaces the growth in revenue, gross profit, and EBIT.
Insights or Next StepsThe divergence in growth between market value and financial metrics suggests that market valuation is strongly influenced by factors beyond current financial performance, potentially including future growth expectations, brand value, and investor sentiment.A potential next step could be to investigate specific events or periods that correlate with significant divergences in growth between market value and financial performance to understand the drivers behind these shifts.
11 CHALLENGE 3. Summary tables to provide data understanding
Using the merged dataset (with all firms), do the following:
Create columns for grossprofit, ebit and marketvalue as you did with the apple_financials dataset
Create a summary table by year with the sum of
revenue
grossprofit
ebit
marketvalue
Using the previous summary table, do a plot to show revenue, grossprofit, ebit and marketvalue over time
Using the previous summary table, create a growth index for each variable to show how many times each variable has grown with respect to the first year. Show a plot to see these indexes over time
Use an LLM to generate the corresponding code to the previous steps. Make sure you understand the Python code. You can ask the LLM to explain the code line by line.
Provide an insight about the previous plots. What can you say about the whole financial market performance in terms of revenue, gross profit, ebit, and marketvalue? Explain with your words.
12 CHALLENGE 3 SOLUTION
Calculate financial metricsSubtask:Add ‘grossprofit’, ‘ebit’, and ‘market_value’ columns to the merged_data DataFrame.
Reasoning: Add the ‘grossprofit’, ‘ebit’, and ‘market_value’ columns to the merged_data DataFrame and display the head.
Visualize trendsSubtask:Plot the yearly sum of ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’ to visualize their trends over time for the entire market.
Reasoning: Create a plot to visualize the trend of yearly summed financial metrics and market value over time for the entire market as instructed.
plt.figure(figsize=(12, 6))plt.plot(yearly_market_summary['year'], yearly_market_summary['revenue'], label='Revenue')plt.plot(yearly_market_summary['year'], yearly_market_summary['grossprofit'], label='Gross Profit')plt.plot(yearly_market_summary['year'], yearly_market_summary['ebit'], label='EBIT')plt.plot(yearly_market_summary['year'], yearly_market_summary['market_value'], label='Market Value')plt.xlabel('Year')plt.ylabel('Amount')plt.title('Overall Market Financial and Market Value Trend Over Time')plt.legend()plt.show()
Calculate growth indicesSubtask:Calculate a growth index for the yearly sum of ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’, normalized to the first year.
Reasoning: Select the necessary columns, set ‘year’ as the index, and calculate the growth index for each selected financial metric and market value by normalizing to the first year’s value, then display the head of the resulting dataframe.
growth_indices_market = yearly_market_summary[['year', 'revenue', 'grossprofit', 'ebit', 'market_value']].copy()growth_indices_market.set_index('year', inplace=True)for col in ['revenue', 'grossprofit', 'ebit', 'market_value']: growth_indices_market[f'{col}_index'] = growth_indices_market[col] / growth_indices_market[col].iloc[0]growth_indices_market.head()
Visualize growth indicesSubtask:Plot the yearly growth indices to visualize how these metrics have grown relative to the starting year for the entire market.
Reasoning: Plot the yearly growth indices to visualize how these metrics have grown relative to the starting year for the entire market.
plt.figure(figsize=(12, 6))plt.plot(growth_indices_market.index, growth_indices_market['revenue_index'], label='Revenue Index')plt.plot(growth_indices_market.index, growth_indices_market['grossprofit_index'], label='Gross Profit Index')plt.plot(growth_indices_market.index, growth_indices_market['ebit_index'], label='EBIT Index')plt.plot(growth_indices_market.index, growth_indices_market['market_value_index'], label='Market Value Index')plt.xlabel('Year')plt.ylabel('Growth Index (Normalized to First Year)')plt.title('Growth of Overall Market Financial Metrics and Market Value - Indexed to First Year')plt.legend()plt.show()
In the previous plot, it seems that sales and market value has grown faster than the other variables. However, with this plot of the growth indexes, we see that market value has grown at a similar pace than EBIT over the whole period. Interesting!
So, which plot is a better picture of the real growth rates of the variables?
Above is what Gemini analyzed and summarized:
Analyze and summarizeSubtask:Provide insights based on the plots, discussing the overall financial market performance in terms of revenue, gross profit, EBIT, and market value.
Summary:Data Analysis Key FindingsMarket value has shown significantly higher growth compared to revenue, gross profit, and EBIT when indexed to the first year. Revenue, gross profit, and EBIT demonstrate a general upward trend over the years, albeit with less pronounced growth than market value.
Insights or Next StepsThe divergence in growth between market value and fundamental financial metrics suggests that market sentiment and other non-financial factors have played a substantial role in driving market valuation during the observed period.
Further analysis could investigate potential factors contributing to the outperformance of market value growth, such as changes in market multiples (e.g., P/E ratios) or industry-specific trends.