This is an INDIVIDUAL workshop. In this workshop we practice algorithmic thinking and programming with loops and functions, and we introduce programming for data management. We practice with financial problems such as bond valuation, calculation of financial ratios, and doing summary tables of financial variables.
1 General Directions for each workshop
You have to work on Google Colab for all your workshops. In Google Colab, you MUST LOGIN with your @tec.mx account and then create a google colab document for each workshop.
You must share each Colab document (workshop) with the following account:
cdorante@tec.mx
You must give Edit privileges to these accounts.
In Google Colab you can work with Python or R notebooks. The default is Python notebooks.
Your Notebook will have a default name like “Untitled2.ipynb”. Click on this name and change it to “W2-Algorithms-YourFirstName-YourLastname”.
In your Workshop Notebook you have to:
You have to replicate all the code of the workshop, and do all the challenges to get full credit for the workshop. The accuracy of the challenge will not significantly affect your grade; completion will have more weight for your workshop grade.
It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your personal notebook to study for the FINAL EXAM. Your own workshop/notebook will be very helpful for your further study.
Once you finish your workshop, make sure that you RUN ALL CHUNKS. You can run each code chunk by clicking on the “Run” button located in the top-left section of each chunk. You can also run all the chunks in one-shot with Ctrl-F9. You have to submit to Canvas the web link of your Google Colab workshop.
2 Loops
The for statement is used to repeat certain tasks a specific number of times. We can loop over a sequence of numbers, or we can loop over specific elements of a vector.
In a similar manner, it is possible to program a loop with while statement. The main difference is that you need to specify what is the exit condition at the beginning of the loop. The while loop is useful when we do not know exactly how many iterations are needed to find a result.
Here is an example. If you want to know how many years you need to keep an investment with a fixed interest rate in order to duplicate your initial investment:
# Define variablesAPR =0.10# Annual Percentage Rate = 10%INV =100# Initial investment = $100MULTIPLE =2# Multiple = 2 (investment should double)BALANCE = INV # Start balance equal to initial investmentyear =0# Start with year = 0# Loop until balance reaches multiple of initial investmentwhile BALANCE < MULTIPLE * INV: year +=1# Increase year by 1 BALANCE = BALANCE * (1+ APR) # Apply growth factor# Print resultsprint(f"To multiply your investment times {MULTIPLE}, at an interest rate of {100*APR}%, you need {year} years.")
To multiply your investment times 2, at an interest rate of 10.0%, you need 8 years.
print(f"Your balance after {year} years will be ${BALANCE:.2f}")
Your balance after 8 years will be $214.36
It is up to you to practice with the different types of loops. When you have to write a loop, you first have to understand in detail what you are looking for. Then, you have to identify the logic you need to do the iterations and which type of calculations you need to do.
It is very recommended first write a pseudo-code with your own words specifying the details of your logic. The pseudo-code is just a step of sequential tasks with your own words. You can use some names of commands, but if you do not remember, you can just write what you want to do line by line, and specify what are the iterations and conditions of your algorithm.
Once you have your pseudo-code, you can use it as your prompt for your favorite LLM (such as Gemini, chatGPT, etc)
The more structured and clear you write your prompts to generate code, the more effective you will be writing code to automate any interesting and complex task in the context of Finance.
3 Looping vs vectorization
Interestingly, Python and other languages such as R, are designed to easily perform repetitive tasks for data frames, vectors and matrices without using a loop.
In Python, unlike other traditional computer languages such as C, there is an alternative way to do loops without writing loops!
This sounds weird. The way you can do a loop without writing a loop is using “vectorization”. Python can perform vectorization in a natural way. For example, if you want to do a sum of 2 vectors or matrices in a traditional language, you might think in a loop to do an element-by-element operation.
Imagine you want calculate the net present value of a new project in your company. The project involves 2 new products, A and B. Each product will bring its own income. You already estimated the expected annual free cash flow (income minus all expenses) of both products for the next 5 years. The following table shows the expected free cash flow for each year for each product (in millions MXP )
Product
Year 1
Year 2
Year 3
Year 4
Year 5
A
3
3
4
6
8
B
-2
-3
-1
3
5
The discount rate (or cost of capital) for this project (annual) is 15%. The initial investment of the project is 10 million pesos.
How can you estimate the present value of these cash flows?
You first have to add the cash flows of each product and get an annual free cash flow for the project. You can use a loop that for each year, you do the sum of the 2 corresponding free-cash flows. Something like:
# Define vectors of cash flows as Python listsA = [3, 3, 4, 6, 8]B = [-2, -3, -1, 3, 5]# Initialize list (with zeros) of same length of A sumcashflows = [0] *len(A)#sumcashflows = {}# Loop through indicesfor i inrange(len(A)):# I sum the cash flows of A and B for the time period i, and store it in sumcashflows: sumcashflows[i] = A[i] + B[i]# I print the resulting stream of cash flows of the project:print(sumcashflows)
[1, 0, 3, 9, 13]
Now, instead of using a loop, we can use vectorization in a natural way to do the sum of both vectors:
import numpy as npA = np.array([3, 3, 4, 6, 8])B = np.array([-2, -3, -1, 3, 5])# Element-wise sum using vectorization:sumcashflows = A + Bprint(sumcashflows)
[ 1 0 3 9 13]
I used numpy arrays numpy library) instead of lists since lists cannot be vectorized.
Check that the statement A + B performs the vectorization by summing element by element of each array. Internally, this vectorization of arrays actually does a kind of loop, but we do not need to program the loop!!
Data frames (from pandas library) are also vectorized since the operations you do can be done to all columns or all rows.
4 Functions
In any programming language we write an algorithm to automate a process that receive a set of inputs, process the inputs with specific steps, and then generate or return an output or result, which can be one number, a list of numbers, a dataset or any set of bits of information.
If we find that we need to run that algorithm more than one time with different input values, then it is a good idea to write a function that automate the process, but we can easily re-use the function to solve similar problems but with different input values.
In a function the input variables are defined as parameters, and the output is explicitly return at the end of the function.
Here is an example:
Imagine a function that receive a number n and return the factorial of that number. I can write this function in Python as follows:
def factorial(n):# There are 3 possibilities about n: n can be negative, zero or positive# I will use a conditional if to calculate the value of the factorial of nif n <0:# If the number is negative, I send a message that the factorial cannot be calculated and return nothing (None)print("The parameter n must be positive.")returnNoneelse: # Here n can be zero or positive# If n = 0, then I return 1. Why? by definition, in mathematics the # factorial of 0 is 1:if n==0: return1else: # if n>0 I calculate the factorial as the multiplication from 1 to n # Step 1: I start with a variable result = 1 result =1# Step 2: I do a for loop to iterate from 1 to n, and for each iteration I multiply result times the iterator (counter) for i inrange(1, n+1): result = result * i# Step 4: I return the result variable return result
Once I defined my function, I can call my function more than one time. For example:
num=-5factnum = factorial(num)
The parameter n must be positive.
print(f"The factorial of {num} is {factnum}")
The factorial of -5 is None
num =5print(f"The factorial of {num} is {factorial(num)}")
The factorial of 5 is 120
num =10print(f"The factorial of {num} is {factorial(num)}")
The factorial of 10 is 3628800
Many times it is useful to call a function from a loop. Let’s work with the following Bond valuation problem.
Imagine I want to write a function to value a bond, and then see how sensitive the bond is to changes in market interests rates.
I start by writing a function to calcualte the value (price) of a bond. I will use the following parameters as inputs:
Parameters (inputs) of my function:
Year-to-maturity: # of years of the contract
Annual interest rate
Frequency: # of times a year for the coupon payments
Annual coupon rate: % of the Face Value to calculate the annual coupon amount
Face value: Principal of the bond
Output:
Bond price / value
Instead of writing the steps before, I will write the steps within my function as comments:
def bond_value(years_to_maturity, annual_interest_rate, frequency, annual_coupon_rate, face_value):""" Calculates the value of a bond. Arguments: years_to_maturity: The number of years until the bond matures. annual_interest_rate: The annual yield to maturity (discount rate). frequency: The number of coupon payments per year. annual_coupon_rate: The annual coupon rate as a decimal. face_value: The face value (par value) of the bond. Returns: The calculated value of the bond. """# Calculate periodic interest rate and number of periods periodic_interest_rate = annual_interest_rate / frequency number_of_periods = years_to_maturity * frequency# Calculate periodic coupon payment periodic_coupon_payment = (annual_coupon_rate * face_value) / frequency# Calculate the present value of coupon payments (annuity) pv_coupon_payments = periodic_coupon_payment * (1- (1+ periodic_interest_rate)**(-number_of_periods)) / periodic_interest_rate# Calculate the present value of the face value pv_face_value = face_value / (1+ periodic_interest_rate)**number_of_periods# Total bond value is the sum of present values of coupon payments and face value total_bond_value = pv_coupon_payments + pv_face_valuereturn total_bond_value# Example usage:# Bond with 10 years to maturity, 5% annual interest rate, semi-annual payments,# 6% annual coupon rate, and $1000 face value.years =10interest_rate =0.05freq =2coupon_rate =0.06face =1000bond_val = bond_value(years, interest_rate, freq, coupon_rate, face)print(f"The value of the bond is: ${bond_val:,.2f}")
The value of the bond is: $1,077.95
As you see I can use this function to value almost any type of bond. Now, imagine that I want to see how much the value of a specific bond changes when market interest rate changes.
Consider the calculation of a bond price with the following parameters: facevalue = 1,000, frequency = 4, coupon rate = 10%, and year-to-maturity = 10 years. Now, with these characteristics, I want to see how much the bond value changes if interests change from 4% to 20%.
I can write a loop to calculate the bond value by changing the interest rate from 4% to 20% jumping the interest rate by 0.5%. For each iteration save the interest rate and the bond value. At the end of the loop, do a plot to see how the bond price changes when the interest rate changes:
import matplotlib.pyplot as plt# Bond parametersface_value =1000frequency =4annual_coupon_rate =0.10years_to_maturity =10# Interest rate rangestart_interest_rate =0.04end_interest_rate =0.20step_interest_rate =0.005# Lists to store resultsinterest_rates = []bond_values = []# Loop through interest rates and calculate bond valueinterest_rate = start_interest_ratewhile interest_rate <= end_interest_rate: bond_val = bond_value(years_to_maturity, interest_rate, frequency, annual_coupon_rate, face_value) interest_rates.append(interest_rate) bond_values.append(bond_val) interest_rate += step_interest_rate# Plot the resultsplt.figure(figsize=(10, 6))plt.plot(interest_rates, bond_values)plt.xlabel("Annual Interest Rate")plt.ylabel("Bond Value")plt.title("Bond Value vs. Annual Interest Rate")plt.grid(True)plt.show()
5 CHALLENGE 1
You are planning to save for retirement over the next 30 years. To do this, you will invest $6,000 a month in a stock account and $3,000 a month in a bond account. The return of the stock account is expected to be 12% compounded monthly, and the bond account will pay 5% annual interest rate also compounded monthly. When you retire, you will combine your money into an account with a 9% annual return. How much can you withdraw each month from your account assuming a 25-year withdrawal period? At the end of 25 years your balance must be zero. Assume that all payments are done at the end of each month.
You have to:
Write/draw your mental map to visualize this problem with your own lens. You can do this in a paper, take a picture and include it in your Colab, or you can use any note software (like OneNote) to do this and paste it as image.
Write down the input and the output
Write the logical steps you propose to solve the problem. Use your own words. Write short and clear sentences for each step.
Paste your inputs, output, and your steps in Gemini to generate Python code. Make sure that your solution is correct. You can check a solution in Excel or carefully review the Python solution.
Make sure you understand the generated Python code. Write down any question about Python code to the next class.
6 Data structures
In Finance there are basically the following dataset structures or data types:
Time-series: in this structure you can have many periods, and information for one “subject” (subject can be company, index, industry, etc). Also, you can have more than one subject, but the information is placed as new column for each subject. For example: the dataset created after running getsymbols:
Cross-sectional structure: in this structure, you usually have many “subjects”, for ONLY ONE period For example, key performance indicators of Mexican companies for the last quarter of 2016:
Panel-data structure: it is combination of time-series and cross-sectional. In this structure, we have more than one subject, and for each subject we can have more than one period. Example:
For financial analysis, data management is a very important process. Most of the time, before designing and running a financial / econometric model, it is needed to do data wrangling: data collection/importing, data cleanning, data merging, data transformation.
In this example we will learn how to merge two datasets, and then do summary tables that gives us understanding of the data.
We will merge 2 dataset: a cross-sectional dataset with general information of firms from the S&P500 Market index, and a panel dataset with annual financial information for US public firms (A public firm is a company that issue shares in the market)
Download the following csv datasets:
List of all US public firms with general information of each firm
Download and import these 2 datasets with the following code:
import pandas as pdimport requestsheaders = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}# Download the first fileurl1 ='https://www.apradie.com/datos/usfirms.csv'response1 = requests.get(url1, headers=headers)withopen('usfirms.csv', 'wb') as f: f.write(response1.content)
633061
# Download the second fileurl2 ='https://www.apradie.com/datos/usdata.csv'response2 = requests.get(url2, headers=headers)withopen('usdata.csv', 'wb') as f: f.write(response2.content)
11317275
# Import as data framesusfirms = pd.read_csv('usfirms.csv')usdata = pd.read_csv('usdata.csv')# Display the first few rows of each DataFrame to confirmprint("usfirms DataFrame:")
usfirms DataFrame:
usfirms.head()
empresa ... SectorEconomatica
0 A ... Electric Electron
1 AA ... Basic & Fab Metal
2 AABA_old ... Software & Data
3 AAC_old ... Other
4 AAIC_old ... Funds
[5 rows x 7 columns]
print("\nusdata DataFrame:")
usdata DataFrame:
usdata.head()
firm year revenue ... originalprice sharesoutstanding fixedassets
0 A 2000 10773000.0 ... 54.75 456366.381 1741000.0
1 A 2001 8396000.0 ... 28.51 463695.160 1848000.0
2 A 2002 6010000.0 ... 17.96 467024.421 1579000.0
3 A 2003 6056000.0 ... 29.24 476149.083 1447000.0
4 A 2004 7181000.0 ... 24.10 486841.087 1258000.0
[5 rows x 28 columns]
The first dataset contains a catalog of all US firms along with the corresponding industry type and status (active or cancelled), while the second dataset contains the historical financial data of the firms.
The usdata is a panel-dataset, which is called to be a long-format dataset. Each row has financial information for one US firm and 1 period (a year). All $ amounts are in thousands (’1000s).
Here is a data dictionary of the columns:
Data dictionary of historical annual financial data.
Variable
Description
firm
Firm ticker - Unique code of the company
year
Fiscal Year
revenue
Total sales of the firm in the fiscal year
cogs
Cost of good sold in the fiscal year - variable costs of the products sold
sgae
Sales and general administrative expenses in the fiscal year
depreciation
Depreciation and ammortization in the fiscal year
otherincome
Other operational income/expenses that are not directly from the core operations of the firm
extraordinaryitems
Extra income/expenses not related to regular operations
finexp
Financial expenses - interest expenses paid in the year (generated from loans)
incometax
Income tax paid in the year
cashflowoper
Cash flow generated from operations during the year
cashinbanks
Cash balance in banks at the end of the year
totalassets
Total assets of the firm at the end of the year
currentassets
Current assets of the firm at the end of the year
accountsreceivables
Accounts receivables at the end of the year
inventory
Inventory at the end of the year
totalliabilities
Total liabilities of the firm at the end of the year
currentliabilities
Current liabilities of the firm at the end of the year
accountspayable
Accounts payable at the end of the year
shortdebt
Short-term financial debt at the end of the year
longdebt
Balance of long-term financial debt (loans to pay longer than 1 year)
retainedearnings
Retained earnings in the year
adjprice
Stock adjusted price at the end of the year; adjusted for stock splits and dividend payments; used to calculate stock returns
originalprice
Historical stock price (not adjusted); used to calculate historical market value
sharesoutstanding
Historical number of shares available in the market
fixedassets
Fixed assets value at the end of the year
Each row of this dataset has yearly financial data of one firm in one year. All firms have years 2000 to 2023. Not all firms have existed since 2000. Then, it is possible to know when each firm went public to issue shares in the financial market: the first year with some non-empty data.
Earnings before interest and Taxes (ebit) and Net Income (netincome) must be calculated as:
The usfirms.csv is a catalog of all active and cancelled US firms:
Variable
Description
empresa
Firm ticker - Unique code of the company
Nombre
Name of the firm
status
Status of the firm: active or cancelled
partind
Percent participation in the S&P500 market index
naics1
North American Industry Classification Code - Level 1
naics2
North American Industry Classification Code - Level 2
SectorEconomatica
Economatica Industry classification
In the dataus dataset we have annual financial data. For income-statement variables, the information is the amount generated in the corresponding year fiscal year. For balance-sheet variables, the information is the value of the variable at the end of the year (since the beginning when the firm was created).
8 Data merging
In the usdata panel dataset there is no information about firm name, industry, status (active or cancelled), etc. That information is in the usfirms dataset. So, we start by integrating this information into the panel data.
Both datasets have a common/match column: firm ticker, but the column names for firm ticker in the datasets are different. the firm ticker column is called empresa for the usfirms, and firm for the usdata.
Merge the usfirm into the usdata indicating that empresa and firm columns are the matching columns. Only pull the following columns from the usfirm: Nombre, status, naics1 (industry naics level 1) and SectorEconomatica.
You can do this with the following code:
merged_data = pd.merge(usdata, usfirms[['empresa', 'Nombre', 'status', 'naics1']], left_on='firm', right_on='empresa', how='left')# Drop the redundant 'empresa' column from the merged DataFramemerged_data = merged_data.drop('empresa', axis=1)# Display the first few rows of the merged DataFrame to confirmprint("Merged DataFrame:")
Merged DataFrame:
merged_data.head()
firm year revenue ... Nombre status naics1
0 A 2000 10773000.0 ... Agilent Technologies, Inc active Manufacturing
1 A 2001 8396000.0 ... Agilent Technologies, Inc active Manufacturing
2 A 2002 6010000.0 ... Agilent Technologies, Inc active Manufacturing
3 A 2003 6056000.0 ... Agilent Technologies, Inc active Manufacturing
4 A 2004 7181000.0 ... Agilent Technologies, Inc active Manufacturing
[5 rows x 31 columns]
This type of merging is called left-join merging since we use the left dataset as base to pull data from the right dataset.
When doing a left-join merging of 2 dataset, it is very useful to visualize the left and the right datasets. The left dataset in this case is the usdata, and the right dataset is the usfirms. The left dataset is the base dataset, and we pull columns from the right dataset according to the matching columns.
When we do a vlookup in Excel, we do a left-join considering the left dataset the current sheet, and the right dataset the sheet you indicate to pull a column.
When we do a left-join, the resulting merged dataset must have the same number of rows than the left dataset. It is alwasy recommended to check the number of rows and columns of the resulting merged dataset to make sure that the merged opration was performed correctly:
We can use the shape method of pandas to show the # of rows and columns for each dataset:
print("Dimention of the usdata: ")
Dimention of the usdata:
print(usdata.shape)
(62453, 28)
print("Dimention of the usfirm: ")
Dimention of the usfirm:
print(usfirms.shape)
(5691, 7)
print("Dimention of the merged_data: ")
Dimention of the merged_data:
print(merged_data.shape)
(62453, 31)
We see that the merged_data has the same # of rows than the left dataset usdata. The # of columns of the merged_data is 28 since we pulled 3 columns from the right dataset. Then, we can continue with our data analysis.
9 CHALLENGE 2: Basic Data analysis and visualization
Using the merged dataset, create a dataset selecting historical information for Apple. Call this dataset apple_financials. The firm code (ticker) for Apple Inc Corporation is AAPL.
Using apple_financials, calculate a column for gross profit and ebit:
grossprofit = revenue - cogs
ebit = grossprofit - sgae - depreciation
Calculate the annual growth for all years for the following: 1) revenue, 2)grossprofit, and 3)ebit
Design and show a plot to visualize how revenue, gross profit and ebit have been growing over the years
Show a table with the annual growth for revenue, gross profit and ebit for all years
Design and show a plot to visualize annual growth variables from the previous table
Create a new column for market value:
marketvalue = originalprice * sharesoutstanding
For revenue, grossprofit, ebit, and marketvalue create a growth index starting with 1.0 in the first year. In other words, the index of each variable must represent how many times the variable has grown with respect to its value in the first year,
Show a plot to see the growth indexes over the years.
Use an LLM to generate the corresponding code to the previous steps. Make sure you understand the Python code. You can ask the LLM to explain the code line by line.
Fix any error generated by the LLM, or any calculation that can be improved, and re-generate the code.
Provide an insight about the previous plots. What can you say about Apple performance in terms of revenue, gross profit, ebit, and marketvalue? Explain with your words.
(Optional) Do an online check, find the most recent 10-K report (annual financial statement) of Apple, and compare 2023 annual sales and EBIT with what you got. You must get the same amount for the annual sales reported at the end of fiscal year (for Apple, the fiscal year is Q3).
10 CHALLENGE 3. Summary tables to provide data understanding
Using the merged dataset (with all firms), do the following:
Create columns for grossprofit, ebit and marketvalue as you did with the apple_financials dataset
Create a summary table by year with the sum of
revenue
grossprofit
ebit
marketvalue
Using the previous summary table, do a plot to show revenue, grossprofit, ebit and marketvalue over time
Using the previous summary table, create a growth index for each variable to show how many times each variable has grown with respect to the first year. Show a plot to see these indexes over time
Use an LLM to generate the corresponding code to the previous steps. Make sure you understand the Python code. You can ask the LLM to explain the code line by line.
Provide an insight about the previous plots. What can you say about the whole financial market performance in terms of revenue, gross profit, ebit, and marketvalue? Explain with your words.
11 Datacamp online courses
YOU MUST TAKEChapter 1 : Getting to know a dataset from the course: Exploratory Data Analysis in Python
12 W2 submission
The grade of this Workshop will be the following:
Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions
Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions and/or you did not do all activities and respond to some of the questions.
Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop or you completed more but parts of your work is a copy-paste from other workshops.
Not submitted (0%)
Remember that you have to submit your Google Colab LINK, and you have to SHARE it with me.