Workshop 2 Solution, Algorithms and Data Analysis

Author

Alberto Dorantes

Published

October 6, 2025

Abstract
This is an INDIVIDUAL workshop. In this workshop we practice algorithmic thinking and programming with loops and functions, and we introduce programming for data management. We practice with financial problems such as bond valuation, calculation of financial ratios, and doing summary tables of financial variables.

1 Loops

The for statement is used to repeat certain tasks a specific number of times. We can loop over a sequence of numbers, or we can loop over specific elements of a vector.

In a similar manner, it is possible to program a loop with while statement. The main difference is that you need to specify what is the exit condition at the beginning of the loop. The while loop is useful when we do not know exactly how many iterations are needed to find a result.

Here is an example. If you want to know how many years you need to keep an investment with a fixed interest rate in order to duplicate your initial investment:

# Define variables
APR = 0.10        # Annual Percentage Rate = 10%
INV = 100         # Initial investment = $100
MULTIPLE = 2      # Multiple = 2 (investment should double)
BALANCE = INV     # Start balance equal to initial investment
year = 0          # Start with year = 0

# Loop until balance reaches multiple of initial investment
while BALANCE < MULTIPLE * INV:
    year += 1                           # Increase year by 1
    BALANCE = BALANCE * (1 + APR)       # Apply growth factor

# Print results
print(f"To multiply your investment times {MULTIPLE}, at an interest rate of {100*APR}%, you need {year} years.")
To multiply your investment times 2, at an interest rate of 10.0%, you need 8 years.
print(f"Your balance after {year} years will be ${BALANCE:.2f}")
Your balance after 8 years will be $214.36

It is up to you to practice with the different types of loops. When you have to write a loop, you first have to understand in detail what you are looking for. Then, you have to identify the logic you need to do the iterations and which type of calculations you need to do.

It is very recommended first write a pseudo-code with your own words specifying the details of your logic. The pseudo-code is just a step of sequential tasks with your own words. You can use some names of commands, but if you do not remember, you can just write what you want to do line by line, and specify what are the iterations and conditions of your algorithm.

Once you have your pseudo-code, you can use it as your prompt for your favorite LLM (such as Gemini, chatGPT, etc)

The more structured and clear you write your prompts to generate code, the more effective you will be writing code to automate any interesting and complex task in the context of Finance.

2 Looping vs vectorization

Interestingly, Python and other languages such as R, are designed to easily perform repetitive tasks for data frames, vectors and matrices without using a loop.

In Python, unlike other traditional computer languages such as C, there is an alternative way to do loops without writing loops!

This sounds weird. The way you can do a loop without writing a loop is using “vectorization”. Python can perform vectorization in a natural way. For example, if you want to do a sum of 2 vectors or matrices in a traditional language, you might think in a loop to do an element-by-element operation.

Imagine you want calculate the net present value of a new project in your company. The project involves 2 new products, A and B. Each product will bring its own income. You already estimated the expected annual free cash flow (income minus all expenses) of both products for the next 5 years. The following table shows the expected free cash flow for each year for each product (in millions MXP )

Product Year 1 Year 2 Year 3 Year 4 Year 5
A 3 3 4 6 8
B -2 -3 -1 3 5

The discount rate (or cost of capital) for this project (annual) is 15%. The initial investment of the project is 10 million pesos.

How can you estimate the present value of these cash flows?

You first have to add the cash flows of each product and get an annual free cash flow for the project. You can use a loop that for each year, you do the sum of the 2 corresponding free-cash flows. Something like:

# Define vectors of cash flows as Python lists
A = [3, 3, 4, 6, 8]
B = [-2, -3, -1, 3, 5]

# Initialize list (with zeros) of same length of A 
sumcashflows = [0] * len(A)
#sumcashflows = {}

# Loop through indices
for i in range(len(A)):
  # I sum the cash flows of A and B for the time period i, and store it in sumcashflows:
    sumcashflows[i] = A[i] + B[i]

# I print the resulting stream of cash flows of the project:

print(sumcashflows)
[1, 0, 3, 9, 13]

Now, instead of using a loop, we can use vectorization in a natural way to do the sum of both vectors:

import numpy as np

A = np.array([3, 3, 4, 6, 8])
B = np.array([-2, -3, -1, 3, 5])

# Element-wise sum using vectorization:

sumcashflows = A + B
print(sumcashflows)
[ 1  0  3  9 13]

I used numpy arrays numpy library) instead of lists since lists cannot be vectorized.

Check that the statement A + B performs the vectorization by summing element by element of each array. Internally, this vectorization of arrays actually does a kind of loop, but we do not need to program the loop!!

Data frames (from pandas library) are also vectorized since the operations you do can be done to all columns or all rows.

3 Functions

In any programming language we write an algorithm to automate a process that receive a set of inputs, process the inputs with specific steps, and then generate or return an output or result, which can be one number, a list of numbers, a dataset or any set of bits of information.

If we find that we need to run that algorithm more than one time with different input values, then it is a good idea to write a function that automate the process, but we can easily re-use the function to solve similar problems but with different input values.

In a function the input variables are defined as parameters, and the output is explicitly return at the end of the function.

Here is an example:

Imagine a function that receive a number n and return the factorial of that number. I can write this function in Python as follows:

def factorial(n):
  # There are 3 possibilities about n: n can be negative, zero or positive
  # I will use a conditional if to calculate the value of the factorial of n
  
  if n < 0:
     # If the number is negative, I send a message that the factorial cannot be calculated and return nothing (None)
     print("The parameter n must be positive.")
     return None
  else: # Here n can be zero or positive
   # If n = 0, then I return 1. Why? by definition, in mathematics the 
   # factorial of 0 is 1:
     if n==0: 
       return 1
     else:  # if n>0 I calculate the factorial as the multiplication from 1 to n  
       # Step 1: I start with a variable result = 1 
       result = 1
       # Step 2: I do a for loop to iterate from 1 to n, and for each iteration I multiply result times the iterator (counter) 
       for i in range(1, n+1): 
         result = result * i
       # Step 4: I return the result variable   
       return result   

Once I defined my function, I can call my function more than one time. For example:

num=-5
factnum = factorial(num)
The parameter n must be positive.
print(f"The factorial of {num} is {factnum}")
The factorial of -5 is None
num = 5 
print(f"The factorial of {num} is {factorial(num)}")
The factorial of 5 is 120
num = 10
print(f"The factorial of {num} is {factorial(num)}")
The factorial of 10 is 3628800

Many times it is useful to call a function from a loop. Let’s work with the following Bond valuation problem.

Imagine I want to write a function to value a bond, and then see how sensitive the bond is to changes in market interests rates.

I start by writing a function to calcualte the value (price) of a bond. I will use the following parameters as inputs:

Parameters (inputs) of my function:

  • Year-to-maturity: # of years of the contract

  • Annual interest rate

  • Frequency: # of times a year for the coupon payments

  • Annual coupon rate: % of the Face Value to calculate the annual coupon amount

  • Face value: Principal of the bond

Output:

  • Bond price / value

Instead of writing the steps before, I will write the steps within my function as comments:

def bond_value(years_to_maturity, annual_interest_rate, frequency, annual_coupon_rate, face_value):
  """
  Calculates the value of a bond.

  Arguments:
    years_to_maturity: The number of years until the bond matures.
    annual_interest_rate: The annual yield to maturity (discount rate).
    frequency: The number of coupon payments per year.
    annual_coupon_rate: The annual coupon rate as a decimal.
    face_value: The face value (par value) of the bond.

  Returns:
    The calculated value of the bond.
  """
  # Calculate periodic interest rate and number of periods 
  periodic_interest_rate = annual_interest_rate / frequency
  number_of_periods = years_to_maturity * frequency

  # Calculate periodic coupon payment
  periodic_coupon_payment = (annual_coupon_rate * face_value) / frequency

  # Calculate the present value of coupon payments (annuity)
  pv_coupon_payments = periodic_coupon_payment * (1 - (1 + periodic_interest_rate)**(-number_of_periods)) / periodic_interest_rate

  # Calculate the present value of the face value
  pv_face_value = face_value / (1 + periodic_interest_rate)**number_of_periods

  # Total bond value is the sum of present values of coupon payments and face value
  total_bond_value = pv_coupon_payments + pv_face_value

  return total_bond_value

# Example usage:
# Bond with 10 years to maturity, 5% annual interest rate, semi-annual payments,
# 6% annual coupon rate, and $1000 face value.
years = 10
interest_rate = 0.05
freq = 2
coupon_rate = 0.06
face = 1000

bond_val = bond_value(years, interest_rate, freq, coupon_rate, face)
print(f"The value of the bond is: ${bond_val:,.2f}")
The value of the bond is: $1,077.95

As you see I can use this function to value almost any type of bond. Now, imagine that I want to see how much the value of a specific bond changes when market interest rate changes.

Consider the calculation of a bond price with the following parameters: facevalue = 1,000, frequency = 4, coupon rate = 10%, and year-to-maturity = 10 years. Now, with these characteristics, I want to see how much the bond value changes if interests change from 4% to 20%.

I can write a loop to calculate the bond value by changing the interest rate from 4% to 20% jumping the interest rate by 0.5%. For each iteration save the interest rate and the bond value. At the end of the loop, do a plot to see how the bond price changes when the interest rate changes:

import matplotlib.pyplot as plt

# Bond parameters
face_value = 1000
frequency = 4
annual_coupon_rate = 0.10
years_to_maturity = 10

# Interest rate range
start_interest_rate = 0.04
end_interest_rate = 0.20
step_interest_rate = 0.005

# Lists to store results
interest_rates = []
bond_values = []

# Loop through interest rates and calculate bond value
interest_rate = start_interest_rate
while interest_rate <= end_interest_rate:
    bond_val = bond_value(years_to_maturity, interest_rate, frequency, annual_coupon_rate, face_value)
    interest_rates.append(interest_rate)
    bond_values.append(bond_val)
    interest_rate += step_interest_rate

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(interest_rates, bond_values)
plt.xlabel("Annual Interest Rate")
plt.ylabel("Bond Value")
plt.title("Bond Value vs. Annual Interest Rate")
plt.grid(True)
plt.show()

4 CHALLENGE 1

You are planning to save for retirement over the next 30 years. To do this, you will invest $6,000 a month in a stock account and $3,000 a month in a bond account. The return of the stock account is expected to be 12% compounded monthly, and the bond account will pay 5% annual interest rate also compounded monthly. When you retire, you will combine your money into an account with a 9% annual return. How much can you withdraw each month from your account assuming a 25-year withdrawal period? At the end of 25 years your balance must be zero. Assume that all payments are done at the end of each month.

You have to:

  1. Write/draw your mental map to visualize this problem with your own lens. You can do this in a paper, take a picture and include it in your Colab, or you can use any note software (like OneNote) to do this and paste it as image.

  2. Write down the input and the output

  3. Write the logical steps you propose to solve the problem. Use your own words. Write short and clear sentences for each step.

  4. Paste your inputs, output, and your steps in Gemini to generate Python code. Make sure that your solution is correct. You can check a solution in Excel or carefully review the Python solution.

  5. Make sure you understand the generated Python code. Write down any question about Python code to the next class.

5 CHALLENGE 1 SOLUTION

Remember that the first step to write code for an algorithm is to make sure you clearly understand the problem at 100%. In this case, I tried to visualize the cash streams of my investment for the following 30 years, and then the cash withdrawal for the folowing 25 years for my retirement (after the first 30 years).

I imagined my payments as cash flow streams for the first 30 years (360 months) as follows:

month 1 2 3 360
stock saving 6000 6000 6000 6000
bond saving 3000 3000 3000 3000

After these first 360 months, I will end up with an ending balance for eacvh investment account. I can visualize this ending balance for each investment as the Future value of the investment.

Then, I can calculate the future value (in year 30) of these 2 investments. I make the assumption that I pay the savings at the end of each month. Since I know how to get the present value of an annuity, then I can first get the present value and then I can just calculate its corresponding future value by multiplying this present value times the factor (1+R/12)^{12*30}.

For each investment I first calculate the present value of each monthly payment, and then sum all of them. Once I have this present value of the investment, then I can just calcuate its corresponding future value in the year 30 (month 360). I can visualize this in tables:

For stock saving, the expected APR is 12% compounded monthly:

Monthly rate = 0.12/12

month 1 2 360
saving 6000 6000 6000
pv(saving) =6000(1+0.12/12) =6000(1+0.12/12)^2 6000(1+0.12/12)^360

The PV of this saving will be the sum of all present values. Once I have this PV, then I calculate its future value easily with the formula:

FV_{t}=PV*(1+R)^{t}

In this case, t will be 360 and R will be (0.12/12)

The same logic follows for the bond investment. And at the end I sum the 2 future values to get the ending balance of my investment after 30 years. From year 31 I will be enjoying my retirement savings for the following 25 years. Then I need to calculate the withdrawal amount per month from the first month of year 31 to endup with balance zero at the end of the 25 years.

With this conceptualization in mind, I start writing a function for the present value. I will use the approach of vectorization (instead of loops) trying to follow my logic.

import numpy as np

def pv_annuity(C, Y, APR, p):
    """
    Computes the present value of an annuity.
    
    Parameters:
        C (float): Payment per period (e.g., monthly payment)
        Y (int): Number of years
        APR (float): Annual percentage rate (e.g., 0.12 for 12%)
        p (int): Number of compounding periods per year (e.g., 12 for monthly)
        
    Returns:
        float: Present value of the annuity
    """
    # Total number of payments
    n_periods = Y * p
    
    # Create arrays for payments and time periods
    cf = np.repeat(C, n_periods)
    months = np.arange(1, n_periods + 1)
    
    # Compute present value of each cash flow
    pvcf = cf / (1 + APR / p) ** months
    
    # Return the sum of all discounted cash flows
    return np.sum(pvcf)


# Example usage:
mystock_pv = pv_annuity(C=6000, Y=30, APR=0.12, p=12)
mybond_pv = pv_annuity(C=3000, Y=30, APR=0.05, p=12)

# Calculate their respective future values at the end of month 360
mystock_fv = mystock_pv * (1 + 0.12 / 12) ** (30 * 12)
mybond_fv = mybond_pv * (1 + 0.05 / 12) ** (30 * 12)

# Print results with formatting
print(f"The future value of the stock account will be: ${mystock_fv:,.2f}")
The future value of the stock account will be: $20,969,784.80
print(f"The future value of the bond account will be: ${mybond_fv:,.2f}")
The future value of the bond account will be: $2,496,775.91
# Total future value
my_fv = mystock_fv + mybond_fv
print(f"My end balance after 30 years will be: ${my_fv:,.2f}")
My end balance after 30 years will be: $23,466,560.70

I could also use the annuity formula to calculate the present value of an annuity instead of calculating it using vectorization. Here is another approach for an alternative function to get the present value of an annuity. The formula for annuity is:

PV=\frac{C}{R}*\left[1-\frac{1}{(1+R)^{N}}\right]

Then I write a function to calculate this:

import numpy as np

def pv_annuity2(C, Y, APR, p):
    """
    Computes the present value of an annuity using the closed-form formula.
    
    Parameters:
        C (float): Payment per period (e.g., monthly payment)
        Y (int): Number of years
        APR (float): Annual percentage rate (e.g., 0.12 for 12%)
        p (int): Number of compounding periods per year (e.g., 12 for monthly)
        
    Returns:
        float: Present value of the annuity
    """
    rate = APR / p
    n_periods = Y * p
    
    pv = C / rate * (1 - 1 / (1 + rate) ** n_periods)
    return pv


# Example usage:
mystock_pv2 = pv_annuity2(C=6000, Y=30, APR=0.12, p=12)
mybond_pv2 = pv_annuity2(C=3000, Y=30, APR=0.05, p=12)

# Calculate their respective future values at the end of month 360
mystock_fv2 = mystock_pv2 * (1 + 0.12 / 12) ** (30 * 12)
mybond_fv2 = mybond_pv2 * (1 + 0.05 / 12) ** (30 * 12)

# Total future value
my_fv2 = mystock_fv2 + mybond_fv2

# Print results
print(f"The future value of the stock account will be: ${mystock_fv2:,.2f}")
The future value of the stock account will be: $20,969,784.80
print(f"The future value of the bond account will be: ${mybond_fv2:,.2f}")
The future value of the bond account will be: $2,496,775.91
print(f"My end balance after 30 years will be: ${my_fv2:,.2f}")
My end balance after 30 years will be: $23,466,560.70

I got the same result for the end balance as in my first approach.

Once I have the end balance after 30 years, then I can calculate how much money I can withdraw from my investment each month in order to finish my savings after 25 years. I will assume that I will be taking my money out of my account at the end of each month.

I can visualize this problem as another annuity where I know the present value that is my end balance after the 30 years, the future value will be zero, the annual rate is 9% and the number of years is 25. The only unknown is the monthly payment. The annuity formula is:

PV=\frac{C}{R}*\left[1-\frac{1}{(1+R)^{N}}\right]

C is the fixed payment of the annuity. In this problem, C represents the monthly withrawal. PV represents the value of my investment after 30 years. I treat this as present value since I am imagining myself 30 years from now that I already have my end balance of my investment. R is the monthly rate and N is the total number of months after 25 years.

Getting C from this formula:

C=\frac{PV*R}{\left[1-\frac{1}{(1+R)^{N}}\right]}

I can easily calculate the monthly withdrawal C in Python:

# Given parameters
APR = 0.09  # Annual percentage rate (9%)
Y = 25      # Number of years
p = 12      # Periods per year

# Assuming 'myfv' was already computed previously (from your annuity example)
# For example:
# my_fv = mystock_fv2 + mybond_fv2

# Compute the monthly withdrawal
wdraw = my_fv * (APR / p) / (1 - (1 / (1 + APR / p) ** (Y * p)))

print("The monthly withdrawal will be:")
The monthly withdrawal will be:
print(f"${wdraw:,.2f}")
$196,930.52

I can test my result using the annuity function. I can calculate the present value of an annuity with a fixed payment equal to the withdrawal amount I calculated. If this present value is equal to the investment I would have at year 30, then the withraw calculation will be correct:

# Compute the required investment amount for the stock and bond accounts
Stock_and_Bond_Inv = pv_annuity(C=wdraw, Y=25, APR=APR, p=12)

print("The calculated investment of the stock and bond accounts at year 30 will be:")
The calculated investment of the stock and bond accounts at year 30 will be:
print(f"${Stock_and_Bond_Inv:,.2f}")
$23,466,560.70

Since I got the same investment as the one I had calculated before, then I test that the withdrawal amount is correct.

6 Data structures

In Finance there are basically the following dataset structures or data types:

  1. Time-series: in this structure you can have many periods, and information for one “subject” (subject can be company, index, industry, etc). Also, you can have more than one subject, but the information is placed as new column for each subject. For example: the dataset created after running getsymbols:
       p_stock1 r_stock1 p_stock2 r_stock2
2014m1       10     0.02       20     0.01
2014m2       11     0.10       21     0.05
  1. Cross-sectional structure: in this structure, you usually have many “subjects”, for ONLY ONE period For example, key performance indicators of Mexican companies for the last quarter of 2016:
    Ticker   ROA ATO ProfitMargin
1 ALFAA.MX 0.023 0.9         0.15
2  AMXL.MX 0.015 1.1         0.10
  1. Panel-data structure: it is combination of time-series and cross-sectional. In this structure, we have more than one subject, and for each subject we can have more than one period. Example:
     Ticker quarter  ROA ATO ProfitMargin
1  ALFAA.MX  2014q1  0.2 1.1         0.17
2  ALFAA.MX  2014q2 0.21 1.2         0.16
3       ...     ...  ... ...          ...
4 BIMBOA.MX  2014q1 0.15 0.8         0.10
5 BIMBOA.MX  2014q2 0.20 0.9         0.05
6       ...     ...  ... ...          ...
7 BIMBOA.MX  2017q1 0.15 1.1         0.09

7 Data management for panel-data

For financial analysis, data management is a very important process. Most of the time, before designing and running a financial / econometric model, it is needed to do data wrangling: data collection/importing, data cleanning, data merging, data transformation.

In this example we will learn how to merge two datasets, and then do summary tables that gives us understanding of the data.

We will merge 2 dataset: a cross-sectional dataset with general information of firms from the S&P500 Market index, and a panel dataset with annual financial information for US public firms (A public firm is a company that issue shares in the market)

Download the following csv datasets:

Download and import these 2 datasets with the following code:

import pandas as pd
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Download the first file
url1 = 'https://www.apradie.com/datos/usfirms.csv'
response1 = requests.get(url1, headers=headers)
with open('usfirms.csv', 'wb') as f:
    f.write(response1.content)
633061
# Download the second file
url2 = 'https://www.apradie.com/datos/usdata.csv'
response2 = requests.get(url2, headers=headers)
with open('usdata.csv', 'wb') as f:
    f.write(response2.content)
11317275
# Import as data frames
usfirms = pd.read_csv('usfirms.csv')
usdata = pd.read_csv('usdata.csv')

# Display the first few rows of each DataFrame to confirm
print("usfirms DataFrame:")
usfirms DataFrame:
usfirms.head()
    empresa  ...  SectorEconomatica
0         A  ...  Electric Electron
1        AA  ...  Basic & Fab Metal
2  AABA_old  ...    Software & Data
3   AAC_old  ...              Other
4  AAIC_old  ...              Funds

[5 rows x 7 columns]
print("\nusdata DataFrame:")

usdata DataFrame:
usdata.head()
  firm  year     revenue  ...  originalprice  sharesoutstanding  fixedassets
0    A  2000  10773000.0  ...          54.75         456366.381    1741000.0
1    A  2001   8396000.0  ...          28.51         463695.160    1848000.0
2    A  2002   6010000.0  ...          17.96         467024.421    1579000.0
3    A  2003   6056000.0  ...          29.24         476149.083    1447000.0
4    A  2004   7181000.0  ...          24.10         486841.087    1258000.0

[5 rows x 28 columns]

The first dataset contains a catalog of all US firms along with the corresponding industry type and status (active or cancelled), while the second dataset contains the historical financial data of the firms.

The usdata is a panel-dataset, which is called to be a long-format dataset. Each row has financial information for one US firm and 1 period (a year). All $ amounts are in thousands (’1000s).

Here is a data dictionary of the columns:

Data dictionary of historical annual financial data.
Variable Description
firm Firm ticker - Unique code of the company
year Fiscal Year
revenue Total sales of the firm in the fiscal year
cogs Cost of good sold in the fiscal year - variable costs of the products sold
sgae Sales and general administrative expenses in the fiscal year
depreciation Depreciation and ammortization in the fiscal year
otherincome Other operational income/expenses that are not directly from the core operations of the firm
extraordinaryitems Extra income/expenses not related to regular operations
finexp Financial expenses - interest expenses paid in the year (generated from loans)
incometax Income tax paid in the year
cashflowoper Cash flow generated from operations during the year
cashinbanks Cash balance in banks at the end of the year
totalassets Total assets of the firm at the end of the year
currentassets Current assets of the firm at the end of the year
accountsreceivables Accounts receivables at the end of the year
inventory Inventory at the end of the year
totalliabilities Total liabilities of the firm at the end of the year
currentliabilities Current liabilities of the firm at the end of the year
accountspayable Accounts payable at the end of the year
shortdebt Short-term financial debt at the end of the year
longdebt Balance of long-term financial debt (loans to pay longer than 1 year)
retainedearnings Retained earnings in the year
adjprice Stock adjusted price at the end of the year; adjusted for stock splits and dividend payments; used to calculate stock returns
originalprice Historical stock price (not adjusted); used to calculate historical market value
sharesoutstanding Historical number of shares available in the market
fixedassets Fixed assets value at the end of the year

Each row of this dataset has yearly financial data of one firm in one year. All firms have years 2000 to 2023. Not all firms have existed since 2000. Then, it is possible to know when each firm went public to issue shares in the financial market: the first year with some non-empty data.

Earnings before interest and Taxes (ebit) and Net Income (netincome) must be calculated as:

ebit = revenue - cogs - sgae - depreciation

netincome = ebit + otherincome + extraordinaryitems - finexp - incometax

The usfirms.csv is a catalog of all active and cancelled US firms:

Variable Description
empresa Firm ticker - Unique code of the company
Nombre Name of the firm
status Status of the firm: active or cancelled
partind Percent participation in the S&P500 market index
naics1 North American Industry Classification Code - Level 1
naics2 North American Industry Classification Code - Level 2
SectorEconomatica Economatica Industry classification

In the dataus dataset we have annual financial data. For income-statement variables, the information is the amount generated in the corresponding year fiscal year. For balance-sheet variables, the information is the value of the variable at the end of the year (since the beginning when the firm was created).

8 Data merging

In the usdata panel dataset there is no information about firm name, industry, status (active or cancelled), etc. That information is in the usfirms dataset. So, we start by integrating this information into the panel data.

Both datasets have a common/match column: firm ticker, but the column names for firm ticker in the datasets are different. the firm ticker column is called empresa for the usfirms, and firm for the usdata.

Merge the usfirm into the usdata indicating that empresa and firm columns are the matching columns. Only pull the following columns from the usfirm: Nombre, status, naics1 (industry naics level 1) and SectorEconomatica.

You can do this with the following code:

merged_data = pd.merge(usdata, usfirms[['empresa', 'Nombre', 'status', 'naics1']], left_on='firm', right_on='empresa', how='left')

# Drop the redundant 'empresa' column from the merged DataFrame
merged_data = merged_data.drop('empresa', axis=1)

# Display the first few rows of the merged DataFrame to confirm
print("Merged DataFrame:")
Merged DataFrame:
merged_data.head()
  firm  year     revenue  ...                     Nombre  status         naics1
0    A  2000  10773000.0  ...  Agilent Technologies, Inc  active  Manufacturing
1    A  2001   8396000.0  ...  Agilent Technologies, Inc  active  Manufacturing
2    A  2002   6010000.0  ...  Agilent Technologies, Inc  active  Manufacturing
3    A  2003   6056000.0  ...  Agilent Technologies, Inc  active  Manufacturing
4    A  2004   7181000.0  ...  Agilent Technologies, Inc  active  Manufacturing

[5 rows x 31 columns]

This type of merging is called left-join merging since we use the left dataset as base to pull data from the right dataset.

When doing a left-join merging of 2 dataset, it is very useful to visualize the left and the right datasets. The left dataset in this case is the usdata, and the right dataset is the usfirms. The left dataset is the base dataset, and we pull columns from the right dataset according to the matching columns.

When we do a vlookup in Excel, we do a left-join considering the left dataset the current sheet, and the right dataset the sheet you indicate to pull a column.

When we do a left-join, the resulting merged dataset must have the same number of rows than the left dataset. It is alwasy recommended to check the number of rows and columns of the resulting merged dataset to make sure that the merged opration was performed correctly:

We can use the shape method of pandas to show the # of rows and columns for each dataset:

print("Dimention of the usdata: ")
Dimention of the usdata: 
print(usdata.shape)
(62453, 28)
print("Dimention of the usfirm: ")
Dimention of the usfirm: 
print(usfirms.shape)
(5691, 7)
print("Dimention of the merged_data: ")
Dimention of the merged_data: 
print(merged_data.shape)
(62453, 31)

We see that the merged_data has the same # of rows than the left dataset usdata. The # of columns of the merged_data is 28 since we pulled 3 columns from the right dataset. Then, we can continue with our data analysis.

9 CHALLENGE 2: Basic Data analysis and visualization

Using the merged dataset, create a dataset selecting historical information for Apple. Call this dataset apple_financials. The firm code (ticker) for Apple Inc Corporation is AAPL.

Using apple_financials, calculate a column for gross profit and ebit:

grossprofit = revenue - cogs

ebit = grossprofit - sgae - depreciation

Calculate the annual growth for all years for the following: 1) revenue, 2)grossprofit, and 3)ebit

Design and show a plot to visualize how revenue, gross profit and ebit have been growing over the years

Show a table with the annual growth for revenue, gross profit and ebit for all years

Design and show a plot to visualize annual growth variables from the previous table

Create a new column for market value:

marketvalue = originalprice * sharesoutstanding

For revenue, grossprofit, ebit, and marketvalue create a growth index starting with 1.0 in the first year. In other words, the index of each variable must represent how many times the variable has grown with respect to its value in the first year,

Show a plot to see the growth indexes over the years.

Use an LLM to generate the corresponding code to the previous steps. Make sure you understand the Python code. You can ask the LLM to explain the code line by line.

Fix any error generated by the LLM, or any calculation that can be improved, and re-generate the code.

Provide an insight about the previous plots. What can you say about Apple performance in terms of revenue, gross profit, ebit, and marketvalue? Explain with your words.

(Optional) Do an online check, find the most recent 10-K report (annual financial statement) of Apple, and compare 2023 annual sales and EBIT with what you got. You must get the same amount for the annual sales reported at the end of fiscal year (for Apple, the fiscal year is Q3).

10 CHALLENGE 2 - SOLUTION

After reading the challenge, I wrote the following prompt to Gemini:

“Analyze the financial performance of Apple (AAPL) using the merged_data dataset. Filter the data to create a new dataset apple_financials containing only Apple’s information. Calculate ‘grossprofit’ and ‘ebit’ using the formulas: grossprofit = revenue - cogs and ebit = grossprofit - sgae - depreciation. Calculate the annual growth for ‘revenue’, ‘grossprofit’, and ‘ebit’. Visualize the trend of ‘revenue’, ‘gross profit’, and ‘ebit’ over the years, and also visualize their annual growth rates. Finally, display a table showing the annual growth rates for ‘revenue’, ‘gross profit’, and ‘ebit’.”

And Gemini proposed subtasks to solve the problem step by step, and I got the following explanations and code:

Filter data for apple Subtask: Create a new DataFrame apple_financials containing only the data for Apple (ticker ‘AAPL’) from the merged_data DataFrame.

Reasoning: Filter the merged_data DataFrame to select rows where the ‘firm’ column is equal to ‘AAPL’ and store the result in apple_financials, then display the head of the new dataframe.

apple_financials = merged_data[merged_data['firm'] == 'AAPL'].copy()
apple_financials.head()
     firm  year    revenue  ...     Nombre  status         naics1
163  AAPL  2000  7983000.0  ...  Apple Inc  active  Manufacturing
164  AAPL  2001  5363000.0  ...  Apple Inc  active  Manufacturing
165  AAPL  2002  5742000.0  ...  Apple Inc  active  Manufacturing
166  AAPL  2003  6207000.0  ...  Apple Inc  active  Manufacturing
167  AAPL  2004  8279000.0  ...  Apple Inc  active  Manufacturing

[5 rows x 31 columns]

Calculate gross profit and ebit Subtask: Add new columns for ‘grossprofit’ and ‘ebit’ to the apple_financials DataFrame based on the provided formulas.

Reasoning: Calculate ‘grossprofit’ and ‘ebit’ and add them as new columns to the apple_financials DataFrame.

apple_financials['grossprofit'] = apple_financials['revenue'] - apple_financials['cogs']
apple_financials['ebit'] = apple_financials['grossprofit'] - apple_financials['sgae'] - apple_financials['depreciation']
apple_financials.head()
     firm  year    revenue  ...         naics1  grossprofit      ebit
163  AAPL  2000  7983000.0  ...  Manufacturing    2166000.0  522000.0
164  AAPL  2001  5363000.0  ...  Manufacturing    1235000.0 -344000.0
165  AAPL  2002  5742000.0  ...  Manufacturing    1603000.0   17000.0
166  AAPL  2003  6207000.0  ...  Manufacturing    1708000.0   -1000.0
167  AAPL  2004  8279000.0  ...  Manufacturing    2259000.0  326000.0

[5 rows x 33 columns]

Calculate annual growth Subtask: Calculate the year-over-year growth rate for ‘revenue’, ‘grossprofit’, and ‘ebit’ in the apple_financials DataFrame.

Reasoning: Calculate the year-over-year percentage change for ‘revenue’, ‘grossprofit’, and ‘ebit’ and store them in new columns, then display the updated dataframe.

apple_financials['revenue_growth'] = apple_financials['revenue'].pct_change() * 100
apple_financials['grossprofit_growth'] = apple_financials['grossprofit'].pct_change() * 100
apple_financials['ebit_growth'] = apple_financials['ebit'].pct_change() * 100

print("Apple Financials with Growth Rates:")
Apple Financials with Growth Rates:
apple_financials.head()
     firm  year    revenue  ...  revenue_growth  grossprofit_growth   ebit_growth
163  AAPL  2000  7983000.0  ...             NaN                 NaN           NaN
164  AAPL  2001  5363000.0  ...      -32.819742          -42.982456   -165.900383
165  AAPL  2002  5742000.0  ...        7.066940           29.797571   -104.941860
166  AAPL  2003  6207000.0  ...        8.098224            6.550218   -105.882353
167  AAPL  2004  8279000.0  ...       33.381666           32.259953 -32700.000000

[5 rows x 36 columns]

Visualize revenue, gross profit, and ebit Subtask: Create a plot to visualize the trend of ‘revenue’, ‘grossprofit’, and ‘ebit’ over the years for Apple.

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(apple_financials['year'], apple_financials['revenue'], label='Revenue')
plt.plot(apple_financials['year'], apple_financials['grossprofit'], label='Gross Profit')
plt.plot(apple_financials['year'], apple_financials['ebit'], label='EBIT')
plt.xlabel('Year')
plt.ylabel('Amount')
plt.title('Financial Performance of Apple (AAPL) Over Time')
plt.legend()
plt.show()

Display annual growth table Subtask: Show a table containing the calculated annual growth rates for ‘revenue’, ‘grossprofit’, and ‘ebit’.

Reasoning: Select the relevant columns and display them as a table to show the annual growth rates.

growth_rates_table = apple_financials[['year', 'revenue',  'revenue_growth', 'grossprofit_growth', 'ebit', 'ebit_growth']]
growth_rates_table
     year      revenue  ...         ebit   ebit_growth
163  2000    7983000.0  ...     522000.0           NaN
164  2001    5363000.0  ...    -344000.0   -165.900383
165  2002    5742000.0  ...      17000.0   -104.941860
166  2003    6207000.0  ...      -1000.0   -105.882353
167  2004    8279000.0  ...     326000.0 -32700.000000
168  2005   13931000.0  ...    1650000.0    406.134969
169  2006   19315000.0  ...    2453000.0     48.666667
170  2007   24006000.0  ...    4409000.0     79.739095
171  2008   32479000.0  ...    6275000.0     42.322522
172  2009   36537000.0  ...    7658000.0     22.039841
173  2010   65225000.0  ...   18385000.0    140.075738
174  2011  108249000.0  ...   33790000.0     83.791134
175  2012  156508000.0  ...   55241000.0     63.483279
176  2013  170910000.0  ...   48999000.0    -11.299578
177  2014  182795000.0  ...   52503000.0      7.151166
178  2015  233715000.0  ...   71230000.0     35.668438
179  2016  215639000.0  ...   60024000.0    -15.732135
180  2017  229234000.0  ...   61344000.0      2.199120
181  2018  265595000.0  ...   70898000.0     15.574465
182  2019  260174000.0  ...   63930000.0     -9.828204
183  2020  274515000.0  ...   66288000.0      3.688409
184  2021  365817000.0  ...  108949000.0     64.357048
185  2022  394328000.0  ...  119437000.0      9.626523
186  2023  383285000.0  ...  114301000.0     -4.300175

[24 rows x 6 columns]

Visualize growth variables Subtask: Create a plot to visualize the annual growth rates of ‘revenue’, ‘grossprofit’, and ‘ebit’ over the years.

Reasoning: Create a plot to visualize the annual growth rates of ‘revenue’, ‘grossprofit’, and ‘ebit’ over the years.

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(apple_financials['year'], apple_financials['revenue_growth'], label='Revenue Growth')
plt.plot(apple_financials['year'], apple_financials['grossprofit_growth'], label='Gross Profit Growth')
plt.plot(apple_financials['year'], apple_financials['ebit_growth'], label='EBIT Growth')
plt.xlabel('Year')
plt.ylabel('Growth Rate (%)')
plt.title('Annual Growth Rates of Revenue, Gross Profit, and EBIT for Apple (AAPL)')
plt.legend()
plt.show()

There are very weird / extreme values for annual growth of EBIT. I check the data to see what is happening.

apple_financials[['firm','year','ebit','ebit_growth','grossprofit','grossprofit_growth']]
     firm  year         ebit   ebit_growth  grossprofit  grossprofit_growth
163  AAPL  2000     522000.0           NaN    2166000.0                 NaN
164  AAPL  2001    -344000.0   -165.900383    1235000.0          -42.982456
165  AAPL  2002      17000.0   -104.941860    1603000.0           29.797571
166  AAPL  2003      -1000.0   -105.882353    1708000.0            6.550218
167  AAPL  2004     326000.0 -32700.000000    2259000.0           32.259953
168  AAPL  2005    1650000.0    406.134969    4043000.0           78.972997
169  AAPL  2006    2453000.0     48.666667    5598000.0           38.461538
170  AAPL  2007    4409000.0     79.739095    8154000.0           45.659164
171  AAPL  2008    6275000.0     42.322522   11145000.0           36.681383
172  AAPL  2009    7658000.0     22.039841   13140000.0           17.900404
173  AAPL  2010   18385000.0    140.075738   25684000.0           95.464231
174  AAPL  2011   33790000.0     83.791134   43818000.0           70.604267
175  AAPL  2012   55241000.0     63.483279   68662000.0           56.698161
176  AAPL  2013   48999000.0    -11.299578   64304000.0           -6.347033
177  AAPL  2014   52503000.0      7.151166   70537000.0            9.693021
178  AAPL  2015   71230000.0     35.668438   93626000.0           32.733175
179  AAPL  2016   60024000.0    -15.732135   84263000.0          -10.000427
180  AAPL  2017   61344000.0      2.199120   88186000.0            4.655661
181  AAPL  2018   70898000.0     15.574465  101839000.0           15.482049
182  AAPL  2019   63930000.0     -9.828204   98392000.0           -3.384754
183  AAPL  2020   66288000.0      3.688409  104956000.0            6.671274
184  AAPL  2021  108949000.0     64.357048  152836000.0           45.619117
185  AAPL  2022  119437000.0      9.626523  170782000.0           11.741998
186  AAPL  2023  114301000.0     -4.300175  169148000.0           -0.956775

I see that there are negative values for EBIT for some years.

Interestingly, the traditional way to calculate percentage change of any value only applies when both, the current and previous values are positive! I recommend to try with simple examples to see what happens when you calculate % change when a variable moves from negative to positive, or viceversa, or when both are negative. You will get weird results that does not make sense!

Then, I have to make a decision for these cases since that can affect both, the plots and any other summary I might need to do like calculating average annual growth. I will assign a null value for these cases to avoid confusion with plots and avoid using these non-sense values in case I do annual average % change of the variables.

Check that these cases were NOT DETECTED by Gemini LLM! Remember that you are the boss here, not the LLM’s!

I do the following code to assign null values for these cases:

# Set growth to NaN when the current or previous value of the variable are negative. 
#I do this for both EBIT and grossprofit
apple_financials.loc[(apple_financials['grossprofit']<0) | (apple_financials['grossprofit'].shift(1)<0), 'grossprofit_growth'] = None
apple_financials.loc[(apple_financials['ebit']< 0) | (apple_financials['ebit'].shift(1)<0), 'ebit_growth'] = None

I check the values to make sure that the code run correctly:

apple_financials[['firm','year','ebit','ebit_growth','grossprofit','grossprofit_growth']]
     firm  year         ebit  ebit_growth  grossprofit  grossprofit_growth
163  AAPL  2000     522000.0          NaN    2166000.0                 NaN
164  AAPL  2001    -344000.0          NaN    1235000.0          -42.982456
165  AAPL  2002      17000.0          NaN    1603000.0           29.797571
166  AAPL  2003      -1000.0          NaN    1708000.0            6.550218
167  AAPL  2004     326000.0          NaN    2259000.0           32.259953
168  AAPL  2005    1650000.0   406.134969    4043000.0           78.972997
169  AAPL  2006    2453000.0    48.666667    5598000.0           38.461538
170  AAPL  2007    4409000.0    79.739095    8154000.0           45.659164
171  AAPL  2008    6275000.0    42.322522   11145000.0           36.681383
172  AAPL  2009    7658000.0    22.039841   13140000.0           17.900404
173  AAPL  2010   18385000.0   140.075738   25684000.0           95.464231
174  AAPL  2011   33790000.0    83.791134   43818000.0           70.604267
175  AAPL  2012   55241000.0    63.483279   68662000.0           56.698161
176  AAPL  2013   48999000.0   -11.299578   64304000.0           -6.347033
177  AAPL  2014   52503000.0     7.151166   70537000.0            9.693021
178  AAPL  2015   71230000.0    35.668438   93626000.0           32.733175
179  AAPL  2016   60024000.0   -15.732135   84263000.0          -10.000427
180  AAPL  2017   61344000.0     2.199120   88186000.0            4.655661
181  AAPL  2018   70898000.0    15.574465  101839000.0           15.482049
182  AAPL  2019   63930000.0    -9.828204   98392000.0           -3.384754
183  AAPL  2020   66288000.0     3.688409  104956000.0            6.671274
184  AAPL  2021  108949000.0    64.357048  152836000.0           45.619117
185  AAPL  2022  119437000.0     9.626523  170782000.0           11.741998
186  AAPL  2023  114301000.0    -4.300175  169148000.0           -0.956775

Now it looks ok.

I will re-do the plot of annual growths:

plt.figure(figsize=(12, 6))
plt.plot(apple_financials['year'], apple_financials['revenue_growth'], label='Revenue Growth')
plt.plot(apple_financials['year'], apple_financials['grossprofit_growth'], label='Gross Profit Growth')
plt.plot(apple_financials['year'], apple_financials['ebit_growth'], label='EBIT Growth')
plt.xlabel('Year')
plt.ylabel('Growth Rate (%)')
plt.title('Annual Growth Rates of Revenue, Gross Profit, and EBIT for Apple (AAPL)')
plt.legend()
plt.show()

I can now appreciate the annual growth of all variables over time.

The following is the summary given by Gemini:

Summary: Data Analysis Key Findings

The analysis successfully filtered the financial data to include only Apple (AAPL).

‘Grossprofit’ and ‘ebit’ were calculated using the provided formulas: grossprofit = revenue - cogs and ebit = grossprofit - sgae - depreciation.

Annual growth rates for ‘revenue’, ‘grossprofit’, and ‘ebit’ were calculated as percentage changes from the previous year.

The trend of ‘revenue’, ‘gross profit’, and ‘ebit’ over the years shows a general upward trajectory.

A table displaying the annual growth rates for ‘revenue’, ‘gross profit’, and ‘ebit’ was generated, showing the year-over-year percentage changes.

A plot visualizing the annual growth rates of ‘revenue’, ‘gross profit’, and ‘ebit’ was created, illustrating the volatility and trends in growth over the years.

Insights or Next Steps

Analyze the periods with significant fluctuations in growth rates to understand the underlying business drivers or external factors.

Compare Apple’s financial performance and growth rates against industry benchmarks or key competitors to gain further insights into its relative position.

Task

Analyze how Apple’s market value has grown over time and compare it with its revenue, gross profit, and EBIT trend/growth using the apple_financials dataset. For market value, multiply the sharesoutstanding and originalprice columns. Show plots, statistics, and insights about Apple’s growth in terms of market perception (value) and fundamentals (revenue, profit).

Calculate market value

Subtask:

Create a new column ‘market_value’ in the apple_financials DataFrame by multiplying ‘sharesoutstanding’ and ‘originalprice’.

Reasoning: Calculate ‘market_value’ by multiplying ‘sharesoutstanding’ and ‘originalprice’ and add it as a new column to the apple_financials DataFrame, then display the head of the updated dataframe.

apple_financials['market_value'] = apple_financials['sharesoutstanding'] * apple_financials['originalprice']
apple_financials.head()
     firm  year    revenue  ...  grossprofit_growth  ebit_growth  market_value
163  AAPL  2000  7983000.0  ...                 NaN          NaN  8.367953e+06
164  AAPL  2001  5363000.0  ...          -42.982456          NaN  5.441100e+06
165  AAPL  2002  5742000.0  ...           29.797571          NaN  5.203846e+06
166  AAPL  2003  6207000.0  ...            6.550218          NaN  7.511019e+06
167  AAPL  2004  8279000.0  ...           32.259953          NaN  1.503210e+07

[5 rows x 37 columns]

Calculate market value growth Subtask: Calculate the year-over-year growth rate for ‘market_value’.

Reasoning: Calculate the year-over-year percentage change for ‘market_value’ and store it in a new column, then display the updated dataframe

apple_financials['market_value_growth'] = apple_financials['market_value'].pct_change() * 100
apple_financials.head()
     firm  year    revenue  ...  ebit_growth  market_value  market_value_growth
163  AAPL  2000  7983000.0  ...          NaN  8.367953e+06                  NaN
164  AAPL  2001  5363000.0  ...          NaN  5.441100e+06           -34.976932
165  AAPL  2002  5742000.0  ...          NaN  5.203846e+06            -4.360410
166  AAPL  2003  6207000.0  ...          NaN  7.511019e+06            44.335934
167  AAPL  2004  8279000.0  ...          NaN  1.503210e+07           100.133916

[5 rows x 38 columns]

Visualize market value trend

Subtask:

Create a plot to visualize the trend of ‘market_value’ over the years, potentially alongside revenue, gross profit, and EBIT for comparison.

Reasoning: Create a plot to visualize the trend of ‘market_value’ over the years, potentially alongside revenue, gross profit, and EBIT for comparison.

plt.figure(figsize=(12, 6))
plt.plot(apple_financials['year'], apple_financials['revenue'], label='Revenue')
plt.plot(apple_financials['year'], apple_financials['grossprofit'], label='Gross Profit')
plt.plot(apple_financials['year'], apple_financials['ebit'], label='EBIT')
plt.plot(apple_financials['year'], apple_financials['market_value'], label='Market Value')
plt.xlabel('Year')
plt.ylabel('Amount')
plt.title('Financial and Market Value Trend of Apple (AAPL) Over Time')
plt.legend()
plt.show()

I see that market value has grown a lot over time compared to revenue, ebit and gross profit.

Visualize market value growth Subtask: Create a plot to visualize the annual growth rate of ‘market_value’, potentially alongside the growth rates of revenue, gross profit, and EBIT for comparison.

Reasoning: Create a plot to visualize the annual growth rates of ‘market_value’, ‘revenue’, ‘grossprofit’, and ‘ebit’ over the years for comparison.

plt.figure(figsize=(12, 6))
plt.plot(apple_financials['year'], apple_financials['market_value_growth'], label='Market Value Growth')
plt.plot(apple_financials['year'], apple_financials['revenue_growth'], label='Revenue Growth')
plt.plot(apple_financials['year'], apple_financials['grossprofit_growth'], label='Gross Profit Growth')
plt.plot(apple_financials['year'], apple_financials['ebit_growth'], label='EBIT Growth')
plt.xlabel('Year')
plt.ylabel('Growth Rate (%)')
plt.title('Annual Growth Rates of Market Value, Revenue, Gross Profit, and EBIT for Apple (AAPL)')
plt.legend()
plt.show()

Analyze and summarize Subtask: Provide statistics and insights about the relationship between Apple’s market value growth and the growth of its fundamental financial metrics.

Reasoning: Calculate descriptive statistics and the correlation matrix for the growth rate columns and then analyze them to provide insights.

growth_columns = ['revenue_growth', 'grossprofit_growth', 'ebit_growth', 'market_value_growth']
descriptive_stats = apple_financials[growth_columns].describe()
correlation_matrix = apple_financials[growth_columns].corr()

print("Descriptive Statistics of Growth Rates:")
Descriptive Statistics of Growth Rates:
descriptive_stats
       revenue_growth  grossprofit_growth  ebit_growth  market_value_growth
count       23.000000           23.000000    19.000000            23.000000
mean        21.041311           24.868467    51.755701            37.437307
std         26.322433           31.778774    94.878979            53.226724
min        -32.819742          -42.982456   -15.732135           -34.976932
25%          6.629235            5.602940     2.943765            -3.757823
50%         12.494227           17.900404    22.039841            36.375885
75%         34.338504           42.040328    63.920164            60.512745
max         78.517667           95.464231   406.134969           195.947726
print("\nCorrelation Matrix of Growth Rates:")

Correlation Matrix of Growth Rates:
correlation_matrix
                     revenue_growth  ...  market_value_growth
revenue_growth             1.000000  ...             0.537387
grossprofit_growth         0.968869  ...             0.562143
ebit_growth                0.739306  ...             0.769368
market_value_growth        0.537387  ...             1.000000

[4 rows x 4 columns]
print("\nAnalysis of Growth Rates and Market Value Relationship:")

Analysis of Growth Rates and Market Value Relationship:
print("The descriptive statistics provide an overview of the central tendency, dispersion, and shape of the distribution for each growth rate. Large standard deviations indicate significant volatility in annual growth.")
The descriptive statistics provide an overview of the central tendency, dispersion, and shape of the distribution for each growth rate. Large standard deviations indicate significant volatility in annual growth.
print("\nThe correlation matrix shows the linear relationship between the growth rates. A high positive correlation suggests that when one metric grows, the other tends to grow as well. A high negative correlation suggests an inverse relationship.")

The correlation matrix shows the linear relationship between the growth rates. A high positive correlation suggests that when one metric grows, the other tends to grow as well. A high negative correlation suggests an inverse relationship.
print("\nBy examining these statistics and correlations, we can gain insights into how closely market value growth tracks the growth of fundamental financial metrics like revenue, gross profit, and EBIT.")

By examining these statistics and correlations, we can gain insights into how closely market value growth tracks the growth of fundamental financial metrics like revenue, gross profit, and EBIT.
print("\nObserving periods where market value growth significantly diverges from fundamental growth (e.g., market value growing much faster or slower than profits) can indicate factors beyond current financial performance influencing market perception, such as future expectations, market sentiment, or macroeconomic conditions.")

Observing periods where market value growth significantly diverges from fundamental growth (e.g., market value growing much faster or slower than profits) can indicate factors beyond current financial performance influencing market perception, such as future expectations, market sentiment, or macroeconomic conditions.

Summary given by Gemini:

Summary:

Data Analysis Key Findings Apple’s market value was calculated by multiplying sharesoutstanding and originalprice. Year-over-year growth rates for market value, revenue, gross profit, and EBIT were calculated. Visualizations showed the trends of market value, revenue, gross profit, and EBIT over time, as well as their annual growth rates. Descriptive statistics highlighted the volatility in the annual growth rates of all metrics, as indicated by large standard deviations. A positive correlation exists between market value growth and the growth of fundamental financial metrics (revenue, gross profit, and EBIT). The strongest correlation was observed between market value growth and EBIT growth (approximately 0.769).

Insights or Next Steps The analysis suggests that Apple’s market value growth is positively related to its fundamental financial performance, particularly EBIT growth. Further investigation into periods where market value growth significantly diverges from fundamental growth could reveal other factors influencing market perception, such as investor sentiment or new product cycles.

Task Analyze the growth of ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’ for Apple over time using the apple_financials dataset. Calculate a growth index for each variable, starting at 1 in the first year. Visualize these growth indices over time and provide insights based on the plot.

Calculate growth indices Subtask: Calculate a growth index for ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’, starting with an index of 1 in the first year.

Reasoning: Select the necessary columns, set ‘year’ as the index, and calculate the growth index for each selected financial metric and market value by normalizing to the first year’s value, then display the head of the resulting dataframe.

growth_indices = apple_financials[['year', 'revenue', 'grossprofit', 'ebit', 'market_value']].copy()
growth_indices.set_index('year', inplace=True)

for col in ['revenue', 'grossprofit', 'ebit', 'market_value']:
    growth_indices[f'{col}_index'] = growth_indices[col] / growth_indices[col].iloc[0]

growth_indices.head()
        revenue  grossprofit  ...  ebit_index  market_value_index
year                          ...                                
2000  7983000.0    2166000.0  ...    1.000000            1.000000
2001  5363000.0    1235000.0  ...   -0.659004            0.650231
2002  5742000.0    1603000.0  ...    0.032567            0.621878
2003  6207000.0    1708000.0  ...   -0.001916            0.897593
2004  8279000.0    2259000.0  ...    0.624521            1.796389

[5 rows x 8 columns]

Visualize growth indices Subtask: Create a plot to visualize how each of these growth indices has changed over time.

Reasoning: Create a plot to visualize how each of these growth indices has changed over time as instructed.

plt.figure(figsize=(12, 6))
plt.plot(growth_indices.index, growth_indices['revenue_index'], label='Revenue Index')
plt.plot(growth_indices.index, growth_indices['grossprofit_index'], label='Gross Profit Index')
plt.plot(growth_indices.index, growth_indices['ebit_index'], label='EBIT Index')
plt.plot(growth_indices.index, growth_indices['market_value_index'], label='Market Value Index')
plt.xlabel('Year')
plt.ylabel('Growth Index (Normalized to 2000)')
plt.title('Growth of Financial Metrics and Market Value for Apple (AAPL) - Indexed to 2000')
plt.legend()
plt.show()

Comparing with a previous plot of the variables (not their growth indexes), we see EBIT has grown at a faster rate than grossprofit and revenue! this is against the perception of the previous plot where revenue looks that is growing much faster than grossprofit and EBIT!

Let’s continue with Gemini analysis:

Analyze growth indices Subtask: Provide insights based on the plot of the growth indices.

Summary: Data Analysis Key Findings The growth index for each variable (‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’) was calculated by normalizing the values to the first year (2000), where the index starts at 1. The plot of the growth indices over time reveals a significantly steeper growth trajectory for ‘market_value’ compared to ‘revenue’, ‘grossprofit’, and ‘ebit’. While all metrics show positive growth over the period, the market value’s growth far outpaces the growth in revenue, gross profit, and EBIT.

Insights or Next Steps The divergence in growth between market value and financial metrics suggests that market valuation is strongly influenced by factors beyond current financial performance, potentially including future growth expectations, brand value, and investor sentiment. A potential next step could be to investigate specific events or periods that correlate with significant divergences in growth between market value and financial performance to understand the drivers behind these shifts.

11 CHALLENGE 3. Summary tables to provide data understanding

Using the merged dataset (with all firms), do the following:

  • Create columns for grossprofit, ebit and marketvalue as you did with the apple_financials dataset

  • Create a summary table by year with the sum of

    • revenue

    • grossprofit

    • ebit

    • marketvalue

  • Using the previous summary table, do a plot to show revenue, grossprofit, ebit and marketvalue over time

  • Using the previous summary table, create a growth index for each variable to show how many times each variable has grown with respect to the first year. Show a plot to see these indexes over time

Use an LLM to generate the corresponding code to the previous steps. Make sure you understand the Python code. You can ask the LLM to explain the code line by line.

  • Provide an insight about the previous plots. What can you say about the whole financial market performance in terms of revenue, gross profit, ebit, and marketvalue? Explain with your words.

12 CHALLENGE 3 SOLUTION

Calculate financial metrics Subtask: Add ‘grossprofit’, ‘ebit’, and ‘market_value’ columns to the merged_data DataFrame.

Reasoning: Add the ‘grossprofit’, ‘ebit’, and ‘market_value’ columns to the merged_data DataFrame and display the head.

merged_data['grossprofit'] = merged_data['revenue'] - merged_data['cogs']
merged_data['ebit'] = merged_data['grossprofit'] - merged_data['sgae'] - merged_data['depreciation']
merged_data['market_value'] = merged_data['sharesoutstanding'] * merged_data['originalprice']
merged_data.head()
  firm  year     revenue  ...  grossprofit       ebit  market_value
0    A  2000  10773000.0  ...    5251000.0  1053000.0  2.498606e+07
1    A  2001   8396000.0  ...    3230000.0  -778000.0  1.321995e+07
2    A  2002   6010000.0  ...    2316000.0 -1607000.0  8.387759e+06
3    A  2003   6056000.0  ...    2294000.0  -725000.0  1.392260e+07
4    A  2004   7181000.0  ...    3123000.0   386000.0  1.173287e+07

[5 rows x 34 columns]

Create yearly summary Subtask: Create a summary table by year showing the sum of ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’ for all firms.

Reasoning: Group the data by year and calculate the sum of the specified financial metrics to create a yearly market summary table.

yearly_market_summary = merged_data.groupby('year')[['revenue', 'grossprofit', 'ebit', 'market_value']].sum().reset_index()
yearly_market_summary.head()
   year       revenue   grossprofit          ebit  market_value
0  2000  7.160726e+09  3.061458e+09  6.144066e+08  1.317474e+10
1  2001  7.542706e+09  3.159320e+09  3.162269e+08  1.152252e+10
2  2002  7.378204e+09  3.263510e+09  4.496981e+08  9.163255e+09
3  2003  7.984335e+09  3.605409e+09  6.817022e+08  1.183163e+10
4  2004  8.912634e+09  4.013534e+09  8.226867e+08  1.309736e+10

Visualize trends Subtask: Plot the yearly sum of ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’ to visualize their trends over time for the entire market.

Reasoning: Create a plot to visualize the trend of yearly summed financial metrics and market value over time for the entire market as instructed.

plt.figure(figsize=(12, 6))
plt.plot(yearly_market_summary['year'], yearly_market_summary['revenue'], label='Revenue')
plt.plot(yearly_market_summary['year'], yearly_market_summary['grossprofit'], label='Gross Profit')
plt.plot(yearly_market_summary['year'], yearly_market_summary['ebit'], label='EBIT')
plt.plot(yearly_market_summary['year'], yearly_market_summary['market_value'], label='Market Value')
plt.xlabel('Year')
plt.ylabel('Amount')
plt.title('Overall Market Financial and Market Value Trend Over Time')
plt.legend()
plt.show()

Calculate growth indices Subtask: Calculate a growth index for the yearly sum of ‘revenue’, ‘grossprofit’, ‘ebit’, and ‘market_value’, normalized to the first year.

Reasoning: Select the necessary columns, set ‘year’ as the index, and calculate the growth index for each selected financial metric and market value by normalizing to the first year’s value, then display the head of the resulting dataframe.

growth_indices_market = yearly_market_summary[['year', 'revenue', 'grossprofit', 'ebit', 'market_value']].copy()
growth_indices_market.set_index('year', inplace=True)

for col in ['revenue', 'grossprofit', 'ebit', 'market_value']:
    growth_indices_market[f'{col}_index'] = growth_indices_market[col] / growth_indices_market[col].iloc[0]

growth_indices_market.head()
           revenue   grossprofit  ...  ebit_index  market_value_index
year                              ...                                
2000  7.160726e+09  3.061458e+09  ...    1.000000            1.000000
2001  7.542706e+09  3.159320e+09  ...    0.514687            0.874592
2002  7.378204e+09  3.263510e+09  ...    0.731923            0.695517
2003  7.984335e+09  3.605409e+09  ...    1.109529            0.898055
2004  8.912634e+09  4.013534e+09  ...    1.338994            0.994127

[5 rows x 8 columns]

Visualize growth indices Subtask: Plot the yearly growth indices to visualize how these metrics have grown relative to the starting year for the entire market.

Reasoning: Plot the yearly growth indices to visualize how these metrics have grown relative to the starting year for the entire market.

plt.figure(figsize=(12, 6))
plt.plot(growth_indices_market.index, growth_indices_market['revenue_index'], label='Revenue Index')
plt.plot(growth_indices_market.index, growth_indices_market['grossprofit_index'], label='Gross Profit Index')
plt.plot(growth_indices_market.index, growth_indices_market['ebit_index'], label='EBIT Index')
plt.plot(growth_indices_market.index, growth_indices_market['market_value_index'], label='Market Value Index')
plt.xlabel('Year')
plt.ylabel('Growth Index (Normalized to First Year)')
plt.title('Growth of Overall Market Financial Metrics and Market Value - Indexed to First Year')
plt.legend()
plt.show()

In the previous plot, it seems that sales and market value has grown faster than the other variables. However, with this plot of the growth indexes, we see that market value has grown at a similar pace than EBIT over the whole period. Interesting!

So, which plot is a better picture of the real growth rates of the variables?

Above is what Gemini analyzed and summarized:

Analyze and summarize Subtask: Provide insights based on the plots, discussing the overall financial market performance in terms of revenue, gross profit, EBIT, and market value.

Summary: Data Analysis Key Findings Market value has shown significantly higher growth compared to revenue, gross profit, and EBIT when indexed to the first year. Revenue, gross profit, and EBIT demonstrate a general upward trend over the years, albeit with less pronounced growth than market value.

Insights or Next Steps The divergence in growth between market value and fundamental financial metrics suggests that market sentiment and other non-financial factors have played a substantial role in driving market valuation during the observed period.

Further analysis could investigate potential factors contributing to the outperformance of market value growth, such as changes in market multiples (e.g., P/E ratios) or industry-specific trends.