Python for Finance: Investment Fundamentals & Data Analytics
Instructor:
Section 1: Welcome! Course Introduction
1 What Does the Course Cover?
2 Download Useful Resources - Exercise and Solutions
[Udemy!](https://www.dropbox.com/sh/25ifew3toiziwt2/AAAIYTvavqFfWPEemxeZQt37a?dl=0)
Section 2: Introduction to programming with Python
Section 3: Python Variables and Data Types
Section 4: Basic Python Syntax
Section 5: Python Operators Continued
Section 6: Conditional Statements
Section 7: Python Functions
Section 8: Python Sequences
Section 9: Using Iterations in Python
Section 10: Advanced Python tools
Section 11: PART II FINANCE: Calculating and Comparing Rates of Return in Python
Section 12: PART II FINANCE: Measuring Investment Risk
Section 13: PART II FINANCE: Using Regressions for Financial Analysis
Section 14: PART II FINANCE: Markowitz Portfolio Optimization
Section 15: PART II FINANCE: The CAPM Asset Pricing Model
Section 16: PART II FINANCE: Multivariate regression analysis
Section 17: PART II Finance - Monte Carlo simulations as a decision-making tool
98. The essence of Monte Carlo simulations
Monte Carlo simulations are an important tool with several applications in the world of business and finance.
When we run a Monte Carlo simulation, we are interested in observing the different possible realizations of a future event.
What happens in real life is just one of the possible outcomes of any event.
If a basketball player shoots a free throw at the end of a tied game, there are two possibilities: he scores and wins the game, or he doesn’t.
What we see in practice is the state of the world that is actually realized.
However, this doesn’t tell us much about the player’s chances of scoring the free throw beforehand.
=> And that’s where a Monte Carlo simulation comes in handy.
Monte Carlo simulations
- We can use past data to create a simulation: a new set of fictional but sensible data
These realizations are generated by observing the distribution of the historical data and calculating its mean and variance. The new dataset is larger than the initial set, and it allows us to see what would have happened if the player were to shoot the final free throw a thousand times, for instance.
Such information is valuable because it gives us a good proxy for the probability of the different outcomes.
A large dataset with “fictional” data:
- A large dataset
- Good proxy of different outcomes
- Used in a number of situations
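The free-throw idea can be sketched in a few lines. This is only an illustration: the shot history below is invented, and modelling each shot as an independent draw with the historical scoring rate is an assumption, not something the lesson prescribes.

```python
import numpy as np

# Hypothetical history: 1 = free throw made, 0 = missed (illustrative values).
past_shots = np.array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1])

# Describe the historical distribution by its mean and variance.
p_hat = past_shots.mean()
var_hat = past_shots.var()

# Generate a larger, fictional data set: 1,000 simulated final free throws.
rng = np.random.default_rng(seed=0)
simulated_shots = rng.binomial(n=1, p=p_hat, size=1000)

print(f"historical scoring rate: {p_hat:.2f} (variance {var_hat:.2f})")
print(f"simulated scoring rate:  {simulated_shots.mean():.3f}")
```

The simulated scoring rate serves as a proxy for the player's chance of making the final shot.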
Monte Carlo simulations are used in corporate finance, investment, valuation, asset management, risk management, estimating insurance liabilities, and pricing of options and other derivatives.
The high level of uncertainty in finance makes Monte Carlo a valuable tool that improves the decision-making process when several random variables are in play.
Take a look at the notebook file that comes with this lesson. It will explain a very important concept, one we will use when performing Monte Carlo simulations.
Quiz: Why are Monte Carlo simulations frequently applied in the world of finance? Mark the wrong answer.
To estimate statistics based on past data
To create large “fictional” data sets
To obtain a good proxy of different outcomes
To reduce the level of uncertainty in the financial markets
99. Monte Carlo applied in a Corporate Finance context
Given that a Monte Carlo simulation tackles uncertainty, it can have many applications in the world of finance. Undoubtedly some are in a corporate finance setting.
Finance managers face many unknowns in their work. They must forecast the development of revenues, cost of goods sold, and operating expenses. Each value is affected by many factors whose behavior could be considered random. That’s why a tool that reduces complexity and improves decision-making processes can be helpful.
**Current Revenues = Last Year Revenues × (1 + Growth Rate of Revenues)**
Observed:
Last Year Revenues: available
Revenue Growth Rate: Random variable
=> Computer software would allow us to simulate the development of revenues, run 1000 simulations of the growth rate, and obtain the maximum, minimum, and average expected revenues
Understand the direction in which revenues are heading… This can be useful for a finance manager, as it would allow him to understand the overall direction in which the company is heading and the maximum and minimum amounts of revenue he can expect.
The two parameters, the mean revenue growth and its standard deviation, could be obtained by looking at historical figures or chosen arbitrarily based on the user’s intuition.
The reasoning for cost of goods sold and operating expenses is almost the same, with one main difference. We need to represent COGS and OpEx as a percentage of revenues, then model the development of that percentage. This would allow us to obtain COGS and OpEx figures that are in line with the firm’s expected revenues. Intuitively, COGS and OpEx depend on the revenues the firm makes. If revenues are low, there will be no need to spend a lot on COGS, as fewer products will be produced. When you deduct COGS from the revenue of a company, you obtain its gross profit.
| Item | Description |
|---|---|
| Revenues | Current revenues = Previous revenues * (1 + growth rate). Growth rate is the unknown variable. We can simulate its development if we know its distribution, mean, and standard deviation. This would allow us to obtain multiple simulations of the development of revenues |
| Cost of goods sold | Cost of goods sold = Percentage of revenues. For each “revenue path”, COGS can be simulated as a percentage of the observed amount of revenues. All we have to do is simulate the percentage as a random variable with a known distribution, mean, and standard deviation |
| Gross profit | Revenues – Cost of goods sold = Gross profit |
Quiz: Monte Carlo would allow us to forecast:
Revenues
COGS
Operating Profit
All of the above
100. Monte Carlo: Predicting Gross Profit – Part I
Task: predict the firm’s future gross profit
Requirements:
- Expected Revenue
- Expected COGS
For this purpose, we will need the values of expected revenue and expected cost of goods sold, referred to as COGS. In brief, we’ll start by performing one thousand simulations of the company’s expected revenues.
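A minimal sketch of the revenue step, assuming revenues are normally distributed. The mean of 170 and standard deviation of 20 are illustrative numbers (think of them as millions), not values taken from the lesson.

```python
import numpy as np

# Assumed inputs (illustrative): expected revenue and its standard deviation.
rev_m = 170
rev_stdev = 20
iterations = 1000

# Draw 1,000 possible revenue outcomes from a normal distribution.
rng = np.random.default_rng(seed=42)
rev = rng.normal(rev_m, rev_stdev, iterations)

print(f"min {rev.min():.1f}, mean {rev.mean():.1f}, max {rev.max():.1f}")
```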
101. Monte Carlo: Predicting Gross Profit – Part II
The difference between revenues and COGS gives us gross profit. Computing the latter is the objective of this lesson.
Since we have generated 1000 potential revenue and COGS values, calculating the simulation of gross profit requires us to combine the revenue and COGS.
We created COGS as a negative number, so it is correct to use a plus and not a minus sign here.
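The combination can be sketched as follows. The 60% average COGS share and its 10% standard deviation are assumptions made for illustration, and drawing one share per simulated revenue is one possible design choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
iterations = 1000

# Simulated revenues (illustrative mean and standard deviation).
rev = rng.normal(170, 20, iterations)

# COGS modelled as a random percentage of revenues, stored as a negative number.
cogs_share = rng.normal(0.6, 0.1, iterations)
cogs = -(rev * cogs_share)

# Because COGS is negative, gross profit is a sum, not a difference.
gross_profit = rev + cogs

print(f"min {gross_profit.min():.1f}, mean {gross_profit.mean():.1f}, "
      f"max {gross_profit.max():.1f}, std {gross_profit.std():.1f}")
```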
102. Monte Carlo: Predicting Gross Profit – Part III
In this lesson, we’ll show you how a Monte Carlo simulation can be used to model the development of asset prices such as stocks. The price of an equity share is something we’ve observed in the past.
However, its future development is unknown, right?
The only information we have is about past prices.
Tomorrow, the company’s shares could go up or down. Who knows?
Let’s look at the following formula.
The price of a share today is equal to the price of the same share yesterday multiplied by e to the power of r, where r is the log return of the share.
\[\mathrm{Price\; Today\;=\;Price\;Yesterday}\times e^r\]
If this equation looks a little strange to you, remember the algebraic principle that e raised to the power of the natural logarithm of a number gives back that number. Here, r is equal to the natural logarithm of today’s price divided by yesterday’s price.
This is an equation we should feel confident with. Its main added value is that it allows us to depict today’s stock price as a function of yesterday’s stock price and the daily return. The daily return is a random variable, and Brownian motion is a concept that allows us to model such randomness. The formula we can use is made of two components. The first one is called the drift and the second one is the stock’s volatility.
Drift is the direction rates of return have been headed in the past.
It is the best approximation about the future we have.
First, we’ll start by calculating the stock’s periodic daily returns over the historical period. We only have to take a natural logarithm of the ratio between current and previous price.
Once we’ve calculated daily returns in the historical period, we can easily calculate their average, standard deviation, and variance. This allows us to calculate the drift component, which is equal to the average daily return minus 0.5 times the variance.
\[\mathrm{Drift}=\mu-\frac{1}{2}\sigma^2\]
We recognize that historical values are eroded in the future. Hence, 0.5 times the variance is subtracted from the average return.
The drift is the expected daily return of the stock.
- The second component of a Brownian motion is the random variable.
\[\mathrm{Random\;variable\;=\;}\sigma\times Z(Rand(0;1))\]
It is given by the stock’s historical volatility multiplied by Z of a random number between zero and one. The random number between zero and one is a probability. If we assume expected future returns are normally distributed, Z of that probability gives us the number of standard deviations away from the mean. We can do that because statisticians have calculated the distance between the mean and events that have a given probability of occurring, a probability between zero and one.
So, for example, a cumulative probability of about 99.87 percent corresponds to a point three standard deviations above the mean; equivalently, about 99.7 percent of outcomes lie within three standard deviations of the mean. This is how we can determine the variable component of the Brownian motion.
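To make Z concrete, here is a small sketch that uses the inverse cumulative distribution function of the standard normal from Python's standard library. The volatility value is hypothetical.

```python
import random
from statistics import NormalDist

# Z maps a probability between 0 and 1 to a number of standard deviations.
Z = NormalDist().inv_cdf

print(Z(0.5))       # the mean itself: zero standard deviations away
print(Z(0.99865))   # roughly three standard deviations above the mean

# Random component of the Brownian motion: sigma * Z(Rand(0;1)).
sigma = 0.02                        # hypothetical daily volatility
rand = random.Random(0).random()    # a random probability between 0 and 1
random_component = sigma * Z(rand)
print(random_component)
```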
The equation for today’s stock price becomes yesterday’s price multiplied by e to the power of the drift plus the random value.
\[\mathrm{Price\;today\;=\;Price\;Yesterday}\times e^{(\mu-\frac{1}{2}\sigma^2)+\sigma Z[Rand(0;1)]}\]
If we repeat this calculation 1000 times, we’ll be able to simulate the development of tomorrow’s stock price and assess the likelihood it will follow a certain pattern.
In addition, this is a great way to assess the upside and the downside of the investment as we’ve obtained an upper and lower bound when performing the Monte Carlo simulation.
These are the mechanics you need to understand when using Monte Carlo for asset pricing.
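The mechanics above can be sketched end to end on a toy price history. The prices are invented for illustration, and drawing standard normal numbers directly is used here as a shortcut that is equivalent in distribution to applying Z to a uniform random number.

```python
import numpy as np

# Hypothetical closing prices (illustrative values only).
prices = np.array([100.0, 101.2, 100.5, 102.3, 101.8, 103.0, 102.4, 104.1])

# Historical log returns: ln(current price / previous price).
log_returns = np.log(prices[1:] / prices[:-1])

mu = log_returns.mean()
var = log_returns.var()
sigma = log_returns.std()

# Drift = average daily log return minus half the variance.
drift = mu - 0.5 * var

# 1,000 simulated values of tomorrow's price.
rng = np.random.default_rng(seed=1)
z = rng.standard_normal(1000)
price_tomorrow = prices[-1] * np.exp(drift + sigma * z)

print(f"lower bound {price_tomorrow.min():.2f}, upper bound {price_tomorrow.max():.2f}")
```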
103. Monte Carlo: Forecasting Stock Prices - Part I
In this lesson, we’ll continue to explore how Monte Carlo simulations can be applied in practice.
In particular, we will see how we can run a simulation when trying to predict the future stock price of a company.
We want to forecast PG’s future stock price in this exercise.
So the first thing we’ll do is estimate its historical log returns. There is a second way to obtain simple or logarithmic returns, and we will discuss it in more detail in the notebook document attached to this video. The method we’ll apply here is called percent change, and you must write pct_change() to obtain the simple returns from a provided data set. We can create the formula for log returns by using NumPy’s log and then typing one plus the simple returns extracted from our data.
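A short sketch of the two equivalent ways to get log returns. The price column is invented; in the lesson the data would come from a real price series.

```python
import numpy as np
import pandas as pd

# Hypothetical price data (illustrative values only).
data = pd.DataFrame({"PG": [100.0, 101.2, 100.5, 102.3, 101.8]})

# Simple returns with pct_change(), then log returns as log(1 + simple return).
simple_returns = data.pct_change()
log_returns = np.log(1 + simple_returns)

# The same log returns computed from the ratio of current to previous price.
log_returns_alt = np.log(data / data.shift(1))

print(log_returns.tail())
```

The first row is missing in both results because the first price has no predecessor.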
104. Monte Carlo: Forecasting Stock Prices - Part II
The type function allows us to check its type and see that it is a pandas Series.
However, let me demonstrate how typing .values after a pandas object, be it a series or a data frame, can transform the object into a NumPy array.
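For instance, with a hypothetical one-element drift Series:

```python
import numpy as np
import pandas as pd

# Hypothetical drift stored as a pandas Series.
drift = pd.Series({"PG": 0.0002})

print(type(drift))          # a pandas Series
print(type(drift.values))   # a NumPy array

# The same works for a data frame.
frame_values = pd.DataFrame({"PG": [1.0, 2.0]}).values
print(type(frame_values))
```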
105. Monte Carlo: Forecasting Stock Prices - Part III
To make credible predictions about the future, the first stock price in our list must be the last one in our data set.
It is the current market price.
…
Finally, we can generate values for our price list.
We must set up a loop that begins on day 1 and ends on day 1000. We can simply write down the formula for the expected stock price on day t in Python.
It will be equal to the price on day t minus one times the daily return observed on day t. \[S_t=S_{t-1}\times \mathrm{daily\_returns_t}\]
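The loop might look like the sketch below. The drift, standard deviation, starting price, and number of simulated paths are all hypothetical placeholders for the values estimated earlier in the lesson.

```python
import numpy as np

t_intervals = 1000            # forecast horizon in days
iterations = 10               # number of simulated price paths
drift, stdev = 0.0002, 0.01   # hypothetical daily drift and volatility
S0 = 80.0                     # hypothetical last observed market price

# Daily return factors e^(drift + stdev * Z) for every day and path.
rng = np.random.default_rng(seed=7)
daily_returns = np.exp(drift + stdev * rng.standard_normal((t_intervals + 1, iterations)))

# The first price in the list is the current market price.
price_list = np.zeros_like(daily_returns)
price_list[0] = S0

# S_t = S_{t-1} * daily_returns_t, from day 1 to day 1000.
for t in range(1, t_intervals + 1):
    price_list[t] = price_list[t - 1] * daily_returns[t]

print(price_list[-1])
```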
106. An Introduction to Derivative Contracts
Derivative Instruments
Originally, derivatives served as a hedging instrument. Companies interested in buying these contracts were mostly concerned about protecting their investment. However, with time, financial institutions introduced a great deal of innovation to the scene: so-called financial engineering was applied, and new types of derivatives appeared.
| Type of derivative | Description |
|---|---|
| Forwards | A forward contract is used when two parties agree that one party will sell to the other an underlying asset at a future point of time. The price of the asset is agreed beforehand. |
| Futures | Futures are highly standardized forward contracts typically stipulated in a marketplace. The difference between futures and forwards is the level of standardization and the participation of a clearing house – the transaction goes through the marketplace, and the counterparties do not know each other. |
| Swaps | Swap contracts are derivative instruments in which two parties agree to exchange cash flows, based on an underlying asset at a future point of time. The underlying asset can be an interest rate, a stock price, a bond price, a commodity price, and so on. |
| Options | An option contract enables its owner to buy or sell an underlying asset at a price, also known as strike price. The owner of the option contract may buy or sell the asset at the given price, but he may also decide not to do it if the asset’s price isn’t advantageous. |
Forward / Future contracts
The two parties enter into an agreement to buy/sell an asset at time T
A forward/future contract has the following payoff:
The payoff of a forward/future contract is a function of the price agreed when the contract is signed (K) and the price of the asset at time T
107. The Black Scholes Formula for Option Pricing
Option contracts
There are two main types of options.
Call Options – the holder has the right to buy an asset at an agreed strike price.
Put Options – the holder has the right to sell an asset at an agreed strike price.
At time T, the owner of the option decides whether to exercise it or not:
A call option contract has the following payoff:
The payoff of an option is a function of the strike price agreed when the contract is signed (K) and the price of the asset at the maturity of the option (St). In addition, there are two types of options: European and American. European options can be exercised only at maturity, while American options can be exercised at any time up to maturity and are hence at least as valuable.
A put option contract has the following payoff:
As with the call, the payoff of a put is a function of the agreed strike price (K) and the price of the asset at the maturity of the option (St).
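The payoff expressions themselves are not reproduced in these notes. As a sketch, the standard textbook payoffs at maturity for the holder of a call and a put can be written as below; the function names are illustrative.

```python
def call_payoff(S_T: float, K: float) -> float:
    """Payoff at maturity for the holder of a call: exercise only if S_T > K."""
    return max(S_T - K, 0.0)


def put_payoff(S_T: float, K: float) -> float:
    """Payoff at maturity for the holder of a put: exercise only if S_T < K."""
    return max(K - S_T, 0.0)


# With a strike of 100, the call pays off when the asset ends above the strike...
print(call_payoff(110.0, 100.0), put_payoff(110.0, 100.0))
# ...and the put pays off when it ends below.
print(call_payoff(90.0, 100.0), put_payoff(90.0, 100.0))
```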
Swap contracts
In a swap contract, the two parties agree to exchange cash flows based on an underlying asset.
A swap contract has the following payoff:
The payoff of a swap is a function of the price of the underlying asset.
Pricing derivatives
The Black Scholes formula is one of the most widely used tools for derivatives pricing. It can be written in the following way:
\[C(S,t)=N(d_1)S-N(d_2)Ke^{-r(T-t)}\]
\[d_1=\frac{1}{s\sqrt{T-t}}\left[\ln\frac{S}{K}+\left(r+\frac{s^2}{2}\right)(T-t)\right]\]
\[d_2=d_1-s\sqrt{T-t}\]
| Symbol | Description |
|---|---|
| S | The stock’s current market price |
| K | The strike price at which the option can be exercised; if we exercise the option, we can buy the stock at the strike price K |
| T-t | The option’s time until expiration |
| r | Risk free rate |
| s | The standard deviation of the underlying asset’s returns |
| N | Cumulative distribution function of the standard normal distribution |
108. Monte Carlo: Black-Scholes-Merton
Black Scholes Merton
\[d_1= \frac{\ln(\frac{S}{K})+(r+\frac{\mathrm{stdev}^2}{2})T}{\mathrm{stdev}\times \sqrt T}\]
\[d_2=d_1-\mathrm{stdev}\times \sqrt T \]
\[C = SN(d_1)-Ke^{-rT}N(d_2)\]
where:
- S: stock price
- K: strike price
- r: risk free rate
- stdev: standard deviation
- T: time horizon (years)
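Cumulative Distribution Function (cdf)
Shows how probability accumulates up to a given value: cdf(x) is the probability that the variable takes a value less than or equal to x.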
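The three formulas translate almost directly into code. The sketch below uses only the standard library, with N taken as the standard normal cdf. The function names and the input values are illustrative choices.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# N is the cumulative distribution function of the standard normal.
N = NormalDist().cdf


def d1(S, K, r, stdev, T):
    return (log(S / K) + (r + stdev ** 2 / 2) * T) / (stdev * sqrt(T))


def d2(S, K, r, stdev, T):
    return d1(S, K, r, stdev, T) - stdev * sqrt(T)


def bsm_call(S, K, r, stdev, T):
    """Black-Scholes-Merton price of a European call."""
    return S * N(d1(S, K, r, stdev, T)) - K * exp(-r * T) * N(d2(S, K, r, stdev, T))


# Illustrative inputs: stock at 100, strike 100, 5% rate, 20% volatility, 1 year.
price = bsm_call(S=100, K=100, r=0.05, stdev=0.2, T=1)
print(round(price, 2))
```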
> #### Quiz: How can you use a Monte Carlo simulation when applying the Black-Scholes-Merton formula to calculate future outcomes?
When generating future random risk free rates
For randomizing strike values
When randomly generating future stock prices
109. Monte Carlo: Euler Discretization - Part I
\[S_t=S_{t-1}\,e^{(r-\frac{1}{2}\mathrm{stdev}^2)\delta_t+\mathrm{stdev}\sqrt{\delta_t}\,Z_t}\]
110. Monte Carlo: Euler Discretization - Part II
Call Option:
Buy if: \(S-K>0\)
Don’t buy if: \(S-K<0\)
\[C=\frac{e^{-rT}\sum p_i}{iterations}\]
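Putting the discretized price equation and the averaging formula together gives the sketch below. It assumes that \(p_i\) is the call payoff max(S − K, 0) at maturity on path i. All input values are illustrative.

```python
import numpy as np

S0, K = 100.0, 100.0             # illustrative starting price and strike
r, stdev, T = 0.05, 0.2, 1.0     # risk free rate, volatility, horizon in years
t_intervals, iterations = 250, 10000
delta_t = T / t_intervals

rng = np.random.default_rng(seed=0)
Z = rng.standard_normal((t_intervals + 1, iterations))

# Euler discretization: S_t = S_{t-1} * exp((r - 0.5*stdev^2)*dt + stdev*sqrt(dt)*Z_t)
S = np.zeros_like(Z)
S[0] = S0
for t in range(1, t_intervals + 1):
    S[t] = S[t - 1] * np.exp((r - 0.5 * stdev ** 2) * delta_t
                             + stdev * np.sqrt(delta_t) * Z[t])

# Payoff per path: exercise only if S - K > 0 at maturity.
p = np.maximum(S[-1] - K, 0)

# Discounted average payoff across all iterations.
C = np.exp(-r * T) * np.sum(p) / iterations
print(round(C, 2))
```

With these inputs the estimate should land near the closed-form Black-Scholes-Merton value of about 10.45, up to simulation noise.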
Section 18: APPENDIX pandas Fundamentals
111. pandas Series - Introduction
The “Series” object is single-column data, or a set of values that correspond to a single variable
=> We can create a Series from a list
# Check Type:
type(object)
object: the default datatype assigned to data which is not numeric
The pandas Series object corresponds to the one-dimensional NumPy array structure.
Takeaways:
- The pandas Series object is something like a powerful version of the Python list, or an enhanced version of the NumPy array
- The Series offers a larger set of tools and capabilities that are pertinent to the pandas library only
- The Series object stores its values in a sequenced order and has an explicit index
- Always maintain data consistency.
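A minimal sketch of these points, using an invented list of numbers:

```python
import pandas as pd

# Hypothetical list of values for a single variable.
daily_rates = [40, 45, 50, 60]

# Create a Series from the list.
rates_series = pd.Series(daily_rates)

print(type(rates_series))        # a pandas Series
print(rates_series.index)        # the explicit (default integer) index
print(rates_series.to_numpy())   # the underlying one-dimensional NumPy array
```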
112. pandas - Working with Methods - Part I
Almost every entity in Python is an object containing data and typically metadata and some functionalities.
A Python object is associated with a certain collection of attributes and methods.
Attributes provide the metadata, while methods relate to the functionalities and behavior of the object. Attributes are somewhat passive, since they can only deliver information about a given data set.
Methods are active in the sense that they actually work with the data stored in the object. To refine our understanding of why we need Python methods, let’s compare them with functions.
| Functions | Methods |
|---|---|
| An independent entity | Can have access to the object’s data |
| | Can manipulate the object’s state |
Both tools are very similar because when provided with some initial data, they can make specific operations with it and return an output.
However, a function is an independent entity in the sense that it is not associated with an object by construction.
On the contrary, a method from a given package is generally applied to an object of a certain class.
More precisely, when called or invoked, a method can access the object’s data and can also manipulate the object’s state.
We can’t use a method unless there’s an object to associate it with. For this reason, different libraries contain their own sets of methods.
Thus, they can be applied only to the types of objects associated with these libraries.
For instance, there is a certain group of methods in pandas that are specific to the Series object.
They can’t be used on data frames or objects belonging to other packages.
In the rest of this lecture, we will focus on working with some well-known pandas Series methods, but once again, the approach we adopt will be applicable when using Python or NumPy methods as well.
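The contrast between a function and a method can be seen in a few lines. The values are invented, and the particular methods shown (sum, max, append) are just familiar examples.

```python
import pandas as pd

s = pd.Series([40, 45, 50, 60])   # hypothetical values

# A function is independent of the object; the object is passed in as an argument.
n_items = len(s)

# Methods are called on the object and work with the data stored in it.
total = s.sum()
largest = s.max()

# A method can also manipulate the object's state, as list.append does.
numbers = [1, 2, 3]
numbers.append(4)

print(n_items, total, largest, numbers)
```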
114. pandas - Using Parameters and Arguments
In fact, one of the best features of using methods is that we can also modify their performance.
Technically, we can achieve that by knowing the parameters associated with a given method and then supplying the relevant arguments upon execution.
We’ll clarify the distinction between a parameter and an argument, and we’ll use this terminology in the context of pandas Series.
Of course, the theory and principles we’ll go through will be applicable not just when working with pandas, NumPy, or Python, but with other languages and packages as well.
The value of that default argument happens to be the integer 5.
In addition, a parameter of a Python method or function always has a name.
Therefore, we can write the relevant name in our code explicitly.
The benefit of using parameter names explicitly and in the right order is to inform the reader exactly what parameters you’re providing the arguments for.
Thus, we can avoid any ambiguity in what the supplied arguments are supposed to be used for.
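As a sketch, take the Series method head(), whose parameter n has the default argument 5. That this is the method the lesson refers to is an assumption.

```python
import pandas as pd

s = pd.Series(range(10, 20))   # hypothetical data: the integers 10 to 19

# No argument supplied: the default argument n=5 is used.
first_five = s.head()

# Supplying an argument positionally...
first_three = s.head(3)

# ...or explicitly by parameter name, which removes any ambiguity.
first_three_named = s.head(n=3)

print(len(first_five), first_three_named.tolist())
```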
115.
So the pandas Series object represents single-column data, or a set of observations related to a single variable.
It corresponds to the one-dimensional NumPy array structure. We’ll assume that’s clear, and we’ll focus on exploring various methods and tools related to the pandas Series.
117.
Let’s further elaborate on the concept of the data frame.
This is important because you will be constantly working with the structure while pre-processing and analyzing data with pandas.
In fact, it will be very convenient to develop an understanding of the data frame by comparing it to the characteristics of the series object.
You are already aware of the latter.
So let’s take advantage of that.
You can think of the series object as a single column data or a set of values that correspond to a single variable.
Well, the same can be true for a data frame, but it more often relates to multi-column data where each column represents a different variable.
That’s why every column can contain data of its own type.
And this is one of the best features of the structure.
In other words, the information in the data frame can potentially be heterogeneous.
It doesn’t need to be of a single type, as is the case with a series.
Nevertheless, this allows us to still preserve data consistency.
We should keep aiming for having information of the same type within a certain column.
Therefore, it is very important to remember the following throughout the lecture and beyond.
Fundamentally, from the perspective of pandas, the data frame is nothing but a collection of multiple series objects.
This means that as a structure, the series corresponds to a single column of a data frame.
So any characteristic of a series you can think of is also applicable to the separate columns of a data frame.
Therefore, you can refer to the series object as a one dimensional data structure because it contains values along a single axis representing a certain number of rows.
The data frame instead is a two dimensional structure in the sense that its data has been organized not only in the rows but also in the columns.
Thus, it represents a tabular structure and is the closest Python analogue we can have to a standard two dimensional dataset, so to speak, or a spreadsheet.
So a data frame can be regarded as a collection of series in the same way in which an Excel spreadsheet is a collection of multiple columns.
That’s why a data frame can have both row and column labels.
On the whole, how can such organization improve your analysis?
Well, it frees you from constantly referring to the data frame as a two dimensional numpy array or a 2D matrix.
Rather, you can think of it analytically as a collection of multiple observations for the given variables.
Thus, you’ll be able to obtain information contained in a single data point by referring to the relevant observation of a certain variable.
In technical comparison, you can use the row index of a series as a single point of reference to obtain a certain data value. While working with the data frame, though, we need to use two points of reference: the row index and the column index.
If you use just one point of reference in the data frame, you’d obtain either an entire row or an entire column. To obtain the value of a single data point from a data frame, corresponding to the value of a specific observation from a given variable, you’ll need the intersection defined by the relevant index and column positions.
As a final juxtaposition between the two key pandas data structures, let’s view them as programming objects.
Basically, the series can be looked at as a powerful version of the Python list.
However, it also includes some Python dictionary features.
For instance, a series doesn’t just take integer, float, string, or other values.
Its indexing relates to the functionality of the keys of a dictionary.
Then the related features of this structure can lead to many analytical inferences.
Perhaps the most important feature is the advanced type of indexing associated with the series object as opposed to the Python list object.
The former allows us to extract the desired parts of the given data set more quickly and efficiently.
Taking that into account, the data frame object is essentially an enhanced Python dictionary. Much like creating a series from a Python dictionary, we can construct a data frame.
The difference is that we don’t need to associate just a single value to the dictionary keys.
We can provide a whole object that contains the values of an entire column.
Thus, the pandas data frame provides dictionary-like access to its elements, mirrors many characteristics of the Python dictionary, and adds a lot more features and functionalities on top to enrich the outcome of our analyses.
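The points about dictionaries, columns, and points of reference can be illustrated with a small, invented data set:

```python
import pandas as pd

# A data frame built from a dictionary: each key maps to a whole column.
df = pd.DataFrame(
    {"Product": ["A", "B", "C"], "Price": [10.5, 20.0, 7.25]},
    index=["p1", "p2", "p3"],
)

price_column = df["Price"]             # one point of reference: an entire column
row_p2 = df.loc["p2"]                  # one point of reference: an entire row
single_value = df.loc["p2", "Price"]   # two points of reference: one data point

print(type(price_column))   # every column is itself a Series
print(single_value)
```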
118.
Let’s start with an overview of the general characteristics. The data frame is the most important structure in the pandas library.
There are several ways to look at it, and perhaps the most intuitive one is to consider it as a data table.
It is nothing but a tabular structure that contains multiple observations for a given set of variables.
That’s why we define the data frame as two dimensional.
Therefore, to obtain a single data value from it, you need two points of reference.
One is the column of interest and the other one the relevant row.
So as a structure, we can associate the data frame to a matrix.
However, pandas users know that the data frame is just a collection of one or several series objects.
Thus, whether it’s representing single or multi column data, every column from the data frame is a series object itself.
This means that each column inherits the characteristics and functionalities of a series.
One of these functionalities is that a series can contain data of its own type.
Hence, the information in a data frame can potentially be heterogeneous, which, from an analytical perspective and provided that we preserve data consistency, can be a huge advantage.
The data frame structure allows you to conduct and improve your analysis in millions of ways.
So what you need is to master some fundamentals. The rest is all about accumulating practice working with various datasets.
That said, let’s focus on the following example. Remember, the data frame is a structure that corresponds to a two-dimensional NumPy array.
That’s why we will initialize a data frame from a homogeneous array.
For starters, let’s import pandas and numpy
Then we will create a two by three array, which means it will contain two one dimensional arrays with three elements each.
We’ll call it array a.
The relationship between the pandas data frame and the NumPy two-dimensional array is solid.
As a consequence, creating a data frame out of an array is straightforward.
We only need to place Array A within the parentheses of the pandas data frame constructor to obtain the desired result.
The type function confirms that, indeed, the object is a data frame. Back to the obtained output.
Don’t forget that when we are not explicit about the index and column names while creating a data frame, the default integer indexing will be implicitly assigned to both. OK, as a next step,
we can use the conventional name for a data frame variable, which is df, and provide the following column labels by using the columns parameter: column one, column two, and column three.
Then, as the documentation states, we can use the index parameter to assign row one and row two as index values.
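The walkthrough above corresponds roughly to the following sketch. The array values and the exact spelling of the labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# A two-by-three array: two one-dimensional arrays with three elements each.
array_a = np.array([[3, 2, 1], [6, 3, 2]])

# Without explicit labels, default integer indexing is assigned to rows and columns.
default_df = pd.DataFrame(array_a)
print(type(default_df))

# Explicit column and row labels via the columns and index parameters.
df = pd.DataFrame(
    array_a,
    columns=["Column 1", "Column 2", "Column 3"],
    index=["Row 1", "Row 2"],
)
print(df)
```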
120. pandas DataFrames - Data Selection
We will begin working on data selection in pandas data frames.
We will first introduce the concept of subset selection and then focus on its application on pandas.
Data Selection:
Data selection or subset selection in the pandas data frame means extracting elements, rows, columns or subsets from such an object.
Data selection allows us to work on just a portion of a data set.
Since this is exactly what you will do most of the time in the stage of analysis, the ability to extract specific parts of the data is crucial.
Now, let’s focus on how you can achieve this by using pandas. One way to select a subset of a data frame is through indexing.
### Indexing
Indexing means using one or both types of indexes a data frame has, the row index and the column index, to access or select specific parts of the data. Or, to put it in other words, indexing means providing certain values as row and/or column specifiers, also called row and column indexers.
With their help, we can point to specific rows or columns of the dataset so we can extract a specific entry or a desired subset of our data. So far, this is all theoretical.
Note: A column name with a whitespace character can’t act as a valid identifier for Python.
Whether you work with pandas Series and data frames or NumPy arrays, you’ll always need to access parts of the available data.
To do that, you’ll be using a similar syntax that may behave differently under the various types of data selection.
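A brief sketch of column selection, including the whitespace case from the note above. The data set is invented.

```python
import pandas as pd

df = pd.DataFrame({"Product Name": ["A", "B", "C"], "Price": [10.5, 20.0, 7.25]})

# Selecting a column with the bracket (indexing) syntax.
price_by_key = df["Price"]

# Attribute-style selection works because "Price" is a valid Python identifier.
price_by_attr = df.Price

# "Product Name" contains a whitespace, so the bracket syntax is needed.
name_column = df["Product Name"]

# A list of labels selects a subset of columns as a new data frame.
subset = df[["Product Name", "Price"]]

print("Price".isidentifier(), "Product Name".isidentifier())
```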
121. pandas DataFrames - Data Selection with .iloc[]
iloc[]
- .iloc indexer = .iloc accessor
Same rules apply for indexing:
- Python lists
- pandas Series by index position
- pandas Series and Dataframe with .iloc[]
- Uses strictly integer, position-based indexing.
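The parallel between list indexing and .iloc[] can be sketched as follows, with invented data and labels:

```python
import pandas as pd

values = [10, 20, 30, 40]
s = pd.Series(values, index=["a", "b", "c", "d"])
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["r1", "r2", "r3"])

# Position-based access follows the same rules as Python lists.
print(values[0], s.iloc[0])      # first element
print(values[-1], s.iloc[-1])    # last element
print(values[1:3], s.iloc[1:3].tolist())   # slices exclude the end position

# With a data frame, .iloc[] takes a row position and, optionally, a column position.
first_row = df.iloc[0]          # entire first row
first_column = df.iloc[:, 0]    # entire first column
single_value = df.iloc[0, 1]    # row 0, column 1
first_two_rows = df.iloc[0:2]
```

Note that .iloc[] ignores the labels ("a", "r1", and so on) and looks only at integer positions.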