Analysis of Risk

In Theory

I read in “The Psychology of Money” that a semi-successful VC portfolio typically looks like this: 0.5% of the companies return about 50x, 1% return around 20x, 2.5% return around 10x, 31% break even, and a staggering 65% lose half of the money invested in them. We will play around with these theoretical parameters, assuming we could trade like this. Only 4% of the companies actually make money! The rest? They break even or lose.

# Analyzed from VC data
theoretical_spread <- data.frame(
  "Percent of Companies" = c(0.005, 0.01, 0.025, 0.31, 0.65),
  "Multiplier" = c(50, 20, 10, 1, 0.5)
)

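# Check that the percentages add up to 100% (with a tolerance for floating-point error)
if (!isTRUE(all.equal(sum(theoretical_spread$Percent.of.Companies), 1))) {
  print("Percentages must add up to 100%")
}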

theoretical_spread
##   Percent.of.Companies Multiplier
## 1                0.005       50.0
## 2                0.010       20.0
## 3                0.025       10.0
## 4                0.310        1.0
## 5                0.650        0.5

Money Generated in Theory

We assume that every losing company costs us exactly 50% of the money we put into it. We assume this because we would sell as soon as a position drops to 50% of its value.
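A minimal sketch of how this stop-loss assumption can be applied to a vector of multipliers: any multiplier below 0.5 is clamped to 0.5, since the position would have been sold at half its value. The multipliers below are made-up values for illustration only.

```r
# Hypothetical raw multipliers (end value / start value) for five positions
raw_multipliers <- c(3.2, 1.1, 0.7, 0.3, 0.05)

# Stop-loss assumption: a position is sold once it falls to 50% of its cost,
# so no position can end below a multiplier of 0.5
STOP_LOSS <- 0.5
floored_multipliers <- pmax(raw_multipliers, STOP_LOSS)

floored_multipliers
## [1] 3.2 1.1 0.7 0.5 0.5
```

Keep in mind this floor is optimistic: in real trading a stop order can fill below its trigger price when a stock gaps down.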

initial_capital <- 10000
num_companies <- 100
inv_per_company <- initial_capital / num_companies

# Money is made from the top 4% 
percent_that_makes_money <- sum(theoretical_spread[theoretical_spread$Multiplier > 1, ]$Percent.of.Companies)

money <- ((theoretical_spread[, 1] * num_companies) * inv_per_company) * theoretical_spread[, 2]

data <- cbind(theoretical_spread, money)
data
##   Percent.of.Companies Multiplier money
## 1                0.005       50.0  2500
## 2                0.010       20.0  2000
## 3                0.025       10.0  2500
## 4                0.310        1.0  3100
## 5                0.650        0.5  3250
money_generated <- sum(data$money)
cat("Only the top ", percent_that_makes_money * 100, "% are what is generating profit, rest is either breakeven or failing. Even with this low probability, the profit we gained is $", money_generated - initial_capital, " from an initial capital of $", initial_capital, sep = "")
## Only the top 4% are what is generating profit, rest is either breakeven or failing. Even with this low probability, the profit we gained is $3350 from an initial capital of $10000
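The $3,350 is not a coincidence of the 100-company setup: it is the expected profit of this spread. A quick check, reusing the percentages and multipliers from the table above (the variable names here are new, chosen for this sketch):

```r
# Same VC spread as above: share of companies and their return multiplier
share      <- c(0.005, 0.01, 0.025, 0.31, 0.65)
multiplier <- c(50, 20, 10, 1, 0.5)

# Expected multiplier: each outcome weighted by how often it occurs
expected_multiplier <- sum(share * multiplier)

# Expected profit on $10,000 spread evenly over the portfolio
expected_profit <- 10000 * (expected_multiplier - 1)

expected_multiplier
## [1] 1.335
expected_profit
## [1] 3350
```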

Varying the number of companies we invest in

INITIAL_CAPITAL <- 10000
# Theoretical capital gain: each company's multiplier is sampled using the pregenerated probabilities
capital_gain_theoretical <- function(number_of_companies) {
  inv_per_company <- INITIAL_CAPITAL / number_of_companies
  # Each company's return, expressed as a multiplier
  multipliers <- sample(theoretical_spread$Multiplier, number_of_companies, replace = TRUE, prob = theoretical_spread$Percent.of.Companies)
  
  money_generated_from_each_company <- inv_per_company * multipliers
  money_generated <- sum(money_generated_from_each_company)
  profit <- money_generated - INITIAL_CAPITAL
  profit
}

RUNS <- 1000
results <- data.frame(
  "5 Companies" = numeric(RUNS),
  "10 Companies" = numeric(RUNS),
  "20 Companies" = numeric(RUNS),
  "50 Companies" = numeric(RUNS),
  "100 Companies" = numeric(RUNS),
  "200 Companies" = numeric(RUNS)
)

num_companies <- c(5, 10, 20, 50, 100, 200)

for (i in seq_along(num_companies)) {
  for (run in seq_len(RUNS)) {
    results[run, i] <- capital_gain_theoretical(num_companies[i]) 
  }
}

{function(){
  par(mfrow=c(3, 2)) 
  hist(results$X5.Companies)
  hist(results$X10.Companies)
  hist(results$X20.Companies)
  hist(results$X50.Companies)
  hist(results$X100.Companies)
  hist(results$X200.Companies)
}}()

# Share of simulation runs in `column` whose profit exceeds `profit`
prob_making_money <- function(profit, column) {
  probability <- mean(results[, column] > profit)
  cat("Your probability of making more than $", profit, " is ", probability * 100, "%\n", sep="")
}

prob_making_money(0, "X5.Companies")
## Your probability of making more than $0 is 18.8%
prob_making_money(0, "X10.Companies")
## Your probability of making more than $0 is 33.6%
prob_making_money(0, "X20.Companies")
## Your probability of making more than $0 is 58.4%
prob_making_money(0, "X50.Companies")
## Your probability of making more than $0 is 70.2%
prob_making_money(0, "X100.Companies")
## Your probability of making more than $0 is 78%
prob_making_money(0, "X200.Companies")
## Your probability of making more than $0 is 89.8%
# The 5-company column is left out of the box plot because its range is so wide that it squashes the other boxes.
par(mfrow=c(1, 1))
boxplot(results[, -1], ylab="Profit", xlab="From 10 to 200 Companies")

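Increasing the number of companies lowers the risk (the chance of making money rises from 18.8% with 5 companies to 89.8% with 200), but it also lowers the money-earning potential: the big wins shrink along with the big losses. With fewer companies, the risk of losing a lot of money goes up, but so does the chance of making a lot of money. The expected profit itself is the same $3,350 for every portfolio size, since every dollar is multiplied by the same expected multiplier of 1.335; diversification only changes how widely the outcomes spread around that value.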
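The shrinking spread can be quantified without simulation. Assuming each company's multiplier is drawn independently from the VC spread (as `capital_gain_theoretical()` does), the profit with \(N\) companies is capital times (average multiplier minus 1), so its mean does not depend on \(N\) while its standard deviation falls like \(1/\sqrt{N}\). A sketch:

```r
# VC spread from the table above
share      <- c(0.005, 0.01, 0.025, 0.31, 0.65)
multiplier <- c(50, 20, 10, 1, 0.5)
capital    <- 10000

# Mean and standard deviation of a single company's multiplier
mu    <- sum(share * multiplier)
sigma <- sqrt(sum(share * (multiplier - mu)^2))

# With N independent companies and capital / N in each company, profit is
# capital * (average multiplier - 1): its mean is the same for every N,
# while its standard deviation shrinks like 1 / sqrt(N)
n_companies <- c(5, 10, 20, 50, 100, 200)
profit_summary <- data.frame(
  companies     = n_companies,
  expected      = capital * (mu - 1),
  std_deviation = capital * sigma / sqrt(n_companies)
)
profit_summary
```

With 5 companies the standard deviation is roughly $18,800, far larger than the expected profit. For small portfolios the simulated probabilities can also be checked by hand: every winner in this spread returns at least 10x, so with up to 18 companies a single winner turns a profit even if everything else halves, giving \(P(\text{profit} > 0) = 1 - 0.96^N\). That is about 18.5% for 5 companies and 33.5% for 10, close to the simulated 18.8% and 33.6%.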

From Theory to Experiment

The multipliers and stock data CSV files were generated by my Python script (spread_generator.py, included at the end), which calls the Alpaca API to fetch NASDAQ price data for a chosen date range. The prices come straight from the API. choice.1.csv is a list of stocks I personally chose from the NASDAQ because I think they are good companies; their multipliers were generated by running the script over a specific date range. A multiplier is the ratio of a stock's opening price at the end of the range to its opening price at the start, e.g. \(\text{NVDA}_{\text{mult}} = \text{NVDA}_{\text{future price}} / \text{NVDA}_{\text{past price}}\).
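As a small illustration of the formula, here it is applied to made-up tickers and opening prices (the real values come from the Alpaca API via the script), together with what a $1,000 position in each stock would be worth at the end of the range:

```r
# Hypothetical opening prices on the start and end dates of the range
prices <- data.frame(
  Symbol     = c("AAA", "BBB", "CCC"),
  start_open = c(100, 250, 40),
  end_open   = c(150, 200, 10)
)

# Multiplier = end price / start price
prices$Multiplier <- prices$end_open / prices$start_open

# Value of a $1,000 position in each stock at the end of the range
prices$end_value <- 1000 * prices$Multiplier
prices
```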

Loading Data

# These are stocks I chose personally
my_spread <- read.csv("data/choice.1.csv", sep=",", header = TRUE)
# Apply the 50% stop-loss: no position ends below a multiplier of 0.5
my_spread[my_spread[, "Multiplier"] < 0.5, "Multiplier"] <- 0.5
head(my_spread)
##   Symbol Multiplier
## 1   UBER   1.767338
## 2   AMZN   1.217994
## 3   MSFT   1.271145
## 4   META   1.552486
## 5    NOC   1.054198
## 6   NFLX   1.371103
nrow(my_spread)
## [1] 33

Loading Parameters

INITIAL_CAPITAL <- 10000
RUNS <- 1000
# When changing this variable make sure it is reflected in the results data frame!
num_companies <- c(1, 5, 10, 15, 20, 25)

Varying the amount of companies I invest in with the stocks I chose

# Simulated capital gain: split INITIAL_CAPITAL equally across `number_of_companies`
# stocks drawn without replacement from the list of multipliers
# Returns: profit
capital_gain_custom_spread <- function(number_of_companies, multiplier = numeric(0)) {
  inv_per_company <- INITIAL_CAPITAL / number_of_companies
  # Unlike the theoretical spread, every stock in the list is equally likely
  # to be picked, and each stock can be picked at most once
  picked_multipliers <- sample(multiplier, number_of_companies, replace = FALSE)
  money_generated_from_each_company <- inv_per_company * picked_multipliers
  money_generated <- sum(money_generated_from_each_company)
  
  profit <- money_generated - INITIAL_CAPITAL
  profit
}

results = data.frame(
  "1 Company" = numeric(RUNS),
  "5 Company" = numeric(RUNS),
  "10 Company" = numeric(RUNS),
  "15 Company" = numeric(RUNS),
  "20 Company" = numeric(RUNS),
  "25 Company" = numeric(RUNS)
)

for (i in seq_along(num_companies)) {
  for (run in seq_len(RUNS)) {
    results[run, i] <- capital_gain_custom_spread(num_companies[i], multiplier = my_spread[, "Multiplier"]) 
  }
}

{function() {
  par(mfrow = c(3, 2), mar=c(4,4,1,3))
  for (column in colnames(results)) {
    hist(
      results[, column],
      col = "blue",
      ylab = "Attempts",
      xlab = "Profit",
      breaks = "Freedman-Diaconis",
      #breaks = 100,
      xlim = c(min(results[, column], 0), max(results[, column])),
      main = column
    )
    # Mean line
    abline(v = mean(results[, column]), col = "red")
    # Quantile lines
    #abline(v = quantile(results[, column], prob = c(0.25, .50, 0.75)), col = "purple")
  }
}}()

prob_making_money(0, "X20.Company")
## Your probability of making more than $0 is 99.6%

As in the theoretical simulation, increasing the number of companies we invest in lowers the risk of losing a lot of money, but it also lowers the money-earning potential, while the average profit stays roughly the same. The experiment matches the theory.
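Why the average stays put while the spread shrinks can be made precise. Drawing \(n\) of the \(N\) stocks without replacement and splitting the capital equally, the profit is capital times (average sampled multiplier minus 1). Its mean is the same for every \(n\), and its standard deviation follows the finite-population correction \(\sigma \sqrt{(N-n)/(n(N-1))}\), where \(\sigma\) is the population standard deviation of the list. A sketch, with `profit_sd_without_replacement()` a hypothetical helper and a randomly generated stand-in for the real multiplier list:

```r
# Standard deviation of profit when `n` stocks are drawn without replacement
# from a fixed list of multipliers and capital is split equally between them
profit_sd_without_replacement <- function(multipliers, n, capital = 10000) {
  N <- length(multipliers)
  pop_var <- mean((multipliers - mean(multipliers))^2)  # population variance
  # Finite-population correction: the spread vanishes when n == N
  capital * sqrt(pop_var / n * (N - n) / (N - 1))
}

# Hypothetical list of 33 multipliers (not the real choice.1.csv data)
set.seed(1)
toy_list <- runif(33, min = 0.5, max = 2)

# Mean profit: identical for every portfolio size
expected_profit <- 10000 * (mean(toy_list) - 1)
expected_profit

# Spread for the portfolio sizes used above, plus buying the whole list
spread_by_n <- sapply(c(1, 5, 10, 15, 20, 25, 33), profit_sd_without_replacement, multipliers = toy_list)
spread_by_n
```

Buying every stock on the list (\(n = N\)) removes all randomness, which is the limit of the narrowing seen in the histograms.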

Comparing Random Stock Picking vs Stocks I picked

Loading Data

# Note: unlike the previous section, the 0.5 stop-loss floor is not applied to these multipliers
my_choice_stocks <- read.csv("data/choice.1.csv", header = TRUE, sep=",")
random_stocks <- read.csv("data/random.1.csv", header = TRUE, sep=",")

Loading Parameters

# Starting capital
INITIAL_CAPITAL <- 10000
# Number of simulation runs; more runs give smoother estimates
RUNS <- 1000
# Number of companies we invest in (for both random and my choice of stocks)
COMPANIES <- 15

Experiment

capital_gain_custom_spread <- function(number_of_companies, multiplier = numeric(0)) {
  inv_per_company <- INITIAL_CAPITAL / number_of_companies
  # Unlike the theoretical spread, every stock in the list is equally likely
  # to be picked, and each stock can be picked at most once
  picked_multipliers <- sample(multiplier, number_of_companies, replace = FALSE)
  money_generated_from_each_company <- inv_per_company * picked_multipliers
  money_generated <- sum(money_generated_from_each_company)
  
  profit <- money_generated - INITIAL_CAPITAL
  profit
}

categories <- c("My Stock", "Random Stock")
num_rows = length(categories) * RUNS
results = data.frame(
  "Profit" = numeric(num_rows),
  "Type" = factor(character(num_rows), levels = categories)
)

idx <- 1
# Getting my stock data
for (i in seq_len(RUNS)) {
    results[idx, "Profit"] <- capital_gain_custom_spread(COMPANIES, multiplier = my_choice_stocks[, "Multiplier"])
    results[idx, "Type"] <- categories[1]
    idx <- idx + 1
}

# Getting random stock data
for (i in seq_len(RUNS)) {
    results[idx, "Profit"] <- capital_gain_custom_spread(COMPANIES, multiplier = random_stocks[, "Multiplier"])
    results[idx, "Type"] <- categories[2]
    idx <- idx + 1
}


library(ggplot2)

{function() {
  p <- ggplot(results, aes(x=Profit, fill=Type))
  p + geom_area(stat = "bin", bins = 30) + geom_vline(xintercept = 0, col = "red")
}}()

# Named vector (not the prob_making_money() function defined earlier)
prob_profit <- tapply(results$Profit, results$Type, function(category_data) {
  mean(category_data > 0)
})

cat(
  "The probability of making more than $0 with me choosing stocks is ", 
  prob_profit["My Stock"] * 100, "%.\n", 
  "The probability of making more than $0 with random stocks is ",
  prob_profit["Random Stock"] * 100, "%.", sep = ""
)
## The probability of making more than $0 with me choosing stocks is 96.1%.
## The probability of making more than $0 with random stocks is 65.1%.
my_stock_results <- results[results$Type == "My Stock", "Profit"]
random_stock_results <- results[results$Type == "Random Stock", "Profit"]
{function() {
  par(mfrow = c(1, 2))
  hist(
    random_stock_results, 
    main = "Random Stocks",
    xlab = "Profit"
  )
  hist(
    my_stock_results,
    main = "Me Choosing Stocks",
    xlab = "Profit"
  )
}}()

Hypothesis Testing: Is My Stock Picking Better Than Random?

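We will do a one-sided two-sample z-test comparing the means. With \(n = 1000\) runs per group (well above 30), the sample means are approximately normal by the central limit theorem, and the histograms above look approximately normal too. We need to see whether my mean profit is significantly greater than the random mean profit. If it is, my stock-picking expertise works; if not, the difference could just be a fluke. One caveat: the runs are resamples of the same two fixed stock lists, so the p-value can be pushed as low as we like by increasing RUNS. What the test really establishes is that the stocks on my list had a higher average multiplier than the random list over this date range.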

\[ \begin{aligned} H_0&: \mu_{choice} \leq \mu_{random}\\ H_1&: \mu_{choice} > \mu_{random} \end{aligned} \]
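The statistic the z-test computes can be written out by hand: \(z = (\bar{x}_{choice} - \bar{x}_{random}) / \sqrt{\sigma_{choice}^2/n_{choice} + \sigma_{random}^2/n_{random}}\), with the one-sided p-value taken from the upper tail of the standard normal. A sketch, with `z_test_greater()` a hypothetical helper that plugs in each sample's population standard deviation as the known sigma (as the `z.test()` call does), shown on made-up profits:

```r
# Two-sample z statistic for H1: mean(x) > mean(y), treating each
# sample's population standard deviation as the known sigma
z_test_greater <- function(x, y) {
  pop_sd <- function(v) sqrt(mean((v - mean(v))^2))
  se <- sqrt(pop_sd(x)^2 / length(x) + pop_sd(y)^2 / length(y))
  z  <- (mean(x) - mean(y)) / se
  # One-sided p-value: chance of a z at least this large if H0 holds
  p  <- pnorm(z, lower.tail = FALSE)
  c(z = z, p.value = p)
}

# Toy example with made-up profits
toy <- z_test_greater(c(3, 4, 5), c(1, 2, 3))
toy
```

Here the means differ by 2 and the standard error is 2/3, so z is 3 and the p-value is about 0.00135.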

library(BSDA)  # provides z.test()

population_sd <- function(x) {
  n <- length(x)              # Number of elements in the sample
  mean_x <- mean(x)           # Mean of the sample
  sqrt(sum((x - mean_x)^2) / n)  # Population standard deviation formula
}

res <- z.test(
  x = my_stock_results, 
  y = random_stock_results, 
  alternative = "greater",
  sigma.x = population_sd(my_stock_results),
  sigma.y = population_sd(random_stock_results)
)

res
## 
##  Two-sample z-Test
## 
## data:  my_stock_results and random_stock_results
## z = 47.328, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  1107.922       NA
## sample estimates:
## mean of x mean of y 
## 1261.0354  113.2218

In Conclusion

alpha <- 0.05  # significance level
if (res$p.value < alpha) {
  cat("Since the p-value is less than the significance level, we reject the null hypothesis, which means that choosing stocks yourself is better than randomly picking stocks")
} else {
  cat("Since the null hypothesis is not rejected, you may as well invest in random stocks rather than choose stocks yourself. You suck at trading. :((")
}
## Since the p-value is less than the significance level, we reject the null hypothesis, which means that choosing stocks yourself is better than randomly picking stocks

Testing different time periods

For this test I gather a list of 15 big tech companies that I believe in. In each time period I randomly choose 7 of them to buy at the start and sell at the end, and I repeat this for 1000 runs. There are 5 time periods of 6 months each, and I test every one of them. I then plot all of them in a stacked area chart. This helps determine whether I have a good enough chance of making money this way, and later I can compare it to the S&P 500's performance.
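The five 6-month windows can be generated mechanically, e.g. to feed start and end dates into spread_generator.py, which hard-codes them. A sketch, with `six_month_windows()` a hypothetical helper and the end date an arbitrary example rather than the one actually used:

```r
# Build `n` consecutive 6-month windows ending at `end_date`,
# newest first (period 1 = most recent window)
six_month_windows <- function(end_date, n = 5) {
  boundaries <- seq(as.Date(end_date), by = "-6 months", length.out = n + 1)
  data.frame(
    period = seq_len(n),
    start  = boundaries[-1],
    end    = boundaries[-(n + 1)]
  )
}

# Hypothetical end date, for illustration only
six_month_windows("2024-08-15")
```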

Loading Data

time_periods <- 1:5
COMPANIES <- 7
RUNS <- 1000
data <- data.frame("Profit" = numeric(0), "TimePeriod" = factor(numeric(0), levels = time_periods))

idx <- 1
for (tp in time_periods) {
  file_name <- paste("data/time_period/", tp, ".csv", sep="")
  tp_data <- read.csv(file_name, header = TRUE, sep = ",")
  for (i in seq_len(RUNS)) {
      data[idx, "Profit"] <- capital_gain_custom_spread(COMPANIES, multiplier = tp_data[, "Multiplier"])
      data[idx, "TimePeriod"] <- tp
      idx <- idx + 1
  }
}

Displaying Data

Now that we have loaded the data into a data frame, we can create a stacked area chart.

{function() {
  p <- ggplot(data, aes(x=Profit, fill=TimePeriod))
  p + geom_area(stat = "bin", bins=30) + geom_vline(xintercept = 0, col = "red")
}}()

As you can see, the time period really matters. Time period 1 means buying and selling within the most recent 6-month window, time period 2 is the 6 months before that, and so on; each time period lasts 6 months from purchase to sale. My conclusion is that the stock market is mostly just luck: when you get in really matters. If we invest in more companies we can minimize risk, so less money can be lost, but less money can be gained as well. In time period 5 (the oldest 6-month window), every one of the 1000 sampled combinations of 7 out of the 15 big companies lost money if you bought and sold in that period.
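The chart can be backed up with numbers per period. A sketch, with `summarise_periods()` a hypothetical helper that expects the `Profit` and `TimePeriod` columns used above; on the simulation it would be called as `summarise_periods(data)`, and here it runs on a tiny made-up data frame:

```r
# Per time period: share of runs that made money, mean profit,
# and worst/best outcome
summarise_periods <- function(sim) {
  per_period <- lapply(split(sim$Profit, sim$TimePeriod), function(p) {
    data.frame(p_profit = mean(p > 0), mean = mean(p), min = min(p), max = max(p))
  })
  do.call(rbind, per_period)
}

# Made-up example data, not the real simulation results
toy <- data.frame(
  Profit     = c(500, -200, 300, -800, -100, -50),
  TimePeriod = factor(c(1, 1, 1, 2, 2, 2))
)
summarise_periods(toy)
```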

Increasing the number of companies we choose and the list of stock choices

I will take the 50 biggest tech stocks and test investing in 25 randomly chosen companies from the list, over 1000 runs. I will use the same time periods.

Loading Data

time_periods <- 1:5
COMPANIES <- 25
RUNS <- 1000
data <- data.frame("Profit" = numeric(0), "TimePeriod" = factor(numeric(0), levels = time_periods))

idx <- 1
for (tp in time_periods) {
  file_name <- paste("data/time_period.2/", tp, ".csv", sep="")
  tp_data <- read.csv(file_name, header = TRUE, sep = ",")
  for (i in seq_len(RUNS)) {
      data[idx, "Profit"] <- capital_gain_custom_spread(COMPANIES, multiplier = tp_data[, "Multiplier"])
      data[idx, "TimePeriod"] <- tp
      idx <- idx + 1
  }
}

Displaying Data

Now that we have loaded the data into a data frame, we can create a stacked area chart.

{function() {
  p <- ggplot(data, aes(x=Profit, fill=TimePeriod))
  p + geom_area(stat = "bin", bins=30) + geom_vline(xintercept = 0, col = "red")
}}()

Conclusion

Same shit as before. The time period you choose really matters, no matter how many companies you invest in. Knowing what state the economy is in is critical if you want to make money. These 6-month time periods really show you the seasons of the economy, at least for big tech.

Time period 5 seems to be the worst time period to invest in. The economy was shit at this time. None of the 1000 random combinations of 25 companies, drawn from the list of the 50 biggest tech companies, made money during this period. Here is a histogram:

# Profit distribution of every run in time period 5
hist(data[data$TimePeriod == 5, "Profit"], col = "red", xlab="Profit", main = "Time Period 5", xlim = c(-3000, 0))

data[(data$TimePeriod == 5 & data$Profit >= 0), ]
## [1] Profit     TimePeriod
## <0 rows> (or 0-length row.names)
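Strictly, 1000 random draws cover only a tiny fraction of the choose(50, 25), roughly 1.26 × 10^14, possible portfolios, so they cannot by themselves prove that no combination makes money. The claim can be checked exactly, though: with equal weights, the best 25-stock portfolio simply holds the 25 stocks with the largest multipliers, so if that portfolio loses money, every combination does. A sketch, with `best_possible_profit()` a hypothetical helper and made-up multipliers; on the real data it would be called as `best_possible_profit(tp_data$Multiplier, 25)` after loading `data/time_period.2/5.csv`:

```r
# Best possible equal-weight portfolio of k stocks drawn without replacement
# from `multipliers`: it holds the k largest multipliers
best_possible_profit <- function(multipliers, k, capital = 10000) {
  top_k <- sort(multipliers, decreasing = TRUE)[seq_len(k)]
  capital / k * sum(top_k) - capital
}

# Hypothetical multipliers for a bad period (not the real data)
bad_period <- c(0.9, 0.85, 0.8, 0.95, 0.7, 0.6, 0.75)

# Negative result: no 3-stock combination from this list makes money
best_possible_profit(bad_period, k = 3)
```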

Final Conclusion

Just put your money in the S&P 500 and forget about it, unless you can somehow predict economic cycles. One thing to take away here, though, is that choosing stocks yourself did better than picking stocks randomly.

My Stuff

The script I used to fetch stock data from the Alpaca API.

# spread_generator.py
import csv
import os
from datetime import datetime

import requests

BARS_URL = "https://data.alpaca.markets/v2/stocks/bars"


def is_stock(line):
    # Skip blank lines and lines containing a "#" comment
    stripped = line.strip()
    return bool(stripped) and "#" not in stripped


def fetch_bars(symbols, date, headers):
    # Daily bars for every symbol on a single date, keyed by symbol
    params = {
        "symbols": ",".join(symbols),
        "timeframe": "1Day",
        "start": date,
        "end": date,
        "limit": 1000,
        "adjustment": "all",
        "feed": "sip",
        "sort": "asc",
    }
    response = requests.get(BARS_URL, headers=headers, params=params)
    response.raise_for_status()
    return response.json().get("bars") or {}


def main():
    stock_file = input("Input txt file (stocks.txt): ").strip()
    output_file = input("Output csv file (stocks.csv): ").strip()

    with open(stock_file, "r", encoding="utf-8") as f:
        stocks = [line.strip() for line in f if is_stock(line)]

    old_date = datetime(2022, 2, 15).strftime("%Y-%m-%d")
    curr_date = datetime(2022, 8, 15).strftime("%Y-%m-%d")

    # API keys are read from environment variables instead of being hard-coded
    headers = {
        "accept": "application/json",
        "APCA-API-KEY-ID": os.environ["APCA_API_KEY_ID"],
        "APCA-API-SECRET-KEY": os.environ["APCA_API_SECRET_KEY"],
    }

    future_prices = fetch_bars(stocks, curr_date, headers)
    old_prices = fetch_bars(stocks, old_date, headers)

    data_csv = [("Symbol", "Multiplier")]

    # Multiplier = opening price on curr_date / opening price on old_date
    for symbol in stocks:
        if symbol not in future_prices:
            print(f"Symbol {symbol} has no data on {curr_date} in the Alpaca API")
            continue
        if symbol not in old_prices:
            print(f"Symbol {symbol} did not exist on {old_date}")
            continue
        future_open_price = future_prices[symbol][0]["o"]
        old_open_price = old_prices[symbol][0]["o"]
        data_csv.append((symbol, future_open_price / old_open_price))

    with open(output_file, "w", newline="") as file:
        csv.writer(file).writerows(data_csv)


if __name__ == "__main__":
    main()