I read in “The Psychology of Money” that a semi-successful VC ends up with roughly these outcomes across its portfolio: 0.5% of companies return about 50x, 1% return around 20x, 2.5% return around 10x, 31% break even, and a staggering 65% lose half of the money invested in them. Let's play around with these theoretical parameters, assuming we could invest with this distribution of outcomes. Only 4% of the companies actually make money! The rest either break even or lose.
# Outcome distribution of a VC portfolio (from The Psychology of Money)
theoretical_spread <- data.frame(
"Percent of Companies" = c(0.005, 0.01, 0.025, 0.31, 0.65),
"Multiplier" = c(50, 20, 10, 1, 0.5)
)
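# Compare with a tolerance: exact == on floating-point sums is unreliable
if (!isTRUE(all.equal(sum(theoretical_spread$Percent.of.Companies), 1))) {
  print("Percentages must add up to 100%")
}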
theoretical_spread
## Percent.of.Companies Multiplier
## 1 0.005 50.0
## 2 0.010 20.0
## 3 0.025 10.0
## 4 0.310 1.0
## 5 0.650 0.5
We assume that a company that loses money always loses exactly 50% of the money invested in it. The reasoning is that we sell as soon as its value dips below 50%, like a stop-loss.
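In code, this idealized stop-loss is just a floor on each company's multiplier. A minimal sketch; the raw multipliers are made-up values and `apply_stop_loss()` is a hypothetical helper, not something used elsewhere in this analysis:

```r
# Made-up raw multipliers (final value / amount invested) for five companies
raw_multipliers <- c(3.2, 1.0, 0.7, 0.3, 0.05)

# Hypothetical helper: with a stop-loss at half the invested amount,
# no company can end up below 0.5x
apply_stop_loss <- function(multipliers, floor = 0.5) {
  pmax(multipliers, floor)
}

apply_stop_loss(raw_multipliers)
# [1] 3.2 1.0 0.7 0.5 0.5
```

This is an idealization: a real stop-loss order can fill below its trigger price when a stock gaps down, so actual losses could be larger than 50%.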
initial_capital <- 10000
num_companies <- 100
inv_per_company <- initial_capital / num_companies
# Money is made from the top 4%
percent_that_makes_money <- sum(theoretical_spread[theoretical_spread$Multiplier > 1, ]$Percent.of.Companies)
money <- ((theoretical_spread[, 1] * num_companies) * inv_per_company) * theoretical_spread[, 2]
data <- cbind(theoretical_spread, money)
data
## Percent.of.Companies Multiplier money
## 1 0.005 50.0 2500
## 2 0.010 20.0 2000
## 3 0.025 10.0 2500
## 4 0.310 1.0 3100
## 5 0.650 0.5 3250
money_generated <- sum(data$money)
cat("Only the top ", percent_that_makes_money * 100, "% of companies generate a profit; the rest either break even or fail. Even with this low probability, the profit is $", money_generated - initial_capital, " on an initial capital of $", initial_capital, sep = "")
## Only the top 4% of companies generate a profit; the rest either break even or fail. Even with this low probability, the profit is $3350 on an initial capital of $10000
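The $3350 is the expected value of this distribution applied to the whole capital: \(E[M] = \sum_i p_i m_i = 1.335\). The table above treats the percentages literally (for example, half a company, 0.005 × 100, returning 50x), so it only holds on average. The simulation that follows draws whole companies instead. A small sketch of the expected-value calculation:

```r
# Probabilities and multipliers from the theoretical VC spread above
p <- c(0.005, 0.01, 0.025, 0.31, 0.65)
m <- c(50, 20, 10, 1, 0.5)

# Expected multiplier: E[M] = sum(p_i * m_i)
expected_multiplier <- sum(p * m)

# Expected profit when the whole capital is spread over many companies
capital <- 10000
expected_profit <- capital * (expected_multiplier - 1)

expected_multiplier
expected_profit
```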
INITIAL_CAPITAL <- 10000
# Simulated profit for one portfolio: each company's multiplier is drawn at random
# using the probabilities from theoretical_spread
capital_gain_theoretical <- function(number_of_companies) {
  inv_per_company <- INITIAL_CAPITAL / number_of_companies
  # Each company's return as a multiplier (not named `return`, which would shadow the base function)
  multipliers <- sample(theoretical_spread$Multiplier, number_of_companies, replace = TRUE, prob = theoretical_spread$Percent.of.Companies)
  money_generated_from_each_company <- inv_per_company * multipliers
  money_generated <- sum(money_generated_from_each_company)
  profit <- money_generated - INITIAL_CAPITAL
  profit
}
RUNS <- 1000
results <- data.frame(
  "5 Companies" = numeric(RUNS),
  "10 Companies" = numeric(RUNS),
  "20 Companies" = numeric(RUNS),
  "50 Companies" = numeric(RUNS),
  "100 Companies" = numeric(RUNS),
  "200 Companies" = numeric(RUNS)
)
num_companies <- c(5, 10, 20, 50, 100, 200)
for (i in seq_along(num_companies)) {
  for (run in seq_len(RUNS)) {
    results[run, i] <- capital_gain_theoretical(num_companies[i])
  }
}
{function() {
  par(mfrow = c(3, 2))
  hist(results$X5.Companies)
  hist(results$X10.Companies)
  hist(results$X20.Companies)
  hist(results$X50.Companies)
  hist(results$X100.Companies)
  hist(results$X200.Companies)
}}()
# Share of simulation runs whose profit exceeds `profit` (reads the global `results` table)
prob_making_money <- function(profit, column) {
  probability <- sum(results[, column] > profit) / nrow(results)
  cat("Your probability of making more than $", profit, " is ", probability * 100, "%\n", sep = "")
}
prob_making_money(0, "X5.Companies")
## Your probability of making more than $0 is 18.8%
prob_making_money(0, "X10.Companies")
## Your probability of making more than $0 is 33.6%
prob_making_money(0, "X20.Companies")
## Your probability of making more than $0 is 58.4%
prob_making_money(0, "X50.Companies")
## Your probability of making more than $0 is 70.2%
prob_making_money(0, "X100.Companies")
## Your probability of making more than $0 is 78%
prob_making_money(0, "X200.Companies")
## Your probability of making more than $0 is 89.8%
# The 5-company column is left out because its range is so wide that it squashes the other boxes.
par(mfrow=c(1, 1))
boxplot(results[, -1], ylab="Profit", xlab="From 10 to 200 Companies")
Increasing the number of companies lowers the risk of losing a lot of money, but it also caps the upside. With fewer companies, the risk of a big loss is higher, but so is the chance of a very large gain.
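A quick way to see this trade-off, assuming company outcomes are independent draws (as `sample(..., replace = TRUE)` does above): the expected profit is the same for every portfolio size, while the standard deviation of the profit shrinks like \(1/\sqrt{n}\). That shrinking spread is what cuts both the big losses and the big wins. A sketch:

```r
# Theoretical VC spread (probabilities and multipliers)
p <- c(0.005, 0.01, 0.025, 0.31, 0.65)
m <- c(50, 20, 10, 1, 0.5)
capital <- 10000

# Mean and standard deviation of a single company's multiplier
mu <- sum(p * m)
sigma <- sqrt(sum(p * (m - mu)^2))

# With the capital split equally over n independent companies, the portfolio
# multiplier is the average of n draws, so its sd is sigma / sqrt(n)
profit_sd <- function(n) capital * sigma / sqrt(n)

n <- c(5, 10, 20, 50, 100, 200)
data.frame(
  companies = n,
  expected_profit = capital * (mu - 1),  # identical for every n
  profit_sd = round(profit_sd(n))        # shrinks as n grows
)
```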
The multipliers and stock data CSV files were generated by my Python script (spread_generator.py), which calls the Alpaca API to fetch NASDAQ price data. The prices come directly from the API. choice.1.csv is a list of NASDAQ stocks that I chose personally because I think they are good companies. Their multipliers were generated by running the Python script over a specific date range, using the open price on the start and end dates. A multiplier is simply the later price divided by the earlier price, e.g. \(NVDA_{mult} = NVDA_{future\ price} / NVDA_{past\ price}\).
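As a sketch of the multiplier calculation, assuming two vectors of open prices keyed by ticker (the prices are made up, not real NASDAQ data):

```r
# Made-up open prices on the earlier and later dates
past_open <- c(NVDA = 100, AMZN = 150, NOC = 400)
future_open <- c(NVDA = 180, AMZN = 120, NOC = 420)

# Multiplier = later price / earlier price, matched by ticker symbol
multipliers <- future_open[names(past_open)] / past_open

# Same two-column layout as the CSV files (Symbol, Multiplier)
spread <- data.frame(Symbol = names(multipliers), Multiplier = unname(multipliers))
spread
```

A multiplier of 1.8 means $1,000 invested in NVDA would have grown to $1,800, while 0.8 means a 20% loss.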
# These are stocks I chose personally
my_spread <- read.csv("data/choice.1.csv", sep=",", header = TRUE)
# Apply the 50% stop-loss: no stock can return less than 0.5x
my_spread[my_spread[, "Multiplier"] < 0.5, "Multiplier"] <- 0.5
head(my_spread)
## Symbol Multiplier
## 1 UBER 1.767338
## 2 AMZN 1.217994
## 3 MSFT 1.271145
## 4 META 1.552486
## 5 NOC 1.054198
## 6 NFLX 1.371103
nrow(my_spread)
## [1] 33
INITIAL_CAPITAL <- 10000
RUNS <- 1000
# When changing this variable make sure it is reflected in the results data frame!
num_companies <- c(1, 5, 10, 15, 20, 25)
# Simulated profit for one portfolio: pick `number_of_companies` distinct stocks at random
# from the list and split the capital equally between them
# `multiplier` is a vector of per-stock multipliers
# Returns: profit
capital_gain_custom_spread <- function(number_of_companies, multiplier = numeric(0)) {
  inv_per_company <- INITIAL_CAPITAL / number_of_companies
  # Unlike the theoretical spread, every stock is equally likely
  # and can be picked at most once (no replacement)
  multipliers <- sample(multiplier, number_of_companies, replace = FALSE)
  money_generated_from_each_company <- inv_per_company * multipliers
  money_generated <- sum(money_generated_from_each_company)
  profit <- money_generated - INITIAL_CAPITAL
  profit
}
results <- data.frame(
  "1 Company" = numeric(RUNS),
  "5 Company" = numeric(RUNS),
  "10 Company" = numeric(RUNS),
  "15 Company" = numeric(RUNS),
  "20 Company" = numeric(RUNS),
  "25 Company" = numeric(RUNS)
)
for (i in seq_along(num_companies)) {
  for (run in seq_len(RUNS)) {
    results[run, i] <- capital_gain_custom_spread(num_companies[i], multiplier = my_spread[, "Multiplier"])
  }
}
{function() {
  par(mfrow = c(3, 2), mar = c(4, 4, 1, 3))
  for (column in colnames(results)) {
    hist(
      results[, column],
      col = "blue",
      ylab = "Attempts",
      xlab = "Profit",
      breaks = "Freedman-Diaconis",
      #breaks = 100,
      xlim = c(min(results[, column], 0), max(results[, column])),
      main = column
    )
    # Mean line
    abline(v = mean(results[, column]), col = "red")
    # Quantile lines
    #abline(v = quantile(results[, column], prob = c(0.25, .50, 0.75)), col = "purple")
  }
}}()
prob_making_money(0, "X20.Company")
## Your probability of making more than $0 is 99.6%
As we increase the number of companies we invest in, we decrease the risk of losing a lot of money; however, we also decrease the money-earning potential. The simulation on real stock data shows the same pattern as the theoretical one.
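The spread shrinks even faster here than with the theoretical spread, because stocks are drawn without replacement from a list of only 33, so two 25-stock portfolios share most of their holdings. Under that sampling scheme the standard deviation of the profit picks up a finite population correction, \(\sigma_{profit} = C\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}\), where \(C\) is the capital, \(\sigma\) the population standard deviation of the \(N\) multipliers, and \(n\) the number of stocks bought. A sketch with made-up multipliers (the real ones are in `data/choice.1.csv`):

```r
# Standard deviation of the profit when n stocks are drawn WITHOUT replacement
# from a fixed list of N multipliers, with the capital split equally.
# The correction sqrt((N - n) / (N - 1)) drives the spread to zero
# when every stock in the list is bought.
profit_sd_without_replacement <- function(multipliers, n, capital = 10000) {
  N <- length(multipliers)
  sigma <- sqrt(mean((multipliers - mean(multipliers))^2))  # population sd
  capital * sigma / sqrt(n) * sqrt((N - n) / (N - 1))
}

# Made-up multipliers for a list of 33 stocks (not the real data)
set.seed(1)
fake_multipliers <- runif(33, min = 0.5, max = 1.8)
sapply(c(1, 5, 10, 15, 20, 25), function(n) profit_sd_without_replacement(fake_multipliers, n))
```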
my_choice_stocks <- read.csv("data/choice.1.csv", header = TRUE, sep=",")
random_stocks <- read.csv("data/random.1.csv", header = TRUE, sep=",")
# The starting capital you are starting with
INITIAL_CAPITAL <- 10000
# The number of runs to perform, more the better
RUNS <- 1000
# Number of companies we are investing in (for random and my choice stocks)
COMPANIES <- 15
capital_gain_custom_spread <- function(number_of_companies, multiplier = numeric(0)) {
  inv_per_company <- INITIAL_CAPITAL / number_of_companies
  # Unlike the theoretical spread, every stock is equally likely
  # and can be picked at most once (no replacement)
  multipliers <- sample(multiplier, number_of_companies, replace = FALSE)
  money_generated_from_each_company <- inv_per_company * multipliers
  money_generated <- sum(money_generated_from_each_company)
  profit <- money_generated - INITIAL_CAPITAL
  profit
}
categories <- c("My Stock", "Random Stock")
num_rows <- length(categories) * RUNS
results <- data.frame(
  "Profit" = numeric(num_rows),
  "Type" = factor(character(num_rows), levels = categories)
)
idx <- 1
# Getting my stock data
for (i in seq_len(RUNS)) {
  results[idx, "Profit"] <- capital_gain_custom_spread(COMPANIES, multiplier = my_choice_stocks[, "Multiplier"])
  results[idx, "Type"] <- categories[1]
  idx <- idx + 1
}
# Getting random stock data
for (i in seq_len(RUNS)) {
  results[idx, "Profit"] <- capital_gain_custom_spread(COMPANIES, multiplier = random_stocks[, "Multiplier"])
  results[idx, "Type"] <- categories[2]
  idx <- idx + 1
}
library(ggplot2)
{function() {
  p <- ggplot(results, aes(x = Profit, fill = Type))
  p + geom_area(stat = "bin", bins = 30) + geom_vline(xintercept = 0, col = "red")
}}()
# Share of runs with a positive profit, per category
# (a new name, so the prob_making_money() function is not overwritten)
prob_by_type <- tapply(results$Profit, results$Type, function(category_data) {
  sum(category_data > 0) / length(category_data)
})
cat(
  "The probability of making more than $0 when I choose the stocks is ",
  prob_by_type["My Stock"] * 100, "%.\n",
  "The probability of making more than $0 with random stocks is ",
  prob_by_type["Random Stock"] * 100, "%.", sep = ""
)
## The probability of making more than $0 when I choose the stocks is 96.1%.
## The probability of making more than $0 with random stocks is 65.1%.
my_stock_results <- results[results$Type == "My Stock", "Profit"]
random_stock_results <- results[results$Type == "Random Stock", "Profit"]
{function() {
  par(mfrow = c(1, 2))
  hist(
    random_stock_results,
    main = "Random Stocks",
    xlab = "Profit"
  )
  hist(
    my_stock_results,
    main = "Me Choosing Stocks",
    xlab = "Profit"
  )
}}()
We will run a one-sided, two-sample z-test comparing the means (\(n = 1000 > 30\) runs per group). The histograms look roughly normal, and with 1000 runs per group the sample means are approximately normal regardless. We want to know whether the mean profit from my picks is significantly greater than the mean profit from random picks. If it is, my stock-picking expertise works and the difference is not just a fluke.
\[ \begin{aligned} H_0 &: \mu_{choice} \leq \mu_{random}\\ H_1 &: \mu_{choice} > \mu_{random} \end{aligned} \]
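The test statistic compares the difference in means with its standard error, \(z = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_x^2/n_x + \sigma_y^2/n_y}}\), and the one-sided p-value is \(P(Z > z)\). A hand-rolled sketch of that calculation, which should agree with `z.test()` when the same sigmas are passed in (the numbers are toy values, not the simulation results):

```r
# Population standard deviation (divides by n), like population_sd() below
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# Two-sample z statistic, treating the sample sds as known sigmas
z_stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(pop_sd(x)^2 / length(x) + pop_sd(y)^2 / length(y))
}

# One-sided p-value for H1: mu_x > mu_y
z_p_value <- function(x, y) pnorm(z_stat(x, y), lower.tail = FALSE)

# Toy example: means 3 and 1, both population sds equal to 1
x <- c(2, 4)
y <- c(0, 2)
z_stat(x, y)     # 2
z_p_value(x, y)  # about 0.0228
```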
population_sd <- function(x) {
  n <- length(x) # Number of elements in the sample
  mean_x <- mean(x) # Mean of the sample
  sqrt(sum((x - mean_x)^2) / n) # Population standard deviation formula
}
library(BSDA) # Provides z.test()
res <- z.test(
  x = my_stock_results,
  y = random_stock_results,
  alternative = "greater",
  sigma.x = population_sd(my_stock_results),
  sigma.y = population_sd(random_stock_results)
)
res
##
## Two-sample z-Test
##
## data: my_stock_results and random_stock_results
## z = 47.328, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 1107.922 NA
## sample estimates:
## mean of x mean of y
## 1261.0354 113.2218
if (res$p.value < 0.05) {
  cat("Since the p-value is below the 0.05 significance level, we reject the null hypothesis, which means that choosing stocks yourself is better than randomly picking stocks.")
} else {
  cat("Since the null hypothesis is not rejected, you may as well invest in random stocks instead of choosing stocks yourself. You suck at trading. :((")
}
## Since the p-value is below the 0.05 significance level, we reject the null hypothesis, which means that choosing stocks yourself is better than randomly picking stocks.
For this test I gathered a list of 15 big tech companies that I believe in. In each time period I randomly choose 7 of them to buy and later sell, and I repeat this for 1000 runs. There are 5 time periods (6 months each), and I test each one, then plot all of them in a stacked area chart. This shows whether I have a good enough chance of making money this way, and later I can compare it to the S&P 500's performance.
time_periods <- 1:5
COMPANIES <- 7
RUNS <- 1000
data <- data.frame("Profit" = numeric(0), "TimePeriod" = factor(numeric(0), levels = time_periods))
idx <- 1
for (tp in time_periods) {
  file_name <- paste("data/time_period/", tp, ".csv", sep = "")
  tp_data <- read.csv(file_name, header = TRUE, sep = ",")
  for (i in seq_len(RUNS)) {
    data[idx, "Profit"] <- capital_gain_custom_spread(COMPANIES, multiplier = tp_data[, "Multiplier"])
    data[idx, "TimePeriod"] <- tp
    idx <- idx + 1
  }
}
Now that the data is loaded into a data frame, we can create a stacked area chart.
{function() {
  p <- ggplot(data, aes(x = Profit, fill = TimePeriod))
  p + geom_area(stat = "bin", bins = 30) + geom_vline(xintercept = 0, col = "red")
}}()
As you can see, the time period really matters. Time period 1 means you bought and sold within the most recent 6-month cycle, time period 2 is the 6 months before that, and so on; each time period lasts 6 months from buying to selling. My conclusion is that the stock market is mostly just luck: when you get in really matters. Investing in more companies minimizes risk, so less money can be lost, but less money can be gained as well. In time period 5 (the oldest 6-month period), every simulated combination of 7 of the 15 big companies lost money.
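The 1000 random draws only cover some of the \(\binom{15}{7} = 6435\) possible 7-stock portfolios. To back the claim for every combination, you could enumerate all of them with `combn()`. A sketch; the multipliers below are invented, and with the real data you would pass the `Multiplier` column of `data/time_period/5.csv`:

```r
# Profit of every equal-weight portfolio of k stocks from a list of multipliers:
# profit = capital * (mean multiplier of the chosen stocks - 1)
all_portfolio_profits <- function(multipliers, k, capital = 10000) {
  capital * (combn(multipliers, k, FUN = mean) - 1)
}

# Made-up multipliers for 15 stocks in a bad period (not the real data)
bad_period <- c(0.62, 0.71, 0.80, 0.55, 0.90, 0.68, 0.77, 1.05,
                0.59, 0.83, 0.74, 0.66, 0.95, 0.70, 0.88)

profits <- all_portfolio_profits(bad_period, 7)
length(profits)   # choose(15, 7) = 6435 portfolios
max(profits) < 0  # TRUE means every possible 7-stock portfolio loses money
```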
Next, I take the 50 biggest tech stocks and test investing in a random 25 of them, again with 1000 runs over the same time periods.
time_periods <- 1:5
COMPANIES <- 25
RUNS <- 1000
data <- data.frame("Profit" = numeric(0), "TimePeriod" = factor(numeric(0), levels = time_periods))
idx <- 1
for (tp in time_periods) {
  file_name <- paste("data/time_period.2/", tp, ".csv", sep = "")
  tp_data <- read.csv(file_name, header = TRUE, sep = ",")
  for (i in seq_len(RUNS)) {
    data[idx, "Profit"] <- capital_gain_custom_spread(COMPANIES, multiplier = tp_data[, "Multiplier"])
    data[idx, "TimePeriod"] <- tp
    idx <- idx + 1
  }
}
Now that the data is loaded into a data frame, we can create a stacked area chart.
{function() {
  p <- ggplot(data, aes(x = Profit, fill = TimePeriod))
  p + geom_area(stat = "bin", bins = 30) + geom_vline(xintercept = 0, col = "red")
}}()
Same shit as before. The time period you choose really matters, no matter how many companies you invest in. Knowing what state the economy is in is critical if you want to make money. These 6-month time periods really show the seasons of the economy, at least for big tech.
Time period 5 seems to be the worst period to invest in. The economy was shit at this time. In none of the 1000 runs could you make money with a portfolio of 25 companies drawn from the list of the 50 biggest tech companies during this period. Here is a histogram:
# Distribution of profits across the 1000 runs in time period 5
hist(data[data$TimePeriod == 5, "Profit"], col = "red", xlab="Profit", main = "Time Period 5", xlim = c(-3000, 0))
data[(data$TimePeriod == 5 & data$Profit >= 0), ]
## [1] Profit TimePeriod
## <0 rows> (or 0-length row.names)
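With 50 stocks and 25 picks there are \(\binom{50}{25} \approx 1.26 \times 10^{14}\) possible portfolios, so 1000 runs cover a tiny fraction of them and enumeration is out of the question. The stronger claim that no combination at all makes money can still be checked exactly. With equal weights, the best possible portfolio is simply the 25 stocks with the highest multipliers. If even that one loses money, every portfolio does. A sketch with made-up multipliers; with the real data you would pass `read.csv("data/time_period.2/5.csv")$Multiplier`:

```r
# Best possible profit from any equal-weight portfolio of k stocks:
# the mean multiplier is maximized by taking the k largest multipliers
best_case_profit <- function(multipliers, k, capital = 10000) {
  top_k <- sort(multipliers, decreasing = TRUE)[seq_len(k)]
  capital * (mean(top_k) - 1)
}

# Made-up example: 50 multipliers spread evenly between 0.6 and 1.1
example_multipliers <- seq(0.6, 1.1, length.out = 50)
choose(50, 25)                             # far too many portfolios to enumerate
best_case_profit(example_multipliers, 25)  # negative => no 25-stock portfolio makes money
```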
Just put your money in the S&P 500 and forget about it, unless you can somehow predict economic cycles. One thing to take away, though, is that choosing stocks yourself did better than randomly picking stocks.
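To make the S&P 500 comparison concrete, one option is to count how many simulated runs beat simply holding the index over the same period. A sketch, assuming you have the index's multiplier for that period (`index_multiplier` is a hypothetical input, and the profits are toy numbers):

```r
# Fraction of simulated runs whose profit beats holding the index instead.
# index_multiplier (hypothetical input) = index's later price / earlier price
prob_beat_index <- function(profits, index_multiplier, capital = 10000) {
  index_profit <- capital * (index_multiplier - 1)
  mean(profits > index_profit)
}

# Toy example: made-up profits and a made-up index return of +5% (+$500)
toy_profits <- c(-800, -200, 300, 700, 1200)
prob_beat_index(toy_profits, index_multiplier = 1.05)  # 2 of 5 runs beat the index
```

With the simulation above, `profits` would be, for example, `data[data$TimePeriod == 1, "Profit"]`.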
The script I used to fetch the stock data from the Alpaca API:
# spread_generator.py
import csv
import json
from datetime import datetime

import requests


def main():
    stock_file = input("Input txt file (stocks.txt): ").strip()
    output_file = input("Output csv file (stocks.csv): ").strip()

    def is_stock(stock_symbol):
        # Skip comment lines and blank (or whitespace-only) lines
        if "#" in stock_symbol or not stock_symbol.strip():
            return False
        return True

    stocks = []
    with open(stock_file, "r", encoding="utf-8") as f:
        raw = f.readlines()
    for data in raw:
        if is_stock(data):
            stocks.append(data.strip())

    # Comma-separated symbol list, URL-encoded (%2C is ",")
    stocks_query = "%2C".join(stocks)
    # The multiplier compares the price on curr_date with the price on old_date
    old_date = datetime(2022, 2, 15).strftime("%Y-%m-%d")
    curr_date = datetime(2022, 8, 15).strftime("%Y-%m-%d")
    future_prices_url = f"https://data.alpaca.markets/v2/stocks/bars?symbols={stocks_query}&timeframe=1Day&start={curr_date}&end={curr_date}&limit=1000&adjustment=all&feed=sip&sort=asc"
    old_prices_url = f"https://data.alpaca.markets/v2/stocks/bars?symbols={stocks_query}&timeframe=1Day&start={old_date}&end={old_date}&limit=1000&adjustment=all&feed=sip&sort=asc"
    headers = {
        "accept": "application/json",
        "APCA-API-KEY-ID": "XXX",
        "APCA-API-SECRET-KEY": "XXXXXX"
    }

    # Fail loudly on HTTP errors (e.g. bad API keys) instead of on a missing "bars" key
    response = requests.get(future_prices_url, headers=headers)
    response.raise_for_status()
    # "bars" maps each symbol to its list of daily bars; use {} if the day had no data
    future_prices = json.loads(response.text).get("bars") or {}
    response = requests.get(old_prices_url, headers=headers)
    response.raise_for_status()
    old_prices = json.loads(response.text).get("bars") or {}

    data_csv = [("Symbol", "Multiplier")]
    # Goes by the open price
    for symbol in stocks:
        if symbol not in future_prices:
            print(f"Symbol {symbol} has no data on {curr_date} in the Alpaca API")
            continue
        elif symbol not in old_prices:
            print(f"Symbol {symbol} did not exist on {old_date}")
            continue
        future_open_price = future_prices[symbol][0]["o"]
        old_open_price = old_prices[symbol][0]["o"]
        multiplier = future_open_price / old_open_price
        data_csv.append((symbol, multiplier))

    with open(output_file, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerows(data_csv)


if __name__ == "__main__":
    main()