The goal of this project is to find out whether a long-only strategy works on a series of S&P 500 returns.

Data Overview

options(warn=-1)
library(ggplot2)
data = read.csv("https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/Ecdat/SP500.csv")
head(data)
##   X       r500
## 1 1 -0.0117265
## 2 2  0.0024544
## 3 3  0.0110516
## 4 4  0.0190512
## 5 5 -0.0055657
## 6 6 -0.0043148

Exploratory Data Analysis

summary(data$r500)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -0.2280000 -0.0048450  0.0005357  0.0004181  0.0057660  0.0870900
paste('Sd', sd(data$r500), sep= ' ')
## [1] "Sd 0.0108629600701567"

From the exploratory analysis, the mean return is close to 0, but the largest single-period loss (-22.8%) is much bigger in magnitude than the largest gain (8.7%). On the other hand, since the median and mean are both above 0, that extreme loss may be an outlier. So the long-only strategy may work.
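One way to check how much a single extreme value drives the summary statistics is to recompute them with that value removed. The sketch below is illustrative only: the vector `r` is an assumed eight-value sample built from the six returns printed by `head(data)` plus the minimum and maximum reported by `summary()`, so the extreme value weighs far more here than it would in the full series.

```r
# Illustrative subset only: six returns from head(data) plus the minimum and
# maximum from summary(). This is NOT the full dataset.
r <- c(-0.0117265, 0.0024544, 0.0110516, 0.0190512,
       -0.0055657, -0.0043148, -0.2280000, 0.0870900)

# Mean and median with every observation included
mean_all   <- mean(r)
median_all <- median(r)

# Same statistics after dropping the single most negative return
r_trimmed      <- r[-which.min(r)]
mean_trimmed   <- mean(r_trimmed)
median_trimmed <- median(r_trimmed)

print(c(mean_all = mean_all, mean_trimmed = mean_trimmed))
print(c(median_all = median_all, median_trimmed = median_trimmed))
```

If the mean moves a lot while the median barely changes, the extreme loss is behaving like an outlier rather than a typical observation. Running the same comparison on `data$r500` would be the natural next step.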

Add percent return column and direction column

# Format each return as a percentage string, e.g. "-1.17%"
data$precent_return = paste0(round(data$r500*100, 2), '%')
# Label strictly positive returns 'up'; zero and negative returns 'down'
data$direction = ifelse(data$r500 > 0, 'up', 'down')
colnames(data)[2] <- 'value_return'
head(data)
##   X value_return precent_return direction
## 1 1   -0.0117265         -1.17%      down
## 2 2    0.0024544          0.25%        up
## 3 3    0.0110516          1.11%        up
## 4 4    0.0190512          1.91%        up
## 5 5   -0.0055657         -0.56%      down
## 6 6   -0.0043148         -0.43%      down

Plot percent return distribution and box plot view

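# Convert returns to percent and name the column so aes() can find it
precent_return <- data.frame(precent_return = data[[2]]*100)
# Linear x axis: scale_x_sqrt() is undefined for negative returns and would drop them
rp=ggplot(precent_return, aes(x=precent_return)) + geom_histogram(binwidth = 0.1, color="white", fill=rgb(0.2,0.7,0.6,0.4))
rp

par(mfrow=c(1,1))
rb=boxplot(precent_return[[1]], ylab="Return (%)")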

The box plot indicates that the -22.8% decrease is indeed an outlier.
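The points a box plot draws individually are the values lying more than 1.5 times the hinge spread beyond the box. Base R exposes the same rule through `boxplot.stats()`, so the flagged values can be listed directly. This is a minimal sketch on an assumed sample (`r_pct`, the six returns from `head(data)` plus the reported minimum and maximum, in percent), not on the full dataset.

```r
# Illustrative subset only, in percent. This is NOT the full dataset.
r_pct <- c(-1.17265, 0.24544, 1.10516, 1.90512,
           -0.55657, -0.43148, -22.8, 8.709)

# boxplot.stats() applies the rule boxplot() uses for its whiskers:
# points beyond coef (default 1.5) times the hinge spread are reported in $out
stats    <- boxplot.stats(r_pct)
outliers <- stats$out

print(stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(outliers)     # values flagged as outliers
```

On the real data, `boxplot.stats(data$value_return * 100)$out` should list every point the plot shows beyond the whiskers, which makes it easy to confirm whether the -22.8% return is among them.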

Compare number of increases vs decreases

# direction is categorical, so count each level with a bar chart
rd=ggplot(data, aes(x=direction)) + geom_bar(color="white", fill=rgb(0.2,0.7,0.6,0.4))
rd

Since there are more increases than decreases, it still makes sense to use a long-only strategy.
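The comparison can also be made numerically instead of by reading the chart. The sketch below counts the two directions with `table()` and computes the share of up periods. The vector `r` is an assumed sample holding only the six returns printed by `head(data)`, so its counts say nothing about the full series.

```r
# Illustrative subset only: the six returns shown by head(data).
r <- c(-0.0117265, 0.0024544, 0.0110516, 0.0190512, -0.0055657, -0.0043148)

# Same labelling rule as the document: strictly positive is 'up',
# everything else (including an exact zero) is 'down'
direction <- ifelse(r > 0, "up", "down")

counts   <- table(direction)  # number of periods in each direction
share_up <- mean(r > 0)       # fraction of periods with a positive return

print(counts)
print(share_up)
```

A count of up periods ignores the size of each move, so a majority of increases is suggestive rather than conclusive. The compounded balance is what actually settles whether the strategy made money.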

Construct long-only strategy with an initial investment of $100

data$long =  cumprod((1+data$value_return))*100
print(paste("With initial investment of $100 and long-only strategy, the financial balance is $", data$long[length(data$long)], sep =''))
## [1] "With initial investment of $100 and long-only strategy, the financial balance is $270.274128721932"
cp=ggplot(data, aes(X, long))+geom_line()+xlab("Time")+ylab("Balance ($)")
cp

So according to the result, the long-only strategy works for this data set: the $100 investment grows to about $270.27.
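The final balance shows where the strategy ends up but not how rough the path was. A possible complement, sketched here under assumptions, is to track the balance against its running peak and report the largest peak-to-trough decline. The vector `r` is again an assumed six-value sample from `head(data)`, and the names `running_peak`, `drawdown` and `max_drawdown` are introduced for illustration only.

```r
# Illustrative subset only: the six returns shown by head(data).
r <- c(-0.0117265, 0.0024544, 0.0110516, 0.0190512, -0.0055657, -0.0043148)

# Balance of a $100 long-only position, compounded period by period
long <- 100 * cumprod(1 + r)

# Highest balance reached so far, and the fractional decline from that peak
running_peak <- cummax(long)
drawdown     <- 1 - long / running_peak

max_drawdown  <- max(drawdown)       # worst peak-to-trough decline
final_balance <- long[length(long)]  # balance after the last period

print(final_balance)
print(max_drawdown)
```

Applied to `data$long`, the same three lines would show how far the balance fell from its peak around the -22.8% return, which is useful context next to the final figure.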
