Setting up

Load the data and the required libraries, then check whether there are any missing values.

stock <- readRDS("stocks.RDS")
private <- readRDS("private_equity_activity.RDS")
company <- readRDS("stock_company_info.RDS")

library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

sum(is.na(stock))
[1] 0
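
The total above covers only `stock` and gives one grand count. A minimal sketch of a per-column check follows; the data frame `toy` and its values are made up for illustration and are not part of the data set.

```r
# Toy data frame standing in for one of the loaded tables (values are made up).
toy <- data.frame(
  ticker = c("aaa", "aaa", "bbb"),
  open   = c(10, NA, 5),
  close  = c(11, 10.5, NA)
)

# Count missing values per column rather than as one grand total.
na_per_column <- colSums(is.na(toy))
na_per_column
```

The same call could be applied to `stock`, `private` and `company` in turn.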

Question 1.1

Which stocks had the longest net daily streak upwards/downwards (consecutive positive or negative growth days)?

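Solution: For each ticker we code every daily change as 1 if the close is above the open, 0 if there is no change, and -1 if the close is below the open. We then find the longest run with the built-in rle() function. Note that the maximum is taken over all runs, so a run of unchanged days (0) is counted as well as upward and downward runs.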

tickers <- unique(stock$ticker)
answer <- c()
for (value in tickers) {
  f <- filter(stock, ticker == value)
  # 1 = close above open, 0 = unchanged, -1 = close below open
  vec <- ifelse(f$close > f$open, 1, ifelse(f$close == f$open, 0, -1))
  # lengths of consecutive runs of equal codes
  rle.out <- rle(vec)
  ans <- max(rle.out$lengths)
  answer <- c(answer, ans)
}
max(answer) 
[1] 966
which.max(answer)
[1] 621
tickers[621]
[1] "cats"
filter(company, ticker == "cats")
# A tibble: 1 x 7
  ticker company_name  state zip   sector     industry                employees
  <chr>  <chr>         <chr> <chr> <chr>      <chr>                       <dbl>
1 cats   Catasys, Inc. CA    90404 Healthcare Medical Care Facilities       395
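
Since the run-length approach counts runs of any code, it may help to see upward and downward streaks separated from flat runs. The following is a sketch only: `longest_streak` and `toy_change` are hypothetical names with made-up values, not part of the analysis above.

```r
# Longest run of a given sign (+1 for up days, -1 for down days) in a vector
# of daily changes; flat days (0) break a streak and are never counted.
longest_streak <- function(change, direction) {
  runs <- rle(sign(change))
  hits <- runs$lengths[which(runs$values == direction)]
  if (length(hits) == 0) 0L else max(hits)
}

# Made-up daily changes (close - open) for one hypothetical ticker:
# 2 up days, 5 flat days, 3 down days, 1 up day.
toy_change <- c(0.5, 0.2, 0, 0, 0, 0, 0, -0.1, -0.3, -0.2, 0.1)

longest_streak(toy_change, 1)        # longest upward streak
longest_streak(toy_change, -1)       # longest downward streak
max(rle(sign(toy_change))$lengths)   # longest run of any kind, flat days included
```

In this toy vector the longest run of any kind is the flat one, which is why the direction-specific counts can differ from a plain maximum over all runs.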

Question 1.2

What was the worst month across all stocks?

Solution: One measure of a stock's performance during a month is the change in its value over that month, i.e. the sum of its daily changes in the month. Averaging this over all stocks shows which month was best or worst. Each calendar month is pooled across all years in the data.

To do this, we first add some useful columns to the stock data frame: “year”, “month”, “day”, “daynum” and “change” (closing minus opening price of the day).

stock1 <- stock %>%
  mutate(year = year(date)) %>%
  mutate(month = month(date)) %>%
  mutate(day = day(date)) %>%
  mutate(daynum = as.numeric(date)) %>%
  mutate(change = close - open)
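answer <- c()
for (i in 1:12) {
  # all rows for calendar month i, pooled across years
  f <- filter(stock1, month == i)
  df <- group_by(f, ticker)
  # total change per ticker within this month
  per_ticker <- summarise(df, monthchange = sum(change))
  # average of the per-ticker totals
  ans <- mean(per_ticker$monthchange)
  answer <- c(answer, ans)
}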
which.min(answer)
[1] 5
which.max(answer)
[1] 8

So the worst month is May (5th month) and the best is August (8th month).
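
The two-step calculation (total change per ticker within a month, then the average of those totals across tickers) can also be sketched without a loop. This is an illustrative base R version on made-up rows; `toy`, `per_ticker` and `per_month` are hypothetical names and the numbers are not from the data set.

```r
# Made-up rows: two hypothetical tickers, a few trading days in two months.
toy <- data.frame(
  ticker = c("aaa", "aaa", "aaa", "bbb", "bbb", "bbb"),
  month  = c(5, 5, 8, 5, 8, 8),
  change = c(-1.0, -0.5, 2.0, -0.5, 1.0, 0.5)
)

# Step 1: total change per ticker and month.
per_ticker <- aggregate(change ~ ticker + month, data = toy, FUN = sum)

# Step 2: average those totals over tickers within each month.
per_month <- aggregate(change ~ month, data = per_ticker, FUN = mean)

per_month$month[which.min(per_month$change)]  # worst month in the toy data
per_month$month[which.max(per_month$change)]  # best month in the toy data
```

As in the loop, a ticker with no rows in a given month simply does not enter that month's average.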

Question 1.3

Which airline’s stock had the worst average performance for 2015?

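Solution: The measure of annual performance I take is the sum of daily changes (close - open) over the year. This approximates the year's closing price minus its opening price, up to any price moves between one day's close and the next day's open. It also equals the sum of the monthly changes, which is 12 times the average monthly change, so ranking airlines by the annual sum is the same as ranking them by average monthly performance.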

tickers <- filter(company, industry == "Airlines")$ticker
answer <- c()
for (i in tickers) {
  # year is numeric, so compare with a number rather than a string
  f <- stock1 %>% filter(ticker == i) %>% filter(year == 2015)
  ans <- sum(f$change)
  answer <- c(answer, ans)
}
answer
[1] -15.194  -4.660   5.469   0.000
tickers
[1] "aal"  "algt" "alk"  "azul"
filter(company, industry == "Airlines")
# A tibble: 4 x 7
  ticker company_name                state zip     sector     industry employees
  <chr>  <chr>                       <chr> <chr>   <chr>      <chr>        <dbl>
1 aal    American Airlines Group In~ TX    76155   Industria~ Airlines    133700
2 algt   Allegiant Travel Company    NV    89144   Industria~ Airlines      4029
3 alk    Alaska Air Group, Inc.      WA    98188   Industria~ Airlines     24134
4 azul   Azul S.A.                   <NA>  Tambore Industria~ Airlines     13189

We can see from the two vectors, answer and tickers, that “aal” (American Airlines) fell the most in 2015 and “alk” (Alaska Air Group) rose the most in 2015. There are four airline stocks in the data.
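
The annual measure used here sums within-day changes (close - open). The sketch below, on three made-up trading days, shows how that sum relates to the last close minus the first open: the two differ by the overnight moves between one close and the next open. The data frame `toy` and all prices are hypothetical.

```r
# Three made-up trading days for one hypothetical stock. Each day opens at a
# different price from the previous close, i.e. there are overnight gaps.
toy <- data.frame(
  open  = c(10.0, 11.5, 11.0),
  close = c(11.0, 11.2, 12.0)
)

# Measure used in the report: sum of within-day changes.
sum_daily <- sum(toy$close - toy$open)

# Last close minus first open over the same period.
period_change <- toy$close[nrow(toy)] - toy$open[1]

# Sum of overnight gaps: each open minus the previous day's close.
overnight <- sum(toy$open[-1] - toy$close[-nrow(toy)])

c(sum_daily = sum_daily, period_change = period_change, overnight = overnight)
```

Here `sum_daily + overnight` equals `period_change`, so the two measures agree only when the overnight gaps net to zero.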

Question 1.4

After finding the best and worst airline performance in the last question, plot the following.

stock2 <- stock1 %>% filter(year > 2013, year < 2018) %>% filter(ticker %in% tickers)

# 16071 is as.numeric(as.Date("2014-01-01")), so x counts days since 1 Jan 2014
stock2 %>% ggplot(aes(x = daynum - 16071, y = close, group = ticker, color = ticker)) + geom_line()

In the graph, 2015 corresponds to roughly 350 to 750 on the x-axis. Over that period the red line (aal) was indeed falling the most, while the blue line (alk) rose steadily. The green line (algt) fluctuated quite a bit, with sharp rises and sharp falls, but overall it fell in 2015. The purple line (azul) is zero except in 2017.
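
The x-axis positions can be checked directly. This small sketch assumes the `date` column is of class Date, so that as.numeric() returns days since 1970-01-01; `origin` is a name introduced here for illustration.

```r
# as.numeric() on a Date gives the number of days since 1970-01-01.
origin <- as.numeric(as.Date("2014-01-01"))
origin  # the constant subtracted from daynum in the plot

# Positions of the first and last day of 2015 on the shifted x-axis.
year_2015 <- as.numeric(as.Date(c("2015-01-01", "2015-12-31"))) - origin
year_2015
```

On the shifted axis, 2015 therefore runs from 365 to 729, which matches the approximate range read off the plot.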

Questions 2 and 3

The above took over four hours. For Questions 2 and 3, I can only provide common-sense responses (which may lead to good models). I have started learning ML models in the second year of MStat, so I am not yet well qualified to answer these except by common sense. Thank you.

End of report