This project is purposed to give investors some stock insights about S&P Stock Index.
The dataset that I analyze is stock index S&P 500 from 2013 to 2018. It contains 505 stocks in 6 columns:
source: kaggle
## date open high low close volume Name
## 1 2013-02-08 15.07 15.12 14.63 14.75 8407500 AAL
## 2 2013-02-11 14.89 15.01 14.26 14.46 8882000 AAL
## 3 2013-02-12 14.45 14.51 14.10 14.27 8126000 AAL
## 4 2013-02-13 14.30 14.94 14.25 14.66 10259500 AAL
## 5 2013-02-14 14.94 14.96 13.16 13.99 31879900 AAL
## 6 2013-02-15 13.93 14.61 13.93 14.50 15628000 AAL
I will create a copy of imported dataframe, so If I did something wrong I can revert the dataframe back without re-import dataframe again.
stock = data.frame(stock_old)## 'data.frame': 619040 obs. of 7 variables:
## $ date : chr "2013-02-08" "2013-02-11" "2013-02-12" "2013-02-13" ...
## $ open : num 15.1 14.9 14.4 14.3 14.9 ...
## $ high : num 15.1 15 14.5 14.9 15 ...
## $ low : num 14.6 14.3 14.1 14.2 13.2 ...
## $ close : num 14.8 14.5 14.3 14.7 14 ...
## $ volume: int 8407500 8882000 8126000 10259500 31879900 15628000 11354400 14725200 11922100 6071400 ...
## $ Name : chr "AAL" "AAL" "AAL" "AAL" ...
as we can see there are, 2 columns with inappropriate data type: date and name.
so I will change data type of column date to date and column name to factor
## 'data.frame': 619040 obs. of 7 variables:
## $ date : Date, format: "2013-02-08" "2013-02-11" ...
## $ open : num 15.1 14.9 14.4 14.3 14.9 ...
## $ high : num 15.1 15 14.5 14.9 15 ...
## $ low : num 14.6 14.3 14.1 14.2 13.2 ...
## $ close : num 14.8 14.5 14.3 14.7 14 ...
## $ volume: int 8407500 8882000 8126000 10259500 31879900 15628000 11354400 14725200 11922100 6071400 ...
## $ Name : Factor w/ 505 levels "A","AAL","AAP",..: 2 2 2 2 2 2 2 2 2 2 ...
lubridate::wday(stock$date, label = T)[1:7]## [1] Fri Mon Tue Wed Thu Fri Tue
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
in order to make line graph, the date must be available sequentially but as we can see there are no date on Saturday and Sunday
so I decide to make a date range then fill the value with the previous value
in order to make the date range, we need to get the min and max date value
## [1] "min date: 2013-02-08"
## [1] "max date: 2018-02-07"
## date open high low close volume Name
## 0 303101 303098 303098 303090 303090 0
NA_df <- stock[!complete.cases(stock),]
head(NA_df)## date open high low close volume Name
## 2 2013-02-09 NA NA NA NA NA A
## 3 2013-02-10 NA NA NA NA NA A
## 9 2013-02-16 NA NA NA NA NA A
## 10 2013-02-17 NA NA NA NA NA A
## 11 2013-02-18 NA NA NA NA NA A
## 16 2013-02-23 NA NA NA NA NA A
unique(weekdays(NA_df[1:20, "date"]))## [1] "Saturday" "Sunday" "Monday" "Friday"
Turns out after I did a null data inspection, I got 2 problems on the dataframe: 1. After fill missing date, the value of open, high, low, and close columns for the corresponding added date is NA. 2. There are some value is missing on weekdays, this might be because on these days, there are some special events, for ex: international holiday.
After the inspection, I decided to use fill function to fill NA stock prices with price the day before
stock <- stock %>%
fill(open, high, low, close, volume, .direction="down")anyNA(stock)## [1] FALSE
There are no NA value on dataframe, so it ready to be analyzed
## date open high low
## Min. :2013-02-08 Min. : 1.62 Min. : 1.69 Min. : 1.50
## 1st Qu.:2014-05-10 1st Qu.: 40.25 1st Qu.: 40.64 1st Qu.: 39.86
## Median :2015-08-09 Median : 62.78 Median : 63.34 Median : 62.21
## Mean :2015-08-09 Mean : 82.91 Mean : 83.69 Mean : 82.15
## 3rd Qu.:2016-11-08 3rd Qu.: 94.17 3rd Qu.: 94.96 3rd Qu.: 93.36
## Max. :2018-02-07 Max. :2044.00 Max. :2067.99 Max. :2035.11
##
## close volume Name
## Min. : 1.59 Min. : 0 A : 1826
## 1st Qu.: 40.27 1st Qu.: 1069482 AAL : 1826
## Median : 62.81 Median : 2087317 AAP : 1826
## Mean : 82.94 Mean : 4390521 AAPL : 1826
## 3rd Qu.: 94.22 3rd Qu.: 4364796 ABBV : 1826
## Max. :2049.00 Max. :618237630 ABC : 1826
## (Other):911174
Insight:
This stock data is collected from February, 8th 2013 util February, 7th 2018
If we compare median and mean of all stock price (open, high, low, and close) respectively. It seems the value is pretty equal, this is because we generate statistic descriptive from all stocks.
Based on my research on investopedia.com, S&P 500 is one of most-widely watched index globally. But as we can see there’s one or some stock that have no volume transaction
The value of 1st quarter is increased over time to 3rd quarter across all stock prices
In order to give investors insight about this index, I will make several business questions:
## # A tibble: 5 × 2
## Name price_increase
## <fct> <chr>
## 1 DE $99.71
## 2 COST $99.46
## 3 PNC $98.39
## 4 IBM $97.95
## 5 CELG $97.14
Insight:
As we can see, top 5 stocks in price growth are: - Deere & Company (DE) - Costco Wholesale Corporation (COST) - Financial Services Group Inc (PNC) - IBM - Celgene Corporation (CELG)
## # A tibble: 6 × 3
## # Groups: year [6]
## year Name volume_sum
## <dbl> <fct> <chr>
## 1 2013 XLNX $997,577,275
## 2 2014 KSS $995,961,962
## 3 2015 CMS $998,214,711
## 4 2016 AET $994,782,004
## 5 2017 DISH $996,217,505
## 6 2018 AMT $99,070,512
Insight:
The analysis result says that, Bank of America Corp (BAC) is the most stock traded in terms of volume. 5 times in a row from 2013 - 2017. But in 2018, General Electric Co (GE) took over that record
Cited from investopedia.com Momentum is one of indicator to measure rice or fall in stock price. This article also tell us that bull market (increasing stock price) is tend to last longer than bear market (falling stock price). This metric will help traders to decide weather they want to buy a stock or not using momentum.
The formula for momentum is:
\(V - V{x}\)
where: \(V\) = latest price \(V{x}\) = closing price \(x\) = number of days ago
src: https://www.investopedia.com/articles/technical/081501.asp#:~:text=Market%20momentum%20is%20measured%20by,plotted%20around%20a%20zero%20line.
last_date <- stock %>%
slice_tail %>%
select(date)
last_date## date
## 1 2018-02-07
closing_price_10_days_ago <- stock %>%
filter(date == last_date$date-10) %>%
group_by(Name) %>%
select(Name, close) %>%
arrange(-close) %>%
head(5)
closing_price_10_days_ago## # A tibble: 5 × 2
## # Groups: Name [5]
## Name close
## <fct> <dbl>
## 1 PCLN 1950.
## 2 AMZN 1402.
## 3 GOOGL 1188.
## 4 GOOG 1176.
## 5 AZO 797.
Insight: - Booking Holdings Inc (PCLN) - Amazon (AMZN) - GOOGL (Google) - Alphabet Inc Class C (GOOG) - Autozone Inc (AZO)
Those stocks are top 5 in terms of momentum on thes last 10 days.
The interesting fact about this finding is GOOGL, AMZN, and GOOG are stocks in sector technology. So, in 2018 technology sector stocks are already traded frequently.