This assignment requires time series data, so I chose a stock price dataset. I'll load the data into R and check the data type of each column. The tidyquant library provides stock prices; I chose Apple (AAPL) and analyze its price in 2023. For the window functions, I will use cummean() to calculate the YTD average and lag() to look back over a 6-day window and calculate the 6-day moving average. At the end I'll show the results in a plot.
Data Exploration
At a glance, there are 250 rows of data, which is about the right number of trading days for 2023. There are 8 columns of different data types: character, date, and numeric. The lowest price occurred near the beginning of the year, on 1/3, and the highest price on 12/29. Overall this looks like a quality data set to work with.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyquant)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
stock_data_clean <- stock_data |>
  select(date, close) |>   # use closing price
  arrange(date) |>         # make sure dates are in order
  mutate(
    year = year(date),
    month = month(date),
    day = day(date)
  )
Window Functions
I use a window function, in this case cummean(), to calculate the YTD average. First the data must be sorted by date; then cummean() calculates a value for each row based on a window of that row and all previous rows. The window expands as the date moves forward.
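To make the expanding window concrete, here is a minimal sketch on a toy data frame. The `prices` table and its values are made up for illustration and are not the AAPL data; only `arrange()`, `mutate()`, and `cummean()` from dplyr are assumed.

```r
library(dplyr)

# Hypothetical toy prices for illustration only (not real AAPL data)
prices <- tibble(
  date  = as.Date("2023-01-03") + 0:3,
  close = c(10, 20, 30, 40)
)

ytd <- prices |>
  arrange(date) |>                  # sort first: the window follows row order
  mutate(ytd_avg = cummean(close))  # mean of every row up to and including this one

ytd$ytd_avg
# 10 15 20 25
```

Each value is the mean of all closes seen so far, so the fourth row averages all four prices.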
Similarly, I use the lag() function to look back over a 6-day window. Each row's value depends on the previous rows, and the window slides forward as we move through the sorted dates. There are some missing values for this one, because at the beginning of the year there aren't enough days to calculate a 6-day moving average.
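One way a 6-day moving average could be built from lag() is sketched below. This assumes the window is the current day plus the five previous trading days, which would leave the first five rows missing; the toy `prices` table is hypothetical, and the exact formula used for `ma_6` in this report may differ.

```r
library(dplyr)

# Hypothetical toy prices for illustration only (not real AAPL data)
prices <- tibble(
  date  = as.Date("2023-01-03") + 0:7,
  close = c(1, 2, 3, 4, 5, 6, 7, 8)
)

ma <- prices |>
  arrange(date) |>  # lag() looks back by row, so order matters
  mutate(
    # current close plus the five previous closes, divided by 6
    ma_6 = (close + lag(close, 1) + lag(close, 2) +
              lag(close, 3) + lag(close, 4) + lag(close, 5)) / 6
  )

ma$ma_6
# NA NA NA NA NA 3.5 4.5 5.5
```

lag() returns NA where no earlier row exists, and any sum that includes an NA is itself NA, so the first five rows have no moving average.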
# plot both averages
ggplot(stock_analysis, aes(x = date)) +
  geom_line(aes(y = close, color = "Daily Close"), alpha = 0.5) +
  geom_line(aes(y = ytd_avg, color = "YTD Average"), linewidth = 1) +
  geom_line(aes(y = ma_6, color = "6-Day MA"), linewidth = 1) +
  labs(
    title = "Apple Stock YTD Average and 6-Day Moving Average",
    x = "Date", y = "Price (USD)", color = "Metric"
  ) +
  theme_minimal() +
  scale_color_manual(values = c(
    "Daily Close" = "gray",
    "YTD Average" = "blue",
    "6-Day MA" = "red"
  ))
Warning: Removed 5 rows containing missing values or values outside the scale range
(`geom_line()`).
Conclusion
The goal of this project was to apply window functions to time series data. I chose stock prices because they are readily available and well recorded, with a price and a date for each observation. cummean() and lag() are both used here as window functions to calculate averages of Apple's closing price in the chosen year, 2023. The plot shows that the average price trended up from January to December 2023.
One small but crucial detail is to sort the data by date at the beginning, since the window simply expands or slides by row and does not check the dates during the process.
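A small sketch of why the sort matters, using a made-up three-row table whose rows are deliberately out of date order (the `shuffled` data are hypothetical):

```r
library(dplyr)

# Hypothetical toy prices, deliberately stored out of date order
shuffled <- tibble(
  date  = as.Date(c("2023-01-05", "2023-01-03", "2023-01-04")),
  close = c(30, 10, 20)
)

# Without sorting, cummean() follows the stored row order
unsorted_avg <- shuffled |>
  mutate(ytd_avg = cummean(close))

# With arrange(date), the window expands in calendar order
sorted_avg <- shuffled |>
  arrange(date) |>
  mutate(ytd_avg = cummean(close))

unsorted_avg$ytd_avg
# 30 20 20
sorted_avg$ytd_avg
# 10 15 20
```

Both runs succeed without any warning, which is why an unsorted table is easy to miss.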