week3b

Author

Zihao Yu

I will begin by studying two examples, then select a dataset and complete the assignment according to the instructions.

2. What data challenges do I anticipate?

I may run into poor-quality data, such as missing dates, NA values, or anomalies.
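A minimal sketch of how these problems could be counted before cleaning. The data frame `toy` and its values are made up for illustration; only the column names (`Ticker`, `Date`, `Adj.Close`) follow the real file, and the date format `%Y-%m-%d` is an assumption.

```r
# Hypothetical toy data standing in for the real file (values are invented)
toy <- data.frame(
  Ticker = c("AAA", "AAA", "AAA", "BBB"),
  Date = c("2023-02-07", "2023-02-07", NA, "2023-02-08"),
  Adj.Close = c("10.5", "10.5", "11.0", "n/a")
)

# Dates that are missing or do not match the assumed YYYY-MM-DD format
missing_dates <- sum(is.na(as.Date(toy$Date, format = "%Y-%m-%d")))

# Prices that cannot be converted to numbers (the coercion warning is expected)
non_numeric <- sum(is.na(suppressWarnings(as.numeric(toy$Adj.Close))))

# Rows that repeat an earlier Ticker/Date combination
dup_rows <- sum(duplicated(toy[c("Ticker", "Date")]))

c(missing_dates = missing_dates, non_numeric = non_numeric, dup_rows = dup_rows)
```

Counting problems first makes it easier to decide whether rows should be dropped, repaired, or left alone.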

source: “https://raw.githubusercontent.com/XxY-coder/data607-week3b/refs/heads/main/stocks.csv”

3. Clean up the data.

The dataset contains four distinct tickers.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(lubridate)


stock_raw <- read.csv("https://raw.githubusercontent.com/XxY-coder/data607-week3b/refs/heads/main/stocks.csv")
names(stock_raw)
[1] "Ticker"    "Date"      "Open"      "High"      "Low"       "Close"    
[7] "Adj.Close" "Volume"   
stock_raw |>
  distinct(Ticker) |>
  count()
  n
1 4
stock_raw |>
  summarise(
    min_date = min(as.Date(Date), na.rm = TRUE),
    max_date = max(as.Date(Date), na.rm = TRUE)
  )
    min_date   max_date
1 2023-02-07 2023-05-05
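# Keep only the columns needed, with proper types
stocks <-
  stock_raw |>
  transmute(
    Ticker,
    date = as.Date(Date),
    adj_close = as.numeric(Adj.Close)
  ) |>
  # Sort within ticker, then drop repeated Ticker/date rows (first one is kept)
  arrange(Ticker, date) |>
  distinct(Ticker, date, .keep_all = TRUE)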

4. Calculate the year-to-date average

stocks <-
  stocks |>
  mutate(year = year(date)) |>
  group_by(Ticker, year) |>
  arrange(date, .by_group = TRUE) |>
  mutate(ytd_avg = cummean(adj_close)) |>
  ungroup()

stocks
# A tibble: 248 × 5
   Ticker date       adj_close  year ytd_avg
   <chr>  <date>         <dbl> <dbl>   <dbl>
 1 AAPL   2023-02-07      154.  2023    154.
 2 AAPL   2023-02-08      152.  2023    153.
 3 AAPL   2023-02-09      151.  2023    152.
 4 AAPL   2023-02-10      151.  2023    152.
 5 AAPL   2023-02-13      154.  2023    152.
 6 AAPL   2023-02-14      153.  2023    152.
 7 AAPL   2023-02-15      155.  2023    153.
 8 AAPL   2023-02-16      154.  2023    153.
 9 AAPL   2023-02-17      153.  2023    153.
10 AAPL   2023-02-21      148.  2023    152.
# ℹ 238 more rows
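To make the calculation concrete, here is a small sketch of what `cummean()` returns on a made-up price vector. The numbers are illustrative only. Because the data start on 2023-02-07, the "year-to-date" average above runs from the first available date rather than from January 1.

```r
library(dplyr)

# Hypothetical prices, in date order
prices <- c(10, 20, 30, 40)

# Running mean of all values seen so far
ytd <- cummean(prices)

# The same quantity written out by hand: cumulative sum / number of observations
manual <- cumsum(prices) / seq_along(prices)

ytd     # 10 15 20 25
manual  # 10 15 20 25
```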

5. Calculate the six-day moving averages

library(zoo)

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
stocks <-
  stocks |>
  arrange(Ticker, date) |>
  group_by(Ticker) |>
  mutate(
    d6_move = rollmean(adj_close, k = 6, align = "right", fill = NA)
  ) |>
  ungroup()

stocks
# A tibble: 248 × 6
   Ticker date       adj_close  year ytd_avg d6_move
   <chr>  <date>         <dbl> <dbl>   <dbl>   <dbl>
 1 AAPL   2023-02-07      154.  2023    154.     NA 
 2 AAPL   2023-02-08      152.  2023    153.     NA 
 3 AAPL   2023-02-09      151.  2023    152.     NA 
 4 AAPL   2023-02-10      151.  2023    152.     NA 
 5 AAPL   2023-02-13      154.  2023    152.     NA 
 6 AAPL   2023-02-14      153.  2023    152.    152.
 7 AAPL   2023-02-15      155.  2023    153.    153.
 8 AAPL   2023-02-16      154.  2023    153.    153.
 9 AAPL   2023-02-17      153.  2023    153.    153.
10 AAPL   2023-02-21      148.  2023    152.    153.
# ℹ 238 more rows
stocks_f <- stocks |>
  select(Ticker, date, adj_close, ytd_avg, d6_move)

view(stocks_f)
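The right-aligned six-day window used in step 5 can be illustrated in base R. This is only a sketch on a made-up vector; it should give the same values as `rollmean(x, k = 6, align = "right", fill = NA)`, which explains why the first five rows per ticker are NA.

```r
# Hypothetical series of eight observations
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
k <- 6

# Start with NA everywhere, then fill positions that have k values behind them
ma <- rep(NA_real_, length(x))
for (i in seq_along(x)) {
  if (i >= k) {
    # Average of the current value and the k - 1 values before it
    ma[i] <- mean(x[(i - k + 1):i])
  }
}

ma  # NA NA NA NA NA 3.5 4.5 5.5
```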

6. Conclusion

All four stocks rose gradually between February and May 2023. AAPL and GOOG showed relatively smooth upward trends; MSFT initially declined slightly and then rose; NFLX was highly volatile and unstable. Overall, NFLX traded in the highest price range, while GOOG traded in the lowest.

ggplot(
  stocks_f,
  aes(x = date, y = d6_move, color = Ticker, group = Ticker)
) +
  geom_line(na.rm = TRUE) +
  labs(x = "Date", y = "6-day Moving Average", color = "Ticker")
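The visual impressions in the conclusion could also be checked numerically. The sketch below uses a made-up data frame `toy` with the same column names as `stocks_f`; the tickers and prices are hypothetical, and the standard deviation of daily returns is one possible volatility measure among several.

```r
library(dplyr)

# Hypothetical data with the same shape as stocks_f (values are invented)
toy <- tibble(
  Ticker = rep(c("AAA", "BBB"), each = 4),
  date = rep(as.Date("2023-02-07") + 0:3, times = 2),
  adj_close = c(10, 11, 12, 13, 100, 90, 120, 110)
)

trend <- toy |>
  arrange(Ticker, date) |>
  group_by(Ticker) |>
  summarise(
    first_close = first(adj_close),
    last_close = last(adj_close),
    # Overall change across the period, in percent
    pct_change = 100 * (last_close / first_close - 1),
    low = min(adj_close),
    high = max(adj_close),
    # Spread of day-over-day returns as a rough volatility measure
    daily_return_sd = sd(adj_close / lag(adj_close) - 1, na.rm = TRUE)
  )

trend
```

Replacing `toy` with `stocks_f` would give one row per ticker, which can be compared against the plot.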