I will begin by studying two examples, then select a dataset and complete the assignment according to the instructions.
2. What data challenges do I anticipate?
I may run into poor-quality data, such as missing dates, NA values, or anomalies.
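A minimal sketch of how these problems could be checked before any calculation. The toy data frame and the names `toy`, `n_missing`, `n_dupes`, and `gaps` are hypothetical; only the column names Ticker, Date, and Adj.Close follow the dataset.

```r
# Toy data (made-up values) with missing prices, one duplicated
# Ticker/Date pair, and one gap in the dates.
toy <- data.frame(
  Ticker    = c("AAA", "AAA", "AAA", "AAA", "AAA"),
  Date      = c("2023-02-07", "2023-02-08", "2023-02-08",
                "2023-02-09", "2023-02-14"),
  Adj.Close = c(10.0, NA, NA, 10.4, 10.9)
)
toy$Date <- as.Date(toy$Date)

# Count missing values in each column.
n_missing <- colSums(is.na(toy))

# Count rows that repeat an earlier Ticker/Date pair.
n_dupes <- sum(duplicated(toy[, c("Ticker", "Date")]))

# Calendar days between consecutive distinct dates. Weekends and
# holidays also appear as gaps, so this is only a rough check.
gaps <- diff(sort(unique(toy$Date)))

print(n_missing)
print(n_dupes)
print(gaps)
```

The same three checks could be run on the real data once it is loaded, grouped by ticker.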
Source: https://raw.githubusercontent.com/XxY-coder/data607-week3b/refs/heads/main/stocks.csv
3. Clean up the data.
The dataset contains four distinct tickers.
library (tidyverse)
library (janitor)
library (lubridate)
stock_raw <- read.csv ("https://raw.githubusercontent.com/XxY-coder/data607-week3b/refs/heads/main/stocks.csv" )
names (stock_raw)
[1] "Ticker" "Date" "Open" "High" "Low" "Close"
[7] "Adj.Close" "Volume"
stock_raw |>
distinct (Ticker) |>
count ()
stock_raw |>
summarise (
min_date = min (as.Date (Date), na.rm = TRUE ),
max_date = max (as.Date (Date), na.rm = TRUE )
)
    min_date   max_date
1 2023-02-07 2023-05-05
stocks <-
stock_raw |>
transmute (
Ticker,
date = as.Date (Date),
adj_close = as.numeric (Adj.Close)
) |>
arrange (Ticker, date) |>
distinct (
Ticker,
date,
.keep_all = TRUE
)
4. Calculate the year-to-date average
stocks <-
stocks |>
mutate (year = year (date)) |>
group_by (Ticker, year) |>
arrange (date, .by_group = TRUE ) |>
mutate (ytd_avg = cummean (adj_close)) |>
ungroup ()
stocks
# A tibble: 248 × 5
   Ticker date       adj_close  year ytd_avg
   <chr>  <date>         <dbl> <dbl>   <dbl>
 1 AAPL   2023-02-07      154.  2023    154.
 2 AAPL   2023-02-08      152.  2023    153.
 3 AAPL   2023-02-09      151.  2023    152.
 4 AAPL   2023-02-10      151.  2023    152.
 5 AAPL   2023-02-13      154.  2023    152.
 6 AAPL   2023-02-14      153.  2023    152.
 7 AAPL   2023-02-15      155.  2023    153.
 8 AAPL   2023-02-16      154.  2023    153.
 9 AAPL   2023-02-17      153.  2023    153.
10 AAPL   2023-02-21      148.  2023    152.
# ℹ 238 more rows
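The ytd_avg column is a running mean: the value on row n is the mean of the first n adjusted closes for that ticker and year. A minimal base R sketch of that arithmetic with made-up prices (the names `prices` and `running_mean` are hypothetical):

```r
# Made-up prices, not taken from the dataset.
prices <- c(10, 12, 14, 16)

# Running mean: cumulative sum divided by the number of values so far.
running_mean <- cumsum(prices) / seq_along(prices)

print(running_mean)
```

Because the data start on 2023-02-07, the average here runs from that date rather than from January 1.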
5. Calculate the six-day moving averages
library (zoo)
stocks <-
stocks |>
arrange (Ticker, date) |>
group_by (Ticker) |>
mutate (
d6_move = rollmean (adj_close, k = 6 , align = "right" , fill = NA )
) |>
ungroup ()
stocks
# A tibble: 248 × 6
   Ticker date       adj_close  year ytd_avg d6_move
   <chr>  <date>         <dbl> <dbl>   <dbl>   <dbl>
 1 AAPL   2023-02-07      154.  2023    154.      NA
 2 AAPL   2023-02-08      152.  2023    153.      NA
 3 AAPL   2023-02-09      151.  2023    152.      NA
 4 AAPL   2023-02-10      151.  2023    152.      NA
 5 AAPL   2023-02-13      154.  2023    152.      NA
 6 AAPL   2023-02-14      153.  2023    152.    152.
 7 AAPL   2023-02-15      155.  2023    153.    153.
 8 AAPL   2023-02-16      154.  2023    153.    153.
 9 AAPL   2023-02-17      153.  2023    153.    153.
10 AAPL   2023-02-21      148.  2023    152.    153.
# ℹ 238 more rows
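With align = "right", each d6_move value is the mean of the current trading day and the five before it, which is why the first five rows for a ticker are NA. A base R sketch of the same window logic on made-up values (`trailing_mean`, `x`, and `ma6` are hypothetical names, not part of zoo):

```r
# Trailing (right-aligned) moving average: position i uses the values
# at i - k + 1 through i, and is NA until k values are available.
trailing_mean <- function(x, k) {
  sapply(seq_along(x), function(i) {
    if (i < k) NA_real_ else mean(x[(i - k + 1):i])
  })
}

# Made-up series, not taken from the dataset.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
ma6 <- trailing_mean(x, k = 6)

print(ma6)
```

The window counts rows, not calendar days, so a six-day window spans more than six calendar days whenever it crosses a weekend.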
stocks_f <- stocks |>
select (Ticker, date, adj_close, ytd_avg, d6_move)
view (stocks_f)
6. Conclusion
All four stocks rose slowly between February and May 2023. AAPL and GOOG showed relatively smooth upward trends; MSFT declined slightly at first and then rose; NFLX was highly volatile and followed an unstable path. Overall, NFLX traded in the highest price range, while GOOG traded in the lowest.
ggplot (
stocks_f,
aes (x = date, y = d6_move, color = Ticker, group = Ticker)
) +
geom_line (na.rm = TRUE ) +
labs (x = "Date" , y = "6-day Moving Average" , color = "Ticker" )
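The statement about which ticker trades in the highest and lowest range can be checked numerically as well as from the plot. A minimal base R sketch on made-up data (the tickers "AAA" and "BBB", their values, and the names `toy`, `lo`, `hi`, and `ranges` are hypothetical):

```r
# Made-up prices for two hypothetical tickers.
toy <- data.frame(
  Ticker    = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  adj_close = c(10, 12, 11, 100, 95, 105)
)

# Minimum and maximum adjusted close for each ticker.
lo <- tapply(toy$adj_close, toy$Ticker, min)
hi <- tapply(toy$adj_close, toy$Ticker, max)

ranges <- data.frame(
  Ticker = names(lo),
  min    = as.numeric(lo),
  max    = as.numeric(hi)
)

print(ranges)
```

Running the same summary on stocks_f, with adj_close or d6_move as the value column, would give the per-ticker ranges to compare.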