R Markdown

We will be looking at the Starbucks Stock Data set, found here:https://www.kaggle.com/datasets/mayankanand2701/starbucks-stock-price-dataset. This data set includes columns for the date of the trading day, the opening, highest, lowest, closing, and adjusted closing price of Starbucks stock. The data set also includes a column for the trading volume of the Starbucks stock on the given day.

We will use the cummean() window function from the dplyr library, and mutate() along with the rollmean() functions from the dplyr and zoo libraries to find the year to date closing and adjusted closing price for each year in the data set. We will also find the six-day moving average by year for the closing and asjusted closing price.

We will use the following libraries

Reading in Our Data

Let’s first read in our data into a data frame.

url <- ("https://raw.githubusercontent.com/WendyR20/DATA-607-Assignment-3B/refs/heads/main/Starbucks_Dataset.csv")
df <- read.csv(url)

Now we can take a glimpse at our data, we want to pay attention to the data types of our closing, adjusted closing and date columns.

glimpse(df)
## Rows: 8,036
## Columns: 7
## $ Date      <chr> "1992-06-26", "1992-06-29", "1992-06-30", "1992-07-01", "199…
## $ Open      <dbl> 0.328125, 0.339844, 0.367188, 0.351563, 0.359375, 0.351563, …
## $ High      <dbl> 0.347656, 0.367188, 0.371094, 0.359375, 0.359375, 0.355469, …
## $ Low       <dbl> 0.320313, 0.332031, 0.343750, 0.339844, 0.347656, 0.347656, …
## $ Close     <dbl> 0.335938, 0.359375, 0.347656, 0.355469, 0.355469, 0.355469, …
## $ Adj.Close <dbl> 0.260703, 0.278891, 0.269797, 0.275860, 0.275860, 0.275860, …
## $ Volume    <int> 224358400, 58732800, 34777600, 18316800, 13996800, 5753600, …

We have seen that the Date column is a character type, for the sake of our calculations we will convert this column to a date type.

df$Date <- as.Date(df$Date)

Year to Date Averages

Let’s calculate the year to date closing price average per year.

df_ytdm <- df %>%
  mutate(year = year(Date)) %>%
  group_by(year) %>%
  mutate( ytd_avg_price = cummean(Close)) %>%
  ungroup()

Let’s calculate the year to date adjusted closing price average per year.

df_ytdm2 <- df_ytdm %>% 
    group_by(year) %>%
    mutate(ytd_avg_close = cummean(Adj.Close))  %>%
    ungroup()

Now let’s take a look at the year to date closing price average and adjusted closing price average columns we added to our data frame.

glimpse(df_ytdm2)
## Rows: 8,036
## Columns: 10
## $ Date          <date> 1992-06-26, 1992-06-29, 1992-06-30, 1992-07-01, 1992-07…
## $ Open          <dbl> 0.328125, 0.339844, 0.367188, 0.351563, 0.359375, 0.3515…
## $ High          <dbl> 0.347656, 0.367188, 0.371094, 0.359375, 0.359375, 0.3554…
## $ Low           <dbl> 0.320313, 0.332031, 0.343750, 0.339844, 0.347656, 0.3476…
## $ Close         <dbl> 0.335938, 0.359375, 0.347656, 0.355469, 0.355469, 0.3554…
## $ Adj.Close     <dbl> 0.260703, 0.278891, 0.269797, 0.275860, 0.275860, 0.2758…
## $ Volume        <int> 224358400, 58732800, 34777600, 18316800, 13996800, 57536…
## $ year          <dbl> 1992, 1992, 1992, 1992, 1992, 1992, 1992, 1992, 1992, 19…
## $ ytd_avg_price <dbl> 0.3359380, 0.3476565, 0.3476563, 0.3496095, 0.3507814, 0…
## $ ytd_avg_close <dbl> 0.2607030, 0.2697970, 0.2697970, 0.2713127, 0.2722222, 0…

Now, for fun, let’s make a summary table of what the year to date closing price and adjusted closing price average on the final day of each year.

Starbucks_summary <-df_ytdm2 %>%
  group_by(year) %>%
  summarise(
    Closing_Price = last(ytd_avg_price),
    Closing_Adjusted_Price = last(ytd_avg_close)
  ) %>% 
 
  ungroup()

print(Starbucks_summary)
## # A tibble: 33 × 3
##     year Closing_Price Closing_Adjusted_Price
##    <dbl>         <dbl>                  <dbl>
##  1  1992         0.465                  0.361
##  2  1993         0.702                  0.545
##  3  1994         0.814                  0.632
##  4  1995         1.05                   0.811
##  5  1996         1.71                   1.33 
##  6  1997         2.20                   1.71 
##  7  1998         2.70                   2.09 
##  8  1999         3.46                   2.69 
##  9  2000         4.70                   3.64 
## 10  2001         4.93                   3.82 
## # ℹ 23 more rows

Lets’ plot the the year to date closing price on the final day of each year.

custom <- theme(
  plot.title = element_text( vjust = 2, color = "navy"),
  axis.title.x = element_text( vjust = - 2),
  axis.title.y = element_text( vjust = + 2)
)

ggplot(Starbucks_summary, aes(x = year , y = Closing_Price )) +
  geom_line() +
  labs(
    title = ("Starbucks Year to Year Stock Closing Price Change"),
    x = "Year",
    y = "Closing Price"
  ) + 
  custom

Six Day Moving Average

Let’s calculate the six day clsoing price moving average (grouping by year).

df_ytdm2 <- df_ytdm2 %>%
  group_by(year) %>%
  mutate(Six_Day_Close_Avg = rollmean(x = Close, k=6, fill = NA, align = "right" ))  %>%
  ungroup()

head(df_ytdm2)
## # A tibble: 6 × 11
##   Date        Open  High   Low Close Adj.Close    Volume  year ytd_avg_price
##   <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <int> <dbl>         <dbl>
## 1 1992-06-26 0.328 0.348 0.320 0.336     0.261 224358400  1992         0.336
## 2 1992-06-29 0.340 0.367 0.332 0.359     0.279  58732800  1992         0.348
## 3 1992-06-30 0.367 0.371 0.344 0.348     0.270  34777600  1992         0.348
## 4 1992-07-01 0.352 0.359 0.340 0.355     0.276  18316800  1992         0.350
## 5 1992-07-02 0.359 0.359 0.348 0.355     0.276  13996800  1992         0.351
## 6 1992-07-06 0.352 0.355 0.348 0.355     0.276   5753600  1992         0.352
## # ℹ 2 more variables: ytd_avg_close <dbl>, Six_Day_Close_Avg <dbl>

Let’s calculate the six day adjusted clsoing price moving average (grouping by year).

df_ytdm2 <- df_ytdm2 %>%
  group_by(year) %>%
  mutate(Six_Day_CloseAdj_Avg = rollmean(x = Adj.Close, k=6, fill = NA, align = "right" ))  %>%
  ungroup()

head(df_ytdm2)
## # A tibble: 6 × 12
##   Date        Open  High   Low Close Adj.Close    Volume  year ytd_avg_price
##   <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <int> <dbl>         <dbl>
## 1 1992-06-26 0.328 0.348 0.320 0.336     0.261 224358400  1992         0.336
## 2 1992-06-29 0.340 0.367 0.332 0.359     0.279  58732800  1992         0.348
## 3 1992-06-30 0.367 0.371 0.344 0.348     0.270  34777600  1992         0.348
## 4 1992-07-01 0.352 0.359 0.340 0.355     0.276  18316800  1992         0.350
## 5 1992-07-02 0.359 0.359 0.348 0.355     0.276  13996800  1992         0.351
## 6 1992-07-06 0.352 0.355 0.348 0.355     0.276   5753600  1992         0.352
## # ℹ 3 more variables: ytd_avg_close <dbl>, Six_Day_Close_Avg <dbl>,
## #   Six_Day_CloseAdj_Avg <dbl>