This analysis demonstrates how to reshape stock data from wide to long format using the tidyverse package. The data contains weekly closing prices for five US listed companies in 2019.
library(tidyverse)
library(dplyr)
stock_df <- read_csv("stock_df.csv")
stock_df
## # A tibble: 5 × 106
## company `2019_week1` `2019_week2` `2019_week3` `2019_week4` `2019_week5`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Amazon 1848. 1641. 1696. 1671. 1626.
## 2 Apple 73.4 38.1 39.2 39.4 41.6
## 3 Facebook 205. 144. 150. 149. 166.
## 4 Google 1337. 1057. 1098. 1091. 1111.
## 5 Microsoft 158. 103. 108. 107. 103.
## # ℹ 100 more variables: `2019_week6` <dbl>, `2019_week7` <dbl>,
## # `2019_week8` <dbl>, `2019_week9` <dbl>, `2019_week10` <dbl>,
## # `2019_week11` <dbl>, `2019_week12` <dbl>, `2019_week13` <dbl>,
## # `2019_week14` <dbl>, `2019_week15` <dbl>, `2019_week16` <dbl>,
## # `2019_week17` <dbl>, `2019_week18` <dbl>, `2019_week19` <dbl>,
## # `2019_week20` <dbl>, `2019_week21` <dbl>, `2019_week22` <dbl>,
## # `2019_week23` <dbl>, `2019_week24` <dbl>, `2019_week25` <dbl>, …
The original dataset has 5 rows (companies) and 106 columns (company name + 105 weeks).
We use pivot_longer() to transform the data from wide to
long format:
stock_df_long <- stock_df %>%
pivot_longer(
cols = !company, # Select all columns except 'company'
names_to = c("year", "week"), # Split column names into 'year' and 'week'
names_sep = "_week", # Separator between year and week
names_transform = list( # Transform the extracted names
year = as.integer, # Convert year to integer
week = as.integer # Convert week to integer
),
values_to = "price" # Name for the values column
)
stock_df_long
## # A tibble: 525 × 4
## company year week price
## <chr> <int> <int> <dbl>
## 1 Amazon 2019 1 1848.
## 2 Amazon 2019 2 1641.
## 3 Amazon 2019 3 1696.
## 4 Amazon 2019 4 1671.
## 5 Amazon 2019 5 1626.
## 6 Amazon 2019 6 1588.
## 7 Amazon 2019 7 1608.
## 8 Amazon 2019 8 1632.
## 9 Amazon 2019 9 1672.
## 10 Amazon 2019 10 1621.
## # ℹ 515 more rows
The reshaped dataset now has 525 rows and 4 columns.
# Summary by company
stock_df_long %>%
group_by(company) %>%
summarise(
min_price = min(price),
max_price = max(price),
avg_price = mean(price),
weeks = n()
) %>%
knitr::kable(digits = 2, caption = "Summary Statistics by Company")
| company | min_price | max_price | avg_price | weeks |
|---|---|---|---|---|
| Amazon | 1588.22 | 3401.80 | 2234.66 | 105 |
| Apple | 38.07 | 132.69 | 74.18 | 105 |
| 143.80 | 293.66 | 209.06 | 105 | |
| 1057.19 | 1827.99 | 1341.01 | 105 | |
| Microsoft | 102.78 | 228.91 | 162.59 | 105 |
ggplot(stock_df_long, aes(x = week, y = price, color = company)) +
geom_line(linewidth = 1) +
labs(
title = "Weekly Stock Prices for Tech Companies in 2019",
x = "Week of Year",
y = "Stock Price (USD)",
color = "Company"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
legend.position = "bottom"
)
The data has been successfully reshaped from wide to long format, making it easier to analyze and visualize trends across multiple companies over time.