Introduction

This analysis demonstrates how to reshape stock data from wide to long format using the tidyverse package. The data contains weekly closing prices for five US listed companies in 2019.

Load Libraries

library(tidyverse)
library(dplyr)

Import Data

stock_df <- read_csv("stock_df.csv")

Original Wide Format Data

stock_df
## # A tibble: 5 × 106
##   company   `2019_week1` `2019_week2` `2019_week3` `2019_week4` `2019_week5`
##   <chr>            <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
## 1 Amazon          1848.        1641.        1696.        1671.        1626. 
## 2 Apple             73.4         38.1         39.2         39.4         41.6
## 3 Facebook         205.         144.         150.         149.         166. 
## 4 Google          1337.        1057.        1098.        1091.        1111. 
## 5 Microsoft        158.         103.         108.         107.         103. 
## # ℹ 100 more variables: `2019_week6` <dbl>, `2019_week7` <dbl>,
## #   `2019_week8` <dbl>, `2019_week9` <dbl>, `2019_week10` <dbl>,
## #   `2019_week11` <dbl>, `2019_week12` <dbl>, `2019_week13` <dbl>,
## #   `2019_week14` <dbl>, `2019_week15` <dbl>, `2019_week16` <dbl>,
## #   `2019_week17` <dbl>, `2019_week18` <dbl>, `2019_week19` <dbl>,
## #   `2019_week20` <dbl>, `2019_week21` <dbl>, `2019_week22` <dbl>,
## #   `2019_week23` <dbl>, `2019_week24` <dbl>, `2019_week25` <dbl>, …

The original dataset has 5 rows (companies) and 106 columns (company name + 105 weeks).

Reshape to Long Format

We use pivot_longer() to transform the data from wide to long format:

stock_df_long <- stock_df %>%
  pivot_longer(
    cols = !company,                    # Select all columns except 'company'
    names_to = c("year", "week"),       # Split column names into 'year' and 'week'
    names_sep = "_week",                # Separator between year and week
    names_transform = list(             # Transform the extracted names
      year = as.integer,                # Convert year to integer
      week = as.integer                 # Convert week to integer
    ),
    values_to = "price"                 # Name for the values column
  )

Long Format Data

stock_df_long
## # A tibble: 525 × 4
##    company  year  week price
##    <chr>   <int> <int> <dbl>
##  1 Amazon   2019     1 1848.
##  2 Amazon   2019     2 1641.
##  3 Amazon   2019     3 1696.
##  4 Amazon   2019     4 1671.
##  5 Amazon   2019     5 1626.
##  6 Amazon   2019     6 1588.
##  7 Amazon   2019     7 1608.
##  8 Amazon   2019     8 1632.
##  9 Amazon   2019     9 1672.
## 10 Amazon   2019    10 1621.
## # ℹ 515 more rows

The reshaped dataset now has 525 rows and 4 columns.

Summary Statistics

# Summary by company
stock_df_long %>%
  group_by(company) %>%
  summarise(
    min_price = min(price),
    max_price = max(price),
    avg_price = mean(price),
    weeks = n()
  ) %>%
  knitr::kable(digits = 2, caption = "Summary Statistics by Company")
Summary Statistics by Company
company min_price max_price avg_price weeks
Amazon 1588.22 3401.80 2234.66 105
Apple 38.07 132.69 74.18 105
Facebook 143.80 293.66 209.06 105
Google 1057.19 1827.99 1341.01 105
Microsoft 102.78 228.91 162.59 105

Visualization

ggplot(stock_df_long, aes(x = week, y = price, color = company)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Weekly Stock Prices for Tech Companies in 2019",
    x = "Week of Year",
    y = "Stock Price (USD)",
    color = "Company"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    legend.position = "bottom"
  )

Conclusion

The data has been successfully reshaped from wide to long format, making it easier to analyze and visualize trends across multiple companies over time.