Week 3 Challenge: Stock Markets

title: ‘Week 3 Challenge: Stock Markets’ author: “David Acevedo” date: “2025-09-20” output: html_document —

This report looks at stock market values in High Income Countries.
The goal is to see how they have changed over time and also compare averages by decade.

knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)
library(psych)

## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

library(ggthemes)   # for theme_economist

First, I load the Excel file with the stock market data.

stock_mkt <- read_excel(
  "/Users/davidacevedo/Library/Mobile Documents/com~apple~CloudDocs/01. NYU | MSQM/02. R Programming for Data/02. Assignments/06. Lesson #6 Assignments/Stock Markets, US$.xlsx",
  sheet = "annual"
)

The file is wide, so I reshape it by pivoting (Year, Country, Value).
I also remove missing values and round the numbers.

stock_pivot <- pivot_longer(stock_mkt, cols = -Year,  names_to = "Country", values_to = "Value", values_transform = list(Value = as.numeric))
stock <- subset(stock_pivot, !is.na(stock_pivot$Value))
stock$Value <- round(stock$Value, 1)

head(stock, 5)

## # A tibble: 5 × 3
##    Year Country   Value
##   <dbl> <chr>     <dbl>
## 1  1996 Argentina  23.9
## 2  1996 Australia  40.1
## 3  1996 Austria    37.3
## 4  1996 Belgium    62.6
## 5  1996 Brazil     15.1

Now I filter the data to keep only High Income Countries.

high_income <- filter(stock, stock$Country == "High Income Countries")

I look at the first rows and some summary statistics to get an idea of the numbers.

head(high_income, 5)

## # A tibble: 5 × 3
##    Year Country               Value
##   <dbl> <chr>                 <dbl>
## 1  1996 High Income Countries  65.7
## 2  1997 High Income Countries  76.4
## 3  1998 High Income Countries  88.3
## 4  1999 High Income Countries 105. 
## 5  2000 High Income Countries 114.

summary(high_income)

##       Year        Country              Value      
##  Min.   :1996   Length:30          Min.   : 65.7  
##  1st Qu.:2003   Class :character   1st Qu.: 92.8  
##  Median :2010   Mode  :character   Median :114.2  
##  Mean   :2010                      Mean   :142.4  
##  3rd Qu.:2018                      3rd Qu.:174.2  
##  Max.   :2025                      Max.   :329.5

describe(high_income)

##          vars  n    mean   sd median trimmed   mad    min    max range skew
## Year        1 30 2010.50  8.8 2010.5 2010.50 11.12 1996.0 2025.0  29.0 0.00
## Country*    2 30    1.00  0.0    1.0    1.00  0.00    1.0    1.0   0.0  NaN
## Value       3 30  142.37 69.0  114.2  132.28 44.48   65.7  329.5 263.8 1.16
##          kurtosis    se
## Year        -1.32  1.61
## Country*      NaN  0.00
## Value        0.38 12.60

Here is a line chart that shows the stock market values year by year.
It uses the Economist theme to make it look cleaner.

ggplot(high_income, aes(x = Year, y = Value)) +
  geom_line(color = "darkblue", size = 1) +
  labs(title = "Stock Market Trends in High Income Countries",
       y = "Stock Market Value (US$)",
       x = "Year") +
  theme_economist() +
  scale_color_economist()

Next, I calculate the average stock market value by decade.
This helps smooth out the year-to-year ups and downs.

# Create a new column for decade
#ChatGPT PRompt: Prompt: create a function that calculates the stock market value by decade for high income countries, print results with basic statistics:

high_income$Decade <- floor(high_income$Year / 10) * 10

# Group by decade and calculate summary
decade_summary <- aggregate(Value ~ Decade, data = high_income, 
                            FUN = function(x) c(avg_value = mean(x, na.rm = TRUE), min_value = min(x, na.rm = TRUE), max_value = max(x, na.rm = TRUE)
                            ))

# The result comes as a matrix inside the Value column
# so we convert it into separate columns
decade_summary <- data.frame(
  Decade = decade_summary$Decade,
  avg_value = decade_summary$Value[, "avg_value"],
  min_value = decade_summary$Value[, "min_value"],
  max_value = decade_summary$Value[, "max_value"]
)

decade_summary

##   Decade avg_value min_value max_value
## 1   1990   83.7750      65.7     104.7
## 2   2000   98.6100      73.9     133.4
## 3   2010  139.6100      98.8     182.6
## 4   2020  258.9667     194.4     329.5

The results show both the overall trend each year and also how the averages change when grouped by decade.
This gives a good picture of growth in High Income Countries over time.

ggplot(decade_summary, aes(x = Decade, y = avg_value)) +
  geom_line(color = "darkblue", size = 1.2) +
  geom_point(color = "darkblue", size = 3) +
  labs(title = "Average Stock Market Value by Decade",
       x = "Decade",
       y = "Average Value (US$)") +
  theme_economist()