title: ‘Week 3 Challenge: Stock Markets’ author: “David Acevedo” date: “2025-09-20” output: html_document —
This report looks at stock market values in High Income
Countries.
The goal is to see how they have changed over time and also compare
averages by decade.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(ggthemes) # for theme_economist
First, I load the Excel file with the stock market data.
stock_mkt <- read_excel(
"/Users/davidacevedo/Library/Mobile Documents/com~apple~CloudDocs/01. NYU | MSQM/02. R Programming for Data/02. Assignments/06. Lesson #6 Assignments/Stock Markets, US$.xlsx",
sheet = "annual"
)
The file is wide, so I reshape it by pivoting (Year, Country,
Value).
I also remove missing values and round the numbers.
stock_pivot <- pivot_longer(stock_mkt, cols = -Year, names_to = "Country", values_to = "Value", values_transform = list(Value = as.numeric))
stock <- subset(stock_pivot, !is.na(stock_pivot$Value))
stock$Value <- round(stock$Value, 1)
head(stock, 5)
## # A tibble: 5 × 3
## Year Country Value
## <dbl> <chr> <dbl>
## 1 1996 Argentina 23.9
## 2 1996 Australia 40.1
## 3 1996 Austria 37.3
## 4 1996 Belgium 62.6
## 5 1996 Brazil 15.1
Now I filter the data to keep only High Income Countries.
high_income <- filter(stock, stock$Country == "High Income Countries")
I look at the first rows and some summary statistics to get an idea of the numbers.
head(high_income, 5)
## # A tibble: 5 × 3
## Year Country Value
## <dbl> <chr> <dbl>
## 1 1996 High Income Countries 65.7
## 2 1997 High Income Countries 76.4
## 3 1998 High Income Countries 88.3
## 4 1999 High Income Countries 105.
## 5 2000 High Income Countries 114.
summary(high_income)
## Year Country Value
## Min. :1996 Length:30 Min. : 65.7
## 1st Qu.:2003 Class :character 1st Qu.: 92.8
## Median :2010 Mode :character Median :114.2
## Mean :2010 Mean :142.4
## 3rd Qu.:2018 3rd Qu.:174.2
## Max. :2025 Max. :329.5
describe(high_income)
## vars n mean sd median trimmed mad min max range skew
## Year 1 30 2010.50 8.8 2010.5 2010.50 11.12 1996.0 2025.0 29.0 0.00
## Country* 2 30 1.00 0.0 1.0 1.00 0.00 1.0 1.0 0.0 NaN
## Value 3 30 142.37 69.0 114.2 132.28 44.48 65.7 329.5 263.8 1.16
## kurtosis se
## Year -1.32 1.61
## Country* NaN 0.00
## Value 0.38 12.60
Here is a line chart that shows the stock market values year by
year.
It uses the Economist theme to make it look cleaner.
ggplot(high_income, aes(x = Year, y = Value)) +
geom_line(color = "darkblue", size = 1) +
labs(title = "Stock Market Trends in High Income Countries",
y = "Stock Market Value (US$)",
x = "Year") +
theme_economist() +
scale_color_economist()
Next, I calculate the average stock market value by decade.
This helps smooth out the year-to-year ups and downs.
# Create a new column for decade
#ChatGPT PRompt: Prompt: create a function that calculates the stock market value by decade for high income countries, print results with basic statistics:
high_income$Decade <- floor(high_income$Year / 10) * 10
# Group by decade and calculate summary
decade_summary <- aggregate(Value ~ Decade, data = high_income,
FUN = function(x) c(avg_value = mean(x, na.rm = TRUE), min_value = min(x, na.rm = TRUE), max_value = max(x, na.rm = TRUE)
))
# The result comes as a matrix inside the Value column
# so we convert it into separate columns
decade_summary <- data.frame(
Decade = decade_summary$Decade,
avg_value = decade_summary$Value[, "avg_value"],
min_value = decade_summary$Value[, "min_value"],
max_value = decade_summary$Value[, "max_value"]
)
decade_summary
## Decade avg_value min_value max_value
## 1 1990 83.7750 65.7 104.7
## 2 2000 98.6100 73.9 133.4
## 3 2010 139.6100 98.8 182.6
## 4 2020 258.9667 194.4 329.5
The results show both the overall trend each year and also how the
averages change when grouped by decade.
This gives a good picture of growth in High Income Countries over
time.
ggplot(decade_summary, aes(x = Decade, y = avg_value)) +
geom_line(color = "darkblue", size = 1.2) +
geom_point(color = "darkblue", size = 3) +
labs(title = "Average Stock Market Value by Decade",
x = "Decade",
y = "Average Value (US$)") +
theme_economist()