---
title: "MATH1324 Assignment 1"
subtitle: "Statistical analysis of S&P 500 and Bitcoin data over six years (2018 to 2024)"
output:
  html_notebook: default
---

## Student Details

Chandangowda Maruvanahalli Shivaramu (s4063920)

## Problem Statement

This report analyzes the S&P 500 stock market index and Bitcoin prices over a six-year period (2018–2024) for patterns and statistical features. Descriptive statistics are computed to summarize the central tendency and variability of both datasets. The analysis then examines each dataset's trend and the relationship between the two by computing and plotting correlation coefficients for every six-month window. Finally, the study uses visual inspection and a formal test to assess whether the prices of Bitcoin and the S&P 500 follow a normal distribution. The findings describe how these two financial instruments behave and interact, which can support data-driven decision-making.

## Load Packages

```{r}
library(ggplot2)
library(dplyr)
library(lubridate)
library(readr)
library(stats)
```

## Data

The datasets used in this analysis are the historical prices of the S&P 500 index and Bitcoin from 2018 to 2024. The data were imported with the `read_csv()` function from the readr package, the Date columns were converted to date format, and the commas were removed from the S&P 500 Price column so that it could be treated as numeric. The first rows of both datasets were previewed to check that the data had been imported and prepared correctly.

```{r}
# Reading the CSV files with read_csv() from the 'readr' package
sp_500_dataframe <- read_csv("/Users/chandangowda/Desktop/S&P 500-1.csv", show_col_types = FALSE)
btcoin_dataframe <- read_csv("/Users/chandangowda/Desktop/BTC-USD-1.csv", show_col_types = FALSE)

# Converting 'Date' column to date format
sp_500_dataframe$Date <- dmy(sp_500_dataframe$Date)
btcoin_dataframe$Date <- dmy(btcoin_dataframe$Date)

# Removing commas in 'Price' column and converting it to numeric
sp_500_dataframe$Price <- as.numeric(gsub(",", "", sp_500_dataframe$Price))

# head() function is used to display starting few rows of data
head(sp_500_dataframe)
head(btcoin_dataframe)
```
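
The cleaning steps above can be tried on a couple of made-up rows before running them on the real files. The sketch below is illustrative only: the `toy_raw` data frame and its values are hypothetical, and it assumes dates stored as day/month/year text and prices stored as text with commas, which is what the `dmy()` and `gsub()` calls above imply.

```{r}
library(lubridate)

# Hypothetical rows mimicking a raw price export (values are made up)
toy_raw <- data.frame(
  Date  = c("02/01/2018", "03/01/2018"),
  Price = c("2,695.81", "2,713.06"),
  stringsAsFactors = FALSE
)

# dmy() reads the text as day-month-year and returns Date objects
toy_raw$Date <- dmy(toy_raw$Date)

# Strip the commas, then convert the text to numbers
toy_raw$Price <- as.numeric(gsub(",", "", toy_raw$Price))

str(toy_raw)
```

If `dmy()` cannot parse a value it returns `NA` with a warning, so checking `sum(is.na(toy_raw$Date))` after the conversion is a quick way to spot a wrong date format.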

## Task 1

In this task, descriptive statistics are calculated for both the S&P 500 and Bitcoin datasets. To summarize the central tendency and variability of each dataset, the mean, median, mode, range, standard deviation, minimum, maximum, quartiles, and interquartile range (IQR) are computed. A comparative table then places these measures side by side for the two financial instruments, providing a basis for further analysis.

```{r}
# Descriptive statistics for S&P 500 dataset
sp_500_mean <- mean(sp_500_dataframe$Price, na.rm = TRUE)
sp_500_median <- median(sp_500_dataframe$Price, na.rm = TRUE)
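# Mode = most frequent value; with near-continuous prices most values occur once, so interpret it with care
sp_500_mode <- as.numeric(names(sort(table(sp_500_dataframe$Price), decreasing=TRUE)[1]))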
sp_500_range <- range(sp_500_dataframe$Price, na.rm = TRUE)
sp_500_sd <- sd(sp_500_dataframe$Price, na.rm = TRUE)
sp_500_min <- min(sp_500_dataframe$Price, na.rm = TRUE)
sp_500_max <- max(sp_500_dataframe$Price, na.rm = TRUE)
sp_500_first_quantile <- quantile(sp_500_dataframe$Price, 0.25, na.rm = TRUE)
sp_500_last_quantile <- quantile(sp_500_dataframe$Price, 0.75, na.rm = TRUE)
sp_500_iqr <- IQR(sp_500_dataframe$Price, na.rm = TRUE)

# Descriptive statistics for Bitcoin dataset
btcoin_mean <- mean(btcoin_dataframe$`Adj Close`, na.rm = TRUE)
btcoin_median <- median(btcoin_dataframe$`Adj Close`, na.rm = TRUE)
# Mode = most frequent value; the same caveat as for the S&P 500 applies
btcoin_mode <- as.numeric(names(sort(table(btcoin_dataframe$`Adj Close`), decreasing=TRUE)[1]))
btcoin_range <- range(btcoin_dataframe$`Adj Close`, na.rm = TRUE)
btcoin_sd <- sd(btcoin_dataframe$`Adj Close`, na.rm = TRUE)
btcoin_min <- min(btcoin_dataframe$`Adj Close`, na.rm = TRUE)
btcoin_max <- max(btcoin_dataframe$`Adj Close`, na.rm = TRUE)
btcoin_first_quantile <- quantile(btcoin_dataframe$`Adj Close`, 0.25, na.rm = TRUE)
btcoin_last_quantile <- quantile(btcoin_dataframe$`Adj Close`, 0.75, na.rm = TRUE)
btcoin_iqr <- IQR(btcoin_dataframe$`Adj Close`, na.rm = TRUE)

# Combining both sets of statistics for comparison
# Note: the "Range" entry is text, so c() stores every value in these columns as character
comparative_statistics <- data.frame(
  statistics_type = c("Mean", "Median", "Mode", "Range", "Standard Deviation", "Min", "Max", "First Quartile (Q1)", "Third Quartile (Q3)", "IQR"),
  sp_500_data = c(sp_500_mean, sp_500_median, sp_500_mode, paste(sp_500_range, collapse = " - "), sp_500_sd, sp_500_min, sp_500_max, sp_500_first_quantile, sp_500_last_quantile, sp_500_iqr),
  btcoin_data = c(btcoin_mean, btcoin_median, btcoin_mode, paste(btcoin_range, collapse = " - "), btcoin_sd, btcoin_min, btcoin_max, btcoin_first_quantile, btcoin_last_quantile, btcoin_iqr)
)

# Displaying the combined statistics
comparative_statistics
```
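
Because the comparison table above mixes numbers with the text "Range" entry, its columns end up as character strings. As a possible alternative, the sketch below keeps every statistic numeric by returning one row per dataset. The helper name `describe_prices()` and the toy values are hypothetical and not part of the original analysis.

```{r}
# Hypothetical helper: returns the statistics as a one-row numeric data frame
describe_prices <- function(x) {
  x <- x[!is.na(x)]            # drop missing values once, up front
  counts <- table(x)           # frequency of each distinct value
  data.frame(
    mean   = mean(x),
    median = median(x),
    # Most frequent value; if several values tie, the smallest one is reported
    mode   = as.numeric(names(counts)[which.max(counts)]),
    sd     = sd(x),
    min    = min(x),
    max    = max(x),
    q1     = unname(quantile(x, 0.25)),
    q3     = unname(quantile(x, 0.75)),
    iqr    = IQR(x)
  )
}

# Made-up prices, used only to show the shape of the output
toy_prices <- c(10, 12, 12, 15, 20, NA)
rbind(toy_a = describe_prices(toy_prices),
      toy_b = describe_prices(toy_prices * 2))
```

With the real data, the same helper could be called as `describe_prices(sp_500_dataframe$Price)` and `describe_prices(btcoin_dataframe$`Adj Close`)`, and the range can be read from the `min` and `max` columns.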

## Task 2

In this task, line plots show the price trends of Bitcoin and the S&P 500 over the six-year period so that their movements can be compared. The correlation between the two series is then calculated separately for each six-month window, using only the dates on which both prices are available. Plotting these correlation values over time shows how the relationship between Bitcoin and the S&P 500 changes from one window to the next.

```{r}
# Plotting the S&P 500 data trend
ggplot(sp_500_dataframe, aes(x = Date, y = Price)) +
  geom_line(color = "pink") +
  labs(title = "S&P 500 Stocks Trend",
       x = "Date",
       y = "Price")
```

```{r}
# Plotting the Bitcoin data trend
ggplot(btcoin_dataframe, aes(x = Date, y = `Adj Close`)) +
  geom_line(color = "skyblue") +
  labs(title = "Bitcoin Trend",
       x = "Date",
       y = "Price (USD)")
```

```{r}
# Calculating correlation every six months and plotting it

# Merging both the datasets
merged_dataframe <- merge(sp_500_dataframe, btcoin_dataframe, by = "Date", all = TRUE)

# Creating a new column for 6-month duration
merged_dataframe <- merged_dataframe %>%
  mutate(six_month_duration = floor_date(Date, "6 months"))

# Calculating the correlation for each 6-month duration
correlation_data <- merged_dataframe %>%
  group_by(six_month_duration) %>%
  summarize(correlation = cor(`Adj Close`, Price, use = "complete.obs"))

# Plotting the correlation
ggplot(correlation_data, aes(x = six_month_duration, y = correlation)) +
  geom_line(color = "purple") +
  labs(title = "Correlation between S&P 500 and Bitcoin prices (six-month windows)",
       x = "Date",
       y = "Correlation")
```
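
The windowing logic above can be checked on simulated data where the number of windows is known in advance. The following is a minimal sketch under stated assumptions: the series `index_x` and `index_y` are random walks invented for illustration, and weekends are blanked out in one of them to mimic a market that is closed on those days.

```{r}
library(dplyr)
library(lubridate)

set.seed(42)  # reproducible made-up data

# Hypothetical daily series for one calendar year (values are simulated)
toy_dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
toy_merged <- data.frame(
  Date    = toy_dates,
  index_x = cumsum(rnorm(length(toy_dates))),
  index_y = cumsum(rnorm(length(toy_dates)))
)

# Blank out Sundays (1) and Saturdays (7) in one series
toy_merged$index_x[wday(toy_merged$Date) %in% c(1, 7)] <- NA

toy_correlation <- toy_merged %>%
  mutate(six_month_duration = floor_date(Date, "6 months")) %>%
  group_by(six_month_duration) %>%
  summarize(
    n_pairs     = sum(!is.na(index_x) & !is.na(index_y)),  # days with both values
    correlation = cor(index_x, index_y, use = "complete.obs")
  )

toy_correlation
```

Reporting `n_pairs` next to each coefficient makes it visible how many paired observations each window actually contains, which is useful when one series has gaps.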

## Task 3

In this task, the correlation between the S&P 500 and Bitcoin prices over the whole period is calculated to explore the relationship between the two datasets. Rows with missing values are removed first, and the correlation coefficient is then computed to measure the strength of the association. A scatter plot with a linear trend line shows the general direction and strength of the relationship between the S&P 500 and Bitcoin values. In the rendered notebook the coefficient is approximately 0.898, which indicates a strong positive linear association between the two price levels.

```{r}
# Removing the rows with missing values from the merged dataframe
clean_dataframe <- merged_dataframe %>%
  filter(!is.na(Price) & !is.na(`Adj Close`))

# Computing the correlation coefficient between both datasets
sp_btcoin_correlation_coefficient <- cor(clean_dataframe$`Adj Close`, clean_dataframe$Price, use = "complete.obs")

# Creating a scatter plot to visualize the correlation
ggplot(clean_dataframe, aes(x = Price, y = `Adj Close`)) +
  geom_point(color = "black", alpha = 0.5) +  # Scatter plot points
  geom_smooth(method = "lm", color = "green", se = FALSE) +  # Add linear trend line
  labs(title = "Correlation visualization between S&P 500 and Bitcoin Prices",
       x = "S&P 500 Price",
       y = "Bitcoin Price") +
  theme_minimal()
```

```{r}
# Displaying the correlation coefficient
sp_btcoin_correlation_coefficient
```
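
To make explicit what the coefficient measures, the sketch below recomputes Pearson's correlation by hand as the covariance divided by the product of the two standard deviations. The vectors `toy_x` and `toy_y` are made-up values used only for illustration.

```{r}
# Made-up paired values; the names toy_x and toy_y are illustrative only
toy_x <- c(1, 2, 3, 4, 5)
toy_y <- c(2, 4, 5, 4, 5)

# Pearson's r: covariance divided by the product of the standard deviations
manual_r <- cov(toy_x, toy_y) / (sd(toy_x) * sd(toy_y))

# cor() uses the Pearson method by default, so the two values should agree
all.equal(manual_r, cor(toy_x, toy_y))
```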

## Task 4

This task evaluates whether the S&P 500 and Bitcoin prices follow a normal distribution. Histograms and Q-Q plots are used to inspect the shape of each distribution visually, and the Shapiro-Wilk test is run on both datasets as a formal test of normality. In the rendered notebook both tests return p-values below 2.2e-16 (W = 0.961 for the S&P 500 and W = 0.90984 for Bitcoin), so the hypothesis of normality is rejected for both price series. This result should guide the choice of methods in any further statistical analysis.

```{r}
# Normal Distribution Analysis

# Visually inspecting by generating histogram for S&P 500
ggplot(sp_500_dataframe, aes(x = Price)) +
  geom_histogram(binwidth = 150, fill = "green", color = "black") +
  labs(title = "Distribution of S&P 500 Price", x = "S&P 500 Price", y = "Frequency") +
  theme_minimal()
```

```{r}
# Q-Q Plot for S&P 500
ggplot(sp_500_dataframe, aes(sample = Price)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot for S&P 500 Price", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal()
```
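
As a rough illustration of what a Q-Q plot compares, the sketch below builds the two sets of quantiles by hand for a simulated sample. The sample `toy_sample` is made up, and the correlation at the end is only an informal summary of how straight the points are, not a substitute for a formal test.

```{r}
set.seed(7)  # reproducible simulated data
toy_sample <- rnorm(50, mean = 100, sd = 15)  # simulated, roughly normal values

# Sample quantiles are the sorted data; theoretical quantiles come from qnorm()
toy_qq <- data.frame(
  theoretical = qnorm(ppoints(length(toy_sample))),
  sample      = sort(toy_sample)
)

# For normal data the points lie close to a straight line, so this is near 1
cor(toy_qq$theoretical, toy_qq$sample)
```

Points that bend away from the line at either end of the plot indicate tails that are heavier or lighter than those of a normal distribution.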

```{r}
# Visually inspecting by generating histogram for Bitcoin
ggplot(btcoin_dataframe, aes(x = `Adj Close`)) +
  geom_histogram(binwidth = 2500, fill = "red", color = "black") +
  labs(title = "Distribution of Bitcoin Price", x = "Bitcoin Price (Adj Close)", y = "Frequency") +
  theme_minimal()
```

```{r}
# Q-Q Plot for Bitcoin
ggplot(btcoin_dataframe, aes(sample = `Adj Close`)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot for Bitcoin Price", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal()
```

```{r}
# Shapiro-Wilk Test for Normality

# Shapiro-Wilk test for S&P 500
sp_500_testing <- shapiro.test(sp_500_dataframe$Price)

# Shapiro-Wilk test for Bitcoin
btcoin_testing <- shapiro.test(btcoin_dataframe$`Adj Close`)

# Displaying the results of the Shapiro-Wilk test
sp_500_testing
btcoin_testing
```
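
The object returned by `shapiro.test()` is a list, so the decision can be read off programmatically instead of from the printed output. The sketch below uses a simulated right-skewed sample and an assumed significance level of 0.05. Neither is taken from the original analysis. Note that `shapiro.test()` only accepts between 3 and 5000 non-missing values.

```{r}
set.seed(123)  # reproducible simulated data

# A strongly right-skewed sample, standing in for a non-normal price series
toy_skewed <- rexp(500, rate = 1)

toy_test <- shapiro.test(toy_skewed)

# Pull the W statistic and the p-value out of the result
toy_w <- unname(toy_test$statistic)
toy_p <- toy_test$p.value

# Assumed significance level of 0.05: reject normality when p falls below it
if (toy_p < 0.05) {
  message("Reject normality: W = ", round(toy_w, 3))
} else {
  message("No evidence against normality: W = ", round(toy_w, 3))
}
```

With large samples the test detects even small departures from normality, so it is reasonable to read the result together with the histograms and Q-Q plots above.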

## References

[1] RMIT (2024). Module 1 materials. Accessed 15 Aug to 5 Sep 2024. Link: <https://rmit.instructure.com/courses/124219/pages/week-1-learning-materials-slash-activities?module_item_id=6322095>

[2] RMIT (2024). Module 2 materials. Accessed 20 Aug to 5 Sep 2024. Link: <https://rmit.instructure.com/courses/124219/pages/week-2-learning-materials-slash-activities?module_item_id=6322100>

[3] QuillBot website, used for paraphrasing my explanations. Accessed 1 Sep to 5 Sep 2024. Link: <https://quillbot.com>

[4] Shapiro, S. S., & Wilk, M. B. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 52(3-4), 591–611. Accessed 1 Sep to 5 Sep 2024. Link: <https://doi.org/10.1093/biomet/52.3-4.591>
