library(readxl)
library(moments)
file_path <- "hw02 data.xlsx"
df <- read_excel(file_path)
## New names:
## • `` -> `...1`
colnames(df)[1] <- "Date"
df$Date <- as.numeric(df$Date)
midpoint <- nrow(df) %/% 2
df_first_half <- df[1:midpoint, ]
df_second_half <- df[(midpoint+1):nrow(df), ]
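# Column-wise mean, SD, skewness, and kurtosis for a data frame of returns.
# skewness() and kurtosis() come from the moments package.
compute_stats <- function(data) {
  stats <- data.frame(
    Mean = colMeans(data, na.rm = TRUE),
    SD = apply(data, 2, sd, na.rm = TRUE),
    Skewness = apply(data, 2, skewness, na.rm = TRUE),
    Kurtosis = apply(data, 2, kurtosis, na.rm = TRUE)
  )
  return(stats)
}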
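# As a sanity check on how to read the Kurtosis column, the sketch below applies
# the same moments functions to simulated normal data. The seed, sample size, and
# variable names are arbitrary, illustrative choices, not part of the assignment.
set.seed(1)
z <- rnorm(1e5)                        # hypothetical standard-normal sample
kurt_normal <- moments::kurtosis(z)    # raw (Pearson) kurtosis: near 3 for normal data, not 0
skew_normal <- moments::skewness(z)    # near 0 for symmetric data
print(c(Skewness = skew_normal, Kurtosis = kurt_normal))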
df_first_half_numeric <- df_first_half[, -1]
df_second_half_numeric <- df_second_half[, -1]
stats_first_half <- compute_stats(df_first_half_numeric)
stats_second_half <- compute_stats(df_second_half_numeric)
print("Summary statistics for the first half:")
## [1] "Summary statistics for the first half:"
print(stats_first_half)
##                 Mean        SD  Skewness  Kurtosis
## SMALL LoBM 0.9797676  8.211286 1.1852484 12.022978
## ME1 BM2    1.1645060  8.398445 1.6200903 15.900982
## SMALL HiBM 1.4797768 10.164258 2.3471818 20.526106
## BIG LoBM   0.7670809  5.706153 0.1649981  9.985251
## ME2 BM2    0.8096833  6.720350 1.7538693 20.724012
## BIG HiBM   1.1905485  8.894354 1.7614099 17.444886
print("Summary statistics for the second half:")
## [1] "Summary statistics for the second half:"
print(stats_second_half)
##                 Mean       SD   Skewness Kurtosis
## SMALL LoBM 1.0019195 6.706768 -0.4116903 5.097896
## ME1 BM2    1.3557365 5.301609 -0.5451070 6.447590
## SMALL HiBM 1.4331846 5.520672 -0.4717360 7.298135
## BIG LoBM   0.9786109 4.700148 -0.3329533 4.982457
## ME2 BM2    1.0632631 4.330084 -0.4586522 5.579832
## BIG HiBM   1.1273446 4.926489 -0.5464995 5.840161
comparison <- stats_first_half - stats_second_half
print("Difference in statistics (First Half - Second Half):")
## [1] "Difference in statistics (First Half - Second Half):"
print(comparison)
##                   Mean       SD  Skewness  Kurtosis
## SMALL LoBM -0.02215187 1.504518 1.5969387  6.925082
## ME1 BM2    -0.19123052 3.096836 2.1651973  9.453393
## SMALL HiBM  0.04659213 4.643586 2.8189177 13.227971
## BIG LoBM   -0.21152996 1.006005 0.4979513  5.002794
## ME2 BM2    -0.25357978 2.390266 2.2125215 15.144180
## BIG HiBM    0.06320393 3.967865 2.3079094 11.604725
# Question: Do the split-half statistics for the six portfolios suggest that returns come from the same distribution over the entire period?
# To assess this, we compare each portfolio's mean, standard deviation, skewness, and kurtosis between the first and second halves of the sample.
# Key Observations
# Mean Returns
# The mean returns are relatively stable: no difference between the two halves exceeds about 0.25 in absolute value, which is small next to standard deviations of roughly 4 to 10.
# The largest differences are for ME2 BM2 (-0.25) and BIG LoBM (-0.21), followed by ME1 BM2 (-0.19).
# Standard Deviation (SD) Differences
# The first half exhibits higher volatility than the second half for all six portfolios.
# The gap is substantial (e.g., SMALL HiBM: 10.16 vs. 5.52), indicating that returns were more volatile in the earlier period.
# Skewness Shift
# All six portfolios have positive skewness in the first half (longer right tails) and negative skewness in the second half (longer left tails).
# This sign change suggests a shift in the shape of the return distribution over time, possibly due to changing market dynamics.
# Kurtosis Differences
# Kurtosis is much higher in the first half, indicating heavier tails (a higher probability of extreme returns). In both halves it exceeds 3, the value for a normal distribution under the raw-kurtosis convention used by moments::kurtosis.
# For example, SMALL HiBM kurtosis drops from 20.53 to 7.30, showing fewer extreme return fluctuations in the later period.
# Conclusion
# The differences in standard deviation, skewness, and kurtosis suggest that returns do not come from the same distribution over the entire period, even though the means are similar.
# The first half exhibits higher volatility, heavier tails, and positive skewness, while the second half shows lower volatility, lighter tails, and negative skewness.
# These results are consistent with a structural change in the return distribution, possibly due to economic, financial, or policy changes over time.
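# The comparison above is descriptive. One way it might be backed by formal tests is
# sketched below: a two-sample Kolmogorov-Smirnov test (stats::ks.test) and an F test
# for equal variances (stats::var.test), run portfolio by portfolio. The helper name
# compare_halves and its column names are hypothetical, not part of the assignment.
# Both tests assume independent observations, and the F test also assumes normality,
# which the high kurtosis values call into question, so the p-values are only a rough guide.
compare_halves <- function(first, second) {
  first <- as.data.frame(first)
  second <- as.data.frame(second)
  rows <- lapply(names(first), function(nm) {
    x <- first[[nm]][!is.na(first[[nm]])]     # drop missing returns
    y <- second[[nm]][!is.na(second[[nm]])]
    ks <- ks.test(x, y)                       # H0: same distribution in both halves
    ft <- var.test(x, y)                      # H0: equal variances (ratio = 1)
    data.frame(
      Portfolio = nm,
      KS_stat = unname(ks$statistic),
      KS_p = ks$p.value,
      F_ratio = unname(ft$statistic),         # var(first) / var(second)
      F_p = ft$p.value
    )
  })
  do.call(rbind, rows)
}
# Apply to the two halves built earlier in this script, if they are available.
if (exists("df_first_half_numeric") && exists("df_second_half_numeric")) {
  print(compare_halves(df_first_half_numeric, df_second_half_numeric))
}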