Question 1–> Load and Clean Data
We began by loading the raw high-frequency data from the file TY_Matteo.txt, which contains intraday prices for the ETH T-Bills index. Each row in the dataset includes a date (DATE), a time (TIME), and the corresponding price (CLOSE).
To prepare the data for analysis, we combined the DATE and TIME columns into a single POSIXct timestamp, which gives each observation a precise, standardized datetime. We then converted the price data into an xts object (data_tbill) indexed by this time vector.
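The timestamp construction described above can be sketched as follows. This is a minimal illustration on a toy data frame, not the code used for the actual file: the column names follow the text, but the date and time formats ("%m/%d/%Y" and "%H:%M"), the UTC time zone, the sample values, and the object names raw, stamp, and toy_tbill are all assumptions.

```r
library(xts)

# Toy stand-in for the raw file. Column names follow the text; the values
# and the date/time formats are assumptions made for illustration only.
raw <- data.frame(
  DATE  = c("01/03/2011", "01/03/2011", "01/04/2011"),
  TIME  = c("09:30", "09:31", "09:30"),
  CLOSE = c(120.50, 120.52, 120.47)
)

# Combine DATE and TIME into one POSIXct timestamp.
stamp <- as.POSIXct(paste(raw$DATE, raw$TIME),
                    format = "%m/%d/%Y %H:%M", tz = "UTC")

# Stop early if any timestamp failed to parse (as.POSIXct returns NA then).
stopifnot(!anyNA(stamp))

# Build the xts object indexed by the timestamp vector.
toy_tbill <- xts(raw$CLOSE, order.by = stamp)
colnames(toy_tbill) <- "CLOSE"
```

Checking for NA timestamps right after parsing is a cheap safeguard, since a wrong format string otherwise produces missing index values without raising an error.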
We confirmed successful conversion by plotting the first 1,000 data points. This plot gives a visual overview of the high-frequency price movements and helps detect any irregularities (such as flat lines or outliers) in the raw data. At this stage, we ensured the timestamp alignment was correct and prepared the data for further processing such as weekend removal and realized volatility estimation.
To ensure accurate analysis of intraday financial data, we removed weekend observations. Using the POSIXlt format, we extracted the weekday component from each timestamp, where 0 represents Sunday and 6 represents Saturday. We then dropped these rows to ensure that only trading days remained. This step is essential to avoid including non-trading periods which would artificially distort volatility and return calculations.
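The weekend filter can be illustrated with a short sketch. It assumes a hypothetical hourly series (toy) running from Friday 2021-01-01 to Monday 2021-01-04 in UTC; the object names and the data are made up, and only the wday convention (0 = Sunday, 6 = Saturday) comes from the text.

```r
library(xts)

# Hypothetical hourly series from Friday 2021-01-01 to Monday 2021-01-04.
stamp <- seq(as.POSIXct("2021-01-01 00:00", tz = "UTC"),
             as.POSIXct("2021-01-04 23:00", tz = "UTC"), by = "hour")
toy <- xts(seq_along(stamp), order.by = stamp)

# Weekday of each timestamp: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
wday <- as.POSIXlt(index(toy))$wday

# Keep Monday to Friday only by dropping rows that fall on 0 or 6.
toy_weekdays <- toy[!(wday %in% c(0, 6))]
```

Note that the weekday is evaluated in the time zone of the index, so a series stored in a different zone than the exchange could classify observations near midnight differently.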
To validate that weekends were successfully removed from the high-frequency dataset, we visually inspected the processed time series by plotting the first 1,000 observations. In high-frequency financial data, weekends often appear as flat horizontal lines in time series plots. This is because prices remain unchanged during non-trading hours, but timestamps may continue if the data includes weekend entries. These flat lines can introduce biases in return and volatility calculations by implying no movement over a long period, which is not representative of market behavior.
After applying a filter to remove all timestamps corresponding to Saturday (wday == 6) and Sunday (wday == 0), we re-plotted the time series. The resulting graph shows continuous price changes without any unnaturally extended flat segments. This indicates that the non-trading weekend periods were correctly excluded and that the dataset now contains weekday observations only, which is essential for accurate realized volatility estimation and return modeling.
# Load High Frequency Data into R
library(highfrequency)
library(xts) #important library
library(readr)
library(ggplot2)
getwd()
## [1] "C:/Users/user/Desktop"
cl = c(rep("factor", 4), rep("double", 4))
# load the data. In this case the delimiter is ','
df_2 <- read.delim("TY_Matteo.txt", sep = ",")