
Practicum Week 10

SECTION A – Import and Inspect the Dataset
SECTION B – Clean the Data
SECTION C – Feature Engineering
SECTION D – Temporal and Rolling Features
SECTION E – Technical Indicators
SECTION F – Categorization and Binning
SECTION G – Detect and Handle Outliers
SECTION H – Encode Categorical Variables
SECTION I – Normalize or Scale Features

Members

Nailatul Wafiroh (52250003)
Dhea Putri Khasanah (52250009)
Christian Michael Juliano (52250011)
Hirose Kawarin Sirait (52250012)
Khafizatun Nisa (52250018)
Roni Kurniawan (52250020)
Anindya Kristianingputri (52250025)
Ahmad Rizki Mubarak (52250036)
Clara Maisie Wanghili (52250039)
Naisya Hafizh Mufidah (52250040)
Ulin Nikmah (52250042)

May 05, 2026

Course Lecturer

Bakti Siregar, M.Sc., CDS.

Student Major in Data Science

Data Science Programming

Institut Teknologi Sains Bandung

Financial markets are dynamic systems in which assets such as stocks, bonds, and derivatives are actively traded. Their movements are influenced by various factors, including economic conditions, global events, and investor sentiment. Therefore, analyzing financial market data is essential to identify patterns, trends, and potential risks and opportunities that support informed decision-making.

To ensure accurate analysis, raw data must undergo a systematic preparation process. This process begins with loading data from sources such as CSV files or APIs, followed by an initial inspection to understand its structure and identify issues like missing values or duplicates. The data is then cleaned and transformed into a suitable format, and additional features may be engineered to enhance its analytical value. Finally, the dataset is normalized or scaled to ensure consistency across variables, making it more suitable for further analysis and modeling.

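Import and Inspect the Dataset

library(DT)

# Read the raw CSV; keep text columns as character rather than factor
df <- read.csv("csv weeks 10.csv", stringsAsFactors = FALSE)

# Remove duplicate columns (first by name, then by identical content);
# drop = FALSE keeps df a data frame even if only one column remains
df <- df[, !duplicated(names(df)), drop = FALSE]
df <- df[, !duplicated(as.list(df)), drop = FALSE]

# Interactive, scrollable preview of the data
datatable(df,
          rownames = FALSE,
          caption = "Dataset Preview",
          options = list(
            pageLength = 10,
            scrollX = TRUE
          ))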

Check Data Types

library(DT)

# =========================
# CHECK DATA TYPES
# =========================
# df is the data frame loaded and de-duplicated in the previous chunk.
# class(x)[1] keeps one class name per column (some classes, e.g. POSIXct, have two)
tipe_df <- data.frame(
  Column = names(df),
  Type   = vapply(df, function(x) class(x)[1], character(1)),
  row.names = NULL
)

# =========================
# SHOW THE DATA TYPE TABLE
# =========================
datatable(tipe_df,
          rownames = FALSE,
          caption = "Data Type of Each Column",
          options = list(
            pageLength = 10,
            scrollX = TRUE
          ))

Check Missing Values

# Table of missing values per column (counts computed once and reused)
na_counts <- colSums(is.na(df))
missing_df <- data.frame(
  Column  = names(df),
  Missing = na_counts,
  Status  = ifelse(na_counts == 0, "Clean", "Has missing values"),
  row.names = NULL
)
knitr::kable(missing_df, row.names = FALSE)

Column	Missing	Status
X	0	Clean
Stock_ID	0	Clean
Date	0	Clean
Stock_Price	0	Clean
Volume_Traded	0	Clean
Market_Cap	0	Clean
PE_Ratio	0	Clean
Dividend_Yield	0	Clean
Return_on_Equity	0	Clean
Sector	0	Clean
Performance	0	Clean

Check Duplicate Data

library(DT)

# Count rows that exactly duplicate an earlier row
duplicate_rows <- sum(duplicated(df))

# Put the result into a small data frame
dup_table <- data.frame(
  Description = "Number of duplicate rows",
  Value       = duplicate_rows
)

# Show it as a DataTable (dom = 't' renders only the table, without search or paging controls)
datatable(dup_table,
          rownames = FALSE,
          caption = "Duplicate Data Check",
          options = list(dom = 't'))

The dataset was loaded successfully with 500 rows and 11 columns (10 data columns plus the CSV index column X). It contains no duplicate columns or rows. The numeric columns already have the correct types, but the Date column is still stored as character and must be converted. The Sector and Performance columns are categorical and will be encoded in a later step. No missing values were found in any column, so the dataset is clean and ready for further processing.

Clean the Data

Date type after conversion: Date

Date range: 2020-01-02 to 2024-12-29

CLEANING RESULTS

Step	R_Command	Result	Status
Convert Date	as.Date()	2020-01-02 to 2024-12-29	Required
Forward-fill prices	zoo::na.locf()	No missing values (preventive)	Preventive
Numeric types	mutate(as.numeric() / as.integer())	All numeric columns valid	OK
Remove duplicates	distinct()	500 rows remain	OK
Filter sector/date	filter()	Optional (as needed)	Optional

COLUMN TYPES AFTER CLEANING

Column	R_Type	Description
X	integer	Default CSV row index
Stock_ID	character	Unique stock ID
Date	Date	Date (Date class)
Stock_Price	numeric	Stock price
Volume_Traded	integer	Trading volume
Market_Cap	numeric	Market capitalization
PE_Ratio	numeric	Price-to-earnings ratio
Dividend_Yield	numeric	Dividend yield (%)
Return_on_Equity	numeric	Return on equity (%)
Sector	character	Sector category
Performance	character	Stock performance label

The data cleaning process converts the Date column from character to R's Date class. It applies a forward fill to Stock_Price as a preventive measure, since no values were actually missing. It also enforces numeric types on the numeric columns, removes duplicate rows, and validates the result with the summary tables above. The dataset keeps all 500 rows, has consistent column types, and is ready for further analysis.
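The cleaning steps can be sketched in base R as follows. This is a minimal illustration on a small hypothetical data frame (`raw`), not the practicum's actual code. `fill_forward()` is a hypothetical base-R stand-in for `zoo::na.locf()`, and `unique()` plays the role of `dplyr::distinct()`. The forward fill runs within each stock (via `ave()`), so a price is never carried over from a different Stock_ID.

```r
# Hypothetical toy data: one duplicated row, one missing price, dates stored as text
raw <- data.frame(
  Stock_ID    = c("A", "A", "A", "A", "B", "B"),
  Date        = c("2020-01-03", "2020-01-02", "2020-01-06",
                  "2020-01-03", "2020-01-02", "2020-01-03"),
  Stock_Price = c("101", "100", NA, "101", "50", "51"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: last observation carried forward (stand-in for zoo::na.locf).
# Leading NAs stay NA because there is no earlier value to carry.
fill_forward <- function(x) {
  last_seen <- cumsum(!is.na(x))            # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[last_seen + 1]        # index 1 (the NA) when none seen yet
}

clean <- raw
clean$Date        <- as.Date(clean$Date)              # character -> Date ("%Y-%m-%d")
clean$Stock_Price <- as.numeric(clean$Stock_Price)    # character -> numeric
clean <- unique(clean)                                 # drop exact duplicate rows
clean <- clean[order(clean$Stock_ID, clean$Date), ]    # chronological order per stock
rownames(clean) <- NULL

# Forward-fill prices separately within each stock
clean$Stock_Price <- ave(clean$Stock_Price, clean$Stock_ID, FUN = fill_forward)

print(clean)
```

Sorting before filling matters. A forward fill is only meaningful in chronological order within each stock.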

Feature Engineering

FEATURE ENGINEERING RESULTS

Stock_ID	Date	Stock_Price	Daily_Return	Log_Return	Lag1_Price	Lag2_Price	Lag3_Price	Lag1_Return	Lag2_Return	Lag3_Return
04npfK5VJaMx	2024-11-05	623.37	NA	NA	NA	NA	NA	NA	NA	NA
05VB964gZRxN	2020-04-16	1396.35	NA	NA	NA	NA	NA	NA	NA	NA
096aNW0N1B2N	2023-05-20	1132.60	NA	NA	NA	NA	NA	NA	NA	NA
0PlvCaxPuSd8	2021-11-12	862.19	NA	NA	NA	NA	NA	NA	NA	NA
0TRPr5Qq7IkZ	2021-10-08	566.91	NA	NA	NA	NA	NA	NA	NA	NA
0UCD8GLy9tkE	2020-06-18	1047.47	NA	NA	NA	NA	NA	NA	NA	NA
0Y5sFUACczC6	2023-12-04	1302.99	NA	NA	NA	NA	NA	NA	NA	NA
0ZxLI3CeGOrs	2020-05-25	1176.39	NA	NA	NA	NA	NA	NA	NA	NA
0ac4ifBIqqi4	2020-03-18	1384.52	NA	NA	NA	NA	NA	NA	NA	NA
0cbc9OWf3kpy	2022-09-11	236.70	NA	NA	NA	NA	NA	NA	NA	NA

SUMMARY OF NEW FEATURES

Feature	Formula	Description
Daily_Return	(P_t - P_(t-1)) / P_(t-1)	Daily stock return
Log_Return	log(P_t / P_(t-1))	Daily log return
Lag1_Price	lag(Stock_Price, 1)	Price 1 day earlier
Lag2_Price	lag(Stock_Price, 2)	Price 2 days earlier
Lag3_Price	lag(Stock_Price, 3)	Price 3 days earlier
Lag1_Return	lag(Daily_Return, 1)	Return 1 day earlier
Lag2_Return	lag(Daily_Return, 2)	Return 2 days earlier
Lag3_Return	lag(Daily_Return, 3)	Return 3 days earlier

Total columns now: 19; total rows: 500

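The feature engineering step first sorts the data by Stock_ID and Date. It then adds Daily_Return, the simple percentage change from the previous price, and Log_Return, the logarithmic price change, which is additive over time and convenient for statistical modeling. It also adds lag features holding the price and return from 1 to 3 observations earlier, to capture historical patterns. These features only carry information when a stock has several dated observations. In this run every one of them is NA, which indicates that each Stock_ID appears in only one row: no stock has an earlier price within its own group. With repeated observations per stock, the same features would support forecasting and time series modeling.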
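A minimal base-R sketch of the return and lag features follows, assuming the rows are already sorted by Stock_ID and Date. `lag_vec()` and `lag_within()` are hypothetical helpers; the document's table uses `lag()`, for example from dplyr. `lag_within()` shifts values inside each stock group via `ave()`. The hypothetical single-row stock `C` shows why every feature is NA when a Stock_ID has only one observation.

```r
# Hypothetical toy panel: stock A has three dated prices, stock C only one
prices <- data.frame(
  Stock_ID    = c("A", "A", "A", "C"),
  Date        = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06", "2020-01-02")),
  Stock_Price = c(100, 110, 99, 40)
)
prices <- prices[order(prices$Stock_ID, prices$Date), ]

# Shift a numeric vector k steps back; the first k positions become NA
lag_vec <- function(x, k = 1) {
  n <- length(x)
  if (k >= n) return(rep(NA_real_, n))
  c(rep(NA_real_, k), x[seq_len(n - k)])
}

# Hypothetical helper: apply lag_vec separately within each stock
lag_within <- function(x, id, k = 1) ave(x, id, FUN = function(v) lag_vec(v, k))

id <- prices$Stock_ID
prices$Lag1_Price   <- lag_within(prices$Stock_Price, id, 1)
prices$Lag2_Price   <- lag_within(prices$Stock_Price, id, 2)
prices$Daily_Return <- prices$Stock_Price / prices$Lag1_Price - 1    # (P_t - P_(t-1)) / P_(t-1)
prices$Log_Return   <- log(prices$Stock_Price / prices$Lag1_Price)   # log(P_t / P_(t-1))
prices$Lag1_Return  <- lag_within(prices$Daily_Return, id, 1)

# Diagnostic worth running on the real data: rows per stock
print(table(prices$Stock_ID))
print(prices)
```

Running `table(df$Stock_ID)` on the practicum data before computing lags is a quick way to confirm whether any stock has more than one row.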

Temporal and Rolling Features

## === ROLLING FEATURES RESULTS (first 10 rows) ===

Stock_ID	Date	Stock_Price	Daily_Return	RollingMean_5	RollingMean_20	Volatility_5	Volatility_20
04npfK5VJaMx	2024-11-05	623.37	NA	NA	NA	NA	NA
05VB964gZRxN	2020-04-16	1396.35	NA	NA	NA	NA	NA
096aNW0N1B2N	2023-05-20	1132.60	NA	NA	NA	NA	NA
0PlvCaxPuSd8	2021-11-12	862.19	NA	NA	NA	NA	NA
0TRPr5Qq7IkZ	2021-10-08	566.91	NA	NA	NA	NA	NA
0UCD8GLy9tkE	2020-06-18	1047.47	NA	NA	NA	NA	NA
0Y5sFUACczC6	2023-12-04	1302.99	NA	NA	NA	NA	NA
0ZxLI3CeGOrs	2020-05-25	1176.39	NA	NA	NA	NA	NA
0ac4ifBIqqi4	2020-03-18	1384.52	NA	NA	NA	NA	NA
0cbc9OWf3kpy	2022-09-11	236.70	NA	NA	NA	NA	NA

## 
##  ROLLING FEATURES SUMMARY

Feature	Window	Formula	R_Function	Description
RollingMean_5	5 days	(1/n) * sum(P_(t-i)), i = 0..n-1	rollmean(Stock_Price, k=5)	Average price over the last 5 days
RollingMean_20	20 days	(1/n) * sum(P_(t-i)), i = 0..n-1	rollmean(Stock_Price, k=20)	Average price over the last 20 days (about 1 month of trading)
RollingMean_50	50 days	(1/n) * sum(P_(t-i)), i = 0..n-1	rollmean(Stock_Price, k=50)	Average price over the last 50 days (about 2.5 months of trading)
RollingMean_200	200 days	(1/n) * sum(P_(t-i)), i = 0..n-1	rollmean(Stock_Price, k=200)	Average price over the last 200 days (roughly 10 months of trading; a standard long-term average)
Volatility_5	5 days	sqrt((1/(n-1)) * sum((R_(t-i) - R_bar)^2))	rollapply(Daily_Return, 5, sd)	Return volatility over the last 5 days
Volatility_20	20 days	sqrt((1/(n-1)) * sum((R_(t-i) - R_bar)^2))	rollapply(Daily_Return, 20, sd)	Return volatility over the last 20 days
Volatility_50	50 days	sqrt((1/(n-1)) * sum((R_(t-i) - R_bar)^2))	rollapply(Daily_Return, 50, sd)	Return volatility over the last 50 days
Volatility_200	200 days	sqrt((1/(n-1)) * sum((R_(t-i) - R_bar)^2))	rollapply(Daily_Return, 200, sd)	Return volatility over the last 200 days

## 
## === ROLLING FEATURES DONE! ===

## Total columns now : 27

## Total rows        : 500

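The temporal and rolling step sorts the data by Stock_ID and Date. It then adds rolling means (moving averages) over 5, 20, 50, and 200 observations to capture short- to long-term price trends. It also adds rolling volatility, the standard deviation of daily returns, over the same windows to measure risk and fluctuation. The preview shows only the 5- and 20-day columns, while the summary lists all eight. Because each Stock_ID has only a single row in this dataset, no window can be filled and every rolling feature is NA in this run. These features become informative once each stock has at least as many dated observations as the window length.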
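Right-aligned rolling statistics can be sketched in base R. `roll_right()` is a hypothetical helper: the output at position t uses only the k most recent values up to and including t, and the first k - 1 positions are NA. The practicum may have used `zoo::rollmean()` or `zoo::rollapply()` instead. Their default alignment is centred, so `align = "right"` (and `fill = NA` to keep the original length) is needed to avoid using future prices. The toy numbers are hypothetical. `sd()` uses the n - 1 denominator, as in the volatility formula above.

```r
# Hypothetical helper: right-aligned rolling window of width k
roll_right <- function(x, k, f) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= k) {
    for (i in k:n) out[i] <- f(x[(i - k + 1):i])   # window ends at position i
  }
  out
}

# Hypothetical toy series for a single stock, already sorted by Date
price <- c(10, 11, 12, 13, 15)
ret   <- c(NA, 0.01, 0.03, 0.02, 0.04)   # first return is NA (no previous price)

rolling_mean_3 <- roll_right(price, 3, mean)   # approximately NA, NA, 11, 12, 13.33
volatility_3   <- roll_right(ret, 3, sd)       # NA while the window still holds the NA

# Per-stock usage: apply the helper within each Stock_ID group
df_toy <- data.frame(Stock_ID    = c("A", "A", "A", "B", "B"),
                     Stock_Price = c(10, 11, 12, 20, 22))
df_toy$RollingMean_2 <- ave(df_toy$Stock_Price, df_toy$Stock_ID,
                            FUN = function(v) roll_right(v, 2, mean))
print(df_toy)
```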

Technical Indicators, Categorization and Binning

## === CATEGORIZATION & BINNING RESULTS (first 10 rows) ===

Stock_ID	Date	Stock_Price	Daily_Return	Return_Category	Volatility_20	Volatility_Category	Flag_Above_MA5	Flag_Above_MA20	Flag_Golden_Cross	Flag_Death_Cross
04npfK5VJaMx	2024-11-05	623.37	NA	NA	NA	NA	NA	NA	NA	NA
05VB964gZRxN	2020-04-16	1396.35	NA	NA	NA	NA	NA	NA	NA	NA
096aNW0N1B2N	2023-05-20	1132.60	NA	NA	NA	NA	NA	NA	NA	NA
0PlvCaxPuSd8	2021-11-12	862.19	NA	NA	NA	NA	NA	NA	NA	NA
0TRPr5Qq7IkZ	2021-10-08	566.91	NA	NA	NA	NA	NA	NA	NA	NA
0UCD8GLy9tkE	2020-06-18	1047.47	NA	NA	NA	NA	NA	NA	NA	NA
0Y5sFUACczC6	2023-12-04	1302.99	NA	NA	NA	NA	NA	NA	NA	NA
0ZxLI3CeGOrs	2020-05-25	1176.39	NA	NA	NA	NA	NA	NA	NA	NA
0ac4ifBIqqi4	2020-03-18	1384.52	NA	NA	NA	NA	NA	NA	NA	NA
0cbc9OWf3kpy	2022-09-11	236.70	NA	NA	NA	NA	NA	NA	NA	NA

## 
## === RETURN CATEGORY DISTRIBUTION ===

Category	Count	Percentage
NA	500	100%

## 
## === VOLATILITY CATEGORY DISTRIBUTION ===

Category	Count	Percentage
NA	500	100%

## 
## === SUMMARY OF NEW FEATURES ===

Feature	Type	Possible_Values	Description
Return_Category	Category	gain / neutral / loss	Direction of the daily return
Volatility_Category	Category	low / medium / high	Risk level based on volatility
Flag_Above_MA5	0/1 flag	1 = above MA5, 0 = below	Very short-term trend signal
Flag_Above_MA20	0/1 flag	1 = above MA20, 0 = below	Short-term trend signal
Flag_Above_MA50	0/1 flag	1 = above MA50, 0 = below	Medium-term trend signal
Flag_Above_MA200	0/1 flag	1 = above MA200, 0 = below	Long-term trend signal
Flag_Golden_Cross	0/1 flag	1 = MA5 > MA20 (bullish)	Potential buy signal
Flag_Death_Cross	0/1 flag	1 = MA5 < MA20 (bearish)	Potential sell signal

## 
## === CATEGORIZATION & BINNING DONE! ===

## Total columns now : 35

## Total rows        : 500

The categorization and binning step generates features that make stock market data easier to interpret. Daily returns are classified as gain, neutral, or loss, indicating the direction of each price movement. Volatility is grouped into three levels (low, medium, high) based on quantiles of its distribution, representing relative market risk. Several moving-average indicators capture trend signals: the position of the price relative to short- to long-term moving averages, and golden cross and death cross flags. Here those flags are defined as MA5 above or below MA20, a simplified version of the classic signal, which usually refers to the 50-day average crossing the 200-day average. Together these transformations turn raw numeric columns into categorical features and technical indicators. In this run, however, every category and flag is NA (both distributions show 500 NA rows), because the underlying returns, volatilities, and moving averages are themselves NA.
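The categorization and trend flags can be sketched in base R. The ±0.5% neutral band in `categorize_return()` is a hypothetical choice, since the document does not state its threshold. The volatility levels use tertiles from `quantile()` and `cut()`, in line with the quantile-based grouping described above. `Flag_Golden_Cross` follows the document's definition (MA5 above MA20), which is a state flag. A crossover event would additionally require MA5 to have been at or below MA20 on the previous observation. All helper names and toy values are hypothetical.

```r
# Hypothetical threshold: returns within +/- band count as "neutral"
categorize_return <- function(r, band = 0.005) {
  lab <- ifelse(r > band, "gain", ifelse(r < -band, "loss", "neutral"))
  factor(lab, levels = c("loss", "neutral", "gain"))
}

# Tertile-based volatility levels; an all-NA input cannot be binned
categorize_volatility <- function(v) {
  lv <- c("low", "medium", "high")
  if (all(is.na(v))) return(factor(rep(NA, length(v)), levels = lv))
  br <- quantile(v, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)
  cut(v, breaks = br, labels = lv, include.lowest = TRUE)
}

ret_cat <- categorize_return(c(0.02, -0.01, 0.001, NA))
vol_cat <- categorize_volatility(c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06))

# Trend flags: 1 if the price (or MA5) is above the reference moving average
price <- c(10, 12, 11)
ma5   <- c(11, 11, 11.5)
ma20  <- c(10.5, 11.2, 11.6)
flag_above_ma5    <- as.integer(price > ma5)
flag_golden_cross <- as.integer(ma5 > ma20)   # state flag, as defined in the document
flag_death_cross  <- as.integer(ma5 < ma20)
```

If many volatility values are identical, the quantile breaks can coincide and `cut()` will stop with a "'breaks' are not unique" error. That case would need its own guard.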

Detect and Handle Outliers

## === IQR BOUNDS PER COLUMN ===

Column	Q1	Q3	IQR	Lower_Bound	Upper_Bound
Stock_Price	416.3625	1135.035	718.6725	-661.6463	2213.044
Daily_Return	NA	NA	NA	NA	NA
Volume_Traded	236653.5000	737370.500	500717.0000	-514422.0000	1488446.000

## 
## === PERCENTAGE OF DETECTED OUTLIERS ===

Column	Pct_Zscore	Pct_IQR
Stock_Price	0%	0%
Daily_Return	NaN%	NaN%
Volume_Traded	0%	0%

## 
## === MARKET CRASH & BUBBLE EVENTS ===

Event	Condition	Count	Recommendation
Market Crash	Daily_Return < -5%	0	Keep; represents a real crash
Market Bubble	Daily_Return > +5%	0	Keep; represents an extreme rally

## 
## === EXAMPLE ROWS FLAGGED AS OUTLIERS (Z-Score) ===

Stock_ID	Date	Stock_Price	Daily_Return	Volume_Traded	Zscore_Price	Zscore_Return	Flag_Market_Crash	Flag_Market_Bubble

## 
## === OUTLIER DETECTION DONE! ===

## Total columns now : 46

## Total rows        : 500

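The outlier step identifies extreme values with two rules and adds flags for the price, return, and volume columns. The first rule is the Z-score (|Z| > 3). The second is the IQR fence rule: a value is flagged if it lies below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The step then applies financial logic: days with a return below -5% (market crash) or above +5% (market bubble) are flagged but kept, since they represent real market events rather than noise. In this run no outliers were found in Stock_Price or Volume_Traded (0% under both rules). Their lower IQR bounds are negative, so only the upper bounds can ever apply. The return-based checks are undefined (NaN%) because Daily_Return is entirely NA. The flags nonetheless give the pipeline a way to separate noise from significant anomalies once returns are available.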
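The two detection rules and the crash/bubble flags can be sketched in base R. `flag_outliers()` is a hypothetical helper that uses |Z| > 3 and the 1.5 × IQR fences. The 1.5 multiplier reproduces the bounds in the IQR table above; for Stock_Price, 416.3625 - 1.5 × 718.6725 = -661.6463. The ±5% thresholds follow the document. The toy series is hypothetical.

```r
# Hypothetical helper: Z-score and IQR-fence outlier flags for one numeric column
flag_outliers <- function(x, z_cut = 3, k = 1.5) {
  z   <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  q   <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  data.frame(
    Zscore   = z,
    Flag_Z   = as.integer(abs(z) > z_cut),
    Flag_IQR = as.integer(x < q[1] - k * iqr | x > q[2] + k * iqr)
  )
}

x   <- c(1:19, 100)          # hypothetical series with one extreme value
res <- flag_outliers(x)

# Flag, but keep, extreme daily returns using the document's +/-5% rule
ret <- c(-0.07, 0.01, 0.06, NA)
Flag_Market_Crash  <- as.integer(ret < -0.05)
Flag_Market_Bubble <- as.integer(ret >  0.05)
```

Only the value 100 exceeds both rules here. Its Z-score is about 4.1, and it lies above the upper fence of 29.5.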

Encode Categorical Variables

## === CATEGORICAL COLUMNS TO BE ENCODED ===

Column	Type	Unique_Values	Unique_Count
Sector	Category	Finance, Consumer Goods, Technology, Healthcare, Energy	5
Performance	Category	Negative, Stable, Positive	3
Return_Category	Category		0
Volatility_Category	Category		0

## 
## === LABEL ENCODING RESULTS (first 10 rows) ===

Stock_ID	Sector	Sector_Label	Performance	Performance_Label	Return_Category	Return_Label	Volatility_Category	Volatility_Label
04npfK5VJaMx	Finance	2	Negative	0	NA	NA	NA	NA
05VB964gZRxN	Finance	2	Stable	1	NA	NA	NA	NA
096aNW0N1B2N	Consumer Goods	5	Stable	1	NA	NA	NA	NA
0PlvCaxPuSd8	Technology	1	Negative	0	NA	NA	NA	NA
0TRPr5Qq7IkZ	Technology	1	Negative	0	NA	NA	NA	NA
0UCD8GLy9tkE	Healthcare	3	Stable	1	NA	NA	NA	NA
0Y5sFUACczC6	Energy	4	Negative	0	NA	NA	NA	NA
0ZxLI3CeGOrs	Healthcare	3	Stable	1	NA	NA	NA	NA
0ac4ifBIqqi4	Consumer Goods	5	Negative	0	NA	NA	NA	NA
0cbc9OWf3kpy	Finance	2	Negative	0	NA	NA	NA	NA

## 
## === ONE-HOT ENCODING RESULTS - SECTOR (first 10 rows) ===

Stock_ID	Sector	Sector_Technology	Sector_Finance	Sector_Healthcare	Sector_Energy	Sector_ConsumerGoods
04npfK5VJaMx	Finance	0	1	0	0	0
05VB964gZRxN	Finance	0	1	0	0	0
096aNW0N1B2N	Consumer Goods	0	0	0	0	1
0PlvCaxPuSd8	Technology	1	0	0	0	0
0TRPr5Qq7IkZ	Technology	1	0	0	0	0
0UCD8GLy9tkE	Healthcare	0	0	1	0	0
0Y5sFUACczC6	Energy	0	0	0	1	0
0ZxLI3CeGOrs	Healthcare	0	0	1	0	0
0ac4ifBIqqi4	Consumer Goods	0	0	0	0	1
0cbc9OWf3kpy	Finance	0	1	0	0	0

## 
## === ONE-HOT ENCODING RESULTS - PERFORMANCE (first 10 rows) ===

Stock_ID	Performance	Performance_Stable	Performance_Negative
04npfK5VJaMx	Negative	0	1
05VB964gZRxN	Stable	1	0
096aNW0N1B2N	Stable	1	0
0PlvCaxPuSd8	Negative	0	1
0TRPr5Qq7IkZ	Negative	0	1
0UCD8GLy9tkE	Stable	1	0
0Y5sFUACczC6	Negative	0	1
0ZxLI3CeGOrs	Stable	1	0
0ac4ifBIqqi4	Negative	0	1
0cbc9OWf3kpy	Negative	0	1

## 
## === ENCODING SUMMARY ===

Source_Column	Label_Method	OneHot_Method	When_to_Use
Sector	Sector_Label (1-5)	Sector_Technology/Finance/Healthcare/Energy/ConsumerGoods	Nominal → one-hot preferred
Performance	Performance_Label (0-2)	Performance_Positive/Stable/Negative	Ordinal → label encoding OK
Return_Category	Return_Label (0-2)	Return_Gain/Neutral/Loss	Ordinal → label encoding OK
Volatility_Category	Volatility_Label (0-2)	Vol_Low/Medium/High	Ordinal → label encoding OK

## 
## === ENCODING DONE! ===

## Total columns now : 64

## Total rows        : 500

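The encoding step converts the categorical columns Sector, Performance, Return_Category, and Volatility_Category into numeric form. Label encoding maps each category to an integer code. This suits the ordinal variables (Performance, Return_Category, Volatility_Category) because the codes follow their ranking. One-hot encoding creates one binary 0/1 column per category, which suits the nominal Sector variable. The label-encoded Sector_Label (1 to 5) implies an order that does not exist, so it should not be fed to models as a numeric feature. Return_Category and Volatility_Category have no observed values in this run (0 unique values), so their encoded columns are NA. With these encodings, the categorical information is in a numeric form that machine learning algorithms can use directly.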
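Both encodings can be sketched in base R. For the ordinal Performance column, the level order is set explicitly so the codes follow the ranking. This gives Negative = 0, Stable = 1, Positive = 2, consistent with the labels shown above for Negative and Stable. For the nominal Sector column, `one_hot()` is a hypothetical helper built on `model.matrix(~ f - 1)`, which yields one 0/1 column per level. The column names are formatted like the document's (for example, `Sector_ConsumerGoods`). The toy vectors are hypothetical.

```r
# Ordinal: explicit level order so the integer codes respect the ranking
perf <- c("Negative", "Stable", "Positive", "Stable")
perf_label <- as.integer(factor(perf, levels = c("Negative", "Stable", "Positive"))) - 1L

# Nominal: one 0/1 indicator column per category, with no implied order
one_hot <- function(x, prefix) {
  f <- factor(x)
  m <- model.matrix(~ f - 1)                      # "- 1" drops the intercept column
  colnames(m) <- paste0(prefix, "_", gsub(" ", "", levels(f)))
  as.data.frame(m)
}

sector    <- c("Finance", "Consumer Goods", "Technology", "Finance")
sector_oh <- one_hot(sector, "Sector")
print(perf_label)
print(sector_oh)
```

`model.matrix()` drops rows with NA by default. A column such as Return_Category, which is all NA here, would therefore need separate handling before one-hot encoding.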

Normalize or Scale Features

## === Z-SCORE NORMALIZATION RESULTS (first 10 rows) ===

Stock_ID	Date	Daily_Return	Znorm_Daily_Return	Volume_Traded	Znorm_Volume_Traded	Volatility_20	Znorm_Volatility_20
04npfK5VJaMx	2024-11-05	NA	NA	730528	0.8278	NA	NA
05VB964gZRxN	2020-04-16	NA	NA	368549	-0.4170	NA	NA
096aNW0N1B2N	2023-05-20	NA	NA	438714	-0.1757	NA	NA
0PlvCaxPuSd8	2021-11-12	NA	NA	771521	0.9688	NA	NA
0TRPr5Qq7IkZ	2021-10-08	NA	NA	438306	-0.1771	NA	NA
0UCD8GLy9tkE	2020-06-18	NA	NA	251454	-0.8196	NA	NA
0Y5sFUACczC6	2023-12-04	NA	NA	864852	1.2897	NA	NA
0ZxLI3CeGOrs	2020-05-25	NA	NA	184805	-1.0488	NA	NA
0ac4ifBIqqi4	2020-03-18	NA	NA	165972	-1.1136	NA	NA
0cbc9OWf3kpy	2022-09-11	NA	NA	200755	-0.9940	NA	NA

## 
## === MIN-MAX SCALING RESULTS (first 10 rows) ===

Stock_ID	Date	Stock_Price	MinMax_Stock_Price	Daily_Return	MinMax_Daily_Return	Volume_Traded	MinMax_Volume_Traded	Volatility_20	MinMax_Volatility_20
04npfK5VJaMx	2024-11-05	623.37	0.3738	NA	NA	730528	0.7302	NA	NA
05VB964gZRxN	2020-04-16	1396.35	0.9265	NA	NA	368549	0.3676	NA	NA
096aNW0N1B2N	2023-05-20	1132.60	0.7379	NA	NA	438714	0.4379	NA	NA
0PlvCaxPuSd8	2021-11-12	862.19	0.5446	NA	NA	771521	0.7713	NA	NA
0TRPr5Qq7IkZ	2021-10-08	566.91	0.3334	NA	NA	438306	0.4375	NA	NA
0UCD8GLy9tkE	2020-06-18	1047.47	0.6771	NA	NA	251454	0.2503	NA	NA
0Y5sFUACczC6	2023-12-04	1302.99	0.8598	NA	NA	864852	0.8648	NA	NA
0ZxLI3CeGOrs	2020-05-25	1176.39	0.7692	NA	NA	184805	0.1836	NA	NA
0ac4ifBIqqi4	2020-03-18	1384.52	0.9181	NA	NA	165972	0.1647	NA	NA
0cbc9OWf3kpy	2022-09-11	236.70	0.0973	NA	NA	200755	0.1996	NA	NA

## 
## === Z-SCORE STATISTICS CHECK (mean ≈ 0, sd ≈ 1) ===

Column	Mean	SD	Min	Max	Status
Znorm_Daily_Return	NaN	NA	Inf	-Inf	✓ mean ≈ 0, sd ≈ 1
Znorm_Log_Return	NaN	NA	Inf	-Inf	✓ mean ≈ 0, sd ≈ 1
Znorm_Volume_Traded	0	1	-1.6791	1.754	✓ mean ≈ 0, sd ≈ 1
Znorm_Volatility_20	NaN	NA	Inf	-Inf	✓ mean ≈ 0, sd ≈ 1
Znorm_Volatility_50	NaN	NA	Inf	-Inf	✓ mean ≈ 0, sd ≈ 1

## 
## === MIN-MAX STATISTICS CHECK (range [0, 1]) ===

Column	Min	Max	Status
MinMax_Stock_Price	0	1	✓ range [0, 1]
MinMax_Daily_Return	Inf	-Inf	✓ range [0, 1]
MinMax_Volume_Traded	0	1	✓ range [0, 1]
MinMax_Market_Cap	0	1	✓ range [0, 1]
MinMax_Volatility_20	Inf	-Inf	✓ range [0, 1]
MinMax_PE_Ratio	0	1	✓ range [0, 1]
MinMax_Dividend_Yield	0	1	✓ range [0, 1]
MinMax_ROE	0	1	✓ range [0, 1]

## 
## === SUMMARY OF NORMALIZATION / SCALING METHODS ===

Method	Formula	Result_Range	Applied_Columns	Suitable_For
Z-Score Normalization	Z = (x - mean) / sd	Unbounded (mean = 0, sd = 1)	Daily_Return, Log_Return, Volume_Traded, Volatility_20, Volatility_50	Statistical models, regression, SVM, PCA
Min-Max Scaling	X = (x - min) / (max - min)	[0, 1]	Stock_Price, Daily_Return, Volume_Traded, Market_Cap, Volatility_20, PE_Ratio, Dividend_Yield, ROE	Deep learning, neural networks, KNN, distance-based models

## 
## === NORMALIZATION / SCALING DONE! ===

## Total columns now : 77

## Total rows        : 500

The normalization and scaling step puts variables on comparable scales for analysis and modeling. Z-score normalization transforms returns, volume, and volatility to have mean 0 and standard deviation 1, which reduces the effect of differing units and magnitudes on statistical models. Min-Max scaling rescales selected features to the fixed range [0, 1]. This suits algorithms that are sensitive to magnitude, such as neural networks and distance-based models. The verification tables confirm the expected statistics for the columns that contain data. Znorm_Volume_Traded has mean 0 and SD 1, and the Min-Max columns for price, volume, market cap, P/E ratio, dividend yield, and ROE span [0, 1]. The return and volatility columns, however, are entirely NA, which is why their statistics appear as NaN, NA, Inf, and -Inf. The ✓ printed in their Status column is therefore not meaningful: these columns were not actually scaled.
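The two scalers can be sketched in base R together with a stricter status check. `z_score()`, `min_max()`, and `scale_status()` are hypothetical helpers, and the toy values are hypothetical. For an all-NA column, `mean(x, na.rm = TRUE)` returns NaN, and `range(x, na.rm = TRUE)` returns Inf and -Inf with a warning. This matches the pattern in the return and volatility rows of the verification tables, so the status check reports that case instead of printing a tick.

```r
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

min_max <- function(x) {
  if (all(is.na(x))) return(x)                 # nothing to scale
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])             # NaN if all values are equal
}

# Hypothetical check: only report success when the statistics actually match
scale_status <- function(z, tol = 1e-8) {
  if (all(is.na(z))) return("not scaled: column is all NA")
  ok <- abs(mean(z, na.rm = TRUE)) < tol && abs(sd(z, na.rm = TRUE) - 1) < tol
  if (ok) "mean = 0, sd = 1" else "check failed"
}

volume <- c(100, 200, 300)                  # hypothetical values
ret    <- c(NA_real_, NA_real_, NA_real_)   # all-NA column, like Daily_Return in this run

z_vol  <- z_score(volume)
mm_vol <- min_max(volume)
print(scale_status(z_vol))
print(scale_status(ret))
```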

Conclusion:

Overall, the data preparation process was carried out comprehensively and systematically, turning the raw financial dataset into a clean, structured, analysis-ready form. Loading and inspection showed the dataset to be complete and free of duplicates, and cleaning ensured consistent data types and structure. Subsequent steps enriched the dataset through feature engineering, including return metrics, lag variables, rolling statistics, and technical indicators designed to capture temporal patterns and market behavior. Categorization, binning, and outlier handling further improved interpretability while preserving meaningful financial events. Because each Stock_ID appears only once in this dataset, however, the return-based and time-dependent features are NA in this run. They will only become informative with repeated dated observations per stock.

Furthermore, the dataset was prepared for analytical and machine learning use by encoding the categorical variables and by normalizing and scaling the numeric features. These steps make features comparable, reduce bias from differing magnitudes, and can improve model performance. As a result, the final dataset is clean, consistent, and structured for advanced tasks such as predictive modeling, trend analysis, and decision-making in financial contexts.

Group 3
