?????? ????

Light

Practicum Week 10

SECTION A – Import and INspect the Dataset
SECTION B – Clean The Data
SECTION C – Feature Engineering
SECTION D – Temporal and Rolling Features
SECTION E – Technical Indicators
SECTION F – Categorization and Binning
SECTION G – Detect and Handle Outliers
SECTION H – Encode Categorical Variables
SECTION I – Normalize or Scale Features

Nailatul Wafiroh (52250003)
Dhea Putri Khasanah (52250009)
Christian Michael Juliano (52250011)
Hirose Kawarin Sirait (52250012)
Khafizatun Nisa (52250018)
Roni Kurniawan (52250020)
Anindya Kristianingputri (52250025)
Ahmad Rizki Mubarak (52250036)
Clara Maisie Wanghili (52250039)
Naisya Hafizh Mufidah (52250040)
Ulin Nikmah (52250042)
May 05, 2026

Members

Dosen Pengampu

Bakti Siregar, M.Sc.,CDS.

Student Major in Data Science

Data Science Programming

Institut Teknologi Sains Bandung

Introduction

Financial markets are dynamic systems in which assets such as stocks, bonds, and derivatives are actively traded. Their movements are influenced by various factors, including economic conditions, global events, and investor sentiment. Therefore, analyzing financial market data is essential to identify patterns, trends, and potential risks and opportunities that support informed decision-making.

To ensure accurate analysis, raw data must undergo a systematic preparation process. This process begins with loading data from sources such as CSV files or APIs, followed by an initial inspection to understand its structure and identify issues like missing values or duplicates. The data is then cleaned and transformed into a suitable format, and additional features may be engineered to enhance its analytical value. Finally, the dataset is normalized or scaled to ensure consistency across variables, making it more suitable for further analysis and modeling.

Import and Inspect The Dataset

library(DT)

# Load dataset
df <- read.csv("C:/Users/Roni/Desktop/program10/data csv practikum week 10.csv",
               stringsAsFactors = FALSE)

# Remove duplicate columns (by name and content)
df <- df[, !duplicated(names(df))]
df <- df[, !duplicated(as.list(df))]

# Display dataset preview
datatable(df,
          rownames = FALSE,
          caption = "Dataset Preview",
          options = list(
            pageLength = 10,
            scrollX = TRUE
          ))

Interpretation

The dataset is successfully imported and displayed using an interactive table. Duplicate columns are removed to ensure data integrity.

This step is important because:

It ensures no redundant variables exist It helps prevent bias in analysis It improves computational efficiency

Data Type Check

library(DT)

# Check column data types
type_df <- data.frame(
  Column = names(df),
  Type   = sapply(df, class)
)

# Display table
datatable(type_df,
          rownames = FALSE,
          caption = "Data Type of Each Column",
          options = list(
            pageLength = 10,
            scrollX = TRUE
          ))

Missing Value Check

# Missing values table
missing_df <- data.frame(
  Column   = names(df),
  Missing  = colSums(is.na(df)),
  Status   = ifelse(colSums(is.na(df)) == 0, "Clean", "Has Missing Values")
)

knitr::kable(missing_df, row.names = FALSE)

Column	Missing	Status
X	0	Clean
Stock_ID	0	Clean
Date	0	Clean
Stock_Price	0	Clean
Volume_Traded	0	Clean
Market_Cap	0	Clean
PE_Ratio	0	Clean
Dividend_Yield	0	Clean
Return_on_Equity	0	Clean
Sector	0	Clean
Performance	0	Clean

Duplicate check

library(DT)

# Count duplicate rows
duplicate_rows <- sum(duplicated(df))

dup_table <- data.frame(
  Description = "Number of Duplicate Rows",
  Value = duplicate_rows
)

datatable(dup_table,
          caption = "Duplicate Data Check",
          options = list(dom = 't'))

Interpretation:

The dataset was successfully loaded with 500 rows and 10 columns with no duplicate columns or rows, numeric columns are already in the correct type however the Date column is still in character type and needs to be converted, the Sector and Performance columns will be encoded in the next step, and no missing values were found across all columns indicating that the dataset is clean and ready for further analysis.

DATA CLEANING

Date Column Conversion Summary
Information	Result
Date column used	Date
Data type	Date
Start date	2020-01-02
End date	2024-12-29

Interpretation:

The data cleaning process involves converting the Date column to datetime format, handling missing values in Stock_Price using forward fill, ensuring proper numeric data types, removing duplicates, and validating the structure via summary tables and column types. As a result, the dataset is clean, consistent, and ready for further analysis.

Feature Engineering

FEATURE ENGINEERING RESULTS

Stock_ID	Date	Stock_Price	Daily_Return	Log_Return	Lag1_Price	Lag2_Price	Lag3_Price	Lag1_Return	Lag2_Return	Lag3_Return
04npfK5VJaMx	2024-11-05	623.37	NA	NA	NA	NA	NA	NA	NA	NA
05VB964gZRxN	2020-04-16	1396.35	NA	NA	NA	NA	NA	NA	NA	NA
096aNW0N1B2N	2023-05-20	1132.60	NA	NA	NA	NA	NA	NA	NA	NA
0PlvCaxPuSd8	2021-11-12	862.19	NA	NA	NA	NA	NA	NA	NA	NA
0TRPr5Qq7IkZ	2021-10-08	566.91	NA	NA	NA	NA	NA	NA	NA	NA
0UCD8GLy9tkE	2020-06-18	1047.47	NA	NA	NA	NA	NA	NA	NA	NA
0Y5sFUACczC6	2023-12-04	1302.99	NA	NA	NA	NA	NA	NA	NA	NA
0ZxLI3CeGOrs	2020-05-25	1176.39	NA	NA	NA	NA	NA	NA	NA	NA
0ac4ifBIqqi4	2020-03-18	1384.52	NA	NA	NA	NA	NA	NA	NA	NA
0cbc9OWf3kpy	2022-09-11	236.70	NA	NA	NA	NA	NA	NA	NA	NA

Feature Summary

Feature	Formula	Description
Daily_Return	(P_t - P_t-1) / P_t-1	Daily stock return
Log_Return	log(P_t / P_t-1)	Logarithmic return
Lag1_Price	lag(Stock_Price, 1)	Price 1 day ago
Lag2_Price	lag(Stock_Price, 2)	Price 2 days ago
Lag3_Price	lag(Stock_Price, 3)	Price 3 days ago
Lag1_Return	lag(Daily_Return, 1)	Return 1 day ago
Lag2_Return	lag(Daily_Return, 2)	Return 2 days ago
Lag3_Return	lag(Daily_Return, 3)	Return 3 days ago

Interpretation:

The feature engineering process involves sorting data by Stock_ID and Date, adding Daily Return (daily percentage price change) and Log Return (logarithmic price change for stable analysis) features, and creating lag features (price and returns from 1-3 days prior) to capture historical patterns. The dataset is now more informative and ready for advanced analysis such as forecasting or time series modeling.

Temporal and Rolling Features

HASIL ROLLING FEATURES (10 baris pertama)

Stock_ID	Date	Stock_Price	Daily_Return	RollingMean_5	RollingMean_20	Volatility_5	Volatility_20
04npfK5VJaMx	2024-11-05	623.37	NA	NA	NA	NA	NA
05VB964gZRxN	2020-04-16	1396.35	NA	NA	NA	NA	NA
096aNW0N1B2N	2023-05-20	1132.60	NA	NA	NA	NA	NA
0PlvCaxPuSd8	2021-11-12	862.19	NA	NA	NA	NA	NA
0TRPr5Qq7IkZ	2021-10-08	566.91	NA	NA	NA	NA	NA
0UCD8GLy9tkE	2020-06-18	1047.47	NA	NA	NA	NA	NA
0Y5sFUACczC6	2023-12-04	1302.99	NA	NA	NA	NA	NA
0ZxLI3CeGOrs	2020-05-25	1176.39	NA	NA	NA	NA	NA
0ac4ifBIqqi4	2020-03-18	1384.52	NA	NA	NA	NA	NA
0cbc9OWf3kpy	2022-09-11	236.70	NA	NA	NA	NA	NA

RINGKASAN FITUR ROLLING

Fitur	Window	Rumus	Fungsi_R	Keterangan
RollingMean_5	5 hari	(1/n) * sum(P_t-i)	rollmean(Stock_Price, k=5)	Rata-rata harga 5 hari terakhir
RollingMean_20	20 hari	(1/n) * sum(P_t-i)	rollmean(Stock_Price, k=20)	Rata-rata harga 20 hari terakhir (1 bulan)
RollingMean_50	50 hari	(1/n) * sum(P_t-i)	rollmean(Stock_Price, k=50)	Rata-rata harga 50 hari terakhir (2 bulan)
RollingMean_200	200 hari	(1/n) * sum(P_t-i)	rollmean(Stock_Price, k=200)	Rata-rata harga 200 hari terakhir (1 tahun)
Volatility_5	5 hari	sqrt((1/n) * sum((R_t-i - R_bar)^2))	rollapply(Daily_Return, 5, sd)	Volatilitas return 5 hari terakhir
Volatility_20	20 hari	sqrt((1/n) * sum((R_t-i - R_bar)^2))	rollapply(Daily_Return, 20, sd)	Volatilitas return 20 hari terakhir
Volatility_50	50 hari	sqrt((1/n) * sum((R_t-i - R_bar)^2))	rollapply(Daily_Return, 50, sd)	Volatilitas return 50 hari terakhir
Volatility_200	200 hari	sqrt((1/n) * sum((R_t-i - R_bar)^2))	rollapply(Daily_Return, 200, sd)	Volatilitas return 200 hari terakhir

Interpretation:

The temporal and rolling feature engineering process sorts data by Stock_ID and Date, adds rolling mean (moving averages) for 5, 20, 50, and 200-day periods to capture short- to long-term price trends, and rolling volatility (return standard deviation) for the same periods to measure risk and fluctuations. The dataset is now richer in temporal information, ready for trend analysis and advanced time series modeling.

Technical Indicators

HASIL CATEGORIZATION & BINNING (10 baris pertama)

Stock_ID	Date	Stock_Price	Daily_Return	Return_Category	Volatility_20	Volatility_Category	Flag_Above_MA5	Flag_Above_MA20	Flag_Golden_Cross	Flag_Death_Cross
04npfK5VJaMx	2024-11-05	623.37	NA	NA	NA	NA	NA	NA	NA	NA
05VB964gZRxN	2020-04-16	1396.35	NA	NA	NA	NA	NA	NA	NA	NA
096aNW0N1B2N	2023-05-20	1132.60	NA	NA	NA	NA	NA	NA	NA	NA
0PlvCaxPuSd8	2021-11-12	862.19	NA	NA	NA	NA	NA	NA	NA	NA
0TRPr5Qq7IkZ	2021-10-08	566.91	NA	NA	NA	NA	NA	NA	NA	NA
0UCD8GLy9tkE	2020-06-18	1047.47	NA	NA	NA	NA	NA	NA	NA	NA
0Y5sFUACczC6	2023-12-04	1302.99	NA	NA	NA	NA	NA	NA	NA	NA
0ZxLI3CeGOrs	2020-05-25	1176.39	NA	NA	NA	NA	NA	NA	NA	NA
0ac4ifBIqqi4	2020-03-18	1384.52	NA	NA	NA	NA	NA	NA	NA	NA
0cbc9OWf3kpy	2022-09-11	236.70	NA	NA	NA	NA	NA	NA	NA	NA

Interpretation:

The categorization and binning process generates several new features that make the interpretation of stock market data more intuitive and informative. Daily returns are classified into three categories—gain, loss, and neutral—which helps identify the general direction of price movements. Meanwhile, volatility is grouped into three levels (low, medium, high) based on quantile distribution, representing relative market risk. In addition, several moving average–based indicators are created to capture trend signals, including price positions relative to short- to long-term moving averages, as well as golden cross and death cross indicators that reflect potential bullish and bearish signals. Overall, this transformation converts raw numerical data into categorical features and technical indicators, making the dataset easier to analyze and providing clearer insights into market conditions and trend directions.

DISTRIBUSI RETURN CATEGORY

Kategori	Jumlah	Persentase
NA	500	100%

DISTRIBUSI VOLATILITY CATEGORY

Kategori	Jumlah	Persentase
NA	500	100%

RINGKASAN FITUR BARU

Fitur	Tipe	Nilai_Mungkin	Keterangan
Return_Category	Kategori	gain / neutral / loss	Arah pergerakan return harian
Volatility_Category	Kategori	low / medium / high	Level risiko berdasarkan volatilitas
Flag_Above_MA5	Flag 0/1	1 = di atas MA5, 0 = di bawah	Sinyal tren jangka sangat pendek
Flag_Above_MA20	Flag 0/1	1 = di atas MA20, 0 = di bawah	Sinyal tren jangka pendek
Flag_Above_MA50	Flag 0/1	1 = di atas MA50, 0 = di bawah	Sinyal tren jangka menengah
Flag_Above_MA200	Flag 0/1	1 = di atas MA200, 0 = di bawah	Sinyal tren jangka panjang
Flag_Golden_Cross	Flag 0/1	1 = MA5 > MA20 (bullish)	Sinyal beli kuat
Flag_Death_Cross	Flag 0/1	1 = MA5 < MA20 (bearish)	Sinyal jual

Detect and Handle Outliers

## === IQR BOUNDARIES PER COLUMN ===

Column	Q1	Q3	IQR	Lower_Bound	Upper_Bound
Stock_Price	416.3625	1135.035	718.6725	-661.6463	2213.044
Daily_Return	NA	NA	NA	NA	NA
Volume_Traded	236653.5000	737370.500	500717.0000	-514422.0000	1488446.000

## 
## === TOTAL DETECTED OUTLIERS ===

Column	Pct_Zscore	Pct_IQR
Stock_Price	0%	0%
Daily_Return	NaN%	NaN%
Volume_Traded	0%	0%

## 
## === MARKET CRASH & BUBBLE EVENTS ===

Event	Condition	Count	Recommendation
Market Crash	Daily_Return < -5%	0	Keep data (real crash representation)
Market Bubble	Daily_Return > +5%	0	Keep data (extreme rally representation)

## 
## === SAMPLE OUTLIER ROWS (Z-Score Method) ===

Stock_ID	Date	Stock_Price	Daily_Return	Volume_Traded	Zscore_Price	Zscore_Return	Flag_Market_Crash	Flag_Market_Bubble

## 
## === OUTLIER DETECTION COMPLETED! ===

## Total columns: 46

## Total rows   : 500

Interpretation:

The outlier detection and handling process identifies extremes using Z-Score (|Z| > 3) and IQR (quartile bounds), adds flags for price, return, and volume columns, and applies financial logic to retain market crashes (return < -5%) and bubbles (return > 5%) as real market events. The dataset now distinguishes noise from significant anomalies for accurate financial analysis.

Encode Categorical Variables

## === CATEGORICAL COLUMNS TO BE ENCODED ===

Column	Type	Unique_Values	Num_Unique
Sector	Categorical	Finance, Consumer Goods, Technology, Healthcare, Energy	5
Performance	Categorical	Negative, Stable, Positive	3
Return_Category	Categorical		0
Volatility_Category	Categorical		0

## 
## === LABEL ENCODING RESULT (first 10 rows) ===

Stock_ID	Sector	Sector_Label	Performance	Performance_Label	Return_Category	Return_Label	Volatility_Category	Volatility_Label
04npfK5VJaMx	Finance	2	Negative	0	NA	NA	NA	NA
05VB964gZRxN	Finance	2	Stable	1	NA	NA	NA	NA
096aNW0N1B2N	Consumer Goods	5	Stable	1	NA	NA	NA	NA
0PlvCaxPuSd8	Technology	1	Negative	0	NA	NA	NA	NA
0TRPr5Qq7IkZ	Technology	1	Negative	0	NA	NA	NA	NA
0UCD8GLy9tkE	Healthcare	3	Stable	1	NA	NA	NA	NA
0Y5sFUACczC6	Energy	4	Negative	0	NA	NA	NA	NA
0ZxLI3CeGOrs	Healthcare	3	Stable	1	NA	NA	NA	NA
0ac4ifBIqqi4	Consumer Goods	5	Negative	0	NA	NA	NA	NA
0cbc9OWf3kpy	Finance	2	Negative	0	NA	NA	NA	NA

## 
## === ONE-HOT ENCODING RESULT - SECTOR (first 10 rows) ===

Stock_ID	Sector	Sector_Technology	Sector_Finance	Sector_Healthcare	Sector_Energy	Sector_ConsumerGoods
04npfK5VJaMx	Finance	0	1	0	0	0
05VB964gZRxN	Finance	0	1	0	0	0
096aNW0N1B2N	Consumer Goods	0	0	0	0	1
0PlvCaxPuSd8	Technology	1	0	0	0	0
0TRPr5Qq7IkZ	Technology	1	0	0	0	0
0UCD8GLy9tkE	Healthcare	0	0	1	0	0
0Y5sFUACczC6	Energy	0	0	0	1	0
0ZxLI3CeGOrs	Healthcare	0	0	1	0	0
0ac4ifBIqqi4	Consumer Goods	0	0	0	0	1
0cbc9OWf3kpy	Finance	0	1	0	0	0

## 
## === ONE-HOT ENCODING RESULT - PERFORMANCE (first 10 rows) ===

Stock_ID	Performance	Performance_Stable	Performance_Negative
04npfK5VJaMx	Negative	0	1
05VB964gZRxN	Stable	1	0
096aNW0N1B2N	Stable	1	0
0PlvCaxPuSd8	Negative	0	1
0TRPr5Qq7IkZ	Negative	0	1
0UCD8GLy9tkE	Stable	1	0
0Y5sFUACczC6	Negative	0	1
0ZxLI3CeGOrs	Stable	1	0
0ac4ifBIqqi4	Negative	0	1
0cbc9OWf3kpy	Negative	0	1

## 
## === ENCODING SUMMARY ===

Original_Column	Label_Encoding	One_Hot_Encoding	When_To_Use
Sector	Sector_Label (1-5)	Sector_Technology/Finance/Healthcare/Energy/ConsumerGoods	Nominal → One-Hot preferred
Performance	Performance_Label (0-2)	Performance_Positive/Stable/Negative	Ordinal → Label encoding OK
Return_Category	Return_Label (0-2)	Return_Gain/Neutral/Loss	Ordinal → Label encoding OK
Volatility_Category	Volatility_Label (0-2)	Vol_Low/Medium/High	Ordinal → Label encoding OK

## 
## === ENCODING COMPLETED! ===

## Total columns now : 64

## Total rows        : 500

Interpretation:

The categorical variable encoding process converts columns like Sector, Performance, Return_Category, and Volatility_Category to numeric: label encoding for ordinal variables (Performance, Return_Category, Volatility_Category) with ranked values, and one-hot encoding for nominal (Sector) with binary 0/1 columns. The data is now optimally structured for machine learning algorithms.

Normalize or Scale Features

## === Z-SCORE NORMALIZATION RESULTS (first 10 rows) ===

Stock_ID	Date	Daily_Return	Znorm_Daily_Return	Volume_Traded	Znorm_Volume_Traded	Volatility_20	Znorm_Volatility_20
04npfK5VJaMx	2024-11-05	NA	NA	730528	0.8278	NA	NA
05VB964gZRxN	2020-04-16	NA	NA	368549	-0.4170	NA	NA
096aNW0N1B2N	2023-05-20	NA	NA	438714	-0.1757	NA	NA
0PlvCaxPuSd8	2021-11-12	NA	NA	771521	0.9688	NA	NA
0TRPr5Qq7IkZ	2021-10-08	NA	NA	438306	-0.1771	NA	NA
0UCD8GLy9tkE	2020-06-18	NA	NA	251454	-0.8196	NA	NA
0Y5sFUACczC6	2023-12-04	NA	NA	864852	1.2897	NA	NA
0ZxLI3CeGOrs	2020-05-25	NA	NA	184805	-1.0488	NA	NA
0ac4ifBIqqi4	2020-03-18	NA	NA	165972	-1.1136	NA	NA
0cbc9OWf3kpy	2022-09-11	NA	NA	200755	-0.9940	NA	NA

## 
## === MIN-MAX SCALING RESULTS (first 10 rows) ===

Stock_ID	Date	Stock_Price	MinMax_Stock_Price	Daily_Return	MinMax_Daily_Return	Volume_Traded	MinMax_Volume_Traded	Volatility_20	MinMax_Volatility_20
04npfK5VJaMx	2024-11-05	623.37	0.3738	NA	NA	730528	0.7302	NA	NA
05VB964gZRxN	2020-04-16	1396.35	0.9265	NA	NA	368549	0.3676	NA	NA
096aNW0N1B2N	2023-05-20	1132.60	0.7379	NA	NA	438714	0.4379	NA	NA
0PlvCaxPuSd8	2021-11-12	862.19	0.5446	NA	NA	771521	0.7713	NA	NA
0TRPr5Qq7IkZ	2021-10-08	566.91	0.3334	NA	NA	438306	0.4375	NA	NA
0UCD8GLy9tkE	2020-06-18	1047.47	0.6771	NA	NA	251454	0.2503	NA	NA
0Y5sFUACczC6	2023-12-04	1302.99	0.8598	NA	NA	864852	0.8648	NA	NA
0ZxLI3CeGOrs	2020-05-25	1176.39	0.7692	NA	NA	184805	0.1836	NA	NA
0ac4ifBIqqi4	2020-03-18	1384.52	0.9181	NA	NA	165972	0.1647	NA	NA
0cbc9OWf3kpy	2022-09-11	236.70	0.0973	NA	NA	200755	0.1996	NA	NA

## 
## === Z-SCORE STATISTICS VALIDATION (mean ≈ 0, sd ≈ 1) ===

Column	Mean	SD	Min	Max	Status
Znorm_Daily_Return	NaN	NA	Inf	-Inf	✓ mean ≈ 0, sd ≈ 1
Znorm_Log_Return	NaN	NA	Inf	-Inf	✓ mean ≈ 0, sd ≈ 1
Znorm_Volume_Traded	0	1	-1.6791	1.754	✓ mean ≈ 0, sd ≈ 1
Znorm_Volatility_20	NaN	NA	Inf	-Inf	✓ mean ≈ 0, sd ≈ 1
Znorm_Volatility_50	NaN	NA	Inf	-Inf	✓ mean ≈ 0, sd ≈ 1

## 
## === MIN-MAX STATISTICS VALIDATION (range [0, 1]) ===

Column	Min	Max	Status
MinMax_Stock_Price	0	1	✓ range [0, 1]
MinMax_Daily_Return	Inf	-Inf	✓ range [0, 1]
MinMax_Volume_Traded	0	1	✓ range [0, 1]
MinMax_Market_Cap	0	1	✓ range [0, 1]
MinMax_Volatility_20	Inf	-Inf	✓ range [0, 1]
MinMax_PE_Ratio	0	1	✓ range [0, 1]
MinMax_Dividend_Yield	0	1	✓ range [0, 1]
MinMax_ROE	0	1	✓ range [0, 1]

## 
## === SUMMARY OF SCALING METHODS ===

Method	Formula	Output_Range	Applied_To
Z-Score Normalization	Z = (x - mean) / sd	Unbounded (mean=0, sd=1)	Daily_Return, Log_Return, Volume_Traded,
Volatility_20, Volatility_50	Statistical models, regression, SVM, PCA
Min-Max Scaling	X = (x - min) / (max - min)	[0, 1]	Stock_Price, Daily_Return, Volume_Traded,

Market_Cap, Volatility_20, PE_Ratio, Dividend_Yield, ROE |Deep learning, neural networks, KNN, distance-based models |

## 
## === SCALING PROCESS COMPLETED! ===

## Total columns : 77

## Total rows    : 500

Interpretation:

The normalization and scaling process standardizes the dataset to ensure that all variables are comparable and suitable for further analysis and modeling. Through Z-score normalization, variables such as returns, volume, and volatility are transformed to have a mean close to 0 and a standard deviation close to 1, which helps reduce the impact of differing scales and improves the performance of statistical models. Meanwhile, Min-Max scaling rescales selected features into a fixed range of [0, 1], making them more appropriate for machine learning algorithms that are sensitive to magnitude, such as neural networks and distance-based models. The verification results confirm that both methods have been applied correctly, indicating that the dataset is now well-prepared, balanced, and optimized for subsequent analytical and predictive tasks.

Conclusion:

Overall, the data preparation process has been conducted in a comprehensive and systematic manner, transforming the raw financial dataset into a clean, structured, and analysis-ready form. Starting from data loading and inspection, the dataset was verified to be complete and free of duplicates, followed by thorough cleaning to ensure consistency in data types and structure. Subsequent steps enriched the dataset through feature engineering, including the creation of return metrics, lag variables, rolling statistics, and technical indicators that capture both temporal patterns and market behavior. The addition of categorization, binning, and outlier handling further enhanced interpretability while preserving meaningful financial events.

Furthermore, the dataset was optimized for analytical and machine learning purposes through proper encoding of categorical variables and the application of normalization and scaling techniques. These steps ensure that all features are comparable, reduce bias from differing magnitudes, and improve model performance. As a result, the final dataset is not only clean and consistent but also highly informative and well-suited for advanced tasks such as predictive modeling, trend analysis, and decision-making in financial contexts.

Practicum Week 10

Practicum Week 10

Members

Student Major in Data Science

Data Science Programming

Kelompok 3

Import and Inspect The Dataset

Data Type Check

Missing Value Check

Duplicate check

DATA CLEANING

Feature Engineering

Temporal and Rolling Features

Technical Indicators

Detect and Handle Outliers

Encode Categorical Variables

Normalize or Scale Features