This tutorial covers z-score normalization, min-max normalization, range normalization, and decimal scaling. Make sure any packages the code loads are installed before running it.

# Key points

- Data normalization transforms the values of one or more variables so that they share a standard scale or range.
- Ways to normalize data in R include z-score normalization, min-max normalization, range normalization, decimal scaling, and max_scale normalization.
- Z-score normalization subtracts the variable's mean from each value and divides by its standard deviation. The result is a new variable with a mean of zero and a standard deviation of one.
- Min-max normalization subtracts the variable's minimum from each value and divides by its range (maximum - minimum). The result is a new variable with a minimum of zero and a maximum of one.
- Range normalization divides each value by the variable's range (maximum - minimum). The result is a new variable that has a range of one.
- Decimal scaling divides each value by a power of 10 that is equal to or larger than the maximum absolute value of the variable. The result is a new variable ranging from -1 to 1.
- Max_scale normalization multiplies each value by a chosen target maximum and divides by the actual maximum of the variable. The result is a new variable whose maximum is the chosen value.
- Normalization puts variables on comparable scales, can improve the performance of some statistical tests and machine learning algorithms, and, depending on the method, can reduce the effects of outliers and make the data easier to interpret.
- Normalization is not a one-size-fits-all solution. Choose the method based on the type and distribution of the data, the purpose of the analysis, and the desired outcome.
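Max_scale normalization is listed in the key points but not demonstrated elsewhere in this tutorial. The following is a minimal sketch of the formula as described there, using only base R. The function name `max_scale` and its `target` argument are hypothetical names chosen for illustration, and the sketch assumes the variable's maximum is not zero.

```r
# Hypothetical helper (not part of the original tutorial):
# rescale x so that its maximum becomes `target`.
# Assumes max(x) is non-zero.
max_scale <- function(x, target = 1) {
  x * target / max(x, na.rm = TRUE)
}

# Rescale mpg so that the largest value maps to 100
mpg_max100 <- max_scale(datasets::mtcars$mpg, target = 100)

# The maximum of the rescaled variable equals the target value
max(mpg_max100)
```

Unlike min-max normalization, this does not shift the minimum to zero; it only rescales the values relative to the maximum.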

# Data Normalization Techniques in R

Data normalization is a process used to scale and transform numerical data onto a common scale to make comparisons and analysis easier. In this tutorial, we’ll cover four different techniques for data normalization using the mtcars dataset.

# 1. Z-score Normalization

Z-score normalization standardizes data by subtracting the mean and dividing by the standard deviation. This technique transforms data into a distribution with a mean of 0 and a standard deviation of 1.

# Load packages (the snippets shown in this section use only base R
# functions such as scale(), so these are optional here)
#install.packages(c("car", "dummy"))
library(car)
library(dummy)

# Z-score normalize the mpg variable
# (scale() returns a one-column matrix, so convert it to a plain numeric vector)
mtcars$mpg_z <- as.numeric(scale(mtcars$mpg))

# Check the mean and standard deviation of the normalized mpg variable
mean(mtcars$mpg_z)
## [1] 7.112366e-17
sd(mtcars$mpg_z)
## [1] 1
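
To make the formula explicit, here is a small sketch that computes the z-scores by hand and compares them with the result of `scale()`. The variable names `mpg` and `mpg_z_manual` are introduced only for this illustration.

```r
# Work on a plain copy of the mpg column
mpg <- datasets::mtcars$mpg

# Z-score by hand: subtract the mean, divide by the standard deviation
mpg_z_manual <- (mpg - mean(mpg)) / sd(mpg)

# Compare with scale(); as.numeric() drops the matrix attributes scale() adds
all.equal(mpg_z_manual, as.numeric(scale(mpg)))
```

The mean reported above is a tiny non-zero number rather than exactly 0 because of floating-point rounding.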

# 2. Min-max Normalization

Min-max normalization scales data to a specified range (usually [0, 1]) by subtracting the minimum value and dividing by the range of values.

# Min-max normalize the mpg variable
# (center = minimum, scale = range; as.numeric() converts the matrix to a vector)
mtcars$mpg_mm <- as.numeric(scale(mtcars$mpg, center = min(mtcars$mpg), scale = max(mtcars$mpg) - min(mtcars$mpg)))

# Check the minimum and maximum of the normalized mpg variable
min(mtcars$mpg_mm)
## [1] 0
max(mtcars$mpg_mm)
## [1] 1
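
The description above mentions scaling to a specified range, with [0, 1] as the usual choice. As a sketch of how another target range could be used, the helper below first maps the values to [0, 1] and then stretches and shifts them. The function name `rescale_minmax` and its arguments `new_min` and `new_max` are hypothetical, and the sketch assumes the variable is not constant.

```r
# Hypothetical helper: min-max rescale x to the interval [new_min, new_max].
# Assumes x is not constant, i.e. max(x) > min(x).
rescale_minmax <- function(x, new_min = 0, new_max = 1) {
  # Step 1: map x onto [0, 1]
  x01 <- (x - min(x)) / (max(x) - min(x))
  # Step 2: stretch to the target width and shift to the target minimum
  new_min + x01 * (new_max - new_min)
}

# Rescale mpg to run from 1 to 10
mpg_1_10 <- rescale_minmax(datasets::mtcars$mpg, new_min = 1, new_max = 10)
range(mpg_1_10)
```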

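# 3. Range Normalization

As implemented here, range normalization applies the same formula as min-max normalization, written out with plain arithmetic instead of scale(): subtract the minimum value and divide by the range of values, which scales the data to [0, 1]. Note that dividing by the range alone, without subtracting the minimum, gives a variable whose range is 1 but whose minimum is generally not 0.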

# Range normalize the mpg variable
mtcars$mpg_rn <- (mtcars$mpg - min(mtcars$mpg)) / (max(mtcars$mpg) - min(mtcars$mpg))

# Check the range of the normalized mpg variable
range(mtcars$mpg_rn)
## [1] 0 1
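
The key points describe range normalization as dividing by the range alone, which differs from the code above. A minimal sketch of that variant, with `mpg_range_only` as an illustrative name, shows the difference: the width of the result is 1, but the values no longer start at 0.

```r
# Work on a plain copy of the mpg column
mpg <- datasets::mtcars$mpg

# Divide by the range only, without subtracting the minimum first
mpg_range_only <- mpg / (max(mpg) - min(mpg))

# The width (max - min) of the result is 1 ...
diff(range(mpg_range_only))

# ... but the minimum is not 0
min(mpg_range_only)
```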

# 4. Decimal Scaling

Decimal scaling divides the data by a power of 10, which moves the decimal point so that all values fall between -1 and 1.

# Define a function for decimal scaling
decimal_scale <- function(x) {
  # Find the maximum absolute value of x
  max_abs <- max(abs(x))
  
  # Find the smallest power of 10 that is equal to or larger than max_abs
  power <- ceiling(log10(max_abs))
  
  # Divide x by 10^power
  x / (10^power)
}

# Apply decimal scaling to the mpg variable of mtcars
mtcars$mpg_ds <- decimal_scale(mtcars$mpg)
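
As with the other methods, it helps to check the result. The sketch below repeats the same calculation inline on a copy of the column, so it runs on its own; the names `power` and `mpg_ds` are used only for this illustration.

```r
# Work on a plain copy of the mpg column
mpg <- datasets::mtcars$mpg

# The largest mpg value is 33.9, which lies between 10 and 100,
# so the power of 10 is 2 and every value is divided by 100
power <- ceiling(log10(max(abs(mpg))))
mpg_ds <- mpg / 10^power

# All scaled values lie between -1 and 1
power
range(mpg_ds)
```

Because every mpg value is positive, the scaled values here are all between 0 and 1; negative inputs would map to the negative side of the interval.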

Now, let’s create visualizations to compare the original and normalized data.

# Create histograms for original and normalized mpg variables
par(mfrow = c(1, 2))
hist(mtcars$mpg, main = "Original mpg", xlab = "Miles per gallon")
hist(mtcars$mpg_z, main = "Z-score normalized mpg", xlab = "Z-scores")

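# Create boxplots for the original and z-score normalized variables of mtcars
# (datasets::mtcars is the untouched copy, without the columns added above)
par(mfrow = c(2, 1))
boxplot(datasets::mtcars, main = "Original mtcars", las = 2)
boxplot(as.data.frame(scale(datasets::mtcars)), main = "Z-score normalized mtcars", las = 2)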
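
Alongside the plots, a small numeric summary can make the comparison concrete. This is a sketch that recomputes three of the normalized versions of mpg from scratch and tabulates their minimum, maximum, mean, and standard deviation. The data frame name `normalized` and its column names are chosen for illustration.

```r
# Work on a plain copy of the mpg column
mpg <- datasets::mtcars$mpg

# Recompute each normalized version of mpg
normalized <- data.frame(
  z_score = (mpg - mean(mpg)) / sd(mpg),
  min_max = (mpg - min(mpg)) / (max(mpg) - min(mpg)),
  decimal = mpg / 10^ceiling(log10(max(abs(mpg))))
)

# One row per statistic, one column per method
summary_table <- sapply(normalized, function(v) {
  c(min = min(v), max = max(v), mean = mean(v), sd = sd(v))
})
round(summary_table, 3)
```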

For the complete code and tutorial, visit: https://www.data03.online/2023/08/how-to-normalize-data-r-my-data.html