1 Standardization

1.1 Z-score

\[ z = \frac{x - \mu}{\sigma} \]

📌 Note:
The z-score indicates how many standard deviations σ a value x lies from the mean μ of the distribution: z > 0 means the value is above the mean, z < 0 means it is below.
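A minimal worked example of the formula, applied by hand to a small made-up vector (`v`, `mu` and `sigma` are illustrative names, not from the original). Like R's `sd()`, it uses the sample standard deviation (denominator n - 1) as σ; with the population standard deviation the values would differ slightly.

```r
# Hypothetical toy data: mean 4, sample standard deviation 2
v <- c(2, 4, 6)

mu <- mean(v)      # 4
sigma <- sd(v)     # sqrt(((2 - 4)^2 + 0 + (6 - 4)^2) / (3 - 1)) = 2

# Apply z = (x - mu) / sigma element-wise
z <- (v - mu) / sigma
print(z)           # -1  0  1
```

The smallest value sits one standard deviation below the mean, the middle one exactly at the mean.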

1.2 Using R to standardize a variable

# Simulate 60 values from a normal distribution with mean 3 and sd 0.3
set.seed(123)
x = rnorm(n = 60, mean = 3, sd = 0.3)
head(x)
## [1] 2.831857 2.930947 3.467612 3.021153 3.038786 3.514519
hist(x, main = "Histogram of x", xlab = "Values of x", col = "lightblue", border = "white")

# Boxplot of x: the thick line is the median; the added point marks the mean
boxplot(x)
points(mean(x),
       col = "tomato",
       cex = 1.5,
       pch = 16)

# Simulate 600 values from an exponential distribution with rate 1/2 (theoretical mean 2)
set.seed(123)
Y = rexp(n = 600, rate = 1/2)
head(Y)
## [1] 1.68691452 1.15322054 2.65810974 0.06315472 0.11242195 0.63300243
# Boxplot of Y with its mean added as an orange point
boxplot(Y)
points(mean(Y),
       col = "orange",
       cex = 1.5,
       pch = 16)

# Histogram of Y with its mean marked by a dotted red vertical line
hist(Y)
abline(v = mean(Y),
       col = "red",
       lty = 3,
       lwd = 3)
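The boxplots mark the mean on top of the median (the thick line). A quick numeric comparison makes the contrast explicit; this sketch regenerates the same simulated data with the seeds used above. For the symmetric normal sample the two statistics are close, while for the right-skewed exponential sample the long tail pulls the mean above the median. For an exponential with `rate = 1/2`, the theoretical mean is 1/rate = 2 and the theoretical median is ln(2)/rate ≈ 1.39.

```r
set.seed(123)
x <- rnorm(n = 60, mean = 3, sd = 0.3)
set.seed(123)
Y <- rexp(n = 600, rate = 1/2)

# Symmetric sample: mean and median should be close
c(mean = mean(x), median = median(x))

# Right-skewed sample: the long right tail pulls the mean above the median
c(mean = mean(Y), median = median(Y))

# Theoretical values for an exponential with rate = 1/2
c(mean = 1 / (1/2), median = log(2) / (1/2))
```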

1.3 Now, the z-score for x

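# scale() subtracts the mean and divides by the standard deviation; the result is a one-column matrix
z_x = scale(x)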
head(z_x)
##              [,1]
## [1,] -0.687729949
## [2,] -0.324914908
## [3,]  1.640081452
## [4,]  0.005372616
## [5,]  0.069938613
## [6,]  1.811830980
# Compare the distributions of x and z_x side by side
par(mfrow = c(1,2))
hist(x, nclass = 10)
hist(z_x, nclass = 10)
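With its defaults (`center = TRUE`, `scale = TRUE`), `scale()` subtracts the column mean and divides by the standard deviation computed with denominator n - 1, which is what `sd()` returns. A sketch checking this against the formula from 1.1 and reading back the centre and scale that `scale()` stores as attributes (`z_manual` is an illustrative name):

```r
set.seed(123)
x <- rnorm(n = 60, mean = 3, sd = 0.3)

z_x <- scale(x)                       # 60 x 1 matrix with attributes
z_manual <- (x - mean(x)) / sd(x)     # formula z = (x - mu) / sigma

# Same values once the matrix is flattened to a plain vector
all.equal(as.numeric(z_x), z_manual)  # TRUE

# scale() records the mu and sigma it used
attr(z_x, "scaled:center")            # equals mean(x)
attr(z_x, "scaled:scale")             # equals sd(x)
```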

# Each z-score plotted against its original value: the points lie on a straight line
plot(x, z_x)

### Pearson correlation

cor(x = x, y = z_x, method = "pearson")
##      [,1]
## [1,]    1
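The correlation is exactly 1 because `z_x` is an increasing straight-line function of `x`: rewriting the formula gives z = (1/σ)x - μ/σ. A sketch that recovers that line with `lm()`; the fitted intercept and slope should match -μ/σ and 1/σ up to floating-point error (`fit` is an illustrative name):

```r
set.seed(123)
x <- rnorm(n = 60, mean = 3, sd = 0.3)
z_x <- as.numeric(scale(x))

# Regress the z-scores on the original values
fit <- lm(z_x ~ x)
coef(fit)

# Intercept and slope implied by z = (x - mu) / sigma = -mu/sigma + (1/sigma) * x
c(intercept = -mean(x) / sd(x), slope = 1 / sd(x))
```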

1.4 Linear transformations

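In the strict algebraic sense, a transformation T is linear if it is both additive and homogeneous:

\[ \begin{aligned} &\text{Additivity:} && T(x_1 + x_2) = T(x_1) + T(x_2) \\ &\text{Homogeneity:} && T(c\,x) = c\,T(x) \end{aligned} \]

Applying the z-score T(x) = (x - μ)/σ, with μ and σ held fixed, to a sum: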

\[ T(x + y) = \frac{(x + y) - \mu}{\sigma} \]

\[ T(x) + T(y) = \frac{x - \mu}{\sigma} + \frac{y - \mu}{\sigma} = \frac{x + y - 2\mu}{\sigma} \]

\[ T(x + y) \neq T(x) + T(y) \quad \text{if } \mu \neq 0 \]

For homogeneity, scale x by a constant α:

\[ T(\alpha x) = \frac{\alpha x - \mu}{\sigma} \]

\[ \alpha T(x) = \alpha \frac{x - \mu}{\sigma} = \frac{\alpha x - \alpha \mu}{\sigma} \]

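\[ T(\alpha x) \neq \alpha T(x) \quad \text{if } \mu \neq 0 \text{ and } \alpha \neq 1 \]

So the z-score is not linear in the strict sense: it is an affine transformation (a rescaling by 1/σ plus a shift of -μ/σ), which is what statistics texts usually mean when they call it a "linear transformation". Because the slope 1/σ is positive, it preserves the ordering of the values and the shape of the distribution.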
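A numeric check of the two derivations, as a sketch. `T_z()` is a hypothetical helper that standardizes with the fixed μ and σ of `x`; `a`, `b` and `alpha` are arbitrary illustrative values. From the algebra above, the additivity gap T(a + b) - (T(a) + T(b)) should equal μ/σ, and the homogeneity gap T(αa) - αT(a) should equal (α - 1)μ/σ.

```r
set.seed(123)
x <- rnorm(n = 60, mean = 3, sd = 0.3)
mu <- mean(x)
sigma <- sd(x)

# Hypothetical helper: z-score with mu and sigma held fixed
T_z <- function(v) (v - mu) / sigma

a <- 3.2; b <- 2.9; alpha <- 2   # arbitrary illustrative values

# Additivity gap: nonzero whenever mu != 0
add_gap <- T_z(a + b) - (T_z(a) + T_z(b))
all.equal(add_gap, mu / sigma)                 # TRUE

# Homogeneity gap: nonzero whenever mu != 0 and alpha != 1
hom_gap <- T_z(alpha * a) - alpha * T_z(a)
all.equal(hom_gap, (alpha - 1) * mu / sigma)   # TRUE
```

Choosing `alpha <- 1` would make the homogeneity gap vanish, which is why the condition includes α ≠ 1.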

cat(mean(x), "mean of x\n")
## 3.019685 mean of x
cat(mean(z_x), "mean of z_x\n")
## 1.834719e-16 mean of z_x
cat(var(x), "variance of x\n")
## 0.07459062 variance of x
cat(var(z_x), "variance of z_x\n")
## 1 variance of z_x

The mean of z_x is 0 up to floating-point rounding (about 1.8 × 10⁻¹⁶), and its variance is 1.
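Mean 0 and variance 1 are not specific to `x`: they hold for any variable with nonzero variance, including the skewed `Y` from 1.2. Since floating-point arithmetic rarely returns an exact 0, a tolerance-based comparison with `all.equal()` is safer than `==`. A sketch (`z_Y` is an illustrative name):

```r
set.seed(123)
Y <- rexp(n = 600, rate = 1/2)
z_Y <- as.numeric(scale(Y))

# Compare against 0 and 1 with a tolerance instead of ==
isTRUE(all.equal(mean(z_Y), 0))   # TRUE
isTRUE(all.equal(var(z_Y), 1))    # TRUE

# Standardizing does not change the shape: the data are still right-skewed
mean(z_Y) > median(z_Y)           # TRUE
```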