Multivariate descriptive statistics

\[ \bar{x_j} = \frac{1}{n}1'X_j = \frac{1}{n}X_j '1 \]

Matrix of ones

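n <- 10  # number of rows
p <- 10  # number of columns, set equal to n so that U is square

U <- matrix(1, n, p)  # matrix of ones, U = 11'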
U
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    1    1    1    1    1    1    1    1     1
##  [2,]    1    1    1    1    1    1    1    1    1     1
##  [3,]    1    1    1    1    1    1    1    1    1     1
##  [4,]    1    1    1    1    1    1    1    1    1     1
##  [5,]    1    1    1    1    1    1    1    1    1     1
##  [6,]    1    1    1    1    1    1    1    1    1     1
##  [7,]    1    1    1    1    1    1    1    1    1     1
##  [8,]    1    1    1    1    1    1    1    1    1     1
##  [9,]    1    1    1    1    1    1    1    1    1     1
## [10,]    1    1    1    1    1    1    1    1    1     1

The matrix J

\[ J = \frac{1}{n}U = \frac{1}{n}11' \]

rep(1,10)
##  [1] 1 1 1 1 1 1 1 1 1 1
rep(1,10)%*%t(rep(1,10))
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    1    1    1    1    1    1    1    1     1
##  [2,]    1    1    1    1    1    1    1    1    1     1
##  [3,]    1    1    1    1    1    1    1    1    1     1
##  [4,]    1    1    1    1    1    1    1    1    1     1
##  [5,]    1    1    1    1    1    1    1    1    1     1
##  [6,]    1    1    1    1    1    1    1    1    1     1
##  [7,]    1    1    1    1    1    1    1    1    1     1
##  [8,]    1    1    1    1    1    1    1    1    1     1
##  [9,]    1    1    1    1    1    1    1    1    1     1
## [10,]    1    1    1    1    1    1    1    1    1     1
rep(1,10)%*%t(rep(1,10)) == U
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [7,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [8,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [9,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
## [10,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
U%*%U
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   10   10   10   10   10   10   10   10   10    10
##  [2,]   10   10   10   10   10   10   10   10   10    10
##  [3,]   10   10   10   10   10   10   10   10   10    10
##  [4,]   10   10   10   10   10   10   10   10   10    10
##  [5,]   10   10   10   10   10   10   10   10   10    10
##  [6,]   10   10   10   10   10   10   10   10   10    10
##  [7,]   10   10   10   10   10   10   10   10   10    10
##  [8,]   10   10   10   10   10   10   10   10   10    10
##  [9,]   10   10   10   10   10   10   10   10   10    10
## [10,]   10   10   10   10   10   10   10   10   10    10
J <- (1/n)*U
J
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [2,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [3,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [4,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [5,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [6,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [7,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [8,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [9,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
## [10,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
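
One way to read J is through its action on a vector: multiplying by J replaces every entry with the vector's mean, which is the matrix form of the formula for the mean at the start of these notes. A minimal sketch, where `x_demo` and the other `_demo` names are illustrative and not part of the notes:

```r
# Illustrative data (assumption: any numeric vector would do)
x_demo <- c(2, 4, 6, 8, 10)
n_demo <- length(x_demo)

# J = (1/n) 11', built from the outer product of the ones vector
ones <- rep(1, n_demo)
J_demo <- (1 / n_demo) * ones %*% t(ones)

# Every entry of J x equals mean(x)
Jx <- as.vector(J_demo %*% x_demo)
print(Jx)
print(mean(x_demo))
```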

Idempotent matrix

A square matrix A is idempotent when

\[ A^2=A \]

which implies, for every integer \(k \geq 1\),

\[ A^k=A \]

J%*%J
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [2,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [3,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [4,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [5,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [6,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [7,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [8,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
##  [9,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1
## [10,]  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1   0.1

Therefore, the matrix J is idempotent.
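
Comparing printed matrices by eye works for this small case, but a numerical check is more robust. The sketch below raises J to higher powers by repeated multiplication and compares the result with J using `all.equal()`, which allows for floating-point tolerance; the `_demo` names and the choice of stopping at the sixth power are assumptions made for illustration.

```r
n_demo <- 10
J_demo <- matrix(1 / n_demo, n_demo, n_demo)

# Multiply J by itself repeatedly: J^2, J^3, ..., J^6
J_pow <- J_demo
for (k in 2:6) {
  J_pow <- J_pow %*% J_demo
}

# all.equal() compares with a numerical tolerance, which is safer than ==
# when the entries are floating-point numbers
print(isTRUE(all.equal(J_pow, J_demo)))
```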

I <- diag(1, n, p)  # identity matrix (square because n = p)

Error matrix or centering matrix

\[ I - J \]

J==t(J)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [7,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [8,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [9,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
## [10,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE

J equals its transpose, so it satisfies the symmetry condition

\[ A=A' \]

Pseudo-identical matrix

Let \(A \in \mathbb{R}^{n \times n}\). A is said to be pseudo-identical if and only if it is symmetric and idempotent, that is, \(A = A'\) and \(A^2 = A\).

Here the matrix J is symmetric and idempotent, and therefore it is pseudo-identical.
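
The two properties used here, symmetry and idempotence, can be bundled into a single numerical check. The function `is_pseudo_identical()` below is a hypothetical helper written for this illustration (it is not a base R function), and the tolerance `tol` is an assumed default.

```r
# Hypothetical helper: TRUE when A is symmetric and idempotent (up to tol)
is_pseudo_identical <- function(A, tol = 1e-10) {
  A <- as.matrix(A)
  if (nrow(A) != ncol(A)) return(FALSE)
  symmetric  <- max(abs(A - t(A))) < tol
  idempotent <- max(abs(A %*% A - A)) < tol
  symmetric && idempotent
}

n_demo <- 10
U_demo <- matrix(1, n_demo, n_demo)
J_demo <- U_demo / n_demo

print(is_pseudo_identical(J_demo))                 # J
print(is_pseudo_identical(diag(n_demo) - J_demo))  # centering matrix I - J
print(is_pseudo_identical(U_demo))                 # U is symmetric, but U %*% U = n U
```

The matrix of ones U is a useful contrast: it is symmetric, yet its square has every entry equal to n, so it fails the idempotence condition.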

I-J
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]  0.9 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  -0.1
##  [2,] -0.1  0.9 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  -0.1
##  [3,] -0.1 -0.1  0.9 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  -0.1
##  [4,] -0.1 -0.1 -0.1  0.9 -0.1 -0.1 -0.1 -0.1 -0.1  -0.1
##  [5,] -0.1 -0.1 -0.1 -0.1  0.9 -0.1 -0.1 -0.1 -0.1  -0.1
##  [6,] -0.1 -0.1 -0.1 -0.1 -0.1  0.9 -0.1 -0.1 -0.1  -0.1
##  [7,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  0.9 -0.1 -0.1  -0.1
##  [8,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  0.9 -0.1  -0.1
##  [9,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  0.9  -0.1
## [10,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1   0.9
I-J == t(I-J)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [7,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [8,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
##  [9,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE
## [10,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE

Therefore, the centering matrix is symmetric.

(I-J)%*%(I-J)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]  0.9 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  -0.1
##  [2,] -0.1  0.9 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  -0.1
##  [3,] -0.1 -0.1  0.9 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  -0.1
##  [4,] -0.1 -0.1 -0.1  0.9 -0.1 -0.1 -0.1 -0.1 -0.1  -0.1
##  [5,] -0.1 -0.1 -0.1 -0.1  0.9 -0.1 -0.1 -0.1 -0.1  -0.1
##  [6,] -0.1 -0.1 -0.1 -0.1 -0.1  0.9 -0.1 -0.1 -0.1  -0.1
##  [7,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  0.9 -0.1 -0.1  -0.1
##  [8,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  0.9 -0.1  -0.1
##  [9,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1  0.9  -0.1
## [10,] -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1 -0.1   0.9

The product reproduces I - J, so the centering matrix is also idempotent.

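
The name "centering matrix" comes from what I - J does to data: it subtracts the mean from every observation. A small sketch under illustrative names (`x_demo`, `H_demo`, `X_demo` are assumptions, not objects from the notes), first on a single vector and then on the numeric columns of iris:

```r
x_demo <- c(2, 4, 6, 8, 10)
n_demo <- length(x_demo)
H_demo <- diag(n_demo) - matrix(1 / n_demo, n_demo, n_demo)  # I - J

# (I - J) x is the vector of deviations from the mean
centered <- as.vector(H_demo %*% x_demo)
print(centered)
print(x_demo - mean(x_demo))

# Same idea column by column on a data matrix
X_demo <- as.matrix(iris[, 1:4])
n_obs <- nrow(X_demo)
H_big <- diag(n_obs) - matrix(1 / n_obs, n_obs, n_obs)
X_centered <- H_big %*% X_demo
print(all.equal(X_centered, scale(X_demo, center = TRUE, scale = FALSE),
                check.attributes = FALSE))
```
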
# Load the iris data set (it ships with R by default)
data(iris)

# Show the first rows of the data set
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# Statistical summary of the data set
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
View(iris)  # opens the data viewer (interactive sessions only)
library(ggplot2)
# Box plot of sepal length by species
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
    geom_boxplot(outlier.shape = 16, outlier.size = 1) +
    labs(x = "Species", y = "Sepal length")

# Violin plot of sepal length by species
ggplot(iris, aes(x = Species, y = Sepal.Length, colour = Species)) +
    geom_violin() +
    labs(x = "Species", y = "Sepal length", colour = "Species") +
    theme(legend.position = "top", legend.direction = "horizontal")

\[ \bar{x_j} = \frac{1}{n}1'X_j = \frac{1}{n}X_j '1 \]

\[ \hat{\mu}=\frac{1}{n}X'1 \]

d <- iris[, 1:4]
# Mean vector from the matrix formula (1/n) X'1
med1 <- (1/nrow(d)) * t(as.matrix(d)) %*% rep(1, nrow(d))

med1
##                  [,1]
## Sepal.Length 5.843333
## Sepal.Width  3.057333
## Petal.Length 3.758000
## Petal.Width  1.199333
med2<-rep(0,4)
for(j in 1:4){
  med2[j]<-mean(d[,j])
}
med2
## [1] 5.843333 3.057333 3.758000 1.199333
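
Both computations above can be cross-checked against `colMeans()`, the built-in function for column means. A brief sketch, with `d_demo`, `mu_matrix` and `mu_builtin` as illustrative names:

```r
d_demo <- iris[, 1:4]
n_obs <- nrow(d_demo)

# Matrix formula (1/n) X'1, as in the notes
mu_matrix <- as.vector(t(as.matrix(d_demo)) %*% rep(1, n_obs)) / n_obs

# Built-in column means
mu_builtin <- colMeans(d_demo)

print(mu_builtin)
print(all.equal(mu_matrix, unname(mu_builtin)))
```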

Variance-covariance matrix

\[ S = \frac{1}{n-1}X'(I-J)X = (s_{i,j})_{i,j} \]

The divisor \(n-1\) matches the convention of R's var() and cov().

var(d)
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    0.6856935  -0.0424340    1.2743154   0.5162707
## Sepal.Width    -0.0424340   0.1899794   -0.3296564  -0.1216394
## Petal.Length    1.2743154  -0.3296564    3.1162779   1.2956094
## Petal.Width     0.5162707  -0.1216394    1.2956094   0.5810063
library(corrplot)
## corrplot 0.95 loaded
cov_matrix <- cov(d)
print(cov_matrix)
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    0.6856935  -0.0424340    1.2743154   0.5162707
## Sepal.Width    -0.0424340   0.1899794   -0.3296564  -0.1216394
## Petal.Length    1.2743154  -0.3296564    3.1162779   1.2956094
## Petal.Width     0.5162707  -0.1216394    1.2956094   0.5810063
cor(d)
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
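
The matrix expression for S can be checked directly against `var()`. The sketch below builds the centering matrix for the 150 iris observations and assumes the divisor n - 1, which is the convention `var()` and `cov()` use; the object names are illustrative. It then rescales S by the standard deviations to recover the correlation matrix.

```r
X <- as.matrix(iris[, 1:4])
n_obs <- nrow(X)

H <- diag(n_obs) - matrix(1 / n_obs, n_obs, n_obs)  # centering matrix I - J

# Assumption: divisor n - 1, the convention used by var() and cov()
S_formula <- t(X) %*% H %*% X / (n_obs - 1)
print(all.equal(S_formula, var(X), check.attributes = FALSE))

# Correlation matrix obtained by rescaling S with the standard deviations
D_inv <- diag(1 / sqrt(diag(S_formula)))
R_formula <- D_inv %*% S_formula %*% D_inv
print(all.equal(R_formula, cor(X), check.attributes = FALSE))
```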

Positive definiteness of the matrix S

eigen(var(d))
## eigen() decomposition
## $values
## [1] 4.22824171 0.24267075 0.07820950 0.02383509
## 
## $vectors
##             [,1]        [,2]        [,3]       [,4]
## [1,]  0.36138659 -0.65658877  0.58202985  0.3154872
## [2,] -0.08452251 -0.73016143 -0.59791083 -0.3197231
## [3,]  0.85667061  0.17337266 -0.07623608 -0.4798390
## [4,]  0.35828920  0.07548102 -0.54583143  0.7536574

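All eigenvalues above are strictly positive, so this variance-covariance matrix is positive definite. In general, a variance-covariance matrix is positive semidefinite.

Theorem.

A symmetric matrix is positive semidefinite if and only if it is the variance-covariance matrix of some random vector.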
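
The reason a variance-covariance matrix cannot have a negative quadratic form is that a'Sa is itself a variance: the sample variance of the linear combination Xa. A sketch of that identity on iris, where the coefficient vector `a` is an arbitrary illustrative choice:

```r
X <- as.matrix(iris[, 1:4])
S_demo <- var(X)

# Arbitrary coefficient vector (illustrative choice)
a <- c(1, -1, 2, 0.5)

# The quadratic form a'Sa is the sample variance of the combination Xa
quad_form <- as.numeric(t(a) %*% S_demo %*% a)
var_comb  <- var(as.vector(X %*% a))
print(c(quad_form, var_comb))

# Positive definiteness: all eigenvalues strictly positive
ev <- eigen(S_demo, symmetric = TRUE, only.values = TRUE)$values
print(all(ev > 0))
```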

corrplot(cor(d))

eigen(cor(d))
## eigen() decomposition
## $values
## [1] 2.91849782 0.91403047 0.14675688 0.02071484
## 
## $vectors
##            [,1]        [,2]       [,3]       [,4]
## [1,]  0.5210659 -0.37741762  0.7195664  0.2612863
## [2,] -0.2693474 -0.92329566 -0.2443818 -0.1235096
## [3,]  0.5804131 -0.02449161 -0.1421264 -0.8014492
## [4,]  0.5648565 -0.06694199 -0.6342727  0.5235971
sum(diag(cor(d)))
## [1] 4
tt<-eigen(cor(d))
sum(tt$values)
## [1] 4
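
The trace of the correlation matrix equals 4 because it is the covariance matrix of the standardized variables, each of which has variance 1, and there are p = 4 of them. A short sketch using `scale()` to standardize; `d_demo` and `Z` are illustrative names:

```r
d_demo <- iris[, 1:4]

# scale() centers each column and divides it by its standard deviation
Z <- scale(d_demo)

# Covariance of the standardized data versus correlation of the raw data
print(all.equal(cov(Z), cor(d_demo), check.attributes = FALSE))

# Each standardized variable has variance 1, so the trace equals p
print(sum(diag(cov(Z))))
print(ncol(d_demo))
```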

Total variance
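
The total variance is the trace of S, that is, the sum of the individual variances, and it also equals the sum of the eigenvalues of S. A sketch of both routes on the iris covariance matrix, with illustrative object names:

```r
S_demo <- var(iris[, 1:4])

# By definition: sum of the variances on the diagonal of S
total_var_trace <- sum(diag(S_demo))

# By the eigenvalue result: sum of the eigenvalues of S
total_var_eigen <- sum(eigen(S_demo, symmetric = TRUE)$values)

print(c(total_var_trace, total_var_eigen))
```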

# Generalized variance

det(var(d))
## [1] 0.00191273
prod(eigen(var(d))$values)
## [1] 0.00191273

First question of the midterm exam.

data("mtcars")
View(mtcars)
  1. Using the mtcars data set, compute the variance-covariance matrix.
  2. Using the mtcars data set, compute the correlation matrix and draw its corrplot.
  3. Compute the total variance by definition and by theorem.
  4. Compute the generalized variance by definition and by theorem.
  5. Standardize the data and compute the covariance matrix. How does the covariance matrix of the standardized data relate to the correlation matrix?
  6. Find the eigenvectors of the variance-covariance matrix and the projections of the columns onto those eigenvectors.
  7. Compute the mean vector.
  8. Compute the Mahalanobis distance matrix.
  9. Show that for symmetric matrices the following hold:

\[tr(A) = \sum_{i=1}^n \lambda_i\]

\[det(A) = \prod_{i=1}^n \lambda_i\]

  10. Show that: \[ S = \frac{1}{n-1}X'(I-J)X = (s_{i,j})_{i,j} \]
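
Item 8 mentions a Mahalanobis distance matrix, which these notes do not compute. The following is one possible sketch on iris (not on mtcars), assuming the matrix meant is the n x n matrix of pairwise distances between observations. It whitens the data with the Cholesky factor of S and then takes ordinary Euclidean distances; the object names are illustrative, and one entry is cross-checked against `mahalanobis()` from the stats package, which returns squared distances.

```r
X <- as.matrix(iris[, 1:4])
S_demo <- var(X)

# Whitening: with S = R'R (Cholesky), the rows of X R^{-1} have identity
# covariance, so Euclidean distances between them are Mahalanobis distances
R_chol <- chol(S_demo)
Z <- X %*% solve(R_chol)
D_mahal <- as.matrix(dist(Z))  # n x n matrix of pairwise distances

# Cross-check one entry with mahalanobis(), which returns the squared distance
d12_sq <- mahalanobis(X[1, ], center = X[2, ], cov = S_demo)
print(c(D_mahal[1, 2]^2, d12_sq))
```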