Site: https://archive.ics.uci.edu/dataset/583/chemical+composition+of+ceramic+samples
Ceramic.Name: name of ceramic types from Longquan and Jingdezhen
Part: a binary categorical variable (‘Body’ or ‘Glaze’)
Na2O: percentage of Na2O (wt%)
MgO: percentage of MgO (wt%)
Al2O3: percentage of Al2O3 (wt%)
SiO2: percentage of SiO2 (wt%)
K2O: percentage of K2O (wt%)
CaO: percentage of CaO (wt%)
TiO2: percentage of TiO2 (wt%)
Fe2O3: percentage of Fe2O3 (wt%)
MnO: concentration of MnO (ppm)
CuO: concentration of CuO (ppm)
ZnO: concentration of ZnO (ppm)
PbO2: concentration of PbO2 (ppm)
Rb2O: concentration of Rb2O (ppm)
SrO: concentration of SrO (ppm)
Y2O3: concentration of Y2O3 (ppm)
ZrO2: concentration of ZrO2 (ppm)
P2O5: concentration of P2O5 (ppm)
Focus variables:
df <- read.csv("Chemical Composion of Ceramic.csv")
library(dplyr)
num <- df |> select(MnO, CuO, ZrO2, P2O5)
hist(num$MnO)
qqnorm(num$MnO)
qqline(num$MnO, col = "red")
hist(num$CuO)
qqnorm(num$CuO)
qqline(num$CuO, col = "red")
Skewed, non-normal
Closer to normality
hist(num$ZrO2)
qqnorm(num$ZrO2)
qqline(num$ZrO2, col = "red")
Skewed, non-normal
Closer to normality
hist(num$P2O5)
qqnorm(num$P2O5)
qqline(num$P2O5, col = "red")
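The visual impressions from the histograms and Q-Q plots can be complemented by a formal test. The following is a minimal sketch, assuming the Shapiro-Wilk test (base R `shapiro.test()`) is an acceptable choice for these variables; the helper name `normality_pvalues` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: Shapiro-Wilk p-value for each column of a numeric data frame.
# shapiro.test() requires between 3 and 5000 non-missing values per column.
normality_pvalues <- function(X) {
  vapply(X, function(col) shapiro.test(col)$p.value, numeric(1))
}
```

Calling `normality_pvalues(num)` would return one p-value per focus variable; small values point to a departure from normality, in line with what a skewed histogram suggests.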
plot(num)
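Let's take a step back and recall the “statistical distance”:
\[ (x-\mu)'\Sigma^{-1}(x-\mu) \]
We know that if \(X\sim N_p(\mu, \Sigma)\), then:
\[ (X-\mu)'\Sigma^{-1}(X-\mu) \sim \chi^2_{p} \]
so the region \((x-\mu)'\Sigma^{-1}(x-\mu) \le \chi^2_{p}(\alpha)\) contains probability \(1-\alpha\), where \(\chi^2_{p}(\alpha)\) denotes the upper \(100\alpha\)th percentile of the \(\chi^2_{p}\) distribution.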
xbar <- colMeans(num)
xbar_vec <- matrix(xbar, ncol = 1)
rownames(xbar_vec) <- names(xbar)
cov_mat <- cov(num) # estimated covariance matrix (S)
S_inv <- solve(cov_mat) # inverse of the estimated covariance matrix
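With the sample mean and the inverse covariance estimate in hand, the squared statistical distance of every observation can be computed and compared against chi-square quantiles. The following is a minimal sketch under that assumption, not necessarily the route this analysis takes; the helper names `stat_distance` and `chisq_qq` are hypothetical, and the computation relies on base R's `mahalanobis()`, which returns squared distances.

```r
# Hypothetical helper: squared statistical distance of each row from the sample mean.
stat_distance <- function(X) {
  X <- as.matrix(X)
  xbar <- colMeans(X)  # sample mean vector
  S <- cov(X)          # sample covariance matrix
  # mahalanobis() returns (x - xbar)' S^{-1} (x - xbar) for every row
  mahalanobis(X, center = xbar, cov = S)
}

# Hypothetical helper: chi-square Q-Q plot of the squared distances.
# Under multivariate normality the points should lie roughly on the red line.
chisq_qq <- function(d2, p) {
  n <- length(d2)
  q <- qchisq((seq_len(n) - 0.5) / n, df = p)  # theoretical quantiles
  plot(q, sort(d2),
       xlab = "Chi-square quantiles", ylab = "Ordered squared distances")
  abline(0, 1, col = "red")
  invisible(q)
}
```

For the focus variables this would be used as `d2 <- stat_distance(num)` followed by `chisq_qq(d2, p = ncol(num))`. Recomputing the mean and covariance inside the helper keeps it self-contained; it should agree with a manual calculation based on `xbar` and `S_inv`.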