Labels: company code (Cod_Emp)
Indicators: average collection period for sales (PMRV, in days); indebtedness (Endividamento, in %); sales (Vendas, in R$ thousand); net sales margin (Margem_liquida, in %).
rm(list=ls(all=TRUE))  # clear the workspace
library(psych)     # principal(), factor-analysis helpers
library(MVN)       # multivariate normality tests
library(MVA)       # multivariate analysis utilities
library(corrplot)  # correlation matrix plots
library(Hmisc)     # describe(), rcorr()
library(readxl)    # read_excel()
library(robustX)   # mvBACON() multivariate outlier detection
library(ggplot2)   # graphics
# set the path to the data file as needed
Indic_finan_Fatorial <- read_excel("Indic_finan_Fatorial.xls")
#head(Indic_finan_Fatorial)
dados <- Indic_finan_Fatorial[, 2:5]  # keep the four indicators, drop Cod_Emp
head(dados)
# Descriptive statistics (Hmisc::describe output shown below)
describe(dados)
dados

 4  Variables      45  Observations
-------------------------------------------------------------------------------------------
PMRV
      n missing  unique   Info    Mean     .05     .10     .25     .50     .75     .90     .95
     45       0      38      1   53.13   12.20   15.41   25.68   49.22   82.39   96.30   97.16

lowest : 6.42 7.49 11.77 13.91 14.98, highest: 95.23 96.30 97.37 99.51 99.80
-------------------------------------------------------------------------------------------
Endividamento
      n missing  unique   Info    Mean     .05     .10     .25     .50     .75     .90     .95
     45       0      44      1   31.71   16.46   17.68   22.00   29.75   39.00   48.49   52.49

lowest : 14.77 15.09 16.26 17.23 17.55, highest: 48.58 50.61 52.97 53.50 69.44
-------------------------------------------------------------------------------------------
Vendas
      n missing  unique   Info    Mean     .05     .10     .25     .50     .75     .90     .95
     45       0      45      1    3989    2080    2236    2921    3719    4770    6167    6765

lowest : 1981 2003 2074 2106 2129, highest: 6390 6637 6797 6953 9641
-------------------------------------------------------------------------------------------
Margem_liquida
      n missing  unique   Info    Mean     .05     .10     .25     .50     .75     .90     .95
     45       0      37      1   13.22   8.736   9.138  10.165  13.054  16.906  17.848  18.062

lowest : 8.453 8.560 8.700 8.881 9.095, highest: 17.240 17.655 17.976 18.083 18.190
-------------------------------------------------------------------------------------------
boxplot(dados)
dados.pad <- scale(dados)  # standardize (mean 0, sd 1) so the indicators are comparable
boxplot(dados.pad)
The box plots allow the identification of possible (univariate) outliers. One may consider excluding these observations, but be careful!
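As a quick screen, the standardized data computed above can be used to flag, rather than drop, extreme values; the cutoff of 3 below is an illustrative rule of thumb and is not part of the original analysis.
# Flag (do not drop) cells with |z| > 3 on any indicator.
# The threshold of 3 is an assumed rule of thumb, used here only for illustration.
which(abs(dados.pad) > 3, arr.ind = TRUE)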
# outliers
# Billor, N., Hadi, A. S., and Velleman , P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators; Computational Statistics and Data Analysis 34, 279–298.
result1 <- mvBACON(dados) # the function returns its results in a list
MV-BACON (subset no. 1): 16 of 45 (35.56 %)
MV-BACON (subset no. 2): 38 of 45 (84.44 %)
MV-BACON (subset no. 3): 40 of 45 (88.89 %)
MV-BACON (subset no. 4): 41 of 45 (91.11 %)
MV-BACON (subset no. 5): 41 of 45 (91.11 %)
result2 <- data.frame(pos=c(1:length(result1[[1]])),result1[[1]])
result2
# plot with ggplot2
dados2 <- data.frame(Cod_Emp = Indic_finan_Fatorial$Cod_Emp, bacon = result1[[1]])
ggplot(data = dados2, aes(x = Cod_Emp, y = bacon)) +
  geom_point() +
  geom_text(aes(label = Cod_Emp)) +
  geom_hline(yintercept = result1$limit)
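To list the nominated companies by code, the named components of the mvBACON result can be used; a minimal sketch, assuming the returned list contains the logical component `subset` (TRUE for observations in the final "good" subset), as documented in ?mvBACON:
# Companies outside the final BACON subset (candidate multivariate outliers).
# Assumes result1$subset is the logical inclusion indicator -- check ?mvBACON.
Indic_finan_Fatorial$Cod_Emp[!result1$subset]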
# Normality checks
# http://www.biosoft.hacettepe.edu.tr/MVN/
# univariate tests
uniPlot(dados, type = "qqplot") # creates univariate Q-Q plots
uniPlot(dados, type = "histogram") # creates univariate histograms
# univariate normality tests:
# SW: Shapiro-Wilk,
# CVM: Cramer-von Mises,
# Lillie: Lilliefors (Kolmogorov-Smirnov),
# SF: Shapiro-Francia,
# AD: Anderson-Darling
uniNorm(dados, type = "SW", desc=F)
$`Descriptive Statistics`
NULL
$`Shapiro-Wilk's Normality Test`
uniNorm(dados, type = "CVM" , desc = F)
$`Descriptive Statistics`
NULL
$`Cramer-von Mises's Normality Test`
uniNorm(dados, type = "Lillie", desc = F)
$`Descriptive Statistics`
NULL
$`Lilliefors (Kolmogorov-Smirnov)'s Normality Test`
uniNorm(dados, type = "SF", desc = F)
$`Descriptive Statistics`
NULL
$`Shapiro-Francia's Normality Test`
uniNorm(dados, type = "AD", desc = F)
$`Descriptive Statistics`
NULL
$`Anderson-Darling's Normality Test`
# multivariate tests
hzTest(dados)
Henze-Zirkler's Multivariate Normality Test
---------------------------------------------
data : dados
HZ : 1.2577
p-value : 0.0001032643
Result : Data are not multivariate normal.
---------------------------------------------
roystonTest(dados)
Royston's Multivariate Normality Test
---------------------------------------------
data : dados
H : 35.05432
p-value : 4.837587e-07
Result : Data are not multivariate normal.
---------------------------------------------
mardiaTest(dados)
Mardia's Multivariate Normality Test
---------------------------------------
data : dados
g1p : 4.203704
chi.skew : 31.52778
p.value.skew : 0.04859737
g2p : 23.68478
z.kurtosis : -0.1526065
p.value.kurt : 0.8787086
chi.small.skew : 34.53042
p.value.small : 0.02275265
Result : Data are not multivariate normal.
---------------------------------------
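Note that uniPlot(), uniNorm(), hzTest(), roystonTest(), and mardiaTest() come from older releases of the MVN package; from version 4.0 onward they were merged into a single mvn() function. A minimal sketch of equivalent calls, assuming a 5.x release (argument and component names have changed across versions, so check ?mvn):
# Unified interface in newer MVN releases (argument names follow MVN 5.x).
res <- mvn(dados, mvnTest = "hz", univariateTest = "SW")
res$multivariateNormality   # Henze-Zirkler test
res$univariateNormality     # Shapiro-Wilk per variable
mvn(dados, mvnTest = "royston")$multivariateNormality
mvn(dados, mvnTest = "mardia")$multivariateNormality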
# correlations (Pearson coefficients and p-values via Hmisc::rcorr)
mcor <- rcorr(as.matrix(dados))
mcor
PMRV Endividamento Vendas Margem_liquida
PMRV 1.00 0.23 0.63 0.60
Endividamento 0.23 1.00 0.24 -0.10
Vendas 0.63 0.24 1.00 0.58
Margem_liquida 0.60 -0.10 0.58 1.00
n= 45
P
PMRV Endividamento Vendas Margem_liquida
PMRV 0.1210 0.0000 0.0000
Endividamento 0.1210 0.1148 0.5227
Vendas 0.0000 0.1148 0.0000
Margem_liquida 0.0000 0.5227 0.0000
R <- cor(dados)
corrplot(R, method="number",type="upper", order = "hclust", tl.srt = 45)
# corrplot(R, method="circle",type="full", order = "hclust", tl.srt = 45)
Note the high correlations among Vendas, PMRV, and Margem_liquida (p-values < 5%), and a sizeable number of correlations above 0.30. Multivariate normality could not be established, but we can use extraction methods that do not depend on it, such as principal components.
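Since multivariate normality was rejected, a rank-based correlation matrix is a reasonable cross-check on the Pearson coefficients above; Hmisc::rcorr supports this directly (a sketch, not part of the original script):
# Spearman correlations: a rank-based check that does not assume normality.
rcorr(as.matrix(dados), type = "spearman")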
################################################################
#Partial correlation matrix
################################################################
partial.cor <- function (x)
{
  R  <- cor(x)
  RI <- solve(R)            # inverse of the correlation matrix
  D  <- 1/sqrt(diag(RI))
  Rp <- -RI * (D %o% D)     # partial correlations
  diag(Rp) <- 0
  rownames(Rp) <- colnames(Rp) <- colnames(x)
  Rp
}
mat_anti_imagem <- -partial.cor(dados)  # anti-image correlations (negatives of the partial correlations)
mat_anti_imagem
PMRV Endividamento Vendas Margem_liquida
PMRV 0.0000000 -0.2520021 -0.3380240 -0.4265296
Endividamento -0.2520021 0.0000000 -0.2462634 0.3685448
Vendas -0.3380240 -0.2462634 0.0000000 -0.3869238
Margem_liquida -0.4265296 0.3685448 -0.3869238 0.0000000
corrplot(mat_anti_imagem, method="number",type="upper", order = "hclust", tl.srt = 45)
################################################################
# The Bartlett's test statistic indicates to what extent we deviate from the reference situation |R| = 1.
################################################################
Bartlett.sphericity.test <- function(x)
{
  method <- "Bartlett's test of sphericity"
  data.name <- deparse(substitute(x))
  x <- subset(x, complete.cases(x))  # omit missing values
  n <- nrow(x)
  p <- ncol(x)
  chisq <- (1-n+(2*p+5)/6)*log(det(cor(x)))  # equals -(n - 1 - (2p+5)/6) * ln|R|
  df <- p*(p-1)/2
  p.value <- pchisq(chisq, df, lower.tail=FALSE)
  names(chisq) <- "X-squared"
  names(df) <- "df"
  return(structure(list(statistic=chisq, parameter=df, p.value=p.value,
                        method=method, data.name=data.name), class="htest"))
}
Bartlett.sphericity.test(dados)
Bartlett's test of sphericity
data: dados
X-squared = 53.165, df = 6, p-value = 1.087e-09
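The same statistic is available in the already-loaded psych package and can serve as a cross-check of the hand-written function (a sketch, not used in the original script):
# Bartlett's sphericity test via psych; should agree with the function above.
cortest.bartlett(cor(dados), n = nrow(dados))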
################################################################
# KMO index
################################################################
kmo <- function(x)
{
  x <- subset(x, complete.cases(x))   # Omit missing values
  r <- cor(x)                         # Correlation matrix
  r2 <- r^2                           # Squared correlation coefficients
  i <- solve(r)                       # Inverse of the correlation matrix
  d <- diag(i)                        # Diagonal elements of the inverse
  p2 <- (-i/sqrt(outer(d, d)))^2      # Squared partial correlation coefficients
  diag(r2) <- diag(p2) <- 0           # Delete diagonal elements
  KMO <- sum(r2)/(sum(r2)+sum(p2))    # Overall measure of sampling adequacy
  MSA <- colSums(r2)/(colSums(r2)+colSums(p2))  # Per-variable MSA
  return(list(KMO=KMO, MSA=MSA))
}
kmo(dados)
$KMO
[1] 0.6309025
$MSA
PMRV Endividamento Vendas Margem_liquida
0.6909296 0.3183487 0.7071779 0.6008364
The KMO value (0.63) makes the application of factor analysis reasonable, and Bartlett's sphericity test rejects the hypothesis that the correlation matrix is an identity matrix.
The MSA values are adequate, except for the variable Endividamento (0.318). If its communality turns out to be high, this variable may constitute a factor on its own.
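For comparison, the already-loaded psych package provides the overall KMO and per-variable MSA through KMO(); a quick cross-check sketch:
# KMO/MSA via psych; the values should match the hand-written kmo() above.
KMO(dados)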
n.dados <- ncol(dados)  # extract as many components as there are variables
fit <- principal(dados, nfactors=n.dados, rotate="none")
print(fit, sort=T)
Principal Components Analysis
Call: principal(r = dados, nfactors = n.dados, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
item PC1 PC2 PC3 PC4 h2 u2 com
PMRV 1 0.88 0.04 0.40 -0.27 1 1.1e-16 1.6
Vendas 3 0.87 0.06 -0.46 -0.17 1 4.4e-16 1.6
Margem_liquida 4 0.81 -0.43 0.06 0.41 1 0.0e+00 2.1
Endividamento 2 0.27 0.94 0.04 0.21 1 3.3e-16 1.3
PC1 PC2 PC3 PC4
SS loadings 2.24 1.07 0.38 0.31
Proportion Var 0.56 0.27 0.09 0.08
Cumulative Var 0.56 0.83 0.92 1.00
Proportion Explained 0.56 0.27 0.09 0.08
Cumulative Proportion 0.56 0.83 0.92 1.00
Mean item complexity = 1.6
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0
with the empirical chi square 0 with prob < NA
Fit based upon off diagonal values = 1
#fit$values
#fit$scores
#fit$weights
#fit$loadings
#fit$communality
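Before fixing the number of factors, a scree plot of the eigenvalues (already stored in fit$values) is a common aid; the reference line at 1 reflects the Kaiser criterion and is added here only for illustration.
# Scree plot of the eigenvalues of the correlation matrix.
plot(fit$values, type = "b", xlab = "Component", ylab = "Eigenvalue")
abline(h = 1, lty = 2)  # Kaiser criterion (eigenvalue > 1), illustrative reference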
# Choose the number of factors (two, in this case)
fit1 <- principal(dados, nfactors=2, rotate="none")
print(fit1, sort=T)
Principal Components Analysis
Call: principal(r = dados, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
item PC1 PC2 h2 u2 com
PMRV 1 0.88 0.04 0.77 0.231 1.0
Vendas 3 0.87 0.06 0.76 0.242 1.0
Margem_liquida 4 0.81 -0.43 0.83 0.169 1.5
Endividamento 2 0.27 0.94 0.96 0.045 1.2
PC1 PC2
SS loadings 2.24 1.07
Proportion Var 0.56 0.27
Cumulative Var 0.56 0.83
Proportion Explained 0.68 0.32
Cumulative Proportion 0.68 1.00
Mean item complexity = 1.2
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.09
with the empirical chi square 4.29 with prob < NA
Fit based upon off diagonal values = 0.96
#fit1$values
#fit1$scores
#fit1$weights
#fit1$loadings
#fit1$communality
Looking at the communalities, the variables are strongly related to the extracted factors. Endividamento has a high communality, so it can reasonably be treated as a factor of its own. The two extracted factors explain about 82.8% of the total variability in the data. Factor 1 is dominated by PMRV, Vendas, and Margem_liquida, while Factor 2 is dominated by Endividamento. Suggested names: Factor 1, business volume (revenue); Factor 2, capital structure.
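The 82.8% figure can be verified directly from the eigenvalues, which sum to the number of variables (4):
# Share of the total variance retained by the first two components.
sum(fit1$values[1:2]) / sum(fit1$values)   # approximately 0.828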
biplot(fit1)
The biplot shows the relationship between the variables and the factors; here fit1 is the unrotated solution (the Varimax-rotated solution is plotted further below). Endividamento has a high loading on Factor 2, while the remaining variables load on Factor 1.
# varimax rotation
fit2 <- principal(dados, nfactors=2, rotate="varimax")
print(fit2, sort=T)
Principal Components Analysis
Call: principal(r = dados, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
item RC1 RC2 h2 u2 com
Margem_liquida 4 0.87 -0.26 0.83 0.169 1.2
PMRV 1 0.85 0.21 0.77 0.231 1.1
Vendas 3 0.84 0.23 0.76 0.242 1.1
Endividamento 2 0.08 0.97 0.96 0.045 1.0
RC1 RC2
SS loadings 2.20 1.12
Proportion Var 0.55 0.28
Cumulative Var 0.55 0.83
Proportion Explained 0.66 0.34
Cumulative Proportion 0.66 1.00
Mean item complexity = 1.1
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.09
with the empirical chi square 4.29 with prob < NA
Fit based upon off diagonal values = 0.96
#fit2$values
#fit2$scores
#fit2$weights
#fit2$loadings
#fit2$communality
biplot(fit1)  # unrotated solution
biplot(fit2)  # Varimax-rotated solution
#########
# weighted-sum criterion and ranking
# company ranking / performance indicator
# computed from the Varimax-rotated solution
fit2$values
[1] 2.2424292 1.0708642 0.3757126 0.3109940
peso1 <- fit2$values[1]/sum(fit2$values)  # weight of factor 1 (share of total variance)
peso2 <- fit2$values[2]/sum(fit2$values)  # weight of factor 2
classificacao <- fit2$scores[,1]*peso1 + fit2$scores[,2]*peso2
desempenho_emp <- cbind.data.frame(Indic_finan_Fatorial, classificacao)
desempenho_emp[order(desempenho_emp$classificacao, decreasing = TRUE), ]
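If the ranking is to be reported or reused, it can be stored and exported; a minimal sketch (the object name `ranking` and the file name are illustrative):
# Store the ordered ranking, inspect the ten best-ranked companies, and export.
ranking <- desempenho_emp[order(desempenho_emp$classificacao, decreasing = TRUE), ]
head(ranking, 10)
write.csv2(ranking, "ranking_empresas.csv", row.names = FALSE)  # file name is an arbitrary example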