dados <- read_excel("dados.xlsx", sheet = "dados")
dados$trat <- as.factor(dados$trat)
levels(dados$trat) <- c("50ml","100ml","200ml")
with(dados,fat2.dic(cult,trat,alt, fac.names=c("Cultivar","Tratamento")))
------------------------------------------------------------------------
Legenda:
FATOR 1: Cultivar
FATOR 2: Tratamento
------------------------------------------------------------------------
Quadro da analise de variancia
------------------------------------------------------------------------
GL SQ QM Fc Pr>Fc
Cultivar 4 115.731 28.9328 38.958 0.000000
Tratamento 2 24.153 12.0763 16.261 0.000175
Cultivar*Tratamento 8 11.191 1.3988 1.884 0.138309
Residuo 15 11.140 0.7427
Total 29 162.215
------------------------------------------------------------------------
CV = 9.35 %
------------------------------------------------------------------------
Teste de normalidade dos residuos (Shapiro-Wilk)
valor-p: 0.8347244
De acordo com o teste de Shapiro-Wilk a 5% de significancia, os residuos podem ser considerados normais.
------------------------------------------------------------------------
Interacao nao significativa: analisando os efeitos simples
------------------------------------------------------------------------
Cultivar
Teste de Tukey
------------------------------------------------------------------------
Grupos Tratamentos Medias
a CGH - 271 12.7
b CGH - 298 9.166667
b CGH - 227 8.916667
b CGH - 328 8.666667
c CGH - 323 6.616667
------------------------------------------------------------------------
Tratamento
Teste de Tukey
------------------------------------------------------------------------
Grupos Tratamentos Medias
a 100ml 10.42
b 200ml 8.95
b 50ml 8.27
------------------------------------------------------------------------
Gráfico
#
dados %>% ggplot() + geom_bar(aes(reorder(cult,-alt), alt, fill = trat),
stat = "identity",
position = "dodge") + theme_classic() +xlab("Cultivar") + ylab("Altura (cm)") + scale_fill_discrete(name="Tratamento")
sum_my_rise(dados, alt, cult, trat) %>% ggplot() +
aes(x=trat, y=mean, fill=cult) +
geom_bar(stat = "identity", position = "dodge") +
xlab('') + ylab('Média de Altura (cm)') +
scale_fill_discrete(name="Cultivar")+
geom_errorbar(aes(ymin = mean - se,
ymax = mean + se),
width = 0.3,
size = 0.9,
position = pd,
color = "black")+
theme(
axis.text.x = element_text(
angle = 45,
hjust = 1,
colour = "black")) +theme_classic()
with(dados,fat2.dic(cult,trat,nf, fac.names=c("Cultivar","Tratamento")))
------------------------------------------------------------------------
Legenda:
FATOR 1: Cultivar
FATOR 2: Tratamento
------------------------------------------------------------------------
Quadro da analise de variancia
------------------------------------------------------------------------
GL SQ QM Fc Pr>Fc
Cultivar 4 518.47 129.617 82.734 0.00000
Tratamento 2 4.07 2.033 1.298 0.30208
Cultivar*Tratamento 8 14.93 1.867 1.191 0.36598
Residuo 15 23.50 1.567
Total 29 560.97
------------------------------------------------------------------------
CV = 11.34 %
------------------------------------------------------------------------
Teste de normalidade dos residuos (Shapiro-Wilk)
valor-p: 0.04977834
ATENCAO: a 5% de significancia, os residuos nao podem ser considerados normais!
------------------------------------------------------------------------
Interacao nao significativa: analisando os efeitos simples
------------------------------------------------------------------------
Cultivar
Teste de Tukey
------------------------------------------------------------------------
Grupos Tratamentos Medias
a CGH - 227 16.66667
a CGH - 271 15.5
b CGH - 298 8.333333
b CGH - 323 7.5
b CGH - 328 7.166667
------------------------------------------------------------------------
Tratamento
De acordo com o teste F, as medias desse fator sao estatisticamente iguais.
------------------------------------------------------------------------
Niveis Medias
1 100ml 11.5
2 200ml 11.0
3 50ml 10.6
------------------------------------------------------------------------
# Gráficos
#
dados %>% ggplot() + geom_bar(aes(reorder(cult,-nf), nf, fill = trat),
stat = "identity",
position = "dodge") + theme_classic() +xlab("Cultivar") + ylab("Número de Folhas") + scale_fill_discrete(name="Tratamento")
sum_my_rise(dados, nf, cult, trat) %>% ggplot() +
aes(x=trat, y=mean, fill=cult) +
geom_bar(stat = "identity", position = "dodge") +
xlab('') + ylab('Média de N. de Folhas') +
scale_fill_discrete(name="Cultivar")+
geom_errorbar(aes(ymin = mean - se,
ymax = mean + se),
width = 0.3,
size = 0.9,
position = pd,
color = "black")+
theme(
axis.text.x = element_text(
angle = 45,
hjust = 1,
colour = "black")) +theme_classic()
a<-aggregate(nf ~ cult + trat, dados, mean)
b<-aggregate(alt ~ cult + trat, dados, mean)
dados2 <- merge(a,b)
dados2$inter <- interaction(dados2$cult,dados2$trat)
rownames(dados2) <- dados2$inter
#levels(dados2$trat) <- c("50","100","200")
dados2 <- dados2[,c(3:4)]
names(dados2) <- c("N. Folhas","Altura")
dados2 %>% kable
| N. Folhas | Altura | |
|---|---|---|
| CGH - 227.100ml | 15.5 | 9.75 |
| CGH - 227.200ml | 17.0 | 8.50 |
| CGH - 227.50ml | 17.5 | 8.50 |
| CGH - 271.100ml | 16.5 | 14.50 |
| CGH - 271.200ml | 15.5 | 12.75 |
| CGH - 271.50ml | 14.5 | 10.85 |
| CGH - 298.100ml | 9.0 | 9.25 |
| CGH - 298.200ml | 8.5 | 10.00 |
| CGH - 298.50ml | 7.5 | 8.25 |
| CGH - 323.100ml | 8.5 | 8.35 |
| CGH - 323.200ml | 6.5 | 5.50 |
| CGH - 323.50ml | 7.5 | 6.00 |
| CGH - 328.100ml | 8.0 | 10.25 |
| CGH - 328.200ml | 7.5 | 8.00 |
| CGH - 328.50ml | 6.0 | 7.75 |
#dados2$mL <- as.numeric(dados2$mL)
Dimensionamento e padronização
df <- scale(dados2)
df %>% kable()
| N. Folhas | Altura | |
|---|---|---|
| CGH - 227.100ml | 1.0194990 | 0.2310403 |
| CGH - 227.200ml | 1.3618680 | -0.3070971 |
| CGH - 227.50ml | 1.4759910 | -0.3070971 |
| CGH - 271.100ml | 1.2477450 | 2.2759626 |
| CGH - 271.200ml | 1.0194990 | 1.5225702 |
| CGH - 271.50ml | 0.7912529 | 0.7046013 |
| CGH - 298.100ml | -0.4641003 | 0.0157854 |
| CGH - 298.200ml | -0.5782233 | 0.3386678 |
| CGH - 298.50ml | -0.8064693 | -0.4147246 |
| CGH - 323.100ml | -0.5782233 | -0.3716736 |
| CGH - 323.200ml | -1.0347154 | -1.5986269 |
| CGH - 323.50ml | -0.8064693 | -1.3833720 |
| CGH - 328.100ml | -0.6923463 | 0.4462953 |
| CGH - 328.200ml | -0.8064693 | -0.5223521 |
| CGH - 328.50ml | -1.1488384 | -0.6299796 |
Número ótimo de clusters
# Silhueta média para kmeans
fviz_nbclust(df, kmeans, method = "silhouette")
# Estatística de lacunas
fviz_nbclust(df, kmeans, method = "gap_stat")
# Método Elbow para kmeans
fviz_nbclust(df, kmeans, method = "wss") + geom_vline(xintercept = 6, linetype = 2)
nbclust_out <- NbClust(
data = df,
distance = "euclidean",
min.nc = 2,
max.nc = 8,
method = "kmeans"
)
*** : The Hubert index is a graphical method of determining the number of clusters.
In the plot of Hubert index, we seek a significant knee that corresponds to a
significant increase of the value of the measure i.e the significant peak in Hubert
index second differences plot.
*** : The D index is a graphical method of determining the number of clusters.
In the plot of D index, we seek a significant knee (the significant peak in Dindex
second differences plot) that corresponds to a significant increase of the value of
the measure.
*******************************************************************
* Among all indices:
* 4 proposed 2 as the best number of clusters
* 3 proposed 3 as the best number of clusters
* 2 proposed 4 as the best number of clusters
* 4 proposed 5 as the best number of clusters
* 6 proposed 6 as the best number of clusters
* 3 proposed 7 as the best number of clusters
* 1 proposed 8 as the best number of clusters
***** Conclusion *****
* According to the majority rule, the best number of clusters is 6
*******************************************************************
nbclust_plot <- data.frame(clusters = nbclust_out$Best.nc[1, ])
nbclust_plot <- subset(nbclust_plot, clusters >= 2 & clusters <= 8)
ggplot(nbclust_plot) +
aes(x = clusters) +
geom_histogram(bins = 30L, fill = "#0c4c8a") +
labs(x = "Number of clusters", y = "Frequency among all indices", title = "Optimal number of clusters") +
theme_minimal()
Clusterização k-means
set.seed(123)
km.res=kmeans(df, 6, nstart=25)
print(km.res)
K-means clustering with 6 clusters of sizes 2, 2, 4, 3, 2, 2
Cluster means:
N. Folhas Altura
1 -0.9205924 -1.4909995
2 1.4189295 -0.3070971
3 -0.8350001 -0.4846825
4 -0.5782233 0.2669162
5 0.9053760 0.4678208
6 1.1336220 1.8992664
Clustering vector:
CGH - 227.100ml CGH - 227.200ml CGH - 227.50ml CGH - 271.100ml CGH - 271.200ml
5 2 2 6 6
CGH - 271.50ml CGH - 298.100ml CGH - 298.200ml CGH - 298.50ml CGH - 323.100ml
5 4 4 3 3
CGH - 323.200ml CGH - 323.50ml CGH - 328.100ml CGH - 328.200ml CGH - 328.50ml
1 1 4 3 3
Within cluster sum of squares by cluster:
[1] 0.049215479 0.006512032 0.206252166 0.126439986 0.138178111 0.309848191
(between_SS / total_SS = 97.0 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
Criando novo banco de dados com cluster
aggregate(dados2, by=list(cluster=km.res$cluster), mean)
dd <- cbind(dados2, cluster = km.res$cluster)
km.res$cluster
CGH - 227.100ml CGH - 227.200ml CGH - 227.50ml CGH - 271.100ml CGH - 271.200ml
5 2 2 6 6
CGH - 271.50ml CGH - 298.100ml CGH - 298.200ml CGH - 298.50ml CGH - 323.100ml
5 4 4 3 3
CGH - 323.200ml CGH - 323.50ml CGH - 328.100ml CGH - 328.200ml CGH - 328.50ml
1 1 4 3 3
km.res$size
[1] 2 2 4 3 2 2
km.res$centers
N. Folhas Altura
1 -0.9205924 -1.4909995
2 1.4189295 -0.3070971
3 -0.8350001 -0.4846825
4 -0.5782233 0.2669162
5 0.9053760 0.4678208
6 1.1336220 1.8992664
Vizualizando os clusters
fviz_cluster(km.res, data=df,
geom.ind = c("text"),
ellipse.type="euclid",
star.plot=TRUE,
palette = "Dark2",
repel=TRUE,
ggtheme=theme_minimal()
)
Dendrograma
dista=dist(df, method="euclidean")
dista.hc=hclust(d=dista, method="ward.D")
fviz_dend(dista.hc, cex=0.5, k = 6, color_labels_by_k = TRUE, horiz = T)
res <- hcut(df, k = 6, stand = TRUE)
fviz_dend(res, rect = TRUE, cex = 0.5,
k_colors = "Dark2", horiz = T)
km.res1 <- hkmeans(df, 6,hc.metric = "euclid" ,hc.method = "ward.D")
fviz_dend(km.res1, cex = 0.6, palette = "Dark2",
rect = TRUE, rect_border = "Dark2", rect_fill = TRUE, horiz = T)
fviz_dist(dista, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))
km.pca <- PCA(
df,
graph = F,
scale.unit = TRUE)
eig.val <- get_eigenvalue(km.pca)
eig.val
eigenvalue variance.percent cumulative.variance.percent
Dim.1 1.5904037 79.52018 79.52018
Dim.2 0.4095963 20.47982 100.00000
fviz_eig(km.pca, addlabels=TRUE)
var <- get_pca_var(km.pca)
var
Principal Component Analysis Results for variables
===================================================
Name Description
1 "$coord" "Coordinates for the variables"
2 "$cor" "Correlations between variables and dimensions"
3 "$cos2" "Cos2 for the variables"
4 "$contrib" "contributions of the variables"
$coord
$cor
$cos2
$contrib
Coordinates for the variables
Correlations between variables and dimensions
Cos2 for the variables
contributions of the variables
# Coordenadas
head(var$coord)
Dim.1 Dim.2
N. Folhas 0.8917409 -0.4525463
Altura 0.8917409 0.4525463
# Cos2: qualidade no mapa do fator
head(var$cos2)
Dim.1 Dim.2
N. Folhas 0.7952018 0.2047982
Altura 0.7952018 0.2047982
# Contribuições para os componentes principais
head(var$contrib)
Dim.1 Dim.2
N. Folhas 50 50
Altura 50 50
fviz_cos2(km.pca, choice = "var", axes = 1:2)
df %>% cor(method = "spearman") %>% corrplot(.,
method = "number",
type = "upper",
tl.pos = "td")
summary(km.pca)
Call:
PCA(X = df, scale.unit = TRUE, graph = F)
Eigenvalues
Dim.1 Dim.2
Variance 1.59 0.41
% of var. 79.52 20.48
Cumulative % of var. 79.52 100.00
Individuals (the 10 first)
Dist Dim.1 ctr cos2 Dim.2 ctr cos2
CGH - 227.100ml | 1.082 | 0.915 3.512 0.716 | -0.577 5.421 0.284 |
CGH - 227.200ml | 1.445 | 0.772 2.498 0.285 | -1.222 24.287 0.715 |
CGH - 227.50ml | 1.561 | 0.856 3.068 0.301 | -1.305 27.722 0.699 |
CGH - 271.100ml | 2.687 | 2.579 27.883 0.922 | 0.753 9.218 0.078 |
CGH - 271.200ml | 1.897 | 1.861 14.511 0.962 | 0.368 2.207 0.038 |
CGH - 271.50ml | 1.097 | 1.095 5.025 0.997 | -0.063 0.065 0.003 |
CGH - 298.100ml | 0.481 | -0.328 0.451 0.466 | 0.351 2.008 0.534 |
CGH - 298.200ml | 0.694 | -0.175 0.129 0.064 | 0.671 7.330 0.936 |
CGH - 298.50ml | 0.939 | -0.894 3.349 0.907 | 0.287 1.338 0.093 |
CGH - 323.100ml | 0.712 | -0.695 2.026 0.955 | 0.151 0.372 0.045 |
Variables
Dim.1 ctr cos2 Dim.2 ctr cos2
N. Folhas | 0.892 50.000 0.795 | -0.453 50.000 0.205 |
Altura | 0.892 50.000 0.795 | 0.453 50.000 0.205 |
# Contribuições de variáveis para PC1
fviz_contrib(km.pca, choice = "var", axes = 1, top = 10)
# Contribuições de variáveis para PC2
fviz_contrib(km.pca, choice = "var", axes = 2, top = 10)
# contribuição total para PC1 e PC2
fviz_contrib(km.pca, choice = "var", axes = 1:2)
fviz_pca_biplot(
km.pca,
geom.ind = "text",
col.var = "contrib",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
legend.title = "Contribuição",
palette = "Dark2",
repel = F
)
fviz_pca_ind(
km.pca,
geom = "text",
habillage = as.factor(dd$cluster),
addEllipses = TRUE,
palette = "Dark2"
)
fviz_pca_ind(km.pca,
geom.ind = "text",
col.ind = as.factor(dd$cluster),
addEllipses = TRUE,
legend.title = "Grupos",
repel = T,
palette = "Dark2"
)
df %>% pairs.panels(.,
show.points=TRUE,
method = "spearman",
gap=0,
stars=TRUE,
ci=FALSE,
alpha=0.05,
cex.cor=1,
cex=1.0,
breaks="Sturges",
rug=FALSE,
density=F,
hist.col="darkgreen",
factor=5,
digits=2,
ellipses=FALSE,
scale=FALSE,
smooth=TRUE,
lm=T,
cor=T
)
ind <- get_pca_ind(km.pca)
ind
Principal Component Analysis Results for individuals
===================================================
Name Description
1 "$coord" "Coordinates for the individuals"
2 "$cos2" "Cos2 for the individuals"
3 "$contrib" "contributions of the individuals"
$coord
$cos2
$contrib
Coordinates for the individuals
Cos2 for the individuals
contributions of the individuals
# Coordenadas de indivíduos
head(ind$coord)
Dim.1 Dim.2
CGH - 227.100ml 0.9153011 -0.57709263
CGH - 227.200ml 0.7720133 -1.22155739
CGH - 227.50ml 0.8555428 -1.30508689
CGH - 271.100ml 2.5790899 0.75257822
CGH - 271.200ml 1.8606041 0.36821043
CGH - 271.50ml 1.0948532 -0.06342251
# Qualidade dos indivíduos
head(ind$cos2)
Dim.1 Dim.2
CGH - 227.100ml 0.7155513 0.284448676
CGH - 227.200ml 0.2854145 0.714585525
CGH - 227.50ml 0.3005716 0.699428441
CGH - 271.100ml 0.9215340 0.078466030
CGH - 271.200ml 0.9623122 0.037687752
CGH - 271.50ml 0.9966556 0.003344415
# Contribuições de indivíduos
head(ind$contrib)
Dim.1 Dim.2
CGH - 227.100ml 3.511796 5.42055506
CGH - 227.200ml 2.498336 24.28736839
CGH - 227.50ml 3.068208 27.72244699
CGH - 271.100ml 27.882668 9.21840950
CGH - 271.200ml 14.511400 2.20670805
CGH - 271.50ml 5.024735 0.06546959
fviz_pca_ind(km.pca, col.ind = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE
)
fviz_contrib(km.pca, choice = "ind", axes = 1:2)
set.seed(123)
my.cont.var <- rnorm(15)
fviz_pca_ind(km.pca, col.ind = my.cont.var,
gradient.cols = c("blue", "yellow", "red"),
legend.title = "Cont.Var")
fviz_pca_ind(km.pca,
geom.ind = "point",
col.ind = as.factor(dd$cluster),
palette = "Dark2",
addEllipses = TRUE,
legend.title = "Grupos"
)
fviz_pca_biplot(km.pca,
geom.ind = c("point","text"),
fill.ind = as.factor(dd$cluster), col.ind = "black",
pointshape = 21,
pointsize = 2,
palette = "jco",
repel = T,
addEllipses = TRUE,
alpha.var ="contrib",
col.var = "contrib",
gradient.cols = "Dark2",
legend.title = list(fill = "Cluster",
color = "Contrib",
alpha = "Contrib")
)
library(tidyverse)
library(ggpubr)
library(rstatix)
pd = position_dodge(1)
dados %>% sample_n_by(cult,trat, size = 1)
dados %>%
group_by(cult,trat) %>%
get_summary_stats(alt,nf, type = "mean_sd")
bxp <- ggboxplot(
dados, x = "cult", y = "alt",
color = "trat", palette = "jco", ylab = "Altura", xlab= "Cultura",
)
ggpar(bxp,
legend = "right", legend.title = "Tratamento")
dados %>%
group_by(cult) %>%
shapiro_test(alt)
dados %>%
group_by(cult) %>%
identify_outliers(alt)
dados %>%
group_by(cult) %>%
identify_outliers(nf)
# Build the linear model
model <- lm(alt ~ cult, data = dados)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
# Build the linear model
model <- lm(nf ~ cult, data = dados)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
dados %>%
group_by(cult) %>%
shapiro_test(alt)
dados %>%
group_by(cult) %>%
shapiro_test(nf)
ggqqplot(dados, "alt", facet.by = "cult")
ggqqplot(dados, "nf", facet.by = "cult")
plot(model, 1)
dados %>% levene_test(alt ~ cult)
dados %>% levene_test(nf ~ cult)
res.aov <- dados %>% anova_test(alt ~ cult * trat)
res.aov
model <- lm(alt ~ cult * trat, data = dados)
dados %>%
group_by(cult) %>%
anova_test(alt ~ trat, error = model)
library(emmeans)
pwc <- dados %>%
group_by(cult) %>%
emmeans_test(alt ~ trat, p.adjust.method = "bonferroni")
pwc
res.aov
dados %>%
pairwise_t_test(
alt ~ cult,
p.adjust.method = "bonferroni"
)
model <- lm(alt ~ cult * trat, data = dados)
dados %>%
emmeans_test(
alt ~ cult, p.adjust.method = "bonferroni",
model = model
)
pwc <- pwc %>% add_xy_position(x = "cult")
ggpar(bxp,
legend = "right", legend.title = "Tratamento") +
stat_pvalue_manual(pwc) +
labs(
subtitle = get_test_label(res.aov, detailed = TRUE),
caption = get_pwc_label(pwc)
) + theme(legend.position = "right")
Os procedimentos estatísticos utilizados neste estudo foram realizados no programa R (R Core Team 2020). Pacotes stats (R Core Team 2020). Pacote dyplr, para manipulação de dados, (Wickham et al. 2020). Pacote ggpot2: elementos gráficos (Wickham 2016), Para análise multivariada foram utilizados os pacotes factoextra, (Kassambara and Mundt 2020), FactorMiner, (Lê, Josse, and Husson 2008), vegan (Oksanen et al. 2019), corrplot (Wei and Simko 2017), NbClust (Charrad et al. 2014), e para ANAVA e Teste de Tukey foi utilizado o ExpDes.pt (Ferreira, Cavalcanti, and Nogueira 2018)
Charrad, Malika, Nadia Ghazzali, Véronique Boiteau, and Azam Niknafs. 2014. “NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set.” Journal of Statistical Software 61 (6): 1–36. http://www.jstatsoft.org/v61/i06/.
Ferreira, Eric Batista, Portya Piscitelli Cavalcanti, and Denismar Alves Nogueira. 2018. ExpDes.pt: Pacote Experimental Designs (Portuguese). https://CRAN.R-project.org/package=ExpDes.pt.
Kassambara, Alboukadel, and Fabian Mundt. 2020. Factoextra: Extract and Visualize the Results of Multivariate Data Analyses. https://CRAN.R-project.org/package=factoextra.
Lê, Sébastien, Julie Josse, and François Husson. 2008. “FactoMineR: A Package for Multivariate Analysis.” Journal of Statistical Software 25 (1): 1–18. https://doi.org/10.18637/jss.v025.i01.
Oksanen, Jari, F. Guillaume Blanchet, Michael Friendly, Roeland Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, et al. 2019. Vegan: Community Ecology Package. https://CRAN.R-project.org/package=vegan.
R Core Team. 2020. “R: A Language and Environment for Statistical Computing.” https://www.R-project.org/.
Wei, Taiyun, and Viliam Simko. 2017. R Package "Corrplot": Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.