Assessing the soil microbiome of a native Atlantic forest fragment and an adjacent farm

Conforme a proposta: “dados foram analisados quanto a sua natureza através de gráficos exploratórios e estatísticas descritivas (percentual, medidas de tendência central e dispersão).(…) ajuste do modelo estatístico mais adequado (…). Ao fim, verificamos a aderência dos resíduos dos modelos às premissas dos testes (normalidade e heterocedasticidade) e a independência temporal da amostra. (…) confecção dos elementos visuais (figuras e tabelas)(…).

Objetivos:

“…the present study aimed to investigate the bacterial community structure of both soils with long-term agro-farming activities and undisturbed soils with natural vegetation, using metabarcoding, and assessing the differences in their ecological attributes.”

Metodologia:

O estudo foi feito utilizando 3 amostras em um fragmento florestal e 3 amostras numa área de fazenda adjacente. O baixo número amostral dificulta análises estatísticas que possam ser representativas, aumentando a chance de erro Tipo I. Porém, compreendemos que os dados de caracterização dos OTUs são um avanço tecnológico, visto que tal identificação permite avaliar os táxons que são filtrados em ambientes com pressão antropogênica. Assim, optamos por evidenciar estes resultados, caracterizando a estrutura do microbioma por meio da riqueza e de beta diversidade.

Esforço amostral: 3 amostras de solo por área (Fragmento e Fazenda).

Dados brutos

Dados enviados, Subset das 10 primeiras linhas:

# LOAD DATA #### 
setwd("D:/Dropbox/00_Unioeste/2 Pesquisa/3 Doutorado/Natália/16spaper")
species<-read.csv2("Table_All_24_02_22.csv", sep=";", header=T)
skim(species)

Data summary
Name	species
Number of rows	7256
Number of columns	9
_______________________
Column type frequency:
character	2
numeric	7
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Rank	0	1	5	13	0	9	0
Scientific.Name	0	1	3	67	0	7256	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p25	p50	p75	p100	hist
TaxId	1	889406.32	950287.48	56643.5	376647	1776348	2770094	▇▁▂▂▂
F21_contigs	1	94.04	1626.67	1.0	4	20	110926	▇▁▁▁▁
F22_contigs	1	68.16	1182.93	0.0	3	14	81188	▇▁▁▁▁
F23_contigs	1	64.87	1129.32	0.0	3	13	77364	▇▁▁▁▁
R51_contigs	1	79.52	1359.37	1.0	4	18	99152	▇▁▁▁▁
R52_contigs	1	88.38	1514.71	1.0	5	21	110140	▇▁▁▁▁
R53_contigs	1	92.30	1568.72	1.0	5	23	114597	▇▁▁▁▁

summary(species)

##      Rank               TaxId         Scientific.Name     F21_contigs       
##  Length:7256        Min.   :      0   Length:7256        Min.   :     0.00  
##  Class :character   1st Qu.:  56644   Class :character   1st Qu.:     1.00  
##  Mode  :character   Median : 376647   Mode  :character   Median :     4.00  
##                     Mean   : 889406                      Mean   :    94.04  
##                     3rd Qu.:1776348                      3rd Qu.:    20.00  
##                     Max.   :2770094                      Max.   :110926.00  
##   F22_contigs        F23_contigs        R51_contigs        R52_contigs       
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :     0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:     1.00  
##  Median :    3.00   Median :    3.00   Median :    4.00   Median :     5.00  
##  Mean   :   68.16   Mean   :   64.87   Mean   :   79.52   Mean   :    88.38  
##  3rd Qu.:   14.00   3rd Qu.:   13.00   3rd Qu.:   18.00   3rd Qu.:    21.00  
##  Max.   :81188.00   Max.   :77364.00   Max.   :99152.00   Max.   :110140.00  
##   R53_contigs      
##  Min.   :     0.0  
##  1st Qu.:     1.0  
##  Median :     5.0  
##  Mean   :    92.3  
##  3rd Qu.:    23.0  
##  Max.   :114597.0

table(species$Rank)

## 
## below species         class        family         genus         order 
##           165           107           458          1511           219 
##        phylum       species  superkingdom  unclassified 
##            58          4733             4             1

species[1:20,1:9] %>% 
  kbl() %>% kable_classic(full_width = F, html_font = "Cambria")

Rank	TaxId	Scientific.Name	F21_contigs	F22_contigs	F23_contigs	R51_contigs	R52_contigs	R53_contigs
unclassified	0	Unknown	8305	6205	5388	5105	5815	6328
species	1916956	Synechococcus sp. SynAce01	5	4	3	9	5	5
species	1720344	Psychrobacter sp. AntiMn-1	0	1	0	0	0	0
superkingdom	2	Bacteria <bacteria>	110926	81188	77364	99152	110140	114597
genus	131079	Limnobacter	6	8	0	13	13	17
genus	6	Azorhizobium	99	39	47	44	52	40
species	1179669	Yersinia sp. KBS0713	0	2	0	0	0	0
species	7	Azorhizobium caulinodans	99	39	47	44	52	40
species	9	Buchnera aphidicola	18	10	10	29	35	49
species	1179675	Burkholderia sp. KBS0801	0	2	1	0	2	3
species	1179672	Flavobacterium sp. KBS0721	0	0	1	0	0	2
genus	10	Cellvibrio	4	9	6	10	25	11
species	1179673	Oerskovia sp. KBS0722	32	13	28	8	15	7
species	11	Cellulomonas gilvus	9	5	1	4	11	4
genus	13	Dictyoglomus	10	3	6	18	18	5
species	14	Dictyoglomus thermophilum	0	1	0	3	1	0
genus	65551	Meiothermus	31	37	34	45	35	57
species	65553	Sulfurospirillum deleyianum	0	0	0	0	0	1
genus	16	Methylophilus	0	0	0	1	2	1
species	65555	Desulfocapsa sulfexigens	1	3	0	5	4	8

genus=subset(species, species$Rank=="genus")
genus[1:20,1:9] %>% 
  kbl() %>% kable_classic(full_width = F, html_font = "Cambria")

	Rank	TaxId	Scientific.Name	F21_contigs	F22_contigs	F23_contigs	R51_contigs	R52_contigs	R53_contigs
5	genus	131079	Limnobacter	6	8	0	13	13	17
6	genus	6	Azorhizobium	99	39	47	44	52	40
12	genus	10	Cellvibrio	4	9	6	10	25	11
15	genus	13	Dictyoglomus	10	3	6	18	18	5
17	genus	65551	Meiothermus	31	37	34	45	35	57
19	genus	16	Methylophilus	0	0	0	1	2	1
21	genus	18	Pelobacter	25	24	27	53	40	64
23	genus	20	Phenylobacterium	471	373	329	217	222	245
24	genus	22	Shewanella	12	14	5	16	10	16
29	genus	32	Myxococcus	40	44	47	92	101	113
37	genus	40	Stigmatella	150	72	127	122	154	121
40	genus	42	Cystobacter	12	11	16	79	31	56
42	genus	44	Melittangium	17	14	16	32	57	61
44	genus	47	Archangium	119	65	75	81	102	93
47	genus	50	Chondromyces	74	41	55	175	219	219
54	genus	442430	Coraliomargarita	13	7	5	9	6	12
56	genus	59	Vitreoscilla	16	6	12	7	6	7
60	genus	68	Lysobacter	261	159	159	304	476	484
65	genus	75	Caulobacter	420	291	291	152	209	167
66	genus	622681	Candidatus Planktophila	4	3	1	3	3	1

family=subset(species, species$Rank=="family")
family[1:20,1:9] %>% 
  kbl() %>% kable_classic(full_width = F, html_font = "Cambria")

	Rank	TaxId	Scientific.Name	F21_contigs	F22_contigs	F23_contigs	R51_contigs	R52_contigs	R53_contigs
28	family	31	Myxococcaceae	132	100	119	216	250	282
36	family	39	Archangiaceae	357	196	294	374	418	395
46	family	49	Polyangiaceae	364	224	263	867	1039	1035
104	family	213119	Desulfobacteraceae	111	93	88	271	298	328
109	family	126	Planctomycetaceae	374	256	244	269	271	314
110	family	213117	Desulfohalobiaceae	9	7	10	32	34	33
112	family	213116	Desulfomicrobiaceae	22	8	15	40	39	70
115	family	213121	Desulfobulbaceae	105	65	72	152	173	210
123	family	137	Spirochaetaceae	67	51	55	121	148	166
141	family	170	Leptospiraceae	26	20	17	27	32	32
153	family	2162846	Candidatus Nanopelagicaceae	24	21	7	24	41	25
158	family	82115	Rhizobiaceae	2117	1204	1157	610	841	775
186	family	1769717	Mesoaciditogaceae	2	0	3	7	6	10
262	family	541000	Ruminococcaceae	33	26	25	67	60	91
303	family	49546	Flavobacteriaceae	201	96	105	272	496	477
309	family	403	Methylococcaceae	105	77	64	151	172	196
323	family	213422	Geobacteraceae	149	135	107	230	282	300
324	family	213421	Desulfuromonadaceae	57	38	55	106	105	124
326	family	433	Acetobacteraceae	1472	1079	1093	880	965	969
338	family	444	Legionellaceae	152	79	63	70	103	95

class=subset(species, species$Rank=="class")
class[1:20,1:9] %>% 
  kbl() %>% kable_classic(full_width = F, html_font = "Cambria")

	Rank	TaxId	Scientific.Name	F21_contigs	F22_contigs	F23_contigs	R51_contigs	R52_contigs	R53_contigs
71	class	28211	Alphaproteobacteria	35393	25712	23828	18871	21325	21833
75	class	1760	Actinobacteria <high GC Gram+>	26830	18927	19554	18237	19373	19209
76	class	28216	Betaproteobacteria	5046	3223	3111	7620	8902	9159
80	class	1236	Gammaproteobacteria	6212	4824	4366	5843	7462	7412
82	class	28221	Deltaproteobacteria	3324	2324	2399	5826	6637	6980
520	class	219685	Gemmatimonadetes <class>	753	720	552	2945	2884	3119
622	class	91061	Bacilli	1649	1201	1256	2570	2785	2998
782	class	1813735	Vicinamibacteria	1042	922	698	2472	2598	2793
792	class	186801	Clostridia	1318	1127	1046	2308	2380	2702
922	class	1853228	Chitinophagia	1353	869	646	1647	2168	2748
1080	class	204432	Acidobacteriia	3968	3861	3160	1904	2060	2154
1326	class	203683	Planctomycetia	3248	2138	2011	1947	1952	2170
1391	class	84992	Acidimicrobiia	1398	928	990	1619	1774	1934
1651	class	84995	Rubrobacteria	1223	827	912	1764	1706	1753
1691	class	1497346	Thermoleophilia	1359	881	1058	1321	1205	1295
1784	class	32061	Chloroflexia	153	98	76	1113	1267	1128
1801	class	768503	Cytophagia	420	180	215	628	1071	1081
1955	class	292625	Anaerolineae	157	110	86	641	703	758
1991	class	147550	Sordariomycetes	565	367	392	487	687	829
2053	class	203693	Nitrospira <class>	170	142	121	648	628	674

biotic=read.csv2("Pasta1.csv",h=T,sep =";", dec=".", row.names = 3)
skim(biotic)

Data summary
Name	biotic
Number of rows	4733
Number of columns	8
_______________________
Column type frequency:
character	1
numeric	7
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Rank	0	1	7	7	0	1	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
TaxId	1	1012417.99	985090.36	7	68223	585425	1914471	2748873	▇▁▂▂▃
F21_contigs	1	15.28	57.09	0	0	2	10	1359	▇▁▁▁▁
F22_contigs	1	10.97	44.52	0	0	2	7	1119	▇▁▁▁▁
F23_contigs	1	10.43	40.80	0	0	2	7	1058	▇▁▁▁▁
R51_contigs	1	12.98	61.82	0	0	2	9	2472	▇▁▁▁▁
R52_contigs	1	14.42	64.28	0	0	3	10	2598	▇▁▁▁▁
R53_contigs	1	15.17	68.17	0	1	3	11	2793	▇▁▁▁▁

str(biotic)

## 'data.frame':    4733 obs. of  8 variables:
##  $ Rank       : chr  "species" "species" "species" "species" ...
##  $ TaxId      : int  1916956 1720344 1179669 7 9 1179675 1179672 1179673 11 14 ...
##  $ F21_contigs: int  5 0 0 99 18 0 0 32 9 0 ...
##  $ F22_contigs: int  4 1 2 39 10 2 0 13 5 1 ...
##  $ F23_contigs: int  3 0 0 47 10 1 1 28 1 0 ...
##  $ R51_contigs: int  9 0 0 44 29 0 0 8 4 3 ...
##  $ R52_contigs: int  5 0 0 52 35 2 0 15 11 1 ...
##  $ R53_contigs: int  5 0 0 40 49 3 2 7 4 0 ...

summary(biotic)

##      Rank               TaxId          F21_contigs       F22_contigs     
##  Length:4733        Min.   :      7   Min.   :   0.00   Min.   :   0.00  
##  Class :character   1st Qu.:  68223   1st Qu.:   0.00   1st Qu.:   0.00  
##  Mode  :character   Median : 585425   Median :   2.00   Median :   2.00  
##                     Mean   :1012418   Mean   :  15.28   Mean   :  10.97  
##                     3rd Qu.:1914471   3rd Qu.:  10.00   3rd Qu.:   7.00  
##                     Max.   :2748873   Max.   :1359.00   Max.   :1119.00  
##   F23_contigs       R51_contigs       R52_contigs       R53_contigs     
##  Min.   :   0.00   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   1.00  
##  Median :   2.00   Median :   2.00   Median :   3.00   Median :   3.00  
##  Mean   :  10.43   Mean   :  12.98   Mean   :  14.42   Mean   :  15.17  
##  3rd Qu.:   7.00   3rd Qu.:   9.00   3rd Qu.:  10.00   3rd Qu.:  11.00  
##  Max.   :1058.00   Max.   :2472.00   Max.   :2598.00   Max.   :2793.00

biotic[1:10,1:8] %>% 
  kbl(caption = "Abundance of OTUs per sample.") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Abundance of OTUs per sample.
	Rank	TaxId	F21_contigs	F22_contigs	F23_contigs	R51_contigs	R52_contigs	R53_contigs
Synechococcus sp. SynAce01	species	1916956	5	4	3	9	5	5
Psychrobacter sp. AntiMn-1	species	1720344	0	1	0	0	0	0
Yersinia sp. KBS0713	species	1179669	0	2	0	0	0	0
Azorhizobium caulinodans	species	7	99	39	47	44	52	40
Buchnera aphidicola	species	9	18	10	10	29	35	49
Burkholderia sp. KBS0801	species	1179675	0	2	1	0	2	3
Flavobacterium sp. KBS0721	species	1179672	0	0	1	0	0	2
Oerskovia sp. KBS0722	species	1179673	32	13	28	8	15	7
Cellulomonas gilvus	species	11	9	5	1	4	11	4
Dictyoglomus thermophilum	species	14	0	1	0	3	1	0

Cálculo dos Índices de diversidade para explorar a estrutura do microbioma:

Espécie

## Subset
species_1=subset(biotic, biotic$Rank=="species")
#str(species_1)
species_1$Rank<-as.factor(species_1$Rank)
#head(species_1)
sp=species_1[,-c(1:2)]#tirando as colunas Rank & TaxID
sp.t=t.data.frame(sp)#transpondo os dados 
col=nrow(species_1)
sp.t<-sp.t[,1:col]
soma1<-sp.t[1:2,]
soma1[1,]<-colSums(sp.t[1:3,])
soma1[2,]<-colSums(sp.t[4:6,])
rownames(soma1)<-c("F","R")

## Calculo dos índices de diversidade para análise exploratória
sha1=data.frame(diversity(soma1))
H1=data.frame(sha1) 
colnames(H1)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras

simp1 <- diversity(soma1, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H1$Simpson<-as.vector(simp1)

# Species richness (S)
S1 <- specnumber(sp.t) ## rowSums(sp.t > 0)

P1 <- sha1/log(S1)# Pielou's evenness (J)

pairs(cbind(sha1, simp1), pch="+", col="blue")

H1$Richness<-c(round(mean(S1[1:3]),0),round(mean(S1[4:6]),0))
H1$Pielou<-P1[,1]

H1 %>% 
  kbl(caption = "Diversity Index and Richness per local") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Diversity Index and Richness per local
	Shannon	Simpson	Richness	Pielou
F	6.805272	0.9966543	3218	0.8360560
R	6.812034	0.9954103	3513	0.8460409

Família

#str(family)
family$Rank<-as.factor(family$Rank)
#head(species_1)
f=family[,-c(1:3)]#tirando as colunas Rank & TaxID
f.t=t.data.frame(f)#transpondo os dados 
col=nrow(family)
f.t<-f.t[,1:col]
soma2<-f.t[1:2,]
soma2[1,]<-colSums(f.t[1:3,])
soma2[2,]<-colSums(f.t[4:6,])
rownames(soma2)<-c("F","R")

## Calculo dos índices de diversidade para análise exploratória
sha2=data.frame(diversity(soma2))
H2=data.frame(sha2) 
colnames(H2)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras

simp2 <- diversity(soma2, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H2$Simpson<-as.vector(simp2)

# Species richness (S)
S2 <- specnumber(f.t) ## rowSums(sp.t > 0)

P2 <- sha2/log(S2)# Pielou's evenness (J)

pairs(cbind(sha2, simp2), pch="+", col="blue")

H2$Richness<-c(round(mean(S2[1:3]),0),round(mean(S2[4:6]),0))
H2$Pielou<-P2[,1]

H2 %>% 
  kbl(caption = "Diversity Index and Richness per local") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Diversity Index and Richness per local
	Shannon	Simpson	Richness	Pielou
F	4.532598	0.9767113	417	0.7495153
R	4.859533	0.9862589	433	0.8070941

Classe

#str(Class)
class$Rank<-as.factor(class$Rank)
#head(species_1)
c=class[,-c(1:3)]#tirando as colunas Rank & TaxID
c.t=t.data.frame(c)#transpondo os dados 
col=nrow(class)
c.t<-c.t[,1:col]
soma3<-c.t[1:2,]
soma3[1,]<-colSums(c.t[1:3,])
soma3[2,]<-colSums(c.t[4:6,])
rownames(soma3)<-c("F","R")

## Calculo dos índices de diversidade para análise exploratória
sha3=data.frame(diversity(soma3))
H3=data.frame(sha3) 
colnames(H3)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras

simp3 <- diversity(soma3, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H3$Simpson<-as.vector(simp3)

# Species richness (S)
S3 <- specnumber(c.t) ## rowSums(sp.t > 0)

P3 <- sha3/log(S3)# Pielou's evenness (J)

pairs(cbind(sha3, simp3), pch="+", col="blue")

H3$Richness<-c(round(sum(S3[1:3]),0),round(sum(S3[4:6]),0))
H3$Pielou<-P3[,1]

H3 %>% 
  kbl(caption = "Diversity Index and Richness per local") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Diversity Index and Richness per local
	Shannon	Simpson	Richness	Pielou
F	2.429273	0.8119179	305	0.5252512
R	2.890531	0.8933679	313	0.6249834

Olhando a similaridade entre as amostras

Espécies

rural=sp[,4:6]
rural=ifelse(rural>0,1,0)
rural=as.data.frame(apply(rural, 1, sum))

floresta=sp[,1:3]
floresta=ifelse(floresta>0,1,0)
floresta=as.data.frame(apply(floresta, 1, sum))

simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras
simi<-as.data.frame(simi)
simi$bray<-round(bray.curtis(rural, floresta),2)
simi$sor<-round(sorenson(rural, floresta),2)

colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen")
#REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.

simi %>% 
  kbl(caption = "Areas similarities") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Areas similarities
Jaccard	Bray-curtis	Sorensen
0.75	0.84	0.86

Família

rural=family[,7:9]
rural=ifelse(rural>0,1,0)
rural=as.data.frame(apply(rural, 1, sum))

floresta=family[,4:6]
floresta=ifelse(floresta>0,1,0)
floresta=as.data.frame(apply(floresta, 1, sum))

simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras
simi<-as.data.frame(simi)
simi$bray<-round(bray.curtis(rural, floresta),2)
simi$sor<-round(sorenson(rural, floresta),2)

colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen")
#REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.

simi %>% 
  kbl(caption = "Areas similarities") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Areas similarities
Jaccard	Bray-curtis	Sorensen
0.95	0.97	0.97

Classe

rural=class[,7:9]
rural=ifelse(rural>0,1,0)
rural=as.data.frame(apply(rural, 1, sum))

floresta=class[,4:6]
floresta=ifelse(floresta>0,1,0)
floresta=as.data.frame(apply(floresta, 1, sum))

simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras
simi<-as.data.frame(simi)
simi$bray<-round(bray.curtis(rural, floresta),2)
simi$sor<-round(sorenson(rural, floresta),2)

colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen")
#REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.

simi %>% 
  kbl(caption = "Areas similarities") %>%
  kable_classic(full_width = F, html_font = "Cambria")

Areas similarities
Jaccard	Bray-curtis	Sorensen
0.98	0.98	0.99

forest.sp=species[,4:6]
forest.f=family[,4:6]
forest.c=class[,4:6]

rural.sp=species[,7:9]
rural.f=family[,7:9]
rural.c=class[,7:9]

f1=rowSums(forest.sp)
f2=rowSums(forest.f)
f3=rowSums(forest.c)
r1=rowSums(rural.sp)
r2=rowSums(rural.f)
r3=rowSums(rural.c)

Para comparar a composição de microbioma, após calcular os índices de diversidade, similaridade e riqueza, foi calculado a diversidade-beta.

Explorando Beta diversidade

??sor: Sorensen dissimilarity (Baselga, 2010) as operational response variable for the underlying structure of microbiome. To further test it as a function of the local.

Pairwise beta - diversidade calculada pela abordagem de Baselga (2010):

Espécie

bin_abun1 <- as.data.frame((sp.t>0)*1)# transformar em presence-absence
#head(bin_abun)

bifun1 <- beta.pair(bin_abun1) ## calcular beta par a par

sor.df1 <- matrixConvert(bifun1$beta.sor, # extrair beta sorensen entre obs
                      colname= c("sample1", "sample2", "beta.sor"))

#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
#  separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
#  separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))

ggplot(sor.df1, aes(sample1, sample2, fill= beta.sor)) + 
  geom_tile()+
  scale_fill_gradient(low="gray", high="dark orange")+
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(size=8, angle=0),
        axis.title.y = element_blank(),
        axis.text.y  = element_text(size=8))+ggtitle("Espécie")

Família

bin_abun2 <- as.data.frame((f.t>0)*1)# transformar em presence-absence
#head(bin_abun)

bifun2 <- beta.pair(bin_abun2) ## calcular beta par a par

sor.df2 <- matrixConvert(bifun2$beta.sor, # extrair beta sorensen entre obs
                      colname= c("sample1", "sample2", "beta.sor"))

#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
#  separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
#  separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))

ggplot(sor.df2, aes(sample1, sample2, fill= beta.sor)) + 
  geom_tile()+
  scale_fill_gradient(low="gray", high="dark orange")+
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(size=8, angle=0),
        axis.title.y = element_blank(),
        axis.text.y  = element_text(size=8))+ggtitle("Família")

Classe

bin_abun3 <- as.data.frame((c.t>0)*1)# transformar em presence-absence
#head(bin_abun)

bifun3 <- beta.pair(bin_abun3) ## calcular beta par a par

sor.df3 <- matrixConvert(bifun3$beta.sor, # extrair beta sorensen entre obs
                      colname= c("sample1", "sample2", "beta.sor"))

#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
#  separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
#  separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))

ggplot(sor.df3, aes(sample1, sample2, fill= beta.sor)) + 
  geom_tile()+
  scale_fill_gradient(low="gray", high="dark orange")+
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(size=8, angle=0),
        axis.title.y = element_blank(),
        axis.text.y  = element_text(size=8))+ggtitle("Classe")

Beta ~ area rural e fragmento:

Testando modelo para saber se tem diferença entre fazenda-fragmento na beta diversidade:

Olhando o modelo:

Espécie

sor1 <- sor.df1 %>% 
    unite("intervalo", c(sample1,sample2), remove=F) 

sor1$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2)
#quando compara frag com frag é 1
#quando compara frag com ref é 0
#quando compara ref com ref é 2

sor1$area<-as.factor(sor1$area)

## transformar beta antes de rodar para encaixar na distribuicao beta

y_transf_betareg = function(y){
  n.obs = sum(!is.na(y))
  (y*(n.obs-1) + 0.5)/n.obs
}

sor1$beta.sor.t <- y_transf_betareg(sor.df1$beta.sor)

m.tot1 <- betareg(beta.sor.t ~ area,
              data=sor1)
summary(m.tot1)

## 
## Call:
## betareg(formula = beta.sor.t ~ area, data = sor1)
## 
## Standardized weighted residuals 2:
##     Min      1Q  Median      3Q     Max 
## -2.1268 -1.0156  0.4182  0.8810  1.2260 
## 
## Coefficients (mean model with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.29550    0.01553 -83.443  < 2e-16 ***
## area1       -0.19965    0.03250  -6.143  8.1e-10 ***
## area2       -0.33014    0.03360  -9.827  < 2e-16 ***
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)   
## (phi)   2730.6      996.9   2.739  0.00616 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood:  51.9 on 4 Df
## Pseudo R-squared: 0.8921
## Number of iterations: 1713 (BFGS) + 3 (Fisher scoring)

performance::r2(m.tot1)

## # R2 for Beta Regression
##   Pseudo R2: 0.892

car::Anova(m.tot1)

## Analysis of Deviance Table (Type II tests)
## 
## Response: beta.sor.t
##      Df  Chisq Pr(>Chisq)    
## area  2 113.17  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ggemmeans(m.tot1, terms=c("area")) %>% plot() +
  theme_cowplot() +
  ggtitle("Espécie")+
  theme(axis.text.x = element_text(hjust=1))+
        ylab("Beta diversity")

ggeffect(m.tot1, terms=c("area")) %>% plot()

Podemos ver que a diferença nos valores de Beta são maiores quando comparamos área ruralxfragmento do que fragmento com fragmento [1] ou rural com rural[2].

Porém, vamos lembrar que beta varia de 0 a 1.

ggpredict(m.tot1, terms=c("area")) %>% plot()+
  theme(axis.text.x = element_text(hjust=1))+
        ylab("Beta diversity")+xlab("Area")+
  ggtitle("Beta Sorensen")

Portanto, a dimensão da diferença é ainda muito baixa. De 0 a 1, o índice fica entre 0.17 e 0.22, demonstrando que apesar dos números serem estatisticamente diferentes, a magnitude da diferença é pequena. Note que, apesar da riqueza da área Rural ser maior, a beta-diversidade dela ainda é menor do que a comparação entre fragmentos.

Comparações dos valores de beta-diversidade entre as amostras:

ggplot(sor1,
       aes(x=intervalo, y=beta.sor, col=area)) +
       geom_point(shape=5,size=2)+ ylab("Beta diversity")+
       theme(axis.text.x  = element_text(angle=75, vjust=0.5))+
       scale_x_discrete(name="") +
       scale_colour_discrete(name="",
                   labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Espécie")

Família

sor2 <- sor.df2 %>% 
    unite("intervalo", c(sample1,sample2), remove=F) 

sor2$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2)
#quando compara frag com frag é 1
#quando compara frag com ref é 0
#quando compara ref com ref é 2

sor2$area<-as.factor(sor2$area)

## transformar beta antes de rodar para encaixar na distribuicao beta

y_transf_betareg = function(y){
  n.obs = sum(!is.na(y))
  (y*(n.obs-1) + 0.5)/n.obs
}

sor2$beta.sor.t <- y_transf_betareg(sor.df2$beta.sor)

m.tot2 <- betareg(beta.sor.t ~ area,
              data=sor2)
summary(m.tot2)

## 
## Call:
## betareg(formula = beta.sor.t ~ area, data = sor2)
## 
## Standardized weighted residuals 2:
##     Min      1Q  Median      3Q     Max 
## -2.2736 -0.7393  0.2277  0.9330  1.5372 
## 
## Coefficients (mean model with logit link):
##             Estimate Std. Error  z value Pr(>|z|)    
## (Intercept) -2.60040    0.02367 -109.869  < 2e-16 ***
## area1       -0.07271    0.04845   -1.501  0.13339    
## area2       -0.13891    0.04953   -2.805  0.00504 ** 
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)   
## (phi)     3080       1125   2.738  0.00618 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 59.84 on 4 Df
## Pseudo R-squared: 0.3721
## Number of iterations: 2591 (BFGS) + 4 (Fisher scoring)

performance::r2(m.tot2)

## # R2 for Beta Regression
##   Pseudo R2: 0.372

car::Anova(m.tot2)

## Analysis of Deviance Table (Type II tests)
## 
## Response: beta.sor.t
##      Df  Chisq Pr(>Chisq)  
## area  2 8.6237    0.01341 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ggemmeans(m.tot2, terms=c("area")) %>% plot() +
  theme_cowplot() +
  ggtitle("Família")+
  theme(axis.text.x = element_text(hjust=1))+
        ylab("Beta diversity")

ggeffect(m.tot2, terms=c("area")) %>% plot()

ggplot(sor2,
       aes(x=intervalo, y=beta.sor, col=area)) +
       geom_point(shape=5,size=2)+ ylab("Beta diversity")+
       theme(axis.text.x  = element_text(angle=75, vjust=0.5))+
       scale_x_discrete(name="") +
       scale_colour_discrete(name="",
                   labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Família")

Classe

sor3 <- sor.df3 %>% 
    unite("intervalo", c(sample1,sample2), remove=F) 

sor3$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2)
#quando compara frag com frag é 1
#quando compara frag com ref é 0
#quando compara ref com ref é 2

sor3$area<-as.factor(sor3$area)

## transformar beta antes de rodar para encaixar na distribuicao beta

y_transf_betareg = function(y){
  n.obs = sum(!is.na(y))
  (y*(n.obs-1) + 0.5)/n.obs
}

sor3$beta.sor.t <- y_transf_betareg(sor.df3$beta.sor)

m.tot3 <- betareg(beta.sor.t ~ area,
              data=sor3)
summary(m.tot3)

## 
## Call:
## betareg(formula = beta.sor.t ~ area, data = sor3)
## 
## Standardized weighted residuals 2:
##     Min      1Q  Median      3Q     Max 
## -2.9724 -0.4697 -0.2840  0.8010  1.8829 
## 
## Coefficients (mean model with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.85225    0.04017 -71.001   <2e-16 ***
## area1       -0.07067    0.08213  -0.861    0.389    
## area2       -0.12329    0.08362  -1.474    0.140    
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)   
## (phi)   1330.5      486.2   2.737   0.0062 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 55.23 on 4 Df
## Pseudo R-squared: 0.1422
## Number of iterations: 1878 (BFGS) + 3 (Fisher scoring)

performance::r2(m.tot3)

## # R2 for Beta Regression
##   Pseudo R2: 0.142

car::Anova(m.tot3)

## Analysis of Deviance Table (Type II tests)
## 
## Response: beta.sor.t
##      Df  Chisq Pr(>Chisq)
## area  2 2.4548     0.2931

ggemmeans(m.tot3, terms=c("area")) %>% plot() +
  theme_cowplot() +
  ggtitle("Classe")+
  theme(axis.text.x = element_text(hjust=1))+
        ylab("Beta diversity")

ggeffect(m.tot3, terms=c("area")) %>% plot()

ggplot(sor3,
       aes(x=intervalo, y=beta.sor, col=area)) +
       geom_point(shape=5,size=2)+ ylab("Beta diversity")+
       theme(axis.text.x  = element_text(angle=75, vjust=0.5))+
       scale_x_discrete(name="") +
       scale_colour_discrete(name="",
                   labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Classe")

Abundância das OTUs

Abaixo temos um mapa de calor mostrando a variação na abundância das diferentes OTUs para cada uma das áreas.

Estrutura taxonômica

mtscaled1 = as.matrix(scale(species_1[-c(1:2)]))
row_order1 = hclust(dist(mtscaled1))$order
column_order1 = hclust(dist(t(mtscaled1)))$order
#heatmap reorders both variables and observations using a clustering algorithm: it computes the distance between each pair of rows and columns and try to order them by similarity.They allow us to simultaneously visualize clusters of samples and features. First hierarchical clustering is done of both the rows and the columns of the data matrix. The columns/rows of the data matrix are re-ordered according to the hierarchical clustering result, putting similar observations close to each other. The blocks of ‘high’ and ‘low’ values are adjacent in the data matrix. Finally, a color scheme is applied for the visualization and the data matrix is displayed. Visualizing the data matrix in this way can help to find the variables that appear to be characteristic for each sample cluster.
abund_t1<-t(as.matrix(soma1)) #todas as amostras juntas
my_group1 <- as.numeric(as.factor(substr(rownames(abund_t1), 1 , 1)))
colSide <- brewer.pal(9, "Set1")[my_group1]
colMain <- colorRampPalette(brewer.pal(9, "Set1"))(299)

heatmap(mtscaled1[row_order1,column_order1], col= colMain, Colv=NA, Rowv=NA, xlab="Areas", main="Microbiomes", cexRow=0.2, cexCol = 1, RowSideColors=colSide) #todas as amostras separadas

abundtscaled1 = as.matrix(scale(abund_t1))
abundt_row_order1 = hclust(dist(abundtscaled1))$order
colMainT <- colorRampPalette(brewer.pal(9, "Greens"))(299)
#isTRUE(abund_t[,1]==abund_t[,2])

heatmap(abund_t1, Colv = NA, Rowv = NA, xlab="Areas", RowSideColors=colSide, col=colMainT, main="Microbiome - Espécies",cexRow=0.2, cexCol = 1)

abund1<-as.data.frame(abund_t1[,1])
colnames(abund1)<-"abundance"
abund1$area<-rep("Frag",nrow(abund1))
abund_R1<-as.data.frame(abund_t1[,2])
colnames(abund_R1)<-"abundance"
abund_R1$area<-rep("Rural",nrow(abund_R1))
abund_to1<-rbind(abund1,abund_R1)
abund_to1$sp<-as.vector(rownames(abund_to1))

abund_to1$rank=as.numeric(rank(abund_to1$abundance))
head(abund_to1)

##                            abundance area                         sp   rank
## Synechococcus sp. SynAce01        12 Frag Synechococcus sp. SynAce01 5660.5
## Psychrobacter sp. AntiMn-1         1 Frag Psychrobacter sp. AntiMn-1 1779.5
## Yersinia sp. KBS0713               2 Frag       Yersinia sp. KBS0713 2714.5
## Azorhizobium caulinodans         185 Frag   Azorhizobium caulinodans 9097.0
## Buchnera aphidicola               38 Frag        Buchnera aphidicola 7597.5
## Burkholderia sp. KBS0801           3 Frag   Burkholderia sp. KBS0801 3317.5

ggplot(abund_to1,aes(x =rank,y = abundance, colour = area))+
  #geom_point()+
  geom_smooth(method="loess")+
  scale_colour_manual(values=c("orange", "darkgreen"),
                      name="Areas",
                   labels=c("Frag","Rural"))+
  ylab("Abundance")+ xlab("OTUs ranking")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())

abund_to1$ablog<-log(abund_to1$abundance)
ggplot(abund_to1)+
  geom_point(aes(x = sp,y = ablog, colour = area))+
  scale_colour_manual(values=c("orange", "darkgreen"),
                      name="Areas",
                   labels=c("Frag","Rural"))+
  ylab("Log(Abundance)")+ xlab("OTUs")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())

ggplot(abund_to1)+
  geom_point(aes(x = sp,y = abundance, colour = area))+
  scale_colour_manual(values=c("orange", "darkgreen"),
                      name="Areas",
                   labels=c("Frag","Rural"))+
  ylab("Abundance")+ xlab("OTUs")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())

É possível ver como varia a abundância entre as amostras do fragmento e da área rural.

Abaixo a gente observa a correlação das abundâncias entre as diferentes áreas. É possível verificar que claramente as correlações mais fortes são entre as mesmas áreas.

my_fn <- function(data, mapping, ...){
  p <- ggplot(data = data, mapping = mapping) + 
    geom_point() + 
    geom_smooth(method=lm, fill="blue", color="blue", ...)
  p
}

ggpairs(species_1[-c(1:2)], 
  upper = list(continuous = wrap("cor", size=3)),
  lower = list(continuous = my_fn))

Apesar do n amostral muito pequeno, rodamos modelos com intuito de ganhar mais intuição sobre o que pode estar acontecendo. Contudo, a ausência de mais amostras não nos permite, de fato, saber a distribuição dos dados e ajustar um modelo que descreva bem os dados.

Modelos para ganhar intuição

We fit generalised linear models with negative binomial distribution for richness (MASS::glm.nb).

sha=data.frame(diversity(sp.t))
h=data.frame(sha) 
colnames(h)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras

simp <- diversity(sp.t, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
h$Simpson<-as.vector(simp)

# Species richness (S)
S <- specnumber(sp.t) ## rowSums(sp.t > 0)

P <- sha/log(S)# Pielou's evenness (J)

h$Richness<-S
h$Evenness<-round(P[,1],2)
h$Area<-c("F","F","F","R", "R", "R")

mod0 <- glm.nb(h$Richness ~ 1, data= h)
mod1 <- glm.nb(h$Richness ~ h$Area, data= h)

AICtab(mod0, mod1, base=T)

##      AIC  dAIC df
## mod1 80.6  0.0 3 
## mod0 84.1  3.4 2

hnp::hnp(mod1)

## Negative binomial model (using MASS package)

testResiduals(simulateResiduals(mod1))

## $uniformity
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.208, p-value = 0.9127
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.1551, p-value = 0.648
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa bootstrapped outlier test
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, observations = 6, p-value = 1
## alternative hypothesis: two.sided
##  percent confidence interval:
##  0.0000000 0.1666667
## sample estimates:
## outlier frequency (expected: 0.0116666666666667 ) 
##                                                 0

## $uniformity
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.208, p-value = 0.9127
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.1551, p-value = 0.648
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa bootstrapped outlier test
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, observations = 6, p-value = 1
## alternative hypothesis: two.sided
##  percent confidence interval:
##  0.0000000 0.1666667
## sample estimates:
## outlier frequency (expected: 0.0116666666666667 ) 
##                                                 0

Model results for richness:

The best fitted model was the one containing area as a predictor (R2 = 0.84).

summary(mod1)

## 
## Call:
## glm.nb(formula = h$Richness ~ h$Area, data = h, init.theta = 995.7504182, 
##     link = log)
## 
## Deviance Residuals: 
## F21_contigs  F22_contigs  F23_contigs  R51_contigs  R52_contigs  R53_contigs  
##     1.76908     -0.67909     -1.14227     -0.78899      0.06603      0.71091  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.07641    0.02094 385.752  < 2e-16 ***
## h$AreaR      0.08772    0.02946   2.977  0.00291 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(995.7504) family taken to be 1)
## 
##     Null deviance: 14.8918  on 5  degrees of freedom
## Residual deviance:  6.0278  on 4  degrees of freedom
## AIC: 80.63
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  996 
##           Std. Err.:  749 
## 
##  2 x log-likelihood:  -74.63

performance::r2(mod1)

## # R2 for Generalized Linear Regression
##   Nagelkerke's R2: 0.842

Beta diversidade

We fit generalised linear models with beta distribution for beta diversity (betareg::betareg) to the data using the R program. We base the selection in the Akaike information criteria (AIC-based), selecting the most parsimonious models with delta AICc < 2.

AIC-based model selection

#### MODELOS BINOMIAL
mdb0 <- betareg(beta.sor.t ~ 1, data= sor1)
mdb1 <- betareg(beta.sor.t ~ area, data= sor1)

AICtab(mdb0, mdb1, base=T)

##      AIC   dAIC  df
## mdb1 -95.8   0.0 4 
## mdb0 -67.1  28.7 2

MuMIn::AICc(mod0, mod1)

##      df     AICc
## mod0  2 88.06361
## mod1  3 92.62973

par(mfrow=c(2,2))
plot(mdb1)

Model results for beta:

The best fitted model was the one containing time and tree species (R2 = 0.89).

summary(mdb1)

## 
## Call:
## betareg(formula = beta.sor.t ~ area, data = sor1)
## 
## Standardized weighted residuals 2:
##     Min      1Q  Median      3Q     Max 
## -2.1268 -1.0156  0.4182  0.8810  1.2260 
## 
## Coefficients (mean model with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.29550    0.01553 -83.443  < 2e-16 ***
## area1       -0.19965    0.03250  -6.143  8.1e-10 ***
## area2       -0.33014    0.03360  -9.827  < 2e-16 ***
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)   
## (phi)   2730.6      996.9   2.739  0.00616 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood:  51.9 on 4 Df
## Pseudo R-squared: 0.8921
## Number of iterations: 1713 (BFGS) + 3 (Fisher scoring)

Variáveis abióticas

Variáveis abióticas também poderiam alterar a composição do microbioma. Fizemos alguns boxplots para explorar como estas variáveis estão mudando entre a área rural e a área de floresta, de acordo com o que nos foi enviado.

AB<-read.csv2("abioticas.csv", sep=";",  fileEncoding = "Latin1", header=T, dec=".")
AB$Area<-as.factor(AB$Area)
levels(AB$Area)

## [1] "Edge"   "Forest" "Rural"  "Urban"

head(AB)

##    Area Spot Arylsulfatase X..glucosidase Acid.Phosphatase Alkaline.Phosphatase
## 1 Urban    1      2.265181       3.121007         35.47741             7.889823
## 2 Urban    2     22.308969      20.948246         82.05325            13.600314
## 3 Urban    3     13.716082       5.310704         75.46843            12.200906
## 4 Urban    4     25.794935      11.454087         82.20383            11.513082
## 5 Urban    5      4.323747       6.116436         56.82712             7.254958
## 6 Urban    6     33.617979       7.830552         66.07508            21.418350
##   Density Porosity Moisture Microporosity Macroporosity Particle.Density   SOC
## 1    1.18    50.28   137.52         38.83         11.46             2.37 19.48
## 2    1.14    53.22   139.74         45.40          7.82             2.44 19.48
## 3    1.07    57.32   127.62         40.77         16.56             2.50 27.27
## 4    1.23    46.54   141.98         36.38         10.16             2.29 23.38
## 5    1.37    44.04   142.03         36.73          7.31             2.45 21.82
## 6    1.09    54.67   127.00         37.49         17.18             2.40 15.58
##     pH     P    K        N   Ca   Mg Al H.Al   BS  CEC   SOM Sand Clay Silt
## 1 5.11  5.36 0.26 1.464045 5.87 1.57  0 5.89 7.70 7.70 33.58 25.0   60 15.0
## 2 5.37  4.42 0.41 1.548105 5.56 1.53  0 4.34 7.50 7.50 33.58 45.0   40 15.0
## 3 5.14 20.44 0.61 2.416725 6.82 1.34  0 7.59 8.77 8.77 47.02 10.0   60 30.0
## 4 5.09 14.77 0.61 2.248605 7.19 1.65  0 6.11 9.45 9.45 40.30 27.5   40 32.5
## 5 5.14  9.66 0.20 2.080485 6.75 2.05  0 6.30 9.00 9.00 37.61 15.0   40 45.0
## 6 5.47  4.74 0.26 1.702215 5.83 2.71  0 4.85 8.80 8.80 26.87 20.0   60 20.0
##          Ba     Cr      Cu       Fe       Mn       Na     Ni      Pb      Zn
## 1 1701.7937 2.3323  9.3714 311.4072 105.4963 324.5041 0.0000  9.8453 11.2872
## 2 1533.6323 0.5898 26.2806 319.3578 427.1936  62.8692 1.0749  6.7686 11.1408
## 3 1338.5650 0.1877 21.1874 327.5233 276.3934  22.6249 2.2574  8.6146 26.3801
## 4 1695.0673 0.7238 16.5697 387.9052 241.4081  21.3530 0.6450 17.8446 20.0046
## 5 1661.4350 1.9301 19.7614 368.3509 291.0361  38.4098 0.4300 14.7679 16.4754
## 6  733.1839 1.2600  9.5751 224.5948 141.7591  39.6422 0.8600 21.5366  7.4651

str(AB)

## 'data.frame':    48 obs. of  36 variables:
##  $ Area                : Factor w/ 4 levels "Edge","Forest",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ Spot                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Arylsulfatase       : num  2.27 22.31 13.72 25.79 4.32 ...
##  $ X..glucosidase      : num  3.12 20.95 5.31 11.45 6.12 ...
##  $ Acid.Phosphatase    : num  35.5 82.1 75.5 82.2 56.8 ...
##  $ Alkaline.Phosphatase: num  7.89 13.6 12.2 11.51 7.25 ...
##  $ Density             : num  1.18 1.14 1.07 1.23 1.37 1.09 0.96 0.95 1.2 1.11 ...
##  $ Porosity            : num  50.3 53.2 57.3 46.5 44 ...
##  $ Moisture            : num  138 140 128 142 142 ...
##  $ Microporosity       : num  38.8 45.4 40.8 36.4 36.7 ...
##  $ Macroporosity       : num  11.46 7.82 16.56 10.16 7.31 ...
##  $ Particle.Density    : num  2.37 2.44 2.5 2.29 2.45 2.4 2.28 2.4 2.54 2.39 ...
##  $ SOC                 : num  19.5 19.5 27.3 23.4 21.8 ...
##  $ pH                  : num  5.11 5.37 5.14 5.09 5.14 5.47 3.81 3.89 4.02 3.96 ...
##  $ P                   : num  5.36 4.42 20.44 14.77 9.66 ...
##  $ K                   : num  0.26 0.41 0.61 0.61 0.2 0.26 0.28 0.41 0.08 0.1 ...
##  $ N                   : num  1.46 1.55 2.42 2.25 2.08 ...
##  $ Ca                  : num  5.87 5.56 6.82 7.19 6.75 5.83 1.94 1.68 0.79 1.26 ...
##  $ Mg                  : num  1.57 1.53 1.34 1.65 2.05 2.71 0.41 0.54 0.18 0.38 ...
##  $ Al                  : num  0 0 0 0 0 0 2.15 1.15 1.65 1.45 ...
##  $ H.Al                : num  5.89 4.34 7.59 6.11 6.3 ...
##  $ BS                  : num  7.7 7.5 8.77 9.45 9 8.8 2.63 2.63 1.05 1.74 ...
##  $ CEC                 : num  7.7 7.5 8.77 9.45 9 8.8 4.78 3.78 2.7 3.19 ...
##  $ SOM                 : num  33.6 33.6 47 40.3 37.6 ...
##  $ Sand                : num  25 45 10 27.5 15 20 25 25 25 10 ...
##  $ Clay                : int  60 40 60 40 40 60 60 60 60 60 ...
##  $ Silt                : num  15 15 30 32.5 45 20 15 15 15 30 ...
##  $ Ba                  : num  1702 1534 1339 1695 1661 ...
##  $ Cr                  : num  2.332 0.59 0.188 0.724 1.93 ...
##  $ Cu                  : num  9.37 26.28 21.19 16.57 19.76 ...
##  $ Fe                  : num  311 319 328 388 368 ...
##  $ Mn                  : num  105 427 276 241 291 ...
##  $ Na                  : num  324.5 62.9 22.6 21.4 38.4 ...
##  $ Ni                  : num  0 1.075 2.257 0.645 0.43 ...
##  $ Pb                  : num  9.85 6.77 8.61 17.84 14.77 ...
##  $ Zn                  : num  11.3 11.1 26.4 20 16.5 ...

ABi<-AB[AB$Area=="Rural"|AB$Area=="Forest",]
vari<-colnames(ABi)

head(ABi)

##     Area Spot Arylsulfatase X..glucosidase Acid.Phosphatase
## 13 Rural    1      4.954743      0.1215587         75.89307
## 14 Rural    2     10.216895      0.2399537         89.09107
## 15 Rural    3     13.128901     -0.1823805         88.30463
## 16 Rural    4      9.809631      0.0485416         68.86310
## 17 Rural    5     20.454745      0.4921459         85.60830
## 18 Rural    6     23.786942      0.1136793         93.07828
##    Alkaline.Phosphatase Density Porosity Moisture Microporosity Macroporosity
## 13             5.134844    1.36    44.27   159.36         38.51          5.76
## 14             5.485450    1.35    45.74   157.91         37.61          8.12
## 15             1.980411    1.25    51.44   144.27         38.91         12.53
## 16             1.400418    1.32    48.26   151.95         36.30         11.96
## 17            11.822917    1.35    45.79   153.66         35.99          9.80
## 18             9.053077    1.32    50.80   159.02         31.92         18.88
##    Particle.Density   SOC   pH     P    K        N   Ca   Mg   Al  H.Al   BS
## 13             2.44 19.48 4.44  6.73 0.41 1.295925 6.05 2.44 0.10  5.00 8.90
## 14             2.49 21.82 4.50 10.84 0.41 2.668905 5.32 1.55 0.10  6.99 7.28
## 15             2.57 24.93 4.62  4.74 0.10 1.954395 5.26 1.79 0.03  6.74 7.15
## 16             2.54 23.38 4.18  6.04 0.15 1.870335 1.61 0.52 2.00 12.87 2.28
## 17             2.48 23.38 4.37  5.05 0.82 2.949105 7.34 1.59 0.01  5.15 9.75
## 18             2.69 17.14 4.45 10.84 0.77 2.374695 5.44 1.09 0.25  7.93 7.30
##     CEC   SOM Sand Clay Silt        Ba     Cr      Cu       Fe       Mn      Na
## 13 9.00 33.58   10   50   40 1257.8475 6.3534 24.3791 385.1117 215.3166 32.8401
## 14 7.38 37.61   15   55   30 1197.3094 5.4151 25.9410 232.1157 369.8512 55.9169
## 15 7.18 42.99   15   50   35 1291.4798 5.0130 21.9344 574.8527 136.6980 37.3669
## 16 4.28 40.30   15   50   35  248.8789 7.1576 27.0955 201.6024 255.2155 49.3201
## 17 9.76 40.30   25   35   40 1520.1794 7.5597 30.6267 207.1893 557.5530 35.3919
## 18 7.55 29.55   30   45   25  941.7040 7.2917 45.1591 130.0467 590.0323 41.9649
##        Ni      Pb      Zn
## 13 3.8698 19.6906  5.9689
## 14 1.1824 15.9986  6.7821
## 15 3.4398 27.0745  5.7086
## 16 2.1499 22.7672  5.7574
## 17 3.7623 21.5366  8.9777
## 18 2.5799 19.0752 11.0432

#Desta análise precisaremos retirar arylsulfatase, X..glicosidase, fosfatase básica e fosfatase ácida, pois são enzimas e creio que para este artigo não são pertinentes. Dá para tirar também msu, micro e macroporo, pois só com a porosidade já teremos uma boas descrição dos dados

ABi<-ABi[,-c(3:6,9:11)]

head(ABi)

##     Area Spot Density Porosity Particle.Density   SOC   pH     P    K        N
## 13 Rural    1    1.36    44.27             2.44 19.48 4.44  6.73 0.41 1.295925
## 14 Rural    2    1.35    45.74             2.49 21.82 4.50 10.84 0.41 2.668905
## 15 Rural    3    1.25    51.44             2.57 24.93 4.62  4.74 0.10 1.954395
## 16 Rural    4    1.32    48.26             2.54 23.38 4.18  6.04 0.15 1.870335
## 17 Rural    5    1.35    45.79             2.48 23.38 4.37  5.05 0.82 2.949105
## 18 Rural    6    1.32    50.80             2.69 17.14 4.45 10.84 0.77 2.374695
##      Ca   Mg   Al  H.Al   BS  CEC   SOM Sand Clay Silt        Ba     Cr      Cu
## 13 6.05 2.44 0.10  5.00 8.90 9.00 33.58   10   50   40 1257.8475 6.3534 24.3791
## 14 5.32 1.55 0.10  6.99 7.28 7.38 37.61   15   55   30 1197.3094 5.4151 25.9410
## 15 5.26 1.79 0.03  6.74 7.15 7.18 42.99   15   50   35 1291.4798 5.0130 21.9344
## 16 1.61 0.52 2.00 12.87 2.28 4.28 40.30   15   50   35  248.8789 7.1576 27.0955
## 17 7.34 1.59 0.01  5.15 9.75 9.76 40.30   25   35   40 1520.1794 7.5597 30.6267
## 18 5.44 1.09 0.25  7.93 7.30 7.55 29.55   30   45   25  941.7040 7.2917 45.1591
##          Fe       Mn      Na     Ni      Pb      Zn
## 13 385.1117 215.3166 32.8401 3.8698 19.6906  5.9689
## 14 232.1157 369.8512 55.9169 1.1824 15.9986  6.7821
## 15 574.8527 136.6980 37.3669 3.4398 27.0745  5.7086
## 16 201.6024 255.2155 49.3201 2.1499 22.7672  5.7574
## 17 207.1893 557.5530 35.3919 3.7623 21.5366  8.9777
## 18 130.0467 590.0323 41.9649 2.5799 19.0752 11.0432

Das variáveis medidas, as que parecem ser diferentes entre as áreas são Arilsulfatase,Fosfatase Ácida, Densidade, Porosidade, MSU, Microporo, Potássio (k) e MO. Os outros parecem iguais.

Realizamos um PCA para combinar essas variáveis:

### Componentes principais

pca <- PCA(ABi[,-1], quali.sup = 1, graph=F)
p <- fviz_pca_biplot(pca,geom="point", col.ind = ABi[,1],
                      repel = TRUE,addEllipses = T)
p

Contudo, não conseguimos linkar os valores das amostras das variáveis abióticas à estrutura das comunidades de OTU, de maneira que não conseguimos adicionar estas variáveis no modelo.

No artigo, aparecem dados de: - C: C_aCl_l - C: em cmolc dm^-³ - N: em cmolc dm^-³ - K: em cmolc dm^-³ - Ca: em cmolc dm^-³ - Mg: em cmolc dm^-³ - Al: em cmolc dm^-³ - H⁺Al: em cmolc dm^-³ - V: em porcentagem - Clay: em porcentagem

Se não for possível vincular às amostras, ainda assim pode ser interessante mostrar alguns boxplots.

for(i in 3:ncol(ABi)){
#i=4
A<-ggplot(ABi)+
  geom_boxplot(aes(y=ABi[,i], x=ABi[,1])) + 
  theme_cowplot()+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=0.5))
  
print(A)
print(vari[i])}

## [1] "Arylsulfatase"

## [1] "X..glucosidase"

## [1] "Acid.Phosphatase"

## [1] "Alkaline.Phosphatase"

## [1] "Density"

## [1] "Porosity"

## [1] "Moisture"

## [1] "Microporosity"

## [1] "Macroporosity"

## [1] "Particle.Density"

## [1] "SOC"

## [1] "pH"

## [1] "P"

## [1] "K"

## [1] "N"

## [1] "Ca"

## [1] "Mg"

## [1] "Al"

## [1] "H.Al"

## [1] "BS"

## [1] "CEC"

## [1] "SOM"

## [1] "Sand"

## [1] "Clay"

## [1] "Silt"

## [1] "Ba"

## [1] "Cr"

Rarefação

#### Rarefaction curve
########### https://rpubs.com/brouwern/iNEXTvVEGAN
## Directory and arquive
setwd("D:/Dropbox/00_Unioeste/2 Pesquisa/3 Doutorado/Natália/Artigos_Projeto/Analises_Artigo3")
biotic=read.csv2("Pasta1.csv",h=T,sep =";", dec=".", row.names = 3)
names(biotic)

## [1] "Rank"  "TaxId" "F1"    "F2"    "F3"    "R1"    "R2"    "R3"

## Subset
species=subset(biotic, biotic$Rank=="species")
names(species)

## [1] "Rank"  "TaxId" "F1"    "F2"    "F3"    "R1"    "R2"    "R3"

sp=species[,-c(1:2)]

s=data.frame(t(sp))

#### Packages
library(iNEXT)
library(vegan)

#### All sampling 
names(species)

## [1] "Rank"  "TaxId" "F1"    "F2"    "F3"    "R1"    "R2"    "R3"

x <- list("Forest1" = species[,3], "Forest2"=species[,4], "Forest3"=species[,5], "Rural1"=species[,6], "Rural2"=species[,7], "Rural3"=species[,8])
str(x)

## List of 6
##  $ Forest1: int [1:4733] 0 0 0 0 0 1 0 2 3 0 ...
##  $ Forest2: int [1:4733] 1 2 2 0 0 2 0 0 1 0 ...
##  $ Forest3: int [1:4733] 0 0 1 1 0 1 0 2 0 0 ...
##  $ Rural1 : int [1:4733] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Rural2 : int [1:4733] 0 0 2 0 0 2 0 3 2 5 ...
##  $ Rural3 : int [1:4733] 0 0 3 2 1 1 1 6 0 1 ...

# call iNEXT on the list
out <- iNEXT(x, q=0, datatype="abundance", endpoint = 150000)

# Sample-size-based R/E curves, separating by "site""
ggiNEXT(out, type=1)+theme_bw()

Como conclusão até o momento, com base nessas análises, a composição dos solos, não se diferencia em termos de índices de diversidade e são indicadas como similares pelo índice de similaridade. Porém, há diferença na riqueza. A ??diversidade indica que a área do fragmento e a área rural são diferentes, ainda que a diferença seja muito baixa. O mapa de calor que mostra como se diferença a estrutura da comunidade através da abundância das espécies por área. Por fim, a comparação dos dados abióticos indica diferenciação entre o fragmento e a área rural. Se for possível conectar as amostras de OTUs com as amostras de fatores abióticos, podemos investigar se existe algum impacto da variação nos fatores abióticos com a estrutura das OTUs

Some useful References

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Bolker B, R Development Core Team (2022). bbmle: Tools for General Maximum Likelihood Estimation. R package version 1.0.25, https://CRAN.R-project.org/package=bbmle.

Barto?? K (2022). MuMIn: Multi-Model Inference. R package version 1.46.0, https://CRAN.R-project.org/package=MuMIn.