Assessing the soil microbiome of a native Atlantic forest fragment and an adjacent farm

Conforme a proposta: “dados foram analisados quanto a sua natureza através de gráficos exploratórios e estatísticas descritivas (percentual, medidas de tendência central e dispersão).(…) ajuste do modelo estatístico mais adequado (…). Ao fim, verificamos a aderência dos resíduos dos modelos às premissas dos testes (normalidade e heterocedasticidade) e a independência temporal da amostra. (…) confecção dos elementos visuais (figuras e tabelas)(…).

Objetivos:

“…the present study aimed to investigate the bacterial community structure of both soils with long-term agro-farming activities and undisturbed soils with natural vegetation, using metabarcoding, and assessing the differences in their ecological attributes.”

Metodologia:

O estudo foi feito utilizando 3 amostras em um fragmento florestal e 3 amostras numa área de fazenda adjacente. O baixo número amostral dificulta análises estatísticas que possam ser representativas, aumentando a chance de erro Tipo I. Porém, compreendemos que os dados de caracterização dos OTUs são um avanço tecnológico, visto que tal identificação permite avaliar os táxons que são filtrados em ambientes com pressão antropogênica. Assim, optamos por evidenciar estes resultados, caracterizando a estrutura do microbioma por meio da riqueza e de beta diversidade.

Esforço amostral: 3 amostras de solo por área (Fragmento e Fazenda).

Dados brutos

Dados enviados, Subset das 10 primeiras linhas:

# LOAD DATA #### 
setwd("D:/Dropbox/00_Unioeste/2 Pesquisa/3 Doutorado/Natália/16spaper")
species<-read.csv2("Table_All_24_02_22.csv", sep=";", header=T)
skim(species)
Data summary
Name species
Number of rows 7256
Number of columns 9
_______________________
Column type frequency:
character 2
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Rank 0 1 5 13 0 9 0
Scientific.Name 0 1 3 67 0 7256 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
TaxId 0 1 889406.32 950287.48 0 56643.5 376647 1776348 2770094 ▇▁▂▂▂
F21_contigs 0 1 94.04 1626.67 0 1.0 4 20 110926 ▇▁▁▁▁
F22_contigs 0 1 68.16 1182.93 0 0.0 3 14 81188 ▇▁▁▁▁
F23_contigs 0 1 64.87 1129.32 0 0.0 3 13 77364 ▇▁▁▁▁
R51_contigs 0 1 79.52 1359.37 0 1.0 4 18 99152 ▇▁▁▁▁
R52_contigs 0 1 88.38 1514.71 0 1.0 5 21 110140 ▇▁▁▁▁
R53_contigs 0 1 92.30 1568.72 0 1.0 5 23 114597 ▇▁▁▁▁
summary(species)
##      Rank               TaxId         Scientific.Name     F21_contigs       
##  Length:7256        Min.   :      0   Length:7256        Min.   :     0.00  
##  Class :character   1st Qu.:  56644   Class :character   1st Qu.:     1.00  
##  Mode  :character   Median : 376647   Mode  :character   Median :     4.00  
##                     Mean   : 889406                      Mean   :    94.04  
##                     3rd Qu.:1776348                      3rd Qu.:    20.00  
##                     Max.   :2770094                      Max.   :110926.00  
##   F22_contigs        F23_contigs        R51_contigs        R52_contigs       
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :     0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:     1.00  
##  Median :    3.00   Median :    3.00   Median :    4.00   Median :     5.00  
##  Mean   :   68.16   Mean   :   64.87   Mean   :   79.52   Mean   :    88.38  
##  3rd Qu.:   14.00   3rd Qu.:   13.00   3rd Qu.:   18.00   3rd Qu.:    21.00  
##  Max.   :81188.00   Max.   :77364.00   Max.   :99152.00   Max.   :110140.00  
##   R53_contigs      
##  Min.   :     0.0  
##  1st Qu.:     1.0  
##  Median :     5.0  
##  Mean   :    92.3  
##  3rd Qu.:    23.0  
##  Max.   :114597.0
table(species$Rank)
## 
## below species         class        family         genus         order 
##           165           107           458          1511           219 
##        phylum       species  superkingdom  unclassified 
##            58          4733             4             1
species[1:20,1:9] %>% 
  kbl() %>% kable_classic(full_width = F, html_font = "Cambria")
Rank TaxId Scientific.Name F21_contigs F22_contigs F23_contigs R51_contigs R52_contigs R53_contigs
unclassified 0 Unknown 8305 6205 5388 5105 5815 6328
species 1916956 Synechococcus sp. SynAce01 5 4 3 9 5 5
species 1720344 Psychrobacter sp. AntiMn-1 0 1 0 0 0 0
superkingdom 2 Bacteria <bacteria> 110926 81188 77364 99152 110140 114597
genus 131079 Limnobacter 6 8 0 13 13 17
genus 6 Azorhizobium 99 39 47 44 52 40
species 1179669 Yersinia sp. KBS0713 0 2 0 0 0 0
species 7 Azorhizobium caulinodans 99 39 47 44 52 40
species 9 Buchnera aphidicola 18 10 10 29 35 49
species 1179675 Burkholderia sp. KBS0801 0 2 1 0 2 3
species 1179672 Flavobacterium sp. KBS0721 0 0 1 0 0 2
genus 10 Cellvibrio 4 9 6 10 25 11
species 1179673 Oerskovia sp. KBS0722 32 13 28 8 15 7
species 11 Cellulomonas gilvus 9 5 1 4 11 4
genus 13 Dictyoglomus 10 3 6 18 18 5
species 14 Dictyoglomus thermophilum 0 1 0 3 1 0
genus 65551 Meiothermus 31 37 34 45 35 57
species 65553 Sulfurospirillum deleyianum 0 0 0 0 0 1
genus 16 Methylophilus 0 0 0 1 2 1
species 65555 Desulfocapsa sulfexigens 1 3 0 5 4 8
genus=subset(species, species$Rank=="genus")
genus[1:20,1:9] %>% 
  kbl() %>% kable_classic(full_width = F, html_font = "Cambria")
Rank TaxId Scientific.Name F21_contigs F22_contigs F23_contigs R51_contigs R52_contigs R53_contigs
5 genus 131079 Limnobacter 6 8 0 13 13 17
6 genus 6 Azorhizobium 99 39 47 44 52 40
12 genus 10 Cellvibrio 4 9 6 10 25 11
15 genus 13 Dictyoglomus 10 3 6 18 18 5
17 genus 65551 Meiothermus 31 37 34 45 35 57
19 genus 16 Methylophilus 0 0 0 1 2 1
21 genus 18 Pelobacter 25 24 27 53 40 64
23 genus 20 Phenylobacterium 471 373 329 217 222 245
24 genus 22 Shewanella 12 14 5 16 10 16
29 genus 32 Myxococcus 40 44 47 92 101 113
37 genus 40 Stigmatella 150 72 127 122 154 121
40 genus 42 Cystobacter 12 11 16 79 31 56
42 genus 44 Melittangium 17 14 16 32 57 61
44 genus 47 Archangium 119 65 75 81 102 93
47 genus 50 Chondromyces 74 41 55 175 219 219
54 genus 442430 Coraliomargarita 13 7 5 9 6 12
56 genus 59 Vitreoscilla 16 6 12 7 6 7
60 genus 68 Lysobacter 261 159 159 304 476 484
65 genus 75 Caulobacter 420 291 291 152 209 167
66 genus 622681 Candidatus Planktophila 4 3 1 3 3 1
family=subset(species, species$Rank=="family")
family[1:20,1:9] %>% 
  kbl() %>% kable_classic(full_width = F, html_font = "Cambria")
Rank TaxId Scientific.Name F21_contigs F22_contigs F23_contigs R51_contigs R52_contigs R53_contigs
28 family 31 Myxococcaceae 132 100 119 216 250 282
36 family 39 Archangiaceae 357 196 294 374 418 395
46 family 49 Polyangiaceae 364 224 263 867 1039 1035
104 family 213119 Desulfobacteraceae 111 93 88 271 298 328
109 family 126 Planctomycetaceae 374 256 244 269 271 314
110 family 213117 Desulfohalobiaceae 9 7 10 32 34 33
112 family 213116 Desulfomicrobiaceae 22 8 15 40 39 70
115 family 213121 Desulfobulbaceae 105 65 72 152 173 210
123 family 137 Spirochaetaceae 67 51 55 121 148 166
141 family 170 Leptospiraceae 26 20 17 27 32 32
153 family 2162846 Candidatus Nanopelagicaceae 24 21 7 24 41 25
158 family 82115 Rhizobiaceae 2117 1204 1157 610 841 775
186 family 1769717 Mesoaciditogaceae 2 0 3 7 6 10
262 family 541000 Ruminococcaceae 33 26 25 67 60 91
303 family 49546 Flavobacteriaceae 201 96 105 272 496 477
309 family 403 Methylococcaceae 105 77 64 151 172 196
323 family 213422 Geobacteraceae 149 135 107 230 282 300
324 family 213421 Desulfuromonadaceae 57 38 55 106 105 124
326 family 433 Acetobacteraceae 1472 1079 1093 880 965 969
338 family 444 Legionellaceae 152 79 63 70 103 95
class=subset(species, species$Rank=="class")
class[1:20,1:9] %>% 
  kbl() %>% kable_classic(full_width = F, html_font = "Cambria")
Rank TaxId Scientific.Name F21_contigs F22_contigs F23_contigs R51_contigs R52_contigs R53_contigs
71 class 28211 Alphaproteobacteria 35393 25712 23828 18871 21325 21833
75 class 1760 Actinobacteria <high GC Gram+> 26830 18927 19554 18237 19373 19209
76 class 28216 Betaproteobacteria 5046 3223 3111 7620 8902 9159
80 class 1236 Gammaproteobacteria 6212 4824 4366 5843 7462 7412
82 class 28221 Deltaproteobacteria 3324 2324 2399 5826 6637 6980
520 class 219685 Gemmatimonadetes <class> 753 720 552 2945 2884 3119
622 class 91061 Bacilli 1649 1201 1256 2570 2785 2998
782 class 1813735 Vicinamibacteria 1042 922 698 2472 2598 2793
792 class 186801 Clostridia 1318 1127 1046 2308 2380 2702
922 class 1853228 Chitinophagia 1353 869 646 1647 2168 2748
1080 class 204432 Acidobacteriia 3968 3861 3160 1904 2060 2154
1326 class 203683 Planctomycetia 3248 2138 2011 1947 1952 2170
1391 class 84992 Acidimicrobiia 1398 928 990 1619 1774 1934
1651 class 84995 Rubrobacteria 1223 827 912 1764 1706 1753
1691 class 1497346 Thermoleophilia 1359 881 1058 1321 1205 1295
1784 class 32061 Chloroflexia 153 98 76 1113 1267 1128
1801 class 768503 Cytophagia 420 180 215 628 1071 1081
1955 class 292625 Anaerolineae 157 110 86 641 703 758
1991 class 147550 Sordariomycetes 565 367 392 487 687 829
2053 class 203693 Nitrospira <class> 170 142 121 648 628 674
biotic=read.csv2("Pasta1.csv",h=T,sep =";", dec=".", row.names = 3)
skim(biotic)
Data summary
Name biotic
Number of rows 4733
Number of columns 8
_______________________
Column type frequency:
character 1
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Rank 0 1 7 7 0 1 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
TaxId 0 1 1012417.99 985090.36 7 68223 585425 1914471 2748873 ▇▁▂▂▃
F21_contigs 0 1 15.28 57.09 0 0 2 10 1359 ▇▁▁▁▁
F22_contigs 0 1 10.97 44.52 0 0 2 7 1119 ▇▁▁▁▁
F23_contigs 0 1 10.43 40.80 0 0 2 7 1058 ▇▁▁▁▁
R51_contigs 0 1 12.98 61.82 0 0 2 9 2472 ▇▁▁▁▁
R52_contigs 0 1 14.42 64.28 0 0 3 10 2598 ▇▁▁▁▁
R53_contigs 0 1 15.17 68.17 0 1 3 11 2793 ▇▁▁▁▁
str(biotic)
## 'data.frame':    4733 obs. of  8 variables:
##  $ Rank       : chr  "species" "species" "species" "species" ...
##  $ TaxId      : int  1916956 1720344 1179669 7 9 1179675 1179672 1179673 11 14 ...
##  $ F21_contigs: int  5 0 0 99 18 0 0 32 9 0 ...
##  $ F22_contigs: int  4 1 2 39 10 2 0 13 5 1 ...
##  $ F23_contigs: int  3 0 0 47 10 1 1 28 1 0 ...
##  $ R51_contigs: int  9 0 0 44 29 0 0 8 4 3 ...
##  $ R52_contigs: int  5 0 0 52 35 2 0 15 11 1 ...
##  $ R53_contigs: int  5 0 0 40 49 3 2 7 4 0 ...
summary(biotic)
##      Rank               TaxId          F21_contigs       F22_contigs     
##  Length:4733        Min.   :      7   Min.   :   0.00   Min.   :   0.00  
##  Class :character   1st Qu.:  68223   1st Qu.:   0.00   1st Qu.:   0.00  
##  Mode  :character   Median : 585425   Median :   2.00   Median :   2.00  
##                     Mean   :1012418   Mean   :  15.28   Mean   :  10.97  
##                     3rd Qu.:1914471   3rd Qu.:  10.00   3rd Qu.:   7.00  
##                     Max.   :2748873   Max.   :1359.00   Max.   :1119.00  
##   F23_contigs       R51_contigs       R52_contigs       R53_contigs     
##  Min.   :   0.00   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:   1.00  
##  Median :   2.00   Median :   2.00   Median :   3.00   Median :   3.00  
##  Mean   :  10.43   Mean   :  12.98   Mean   :  14.42   Mean   :  15.17  
##  3rd Qu.:   7.00   3rd Qu.:   9.00   3rd Qu.:  10.00   3rd Qu.:  11.00  
##  Max.   :1058.00   Max.   :2472.00   Max.   :2598.00   Max.   :2793.00
biotic[1:10,1:8] %>% 
  kbl(caption = "Abundance of OTUs per sample.") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Abundance of OTUs per sample.
Rank TaxId F21_contigs F22_contigs F23_contigs R51_contigs R52_contigs R53_contigs
Synechococcus sp. SynAce01 species 1916956 5 4 3 9 5 5
Psychrobacter sp. AntiMn-1 species 1720344 0 1 0 0 0 0
Yersinia sp. KBS0713 species 1179669 0 2 0 0 0 0
Azorhizobium caulinodans species 7 99 39 47 44 52 40
Buchnera aphidicola species 9 18 10 10 29 35 49
Burkholderia sp. KBS0801 species 1179675 0 2 1 0 2 3
Flavobacterium sp. KBS0721 species 1179672 0 0 1 0 0 2
Oerskovia sp. KBS0722 species 1179673 32 13 28 8 15 7
Cellulomonas gilvus species 11 9 5 1 4 11 4
Dictyoglomus thermophilum species 14 0 1 0 3 1 0

Cálculo dos Índices de diversidade para explorar a estrutura do microbioma:

  • Espécie
## Subset
species_1=subset(biotic, biotic$Rank=="species")
#str(species_1)
species_1$Rank<-as.factor(species_1$Rank)
#head(species_1)
sp=species_1[,-c(1:2)]#tirando as colunas Rank & TaxID
sp.t=t.data.frame(sp)#transpondo os dados 
col=nrow(species_1)
sp.t<-sp.t[,1:col]
soma1<-sp.t[1:2,]
soma1[1,]<-colSums(sp.t[1:3,])
soma1[2,]<-colSums(sp.t[4:6,])
rownames(soma1)<-c("F","R")

## Calculo dos índices de diversidade para análise exploratória
sha1=data.frame(diversity(soma1))
H1=data.frame(sha1) 
colnames(H1)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras

simp1 <- diversity(soma1, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H1$Simpson<-as.vector(simp1)

# Species richness (S)
S1 <- specnumber(sp.t) ## rowSums(sp.t > 0)

P1 <- sha1/log(S1)# Pielou's evenness (J)

pairs(cbind(sha1, simp1), pch="+", col="blue")

H1$Richness<-c(round(mean(S1[1:3]),0),round(mean(S1[4:6]),0))
H1$Pielou<-P1[,1]

H1 %>% 
  kbl(caption = "Diversity Index and Richness per local") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Diversity Index and Richness per local
Shannon Simpson Richness Pielou
F 6.805272 0.9966543 3218 0.8360560
R 6.812034 0.9954103 3513 0.8460409
  • Família
#str(family)
family$Rank<-as.factor(family$Rank)
#head(species_1)
f=family[,-c(1:3)]#tirando as colunas Rank & TaxID
f.t=t.data.frame(f)#transpondo os dados 
col=nrow(family)
f.t<-f.t[,1:col]
soma2<-f.t[1:2,]
soma2[1,]<-colSums(f.t[1:3,])
soma2[2,]<-colSums(f.t[4:6,])
rownames(soma2)<-c("F","R")

## Calculo dos índices de diversidade para análise exploratória
sha2=data.frame(diversity(soma2))
H2=data.frame(sha2) 
colnames(H2)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras

simp2 <- diversity(soma2, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H2$Simpson<-as.vector(simp2)

# Species richness (S)
S2 <- specnumber(f.t) ## rowSums(sp.t > 0)

P2 <- sha2/log(S2)# Pielou's evenness (J)

pairs(cbind(sha2, simp2), pch="+", col="blue")

H2$Richness<-c(round(mean(S2[1:3]),0),round(mean(S2[4:6]),0))
H2$Pielou<-P2[,1]

H2 %>% 
  kbl(caption = "Diversity Index and Richness per local") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Diversity Index and Richness per local
Shannon Simpson Richness Pielou
F 4.532598 0.9767113 417 0.7495153
R 4.859533 0.9862589 433 0.8070941
  • Classe
#str(Class)
class$Rank<-as.factor(class$Rank)
#head(species_1)
c=class[,-c(1:3)]#tirando as colunas Rank & TaxID
c.t=t.data.frame(c)#transpondo os dados 
col=nrow(class)
c.t<-c.t[,1:col]
soma3<-c.t[1:2,]
soma3[1,]<-colSums(c.t[1:3,])
soma3[2,]<-colSums(c.t[4:6,])
rownames(soma3)<-c("F","R")

## Calculo dos índices de diversidade para análise exploratória
sha3=data.frame(diversity(soma3))
H3=data.frame(sha3) 
colnames(H3)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras

simp3 <- diversity(soma3, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H3$Simpson<-as.vector(simp3)

# Species richness (S)
S3 <- specnumber(c.t) ## rowSums(sp.t > 0)

P3 <- sha3/log(S3)# Pielou's evenness (J)

pairs(cbind(sha3, simp3), pch="+", col="blue")

H3$Richness<-c(round(sum(S3[1:3]),0),round(sum(S3[4:6]),0))
H3$Pielou<-P3[,1]

H3 %>% 
  kbl(caption = "Diversity Index and Richness per local") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Diversity Index and Richness per local
Shannon Simpson Richness Pielou
F 2.429273 0.8119179 305 0.5252512
R 2.890531 0.8933679 313 0.6249834

Olhando a similaridade entre as amostras

  • Espécies
rural=sp[,4:6]
rural=ifelse(rural>0,1,0)
rural=as.data.frame(apply(rural, 1, sum))

floresta=sp[,1:3]
floresta=ifelse(floresta>0,1,0)
floresta=as.data.frame(apply(floresta, 1, sum))

simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras
simi<-as.data.frame(simi)
simi$bray<-round(bray.curtis(rural, floresta),2)
simi$sor<-round(sorenson(rural, floresta),2)

colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen")
#REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.

simi %>% 
  kbl(caption = "Areas similarities") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Areas similarities
Jaccard Bray-curtis Sorensen
0.75 0.84 0.86
  • Família
rural=family[,7:9]
rural=ifelse(rural>0,1,0)
rural=as.data.frame(apply(rural, 1, sum))

floresta=family[,4:6]
floresta=ifelse(floresta>0,1,0)
floresta=as.data.frame(apply(floresta, 1, sum))

simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras
simi<-as.data.frame(simi)
simi$bray<-round(bray.curtis(rural, floresta),2)
simi$sor<-round(sorenson(rural, floresta),2)

colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen")
#REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.

simi %>% 
  kbl(caption = "Areas similarities") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Areas similarities
Jaccard Bray-curtis Sorensen
0.95 0.97 0.97
  • Classe

    rural=class[,7:9]
    rural=ifelse(rural>0,1,0)
    rural=as.data.frame(apply(rural, 1, sum))
    
    floresta=class[,4:6]
    floresta=ifelse(floresta>0,1,0)
    floresta=as.data.frame(apply(floresta, 1, sum))
    
    simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras
    simi<-as.data.frame(simi)
    simi$bray<-round(bray.curtis(rural, floresta),2)
    simi$sor<-round(sorenson(rural, floresta),2)
    
    colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen")
    #REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.
    
    simi %>% 
      kbl(caption = "Areas similarities") %>%
      kable_classic(full_width = F, html_font = "Cambria")

    Areas similarities

    Jaccard

    Bray-curtis

    Sorensen

    0.98

    0.98

    0.99

forest.sp=species[,4:6]
forest.f=family[,4:6]
forest.c=class[,4:6]

rural.sp=species[,7:9]
rural.f=family[,7:9]
rural.c=class[,7:9]

f1=rowSums(forest.sp)
f2=rowSums(forest.f)
f3=rowSums(forest.c)
r1=rowSums(rural.sp)
r2=rowSums(rural.f)
r3=rowSums(rural.c)

Para comparar a composição de microbioma, após calcular os índices de diversidade, similaridade e riqueza, foi calculado a diversidade-beta.

Explorando Beta diversidade

??sor: Sorensen dissimilarity (Baselga, 2010) as operational response variable for the underlying structure of microbiome. To further test it as a function of the local.

Pairwise beta - diversidade calculada pela abordagem de Baselga (2010):

  • Espécie
bin_abun1 <- as.data.frame((sp.t>0)*1)# transformar em presence-absence
#head(bin_abun)

bifun1 <- beta.pair(bin_abun1) ## calcular beta par a par

sor.df1 <- matrixConvert(bifun1$beta.sor, # extrair beta sorensen entre obs
                      colname= c("sample1", "sample2", "beta.sor"))

#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
#  separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
#  separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))

ggplot(sor.df1, aes(sample1, sample2, fill= beta.sor)) + 
  geom_tile()+
  scale_fill_gradient(low="gray", high="dark orange")+
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(size=8, angle=0),
        axis.title.y = element_blank(),
        axis.text.y  = element_text(size=8))+ggtitle("Espécie")

  • Família
bin_abun2 <- as.data.frame((f.t>0)*1)# transformar em presence-absence
#head(bin_abun)

bifun2 <- beta.pair(bin_abun2) ## calcular beta par a par

sor.df2 <- matrixConvert(bifun2$beta.sor, # extrair beta sorensen entre obs
                      colname= c("sample1", "sample2", "beta.sor"))

#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
#  separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
#  separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))

ggplot(sor.df2, aes(sample1, sample2, fill= beta.sor)) + 
  geom_tile()+
  scale_fill_gradient(low="gray", high="dark orange")+
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(size=8, angle=0),
        axis.title.y = element_blank(),
        axis.text.y  = element_text(size=8))+ggtitle("Família")

  • Classe
bin_abun3 <- as.data.frame((c.t>0)*1)# transformar em presence-absence
#head(bin_abun)

bifun3 <- beta.pair(bin_abun3) ## calcular beta par a par

sor.df3 <- matrixConvert(bifun3$beta.sor, # extrair beta sorensen entre obs
                      colname= c("sample1", "sample2", "beta.sor"))

#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
#  separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
#  separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))

ggplot(sor.df3, aes(sample1, sample2, fill= beta.sor)) + 
  geom_tile()+
  scale_fill_gradient(low="gray", high="dark orange")+
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_text(size=8, angle=0),
        axis.title.y = element_blank(),
        axis.text.y  = element_text(size=8))+ggtitle("Classe")

Beta ~ area rural e fragmento:

Testando modelo para saber se tem diferença entre fazenda-fragmento na beta diversidade:

Olhando o modelo:

  • Espécie
sor1 <- sor.df1 %>% 
    unite("intervalo", c(sample1,sample2), remove=F) 

sor1$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2)
#quando compara frag com frag é 1
#quando compara frag com ref é 0
#quando compara ref com ref é 2

sor1$area<-as.factor(sor1$area)

## transformar beta antes de rodar para encaixar na distribuicao beta

y_transf_betareg = function(y){
  n.obs = sum(!is.na(y))
  (y*(n.obs-1) + 0.5)/n.obs
}

sor1$beta.sor.t <- y_transf_betareg(sor.df1$beta.sor)

m.tot1 <- betareg(beta.sor.t ~ area,
              data=sor1)
summary(m.tot1)
## 
## Call:
## betareg(formula = beta.sor.t ~ area, data = sor1)
## 
## Standardized weighted residuals 2:
##     Min      1Q  Median      3Q     Max 
## -2.1268 -1.0156  0.4182  0.8810  1.2260 
## 
## Coefficients (mean model with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.29550    0.01553 -83.443  < 2e-16 ***
## area1       -0.19965    0.03250  -6.143  8.1e-10 ***
## area2       -0.33014    0.03360  -9.827  < 2e-16 ***
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)   
## (phi)   2730.6      996.9   2.739  0.00616 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood:  51.9 on 4 Df
## Pseudo R-squared: 0.8921
## Number of iterations: 1713 (BFGS) + 3 (Fisher scoring)
performance::r2(m.tot1)
## # R2 for Beta Regression
##   Pseudo R2: 0.892
car::Anova(m.tot1)
## Analysis of Deviance Table (Type II tests)
## 
## Response: beta.sor.t
##      Df  Chisq Pr(>Chisq)    
## area  2 113.17  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggemmeans(m.tot1, terms=c("area")) %>% plot() +
  theme_cowplot() +
  ggtitle("Espécie")+
  theme(axis.text.x = element_text(hjust=1))+
        ylab("Beta diversity")

ggeffect(m.tot1, terms=c("area")) %>% plot() 

Podemos ver que a diferença nos valores de Beta são maiores quando comparamos área ruralxfragmento do que fragmento com fragmento [1] ou rural com rural[2].

Porém, vamos lembrar que beta varia de 0 a 1.

ggpredict(m.tot1, terms=c("area")) %>% plot()+
  theme(axis.text.x = element_text(hjust=1))+
        ylab("Beta diversity")+xlab("Area")+
  ggtitle("Beta Sorensen")

Portanto, a dimensão da diferença é ainda muito baixa. De 0 a 1, o índice fica entre 0.17 e 0.22, demonstrando que apesar dos números serem estatisticamente diferentes, a magnitude da diferença é pequena. Note que, apesar da riqueza da área Rural ser maior, a beta-diversidade dela ainda é menor do que a comparação entre fragmentos.

Comparações dos valores de beta-diversidade entre as amostras:

ggplot(sor1,
       aes(x=intervalo, y=beta.sor, col=area)) +
       geom_point(shape=5,size=2)+ ylab("Beta diversity")+
       theme(axis.text.x  = element_text(angle=75, vjust=0.5))+
       scale_x_discrete(name="") +
       scale_colour_discrete(name="",
                   labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Espécie")

  • Família

    sor2 <- sor.df2 %>% 
        unite("intervalo", c(sample1,sample2), remove=F) 
    
    sor2$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2)
    #quando compara frag com frag é 1
    #quando compara frag com ref é 0
    #quando compara ref com ref é 2
    
    sor2$area<-as.factor(sor2$area)
    
    ## transformar beta antes de rodar para encaixar na distribuicao beta
    
    y_transf_betareg = function(y){
      n.obs = sum(!is.na(y))
      (y*(n.obs-1) + 0.5)/n.obs
    }
    
    sor2$beta.sor.t <- y_transf_betareg(sor.df2$beta.sor)
    
    m.tot2 <- betareg(beta.sor.t ~ area,
                  data=sor2)
    summary(m.tot2)
    ## 
    ## Call:
    ## betareg(formula = beta.sor.t ~ area, data = sor2)
    ## 
    ## Standardized weighted residuals 2:
    ##     Min      1Q  Median      3Q     Max 
    ## -2.2736 -0.7393  0.2277  0.9330  1.5372 
    ## 
    ## Coefficients (mean model with logit link):
    ##             Estimate Std. Error  z value Pr(>|z|)    
    ## (Intercept) -2.60040    0.02367 -109.869  < 2e-16 ***
    ## area1       -0.07271    0.04845   -1.501  0.13339    
    ## area2       -0.13891    0.04953   -2.805  0.00504 ** 
    ## 
    ## Phi coefficients (precision model with identity link):
    ##       Estimate Std. Error z value Pr(>|z|)   
    ## (phi)     3080       1125   2.738  0.00618 **
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
    ## 
    ## Type of estimator: ML (maximum likelihood)
    ## Log-likelihood: 59.84 on 4 Df
    ## Pseudo R-squared: 0.3721
    ## Number of iterations: 2591 (BFGS) + 4 (Fisher scoring)
    performance::r2(m.tot2)
    ## # R2 for Beta Regression
    ##   Pseudo R2: 0.372
    car::Anova(m.tot2)
    ## Analysis of Deviance Table (Type II tests)
    ## 
    ## Response: beta.sor.t
    ##      Df  Chisq Pr(>Chisq)  
    ## area  2 8.6237    0.01341 *
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ggemmeans(m.tot2, terms=c("area")) %>% plot() +
      theme_cowplot() +
      ggtitle("Família")+
      theme(axis.text.x = element_text(hjust=1))+
            ylab("Beta diversity")

    ggeffect(m.tot2, terms=c("area")) %>% plot() 

    ggplot(sor2,
           aes(x=intervalo, y=beta.sor, col=area)) +
           geom_point(shape=5,size=2)+ ylab("Beta diversity")+
           theme(axis.text.x  = element_text(angle=75, vjust=0.5))+
           scale_x_discrete(name="") +
           scale_colour_discrete(name="",
                       labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Família")

  • Classe

    sor3 <- sor.df3 %>% 
        unite("intervalo", c(sample1,sample2), remove=F) 
    
    sor3$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2)
    #quando compara frag com frag é 1
    #quando compara frag com ref é 0
    #quando compara ref com ref é 2
    
    sor3$area<-as.factor(sor3$area)
    
    ## transformar beta antes de rodar para encaixar na distribuicao beta
    
    y_transf_betareg = function(y){
      n.obs = sum(!is.na(y))
      (y*(n.obs-1) + 0.5)/n.obs
    }
    
    sor3$beta.sor.t <- y_transf_betareg(sor.df3$beta.sor)
    
    m.tot3 <- betareg(beta.sor.t ~ area,
                  data=sor3)
    summary(m.tot3)
    ## 
    ## Call:
    ## betareg(formula = beta.sor.t ~ area, data = sor3)
    ## 
    ## Standardized weighted residuals 2:
    ##     Min      1Q  Median      3Q     Max 
    ## -2.9724 -0.4697 -0.2840  0.8010  1.8829 
    ## 
    ## Coefficients (mean model with logit link):
    ##             Estimate Std. Error z value Pr(>|z|)    
    ## (Intercept) -2.85225    0.04017 -71.001   <2e-16 ***
    ## area1       -0.07067    0.08213  -0.861    0.389    
    ## area2       -0.12329    0.08362  -1.474    0.140    
    ## 
    ## Phi coefficients (precision model with identity link):
    ##       Estimate Std. Error z value Pr(>|z|)   
    ## (phi)   1330.5      486.2   2.737   0.0062 **
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
    ## 
    ## Type of estimator: ML (maximum likelihood)
    ## Log-likelihood: 55.23 on 4 Df
    ## Pseudo R-squared: 0.1422
    ## Number of iterations: 1878 (BFGS) + 3 (Fisher scoring)
    performance::r2(m.tot3)
    ## # R2 for Beta Regression
    ##   Pseudo R2: 0.142
    car::Anova(m.tot3)
    ## Analysis of Deviance Table (Type II tests)
    ## 
    ## Response: beta.sor.t
    ##      Df  Chisq Pr(>Chisq)
    ## area  2 2.4548     0.2931
    ggemmeans(m.tot3, terms=c("area")) %>% plot() +
      theme_cowplot() +
      ggtitle("Classe")+
      theme(axis.text.x = element_text(hjust=1))+
            ylab("Beta diversity")

    ggeffect(m.tot3, terms=c("area")) %>% plot() 

    ggplot(sor3,
           aes(x=intervalo, y=beta.sor, col=area)) +
           geom_point(shape=5,size=2)+ ylab("Beta diversity")+
           theme(axis.text.x  = element_text(angle=75, vjust=0.5))+
           scale_x_discrete(name="") +
           scale_colour_discrete(name="",
                       labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Classe")

Abundância das OTUs

Abaixo temos um mapa de calor mostrando a variação na abundância das diferentes OTUs para cada uma das áreas.

Estrutura taxonômica

mtscaled1 = as.matrix(scale(species_1[-c(1:2)]))
row_order1 = hclust(dist(mtscaled1))$order
column_order1 = hclust(dist(t(mtscaled1)))$order
#heatmap reorders both variables and observations using a clustering algorithm: it computes the distance between each pair of rows and columns and try to order them by similarity.They allow us to simultaneously visualize clusters of samples and features. First hierarchical clustering is done of both the rows and the columns of the data matrix. The columns/rows of the data matrix are re-ordered according to the hierarchical clustering result, putting similar observations close to each other. The blocks of ‘high’ and ‘low’ values are adjacent in the data matrix. Finally, a color scheme is applied for the visualization and the data matrix is displayed. Visualizing the data matrix in this way can help to find the variables that appear to be characteristic for each sample cluster.
abund_t1<-t(as.matrix(soma1)) #todas as amostras juntas
my_group1 <- as.numeric(as.factor(substr(rownames(abund_t1), 1 , 1)))
colSide <- brewer.pal(9, "Set1")[my_group1]
colMain <- colorRampPalette(brewer.pal(9, "Set1"))(299)

heatmap(mtscaled1[row_order1,column_order1], col= colMain, Colv=NA, Rowv=NA, xlab="Areas", main="Microbiomes", cexRow=0.2, cexCol = 1, RowSideColors=colSide) #todas as amostras separadas

abundtscaled1 = as.matrix(scale(abund_t1))
abundt_row_order1 = hclust(dist(abundtscaled1))$order
colMainT <- colorRampPalette(brewer.pal(9, "Greens"))(299)
#isTRUE(abund_t[,1]==abund_t[,2])

heatmap(abund_t1, Colv = NA, Rowv = NA, xlab="Areas", RowSideColors=colSide, col=colMainT, main="Microbiome - Espécies",cexRow=0.2, cexCol = 1)

abund1<-as.data.frame(abund_t1[,1])
colnames(abund1)<-"abundance"
abund1$area<-rep("Frag",nrow(abund1))
abund_R1<-as.data.frame(abund_t1[,2])
colnames(abund_R1)<-"abundance"
abund_R1$area<-rep("Rural",nrow(abund_R1))
abund_to1<-rbind(abund1,abund_R1)
abund_to1$sp<-as.vector(rownames(abund_to1))

abund_to1$rank=as.numeric(rank(abund_to1$abundance))
head(abund_to1)
##                            abundance area                         sp   rank
## Synechococcus sp. SynAce01        12 Frag Synechococcus sp. SynAce01 5660.5
## Psychrobacter sp. AntiMn-1         1 Frag Psychrobacter sp. AntiMn-1 1779.5
## Yersinia sp. KBS0713               2 Frag       Yersinia sp. KBS0713 2714.5
## Azorhizobium caulinodans         185 Frag   Azorhizobium caulinodans 9097.0
## Buchnera aphidicola               38 Frag        Buchnera aphidicola 7597.5
## Burkholderia sp. KBS0801           3 Frag   Burkholderia sp. KBS0801 3317.5
ggplot(abund_to1,aes(x =rank,y = abundance, colour = area))+
  #geom_point()+
  geom_smooth(method="loess")+
  scale_colour_manual(values=c("orange", "darkgreen"),
                      name="Areas",
                   labels=c("Frag","Rural"))+
  ylab("Abundance")+ xlab("OTUs ranking")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())

abund_to1$ablog<-log(abund_to1$abundance)
ggplot(abund_to1)+
  geom_point(aes(x = sp,y = ablog, colour = area))+
  scale_colour_manual(values=c("orange", "darkgreen"),
                      name="Areas",
                   labels=c("Frag","Rural"))+
  ylab("Log(Abundance)")+ xlab("OTUs")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())

ggplot(abund_to1)+
  geom_point(aes(x = sp,y = abundance, colour = area))+
  scale_colour_manual(values=c("orange", "darkgreen"),
                      name="Areas",
                   labels=c("Frag","Rural"))+
  ylab("Abundance")+ xlab("OTUs")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())

É possível ver como varia a abundância entre as amostras do fragmento e da área rural.

Abaixo a gente observa a correlação das abundâncias entre as diferentes áreas. É possível verificar que claramente as correlações mais fortes são entre as mesmas áreas.

my_fn <- function(data, mapping, ...){
  p <- ggplot(data = data, mapping = mapping) + 
    geom_point() + 
    geom_smooth(method=lm, fill="blue", color="blue", ...)
  p
}

ggpairs(species_1[-c(1:2)], 
  upper = list(continuous = wrap("cor", size=3)),
  lower = list(continuous = my_fn))

Apesar do n amostral muito pequeno, rodamos modelos com intuito de ganhar mais intuição sobre o que pode estar acontecendo. Contudo, a ausência de mais amostras não nos permite, de fato, saber a distribuição dos dados e ajustar um modelo que descreva bem os dados.

Modelos para ganhar intuição

We fit generalised linear models with negative binomial distribution for richness (MASS::glm.nb).

sha=data.frame(diversity(sp.t))
h=data.frame(sha) 
colnames(h)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras

simp <- diversity(sp.t, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
h$Simpson<-as.vector(simp)

# Species richness (S)
S <- specnumber(sp.t) ## rowSums(sp.t > 0)

P <- sha/log(S)# Pielou's evenness (J)

h$Richness<-S
h$Evenness<-round(P[,1],2)
h$Area<-c("F","F","F","R", "R", "R")

mod0 <- glm.nb(h$Richness ~ 1, data= h)
mod1 <- glm.nb(h$Richness ~ h$Area, data= h)

AICtab(mod0, mod1, base=T)
##      AIC  dAIC df
## mod1 80.6  0.0 3 
## mod0 84.1  3.4 2
hnp::hnp(mod1)
## Negative binomial model (using MASS package)

testResiduals(simulateResiduals(mod1))

## $uniformity
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.208, p-value = 0.9127
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.1551, p-value = 0.648
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa bootstrapped outlier test
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, observations = 6, p-value = 1
## alternative hypothesis: two.sided
##  percent confidence interval:
##  0.0000000 0.1666667
## sample estimates:
## outlier frequency (expected: 0.0116666666666667 ) 
##                                                 0
## $uniformity
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.208, p-value = 0.9127
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.1551, p-value = 0.648
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa bootstrapped outlier test
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, observations = 6, p-value = 1
## alternative hypothesis: two.sided
##  percent confidence interval:
##  0.0000000 0.1666667
## sample estimates:
## outlier frequency (expected: 0.0116666666666667 ) 
##                                                 0

Model results for richness:

The best fitted model was the one containing area as a predictor (R2 = 0.84).

summary(mod1)
## 
## Call:
## glm.nb(formula = h$Richness ~ h$Area, data = h, init.theta = 995.7504182, 
##     link = log)
## 
## Deviance Residuals: 
## F21_contigs  F22_contigs  F23_contigs  R51_contigs  R52_contigs  R53_contigs  
##     1.76908     -0.67909     -1.14227     -0.78899      0.06603      0.71091  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.07641    0.02094 385.752  < 2e-16 ***
## h$AreaR      0.08772    0.02946   2.977  0.00291 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(995.7504) family taken to be 1)
## 
##     Null deviance: 14.8918  on 5  degrees of freedom
## Residual deviance:  6.0278  on 4  degrees of freedom
## AIC: 80.63
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  996 
##           Std. Err.:  749 
## 
##  2 x log-likelihood:  -74.63
performance::r2(mod1)
## # R2 for Generalized Linear Regression
##   Nagelkerke's R2: 0.842

Beta diversidade

We fit generalised linear models with beta distribution for beta diversity (betareg::betareg) to the data using the R program. We base the selection in the Akaike information criteria (AIC-based), selecting the most parsimonious models with delta AICc < 2.

AIC-based model selection

#### MODELOS BINOMIAL
mdb0 <- betareg(beta.sor.t ~ 1, data= sor1)
mdb1 <- betareg(beta.sor.t ~ area, data= sor1)

AICtab(mdb0, mdb1, base=T)
##      AIC   dAIC  df
## mdb1 -95.8   0.0 4 
## mdb0 -67.1  28.7 2
MuMIn::AICc(mod0, mod1)
##      df     AICc
## mod0  2 88.06361
## mod1  3 92.62973
par(mfrow=c(2,2))
plot(mdb1)

Model results for beta:

The best fitted model was the one containing time and tree species (R2 = 0.89).

summary(mdb1)
## 
## Call:
## betareg(formula = beta.sor.t ~ area, data = sor1)
## 
## Standardized weighted residuals 2:
##     Min      1Q  Median      3Q     Max 
## -2.1268 -1.0156  0.4182  0.8810  1.2260 
## 
## Coefficients (mean model with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.29550    0.01553 -83.443  < 2e-16 ***
## area1       -0.19965    0.03250  -6.143  8.1e-10 ***
## area2       -0.33014    0.03360  -9.827  < 2e-16 ***
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)   
## (phi)   2730.6      996.9   2.739  0.00616 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood:  51.9 on 4 Df
## Pseudo R-squared: 0.8921
## Number of iterations: 1713 (BFGS) + 3 (Fisher scoring)

Variáveis abióticas

Variáveis abióticas também poderiam alterar a composição do microbioma. Fizemos alguns boxplots para explorar como estas variáveis estão mudando entre a área rural e a área de floresta, de acordo com o que nos foi enviado.

AB<-read.csv2("abioticas.csv", sep=";",  fileEncoding = "Latin1", header=T, dec=".")
AB$Area<-as.factor(AB$Area)
levels(AB$Area)
## [1] "Edge"   "Forest" "Rural"  "Urban"
head(AB)
##    Area Spot Arylsulfatase X..glucosidase Acid.Phosphatase Alkaline.Phosphatase
## 1 Urban    1      2.265181       3.121007         35.47741             7.889823
## 2 Urban    2     22.308969      20.948246         82.05325            13.600314
## 3 Urban    3     13.716082       5.310704         75.46843            12.200906
## 4 Urban    4     25.794935      11.454087         82.20383            11.513082
## 5 Urban    5      4.323747       6.116436         56.82712             7.254958
## 6 Urban    6     33.617979       7.830552         66.07508            21.418350
##   Density Porosity Moisture Microporosity Macroporosity Particle.Density   SOC
## 1    1.18    50.28   137.52         38.83         11.46             2.37 19.48
## 2    1.14    53.22   139.74         45.40          7.82             2.44 19.48
## 3    1.07    57.32   127.62         40.77         16.56             2.50 27.27
## 4    1.23    46.54   141.98         36.38         10.16             2.29 23.38
## 5    1.37    44.04   142.03         36.73          7.31             2.45 21.82
## 6    1.09    54.67   127.00         37.49         17.18             2.40 15.58
##     pH     P    K        N   Ca   Mg Al H.Al   BS  CEC   SOM Sand Clay Silt
## 1 5.11  5.36 0.26 1.464045 5.87 1.57  0 5.89 7.70 7.70 33.58 25.0   60 15.0
## 2 5.37  4.42 0.41 1.548105 5.56 1.53  0 4.34 7.50 7.50 33.58 45.0   40 15.0
## 3 5.14 20.44 0.61 2.416725 6.82 1.34  0 7.59 8.77 8.77 47.02 10.0   60 30.0
## 4 5.09 14.77 0.61 2.248605 7.19 1.65  0 6.11 9.45 9.45 40.30 27.5   40 32.5
## 5 5.14  9.66 0.20 2.080485 6.75 2.05  0 6.30 9.00 9.00 37.61 15.0   40 45.0
## 6 5.47  4.74 0.26 1.702215 5.83 2.71  0 4.85 8.80 8.80 26.87 20.0   60 20.0
##          Ba     Cr      Cu       Fe       Mn       Na     Ni      Pb      Zn
## 1 1701.7937 2.3323  9.3714 311.4072 105.4963 324.5041 0.0000  9.8453 11.2872
## 2 1533.6323 0.5898 26.2806 319.3578 427.1936  62.8692 1.0749  6.7686 11.1408
## 3 1338.5650 0.1877 21.1874 327.5233 276.3934  22.6249 2.2574  8.6146 26.3801
## 4 1695.0673 0.7238 16.5697 387.9052 241.4081  21.3530 0.6450 17.8446 20.0046
## 5 1661.4350 1.9301 19.7614 368.3509 291.0361  38.4098 0.4300 14.7679 16.4754
## 6  733.1839 1.2600  9.5751 224.5948 141.7591  39.6422 0.8600 21.5366  7.4651
str(AB)
## 'data.frame':    48 obs. of  36 variables:
##  $ Area                : Factor w/ 4 levels "Edge","Forest",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ Spot                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Arylsulfatase       : num  2.27 22.31 13.72 25.79 4.32 ...
##  $ X..glucosidase      : num  3.12 20.95 5.31 11.45 6.12 ...
##  $ Acid.Phosphatase    : num  35.5 82.1 75.5 82.2 56.8 ...
##  $ Alkaline.Phosphatase: num  7.89 13.6 12.2 11.51 7.25 ...
##  $ Density             : num  1.18 1.14 1.07 1.23 1.37 1.09 0.96 0.95 1.2 1.11 ...
##  $ Porosity            : num  50.3 53.2 57.3 46.5 44 ...
##  $ Moisture            : num  138 140 128 142 142 ...
##  $ Microporosity       : num  38.8 45.4 40.8 36.4 36.7 ...
##  $ Macroporosity       : num  11.46 7.82 16.56 10.16 7.31 ...
##  $ Particle.Density    : num  2.37 2.44 2.5 2.29 2.45 2.4 2.28 2.4 2.54 2.39 ...
##  $ SOC                 : num  19.5 19.5 27.3 23.4 21.8 ...
##  $ pH                  : num  5.11 5.37 5.14 5.09 5.14 5.47 3.81 3.89 4.02 3.96 ...
##  $ P                   : num  5.36 4.42 20.44 14.77 9.66 ...
##  $ K                   : num  0.26 0.41 0.61 0.61 0.2 0.26 0.28 0.41 0.08 0.1 ...
##  $ N                   : num  1.46 1.55 2.42 2.25 2.08 ...
##  $ Ca                  : num  5.87 5.56 6.82 7.19 6.75 5.83 1.94 1.68 0.79 1.26 ...
##  $ Mg                  : num  1.57 1.53 1.34 1.65 2.05 2.71 0.41 0.54 0.18 0.38 ...
##  $ Al                  : num  0 0 0 0 0 0 2.15 1.15 1.65 1.45 ...
##  $ H.Al                : num  5.89 4.34 7.59 6.11 6.3 ...
##  $ BS                  : num  7.7 7.5 8.77 9.45 9 8.8 2.63 2.63 1.05 1.74 ...
##  $ CEC                 : num  7.7 7.5 8.77 9.45 9 8.8 4.78 3.78 2.7 3.19 ...
##  $ SOM                 : num  33.6 33.6 47 40.3 37.6 ...
##  $ Sand                : num  25 45 10 27.5 15 20 25 25 25 10 ...
##  $ Clay                : int  60 40 60 40 40 60 60 60 60 60 ...
##  $ Silt                : num  15 15 30 32.5 45 20 15 15 15 30 ...
##  $ Ba                  : num  1702 1534 1339 1695 1661 ...
##  $ Cr                  : num  2.332 0.59 0.188 0.724 1.93 ...
##  $ Cu                  : num  9.37 26.28 21.19 16.57 19.76 ...
##  $ Fe                  : num  311 319 328 388 368 ...
##  $ Mn                  : num  105 427 276 241 291 ...
##  $ Na                  : num  324.5 62.9 22.6 21.4 38.4 ...
##  $ Ni                  : num  0 1.075 2.257 0.645 0.43 ...
##  $ Pb                  : num  9.85 6.77 8.61 17.84 14.77 ...
##  $ Zn                  : num  11.3 11.1 26.4 20 16.5 ...
ABi<-AB[AB$Area=="Rural"|AB$Area=="Forest",]
vari<-colnames(ABi)

head(ABi)
##     Area Spot Arylsulfatase X..glucosidase Acid.Phosphatase
## 13 Rural    1      4.954743      0.1215587         75.89307
## 14 Rural    2     10.216895      0.2399537         89.09107
## 15 Rural    3     13.128901     -0.1823805         88.30463
## 16 Rural    4      9.809631      0.0485416         68.86310
## 17 Rural    5     20.454745      0.4921459         85.60830
## 18 Rural    6     23.786942      0.1136793         93.07828
##    Alkaline.Phosphatase Density Porosity Moisture Microporosity Macroporosity
## 13             5.134844    1.36    44.27   159.36         38.51          5.76
## 14             5.485450    1.35    45.74   157.91         37.61          8.12
## 15             1.980411    1.25    51.44   144.27         38.91         12.53
## 16             1.400418    1.32    48.26   151.95         36.30         11.96
## 17            11.822917    1.35    45.79   153.66         35.99          9.80
## 18             9.053077    1.32    50.80   159.02         31.92         18.88
##    Particle.Density   SOC   pH     P    K        N   Ca   Mg   Al  H.Al   BS
## 13             2.44 19.48 4.44  6.73 0.41 1.295925 6.05 2.44 0.10  5.00 8.90
## 14             2.49 21.82 4.50 10.84 0.41 2.668905 5.32 1.55 0.10  6.99 7.28
## 15             2.57 24.93 4.62  4.74 0.10 1.954395 5.26 1.79 0.03  6.74 7.15
## 16             2.54 23.38 4.18  6.04 0.15 1.870335 1.61 0.52 2.00 12.87 2.28
## 17             2.48 23.38 4.37  5.05 0.82 2.949105 7.34 1.59 0.01  5.15 9.75
## 18             2.69 17.14 4.45 10.84 0.77 2.374695 5.44 1.09 0.25  7.93 7.30
##     CEC   SOM Sand Clay Silt        Ba     Cr      Cu       Fe       Mn      Na
## 13 9.00 33.58   10   50   40 1257.8475 6.3534 24.3791 385.1117 215.3166 32.8401
## 14 7.38 37.61   15   55   30 1197.3094 5.4151 25.9410 232.1157 369.8512 55.9169
## 15 7.18 42.99   15   50   35 1291.4798 5.0130 21.9344 574.8527 136.6980 37.3669
## 16 4.28 40.30   15   50   35  248.8789 7.1576 27.0955 201.6024 255.2155 49.3201
## 17 9.76 40.30   25   35   40 1520.1794 7.5597 30.6267 207.1893 557.5530 35.3919
## 18 7.55 29.55   30   45   25  941.7040 7.2917 45.1591 130.0467 590.0323 41.9649
##        Ni      Pb      Zn
## 13 3.8698 19.6906  5.9689
## 14 1.1824 15.9986  6.7821
## 15 3.4398 27.0745  5.7086
## 16 2.1499 22.7672  5.7574
## 17 3.7623 21.5366  8.9777
## 18 2.5799 19.0752 11.0432
#Desta análise precisaremos retirar arylsulfatase, X..glicosidase, fosfatase básica e fosfatase ácida, pois são enzimas e creio que para este artigo não são pertinentes. Dá para tirar também msu, micro e macroporo, pois só com a porosidade já teremos uma boas descrição dos dados

ABi<-ABi[,-c(3:6,9:11)]

head(ABi)
##     Area Spot Density Porosity Particle.Density   SOC   pH     P    K        N
## 13 Rural    1    1.36    44.27             2.44 19.48 4.44  6.73 0.41 1.295925
## 14 Rural    2    1.35    45.74             2.49 21.82 4.50 10.84 0.41 2.668905
## 15 Rural    3    1.25    51.44             2.57 24.93 4.62  4.74 0.10 1.954395
## 16 Rural    4    1.32    48.26             2.54 23.38 4.18  6.04 0.15 1.870335
## 17 Rural    5    1.35    45.79             2.48 23.38 4.37  5.05 0.82 2.949105
## 18 Rural    6    1.32    50.80             2.69 17.14 4.45 10.84 0.77 2.374695
##      Ca   Mg   Al  H.Al   BS  CEC   SOM Sand Clay Silt        Ba     Cr      Cu
## 13 6.05 2.44 0.10  5.00 8.90 9.00 33.58   10   50   40 1257.8475 6.3534 24.3791
## 14 5.32 1.55 0.10  6.99 7.28 7.38 37.61   15   55   30 1197.3094 5.4151 25.9410
## 15 5.26 1.79 0.03  6.74 7.15 7.18 42.99   15   50   35 1291.4798 5.0130 21.9344
## 16 1.61 0.52 2.00 12.87 2.28 4.28 40.30   15   50   35  248.8789 7.1576 27.0955
## 17 7.34 1.59 0.01  5.15 9.75 9.76 40.30   25   35   40 1520.1794 7.5597 30.6267
## 18 5.44 1.09 0.25  7.93 7.30 7.55 29.55   30   45   25  941.7040 7.2917 45.1591
##          Fe       Mn      Na     Ni      Pb      Zn
## 13 385.1117 215.3166 32.8401 3.8698 19.6906  5.9689
## 14 232.1157 369.8512 55.9169 1.1824 15.9986  6.7821
## 15 574.8527 136.6980 37.3669 3.4398 27.0745  5.7086
## 16 201.6024 255.2155 49.3201 2.1499 22.7672  5.7574
## 17 207.1893 557.5530 35.3919 3.7623 21.5366  8.9777
## 18 130.0467 590.0323 41.9649 2.5799 19.0752 11.0432

Das variáveis medidas, as que parecem ser diferentes entre as áreas são Arilsulfatase,Fosfatase Ácida, Densidade, Porosidade, MSU, Microporo, Potássio (k) e MO. Os outros parecem iguais.

Realizamos um PCA para combinar essas variáveis:

### Componentes principais

pca <- PCA(ABi[,-1], quali.sup = 1, graph=F)
p <- fviz_pca_biplot(pca,geom="point", col.ind = ABi[,1],
                      repel = TRUE,addEllipses = T)
p

Contudo, não conseguimos linkar os valores das amostras das variáveis abióticas à estrutura das comunidades de OTU, de maneira que não conseguimos adicionar estas variáveis no modelo.

No artigo, aparecem dados de: - C: CaCll - C: em cmolc dm - N: em cmolc dm - K: em cmolc dm - Ca: em cmolc dm - Mg: em cmolc dm - Al: em cmolc dm - H+Al: em cmolc dm - V: em porcentagem - Clay: em porcentagem

Se não for possível vincular às amostras, ainda assim pode ser interessante mostrar alguns boxplots.

for(i in 3:ncol(ABi)){
#i=4
A<-ggplot(ABi)+
  geom_boxplot(aes(y=ABi[,i], x=ABi[,1])) + 
  theme_cowplot()+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=0.5))
  
print(A)
print(vari[i])}

## [1] "Arylsulfatase"

## [1] "X..glucosidase"

## [1] "Acid.Phosphatase"

## [1] "Alkaline.Phosphatase"

## [1] "Density"

## [1] "Porosity"

## [1] "Moisture"

## [1] "Microporosity"

## [1] "Macroporosity"

## [1] "Particle.Density"

## [1] "SOC"

## [1] "pH"

## [1] "P"

## [1] "K"

## [1] "N"

## [1] "Ca"

## [1] "Mg"

## [1] "Al"

## [1] "H.Al"

## [1] "BS"

## [1] "CEC"

## [1] "SOM"

## [1] "Sand"

## [1] "Clay"

## [1] "Silt"

## [1] "Ba"

## [1] "Cr"

Rarefação

#### Rarefaction curve
########### https://rpubs.com/brouwern/iNEXTvVEGAN
## Directory and arquive
setwd("D:/Dropbox/00_Unioeste/2 Pesquisa/3 Doutorado/Natália/Artigos_Projeto/Analises_Artigo3")
biotic=read.csv2("Pasta1.csv",h=T,sep =";", dec=".", row.names = 3)
names(biotic)
## [1] "Rank"  "TaxId" "F1"    "F2"    "F3"    "R1"    "R2"    "R3"
## Subset
species=subset(biotic, biotic$Rank=="species")
names(species)
## [1] "Rank"  "TaxId" "F1"    "F2"    "F3"    "R1"    "R2"    "R3"
sp=species[,-c(1:2)]

s=data.frame(t(sp))

#### Packages
library(iNEXT)
library(vegan)

#### All sampling 
names(species)
## [1] "Rank"  "TaxId" "F1"    "F2"    "F3"    "R1"    "R2"    "R3"
x <- list("Forest1" = species[,3], "Forest2"=species[,4], "Forest3"=species[,5], "Rural1"=species[,6], "Rural2"=species[,7], "Rural3"=species[,8])
str(x)
## List of 6
##  $ Forest1: int [1:4733] 0 0 0 0 0 1 0 2 3 0 ...
##  $ Forest2: int [1:4733] 1 2 2 0 0 2 0 0 1 0 ...
##  $ Forest3: int [1:4733] 0 0 1 1 0 1 0 2 0 0 ...
##  $ Rural1 : int [1:4733] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Rural2 : int [1:4733] 0 0 2 0 0 2 0 3 2 5 ...
##  $ Rural3 : int [1:4733] 0 0 3 2 1 1 1 6 0 1 ...
# call iNEXT on the list
out <- iNEXT(x, q=0, datatype="abundance", endpoint = 150000)

# Sample-size-based R/E curves, separating by "site""
ggiNEXT(out, type=1)+theme_bw()

Como conclusão até o momento, com base nessas análises, a composição dos solos, não se diferencia em termos de índices de diversidade e são indicadas como similares pelo índice de similaridade. Porém, há diferença na riqueza. A ??diversidade indica que a área do fragmento e a área rural são diferentes, ainda que a diferença seja muito baixa. O mapa de calor que mostra como se diferença a estrutura da comunidade através da abundância das espécies por área. Por fim, a comparação dos dados abióticos indica diferenciação entre o fragmento e a área rural. Se for possível conectar as amostras de OTUs com as amostras de fatores abióticos, podemos investigar se existe algum impacto da variação nos fatores abióticos com a estrutura das OTUs

Some useful References

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Bolker B, R Development Core Team (2022). bbmle: Tools for General Maximum Likelihood Estimation. R package version 1.0.25, https://CRAN.R-project.org/package=bbmle.

Barto?? K (2022). MuMIn: Multi-Model Inference. R package version 1.46.0, https://CRAN.R-project.org/package=MuMIn.