Assessing the soil microbiome of a native Atlantic forest fragment and an adjacent farm
Conforme a proposta: “dados foram analisados quanto a sua natureza através de gráficos exploratórios e estatísticas descritivas (percentual, medidas de tendência central e dispersão).(…) ajuste do modelo estatístico mais adequado (…). Ao fim, verificamos a aderência dos resíduos dos modelos às premissas dos testes (normalidade e heterocedasticidade) e a independência temporal da amostra. (…) confecção dos elementos visuais (figuras e tabelas)(…).
Objetivos:
“…the present study aimed to investigate the bacterial community structure of both soils with long-term agro-farming activities and undisturbed soils with natural vegetation, using metabarcoding, and assessing the differences in their ecological attributes.”
Metodologia:
O estudo foi feito utilizando 3 amostras em um fragmento florestal e 3 amostras numa área de fazenda adjacente. O baixo número amostral dificulta análises estatísticas que possam ser representativas, aumentando a chance de erro Tipo I. Porém, compreendemos que os dados de caracterização dos OTUs são um avanço tecnológico, visto que tal identificação permite avaliar os táxons que são filtrados em ambientes com pressão antropogênica. Assim, optamos por evidenciar estes resultados, caracterizando a estrutura do microbioma por meio da riqueza e de beta diversidade.
Esforço amostral: 3 amostras de solo por área (Fragmento e Fazenda).
Dados brutos
Dados enviados, Subset das 10 primeiras linhas:
# LOAD DATA ####
setwd("D:/Dropbox/00_Unioeste/2 Pesquisa/3 Doutorado/Natália/16spaper")
species<-read.csv2("Table_All_24_02_22.csv", sep=";", header=T)
skim(species)| Name | species |
| Number of rows | 7256 |
| Number of columns | 9 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Rank | 0 | 1 | 5 | 13 | 0 | 9 | 0 |
| Scientific.Name | 0 | 1 | 3 | 67 | 0 | 7256 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TaxId | 0 | 1 | 889406.32 | 950287.48 | 0 | 56643.5 | 376647 | 1776348 | 2770094 | ▇▁▂▂▂ |
| F21_contigs | 0 | 1 | 94.04 | 1626.67 | 0 | 1.0 | 4 | 20 | 110926 | ▇▁▁▁▁ |
| F22_contigs | 0 | 1 | 68.16 | 1182.93 | 0 | 0.0 | 3 | 14 | 81188 | ▇▁▁▁▁ |
| F23_contigs | 0 | 1 | 64.87 | 1129.32 | 0 | 0.0 | 3 | 13 | 77364 | ▇▁▁▁▁ |
| R51_contigs | 0 | 1 | 79.52 | 1359.37 | 0 | 1.0 | 4 | 18 | 99152 | ▇▁▁▁▁ |
| R52_contigs | 0 | 1 | 88.38 | 1514.71 | 0 | 1.0 | 5 | 21 | 110140 | ▇▁▁▁▁ |
| R53_contigs | 0 | 1 | 92.30 | 1568.72 | 0 | 1.0 | 5 | 23 | 114597 | ▇▁▁▁▁ |
summary(species)## Rank TaxId Scientific.Name F21_contigs
## Length:7256 Min. : 0 Length:7256 Min. : 0.00
## Class :character 1st Qu.: 56644 Class :character 1st Qu.: 1.00
## Mode :character Median : 376647 Mode :character Median : 4.00
## Mean : 889406 Mean : 94.04
## 3rd Qu.:1776348 3rd Qu.: 20.00
## Max. :2770094 Max. :110926.00
## F22_contigs F23_contigs R51_contigs R52_contigs
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.00
## Median : 3.00 Median : 3.00 Median : 4.00 Median : 5.00
## Mean : 68.16 Mean : 64.87 Mean : 79.52 Mean : 88.38
## 3rd Qu.: 14.00 3rd Qu.: 13.00 3rd Qu.: 18.00 3rd Qu.: 21.00
## Max. :81188.00 Max. :77364.00 Max. :99152.00 Max. :110140.00
## R53_contigs
## Min. : 0.0
## 1st Qu.: 1.0
## Median : 5.0
## Mean : 92.3
## 3rd Qu.: 23.0
## Max. :114597.0
table(species$Rank)##
## below species class family genus order
## 165 107 458 1511 219
## phylum species superkingdom unclassified
## 58 4733 4 1
species[1:20,1:9] %>%
kbl() %>% kable_classic(full_width = F, html_font = "Cambria")| Rank | TaxId | Scientific.Name | F21_contigs | F22_contigs | F23_contigs | R51_contigs | R52_contigs | R53_contigs |
|---|---|---|---|---|---|---|---|---|
| unclassified | 0 | Unknown | 8305 | 6205 | 5388 | 5105 | 5815 | 6328 |
| species | 1916956 | Synechococcus sp. SynAce01 | 5 | 4 | 3 | 9 | 5 | 5 |
| species | 1720344 | Psychrobacter sp. AntiMn-1 | 0 | 1 | 0 | 0 | 0 | 0 |
| superkingdom | 2 | Bacteria <bacteria> | 110926 | 81188 | 77364 | 99152 | 110140 | 114597 |
| genus | 131079 | Limnobacter | 6 | 8 | 0 | 13 | 13 | 17 |
| genus | 6 | Azorhizobium | 99 | 39 | 47 | 44 | 52 | 40 |
| species | 1179669 | Yersinia sp. KBS0713 | 0 | 2 | 0 | 0 | 0 | 0 |
| species | 7 | Azorhizobium caulinodans | 99 | 39 | 47 | 44 | 52 | 40 |
| species | 9 | Buchnera aphidicola | 18 | 10 | 10 | 29 | 35 | 49 |
| species | 1179675 | Burkholderia sp. KBS0801 | 0 | 2 | 1 | 0 | 2 | 3 |
| species | 1179672 | Flavobacterium sp. KBS0721 | 0 | 0 | 1 | 0 | 0 | 2 |
| genus | 10 | Cellvibrio | 4 | 9 | 6 | 10 | 25 | 11 |
| species | 1179673 | Oerskovia sp. KBS0722 | 32 | 13 | 28 | 8 | 15 | 7 |
| species | 11 | Cellulomonas gilvus | 9 | 5 | 1 | 4 | 11 | 4 |
| genus | 13 | Dictyoglomus | 10 | 3 | 6 | 18 | 18 | 5 |
| species | 14 | Dictyoglomus thermophilum | 0 | 1 | 0 | 3 | 1 | 0 |
| genus | 65551 | Meiothermus | 31 | 37 | 34 | 45 | 35 | 57 |
| species | 65553 | Sulfurospirillum deleyianum | 0 | 0 | 0 | 0 | 0 | 1 |
| genus | 16 | Methylophilus | 0 | 0 | 0 | 1 | 2 | 1 |
| species | 65555 | Desulfocapsa sulfexigens | 1 | 3 | 0 | 5 | 4 | 8 |
genus=subset(species, species$Rank=="genus")
genus[1:20,1:9] %>%
kbl() %>% kable_classic(full_width = F, html_font = "Cambria")| Rank | TaxId | Scientific.Name | F21_contigs | F22_contigs | F23_contigs | R51_contigs | R52_contigs | R53_contigs | |
|---|---|---|---|---|---|---|---|---|---|
| 5 | genus | 131079 | Limnobacter | 6 | 8 | 0 | 13 | 13 | 17 |
| 6 | genus | 6 | Azorhizobium | 99 | 39 | 47 | 44 | 52 | 40 |
| 12 | genus | 10 | Cellvibrio | 4 | 9 | 6 | 10 | 25 | 11 |
| 15 | genus | 13 | Dictyoglomus | 10 | 3 | 6 | 18 | 18 | 5 |
| 17 | genus | 65551 | Meiothermus | 31 | 37 | 34 | 45 | 35 | 57 |
| 19 | genus | 16 | Methylophilus | 0 | 0 | 0 | 1 | 2 | 1 |
| 21 | genus | 18 | Pelobacter | 25 | 24 | 27 | 53 | 40 | 64 |
| 23 | genus | 20 | Phenylobacterium | 471 | 373 | 329 | 217 | 222 | 245 |
| 24 | genus | 22 | Shewanella | 12 | 14 | 5 | 16 | 10 | 16 |
| 29 | genus | 32 | Myxococcus | 40 | 44 | 47 | 92 | 101 | 113 |
| 37 | genus | 40 | Stigmatella | 150 | 72 | 127 | 122 | 154 | 121 |
| 40 | genus | 42 | Cystobacter | 12 | 11 | 16 | 79 | 31 | 56 |
| 42 | genus | 44 | Melittangium | 17 | 14 | 16 | 32 | 57 | 61 |
| 44 | genus | 47 | Archangium | 119 | 65 | 75 | 81 | 102 | 93 |
| 47 | genus | 50 | Chondromyces | 74 | 41 | 55 | 175 | 219 | 219 |
| 54 | genus | 442430 | Coraliomargarita | 13 | 7 | 5 | 9 | 6 | 12 |
| 56 | genus | 59 | Vitreoscilla | 16 | 6 | 12 | 7 | 6 | 7 |
| 60 | genus | 68 | Lysobacter | 261 | 159 | 159 | 304 | 476 | 484 |
| 65 | genus | 75 | Caulobacter | 420 | 291 | 291 | 152 | 209 | 167 |
| 66 | genus | 622681 | Candidatus Planktophila | 4 | 3 | 1 | 3 | 3 | 1 |
family=subset(species, species$Rank=="family")
family[1:20,1:9] %>%
kbl() %>% kable_classic(full_width = F, html_font = "Cambria")| Rank | TaxId | Scientific.Name | F21_contigs | F22_contigs | F23_contigs | R51_contigs | R52_contigs | R53_contigs | |
|---|---|---|---|---|---|---|---|---|---|
| 28 | family | 31 | Myxococcaceae | 132 | 100 | 119 | 216 | 250 | 282 |
| 36 | family | 39 | Archangiaceae | 357 | 196 | 294 | 374 | 418 | 395 |
| 46 | family | 49 | Polyangiaceae | 364 | 224 | 263 | 867 | 1039 | 1035 |
| 104 | family | 213119 | Desulfobacteraceae | 111 | 93 | 88 | 271 | 298 | 328 |
| 109 | family | 126 | Planctomycetaceae | 374 | 256 | 244 | 269 | 271 | 314 |
| 110 | family | 213117 | Desulfohalobiaceae | 9 | 7 | 10 | 32 | 34 | 33 |
| 112 | family | 213116 | Desulfomicrobiaceae | 22 | 8 | 15 | 40 | 39 | 70 |
| 115 | family | 213121 | Desulfobulbaceae | 105 | 65 | 72 | 152 | 173 | 210 |
| 123 | family | 137 | Spirochaetaceae | 67 | 51 | 55 | 121 | 148 | 166 |
| 141 | family | 170 | Leptospiraceae | 26 | 20 | 17 | 27 | 32 | 32 |
| 153 | family | 2162846 | Candidatus Nanopelagicaceae | 24 | 21 | 7 | 24 | 41 | 25 |
| 158 | family | 82115 | Rhizobiaceae | 2117 | 1204 | 1157 | 610 | 841 | 775 |
| 186 | family | 1769717 | Mesoaciditogaceae | 2 | 0 | 3 | 7 | 6 | 10 |
| 262 | family | 541000 | Ruminococcaceae | 33 | 26 | 25 | 67 | 60 | 91 |
| 303 | family | 49546 | Flavobacteriaceae | 201 | 96 | 105 | 272 | 496 | 477 |
| 309 | family | 403 | Methylococcaceae | 105 | 77 | 64 | 151 | 172 | 196 |
| 323 | family | 213422 | Geobacteraceae | 149 | 135 | 107 | 230 | 282 | 300 |
| 324 | family | 213421 | Desulfuromonadaceae | 57 | 38 | 55 | 106 | 105 | 124 |
| 326 | family | 433 | Acetobacteraceae | 1472 | 1079 | 1093 | 880 | 965 | 969 |
| 338 | family | 444 | Legionellaceae | 152 | 79 | 63 | 70 | 103 | 95 |
class=subset(species, species$Rank=="class")
class[1:20,1:9] %>%
kbl() %>% kable_classic(full_width = F, html_font = "Cambria")| Rank | TaxId | Scientific.Name | F21_contigs | F22_contigs | F23_contigs | R51_contigs | R52_contigs | R53_contigs | |
|---|---|---|---|---|---|---|---|---|---|
| 71 | class | 28211 | Alphaproteobacteria | 35393 | 25712 | 23828 | 18871 | 21325 | 21833 |
| 75 | class | 1760 | Actinobacteria <high GC Gram+> | 26830 | 18927 | 19554 | 18237 | 19373 | 19209 |
| 76 | class | 28216 | Betaproteobacteria | 5046 | 3223 | 3111 | 7620 | 8902 | 9159 |
| 80 | class | 1236 | Gammaproteobacteria | 6212 | 4824 | 4366 | 5843 | 7462 | 7412 |
| 82 | class | 28221 | Deltaproteobacteria | 3324 | 2324 | 2399 | 5826 | 6637 | 6980 |
| 520 | class | 219685 | Gemmatimonadetes <class> | 753 | 720 | 552 | 2945 | 2884 | 3119 |
| 622 | class | 91061 | Bacilli | 1649 | 1201 | 1256 | 2570 | 2785 | 2998 |
| 782 | class | 1813735 | Vicinamibacteria | 1042 | 922 | 698 | 2472 | 2598 | 2793 |
| 792 | class | 186801 | Clostridia | 1318 | 1127 | 1046 | 2308 | 2380 | 2702 |
| 922 | class | 1853228 | Chitinophagia | 1353 | 869 | 646 | 1647 | 2168 | 2748 |
| 1080 | class | 204432 | Acidobacteriia | 3968 | 3861 | 3160 | 1904 | 2060 | 2154 |
| 1326 | class | 203683 | Planctomycetia | 3248 | 2138 | 2011 | 1947 | 1952 | 2170 |
| 1391 | class | 84992 | Acidimicrobiia | 1398 | 928 | 990 | 1619 | 1774 | 1934 |
| 1651 | class | 84995 | Rubrobacteria | 1223 | 827 | 912 | 1764 | 1706 | 1753 |
| 1691 | class | 1497346 | Thermoleophilia | 1359 | 881 | 1058 | 1321 | 1205 | 1295 |
| 1784 | class | 32061 | Chloroflexia | 153 | 98 | 76 | 1113 | 1267 | 1128 |
| 1801 | class | 768503 | Cytophagia | 420 | 180 | 215 | 628 | 1071 | 1081 |
| 1955 | class | 292625 | Anaerolineae | 157 | 110 | 86 | 641 | 703 | 758 |
| 1991 | class | 147550 | Sordariomycetes | 565 | 367 | 392 | 487 | 687 | 829 |
| 2053 | class | 203693 | Nitrospira <class> | 170 | 142 | 121 | 648 | 628 | 674 |
biotic=read.csv2("Pasta1.csv",h=T,sep =";", dec=".", row.names = 3)
skim(biotic)| Name | biotic |
| Number of rows | 4733 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Rank | 0 | 1 | 7 | 7 | 0 | 1 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TaxId | 0 | 1 | 1012417.99 | 985090.36 | 7 | 68223 | 585425 | 1914471 | 2748873 | ▇▁▂▂▃ |
| F21_contigs | 0 | 1 | 15.28 | 57.09 | 0 | 0 | 2 | 10 | 1359 | ▇▁▁▁▁ |
| F22_contigs | 0 | 1 | 10.97 | 44.52 | 0 | 0 | 2 | 7 | 1119 | ▇▁▁▁▁ |
| F23_contigs | 0 | 1 | 10.43 | 40.80 | 0 | 0 | 2 | 7 | 1058 | ▇▁▁▁▁ |
| R51_contigs | 0 | 1 | 12.98 | 61.82 | 0 | 0 | 2 | 9 | 2472 | ▇▁▁▁▁ |
| R52_contigs | 0 | 1 | 14.42 | 64.28 | 0 | 0 | 3 | 10 | 2598 | ▇▁▁▁▁ |
| R53_contigs | 0 | 1 | 15.17 | 68.17 | 0 | 1 | 3 | 11 | 2793 | ▇▁▁▁▁ |
str(biotic)## 'data.frame': 4733 obs. of 8 variables:
## $ Rank : chr "species" "species" "species" "species" ...
## $ TaxId : int 1916956 1720344 1179669 7 9 1179675 1179672 1179673 11 14 ...
## $ F21_contigs: int 5 0 0 99 18 0 0 32 9 0 ...
## $ F22_contigs: int 4 1 2 39 10 2 0 13 5 1 ...
## $ F23_contigs: int 3 0 0 47 10 1 1 28 1 0 ...
## $ R51_contigs: int 9 0 0 44 29 0 0 8 4 3 ...
## $ R52_contigs: int 5 0 0 52 35 2 0 15 11 1 ...
## $ R53_contigs: int 5 0 0 40 49 3 2 7 4 0 ...
summary(biotic)## Rank TaxId F21_contigs F22_contigs
## Length:4733 Min. : 7 Min. : 0.00 Min. : 0.00
## Class :character 1st Qu.: 68223 1st Qu.: 0.00 1st Qu.: 0.00
## Mode :character Median : 585425 Median : 2.00 Median : 2.00
## Mean :1012418 Mean : 15.28 Mean : 10.97
## 3rd Qu.:1914471 3rd Qu.: 10.00 3rd Qu.: 7.00
## Max. :2748873 Max. :1359.00 Max. :1119.00
## F23_contigs R51_contigs R52_contigs R53_contigs
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 2.00 Median : 2.00 Median : 3.00 Median : 3.00
## Mean : 10.43 Mean : 12.98 Mean : 14.42 Mean : 15.17
## 3rd Qu.: 7.00 3rd Qu.: 9.00 3rd Qu.: 10.00 3rd Qu.: 11.00
## Max. :1058.00 Max. :2472.00 Max. :2598.00 Max. :2793.00
biotic[1:10,1:8] %>%
kbl(caption = "Abundance of OTUs per sample.") %>%
kable_classic(full_width = F, html_font = "Cambria")| Rank | TaxId | F21_contigs | F22_contigs | F23_contigs | R51_contigs | R52_contigs | R53_contigs | |
|---|---|---|---|---|---|---|---|---|
| Synechococcus sp. SynAce01 | species | 1916956 | 5 | 4 | 3 | 9 | 5 | 5 |
| Psychrobacter sp. AntiMn-1 | species | 1720344 | 0 | 1 | 0 | 0 | 0 | 0 |
| Yersinia sp. KBS0713 | species | 1179669 | 0 | 2 | 0 | 0 | 0 | 0 |
| Azorhizobium caulinodans | species | 7 | 99 | 39 | 47 | 44 | 52 | 40 |
| Buchnera aphidicola | species | 9 | 18 | 10 | 10 | 29 | 35 | 49 |
| Burkholderia sp. KBS0801 | species | 1179675 | 0 | 2 | 1 | 0 | 2 | 3 |
| Flavobacterium sp. KBS0721 | species | 1179672 | 0 | 0 | 1 | 0 | 0 | 2 |
| Oerskovia sp. KBS0722 | species | 1179673 | 32 | 13 | 28 | 8 | 15 | 7 |
| Cellulomonas gilvus | species | 11 | 9 | 5 | 1 | 4 | 11 | 4 |
| Dictyoglomus thermophilum | species | 14 | 0 | 1 | 0 | 3 | 1 | 0 |
Cálculo dos Índices de diversidade para explorar a estrutura do microbioma:
- Espécie
## Subset
species_1=subset(biotic, biotic$Rank=="species")
#str(species_1)
species_1$Rank<-as.factor(species_1$Rank)
#head(species_1)
sp=species_1[,-c(1:2)]#tirando as colunas Rank & TaxID
sp.t=t.data.frame(sp)#transpondo os dados
col=nrow(species_1)
sp.t<-sp.t[,1:col]
soma1<-sp.t[1:2,]
soma1[1,]<-colSums(sp.t[1:3,])
soma1[2,]<-colSums(sp.t[4:6,])
rownames(soma1)<-c("F","R")
## Calculo dos índices de diversidade para análise exploratória
sha1=data.frame(diversity(soma1))
H1=data.frame(sha1)
colnames(H1)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras
simp1 <- diversity(soma1, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H1$Simpson<-as.vector(simp1)
# Species richness (S)
S1 <- specnumber(sp.t) ## rowSums(sp.t > 0)
P1 <- sha1/log(S1)# Pielou's evenness (J)
pairs(cbind(sha1, simp1), pch="+", col="blue")H1$Richness<-c(round(mean(S1[1:3]),0),round(mean(S1[4:6]),0))
H1$Pielou<-P1[,1]
H1 %>%
kbl(caption = "Diversity Index and Richness per local") %>%
kable_classic(full_width = F, html_font = "Cambria")| Shannon | Simpson | Richness | Pielou | |
|---|---|---|---|---|
| F | 6.805272 | 0.9966543 | 3218 | 0.8360560 |
| R | 6.812034 | 0.9954103 | 3513 | 0.8460409 |
- Família
#str(family)
family$Rank<-as.factor(family$Rank)
#head(species_1)
f=family[,-c(1:3)]#tirando as colunas Rank & TaxID
f.t=t.data.frame(f)#transpondo os dados
col=nrow(family)
f.t<-f.t[,1:col]
soma2<-f.t[1:2,]
soma2[1,]<-colSums(f.t[1:3,])
soma2[2,]<-colSums(f.t[4:6,])
rownames(soma2)<-c("F","R")
## Calculo dos índices de diversidade para análise exploratória
sha2=data.frame(diversity(soma2))
H2=data.frame(sha2)
colnames(H2)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras
simp2 <- diversity(soma2, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H2$Simpson<-as.vector(simp2)
# Species richness (S)
S2 <- specnumber(f.t) ## rowSums(sp.t > 0)
P2 <- sha2/log(S2)# Pielou's evenness (J)
pairs(cbind(sha2, simp2), pch="+", col="blue")H2$Richness<-c(round(mean(S2[1:3]),0),round(mean(S2[4:6]),0))
H2$Pielou<-P2[,1]
H2 %>%
kbl(caption = "Diversity Index and Richness per local") %>%
kable_classic(full_width = F, html_font = "Cambria")| Shannon | Simpson | Richness | Pielou | |
|---|---|---|---|---|
| F | 4.532598 | 0.9767113 | 417 | 0.7495153 |
| R | 4.859533 | 0.9862589 | 433 | 0.8070941 |
- Classe
#str(Class)
class$Rank<-as.factor(class$Rank)
#head(species_1)
c=class[,-c(1:3)]#tirando as colunas Rank & TaxID
c.t=t.data.frame(c)#transpondo os dados
col=nrow(class)
c.t<-c.t[,1:col]
soma3<-c.t[1:2,]
soma3[1,]<-colSums(c.t[1:3,])
soma3[2,]<-colSums(c.t[4:6,])
rownames(soma3)<-c("F","R")
## Calculo dos índices de diversidade para análise exploratória
sha3=data.frame(diversity(soma3))
H3=data.frame(sha3)
colnames(H3)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras
simp3 <- diversity(soma3, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
H3$Simpson<-as.vector(simp3)
# Species richness (S)
S3 <- specnumber(c.t) ## rowSums(sp.t > 0)
P3 <- sha3/log(S3)# Pielou's evenness (J)
pairs(cbind(sha3, simp3), pch="+", col="blue")H3$Richness<-c(round(sum(S3[1:3]),0),round(sum(S3[4:6]),0))
H3$Pielou<-P3[,1]
H3 %>%
kbl(caption = "Diversity Index and Richness per local") %>%
kable_classic(full_width = F, html_font = "Cambria")| Shannon | Simpson | Richness | Pielou | |
|---|---|---|---|---|
| F | 2.429273 | 0.8119179 | 305 | 0.5252512 |
| R | 2.890531 | 0.8933679 | 313 | 0.6249834 |
Olhando a similaridade entre as amostras
- Espécies
rural=sp[,4:6]
rural=ifelse(rural>0,1,0)
rural=as.data.frame(apply(rural, 1, sum))
floresta=sp[,1:3]
floresta=ifelse(floresta>0,1,0)
floresta=as.data.frame(apply(floresta, 1, sum))
simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras
simi<-as.data.frame(simi)
simi$bray<-round(bray.curtis(rural, floresta),2)
simi$sor<-round(sorenson(rural, floresta),2)
colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen")
#REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.
simi %>%
kbl(caption = "Areas similarities") %>%
kable_classic(full_width = F, html_font = "Cambria")| Jaccard | Bray-curtis | Sorensen |
|---|---|---|
| 0.75 | 0.84 | 0.86 |
- Família
rural=family[,7:9]
rural=ifelse(rural>0,1,0)
rural=as.data.frame(apply(rural, 1, sum))
floresta=family[,4:6]
floresta=ifelse(floresta>0,1,0)
floresta=as.data.frame(apply(floresta, 1, sum))
simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras
simi<-as.data.frame(simi)
simi$bray<-round(bray.curtis(rural, floresta),2)
simi$sor<-round(sorenson(rural, floresta),2)
colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen")
#REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell.
simi %>%
kbl(caption = "Areas similarities") %>%
kable_classic(full_width = F, html_font = "Cambria")| Jaccard | Bray-curtis | Sorensen |
|---|---|---|
| 0.95 | 0.97 | 0.97 |
Classe
rural=class[,7:9] rural=ifelse(rural>0,1,0) rural=as.data.frame(apply(rural, 1, sum)) floresta=class[,4:6] floresta=ifelse(floresta>0,1,0) floresta=as.data.frame(apply(floresta, 1, sum)) simi<-round(jaccard(rural, floresta),2)#compara dois vetores com as espécies ordenadas da mesma maneira de duas areas, para isso somei as especies que ocorriam em qualquer uma das amostras simi<-as.data.frame(simi) simi$bray<-round(bray.curtis(rural, floresta),2) simi$sor<-round(sorenson(rural, floresta),2) colnames(simi)<-c("Jaccard", "Bray-curtis", "Sorensen") #REF: Magurran, A. E. 2004. Measuring Biological Diversity. Oxford, Blackwell. simi %>% kbl(caption = "Areas similarities") %>% kable_classic(full_width = F, html_font = "Cambria")Areas similarities
Jaccard
Bray-curtis
Sorensen
0.98
0.98
0.99
forest.sp=species[,4:6]
forest.f=family[,4:6]
forest.c=class[,4:6]
rural.sp=species[,7:9]
rural.f=family[,7:9]
rural.c=class[,7:9]
f1=rowSums(forest.sp)
f2=rowSums(forest.f)
f3=rowSums(forest.c)
r1=rowSums(rural.sp)
r2=rowSums(rural.f)
r3=rowSums(rural.c)Para comparar a composição de microbioma, após calcular os índices de diversidade, similaridade e riqueza, foi calculado a diversidade-beta.
Explorando Beta diversidade
??sor: Sorensen dissimilarity (Baselga, 2010) as operational response variable for the underlying structure of microbiome. To further test it as a function of the local.
Pairwise beta - diversidade calculada pela abordagem de Baselga (2010):
- Espécie
bin_abun1 <- as.data.frame((sp.t>0)*1)# transformar em presence-absence
#head(bin_abun)
bifun1 <- beta.pair(bin_abun1) ## calcular beta par a par
sor.df1 <- matrixConvert(bifun1$beta.sor, # extrair beta sorensen entre obs
colname= c("sample1", "sample2", "beta.sor"))
#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
# separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
# separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))
ggplot(sor.df1, aes(sample1, sample2, fill= beta.sor)) +
geom_tile()+
scale_fill_gradient(low="gray", high="dark orange")+
theme(axis.title.x = element_blank(),
axis.text.x = element_text(size=8, angle=0),
axis.title.y = element_blank(),
axis.text.y = element_text(size=8))+ggtitle("Espécie")- Família
bin_abun2 <- as.data.frame((f.t>0)*1)# transformar em presence-absence
#head(bin_abun)
bifun2 <- beta.pair(bin_abun2) ## calcular beta par a par
sor.df2 <- matrixConvert(bifun2$beta.sor, # extrair beta sorensen entre obs
colname= c("sample1", "sample2", "beta.sor"))
#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
# separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
# separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))
ggplot(sor.df2, aes(sample1, sample2, fill= beta.sor)) +
geom_tile()+
scale_fill_gradient(low="gray", high="dark orange")+
theme(axis.title.x = element_blank(),
axis.text.x = element_text(size=8, angle=0),
axis.title.y = element_blank(),
axis.text.y = element_text(size=8))+ggtitle("Família")- Classe
bin_abun3 <- as.data.frame((c.t>0)*1)# transformar em presence-absence
#head(bin_abun)
bifun3 <- beta.pair(bin_abun3) ## calcular beta par a par
sor.df3 <- matrixConvert(bifun3$beta.sor, # extrair beta sorensen entre obs
colname= c("sample1", "sample2", "beta.sor"))
#sor.df <- sor.df %>% ## reorganizar para usar na exploratoria
# separate("sample1", c("ponto_1", "tempo_1", "substrato_1")) %>%
# separate("sample2", c("ponto_2", "tempo_2", "substrato_2"))
ggplot(sor.df3, aes(sample1, sample2, fill= beta.sor)) +
geom_tile()+
scale_fill_gradient(low="gray", high="dark orange")+
theme(axis.title.x = element_blank(),
axis.text.x = element_text(size=8, angle=0),
axis.title.y = element_blank(),
axis.text.y = element_text(size=8))+ggtitle("Classe")Beta ~ area rural e fragmento:
Testando modelo para saber se tem diferença entre fazenda-fragmento na beta diversidade:
Olhando o modelo:
- Espécie
sor1 <- sor.df1 %>%
unite("intervalo", c(sample1,sample2), remove=F)
sor1$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2)
#quando compara frag com frag é 1
#quando compara frag com ref é 0
#quando compara ref com ref é 2
sor1$area<-as.factor(sor1$area)
## transformar beta antes de rodar para encaixar na distribuicao beta
y_transf_betareg = function(y){
n.obs = sum(!is.na(y))
(y*(n.obs-1) + 0.5)/n.obs
}
sor1$beta.sor.t <- y_transf_betareg(sor.df1$beta.sor)
m.tot1 <- betareg(beta.sor.t ~ area,
data=sor1)
summary(m.tot1)##
## Call:
## betareg(formula = beta.sor.t ~ area, data = sor1)
##
## Standardized weighted residuals 2:
## Min 1Q Median 3Q Max
## -2.1268 -1.0156 0.4182 0.8810 1.2260
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.29550 0.01553 -83.443 < 2e-16 ***
## area1 -0.19965 0.03250 -6.143 8.1e-10 ***
## area2 -0.33014 0.03360 -9.827 < 2e-16 ***
##
## Phi coefficients (precision model with identity link):
## Estimate Std. Error z value Pr(>|z|)
## (phi) 2730.6 996.9 2.739 0.00616 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 51.9 on 4 Df
## Pseudo R-squared: 0.8921
## Number of iterations: 1713 (BFGS) + 3 (Fisher scoring)
performance::r2(m.tot1)## # R2 for Beta Regression
## Pseudo R2: 0.892
car::Anova(m.tot1)## Analysis of Deviance Table (Type II tests)
##
## Response: beta.sor.t
## Df Chisq Pr(>Chisq)
## area 2 113.17 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggemmeans(m.tot1, terms=c("area")) %>% plot() +
theme_cowplot() +
ggtitle("Espécie")+
theme(axis.text.x = element_text(hjust=1))+
ylab("Beta diversity")ggeffect(m.tot1, terms=c("area")) %>% plot() Podemos ver que a diferença nos valores de Beta são maiores quando comparamos área ruralxfragmento do que fragmento com fragmento [1] ou rural com rural[2].
Porém, vamos lembrar que beta varia de 0 a 1.
ggpredict(m.tot1, terms=c("area")) %>% plot()+
theme(axis.text.x = element_text(hjust=1))+
ylab("Beta diversity")+xlab("Area")+
ggtitle("Beta Sorensen")Portanto, a dimensão da diferença é ainda muito baixa. De 0 a 1, o índice fica entre 0.17 e 0.22, demonstrando que apesar dos números serem estatisticamente diferentes, a magnitude da diferença é pequena. Note que, apesar da riqueza da área Rural ser maior, a beta-diversidade dela ainda é menor do que a comparação entre fragmentos.
Comparações dos valores de beta-diversidade entre as amostras:
ggplot(sor1,
aes(x=intervalo, y=beta.sor, col=area)) +
geom_point(shape=5,size=2)+ ylab("Beta diversity")+
theme(axis.text.x = element_text(angle=75, vjust=0.5))+
scale_x_discrete(name="") +
scale_colour_discrete(name="",
labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Espécie")Família
sor2 <- sor.df2 %>% unite("intervalo", c(sample1,sample2), remove=F) sor2$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2) #quando compara frag com frag é 1 #quando compara frag com ref é 0 #quando compara ref com ref é 2 sor2$area<-as.factor(sor2$area) ## transformar beta antes de rodar para encaixar na distribuicao beta y_transf_betareg = function(y){ n.obs = sum(!is.na(y)) (y*(n.obs-1) + 0.5)/n.obs } sor2$beta.sor.t <- y_transf_betareg(sor.df2$beta.sor) m.tot2 <- betareg(beta.sor.t ~ area, data=sor2) summary(m.tot2)## ## Call: ## betareg(formula = beta.sor.t ~ area, data = sor2) ## ## Standardized weighted residuals 2: ## Min 1Q Median 3Q Max ## -2.2736 -0.7393 0.2277 0.9330 1.5372 ## ## Coefficients (mean model with logit link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -2.60040 0.02367 -109.869 < 2e-16 *** ## area1 -0.07271 0.04845 -1.501 0.13339 ## area2 -0.13891 0.04953 -2.805 0.00504 ** ## ## Phi coefficients (precision model with identity link): ## Estimate Std. Error z value Pr(>|z|) ## (phi) 3080 1125 2.738 0.00618 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Type of estimator: ML (maximum likelihood) ## Log-likelihood: 59.84 on 4 Df ## Pseudo R-squared: 0.3721 ## Number of iterations: 2591 (BFGS) + 4 (Fisher scoring)performance::r2(m.tot2)## # R2 for Beta Regression ## Pseudo R2: 0.372car::Anova(m.tot2)## Analysis of Deviance Table (Type II tests) ## ## Response: beta.sor.t ## Df Chisq Pr(>Chisq) ## area 2 8.6237 0.01341 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1ggemmeans(m.tot2, terms=c("area")) %>% plot() + theme_cowplot() + ggtitle("Família")+ theme(axis.text.x = element_text(hjust=1))+ ylab("Beta diversity")ggeffect(m.tot2, terms=c("area")) %>% plot()ggplot(sor2, aes(x=intervalo, y=beta.sor, col=area)) + geom_point(shape=5,size=2)+ ylab("Beta diversity")+ theme(axis.text.x = element_text(angle=75, vjust=0.5))+ scale_x_discrete(name="") + scale_colour_discrete(name="", labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Família")Classe
sor3 <- sor.df3 %>% unite("intervalo", c(sample1,sample2), remove=F) sor3$area<-c(1,1,1,0,0,0,0,0,0,2,0,0,0,2,2) #quando compara frag com frag é 1 #quando compara frag com ref é 0 #quando compara ref com ref é 2 sor3$area<-as.factor(sor3$area) ## transformar beta antes de rodar para encaixar na distribuicao beta y_transf_betareg = function(y){ n.obs = sum(!is.na(y)) (y*(n.obs-1) + 0.5)/n.obs } sor3$beta.sor.t <- y_transf_betareg(sor.df3$beta.sor) m.tot3 <- betareg(beta.sor.t ~ area, data=sor3) summary(m.tot3)## ## Call: ## betareg(formula = beta.sor.t ~ area, data = sor3) ## ## Standardized weighted residuals 2: ## Min 1Q Median 3Q Max ## -2.9724 -0.4697 -0.2840 0.8010 1.8829 ## ## Coefficients (mean model with logit link): ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -2.85225 0.04017 -71.001 <2e-16 *** ## area1 -0.07067 0.08213 -0.861 0.389 ## area2 -0.12329 0.08362 -1.474 0.140 ## ## Phi coefficients (precision model with identity link): ## Estimate Std. Error z value Pr(>|z|) ## (phi) 1330.5 486.2 2.737 0.0062 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Type of estimator: ML (maximum likelihood) ## Log-likelihood: 55.23 on 4 Df ## Pseudo R-squared: 0.1422 ## Number of iterations: 1878 (BFGS) + 3 (Fisher scoring)performance::r2(m.tot3)## # R2 for Beta Regression ## Pseudo R2: 0.142car::Anova(m.tot3)## Analysis of Deviance Table (Type II tests) ## ## Response: beta.sor.t ## Df Chisq Pr(>Chisq) ## area 2 2.4548 0.2931ggemmeans(m.tot3, terms=c("area")) %>% plot() + theme_cowplot() + ggtitle("Classe")+ theme(axis.text.x = element_text(hjust=1))+ ylab("Beta diversity")ggeffect(m.tot3, terms=c("area")) %>% plot()ggplot(sor3, aes(x=intervalo, y=beta.sor, col=area)) + geom_point(shape=5,size=2)+ ylab("Beta diversity")+ theme(axis.text.x = element_text(angle=75, vjust=0.5))+ scale_x_discrete(name="") + scale_colour_discrete(name="", labels=c("RuralxFrag","FragxFrag","RurxRur"))+ggtitle("Classe")
Abundância das OTUs
Abaixo temos um mapa de calor mostrando a variação na abundância das diferentes OTUs para cada uma das áreas.
Estrutura taxonômica
mtscaled1 = as.matrix(scale(species_1[-c(1:2)]))
row_order1 = hclust(dist(mtscaled1))$order
column_order1 = hclust(dist(t(mtscaled1)))$order
#heatmap reorders both variables and observations using a clustering algorithm: it computes the distance between each pair of rows and columns and try to order them by similarity.They allow us to simultaneously visualize clusters of samples and features. First hierarchical clustering is done of both the rows and the columns of the data matrix. The columns/rows of the data matrix are re-ordered according to the hierarchical clustering result, putting similar observations close to each other. The blocks of ‘high’ and ‘low’ values are adjacent in the data matrix. Finally, a color scheme is applied for the visualization and the data matrix is displayed. Visualizing the data matrix in this way can help to find the variables that appear to be characteristic for each sample cluster.
abund_t1<-t(as.matrix(soma1)) #todas as amostras juntas
my_group1 <- as.numeric(as.factor(substr(rownames(abund_t1), 1 , 1)))
colSide <- brewer.pal(9, "Set1")[my_group1]
colMain <- colorRampPalette(brewer.pal(9, "Set1"))(299)
heatmap(mtscaled1[row_order1,column_order1], col= colMain, Colv=NA, Rowv=NA, xlab="Areas", main="Microbiomes", cexRow=0.2, cexCol = 1, RowSideColors=colSide) #todas as amostras separadasabundtscaled1 = as.matrix(scale(abund_t1))
abundt_row_order1 = hclust(dist(abundtscaled1))$order
colMainT <- colorRampPalette(brewer.pal(9, "Greens"))(299)
#isTRUE(abund_t[,1]==abund_t[,2])
heatmap(abund_t1, Colv = NA, Rowv = NA, xlab="Areas", RowSideColors=colSide, col=colMainT, main="Microbiome - Espécies",cexRow=0.2, cexCol = 1)abund1<-as.data.frame(abund_t1[,1])
colnames(abund1)<-"abundance"
abund1$area<-rep("Frag",nrow(abund1))
abund_R1<-as.data.frame(abund_t1[,2])
colnames(abund_R1)<-"abundance"
abund_R1$area<-rep("Rural",nrow(abund_R1))
abund_to1<-rbind(abund1,abund_R1)
abund_to1$sp<-as.vector(rownames(abund_to1))
abund_to1$rank=as.numeric(rank(abund_to1$abundance))
head(abund_to1)## abundance area sp rank
## Synechococcus sp. SynAce01 12 Frag Synechococcus sp. SynAce01 5660.5
## Psychrobacter sp. AntiMn-1 1 Frag Psychrobacter sp. AntiMn-1 1779.5
## Yersinia sp. KBS0713 2 Frag Yersinia sp. KBS0713 2714.5
## Azorhizobium caulinodans 185 Frag Azorhizobium caulinodans 9097.0
## Buchnera aphidicola 38 Frag Buchnera aphidicola 7597.5
## Burkholderia sp. KBS0801 3 Frag Burkholderia sp. KBS0801 3317.5
ggplot(abund_to1,aes(x =rank,y = abundance, colour = area))+
#geom_point()+
geom_smooth(method="loess")+
scale_colour_manual(values=c("orange", "darkgreen"),
name="Areas",
labels=c("Frag","Rural"))+
ylab("Abundance")+ xlab("OTUs ranking")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())abund_to1$ablog<-log(abund_to1$abundance)
ggplot(abund_to1)+
geom_point(aes(x = sp,y = ablog, colour = area))+
scale_colour_manual(values=c("orange", "darkgreen"),
name="Areas",
labels=c("Frag","Rural"))+
ylab("Log(Abundance)")+ xlab("OTUs")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())ggplot(abund_to1)+
geom_point(aes(x = sp,y = abundance, colour = area))+
scale_colour_manual(values=c("orange", "darkgreen"),
name="Areas",
labels=c("Frag","Rural"))+
ylab("Abundance")+ xlab("OTUs")+ theme(axis.ticks = element_blank(), axis.text.x = element_blank())É possível ver como varia a abundância entre as amostras do fragmento e da área rural.
Abaixo a gente observa a correlação das abundâncias entre as diferentes áreas. É possível verificar que claramente as correlações mais fortes são entre as mesmas áreas.
my_fn <- function(data, mapping, ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(method=lm, fill="blue", color="blue", ...)
p
}
ggpairs(species_1[-c(1:2)],
upper = list(continuous = wrap("cor", size=3)),
lower = list(continuous = my_fn))Apesar do n amostral muito pequeno, rodamos modelos com intuito de ganhar mais intuição sobre o que pode estar acontecendo. Contudo, a ausência de mais amostras não nos permite, de fato, saber a distribuição dos dados e ajustar um modelo que descreva bem os dados.
Modelos para ganhar intuição
We fit generalised linear models with negative binomial distribution for richness (MASS::glm.nb).
sha=data.frame(diversity(sp.t))
h=data.frame(sha)
colnames(h)<-"Shannon" #Indice de Shanon é a abundância proporcional das especies comparando por area; Shanon é mais sensível à espécies raras
simp <- diversity(sp.t, "simpson") #Enquanto Simpson é mais sensível à equabilidade de distribuição das especies
h$Simpson<-as.vector(simp)
# Species richness (S)
S <- specnumber(sp.t) ## rowSums(sp.t > 0)
P <- sha/log(S)# Pielou's evenness (J)
h$Richness<-S
h$Evenness<-round(P[,1],2)
h$Area<-c("F","F","F","R", "R", "R")
mod0 <- glm.nb(h$Richness ~ 1, data= h)
mod1 <- glm.nb(h$Richness ~ h$Area, data= h)
AICtab(mod0, mod1, base=T)## AIC dAIC df
## mod1 80.6 0.0 3
## mod0 84.1 3.4 2
hnp::hnp(mod1)## Negative binomial model (using MASS package)
testResiduals(simulateResiduals(mod1))## $uniformity
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.208, p-value = 0.9127
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 1.1551, p-value = 0.648
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa bootstrapped outlier test
##
## data: simulationOutput
## outliers at both margin(s) = 0, observations = 6, p-value = 1
## alternative hypothesis: two.sided
## percent confidence interval:
## 0.0000000 0.1666667
## sample estimates:
## outlier frequency (expected: 0.0116666666666667 )
## 0
## $uniformity
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.208, p-value = 0.9127
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 1.1551, p-value = 0.648
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa bootstrapped outlier test
##
## data: simulationOutput
## outliers at both margin(s) = 0, observations = 6, p-value = 1
## alternative hypothesis: two.sided
## percent confidence interval:
## 0.0000000 0.1666667
## sample estimates:
## outlier frequency (expected: 0.0116666666666667 )
## 0
Model results for richness:
The best fitted model was the one containing area as a predictor (R2 = 0.84).
summary(mod1)##
## Call:
## glm.nb(formula = h$Richness ~ h$Area, data = h, init.theta = 995.7504182,
## link = log)
##
## Deviance Residuals:
## F21_contigs F22_contigs F23_contigs R51_contigs R52_contigs R53_contigs
## 1.76908 -0.67909 -1.14227 -0.78899 0.06603 0.71091
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.07641 0.02094 385.752 < 2e-16 ***
## h$AreaR 0.08772 0.02946 2.977 0.00291 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(995.7504) family taken to be 1)
##
## Null deviance: 14.8918 on 5 degrees of freedom
## Residual deviance: 6.0278 on 4 degrees of freedom
## AIC: 80.63
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 996
## Std. Err.: 749
##
## 2 x log-likelihood: -74.63
performance::r2(mod1)## # R2 for Generalized Linear Regression
## Nagelkerke's R2: 0.842
Beta diversidade
We fit generalised linear models with beta distribution for beta diversity (betareg::betareg) to the data using the R program. We base the selection in the Akaike information criteria (AIC-based), selecting the most parsimonious models with delta AICc < 2.
AIC-based model selection
#### MODELOS BINOMIAL
mdb0 <- betareg(beta.sor.t ~ 1, data= sor1)
mdb1 <- betareg(beta.sor.t ~ area, data= sor1)
AICtab(mdb0, mdb1, base=T)## AIC dAIC df
## mdb1 -95.8 0.0 4
## mdb0 -67.1 28.7 2
MuMIn::AICc(mod0, mod1)## df AICc
## mod0 2 88.06361
## mod1 3 92.62973
par(mfrow=c(2,2))
plot(mdb1)Model results for beta:
The best fitted model was the one containing time and tree species (R2 = 0.89).
summary(mdb1)##
## Call:
## betareg(formula = beta.sor.t ~ area, data = sor1)
##
## Standardized weighted residuals 2:
## Min 1Q Median 3Q Max
## -2.1268 -1.0156 0.4182 0.8810 1.2260
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.29550 0.01553 -83.443 < 2e-16 ***
## area1 -0.19965 0.03250 -6.143 8.1e-10 ***
## area2 -0.33014 0.03360 -9.827 < 2e-16 ***
##
## Phi coefficients (precision model with identity link):
## Estimate Std. Error z value Pr(>|z|)
## (phi) 2730.6 996.9 2.739 0.00616 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 51.9 on 4 Df
## Pseudo R-squared: 0.8921
## Number of iterations: 1713 (BFGS) + 3 (Fisher scoring)
Variáveis abióticas
Variáveis abióticas também poderiam alterar a composição do microbioma. Fizemos alguns boxplots para explorar como estas variáveis estão mudando entre a área rural e a área de floresta, de acordo com o que nos foi enviado.
AB<-read.csv2("abioticas.csv", sep=";", fileEncoding = "Latin1", header=T, dec=".")
AB$Area<-as.factor(AB$Area)
levels(AB$Area)## [1] "Edge" "Forest" "Rural" "Urban"
head(AB)## Area Spot Arylsulfatase X..glucosidase Acid.Phosphatase Alkaline.Phosphatase
## 1 Urban 1 2.265181 3.121007 35.47741 7.889823
## 2 Urban 2 22.308969 20.948246 82.05325 13.600314
## 3 Urban 3 13.716082 5.310704 75.46843 12.200906
## 4 Urban 4 25.794935 11.454087 82.20383 11.513082
## 5 Urban 5 4.323747 6.116436 56.82712 7.254958
## 6 Urban 6 33.617979 7.830552 66.07508 21.418350
## Density Porosity Moisture Microporosity Macroporosity Particle.Density SOC
## 1 1.18 50.28 137.52 38.83 11.46 2.37 19.48
## 2 1.14 53.22 139.74 45.40 7.82 2.44 19.48
## 3 1.07 57.32 127.62 40.77 16.56 2.50 27.27
## 4 1.23 46.54 141.98 36.38 10.16 2.29 23.38
## 5 1.37 44.04 142.03 36.73 7.31 2.45 21.82
## 6 1.09 54.67 127.00 37.49 17.18 2.40 15.58
## pH P K N Ca Mg Al H.Al BS CEC SOM Sand Clay Silt
## 1 5.11 5.36 0.26 1.464045 5.87 1.57 0 5.89 7.70 7.70 33.58 25.0 60 15.0
## 2 5.37 4.42 0.41 1.548105 5.56 1.53 0 4.34 7.50 7.50 33.58 45.0 40 15.0
## 3 5.14 20.44 0.61 2.416725 6.82 1.34 0 7.59 8.77 8.77 47.02 10.0 60 30.0
## 4 5.09 14.77 0.61 2.248605 7.19 1.65 0 6.11 9.45 9.45 40.30 27.5 40 32.5
## 5 5.14 9.66 0.20 2.080485 6.75 2.05 0 6.30 9.00 9.00 37.61 15.0 40 45.0
## 6 5.47 4.74 0.26 1.702215 5.83 2.71 0 4.85 8.80 8.80 26.87 20.0 60 20.0
## Ba Cr Cu Fe Mn Na Ni Pb Zn
## 1 1701.7937 2.3323 9.3714 311.4072 105.4963 324.5041 0.0000 9.8453 11.2872
## 2 1533.6323 0.5898 26.2806 319.3578 427.1936 62.8692 1.0749 6.7686 11.1408
## 3 1338.5650 0.1877 21.1874 327.5233 276.3934 22.6249 2.2574 8.6146 26.3801
## 4 1695.0673 0.7238 16.5697 387.9052 241.4081 21.3530 0.6450 17.8446 20.0046
## 5 1661.4350 1.9301 19.7614 368.3509 291.0361 38.4098 0.4300 14.7679 16.4754
## 6 733.1839 1.2600 9.5751 224.5948 141.7591 39.6422 0.8600 21.5366 7.4651
str(AB)## 'data.frame': 48 obs. of 36 variables:
## $ Area : Factor w/ 4 levels "Edge","Forest",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Spot : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Arylsulfatase : num 2.27 22.31 13.72 25.79 4.32 ...
## $ X..glucosidase : num 3.12 20.95 5.31 11.45 6.12 ...
## $ Acid.Phosphatase : num 35.5 82.1 75.5 82.2 56.8 ...
## $ Alkaline.Phosphatase: num 7.89 13.6 12.2 11.51 7.25 ...
## $ Density : num 1.18 1.14 1.07 1.23 1.37 1.09 0.96 0.95 1.2 1.11 ...
## $ Porosity : num 50.3 53.2 57.3 46.5 44 ...
## $ Moisture : num 138 140 128 142 142 ...
## $ Microporosity : num 38.8 45.4 40.8 36.4 36.7 ...
## $ Macroporosity : num 11.46 7.82 16.56 10.16 7.31 ...
## $ Particle.Density : num 2.37 2.44 2.5 2.29 2.45 2.4 2.28 2.4 2.54 2.39 ...
## $ SOC : num 19.5 19.5 27.3 23.4 21.8 ...
## $ pH : num 5.11 5.37 5.14 5.09 5.14 5.47 3.81 3.89 4.02 3.96 ...
## $ P : num 5.36 4.42 20.44 14.77 9.66 ...
## $ K : num 0.26 0.41 0.61 0.61 0.2 0.26 0.28 0.41 0.08 0.1 ...
## $ N : num 1.46 1.55 2.42 2.25 2.08 ...
## $ Ca : num 5.87 5.56 6.82 7.19 6.75 5.83 1.94 1.68 0.79 1.26 ...
## $ Mg : num 1.57 1.53 1.34 1.65 2.05 2.71 0.41 0.54 0.18 0.38 ...
## $ Al : num 0 0 0 0 0 0 2.15 1.15 1.65 1.45 ...
## $ H.Al : num 5.89 4.34 7.59 6.11 6.3 ...
## $ BS : num 7.7 7.5 8.77 9.45 9 8.8 2.63 2.63 1.05 1.74 ...
## $ CEC : num 7.7 7.5 8.77 9.45 9 8.8 4.78 3.78 2.7 3.19 ...
## $ SOM : num 33.6 33.6 47 40.3 37.6 ...
## $ Sand : num 25 45 10 27.5 15 20 25 25 25 10 ...
## $ Clay : int 60 40 60 40 40 60 60 60 60 60 ...
## $ Silt : num 15 15 30 32.5 45 20 15 15 15 30 ...
## $ Ba : num 1702 1534 1339 1695 1661 ...
## $ Cr : num 2.332 0.59 0.188 0.724 1.93 ...
## $ Cu : num 9.37 26.28 21.19 16.57 19.76 ...
## $ Fe : num 311 319 328 388 368 ...
## $ Mn : num 105 427 276 241 291 ...
## $ Na : num 324.5 62.9 22.6 21.4 38.4 ...
## $ Ni : num 0 1.075 2.257 0.645 0.43 ...
## $ Pb : num 9.85 6.77 8.61 17.84 14.77 ...
## $ Zn : num 11.3 11.1 26.4 20 16.5 ...
ABi<-AB[AB$Area=="Rural"|AB$Area=="Forest",]
vari<-colnames(ABi)
head(ABi)## Area Spot Arylsulfatase X..glucosidase Acid.Phosphatase
## 13 Rural 1 4.954743 0.1215587 75.89307
## 14 Rural 2 10.216895 0.2399537 89.09107
## 15 Rural 3 13.128901 -0.1823805 88.30463
## 16 Rural 4 9.809631 0.0485416 68.86310
## 17 Rural 5 20.454745 0.4921459 85.60830
## 18 Rural 6 23.786942 0.1136793 93.07828
## Alkaline.Phosphatase Density Porosity Moisture Microporosity Macroporosity
## 13 5.134844 1.36 44.27 159.36 38.51 5.76
## 14 5.485450 1.35 45.74 157.91 37.61 8.12
## 15 1.980411 1.25 51.44 144.27 38.91 12.53
## 16 1.400418 1.32 48.26 151.95 36.30 11.96
## 17 11.822917 1.35 45.79 153.66 35.99 9.80
## 18 9.053077 1.32 50.80 159.02 31.92 18.88
## Particle.Density SOC pH P K N Ca Mg Al H.Al BS
## 13 2.44 19.48 4.44 6.73 0.41 1.295925 6.05 2.44 0.10 5.00 8.90
## 14 2.49 21.82 4.50 10.84 0.41 2.668905 5.32 1.55 0.10 6.99 7.28
## 15 2.57 24.93 4.62 4.74 0.10 1.954395 5.26 1.79 0.03 6.74 7.15
## 16 2.54 23.38 4.18 6.04 0.15 1.870335 1.61 0.52 2.00 12.87 2.28
## 17 2.48 23.38 4.37 5.05 0.82 2.949105 7.34 1.59 0.01 5.15 9.75
## 18 2.69 17.14 4.45 10.84 0.77 2.374695 5.44 1.09 0.25 7.93 7.30
## CEC SOM Sand Clay Silt Ba Cr Cu Fe Mn Na
## 13 9.00 33.58 10 50 40 1257.8475 6.3534 24.3791 385.1117 215.3166 32.8401
## 14 7.38 37.61 15 55 30 1197.3094 5.4151 25.9410 232.1157 369.8512 55.9169
## 15 7.18 42.99 15 50 35 1291.4798 5.0130 21.9344 574.8527 136.6980 37.3669
## 16 4.28 40.30 15 50 35 248.8789 7.1576 27.0955 201.6024 255.2155 49.3201
## 17 9.76 40.30 25 35 40 1520.1794 7.5597 30.6267 207.1893 557.5530 35.3919
## 18 7.55 29.55 30 45 25 941.7040 7.2917 45.1591 130.0467 590.0323 41.9649
## Ni Pb Zn
## 13 3.8698 19.6906 5.9689
## 14 1.1824 15.9986 6.7821
## 15 3.4398 27.0745 5.7086
## 16 2.1499 22.7672 5.7574
## 17 3.7623 21.5366 8.9777
## 18 2.5799 19.0752 11.0432
#Desta análise precisaremos retirar arylsulfatase, X..glicosidase, fosfatase básica e fosfatase ácida, pois são enzimas e creio que para este artigo não são pertinentes. Dá para tirar também msu, micro e macroporo, pois só com a porosidade já teremos uma boas descrição dos dados
ABi<-ABi[,-c(3:6,9:11)]
head(ABi)## Area Spot Density Porosity Particle.Density SOC pH P K N
## 13 Rural 1 1.36 44.27 2.44 19.48 4.44 6.73 0.41 1.295925
## 14 Rural 2 1.35 45.74 2.49 21.82 4.50 10.84 0.41 2.668905
## 15 Rural 3 1.25 51.44 2.57 24.93 4.62 4.74 0.10 1.954395
## 16 Rural 4 1.32 48.26 2.54 23.38 4.18 6.04 0.15 1.870335
## 17 Rural 5 1.35 45.79 2.48 23.38 4.37 5.05 0.82 2.949105
## 18 Rural 6 1.32 50.80 2.69 17.14 4.45 10.84 0.77 2.374695
## Ca Mg Al H.Al BS CEC SOM Sand Clay Silt Ba Cr Cu
## 13 6.05 2.44 0.10 5.00 8.90 9.00 33.58 10 50 40 1257.8475 6.3534 24.3791
## 14 5.32 1.55 0.10 6.99 7.28 7.38 37.61 15 55 30 1197.3094 5.4151 25.9410
## 15 5.26 1.79 0.03 6.74 7.15 7.18 42.99 15 50 35 1291.4798 5.0130 21.9344
## 16 1.61 0.52 2.00 12.87 2.28 4.28 40.30 15 50 35 248.8789 7.1576 27.0955
## 17 7.34 1.59 0.01 5.15 9.75 9.76 40.30 25 35 40 1520.1794 7.5597 30.6267
## 18 5.44 1.09 0.25 7.93 7.30 7.55 29.55 30 45 25 941.7040 7.2917 45.1591
## Fe Mn Na Ni Pb Zn
## 13 385.1117 215.3166 32.8401 3.8698 19.6906 5.9689
## 14 232.1157 369.8512 55.9169 1.1824 15.9986 6.7821
## 15 574.8527 136.6980 37.3669 3.4398 27.0745 5.7086
## 16 201.6024 255.2155 49.3201 2.1499 22.7672 5.7574
## 17 207.1893 557.5530 35.3919 3.7623 21.5366 8.9777
## 18 130.0467 590.0323 41.9649 2.5799 19.0752 11.0432
Das variáveis medidas, as que parecem ser diferentes entre as áreas são Arilsulfatase,Fosfatase Ácida, Densidade, Porosidade, MSU, Microporo, Potássio (k) e MO. Os outros parecem iguais.
Realizamos um PCA para combinar essas variáveis:
### Componentes principais
pca <- PCA(ABi[,-1], quali.sup = 1, graph=F)
p <- fviz_pca_biplot(pca,geom="point", col.ind = ABi[,1],
repel = TRUE,addEllipses = T)
pContudo, não conseguimos linkar os valores das amostras das variáveis abióticas à estrutura das comunidades de OTU, de maneira que não conseguimos adicionar estas variáveis no modelo.
No artigo, aparecem dados de: - C: CaCll - C: em cmolc dm-³ - N: em cmolc dm-³ - K: em cmolc dm-³ - Ca: em cmolc dm-³ - Mg: em cmolc dm-³ - Al: em cmolc dm-³ - H+Al: em cmolc dm-³ - V: em porcentagem - Clay: em porcentagem
Se não for possível vincular às amostras, ainda assim pode ser interessante mostrar alguns boxplots.
for(i in 3:ncol(ABi)){
#i=4
A<-ggplot(ABi)+
geom_boxplot(aes(y=ABi[,i], x=ABi[,1])) +
theme_cowplot()+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=0.5))
print(A)
print(vari[i])}## [1] "Arylsulfatase"
## [1] "X..glucosidase"
## [1] "Acid.Phosphatase"
## [1] "Alkaline.Phosphatase"
## [1] "Density"
## [1] "Porosity"
## [1] "Moisture"
## [1] "Microporosity"
## [1] "Macroporosity"
## [1] "Particle.Density"
## [1] "SOC"
## [1] "pH"
## [1] "P"
## [1] "K"
## [1] "N"
## [1] "Ca"
## [1] "Mg"
## [1] "Al"
## [1] "H.Al"
## [1] "BS"
## [1] "CEC"
## [1] "SOM"
## [1] "Sand"
## [1] "Clay"
## [1] "Silt"
## [1] "Ba"
## [1] "Cr"
Rarefação
#### Rarefaction curve
########### https://rpubs.com/brouwern/iNEXTvVEGAN
## Directory and arquive
setwd("D:/Dropbox/00_Unioeste/2 Pesquisa/3 Doutorado/Natália/Artigos_Projeto/Analises_Artigo3")
biotic=read.csv2("Pasta1.csv",h=T,sep =";", dec=".", row.names = 3)
names(biotic)## [1] "Rank" "TaxId" "F1" "F2" "F3" "R1" "R2" "R3"
## Subset
species=subset(biotic, biotic$Rank=="species")
names(species)## [1] "Rank" "TaxId" "F1" "F2" "F3" "R1" "R2" "R3"
sp=species[,-c(1:2)]
s=data.frame(t(sp))
#### Packages
library(iNEXT)
library(vegan)
#### All sampling
names(species)## [1] "Rank" "TaxId" "F1" "F2" "F3" "R1" "R2" "R3"
x <- list("Forest1" = species[,3], "Forest2"=species[,4], "Forest3"=species[,5], "Rural1"=species[,6], "Rural2"=species[,7], "Rural3"=species[,8])
str(x)## List of 6
## $ Forest1: int [1:4733] 0 0 0 0 0 1 0 2 3 0 ...
## $ Forest2: int [1:4733] 1 2 2 0 0 2 0 0 1 0 ...
## $ Forest3: int [1:4733] 0 0 1 1 0 1 0 2 0 0 ...
## $ Rural1 : int [1:4733] 0 0 0 0 0 0 0 0 0 0 ...
## $ Rural2 : int [1:4733] 0 0 2 0 0 2 0 3 2 5 ...
## $ Rural3 : int [1:4733] 0 0 3 2 1 1 1 6 0 1 ...
# call iNEXT on the list
out <- iNEXT(x, q=0, datatype="abundance", endpoint = 150000)
# Sample-size-based R/E curves, separating by "site""
ggiNEXT(out, type=1)+theme_bw()Como conclusão até o momento, com base nessas análises, a composição dos solos, não se diferencia em termos de índices de diversidade e são indicadas como similares pelo índice de similaridade. Porém, há diferença na riqueza. A ??diversidade indica que a área do fragmento e a área rural são diferentes, ainda que a diferença seja muito baixa. O mapa de calor que mostra como se diferença a estrutura da comunidade através da abundância das espécies por área. Por fim, a comparação dos dados abióticos indica diferenciação entre o fragmento e a área rural. Se for possível conectar as amostras de OTUs com as amostras de fatores abióticos, podemos investigar se existe algum impacto da variação nos fatores abióticos com a estrutura das OTUs
Some useful References
R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Bolker B, R Development Core Team (2022). bbmle: Tools for General Maximum Likelihood Estimation. R package version 1.0.25, https://CRAN.R-project.org/package=bbmle.
Barto?? K (2022). MuMIn: Multi-Model Inference. R package version 1.46.0, https://CRAN.R-project.org/package=MuMIn.