library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
snakes <- readr::read_csv('snakedb_2024-05-30_tax__genus.csv')
## Rows: 546 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): family, genus, shouldnotseeme
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
snakes_clean <- snakes %>%
  select(family, genus)
Works cited: main page https://snakedb.org/pages/about-snakedb.php
direct dataset: https://snakedb.org/downloads/snakedb_2024-05-30_tax__genus.csv
# Snake Genera Diversity Across Families
Question: Which snake family has the most genera, and how does the diversity of genera differ across families?
Introduction: The data set I chose lists the genera that belong to the different snake families. A genus (plural: genera) is a taxonomic group that contains one or more closely related species. To answer my question, we need two columns from the data set: genus and family. With these two columns, I can compare the number of genera that belong to each snake family, determine which family has the most genus diversity, and see how the other families compare.
# Data exploration and cleaning the data
dim(snakes)
## [1] 546 3
colnames(snakes)
## [1] "family" "genus" "shouldnotseeme"
sum(is.na(snakes))
## [1] 1
snakes %>%
  select(family, genus)
## # A tibble: 546 × 2
## family genus
## <chr> <chr>
## 1 Elapidae Acanthophis
## 2 Xenodermidae Achalinus
## 3 Boidae Acrantophis
## 4 Acrochordidae Acrochordus
## 5 Typhlopidae Acutotyphlops
## 6 Colubridae Adelphicos
## 7 Colubridae Adelphostigma
## 8 Colubridae Aeluroglena
## 9 Colubridae Afronatrix
## 10 Typhlopidae Afrotyphlops
## # ℹ 536 more rows
head(snakes)
## # A tibble: 6 × 3
## family genus shouldnotseeme
## <chr> <chr> <chr>
## 1 Elapidae Acanthophis RDB 2023-09-30
## 2 Xenodermidae Achalinus RDB 2023-09-30
## 3 Boidae Acrantophis RDB 2023-09-30
## 4 Acrochordidae Acrochordus RDB 2023-09-30
## 5 Typhlopidae Acutotyphlops RDB 2023-09-30
## 6 Colubridae Adelphicos RDB 2023-09-30
Explanation: In this chunk, I explored the data and cleaned it up. First, I looked at the dimensions of the data set with dim, which tells me it has 546 rows and 3 columns. We then use colnames to see the names of our variables: “family”, “genus” and “shouldnotseeme”. Next, we use sum(is.na(snakes)) to check for missing values. The result is 1, so the data set contains one missing value; this check gives only the total, not the row or column where it occurs. Then we pass snakes through the pipe operator to select, keeping only family and genus. The pipe makes our code easier to read, and select makes sure we work with only the two columns we need instead of the original three. Lastly, to make sure the data set loaded correctly, we use the function head to check the first 6 rows.
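Because sum(is.na(snakes)) reports a single total for the whole table, a per-column count can show where a missing value sits. The sketch below is only an illustration: `toy` is a small made-up stand-in for `snakes` (its values, including the position of the NA, are assumptions and not taken from the real file).

```r
library(dplyr)

# Hypothetical stand-in for `snakes`; the NA placement is invented for illustration.
toy <- tibble(
  family = c("Elapidae", "Boidae", NA, "Colubridae"),
  genus = c("Acanthophis", "Acrantophis", "Achalinus", "Adelphicos"),
  shouldnotseeme = c("RDB", "RDB", "RDB", "RDB")
)

# Count missing values in each column instead of one overall total
na_per_column <- colSums(is.na(toy))
na_per_column

# Show the rows that contain at least one missing value
rows_with_na <- toy[!complete.cases(toy), ]
rows_with_na
```

Running the same two calls on `snakes` would show whether the missing value falls in family or genus (and so affects the counts) or only in the unused third column.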
# Data summarization
snakes_clean2 <- snakes_clean |>
  group_by(family) |>
  summarize(count = n())
Explanation: After cleaning our data set, we have the updated set called “snakes_clean”, which has only the two columns family and genus. Now we want to create a data frame that summarizes how many genera belong to each family, which we call snakes_clean2. The first line of the chunk assigns the summarized result to this new data frame. We then use group_by to group the rows by family, and summarize with n() to count the rows in each group. As long as no genus is listed twice, this row count is the number of genera in each family. The result has two columns, family and count.
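The row count equals the genus count only if every genus appears once per family. One way to check that assumption is to compute n() and n_distinct(genus) side by side. This is a minimal sketch on a made-up table (`toy_clean` and its six rows are hypothetical, not the real data):

```r
library(dplyr)

# Hypothetical miniature version of `snakes_clean`
toy_clean <- tibble(
  family = c("Boidae", "Boidae", "Colubridae", "Colubridae", "Colubridae", "Aniliidae"),
  genus = c("Acrantophis", "Boa", "Adelphicos", "Afronatrix", "Aeluroglena", "Anilius")
)

genera_per_family <- toy_clean %>%
  group_by(family) %>%
  summarize(
    rows = n(),                  # number of rows in each family
    genera = n_distinct(genus)   # number of unique genus names in each family
  ) %>%
  arrange(desc(genera))          # largest families first

genera_per_family
```

If the rows and genera columns agree for every family, counting rows with n() is a safe way to count genera.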
snakes_clean2
## # A tibble: 32 × 2
## family count
## <chr> <int>
## 1 Acrochordidae 1
## 2 Aniliidae 1
## 3 Anomalepididae 4
## 4 Anomochilidae 1
## 5 Atractaspididae 11
## 6 Boidae 14
## 7 Bolyeriidae 2
## 8 Colubridae 262
## 9 Cyclocoridae 5
## 10 Cylindrophiidae 1
## # ℹ 22 more rows
maxrow <- which.max(snakes_clean2$count)
snakes_clean2[maxrow,]
## # A tibble: 1 × 2
## family count
## <chr> <int>
## 1 Colubridae 262
Explanation: I print our new data frame, “snakes_clean2”, at the top of the chunk to check what it contains. Then we use the which.max function on the count column to find the row number with the largest value, and save it as maxrow. Lastly, we index the data frame with brackets, using maxrow to select that row, which shows us the family with the most genera. The result is Colubridae, the family with the greatest genus diversity.
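One detail of which.max is that it returns only the first position of the maximum, so a tie for first place would go unnoticed. The sketch below compares it with a filter on the maximum, using a made-up table (`toy_counts` and its family names are hypothetical) in which two families tie:

```r
library(dplyr)

# Hypothetical counts with a deliberate tie between families B and C
toy_counts <- tibble(
  family = c("A", "B", "C"),
  count = c(5L, 9L, 9L)
)

# which.max() gives the index of the first maximum only
first_max <- toy_counts[which.max(toy_counts$count), ]
first_max

# Keeping every row equal to the maximum also reveals ties
all_max <- toy_counts %>% filter(count == max(count))
all_max

# Sorting shows the full ranking, not just the top family
ranked <- toy_counts %>% arrange(desc(count))
ranked
```

In the real data there is no tie at the top, so both approaches would return the same single row; the sorted table is mainly useful for comparing the other families.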
Conclusion: After cleaning and analyzing the data set, we found that the family with the most genera is Colubridae, with 262 of the 546 genera in the data set. I found this by grouping the rows by family and counting how many genera each family has. As for how diversity differs across the snake families, we see families like “Aniliidae” that have only one genus in the entire family. On the other hand, we also see families like “Elapidae” that have a total of 55 genera. Overall, this shows that genus diversity varies widely across these families. Ultimately, Colubridae has by far the most genera, which may suggest that it has adapted especially well to its environments compared to the others. With the help of this data set, we were able to find which family has the most genera and how the other families compare. Researchers can use this data to prioritize ways to conserve these families and to understand and further study the patterns of these snakes.
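To put these counts on a common scale, each can be expressed as a share of all 546 rows. This sketch uses only the three counts quoted in the conclusion and assumes each row of the data set is one genus; the variable names are illustrative.

```r
# Counts quoted in the text; 546 is the number of rows in the data set
total_genera <- 546
counts <- c(Colubridae = 262, Elapidae = 55, Aniliidae = 1)

# Percentage of all genera that belong to each family, to one decimal place
share_pct <- round(100 * counts / total_genera, 1)
share_pct
```

By this measure Colubridae alone accounts for close to half of all genera in the data set, while Elapidae holds about a tenth.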
A potential avenue for future work is for researchers to ask why this family is thriving in such large numbers. They can research environmental factors or evolutionary history in more depth. This could help us understand genus diversity in other families and find more ways to protect these snakes overall!