Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
library(performance)
library(rsample)
library(maps)
Attaching package: 'maps'
The following object is masked from 'package:purrr':
map
library(sjPlot)
library(sjmisc)
Attaching package: 'sjmisc'
The following object is masked from 'package:purrr':
is_empty
The following object is masked from 'package:tidyr':
replace_na
The following object is masked from 'package:tibble':
add_case
library(sjlabelled)
Attaching package: 'sjlabelled'
The following object is masked from 'package:forcats':
as_factor
The following object is masked from 'package:dplyr':
as_label
The following object is masked from 'package:ggplot2':
as_label
sparrow_m <- read_csv("birds 11.38.48 AM.csv")
New names:
• `` -> `...1`
Rows: 6092602 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): species
dbl  (7): ...1, day, month, year, decimalLatitude, decimalLongitude, count
dttm (1): eventDate
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# total counts by year and species
sparrows_m5 <- sparrows_d %>%
  group_by(time_year, species) %>%
  dplyr::summarize(sum_count = sum(count), .groups = 'drop') %>%
  as.data.frame()
head(sparrows_m5)
# Passer domesticus: mean count per year, full record (p0) and 2000-2020 only (p1)
p0 <- ggplot(data = sparrows_passer, aes(y = mean_count, x = time_year)) +
  geom_line() +
  labs(title = "Distribution of Passer domesticus over time", x = "time", y = "mean count") +
  theme_classic()
p1 <- ggplot(data = sparrows_passer2, aes(y = mean_count, x = time_year)) +
  geom_line() +
  labs(title = "Distribution of Passer domesticus from 2000-2020", x = "time", y = "mean count") +
  theme_bw()
# side-by-side layout
p0 + p1
# Spizella pusilla: mean count per year, full record (p2) and 2000-2020 only (p3)
p2 <- ggplot(data = sparrows_spizella, aes(y = mean_count, x = time_year)) +
  geom_line() +
  labs(title = "Distribution of Spizella pusilla over time", x = "time", y = "mean count") +
  theme_classic()
p3 <- ggplot(data = sparrows_spizella2, aes(y = mean_count, x = time_year)) +
  geom_line() +
  labs(title = "Distribution of Spizella pusilla from 2000-2020", x = "time", y = "mean count") +
  theme_classic()
# side-by-side layout
p2 + p3
Mean counts of the bird species generally increased from 1900 to 1955. A similar increase appears between 2005 and 2012 (very clear in the second plot). I think this would be an interesting trend to investigate and form hypotheses about later on.
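# total counts per year for both species
p5 <- ggplot(data = all_birds, aes(y = sum_count, x = time_year, color = species)) +
  geom_point() +
  geom_line() +
  labs(title = "Distribution of Sparrows from 2000-2020", x = "Year", y = "Total # birds") +
  # apply the complete theme first so the text-size settings below are not overwritten
  theme_classic() +
  theme(axis.text.x = element_text(angle = 0, size = 15, vjust = 1),
        axis.title.y = element_text(size = 16),
        axis.title.x = element_text(size = 16))
# note: the error bars are centred on mean_count, while the points and lines show sum_count
p5 + geom_errorbar(aes(ymin = mean_count - se, ymax = mean_count + se), width = .2, position = position_dodge(.9))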
The number of sparrows tracked generally increased over the two decades from 2000 to 2020. Counts of the invasive species (Passer domesticus) were higher than those of Spizella pusilla, which makes sense. However, this graph is not sufficient on its own, and I am not sure why the error bars look similar for both species.
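One possible reason for the doubt about the error bars: the plot draws points at sum_count but error bars around mean_count ± se, and it is not shown how se was computed. A minimal sketch of computing a mean and standard error separately for each year and species is given below. The data frame `toy` and the helper `se_mean` are hypothetical, made up for illustration, and are not taken from the analysis above.

```r
# Hypothetical toy data: three observations per species per year.
toy <- data.frame(
  time_year = rep(c(2000, 2001), each = 6),
  species   = rep(rep(c("Passer domesticus", "Spizella pusilla"), each = 3), times = 2),
  count     = c(10, 12, 14, 2, 3, 4, 20, 22, 24, 5, 5, 5)
)

# Standard error of the mean: sd / sqrt(n).
se_mean <- function(x) sd(x) / sqrt(length(x))

# One row per year and species; mean and SE use only that group's observations.
group_stats <- aggregate(
  count ~ time_year + species, data = toy,
  FUN = function(x) c(mean = mean(x), se = se_mean(x))
)
# aggregate() stores the two statistics in a matrix column; flatten it.
group_stats <- do.call(data.frame, group_stats)
names(group_stats) <- c("time_year", "species", "mean_count", "se")
group_stats
```

If se is instead computed once over the whole data set, every group receives the same value, which would make the error bars look alike for both species. Whether that happened here cannot be told from the code shown.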
# total counts by latitude for each species
p6 <- ggplot(data = sparrows_m6, aes(y = sum_count1, x = lat, color = species)) +
  geom_point() +
  labs(title = "Distribution of Sparrows in space", x = "Latitude range", y = "Total # birds") +
  # apply the complete theme first so the text-size settings below are not overwritten
  theme_classic() +
  theme(axis.text.x = element_text(angle = 0, size = 15, vjust = 1),
        axis.title.y = element_text(size = 16),
        axis.title.x = element_text(size = 16))
# density of records over latitude for each species
p7 <- ggplot(data = sparrows_m6, mapping = aes(x = lat, color = species)) +
  geom_density() +
  labs(title = "Distribution of Sparrows in space", x = "Latitude range") +
  theme_classic()
# stack the two plots vertically
grid.arrange(p6, p7, ncol = 1)
Counts of Passer domesticus are generally higher than those of Spizella pusilla across all latitude ranges (both plots). The first graph, however, seems to show this better than the density plot.
This would make sense, since we hypothesized that Passer domesticus is the invasive species and would therefore out-compete Spizella pusilla over time, but I am not sure the counts used are meaningful.
DATA VISUALIZATION HYPOTHESIS 2
# counts of species by year and species
# then abundance by ratios
sparrows_m5_2 <- sparrows_d %>%
  filter(time_year > 1999, time_year < 2021) %>%
  group_by(time_year, species) %>%
  dplyr::summarize(sum_count = sum(count), .groups = 'drop') %>%
  as.data.frame()
head(sparrows_m5_2)
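The plotting and modelling code that follows uses a log_counts column that the summary above does not create, so that step is missing from what is shown. A minimal sketch of one way to derive it is given below, assuming log_counts is the natural log of sum_count. That is an assumption, as the original transformation is not shown. The data frame `toy_m5_2` is hypothetical toy data with the same columns as sparrows_m5_2.

```r
# Hypothetical toy summary with the same columns as sparrows_m5_2.
toy_m5_2 <- data.frame(
  time_year = c(2000, 2000, 2001, 2001),
  species   = c("Passer domesticus", "Spizella pusilla",
                "Passer domesticus", "Spizella pusilla"),
  sum_count = c(1000, 10, 2000, 40)
)

# Assumption: log_counts is the natural log of the yearly total.
# log1p(sum_count) would be a safer choice if any total could be zero.
toy_m5_2$log_counts <- log(toy_m5_2$sum_count)

# A difference in logs is the log of a ratio, so subtracting the two species'
# log counts within a year gives the log abundance ratio for that year.
passer    <- toy_m5_2[toy_m5_2$species == "Passer domesticus", ]
spizella  <- toy_m5_2[toy_m5_2$species == "Spizella pusilla", ]
log_ratio <- passer$log_counts - spizella$log_counts

# Back-transform to the abundance ratio (Passer / Spizella) per year.
exp(log_ratio)
```

On the log scale, a constant vertical gap between the two species' lines corresponds to a constant abundance ratio, which is one reason to plot log counts when comparing species with very different totals.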
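# log abundance per year for both species
# assumes sparrows_m5_2 already has a log_counts column (the step that creates it is not shown)
p5 <- ggplot(data = sparrows_m5_2, aes(y = log_counts, x = time_year, color = species)) +
  geom_point() +
  geom_line() +
  labs(title = "Distribution of Sparrows from 2000-2020", x = "Year", y = "log_abundance of birds") +
  # apply the complete theme first so the text-size settings below are not overwritten
  theme_classic() +
  theme(axis.text.x = element_text(angle = 0, size = 15, vjust = 1),
        axis.title.y = element_text(size = 16),
        axis.title.x = element_text(size = 16))
p5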
There is a clear difference in log abundance between the two species: Passer domesticus was higher than Spizella pusilla in every year from 2000 to 2020.
set.seed(400) # any number is fine
# bootstrap percentile intervals for the regression coefficients
sparrows_intervals <- reg_intervals(log_counts ~ species + time_year, data = sparrows_m5_2, type = 'percentile', keep_reps = FALSE)
sparrows_intervals
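To make it clearer what a percentile bootstrap interval for a regression coefficient is, here is a rough base-R sketch of the idea. It is not rsample's implementation, and the simulated data frame `toy` with its coefficients is hypothetical. Rows are resampled with replacement, the model is refit each time, and the 2.5% and 97.5% quantiles of each coefficient across refits form the interval.

```r
set.seed(400)

# Hypothetical simulated data: 21 years for each of two species.
toy <- data.frame(
  time_year = rep(2000:2020, times = 2),
  species   = rep(c("Passer domesticus", "Spizella pusilla"), each = 21)
)
toy$log_counts <- 5 + 0.05 * (toy$time_year - 2000) -
  2 * (toy$species == "Spizella pusilla") +
  rnorm(nrow(toy), sd = 0.3)

# Refit the model on 1000 resamples of the rows (drawn with replacement).
boot_coefs <- replicate(1000, {
  idx <- sample(nrow(toy), replace = TRUE)
  coef(lm(log_counts ~ species + time_year, data = toy[idx, ]))
})

# boot_coefs has one row per model term and one column per resample.
# The percentile interval is the 2.5% and 97.5% quantile of each row.
intervals <- t(apply(boot_coefs, 1, quantile, probs = c(0.025, 0.975)))
intervals
```

An interval for the species term that excludes zero suggests a difference in log abundance between the species after accounting for year. The same reading applies to the species row of sparrows_intervals.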