Project 1 (Frogs)

Author

Wesley Samimi

Introduction

This project uses the data set to answer the question, “What affects the size of a frog’s egg?” The data come from a 2013 study by W. Chen, Z. H. Tang, X. G. Fan, Y. Wang, and D. A. Pike of various frog populations living 2035 to 3494 m above sea level on the eastern Tibetan Plateau. The data they collected include the altitude and latitude of the frogs they studied, the body length in cm of the mother frog that laid the egg clutch, the number of eggs in a clutch (clutch size), the volume of the egg clutch in mm^3, and the average diameter of an egg in mm. Several of these variables refer to an egg clutch, which is a group of eggs laid by reptiles, amphibians, etc. during a single nesting period.

Load the Libraries & Data

# tidyverse attaches dplyr, ggplot2, and readr (for read_csv),
# so separate library() calls for those packages are not needed
library(tidyverse)
setwd("C:/Users/wesle/Downloads/Data 101")
frogsds <- read_csv("frog - frog.csv")
head(frogsds)
# A tibble: 6 × 6
  altitude latitude clutch.size body.size clutch.volume egg.size
     <dbl>    <dbl>       <dbl>     <dbl>         <dbl>    <dbl>
1     3462     34.8        182.      3.63          178.     1.95
2     3462     34.8        269.      3.63          257.     1.95
3     3462     34.8        158.      3.72          151.     1.95
4     3462     34.8        234.      3.80          224.     1.95
5     3462     34.8        245.      3.89          234.     1.95
6     3462     34.8        302.      3.89          288.     1.95

Data Structure & Checking for NAs

str(frogsds)
spc_tbl_ [431 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ altitude     : num [1:431] 3462 3462 3462 3462 3462 ...
 $ latitude     : num [1:431] 34.8 34.8 34.8 34.8 34.8 ...
 $ clutch.size  : num [1:431] 182 269 158 234 245 ...
 $ body.size    : num [1:431] 3.63 3.63 3.72 3.8 3.89 ...
 $ clutch.volume: num [1:431] 178 257 151 224 234 ...
 $ egg.size     : num [1:431] 1.95 1.95 1.95 1.95 1.95 ...
 - attr(*, "spec")=
  .. cols(
  ..   altitude = col_number(),
  ..   latitude = col_double(),
  ..   clutch.size = col_double(),
  ..   body.size = col_double(),
  ..   clutch.volume = col_double(),
  ..   egg.size = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
colSums(is.na(frogsds))
     altitude      latitude   clutch.size     body.size clutch.volume 
            0             0             0           302             0 
     egg.size 
            0 
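
The raw NA counts are easier to judge as a share of all rows. The following is a minimal sketch of one way to get that share, assuming a small made-up data frame (`toy`, hypothetical) in place of `frogsds`; the final line uses the actual counts reported above (302 NAs out of 431 rows).

```r
# Hypothetical toy data frame standing in for frogsds (values are made up)
toy <- data.frame(
  body.size = c(3.63, NA, NA, 3.80, NA),
  egg.size  = c(1.95, 1.95, 2.09, 2.29, 2.63)
)

# is.na() gives a TRUE/FALSE matrix; the column means are the NA shares
na_share <- colMeans(is.na(toy))
print(na_share)

# Share of missing body.size values in the real data, from the counts above
body_size_na_share <- 302 / 431
print(round(body_size_na_share, 3))
```

On the real data, `colMeans(is.na(frogsds))` would report the same kind of share for every column at once.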

Cleaning Data

frogsds1 <- frogsds |>
  select(altitude, latitude, clutch.size, clutch.volume, egg.size)

names(frogsds1) <- gsub("\\.","_",names(frogsds1))

Summary Statistics

summary(frogsds1)
    altitude       latitude      clutch_size     clutch_volume   
 Min.   :2035   Min.   :32.78   Min.   : 158.5   Min.   : 151.4  
 1st Qu.:3189   1st Qu.:34.30   1st Qu.: 549.5   1st Qu.: 609.6  
 Median :3462   Median :34.30   Median : 707.9   Median : 831.8  
 Mean   :3276   Mean   :34.35   Mean   : 721.3   Mean   : 882.5  
 3rd Qu.:3493   3rd Qu.:34.82   3rd Qu.: 851.1   3rd Qu.:1096.5  
 Max.   :3493   Max.   :34.96   Max.   :1698.2   Max.   :2630.3  
    egg_size    
 Min.   :1.622  
 1st Qu.:1.950  
 Median :2.089  
 Mean   :2.114  
 3rd Qu.:2.291  
 Max.   :2.630  

Correlation

cor(frogsds1)
                 altitude   latitude clutch_size clutch_volume    egg_size
altitude       1.00000000  0.6488412 -0.02723525     0.0401179  0.05375085
latitude       0.64884122  1.0000000 -0.17055254    -0.2334410 -0.22320508
clutch_size   -0.02723525 -0.1705525  1.00000000     0.8077344  0.11812489
clutch_volume  0.04011790 -0.2334410  0.80773441     1.0000000  0.64626047
egg_size       0.05375085 -0.2232051  0.11812489     0.6462605  1.00000000

As shown in the “Data Structure & Checking for NAs” chunk, body.size had 302 NA values in a data set with only 431 observations. I chose to remove the column because the majority of its values are NA, so it wouldn’t prove useful. The correlation matrix above shows that clutch volume has the highest correlation with egg size, at 0.6463. This is a positive correlation, so as clutch volume increases, egg size should increase as well. But because the correlation is only 0.6463, I chose to plot those two variables to see what the relationship looks like.
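
To put the size of these correlations in perspective, the sketch below squares each one. For a linear fit with a single predictor, the squared Pearson correlation equals the share of variance explained (R²). The values are copied from the egg_size row of the matrix above; the object names `r_egg`, `ranked`, and `r_squared` are only illustrative.

```r
# Correlations with egg_size, copied from the egg_size row of the matrix above
r_egg <- c(
  altitude      =  0.05375085,
  latitude      = -0.22320508,
  clutch_size   =  0.11812489,
  clutch_volume =  0.64626047
)

# Rank the variables by the strength (absolute value) of the correlation
ranked <- r_egg[order(abs(r_egg), decreasing = TRUE)]

# Squared correlation: share of egg_size variance a one-predictor
# linear fit would account for
r_squared <- round(ranked^2, 3)
print(r_squared)
```

Read this way, a correlation of 0.6463 corresponds to a little under half of the variation in egg size, which is consistent with checking the relationship visually before relying on it.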

Visualization/Plot

frogsds1 |>
  ggplot(aes(x = clutch_volume, y = egg_size)) +
  geom_point() +
  xlab("Clutch Volume") +
  ylab("Egg Size")

Conclusion

The plot above seems to be somewhat linear: as clutch volume increases, egg size also increases. However, because many different clutch volumes occur at any given egg size, clutch volume is only an okay predictor of the size of an egg. Since egg size in the data set is the average diameter of a single egg, while clutch volume is the volume of a whole clutch that can contain hundreds of eggs, their correlation is lower than one might expect from the close relationship between diameter and volume. In the correlation matrix before the plot, all of the correlation coefficients with egg size, other than the one for clutch volume, were close to 0, with the furthest from zero being -0.2232. This suggests that the data collected in the 2013 study wouldn’t be effective for predicting the egg sizes of frogs. In addition, because body size had 302 NA values out of only 431 total observations, it wasn’t really possible to check its correlation with egg size. Overall, this data set cannot be used to predict the egg size of a frog. For future data sets to be able to do that, including different kinds of variables and having significantly fewer NA values in a column would be useful.
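
Since 431 - 302 = 129 rows do have a body size, one possible follow-up is a correlation computed on those complete rows only. The sketch below shows the idea on made-up vectors (`body_size` and `egg_size` here are hypothetical toy values, not the study data), using the `use = "complete.obs"` argument of `cor()`. Whether 129 rows would be representative of all the populations is not something this analysis can settle.

```r
# Hypothetical toy values; on the real data these would be
# frogsds$body.size and frogsds$egg.size
body_size <- c(3.6, NA, 3.8, NA, 4.0, 4.2, NA, 4.4)
egg_size  <- c(1.9, 2.0, 2.0, 2.1, 2.2, 2.1, 2.3, 2.4)

# Number of rows where both values are present
n_complete <- sum(complete.cases(body_size, egg_size))

# Pearson correlation using only the rows without missing values
r_complete <- cor(body_size, egg_size, use = "complete.obs")

print(n_complete)
print(r_complete)
```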

Source(s):

Chen, W., et al. Maternal investment increases with altitude in a frog on the Tibetan Plateau. Journal of Evolutionary Biology 26.12 (2013): 2710-2715. Data accessible from Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.6v0c1

Link to Dataset: https://www.openintro.org/data/index.php?data=frog