I’m interested in the effects of temperature on the behavior, ecology and physiology of amphibians. For one of my dissertation chapters, I’m comparing thermal optima for performance (the temperature at which an animal maximizes a performance metric, in this case energy assimilation) and critical thermal limits (the minimum and maximum temperatures at which an animal loses critical motor function) across the geographic range of a widespread species. My collection sites also contain 3 genetically distinct clades, so I can compare physiology among clades to determine whether evolutionary history constrains this species’ response to temperature.
Some variables of interest: 1) Thermal optimum for performance: quantitative, range = 0 - 30°C 2) Environmental temperature: quantitative, range depends on the chosen metric. I have daily ground temp data from 1980-2015 for each site. From this data, I have calculated mean daily max temp, mean daily min temp, mean daily temp, mean annual days below 5°C, mean annual days above 25° C, etc. 3) Clade: categorical, 3 clades 4) Site: categorical, 12 sites 5) Individual body metrics: mass (quantitative), snout-to-vent length (quantitative), tail length (quantitative), total length (quantitative), sex (categorical)
After estimating thermal optimum for performance from thermal performance curves (energy assimilation ~ temperature), I will model thermal optima as a response variable and environmental temp * clade as explanatory variables.
seq(2,100)
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [18] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## [35] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
## [52] 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
## [69] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
## [86] 87 88 89 90 91 92 93 94 95 96 97 98 99 100
c(2:100)
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [18] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## [35] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
## [52] 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
## [69] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
## [86] 87 88 89 90 91 92 93 94 95 96 97 98 99 100
cumsum(rep(1,100))[-1]
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [18] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## [35] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
## [52] 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
## [69] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
## [86] 87 88 89 90 91 92 93 94 95 96 97 98 99 100
set.seed(26)
dat <- rnorm(10)
(A <- sum(dat))
## [1] 0.5174843
(B <- sum(dat^2))
## [1] 7.999122
(C <- sum(dat)/mean(dat)/length(dat))
## [1] 1
(D <- sum(dat - mean(dat))^2)
## [1] 3.774823e-32
(E <- B - length(dat) * mean(dat)^2)
## [1] 7.972343
The solutions of B and E are always equal.
## Loading required package: plyr
## lowestGDP
## 1 Congo, Democratic Republic of
## 2 Liberia
## 3 Burundi
## 4 Zimbabwe
## 5 Eritrea
## 6 Central African Republic
## 7 Niger
## 8 Sierra Leone
## 9 Malawi
## 10 Togo
## highestGDP lowestbirth
## 1 Saint Pierre and Miquelon Hong Kong
## 2 San Marino Japan
## 3 Sint Maarten Germany
## 4 Somalia Andorra
## 5 South Georgia and the South Sandwich Islands Italy
## 6 South Sudan Macau
## 7 Tokelau Guernsey
## 8 Turks and Caicos Islands Austria
## 9 Tuvalu Bosnia and Herzegovina
## 10 Vatican City Lithuania
## highestbirth lowestlit
## 1 Palestinian territories Mali
## 2 Pitcairn Islands South Sudan
## 3 Saint Barth�lemy Burkina Faso
## 4 Saint Martin Niger
## 5 Sint Maarten Guinea
## 6 South Georgia and the South Sandwich Islands Chad
## 7 South Sudan Ethiopia
## 8 S�o Tom� and Pr�ncipe Sierra Leone
## 9 Tokelau Benin
## 10 Vatican City Senegal
C/D) 10 countries with the highest densities
## Country Population Area Water Density
## 1 Bahrain 1234596 758 0.00 1628.755
## 2 Bermuda 64566 54 0.00 1195.667
## 3 Dominica 9378818 751 0.72 12488.439
## 4 Gibraltar 29441 6 0.00 4906.833
## 5 Hong Kong 7097600 1104 4.53 6428.986
## 6 Macau 556800 30 0.00 18560.000
## 7 Malta 417608 316 0.00 1321.544
## 8 Monaco 35000 2 0.00 17500.000
## 9 Singapore 5076700 710 1.43 7150.282
## 10 Sint Maarten 37429 34 0.00 1100.853
10 countries with the lowest densities
## Country Population Area Water Density
## 1 Australia 22722835 7692024 0.76 2.95407750
## 2 Botswana 1800098 582000 2.58 3.09295189
## 3 Falkland Islands 3000 12173 0.00 0.24644705
## 4 Greenland 56452 2166086 NA 0.02606175
## 5 Iceland 318452 103000 2.67 3.09176699
## 6 Mongolia 2822900 1564100 0.68 1.80480788
## 7 Namibia 2088669 824268 0.12 2.53396832
## 8 Pitcairn Islands 50 47 0.00 1.06382979
## 9 Suriname 525000 163820 4.77 3.20473691
## 10 Western Sahara 531000 266000 0.00 1.99624060
Histogram of birth rates in Europe (red), Asia (purple), and Africa (green)
The density of birth rates are unimodal in Europe and Asia and bimodal in Africa (easier to visualize when Africa is plotted separately). Europe has a right-skewed distribution, Asia has a symmetric distribution, and Africa has a left-skewed distribution. Europe has a narrow spread; it’s birth rates do not exceed 20. Asia and Africa have greater spreads, with birth rates ranging from ~ 7 - 46 in Asian countries and ~ 12 - 50 in African countries.
Skew <- function(x) {
x2 <- x[!is.na(x)]
n <- length(x2)
mn <- sum(x2)/n
skew <- (sum((x2 - mn)^3)/n) / (sum((x2 - mn)^2)/n)^(3/2)
return(skew)
}
## [1] 1.870465
## [1] -1.417821
## [1] 0.7485703
GDP has the greatest skew, and it’s skewed to the right. Literacy has the 2nd greatest skew and is skewed to the left. Birth rate is the least skewed and is skewed to the right.
These assessments correspond to the visualization, show in the histograms above. The strong right skew of GDP makes logical sense, knowing that the majority of wealth in the world is held by relatively few countries and individuals. The strong left skew is kind of surprising. I would have expected it to be less skewed, with a larger variance of literacy among countries. Birth rates match my real-world predictions.
## Continent skew.GDP skew.literacy skew.birthrate
## 1 Africa 2.0451059 -0.4611914 -0.6254087
## 2 Asia 1.7775283 -1.3431745 0.4778668
## 3 Europe 1.2085067 -3.1042619 1.2721972
## 4 North America 1.6008635 -1.0741350 0.8174836
## 5 Oceania 1.3231944 -0.7961659 0.0174787
## 6 South America 0.1406809 0.1837546 0.9035542
## 7 <NA> NaN NaN NaN
Interpretation:
GDP - There’s a wide range in the strength of the skew for GDP, but all continents have a right skew. The difference is especially apparant between Africa, which has a very strong skew and South America, which has a very weak skew. This suggests that the majority of wealth in Africa belows to very few countries, while the wealth is distributed much more evenly among countries in South America.
Literacy - There is a left skew in literacy for all of the continents, except South America. This time, we see a stark difference between South America and Europe, where South America has a weak right skew and Europe has a very strong left skew. Almost all European countries fall into the top tier of literacy, while South American countries have a wide range of literacy.
Birth rate - There is a right skew in birth rate for all continents, except Africa. We looked into this pattern for Africa earlier (Q4).
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "log" is not a
## graphical parameter
getSummaryStatsByHand <- function(x,y, plotreg =TRUE, plotdiag = TRUE)
{n <- length(x)
x.bar <- sum(x)/n
y.bar <- sum(y)/n
x.sd <- sd(x)
y.sd <- sd(y)
r <- cor(x, y)
b <- r * (y.sd / x.sd)
a <- y.bar - b * x.bar
SS.total <- sum((y - y.bar)^2)
SS.residual <- sum((y - (a + b * x))^2)
SS.model <- SS.total - SS.residual
r2 <- SS.model / SS.total
if(plotreg)
# plot the regression
if(plotdiag)
# plot the diagnostics
return(list(x.bar = x.bar, y.bar = y.bar,
x.sd = x.sd, y.sd = y.sd,
r = r, a = a, b = b,
SS.total = SS.total, SS.model = SS.model, SS.residual =
SS.residual, r2 = r2))
}
## x.bar y.bar x.sd y.sd r a b
## [1,] 9 7.500909 3.316625 2.031568 0.8164205 3.000091 0.5000909
## [2,] 9 7.500909 3.316625 2.031657 0.8162365 3.000909 0.5000000
## [3,] 9 7.500000 3.316625 2.030424 0.8162867 3.002455 0.4997273
## [4,] 9 7.500909 3.316625 2.030579 0.8165214 3.001727 0.4999091
## SS.total SS.model SS.residual r2
## [1,] 41.27269 27.51000 13.76269 0.6665425
## [2,] 41.27629 27.50000 13.77629 0.6662420
## [3,] 41.22620 27.47001 13.75619 0.6663240
## [4,] 41.23249 27.49000 13.74249 0.6667073
There appears to be a linear relationship between x and y in the 1st dataset (red), a quadratic relationship between x and y in the 2nd (blue) and 3rd (orange) datasets, and no relationship between x and y in the 4th (purple) dataset. When plotting all 4 datasets together using only linear models, the regression lines are equal. Although the variables in these datasets actually have very different relationships, the solutions for regression coefficients are equal. However, t is not appropriate to use linear models for all 4 datasets given the nature of the relationships between x and y.
It’s clear that the quartet datasets have similar summary stats, but appear different from one another on a graph. After Googling it, I found that the quartet is meant to illustrate the importance of graphical interpretation/data visualization prior to analyses.
Bonus - I accidentally plotted the curves before the regression lines, so the plot is included above.
I wrote a function, DeMereA, to generate n number of random rolls of 6-sided dice. It returns TRUE when at least one 6 is rolled and FALSE when 6 is not rolled.
I used the replicate function to simulate 1000 rolls of de Mere’s game, which rolls 4 dice at a time. I then used the length function to count the number of times the function returned “TRUE”, aka the number of victories.
DeMereA <- function(n) {
roll <- sample(1:6, size = n, replace = TRUE)
if(6 %in% roll){
return(TRUE)}
else{return(FALSE)}}
roll.1000 <- replicate(1000, DeMereA(4))
table(roll.1000)
## roll.1000
## FALSE TRUE
## 503 497
I wrote a function, DeMereB, to update the simulation from above to include 24 dice/game. I also changed the rules so that at least 1 in 24 die must roll double 6’s for a victory.
I simulated this game 1000 times.
DeMereB <- function(n) {
die1 <- replicate(24, sample(1:6, size = n, replace = TRUE))
die2 <- replicate(24, sample(1:6, size = n, replace = TRUE))
if(die1 == 6 & die2 == 6){
return(TRUE)}
else{return(FALSE)}}
rollB.1000 <- replicate(1000, DeMereB(1))
table(rollB.1000)
## rollB.1000
## FALSE TRUE
## 965 35
## [1] "Outcome of die roll at Mr. Abel's"
## x.a
## 5
## 1
## [1] "How much candy did the trick or treater get at Mrs. Bernoulli's?"
## x.b
## H T
## 3 3
## [1] 3
## [1] "What's the probability of getting all 6 pieces of candy at Mrs. Bernoulli's?"
## [1] 0.015625
## [1] "How much candy does the trick or treater get at Dr. Cauchy's?"
## x.c
## Club Heart Spade
## 1 1 2
## [1] 3
## [1] "Number of candies that can be expected to be obtained from Abel"
## [1] "3.5 +- 3.5"
## [1] "Number of candies that can be expected to be obtained from Bernoulli"
## [1] "3 +- 1.5"
## [1] "Number of candies that can be expected to be obtained from Cauchy"
## [1] "3 +- 0.75"
Trick or Treaters can expect the most candy from Mr. Abel, and Dr. Cauchy will provide the most consistent amount of candy.