You are NOT required to read these!
But these books are seriously good.
Even the rigorous peer-review system might miss some minor flaws.
\(\rightarrow\) No big deal so long as you offer corrections to your flawed work.
\(\rightarrow\) Knowingly fraudulent practices can cost you your career, discredit your institution and your field of research, and even seriously impede the careers of unknowing co-workers.
Lack of statistical knowledge
\(\rightarrow\) Methods can “break”
\(\rightarrow\) Incorrect conclusions
Lack of biological knowledge
\(\rightarrow\) p-hacking
\(\rightarrow\) Waste of time
Population describes the sum total of all values of a variable given a certain research question. This includes non-measured data.
Sample describes the sum total of all values of a variable for any given analysis. This can only include measured data.
In an experimental set-up, you rear an ant colony of exactly 10,000 individuals. You are interested in the average mandible strength of ants within the colony.
You cannot possibly take measurements of all 10,000 individuals.
Take measurements on a sample (e.g. 1,000 individuals) from within the population (10,000 individuals).
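The ant colony example can be sketched in R. The mandible strengths below are simulated with rnorm() purely for illustration: the mean of 50 and the standard deviation of 5 are made-up values, and in a real experiment the values of the unmeasured individuals would be unknown.

```r
# Making it reproducible
set.seed(42)
# Hypothetical population: simulated mandible strength of all 10,000 ants
# (mean = 50 and sd = 5 are assumptions made for this sketch)
colony <- rnorm(n = 10000, mean = 50, sd = 5)
# Random sample of 1,000 individuals, drawn without replacement
measured <- sample(colony, size = 1000, replace = FALSE)
# Population mean (normally unknown) versus sample mean (our estimate)
mean(colony)
mean(measured)
```

The two means should be close but not identical, which is the price of measuring only part of the colony.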
This differentiation only applies when building and validating models, which we won’t cover in these seminars. Training data describes the subset of the total data which is used to build the model. Test data describes the subset of the total data which is used to assess the performance of the model.
You have identified a way to model how mandible strength and ant size are interconnected but don’t know how to assess the quality of your model (a model will usually fit the data it was built on better than it fits new data).
Split the available data into two non-overlapping subsets (training data and test data) and use these separately to build your model and assess its performance.
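A minimal sketch of such a split, assuming simulated data and a simple linear model. The data frame `ants`, the 700/300 split and the choice of lm() are all assumptions made for illustration, not a method prescribed in these seminars.

```r
# Making it reproducible
set.seed(42)
# Hypothetical data: simulated size and mandible strength of 1,000 ants
ants <- data.frame(size = rnorm(1000, mean = 10, sd = 2))
ants$strength <- 5 * ants$size + rnorm(1000, mean = 0, sd = 3)
# Randomly pick 700 rows for building the model (split ratio is an assumption)
train_rows <- sample(nrow(ants), size = 700, replace = FALSE)
train <- ants[train_rows, ]
test <- ants[-train_rows, ]
# Build the model on the first subset only
model <- lm(strength ~ size, data = train)
# Assess its performance on the second subset (root mean squared error)
pred <- predict(model, newdata = test)
sqrt(mean((test$strength - pred)^2))
```

Because the rows used for assessment were never seen when the model was built, the error is a fairer measure of model quality than the fit to the data it was built on.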
A sampling procedure is random when any member of the population has an equal chance of being selected into the sample.
Training and test data are established from the population with the same sense of randomness, although there may be exceptions depending on the modelling procedure at hand.
Number all units contained within the set-up and sample those units corresponding to random numbers. Use the sample() function to create (pseudo-)random subsets. Remember to use set.seed() to make this step reproducible!
```r
# Making it reproducible
set.seed(42)
# Establishing a population
pop <- c(1:15)
pop
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
# Establishing a random sample
sam <- sample(pop, 5, replace = FALSE)
sam
## [1] 1 5 15 9 10
```
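To see what set.seed() buys you, the short sketch below draws the same sample twice with the same seed. The object names `first` and `second` are only illustrative.

```r
# Establishing a population
pop <- c(1:15)
# First draw with seed 42
set.seed(42)
first <- sample(pop, 5, replace = FALSE)
# Resetting the seed reproduces exactly the same draw
set.seed(42)
second <- sample(pop, 5, replace = FALSE)
identical(first, second)
## [1] TRUE
```

Note that the actual numbers drawn for a given seed can differ between R versions, since the default sampling algorithm changed in R 3.6.0.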
But you will have access to it anyway as it comes with R (we will use version 3.4.2).
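If you are unsure which version of R is installed on your machine, you can check it from within R. A small sketch (the object name `installed` is only illustrative):

```r
# Printing the full version string of the running R installation
R.version.string
# Comparing the installed version against the one used in these seminars
installed <- getRversion()
installed >= "3.4.2"
```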
I recommend RStudio. If you use it a lot, I also recommend changing the appearance to ‘Vibrant Ink’ (setting located in the ‘Global Options’ window nested within the ‘Tools’ menu).
The Source pane is where you load scripts and write most of your code.
The Environment, History, Connections pane is where you can quickly access all objects of your current R session.
The Files, Plots, Packages, Help, Viewer pane is especially useful for navigating documents, visualising data and getting information on certain functions in R.
The Console is where you execute short commands and where warning and error messages are displayed.