Intro to Course

Alban Guillaumet, Troy University

“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.”

John Tukey

Objectives

  • Intro to Course + Syllabus

  • Quick example

Some preliminary questions

  • Which field(s) of biology are you most interested in?

  • What do you know about statistics?

  • Are you interested in statistics?

  • Why are statistics so important?

  • Why are statistics so important … for YOU?

Overview: General Info

About me...

  • Alban Guillaumet (French)

  • Vertebrate ecology and evolution

( Photo credit: Dan Clark/USFWS )

Objective - Intro to data science

Emphasis - Applied stats using R

  • Emphasis is placed on the application of quantitative techniques using the statistical software R.

  • Practical:

    • 1 - Which statistical analyses should be run?
    • 2 - How can the analysis be implemented and correctly interpreted?
    • 3 - How can the data and the analysis best be presented?
  • Therefore, expect R to be a very important component of the class.

  • What is R? (see Lab 1)

  • By contrast, the mathematical theory underlying the statistical analyses will receive much less attention; for the most part, we will NOT derive mathematical formulas ourselves.

Overview: Method

  • Presentation of new concepts in lecture

  • Gradually building R skills:

    • in the lab
    • in lecture, through worked examples in which the code is integrated and dissected

Perspective

Participation!

  • There may be some difficulties or frustration (too hard, not clear enough, too much work, etc.)!

  • Ask questions in class and/or come talk to me, and let's discuss how to improve your experience. I'm here to help!

  • Especially if you encounter difficulties, please do not wait!

Overview: Schedule & Assignments

  • An Excel file called 'Schedule' is posted on Canvas and will be updated weekly

Overview: Grading Summary

Category     Weight
Homework     1/3
Midterms     1/3
Final        1/3

A = 90 and above; B = 80-89.9; C = 70-79.9; D = 60-69.9; F = below 60.
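A minimal sketch of the grading arithmetic in R, assuming each category has already been converted to a percentage (0-100) and that the three categories are weighted equally (1/3 each, as in the table). `course_grade()` and `letter_grade()` are hypothetical helpers for illustration, not official course tools; a score falling between the listed ranges (e.g. 89.95) is assumed to get the lower letter.

```r
# Hypothetical helper: equal-weight (1/3 each) average of three category percentages
course_grade <- function(homework, midterms, final) {
  (homework + midterms + final) / 3
}

# Hypothetical helper: map a numeric grade to a letter using the syllabus cutoffs.
# right = FALSE makes intervals closed on the left, e.g. [80, 90) -> "B"
letter_grade <- function(score) {
  cut(score, breaks = c(-Inf, 60, 70, 80, 90, Inf),
      labels = c("F", "D", "C", "B", "A"), right = FALSE)
}

g <- course_grade(homework = 88, midterms = 79, final = 85)
g                # 84
letter_grade(g)  # B
```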

Overview: Homework

  • Each week, homework will usually include several practice problems, one of which will generally be graded; a lab assignment related to R practice may be given too.

  • Your work is due at the beginning of class the following week [by email to guillaumet.troy.6691@gmail.com]

  • No late homework will be accepted.

  • Your lowest homework grade will be dropped.
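One way to apply the "drop the lowest grade" rule in R, assuming all homework grades are on the same scale and weighted equally. `homework_average()` is a hypothetical helper written for illustration.

```r
# Hypothetical helper: average homework grades after dropping the single lowest one
homework_average <- function(scores) {
  # With only one grade there is nothing to drop
  if (length(scores) < 2) return(mean(scores))
  # which.min() gives the position of the (first) lowest grade; drop only that one
  mean(scores[-which.min(scores)])
}

homework_average(c(90, 75, 100, 85))  # about 91.67, after dropping the 75
```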

Overview: Midterms

  • 2 midterms (in class or take-home)
  • No make-ups will be given
  • For one legitimate exam absence, you will have the option to replace the missing exam grade with your final exam grade.

Overview: Final

  • More details later…

Example

  • Research topic: Disease ecology

  • Question: Is reproduction hazardous to health?

  • Hypothesis: There is a positive relationship between reproductive effort and susceptibility to malaria in great tits

( Picture from Francis C. Franklin / CC-BY-SA-3.0 )

Example - Experimental study

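# Read the data directly from the Whitlock & Schluter website
# (replace the URL with the path to a local copy if you downloaded the file)
birdMalariaData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3aBirdMalaria.csv",
                            stringsAsFactors = TRUE)  # R >= 4.0 no longer converts text to factors by default
str(birdMalariaData)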
'data.frame':   65 obs. of  3 variables:
 $ bird     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ treatment: Factor w/ 2 levels "Control","Egg removal": 1 1 1 1 1 1 1 2 2 2 ...
 $ response : Factor w/ 2 levels "Malaria","No Malaria": 1 1 1 1 1 1 1 1 1 1 ...

Example - the data

# Show 10 randomly chosen rows (the selection changes on every run)
x <- sample(nrow(birdMalariaData), size = 10, replace = FALSE)
print(birdMalariaData[x, ], row.names = FALSE)
 bird   treatment   response
   51 Egg removal No Malaria
   11 Egg removal    Malaria
   41     Control No Malaria
   19 Egg removal    Malaria
   33     Control No Malaria
   54 Egg removal No Malaria
   39     Control No Malaria
   64 Egg removal No Malaria
   63 Egg removal No Malaria
   18 Egg removal    Malaria

Example - Summarize the info contained in your data

  • Contingency table
d <- birdMalariaData
# Cross-tabulate infection status (rows) by treatment (columns)
birdMalariaTable <- table(d$response, d$treatment)
# Add row and column totals
addmargins(birdMalariaTable, FUN = sum, quiet = TRUE)

             Control Egg removal sum
  Malaria          7          15  22
  No Malaria      28          15  43
  sum             35          30  65
  • What statistical hypothesis can you make?
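Turning the counts into proportions within each treatment group can help when formulating the hypothesis. A short sketch using base R's `prop.table()`; the table is rebuilt from the counts shown above so the snippet runs on its own.

```r
# Counts copied from the contingency table (rows: response, columns: treatment);
# matrix() fills column by column: Control = (7, 28), Egg removal = (15, 15)
birdMalariaTable <- as.table(matrix(c(7, 28, 15, 15), nrow = 2,
  dimnames = list(c("Malaria", "No Malaria"), c("Control", "Egg removal"))))

# margin = 2: proportions within each column (treatment), so each column sums to 1
props <- prop.table(birdMalariaTable, margin = 2)
props
```

Here 7/35 = 0.20 of the control birds had malaria, versus 15/30 = 0.50 of the egg-removal birds.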

Example - grouped bar graph

[Figure: grouped bar graph of the bird malaria data]
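The plotting code behind this slide is not shown, so the following is only one possible way to draw a grouped bar graph of the same counts with base R's `barplot()`. The colors and axis labels are arbitrary choices.

```r
# Counts from the contingency table (rows: response, columns: treatment)
birdMalariaTable <- as.table(matrix(c(7, 28, 15, 15), nrow = 2,
  dimnames = list(c("Malaria", "No Malaria"), c("Control", "Egg removal"))))

# beside = TRUE draws one bar per response within each treatment group;
# legend.text = TRUE builds the legend from the row names.
# barplot() invisibly returns the bar midpoints (one per cell)
mids <- barplot(birdMalariaTable, beside = TRUE, legend.text = TRUE,
                col = c("firebrick", "grey80"),
                xlab = "Treatment", ylab = "Frequency")
```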

Example - mosaic plot

[Figure: mosaic plot of the bird malaria data]
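Likewise, a mosaic plot can be sketched with base R's `mosaicplot()` (again an assumption, since the original chunk is not shown). Transposing the table with `t()` puts treatment first, so the x-axis is split by treatment: column widths reflect group sizes (35 vs 30), and tile heights show the proportion of birds with and without malaria in each group.

```r
# Counts from the contingency table (rows: response, columns: treatment)
birdMalariaTable <- as.table(matrix(c(7, 28, 15, 15), nrow = 2,
  dimnames = list(c("Malaria", "No Malaria"), c("Control", "Egg removal"))))

# t() makes treatment the first (horizontal) split and response the second
mosaicplot(t(birdMalariaTable), main = "Bird malaria",
           xlab = "Treatment", ylab = "Response",
           color = c("firebrick", "grey80"))
```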

Example - Statistical test

We test the null hypothesis that the probability of being infected with malaria does NOT depend on the treatment group:

# correct = FALSE turns off Yates' continuity correction
chisq.test(birdMalariaTable, correct = FALSE)

    Pearson's Chi-squared test

data:  birdMalariaTable
X-squared = 6.4931, df = 1, p-value = 0.01083
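To see where X-squared = 6.4931 comes from, the statistic can be recomputed step by step. Under the null hypothesis, the expected count in each cell is (row total × column total) / grand total, and Pearson's statistic sums (observed - expected)^2 / expected over the four cells. This sketch only checks the arithmetic; in practice `chisq.test()` does all of it.

```r
# Observed counts (rows: response, columns: treatment)
observed <- matrix(c(7, 28, 15, 15), nrow = 2,
                   dimnames = list(c("Malaria", "No Malaria"), c("Control", "Egg removal")))
n <- sum(observed)

# Expected counts under H0: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson's chi-squared statistic, without continuity correction
x2 <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p  <- pchisq(x2, df = df, lower.tail = FALSE)  # upper-tail probability

# Common rule of thumb for the chi-squared approximation: all expected counts >= 5
all(expected >= 5)
round(c(X2 = x2, df = df, p = p), 5)
```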