Lab Report 1

Nicole Jamieson

August 1, 2014


This is your first lab report for "Biological Sampling and Interpretation", BIOL501! For every lab session, you will create such a document. As you can see, part of the work has already been done for you. What you will need to do is fill in the missing parts and edit what is there. Your first set of reports will be handed in on August 27, your second set on October 22. Generally, you are welcome to add notes/examples to this report that go beyond what is asked, but make sure they are relevant to the topic and correct. Also note the 'What have I learnt…' and the 'I got it - what's more?' sections at the end of the document. The former is compulsory and is assessed; the latter is only looked at but not assessed. Whenever you add plain text (not R code) to a document like this, please use '>' in front of the paragraph.

This makes it easier for us to see the parts you have contributed :)

I won't say much regarding the R / R Markdown syntax at this point. The 'syntax' is the way you code things, e.g. if you want to write something in italics, you have to use asterisks. The only thing you need to know now is how to save your R Markdown file (File -> Save As, then choose an appropriate folder), and how to 'knit' it. Knitting is the process of compiling your code, i.e. turning it into a nicely readable HTML format, which you can print. When you first receive your document, you will be able to knit it by pressing the 'knit HTML' button (you have to have the packages 'knitr' and 'markdown' installed; we will help you with this!). After every major change you make to the editor file, you should knit your document to make sure that there is no syntax error (which would prevent the HTML document from appearing on your screen and cause an error message to appear in your console). The console is the lower window in RStudio, where commands are interpreted by R (see also my explanations during the lectures, as well as introductory texts on R).

What's the point of using R Markdown? Why not just use Word? R Markdown is a cool tool because you can embed R code chunks. In essence, it combines your statistics tool with word-processing software (like Word, for example). Let us insert some R code to show you:

sqrt(25)  #here we simply calculate the square root of 25
[1] 5

This may all look complicated now, and you may not have understood what is going on at all. This is not a problem though, as much of what you will need to know will be learnt effortlessly simply by looking at the code you are given, and making changes by trial and error.

OK! Let's start to make this interactive. First, give an example of a non-scientific and a scientific hypothesis, as was outlined in the lecture.

Example of a non-scientific (non-testable) hypothesis

There are some rats on Rangitoto.

Example of a scientific (testable) hypothesis

Students who attend class do better than students who do not attend class.

Next, let us think of variables. What is a variable? What kinds of variables exist? You should have absolute clarity about the terms 'predictor', 'response', 'categorical', and 'continuous variable'. First, let's define them all, adding alternative terms and subcategories where appropriate.

Predictor variable

An independent variable that enables you to predict another variable. In an experiment, it is the variable that is manipulated rather than calculated or measured.

Response variable

A variable whose value relies on another variable. It means the same as a dependent variable.

Categorical variable

Entities are divided into distinct categories. Binary: two categories, e.g. dead or alive. Nominal: more than two categories, e.g. omnivore, vegetarian, vegan, etc. Ordinal: the same as nominal; however, the categories have a logical order, e.g. fail, pass, merit, excellent.

Continuous variable

Each entity gets a distinct numerical score on a measurement scale, e.g. human body weight.

Let us create some variables in R now. You can do this by inserting a 'chunk' of code into R (go to Chunks -> Insert Chunk).

haircolour <- c("black", "grey", "white", "blonde", "brown")

This for example created a categorical (and more precisely a nominal) variable named 'haircolour'. If you want to retrieve the variable (see its content), type 'haircolour' into the console:

haircolour
[1] "black"  "grey"   "white"  "blonde" "brown" 

Next, create another nominal variable with the names of the people with the above hair colours. Call the variable 'name' and make sure it has 5 entries as well. Use any names you like. To get you started, I have already inserted an R chunk below. Remove my comment and insert the code!

name <- c("nicole", "rita", "kate", "te rangi", "Epa")
name
[1] "nicole"   "rita"     "kate"     "te rangi" "Epa"     
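
A quick way to check that two variables have the same number of entries is the length() function. The following is only a minimal sketch; the variable 'person' and its values are made up for illustration and are not part of the lab data.

```r
haircolour <- c("black", "grey", "white", "blonde", "brown")
person <- c("Ana", "Ben", "Cam", "Dee", "Eli")  #hypothetical names

length(haircolour)  #number of entries in the first variable
length(person)  #number of entries in the second variable

# TRUE if both variables have the same number of entries
length(haircolour) == length(person)

class(person)  #text variables are stored as 'character' vectors
```

If the comparison returns FALSE, one of the variables has too many or too few entries, and the two could not be combined into one table later on.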

Let us also create an ordinal and a continuous variable, each containing 5 values:

mark <- c(8, 9, 9, 10, 7)  #ordinal variable
height <- c(1.89, 1.71, 1.62, 1.86, 1.59)  #continuous variable
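
Note that R does not know that 'mark' is meant to be ordinal; to R it is simply a vector of numbers. One possible way to make the order explicit is an ordered factor. This is only a sketch: the name 'mark_ordered' and the assumption that the possible marks run from 7 to 10 are chosen to fit the five values above and are not part of the lab instructions.

```r
mark <- c(8, 9, 9, 10, 7)  #marks stored as plain numbers

# An ordered factor tells R that the categories have a rank order
# (assumption: the only possible marks are 7, 8, 9 and 10)
mark_ordered <- factor(mark, levels = 7:10, ordered = TRUE)

is.ordered(mark_ordered)  #checks whether R treats the variable as ordinal
levels(mark_ordered)  #the categories, from lowest to highest

# Ordered factors can be compared: is the 4th mark higher than the 1st?
mark_ordered[4] > mark_ordered[1]
```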

Also create a variable 'weight' (the people's body weights). Try to choose the 5 values reasonably to match up with the body heights. This time, you have to insert your own chunk - a copy paste exercise!

weight <- c(66.5, 60.3, 56.8, 111.9, 84)  #continuous variable
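
One rough way to check whether the weights match up with the heights is the body mass index (weight in kilograms divided by the square of height in metres). The sketch below assumes that 'height' is given in metres and 'weight' in kilograms, which the lab text does not state explicitly; the name 'bmi' is introduced here for illustration only.

```r
height <- c(1.89, 1.71, 1.62, 1.86, 1.59)  #assumed to be in metres
weight <- c(66.5, 60.3, 56.8, 111.9, 84)  #assumed to be in kilograms

# R works element by element: person 1's weight is divided by
# person 1's squared height, person 2's by person 2's, and so on
bmi <- weight/height^2

round(bmi, 1)  #one value per person, rounded to one decimal place
range(bmi)  #smallest and largest value
```

Very small or very large values would hint that a weight does not fit the height it is paired with.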

Now describe an experiment where you have at least one predictor, and one response variable. Identify the nature of the variables (continuous, categorical, …?). Create the variables in R by inserting R chunks, commenting on what you are doing before or after each chunk. Do NOT simply repeat the mice example from the lecture.

Students who attend class receive better results than students who do not attend class. Calculate each student's average attendance over a year and relate it to that student's final end-of-year exam result.

attendance <- c(20, 20, 21, 23, 24, 20, 19, 23, 20, 24, 89, 90, 87, 96, 95, 
    85, 88, 98, 79, 90)  #Predictor variable

results <- c(50, 45, 44, 51, 49, 48, 51, 47, 46, 40, 88, 87, 89, 95, 96, 97, 
    89, 99, 96, 97)  #Response variable

data.frame(attendance, results)
   attendance results
1          20      50
2          20      45
3          21      44
4          23      51
5          24      49
6          20      48
7          19      51
8          23      47
9          20      46
10         24      40
11         89      88
12         90      87
13         87      89
14         96      95
15         95      96
16         85      97
17         88      89
18         98      99
19         79      96
20         90      97
plot(attendance, results)

[Figure: scatter plot of results (y-axis) against attendance (x-axis)]
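
The data fall into two groups of students, one with low and one with high attendance. As a hedged sketch of how the relationship could be summarised in numbers, the code below computes the correlation between the two variables and the mean result of each group. The cut-off of 50 used to split students into 'attending' and 'not attending', and the name 'attends', are assumptions made for this example only.

```r
attendance <- c(20, 20, 21, 23, 24, 20, 19, 23, 20, 24, 89, 90, 87, 96, 95, 
    85, 88, 98, 79, 90)  #Predictor variable
results <- c(50, 45, 44, 51, 49, 48, 51, 47, 46, 40, 88, 87, 89, 95, 96, 97, 
    89, 99, 96, 97)  #Response variable

# Pearson correlation between predictor and response (between -1 and 1)
cor(attendance, results)

# Assumed cut-off: an attendance score of 50 or more counts as 'attending'
attends <- attendance >= 50

mean(results[attends])  #mean result of students who attend
mean(results[!attends])  #mean result of students who do not attend
```

A correlation close to 1 and a clearly higher mean in the attending group would both be in line with the hypothesis, although neither is a formal test.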

Make up a hypothesis that tests some theory about your imaginary data set. (For example, if your data is about mice: injecting ascorbic acid into mice does not prolong their longevity.) Explain three ways to increase your signal to noise ratio in your example.

What have I learnt from this lab in terms of using R?

Throughout this lab I have learnt how to insert a title, insert a chunk, and write a comment inside the chunk area by using the hash sign (#), which tells R not to treat what I have written as code. I have also learnt how to create a plot by sending the code to the console first and then displaying the plot in my report: highlighting the code and pressing Control-Enter sends it to the console for R to interpret.

In this section, I would like you to independently list things you have learnt. For example, you can say that you understood that c() is a function in R that is used to create variables. Or you could say that you learnt that the # sign is used to comment something out in R. Or you could clarify (for yourself) little tricky syntax issues like the fact that when you create a nominal variable (called a character vector in R), you can use either the single or the double quote, it doesn't matter:

test1 <- c('Tobi', 'Mark')  #using single quotes
test2 <- c("Tobi", "Mark")  #using double quotes
test1
[1] "Tobi" "Mark"
test2
[1] "Tobi" "Mark"

In any case, this list should be relatively long as you will definitely have learnt a lot!

You could also be more systematic and have a section on R functions in every lab report for example, or a glossary with terms we used during the lectures/labs. These are excellent ways to learn.

Insert your part here!

I got that, what's more?

In this section, you will get a few hints on how to go further if you finish early. This is also useful to revisit what you have just done.

Today, try to use the function data.frame to produce a table (called a data frame in R) from the variables you created in the last task. Further, you can try to plot your data somehow. If that is not enough, try to add systematic and random error to your variables in a copy of your data frame.
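
As a hint for the last suggestion, here is one possible sketch of adding error to a copy of a data frame. The names 'students' and 'students_err', the size of the systematic error (5 marks) and the spread of the random error (a standard deviation of 3) are all assumptions chosen for illustration. Systematic error shifts every value in the same direction, whereas random error scatters the values around the true ones.

```r
attendance <- c(20, 20, 21, 23, 24, 20, 19, 23, 20, 24, 89, 90, 87, 96, 95, 
    85, 88, 98, 79, 90)
results <- c(50, 45, 44, 51, 49, 48, 51, 47, 46, 40, 88, 87, 89, 95, 96, 97, 
    89, 99, 96, 97)
students <- data.frame(attendance, results)

students_err <- students  #work on a copy so the original stays untouched

# Systematic error: e.g. a marker who gives everyone 5 marks too many
students_err$results <- students_err$results + 5

# Random error: noise drawn from a normal distribution with mean 0
set.seed(42)  #makes the random numbers reproducible
students_err$attendance <- students_err$attendance + 
    rnorm(nrow(students_err), mean = 0, sd = 3)

head(students_err)  #first six rows of the copy with errors
```

Plotting 'students' and 'students_err' side by side is one way to see how each kind of error changes the picture.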