For this assessment, complete the readings and preparatory exercises. Then complete the four components below.

Component 1: R Review

Instructions

Submit your answers to the 5 questions below:

Q1: Which of the following expressions assigns the number 2 to the variable x?

Choose one or more:

  A. x == 2
  B. x <- 2
  C. x - 2
  D. x = 2

Your answer: B and D
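
A quick way to check this answer is the minimal base R sketch below. The names `a`, `b`, `a_equals_two`, and `difference` are only for illustration; they are not part of the question.

```r
# Both assignment operators bind the value 2 to a name.
a <- 2
b = 2

# `==` compares rather than assigns; it returns a logical value.
a_equals_two <- (a == 2)

# `a - 2` computes a difference and leaves `a` unchanged.
difference <- a - 2
```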

Q2: What does the following expression return?

paste("apple", "pie")

Choose one:

  A. “applepie”
  B. “apple, pie”
  C. “apple pie”
  D. An error

Your answer: C
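
The sketch below checks this, assuming default base R behaviour: `paste()` joins its arguments with a single space unless `sep` says otherwise. The variable names are illustrative only.

```r
# paste() joins its arguments with a single space by default.
default_sep <- paste("apple", "pie")

# paste0() uses no separator; a custom `sep` gives other joins.
no_sep <- paste0("apple", "pie")
comma_sep <- paste("apple", "pie", sep = ", ")
```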

Q3: What does the following expression return?

max(abs(c(-5, 1, 5)))

Choose one:

  A. -5
  B. 1
  C. 5
  D. An error

Your answer: C
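
Breaking the expression into steps shows why, as a small sketch with illustrative variable names: `abs()` is applied to every element first, and `max()` then picks the largest result.

```r
values <- c(-5, 1, 5)

# abs() works element by element, so the negative value becomes positive.
absolute <- abs(values)

# max() returns the single largest element of the vector.
largest <- max(absolute)
```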

Q4: If x and y are both data.frames defined by:

x <- data.frame(z = 1:2)
y <- data.frame(z = 3)

which of the following expressions would be a correct way to combine them into one data.frame that looks like this:

z
-
1
2
3

(i.e. one column with the numbers 1, 2, and 3 in it)

Choose one or more:

  A. join(x, y)
  B. c(x, y)
  C. rbind(x, y)
  D. x + y

Your answer: C
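
As a minimal check in base R, the sketch below compares `rbind()` with `c()` on the two data frames from the question. The names `combined` and `as_list` are illustrative only.

```r
x <- data.frame(z = 1:2)
y <- data.frame(z = 3)

# rbind() stacks the rows of data frames with matching column names.
combined <- rbind(x, y)

# c() concatenates the underlying lists, so the result is a plain list
# with two elements named z, not a data frame.
as_list <- c(x, y)
```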

Q5: Given the following data.frame,

x <- data.frame(y = 1:10)

which expression(s) return a data.frame containing only the rows where y is greater than 5 (i.e. 6 to 10)?

Choose one or more:

  A. x[x$y > 5,]
  B. x$y > 5
  C. x[which(x$y > 6),]
  D. x[y > 5,]
  E. subset(x, y > 5)

Your answer: E
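
The sketch below checks what some of these expressions return, assuming default base R behaviour. One detail worth noting is that row-indexing a single-column data frame drops the result to a vector unless `drop = FALSE` is given. The variable names are illustrative only.

```r
x <- data.frame(y = 1:10)

# subset() evaluates the condition inside the data frame and returns a data frame.
via_subset <- subset(x, y > 5)

# With a single-column data frame, row indexing drops to a vector by default...
via_index <- x[x$y > 5, ]

# ...unless drop = FALSE is given explicitly.
via_index_df <- x[x$y > 5, , drop = FALSE]

# A logical vector on its own, not a subset of rows.
flags <- x$y > 5
```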

Component 2: Reflections on the readings

Instructions:

Write a brief reflection on the three assigned readings. In your answer, address each of the questions associated with each article.

Simple rules for organizing data in a spreadsheet

Things to consider while reading:

  • What are some problems with spreadsheets and datasets that you’ve encountered in the past?
  • Were any of the tips new?
  • How many of the tips have you not adhered to in the past?

A quick guide to organizing computational biology projects

Things to consider while reading:

  • How have you organized your files on your own computer?
  • Do you think you might have different file structures for different courses/projects?
  • What would be the ideal filesystem structure for your thesis?

Introduction to RMarkdown

Things to consider while reading:

  • What is the biggest benefit you can see to using Rmarkdown?
  • How might you organise your Rmarkdown document to be most useful to you?
  • Do you think you might use Rmarkdown for your thesis?

Your reflection:

1.0 Simple rules for organizing data in a spreadsheet

1.1 I have worked with datasets whose variable names were inconsistent with the ones we used in the lab. This caused some confusion, as I was unsure which acronym referred to which variable, e.g. Pipefish had become PF.

1.2 Having a comprehensive data dictionary is new to me. I can see how this would be very useful when sharing work with colleagues, as it quickly brings them up to speed on project-specific terminology.
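
As a minimal sketch of what such a data dictionary could look like in R, the table below pairs a variable name with a plain-language description. Only the PF abbreviation comes from the example above; the column names `variable` and `description` and the lookup are hypothetical.

```r
# A hypothetical one-row data dictionary; one row would be added per variable.
data_dictionary <- data.frame(
  variable = "PF",
  description = "Pipefish",
  stringsAsFactors = FALSE
)

# Look up what an abbreviation stands for.
pf_meaning <- data_dictionary$description[data_dictionary$variable == "PF"]
```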

1.3 I have used colour coding in the past as a way to quickly look at a copy of the raw data and see results clearly. I did this because I had no intention of returning to the data after an extended period of time. After reading the paper, I can see that this could become an issue further down the line.


2.0 A quick guide to organizing computational biology projects

2.1 I have organised files on my computer, especially for different uni courses. As an example, in this course I will have a folder titled 'BIOL459' and then within this folder I will have folders for each tutorial:

--BIOL459--
----Craigs_assignment----
----Matts_assignment----
----Amys_assignment----

2.2 It is definitely course dependent. In BIOL309 I used week-by-week folders, as this made more sense given that the material was taught week by week. In some weeks there were two topics, so I put them in subfolders:

--BIOL309--
----Week_1----
----Week_2----
------Topic_1------
------Topic_2------
----Week_3----
...

2.3 As I haven't started my thesis (I only have a rough idea of the project), I am unsure how I will structure it. At the moment, a structure such as the following is a likely starting place; however, this will change as the specifics of the project develop.

-MSc Thesis-
  -Introduction-
    -References from lit review-
  -Methods-
    -Figures-
    -References-
  -Results-
    -Figures-
      -tables-
      -images-
    -R_code-
      -raw_data-
      -cleaned_data-
    -References-
  -Discussion-
    -References-
  -Conclusion-
  -Appendices-
    -Images-
    -Tables-
    -Data-
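
A structure like this could also be created from R rather than by hand. The sketch below builds only the Results branch, and does so inside a temporary directory so nothing is written to a real project location. The root name `MSc_Thesis` and the underscore spelling of the folder names are assumptions made for illustration.

```r
# Build part of the planned structure inside a temporary directory.
root <- file.path(tempdir(), "MSc_Thesis")

folders <- c(
  file.path("Results", "Figures", "tables"),
  file.path("Results", "Figures", "images"),
  file.path("Results", "R_code", "raw_data"),
  file.path("Results", "R_code", "cleaned_data")
)

# recursive = TRUE creates the intermediate folders as needed.
for (folder in folders) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that were created, relative to the root.
created <- list.dirs(root, full.names = FALSE)
```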
    
3.0 Introduction to RMarkdown

3.1 The biggest benefit to using Rmarkdown that I can see is 

Component 3: Exploring the RStudio Server and the command line

Instructions

Get set up in the RStudio Server (or install RStudio to your own computer) and try out using the command line. Fill in the following table:

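| Command | What it does | Equivalent R command |
|---------|--------------|----------------------|
| cd      |              |                      |
| ls      |              |                      |
| less    |              |                      |
| head    |              |                      |
| tail    |              |                      |
| wc      |              |                      |
| grep    |              |                      |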
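
One possible way to explore the last column is to try base R functions that do roughly similar jobs on a small temporary file. This is a sketch that assumes these functions are reasonable counterparts rather than exact equivalents; the file name and its contents are made up for illustration.

```r
# Create a small example file in a temporary directory.
example_dir <- tempdir()
example_file <- file.path(example_dir, "example.txt")
writeLines(c("apple", "banana", "cherry", "apple pie"), example_file)

old_dir <- setwd(example_dir)                   # cd: change the working directory
files <- list.files()                           # ls: list files in the directory
lines <- readLines("example.txt")               # read the file so it can be inspected
first_two <- head(lines, 2)                     # head: first lines
last_two <- tail(lines, 2)                      # tail: last lines
n_lines <- length(lines)                        # wc -l: number of lines
matches <- grep("apple", lines, value = TRUE)   # grep: lines matching a pattern
setwd(old_dir)                                  # return to the original directory

# file.show(example_file) pages through a file interactively, similar to less.
```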
