Part 1: Review of functions and workflow

Start by loading packages.

# load the Seurat, ggplot2, and tidyverse packages (tidyverse also attaches ggplot2)
library(Seurat)
library(ggplot2)
library(tidyverse)

Q: What if a package isn't installed? Close RStudio and install it via Anaconda.
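Before installing anything, it can help to check which of the packages are actually missing. The sketch below is a minimal base-R approach. The variable names (`required_pkgs`, `is_installed`, `missing_pkgs`) are our own and are not part of any package. It relies on `requireNamespace()`, which returns `TRUE` only when a package is installed and loadable.

```r
# Packages this script needs
required_pkgs <- c("Seurat", "ggplot2", "tidyverse")

# requireNamespace() loads (but does not attach) each package and returns TRUE/FALSE
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of packages that could not be found
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package listed as missing can then be installed through Anaconda as described above. Outside a conda setup, `install.packages()` from the R console does the same job for CRAN packages.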

Next, verify and set your working directory.

Good practice: Get and set your working directory using getwd() and setwd()
Better practice: Avoid calling setwd() multiple times or in the middle of a script; if you use it, call it once, near the top, with an absolute file path. Keep your analysis data in a subfolder.
Best practice: For personal use, setwd() works fine. But shareable code should be written using “R Projects”. See: https://martinctc.github.io/blog/rstudio-projects-and-working-directories-a-beginner%27s-guide/

getwd()

## [1] "/Users/iancoccimiglio/OneDrive/RstudioCode"

# setwd("/home/ian/OneDrive/RstudioCode/") # You'll need to set this according to your machine.

Question 1: Will this code work if I give it to someone else?
Question 2: What is the difference between an absolute and a relative file path?
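To explore Question 2, here is a small self-contained sketch. It builds a throwaway folder inside R's temporary directory, so it never touches your real files. The folder name `coding_club_demo` and the file `example.csv` are hypothetical. It compares an absolute path with a relative one. `setwd()` returns the previous working directory, which lets us restore it at the end.

```r
# Create a throwaway folder with one small file in it (hypothetical example data)
project_dir <- file.path(tempdir(), "coding_club_demo")
dir.create(project_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(project_dir, "example.csv"), row.names = FALSE)

old_wd <- setwd(project_dir)  # setwd() returns the previous working directory

abs_path <- file.path(project_dir, "example.csv")  # absolute: starts at the filesystem root
rel_path <- "example.csv"                          # relative: resolved against getwd()

file.exists(abs_path)  # TRUE from any working directory
file.exists(rel_path)  # TRUE only while the working directory is project_dir

setwd(old_wd)          # restore the original working directory
file.exists(abs_path)  # still TRUE
file.exists(rel_path)  # usually FALSE now, unless an example.csv also exists here
```

A relative path depends on where R happens to be "standing", while an absolute path does not. However, an absolute path usually contains machine-specific folders such as a user name, which is why it rarely works on someone else's computer.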

Old functions to try:

# data <- read.csv(blood_pressure.csv) # why does this code not work?
data <- read.csv("blood_pressure.csv") # why does this code work?

mean(data$bp_before) # What does this function do?

## [1] 156.45

sd(data$bp_before) # What does this function do?

## [1] 11.38985

plot(data$bp_before) # What does this function do?

hist(data$bp_before) # What does this function do?
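Once you have answered the questions above, you can check the answers for `mean()` and `sd()` by recomputing them by hand. The sketch uses a small toy vector rather than the blood pressure data. It shows that `sd()` returns the *sample* standard deviation, which divides by n - 1.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # toy values, not from blood_pressure.csv

# Mean: add everything up and divide by how many values there are
manual_mean <- sum(x) / length(x)

# Sample standard deviation: square root of the summed squared deviations over n - 1
manual_sd <- sqrt(sum((x - manual_mean)^2) / (length(x) - 1))

manual_mean  # 5, same as mean(x)
manual_sd    # about 2.14, same as sd(x)
```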

Question 1: Is there a difference between using single quotes and double quotes?
Question 2: What would our plan be if we wanted to compare the female to the male blood pressures?

# I want specific columns!
colVar = c("patient", "sex") # use the function 'c' to 'combine' the column names.
data[colVar] # select only the columns named in colVar

# I want specific rows!
male = subset(data, sex == 'Male')
female = subset(data, sex == 'Female')

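# I want to plot this data!
par(mfrow = c(1, 2))  # arrange plots in 1 row and 2 columns
y_range = range(data$bp_before)  # shared y-axis limits so the two panels are comparable
plot(male$bp_before, col = 1, ylim = y_range, main = "Male")      # col = 1 is black
plot(female$bp_before, col = 2, ylim = y_range, main = "Female")  # col = 2 is red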
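One possible plan for Question 2 is to summarise each group and then draw both groups on a single plot. The sketch below uses a small invented data frame, `bp`, that only imitates the columns `patient`, `sex`, and `bp_before`. To run it on the real data, replace `bp` with `data`. It uses base R's `tapply()` and `boxplot()`. Since the tidyverse is loaded, `group_by()` with `summarise()` from dplyr would be an equivalent route.

```r
# Hypothetical stand-in for blood_pressure.csv: all values are invented
bp <- data.frame(
  patient   = 1:6,
  sex       = c("Male", "Male", "Male", "Female", "Female", "Female"),
  bp_before = c(150, 160, 170, 140, 150, 160)
)

# Apply a summary function to bp_before separately for each sex
group_means <- tapply(bp$bp_before, bp$sex, mean)
group_sds   <- tapply(bp$bp_before, bp$sex, sd)
group_means  # Female 150, Male 160

# Both groups side by side on the same y-axis
boxplot(bp_before ~ sex, data = bp, ylab = "bp_before")
```

A formal test, such as a t-test, would be the natural next step after this descriptive comparison.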


Part 2: Commenting and documentation

Programmers write comments to document or explain their code. In R, you create a comment with a hash symbol (#): everything from the # to the end of that line is ignored when the code runs. Good comments strike a balance between over-explaining and under-explaining the code for the intended audience. Over time, you'll develop your own "style"; aim to make it easy to read for future-you and for others.
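The short sketch below shows the main ways a # is used: a full-line comment, an inline comment after code, and a "commented-out" line of code that R skips. The variable names `doses` and `total_dose` are hypothetical.

```r
# A full-line comment: R ignores everything from the # to the end of the line
doses <- c(2, 4, 6)          # an inline comment can follow code on the same line
# doses <- doses * 100       # commented-out code is never run
total_dose <- sum(doses)     # total dose across the three time points
total_dose                   # 12, because the multiplication above was skipped
```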

## Not a good comment: too verbose
numExp <- 1 # Gives a value of 1 to the variable numExp, then saves it in the environment for later use.

## Better comments, especially for an introductory course
names = c("Fabio", "Sandeep", "Laura", "Henry") # sample of names of Rossi lab members
print(names) # prints the names to the console

## [1] "Fabio"   "Sandeep" "Laura"   "Henry"

myName = "Ian"

# greets the user if their name is in the list of Rossi lab names
if (myName %in% names) {
  x = "Hello Rossi lab member!"
  print(x)
} else {
  x = "Access denied"
  print(x)
}

## [1] "Access denied"

Challenge 1 - what might be the failure modes of this code?
Challenge 2 - how would you comment out this entire section of code? Is there a better way?
Challenge 3 - what if you didn’t like the variable “names” and wanted to call it something else?
Challenge 4 - Is x a good variable name?
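Try the challenges first. Afterwards, here is one possible revision, offered as a sketch rather than the answer. It wraps the check in a function (`greet_member` is a hypothetical name) and uses a more descriptive name than `names`, which is also the name of a base R function. It ignores differences in capitalisation and stray spaces. It still expects a single name. If you pass a vector of several names, `if()` will raise an error in recent versions of R.

```r
lab_members <- c("Fabio", "Sandeep", "Laura", "Henry")  # more descriptive than `names`

# Returns a greeting if `name` matches a lab member, ignoring case and extra spaces
greet_member <- function(name, members = lab_members) {
  clean <- function(s) tolower(trimws(s))  # normalise before comparing
  if (clean(name) %in% clean(members)) {
    "Hello Rossi lab member!"
  } else {
    "Access denied"
  }
}

greet_member(" fabio ")  # "Hello Rossi lab member!"
greet_member("Ian")      # "Access denied"
```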

Part 3: Seurat!

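# read the 10x Genomics count matrix (genes x cells) from its folder
pbmc.data <- Read10X(data.dir = "data/filtered_gene_bc_matrices/hg19/")
# keep genes detected in at least 3 cells, and cells with at least 200 detected genes
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)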

## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')

# Double square brackets [[ ]] access a Seurat object's cell-level metadata. Here they add a new column, "percent.mt": the percentage of each cell's counts that come from mitochondrial (MT) genes, i.e. genes whose names start with "MT-".
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

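# violin plots of three QC metrics per cell: genes detected, total counts, and percent mitochondrial
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)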
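To see roughly what the `percent.mt` column holds, the sketch below recomputes the same quantity by hand on a tiny invented count matrix. Rows are genes, columns are cells, and all gene names and counts are made up for illustration. We assume here that `PercentageFeatureSet()` divides each cell's counts from features matching `pattern` by that cell's total counts and multiplies by 100. For the exact behaviour, check the Seurat documentation.

```r
# Toy count matrix: rows = genes, columns = cells (values invented for illustration)
counts <- matrix(
  c(10,  0,  5,
    80, 50, 45,
    10, 50, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-CO1", "ACTB", "CD3E"), c("cell1", "cell2", "cell3"))
)

# Which rows are mitochondrial genes? (same regular expression as above)
mt_genes <- grepl("^MT-", rownames(counts))

# Per cell: counts from MT genes divided by all counts, as a percentage
percent_mt <- 100 * colSums(counts[mt_genes, , drop = FALSE]) / colSums(counts)
percent_mt  # cell1 10, cell2 0, cell3 5
```

`drop = FALSE` keeps the subset as a matrix even when only one gene matches, so `colSums()` still works.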

Coding Club 3