Binding Affinities Modeling PRELAB

Author

Sadie Conway

Published

April 20, 2001

This is an individual assignment, but you are allowed to work together in groups and discuss coding and answers. That said, you are responsible for all of the material in this laboratory assignment. DO NOT COPY from anyone that you work with. You are NOT allowed to share code. You need to write the code and answer the questions yourself. Try the coding yourself first before seeking help.

Be sure to include your name in the file name as follows: lastname_firstname_labday.qmd. Also, type your full name in the quotes on line 3 where it says “author.”

Due Monday week 5 (04/20/26) by 1pm CDT. Please upload both the .qmd and the rendered .html files.

For any questions regarding coding material, please contact Professor Michael Walsh or the Quantitative Biosciences Center (QBC), or come to the QBC office hours in BSLC 401. Dr. Walsh will have office hours Monday, Wednesday, and Friday from 12pm to 1pm (BSLC 211).

Quantitative Biosciences Center Hours: M-T-W-TR 5:30-7:30pm in BSLC 401, and Sun 5:30-7:30pm on zoom! https://college.uchicago.edu/academics/quantitative-biosciences-center

Organize your code and answers clearly in one .qmd file. Enter all answers to boldface questions as comments in the code.

Below is a review of the R concepts you will be using for this lab.

Part-1: Vectors in R

Data values of the same type can be grouped into larger collections called vectors. Think of a vector as a box: it has a length (the number of values it can hold) and you (the coder) can add things to the box, change the size (length) of the box, edit the items in the box, or take them out as you please. Having “boxes” that you can sort values into and manipulate is one of the most useful tools at your disposal.

First, let’s discuss the ways you can create a vector.

If you have a small set of values you want to enter manually, you can use the c() function. In this case, c stands for concatenate, which squishes all of your values into one vector.

myvector <- c(2,6,8,1,100,49850)
print(myvector)

[1]     2     6     8     1   100 49850

If you have numerical data that follows a fixed pattern there are a couple of things you can do:

#you can define a vector, similar to the way that you create a variable but instead listing a range of numbers, between start and end points, and counting up by 1's.

myvector <- 3:9
print(myvector)

[1] 3 4 5 6 7 8 9

#the seq command will give you the numbers between a start and end point, counting up by a fixed step size.
#the syntax is as follows: seq(startingvalue,endingvalue,countby)

myvector <- seq(0,100,10)
print(myvector)

 [1]   0  10  20  30  40  50  60  70  80  90 100

One particularly useful way to make a vector is to create an "empty" vector of all 0s. This uses the numeric() function, where the number inside the parentheses is the length of your vector. In other words, this allows you to specify how long an empty vector is.

Whenever you are calculating a large set of data points, having an “empty box” of a vector is helpful if you want to store the numbers you calculated.

myvector <- numeric(10) #the numeric here just tells R to create a vector that stores the numeric data type
print(myvector)

 [1] 0 0 0 0 0 0 0 0 0 0
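
As a minimal sketch of the "empty box" idea, the chunk below preallocates a vector with numeric() and then fills it one position at a time inside a for loop. The names n and squares are placeholders chosen for this illustration and are not used elsewhere in the lab.

```r
n <- 5                  # number of values we plan to calculate (arbitrary choice)
squares <- numeric(n)   # the "empty box": a numeric vector of n zeros

for (i in 1:n) {
  squares[i] <- i^2     # store the i-th result in position i
}

print(squares)
```

Creating the vector at its final length first, and then filling it, is usually easier to read than growing a vector one element at a time.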

Manipulating Vectors

Once you’ve made your vector, there are lots of ways to manipulate it in order to extract data. For this section, let’s create a simple vector, x, from 5 to 20, increasing by 1's.

x <- (5:20)
x

 [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

Here is a quick list of the things you can do with your vector. Please note that this list is not exhaustive; it is just meant to show you some of the options available to you. Vector manipulation is one of the best things about R. Many of these operations would be tedious to do by hand but can be done in R within seconds using only one line of code.

Since this chunk has many functions in it, we will disable the output for convenience. If you ever need to do this, you can type {r, eval=FALSE} at the top of your chunk. Please remove the eval=FALSE and run the code chunk below:

length(x) #finds the number of data points in your vector

x[2] #indexes a location within your vector. For example, this returns the second number in vector x

x[1:5] #returns the values in locations 1 through 5 

x[3] <- 2 #reassigns one of the values (at position 3) to a new value (2)

x[-4] #prints all elements of the vector except the element at index 4. This does not change x itself; to keep the result, assign it to a new vector.

NewVec1 <- x[-4] #creates a new vector with all the elements of vector x except the element at index 4.

x + 10 #adds 10 to every value in the original x vector

NewVec2 <- x+10 # creates a new vector with 10 added to every value in the vector.

Remember that vectors are also variables so they can be reassigned, manipulated, or copied.

vector1 <- c(1,3,5,7) #creates new vector 1,3,5,7
vector2 <- vector1 #creates a new vector called vector2, with the same values as vector 1
print(vector1)

[1] 1 3 5 7

print(vector2)

[1] 1 3 5 7

#we can now manipulate vector1 without changing the values in vector2
vector1 <- vector1 + 5 
# in the above line, we're updating the value of vector 1 to be the values of vector1 + 5

#vector1 has been updated, but vector2 remains the same
print(vector1)

[1]  6  8 10 12

print(vector2)

[1] 1 3 5 7

What if we want to find the sum of the elements of our vector? Luckily, there is a built-in function in R that calculates the sum of the objects input into it. Its name is, appropriately, sum(). Try the following command to find the sum of the numbers contained in x:

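sum(x) #x was created as the integers from 5 to 20, inclusive, which sum to 200. If you ran the chunk above, the value at position 3 was changed from 7 to 2, so your sum will be 195. The outputs shown in this section were rendered with that chunk disabled (eval=FALSE). Does the sum (and the following commands) make sense?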

[1] 200

Once you have a vector, you can access individual elements of the vector by using the index feature of R. Note that index values in R start with 1 and not 0! The index is simply the position of an item in the vector. For example, if you want to obtain the first element in a vector, you type the following:

x[1]

[1] 5

This can be repeated for any value that you care about in the vector:

x[3]

[1] 7

x[6]

[1] 10

x[9]

[1] 13

So x[1] will give you the first item that you entered into your vector, x[2] will give you the second item, etc. If you want to pick out a range of numbers (say, the elements in the vector from position 3 to position 7), you can simply use a colon when subsetting the vector:

x[3:7]

[1]  7  8  9 10 11

Part-1 Questions

1.1 Using c( ), create a new character vector (vector of strings) containing your first name, last name, and the string "BIOS20186"

vec1 <- c("Sadie", "Conway", "BIOS20186")

1.2 Using indexing, return your last name from the vector you just made.

vec1[2]

[1] "Conway"

1.3 Using seq(), create a vector containing numbers 1 to 10 with the interval of 0.25 (Hint: use the help option to learn about this built-in function in R).

seq(1, 10, 0.25)

 [1]  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  3.75
[13]  4.00  4.25  4.50  4.75  5.00  5.25  5.50  5.75  6.00  6.25  6.50  6.75
[25]  7.00  7.25  7.50  7.75  8.00  8.25  8.50  8.75  9.00  9.25  9.50  9.75
[37] 10.00

1.4 What are the outputs for each of the following commands? Describe what each command is doing in a comment next to that line of code. Hint: try the commands one at a time.

# Remove the "eval = FALSE" before running the code chunk
myVec <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100) #Creates a vector

myVec[3:8] #Returns the values in position 3 through 8

[1] 30 40 50 60 70 80

myVec[-3] #Removes the 3rd element and returns all of the others

[1]  10  20  40  50  60  70  80  90 100

myVec[c(3,5,9)] #Returns elements in position 3 5 and 9

[1] 30 50 90

myVec[4] <- 22 #Replaces the 4th element with the value 22

7+myVec #Adds 7 to every value in the vector

 [1]  17  27  37  29  57  67  77  87  97 107

Part-2: Plotting and Data Frames

Plotting

The primary tool for data visualization in R is the plot() function. For R to plot a function, it needs a list of x values and a list with the same number of y values or an equation that involves the x values.

While a lot of biologists use R to graph experimentally collected data, many will also use R or another analytic tool to test potential mathematical models for their data. You will be using R for the latter purpose.

An example of using the plot() function is as follows:

x <- seq(0, 100, 1) #Tells the code to create a list of x values from 0 to 100 separated by 1

plot(x, x^(2)-20, type = "l", xlab = "My x values", ylab = "My y values")

legend("topleft",c("Meaningless Legend", "oooh second color"),fill=c(1, 5))

Data Frames

We’ve already gone over how to manipulate data in the form of vectors. However, real data most often does not come in vectors and is not usually immediately ready for analysis. The most common form data comes in is a data frame, which is a table of data with numbered rows and titled columns (think Excel sheets).

Dataframes have two dimensions, as opposed to vectors, which have only one. To access a specific point in a dataframe you need to give it two indices: the row index and the column index. In code form we can do: dfname[row_index, col_index]. If we want to access an entire row or column, we leave the other dimension blank: dfname[row_index, ] or dfname[ , col_index].

Most dataframes have titled columns, which makes data access considerably easier if we know the title of the column we want to access. To do this we use dfname$col_name. Note that $ works only for columns; if the rows are also named, a row can be accessed by name with dfname["row_name", ].
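
The indexing rules described here can be tried on a small example. The sketch below builds a made-up data frame (the name df, its column names, and its values are all hypothetical) and accesses it by position and by column name.

```r
# A small made-up data frame (values are illustrative, not real measurements)
df <- data.frame(
  sample    = c("a", "b", "c"),
  mass_g    = c(10.5, 12.1, 9.8),
  length_mm = c(30, 34, 29)
)

df[2, 3]     # single value: row 2, column 3
df[2, ]      # entire second row (still a data frame)
df[, 2]      # entire second column (returned as a vector)
df$mass_g    # the same column, accessed by its name
```

Accessing a column by name is usually safer than by position, because the code keeps working if the column order changes.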

In the following example we will use a dataframe with penguin data from the package palmerpenguins, which contains a dataframe called penguins

If you do not have the package installed, you will need to type install.packages("palmerpenguins") into your R console.

#you may need to run install.packages("palmerpenguins") before loading the package
library(palmerpenguins)

Warning: package 'palmerpenguins' was built under R version 4.5.3


Attaching package: 'palmerpenguins'

The following objects are masked from 'package:datasets':

    penguins, penguins_raw

#Print out a small subset of our dataframe
head(penguins)

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>

#Create a vector with only the flipper lengths
lengths <- penguins$flipper_length_mm

#Printing the first ten flipper lengths
lengths[1:10]

#plotting bill length against bill depth
plot(penguins$bill_depth_mm, penguins$bill_length_mm, xlab="bill depth (mm)", ylab="bill length (mm)", main="Penguin Characteristics", pch=19, col="lightblue")

Part-2 Questions

2.1 (a) Load the data file called coronadata.csv into R using the read.csv() command. This data set contains the number of new cases of COVID-19 per day in China and Italy during the first 99 days of 2020.

data1 <- read.csv("C:/Users/sadie/Downloads/Actin_Kd_Modeling PRELAB/PRELAB/coronadata.csv")
print(data1) #prints the data frame that was just read in, without reading the file a second time

   Day China Italy
1    1     0     0
2    2     0     0
3    3    17     0
4    4     0     0
5    5    15     0
6    6     0     0
7    7     0     0
8    8     0     0
9    9     0     0
10  10     0     0
11  11     0     0
12  12     0     0
13  13     0     0
14  14     0     0
15  15     0     0
16  16     0     0
17  17     4     0
18  18    17     0
19  19   136     0
20  20    19     0
21  21   151     0
22  22   140     0
23  23    97     0
24  24   259     0
25  25   441     0
26  26   665     0
27  27   787     0
28  28  1753     0
29  29  1466     0
30  30  1740     0
31  31  1980     3
32  32  2095     0
33  33  2590     0
34  34  2812     0
35  35  3237     0
36  36  3872     0
37  37  3727     0
38  38  3160     0
39  39  3418     0
40  40  2607     0
41  41  2974     0
42  42  2490     0
43  43  2028     0
44  44 15141     0
45  45  4156     0
46  46  2538     0
47  47  2007     0
48  48  2052     0
49  49  1890     0
50  50  1750     0
51  51   394     0
52  52   891     0
53  53   826    14
54  54   647    62
55  55   218    53
56  56   515    97
57  57   410    93
58  58   439    78
59  59   329   250
60  60   428   238
61  61   574   240
62  62   205   561
63  63   127   347
64  64   119   466
65  65   117   587
66  66   170   769
67  67   101   778
68  68    46  1247
69  69    45  1492
70  70    20  1797
71  71    29   977
72  72    24  2313
73  73    22  2651
74  74    19  2547
75  75    22  3497
76  76    25  2823
77  77    43  4000
78  78    23  3526
79  79    44  4207
80  80    99  5322
81  81    52  5986
82  82    65  6557
83  83   138  5560
84  84    69  4789
85  85    78  5249
86  86   102  5210
87  87    94  6153
88  88   119  5959
89  89   113  5974
90  90    98  5217
91  91    84  4050
92  92    54  4053
93  93   100  4782
94  94    70  4668
95  95    62  4585
96  96    48  4805
97  97    67  4316
98  98    56  3599
99  99    86  3039

2.1 (b) Create a line plot with the Day on the x-axis and the number of new incidences per day in China on the y-axis. Add the following parameter: type = "l" in your plot() function to connect the dots by a line.

plot(data1$Day, data1$China, type = "l", col = "green", xlab = "Day", ylab = "New Incidences per Day", main = "China vs Italy")

2.1 (c) Add the number of new incidences per day in Italy to the above plot. In other words, make one plot that contains the number of new incidences per day for both China and Italy. Use a different color for each data set. Hint: use the lines() function to add a new graph to an existing plot.

plot(data1$Day, data1$China, type = "l", col = "green", xlab = "Day", ylab = "New Incidences per Day", main = "China vs Italy")
lines(data1$Day, data1$Italy, type = "l", col = "red")

Part-3: Binding Affinity of Actin Binding Protein - Profilin

Microfilaments are one of the three major cytoskeletal elements in eukaryotic cells, and they play a key role in many cellular processes such as cell division, cell motility, and cell shape maintenance. Microfilament structure is a result of the processes of nucleation, elongation, and equilibrium of actin subunits coming on and off filaments. An example is shown in the graph to the right, taken from your textbook. During a slow initial lag phase, called nucleation, G-actin monomers bind to each other forming small oligomeric “nuclei” or “seeds” comprised of a few actin subunits. Additional monomers are added to the oligomeric seeds at a much faster rate. During this elongation phase, longer actin filaments are formed. These filaments are also referred to as F-actin, which is short for filamentous actin. The polymerization reaction is reversible, and thus, the filament is a dynamic structure – losing and gaining actin monomers. Once the filament reaches a steady state (equilibrium) condition, where the rate of addition of monomers equals the rate of depolymerization, the filament no longer grows. This is seen as a plateau on the graph.

Inside the cell, polymerization is not an unregulated spontaneous process. It is carefully orchestrated and regulated by the action of many proteins that allow for polymerization as needed in the cell. Functions for these proteins include maintenance of a large pool of free actin monomer, promoting or blocking nucleation events, accelerating or terminating elongation of the polymer, severing and disassembly of filaments and bundling filaments.

In this lab, you will investigate the actin-binding protein, profilin, and its binding affinity for G-actin.

Look up some of the functions of profilin, especially with regard to nucleation and elongation.

The binding affinity of two proteins is typically described by an equilibrium dissociation constant, K_d (also written as K_D). The binding of profilin to actin is a reversible reaction. Written as a dissociation reaction,

$AP\leftrightarrow A + P$

where A, P, and AP are monomeric actin, profilin, and profilin bound actin, respectively, the equilibrium dissociation constant equals the concentration of the free protein products divided by the concentration of bound protein.

$K_{d}\, = \frac{[A][P]}{[AP]}$

The K_d is reported in units of molarity (M) and it reflects the affinity of the two proteins for each other. As the affinity increases, the proportion of bound protein [AP] increases. Thus, as [AP] increases, K_d decreases. A protein with a K_d in the nanomolar (10^-9 M) range has a stronger affinity than a protein with a K_d in the micromolar (10^-6 M) range.
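
To make the link between K_d and the amount of bound protein concrete: rearranging K_d = [A][P]/[AP] gives the fraction of actin that is bound, [AP]/([A] + [AP]) = [P]/(K_d + [P]), where [P] is the free profilin concentration. The sketch below evaluates this expression for two K_d values. The function name fraction_bound and all concentrations are hypothetical choices for illustration; they are not measured values for profilin.

```r
# Fraction of actin bound to profilin at equilibrium, from K_d = [A][P]/[AP]:
#   [AP] / ([A] + [AP]) = [P] / (K_d + [P])
# P_free is the FREE profilin concentration; both arguments must share units (M here).
fraction_bound <- function(P_free, Kd) {
  P_free / (Kd + P_free)
}

P_free    <- 1e-6            # 1 micromolar free profilin (hypothetical)
Kd_values <- c(1e-9, 1e-6)   # hypothetical nanomolar and micromolar K_d values (M)

bound <- fraction_bound(P_free, Kd_values)
print(bound)
```

With the same amount of free profilin, the nanomolar K_d leaves almost all of the actin bound, while the micromolar K_d leaves half of it bound. This is the sense in which a smaller K_d means a stronger affinity.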

We will determine the K_d value for profilin indirectly by measuring actin assembly rates as a function of free actin. The more free actin in the reaction, the faster the assembly rate. With the pyrene-actin fluorescence assay used in these experiments, the assembly rates reflect both nucleation and elongation events. We will be focusing on the initial portion of actin assembly, during which actin oligomers (“seeds”) form but fluorescence observed also includes fluorescence due to elongation of actin filaments extending from the seeds. Regardless, the assembly rate is a function of the concentration of free actin.

The modeling lab in week 5 will explain how to obtain the K_d value of profilin for G-actin from a plot of profilin concentration versus assembly rate. For the pre-lab, we are focusing on how to obtain assembly rates using a linear regression of the initial “nucleation” phase of the assembly curve.

In this assignment, you will plot data from a file containing the fluorescence vs. time data from two actin assembly reactions - one reaction performed with actin only, and another reaction where 3 µM profilin was included. This data is from pyrene-actin assays similar to what section 1 performed in lab on week 3.

In the week 5 lab we will filter the data to limit the analysis to the linear portion of the actin assembly curve and we will perform a linear regression to find a best fit line for these linear portions of the curve. The slope value of the line is the assembly rate.
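
As a preview of what filtering to a linear portion might look like, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the simulated curve, the 15-second spacing, the 1000-second cutoff, and the names sim, linear_part, and fit_linear. The week 5 lab may use a different approach and a different window.

```r
set.seed(1)                      # make the simulated noise reproducible
time <- seq(0, 3000, by = 15)    # time points in seconds (spacing is arbitrary)

# Simulated curve: linear rise for the first 1000 s, then a plateau (not real data)
fluor <- 100 + 0.2 * pmin(time, 1000) + rnorm(length(time), sd = 2)
sim <- data.frame(Time = time, Fluor = fluor)

# Keep only the rows inside a hypothetical linear window (0 to 1000 s)
linear_part <- sim[sim$Time <= 1000, ]

# Fit a line to the filtered data; the slope estimates the assembly rate
fit_linear <- lm(Fluor ~ Time, data = linear_part)
coef(fit_linear)[2]              # slope, in A.U. per second
```

Fitting only the filtered rows keeps the plateau from pulling the slope down, which is why the window is chosen before the regression is run.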

Read in and Plot Experimental Data for a Range of Profilin Concentrations

Handling Experimental Data

Import the data file in your folder that you downloaded from Canvas and unzipped.

The data file Profilin_AssemblyCurves.csv should be in the folder with this document. Do not move it out of this folder. This file contains fluorescence readings collected from in vitro pyrene-actin assays, where the concentration of actin was kept constant and the concentration of profilin varied. The dataset includes readings from the first 3000 seconds of a polymerization reaction, which is primarily during the nucleation phase but will include elongation events as well. With pyrene assays, keep in mind that the increase in fluorescence includes both nucleation and elongation events.
The following command will read the data file into a variable. The example is Variablename, but you can choose something more informative.
```
Variablename <- read.csv("C:/Users/sadie/Downloads/Actin_Kd_Modeling PRELAB/PRELAB/Profilin_AssemblyCurves.csv")
```
Read in the file by assigning the whole .csv file to a variable name. The result is a data frame with dimensions of 201x12 (201 rows and 12 columns).
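
After reading a file, it is worth checking that the data frame has the shape you expect. The sketch below does this on a tiny stand-in file written to a temporary location, so it runs without the lab data. The values are made up, and the names tmp, toy, and dat are placeholders; the column names Time and ActinOnly follow the names used in the plotting code later in this document.

```r
# Build a tiny stand-in file so the example runs without the lab data
tmp <- tempfile(fileext = ".csv")
toy <- data.frame(Time = c(0, 15, 30), ActinOnly = c(101, 104, 108))
write.csv(toy, tmp, row.names = FALSE)

dat <- read.csv(tmp)   # the same kind of call you would use on the lab file
dim(dat)               # number of rows, then number of columns
names(dat)             # column headers, used with the $ operator
head(dat)              # first few rows
```

On the lab file, dim() should report 201 rows and 12 columns, and names() shows the exact column headers to use after the $ operator.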

Choose between the following 2 ways to work with your data

1. Assign each column to a specific variable

Variable_name_column <- Variablename$columnName

columnName is the name of the column explicitly written in the .csv file. You need to look at the .csv file to see the tag for each column. Hint: you can do this in RStudio by clicking on the name of your data frame in environment (top right box on your screen).
If you choose this method, only create vectors for the first two columns. You do not need to use the rest of the .csv yet!

2. Work with the dataframe itself – remembering the order of the columns and what each column corresponds to.

Variable_name_column <- Variablename[,2]

In this case, you are still looking at the same set of data, but accessing it numerically through indexing rather than the column name. In this example, we used the number 2 to pull from the second column in the dataset (which is data from the actin only reaction).

Regardless of which way you access your data, you can now use these variables (Variable_name_column) in your .qmd file for plotting or other purposes.

3.1. Plot the actin only and the actin plus 3 μM Profilin curves as lines in one graph (use type = "l", include axis labels with units, a title, and a legend). See Part 2 above for explanations on making plots.

The concentration of actin is constant for each reaction, however, the amount of profilin varies. The profilin concentration is listed in the column headers. At 3 µM profilin (column 6), there is a slight molar excess of profilin relative to actin.
The units in the .csv data file are seconds for time and arbitrary units (A.U.) for the Pyrene Fluorescence listed in the other columns. Make sure to include units in your axis labels.
Assign colors to the curve(s) by using the following code to set up a variable cl with a rainbow of indexed colors. When setting the colors of a curve, use col=cl[1] (where the index refers to one of the 2 colors) directly within your parentheses when plotting. We’ve set up the color rainbow with only two colors but when plotting multiple lines, you would increase the rainbow number to as many lines as you are plotting.
```
cl<-rainbow(2)
plot(Variablename$Time, Variablename$ActinOnly, type = "l", col = cl[1], xlab = "Time (s)", ylab = "Pyrene Fluorescence (A.U.)", main = "Actin Only vs Actin + 3 uM Profilin")
lines(Variablename$Time, Variablename$p3uM, type = "l", col = cl[2])
legend("topleft", legend = c("Actin Only", "Actin + 3 uM Profilin"), col = cl, lwd = 2)
```
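
When more than two curves are plotted, the same color setup scales by increasing the number passed to rainbow(). The sketch below shows one possible pattern using simulated straight lines. The slopes, the labels, and the loop structure are assumptions for illustration, not the required approach for the lab.

```r
# Simulated curves only; the slopes below are arbitrary, not measured rates
time   <- seq(0, 3000, by = 100)
slopes <- c(0.15, 0.10, 0.05)
labels <- c("curve 1", "curve 2", "curve 3")

cl <- rainbow(length(slopes))   # one color per curve

# Draw the first curve with plot(), then add the rest with lines()
plot(time, 100 + slopes[1] * time, type = "l", col = cl[1],
     ylim = c(100, 600), xlab = "Time (s)", ylab = "Fluorescence (A.U.)")
for (i in 2:length(slopes)) {
  lines(time, 100 + slopes[i] * time, col = cl[i])
}
legend("topleft", legend = labels, col = cl, lwd = 2)
```

Passing the whole cl vector to legend() keeps the legend colors in the same order as the curves.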

3.2. Describe what you see.

A) Examine the slopes of the two lines. Is the rate of actin assembly faster or slower in the presence of profilin?

ANSWER HERE: The rate of actin assembly is slower in the presence of profilin, which is observed based on the shallower slope of the cyan line.

B) What is the role of profilin during the nucleation phase? (Look up the roles of profilin in actin assembly.)

ANSWER HERE: During nucleation, profilin binds to actin monomers and prevents them from forming new actin nuclei, which allows the cell to maintain a large store of actin monomers.

C) What is the role of profilin during the elongation phase?

ANSWER HERE: During elongation, profilin delivers actin monomers to the barbed end of the filament.

In week 5, be prepared to explain the effect(s) of profilin on the actin assembly rate (slope).

Perform a linear fit to the experimental data

Now we want to fit a line through each graph and calculate its slope by performing a linear least squares regression. The slope will give us the assembly rates so that we can quantitate how much profilin affects actin assembly. Additionally, we want to inspect the R² value (coefficient of determination) to determine the overall model performance. The R² represents the amount of variation in y explained by x in a given data set. In our case, each fit is fluorescence versus time, so the R² is interpreted as the proportion of the variance in the dependent variable (fluorescence) that is predictable from the independent variable (time). An R² of 0 means that the dependent variable (fluorescence) cannot be predicted from the independent variable (time). An R² of 1 means the dependent variable can be predicted without error from the independent variable. An R² of 0.10 means that 10 percent of the variance in y is explainable by x.

A summary of relevant R syntax:

R syntax for linear least squares regression

# don't remove eval = FALSE from the example code but do remove it when writing your own code chunk

fit <- lm(y ~ x) #(lm=linear model), y=variable on y-axis, x=variable on x-axis

Intercept, slope and R²

intercept<-fit$coefficients[1]
slope<-fit$coefficients[2]
R2<-summary(fit)$r.squared

Plot linear fit

plot(x,y,cex=0.3,ylim=c(100,600)) # plot experimental data
# cex=0.3 makes small dots for points
# ylim sets the same y-axis range for each plot

abline(fit,col="red") #draws best fit line in red

legend("topleft", bty="n", cex=0.7, legend=c(paste("slope is", format(slope, digits=3)),paste("R2 is", format(R2, digits=3))))
# bty = "n" suppresses making a box around the legend
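
The R² reported by summary() can also be checked by hand, which may help when interpreting it. The sketch below fits a line to simulated data and computes R² as 1 minus the ratio of the residual sum of squares to the total sum of squares. The data and the names t_sim, f_sim, fit_sim, and r2_manual are made up for this illustration.

```r
set.seed(42)
t_sim <- seq(0, 1000, by = 50)                           # stand-in for time (s)
f_sim <- 100 + 0.2 * t_sim + rnorm(length(t_sim), sd = 5) # simulated noisy linear signal

fit_sim <- lm(f_sim ~ t_sim)

ss_res <- sum(residuals(fit_sim)^2)     # variation left unexplained by the line
ss_tot <- sum((f_sim - mean(f_sim))^2)  # total variation in the y values
r2_manual <- 1 - ss_res / ss_tot

r2_manual
summary(fit_sim)$r.squared              # should agree with r2_manual
```

The two values agree because this ratio is how R² is defined for a linear model with an intercept.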

3.3. Perform linear fits for the actin alone curve and for the actin + 3 µM profilin curve. Then, on two separate plots, for each plot, show: (a) the data points, (b) the best fit line, and (c) the slope and r-squared value for the best fit line.

# Actin Only linear fit
x <- Variablename$Time
y <- Variablename$ActinOnly

fit <- lm(y ~ x)
slope <- coef(fit)[2]
R2 <- summary(fit)$r.squared

plot(x, y,
     cex = 0.3,
     ylim = c(100, 600),
     xlab = "Time (s)",
     ylab = "Pyrene Fluorescence (A.U.)",
     main = "Actin Only Linear Fit")

abline(fit, col = "red", lwd = 2)

legend("topleft",
       bty = "n",
       cex = 0.8,
       legend = c(
         paste("Slope =", round(slope,3)),
         paste("R2 =", round(R2,3))
       ))

# Actin + 3 uM Profilin linear fit
x <- Variablename$Time
y <- Variablename$p3uM

fit <- lm(y ~ x)
slope <- coef(fit)[2]
R2 <- summary(fit)$r.squared

plot(x, y,
     cex = 0.3,
     ylim = c(100, 600),
     xlab = "Time (s)",
     ylab = "Pyrene Fluorescence (A.U.)",
     main = "Actin + 3 uM Profilin Linear Fit")

abline(fit, col = "blue", lwd = 2)

legend("topleft",
       bty = "n",
       cex = 0.8,
       legend = c(
         paste("Slope =", round(slope,3)),
         paste("R2 =", round(R2,3))
       ))

During the week 5 lab, you will plot the curves and calculate the individual slopes for actin assembly in the presence of a range of profilin concentrations. From this data, you will be able to estimate the binding affinity of profilin for G-actin.