This is an individual assignment, but you are allowed to work together in groups and discuss coding and answers. That said, you are responsible for all of the material in this laboratory assignment. DO NOT COPY from anyone that you work with. You are NOT allowed to share code. You need to write the code and answer the questions yourself. Try the coding yourself first before seeking help.
Be sure to include your name in the file name as follows: lastname_firstname_labday.qmd. Also, type your full name in the quotes on line 3 where it says “author.”
Due Monday week 5 (04/20/26) by 1pm CDT. Please upload both the .qmd and the rendered html files.
For any questions regarding coding material, please contact the Quantitative Biosciences Center (QBC) or Professor Michael Walsh, or come to QBC office hours in BSLC 401. Dr. Walsh will have office hours Monday, Wednesday, and Friday from 12pm to 1pm (BSLC 211).
Quantitative Biosciences Center Hours: M-T-W-TR 5:30-7:30pm in BSLC 401, and Sun 5:30-7:30pm on zoom! https://college.uchicago.edu/academics/quantitative-biosciences-center
Organize your code and answers clearly in one .qmd file. Enter all answers to boldface questions as comments in the code.
Below is a review of the R concepts you will be using for this lab.
Data values of the same type can be grouped into larger collections called vectors. Think of a vector as a box: it has a length (the number of values it can hold) and you (the coder) can add things to the box, change the size (length) of the box, edit the items in the box, or take them out as you please. Having “boxes” that you can sort values into and manipulate is one of the most useful tools at your disposal.
First, let’s discuss the ways you can create a vector.
If you have a small set of values you want to enter manually, you can use the c() function. Here, c stands for combine (often described as concatenate): it squishes all of your values into one vector.
myvector <- c(2,6,8,1,100,49850)
print(myvector)
[1] 2 6 8 1 100 49850
If you have numerical data that follows a fixed pattern there are a couple of things you can do:
#you can define a vector, similar to the way that you create a variable, but instead listing a range of numbers between a start and an end point, counting up by 1s.
myvector <- 3:9
print(myvector)
[1] 3 4 5 6 7 8 9
#the seq command will give you all of the numbers between a start and end point, counting by a fixed step (increment).
#the syntax is as follows: seq(startingvalue,endingvalue,countby)
myvector <- seq(0,100,10)
print(myvector)
[1] 0 10 20 30 40 50 60 70 80 90 100
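seq() can also take the number of values you want instead of the step size, through its length.out argument. R then works out the spacing. The sketch below is minimal, and the variable names are only illustrative.

```r
# seq() with a step size: from 0 to 1 in steps of 0.25
by_step <- seq(0, 1, by = 0.25)

# seq() with a fixed number of values: 5 evenly spaced values from 0 to 1
by_count <- seq(0, 1, length.out = 5)

print(by_step)    # 0.00 0.25 0.50 0.75 1.00
print(by_count)   # the same five values
```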
One particularly useful way to make a vector is to create an empty vector of all 0s. This uses the numeric() function, where the number inside the parentheses is the length of your vector. In other words, this allows you to specify how long an empty vector is.
Whenever you are calculating a large set of data points, having an “empty box” of a vector is helpful if you want to store the numbers you calculated.
myvector <- numeric(10) #the numeric here just tells R to create a vector that stores the numeric data type
print(myvector)
[1] 0 0 0 0 0 0 0 0 0 0
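The loop below is a sketch of how the "empty box" gets filled. It stores one calculated value in each position of a vector made with numeric(). The name squares and the calculation (squaring each index) are only illustrative, and any calculation could go inside the loop.

```r
# Make an "empty box" with room for 10 numbers (all start at 0)
squares <- numeric(10)

# Fill each position with a calculated value: here, the square of its index
for (i in 1:10) {
  squares[i] <- i^2
}

print(squares)   # now holds 1, 4, 9, ..., 100 instead of zeros
```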
Once you’ve made your vector, there are lots of ways to manipulate it in order to extract data. For this section, let’s create a simple vector, x, from 5 to 20, increasing by 1s.
x <- (5:20)
x
[1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Here is a quick list of the things you can do with your vector. Please note that this list is not exhaustive. It is only meant to show you some of the options available to you. Vector manipulation is one of the best things about R: many of these operations would be tedious to do by hand, but in R they take a single line of code and run in seconds.
Since this chunk has many functions in it, we have turned off its evaluation for convenience, so it will not run or show output when the document is rendered. If you ever need to do this, you can type {r, eval=FALSE} at the top of your chunk. Please remove the eval=FALSE and run the code chunk below:
length(x) #finds the number of data points in your vector
x[2] #indexes a location within your vector. For example, this returns the second number in vector x
x[1:5] #returns the values in locations 1 through 5
x[3] <- 2 #reassigns one of the values (at position 3) to a new value (2)
x[-4] #returns all elements of the vector except the element at index 4; x itself is not changed. To keep the result, assign it to a new vector:
NewVec1 <- x[-4] #creates a new vector with all the elements of vector x except the element at index 4.
x + 10 #adds 10 to every value in the original x vector (x itself is not changed)
NewVec2 <- x+10 # creates a new vector with 10 added to every value in the vector
Remember that vectors are also variables, so they can be reassigned, manipulated, or copied.
vector1 <- c(1,3,5,7) #creates new vector 1,3,5,7
vector2 <- vector1 #creates a new vector called vector2, with the same values as vector 1
print(vector1)
[1] 1 3 5 7
print(vector2)
[1] 1 3 5 7
#we can now manipulate vector1 without changing the values in vector2
vector1 <- vector1 + 5
# in the above line, we're updating the value of vector 1 to be the values of vector1 + 5
#vector1 has been updated, but vector2 remains the same
print(vector1)
[1] 6 8 10 12
print(vector2)
[1] 1 3 5 7
What if we want to find the sum of the elements of our vector? Luckily, there is a built-in function in R that calculates the sum of the objects input into it. Its name is, appropriately, sum(). Try the following command to find the sum of the numbers contained in x:
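sum(x) #remember that x contains the integers from 5 to 20, inclusive.
#If you ran the chunk above, the value at position 3 was changed from 7 to 2 and sum(x) is 195;
#the output shown here (200) is the sum of the unmodified 5:20. Does the sum (and the following commands) make sense?
[1] 200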
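The sketch below checks the arithmetic. It recomputes the sum before and after the reassignment from the earlier chunk. It also confirms that a negative index leaves the original vector unchanged. The sketch uses separate vectors, z and w, so that it does not alter x for the commands that follow.

```r
z <- 5:20        # the integers 5 through 20 (16 values)
sum(z)           # 5 + 6 + ... + 20 = 200

z[3] <- 2        # position 3 held 7; replace it with 2
sum(z)           # 200 - 7 + 2 = 195

w <- z[-4]       # new vector without the 4th element
length(z)        # still 16: z[-4] did not modify z
length(w)        # 15
```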
Once you have a vector, you can access individual elements of the vector by indexing. Note that index values in R start at 1, not 0! The index is simply the position of an item in the vector. For example, to obtain the first element in a vector, you type something like the following:
x[1]
[1] 5
This can be repeated for any value that you care about in the vector:
x[3]
[1] 7
x[6]
[1] 10
x[9]
[1] 13
So x[1] will give you the first item that you entered into your vector, x[2] will give you the second item, etc. If you want to pick out a range of elements (say, the elements from position 3 to position 7), you can use a colon when subsetting the vector:
x[3:7]
[1] 7 8 9 10 11
1.1 Using c( ), create a new character vector (vector of strings) containing your first name, last name, and the string "BIOS20186".
vec1 <- c("Sadie", "Conway", "BIOS20186")
1.2 Using indexing, return your last name from the vector you just made.
vec1[2]
[1] "Conway"
1.3 Using seq(), create a vector containing numbers 1 to 10 with the interval of 0.25 (Hint: use the help option to learn about this built-in function in R).
seq(1, 10, 0.25)
 [1]  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  3.75
[13]  4.00  4.25  4.50  4.75  5.00  5.25  5.50  5.75  6.00  6.25  6.50  6.75
[25]  7.00  7.25  7.50  7.75  8.00  8.25  8.50  8.75  9.00  9.25  9.50  9.75
[37] 10.00
1.4 What are the outputs for each of the following commands? Describe what each of the following commands is doing in a comment next to each line of code. Hint: try the commands one at a time.
# Remove the "eval = FALSE" before running the code chunk
myVec <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100) #Creates a vector of the numbers 10 to 100, counting by 10s
myVec[3:8] #Returns the values in positions 3 through 8
[1] 30 40 50 60 70 80
myVec[-3] #Returns all elements except the 3rd (myVec itself is not changed)
[1] 10 20 40 50 60 70 80 90 100
myVec[c(3,5,9)] #Returns the elements in positions 3, 5, and 9
[1] 30 50 90
myVec[4] <- 22 #Replaces the 4th element (40) with the value 22
7+myVec #Adds 7 to every value in the vector (the 4th value is now 22, so it becomes 29)
[1] 17 27 37 29 57 67 77 87 97 107
The primary tool for data visualization in R is the plot() function. To make a plot, R needs a vector of x values and either a vector with the same number of y values or an expression that computes y from the x values.
While a lot of biologists use R to graph experimentally collected data, many will also use R or another analytic tool to test potential mathematical models for their data. You will be using R for the latter purpose.
An example of using the plot() function is as follows:
x <- seq(0, 100, 1) #Creates a vector of x values from 0 to 100, spaced by 1
plot(x, x^(2)-20, type = "l", xlab = "My x values", ylab = "My y values")
legend("topleft",c("Meaningless Legend", "oooh second color"),fill=c(1, 5))
We’ve already gone over how to manipulate data in the form of vectors. However, real data most often does not come in vectors and is not usually immediately ready for analysis. The most common form data comes in is called a dataframe: a table of data with numbered rows and named columns (think of an Excel sheet).
Dataframes have two dimensions, as opposed to vectors, which have only one. To access a specific point in a dataframe, you need to give it two indexes: the row index and the column index. In code form, this is dfname[row_index, col_index]. If we want to access an entire row or column, we leave the other index blank: dfname[row_index, ] or dfname[ , col_index].
Most dataframes have named columns (and sometimes named rows), which makes data access considerably easier if we know the name of the column or row we want. To access a column by name we use dfname$col_name; to access a row by name we use brackets, dfname["row_name", ], because $ only works for columns.
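The sketch below shows these access patterns on a small, made-up dataframe. The species names, masses and row names are invented for illustration. The row names are set with the row.names argument of data.frame().

```r
# A small, made-up dataframe: 3 rows, 2 named columns, named rows
df <- data.frame(species = c("Adelie", "Gentoo", "Chinstrap"),
                 mass_g  = c(3750, 5000, 3700),
                 row.names = c("bird1", "bird2", "bird3"))

df[2, 2]          # one value: row 2, column 2
df[2, ]           # all of row 2
df[, "mass_g"]    # all of the column named mass_g
df$mass_g         # the same column, using $
df["bird3", ]     # a row by its name: brackets, not $
```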
In the following example, we will use penguin data from the package palmerpenguins, which contains a dataframe called penguins.
If you do not have the package installed, you will need to type install.packages("palmerpenguins") into your R console.
#you may need to run install.packages("palmerpenguins") first
library(palmerpenguins)
Warning: package 'palmerpenguins' was built under R version 4.5.3
Attaching package: 'palmerpenguins'
The following objects are masked from 'package:datasets':
penguins, penguins_raw
#Print out a small subset of our dataframe
head(penguins)
# A tibble: 6 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
# ℹ 2 more variables: sex <fct>, year <int>
#Create a vector with only the flipper lengths
lengths <- penguins$flipper_length_mm
#Printing the first ten flipper lengths
lengths[1:10]
#plotting bill length against bill depth
plot(penguins$bill_depth_mm, penguins$bill_length_mm, xlab="bill depth (mm)", ylab="bill length (mm)", main="Penguin Characteristics", pch=19, col="lightblue")
2.1 (a) Load the data file called coronadata.csv into R using the read.csv() command. This data is from and contains the number of new cases of COVID-19 in China and Italy during the first 99 days of 2020.
data1 <- read.csv("coronadata.csv") # assumes coronadata.csv is in the same folder as this .qmd file
data1 # print the dataframe
  Day China Italy
1 1 0 0
2 2 0 0
3 3 17 0
4 4 0 0
5 5 15 0
6 6 0 0
7 7 0 0
8 8 0 0
9 9 0 0
10 10 0 0
11 11 0 0
12 12 0 0
13 13 0 0
14 14 0 0
15 15 0 0
16 16 0 0
17 17 4 0
18 18 17 0
19 19 136 0
20 20 19 0
21 21 151 0
22 22 140 0
23 23 97 0
24 24 259 0
25 25 441 0
26 26 665 0
27 27 787 0
28 28 1753 0
29 29 1466 0
30 30 1740 0
31 31 1980 3
32 32 2095 0
33 33 2590 0
34 34 2812 0
35 35 3237 0
36 36 3872 0
37 37 3727 0
38 38 3160 0
39 39 3418 0
40 40 2607 0
41 41 2974 0
42 42 2490 0
43 43 2028 0
44 44 15141 0
45 45 4156 0
46 46 2538 0
47 47 2007 0
48 48 2052 0
49 49 1890 0
50 50 1750 0
51 51 394 0
52 52 891 0
53 53 826 14
54 54 647 62
55 55 218 53
56 56 515 97
57 57 410 93
58 58 439 78
59 59 329 250
60 60 428 238
61 61 574 240
62 62 205 561
63 63 127 347
64 64 119 466
65 65 117 587
66 66 170 769
67 67 101 778
68 68 46 1247
69 69 45 1492
70 70 20 1797
71 71 29 977
72 72 24 2313
73 73 22 2651
74 74 19 2547
75 75 22 3497
76 76 25 2823
77 77 43 4000
78 78 23 3526
79 79 44 4207
80 80 99 5322
81 81 52 5986
82 82 65 6557
83 83 138 5560
84 84 69 4789
85 85 78 5249
86 86 102 5210
87 87 94 6153
88 88 119 5959
89 89 113 5974
90 90 98 5217
91 91 84 4050
92 92 54 4053
93 93 100 4782
94 94 70 4668
95 95 62 4585
96 96 48 4805
97 97 67 4316
98 98 56 3599
99 99 86 3039
2.1 (b) Create a lineplot with the Day on the x-axis and number of new incidences per day in China on the y axis. Add the following parameter: type = "l" in your plot() function to connect the dots by a line.
plot(data1$Day, data1$China, type = "l", col = "green", xlab = "Day", ylab = "New Incidences per Day", main = "New COVID-19 Cases per Day in China")
2.1 (c) Add the number of new incidences per day in Italy to the above plot. In other words, make one plot that contains the number of new incidences per day for both China and Italy. Use different colors for each data set. Hint: use the lines() function to add a new line to an existing plot.
plot(data1$Day, data1$China, type = "l", col = "green", xlab = "Day", ylab = "New Incidences per Day", main = "China vs Italy")
lines(data1$Day, data1$Italy, type = "l", col = "red")
legend("topleft", legend = c("China", "Italy"), col = c("green", "red"), lwd = 2)
Microfilaments are one of the three major cytoskeletal elements in eukaryotic cells, and they play a key role in many cellular processes such as cell division, cell motility, and cell shape maintenance. Microfilament structure results from nucleation, elongation, and an equilibrium of actin subunits adding to and leaving filaments. An example is shown in the graph to the right, taken from your textbook. During a slow initial lag phase, called nucleation, G-actin monomers bind to each other, forming small oligomeric “nuclei” or “seeds” made up of a few actin subunits. Additional monomers are added to the oligomeric seeds at a much faster rate. During this elongation phase, longer actin filaments are formed. These filaments are also referred to as F-actin, which is short for filamentous actin. The polymerization reaction is reversible, and thus the filament is a dynamic structure, losing and gaining actin monomers. Once the filament reaches a steady state (equilibrium), where the rate of monomer addition equals the rate of depolymerization, the filament no longer grows. This is seen as a plateau on the graph.

Inside the cell, polymerization is not an unregulated spontaneous process. It is carefully orchestrated and regulated by the action of many proteins that allow for polymerization as needed in the cell. Functions for these proteins include maintenance of a large pool of free actin monomer, promoting or blocking nucleation events, accelerating or terminating elongation of the polymer, severing and disassembly of filaments and bundling filaments.
In this lab, you will investigate the actin-binding protein, profilin, and its binding affinity for G-actin.
Look up some of the functions of profilin, especially with regard to nucleation and elongation.
The binding affinity of two proteins is typically described by an equilibrium dissociation constant, Kd (also written as KD). The binding of profilin to actin is a reversible reaction. Written as a dissociation reaction,
\(AP\leftrightarrow A + P\)
where A, P, and AP are monomeric actin, profilin, and profilin-bound actin, respectively, the equilibrium dissociation constant equals the product of the concentrations of the free proteins divided by the concentration of the bound complex:
\(K_{d}\, = \frac{[A][P]}{[AP]}\)
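Here is a worked example of the formula in R. The equilibrium concentrations are made up for illustration and are not measured values.

```r
# Hypothetical equilibrium concentrations, in molar (M)
A_free <- 1e-6     # free actin monomer, [A]
P_free <- 2e-6     # free profilin, [P]
AP     <- 0.5e-6   # profilin-actin complex, [AP]

Kd <- (A_free * P_free) / AP   # Kd = [A][P]/[AP]
Kd                             # about 4e-06 M, i.e. 4 uM
```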
The Kd is reported in units of molarity (M), and it reflects the affinity of the two proteins for each other. As the affinity increases, the proportion of bound protein [AP] increases; thus, as [AP] increases, Kd decreases. A protein with a Kd in the nanomolar (\(10^{-9}\) M) range has a stronger affinity than a protein with a Kd in the micromolar (\(10^{-6}\) M) range.
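The same formula shows why a smaller Kd means tighter binding. Rearranging \(K_d = [A][P]/[AP]\) gives the fraction of actin that is in the complex: \([AP]/([A] + [AP]) = [P]/(K_d + [P])\), where [P] is the free profilin concentration. The sketch below compares a nanomolar and a micromolar Kd at the same free profilin concentration, a hypothetical 1 µM.

```r
# Fraction of actin in the complex, written in terms of free profilin:
# from Kd = [A][P]/[AP] it follows that [AP]/([A] + [AP]) = [P]/(Kd + [P])
fraction_bound <- function(P_free, Kd) {
  P_free / (Kd + P_free)
}

P_free <- 1e-6                      # hypothetical 1 uM free profilin
fraction_bound(P_free, Kd = 1e-9)   # nanomolar Kd: about 0.999 (almost all bound)
fraction_bound(P_free, Kd = 1e-6)   # micromolar Kd: 0.5 (half bound)
```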
We will determine the Kd value for profilin indirectly by measuring actin assembly rates as a function of free actin: the more free actin in the reaction, the faster the assembly rate. With the pyrene-actin fluorescence assay used in these experiments, the assembly rates reflect both nucleation and elongation events. We will focus on the initial portion of actin assembly, during which actin oligomers (“seeds”) form, but the observed fluorescence also includes contributions from the elongation of actin filaments extending from the seeds. Regardless, the assembly rate is a function of the concentration of free actin.
The modeling lab in week 5 will explain how to obtain the Kd value of profilin for G-actin from a plot of profilin concentration versus assembly rate. For the pre-lab, we are focusing on how to obtain assembly rates using a linear regression of the initial “nucleation” phase of the assembly curve.
In this assignment, you will plot data from a file containing the fluorescence vs. time data from two actin assembly reactions - one reaction performed with actin only, and another reaction where 3 µM profilin was included. This data is from pyrene-actin assays similar to what section 1 performed in lab on week 3.
In the week 5 lab we will filter the data to limit the analysis to the linear portion of the actin assembly curve and we will perform a linear regression to find a best fit line for these linear portions of the curve. The slope value of the line is the assembly rate.
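In outline, limiting the analysis to the linear portion could mean keeping only the rows inside a chosen time window and fitting a line to those rows. The sketch below uses made-up data and a hypothetical cutoff of 1000 s. The actual window and column names will be whatever the week 5 lab specifies.

```r
# Made-up assembly curve: rises linearly until 1000 s, then plateaus
Time  <- seq(0, 3000, by = 15)
Fluor <- ifelse(Time <= 1000, 100 + 0.2 * Time, 300)
assay <- data.frame(Time = Time, Fluor = Fluor)

cutoff <- 1000                              # hypothetical end of the linear phase (s)
early  <- assay[assay$Time <= cutoff, ]     # keep only rows inside the window

fit_early <- lm(Fluor ~ Time, data = early) # straight-line fit to the window only
coef(fit_early)["Time"]                     # slope = assembly rate, 0.2 A.U. per s here

fit_all <- lm(Fluor ~ Time, data = assay)   # fit to the whole curve, plateau included
coef(fit_all)["Time"]                       # smaller slope: the plateau pulls it down
```

Fitting the whole curve, plateau included, underestimates the initial rate. This is why the choice of window matters.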

Read in and Plot Experimental Data for a Range of Profilin Concentrations
Handling Experimental Data
Import the data file in your folder that you downloaded from Canvas and unzipped.
The data file Profilin_AssemblyCurves.csv should be in the folder with this document. Do not move it out of this folder. This file contains fluorescence readings collected from in vitro pyrene-actin assays, where the concentration of actin was kept constant and the concentration of profilin was varied. The dataset includes readings from the first 3000 seconds of a polymerization reaction, which is primarily during the nucleation phase. Keep in mind that with pyrene assays, the increase in fluorescence includes both nucleation and elongation events.
The following command will read the data file into a variable. The example is Variablename, but you can choose something more informative.
Variablename <- read.csv("Profilin_AssemblyCurves.csv") # the file sits in the same folder as this .qmd file
Read in the file by assigning the whole .csv file to a variable name. The result is a dataframe with 201 rows and 12 columns. You can then work with its columns in either of two ways:
1. Assign each column to a specific variable
Variable_name_column <- Variablename$columnName
2. Work with the dataframe itself, remembering the order of the columns and what each column corresponds to.
Variable_name_column <- Variablename[,2]
Regardless of which way you access your data, you can now use these variables (Variable_name_column) in your .qmd file for plotting or other purposes.
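Both approaches return the same vector. The sketch below checks this on a tiny, made-up dataframe. The numbers are invented, and the column names only mirror the Time and ActinOnly columns used later. Accessing by name is often safer because it does not depend on remembering the column order.

```r
# A tiny made-up dataframe laid out like the assay file: Time first, then readings
readings <- data.frame(Time = c(0, 15, 30),
                       ActinOnly = c(100, 104, 109))

by_name  <- readings$ActinOnly   # access by column name
by_index <- readings[, 2]        # access by column position (2nd column)

identical(by_name, by_index)     # TRUE: both give the same vector
```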
3.1. Plot the actin only and the actin plus 3 μM profilin curves as lines in one graph (use type = "l", include axis labels with units and a title, and add a legend). See Part 2 above for explanations on making plots.
The concentration of actin is constant for each reaction, however, the amount of profilin varies. The profilin concentration is listed in the column headers. At 3 µM profilin (column 6), there is a slight molar excess of profilin relative to actin.
The units in the .csv data file are seconds for time and arbitrary units (A.U.) for the pyrene fluorescence listed in the other columns. Make sure to include units in your axis labels.
Assign colors to the curve(s) by using the following code to set up a variable cl with a rainbow of indexed colors. When setting the color of a curve, use col=cl[1] (where the index refers to one of the 2 colors) inside the parentheses of your plot() or lines() call. We’ve set up the rainbow with only two colors, but when plotting more lines, you would increase the number passed to rainbow() to match the number of lines you are plotting.
cl<-rainbow(2)
plot(Variablename$Time, Variablename$ActinOnly, type = "l", col = cl[1], xlab = "Time (s)", ylab = "Pyrene Fluorescence (A.U.)", main = "Actin Only vs Actin + 3 uM Profilin")
lines(Variablename$Time, Variablename$p3uM, type = "l", col = cl[2])
legend("topleft", legend = c("Actin Only", "Actin + 3 uM Profilin"), col = cl, lwd = 2)
3.2. Describe what you see.
A) Examine the slopes of the two lines. Is the rate of actin assembly faster or slower in the presence of profilin?
ANSWER HERE: The rate of actin assembly is slower in the presence of profilin, which is observed based on the shallower slope of the cyan line.
B) What is the role of profilin during the nucleation phase? (Look up the roles of profilin in actin assembly.)
ANSWER HERE: During nucleation, profilin binds to actin monomers and prevents them from forming new actin nuclei, which allows the cell to maintain a large pool of actin monomers.
C) What is the role of profilin during the elongation phase?
ANSWER HERE: During elongation, profilin delivers actin monomers to the barbed end of the filament.
In week 5, be prepared to explain the effect(s) of profilin on the actin assembly rate (slope).
Perform a linear fit to the experimental data
Now we want to fit a line through each graph and calculate its slope by performing a linear least squares regression. The slope gives us the assembly rate, so that we can quantify how much profilin affects actin assembly. We also want to inspect the R² value (coefficient of determination) to judge how well the line fits the data. R² is the proportion of the variation in y that is explained by x in a given data set. In our fits, R² is the proportion of the variance in the dependent variable (fluorescence) that is predictable from the independent variable (time). An R² of 0 means that the line cannot predict fluorescence from time at all; an R² of 1 means fluorescence can be predicted from time without error. An R² of 0.10 means that 10 percent of the variance in y is explained by x.
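To make the definition concrete, R² can be computed by hand and compared with the value lm() reports. The formula is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up data with a small, fixed scatter around a straight line. It uses its own variable names so it does not overwrite x, y, or fit.

```r
# Made-up data: a straight line plus a small, fixed scatter
x_demo <- 1:10
y_demo <- 3 + 2 * x_demo + c(0.5, -0.3, 0.2, -0.6, 0.1, 0.4, -0.2, 0.3, -0.5, 0.1)

fit_demo <- lm(y_demo ~ x_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res <- sum(residuals(fit_demo)^2)
ss_tot <- sum((y_demo - mean(y_demo))^2)
R2_by_hand <- 1 - ss_res / ss_tot

R2_by_hand
summary(fit_demo)$r.squared   # matches the hand calculation
```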
A summary of relevant R syntax:
R syntax for linear least squares regression
# don't remove eval = FALSE from the example code but do remove it when writing your own code chunk
fit <- lm(y ~ x) #(lm=linear model), y=variable on y-axis, x=variable on x-axis
Intercept, slope and R2
intercept<-fit$coefficients[1]
slope<-fit$coefficients[2]
R2<-summary(fit)$r.squared
Plot linear fit
plot(x,y,cex=0.3,ylim=c(100,600)) # plot experimental data
# cex=0.3 makes small dots for points
# ylim sets the same y-axis range for each plot
abline(fit,col="red") #draws best fit line in red
legend("topleft", bty="n", cex=0.7, legend=c(paste("slope is", format(slope, digits=3)),paste("R2 is", format(R2, digits=3))))
# bty = "n" suppresses making a box around the legend
3.3. Perform linear fits for the actin alone curve and for the actin + 3 µM profilin curve. Then make two separate plots, each showing: (a) the data points, (b) the best fit line, and (c) the slope and R² value for the best fit line.
# Actin Only linear fit
x <- Variablename$Time
y <- Variablename$ActinOnly
fit <- lm(y ~ x)
slope <- coef(fit)[2]
R2 <- summary(fit)$r.squared
plot(x, y,
cex = 0.3,
ylim = c(100, 600),
xlab = "Time (s)",
ylab = "Pyrene Fluorescence (A.U.)",
main = "Actin Only Linear Fit")
abline(fit, col = "red", lwd = 2)
legend("topleft",
bty = "n",
cex = 0.8,
legend = c(
paste("Slope =", round(slope,3)),
paste("R2 =", round(R2,3))
))
# Actin + 3 uM Profilin linear fit
x <- Variablename$Time
y <- Variablename$p3uM
fit <- lm(y ~ x)
slope <- coef(fit)[2]
R2 <- summary(fit)$r.squared
plot(x, y,
cex = 0.3,
ylim = c(100, 600),
xlab = "Time (s)",
ylab = "Pyrene Fluorescence (A.U.)",
main = "Actin + 3 uM Profilin Linear Fit")
abline(fit, col = "blue", lwd = 2)
legend("topleft",
bty = "n",
cex = 0.8,
legend = c(
paste("Slope =", round(slope,3)),
paste("R2 =", round(R2,3))
))
During the week 5 lab, you will plot the curves and calculate the individual slopes for actin assembly in the presence of a range of profilin concentrations. From this data, you will be able to estimate the binding affinity of profilin for G-actin.