Debugging in programming refers to the process of identifying and fixing errors, or bugs, in the code.
Debugging is a crucial skill for developers as it ensures that programs run correctly and produce the expected results.
Why do we spend time debugging?
In this lesson, we will find and correct bugs in functions that we wrote ourselves or that someone else wrote.
This principle emphasizes the importance of confirming the existence of a bug before attempting to fix it. Many times, what appears to be a bug may actually be a misunderstanding of how the code is supposed to work.
x <- 5
y <- 10
z <- x+y
print(z)
## [1] 15
In this example, before assuming there’s a bug, we confirm whether the result of z is what we expect by printing it.
This principle suggests repeating the process that leads to the bug consistently so that you can observe the bug each time. Using a small and consistent dataset or setting a random seed can help achieve this.
# Example
set.seed(123) # Set a random seed for reproducibility
data <- rnorm(10) # Generate random data
print(mean(data)) # Check the mean of the data
## [1] 0.07462564
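As a small sketch of why the seed matters, the snippet below draws the same random numbers twice after resetting the seed, then draws a third time without resetting it. The variable names (first_run, second_run, third_run) are illustrative only and do not come from the lesson.

```r
# Same seed, same draws: the two vectors are identical.
set.seed(123)
first_run <- rnorm(10)
set.seed(123)
second_run <- rnorm(10)
identical(first_run, second_run)  # TRUE

# Without resetting the seed, the next draws differ.
third_run <- rnorm(10)
identical(first_run, third_run)   # FALSE
```

If a bug only shows up for particular random data, fixing the seed like this lets you see it on every run.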
When debugging, it’s often helpful to start with a small and simple version of the code to isolate the issue. Once the problem is identified, gradually expand to more complex scenarios.
Use an if statement to check the value of an input or an intermediate result
Or use stopifnot()
Use print()
Copy and paste the function into a new R script file
This refers to traditional methods of debugging, such as inserting print() statements or manually inspecting variables to understand the flow of the program and identify potential issues.
Printing the values of variables or expressions at various points in the code helps track their values and behavior, aiding in identifying discrepancies or unexpected outcomes.
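A minimal sketch of print-style debugging follows. The function running_total() is hypothetical, written only for this illustration, and it uses cat() rather than print() so that several values fit on one line. The temporary output shows the iteration at which the total stops being a number.

```r
# Hypothetical helper: running total of a numeric vector.
running_total <- function(values) {
  total <- 0
  for (i in seq_along(values)) {
    total <- total + values[i]
    # Temporary debugging output: remove once the behaviour is understood.
    cat("i =", i, " value =", values[i], " total =", total, "\n")
  }
  total
}

# The NA in position 3 turns the total into NA from that iteration onwards.
result <- running_total(c(2, 4, NA, 8))
print(result)
```

Reading the printed lines top to bottom locates the first iteration where the value departs from what you expected.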
Sometimes, copying the code into a new R script file can help identify issues that may be caused by external factors such as hidden characters, encoding problems, or interactions with other parts of the code.
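The two checking styles from the list above (an if statement, or stopifnot()) might look like the sketch below. The function safe_log_ratio() and its arguments are assumptions made up for this example. The point is only to show where such checks can sit, on inputs and on an intermediate value.

```r
# Hypothetical function, used only to illustrate the two styles of check.
safe_log_ratio <- function(x, y) {
  # Style 1: an explicit if statement with a custom error message.
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must both be numeric")
  }
  # Style 2: stopifnot() for compact checks on the inputs.
  stopifnot(length(x) == length(y), all(y > 0))

  ratio <- x / y
  # The same idea applied to an intermediate value.
  stopifnot(all(ratio > 0))
  log(ratio)
}

safe_log_ratio(10, 5)      # log(2)
# safe_log_ratio(10, 0)    # would stop: all(y > 0) is not TRUE
```

An if statement with stop() gives a custom message, while stopifnot() is shorter and reports the failing expression itself.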
Demo example:
Goal: Perform a t-test on each gene and report the gene with the smallest p-value
#Simulate
n_tests <- 10000
rep <- 5
n <- n_tests * rep
#set.seed(1)
ctrl <- matrix(rnorm(n, mean = 10), nrow = n_tests)
rownames(ctrl) <- paste("gene_", 1:n_tests, sep ="")
colnames(ctrl) <- paste("subject_", 1:rep, sep = "")
#set.seed(2)
trt <- matrix(rnorm(n, mean = 13), nrow = n_tests)
rownames(trt) <- paste("gene_", 1:n_tests, sep ="")
colnames(trt) <- paste("subject_", 1:rep, sep = "")
trt[n_tests]
## [1] 13.4528
#initialize the smallest p value seen so far at 1
min.pval <- 1
gene.ind <- 0
for (i in 1:n_tests) {
  #Perform a t test for two groups of data
  my.test <- t.test(ctrl[i, ], trt[i, ])
  #extract p value
  my.pval <- my.test$p.value
  if (my.pval < min.pval) {
    min.pval <- my.pval
    gene.ind <- i
  }
}
print(min.pval)
## [1] 5.059427e-09
print(gene.ind)
## [1] 3580
n_tests <- 10
rep <- 3
n <- n_tests * rep
set.seed(1)
ctrl <- matrix(rnorm(n), nrow = n_tests)
set.seed(2)
trt <- matrix(rnorm(n, mean = 10), nrow = n_tests)
trt[n_tests,]
## [1] 9.861213 10.432265 10.289637
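On a dataset this small, the loop's answer can be checked against an independent calculation. The sketch below is one possible cross-check, not part of the original demo: it reruns the same loop on the seeded 10-gene data and compares it with the minimum over all p values. The names n_rep and all.pvals are introduced here for illustration.

```r
# Small, seeded dataset, as in the reduced example above.
n_tests <- 10
n_rep <- 3               # called n_rep here to avoid confusion with base::rep()
n <- n_tests * n_rep
set.seed(1)
ctrl <- matrix(rnorm(n), nrow = n_tests)
set.seed(2)
trt <- matrix(rnorm(n, mean = 10), nrow = n_tests)

# The loop from the demo, with the same logic.
min.pval <- 1
gene.ind <- 0
for (i in 1:n_tests) {
  my.pval <- t.test(ctrl[i, ], trt[i, ])$p.value
  if (my.pval < min.pval) {
    min.pval <- my.pval
    gene.ind <- i
  }
}

# Independent cross-check: compute every p value, then take the minimum.
all.pvals <- vapply(
  seq_len(n_tests),
  function(i) t.test(ctrl[i, ], trt[i, ])$p.value,
  numeric(1)
)
gene.ind == which.min(all.pvals)   # should be TRUE if the loop is correct
min.pval == min(all.pvals)         # should be TRUE as well
```

With only 10 rows it is also feasible to print all.pvals and inspect the values by eye before scaling back up to 10000 tests.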
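debug() flags a function so that every call to it opens the browser, letting you step through it line by line
undebug() removes that flag so the function no longer enters debug mode
browser() pauses execution at the point where it is called and starts the debugger from there
debugonce() You only have to run this once: the function is debugged on its next call only, so there is no need to undebug
trace() inserts debugging code (for example, a call to browser()) at a chosen place in a function without editing its source
untrace() removes the tracing added by trace()
traceback() run immediately after an error to print the sequence of calls that led to it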
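The stepping itself is interactive, but the debug flag can be shown in a script. The sketch below uses a hypothetical function add_five() and the base R function isdebugged() to show what debug() and undebug() switch on and off. The function is never called while flagged, so no browser session opens.

```r
# Hypothetical function to flag for debugging.
add_five <- function(x) {
  x + 5
}

debug(add_five)                   # every call to add_five() would now open the browser
flag_on <- isdebugged(add_five)   # TRUE
undebug(add_five)                 # remove the flag again
flag_off <- isdebugged(add_five)  # FALSE

print(c(flag_on, flag_off))

# debugonce(add_five) would instead set the flag for the next call only.
```

Checking isdebugged() is a quick way to find out why a function keeps dropping into the browser after a debugging session.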
my_function <- function(x, y) {
  z <- x * y + 5
  browser()
  w <- z / 0
  v <- log(w)
  return(v)
}
my_function(2,3)
## Called from: my_function(2, 3)
## debug: w <- z/0
## debug: v <- log(w)
## debug: return(v)
## [1] Inf
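To see why the result is Inf rather than an error, the sketch below recomputes the intermediate values you could inspect by typing z and w at the browser prompt. The variant my_function_checked() is hypothetical, added only to show how a check on an intermediate value would flag the division.

```r
# Values you would see by typing z and w at the browser prompt.
z <- 2 * 3 + 5   # 11
w <- z / 0       # Inf: dividing a positive number by zero is not an error in R
v <- log(w)      # Inf
is.finite(w)     # FALSE

# Hypothetical variant that stops as soon as the intermediate value is suspect.
my_function_checked <- function(x, y) {
  z <- x * y + 5
  w <- z / 0
  stopifnot(is.finite(w))   # fails here, pointing at the division
  log(w)
}
# my_function_checked(2, 3)  # would stop: is.finite(w) is not TRUE
```

Because R returns Inf silently, the bug only becomes visible when the intermediate value is inspected or checked.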
trace(my_function)
debug(my_function)
debugonce(my_function)
undebug(my_function)