Michael Czahor
Introduction
In class we briefly discussed debugging our functions to make sure they run properly. After speaking with Eric I came up with several ideas for testing the functions individually and simultaneously. What follows is an individual breakdown of each function, with code and written analysis, followed by an overall comparison of all four functions at the same time.
pattern_a <- function(y, x) {
# find binary pattern y in sequence x
browser()
if (!all(y %in% c(0,1))) stop("y is not a binary pattern")
if (!all(x %in% c(0,1))) stop("x is not a binary sequence")
# check that y isn't longer than x:
if (length(y) > length(x)) return(0)
# does x start with pattern y?
found <- all(y==x[1:length(y)])
# check the rest
return( found + pattern_a(y,x[-1]))
}
x=rbinom(8,1,.5)
y=rbinom(3,1,.5)
pattern_a(y,x)
Instructions to debug parts a-d
After Eric or Dr. Hofmann runs this code, he or she will see a Browse[1]> prompt appear in the console. Enter the character 'n' repeatedly to step through the code line by line. Stepping through this way, the function appears to work well. But this test is too easy, because I simply passed in random Bernoulli vectors with values of 0 and 1. Next I am going to give the function vectors that I know it won't like, to see what happens. Note: inside the browser you can enter the character 'c' to execute the remainder of the code. I chose 'n' to verify that a line-by-line run works for Bernoulli input, which it did. So now let us produce some errors. BEFORE PROCEEDING please close your browser by entering a capital 'Q'.
x=c(1,0,0,1,1,0,1,0,0,0,1,2)
y=c(1,0)
pattern_a(y,x)
A non-binary vector in part a
Notice I left the browser() call in pattern_a. When Browse[1]> appears, enter 'c' and you will get the message “Error in pattern_a(y, x) : x is not a binary sequence”. To see where the error is raised I suggest calling traceback(). It prints the call stack with the most recent call first, in output that looks like this:
2: stop(“x is not a binary sequence”) at #5
1: pattern_a(y, x)
We have seen that part a is adequate at noticing non-binary input, which in my opinion is a key piece of functionality for code that works with binary sequences. Next we will look at the situation in part a where the given pattern is longer than the x vector. Intuitively we know that this is just wrong. But how will our function act when we throw something like this at it? Proceed to the code below to see how pattern_a handles this task.
x=rbinom(100,1,.5)
y=rbinom(101,1,.5)
pattern_a(y,x)
An analysis on the y vector being larger than the x vector
Wow, we have come across our first curious sighting in this code. When Browse[1]> appears, enter 'c' to run through everything. You will get an answer of 0. Granted, this is correct in the sense that vector y does not appear in vector x, but the issue I have is that no error occurs. Looking at the code for pattern_a, we can easily see where this comes from.
# check that y isn't longer than x:
if (length(y) > length(x)) return(0)
Returning a value of 0 here is a very BAD idea. 0 is also the legitimate result when y fits inside x but never occurs in it, so the caller cannot tell the two situations apart when length(y) > length(x). There should instead be an error message saying “LENGTH OF VECTOR Y IS GREATER THAN LENGTH OF VECTOR X”.
Our last analysis of part a will be using a function as requested in the instructions to analyze multiple vectors at the same time for pattern_a.
pattern_a <- function(y, x) {
if (!all(y %in% c(0,1))) stop("y is not a binary pattern")
if (!all(x %in% c(0,1))) stop("x is not a binary sequence")
if (length(y) > length(x)) return(0)
found <- all(y==x[1:length(y)])
return( found + pattern_a(y,x[-1]))
}
analyzeA=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(2,1,.5)
print(pattern_a(y,x))
}
}
analyzeA(x,y)
Analysis on multiple vectors for pattern_a
The first thing to notice is that I removed the browser() call to speed up the analysis, since I know this will work. I generated 20 x vectors and 20 y vectors of lengths 25 and 2 respectively. The print statement inside the for loop prints 20 values, each the number of times y occurs in x. With a larger number of runs (say 30 or more), the central limit theorem would let us approximate the distribution of the average count with a normal distribution and use z-scores (with a continuity correction, since the counts are integers) to assess how often y occurs in x. I realize this has nothing to do with the assignment, but it is a very good statistical property to recognize. Finally for part a let's look at an error that can occur in a multiple vector function.
analyzeA=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(26,1,.5)
print(pattern_a(y,x))
}
}
analyzeA(x,y)
Errors for multiple vector analysis on part A
As we saw with single vectors, our output is 0 every time even though length(y) > length(x). I'm going to fix it real quick and then move on to pattern_b.
pattern_a <- function(y, x) {
if (!all(y %in% c(0,1))) stop("y is not a binary pattern")
if (!all(x %in% c(0,1))) stop("x is not a binary sequence")
if (length(y) > length(x)) return("Y IS LARGER THAN X! NOT POSSIBLE")
found <- all(y==x[1:length(y)])
return( found + pattern_a(y,x[-1]))
}
analyzeA=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(26,1,.5)
print(pattern_a(y,x))
}
}
analyzeA(x,y)
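Final Thoughts on Pattern A
The biggest problem in pattern_a was returning 0 when length(y) > length(x). The multivector example above changes this to return a message instead. Note that this only behaves well when y is longer than x from the start: the same check is also what ends the recursion for valid input, where a character return value would make found + pattern_a(y, x[-1]) fail. For that reason the original version is restored for the simultaneous analysis at the end.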
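A possible alternative to returning a message is to raise the error with stop() in a wrapper that validates the input once, while the recursive part keeps 0 as its base case. The sketch below assumes this split. The names pattern_a_checked and count_from are hypothetical and not part of the assignment code.

```r
# Hypothetical wrapper: validate the input once, then recurse in a helper.
pattern_a_checked <- function(y, x) {
  if (!all(y %in% c(0, 1))) stop("y is not a binary pattern")
  if (!all(x %in% c(0, 1))) stop("x is not a binary sequence")
  if (length(y) > length(x)) stop("length of y is greater than length of x")

  # Recursive helper: here 0 is only the base case that ends the recursion.
  count_from <- function(x) {
    if (length(y) > length(x)) return(0)
    found <- all(y == x[seq_along(y)])
    found + count_from(x[-1])
  }
  count_from(x)
}

# The pattern 1 0 occurs 4 times in this sequence.
pattern_a_checked(c(1, 0), c(1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0))
```

With this split, a pattern that is too long produces an error right away, and valid input still returns a numeric count.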
Pattern B
pattern_b <- function(y,x) {
browser()
result <- 0
if (!all(y %in% c(0,1))) stop("y is not a binary pattern")
if (!all(x %in% c(0,1))) stop("x is not a binary sequence")
if (length(y) > length(x)) return(0)
for (i in 0:(length(x)-length(y))) result <- result + all(y == x[i + (1:length(y))])
return(result)}
x=c(0,1,0,1,0,0,0,1,1,0,0,1,1,2,0,0,1,0,0,1)
y=rbinom(2,1,.5)
pattern_b(y,x)
A non-binary vector in part b
Notice I left the browser() call in pattern_b. When Browse[1]> appears, enter 'c' and you will get the message “Error in pattern_b(y, x) : x is not a binary sequence”. To see where the error is raised I suggest calling traceback(). Stepping with 'n' instead of 'c' also shows the offending line in the browser:
Browse[2]> n debug at #5: if (!all(x %in% c(0, 1))) stop(“x is not a binary sequence”)
We have seen that pattern b is adequate at noticing non-binary input, which in my opinion is a key piece of functionality for code that works with binary sequences. Next we will look at the situation in part b where the given pattern is longer than the x vector. Intuitively we know that this is just wrong. But how will our function act when we throw something like this at it? Proceed to the code below to see how pattern_b handles this task.
x=rbinom(100,1,.5)
y=rbinom(101,1,.5)
pattern_b(y,x)
An analysis on the y vector being larger than the x vector
Again, we come across the same curious sighting as in part a. When Browse[1]> appears, enter 'c' to run through everything. You will get an answer of 0. Granted, this is correct in the sense that vector y does not appear in vector x, but the issue I have is that no error occurs. Looking at the code for pattern_b, we can easily see where this comes from.
if (length(y) > length(x)) return(0)
Returning a value of 0 here is a very BAD idea. 0 is also the legitimate result when y fits inside x but never occurs in it, so the caller cannot tell the two situations apart when length(y) > length(x). There should instead be an error message saying “LENGTH OF VECTOR Y IS GREATER THAN LENGTH OF VECTOR X”.
Our last analysis of part b will be using a function as requested in the instructions to analyze multiple vectors at the same time for pattern_b.
pattern_b <- function(y,x) {
result <- 0
if (!all(y %in% c(0,1))) stop("y is not a binary pattern")
if (!all(x %in% c(0,1))) stop("x is not a binary sequence")
if (length(y) > length(x)) return(0)
for (i in 0:(length(x)-length(y))) result <- result + all(y == x[i + (1:length(y))])
return(result)}
analyzeB=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(2,1,.5)
print(pattern_b(y,x))
}
}
analyzeB(x,y)
Multiple Vector Analysis on Pattern B
The first thing to notice is that I removed the browser() call to speed up the analysis, since I know this will work. I generated 20 x vectors and 20 y vectors of lengths 25 and 2 respectively. The print statement inside the for loop prints 20 values, each the number of times y occurs in x. With a larger number of runs (say 30 or more), the central limit theorem would let us approximate the distribution of the average count with a normal distribution and use z-scores (with a continuity correction, since the counts are integers) to assess how often y occurs in x. I realize this has nothing to do with the assignment, but it is a very good statistical property to recognize. Finally for part b let's look at an error that can occur in a multiple vector function.
Errors for multiple vector analysis on part B
As we saw with single vectors, our output is 0 every time even though length(y) > length(x). I'm going to fix it real quick and then move on to pattern_c.
pattern_b <- function(y,x) {
result <- 0
if (!all(y %in% c(0,1))) stop("y is not a binary pattern")
if (!all(x %in% c(0,1))) stop("x is not a binary sequence")
if (length(y) > length(x)) return("Y is larger than X NOT POSSIBLE")
for (i in 0:(length(x)-length(y))) result <- result + all(y == x[i + (1:length(y))])
return(result)}
analyzeB=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(26,1,.5)
print(pattern_b(y,x))
}
}
analyzeB(x,y)
Final Thoughts on Pattern B
The biggest problem in pattern_b was returning 0 when length(y) > length(x). This was changed in the multivector example above. In my opinion parts A and B had the same major issue, but overall both worked adequately for what we are trying to accomplish in this homework.
Pattern C
Pattern C is a bit more interesting than the first two patterns that we analyzed. It first defines a helper function, break_vec, and then calls that helper inside pattern_c. Calling one function from another is not unique to R (C++ and Python allow it as well), but R makes it very direct, and the straightforward aspects of R are very beautiful to work with.
break_vec = function(x, M) {
n <- length(x)
if (M > n)
stop("the block size M = ", M, " is greater than length of x (", n, ")")
x0 = 1:(n - M + 1)
substring(paste(x, collapse = ""), x0, x0 + M - 1)
}
pattern_c <- function(y,x) {
sum(break_vec(x, length(y)) == paste(y, collapse = ""))
}
x=c(2,0,0,0,1,0,0,1,0,0,1,0,0,1,0)
y=rbinom(3,1,.5)
pattern_c(y,x)
Non-Binary Analysis on Part C
With the x and y vectors set as above, pattern_c(y,x) returned a value of 0 instead of an error. This is terrible! We can see right away that x contains a 2, yet the function never checks for it. This is a major issue with pattern c.
x=rbinom(100,1,.5)
y=rbinom(101,1,.5)
pattern_c(y,x)
An analysis on the y vector being larger than the x vector
Finally we have a pattern that recognizes that y must not be longer than x. The call above stops with this error:
Error in break_vec(x, length(y)) : the block size M = 101 is greater than length of x (100)
Calling traceback() with no arguments right after the error shows which call raised it: the error originates in break_vec, which was called from pattern_c.
traceback()
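For a check that does not depend on an interactive session, the same error message can also be captured with tryCatch(). This is only a minimal sketch: break_vec is copied from the report so the block runs on its own, and the variable name msg is an assumption for illustration.

```r
# break_vec is copied from the report so that the sketch runs on its own.
break_vec <- function(x, M) {
  n <- length(x)
  if (M > n)
    stop("the block size M = ", M, " is greater than length of x (", n, ")")
  x0 <- 1:(n - M + 1)
  substring(paste(x, collapse = ""), x0, x0 + M - 1)
}

# Capture the error message instead of letting the error halt the script.
msg <- tryCatch(
  break_vec(rep(0, 100), 101),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Capturing the message this way makes it possible to test for the error inside a loop or a script without stopping at the first failure.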
Let us put a browser function into pattern_c to analyze what is going on when y>x at a very detailed level.
pattern_c <- function(y,x) {
browser()
sum(break_vec(x, length(y)) == paste(y, collapse = ""))
}
pattern_c(y,x)
Enter the character 'n' twice to see exactly where the error occurs. Note: although fixing this is beyond what we have covered in class, the function trace() would let us edit a function while debugging. I won't do this now, but it is a good idea for future reference.
MultiVector Analysis on C
pattern_c <- function(y,x) {
sum(break_vec(x, length(y)) == paste(y, collapse = ""))
}
analyzeC=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(2,1,.5)
print(pattern_c(y,x))
}
}
analyzeC(x,y)
Multiple Vector Analysis on Pattern C
Same as parts A and B
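The counts from runs like these can also be collected into a vector rather than printed, which makes them easier to summarize. The sketch below is an assumption-laden illustration: count_pattern is a hypothetical stand-in for any of the four pattern functions, and the seed is arbitrary. For a pattern of length 2 in a fair Bernoulli sequence of length 25, the expected number of matches is (25 - 2 + 1) / 2^2 = 6, which gives a rough value to compare the sample mean against.

```r
# Hypothetical stand-in for any of the four pattern functions.
count_pattern <- function(y, x) {
  starts <- seq_len(length(x) - length(y) + 1)
  sum(vapply(starts,
             function(i) all(y == x[i + seq_along(y) - 1]),
             logical(1)))
}

set.seed(1)  # arbitrary seed, only for reproducibility
counts <- replicate(20, {
  x <- rbinom(25, 1, .5)
  y <- rbinom(2, 1, .5)
  count_pattern(y, x)
})

counts        # 20 counts, one per simulated pair of vectors
mean(counts)  # compare with the expected value of 6
```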
Errors for multiple vector analysis on part C
Unlike Parts A and B part C should recognize the vector length issue.
Error in break_vec(x, length(y)) : the block size M = 26 is greater than length of x (25)
This error was analyzed using a traceback and debugging function and gives specific instructions that the block size needs to be less than or equal to the vector size. We must abide by these instructions.
analyzeC=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(26,1,.5)
print(pattern_c(y,x))
}
}
analyzeC(x,y)
Final Thoughts on Pattern C
The beneficial aspect of pattern_c is that a separate function was created to catch the length problem that was mentioned in patterns a and b. This was a nice sight after the issues we saw there.
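The same idea of a separate helper could be extended to the non-binary case that pattern_c misses. The following is a sketch under assumptions: check_binary and pattern_c_checked are hypothetical names, and break_vec is copied from above so the block is self-contained.

```r
# Hypothetical validator, written once and reusable by every pattern function.
check_binary <- function(v, name) {
  if (!all(v %in% c(0, 1))) stop(name, " is not binary")
  invisible(TRUE)
}

# break_vec is copied from the report.
break_vec <- function(x, M) {
  n <- length(x)
  if (M > n)
    stop("the block size M = ", M, " is greater than length of x (", n, ")")
  x0 <- 1:(n - M + 1)
  substring(paste(x, collapse = ""), x0, x0 + M - 1)
}

# Hypothetical variant of pattern_c that validates both inputs first.
pattern_c_checked <- function(y, x) {
  check_binary(y, "y")
  check_binary(x, "x")
  sum(break_vec(x, length(y)) == paste(y, collapse = ""))
}

# The pattern 0 0 1 occurs 4 times in this sequence.
pattern_c_checked(c(0, 0, 1), c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0))
```

Keeping the check in one helper means each pattern function can call it instead of repeating the same if statement.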
Pattern D
pattern_d <- function(y,x) {
result=0
n=length(x)
k=length(y)
for (i in 1:(n-k+1)) {
j=0
stop=FALSE
while ((j < k) & !stop) {
if (x[i+j] != y[j+1]) stop=TRUE
j=j+1
}
if (!stop) result=result+1
}
result
}
Initial D thoughts
The first difference we can see is that D never calls stop() to validate its input the way the other functions do (it only uses a local flag variable that happens to be named stop). In my opinion this is the most breakable pattern of the 4. Let's run similar tests to check this hypothesis.
x=c(1,0,0,1,1,0,1,0,0,0,1,2)
y=c(1,0)
pattern_d(y,x)
A non-binary vector in part d
pattern_d(y,x) produced a value of 3. This count is correct, but it should never have been produced: the non-binary 2 at the end of the x vector should have stopped the function with an error. So part d is prone to issues with non-binary vectors.
x=rbinom(100,1,.5)
y=rbinom(101,1,.5)
pattern_d(y,x)
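Y Vector > X vector
Pattern d stops with an error here, although not through a deliberate check. With n = 100 and k = 101, 1:(n-k+1) is 1:0, which counts down through 1 and 0. At i = 0 the value x[0] is empty, so the if condition has length zero:
Error in if (x[i + j] != y[j + 1]) stop = TRUE : argument is of length zero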
After assessing this via browser() and traceback() I was able to pinpoint the error's location. Since I have shown how this is done in previous examples I will not do it again here.
analyzeD=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(2,1,.5)
print(pattern_d(y,x))
}
}
analyzeD(x,y)
Analysis on multiple vectors for pattern_d
Same as parts A and B
analyzeD=function(x,y){
for(i in 1:20){
x=rbinom(25,1,.5)
y=rbinom(26,1,.5)
print(pattern_d(y,x))
}
}
analyzeD(x,y)
Errors for multiple vector analysis on part D
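Same outcome as part c in that every run stops with an error, although here the message is “argument is of length zero” rather than a message about the vector lengths.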
Final Thoughts on Pattern D
The biggest problem in pattern_d was the lack of any stop() checks on its input.
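One way the missing checks might be added is sketched below. The name pattern_d_checked and the flag name mismatch are hypothetical choices, and the loop uses seq_len() instead of 1:(n-k+1) because 1:0 in R counts down and yields two values rather than none.

```r
# In R, 1:0 counts down, while seq_len(0) is empty.
length(1:0)         # 2
length(seq_len(0))  # 0

# Hypothetical variant of pattern_d with explicit input checks.
pattern_d_checked <- function(y, x) {
  if (!all(y %in% c(0, 1))) stop("y is not a binary pattern")
  if (!all(x %in% c(0, 1))) stop("x is not a binary sequence")
  n <- length(x)
  k <- length(y)
  if (k > n) stop("length of y is greater than length of x")

  result <- 0
  for (i in seq_len(n - k + 1)) {
    j <- 0
    mismatch <- FALSE  # renamed from stop to avoid shadowing stop()
    while (j < k && !mismatch) {
      if (x[i + j] != y[j + 1]) mismatch <- TRUE
      j <- j + 1
    }
    if (!mismatch) result <- result + 1
  }
  result
}

# The pattern 1 0 occurs 4 times in this sequence.
pattern_d_checked(c(1, 0), c(1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0))
```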
Simultaneous Analysis
In the simultaneous analysis I will create one quick function just to make sure that we get the same output from all 4 patterns at once. I feel that plenty of analysis was done on the individual functions, so verifying equal output will assure me that they are all adequate for basic binary input, given that vector y is no longer than vector x. The other constraint is, of course, that every value in each vector is binary.
pattern_a <- function(y, x) {
if (!all(y %in% c(0,1))) stop("y is not a binary pattern")
if (!all(x %in% c(0,1))) stop("x is not a binary sequence")
if (length(y) > length(x)) return(0)
found <- all(y==x[1:length(y)])
return( found + pattern_a(y,x[-1]))
}
test=function(x,y){
for(i in 1:50){
# draw a fresh sequence and pattern on every run
x=rbinom(1000,1,.5)
y=rbinom(8,1,.5)
resulta=pattern_a(y,x)
resultb=pattern_b(y,x)
resultc=pattern_c(y,x)
resultd=pattern_d(y,x)
# one row per run: the four counts should be identical
print(c(resulta,resultb,resultc,resultd))}
}
test(x,y)
FINAL ANALYSIS
I produced 50 rows of output with 4 columns each to verify that all four functions agree on 50 runs. Notice I restored pattern_a to its original definition, since it was modified earlier.
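Rather than comparing the printed rows by eye, the agreement could also be checked programmatically. The sketch below is self-contained and therefore uses two hypothetical stand-ins, count_loop and count_string, written in the style of patterns b and c. The same comparison could be applied to the four assignment functions.

```r
# Hypothetical loop-based counter, in the style of pattern_b.
count_loop <- function(y, x) {
  result <- 0
  for (i in 0:(length(x) - length(y))) {
    result <- result + all(y == x[i + seq_along(y)])
  }
  result
}

# Hypothetical string-based counter, in the style of pattern_c.
count_string <- function(y, x) {
  n <- length(x)
  k <- length(y)
  starts <- 1:(n - k + 1)
  blocks <- substring(paste(x, collapse = ""), starts, starts + k - 1)
  sum(blocks == paste(y, collapse = ""))
}

set.seed(2)  # arbitrary seed, only for reproducibility
agree <- replicate(50, {
  x <- rbinom(1000, 1, .5)
  y <- rbinom(8, 1, .5)
  count_loop(y, x) == count_string(y, x)
})

all(agree)  # TRUE when both counters agree on every run
```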
Discussion on 5 things that make good code
I think the first thing that makes good code is the use of the DRY (don't repeat yourself) principle. Once code gets longer, we would rather not have the same logic scattered all over the place. Organization is essential when programming in any language. Next, avoiding nested loops may be a good idea, since they can make programs slow. Even though R performs well on small examples like ours, once programs get longer nested loops will be more of a nuisance than a help from a time standpoint. Third, with if () statements, I think using braces is the better choice, even if the conditional command is only one or two lines. Modifying functions is much easier if the braces are already there. Fourth, avoid global variables! It is a much better idea to pass values in through function arguments (formal parameters) when possible. Lastly, in regard to functions, I recommend always using a return statement at the end of each function. This makes explicit what the function gives back. It is good for you as a programmer and for your peers to be able to understand every detail of the code, ESPECIALLY the output.