MORE ON FUNCTIONS: ARGUMENTS, CREATING, PRINTING, SAVING RESULTS, RETURNING RESULTS

Introduction—-

Whether you’re dealing with data types, reading in data, or performing other
analytic tasks, a fundamental principle of data analytic languages, and of
programming in general, is to avoid repeating code. This concept is often
abbreviated as DRY: Don’t repeat yourself. One reason not to repeat code is
that it’s a waste of time. Repeated code also means more lines of code to
maintain.

The opposite of DRY code is WET code, in which you write out the process every
time you need it.

One way to avoid WET code is to package a process or routine into a function.
In this lesson I assume that you’ve already had some experience using functions
in R, so I want to share more details about how to use and create them so that
you can use them more effectively. I think a great way to learn about functions
is to create some yourself.
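As a minimal sketch of the difference between WET and DRY code, consider rescaling two columns of the built-in cars dataset to the range 0 to 1. The function name rescale01 is hypothetical and is used only for illustration.

```r
# WET: the same rescaling process is written out for each column
speedScaled <- (cars$speed - min(cars$speed)) / (max(cars$speed) - min(cars$speed))
distScaled  <- (cars$dist - min(cars$dist)) / (max(cars$dist) - min(cars$dist))

# DRY: the process is packaged once in a function (rescale01 is a hypothetical name)
rescale01 <- function(x){
  (x - min(x)) / (max(x) - min(x))
}
speedScaledDry <- rescale01(cars$speed)
distScaledDry  <- rescale01(cars$dist)

identical(speedScaled, speedScaledDry) # Compare the WET and DRY results
```

If the rescaling ever needs to change, the DRY version has only one place to edit.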

Arguments to functions—-

Arguments, also known as parameters, are the variables of a function. The
process performed by the function does not change, but the output changes
based on the values of the arguments.

?log # In the help, there are two arguments: x, and base. The = exp(1) after the base means that there's a default value that will be used if you don't assign it a value.
log(x = 100, base = 10) # You can explicitly identify the values for each argument
## [1] 2
log(100, 10) # But you don't have to if they come in the order that the arguments are taken
## [1] 2
log(base = 10, x = 100) # You can put them out of order if you explicitly identify them
## [1] 2
?sum # In the help, there are two arguments: ..., and na.rm. The ellipsis ... means that the number of arguments can vary
sum(1,3,5)
## [1] 9
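The same ideas apply to functions that you write yourself. The sketch below defines a hypothetical function, powerSum, that is not part of R; it is written only to show an ellipsis and a default value working together.

```r
# powerSum is a hypothetical function, written only to illustrate arguments
powerSum <- function(..., power = 1){
  values <- c(...)      # Collect however many values were passed through ...
  sum(values ^ power)   # Raise each value to the power, then add them up
}

powerSum(1, 3, 5)            # power is not supplied, so the default of 1 is used
## [1] 9
powerSum(1, 3, 5, power = 2) # An argument that comes after ... must be identified by name
## [1] 35
```

Because power comes after the ellipsis, an unnamed fourth value would be treated as one more number to add rather than as the power.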

Creating functions—-

Let’s create a function that uses the built-in cars dataset

?cars
summarizeAndPlot <- function(dataframe = cars, column = 'speed'){
  summary(dataframe)
  paste0('The value in row 1, column 1 is: ', dataframe[1,1])
  paste0('The value in row 2, column 2 is: ', dataframe[2,2])
  hist(dataframe[,column])
}

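summarizeAndPlot() # Notice that it draws the histogram, but prints neither the summary stats nor the other lines of text. Inside a function, values are not printed automatically; only the value of the last expression is returned, so text shows up only if it comes last.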
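The behavior can be seen in isolation with a smaller example. The function lastValue below is hypothetical and exists only for this illustration.

```r
# lastValue is a hypothetical function, written only to illustrate this behavior
lastValue <- function(){
  'first line'   # This value is computed and then discarded
  'second line'  # This is the last expression, so its value is returned
}
lastValue()           # At the console, the returned value is printed automatically
## [1] "second line"

h <- hist(cars$speed) # hist() draws the plot and returns its value invisibly
class(h)              # The returned value still exists; it was just not printed
## [1] "histogram"
```

This is why summarizeAndPlot() shows only the plot: hist() is the last expression, and its return value is invisible.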

Printing—-

Give instructions to print the summary statistics

summarizeAndPlot <- function(dataframe = cars, column = 'speed'){
  print(summary(dataframe))
  print(paste0('The value in row 1, column 1 is: ', dataframe[1,1]))
  print(paste0('The value in row 2, column 2 is: ', dataframe[2,2]))
  hist(dataframe[,column])
}
summarizeAndPlot() # This now prints the summary statistics and the other two lines of text.
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00  
## [1] "The value in row 1, column 1 is: 4"
## [1] "The value in row 2, column 2 is: 10"

Let’s test it out on another dataset

j17i <- read.csv('jan17Items.csv', sep = ',') # Read in the jan17Items.csv dataset
summarizeAndPlot(j17i, 'Tax') # It works well!
##      Time           OperationType        BarCode          CashierName       
##  Length:8899        Length:8899        Length:8899        Length:8899       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    LineItem          Department          Category         CardholderName    
##  Length:8899        Length:8899        Length:8899        Length:8899       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  RegisterName       StoreNumber        TransactionNumber  CustomerCode      
##  Length:8899        Length:8899        Length:8899        Length:8899       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##       Cost               Price             Quantity        Modifiers      
##  Min.   :-189.0400   Min.   :-322.810   Min.   : 1.000   Min.   :-2.5200  
##  1st Qu.:   0.1100   1st Qu.:   4.500   1st Qu.: 1.000   1st Qu.: 0.0100  
##  Median :   0.1100   Median :  12.020   Median : 1.000   Median : 0.0100  
##  Mean   :   0.2108   Mean   :   9.612   Mean   : 1.177   Mean   : 0.6548  
##  3rd Qu.:   0.1100   3rd Qu.:  14.680   3rd Qu.: 1.000   3rd Qu.: 1.0800  
##  Max.   : 189.0300   Max.   :  24.630   Max.   :36.000   Max.   :57.5500  
##                      NA's   :562                                          
##     Subtotal         Discounts           NetTotal            Tax         
##  Min.   :-322.80   Min.   : -0.0300   Min.   :-322.77   Min.   :-25.340  
##  1st Qu.:   5.24   1st Qu.: -0.0300   1st Qu.:   4.87   1st Qu.:  0.410  
##  Median :  13.10   Median : -0.0300   Median :  13.13   Median :  1.040  
##  Mean   :  15.10   Mean   :  0.7626   Mean   :  14.34   Mean   :  1.131  
##  3rd Qu.:  15.76   3rd Qu.: -0.0300   3rd Qu.:  15.79   3rd Qu.:  1.230  
##  Max.   :3328.12   Max.   :951.8300   Max.   :3328.15   Max.   :261.260  
##                                                         NA's   :341      
##     TotalDue      
##  Min.   :-348.11  
##  1st Qu.:   5.68  
##  Median :  14.17  
##  Mean   :  15.88  
##  3rd Qu.:  17.02  
##  Max.   :3589.41  
##  NA's   :341      
## [1] "The value in row 1, column 1 is: 2017-01-26T21:18:00Z"
## [1] "The value in row 2, column 2 is: SALE"

Saving results—-

You can save whatever is returned from a function into an object. The function has not been modified yet, so what gets saved is the value of its last expression, the call to hist().

a <- summarizeAndPlot(j17i, 'Tax') # This runs the function and saves the returned value to the variable a. The function still prints the same summary statistics and two lines of text shown above.

a # This prints the contents of the variable a to the console. Notice that a is a list that has the elements needed for the histogram, but not the summary statistics.
## $breaks
##  [1] -40 -20   0  20  40  60  80 100 120 140 160 180 200 220 240 260 280
## 
## $counts
##  [1]    2   19 8520   11    3    1    0    0    1    0    0    0    0    0    0
## [16]    1
## 
## $density
##  [1] 1.168497e-05 1.110072e-04 4.977799e-02 6.426735e-05 1.752746e-05
##  [6] 5.842487e-06 0.000000e+00 0.000000e+00 5.842487e-06 0.000000e+00
## [11] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [16] 5.842487e-06
## 
## $mids
##  [1] -30 -10  10  30  50  70  90 110 130 150 170 190 210 230 250 270
## 
## $xname
## [1] "dataframe[, column]"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"
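Because the saved object is a list, its elements can be pulled out by name. The following is a small sketch that uses the built-in cars dataset rather than jan17Items.csv so that it runs on its own.

```r
h <- hist(cars$speed, plot = FALSE) # plot = FALSE computes the histogram without drawing it
names(h)                            # Lists the elements, such as breaks, counts, and mids
h$counts                            # Number of observations that fall in each bin
sum(h$counts) == nrow(cars)         # Checks that the bin counts add up to the number of rows
```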

Returning—-

Adjust the function so that it returns the summary statistics

summarizeAndPlot <- function(dataframe = cars, column = 'speed'){
  print(summary(dataframe))
  hist(dataframe[,column]) # Draw the histogram before returning, because code placed after return() never runs
  return(summary(dataframe)) # This is the line of code that returns the summary statistics
}
a <- summarizeAndPlot(j17i, 'Tax') # Now run the function and save the returned output to variable a again.
a # You can see that the summary statistics are now the contents of a

You can return more than one thing by creating a list, but that’s beyond the scope of this lesson.

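The point is that printing and returning serve different purposes: printing displays a value on the console, while returning hands a value back to the caller so that it can be saved and reused.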
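A compact way to see the difference is a function that prints one thing and returns another. The function describeValues below is hypothetical and is written only for this illustration.

```r
# describeValues is a hypothetical function that prints one value and returns another
describeValues <- function(x){
  print(paste0('Number of values: ', length(x))) # Displayed on the console, but not what gets saved
  return(mean(x))                                # Handed back to the caller
}

m <- describeValues(cars$speed) # The printed line appears even though the result is being saved
## [1] "Number of values: 50"
m                               # Only the returned mean is stored in m
## [1] 15.4
```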

Conclusion—-

Whether you create your own or use functions created by someone else, these
key concepts should help you understand how to use them effectively.