1 What is programming?

There are countless definitions of programming; here I collect four key ideas from them:

  • Programming is the process of taking an algorithm and encoding it into a notation, a programming language, so that it can be executed by a computer. Although many programming languages and many different types of computers exist, the important first step is the need to have the solution. Without an algorithm, there can be no program.

  • Computer science is not the study of programming. Programming, however, is an important part of what a computer scientist does. Programming is often the way that we create a representation of our solutions. Therefore, this language representation and the process of creating it become a fundamental part of the discipline.

  • Algorithms describe the solution to a problem in terms of the data needed to represent the problem instance and the set of steps necessary to produce the intended result. Programming languages must provide a notational way to represent both the process and the data. To this end, languages provide control constructs and data types.

  • Control constructs allow algorithmic steps to be represented in a convenient yet unambiguous way. At a minimum, algorithms require constructs that perform sequential processing, selection for decision-making, and iteration for repetitive control. As long as the language provides these basic statements, it can be used for algorithm representation.
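
As a minimal sketch of these three constructs in R (the language used later in this document), the hypothetical function count_even below runs its statements in sequence, uses selection to test each value, and uses iteration to repeat the test. The function name and the sample data are illustrative assumptions.

```r
# Hypothetical example: count the even numbers in a vector.
count_even <- function(values) {
  count <- 0                      # sequential processing: statements run in order
  for (v in values) {             # iteration: repeat for every element
    if (v %% 2 == 0) {            # selection: act only when the condition holds
      count <- count + 1
    }
  }
  count                           # the value of the last expression is returned
}

count_even(c(1, 2, 3, 4, 6))
## [1] 3
```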

In simple words: programming is how you get computers to solve problems.

2 Real Life Algorithms

Let’s take a real-life situation to better understand what an algorithm is: what are the precise step-by-step instructions for calling a friend on the telephone?

Steps:

  1. Pick up the phone and listen for a dial tone.
  2. Press each digit of the phone number.
  3. If the line is busy, hang up, wait 5 minutes, and go back to step 2.
  4. If no one answers, leave a message and hang up.
  5. If there is no answering machine, hang up, wait 2 hours, and go back to step 2.
  6. Talk to your friend.
  7. Hang up the phone.

Assumptions:

  • Step 1 assumes that you live alone and no one else could be on the phone.
  • The algorithm assumes the existence of a working phone and active service.
  • The algorithm assumes you are not deaf or mute.
  • The algorithm assumes a normal corded phone.
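
The steps above can be sketched as a small R simulation. This is only an illustration under stated assumptions: the function name call_friend, the outcome labels ("busy", "no_machine", "voicemail", "answered"), and the choice to end the algorithm after leaving a message are all hypothetical, and each retry starts again from picking up the phone.

```r
# Hypothetical simulation: `outcomes` gives the result of each successive attempt.
# Assumed labels: "busy", "no_machine", "voicemail", "answered".
call_friend <- function(outcomes) {
  actions <- character(0)
  for (outcome in outcomes) {                        # each pass is one dialing attempt
    actions <- c(actions, "pick up phone", "dial number")
    if (outcome == "busy") {
      actions <- c(actions, "hang up", "wait 5 minutes")
    } else if (outcome == "no_machine") {
      actions <- c(actions, "hang up", "wait 2 hours")
    } else if (outcome == "voicemail") {
      actions <- c(actions, "leave message", "hang up")
      break                                          # assumed: stop after leaving a message
    } else {
      actions <- c(actions, "talk to friend", "hang up")
      break                                          # the call succeeded
    }
  }
  actions
}

call_friend(c("busy", "answered"))
```

The selection (if/else) and iteration (for, break) in the sketch correspond to the "if ... go back to step 2" instructions in the list.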

3 Why Study Algorithms?

In short, because time is money. Energy is money. And computers are designed to optimize both. Unfortunately, computers only do what you tell them to do, all the way down to what they remember. Our job is to figure out what to tell them and how.

One of the best ways to improve your reach as a data scientist is to write functions. Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting. Functions are often used to encapsulate a sequence of expressions that need to be executed numerous times, perhaps under slightly different conditions. Functions are also often written when code must be shared with others or the public.

4 What is a Function?

A function, in a programming environment, is a set of instructions that carries out a specified task. A programmer builds a function to avoid repeating the same code or to reduce complexity. A function has the following components:

  • Name: the actual name of the function.
  • Arguments: a function may or may not take arguments.
  • Body: the collection of statements that defines what the function does.
  • Return value: a function may or may not return one or more values.

Figure: Function Machine
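
As a small illustration of these components in R, the sketch below defines a hypothetical function, add_one, and inspects its parts with the base functions formals() and body(). The function name is an assumption made for this example only.

```r
# Hypothetical function used only to show the components.
add_one <- function(x) {      # name: add_one, argument: x
  x + 1                       # body: the statements that do the work
}

names(formals(add_one))       # the argument names: "x"
body(add_one)                 # the body as an unevaluated expression
add_one(2)                    # the returned value: 3
```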

5 How to Write a Function in R

On some occasions, we need to write our own function because we have to accomplish a particular task and no ready-made function exists. A user-defined function involves a name, arguments, and a body.

5.1 One Argument Function

A one-argument function accepts a single value and computes a result from it, for example the square of the value. The general form is:

function.name <- function(argument) 
{
    # computations on the argument
    # some other code
}   

5.1.1 Example (square)

square_function<- function(x) 
{
  x^2                                 # compute the square of `x`
}  
square_function(9)                    # calling the function and passing value
## [1] 81

5.1.2 Example (cube_root)

cube_root <- function(x) 
{
  x^(1/3)                          # compute the cube root of `x`
}  
cube_root(27)                      # calling the function and passing the value 
## [1] 3

5.1.3 Example (average)

average<- function(x) 
{
  sum(x)/length(x)                # compute the average of `x`
}  
x<-c(1,2,3,4,5,6)
average(x)                        # calling the function and passing the value 
## [1] 3.5
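
As a quick sanity check, the hand-written average() can be compared with R's built-in mean(). The sketch below repeats the definition so that it runs on its own.

```r
average <- function(x) {
  sum(x) / length(x)              # same computation as in the example above
}

x <- c(1, 2, 3, 4, 5, 6)
isTRUE(all.equal(average(x), mean(x)))   # compare with the built-in mean()
## [1] TRUE
```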

5.2 Multi-Argument Function

We can write a function with more than one argument. Consider a function called “times”: a straightforward function that multiplies two variables. The general form is:

function.name <- function(argument1, argument2, argument_n)   # list as many arguments as needed
{
    # computations on the arguments
    # some other code
}   

5.2.1 Example (times)

times <- function(x,y) 
{
  x*y                                   # compute (multiply x and y) 
}
times(1000,4)                           # calling the function and passing the value
## [1] 4000

5.2.2 Example (volume)

volume <- function(p,l,t)               # p = length, l = width, t = height
{
  p*l*t                                 # compute the volume of a box
}
volume(5,4,3)
## [1] 60

5.2.3 Example (average_freq)

average_freq <- function(x,freq) 
{
  sum(x*freq)/sum(freq)                 # frequency-weighted average of `x`
}
x<-c(1,2,3,4,5)
freq<-c(4,5,6,6,6)
average_freq(x,freq)
## [1] 3.185185
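
For a frequency-weighted average, the weighted sum sum(x * freq) is conventionally divided by the total frequency sum(freq), so the result stays within the range of x. A minimal sketch that cross-checks this against weighted.mean() from the stats package (the variable names manual and builtin are illustrative):

```r
x    <- c(1, 2, 3, 4, 5)
freq <- c(4, 5, 6, 6, 6)

manual  <- sum(x * freq) / sum(freq)     # weighted sum divided by total frequency
builtin <- weighted.mean(x, w = freq)    # the same quantity from the stats package

isTRUE(all.equal(manual, builtin))
## [1] TRUE
```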

5.2.4 Example (average_freq_comment)

average_freq_comment <- function(x,freq) 
{
  average <- sum(x*freq)/sum(freq)                      # frequency-weighted average of `x`
  result <- paste("Average Frequency is", round(average, digits = 2))
  return(result)
}
average_freq_comment(x,freq)
## [1] "Average Frequency is 3.19"

6 Simple Case Example

To create a function in R, you write an R script and then transform it into a function. The best way to learn to swim is by jumping in the deep end, so let’s just write a function to show you how easy that is in R.

6.1 Normalize

As mentioned earlier, data scientists do many repetitive tasks, and most of the time we copy and paste chunks of code over and over. For example, normalizing a variable is often recommended before running a machine learning algorithm. The formula to normalize a variable is:

\[normalize = \frac{x - x_{min}}{x_{max} - x_{min}}\]

Now, let’s create a data frame, as we learned in the previous section on R basics.

set.seed(123)                                      # to ensure we generate the same data
df<- data.frame(                                   # create dataframe
  a = rnorm(10, 5, 1),                             # vector `a` with normal random numbers
  b = rnorm(10, 5, 1),                             # vector `b` with normal random numbers
  c = rnorm(10, 5, 1)                              # vector `c` with normal random numbers
)
df                                                 # print the dataframe result
##           a        b        c
## 1  4.439524 6.224082 3.932176
## 2  4.769823 5.359814 4.782025
## 3  6.558708 5.400771 3.973996
## 4  5.070508 5.110683 4.271109
## 5  5.129288 4.444159 4.374961
## 6  6.715065 6.786913 3.313307
## 7  5.460916 5.497850 5.837787
## 8  3.734939 3.033383 5.153373
## 9  4.313147 5.701356 3.861863
## 10 4.554338 4.527209 6.253815

We already know how to use the min() and max() functions in R. Therefore, we can use the normalization formula above to compute the normalized values of df as follows:

df.norm <- data.frame(  
  a = (df$a -min(df$a))/(max(df$a)-min(df$a)), 
  b = (df$b -min(df$b))/(max(df$b)-min(df$b)),    
  c = (df$c -min(df$c))/(max(df$c)-min(df$c))    
)
df.norm                                            # print data frame result
##            a         b         c
## 1  0.2364281 0.8500528 0.2104635
## 2  0.3472617 0.6197981 0.4994777
## 3  0.9475335 0.6307099 0.2246853
## 4  0.4481587 0.5534256 0.3257267
## 5  0.4678825 0.3758531 0.3610444
## 6  1.0000000 1.0000000 0.0000000
## 7  0.5791625 0.6565733 0.8585184
## 8  0.0000000 0.0000000 0.6257648
## 9  0.1940214 0.7107903 0.1865516
## 10 0.2749546 0.3979789 1.0000000
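
As a worked check of the formula, take the first value of column a, 4.439524. The minimum of a is 3.734939 (row 8) and the maximum is 6.715065 (row 6), so:

\[\frac{4.439524 - 3.734939}{6.715065 - 3.734939} = \frac{0.704585}{2.980126} \approx 0.2364\]

This agrees with the first normalized value of column a shown above.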

However, this method is prone to mistakes: we could copy a line and forget to change the column name after pasting. A good practice is therefore to write a function whenever you would otherwise paste the same code more than twice. We can rearrange the code into a function and call it whenever it is needed. Let’s look carefully at this function:

normalize <- function(x){
  norm <- (x-min(x))/(max(x)-min(x))
  return(norm)
}
normalize(df$a)
##  [1] 0.2364281 0.3472617 0.9475335 0.4481587 0.4678825 1.0000000 0.5791625
##  [8] 0.0000000 0.1940214 0.2749546

Even though the example is simple, we can see the power of a function. The code above is easier to read and, above all, avoids mistakes when pasting code. We will make this function more powerful in the next section, after you learn how to use for() and apply().
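
As a sketch of how the function replaces the copy-and-paste version, the normalized data frame can be rebuilt with one call per column. The block repeats the data and the function so that it runs on its own; it assumes no column is constant, since max(x) - min(x) would then be zero.

```r
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))     # min-max normalization
}

set.seed(123)                          # same seed as above, so the data match
df <- data.frame(
  a = rnorm(10, 5, 1),
  b = rnorm(10, 5, 1),
  c = rnorm(10, 5, 1)
)

df.norm <- data.frame(
  a = normalize(df$a),                 # only the column name changes per line
  b = normalize(df$b),
  c = normalize(df$c)
)

sapply(df.norm, range)                 # every column now runs from 0 to 1
```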

6.2 Percent

Suppose you want to present fractional numbers as percentages, nicely rounded to one decimal digit. Here’s how to achieve that:

  • Multiply the fractional numbers by 100.
  • Round the result to one decimal place: You can use the round() function to do this.
  • Paste a percentage sign after the rounded number: The paste() function is at your service to fulfill this task.
  • Print the result: The print() function will do this.

You can easily translate these steps into a little script for R. So, open a new script file in your editor and type the following code:

x <- c(0.8765, 0.4321, 0.1234, 0.05678)
percent <- round(x * 100, digits = 1)
result <- paste(percent, sep = "", "%")
print(result)
## [1] "87.6%" "43.2%" "12.3%" "5.7%"

To make this script into a function, you need to do a few things. Look at the script as a little factory that takes the raw numeric material and polishes it up to shiny percentages every mathematician will crave.

First, you have to construct the factory building, preferably with an address so people would know where to send their numbers. Then you have to install a front gate so you can get the raw numbers in. Next, you create the production line to transform those numbers. Finally, you have to install a back gate so you can send your shiny percentages into the world.

To build your factory, change the script to the following code:

addPercent <- function(x){
 percent <- round(x * 100, digits = 1)
 result <- paste(percent, sep = "", "%")
 return(result)
}

Save this script as a .R file, for example addPercent.R, in a folder on your computer. You can then load it in the console with source() and call the function:

source('Functions/addPercent.R')                   # make sure your working directory is set properly
x<-normalize(df$a)                                 # use the normalized data from above
addPercent(x)                                      # format x as percentages using your function
##  [1] "23.6%" "34.7%" "94.8%" "44.8%" "46.8%" "100%"  "57.9%" "0%"    "19.4%"
## [10] "27.5%"
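
To try source() without depending on a particular folder layout, the sketch below writes the function to a temporary file and loads it from there. The temporary path and the sample input are assumptions for illustration; in practice you would save the file in your own project folder.

```r
# Write the function definition to a temporary .R file (illustrative location).
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "addPercent <- function(x){",
  "  percent <- round(x * 100, digits = 1)",
  "  result <- paste(percent, sep = \"\", \"%\")",
  "  return(result)",
  "}"
), script_path)

source(script_path)                    # runs the file, which defines addPercent()
addPercent(c(0.5, 0.25))
## [1] "50%" "25%"
```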

7 Your Exercise

By now, you should feel more confident about creating your own functions. I advise you to create a function for each of the tasks below:

  • Univariate variable (one dimension)

    • average
    • middle_value
    • most_frequent
    • max_value
    • min_value
    • variance
    • standard_deviation
    • Outliers
    • summary (all functions) - optional
  • Multivariate variables (more than one dimension)

    • average
    • middle_value
    • most_frequent
    • max_value
    • min_value
    • variance
    • standard_deviation
    • Outliers
    • summary (all functions) - optional
  • Simple Case Example

Id       <- (1:5000)
Date     <- seq(as.Date("2018/01/01"), by = "day", length.out = 5000)

Name     <- sample(c("Angel","Sherly","Vanessa","Irene","Julian","Jeffry","Nikita","Kefas","Siana","Lala",
               "Fallen","Ardifo","Kevin","Michael","Felisha","Calisha","Patricia","Naomi","Eric","Jacob"),
               5000, replace = TRUE)

City     <- sample(rep(c("Jakarta","Bogor","Depok","Tangerang","Bekasi"), times = 1000))

Outlet   <- sample(c("Outlet 1","Outlet 2","Outlet 3","Outlet 4","Outlet 5"),5000, replace = TRUE)

Menu     <- c("Cappucino","Es Kopi Susu","Hot Caramel Latte","Hot Chocolate","Hot Red Velvet Latte","Ice Americano",
              "Ice Berry Coffee","Ice Cafe Latte","Ice Caramel Latte","Ice Coffee Avocado","Ice Coffee Lite",
              "Ice Matcha Espresso","Ice Matcha Latte","Ice Red Velvet Latte")
all_menu <- sample(Menu, 5000, replace = TRUE)
Price    <- sample(18000:45000,14, replace = TRUE)
DFPrice  <- data.frame(Menu, Price)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Menu_Price <- left_join(data.frame(Menu = all_menu),DFPrice)
## Joining, by = "Menu"
KopiKenangan <- cbind(data.frame(Id,
                                 Date,
                                 Name,
                                 City,
                                 Outlet),
                                 Menu_Price)
head(KopiKenangan,5)
##   Id       Date   Name      City   Outlet                 Menu Price
## 1  1 2018-01-01 Julian   Jakarta Outlet 3 Hot Red Velvet Latte 31322
## 2  2 2018-01-02  Kefas    Bekasi Outlet 2 Hot Red Velvet Latte 31322
## 3  3 2018-01-03 Ardifo     Bogor Outlet 5  Ice Matcha Espresso 29440
## 4  4 2018-01-04  Kevin Tangerang Outlet 3 Hot Red Velvet Latte 31322
## 5  5 2018-01-05  Naomi    Bekasi Outlet 1 Hot Red Velvet Latte 31322

Let’s say you already have the data set shown above. Please create functions to calculate the following:

  • The percentage of sales for each city.
  • The frequency of Name and Menu.
  • The average monthly sales per menu item.
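
As a possible starting point for the first task, here is a minimal sketch on a tiny hypothetical data set. It assumes "sales" means the sum of Price per city; the function name city_sales_percent and the sample rows are illustrative, not part of the exercise data.

```r
# Hypothetical mini data set using the same column names as KopiKenangan.
sales <- data.frame(
  City  = c("Jakarta", "Bogor", "Jakarta", "Depok"),
  Price = c(20000, 30000, 30000, 20000)
)

city_sales_percent <- function(data) {
  totals <- tapply(data$Price, data$City, sum)     # total sales per city
  round(totals / sum(totals) * 100, digits = 1)    # share of all sales, in percent
}

city_sales_percent(sales)
```

The same pattern (group, summarize, then divide by the overall total) can be adapted to the full data set.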