There are countless definitions of what programming is; here I collect four key passages:
Programming is the process of taking an algorithm and encoding it into a notation, a programming language, so that it can be executed by a computer. Although many programming languages and many different types of computers exist, the important first step is the need to have the solution. Without an algorithm, there can be no program.
Computer science is not the study of programming. Programming, however, is an important part of what a computer scientist does. Programming is often the way that we create a representation of our solutions. Therefore, this language representation and the process of creating it become a fundamental part of the discipline.
Algorithms describe the solution to a problem in terms of the data needed to represent the problem instance and the set of steps necessary to produce the intended result. Programming languages must provide a notational way to represent both the process and the data. To this end, languages provide control constructs and data types.
Control constructs allow algorithmic steps to be represented in a convenient yet unambiguous way. At a minimum, algorithms require constructs that perform sequential processing, selection for decision-making, and iteration for repetitive control. As long as the language provides these basic statements, it can be used for algorithm representation.
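As a small illustration in R, the language used in the rest of this section, here is a toy algorithm that uses all three basic constructs. The task (counting the even numbers in a vector) is a made-up example, not one from the original text:

```r
# Toy algorithm using sequence, selection (if) and iteration (for).
# Made-up task: count how many numbers in a vector are even.
numbers <- c(3, 8, 15, 22, 41)  # sequence: statements run one after another
even_count <- 0

for (n in numbers) {            # iteration: repeat the body for each element
  if (n %% 2 == 0) {            # selection: act only when the condition holds
    even_count <- even_count + 1
  }
}

print(even_count)
## [1] 2
```

Only 8 and 22 satisfy `n %% 2 == 0`, so the loop increments the counter twice.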
In simple words: programming is how you get computers to solve problems.
Let's take a real-life situation to better understand what an algorithm is: what are the precise step-by-step instructions for calling a friend on the telephone?
Steps:
Assumptions:
In short, because time is money, energy is money, and computers are designed to optimize both. Unfortunately, computers do only what you tell them to do, all the way down to what they choose to remember. Our job is to figure out what to tell them and how.
One of the best ways to improve your reach as a data scientist is to write functions. Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting. Functions are often used to encapsulate a sequence of expressions that need to be executed numerous times, perhaps under slightly different conditions. Functions are also often written when code must be shared with others or the public.
A function, in a programming environment, is a set of instructions that carries out a specified task. A programmer writes a function to avoid repeating the same task or to reduce complexity. A function has the following components:
Name: the name of the function.
Arguments: the inputs passed to the function.
Body: the collection of statements that define what the function does.
Return value: one or more values the function returns.
Figure: Function Machine
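To map these components onto real R code, here is a sketch with a hypothetical function `rectangle_area()` (not from the original text). The default value for `height` is an extra illustration of how an argument can be optional:

```r
# Hypothetical example: each comment names the component it illustrates.
rectangle_area <- function(width, height = 1) {  # name: rectangle_area; arguments: width, height
  area <- width * height                          # body: the statements that do the work
  return(area)                                    # return value: what the caller gets back
}

rectangle_area(3, 4)
## [1] 12
rectangle_area(5)  # height falls back to its default value of 1
## [1] 5
```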
On some occasions, we need to write our own function because we have to accomplish a particular task and no ready-made function exists. A user-defined function involves a name, arguments and a body.
The function accepts a value and returns the square of the value.
function.name <- function(arguments) {
  # computations on the arguments
  # some other code
}
square_function <- function(x) {
  x^2 # compute the square of `x`
}
square_function(9) # call the function, passing the value 9
## [1] 81
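Because R's `^` operator works element-wise, the same function also squares every element of a vector, with no loop needed. A quick check, reusing `square_function()` from above:

```r
square_function <- function(x) {
  x^2  # `^` is vectorized, so x may be a single number or a whole vector
}

square_function(1:5)
## [1]  1  4  9 16 25
```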
We can write a function with more than one argument. Consider the function called “times”. It is a straightforward function multiplying two variables.
function.name <- function(argument1, argument2, ..., argument_n) {
  # computations on the arguments
  # some other code
}
times <- function(x, y) {
  x * y # multiply x by y
}
times(1000, 4) # call the function, passing the values 1000 and 4
## [1] 4000
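Arguments can also be passed by name, in which case their order does not matter, and because `*` is element-wise, `times()` also accepts vectors. A short sketch reusing the function above:

```r
times <- function(x, y) {
  x * y  # multiply x by y, element by element
}

times(y = 4, x = 1000)  # arguments matched by name, so their order is irrelevant
## [1] 4000
times(1:3, 2)           # each element of 1:3 is multiplied by 2
## [1] 2 4 6
```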
To create a function in R, you can start from a working R script and then transform it into a function. The best way to learn to swim is by jumping in at the deep end, so let's just write a function to show you how easy that is in R.
As I mentioned earlier, data scientists do many repetitive tasks, and most of the time we copy and paste chunks of code over and over. For example, normalizing a variable is highly recommended before running many machine learning algorithms. The formula for min-max normalization of a variable x is:
\[x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}\]
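As a quick worked example with made-up numbers: for x = (2, 4, 10), the minimum is 2 and the maximum is 10, so the denominator is 8 and each value maps to (x - 2) / 8:

```r
x <- c(2, 4, 10)                           # made-up values for illustration
x_norm <- (x - min(x)) / (max(x) - min(x)) # (2-2)/8, (4-2)/8, (10-2)/8
x_norm
## [1] 0.00 0.25 1.00
```

The smallest value always becomes 0 and the largest becomes 1.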
Now, let's create a data frame as we learned in the R basics section.
set.seed(123) # to ensure we generate the same data
df<- data.frame( # create dataframe
a = rnorm(10, 5, 1), # vector `a` with normal random numbers
b = rnorm(10, 5, 1), # vector `b` with normal random numbers
c = rnorm(10, 5, 1) # vector `c` with normal random numbers
)
df # print the resulting data frame
##           a        b        c
## 1  4.439524 6.224082 3.932176
## 2  4.769823 5.359814 4.782025
## 3  6.558708 5.400771 3.973996
## 4  5.070508 5.110683 4.271109
## 5  5.129288 4.444159 4.374961
## 6  6.715065 6.786913 3.313307
## 7  5.460916 5.497850 5.837787
## 8  3.734939 3.033383 5.153373
## 9  4.313147 5.701356 3.861863
## 10 4.554338 4.527209 6.253815
We already know how to use the min() and max() functions in R, so we can apply the normalization formula above to each column of df as follows:
df.norm <- data.frame(
a = (df$a -min(df$a))/(max(df$a)-min(df$a)),
b = (df$b -min(df$b))/(max(df$b)-min(df$b)),
c = (df$c -min(df$c))/(max(df$c)-min(df$c))
)
df.norm # print the resulting data frame
##            a         b         c
## 1  0.2364281 0.8500528 0.2104635
## 2  0.3472617 0.6197981 0.4994777
## 3  0.9475335 0.6307099 0.2246853
## 4  0.4481587 0.5534256 0.3257267
## 5  0.4678825 0.3758531 0.3610444
## 6  1.0000000 1.0000000 0.0000000
## 7  0.5791625 0.6565733 0.8585184
## 8  0.0000000 0.0000000 0.6257648
## 9  0.1940214 0.7107903 0.1865516
## 10 0.2749546 0.3979789 1.0000000
However, this method is prone to mistakes: we could copy a line and forget to change the column name after pasting. A good practice is therefore to write a function whenever you would otherwise paste the same code more than twice. We can rearrange the code into a function and call it whenever it is needed. Let's look carefully at this function:
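The definition of `normalize()` is not shown here. Below is a minimal sketch consistent with the formula above and with the printed output that follows; the name matches the later call `normalize(df$a)`, but the body is an assumption, not necessarily the author's exact code. The first two lines recreate `df` so the block runs on its own:

```r
set.seed(123)  # same seed as above, so df holds the same numbers
df <- data.frame(a = rnorm(10, 5, 1), b = rnorm(10, 5, 1), c = rnorm(10, 5, 1))

normalize <- function(x) {
  # min-max normalization: the minimum of x becomes 0 and the maximum becomes 1
  (x - min(x)) / (max(x) - min(x))
}

normalize(df$a)
```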
## [1] 0.2364281 0.3472617 0.9475335 0.4481587 0.4678825 1.0000000 0.5791625
## [8] 0.0000000 0.1940214 0.2749546
Even though the example is simple, it shows the power of a function. The code is easier to read and, above all, avoids mistakes when pasting code. We will make this function more powerful in the next section, after you learn how to use for() and apply().
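To see the payoff, the copy-and-paste construction of `df.norm` can be rewritten with one call per column. This is a sketch assuming `normalize()` is the min-max function described above; `df.norm2` is a hypothetical name chosen so it does not overwrite `df.norm`:

```r
set.seed(123)  # recreate df from above so this block runs on its own
df <- data.frame(a = rnorm(10, 5, 1), b = rnorm(10, 5, 1), c = rnorm(10, 5, 1))

normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))  # min-max normalization
}

# The column name appears only once per line, outside the formula,
# so there is nothing inside the formula to forget to change after pasting.
df.norm2 <- data.frame(
  a = normalize(df$a),
  b = normalize(df$b),
  c = normalize(df$c)
)

df.norm2
```

The values should match `df.norm` printed earlier, since both apply the same formula to the same data.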
Suppose you want to present fractional numbers as percentages, nicely rounded to one decimal digit. Here’s how to achieve that:
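Multiply the fractional numbers by 100.
Round the result to one decimal digit: the round() function does this.
Paste a percent sign after the rounded number: the paste() function is at your service for this task.
Print the result: the print() function will do this.
You can easily translate these steps into a little R script. Open a new script file in your editor and type the following code: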
x <- c(0.8765, 0.4321, 0.1234, 0.05678)
percent <- round(x * 100, digits = 1)
result <- paste(percent, "%", sep = "") # glue a percent sign to each number
print(result)
## [1] "87.6%" "43.2%" "12.3%" "5.7%"
To make this script into a function, you need to do a few things. Look at the script as a little factory that takes the raw numeric material and polishes it up to shiny percentages every mathematician will crave.
First, you have to construct the factory building, preferably with an address so people would know where to send their numbers. Then you have to install a front gate so you can get the raw numbers in. Next, you create the production line to transform those numbers. Finally, you have to install a back gate so you can send your shiny percentages into the world.
To build your factory, change the script to the following code:
addPercent <- function(x) {
  percent <- round(x * 100, digits = 1) # convert to percent, one decimal digit
  result <- paste(percent, "%", sep = "") # append the percent sign
  return(result)
}
If you save this script as a .R file (for example, addPercent.R in a folder on your computer), you can load it from the console with the following command:
source('Functions/addPercent.R') # the path is relative to your working directory; adjust it to where you saved the file
x <- normalize(df$a) # normalized values of column a, as computed above
addPercent(x) # format x as percentages using your function
## [1] "23.6%" "34.7%" "94.8%" "44.8%" "46.8%" "100%" "57.9%" "0%" "19.4%"
## [10] "27.5%"
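Once the factory exists, it is easy to add another front gate. As a hedged sketch, here is a hypothetical variant of `addPercent()` with an extra `digits` argument (not part of the original function); its default of 1 keeps existing calls such as `addPercent(x)` behaving exactly as before:

```r
# Hypothetical variant: `digits` is a new argument with a default of 1.
addPercent <- function(x, digits = 1) {
  percent <- round(x * 100, digits = digits)  # rounding precision now configurable
  result <- paste(percent, "%", sep = "")
  return(result)
}

x <- c(0.8765, 0.4321, 0.1234, 0.05678)
addPercent(x)              # one decimal digit, as in the original function
addPercent(x, digits = 0)  # whole percentages
## [1] "88%" "43%" "12%" "6%"
```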
By now you should feel more confident creating your own functions. As practice, I advise you to write a function for each of the tasks below:
Univariate variable (one dimension)
Multivariate variable (more than one dimension)
Simple Case Example
Id <- 1:5000 # transaction IDs
Date <- seq(as.Date("2018/01/01"), by = "day", length.out = 5000) # one date per transaction
Name <- sample(c("Angel","Sherly","Vanessa","Irene","Julian","Jeffry","Nikita","Kefas","Siana","Lala",
"Fallen","Ardifo","Kevin","Michael","Felisha","Calisha","Patricia","Naomi","Eric","Jacob"),
5000, replace = TRUE) # random customer names
City <- sample(rep(c("Jakarta","Bogor","Depok","Tangerang","Bekasi"), times = 1000)) # 1000 of each city, shuffled
Outlet <- sample(c("Outlet 1","Outlet 2","Outlet 3","Outlet 4","Outlet 5"), 5000, replace = TRUE) # random outlets
Menu <- c("Cappucino","Es Kopi Susu","Hot Caramel Latte","Hot Chocolate","Hot Red Velvet Latte","Ice Americano",
"Ice Berry Coffee","Ice Cafe Latte","Ice Caramel Latte","Ice Coffee Avocado","Ice Coffee Lite",
"Ice Matcha Espresso","Ice Matcha Latte","Ice Red Velvet Latte") # the 14 menu items
all_menu <- sample(Menu, 5000, replace = TRUE) # menu item ordered in each transaction
Price <- sample(18000:45000, 14, replace = TRUE) # one random price per menu item
DFPrice <- data.frame(Menu, Price) # price list: one row per menu item
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Joining, by = "Menu"
## Id Date Name City Outlet Menu Price
## 1 1 2018-01-01 Julian Jakarta Outlet 3 Hot Red Velvet Latte 31322
## 2 2 2018-01-02 Kefas Bekasi Outlet 2 Hot Red Velvet Latte 31322
## 3 3 2018-01-03 Ardifo Bogor Outlet 5 Ice Matcha Espresso 29440
## 4 4 2018-01-04 Kevin Tangerang Outlet 3 Hot Red Velvet Latte 31322
## 5 5 2018-01-05 Naomi Bekasi Outlet 1 Hot Red Velvet Latte 31322
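The code that combines these vectors into the table printed above is not shown. The `Joining, by = "Menu"` message suggests a dplyr join on the Menu column. As a hedged base-R sketch on small made-up data (the data frame name `orders` and the prices are hypothetical), the same price lookup can be done with `match()`:

```r
# Small stand-in pieces, shortened from the code above for illustration
Menu <- c("Cappucino", "Es Kopi Susu", "Hot Chocolate")
DFPrice <- data.frame(Menu, Price = c(25000, 18000, 30000))  # hypothetical prices
all_menu <- sample(Menu, 10, replace = TRUE)                 # 10 hypothetical orders

orders <- data.frame(Id = 1:10, Menu = all_menu)

# For each order, find the row of DFPrice with the same Menu and copy its Price.
# dplyr::left_join(orders, DFPrice, by = "Menu") produces the same columns.
orders$Price <- DFPrice$Price[match(orders$Menu, DFPrice$Menu)]

head(orders, 5)
```

Since sample() is called without set.seed(), the rows will differ on every run, just as the full data set above does.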
Suppose you already have the data set shown above. Please create a function for each of the following tasks: