We will continue our discussion of coding principles by looking at how to write functions. We begin with a function in its mathematical form and see how we can translate it into code and use it to compute values. The formula below converts a Fahrenheit temperature T to Kelvin.
\[ T_k=\frac{5(T+459.67)}{9} \]
We can, of course, plug in any Fahrenheit temperature T to get its conversion to Kelvin, but how do we make this scalable? What if we needed to convert 100 different temperature values? Sure, we could do it in Excel, but behind the scenes, this is how such an operation works.
Let’s start by evaluating the equation at something like 12 degrees F.
(5*(12+459.67))/9
## [1] 262.0389
What if we have 10 different temperatures? We will not sit here and compute the formula 10 times. Instead, we can save the formula in a function and feed temperatures into it to get the conversions. Let’s call our function T_to_K. We define it with the function keyword, and inside the parentheses we name the input variable. Think of y = f(x).
T_to_K <-function(t)
{
(5*(t+459.67))/9
}
Let’s use our function by storing a value in x and then printing the output.
x=12
T_to_K(x)
## [1] 262.0389
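Before relying on a new function, it can help to test it against values we already know. The sketch below is a minimal sanity check, assuming the standard reference points for water at standard pressure (32 °F = 273.15 K and 212 °F = 373.15 K). The names freezing and boiling are ours and only serve the illustration.

```r
# Fahrenheit-to-Kelvin conversion, same formula as T_to_K above
T_to_K <- function(t) {
  (5 * (t + 459.67)) / 9
}

# Known reference points: water freezes at 32 F and boils at 212 F
freezing <- T_to_K(32)
boiling <- T_to_K(212)

# all.equal() compares numbers with a small tolerance, which avoids
# false alarms caused by floating-point rounding
isTRUE(all.equal(freezing, 273.15))
isTRUE(all.equal(boiling, 373.15))
```

Both checks should come back TRUE if the formula was typed in correctly.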
Now if we wanted to compute 12 conversions for 12 different temperatures, it would still be tedious to define 12 different variables and convert each one separately. We can instead use a programming technique called the for loop. The for loop is the most basic member of the loop family. It lets you repeat part of your code over as many iterations as needed.
https://www.r-bloggers.com/how-to-write-the-first-for-loop-in-r/
Let’s create some sample data consisting of 20 temperature values and convert it into a data frame. We combine the 20 values into a vector using c(), then use data.frame() to convert the vector into a data frame (similar to a spreadsheet). The head function displays the first rows of a data frame; passing 20 as the second argument shows all 20 rows.
#define temperatures
temp <- c(12, 46,123,56,23.6, 56,78, 34,2, 8,-5,12,77,34, 99, 11.1,56,85,99,3)
temp_data<-data.frame(temp)
head(temp_data, 20)
## temp
## 1 12.0
## 2 46.0
## 3 123.0
## 4 56.0
## 5 23.6
## 6 56.0
## 7 78.0
## 8 34.0
## 9 2.0
## 10 8.0
## 11 -5.0
## 12 12.0
## 13 77.0
## 14 34.0
## 15 99.0
## 16 11.1
## 17 56.0
## 18 85.0
## 19 99.0
## 20 3.0
The goal here is for our function to run over every row in our sample data and create a new column with the converted Kelvin temperature. Thankfully, R already comes with this kind of loop wrapped up in the sapply function. Essentially, sapply goes through the temperatures one by one, converts each from Fahrenheit to Kelvin using the function we made, and returns the results as a vector.
temp_data$kelvin<- sapply(temp_data$temp, T_to_K)
head(temp_data, 20)
## temp kelvin
## 1 12.0 262.0389
## 2 46.0 280.9278
## 3 123.0 323.7056
## 4 56.0 286.4833
## 5 23.6 268.4833
## 6 56.0 286.4833
## 7 78.0 298.7056
## 8 34.0 274.2611
## 9 2.0 256.4833
## 10 8.0 259.8167
## 11 -5.0 252.5944
## 12 12.0 262.0389
## 13 77.0 298.1500
## 14 34.0 274.2611
## 15 99.0 310.3722
## 16 11.1 261.5389
## 17 56.0 286.4833
## 18 85.0 302.5944
## 19 99.0 310.3722
## 20 3.0 257.0389
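As a side note, base R also offers vapply, a stricter relative of sapply. The sketch below is only an illustration on the first five sample temperatures. The names kelvin_sapply and kelvin_vapply are ours, and the FUN.VALUE argument states that every call is expected to return exactly one number.

```r
# Same conversion function as above
T_to_K <- function(t) {
  (5 * (t + 459.67)) / 9
}

# First five sample temperatures
temp <- c(12, 46, 123, 56, 23.6)

# sapply() calls T_to_K once per element and simplifies the results to a vector
kelvin_sapply <- sapply(temp, T_to_K)

# vapply() does the same, but FUN.VALUE declares that each call must
# return a single number; anything else raises an error
kelvin_vapply <- vapply(temp, T_to_K, FUN.VALUE = numeric(1))

# Compare the two results
identical(kelvin_sapply, kelvin_vapply)
```

For a simple function like ours the two give the same result. The extra check in vapply mainly pays off when a function might return something unexpected.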
While the sapply function works well for creating a new column based on the values in another column, it is still crucial to learn how to write for loops from scratch.
Let’s look at the structure of a for loop:
for (variable in sequence)
{
expression
expression
expression
}
## Error in for (variable in sequence) {: invalid for() loop sequence
In the above code, variable, sequence, and expression are placeholders rather than real data, so we expect an error to be raised, but we only want to look at the basic structure of a for loop. We open a for loop with the for command followed by (). Inside the parentheses, you name the variable that will take each value of the sequence in turn. The expressions inside the curly braces run once for each value.
Let’s make a for loop from scratch. We want to print the square root of x for each number from 1 through 15.
for (x in c(1:15))
{
print(sqrt(x))
}
## [1] 1
## [1] 1.414214
## [1] 1.732051
## [1] 2
## [1] 2.236068
## [1] 2.44949
## [1] 2.645751
## [1] 2.828427
## [1] 3
## [1] 3.162278
## [1] 3.316625
## [1] 3.464102
## [1] 3.605551
## [1] 3.741657
## [1] 3.872983
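Printing is fine for a quick look, but often we want to keep the results. A common pattern, sketched below with made-up input values, is to create an empty vector of the right length first and then fill it one position at a time. The names values and roots are ours; seq_along() is a base R function that returns the positions 1, 2, ..., up to the length of the vector.

```r
# Hypothetical inputs chosen so the square roots are easy to verify
values <- c(4, 9, 16, 25)

# Pre-allocate a numeric vector with one slot per input
roots <- numeric(length(values))

# i is the position in the vector, not the value itself
for (i in seq_along(values)) {
  roots[i] <- sqrt(values[i])
}

roots
```

Here the loop variable is used as an index, so each pass reads one input and writes one output.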
Let’s return to the temperature data we made earlier. We used sapply to convert each temperature to Kelvin. Now let’s write our own for loop to do a similar computation: taking the square root of each Kelvin value. The loop needs to run once per row, so our sequence runs from 1 to the number of rows, which is 20. We get the row count with nrow() and store it in n. Inside the loop, x is the current row number, and we use it to pick out one value at a time. Note that for (x in n) would run only once, with x equal to 20, because n is a single number.
n <- nrow(temp_data)
#create an empty column to fill in
temp_data$sqrt_kelvin <- numeric(n)
for (x in 1:n)
{
temp_data$sqrt_kelvin[x] <- sqrt(temp_data$kelvin[x])
}
head(temp_data, 20)
## temp kelvin sqrt_kelvin
## 1 12.0 262.0389 16.18762
## 2 46.0 280.9278 16.76090
## 3 123.0 323.7056 17.99182
## 4 56.0 286.4833 16.92582
## 5 23.6 268.4833 16.38546
## 6 56.0 286.4833 16.92582
## 7 78.0 298.7056 17.28310
## 8 34.0 274.2611 16.56083
## 9 2.0 256.4833 16.01510
## 10 8.0 259.8167 16.11883
## 11 -5.0 252.5944 15.89322
## 12 12.0 262.0389 16.18762
## 13 77.0 298.1500 17.26702
## 14 34.0 274.2611 16.56083
## 15 99.0 310.3722 17.61738
## 16 11.1 261.5389 16.17216
## 17 56.0 286.4833 16.92582
## 18 85.0 302.5944 17.39524
## 19 99.0 310.3722 17.61738
## 20 3.0 257.0389 16.03243
Now let’s pretend sapply never existed and we needed to use our own handwritten for loop to convert the temperatures to Kelvin. What would that look like? We would need to run each value in the temp column through the T_to_K function we made earlier. We use the same n we defined above. Let’s call this new column kelvin_from_scratch.
#create an empty column, then fill it one row at a time
temp_data$kelvin_from_scratch <- numeric(n)
for (x in 1:n)
{
temp_data$kelvin_from_scratch[x] <- T_to_K(temp_data$temp[x])
}
head(temp_data,20)
## temp kelvin sqrt_kelvin kelvin_from_scratch
## 1 12.0 262.0389 16.18762 262.0389
## 2 46.0 280.9278 16.76090 280.9278
## 3 123.0 323.7056 17.99182 323.7056
## 4 56.0 286.4833 16.92582 286.4833
## 5 23.6 268.4833 16.38546 268.4833
## 6 56.0 286.4833 16.92582 286.4833
## 7 78.0 298.7056 17.28310 298.7056
## 8 34.0 274.2611 16.56083 274.2611
## 9 2.0 256.4833 16.01510 256.4833
## 10 8.0 259.8167 16.11883 259.8167
## 11 -5.0 252.5944 15.89322 252.5944
## 12 12.0 262.0389 16.18762 262.0389
## 13 77.0 298.1500 17.26702 298.1500
## 14 34.0 274.2611 16.56083 274.2611
## 15 99.0 310.3722 17.61738 310.3722
## 16 11.1 261.5389 16.17216 261.5389
## 17 56.0 286.4833 16.92582 286.4833
## 18 85.0 302.5944 17.39524 302.5944
## 19 99.0 310.3722 17.61738 310.3722
## 20 3.0 257.0389 16.03243 257.0389
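The kelvin and kelvin_from_scratch columns look identical in the printout, but we can also let R confirm it. The sketch below is a self-contained check on a shortened copy of the data. The column names kelvin_loop and kelvin_vec are ours. It also shows a third option: because T_to_K uses only arithmetic, R can apply it to a whole column in a single call.

```r
# Same conversion function as above
T_to_K <- function(t) {
  (5 * (t + 459.67)) / 9
}

# Shortened copy of the sample data (first five temperatures)
temp_data <- data.frame(temp = c(12, 46, 123, 56, 23.6))

# 1. sapply(): one function call per element
temp_data$kelvin <- sapply(temp_data$temp, T_to_K)

# 2. Explicit for loop: fill a pre-allocated column row by row
temp_data$kelvin_loop <- numeric(nrow(temp_data))
for (i in seq_len(nrow(temp_data))) {
  temp_data$kelvin_loop[i] <- T_to_K(temp_data$temp[i])
}

# 3. Vectorized call: pass the whole column at once
temp_data$kelvin_vec <- T_to_K(temp_data$temp)

# Compare the approaches (all.equal allows for tiny rounding differences)
all.equal(temp_data$kelvin, temp_data$kelvin_loop)
all.equal(temp_data$kelvin, temp_data$kelvin_vec)
```

All three approaches should agree. Writing the loop by hand is still worthwhile practice for cases where a calculation cannot be applied to a whole column at once.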
Write your code in the following code chunks.
Create a simple data set consisting of 1 column, which could represent exam scores, different temperatures, or some arbitrary measurements. Once the data is created, display the rows. (Refer to how we made the temp data earlier.)
Write a function that does some univariate operation, such as a formula to convert C to F or a mathematical operation of your choice. (See how we built the Kelvin conversion function.)
Create a new column in your data by applying that function to the existing column. Use the head function to display your newly created data. (Refer to how we computed kelvin_from_scratch or sqrt_kelvin.)