Loopin’ Time

Hey there! This brief R Markdown document demonstrates my ability to write “if” statements, “for” loops, and “while” loops. These are nothing groundbreaking, but it’s good to have a log of skills.

The Basics

In this example I’ve set a variable ‘x’ to 13, and I’ve written an if-else statement that prints ‘x is even’ if x is an even number and ‘x is not even’ if x is an odd number. We use the modulo operator (%%) to check whether the remainder of x divided by 2 is 0.

x <- 13
if (x %% 2 == 0) {
  # the remainder is 0, so x is even
  print('x is even')
} else {
  print('x is not even')
}
## [1] "x is not even"
# same thing, but with x reassigned to 12
x <- 12
if (x %% 2 == 0) {
  # the remainder is 0, so x is even
  print('x is even')
} else {
  print('x is not even')
}
## [1] "x is even"

I didn’t include this above, but if you wanted to add a third condition, for cases that aren’t binary, you could add an ‘else if’ branch with another condition.
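Here’s a minimal sketch of what a third branch could look like. The function name describe_number and the “divisible by 3” condition are made up for illustration; they aren’t part of the examples above.

```r
# Hypothetical helper: classify a single number using three branches
describe_number <- function(x) {
  if (x %% 2 == 0) {
    "x is even"
  } else if (x %% 3 == 0) {
    # only reached when x is odd
    "x is odd and divisible by 3"
  } else {
    "x is odd and not divisible by 3"
  }
}

print(describe_number(12))
## [1] "x is even"
print(describe_number(9))
## [1] "x is odd and divisible by 3"
print(describe_number(13))
## [1] "x is odd and not divisible by 3"
```

Note that in R the ‘else if’ has to sit on the same line as the closing brace of the previous branch when the statement is run at the top level.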

Something a little more complicated

We want to put the values of a vector into the correct order. There is no loop here yet, just chained if statements, and we will not be returning a vector. Instead, we will print the values in descending order (largest to smallest).

y <- c(11,4,7)
if(y[1] > y[2]){
  #comparing the first number
  first <- y[1]
  second <- y[2]
} else {
  first <- y[2]
  second <- y[1]
}
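# compare the third number against the two we have already ordered
# (&& is the scalar "and", which is what if() expects)
if (y[3] > first && y[3] > second) {
  # y[3] is the largest, so everything shifts down one place
  third <- second
  second <- first
  first <- y[3]
} else if (y[3] < first && y[3] < second) {
  # y[3] is the smallest
  third <- y[3]
} else {
  # y[3] falls between first and second (or ties with one of them)
  third <- second
  second <- y[3]
}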

print(paste(first,second,third))
## [1] "11 7 4"
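As a quick sanity check, base R’s sort() with decreasing = TRUE should give the same ordering as the hand-written comparisons. This is just a cross-check, not a replacement for the exercise:

```r
y <- c(11, 4, 7)

# sort() returns a new vector; decreasing = TRUE orders largest to smallest
sorted <- sort(y, decreasing = TRUE)

print(paste(sorted, collapse = " "))
## [1] "11 7 4"
```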

Example while loop

x <- 0
while (x < 10) {
  print(paste0('x is:', x))
  x <- x + 1
  if (x == 7) {
    print("x is equal to 7; break loop!")
    # break exits the loop early; without it the loop would keep going until x reaches 10
    break
  }
}
## [1] "x is:0"
## [1] "x is:1"
## [1] "x is:2"
## [1] "x is:3"
## [1] "x is:4"
## [1] "x is:5"
## [1] "x is:6"
## [1] "x is equal to 7; break loop!"
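For comparison, here’s a sketch of the same stopping point written directly into the loop condition, so no break is needed. It prints the same “x is:” lines for 0 through 6 and then just ends:

```r
x <- 0
while (x < 7) {
  print(paste0('x is:', x))
  x <- x + 1
}

# after the loop, x holds the first value that failed the condition
print(x)
## [1] 7
```

Which version reads better probably depends on the situation: break is handy when the stop condition only becomes known partway through the loop body.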

Example for loop

For loop syntax is seemingly simple (as all things in programming are), but gets complex quickly.

The basic for loop syntax looks like this:

for(variable in V){print(variable)}

Let’s see what this looks like in practice.

#sample vector
V <- c(1,2,3,4,5)
for (temp.var in V){
  #Execute code for every temp.var in V
  result <- temp.var + 1
  print(paste('The temp.var plus 1 is equal to:', result))
  }
## [1] "The temp.var plus 1 is equal to: 2"
## [1] "The temp.var plus 1 is equal to: 3"
## [1] "The temp.var plus 1 is equal to: 4"
## [1] "The temp.var plus 1 is equal to: 5"
## [1] "The temp.var plus 1 is equal to: 6"

In this for loop, temp.var takes each value of V in turn. On every pass we create a new variable ‘result’, which is temp.var + 1, and then print a statement that includes the result. The loop body runs 5 times, once for each element of V.
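If you want to keep the results instead of only printing them, one common pattern is to create the output vector first and fill it in by position. This is a small sketch; the names results and i are my own, and seq_along(V) simply generates the indices 1 to length(V).

```r
V <- c(1, 2, 3, 4, 5)

# create an empty numeric vector of the right length up front
results <- numeric(length(V))

for (i in seq_along(V)) {
  results[i] <- V[i] + 1
}

print(results)
## [1] 2 3 4 5 6
```

For something this simple, R’s vectorized arithmetic (V + 1) gives the same answer without a loop, but the pattern above carries over to loop bodies that can’t be vectorized as easily.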

Nested for loops

We can nest for loops inside of each other. We’ll practice this by creating a matrix this time instead of a vector.

#This matrix has 5 rows, and since it goes from 1-15, this means it'll have 3 columns
matrix_A <- matrix(1:15, nrow = 5)
matrix_A
##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    2    7   12
## [3,]    3    8   13
## [4,]    4    9   14
## [5,]    5   10   15
#for every row in matrix_A, and for every column in matrix_A, print the value
for (row in 1:nrow(matrix_A)){
  for (col in 1:ncol(matrix_A)){
    print(paste('The element at row:', row, 'and col:', col, 'is', matrix_A[row, col]))
  }
}
## [1] "The element at row: 1 and col: 1 is 1"
## [1] "The element at row: 1 and col: 2 is 6"
## [1] "The element at row: 1 and col: 3 is 11"
## [1] "The element at row: 2 and col: 1 is 2"
## [1] "The element at row: 2 and col: 2 is 7"
## [1] "The element at row: 2 and col: 3 is 12"
## [1] "The element at row: 3 and col: 1 is 3"
## [1] "The element at row: 3 and col: 2 is 8"
## [1] "The element at row: 3 and col: 3 is 13"
## [1] "The element at row: 4 and col: 1 is 4"
## [1] "The element at row: 4 and col: 2 is 9"
## [1] "The element at row: 4 and col: 3 is 14"
## [1] "The element at row: 5 and col: 1 is 5"
## [1] "The element at row: 5 and col: 2 is 10"
## [1] "The element at row: 5 and col: 3 is 15"
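Nested loops can also accumulate values instead of printing them. Here’s a sketch that adds up each row of the same matrix; row_totals is a name I’ve made up for the example, and rowSums() is the base R function that does the same job in one call.

```r
matrix_A <- matrix(1:15, nrow = 5)

# one running total per row, starting at 0
row_totals <- numeric(nrow(matrix_A))

for (row in seq_len(nrow(matrix_A))) {
  for (col in seq_len(ncol(matrix_A))) {
    row_totals[row] <- row_totals[row] + matrix_A[row, col]
  }
}

print(row_totals)
## [1] 18 21 24 27 30

# cross-check against the built-in
all.equal(row_totals, rowSums(matrix_A))
## [1] TRUE
```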

Applying this to a data frame

So now that we’ve shown how if statements, for loops, and while loops work, let’s start using these in practice with the “diamonds” data frame, which ships with ggplot2 and is loaded as part of the tidyverse.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(hexbin)

Let’s take a look at the structure of the variables:

str(diamonds)
## tibble[,10] [53,940 x 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

It looks like we have 10 variables: three ordered factors (cut, color, and clarity), one integer (price), and six numeric variables (carat, depth, table, x, y, and z).

Let’s create a new variable in the data frame by combining the mutate() function with ifelse(). We’ll call the variable “chonkers” and use it to flag any diamond of 2 carats or more; everything else is labeled “small potatoes”.

diamonds <- diamonds %>%
  mutate(chonkers = ifelse(carat >= 2.0,"chonker", "small potatoes"))

Cool, we now have a new variable in the data frame. Let’s filter to look at only the chonkers.

beeg_diamonds <- diamonds %>%
  filter(chonkers == "chonker")
head(beeg_diamonds)
## # A tibble: 6 x 11
##   carat cut     color clarity depth table price     x     y     z chonkers
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>   
## 1  2    Premium J     I1       61.5    59  5051  8.11  8.06  4.97 chonker 
## 2  2.06 Premium J     I1       61.2    58  5203  8.1   8.07  4.95 chonker 
## 3  2.14 Fair    J     I1       69.4    57  5405  7.74  7.7   5.36 chonker 
## 4  2.15 Fair    J     I1       65.5    57  5430  8.01  7.95  5.23 chonker 
## 5  2.22 Fair    J     I1       66.7    56  5607  8.04  8.02  5.36 chonker 
## 6  2    Fair    I     I1       66      60  5667  7.78  7.74  5.1  chonker
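A quick note on why ifelse() works inside mutate() when a plain if statement wouldn’t: ifelse() is vectorized, so it tests every element of the column and returns one label per element. Here’s a base R sketch on a small made-up vector (the values and the name size_label are just for illustration, not taken from the data frame):

```r
# hypothetical carat values, chosen to sit on both sides of the 2.0 cutoff
carat <- c(0.23, 2.00, 2.14, 1.50)

# ifelse() checks the condition element by element
size_label <- ifelse(carat >= 2.0, "chonker", "small potatoes")

# one label comes back for each input value
length(size_label)
## [1] 4
size_label[2]
## [1] "chonker"
```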

EDA and Plots

Let’s do a little bit of exploratory data analysis (EDA). It looks like there’s a really clear relationship between carat and price.

ggplot(data = diamonds, aes(x=carat, y = price)) + geom_point()

Let’s add “cut” as our color variable to see if that adds any useful information.

ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) + geom_point()

I find that hex plots give a better sense of where most of the diamonds fall, since overlapping points are grouped into bins and shaded by count.

ggplot(data = diamonds, aes(x=carat, y = price)) + geom_hex(bins=50)

Now only the chonkers:

ggplot(data = beeg_diamonds,aes(carat, price))+geom_hex(bins = 30)

Looking at that last graph, we see that there aren’t many expensive diamonds once they get bigger than 2.0 carats. Maybe there’s a diminishing return at some point, or a logarithmic relationship between the size of a diamond and its price?
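One way to start poking at the logarithmic idea is to put both axes on a log scale and see whether the cloud straightens out. This is only an exploratory sketch, not a fitted model, and I’m assuming ggplot2 and hexbin are installed as they were above.

```r
library(ggplot2)  # provides ggplot() and the diamonds data set
library(hexbin)   # needed by geom_hex()

# same hex plot as before, but with both axes on a log10 scale
log_plot <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_hex(bins = 50) +
  scale_x_log10() +
  scale_y_log10()
log_plot

# correlation between log(carat) and log(price) as a rough linearity check
log_cor <- cor(log(diamonds$carat), log(diamonds$price))
log_cor
```

If the log-log plot looks roughly like a straight band, that would point toward a power-law style relationship, which would be worth testing properly with a model.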