Hey there! This brief markdown will simply demonstrate my ability to create “if” statements, “for” loops, and “while” loops. These are nothing groundbreaking, but it’s good to have a log of skills.
In this example I’ve set a variable ‘x’ to 13 and written an if-else statement that prints ‘x is even’ if x is an even number and ‘x is not even’ if x is odd. We use the modulo operator (%%) to get the remainder of x divided by 2; if that remainder is 0, x is even.
x <- 13
if (x %% 2 == 0) {
  # the remainder is 0, so x is even
  print('x is even')
} else {
  # any other remainder means x is odd
  print('x is not even')
}
## [1] "x is not even"
# same check, but with x reassigned to 12
x <- 12
if (x %% 2 == 0) {
  # the remainder is 0, so x is even
  print('x is even')
} else {
  print('x is not even')
}
## [1] "x is even"
I didn’t include this above, but if you wanted to handle a third case, for conditions that aren’t binary, you could add an ‘else if’ clause with its own condition between the ‘if’ and the ‘else’. Note that R spells it as two words; ‘elseif’ is not valid R.
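As a minimal sketch of that third condition, the snippet below chains an ‘else if’ between the ‘if’ and the ‘else’. The function name `classify_number` is hypothetical and only used for illustration.

```r
# Hypothetical helper: classify a number using if / else if / else.
classify_number <- function(n) {
  if (n %% 2 == 0) {
    "divisible by 2"
  } else if (n %% 3 == 0) {
    # only reached when the first condition was FALSE
    "divisible by 3 but not by 2"
  } else {
    "divisible by neither 2 nor 3"
  }
}

print(classify_number(12))
## [1] "divisible by 2"
print(classify_number(9))
## [1] "divisible by 3 but not by 2"
print(classify_number(13))
## [1] "divisible by neither 2 nor 3"
```

The conditions are checked from top to bottom, and only the first branch whose condition is TRUE runs.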
Next, we want to put a vector into the correct order. There is no loop here, only if statements, and we won’t be returning a vector; instead we’ll print the values in descending order (largest to smallest).
y <- c(11, 4, 7)
# compare the first two numbers
if (y[1] > y[2]) {
  first <- y[1]
  second <- y[2]
} else {
  first <- y[2]
  second <- y[1]
}
# compare the third number against the two already placed
if (y[3] > first && y[3] > second) {
  # y[3] is the largest, so everything shifts down one place
  third <- second
  second <- first
  first <- y[3]
} else if (y[3] < first && y[3] < second) {
  # y[3] is the smallest
  third <- y[3]
} else {
  # y[3] falls between the other two
  third <- second
  second <- y[3]
}
print(paste(first, second, third))
## [1] "11 7 4"
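As a quick cross-check of the hand-written comparisons above, base R’s `sort()` with `decreasing = TRUE` should give the same ordering. This is only a sketch for comparison; the variable name `sorted_y` is introduced here for illustration.

```r
y <- c(11, 4, 7)

# sort() returns a new vector; decreasing = TRUE orders largest to smallest
sorted_y <- sort(y, decreasing = TRUE)

# collapse the sorted vector into a single string, like the paste() call above
print(paste(sorted_y, collapse = " "))
## [1] "11 7 4"
```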
Next up is a while loop, which keeps running for as long as its condition is true.
x <- 0
while (x < 10) {
  print(paste0('x is:', x))
  x <- x + 1
  if (x == 7) {
    print("x is equal to 7; break loop!")
    # break exits the loop early; without it the loop would keep running until x reaches 10
    break
  }
}
## [1] "x is:0"
## [1] "x is:1"
## [1] "x is:2"
## [1] "x is:3"
## [1] "x is:4"
## [1] "x is:5"
## [1] "x is:6"
## [1] "x is equal to 7; break loop!"
For loop syntax is seemingly simple (as all things in programming are), but gets complex quickly.
The basic for syntax looks like this:
for(variable in V){print(variable)}
Let’s see what this looks like in practice.
#sample vector
V <- c(1,2,3,4,5)
for (temp.var in V){
#Execute code for every temp.var in V
result <- temp.var + 1
print(paste('The temp.var plus 1 is equal to:', result))
}
## [1] "The temp.var plus 1 is equal to: 2"
## [1] "The temp.var plus 1 is equal to: 3"
## [1] "The temp.var plus 1 is equal to: 4"
## [1] "The temp.var plus 1 is equal to: 5"
## [1] "The temp.var plus 1 is equal to: 6"
In this for loop we see that for every element of V (held in temp.var), we create a new variable ‘result’ equal to temp.var + 1 and then print it inside a sentence. The loop ran 5 times, once for each element of V.
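The loop above only prints each result. If the values are needed afterwards, one common pattern is to preallocate a vector and fill it by index. The sketch below assumes the same V; the names `results` and `i` are introduced here for illustration.

```r
V <- c(1, 2, 3, 4, 5)

# preallocate a numeric vector with one slot per element of V
results <- numeric(length(V))

# seq_along(V) yields the indices 1, 2, ..., length(V)
for (i in seq_along(V)) {
  results[i] <- V[i] + 1
}

print(results)
## [1] 2 3 4 5 6
```

For a simple operation like this, the vectorised expression `V + 1` gives the same values without a loop.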
We can nest for loops inside of each other. We’ll practice this by creating a matrix this time instead of a vector.
#This matrix has 5 rows, and since it goes from 1-15, this means it'll have 3 columns
matrix_A <- matrix(1:15, nrow = 5)
matrix_A
##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    2    7   12
## [3,]    3    8   13
## [4,]    4    9   14
## [5,]    5   10   15
#for every row in matrix_A, and for every column in matrix_A, print the value
for (row in 1:nrow(matrix_A)){
for (col in 1:ncol(matrix_A)){
print(paste('The element at row:', row, 'and col:', col, 'is', matrix_A[row, col]))
}
}
## [1] "The element at row: 1 and col: 1 is 1"
## [1] "The element at row: 1 and col: 2 is 6"
## [1] "The element at row: 1 and col: 3 is 11"
## [1] "The element at row: 2 and col: 1 is 2"
## [1] "The element at row: 2 and col: 2 is 7"
## [1] "The element at row: 2 and col: 3 is 12"
## [1] "The element at row: 3 and col: 1 is 3"
## [1] "The element at row: 3 and col: 2 is 8"
## [1] "The element at row: 3 and col: 3 is 13"
## [1] "The element at row: 4 and col: 1 is 4"
## [1] "The element at row: 4 and col: 2 is 9"
## [1] "The element at row: 4 and col: 3 is 14"
## [1] "The element at row: 5 and col: 1 is 5"
## [1] "The element at row: 5 and col: 2 is 10"
## [1] "The element at row: 5 and col: 3 is 15"
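Nested loops can also accumulate values rather than just print them. As a small sketch on the same matrix, the inner loop below adds up each row one column at a time; `row_totals` is a name introduced here for illustration.

```r
matrix_A <- matrix(1:15, nrow = 5)

# one running total per row, starting at 0
row_totals <- numeric(nrow(matrix_A))

for (row in seq_len(nrow(matrix_A))) {
  for (col in seq_len(ncol(matrix_A))) {
    # add the current cell to the total for its row
    row_totals[row] <- row_totals[row] + matrix_A[row, col]
  }
}

print(row_totals)
## [1] 18 21 24 27 30
```

Base R’s `rowSums(matrix_A)` returns the same totals and is a handy way to check the loop.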
So now that we’ve shown how if statements, for loops, and while loops work, let’s start using these in practice with the “diamonds” data frame, which ships with ggplot2 and is available once the tidyverse is loaded.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.1 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(hexbin)
Let’s take a look at the structure of the variables:
str(diamonds)
## tibble[,10] [53,940 x 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
It looks like we have 10 variables: three ordered factors (cut, color, and clarity), one integer (price), and six numeric variables (carat, depth, table, x, y, and z).
Let’s create a new variable in the data frame by combining the mutate() function with an ifelse() statement. We’ll call the variable “chonkers” and use it to flag any diamond of 2 carats or more.
diamonds <- diamonds %>%
  mutate(chonkers = ifelse(carat >= 2.0, "chonker", "small potatoes"))
Cool, we now have a new variable in the data frame. Let’s filter to look at only the chonkers.
beeg_diamonds <- diamonds %>%
filter(chonkers == "chonker")
head(beeg_diamonds)
## # A tibble: 6 x 11
##   carat cut     color clarity depth table price     x     y     z chonkers
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>
## 1  2    Premium J     I1       61.5    59  5051  8.11  8.06  4.97 chonker
## 2  2.06 Premium J     I1       61.2    58  5203  8.1   8.07  4.95 chonker
## 3  2.14 Fair    J     I1       69.4    57  5405  7.74  7.7   5.36 chonker
## 4  2.15 Fair    J     I1       65.5    57  5430  8.01  7.95  5.23 chonker
## 5  2.22 Fair    J     I1       66.7    56  5607  8.04  8.02  5.36 chonker
## 6  2    Fair    I     I1       66      60  5667  7.78  7.74  5.1  chonker
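To tie this back to the loops from earlier, the same labelling can be sketched with a for loop and an if-else, then compared against ifelse(). The carat values below are made up for illustration and are not taken from diamonds; `carats`, `labels_loop`, and `labels_vec` are hypothetical names.

```r
# Hypothetical carat values, made up for illustration (not from diamonds)
carats <- c(0.23, 2.00, 1.50, 2.14, 0.90)

# Loop version: one if/else decision per element
labels_loop <- character(length(carats))
for (i in seq_along(carats)) {
  if (carats[i] >= 2.0) {
    labels_loop[i] <- "chonker"
  } else {
    labels_loop[i] <- "small potatoes"
  }
}

# Vectorised version: ifelse() tests every element in one call
labels_vec <- ifelse(carats >= 2.0, "chonker", "small potatoes")

print(labels_loop)
print(identical(labels_loop, labels_vec))
## [1] TRUE
```

A plain if only looks at a single TRUE or FALSE, which is why the vectorised ifelse() is the natural fit inside mutate(), where the condition is evaluated on a whole column at once.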
Let’s do a little bit of exploratory data analysis (EDA). Looks like there’s a really clear relationship between carat and price.
ggplot(data = diamonds, aes(x=carat, y = price)) + geom_point()
Let’s map “cut” to the color aesthetic to see if that adds any useful information.
ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) + geom_point()
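I find that hex plots give a better sense of where most of the diamonds sit, since they color each bin by how many points fall in it instead of piling the points on top of each other.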
ggplot(data = diamonds, aes(x=carat, y = price)) + geom_hex(bins=50)
#now only chonkers
ggplot(data = beeg_diamonds,aes(carat, price))+geom_hex(bins = 30)
Looking at that last graph, we see that there aren’t many expensive diamonds as they get bigger than 2.0 carats. Maybe there’s a diminishing return at some point, or a logarithmic relationship between the size of a diamond and its price?
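One way to eyeball that question, sketched here rather than offered as a definitive test, is to redraw the hex plot with both axes on a log10 scale. If price grows roughly as a power of carat, the cloud should look closer to a straight line. The object name `p_log` is introduced here for illustration, and the sketch assumes ggplot2 and hexbin are installed, as they are earlier in this document.

```r
library(ggplot2)

# Same hex plot as before, but with both axes on a log10 scale.
# geom_hex() needs the hexbin package to be installed when the plot is drawn.
p_log <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_hex(bins = 50) +
  scale_x_log10() +
  scale_y_log10()

p_log
```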