R for Climate Data Analysis

R has proven to be an invaluable tool for climate data analysis, helping researchers and scientists uncover patterns and trends in complex environmental datasets. Its versatility in handling diverse types of data, together with its extensive library of specialized packages, lets analysts efficiently process, visualize, and model climate information. From temperature fluctuations and precipitation trends to sea-level rise projections and ecosystem dynamics, R’s statistical functions and graphical capabilities provide the means to extract meaningful insights from raw climate data. Whether you are examining historical records or real-time sensor readings, R’s user-friendly syntax and rich visualization options make it easier to explore climate phenomena and build a deeper understanding of our planet’s ever-evolving climatic processes.

R- Basic Syntax

By convention, we start learning R by writing a “Hello, World!” program. Depending on your needs, you can type commands at the R command prompt or save them in an R script file. The example below works either way.

# My first program in R Programming Language
myString <- "Hello, there!, My Name is ------, and I'm ready for my first journey on R" 
print(myString)
## [1] "Hello, there!, My Name is ------, and I'm ready for my first journey on R"

R- Data Types

In any programming language, you use variables to store information. A variable is a name that refers to a value held in memory.

In languages such as C and Java, a variable is declared with a data type (character, integer, floating point, Boolean, and so on). In R, variables are not declared with a type. Instead, a variable is assigned an R object, and the type of that object becomes the type of the variable. There are many types of R objects, and the most frequently used ones are:

  1. Vectors
  2. Lists
  3. Matrices
  4. Arrays
  5. Factors
  6. Data Frames

The simplest of these objects is the vector, and there are six data types of atomic vectors, also called the six classes of vectors. The other R objects are built upon atomic vectors.
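As a minimal sketch of the six atomic types mentioned above, the snippet below stores one illustrative value per type and prints its class. The list name examples and the sample values are assumptions chosen for illustration.

```r
# One example value for each of the six atomic vector types.
examples <- list(
  logical   = TRUE,
  numeric   = 23.5,
  integer   = 2L,
  complex   = 2 + 5i,
  character = "Sunny",
  raw       = charToRaw("Hello")
)

# class() reports the class of each value.
for (name in names(examples)) {
  cat(name, ":", class(examples[[name]]), "\n")
}
```

Note that a plain number such as 23.5 has class numeric; the L suffix is what makes 2L an integer.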

Vectors

To create a vector with more than one element, use the c() function, which combines its arguments into a vector.

# Create a vector.
weather <- c('Sunny','Cloudy',"Rainy")
print(weather)
## [1] "Sunny"  "Cloudy" "Rainy"
# Get the class of the vector.
print(class(weather))
## [1] "character"

List

A list is an R object that can contain elements of many different types, such as vectors, functions, and even other lists.

# Create a list
weather_list <- list(c(24,27,25),28.5,sin)

# Print the list.
print(weather_list)
## [[1]]
## [1] 24 27 25
## 
## [[2]]
## [1] 28.5
## 
## [[3]]
## function (x)  .Primitive("sin")

Matrices

A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix() function.

# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
##      [,1] [,2] [,3]
## [1,] "a"  "a"  "b" 
## [2,] "c"  "b"  "a"

Arrays

While matrices are confined to two dimensions, arrays can have any number of dimensions. The array() function takes a dim argument that sets the size of each dimension. The example below creates an array holding two 3x3 matrices; the three input values are recycled to fill all 18 cells.

# Create an array.
weather <- array(c('Rainy','Cloudy','Sunny'),dim = c(3,3,2))
print(weather)
## , , 1
## 
##      [,1]     [,2]     [,3]    
## [1,] "Rainy"  "Rainy"  "Rainy" 
## [2,] "Cloudy" "Cloudy" "Cloudy"
## [3,] "Sunny"  "Sunny"  "Sunny" 
## 
## , , 2
## 
##      [,1]     [,2]     [,3]    
## [1,] "Rainy"  "Rainy"  "Rainy" 
## [2,] "Cloudy" "Cloudy" "Cloudy"
## [3,] "Sunny"  "Sunny"  "Sunny"
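Individual array elements are addressed with one index per dimension. The sketch below uses a numeric array (the name temps and its values are illustrative assumptions) so that each cell is easy to identify.

```r
# A 3 x 3 x 2 array filled column by column with the numbers 1 to 18.
temps <- array(1:18, dim = c(3, 3, 2))

# Row 2, column 3 of the first matrix.
print(temps[2, 3, 1])

# The entire second matrix (leave the row and column indices empty).
print(temps[, , 2])

# The dimensions of the array.
print(dim(temps))
```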

Factors

Factors are R objects created from a vector. A factor stores the vector along with its distinct values, which become the levels of the factor. The levels are always stored as character strings, whether the input vector is numeric, character, or Boolean. Factors are useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function gives the number of levels.

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
temp_level <- c(27,28,30,22,24,23,23.5,31)
# Create a factor object.
factor_apple <- factor(apple_colors)
factor_temp <- factor(temp_level)
# Print the factor.
print(factor_apple)
## [1] green  green  yellow red    red    red    green 
## Levels: green red yellow
print(nlevels(factor_apple))
## [1] 3
print(factor_temp)
## [1] 27   28   30   22   24   23   23.5 31  
## Levels: 22 23 23.5 24 27 28 30 31
print(nlevels(factor_temp))
## [1] 8
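By default, factor() orders the levels alphabetically (or numerically). When the categories have a natural order, the levels argument can set it explicitly. The sketch below assumes a made-up vector of climate conditions.

```r
# Illustrative observations (made-up values).
condition <- c("Wet", "Drought", "Near Normal", "Drought", "Wet", "Wet")

# Set the level order explicitly instead of relying on alphabetical order.
condition_factor <- factor(condition,
                           levels = c("Drought", "Near Normal", "Wet"))

print(levels(condition_factor))

# table() counts the observations in each level, in level order.
print(table(condition_factor))
```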

Data Frames

Data frames are tabular data objects. Unlike a matrix, each column of a data frame can hold a different mode of data: the first column can be numeric, the second character, and the third logical. A data frame is a list of vectors of equal length.

Data Frames are created using the data.frame() function.

# Create the data frame.
BMI <-  data.frame(
   gender = c("Male", "Male","Female"), 
   height = c(152, 171.5, 165), 
   weight = c(81,93, 78),
   Age = c(42,38,26)
)
print(BMI)
##   gender height weight Age
## 1   Male  152.0     81  42
## 2   Male  171.5     93  38
## 3 Female  165.0     78  26
# Create a data frame
df <- data.frame(indeks = c(0, 1, 1.5, -0.5, -1, -1, -1.5, 0.5, 0, 1, 0.5,1))
bulan <- seq(as.Date("2022-01-01"), 
             as.Date("2022-12-01"),
             by = "month")
df$bulan <- bulan
# Replace the dates with abbreviated month names
df$bulan <- month.abb
# Add SPI/SPEI category labels directly to df
df$spi_kategori <- cut(df$indeks, 
                       breaks = c(-Inf, -2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2, Inf),
                       labels = c("Extreme Drought", "Severe Drought", "Moderate Drought",
                                  "Mild Drought", "Near Normal", "Mild Wet", 
                                  "Moderate Wet", "Severe Wet", "Extreme Wet"))
df
##    indeks bulan     spi_kategori
## 1     0.0   Jan      Near Normal
## 2     1.0   Feb         Mild Wet
## 3     1.5   Mar     Moderate Wet
## 4    -0.5   Apr     Mild Drought
## 5    -1.0   May Moderate Drought
## 6    -1.0   Jun Moderate Drought
## 7    -1.5   Jul   Severe Drought
## 8     0.5   Aug      Near Normal
## 9     0.0   Sep      Near Normal
## 10    1.0   Oct         Mild Wet
## 11    0.5   Nov      Near Normal
## 12    1.0   Dec         Mild Wet
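Once a data frame exists, columns are accessed with $ and rows can be filtered with a logical condition. The following is a small sketch; the data frame stations and its values are made up for illustration.

```r
# A small data frame with one column per data mode (made-up values).
stations <- data.frame(
  station = c("A", "B", "C"),
  rain_mm = c(120.5, 80.0, 15.2),
  flooded = c(TRUE, FALSE, FALSE)
)

# Show the structure: one line per column with its type.
str(stations)

# Access a single column as a vector.
print(stations$rain_mm)

# Keep only the rows where rainfall exceeds 50 mm.
wet <- stations[stations$rain_mm > 50, ]
print(wet)
```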

R- Variables

A variable provides named storage that our programs can manipulate. A variable in R can store an atomic vector, a group of atomic vectors, or a combination of many R objects. A valid variable name consists of letters, numbers, and the dot or underscore characters. The name must start with a letter, or with a dot that is not followed by a number.

Variable Name          Validity   Reason
var_name2.             Valid      Has letters, numbers, dot and underscore
var_name%              Invalid    Has the character ‘%’. Only dot (.) and underscore are allowed
2var_name              Invalid    Starts with a number
.var_name, var.name    Valid      Can start with a dot (.), but the dot must not be followed by a number
.2var_name             Invalid    The starting dot is followed by a number
_var_name              Invalid    Starts with _, which is not valid
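One way to check these rules in code is the base function make.names(), which rewrites a string into a syntactically valid name. A name that comes back unchanged was already valid. This is only a sketch; the vector candidates is an illustrative assumption.

```r
# Candidate names to check against the naming rules.
candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")

# make.names() leaves valid names untouched and repairs invalid ones.
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```

Reserved words such as if are also altered by make.names(), so they are reported as invalid by this check as well.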

Variable Assignment

Variables can be assigned values using the leftward (<-), rightward (->), and equal (=) operators. The values of variables can be printed using the print() or cat() function. The cat() function combines multiple items into one continuous output.

# Assignment using equal operator.
var.1 = c(0,1,2,3)           

# Assignment using leftward operator.
var.2 <- c("learn","R")   

# Assignment using rightward operator.   
c(TRUE,1) -> var.3           

print(var.1)
## [1] 0 1 2 3
cat ("var.1 is ", var.1 ,"\n")
## var.1 is  0 1 2 3
cat ("var.2 is ", var.2 ,"\n")
## var.2 is  learn R
cat ("var.3 is ", var.3 ,"\n")
## var.3 is  1 1

Note − The vector c(TRUE,1) mixes the logical and numeric classes, so the logical value is coerced to numeric, turning TRUE into 1.
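The coercion in the note follows a fixed order: logical, then integer, then numeric, then complex, then character. When c() receives mixed types, every element is converted to the most general type present. A short sketch:

```r
# Each call mixes two types; class() shows the type that wins.
print(class(c(TRUE, 1L)))      # logical + integer
print(class(c(TRUE, 1)))       # logical + numeric
print(class(c(1, 2 + 3i)))     # numeric + complex
print(class(c(1, "a")))        # numeric + character
```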

Data Type of a Variable

In R, a variable is not declared with a data type; it takes the data type of the R object assigned to it. R is therefore called a dynamically typed language, which means the data type of the same variable can change again and again as a program runs.

var_x <- "Hello"
cat("The class of var_x is ",class(var_x),"\n")
## The class of var_x is  character
var_x <- 34.5
cat("  Now the class of var_x is ",class(var_x),"\n")
##   Now the class of var_x is  numeric
var_x <- 27L
cat("   Next the class of var_x becomes ",class(var_x),"\n")
##    Next the class of var_x becomes  integer

Finding Variables

To know all the variables currently available in the workspace we use the ls() function. Also the ls() function can use patterns to match the variable names.

print(ls())
##  [1] "apple_colors" "BMI"          "bulan"        "df"           "factor_apple"
##  [6] "factor_temp"  "M"            "myString"     "temp_level"   "var.1"       
## [11] "var.2"        "var.3"        "var_x"        "weather"      "weather_list"

Note − This is sample output; yours depends on which variables are defined in your environment. The example below uses a pattern to match variable names.

# List the variables starting with the pattern "var".
print(ls(pattern = "var"))  
## [1] "var.1" "var.2" "var.3" "var_x"

Variables whose names start with a dot (.) are hidden. They can be listed by passing all.names = TRUE to ls(). No hidden variables were created in this session, so the output below is unchanged.

# List all variables, including hidden ones.
print(ls(all.names = TRUE))
##  [1] "apple_colors" "BMI"          "bulan"        "df"           "factor_apple"
##  [6] "factor_temp"  "M"            "myString"     "temp_level"   "var.1"       
## [11] "var.2"        "var.3"        "var_x"        "weather"      "weather_list"

Deleting Variables

Variables can be deleted using the rm() function. Below we delete the variable var.3; trying to print it afterwards throws an error.

rm(var.3)
print(var.3)
## Error in print(var.3) : object 'var.3' not found

All the variables can be deleted by using the rm() and ls() function together.

rm(list = ls())
print(ls())
## character(0)
# The environment is now empty.

R- Operators

An operator is a symbol that tells the interpreter to perform a specific mathematical or logical manipulation. R is rich in built-in operators and provides the following types of operators.

Arithmetic Operators

All arithmetic operators work element by element. The examples use the vectors a <- c(2, 4) and b <- c(5, 6).

Operator   Description                                             Example          Output
+          Adds two vectors                                        print(a + b)     [1]  7 10
-          Subtracts the second vector from the first              print(a - b)     [1] -3 -2
*          Multiplies both vectors                                 print(a * b)     [1] 10 24
/          Divides the first vector by the second                  print(a / b)     [1] 0.4000000 0.6666667
%%         Remainder of dividing the first vector by the second    print(a %% b)    [1] 2 4
%/%        Integer quotient of dividing the first by the second    print(a %/% b)   [1] 0 0
^          Raises the first vector to the power of the second      print(a ^ b)     [1]   32 4096

Relational Operators

The following table shows the relational operators supported by R. Each element of the first vector is compared with the corresponding element of the second vector. The result of each comparison is a Boolean value. The examples use the vectors a <- c(2, 4) and b <- c(5, 6).

Operator   Description (first vector vs. second, element by element)   Example          Output
>          Greater than                                                print(a > b)     [1] FALSE FALSE
<          Less than                                                   print(a < b)     [1] TRUE TRUE
==         Equal to                                                    print(a == b)    [1] FALSE FALSE
<=         Less than or equal to                                       print(a <= b)    [1] TRUE TRUE
>=         Greater than or equal to                                    print(a >= b)    [1] FALSE FALSE
!=         Not equal to                                                print(a != b)    [1] TRUE TRUE

Logical Operators

The following examples show the logical operators supported by R. They apply only to vectors of type logical, numeric, or complex. Zero is treated as FALSE and every non-zero number as TRUE.

Each element of the first vector is compared with the corresponding element of the second vector. The result of comparison is a Boolean value.

## Operator &
a <- c(3,1,TRUE, 2+3i)
b <- c(4,1, FALSE, 2+3i)
print(a&b)
## [1]  TRUE  TRUE FALSE  TRUE

This is the element-wise logical AND operator. It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if both elements are TRUE.

## Operator |
a <- c(3,0,TRUE, 2+3i)
b <- c(4,0, FALSE, 2+3i)
print(a|b)
## [1]  TRUE FALSE  TRUE  TRUE

This is the element-wise logical OR operator. It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if at least one of the elements is TRUE.

## Operator !
a <- c(3,0,TRUE, 2+3i)
print(!a)
## [1] FALSE  TRUE FALSE FALSE

This is the logical NOT operator. It takes each element of the vector and gives the opposite logical value.
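When numbers are used with logical operators, R treats zero as FALSE and any non-zero number, including negative numbers and fractions, as TRUE. The sketch below makes that conversion visible with as.logical(); the vector values is illustrative.

```r
# Zero is FALSE; every other number is TRUE.
values <- c(-2, 0, 0.5, 1, 3)
as_logical <- as.logical(values)
print(as_logical)

# The NOT operator applies the same rule before negating.
print(!values)
```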

Assignment Operators

These operators are used to assign values to variables.

# Left assignment (<-, <<- or =)
a1 <- c(3,1,TRUE,2+3i)
b2 <<- c(3,1,TRUE,2+3i)
c3 = c(3,1,TRUE,2+3i)
print(a1)
## [1] 3+0i 1+0i 1+0i 2+3i
print(b2)
## [1] 3+0i 1+0i 1+0i 2+3i
print(c3)
## [1] 3+0i 1+0i 1+0i 2+3i
# Right assignment (-> or ->>)
c(3,1,TRUE,2+3i) -> a1
c(3,1,TRUE,2+3i) ->> b2
print(a1)
## [1] 3+0i 1+0i 1+0i 2+3i
print(b2)
## [1] 3+0i 1+0i 1+0i 2+3i
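At the top level, <- and <<- behave the same, which is why the outputs above are identical. The difference appears inside a function: <- creates a local variable, while <<- assigns to a variable in an enclosing environment. A minimal sketch, with illustrative names counter, increment_local, and increment_global:

```r
counter <- 0

increment_local <- function() {
  # <- creates a new local variable; the outer counter is untouched.
  counter <- counter + 1
  counter
}

increment_global <- function() {
  # <<- modifies the counter defined outside the function.
  counter <<- counter + 1
  counter
}

increment_local()
after_local <- counter    # the outer counter has not changed

increment_global()
after_global <- counter   # the outer counter has been incremented

print(c(after_local, after_global))
```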

Miscellaneous Operators

These operators are used for specific purposes rather than general mathematical or logical computation.

# :
a <- 1:8
print(a)
## [1] 1 2 3 4 5 6 7 8

Colon operator. It creates the series of numbers in sequence for a vector.

# %in%
a <- 8
b <- 15
c <- 1:10
print(a %in% c)
## [1] TRUE
print(b %in% c)
## [1] FALSE

This operator is used to identify if an element belongs to a vector.

# %*%
M = matrix( c(2,6,5,1,10,4), nrow = 2,ncol = 3,byrow = TRUE)
# Store the product under its own name so the built-in t() is not shadowed.
product = M %*% t(M)
print(M)
##      [,1] [,2] [,3]
## [1,]    2    6    5
## [2,]    1   10    4
print(t(M))
##      [,1] [,2]
## [1,]    2    1
## [2,]    6   10
## [3,]    5    4
print(product)
##      [,1] [,2]
## [1,]   65   82
## [2,]   82  117

This operator performs matrix multiplication. The example multiplies a matrix by its transpose.
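The %*% operator is not limited to a matrix and its own transpose: it multiplies any two matrices whose inner dimensions agree (the columns of the first must equal the rows of the second). A small sketch with two made-up matrices A and B:

```r
# A is 2 x 3 and B is 3 x 2, so A %*% B is 2 x 2.
A <- matrix(1:6, nrow = 2)
B <- matrix(c(1, 0, 2, 1, 0, 1), nrow = 3)

C <- A %*% B
print(C)
print(dim(C))

# By contrast, * multiplies element by element and needs equal shapes.
print(A * A)
```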

R- Decision Making

Decision making structures require the programmer to specify one or more conditions to be evaluated or tested by the program, along with a statement or statements to be executed if the condition is determined to be true, and optionally, other statements to be executed if the condition is determined to be false.

If statement

An if statement consists of a Boolean expression followed by one or more statements.

if(boolean_expression) {
   # statement(s) will execute if the boolean expression is true.
}

If the Boolean expression evaluates to true, the block of code inside the if statement is executed. If it evaluates to false, execution continues with the first statement after the end of the if statement (after the closing curly brace).

x <- 30L
if(is.integer(x)){
  print("X is an Integer")
}
## [1] "X is an Integer"

If Else Statement

if(boolean_expression) {
   # statement(s) will execute if the boolean expression is true.
} else {
   # statement(s) will execute if the boolean expression is false.
}

x <- c("what", "is", "truth")
if("Truth" %in% x) {
   print("Truth is found")
} else {
   print("Truth is not found")
}
## [1] "Truth is not found"

Here “Truth” and “truth” are two different strings, because string comparison in R is case-sensitive.

The if…else if…else Statement

An if statement can be followed by an optional else if…else statement, which is useful for testing several conditions with a single if…else if statement.

When using if, else if, else statements there are a few points to keep in mind.

  • An if can have zero or one else and it must come after any else if’s.

  • An if can have zero to many else if’s and they must come before the else.

  • Once an else if succeeds, none of the remaining else if’s or else’s will be tested.

x <- c("what","is","truth")

if("Truth" %in% x) {
   print("Truth is found the first time")
} else if ("truth" %in% x) {
   print("truth is found the second time")
} else {
   print("No truth found")
}
## [1] "truth is found the second time"

R- Loops

There may be situations where you need to execute a block of code several times. In general, statements are executed sequentially: the first statement in a function is executed first, followed by the second, and so on.

Programming languages provide various control structures that allow for more complicated execution paths.

A loop statement allows us to execute a statement or group of statements multiple times.

Repeat Loop

A repeat loop executes a block of statements again and again until a break statement is reached.

The basic syntax for creating a repeat loop in R is

repeat { 
   commands 
   if(condition) {
      break
   }
}

Break Statement

v <- c("Hello","loop")
cnt <- 2

repeat {
   print(v)
   cnt <- cnt+1
   
   if(cnt > 5) {
      break
   }
}
## [1] "Hello" "loop" 
## [1] "Hello" "loop" 
## [1] "Hello" "loop" 
## [1] "Hello" "loop"

While Loop

Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body.

The While loop executes the same code again and again until a stop condition is met.

while (test_expression) {
   statement
}

init <- 1
while (init <5){
  init = init +1
  print(init)
}
## [1] 2
## [1] 3
## [1] 4
## [1] 5

For Loop

A For loop is a repetition control structure that allows you to efficiently write a loop that needs to execute a specific number of times.

me <- c(0,50,100)
for (you in me){
  print(paste(you, "% Love Me"))
}
## [1] "0 % Love Me"
## [1] "50 % Love Me"
## [1] "100 % Love Me"
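A for loop can also iterate over positions rather than values, which is handy when both the index and the element are needed. The sketch below uses seq_along() for this; the vector rain_mm and its values are made up for illustration.

```r
# Daily rainfall in mm (made-up values).
rain_mm <- c(12.4, 0, 55.1)

total <- 0
for (i in seq_along(rain_mm)) {
  # i is the position; rain_mm[i] is the value at that position.
  cat("Day", i, ":", rain_mm[i], "mm\n")
  total <- total + rain_mm[i]
}
cat("Total:", total, "mm\n")
```

Using seq_along(x) instead of 1:length(x) is the safer habit, because it yields an empty sequence when x is empty.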

Examples

For Statement

The example below prints a simplified SPI/SPEI category for each of the index values.

indeks <- c(-2, -1.5, -1, -0.5, 0, 1, 1.5, 2)

for (SPI in indeks) {
  if (SPI > 1) {
    print("Wet")
  } else if (SPI < -1) {
    print("Drought")
  } else {
    print("Near Normal")
  }
}
## [1] "Drought"
## [1] "Drought"
## [1] "Near Normal"
## [1] "Near Normal"
## [1] "Near Normal"
## [1] "Near Normal"
## [1] "Wet"
## [1] "Wet"

While Statement

The loop stops executing once the condition evaluates to FALSE.

init <- 1
while(init <5){
  print(init)
  init = init + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
indeks <- c(-2, -1.5, -1, -0.5, 0, 1, 1.5, 2)
i <- 1

while (i <= length(indeks)) {
  SPI <- indeks[i]
  
  if (SPI > 1) {
    print(paste("SPI =", SPI, "is categorized as Wet"))
  } else if (SPI < -1) {
    print(paste("SPI =", SPI, "is categorized as Drought"))
  } else {
    print(paste("SPI =", SPI, "is categorized as Near Normal"))
  }
  
  i <- i + 1
}
## [1] "SPI = -2 is categorized as Drought"
## [1] "SPI = -1.5 is categorized as Drought"
## [1] "SPI = -1 is categorized as Near Normal"
## [1] "SPI = -0.5 is categorized as Near Normal"
## [1] "SPI = 0 is categorized as Near Normal"
## [1] "SPI = 1 is categorized as Near Normal"
## [1] "SPI = 1.5 is categorized as Wet"
## [1] "SPI = 2 is categorized as Wet"

Break Statement

Stops execution of the loop as soon as the condition is met.

# The loop stops when the rainfall value (CH) exceeds 50
ch <- c(0,90,5,10)
for (chval in ch){
  if (chval > 50){
    print("Rainfall (CH) is more than 50 mm")
    break
  }
}
## [1] "Rainfall (CH) is more than 50 mm"
indeks <- c(-2, -1.5, -1, -0.5, 0, 1, 1.5, 2)

for (SPI in indeks) {
  if (SPI > 1) {
    print(paste("SPI =", SPI, "is categorized as Wet"))
  } else if (SPI < -1) {
    print(paste("SPI =", SPI, "is categorized as Drought"))
  } else {
    print(paste("SPI =", SPI, "is categorized as Near Normal"))
  }
  
  # Add a break statement here if you want to exit the loop
  if (SPI == 1.5) {
    break
  }
}
## [1] "SPI = -2 is categorized as Drought"
## [1] "SPI = -1.5 is categorized as Drought"
## [1] "SPI = -1 is categorized as Near Normal"
## [1] "SPI = -0.5 is categorized as Near Normal"
## [1] "SPI = 0 is categorized as Near Normal"
## [1] "SPI = 1 is categorized as Near Normal"
## [1] "SPI = 1.5 is categorized as Wet"

Next Statement

Skips the rest of the current iteration and continues with the next one.

# The loop skips to the next iteration when chval > 50
ch <- c(0, 90, 0, 3)
for (chval in ch) {
  if (chval > 50) {
    next
  } else {
    print("Rainfall (CH) is not more than 50 mm")
  }
}
## [1] "Rainfall (CH) is not more than 50 mm"
## [1] "Rainfall (CH) is not more than 50 mm"
## [1] "Rainfall (CH) is not more than 50 mm"
indeks <- c(-2, -1.5, -1, -0.5, 0, 1, 1.5, 2)

for (SPI in indeks) {
  if (SPI > 1) {
    print(paste("SPI =", SPI, "is categorized as Wet"))
    next
  } else if (SPI < -1) {
    print(paste("SPI =", SPI, "is categorized as Drought"))
    next
  } else {
    print(paste("SPI =", SPI, "is categorized as Near Normal"))
    next
  }
}
## [1] "SPI = -2 is categorized as Drought"
## [1] "SPI = -1.5 is categorized as Drought"
## [1] "SPI = -1 is categorized as Near Normal"
## [1] "SPI = -0.5 is categorized as Near Normal"
## [1] "SPI = 0 is categorized as Near Normal"
## [1] "SPI = 1 is categorized as Near Normal"
## [1] "SPI = 1.5 is categorized as Wet"
## [1] "SPI = 2 is categorized as Wet"

Return Statement

The return() statement sends a value back from a function created with function().

fun1 <- function(ch) {
  if (ch >= 1) {
    result <- "Rain"
  } else {
    result <- "No Rain"
  }
  return(result)
}
fun1(2)
## [1] "Rain"
fun1(0.5)
## [1] "No Rain"
categorize_SPI <- function(indeks) {
  results <- character(length(indeks))  # Preallocate a vector for results

  for (i in seq_along(indeks)) {
    SPI <- indeks[i]

    if (SPI > 1) {
      results[i] <- paste("SPI =", SPI, "is categorized as Wet")
    } else if (SPI < -1) {
      results[i] <- paste("SPI =", SPI, "is categorized as Drought")
    } else {
      results[i] <- paste("SPI =", SPI, "is categorized as Near Normal")
    }
  }

  return(results)  # Return the categorized results
}

indeks <- c(-2, -1.5, -1, -0.5, 0, 1, 1.5, 2)
categorized_results <- categorize_SPI(indeks)
print(categorized_results)
## [1] "SPI = -2 is categorized as Drought"      
## [2] "SPI = -1.5 is categorized as Drought"    
## [3] "SPI = -1 is categorized as Near Normal"  
## [4] "SPI = -0.5 is categorized as Near Normal"
## [5] "SPI = 0 is categorized as Near Normal"   
## [6] "SPI = 1 is categorized as Near Normal"   
## [7] "SPI = 1.5 is categorized as Wet"         
## [8] "SPI = 2 is categorized as Wet"
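Because comparison operators work on whole vectors, the same three-way rule can also be written without an explicit loop. The following is a sketch of one alternative using nested ifelse() calls from base R; the function name categorize_SPI_vec is a hypothetical name introduced here, not part of the original example.

```r
# Vectorized version of the rule: > 1 is Wet, < -1 is Drought,
# everything else is Near Normal.
categorize_SPI_vec <- function(indeks) {
  ifelse(indeks > 1, "Wet",
         ifelse(indeks < -1, "Drought", "Near Normal"))
}

indeks <- c(-2, -1.5, -1, -0.5, 0, 1, 1.5, 2)
categories <- categorize_SPI_vec(indeks)
print(paste("SPI =", indeks, "is categorized as", categories))
```

The loop version is easier to step through while learning; the vectorized version is shorter and usually faster on long series.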

R- Functions

In R, a function is a block of organized, reusable code designed to perform a specific task. It lets you encapsulate a sequence of commands into a single unit that can be called multiple times with different inputs. Functions are an essential concept in programming: they help modularize code, improve readability, and promote reuse.

Here’s a breakdown of the key components of an R function:

  1. Function Name: This is the identifier you give to your function. It should be a meaningful name that describes the purpose of the function.

  2. Arguments: Functions can take one or more arguments, which are inputs provided when the function is called. Arguments specify the data or values the function will operate on. Arguments are enclosed in parentheses and separated by commas.

  3. Function Body: This is where you define the sequence of statements that the function will execute when called. It includes the code that performs the desired task. The function body is enclosed in curly braces {}.

  4. Return Value: A function can produce a result that is returned to the caller. You use the return() statement to specify the value you want the function to return. If there is no return() statement, the function will return the value of the last evaluated expression.

Here’s a simple example of an R function:

# Define a function that calculates the square of a number
square <- function(x) {
  result <- x^2
  return(result)
}

# Call the function and store the result
squared_value <- square(5)
print(squared_value)  # Output: 25
## [1] 25

In this example:

- The function name is square.
- The function takes an argument x.
- The function calculates the square of x using the expression x^2.
- The result is returned using the return() statement.

You can define your own functions in R and use them to encapsulate complex logic, avoid code repetition, and make your code more organized and readable.
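Two of the points above are easy to see in a slightly larger sketch: an argument can have a default value, and a function without return() returns its last evaluated expression. The function to_fahrenheit and its digits argument are illustrative assumptions.

```r
# Convert Celsius to Fahrenheit, rounded to a chosen number of digits.
# digits has a default value, so it can be omitted by the caller.
to_fahrenheit <- function(celsius, digits = 1) {
  # No return(): the value of this last expression is returned.
  round(celsius * 9 / 5 + 32, digits)
}

print(to_fahrenheit(25))
print(to_fahrenheit(c(0, 36.6), digits = 0))
```

Because the arithmetic is vectorized, the same function accepts a single temperature or a whole vector of them.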

R has many built-in functions that can be called directly without defining them first. We can also create and use our own functions, referred to as user-defined functions.

Simple examples of built-in functions are seq(), mean(), max(), sum(), and paste(). They are called directly by user-written programs.

# Create a sequence of numbers from 32 to 44.
print(seq(32,44))
##  [1] 32 33 34 35 36 37 38 39 40 41 42 43 44
# Find mean of numbers from 25 to 82.
print(mean(25:82))
## [1] 53.5
# Find the sum of numbers from 41 to 68.
print(sum(41:68))
## [1] 1526

R- Strings

Strings in R are often combined using the paste() function, which accepts any number of arguments.

The basic syntax of the paste function is

paste(..., sep = " ", collapse = NULL)

Following is the description of the parameters used −

  • (…) represents any number of arguments to be combined.

  • sep represents the separator placed between the arguments. It is optional and defaults to a single space.

  • collapse is an optional string used to join the elements of the result into a single string. It does not affect spaces inside the individual strings.

a <- "Hello"
b <- 'How'
c <- "are you? "

print(paste(a,b,c))
## [1] "Hello How are you? "
print(paste(a,b,c, sep = "-"))
## [1] "Hello-How-are you? "
print(paste(a,b,c, sep = "", collapse = ""))
## [1] "HelloHoware you? "
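The effect of collapse is clearer when one of the arguments is a vector with several elements: sep joins the arguments element by element, and collapse then joins the resulting elements into one string. A short sketch with an illustrative vector months:

```r
months <- c("Jan", "Feb", "Mar")

# sep joins each month with 2022, giving a vector of three strings.
labels <- paste(months, 2022, sep = "-")
print(labels)

# collapse joins the elements of one vector into a single string.
one_line <- paste(months, collapse = ", ")
print(one_line)

# Both together: first sep, then collapse.
combined <- paste(months, 2022, sep = "-", collapse = "; ")
print(combined)
```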

Formatting Numbers and Strings

The basic syntax for format function is

format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none")) 

Following is the description of the parameters used −

  • x is the vector input.

  • digits is the total number of significant digits displayed.

  • nsmall is the minimum number of digits to the right of the decimal point.

  • scientific is set to TRUE to display scientific notation.

  • width indicates the minimum width to be displayed by padding blanks in the beginning.

  • justify is the display of the string to left, right or center.

# Total number of digits displayed. Last digit rounded off.
result <- format(23.123456789, digits = 9)
print(result)
## [1] "23.1234568"
# Display numbers in scientific notation.
result <- format(c(6, 13.14521), scientific = TRUE)
print(result)
## [1] "6.000000e+00" "1.314521e+01"
# The minimum number of digits to the right of the decimal point.
result <- format(23.47, nsmall = 5)
print(result)
## [1] "23.47000"
# Format treats everything as a string.
result <- format(6)
print(result)
## [1] "6"
# Numbers are padded with blank in the beginning for width.
result <- format(13.7, width = 6)
print(result)
## [1] "  13.7"
# Left justify strings.
result <- format("Hello", width = 8, justify = "l")
print(result)
## [1] "Hello   "
# Center-justify strings.
result <- format("Hello", width = 8, justify = "c")
print(result)
## [1] " Hello  "

Counting the number of characters in a string - nchar() function

The basic syntax is nchar(x), where x is the vector input.

nchar("I LOVE YOU")
## [1] 10

Changing the case - toupper() & tolower() functions

# Changing to Upper case.
result <- toupper("i love you")
print(result)
## [1] "I LOVE YOU"
# Changing to lower case.
result <- tolower("I Love You")
print(result)
## [1] "i love you"

Extracting parts of a string - substring() function

The basic syntax is substring(x, first, last)

  • x is the character vector input
  • first is the position of the first character to be extracted
  • last is the position of the last character to be extracted

# Extract characters from 1st to 7th position.
result <- substring("ILOVEYOU", 1, 7)
print(result)
## [1] "ILOVEYO"

R- Vectors

Single Elements Vector

Vectors are the most basic R data objects and there are six types of atomic vectors. They are logical, integer, double, complex, character and raw.

# Atomic vector of type character.
print("abc")
## [1] "abc"
# Atomic vector of type double.
print(12.5)
## [1] 12.5
# Atomic vector of type integer.
print(63L)
## [1] 63
# Atomic vector of type logical.
print(TRUE)
## [1] TRUE
# Atomic vector of type complex.
print(2+3i)
## [1] 2+3i

Multiple Elements Vector

Using colon operator with numeric data

# Creating a sequence from 5 to 13.
v <- 5:13
print(v)
## [1]  5  6  7  8  9 10 11 12 13
# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)
## [1]  6.6  7.6  8.6  9.6 10.6 11.6 12.6
# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)
## [1]  3.8  4.8  5.8  6.8  7.8  8.8  9.8 10.8
# Create vector with elements from 5 to 9 incrementing by 0.4.
print(seq(5, 9, by = 0.4))
##  [1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0
# Using the c() function. Mixed types are coerced to character.
s <- c("apple", "red", 5, TRUE)
print(s)
## [1] "apple" "red"   "5"     "TRUE"

Accessing Vector Elements

Elements of a vector are accessed using indexing with [ ] brackets. Indexing starts at position 1. A negative index drops that element from the result. Logical values (TRUE, FALSE) can also be used for indexing. With numeric 0 and 1, an index of 0 selects nothing and 1 selects the first element.

# Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)
## [1] "Mon" "Tue" "Fri"
# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)
## [1] "Sun" "Fri"
# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)
## [1] "Sun" "Tue" "Wed" "Fri" "Sat"
# Accessing vector elements using 0/1 indexing.
y <- t[c(0,0,0,0,0,0,1)]
print(y)
## [1] "Sun"
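In practice, the logical index is rarely typed by hand. It is usually produced by a comparison, which selects the elements that satisfy a condition. A minimal sketch, assuming a made-up vector of temperatures named temps:

```r
# Daily temperatures in degrees Celsius (made-up values).
temps <- c(24.5, 27.0, 30.2, 22.1, 31.4)

# The comparison yields a logical vector, used directly as the index.
hot <- temps[temps > 27]
print(hot)

# which() returns the positions of the matching elements instead.
hot_days <- which(temps > 27)
print(hot_days)
```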

Vector Manipulation

Two vectors of same length can be added, subtracted, multiplied or divided giving the result as a vector output.

# Create two vectors.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)

# Vector addition.
add.result <- v1+v2
print(add.result)
## [1]  7 19  4 13  1 13
# Vector subtraction.
sub.result <- v1-v2
print(sub.result)
## [1] -1 -3  4 -3 -1  9
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
## [1] 12 88  0 40  0 22
# Vector division.
divi.result <- v1/v2
print(divi.result)
## [1] 0.7500000 0.7272727       Inf 0.6250000 0.0000000 5.5000000

Vector Element Recycling

If we apply arithmetic operations to two vectors of unequal length, then the elements of the shorter vector are recycled to complete the operations.

v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
# v2 becomes c(4,11,4,11,4,11)

add.result <- v1+v2
print(add.result)
## [1]  7 19  8 16  4 22
sub.result <- v1-v2
print(sub.result)
## [1] -1 -3  0 -6 -4  0
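Recycling works silently when the longer length is a multiple of the shorter one, as above. When it is not, R still recycles but issues a warning. The sketch below captures that warning with withCallingHandlers() so the result can be inspected; the variable names are illustrative.

```r
caught <- character(0)

# Lengths 3 and 2: the shorter vector is recycled as c(10, 20, 10).
result <- withCallingHandlers(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) {
    # Record the warning message, then continue the computation.
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(result)
print(caught)
```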

Vector Element Sorting

Elements in a vector can be sorted using the sort() function.

a <- c(4,1,23,4,6,27,90,-9,100)

# Sort the elements of the vector
sort.result <- sort(a)
print(sort.result)
## [1]  -9   1   4   4   6  23  27  90 100
# Sort the elements in the reverse order.
revsort.result <- sort(a, decreasing = TRUE)
print(revsort.result)
## [1] 100  90  27  23   6   4   4   1  -9
# Sorting character vectors.
a <- c("Red","Blue","yellow","violet")
sort.result <- sort(a)
print(sort.result)
## [1] "Blue"   "Red"    "violet" "yellow"
# Sorting character vectors in reverse order.
revsort.result <- sort(a, decreasing = TRUE)
print(revsort.result)
## [1] "yellow" "violet" "Red"    "Blue"
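
sort() rearranges a single vector. When two vectors belong together, a common approach is to use order(), which returns the sorting positions so they can be applied to both. The station names and rainfall values below are invented for illustration.

```r
# Hypothetical station names and their annual rainfall in millimetres.
station <- c("A", "B", "C", "D")
rain_mm <- c(1200, 850, 1430, 990)

# order() returns the positions that would sort rain_mm.
idx <- order(rain_mm, decreasing = TRUE)
print(idx)

# Applying the same positions to both vectors keeps the pairs aligned.
print(station[idx])
print(rain_mm[idx])
```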

R- List

Lists are R objects that can contain elements of different types, such as numbers, strings, vectors and even other lists. A list can also contain a matrix or a function as an element. A list is created using the list() function.

# Create a list containing strings, numbers, a vector and a logical value.
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)
## [[1]]
## [1] "Red"
## 
## [[2]]
## [1] "Green"
## 
## [[3]]
## [1] 21 32 11
## 
## [[4]]
## [1] TRUE
## 
## [[5]]
## [1] 51.23
## 
## [[6]]
## [1] 119.1

Naming List Elements

The list elements can be given names and they can be accessed using these names.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Show the list.
print(list_data)
## $`1st Quarter`
## [1] "Jan" "Feb" "Mar"
## 
## $A_Matrix
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
## 
## $`A Inner list`
## $`A Inner list`[[1]]
## [1] "green"
## 
## $`A Inner list`[[2]]
## [1] 12.3

Accessing List Elements

Elements of a list can be accessed by their index. In a named list they can also be accessed by name.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Access the first element of the list.
print(list_data[1])
## $`1st Quarter`
## [1] "Jan" "Feb" "Mar"
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
## $`A Inner list`
## $`A Inner list`[[1]]
## [1] "green"
## 
## $`A Inner list`[[2]]
## [1] 12.3
# Access the list element using the name of the element.
print(list_data$A_Matrix)
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
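
The examples above use single brackets, which return a list, and the $ operator, which returns the element itself. The sketch below contrasts single and double brackets on the same list; it is meant as an illustration rather than a complete treatment.

```r
list_data <- list(c("Jan", "Feb", "Mar"), matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Single brackets keep the list wrapper: the result is a list of length 1.
sub_list <- list_data[1]
print(class(sub_list))

# Double brackets extract the element itself: here a character vector.
first_q <- list_data[["1st Quarter"]]
print(class(first_q))
print(first_q[2])

# Double brackets can be chained to reach into a nested list.
inner_value <- list_data[["A Inner list"]][[2]]
print(inner_value)
```

Double brackets are also the way to use names that contain spaces, since list_data$1st Quarter is not valid syntax without backticks.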

Manipulating List Elements

We can add, delete and update list elements as shown below. Assigning a value to a new index adds an element, assigning NULL to an index removes that element, and assigning to an existing index updates it.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
## [[1]]
## [1] "New element"
# Remove the last element.
list_data[4] <- NULL

# Print the 4th Element.
print(list_data[4])
## $<NA>
## NULL
# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])
## $`A Inner list`
## [1] "updated element"
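
The example above works at the end of the list, but the same operations can target other positions. A minimal sketch, using a throwaway list invented for illustration:

```r
letters_list <- list("a", "b", "c")

# append() with `after` inserts at any position, not only at the end.
letters_list <- append(letters_list, list("inserted"), after = 1)
after_insert <- unlist(letters_list)
print(after_insert)

# Assigning NULL with double brackets removes the element at that position.
letters_list[[3]] <- NULL
after_delete <- unlist(letters_list)
print(after_delete)
```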

Merging Lists

You can merge several lists into one list by passing them all to the c() function.

# Create three lists.
list1 <- list(20,1,8.5)
list2 <- list("Sun","Mon","Tue")
list3 <- list("Rain", "Sunny", "Cloudy")

# Merge the three lists.
merged.list <- c(list1,list2,list3)

# Print the merged list.
print(merged.list)
## [[1]]
## [1] 20
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 8.5
## 
## [[4]]
## [1] "Sun"
## 
## [[5]]
## [1] "Mon"
## 
## [[6]]
## [1] "Tue"
## 
## [[7]]
## [1] "Rain"
## 
## [[8]]
## [1] "Sunny"
## 
## [[9]]
## [1] "Cloudy"
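
The choice of c() matters here. The short sketch below compares it with list(), which nests the inputs instead of flattening them.

```r
list1 <- list(20, 1, 8.5)
list2 <- list("Sun", "Mon", "Tue")

flat <- c(list1, list2)        # one list with six elements
nested <- list(list1, list2)   # a list that holds the two original lists

print(length(flat))
print(length(nested))
```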

Converting List to Vector

A list can be converted to a vector so that the elements of the vector can be used for further manipulation. All the arithmetic operations on vectors can be applied after the list is converted into vectors. To do this conversion, we use the unlist() function. It takes the list as input and produces a vector.

# Create lists.
list1 <- list(1:5)
print(list1)
## [[1]]
## [1] 1 2 3 4 5
list2 <-list(10:14)
print(list2)
## [[1]]
## [1] 10 11 12 13 14
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
## [1] 1 2 3 4 5
print(v2)
## [1] 10 11 12 13 14
# Now add the vectors
result <- v1+v2
print(result)
## [1] 11 13 15 17 19

R- Matrices

Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular layout. All elements of a matrix must be of the same atomic type. Although a matrix can hold only characters or only logical values, matrices of numeric elements are the most useful because they can be used in mathematical calculations.

A Matrix is created using the matrix() function.

The basic syntax is matrix(data, nrow, ncol, byrow, dimnames). Following is the description of the parameters used:

  • data is the input vector which becomes the data elements of the matrix.
  • nrow is the number of rows to be created.
  • ncol is the number of columns to be created.
  • byrow is a logical value. If TRUE, the input vector elements are arranged by row.
  • dimnames is the names assigned to the rows and columns.

# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)
##      [,1] [,2] [,3]
## [1,]    3    4    5
## [2,]    6    7    8
## [3,]    9   10   11
## [4,]   12   13   14
# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)
##      [,1] [,2] [,3]
## [1,]    3    7   11
## [2,]    4    8   12
## [3,]    5    9   13
## [4,]    6   10   14
# Define the column and row names.
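# The names row_names and col_names avoid masking the base functions rownames() and colnames().
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))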
print(P)
##      col1 col2 col3
## row1    3    4    5
## row2    6    7    8
## row3    9   10   11
## row4   12   13   14

Accessing Elements of a Matrix

Elements of a matrix can be accessed by using the column and row index of the element. We consider the matrix P above to find the specific elements below.

# Define the column and row names.
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")

# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))

# Access the element at 3rd column and 1st row.
print(P[1,3])
## [1] 5
# Access the element at 2nd column and 4th row.
print(P[4,2])
## [1] 13
# Access only the  2nd row.
print(P[2,])
## col1 col2 col3 
##    6    7    8
# Access only the 3rd column.
print(P[,3])
## row1 row2 row3 row4 
##    5    8   11   14
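
Because the matrix has dimension names, its elements can also be addressed by name. The sketch below additionally shows the drop argument, which controls whether a single row is returned as a vector or kept as a matrix.

```r
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

# Access one element by row and column name.
value <- P["row2", "col3"]
print(value)

# By default a single row is simplified to a vector; drop = FALSE keeps a 1 x 3 matrix.
row2 <- P[2, , drop = FALSE]
print(dim(row2))
```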

Matrix Computations

Various mathematical operations can be performed on matrices using the R arithmetic operators. These operators work element by element, and the result is also a matrix.

The dimensions (number of rows and columns) must be the same for the matrices involved in the operation.

# Matrix Addition and Subtraction
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
##      [,1] [,2] [,3]
## [1,]    3   -1    2
## [2,]    9    4    6
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
##      [,1] [,2] [,3]
## [1,]    5    0    3
## [2,]    2    9    4
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
## Result of addition
print(result)
##      [,1] [,2] [,3]
## [1,]    8   -1    5
## [2,]   11   13   10
# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
## Result of subtraction
print(result)
##      [,1] [,2] [,3]
## [1,]   -2   -1   -1
## [2,]    7   -5    2
# Element-wise Matrix Multiplication and Division
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
##      [,1] [,2] [,3]
## [1,]    3   -1    2
## [2,]    9    4    6
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
##      [,1] [,2] [,3]
## [1,]    5    0    3
## [2,]    2    9    4
# Multiply the matrices element by element.
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
## Result of multiplication
print(result)
##      [,1] [,2] [,3]
## [1,]   15    0    6
## [2,]   18   36   24
# Divide the matrices
result <- matrix1 / matrix2
cat("Result of division","\n")
## Result of division
print(result)
##      [,1]      [,2]      [,3]
## [1,]  0.6      -Inf 0.6666667
## [2,]  4.5 0.4444444 1.5000000
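
The * operator above multiplies matching elements. The matrix product from linear algebra uses the %*% operator instead, and requires the number of columns of the first matrix to equal the number of rows of the second. The sketch below transposes matrix2 to make the shapes compatible; that choice is only for illustration.

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)   # 2 x 3
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)    # 2 x 3

# t() transposes matrix2 to 3 x 2, so the product is a 2 x 2 matrix.
product <- matrix1 %*% t(matrix2)
print(product)
```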

R- Arrays

Arrays are the R data objects which can store data in more than two dimensions. For example, if we create an array of dimension (2, 3, 4), it creates 4 rectangular matrices, each with 2 rows and 3 columns. Arrays can store only one data type.

An array is created using the array() function. It takes vectors as input and uses the values in the dim parameter to create an array.

# The following example creates an array of two 3x3 matrices. The nine input values are recycled to fill both.
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    5   10   13
## [2,]    9   11   14
## [3,]    3   12   15
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    5   10   13
## [2,]    9   11   14
## [3,]    3   12   15

Naming Columns and Rows

We can give names to the rows, columns and matrices in the array by using the dimnames parameter.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,column.names,
   matrix.names))
print(result)
## , , Matrix1
## 
##      COL1 COL2 COL3
## ROW1    5   10   13
## ROW2    9   11   14
## ROW3    3   12   15
## 
## , , Matrix2
## 
##      COL1 COL2 COL3
## ROW1    5   10   13
## ROW2    9   11   14
## ROW3    3   12   15

Accessing Elements

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,
   column.names, matrix.names))

# Print the third row of the second matrix of the array.
print(result[3,,2])
## COL1 COL2 COL3 
##    3   12   15
# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])
## [1] 13
# Print the 2nd Matrix.
print(result[,,2])
##      COL1 COL2 COL3
## ROW1    5   10   13
## ROW2    9   11   14
## ROW3    3   12   15

Manipulating Elements

As an array is made up of matrices in multiple dimensions, operations on the elements of an array are carried out by accessing the elements of its matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim = c(3,3,2))

# Create two more vectors of different lengths.
vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
array2 <- array(c(vector3,vector4),dim = c(3,3,2))

# create matrices from these arrays.
matrix1 <- array1[,,2]
matrix2 <- array2[,,2]

# Add the matrices.
result <- matrix1+matrix2
print(result)
##      [,1] [,2] [,3]
## [1,]    7   19   19
## [2,]   15   12   14
## [3,]   12   12   26

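Calculations Across Array Elements

We can do calculations across the elements in an array using the apply() function. The basic syntax is apply(X, MARGIN, FUN), where X is the array, MARGIN gives the dimension or dimensions to keep (1 for rows, 2 for columns) and FUN is the function to apply.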

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    5   10   13
## [2,]    9   11   14
## [3,]    3   12   15
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    5   10   13
## [2,]    9   11   14
## [3,]    3   12   15
# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)
## [1] 56 68 60
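
The same call pattern works for the other margins. The sketch below rebuilds the array and applies sum() and mean() over columns, over whole matrices, and over each row and column pair; the variable names are chosen here for illustration.

```r
new.array <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

# MARGIN = 2: one sum per column, taken across all rows and both matrices.
col_sums <- apply(new.array, 2, sum)
print(col_sums)

# MARGIN = 3: one sum per matrix (slice along the third dimension).
slice_sums <- apply(new.array, 3, sum)
print(slice_sums)

# MARGIN = c(1, 2): one value per cell, averaged over the third dimension.
cell_means <- apply(new.array, c(1, 2), mean)
print(cell_means)
```

For gridded climate data stored as longitude by latitude by time, the last form is one way to compute a time mean for every grid cell.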

R- Factors

Factors are the data objects which are used to categorize the data and store it as levels. They can store both strings and integers. They are useful in columns which have a limited number of unique values, such as “Male”, “Female” or TRUE, FALSE. They are useful in data analysis for statistical modeling.

Factors are created using the factor() function by taking a vector as input.

# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)
##  [1] "East"  "West"  "East"  "North" "North" "East"  "West"  "West"  "West" 
## [10] "East"  "North"
print(is.factor(data))
## [1] FALSE
# Apply the factor function.
factor_data <- factor(data)

print(factor_data)
##  [1] East  West  East  North North East  West  West  West  East  North
## Levels: East North West
print(is.factor(factor_data))
## [1] TRUE

Factors in Data Frame

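In versions of R before 4.0.0, creating a data frame with a column of text data converted that column to a factor by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so a text column stays a character vector unless we ask for a factor, as the output of this example shows.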

# Create the vectors for data frame.
precip <- c(10,5,0,0,1,2,1)
temp <- c(23,24,26,25,25,27,27)
con <- c("Rain","Rain","Sunny","Sunny","Cloudy","Cloudy","Cloudy")

# Create the data frame.
input_data <- data.frame(precip,temp,con)
print(input_data)
##   precip temp    con
## 1     10   23   Rain
## 2      5   24   Rain
## 3      0   26  Sunny
## 4      0   25  Sunny
## 5      1   25 Cloudy
## 6      2   27 Cloudy
## 7      1   27 Cloudy
# Test if the condition column is a factor.
print(is.factor(input_data$con))
## [1] FALSE
# Print the condition column. It is a character vector, so no levels are shown.
print(input_data$con)
## [1] "Rain"   "Rain"   "Sunny"  "Sunny"  "Cloudy" "Cloudy" "Cloudy"
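
When a categorical column is wanted, it can be converted explicitly. A minimal sketch, reusing a reduced version of the data frame above:

```r
con <- c("Rain", "Rain", "Sunny", "Sunny", "Cloudy", "Cloudy", "Cloudy")
input_data <- data.frame(temp = c(23, 24, 26, 25, 25, 27, 27), con = con)

# Convert the text column to a factor explicitly.
input_data$con <- factor(input_data$con)
print(is.factor(input_data$con))

# Levels are sorted alphabetically by default.
con_levels <- levels(input_data$con)
print(con_levels)

# table() counts the observations in each level.
con_counts <- table(input_data$con)
print(con_counts)
```

Passing stringsAsFactors = TRUE to data.frame() is an alternative that converts every text column at creation time.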

Changing the Order of Levels

The order of the levels in a factor can be changed by applying the factor() function again with the new order of the levels.

data <- c("East","West","East","North","North","East","West",
   "West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)
##  [1] East  West  East  North North East  West  West  West  East  North
## Levels: East North West
# Apply the factor function with required order of the level.
new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)
##  [1] East  West  East  North North East  West  West  West  East  North
## Levels: East West North

Generating Factor Levels

We can generate factor levels by using the gl() function. It takes two integers as input, which indicate how many levels there are and how many times each level is repeated. The basic syntax is gl(n, k, labels). Following is the description of the parameters used:

  • n is an integer giving the number of levels.
  • k is an integer giving the number of replications.
  • labels is a vector of labels for the resulting factor levels.

v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)
##  [1] Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston 
## [10] Boston  Boston  Boston 
## Levels: Tampa Seattle Boston
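
A generated factor is typically paired with a vector of measurements of the same length. The sketch below does this with tapply(); the temperature readings are invented for illustration.

```r
# Three cities with four readings each, in the order produced by gl().
city <- gl(3, 4, labels = c("Tampa", "Seattle", "Boston"))
temp <- c(30, 31, 29, 30,   18, 17, 19, 18,   22, 21, 23, 22)

# tapply() applies a function to the readings within each factor level.
city_means <- tapply(temp, city, mean)
print(city_means)
```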

R- Data Frames

A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

Following are the characteristics of a data frame.

  • The column names should be non-empty.
  • The row names should be unique.
  • The data stored in a data frame can be of numeric, factor or character type.
  • Each column should contain same number of data items.

Create Data Frame

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Dani","Matty","Star","Ali","Amy"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.         
print(emp.data) 
##   emp_id emp_name salary start_date
## 1      1     Dani 623.30 2012-01-01
## 2      2    Matty 515.20 2013-09-23
## 3      3     Star 611.00 2014-11-15
## 4      4      Ali 729.00 2014-05-11
## 5      5      Amy 843.25 2015-03-27
# The structure of the data frame can be seen by using str() function.
str(emp.data)
## 'data.frame':    5 obs. of  4 variables:
##  $ emp_id    : int  1 2 3 4 5
##  $ emp_name  : chr  "Dani" "Matty" "Star" "Ali" ...
##  $ salary    : num  623 515 611 729 843
##  $ start_date: Date, format: "2012-01-01" "2013-09-23" ...
# The statistical summary and nature of the data can be obtained by applying summary() function.
summary(emp.data)
##      emp_id    emp_name             salary        start_date        
##  Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01  
##  1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23  
##  Median :3   Mode  :character   Median :623.3   Median :2014-05-11  
##  Mean   :3                      Mean   :664.4   Mean   :2014-01-14  
##  3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15  
##  Max.   :5                      Max.   :843.2   Max.   :2015-03-27
# Extract Specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)
##   emp.data.emp_name emp.data.salary
## 1              Dani          623.30
## 2             Matty          515.20
## 3              Star          611.00
## 4               Ali          729.00
## 5               Amy          843.25
# Extract the first two rows and then all columns
result <- emp.data[1:2,]
print(result)
##   emp_id emp_name salary start_date
## 1      1     Dani  623.3 2012-01-01
## 2      2    Matty  515.2 2013-09-23
# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)
##   emp_name start_date
## 3     Star 2014-11-15
## 5      Amy 2015-03-27
# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)
##   emp_id emp_name salary start_date       dept
## 1      1     Dani 623.30 2012-01-01         IT
## 2      2    Matty 515.20 2013-09-23 Operations
## 3      3     Star 611.00 2014-11-15         IT
## 4      4      Ali 729.00 2014-05-11         HR
## 5      5      Amy 843.25 2015-03-27    Finance
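
Rows are often selected by a condition rather than by position. The sketch below shows two equivalent ways of doing this on a reduced copy of the data frame; the salary threshold of 620 is arbitrary.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Dani", "Matty", "Star", "Ali", "Amy"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Logical row index combined with columns selected by name.
high_paid <- emp.data[emp.data$salary > 620, c("emp_name", "salary")]
print(high_paid)

# subset() expresses the same selection without repeating the data frame name.
high_paid2 <- subset(emp.data, salary > 620, select = c(emp_name, salary))
print(high_paid2)
```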

Add Row

To add more rows permanently to an existing data frame, we need to bring in the new rows in the same structure as the existing data frame and use the rbind() function.

# Create the first data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   dept = c("IT","Operations","IT","HR","Finance"),
   stringsAsFactors = FALSE
)

# Create the second data frame
emp.newdata <-  data.frame(
   emp_id = c (6:8), 
   emp_name = c("Rasmi","Pranab","Tusar"),
   salary = c(578.0,722.5,632.8), 
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","Operations","Finance"),
   stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
##   emp_id emp_name salary start_date       dept
## 1      1     Rick 623.30 2012-01-01         IT
## 2      2      Dan 515.20 2013-09-23 Operations
## 3      3 Michelle 611.00 2014-11-15         IT
## 4      4     Ryan 729.00 2014-05-11         HR
## 5      5     Gary 843.25 2015-03-27    Finance
## 6      6    Rasmi 578.00 2013-05-21         IT
## 7      7   Pranab 722.50 2013-07-30 Operations
## 8      8    Tusar 632.80 2014-06-17    Finance

R- Data Reshaping

Data reshaping in R is about changing the way data is organized into rows and columns. Most of the time, data processing in R is done by taking the input data as a data frame. It is easy to extract data from the rows and columns of a data frame, but there are situations when we need the data frame in a format that is different from the format in which we received it. R has many functions to split, merge and change the rows to columns and vice versa in a data frame.

Joining Columns and Rows

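We can join multiple vectors column-wise using the cbind() function, and we can append the rows of one data frame to another using the rbind() function. Note that cbind() applied to plain vectors returns a matrix rather than a data frame, and because city is text, every value in that matrix is stored as a character string, as the quotes in the first output show. rbind() converts the matrix to a data frame when it is combined with a data frame.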

# Create vector objects.
city <- c("Bogor","Jakarta","Tangerang","Bandung")
precip <- c(200,130,100,90)
temp <- c(27,30,30,23)
hum <- c(95,80,85,90)
# Combine the four vectors above column-wise.
clim <- cbind(city,precip,temp,hum)

# Print a header.
cat("# # # # The First data frame\n") 
## # # # # The First data frame
# Print the data frame.
print(clim)
##      city        precip temp hum 
## [1,] "Bogor"     "200"  "27" "95"
## [2,] "Jakarta"   "130"  "30" "80"
## [3,] "Tangerang" "100"  "30" "85"
## [4,] "Bandung"   "90"   "23" "90"
# Create another data frame with similar columns
new.clim <- data.frame(
  city = c("Medan", "Solo", "Tangerang Selatan", "Riau"),
  precip = c(240, 200, 180, 300),
  temp = c(30, 29, 29, 28),
  hum = c(94, 85, 89, 92),
  stringsAsFactors = FALSE
)
# Print a header.
cat("# # # The Second data frame\n") 
## # # # The Second data frame
# Print the data frame.
print(new.clim)
##                city precip temp hum
## 1             Medan    240   30  94
## 2              Solo    200   29  85
## 3 Tangerang Selatan    180   29  89
## 4              Riau    300   28  92
# Combine rows from both the data frames.
all_clim <- rbind(clim, new.clim)

# Print a header.
cat("# # # The combined data frame\n") 
## # # # The combined data frame
# Print the result.
print(all_clim)
##                city precip temp hum
## 1             Bogor    200   27  95
## 2           Jakarta    130   30  80
## 3         Tangerang    100   30  85
## 4           Bandung     90   23  90
## 5             Medan    240   30  94
## 6              Solo    200   29  85
## 7 Tangerang Selatan    180   29  89
## 8              Riau    300   28  92
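# Sort combined data frame by temperature (highest to lowest).
# The values came from a character matrix, so convert temp to numeric before ordering.
sorted_clim <- all_clim[order(as.numeric(all_clim$temp), decreasing = TRUE), ]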

# Print a header and the sorted data frame
cat("# # # The sorted data frame\n")
## # # # The sorted data frame
print(sorted_clim)
##                city precip temp hum
## 2           Jakarta    130   30  80
## 3         Tangerang    100   30  85
## 5             Medan    240   30  94
## 6              Solo    200   29  85
## 7 Tangerang Selatan    180   29  89
## 8              Riau    300   28  92
## 1             Bogor    200   27  95
## 4           Bandung     90   23  90
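
The difference between a matrix built with cbind() and a data frame built with data.frame() can be checked directly. The sketch below uses two short vectors for brevity.

```r
city <- c("Bogor", "Jakarta")
temp <- c(27, 30)

# cbind() on plain vectors gives a matrix; mixing text and numbers makes it all text.
as_matrix <- cbind(city, temp)
print(is.matrix(as_matrix))
print(typeof(as_matrix))

# data.frame() keeps a separate type for each column.
as_frame <- data.frame(city, temp, stringsAsFactors = FALSE)
print(sapply(as_frame, class))
```

Keeping numeric columns numeric matters for later steps such as sorting or averaging, where text values would be compared character by character.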

Merging

We can merge two data frames by using the merge() function. The merge is performed on one or more key columns, given by the by, by.x and by.y arguments; by default these are the columns that have the same name in both data frames.

In the example below, we consider the data sets about diabetes in Pima Indian women available in the library named “MASS”. We merge the two data sets (Pima.te and Pima.tr) based on the values of blood pressure (“bp”) and body mass index (“bmi”). On choosing these two columns for merging, the records where the values of these two variables match in both data sets are combined to form a single data frame.

library(MASS)
merged.Pima <- merge(x = Pima.te, y = Pima.tr,
   by.x = c("bp", "bmi"),
   by.y = c("bp", "bmi")
)
print(merged.Pima)
##    bp  bmi npreg.x glu.x skin.x ped.x age.x type.x npreg.y glu.y skin.y ped.y
## 1  60 33.8       1   117     23 0.466    27     No       2   125     20 0.088
## 2  64 29.7       2    75     24 0.370    33     No       2   100     23 0.368
## 3  64 31.2       5   189     33 0.583    29    Yes       3   158     13 0.295
## 4  64 33.2       4   117     27 0.230    24     No       1    96     27 0.289
## 5  66 38.1       3   115     39 0.150    28     No       1   114     36 0.289
## 6  68 38.5       2   100     25 0.324    26     No       7   129     49 0.439
## 7  70 27.4       1   116     28 0.204    21     No       0   124     20 0.254
## 8  70 33.1       4    91     32 0.446    22     No       9   123     44 0.374
## 9  70 35.4       9   124     33 0.282    34     No       6   134     23 0.542
## 10 72 25.6       1   157     21 0.123    24     No       4    99     17 0.294
## 11 72 37.7       5    95     33 0.370    27     No       6   103     32 0.324
## 12 74 25.9       9   134     33 0.460    81     No       8   126     38 0.162
## 13 74 25.9       1    95     21 0.673    36     No       8   126     38 0.162
## 14 78 27.6       5    88     30 0.258    37     No       6   125     31 0.565
## 15 78 27.6      10   122     31 0.512    45     No       6   125     31 0.565
## 16 78 39.4       2   112     50 0.175    24     No       4   112     40 0.236
## 17 88 34.5       1   117     24 0.403    40    Yes       4   127     11 0.598
##    age.y type.y
## 1     31     No
## 2     21     No
## 3     24     No
## 4     21     No
## 5     21     No
## 6     43    Yes
## 7     36    Yes
## 8     40     No
## 9     29    Yes
## 10    28     No
## 11    55     No
## 12    39     No
## 13    39     No
## 14    49    Yes
## 15    49    Yes
## 16    38     No
## 17    28     No
nrow(merged.Pima)
## [1] 17
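
By default merge() keeps only the rows whose keys appear in both data frames. The all, all.x and all.y arguments change this. The sketch below uses two small station tables invented for illustration.

```r
# Two hypothetical tables sharing the key column "station".
rain <- data.frame(station = c("S1", "S2", "S3"), precip = c(200, 130, 100))
heat <- data.frame(station = c("S2", "S3", "S4"), temp = c(30, 29, 28))

# Default: only stations present in both tables.
inner <- merge(rain, heat, by = "station")
print(inner)

# all = TRUE: every station from either table, with NA where a value is missing.
outer <- merge(rain, heat, by = "station", all = TRUE)
print(outer)

# all.x = TRUE: every station from the first table.
left <- merge(rain, heat, by = "station", all.x = TRUE)
print(left)
```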

Melting and Casting

One of the most interesting aspects of R programming is changing the shape of the data in multiple steps to get a desired shape. The functions used to do this are melt() and cast() (named dcast() in the newer reshape2 package).

We consider the dataset called ships present in the library called “MASS”.

library(MASS)
print(ships)
##    type year period service incidents
## 1     A   60     60     127         0
## 2     A   60     75      63         0
## 3     A   65     60    1095         3
## 4     A   65     75    1095         4
## 5     A   70     60    1512         6
## 6     A   70     75    3353        18
## 7     A   75     60       0         0
## 8     A   75     75    2244        11
## 9     B   60     60   44882        39
## 10    B   60     75   17176        29
## 11    B   65     60   28609        58
## 12    B   65     75   20370        53
## 13    B   70     60    7064        12
## 14    B   70     75   13099        44
## 15    B   75     60       0         0
## 16    B   75     75    7117        18
## 17    C   60     60    1179         1
## 18    C   60     75     552         1
## 19    C   65     60     781         0
## 20    C   65     75     676         1
## 21    C   70     60     783         6
## 22    C   70     75    1948         2
## 23    C   75     60       0         0
## 24    C   75     75     274         1
## 25    D   60     60     251         0
## 26    D   60     75     105         0
## 27    D   65     60     288         0
## 28    D   65     75     192         0
## 29    D   70     60     349         2
## 30    D   70     75    1208        11
## 31    D   75     60       0         0
## 32    D   75     75    2051         4
## 33    E   60     60      45         0
## 34    E   60     75       0         0
## 35    E   65     60     789         7
## 36    E   65     75     437         7
## 37    E   70     60    1157         5
## 38    E   70     75    2161        12
## 39    E   75     60       0         0
## 40    E   75     75     542         1

Melt

Now we melt the data to organize it, converting all columns other than type and year into multiple rows.

library(reshape2)
molten.ships <- melt(ships, id = c("type","year"))
print(molten.ships)
##     type year  variable value
## 1      A   60    period    60
## 2      A   60    period    75
## 3      A   65    period    60
## 4      A   65    period    75
## 5      A   70    period    60
## 6      A   70    period    75
## 7      A   75    period    60
## 8      A   75    period    75
## 9      B   60    period    60
## 10     B   60    period    75
## 11     B   65    period    60
## 12     B   65    period    75
## 13     B   70    period    60
## 14     B   70    period    75
## 15     B   75    period    60
## 16     B   75    period    75
## 17     C   60    period    60
## 18     C   60    period    75
## 19     C   65    period    60
## 20     C   65    period    75
## 21     C   70    period    60
## 22     C   70    period    75
## 23     C   75    period    60
## 24     C   75    period    75
## 25     D   60    period    60
## 26     D   60    period    75
## 27     D   65    period    60
## 28     D   65    period    75
## 29     D   70    period    60
## 30     D   70    period    75
## 31     D   75    period    60
## 32     D   75    period    75
## 33     E   60    period    60
## 34     E   60    period    75
## 35     E   65    period    60
## 36     E   65    period    75
## 37     E   70    period    60
## 38     E   70    period    75
## 39     E   75    period    60
## 40     E   75    period    75
## 41     A   60   service   127
## 42     A   60   service    63
## 43     A   65   service  1095
## 44     A   65   service  1095
## 45     A   70   service  1512
## 46     A   70   service  3353
## 47     A   75   service     0
## 48     A   75   service  2244
## 49     B   60   service 44882
## 50     B   60   service 17176
## 51     B   65   service 28609
## 52     B   65   service 20370
## 53     B   70   service  7064
## 54     B   70   service 13099
## 55     B   75   service     0
## 56     B   75   service  7117
## 57     C   60   service  1179
## 58     C   60   service   552
## 59     C   65   service   781
## 60     C   65   service   676
## 61     C   70   service   783
## 62     C   70   service  1948
## 63     C   75   service     0
## 64     C   75   service   274
## 65     D   60   service   251
## 66     D   60   service   105
## 67     D   65   service   288
## 68     D   65   service   192
## 69     D   70   service   349
## 70     D   70   service  1208
## 71     D   75   service     0
## 72     D   75   service  2051
## 73     E   60   service    45
## 74     E   60   service     0
## 75     E   65   service   789
## 76     E   65   service   437
## 77     E   70   service  1157
## 78     E   70   service  2161
## 79     E   75   service     0
## 80     E   75   service   542
## 81     A   60 incidents     0
## 82     A   60 incidents     0
## 83     A   65 incidents     3
## 84     A   65 incidents     4
## 85     A   70 incidents     6
## 86     A   70 incidents    18
## 87     A   75 incidents     0
## 88     A   75 incidents    11
## 89     B   60 incidents    39
## 90     B   60 incidents    29
## 91     B   65 incidents    58
## 92     B   65 incidents    53
## 93     B   70 incidents    12
## 94     B   70 incidents    44
## 95     B   75 incidents     0
## 96     B   75 incidents    18
## 97     C   60 incidents     1
## 98     C   60 incidents     1
## 99     C   65 incidents     0
## 100    C   65 incidents     1
## 101    C   70 incidents     6
## 102    C   70 incidents     2
## 103    C   75 incidents     0
## 104    C   75 incidents     1
## 105    D   60 incidents     0
## 106    D   60 incidents     0
## 107    D   65 incidents     0
## 108    D   65 incidents     0
## 109    D   70 incidents     2
## 110    D   70 incidents    11
## 111    D   75 incidents     0
## 112    D   75 incidents     4
## 113    E   60 incidents     0
## 114    E   60 incidents     0
## 115    E   65 incidents     7
## 116    E   65 incidents     7
## 117    E   70 incidents     5
## 118    E   70 incidents    12
## 119    E   75 incidents     0
## 120    E   75 incidents     1

Cast

We can cast the molten data into a new form where the aggregate for each type of ship and each year is computed. Since the data was melted with reshape2, we use that package's dcast() function, with sum as the aggregating function.

library(reshape2)
recasted.ship <- dcast(molten.ships, type + year ~ variable, fun.aggregate = sum)
print(recasted.ship)
##    type year period service incidents
## 1     A   60    135     190         0
## 2     A   65    135    2190         7
## 3     A   70    135    4865        24
## 4     A   75    135    2244        11
## 5     B   60    135   62058        68
## 6     B   65    135   48979       111
## 7     B   70    135   20163        56
## 8     B   75    135    7117        18
## 9     C   60    135    1731         2
## 10    C   65    135    1457         1
## 11    C   70    135    2731         8
## 12    C   75    135     274         1
## 13    D   60    135     356         0
## 14    D   65    135     480         0
## 15    D   70    135    1557        13
## 16    D   75    135    2051         4
## 17    E   60    135      45         0
## 18    E   65    135    1226        14
## 19    E   70    135    3318        17
## 20    E   75    135     542         1
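# Load reshape2: the melt() arguments variable.name/value.name and dcast() used below come from reshape2, not reshape
library(reshape2)

# Create a sample data frame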
city <- c("Bogor", "Jakarta", "Tangerang", "Bandung")
precip <- c(200, 130, 100, 90)
temp <- c(27, 30, 30, 23)
data <- data.frame(city, precip, temp)

# Print the original data frame
cat("# # # Original Data Frame\n")
## # # # Original Data Frame
print(data)
##        city precip temp
## 1     Bogor    200   27
## 2   Jakarta    130   30
## 3 Tangerang    100   30
## 4   Bandung     90   23
# Melt the data frame from wide to long format
melted_data <- melt(data, id.vars = "city", variable.name = "variable", value.name = "value")

# Print the melted data frame
cat("# # # Melted Data Frame\n")
## # # # Melted Data Frame
print(melted_data)
##        city variable value
## 1     Bogor   precip   200
## 2   Jakarta   precip   130
## 3 Tangerang   precip   100
## 4   Bandung   precip    90
## 5     Bogor     temp    27
## 6   Jakarta     temp    30
## 7 Tangerang     temp    30
## 8   Bandung     temp    23
# Cast the melted data back to wide format
casted_data <- dcast(melted_data, city ~ variable, value.var = "value")

# Print the casted data frame
cat("# # # Casted Data Frame\n")
## # # # Casted Data Frame
print(casted_data)
##        city precip temp
## 1   Bandung     90   23
## 2     Bogor    200   27
## 3   Jakarta    130   30
## 4 Tangerang    100   30

Data Visualization

The key points about using data visualization in climate modeling with RStudio are:

  1. Enhanced Understanding: Data visualization in RStudio enables climate modelers to grasp complex climate data more effectively, uncover trends, and recognize patterns crucial for informed decision-making.

  2. Diverse Visualizations: RStudio offers a wide range of plotting functions and packages, allowing modelers to create diverse visualizations such as line charts, scatter plots, heatmaps, and interactive graphs.

  3. Temporal Insights: Time series plots created in RStudio enable researchers to showcase the dynamic changes in temperature, precipitation, and other climate variables over time.

  4. Customization: RStudio’s flexibility empowers modelers to customize visuals by adding labels, legends, and annotations, enhancing the interpretation of model results.

  5. Interactive Exploration: Packages like plotly facilitate interactive graphics, including interactive versions of ggplot2 charts, enabling researchers to explore various climate scenarios and gain deeper insights into complex data.

  6. Communication: Compelling visualizations generated using RStudio serve as effective communication tools, helping researchers convey findings and contribute to the broader understanding of climate processes.

  7. Impactful Decisions: Visualizing climate model outcomes aids in unraveling intricate climate interactions, empowering decision-makers to address climate change impacts with more precision.

  8. Scientific Advancement: The synergy between data visualization and RStudio accelerates scientific advancement in climate research by fostering data-driven discoveries and innovative insights.

In essence, the combination of data visualization and RStudio is a powerful toolset that enables climate modelers to distill complex information into clear, impactful visuals, advancing our comprehension of the ever-changing climate.

Pie Charts

In R the pie chart is created using the pie() function which takes positive numbers as a vector input. The additional parameters are used to control labels, color, title etc.

pie(x, labels, radius, main, col, clockwise)

Following is the description of the parameters used −

  • x is a vector containing the numeric values used in the pie chart.
  • labels is used to give description to the slices.
  • radius indicates the radius of the circle of the pie chart (the plotting box spans −1 to +1, so values up to 1 fit inside it).
  • main indicates the title of the chart.
  • col indicates the color palette.
  • clockwise is a logical value indicating if the slices are drawn clockwise or anticlockwise.
library(ggplot2)

# Create data for the graph.
tree_cover_mha <- c(5.3, 32, 14, 8)
labels <- c("Riau", "Kalimantan", "Papua", "NTT")

# Calculate percentages
piepercent <- round(100 * tree_cover_mha / sum(tree_cover_mha), 1)

# Create the pie chart
pie(tree_cover_mha, labels = paste(labels, "\n", piepercent, "%"), main = "Tree Cover", col = rainbow(length(tree_cover_mha)))

# Add legend
legend("topright", labels, cex = 0.8, fill = rainbow(length(tree_cover_mha)))

Bar Charts

R uses the function barplot() to create bar charts. The basic syntax is:

barplot(H,xlab,ylab,main, names.arg,col)
# Create data for the graph.
tree_cover_mha <- c(5.3, 32, 14, 8)
labels <- c("Riau", "Kalimantan", "Papua", "NTT")
# Create a bar plot of tree cover by location
barplot(tree_cover_mha,
        names.arg = labels,
        ylab = "Tree Cover (Mha)",
        xlab = "Location")

Bar Charts Labels, Title and Colors

# Create the data for the chart
precip <- c(7,12,28,3,41)
month <- c("Mar","Apr","May","Jun","Jul")

# Plot the bar chart 
barplot(precip,names.arg=month,xlab="Month",ylab="Precipitation (mm)",col="blue",
main="Bogor 2022",border="red")

Group Bar Chart and Stacked Bar Chart

We can create a bar chart with groups of bars, or with stacks in each bar, by passing a matrix as the input.

# Data
fires <- c(1, 2, 5, 4, 7)
tc_loss <- c(12, 10, 9, 20, 4)
others <- c(0.5, 0.9, 1, 1.5, 2)
locations <- c("Location 1", "Location 2", "Location 3", "Location 4", "Location 5")

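# Combine data into a matrix with one row per category and one column per location
# (matrix(..., nrow = 3) would fill column by column and mix the categories)
data_matrix <- rbind(fires, tc_loss, others)

# Create a grouped bar plot (beside = TRUE); use beside = FALSE for stacked bars
barplot(data_matrix, beside = TRUE, col = c("red", "blue", "green"),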
        names.arg = locations,
        legend.text = c("Fires", "Tree Cover Loss", "Others"),
        args.legend = list(x = "topright"))
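barplot() treats each column of a matrix as one bar (or one group of bars) and each row as a segment within it, so the layout of the matrix decides what ends up where. The following is a minimal sketch, assuming the same illustrative values as this section, that builds the matrix with rbind() and draws it both stacked and grouped. The names stacked_mid and grouped_mid are introduced here only to hold the bar midpoints that barplot() returns.

```r
# Illustrative values, mirroring the example in this section
fires     <- c(1, 2, 5, 4, 7)
tc_loss   <- c(12, 10, 9, 20, 4)
others    <- c(0.5, 0.9, 1, 1.5, 2)
locations <- paste("Location", 1:5)

# rbind() keeps each category in its own row:
# 3 rows (categories) x 5 columns (locations)
data_matrix <- rbind(fires, tc_loss, others)
colnames(data_matrix) <- locations
print(data_matrix)

# Stacked bars: one bar per column, one segment per row
stacked_mid <- barplot(data_matrix, beside = FALSE,
                       col = c("red", "blue", "green"),
                       legend.text = c("Fires", "Tree Cover Loss", "Others"))

# Grouped bars: the same matrix, with the rows drawn side by side
grouped_mid <- barplot(data_matrix, beside = TRUE,
                       col = c("red", "blue", "green"),
                       legend.text = c("Fires", "Tree Cover Loss", "Others"))
```

Printing the matrix before plotting is a quick way to confirm that every row holds a single category.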

library(tidyverse)

# Data
fires <- c(1, 2, 5, 4, 7)
tc_loss <- c(12, 10, 9, 20, 4)
others <- c(0.5, 0.9, 1, 1.5, 2)
locations <- c("Location 1", "Location 2", "Location 3", "Location 4", "Location 5")

# Create a data frame
data <- data.frame(fires = fires, tc_loss = tc_loss, others = others, locations = locations)

# Reshape data to long format
data_long <- data %>%
  pivot_longer(cols = c(fires, tc_loss, others), names_to = "category", values_to = "value")

# Create a stacked bar plot using ggplot2
ggplot(data_long, aes(x = locations, y = value, fill = category)) +
  geom_bar(stat = "identity") +
  labs(x = "Locations", y = "Value", title = "Stacked Bar Plot") +
  scale_fill_manual(values = c(fires = "red", tc_loss = "blue", others = "green")) +
  theme_minimal() +
  theme(legend.position = "right")

Boxplot

A boxplot summarizes how the values in a data set are distributed. It is built from three quartiles, which divide the data into four equal parts. The graph shows the minimum, first quartile, median, third quartile, and maximum of the data set. It is also useful for comparing distributions across data sets by drawing a boxplot for each of them.
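The five values a boxplot draws can be computed directly, which helps when reading the chart. The sketch below uses a small illustrative temperature vector (an assumption for this example) with fivenum() and boxplot.stats(), both from base R. Note that these functions use Tukey's hinges, which can differ slightly from the quartiles returned by quantile() with its default settings.

```r
# Illustrative temperature values (degrees C); an assumption for this sketch
temp_values <- c(25, 28, 26, 22, 30, 29, 27, 24, 23, 26)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(temp_values)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# boxplot.stats() returns the values that boxplot() draws, plus any outliers
stats <- boxplot.stats(temp_values)
print(stats$stats)  # whisker ends, hinges, and median
print(stats$out)    # points more than 1.5 * IQR beyond the hinges
```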

Boxplots are created in R by using the boxplot() function.

boxplot(x, data, notch, varwidth, names, main)

Following is the description of the parameters used −

  • x is a vector or a formula.
  • data is the data frame.
  • notch is a logical value. Set as TRUE to draw a notch.
  • varwidth is a logical value. Set as TRUE to draw the width of each box proportionate to the sample size.
  • names are the group labels which will be printed under each boxplot.
  • main is used to give a title to the graph.

input <- mtcars[,c('mpg','cyl')]
print(head(input))
##                    mpg cyl
## Mazda RX4         21.0   6
## Mazda RX4 Wag     21.0   6
## Datsun 710        22.8   4
## Hornet 4 Drive    21.4   6
## Hornet Sportabout 18.7   8
## Valiant           18.1   6
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
   ylab = "Miles Per Gallon", main = "Mileage Data")

# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, 
   xlab = "Number of Cylinders",
   ylab = "Miles Per Gallon", 
   main = "Mileage Data",
   notch = TRUE, 
   varwidth = TRUE, 
   col = c("green","yellow","purple"),
   names = c("High","Medium","Low")
)

Line Graphs

The plot() function in R is used to create the line graph.

plot(v,type,col,xlab,ylab)
precip <- c(5,12,4,2,1,3,8)
plot(precip, type = "l", ylab="mm", xlab="month", col = "red")

# You can also try different types of line graph by changing the type argument
par(mfrow = c(2, 3))
plot(precip, type = "l", main = "type = 'l'")
plot(precip, type = "s", main = "type = 's'")
plot(precip, type = "p", main = "type = 'p'")
plot(precip, type = "o", main = "type = 'o'")
plot(precip, type = "b", main = "type = 'b'")
plot(precip, type = "h", main = "type = 'h'")

Multiple Lines in a Line Chart

More than one line can be drawn on the same chart by using the lines() function.

bogor <- c(20,15,18, 8,6,4,3,10,13,20)
jakarta <- c(15,21,8, 7,12,1,4,5,10,13)
plot(y = bogor, x= c(1:10),
     type = "l", 
     ylab= "Precipitation (mm)", 
     xlab = "Month", 
     ylim = c(0,30),
     xlim = c(1,10),
     col= "blue", 
     main = "Precipitation 2020")
# Add lines
lines(jakarta, type = "l", col = "red", lty = 2)
# Add legends
legend("topright", legend = c("Bogor", "Jakarta"), col = c("blue", "red"), lty = c(1,2))

Note: For more about plotting in R, see the linked resource Plot in R.

Basic Statistics

Basic statistics in R can provide valuable insights into climate data, enabling us to understand key characteristics and relationships within the dataset. By employing statistical measures like mean, median, mode, quantiles, and percentiles, we can grasp the central tendency, variability, and distribution of variables such as temperature, rainfall, and humidity. These measures help us to identify common patterns, assess data spread, and recognize extreme values.

Furthermore, tools like normality tests, such as the Shapiro-Wilk test, can assess whether our climate data follows a normal distribution. Linear regression allows us to explore the relationship between variables, like temperature and rainfall, revealing potential correlations or trends. Multiple regression extends this analysis by considering multiple predictor variables, such as temperature and humidity, and their combined influence on rainfall.

Time series analysis is crucial in climate studies, enabling us to visualize how temperature or other variables change over time. This analysis helps us spot trends, seasonality, and potential cycles in the data. ANOVA (Analysis of Variance) assists in understanding the impact of categorical variables, like different climate zones, on continuous variables such as temperature.

Finally, significance tests, like t-tests, help determine if observed differences in, for example, temperature between two groups, are statistically significant. These statistical techniques collectively provide a solid foundation for extracting meaningful information from climate data, contributing to a better understanding of climatic patterns and changes.

Mean, Median, Mode, Quantiles

# Create data, or load your own dataset
temp <- c(25, 28, 26, 22, 30, 29, 27, 24, 23, 26)
precip <- c(10, 15, 8, 5, 20, 18, 12, 6, 7, 9)
rh <- c(70, 75, 72, 68, 80, 78, 73, 69, 71, 74)
type_clim <- c("Tropis", "Tropis", "Subtropis", "Tropis", "Subtropis", "Subtropis", "Tropis", "Tropis", "Subtropis", "Tropis")
group_clim <- c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
# Combine the vectors into a data frame
iklim <- data.frame(temp, precip, rh, type_clim, group_clim)
print(iklim)
##    temp precip rh type_clim group_clim
## 1    25     10 70    Tropis          A
## 2    28     15 75    Tropis          B
## 3    26      8 72 Subtropis          A
## 4    22      5 68    Tropis          B
## 5    30     20 80 Subtropis          A
## 6    29     18 78 Subtropis          B
## 7    27     12 73    Tropis          A
## 8    24      6 69    Tropis          B
## 9    23      7 71 Subtropis          A
## 10   26      9 74    Tropis          B
# Mean, median, mode, quantiles, and a Shapiro-Wilk normality test
mean_temperature <- mean(iklim$temp)
median_temperature <- median(iklim$temp)
modus_temperature <- as.numeric(names(sort(table(iklim$temp), decreasing = TRUE)[1]))
quartiles <- quantile(iklim$temp, probs = c(0.25, 0.5, 0.75))
percentile_90 <- quantile(iklim$precip, probs = 0.9)
shapiro_test <- shapiro.test(iklim$temp)
# Print the results
cat("Mean Temperature:", mean_temperature, "\n")
## Mean Temperature: 26
cat("Median Temperature:", median_temperature, "\n")
## Median Temperature: 26
cat("Mode Temperature:", modus_temperature, "\n")
## Mode Temperature: 26
cat("Quartiles:", quartiles, "\n")
## Quartiles: 24.25 26 27.75
cat("90th Percentile of Rainfall:", percentile_90, "\n")
## 90th Percentile of Rainfall: 18.2
cat("Shapiro-Wilk Normality Test p-value:", shapiro_test$p.value, "\n\n")
## Shapiro-Wilk Normality Test p-value: 0.9629413
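The mode calculation above keeps only the first of the most frequent values, so a tie between two values would go unnoticed. A minimal sketch of a helper that returns every mode is shown below. The name get_modes is hypothetical and not part of base R.

```r
# Hypothetical helper: return every value that occurs most often
get_modes <- function(x) {
  counts <- table(x)
  modes <- names(counts)[counts == max(counts)]
  # table() stores the values as character names, so convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

get_modes(c(25, 28, 26, 22, 30, 29, 27, 24, 23, 26))  # one most frequent value
get_modes(c(1, 1, 2, 2, 3))                            # a tie between two values
```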

Regression

Linear Regression

Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables. It aims to find the best-fitting straight line that minimizes the differences between the observed data points and the predicted values. This line represents the linear relationship between variables, allowing us to make predictions or understand the effect of changes in independent variables on the dependent variable.

In a simple linear regression, there is one independent variable, and the relationship is represented by a straight line equation (y = mx + b). The goal is to find the slope (m) and intercept (b) that best fits the data.
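For a single predictor, the best-fitting slope and intercept have a closed form: m = cov(x, y) / var(x) and b = mean(y) − m · mean(x). The sketch below applies these formulas to the temperature and precipitation values from this tutorial's sample data and compares them with lm(). It is meant as an illustration of the arithmetic, not a replacement for lm().

```r
# Temperature (x) and precipitation (y) values from the sample climate data
temp   <- c(25, 28, 26, 22, 30, 29, 27, 24, 23, 26)
precip <- c(10, 15, 8, 5, 20, 18, 12, 6, 7, 9)

# Closed-form least-squares estimates
slope     <- cov(temp, precip) / var(temp)      # m
intercept <- mean(precip) - slope * mean(temp)  # b

# Compare with the coefficients estimated by lm()
fit <- lm(precip ~ temp)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```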

# Linear Regression
linear_model <- lm(precip ~ temp, data = iklim)
summary(linear_model)
## 
## Call:
## lm(formula = precip ~ temp, data = iklim)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0000 -1.1458  0.5583  1.4375  1.6500 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -37.9667     5.9817  -6.347 0.000221 ***
## temp          1.8833     0.2291   8.222 3.58e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.774 on 8 degrees of freedom
## Multiple R-squared:  0.8942, Adjusted R-squared:  0.881 
## F-statistic: 67.61 on 1 and 8 DF,  p-value: 3.583e-05
# Creating a scatter plot with linear regression line and points
plot(temp, precip, type = "p", pch = 16, col = "black", 
     ylim = c(0,20),
     main = "Scatter Plot with Linear Regression Line")
# Adding the linear regression line
abline(linear_model, col = "blue")
# Adding a legend
legend("topleft", legend=c("Data Points", "Linear Regression Line"), 
       pch = c(16, NA), lty= c(NA,1), col=c("black", "blue"), cex=0.8)

Multiple Linear Regression

Multiple linear regression extends the concept of linear regression by accommodating multiple independent variables that influence a single dependent variable. It aims to establish a linear equation that best represents the relationship between the dependent variable and the multiple predictors.

The equation takes the form y = b₀ + b₁x₁ + b₂x₂ + … + bₙxₙ, where y is the dependent variable, x₁, x₂, …, xₙ are the independent variables, and b₀, b₁, b₂, …, bₙ are the coefficients that represent the influence of each predictor while considering others.

Multiple linear regression helps us understand the combined impact of different factors on the dependent variable, making it a powerful tool in analyzing complex relationships and making predictions based on multiple input variables.
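The coefficients b₀, b₁, …, bₙ can also be obtained by solving the normal equations (XᵀX) b = Xᵀy, where X is the design matrix with a leading column of ones. The sketch below does this for the sample temperature, humidity, and precipitation values and compares the result with lm(). The matrix name X and the vector b are introduced here for illustration only.

```r
# Sample values: two predictors (temp, rh) and one response (precip)
temp   <- c(25, 28, 26, 22, 30, 29, 27, 24, 23, 26)
rh     <- c(70, 75, 72, 68, 80, 78, 73, 69, 71, 74)
precip <- c(10, 15, 8, 5, 20, 18, 12, 6, 7, 9)

# Design matrix: a column of ones for b0, then one column per predictor
X <- cbind(intercept = 1, temp = temp, rh = rh)

# Solve the normal equations (X'X) b = X'y for the coefficient vector b
b <- solve(crossprod(X), crossprod(X, precip))
print(drop(b))

# lm() fits the same model (it uses a QR decomposition internally)
print(coef(lm(precip ~ temp + rh)))
```

Solving the normal equations directly is fine for a small example like this one. For real analyses lm() is the safer choice because its QR-based approach is numerically more stable.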

# Multiple Regression
multiple_model <- lm(precip ~ temp + rh, data = iklim)
summary(multiple_model)
## 
## Call:
## lm(formula = precip ~ temp + rh, data = iklim)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5976 -0.3989  0.4827  0.8318  1.8394 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -59.8374    16.0283  -3.733  0.00733 **
## temp          1.0467     0.6132   1.707  0.13157   
## rh            0.5976     0.4103   1.456  0.18864   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.662 on 7 degrees of freedom
## Multiple R-squared:  0.9188, Adjusted R-squared:  0.8956 
## F-statistic:  39.6 on 2 and 7 DF,  p-value: 0.0001526
# Predict precip with regression model
predict_precip <- predict(multiple_model)
predict_precip
##         1         2         3         4         5         6         7         8 
##  8.160569 14.288618 10.402439  3.825203 19.369919 17.128049 12.046748  6.516260 
##         9        10 
##  6.664634 11.597561
# Correlation of the predicted precipitation with the observed precipitation and with each predictor
correlation_df <- data.frame(
  predicted_obs = cor(precip,predict_precip),
  predicted_rh = cor(predict_precip, rh),
  predicted_temp = cor(predict_precip, temp),
  stringsAsFactors = FALSE
)
correlation_df
##   predicted_obs predicted_rh predicted_temp
## 1      0.958537    0.9814305       0.986519
# Creating a scatter plot of observed precipitation with the fitted values drawn as a line
plot(y= precip,x= c(1:10),
     type = "p", 
     col = "red", 
     ylim = c(0,20),
     ylab= "Precipitation", 
     xlab= "Observation")
lines(predict_precip,type="l", col = "blue")

ANOVA

ANOVA, or Analysis of Variance, is a statistical technique used to analyze the variation between group means in a dataset. It determines whether there are significant differences among the means of two or more groups, and it is most often used with three or more (with exactly two groups it is equivalent to a pooled-variance t-test). ANOVA compares the variability within groups (due to random chance) to the variability between groups (due to the factors being studied). By calculating the F-statistic, ANOVA helps us decide whether the group means are significantly different from each other.

ANOVA is commonly used when comparing means across different treatments or categories, such as in experimental designs or comparing multiple groups in observational studies. It provides insights into whether observed differences are due to actual effects or mere chance, helping researchers make informed decisions about the impact of different factors.
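The F-statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it by hand for the sample temperatures grouped by climate type, using only base R, so the pieces of the ANOVA table can be traced back to the data.

```r
# Sample temperatures and their climate type labels
temp <- c(25, 28, 26, 22, 30, 29, 27, 24, 23, 26)
type_clim <- c("Tropis", "Tropis", "Subtropis", "Tropis", "Subtropis",
               "Subtropis", "Tropis", "Tropis", "Subtropis", "Tropis")

group_means <- tapply(temp, type_clim, mean)
group_sizes <- tapply(temp, type_clim, length)
grand_mean  <- mean(temp)

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-group sum of squares: how far each value is from its own group mean
ss_within <- sum((temp - group_means[type_clim])^2)

df_between <- length(group_means) - 1
df_within  <- length(temp) - length(group_means)

# F is the ratio of the two mean squares; the p-value is the upper tail of F
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
cat("F =", round(f_value, 3), " p =", round(p_value, 3), "\n")
```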

# ANOVA
anova_result <- aov(temp ~ type_clim, data = iklim)
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)
## type_clim    1   6.67   6.667       1  0.347
## Residuals    8  53.33   6.667

Let’s try with another example

# Simulated temperature data for three locations (A, B, C) over several months
location <- rep(c("A", "B", "C"), each = 10)
temperature <- c(23.5, 24.8, 25.2, 22.6, 22.9, 24.0, 25.1, 25.5, 24.7, 23.8,
                  26.3, 27.6, 26.9, 27.1, 26.0, 23.7, 22.8, 24.5, 23.4, 22.0,
                  28.5, 29.2, 28.0, 27.6, 26.8, 25.4, 24.9, 23.3, 22.5, 24.2)
data <- data.frame(location, temperature)
data
##    location temperature
## 1         A        23.5
## 2         A        24.8
## 3         A        25.2
## 4         A        22.6
## 5         A        22.9
## 6         A        24.0
## 7         A        25.1
## 8         A        25.5
## 9         A        24.7
## 10        A        23.8
## 11        B        26.3
## 12        B        27.6
## 13        B        26.9
## 14        B        27.1
## 15        B        26.0
## 16        B        23.7
## 17        B        22.8
## 18        B        24.5
## 19        B        23.4
## 20        B        22.0
## 21        C        28.5
## 22        C        29.2
## 23        C        28.0
## 24        C        27.6
## 25        C        26.8
## 26        C        25.4
## 27        C        24.9
## 28        C        23.3
## 29        C        22.5
## 30        C        24.2
# Load necessary libraries
library(dplyr)
library(ggplot2)
# Perform ANOVA
anova_result <- aov(temperature ~ location, data = data)
# Display ANOVA summary
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)
## location     2  16.80   8.402   2.443  0.106
## Residuals   27  92.87   3.440

Interpretation of the ANOVA Result

  1. The ANOVA was performed on the simulated temperature data from three locations (A, B, and C).

  2. The calculated F-statistic is 2.443 on 2 and 27 degrees of freedom, and the associated p-value is 0.106. The critical value of F at a significance level of α = 0.05 is approximately 3.35.

  3. Comparing the p-value to the significance level (α), the p-value (0.106) is greater than 0.05, and the F-statistic (2.443) is below the critical value. The data therefore do not show statistically significant differences in mean temperature among the three locations.

  4. Since the p-value is greater than 0.05, we fail to reject the null hypothesis that the location means are equal. This does not prove that the means are identical; it only means that the observed differences are within what random variation could plausibly produce with this sample size.

  5. In conclusion, based on this ANOVA, we cannot assert that the mean temperatures in the three locations (A, B, and C) differ. Post-hoc tests, which identify the specific pairs of locations that differ, are normally run only after a significant ANOVA result, so they are not warranted here.
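The F-statistic, p-value, and critical value discussed in the interpretation can be pulled from the fitted model instead of being copied by hand. The sketch below refits the three-location example and extracts them. The object names anova_fit, anova_table, and reject_null are introduced here for illustration.

```r
# Simulated temperature data for three locations, as in the example
location <- factor(rep(c("A", "B", "C"), each = 10))
temperature <- c(23.5, 24.8, 25.2, 22.6, 22.9, 24.0, 25.1, 25.5, 24.7, 23.8,
                 26.3, 27.6, 26.9, 27.1, 26.0, 23.7, 22.8, 24.5, 23.4, 22.0,
                 28.5, 29.2, 28.0, 27.6, 26.8, 25.4, 24.9, 23.3, 22.5, 24.2)

anova_fit   <- aov(temperature ~ location)
anova_table <- summary(anova_fit)[[1]]

# The first row of the table holds the location effect
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Critical F at alpha = 0.05 for the model's degrees of freedom
alpha  <- 0.05
f_crit <- qf(1 - alpha, df1 = anova_table[["Df"]][1], df2 = anova_table[["Df"]][2])

cat("F =", round(f_value, 3), " p =", round(p_value, 3),
    " critical F =", round(f_crit, 3), "\n")

# Pairwise post-hoc comparisons are run only if the overall test is significant
reject_null <- p_value < alpha
if (reject_null) print(TukeyHSD(anova_fit))
```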

Significance Test

A significance test, also known as a hypothesis test, is a statistical method used to determine whether the observed results in a dataset are statistically significant or if they could likely occur due to random chance. It involves formulating a null hypothesis (H₀), which assumes no effect or no difference, and an alternative hypothesis (H₁), which suggests a specific effect or difference.

The test calculates a p-value, which indicates the probability of obtaining results as extreme as or more extreme than the observed results, assuming that the null hypothesis is true. If the p-value is below a predefined significance level (often denoted as α), typically 0.05, researchers reject the null hypothesis in favor of the alternative hypothesis, suggesting that the observed effect is statistically significant.

Significance tests play a crucial role in research by helping researchers make decisions based on data, determine the validity of hypotheses, and assess the reliability of findings. Common significance tests include t-tests, chi-square tests, and ANOVA, among others.
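To make the p-value less of a black box, the sketch below computes a Welch two-sample t-test by hand for the sample temperatures in groups A and B: the t-statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value. The variable names a, b, se, t_stat, and df_welch are introduced here for illustration. t.test() performs the same calculation.

```r
# Sample temperatures and their group labels
temp <- c(25, 28, 26, 22, 30, 29, 27, 24, 23, 26)
group_clim <- c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")

a <- temp[group_clim == "A"]
b <- temp[group_clim == "B"]

# Standard error of the difference in means (unequal variances allowed)
va <- var(a) / length(a)
vb <- var(b) / length(b)
se <- sqrt(va + vb)

# Welch t-statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean(a) - mean(b)) / se
df_welch <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# Two-sided p-value: probability of a |t| at least this large under H0
p_value <- 2 * pt(abs(t_stat), df_welch, lower.tail = FALSE)
cat("t =", round(t_stat, 5), " df =", round(df_welch, 4),
    " p =", round(p_value, 4), "\n")

# Reference result from the built-in test
reference <- t.test(a, b)
```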

# Significance Test
t_test_result <- t.test(temp ~ group_clim, data = iklim)
print(t_test_result)
## 
##  Welch Two Sample t-test
## 
## data:  temp by group_clim
## t = 0.23171, df = 7.9197, p-value = 0.8226
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -3.587814  4.387814
## sample estimates:
## mean in group A mean in group B 
##            26.2            25.8

About Me

Hello, I am Rahmat Hidayat. I am currently pursuing a Master's degree in Applied Climatology at IPB University.

Hit Me Up

Instagram, LinkedIn💟