1. Introduction:

R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1993.

Why use R for statistical computing and graphics?

Application of R programming in the real world

How to download & install R and RStudio

# 1.1 R as a calculator
## 1.1.1 Performing various arithmetic operations: addition (+), subtraction (-), multiplication (*), exponent (^) etc.
r 25+3
## [1] 28
r 25-3
## [1] 22
r 25*3
## [1] 75
r 25/3 # division
## [1] 8.333333
r 25%/%3 # gives the integer quotient when 25 is divided by 3
## [1] 8
r 2**3
## [1] 8
r 2^3
## [1] 8
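The integer-division operator `%/%` shown above has a companion, `%%`, which returns the remainder. A minimal sketch of how the two fit together (the names `quot` and `rem` are illustrative and not part of the original notes):

```r
# Integer quotient and remainder of 25 divided by 3
quot <- 25 %/% 3   # integer quotient
rem  <- 25 %% 3    # remainder (modulo)
quot               # 8
rem                # 1

# Quotient and remainder together reconstruct the original number
quot * 3 + rem     # 25
```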
## 1.1.2 Built-in functions
Built-in functions are predefined functions that R provides ready to use.
r exp(6)
## [1] 403.4288
r sqrt(36) #square root of 36
## [1] 6
r sum(2,3,4,5,6)
## [1] 20
r log(64)
## [1] 4.158883
r log(10,2) #log 10 to base 2
## [1] 3.321928
r log(42,10)#log 42 to the base 10
## [1] 1.623249
r log(5,3)
## [1] 1.464974
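With a single argument, `log()` returns the natural logarithm (base e), which is why `log(64)` above is not 6. The sketch below compares the common ways to choose a base; the variable names are illustrative only.

```r
# log() defaults to base e; other bases are given by the second argument
nat  <- log(64)            # natural logarithm of 64
b2   <- log(64, base = 2)  # logarithm of 64 to base 2
b2s  <- log2(64)           # shortcut for base 2
b10  <- log10(1000)        # shortcut for base 10

b2                         # 6
b10                        # 3

# exp() is the inverse of the natural logarithm
back <- exp(nat)           # 64, up to floating-point rounding
```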
r factorial(5) # 5!= 1*2*3*4*5
## [1] 120
r abs(-8.5)#absolute value
## [1] 8.5
r floor(3.8) #greatest integer less than or equal to 3.8
## [1] 3
r ceiling(3.2) #smallest integer greater than or equal to 3.2
## [1] 4
r rep(35,times=10) #repeat 35 ten times
## [1] 35 35 35 35 35 35 35 35 35 35
r rep("Happy", times=5)
## [1] "Happy" "Happy" "Happy" "Happy" "Happy"
r 5:9 # display numbers from 5 to 9
## [1] 5 6 7 8 9
r 1:100
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
r seq(5,9) # generates a sequence of numbers from 5 to 9
## [1] 5 6 7 8 9
r seq(5,10,0.5) #generates a sequence of numbers starting from 5, incrementing by 0.5 till 10
## [1] 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
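The third positional argument of `seq()` is the step size. Naming the arguments makes the intent clearer, and `length.out` can be used instead of a step when the number of points is what matters. A short sketch (the names `s1`, `s2` and `s3` are illustrative):

```r
# Same call as seq(5, 10, 0.5), with named arguments
s1 <- seq(from = 5, to = 10, by = 0.5)

# Ask for a fixed number of equally spaced points instead of a step size
s2 <- seq(from = 0, to = 1, length.out = 5)
s2   # 0.00 0.25 0.50 0.75 1.00

# A negative step counts downwards
s3 <- seq(from = 10, to = 1, by = -3)
s3   # 10 7 4 1
```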
r seq(1,50,5)
## [1] 1 6 11 16 21 26 31 36 41 46
## 1.1.3 Relational Operators
Relational operators in R are used to compare values and determine the relationship between them.
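Relational operators work element by element when one side holds several values, and exact equality on decimal results can be surprising. The following is a small sketch of both points, using only illustrative names:

```r
# Comparison is applied to every element of 1:5
gt3 <- 1:5 > 3
gt3                      # FALSE FALSE FALSE  TRUE  TRUE

# <= and >= include the boundary value
5 >= 5                   # TRUE
5 <= 4                   # FALSE

# Floating-point results may differ from the exact value in the last digits,
# so == can return FALSE where the mathematics says "equal"
exact <- sqrt(2)^2 == 2
exact                    # FALSE

# all.equal() compares with a small tolerance instead
close_enough <- isTRUE(all.equal(sqrt(2)^2, 2))
close_enough             # TRUE
```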
r 2<5
## [1] TRUE
r 3>9
## [1] FALSE
r 2+3==6 #to check whether 'is equal to'
## [1] FALSE
r 2!=3 #to check whether 'not equal to'
## [1] TRUE
# 1.2 Assigning value to a variable
We can assign a value to a variable using the assignment operator <- or the equal sign =.
Syntax: variable_name <- value
r x <- 48 # x is assigned the value 48
r print(x)
## [1] 48
r x/5
## [1] 9.6
r x*2
## [1] 96
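Both assignment forms can be seen side by side in the sketch below. The variable names `price`, `qty` and `total` are made up for illustration and do not appear elsewhere in these notes.

```r
price <- 48          # assignment with <-
qty = 3              # = also assigns at the top level
total <- price * qty
total                # 144

# A variable can be reassigned, even using its own current value
price <- price + 2
price                # 50
```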
r y<-"Happy"
r rm(x) #to remove x from memory
r rm(y)
# 1.3 Vectors
In R, a vector is a fundamental data structure that holds a collection of values of the same data type. Vectors are used extensively in data analysis, statistics, and programming. c() is used to create vectors.
```r
a<-c(20,30,40,45,56) #easiest way to create a vector in R
str(a) # it will provide information about the data type of a and its contents
```
## num [1:5] 20 30 40 45 56
r print(a)
## [1] 20 30 40 45 56
r View(a) # opens a in a data viewer window
# Subsetting (operator [ ])
Subsetting with the [ ] operator extracts specific elements, or subsets of elements, from an object.
r a[3] # to extract third value or subset of vector "a" with 3rd element
## [1] 40
r a[5]
## [1] 56
r a[c(1,3,5)] #to extract several values
## [1] 20 40 56
r a[-2] #to drop second value
## [1] 20 40 45 56
r a[-3]
## [1] 20 30 45 56
r a[c(-2,-5)]# to drop 2nd and 5th value
## [1] 20 40 45
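Besides single positions and negative indices, the `[ ]` operator also accepts a range of positions or a logical condition. A minimal sketch, re-creating the vector `a` so the block runs on its own:

```r
a <- c(20, 30, 40, 45, 56)

# A range of positions
a[2:4]                  # 30 40 45

# A logical condition keeps the elements for which it is TRUE
a[a > 35]               # 40 45 56

# Conditions can be combined with & (and) or | (or)
a[a > 25 & a < 50]      # 30 40 45

# which() returns the positions rather than the values
which(a > 35)           # 3 4 5
```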
r length(a) # To find number of elements
## [1] 5
# Find the class of a vector
r a1<-c(20,30,40,45,56)
r class(a1) # To find class of a vector
## [1] "numeric"
r m <- c(5, 'a', -1, 2)
r class(m)
## [1] "character"
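The result above is "character" because a vector can hold only one data type, so R converts (coerces) mixed values to the most general type present: logical, then numeric, then character. A brief sketch of that rule, with illustrative variable names:

```r
# logical + numeric -> numeric (TRUE becomes 1, FALSE becomes 0)
ln <- c(TRUE, 2)
class(ln)                 # "numeric"
ln                        # 1 2

# numeric + character -> character
nc <- c(2, "a")
class(nc)                 # "character"

# Because TRUE counts as 1, sum() counts the TRUE values
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true                    # 2
```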
r m<-c(TRUE,F,T,FALSE)
r class(m)
## [1] "logical"
r sapply(m, class) # display class of all elements
## [1] "logical" "logical" "logical" "logical"
# Operations on vectors
r b<-c(1,2,3,4,5)
r a+b
## [1] 21 32 43 49 61
r a*b
## [1] 20 60 120 180 280
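Arithmetic on vectors is applied element by element. When the two vectors differ in length, R repeats (recycles) the shorter one. The sketch below illustrates this with made-up vectors; `long` and `short` are not part of the original notes.

```r
a <- c(20, 30, 40, 45, 56)

# A single number is recycled across every element
doubled <- a * 2
doubled                   # 40 60 80 90 112

# A shorter vector is repeated until it matches the longer one
long  <- c(1, 2, 3, 4, 5, 6)
short <- c(10, 20)
recycled <- long + short  # same as long + c(10, 20, 10, 20, 10, 20)
recycled                  # 11 22 13 24 15 26
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.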
r a-b
## [1] 19 28 37 41 51
# Question 1
Construct a vector with elements -2,-3,-6,10,7 and assign it as X, and another vector with elements 11,23,14,52,16 and assign it as Y.
a) Find the length of X and Y
b) Remove the 2nd element from Y
c) Find the 4th element of X
d) Find X+Y and X*Y
e) Find Y/X and round to 1 decimal place
r X<-c(-2,-3,-6,10,7)
r Y<-c(11,23,14,52,16)
r length(X)
## [1] 5
r length(Y)
## [1] 5
r Y[-2]
## [1] 11 14 52 16
r X[4]
## [1] 10
r X+Y
## [1] 9 20 8 62 23
r X*Y
## [1] -22 -69 -84 520 112
r round(Y/X,1)
## [1] -5.5 -7.7 -2.3 5.2 2.3
r data <- c("apple", 3.14, 42, TRUE)
r class(data[2]) # every element was coerced to character
## [1] "character"
# Question 2
The weights of 5 people before and after a diet plan are given. Find each person's weight loss and the mean weight loss.
before: 78,72,78,79,105
after : 67,65,79,70,93
r before<-c(78,72,78,79,105)
r after<-c(67,65,79,70,93)
r wtloss<-before-after
r wtloss
## [1] 11 7 -1 9 12
r mean(wtloss)
## [1] 7.6
r max(wtloss)
## [1] 12
r min(wtloss)
## [1] -1
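The negative minimum shows that one person gained weight. As a possible extension of Question 2 (not part of the original exercise), the sketch below counts who lost weight and expresses each loss as a percentage of the starting weight:

```r
before <- c(78, 72, 78, 79, 105)
after  <- c(67, 65, 79, 70, 93)
wtloss <- before - after

# How many people actually lost weight?
n_lost <- sum(wtloss > 0)
n_lost                    # 4

# Position of the person who gained weight
gained <- which(wtloss < 0)
gained                    # 3

# Weight loss as a percentage of the starting weight, to 1 decimal place
pct_loss <- round(100 * wtloss / before, 1)
pct_loss                  # 14.1  9.7 -1.3 11.4 11.4
```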
# Practice Questions
## Question 1: Create a vector grades with elements “A”, “B”, “C”, “D”, “F”. Change the third element to “B+”.
## Question 2: Create a vector temperatures with elements 72, 68, 75, 80, 77. Convert the temperatures from Fahrenheit to Celsius using the formula (Fahrenheit - 32) * 5/9. Round the Celsius temperatures to one decimal place.
## Question 3: Create two vectors, vector1 with elements 1 to 5 and vector2 with elements 6 to 10. Calculate the element-wise product of these vectors.
## Question 4: Create a vector names with elements “John”, “Jane”, “Bob”, “Alice”, “Eve”. Extract the first three elements of the vector and store them in a new vector called subset_names.
## Question 5: Create a vector data with the following values: “apple”, 3.14, 42, TRUE. Find the class of each element in the vector and store the results in a new vector called data_classes.

# 1.4 Dataframe

In a dataframe, data is organized into rows and columns. Each column can hold a different data type, but all values within one column share the same type.
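To see the column types of a dataframe, `str()` and `sapply(..., class)` are handy. The sketch below uses a small made-up dataframe; `df` and its column names are assumptions for illustration only. `stringsAsFactors = FALSE` is spelled out so that text columns stay as character in older R versions too.

```r
# A tiny dataframe with three columns of different types
df <- data.frame(
  id     = 1:3,
  label  = c("x", "y", "z"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

str(df)                        # compact overview of rows, columns and types
col_types <- sapply(df, class) # class of each column
col_types                      # id: integer, label: character, passed: logical
dim(df)                        # 3 rows, 3 columns
```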

# create a DATA FRAME

Applied <- data.frame(
                  Specialization = c("Bio","Chem","ES","Bio"),
                  Level = c("AD","Btech","Dip","Btech"),
                  GPA = c(3.21,3.63,2.15,2.01))
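
# SUBSETTING: create a subset of dataframe 'Applied' containing the
# 2nd and 3rd columns only. Call it 'AS2'
AS2 <- Applied[,c(2,3)]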

# SUBSETTING: create a subset of dataframe 'Applied' containing the
# last and first columns only. Call it 'AS3'

AS3 <- Applied[,c(3,1)]

# SUBSETTING: create a subset of dataframe 'Applied' containing the
# 2nd column and last three rows only. Call it 'AS4'
# drop = FALSE keeps the single-column result as a dataframe
AS4 <- Applied[2:4, 2, drop = FALSE]
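
# Selecting a single column with [rows, column] returns a plain vector by
# default. The sketch below (names 'as_vector' and 'as_frame' are illustrative)
# shows the difference that drop = FALSE makes.

```r
Applied <- data.frame(
  Specialization = c("Bio", "Chem", "ES", "Bio"),
  Level = c("AD", "Btech", "Dip", "Btech"),
  GPA = c(3.21, 3.63, 2.15, 2.01),
  stringsAsFactors = FALSE
)

# Default behaviour: a one-column selection is simplified to a vector
as_vector <- Applied[2:4, 2]
class(as_vector)          # "character"

# drop = FALSE keeps the dataframe structure
as_frame <- Applied[2:4, 2, drop = FALSE]
class(as_frame)           # "data.frame"
dim(as_frame)             # 3 1
```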

# Create a dataframe
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(92, 85, 78, 96, 88)
)

# Print the dataframe
print(data)
##      Name Age Score
## 1   Alice  25    92
## 2     Bob  30    85
## 3 Charlie  22    78
## 4   David  35    96
## 5     Eva  28    88
# Access specific columns
names <- data$Name
ages <- data$Age
scores <- data$Score

# Print the values of specific columns
cat("Names: ", names, "\n")
## Names:  Alice Bob Charlie David Eva
cat("Ages: ", ages, "\n")
## Ages:  25 30 22 35 28
cat("Scores: ", scores, "\n")
## Scores:  92 85 78 96 88
# Calculate summary statistics
mean_age <- mean(ages)
mean_score <- mean(scores)

cat("Mean Age: ", mean_age, "\n")
## Mean Age:  28
cat("Mean Score: ", mean_score, "\n")
## Mean Score:  87.8
# Filter data based on a condition (e.g., age greater than 25)
filtered_data <- data[data$Age > 25, ]

# Print the filtered dataframe
print(filtered_data)
##    Name Age Score
## 2   Bob  30    85
## 4 David  35    96
## 5   Eva  28    88
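
# Conditions can be combined, and subset() offers a shorter way to write the
# same kind of filter. A sketch that re-creates 'data' so it runs on its own
# ('both' and 'older' are illustrative names):

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(92, 85, 78, 96, 88)
)

# Rows where Age is above 25 AND Score is at least 88
both <- data[data$Age > 25 & data$Score >= 88, ]
both$Name                 # "David" "Eva"

# subset() filters rows and, optionally, selects columns by name
older <- subset(data, Age > 25, select = c(Name, Score))
nrow(older)               # 3
names(older)              # "Name" "Score"
```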
# Sort the dataframe by Age in descending order
sorted_data <- data[order(data$Age, decreasing = TRUE), ]

# Print the sorted dataframe
print(sorted_data)
##      Name Age Score
## 4   David  35    96
## 2     Bob  30    85
## 5     Eva  28    88
## 1   Alice  25    92
## 3 Charlie  22    78
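
# order() sorts in ascending order by default. The sketch below sorts by Score
# and looks up the top scorer; 'by_score' and 'top' are illustrative names.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(92, 85, 78, 96, 88)
)

# Ascending sort by Score (lowest score first)
by_score <- data[order(data$Score), ]
by_score$Name             # "Charlie" "Bob" "Eva" "Alice" "David"

# which.max() gives the row position of the highest score
top <- data$Name[which.max(data$Score)]
top                       # "David"
```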