R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1993.
R can be downloaded for free from https://www.r-project.org/
RStudio lets the user run R in a more user-friendly environment. An open-source (free) edition is available at https://rstudio.com/products/rstudio/download/
25+3
## [1] 28
25-3
## [1] 22
25*3
## [1] 75
25/3 # division
## [1] 8.333333
25%/%3 # integer division: gives the integer quotient when 25 is divided by 3
## [1] 8
2**3
## [1] 8
2^3
## [1] 8
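The integer-division operator %/% has a companion, %%, which returns the remainder. The short sketch below shows how the two relate; the variable names quotient and remainder are illustrative and not part of the original notes.

```r
quotient  <- 25 %/% 3   # integer part of 25 / 3
remainder <- 25 %% 3    # what is left over after integer division
quotient
remainder
# quotient * divisor + remainder recovers the original number
quotient * 3 + remainder
```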
Built-in functions are pre-defined functions that are available as soon as R starts.
exp(6)
## [1] 403.4288
sqrt(36) #square root of 36
## [1] 6
sum(2,3,4,5,6)
## [1] 20
log(64) # natural log (base e) when no base is given
## [1] 4.158883
log(10,2) #log 10 to base 2
## [1] 3.321928
log(42,10)#log 42 to the base 10
## [1] 1.623249
log(5,3)
## [1] 1.464974
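Because log() uses base e unless told otherwise, it can help to name the base explicitly. The sketch below is one way to do that, assuming the shortcut functions log2() and log10() from base R; the variable names are illustrative.

```r
natural <- log(64)            # base e, because no base is given
base2   <- log(64, base = 2)  # same result as log2(64)
base10  <- log10(1000)        # shortcut for log(1000, base = 10)
natural
base2
base10
exp(natural)                  # exp() undoes the natural log
```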
factorial(5) # 5!= 1*2*3*4*5
## [1] 120
abs(-8.5)#absolute value
## [1] 8.5
floor(3.8) # greatest integer less than or equal to 3.8
## [1] 3
ceiling(3.2) # smallest integer greater than or equal to 3.2
## [1] 4
rep(35,times=10) # repeat 35 ten times
## [1] 35 35 35 35 35 35 35 35 35 35
rep("Happy", times=5)
## [1] "Happy" "Happy" "Happy" "Happy" "Happy"
5:9 # display numbers from 5 to 9
## [1] 5 6 7 8 9
1:100
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
seq(5,9) # generates a sequence of numbers from 5 to 9
## [1] 5 6 7 8 9
seq(5,10,0.5) #generates a sequence of numbers starting from 5, incrementing by 0.5 till 10
## [1] 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
seq(1,50,5)
## [1] 1 6 11 16 21 26 31 36 41 46
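The third argument of seq() is the step size. Naming the arguments makes the intent clearer, and length.out can be used instead of a step when a fixed number of values is wanted. A small sketch, with illustrative variable names:

```r
by_step   <- seq(from = 1, to = 50, by = 5)         # fixed step size
by_length <- seq(from = 0, to = 1, length.out = 5)  # fixed number of values
countdown <- seq(10, 1, by = -3)                    # a negative step counts down
by_step
by_length
countdown
```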
Relational operators in R are used to compare values and determine the relationship between them.
2<5
## [1] TRUE
3>9
## [1] FALSE
2+3==6 #to check whether 'is equal to'
## [1] FALSE
2!=3 #to check whether 'not equal to'
## [1] TRUE
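Relational operators also work element by element on vectors. The sketch below shows this, along with one caution: testing decimal results with == can be misleading because of floating-point rounding, so all.equal() is often the safer check. The variable names are illustrative.

```r
values <- c(2, 5, 9)
values > 4             # compares each element with 4
values == c(2, 6, 9)   # element-by-element comparison of two vectors
# Floating-point rounding can make == surprising; all.equal() allows a small tolerance
exact  <- (0.1 + 0.2) == 0.3
approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))
exact
approx
```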
We can assign a value to a variable using the assignment operator <- or the equal sign =. Syntax: variable_name <- value
x <- 48 # x is assigned the value 48
print(x)
## [1] 48
x/5
## [1] 9.6
x*2
## [1] 96
y<-"Happy"
rm(x) #to remove x from memory
rm(y)
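Both assignment forms behave the same way at the top level, and rm() can remove several objects in one call. A minimal sketch, using the illustrative names score and label, with exists() to check whether an object is still in memory:

```r
score <- 48          # arrow assignment
label = "Happy"      # equal-sign assignment, same effect here
exists("score")      # TRUE while the object is in memory
rm(score, label)     # rm() accepts several objects at once
still_there <- exists("score", inherits = FALSE)
still_there
```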
In R, a vector is a fundamental data structure that can hold a collection of values of the same data type. Vectors are essential in R and are used extensively in data analysis, statistics, and programming. c() is used to create vectors.
a<-c(20,30,40,45,56) #easiest way to create a vector in R
str(a) # it will provide information about the data type of a and its contents
## num [1:5] 20 30 40 45 56
print(a)
## [1] 20 30 40 45 56
View(a)
Subsetting with the [ ] operator lets you extract specific elements, or subsets of elements, from an object.
a[3] # extract the third element of vector a
## [1] 40
a[5]
## [1] 56
a[c(1,3,5)] #to extract several values
## [1] 20 40 56
a[-2] #to drop second value
## [1] 20 40 45 56
a[-3]
## [1] 20 30 45 56
a[c(-2,-5)]# to drop 2nd and 5th value
## [1] 20 40 45
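Besides positions, the [ ] operator also accepts a range of positions or a logical condition. The sketch below reuses the vector a from above; the name big is illustrative.

```r
a <- c(20, 30, 40, 45, 56)
a[2:4]           # elements 2 through 4
a[a > 35]        # keep only the values greater than 35
which(a > 35)    # positions of those values
big <- a[a > 35]
big
```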
length(a) # To find number of elements
## [1] 5
a1<-c(20,30,40,45,56)
class(a) # To find class of a vector
## [1] "numeric"
m <- c(5, 'a', -1, 2)
class(m)
## [1] "character"
m<-c(TRUE,F,T,FALSE)
class(m)
## [1] "logical"
sapply(m, class) # display class of all elements
## [1] "logical" "logical" "logical" "logical"
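Because a vector holds a single data type, R silently converts mixed input to the most general type present. The following sketch illustrates that behaviour and the conversion back to numbers; the names mixed, flags and back are illustrative.

```r
mixed <- c(5, "a", -1, 2)     # the string forces everything to character
class(mixed)
flags <- c(TRUE, FALSE, 3)    # logical values become numbers (TRUE is 1, FALSE is 0)
flags
class(flags)
# Converting back: entries that are not numbers become NA (R normally warns about this)
back <- suppressWarnings(as.numeric(mixed))
back
```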
#Operations on vectors
b<-c(1,2,3,4,5)
a+b
## [1] 21 32 43 49 61
a*b
## [1] 20 60 120 180 280
a-b
## [1] 19 28 37 41 51
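Arithmetic between a vector and a single number works the same way: the single value is reused for every element. A short sketch with the same a and b as above; doubled and ratio are illustrative names.

```r
a <- c(20, 30, 40, 45, 56)
b <- c(1, 2, 3, 4, 5)
doubled <- a * 2    # the single value 2 is reused for every element
ratio   <- a / b    # element-by-element division
doubled
ratio
sum(a * b)          # total of the element-wise products
```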
Construct a vector with elements -2,-3,-6,10,7 and assign it as X, and another vector with elements 11,23,14,52,16 and assign it as Y.
a) Find the length of X and Y
b) Remove the 2nd element from Y
c) Find the 4th element of X
d) Find X+Y and X*Y
e) Find Y/X and round to 1 decimal place
X<-c(-2,-3,-6,10,7)
Y<-c(11,23,14,52,16)
length(X)
## [1] 5
length(Y)
## [1] 5
Y[-2]
## [1] 11 14 52 16
X[4]
## [1] 10
X+Y
## [1] 9 20 8 62 23
X*Y
## [1] -22 -69 -84 520 112
round(Y/X,1)
## [1] -5.5 -7.7 -2.3 5.2 2.3
data <- c("apple", 3.14, 42, TRUE)
class(data[2])
## [1] "character"
before<-c(78,72,78,79,105)
after<-c(67,65,79,70,93)
wtloss<-before-after
wtloss
## [1] 11 7 -1 9 12
mean(wtloss)
## [1] 7.6
max(wtloss)
## [1] 12
min(wtloss)
## [1] -1
Create a vector grades with elements “A”, “B”, “C”, “D”, “F”. Change the third element to “B+”.
Create a vector temperatures with elements 72, 68, 75, 80, 77. Convert the temperatures from Fahrenheit to Celsius using the formula (Fahrenheit - 32) * 5/9. Round the Celsius temperatures to one decimal place.
Create two vectors, vector1 with elements 1 to 5 and vector2 with elements 6 to 10. Calculate the product of these vectors.
Create a vector names with elements “John”, “Jane”, “Bob”, “Alice”, “Eve”. Extract the first three elements of the vector and store them in a new vector called subset_names.
Create a vector data with the following values: “apple”, 3.14, 42, TRUE. Find the class of each element in the vector and store the results in a new vector called data_classes.
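One possible set of answers to the five exercises above is sketched below. It assumes that "product" in the third exercise means the element-wise product; the names celsius and product are not given in the exercises and are chosen here for illustration.

```r
# 1. Replace the third grade
grades <- c("A", "B", "C", "D", "F")
grades[3] <- "B+"

# 2. Fahrenheit to Celsius, rounded to one decimal place
temperatures <- c(72, 68, 75, 80, 77)
celsius <- round((temperatures - 32) * 5/9, 1)

# 3. Element-wise product of the two vectors (assumed meaning of "product")
vector1 <- 1:5
vector2 <- 6:10
product <- vector1 * vector2

# 4. First three names
names <- c("John", "Jane", "Bob", "Alice", "Eve")
subset_names <- names[1:3]

# 5. Class of each element; every entry is "character" because of coercion
data <- c("apple", 3.14, 42, TRUE)
data_classes <- sapply(data, class)
```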
In a dataframe, data is organized into rows and columns, where each column can contain data of a different data type.
# create a DATA FRAME
Applied <- data.frame(
Specialization = c("Bio","Chem","ES","Bio"),
Level = c("AD","Btech","Dip","Btech"),
GPA = c(3.21,3.63,2.15,2.01))
Applied
## Specialization Level GPA
## 1 Bio AD 3.21
## 2 Chem Btech 3.63
## 3 ES Dip 2.15
## 4 Bio Btech 2.01
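To confirm that each column really has its own data type, the data frame can be inspected with a few base functions. This sketch rebuilds Applied so that it runs on its own; col_types is an illustrative name.

```r
Applied <- data.frame(
  Specialization = c("Bio", "Chem", "ES", "Bio"),
  Level = c("AD", "Btech", "Dip", "Btech"),
  GPA = c(3.21, 3.63, 2.15, 2.01))
str(Applied)                          # column names, types and first values
dim(Applied)                          # number of rows, then number of columns
col_types <- sapply(Applied, class)   # class of every column
col_types
mean(Applied$GPA)                     # $ pulls out one column as a vector
```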
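# SUBSETTING: create a subset of dataframe 'Applied' containing the
# 2nd and 3rd columns only. Call it 'AS2'
AS2 <- Applied[,c(2,3)]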
# SUBSETTING: create a subset of dataframe 'Applied' containing the
# last and first columns only. Call it 'AS3'
AS3 <- Applied[,c(3,1)]
AS3
## GPA Specialization
## 1 3.21 Bio
## 2 3.63 Chem
## 3 2.15 ES
## 4 2.01 Bio
# SUBSETTING: create a subset of dataframe 'Applied' containing the
# 2nd column and last three rows only. Call it 'AS4'
AS4 <- Applied[c(2:4),2]
AS4
## [1] "Btech" "Dip" "Btech"
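Note that selecting a single column this way returns a plain vector rather than a data frame, as the output above shows. If a data frame is wanted, the drop = FALSE argument keeps that structure. A sketch, with as_vector and as_frame as illustrative names:

```r
Applied <- data.frame(
  Specialization = c("Bio", "Chem", "ES", "Bio"),
  Level = c("AD", "Btech", "Dip", "Btech"),
  GPA = c(3.21, 3.63, 2.15, 2.01))
as_vector <- Applied[2:4, 2]                # one column: simplified to a vector
as_frame  <- Applied[2:4, 2, drop = FALSE]  # drop = FALSE keeps a data frame
class(as_vector)
class(as_frame)
as_frame
```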
# Create a dataframe
data <- data.frame(
Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
Age = c(25, 30, 22, 35, 28),
Score = c(92, 85, 78, 96, 88)
)
# Print the dataframe
print(data)
## Name Age Score
## 1 Alice 25 92
## 2 Bob 30 85
## 3 Charlie 22 78
## 4 David 35 96
## 5 Eva 28 88
# Access specific columns
names <- data$Name
ages <- data$Age
scores <- data$Score
# Calculate summary statistics
mean_age <- mean(ages)
mean_score <- mean(scores)
cat("Mean Age: ", mean_age, "\n")
## Mean Age: 28
cat("Mean Score: ", mean_score, "\n")
## Mean Score: 87.8
# Filter data based on a condition (e.g., age greater than 25)
filtered_data <- data[data$Age > 25, ]
# Print the filtered dataframe
print(filtered_data)
## Name Age Score
## 2 Bob 30 85
## 4 David 35 96
## 5 Eva 28 88
# Sort the dataframe by Age in descending order
sorted_data <- data[order(data$Age, decreasing = TRUE), ]
# Print the sorted dataframe
print(sorted_data)
## Name Age Score
## 4 David 35 96
## 2 Bob 30 85
## 5 Eva 28 88
## 1 Alice 25 92
## 3 Charlie 22 78
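order() sorts in ascending order unless decreasing = TRUE is given. The sketch below sorts the same data by Score from lowest to highest and looks up the top scorer with which.max(); by_score and top_name are illustrative names.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(92, 85, 78, 96, 88)
)
by_score <- data[order(data$Score), ]         # ascending is the default
by_score
top_name <- data$Name[which.max(data$Score)]  # name of the highest scorer
top_name
```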
Create a data frame named ‘student_data’ with the following columns: ‘Name’ (character), ‘Age’ (numeric), ‘Gender’ (factor), and ‘Score’ (numeric). Add at least 5 rows of data to this data frame.
student_data=data.frame(
Name = c('Arun','gopi','nivya','manu','unni'),
Age = c( 25,30,15,25,23),
Gender = factor(c('Male','Male','female', 'Male','Male')), # stored as a factor, as the exercise asks
Score =c(50,25,76,83,11))
student_data
## Name Age Gender Score
## 1 Arun 25 Male 50
## 2 gopi 30 Male 25
## 3 nivya 15 female 76
## 4 manu 25 Male 83
## 5 unni 23 Male 11
Data Frame Subsetting:
From the ‘student_data’ data frame, extract only the ‘Name’ and ‘Score’ columns into a new data frame named ‘name_and_score.’ Create a subset of ‘student_data’ that includes only male students with ages greater than 20.
name_and_score= student_data[,c(1,4)]
name_and_score
## Name Score
## 1 Arun 50
## 2 gopi 25
## 3 nivya 76
## 4 manu 83
## 5 unni 11
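The second part of the exercise, male students older than 20, can be answered by combining two conditions with &. This is one possible sketch; it rebuilds student_data with Gender stored as a factor, and the name male_over_20 is chosen here for illustration.

```r
student_data <- data.frame(
  Name = c('Arun', 'gopi', 'nivya', 'manu', 'unni'),
  Age = c(25, 30, 15, 25, 23),
  Gender = factor(c('Male', 'Male', 'female', 'Male', 'Male')),
  Score = c(50, 25, 76, 83, 11))
# & requires both conditions to hold for a row to be kept
male_over_20 <- student_data[student_data$Gender == "Male" & student_data$Age > 20, ]
male_over_20
```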
In R, you can create lists using the list() function. A list is a versatile data structure that can hold various data types, including vectors, matrices, data frames, and even other lists. Here’s how you can create and work with lists in R:
# List
shoplist <- list(
c("nails","hummer","woods"),
c("veg"),
c("notebooks"),
c(50,10,5),
Applied
)
SUBSETTING: create a vector containing the first element of ‘shoplist’. Call it ‘list1’.
list1 <- shoplist[[1]]
list1
## [1] "nails" "hummer" "woods"
SUBSETTING: create a vector containing the fourth element of ‘shoplist’ (the numeric vector).
Call it ‘list2’.
list2 <- shoplist[[4]]
SUBSETTING: extract only the 2nd value of the fourth element of ‘shoplist’.
Assign it to ‘list2’, replacing the previous value.
list2 <- shoplist[[4]][[2]]
list2
## [1] 10
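The double bracket [[ ]] returns the element itself, while a single bracket [ ] returns a shorter list. The sketch below shows the difference on a smaller list; the element names hardware and prices are hypothetical and were added so that the $ shortcut can be shown.

```r
shoplist <- list(
  hardware = c("nails", "hummer", "woods"),
  prices   = c(50, 10, 5))
one_bracket <- shoplist[1]     # still a list (of length 1)
two_bracket <- shoplist[[1]]   # the character vector itself
class(one_bracket)
class(two_bracket)
shoplist$prices[2]             # named elements can be reached with $
```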
cars <- mtcars
View(cars)
#OR
View(mtcars)
# show the first 6 rows
head(cars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# show the first 3 rows
head(cars,n=3)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# show the last 6 rows
tail(cars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
# show the last 10 rows
tail(cars,n=10)
## mpg cyl disp hp drat wt qsec vs am gear carb
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
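Alongside head() and tail(), a few other base functions give a quick overview of a data frame without opening the viewer. A sketch using the same mtcars copy; the name size is illustrative.

```r
cars <- mtcars
size <- dim(cars)     # number of rows, then number of columns
size
names(cars)           # column names
summary(cars$mpg)     # minimum, quartiles, mean and maximum of one column
```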
# To view built-in data frames
View(mtcars)
View(iris)
View(ToothGrowth)
View(PlantGrowth)
View(USArrests)