1. Introduction:

R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1993.

Why use R for statistical computing and graphics?

It runs on all major platforms (Windows, macOS and Linux).
It is a popular programming language, and its popularity is still growing.
It is free and open-source.
It is used by many large technology companies.
R and its libraries implement a wide variety of statistical and graphical techniques.
It is easily extensible through user-written functions and packages.
The R community is known for its active contributions of packages.

Applications of R programming in the real world

Social Media: Behavior Analysis, Sentiment Analysis
IT: Business Intelligence, Software Development and Machine Learning Products
Finance: Stock Market Modeling and Fraud Detection
Government: Weather Forecasting and Record-Keeping
Research and Academia
E-Commerce
Banking Sector
Health Care
Manufacturing Industry

How to download and install R and RStudio

R can be downloaded for free from https://www.r-project.org/
RStudio is an integrated development environment (IDE) that makes working with R more user-friendly. It is free and open-source, and is available at https://rstudio.com/products/rstudio/download/ (install R first, because RStudio runs on top of an existing R installation).

1.1 R as a calculator

1.1.1 Performing various arithmetic operations: addition (+), subtraction (-), multiplication (*), division (/), integer division (%/%) and exponentiation (^)

25+3

## [1] 28

25-3

## [1] 22

25*3

## [1] 75

25/3   # division

## [1] 8.333333

25%/%3 # integer division: the whole-number part of 25 divided by 3

## [1] 8

2**3   # ** is also accepted as the exponent operator, same as ^

## [1] 8

2^3

## [1] 8
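R also has a remainder (modulo) operator, %%, which pairs naturally with %/%. The short sketch below uses illustrative values. It shows that the quotient and remainder together rebuild the original number, and that ^ is evaluated before a leading minus sign.

```r
# Remainder (modulo) operator: what is left over after integer division
r <- 25 %% 3        # 1, because 25 = 8 * 3 + 1
q <- 25 %/% 3       # 8, the integer quotient

# The quotient and remainder together reconstruct the original number
q * 3 + r           # 25

# Precedence: ^ is evaluated before unary minus, so -2^2 means -(2^2)
-2^2                # -4
(-2)^2              # 4
```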

1.1.2 Built-in functions

Built-in functions are predefined functions that come with R, so they can be used without writing or installing anything.

exp(6)  # e raised to the power 6

## [1] 403.4288

sqrt(36) #square root of 36

## [1] 6

sum(2,3,4,5,6)

## [1] 20

log(64)  # natural logarithm (base e)

## [1] 4.158883

log(10,2) #log 10 to base 2

## [1] 3.321928

log(42,10) # log of 42 to base 10

## [1] 1.623249

log(5,3)  # log of 5 to base 3

## [1] 1.464974
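The base argument of log() follows the usual change-of-base rule, log(x, b) = log(x) / log(b). The sketch below checks this for the example above. It compares the two doubles with all.equal() because floating-point results can differ in the last digit. It also shows the log2() and log10() shortcuts from base R.

```r
# Change-of-base rule: log(x, base) equals log(x) / log(base)
x <- 5
b <- 3
via_base  <- log(x, b)
via_ratio <- log(x) / log(b)
all.equal(via_base, via_ratio)   # TRUE (doubles compared with a tolerance)

# Shortcut functions for the two most common bases
log2(8)      # 3
log10(1000)  # 3
```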

factorial(5) # 5!= 1*2*3*4*5

## [1] 120

abs(-8.5) # absolute value

## [1] 8.5

floor(3.8)  # largest integer less than or equal to 3.8

## [1] 3

ceiling(3.2) # smallest integer greater than or equal to 3.2

## [1] 4
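The difference between floor(), ceiling() and the related trunc() only becomes visible with negative numbers. This sketch uses illustrative values to show the direction each function rounds in. It also covers round(), which rounds to a chosen number of decimal places.

```r
# floor() rounds toward minus infinity, ceiling() toward plus infinity,
# trunc() simply drops the fractional part (rounds toward zero)
floor(-3.8)     # -4
ceiling(-3.2)   # -3
trunc(-3.8)     # -3

# round() rounds to a given number of decimal places
round(3.567, 1) # 3.6
```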

rep(35,times=10) # repeat 35 ten times

##  [1] 35 35 35 35 35 35 35 35 35 35

rep("Happy", times=5)

## [1] "Happy" "Happy" "Happy" "Happy" "Happy"

5:9  # display numbers from 5 to 9

## [1] 5 6 7 8 9

1:100

##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100

seq(5,9) # generates the sequence of numbers from 5 to 9

## [1] 5 6 7 8 9

seq(5,10,0.5) # sequence starting at 5, increasing in steps of 0.5 up to 10

##  [1]  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0

seq(1,50,5) # starts at 1 and steps by 5, stopping before it would exceed 50

##  [1]  1  6 11 16 21 26 31 36 41 46
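seq() can also be driven by the number of values you want instead of a step size. A negative step counts downwards. seq_len(n) is a safer way to write 1:n, because seq_len(0) returns an empty vector whereas 1:0 returns c(1, 0). The values below are illustrative.

```r
# length.out: ask for a fixed number of equally spaced values
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# A negative 'by' counts downwards
seq(10, 1, by = -3)         # 10 7 4 1

# seq_len(n) produces 1, 2, ..., n (and nothing at all when n is 0)
seq_len(4)                  # 1 2 3 4
```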

1.1.3 Relational Operators

Relational operators in R are used to compare values and determine the relationship between them.

2<5

## [1] TRUE

3>9

## [1] FALSE

2+3==6 # == checks 'is equal to' (note the double equals sign)

## [1] FALSE

2!=3 # != checks 'is not equal to'

## [1] TRUE
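Two relational operators not shown above are >= and <=. There is also one common pitfall: decimal fractions are stored approximately, so == can return FALSE for values that look equal. A sketch using illustrative numbers:

```r
# The remaining relational operators
5 >= 5   # TRUE
4 <= 3   # FALSE

# Caution: decimal fractions are stored approximately, so == can surprise
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: equal within a small tolerance
```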

1.2 Assigning a value to a variable

We can assign a value to a variable using the assignment operator <- or the equals sign =. Syntax: variable_name <- value

x <- 48  # x is assigned the value 48
print(x)

## [1] 48

x/5

## [1] 9.6

x*2

## [1] 96

y<-"Happy"  # a variable can also hold text (a character string)
rm(x) #to remove x from memory
rm(y)
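The sketch below shows the assignment forms side by side and checks that rm() really removes objects. The variable names are illustrative. ls() lists what currently exists, and exists() reports whether a name is still defined.

```r
x <- 48          # arrow assignment (the usual R style)
y = 10           # '=' also works at the top level
48 -> z          # rightward assignment is allowed too, but rarely used

ls()             # lists the objects in the current environment
rm(x, y, z)      # rm() can remove several objects at once
exists("x")      # FALSE once x has been removed
```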

1.3 Vectors

In R, a vector is a fundamental data structure that can hold a collection of values of the same data type. Vectors are essential in R and are used extensively in data analysis, statistics, and programming. c() is used to create vectors.

a<-c(20,30,40,45,56) #easiest way to create a vector in R

str(a)  #  it will provide information about the data type of a and its contents

##  num [1:5] 20 30 40 45 56

print(a)

## [1] 20 30 40 45 56

View(a) # opens a spreadsheet-style viewer (in RStudio)

Subsetting (operator [ ])

Subsetting with the [ ] operator extracts specific elements, or subsets of elements, from an object by their position.

a[3] # extract the 3rd element of vector a

## [1] 40

a[5]

## [1] 56

a[c(1,3,5)] #to extract several values

## [1] 20 40 56

a[-2] #to drop second value

## [1] 20 40 45 56

a[-3]

## [1] 20 30 45 56

a[c(-2,-5)]# to drop 2nd and 5th value

## [1] 20 40 45

length(a) # To find number of elements

## [1] 5
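Besides positions, [ ] also accepts a logical condition and keeps only the elements where it is TRUE. The sketch below uses the vector a from above. which() returns the matching positions, and assigning into a[i] changes an element in place. The threshold 35 and the new value 33 are illustrative.

```r
a <- c(20, 30, 40, 45, 56)

# A logical condition produces TRUE/FALSE for every element ...
a > 35              # FALSE FALSE TRUE TRUE TRUE

# ... which can be used inside [ ] to keep only the TRUE positions
a[a > 35]           # 40 45 56

# which() returns the positions (indices) where the condition holds
which(a > 35)       # 3 4 5

# Assigning into a subset changes those elements in place
a[2] <- 33
a                   # 20 33 40 45 56
```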

Find the class of a vector

class(a) # To find the class of a vector

## [1] "numeric"

m <- c(5, 'a', -1, 2)  # mixing numbers and text: every element becomes character
class(m)

## [1] "character"

m<-c(TRUE,F,T,FALSE)  # T and F are shorthand for TRUE and FALSE
class(m)

## [1] "logical"

sapply(m, class)  # display class of all elements

## [1] "logical" "logical" "logical" "logical"
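The "character" result for m above is an example of coercion. A vector holds only one type, so c() converts mixed inputs to the most flexible type present, in the order logical < integer < double ("numeric") < character. The sketch below shows this with illustrative values, along with the explicit as.*() conversion functions.

```r
# Mixed types are coerced to the most flexible one present
class(c(TRUE, 2L))      # "integer"   (TRUE becomes 1L)
class(c(1L, 2.5))       # "numeric"
class(c(2.5, "a"))      # "character"

# Explicit conversion with the as.*() family
as.numeric(c("1.5", "2"))   # 1.5 2.0
as.logical(c(0, 1))         # FALSE TRUE
```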

#Operations on vectors

b<-c(1,2,3,4,5)
a+b

## [1] 21 32 43 49 61

a*b

## [1]  20  60 120 180 280

a-b

## [1] 19 28 37 41 51
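Arithmetic on vectors works element by element. When one operand is shorter, R recycles it. A single number is applied to every element, and a shorter vector is repeated as needed. R gives a warning if the longer length is not a multiple of the shorter one. A sketch using the vector a from above and illustrative values:

```r
a <- c(20, 30, 40, 45, 56)

# A single number is "recycled" across every element
a + 1               # 21 31 41 46 57
a / 10              # 2.0 3.0 4.0 4.5 5.6

# A shorter vector is recycled too: c(10, 20) is reused as 10, 20, 10, 20
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```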

Question 1

Construct a vector with elements -2, -3, -6, 10, 7 and assign it to X, and another vector with elements 11, 23, 14, 52, 16 and assign it to Y.
a) Find the length of X and Y
b) Remove the 2nd element from Y
c) Find the 4th element of X
d) Find X+Y and X*Y
e) Find Y/X and round to 1 decimal place

X<-c(-2,-3,-6,10,7)
Y<-c(11,23,14,52,16)
length(X)

## [1] 5

length(Y)

## [1] 5

Y[-2]

## [1] 11 14 52 16

X[4]

## [1] 10

X+Y

## [1]  9 20  8 62 23

X*Y

## [1] -22 -69 -84 520 112

round(Y/X,1)

## [1] -5.5 -7.7 -2.3  5.2  2.3

data <- c("apple", 3.14, 42, TRUE)  # mixed types: all elements are coerced to character
class(data[2])

## [1] "character"

Question 2

The weights of a group of 5 people before and after a diet plan are given. Find the weight loss and its mean.

before: 78, 72, 78, 79, 105

after: 67, 65, 79, 70, 93

before<-c(78,72,78,79,105)
after<-c(67,65,79,70,93)
wtloss<-before-after
wtloss

## [1] 11  7 -1  9 12

mean(wtloss)

## [1] 7.6

max(wtloss)

## [1] 12

min(wtloss)

## [1] -1
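Comparisons produce TRUE/FALSE values, and sum() and mean() treat TRUE as 1. This makes them a quick way to count how many people actually lost weight and what proportion did. which.max() returns the position of the largest loss. A sketch reusing the data from Question 2:

```r
before <- c(78, 72, 78, 79, 105)
after  <- c(67, 65, 79, 70, 93)
wtloss <- before - after

sum(wtloss > 0)     # 4 people lost weight (TRUE counts as 1)
mean(wtloss > 0)    # 0.8, the proportion who lost weight
which.max(wtloss)   # 5, the position of the largest loss
```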

Practice Questions

Question 1:

Create a vector grades with elements “A”, “B”, “C”, “D”, “F”. Change the third element to “B+”.

Question 2:

Create a vector temperatures with elements 72, 68, 75, 80, 77. Convert the temperatures from Fahrenheit to Celsius using the formula (Fahrenheit - 32) * 5/9. Round the Celsius temperatures to one decimal place.

Question 3:

Create two vectors, vector1 with elements 1 to 5 and vector2 with elements 6 to 10. Calculate the element-wise product of these vectors.

Question 4:

Create a vector names with elements “John”, “Jane”, “Bob”, “Alice”, “Eve”. Extract the first three elements of the vector and store them in a new vector called subset_names.

Question 5:

Create a vector data with the following values: “apple”, 3.14, 42, TRUE. Find the class of each element in the vector and store the results in a new vector called data_classes.

Dataframe

In a dataframe, data is organized into rows and columns. Each column can contain a different data type, but all columns must have the same length (number of rows).

# create a DATA FRAME

Applied <- data.frame(
                  Specialization = c("Bio","Chem","ES","Bio"),
                  Level = c("AD","Btech","Dip","Btech"),
                  GPA = c(3.21,3.63,2.15,2.01))
Applied

##   Specialization Level  GPA
## 1            Bio    AD 3.21
## 2           Chem Btech 3.63
## 3             ES   Dip 2.15
## 4            Bio Btech 2.01

AS2 <- Applied[,c(2,3)]  # subset with the 2nd and 3rd columns (Level and GPA)

# SUBSETTING: create a subset of dataframe 'Applied' containing the
# last and first columns only. Call it 'AS3'

AS3 <- Applied[,c(3,1)]
AS3

##    GPA Specialization
## 1 3.21            Bio
## 2 3.63           Chem
## 3 2.15             ES
## 4 2.01            Bio

# SUBSETTING: create a subset of dataframe 'Applied' containing the
# 2nd column and last three rows only. Call it 'AS4'
AS4 <- Applied[2:4, 2]  # rows 2 to 4 of column 2; a single column is returned as a vector
AS4

## [1] "Btech" "Dip"   "Btech"
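As the output shows, selecting a single column with [rows, col] simplifies the result to a plain vector. Adding drop = FALSE keeps it as a one-column dataframe. Columns can also be selected by name. The sketch below rebuilds Applied so it runs on its own. It assumes R 4.0 or later, where text columns stay character rather than becoming factors. AS4_df is a hypothetical name.

```r
Applied <- data.frame(
  Specialization = c("Bio", "Chem", "ES", "Bio"),
  Level = c("AD", "Btech", "Dip", "Btech"),
  GPA = c(3.21, 3.63, 2.15, 2.01))

# Selecting a single column with [rows, col] simplifies the result to a vector
class(Applied[2:4, 2])                  # "character"

# drop = FALSE keeps the result as a one-column data frame
AS4_df <- Applied[2:4, 2, drop = FALSE]
class(AS4_df)                           # "data.frame"

# Columns can also be selected by name, which is easier to read
Applied[2:4, c("Level", "GPA")]
```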

# Create a dataframe
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(92, 85, 78, 96, 88)
)

# Print the dataframe
print(data)

##      Name Age Score
## 1   Alice  25    92
## 2     Bob  30    85
## 3 Charlie  22    78
## 4   David  35    96
## 5     Eva  28    88

# Access specific columns
names <- data$Name
ages <- data$Age
scores <- data$Score

# Calculate summary statistics
mean_age <- mean(ages)
mean_score <- mean(scores)

cat("Mean Age: ", mean_age, "\n")

## Mean Age:  28

cat("Mean Score: ", mean_score, "\n")

## Mean Score:  87.8

# Filter data based on a condition (e.g., age greater than 25)
filtered_data <- data[data$Age > 25, ]

# Print the filtered dataframe
print(filtered_data)

##    Name Age Score
## 2   Bob  30    85
## 4 David  35    96
## 5   Eva  28    88

# Sort the dataframe by Age in descending order
sorted_data <- data[order(data$Age, decreasing = TRUE), ]

# Print the sorted dataframe
print(sorted_data)

##      Name Age Score
## 4   David  35    96
## 2     Bob  30    85
## 5     Eva  28    88
## 1   Alice  25    92
## 3 Charlie  22    78
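The base R function subset() expresses the same filtering in a more readable way and can select columns at the same time. Several conditions can be combined with & (and) or | (or). The sketch below recreates the data dataframe from above. The name older and the Score threshold of 86 are illustrative.

```r
data <- data.frame(
  Name  = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age   = c(25, 30, 22, 35, 28),
  Score = c(92, 85, 78, 96, 88)
)

# subset(): rows where Age > 25, keeping only the Name and Score columns
older <- subset(data, Age > 25, select = c(Name, Score))
older

# Combine conditions with & (and) or | (or) inside [ ]
data[data$Age > 25 & data$Score > 86, ]
```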

Practice Question: Dataframe

Create a data frame named ‘student_data’ with the following columns: ‘Name’ (character), ‘Age’ (numeric), ‘Gender’ (factor), and ‘Score’ (numeric). Add at least 5 rows of data to this data frame.

student_data=data.frame(
  Name = c('Arun','gopi','nivya','manu','unni'),
  Age = c(25,30,15,25,23),
  Gender = factor(c('Male','Male','female','Male','Male')),  # factor, as the question asks
  Score = c(50,25,76,83,11))

student_data

##    Name Age Gender Score
## 1  Arun  25   Male    50
## 2  gopi  30   Male    25
## 3 nivya  15 female    76
## 4  manu  25   Male    83
## 5  unni  23   Male    11

Data Frame Subsetting:

From the ‘student_data’ data frame, extract only the ‘Name’ and ‘Score’ columns into a new data frame named ‘name_and_score.’ Create a subset of ‘student_data’ that includes only male students with ages greater than 20.

name_and_score= student_data[,c(1,4)]

name_and_score

##    Name Score
## 1  Arun    50
## 2  gopi    25
## 3 nivya    76
## 4  manu    83
## 5  unni    11
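The second part of the exercise asks for a subset of male students older than 20, and it is not answered above. One possible answer follows. It rebuilds student_data so the block runs on its own and combines the two conditions with &. The name male_over_20 is hypothetical. The expected result is Arun, gopi, manu and unni.

```r
student_data <- data.frame(
  Name   = c('Arun', 'gopi', 'nivya', 'manu', 'unni'),
  Age    = c(25, 30, 15, 25, 23),
  Gender = factor(c('Male', 'Male', 'female', 'Male', 'Male')),
  Score  = c(50, 25, 76, 83, 11))

# Keep rows where both conditions are TRUE
male_over_20 <- student_data[student_data$Gender == "Male" & student_data$Age > 20, ]
male_over_20

# Equivalent, using subset()
subset(student_data, Gender == "Male" & Age > 20)
```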

List

In R, you can create lists using the list() function. A list is a versatile data structure that can hold various data types, including vectors, matrices, data frames, and even other lists. Here’s how you can create and work with lists in R:

# List
shoplist <- list(
    c("nails","hammer","woods"),
    c("veg"),
    c("notebooks"),
    c(50,10,5),
    Applied
)

SUBSETTING: create a vector containing the first element of ‘shoplist’. Call this ‘list1’

list1 <- shoplist[[1]]
list1

## [1] "nails"  "hammer" "woods"

SUBSETTING: create a vector containing the 2nd element of the fourth list in ‘shoplist’. Call this ‘list2’

list2 <- shoplist[[4]][[2]]
list2

## [1] 10
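List elements can be given names, which makes the [ ] / [[ ]] distinction easier to see. Single brackets return a smaller list, while double brackets (or $ with a name) return the element itself. The list shop and its element names tools and prices below are hypothetical.

```r
# A named list: each element gets a label
shop <- list(tools  = c("nails", "hammer", "woods"),
             prices = c(50, 10, 5))

shop["tools"]        # single brackets return a smaller list (still a list)
shop[["tools"]]      # double brackets return the element itself (a character vector)
shop$prices[2]       # $ is shorthand for [[ ]] with a name; gives 10
```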

Built-in dataframes

cars <- mtcars  # note: this masks R's own built-in 'cars' dataset

View(cars) 

#OR 

View(mtcars) 

# show the first 6 rows 

head(cars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# show the first 3 rows 

head(cars,n=3)

##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

# show the last 6 rows 

tail(cars)

##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

# show the last 10 rows 

tail(cars,n=10)

##                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## AMC Javelin      15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28       13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9        27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2    26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa     30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L   15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino     19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora    15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E       21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
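Besides head() and tail(), a few base R functions give a quick overview of a built-in dataframe without opening a viewer. dim() returns rows and columns, str() shows each column's type, and summary() gives basic statistics. A sketch using mtcars and iris:

```r
dim(mtcars)          # 32 11: 32 rows (cars) and 11 columns (variables)
nrow(iris)           # 150
names(iris)          # column names
str(mtcars)          # compact overview: type and first values of each column
summary(mtcars$mpg)  # min, quartiles, mean and max of one column

# Built-in datasets are documented in the help system, e.g. ?mtcars
```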

# To view built-in dataframes

View(mtcars) 

View(iris) 

View(ToothGrowth) 

View(PlantGrowth) 

View(USArrests)

R Programming stat 3101 Practical 1.1

Stat team

2023-09-16