Grading scheme: 0-39 = F grade, 40-59 = D grade, 60-69 = C grade, 70-80 = B grade, 81-90 = A grade, 91-100 = O grade.
df4<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
df4
## Sl.No Roll.No Name SSC
## 1 1 11PA1A0501 ARIGELA AVINASH 87.3
## 2 2 11PA1A0503 BALADARI KEERTHANA 89.0
## 3 3 11PA1A0504 BAVIRISETTI PRAVALIKA 67.0
## 4 4 11PA1A0505 BODDU SAI BABA 71.0
## 5 5 11PA1A0506 BONDAPALLISRINIVAS 67.0
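Use the cut() function for this task. The breaks argument gives the interval boundaries, and labels gives the grade for each interval. By default cut() builds right-closed intervals such as (39,59], so include.lowest=TRUE is added to place a mark of exactly 0 in the first interval (grade F) instead of returning NA.
df4$SSC<-cut(df4$SSC,breaks=c(0,39,59,69,80,90,100),labels=c("F","D","C","B","A","O"),include.lowest=TRUE)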
print(df4)
## Sl.No Roll.No Name SSC
## 1 1 11PA1A0501 ARIGELA AVINASH A
## 2 2 11PA1A0503 BALADARI KEERTHANA A
## 3 3 11PA1A0504 BAVIRISETTI PRAVALIKA C
## 4 4 11PA1A0505 BODDU SAI BABA B
## 5 5 11PA1A0506 BONDAPALLISRINIVAS C
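The boundary behaviour of cut() is easy to get wrong, so here is a minimal sketch on a hypothetical vector of marks chosen to sit on or near the interval edges. It uses the same breaks and labels as the grading code above, with include.lowest = TRUE so that a mark of 0 is graded F. Without that argument, 0 lies outside the first interval (0,39] and becomes NA.

```r
# Hypothetical marks that sit on or near the interval boundaries
marks <- c(0, 39, 39.5, 59, 60, 80, 80.5, 90, 100)

# Intervals are right-closed: (0,39], (39,59], ..., (90,100];
# include.lowest = TRUE also closes the left end of the first one, giving [0,39]
grades <- cut(marks,
              breaks = c(0, 39, 59, 69, 80, 90, 100),
              labels = c("F", "D", "C", "B", "A", "O"),
              include.lowest = TRUE)
print(data.frame(marks, grades))

# Without include.lowest, a mark of exactly 0 falls in no interval, so the result is NA
no_lowest <- cut(0, breaks = c(0, 39, 59, 69, 80, 90, 100),
                 labels = c("F", "D", "C", "B", "A", "O"))
print(no_lowest)
```

Note that a fractional mark such as 39.5 is graded D, because it lies in (39,59]. If marks can be fractional, it is worth deciding explicitly whether that is the intended behaviour.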
m<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
print(m)
## Sl.No Roll.No Name SSC
## 1 1 11PA1A0501 ARIGELA AVINASH 87.3
## 2 2 11PA1A0503 BALADARI KEERTHANA 89.0
## 3 3 11PA1A0504 BAVIRISETTI PRAVALIKA 67.0
## 4 4 11PA1A0505 BODDU SAI BABA 71.0
## 5 5 11PA1A0506 BONDAPALLISRINIVAS 67.0
The measurement unit used can affect data analysis. Many attributes, such as temperature, weight and height, are unit dependent. To avoid dependence on the choice of measurement units, we normalize the data.
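Let min_A and max_A be the minimum and maximum values of an attribute A. Min-max normalization maps each value v of A to a value v' in the user-defined range [new_min_A, new_max_A]. The formula for min-max normalization is

$$v' = \frac{v - \min_A}{\max_A - \min_A}\,(\text{new\_max}_A - \text{new\_min}_A) + \text{new\_min}_A$$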
We define a function min_max that takes three arguments: 1) the data x, 2) the new maximum value (5 by default) and 3) the new minimum value (1 by default).
min_max <- function(x,new_max=5,new_min=1)
{
# Apply the formula of MIN MAX
a= (((x-min(x))* (new_max-new_min))/(max(x)-min(x)))+new_min
return(a)
}
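As a quick check of the formula, the sketch below applies the same min_max logic directly to the five SSC values from the data frame printed earlier, typed in by hand so that no CSV file is needed. For example, 87.3 maps to (87.3 - 67) / (89 - 67) * (5 - 1) + 1, which is about 4.690909, so the result should match the SSC column printed after lapply() further down.

```r
# Same formula as min_max above: rescale x linearly into [new_min, new_max]
min_max <- function(x, new_max = 5, new_min = 1) {
  ((x - min(x)) * (new_max - new_min)) / (max(x) - min(x)) + new_min
}

# SSC values copied from the printed data frame
ssc <- c(87.3, 89.0, 67.0, 71.0, 67.0)
scaled <- min_max(ssc)
print(round(scaled, 6))
```

If every value of the attribute is the same, max(x) - min(x) is 0 and the formula returns NaN, so a constant column needs separate handling.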
Next we pass the attribute we want to normalize (SSC in this case) and use lapply() to apply the function to it.
# Read the data file
m<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
# Select the attribute(s) to normalize and apply the function to each one
m["SSC"]<-(lapply(m["SSC"],min_max))
print(m)
## Sl.No Roll.No Name SSC
## 1 1 11PA1A0501 ARIGELA AVINASH 4.690909
## 2 2 11PA1A0503 BALADARI KEERTHANA 5.000000
## 3 3 11PA1A0504 BAVIRISETTI PRAVALIKA 1.000000
## 4 4 11PA1A0505 BODDU SAI BABA 1.727273
## 5 5 11PA1A0506 BONDAPALLISRINIVAS 1.000000
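The same lapply() pattern extends to several attributes at once, because m[cols] is itself a data frame, i.e. a list of columns. The sketch below uses a small hypothetical data frame with an invented Inter column (not part of sheet4.csv) purely for illustration. For a single column, m$SSC <- min_max(m$SSC) gives the same result without lapply().

```r
min_max <- function(x, new_max = 5, new_min = 1) {
  ((x - min(x)) * (new_max - new_min)) / (max(x) - min(x)) + new_min
}

# Hypothetical data frame: three SSC values plus an invented Inter column
m <- data.frame(Name  = c("S1", "S2", "S3"),
                SSC   = c(87.3, 89.0, 67.0),
                Inter = c(900, 750, 600))

cols <- c("SSC", "Inter")
# lapply() returns a list with one normalized vector per column;
# assigning it back to m[cols] replaces those columns and leaves Name untouched
m[cols] <- lapply(m[cols], min_max)
print(m)
```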
In z-score normalization (or zero-mean normalization), the values for an attribute, A, are normalized based on the mean and standard deviation of A.
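The formula for z-score normalization is

$$v' = \frac{v - \bar{A}}{\sigma_A}$$

where $\bar{A}$ is the mean and $\sigma_A$ is the standard deviation of A.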
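We define a function zscorenorm that calculates the mean and standard deviation of the data x and returns the standardized values. Note that sd() computes the sample standard deviation, with denominator n - 1.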
zscorenorm<-function(x)
{
# Subtract the mean and divide by the standard deviation (sd)
a=((x-mean(x))/sd(x))
return(a)
}
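Base R's scale() performs the same centring and scaling (with centring, it also divides by the sample standard deviation), so it makes a convenient cross-check for zscorenorm. The sketch below, again using the five SSC values typed in by hand, checks that the two agree and that the standardized values have mean 0 and standard deviation 1.

```r
zscorenorm <- function(x) {
  (x - mean(x)) / sd(x)  # sd() uses the n - 1 denominator
}

ssc <- c(87.3, 89.0, 67.0, 71.0, 67.0)
z <- zscorenorm(ssc)

# scale() returns a one-column matrix with attributes; as.vector() drops them
z_builtin <- as.vector(scale(ssc))
print(all.equal(z, z_builtin))

# Standardized values have (numerically) zero mean and unit standard deviation
print(round(c(mean = mean(z), sd = sd(z)), 10))
```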
Next we pass the attribute we want to normalize (SSC in this case) and use lapply() to apply the function to it.
# Read the data file
m<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
# Select the attribute(s) to normalize and apply the function to each one
m["SSC"]<-(lapply(m["SSC"],zscorenorm))
print(m)
## Sl.No Roll.No Name SSC
## 1 1 11PA1A0501 ARIGELA AVINASH 1.0043089
## 2 2 11PA1A0503 BALADARI KEERTHANA 1.1589579
## 3 3 11PA1A0504 BAVIRISETTI PRAVALIKA -0.8423823
## 4 4 11PA1A0505 BODDU SAI BABA -0.4785022
## 5 5 11PA1A0506 BONDAPALLISRINIVAS -0.8423823
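Decimal scaling normalizes by moving the decimal point of the values of attribute A. The number of places moved depends on the maximum absolute value of A. For example, if the largest absolute value of A is 36000, the decimal point is moved five places and 36000 becomes 0.36.
The formula for normalization by decimal scaling is

$$v' = \frac{v}{10^j}$$

where $j$ is the smallest integer such that $\max(|v'|) < 1$.
In this function we divide the data by 100, i.e. we fix $j = 2$. That is correct for this data, because the largest SSC value, 89.0, becomes 0.89.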
normalize_ds<-function(x)
{
a=x/100
return(a)
}
Next we pass the attribute we want to normalize (SSC in this case) and use lapply() to apply the function to it.
# Read the data file
m<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
# Select the attribute(s) to normalize and apply the function to each one
m["SSC"]<-(lapply(m["SSC"],normalize_ds))
print(m)
## Sl.No Roll.No Name SSC
## 1 1 11PA1A0501 ARIGELA AVINASH 0.873
## 2 2 11PA1A0503 BALADARI KEERTHANA 0.890
## 3 3 11PA1A0504 BAVIRISETTI PRAVALIKA 0.670
## 4 4 11PA1A0505 BODDU SAI BABA 0.710
## 5 5 11PA1A0506 BONDAPALLISRINIVAS 0.670
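normalize_ds hard-codes the divisor 100. Below is a minimal sketch of a variant that chooses j from the data, following the decimal-scaling definition (smallest j with max(|v'|) < 1). The name decimal_scale is hypothetical, and the sketch assumes a numeric vector without NA values; it only searches non-negative j, so values already below 1 in absolute value are returned unchanged.

```r
decimal_scale <- function(x) {
  max_abs <- max(abs(x))
  j <- 0
  # Move the decimal point one more place until every |v'| is below 1
  while (max_abs / 10^j >= 1) {
    j <- j + 1
  }
  x / 10^j
}

ssc <- c(87.3, 89.0, 67.0, 71.0, 67.0)
print(decimal_scale(ssc))                   # j = 2 for this data
print(decimal_scale(c(36000, -1200, 450)))  # j = 5
```

Under this definition a perfect score of 100 needs j = 3 (giving 0.1), whereas normalize_ds would map it to 1.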