Data Mining Lab

Experiment 3

1) Consider the table below. Convert the marks into grades using the following criteria.

0-39 = F grade, 40-59 = D grade, 60-69 = C grade, 70-80 = B grade, 81-90 = A grade, 91-100 = O grade.

df4<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
df4

##   Sl.No    Roll.No                  Name  SSC
## 1     1 11PA1A0501       ARIGELA AVINASH 87.3
## 2     2 11PA1A0503    BALADARI KEERTHANA 89.0
## 3     3 11PA1A0504 BAVIRISETTI PRAVALIKA 67.0
## 4     4 11PA1A0505        BODDU SAI BABA 71.0
## 5     5 11PA1A0506    BONDAPALLISRINIVAS 67.0

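Use the cut() function to perform this task. The breaks argument specifies the interval boundaries. Each interval is open on the left and closed on the right, so include.lowest=TRUE is added to make a mark of exactly 0 fall in the F interval.

df4$SSC<-cut(df4$SSC,breaks=c(0,39,59,69,80,90,100),labels=c("F","D","C","B","A","O"),include.lowest=TRUE)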

print(df4)

##   Sl.No    Roll.No                  Name SSC
## 1     1 11PA1A0501       ARIGELA AVINASH   A
## 2     2 11PA1A0503    BALADARI KEERTHANA   A
## 3     3 11PA1A0504 BAVIRISETTI PRAVALIKA   C
## 4     4 11PA1A0505        BODDU SAI BABA   B
## 5     5 11PA1A0506    BONDAPALLISRINIVAS   C
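
Because cut() builds right-closed intervals by default, marks that sit exactly on a break behave differently from their neighbours. The following sketch uses a made-up vector of boundary marks (not rows from sheet4.csv; the variable names are illustrative) to show which grade each one receives, with and without include.lowest.

```r
# Hypothetical boundary marks chosen to sit on or just above the breaks.
marks <- c(0, 39, 39.5, 80, 80.5, 100)
grade_breaks <- c(0, 39, 59, 69, 80, 90, 100)
grade_labels <- c("F", "D", "C", "B", "A", "O")

# Default behaviour: intervals are (0,39], (39,59], ..., (90,100].
default_grades <- cut(marks, breaks = grade_breaks, labels = grade_labels)

# include.lowest = TRUE closes the first interval on the left: [0,39].
inclusive_grades <- cut(marks, breaks = grade_breaks, labels = grade_labels,
                        include.lowest = TRUE)

print(data.frame(marks, default_grades, inclusive_grades))
```

With the default settings the mark 0 becomes NA, because 0 lies outside (0,39]. Fractional marks such as 39.5 or 80.5 fall into the next grade up (D and A respectively), which is one reasonable reading of the gaps in the stated criteria.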

2) Refer to the table below and convert the SSC percentage using

1) MIN MAX Normalization

2) Z Score Normalization

3) Normalization by decimal scaling

m<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
print(m)

##   Sl.No    Roll.No                  Name  SSC
## 1     1 11PA1A0501       ARIGELA AVINASH 87.3
## 2     2 11PA1A0503    BALADARI KEERTHANA 89.0
## 3     3 11PA1A0504 BAVIRISETTI PRAVALIKA 67.0
## 4     4 11PA1A0505        BODDU SAI BABA 71.0
## 5     5 11PA1A0506    BONDAPALLISRINIVAS 67.0

The measurement unit used can affect the data analysis. Many attributes, such as temperature, weight and height, are unit dependent. To avoid dependence on the choice of measurement units, we normalize the data.

1) MIN MAX NORMALIZATION

Let min_A and max_A be the minimum and maximum values of an attribute A. Min-max normalization maps each value v of A linearly to a value v' in a user-defined range from new_min_A to new_max_A. The formula for min-max normalization is

$$v' = \frac{v - \text{min}_A}{\text{max}_A - \text{min}_A}\,(\text{new\_max}_A - \text{new\_min}_A) + \text{new\_min}_A$$

We define a function min_max that takes 3 arguments: 1) the data, 2) the new maximum value (5 by default) and 3) the new minimum value (1 by default).

min_max <- function(x,new_max=5,new_min=1)
{
  # Apply the formula of MIN MAX
  a= (((x-min(x))* (new_max-new_min))/(max(x)-min(x)))+new_min
  return(a)
}
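
As a hand check of the formula coded above, take the SSC values from the table (minimum 67, maximum 89) and the default target range of 1 to 5. The value 87.3 then maps to:

```latex
v' = \frac{(87.3 - 67)\,(5 - 1)}{89 - 67} + 1 = \frac{20.3 \times 4}{22} + 1 \approx 4.6909
```

By the same calculation the minimum, 67, maps to 1 and the maximum, 89, maps to 5.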

Now we pass the column we want to normalize, in this case SSC. We use lapply() to apply the function to that column.

# Read the data file
m<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
# Select the attribute to normalize and apply the function to it.
m["SSC"]<-(lapply(m["SSC"],min_max))

print(m)

##   Sl.No    Roll.No                  Name      SSC
## 1     1 11PA1A0501       ARIGELA AVINASH 4.690909
## 2     2 11PA1A0503    BALADARI KEERTHANA 5.000000
## 3     3 11PA1A0504 BAVIRISETTI PRAVALIKA 1.000000
## 4     4 11PA1A0505        BODDU SAI BABA 1.727273
## 5     5 11PA1A0506    BONDAPALLISRINIVAS 1.000000
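
Since a single data frame column is a plain numeric vector, the function can also be called on it directly. The sketch below assumes a small stand-in data frame (m_demo, a hypothetical name) holding only the SSC values from the table, so it runs without the CSV file; the function body repeats the same formula so the block is self-contained.

```r
# Hypothetical stand-in for sheet4.csv: only the SSC column from the table above.
m_demo <- data.frame(SSC = c(87.3, 89.0, 67.0, 71.0, 67.0))

# Same formula as min_max(), repeated so this block runs on its own.
min_max <- function(x, new_max = 5, new_min = 1) {
  (x - min(x)) * (new_max - new_min) / (max(x) - min(x)) + new_min
}

# Call the function directly on the column; no lapply() is needed.
m_demo$SSC_1_5 <- min_max(m_demo$SSC)

# Named arguments avoid mixing up new_max and new_min.
m_demo$SSC_0_1 <- min_max(m_demo$SSC, new_max = 1, new_min = 0)

print(m_demo)
```

Note that the formula divides by max(x) - min(x), so a column whose values are all equal would produce NaN; a guard for that case is left out here for brevity.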

2) ZSCORE Normalization.

In z-score normalization (or zero-mean normalization), the values for an attribute, A, are normalized based on the mean and standard deviation of A.

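The formula for z-score normalization is

$$v' = \frac{v - \bar{A}}{\sigma_A}$$

where $\bar{A}$ and $\sigma_A$ are the mean and standard deviation of A.

We define a function **zscorenorm**. It calculates the mean and standard deviation of the data x and returns the normalized values.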

zscorenorm<-function(x)
{
  # sd() computes the sample standard deviation (n - 1 denominator).
  a=((x-mean(x))/sd(x))
  return(a)
}

Now we pass the column we want to normalize, in this case SSC. We use lapply() to apply the function to that column.

# Read the data file
m<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
# Select the attribute to normalize and apply the function to it.
m["SSC"]<-(lapply(m["SSC"],zscorenorm))

print(m)

##   Sl.No    Roll.No                  Name        SSC
## 1     1 11PA1A0501       ARIGELA AVINASH  1.0043089
## 2     2 11PA1A0503    BALADARI KEERTHANA  1.1589579
## 3     3 11PA1A0504 BAVIRISETTI PRAVALIKA -0.8423823
## 4     4 11PA1A0505        BODDU SAI BABA -0.4785022
## 5     5 11PA1A0506    BONDAPALLISRINIVAS -0.8423823
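
A quick way to sanity-check a z-score result is to confirm that the normalized values have mean 0 and standard deviation 1. The sketch below does this for the SSC values from the table and also compares the manual formula with base R's scale(); the variable names are illustrative.

```r
# SSC values copied from the table above.
ssc <- c(87.3, 89.0, 67.0, 71.0, 67.0)

# Z-score by hand; sd() uses the sample (n - 1) denominator.
z_manual <- (ssc - mean(ssc)) / sd(ssc)

# Base R's scale() centres and scales the values and returns a one-column matrix.
z_scale <- as.vector(scale(ssc))

print(round(z_manual, 7))
print(all.equal(z_manual, z_scale))
print(c(mean = mean(z_manual), sd = sd(z_manual)))
```

The mean may print as a tiny non-zero number because of floating-point rounding. If a textbook result was computed with the population standard deviation (n denominator), it will differ slightly from what sd() gives.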

3) Normalization by decimal scaling

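Decimal scaling normalizes by moving the decimal point of the values of attribute A. The number of places moved depends on the maximum absolute value of A. For example, if the largest absolute value is 36000, every value is divided by 10^5, so 36000 is converted to 0.36.

The formula for normalization by decimal scaling is

$$v' = \frac{v}{10^j}$$

where j is the smallest integer such that max(|v'|) < 1. The largest SSC value is 89, so j = 2 and the function below divides the data by 100.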

normalize_ds<-function(x)
{
  a=x/100
  return(a)
}

Now we pass the column we want to normalize, in this case SSC. We use lapply() to apply the function to that column.

# Read the data file
m<-read.csv("C:/Users/pradeep/OneDrive/dm,cns,cp and JKC/data mining/dm files 2020 passouts/lab/sheet4.csv")
# Select the attribute to normalize and apply the function to it.
m["SSC"]<-(lapply(m["SSC"],normalize_ds))

print(m)

##   Sl.No    Roll.No                  Name   SSC
## 1     1 11PA1A0501       ARIGELA AVINASH 0.873
## 2     2 11PA1A0503    BALADARI KEERTHANA 0.890
## 3     3 11PA1A0504 BAVIRISETTI PRAVALIKA 0.670
## 4     4 11PA1A0505        BODDU SAI BABA 0.710
## 5     5 11PA1A0506    BONDAPALLISRINIVAS 0.670
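
The function above hard-codes the divisor 100, which suits percentages below 100. A minimal sketch of a more general version follows, assuming the usual definition of decimal scaling (divide by 10^j, where j is the smallest integer that brings every absolute value below 1). The helper name decimal_scale and the second test vector are hypothetical, not part of the lab sheet.

```r
# Hypothetical helper: picks the power of ten from the data itself.
decimal_scale <- function(x) {
  max_abs <- max(abs(x))
  if (max_abs == 0) return(x)        # all zeros: nothing to scale
  j <- floor(log10(max_abs)) + 1     # smallest j with max(|x|) / 10^j < 1
  x / 10^j
}

# SSC values copied from the table above: the maximum is 89, so j = 2.
ssc <- c(87.3, 89.0, 67.0, 71.0, 67.0)
print(decimal_scale(ssc))

# Made-up values with maximum absolute value 36000, so j = 5.
print(decimal_scale(c(36000, 1200, -450)))
```

For the SSC column this reproduces the division by 100 used above, while larger-valued attributes get a larger divisor automatically.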

I Kali Pradeep

March 10, 2019
