by Prateek Sarangi, Mon, Mar 16 2020
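library(dplyr)  # assumed source of between(), which is used below
# Min-max scaling: rescale a numeric vector to the range 0 to 1.
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}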
heart <- read.csv("~/HeartDisease/heart.csv")
head(heart)
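As a quick illustration of what the normalize function defined above does, here is a minimal sketch on made-up numbers. The vector `ages` is hypothetical and is not taken from heart.csv. It also shows one edge case worth knowing about: a column whose values are all equal divides 0 by 0.

```r
# Min-max scaling, same formula as the normalize function in this post.
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical ages, used only for illustration.
ages <- c(29, 41, 53, 77)
scaled <- normalize(ages)
print(scaled)  # values: 0, 0.25, 0.5, 1

# A constant column has max(x) == min(x), so the formula divides 0 by 0.
constant <- normalize(c(5, 5, 5))
print(constant)  # NaN NaN NaN
```

The smallest input always maps to 0 and the largest to 1, which is why the grouping steps that follow can use fixed cut points between 0 and 1.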
The age column is rescaled with the normalize function defined above and then categorized into four groups.
After normalization the values range from 0 to 1.
They are grouped as follows:
- 0.00 to 0.25 as Group 1 giving it value 0.1
- 0.25 to 0.50 as Group 2 giving it value 0.4
- 0.50 to 0.75 as Group 3 giving it value 0.6
- 0.75 to 1.00 as Group 4 giving it value 0.9
dfNorm <- as.data.frame(lapply(heart["age"], normalize))
heart["age"] <- dfNorm
heart["age"] <- as.data.frame(lapply(heart["age"], function(x){replace(x,between(x, 0.0, 0.25), 0.1)}))
heart["age"] <- as.data.frame(lapply(heart["age"], function(x){replace(x,between(x, 0.25, 0.50), 0.4)}))
heart["age"] <- as.data.frame(lapply(heart["age"], function(x){replace(x,between(x, 0.5, 0.75), 0.6)}))
heart["age"] <- as.data.frame(lapply(heart["age"], function(x){replace(x,between(x, 0.75, 1), 0.9)}))
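The four age groups described above can also be assigned in a single pass. The sketch below is an alternative to the chained replace calls, not the method used in this post. It relies on base R's `findInterval`; the helper name `bin_values` and the sample vector `age_norm` are hypothetical.

```r
# Hypothetical helper: map values in [0, 1] to one representative value per bin.
# breaks are the inner cut points; values needs length(breaks) + 1 entries.
bin_values <- function(x, breaks, values) {
  stopifnot(length(values) == length(breaks) + 1)
  # left.open = TRUE makes each bin (lower, upper], so a point that sits
  # exactly on a cut point is placed in the lower group.
  values[findInterval(x, breaks, left.open = TRUE) + 1]
}

# Made-up normalized ages, including the cut points themselves.
age_norm <- c(0.00, 0.10, 0.25, 0.30, 0.55, 0.75, 0.80, 1.00)
age_grouped <- bin_values(age_norm,
                          breaks = c(0.25, 0.50, 0.75),
                          values = c(0.1, 0.4, 0.6, 0.9))
print(age_grouped)  # values: 0.1 0.1 0.1 0.4 0.6 0.6 0.9 0.9
```

Because every value is compared against the untouched input, the result does not depend on the order in which the groups are listed.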
The trestbps column is rescaled with the normalize function defined above and then categorized into three groups.
After normalization the values range from 0 to 1.
They are grouped as follows:
- 0.00 to 0.33 as Group 1 giving it value 0.2
- 0.33 to 0.67 as Group 2 giving it value 0.6
- 0.67 to 1.00 as Group 3 giving it value 1.0
dfNorm <- as.data.frame(lapply(heart["trestbps"], normalize))
heart["trestbps"] <- dfNorm
heart["trestbps"] <- as.data.frame(lapply(heart["trestbps"], function(x){replace(x, between(x, 0.0, 0.33), 0.2)}))
heart["trestbps"] <- as.data.frame(lapply(heart["trestbps"], function(x){replace(x, between(x, 0.33, 0.67), 0.6)}))
heart["trestbps"] <- as.data.frame(lapply(heart["trestbps"], function(x){replace(x, between(x, 0.67, 1), 1)}))
The chol column is rescaled with the normalize function defined above and then categorized into five groups.
After normalization the values range from 0 to 1.
They are grouped as follows:
- 0.00 to 0.20 as Group 1 giving it value 0.1
- 0.20 to 0.40 as Group 2 giving it value 0.3
- 0.40 to 0.60 as Group 3 giving it value 0.5
- 0.60 to 0.80 as Group 4 giving it value 0.7
- 0.80 to 1.00 as Group 5 giving it value 0.9
dfNorm <- as.data.frame(lapply(heart["chol"], normalize))
heart["chol"] <- dfNorm
heart["chol"] <- as.data.frame(lapply(heart["chol"], function(x){replace(x, between(x, 0.0, 0.2), 0.1)}))
heart["chol"] <- as.data.frame(lapply(heart["chol"], function(x){replace(x, between(x, 0.2, 0.4), 0.3)}))
heart["chol"] <- as.data.frame(lapply(heart["chol"], function(x){replace(x, between(x, 0.4, 0.6), 0.5)}))
heart["chol"] <- as.data.frame(lapply(heart["chol"], function(x){replace(x, between(x, 0.6, 0.8), 0.7)}))
heart["chol"] <- as.data.frame(lapply(heart["chol"], function(x){replace(x, between(x, 0.8, 1), 0.9)}))
The chest pain type (cp) codes are replaced with the following values.
- Value 0: Typical angina (provided value 0.1).
- Value 1: Atypical angina (provided value 0.6).
- Value 2: Non-anginal pain (provided value 0.9).
- Value 3: Asymptomatic (provided value 0.01).
heart["cp"] <- as.data.frame(lapply(heart["cp"], function(x){replace(x, x == 0, 0.1)}))
heart["cp"] <- as.data.frame(lapply(heart["cp"], function(x){replace(x, x == 1, 0.6)}))
heart["cp"] <- as.data.frame(lapply(heart["cp"], function(x){replace(x, x == 2, 0.9)}))
heart["cp"] <- as.data.frame(lapply(heart["cp"], function(x){replace(x, x == 3, 0.01)}))
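The same kind of code-to-value replacement can be written as a single lookup. This is a minimal sketch under the assumption that the codes are exactly 0 to 3; the vector `cp` below is made up, and the names `cp_values` and `cp_recoded` are hypothetical.

```r
# Replacement values for chest pain codes 0, 1, 2 and 3, as listed above.
cp_values <- c("0" = 0.1, "1" = 0.6, "2" = 0.9, "3" = 0.01)

# Made-up chest pain codes, used only for illustration.
cp <- c(0, 3, 1, 2, 0)

# Index the lookup vector by name; unname() drops the code labels again.
cp_recoded <- unname(cp_values[as.character(cp)])
print(cp_recoded)  # values: 0.1 0.01 0.6 0.9 0.1

# A code that is missing from the lookup becomes NA instead of passing through.
unknown <- unname(cp_values[as.character(7)])
print(unknown)  # NA
```

One design difference to keep in mind: with replace, an unexpected code is left unchanged, whereas the lookup turns it into NA, which makes unexpected codes easier to spot.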
# Maximum heart rate achieved (thalach) is only rescaled to 0-1, not grouped.
dfNorm <- as.data.frame(lapply(heart["thalach"], normalize))
heart["thalach"] <- dfNorm
The thal column is rescaled with the normalize function defined above and then categorized into four groups.
Thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
After normalization the values range from 0 to 1.
They are grouped as follows:
- 0.00 to 0.25 as Group 1 giving it value 0.5
- 0.25 to 0.50 as Group 2 giving it value 0.6
- 0.50 to 0.75 as Group 3 giving it value 0.9
- 0.75 to 1.00 as Group 4 giving it value 0.1
dfNorm <- as.data.frame(lapply(heart["thal"], normalize))
# Assign all four groups in one pass over the normalized values. Chained
# replace() calls would re-match already replaced values (0.5, 0.6, 0.9),
# because each of them lies inside a later interval.
thalGroup <- findInterval(dfNorm[["thal"]], c(0.25, 0.50, 0.75), left.open = TRUE) + 1
heart["thal"] <- c(0.5, 0.6, 0.9, 0.1)[thalGroup]
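The group values listed above overlap with later intervals (0.5 lies in 0.25 to 0.50, 0.6 in 0.50 to 0.75, and 0.9 in 0.75 to 1.00), so the way the replacement is carried out matters. The sketch below compares chained replacement with a single pass on made-up normalized values. The helper `in_range` is a hypothetical base R stand-in for an inclusive range test.

```r
# Inclusive range test written in base R (hypothetical helper).
in_range <- function(x, lo, hi) x >= lo & x <= hi

# Made-up normalized codes, one per intended group.
thal_norm <- c(0, 1/3, 2/3, 1)

# Chained replacement: each step sees the output of the previous one.
chained <- thal_norm
chained <- replace(chained, in_range(chained, 0.00, 0.25), 0.5)
chained <- replace(chained, in_range(chained, 0.25, 0.50), 0.6)
chained <- replace(chained, in_range(chained, 0.50, 0.75), 0.9)
chained <- replace(chained, in_range(chained, 0.75, 1.00), 0.1)
print(chained)  # values: 0.1 0.1 0.1 0.1

# Single pass: every comparison is made against the untouched input.
group <- findInterval(thal_norm, c(0.25, 0.50, 0.75), left.open = TRUE) + 1
single <- c(0.5, 0.6, 0.9, 0.1)[group]
print(single)  # values: 0.5 0.6 0.9 0.1
```

In the chained version every element ends up as 0.1, because each replacement value is picked up again by the next interval. The single pass keeps the four groups distinct.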
The resting ECG (restecg) codes are replaced with the following values.
- Value 0: Normal (provided value 0.3).
- Value 1: Having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV) (provided value 0.9).
- Value 2: Showing probable or definite left ventricular hypertrophy by Estes' criteria (provided value 0.1).
heart["restecg"] <- as.data.frame(lapply(heart["restecg"], function(x){replace(x, x == 0, 0.3)}))
heart["restecg"] <- as.data.frame(lapply(heart["restecg"], function(x){replace(x, x == 1, 0.9)}))
heart["restecg"] <- as.data.frame(lapply(heart["restecg"], function(x){replace(x, x == 2, 0.1)}))
The slope of the peak exercise ST segment (slope) is replaced with the following values.
- Value 0: Upsloping (provided value 0.01).
- Value 1: Flat (provided value 0.2).
- Value 2: Downsloping (provided value 0.9).
heart["slope"] <- as.data.frame(lapply(heart["slope"], function(x){replace(x, x == 0, 0.01)}))
heart["slope"] <- as.data.frame(lapply(heart["slope"], function(x){replace(x, x == 1, 0.2)}))
heart["slope"] <- as.data.frame(lapply(heart["slope"], function(x){replace(x, x == 2, 0.9)}))
The number of major vessels colored (ca) is replaced with the following values.
- Value 0: No vessels colored (provided value 0.9).
- Value 1: One vessel colored (provided value 0.6).
- Value 2: Two vessels colored (provided value 0.45).
- Value 3: Three vessels colored (provided value 0.3).
- Value 4: Four vessels colored (provided value 0.1).
heart["ca"] <- as.data.frame(lapply(heart["ca"], function(x){replace(x, x == 0, 0.9)}))
heart["ca"] <- as.data.frame(lapply(heart["ca"], function(x){replace(x, x == 1, 0.6)}))
heart["ca"] <- as.data.frame(lapply(heart["ca"], function(x){replace(x, x == 2, 0.45)}))
heart["ca"] <- as.data.frame(lapply(heart["ca"], function(x){replace(x, x == 3, 0.3)}))
heart["ca"] <- as.data.frame(lapply(heart["ca"], function(x){replace(x, x == 4, 0.1)}))
The fasting blood sugar (fbs) codes are replaced with the following values.
- Value 0: Blood sugar ≤ 120 mg/dl (provided value 0.9).
- Value 1: Blood sugar > 120 mg/dl (provided value 0.1).
heart["fbs"] <- as.data.frame(lapply(heart["fbs"], function(x){replace(x, x == 0, 0.9)}))
heart["fbs"] <- as.data.frame(lapply(heart["fbs"], function(x){replace(x, x == 1, 0.1)}))
The sex codes are replaced with the following values.
- Value 1: Male (provided value 0.9).
- Value 0: Female (provided value 0.1).
heart["sex"] <- as.data.frame(lapply(heart["sex"], function(x){replace(x, x == 0, 0.1)}))
heart["sex"] <- as.data.frame(lapply(heart["sex"], function(x){replace(x, x == 1, 0.9)}))
The exercise-induced angina (exang) codes are replaced with the following values.
- Value 0: No (provided value 0.9).
- Value 1: Yes (provided value 0.1).
heart["exang"] <- as.data.frame(lapply(heart["exang"], function(x){replace(x, x == 0, 0.9)}))
heart["exang"] <- as.data.frame(lapply(heart["exang"], function(x){replace(x, x == 1, 0.1)}))
We use both a randomly sampled and a sequentially chosen 75% of the data as the training set, and the remaining 25% as the test set.
train_ind_rand gives the indices of the rows that are used as the training sample.
trainrand gives the randomly chosen training dataset.
testrand gives the randomly chosen test dataset.
trainseq gives the sequentially chosen training dataset.
testseq gives the sequentially chosen test dataset.
smp_size <- floor(0.75 * nrow(heart))
train_ind_rand <- sample(seq_len(nrow(heart)), size = smp_size)
trainrand <- heart[train_ind_rand, ]
testrand <- heart[-train_ind_rand, ]
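# Use smp_size instead of hard-coded row numbers so that no row (previously
# row 227) ends up in both the training and the test set.
trainseq <- heart[seq_len(smp_size), ]
testseq <- heart[(smp_size + 1):nrow(heart), ]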
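The 75/25 split described above can be checked on a stand-in data frame. This is a minimal sketch, not the post's own code: the data frame `toy`, its columns, and the seed value 42 are assumptions made for illustration. Only the row count of 303 and the 75% share come from the text above. Fixing the seed with `set.seed` makes the random split repeatable between runs.

```r
set.seed(42)  # assumed seed, only so the random split can be reproduced

# Stand-in data frame with 303 rows (hypothetical; not the real heart data).
toy <- data.frame(id = seq_len(303),
                  target = rep(c(0, 1), length.out = 303))

smp_size <- floor(0.75 * nrow(toy))  # 227 rows for training
train_idx <- sample(seq_len(nrow(toy)), size = smp_size)

# Random split: sampled rows for training, all remaining rows for testing.
train_rand <- toy[train_idx, ]
test_rand <- toy[-train_idx, ]

# Sequential split: first smp_size rows for training, the rest for testing.
train_seq <- toy[seq_len(smp_size), ]
test_seq <- toy[(smp_size + 1):nrow(toy), ]

# Sanity checks: sizes add up and no id appears on both sides of a split.
stopifnot(nrow(train_rand) + nrow(test_rand) == nrow(toy))
stopifnot(length(intersect(train_rand$id, test_rand$id)) == 0)
stopifnot(length(intersect(train_seq$id, test_seq$id)) == 0)
```

A sequential split only represents the data well if the rows are not ordered by the outcome, so it is worth comparing the class balance of the two variants before training on them.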
write.csv(heart, "~/HeartDisease/heart1.csv", row.names = FALSE)
write.csv(trainrand, "~/HeartDisease/trainrand.csv", row.names = FALSE)
write.csv(testrand, "~/HeartDisease/testrand.csv", row.names = FALSE)
write.csv(trainseq, "~/HeartDisease/trainseq.csv", row.names = FALSE)
write.csv(testseq, "~/HeartDisease/testseq.csv", row.names = FALSE)