3. A few sentences to explain the dataset: a) Identify numerical
columns. b) Identify categorical columns. c) Identify the target variable
(normally the last column is the target variable).
print("The dataset contains information on 918 individuals with 12 variables: Age, Sex, ChestPainType, RestingBP, Cholesterol, FastingBS, RestingECG, MaxHR, ExerciseAngina, Oldpeak, ST_Slope, and HeartDisease. The individuals' ages range from 28 to 77 years. There are 5 numerical features (Age, RestingBP, Cholesterol, MaxHR, and Oldpeak), while the others are categorical (Sex, ChestPainType, FastingBS, RestingECG, ExerciseAngina, ST_Slope, HeartDisease). The last column, 'HeartDisease', is the target variable. The dataset provides an overview of various health-related attributes for the individuals studied.")
[1] "The dataset contains information on 918 individuals with 12 variables: Age, Sex, ChestPainType, RestingBP, Cholesterol, FastingBS, RestingECG, MaxHR, ExerciseAngina, Oldpeak, ST_Slope, and HeartDisease. The individuals' ages range from 28 to 77 years. There are 5 numerical features (Age, RestingBP, Cholesterol, MaxHR, and Oldpeak), while the others are categorical (Sex, ChestPainType, FastingBS, RestingECG, ExerciseAngina, ST_Slope, HeartDisease). The last column, 'HeartDisease', is the target variable. The dataset provides an overview of various health-related attributes for the individuals studied."
numeric_columns = heart[ ,c(1,4,5,8,10)]
numeric_columns
categorical_columns = heart[ ,c(2,3,6,7,9,11,12)]
categorical_columns
target_variable = heart[ ,12]
target_variable
[1] 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0
[49] 0 1 1 1 0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 1 0 1
[97] 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 1 1 1 1 1 0
[145] 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 1 0 1 0 0
[193] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 1 1
[241] 0 1 1 0 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0
[289] 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1
[337] 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[385] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 0
[433] 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1
[481] 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 0 0 1 0
[529] 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 0 1 1 0 1 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1
[577] 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 1 1 0 1 0 0 0 1 1 1
[625] 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 1 1 1 1 1 0 0 0 0 0 1
[673] 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 1 1 1 1 1 0 1 0 0 0 1 0 1 1
[721] 1 0 1 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 0
[769] 0 0 0 1 0 0 1 1 1 0 1 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0
[817] 1 1 1 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 1 1 0 1
[865] 1 1 0 1 0 0 0 0 1 1 0 0 1 1 0 1 0 0 0 0 1 0 0 1 1 1 0 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 0 1 1 1 0 1
[913] 1 1 1 1 1 0
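The column positions used above are typed in by hand. As a hedged alternative, the column types can be checked programmatically. The sketch below runs on a small made-up data frame (`toy`, not the real `heart.csv`) that reuses a few of the column names. It assumes that 0/1 columns such as FastingBS and HeartDisease should count as categorical even though R stores them as numbers.

```r
# Toy stand-in for the heart data (values are invented for illustration only).
toy <- data.frame(
  Age          = c(40, 49, 37, 48),
  Sex          = c("M", "F", "M", "F"),
  RestingBP    = c(140, 160, 130, 138),
  FastingBS    = c(0, 0, 1, 0),
  Oldpeak      = c(0.0, 1.0, 0.0, 1.5),
  HeartDisease = c(0, 1, 0, 1)
)

# Assumption: a column is "numerical" if it is numeric AND has more than two
# distinct values; everything else (text or 0/1 flags) is "categorical".
is_numerical <- sapply(toy, function(col) {
  is.numeric(col) && length(unique(col)) > 2
})

numeric_names     <- names(toy)[is_numerical]
categorical_names <- names(toy)[!is_numerical]
target_name       <- names(toy)[ncol(toy)]  # last column, as in the exercise

print(numeric_names)
print(categorical_names)
print(target_name)
```

The "more than two distinct values" rule is only a heuristic; a numeric column that happens to take two values in a small sample would be misclassified, so the result should still be checked by eye.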
4. Remove all the categorical columns (but not the target column) from
the dataset. From now on, the numerical columns are called features and
the last column is called the target or class.
Examine_data = heart[ ,c(-2,-3,-6,-7,-9,-11)]
Examine_data
target = heart$HeartDisease
target
[1] 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0
[49] 0 1 1 1 0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 1 0 1
[97] 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 1 1 1 1 1 0
[145] 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 1 0 1 0 0
[193] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 1 1
[241] 0 1 1 0 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0
[289] 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1
[337] 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[385] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 0
[433] 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1
[481] 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 0 0 1 0
[529] 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 0 1 1 0 1 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1
[577] 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 1 1 0 1 0 0 0 1 1 1
[625] 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 1 1 1 1 1 0 0 0 0 0 1
[673] 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 1 1 1 1 1 0 1 0 0 0 1 0 1 1
[721] 1 0 1 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 1 0 0 0 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 0
[769] 0 0 0 1 0 0 1 1 1 0 1 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0
[817] 1 1 1 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 1 1 0 1
[865] 1 1 0 1 0 0 0 0 1 1 0 0 1 1 0 1 0 0 0 0 1 0 0 1 1 1 0 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 0 1 1 1 0 1
[913] 1 1 1 1 1 0
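Negative column indices break silently if the column order ever changes. A minimal sketch of the same step using column names is shown below; `toy` is again a made-up data frame, and the names `categorical`, `features_and_target`, and `features` are illustrative, not part of the original analysis.

```r
# Toy stand-in for the heart data (values are invented for illustration only).
toy <- data.frame(
  Age          = c(40, 49, 37, 48),
  Sex          = c("M", "F", "M", "F"),
  Cholesterol  = c(289, 180, 283, 214),
  ST_Slope     = c("Up", "Flat", "Up", "Flat"),
  HeartDisease = c(0, 1, 0, 1)
)

# Names of the categorical columns to drop (the target is kept on purpose).
categorical <- c("Sex", "ChestPainType", "FastingBS",
                 "RestingECG", "ExerciseAngina", "ST_Slope")

# Keep every column whose name is not in the categorical list.
features_and_target <- toy[, !(names(toy) %in% categorical), drop = FALSE]

# Split into features (numerical columns) and target (class).
features <- features_and_target[, names(features_and_target) != "HeartDisease",
                                drop = FALSE]
target   <- toy$HeartDisease

print(names(features))
print(target)
```

Names listed in `categorical` that are absent from the data frame are simply ignored by `%in%`, so the same list works on the toy data and on the full dataset.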
5. Scatter plot of any two features - each class (target) should be
represented with different colors.
target_color = as.numeric(factor(heart$HeartDisease))
plot(heart$Age,heart$Cholesterol,
col = target_color,
pch = 2,
cex = 0.5,
xlim = c(20,80),
ylim = c(1, 500),
xlab = "Age" ,
ylab = "Cholesterol",
main = "Visualization of Age and Cholesterol",
col.main = 'blue',
col.axis = 'black',
col.lab = 'red',
cex.main = 1.5,
cex.axis = 1,
cex.lab = 1
)

6. Use the ggplot library functions to plot the following and
explain each figure in (2-3) sentences.
(i) Scatter plot between Age and MaxHR
library(ggplot2)
heart = read.csv('heart.csv')
target_color = as.character(heart$HeartDisease)
ggplot(heart, aes(x = Age, y = MaxHR, color = target_color)) +
geom_point(size = 2)+
scale_color_manual(values = c("0" = "green", "1" = "red")) +
guides(color = guide_legend(order = 1),
size = guide_legend(order = 2),
shape = guide_legend(order = 3))+
labs(title = "Scatter plot between Age and MaxHR",
x = "Age",
y = "MaxHR",
caption = "Source: Iskulghar") +
theme(
legend.position = "top",
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(ii) Scatter plot between Age and Cholesterol
ggplot(heart, aes(x = Age, y = Cholesterol, color = target_color)) +
geom_point(size = 2)+
scale_color_manual(values = c("0"="pink","1"= "purple"))+
guides(color = guide_legend(order = 1),
size = guide_legend(order = 2),
shape = guide_legend(order = 3))+
labs(title = "Scatter plot between Age and Cholesterol",
x = "Age",
y = "Cholesterol",
caption = "Source: Iskulghar") +
theme(
legend.position = "top",
text = element_text(colour = "blue", size = 15),
axis.text.x = element_text(color = "black", size = 10),
axis.text.y = element_text(color = "black", size = 10))

(iii) Scatter plot between Age and RestingBP
Heartdisease= as.character(heart$HeartDisease)
ggplot(heart, aes(x = Age, y = RestingBP, color = Heartdisease)) +
geom_point(size = 2)+
scale_color_manual(values = c("0"="cyan3","1"= "darkblue"))+
guides(color = guide_legend(order = 1),
size = guide_legend(order = 2),
shape = guide_legend(order = 2))+
labs(title = "Scatter plot between Age and RestingBP",
x = "Age",
y = "RestingBP",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(size = 20, color = "darkblue"),
legend.position = "top",
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(iv) Scatter plot between Age and Oldpeak
ggplot(heart, aes(x = Age, y = Oldpeak, color = Heartdisease)) +
geom_point(size = 2)+
guides(color = guide_legend(order = 1),
size = guide_legend(order = 2),
shape = guide_legend(order = 3))+
labs(title = "Scatter plot between Age and Oldpeak",
x = "Age",
y = "Oldpeak",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(size = 20, color = "red"),
legend.position = "top",
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(v) Scatter plot between RestingBP and Cholesterol
ggplot(heart, aes(x = RestingBP, y = Cholesterol, color = Heartdisease)) +
geom_point(size = 2)+
scale_color_manual(values = c("0"="darkolivegreen3","1"= "darkolivegreen4"))+
guides(color = guide_legend(order = 1),
size = guide_legend(order = 2),
shape = guide_legend(order = 3))+
labs(title = "Scatter plot between RestingBP and Cholesterol",
x = "RestingBP",
y = "Cholesterol",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(colour = 'darkgreen', size = 20),
legend.position = "top",
text = element_text(colour = 'brown', size = 12),
axis.text.x = element_text(color = "brown4", size = 10),
axis.text.y = element_text(color = "brown4", size = 10))

(vi) Scatter plot between RestingBP and MaxHR
ggplot(heart, aes(x = RestingBP, y = MaxHR, color = Heartdisease)) +
geom_point(size = 2)+
guides(color = guide_legend(order = 1),
size = guide_legend(order = 2),
shape = guide_legend(order = 3))+
labs(title = "Scatter plot between RestingBP and MaxHR",
x = "RestingBP",
y = "MaxHR",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(colour = 'blue', size = 20),
legend.position = "top",
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "red", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(vii) Scatter plot between RestingBP and Oldpeak
ggplot(heart, aes(x = RestingBP, y =Oldpeak, color = Heartdisease)) +
geom_point(size = 2)+
guides(color = guide_legend(order = 1),
size = guide_legend(order = 2),
shape = guide_legend(order = 3))+
labs(title = "Scatter plot between RestingBP and Oldpeak",
x = "RestingBP",
y = "Oldpeak",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(colour = 'blue', size = 20),
legend.position = "top",
text = element_text(colour = 'red', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(viii) Scatter plot between Cholesterol and MaxHR
ggplot(heart, aes(x = Cholesterol, y = MaxHR, color = Heartdisease)) +
geom_point(size = 2)+
scale_color_manual(values = c("0" = "cadetblue","1" = "darkorange"))+
labs(title = "Scatter plot between Cholesterol and MaxHR",
x = "Cholesterol",
y = "MaxHR",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(colour = 'black', size = 20),
legend.position = "top",
text = element_text(colour = 'darkblue', size = 15),
axis.text.x = element_text(color = "red", size = 10),
axis.text.y = element_text(color = "red", size = 10))

(ix) Scatter plot between Cholesterol and Oldpeak
ggplot(heart, aes(x = Cholesterol, y = Oldpeak, color = Heartdisease)) +
geom_point(size = 2)+
guides(color = guide_legend(order = 1),
size = guide_legend(order = 2),
shape = guide_legend(order = 3))+
labs(title = "Scatter plot between Cholesterol and Oldpeak",
x = "Cholesterol",
y = "Oldpeak",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(colour = 'red', size = 20),
legend.position = "top",
text = element_text(colour = 'gray', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(x) Scatter plot between Oldpeak and MaxHR
ggplot(heart, aes(x =Oldpeak, y = MaxHR, color = Heartdisease)) +
geom_point(size = 2)+
labs(title = "Scatter plot between Oldpeak and MaxHR",
x = "Oldpeak",
y = "MaxHR",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(colour = 'blue', size = 20),
legend.position = "top",
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

a. Boxplot of all columns
(i) Box plot of Age
ggplot(data = heart, aes(x = Heartdisease, y=Age)) +
geom_boxplot(fill = c("cyan2","darkblue"), alpha = 0.5) +
labs(title = "Box plot of Age",
x = "HeartDisease",
y = "Age",
caption = "Source: Iskulghar") +
theme(
legend.position = "top",
plot.title.position = "panel",
plot.title = element_text(colour = 'darkblue', size = 20),
text = element_text(colour = 'blue', size = 15),
axis.text.x = element_text(color = "black", size = 10),
axis.text.y = element_text(color = "black", size = 10))

(ii) Box plot of RestingBP
ggplot(data = heart, aes(x = Heartdisease, y=RestingBP)) +
geom_boxplot(fill = c("brown2","darkcyan"), alpha = 0.5) +
labs(title = "Box plot of RestingBP",
x = "HeartDisease",
y = "RestingBP",
caption = "Source: Iskulghar") +
theme(
plot.title = element_text(colour = 'darkblue', size = 20),
legend.position = "top",
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(iii) Box plot of Cholesterol
ggplot(data = heart, aes(x = Heartdisease, y=Cholesterol)) +
geom_boxplot(fill = c("green","red"), alpha = 0.5) +
labs(title = "Box plot of Cholesterol",
x = "HeartDisease",
y = "Cholesterol",
caption = "Source: Iskulghar") +
theme(
legend.position = "top",
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(iv) Box plot of MaxHR
ggplot(data = heart, aes(x = Heartdisease, y=MaxHR)) +
geom_boxplot(fill = c("red","blue"), alpha = 0.5) +
labs(title = "Box plot of MaxHR",
x = "HeartDisease",
y = "MaxHR",
caption = "Source: Iskulghar") +
theme(
legend.position = "top",
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))

(v) Box plot of Oldpeak
ggplot(data = heart, aes(x = Heartdisease, y=Oldpeak)) +
geom_boxplot(fill = c("darkorange2","blue"), alpha = 0.5) +
labs(title = "Box plot of Oldpeak",
x = "HeartDisease",
y = "Oldpeak",
caption = "Source: Iskulghar") +
theme(
legend.position = "top",
plot.title = element_text(colour = 'darkblue', size = 15),
text = element_text(colour = 'black', size = 15),
axis.text.x = element_text(color = "blue", size = 10),
axis.text.y = element_text(color = "blue", size = 10))
