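Confusion matrix (rows = actual class, columns = predicted class):
              Predicted +   Predicted -
  Actual +    TP = 120      FN = 15
  Actual -    FP = 10       TN = 50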
The model correctly predicted 120 positive cases (true positives) and incorrectly labelled 10 negative cases as positive (false positives). It correctly predicted 50 negative cases (true negatives) and incorrectly labelled 15 positive cases as negative (false negatives).
The model made 170 correct predictions (120 + 50).
The model made 25 incorrect predictions (10 + 15).
There are 195 total scored cases (120 + 10 + 50 + 15).
The error rate is 25/195 = 0.1282.
The overall accuracy is 170/195 = 0.8718.
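As a minimal sketch, the same arithmetic can be reproduced in R by storing the counts as a 2 x 2 matrix with actual classes in rows and predicted classes in columns (the name `cm` and this layout are assumptions for illustration):

```r
# Confusion matrix from the counts above; matrix() fills column by column,
# so column 1 is "predicted positive" and column 2 is "predicted negative"
cm <- matrix(c(120, 10, 15, 50), nrow = 2,
             dimnames = list(actual    = c("positive", "negative"),
                             predicted = c("positive", "negative")))

total      <- sum(cm)          # all scored cases: 195
correct    <- sum(diag(cm))    # TP + TN on the diagonal: 170
incorrect  <- total - correct  # FP + FN off the diagonal: 25
accuracy   <- correct / total
error_rate <- incorrect / total

round(c(accuracy = accuracy, error_rate = error_rate), 4)
```

Because both rates share the same denominator, the error rate is always 1 minus the accuracy.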
Using the iris dataset, display the class of the data set and the variable type (class) of each column:
class(iris)
## [1] "data.frame"
sapply(iris,class)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## "numeric" "numeric" "numeric" "numeric" "factor"
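`sapply()` chooses its return shape from the results, so a column with more than one class (such as an ordered factor) would make it silently return a list. A minimal sketch of a stricter alternative uses `vapply()`, which requires each result to be a single character string and stops with an error otherwise; `str()` gives a compact overview of the same information (the name `col_classes` is illustrative):

```r
# vapply() requires every result to be exactly one character string;
# a multi-class column (e.g. an ordered factor) would raise an error instead
col_classes <- vapply(iris, class, character(1))
col_classes

# str() prints the dimensions, each column's type and its first few values
str(iris)
```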
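Exploratory data analysis (EDA) on the iris dataset. summary() shows descriptive statistics for each column: the minimum, first quartile, median, mean, third quartile and maximum of each numeric column, and the number of observations in each level of the factor column (Species):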
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
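summary() pools all 150 flowers together. As a sketch of a possible follow-up (not part of the original analysis), base R's `aggregate()` computes a statistic separately for each species, here the mean of every numeric column (the name `by_species` is illustrative):

```r
# Formula `. ~ Species`: summarise every other column, grouped by Species
by_species <- aggregate(. ~ Species, data = iris, FUN = mean)
by_species

# Row counts per species, matching the Species block of summary(iris)
table(iris$Species)
```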
Use base graphics to draw a scatterplot of petal length against sepal length, with points coloured by species:
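# A factor passed to col is used as its integer codes, so species map to colours 1, 2, 3
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species,
     pch = 16, cex = 0.5, xlab = "Sepal Length", ylab = "Petal Length",
     main = "Flower Characteristics in Iris")
# Legend colours 1:3 follow the level order: setosa, versicolor, virginica
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 16)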
Using the iris dataset:
i.) Change a column name. Rename the Sepal.Length column to LengthofSepal.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# rename() takes new_name = old_name; all other columns are kept unchanged
df1 = rename(iris, LengthofSepal = Sepal.Length)
head(df1)
## LengthofSepal Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
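For comparison, a minimal base R sketch of the same rename without dplyr (the name `df_base` is illustrative):

```r
df_base <- iris
# Replace only the name that matches; all other column names stay the same
names(df_base)[names(df_base) == "Sepal.Length"] <- "LengthofSepal"
head(df_base)
```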
ii.) Pick rows based on a value. Keep only the rows for the setosa species:
library(dplyr)
df1 = iris
filter(df1, Species == "setosa")
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
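The same filter can be sketched in base R with logical indexing or `subset()` (the names `setosa_rows` and `setosa_subset` are illustrative):

```r
# Logical indexing: keep rows where the condition is TRUE;
# the empty slot after the comma keeps all columns
setosa_rows <- iris[iris$Species == "setosa", ]

# subset() evaluates the condition inside the data frame, similar to dplyr::filter()
setosa_subset <- subset(iris, Species == "setosa")

nrow(setosa_rows)
```

One difference: where the condition evaluates to NA, bracket indexing returns a row of NAs, whereas `subset()` and `filter()` drop that row. iris has no missing values, so all three approaches agree here.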
iii.) Add a new column to a data frame. Create a new column, sepalLength2, equal to twice the existing Sepal.Length column:
library(dplyr)
df1 = iris
# Add the column to the whole data frame and keep the result, then show the first rows
df1 = df1 %>% mutate(sepalLength2 = Sepal.Length * 2)
head(df1)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species sepalLength2
## 1 5.1 3.5 1.4 0.2 setosa 10.2
## 2 4.9 3.0 1.4 0.2 setosa 9.8
## 3 4.7 3.2 1.3 0.2 setosa 9.4
## 4 4.6 3.1 1.5 0.2 setosa 9.2
## 5 5.0 3.6 1.4 0.2 setosa 10.0
## 6 5.4 3.9 1.7 0.4 setosa 10.8
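In base R, the equivalent is a single vectorised assignment that adds the column to every row (a sketch; `df_base` is an illustrative name):

```r
df_base <- iris
# Vectorised: every row's Sepal.Length is doubled in one step
df_base$sepalLength2 <- df_base$Sepal.Length * 2
head(df_base)
```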
iv.) Combine data across two or more data frames. Join iris with a second data frame that assigns a color to each species:
library(dplyr)
df1 = iris
df2 <- data.frame(Species= c("setosa","versicolor","virginica"),
Color = c("violet","blue", "light blue"))
df1 = inner_join(df1, df2, by="Species")
head(df1)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Color
## 1 5.1 3.5 1.4 0.2 setosa violet
## 2 4.9 3.0 1.4 0.2 setosa violet
## 3 4.7 3.2 1.3 0.2 setosa violet
## 4 4.6 3.1 1.5 0.2 setosa violet
## 5 5.0 3.6 1.4 0.2 setosa violet
## 6 5.4 3.9 1.7 0.4 setosa violet
tail(df1)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Color
## 145 6.7 3.3 5.7 2.5 virginica light blue
## 146 6.7 3.0 5.2 2.3 virginica light blue
## 147 6.3 2.5 5.0 1.9 virginica light blue
## 148 6.5 3.0 5.2 2.0 virginica light blue
## 149 6.2 3.4 5.4 2.3 virginica light blue
## 150 5.9 3.0 5.1 1.8 virginica light blue
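A base R sketch of the same join uses `merge()`, which performs an inner join by default (`all = FALSE`). Unlike `inner_join()`, `merge()` sorts the result by the key column by default, so row order can differ from the original data (the names `df2_colors` and `merged` are illustrative):

```r
# Lookup table: one color per species (values copied from the example above)
df2_colors <- data.frame(Species = c("setosa", "versicolor", "virginica"),
                         Color   = c("violet", "blue", "light blue"))

# Inner join on the shared Species column; rows without a match are dropped
merged <- merge(iris, df2_colors, by = "Species")
dim(merged)

# Cross-tabulate to confirm each species received exactly one color
table(merged$Species, merged$Color)
```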