Q1 Interpret Confusion Matrix

tp=120 fn=15
fp=10 tn=50

The model correctly predicted positive cases 120 times (true positives) and falsely predicted positive cases 10 times (false positives). The model correctly predicted negative cases 50 times (true negatives) and falsely predicted negative cases 15 times (false negatives).

The model made 170 correct predictions (120 + 50) and 25 incorrect predictions (10 + 15), out of 195 scored cases in total (120 + 10 + 50 + 15). The error rate is 25/195 = 0.1282. The overall accuracy is 170/195 = 0.8718.
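The arithmetic above can be reproduced in R. This is a minimal sketch; the matrix layout (rows = actual class, columns = predicted class) and the object names `cm`, `accuracy` and `error_rate` are assumptions made for illustration.

```r
# Counts taken from the confusion matrix above
tp <- 120; fn <- 15
fp <- 10;  tn <- 50

# Assumed layout: rows are the actual class, columns the predicted class
cm <- matrix(c(tp, fn,
               fp, tn),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("positive", "negative"),
                             predicted = c("positive", "negative")))

total      <- sum(cm)        # all scored cases
correct    <- sum(diag(cm))  # tp + tn
accuracy   <- correct / total
error_rate <- 1 - accuracy

round(c(accuracy = accuracy, error_rate = error_rate), 4)
```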

Q2 EDA on dataset and Codebook

Using the iris dataset.


Display the variable type for each column:

class(iris)
## [1] "data.frame"
sapply(iris,class) 
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
##    "numeric"    "numeric"    "numeric"    "numeric"     "factor"
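The column types can also be collected into a small codebook table. The sketch below is one possible layout, not a standard format; the column names `variable`, `type`, `n_missing` and `n_unique` are chosen here for illustration.

```r
# One row per column of iris: its name, class, missing count and distinct values
codebook <- data.frame(
  variable  = names(iris),
  type      = sapply(iris, function(x) class(x)[1]),
  n_missing = sapply(iris, function(x) sum(is.na(x))),
  n_unique  = sapply(iris, function(x) length(unique(x))),
  row.names = NULL
)
print(codebook)
```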

EDA on the iris dataset. summary() shows descriptive statistics for each numeric column (minimum, quartiles, median, mean and maximum) and the number of observations per level for the factor column Species:

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
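summary() pools all three species together. A per-species breakdown is a natural next step; the sketch below uses base R's aggregate() on two of the columns, and the object name `species_means` is an assumption for illustration.

```r
# Mean sepal and petal length within each species
species_means <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                           data = iris, FUN = mean)
print(species_means)
```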

A basic scatterplot in base graphics showing petal length against sepal length, with points colored by species:

plot(iris$Sepal.Length, iris$Petal.Length, col=iris$Species,
     pch = 16 , cex= 0.5, xlab = "Sepal Length", ylab = "Petal Length",
     main="Flower Characteristics in Iris")
legend(x= 4.2, y=7, legend =levels(iris$Species), col=c(1:3), pch = 16)
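The legend colors line up with the points because `col = iris$Species` passes a factor, which base graphics treats as the integer codes 1 to 3 and looks up in the current palette. A small sketch of that mapping, assuming the default palette is in use (the exact color names depend on the R version):

```r
# Factor levels are stored as integer codes in level order
species_codes <- as.integer(iris$Species)
table(iris$Species, species_codes)

# The same codes index the current palette, which is what col = 1:3 in legend() uses
point_colors <- palette()[species_codes]
unique(point_colors)
```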

Q3 dplyr function demonstrations

Using the iris dataset.
i.) Change column name. Rename the Sepal.Length column to LengthofSepal

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
df1 = rename(iris, LengthofSepal=Sepal.Length)
head(df1)
##   LengthofSepal Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
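rename() takes its arguments as new_name = old_name and returns a modified copy, so iris itself is left unchanged. If a name with spaces is wanted, it can be wrapped in backticks. A short sketch; `df_spaced` is a name made up for this example.

```r
library(dplyr)

# Backticks allow a non-syntactic column name such as one containing spaces
df_spaced <- rename(iris, `Length of Sepal` = Sepal.Length)
names(df_spaced)

# The original data frame still has the old column name
names(iris)[1]
```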


ii.) Pick rows based on value. Keep only the rows for the setosa species

library(dplyr)
df1 = iris

filter(df1, Species == "setosa")
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa
## 11          5.4         3.7          1.5         0.2  setosa
## 12          4.8         3.4          1.6         0.2  setosa
## 13          4.8         3.0          1.4         0.1  setosa
## 14          4.3         3.0          1.1         0.1  setosa
## 15          5.8         4.0          1.2         0.2  setosa
## 16          5.7         4.4          1.5         0.4  setosa
## 17          5.4         3.9          1.3         0.4  setosa
## 18          5.1         3.5          1.4         0.3  setosa
## 19          5.7         3.8          1.7         0.3  setosa
## 20          5.1         3.8          1.5         0.3  setosa
## 21          5.4         3.4          1.7         0.2  setosa
## 22          5.1         3.7          1.5         0.4  setosa
## 23          4.6         3.6          1.0         0.2  setosa
## 24          5.1         3.3          1.7         0.5  setosa
## 25          4.8         3.4          1.9         0.2  setosa
## 26          5.0         3.0          1.6         0.2  setosa
## 27          5.0         3.4          1.6         0.4  setosa
## 28          5.2         3.5          1.5         0.2  setosa
## 29          5.2         3.4          1.4         0.2  setosa
## 30          4.7         3.2          1.6         0.2  setosa
## 31          4.8         3.1          1.6         0.2  setosa
## 32          5.4         3.4          1.5         0.4  setosa
## 33          5.2         4.1          1.5         0.1  setosa
## 34          5.5         4.2          1.4         0.2  setosa
## 35          4.9         3.1          1.5         0.2  setosa
## 36          5.0         3.2          1.2         0.2  setosa
## 37          5.5         3.5          1.3         0.2  setosa
## 38          4.9         3.6          1.4         0.1  setosa
## 39          4.4         3.0          1.3         0.2  setosa
## 40          5.1         3.4          1.5         0.2  setosa
## 41          5.0         3.5          1.3         0.3  setosa
## 42          4.5         2.3          1.3         0.3  setosa
## 43          4.4         3.2          1.3         0.2  setosa
## 44          5.0         3.5          1.6         0.6  setosa
## 45          5.1         3.8          1.9         0.4  setosa
## 46          4.8         3.0          1.4         0.3  setosa
## 47          5.1         3.8          1.6         0.2  setosa
## 48          4.6         3.2          1.4         0.2  setosa
## 49          5.3         3.7          1.5         0.2  setosa
## 50          5.0         3.3          1.4         0.2  setosa
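Rather than printing all 50 rows, the filtered result can be stored and checked. The sketch below also shows a base R equivalent for comparison; `setosa` and `setosa_base` are names chosen for illustration.

```r
library(dplyr)

# Store the filtered rows instead of printing them
setosa <- filter(iris, Species == "setosa")
nrow(setosa)

# filter() keeps the unused factor levels; they simply have a count of zero
table(setosa$Species)

# Base R equivalent using logical indexing
setosa_base <- iris[iris$Species == "setosa", ]
```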


iii.) Add new column to data frame. Create a new column sepalLength2 from the existing column Sepal.Length (here, twice its value); only the first six rows are shown

library(dplyr)
df1 = iris

head(df1) %>% mutate(sepalLength2 = Sepal.Length *2 )
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species sepalLength2
## 1          5.1         3.5          1.4         0.2  setosa         10.2
## 2          4.9         3.0          1.4         0.2  setosa          9.8
## 3          4.7         3.2          1.3         0.2  setosa          9.4
## 4          4.6         3.1          1.5         0.2  setosa          9.2
## 5          5.0         3.6          1.4         0.2  setosa         10.0
## 6          5.4         3.9          1.7         0.4  setosa         10.8
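The chunk above pipes only head(df1) into mutate() and prints the result, so nothing is saved. To keep the new column for all 150 rows, the result has to be assigned. A minimal sketch; `df_doubled` is a name chosen for illustration.

```r
library(dplyr)

# Apply mutate() to the full data frame and store the result
df_doubled <- iris %>% mutate(sepalLength2 = Sepal.Length * 2)
dim(df_doubled)

# mutate() returns a copy, so iris does not gain the new column
"sepalLength2" %in% names(iris)
```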


iv.) Combine data across two or more data frames. Join iris with a second data frame that holds the color of each species.

library(dplyr)
df1 = iris
df2 <- data.frame(Species= c("setosa","versicolor","virginica"), 
                  Color = c("violet","blue", "light blue"))

df1 = inner_join(df1, df2, by="Species")
head(df1)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  Color
## 1          5.1         3.5          1.4         0.2  setosa violet
## 2          4.9         3.0          1.4         0.2  setosa violet
## 3          4.7         3.2          1.3         0.2  setosa violet
## 4          4.6         3.1          1.5         0.2  setosa violet
## 5          5.0         3.6          1.4         0.2  setosa violet
## 6          5.4         3.9          1.7         0.4  setosa violet
tail(df1)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species      Color
## 145          6.7         3.3          5.7         2.5 virginica light blue
## 146          6.7         3.0          5.2         2.3 virginica light blue
## 147          6.3         2.5          5.0         1.9 virginica light blue
## 148          6.5         3.0          5.2         2.0 virginica light blue
## 149          6.2         3.4          5.4         2.3 virginica light blue
## 150          5.9         3.0          5.1         1.8 virginica light blue
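The join above keeps all 150 rows because every species has a match in df2. When the lookup table is incomplete, inner_join() drops the unmatched rows while left_join() keeps them with NA. The sketch below illustrates this with a hypothetical lookup `colors_partial` that deliberately omits virginica; Species is converted to character first so both key columns have the same type.

```r
library(dplyr)

# Use a character key on both sides of the join
iris_chr <- mutate(iris, Species = as.character(Species))

# Hypothetical lookup table with virginica left out on purpose
colors_partial <- data.frame(Species = c("setosa", "versicolor"),
                             Color   = c("violet", "blue"),
                             stringsAsFactors = FALSE)

inner <- inner_join(iris_chr, colors_partial, by = "Species")
left  <- left_join(iris_chr, colors_partial, by = "Species")

nrow(inner)             # 100: the 50 virginica rows are dropped
nrow(left)              # 150: every iris row is kept
sum(is.na(left$Color))  # 50: virginica rows have no color
```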