The table below shows the prediction results for people taking a screening test for coronavirus disease (COVID-19). Begin your markdown by interpreting the confusion matrix. No R code is required to answer this question.
Confusion Matrix
The model correctly predicted the positive class (TESTED POSITIVE) 120 times and incorrectly predicted it 15 times.
The model correctly predicted the negative class (TESTED NEGATIVE) 50 times and incorrectly predicted it 10 times.
The model made 170 correct predictions (120+50).
The model made 25 incorrect predictions (15+10).
There are 195 total scored cases (120+15+10+50).
The error rate is 25/195 = 0.1282, so about 12.8% of cases were misclassified.
The overall accuracy rate is 170/195 = 0.8718, so about 87.2% of cases were classified correctly.
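Although the question does not require code, the arithmetic above can be double-checked with a few lines of R. This is only a sketch: the variable names are made up for illustration, and the four counts are the values quoted in the interpretation.

```r
# Counts quoted in the interpretation above (variable names are illustrative)
correct_positive   <- 120
incorrect_positive <- 15
correct_negative   <- 50
incorrect_negative <- 10

# Total scored cases, then the share classified correctly and incorrectly
total      <- correct_positive + incorrect_positive +
              correct_negative + incorrect_negative
accuracy   <- (correct_positive + correct_negative) / total
error_rate <- (incorrect_positive + incorrect_negative) / total

total                 # 195
round(accuracy, 4)    # 0.8718
round(error_rate, 4)  # 0.1282
```

Accuracy and error rate always sum to 1, which gives a quick sanity check on the two figures.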
Find and get a data set from the data sets available within R. Perform exploratory data analysis (EDA) and prepare a codebook on that data set. Explain every answer given.
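Finding the data in RStudio. Calling data() lists the datasets that ship with R; the built-in women dataset (height and weight for 15 women) is used here.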
library(memisc)
## Warning: package 'memisc' was built under R version 4.0.5
## Loading required package: lattice
## Loading required package: MASS
##
## Attaching package: 'memisc'
## The following objects are masked from 'package:stats':
##
## contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
##
## as.array
data()
x <- data.set(women)
# Determine the min height and weight in the women data
sapply(women, min)
## height weight
##     58    115
# Determine the max height and weight in the women data
sapply(women, max)
## height weight
##     72    164
# Determine the range (min and max) of height and weight in the women data
sapply(women, range)
##      height weight
## [1,]     58    115
## [2,]     72    164
# Boxplot of the dataset
boxplot(x)
# Getting summary of the dataset
summary(x)
## women.height women.weight
## Min. :58.0 Min. :115.0
## 1st Qu.:61.5 1st Qu.:124.5
## Median :65.0 Median :135.0
## Mean :65.0 Mean :136.7
## 3rd Qu.:68.5 3rd Qu.:148.0
## Max. :72.0 Max. :164.0
# Creating codebook for x
codebook(x)
## ================================================================================
##
## women.height
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
## Measurement: interval
##
## Min: 58.000
## Max: 72.000
## Mean: 65.000
## Std.Dev.: 4.320
##
## ================================================================================
##
## women.weight
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
## Measurement: interval
##
## Min: 115.000
## Max: 164.000
## Mean: 136.733
## Std.Dev.: 14.973
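The summary and the codebook describe each variable on its own. As one possible next EDA step, the relationship between the two variables can be examined with base R alone. The sketch below assumes only the built-in `women` data; the object name `height_weight_cor` is hypothetical.

```r
# Structure of the built-in dataset: 15 observations of 2 numeric variables
str(women)

# Count missing values per column before summarising
colSums(is.na(women))

# Strength of the linear relationship between height and weight
height_weight_cor <- cor(women$height, women$weight)
height_weight_cor

# Scatterplot of weight against height
plot(weight ~ height, data = women,
     xlab = "Height (in)", ylab = "Weight (lbs)",
     main = "women: weight vs. height")
```

A correlation close to 1 would indicate that weight rises almost linearly with height in this sample.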
Demonstrate useful functions of dplyr for data manipulation for the following:
Explain the use of each function, show the R code and provide a short explanation for each produced output. You can create your own sensible dataset for this question with at least 10 observations.
The built-in HairEyeColor dataset is loaded with data.set() from memisc and then converted to a data frame.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:memisc':
##
## collect, recode, rename, syms
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
y <- data.set(HairEyeColor)
df <- as.data.frame(y)
# Original Dataset
df
## HairEyeColor.Hair HairEyeColor.Eye HairEyeColor.Sex HairEyeColor.Freq
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
## 6 Brown Blue Male 50
## 7 Red Blue Male 10
## 8 Blond Blue Male 30
## 9 Black Hazel Male 10
## 10 Brown Hazel Male 25
## 11 Red Hazel Male 7
## 12 Blond Hazel Male 5
## 13 Black Green Male 3
## 14 Brown Green Male 15
## 15 Red Green Male 7
## 16 Blond Green Male 8
## 17 Black Brown Female 36
## 18 Brown Brown Female 66
## 19 Red Brown Female 16
## 20 Blond Brown Female 4
## 21 Black Blue Female 9
## 22 Brown Blue Female 34
## 23 Red Blue Female 7
## 24 Blond Blue Female 64
## 25 Black Hazel Female 5
## 26 Brown Hazel Female 29
## 27 Red Hazel Female 7
## 28 Blond Hazel Female 5
## 29 Black Green Female 2
## 30 Brown Green Female 14
## 31 Red Green Female 7
## 32 Blond Green Female 8
# Renaming the columns with the rename() function (new_name = old_name)
mydata1 <- rename(df, Hair = HairEyeColor.Hair, Eye = HairEyeColor.Eye, Sex = HairEyeColor.Sex, Frequency = HairEyeColor.Freq)
# Dataset with the renamed columns
mydata1
## Hair Eye Sex Frequency
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
## 6 Brown Blue Male 50
## 7 Red Blue Male 10
## 8 Blond Blue Male 30
## 9 Black Hazel Male 10
## 10 Brown Hazel Male 25
## 11 Red Hazel Male 7
## 12 Blond Hazel Male 5
## 13 Black Green Male 3
## 14 Brown Green Male 15
## 15 Red Green Male 7
## 16 Blond Green Male 8
## 17 Black Brown Female 36
## 18 Brown Brown Female 66
## 19 Red Brown Female 16
## 20 Blond Brown Female 4
## 21 Black Blue Female 9
## 22 Brown Blue Female 34
## 23 Red Blue Female 7
## 24 Blond Blue Female 64
## 25 Black Hazel Female 5
## 26 Brown Hazel Female 29
## 27 Red Hazel Female 7
## 28 Blond Hazel Female 5
## 29 Black Green Female 2
## 30 Brown Green Female 14
## 31 Red Green Female 7
## 32 Blond Green Female 8
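When every column shares the same prefix, the renaming can also be done in one step with `rename_with()`, which applies a function to the column names. The following is a minimal sketch, assuming dplyr 1.0.0 or later; `prefixed` and `cleaned` are illustrative names, and the prefixed columns are rebuilt here with base R so that the block runs without memisc.

```r
library(dplyr)

# Rebuild a data frame whose column names carry the "HairEyeColor." prefix
prefixed <- as.data.frame(HairEyeColor)
names(prefixed) <- paste0("HairEyeColor.", names(prefixed))

# Strip the prefix from every column name, then rename Freq explicitly
cleaned <- prefixed %>%
  rename_with(~ sub("^HairEyeColor\\.", "", .x)) %>%
  rename(Frequency = Freq)

names(cleaned)
```

Compared with listing each column by hand, this approach keeps working if more prefixed columns are added later.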
# Using the filter() function to keep only the rows where Sex is Female
mydata2 <- filter(mydata1, Sex == "Female")
# The 16 rows that remain after filtering
mydata2
## Hair Eye Sex Frequency
## 1 Black Brown Female 36
## 2 Brown Brown Female 66
## 3 Red Brown Female 16
## 4 Blond Brown Female 4
## 5 Black Blue Female 9
## 6 Brown Blue Female 34
## 7 Red Blue Female 7
## 8 Blond Blue Female 64
## 9 Black Hazel Female 5
## 10 Brown Hazel Female 29
## 11 Red Hazel Female 7
## 12 Blond Hazel Female 5
## 13 Black Green Female 2
## 14 Brown Green Female 14
## 15 Red Green Female 7
## 16 Blond Green Female 8
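`filter()` accepts several conditions at once (they are combined with AND), and it chains naturally with other dplyr verbs such as `arrange()`. The sketch below is one way to combine them; the threshold of 10 and the object names `hair_eye` and `frequent_female` are arbitrary choices for illustration.

```r
library(dplyr)

# Base R version of the same data, with columns Hair, Eye, Sex, Freq
hair_eye <- as.data.frame(HairEyeColor)

# Keep female rows with a count of at least 10, largest counts first
frequent_female <- hair_eye %>%
  filter(Sex == "Female", Freq >= 10) %>%
  arrange(desc(Freq))

frequent_female
```

Separating conditions with commas inside `filter()` is equivalent to joining them with `&`.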
# Adding a Name column to the dataset; the 8 labels are recycled across the 32 rows
mydata1$Name <- rep(c('A','B','C','D','E','F','G','H'), length.out = nrow(mydata1))
# Dataset with added column
mydata1
## Hair Eye Sex Frequency Name
## 1 Black Brown Male 32 A
## 2 Brown Brown Male 53 B
## 3 Red Brown Male 10 C
## 4 Blond Brown Male 3 D
## 5 Black Blue Male 11 E
## 6 Brown Blue Male 50 F
## 7 Red Blue Male 10 G
## 8 Blond Blue Male 30 H
## 9 Black Hazel Male 10 A
## 10 Brown Hazel Male 25 B
## 11 Red Hazel Male 7 C
## 12 Blond Hazel Male 5 D
## 13 Black Green Male 3 E
## 14 Brown Green Male 15 F
## 15 Red Green Male 7 G
## 16 Blond Green Male 8 H
## 17 Black Brown Female 36 A
## 18 Brown Brown Female 66 B
## 19 Red Brown Female 16 C
## 20 Blond Brown Female 4 D
## 21 Black Blue Female 9 E
## 22 Brown Blue Female 34 F
## 23 Red Blue Female 7 G
## 24 Blond Blue Female 64 H
## 25 Black Hazel Female 5 A
## 26 Brown Hazel Female 29 B
## 27 Red Hazel Female 7 C
## 28 Blond Hazel Female 5 D
## 29 Black Green Female 2 E
## 30 Brown Green Female 14 F
## 31 Red Green Female 7 G
## 32 Blond Green Female 8 H
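The column above is added with base R assignment. Since this question is about dplyr, the same step could also be written with `mutate()`. This is a sketch under the assumption that the labels should simply repeat down the rows; `hair_eye` and `with_name` are illustrative names.

```r
library(dplyr)

# Base R version of the same data, with columns Hair, Eye, Sex, Freq
hair_eye <- as.data.frame(HairEyeColor)

# n() is the number of rows, so the 8 labels repeat to fill all 32 rows
with_name <- hair_eye %>%
  mutate(Name = rep(LETTERS[1:8], length.out = n()))

head(with_name, 10)
```

Using `rep(..., length.out = n())` makes the recycling explicit, so the code does not depend on the row count being a multiple of 8.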
# Creating a new data frame to combine with the existing data frame
df1 <- data.frame(Country = c('USA', 'UK', 'Germany', 'French'))
# New data frame
df1
## Country
## 1 USA
## 2 UK
## 3 Germany
## 4 French
# Merging mydata1 with the new data frame; they share no columns, so merge() returns every combination of rows (32 x 4 = 128)
mydata4 <- merge(mydata1, df1)
# Result of the merge
mydata4
## Hair Eye Sex Frequency Name Country
## 1 Black Brown Male 32 A USA
## 2 Brown Brown Male 53 B USA
## 3 Red Brown Male 10 C USA
## 4 Blond Brown Male 3 D USA
## 5 Black Blue Male 11 E USA
## 6 Brown Blue Male 50 F USA
## 7 Red Blue Male 10 G USA
## 8 Blond Blue Male 30 H USA
## 9 Black Hazel Male 10 A USA
## 10 Brown Hazel Male 25 B USA
## 11 Red Hazel Male 7 C USA
## 12 Blond Hazel Male 5 D USA
## 13 Black Green Male 3 E USA
## 14 Brown Green Male 15 F USA
## 15 Red Green Male 7 G USA
## 16 Blond Green Male 8 H USA
## 17 Black Brown Female 36 A USA
## 18 Brown Brown Female 66 B USA
## 19 Red Brown Female 16 C USA
## 20 Blond Brown Female 4 D USA
## 21 Black Blue Female 9 E USA
## 22 Brown Blue Female 34 F USA
## 23 Red Blue Female 7 G USA
## 24 Blond Blue Female 64 H USA
## 25 Black Hazel Female 5 A USA
## 26 Brown Hazel Female 29 B USA
## 27 Red Hazel Female 7 C USA
## 28 Blond Hazel Female 5 D USA
## 29 Black Green Female 2 E USA
## 30 Brown Green Female 14 F USA
## 31 Red Green Female 7 G USA
## 32 Blond Green Female 8 H USA
## 33 Black Brown Male 32 A UK
## 34 Brown Brown Male 53 B UK
## 35 Red Brown Male 10 C UK
## 36 Blond Brown Male 3 D UK
## 37 Black Blue Male 11 E UK
## 38 Brown Blue Male 50 F UK
## 39 Red Blue Male 10 G UK
## 40 Blond Blue Male 30 H UK
## 41 Black Hazel Male 10 A UK
## 42 Brown Hazel Male 25 B UK
## 43 Red Hazel Male 7 C UK
## 44 Blond Hazel Male 5 D UK
## 45 Black Green Male 3 E UK
## 46 Brown Green Male 15 F UK
## 47 Red Green Male 7 G UK
## 48 Blond Green Male 8 H UK
## 49 Black Brown Female 36 A UK
## 50 Brown Brown Female 66 B UK
## 51 Red Brown Female 16 C UK
## 52 Blond Brown Female 4 D UK
## 53 Black Blue Female 9 E UK
## 54 Brown Blue Female 34 F UK
## 55 Red Blue Female 7 G UK
## 56 Blond Blue Female 64 H UK
## 57 Black Hazel Female 5 A UK
## 58 Brown Hazel Female 29 B UK
## 59 Red Hazel Female 7 C UK
## 60 Blond Hazel Female 5 D UK
## 61 Black Green Female 2 E UK
## 62 Brown Green Female 14 F UK
## 63 Red Green Female 7 G UK
## 64 Blond Green Female 8 H UK
## 65 Black Brown Male 32 A Germany
## 66 Brown Brown Male 53 B Germany
## 67 Red Brown Male 10 C Germany
## 68 Blond Brown Male 3 D Germany
## 69 Black Blue Male 11 E Germany
## 70 Brown Blue Male 50 F Germany
## 71 Red Blue Male 10 G Germany
## 72 Blond Blue Male 30 H Germany
## 73 Black Hazel Male 10 A Germany
## 74 Brown Hazel Male 25 B Germany
## 75 Red Hazel Male 7 C Germany
## 76 Blond Hazel Male 5 D Germany
## 77 Black Green Male 3 E Germany
## 78 Brown Green Male 15 F Germany
## 79 Red Green Male 7 G Germany
## 80 Blond Green Male 8 H Germany
## 81 Black Brown Female 36 A Germany
## 82 Brown Brown Female 66 B Germany
## 83 Red Brown Female 16 C Germany
## 84 Blond Brown Female 4 D Germany
## 85 Black Blue Female 9 E Germany
## 86 Brown Blue Female 34 F Germany
## 87 Red Blue Female 7 G Germany
## 88 Blond Blue Female 64 H Germany
## 89 Black Hazel Female 5 A Germany
## 90 Brown Hazel Female 29 B Germany
## 91 Red Hazel Female 7 C Germany
## 92 Blond Hazel Female 5 D Germany
## 93 Black Green Female 2 E Germany
## 94 Brown Green Female 14 F Germany
## 95 Red Green Female 7 G Germany
## 96 Blond Green Female 8 H Germany
## 97 Black Brown Male 32 A French
## 98 Brown Brown Male 53 B French
## 99 Red Brown Male 10 C French
## 100 Blond Brown Male 3 D French
## 101 Black Blue Male 11 E French
## 102 Brown Blue Male 50 F French
## 103 Red Blue Male 10 G French
## 104 Blond Blue Male 30 H French
## 105 Black Hazel Male 10 A French
## 106 Brown Hazel Male 25 B French
## 107 Red Hazel Male 7 C French
## 108 Blond Hazel Male 5 D French
## 109 Black Green Male 3 E French
## 110 Brown Green Male 15 F French
## 111 Red Green Male 7 G French
## 112 Blond Green Male 8 H French
## 113 Black Brown Female 36 A French
## 114 Brown Brown Female 66 B French
## 115 Red Brown Female 16 C French
## 116 Blond Brown Female 4 D French
## 117 Black Blue Female 9 E French
## 118 Brown Blue Female 34 F French
## 119 Red Blue Female 7 G French
## 120 Blond Blue Female 64 H French
## 121 Black Hazel Female 5 A French
## 122 Brown Hazel Female 29 B French
## 123 Red Hazel Female 7 C French
## 124 Blond Hazel Female 5 D French
## 125 Black Green Female 2 E French
## 126 Brown Green Female 14 F French
## 127 Red Green Female 7 G French
## 128 Blond Green Female 8 H French
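The 128 rows arise because the two data frames have no column in common, so every row of one is paired with every row of the other. In dplyr, this kind of combination can be requested explicitly with `cross_join()`, while a keyed join such as `left_join()` matches rows on a shared column. The sketch below assumes dplyr 1.1.0 or later (where `cross_join()` is available); the lookup table `sex_codes` and the other object names are hypothetical.

```r
library(dplyr)

hair_eye  <- as.data.frame(HairEyeColor)
countries <- data.frame(Country = c("USA", "UK", "Germany", "French"))

# Cross join: every row of hair_eye paired with every country (32 * 4 rows)
all_pairs <- cross_join(hair_eye, countries)
nrow(all_pairs)

# Keyed join for comparison: a hypothetical lookup table matched on Sex
sex_codes <- data.frame(Sex = c("Male", "Female"), Code = c("M", "F"))
keyed <- left_join(hair_eye, sex_codes, by = "Sex")
nrow(keyed)
```

A keyed join keeps the original 32 rows and only adds the matching column, which is usually what is wanted when two tables describe the same observations.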