Question (a)

The table below shows the result of prediction for people doing screening test for coronavirus disease (COVID-19). Begin your markdown by interpreting the confusion matrix. No R code is required to answer this question.

Confusion Matrix

The predictions for the positive class (TESTED POSITIVE) were correct 120 times and incorrect 15 times.
The predictions for the negative class (TESTED NEGATIVE) were correct 50 times and incorrect 10 times.

In total, 170 predictions were correct (120+50).
In total, 25 predictions were incorrect (15+10).
There are 195 scored cases (120+15+10+50).
The error rate is 25/195 = 0.1282.
The overall accuracy is 170/195 = 0.8718.
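
Although no R code is required here, the arithmetic above can be checked with a short sketch. The cell layout of the matrix (rows as predictions, columns as actual status) is an assumption made for illustration; accuracy and error rate do not depend on it.

```r
# Counts taken from the interpretation above.
# Assumed layout: rows = predicted class, columns = actual class.
cm <- matrix(c(120, 15,
                10, 50),
             nrow = 2, byrow = TRUE,
             dimnames = list(Predicted = c("Positive", "Negative"),
                             Actual    = c("Positive", "Negative")))

total      <- sum(cm)        # all scored cases
correct    <- sum(diag(cm))  # correct predictions sit on the diagonal
accuracy   <- correct / total
error_rate <- 1 - accuracy

round(c(accuracy = accuracy, error_rate = error_rate), 4)
```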

Question (b)

Find and get a data set from the data sets available within R. Perform exploratory data analysis (EDA) and prepare a codebook on that data set. Explain every answer given.

1. Finding data

List the data sets available in R with data(), then load the built-in women data set as a memisc data.set.

library(memisc)
## Warning: package 'memisc' was built under R version 4.0.5
## Loading required package: lattice
## Loading required package: MASS
## 
## Attaching package: 'memisc'
## The following objects are masked from 'package:stats':
## 
##     contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
## 
##     as.array
data()
x <- data.set(women) 
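
Before analysing a built-in data set it can help to confirm that it exists and to check its shape and column types. The sketch below uses only base R on the women data set; the variable name `available` is introduced here for illustration.

```r
# Names of the data sets shipped with the 'datasets' package
available <- data(package = "datasets")$results[, "Item"]
"women" %in% available

# Dimensions, column types and the first few rows of the chosen data set
dim(women)
str(women)
head(women, 3)
```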

2. Exploratory data analysis (EDA)

# Determine the minimum height and weight (one value per column)
sapply(women, min)
## height weight 
##     58    115 
# Determine the maximum height and weight (one value per column)
sapply(women, max)
## height weight 
##     72    164 
# Determine the range (minimum and maximum) of height and weight
sapply(women, range)
##      height weight
## [1,]     58    115
## [2,]     72    164
# Boxplot of the dataset
boxplot(x)

# Getting summary of the dataset
summary(x)
##   women.height   women.weight  
##  Min.   :58.0   Min.   :115.0  
##  1st Qu.:61.5   1st Qu.:124.5  
##  Median :65.0   Median :135.0  
##  Mean   :65.0   Mean   :136.7  
##  3rd Qu.:68.5   3rd Qu.:148.0  
##  Max.   :72.0   Max.   :164.0
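
The summary describes each variable separately. As a possible next step in the EDA, the relationship between the two variables can be checked with base R. This is a sketch on the built-in women data set; the names `r` and `increasing` are introduced here for illustration.

```r
# Pearson correlation between height and weight
r <- cor(women$height, women$weight)
r

# Check whether weight rises strictly as height rises
increasing <- all(diff(women$weight[order(women$height)]) > 0)
increasing
```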

3. Codebook of the dataset

# Creating codebook for x
codebook(x)
## ================================================================================
## 
##    women.height
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min: 58.000
##         Max: 72.000
##        Mean: 65.000
##    Std.Dev.:  4.320
## 
## ================================================================================
## 
##    women.weight
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
##    Measurement: interval
## 
##         Min: 115.000
##         Max: 164.000
##        Mean: 136.733
##    Std.Dev.:  14.973
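
The codebook values can be cross-checked with base R. Note that the Std.Dev. figures above (4.320 and 14.973) are consistent with dividing by n rather than n - 1, so base R's sd() gives slightly larger values. The sketch below computes both versions; the helper `describe` is hypothetical and only used for this check.

```r
# Hypothetical helper: summary statistics for one numeric column
describe <- function(v) {
  c(min           = min(v),
    max           = max(v),
    mean          = mean(v),
    sd_sample     = sd(v),                        # divides by n - 1
    sd_population = sqrt(mean((v - mean(v))^2)))  # divides by n
}

stats <- sapply(women, describe)
round(stats, 3)
```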

Question (c)

Demonstrate useful functions of dplyr for data manipulation for the following:

  1. Change the existing column name to something new.
  2. Pick rows based on their values.
  3. Add new columns to a data frame.
  4. Combine data across two or more data frames.

Explain the use of each function, show the R code and provide a short explanation for each produced output. You can create your own sensible dataset for this question with at least 10 observations.

The built-in HairEyeColor dataset is converted with data.set() from memisc and then turned into a data frame.

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:memisc':
## 
##     collect, recode, rename, syms
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
y <- data.set(HairEyeColor)
df <- as.data.frame(y)

1. Change the existing column name to something new.

# Original Dataset
df
##    HairEyeColor.Hair HairEyeColor.Eye HairEyeColor.Sex HairEyeColor.Freq
## 1              Black            Brown             Male                32
## 2              Brown            Brown             Male                53
## 3                Red            Brown             Male                10
## 4              Blond            Brown             Male                 3
## 5              Black             Blue             Male                11
## 6              Brown             Blue             Male                50
## 7                Red             Blue             Male                10
## 8              Blond             Blue             Male                30
## 9              Black            Hazel             Male                10
## 10             Brown            Hazel             Male                25
## 11               Red            Hazel             Male                 7
## 12             Blond            Hazel             Male                 5
## 13             Black            Green             Male                 3
## 14             Brown            Green             Male                15
## 15               Red            Green             Male                 7
## 16             Blond            Green             Male                 8
## 17             Black            Brown           Female                36
## 18             Brown            Brown           Female                66
## 19               Red            Brown           Female                16
## 20             Blond            Brown           Female                 4
## 21             Black             Blue           Female                 9
## 22             Brown             Blue           Female                34
## 23               Red             Blue           Female                 7
## 24             Blond             Blue           Female                64
## 25             Black            Hazel           Female                 5
## 26             Brown            Hazel           Female                29
## 27               Red            Hazel           Female                 7
## 28             Blond            Hazel           Female                 5
## 29             Black            Green           Female                 2
## 30             Brown            Green           Female                14
## 31               Red            Green           Female                 7
## 32             Blond            Green           Female                 8
# Renaming the columns with the rename() function (new_name = old_name)
mydata1 <- rename(df, Hair=HairEyeColor.Hair, Eye=HairEyeColor.Eye, Sex=HairEyeColor.Sex, Frequency=HairEyeColor.Freq)

# New dataset
mydata1
##     Hair   Eye    Sex Frequency
## 1  Black Brown   Male        32
## 2  Brown Brown   Male        53
## 3    Red Brown   Male        10
## 4  Blond Brown   Male         3
## 5  Black  Blue   Male        11
## 6  Brown  Blue   Male        50
## 7    Red  Blue   Male        10
## 8  Blond  Blue   Male        30
## 9  Black Hazel   Male        10
## 10 Brown Hazel   Male        25
## 11   Red Hazel   Male         7
## 12 Blond Hazel   Male         5
## 13 Black Green   Male         3
## 14 Brown Green   Male        15
## 15   Red Green   Male         7
## 16 Blond Green   Male         8
## 17 Black Brown Female        36
## 18 Brown Brown Female        66
## 19   Red Brown Female        16
## 20 Blond Brown Female         4
## 21 Black  Blue Female         9
## 22 Brown  Blue Female        34
## 23   Red  Blue Female         7
## 24 Blond  Blue Female        64
## 25 Black Hazel Female         5
## 26 Brown Hazel Female        29
## 27   Red Hazel Female         7
## 28 Blond Hazel Female         5
## 29 Black Green Female         2
## 30 Brown Green Female        14
## 31   Red Green Female         7
## 32 Blond Green Female         8
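
As a compact illustration of renaming, the sketch below works directly on the base R conversion of HairEyeColor (an assumption made to keep the example self-contained, so the column names differ from those above). It shows rename() for a single column and rename_with() for applying a function to every column name.

```r
library(dplyr)

# Base R conversion of the built-in contingency table to a data frame
hec <- as.data.frame(HairEyeColor)

# rename(): new_name = old_name
renamed <- rename(hec, Frequency = Freq)
names(renamed)

# rename_with(): apply a function to all column names
lower <- rename_with(hec, tolower)
names(lower)
```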

2. Pick rows based on their values.

# Using the filter() function to keep only the rows where Sex is "Female"
mydata2 <- filter(mydata1, Sex == "Female")

# Rows where Sex is Female
mydata2
##     Hair   Eye    Sex Frequency
## 1  Black Brown Female        36
## 2  Brown Brown Female        66
## 3    Red Brown Female        16
## 4  Blond Brown Female         4
## 5  Black  Blue Female         9
## 6  Brown  Blue Female        34
## 7    Red  Blue Female         7
## 8  Blond  Blue Female        64
## 9  Black Hazel Female         5
## 10 Brown Hazel Female        29
## 11   Red Hazel Female         7
## 12 Blond Hazel Female         5
## 13 Black Green Female         2
## 14 Brown Green Female        14
## 15   Red Green Female         7
## 16 Blond Green Female         8
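
filter() also accepts several conditions at once; conditions separated by commas must all hold. The sketch below again uses the base R conversion of HairEyeColor (an assumption for self-containment), and the threshold of 30 is chosen only for illustration.

```r
library(dplyr)

hec <- as.data.frame(HairEyeColor)

# Keep rows that are Female AND have a frequency above 30
frequent_female <- filter(hec, Sex == "Female", Freq > 30)
frequent_female

# Use | for "or": rows with Red or Blond hair
red_or_blond <- filter(hec, Hair == "Red" | Hair == "Blond")
nrow(red_or_blond)
```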

3. Add new columns to a data frame.

# Adding the Name column with mutate(); the eight labels are repeated to fill all 32 rows
mydata1 <- mutate(mydata1, Name = rep(c('A','B','C','D','E','F','G','H'), times = 4))

# Dataset with added column
mydata1
##     Hair   Eye    Sex Frequency Name
## 1  Black Brown   Male        32    A
## 2  Brown Brown   Male        53    B
## 3    Red Brown   Male        10    C
## 4  Blond Brown   Male         3    D
## 5  Black  Blue   Male        11    E
## 6  Brown  Blue   Male        50    F
## 7    Red  Blue   Male        10    G
## 8  Blond  Blue   Male        30    H
## 9  Black Hazel   Male        10    A
## 10 Brown Hazel   Male        25    B
## 11   Red Hazel   Male         7    C
## 12 Blond Hazel   Male         5    D
## 13 Black Green   Male         3    E
## 14 Brown Green   Male        15    F
## 15   Red Green   Male         7    G
## 16 Blond Green   Male         8    H
## 17 Black Brown Female        36    A
## 18 Brown Brown Female        66    B
## 19   Red Brown Female        16    C
## 20 Blond Brown Female         4    D
## 21 Black  Blue Female         9    E
## 22 Brown  Blue Female        34    F
## 23   Red  Blue Female         7    G
## 24 Blond  Blue Female        64    H
## 25 Black Hazel Female         5    A
## 26 Brown Hazel Female        29    B
## 27   Red Hazel Female         7    C
## 28 Blond Hazel Female         5    D
## 29 Black Green Female         2    E
## 30 Brown Green Female        14    F
## 31   Red Green Female         7    G
## 32 Blond Green Female         8    H
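
mutate() is most useful when the new column is computed from existing ones. The sketch below adds each row's share of the total count, using the base R conversion of HairEyeColor (an assumption for self-containment); the column name `Share` is hypothetical.

```r
library(dplyr)

hec <- as.data.frame(HairEyeColor)

# Add a column computed from an existing column
with_share <- mutate(hec, Share = Freq / sum(Freq))
head(with_share, 3)

# The shares should add up to 1
sum(with_share$Share)
```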

4. Combine data across two or more data frames.

# Creating new dataframe to combine with existing dataframe
df1 = data.frame(Country = c('USA', 'UK', 'Germany', 'French'))

# New dataframe 
df1
##   Country
## 1     USA
## 2      UK
## 3 Germany
## 4  French
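# Merging mydata1 with the new data frame using merge() from base R.
# The two data frames share no column, so every row of mydata1 is paired
# with every country (32 x 4 = 128 rows).
mydata4 <- merge(mydata1, df1)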

# Result of combination
mydata4
##      Hair   Eye    Sex Frequency Name Country
## 1   Black Brown   Male        32    A     USA
## 2   Brown Brown   Male        53    B     USA
## 3     Red Brown   Male        10    C     USA
## 4   Blond Brown   Male         3    D     USA
## 5   Black  Blue   Male        11    E     USA
## 6   Brown  Blue   Male        50    F     USA
## 7     Red  Blue   Male        10    G     USA
## 8   Blond  Blue   Male        30    H     USA
## 9   Black Hazel   Male        10    A     USA
## 10  Brown Hazel   Male        25    B     USA
## 11    Red Hazel   Male         7    C     USA
## 12  Blond Hazel   Male         5    D     USA
## 13  Black Green   Male         3    E     USA
## 14  Brown Green   Male        15    F     USA
## 15    Red Green   Male         7    G     USA
## 16  Blond Green   Male         8    H     USA
## 17  Black Brown Female        36    A     USA
## 18  Brown Brown Female        66    B     USA
## 19    Red Brown Female        16    C     USA
## 20  Blond Brown Female         4    D     USA
## 21  Black  Blue Female         9    E     USA
## 22  Brown  Blue Female        34    F     USA
## 23    Red  Blue Female         7    G     USA
## 24  Blond  Blue Female        64    H     USA
## 25  Black Hazel Female         5    A     USA
## 26  Brown Hazel Female        29    B     USA
## 27    Red Hazel Female         7    C     USA
## 28  Blond Hazel Female         5    D     USA
## 29  Black Green Female         2    E     USA
## 30  Brown Green Female        14    F     USA
## 31    Red Green Female         7    G     USA
## 32  Blond Green Female         8    H     USA
## 33  Black Brown   Male        32    A      UK
## 34  Brown Brown   Male        53    B      UK
## 35    Red Brown   Male        10    C      UK
## 36  Blond Brown   Male         3    D      UK
## 37  Black  Blue   Male        11    E      UK
## 38  Brown  Blue   Male        50    F      UK
## 39    Red  Blue   Male        10    G      UK
## 40  Blond  Blue   Male        30    H      UK
## 41  Black Hazel   Male        10    A      UK
## 42  Brown Hazel   Male        25    B      UK
## 43    Red Hazel   Male         7    C      UK
## 44  Blond Hazel   Male         5    D      UK
## 45  Black Green   Male         3    E      UK
## 46  Brown Green   Male        15    F      UK
## 47    Red Green   Male         7    G      UK
## 48  Blond Green   Male         8    H      UK
## 49  Black Brown Female        36    A      UK
## 50  Brown Brown Female        66    B      UK
## 51    Red Brown Female        16    C      UK
## 52  Blond Brown Female         4    D      UK
## 53  Black  Blue Female         9    E      UK
## 54  Brown  Blue Female        34    F      UK
## 55    Red  Blue Female         7    G      UK
## 56  Blond  Blue Female        64    H      UK
## 57  Black Hazel Female         5    A      UK
## 58  Brown Hazel Female        29    B      UK
## 59    Red Hazel Female         7    C      UK
## 60  Blond Hazel Female         5    D      UK
## 61  Black Green Female         2    E      UK
## 62  Brown Green Female        14    F      UK
## 63    Red Green Female         7    G      UK
## 64  Blond Green Female         8    H      UK
## 65  Black Brown   Male        32    A Germany
## 66  Brown Brown   Male        53    B Germany
## 67    Red Brown   Male        10    C Germany
## 68  Blond Brown   Male         3    D Germany
## 69  Black  Blue   Male        11    E Germany
## 70  Brown  Blue   Male        50    F Germany
## 71    Red  Blue   Male        10    G Germany
## 72  Blond  Blue   Male        30    H Germany
## 73  Black Hazel   Male        10    A Germany
## 74  Brown Hazel   Male        25    B Germany
## 75    Red Hazel   Male         7    C Germany
## 76  Blond Hazel   Male         5    D Germany
## 77  Black Green   Male         3    E Germany
## 78  Brown Green   Male        15    F Germany
## 79    Red Green   Male         7    G Germany
## 80  Blond Green   Male         8    H Germany
## 81  Black Brown Female        36    A Germany
## 82  Brown Brown Female        66    B Germany
## 83    Red Brown Female        16    C Germany
## 84  Blond Brown Female         4    D Germany
## 85  Black  Blue Female         9    E Germany
## 86  Brown  Blue Female        34    F Germany
## 87    Red  Blue Female         7    G Germany
## 88  Blond  Blue Female        64    H Germany
## 89  Black Hazel Female         5    A Germany
## 90  Brown Hazel Female        29    B Germany
## 91    Red Hazel Female         7    C Germany
## 92  Blond Hazel Female         5    D Germany
## 93  Black Green Female         2    E Germany
## 94  Brown Green Female        14    F Germany
## 95    Red Green Female         7    G Germany
## 96  Blond Green Female         8    H Germany
## 97  Black Brown   Male        32    A  French
## 98  Brown Brown   Male        53    B  French
## 99    Red Brown   Male        10    C  French
## 100 Blond Brown   Male         3    D  French
## 101 Black  Blue   Male        11    E  French
## 102 Brown  Blue   Male        50    F  French
## 103   Red  Blue   Male        10    G  French
## 104 Blond  Blue   Male        30    H  French
## 105 Black Hazel   Male        10    A  French
## 106 Brown Hazel   Male        25    B  French
## 107   Red Hazel   Male         7    C  French
## 108 Blond Hazel   Male         5    D  French
## 109 Black Green   Male         3    E  French
## 110 Brown Green   Male        15    F  French
## 111   Red Green   Male         7    G  French
## 112 Blond Green   Male         8    H  French
## 113 Black Brown Female        36    A  French
## 114 Brown Brown Female        66    B  French
## 115   Red Brown Female        16    C  French
## 116 Blond Brown Female         4    D  French
## 117 Black  Blue Female         9    E  French
## 118 Brown  Blue Female        34    F  French
## 119   Red  Blue Female         7    G  French
## 120 Blond  Blue Female        64    H  French
## 121 Black Hazel Female         5    A  French
## 122 Brown Hazel Female        29    B  French
## 123   Red Hazel Female         7    C  French
## 124 Blond Hazel Female         5    D  French
## 125 Black Green Female         2    E  French
## 126 Brown Green Female        14    F  French
## 127   Red Green Female         7    G  French
## 128 Blond Green Female         8    H  French
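
When two data frames share a key column, dplyr's join functions combine them row by matching key rather than by every combination. The sketch below is one possible illustration: the lookup table `eye_info` and its `Shade` values are hypothetical, and HairEyeColor is converted with character columns so the key types match.

```r
library(dplyr)

# Main table, with text columns kept as character
hec <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# Hypothetical lookup table keyed on eye colour
eye_info <- data.frame(
  Eye   = c("Brown", "Blue", "Hazel", "Green"),
  Shade = c("Dark", "Light", "Medium", "Light")
)

# left_join(): keep every row of hec and add matching columns from eye_info
joined <- left_join(hec, eye_info, by = "Eye")
head(joined, 3)

# bind_rows(): stack data frames that have the same columns
stacked <- bind_rows(filter(hec, Sex == "Male"), filter(hec, Sex == "Female"))
nrow(stacked)
```

A join keeps the row count of the left table when each key matches at most one row in the lookup table, which is the case here.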