ALTERNATIVE ASSESSMENT Q2

Name: Gan Jia Soon

Matric number: 17206343/1

2.a) Begin your markdown by interpreting the confusion matrix. No R code is required to answer this question.
The model correctly predicted the positive class of the screening test 120 times and incorrectly predicted it 15 times. It correctly predicted the negative class 50 times and incorrectly predicted it 10 times.

The following can be computed from this confusion matrix:
i) The model made 170 correct predictions (120 + 50).
ii) The model made 25 incorrect predictions (15 + 10).
iii) There are 195 total scored cases (120 + 15 + 10 + 50).
iv) The error rate is 25/195 = 0.1282.
v) The overall accuracy rate is 170/195 = 0.8718.
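
Although no R code is required for this part, the arithmetic above can be double-checked with a few lines of base R. This is only a sketch: the variable names are made up for illustration, and the four counts are the ones quoted in the interpretation above.

```r
# Counts quoted in the interpretation above (variable names are illustrative)
correct_pos   <- 120  # positive class predicted correctly
incorrect_pos <- 15   # positive class predicted incorrectly
correct_neg   <- 50   # negative class predicted correctly
incorrect_neg <- 10   # negative class predicted incorrectly

total     <- correct_pos + incorrect_pos + correct_neg + incorrect_neg
correct   <- correct_pos + correct_neg
incorrect <- incorrect_pos + incorrect_neg

accuracy   <- correct / total    # share of correct predictions
error_rate <- incorrect / total  # share of incorrect predictions

round(c(accuracy = accuracy, error_rate = error_rate), 4)
```

The accuracy and the error rate always sum to 1, which is a quick sanity check on both figures.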

2.b) Find and get a data set from the data sets available within R. Perform exploratory data analysis (EDA) and prepare a codebook on that data set. Explain every answer given.

library(memisc)
#To list the datasets available in R, type: data()

inspray <- datasets::InsectSprays

typeof (inspray)
## [1] "list"
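
typeof() reports "list" because a data frame is stored internally as a list of equal-length columns. As a small sketch of how to see more of the structure, class() and str() can be called on the same object; both are base R functions.

```r
inspray <- datasets::InsectSprays

typeof(inspray)  # internal storage type of the object
class(inspray)   # "data.frame": the class that controls how it prints and behaves
str(inspray)     # compact overview of the number of rows and each column's type
```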
head(inspray, n=20)
##    count spray
## 1     10     A
## 2      7     A
## 3     20     A
## 4     14     A
## 5     14     A
## 6     12     A
## 7     10     A
## 8     23     A
## 9     17     A
## 10    20     A
## 11    14     A
## 12    13     A
## 13    11     B
## 14    17     B
## 15    21     B
## 16    11     B
## 17    16     B
## 18    14     B
## 19    17     B
## 20    17     B
#To find the mean of the insect count
mean(inspray$count)
## [1] 9.5
#Summary of insect count
summary(inspray$count)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    3.00    7.00    9.50   14.25   26.00
#The range of insect count
range(inspray$count)
## [1]  0 26
#The quantiles of insect count
quantile(inspray$count)
##    0%   25%   50%   75%  100% 
##  0.00  3.00  7.00 14.25 26.00
#The interquartile range of insect count
IQR(inspray$count)
## [1] 11.25
#Standard deviation of insect count
sd(inspray$count)
## [1] 7.203286
#summary of InsectSprays dataset
summary(inspray)
##      count       spray 
##  Min.   : 0.00   A:12  
##  1st Qu.: 3.00   B:12  
##  Median : 7.00   C:12  
##  Mean   : 9.50   D:12  
##  3rd Qu.:14.25   E:12  
##  Max.   :26.00   F:12
#The histogram of Insect Sprays count
hist(inspray$count)

#The boxplot of Insect Sprays data
boxplot(count ~ spray, data = inspray,
        xlab = "Type of spray", ylab = "Insect count",
        main = "InsectSprays data", varwidth = TRUE, col = "lightgreen")
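
The boxplot compares the sprays visually. As a possible numeric companion, the mean and standard deviation of the count can be computed for each spray with tapply() from base R. The object names group_mean and group_sd are chosen here only for illustration.

```r
inspray <- datasets::InsectSprays

# Apply mean() and sd() to the counts within each spray level
group_mean <- tapply(inspray$count, inspray$spray, mean)
group_sd   <- tapply(inspray$count, inspray$spray, sd)

# One row per spray, rounded for readability
round(cbind(mean = group_mean, sd = group_sd), 2)
```

Because every spray has 12 observations, the average of the six group means equals the overall mean of 9.5 reported earlier.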

#call the codebook function
codebook (inspray)   
## ================================================================================
## 
##    count
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: double
## 
##         Min:  0.000
##         Max: 26.000
##        Mean:  9.500
##    Std.Dev.:  7.153
##    Skewness:  0.571
##    Kurtosis: -0.774
## 
## ================================================================================
## 
##    spray
## 
## --------------------------------------------------------------------------------
## 
##    Storage mode: integer
##    Factor with 6 levels
## 
##    Levels and labels     N Valid
##                                 
##    1 'A'                12  16.7
##    2 'B'                12  16.7
##    3 'C'                12  16.7
##    4 'D'                12  16.7
##    5 'E'                12  16.7
##    6 'F'                12  16.7
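
The codebook reports Std.Dev. 7.153 for count, while sd() returned 7.203286 earlier. The difference is consistent with the codebook dividing by n and sd() dividing by n - 1. This is inferred from the numbers, not taken from the memisc documentation. A sketch in base R that reproduces both figures:

```r
x <- datasets::InsectSprays$count
n <- length(x)

sample_sd     <- sd(x)                            # divisor n - 1
population_sd <- sqrt(sum((x - mean(x))^2) / n)   # divisor n

round(c(sample = sample_sd, population = population_sd), 3)
```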

2.c)

I created the dataset used in this question myself. First of all, we load the dplyr library.

library (dplyr)

Next, we read both datasets from CSV files and display the first 16 rows of laptop_price and laptop_spec.
The laptop_spec dataset is used in question c)iv, where two data frames are combined.

setwd("D:/")
laptop_price <- read.csv("./laptop_price.csv")
laptop_spec <- read.csv("./laptop_spec.csv")

head(laptop_price, n=16)
##    laptop_ID Company         Product  Ram   Price
## 1          1   Apple     MacBook Pro  8GB 1339.69
## 2          2   Apple     Macbook Air  8GB  898.94
## 3          3      HP          250 G6  8GB  575.00
## 4          4   Apple     MacBook Pro 16GB 2537.45
## 5          5   Apple     MacBook Pro  8GB 1803.60
## 6          6    Acer        Aspire 3  4GB  400.00
## 7          7   Apple     MacBook Pro 16GB 2139.97
## 8          8   Apple     Macbook Air  8GB 1158.70
## 9          9    Asus ZenBook UX430UN 16GB 1495.00
## 10        10    Acer         Swift 3  8GB  770.00
## 11        11      HP          250 G6  4GB  393.90
## 12        12      HP          250 G6  4GB  344.99
## 13        13   Apple     MacBook Pro 16GB 2439.97
## 14        14    Dell   Inspiron 3567  4GB  498.90
## 15        15   Apple     MacBook 12"  8GB 1262.40
## 16        16   Apple     MacBook Pro  8GB 1518.55
head(laptop_spec, n=16)
##    laptop_ID         Product  Ram                        Cpu      OpSys
## 1          1     MacBook Pro  8GB       Intel Core i5 2.3GHz      macOS
## 2          2     Macbook Air  8GB       Intel Core i5 1.8GHz      macOS
## 3          3          250 G6  8GB Intel Core i5 7200U 2.5GHz      No OS
## 4          4     MacBook Pro 16GB       Intel Core i7 2.7GHz      macOS
## 5          5     MacBook Pro  8GB       Intel Core i5 3.1GHz      macOS
## 6          6        Aspire 3  4GB    AMD A9-Series 9420 3GHz Windows 10
## 7          7     MacBook Pro 16GB       Intel Core i7 2.2GHz   Mac OS X
## 8          8     Macbook Air  8GB       Intel Core i5 1.8GHz      macOS
## 9          9 ZenBook UX430UN 16GB Intel Core i7 8550U 1.8GHz Windows 10
## 10        10         Swift 3  8GB Intel Core i5 8250U 1.6GHz Windows 10
## 11        11          250 G6  4GB Intel Core i5 7200U 2.5GHz      No OS
## 12        12          250 G6  4GB   Intel Core i3 6006U 2GHz      No OS
## 13        13     MacBook Pro 16GB       Intel Core i7 2.8GHz      macOS
## 14        14   Inspiron 3567  4GB   Intel Core i3 6006U 2GHz Windows 10
## 15        15     MacBook 12"  8GB     Intel Core M m3 1.2GHz      macOS
## 16        16     MacBook Pro  8GB       Intel Core i5 2.3GHz      macOS

c)i. Change the existing column name to something new.
To rename a column in a data frame, we can use the rename function in dplyr, which takes arguments of the form new_name = old_name. In this example, we rename the Company column in the laptop_price dataset to Brand.

laptop_price <- dplyr::rename(laptop_price, Brand = Company)

head(laptop_price, n=10)
##    laptop_ID Brand         Product  Ram   Price
## 1          1 Apple     MacBook Pro  8GB 1339.69
## 2          2 Apple     Macbook Air  8GB  898.94
## 3          3    HP          250 G6  8GB  575.00
## 4          4 Apple     MacBook Pro 16GB 2537.45
## 5          5 Apple     MacBook Pro  8GB 1803.60
## 6          6  Acer        Aspire 3  4GB  400.00
## 7          7 Apple     MacBook Pro 16GB 2139.97
## 8          8 Apple     Macbook Air  8GB 1158.70
## 9          9  Asus ZenBook UX430UN 16GB 1495.00
## 10        10  Acer         Swift 3  8GB  770.00

c)ii. Pick rows based on their values.
In this example, we use the filter function in dplyr to keep only the rows that satisfy given conditions. We select the laptops whose price is more than 1000 and whose brand is Apple.

dplyr::filter(laptop_price, Price > 1000, Brand == "Apple")
##   laptop_ID Brand     Product  Ram   Price
## 1         1 Apple MacBook Pro  8GB 1339.69
## 2         4 Apple MacBook Pro 16GB 2537.45
## 3         5 Apple MacBook Pro  8GB 1803.60
## 4         7 Apple MacBook Pro 16GB 2139.97
## 5         8 Apple Macbook Air  8GB 1158.70
## 6        13 Apple MacBook Pro 16GB 2439.97
## 7        15 Apple MacBook 12"  8GB 1262.40
## 8        16 Apple MacBook Pro  8GB 1518.55
## 9        18 Apple MacBook Pro 16GB 2858.00
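
Conditions separated by commas in filter are combined with a logical AND. The sketch below illustrates this on a hypothetical four-row excerpt, toy_price, copied from the rows printed earlier so that it runs without the CSV file. Columns are referred to by their bare names inside filter.

```r
library(dplyr)

# Hypothetical excerpt of laptop_price (first four rows shown earlier)
toy_price <- data.frame(
  laptop_ID = c(1, 2, 3, 4),
  Brand     = c("Apple", "Apple", "HP", "Apple"),
  Product   = c("MacBook Pro", "Macbook Air", "250 G6", "MacBook Pro"),
  Ram       = c("8GB", "8GB", "8GB", "16GB"),
  Price     = c(1339.69, 898.94, 575.00, 2537.45)
)

# Comma-separated conditions and an explicit & select the same rows
comma_form <- filter(toy_price, Price > 1000, Brand == "Apple")
and_form   <- filter(toy_price, Price > 1000 & Brand == "Apple")

comma_form
```

Only laptops 1 and 4 remain: laptop 2 is an Apple product below the price threshold, and laptop 3 fails both conditions.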

c)iii. Add new columns to a data frame.
We can use the mutate function in dplyr to add a new column to a data frame. In this example, we add a new column called Full_Product_Name by combining the Brand and Product columns with paste, which separates the two values with a space.

laptop_price <- dplyr::mutate(laptop_price, Full_Product_Name = paste(Brand, Product))

head(laptop_price, n=10)
##    laptop_ID Brand         Product  Ram   Price    Full_Product_Name
## 1          1 Apple     MacBook Pro  8GB 1339.69    Apple MacBook Pro
## 2          2 Apple     Macbook Air  8GB  898.94    Apple Macbook Air
## 3          3    HP          250 G6  8GB  575.00            HP 250 G6
## 4          4 Apple     MacBook Pro 16GB 2537.45    Apple MacBook Pro
## 5          5 Apple     MacBook Pro  8GB 1803.60    Apple MacBook Pro
## 6          6  Acer        Aspire 3  4GB  400.00        Acer Aspire 3
## 7          7 Apple     MacBook Pro 16GB 2139.97    Apple MacBook Pro
## 8          8 Apple     Macbook Air  8GB 1158.70    Apple Macbook Air
## 9          9  Asus ZenBook UX430UN 16GB 1495.00 Asus ZenBook UX430UN
## 10        10  Acer         Swift 3  8GB  770.00         Acer Swift 3

c)iv. Combine data across two or more data frames.
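The left_join function in dplyr joins two data frames and keeps every row of the first one. In this example, we combine laptop_price and laptop_spec with left_join. Because no by argument is given, dplyr joins on all the columns that the two data frames share, here “laptop_ID”, “Product” and “Ram”, as the message printed after the call shows.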

newdata <- dplyr::left_join(laptop_price,laptop_spec)
## Joining, by = c("laptop_ID", "Product", "Ram")
head(newdata, n=10)
##    laptop_ID Brand         Product  Ram   Price    Full_Product_Name
## 1          1 Apple     MacBook Pro  8GB 1339.69    Apple MacBook Pro
## 2          2 Apple     Macbook Air  8GB  898.94    Apple Macbook Air
## 3          3    HP          250 G6  8GB  575.00            HP 250 G6
## 4          4 Apple     MacBook Pro 16GB 2537.45    Apple MacBook Pro
## 5          5 Apple     MacBook Pro  8GB 1803.60    Apple MacBook Pro
## 6          6  Acer        Aspire 3  4GB  400.00        Acer Aspire 3
## 7          7 Apple     MacBook Pro 16GB 2139.97    Apple MacBook Pro
## 8          8 Apple     Macbook Air  8GB 1158.70    Apple Macbook Air
## 9          9  Asus ZenBook UX430UN 16GB 1495.00 Asus ZenBook UX430UN
## 10        10  Acer         Swift 3  8GB  770.00         Acer Swift 3
##                           Cpu      OpSys
## 1        Intel Core i5 2.3GHz      macOS
## 2        Intel Core i5 1.8GHz      macOS
## 3  Intel Core i5 7200U 2.5GHz      No OS
## 4        Intel Core i7 2.7GHz      macOS
## 5        Intel Core i5 3.1GHz      macOS
## 6     AMD A9-Series 9420 3GHz Windows 10
## 7        Intel Core i7 2.2GHz   Mac OS X
## 8        Intel Core i5 1.8GHz      macOS
## 9  Intel Core i7 8550U 1.8GHz Windows 10
## 10 Intel Core i5 8250U 1.6GHz Windows 10
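
The join keys can also be stated explicitly with the by argument, which avoids relying on the automatic choice of shared columns. The sketch below uses two hypothetical excerpts, toy_price and toy_spec, built from rows printed earlier. The spec excerpt deliberately omits laptop 3 to show how left_join treats a row without a match.

```r
library(dplyr)

# Hypothetical excerpts of the two datasets (values copied from the output above)
toy_price <- data.frame(
  laptop_ID = c(1, 2, 3),
  Product   = c("MacBook Pro", "Macbook Air", "250 G6"),
  Ram       = c("8GB", "8GB", "8GB"),
  Price     = c(1339.69, 898.94, 575.00)
)
toy_spec <- data.frame(
  laptop_ID = c(1, 2),
  Product   = c("MacBook Pro", "Macbook Air"),
  Ram       = c("8GB", "8GB"),
  OpSys     = c("macOS", "macOS")
)

# Name the key columns explicitly instead of letting dplyr pick them
joined <- left_join(toy_price, toy_spec, by = c("laptop_ID", "Product", "Ram"))

joined
```

All three rows of toy_price are kept. Laptop 3 has no matching row in toy_spec, so its OpSys value is NA.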