By: IRDINA BATRISYIA BINTI ASRUL NIZAM (17186649/2)
The model made (120+50) = 170 correct predictions
The model made (10+15) = 25 incorrect predictions
There are (120 + 15 + 10 + 50) = 195 total scored cases
The error rate = 25/195 = 0.1282
The overall accuracy rate = 170/195 = 0.8718
Sensitivity = TP/(TP+FN) = 120/130 = 0.92
Specificity = TN/(TN+FP) = 50/65 = 0.77
Precision = TP/(TP+FP) = 120/135 = 0.89
Negative predictive value = TN/(TN+FN) = 50/60 = 0.83
F-measure = 2 × Recall × Precision / (Recall + Precision) = 0.91
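The arithmetic above can be reproduced in a few lines of R. This is only a sketch: it assumes the cell assignment TP = 120, TN = 50, FP = 15, FN = 10, which is inferred from the sums and from the reported sensitivity and precision rather than taken from a printed confusion matrix. All variable names are illustrative.

```r
# Assumed confusion-matrix cells (inferred, not given explicitly in the text)
TP <- 120  # true positives
TN <- 50   # true negatives
FP <- 15   # false positives
FN <- 10   # false negatives

total       <- TP + TN + FP + FN
accuracy    <- (TP + TN) / total
error_rate  <- (FP + FN) / total
sensitivity <- TP / (TP + FN)   # also called recall
specificity <- TN / (TN + FP)
precision   <- TP / (TP + FP)
npv         <- TN / (TN + FN)   # negative predictive value
f_measure   <- 2 * sensitivity * precision / (sensitivity + precision)

# Collect everything in one named vector, rounded to 4 decimals
metrics <- round(c(accuracy = accuracy, error_rate = error_rate,
                   sensitivity = sensitivity, specificity = specificity,
                   precision = precision, npv = npv,
                   f_measure = f_measure), 4)
print(metrics)
```

Every metric here is a proportion, so a value outside the range 0 to 1 signals a calculation error.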
library(ggplot2)
library(dplyr)
library(memisc)
data.df<- as.data.frame(HairEyeColor)
str(data.df)
## 'data.frame': 32 obs. of 4 variables:
## $ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2 ...
## $ Eye : Factor w/ 4 levels "Brown","Blue",..: 1 1 1 1 2 2 2 2 3 3 ...
## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
## $ Freq: num 32 53 10 3 11 50 10 30 10 25 ...
summary(HairEyeColor)
## Number of cases in table: 592
## Number of factors: 3
## Test for independence of all factors:
## Chisq = 164.92, df = 24, p-value = 5.321e-23
## Chi-squared approximation may be incorrect
qplot(data = data.df, Eye, Freq, geom="boxplot", color=Sex)
Most males and females have blue or brown eyes.
qplot(data = data.df, Hair, Freq, geom="boxplot", color=Sex)
Most males and females have brown hair.
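The boxplots show how the cell frequencies are spread, not the totals per category. As a complementary check on the two statements above, the frequencies can be summed directly with dplyr. This is a minimal sketch; `eye_totals` and `hair_totals` are illustrative names that do not appear in the original analysis.

```r
library(dplyr)

data.df <- as.data.frame(HairEyeColor)

# Total number of people per eye colour and sex, largest groups first
eye_totals <- data.df %>%
  group_by(Eye, Sex) %>%
  summarise(Total = sum(Freq), .groups = "drop") %>%
  arrange(Sex, desc(Total))

# Same aggregation for hair colour
hair_totals <- data.df %>%
  group_by(Hair, Sex) %>%
  summarise(Total = sum(Freq), .groups = "drop") %>%
  arrange(Sex, desc(Total))

print(eye_totals)
print(hair_totals)
```

Each table has one row per colour and sex, and the `Total` column sums to the 592 cases reported by `summary(HairEyeColor)`.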
Density plot of different eye colors
qplot(data=data.df, Eye, geom="density", fill=Eye, alpha=I(0.6))
codebook(data.df) # call the codebook function
## ================================================================================
##
## Hair
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Factor with 4 levels
##
## Levels and labels N Valid
##
## 1 'Black' 8 25.0
## 2 'Brown' 8 25.0
## 3 'Red' 8 25.0
## 4 'Blond' 8 25.0
##
## ================================================================================
##
## Eye
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Factor with 4 levels
##
## Levels and labels N Valid
##
## 1 'Brown' 8 25.0
## 2 'Blue' 8 25.0
## 3 'Hazel' 8 25.0
## 4 'Green' 8 25.0
##
## ================================================================================
##
## Sex
##
## --------------------------------------------------------------------------------
##
## Storage mode: integer
## Factor with 2 levels
##
## Levels and labels N Valid
##
## 1 'Male' 16 50.0
## 2 'Female' 16 50.0
##
## ================================================================================
##
## Freq
##
## --------------------------------------------------------------------------------
##
## Storage mode: double
##
## Min: 2.000
## Max: 66.000
## Mean: 18.500
## Std.Dev.: 17.955
## Skewness: 1.364
## Kurtosis: 0.749
Demonstrate useful functions of dplyr for data manipulation for the following:
# Load the dataset from the source URL into a data frame
c <- read.csv("https://raw.githubusercontent.com/IrdnaBtrisyia/AAQ2/main/Movies.csv")
View(c)
# Show the original variable names
names(c)
## [1] "movie_title" "release_date"
## [3] "total_gross" "inflation_adjusted_gross"
c <- dplyr::rename(c, "Title" = "movie_title", "Release" = "release_date",
                   "Income" = "total_gross", "AGI" = "inflation_adjusted_gross")
# Show the new variable names
names(c)
## [1] "Title" "Release" "Income" "AGI"
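`dplyr::rename()` takes pairs of the form `new_name = old_name`, and the names may also be written unquoted. The sketch below illustrates this on a one-row data frame built from the original column names and a row printed later in this report. The objects `raw`, `renamed` and `upper` are hypothetical and only serve the example.

```r
library(dplyr)

# One-row stand-in for the Movies data, using the original column names
raw <- data.frame(
  movie_title              = "The Jerky Boys",
  release_date             = "2/3/1995",
  total_gross              = 7555256,
  inflation_adjusted_gross = 14641561
)

# rename() uses new_name = old_name; quotes are optional
renamed <- raw %>%
  rename(Title = movie_title, Release = release_date,
         Income = total_gross, AGI = inflation_adjusted_gross)

# rename_with() applies a function to the column names instead
upper <- raw %>% rename_with(toupper)

names(renamed)
names(upper)
```

`rename()` keeps all columns and their order, so it is a safe first step before filtering or mutating.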
Show the row where AGI equals 14641561
show(dplyr::filter(c, AGI == 14641561))
## Title Release Income AGI
## 1 The Jerky Boys 2/3/1995 7555256 14641561
Show the row where Income equals 14276095
show(dplyr::filter(c, Income == 14276095))
## Title Release Income AGI
## 1 Baby: Secret of the Lost Legend 3/22/1985 14276095 33900697
Add a new column, Deductions, computed as AGI minus Income
c <- c %>%
  dplyr::select(Title:AGI) %>%
  dplyr::mutate(Deductions = AGI - Income)
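As a quick check of the `mutate()` step, the two rows printed by the `filter()` calls above can be rebuilt by hand and passed through the same calculation. This is a sketch on copied values rather than on the downloaded file. `sample_rows` and `with_deductions` are hypothetical names.

```r
library(dplyr)

# Two rows copied from the filter() output shown earlier
sample_rows <- data.frame(
  Title   = c("The Jerky Boys", "Baby: Secret of the Lost Legend"),
  Release = c("2/3/1995", "3/22/1985"),
  Income  = c(7555256, 14276095),
  AGI     = c(14641561, 33900697)
)

# Deductions is the inflation-adjusted gross minus the original gross
with_deductions <- sample_rows %>%
  mutate(Deductions = AGI - Income)

print(with_deductions)
```

For these two rows the new column works out to 7086305 and 19624602.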
Join the data frame with the Genre data frame and store the result in a new data frame called movies
genre <- read.csv("https://raw.githubusercontent.com/IrdnaBtrisyia/AAQ2/main/Genre.csv")
genre <- dplyr::rename(genre, "Title" = "movie_title")
movies <- dplyr::inner_join(c, genre, by = "Title")
View(movies)
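An inner join keeps only the titles that appear in both tables, so rows without a match are dropped silently. The sketch below uses small made-up tables to show how `inner_join()`, `left_join()` and `anti_join()` differ. The titles and genre values are hypothetical and do not come from the Movies or Genre files.

```r
library(dplyr)

# Hypothetical toy tables for illustration only
income <- data.frame(
  Title  = c("Movie A", "Movie B", "Movie C"),
  Income = c(100, 200, 300)
)
genre <- data.frame(
  Title = c("Movie A", "Movie B", "Movie D"),
  genre = c("Comedy", "Drama", "Adventure")
)

matched   <- inner_join(income, genre, by = "Title")  # titles present in both tables
all_left  <- left_join(income, genre, by = "Title")   # every income row, NA genre if unmatched
unmatched <- anti_join(income, genre, by = "Title")   # income rows with no genre entry

print(matched)
print(all_left)
print(unmatched)
```

Running `anti_join()` on the real data frames before the inner join would show which movies, if any, are lost because they have no genre entry.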
Reference: https://www.kaggle.com/prateekmaj21/disney-movies