The model correctly predicted the positive class for the screening test 120 times (true positives) and incorrectly predicted it 10 times (false positives).
The model correctly predicted the negative class for the screening test 50 times (true negatives) and incorrectly predicted it 15 times (false negatives).
The following can be computed from this confusion matrix:
The model made 170 correct predictions (120 + 50).
The model made 25 incorrect predictions (10 + 15).
There are 195 total scored cases (120 + 15 + 10 + 50).
The error rate is 5/39 = 0.1282 (incorrect predictions divided by total scored cases, 25/195).
The overall accuracy rate is 34/39 = 0.8718
The precision is 12/13 = 0.9231
The sensitivity is 8/9 = 0.8889
The specificity is 5/6 = 0.8333
The negative predictive value is 10/13 = 0.7692
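The figures above can be reproduced with a few lines of R. This is a minimal sketch: the variable names (tp, fp, tn, fn, metrics) are chosen here for illustration and do not come from the original analysis.

```r
# Cell counts taken from the confusion matrix described above
tp <- 120  # predicted positive, actually positive
fp <- 10   # predicted positive, actually negative
tn <- 50   # predicted negative, actually negative
fn <- 15   # predicted negative, actually positive

total <- tp + fp + tn + fn

metrics <- c(
  accuracy    = (tp + tn) / total,
  error_rate  = (fp + fn) / total,
  precision   = tp / (tp + fp),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  npv         = tn / (tn + fn)
)

round(metrics, 4)
```

Accuracy and error rate always sum to 1, which is a quick sanity check on the counts.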
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
iris
class(iris)
## [1] "data.frame"
sapply(iris, class)
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
##    "numeric"    "numeric"    "numeric"    "numeric"     "factor"
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
The standard deviation of the sepal length is 0.8280661
sd(iris$Sepal.Length)
## [1] 0.8280661
The standard deviation of the sepal width is 0.4358663
sd(iris$Sepal.Width)
## [1] 0.4358663
The standard deviation of the petal length is 1.7652982
sd(iris$Petal.Length)
## [1] 1.765298
The standard deviation of the petal width is 0.7622377
sd(iris$Petal.Width)
## [1] 0.7622377
The variance of the sepal length is 0.6856935
var(iris$Sepal.Length)
## [1] 0.6856935
The variance of the sepal width is 0.1899794
var(iris$Sepal.Width)
## [1] 0.1899794
The variance of the petal length is 3.1162779
var(iris$Petal.Length)
## [1] 3.116278
The variance of the petal width is 0.5810063
var(iris$Petal.Width)
## [1] 0.5810063
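The eight calls above can also be collapsed into one table. The sketch below is an alternative way to get the same numbers; the names numeric_cols and spread are introduced here for illustration.

```r
# Keep only the numeric measurement columns (drops the Species factor)
numeric_cols <- iris[sapply(iris, is.numeric)]

# One row per measurement, with its standard deviation and variance
spread <- data.frame(
  sd       = sapply(numeric_cols, sd),
  variance = sapply(numeric_cols, var)
)

spread
```

Because the variance is the square of the standard deviation, the second column should equal the first column squared.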
Numerical Data
Sepal.Length = iris$Sepal.Length
hist(Sepal.Length)
The histogram above shows sepal length on the x-axis and the frequency of observations on the y-axis, with a bin width of 0.5. The tallest bars fall between 5.5 and 6.5, so this is the most common range of sepal lengths, although it does not contain more than half of the observations.
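The bin width and the bar heights can be read from the histogram object instead of the plot. This is a small sketch that assumes the default settings of hist(); the name h is chosen here for illustration.

```r
# Compute the histogram without drawing it
h <- hist(iris$Sepal.Length, plot = FALSE)

h$breaks        # bin edges
diff(h$breaks)  # width of each bin
h$counts        # number of observations in each bin

# Midpoint of the bin with the most observations
h$mids[which.max(h$counts)]
```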
Categorical Data
Species = iris$Species
table(Species)
## Species
##     setosa versicolor  virginica
##         50         50         50
From the table, we can see that all 3 species have the same number of observations: 50 each.
barplot(table(Species))
The bar graph above shows Species on the x-axis and the frequency of each species on the y-axis. Since all the species have the same number of observations, the bars are all the same height.
Dataset obtained from https://github.com/cmdlinetips/data/blob/master/sample_data_to_convert_column_to_datetime_pandas.csv
library(dplyr)
info = read.csv("https://raw.githubusercontent.com/cmdlinetips/data/master/sample_data_to_convert_column_to_datetime_pandas.csv")
info
rename() is a dplyr function that renames columns of a data frame.
The call has the form rename(X, B = A), where X is the data frame, B is the new column name, and A is the old column name.
info <- rename(info, "Date" = "date", "Precipitation" = "precipitation","Maximum temp" = "temp_max","Minimum temp" = "temp_min","Wind" = "wind","Weather" = "weather")
info
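To show the new = old pattern of rename() in isolation, here is a minimal sketch on a small made-up data frame. The data frame toy and its values are hypothetical and are not taken from the weather dataset.

```r
library(dplyr)

# Hypothetical toy data frame, not the weather dataset used above
toy <- data.frame(temp_max = c(12.8, 10.6), temp_min = c(5.0, 2.8))

# The new name goes on the left of "=", the old name on the right
toy_renamed <- rename(toy, max_temp = temp_max, min_temp = temp_min)

names(toy_renamed)
```

Only the column names change; the values and the column order stay as they were.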
rain <- filter(info, Weather == "rain")
rain
The output lists only the rows whose Weather column has the value “rain”.
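The same kind of row selection can be tried on a small made-up data frame. This is a sketch: toy and its values are hypothetical, and the base R subsetting line is included only for comparison.

```r
library(dplyr)

# Hypothetical toy data frame, not the weather dataset used above
toy <- data.frame(
  Weather = c("rain", "sun", "rain", "snow"),
  Wind = c(4.7, 4.5, 2.3, 6.1),
  stringsAsFactors = FALSE
)

# Keep the rows where the condition is TRUE
rain_only <- filter(toy, Weather == "rain")

# Equivalent selection with base R subsetting
base_rain <- toy[toy$Weather == "rain", ]

rain_only
```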
info["New"] <- "Value"
info
A new column named “New” is added to the data frame, with the value “Value” in every row.
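Assigning a single value to a new column repeats that value for every row. The sketch below shows this on a hypothetical data frame, followed by dplyr's mutate() as another way to add a column, here one computed from existing columns. The names toy and temp_range are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data frame, not the weather dataset used above
toy <- data.frame(temp_max = c(12.8, 10.6, 11.7), temp_min = c(5.0, 2.8, 7.2))

# A single value is recycled to fill every row
toy["New"] <- "Value"

# mutate() adds a column computed from existing columns
toy <- mutate(toy, temp_range = temp_max - temp_min)

toy
```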
A second dataset is obtained from https://github.com/cmdlinetips/data/blob/master/combine_year_month_day_into_date_pandas.csv
newInfo = read.csv("https://raw.githubusercontent.com/cmdlinetips/data/master/combine_year_month_day_into_date_pandas.csv")
newInfo
xy = bind_cols(info, newInfo)
xy
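bind_cols() places the columns of two data frames side by side and matches rows by position, so it expects both inputs to have the same number of rows. The text does not show whether the two CSV files meet that condition, so the sketch below uses two small made-up data frames (left and right are hypothetical names) and checks the row counts first.

```r
library(dplyr)

# Hypothetical data frames with the same number of rows
left  <- data.frame(id = 1:3, wind = c(4.7, 4.5, 2.3))
right <- data.frame(year = c(2012, 2012, 2012), month = c(1, 1, 1), day = 1:3)

# Rows are paired by position, so check the row counts before binding
stopifnot(nrow(left) == nrow(right))

combined <- bind_cols(left, right)

dim(combined)
names(combined)
```

When the two tables share a key column, a join such as left_join() is usually the safer choice, since it matches rows by value instead of by position.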