The data set used for this activity is “diabetes_data_upload.csv” from the UCI repository. It was selected because its class variable, which records a positive or negative test result, can serve as the predicted variable for our model. Here the data is read into our workspace, followed by its structure and the class counts so we can view the different variables:
data <- read.csv("diabetes_data_upload.csv")
str(data)
## 'data.frame': 520 obs. of 17 variables:
## $ Age : int 40 58 41 45 60 55 57 66 67 70 ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Polyuria : chr "No" "No" "Yes" "No" ...
## $ Polydipsia : chr "Yes" "No" "No" "No" ...
## $ sudden.weight.loss: chr "No" "No" "No" "Yes" ...
## $ weakness : chr "Yes" "Yes" "Yes" "Yes" ...
## $ Polyphagia : chr "No" "No" "Yes" "Yes" ...
## $ Genital.thrush : chr "No" "No" "No" "Yes" ...
## $ visual.blurring : chr "No" "Yes" "No" "No" ...
## $ Itching : chr "Yes" "No" "Yes" "Yes" ...
## $ Irritability : chr "No" "No" "No" "No" ...
## $ delayed.healing : chr "Yes" "No" "Yes" "Yes" ...
## $ partial.paresis : chr "No" "Yes" "No" "No" ...
## $ muscle.stiffness : chr "Yes" "No" "Yes" "No" ...
## $ Alopecia : chr "Yes" "Yes" "Yes" "No" ...
## $ Obesity : chr "Yes" "No" "No" "No" ...
## $ class : chr "Positive" "Positive" "Positive" "Positive" ...
table(data$class)
##
## Negative Positive
## 200 320
There are 17 variables in total. Age and gender are included, along with the class variable stating whether the patient is positive or negative. The remaining 14 variables are symptoms or conditions that the patient did or did not experience, each recorded as yes or no. For simplicity, we dropped the conditions we were not familiar with and kept only Age, Gender, weakness, Itching, Irritability, Obesity, and class.
data.subset <- data[c('Age','Gender','weakness','Itching','Irritability','Obesity','class')]
str(data.subset)
## 'data.frame': 520 obs. of 7 variables:
## $ Age : int 40 58 41 45 60 55 57 66 67 70 ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ weakness : chr "Yes" "Yes" "Yes" "Yes" ...
## $ Itching : chr "Yes" "No" "Yes" "Yes" ...
## $ Irritability: chr "No" "No" "No" "No" ...
## $ Obesity : chr "Yes" "No" "No" "No" ...
## $ class : chr "Positive" "Positive" "Positive" "Positive" ...
table(data.subset$class)
##
## Negative Positive
## 200 320
normalized <- data.frame(data.subset)
normalized <- data.matrix(normalized)
print(normalized)
## Age Gender weakness Itching Irritability Obesity class
## [1,] 40 2 2 2 1 2 2
## [2,] 58 2 2 1 1 1 2
## [3,] 41 2 2 2 1 1 2
## [4,] 45 2 2 2 1 1 2
## [5,] 60 2 2 2 2 2 2
## [6,] 55 2 2 2 1 2 2
## [7,] 57 2 2 1 1 1 2
## [8,] 66 2 2 2 2 1 2
## [9,] 67 2 2 2 2 2 2
## [10,] 70 2 2 2 2 1 2
## [11,] 44 2 2 1 2 1 2
## [12,] 38 2 1 2 1 1 2
## [13,] 35 2 1 1 2 1 2
## [14,] 61 2 2 2 1 2 2
## [15,] 60 2 2 2 1 1 2
## [16,] 58 2 2 1 1 1 2
## [17,] 54 2 2 1 1 1 2
## [18,] 67 2 2 1 2 2 2
## [19,] 66 2 2 1 1 1 2
## [20,] 43 2 2 1 1 1 2
## [21,] 62 2 2 1 2 1 2
## [22,] 54 2 2 2 1 1 2
## [23,] 39 2 1 2 2 1 2
## [24,] 48 2 2 2 2 1 2
## [25,] 58 2 2 1 1 2 2
## [26,] 32 2 1 1 2 2 2
## [27,] 42 2 2 1 2 1 2
## [28,] 52 2 2 2 1 1 2
## [29,] 38 2 1 1 1 1 2
## [30,] 53 2 2 1 2 1 2
## [31,] 57 2 2 1 1 1 2
## [32,] 41 2 2 2 2 2 2
## [33,] 37 2 2 1 1 1 2
## [34,] 54 2 2 2 2 1 2
## [35,] 49 2 2 2 1 1 2
## [36,] 48 2 2 2 1 1 2
## [37,] 60 2 1 1 1 1 2
## [38,] 63 2 2 1 1 2 2
## [39,] 35 2 1 1 1 1 2
## [40,] 30 1 2 1 1 1 2
## [41,] 53 1 1 2 2 1 2
## [42,] 50 1 1 1 1 1 2
## [43,] 50 1 2 2 2 1 2
## [44,] 35 1 2 2 1 1 2
## [45,] 40 1 2 2 1 1 2
## [46,] 48 1 2 2 1 1 2
## [47,] 60 1 2 2 1 2 2
## [48,] 60 1 2 2 1 1 2
## [49,] 35 1 2 2 1 2 2
## [50,] 46 1 2 1 1 1 2
## [51,] 36 1 1 1 2 1 2
## [52,] 50 1 2 1 1 1 2
## [53,] 60 1 2 1 2 1 2
## [54,] 50 1 2 1 1 1 2
## [55,] 51 1 1 1 1 1 2
## [56,] 38 1 1 1 1 2 2
## [57,] 66 1 2 2 1 1 2
## [58,] 53 1 1 1 1 1 2
## [59,] 59 1 1 1 1 1 2
## [60,] 39 1 2 1 2 2 2
## [61,] 65 1 2 2 1 1 2
## [62,] 35 1 2 2 1 1 2
## [63,] 55 1 2 2 1 1 2
## [64,] 60 1 2 2 1 2 2
## [65,] 45 1 1 2 1 1 2
## [66,] 40 1 1 1 1 1 2
## [67,] 30 1 1 2 1 1 2
## [68,] 35 1 2 2 1 1 2
## [69,] 25 1 2 1 1 1 2
## [70,] 50 1 2 1 1 2 2
## [71,] 40 1 2 1 1 1 2
## [72,] 35 1 2 2 1 1 2
## [73,] 65 1 1 1 1 1 2
## [74,] 38 1 2 1 1 1 2
## [75,] 50 1 2 2 1 1 2
## [76,] 55 1 2 1 1 2 2
## [77,] 48 1 2 1 1 1 2
## [78,] 55 1 1 2 1 1 2
## [79,] 39 1 2 2 2 1 2
## [80,] 43 1 2 1 1 2 2
## [81,] 35 1 2 1 1 1 2
## [82,] 47 1 2 1 1 1 2
## [83,] 50 1 2 1 1 1 2
## [84,] 48 1 2 2 1 1 2
## [85,] 35 1 2 2 1 1 2
## [86,] 49 1 2 1 1 2 2
## [87,] 38 1 2 2 2 1 2
## [88,] 28 1 1 1 1 1 2
## [89,] 68 1 2 2 1 1 2
## [90,] 35 1 1 1 1 1 2
## [91,] 45 1 1 2 1 1 2
## [92,] 48 1 2 2 2 1 2
## [93,] 40 1 2 1 1 1 2
## [94,] 40 1 2 2 1 1 2
## [95,] 36 1 2 2 1 1 2
## [96,] 56 1 2 2 1 1 2
## [97,] 30 1 2 2 1 1 2
## [98,] 31 1 2 2 2 1 2
## [99,] 35 1 1 1 1 1 2
## [100,] 39 1 1 2 1 1 2
## [101,] 48 1 1 1 2 2 2
## [102,] 85 2 2 2 1 1 2
## [103,] 90 1 1 2 1 1 2
## [104,] 72 2 2 1 2 1 2
## [105,] 70 2 2 1 2 1 2
## [106,] 69 1 2 2 1 1 2
## [107,] 58 2 2 2 1 2 2
## [108,] 47 2 2 1 1 1 2
## [109,] 25 2 1 2 1 1 2
## [110,] 39 1 1 1 2 1 2
## [111,] 53 1 2 2 1 1 2
## [112,] 52 2 1 1 2 2 2
## [113,] 68 1 1 1 2 1 2
## [114,] 79 2 2 2 2 1 2
## [115,] 55 1 1 2 1 1 2
## [116,] 45 1 2 2 2 1 2
## [117,] 30 1 1 1 2 1 2
## [118,] 45 1 2 2 2 1 2
## [119,] 65 1 2 2 2 1 2
## [120,] 34 1 2 1 2 1 2
## [121,] 48 2 2 1 1 1 2
## [122,] 35 2 2 2 1 1 2
## [123,] 40 2 2 2 2 2 2
## [124,] 47 2 1 2 1 2 2
## [125,] 38 2 2 1 1 1 2
## [126,] 55 2 2 1 1 1 2
## [127,] 66 2 2 2 2 1 2
## [128,] 57 2 2 2 2 1 2
## [129,] 32 2 2 2 2 1 2
## [130,] 48 2 2 2 1 1 2
## [131,] 47 2 2 2 2 1 2
## [132,] 43 2 1 1 2 1 2
## [133,] 30 2 2 1 1 1 2
## [134,] 16 2 1 1 1 1 2
## [135,] 35 2 2 1 1 1 2
## [136,] 66 2 2 1 2 1 2
## [137,] 54 2 1 2 2 2 2
## [138,] 58 2 2 2 2 2 2
## [139,] 51 2 2 1 2 1 2
## [140,] 40 2 1 2 1 1 2
## [141,] 47 2 1 1 1 1 2
## [142,] 62 2 1 1 2 1 2
## [143,] 49 2 1 2 1 1 2
## [144,] 53 2 1 1 1 1 2
## [145,] 68 2 1 2 2 1 2
## [146,] 61 2 2 2 2 1 2
## [147,] 39 2 1 2 1 2 2
## [148,] 38 2 1 2 1 2 2
## [149,] 44 2 1 1 1 2 2
## [150,] 45 2 2 1 1 1 2
## [151,] 50 2 2 2 2 1 2
## [152,] 42 2 2 1 2 1 2
## [153,] 55 2 2 1 2 1 2
## [154,] 57 2 2 1 2 1 2
## [155,] 62 2 2 2 2 2 2
## [156,] 33 2 1 1 1 1 2
## [157,] 55 2 2 1 2 1 2
## [158,] 48 2 1 1 1 1 2
## [159,] 56 2 2 2 2 1 2
## [160,] 38 1 2 2 2 1 2
## [161,] 28 1 1 1 1 1 2
## [162,] 68 1 2 2 1 1 2
## [163,] 35 1 1 1 1 1 2
## [164,] 45 1 1 2 1 1 2
## [165,] 48 1 2 2 2 1 2
## [166,] 40 1 2 1 1 1 2
## [167,] 57 2 2 1 1 1 2
## [168,] 41 2 2 2 2 2 2
## [169,] 37 2 2 1 1 1 2
## [170,] 54 2 2 2 2 1 2
## [171,] 49 2 2 2 1 1 2
## [172,] 48 2 2 2 1 1 2
## [173,] 60 2 1 1 1 1 2
## [174,] 63 2 2 1 1 2 2
## [175,] 35 2 1 1 1 1 2
## [176,] 30 1 2 1 1 1 2
## [177,] 53 1 1 2 2 1 2
## [178,] 50 1 1 1 1 1 2
## [179,] 50 1 2 2 2 1 2
## [180,] 35 1 2 2 1 1 2
## [181,] 40 1 2 2 1 1 2
## [182,] 31 1 2 2 2 1 2
## [183,] 35 1 1 1 1 1 2
## [184,] 39 1 1 2 1 1 2
## [185,] 48 1 1 1 2 2 2
## [186,] 85 2 2 2 1 1 2
## [187,] 90 1 1 2 1 1 2
## [188,] 72 2 2 1 2 1 2
## [189,] 70 2 2 1 2 1 2
## [190,] 69 1 2 2 1 1 2
## [191,] 58 2 2 2 1 2 2
## [192,] 54 2 1 1 1 1 2
## [193,] 64 2 1 1 2 1 2
## [194,] 36 2 2 2 1 1 2
## [195,] 43 2 2 2 2 1 2
## [196,] 31 2 1 1 1 1 2
## [197,] 66 2 1 1 1 1 2
## [198,] 61 1 1 1 2 1 2
## [199,] 58 1 1 1 2 2 2
## [200,] 69 1 2 2 2 2 2
## [201,] 40 2 2 2 1 1 1
## [202,] 28 2 1 1 1 1 1
## [203,] 37 2 1 1 1 1 1
## [204,] 34 2 1 1 1 1 1
## [205,] 30 2 1 1 1 1 1
## [206,] 67 2 2 2 2 2 1
## [207,] 60 2 2 1 1 1 1
## [208,] 58 2 1 2 1 2 1
## [209,] 54 2 2 1 1 1 1
## [210,] 43 2 1 1 1 1 1
## [211,] 39 2 2 1 1 1 1
## [212,] 40 2 2 1 2 1 1
## [213,] 43 2 1 1 2 1 1
## [214,] 49 2 1 2 1 2 1
## [215,] 47 2 1 2 1 1 1
## [216,] 45 2 1 1 1 1 1
## [217,] 57 2 1 1 1 1 1
## [218,] 72 2 1 2 1 1 1
## [219,] 30 2 1 1 1 1 1
## [220,] 27 2 1 1 1 1 1
## [221,] 38 2 1 1 1 1 1
## [222,] 43 2 2 2 1 1 1
## [223,] 40 2 1 1 1 2 1
## [224,] 55 2 2 2 1 2 1
## [225,] 68 2 2 2 1 1 1
## [226,] 29 2 2 1 1 1 1
## [227,] 37 2 1 2 1 1 1
## [228,] 30 2 2 1 1 1 1
## [229,] 45 2 2 2 2 1 1
## [230,] 47 2 1 2 1 1 1
## [231,] 35 2 2 1 1 1 1
## [232,] 32 2 2 1 1 1 1
## [233,] 56 2 2 2 1 1 1
## [234,] 50 2 2 2 1 1 1
## [235,] 52 2 2 2 1 1 1
## [236,] 26 2 1 1 1 1 1
## [237,] 60 2 2 2 1 1 1
## [238,] 65 2 2 2 1 1 1
## [239,] 72 2 1 2 1 1 1
## [240,] 30 2 2 1 1 1 1
## [241,] 45 2 2 2 1 1 1
## [242,] 65 2 1 2 2 2 1
## [243,] 70 2 2 2 1 1 1
## [244,] 35 2 2 1 1 1 1
## [245,] 54 2 2 2 1 1 1
## [246,] 30 2 1 1 1 1 1
## [247,] 46 2 2 2 1 1 1
## [248,] 53 2 2 2 1 1 1
## [249,] 42 2 1 1 1 1 1
## [250,] 55 1 2 1 1 2 2
## [251,] 48 1 2 1 1 1 2
## [252,] 55 1 1 2 1 1 2
## [253,] 39 1 2 2 2 1 2
## [254,] 43 1 2 1 1 2 2
## [255,] 35 1 2 1 1 1 2
## [256,] 47 1 2 1 1 1 2
## [257,] 50 1 2 1 1 1 2
## [258,] 48 1 2 2 1 1 2
## [259,] 35 1 2 2 1 1 2
## [260,] 62 2 2 2 2 2 2
## [261,] 33 2 1 1 1 1 2
## [262,] 55 2 2 1 2 1 2
## [263,] 48 2 1 1 1 1 2
## [264,] 56 2 2 2 2 1 2
## [265,] 38 1 2 2 2 1 2
## [266,] 28 1 1 1 1 1 2
## [267,] 68 1 2 2 1 1 2
## [268,] 35 1 1 1 1 1 2
## [269,] 45 1 1 2 1 1 2
## [270,] 48 1 2 2 2 1 2
## [271,] 40 1 2 1 1 1 2
## [272,] 57 2 2 1 1 1 2
## [273,] 47 2 1 2 1 1 1
## [274,] 45 2 1 1 1 1 1
## [275,] 57 2 1 1 1 1 1
## [276,] 72 2 1 2 1 1 1
## [277,] 30 2 1 1 1 1 1
## [278,] 27 2 1 1 1 1 1
## [279,] 38 2 1 1 1 1 1
## [280,] 43 2 2 2 1 1 1
## [281,] 40 2 1 1 1 2 1
## [282,] 47 2 1 2 1 1 1
## [283,] 45 2 1 1 1 1 1
## [284,] 57 2 1 1 1 1 1
## [285,] 72 2 1 2 1 1 1
## [286,] 30 2 1 1 1 1 1
## [287,] 27 2 1 1 1 1 1
## [288,] 38 2 1 1 1 1 1
## [289,] 43 2 2 2 1 1 1
## [290,] 40 2 1 1 1 2 1
## [291,] 54 2 2 2 1 1 1
## [292,] 30 2 1 1 1 1 1
## [293,] 46 2 2 2 1 1 1
## [294,] 53 2 2 2 1 1 1
## [295,] 42 2 1 1 1 1 1
## [296,] 55 1 2 1 1 2 2
## [297,] 48 1 2 1 1 1 2
## [298,] 55 1 1 2 1 1 2
## [299,] 39 1 2 2 2 1 2
## [300,] 43 1 2 1 1 2 2
## [301,] 35 1 2 1 1 1 2
## [302,] 47 1 2 1 1 1 2
## [303,] 61 1 1 1 2 1 2
## [304,] 58 1 1 1 2 2 2
## [305,] 69 1 2 2 2 2 2
## [306,] 40 2 2 2 1 1 1
## [307,] 28 2 1 1 1 1 1
## [308,] 37 2 1 1 1 1 1
## [309,] 34 2 1 1 1 1 1
## [310,] 30 2 1 1 1 1 1
## [311,] 67 2 2 2 2 2 1
## [312,] 60 2 2 1 1 1 1
## [313,] 58 2 1 2 1 2 1
## [314,] 54 2 2 1 1 1 1
## [315,] 43 2 1 1 1 1 1
## [316,] 33 1 1 1 1 1 1
## [317,] 55 1 2 2 1 1 1
## [318,] 36 1 2 1 2 1 1
## [319,] 28 1 1 1 1 1 1
## [320,] 34 1 1 1 1 1 1
## [321,] 65 1 2 2 1 1 1
## [322,] 34 1 1 1 1 1 1
## [323,] 64 2 2 2 2 1 1
## [324,] 44 2 2 2 1 2 1
## [325,] 36 2 1 1 1 1 1
## [326,] 43 2 2 2 1 1 1
## [327,] 53 2 2 2 1 1 1
## [328,] 47 2 1 1 2 2 1
## [329,] 58 2 1 2 1 1 1
## [330,] 56 2 2 2 1 1 1
## [331,] 51 1 1 2 1 1 1
## [332,] 59 1 2 2 1 2 1
## [333,] 50 1 2 2 1 1 1
## [334,] 30 2 1 1 1 1 1
## [335,] 46 2 2 2 1 1 1
## [336,] 53 2 2 2 1 1 1
## [337,] 42 2 1 1 1 1 1
## [338,] 55 1 2 1 1 2 2
## [339,] 48 1 2 1 1 1 2
## [340,] 55 1 1 2 1 1 2
## [341,] 39 1 2 2 2 1 2
## [342,] 43 1 2 1 1 2 2
## [343,] 35 1 2 1 1 1 2
## [344,] 47 1 2 1 1 1 2
## [345,] 61 1 1 1 2 1 2
## [346,] 58 1 1 1 2 2 2
## [347,] 69 1 2 2 2 2 2
## [348,] 40 2 2 2 1 1 1
## [349,] 28 2 1 1 1 1 1
## [350,] 37 2 1 1 1 1 1
## [351,] 34 2 1 1 1 1 1
## [352,] 30 2 1 1 1 1 1
## [353,] 67 2 2 2 2 2 1
## [354,] 60 2 2 1 1 1 1
## [355,] 58 2 1 2 1 2 1
## [356,] 54 2 2 1 1 1 1
## [357,] 43 2 1 1 1 1 1
## [358,] 33 1 1 1 1 1 1
## [359,] 55 2 2 1 2 1 2
## [360,] 48 2 1 1 1 1 2
## [361,] 56 2 2 2 2 1 2
## [362,] 38 1 2 2 2 1 2
## [363,] 28 1 1 1 1 1 2
## [364,] 68 1 2 2 1 1 2
## [365,] 35 1 1 1 1 1 2
## [366,] 45 1 1 2 1 1 2
## [367,] 48 1 2 2 2 1 2
## [368,] 40 1 2 1 1 1 2
## [369,] 57 2 2 1 1 1 2
## [370,] 47 2 1 2 1 1 1
## [371,] 45 2 1 1 1 1 1
## [372,] 57 2 1 1 1 1 1
## [373,] 72 2 1 2 1 1 1
## [374,] 30 2 1 1 1 1 1
## [375,] 27 2 1 1 1 1 1
## [376,] 38 2 1 1 1 1 1
## [377,] 43 2 2 2 1 1 1
## [378,] 40 2 1 1 1 2 1
## [379,] 47 2 1 1 1 1 2
## [380,] 62 2 1 1 2 1 2
## [381,] 49 2 1 2 1 1 2
## [382,] 53 2 1 1 1 1 2
## [383,] 68 2 1 2 2 1 2
## [384,] 61 2 2 2 2 1 2
## [385,] 39 2 1 2 1 2 2
## [386,] 38 2 1 2 1 2 2
## [387,] 44 2 2 2 1 2 1
## [388,] 36 2 1 1 1 1 1
## [389,] 43 2 2 2 1 1 1
## [390,] 53 2 2 2 1 1 1
## [391,] 47 2 1 1 2 2 1
## [392,] 58 2 1 2 1 1 1
## [393,] 56 2 2 2 1 1 1
## [394,] 51 1 1 2 1 1 1
## [395,] 59 1 2 2 1 2 1
## [396,] 50 1 2 2 1 1 1
## [397,] 30 2 1 1 1 1 1
## [398,] 46 2 2 2 1 1 1
## [399,] 53 2 2 2 1 1 1
## [400,] 64 2 2 2 2 1 1
## [401,] 44 2 2 2 1 2 1
## [402,] 36 2 1 1 1 1 1
## [403,] 43 2 2 2 1 1 1
## [404,] 53 2 2 2 1 1 1
## [405,] 47 2 1 1 2 2 1
## [406,] 58 2 1 2 1 1 1
## [407,] 56 2 2 2 1 1 1
## [408,] 51 1 1 2 1 1 1
## [409,] 59 1 2 2 1 2 1
## [410,] 50 1 2 2 1 1 1
## [411,] 30 2 1 1 1 1 1
## [412,] 46 2 2 2 1 1 1
## [413,] 53 2 2 2 1 1 1
## [414,] 42 2 1 1 1 1 1
## [415,] 55 1 2 1 1 2 2
## [416,] 48 1 2 1 1 1 2
## [417,] 55 1 1 2 1 1 2
## [418,] 39 1 2 2 2 1 2
## [419,] 43 1 2 1 1 2 2
## [420,] 35 1 2 1 1 1 2
## [421,] 47 1 2 1 1 1 2
## [422,] 61 1 1 1 2 1 2
## [423,] 67 2 2 1 2 2 2
## [424,] 66 2 2 1 1 1 2
## [425,] 43 2 2 1 1 1 2
## [426,] 62 2 2 1 2 1 2
## [427,] 54 2 2 2 1 1 2
## [428,] 39 2 1 2 2 1 2
## [429,] 48 2 2 2 2 1 2
## [430,] 58 2 2 1 1 2 2
## [431,] 32 2 1 1 2 2 2
## [432,] 42 2 2 1 2 1 2
## [433,] 52 2 2 2 1 1 2
## [434,] 38 2 1 1 1 1 2
## [435,] 53 2 2 1 2 1 2
## [436,] 57 2 2 1 1 1 2
## [437,] 41 2 2 2 2 2 2
## [438,] 37 2 2 1 1 1 2
## [439,] 54 2 2 2 2 1 2
## [440,] 49 2 2 2 1 1 2
## [441,] 48 2 2 2 1 1 2
## [442,] 60 2 1 1 1 1 2
## [443,] 63 2 2 1 1 2 2
## [444,] 35 2 1 1 1 1 2
## [445,] 30 1 2 1 1 1 2
## [446,] 53 1 1 2 2 1 2
## [447,] 50 1 1 1 1 1 2
## [448,] 50 1 2 2 2 1 2
## [449,] 35 1 2 2 1 1 2
## [450,] 40 1 2 2 1 1 2
## [451,] 48 1 2 2 1 1 2
## [452,] 60 1 2 2 1 2 2
## [453,] 38 1 2 2 2 1 2
## [454,] 28 1 1 1 1 1 2
## [455,] 68 1 2 2 1 1 2
## [456,] 35 1 1 1 1 1 2
## [457,] 45 1 1 2 1 1 2
## [458,] 48 1 2 2 2 1 2
## [459,] 40 1 2 1 1 1 2
## [460,] 57 2 2 1 1 1 2
## [461,] 47 2 1 2 1 1 1
## [462,] 45 2 1 1 1 1 1
## [463,] 57 2 1 1 1 1 1
## [464,] 72 2 1 2 1 1 1
## [465,] 30 2 1 1 1 1 1
## [466,] 27 2 1 1 1 1 1
## [467,] 38 2 1 1 1 1 1
## [468,] 43 2 2 2 1 1 1
## [469,] 40 2 1 1 1 2 1
## [470,] 47 2 1 2 1 1 1
## [471,] 45 2 1 1 1 1 1
## [472,] 57 2 1 1 1 1 1
## [473,] 72 2 1 2 1 1 1
## [474,] 30 2 1 1 1 1 1
## [475,] 27 2 1 1 1 1 1
## [476,] 38 2 1 1 1 1 1
## [477,] 43 2 2 2 1 1 1
## [478,] 40 2 1 1 1 2 1
## [479,] 54 2 2 2 1 1 1
## [480,] 30 2 1 1 1 1 1
## [481,] 46 2 2 2 1 1 1
## [482,] 53 2 2 2 1 1 1
## [483,] 42 2 1 1 1 1 1
## [484,] 55 1 2 1 1 2 2
## [485,] 48 1 2 1 1 1 2
## [486,] 55 1 1 2 1 1 2
## [487,] 39 1 2 2 2 1 2
## [488,] 43 1 2 1 1 2 2
## [489,] 50 1 2 2 1 1 1
## [490,] 30 2 1 1 1 1 1
## [491,] 46 2 2 2 1 1 1
## [492,] 53 2 2 2 1 1 1
## [493,] 64 2 2 2 2 1 1
## [494,] 44 2 2 2 1 2 1
## [495,] 36 2 1 1 1 1 1
## [496,] 43 2 2 2 1 1 1
## [497,] 53 2 2 2 1 1 1
## [498,] 47 2 1 1 2 2 1
## [499,] 68 1 2 2 1 1 2
## [500,] 64 2 2 2 2 1 1
## [501,] 66 2 1 2 2 1 2
## [502,] 67 2 1 1 1 1 1
## [503,] 70 2 1 2 1 1 1
## [504,] 44 2 1 1 1 1 1
## [505,] 38 2 1 1 1 1 1
## [506,] 35 2 1 1 1 1 1
## [507,] 61 2 2 2 1 1 1
## [508,] 60 2 1 1 1 2 1
## [509,] 58 2 2 2 1 1 1
## [510,] 54 2 1 1 1 1 1
## [511,] 67 2 2 2 1 1 1
## [512,] 66 2 2 2 1 1 1
## [513,] 43 2 1 1 1 1 1
## [514,] 62 1 2 1 1 2 2
## [515,] 54 1 2 1 1 1 2
## [516,] 39 1 1 2 1 1 2
## [517,] 48 1 2 2 2 1 2
## [518,] 58 1 2 1 1 2 2
## [519,] 32 1 2 2 1 1 1
## [520,] 42 2 1 1 1 1 1
The second portion of the code snippet above converts the data to a numeric matrix. Strictly speaking, this encodes the data rather than normalizing it: data.matrix() turns each character column into integer codes, assigned in alphabetical order of the values, so that the columns can be used as input for the kNN algorithm. Age is left unchanged. Gender became 2 for male and 1 for female. The yes/no symptom variables became 2 for yes and 1 for no. Lastly, the predicted variable changed from positive and negative to 2 for positive and 1 for negative.
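As a check on the encoding described above, here is a minimal sketch on a hypothetical three-row data frame (the rows and the helper `min_max` are made up for illustration, not taken from the data set). It shows how data.matrix() assigns codes, and then rescales Age to the range 0 to 1. The rescaling step is our assumption about what a normalization could look like; it is not something the original code does.

```r
# Hypothetical toy rows mirroring the column types in data.subset
toy <- data.frame(
  Age     = c(40, 58, 30),
  Gender  = c("Male", "Male", "Female"),
  Itching = c("Yes", "No", "No"),
  class   = c("Positive", "Positive", "Negative"),
  stringsAsFactors = FALSE
)

# data.matrix() converts character columns to factors, then to integer codes.
# Factor levels are sorted alphabetically, so the first level gets code 1.
encoded <- data.matrix(toy)
print(encoded)

# The level order behind the codes: "No" comes before "Yes"
levels(factor(toy$Itching))

# Assumed extra step (not in the original code): min-max rescale Age to [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
encoded[, "Age"] <- min_max(encoded[, "Age"])
print(encoded)
```

Since knn() from the class package uses Euclidean distance, a column like Age, which spans tens of years, can outweigh the 1/2 codes unless it is rescaled. Whether rescaling would improve the results here is something we have not tested.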
Data splitting refers to separating the data set into a training set and a testing set. The code below splits the data into 70 percent training and 30 percent testing. The predicted variable, class in this case, is then removed from each set and stored separately so we can compare it against the test results.
set.seed(123)
dat.d <- sample(1:nrow(normalized),size=nrow(normalized)*0.7,replace = FALSE) #random selection of 70% data.
train.data <- normalized[dat.d,1:6] # 70% training data
test.data <- normalized[-dat.d,1:6] # remaining 30% test data
#Creating separate label vectors for the 'class' column, which is our target.
train.data_labels <- normalized[dat.d,7]
test.data_labels <- normalized[-dat.d,7]
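The split above can be sketched on stand-in data to confirm that the two index sets partition the rows. The matrix `toy` and the names `train_idx`, `train_x`, `test_x`, `train_y`, and `test_y` are hypothetical, and the values are synthetic; only the 520-row size and the 70/30 proportion come from the text.

```r
set.seed(123)  # fixed seed so the sketch is reproducible

# Synthetic stand-in: 520 rows, 6 predictor columns plus a label column
n   <- 520
toy <- cbind(matrix(runif(n * 6), nrow = n),
             label = sample(1:2, n, replace = TRUE))

# floor() makes the training size an explicit whole number
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

# Negative indexing selects every row that was not sampled for training
train_x <- toy[train_idx, 1:6]
test_x  <- toy[-train_idx, 1:6]
train_y <- toy[train_idx, 7]
test_y  <- toy[-train_idx, 7]

# Row counts of the two sets
c(train = nrow(train_x), test = nrow(test_x))
```

With 520 rows this gives 364 training rows and 156 test rows, which matches the training count reported in the next section.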
## Building the machine learning model
In order to start our kNN algorithm, we need to choose a value of k. A common rule of thumb is to take the square root of the number of training observations, 364. The square root of 364 is about 19.08, so we fit two models, one with k = 19 and one with k = 20.
library(class)
#Find the number of observations in the training set
NROW(train.data_labels)
## [1] 364
knn.19 <- knn(train=train.data, test=test.data, cl=train.data_labels, k=19)
knn.20 <- knn(train=train.data, test=test.data, cl=train.data_labels, k=20)
#Calculate the proportion of correct classification for k = 19, 20
ACC.19 <- 100 * sum(test.data_labels == knn.19)/NROW(test.data_labels)
ACC.20 <- 100 * sum(test.data_labels == knn.20)/NROW(test.data_labels)
ACC.19
## [1] 67.30769
ACC.20
## [1] 66.66667
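The square-root rule only gives a starting point for k, so it can help to compare several values. Below is a minimal sketch of such a sweep. To keep it self-contained it uses the built-in iris data as a stand-in, so the accuracies it prints say nothing about the diabetes data; the names `k_start`, `k_values`, and `accuracy` are ours.

```r
library(class)  # provides knn()

set.seed(123)

# Rule-of-thumb starting point: k near the square root of the training size
n_train <- 364
k_start <- round(sqrt(n_train))  # sqrt(364) is about 19.08

# Stand-in data (built-in iris), split 70/30 in the same way as above
idx     <- sample(seq_len(nrow(iris)), size = round(0.7 * nrow(iris)))
train_x <- iris[idx, 1:4]
test_x  <- iris[-idx, 1:4]
train_y <- iris$Species[idx]
test_y  <- iris$Species[-idx]

# Accuracy in percent on the held-out rows for a range of odd k values
k_values <- seq(1, 25, by = 2)
accuracy <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  100 * mean(pred == test_y)
})

data.frame(k = k_values, accuracy = accuracy)
```

The sweep uses odd values of k because, with two classes, an odd k cannot produce a tied vote. Applying the same loop to train.data and test.data would be the natural next step, though we have not run it here.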
As seen above, the accuracy of our models was 67.31% for k = 19 and 66.67% for k = 20. To examine the accuracy in more depth, the confusion matrix is produced for k = 19. Assessing the model myself, I would say the model is decent with an accuracy of 67.31%, although this is only modestly above the no-information rate of 62.18% reported in the confusion matrix output. This was a data set with 520 observations, so it may be worth experimenting with different sample sizes to see whether accuracy improves.
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
confusionMatrix(table(knn.19 ,test.data_labels))
## Confusion Matrix and Statistics
##
## test.data_labels
## knn.19 1 2
## 1 27 19
## 2 32 78
##
## Accuracy : 0.6731
## 95% CI : (0.5935, 0.7459)
## No Information Rate : 0.6218
## P-Value [Acc > NIR] : 0.10705
##
## Kappa : 0.2736
##
## Mcnemar's Test P-Value : 0.09289
##
## Sensitivity : 0.4576
## Specificity : 0.8041
## Pos Pred Value : 0.5870
## Neg Pred Value : 0.7091
## Prevalence : 0.3782
## Detection Rate : 0.1731
## Detection Prevalence : 0.2949
## Balanced Accuracy : 0.6309
##
## 'Positive' Class : 1
##
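To make the statistics above easier to read, the sketch below recomputes the main ones by hand from the four counts in the confusion matrix. The counts are copied from the output; the variable names are ours. Note that caret reports class 1 as the 'Positive' class, and class 1 is the code for a negative diabetes result, so the sensitivity of 0.4576 describes how well negative patients are identified.

```r
# Counts copied from the confusion matrix above
# (rows = predicted class, columns = actual class)
cm <- matrix(c(27, 32, 19, 78), nrow = 2,
             dimnames = list(predicted = c("1", "2"), actual = c("1", "2")))

# Overall accuracy: correct predictions over all test rows
accuracy <- sum(diag(cm)) / sum(cm)

# With class 1 treated as the 'Positive' class, as in the caret output
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # actual 1s predicted as 1
specificity <- cm["2", "2"] / sum(cm[, "2"])  # actual 2s predicted as 2
ppv         <- cm["1", "1"] / sum(cm["1", ])  # predicted 1s that are truly 1
npv         <- cm["2", "2"] / sum(cm["2", ])  # predicted 2s that are truly 2
balanced    <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv,
        balanced = balanced), 4)
```

Read the other way around, 78 of the 97 patients who are actually positive for diabetes (about 80%) were classified correctly, while only 27 of the 59 negative patients were. confusionMatrix() accepts a `positive` argument, so passing `positive = "2"` should report the statistics with the diabetes-positive class as the reference; we have not rerun the output with that setting.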