Description

The dataset used for this activity is “diabetes_data_upload.csv” from the UCI Machine Learning Repository. We selected it because its class variable, which records whether a patient tested positive or negative for diabetes, can serve as the target variable for our predictive model. Below, the data is read into the workspace, its structure is displayed with str(), and the number of patients in each class is tabulated:

data <- read.csv("diabetes_data_upload.csv")

str(data)
## 'data.frame':    520 obs. of  17 variables:
##  $ Age               : int  40 58 41 45 60 55 57 66 67 70 ...
##  $ Gender            : chr  "Male" "Male" "Male" "Male" ...
##  $ Polyuria          : chr  "No" "No" "Yes" "No" ...
##  $ Polydipsia        : chr  "Yes" "No" "No" "No" ...
##  $ sudden.weight.loss: chr  "No" "No" "No" "Yes" ...
##  $ weakness          : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ Polyphagia        : chr  "No" "No" "Yes" "Yes" ...
##  $ Genital.thrush    : chr  "No" "No" "No" "Yes" ...
##  $ visual.blurring   : chr  "No" "Yes" "No" "No" ...
##  $ Itching           : chr  "Yes" "No" "Yes" "Yes" ...
##  $ Irritability      : chr  "No" "No" "No" "No" ...
##  $ delayed.healing   : chr  "Yes" "No" "Yes" "Yes" ...
##  $ partial.paresis   : chr  "No" "Yes" "No" "No" ...
##  $ muscle.stiffness  : chr  "Yes" "No" "Yes" "No" ...
##  $ Alopecia          : chr  "Yes" "Yes" "Yes" "No" ...
##  $ Obesity           : chr  "Yes" "No" "No" "No" ...
##  $ class             : chr  "Positive" "Positive" "Positive" "Positive" ...
table(data$class)
## 
## Negative Positive 
##      200      320
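The counts above show that the classes are imbalanced: 320 Positive vs. 200 Negative. The sketch below turns those counts into proportions. It assumes a stand-in vector rebuilt from the printed counts rather than the CSV itself, and the name cls is illustrative. The larger share is also the accuracy a model would reach by always predicting "Positive", which makes it a useful baseline for judging any model we build.

```r
# Hypothetical stand-in for data$class, rebuilt from the counts printed above
# (200 Negative, 320 Positive); in the real workflow use data$class directly.
cls <- factor(rep(c("Negative", "Positive"), times = c(200, 320)))

counts <- table(cls)
props  <- prop.table(counts)   # share of patients in each class
print(round(props, 3))

# Accuracy of always predicting the most frequent class: a trained model
# should clearly beat this number to be worth using.
baseline_acc <- max(props)
print(round(baseline_acc, 3))
```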

Cleaning and exploring data

There are 17 variables in total. Age and Gender are included, along with the class variable stating whether the patient tested positive or negative. The remaining 14 variables record symptoms or conditions that the patient either had or did not have (Yes/No). For simplicity, we dropped the conditions we were not familiar with and kept only Age, Gender, weakness, Itching, Irritability, Obesity, and class.

data.subset <- data[c('Age','Gender','weakness','Itching','Irritability','Obesity','class')]
str(data.subset)
## 'data.frame':    520 obs. of  7 variables:
##  $ Age         : int  40 58 41 45 60 55 57 66 67 70 ...
##  $ Gender      : chr  "Male" "Male" "Male" "Male" ...
##  $ weakness    : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ Itching     : chr  "Yes" "No" "Yes" "Yes" ...
##  $ Irritability: chr  "No" "No" "No" "No" ...
##  $ Obesity     : chr  "Yes" "No" "No" "No" ...
##  $ class       : chr  "Positive" "Positive" "Positive" "Positive" ...
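Every kept symptom column holds only the strings "Yes" or "No". One alternative to letting data.matrix() assign integer codes is to recode these columns explicitly as 0/1 indicators. The sketch below is hypothetical: it uses three made-up rows in the same format as data.subset, and the names toy, yes_no_cols and recoded are illustrative. It first checks that no unexpected values are present and then recodes.

```r
# Made-up rows in the same format as data.subset (not real patients).
toy <- data.frame(
  Age          = c(40L, 58L, 41L),
  Gender       = c("Male", "Female", "Male"),
  weakness     = c("Yes", "Yes", "No"),
  Itching      = c("Yes", "No", "Yes"),
  Irritability = c("No", "No", "Yes"),
  Obesity      = c("Yes", "No", "No"),
  class        = c("Positive", "Negative", "Positive"),
  stringsAsFactors = FALSE
)

yes_no_cols <- c("weakness", "Itching", "Irritability", "Obesity")

# Guard: stop if any symptom column contains something other than Yes/No.
stopifnot(all(unlist(toy[yes_no_cols]) %in% c("Yes", "No")))

recoded <- toy
# Yes -> 1, No -> 0 for each symptom column
recoded[yes_no_cols] <- lapply(toy[yes_no_cols], function(x) as.integer(x == "Yes"))
# Male -> 1, Female -> 0 (an arbitrary but explicit choice)
recoded$Gender <- as.integer(toy$Gender == "Male")
# Keep the target as a factor with Negative as the reference level
recoded$class <- factor(toy$class, levels = c("Negative", "Positive"))

str(recoded)
```

With this coding, each indicator reads directly as "has the symptom". A factor target whose first level is Negative is the form glm(family = binomial) expects when modelling the probability of a Positive result.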
table(data.subset$class)
## 
## Negative Positive 
##      200      320
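The next step passes the subset to data.matrix(). This function turns each character column into a factor and then into that factor's integer code, with levels sorted alphabetically. The small sketch below runs it on made-up rows (toy, m and m_scaled are illustrative names) to show the resulting coding: "Female" = 1 and "Male" = 2, "No" = 1 and "Yes" = 2, "Negative" = 1 and "Positive" = 2, while Age passes through unchanged. Because nothing is rescaled, the last lines add min-max scaling of Age as one possible normalization. This is our assumption about what "normalized" could mean, not a step in the original analysis.

```r
# Made-up rows in the same format as data.subset (not real patients).
toy <- data.frame(
  Age      = c(40L, 58L, 35L),
  Gender   = c("Male", "Female", "Male"),
  weakness = c("Yes", "No", "No"),
  class    = c("Positive", "Positive", "Negative"),
  stringsAsFactors = FALSE
)

# Character columns -> factors (alphabetical levels) -> integer codes.
m <- data.matrix(toy)
print(m)

# Optional min-max scaling of Age to the range [0, 1] (an assumption,
# not part of the original workflow).
m_scaled <- m
age <- m[, "Age"]
m_scaled[, "Age"] <- (age - min(age)) / (max(age) - min(age))
print(m_scaled)
```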
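# data.matrix() turns each character column into a factor, then into its integer
# code (alphabetical levels): No = 1, Yes = 2; Female = 1, Male = 2;
# Negative = 1, Positive = 2. Age is not rescaled, so this is encoding, not normalization.
normalized <- data.matrix(data.subset)
print(normalized)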
##        Age Gender weakness Itching Irritability Obesity class
##   [1,]  40      2        2       2            1       2     2
##   [2,]  58      2        2       1            1       1     2
##   [3,]  41      2        2       2            1       1     2
##   [4,]  45      2        2       2            1       1     2
##   [5,]  60      2        2       2            2       2     2
##   [6,]  55      2        2       2            1       2     2
##   [7,]  57      2        2       1            1       1     2
##   [8,]  66      2        2       2            2       1     2
##   [9,]  67      2        2       2            2       2     2
##  [10,]  70      2        2       2            2       1     2
##  [11,]  44      2        2       1            2       1     2
##  [12,]  38      2        1       2            1       1     2
##  [13,]  35      2        1       1            2       1     2
##  [14,]  61      2        2       2            1       2     2
##  [15,]  60      2        2       2            1       1     2
##  [16,]  58      2        2       1            1       1     2
##  [17,]  54      2        2       1            1       1     2
##  [18,]  67      2        2       1            2       2     2
##  [19,]  66      2        2       1            1       1     2
##  [20,]  43      2        2       1            1       1     2
##  [21,]  62      2        2       1            2       1     2
##  [22,]  54      2        2       2            1       1     2
##  [23,]  39      2        1       2            2       1     2
##  [24,]  48      2        2       2            2       1     2
##  [25,]  58      2        2       1            1       2     2
##  [26,]  32      2        1       1            2       2     2
##  [27,]  42      2        2       1            2       1     2
##  [28,]  52      2        2       2            1       1     2
##  [29,]  38      2        1       1            1       1     2
##  [30,]  53      2        2       1            2       1     2
##  [31,]  57      2        2       1            1       1     2
##  [32,]  41      2        2       2            2       2     2
##  [33,]  37      2        2       1            1       1     2
##  [34,]  54      2        2       2            2       1     2
##  [35,]  49      2        2       2            1       1     2
##  [36,]  48      2        2       2            1       1     2
##  [37,]  60      2        1       1            1       1     2
##  [38,]  63      2        2       1            1       2     2
##  [39,]  35      2        1       1            1       1     2
##  [40,]  30      1        2       1            1       1     2
##  [41,]  53      1        1       2            2       1     2
##  [42,]  50      1        1       1            1       1     2
##  [43,]  50      1        2       2            2       1     2
##  [44,]  35      1        2       2            1       1     2
##  [45,]  40      1        2       2            1       1     2
##  [46,]  48      1        2       2            1       1     2
##  [47,]  60      1        2       2            1       2     2
##  [48,]  60      1        2       2            1       1     2
##  [49,]  35      1        2       2            1       2     2
##  [50,]  46      1        2       1            1       1     2
##  [51,]  36      1        1       1            2       1     2
##  [52,]  50      1        2       1            1       1     2
##  [53,]  60      1        2       1            2       1     2
##  [54,]  50      1        2       1            1       1     2
##  [55,]  51      1        1       1            1       1     2
##  [56,]  38      1        1       1            1       2     2
##  [57,]  66      1        2       2            1       1     2
##  [58,]  53      1        1       1            1       1     2
##  [59,]  59      1        1       1            1       1     2
##  [60,]  39      1        2       1            2       2     2
##  [61,]  65      1        2       2            1       1     2
##  [62,]  35      1        2       2            1       1     2
##  [63,]  55      1        2       2            1       1     2
##  [64,]  60      1        2       2            1       2     2
##  [65,]  45      1        1       2            1       1     2
##  [66,]  40      1        1       1            1       1     2
##  [67,]  30      1        1       2            1       1     2
##  [68,]  35      1        2       2            1       1     2
##  [69,]  25      1        2       1            1       1     2
##  [70,]  50      1        2       1            1       2     2
##  [71,]  40      1        2       1            1       1     2
##  [72,]  35      1        2       2            1       1     2
##  [73,]  65      1        1       1            1       1     2
##  [74,]  38      1        2       1            1       1     2
##  [75,]  50      1        2       2            1       1     2
##  [76,]  55      1        2       1            1       2     2
##  [77,]  48      1        2       1            1       1     2
##  [78,]  55      1        1       2            1       1     2
##  [79,]  39      1        2       2            2       1     2
##  [80,]  43      1        2       1            1       2     2
##  [81,]  35      1        2       1            1       1     2
##  [82,]  47      1        2       1            1       1     2
##  [83,]  50      1        2       1            1       1     2
##  [84,]  48      1        2       2            1       1     2
##  [85,]  35      1        2       2            1       1     2
##  [86,]  49      1        2       1            1       2     2
##  [87,]  38      1        2       2            2       1     2
##  [88,]  28      1        1       1            1       1     2
##  [89,]  68      1        2       2            1       1     2
##  [90,]  35      1        1       1            1       1     2
##  [91,]  45      1        1       2            1       1     2
##  [92,]  48      1        2       2            2       1     2
##  [93,]  40      1        2       1            1       1     2
##  [94,]  40      1        2       2            1       1     2
##  [95,]  36      1        2       2            1       1     2
##  [96,]  56      1        2       2            1       1     2
##  [97,]  30      1        2       2            1       1     2
##  [98,]  31      1        2       2            2       1     2
##  [99,]  35      1        1       1            1       1     2
## [100,]  39      1        1       2            1       1     2
## [101,]  48      1        1       1            2       2     2
## [102,]  85      2        2       2            1       1     2
## [103,]  90      1        1       2            1       1     2
## [104,]  72      2        2       1            2       1     2
## [105,]  70      2        2       1            2       1     2
## [106,]  69      1        2       2            1       1     2
## [107,]  58      2        2       2            1       2     2
## [108,]  47      2        2       1            1       1     2
## [109,]  25      2        1       2            1       1     2
## [110,]  39      1        1       1            2       1     2
## [111,]  53      1        2       2            1       1     2
## [112,]  52      2        1       1            2       2     2
## [113,]  68      1        1       1            2       1     2
## [114,]  79      2        2       2            2       1     2
## [115,]  55      1        1       2            1       1     2
## [116,]  45      1        2       2            2       1     2
## [117,]  30      1        1       1            2       1     2
## [118,]  45      1        2       2            2       1     2
## [119,]  65      1        2       2            2       1     2
## [120,]  34      1        2       1            2       1     2
## [121,]  48      2        2       1            1       1     2
## [122,]  35      2        2       2            1       1     2
## [123,]  40      2        2       2            2       2     2
## [124,]  47      2        1       2            1       2     2
## [125,]  38      2        2       1            1       1     2
## [126,]  55      2        2       1            1       1     2
## [127,]  66      2        2       2            2       1     2
## [128,]  57      2        2       2            2       1     2
## [129,]  32      2        2       2            2       1     2
## [130,]  48      2        2       2            1       1     2
## [131,]  47      2        2       2            2       1     2
## [132,]  43      2        1       1            2       1     2
## [133,]  30      2        2       1            1       1     2
## [134,]  16      2        1       1            1       1     2
## [135,]  35      2        2       1            1       1     2
## [136,]  66      2        2       1            2       1     2
## [137,]  54      2        1       2            2       2     2
## [138,]  58      2        2       2            2       2     2
## [139,]  51      2        2       1            2       1     2
## [140,]  40      2        1       2            1       1     2
## [141,]  47      2        1       1            1       1     2
## [142,]  62      2        1       1            2       1     2
## [143,]  49      2        1       2            1       1     2
## [144,]  53      2        1       1            1       1     2
## [145,]  68      2        1       2            2       1     2
## [146,]  61      2        2       2            2       1     2
## [147,]  39      2        1       2            1       2     2
## [148,]  38      2        1       2            1       2     2
## [149,]  44      2        1       1            1       2     2
## [150,]  45      2        2       1            1       1     2
## [151,]  50      2        2       2            2       1     2
## [152,]  42      2        2       1            2       1     2
## [153,]  55      2        2       1            2       1     2
## [154,]  57      2        2       1            2       1     2
## [155,]  62      2        2       2            2       2     2
## [156,]  33      2        1       1            1       1     2
## [157,]  55      2        2       1            2       1     2
## [158,]  48      2        1       1            1       1     2
## [159,]  56      2        2       2            2       1     2
## [160,]  38      1        2       2            2       1     2
## [161,]  28      1        1       1            1       1     2
## [162,]  68      1        2       2            1       1     2
## [163,]  35      1        1       1            1       1     2
## [164,]  45      1        1       2            1       1     2
## [165,]  48      1        2       2            2       1     2
## [166,]  40      1        2       1            1       1     2
## [167,]  57      2        2       1            1       1     2
## [168,]  41      2        2       2            2       2     2
## [169,]  37      2        2       1            1       1     2
## [170,]  54      2        2       2            2       1     2
## [171,]  49      2        2       2            1       1     2
## [172,]  48      2        2       2            1       1     2
## [173,]  60      2        1       1            1       1     2
## [174,]  63      2        2       1            1       2     2
## [175,]  35      2        1       1            1       1     2
## [176,]  30      1        2       1            1       1     2
## [177,]  53      1        1       2            2       1     2
## [178,]  50      1        1       1            1       1     2
## [179,]  50      1        2       2            2       1     2
## [180,]  35      1        2       2            1       1     2
## [181,]  40      1        2       2            1       1     2
## [182,]  31      1        2       2            2       1     2
## [183,]  35      1        1       1            1       1     2
## [184,]  39      1        1       2            1       1     2
## [185,]  48      1        1       1            2       2     2
## [186,]  85      2        2       2            1       1     2
## [187,]  90      1        1       2            1       1     2
## [188,]  72      2        2       1            2       1     2
## [189,]  70      2        2       1            2       1     2
## [190,]  69      1        2       2            1       1     2
## [191,]  58      2        2       2            1       2     2
## [192,]  54      2        1       1            1       1     2
## [193,]  64      2        1       1            2       1     2
## [194,]  36      2        2       2            1       1     2
## [195,]  43      2        2       2            2       1     2
## [196,]  31      2        1       1            1       1     2
## [197,]  66      2        1       1            1       1     2
## [198,]  61      1        1       1            2       1     2
## [199,]  58      1        1       1            2       2     2
## [200,]  69      1        2       2            2       2     2
## [201,]  40      2        2       2            1       1     1
## [202,]  28      2        1       1            1       1     1
## [203,]  37      2        1       1            1       1     1
## [204,]  34      2        1       1            1       1     1
## [205,]  30      2        1       1            1       1     1
## [206,]  67      2        2       2            2       2     1
## [207,]  60      2        2       1            1       1     1
## [208,]  58      2        1       2            1       2     1
## [209,]  54      2        2       1            1       1     1
## [210,]  43      2        1       1            1       1     1
## [211,]  39      2        2       1            1       1     1
## [212,]  40      2        2       1            2       1     1
## [213,]  43      2        1       1            2       1     1
## [214,]  49      2        1       2            1       2     1
## [215,]  47      2        1       2            1       1     1
## [216,]  45      2        1       1            1       1     1
## [217,]  57      2        1       1            1       1     1
## [218,]  72      2        1       2            1       1     1
## [219,]  30      2        1       1            1       1     1
## [220,]  27      2        1       1            1       1     1
## [221,]  38      2        1       1            1       1     1
## [222,]  43      2        2       2            1       1     1
## [223,]  40      2        1       1            1       2     1
## [224,]  55      2        2       2            1       2     1
## [225,]  68      2        2       2            1       1     1
## [226,]  29      2        2       1            1       1     1
## [227,]  37      2        1       2            1       1     1
## [228,]  30      2        2       1            1       1     1
## [229,]  45      2        2       2            2       1     1
## [230,]  47      2        1       2            1       1     1
## [231,]  35      2        2       1            1       1     1
## [232,]  32      2        2       1            1       1     1
## [233,]  56      2        2       2            1       1     1
## [234,]  50      2        2       2            1       1     1
## [235,]  52      2        2       2            1       1     1
## [236,]  26      2        1       1            1       1     1
## [237,]  60      2        2       2            1       1     1
## [238,]  65      2        2       2            1       1     1
## [239,]  72      2        1       2            1       1     1
## [240,]  30      2        2       1            1       1     1
## [241,]  45      2        2       2            1       1     1
## [242,]  65      2        1       2            2       2     1
## [243,]  70      2        2       2            1       1     1
## [244,]  35      2        2       1            1       1     1
## [245,]  54      2        2       2            1       1     1
## [246,]  30      2        1       1            1       1     1
## [247,]  46      2        2       2            1       1     1
## [248,]  53      2        2       2            1       1     1
## [249,]  42      2        1       1            1       1     1
## [250,]  55      1        2       1            1       2     2
## [251,]  48      1        2       1            1       1     2
## [252,]  55      1        1       2            1       1     2
## [253,]  39      1        2       2            2       1     2
## [254,]  43      1        2       1            1       2     2
## [255,]  35      1        2       1            1       1     2
## [256,]  47      1        2       1            1       1     2
## [257,]  50      1        2       1            1       1     2
## [258,]  48      1        2       2            1       1     2
## [259,]  35      1        2       2            1       1     2
## [260,]  62      2        2       2            2       2     2
## [261,]  33      2        1       1            1       1     2
## [262,]  55      2        2       1            2       1     2
## [263,]  48      2        1       1            1       1     2
## [264,]  56      2        2       2            2       1     2
## [265,]  38      1        2       2            2       1     2
## [266,]  28      1        1       1            1       1     2
## [267,]  68      1        2       2            1       1     2
## [268,]  35      1        1       1            1       1     2
## [269,]  45      1        1       2            1       1     2
## [270,]  48      1        2       2            2       1     2
## [271,]  40      1        2       1            1       1     2
## [272,]  57      2        2       1            1       1     2
## [273,]  47      2        1       2            1       1     1
## [274,]  45      2        1       1            1       1     1
## [275,]  57      2        1       1            1       1     1
## [276,]  72      2        1       2            1       1     1
## [277,]  30      2        1       1            1       1     1
## [278,]  27      2        1       1            1       1     1
## [279,]  38      2        1       1            1       1     1
## [280,]  43      2        2       2            1       1     1
## [281,]  40      2        1       1            1       2     1
## [282,]  47      2        1       2            1       1     1
## [283,]  45      2        1       1            1       1     1
## [284,]  57      2        1       1            1       1     1
## [285,]  72      2        1       2            1       1     1
## [286,]  30      2        1       1            1       1     1
## [287,]  27      2        1       1            1       1     1
## [288,]  38      2        1       1            1       1     1
## [289,]  43      2        2       2            1       1     1
## [290,]  40      2        1       1            1       2     1
## [291,]  54      2        2       2            1       1     1
## [292,]  30      2        1       1            1       1     1
## [293,]  46      2        2       2            1       1     1
## [294,]  53      2        2       2            1       1     1
## [295,]  42      2        1       1            1       1     1
## [296,]  55      1        2       1            1       2     2
## [297,]  48      1        2       1            1       1     2
## [298,]  55      1        1       2            1       1     2
## [299,]  39      1        2       2            2       1     2
## [300,]  43      1        2       1            1       2     2
## [301,]  35      1        2       1            1       1     2
## [302,]  47      1        2       1            1       1     2
## [303,]  61      1        1       1            2       1     2
## [304,]  58      1        1       1            2       2     2
## [305,]  69      1        2       2            2       2     2
## [306,]  40      2        2       2            1       1     1
## [307,]  28      2        1       1            1       1     1
## [308,]  37      2        1       1            1       1     1
## [309,]  34      2        1       1            1       1     1
## [310,]  30      2        1       1            1       1     1
## [311,]  67      2        2       2            2       2     1
## [312,]  60      2        2       1            1       1     1
## [313,]  58      2        1       2            1       2     1
## [314,]  54      2        2       1            1       1     1
## [315,]  43      2        1       1            1       1     1
## [316,]  33      1        1       1            1       1     1
## [317,]  55      1        2       2            1       1     1
## [318,]  36      1        2       1            2       1     1
## [319,]  28      1        1       1            1       1     1
## [320,]  34      1        1       1            1       1     1
## [321,]  65      1        2       2            1       1     1
## [322,]  34      1        1       1            1       1     1
## [323,]  64      2        2       2            2       1     1
## [324,]  44      2        2       2            1       2     1
## [325,]  36      2        1       1            1       1     1
## [326,]  43      2        2       2            1       1     1
## [327,]  53      2        2       2            1       1     1
## [328,]  47      2        1       1            2       2     1
## [329,]  58      2        1       2            1       1     1
## [330,]  56      2        2       2            1       1     1
## [331,]  51      1        1       2            1       1     1
## [332,]  59      1        2       2            1       2     1
## [333,]  50      1        2       2            1       1     1
## [334,]  30      2        1       1            1       1     1
## [335,]  46      2        2       2            1       1     1
## [336,]  53      2        2       2            1       1     1
## [337,]  42      2        1       1            1       1     1
## [338,]  55      1        2       1            1       2     2
## [339,]  48      1        2       1            1       1     2
## [340,]  55      1        1       2            1       1     2
## [341,]  39      1        2       2            2       1     2
## [342,]  43      1        2       1            1       2     2
## [343,]  35      1        2       1            1       1     2
## [344,]  47      1        2       1            1       1     2
## [345,]  61      1        1       1            2       1     2
## [346,]  58      1        1       1            2       2     2
## [347,]  69      1        2       2            2       2     2
## [348,]  40      2        2       2            1       1     1
## [349,]  28      2        1       1            1       1     1
## [350,]  37      2        1       1            1       1     1
## [351,]  34      2        1       1            1       1     1
## [352,]  30      2        1       1            1       1     1
## [353,]  67      2        2       2            2       2     1
## [354,]  60      2        2       1            1       1     1
## [355,]  58      2        1       2            1       2     1
## [356,]  54      2        2       1            1       1     1
## [357,]  43      2        1       1            1       1     1
## [358,]  33      1        1       1            1       1     1
## [359,]  55      2        2       1            2       1     2
## [360,]  48      2        1       1            1       1     2
## [361,]  56      2        2       2            2       1     2
## [362,]  38      1        2       2            2       1     2
## [363,]  28      1        1       1            1       1     2
## [364,]  68      1        2       2            1       1     2
## [365,]  35      1        1       1            1       1     2
## [366,]  45      1        1       2            1       1     2
## [367,]  48      1        2       2            2       1     2
## [368,]  40      1        2       1            1       1     2
## [369,]  57      2        2       1            1       1     2
## [370,]  47      2        1       2            1       1     1
## [371,]  45      2        1       1            1       1     1
## [372,]  57      2        1       1            1       1     1
## [373,]  72      2        1       2            1       1     1
## [374,]  30      2        1       1            1       1     1
## [375,]  27      2        1       1            1       1     1
## [376,]  38      2        1       1            1       1     1
## [377,]  43      2        2       2            1       1     1
## [378,]  40      2        1       1            1       2     1
## [379,]  47      2        1       1            1       1     2
## [380,]  62      2        1       1            2       1     2
## [381,]  49      2        1       2            1       1     2
## [382,]  53      2        1       1            1       1     2
## [383,]  68      2        1       2            2       1     2
## [384,]  61      2        2       2            2       1     2
## [385,]  39      2        1       2            1       2     2
## [386,]  38      2        1       2            1       2     2
## [387,]  44      2        2       2            1       2     1
## [388,]  36      2        1       1            1       1     1
## [389,]  43      2        2       2            1       1     1
## [390,]  53      2        2       2            1       1     1
## [391,]  47      2        1       1            2       2     1
## [392,]  58      2        1       2            1       1     1
## [393,]  56      2        2       2            1       1     1
## [394,]  51      1        1       2            1       1     1
## [395,]  59      1        2       2            1       2     1
## [396,]  50      1        2       2            1       1     1
## [397,]  30      2        1       1            1       1     1
## [398,]  46      2        2       2            1       1     1
## [399,]  53      2        2       2            1       1     1
## [400,]  64      2        2       2            2       1     1
## [401,]  44      2        2       2            1       2     1
## [402,]  36      2        1       1            1       1     1
## [403,]  43      2        2       2            1       1     1
## [404,]  53      2        2       2            1       1     1
## [405,]  47      2        1       1            2       2     1
## [406,]  58      2        1       2            1       1     1
## [407,]  56      2        2       2            1       1     1
## [408,]  51      1        1       2            1       1     1
## [409,]  59      1        2       2            1       2     1
## [410,]  50      1        2       2            1       1     1
## [411,]  30      2        1       1            1       1     1
## [412,]  46      2        2       2            1       1     1
## [413,]  53      2        2       2            1       1     1
## [414,]  42      2        1       1            1       1     1
## [415,]  55      1        2       1            1       2     2
## [416,]  48      1        2       1            1       1     2
## [417,]  55      1        1       2            1       1     2
## [418,]  39      1        2       2            2       1     2
## [419,]  43      1        2       1            1       2     2
## [420,]  35      1        2       1            1       1     2
## [421,]  47      1        2       1            1       1     2
## [422,]  61      1        1       1            2       1     2
## [423,]  67      2        2       1            2       2     2
## [424,]  66      2        2       1            1       1     2
## [425,]  43      2        2       1            1       1     2
## [426,]  62      2        2       1            2       1     2
## [427,]  54      2        2       2            1       1     2
## [428,]  39      2        1       2            2       1     2
## [429,]  48      2        2       2            2       1     2
## [430,]  58      2        2       1            1       2     2
## [431,]  32      2        1       1            2       2     2
## [432,]  42      2        2       1            2       1     2
## [433,]  52      2        2       2            1       1     2
## [434,]  38      2        1       1            1       1     2
## [435,]  53      2        2       1            2       1     2
## [436,]  57      2        2       1            1       1     2
## [437,]  41      2        2       2            2       2     2
## [438,]  37      2        2       1            1       1     2
## [439,]  54      2        2       2            2       1     2
## [440,]  49      2        2       2            1       1     2
## [441,]  48      2        2       2            1       1     2
## [442,]  60      2        1       1            1       1     2
## [443,]  63      2        2       1            1       2     2
## [444,]  35      2        1       1            1       1     2
## [445,]  30      1        2       1            1       1     2
## [446,]  53      1        1       2            2       1     2
## [447,]  50      1        1       1            1       1     2
## [448,]  50      1        2       2            2       1     2
## [449,]  35      1        2       2            1       1     2
## [450,]  40      1        2       2            1       1     2
## [451,]  48      1        2       2            1       1     2
## [452,]  60      1        2       2            1       2     2
## [453,]  38      1        2       2            2       1     2
## [454,]  28      1        1       1            1       1     2
## [455,]  68      1        2       2            1       1     2
## [456,]  35      1        1       1            1       1     2
## [457,]  45      1        1       2            1       1     2
## [458,]  48      1        2       2            2       1     2
## [459,]  40      1        2       1            1       1     2
## [460,]  57      2        2       1            1       1     2
## [461,]  47      2        1       2            1       1     1
## [462,]  45      2        1       1            1       1     1
## [463,]  57      2        1       1            1       1     1
## [464,]  72      2        1       2            1       1     1
## [465,]  30      2        1       1            1       1     1
## [466,]  27      2        1       1            1       1     1
## [467,]  38      2        1       1            1       1     1
## [468,]  43      2        2       2            1       1     1
## [469,]  40      2        1       1            1       2     1
## [470,]  47      2        1       2            1       1     1
## [471,]  45      2        1       1            1       1     1
## [472,]  57      2        1       1            1       1     1
## [473,]  72      2        1       2            1       1     1
## [474,]  30      2        1       1            1       1     1
## [475,]  27      2        1       1            1       1     1
## [476,]  38      2        1       1            1       1     1
## [477,]  43      2        2       2            1       1     1
## [478,]  40      2        1       1            1       2     1
## [479,]  54      2        2       2            1       1     1
## [480,]  30      2        1       1            1       1     1
## [481,]  46      2        2       2            1       1     1
## [482,]  53      2        2       2            1       1     1
## [483,]  42      2        1       1            1       1     1
## [484,]  55      1        2       1            1       2     2
## [485,]  48      1        2       1            1       1     2
## [486,]  55      1        1       2            1       1     2
## [487,]  39      1        2       2            2       1     2
## [488,]  43      1        2       1            1       2     2
## [489,]  50      1        2       2            1       1     1
## [490,]  30      2        1       1            1       1     1
## [491,]  46      2        2       2            1       1     1
## [492,]  53      2        2       2            1       1     1
## [493,]  64      2        2       2            2       1     1
## [494,]  44      2        2       2            1       2     1
## [495,]  36      2        1       1            1       1     1
## [496,]  43      2        2       2            1       1     1
## [497,]  53      2        2       2            1       1     1
## [498,]  47      2        1       1            2       2     1
## [499,]  68      1        2       2            1       1     2
## [500,]  64      2        2       2            2       1     1
## [501,]  66      2        1       2            2       1     2
## [502,]  67      2        1       1            1       1     1
## [503,]  70      2        1       2            1       1     1
## [504,]  44      2        1       1            1       1     1
## [505,]  38      2        1       1            1       1     1
## [506,]  35      2        1       1            1       1     1
## [507,]  61      2        2       2            1       1     1
## [508,]  60      2        1       1            1       2     1
## [509,]  58      2        2       2            1       1     1
## [510,]  54      2        1       1            1       1     1
## [511,]  67      2        2       2            1       1     1
## [512,]  66      2        2       2            1       1     1
## [513,]  43      2        1       1            1       1     1
## [514,]  62      1        2       1            1       2     2
## [515,]  54      1        2       1            1       1     2
## [516,]  39      1        1       2            1       1     2
## [517,]  48      1        2       2            2       1     2
## [518,]  58      1        2       1            1       2     2
## [519,]  32      1        2       2            1       1     1
## [520,]  42      2        1       1            1       1     1

Normalize and clean

the second portion of the code snippet above then converts the data to numeric form so it can be used as input for the knn algorithm. gender is coded 2 for male and 1 for female, the yes/no symptom and illness variables are coded 2 for yes and 1 for no, and the predicted variable, class, is coded 2 for positive and 1 for negative. note that, despite the object name "normalized", age is not rescaled. as the output above shows, it stays in years.
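the conversion code itself is in the earlier snippet. the sketch below assumes it relied on R's factor codes (for example via as.numeric(factor(...)) or data.matrix(), which turns character columns into factors and then integers), which is consistent with the codes in the output above. it shows why the codes come out as they do and, as an optional step that this report does not use, how age could be min-max scaled so it does not dominate the knn distances. the toy vectors are hypothetical; the ages are the first five from str(data).

```r
# Hypothetical toy values in the same format as the csv columns
weakness  <- c("Yes", "No", "Yes")
gender    <- c("Male", "Female", "Male")
diagnosis <- c("Positive", "Negative", "Positive")

# factor() sorts its levels alphabetically, so the integer codes are
# No = 1, Yes = 2; Female = 1, Male = 2; Negative = 1, Positive = 2
as.numeric(factor(weakness))   # 2 1 2
as.numeric(factor(gender))     # 2 1 2
as.numeric(factor(diagnosis))  # 2 1 2

# Optional min-max scaling to [0, 1] (not applied in this report):
# without it, Age differences of tens of years outweigh the 1-vs-2 codes
# in the Euclidean distances that knn() uses
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
age <- c(40, 58, 41, 45, 60)   # first five ages from str(data)
min_max(age)                   # 0.00 0.90 0.05 0.25 1.00
```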

Data Splitting

data splitting refers to separating the data set into a training set and a testing set. the code below splits the data into 70 percent training and 30 percent testing. the predicted variable, class in this case, is then moved out of each set into its own vector so that we can compare the test predictions against it.

set.seed(123)
dat.d <- sample(1:nrow(normalized), size = nrow(normalized) * 0.7, replace = FALSE) # random selection of 70% of the rows

train.data <- normalized[dat.d, 1:6]  # 70% training data (364 rows)
test.data <- normalized[-dat.d, 1:6]  # remaining 30% test data (156 rows)

# separate label vectors for 'class', our target variable (column 7)
train.data_labels <- normalized[dat.d, 7]
test.data_labels <- normalized[-dat.d, 7]
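the split above samples rows without regard to class, so the 200/320 negative/positive balance can shift slightly between the training and test sets (in this run the test set happened to get 59 negatives and 97 positives, per the confusion matrix further down). a stratified split, which samples 70% within each class, keeps the proportions fixed. this is an alternative the report does not use; the minimal base-R sketch below works on a hypothetical label vector with the same class counts.

```r
# Hypothetical labels with the same counts as the data: 200 negative (1), 320 positive (2)
labels <- c(rep(1, 200), rep(2, 320))

set.seed(123)
# For each class, draw 70% of that class's row indices without replacement
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(rows) {
  rows[sample(length(rows), size = round(0.7 * length(rows)))]
}))
# Every row not drawn for training goes to the test set
test_idx <- setdiff(seq_along(labels), train_idx)

table(labels[train_idx])  # 140 negatives, 224 positives
table(labels[test_idx])   #  60 negatives,  96 positives
```

the same indices could then replace dat.d when subsetting the real data.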

Building the machine learning model

to start our knn algorithm we need to choose a value for k. a common rule of thumb is to take the square root of the number of training observations, 364. the square root of 364 is about 19.08, so we build two models on the same training set, one with k = 19 and one with k = 20.

library(class)
# find the number of training observations
NROW(train.data_labels) 
## [1] 364
knn.19 <- knn(train=train.data, test=test.data, cl=train.data_labels, k=19)
knn.20 <- knn(train=train.data, test=test.data, cl=train.data_labels, k=20)

#Calculate the proportion of correct classification for k = 19, 20
ACC.19 <- 100 * sum(test.data_labels == knn.19)/NROW(test.data_labels)
ACC.20 <- 100 * sum(test.data_labels == knn.20)/NROW(test.data_labels)

ACC.19
## [1] 67.30769
ACC.20
## [1] 66.66667
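the square-root rule is only a starting point. a common alternative is to compare several values of k with cross-validation on the training set alone, so that the test set is not used to pick k. the sketch below uses knn.cv() from the class package, which performs leave-one-out cross-validation. the built-in iris data is a stand-in; for this report you would pass train.data and train.data_labels instead, and the candidate range 1 to 25 is an arbitrary choice.

```r
library(class)

# Rule of thumb used above: k near the square root of the training size
n_train <- 364
sqrt(n_train)  # about 19.08

# Stand-in data: iris, with features scaled to mean 0 / sd 1
X <- scale(iris[, 1:4])
y <- iris$Species

# Leave-one-out accuracy for each candidate k; knn.cv() breaks
# distance and vote ties at random, so fix the seed for reproducibility
set.seed(123)
ks <- 1:25
loocv_acc <- sapply(ks, function(k) mean(knn.cv(train = X, cl = y, k = k) == y))

# Candidate k with the highest cross-validated accuracy
best_k <- ks[which.max(loocv_acc)]
best_k
```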

Model Assessment

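as seen above, the accuracy of our models was 67.31% for k = 19 and 66.67% for k = 20. to examine the accuracy in more depth, the confusion matrix below is produced for k = 19. in my assessment the model is decent, with an accuracy of 67.31%. however, the test set has only 156 observations, so this estimate is fairly uncertain (95% confidence interval of 59.35% to 74.59%), and the p-value of 0.107 indicates that the accuracy is not significantly better than the no information rate of 62.18%, which is the accuracy of always predicting the more common class. note also that caret treats class 1 (negative) as the 'positive' class here, so the reported sensitivity of 0.4576 is the proportion of non-diabetic patients classified correctly, while the proportion of diabetic patients detected is the reported specificity, 0.8041.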

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
confusionMatrix(table(knn.19, test.data_labels))
## Confusion Matrix and Statistics
## 
##       test.data_labels
## knn.19  1  2
##      1 27 19
##      2 32 78
##                                           
##                Accuracy : 0.6731          
##                  95% CI : (0.5935, 0.7459)
##     No Information Rate : 0.6218          
##     P-Value [Acc > NIR] : 0.10705         
##                                           
##                   Kappa : 0.2736          
##                                           
##  Mcnemar's Test P-Value : 0.09289         
##                                           
##             Sensitivity : 0.4576          
##             Specificity : 0.8041          
##          Pos Pred Value : 0.5870          
##          Neg Pred Value : 0.7091          
##              Prevalence : 0.3782          
##          Detection Rate : 0.1731          
##    Detection Prevalence : 0.2949          
##       Balanced Accuracy : 0.6309          
##                                           
##        'Positive' Class : 1               
##
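to make the statistics above concrete, the sketch below recomputes them from the four counts in the k = 19 table. it also shows the effect of the 'Positive' Class : 1 setting: making class 2 (diabetes) the positive class swaps sensitivity with specificity and the positive predictive value with the negative one. caret's confusionMatrix() accepts a positive argument (for example positive = "2") for the same purpose; the metrics() helper here is hypothetical and written only for illustration.

```r
# Counts from the k = 19 confusion matrix above
# rows = predicted (knn.19), columns = actual (test.data_labels)
cm <- matrix(c(27, 32, 19, 78), nrow = 2,
             dimnames = list(predicted = c("1", "2"), actual = c("1", "2")))

# Hypothetical helper: standard metrics for a chosen positive class
metrics <- function(cm, positive) {
  neg <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]; fn <- cm[neg, positive]
  fp <- cm[positive, neg];      tn <- cm[neg, neg]
  c(accuracy    = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

round(metrics(cm, positive = "1"), 4)  # same values caret reports ('Positive' Class : 1)
round(metrics(cm, positive = "2"), 4)  # diabetes (class 2) as the positive class

# No information rate: share of the larger actual class, 97 / 156
nir <- max(colSums(cm)) / sum(cm)
round(nir, 4)                          # 0.6218
```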