June 5, 2018

Outline

  1. Preprocessing and Exploratory Data Analysis
  2. What was used and how?
    • R (base), dplyr, caret, ggplot2, NbClust
  3. Results from Unsupervised Learning

Preprocessing

  • 3,601 observations of 8 predictor variables, grouped by subject ID (103 subjects)
  • No Response Variable
Variable      Issue
X2            Same information for each observation
X3 (Dates)    Gives the same information as X7
X4 and X6     Highly correlated with each other; one removed
  • X8 is mostly 0s but is highly correlated with O1897 and a few others, so it is left in for analysis
  • O1918 and O1940 have 33 observations each, while all other subjects have 35
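
Sketch: screening checks like the ones above

The checks listed above (constant columns, highly correlated pairs, observations per subject) could be run roughly as follows in base R. This is a minimal sketch, not the analysis actually used: the data frame `toy`, its subject ids, and the 0.9 correlation cutoff are all assumptions made for illustration.

```r
# Synthetic stand-in data; the real dataset is not shown in these slides.
set.seed(1)
n <- 200
toy <- data.frame(
  id = rep(c("A", "B", "C", "D"), each = 50),  # hypothetical subject ids
  X2 = 1,                                      # constant column
  X4 = rnorm(n),
  X5 = rnorm(n)
)
toy$X6 <- toy$X4 + rnorm(n, sd = 0.05)         # near-duplicate of X4

# Keep only the numeric columns for the checks below
num <- toy[, sapply(toy, is.numeric)]

# Columns with a single distinct value carry no information
constant_cols <- names(num)[sapply(num, function(v) length(unique(v)) == 1)]

# Pairwise correlations among the non-constant columns;
# blank out the upper triangle so each pair is reported once
cm <- cor(num[, setdiff(names(num), constant_cols)])
cm[upper.tri(cm, diag = TRUE)] <- NA
high <- which(abs(cm) > 0.9, arr.ind = TRUE)   # 0.9 is an assumed cutoff
high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]]
)

# Number of observations per subject
obs_per_subject <- table(toy$id)
```

On this toy data, `constant_cols` flags X2 and `high_pairs` flags the X6/X4 pair. One variable of each flagged pair would then be dropped by hand.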

Exploratory Data Analysis

2. Unsupervised Learning Algorithms

Clustering                  How They Work
Hclust: Complete Linkage    Maximal intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the largest of these dissimilarities.
Hclust: Average Linkage     Mean intercluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the average of these dissimilarities.
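
Sketch: the two linkages computed by hand

The two linkage definitions above can be checked on a tiny example. This is a minimal sketch that assumes Euclidean distance and two made-up clusters `A` and `B`; it is not the data or the settings from the analysis. `dist()` and `hclust()` are from base R's stats package.

```r
# Two small, made-up clusters of 2-D points (illustrative only)
A <- rbind(c(0, 0), c(0, 1))
B <- rbind(c(3, 0), c(4, 0))
X <- rbind(A, B)

# All pairwise Euclidean distances, then the block of A-versus-B distances
d_all <- as.matrix(dist(X))
d_AB  <- d_all[1:nrow(A), nrow(A) + 1:nrow(B)]

complete_link <- max(d_AB)    # largest pairwise dissimilarity
average_link  <- mean(d_AB)   # mean pairwise dissimilarity

# The same two linkages as implemented by hclust()
hc_complete <- hclust(dist(X), method = "complete")
hc_average  <- hclust(dist(X), method = "average")
```

In this example the final merge joins A and B, so `max(hc_complete$height)` matches `complete_link` and `max(hc_average$height)` matches `average_link`.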
