Coursera Slidify Project

Disease Prediction Using R

How I created the dataset
Cleaning and rinsing the dataset
Data Mining
ML Models
Results

Dataset

The dataset was created manually by interviewing 50 different people.

    head(d1)

  ##                                     symptoms  disease
  ## 1 YELLOW DISCOLORATION OF WHITE PART OF EYES JAUNDICE
  ## 2 YELLOW DISCOLORATION OF WHITE PART OF SKIN JAUNDICE
  ## 3                               FEELING WEAK JAUNDICE
  ## 4                                  LOW FEVER JAUNDICE
  ## 5                            YELLOW PIGMENTS JAUNDICE
  ## 6                            YELLOWISH COLOR JAUNDICE

Data Cleaning

Empty values were removed.

The disease column was cast as a factor.

    colnames(d1)

## [1] "symptoms" "disease"
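
The two cleaning steps above could look roughly like the sketch below. It is only an illustration: the small `d1` built here is a hypothetical stand-in (a few rows from the `head(d1)` output plus made-up incomplete rows), and the slides do not say how the empty values were actually removed, so the filtering logic is an assumption.

```r
# Hypothetical stand-in for d1: rows from the head(d1) output plus
# made-up incomplete rows, so the cleaning has something to remove.
d1 <- data.frame(
  symptoms = c("FEELING WEAK", "LOW FEVER", "", NA, "YELLOW PIGMENTS"),
  disease  = c("JAUNDICE", "JAUNDICE", "JAUNDICE", "JAUNDICE", ""),
  stringsAsFactors = FALSE
)

# Keep only rows where both columns are present and non-blank.
# (Assumption: "empty" means NA or a blank string.)
complete <- !is.na(d1$symptoms) & !is.na(d1$disease) &
  nzchar(trimws(d1$symptoms)) & nzchar(trimws(d1$disease))
d1 <- d1[complete, ]

# Cast the disease column to a factor so models treat it as a class label.
d1$disease <- factor(d1$disease)

str(d1)
```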

ML Algorithms

Stop words were removed using the tm library.

A frequency table was built based on the occurrence of words.

TF-IDF weights were also calculated.

    library(tm)

## Loading required package: NLP

## 
## Attaching package: 'NLP'

## The following object is masked from 'package:ggplot2':
## 
##     annotate
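
The text-mining steps described in this section (stop word removal, a word frequency table, and TF-IDF) can be sketched with tm as follows. This is a minimal sketch under assumptions: the `symptoms` vector is a hypothetical sample taken from the `head(d1)` output, and the exact preprocessing used for the slides is not shown in the source.

```r
library(tm)

# Hypothetical sample of symptom strings, taken from the head(d1) output.
symptoms <- c("YELLOW DISCOLORATION OF WHITE PART OF EYES",
              "FEELING WEAK",
              "LOW FEVER",
              "YELLOWISH COLOR")

# Build a corpus, lower-case it, and remove English stop words.
corpus <- VCorpus(VectorSource(symptoms))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Document-term matrix: one row per symptom string, one column per word.
dtm <- DocumentTermMatrix(corpus)

# Frequency table: how often each word occurs across all documents.
word_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
print(word_freq)

# TF-IDF weighting of the same matrix.
tfidf <- weightTfIdf(dtm)
inspect(tfidf)
```

Words that appear in many symptom strings get a lower TF-IDF weight than words specific to a few strings, which is why the weighting is often preferred over raw counts as model input.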

Visualization

The diamonds dataset was used for visualization because my own data points were not well suited to plotting.
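
The original plotting code is not included in the slides, so the following is only a guess at what a plot of the diamonds dataset might look like. The choice of variables (`carat`, `price`, `cut`) and the scatter-plot layout are assumptions, not the author's actual chart.

```r
library(ggplot2)

# diamonds ships with ggplot2; the variables plotted here are an assumption.
p <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point(alpha = 0.3) +
  labs(title = "Diamond price by carat",
       x = "Carat", y = "Price (USD)", colour = "Cut")

print(p)
```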