The data set was created manually by interviewing 50 different people.
head(d1)
##                                     symptoms  disease
## 1 YELLOW DISCOLORATION OF WHITE PART OF EYES JAUNDICE
## 2 YELLOW DISCOLORATION OF WHITE PART OF SKIN JAUNDICE
## 3                               FEELING WEAK JAUNDICE
## 4                                  LOW FEVER JAUNDICE
## 5                            YELLOW PIGMENTS JAUNDICE
## 6                            YELLOWISH COLOR JAUNDICE
Empty values were removed.
The disease column was cast as a factor.
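The two cleaning steps above can be sketched roughly as follows. The data frame `d1_demo` and its rows are hypothetical stand-ins for `d1`, and treating both `NA` and blank strings as "empty" is an assumption; the original code for this step is not shown.

```r
# Hypothetical stand-in for d1 (the real data is not reproduced here)
d1_demo <- data.frame(
  symptoms = c("FEELING WEAK", "", "LOW FEVER", NA, "YELLOW PIGMENTS"),
  disease  = c("JAUNDICE", "JAUNDICE", "", "JAUNDICE", "JAUNDICE"),
  stringsAsFactors = FALSE
)

# Assumption: a row counts as "empty" if either column is NA or blank
keep <- !is.na(d1_demo$symptoms) & nzchar(trimws(d1_demo$symptoms)) &
        !is.na(d1_demo$disease)  & nzchar(trimws(d1_demo$disease))
d1_demo <- d1_demo[keep, ]

# Cast the disease column to a factor
d1_demo$disease <- as.factor(d1_demo$disease)
```

Filtering on an explicit logical vector keeps the rule for "empty" visible in one place, which makes it easy to adjust if blanks should be handled differently.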
colnames(d1)
## [1] "symptoms" "disease"
Stop words were removed using the tm library.
A frequency table was built from the word occurrences.
TF-IDF was also calculated.
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
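A minimal sketch of the text-processing steps described above, using tm. The vector `symptoms_demo` is a hypothetical stand-in for `d1$symptoms`, and the lowercasing step and the English stop-word list are assumptions; the report does not show which transformations were actually applied.

```r
library(tm)

# Hypothetical symptom strings standing in for d1$symptoms
symptoms_demo <- c("YELLOW DISCOLORATION OF WHITE PART OF EYES",
                   "FEELING WEAK",
                   "LOW FEVER")

# Build a corpus and remove stop words (lowercase first so they match)
corpus <- VCorpus(VectorSource(symptoms_demo))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Frequency table: raw term counts summed over all documents
dtm <- DocumentTermMatrix(corpus)
word_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)

# TF-IDF weighting of the same document-term matrix
dtm_tfidf <- weightTfIdf(dtm)
```

`word_freq` is a named numeric vector of counts per term, and `dtm_tfidf` has the same dimensions as `dtm` with TF-IDF weights in place of counts.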
The diamonds dataset is used for visualization,
because my own data points are not good enough for it.
