Edwige Talla Badjio
December 17, 2015
Obesity and Associated Health Risk Factors
Obesity, aesthetics
Class, end of semester, tools and techniques learned
understand available facts and relevant data
explore the correlations between those variables
perform a sentiment analysis
analysis: three different data sources
Initial format
2. Centers for Disease Control and Prevention (CDC), NHANES dataset
Subset the initial dataset: kept 14 of the 78 variables
Create a factor variable for BMI classification
Create a factor variable for Blood Pressure classification based on two variables (Diastolic and Systolic blood pressure)
Missing values (work with complete cases)
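The preparation steps above can be sketched in base R as follows. This is a minimal illustration, not the code used for the project: the toy data frame, its column names, the BMI cut-offs (18.5, 25, 30) and the blood pressure thresholds (120/80 and 140/90) are all assumptions chosen for the example.

```r
# Toy stand-in for the NHANES extract; every column name here is hypothetical.
nhanes_toy <- data.frame(
  ID        = 1:5,
  Age       = c(34, 51, 62, 46, 29),
  BMI       = c(17.9, 22.4, 31.2, NA, 27.5),
  Systolic  = c(110, 125, 150, 118, NA),
  Diastolic = c(70, 75, 95, 85, 72),
  Unused    = c("a", "b", "c", "d", "e")
)

# 1. Subset: keep only the variables of interest.
keep <- c("Age", "BMI", "Systolic", "Diastolic")
dat <- nhanes_toy[, keep]

# 2. Factor variable for BMI classification (assumed cut-offs).
#    right = FALSE makes each interval closed on the left, e.g. [18.5, 25).
dat$BMI_class <- cut(
  dat$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right  = FALSE
)

# 3. Factor variable for blood pressure classification, built from
#    both systolic and diastolic readings (assumed thresholds).
high     <- dat$Systolic >= 140 | dat$Diastolic >= 90
elevated <- dat$Systolic >= 120 | dat$Diastolic >= 80
dat$BP_class <- factor(
  ifelse(high, "Hypertension",
         ifelse(elevated, "Prehypertension", "Normal")),
  levels = c("Normal", "Prehypertension", "Hypertension")
)

# 4. Missing values: keep complete cases only.
dat_complete <- dat[complete.cases(dat), ]
print(dat_complete)
```

Rows with a missing BMI or blood pressure reading end up with an NA class and are dropped in the last step.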
Survey Data (library psych)
Predicting the probability of being diagnosed with hypertension based on age, BMI, diabetes, and weight
Is BMI a good predictor of hypertension and hyperlipidemia?
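One common way to model such a probability is logistic regression. The sketch below assumes that approach and runs on simulated data, so the variable names, the simulated effect sizes and the example person are all hypothetical and say nothing about the NHANES results.

```r
# Simulated data only: names and effect sizes are assumptions for illustration.
set.seed(42)
n <- 300
toy <- data.frame(
  Age      = round(runif(n, 20, 80)),
  BMI      = round(rnorm(n, mean = 28, sd = 5), 1),
  Diabetes = rbinom(n, 1, 0.15)
)
# Weight derived from BMI and a random height in metres (weight = BMI * height^2).
toy$Weight <- round(toy$BMI * rnorm(n, mean = 1.70, sd = 0.08)^2, 1)

# Simulate the diagnosis so that risk rises with age, BMI and diabetes.
p <- plogis(-6 + 0.05 * toy$Age + 0.08 * toy$BMI + 0.5 * toy$Diabetes)
toy$Hypertension <- rbinom(n, 1, p)

# Logistic regression: binomial family with the default logit link.
fit <- glm(Hypertension ~ Age + BMI + Diabetes + Weight,
           data = toy, family = binomial)
print(summary(fit)$coefficients)

# Predicted probability of a hypertension diagnosis for one hypothetical person.
new_person <- data.frame(Age = 55, BMI = 31, Diabetes = 1, Weight = 92)
prob <- unname(predict(fit, newdata = new_person, type = "response"))
print(prob)
```

Whether BMI is a good predictor can then be judged from its coefficient and from how much the fit degrades when BMI is removed from the formula.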
3. Social Media data, Twitter
upload to GitHub
RWeka (NGramTokenizer, term document matrix for 2-grams)
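To show what a term-document matrix for 2-grams contains, here is a base R stand-in. It does not call RWeka's NGramTokenizer; the `bigrams` helper and the two example tweets are hypothetical and only mimic the idea on a tiny scale.

```r
# Hypothetical helper: split a text into lowercase words and pair neighbours.
bigrams <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(words[-length(words)], words[-1])
}

# Made-up example tweets.
tweets <- c("Obesity is a health risk", "Health risk factors matter")

# One vector of 2-grams per document.
bg <- lapply(tweets, bigrams)

# Term-document matrix: rows are 2-grams, columns are documents.
terms <- sort(unique(unlist(bg)))
tdm <- sapply(bg, function(b) table(factor(b, levels = terms)))
colnames(tdm) <- paste0("doc", seq_along(tweets))
print(tdm)

# Total frequency of each 2-gram across all documents.
freq <- sort(rowSums(tdm), decreasing = TRUE)
print(freq)
```

With real tweets, the same matrix is what the tokenizer and term-document-matrix step produce before frequent 2-grams are inspected.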
Discovery: Survey data
Challenges
New packages: psych (survey data), NHANES (CDC data), RWeka (data mining), GGally, party, partykit (visualization)
Perspectives: finish the sentiment analysis; work with data from other sources such as the news; text-mine diet success stories; library(GGally) for parallel coordinate plots