To accomplish this task, I have broken my project down into three steps.
Step 1 : Develop a Patient Cohort - In this step, I will use the openSNP database to identify individuals who have publicly released their genomic data and reported their Type 2 Diabetes status.
Step 2 : Once I have compiled a list of individuals, I will use software such as OpenCRAVAT to annotate their genomic data and identify the genetic variation in each individual's DNA sequence. Furthermore, I will consult previous publications and literature to further develop my list of features.
Step 3 : Once I have developed a list of variants, I will employ machine learning techniques, such as Logistic Regression and Random Forest, to develop a classification model that analyzes the DNA sequence as well as the characteristics of each genetic mutation to predict whether an individual is at risk of developing Type 2 Diabetes.
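
As a rough illustration of how Step 3 might look in practice, the sketch below encodes each individual's genotypes as risk-allele counts and compares Logistic Regression with Random Forest using cross-validation in scikit-learn. Everything specific in it is an assumption made for illustration: the rsIDs and risk alleles in RISK_ALLELES are placeholders rather than real findings, the cohort is randomly generated so the code can run end to end, and the function names (encode_genotypes, make_synthetic_cohort, evaluate_models) are hypothetical. The real features would come from the annotated variants in Step 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical variant list: rsID -> assumed risk allele.
# These are placeholders, NOT real Type 2 Diabetes associations.
RISK_ALLELES = {"rs0000001": "A", "rs0000002": "T", "rs0000003": "G"}


def encode_genotypes(genotypes):
    """Convert {rsID: genotype string such as 'AG'} into risk-allele counts.

    Each feature is 0, 1, or 2. A missing or uncalled genotype also yields 0
    here, which is a simplification; real data would need explicit handling.
    """
    return [genotypes.get(rsid, "").count(allele)
            for rsid, allele in RISK_ALLELES.items()]


def make_synthetic_cohort(n_individuals=200, seed=0):
    """Generate random features and labels purely so the sketch can run."""
    rng = np.random.default_rng(seed)
    # Risk-allele counts (0, 1, or 2) for each individual and variant.
    X = rng.integers(0, 3, size=(n_individuals, len(RISK_ALLELES)))
    # Assumed toy relationship: more risk alleles -> higher chance of a case.
    prob_case = 1.0 / (1.0 + np.exp(-(X.sum(axis=1) - 3.0)))
    y = (rng.random(n_individuals) < prob_case).astype(int)
    return X, y


def evaluate_models(X, y, folds=5):
    """Return the mean cross-validated ROC AUC for each classifier."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=folds, scoring="roc_auc").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    X, y = make_synthetic_cohort()
    for name, auc in evaluate_models(X, y).items():
        print(f"{name}: mean ROC AUC = {auc:.3f}")
```

Counting risk alleles is only one possible encoding. Characteristics of each mutation taken from the annotation step could be added as extra columns of the same feature matrix without changing the evaluation code.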