Prediction of Type 2 Diabetes by Analysis of Genetic Variance

Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes

Determined 143 Risk Variants
139 common / 4 rare

Development of a Predictive Model for Type 2 Diabetes Mellitus Using Genetic and Clinical Data

Identified 499 SNPs from 87 T2D-related genes
Classification Algorithms
- Logistic Regression
- K-Nearest Neighbors

To understand the relationship between Type 2 Diabetes (T2D) and an individual's genetic makeup, I reviewed the literature on the subject.

Two articles were especially helpful. The paper “Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes” catalogs specific genetic variants that have been studied and classifies them by how frequently they occur. Through their research, the authors identified 143 risk variants related to T2D, of which 139 are common and 4 are rare.

The second paper, “Development of a Predictive Model for Type 2 Diabetes Mellitus Using Genetic and Clinical Data”, uses an individual's genetic makeup to build a classification algorithm that identifies whether that individual is at risk. The group identified 499 SNPs from 87 T2D-related genes and used algorithms such as logistic regression and k-nearest neighbors to classify individuals.

Type 2 Diabetes Patient Cohort
Name	Variation	Genotype Data
Samantha B. Clark	normal	http://opensnp.org/data/8.23andme.2
Daniel Goldowitz	no	http://opensnp.org/data/11.23andme.176
Lb	Gestational diabetes	http://opensnp.org/data/14.23andme.6
Maureen Markov	no	http://opensnp.org/data/17.23andme.143
John Lloyd Scharf	Diabetes Mellitus [T2D]	http://opensnp.org/data/141.23andme.60

Genomic Data for Type 2 Diabetes Patient
rsID	Chromosome	Position	Genotype
rs4477212	1	72017	AA
rs3094315	1	742429	AG
rs3131972	1	742584	AG
rs12124819	1	766409	AA
rs11240777	1	788822	AA
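
Loading a genotype file of this shape can be sketched as follows. This is a minimal sketch, not the code used for the results here: it assumes the file is tab-separated with the four columns shown above and that header lines begin with `#`, as in 23andMe raw-data exports. The names `Call`, `parse_genotypes`, and `SAMPLE` are hypothetical and introduced only for illustration.

```python
import csv
import io
from typing import Dict, Iterable, NamedTuple


class Call(NamedTuple):
    chromosome: str  # kept as text, since chromosomes may be X, Y, or MT
    position: int
    genotype: str


def parse_genotypes(lines: Iterable[str]) -> Dict[str, Call]:
    """Map each rsID to its chromosome, position, and genotype call."""
    # Skip blank lines and '#' comment lines (assumed header convention).
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    calls = {}
    for rsid, chrom, pos, genotype in csv.reader(data_lines, delimiter="\t"):
        calls[rsid] = Call(chrom, int(pos), genotype)
    return calls


# Inline sample built from the first three rows of the table above.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t72017\tAA\n"
    "rs3094315\t1\t742429\tAG\n"
    "rs3131972\t1\t742584\tAG\n"
)

calls = parse_genotypes(io.StringIO(SAMPLE))
print(calls["rs3094315"].genotype)  # AG
```

To read a downloaded file instead of the inline sample, an open file handle can be passed to `parse_genotypes` in place of the `io.StringIO` object.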

Tabulated Genotype Counts
.outcome	AA	AC	AG	AT
no	55397	4385	29355	2
yes	88046	16231	72008	294
no	88710	16477	71883	312
no	89002	15283	65806	441
no	89467	16773	73171	300
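
The tabulation step reduces each individual's genotype calls to one row of counts per genotype. The sketch below illustrates the idea under the assumption that the calls are available as a list of two-letter strings; `tabulate_genotypes` and the column order passed to it are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def tabulate_genotypes(genotypes: Iterable[str],
                       columns: Sequence[str]) -> List[int]:
    """Count how often each genotype in `columns` occurs for one individual."""
    counts = Counter(genotypes)
    # A Counter returns 0 for genotypes that never occur.
    return [counts[g] for g in columns]


# Genotype calls of the five example SNPs listed in the genomic data table.
example_calls = ["AA", "AG", "AG", "AA", "AA"]
row = tabulate_genotypes(example_calls, columns=["AA", "AC", "AG", "AT"])
print(row)  # [3, 0, 2, 0]
```

Running this over every individual and prepending the outcome label would give one row per person in the same layout as the table above.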

Random Forest Results (Preliminary)
mtry	ROC	Sens	Spec
2	0.70	0.55	0.75
15	0.68	0.55	0.75
29	0.68	0.60	0.70
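
The Sens and Spec columns report sensitivity and specificity. As a reminder of how these are computed from predictions, here is a minimal sketch. It assumes that "yes" is treated as the positive class, which the results above do not state; the function name and the labels in the example are made up for illustration and are not the study's data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(actual: Sequence[str],
                            predicted: Sequence[str],
                            positive: str = "yes") -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a two-class problem.

    Assumes both classes occur at least once in `actual`.
    """
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # true positive
            else:
                fn += 1  # false negative
        else:
            if p == positive:
                fp += 1  # false positive
            else:
                tn += 1  # true negative
    sensitivity = tp / (tp + fn)  # share of actual positives that were found
    specificity = tn / (tn + fp)  # share of actual negatives that were found
    return sensitivity, specificity


# Made-up labels, for illustration only.
actual = ["yes", "yes", "no", "no", "no", "no"]
predicted = ["yes", "no", "no", "no", "yes", "no"]
sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec)  # 0.5 0.75
```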

K-Nearest Neighbors Results (Preliminary)
k	ROC	Sens	Spec
5	0.60	0.50	0.65
7	0.68	0.65	0.70
9	0.75	0.60	0.75
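
For readers unfamiliar with the method, k-nearest neighbors classifies an individual by majority vote among the k closest training examples. The sketch below is a plain-Python illustration, not the implementation that produced the results above. It assumes Euclidean distance on the raw genotype counts, uses the five tabulated rows as training data, and uses k = 3 because five rows are too few for the k values in the table. `knn_predict`, `TRAIN`, and `QUERY` are hypothetical names, and the query vector is invented.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, outcome label)


def knn_predict(train: List[Example], query: Sequence[float], k: int) -> str:
    """Predict a label by majority vote among the k nearest training rows."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    # With two classes, an odd k avoids tied votes.
    return votes.most_common(1)[0][0]


# (AA, AC, AG, AT) counts and outcomes from the tabulated genotype counts.
TRAIN: List[Example] = [
    ((55397, 4385, 29355, 2), "no"),
    ((88046, 16231, 72008, 294), "yes"),
    ((88710, 16477, 71883, 312), "no"),
    ((89002, 15283, 65806, 441), "no"),
    ((89467, 16773, 73171, 300), "no"),
]

# Invented query vector, for illustration only.
QUERY = (88000, 16000, 72000, 300)
print(knn_predict(TRAIN, QUERY, k=3))  # no
```

With k = 1 the same query is labeled "yes", because its single nearest neighbor is the "yes" row; this sensitivity to k is why the table reports several values.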

Background

Description

Prevalence

Research

Methods

Proposition

Approach

Results (Preliminary)

Develop a Patient Cohort

Load Genotype Data

Tabulate Data

Random Forest

K-Nearest Neighbors

Next Steps