Overview: Using a Kaggle dataset to visualize trends between pregnancy, BMI levels, and diabetes, and training a logistic regression model to predict whether a patient has diabetes.

First we’ll load all necessary libraries and look at the structure of the data set, identifying the information we have to work with.

library(readr)
library(caTools)
library(caret)
library(e1071)
library(tidyverse)
library(ggplot2)
library(reshape2)

data <- read.csv("diabetes.csv")
head(data)

Next we’ll summarize the data and check for missing values. Although there are no NAs in the data set, several features (Glucose, BloodPressure, SkinThickness, Insulin, and BMI) have a minimum of zero, which most likely represents missing or unrecorded measurements rather than real values.

summary(data)
  Pregnancies        Glucose      BloodPressure    SkinThickness  
 Min.   : 0.000   Min.   :  0.0   Min.   :  0.00   Min.   : 0.00  
 1st Qu.: 1.000   1st Qu.: 99.0   1st Qu.: 62.00   1st Qu.: 0.00  
 Median : 3.000   Median :117.0   Median : 72.00   Median :23.00  
 Mean   : 3.845   Mean   :120.9   Mean   : 69.11   Mean   :20.54  
 3rd Qu.: 6.000   3rd Qu.:140.2   3rd Qu.: 80.00   3rd Qu.:32.00  
 Max.   :17.000   Max.   :199.0   Max.   :122.00   Max.   :99.00  
    Insulin           BMI        DiabetesPedigreeFunction
 Min.   :  0.0   Min.   : 0.00   Min.   :0.0780          
 1st Qu.:  0.0   1st Qu.:27.30   1st Qu.:0.2437          
 Median : 30.5   Median :32.00   Median :0.3725          
 Mean   : 79.8   Mean   :31.99   Mean   :0.4719          
 3rd Qu.:127.2   3rd Qu.:36.60   3rd Qu.:0.6262          
 Max.   :846.0   Max.   :67.10   Max.   :2.4200          
      Age           Outcome     
 Min.   :21.00   Min.   :0.000  
 1st Qu.:24.00   1st Qu.:0.000  
 Median :29.00   Median :0.000  
 Mean   :33.24   Mean   :0.349  
 3rd Qu.:41.00   3rd Qu.:1.000  
 Max.   :81.00   Max.   :1.000  
data %>% is.na() %>% colSums()
             Pregnancies                  Glucose 
                       0                        0 
           BloodPressure            SkinThickness 
                       0                        0 
                 Insulin                      BMI 
                       0                        0 
DiabetesPedigreeFunction                      Age 
                       0                        0 
                 Outcome 
                       0 
X <- data[, 1:8]
Y <- data[, 9]
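
To make the zero-as-missing issue concrete, here is a minimal sketch that counts zeros per column and recodes them as NA. It uses a small made-up data frame (`toy`) rather than `diabetes.csv`, and the list of columns to recode (`zero_as_na`) is an assumption for illustration, not a step taken in this notebook.

```r
# Toy stand-in for the diabetes data (values are made up for illustration)
toy <- data.frame(
  Glucose = c(148, 85, 0, 89),
  Insulin = c(0, 0, 94, 168),
  BMI     = c(33.6, 26.6, 0, 28.1)
)

# Count how many zeros each column contains
zero_counts <- colSums(toy == 0)
zero_counts

# Recode zeros as NA in columns where a zero is not a plausible measurement
# (assumed column list, chosen for this example)
zero_as_na <- c("Glucose", "Insulin", "BMI")
toy_clean <- toy
toy_clean[zero_as_na] <- lapply(toy_clean[zero_as_na],
                                function(x) replace(x, x == 0, NA))

# The zeros now show up in the NA check
colSums(is.na(toy_clean))
```

Once recoded, the hidden gaps appear in the same `is.na()` check used above, and one can decide whether to impute or drop them.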

Let’s do some visualization to help identify which variables may be most predictive of diabetes. In the heatmap, Glucose and BMI show the strongest positive correlations with the outcome variable.

cor_matrix <- cor(data)
cor_melted <- melt(cor_matrix)

ggplot(cor_melted, aes(Var1, Var2, fill=value)) + 
  geom_tile(color="white") + 
  scale_fill_gradient2(low="blue",high="red", mid="white", limit=c(-1,1), name="Correlation") +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(title="Correlation Heatmap", x="Features", y="Features")
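
A heatmap is easy to scan but hard to read exact values from. As a complementary sketch, the correlations with the outcome can be pulled out of the matrix and sorted. The data frame `toy` below is made up so the example runs on its own; with the real data one would pass `data` instead.

```r
# Made-up data so the example is self-contained
toy <- data.frame(
  Glucose       = c(90, 100, 110, 150, 160, 170),
  BMI           = c(22, 30, 25, 31, 28, 35),
  BloodPressure = c(70, 72, 68, 71, 69, 70),
  Outcome       = c(0, 0, 0, 1, 1, 1)
)

# Take the Outcome column of the correlation matrix and drop the
# trivial self-correlation
cor_with_outcome <- cor(toy)[, "Outcome"]
cor_with_outcome <- cor_with_outcome[names(cor_with_outcome) != "Outcome"]

# Sort from strongest positive to weakest
cor_with_outcome <- sort(cor_with_outcome, decreasing = TRUE)
round(cor_with_outcome, 2)
```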

We can also visualize the distribution of the outcome variable, which is clearly imbalanced: only about 35% of patients have diabetes. The imbalance can affect model performance and will be addressed later by resampling with the ROSE package.
Outcome = 0 -> the patient does not have diabetes.
Outcome = 1 -> the patient does have diabetes.

outcome_count <- table(data$Outcome)
outcome_df <- data.frame(Outcome = names(outcome_count),
                         Count = as.numeric(outcome_count))
ggplot(outcome_df, aes(x=Outcome, y=Count)) + 
  geom_bar(stat="identity", fill="light pink") +
  ggtitle("Distribution of Diabetes Outcomes")
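
The imbalance can also be expressed as proportions. The sketch below assumes class counts of 500 and 268, which are reconstructed from the summary above (768 rows with a mean Outcome of 0.349) rather than read from the file, so treat them as an assumption.

```r
# Assumed class counts, reconstructed from the summary statistics
outcome <- rep(c(0, 1), times = c(500, 268))

# Share of each class
props <- prop.table(table(outcome))
round(props, 3)

# Accuracy of a trivial model that always predicts the majority class;
# any real model should be judged against this baseline
majority_baseline <- max(props)
round(majority_baseline, 3)
```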

This histogram shows the distribution of the number of pregnancies among patients, separated by diabetes outcome. We can see that patients with diabetes tend to have a higher number of pregnancies. Also notice the distribution is positively skewed. This indicates that most patients have fewer pregnancies, while a small number have significantly more.

ggplot(data, aes(x=Pregnancies, fill=factor(Outcome))) + 
         geom_histogram(bins=30, col="black") + facet_wrap(~Outcome, scales="free_y") +
  ggtitle("Distribution of Pregnancies by Outcomes")
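
Positive skew can be checked numerically as well as visually. This is a minimal sketch using the moment-based skewness formula in base R; the `pregnancies` vector is made up for illustration and is not taken from the data set.

```r
# Made-up counts with a long right tail
pregnancies <- c(0, 0, 1, 1, 1, 2, 2, 3, 4, 6, 10, 17)

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 3/2
m <- mean(pregnancies)
skewness <- mean((pregnancies - m)^3) / mean((pregnancies - m)^2)^1.5

# For a right-skewed variable the mean sits above the median
# and the skewness is positive
c(mean = m, median = median(pregnancies), skewness = skewness)
```

The same pattern is visible in the summary above, where the mean number of pregnancies (3.845) exceeds the median (3).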

The box plot shows the distribution of BMI values for diabetic and non-diabetic patients. Patients with diabetes tend to have higher BMI scores, which is expected given the known risk factors.

ggplot(data, aes(x=factor(Outcome), y=BMI, fill=factor(Outcome))) +
  geom_boxplot() + ylab("BMI") + ggtitle("BMI Distribution by Outcome")
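
The center line of each box is the group median, which can also be computed directly. The sketch below uses made-up BMI values in a hypothetical `toy` data frame; with the real data the same call would use `data$BMI` and `data$Outcome`.

```r
# Made-up BMI values for the two outcome groups
toy <- data.frame(
  BMI     = c(22.1, 25.3, 27.8, 30.1, 28.4, 33.6, 35.2, 41.0),
  Outcome = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Median BMI per outcome group (the middle line of each box)
medians <- tapply(toy$BMI, toy$Outcome, median)
medians
```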

Let’s get ready to build and train our prediction model. First we’ll split up the features into X and the outcome into Y, forming the final data set by scaling the features using z-score standardization. We’ll use logistic regression to predict whether a patient has diabetes.

X <- data[, 1:8]
Y <- data[, 9]

scaled_X <- as.data.frame(scale(X))
scaled_data <- cbind(scaled_X, Y)

X <- scaled_data[, 1:8]
Y <- scaled_data[, 9]

set.seed(123)
# sample.split() flags about 70% of the rows as TRUE (training set), preserving the class ratio in Y
sample <- sample.split(Y, SplitRatio = 0.7)
X_train <- X[sample == TRUE, ]
Y_train <- Y[sample == TRUE]
X_test <- X[sample == FALSE, ]
Y_test <- Y[sample == FALSE]
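
The code above standardizes all rows before splitting, so the test rows contribute to the means and standard deviations. A common refinement, not used in this notebook, is to compute the scaling parameters on the training rows only and reuse them for the test rows. The following is a sketch under that assumption, with simulated data and a plain `sample()` split standing in for `sample.split()`.

```r
set.seed(123)

# Simulated features, purely for illustration
toy <- data.frame(x1 = rnorm(100, mean = 50, sd = 10),
                  x2 = rnorm(100, mean = 5,  sd = 2))

# Simple 70/30 split by row index
train_idx <- sample(seq_len(nrow(toy)), size = 70)
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit the z-score parameters on the training rows only
train_scaled <- scale(train)
centers <- attr(train_scaled, "scaled:center")
scales  <- attr(train_scaled, "scaled:scale")

# Apply the training parameters to the test rows
test_scaled <- scale(test, center = centers, scale = scales)

# Training columns have mean 0; test columns are only close to 0
round(colMeans(train_scaled), 3)
round(colMeans(test_scaled), 3)
```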

Next we train the model and evaluate its performance using a confusion matrix.

log_model <- glm(Y_train ~ ., data=X_train, family=binomial)
summary(log_model)

Call:
glm(formula = Y_train ~ ., family = binomial, data = X_train)

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)               -0.8521     0.1185  -7.192 6.38e-13 ***
Pregnancies                0.3455     0.1354   2.551   0.0107 *  
Glucose                    1.4154     0.1592   8.892  < 2e-16 ***
BloodPressure             -0.2875     0.1364  -2.108   0.0350 *  
SkinThickness              0.1272     0.1404   0.906   0.3649    
Insulin                   -0.3864     0.1447  -2.669   0.0076 ** 
BMI                        0.7028     0.1496   4.699 2.61e-06 ***
DiabetesPedigreeFunction   0.2759     0.1279   2.158   0.0309 *  
Age                        0.1921     0.1392   1.380   0.1675    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 696.28  on 537  degrees of freedom
Residual deviance: 477.81  on 529  degrees of freedom
AIC: 495.81

Number of Fisher Scoring iterations: 5
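
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. Because the features were standardized, each ratio describes the effect of a one standard deviation increase. The sketch below copies the estimates from the summary above into a vector (`coefs`); with the fitted object one would use `exp(coef(log_model))` instead.

```r
# Estimates copied from the model summary above (intercept omitted)
coefs <- c(Pregnancies = 0.3455, Glucose = 1.4154, BloodPressure = -0.2875,
           SkinThickness = 0.1272, Insulin = -0.3864, BMI = 0.7028,
           DiabetesPedigreeFunction = 0.2759, Age = 0.1921)

# Odds ratio per one standard deviation increase in each feature
odds_ratios <- exp(coefs)
round(sort(odds_ratios, decreasing = TRUE), 2)
```

Read this way, a one standard deviation increase in Glucose multiplies the odds of diabetes by roughly 4, holding the other features fixed.
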
predictions <- predict(log_model, newdata = X_test, type="response")
predictions <- factor(ifelse(predictions > 0.5, 1, 0), 
                      levels = levels(as.factor(Y_test)))

confusionMatrix(predictions, as.factor(Y_test)) 
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 127  37
         1  23  43
                                          
               Accuracy : 0.7391          
                 95% CI : (0.6773, 0.7946)
    No Information Rate : 0.6522          
    P-Value [Acc > NIR] : 0.002949        
                                          
                  Kappa : 0.4005          
                                          
 Mcnemar's Test P-Value : 0.093290        
                                          
            Sensitivity : 0.8467          
            Specificity : 0.5375          
         Pos Pred Value : 0.7744          
         Neg Pred Value : 0.6515          
             Prevalence : 0.6522          
         Detection Rate : 0.5522          
   Detection Prevalence : 0.7130          
      Balanced Accuracy : 0.6921          
                                          
       'Positive' Class : 0               
                                          

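Note that caret treats class 0 (no diabetes) as the 'Positive' class here, so the sensitivity of 0.8467 is the share of non-diabetic patients correctly identified, while the specificity of 0.5375 is the share of diabetic patients correctly identified. The following predictions were made: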
- 23 patients incorrectly predicted to have diabetes
- 37 patients incorrectly predicted not to have diabetes
- 43 patients correctly predicted to have diabetes
- 127 patients correctly predicted not to have diabetes
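
The headline metrics can be reproduced by hand from these four counts, which makes it easier to see what each one measures. This sketch rebuilds the table as a plain matrix (`cm`) and treats class 1 (diabetes) as the class of interest, which is a choice made here for illustration and differs from the caret output above.

```r
# Confusion matrix from the output above: rows = prediction, cols = actual
cm <- matrix(c(127, 23, 37, 43), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Overall accuracy: correct predictions over all predictions
accuracy <- sum(diag(cm)) / sum(cm)

# Recall for each actual class
recall_diabetic    <- cm["1", "1"] / sum(cm[, "1"])
recall_nondiabetic <- cm["0", "0"] / sum(cm[, "0"])

# Balanced accuracy is the mean of the two recalls
balanced_accuracy <- mean(c(recall_diabetic, recall_nondiabetic))

round(c(accuracy = accuracy,
        recall_diabetic = recall_diabetic,
        recall_nondiabetic = recall_nondiabetic,
        balanced_accuracy = balanced_accuracy), 4)
```

Only 43 of the 80 diabetic patients in the test set are detected, which is the weakness the resampling step tries to address.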

m<-confusionMatrix(predictions, as.factor(Y_test)) 
prediction_results <- as.table(m)
matrix_df <- as.data.frame(prediction_results)
ggplot(matrix_df, aes(x=Reference, y=Prediction, fill=Freq)) + 
  geom_tile() +
  geom_text(aes(label = Freq), color = "black", size = 6) +  
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Confusion Matrix Heatmap", x = "Actual", y = "Predicted")

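To address class imbalance, we’ll use the ROSE package (Random Over-Sampling Examples) to generate a synthetic, balanced data set of 1,500 rows with roughly equal numbers of each class. ROSE is related to, but not the same as, SMOTE (synthetic minority over-sampling technique); the variable names below keep the `smote` prefix. Balancing the classes should help the model learn a decision boundary that is less biased toward the majority class.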

library(ROSE)
set.seed(199)

smote_data <- ROSE(Outcome ~ ., data = data, N = 1500, p = 0.5)$data
outcome_count_smote <- table(smote_data$Outcome)
outcome_df_smote <- data.frame(Outcome = names(outcome_count_smote),
                         Count = as.numeric(outcome_count_smote))

Now the distribution of outcomes is significantly more balanced.

library(gridExtra)
p1 <- ggplot(outcome_df, aes(x=Outcome, y=Count)) + 
  geom_bar(stat="identity", fill="light pink") +
  ggtitle("Distribution of Diabetes Outcomes")
p2 <- ggplot(outcome_df_smote, aes(x=Outcome, y=Count)) + 
  geom_bar(stat="identity", fill="light pink") +
  ggtitle("Distribution of Diabetes Outcomes (ROSE data)")
grid.arrange(p1,p2)
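
The resampling above is applied to the full data set. An alternative workflow, not the one used in this notebook, is to hold out a test set first and balance only the training rows, so that evaluation is done on untouched data. The sketch below illustrates that idea with plain random oversampling in base R (duplicating minority rows) instead of ROSE's synthetic generation; the data and all names are made up.

```r
set.seed(199)

# Simulated imbalanced data, purely for illustration
toy <- data.frame(x = rnorm(100), Outcome = rep(c(0, 1), times = c(70, 30)))

# Hold out a test set before any resampling
test_idx <- sample(seq_len(nrow(toy)), size = 30)
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

# Find the minority class in the training set and how many rows it lacks
counts   <- table(train$Outcome)
minority <- names(counts)[which.min(counts)]
n_extra  <- max(counts) - min(counts)

# Draw extra minority rows with replacement and append them
minority_rows <- which(train$Outcome == minority)
picked <- minority_rows[sample.int(length(minority_rows), n_extra, replace = TRUE)]
train_balanced <- rbind(train, train[picked, ])

# Training classes are now equal in size; the test set is unchanged
table(train_balanced$Outcome)
table(test$Outcome)
```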

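Also observe that the distribution of pregnancies within each outcome becomes smoother and more symmetrical, resembling a normal distribution. This is a side effect of how ROSE works: it draws synthetic points from a smoothed (kernel) density around the observed rows, so a count variable like Pregnancies can take non-integer or even negative values. The smoother shape reflects the resampling method and should not be read as evidence that the data set is more representative.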

p1 <- ggplot(data, aes(x=Pregnancies, fill=factor(Outcome))) + 
         geom_histogram(bins=30, col="black") + facet_wrap(~Outcome, scales="free_y") +
  ggtitle("Distribution of Pregnancies by Outcomes")

p2 <- ggplot(smote_data, aes(x=Pregnancies, fill=factor(Outcome))) + 
         geom_histogram(bins=30, col="black") + facet_wrap(~Outcome, scales="free_y") +
  ggtitle("Distribution of Pregnancies by Outcomes (ROSE data)")

grid.arrange(p1,p2)

After applying ROSE and retraining the model, we can see an improvement in accuracy and a better balance between the per-class metrics.

X_primed <- smote_data[, 1:8]
Y_primed <- smote_data[, 9]

scaled_X_primed <- as.data.frame(scale(X_primed))
scaled_data_smote <- cbind(scaled_X_primed, Y_primed)

X_primed <- scaled_data_smote[, 1:8]
Y_primed <- scaled_data_smote[, 9]

set.seed(123)
sample <- sample.split(Y_primed, SplitRatio = 0.7)
X_train <- X_primed[sample == TRUE, ]
Y_train <- Y_primed[sample == TRUE]
X_test <- X_primed[sample == FALSE, ]
Y_test <- Y_primed[sample == FALSE]

log_model <- glm(Y_train ~ ., data=X_train, family=binomial)
summary(log_model)

Call:
glm(formula = Y_train ~ ., family = binomial, data = X_train)

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -0.01366    0.07221  -0.189 0.850001    
Pregnancies               0.36455    0.08104   4.498 6.85e-06 ***
Glucose                   0.94790    0.08806  10.764  < 2e-16 ***
BloodPressure            -0.15323    0.07888  -1.943 0.052069 .  
SkinThickness            -0.14594    0.08203  -1.779 0.075243 .  
Insulin                  -0.06331    0.07993  -0.792 0.428351    
BMI                       0.57902    0.08438   6.862 6.77e-12 ***
DiabetesPedigreeFunction  0.32377    0.07714   4.197 2.70e-05 ***
Age                       0.29627    0.08122   3.648 0.000264 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1455.4  on 1049  degrees of freedom
Residual deviance: 1146.4  on 1041  degrees of freedom
AIC: 1164.4

Number of Fisher Scoring iterations: 4
predictions <- predict(log_model, newdata = X_test, type="response")
predictions <- factor(ifelse(predictions > 0.5, 1, 0), 
                      levels = levels(as.factor(Y_test)))

m <- confusionMatrix(predictions, as.factor(Y_test)) 
m
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 178  53
         1  51 168
                                          
               Accuracy : 0.7689          
                 95% CI : (0.7271, 0.8071)
    No Information Rate : 0.5089          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.5376          
                                          
 Mcnemar's Test P-Value : 0.9219          
                                          
            Sensitivity : 0.7773          
            Specificity : 0.7602          
         Pos Pred Value : 0.7706          
         Neg Pred Value : 0.7671          
             Prevalence : 0.5089          
         Detection Rate : 0.3956          
   Detection Prevalence : 0.5133          
      Balanced Accuracy : 0.7687          
                                          
       'Positive' Class : 0               
                                          

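The model’s accuracy has risen from 73.9% to 76.9%, and the detection of diabetic patients has improved markedly: the share of actual diabetics correctly identified (reported as specificity, because caret treats class 0 as the positive class) went from 0.5375 to 0.7602. Keep in mind that the resampling was applied before the train/test split, so the second test set also consists of synthetic, balanced data and the two results are not strictly comparable. While logistic regression is a simple model, it provides a solid baseline for future experimentation with more complex algorithms.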

prediction_results <- as.table(m)
matrix_df <- as.data.frame(prediction_results)

ggplot(matrix_df, aes(x=Reference, y=Prediction, fill=Freq)) +
  geom_tile() +
  geom_text(aes(label = Freq), color = "black", size = 6) +
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Confusion Matrix Heatmap", x = "Actual", y = "Predicted")

The final prediction results are as follows:
- 51 patients incorrectly predicted to have diabetes
- 53 patients incorrectly predicted not to have diabetes
- 168 patients correctly predicted to have diabetes
- 178 patients correctly predicted not to have diabetes
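
To compare the two models on equal terms, it helps to look at recall per class rather than accuracy alone. The sketch below defines a small helper, `recall_by_class()` (a name introduced here for illustration), and applies it to the two confusion matrices reported in this notebook.

```r
# Per-class recall: correct predictions for a class over its actual count.
# Expects rows = prediction, columns = actual, in the same class order.
recall_by_class <- function(cm) diag(cm) / colSums(cm)

classes <- c("0", "1")

# Counts copied from the two confusion matrices above
cm_original <- matrix(c(127, 23, 37, 43), nrow = 2,
                      dimnames = list(Prediction = classes, Reference = classes))
cm_resampled <- matrix(c(178, 51, 53, 168), nrow = 2,
                       dimnames = list(Prediction = classes, Reference = classes))

# One row per model, one column per actual class
comparison <- rbind(original  = recall_by_class(cm_original),
                    resampled = recall_by_class(cm_resampled))
round(comparison, 4)
```

Recall for diabetic patients (column 1) rises while recall for non-diabetic patients (column 0) falls, which is the usual trade-off when classes are balanced.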

Dataset:
https://www.kaggle.com/datasets/mathchi/diabetes-data-set/data
