Import data

## # A tibble: 507 × 10
##    Sample `SNP Score` `Obesity Score` `CVD Score` DiabetesScore `High BP Score`
##    <chr>        <dbl>           <dbl>       <dbl>         <dbl>           <dbl>
##  1 24               0             2           0             2.5             1  
##  2 29               0             1           0             0               0  
##  3 30               0             2.5         0             0.5             2.5
##  4 46               0             0           0             0               1  
##  5 47               0             0           0             0               0  
##  6 56               0             0           0             0.5             1  
##  7 67               0             1           0.5           0.5             1  
##  8 76               0             0           0             1               0  
##  9 78               0             0           0             0               0  
## 10 95               0             0           0             0.5             0.5
## # ℹ 497 more rows
## # ℹ 4 more variables: `MI/CVA Before 65 Score` <dbl>,
## #   `MI/CVA After 65 Score` <dbl>, TotalCVDScore <dbl>, `Personal Score` <dbl>

State one question

Is there a visually identifiable trend across samples that suggests a correlation between high Diabetes Scores and high Total CVD Scores?

Plot data

[Figure: scatter plot of DiabetesScore and TotalCVDScore]

Interpret

If the data visualization is correct, there appears to be a positive correlation between these two variables. This makes sense because diabetes is one of the conditions that contribute to CVD prevalence in the family history surveys given. In addition, every Diabetes score contributes directly to the TotalCVDScore value, since it falls under that umbrella category, so part of the positive relationship is expected by construction. The figure also shows where the majority of samples fall on the scatter plot and where they lie most densely.
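One way to check the visual impression numerically is sketched below in base R. The values are made up for illustration and are not the survey data; only the column names `DiabetesScore` and `TotalCVDScore` come from the tibble. The sketch also assumes that TotalCVDScore is a plain sum that includes DiabetesScore, which the report suggests but does not confirm, and `OtherCVDScore` is a hypothetical helper column introduced here.

```r
# Illustrative values only (NOT the survey data); column names follow the tibble.
scores <- data.frame(
  DiabetesScore = c(0, 0, 0.5, 0.5, 1, 2.5, 0, 1),
  TotalCVDScore = c(0, 1, 1.5, 3, 2, 6, 0.5, 2.5)
)

# Rank-based correlation, a reasonable choice for scores that take
# only a few distinct values and therefore produce many ties.
rho_total <- cor(scores$DiabetesScore, scores$TotalCVDScore,
                 method = "spearman")

# Assumption: TotalCVDScore is a simple sum that includes DiabetesScore.
# Subtracting it gives the part of the total that diabetes does not
# contribute to directly (hypothetical helper column).
scores$OtherCVDScore <- scores$TotalCVDScore - scores$DiabetesScore
rho_other <- cor(scores$DiabetesScore, scores$OtherCVDScore,
                 method = "spearman")

# Number of samples at each (DiabetesScore, TotalCVDScore) pair,
# a tabular view of where points overlap on the scatter plot.
overlap <- table(Diabetes = scores$DiabetesScore,
                 Total = scores$TotalCVDScore)

print(c(total = rho_total, other = rho_other))
print(overlap)
```

If the correlation with the remaining components stays clearly positive, the trend is probably not just an artifact of the Diabetes score being counted inside the total.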

Module 5: Apply 4

Austin
