Introduction

This report explores the Framingham Heart Study data set, focusing on the relationships among heart rate, cigarettes per day, blood pressure, and level of educational attainment. The Framingham Heart Study is a long-term study of cardiovascular disease among residents of Framingham, Massachusetts. The data set contains 4240 observations with 16 features. Among the features of interest, heart rate contains 1 missing value, cigarettes per day contains 29 missing values, education contains 105 missing values, and both blood pressure features (sysBP and diaBP) contain no missing values. The details of the feature variables are in the R output below.
#providing summary of all features in the original data set
summary(heart)
##       male             age          education     currentSmoker   
##  Min.   :0.0000   Min.   :32.00   Min.   :1.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:42.00   1st Qu.:1.000   1st Qu.:0.0000  
##  Median :0.0000   Median :49.00   Median :2.000   Median :0.0000  
##  Mean   :0.4292   Mean   :49.58   Mean   :1.979   Mean   :0.4941  
##  3rd Qu.:1.0000   3rd Qu.:56.00   3rd Qu.:3.000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :70.00   Max.   :4.000   Max.   :1.0000  
##                                   NA's   :105                     
##    cigsPerDay         BPMeds        prevalentStroke     prevalentHyp   
##  Min.   : 0.000   Min.   :0.00000   Min.   :0.000000   Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.0000  
##  Median : 0.000   Median :0.00000   Median :0.000000   Median :0.0000  
##  Mean   : 9.006   Mean   :0.02962   Mean   :0.005896   Mean   :0.3106  
##  3rd Qu.:20.000   3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:1.0000  
##  Max.   :70.000   Max.   :1.00000   Max.   :1.000000   Max.   :1.0000  
##  NA's   :29       NA's   :53                                           
##     diabetes          totChol          sysBP           diaBP      
##  Min.   :0.00000   Min.   :107.0   Min.   : 83.5   Min.   : 48.0  
##  1st Qu.:0.00000   1st Qu.:206.0   1st Qu.:117.0   1st Qu.: 75.0  
##  Median :0.00000   Median :234.0   Median :128.0   Median : 82.0  
##  Mean   :0.02571   Mean   :236.7   Mean   :132.4   Mean   : 82.9  
##  3rd Qu.:0.00000   3rd Qu.:263.0   3rd Qu.:144.0   3rd Qu.: 90.0  
##  Max.   :1.00000   Max.   :696.0   Max.   :295.0   Max.   :142.5  
##                    NA's   :50                                     
##       BMI          heartRate         glucose         TenYearCHD    
##  Min.   :15.54   Min.   : 44.00   Min.   : 40.00   Min.   :0.0000  
##  1st Qu.:23.07   1st Qu.: 68.00   1st Qu.: 71.00   1st Qu.:0.0000  
##  Median :25.40   Median : 75.00   Median : 78.00   Median :0.0000  
##  Mean   :25.80   Mean   : 75.88   Mean   : 81.96   Mean   :0.1519  
##  3rd Qu.:28.04   3rd Qu.: 83.00   3rd Qu.: 87.00   3rd Qu.:0.0000  
##  Max.   :56.80   Max.   :143.00   Max.   :394.00   Max.   :1.0000  
##  NA's   :19      NA's   :1        NA's   :388
The missing values noted above were omitted from the data set of interest during the analysis. Taken together, they affect at most 135 of the 4240 observations (about 3%), so the omission loses some information but is unlikely to have had a substantial effect on the results of the report.
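The omission step itself is not shown in the R output. A minimal sketch of one way to do it in base R is given here, assuming incomplete rows are dropped with `complete.cases()`. The `toy` data frame, its values, and the names `toy_complete`, `n_before`, `n_after`, and `lost_pct` are made up for illustration and are not taken from the Framingham data.

```r
# Toy stand-in for the features of interest (hypothetical values, not study data)
toy <- data.frame(
  education  = c(1, 2, NA, 4, 3),
  cigsPerDay = c(0, 20, 5, NA, 0),
  heartRate  = c(75, 80, 68, 90, NA),
  sysBP      = c(120, 135, 128, 150, 110),
  diaBP      = c(80, 85, 82, 95, 70)
)

# Keep only the rows that have no missing value in any column
toy_complete <- toy[complete.cases(toy), ]

# Count rows before and after, and compute the percentage of rows lost
n_before <- nrow(toy)
n_after  <- nrow(toy_complete)
lost_pct <- 100 * (n_before - n_after) / n_before
cat("rows kept:", n_after, "of", n_before, "\n")
cat("percent of rows lost:", lost_pct, "\n")
```

Reporting the percentage of rows lost next to the results makes it easier to judge whether dropping incomplete observations could have influenced them.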

Exploratory Data Analysis (EDA)

#providing summary of the features of interest in the reduced data set
summary(heart2)
##  heart.education heart.cigsPerDay heart.heartRate   heart.sysBP   
##  Min.   :1.000   Min.   : 0.000   Min.   : 44.00   Min.   : 83.5  
##  1st Qu.:1.000   1st Qu.: 0.000   1st Qu.: 68.00   1st Qu.:117.0  
##  Median :2.000   Median : 0.000   Median : 75.00   Median :128.0  
##  Mean   :1.979   Mean   : 9.006   Mean   : 75.88   Mean   :132.4  
##  3rd Qu.:3.000   3rd Qu.:20.000   3rd Qu.: 83.00   3rd Qu.:144.0  
##  Max.   :4.000   Max.   :70.000   Max.   :143.00   Max.   :295.0  
##  NA's   :105     NA's   :29       NA's   :1                       
##   heart.diaBP   
##  Min.   : 48.0  
##  1st Qu.: 75.0  
##  Median : 82.0  
##  Mean   : 82.9  
##  3rd Qu.: 90.0  
##  Max.   :142.5  
## 

EDA Objectives

The primary objective of an exploratory data analysis is to assess the data before making any assumptions about it. It helps reveal data entry errors, understand patterns in the data, and identify anomalies.

Feature Variable Analysis

As mentioned, there are some missing values in the features of interest: heart rate contains 1, cigarettes per day contains 29, and education contains 105. The blood pressure features (sysBP and diaBP) do not contain any missing values. Education is a categorical variable: it takes only the values 1 through 4, and each value represents a specific level of educational attainment. The remaining features are all numerical, with varied ranges.
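The checks described in this section can be sketched in base R as follows. This is an illustration on a made-up `toy` data frame, not the report's actual code. It also assumes that the education codes 1 to 4 run from lowest to highest attainment, which the source does not state; if they do not, `ordered = TRUE` should be dropped.

```r
# Toy stand-in for three of the features (hypothetical values, not study data)
toy <- data.frame(
  education  = c(1, 2, NA, 4, 2, 1),
  cigsPerDay = c(0, 20, 5, NA, 0, NA),
  heartRate  = c(75, 80, 68, 90, 72, 77)
)

# Number of missing values in each feature
na_counts <- colSums(is.na(toy))
print(na_counts)

# Store education as a categorical variable with levels 1 to 4
# (assumption: the codes are ordered from lowest to highest attainment)
toy$education <- factor(toy$education, levels = 1:4, ordered = TRUE)

# Frequency of each education level, with missing values counted separately
edu_table <- table(toy$education, useNA = "ifany")
print(edu_table)
```

Converting education to a factor keeps R from treating the codes as measurements, so summaries report counts per level rather than a mean of the codes.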

Correlations between numerical features

All the numerical features have the potential to be correlated; examining these relationships is part of the purpose of this report.
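One common way to screen numerical features for pairwise correlation is a Pearson correlation matrix. The sketch below uses base R's `cor()` on a made-up `toy` data frame; the values are hypothetical, and the choice of Pearson correlation with `use = "complete.obs"` is an assumption rather than the method used in the report.

```r
# Toy stand-in for the numerical features (hypothetical values, not study data)
toy <- data.frame(
  cigsPerDay = c(0, 20, 5, NA, 0, 30),
  heartRate  = c(75, 80, 68, 90, NA, 85),
  sysBP      = c(120, 135, 128, 150, 110, 140),
  diaBP      = c(80, 85, 82, 95, 70, 90)
)

# Pearson correlation matrix computed only from rows with no missing values
cor_matrix <- cor(toy, use = "complete.obs", method = "pearson")
print(round(cor_matrix, 2))
```

With `use = "complete.obs"`, every coefficient is computed from the same set of rows, which keeps the entries of the matrix comparable with one another.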

Potential dependency between categorical variables

Because education is the only categorical variable used in this analysis, dependency between categorical features is not a concern.