The Heart Attack Intervention

There are six stages for this project:

Ask
This is a fictional scenario for my Google Analytics capstone project. The CDC has notified you, a senior epidemiologist, about a potential epidemic. In the last few weeks, heart attack incidence has been rising rapidly. The CDC has an intervention plan. Your job is to analyze the collected data and identify the factors that are statistically significant for public health, so that resources can be allocated to heart attack risk campaigns.
Prepare
Process
I used four different tools: R, SPSS, Tableau, and Excel for cleaning, analyzing, visualizing, and sharing the data. I used Excel to split the blood pressure column into systolic and diastolic blood pressure, group the Body Mass Index (BMI) into categories (underweight, normal, overweight, and obese), and multiply the number of hours spent sleeping and being sedentary each day by seven to get the number of hours per week.
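The same three cleaning steps can also be sketched in R. This is only an illustration, not the Excel workflow used in the project: the toy values, the raw column names `Blood Pressure`, `Sleep Hours Per Day`, and `Sedentary Hours Per Day`, and the BMI cut-offs (18.5, 25, 30) are all assumptions.

```r
# Toy stand-in for the raw data; every value here is made up for illustration.
raw <- data.frame(
  `Blood Pressure` = c("158/88", "165/93", "91/60"),  # assumed "systolic/diastolic" format
  BMI = c(18.2, 27.2, 31.3),
  `Sleep Hours Per Day` = c(6, 7, 4),                 # assumed column name
  `Sedentary Hours Per Day` = c(6.6, 4.9, 9.5),       # assumed column name
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Split the "systolic/diastolic" strings into two numeric columns.
bp_parts <- strsplit(raw[["Blood Pressure"]], "/", fixed = TRUE)
raw[["Systolic BP"]]  <- as.numeric(sapply(bp_parts, `[`, 1))
raw[["Diastolic BP"]] <- as.numeric(sapply(bp_parts, `[`, 2))

# Bin BMI into categories. The cut-offs are assumed, not taken from the project.
# right = FALSE makes each interval closed on the left, e.g. [18.5, 25).
raw[["BMI categories"]] <- as.character(cut(
  raw[["BMI"]],
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right = FALSE
))

# Convert daily hours to weekly hours.
raw[["Sleep Hours Per Week"]]     <- raw[["Sleep Hours Per Day"]] * 7
raw[["Sedentary Hours Per Week"]] <- raw[["Sedentary Hours Per Day"]] * 7

print(raw)
```

Doing these steps in a script rather than by hand keeps the cleaning reproducible if the raw file changes.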
Analyze
The data was analyzed through three different analyses: Descriptive, Correlation, and Prediction.
Descriptive Analyses:
Running this code displays summary measurements, such as the central tendency and variability, of the cleaned data.

summary(train_Heart_Attack_Risk_Analysis_cleaned)
##       Age            Sex             Cholesterol     Systolic BP 
##  Min.   :18.00   Length:7010        Min.   :120.0   Min.   : 90  
##  1st Qu.:35.00   Class :character   1st Qu.:192.0   1st Qu.:112  
##  Median :53.00   Mode  :character   Median :259.0   Median :135  
##  Mean   :53.51                      Mean   :259.9   Mean   :135  
##  3rd Qu.:72.00                      3rd Qu.:329.0   3rd Qu.:158  
##  Max.   :90.00                      Max.   :400.0   Max.   :180  
##   Diastolic BP      Heart Rate        Diabetes      Family History  
##  Min.   : 60.00   Min.   : 40.00   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: 72.00   1st Qu.: 57.00   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median : 85.00   Median : 75.00   Median :1.0000   Median :0.0000  
##  Mean   : 85.15   Mean   : 75.11   Mean   :0.6528   Mean   :0.4919  
##  3rd Qu.: 98.00   3rd Qu.: 93.00   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :110.00   Max.   :110.00   Max.   :1.0000   Max.   :1.0000  
##     Smoking          Obesity       Alcohol Consumption Exercise Hours Per Week
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000      Min.   : 0.002442      
##  1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000      1st Qu.: 5.046024      
##  Median :1.0000   Median :0.0000   Median :1.0000      Median : 9.982968      
##  Mean   :0.8963   Mean   :0.4999   Mean   :0.5959      Mean   : 9.979109      
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000      3rd Qu.:15.029659      
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000      Max.   :19.998709      
##      Diet           Previous Heart Problems Medication Use    Stress Level   
##  Length:7010        Min.   :0.0000          Min.   :0.0000   Min.   : 1.000  
##  Class :character   1st Qu.:0.0000          1st Qu.:0.0000   1st Qu.: 3.000  
##  Mode  :character   Median :0.0000          Median :1.0000   Median : 5.000  
##                     Mean   :0.4981          Mean   :0.5001   Mean   : 5.452  
##                     3rd Qu.:1.0000          3rd Qu.:1.0000   3rd Qu.: 8.000  
##                     Max.   :1.0000          Max.   :1.0000   Max.   :10.000  
##  Sedentary Hours Per Day     Income            BMI        BMI categories    
##  Min.   : 0.00884        Min.   : 20062   Min.   :18.00   Length:7010       
##  1st Qu.:20.80282        1st Qu.: 88368   1st Qu.:23.42   Class :character  
##  Median :41.55843        Median :157379   Median :28.74   Mode  :character  
##  Mean   :41.95805        Mean   :158245   Mean   :28.88                     
##  3rd Qu.:63.12314        3rd Qu.:227219   3rd Qu.:34.32                     
##  Max.   :83.99519        Max.   :299954   Max.   :39.99                     
##  Triglycerides   Physical Activity Days Per Week Sleep Hours Per Week
##  Min.   : 30.0   Min.   :0.000                   Min.   :28.00       
##  1st Qu.:221.0   1st Qu.:2.000                   1st Qu.:35.00       
##  Median :416.0   Median :3.000                   Median :49.00       
##  Mean   :416.8   Mean   :3.492                   Mean   :49.17       
##  3rd Qu.:613.0   3rd Qu.:5.000                   3rd Qu.:63.00       
##  Max.   :800.0   Max.   :7.000                   Max.   :70.00       
##    Country           Continent          Hemisphere        Heart Attack Risk
##  Length:7010        Length:7010        Length:7010        Min.   :0.0000   
##  Class :character   Class :character   Class :character   1st Qu.:0.0000   
##  Mode  :character   Mode  :character   Mode  :character   Median :0.0000   
##                                                           Mean   :0.3572   
##                                                           3rd Qu.:1.0000   
##                                                           Max.   :1.0000

Figure.1 Population Pyramid
Figure.1 illustrates the population pyramid. The largest age group is 75 and above, which indicates an aging population.

Heart Attack Risk Map

The map displays the distribution of heart attack risk across different countries interactively, with filtering legends for countries, average cholesterol, average triglycerides, and diabetes. Feel free to play with it!
Correlation Analyses:
Table.1: Cross tabulation between the Heart Attack Risk and Diabetes
Table.1 shows that the largest cell of the two-by-two table is those who do not have heart attack risk and have diabetes (n=2981).
Table.2: Association between Heart Attack Risk and Diabetes
Table.2 shows the chi-square test of the association between heart attack risk and diabetes. There is a significant association between the two variables (χ² = 6.972, p < 0.05).
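A cross tabulation and chi-square test of this kind can be sketched in R as follows. The data below is randomly generated for illustration and is not the project's dataset, so the printed numbers will not match Table.1 or Table.2. The sketch also assumes the reported statistic is the uncorrected Pearson chi-square, hence `correct = FALSE`.

```r
# Made-up data for illustration only; NOT the project's 7,010 records.
set.seed(1)
toy <- data.frame(
  Diabetes = rbinom(500, 1, 0.65),
  `Heart Attack Risk` = rbinom(500, 1, 0.36),
  check.names = FALSE
)

# Two-by-two cross tabulation (the counterpart of Table.1).
tab <- table(
  `Heart Attack Risk` = toy[["Heart Attack Risk"]],
  Diabetes = toy[["Diabetes"]]
)
print(tab)

# Pearson chi-square test of independence (the counterpart of Table.2).
# chisq.test() applies the Yates continuity correction to 2x2 tables by
# default; correct = FALSE turns it off (an assumption about the reported test).
test <- chisq.test(tab, correct = FALSE)
print(test$statistic)  # chi-square value
print(test$parameter)  # degrees of freedom
print(test$p.value)    # p-value
```

With the real data, `toy` would be replaced by the cleaned data frame, and a p-value below 0.05 would correspond to the significant association reported above.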
Prediction Analyses:
Binary Logistic Regression
A binary logistic regression was performed with Heart Attack Risk as the dependent variable (DV) and the following independent variables: Age, Family History, Income, Cholesterol, Triglycerides, Heart Rate, Systolic BP, Diastolic BP, BMI, Diabetes, Obesity, Previous Heart Problems, Medication Use, Exercise Hours Per Week, Sedentary Hours Per Week, Physical Activity Days Per Week, Smoking, Alcohol Consumption, Stress Level.
Table.3: Binary Logistic Regression
Table.3 presents the binary logistic regression findings. The -2 Log Likelihood value of 9115.345 indicates a reasonable model fit. The constant has a B coefficient of -1.067 and a p-value less than 0.001, so it is statistically significant. For the variable “Diabetes”, the p-value is 0.006, the B coefficient is 0.145, and Exp(B) is 1.156. Because Diabetes is coded 0 or 1, this means that the odds of heart attack risk are 1.156 times higher for people with diabetes than for people without it, holding all other variables constant.
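The relationship between B and Exp(B), and the general shape of such a model, can be sketched in R. The first line uses the coefficient reported above. Everything after it runs on randomly generated data with hypothetical column names and only three predictors, so it is an illustration of the technique and not a reproduction of the model in Table.3.

```r
# Exp(B) is the exponential of B: the reported 0.145 gives the reported odds ratio.
print(round(exp(0.145), 3))  # 1.156

# Made-up data for illustration only; NOT the project's records.
# Column names and the simulated effect sizes are hypothetical.
set.seed(42)
n <- 400
toy <- data.frame(
  Age = sample(18:90, n, replace = TRUE),
  Diabetes = rbinom(n, 1, 0.65),
  Cholesterol = sample(120:400, n, replace = TRUE)
)
toy$HeartAttackRisk <- rbinom(n, 1, plogis(-1 + 0.3 * toy$Diabetes))

# Binary logistic regression: binomial family with the default logit link.
fit <- glm(HeartAttackRisk ~ Age + Diabetes + Cholesterol,
           data = toy, family = binomial)

# B coefficients, standard errors, z values, and p-values.
print(summary(fit)$coefficients)

# Exp(B): the odds ratio for each term.
print(exp(coef(fit)))

# -2 Log Likelihood of the fitted model.
print(-2 * as.numeric(logLik(fit)))
```

For a 0/1 predictor such as Diabetes, Exp(B) compares the odds for the group coded 1 against the group coded 0, with the other predictors held constant.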
Sharing
This was originally an interactive Rmd file in RStudio; I converted it to HTML with the Knit button and published it on RPubs. This is my LinkedIn account: https://www.linkedin.com/in/ruba-asiri?lipi=urn%3Ali%3Apage%3Ad_flagship3_profile_view_base_contact_details%3B%2FRh2hK%2BWQMWrue%2BxuPEWLQ%3D%3D
Act
Based on the three different analyses, the population is aging, and diabetes is the highest risk compared to other comorbidities. Heart attack risk covers six continents. Both the chi-square test and the binary logistic regression suggest that diabetes is a statistically significant variable for heart attack risk. Therefore, resources and heart attack risk campaigns should be directed at elderly diabetic patients around the globe.