Final Project

The main goal of the Final Project is to demonstrate the tools and skills learned in this course. I am going to obtain data from two different sources, clean and tidy the data, analyze it, and display visualizations using the R libraries and packages available for these tasks.

Motivation

The aim of my analysis is to challenge myself and to show that I have gained the skills and knowledge needed to become a data scientist. I chose a public table of the leading causes of death in New York State in 2021, published by the New York State Department of Health, and a similar public dataset for Bergen County, New Jersey, from the same year, so that the two data frames can be compared. The New York table breaks down deaths by cause, with the number of deaths and the crude and adjusted death rates, both overall and for White and Black residents; the Bergen County data breaks deaths down by cause, sex, and race and ethnicity. I chose 2021 for both datasets in order to review the number of deaths caused by COVID-19.

I chose this topic because I lost several friends and relatives to the pandemic during that period, and I would like to learn more about it.

Data Sources

For the final project, I use two data sources. The first is a table of the leading causes of death scraped from the New York State Department of Health website (health.ny.gov). The second comes from New Jersey’s data hub website and contains similar information for Bergen County. Between them, the two data frames cover the main causes of death, the year, sex, race and ethnicity, and death rates.

Load the required libraries for the project

library(rvest)
## Warning: package 'rvest' was built under R version 4.3.3
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()         masks stats::filter()
## ✖ readr::guess_encoding() masks rvest::guess_encoding()
## ✖ dplyr::lag()            masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(kableExtra) 
## Warning: package 'kableExtra' was built under R version 4.3.3
## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(stringr)
library(lubridate)
library(RTextTools)
## Warning: package 'RTextTools' was built under R version 4.3.3
## Loading required package: SparseM
## 
## Attaching package: 'SparseM'
## 
## The following object is masked from 'package:base':
## 
##     backsolve

Scrape the first dataset from a web page

web_content <- read_html("https://www.health.ny.gov/statistics/vital_statistics/2021/table33a.htm")
nyc_data <- web_content %>% html_table(fill = TRUE)
nyc_data
## [[1]]
## # A tibble: 51 × 11
##    `Cause of Death`              `ICD10 Code(s)` Total Total Total `White Alone`
##    <chr>                         <chr>           <chr> <chr> <chr> <chr>        
##  1 Cause of Death                ICD10 Code(s)   Numb… Crud… Adju… Number       
##  2 Total                         A00-Z99         181,… 913.6 687.0 135,117      
##  3 Tuberculosis                  A15-A19         34    0.2   0.1   12           
##  4 Septicemia                    A40-A41         2,241 11.3  8.4   1,585        
##  5 Acquired Immune Deficiency S… B20-B24         405   2.0   1.7   135          
##  6 Malignant Neoplasms           C00-C97         32,4… 163.5 121.1 24,768       
##  7 Buccal Cavity and Pharynx     C00-C14         556   2.8   2.0   409          
##  8 Digestive Organs and Periton… C15-C26         9,248 46.6  34.5  6,762        
##  9 Respiratory System            C30-C39         7,060 35.6  25.9  5,628        
## 10 Trachea, Bronchus and Lung    C33-C34         6,806 34.3  25.0  5,444        
## # ℹ 41 more rows
## # ℹ 5 more variables: `White Alone` <chr>, `White Alone` <chr>,
## #   `Black Alone` <chr>, `Black Alone` <chr>, `Black Alone` <chr>

Convert the scraped table to a data frame

# html_table() returns a list of tables; as.data.frame() turns it into one data frame
nyc_data <- as.data.frame(nyc_data)
nyc_data %>%
  kbl() %>%
  kable_styling(full_width = FALSE)
Cause.of.Death ICD10.Code.s. Total Total.1 Total.2 White.Alone White.Alone.1 White.Alone.2 Black.Alone Black.Alone.1 Black.Alone.2
Cause of Death ICD10 Code(s) Number Crude Rate Adjusted Rate Number Crude Rate Adjusted Rate Number Crude Rate Adjusted Rate
Total A00-Z99 181,421 913.6 687.0 135,117 989.6 669.5 26,707 754.1 695.6
Tuberculosis A15-A19 34 0.2 0.1 12 0.1 0.1 5 0.1 0.1
Septicemia A40-A41 2,241 11.3 8.4 1,585 11.6 7.7 410 11.6 10.8
Acquired Immune Deficiency Syndrome (AIDS) B20-B24 405 2.0 1.7 135 1.0 0.8 199 5.6 5.1
Malignant Neoplasms C00-C97 32,464 163.5 121.1 24,768 181.4 122.1 4,231 119.5 106.1
Buccal Cavity and Pharynx C00-C14 556 2.8 2.0 409 3.0 2.0 60 1.7 1.5
Digestive Organs and Peritoneum C15-C26 9,248 46.6 34.5 6,762 49.5 33.4 1,273 35.9 32.1
Respiratory System C30-C39 7,060 35.6 25.9 5,628 41.2 27.3 779 22.0 19.3
Trachea, Bronchus and Lung C33-C34 6,806 34.3 25.0 5,444 39.9 26.5 736 20.8 18.3
Skin C43-C44 563 2.8 2.1 530 3.9 2.6 17 0.5 0.4
Breast C50 2,303 11.6 9.2 1,663 12.2 8.9 420 11.9 10.1
Genital Organs C51-C63 3,671 18.5 13.5 2,567 18.8 12.5 712 20.1 17.8
Urinary Organs C64-C68 1,548 7.8 5.5 1,314 9.6 6.1 122 3.4 3.2
Other and Unspecified Sites C40-C42,C45-C49,C69-C80,C97 4,535 22.8 17.1 3,613 26.5 18.1 475 13.4 11.9
Lymphatic and Hematopoietic Tissues C81-C96 2,980 15.0 11.2 2,282 16.7 11.2 373 10.5 9.7
Diabetes Mellitus E10-E14 4,799 24.2 18.0 3,159 23.1 15.5 1,029 29.1 26.7
Parkinson’s Disease G20-G21 1,704 8.6 6.2 1,473 10.8 6.9 92 2.6 2.5
Alzheimer’s Disease G30 3,553 17.9 12.8 2,965 21.7 13.7 324 9.1 8.7
Diseases of the Circulatory System I00-I99 53,393 268.9 195.1 39,561 289.8 186.9 8,498 239.9 221.0
Diseases of the Heart I00-I09,I11,I13,I20-I51 42,243 212.7 153.9 31,521 230.9 148.4 6,597 186.3 171.3
Acute Rheumatic Fever & Chronic Rheumatic Heart Dis. I00-I09 193 1.0 0.7 163 1.2 0.8 15 0.4 0.4
Hypertension with Heart Disease I11,I13 4,946 24.9 18.3 3,281 24.0 15.6 1,172 33.1 30.5
Acute Myocardial Infarction I21-I22 5,263 26.5 19.3 4,173 30.6 19.9 620 17.5 16.0
Other Ischemic Heart Diseases I20,I24-I25 22,446 113.0 81.0 16,263 119.1 75.8 3,624 102.3 93.9
Diseases of Pulmonary Circulation I26-I28 975 4.9 3.8 695 5.1 3.5 195 5.5 5.2
Other Diseases of the Heart I30-I51 8,420 42.4 30.9 6,946 50.9 32.9 971 27.4 25.4
Hypertension with or without Renal Disease I10,I12 2,880 14.5 10.5 1,938 14.2 9.1 616 17.4 16.1
Cerebrovascular Disease I60-I69 6,635 33.4 24.5 4,872 35.7 23.2 1,006 28.4 26.3
Arteriosclerosis I70 259 1.3 0.9 198 1.5 0.9 41 1.2 1.0
Other Diseases of the Circulatory System I71-I78,I80-I99 1,376 6.9 5.3 1,032 7.6 5.2 238 6.7 6.2
Pneumonia J12-J18 3,724 18.8 13.6 2,740 20.1 13.0 537 15.2 13.9
Influenza J10-J11 27 0.1 0.1 22 0.2 0.1 3 0.1 0.1
Chronic Lower Respiratory Disease (CLRD) J40-J47 5,865 29.5 21.6 4,941 36.2 23.8 551 15.6 13.9
Gastritis, Enteritis, Colitis, Diverticulitis K29,K50-K52,K57 293 1.5 1.1 251 1.8 1.3 27 0.8 0.7
Cirrhosis of Liver K70,K73-K74 1,955 9.8 8.1 1,548 11.3 8.7 167 4.7 4.3
Nephritis, Nephrotic Syndrome, Nephrosis N00-N07,N17-N19,N25-N27 2,534 12.8 9.3 1,870 13.7 8.9 450 12.7 11.6
Complications of Pregnancy, Childbirth, and Puerperium A34,O00-O99 72 34.3 . 30 24.3 . 27 81.7 .
Maternal Causes A34,O00-O95,O98-O99 55 26.2 . 22 17.8 . 20 60.6 .
Congenital Anomalies Q00-Q99 455 2.3 . 320 2.3 . 63 1.8 .
Certain Conditions Originating in the Perinatal Period P00-P96 427 2.2 . 163 1.2 . 150 4.2 .
Sudden Infant Death Syndrome R95 58 27.6 . 28 22.7 . 19 57.5 .
Accidents (Total) V01-X59,Y85-Y86 10,091 50.8 46.1 6,981 51.1 45.2 1,867 52.7 49.7
Motor Vehicle See Note Below 1,335 6.7 6.3 913 6.7 6.1 211 6.0 5.9
Drownings W65-W74 151 0.8 0.7 93 0.7 0.6 34 1.0 1.0
Falls W00-W19 1,953 9.8 7.2 1,600 11.7 7.7 146 4.1 3.9
Poisonings X40-X49 5,546 27.9 27.1 3,583 26.2 26.4 1,309 37.0 34.3
Suicide X60-X84,Y87.0 1,647 8.3 7.8 1,280 9.4 8.6 122 3.4 3.5
Homicide and Legal Intervention X85-Y09,Y35,Y87.1,Y89.0 928 4.7 4.9 235 1.7 1.8 571 16.1 16.7
COVID-19 U07.1 21,598 108.8 80.2 15,009 109.9 72.7 3,177 89.7 81.6
All Other Causes 33,154 167.0 125.0 26,041 190.7 127.4 4,188 118.3 110.5
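The `.1` and `.2` suffixes in the header above come from base R's name repair: when the list returned by `html_table()` is passed to `as.data.frame()`, the repeated `Total`, `White Alone` and `Black Alone` headers appear to be run through `make.names(unique = TRUE)`. A minimal sketch, using the header strings copied from the scraped table, reproduces that mapping so the resulting names are easier to predict:

```r
# Header cells as they appear in the scraped table (copied from the output above)
raw_names <- c("Cause of Death", "ICD10 Code(s)",
               "Total", "Total", "Total",
               "White Alone", "White Alone", "White Alone",
               "Black Alone", "Black Alone", "Black Alone")

# make.names() turns spaces and brackets into "." and, with unique = TRUE,
# appends .1, .2, ... to repeated names
repaired <- make.names(raw_names, unique = TRUE)
print(repaired)
```

This is why the codes column has to be referred to as `ICD10.Code.s.` in the next step.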

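Remove the ICD-10 codes column

# Keep every column except the ICD-10 codes (logical indexing also works if the column is absent)
nyc_data <- nyc_data[, names(nyc_data) != "ICD10.Code.s."]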
nyc_data
##                                            Cause.of.Death   Total    Total.1
## 1                                          Cause of Death  Number Crude Rate
## 2                                                   Total 181,421      913.6
## 3                                            Tuberculosis      34        0.2
## 4                                              Septicemia   2,241       11.3
## 5              Acquired Immune Deficiency Syndrome (AIDS)     405        2.0
## 6                                     Malignant Neoplasms  32,464      163.5
## 7                               Buccal Cavity and Pharynx     556        2.8
## 8                         Digestive Organs and Peritoneum   9,248       46.6
## 9                                      Respiratory System   7,060       35.6
## 10                             Trachea, Bronchus and Lung   6,806       34.3
## 11                                                   Skin     563        2.8
## 12                                                 Breast   2,303       11.6
## 13                                         Genital Organs   3,671       18.5
## 14                                         Urinary Organs   1,548        7.8
## 15                            Other and Unspecified Sites   4,535       22.8
## 16                    Lymphatic and Hematopoietic Tissues   2,980       15.0
## 17                                      Diabetes Mellitus   4,799       24.2
## 18                                    Parkinson's Disease   1,704        8.6
## 19                                    Alzheimer's Disease   3,553       17.9
## 20                     Diseases of the Circulatory System  53,393      268.9
## 21                                  Diseases of the Heart  42,243      212.7
## 22   Acute Rheumatic Fever & Chronic Rheumatic Heart Dis.     193        1.0
## 23                        Hypertension with Heart Disease   4,946       24.9
## 24                            Acute Myocardial Infarction   5,263       26.5
## 25                          Other Ischemic Heart Diseases  22,446      113.0
## 26                      Diseases of Pulmonary Circulation     975        4.9
## 27                            Other Diseases of the Heart   8,420       42.4
## 28             Hypertension with or without Renal Disease   2,880       14.5
## 29                                Cerebrovascular Disease   6,635       33.4
## 30                                       Arteriosclerosis     259        1.3
## 31               Other Diseases of the Circulatory System   1,376        6.9
## 32                                              Pneumonia   3,724       18.8
## 33                                              Influenza      27        0.1
## 34               Chronic Lower Respiratory Disease (CLRD)   5,865       29.5
## 35          Gastritis, Enteritis, Colitis, Diverticulitis     293        1.5
## 36                                     Cirrhosis of Liver   1,955        9.8
## 37               Nephritis, Nephrotic Syndrome, Nephrosis   2,534       12.8
## 38 Complications of Pregnancy, Childbirth, and Puerperium      72       34.3
## 39                                        Maternal Causes      55       26.2
## 40                                   Congenital Anomalies     455        2.3
## 41 Certain Conditions Originating in the Perinatal Period     427        2.2
## 42                           Sudden Infant Death Syndrome      58       27.6
## 43                                      Accidents (Total)  10,091       50.8
## 44                                          Motor Vehicle   1,335        6.7
## 45                                              Drownings     151        0.8
## 46                                                  Falls   1,953        9.8
## 47                                             Poisonings   5,546       27.9
## 48                                                Suicide   1,647        8.3
## 49                        Homicide and Legal Intervention     928        4.7
## 50                                               COVID-19  21,598      108.8
## 51                                       All Other Causes  33,154      167.0
##          Total.2 White.Alone White.Alone.1 White.Alone.2 Black.Alone
## 1  Adjusted Rate      Number    Crude Rate Adjusted Rate      Number
## 2          687.0     135,117         989.6         669.5      26,707
## 3            0.1          12           0.1           0.1           5
## 4            8.4       1,585          11.6           7.7         410
## 5            1.7         135           1.0           0.8         199
## 6          121.1      24,768         181.4         122.1       4,231
## 7            2.0         409           3.0           2.0          60
## 8           34.5       6,762          49.5          33.4       1,273
## 9           25.9       5,628          41.2          27.3         779
## 10          25.0       5,444          39.9          26.5         736
## 11           2.1         530           3.9           2.6          17
## 12           9.2       1,663          12.2           8.9         420
## 13          13.5       2,567          18.8          12.5         712
## 14           5.5       1,314           9.6           6.1         122
## 15          17.1       3,613          26.5          18.1         475
## 16          11.2       2,282          16.7          11.2         373
## 17          18.0       3,159          23.1          15.5       1,029
## 18           6.2       1,473          10.8           6.9          92
## 19          12.8       2,965          21.7          13.7         324
## 20         195.1      39,561         289.8         186.9       8,498
## 21         153.9      31,521         230.9         148.4       6,597
## 22           0.7         163           1.2           0.8          15
## 23          18.3       3,281          24.0          15.6       1,172
## 24          19.3       4,173          30.6          19.9         620
## 25          81.0      16,263         119.1          75.8       3,624
## 26           3.8         695           5.1           3.5         195
## 27          30.9       6,946          50.9          32.9         971
## 28          10.5       1,938          14.2           9.1         616
## 29          24.5       4,872          35.7          23.2       1,006
## 30           0.9         198           1.5           0.9          41
## 31           5.3       1,032           7.6           5.2         238
## 32          13.6       2,740          20.1          13.0         537
## 33           0.1          22           0.2           0.1           3
## 34          21.6       4,941          36.2          23.8         551
## 35           1.1         251           1.8           1.3          27
## 36           8.1       1,548          11.3           8.7         167
## 37           9.3       1,870          13.7           8.9         450
## 38             .          30          24.3             .          27
## 39             .          22          17.8             .          20
## 40             .         320           2.3             .          63
## 41             .         163           1.2             .         150
## 42             .          28          22.7             .          19
## 43          46.1       6,981          51.1          45.2       1,867
## 44           6.3         913           6.7           6.1         211
## 45           0.7          93           0.7           0.6          34
## 46           7.2       1,600          11.7           7.7         146
## 47          27.1       3,583          26.2          26.4       1,309
## 48           7.8       1,280           9.4           8.6         122
## 49           4.9         235           1.7           1.8         571
## 50          80.2      15,009         109.9          72.7       3,177
## 51         125.0      26,041         190.7         127.4       4,188
##    Black.Alone.1 Black.Alone.2
## 1     Crude Rate Adjusted Rate
## 2          754.1         695.6
## 3            0.1           0.1
## 4           11.6          10.8
## 5            5.6           5.1
## 6          119.5         106.1
## 7            1.7           1.5
## 8           35.9          32.1
## 9           22.0          19.3
## 10          20.8          18.3
## 11           0.5           0.4
## 12          11.9          10.1
## 13          20.1          17.8
## 14           3.4           3.2
## 15          13.4          11.9
## 16          10.5           9.7
## 17          29.1          26.7
## 18           2.6           2.5
## 19           9.1           8.7
## 20         239.9         221.0
## 21         186.3         171.3
## 22           0.4           0.4
## 23          33.1          30.5
## 24          17.5          16.0
## 25         102.3          93.9
## 26           5.5           5.2
## 27          27.4          25.4
## 28          17.4          16.1
## 29          28.4          26.3
## 30           1.2           1.0
## 31           6.7           6.2
## 32          15.2          13.9
## 33           0.1           0.1
## 34          15.6          13.9
## 35           0.8           0.7
## 36           4.7           4.3
## 37          12.7          11.6
## 38          81.7             .
## 39          60.6             .
## 40           1.8             .
## 41           4.2             .
## 42          57.5             .
## 43          52.7          49.7
## 44           6.0           5.9
## 45           1.0           1.0
## 46           4.1           3.9
## 47          37.0          34.3
## 48           3.4           3.5
## 49          16.1          16.7
## 50          89.7          81.6
## 51         118.3         110.5
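Dropping a column by name is safest with a logical index on `names()`. A commonly used alternative, `df[, -which(names(df) == name)]`, has a trap: if the name is not found, `which()` returns an empty vector and the result has no columns at all. The sketch below uses a hypothetical two-row frame, `toy`, and a hypothetical helper, `drop_col()`, to show both behaviors:

```r
# Hypothetical two-row frame shaped like nyc_data (values copied from the table)
toy <- data.frame(Cause.of.Death = c("Tuberculosis", "Septicemia"),
                  ICD10.Code.s.  = c("A15-A19", "A40-A41"),
                  Total          = c("34", "2,241"))

# Keep every column whose name differs from `name`; drop = FALSE keeps a data frame
drop_col <- function(df, name) {
  df[, names(df) != name, drop = FALSE]
}

without_codes <- drop_col(toy, "ICD10.Code.s.")
print(names(without_codes))

# Pitfall: which() returns integer(0) for an absent name, and indexing with
# -integer(0) selects zero columns rather than all of them
empty <- toy[, -which(names(toy) == "not_a_column")]
print(ncol(empty))
```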

Rename the columns with more descriptive names

# Assign descriptive names to the ten columns, in order
colnames(nyc_data) <- c("Causes_Death", "Total_Number", "Crud_Rate", "Adjusted_Rate",
                        "White_Total", "White_Crude_Rate", "White_Adjusted_Rate",
                        "Black_Total", "Black_Crude_Rate", "Black_Adjusted_Rate")
nyc_data %>%
  kbl() %>%
  kable_styling(full_width = FALSE)
Causes_Death Total_Number Crud_Rate Adjusted_Rate White_Total White_Crude_Rate White_Adjusted_Rate Black_Total Black_Crude_Rate Black_Adjusted_Rate
Cause of Death Number Crude Rate Adjusted Rate Number Crude Rate Adjusted Rate Number Crude Rate Adjusted Rate
Total 181,421 913.6 687.0 135,117 989.6 669.5 26,707 754.1 695.6
Tuberculosis 34 0.2 0.1 12 0.1 0.1 5 0.1 0.1
Septicemia 2,241 11.3 8.4 1,585 11.6 7.7 410 11.6 10.8
Acquired Immune Deficiency Syndrome (AIDS) 405 2.0 1.7 135 1.0 0.8 199 5.6 5.1
Malignant Neoplasms 32,464 163.5 121.1 24,768 181.4 122.1 4,231 119.5 106.1
Buccal Cavity and Pharynx 556 2.8 2.0 409 3.0 2.0 60 1.7 1.5
Digestive Organs and Peritoneum 9,248 46.6 34.5 6,762 49.5 33.4 1,273 35.9 32.1
Respiratory System 7,060 35.6 25.9 5,628 41.2 27.3 779 22.0 19.3
Trachea, Bronchus and Lung 6,806 34.3 25.0 5,444 39.9 26.5 736 20.8 18.3
Skin 563 2.8 2.1 530 3.9 2.6 17 0.5 0.4
Breast 2,303 11.6 9.2 1,663 12.2 8.9 420 11.9 10.1
Genital Organs 3,671 18.5 13.5 2,567 18.8 12.5 712 20.1 17.8
Urinary Organs 1,548 7.8 5.5 1,314 9.6 6.1 122 3.4 3.2
Other and Unspecified Sites 4,535 22.8 17.1 3,613 26.5 18.1 475 13.4 11.9
Lymphatic and Hematopoietic Tissues 2,980 15.0 11.2 2,282 16.7 11.2 373 10.5 9.7
Diabetes Mellitus 4,799 24.2 18.0 3,159 23.1 15.5 1,029 29.1 26.7
Parkinson’s Disease 1,704 8.6 6.2 1,473 10.8 6.9 92 2.6 2.5
Alzheimer’s Disease 3,553 17.9 12.8 2,965 21.7 13.7 324 9.1 8.7
Diseases of the Circulatory System 53,393 268.9 195.1 39,561 289.8 186.9 8,498 239.9 221.0
Diseases of the Heart 42,243 212.7 153.9 31,521 230.9 148.4 6,597 186.3 171.3
Acute Rheumatic Fever & Chronic Rheumatic Heart Dis. 193 1.0 0.7 163 1.2 0.8 15 0.4 0.4
Hypertension with Heart Disease 4,946 24.9 18.3 3,281 24.0 15.6 1,172 33.1 30.5
Acute Myocardial Infarction 5,263 26.5 19.3 4,173 30.6 19.9 620 17.5 16.0
Other Ischemic Heart Diseases 22,446 113.0 81.0 16,263 119.1 75.8 3,624 102.3 93.9
Diseases of Pulmonary Circulation 975 4.9 3.8 695 5.1 3.5 195 5.5 5.2
Other Diseases of the Heart 8,420 42.4 30.9 6,946 50.9 32.9 971 27.4 25.4
Hypertension with or without Renal Disease 2,880 14.5 10.5 1,938 14.2 9.1 616 17.4 16.1
Cerebrovascular Disease 6,635 33.4 24.5 4,872 35.7 23.2 1,006 28.4 26.3
Arteriosclerosis 259 1.3 0.9 198 1.5 0.9 41 1.2 1.0
Other Diseases of the Circulatory System 1,376 6.9 5.3 1,032 7.6 5.2 238 6.7 6.2
Pneumonia 3,724 18.8 13.6 2,740 20.1 13.0 537 15.2 13.9
Influenza 27 0.1 0.1 22 0.2 0.1 3 0.1 0.1
Chronic Lower Respiratory Disease (CLRD) 5,865 29.5 21.6 4,941 36.2 23.8 551 15.6 13.9
Gastritis, Enteritis, Colitis, Diverticulitis 293 1.5 1.1 251 1.8 1.3 27 0.8 0.7
Cirrhosis of Liver 1,955 9.8 8.1 1,548 11.3 8.7 167 4.7 4.3
Nephritis, Nephrotic Syndrome, Nephrosis 2,534 12.8 9.3 1,870 13.7 8.9 450 12.7 11.6
Complications of Pregnancy, Childbirth, and Puerperium 72 34.3 . 30 24.3 . 27 81.7 .
Maternal Causes 55 26.2 . 22 17.8 . 20 60.6 .
Congenital Anomalies 455 2.3 . 320 2.3 . 63 1.8 .
Certain Conditions Originating in the Perinatal Period 427 2.2 . 163 1.2 . 150 4.2 .
Sudden Infant Death Syndrome 58 27.6 . 28 22.7 . 19 57.5 .
Accidents (Total) 10,091 50.8 46.1 6,981 51.1 45.2 1,867 52.7 49.7
Motor Vehicle 1,335 6.7 6.3 913 6.7 6.1 211 6.0 5.9
Drownings 151 0.8 0.7 93 0.7 0.6 34 1.0 1.0
Falls 1,953 9.8 7.2 1,600 11.7 7.7 146 4.1 3.9
Poisonings 5,546 27.9 27.1 3,583 26.2 26.4 1,309 37.0 34.3
Suicide 1,647 8.3 7.8 1,280 9.4 8.6 122 3.4 3.5
Homicide and Legal Intervention 928 4.7 4.9 235 1.7 1.8 571 16.1 16.7
COVID-19 21,598 108.8 80.2 15,009 109.9 72.7 3,177 89.7 81.6
All Other Causes 33,154 167.0 125.0 26,041 190.7 127.4 4,188 118.3 110.5
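Row 1 of the table above still holds the second header row (Number, Crude Rate, Adjusted Rate), while the group labels (Total, White Alone, Black Alone) came from the scraped header. As an alternative to typing each name by hand, a sketch with a hypothetical `build_names()` helper combines the two header rows into names such as `Total_Crude_Rate`. These names differ from the ones used in the rest of this report, so they are shown only as an option:

```r
# Group labels (scraped header) and measure labels (row 1 of the table),
# listed in column order after the ICD-10 column was removed
groups   <- c("Cause of Death", rep(c("Total", "White Alone", "Black Alone"), each = 3))
measures <- c("Cause of Death", rep(c("Number", "Crude Rate", "Adjusted Rate"), times = 3))

# Hypothetical helper: join "group_measure" where the labels differ,
# keep the label as-is where they match, then swap spaces for underscores
build_names <- function(groups, measures) {
  combined <- ifelse(groups == measures, groups, paste(groups, measures, sep = "_"))
  gsub(" ", "_", combined, fixed = TRUE)
}

new_names <- build_names(groups, measures)
print(new_names)
```

With the scraped data, `measures` could probably be taken from the first row of `nyc_data` instead of being typed out.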

Remove commas from numeric values

# Strip thousands separators from every column except the cause labels
nyc_data[-1] <- lapply(nyc_data[-1], function(x) gsub(",", "", x))
nyc_data %>%
  kbl() %>%
  kable_styling(full_width = FALSE)
Causes_Death Total_Number Crud_Rate Adjusted_Rate White_Total White_Crude_Rate White_Adjusted_Rate Black_Total Black_Crude_Rate Black_Adjusted_Rate
Cause of Death Number Crude Rate Adjusted Rate Number Crude Rate Adjusted Rate Number Crude Rate Adjusted Rate
Total 181421 913.6 687.0 135117 989.6 669.5 26707 754.1 695.6
Tuberculosis 34 0.2 0.1 12 0.1 0.1 5 0.1 0.1
Septicemia 2241 11.3 8.4 1585 11.6 7.7 410 11.6 10.8
Acquired Immune Deficiency Syndrome (AIDS) 405 2.0 1.7 135 1.0 0.8 199 5.6 5.1
Malignant Neoplasms 32464 163.5 121.1 24768 181.4 122.1 4231 119.5 106.1
Buccal Cavity and Pharynx 556 2.8 2.0 409 3.0 2.0 60 1.7 1.5
Digestive Organs and Peritoneum 9248 46.6 34.5 6762 49.5 33.4 1273 35.9 32.1
Respiratory System 7060 35.6 25.9 5628 41.2 27.3 779 22.0 19.3
Trachea, Bronchus and Lung 6806 34.3 25.0 5444 39.9 26.5 736 20.8 18.3
Skin 563 2.8 2.1 530 3.9 2.6 17 0.5 0.4
Breast 2303 11.6 9.2 1663 12.2 8.9 420 11.9 10.1
Genital Organs 3671 18.5 13.5 2567 18.8 12.5 712 20.1 17.8
Urinary Organs 1548 7.8 5.5 1314 9.6 6.1 122 3.4 3.2
Other and Unspecified Sites 4535 22.8 17.1 3613 26.5 18.1 475 13.4 11.9
Lymphatic and Hematopoietic Tissues 2980 15.0 11.2 2282 16.7 11.2 373 10.5 9.7
Diabetes Mellitus 4799 24.2 18.0 3159 23.1 15.5 1029 29.1 26.7
Parkinson’s Disease 1704 8.6 6.2 1473 10.8 6.9 92 2.6 2.5
Alzheimer’s Disease 3553 17.9 12.8 2965 21.7 13.7 324 9.1 8.7
Diseases of the Circulatory System 53393 268.9 195.1 39561 289.8 186.9 8498 239.9 221.0
Diseases of the Heart 42243 212.7 153.9 31521 230.9 148.4 6597 186.3 171.3
Acute Rheumatic Fever & Chronic Rheumatic Heart Dis. 193 1.0 0.7 163 1.2 0.8 15 0.4 0.4
Hypertension with Heart Disease 4946 24.9 18.3 3281 24.0 15.6 1172 33.1 30.5
Acute Myocardial Infarction 5263 26.5 19.3 4173 30.6 19.9 620 17.5 16.0
Other Ischemic Heart Diseases 22446 113.0 81.0 16263 119.1 75.8 3624 102.3 93.9
Diseases of Pulmonary Circulation 975 4.9 3.8 695 5.1 3.5 195 5.5 5.2
Other Diseases of the Heart 8420 42.4 30.9 6946 50.9 32.9 971 27.4 25.4
Hypertension with or without Renal Disease 2880 14.5 10.5 1938 14.2 9.1 616 17.4 16.1
Cerebrovascular Disease 6635 33.4 24.5 4872 35.7 23.2 1006 28.4 26.3
Arteriosclerosis 259 1.3 0.9 198 1.5 0.9 41 1.2 1.0
Other Diseases of the Circulatory System 1376 6.9 5.3 1032 7.6 5.2 238 6.7 6.2
Pneumonia 3724 18.8 13.6 2740 20.1 13.0 537 15.2 13.9
Influenza 27 0.1 0.1 22 0.2 0.1 3 0.1 0.1
Chronic Lower Respiratory Disease (CLRD) 5865 29.5 21.6 4941 36.2 23.8 551 15.6 13.9
Gastritis, Enteritis, Colitis, Diverticulitis 293 1.5 1.1 251 1.8 1.3 27 0.8 0.7
Cirrhosis of Liver 1955 9.8 8.1 1548 11.3 8.7 167 4.7 4.3
Nephritis, Nephrotic Syndrome, Nephrosis 2534 12.8 9.3 1870 13.7 8.9 450 12.7 11.6
Complications of Pregnancy, Childbirth, and Puerperium 72 34.3 . 30 24.3 . 27 81.7 .
Maternal Causes 55 26.2 . 22 17.8 . 20 60.6 .
Congenital Anomalies 455 2.3 . 320 2.3 . 63 1.8 .
Certain Conditions Originating in the Perinatal Period 427 2.2 . 163 1.2 . 150 4.2 .
Sudden Infant Death Syndrome 58 27.6 . 28 22.7 . 19 57.5 .
Accidents (Total) 10091 50.8 46.1 6981 51.1 45.2 1867 52.7 49.7
Motor Vehicle 1335 6.7 6.3 913 6.7 6.1 211 6.0 5.9
Drownings 151 0.8 0.7 93 0.7 0.6 34 1.0 1.0
Falls 1953 9.8 7.2 1600 11.7 7.7 146 4.1 3.9
Poisonings 5546 27.9 27.1 3583 26.2 26.4 1309 37.0 34.3
Suicide 1647 8.3 7.8 1280 9.4 8.6 122 3.4 3.5
Homicide and Legal Intervention 928 4.7 4.9 235 1.7 1.8 571 16.1 16.7
COVID-19 21598 108.8 80.2 15009 109.9 72.7 3177 89.7 81.6
All Other Causes 33154 167.0 125.0 26041 190.7 127.4 4188 118.3 110.5
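The table also uses `.` where a rate is not reported (for example, the adjusted rates for Maternal Causes and Sudden Infant Death Syndrome), and those cells only become NA through coercion warnings during numeric conversion. A minimal sketch of a hypothetical helper, `to_number()`, strips the commas and treats `.` as missing explicitly, on a small frame with values copied from the table. It assumes `.` always means "not reported":

```r
# Hypothetical helper: turn strings such as "181,421", "913.6" or "." into numbers.
# Assumption: "." marks a rate the source table does not report.
to_number <- function(x) {
  x <- gsub(",", "", x, fixed = TRUE)  # drop thousands separators
  x[x %in% "."] <- NA                  # treat "." as missing
  as.numeric(x)                        # other non-numeric text would still warn
}

# Miniature frame with values copied from the table above
toy <- data.frame(Causes_Death  = c("Tuberculosis", "Maternal Causes"),
                  Total_Number  = c("34", "55"),
                  Adjusted_Rate = c("0.1", "."))

# Apply the helper to every column except the cause labels
toy[-1] <- lapply(toy[-1], to_number)
str(toy)
```

Keeping the warning for unexpected text is deliberate: a header row left in the data still shows up as a warning instead of silently turning into NA.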

Convert columns to numeric values

# Convert all nine measure columns (2 to 10) to numbers; text such as the
# sub-header in row 1 and the "." placeholders becomes NA
i <- 2:10
nyc_data[, i] <- apply(nyc_data[, i], 2, function(x) as.numeric(as.character(x)))
## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion

## Warning in FUN(newX[, i], ...): NAs introduced by coercion
sapply(nyc_data, class)
##        Causes_Death        Total_Number           Crud_Rate       Adjusted_Rate 
##         "character"           "numeric"           "numeric"           "numeric" 
##         White_Total    White_Crude_Rate White_Adjusted_Rate         Black_Total 
##           "numeric"           "numeric"           "numeric"           "numeric" 
##    Black_Crude_Rate Black_Adjusted_Rate 
##           "numeric"           "numeric"
head(nyc_data)
##                                 Causes_Death Total_Number Crud_Rate
## 1                             Cause of Death           NA        NA
## 2                                      Total       181421     913.6
## 3                               Tuberculosis           34       0.2
## 4                                 Septicemia         2241      11.3
## 5 Acquired Immune Deficiency Syndrome (AIDS)          405       2.0
## 6                        Malignant Neoplasms        32464     163.5
##   Adjusted_Rate White_Total White_Crude_Rate White_Adjusted_Rate Black_Total
## 1            NA          NA               NA                  NA          NA
## 2         687.0      135117            989.6               669.5       26707
## 3           0.1          12              0.1                 0.1           5
## 4           8.4        1585             11.6                 7.7         410
## 5           1.7         135              1.0                 0.8         199
## 6         121.1       24768            181.4               122.1        4231
##   Black_Crude_Rate Black_Adjusted_Rate
## 1               NA                  NA
## 2            754.1               695.6
## 3              0.1                 0.1
## 4             11.6                10.8
## 5              5.6                 5.1
## 6            119.5               106.1
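The `head()` output shows that row 1 is still the old sub-header (now NA in the numeric columns) and row 2 is the `Total` summary row, neither of which is a cause of death. A sketch on a hypothetical miniature frame, with values copied from the table, shows one way to keep only the cause rows and to confirm that every measure column is numeric before plotting:

```r
# Hypothetical miniature of nyc_data after conversion (values copied from the table):
# row 1 is the old sub-header, row 2 the "Total" summary row
toy <- data.frame(Causes_Death  = c("Cause of Death", "Total", "Tuberculosis", "Septicemia"),
                  Total_Number  = c(NA, 181421, 34, 2241),
                  Adjusted_Rate = c(NA, 687.0, 0.1, 8.4))

# Keep only rows that describe an actual cause of death
causes_only <- toy[!toy$Causes_Death %in% c("Cause of Death", "Total"), ]

# Every column except the labels should now be numeric
all_numeric <- all(vapply(causes_only[-1], is.numeric, logical(1)))
print(all_numeric)
print(causes_only)
```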

Bar plot of the total number of deaths by cause

# Horizontal bar chart of the number of deaths for each cause
barplot(Total_Number ~ Causes_Death, data = nyc_data,
        horiz = TRUE, col = 1:4)

Chart of the top 10 causes of death in New York State

nyc_data %>%
  filter(Causes_Death != "Total") %>%   # the "Total" row is not a cause of death
  slice_max(Total_Number, n = 10) %>%   # rank by number of deaths, not by cause name
  ggplot() +
  geom_col(aes(x = Total_Number, y = reorder(Causes_Death, Total_Number)),
           fill = "blue") +
  labs(title = "Top 10 Causes of Death", x = "Number of deaths", y = "Cause of death")
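To pick the top causes, rows have to be ranked by the numeric `Total_Number` column; ranking by the `Causes_Death` labels orders them alphabetically instead. A base R sketch with a hypothetical `top_causes()` helper and a few counts copied from the table does the ranking after dropping the header and `Total` rows. Note that the table mixes broad groups with their subcategories (for example, Diseases of the Heart, I00-I09,I11,I13,I20-I51, falls within Diseases of the Circulatory System, I00-I99), so a top-10 list can contain overlapping categories:

```r
# Hypothetical helper: rank causes by number of deaths (base R counterpart of slice_max())
top_causes <- function(df, n = 10) {
  keep <- !is.na(df$Total_Number) & df$Causes_Death != "Total"  # drop header and summary rows
  df <- df[keep, ]
  df <- df[order(df$Total_Number, decreasing = TRUE), ]         # largest counts first
  head(df, n)
}

# A few counts copied from the table above
sample_df <- data.frame(
  Causes_Death = c("Cause of Death", "Total", "Tuberculosis", "Septicemia",
                   "Malignant Neoplasms", "COVID-19"),
  Total_Number = c(NA, 181421, 34, 2241, 32464, 21598)
)

top3 <- top_causes(sample_df, n = 3)
print(top3)
```

The result can be passed to `barplot(..., horiz = TRUE, las = 1)` so the cause labels read horizontally.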

Load the second data frame from a CSV file

# Local path on the author's machine; adjust it to wherever NJ1.csv is saved
nj_data <- read.csv("C:/Users/vitug/OneDrive/Desktop/DATA_607/NJ1.csv")
nj_data %>%
  kbl() %>%
  kable_styling(full_width = FALSE)
Causes.of.Death.in.Bergen.County.NJ X X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8 X.9 X.10 X.11 X.12 X.13 X.14 X.15 X.16 X.17 X.18 X.19 X.20 X.21 X.22 X.23 X.24 X.25 X.26 X.27 X.28 X.29 X.30 X.31 X.32
Causes of death year male famale total white black hispanic asian other NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Enterocolitis due to Clostridium difficile (C. diff) 2021 5 5 10 8 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Septicemia 2021 54 69 123 100 11 7 4 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Viral hepatitis 2021 6 NA 6 3 NA 2 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
HIV (human immunodeficiency virus) disease 2021 2 1 3 NA 1 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Coronavirus disease 2019 (COVID-19) 2021 341 256 597 378 30 114 63 12 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Cancer (malignant neoplasms) 2021 732 777 1,509 1163 47 149 128 22 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior 2021 27 22 49 38 2 3 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Anemias 2021 10 5 15 9 1 2 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Diabetes mellitus 2021 87 54 141 96 6 19 16 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Nutritional deficiencies 2021 5 15 20 16 NA 1 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Parkinson’s disease 2021 60 35 95 79 1 9 5 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Alzheimer’s disease 2021 74 200 274 223 9 29 11 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Diseases of heart 2021 860 827 1,687 1354 86 132 93 22 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Essential hypertension and hypertensive renal disease 2021 26 50 76 56 6 4 8 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Stroke (cerebrovascular diseases) 2021 160 217 377 264 24 38 NA 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Atherosclerosis 2021 2 3 5 5 NA NA 42 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Aortic aneurysm and dissection 2021 12 8 20 16 NA 3 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Influenza and pneumonia 2021 56 38 94 72 5 6 7 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Chronic lower respiratory diseases (CLRD) 2021 54 98 172 151 7 7 5 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Pneumonitis due to solids and liquids 2021 37 45 82 66 1 2 11 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Chronic liver disease and cirrhosis 2021 45 24 69 44 2 15 6 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Nephritis, nephrotic syndrome and nephrosis (kidney disease) 2021 75 56 131 92 12 15 10 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Certain conditions originating in the perinatal period 2021 7 4 11 7 1 NA 1 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Congenital malformations, deformations and chromosomal abnormalities (birth defects) 2021 12 4 16 6 1 9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Unintentional injuries 2021 230 137 367 257 24 56 17 13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Suicide (intentional self-harm) 2021 47 12 59 36 1 9 9 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Homicide (assault) 2021 6 8 14 3 1 7 2 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Complications of medical and surgical care 2021 6 6 12 6 2 2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Other than 28 Major Causes 2021 526 733 1,259 986 59 107 92 15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Total 7,293 . NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Remove the empty columns from the dataset

# columns X.9 to X.32 only contain NA values
nj_data[paste0("X.", 9:32)] <- NULL
nj_data %>%
 kbl() %>%
  kable_styling(full_width = F)
Causes.of.Death.in.Bergen.County.NJ X X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8
Causes of death year male famale total white black hispanic asian other
Enterocolitis due to Clostridium difficile (C. diff) 2021 5 5 10 8 2 NA NA NA
Septicemia 2021 54 69 123 100 11 7 4 1
Viral hepatitis 2021 6 NA 6 3 NA 2 1 NA
HIV (human immunodeficiency virus) disease 2021 2 1 3 NA 1 2 NA NA
Coronavirus disease 2019 (COVID-19) 2021 341 256 597 378 30 114 63 12
Cancer (malignant neoplasms) 2021 732 777 1,509 1163 47 149 128 22
In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior 2021 27 22 49 38 2 3 6 NA
Anemias 2021 10 5 15 9 1 2 3 NA
Diabetes mellitus 2021 87 54 141 96 6 19 16 4
Nutritional deficiencies 2021 5 15 20 16 NA 1 3 NA
Parkinson’s disease 2021 60 35 95 79 1 9 5 1
Alzheimer’s disease 2021 74 200 274 223 9 29 11 2
Diseases of heart 2021 860 827 1,687 1354 86 132 93 22
Essential hypertension and hypertensive renal disease 2021 26 50 76 56 6 4 8 2
Stroke (cerebrovascular diseases) 2021 160 217 377 264 24 38 NA 9
Atherosclerosis 2021 2 3 5 5 NA NA 42 NA
Aortic aneurysm and dissection 2021 12 8 20 16 NA 3 1 NA
Influenza and pneumonia 2021 56 38 94 72 5 6 7 4
Chronic lower respiratory diseases (CLRD) 2021 54 98 172 151 7 7 5 2
Pneumonitis due to solids and liquids 2021 37 45 82 66 1 2 11 2
Chronic liver disease and cirrhosis 2021 45 24 69 44 2 15 6 2
Nephritis, nephrotic syndrome and nephrosis (kidney disease) 2021 75 56 131 92 12 15 10 2
Certain conditions originating in the perinatal period 2021 7 4 11 7 1 NA 1 2
Congenital malformations, deformations and chromosomal abnormalities (birth defects) 2021 12 4 16 6 1 9 NA NA
Unintentional injuries 2021 230 137 367 257 24 56 17 13
Suicide (intentional self-harm) 2021 47 12 59 36 1 9 9 4
Homicide (assault) 2021 6 8 14 3 1 7 2 1
Complications of medical and surgical care 2021 6 6 12 6 2 2 2 NA
Other than 28 Major Causes 2021 526 733 1,259 986 59 107 92 15
Total 7,293 .
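Typing out the 24 column names works, but it is easy to miss one. Below is a minimal sketch of an alternative, assuming the goal is simply to drop every column whose values are all NA. The small data frame `raw` is hypothetical and only mimics the shape of `nj_data`:

```r
# Hypothetical toy data frame shaped like the raw CSV: two columns with data
# and two columns that read.csv() filled entirely with NA.
raw <- data.frame(
  Causes = c("Septicemia", "Anemias"),
  X      = c("2021", "2021"),
  X.9    = c(NA, NA),
  X.10   = c(NA, NA)
)

# Keep only the columns that contain at least one non-NA value.
keep <- colSums(!is.na(raw)) > 0
cleaned <- raw[, keep, drop = FALSE]

names(cleaned)
```

On the real table this would be `nj_data <- nj_data[, colSums(!is.na(nj_data)) > 0]`, and the tidyverse equivalent is `select(nj_data, where(~ !all(is.na(.x))))`. Columns X to X.8 should survive either way, because their blank cells appear to have been read as empty strings rather than NA.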

Rename the columns

colnames(nj_data) <- c("Causes_Death", "Year", "Male", "Female", "Total",
                       "White", "Black", "Hispanic", "Asian", "Other")
nj_data %>%
  kbl() %>%
  kable_styling(full_width = F)
Causes_Death Year Male Female Total White Black Hispanic Asian Other
Causes of death year male famale total white black hispanic asian other
Enterocolitis due to Clostridium difficile (C. diff) 2021 5 5 10 8 2 NA NA NA
Septicemia 2021 54 69 123 100 11 7 4 1
Viral hepatitis 2021 6 NA 6 3 NA 2 1 NA
HIV (human immunodeficiency virus) disease 2021 2 1 3 NA 1 2 NA NA
Coronavirus disease 2019 (COVID-19) 2021 341 256 597 378 30 114 63 12
Cancer (malignant neoplasms) 2021 732 777 1,509 1163 47 149 128 22
In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior 2021 27 22 49 38 2 3 6 NA
Anemias 2021 10 5 15 9 1 2 3 NA
Diabetes mellitus 2021 87 54 141 96 6 19 16 4
Nutritional deficiencies 2021 5 15 20 16 NA 1 3 NA
Parkinson’s disease 2021 60 35 95 79 1 9 5 1
Alzheimer’s disease 2021 74 200 274 223 9 29 11 2
Diseases of heart 2021 860 827 1,687 1354 86 132 93 22
Essential hypertension and hypertensive renal disease 2021 26 50 76 56 6 4 8 2
Stroke (cerebrovascular diseases) 2021 160 217 377 264 24 38 NA 9
Atherosclerosis 2021 2 3 5 5 NA NA 42 NA
Aortic aneurysm and dissection 2021 12 8 20 16 NA 3 1 NA
Influenza and pneumonia 2021 56 38 94 72 5 6 7 4
Chronic lower respiratory diseases (CLRD) 2021 54 98 172 151 7 7 5 2
Pneumonitis due to solids and liquids 2021 37 45 82 66 1 2 11 2
Chronic liver disease and cirrhosis 2021 45 24 69 44 2 15 6 2
Nephritis, nephrotic syndrome and nephrosis (kidney disease) 2021 75 56 131 92 12 15 10 2
Certain conditions originating in the perinatal period 2021 7 4 11 7 1 NA 1 2
Congenital malformations, deformations and chromosomal abnormalities (birth defects) 2021 12 4 16 6 1 9 NA NA
Unintentional injuries 2021 230 137 367 257 24 56 17 13
Suicide (intentional self-harm) 2021 47 12 59 36 1 9 9 4
Homicide (assault) 2021 6 8 14 3 1 7 2 1
Complications of medical and surgical care 2021 6 6 12 6 2 2 2 NA
Other than 28 Major Causes 2021 526 733 1,259 986 59 107 92 15
Total 7,293 .

Convert the table to a data frame (the data read from the CSV file is already a data frame, so the table below is unchanged)

nj_data <- as.data.frame(nj_data)
nj_data %>%
   kbl() %>%
  kable_styling(full_width = F)
Causes_Death Year Male Female Total White Black Hispanic Asian Other
Causes of death year male famale total white black hispanic asian other
Enterocolitis due to Clostridium difficile (C. diff) 2021 5 5 10 8 2 NA NA NA
Septicemia 2021 54 69 123 100 11 7 4 1
Viral hepatitis 2021 6 NA 6 3 NA 2 1 NA
HIV (human immunodeficiency virus) disease 2021 2 1 3 NA 1 2 NA NA
Coronavirus disease 2019 (COVID-19) 2021 341 256 597 378 30 114 63 12
Cancer (malignant neoplasms) 2021 732 777 1,509 1163 47 149 128 22
In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior 2021 27 22 49 38 2 3 6 NA
Anemias 2021 10 5 15 9 1 2 3 NA
Diabetes mellitus 2021 87 54 141 96 6 19 16 4
Nutritional deficiencies 2021 5 15 20 16 NA 1 3 NA
Parkinson’s disease 2021 60 35 95 79 1 9 5 1
Alzheimer’s disease 2021 74 200 274 223 9 29 11 2
Diseases of heart 2021 860 827 1,687 1354 86 132 93 22
Essential hypertension and hypertensive renal disease 2021 26 50 76 56 6 4 8 2
Stroke (cerebrovascular diseases) 2021 160 217 377 264 24 38 NA 9
Atherosclerosis 2021 2 3 5 5 NA NA 42 NA
Aortic aneurysm and dissection 2021 12 8 20 16 NA 3 1 NA
Influenza and pneumonia 2021 56 38 94 72 5 6 7 4
Chronic lower respiratory diseases (CLRD) 2021 54 98 172 151 7 7 5 2
Pneumonitis due to solids and liquids 2021 37 45 82 66 1 2 11 2
Chronic liver disease and cirrhosis 2021 45 24 69 44 2 15 6 2
Nephritis, nephrotic syndrome and nephrosis (kidney disease) 2021 75 56 131 92 12 15 10 2
Certain conditions originating in the perinatal period 2021 7 4 11 7 1 NA 1 2
Congenital malformations, deformations and chromosomal abnormalities (birth defects) 2021 12 4 16 6 1 9 NA NA
Unintentional injuries 2021 230 137 367 257 24 56 17 13
Suicide (intentional self-harm) 2021 47 12 59 36 1 9 9 4
Homicide (assault) 2021 6 8 14 3 1 7 2 1
Complications of medical and surgical care 2021 6 6 12 6 2 2 2 NA
Other than 28 Major Causes 2021 526 733 1,259 986 59 107 92 15
Total 7,293 .
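The output above is identical to the previous table because `read.csv()` already returns a data frame. The quick check below uses a hypothetical two-row CSV string in place of the real file:

```r
# Hypothetical CSV text standing in for the Bergen County file.
csv_text <- "cause,total\nSepticemia,123\nAnemias,15"
df <- read.csv(text = csv_text)

# read.csv() returns a data.frame, so as.data.frame() hands it back unchanged.
is.data.frame(df)
identical(as.data.frame(df), df)
```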

Remove the first two rows, which hold no data (one of them repeats the column labels)

nj_data1 <- nj_data[-c(1, 2), ]
nj_data1 %>%
  kbl() %>%
  kable_styling(full_width = F)

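Remove the "Total" row and the empty rows at the bottom

# keep rows 1 to 29 (one per cause of death); the rows after them are the
# "Total" row and the empty rows at the end of the file
nj_data2 <- nj_data1[1:29, ]
nj_data2 %>%
  kbl() %>%
  kable_styling(full_width = F)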
Causes_Death Year Male Female Total White Black Hispanic Asian Other
3 Enterocolitis due to Clostridium difficile (C. diff) 2021 5 5 10 8 2 NA NA NA
4 Septicemia 2021 54 69 123 100 11 7 4 1
5 Viral hepatitis 2021 6 NA 6 3 NA 2 1 NA
6 HIV (human immunodeficiency virus) disease 2021 2 1 3 NA 1 2 NA NA
7 Coronavirus disease 2019 (COVID-19) 2021 341 256 597 378 30 114 63 12
8 Cancer (malignant neoplasms) 2021 732 777 1,509 1163 47 149 128 22
9 In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior 2021 27 22 49 38 2 3 6 NA
10 Anemias 2021 10 5 15 9 1 2 3 NA
11 Diabetes mellitus 2021 87 54 141 96 6 19 16 4
12 Nutritional deficiencies 2021 5 15 20 16 NA 1 3 NA
13 Parkinson’s disease 2021 60 35 95 79 1 9 5 1
14 Alzheimer’s disease 2021 74 200 274 223 9 29 11 2
15 Diseases of heart 2021 860 827 1,687 1354 86 132 93 22
16 Essential hypertension and hypertensive renal disease 2021 26 50 76 56 6 4 8 2
17 Stroke (cerebrovascular diseases) 2021 160 217 377 264 24 38 NA 9
18 Atherosclerosis 2021 2 3 5 5 NA NA 42 NA
19 Aortic aneurysm and dissection 2021 12 8 20 16 NA 3 1 NA
20 Influenza and pneumonia 2021 56 38 94 72 5 6 7 4
21 Chronic lower respiratory diseases (CLRD) 2021 54 98 172 151 7 7 5 2
22 Pneumonitis due to solids and liquids 2021 37 45 82 66 1 2 11 2
23 Chronic liver disease and cirrhosis 2021 45 24 69 44 2 15 6 2
24 Nephritis, nephrotic syndrome and nephrosis (kidney disease) 2021 75 56 131 92 12 15 10 2
25 Certain conditions originating in the perinatal period 2021 7 4 11 7 1 NA 1 2
26 Congenital malformations, deformations and chromosomal abnormalities (birth defects) 2021 12 4 16 6 1 9 NA NA
27 Unintentional injuries 2021 230 137 367 257 24 56 17 13
28 Suicide (intentional self-harm) 2021 47 12 59 36 1 9 9 4
29 Homicide (assault) 2021 6 8 14 3 1 7 2 1
30 Complications of medical and surgical care 2021 6 6 12 6 2 2 2 NA
31 Other than 28 Major Causes 2021 526 733 1,259 986 59 107 92 15
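The cleaning steps above remove rows by position (the first two rows, then everything after row 29). That approach breaks silently if the file ever gains or loses a row. Below is a minimal sketch of removing the non-data rows by their content instead. It assumes the blank cells were read as empty strings, and the toy data frame `raw` is hypothetical:

```r
# Hypothetical toy version of the raw table: a blank row, the row of labels,
# two data rows, the "Total" row and a trailing blank row.
raw <- data.frame(
  Causes_Death = c("", "Causes of death", "Septicemia", "Anemias", "Total", ""),
  Male         = c("", "male", "54", "10", "", "")
)

# Flag rows by what they contain rather than by where they sit.
not_data <- raw$Causes_Death %in% c("", "Causes of death", "Total")
clean <- raw[!not_data, ]

clean$Causes_Death
```

On the real data the same test could be written as `nj_data[!nj_data$Causes_Death %in% c("", "Causes of death", "Total"), ]`.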

Remove the Year column

nj_data3 <- nj_data2[, -which(names(nj_data2) == "Year")]
nj_data3 %>%
   kbl() %>%
  kable_styling(full_width = F)
Causes_Death Male Female Total White Black Hispanic Asian Other
3 Enterocolitis due to Clostridium difficile (C. diff) 5 5 10 8 2 NA NA NA
4 Septicemia 54 69 123 100 11 7 4 1
5 Viral hepatitis 6 NA 6 3 NA 2 1 NA
6 HIV (human immunodeficiency virus) disease 2 1 3 NA 1 2 NA NA
7 Coronavirus disease 2019 (COVID-19) 341 256 597 378 30 114 63 12
8 Cancer (malignant neoplasms) 732 777 1,509 1163 47 149 128 22
9 In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior 27 22 49 38 2 3 6 NA
10 Anemias 10 5 15 9 1 2 3 NA
11 Diabetes mellitus 87 54 141 96 6 19 16 4
12 Nutritional deficiencies 5 15 20 16 NA 1 3 NA
13 Parkinson’s disease 60 35 95 79 1 9 5 1
14 Alzheimer’s disease 74 200 274 223 9 29 11 2
15 Diseases of heart 860 827 1,687 1354 86 132 93 22
16 Essential hypertension and hypertensive renal disease 26 50 76 56 6 4 8 2
17 Stroke (cerebrovascular diseases) 160 217 377 264 24 38 NA 9
18 Atherosclerosis 2 3 5 5 NA NA 42 NA
19 Aortic aneurysm and dissection 12 8 20 16 NA 3 1 NA
20 Influenza and pneumonia 56 38 94 72 5 6 7 4
21 Chronic lower respiratory diseases (CLRD) 54 98 172 151 7 7 5 2
22 Pneumonitis due to solids and liquids 37 45 82 66 1 2 11 2
23 Chronic liver disease and cirrhosis 45 24 69 44 2 15 6 2
24 Nephritis, nephrotic syndrome and nephrosis (kidney disease) 75 56 131 92 12 15 10 2
25 Certain conditions originating in the perinatal period 7 4 11 7 1 NA 1 2
26 Congenital malformations, deformations and chromosomal abnormalities (birth defects) 12 4 16 6 1 9 NA NA
27 Unintentional injuries 230 137 367 257 24 56 17 13
28 Suicide (intentional self-harm) 47 12 59 36 1 9 9 4
29 Homicide (assault) 6 8 14 3 1 7 2 1
30 Complications of medical and surgical care 6 6 12 6 2 2 2 NA
31 Other than 28 Major Causes 526 733 1,259 986 59 107 92 15
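The `-which()` idiom works here, but it has a known pitfall when a chunk is re-run after the column is already gone. If no column matches, `which()` returns `integer(0)`, and indexing with its negation selects zero columns instead of all of them. The sketch below uses two hypothetical one-row data frames and a hypothetical helper `drop_year()`:

```r
with_year <- data.frame(Causes_Death = "Septicemia", Year = "2021", Male = "54")
no_year   <- data.frame(Causes_Death = "Septicemia", Male = "54")

# which() finds no "Year" column and returns integer(0); negating it still
# selects zero columns, so every column is dropped instead of none.
ncol(no_year[, -which(names(no_year) == "Year"), drop = FALSE])

# A logical test on the names drops "Year" when present and nothing otherwise.
drop_year <- function(df) df[, names(df) != "Year", drop = FALSE]
names(drop_year(with_year))
names(drop_year(no_year))
```

With dplyr loaded, `select(nj_data2, -any_of("Year"))` is another option that tolerates a missing column.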

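Convert the count columns from character to numeric

# columns 2 to 9 hold the counts (Male through Other)
i <- 2:9
# note: values written with a thousands separator, such as "1,509", cannot be
# parsed by as.numeric() and become NA (this is the warning below)
nj_data3[, i] <- apply(nj_data3[, i], 2, function(x) as.numeric(as.character(x)))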
## Warning in FUN(newX[, i], ...): NAs introduced by coercion
sapply(nj_data3, class)
## Causes_Death         Male       Female        Total        White        Black 
##  "character"    "numeric"    "numeric"    "numeric"    "numeric"    "numeric" 
##     Hispanic        Asian        Other 
##    "numeric"    "numeric"    "numeric"
head(nj_data3)
##                                           Causes_Death Male Female Total White
## 3 Enterocolitis due to Clostridium difficile (C. diff)    5      5    10     8
## 4                                           Septicemia   54     69   123   100
## 5                                      Viral hepatitis    6     NA     6     3
## 6           HIV (human immunodeficiency virus) disease    2      1     3    NA
## 7                  Coronavirus disease 2019 (COVID-19)  341    256   597   378
## 8                         Cancer (malignant neoplasms)  732    777    NA  1163
##   Black Hispanic Asian Other
## 3     2       NA    NA    NA
## 4    11        7     4     1
## 5    NA        2     1    NA
## 6     1        2    NA    NA
## 7    30      114    63    12
## 8    47      149   128    22
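The coercion warning above deserves a closer look. The totals for cancer ("1,509"), heart disease ("1,687") and "Other than 28 Major Causes" ("1,259") contain a thousands separator, and `as.numeric()` turns them into NA. The Total for cancer is already NA in the `head()` output. Below is a minimal sketch of one way to keep these values by stripping the commas first. The helper `to_number()` is hypothetical:

```r
values <- c("597", "1,509", "1,687", "1,259", NA)

# as.numeric() cannot parse the comma, so three of the values become NA.
suppressWarnings(as.numeric(values))

# Removing the thousands separator before converting keeps the numbers.
to_number <- function(x) as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
to_number(values)
```

Applied to the table, this would be `nj_data3[, 2:9] <- lapply(nj_data3[, 2:9], to_number)`. `readr::parse_number()`, which is loaded with the tidyverse, also ignores grouping marks and could be used instead.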

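Replace the NA values with zeros. Note that this also sets the Total for cancer, heart disease and "Other than 28 Major Causes" to 0, because those values could not be converted in the previous step.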

nj_data3[is.na(nj_data3)] <- 0
nj_data3 %>%
 kbl() %>%
  kable_styling(full_width = F)
Causes_Death Male Female Total White Black Hispanic Asian Other
3 Enterocolitis due to Clostridium difficile (C. diff) 5 5 10 8 2 0 0 0
4 Septicemia 54 69 123 100 11 7 4 1
5 Viral hepatitis 6 0 6 3 0 2 1 0
6 HIV (human immunodeficiency virus) disease 2 1 3 0 1 2 0 0
7 Coronavirus disease 2019 (COVID-19) 341 256 597 378 30 114 63 12
8 Cancer (malignant neoplasms) 732 777 0 1163 47 149 128 22
9 In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior 27 22 49 38 2 3 6 0
10 Anemias 10 5 15 9 1 2 3 0
11 Diabetes mellitus 87 54 141 96 6 19 16 4
12 Nutritional deficiencies 5 15 20 16 0 1 3 0
13 Parkinson’s disease 60 35 95 79 1 9 5 1
14 Alzheimer’s disease 74 200 274 223 9 29 11 2
15 Diseases of heart 860 827 0 1354 86 132 93 22
16 Essential hypertension and hypertensive renal disease 26 50 76 56 6 4 8 2
17 Stroke (cerebrovascular diseases) 160 217 377 264 24 38 0 9
18 Atherosclerosis 2 3 5 5 0 0 42 0
19 Aortic aneurysm and dissection 12 8 20 16 0 3 1 0
20 Influenza and pneumonia 56 38 94 72 5 6 7 4
21 Chronic lower respiratory diseases (CLRD) 54 98 172 151 7 7 5 2
22 Pneumonitis due to solids and liquids 37 45 82 66 1 2 11 2
23 Chronic liver disease and cirrhosis 45 24 69 44 2 15 6 2
24 Nephritis, nephrotic syndrome and nephrosis (kidney disease) 75 56 131 92 12 15 10 2
25 Certain conditions originating in the perinatal period 7 4 11 7 1 0 1 2
26 Congenital malformations, deformations and chromosomal abnormalities (birth defects) 12 4 16 6 1 9 0 0
27 Unintentional injuries 230 137 367 257 24 56 17 13
28 Suicide (intentional self-harm) 47 12 59 36 1 9 9 4
29 Homicide (assault) 6 8 14 3 1 7 2 1
30 Complications of medical and surgical care 6 6 12 6 2 2 2 0
31 Other than 28 Major Causes 526 733 0 986 59 107 92 15
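Replacing every NA with 0 treats two different situations the same way: a cell that was blank in the source, and a value that failed to convert. Below is a minimal sketch of inspecting the NAs before overwriting them. The toy data frame `counts` is hypothetical, but its values are copied from the tables above:

```r
counts <- data.frame(
  Causes_Death = c("Enterocolitis due to Clostridium difficile (C. diff)",
                   "Cancer (malignant neoplasms)", "Diseases of heart"),
  Total    = c(10, NA, NA),
  Hispanic = c(NA, 149, 132)
)

# Count the missing values in each numeric column.
colSums(is.na(counts[, -1]))

# List the causes whose Total is missing before deciding how to fill it.
counts$Causes_Death[is.na(counts$Total)]
```

A missing Total for a cause with hundreds of deaths is a sign of a conversion problem rather than a true zero.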

Add a Total row with the sum of each numeric column

#add total row to data frame
nj_data4 <- nj_data3 %>%
            bind_rows(summarise(., across(where(is.numeric), sum),
                                   across(where(is.character), ~'Total')))

#view new data frame
tail(nj_data4)
##                                  Causes_Death Male Female Total White Black
## 25                     Unintentional injuries  230    137   367   257    24
## 26            Suicide (intentional self-harm)   47     12    59    36     1
## 27                         Homicide (assault)    6      8    14     3     1
## 28 Complications of medical and surgical care    6      6    12     6     2
## 29                 Other than 28 Major Causes  526    733     0   986    59
## 30                                      Total 3564   3709  2838  5534   342
##    Hispanic Asian Other
## 25       56    17    13
## 26        9     9     4
## 27        7     2     1
## 28        2     2     0
## 29      107    92    15
## 30      749   546   122
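Comparing a computed total with the source's own figure is a quick sanity check. The raw file's last row reports a total of 7,293 deaths. The 29 per-cause totals copied from the table add up to exactly that number once the comma-formatted values are kept. The 2838 in the Total row above is 7,293 minus the three totals that were turned into 0:

```r
# Per-cause totals copied from the Bergen County table (29 causes, in order).
totals <- c("10", "123", "6", "3", "597", "1,509", "49", "15", "141", "20",
            "95", "274", "1,687", "76", "377", "5", "20", "94", "172", "82",
            "69", "131", "11", "16", "367", "59", "14", "12", "1,259")

reported_total <- 7293  # from the "Total 7,293" row of the raw file

# Converting without removing commas loses three values, and NA -> 0 hides it.
naive <- suppressWarnings(as.numeric(totals))
naive[is.na(naive)] <- 0
sum(naive)

# Stripping the commas first gives a sum that matches the reported total.
fixed <- as.numeric(gsub(",", "", totals, fixed = TRUE))
sum(fixed) == reported_total
```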

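Change the data to long format (only the Total column is pivoted, so this effectively renames it to Total_Deaths and adds a "Total Deaths" column that always contains "Total")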

nj_data5 <- nj_data4 %>% pivot_longer(cols=Total, names_to = "Total Deaths", values_to = "Total_Deaths")
nj_data5 %>%
   kbl() %>%
  kable_styling(full_width = F)
Causes_Death Male Female White Black Hispanic Asian Other Total Deaths Total_Deaths
Enterocolitis due to Clostridium difficile (C. diff) 5 5 8 2 0 0 0 Total 10
Septicemia 54 69 100 11 7 4 1 Total 123
Viral hepatitis 6 0 3 0 2 1 0 Total 6
HIV (human immunodeficiency virus) disease 2 1 0 1 2 0 0 Total 3
Coronavirus disease 2019 (COVID-19) 341 256 378 30 114 63 12 Total 597
Cancer (malignant neoplasms) 732 777 1163 47 149 128 22 Total 0
In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior 27 22 38 2 3 6 0 Total 49
Anemias 10 5 9 1 2 3 0 Total 15
Diabetes mellitus 87 54 96 6 19 16 4 Total 141
Nutritional deficiencies 5 15 16 0 1 3 0 Total 20
Parkinson’s disease 60 35 79 1 9 5 1 Total 95
Alzheimer’s disease 74 200 223 9 29 11 2 Total 274
Diseases of heart 860 827 1354 86 132 93 22 Total 0
Essential hypertension and hypertensive renal disease 26 50 56 6 4 8 2 Total 76
Stroke (cerebrovascular diseases) 160 217 264 24 38 0 9 Total 377
Atherosclerosis 2 3 5 0 0 42 0 Total 5
Aortic aneurysm and dissection 12 8 16 0 3 1 0 Total 20
Influenza and pneumonia 56 38 72 5 6 7 4 Total 94
Chronic lower respiratory diseases (CLRD) 54 98 151 7 7 5 2 Total 172
Pneumonitis due to solids and liquids 37 45 66 1 2 11 2 Total 82
Chronic liver disease and cirrhosis 45 24 44 2 15 6 2 Total 69
Nephritis, nephrotic syndrome and nephrosis (kidney disease) 75 56 92 12 15 10 2 Total 131
Certain conditions originating in the perinatal period 7 4 7 1 0 1 2 Total 11
Congenital malformations, deformations and chromosomal abnormalities (birth defects) 12 4 6 1 9 0 0 Total 16
Unintentional injuries 230 137 257 24 56 17 13 Total 367
Suicide (intentional self-harm) 47 12 36 1 9 9 4 Total 59
Homicide (assault) 6 8 3 1 7 2 1 Total 14
Complications of medical and surgical care 6 6 6 2 2 2 0 Total 12
Other than 28 Major Causes 526 733 986 59 107 92 15 Total 0
Total 3564 3709 5534 342 749 546 122 Total 2838
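A long format becomes useful for plotting when several columns are stacked into one. For example, the race/ethnicity columns can be combined so that each group gets its own row. Below is a minimal sketch with tidyr's `pivot_longer()`, using two rows copied from the table above. It also computes each group's share of the 597 COVID-19 deaths:

```r
library(tidyr)

wide <- data.frame(
  Causes_Death = c("Coronavirus disease 2019 (COVID-19)", "Septicemia"),
  White    = c(378, 100),
  Black    = c(30, 11),
  Hispanic = c(114, 7),
  Asian    = c(63, 4),
  Other    = c(12, 1)
)

# One row per cause and group: the shape ggplot2 needs for aes(fill = Group).
long <- pivot_longer(wide, cols = White:Other,
                     names_to = "Group", values_to = "Deaths")

# Percentage of COVID-19 deaths in each group, rounded to one decimal.
covid <- long[long$Causes_Death == "Coronavirus disease 2019 (COVID-19)", ]
covid$Share <- round(100 * covid$Deaths / sum(covid$Deaths), 1)
covid
```

The same call with `cols = c(Male, Female)` produces the data for a male vs female comparison.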

Plot of the main causes of death in Bergen County, NJ

library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
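The four causes in the `filter()` call below are typed by hand. They could instead be picked as the largest totals with dplyr's `slice_max()`, after dropping the "Total" and "Other than 28 Major Causes" rows. This sketch assumes the comma-formatted totals (1,509 and 1,687) have been converted correctly. Otherwise cancer and heart disease would count as 0. The toy data frame copies a few totals from the table above:

```r
library(dplyr)

causes <- data.frame(
  Causes_Death = c("Coronavirus disease 2019 (COVID-19)", "Cancer (malignant neoplasms)",
                   "Diseases of heart", "Stroke (cerebrovascular diseases)",
                   "Unintentional injuries", "Other than 28 Major Causes", "Total"),
  Total_Deaths = c(597, 1509, 1687, 377, 367, 1259, 7293)
)

# Drop the summary rows, then keep the four causes with the most deaths.
top4 <- causes %>%
  filter(!Causes_Death %in% c("Total", "Other than 28 Major Causes")) %>%
  slice_max(Total_Deaths, n = 4)

top4$Causes_Death
```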
# one bar per cause; fill only adds colour, so the legend is hidden
nj_data5 |>
  filter(Causes_Death %in% c("Coronavirus disease 2019 (COVID-19)", "Cancer (malignant neoplasms)",
                             "Diseases of heart", "Stroke (cerebrovascular diseases)")) |>
  ggplot(aes(x = Total_Deaths, y = Causes_Death, fill = Causes_Death)) +
  geom_col(show.legend = FALSE) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Total Deaths By Causes", x = "Total Deaths", y = "Causes of Death")

Plot of male vs female COVID-19 deaths

nj_data6 <- nj_data5 %>%
  filter(Causes_Death == "Coronavirus disease 2019 (COVID-19)")

# put Male and Female in one column so each sex gets its own bar
nj_data6 %>%
  pivot_longer(cols = c(Male, Female), names_to = "Sex", values_to = "Deaths") %>%
  ggplot(aes(x = Sex, y = Deaths)) +
  geom_col(fill = "orange") +
  labs(title = "COVID 19 Deaths, Male vs Female", x = "Sex", y = "Deaths")

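Plot of White vs Hispanic COVID-19 deaths

nj_data6 %>%
  pivot_longer(cols = c(White, Hispanic), names_to = "Group", values_to = "Deaths") %>%
  ggplot(aes(x = Group, y = Deaths, fill = Group)) +
  geom_col(show.legend = FALSE) +
  labs(title = "COVID 19 Deaths, White vs Hispanic", x = "Group", y = "Deaths")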

### I am using the arsenal library to compare the two data frames. I found this tool very useful, and I had never used it before.

library(arsenal)
## Warning: package 'arsenal' was built under R version 4.3.3
## 
## Attaching package: 'arsenal'
## The following object is masked from 'package:lubridate':
## 
##     is.Date
comparedf(nyc_data, nj_data5)
## Compare Object
## 
## Function Call: 
## comparedf(x = nyc_data, y = nj_data5)
## 
## Shared: 1 non-by variables and 30 observations.
## Not shared: 18 variables and 21 observations.
## 
## Differences found in 1/1 variables compared.
## 0 variables compared have non-identical attributes.
summary(comparedf(nyc_data, nj_data5))
## 
## 
## Table: Summary of data.frames
## 
## version   arg         ncol   nrow
## --------  ---------  -----  -----
## x         nyc_data      10     51
## y         nj_data5      10     30
## 
## 
## 
## Table: Summary of overall comparison
## 
## statistic                                                      value
## ------------------------------------------------------------  ------
## Number of by-variables                                             0
## Number of non-by variables in common                               1
## Number of variables compared                                       1
## Number of variables in x but not y                                 9
## Number of variables in y but not x                                 9
## Number of variables compared with some values unequal              1
## Number of variables compared with all values equal                 0
## Number of observations in common                                  30
## Number of observations in x but not y                             21
## Number of observations in y but not x                              0
## Number of observations with some compared variables unequal       30
## Number of observations with all compared variables equal           0
## Number of values unequal                                          30
## 
## 
## 
## Table: Variables not shared
## 
## version   variable               position  class     
## --------  --------------------  ---------  ----------
## x         Total_Number                  2  numeric   
## x         Crud_Rate                     3  numeric   
## x         Adjusted_Rate                 4  numeric   
## x         White_Total                   5  numeric   
## x         White_Crude_Rate              6  numeric   
## x         White_Adjusted_Rate           7  numeric   
## x         Black_Total                   8  numeric   
## x         Black_Crude_Rate              9  numeric   
## x         Black_Adjusted_Rate          10  character 
## y         Male                          2  numeric   
## y         Female                        3  numeric   
## y         White                         4  numeric   
## y         Black                         5  numeric   
## y         Hispanic                      6  numeric   
## y         Asian                         7  numeric   
## y         Other                         8  numeric   
## y         Total Deaths                  9  character 
## y         Total_Deaths                 10  numeric   
## 
## 
## 
## Table: Other variables not compared
## 
##                                  
##  --------------------------------
##  No other variables not compared 
##  --------------------------------
## 
## 
## 
## Table: Observations not shared
## 
## version    ..row.names..   observation
## --------  --------------  ------------
## x                     31            31
## x                     32            32
## x                     33            33
## x                     34            34
## x                     35            35
## x                     36            36
## x                     37            37
## x                     38            38
## x                     39            39
## x                     40            40
## x                     41            41
## x                     42            42
## x                     43            43
## x                     44            44
## x                     45            45
## x                     46            46
## x                     47            47
## x                     48            48
## x                     49            49
## x                     50            50
## x                     51            51
## 
## 
## 
## Table: Differences detected by variable
## 
## var.x          var.y            n   NAs
## -------------  -------------  ---  ----
## Causes_Death   Causes_Death    30     0
## 
## 
## 
## Table: Differences detected (20 not shown)
## 
## var.x          var.y           ..row.names..  values.x                                     values.y                                                                      row.x   row.y
## -------------  -------------  --------------  -------------------------------------------  ---------------------------------------------------------------------------  ------  ------
## Causes_Death   Causes_Death                1  Cause of Death                               Enterocolitis due to Clostridium difficile (C. diff)                              1       1
## Causes_Death   Causes_Death                2  Total                                        Septicemia                                                                        2       2
## Causes_Death   Causes_Death                3  Tuberculosis                                 Viral hepatitis                                                                   3       3
## Causes_Death   Causes_Death                4  Septicemia                                   HIV (human immunodeficiency virus) disease                                        4       4
## Causes_Death   Causes_Death                5  Acquired Immune Deficiency Syndrome (AIDS)   Coronavirus disease 2019 (COVID-19)                                               5       5
## Causes_Death   Causes_Death                6  Malignant Neoplasms                          Cancer (malignant neoplasms)                                                      6       6
## Causes_Death   Causes_Death                7  Buccal Cavity and Pharynx                    In situ neoplasms, benign neopl. & neopl. of uncertain or unknown behavior        7       7
## Causes_Death   Causes_Death                8  Digestive Organs and Peritoneum              Anemias                                                                           8       8
## Causes_Death   Causes_Death                9  Respiratory System                           Diabetes mellitus                                                                 9       9
## Causes_Death   Causes_Death               10  Trachea, Bronchus and Lung                   Nutritional deficiencies                                                         10      10
## 
## 
## 
## Table: Non-identical attributes
## 
##                              
##  ----------------------------
##  No non-identical attributes 
##  ----------------------------
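The output shows that, without a `by` variable, the rows were matched by position (row 1 with row 1, and so on). As a result, all 30 Causes_Death values are reported as unequal, even though a cause such as Septicemia appears in both tables. Below is a minimal sketch of comparing the cause names as sets instead, using a few names copied from the output above:

```r
nyc_causes <- c("Tuberculosis", "Septicemia",
                "Acquired Immune Deficiency Syndrome (AIDS)", "Malignant Neoplasms")
nj_causes  <- c("Septicemia", "Viral hepatitis",
                "HIV (human immunodeficiency virus) disease", "Cancer (malignant neoplasms)")

# Causes spelled identically in both tables.
intersect(nyc_causes, nj_causes)

# Causes found in only one table (often the same disease under another label).
setdiff(nyc_causes, nj_causes)
setdiff(nj_causes, nyc_causes)
```

The second list shows why a direct join is hard: "Malignant Neoplasms" and "Cancer (malignant neoplasms)" describe the same cause under different labels. Matching them would need a hand-made lookup table before using something like `comparedf(nyc_data, nj_data5, by = "Causes_Death")`.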

Plot of the total number of deaths from both data frames

# note: every row in a group has the same value, so m equals the grouped value
nyc_plot <-
  nyc_data %>% 
  group_by(Total_Number) %>% 
  summarize(m = mean(Total_Number))

nj_plot <- 
  nj_data5 %>% 
  group_by(Total_Deaths) %>% 
  summarize(m = mean(Total_Deaths))

# both layers must be joined with "+" to be drawn on the same plot
ggplot() +
  geom_point(data = nyc_plot, aes(x = Total_Number, y = m), color = "blue") +
  geom_point(data = nj_plot, aes(x = Total_Deaths, y = m), color = "orange")
## Warning: Removed 1 rows containing missing values (`geom_point()`).

Violin plot of total deaths across causes of death

# a violin needs several values per group, so each cause contributes one point
# to a single distribution instead of getting its own violin
nj_data5 %>% 
  filter(Causes_Death != "Total") %>%
  ggplot(aes(x = "Bergen County, NJ", y = Total_Deaths)) +
  geom_violin(fill = "lightblue") +
  geom_jitter(width = 0.05, height = 0) +
  labs(title = "Distribution of Total Deaths Across Causes", x = NULL, y = "Total Deaths")

Conclusion

I was able to obtain data from two different sources (an HTML table and a CSV file), and I scraped and cleaned the data in order to analyze it. The data needed a lot of cleaning and tidying before I could start working with it. I had some difficulties with the plots: I was not able to create a clean, understandable plot, and I am not sure whether I did not organize the data correctly or whether I was not writing the correct code. I have to admit that I really enjoyed this course. I believe I learned a lot in the past few weeks, considering that I had never used R before, and I am proud to say that I am on my way to achieving my goal of becoming a data scientist. I know that I still have to work on my skills, but I am sure that with practice I will improve my knowledge.