Project Proposal

Final Project Proposal

Objective and Motivation.

When I began my journey toward becoming a data scientist, I had trouble identifying the primary targets of an analysis. Asking the right questions and deciding what to analyze and how to look at the data became the biggest challenge in my assignments and projects. My motivation for the final project is to challenge myself and to show that I have gained the skills and knowledge needed to become a data scientist. I chose a public dataset from NYC Open Data on the leading causes of death in New York City. The dataset is rich: it covers leading causes of death by sex and ethnicity in the city since 2007, and includes the year, cause of death, sex, ethnicity, number of deaths, and death rates. In addition, I would like to include a similar dataset from another city so that I can compare the scenarios and causes of death in both cities.

library(tidyverse)  # provides %>% and pivot_longer(), both used below
nyc_chart <- read.csv("C:/Users/vitug/Downloads/New_York_City_Leading_Causes_of_Death_20240416.csv")
head(nyc_chart)

##   Year                                                           Leading.Cause
## 1 2011 Nephritis, Nephrotic Syndrome and Nephrisis (N00-N07, N17-N19, N25-N27)
## 2 2009                     Human Immunodeficiency Virus Disease (HIV: B20-B24)
## 3 2009                            Chronic Lower Respiratory Diseases (J40-J47)
## 4 2008                          Diseases of Heart (I00-I09, I11, I13, I20-I51)
## 5 2009                                               Alzheimer's Disease (G30)
## 6 2008        Accidents Except Drug Posioning (V01-X39, X43, X45-X59, Y85-Y86)
##   Sex             Race.Ethnicity Deaths Death.Rate Age.Adjusted.Death.Rate
## 1   F         Black Non-Hispanic     83        7.9                     6.9
## 2   F                   Hispanic     96          8                     8.1
## 3   F                   Hispanic    155       12.9                      16
## 4   F                   Hispanic   1445      122.3                   160.7
## 5   F Asian and Pacific Islander     14        2.5                     3.6
## 6   F Asian and Pacific Islander     36        6.8                     8.5

Methodology.

I will start by performing exploratory data analysis on both datasets by cause of death and ethnicity. I will tidy and clean the data using the tools learned in this course, for example handling missing values, correcting inconsistencies, adding new columns with the results of the analysis, and deleting unnecessary data from both data frames. I will then use visualization techniques such as bar charts and scatterplots to display the data in a meaningful way and draw conclusions based on the findings.
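The cleaning steps described above could look roughly like the following minimal sketch. It uses only base R and a small toy data frame: the first two rows are copied from the `head()` output, while the third row, the `"."` marker for missing values, the trailing space in `"Hispanic "`, and the names `nyc_toy` and `to_number` are assumptions made for illustration, not facts about the real file.

```r
# Toy data in the shape of the head() output. Row 3 is hypothetical and only
# exists to demonstrate a missing-value marker and a label inconsistency.
nyc_toy <- data.frame(
  Year = c(2009, 2009, 2009),
  Leading.Cause = c("Human Immunodeficiency Virus Disease (HIV: B20-B24)",
                    "Chronic Lower Respiratory Diseases (J40-J47)",
                    "Chronic Lower Respiratory Diseases (J40-J47)"),
  Sex = c("F", "F", "M"),
  Race.Ethnicity = c("Hispanic", "Hispanic", "Hispanic "),
  Deaths = c("96", "155", "."),
  Death.Rate = c("8", "12.9", "."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat "." and "" as missing, then convert to numeric.
to_number <- function(x) {
  x[x %in% c(".", "")] <- NA
  as.numeric(x)
}

# Handle missing values and fix the column types.
measure_cols <- c("Deaths", "Death.Rate")
nyc_toy[measure_cols] <- lapply(nyc_toy[measure_cols], to_number)

# Correct a simple inconsistency: stray whitespace in category labels.
nyc_toy$Race.Ethnicity <- trimws(nyc_toy$Race.Ethnicity)

# Total deaths per cause; rows with a missing count are dropped by aggregate().
deaths_by_cause <- aggregate(Deaths ~ Leading.Cause, data = nyc_toy, FUN = sum)
print(deaths_by_cause)
```

A summary table like `deaths_by_cause` is the kind of input a bar chart would need, with one bar per cause.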

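# Shorten the two longest column names
colnames(nyc_chart)[2] <- "Cause"
colnames(nyc_chart)[4] <- "Ethnicity"

# Every column from the third onward (Sex through Age.Adjusted.Death.Rate)
comparison_col <- colnames(nyc_chart)[3:ncol(nyc_chart)]
comparison_col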

## [1] "Sex"                     "Ethnicity"              
## [3] "Deaths"                  "Death.Rate"             
## [5] "Age.Adjusted.Death.Rate"

# all_of() is the selection form tidyselect recommends for an external character vector
nyc_data <- nyc_chart %>% pivot_longer(cols = all_of(comparison_col), names_to = "comparison_column", values_to = "Num_of_deaths")

print(nyc_data)

## # A tibble: 5,470 × 4
##     Year Cause                                   comparison_column Num_of_deaths
##    <int> <chr>                                   <chr>             <chr>        
##  1  2011 Nephritis, Nephrotic Syndrome and Neph… Sex               F            
##  2  2011 Nephritis, Nephrotic Syndrome and Neph… Ethnicity         Black Non-Hi…
##  3  2011 Nephritis, Nephrotic Syndrome and Neph… Deaths            83           
##  4  2011 Nephritis, Nephrotic Syndrome and Neph… Death.Rate        7.9          
##  5  2011 Nephritis, Nephrotic Syndrome and Neph… Age.Adjusted.Dea… 6.9          
##  6  2009 Human Immunodeficiency Virus Disease (… Sex               F            
##  7  2009 Human Immunodeficiency Virus Disease (… Ethnicity         Hispanic     
##  8  2009 Human Immunodeficiency Virus Disease (… Deaths            96           
##  9  2009 Human Immunodeficiency Virus Disease (… Death.Rate        8            
## 10  2009 Human Immunodeficiency Virus Disease (… Age.Adjusted.Dea… 8.1          
## # ℹ 5,460 more rows
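In the output above, Sex and Ethnicity share one value column with the numeric measures, and `Num_of_deaths` is a character column. One possible alternative, sketched here under assumptions rather than as the final method, keeps Sex and Ethnicity as identifier columns and pivots only the three numeric measures. The toy rows are taken from the `head()` output with the renamed columns; the names `toy`, `measure_cols`, `measure`, and `value` are illustrative.

```r
library(tidyr)

# Two rows from the head() output, with the measures already numeric.
toy <- data.frame(
  Year = c(2009, 2009),
  Cause = c("Human Immunodeficiency Virus Disease (HIV: B20-B24)",
            "Chronic Lower Respiratory Diseases (J40-J47)"),
  Sex = c("F", "F"),
  Ethnicity = c("Hispanic", "Hispanic"),
  Deaths = c(96, 155),
  Death.Rate = c(8, 12.9),
  Age.Adjusted.Death.Rate = c(8.1, 16)
)

# Only the numeric measures are stacked; Year, Cause, Sex and Ethnicity
# stay as identifier columns on every row.
measure_cols <- c("Deaths", "Death.Rate", "Age.Adjusted.Death.Rate")

toy_long <- pivot_longer(
  toy,
  cols = all_of(measure_cols),
  names_to = "measure",
  values_to = "value"
)
print(toy_long)
```

With this shape the `value` column stays numeric, so it can be summarised or plotted directly and grouped by Sex or Ethnicity.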

Victor Torres

2024-04-17
