ANLY545 Project Concept Presentation: Cancer Data Analysis

Vaidyanathan Subramanian, Rahul Singh

2022-04-30

Cancer Disease in the United States

Cancer is “a disease in which some of the body’s cells grow uncontrollably and spread to other parts of the body” (National Cancer Institute 2021). There are many different types of cancer, and those different types are usually named for the part of the body where the cancer starts. A person in the United States has roughly a one in three chance of developing cancer during their lifetime (American Cancer Society 2021).

How Does Cancer Develop?

Cancer is a genetic disease—that is, it is caused by changes to genes that control the way our cells function, especially how they grow and divide.

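Genetic changes that cause cancer can happen because of errors that occur as cells divide, because of damage to DNA caused by harmful substances in the environment (such as chemicals in tobacco smoke or ultraviolet rays from the sun), or because they were inherited from our parents (National Cancer Institute 2021).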

Rate of New Cancers in the United States

All Types of Cancer, All Ages, All Races and Ethnicities, Male and Female
Rate of New Cancers in the United States, 2018

Top 10 Cancers by Rates of New Cancer Cases

Top 10 Cancers by Rates of Cancer Deaths

Purpose

Current and historical industrial activity almost always releases toxic chemicals into the environment, and some of these chemicals are known or suspected carcinogens (National Institutes of Health 2018). While plant operators can implement technical and operational methods to reduce these emissions, and governments can use legislation to encourage plant operators to do so, no industrial process can be completely clean, and industrially produced toxins remain an unavoidable part of life in a modern society.

Potential links between industrial operations and cancer hot spots can be investigated using publicly available geospatial data. While the causes of individual tumors are complex and can rarely be established with certainty, aggregated data analysis can provide important insights and help direct further (and sometimes scarce) investigative resources.

We would like to analyze cancer registry data to see whether cancer incidence is linked to geographic location, age, behavioral risk factors, and industrial toxins in the environment.

Hypothesis

Data

To explore the relationship between heart disease and patients’ characteristics, we need the following information: cases of patients who experienced heart disease; age; gender; hematic index related to heart health; symptoms and severity.

Data Resource

Dataset Analysis

From the Cleveland database, a subset of 14 attributes containing both categorical and numerical data. Attributes of interest: Age (Real); Gender (Binary); Chest Pain Type (Nominal); Resting Blood Pressure (Real); Fasting Blood Sugar (Real); Thalassemia (Nominal); Exercise Induced Angina (Binary); Heart Disease (Binary). There are no missing values.
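Because the attributes mix real-valued and categorical types, a real-valued attribute such as age usually has to be grouped into bands before it can be cross-tabulated against a binary outcome. The sketch below shows one possible way to do this with the Python standard library. The records, the cut-offs (45 and 60), and the helper names `age_band` and `crosstab` are illustrative assumptions, not values or code from the project.

```python
from collections import Counter

# Hypothetical records for illustration only; not taken from the dataset.
records = [
    {"age": 41, "gender": "F", "heart_disease": False},
    {"age": 52, "gender": "M", "heart_disease": True},
    {"age": 63, "gender": "M", "heart_disease": True},
    {"age": 38, "gender": "M", "heart_disease": False},
    {"age": 67, "gender": "F", "heart_disease": True},
    {"age": 58, "gender": "F", "heart_disease": False},
]


def age_band(age):
    """Map a real-valued age to a coarse category (cut-offs are assumptions)."""
    if age < 45:
        return "<45"
    if age < 60:
        return "45-59"
    return "60+"


def crosstab(rows, row_fn, col_fn):
    """Count records for each (row category, column category) pair."""
    return Counter((row_fn(r), col_fn(r)) for r in rows)


# Cross-tabulate age band against the heart disease indicator.
table = crosstab(records, lambda r: age_band(r["age"]), lambda r: r["heart_disease"])
for (band, disease), count in sorted(table.items()):
    print(band, disease, count)
```

The resulting counts are the kind of input a test of independence or a mosaic display works from. The choice of cut-offs affects the result, so it should be justified rather than tuned.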

Analytic Scope/Methods

Descriptive and correlation analysis

Methods: χ² test of independence; loglinear analysis for multiple attributes

H0: There is no relationship between the variables

H1: There is a relationship between the variables

Variables: Age and/or Gender vs. Heart Disease; Age and/or Gender vs. Hematic index

Graphics: bar charts, mosaic displays
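To make the χ² test of independence concrete, here is a minimal standard-library sketch of the Pearson statistic, its degrees of freedom, and the upper-tail p-value. The function names `chi2_sf` and `chi2_independence` and the example counts are assumptions made for illustration. The sketch applies no continuity correction and assumes every expected count is greater than zero.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 1 (odd df) or df = 2 (even df) ...
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # ... then step up using Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def chi2_independence(observed):
    """Pearson chi-square test of independence for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical 2x2 counts (rows: gender, columns: heart disease no/yes).
counts = [[10, 20], [20, 10]]
stat, df, p = chi2_independence(counts)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

A small p-value (for example below 0.05) would be evidence against H0 for that pair of variables. In practice a library routine such as SciPy's `scipy.stats.chi2_contingency` would normally be used instead; note that it applies Yates' continuity correction to 2x2 tables by default, so its result can differ from this sketch.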

Analytic Scope/Pitfalls

Selection bias may exist in the data. With a sample size of 270, the data may not represent the population. Some confounders that are not included in the dataset may influence the results.
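As a rough illustration of what a sample of 270 can support, the sketch below computes the approximate 95% margin of error for an estimated proportion. It assumes simple random sampling and the normal approximation, and the function name `margin_of_error` is hypothetical. It quantifies sampling variability only and says nothing about selection bias or confounding.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)


# Worst case (p = 0.5) for a sample of 270.
print(f"{margin_of_error(270):.3f}")  # prints 0.060
```

Under these assumptions, a proportion estimated from 270 cases is uncertain by about 6 percentage points either way, and halving that margin would take roughly four times as many cases.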

References