Project 2 - Data Set 2

Data Used

My second data set came from the discussion thread ‘Cause of Death’ posted by Raghunathan Ramnath. The ‘tidyr’ and ‘dplyr’ packages were the main tools used to manipulate the data. The data are the Multiple Cause of Death, 1999-2015 results.

Figure 1. Data Set 2. Multiple Cause of Death, 1999-2015 Results.

Reading in Data

The data were stored in a table created in MySQL and read into R with an SQL query.

Table 1. Untidy Data
ID	CensusRegion	Deaths	Population	CrudeRateper100000
1	Census Region 1: Northeast (CENS-R1)	461712	54653362	844.8
2	Census Region 2: Midwest (CENS-R2)	564665	66293689	851.8
3	Census Region 3: South (CENS-R3)	924360	110688742	835.1
4	Census Region 4: West (CENS-R4)	472975	69595414	679.6
5	Total	2423712	301231207	840.6
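The crude rate is deaths divided by population, multiplied by 100,000. That means the last column of Table 1 can be checked against the other two. Here is a minimal base-R sketch of that check. The vector names `deaths`, `population` and `crude_rate` are illustrative and do not come from the original code.

```r
# Deaths and population per census region, copied from Table 1
deaths <- c(Northeast = 461712, Midwest = 564665, South = 924360,
            West = 472975, Total = 2423712)
population <- c(Northeast = 54653362, Midwest = 66293689, South = 110688742,
                West = 69595414, Total = 301231207)

# Crude rate = deaths per 100,000 people, rounded to one decimal like Table 1
crude_rate <- round(deaths / population * 100000, 1)
print(crude_rate)
```

The four regional rates reproduce Table 1. The Total row, however, works out to 804.6 rather than the 840.6 shown, which may be two transposed digits in the source data.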

The following code begins the cleanup process.

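# Cleaning data
library(dplyr)    # as_tibble(), %>%
library(tidyr)    # gather(), spread()
library(stringr)  # str_to_title()
library(knitr)    # kable()

# Convert to a tibble, then stack Deaths, Population and CrudeRateper100000 (columns 3:5) into key/value pairs
Project6DS2 <- as_tibble(Project6DS2)
Project6DS2a <- Project6DS2 %>% gather("RegionMetrics", "Values", 3:5)
# Drop the ID column, keeping CensusRegion, RegionMetrics and Values
Project6DS2a <- Project6DS2a[2:4]

# Spread back out: one row per metric, one column per census region
Project6DS2a <- Project6DS2a %>% spread(CensusRegion, Values)
colnames(Project6DS2a) <- str_to_title(colnames(Project6DS2a))

kable(head(Project6DS2a), caption="Table 2. Data Cleaned Up and Inverted for Analysis")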

Table 2. Data Cleaned Up and Inverted for Analysis
Regionmetrics	Census Region 1: Northeast (Cens-R1)	Census Region 2: Midwest (Cens-R2)	Census Region 3: South (Cens-R3)	Census Region 4: West (Cens-R4)	Total
CrudeRateper100000	844.8	851.8	835.1	679.6	840.6
Deaths	461712.0	564665.0	924360.0	472975.0	2423712.0
Population	54653362.0	66293689.0	110688742.0	69595414.0	301231207.0
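The gather-then-spread step turns Table 1 on its side. As an illustration only, the same reshape can be sketched in base R without tidyr. The region labels are shortened, and the names `region_data`, `metrics`, `long` and `wide` are hypothetical.

```r
# Table 1 with shortened region labels
region_data <- data.frame(
  CensusRegion = c("Northeast", "Midwest", "South", "West", "Total"),
  Deaths = c(461712, 564665, 924360, 472975, 2423712),
  Population = c(54653362, 66293689, 110688742, 69595414, 301231207),
  CrudeRateper100000 = c(844.8, 851.8, 835.1, 679.6, 840.6),
  stringsAsFactors = FALSE
)
metrics <- c("Deaths", "Population", "CrudeRateper100000")

# Step 1 (like gather): one row per region/metric pair
long <- data.frame(
  CensusRegion = rep(region_data$CensusRegion, times = length(metrics)),
  RegionMetrics = rep(metrics, each = nrow(region_data)),
  Values = unlist(region_data[metrics], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Step 2 (like spread): rows = metrics, columns = regions in their original order.
# Each cell holds exactly one value, so sum() simply returns it.
wide <- tapply(long$Values,
               list(long$RegionMetrics,
                    factor(long$CensusRegion, levels = region_data$CensusRegion)),
               sum)
print(wide)
```

`tapply()` orders the metric names alphabetically. This happens to match the row order of Table 2.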

Analyze Data

The thread did not ask for a specific analysis; it only described the type of data available. A natural question is how each region compares with the average across the four regions.

# Prepare data for analysis: mean of the four census regions (Total excluded), rounded to a whole number
Project6DS2b <- Project6DS2a %>%
  mutate(Average = round((`Census Region 1: Northeast (Cens-R1)` + `Census Region 2: Midwest (Cens-R2)` +
                          `Census Region 3: South (Cens-R3)` + `Census Region 4: West (Cens-R4)`) / 4))

# Method 1: display each metric with its regional average
kable(Project6DS2b, caption="Table 3. Average per Metric")

Table 3. Average per Metric
Regionmetrics	Census Region 1: Northeast (Cens-R1)	Census Region 2: Midwest (Cens-R2)	Census Region 3: South (Cens-R3)	Census Region 4: West (Cens-R4)	Total	Average
CrudeRateper100000	844.8	851.8	835.1	679.6	840.6	803
Deaths	461712.0	564665.0	924360.0	472975.0	2423712.0	605928
Population	54653362.0	66293689.0	110688742.0	69595414.0	301231207.0	75307802
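When computing averages like these, note that R's `%/%` operator is integer division and drops the fractional part, whereas `/` performs ordinary division. Here is a minimal check on the four regional crude rates from Table 1. The names `crude_rate`, `truncated` and `exact` are illustrative.

```r
# Regional crude rates per 100,000 from Table 1 (Total row excluded)
crude_rate <- c(Northeast = 844.8, Midwest = 851.8, South = 835.1, West = 679.6)

truncated <- sum(crude_rate) %/% 4  # integer division: fractional part is dropped
exact     <- sum(crude_rate) / 4    # ordinary division, same value as mean(crude_rate)

print(c(truncated = truncated, exact = exact))
```

`round()` can then be applied to `exact` if a whole-number average is wanted for display.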

Some observations:
- For the crude rate per 100,000, Region 4 (West) pulls the average down.
- For deaths and population, Region 3 (South) pulls the average up.
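These observations can be checked by subtracting each metric's four-region mean from every region and finding the region farthest from that mean. The sign of the difference shows whether the region raises or lowers the average. Below is a base-R sketch using the values in Table 3. The object names are illustrative.

```r
regions <- c("Northeast", "Midwest", "South", "West")
values <- matrix(
  c(844.8, 851.8, 835.1, 679.6,
    461712, 564665, 924360, 472975,
    54653362, 66293689, 110688742, 69595414),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CrudeRateper100000", "Deaths", "Population"), regions)
)

averages  <- rowMeans(values)               # mean of the four regions per metric
deviation <- sweep(values, 1, averages)     # each value minus its metric's mean

# Region with the largest absolute deviation for each metric
largest_pull <- apply(deviation, 1, function(d) names(d)[which.max(abs(d))])
print(deviation)
print(largest_pull)
```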

For further analysis, I would consider digging deeper into Region 3.

Cesar L. Espitia

March 12, 2017
