Project 2 - Data Set 1

Data Used

My first data set was the one I posted about the shipping analysis. ‘tidyr’ and ‘dplyr’ were the main methods to manipulate data. This data comes from work data that I am allowed to use.

Figure 1. Data Set 1. Shipping Data.

Reading in Data

This data was read in using CSV format and all NAs were removed.

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Warning: package 'knitr' was built under R version 3.3.2

## Loading required package: DBI

Table 1. Unitdy Data
Date	Values	China
12/29/16	Price of Carrier	518
	Shipping Fees Collected	260
12/30/16	Price of Carrier	232
	Shipping Fees Collected	132
1/3/17	Price of Carrier	1143.5
	Shipping Fees Collected	752

The following begins to cleanup process.

// Cleaning Data
i <- 0

while (i<1000) {
  Project6DS1[i+2,1]<-Project6DS1[i+1,1]
  i <- i+2
}

Project6DS1 <- Project6DS1[c(1:56),]

Project6DS1<-tbl_df(Project6DS1)
Project6DS1a <- Project6DS1  %>% gather("Country","Costs",3:10)

#Convert character to numeric
Project6DS1a[, c(4)] <- sapply(Project6DS1a[, c(4)], as.numeric)
Project6DS1a <- Project6DS1a %>% group_by(Date, Values, Country) %>% spread(Values, Costs)
colnames(Project6DS1a) <- str_to_title(colnames(Project6DS1a))

Project6DS1a[is.na(Project6DS1a)] <- 0

kable(head(Project6DS1a), caption="Table 2. Data Cleaned Up")
}

## Warning: attributes are not identical across measure variables; they will
## be dropped

## Warning in lapply(X = X, FUN = FUN, ...): NAs introduced by coercion

Table 2. Data Cleaned Up
Date	Country	Price Of Carrier	Shipping Fees Collected
1/10/17	Canada	279.5599	336
1/10/17	China	318.0000	108
1/10/17	England	0.0000	0
1/10/17	Hong.Kong	13.0000	20
1/10/17	Korea	0.0000	0
1/10/17	Singapore	0.0000	0

Analyze Data

The following is the analysis of the data trying to determine if we collect enough fees by country overall. This was done using the mean across all days.

// Preparing Data for Analysis

Project6DS1b <- Project6DS1a %>% mutate(Delta = `Shipping Fees Collected`-`Price Of Carrier`) 
Project6DS1b <- Project6DS1b[!Project6DS1b$Delta == 0, ]
#Method 1
Project6DS1c <- Project6DS1b %>% group_by(Country) %>% summarise(FeeDelta = mean(Delta,2))
kable(Project6DS1c, caption="Table 3. Average Delta Fees Collected by Country")

#prep data for analysis
Project6DS1b <- Project6DS1a %>% mutate(Delta = `Shipping Fees Collected`-`Price Of Carrier`) 
Project6DS1b <- Project6DS1b[!Project6DS1b$Delta == 0, ]
#Method 1
Project6DS1c <- Project6DS1b %>% group_by(Country) %>% summarise(FeeDelta = mean(Delta,2))
kable(Project6DS1c, caption="Table 3. Average Delta Fees Collected by Country")

Table 3. Average Delta Fees Collected by Country
Country	FeeDelta
Canada	16.650001
China	-100.000000
England	20.250015
Hong.Kong	7.000000
Korea	4.500002
Singapore	7.200011
Tawian	7.000000
US	-7.399991

Not enough fees are collected for packages going to China or the US.

Project 2 - Data Set 1

Cesar L. Espitia

March 12, 2017

Data Used

Reading in Data

Analyze Data