Project 2 - Data Set 1

Cesar L. Espitia

March 12, 2017

Data Used

My first data set was the one I posted about the shipping analysis. ‘tidyr’ and ‘dplyr’ were the main methods to manipulate data. This data comes from work data that I am allowed to use.

Figure 1. Data Set 1. Shipping Data.

Figure 1. Data Set 1. Shipping Data.

Reading in Data

This data was read in using CSV format and all NAs were removed.

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Warning: package 'knitr' was built under R version 3.3.2
## Loading required package: DBI
Table 1. Unitdy Data
Date Values China Canada Tawian Singapore US England Korea Hong.Kong
12/29/16 Price of Carrier 518
Shipping Fees Collected 260
12/30/16 Price of Carrier 232
Shipping Fees Collected 132
1/3/17 Price of Carrier 1143.5
Shipping Fees Collected 752

The following begins to cleanup process.

// Cleaning Data
i <- 0

while (i<1000) {
  Project6DS1[i+2,1]<-Project6DS1[i+1,1]
  i <- i+2
}

Project6DS1 <- Project6DS1[c(1:56),]

Project6DS1<-tbl_df(Project6DS1)
Project6DS1a <- Project6DS1  %>% gather("Country","Costs",3:10)

#Convert character to numeric
Project6DS1a[, c(4)] <- sapply(Project6DS1a[, c(4)], as.numeric)
Project6DS1a <- Project6DS1a %>% group_by(Date, Values, Country) %>% spread(Values, Costs)
colnames(Project6DS1a) <- str_to_title(colnames(Project6DS1a))

Project6DS1a[is.na(Project6DS1a)] <- 0

kable(head(Project6DS1a), caption="Table 2. Data Cleaned Up")
}
## Warning: attributes are not identical across measure variables; they will
## be dropped
## Warning in lapply(X = X, FUN = FUN, ...): NAs introduced by coercion
Table 2. Data Cleaned Up
Date Country Price Of Carrier Shipping Fees Collected
1/10/17 Canada 279.5599 336
1/10/17 China 318.0000 108
1/10/17 England 0.0000 0
1/10/17 Hong.Kong 13.0000 20
1/10/17 Korea 0.0000 0
1/10/17 Singapore 0.0000 0

Analyze Data

The following is the analysis of the data trying to determine if we collect enough fees by country overall. This was done using the mean across all days.

// Preparing Data for Analysis

Project6DS1b <- Project6DS1a %>% mutate(Delta = `Shipping Fees Collected`-`Price Of Carrier`) 
Project6DS1b <- Project6DS1b[!Project6DS1b$Delta == 0, ]
#Method 1
Project6DS1c <- Project6DS1b %>% group_by(Country) %>% summarise(FeeDelta = mean(Delta,2))
kable(Project6DS1c, caption="Table 3. Average Delta Fees Collected by Country")
#prep data for analysis
Project6DS1b <- Project6DS1a %>% mutate(Delta = `Shipping Fees Collected`-`Price Of Carrier`) 
Project6DS1b <- Project6DS1b[!Project6DS1b$Delta == 0, ]
#Method 1
Project6DS1c <- Project6DS1b %>% group_by(Country) %>% summarise(FeeDelta = mean(Delta,2))
kable(Project6DS1c, caption="Table 3. Average Delta Fees Collected by Country")
Table 3. Average Delta Fees Collected by Country
Country FeeDelta
Canada 16.650001
China -100.000000
England 20.250015
Hong.Kong 7.000000
Korea 4.500002
Singapore 7.200011
Tawian 7.000000
US -7.399991

Not enough fees are collected for packages going to China or the US.