Data Used
My first data set was the one I posted about the shipping analysis. ‘tidyr’ and ‘dplyr’ were the main methods to manipulate data. This data comes from work data that I am allowed to use.
Figure 1. Data Set 1. Shipping Data.
Reading in Data
This data was read in using CSV
format and all NAs were removed.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Warning: package 'knitr' was built under R version 3.3.2
## Loading required package: DBI
Date | Values | China | Canada | Tawian | Singapore | US | England | Korea | Hong.Kong |
---|---|---|---|---|---|---|---|---|---|
12/29/16 | Price of Carrier | 518 | |||||||
Shipping Fees Collected | 260 | ||||||||
12/30/16 | Price of Carrier | 232 | |||||||
Shipping Fees Collected | 132 | ||||||||
1/3/17 | Price of Carrier | 1143.5 | |||||||
Shipping Fees Collected | 752 |
The following begins to cleanup process.
// Cleaning Data
i <- 0
while (i<1000) {
Project6DS1[i+2,1]<-Project6DS1[i+1,1]
i <- i+2
}
Project6DS1 <- Project6DS1[c(1:56),]
Project6DS1<-tbl_df(Project6DS1)
Project6DS1a <- Project6DS1 %>% gather("Country","Costs",3:10)
#Convert character to numeric
Project6DS1a[, c(4)] <- sapply(Project6DS1a[, c(4)], as.numeric)
Project6DS1a <- Project6DS1a %>% group_by(Date, Values, Country) %>% spread(Values, Costs)
colnames(Project6DS1a) <- str_to_title(colnames(Project6DS1a))
Project6DS1a[is.na(Project6DS1a)] <- 0
kable(head(Project6DS1a), caption="Table 2. Data Cleaned Up")
}
## Warning: attributes are not identical across measure variables; they will
## be dropped
## Warning in lapply(X = X, FUN = FUN, ...): NAs introduced by coercion
Date | Country | Price Of Carrier | Shipping Fees Collected |
---|---|---|---|
1/10/17 | Canada | 279.5599 | 336 |
1/10/17 | China | 318.0000 | 108 |
1/10/17 | England | 0.0000 | 0 |
1/10/17 | Hong.Kong | 13.0000 | 20 |
1/10/17 | Korea | 0.0000 | 0 |
1/10/17 | Singapore | 0.0000 | 0 |
Analyze Data
The following is the analysis of the data trying to determine if we collect enough fees by country overall. This was done using the mean across all days.
// Preparing Data for Analysis
Project6DS1b <- Project6DS1a %>% mutate(Delta = `Shipping Fees Collected`-`Price Of Carrier`)
Project6DS1b <- Project6DS1b[!Project6DS1b$Delta == 0, ]
#Method 1
Project6DS1c <- Project6DS1b %>% group_by(Country) %>% summarise(FeeDelta = mean(Delta,2))
kable(Project6DS1c, caption="Table 3. Average Delta Fees Collected by Country")
#prep data for analysis
Project6DS1b <- Project6DS1a %>% mutate(Delta = `Shipping Fees Collected`-`Price Of Carrier`)
Project6DS1b <- Project6DS1b[!Project6DS1b$Delta == 0, ]
#Method 1
Project6DS1c <- Project6DS1b %>% group_by(Country) %>% summarise(FeeDelta = mean(Delta,2))
kable(Project6DS1c, caption="Table 3. Average Delta Fees Collected by Country")
Country | FeeDelta |
---|---|
Canada | 16.650001 |
China | -100.000000 |
England | 20.250015 |
Hong.Kong | 7.000000 |
Korea | 4.500002 |
Singapore | 7.200011 |
Tawian | 7.000000 |
US | -7.399991 |
Not enough fees are collected for packages going to China or the US.