CPU=read.csv("https://docs.google.com/spreadsheets/d/14EAUjBXw1Je0C0BMVquT4WOGzWRUtWCel7J69wbavn4/pub?output=csv")
First, we select the data of week ending 03/19 & 03/26.
#Selecting data of week ending 03/19 & 03/26
CPUWeek1=CPU[CPU$Week..==11,]
View(CPUWeek1)
CPUWeek2=CPU[CPU$Week..==12,]
View(CPUWeek2)
#To write to Excel and verify results
#write.csv(CPUWeek2, file = "CPUWeek2.csv")
Now, we calculate CPU, total CPU, total units, average percentage full and total miles for both weeks. The results are stored in the respective R objects.
#Calculating CPU
CPUWeek1$CPU=CPUWeek1$Total.Cost.of.Load/CPUWeek1$Units
CPUWeek2$CPU=CPUWeek2$Total.Cost.of.Load/CPUWeek2$Units
#CPUWeek1$CPU[CPUWeek1$CPU==Inf]=CPUWeek1$Total.Cost.of.Load
#Cost without account for the 0 unit costs incurred
TotalCPUWeek1=sum(CPUWeek1$CPU[CPUWeek1$CPU!=Inf])
TotalCPUWeek2=sum(CPUWeek2$CPU[CPUWeek2$CPU!=Inf])
#Zero Unit costs incurred
ZeroUnitCPUWeek1=sum(CPUWeek1$Total.Cost.of.Load[CPUWeek1$Units==0])
ZeroUnitCPUWeek2=sum(CPUWeek2$Total.Cost.of.Load[CPUWeek2$Units==0])
#Total Miles Covered
MilesWeek1=sum(CPUWeek1$Miles)
MilesWeek2=sum(CPUWeek2$Miles)
#Pts full
FullPercentageWeek1=mean(CPUWeek1$Pts....Full.)
FullPercentageWeek2=mean(CPUWeek2$Pts....Full.)
#Total units
TotalUnitsWeek1=sum(CPUWeek1$Units)
TotalUnitsWeek2=sum(CPUWeek2$Units)
StdDevTCPUWeek1=sqrt(var(CPUWeek1$CPU[CPUWeek1$CPU!=Inf]))
StdDevTCPUWeek2=sqrt(var(CPUWeek2$CPU[CPUWeek2$CPU!=Inf]))
Next, we will try to determine what factors impact the CPU by looking at the entire data.
#Factors that affect CPU
CPU$CPU=CPU$Total.Cost.of.Load/CPU$Units
CPUInfremoved=CPU[CPU$Units!=0,]
#Data Partitioned
dt = sort(sample(nrow(CPUInfremoved), nrow(CPUInfremoved)*.7))
train<-CPUInfremoved[dt,]
test<-CPUInfremoved[-dt,]
#library(knitr)
#library(dplyr)
#library(purrr)
#Removing columns with too many missings and one factor level
#data= CPUInfremoved %>% select_if(function(col) mean(is.na(col)) < .1 && NROW(unique(col)) > 1) %>% data.frame
library(MASS)
Warning: package 'MASS' was built under R version 3.3.2
model=lm(CPU~Miles+Category+Origin.ID+SCAC+Primary...Non.Primary+Mode+Pts....Full., data=(train))
step <- stepAIC(model, direction="both")
Start: AIC=36583.69
CPU ~ Miles + Category + Origin.ID + SCAC + Primary...Non.Primary +
Mode + Pts....Full.
Df Sum of Sq RSS AIC
<none> 548613 36584
- Primary...Non.Primary 1 1752 550364 36610
- Mode 1 5974 554586 36677
- Category 1 20569 569181 36906
- Pts....Full. 1 27848 576461 37018
- SCAC 73 95069 643682 37846
- Miles 1 84648 633260 37846
- Origin.ID 10 1064776 1613388 46068
#prediction=predict(model, newdata = test)
#Error=sum((prediction-test$CPU)^2)/length(test$CPU)
Based on the results above, statistically significant factors are determined.
#step$anova # display results
The model result indicates how the factors impact the CPU in general for all the weeks in dataset. A positive estimate means that the factor increases the CPU. A negative estimate means that the factor decreases the CPU. Based on the ouput below a tableau report is created.
summary(rlm(CPUInfremoved$CPU~CPUInfremoved$Miles+CPUInfremoved$Pts....Full.))
Call: rlm(formula = CPUInfremoved$CPU ~ CPUInfremoved$Miles + CPUInfremoved$Pts....Full.)
Residuals:
Min 1Q Median 3Q Max
-86.047 -5.084 -2.133 8.445 208.614
Coefficients:
Value Std. Error t value
(Intercept) 94.2566 1.2402 76.0007
CPUInfremoved$Miles 0.0143 0.0002 86.2784
CPUInfremoved$Pts....Full. -0.9101 0.0126 -72.1816
Residual standard error: 8.665 on 12585 degrees of freedom
sort(model$coefficients)
SCACGADL Origin.IDF317 SCACHLCU
-14.769583496 -11.500448440 -11.246948631
SCACXPOL SCACPSKL520200 SCACJAWH
-8.068058523 -7.876983224 -7.278218688
SCACDCMT Origin.IDF048 CategoryStock Transfer
-6.640553892 -6.295447491 -3.696450436
SCACSCGP SCACHAEI Primary...Non.PrimaryP
-2.115508417 -1.897690063 -1.731569898
Origin.IDF103 Pts....Full. Miles
-1.350941072 -0.294879630 0.007534757
SCACSQTK SCACCNRU SCACBNUN
0.257121674 1.080549101 1.879745767
SCACKCSM Origin.IDF397 SCACPSKL502100
2.159280275 2.417451160 2.531786158
SCACHUBG SCACPSKL509600 SCACCSNY
2.704414649 2.997697218 3.438024370
Origin.IDF306 SCACBOMN SCACHJBI
3.652288969 3.825541471 4.023032458
SCACTFYI SCACEPES SCACMIOS
4.031277254 4.106234235 4.163247516
SCACKSTS SCACJNJR SCACWSGS
4.443898516 4.462088314 4.535117543
ModeIMC SCACDART SCACISCM
4.695022960 4.876518855 4.906990691
SCACCWAS SCACISCO Origin.IDF139
5.123881291 5.600820908 5.680373126
SCACGRTG SCACBHRI SCACHYWK
5.739987094 5.859938054 5.879354336
SCACSPDG SCACMLXP SCACMTLE
6.048853836 6.066988630 6.104667914
SCACTYEI SCACHGTN SCACPGPR
6.329560178 6.343917097 6.398661051
SCACCTRQ SCACLVTK SCACLRGR
6.414843642 6.537656082 6.609400857
SCACHCIL SCACSCNN SCACCRCJ
6.887468558 6.925909772 7.201544283
SCACRBTW SCACCEVE Origin.IDF311
7.328792372 7.368955737 7.467492826
SCACROEV SCACBTTE SCACDCLH
7.539347752 7.650017217 7.857442204
SCACMSLV SCACKNIG SCACCNWY
7.908054663 8.106791404 8.130330716
SCACPAPT SCACPSKA SCACNS
8.430671177 8.489247319 8.525747023
SCACHKRA SCACUSIT SCACSWFT
8.528227762 8.558642507 8.577654341
SCACCELC SCACPGLI SCACTQYL
8.678242190 9.146970857 9.435669544
SCACMXDN SCACSUSE SCACBOUF
9.599388978 10.155537825 10.286331891
SCACCLLQ SCACPGVE SCACCRCR
10.574734129 10.613219126 11.568552862
SCACPRQD SCACPTMD SCACPGLH
11.762154470 12.262625531 12.271720604
SCACHJBT SCACTCAM SCACCRPS
12.298092570 12.541620192 12.551019552
SCACHUMD Origin.IDF329 SCACCDNK
20.566713718 21.775957867 24.548364135
Origin.IDF314 (Intercept) Origin.IDF325
27.540131004 29.494882369 37.995831644
SCACRDWY
71.353131430