Describe Your Data:
ggplot(data, aes(x=Value)) +
geom_histogram(color="darkblue", fill="lightblue", bins = 26) +
labs(title = "Distribution of the values of cargo", x = "Value", y = "Count") +
theme_light()
The graph shows that most of the cargo is valued under 100 euros and the rest under 500 euros, with several exceptions of up to 2500 euros per cargo item.
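The shares read off the histogram can also be checked numerically. The following is only a minimal sketch: the `value` vector is hypothetical toy data standing in for `data$Value`, and the 100 and 500 euro thresholds are the ones mentioned in the text.

```r
# Hypothetical toy values (euros); in the report this would be data$Value
value <- c(20, 45, 60, 80, 95, 150, 300, 480, 2500)

# mean() of a logical vector gives the proportion of TRUE entries
share_under_100 <- mean(value < 100)
share_under_500 <- mean(value < 500)

print(c(under_100 = share_under_100, under_500 = share_under_500))
```

Reporting these proportions next to the histogram makes the statement independent of the chosen number of bins.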
par(mfrow = c(1,3))
boxplot(data$LDM, main = "Distribution of LDM\nof the cargo", ylab = "LDM")
boxplot(data$Weight, main = "Distribution of Weight\nof the cargo", ylab = "Weight")
boxplot(data$Volume, main = "Distribution of Volume\nof the cargo", ylab = "Volume")
The pattern from the previous graph is visible here as well: because most of the cargo is of low value, its size attributes (LDM, weight and volume) are low too.
ggplot(data, aes(x=UnitTypeID, y=datecount, fill=UnitTypeID)) +
geom_bar(stat="identity") +
labs(title = "Number of different types of cargo",
x = "Unit type ID", y = "Count") +
theme_light()
The most common cargo type is EP(120x80x220), which corresponds to unit type ID 33, with 25 cargo items. The runner-up is the VNT type (unit type ID 47) with 17 cargo items.
ggplot(data, aes(x=Terminal_s, y=datecount, fill = Terminal_s)) +
geom_bar(stat="identity") +
labs(title = "Number of cargo that was redirected to a terminal in the sender country",
x = "", y = "Count") +
theme_light()
From this graph we can see that 45 cargo items were not redirected to a terminal, leaving 21 (out of 66 in total) that were redirected to a terminal in the country from which they were sent.
ggplot(data, aes(x=Terminal_r, y=datecount, fill = Terminal_r)) +
geom_bar(stat="identity") +
labs(title = "Number of cargo that was redirected to a terminal in the receiver country",
x = "", y = "Count") +
theme_light()
From this graph we can see that 50 cargo items were not redirected to a terminal, leaving 16 that were redirected to a terminal in the country to which they were sent.
par(mfrow = c(1,2))
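# Keep only rows whose FirstDimension can be parsed as a number.
# Logical indexing is used instead of data[-which(...), ], which would
# return zero rows if no value failed to parse.
# suppressWarnings() hides the expected "NAs introduced by coercion" warning.
cdata <- data[!is.na(suppressWarnings(as.numeric(data$FirstDimension))), ]
cdata$FirstDimension <- as.numeric(cdata$FirstDimension)
cdata$SecondDimension <- as.numeric(cdata$SecondDimension)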
boxplot(cdata$FirstDimension, main = "Distribution of the\nFirstDimension of the cargo", ylab = "FirstDimension")
boxplot(cdata$SecondDimension, main = "Distribution of the\nSecondDimension of the cargo", ylab = "SecondDimension")
Here we can see that more than half of the cargo items share the same first and second dimension values, which explains the shape of the boxplots in the graphs.
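The claim about repeated values can be verified directly. Below is a minimal sketch on hypothetical toy data (the vector `x` stands in for `cdata$FirstDimension`; the numbers are made up). When a single value accounts for more than half of the observations, the median must equal that value, which is why the median line of the boxplot sits on it.

```r
# Hypothetical toy values standing in for cdata$FirstDimension
x <- c(120, 120, 120, 120, 80, 100, 240)

# Frequency table of the distinct values
freq <- table(x)

# Most frequent value and the share of observations it accounts for
mode_value <- as.numeric(names(which.max(freq)))
share <- max(freq) / length(x)

# With share > 0.5 the median necessarily coincides with the most frequent value
print(c(mode = mode_value, share = share, median = median(x)))
```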
Sys.setlocale("LC_ALL","English")
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
ggplot(data = data[-1,], aes(x = date, y = datecount)) +
geom_bar(stat = "identity", fill = "purple") +
labs(title = "Number of cargo requests per day",
x = "Date", y = "Count")
In this graph, the 8th, 15th, 22nd and 29th of November are Mondays. We can see that no cargo orders were made on weekends. A weekly cycle is also visible: the number of orders grows from Monday to Thursday, falls to a low level on Friday and drops to zero over the weekend.
It should be noted that this dataset contains too few observations and spans too short a period for a proper seasonality evaluation.
ggplot(df, aes(x=day, y=number, group=week, color=week)) +
geom_line(size = 2) + theme_light() +
labs(title = "Number of cargo requests per day",
x = "", y = "Number of cargo")
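The construction of `df` is not shown in this section. A minimal sketch of how a data frame with the columns `day`, `week` and `number` might be built is given below. Everything in it is an assumption: the toy `requests` data frame (one row per cargo request with a `Date` column) is hypothetical, and the year 2021 is used only because 8 November falls on a Monday in that year.

```r
# Hypothetical toy data: one row per cargo request
requests <- data.frame(
  date = as.Date(c("2021-11-08", "2021-11-08", "2021-11-09",
                   "2021-11-15", "2021-11-16", "2021-11-16"))
)

# Count requests per calendar day
counts <- as.data.frame(table(date = requests$date), stringsAsFactors = FALSE)
counts$date <- as.Date(counts$date)

# %u is the ISO weekday number (1 = Monday) and %V the ISO week number;
# both are numeric codes, so the result does not depend on the locale
df <- data.frame(
  day = factor(format(counts$date, "%u"), levels = as.character(1:7),
               labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
  week = factor(format(counts$date, "%V")),
  number = counts$Freq
)

print(df)
```

Keeping `day` as an ordered factor with all seven levels lets the x axis run from Monday to Sunday, and using `week` as a factor gives one line per week when it is mapped to `group` and `color`.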