Simple Exploratory Data Analysis: Shipping_data
Introduction Shipment Port
Business Question
You can find the dataset here.
In the bustling world of international trade, ports serve as gateways
to exchange goods across borders. To shed light on the dynamics of port
shipments, we embark on a data-driven journey through the
Port Shipment Dataset. This rich collection comprises
260,000 items meticulously packed for shipment, offering a wealth of
information about various aspects of international trade. Our objective
is to unravel valuable insights from this dataset, employing two crucial
steps: data wrangling and exploratory data analysis (EDA).
Data Wrangling: Data wrangling is the foundational step in any data analysis project, ensuring data quality and preparing the data for further exploration. The Port Shipment Dataset presents a vast array of information about each shipment. The first challenge lies in assessing the dataset’s integrity by detecting missing values, inconsistencies, and outliers. We also need to standardize the data format so that attributes such as product name, price, weight, and dimensions are consistent. By cleaning and restructuring the data, we aim to create a reliable and harmonized foundation for the subsequent analysis.
Exploratory Data Analysis: With the data in a refined state, we embark on the exciting phase of exploratory data analysis (EDA). EDA allows us to gain a deeper understanding of the dataset’s characteristics, relationships, and patterns.
Overview Dataset
This dataset contains information on international shipments from a port. It includes the name of each product, its price, weight, length, width, and height, as well as the date it was shipped and the destination port. The data covers March 2023.
This dataset can be used for educational and research purposes, such as data analysis, machine learning, container problem, and data visualization. The data was collected with the intention of representing typical shipments that could be expected to be sent from one port to the other.
It is important to note that this dataset is not based on any actual shipments or real-world data. Therefore, the values and characteristics of the products, as well as the shipment dates and destinations, are purely hypothetical and should not be used for any practical or commercial purposes. Any conclusions drawn from the analysis of this dataset should be viewed as hypothetical and not representative of actual international shipments.
Dataset columns:
- name: Name of the item
- price($): Price of the item in dollars
- weight(kg): Weight of the item in kilograms
- length(m): Length of the item in meters
- width(m): Width of the item in meters
- height(m): Height of the item in meters
- shipment.date: Date when the item arrived at the port
- destination.port: Port where the item is destined to be sent
Data Preparation
Import Package & Data set
Import Package:
library(stringr)
library(ggplot2)
Import Dataset:
shipping_data <- read.csv(file = "data_input/shipping_data.csv")
rmarkdown::paged_table(shipping_data)
Checking Structure using str()
The str() function is used to display the
structure of the object. The object can be a vector,
data frame, list, or other R object.
str(shipping_data)
#> 'data.frame': 263821 obs. of 8 variables:
#> $ name : chr "Camera Bag" "Portable Bluetooth Keyboard" "Large Flat Rate Box" "Ceramic Tiles" ...
#> $ price.... : num 37.7 144.7 38.6 10.3 21.6 ...
#> $ weight..kg. : num 1.1 0.39 0.97 6.22 1.18 ...
#> $ length..m. : num 0.4 0.11 0.79 0.36 17.77 ...
#> $ width..m. : num 0.39 0.06 0.55 0.37 0.27 0.35 1.3 0.78 0.94 0.14 ...
#> $ height..m. : num 0.26 0.03 0.35 0.02 0.13 0.15 0.26 0.68 1.47 0.03 ...
#> $ shipment.date : chr "2023-03-19" "2023-03-21" "2023-03-25" "2023-03-15" ...
#> $ destination.port: chr "Port of Singapore (Singapore)" "Port of Busan (South Korea)" "Port of Tianjin (China)" "Port of Shanghai (China)" ...
The dataset provided has the following characteristics according to the str() function:
- It is a data.frame with 263,821 observations (rows) and 8 variables (columns).
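The dotted column names such as price.... come from read.csv(), which by default passes the header through make.names() and replaces every character that is not valid in an R name with a dot. The sketch below reproduces this on a toy data frame and then renames the columns. The header text "price ($)" and the new names (price_usd, weight_kg) are assumptions for illustration, not taken from the original file.

```r
# Headers as they might appear in the CSV (assumed for illustration)
raw_headers <- c("name", "price ($)", "weight (kg)")

# read.csv() applies make.names() when check.names = TRUE (the default)
mangled <- make.names(raw_headers)
print(mangled)

# Toy data frame carrying the mangled names
toy <- data.frame(name = "Camera Bag", price = 37.66, weight = 1.10)
names(toy) <- mangled

# Rename to shorter, unambiguous names (hypothetical choices)
names(toy) <- c("name", "price_usd", "weight_kg")
str(toy)
```

Renaming once, right after import, avoids typing names like price.... throughout the analysis. The rest of this report keeps the original dotted names.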
Checking Total Row and Column using dim() or nrow(), ncol()
The dim() function is used to find out the dimensions of
the dataframe.
dim(shipping_data)
#> [1] 263821 8
The nrow() function is used to determine the number of
rows, while ncol() is used to determine the number of
columns.
nrow_shipping_data <- nrow(shipping_data)
ncol_shipping_data <- ncol(shipping_data)
print(paste("Shipping Data total Row is: ", nrow_shipping_data))
#> [1] "Shipping Data total Row is: 263821"
print(paste("Shipping Data total Column is: ", ncol_shipping_data))
#> [1] "Shipping Data total Column is: 8"
Data Inspection
Explicit Coercion
The first step before any data analysis is to make sure the data is clean. One data cleansing technique is converting each column to its correct data type, which is known as explicit coercion.
# check the shipping data structure again
str(shipping_data)
#> 'data.frame': 263821 obs. of 8 variables:
#> $ name : chr "Camera Bag" "Portable Bluetooth Keyboard" "Large Flat Rate Box" "Ceramic Tiles" ...
#> $ price.... : num 37.7 144.7 38.6 10.3 21.6 ...
#> $ weight..kg. : num 1.1 0.39 0.97 6.22 1.18 ...
#> $ length..m. : num 0.4 0.11 0.79 0.36 17.77 ...
#> $ width..m. : num 0.39 0.06 0.55 0.37 0.27 0.35 1.3 0.78 0.94 0.14 ...
#> $ height..m. : num 0.26 0.03 0.35 0.02 0.13 0.15 0.26 0.68 1.47 0.03 ...
#> $ shipment.date : chr "2023-03-19" "2023-03-21" "2023-03-25" "2023-03-15" ...
#> $ destination.port: chr "Port of Singapore (Singapore)" "Port of Busan (South Korea)" "Port of Tianjin (China)" "Port of Shanghai (China)" ...
To change a data type, we can use the as.___() function, where ___ is replaced by the target data type. Examples:
- as.character()
- as.Date()
- as.integer()
- as.numeric()
- as.factor()
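As a minimal sketch of what these functions do, the example below coerces a few toy character vectors (made-up values, not read from the dataset) and shows why the target type matters: date arithmetic and sums only work after coercion.

```r
# Toy values stored as text, as they often are after read.csv()
date_text  <- c("2023-03-19", "2023-03-21")
price_text <- c("37.66", "144.65")
port_text  <- c("Busan", "Tokyo", "Busan")

ship_date <- as.Date(date_text)      # character -> Date (ISO format is parsed by default)
price     <- as.numeric(price_text)  # character -> numeric
port      <- as.factor(port_text)    # character -> factor with two levels

class(ship_date)
diff(ship_date)   # date arithmetic now works
sum(price)        # and so do numeric summaries
levels(port)
```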
In this dataset, I changed the type of one column: shipment.date, from character to Date.
# explicit coercion
shipping_data$shipment.date <- as.Date(shipping_data$shipment.date)
Check Missing Value
The is.na() function is used to check for missing values
for each value. The colSums() function is used to sum up
the values in each column.
colSums(is.na(shipping_data))
#> name price.... weight..kg. length..m.
#> 0 0 13 184
#> width..m. height..m. shipment.date destination.port
#> 0 0 2638 0
The shipment.date column has 2,638 missing values, weight..kg. has 13, and length..m. has 184. I will delete the affected rows because they make up only about 1% of the data, well under one tenth, and removing them simplifies the exploratory process.
## Missing value treatment
# Remove rows with a missing value in any column
shipping_data <- na.omit(shipping_data)
Conclusion: The shipping data has missing values. I removed every row that contains one, because these rows are only a small portion of the data and the analysis is exploratory. This leaves 260,987 rows.
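Before dropping rows, it can help to check how many rows na.omit() will actually remove, since a row with several missing fields is counted once per column by colSums() but removed only once. The following is a sketch on a toy data frame with made-up values, and the column names weight, length, and date are placeholders.

```r
toy <- data.frame(
  weight = c(1.10, NA, 0.97, 6.22),
  length = c(0.40, 0.11, NA, 0.36),
  date   = as.Date(c("2023-03-19", NA, "2023-03-25", "2023-03-15"))
)

# Missing values per column
na_per_column <- colSums(is.na(toy))

# Rows with at least one missing value (a row with two NAs counts once)
rows_with_na <- sum(!complete.cases(toy))

# Share of rows that na.omit() would drop
share_dropped <- rows_with_na / nrow(toy)

cleaned <- na.omit(toy)
print(na_per_column)
print(share_dropped)
print(nrow(cleaned))
```

In the shipping data the per-column counts add up to 2,835, while the row count falls by 2,834 (from 263,821 to 260,987), which is consistent with one row missing two values.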
Data Wrangling
Adding a New Variable
Generate a random arrival date, shipment.arrive.date
Because this dataset only has shipment.date and no arrival date, I took the initiative to add random dummy data so that more information can be extracted from the dataset.
# Set the seed for reproducibility
set.seed(123)
# Generate random additional time in days
random_days <- sample(1:7, nrow(shipping_data), replace = TRUE)
# Convert shipment.date to Date type
shipping_data$shipment.date <- as.Date(shipping_data$shipment.date)
# Calculate shipment.arrive.date: 7 fixed days plus 1 to 7 random days (8 to 14 days in total)
shipping_data$shipment.arrive.date <- shipping_data$shipment.date + 7 + random_days
Adding port.city and country columns based on destination.port
To dig deeper, I extracted the port city and the country from the destination.port column. For example:
| destination.port | port.city | country |
|---|---|---|
| Port of Singapore (Singapore) | Singapore | Singapore |
| Port of Busan (South Korea) | Busan | South Korea |
| Port of Tianjin (China) | Tianjin | China |
| Port of Shanghai (China) | Shanghai | China |
# Extract port city and country
result <- str_match(shipping_data$destination.port, ".*\\b(\\w+)\\s+\\(([^)]+)\\)$")
# Create new columns for port city and country
shipping_data$port.city <- result[, 2]
shipping_data$country <- result[, 3]
# Strip any parenthesized text left in port city (a safeguard; the capture group already excludes it)
shipping_data$port.city <- gsub("\\s*\\([^)]+\\)", "", shipping_data$port.city)
# Strip any parentheses left in country (also a safeguard)
shipping_data$country <- gsub("[()]", "", shipping_data$country)
head(result)
#> [,1] [,2] [,3]
#> [1,] "Port of Singapore (Singapore)" "Singapore" "Singapore"
#> [2,] "Port of Busan (South Korea)" "Busan" "South Korea"
#> [3,] "Port of Tianjin (China)" "Tianjin" "China"
#> [4,] "Port of Shanghai (China)" "Shanghai" "China"
#> [5,] "Port of Tianjin (China)" "Tianjin" "China"
#> [6,] "Port of Shanghai (China)" "Shanghai" "China"
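The pattern above captures only the last word before the parenthesis, which is enough for single-word cities such as Busan or Tianjin. A possible variant, sketched here with base R and assuming every value follows the "Port of <city> (<country>)" layout, also handles multi-word city names and makes unparseable values easy to find. The "Los Angeles" entry and the empty string are hypothetical inputs added for illustration.

```r
ports <- c("Port of Singapore (Singapore)",
           "Port of Busan (South Korea)",
           "Port of Los Angeles (United States)",  # hypothetical, not in the dataset
           "")                                     # hypothetical unparseable value

# "Port of <city> (<country>)": lazy city capture, then the parenthesized country
pattern <- "^Port of (.+?)\\s+\\(([^)]+)\\)$"
matches <- regmatches(ports, regexec(pattern, ports, perl = TRUE))

# Each match is c(full, city, country); a non-match is character(0)
parsed <- t(vapply(matches, function(m) {
  if (length(m) == 3) m[2:3] else c(NA_character_, NA_character_)
}, character(2)))
colnames(parsed) <- c("port.city", "country")

print(parsed)

# Rows that failed to parse, worth inspecting before the analysis
failed <- which(is.na(parsed[, "port.city"]))
print(failed)
```

Listing the failed rows this way is one option for investigating NA values that appear in port.city and country after extraction.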
Shipment Duration & Shipment Volume
Calculate the shipment duration:
This can provide insights into shipping efficiency or potential delays.
# Convert shipment.date and shipment.arrive.date to Date type
shipping_data$shipment.date <- as.Date(shipping_data$shipment.date)
shipping_data$shipment.arrive.date <- as.Date(shipping_data$shipment.arrive.date)
# Calculate shipment duration
shipping_data$shipment.duration <- shipping_data$shipment.arrive.date - shipping_data$shipment.date
# Print the first rows of the dataset with shipment duration
head(shipping_data)
#> name price.... weight..kg. length..m. width..m.
#> 1 Camera Bag 37.66 1.10 0.40 0.39
#> 2 Portable Bluetooth Keyboard 144.65 0.39 0.11 0.06
#> 3 Large Flat Rate Box 38.57 0.97 0.79 0.55
#> 4 Ceramic Tiles 10.34 6.22 0.36 0.37
#> 5 Garden Hose 21.63 1.18 17.77 0.27
#> 6 Cookware Set 401.64 7.60 0.49 0.35
#> height..m. shipment.date destination.port shipment.arrive.date
#> 1 0.26 2023-03-19 Port of Singapore (Singapore) 2023-04-02
#> 2 0.03 2023-03-21 Port of Busan (South Korea) 2023-04-04
#> 3 0.35 2023-03-25 Port of Tianjin (China) 2023-04-04
#> 4 0.02 2023-03-15 Port of Shanghai (China) 2023-03-28
#> 5 0.13 2023-03-25 Port of Tianjin (China) 2023-04-04
#> 6 0.15 2023-03-20 Port of Shanghai (China) 2023-03-29
#> port.city country shipment.duration
#> 1 Singapore Singapore 14 days
#> 2 Busan South Korea 14 days
#> 3 Tianjin China 10 days
#> 4 Shanghai China 13 days
#> 5 Tianjin China 10 days
#> 6 Shanghai China 9 days
Create a new column indicating the shipment volume:
I’m calculating the volume of each shipment by multiplying the length, width, and height. This new column can provide insights into the size or capacity of the shipments.
# Create a new column for shipment volume in cubic meters, using the full column names
shipping_data$shipment.volume <- shipping_data$length..m. * shipping_data$width..m. * shipping_data$height..m.
# Print the first rows of the updated dataset
head(shipping_data)
#> name price.... weight..kg. length..m. width..m.
#> 1 Camera Bag 37.66 1.10 0.40 0.39
#> 2 Portable Bluetooth Keyboard 144.65 0.39 0.11 0.06
#> 3 Large Flat Rate Box 38.57 0.97 0.79 0.55
#> 4 Ceramic Tiles 10.34 6.22 0.36 0.37
#> 5 Garden Hose 21.63 1.18 17.77 0.27
#> 6 Cookware Set 401.64 7.60 0.49 0.35
#> height..m. shipment.date destination.port shipment.arrive.date
#> 1 0.26 2023-03-19 Port of Singapore (Singapore) 2023-04-02
#> 2 0.03 2023-03-21 Port of Busan (South Korea) 2023-04-04
#> 3 0.35 2023-03-25 Port of Tianjin (China) 2023-04-04
#> 4 0.02 2023-03-15 Port of Shanghai (China) 2023-03-28
#> 5 0.13 2023-03-25 Port of Tianjin (China) 2023-04-04
#> 6 0.15 2023-03-20 Port of Shanghai (China) 2023-03-29
#> port.city country shipment.duration shipment.volume
#> 1 Singapore Singapore 14 days 0.040560
#> 2 Busan South Korea 14 days 0.000198
#> 3 Tianjin China 10 days 0.152075
#> 4 Shanghai China 13 days 0.002664
#> 5 Tianjin China 10 days 0.623727
#> 6 Shanghai China 9 days 0.025725
- Second Data Coercion
shipping_data$shipment.arrive.date <- as.Date(shipping_data$shipment.arrive.date)
shipping_data$country <- as.factor(shipping_data$country)
shipping_data$port.city <- as.factor(shipping_data$port.city)
Summary of the new data frame
summary(shipping_data)
#> name price.... weight..kg. length..m.
#> Length:260987 Min. : 1.0 Min. : 0.05 Min. : 0.0500
#> Class :character 1st Qu.: 27.9 1st Qu.: 0.71 1st Qu.: 0.2900
#> Mode :character Median : 87.6 Median : 2.96 Median : 0.6000
#> Mean : 4182.3 Mean : 323.90 Mean : 0.9535
#> 3rd Qu.: 257.7 3rd Qu.: 67.34 3rd Qu.: 1.1400
#> Max. :1998160.0 Max. :24982.35 Max. :30.0000
#> width..m. height..m. shipment.date destination.port
#> Min. :0.0100 Min. :0.0100 Min. :2023-03-12 Length:260987
#> 1st Qu.:0.2100 1st Qu.:0.1200 1st Qu.:2023-03-16 Class :character
#> Median :0.3900 Median :0.2500 Median :2023-03-20 Mode :character
#> Mean :0.5301 Mean :0.4626 Mean :2023-03-19
#> 3rd Qu.:0.8000 3rd Qu.:0.6600 3rd Qu.:2023-03-24
#> Max. :6.0000 Max. :5.9900 Max. :2023-03-28
#> shipment.arrive.date port.city country shipment.duration
#> Min. :2023-03-20 Busan :52634 China :104252 Length:260987
#> 1st Qu.:2023-03-27 Shanghai :51730 Japan : 52263 Class :difftime
#> Median :2023-03-31 Singapore:51812 Singapore : 51812 Mode :numeric
#> Mean :2023-03-30 Tianjin :52522 South Korea: 52634
#> 3rd Qu.:2023-04-04 Tokyo :52263 NA's : 26
#> Max. :2023-04-11 NA's : 26
#> shipment.volume
#> Min. : 0.0000
#> 1st Qu.: 0.0099
#> Median : 0.0564
#> Mean : 1.5715
#> 3rd Qu.: 0.4532
#> Max. :937.8705
Insight:
- Price:
Prices vary widely, from $1.0 to $1,998,160.0, with a mean of $4,182.3. The mean is far above the median ($87.6), which indicates outliers or extreme values at the high end of the price range.
- Weight:
Shipment weights range from 0.05 kg to 24,982.35 kg, with a mean of 323.90 kg. The mean is far above the median (2.96 kg), which suggests a small number of very heavy shipments.
- Length, Width, and Height:
The dimensions of the shipments cover a wide range of values. The maximum length (30.0 m), width (6.0 m), and height (5.99 m) indicate that some items are very large.
- Shipment Date and Duration:
The shipment dates range from March 12, 2023, to March 28, 2023. The mean (March 19) and the median (March 20) both sit in the middle of this range, which is consistent with shipments being spread evenly across the period.
- Shipment Volume:
Shipment volumes range from nearly 0 to 937.8705 cubic meters, with a mean of 1.5715. The median (0.0564) is considerably lower than the mean, indicating the presence of shipments with very high volumes.
- Destination:
There are only four destination countries, and China leads with a total of 104,252 items. The port.city and country columns each contain 26 NA values, which are rows where the extraction pattern did not match the destination.
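The gap between mean and median is what points to outliers here. As a minimal sketch with made-up prices, the example below shows how a single extreme value pulls the mean away from the median, and flags it with the common 1.5 × IQR fence. That rule is my choice for illustration, not something the report applies.

```r
# Toy prices: five ordinary items and one very expensive one (made-up values)
prices <- c(10, 20, 30, 40, 50, 100000)

mean_price   <- mean(prices)    # pulled up by the single extreme value
median_price <- median(prices)  # barely affected by it

# Flag values above the upper fence Q3 + 1.5 * IQR
q <- quantile(prices, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * (q[[2]] - q[[1]])
outliers <- prices[prices > upper_fence]

print(c(mean = mean_price, median = median_price))
print(outliers)
```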
Explanatory or Business Question
head(shipping_data)
#> name price.... weight..kg. length..m. width..m.
#> 1 Camera Bag 37.66 1.10 0.40 0.39
#> 2 Portable Bluetooth Keyboard 144.65 0.39 0.11 0.06
#> 3 Large Flat Rate Box 38.57 0.97 0.79 0.55
#> 4 Ceramic Tiles 10.34 6.22 0.36 0.37
#> 5 Garden Hose 21.63 1.18 17.77 0.27
#> 6 Cookware Set 401.64 7.60 0.49 0.35
#> height..m. shipment.date destination.port shipment.arrive.date
#> 1 0.26 2023-03-19 Port of Singapore (Singapore) 2023-04-02
#> 2 0.03 2023-03-21 Port of Busan (South Korea) 2023-04-04
#> 3 0.35 2023-03-25 Port of Tianjin (China) 2023-04-04
#> 4 0.02 2023-03-15 Port of Shanghai (China) 2023-03-28
#> 5 0.13 2023-03-25 Port of Tianjin (China) 2023-04-04
#> 6 0.15 2023-03-20 Port of Shanghai (China) 2023-03-29
#> port.city country shipment.duration shipment.volume
#> 1 Singapore Singapore 14 days 0.040560
#> 2 Busan South Korea 14 days 0.000198
#> 3 Tianjin China 10 days 0.152075
#> 4 Shanghai China 13 days 0.002664
#> 5 Tianjin China 10 days 0.623727
#> 6 Shanghai China 9 days 0.025725
Top 10 items with the highest total volume
# Calculate the total volume for each item name
shipment_total_volume <- aggregate(shipment.volume ~ name, data = shipping_data, FUN = sum)
# Sort the results by total volume in descending order
sorted_shipments <- shipment_total_volume[order(shipment_total_volume$shipment.volume , decreasing = TRUE), ]
# Select the top 10 items with the highest total volume
top_10_volume <- head(sorted_shipments, 10)
# Print the top 10
print(top_10_volume)
#> name shipment.volume
#> 226 Sailboat 266298.173
#> 233 Small Boat 15941.631
#> 54 Car 9775.406
#> 251 Sports Car 9221.106
#> 17 Bed 7626.883
#> 10 Backpacking Tent 7257.995
#> 95 Forklift 4804.001
#> 177 Pallete of Coffee 2969.395
#> 204 Piano 2861.961
#> 8 ATV 2633.927
# Create a bar plot of the top 10 items with the highest total volume
barplot(top_10_volume$shipment.volume, names.arg = top_10_volume$name,
        main = "Top 10 Shipments with Highest Volume",
        las = 2)
Highest Price
# Calculate the mean price for each item
item_mean_price <- aggregate(price.... ~ name, data = shipping_data, FUN = mean)
sorted_items <- item_mean_price[order(item_mean_price$price...., decreasing = TRUE), ]
# Select the top 10 items with the highest mean prices
top_10_price <- head(sorted_items, 10)
print(top_10_price)
#> name price....
#> 251 Sports Car 1016532.127
#> 226 Sailboat 277653.895
#> 54 Car 102057.856
#> 233 Small Boat 12835.410
#> 204 Piano 11233.303
#> 185 Pallete of Laptops 11022.831
#> 95 Forklift 7585.236
#> 193 Pallete of Smartphones 6499.505
#> 125 Jet Ski 6118.588
#> 109 Golf Cart 5986.065
# Create a bar plot of the top 10 items with the highest mean price
barplot(top_10_price$price...., names.arg = top_10_price$name, las = 2)
title(main = "Top 10 Items with Highest Mean Price")
Average price, weight, and dimensions of the shipments by country
# Calculate average price, weight, and dimensions by country
averages <- aggregate(shipping_data[, c("price....", "weight..kg.", "length..m.", "width..m.", "height..m.")],
by = list(country = shipping_data$country),
FUN = mean)
averages <- averages[order(averages$country), ]
colnames(averages)[2:6] <- c("avg_price", "avg_weight", "avg_length", "avg_width", "avg_height")
print(averages)
#> country avg_price avg_weight avg_length avg_width avg_height
#> 1 China 4269.510 332.8498 0.9589953 0.5319058 0.4621350
#> 2 Japan 4110.728 318.8995 0.9580426 0.5290167 0.4642499
#> 3 Singapore 3977.996 322.2713 0.9540344 0.5271979 0.4625537
#> 4 South Korea 4282.339 312.7865 0.9372075 0.5305135 0.4617606
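Insight: South Korea has the highest average price (4,282.34), narrowly ahead of China (4,269.51), while China has the highest average weight (332.85 kg). The four countries differ by less than 10% on every average, and the average dimensions are almost identical. Because the means are strongly affected by a few very expensive and very heavy items, these small differences should be read with caution.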
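Since a handful of extreme items can dominate a per-country mean, a median by country is a useful cross-check. The sketch below uses the same aggregate() approach on a toy data frame. The values are made up and the column name price stands in for price.....

```r
toy <- data.frame(
  country = c("China", "China", "China", "Japan", "Japan", "Japan"),
  price   = c(20, 30, 1000000, 25, 35, 45)  # made-up values, one extreme
)

mean_by_country   <- aggregate(price ~ country, data = toy, FUN = mean)
median_by_country <- aggregate(price ~ country, data = toy, FUN = median)

# Side-by-side comparison of the two summaries
comparison <- merge(mean_by_country, median_by_country,
                    by = "country", suffixes = c("_mean", "_median"))
print(comparison)
```

In this toy case the means suggest China is far more expensive, while the medians show the typical item costs about the same in both countries.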
Distribution of shipment dates:
shipping_data$shipment.date <- as.Date(shipping_data$shipment.date, format = "%Y-%m-%d")
ggplot(shipping_data, aes(x = shipment.date)) +
  geom_histogram(binwidth = 1, fill = "lightblue", color = "black") +
  labs(x = "Shipment Date", y = "Frequency") +
  ggtitle("Distribution of Shipment Dates")
Insight: The histogram shows that shipments from March 12 to March 28 keep a busy and stable daily schedule, with no day showing a visible drop.
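A histogram shows stability only visually. To put a number on it, one option is to count shipments per day and compute the percentage change from the previous day. This is a sketch on a toy vector of made-up dates, not the report's data.

```r
dates <- as.Date(c("2023-03-12", "2023-03-12", "2023-03-12", "2023-03-12",
                   "2023-03-13", "2023-03-13", "2023-03-13",
                   "2023-03-14", "2023-03-14", "2023-03-14", "2023-03-14",
                   "2023-03-14"))

# Number of shipments per day
daily_counts <- table(dates)
counts <- as.vector(daily_counts)

# Percentage change relative to the previous day
pct_change <- diff(counts) / head(counts, -1) * 100
names(pct_change) <- names(daily_counts)[-1]
print(pct_change)

# Largest day-over-day drop (the most negative change)
largest_drop <- min(pct_change)
print(largest_drop)
```

Running the same steps on shipping_data$shipment.date would give the figure needed to back up any statement about how much the daily volume drops.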
Heaviest shipment per day and its port city:
shipping_data$shipment.date <- as.Date(shipping_data$shipment.date)
# Calculate the maximum weight of shipments for each port city for every day
max_weight <- aggregate(weight..kg. ~ shipment.date + port.city,
data = shipping_data,
FUN = max)
# Filter to keep only the top weight for each day
top_weight <- aggregate(weight..kg. ~ shipment.date,
data = max_weight,
FUN = max)
# Identify the port city with the highest weight for each day
top_port_city <- aggregate(port.city ~ shipment.date + weight..kg.,
data = max_weight,
FUN = function(x) x[which.max(x)])
top_weight <- merge(top_weight, top_port_city)
# Print the result
print(top_weight)
#> shipment.date weight..kg. port.city
#> 1 2023-03-12 24868.32 Tokyo
#> 2 2023-03-13 24968.59 Shanghai
#> 3 2023-03-14 24944.99 Shanghai
#> 4 2023-03-15 24956.64 Shanghai
#> 5 2023-03-16 24977.65 Tokyo
#> 6 2023-03-17 24975.39 Singapore
#> 7 2023-03-18 24736.21 Busan
#> 8 2023-03-19 24642.78 Tokyo
#> 9 2023-03-20 24430.57 Busan
#> 10 2023-03-21 24980.15 Singapore
#> 11 2023-03-22 24855.97 Shanghai
#> 12 2023-03-23 24878.67 Tianjin
#> 13 2023-03-24 24934.49 Singapore
#> 14 2023-03-25 24860.83 Singapore
#> 15 2023-03-26 24982.35 Tianjin
#> 16 2023-03-27 24961.47 Tianjin
#> 17 2023-03-28 24898.33 Shanghai
Insight: Counting which city receives the heaviest single shipment each day, Shanghai appears 5 times, Singapore 4 times, Tokyo 3 times, Tianjin 3 times, and Busan 2 times in this timeframe.
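The daily-maximum lookup can also be written as a sort followed by keeping the first row per date, which avoids the two aggregate() calls and the merge. The sketch below runs on a toy data frame with made-up weights. It is an alternative formulation, not the code that produced the table above.

```r
toy <- data.frame(
  shipment.date = as.Date(c("2023-03-12", "2023-03-12", "2023-03-13", "2023-03-13")),
  port.city     = c("Tokyo", "Busan", "Shanghai", "Tianjin"),
  weight        = c(500, 120, 80, 950)  # made-up values
)

# Sort by date, then by weight from heaviest to lightest
ordered <- toy[order(toy$shipment.date, -toy$weight), ]

# Keep the first (heaviest) row of each date
heaviest <- ordered[!duplicated(ordered$shipment.date), ]
print(heaviest)

# How often each city holds the daily maximum
city_counts <- table(heaviest$port.city)
print(city_counts)
```

Tabulating the result, as in the last step, gives the per-city counts directly instead of counting them by eye.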
Most frequent destination ports for shipments
Most frequent port cities
# Calculate the frequency of each port.city
port_city_freq <- table(shipping_data$port.city)
port_city_df <- as.data.frame(port_city_freq)
colnames(port_city_df) <- c("portCity", "frequency")
# Sort the dataframe by frequency in descending order
port_city_df <- port_city_df[order(port_city_df$frequency, decreasing = TRUE), ]
print(port_city_df)
#> portCity frequency
#> 1 Busan 52634
#> 4 Tianjin 52522
#> 5 Tokyo 52263
#> 3 Singapore 51812
#> 2 Shanghai 51730
# Create a bar plot of port city frequencies with logarithmic y-axis scale
barplot(port_city_df$frequency, names.arg = port_city_df$portCity,
main = "Frequency of Shipments by Port City", las = 2, log = "y")
Insight: Busan receives the most shipments, 52,634 in total, followed by Tianjin, Tokyo, Singapore, and Shanghai. The five port cities are close, though: the gap between the first and the last is under 2%.
Most frequent destination country
# Calculate the frequency of each country
port_country_freq <- sort(table(shipping_data$country), decreasing = TRUE)
port_country_df <- as.data.frame(port_country_freq)
colnames(port_country_df) <- c("Country", "frequency")
print(port_country_df)
#> Country frequency
#> 1 China 104252
#> 2 South Korea 52634
#> 3 Japan 52263
#> 4 Singapore 51812
# Calculate the frequency of each country
port_country_freq <- sort(table(shipping_data$country), decreasing = TRUE)
port_country_df <- as.data.frame(port_country_freq)
colnames(port_country_df) <- c("Country", "Frequency")
y_max <- max(port_country_df$Frequency)
options(scipen = 10)
# Create a bar plot of country frequencies. barplot() returns the bar midpoints,
# and the y-axis gets 10% headroom so the labels above the bars are not clipped
bar_x <- barplot(port_country_df$Frequency, names.arg = port_country_df$Country,
                 main = "Frequency of Shipments by Country", las = 2, ylim = c(0, y_max * 1.1))
text(x = bar_x, y = port_country_df$Frequency, labels = port_country_df$Frequency, pos = 3)
Insight:
- China receives more shipments than any other country (104,252), because it is the only country with two destination ports, Shanghai and Tianjin.
- South Korea, Japan, and Singapore receive almost the same number of shipments.
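To make the comparison between countries concrete, the counts printed above can be turned into shares. This small sketch types the four counts in by hand from the table rather than recomputing them from shipping_data.

```r
# Shipment counts per country, copied from the frequency table above
country_counts <- c(China = 104252, `South Korea` = 52634,
                    Japan = 52263, Singapore = 51812)

# Share of all shipments going to each country, in percent
shares <- country_counts / sum(country_counts)
print(round(100 * shares, 1))

# How many times more shipments China receives than the runner-up
ratio <- country_counts[["China"]] / country_counts[["South Korea"]]
print(ratio)
```

The ratio comes out just under 2, which matches the two Chinese ports each handling about as many shipments as every other port.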
Analysis of shipment durations
Based on different city
# Calculate the mean shipment duration for each city
duration_city <- aggregate(shipment.duration ~ port.city,
data = shipping_data,
FUN = mean)
# Sort the results in ascending order of mean shipment duration
duration_city <- duration_city[order(duration_city$shipment.duration, decreasing = FALSE), ]
# Print the sorted duration statistics
duration_city
#> port.city shipment.duration
#> 1 Busan 10.99122 days
#> 2 Shanghai 10.99233 days
#> 4 Tianjin 10.99989 days
#> 3 Singapore 11.01035 days
#> 5 Tokyo 11.01867 days
Based on different country
# Calculate the mean shipment duration for each country
duration_country <- aggregate(shipment.duration ~ country,
data = shipping_data,
FUN = mean)
duration_country <- duration_country[order(duration_country$shipment.duration, decreasing = FALSE), ]
print(duration_country)
#> country shipment.duration
#> 4 South Korea 10.99122 days
#> 1 China 10.99613 days
#> 3 Singapore 11.01035 days
#> 2 Japan 11.01867 days
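All of these means sit very close to 11 days, and that follows from how the arrival dates were generated: shipment.date + 7 + random_days, with random_days drawn uniformly from 1 to 7. The short calculation below works out the expected duration under that construction, and a simulation with an arbitrary sample size of 100,000 (my choice, not the dataset's row count) confirms it.

```r
# Every value the duration can take under the construction above
possible_durations <- 7 + 1:7  # 8, 9, ..., 14 days

# sample() makes each value equally likely, so the expected duration is their plain average
expected_duration <- mean(possible_durations)
print(expected_duration)

# Quick simulation with a fixed seed
set.seed(123)
simulated <- 7 + sample(1:7, 100000, replace = TRUE)
print(mean(simulated))
```

Because the durations are random dummy data, the small differences between cities and countries reflect sampling noise rather than real shipping efficiency.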
total revenue generated from shipments to each country and port city combination.
# Calculate the total revenue for each country and port city combination
revenue_country_city <- aggregate(price.... ~ country + port.city,
data = shipping_data,
FUN = sum)
revenue_country_city <- revenue_country_city[order(revenue_country_city$price...., decreasing = TRUE), ]
print(revenue_country_city)
#> country port.city price....
#> 4 China Tianjin 241688069
#> 1 South Korea Busan 225396653
#> 5 Japan Tokyo 214838955
#> 3 Singapore Singapore 206107905
#> 2 China Shanghai 203416892
# Sort the results in descending order of total revenue
revenue_country_city <- revenue_country_city[order(revenue_country_city$price...., decreasing = TRUE), ]
# Create a bar plot of total revenue
barplot(revenue_country_city$price....,
names.arg = paste(revenue_country_city$country, revenue_country_city$port.city, sep = " - "),
main = "Total Revenue by Country and Port City",
las = 1)
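Insight: Tianjin (China) generates the highest total revenue of any country and port city combination, while Shanghai (China) generates the lowest. Taken together, the two Chinese ports bring in more revenue than any other country.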
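To compare countries rather than individual ports, the port-level totals can be summed per country. The sketch below re-enters the five totals printed above by hand, using price as a stand-in for the price.... column, and aggregates them.

```r
# Port-level revenue totals, copied from the table above
revenue <- data.frame(
  country   = c("China", "South Korea", "Japan", "Singapore", "China"),
  port.city = c("Tianjin", "Busan", "Tokyo", "Singapore", "Shanghai"),
  price     = c(241688069, 225396653, 214838955, 206107905, 203416892)
)

# Sum the ports belonging to the same country
revenue_by_country <- aggregate(price ~ country, data = revenue, FUN = sum)
revenue_by_country <- revenue_by_country[order(revenue_by_country$price,
                                               decreasing = TRUE), ]
print(revenue_by_country)
```

China's two ports add up to 445,104,961, roughly twice the total of any single other country.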
Conclusion
Over 17 days, from 12 March 2023 to 28 March 2023, PT X distributed a total of 263,821 items, of which 260,987 remained after cleaning. The port was consistently busy: the daily distribution never dropped by more than 1% relative to the previous shipment total. The company's shipments focus on the Asian continent, with destinations in China, Singapore, South Korea, and Japan, spread across five port cities; China is the only country with two (Shanghai and Tianjin). The item with the highest total volume in this period is the sailboat, at 266,298.17 cubic meters. The sports car has the highest mean price ($1,016,532.13), followed by the sailboat ($277,653.90).
Recommendations:
Because China is both the most frequent destination country and the largest source of shipment revenue, the company needs to treat this route seriously: a small mistake there could create a domino effect that disrupts shipment distribution.
The company should also grow its target market in countries other than China. Based on our calculation, users mostly send goods to China, with a total of 104,252 shipments over the period, almost twice as many as to any other country.