Simple Exploratory Data Analysis: Shipping_data

Introduction Shipment Port

Business Question

You can find the dataset here.

In the bustling world of international trade, ports serve as gateways for exchanging goods across borders. To shed light on the dynamics of port shipments, we embark on a data-driven journey through the Port Shipment Dataset. This collection comprises more than 260,000 items packed for shipment and describes several aspects of each one. Our objective is to draw useful insights from this dataset in two steps: data wrangling and exploratory data analysis (EDA).

Data Wrangling: Data wrangling serves as the foundational step in any data analysis project, ensuring data quality and preparing it for further exploration. In the case of the Port Shipment Dataset, we are presented with a vast array of information pertaining to each shipment. The first challenge lies in assessing the dataset’s integrity, detecting missing values, inconsistencies, or outliers. Additionally, we will need to standardize the data format, ensuring consistency across the various attributes such as product name, price, weight, and dimensions. By cleaning and restructuring the data, we aim to create a reliable and harmonized foundation for our subsequent analysis.
Exploratory Data Analysis: With the data in a refined state, we embark on the exciting phase of exploratory data analysis (EDA). EDA allows us to gain a deeper understanding of the dataset’s characteristics, relationships, and patterns.

Overview Dataset

This dataset contains information on international shipments of a port. The dataset includes the name of the product, its price, weight, length, width, and height, as well as the date it was shipped and the destination port. The data covers March 2023.

This dataset can be used for educational and research purposes, such as data analysis, machine learning, container problem, and data visualization. The data was collected with the intention of representing typical shipments that could be expected to be sent from one port to the other.

It is important to note that this dataset is not based on any actual shipments or real-world data. Therefore, the values and characteristics of the products, as well as the shipment dates and destinations, are purely hypothetical and should not be used for any practical or commercial purposes. Any conclusions drawn from the analysis of this dataset should be viewed as hypothetical and not representative of actual international shipments.

dataset :

name : Name of item
price($) : Price of item in dollars
weight(kg) : Weight of item in kilograms
length(m) : Length of item in meters
width(m) : Width of item in meters
height(m) : Height of item in meters
shipment.date : Date when item arrived at the port
destination.port : Port where item is destined to be sent

Data Preparation

Import Package & Data set

Import Package :

library(stringr)
library(ggplot2)

Import DataSet

shipping_data <- read.csv(file = "data_input/shipping_data.csv")
rmarkdown::paged_table(shipping_data)

Checking Structure using `str()`

The str() function is used to display the structure of the object. The object can be a vector, data frame, list, or other R object.

str(shipping_data)

#> 'data.frame':    263821 obs. of  8 variables:
#>  $ name            : chr  "Camera Bag" "Portable Bluetooth Keyboard" "Large Flat Rate Box" "Ceramic Tiles" ...
#>  $ price....       : num  37.7 144.7 38.6 10.3 21.6 ...
#>  $ weight..kg.     : num  1.1 0.39 0.97 6.22 1.18 ...
#>  $ length..m.      : num  0.4 0.11 0.79 0.36 17.77 ...
#>  $ width..m.       : num  0.39 0.06 0.55 0.37 0.27 0.35 1.3 0.78 0.94 0.14 ...
#>  $ height..m.      : num  0.26 0.03 0.35 0.02 0.13 0.15 0.26 0.68 1.47 0.03 ...
#>  $ shipment.date   : chr  "2023-03-19" "2023-03-21" "2023-03-25" "2023-03-15" ...
#>  $ destination.port: chr  "Port of Singapore (Singapore)" "Port of Busan (South Korea)" "Port of Tianjin (China)" "Port of Shanghai (China)" ...

The dataset provided has the following characteristics according to the str() function:

It is a data.frame with 263,821 observations (rows) and 8 variables (columns).

Checking Total Row and Column using `dim()` or `nrow(), ncol()`

The dim() function is used to find out the dimensions of the dataframe.

dim(shipping_data)

#> [1] 263821      8

The nrow() function is used to determine the number of rows, while ncol() is used to determine the number of columns.

nrow_shipping_data <- nrow(shipping_data)
ncol_shipping_data <- ncol(shipping_data)

print(paste("Shipping Data total Row is: ", nrow_shipping_data))

#> [1] "Shipping Data total Row is:  263821"

print(paste("Shipping Data total Column is: ", ncol_shipping_data))

#> [1] "Shipping Data total Column is:  8"

Data Inspection

Explicit Coercion

The initial stage before conducting data analysis is to ensure that the data is clean. One data cleansing technique is converting each column to the correct data type, known as explicit coercion.

# check the shipping data structure again
str(shipping_data)

#> 'data.frame':    263821 obs. of  8 variables:
#>  $ name            : chr  "Camera Bag" "Portable Bluetooth Keyboard" "Large Flat Rate Box" "Ceramic Tiles" ...
#>  $ price....       : num  37.7 144.7 38.6 10.3 21.6 ...
#>  $ weight..kg.     : num  1.1 0.39 0.97 6.22 1.18 ...
#>  $ length..m.      : num  0.4 0.11 0.79 0.36 17.77 ...
#>  $ width..m.       : num  0.39 0.06 0.55 0.37 0.27 0.35 1.3 0.78 0.94 0.14 ...
#>  $ height..m.      : num  0.26 0.03 0.35 0.02 0.13 0.15 0.26 0.68 1.47 0.03 ...
#>  $ shipment.date   : chr  "2023-03-19" "2023-03-21" "2023-03-25" "2023-03-15" ...
#>  $ destination.port: chr  "Port of Singapore (Singapore)" "Port of Busan (South Korea)" "Port of Tianjin (China)" "Port of Shanghai (China)" ...

To change the data type, we can use the as.___() function where ___ is filled with the destination data type. Example:

as.character()
as.Date()
as.integer()
as.numeric()
as.factor()

From the data, the column whose data type I changed is:
- shipment.date -> Date

#explicit coercion
shipping_data$shipment.date <- as.Date(shipping_data$shipment.date)
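
Coercion can quietly turn unparseable text into missing values, so it is worth counting how many values failed to convert. The following is a minimal sketch under stated assumptions: the vector `raw_dates` and its blank and malformed entries are invented for illustration and are not taken from the dataset, and the dates are assumed to use the `%Y-%m-%d` layout shown in the `str()` output.

```r
# Hypothetical sample values; the blank and the malformed entry are invented
raw_dates <- c("2023-03-19", "", "2023-03-21", "21/03/2023")

# With an explicit format, strings that do not fit come back as NA
parsed <- as.Date(raw_dates, format = "%Y-%m-%d")

# Count values that were present as text but failed to parse
n_failed <- sum(is.na(parsed) & !is.na(raw_dates))

print(parsed)
print(n_failed)
```

Running the same count on the real column before and after coercion would show whether the missing dates were already missing or were created by the conversion.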

Check Missing Value

The is.na() function is used to check for missing values for each value. The colSums() function is used to sum up the values in each column.

colSums(is.na(shipping_data))

#>             name        price....      weight..kg.       length..m. 
#>                0                0               13              184 
#>        width..m.       height..m.    shipment.date destination.port 
#>                0                0             2638                0

Because there are 2,638 missing shipment.date values (plus 13 missing weights and 184 missing lengths), I will delete the affected rows. They make up about 1% of the data, well under 1/10, and removing them makes the exploratory process easier.

## Missing value data Treatment

# Remove rows with missing values
shipping_data <- na.omit(shipping_data)

Conclusion: the shipment data has missing values. I removed every row with a missing value, leaving 260,987 of the 263,821 rows, because those rows are only a small portion of the data and the goal here is exploration.
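
Per-column NA counts can overlap within the same row, so their sum is not necessarily the number of rows that `na.omit()` removes. A small sketch of how the dropped rows could be counted directly; the data frame `toy` and its values are made up and only stand in for `shipping_data`.

```r
# Toy data frame standing in for shipping_data (values are made up)
toy <- data.frame(
  weight = c(1.1, NA, 0.97, 6.22),
  length = c(0.40, 0.11, NA, 0.36),
  date   = as.Date(c("2023-03-19", NA, "2023-03-25", "2023-03-15"))
)

# Rows with at least one missing value, and their share of all rows
incomplete <- !complete.cases(toy)
n_dropped  <- sum(incomplete)
share      <- n_dropped / nrow(toy)

# na.omit() removes exactly those rows
cleaned <- na.omit(toy)

print(n_dropped)
print(share)
print(nrow(cleaned))
```

Here row 2 is missing two values but is still counted once, which is the effect to watch for when reading `colSums(is.na(...))`.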

Data Wrangling

Adding a new Variable

Add a randomly generated arrival date, “shipment.arrive.date”

Because this dataset only has shipment.date and does not have an arrival date, I took the initiative to add random dummy data so that more information can be extracted from the dataset.

# Set the seed for reproducibility
set.seed(123)

# Generate random additional time in days
random_days <- sample(1:7, nrow(shipping_data), replace = TRUE)

# Convert shipment.date to Date type
shipping_data$shipment.date <- as.Date(shipping_data$shipment.date)

# Calculate shipment.arrive.date
shipping_data$shipment.arrive.date <- shipping_data$shipment.date + 7 + random_days
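
Because the arrival date is simulated as a fixed 7 days plus a random 1 to 7 days, every duration derived from it falls between 8 and 14 days with an expected mean of 11. The sketch below checks this on invented departure dates; `departure`, `offset`, and `arrival` are illustrative names, not columns of the dataset.

```r
set.seed(123)

# Hypothetical departure dates spread over 17 days, used only for this check
departure <- as.Date("2023-03-12") + ((0:9999) %% 17)

# Same construction as in the document: 7 days plus a random 1 to 7 days
offset  <- 7 + sample(1:7, length(departure), replace = TRUE)
arrival <- departure + offset

# Duration in days as a plain number
duration <- as.numeric(arrival - departure, units = "days")

print(range(duration))
print(mean(duration))

# Theoretical mean of 7 + a uniform draw from 1..7
expected_mean <- mean(7 + 1:7)
print(expected_mean)
```

Any pattern found later in the shipment durations therefore reflects this random construction rather than real transit times.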

Adding port.city and country column based on the destination.port

To get more insight, I extracted the port city and the country from the destination.port column so that I can dig deeper. For example:

destination.port	port.city	country
Port of Singapore (Singapore)	Singapore	Singapore
Port of Busan (South Korea)	Busan	South Korea
Port of Tianjin (China)	Tianjin	China
Port of Shanghai (China)	Shanghai	China

# Extract port city and country
result <- str_match(shipping_data$destination.port, ".*\\b(\\w+)\\s+\\(([^)]+)\\)$")

# Create new columns for port city and country
shipping_data$port.city <- result[, 2]
shipping_data$country <- result[, 3]

# Remove parentheses from port city
shipping_data$port.city <- gsub("\\s*\\([^)]+\\)", "", shipping_data$port.city)

# Remove parentheses from country
shipping_data$country <- gsub("[()]", "", shipping_data$country)

head(result)

#>      [,1]                            [,2]        [,3]         
#> [1,] "Port of Singapore (Singapore)" "Singapore" "Singapore"  
#> [2,] "Port of Busan (South Korea)"   "Busan"     "South Korea"
#> [3,] "Port of Tianjin (China)"       "Tianjin"   "China"      
#> [4,] "Port of Shanghai (China)"      "Shanghai"  "China"      
#> [5,] "Port of Tianjin (China)"       "Tianjin"   "China"      
#> [6,] "Port of Shanghai (China)"      "Shanghai"  "China"
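
The pattern captures only the last word before the parentheses and requires the string to end with the closing parenthesis, so some inputs would yield a partial city name or no match at all. The following sketch applies the same pattern with base R to a few strings; the "Hong Kong" entry and the entry with a trailing space are hypothetical and are not known to occur in the dataset.

```r
ports <- c("Port of Singapore (Singapore)",
           "Port of Busan (South Korea)",
           "Port of Hong Kong (China)",   # hypothetical multi-word city
           "Port of Tokyo (Japan) ")      # hypothetical trailing space

# Same pattern as in the document
pattern <- ".*\\b(\\w+)\\s+\\(([^)]+)\\)$"
m <- regmatches(ports, regexec(pattern, ports, perl = TRUE))

# A failed match yields an empty vector; turn it into NA
pick <- function(x, i) if (length(x) == 3) x[i] else NA_character_
city    <- vapply(m, pick, character(1), i = 2)
country <- vapply(m, pick, character(1), i = 3)

print(data.frame(ports, city, country))

# Rows where the pattern did not match
print(sum(is.na(city)))
```

Counting the NA results right after extraction, for example with `sum(is.na(shipping_data$port.city))`, is a quick way to see how many destination strings the pattern missed.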

Shipment Duration & Shipment Volume

Calculate the shipment duration:

This can provide insights into shipping efficiency or potential delays.

# Convert shipment.date and shipment.arrive.date to Date type
shipping_data$shipment.date <- as.Date(shipping_data$shipment.date)
shipping_data$shipment.arrive.date <- as.Date(shipping_data$shipment.arrive.date)

# Calculate shipment duration
shipping_data$shipment.duration <- shipping_data$shipment.arrive.date - shipping_data$shipment.date

# Print the dataset with shipment duration
head(shipping_data)

#>                          name price.... weight..kg. length..m. width..m.
#> 1                  Camera Bag     37.66        1.10       0.40      0.39
#> 2 Portable Bluetooth Keyboard    144.65        0.39       0.11      0.06
#> 3         Large Flat Rate Box     38.57        0.97       0.79      0.55
#> 4               Ceramic Tiles     10.34        6.22       0.36      0.37
#> 5                 Garden Hose     21.63        1.18      17.77      0.27
#> 6                Cookware Set    401.64        7.60       0.49      0.35
#>   height..m. shipment.date              destination.port shipment.arrive.date
#> 1       0.26    2023-03-19 Port of Singapore (Singapore)           2023-04-02
#> 2       0.03    2023-03-21   Port of Busan (South Korea)           2023-04-04
#> 3       0.35    2023-03-25       Port of Tianjin (China)           2023-04-04
#> 4       0.02    2023-03-15      Port of Shanghai (China)           2023-03-28
#> 5       0.13    2023-03-25       Port of Tianjin (China)           2023-04-04
#> 6       0.15    2023-03-20      Port of Shanghai (China)           2023-03-29
#>   port.city     country shipment.duration
#> 1 Singapore   Singapore           14 days
#> 2     Busan South Korea           14 days
#> 3   Tianjin       China           10 days
#> 4  Shanghai       China           13 days
#> 5   Tianjin       China           10 days
#> 6  Shanghai       China            9 days

Create a new column indicating the shipment volume:

I’m calculating the volume of each shipment by multiplying the length, width, and height. This new column can provide insights into the size or capacity of the shipments.

# Create a new column for shipment volume
shipping_data$shipment.volume <- shipping_data$length..m. * shipping_data$width..m. * shipping_data$height..m.

# Print the updated dataset
head(shipping_data)

#>                          name price.... weight..kg. length..m. width..m.
#> 1                  Camera Bag     37.66        1.10       0.40      0.39
#> 2 Portable Bluetooth Keyboard    144.65        0.39       0.11      0.06
#> 3         Large Flat Rate Box     38.57        0.97       0.79      0.55
#> 4               Ceramic Tiles     10.34        6.22       0.36      0.37
#> 5                 Garden Hose     21.63        1.18      17.77      0.27
#> 6                Cookware Set    401.64        7.60       0.49      0.35
#>   height..m. shipment.date              destination.port shipment.arrive.date
#> 1       0.26    2023-03-19 Port of Singapore (Singapore)           2023-04-02
#> 2       0.03    2023-03-21   Port of Busan (South Korea)           2023-04-04
#> 3       0.35    2023-03-25       Port of Tianjin (China)           2023-04-04
#> 4       0.02    2023-03-15      Port of Shanghai (China)           2023-03-28
#> 5       0.13    2023-03-25       Port of Tianjin (China)           2023-04-04
#> 6       0.15    2023-03-20      Port of Shanghai (China)           2023-03-29
#>   port.city     country shipment.duration shipment.volume
#> 1 Singapore   Singapore           14 days        0.040560
#> 2     Busan South Korea           14 days        0.000198
#> 3   Tianjin       China           10 days        0.152075
#> 4  Shanghai       China           13 days        0.002664
#> 5   Tianjin       China           10 days        0.623727
#> 6  Shanghai       China            9 days        0.025725

Second Data Coercion

shipping_data$shipment.arrive.date <- as.Date(shipping_data$shipment.arrive.date)
shipping_data$country <- as.factor(shipping_data$country)
shipping_data$port.city <- as.factor(shipping_data$port.city)

Summary new dataframe

summary(shipping_data)

#>      name             price....          weight..kg.         length..m.     
#>  Length:260987      Min.   :      1.0   Min.   :    0.05   Min.   : 0.0500  
#>  Class :character   1st Qu.:     27.9   1st Qu.:    0.71   1st Qu.: 0.2900  
#>  Mode  :character   Median :     87.6   Median :    2.96   Median : 0.6000  
#>                     Mean   :   4182.3   Mean   :  323.90   Mean   : 0.9535  
#>                     3rd Qu.:    257.7   3rd Qu.:   67.34   3rd Qu.: 1.1400  
#>                     Max.   :1998160.0   Max.   :24982.35   Max.   :30.0000  
#>    width..m.        height..m.     shipment.date        destination.port  
#>  Min.   :0.0100   Min.   :0.0100   Min.   :2023-03-12   Length:260987     
#>  1st Qu.:0.2100   1st Qu.:0.1200   1st Qu.:2023-03-16   Class :character  
#>  Median :0.3900   Median :0.2500   Median :2023-03-20   Mode  :character  
#>  Mean   :0.5301   Mean   :0.4626   Mean   :2023-03-19                     
#>  3rd Qu.:0.8000   3rd Qu.:0.6600   3rd Qu.:2023-03-24                     
#>  Max.   :6.0000   Max.   :5.9900   Max.   :2023-03-28                     
#>  shipment.arrive.date     port.city            country       shipment.duration
#>  Min.   :2023-03-20   Busan    :52634   China      :104252   Length:260987    
#>  1st Qu.:2023-03-27   Shanghai :51730   Japan      : 52263   Class :difftime  
#>  Median :2023-03-31   Singapore:51812   Singapore  : 51812   Mode  :numeric   
#>  Mean   :2023-03-30   Tianjin  :52522   South Korea: 52634                    
#>  3rd Qu.:2023-04-04   Tokyo    :52263   NA's       :    26                    
#>  Max.   :2023-04-11   NA's     :   26                                         
#>  shipment.volume   
#>  Min.   :  0.0000  
#>  1st Qu.:  0.0099  
#>  Median :  0.0564  
#>  Mean   :  1.5715  
#>  3rd Qu.:  0.4532  
#>  Max.   :937.8705

insight :

Price:

The prices vary significantly, ranging from $1.0 to $1,998,160.0, with a mean price of 4,182.3. The mean is far above the median price (87.6), indicating the presence of outliers or extreme values in the higher price range.

Weight:

The weight of shipments varies from 0.05 kg to 24,982.35 kg, with a mean weight of 323.90 kg. There is a substantial difference between the median weight (2.96 kg) and the mean weight, suggesting the presence of outliers or very heavy shipments.

Length, Width, and Height:

The dimensions of the shipments (length, width, and height) exhibit a wide range of values. The maximum values for length (30.0 meters) and width (6.0 meters) indicate the presence of larger-sized shipments.

Shipment Date and Duration:

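The shipment dates range from March 12, 2023, to March 28, 2023. The mean (March 19) and the median (March 20) both sit near the midpoint of that range, which suggests that shipments are spread fairly evenly over the period rather than concentrated around one date. The shipment duration lies between 8 and 14 days by construction, because the arrival date was simulated as 7 days plus a random 1 to 7 days.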

Shipment Volume:

The shipment volumes vary from 0.0 to 937.8705 cubic meters, with a mean volume of 1.5715. The median shipment volume (0.0564) is considerably lower than the mean, indicating the presence of shipments with very high volumes.

Destination :

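There are only four destination countries, and China leads with a total of 104,252 items, about twice as many as any other country, because it has two destination ports (Shanghai and Tianjin). The summary also shows 26 rows with no port city or country, which means the extraction pattern did not match their destination.port value.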
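
The reasoning used in these insights, a mean far above the median pointing to extreme values, can be checked with a quantile-based rule. This is a minimal sketch on made-up prices; the vector `price` is invented, and the 1.5 × IQR fence is the common boxplot convention, not a method used in the original analysis.

```r
# Made-up prices: mostly small values plus two extreme ones
price <- c(12, 25, 30, 48, 60, 88, 95, 140, 260, 102000, 1998160)

med <- median(price)
avg <- mean(price)

# Flag values above Q3 + 1.5 * IQR (the usual boxplot rule)
q        <- quantile(price, c(0.25, 0.75))
upper    <- q[2] + 1.5 * (q[2] - q[1])
outliers <- price[price > upper]

print(c(median = med, mean = avg))
print(outliers)
```

Applied to `shipping_data$price....` or `shipping_data$weight..kg.`, the same steps would show how many rows drive the gap between mean and median.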

Exploratory or Business Questions

head(shipping_data)

#>                          name price.... weight..kg. length..m. width..m.
#> 1                  Camera Bag     37.66        1.10       0.40      0.39
#> 2 Portable Bluetooth Keyboard    144.65        0.39       0.11      0.06
#> 3         Large Flat Rate Box     38.57        0.97       0.79      0.55
#> 4               Ceramic Tiles     10.34        6.22       0.36      0.37
#> 5                 Garden Hose     21.63        1.18      17.77      0.27
#> 6                Cookware Set    401.64        7.60       0.49      0.35
#>   height..m. shipment.date              destination.port shipment.arrive.date
#> 1       0.26    2023-03-19 Port of Singapore (Singapore)           2023-04-02
#> 2       0.03    2023-03-21   Port of Busan (South Korea)           2023-04-04
#> 3       0.35    2023-03-25       Port of Tianjin (China)           2023-04-04
#> 4       0.02    2023-03-15      Port of Shanghai (China)           2023-03-28
#> 5       0.13    2023-03-25       Port of Tianjin (China)           2023-04-04
#> 6       0.15    2023-03-20      Port of Shanghai (China)           2023-03-29
#>   port.city     country shipment.duration shipment.volume
#> 1 Singapore   Singapore           14 days        0.040560
#> 2     Busan South Korea           14 days        0.000198
#> 3   Tianjin       China           10 days        0.152075
#> 4  Shanghai       China           13 days        0.002664
#> 5   Tianjin       China           10 days        0.623727
#> 6  Shanghai       China            9 days        0.025725

Top 10 shipments with

Highest volume.

# Calculate the total volume for each item name
shipment_total_volume <- aggregate(shipment.volume   ~ name, data = shipping_data, FUN = sum)

# Sort the results by total volume in descending order
sorted_shipments <- shipment_total_volume[order(shipment_total_volume$shipment.volume  , decreasing = TRUE), ]

# Select the top 10 shipments with the highest volume
top_10_volume <- head(sorted_shipments, 10)

# Print the top 10 shipments with the highest volume
print(top_10_volume)

#>                  name shipment.volume
#> 226          Sailboat      266298.173
#> 233        Small Boat       15941.631
#> 54                Car        9775.406
#> 251        Sports Car        9221.106
#> 17                Bed        7626.883
#> 10   Backpacking Tent        7257.995
#> 95           Forklift        4804.001
#> 177 Pallete of Coffee        2969.395
#> 204             Piano        2861.961
#> 8                 ATV        2633.927

# Create a bar plot of the top 10 shipments with the highest volume
barplot(top_10_volume$shipment.volume, names.arg = top_10_volume$name, 
        main = "Top 10 Shipments with Highest Volume",
        las = 2)

Highest Price

# Calculate the mean price for each item
item_mean_price <- aggregate(price.... ~ name, data = shipping_data, FUN = mean)
sorted_items <- item_mean_price[order(item_mean_price$price...., decreasing = TRUE), ]

# Select the top 10 items with the highest mean prices
top_10_price <- head(sorted_items, 10)
print(top_10_price)

#>                       name   price....
#> 251             Sports Car 1016532.127
#> 226               Sailboat  277653.895
#> 54                     Car  102057.856
#> 233             Small Boat   12835.410
#> 204                  Piano   11233.303
#> 185     Pallete of Laptops   11022.831
#> 95                Forklift    7585.236
#> 193 Pallete of Smartphones    6499.505
#> 125                Jet Ski    6118.588
#> 109              Golf Cart    5986.065

# Create a bar plot for the top 10 shipments with highest price
barplot(top_10_price$price...., names.arg = top_10_price$name, las = 2)
title(main = "Top 10 Shipments items with Highest Price")

Average price, weight, and dimensions of the shipments based on their country

# Calculate average price, weight, and dimensions by country
averages <- aggregate(shipping_data[, c("price....", "weight..kg.", "length..m.", "width..m.", "height..m.")], 
                      by = list(country = shipping_data$country),
                      FUN = mean)
averages <- averages[order(averages$country), ]
colnames(averages)[2:6] <- c("avg_price", "avg_weight", "avg_length", "avg_width", "avg_height")

print(averages)

#>       country avg_price avg_weight avg_length avg_width avg_height
#> 1       China  4269.510   332.8498  0.9589953 0.5319058  0.4621350
#> 2       Japan  4110.728   318.8995  0.9580426 0.5290167  0.4642499
#> 3   Singapore  3977.996   322.2713  0.9540344 0.5271979  0.4625537
#> 4 South Korea  4282.339   312.7865  0.9372075 0.5305135  0.4617606

insight : The country averages are close to one another. South Korea has the highest average price (4,282.34), with China a close second (4,269.51), while China has the highest average weight (332.85 kg). The average length, width, and height are nearly identical across the four countries.

Distribution of shipment dates:

shipping_data$shipment.date <- as.Date(shipping_data$shipment.date, format = "%Y-%m-%d")
ggplot(shipping_data, aes(x = shipment.date)) +
  geom_histogram(binwidth = 1, fill = "lightblue", color = "black") +
  labs(x = "Shipment Date", y = "Frequency") +
  ggtitle("Distribution of Shipment Dates")

insight : The histogram suggests that shipments from March 12 to March 28 follow a busy and stable daily schedule, with no noticeable drop on any day.

Heaviest shipment for each day and its port city:

shipping_data$shipment.date <- as.Date(shipping_data$shipment.date)

# Calculate the maximum weight of shipments for each port city for every day
max_weight <- aggregate(weight..kg. ~ shipment.date + port.city, 
                        data = shipping_data, 
                        FUN = max)

# Filter to keep only the top weight for each day
top_weight <- aggregate(weight..kg. ~ shipment.date, 
                        data = max_weight, 
                        FUN = max)

# Identify the port city with the highest weight for each day
# (merge joins on the shared columns shipment.date and weight..kg.)
top_weight <- merge(top_weight, max_weight)

# Print the result
print(top_weight)

#>    shipment.date weight..kg. port.city
#> 1     2023-03-12    24868.32     Tokyo
#> 2     2023-03-13    24968.59  Shanghai
#> 3     2023-03-14    24944.99  Shanghai
#> 4     2023-03-15    24956.64  Shanghai
#> 5     2023-03-16    24977.65     Tokyo
#> 6     2023-03-17    24975.39 Singapore
#> 7     2023-03-18    24736.21     Busan
#> 8     2023-03-19    24642.78     Tokyo
#> 9     2023-03-20    24430.57     Busan
#> 10    2023-03-21    24980.15 Singapore
#> 11    2023-03-22    24855.97  Shanghai
#> 12    2023-03-23    24878.67   Tianjin
#> 13    2023-03-24    24934.49 Singapore
#> 14    2023-03-25    24860.83 Singapore
#> 15    2023-03-26    24982.35   Tianjin
#> 16    2023-03-27    24961.47   Tianjin
#> 17    2023-03-28    24898.33  Shanghai

insight : Counting which city received the heaviest single shipment on each day, Shanghai appears 5 times, Singapore 4 times, Tokyo 3 times, Tianjin 3 times, and Busan 2 times in this timeframe. Note that the code takes the maximum weight per day, not the total weight.
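
Counting the appearances by hand is error-prone, so a frequency table is a safer way to summarise the result. The sketch below copies the 17 port cities from the printed output above, in date order, into an illustrative vector named `top_city`; with the real object, `table(top_weight$port.city)` would do the same.

```r
# Port cities copied from the printed result, one per day from March 12 to 28
top_city <- c("Tokyo", "Shanghai", "Shanghai", "Shanghai", "Tokyo", "Singapore",
              "Busan", "Tokyo", "Busan", "Singapore", "Shanghai", "Tianjin",
              "Singapore", "Singapore", "Tianjin", "Tianjin", "Shanghai")

# Number of days on which each city had the heaviest shipment
city_counts <- sort(table(top_city), decreasing = TRUE)
print(city_counts)
```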

Most frequent destination ports for shipments

Most frequent port cities

# Calculate the frequency of each port.city
port_city_freq <- table(shipping_data$port.city)
port_city_df <- as.data.frame(port_city_freq)
colnames(port_city_df) <- c("portCity", "frequency")

# Sort the dataframe by frequency in descending order
port_city_df <- port_city_df[order(port_city_df$frequency, decreasing = TRUE), ]
print(port_city_df)

#>    portCity frequency
#> 1     Busan     52634
#> 4   Tianjin     52522
#> 5     Tokyo     52263
#> 3 Singapore     51812
#> 2  Shanghai     51730

# Create a bar plot of port city frequencies with logarithmic y-axis scale
barplot(port_city_df$frequency, names.arg = port_city_df$portCity,
        main = "Frequency of Shipments by Port City", las = 2, log = "y")

Insight : Busan receives the most shipments of any port city, with a total of 52,634, followed by Tianjin, Tokyo, Singapore, and Shanghai. The counts are close to one another, ranging from 51,730 to 52,634.

Most frequent destination country

# Calculate the frequency of each country
port_country_freq <- sort(table(shipping_data$country), decreasing = TRUE)
port_country_df <- as.data.frame(port_country_freq)
colnames(port_country_df) <- c("Country", "frequency")
print(port_country_df)

#>       Country frequency
#> 1       China    104252
#> 2 South Korea     52634
#> 3       Japan     52263
#> 4   Singapore     51812

# Calculate the frequency of each country
port_country_freq <- sort(table(shipping_data$country), decreasing = TRUE)
port_country_df <- as.data.frame(port_country_freq)
colnames(port_country_df) <- c("Country", "Frequency")
y_max <- max(port_country_df$Frequency)
options(scipen = 10)

# Create a bar plot of country frequencies
barplot(port_country_df$Frequency, names.arg = port_country_df$Country, 
        main = "Frequency of Shipments by Country", las = 2, ylim = c(0, y_max))
text(x = 1:length(port_country_df$Country), y = port_country_df$Frequency, labels = port_country_df$Frequency, pos = 3)

Insight :

Based on this data, China receives more shipments than any other country.
South Korea, Japan, and Singapore have almost the same number of shipments.

Analysis of shipment durations

Based on different cities

# Calculate the mean shipment duration for each city
duration_city <- aggregate(shipment.duration ~ port.city, 
                            data = shipping_data, 
                            FUN = mean)

# Sort the results in ascending order of mean shipment duration
duration_city <- duration_city[order(duration_city$shipment.duration, decreasing = FALSE), ]

# Print the sorted duration statistics
duration_city

#>   port.city shipment.duration
#> 1     Busan     10.99122 days
#> 2  Shanghai     10.99233 days
#> 4   Tianjin     10.99989 days
#> 3 Singapore     11.01035 days
#> 5     Tokyo     11.01867 days

Based on different countries

# Calculate the mean shipment duration for each country
duration_country <- aggregate(shipment.duration ~ country, 
                            data = shipping_data, 
                            FUN = mean)
duration_country <- duration_country[order(duration_country$shipment.duration, decreasing = FALSE), ]
print(duration_country)

#>       country shipment.duration
#> 4 South Korea     10.99122 days
#> 1       China     10.99613 days
#> 3   Singapore     11.01035 days
#> 2       Japan     11.01867 days

total revenue generated from shipments to each country and port city combination.

# Calculate the total revenue for each country and port city combination
revenue_country_city <- aggregate(price.... ~ country + port.city, 
                           data = shipping_data, 
                           FUN = sum)
revenue_country_city <- revenue_country_city[order(revenue_country_city$price...., decreasing = TRUE), ]

print(revenue_country_city)

#>       country port.city price....
#> 4       China   Tianjin 241688069
#> 1 South Korea     Busan 225396653
#> 5       Japan     Tokyo 214838955
#> 3   Singapore Singapore 206107905
#> 2       China  Shanghai 203416892

# Sort the results in descending order of total revenue
revenue_country_city <- revenue_country_city[order(revenue_country_city$price...., decreasing = TRUE), ]

# Create a bar plot of total revenue
barplot(revenue_country_city$price...., 
        names.arg = paste(revenue_country_city$country, revenue_country_city$port.city, sep = " - "),
        main = "Total Revenue by Country and Port City",
        las = 1)

Insight : - China generates the most total revenue: Tianjin is the top country and port city combination, and Tianjin and Shanghai together exceed the total of any other country.
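
To compare the combinations on one scale, the totals can be turned into shares and rolled up to country level. This is a sketch that copies the five totals from the printed revenue table into a small data frame; the names `revenue`, `share_pct`, and `by_country` are illustrative, and "revenue" here means the sum of item prices, as in the analysis above.

```r
# Totals copied from the printed revenue table
revenue <- data.frame(
  country   = c("China", "South Korea", "Japan", "Singapore", "China"),
  port.city = c("Tianjin", "Busan", "Tokyo", "Singapore", "Shanghai"),
  total     = c(241688069, 225396653, 214838955, 206107905, 203416892)
)

# Share of the overall total for each port city, in percent
revenue$share_pct <- round(100 * revenue$total / sum(revenue$total), 1)

# Roll the port cities up to country level
by_country <- aggregate(total ~ country, data = revenue, FUN = sum)
by_country <- by_country[order(by_country$total, decreasing = TRUE), ]

print(revenue)
print(by_country)
```

With the full data, the same shares could be computed from `revenue_country_city$price....` directly.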

Conclusion

In just over two weeks, from 12 March 2023 to 28 March 2023, PT X shipped a total of 263,821 items (260,987 after removing rows with missing values). The port was busy throughout: the daily distribution never dropped by more than 1% from the previous day's total. The company focuses its shipment distribution on Asia, with China, Singapore, South Korea, and Japan as destination countries and five port cities among them. The item with the highest total volume in this period is the Sailboat, at about 266,298 cubic meters in total; it also has the second-highest mean price ($277,654), behind the Sports Car ($1,016,532).

Recommendations :

Because China is both the most frequent destination country and the largest source of revenue, the company needs to treat this route seriously: a small mistake there can create a domino effect that disrupts shipment distribution.
The company should also grow its market in countries other than China. Based on our calculation, customers mostly send goods to China, with 104,252 shipments in the period, almost twice as many as any other country.
