Homework assignment No.1

Choose a data set (with more than 5 attributes) and explain why it is important or interesting to you.
Formulate research questions (for which you expect to find answers).
Make some visualizations for the formulated questions.
Prepare a presentation (where you explain the data, questions, problems, results) and upload it.

1. Choosing the dataset

This dataset contains historical information about cargo that has been assigned to various trucks. The dataset contains 66 objects with 16 attributes. An example of the first few objects is presented below.

##       ID Value              SenderCity ReceiverCity             DateCreated
## 1 165212   288           Oberpframmern      Vilnius 2021-10-29 15:15:19.560
## 2 165754  1400              Ochtendung  Radziunu k. 2021-11-09 12:53:23.840
## 3 165761   590                    Lage       Moscow 2021-11-09 14:10:58.930
## 4 166085  1060                  Bremen      Vilnius 2021-11-11 15:54:19.480
## 5 166125  2450 Leinfelden-Echterdingen      Ivanovo 2021-11-12 09:50:32.430
## 6 166255    60               Weinstadt      Vilnius 2021-11-15 13:44:16.883
##    SenderX   SenderY ReceiverX ReceiverY  LDM Weight Volume UnitTypeID
## 1 48.02556 11.801236  54.63984  25.27138 1.00  456.0  1.800         50
## 2 50.35402  7.423818  54.36220  23.98614 8.50  500.0 42.000         50
## 3 51.98553  8.848820  55.89406  37.44395 0.45  382.0  1.620         50
## 4 53.12085  8.734309  54.63984  25.27138 7.00 9000.0 24.480         33
## 5 48.70013  9.146070  56.99623  41.04270 3.60 4264.0  9.072         36
## 6 48.81519  9.375821  54.63984  25.27138 0.10    9.2  0.097         44
##   FirstDimension SecondDimension    UnitTypeName
## 1            200             100              PL
## 2            850             200              PL
## 3            118              88              PL
## 4            125              85  EP(120x80x220)
## 5            120             120 PL(120x120x220)
## 6             59              40             BOX

I chose this dataset because it comes from one of the projects I'm currently working on. So far, this data has been used only to depict truck routes, so I hope to gain a more in-depth understanding of what drives the decisions made during route planning.

2. Research questions

Do the dimensional characteristics, namely LDM (loading meter), weight and volume, influence the value of cargo?
What is the distribution of value among different cargo types?
Do certain regions have better-paying cargo than others?

3. Visualizations

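# Correlation of Value with LDM, Weight and Volume (columns 10 to 12); mar leaves room for the title
corrplot(cor(data["Value"], data[, c("LDM", "Weight", "Volume")]), type = "upper", diag = TRUE, method = "pie", title = "The correlation between value and other attributes", mar = c(0, 0, 2, 0))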

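The calculated correlation coefficients show that Value is most strongly correlated with the LDM attribute of the cargo. In general, cargo with larger dimensional attributes (LDM, weight, volume) tends to have a higher value. Note that correlation alone does not show that these attributes cause the higher value.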
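To read the coefficients as numbers rather than pie segments, the same correlations can be computed with base R. The sketch below is only illustrative: `sample_rows` is a hypothetical name, and it holds just the six example rows printed in section 1, so its coefficients will differ from those of the full 66-row dataset.

```r
# Six example rows from section 1 (not the full dataset)
sample_rows <- data.frame(
  Value  = c(288, 1400, 590, 1060, 2450, 60),
  LDM    = c(1.00, 8.50, 0.45, 7.00, 3.60, 0.10),
  Weight = c(456.0, 500.0, 382.0, 9000.0, 4264.0, 9.2),
  Volume = c(1.800, 42.000, 1.620, 24.480, 9.072, 0.097)
)

# Pearson correlation of Value with each dimensional attribute
value_cor <- sapply(
  sample_rows[c("LDM", "Weight", "Volume")],
  function(column) cor(sample_rows$Value, column)
)

# Strongest correlation first
value_cor <- sort(value_cor, decreasing = TRUE)
print(round(value_cor, 2))
```

Replacing `sample_rows` with the full `data` frame would give the values that the plot above encodes.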

# Box plot of cargo value for each unit type
p <- ggplot(data, aes(x = UnitTypeName, y = Value, fill = UnitTypeName)) +
  ggtitle("Values for different types of cargo") +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 3, notch = FALSE) +
  labs(x = "", fill = "Cargo type")

The single highest-valued cargo is of the PL(120x120x220) type. Since that type occurs only once, the PL(120x100x220) type is identified as the highest-valued cargo type, with a median value of around 750.
The remaining cargo types have considerably lower values, centred around the 250 mark, with a few outliers.
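The reasoning above relies on two numbers per cargo type: how often the type occurs and its median value. A minimal base R sketch of that summary follows, again using only the six example rows from section 1 (`sample_rows` and `type_summary` are hypothetical names), so it does not reproduce the medians read off the full box plot.

```r
# Six example rows from section 1 (not the full dataset)
sample_rows <- data.frame(
  Value = c(288, 1400, 590, 1060, 2450, 60),
  UnitTypeName = c("PL", "PL", "PL", "EP(120x80x220)", "PL(120x120x220)", "BOX")
)

# Number of occurrences and median value per cargo type
counts  <- table(sample_rows$UnitTypeName)
medians <- tapply(sample_rows$Value, sample_rows$UnitTypeName, median)

type_summary <- data.frame(
  UnitTypeName = names(medians),
  n            = as.integer(counts[names(medians)]),
  median_value = as.numeric(medians)
)

# Highest median first; types with n == 1 should be interpreted with care
type_summary <- type_summary[order(-type_summary$median_value), ]
print(type_summary)
```

Even in this small sample, PL(120x120x220) ranks first with a single occurrence, which is why a type's count should be checked before its median is compared with the others.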

g = list(
  scope = "europe", 
  domain = list(
    x = c(0, 1), 
    y = c(0, 1)
  ), 
  center = list("lat" = 50.826941, "lon" = 10.4261393),
  lataxis = list(range = c(48.3, 55.5)), 
  lonaxis = list(range = c(5.9, 14.9)),
  showland = TRUE, 
  landcolor = "rgb(229,236,246)", 
  showframe = TRUE, 
  projection = list(type = "Mercator"), 
  resolution = 50, 
  countrycolor = "rgb(102,51,153)", 
  coastlinecolor = "rgb(102,51,153)", 
  showcoastlines = TRUE
)

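# SenderX holds the sender latitude and SenderY the sender longitude
fig <- plot_geo(data, lat = ~SenderX, lon = ~SenderY, color = ~Value, size = ~Value)
# Marker area and colour both encode Value; the hover text shows the value
fig <- fig %>% add_markers(
  text = ~paste(Value, "eu"), hoverinfo = "text", marker = list(sizeref = 0.09, sizemode = "area")
)
fig <- fig %>% layout(title = 'Distribution of cargo by value', geo = g)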

The sender locations appear to be clustered in the western part of Germany. However, no clear clustering of high-valued cargo is observed: such cargo appears to be spread evenly among the rest of the cargo.
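The visual impression from the map can be cross-checked with a simple count and median per side of the map. The sketch below is only a rough illustration under two assumptions: the west/east boundary is taken to be the map centre longitude (10.4261393) from the `geo` settings, and only the six example rows from section 1 are used. `sample_rows`, `split_lon` and `Side` are hypothetical names.

```r
# Six example rows from section 1 (not the full dataset); SenderY is the longitude
sample_rows <- data.frame(
  Value   = c(288, 1400, 590, 1060, 2450, 60),
  SenderY = c(11.801236, 7.423818, 8.848820, 8.734309, 9.146070, 9.375821)
)

# Assumption: reuse the map centre longitude as the west/east boundary
split_lon <- 10.4261393
sample_rows$Side <- ifelse(sample_rows$SenderY < split_lon, "west", "east")

# How many senders fall on each side, and the median cargo value per side
side_counts  <- table(sample_rows$Side)
side_medians <- tapply(sample_rows$Value, sample_rows$Side, median)

print(side_counts)
print(side_medians)
```

Six rows are far too few to support a conclusion about regions; running the same steps on the full `data` frame would show whether the western cluster also differs in value.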

Data visualization

Arnas Matusevičius

2022-02-14
