I will be doing some exploratory analysis and plotting for the Miliary Strength Ranking dataset found on Kaggle. This is meant to be a short project (<2 hours) to understand the data and make some decent graphics. https://www.kaggle.com/blitzr/gfp2017
library(tidyverse)
library(rworldmap)
There are actually two versions of this file, one of which has two rows of column headers. Importing them both and combining the additional details into the column headers can be done as such:
dt <- read.csv('GlobalFirePower.csv')
glimpse(dt)
Observations: 133
Variables: 47
$ Country <fctr> United States, Russia, China, India, France, United Kingdom, Japan, Turkey, Germany, Egypt, Italy, South...
$ ISO3 <fctr> USA, RUS, CHN, IND, FRA, UKD, JPN, TUR, GER, EGP, ITA, SKO, PAK, INO, ISR, VTN, BRA, TAI, POL, THA, IRN,...
$ Rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29...
$ Total.Population <int> 323995528, 142355415, 1373541278, 1266883598, 66836154, 64430428, 126702133, 80274604, 80722792, 94666993...
$ Manpower.Available <int> 145215000, 70000000, 750000000, 616000000, 30000000, 30000000, 54000000, 41640000, 37000000, 42000000, 28...
$ Fit.for.Service <int> 120025000, 47000000, 619000000, 489600000, 23750000, 24040000, 44000000, 35010000, 29540000, 35306000, 22...
$ Reaching.Military.Age <int> 4220000, 1355000, 19550000, 22900000, 775000, 750000, 1215000, 1375000, 791000, 1535000, 570000, 690000, ...
$ Total.Military.Personnel <int> 2363675, 3371027, 3712500, 4207250, 387635, 232675, 311875, 743415, 210000, 1329250, 267500, 5829750, 919...
$ Active.Personnel <int> 1373650, 798527, 2260000, 1362500, 204000, 151175, 248575, 382850, 180000, 454250, 247500, 627500, 637000...
$ Reserve.Personnel <int> 990025, 2572500, 1452500, 2844750, 183635, 81500, 63300, 360565, 30000, 875000, 20000, 5202250, 282000, 5...
$ Total.Aircraft.Strength <int> 13762, 3794, 2955, 2102, 1305, 856, 1594, 1018, 698, 1132, 822, 1477, 951, 441, 652, 278, 697, 850, 465, ...
$ Fighter.Aircraft <int> 2296, 806, 1271, 676, 296, 88, 288, 207, 92, 337, 79, 406, 301, 39, 243, 76, 43, 286, 99, 76, 137, 78, 45...
$ Attack.Aircraft <int> 2785, 1438, 1385, 809, 284, 168, 287, 207, 169, 427, 185, 448, 394, 58, 243, 73, 121, 287, 99, 95, 137, 7...
$ Transport.Aircraft <int> 5739, 1124, 782, 857, 662, 337, 481, 439, 345, 260, 424, 348, 261, 170, 101, 161, 369, 189, 229, 327, 203...
$ Trainer.Aircraft <int> 2831, 387, 352, 323, 283, 329, 447, 276, 47, 384, 189, 273, 190, 111, 219, 25, 175, 203, 98, 154, 79, 158...
$ Total.Helicopter.Strength <int> 6065, 1389, 912, 666, 610, 347, 659, 455, 375, 257, 430, 709, 316, 147, 143, 137, 230, 345, 211, 294, 126...
$ Attack.Helicopters <int> 947, 490, 206, 16, 49, 39, 119, 70, 47, 46, 59, 81, 52, 5, 48, 25, 12, 91, 29, 7, 12, 22, 20, 21, 38, 0, ...
$ Combat.Tanks <int> 5884, 20216, 6457, 4426, 406, 249, 700, 2445, 543, 4110, 200, 2654, 2924, 418, 2620, 1545, 469, 2005, 106...
$ Armored.Fighting.Vehicles <int> 41062, 31298, 4788, 6704, 6863, 5948, 2850, 7550, 5869, 13949, 6972, 2660, 2828, 1089, 10185, 3150, 1707,...
$ Self.Propelled.Artillery <int> 1934, 5972, 1710, 290, 325, 89, 202, 1013, 154, 889, 164, 1990, 465, 37, 650, 524, 112, 482, 443, 26, 320...
$ Towed.Artillery <int> 1299, 4625, 6246, 7414, 233, 138, 500, 697, 0, 2360, 92, 5374, 3278, 80, 300, 2200, 563, 1160, 72, 695, 2...
$ Rocket.Projectors <int> 1331, 3793, 1770, 292, 44, 42, 99, 811, 50, 1481, 21, 214, 134, 86, 48, 1100, 180, 72, 240, 13, 1474, 0, ...
$ Total.Naval.Assets <int> 415, 352, 714, 295, 118, 76, 131, 194, 81, 319, 143, 166, 197, 221, 65, 65, 110, 87, 83, 81, 398, 47, 967...
$ Aircraft.Carriers <int> 19, 1, 1, 3, 4, 2, 4, 0, 0, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,...
$ Frigates <int> 8, 6, 51, 14, 11, 13, 0, 16, 10, 9, 14, 13, 10, 7, 0, 7, 9, 20, 2, 8, 5, 11, 11, 7, 8, 12, 12, 13, 0, 1, ...
$ Destroyers <int> 63, 15, 35, 11, 4, 6, 42, 0, 0, 0, 4, 12, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Corvettes <int> 0, 81, 35, 23, 0, 0, 6, 9, 5, 2, 4, 16, 0, 24, 3, 13, 4, 1, 2, 7, 3, 0, 2, 4, 13, 0, 0, 0, 9, 1, 3, 0, 6,...
$ Submarines <int> 70, 63, 68, 15, 10, 11, 17, 12, 6, 5, 7, 15, 8, 4, 6, 6, 5, 4, 5, 0, 33, 6, 76, 0, 6, 4, 3, 11, 5, 0, 0, ...
$ Patrol.Craft <int> 13, 28, 220, 139, 17, 4, 6, 34, 4, 227, 10, 70, 17, 74, 32, 25, 34, 51, 3, 32, 230, 13, 438, 11, 43, 20, ...
$ Mine.Warfare.Vessels <int> 11, 46, 31, 6, 18, 15, 25, 11, 13, 23, 10, 11, 3, 12, 0, 8, 6, 8, 25, 7, 10, 6, 25, 3, 1, 0, 6, 4, 8, 1, ...
$ Production..bbl.dy. <int> 8653000, 10110000, 4189000, 767600, 15340, 787200, 4666, 47670, 48830, 478400, 105700, 500, 93630, 789800...
$ Consumption..bbl.dy. <int> 19000000, 3320000, 10120000, 3510000, 1770000, 1510000, 4530800, 720000, 2400000, 740000, 1315000, 232500...
$ Proven.Reserves..bbl. <dbl> 3.652e+10, 8.000e+10, 2.500e+10, 5.675e+09, 8.408e+07, 2.800e+09, 5.416e+08, 3.000e+08, 1.000e+08, 4.400e...
$ Labor.Force <int> 158600000, 77410000, 805900000, 513700000, 30480000, 33170000, 65930000, 30240000, 45300000, 31960000, 25...
$ Merchant.Marine.Strength <int> 393, 1143, 2030, 340, 162, 504, 684, 629, 427, 67, 681, 786, 11, 1340, 8, 579, 109, 112, 9, 363, 76, 41, ...
$ Major.Ports...Terminals <int> 24, 7, 15, 7, 14, 14, 10, 9, 13, 7, 9, 8, 2, 9, 4, 6, 15, 4, 4, 5, 3, 20, 8, 4, 9, 12, 9, 5, 9, 6, 3, 3, ...
$ Roadway.Coverage..km. <int> 6586610, 982000, 3860800, 3320410, 951200, 394428, 1210251, 352046, 644480, 65050, 487700, 103029, 260760...
$ Railway.Coverage..km. <int> 224792, 87157, 86000, 63974, 29640, 16454, 27182, 8699, 41981, 5083, 20255, 3381, 7791, 5042, 975, 2632, ...
$ Serivecable.Airports <int> 13513, 1218, 507, 346, 464, 460, 175, 98, 539, 83, 129, 111, 151, 673, 47, 45, 4093, 37, 126, 101, 319, 4...
$ Defense.Budget <int> 587800, 44600, 161700, 51000, 35000, 45700, 43800, 8208, 39200, 4400, 34000, 43800, 7000, 6900, 15500, 33...
$ External.Debt <int> 17910000, 514800, 983500, 507000, 5360000, 8126000, 3240000, 410400, 5326000, 50670, 2444000, 385600, 640...
$ Foreign.Exchange...Gold <int> 117600, 365500, 3092000, 359100, 138200, 129600, 1233000, 115000, 173700, 15060, 130600, 372700, 20530, 1...
$ Purchasing.Power.Parity <int> 18560000, 3745000, 21270000, 8721000, 2737000, 2788000, 4932000, 1670000, 3979000, 1105000, 2221000, 1929...
$ Square.Land.Area..km. <int> 9826675, 17098242, 9596961, 3287263, 643801, 243610, 377915, 783562, 357022, 1001450, 301340, 99720, 7960...
$ Coastline..km. <fctr> 19924, 37653, 14500, 7000, 4853, 12429, 29751, 7200, 2389, 2450, 7600, 2413, 1046, 54716, 273, 3444, 749...
$ Shared.Borders..km. <int> 12048, 22408, 22457, 13888, 4072, 443, 0, 2816, 3694, 2612, 1836, 237, 7257, 2958, 1068, 4616, 16145, 0, ...
$ Waterways..km. <fctr> 41009, 102000, 110000, 14500, 8501, 3200, 1770, 1200, 7467, 3500, 2400, 1600, 25220, 21579, Minimum not ...
However, this is quite messy and generates a lot of unnecessary typing or clean-up needed. The important information is which groups the variables belong. I instead loaded the basic file and created the grouping in a list:
dt <- read.csv('GlobalFirePower.csv')
grouping <- list(Country = (1:3),
Manpower = (4:10),
Airpower = (11:17),
Army = (18:22),
Navy = (23:30),
NaturalResources = (31:33),
Logistics = (34:39),
Finance = (40:43),
Geography = (44:47)
)
This will let me preserve the information as to which group the variable belonged. I will do some basic clean-up of the file.
dt$Coastline..km.<-as.integer(as.character(dt$Coastline..km.))
NAs introduced by coercion
dt$Waterways..km.<-as.integer(as.character(dt$Waterways..km.))
NAs introduced by coercion
dt$Coastline..km.<- ifelse(is.na(dt$Coastline..km) == T, 0,dt$Coastline..km)
dt$Waterways..km.<- ifelse(is.na(dt$Waterways..km.) == T, 0,dt$Waterways..km.)
dt$Country <- as.character(dt$Country)
dt[dt$Country == 'United States','Country'] <- 'USA'
dt[dt$Country == 'United Kingdom','Country'] <- 'UK'
dt$Country <- as.factor(dt$Country)
As there are over 40 variables and 9 “groups” of variables, I wrote a loop that allows us to plot the relative groups.
dens_list = list()
for (i in 1:9) {
dens_list[[i]] <- dt[grouping[[i]]] %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_density(fill= 'red') +
theme_bw() +
labs(title = paste("Distribution Plots of",names(grouping[i]),"variables"), x = "Variables", y = "Density")
}
attributes are not identical across measure variables; they will be dropped
The graphs are stores in the dens_list object. I can run a loop to access the plots.
for (i in 2:9) {
print(dens_list[[i]])
}
The data is quite interesting with the most density towards the lower end up numbers, and some large outliers with much larger values. These outliers will be clear a bit later.
The dataset also provided their own military ranking.
dt%>%
select(Country, Rank) %>%
head(n=25)
Similar to earlier, lets look at the different variable groups but this time at a country level. As there are too many countries to fit in one run, lets start with the top 15:
top = list()
for (i in 1:9) {
top[[i]] <- dt %>%
filter(Rank <= 15) %>%
select(Country,grouping[[i]]) %>%
gather(key,value,-Country) %>%
ggplot(aes(x=Country,y=value)) +
facet_wrap(~ key, scales = 'free') +
geom_bar(stat = 'identity', fill = 'green') +
coord_flip() +
theme_bw() +
labs(title = paste("Overall Top 15 Breakdown of",names(grouping[i])), x = "Country", y = "")
}
attributes are not identical across measure variables; they will be dropped
for (i in 2:9) {
print(top[[i]])
}
It is very interesting to see the strengths and weaknesses of each of the countries. Already a lot is known about the data by studying these graphs.
Let’s look at the bottom 10 as well.
bot = list()
for (i in 1:9) {
bot[[i]] <- dt %>%
filter(Rank > 122) %>%
select(Country,grouping[[i]]) %>%
gather(key,value,-Country) %>%
ggplot(aes(x=Country,y=value)) +
facet_wrap(~ key, scales = 'free') +
geom_bar(stat = 'identity', fill = 'red') +
coord_flip() +
theme_bw() +
labs(title = paste("Overall Bottom 10 Breakdown of",names(grouping[i])), x = "Country", y = "")
}
attributes are not identical across measure variables; they will be dropped
for (i in 2:9) {
print(bot[[i]])
}
I will also deep dive the breakdown within each group. I will focus on the top 15 once again and only use the military categories that make sense for this view.
det_list = list()
for (i in 1:9) {
det_list[[i]] <- dt %>%
filter(Rank < 15) %>%
select(Country,grouping[[i]]) %>%
gather(key,value,-Country) %>%
ggplot(aes(x=Country,y=value,fill = key)) +
geom_bar(stat='identity',alpha = .7) +
coord_flip() +
theme_bw() +
labs(title = paste("Top 15 Breakdown of",names(grouping[i])), x = "Country", y = "")
}
attributes are not identical across measure variables; they will be dropped
for (i in 2:8) {
if (i %in% c(2,3,4,5,8)){
print(det_list[[i]])
}
}
Multiple interesting observations, including the large external debt finances for US and a few EU countries which are not seen for other countries such as China, India and Russia.
Lastly, I’d like to create a more visual representation of the military rankings. I can use the world map info by joining our rankings.
map.world <- map_data(map = 'world')
Attaching package: <U+393C><U+3E31>maps<U+393C><U+3E32>
The following object is masked from <U+393C><U+3E31>package:purrr<U+393C><U+3E32>:
map
map.world <- left_join(map.world, dt[,1:3], by = c('region' = 'Country') )
Column `region`/`Country` joining character vector and factor, coercing into character vector
map.world %>%
ggplot(aes(map_id=region, x=long, y=lat, fill = Rank)) +
geom_map(data=map.world, map=map.world, alpha = .8) +
scale_fill_gradient(low = 'green', high = 'red',guide = 'colorbar') +
coord_equal()
We can see some interesting geographical trends on strength and weakness, with Africa falling quite behind as well as some SE European countries.
Thanks for taking the time to look at a few of these visuals!