The goal of this assignment is to practice preparing different datasets for downstream analysis. I am using data on the cost of some popular WW2 airplanes. Since costs typically decline, I expect to show that the cost of these airplanes decreased over time.
library(dplyr)
library(tidyr)
library(magrittr)
library(stringr)
library(zoo)
library(ggplot2)
library(ggthemes)
library(extrafont)
I copied the table from the website linked below and saved it as a CSV file (tab-delimited, hence sep="\t" in the call below). I uploaded the file to GitHub and read the untidy data into R from there. https://www.ibiblio.org/hyperwar/AAF/StatDigest/aafsd-3.html
untidy= read.csv(file="https://raw.githubusercontent.com/Vthomps000/DATA607_VT/master/AirplaneCost.csv",header=TRUE,sep="\t",na.strings=c("","NA"))
head(untidy)
Type Model X1941 X1942 X1943 X1944 X1945
1 Very Heavy Bombers <NA>
2 <NA> B-29 <NA> 897,730 605,360 509,465
3 Heavy Bombers <NA>
4 <NA> B-17 301,221 258,949 204,370 187,742
5 <NA> B-24 379,162 304,391 215,516 -
6 <NA> B-32 - 790,433 - 790,433 -
The data is in a poor format. Category headers sit on their own rows, many cells are missing or blank, missing values are marked inconsistently (blank, NA, or "-"), and the currency symbol is used irregularly. I will tidy and reorganize this data with tidyr and dplyr.
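As a small illustration of the forward-fill step used below, `na.locf` from zoo replaces each NA with the most recent non-missing value. The vector here is a made-up stand-in that borrows two category names from the table, and `na.rm = FALSE` is an optional safeguard (not part of the original code) that keeps the result the same length even if the vector starts with NA.

```r
library(zoo)

# Stand-in for the Type column: category headers followed by blank cells
type <- c("Heavy Bombers", NA, NA, "Fighters", NA)

# Carry the last non-missing value forward; na.rm = FALSE keeps leading NAs
filled <- na.locf(type, na.rm = FALSE)
print(filled)
```

With the default `na.rm = TRUE`, leading NAs would be dropped and the result could no longer be assigned back to the column.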
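# Carry each category header down to the model rows beneath it
untidy$Type = na.locf(untidy$Type)
# Trim stray whitespace, then store both columns as factors
untidy$Type = as.factor(str_trim(untidy$Type))
untidy$Model = as.factor(str_trim(untidy$Model))
# Reshape from wide (one column per year) to long (one row per model-year)
untidy = gather(untidy, Year, Cost, X1941:X1945)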
## Warning: attributes are not identical across measure variables;
## they will be dropped
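# Mark any empty model names as missing
untidy$Model[untidy$Model==""] <- NA
untidy$Model = as.factor(untidy$Model)
# Strip the "X" prefix that read.csv added to the year column names
untidy$Year = str_replace(untidy$Year, "X", "")
untidy$Year = as.factor(untidy$Year)
# Remove every thousands separator (str_replace would remove only the first)
untidy$Cost = str_replace_all(untidy$Cost, ",", "")
# Non-numeric entries such as "-" become NA here, hence the warning below
untidy$Cost = as.numeric(untidy$Cost)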
## Warning: NAs introduced by coercion
# Keep only rows that name a model (drops the category header rows)
untidy = untidy[complete.cases(untidy$Model),]
write.csv(untidy,"Tidy_AirplaneCost.csv", row.names=FALSE)
head(untidy,6)
## Type Model Year Cost
## 2 Very Heavy Bombers B-29 1941 NA
## 4 Heavy Bombers B-17 1941 301221
## 5 Heavy Bombers B-24 1941 379162
## 6 Heavy Bombers B-32 1941 NA
## 8 Medium Bombers B-25 1941 180031
## 9 Medium Bombers B-26 1941 261062
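For comparison, the same tidying can be sketched as a single pipeline with `fill` and `pivot_longer`, the newer tidyr interface that supersedes `gather`. This is a minimal sketch, not the code used for the results in this report. The data frame `raw` is a toy stand-in: the four costs are copied from the table, but the "$" prefix is an assumed example of the irregular currency formatting, and the names `raw` and `tidy` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for the raw table; the "$" prefix is an assumed formatting quirk
raw <- data.frame(
  Type  = c("Heavy Bombers", NA, NA),
  Model = c(NA, "B-17", "B-24"),
  X1944 = c(NA, "204,370", "$215,516"),
  X1945 = c(NA, "187,742", "-"),
  stringsAsFactors = FALSE
)

tidy <- raw %>%
  fill(Type) %>%                        # carry category headers down
  filter(!is.na(Model)) %>%             # drop the header-only rows
  pivot_longer(starts_with("X"),
               names_to = "Year",
               values_to = "Cost",
               names_prefix = "X") %>%  # strip the "X" from the year names
  mutate(Cost = na_if(str_trim(Cost), "-"),                 # "-" means missing
         Cost = as.numeric(str_remove_all(Cost, "[$,]")))   # drop "$" and ","

print(tidy)
```

Converting "-" to NA before calling `as.numeric` avoids the coercion warning, so any warning that remains points at a value that was not anticipated.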
untidy
## Type Model Year Cost
## 2 Very Heavy Bombers B-29 1941 NA
## 4 Heavy Bombers B-17 1941 301221
## 5 Heavy Bombers B-24 1941 379162
## 6 Heavy Bombers B-32 1941 NA
## 8 Medium Bombers B-25 1941 180031
## 9 Medium Bombers B-26 1941 261062
## 11 Light Bombers A-20 1941 136813
## 12 Light Bombers A-26 1941 224498
## 13 Light Bombers A-28 1941 NA
## 14 Light Bombers A-29 1941 NA
## 15 Light Bombers A-30 1941 NA
## 17 Fighters P-38 1941 132284
## 18 Fighters P-39 1941 77159
## 19 Fighters P-40 1941 60562
## 20 Fighters P-47 1941 113246
## 21 Fighters P-51 1941 NA
## 22 Fighters P-59 1941 NA
## 23 Fighters P-61 1941 649584
## 24 Fighters P-63 1941 NA
## 25 Fighters P-70 1941 143076
## 26 Fighters P-80 1941 NA
## 28 Reconnaissance OA-10 1941 222799
## 30 Transports C-43 1941 NA
## 31 Transports C-45 1941 67743
## 32 Transports C-46 1941 341831
## 33 Transports C-47 1941 128761
## 34 Transports C-53 1941 1366999
## 35 Transports C-54 1941 516553
## 36 Transports C-60 1941 NA
## 37 Transports C-61 1941 NA
## 38 Transports UC-64 1941 NA
## 39 Transports C-69 1941 NA
## 40 Transports C-74 1941 NA
## 41 Transports C-78 1941 NA
## 42 Transports C-82 1941 NA
## 43 Transports C-87 1941 NA
## 45 Trainers PT-13, PT-17, PT-27 1941 10022
## 46 Trainers PT-19, PT-23, PT-26 1941 9710
## 47 Trainers BT-13, BT-15 1941 25035
## 48 Trainers AT-6 1941 29423
## 49 Trainers AT-7, AT-11 1941 76827
## 50 Trainers AT-8, AT-17 1941 41701
## 51 Trainers AT-9 1941 44321
## 52 Trainers AT-10 1941 43501
## 53 Trainers AT-16 1941 NA
## 54 Trainers AT-19 1941 NA
## 55 Trainers AT-21 1941 NA
## 57 Communications L-1 1941 25419
## 58 Communications L-2 1941 NA
## 59 Communications L-3 1941 NA
## 60 Communications L-4 1941 NA
## 61 Communications L-5 1941 NA
## 62 Communications L-6 1941 NA
## 63 Communications R-4 1941 NA
## 64 Communications R-5, TR-5 1941 NA
## 65 Communications R-6 1941 NA
## 67 Very Heavy Bombers B-29 1942 897730
## 69 Heavy Bombers B-17 1942 258949
## 70 Heavy Bombers B-24 1942 304391
## 71 Heavy Bombers B-32 1942 790433
## 73 Medium Bombers B-25 1942 153396
## 74 Medium Bombers B-26 1942 239655
## 76 Light Bombers A-20 1942 124254
## 77 Light Bombers A-26 1942 NA
## 78 Light Bombers A-28 1942 118704
## 79 Light Bombers A-29 1942 118080
## 80 Light Bombers A-30 1942 155570
## 82 Fighters P-38 1942 120407
## 83 Fighters P-39 1942 69534
## 84 Fighters P-40 1942 59444
## 85 Fighters P-47 1942 105594
## 86 Fighters P-51 1942 58698
## 87 Fighters P-59 1942 NA
## 88 Fighters P-61 1942 245327
## 89 Fighters P-63 1942 60277
## 90 Fighters P-70 1942 NA
## 91 Fighters P-80 1942 NA
## 93 Reconnaissance OA-10 1942 NA
## 95 Transports C-43 1942 49524
## 96 Transports C-45 1942 NA
## 97 Transports C-46 1942 314700
## 98 Transports C-47 1942 109696
## 99 Transports C-53 1942 143479
## 100 Transports C-54 1942 370492
## 101 Transports C-60 1942 126881
## 102 Transports C-61 1942 12208
## 103 Transports UC-64 1942 NA
## 104 Transports C-69 1942 NA
## 105 Transports C-74 1942 NA
## 106 Transports C-78 1942 27470
## 107 Transports C-82 1942 NA
## 108 Transports C-87 1942 NA
## 110 Trainers PT-13, PT-17, PT-27 1942 9896
## 111 Trainers PT-19, PT-23, PT-26 1942 12911
## 112 Trainers BT-13, BT-15 1942 23068
## 113 Trainers AT-6 1942 25672
## 114 Trainers AT-7, AT-11 1942 85688
## 115 Trainers AT-8, AT-17 1942 34323
## 116 Trainers AT-9 1942 44392
## 117 Trainers AT-10 1942 42688
## 118 Trainers AT-16 1942 27564
## 119 Trainers AT-19 1942 26574
## 120 Trainers AT-21 1942 92295
## 122 Communications L-1 1942 NA
## 123 Communications L-2 1942 2770
## 124 Communications L-3 1942 2236
## 125 Communications L-4 1942 2432
## 126 Communications L-5 1942 10165
## 127 Communications L-6 1942 NA
## 128 Communications R-4 1942 NA
## 129 Communications R-5, TR-5 1942 NA
## 130 Communications R-6 1942 NA
## 132 Very Heavy Bombers B-29 1943 NA
## 134 Heavy Bombers B-17 1943 NA
## 135 Heavy Bombers B-24 1943 NA
## 136 Heavy Bombers B-32 1943 NA
## 138 Medium Bombers B-25 1943 151894
## 139 Medium Bombers B-26 1943 212932
## 141 Light Bombers A-20 1943 110324
## 142 Light Bombers A-26 1943 254624
## 143 Light Bombers A-28 1943 NA
## 144 Light Bombers A-29 1943 NA
## 145 Light Bombers A-30 1943 151017
## 147 Fighters P-38 1943 105567
## 148 Fighters P-39 1943 NA
## 149 Fighters P-40 1943 49449
## 150 Fighters P-47 1943 104258
## 151 Fighters P-51 1943 58824
## 152 Fighters P-59 1943 NA
## 153 Fighters P-61 1943 180711
## 154 Fighters P-63 1943 57379
## 155 Fighters P-70 1943 NA
## 156 Fighters P-80 1943 NA
## 158 Reconnaissance OA-10 1943 NA
## 160 Transports C-43 1943 27342
## 161 Transports C-45 1943 66189
## 162 Transports C-46 1943 259268
## 163 Transports C-47 1943 92417
## 164 Transports C-53 1943 150470
## 165 Transports C-54 1943 400831
## 166 Transports C-60 1943 113168
## 167 Transports C-61 1943 13057
## 168 Transports UC-64 1943 36811
## 169 Transports C-69 1943 605456
## 170 Transports C-74 1943 NA
## 171 Transports C-78 1943 33797
## 172 Transports C-82 1943 NA
## 173 Transports C-87 1943 NA
## 175 Trainers PT-13, PT-17, PT-27 1943 NA
## 176 Trainers PT-19, PT-23, PT-26 1943 11100
## 177 Trainers BT-13, BT-15 1943 NA
## 178 Trainers AT-6 1943 NA
## 179 Trainers AT-7, AT-11 1943 68441
## 180 Trainers AT-8, AT-17 1943 NA
## 181 Trainers AT-9 1943 NA
## 182 Trainers AT-10 1943 NA
## 183 Trainers AT-16 1943 27416
## 184 Trainers AT-19 1943 22496
## 185 Trainers AT-21 1943 NA
## 187 Communications L-1 1943 NA
## 188 Communications L-2 1943 2916
## 189 Communications L-3 1943 2460
## 190 Communications L-4 1943 2437
## 191 Communications L-5 1943 NA
## 192 Communications L-6 1943 6065
## 193 Communications R-4 1943 43584
## 194 Communications R-5, TR-5 1943 59488
## 195 Communications R-6 1943 47635
## 197 Very Heavy Bombers B-29 1944 605360
## 199 Heavy Bombers B-17 1944 204370
## 200 Heavy Bombers B-24 1944 215516
## 201 Heavy Bombers B-32 1944 790433
## 203 Medium Bombers B-25 1944 142194
## 204 Medium Bombers B-26 1944 192427
## 206 Light Bombers A-20 1944 100800
## 207 Light Bombers A-26 1944 192457
## 208 Light Bombers A-28 1944 NA
## 209 Light Bombers A-29 1944 NA
## 210 Light Bombers A-30 1944 NA
## 212 Fighters P-38 1944 97147
## 213 Fighters P-39 1944 50666
## 214 Fighters P-40 1944 44892
## 215 Fighters P-47 1944 85578
## 216 Fighters P-51 1944 51572
## 217 Fighters P-59 1944 236299
## 218 Fighters P-61 1944 NA
## 219 Fighters P-63 1944 59966
## 220 Fighters P-70 1944 NA
## 221 Fighters P-80 1944 109471
## 223 Reconnaissance OA-10 1944 216617
## 225 Transports C-43 1944 27332
## 226 Transports C-45 1944 52507
## 227 Transports C-46 1944 233377
## 228 Transports C-47 1944 88574
## 229 Transports C-53 1944 NA
## 230 Transports C-54 1944 285113
## 231 Transports C-60 1944 NA
## 232 Transports C-61 1944 15973
## 233 Transports UC-64 1944 35264
## 234 Transports C-69 1944 NA
## 235 Transports C-74 1944 NA
## 236 Transports C-78 1944 NA
## 237 Transports C-82 1944 478549
## 238 Transports C-87 1944 208780
## 240 Trainers PT-13, PT-17, PT-27 1944 NA
## 241 Trainers PT-19, PT-23, PT-26 1944 15052
## 242 Trainers BT-13, BT-15 1944 NA
## 243 Trainers AT-6 1944 22952
## 244 Trainers AT-7, AT-11 1944 NA
## 245 Trainers AT-8, AT-17 1944 NA
## 246 Trainers AT-9 1944 NA
## 247 Trainers AT-10 1944 NA
## 248 Trainers AT-16 1944 NA
## 249 Trainers AT-19 1944 NA
## 250 Trainers AT-21 1944 NA
## 252 Communications L-1 1944 NA
## 253 Communications L-2 1944 NA
## 254 Communications L-3 1944 NA
## 255 Communications L-4 1944 2620
## 256 Communications L-5 1944 9704
## 257 Communications L-6 1944 NA
## 258 Communications R-4 1944 NA
## 259 Communications R-5, TR-5 1944 50950
## 260 Communications R-6 1944 NA
## 262 Very Heavy Bombers B-29 1945 509465
## 264 Heavy Bombers B-17 1945 187742
## 265 Heavy Bombers B-24 1945 NA
## 266 Heavy Bombers B-32 1945 NA
## 268 Medium Bombers B-25 1945 116752
## 269 Medium Bombers B-26 1945 NA
## 271 Light Bombers A-20 1945 NA
## 272 Light Bombers A-26 1945 175892
## 273 Light Bombers A-28 1945 NA
## 274 Light Bombers A-29 1945 NA
## 275 Light Bombers A-30 1945 NA
## 277 Fighters P-38 1945 NA
## 278 Fighters P-39 1945 NA
## 279 Fighters P-40 1945 NA
## 280 Fighters P-47 1945 83001
## 281 Fighters P-51 1945 50985
## 282 Fighters P-59 1945 NA
## 283 Fighters P-61 1945 199598
## 284 Fighters P-63 1945 65914
## 285 Fighters P-70 1945 NA
## 286 Fighters P-80 1945 71840
## 288 Reconnaissance OA-10 1945 207541
## 290 Transports C-43 1945 NA
## 291 Transports C-45 1945 48830
## 292 Transports C-46 1945 221550
## 293 Transports C-47 1945 85035
## 294 Transports C-53 1945 NA
## 295 Transports C-54 1945 259816
## 296 Transports C-60 1945 NA
## 297 Transports C-61 1945 NA
## 298 Transports UC-64 1945 32427
## 299 Transports C-69 1945 NA
## 300 Transports C-74 1945 NA
## 301 Transports C-78 1945 NA
## 302 Transports C-82 1945 310233
## 303 Transports C-87 1945 NA
## 305 Trainers PT-13, PT-17, PT-27 1945 NA
## 306 Trainers PT-19, PT-23, PT-26 1945 NA
## 307 Trainers BT-13, BT-15 1945 NA
## 308 Trainers AT-6 1945 NA
## 309 Trainers AT-7, AT-11 1945 NA
## 310 Trainers AT-8, AT-17 1945 NA
## 311 Trainers AT-9 1945 NA
## 312 Trainers AT-10 1945 NA
## 313 Trainers AT-16 1945 NA
## 314 Trainers AT-19 1945 NA
## 315 Trainers AT-21 1945 NA
## 317 Communications L-1 1945 NA
## 318 Communications L-2 1945 NA
## 319 Communications L-3 1945 NA
## 320 Communications L-4 1945 2701
## 321 Communications L-5 1945 8323
## 322 Communications L-6 1945 NA
## 323 Communications R-4 1945 NA
## 324 Communications R-5, TR-5 1945 NA
## 325 Communications R-6 1945 NA
I now compare costs over time for a few fighter and bomber models, starting with the fighters.
target = c("P-38", "P-40", "P-47", "P-51")
fighter = filter(untidy, Model %in% target)  # keep only the selected fighters
# One line per model; cost is plotted in thousands of dollars
g = ggplot(data = fighter, aes(x = Year, y = Cost*0.001, group = Model, color = Model))
g = g + geom_line(stat = "identity", size = 1.6)
g = g + ggtitle("Cost of Fighters")
g = g + ylab("Cost ($K)") + xlab("Year")
g = g + theme_get()
# Center the title and set the font
g = g + theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 13, family = "Times"))
g
Next, I make the same comparison for the bombers.
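target = c("B-17", "B-24", "B-25", "B-26")
bomber = filter(untidy, Model %in% target)  # keep only the selected bombers
# na.omit() drops the missing years, so each line connects across its gaps
# (for example, B-17 has no 1943 value)
g = ggplot(data = na.omit(bomber), aes(x = Year, y = Cost*0.001, group = Model, color = Model))
g = g + geom_line(stat = "identity", size = 1.6)
g = g + ggtitle("Cost of Bombers")
g = g + ylab("Cost ($K)") + xlab("Year")
g = g + theme_get()
# Center the title and set the font
g = g + theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 13, family = "Times"))
g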
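The two plots share almost all of their code, so one possible refactoring is a small helper function. The sketch below assumes a hypothetical helper named `plot_cost`; it omits the font setting and line width for brevity, and it is demonstrated on a four-row stand-in (`demo`, with costs copied from the tidy table above) rather than on the full data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: line plot of cost by year for a chosen set of models
plot_cost <- function(data, models, title) {
  data %>%
    filter(Model %in% models, !is.na(Cost)) %>%   # drop missing costs up front
    ggplot(aes(x = Year, y = Cost / 1000, group = Model, color = Model)) +
    geom_line() +
    labs(title = title, x = "Year", y = "Cost ($K)") +
    theme(plot.title = element_text(hjust = 0.5))
}

# Small stand-in with values copied from the tidy table
demo <- data.frame(
  Model = c("B-17", "B-17", "B-24", "B-24"),
  Year  = factor(c("1941", "1942", "1941", "1942")),
  Cost  = c(301221, 258949, 379162, 304391)
)

p <- plot_cost(demo, c("B-17", "B-24"), "Cost of Bombers")
```

Calling `plot_cost(untidy, target, "Cost of Fighters")` with the fighter models would then produce the first chart from the same code path.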
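The data and graphs are consistent with my original hypothesis: for the selected fighters and bombers, cost decreased substantially over time. For example, the B-17 fell from $301,221 in 1941 to $187,742 in 1945, and the P-47 fell from $113,246 to $83,001 over the same period.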
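One way to put a number on the trend is the percentage change between the first and last year for which a model has a cost. The sketch below is an illustration rather than part of the original analysis: the data frame `costs` holds six values copied from the tidy table for three models, and the names `costs` and `change` are hypothetical.

```r
library(dplyr)

# First and last available costs for three models, copied from the tidy table
costs <- data.frame(
  Model = c("B-17", "B-17", "P-47", "P-47", "P-51", "P-51"),
  Year  = c(1941, 1945, 1941, 1945, 1942, 1945),
  Cost  = c(301221, 187742, 113246, 83001, 58698, 50985)
)

change <- costs %>%
  filter(!is.na(Cost)) %>%          # ignore years with no recorded cost
  arrange(Model, Year) %>%          # make first()/last() follow time order
  group_by(Model) %>%
  summarise(first_cost = first(Cost),
            last_cost  = last(Cost),
            pct_change = 100 * (last_cost - first_cost) / first_cost)

print(change)
```

A negative `pct_change` means the model became cheaper. Running the same summary on the full tidy data would show which models, if any, go against the trend.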