Data Set Title: Flag database
Source Information:
(a) Creators: Collected primarily from the “Collins Gem Guide to Flags”: Collins Publishers (1986).
(b) Donor: Richard S. Forsyth, 8 Grosvenor Avenue, Mapperley Park, Nottingham NG3 5DX, 0602-621676
(c) Date: 5/15/1990
This R Markdown document takes a subset of the Flag data set and replaces its numeric codes with the full attribute labels.
The data set itself is available here: http://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.data
I selected the first three columns and the 6th, 8th, 9th, and 10th columns, then used the data dictionary (http://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.names) to recode the landmass, zone, and language columns.
library(plyr)  # provides mapvalues()
# Read the raw comma-separated data over an explicit connection, then close it
f.data <- file("http://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.data",
open="r" )
data <- read.table(f.data, sep=",", header=FALSE, stringsAsFactors = FALSE)
close(f.data)
# Keep name, landmass, zone, language, bars, stripes, and colours
subset <- data[, c(1, 2, 3, 6, 8, 9, 10)]
colnames(subset) <- c("name","landmass","zone","language","bars","stripes","colours")
# Replace the numeric codes with their labels from flag.names
subset$landmass <- as.factor(mapvalues(subset$landmass, 1:6, c("N.America","S.America","Europe","Africa","Asia","Oceania")))
subset$zone <- as.factor(mapvalues(subset$zone, 1:4, c("NE","SE","SW","NW")))
subset$language <- as.factor(mapvalues(subset$language, 1:10, c("English", "Spanish", "French", "German", "Slavic", "Other Indo-European", "Chinese", "Arabic", "Japanese/Turkish/Finnish/Magyar", "Others")))
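As a point of comparison, the same kind of recoding can be sketched in base R with `factor()`, without plyr. This is an alternative illustration, not the method used in this document. The `codes` vector is a made-up sample rather than values read from flag.data, and the names `codes` and `landmass_labels` are hypothetical.

```r
# Hypothetical sample of raw landmass codes (not taken from flag.data)
codes <- c(5, 3, 4, 6, 3, 4)

# Labels in code order, following the flag.names data dictionary
landmass_labels <- c("N.America", "S.America", "Europe",
                     "Africa", "Asia", "Oceania")

# factor() maps each code in `levels` to the label at the same position;
# a code outside 1:6 would become NA instead of passing through unchanged
landmass <- factor(codes, levels = 1:6, labels = landmass_labels)

print(landmass)

# Fail loudly if any code was not covered by the dictionary
stopifnot(!anyNA(landmass))
```

One difference to keep in mind: with this approach the factor levels follow the code order (N.America first), whereas `as.factor()` applied to the output of `mapvalues()` orders the levels alphabetically, so a `summary()` of the result would list the categories in a different order.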
library(htmlTable)
summary(subset)
##      name                landmass  zone                   language
##  Length:194         Africa   :52   NE:91   Others             :46
##  Class :character   Asia     :39   NW:58   English            :43
##  Mode  :character   Europe   :35   SE:29   Other Indo-European:30
##                     N.America:31   SW:16   Spanish            :21
##                     Oceania  :20           Arabic             :19
##                     S.America:17           French             :17
##                                            (Other)            :18
##       bars           stripes          colours
##  Min.   :0.0000   Min.   : 0.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.:3.000
##  Median :0.0000   Median : 0.000   Median :3.000
##  Mean   :0.4536   Mean   : 1.552   Mean   :3.464
##  3rd Qu.:0.0000   3rd Qu.: 3.000   3rd Qu.:4.000
##  Max.   :5.0000   Max.   :14.000   Max.   :8.000
##
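The `(Other)` row in the language column appears because `summary()` on a data frame shows at most seven entries per factor: the six most frequent levels plus a combined remainder. The sketch below illustrates that behaviour and one way to see every level, using a small made-up factor rather than the flag data. The names `lang`, `collapsed`, and `full` are hypothetical.

```r
# Hypothetical factor with eight levels (not the real flag data)
lang <- factor(c("English", "English", "English", "Spanish", "Spanish",
                 "French", "German", "Slavic", "Arabic", "Chinese", "Others"))

# With maxsum = 7, summary() keeps the six most frequent levels
# and lumps the remaining ones into "(Other)"
collapsed <- summary(lang, maxsum = 7)
print(collapsed)

# table() reports every level; sort() orders the counts by frequency
full <- sort(table(lang), decreasing = TRUE)
print(full)
```

Applied to the data frame in this document, `table(subset$language)` should list all language categories individually, including those folded into `(Other)` above.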
htmlTable(head(subset))
| | name | landmass | zone | language | bars | stripes | colours |
|---|---|---|---|---|---|---|---|
| 1 | Afghanistan | Asia | NE | Others | 0 | 3 | 5 |
| 2 | Albania | Europe | NE | Other Indo-European | 0 | 0 | 3 |
| 3 | Algeria | Africa | NE | Arabic | 2 | 0 | 3 |
| 4 | American-Samoa | Oceania | SW | English | 0 | 0 | 5 |
| 5 | Andorra | Europe | NE | Other Indo-European | 3 | 0 | 3 |
| 6 | Angola | Africa | SE | Others | 0 | 2 | 3 |