Data Set Title: Flag database

Source Information:
(a) Creators: Collected primarily from the “Collins Gem Guide to Flags”: Collins Publishers (1986).
(b) Donor: Richard S. Forsyth, 8 Grosvenor Avenue, Mapperley Park, Nottingham NG3 5DX, 0602-621676
(c) Date: 5/15/1990

In this R Markdown document I convert a subset of the Flag data set from its numeric codes to the full attribute labels.

The raw data set is available here: http://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.data

I selected columns 1 to 3, 6, 8, 9, and 10 from the data set and recoded the categorical ones (landmass, zone, and language) using the data dictionary here: http://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.names. The selected attributes are:

  1. name: Name of the country concerned
  2. landmass: 1=N.America, 2=S.America, 3=Europe, 4=Africa, 5=Asia, 6=Oceania
  3. zone: Geographic quadrant, based on Greenwich and the Equator: 1=NE, 2=SE, 3=SW, 4=NW
  4. language: 1=English, 2=Spanish, 3=French, 4=German, 5=Slavic, 6=Other Indo-European, 7=Chinese, 8=Arabic, 9=Japanese/Turkish/Finnish/Magyar, 10=Others
  5. bars: Number of vertical bars in the flag
  6. stripes: Number of horizontal stripes in the flag
  7. colours: Number of different colours in the flag
library(plyr)  # provides mapvalues()

# Read the raw comma-separated file; it has no header row.
f.data <- file("http://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.data", open = "r")
data <- read.table(f.data, sep = ",", header = FALSE, stringsAsFactors = FALSE)
close(f.data)  # the data is now in memory, so release the connection

# Keep columns 1-3, 6, 8, 9 and 10 and name them after the data dictionary.
subset <- data[, c(1, 2, 3, 6, 8, 9, 10)]
colnames(subset) <- c("name", "landmass", "zone", "language", "bars", "stripes", "colours")

# Replace the numeric codes with their labels and store the result as factors.
subset$landmass <- as.factor(mapvalues(subset$landmass, 1:6, c("N.America", "S.America", "Europe", "Africa", "Asia", "Oceania")))
subset$zone <- as.factor(mapvalues(subset$zone, 1:4, c("NE", "SE", "SW", "NW")))
subset$language <- as.factor(mapvalues(subset$language, 1:10, c("English", "Spanish", "French", "German", "Slavic", "Other Indo-European", "Chinese", "Arabic", "Japanese/Turkish/Finnish/Magyar", "Others")))
library(htmlTable)
summary(subset)
##      name                landmass  zone                   language 
##  Length:194         Africa   :52   NE:91   Others             :46  
##  Class :character   Asia     :39   NW:58   English            :43  
##  Mode  :character   Europe   :35   SE:29   Other Indo-European:30  
##                     N.America:31   SW:16   Spanish            :21  
##                     Oceania  :20           Arabic             :19  
##                     S.America:17           French             :17  
##                                            (Other)            :18  
##       bars           stripes          colours     
##  Min.   :0.0000   Min.   : 0.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.:3.000  
##  Median :0.0000   Median : 0.000   Median :3.000  
##  Mean   :0.4536   Mean   : 1.552   Mean   :3.464  
##  3rd Qu.:0.0000   3rd Qu.: 3.000   3rd Qu.:4.000  
##  Max.   :5.0000   Max.   :14.000   Max.   :8.000  
## 
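
The recoding step can also be written in base R without plyr. The following is only a minimal sketch of that alternative, not the method used in this document: `sample_codes`, `landmass_labels`, and `zone_labels` are hypothetical names, and the three sample rows are the numeric codes implied by the first rows of the recoded output (for example, Asia corresponds to landmass code 5 in the data dictionary).

```r
# Hypothetical three-row sample in the coded form used by flag.data.
sample_codes <- data.frame(
  name     = c("Afghanistan", "Albania", "Algeria"),
  landmass = c(5L, 3L, 4L),
  zone     = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Lookup vectors copied from the data dictionary; position equals the numeric code.
landmass_labels <- c("N.America", "S.America", "Europe", "Africa", "Asia", "Oceania")
zone_labels     <- c("NE", "SE", "SW", "NW")

# factor() pairs each code in `levels` with the label at the same position.
sample_codes$landmass <- factor(sample_codes$landmass,
                                levels = seq_along(landmass_labels),
                                labels = landmass_labels)
sample_codes$zone <- factor(sample_codes$zone,
                            levels = seq_along(zone_labels),
                            labels = zone_labels)

# A code that is missing from the dictionary would become NA, so check for that.
stopifnot(!anyNA(sample_codes$landmass), !anyNA(sample_codes$zone))
```

One difference worth noting: with `factor(levels = , labels = )` the factor levels follow the data dictionary order, whereas `as.factor()` on the mapped strings orders them alphabetically, so a `summary()` of the result may list the categories in a different order.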
htmlTable(head(subset))
  name            landmass  zone  language             bars  stripes  colours
1 Afghanistan     Asia      NE    Others               0     3        5
2 Albania         Europe    NE    Other Indo-European  0     0        3
3 Algeria         Africa    NE    Arabic               2     0        3
4 American-Samoa  Oceania   SW    English              0     0        5
5 Andorra         Europe    NE    Other Indo-European  3     0        3
6 Angola          Africa    SE    Others               0     2        3
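
Once the codes are replaced by labels, grouped summaries become readable without consulting the data dictionary. As a small illustration, the sketch below rebuilds three columns of the six rows shown above by hand (so it runs without downloading the file) and averages the colour count per landmass; `first_rows` and `colours_by_landmass` are hypothetical names introduced here.

```r
# The six rows printed by head(subset), typed in by hand for a self-contained example.
first_rows <- data.frame(
  name     = c("Afghanistan", "Albania", "Algeria",
               "American-Samoa", "Andorra", "Angola"),
  landmass = c("Asia", "Europe", "Africa", "Oceania", "Europe", "Africa"),
  colours  = c(5, 3, 3, 5, 3, 3),
  stringsAsFactors = FALSE
)

# Mean number of colours per landmass, one output row per landmass label.
colours_by_landmass <- aggregate(colours ~ landmass, data = first_rows, FUN = mean)
print(colours_by_landmass)

# Number of flags per landmass in this small sample.
print(table(first_rows$landmass))
```

The same two calls should work on the full `subset` data frame, since it has `landmass` and `colours` columns with the same meaning.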