Introduction

Across the world, citizens take great pride in their national flags. Flags are highly symbolic and are often at the core of a nation’s identity. In the United States, we celebrate our Independence Day (4th of July) by flying the American flag high in the sky and dressing up in the “Red, White, and Blue” colors. National flags evoke feelings of patriotism, pride, and freedom. They are not merely pieces of cloth like everyday t-shirts. The symbolism also holds true when it comes to negative connotations. For example, burning or destroying a national flag signals rebellion or displeasure with the government. In many countries, these actions carry severe punishments. All of this is to say that when a flag is adopted, symbolism is naturally associated with it.

Additionally, flags can be associated with military presence or political power. In regions of the world with unstable governments, flags can change frequently, as shown here in this YouTube Video (17:09) - The World: Timeline of National Flags: 1019-2020. Nearly every current flag was adopted after 1800.

Global Overview of the Different Flags

Below, we can note some interesting patterns associated with certain regions of the world. Some regions have highly similar flag colors or geometric patterns. For example:

  • In Northern European countries, many flags have a cross in them.
  • Many European flags have the tricolor horizontal or vertical bands.
  • In West Africa, there are a lot of red, yellow, and green color combinations.
  • In Northern Africa and the Middle East, there are many flags with black, red, white, and/or green.
  • In the northern part of South America, we see a lot of the tricolor yellow, blue, and red bands.
  • In Oceania (Australia), there are many flags with blue, white and red.

Some flags are highly reflective of previous colonizers. For example, Australia, New Zealand, and the United States all have the red, white, and blue color patterns, derived from the previous United Kingdom colonization. Many flags share the same symbols or objects, such as stars, the moon, or the sun.

Commons.Wikimedia.org: Flag Map of the World (2022)

Research Question

As shown above, it seems like some regions have specific colors associated with them. Today, we will research the question:

  1. Is there a relationship between the national flag colors and the different regions of the world?

Previous Studies

Although the idea for this analysis came to me naturally, I am not the first person to want to study the colors of national flags. The study of flags is called vexillology, and there is a community that devotes time to studying flags, flag history, etc… Below, I link to some of the previous studies that I found interesting. The biggest difference between these studies’ findings comes from how different “vexillologists” group the colors. For example, some separate light blue from dark blue, some look at common color combinations, and some differentiate between the presence of a color (regardless of amount) and the total area on a flag that a color occupies. This goes to show that deciding which colors are present in a flag can be a subjective process.

Here are some previous studies conducted:

My Study

In previous studies that I read, the methodology behind color identification wasn’t specified. As mentioned, color identification can be highly subjective, so today I will go into a lot of detail behind how I choose to quantify and group colors. To do so, I discuss how images are read in by computers as “data” and the different techniques of image processing.

Data Collection and Specifying Regions

Data Source

All of the images of the flags are free to download from the 2020 CIA World Factbook. The Factbook download contains the entire website, including HTML pages, style sheets, and raw data. Outside of flag analysis, this could be a good data source for global research questions, including those related to economy, transportation, demography, military, etc… There are 268 different HTML pages associated with different countries, territories, oceans, and other areas. Using the rvest library makes data collection from HTML pages really easy.

Deciding Which Flags to Use: 193 UN Member States

In the CIA World Factbook, I have access to 256 different flags for various countries, territories, dependencies, and other areas. For my analysis of the relationship between flag colors and global regions, I will only look at the 193 countries acknowledged as “sovereign states” by the United Nations (UN). In order to be a member of the UN, a country must be self-governing and voted into the UN. Here is a link to the UN Member States. Below are the various member states, plotted with most-recently admitted countries colored darker blue. A full list of each country and the year they were voted into the United Nations can be found in Appendix A.

By only using the countries recognized by the United Nations, we exclude the flags of “sovereign” states that aren’t recognized as such by the UN. This includes the permanent UN Observers, Vatican City and Palestine, as well as Taiwan, which is a former UN member. All of these places have unique flags, and some of the previous studies I read included them, but I will not. Additionally, territories, commonwealths, smaller islands, and other dependencies are not included in this analysis. Appendix B shows some of the flags not included in this analysis.

Defining the World Regions

Below, we can see the 10 different regions defined by the CIA World Factbook. Personally, I do not like the groupings they have created, mostly because Africa and Europe have a lot of countries, whereas North America only has 3 countries.

Instead of the CIA World Factbook’s regions (outlined above), I will define my own regions, shown below, by mixing the Factbook’s regions with the regions of the United Nations geoscheme. I grouped North America and Central America. I separated Northern Europe (all of the flags with crosses) from the rest of Europe. I separated Africa into West Africa, Northern Africa (combined with the Middle East), and the rest of Africa (South and East). I left the Asia and Oceania groups the same as the CIA World Factbook. I could have combined Central Asia and South Asia to get more countries in each region, but there weren’t any similarities in these countries’ flags or their histories. Appendix A has a full list of the countries, country codes, and regions I assigned each to.

Here are the total number of countries in each of my different regions:

Region Countries
Africa 26
Australia - Oceania 13
Central Asia 6
East Asia/Southeast Asia 17
Europe 33
North/Central America 23
Northern Africa/Middle East 23
Northern Europe 10
South America 12
South Asia 8
Western Africa 22

Methodology: Color Identification

RGB Values (Images = Data)

Any image can be read in as a combination of Red, Green, and Blue (RGB) values. Three values (each between 0 and 1, or between 0 and 255) are assigned to each pixel in an image. For example, we can read in the U.S. flag using the readJPEG() function from the jpeg library and display its dimensions. We see that this image is 263 pixels tall, 500 pixels wide, and has 3 channels, stored as an array with 394,500 elements (263*500*3). The rasterImage() function from the graphics library converts the RGB values back into an image for display.

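library(jpeg)  # provides readJPEG()

# Read the flag as a height x width x channel array of values in [0, 1]
US <- readJPEG("attachments/flags/US-flag.jpg")
dim(US)
## [1] 263 500   3
length(US)
## [1] 394500
# Set up an empty plot with the image's pixel dimensions, then draw the image
plot(x = c(0, 500), y = c(0, 263),
     type = "n", xaxt = "n", yaxt = "n", frame.plot = FALSE,
     xlab = "United States Flag",
     ylab = "")
rasterImage(US, 0, 0, 500, 263)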
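To make the "images = data" idea concrete without needing the flag file, here is a minimal sketch in base R. The 2 x 3 toy image is a made-up stand-in for a real flag; it has the same height x width x channel layout that readJPEG() returns.

```r
# Toy 2 x 3 "image": rows x columns x 3 channels (R, G, B), values in [0, 1]
img <- array(0, dim = c(2, 3, 3))
img[, 1, 1] <- 1   # first column: pure red
img[, 2, ]  <- 1   # second column: white (all three channels at 1)
img[, 3, 3] <- 1   # third column: pure blue

dim(img)      # rows, columns, channels
length(img)   # total number of stored values (2 * 3 * 3)

# Flatten to one row per pixel, then convert each pixel to a hex code
pixels <- cbind(R = as.vector(img[, , 1]),
                G = as.vector(img[, , 2]),
                B = as.vector(img[, , 3]))
hex <- rgb(pixels[, "R"], pixels[, "G"], pixels[, "B"])
unique(hex)
```

Each pixel is just three numbers, so any summary of a flag's colors is a summary of the rows of a matrix like `pixels`.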

Plotting RGB Values in 3D graphs

Since any image (not just flags) can be represented as combinations of red, green, and blue values, it is possible to plot any image on a 3D graph to see the spread of the different color values. The plotPixels() function from the colordistance package makes it really easy to do this. By giving the function the image path and specifying a random sample of n = 10,000 points, we can start to see the clusters of red, white, and blue values.

library(colordistance)  # provides plotPixels()

set.seed(1776)  # make the random pixel sample reproducible
plotPixels("attachments/flags/US-flag.jpg", 
           lower=NULL, upper = NULL, n =10000, pch=10)

Issue with Color Identification

Although our eyes can look at the U.S. flag and see that it is made up of 3 colors (red, white, and blue), computers read the various pixels as slight variations of these colors. For example, below are the 10 most common colors (as hex codes) with their pixel counts. The top color, #FFFFFF (white), makes up only 15.9% of the pixels (20,961 of 131,500). We also see other variations of white (all of the hex codes starting with #F…..) and variations of red (#E…..).

## #FFFFFF #FFFEFF #ED1B24 #FFFDFF #FEFFFF #EB1C22 #F01A26 #EE1B21 #ED1B26 #EA1D22 
##   20961    2947    2534    2248    2205    2063    2058    1745    1574    1403 
## (Other) 
##   91762

By counting the distinct hex codes, we see there are a total of 9,742 unique colors in this one image of the U.S. flag. Although our eye recognizes only 3 distinct colors, the computer reads in nearly ten thousand. This could be due to poor image quality and “noise”, or to pixels that fall on the border of red/white, blue/white, etc… and contain a mix of these colors.

## [1] 9742
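The effect described above can be reproduced on synthetic data. The sketch below is only an illustration: the two-color image and the noise level (sd = 0.01) are assumptions chosen to mimic compression artifacts, not measurements from the actual flag file.

```r
set.seed(1)
n <- 1000

# Clean image: half pure red pixels, half pure white pixels
clean <- rbind(matrix(rep(c(1, 0, 0), each = n / 2), ncol = 3),
               matrix(1, nrow = n / 2, ncol = 3))

# Add small noise (hypothetical stand-in for JPEG artifacts), clamp to [0, 1]
noisy <- clean + matrix(rnorm(n * 3, sd = 0.01), ncol = 3)
noisy[] <- pmin(pmax(noisy, 0), 1)

to_hex <- function(m) rgb(m[, 1], m[, 2], m[, 3])

length(unique(to_hex(clean)))   # number of distinct colors before noise
length(unique(to_hex(noisy)))   # number of distinct colors after noise

# Most common hex codes and their pixel counts
head(sort(table(to_hex(noisy)), decreasing = TRUE), 5)
```

Even a tiny perturbation splits each true color into many neighboring hex codes, which is why counting exact hex codes overstates the number of colors a viewer would report.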

“Binning” Color Values into 8 Color Groups

To solve the issue of having many, many unique colors when we expect only a few, we can “bin” each pixel. From the colordistance package, we can use the getImageHist() function to set upper and lower limits for each of the RGB channels. For example, any pixel with R > 0.7, G > 0.7, and B > 0.7 could be labeled “White”, a pixel with R > 0.7, G < 0.3, and B < 0.3 could be “Red”, etc… If we increase the number of bins, we get more colors and the bins are more specific. If we decrease the number of bins, we get fewer color groups and the groupings are easier to define. Since most flags only have 4-5 recognizably different colors and I think a total of 8 colors is easy to explain, I will use 2*2*2 bins. This means that we use 0.5 as a cutoff value for each of the channels and bin each pixel as follows:

Defined Color Bins
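As a rough sketch of what the 2*2*2 binning does, the base R function below assigns a pixel to one of the 8 groups. The group numbering (1 = black, 2 = red, 3 = green, 4 = yellow, 5 = blue, 6 = pink/purple, 7 = light blue/green, 8 = white) is inferred from the labels used later in this analysis. The function name `bin_pixel`, the strict `>` comparison, and the example pixels are assumptions for illustration, not the internals of getImageHist().

```r
# Assumed numbering: 1 + R + 2*G + 4*B, where each term counts only when
# that channel exceeds the cutoff
bin_pixel <- function(r, g, b, cutoff = 0.5) {
  1 + (r > cutoff) + 2 * (g > cutoff) + 4 * (b > cutoff)
}

bin_labels <- c("black", "red", "green", "yellow", "blue",
                "pink/purple", "light blue/green", "white")

# Hypothetical pixels: a red, a blue, a white, and a second red
px <- data.frame(r = c(0.92, 0.02, 0.98, 0.95),
                 g = c(0.12, 0.40, 0.96, 0.10),
                 b = c(0.15, 0.69, 0.97, 0.20))

bins <- bin_pixel(px$r, px$g, px$b)
bin_labels[bins]

# Share of pixels falling into each of the 8 groups
prop.table(table(factor(bins, levels = 1:8, labels = bin_labels)))
```

With a single cutoff per channel, the only tuning decision is the cutoff itself, which is what makes the approach easy to explain.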

After binning the U.S. flag into those 8 color groups (2 for each channel), we see 3 main colors: red, white, and blue with percentages that accurately reflect what our eyes can see.

## RGB and HSV are device-dependent, perceptually non-uniform color spaces. See 'Color spaces' vignette for more information.
## 
## Using 2*2*2 = 8 bins

K-Means clustering (another possible color identification technique)

From the colordistance package, we can use the getKMeanColors() function, specifying the number of clusters. This function finds the cluster centers (average RGB values) that best describe the image and reports these RGB values. Then we can tabulate the average RGB values using the extractClusters() function from the same package.

R G B Pct
0.9210873 0.1233405 0.1571953 0.4110
0.0398685 0.4077117 0.6969841 0.1745
0.9791840 0.9687788 0.9715376 0.4145

We can see the average RGB values and the area percentage of each color on the flag are not much different using the K-Means approach (above) and the binning technique (below).

r g b Pct
2 0.9197430 0.1154069 0.1490103 0.4076198
5 0.0245151 0.3995950 0.6941691 0.1702966
8 0.9803288 0.9635887 0.9663571 0.4141369

When doing k-means clustering, it is natural to use a scree plot and the “elbow” method to choose the number of clusters. For example, we could run through each flag and use 1 cluster, 2 clusters, … all the way up to 10 clusters, then plot the sum of squares in a scree plot and use the “elbow” method to select the number of clusters. The “elbow” method looks for the number of clusters that explains most of the variance (between SS / total SS), beyond which adding more clusters doesn’t explain much more. Of course, for the U.S. flag, the best number would be 3 clusters (red, white, and blue).
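The elbow idea can be sketched with base R's kmeans() on synthetic pixels. This is an illustration only: the three cluster centers and pixel shares are rounded from the U.S. flag table above, and the noise level (sd = 0.02) is an assumption. It uses stats::kmeans directly rather than the colordistance wrappers.

```r
set.seed(1776)

# Hypothetical three-color "flag": 410 red, 170 blue, 420 white pixels
centers <- rbind(red   = c(0.92, 0.12, 0.15),
                 blue  = c(0.04, 0.41, 0.70),
                 white = c(0.98, 0.97, 0.97))
counts <- c(410, 170, 420)
pix <- centers[rep(1:3, times = counts), ]
pix <- pix + matrix(rnorm(length(pix), sd = 0.02), ncol = 3)  # assumed noise
colnames(pix) <- c("R", "G", "B")

# Proportion of variance explained (between SS / total SS) for k = 1..6
ks <- 1:6
explained <- sapply(ks, function(k) {
  fit <- kmeans(pix, centers = k, nstart = 25)
  fit$betweenss / fit$totss
})
round(explained, 3)

# Cluster centers and area shares at the elbow (k = 3)
km3 <- kmeans(pix, centers = 3, nstart = 25)
cbind(round(km3$centers, 2), Pct = km3$size / nrow(pix))
```

Passing the results to `plot(ks, explained, type = "b")` draws the scree plot. On data like this the curve jumps to nearly 1 at k = 3 and then flattens, which is the "elbow".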

Although it’s pretty obvious with the U.S. flag that there are only 3 colors, some flags are harder to interpret. For example, the Saint Pierre and Miquelon flag (a French archipelago) may have anywhere between 3 and 6 clusters. When a distinct “elbow” is not present, the number of clusters you choose is open to interpretation. Some statisticians may look to explain at least 90% of the total variance. Others may look for clusters that explain at least 10% more than the previous number of clusters.

This is the flag of Saint Pierre and Miquelon. Its complexity makes the number of clusters hard to interpret from the scree plot. Additionally, the scree plot is the proportion of Between Sum of Squares and Total Sum of Squares, not accounting for the Within Sum of Squares: how points deviate from the cluster means.

R G B Pct
0.1450245 0.5314314 0.3917892 0.04000
0.9274081 0.8092170 0.1227936 0.13045
0.7619533 0.1788976 0.1808250 0.09695
0.1052367 0.6447611 0.8602119 0.51535
0.9540612 0.9696906 0.9719438 0.13445
0.0921948 0.0702046 0.0422374 0.08280

Although K-means might be a viable approach, we can see that there might be some subjectivity involved with selecting the correct number of clusters, especially when it comes to more complex images. In my opinion, it isn’t as intuitive as the binning approach. With that being said, the binning approach is a little more “naive” and selecting the right cutoff values can also be somewhat subjective. The binning approach doesn’t do well with colors really close to the cutoff values. For example, certain shades of blue might look really similar, but might fall into two different color groups. Perhaps, a hybrid approach might be best: using one approach for simpler flags and the other for flags with colors close to the bin cutoff values.

Exploratory Analysis

Above, I outline two different approaches to identifying colors: the binning approach and the k-means clustering approach. The binning approach, to me, is more intuitive and a little easier to explain. The binning approach might not explain quite as much variability from flag-to-flag and a given color that is really close to the cutoff values might be put into two different color groups, whereas k-means clustering handles this specific instance a little better. There are times where each approach might better handle certain flags. For this analysis and the sake of simplicity, I will use the binning approach with 8 color groups, outlined previously. After binning all of the 193 UN member flags, we can now start to explore the flag colors.

Flags with highest % area for each color group

For each of the color groups, we can look at the flags with the most area encompassed by these colors, shown below. For groups 2 (red), 3 (green), 4 (yellow), 5 (blue), and 7 (light blue-green), the flags and percentages make sense. China’s flag is 95.7% red, Saudi Arabia’s flag is 90.4% green, Brunei’s flag is 53.8% yellow, Nauru’s flag is 86.6% blue, and Micronesia’s flag is 92.9% light blue-green. However, for color groups 1, 6, and 8, we can see the issues that arise with binning. Group 1 was supposed to represent black, but Turkmenistan’s flag is nowhere near 82.5% black. Its dark green has a green channel value of about 0.43, which is below the 0.5 cutoff, so instead of landing in color group 3 it lands in color group 1. Similarly, for color group 8, the Somalia flag is not nearly 100% white, but its light blue has all three RGB values > 0.5, so it is placed in the white group. Afghanistan’s flag does not contain 0.4% pink/purple, but the pixels along the border of the red and green stripes are blended and placed in color group 6. These examples show where k-means might be a better approach.

ColorGroup r g b pct Region Country Code
1 0.0091817 0.4276603 0.3155710 0.8252108 Central Asia Turkmenistan TX
2 0.9297423 0.1330955 0.1580773 0.9570149 East Asia/Southeast Asia China CH
3 0.0329333 0.5684799 0.2783894 0.9035003 Northern Africa/Middle East Saudi Arabia SA
4 0.9769573 0.8786282 0.0181822 0.5375680 East Asia/Southeast Asia Brunei BX
5 0.0045805 0.3976509 0.7013764 0.8660480 Australia - Oceania Nauru NR
6 0.9597217 0.4717743 0.5356438 0.0041904 South Asia Afghanistan AF
7 0.4818160 0.6861815 0.8707143 0.9289302 Australia - Oceania Micronesia, Federated States of FM
8 0.5855673 0.8584932 0.9717104 0.9999451 Africa Somalia SO
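One hypothetical way to spot bins that sit close to a cutoff is to measure how far each mean color is from 0.5 on its nearest channel. The sketch below uses the mean RGB values from the table above. The `margin` diagnostic is my own illustration, not part of colordistance or of the analysis itself.

```r
# Mean RGB values copied from the table above (one row per color group)
top <- data.frame(
  country = c("Turkmenistan", "China", "Saudi Arabia", "Brunei",
              "Nauru", "Afghanistan", "Micronesia", "Somalia"),
  group = 1:8,
  r = c(0.0091817, 0.9297423, 0.0329333, 0.9769573,
        0.0045805, 0.9597217, 0.4818160, 0.5855673),
  g = c(0.4276603, 0.1330955, 0.5684799, 0.8786282,
        0.3976509, 0.4717743, 0.6861815, 0.8584932),
  b = c(0.3155710, 0.1580773, 0.2783894, 0.0181822,
        0.7013764, 0.5356438, 0.8707143, 0.9717104)
)

# Hypothetical diagnostic: smallest distance from any channel to the 0.5 cutoff.
# A small margin means a slightly different shade would change bins.
rgb_mat <- as.matrix(top[, c("r", "g", "b")])
top$margin <- apply(abs(rgb_mat - 0.5), 1, min)

top[order(top$margin), c("country", "group", "margin")]
```

Turkmenistan's green channel sits about 0.07 below the cutoff, so a slightly brighter green would have moved most of that flag from group 1 to group 3.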

Area % distribution of flag colors

Continuing forward, we can look at the distribution of each of the color groups, after filtering out any color that covers < 2% of a flag. Filtering out colors < 2% removes minor details and “noise”. In doing so, we notice that the pink/purple color (group 6) does not cover more than 2% of any country’s flag. We can see that a lot of flags have the red and white colors, with relatively few using the black (1) or light blue/green color (7).

Also notice the large spikes in the 30-40% range. These are indicative of tricolor (three-colored) flags, which may have roughly a 33-33-33 split. Similarly, the red spike around 50% is indicative of bicolor (two-colored) flags.

Most dominant colors

Rather than looking at the presence of colors, we want to look at the #1 or most “dominant” flag colors. Based on the % area encompassed by each of the color groups, we can analyze which colors appear to be the most dominant. Essentially these values are the color groups we would pick if we had to select just one to represent the flag. Clearly color group 2 (red) and color group 8 (white) are the most popular flag colors. These results are consistent with previous studies’ findings.

ColorGroup n
1 17
2 59
3 21
4 22
5 19
7 12
8 46
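Picking the dominant color amounts to taking the column with the largest area share for each flag. A minimal sketch with three made-up flags follows. The matrix `pct` and its values are hypothetical, not taken from the data set.

```r
# Rows: hypothetical flags; columns: area share in color groups 1..8
pct <- rbind(
  flag_a = c(0, 0.41, 0,    0,    0.17, 0, 0, 0.42),
  flag_b = c(0, 0.96, 0,    0.04, 0,    0, 0, 0),
  flag_c = c(0, 0.34, 0.33, 0.33, 0,    0, 0, 0)
)
colnames(pct) <- 1:8

# Dominant color group per flag: index of the largest share in each row
dominant <- apply(pct, 1, which.max)
dominant

# Tally how many flags have each group as their dominant color
table(factor(dominant, levels = 1:8))
```

Note that which.max() returns the first maximum, so exact ties would need an explicit rule.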

As we saw visually in the introduction, Western Africa has a large presence of yellows and greens. Northern Africa has a lot of blacks and whites. South America has reds and yellows. South/East Asia has a lot of reds and whites. Now, we can see a similar pattern when looking at the #1 colors in each of the countries’ flags.

Number of Colors in each Flag

Next, we look at the distribution of the number of colors present in each flag. Using a cutoff value of 5% to get only the main colors, we see that most countries have 3, 2, or 4 flag colors. The only 6-colored flag belongs to South Africa, which has black, yellow, green, white, red, and blue present in its flag. We see that throughout Europe, every flag has exactly 2 or 3 colors. In North, Central, and South America, most countries have 3 colors. Northern Africa/Middle East, South Asia, and East Asia/Southeast Asia have a similar distribution of 3, 2, and 4 colors. Australia/Oceania has a uniform distribution of 2-, 3-, and 4-colored flags. Western Africa has a very large number of 3-colored flags.

n_colors n_flags
1 5
2 53
3 103
4 23
5 8
6 1

Unique Color Combinations

After concatenating all of the 1’s and 0’s (presence in color groups) in Excel, we can summarise the most common color combinations as follows:
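The presence flags, the color counts, and the concatenation step can also be done in R rather than Excel. The sketch below uses four made-up flags. The matrix `pct` is hypothetical, and the 5% cutoff matches the one used for counting colors above.

```r
# Rows: hypothetical flags; columns: area share in color groups 1..8
pct <- rbind(
  flag_a = c(0, 0.41, 0, 0,    0.17, 0, 0, 0.42),
  flag_b = c(0, 0.50, 0, 0,    0,    0, 0, 0.50),
  flag_c = c(0, 0.34, 0, 0,    0.33, 0, 0, 0.33),
  flag_d = c(0, 0.96, 0, 0.04, 0,    0, 0, 0)
)

# 1 if a color group covers at least 5% of the flag, else 0
present <- (pct >= 0.05) * 1L

# Number of main colors per flag
n_colors <- rowSums(present)
n_colors

# Concatenate the eight 0/1 flags into one combination string per flag
combo <- apply(present, 1, paste, collapse = "")
combo

# Most common color combinations
sort(table(combo), decreasing = TRUE)
```

Here `flag_a` and `flag_c` share the string for red, blue, and white, so that combination is counted twice.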

Significance Tests

Above, we explored the most common flag colors, the colors that take up the most area on flags, color combinations, and the number of colors present on flags. Now, we will test whether any of these results differ significantly from region to region. Using a cutoff of 5%, we look at which colors are present in flags (1 if a color covers at least 5% of the flag, 0 if it covers < 5%).

## , , present = 0
## 
##                              colorgroup
## regions                        1  2  3  4  5  6  7  8
##   Africa                      14  6  8 11 19 26 20 10
##   Australia - Oceania         12  5 11  7  5 13  9  4
##   Central Asia                 5  2  4  4  5  6  4  3
##   East Asia/Southeast Asia    13  0 16 12  9 17 17  5
##   Europe                      30  5 28 21 21 33 29 13
##   North/Central America       15  9 18 14 11 23 20  7
##   Northern Africa/Middle East 14  2 13 22 22 23 21  4
##   Northern Europe              9  4  8  7  4 10 10  2
##   South America               11  3  8  6  7 12 10  5
##   South Asia                   5  2  4  5  7  8  8  3
##   Western Africa              22  5  3  7 16 22 20 13
## 
## , , present = 1
## 
##                              colorgroup
## regions                        1  2  3  4  5  6  7  8
##   Africa                      12 20 18 15  7  0  6 16
##   Australia - Oceania          1  8  2  6  8  0  4  9
##   Central Asia                 1  4  2  2  1  0  2  3
##   East Asia/Southeast Asia     4 17  1  5  8  0  0 12
##   Europe                       3 28  5 12 12  0  4 20
##   North/Central America        8 14  5  9 12  0  3 16
##   Northern Africa/Middle East  9 21 10  1  1  0  2 19
##   Northern Europe              1  6  2  3  6  0  0  8
##   South America                1  9  4  6  5  0  2  7
##   South Asia                   3  6  4  3  1  0  0  5
##   Western Africa               0 17 19 15  6  0  2  9

Below, we go color-by-color and use a chi-square test to see whether certain colors are more or less present in some regions than in others. We first report the chi-square test with d.o.f. = 10 (11 regions - 1), then use a simulated p-value to check whether the relationship holds. Every p-value (non-simulated and simulated) is < 0.05, indicating that an association exists within each color group. The mosaic plots help us understand which colors have higher or lower associations with certain regions. By looking at the mosaic plots, we see:

  • (South and East) Africa has a significantly higher association with color group 1: black.
  • For the color red, no region has a standardized residual beyond +/- 2.
  • The color green has a positive association with West and (South/East) Africa, but a negative association with Europe and East/Southeast Asia, where very few flags have the color green.
  • The gold/yellow color is highly associated with West Africa, but negatively associated with Northern Africa/Middle East.
  • The presence of the color blue is negatively associated with Northern Africa and the Middle East.
  • Lastly, the pink/purple, light blue/green, and white colors do not have any standardized residuals beyond +/- 2, indicating that no region is strongly associated with these colors; rather, certain regions have a little more or less of these colors than others.
## 
##  Pearson's Chi-squared test
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 79.322, df = 10, p-value = 6.818e-13
## 
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 79.322, df = NA, p-value = 0.0004998

## 
##  Pearson's Chi-squared test
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 69.931, df = 10, p-value = 4.572e-11
## 
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 69.931, df = NA, p-value = 0.0004998

## 
##  Pearson's Chi-squared test
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 32.24, df = 10, p-value = 0.0003652
## 
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 32.24, df = NA, p-value = 0.0004998

## 
##  Pearson's Chi-squared test
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 22.229, df = 10, p-value = 0.01398
## 
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 22.229, df = NA, p-value = 0.01049

## 
##  Pearson's Chi-squared test
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 37.86, df = 10, p-value = 4.014e-05
## 
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 37.86, df = NA, p-value = 0.0004998

## 
##  Pearson's Chi-squared test
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 193, df = 10, p-value < 2.2e-16
## 
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 193, df = NA, p-value = 0.0004998

## 
##  Pearson's Chi-squared test
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 108.38, df = 10, p-value < 2.2e-16
## 
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 108.38, df = NA, p-value = 0.0004998

## 
##  Pearson's Chi-squared test
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 36.109, df = 10, p-value = 8.063e-05
## 
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 36.109, df = NA, p-value = 0.0004998
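For readers who want to try the procedure on a single color, here is a minimal sketch using the color group 1 (black) counts copied from the presence table above. It makes no claim to reproduce the exact statistics printed above, since the construction of `table2` is not shown. The object names `black`, `asym`, and `sim` are my own.

```r
regions <- c("Africa", "Australia - Oceania", "Central Asia",
             "East Asia/Southeast Asia", "Europe", "North/Central America",
             "Northern Africa/Middle East", "Northern Europe",
             "South America", "South Asia", "Western Africa")

# Color group 1 (black): flags without / with the color, by region
absent  <- c(14, 12, 5, 13, 30, 15, 14, 9, 11, 5, 22)
present <- c(12,  1, 1,  4,  3,  8,  9, 1,  1, 3,  0)
black <- cbind(absent, present)
rownames(black) <- regions

# Asymptotic test: df = (11 - 1) * (2 - 1) = 10. Several expected counts are
# small, so R warns that the approximation may be poor; the warning is
# silenced here because the simulated version below addresses it.
asym <- suppressWarnings(chisq.test(black))

# Monte Carlo p-value, which does not rely on the large-sample approximation
set.seed(2020)
sim <- chisq.test(black, simulate.p.value = TRUE, B = 2000)

c(asymptotic = asym$p.value, simulated = sim$p.value)

# Standardized residuals: |value| > 2 marks cells that depart from independence
round(asym$stdres, 2)
```

In this table the "present" residual is above +2 for (South/East) Africa and below -2 for Western Africa, which is the kind of pattern the mosaic plots display.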

Fitting Logistic Regression Models

Model 1: Color Presence ~ Color Group (Region has no effect)

The first logistic regression model I will build uses the color group to predict whether a color is present or not (1 or 0), ignoring the region’s effect altogether. The baseline color group is 1 (black). Looking at the estimates, we see significant p-values and positive coefficients for all of the other color groups besides 6 (pink/purple) and 7 (light blue/green). This indicates that the other 5 color groups are significantly more likely to be present in a flag than the color black. The light blue/green color is significantly less likely to be present than the black color. The pink/purple color is not significant, but it has a very large negative coefficient with a huge standard error, which makes sense since it never covers more than 5% of any flag (the outcome is always 0 for this group, so the estimate diverges toward negative infinity). We also note that AIC = 1589.8 and the residual deviance is 1573.8 on 1536 d.o.f.

fit1 <- glm(ColorPresent ~ as.factor(ColorGroup), family="binomial", data =df3)
summary(fit1)
## 
## Call:
## glm(formula = ColorPresent ~ as.factor(ColorGroup), family = "binomial", 
##     data = df3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7329  -0.9235  -0.5267   0.9406   2.0218  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             -1.2494     0.1730  -7.223 5.09e-13 ***
## as.factor(ColorGroup)2   2.4989     0.2446  10.215  < 2e-16 ***
## as.factor(ColorGroup)3   0.7303     0.2282   3.200 0.001373 ** 
## as.factor(ColorGroup)4   0.8397     0.2270   3.699 0.000217 ***
## as.factor(ColorGroup)5   0.6178     0.2297   2.689 0.007162 ** 
## as.factor(ColorGroup)6 -16.3166   284.7721  -0.057 0.954308    
## as.factor(ColorGroup)7  -0.6557     0.2755  -2.380 0.017300 *  
## as.factor(ColorGroup)8   1.8356     0.2291   8.013 1.12e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2020.2  on 1543  degrees of freedom
## Residual deviance: 1573.8  on 1536  degrees of freedom
## AIC: 1589.8
## 
## Number of Fisher Scoring iterations: 16

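By exponentiating the coefficients, we can write out the first logistic regression model in terms of odds ratios instead of log odds. Because the coefficients add on the log-odds scale, the exponentiated values multiply on the odds scale: \(\text{odds}(\text{color presence}) = 0.29 \times 12.17^{\text{red}} \times 2.08^{\text{green}} \times 2.32^{\text{yellow}} \times 1.85^{\text{blue}} \times (8.2 \times 10^{-8})^{\text{pink/purple}} \times 0.52^{\text{light blue/green}} \times 6.27^{\text{white}}\), where each exponent is 1 for that color group and 0 otherwise.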

exp(coef(fit1))
##            (Intercept) as.factor(ColorGroup)2 as.factor(ColorGroup)3 
##           2.866667e-01           1.216874e+01           2.075726e+00 
## as.factor(ColorGroup)4 as.factor(ColorGroup)5 as.factor(ColorGroup)6 
##           2.315557e+00           1.854928e+00           8.199289e-08 
## as.factor(ColorGroup)7 as.factor(ColorGroup)8 
##           5.191030e-01           6.268959e+00
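Because Model 1 has a single categorical predictor, its exponentiated coefficients can be checked by hand from counts. The sketch below assumes the per-color totals obtained by summing each column of the "present = 1" table above across the 11 regions (193 flags in all). The vector names are my own labels.

```r
n_flags <- 193

# Number of flags in which each color group is present (column sums of the
# "present = 1" table across regions)
present <- c(black = 43, red = 150, green = 72, yellow = 77, blue = 67,
             `pink/purple` = 0, `light blue/green` = 25, white = 124)

# Odds of presence for each color group
odds <- present / (n_flags - present)

# Odds ratios relative to the baseline group (black)
odds_ratio <- odds / odds["black"]

round(odds["black"], 4)     # compare with exp(Intercept)
round(odds_ratio[-1], 4)    # compare with the exponentiated coefficients
```

The hand-computed ratios agree with the exp(coef(fit1)) output. The only difference is pink/purple, where the exact ratio is 0 and the fitted value of about 8.2e-08 is simply where the iterative fit stopped.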

Model 2: Color Presence ~ Color Group and Regions (assuming they are independent)

For the second model, the coefficients are log odds relative to the baseline of the (South/East) African region and the color black. This model has a higher AIC (1594.6) than the previous model, with a residual deviance of 1558.6 on 1526 d.o.f., so it does not seem to be better than the first model. Maybe the region and color group aren’t independent…

fit2 <- glm(ColorPresent ~ Region + as.factor(ColorGroup), family="binomial", data =df3) 
summary(fit2)
## 
## Call:
## glm(formula = ColorPresent ~ Region + as.factor(ColorGroup), 
##     family = "binomial", data = df3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9674  -0.8837  -0.4632   0.8828   2.1526  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                        -0.7504     0.2277  -3.296 0.000980 ***
## RegionAustralia - Oceania          -0.4881     0.2870  -1.700 0.089054 .  
## RegionCentral Asia                 -0.8007     0.3925  -2.040 0.041333 *  
## RegionEast Asia/Southeast Asia     -0.6031     0.2649  -2.277 0.022768 *  
## RegionEurope                       -0.7662     0.2237  -3.425 0.000614 ***
## RegionNorth/Central America        -0.4953     0.2415  -2.051 0.040267 *  
## RegionNorthern Africa/Middle East  -0.6219     0.2429  -2.560 0.010460 *  
## RegionNorthern Europe              -0.7252     0.3195  -2.270 0.023204 *  
## RegionSouth America                -0.5530     0.2960  -1.869 0.061691 .  
## RegionSouth Asia                   -0.6139     0.3447  -1.781 0.074920 .  
## RegionWestern Africa               -0.3680     0.2433  -1.513 0.130371    
## as.factor(ColorGroup)2              2.5298     0.2463  10.270  < 2e-16 ***
## as.factor(ColorGroup)3              0.7396     0.2297   3.220 0.001280 ** 
## as.factor(ColorGroup)4              0.8505     0.2285   3.722 0.000197 ***
## as.factor(ColorGroup)5              0.6256     0.2312   2.706 0.006810 ** 
## as.factor(ColorGroup)6            -17.3175   466.1327  -0.037 0.970364    
## as.factor(ColorGroup)7             -0.6618     0.2767  -2.392 0.016772 *  
## as.factor(ColorGroup)8              1.8598     0.2307   8.060  7.6e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2020.2  on 1543  degrees of freedom
## Residual deviance: 1558.6  on 1526  degrees of freedom
## AIC: 1594.6
## 
## Number of Fisher Scoring iterations: 17
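The comparison between Model 1 and Model 2 can be made explicit with the deviances reported in the two summaries. The sketch below is plain arithmetic on those printed values: for 0/1 outcomes the residual deviance equals -2 times the log-likelihood, so AIC is the deviance plus twice the number of coefficients, and the drop in deviance gives a likelihood ratio test. Variable names are my own.

```r
# Values copied from the summary() outputs above
dev1 <- 1573.8; df1 <- 1536; k1 <- 8    # Model 1: intercept + 7 color terms
dev2 <- 1558.6; df2 <- 1526; k2 <- 18   # Model 2: adds 10 region terms

# AIC = deviance + 2 * (number of coefficients) for ungrouped binary data
aic1 <- dev1 + 2 * k1
aic2 <- dev2 + 2 * k2
c(model1 = aic1, model2 = aic2)

# Likelihood ratio test: does adding Region reduce the deviance by more than
# chance would suggest for 10 extra parameters?
lr_stat <- dev1 - dev2
lr_df   <- df1 - df2
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p = p_value)
```

With the fitted objects available, `anova(fit1, fit2, test = "Chisq")` performs the same test. The p-value here is above 0.05, consistent with the higher AIC: region as a main effect alone adds little.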

Model 3: Color Presence ~ Color Group and Regions (with an interaction)

Below, we see the coefficients of the interaction model. The baseline is the color black in (South/East) Africa; the remaining terms are log-odds differences for the other regions, the other color groups, and their interactions. This model has the lowest AIC = 1552.7 and a residual deviance of 1376.7 on 1456 d.o.f. This model is seemingly better than the previous models based on AIC, but we should test the models to see if any of them can significantly explain the presence (or absence) of a color.

fit3 <- glm(ColorPresent ~ Region * as.factor(ColorGroup), family="binomial", data =df3)
fit3
## 
## Call:  glm(formula = ColorPresent ~ Region * as.factor(ColorGroup), 
##     family = "binomial", data = df3)
## 
## Coefficients:
##                                              (Intercept)  
##                                                 -0.15415  
##                                RegionAustralia - Oceania  
##                                                 -2.33076  
##                                       RegionCentral Asia  
##                                                 -1.45529  
##                           RegionEast Asia/Southeast Asia  
##                                                 -1.02450  
##                                             RegionEurope  
##                                                 -2.14843  
##                              RegionNorth/Central America  
##                                                 -0.47446  
##                        RegionNorthern Africa/Middle East  
##                                                 -0.28768  
##                                    RegionNorthern Europe  
##                                                 -2.04307  
##                                      RegionSouth America  
##                                                 -2.24374  
##                                         RegionSouth Asia  
##                                                 -0.35667  
##                                     RegionWestern Africa  
##                                                -18.41192  
##                                   as.factor(ColorGroup)2  
##                                                  1.35812  
##                                   as.factor(ColorGroup)3  
##                                                  0.96508  
##                                   as.factor(ColorGroup)4  
##                                                  0.46431  
##                                   as.factor(ColorGroup)5  
##                                                 -0.84438  
##                                   as.factor(ColorGroup)6  
##                                                -18.41192  
##                                   as.factor(ColorGroup)7  
##                                                 -1.04982  
##                                   as.factor(ColorGroup)8  
##                                                  0.62415  
##         RegionAustralia - Oceania:as.factor(ColorGroup)2  
##                                                  1.59679  
##                RegionCentral Asia:as.factor(ColorGroup)2  
##                                                  0.94446  
##    RegionEast Asia/Southeast Asia:as.factor(ColorGroup)2  
##                                                 18.38660  
##                      RegionEurope:as.factor(ColorGroup)2  
##                                                  2.66723  
##       RegionNorth/Central America:as.factor(ColorGroup)2  
##                                                 -0.28768  
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)2  
##                                                  1.43508  
##             RegionNorthern Europe:as.factor(ColorGroup)2  
##                                                  1.24457  
##               RegionSouth America:as.factor(ColorGroup)2  
##                                                  2.13838  
##                  RegionSouth Asia:as.factor(ColorGroup)2  
##                                                  0.25131  
##              RegionWestern Africa:as.factor(ColorGroup)2  
##                                                 18.43172  
##         RegionAustralia - Oceania:as.factor(ColorGroup)3  
##                                                 -0.18492  
##                RegionCentral Asia:as.factor(ColorGroup)3  
##                                                 -0.04879  
##    RegionEast Asia/Southeast Asia:as.factor(ColorGroup)3  
##                                                 -2.55901  
##                      RegionEurope:as.factor(ColorGroup)3  
##                                                 -0.38526  
##       RegionNorth/Central America:as.factor(ColorGroup)3  
##                                                 -1.61741  
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)3  
##                                                 -0.78561  
##             RegionNorthern Europe:as.factor(ColorGroup)3  
##                                                 -0.15415  
##               RegionSouth America:as.factor(ColorGroup)3  
##                                                  0.73967  
##                  RegionSouth Asia:as.factor(ColorGroup)3  
##                                                 -0.45426  
##              RegionWestern Africa:as.factor(ColorGroup)3  
##                                                 19.44681  
##         RegionAustralia - Oceania:as.factor(ColorGroup)4  
##                                                  1.86645  
##                RegionCentral Asia:as.factor(ColorGroup)4  
##                                                  0.45199  
##    RegionEast Asia/Southeast Asia:as.factor(ColorGroup)4  
##                                                 -0.16112  
##                      RegionEurope:as.factor(ColorGroup)4  
##                                                  1.27866  
##       RegionNorth/Central America:as.factor(ColorGroup)4  
##                                                 -0.27753  
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)4  
##                                                 -3.11352  
##             RegionNorthern Europe:as.factor(ColorGroup)4  
##                                                  0.88562  
##               RegionSouth America:as.factor(ColorGroup)4  
##                                                  1.93359  
##                  RegionSouth Asia:as.factor(ColorGroup)4  
##                                                 -0.46431  
##              RegionWestern Africa:as.factor(ColorGroup)4  
##                                                 18.86390  
##         RegionAustralia - Oceania:as.factor(ColorGroup)5  
##                                                  3.79929  
##                RegionCentral Asia:as.factor(ColorGroup)5  
##                                                  0.84438  
##    RegionEast Asia/Southeast Asia:as.factor(ColorGroup)5  
##                                                  1.90525  
##                      RegionEurope:as.factor(ColorGroup)5  
##                                                  2.58735  
##       RegionNorth/Central America:as.factor(ColorGroup)5  
##                                                  1.56000  
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)5  
##                                                 -1.80483  
##             RegionNorthern Europe:as.factor(ColorGroup)5  
##                                                  3.44707  
##               RegionSouth America:as.factor(ColorGroup)5  
##                                                  2.90580  
##                  RegionSouth Asia:as.factor(ColorGroup)5  
##                                                 -0.59071  
##              RegionWestern Africa:as.factor(ColorGroup)5  
##                                                 18.42962  
##         RegionAustralia - Oceania:as.factor(ColorGroup)6  
##                                                  2.33076  
##                RegionCentral Asia:as.factor(ColorGroup)6  
##                                                  1.45529  
##    RegionEast Asia/Southeast Asia:as.factor(ColorGroup)6  
##                                                  1.02450  
##                      RegionEurope:as.factor(ColorGroup)6  
##                                                  2.14843  
##       RegionNorth/Central America:as.factor(ColorGroup)6  
##                                                  0.47446  
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)6  
##                                                  0.28768  
##             RegionNorthern Europe:as.factor(ColorGroup)6  
##                                                  2.04307  
##               RegionSouth America:as.factor(ColorGroup)6  
##                                                  2.24374  
##                  RegionSouth Asia:as.factor(ColorGroup)6  
##                                                  0.35667  
##              RegionWestern Africa:as.factor(ColorGroup)6  
##                                                 18.41192  
##         RegionAustralia - Oceania:as.factor(ColorGroup)7  
##                                                  2.72380  
##                RegionCentral Asia:as.factor(ColorGroup)7  
##                                                  1.96611  
##    RegionEast Asia/Southeast Asia:as.factor(ColorGroup)7  
##                                                -16.33759  
##                      RegionEurope:as.factor(ColorGroup)7  
##                                                  1.37141  
##       RegionNorth/Central America:as.factor(ColorGroup)7  
##                                                 -0.21869  
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)7  
##                                                 -0.85972  
##             RegionNorthern Europe:as.factor(ColorGroup)7  
##                                                -15.31902  
##               RegionSouth America:as.factor(ColorGroup)7  
##                                                  1.83828  
##                  RegionSouth Asia:as.factor(ColorGroup)7  
##                                                -17.00542  
##              RegionWestern Africa:as.factor(ColorGroup)7  
##                                                 17.31331  
##         RegionAustralia - Oceania:as.factor(ColorGroup)8  
##                                                  2.67168  
##                RegionCentral Asia:as.factor(ColorGroup)8  
##                                                  0.98528  
##    RegionEast Asia/Southeast Asia:as.factor(ColorGroup)8  
##                                                  1.42997  
##                      RegionEurope:as.factor(ColorGroup)8  
##                                                  2.10921  
##       RegionNorth/Central America:as.factor(ColorGroup)8  
##                                                  0.83113  
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)8  
##                                                  1.37582  
##             RegionNorthern Europe:as.factor(ColorGroup)8  
##                                                  2.95936  
##               RegionSouth America:as.factor(ColorGroup)8  
##                                                  2.11021  
##                  RegionSouth Asia:as.factor(ColorGroup)8  
##                                                  0.39750  
##              RegionWestern Africa:as.factor(ColorGroup)8  
##                                                 17.57419  
## 
## Degrees of Freedom: 1543 Total (i.e. Null);  1456 Residual
## Null Deviance:       2020 
## Residual Deviance: 1377  AIC: 1553
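
To make the log-odds scale concrete, here is a small sketch that converts a few of the printed coefficients into fitted probabilities with the inverse-logit function `plogis()`. The coefficient values are copied from the output above; the variable names and the helper `prob_present()` are my own illustration and not part of the original analysis. It also hints at why some estimates are close to ±18: a linear predictor that large corresponds to a fitted probability of essentially 0 or 1, which typically happens when a color group is never (or always) present in some region.

```r
# Coefficients copied from the Model 3 output above (log-odds scale).
b0         <- -0.15415  # (Intercept): reference region (Africa), color group 1
b_europe   <- -2.14843  # RegionEurope
b_group2   <-  1.35812  # as.factor(ColorGroup)2
b_eur_grp2 <-  2.66723  # RegionEurope:as.factor(ColorGroup)2
b_group6   <- -18.41192 # as.factor(ColorGroup)6

# Hypothetical helper: inverse logit of a linear predictor.
prob_present <- function(eta) plogis(eta)

p_africa_g1 <- prob_present(b0)
p_africa_g2 <- prob_present(b0 + b_group2)
p_europe_g1 <- prob_present(b0 + b_europe)
p_europe_g2 <- prob_present(b0 + b_europe + b_group2 + b_eur_grp2)
p_africa_g6 <- prob_present(b0 + b_group6)  # effectively zero

round(c(africa_g1 = p_africa_g1, africa_g2 = p_africa_g2,
        europe_g1 = p_europe_g1, europe_g2 = p_europe_g2), 3)
# africa_g1 africa_g2 europe_g1 europe_g2
#     0.462     0.769     0.091     0.848
```

Because the model includes every region-by-color interaction, these fitted probabilities should simply reproduce the observed proportions in each cell; for example, 0.462 is consistent with 12 of the 26 African flags listed in Appendix A.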

Comparing the 3 models

The third model, which includes an interaction between region and color group, has the lowest AIC and the closest residual deviance to the residual degrees of freedom. However, below we see that none of the models has a significant p-value, so I would refrain from trying to predict the color presence of new flags, even given the region.
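
As a quick sanity check on the AIC values being compared, the sketch below rebuilds them from the residual deviances and degrees of freedom printed above. It assumes `ColorPresent` is an ungrouped 0/1 response, in which case the residual deviance equals minus twice the log-likelihood and AIC is the deviance plus twice the number of estimated coefficients. The variable names are my own.

```r
# Values copied from the Model 2 and Model 3 outputs above.
n_obs   <- 1543 + 1   # null degrees of freedom + 1
dev2    <- 1558.6     # Model 2 residual deviance (Region + ColorGroup)
df_res2 <- 1526
dev3    <- 1376.7     # Model 3 residual deviance (Region * ColorGroup)
df_res3 <- 1456

# Number of estimated coefficients in each model.
k2 <- n_obs - df_res2  # 18
k3 <- n_obs - df_res3  # 88

# For 0/1 data, AIC = residual deviance + 2 * (number of coefficients).
aic2 <- dev2 + 2 * k2
aic3 <- dev3 + 2 * k3

c(model2 = aic2, model3 = aic3)
# model2 model3
# 1594.6 1552.7
```

The interaction model spends 70 more coefficients than the additive one, and the drop in deviance is large enough that its AIC is still lower.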

Further Analysis

To extend this analysis, we could add further terms to the model. For example, the number of colors in the flag, the objects/shapes in the flag, or the horizontal/vertical/diagonal pattern could help explain the differences between flags from region to region. A symmetry indicator might make sense in place of the horizontal/vertical bands. Using more color groups, or k-means clustering rather than the binning technique explained previously, might help reduce the variance within each grouping. Lastly, since RGB values can be plotted rather intuitively in 3-D, I think an interactive 3-D plot (possibly a Shiny web app) would be a really cool way of interacting with the various flags.
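
For the k-means idea, a minimal sketch using base R's `kmeans()` on a matrix of RGB values might look like the following. The pixel data here are synthetic (noisy points around red, white, and blue), standing in for a real flag image, and the choice of 3 clusters is an assumption made for the example.

```r
set.seed(1)

# Synthetic stand-in for a flag: pixels scattered around three "true" colors,
# with RGB values on a 0-1 scale.
true_colors <- rbind(red   = c(0.80, 0.10, 0.10),
                     white = c(0.95, 0.95, 0.95),
                     blue  = c(0.10, 0.20, 0.70))
n_per <- 200
pixels <- do.call(rbind, lapply(seq_len(nrow(true_colors)), function(i) {
  # Column-wise fill: first n_per draws are R, then G, then B.
  matrix(rnorm(n_per * 3, mean = rep(true_colors[i, ], each = n_per), sd = 0.03),
         ncol = 3)
}))
pixels <- pmin(pmax(pixels, 0), 1)  # clamp to the valid RGB range
colnames(pixels) <- c("R", "G", "B")

# Cluster the pixels; nstart restarts guard against a poor random start.
km <- kmeans(pixels, centers = 3, nstart = 10)

# Cluster centers as hex colors, and the share of pixels in each cluster.
hex_centers <- rgb(km$centers[, "R"], km$centers[, "G"], km$centers[, "B"])
share <- km$size / nrow(pixels)
data.frame(hex = hex_centers, share = share)
```

Unlike fixed bins, the cluster centers adapt to each image, so the number of clusters would have to be chosen per flag or by some selection rule.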

Summary

At the beginning of our analysis, we discussed the country and region selection process. Slight differences in the selection of flags and regions may directly change the results of this flag analysis. After choosing the 193 U.N. members as the flags and the 11 regions outlined previously, we discussed how images can be read in as red, green, and blue (RGB) values. From there, we discussed two color identification methods: the binning method and the k-means clustering method. Since the human eye may struggle to judge which colors are present in a flag, we need quantifiable methods, such as binning or k-means, to identify colors.

Personally, I thought the binning method was a little more intuitive, and the 2x2x2 groupings seemed like an okay fit, so I went with this method over the k-means method. From there, it was found that the most common number of colors in a flag was 3, followed by 2. The most common color combination was red, white, and blue, followed by red and white. The red, green, and yellow color scheme was popular in West Africa, and the red, white, and black color scheme was popular in North Africa/Middle East. Lastly, the blue, red, and white color scheme was popular in Australia/Oceania.
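
To illustrate what a 2x2x2 binning looks like, here is a small base-R sketch that splits each of the R, G, and B channels at 0.5, giving 8 bins. The function `bin_rgb()` and the bin numbering (1 = all channels low, 8 = all channels high) are assumptions made for this example; they are not necessarily the numbering behind `ColorGroup` in the models above, nor the exact routine used by the colordistance package.

```r
# Hypothetical helper: assign each pixel (row of an n x 3 RGB matrix with
# values in [0, 1]) to one of 2 x 2 x 2 = 8 bins.
bin_rgb <- function(pixels, cut = 0.5) {
  hi <- pixels >= cut  # logical matrix: is each channel in its upper half?
  1L + hi[, 1] * 4L + hi[, 2] * 2L + hi[, 3] * 1L
}

# A few hand-picked example colors.
pixels <- rbind(black  = c(0.05, 0.05, 0.05),
                red    = c(0.90, 0.10, 0.10),
                white  = c(1.00, 1.00, 1.00),
                blue   = c(0.00, 0.20, 0.80),
                yellow = c(1.00, 0.85, 0.00))

bins <- bin_rgb(pixels)
bins
#  black    red  white   blue yellow
#      1      5      8      2      7

# Share of pixels falling in each of the 8 bins.
bin_share <- tabulate(bins, nbins = 8) / length(bins)
```

With a real image, a color group could then be flagged as "present" when its share of pixels exceeds some chosen threshold.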

R packages used

  • The jpeg::readJPEG() function was used to read in images of flags.
  • The colordistance and countcolors packages made it really easy to bin colors, count the number of pixels within certain color groups, and display histograms or 3D plots.
  • The rvest package was used to extract flag descriptions, countries, regions, and file paths from HTML files (downloaded from The 2020 CIA World Factbook).

Appendix

A: The 193 UN Members (as of 2022)

Region Code Country UN Member Since
Africa AO Angola 1976
Africa BC Botswana 1966
Africa BY Burundi 1962
Africa CN Comoros 1975
Africa CG Congo, Democratic Republic of the 1960
Africa DJ Djibouti 1977
Africa ER Eritrea 1993
Africa WZ Eswatini 1968
Africa ET Ethiopia 1945
Africa KE Kenya 1963
Africa LT Lesotho 1966
Africa MA Madagascar 1960
Africa MI Malawi 1964
Africa MP Mauritius 1968
Africa MZ Mozambique 1975
Africa WA Namibia 1990
Africa RW Rwanda 1962
Africa TP Sao Tome and Principe 1975
Africa SE Seychelles 1976
Africa SO Somalia 1960
Africa SF South Africa 1945
Africa OD South Sudan 2011
Africa TZ Tanzania 1961
Africa UG Uganda 1962
Africa ZA Zambia 1964
Africa ZI Zimbabwe 1980
Australia - Oceania AS Australia 1945
Australia - Oceania FJ Fiji 1970
Australia - Oceania KR Kiribati 1999
Australia - Oceania RM Marshall Islands 1991
Australia - Oceania FM Micronesia, Federated States of 1991
Australia - Oceania NR Nauru 1999
Australia - Oceania NZ New Zealand 1945
Australia - Oceania PS Palau 1994
Australia - Oceania WS Samoa 1976
Australia - Oceania BP Solomon Islands 1978
Australia - Oceania TN Tonga 1999
Australia - Oceania TV Tuvalu 2000
Australia - Oceania NH Vanuatu 1981
Central Asia KZ Kazakhstan 1992
Central Asia KG Kyrgyzstan 1992
Central Asia RS Russia 1945
Central Asia TI Tajikistan 1992
Central Asia TX Turkmenistan 1992
Central Asia UZ Uzbekistan 1992
East Asia/Southeast Asia BX Brunei 1984
East Asia/Southeast Asia BM Burma (Myanmar) 1948
East Asia/Southeast Asia CB Cambodia 1955
East Asia/Southeast Asia CH China 1945
East Asia/Southeast Asia ID Indonesia 1950
East Asia/Southeast Asia JA Japan 1956
East Asia/Southeast Asia KN Korea, North 1991
East Asia/Southeast Asia KS Korea, South 1991
East Asia/Southeast Asia LA Laos 1955
East Asia/Southeast Asia MY Malaysia 1957
East Asia/Southeast Asia MG Mongolia 1961
East Asia/Southeast Asia PP Papua New Guinea 1975
East Asia/Southeast Asia RP Philippines 1945
East Asia/Southeast Asia SN Singapore 1965
East Asia/Southeast Asia TH Thailand 1946
East Asia/Southeast Asia TT Timor-Leste 2002
East Asia/Southeast Asia VM Vietnam 1977
Europe AL Albania 1955
Europe AN Andorra 1993
Europe AU Austria 1955
Europe BO Belarus 1945
Europe BE Belgium 1945
Europe BK Bosnia and Herzegovina 1992
Europe BU Bulgaria 1955
Europe HR Croatia 1992
Europe CY Cyprus 1960
Europe EZ Czechia (Czech Republic) 1993
Europe FR France 1945
Europe GM Germany 1973
Europe GR Greece 1945
Europe HU Hungary 1955
Europe IT Italy 1955
Europe LS Liechtenstein 1990
Europe LU Luxembourg 1945
Europe MT Malta 1964
Europe MD Moldova 1992
Europe MN Monaco 1993
Europe MJ Montenegro 2006
Europe NL Netherlands 1945
Europe MK North Macedonia 1993
Europe PL Poland 1945
Europe PO Portugal 1955
Europe RO Romania 1955
Europe SM San Marino 1992
Europe RI Serbia 2000
Europe LO Slovakia 1993
Europe SI Slovenia 1992
Europe SP Spain 1955
Europe SZ Switzerland 2002
Europe UP Ukraine 1945
North/Central America AC Antigua and Barbuda 1981
North/Central America BF Bahamas, The 1973
North/Central America BB Barbados 1966
North/Central America BH Belize 1981
North/Central America CA Canada 1945
North/Central America CS Costa Rica 1945
North/Central America CU Cuba 1945
North/Central America DO Dominica 1978
North/Central America DR Dominican Republic 1945
North/Central America ES El Salvador 1945
North/Central America GJ Grenada 1974
North/Central America GT Guatemala 1945
North/Central America HA Haiti 1945
North/Central America HO Honduras 1945
North/Central America JM Jamaica 1962
North/Central America MX Mexico 1945
North/Central America NU Nicaragua 1945
North/Central America PM Panama 1945
North/Central America SC Saint Kitts and Nevis 1983
North/Central America ST Saint Lucia 1979
North/Central America VC Saint Vincent and the Grenadines 1980
North/Central America TD Trinidad and Tobago 1962
North/Central America US United States 1945
Northern Africa/Middle East AG Algeria 1962
Northern Africa/Middle East AM Armenia 1992
Northern Africa/Middle East AJ Azerbaijan 1992
Northern Africa/Middle East BA Bahrain 1971
Northern Africa/Middle East EG Egypt 1945
Northern Africa/Middle East GG Georgia 1992
Northern Africa/Middle East IR Iran 1945
Northern Africa/Middle East IZ Iraq 1945
Northern Africa/Middle East IS Israel 1949
Northern Africa/Middle East JO Jordan 1955
Northern Africa/Middle East KU Kuwait 1963
Northern Africa/Middle East LE Lebanon 1945
Northern Africa/Middle East LY Libya 1955
Northern Africa/Middle East MO Morocco 1956
Northern Africa/Middle East MU Oman 1971
Northern Africa/Middle East QA Qatar 1971
Northern Africa/Middle East SA Saudi Arabia 1945
Northern Africa/Middle East SU Sudan 1956
Northern Africa/Middle East SY Syria 1945
Northern Africa/Middle East TS Tunisia 1956
Northern Africa/Middle East TU Turkey 1945
Northern Africa/Middle East AE United Arab Emirates 1971
Northern Africa/Middle East YM Yemen 1947
Northern Europe DA Denmark 1945
Northern Europe EN Estonia 1991
Northern Europe FI Finland 1955
Northern Europe IC Iceland 1946
Northern Europe EI Ireland 1955
Northern Europe LG Latvia 1991
Northern Europe LH Lithuania 1991
Northern Europe NO Norway 1945
Northern Europe SW Sweden 1946
Northern Europe UK United Kingdom 1945
South America AR Argentina 1945
South America BL Bolivia 1945
South America BR Brazil 1945
South America CI Chile 1945
South America CO Colombia 1945
South America EC Ecuador 1945
South America GY Guyana 1966
South America PA Paraguay 1945
South America PE Peru 1945
South America NS Suriname 1975
South America UY Uruguay 1945
South America VE Venezuela 1945
South Asia AF Afghanistan 1946
South Asia BG Bangladesh 1974
South Asia BT Bhutan 1971
South Asia IN India 1945
South Asia MV Maldives 1965
South Asia NP Nepal 1955
South Asia PK Pakistan 1947
South Asia CE Sri Lanka 1955
Western Africa BN Benin 1960
Western Africa UV Burkina Faso 1960
Western Africa CV Cabo Verde 1975
Western Africa CM Cameroon 1960
Western Africa CT Central African Republic 1960
Western Africa CD Chad 1960
Western Africa CF Congo, Republic of the 1960
Western Africa IV Cote d’Ivoire 1960
Western Africa EK Equatorial Guinea 1968
Western Africa GB Gabon 1960
Western Africa GA Gambia, The 1965
Western Africa GH Ghana 1957
Western Africa GV Guinea 1958
Western Africa PU Guinea-Bissau 1974
Western Africa LI Liberia 1945
Western Africa ML Mali 1960
Western Africa MR Mauritania 1961
Western Africa NG Niger 1960
Western Africa NI Nigeria 1960
Western Africa SG Senegal 1960
Western Africa SL Sierra Leone 1961
Western Africa TO Togo 1960

B: Dependencies not included in this analysis (Non-UN Members)

Dependencies and Territories of The United States

Flags shown are the bolded territories, top-to-bottom and left-to-right. Note that 9 of the 14 territories did not have unique flags. Those with unique flags have common elements such as the red, white, and blue color scheme, the red and white stripes, or the bald eagle.

Dependencies and Territories of The United Kingdom

The dependencies of the U.K. that have unique flags are bolded and shown above, displayed top-to-bottom and left-to-right. It is interesting how so many flags have the U.K. flag in the canton region (top-left corner) of their flag, with the only differences being the specific emblem or shield. I really like the wavy stripes in the British Indian Ocean Territory flag, the castle in the Gibraltar flag, and the 3 legs in the Isle of Man flag.

Dependencies and Territories of Other Countries

Here are 6 other flags from some of the other dependencies. Once again, we see the reach of British colonization, with the U.K. flag in the canton region of the flags of some New Zealand and Australian dependencies. The six flags shown are: top row - Cook Islands (New Zealand), Norfolk Island (Australia), French Polynesia (France); bottom row - Sint Maarten (Netherlands), Niue (New Zealand), Faroe Islands (Denmark).