Across the world, citizens take great pride in their national flags. Flags are highly symbolic and are often at the core of a nation’s identity. In the United States, we celebrate Independence Day (the 4th of July) by flying the American flag high in the sky and dressing in “Red, White, and Blue.” National flags invoke feelings of patriotism, pride, and freedom; they are not merely pieces of cloth like everyday t-shirts. The symbolism also carries negative connotations. For example, burning or destroying a national flag signals rebellion or displeasure with the government, and in many countries these actions carry severe punishments. In short, once a flag is adopted, it naturally takes on symbolic meaning.
Flags can also be associated with military presence or political power. In regions of the world with unstable governments, flags can change frequently, as shown in the YouTube video The World: Timeline of National Flags: 1019-2020 (17:09). Nearly every current flag was adopted after 1800.
Below, we can note some interesting patterns associated with certain regions of the world. Some regions have highly similar flag colors or geometric patterns. For example:
Some flags strongly reflect their former colonizers. For example, Australia, New Zealand, and the United States all use red, white, and blue, a color scheme inherited from British colonization. Many flags also share the same symbols, such as stars, the moon, or the sun.
Commons.Wikimedia.org: Flag Map of the World (2022)
As shown above, it seems like some regions have specific colors associated with them. Today, we will research the question:
Although the idea for this analysis came to me naturally, I am not the first person to study the colors of national flags. The study of flags is called vexillology, and there is a community of “vexillologists” who study flags and flag history. Below, I link to some of the previous studies that I found interesting. The biggest difference among these studies’ findings comes from how each vexillologist groups colors. For example, some separate light blue from dark blue, some look at common color combinations, and some distinguish between the presence of a color (regardless of amount) and the total area of the flag that the color occupies. This shows that identifying which colors are present in a flag can be a subjective process.
Here are some previous studies conducted:
Flags of the World posted a study (2000) conducted by Bruce Berry that breaks down the percentage of color usage by continent and over time.
Flags of the World posted another study (2000) conducted by Ivan Sarajcic that breaks down the most common color combinations.
In the previous studies that I read, the methodology behind color identification wasn’t specified. Because color identification can be highly subjective, I go into detail about how I chose to quantify and group colors. To do so, I discuss how computers read images in as “data” and the image-processing techniques I considered.
All of the flag images are free to download from the 2020 CIA World Factbook. The factbook download contains the entire website, including HTML pages, style sheets, and raw data. Beyond flag analysis, it could be a good data source for global research questions, including those related to the economy, transportation, demography, the military, and more. There are 268 different HTML pages associated with different countries, territories, oceans, and other areas, and the `rvest` library makes collecting data from HTML pages easy.
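Scraping with `rvest` usually means parsing a page, selecting nodes with CSS selectors, and pulling out text or attributes. The sketch below does this on a hypothetical HTML snippet rather than a real factbook page: the class names (`country-name`, `flag`) and the image path are made up for illustration, so the selectors would need to be adapted to the factbook’s actual markup.

```r
library(rvest)

# Hypothetical HTML snippet standing in for a factbook country page;
# the class names and image path are invented for this example.
page <- read_html('<html><body>
  <h1 class="country-name">United States</h1>
  <img class="flag" src="../attachments/flags/US-flag.jpg">
</body></html>')

# html_element() returns the first match; html_elements() returns all matches
country   <- html_text2(html_element(page, "h1.country-name"))
flag_path <- html_attr(html_elements(page, "img.flag"), "src")

c(country = country, flag = flag_path)
```

Looping the same calls over the downloaded country pages would give one country name and flag path per page.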
The CIA World Factbook includes 256 different flags for various countries, territories, dependencies, and other areas. For my analysis of the relationship between flag colors and global regions, I will only look at the 193 countries recognized as “sovereign states” by the United Nations (UN). To become a member of the UN, a country must be self-governing and be voted into the UN. Here is a link to the UN Member States. Below are the member states, plotted with the most recently admitted countries colored in darker blue. A full list of each country and the year it was voted into the United Nations can be found in Appendix A.
By only using countries recognized by the United Nations, we leave out the flags of some states that consider themselves “sovereign” but are not UN members. These include the permanent UN observers, Vatican City and Palestine, as well as Taiwan, a former UN member. All of these places have unique flags, and some of the previous studies I read included them, but I will not. Territories, commonwealths, smaller islands, and other dependencies are also excluded from this analysis. Appendix B shows some of the flags not included.
Below, we can see the 10 regions defined by the CIA World Factbook. Personally, I do not like these groupings, mostly because Africa and Europe contain many countries, whereas North America has only 3.
Instead of the CIA World Factbook’s regions (outlined above), I will define my own regions, shown below, by mixing the Factbook’s regions with those of the United Nations geoscheme. I grouped North America with Central America. I separated Northern Europe (all of the flags with crosses) from the rest of Europe. I split Africa into Western Africa, Northern Africa (combined with the Middle East), and the rest of Africa (South and East). I left the Asia and Oceania groups the same as in the CIA World Factbook. I could have combined Central Asia and South Asia to get more countries in each region, but these countries’ flags and histories had little in common. Appendix A has a full list of the countries, their country codes, and the region I assigned each to.
Here is the number of countries in each of my regions:
Region | Countries |
---|---|
Africa | 26 |
Australia - Oceania | 13 |
Central Asia | 6 |
East Asia/Southeast Asia | 17 |
Europe | 33 |
North/Central America | 23 |
Northern Africa/Middle East | 23 |
Northern Europe | 10 |
South America | 12 |
South Asia | 8 |
Western Africa | 22 |
Any image can be read in as a combination of red, green, and blue (RGB) values: each pixel is assigned 3 values, on a scale of either 0-1 or 0-255. For example, we can read in the U.S. flag using the `readJPEG()` function from the `jpeg` library and display its dimensions. The image is 263 pixels tall and 500 pixels wide, with 3 channels, stored as an array of 394,500 elements (263*500*3). We can then use the `rasterImage()` function from the `graphics` library to convert the RGB values back into an image and display it.
library(jpeg)  # provides readJPEG()
US <- readJPEG("attachments/flags/US-flag.jpg")  # height x width x 3 array, values in [0, 1]
dim(US)
## [1] 263 500 3
length(US)
## [1] 394500
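# Empty canvas matching the image's width and height, with no axes or frame
plot(x = c(0, 500), y = c(0, 263),
     type = "n", xaxt = "n", yaxt = "n", frame.plot = FALSE,
     xlab = "United States Flag",
     ylab = "")
rasterImage(US, 0, 0, 500, 263)  # draw the pixel array: xleft, ybottom, xright, ytop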
Since any image (not just flags) can be represented as combinations of red, green, and blue values, we can plot an image’s pixels on a 3D graph to see the spread of color values. The `plotPixels()` function from the `colordistance` package makes this easy. By giving the function the image path and a random sample size of n = 10,000 pixels, we can start to see the clusters of red, white, and blue values.
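library(colordistance)
set.seed(1776)  # reproducible random sample of pixels
# 3D scatter of 10,000 sampled pixels; lower/upper = NULL means no background pixels are masked out
plotPixels("attachments/flags/US-flag.jpg",
           lower = NULL, upper = NULL, n = 10000, pch = 10)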
Although our eyes see the U.S. flag as 3 colors (red, white, and blue), the computer reads its pixels as slight variations of these colors. For example, below are the 10 most common colors (as hex codes) and the number of pixels with each. The top color, #FFFFFF (white), makes up only 15.9% of the pixels (20,961 of 131,500). We also see near-whites such as #FFFEFF, #FFFDFF, and #FEFFFF, and variations of red such as #ED1B24, #EB1C22, and #F01A26.
## #FFFFFF #FFFEFF #ED1B24 #FFFDFF #FEFFFF #EB1C22 #F01A26 #EE1B21 #ED1B26 #EA1D22
## 20961 2947 2534 2248 2205 2063 2058 1745 1574 1403
## (Other)
## 91762
Counting the distinct hex codes, we find a total of 9,742 unique colors in this one image of the U.S. flag. Although our eyes recognize only 3 distinct colors, the computer reads in nearly ten thousand. This could be due to poor image quality and “noise,” or to pixels on the borders between red and white, blue and white, etc., which may contain a mix of these colors.
## [1] 9742
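Counting unique colors only needs base R: convert each pixel’s three channel values into one hex code with `rgb()`, then count distinct codes and tabulate the most frequent ones. As a minimal sketch, assuming an H x W x 3 array with values in [0, 1] (the layout `readJPEG()` returns), the example below uses a small synthetic two-stripe “flag” with slight random noise in place of the real image; the names `img`, `hex`, and `top10` are hypothetical.

```r
# Synthetic 20 x 30 "flag": red top stripe, white bottom stripe, plus small noise
set.seed(1)
h <- 20; w <- 30
img <- array(1, dim = c(h, w, 3))          # start all white
img[1:10, , 1] <- 0.93                     # red stripe: red channel
img[1:10, , 2] <- 0.11                     # red stripe: green channel
img[1:10, , 3] <- 0.14                     # red stripe: blue channel
noise <- array(runif(h * w * 3, -0.02, 0.02), dim = dim(img))
img <- pmin(pmax(img + noise, 0), 1)       # clamp back into [0, 1]

# One hex code per pixel (rgb() expects channel values in [0, 1] by default)
hex <- rgb(as.vector(img[, , 1]), as.vector(img[, , 2]), as.vector(img[, , 3]))

n_unique <- length(unique(hex))                        # distinct colors
top10 <- head(sort(table(hex), decreasing = TRUE), 10) # most frequent colors
n_unique
top10
```

Even though the synthetic image has only two intended colors, the noise alone produces many distinct hex codes, which mirrors what happens with the real flag.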
To solve the problem of having many unique colors when we expect only a few, one approach is to “bin” each pixel. The `getImageHist()` function from the `colordistance` package sets upper and lower limits for each of the RGB channels. For example, any pixel with R > 0.7, G > 0.7, and B > 0.7 could be labeled “White,” a pixel with R > 0.7, G < 0.3, and B < 0.3 could be “Red,” etc. Using more bins gives more color groups, each more specific; using fewer bins gives fewer color groups that are easier to define. Since most flags have no more than 4-5 recognizably different colors, and 8 color groups are easy to explain, I will use 2*2*2 bins. This means we use 0.5 as the cutoff value for each channel and bin each pixel as follows:
Defined Color Bins
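The 2*2*2 binning rule can be sketched in base R: each channel is compared with the 0.5 cutoff, and the three yes/no answers are combined into one group number. This is an illustrative re-implementation, not `colordistance::getImageHist()`. The group numbering is chosen to match the color groups used in this post, and pixels exactly at 0.5 are assumed to fall into the lower bin, which may differ from how the package treats boundaries.

```r
# Assign each pixel of an H x W x 3 array (values in [0, 1]) to one of 8 bins
bin_pixels <- function(img, cutoff = 0.5) {
  r <- as.vector(img[, , 1]) > cutoff
  g <- as.vector(img[, , 2]) > cutoff
  b <- as.vector(img[, , 3]) > cutoff
  # 1 = black, 2 = red, 3 = green, 4 = yellow,
  # 5 = blue, 6 = pink/purple, 7 = light blue-green, 8 = white
  1 + r + 2 * g + 4 * b
}

# Mean RGB and share of pixels for every occupied bin
summarise_bins <- function(img, cutoff = 0.5) {
  group <- bin_pixels(img, cutoff)
  px <- cbind(r = as.vector(img[, , 1]),
              g = as.vector(img[, , 2]),
              b = as.vector(img[, , 3]))
  groups <- sort(unique(group))
  data.frame(
    group = groups,
    r   = sapply(groups, function(k) mean(px[group == k, "r"])),
    g   = sapply(groups, function(k) mean(px[group == k, "g"])),
    b   = sapply(groups, function(k) mean(px[group == k, "b"])),
    pct = sapply(groups, function(k) mean(group == k))   # share of all pixels
  )
}

# Toy 4 x 4 image: top two rows red, bottom two rows white
img <- array(1, dim = c(4, 4, 3))
img[1:2, , 1]   <- 0.9
img[1:2, , 2:3] <- 0.1
summarise_bins(img)
```

Passing the array from `readJPEG()` as `img` would return one row per occupied bin, in the same shape as the binning table for the U.S. flag.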
After binning the U.S. flag into those 8 color groups (2 bins per channel), we see 3 main colors: red, white, and blue, with percentages that accurately reflect what our eyes can see.
## RGB and HSV are device-dependent, perceptually non-uniform color spaces. See 'Color spaces' vignette for more information.
##
## Using 2*2*2 = 8 bins
Another approach is k-means clustering. From the `colordistance` package, we can use the `getKMeanColors()` function, specifying the number of clusters. This function clusters the pixels and reports the average RGB values (cluster centers) that best describe the image. We can then tabulate these average RGB values, along with each cluster’s share of the pixels, using the `extractClusters()` function from the same package.
R | G | B | Pct |
---|---|---|---|
0.9210873 | 0.1233405 | 0.1571953 | 0.4110 |
0.0398685 | 0.4077117 | 0.6969841 | 0.1745 |
0.9791840 | 0.9687788 | 0.9715376 | 0.4145 |
The average RGB values and the area percentages of each color differ only slightly between the k-means approach (above) and the binning technique (below).
Bin | r | g | b | Pct |
---|---|---|---|---|
2 | 0.9197430 | 0.1154069 | 0.1490103 | 0.4076198 |
5 | 0.0245151 | 0.3995950 | 0.6941691 | 0.1702966 |
8 | 0.9803288 | 0.9635887 | 0.9663571 | 0.4141369 |
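A k-means fit like the one in the first table above can be sketched with base R’s `stats::kmeans()`: the cluster centers play the role of the average RGB values, and cluster sizes divided by the pixel count give the percentages. This calls `kmeans()` directly rather than `colordistance::getKMeanColors()`. It assumes `pixels` is an N x 3 matrix of RGB values in [0, 1]; here the pixels are simulated around three colors instead of read from the flag, so the numbers are illustrative only.

```r
# Simulated pixels around three colors (red, blue, white); sd = 0.02 noise
set.seed(1776)
centers_true <- rbind(c(0.92, 0.12, 0.16),   # red
                      c(0.04, 0.41, 0.70),   # blue
                      c(0.98, 0.97, 0.97))   # white
sizes <- c(400, 200, 400)
pixels <- do.call(rbind, lapply(1:3, function(i) {
  matrix(rnorm(sizes[i] * 3, mean = rep(centers_true[i, ], each = sizes[i]),
               sd = 0.02), ncol = 3)
}))
pixels <- pmin(pmax(pixels, 0), 1)          # keep simulated values in [0, 1]
colnames(pixels) <- c("R", "G", "B")

# Many random starts make it very likely every color gets its own cluster
fit <- kmeans(pixels, centers = 3, nstart = 50)
clusters <- data.frame(fit$centers, Pct = fit$size / nrow(pixels))
clusters
```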
When doing k-means clustering, a common way to choose the number of clusters is a scree plot and the “elbow” method. For example, we could cluster each flag with 1 cluster, 2 clusters, and so on up to 10 clusters, then plot the proportion of variance explained (between SS / total SS) against the number of clusters. The “elbow” is the number of clusters beyond which adding more clusters explains little additional variance. For the U.S. flag, the optimal number would be 3 clusters (red, white, and blue).
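The scree-plot computation can be sketched with base R’s `kmeans()`, which returns `betweenss` and `totss` directly. As an assumption for illustration, the pixels are simulated around three colors (red, white, and blue) rather than read from a flag image, so the curve should bend sharply at k = 3.

```r
# Simulated pixels around three colors; stand-in for a real flag image
set.seed(1776)
centers_true <- rbind(c(0.92, 0.12, 0.16),   # red
                      c(0.98, 0.97, 0.97),   # white
                      c(0.04, 0.41, 0.70))   # blue
sizes <- c(400, 400, 200)
pixels <- do.call(rbind, lapply(1:3, function(i) {
  matrix(rnorm(sizes[i] * 3, mean = rep(centers_true[i, ], each = sizes[i]),
               sd = 0.02), ncol = 3)
}))

# Proportion of variance explained for k = 1..10
explained <- sapply(1:10, function(k) {
  fit <- kmeans(pixels, centers = k, nstart = 50, iter.max = 50)
  fit$betweenss / fit$totss
})

# Scree plot: the elbow is where the curve flattens out
plot(1:10, explained, type = "b",
     xlab = "Number of clusters (k)", ylab = "Between SS / Total SS")
round(explained, 3)
```

With k = 1 the ratio is 0 by definition, since a single cluster has no between-cluster variation.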
With the U.S. flag it is obvious that there are only 3 colors, but some flags are harder to interpret. For example, the flag of Saint Pierre and Miquelon (a French archipelago) could reasonably be described with anywhere from 3 to 6 clusters. When there is no distinct “elbow,” the number of clusters is open to interpretation. Some statisticians choose the smallest number of clusters that explains at least 90% of the total variance. Others keep adding clusters only while each new cluster explains at least 10% more of the variance than the previous number of clusters.
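The two rules of thumb can be written as small functions over the vector of variance explained (between SS / total SS) for k = 1, 2, and so on. This is a sketch under two assumptions: the “10% more” rule is read as 10 percentage points of total variance, and the input vector below is a toy example for illustration, not computed from any flag.

```r
# Smallest k that explains at least `target` of the total variance (NA if none)
choose_k_threshold <- function(explained, target = 0.90) {
  which(explained >= target)[1]
}

# Stop at the first k where adding one more cluster gains less than `min_gain`
choose_k_gain <- function(explained, min_gain = 0.10) {
  gains <- diff(explained)              # gains[i] = explained[i + 1] - explained[i]
  stop_at <- which(gains < min_gain)[1]
  if (is.na(stop_at)) length(explained) else stop_at
}

# Toy values for k = 1..7, for illustration only
explained <- c(0, 0.55, 0.78, 0.86, 0.91, 0.94, 0.95)
choose_k_threshold(explained)   # 5
choose_k_gain(explained)        # 3
```

On the same curve the two rules disagree (5 versus 3 clusters), which is exactly the kind of subjectivity that arises when no clear elbow exists.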
This is the flag of Saint Pierre and Miquelon. Its complexity makes the number of clusters hard to read from the scree plot. Additionally, the scree plot shows the ratio of the between-cluster sum of squares to the total sum of squares. Because Total SS = Between SS + Within SS, this ratio equals 1 - (Within SS / Total SS), so it reflects how far points deviate from their cluster means only in aggregate, not cluster by cluster.
R | G | B | Pct |
---|---|---|---|
0.1450245 | 0.5314314 | 0.3917892 | 0.04000 |
0.9274081 | 0.8092170 | 0.1227936 | 0.13045 |
0.7619533 | 0.1788976 | 0.1808250 | 0.09695 |
0.1052367 | 0.6447611 | 0.8602119 | 0.51535 |
0.9540612 | 0.9696906 | 0.9719438 | 0.13445 |
0.0921948 | 0.0702046 | 0.0422374 | 0.08280 |
Although k-means is a viable approach, selecting the number of clusters involves some subjectivity, especially for more complex images. In my opinion, it isn’t as intuitive as the binning approach. That said, the binning approach is more “naive,” and selecting the cutoff values can also be somewhat subjective. Binning does not handle colors close to the cutoff values well: two shades of blue might look nearly identical but fall into different color groups. Perhaps a hybrid approach would be best: using binning for simpler flags and k-means for flags with colors close to the bin cutoff values.
Above, I outlined two approaches to identifying colors: binning and k-means clustering. To me, the binning approach is more intuitive and easier to explain. However, binning might not capture as much of the flag-to-flag variability, and pixels of a single color near a cutoff value might be split into two color groups, a case that k-means clustering handles better. Each approach handles certain flags better than the other. For this analysis, and for the sake of simplicity, I will use the binning approach with the 8 color groups outlined previously. After binning all 193 UN member flags, we can start to explore the flag colors.
For each color group, we can look at the flag with the most area in that color, shown below. For groups 2 (red), 3 (green), 4 (yellow), 5 (blue), and 7 (light blue-green), the flags and percentages make sense: China’s flag is 95.7% red, Saudi Arabia’s flag is 90.4% green, Brunei’s flag is 53.8% yellow, Nauru’s flag is 86.6% blue, and Micronesia’s flag is 92.9% light blue-green. However, color groups 1, 6, and 8 show the issues that arise with binning. Group 1 was supposed to represent black, but Turkmenistan’s flag is nowhere near 82.5% black. Its dark green has a green value of 0.43 < 0.5, so instead of landing in color group 3, it falls into color group 1. Similarly, Somalia’s flag is not 99.99% white (group 8), but its light blue has RGB values that are all > 0.5, so it technically belongs to that group. Afghanistan’s flag does not really contain 0.4% pink/purple (group 6); rather, pixels along the border between its red and green stripes are blended and land in this color group. These examples show where k-means might be a better approach.
ColorGroup | r | g | b | pct | Region | Country | Code |
---|---|---|---|---|---|---|---|
1 | 0.0091817 | 0.4276603 | 0.3155710 | 0.8252108 | Central Asia | Turkmenistan | TX |
2 | 0.9297423 | 0.1330955 | 0.1580773 | 0.9570149 | East Asia/Southeast Asia | China | CH |
3 | 0.0329333 | 0.5684799 | 0.2783894 | 0.9035003 | Northern Africa/Middle East | Saudi Arabia | SA |
4 | 0.9769573 | 0.8786282 | 0.0181822 | 0.5375680 | East Asia/Southeast Asia | Brunei | BX |
5 | 0.0045805 | 0.3976509 | 0.7013764 | 0.8660480 | Australia - Oceania | Nauru | NR |
6 | 0.9597217 | 0.4717743 | 0.5356438 | 0.0041904 | South Asia | Afghanistan | AF |
7 | 0.4818160 | 0.6861815 | 0.8707143 | 0.9289302 | Australia - Oceania | Micronesia, Federated States of | FM |
8 | 0.5855673 | 0.8584932 | 0.9717104 | 0.9999451 | Africa | Somalia | SO |
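Applying the 0.5 cutoff rule to the mean RGB values in the table above shows why dark green and light blue end up in the “black” and “white” groups. The helper `rgb_group()` is a hypothetical name for the same 1 + R + 2G + 4B numbering used in this post; the `margin` column, also added for illustration, is the distance of the channel closest to the cutoff, so small values flag fragile assignments.

```r
# Group number from three channel values, using the post's 8-group numbering
rgb_group <- function(r, g, b, cutoff = 0.5) {
  1 + (r > cutoff) + 2 * (g > cutoff) + 4 * (b > cutoff)
}

# Mean RGB values copied from the table above
top <- data.frame(
  Country = c("Turkmenistan", "China", "Saudi Arabia", "Brunei", "Nauru",
              "Afghanistan", "Micronesia, Federated States of", "Somalia"),
  r = c(0.0091817, 0.9297423, 0.0329333, 0.9769573, 0.0045805, 0.9597217, 0.4818160, 0.5855673),
  g = c(0.4276603, 0.1330955, 0.5684799, 0.8786282, 0.3976509, 0.4717743, 0.6861815, 0.8584932),
  b = c(0.3155710, 0.1580773, 0.2783894, 0.0181822, 0.7013764, 0.5356438, 0.8707143, 0.9717104)
)

top$group  <- with(top, rgb_group(r, g, b))
# Distance from the 0.5 cutoff of the channel nearest to it
top$margin <- with(top, pmin(abs(r - 0.5), abs(g - 0.5), abs(b - 0.5)))
top[, c("Country", "group", "margin")]
```

The recomputed groups run 1 through 8, matching the ColorGroup column. Micronesia’s red channel (0.48) sits within 0.02 of the cutoff, so a slightly lighter shade would have moved its blue into group 8 (white).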
Next, we can look at the distribution of each color group after filtering out any color that covers < 2% of a flag, which removes minor details and “noise.” In doing so, we notice that the pink/purple color (group 6) does not cover more than 2% of any country’s flag. Many flags have red and white, and relatively few use black (1) or light blue-green (7).
Also notice the large spikes in the 30-40% range. These indicate tricolor (three-colored) flags, which may have a roughly 33-33-33 split. Similarly, the red spike around 50% indicates bicolor (two-colored) flags.
Rather than looking at which colors are present, we now look at each flag’s #1, or most “dominant,” color: the color group with the largest area on the flag. These are the color groups we would pick if we had to select just one to represent each flag. Clearly, color group 2 (red) and color group 8 (white) are the most common dominant colors. These results are consistent with previous studies’ findings.
ColorGroup | n |
---|---|
1 | 17 |
2 | 59 |
3 | 21 |
4 | 22 |
5 | 19 |
7 | 12 |
8 | 46 |
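Finding each flag’s dominant color is a per-flag `which.max()` over the color-group areas. This sketch assumes a long table with one row per flag and color group (columns `Country`, `ColorGroup`, `pct`); the three flags are toy values labeled Flag A to C, not real countries. Note that `which.max()` breaks ties in favor of the lower group number.

```r
# Toy long table: one row per flag x color group (8 groups per flag)
bins <- data.frame(
  Country    = rep(c("Flag A", "Flag B", "Flag C"), each = 8),
  ColorGroup = rep(1:8, times = 3),
  pct = c(0.00, 0.41, 0.00, 0.00, 0.17, 0.00, 0.00, 0.42,
          0.01, 0.66, 0.00, 0.33, 0.00, 0.00, 0.00, 0.00,
          0.32, 0.30, 0.28, 0.03, 0.00, 0.00, 0.00, 0.07)
)

# Dominant color group = the group with the largest area on each flag
dominant <- sapply(split(bins, bins$Country),
                   function(d) d$ColorGroup[which.max(d$pct)])
dominant          # Flag A: 8 (white), Flag B: 2 (red), Flag C: 1 (black)
table(dominant)   # number of flags per dominant color group
```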
As we saw visually in the introduction, Western Africa has a large presence of yellows and greens, Northern Africa has a lot of black and white, South America has red and yellow, and South/East Asia has a lot of red and white. We see a similar pattern when looking at the #1 color in each country’s flag.
Next, we look at the distribution of the number of colors present in each flag. Using a cutoff value of 5% to keep only the main colors, we see that most countries have 3, 2, or 4 flag colors. The only 6-colored flag belongs to South Africa, which has black, yellow, green, white, red, and blue. Throughout Europe, every flag has exactly 2 or 3 colors. In North, Central, and South America, most countries have 3 colors. Northern Africa/Middle East, South Asia, and East Asia/Southeast Asia have similar distributions of 3, 2, and 4 colors. Australia/Oceania has a roughly uniform distribution of 2-, 3-, and 4-colored flags. Western Africa has a very large number of 3-colored flags.
n_colors | n_flags |
---|---|
1 | 5 |
2 | 53 |
3 | 103 |
4 | 23 |
5 | 8 |
6 | 1 |
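Counting colors per flag at the 5% cutoff, and building the 0/1 presence string that is concatenated in Excel in the next step, can both be sketched in base R. This assumes a long table with one row per flag and color group (`Country`, `ColorGroup`, `pct`); Flags A to C are toy values, not real countries, and `tapply()` with `paste()` is offered as an R alternative to the Excel step.

```r
# Toy long table: one row per flag x color group (8 groups per flag)
bins <- data.frame(
  Country    = rep(c("Flag A", "Flag B", "Flag C"), each = 8),
  ColorGroup = rep(1:8, times = 3),
  pct = c(0.00, 0.41, 0.00, 0.00, 0.17, 0.00, 0.00, 0.42,
          0.01, 0.66, 0.00, 0.33, 0.00, 0.00, 0.00, 0.00,
          0.32, 0.30, 0.28, 0.03, 0.00, 0.00, 0.00, 0.07)
)
bins <- bins[order(bins$Country, bins$ColorGroup), ]   # groups 1..8 within each flag
bins$present <- as.integer(bins$pct >= 0.05)           # 1 if the color covers >= 5%

n_colors <- tapply(bins$present, bins$Country, sum)    # colors per flag
table(n_colors)                                        # flags per color count

# Eight 0/1 values per flag, in group order, joined into one string
combo <- tapply(bins$present, bins$Country, paste, collapse = "")
sort(table(combo), decreasing = TRUE)                  # most common combinations
```

Flag C’s 3% yellow drops out at the 5% cutoff, so it counts as 4 colors rather than 5.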
After concatenating all of the 1’s and 0’s (presence in color groups) in Excel, we can summarise the most common color combinations as follows:
Above, we explored the most common flag colors, the colors that take up the most area on flags, color combinations, and the number of colors on each flag. Now we test whether any of these results differ significantly from region to region. Using a cutoff of 5%, we record whether each color is present in a flag (1 if it covers at least 5% of the flag, 0 otherwise).
## , , present = 0
##
## colorgroup
## regions 1 2 3 4 5 6 7 8
## Africa 14 6 8 11 19 26 20 10
## Australia - Oceania 12 5 11 7 5 13 9 4
## Central Asia 5 2 4 4 5 6 4 3
## East Asia/Southeast Asia 13 0 16 12 9 17 17 5
## Europe 30 5 28 21 21 33 29 13
## North/Central America 15 9 18 14 11 23 20 7
## Northern Africa/Middle East 14 2 13 22 22 23 21 4
## Northern Europe 9 4 8 7 4 10 10 2
## South America 11 3 8 6 7 12 10 5
## South Asia 5 2 4 5 7 8 8 3
## Western Africa 22 5 3 7 16 22 20 13
##
## , , present = 1
##
## colorgroup
## regions 1 2 3 4 5 6 7 8
## Africa 12 20 18 15 7 0 6 16
## Australia - Oceania 1 8 2 6 8 0 4 9
## Central Asia 1 4 2 2 1 0 2 3
## East Asia/Southeast Asia 4 17 1 5 8 0 0 12
## Europe 3 28 5 12 12 0 4 20
## North/Central America 8 14 5 9 12 0 3 16
## Northern Africa/Middle East 9 21 10 1 1 0 2 19
## Northern Europe 1 6 2 3 6 0 0 8
## South America 1 9 4 6 5 0 2 7
## South Asia 3 6 4 3 1 0 0 5
## Western Africa 0 17 19 15 6 0 2 9
Below, we go color by color and use a chi-square test to see whether certain regions have colors more or less present than others. We first report the chi-square test with 10 degrees of freedom ((11 regions - 1) × (2 outcomes - 1)), then repeat it with a simulated p-value (2,000 replicates), which does not rely on the large-sample approximation that small expected counts can undermine. Every p-value (non-simulated and simulated) is < 0.05, indicating an association between region and color presence within each color group. The mosaic plots help us understand which regions are more or less associated with the presence of each color. By looking at the mosaic plots, we see:
##
## Pearson's Chi-squared test
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 79.322, df = 10, p-value = 6.818e-13
##
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 79.322, df = NA, p-value = 0.0004998
##
## Pearson's Chi-squared test
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 69.931, df = 10, p-value = 4.572e-11
##
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 69.931, df = NA, p-value = 0.0004998
##
## Pearson's Chi-squared test
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 32.24, df = 10, p-value = 0.0003652
##
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 32.24, df = NA, p-value = 0.0004998
##
## Pearson's Chi-squared test
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 22.229, df = 10, p-value = 0.01398
##
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 22.229, df = NA, p-value = 0.01049
##
## Pearson's Chi-squared test
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 37.86, df = 10, p-value = 4.014e-05
##
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 37.86, df = NA, p-value = 0.0004998
##
## Pearson's Chi-squared test
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 193, df = 10, p-value < 2.2e-16
##
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 193, df = NA, p-value = 0.0004998
##
## Pearson's Chi-squared test
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 108.38, df = 10, p-value < 2.2e-16
##
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 108.38, df = NA, p-value = 0.0004998
##
## Pearson's Chi-squared test
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 36.109, df = 10, p-value = 8.063e-05
##
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: matrix(table2[, , color], nrow = 11, ncol = 2)
## X-squared = 36.109, df = NA, p-value = 0.0004998
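The per-color test can be sketched in base R for a single color group, using the black (group 1) counts from the `present = 0` and `present = 1` slices of the table above. This is an illustrative sketch and may not reproduce the exact setup that generated the printed results; the seed is an assumption, while `B = 2000` matches the reported number of replicates.

```r
# Region x presence counts for color group 1 (black), copied from the table above
regions <- c("Africa", "Australia - Oceania", "Central Asia",
             "East Asia/Southeast Asia", "Europe", "North/Central America",
             "Northern Africa/Middle East", "Northern Europe", "South America",
             "South Asia", "Western Africa")
black <- matrix(c(14, 12, 5, 13, 30, 15, 14, 9, 11, 5, 22,   # absent (< 5% of flag)
                  12,  1, 1,  4,  3,  8,  9, 1,  1, 3,  0),  # present (>= 5%)
                ncol = 2, dimnames = list(regions, c("absent", "present")))

# Asymptotic test, df = (11 - 1) * (2 - 1) = 10. Several expected counts are
# below 5, so R warns that the approximation may be incorrect.
asymptotic <- suppressWarnings(chisq.test(black))

# Monte Carlo p-value from 2,000 simulated tables with the same margins
set.seed(2020)
simulated <- chisq.test(black, simulate.p.value = TRUE, B = 2000)

c(X_squared    = unname(asymptotic$statistic),
  p_asymptotic = asymptotic$p.value,
  p_simulated  = simulated$p.value)

# Tile areas show each region's share of flags with and without black
mosaicplot(black, main = "Black (group 1) by region", las = 2)
```

With B = 2000 replicates, the smallest possible simulated p-value is 1/2001 ≈ 0.0005, which is why several simulated p-values above equal 0.0004998.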
The first logistic regression model uses the color group to predict whether a color is present (1) or not (0), ignoring the region’s effect altogether. The baseline color group is 1 (black). Looking at the estimates, we see significant p-values and positive coefficients for all of the other color groups except 6 (pink/purple) and 7 (light blue-green). This indicates that these 5 color groups (2, 3, 4, 5, and 8) are significantly more likely to be present in a flag than black. Light blue-green is significantly less likely to be present than black. The pink/purple coefficient is not significant, but its estimate is very negative (-16.3), which makes sense since pink/purple never covers more than 5% of any flag. Because the group is never present, its maximum-likelihood estimate diverges toward negative infinity, which also explains its huge standard error (284.8). We also note that AIC = 1589.8 and the residual deviance is 1573.8 on 1536 d.o.f.
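# Model 1: color group only (baseline = group 1, black); df3 has one row per flag x color group (193 x 8 = 1,544 rows)
fit1 <- glm(ColorPresent ~ as.factor(ColorGroup), family = "binomial", data = df3)
summary(fit1)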
##
## Call:
## glm(formula = ColorPresent ~ as.factor(ColorGroup), family = "binomial",
## data = df3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7329 -0.9235 -0.5267 0.9406 2.0218
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2494 0.1730 -7.223 5.09e-13 ***
## as.factor(ColorGroup)2 2.4989 0.2446 10.215 < 2e-16 ***
## as.factor(ColorGroup)3 0.7303 0.2282 3.200 0.001373 **
## as.factor(ColorGroup)4 0.8397 0.2270 3.699 0.000217 ***
## as.factor(ColorGroup)5 0.6178 0.2297 2.689 0.007162 **
## as.factor(ColorGroup)6 -16.3166 284.7721 -0.057 0.954308
## as.factor(ColorGroup)7 -0.6557 0.2755 -2.380 0.017300 *
## as.factor(ColorGroup)8 1.8356 0.2291 8.013 1.12e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2020.2 on 1543 degrees of freedom
## Residual deviance: 1573.8 on 1536 degrees of freedom
## AIC: 1589.8
##
## Number of Fisher Scoring iterations: 16
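By exponentiating the coefficients, we can write the first logistic regression model in terms of odds and odds ratios instead of log odds. With each color name as a 0/1 indicator (black is the baseline), \(\text{odds}(\text{color present}) = 0.287 \times 12.17^{\text{red}} \times 2.08^{\text{green}} \times 2.32^{\text{yellow}} \times 1.85^{\text{blue}} \times (8.2 \times 10^{-8})^{\text{pink/purple}} \times 0.52^{\text{light blue-green}} \times 6.27^{\text{white}}\). For example, the odds that a flag contains red are about 12.2 times the odds that it contains black.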
exp(coef(fit1))
## (Intercept) as.factor(ColorGroup)2 as.factor(ColorGroup)3
## 2.866667e-01 1.216874e+01 2.075726e+00
## as.factor(ColorGroup)4 as.factor(ColorGroup)5 as.factor(ColorGroup)6
## 2.315557e+00 1.854928e+00 8.199289e-08
## as.factor(ColorGroup)7 as.factor(ColorGroup)8
## 5.191030e-01 6.268959e+00
The second model adds region as a main effect, so its coefficients are log odds relative to the baseline: the (South/East) African region and color group 1 (black). Its AIC (1594.6) is higher than the first model’s (1589.8), even though the residual deviance drops to 1558.6 on 1526 d.o.f. This model does not seem to be better than the first. Perhaps the effect of color group depends on the region, which would call for an interaction term.
# Model 2: region and color group as main effects (baselines: Africa and group 1, black)
fit2 <- glm(ColorPresent ~ Region + as.factor(ColorGroup), family = "binomial", data = df3)
summary(fit2)
##
## Call:
## glm(formula = ColorPresent ~ Region + as.factor(ColorGroup),
## family = "binomial", data = df3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9674 -0.8837 -0.4632 0.8828 2.1526
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7504 0.2277 -3.296 0.000980 ***
## RegionAustralia - Oceania -0.4881 0.2870 -1.700 0.089054 .
## RegionCentral Asia -0.8007 0.3925 -2.040 0.041333 *
## RegionEast Asia/Southeast Asia -0.6031 0.2649 -2.277 0.022768 *
## RegionEurope -0.7662 0.2237 -3.425 0.000614 ***
## RegionNorth/Central America -0.4953 0.2415 -2.051 0.040267 *
## RegionNorthern Africa/Middle East -0.6219 0.2429 -2.560 0.010460 *
## RegionNorthern Europe -0.7252 0.3195 -2.270 0.023204 *
## RegionSouth America -0.5530 0.2960 -1.869 0.061691 .
## RegionSouth Asia -0.6139 0.3447 -1.781 0.074920 .
## RegionWestern Africa -0.3680 0.2433 -1.513 0.130371
## as.factor(ColorGroup)2 2.5298 0.2463 10.270 < 2e-16 ***
## as.factor(ColorGroup)3 0.7396 0.2297 3.220 0.001280 **
## as.factor(ColorGroup)4 0.8505 0.2285 3.722 0.000197 ***
## as.factor(ColorGroup)5 0.6256 0.2312 2.706 0.006810 **
## as.factor(ColorGroup)6 -17.3175 466.1327 -0.037 0.970364
## as.factor(ColorGroup)7 -0.6618 0.2767 -2.392 0.016772 *
## as.factor(ColorGroup)8 1.8598 0.2307 8.060 7.6e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2020.2 on 1543 degrees of freedom
## Residual deviance: 1558.6 on 1526 degrees of freedom
## AIC: 1594.6
##
## Number of Fisher Scoring iterations: 17
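The third model adds region-by-color interaction terms. All of its coefficients are log odds relative to the baseline: the (South/East) African region and color group 1 (black). This model has the lowest AIC (1552.7) and a residual deviance of 1376.7 on 1456 d.o.f. Several estimates have magnitudes of roughly 15 to 19; these come from region-color cells where a color is present in none (or all) of the region’s flags. For example, no Western African flag contains black, while every East Asian/Southeast Asian flag contains red, so those coefficients are not reliable. This model is seemingly better than the previous models based on AIC, but we should test the models to see whether any of them can significantly explain the presence (or absence) of a color.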
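# Model 3: Region * ColorGroup expands to both main effects plus their interaction
fit3 <- glm(ColorPresent ~ Region * as.factor(ColorGroup), family = "binomial", data = df3)
fit3  # printing the fit shows the coefficients, deviance, and AIC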
##
## Call: glm(formula = ColorPresent ~ Region * as.factor(ColorGroup),
## family = "binomial", data = df3)
##
## Coefficients:
## (Intercept)
## -0.15415
## RegionAustralia - Oceania
## -2.33076
## RegionCentral Asia
## -1.45529
## RegionEast Asia/Southeast Asia
## -1.02450
## RegionEurope
## -2.14843
## RegionNorth/Central America
## -0.47446
## RegionNorthern Africa/Middle East
## -0.28768
## RegionNorthern Europe
## -2.04307
## RegionSouth America
## -2.24374
## RegionSouth Asia
## -0.35667
## RegionWestern Africa
## -18.41192
## as.factor(ColorGroup)2
## 1.35812
## as.factor(ColorGroup)3
## 0.96508
## as.factor(ColorGroup)4
## 0.46431
## as.factor(ColorGroup)5
## -0.84438
## as.factor(ColorGroup)6
## -18.41192
## as.factor(ColorGroup)7
## -1.04982
## as.factor(ColorGroup)8
## 0.62415
## RegionAustralia - Oceania:as.factor(ColorGroup)2
## 1.59679
## RegionCentral Asia:as.factor(ColorGroup)2
## 0.94446
## RegionEast Asia/Southeast Asia:as.factor(ColorGroup)2
## 18.38660
## RegionEurope:as.factor(ColorGroup)2
## 2.66723
## RegionNorth/Central America:as.factor(ColorGroup)2
## -0.28768
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)2
## 1.43508
## RegionNorthern Europe:as.factor(ColorGroup)2
## 1.24457
## RegionSouth America:as.factor(ColorGroup)2
## 2.13838
## RegionSouth Asia:as.factor(ColorGroup)2
## 0.25131
## RegionWestern Africa:as.factor(ColorGroup)2
## 18.43172
## RegionAustralia - Oceania:as.factor(ColorGroup)3
## -0.18492
## RegionCentral Asia:as.factor(ColorGroup)3
## -0.04879
## RegionEast Asia/Southeast Asia:as.factor(ColorGroup)3
## -2.55901
## RegionEurope:as.factor(ColorGroup)3
## -0.38526
## RegionNorth/Central America:as.factor(ColorGroup)3
## -1.61741
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)3
## -0.78561
## RegionNorthern Europe:as.factor(ColorGroup)3
## -0.15415
## RegionSouth America:as.factor(ColorGroup)3
## 0.73967
## RegionSouth Asia:as.factor(ColorGroup)3
## -0.45426
## RegionWestern Africa:as.factor(ColorGroup)3
## 19.44681
## RegionAustralia - Oceania:as.factor(ColorGroup)4
## 1.86645
## RegionCentral Asia:as.factor(ColorGroup)4
## 0.45199
## RegionEast Asia/Southeast Asia:as.factor(ColorGroup)4
## -0.16112
## RegionEurope:as.factor(ColorGroup)4
## 1.27866
## RegionNorth/Central America:as.factor(ColorGroup)4
## -0.27753
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)4
## -3.11352
## RegionNorthern Europe:as.factor(ColorGroup)4
## 0.88562
## RegionSouth America:as.factor(ColorGroup)4
## 1.93359
## RegionSouth Asia:as.factor(ColorGroup)4
## -0.46431
## RegionWestern Africa:as.factor(ColorGroup)4
## 18.86390
## RegionAustralia - Oceania:as.factor(ColorGroup)5
## 3.79929
## RegionCentral Asia:as.factor(ColorGroup)5
## 0.84438
## RegionEast Asia/Southeast Asia:as.factor(ColorGroup)5
## 1.90525
## RegionEurope:as.factor(ColorGroup)5
## 2.58735
## RegionNorth/Central America:as.factor(ColorGroup)5
## 1.56000
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)5
## -1.80483
## RegionNorthern Europe:as.factor(ColorGroup)5
## 3.44707
## RegionSouth America:as.factor(ColorGroup)5
## 2.90580
## RegionSouth Asia:as.factor(ColorGroup)5
## -0.59071
## RegionWestern Africa:as.factor(ColorGroup)5
## 18.42962
## RegionAustralia - Oceania:as.factor(ColorGroup)6
## 2.33076
## RegionCentral Asia:as.factor(ColorGroup)6
## 1.45529
## RegionEast Asia/Southeast Asia:as.factor(ColorGroup)6
## 1.02450
## RegionEurope:as.factor(ColorGroup)6
## 2.14843
## RegionNorth/Central America:as.factor(ColorGroup)6
## 0.47446
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)6
## 0.28768
## RegionNorthern Europe:as.factor(ColorGroup)6
## 2.04307
## RegionSouth America:as.factor(ColorGroup)6
## 2.24374
## RegionSouth Asia:as.factor(ColorGroup)6
## 0.35667
## RegionWestern Africa:as.factor(ColorGroup)6
## 18.41192
## RegionAustralia - Oceania:as.factor(ColorGroup)7
## 2.72380
## RegionCentral Asia:as.factor(ColorGroup)7
## 1.96611
## RegionEast Asia/Southeast Asia:as.factor(ColorGroup)7
## -16.33759
## RegionEurope:as.factor(ColorGroup)7
## 1.37141
## RegionNorth/Central America:as.factor(ColorGroup)7
## -0.21869
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)7
## -0.85972
## RegionNorthern Europe:as.factor(ColorGroup)7
## -15.31902
## RegionSouth America:as.factor(ColorGroup)7
## 1.83828
## RegionSouth Asia:as.factor(ColorGroup)7
## -17.00542
## RegionWestern Africa:as.factor(ColorGroup)7
## 17.31331
## RegionAustralia - Oceania:as.factor(ColorGroup)8
## 2.67168
## RegionCentral Asia:as.factor(ColorGroup)8
## 0.98528
## RegionEast Asia/Southeast Asia:as.factor(ColorGroup)8
## 1.42997
## RegionEurope:as.factor(ColorGroup)8
## 2.10921
## RegionNorth/Central America:as.factor(ColorGroup)8
## 0.83113
## RegionNorthern Africa/Middle East:as.factor(ColorGroup)8
## 1.37582
## RegionNorthern Europe:as.factor(ColorGroup)8
## 2.95936
## RegionSouth America:as.factor(ColorGroup)8
## 2.11021
## RegionSouth Asia:as.factor(ColorGroup)8
## 0.39750
## RegionWestern Africa:as.factor(ColorGroup)8
## 17.57419
##
## Degrees of Freedom: 1543 Total (i.e. Null); 1456 Residual
## Null Deviance: 2020
## Residual Deviance: 1377 AIC: 1553
The third model, which includes an interaction between region and color, has the lowest AIC and the residual deviance closest to its residual degrees of freedom. However, none of the models has a significant p-value, so I would refrain from trying to predict the color presence of new flags, even given the region.
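The printed summary can be checked by hand. Below is a minimal base-R sketch that assumes the model is a logistic regression on a 0/1 color-presence indicator. The matching AIC supports this: for a 0/1 response the saturated log-likelihood is zero, so AIC is the residual deviance plus twice the number of estimated coefficients. The inputs are copied from the printed output. The design count assumes Region has 11 levels (reference Africa) and there are 8 color groups (reference group 1), which is inferred from the coefficient names rather than stated in the source.

```r
# Numbers copied from the printed glm output (deviances are rounded there)
df_null   <- 1543
df_resid  <- 1456
dev_resid <- 1377

# With an intercept, null df = n - 1, so the number of rows is:
n_obs <- df_null + 1                  # 1544, which equals 193 flags x 8 color groups

# Estimated coefficients = rows - residual df
n_params <- n_obs - df_resid          # 88

# Same count from the (inferred) design: intercept + 10 region dummies
# + 7 color-group dummies + 10 * 7 interaction terms
n_region <- 11
n_color  <- 8
n_design <- 1 + (n_region - 1) + (n_color - 1) + (n_region - 1) * (n_color - 1)

# For a 0/1 response: AIC = residual deviance + 2 * parameters
aic <- dev_resid + 2 * n_params       # 1553, matching the printed AIC

# The deviance-to-df comparison used in the text (only a rough heuristic for 0/1 data)
ratio <- dev_resid / df_resid
```

Each extra coefficient costs 2 AIC points. The interaction model can therefore have the lowest AIC despite its 88 parameters only because it lowers the residual deviance by more than that penalty.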
To extend this analysis, we could add more terms to the model to better explain how flags differ from region to region. Examples include the number of colors in the flag, the objects or shapes it contains, or whether its pattern is horizontal, vertical, or diagonal. A symmetry indicator might work better than the horizontal/vertical band terms. More color groups might help reduce the variance within each grouping, as might using k-means clustering rather than the binning technique explained previously. Lastly, since RGB values can be plotted intuitively in 3-D, I think an interactive 3-D plot (possibly a Shiny web app) would be a really cool way to explore the various flags.
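One way to make the symmetry-indicator idea concrete is a minimal base-R sketch that compares a flag with its own mirror image. It assumes a flag is stored as an h x w x 3 array of values in [0, 1], the shape `jpeg::readJPEG()` returns for an RGB JPEG. The helper names `mirror_diff()` and `is_symmetric()`, the mean-absolute-difference criterion, the 0.05 tolerance, and the two sample flags are all hypothetical choices, not part of the original analysis.

```r
# Hypothetical helper: average per-channel difference between a flag
# and its mirror image. `img` is an h x w x 3 array of RGB values in [0, 1].
mirror_diff <- function(img, direction = c("left-right", "up-down")) {
  direction <- match.arg(direction)
  h <- dim(img)[1]
  w <- dim(img)[2]
  if (direction == "left-right") {
    mirrored <- img[, w:1, , drop = FALSE]   # reverse the columns
  } else {
    mirrored <- img[h:1, , , drop = FALSE]   # reverse the rows
  }
  mean(abs(img - mirrored))
}

# Hypothetical indicator: symmetric if the average difference is below `tol`
is_symmetric <- function(img, direction = "left-right", tol = 0.05) {
  mirror_diff(img, direction) < tol
}

# Sample flag 1: 4 x 6 vertical tricolor (blue | white | red)
tricolor <- array(0, dim = c(4, 6, 3))
tricolor[, 1:2, 3] <- 1   # blue band: B channel on
tricolor[, 3:4, ]  <- 1   # white band: all channels on
tricolor[, 5:6, 1] <- 1   # red band: R channel on

# Sample flag 2: 6 x 4 horizontal triband (red / white / red)
triband <- array(0, dim = c(6, 4, 3))
triband[1:2, , 1] <- 1
triband[3:4, , ]  <- 1
triband[5:6, , 1] <- 1

is_symmetric(tricolor, "left-right")   # FALSE: blue and red bands swap
is_symmetric(tricolor, "up-down")      # TRUE
is_symmetric(triband, "left-right")    # TRUE
```

Because the comparison uses colors as well as layout, a blue-white-red tricolor counts as left-right asymmetric even though its band pattern is symmetric. A pattern-only indicator would need to compare band boundaries instead.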
At the beginning of our analysis, we discussed the country and region selection process. Slight differences in which flags and regions are selected could directly change the results of this flag analysis. After choosing the flags of the 193 U.N. members and the 11 regions outlined previously, we discussed how images can be read in as red, green, and blue (RGB) values. From there, we discussed two color identification methods: the binning method and the k-means clustering method. Since the human eye may struggle to identify which colors are present in a flag, we need a quantifiable way to identify colors, such as the binning or k-means method.
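To illustrate the k-means method, here is a minimal sketch using base R's `stats::kmeans()`. It runs on a hypothetical 4 x 6 blue/white/red flag built by hand, which stands in for an array read with `jpeg::readJPEG()`. The choice of k = 3 and the seed are assumptions, and this is not the original analysis's code.

```r
# Hypothetical 4 x 6 vertical tricolor (blue | white | red), stored as an
# h x w x 3 array with values in [0, 1]
img <- array(0, dim = c(4, 6, 3))
img[, 1:2, 3] <- 1   # blue band: B channel on
img[, 3:4, ]  <- 1   # white band: all channels on
img[, 5:6, 1] <- 1   # red band: R channel on

# One row per pixel, one column per channel (R, G, B)
pixels <- matrix(img, ncol = 3, dimnames = list(NULL, c("R", "G", "B")))

# Cluster the pixels into k colors; k must be chosen by the analyst
set.seed(1)
km <- kmeans(pixels, centers = 3)

# Each cluster center is an average color; cluster sizes are pixel counts
centers <- round(km$centers)
share   <- km$size / nrow(pixels)
```

On real flags, anti-aliased edges add intermediate shades, so the clusters depend on the chosen k and on the random starting centers. The bin boundaries in the binning method, by contrast, are fixed in advance.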
Personally, I thought the binning method was a little more intuitive, and 2x2x2 groupings seemed like an okay fit, so I went with it over the k-means method. From there, we found that the most common number of colors in a flag was 3, followed by 2. The most common color combination was red, white, and blue, followed by red and white. The red, green, and yellow color scheme was popular in Western Africa. The red, white, and black color scheme was popular in Northern Africa/Middle East. Lastly, the blue, red, and white color scheme was popular in Australia - Oceania.
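The 2x2x2 binning can be sketched in base R as follows. Each channel is split at 0.5, giving 2^3 = 8 bins, and a color counts as present if its bin holds more than some share of the pixels. This is a hypothetical re-implementation for illustration only; the original used the `colordistance` and `countcolors` packages. The 5% presence threshold, the bin labels, and the bin numbering are assumptions and need not match the ColorGroup codes in the model output.

```r
# Hypothetical 4 x 6 flag: dark red left half, off-white right half
img <- array(0, dim = c(4, 6, 3))
img[, 1:3, 1] <- 0.80; img[, 1:3, 2] <- 0.10; img[, 1:3, 3] <- 0.15
img[, 4:6, 1] <- 0.95; img[, 4:6, 2] <- 0.95; img[, 4:6, 3] <- 0.90

# One row per pixel, one column per channel (R, G, B)
pixels <- matrix(img, ncol = 3)

k <- 2                              # bins per channel (2 -> 2x2x2 = 8 bins)
b <- floor(pixels * k)              # bin index 0..k-1 per channel
b[b > k - 1] <- k - 1               # a value of exactly 1.0 would otherwise land in bin k

# Combine the three channel bins into a single bin number 1..k^3
group  <- b[, 1] * k^2 + b[, 2] * k + b[, 3] + 1
counts <- tabulate(group, nbins = k^3)
share  <- counts / sum(counts)

# Hypothetical labels, valid only for k = 2 and this numbering
bin_labels <- c("black", "blue", "green", "cyan",
                "red", "magenta", "yellow", "white")
present <- bin_labels[share > 0.05]   # 5% threshold is an assumption
present                               # "red" "white"
```

The dark red and off-white shades collapse into the pure "red" and "white" bins, which is the point of binning. Setting `k <- 3` gives 27 bins, one way to try more color groups, though `bin_labels` would then need 27 entries.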
- The `jpeg::readJPEG()` function was used to read in images of flags.
- The `colordistance` and `countcolors` packages made it really easy to bin colors, count the number of pixels within certain color groups, and display histograms or 3D plots.
- The `rvest` package was used to extract flag descriptions, countries, regions, and file paths from HTML files (downloaded from The 2020 CIA World Factbook).

If you are interested in this topic and want to see some cool visualizations or read more:

- The Washington Post posted 7 really cool visualizations that do a good job of summarizing the most common colors, shapes, and patterns in flags, as well as the history of flags.
- Visual Capitalist posted a really cool visualization that shows the common colors and patterns using a network map.
- The Flags of the World homepage. This website is a little outdated, but it has information on a wide variety of flags, not just national flags.
- Wikipedia - List of Flags by Color Combination includes flags of countries, states, and local municipalities.
Region | Code | Country | Year Joined UN |
---|---|---|---|
Africa | AO | Angola | 1976 |
Africa | BC | Botswana | 1966 |
Africa | BY | Burundi | 1962 |
Africa | CN | Comoros | 1975 |
Africa | CG | Congo, Democratic Republic of the | 1960 |
Africa | DJ | Djibouti | 1977 |
Africa | ER | Eritrea | 1993 |
Africa | WZ | Eswatini | 1968 |
Africa | ET | Ethiopia | 1945 |
Africa | KE | Kenya | 1963 |
Africa | LT | Lesotho | 1966 |
Africa | MA | Madagascar | 1960 |
Africa | MI | Malawi | 1964 |
Africa | MP | Mauritius | 1968 |
Africa | MZ | Mozambique | 1975 |
Africa | WA | Namibia | 1990 |
Africa | RW | Rwanda | 1962 |
Africa | TP | Sao Tome and Principe | 1975 |
Africa | SE | Seychelles | 1976 |
Africa | SO | Somalia | 1960 |
Africa | SF | South Africa | 1945 |
Africa | OD | South Sudan | 2011 |
Africa | TZ | Tanzania | 1961 |
Africa | UG | Uganda | 1962 |
Africa | ZA | Zambia | 1964 |
Africa | ZI | Zimbabwe | 1980 |
Australia - Oceania | AS | Australia | 1945 |
Australia - Oceania | FJ | Fiji | 1970 |
Australia - Oceania | KR | Kiribati | 1999 |
Australia - Oceania | RM | Marshall Islands | 1991 |
Australia - Oceania | FM | Micronesia, Federated States of | 1991 |
Australia - Oceania | NR | Nauru | 1999 |
Australia - Oceania | NZ | New Zealand | 1945 |
Australia - Oceania | PS | Palau | 1994 |
Australia - Oceania | WS | Samoa | 1976 |
Australia - Oceania | BP | Solomon Islands | 1978 |
Australia - Oceania | TN | Tonga | 1999 |
Australia - Oceania | TV | Tuvalu | 2000 |
Australia - Oceania | NH | Vanuatu | 1981 |
Central Asia | KZ | Kazakhstan | 1992 |
Central Asia | KG | Kyrgyzstan | 1992 |
Central Asia | RS | Russia | 1945 |
Central Asia | TI | Tajikistan | 1992 |
Central Asia | TX | Turkmenistan | 1992 |
Central Asia | UZ | Uzbekistan | 1992 |
East Asia/Southeast Asia | BX | Brunei | 1984 |
East Asia/Southeast Asia | BM | Burma (Myanmar) | 1948 |
East Asia/Southeast Asia | CB | Cambodia | 1955 |
East Asia/Southeast Asia | CH | China | 1945 |
East Asia/Southeast Asia | ID | Indonesia | 1950 |
East Asia/Southeast Asia | JA | Japan | 1956 |
East Asia/Southeast Asia | KN | Korea, North | 1991 |
East Asia/Southeast Asia | KS | Korea, South | 1991 |
East Asia/Southeast Asia | LA | Laos | 1955 |
East Asia/Southeast Asia | MY | Malaysia | 1957 |
East Asia/Southeast Asia | MG | Mongolia | 1961 |
East Asia/Southeast Asia | PP | Papua New Guinea | 1975 |
East Asia/Southeast Asia | RP | Philippines | 1945 |
East Asia/Southeast Asia | SN | Singapore | 1965 |
East Asia/Southeast Asia | TH | Thailand | 1946 |
East Asia/Southeast Asia | TT | Timor-Leste | 2002 |
East Asia/Southeast Asia | VM | Vietnam | 1977 |
Europe | AL | Albania | 1955 |
Europe | AN | Andorra | 1993 |
Europe | AU | Austria | 1955 |
Europe | BO | Belarus | 1945 |
Europe | BE | Belgium | 1945 |
Europe | BK | Bosnia and Herzegovina | 1992 |
Europe | BU | Bulgaria | 1955 |
Europe | HR | Croatia | 1992 |
Europe | CY | Cyprus | 1960 |
Europe | EZ | Czechia (Czech Republic) | 1993 |
Europe | FR | France | 1945 |
Europe | GM | Germany | 1973 |
Europe | GR | Greece | 1945 |
Europe | HU | Hungary | 1955 |
Europe | IT | Italy | 1955 |
Europe | LS | Liechtenstein | 1990 |
Europe | LU | Luxembourg | 1945 |
Europe | MT | Malta | 1964 |
Europe | MD | Moldova | 1992 |
Europe | MN | Monaco | 1993 |
Europe | MJ | Montenegro | 2006 |
Europe | NL | Netherlands | 1945 |
Europe | MK | North Macedonia | 1993 |
Europe | PL | Poland | 1945 |
Europe | PO | Portugal | 1955 |
Europe | RO | Romania | 1955 |
Europe | SM | San Marino | 1992 |
Europe | RI | Serbia | 2000 |
Europe | LO | Slovakia | 1993 |
Europe | SI | Slovenia | 1992 |
Europe | SP | Spain | 1955 |
Europe | SZ | Switzerland | 2002 |
Europe | UP | Ukraine | 1945 |
North/Central America | AC | Antigua and Barbuda | 1981 |
North/Central America | BF | Bahamas, The | 1973 |
North/Central America | BB | Barbados | 1966 |
North/Central America | BH | Belize | 1981 |
North/Central America | CA | Canada | 1945 |
North/Central America | CS | Costa Rica | 1945 |
North/Central America | CU | Cuba | 1945 |
North/Central America | DO | Dominica | 1978 |
North/Central America | DR | Dominican Republic | 1945 |
North/Central America | ES | El Salvador | 1945 |
North/Central America | GJ | Grenada | 1974 |
North/Central America | GT | Guatemala | 1945 |
North/Central America | HA | Haiti | 1945 |
North/Central America | HO | Honduras | 1945 |
North/Central America | JM | Jamaica | 1962 |
North/Central America | MX | Mexico | 1945 |
North/Central America | NU | Nicaragua | 1945 |
North/Central America | PM | Panama | 1945 |
North/Central America | SC | Saint Kitts and Nevis | 1983 |
North/Central America | ST | Saint Lucia | 1979 |
North/Central America | VC | Saint Vincent and the Grenadines | 1980 |
North/Central America | TD | Trinidad and Tobago | 1962 |
North/Central America | US | United States | 1945 |
Northern Africa/Middle East | AG | Algeria | 1962 |
Northern Africa/Middle East | AM | Armenia | 1992 |
Northern Africa/Middle East | AJ | Azerbaijan | 1992 |
Northern Africa/Middle East | BA | Bahrain | 1971 |
Northern Africa/Middle East | EG | Egypt | 1945 |
Northern Africa/Middle East | GG | Georgia | 1992 |
Northern Africa/Middle East | IR | Iran | 1945 |
Northern Africa/Middle East | IZ | Iraq | 1945 |
Northern Africa/Middle East | IS | Israel | 1949 |
Northern Africa/Middle East | JO | Jordan | 1955 |
Northern Africa/Middle East | KU | Kuwait | 1963 |
Northern Africa/Middle East | LE | Lebanon | 1945 |
Northern Africa/Middle East | LY | Libya | 1955 |
Northern Africa/Middle East | MO | Morocco | 1956 |
Northern Africa/Middle East | MU | Oman | 1971 |
Northern Africa/Middle East | QA | Qatar | 1971 |
Northern Africa/Middle East | SA | Saudi Arabia | 1945 |
Northern Africa/Middle East | SU | Sudan | 1956 |
Northern Africa/Middle East | SY | Syria | 1945 |
Northern Africa/Middle East | TS | Tunisia | 1956 |
Northern Africa/Middle East | TU | Turkey | 1945 |
Northern Africa/Middle East | AE | United Arab Emirates | 1971 |
Northern Africa/Middle East | YM | Yemen | 1947 |
Northern Europe | DA | Denmark | 1945 |
Northern Europe | EN | Estonia | 1991 |
Northern Europe | FI | Finland | 1955 |
Northern Europe | IC | Iceland | 1946 |
Northern Europe | EI | Ireland | 1955 |
Northern Europe | LG | Latvia | 1991 |
Northern Europe | LH | Lithuania | 1991 |
Northern Europe | NO | Norway | 1945 |
Northern Europe | SW | Sweden | 1946 |
Northern Europe | UK | United Kingdom | 1945 |
South America | AR | Argentina | 1945 |
South America | BL | Bolivia | 1945 |
South America | BR | Brazil | 1945 |
South America | CI | Chile | 1945 |
South America | CO | Colombia | 1945 |
South America | EC | Ecuador | 1945 |
South America | GY | Guyana | 1966 |
South America | PA | Paraguay | 1945 |
South America | PE | Peru | 1945 |
South America | NS | Suriname | 1975 |
South America | UY | Uruguay | 1945 |
South America | VE | Venezuela | 1945 |
South Asia | AF | Afghanistan | 1946 |
South Asia | BG | Bangladesh | 1974 |
South Asia | BT | Bhutan | 1971 |
South Asia | IN | India | 1945 |
South Asia | MV | Maldives | 1965 |
South Asia | NP | Nepal | 1955 |
South Asia | PK | Pakistan | 1947 |
South Asia | CE | Sri Lanka | 1955 |
Western Africa | BN | Benin | 1960 |
Western Africa | UV | Burkina Faso | 1960 |
Western Africa | CV | Cabo Verde | 1975 |
Western Africa | CM | Cameroon | 1960 |
Western Africa | CT | Central African Republic | 1960 |
Western Africa | CD | Chad | 1960 |
Western Africa | CF | Congo, Republic of the | 1960 |
Western Africa | IV | Cote d’Ivoire | 1960 |
Western Africa | EK | Equatorial Guinea | 1968 |
Western Africa | GB | Gabon | 1960 |
Western Africa | GA | Gambia, The | 1965 |
Western Africa | GH | Ghana | 1957 |
Western Africa | GV | Guinea | 1958 |
Western Africa | PU | Guinea-Bissau | 1974 |
Western Africa | LI | Liberia | 1945 |
Western Africa | ML | Mali | 1960 |
Western Africa | MR | Mauritania | 1961 |
Western Africa | NG | Niger | 1960 |
Western Africa | NI | Nigeria | 1960 |
Western Africa | SG | Senegal | 1960 |
Western Africa | SL | Sierra Leone | 1961 |
Western Africa | TO | Togo | 1960 |
Dependencies and Territories of The United States
The flags shown belong to the bolded territories and are arranged top-to-bottom and left-to-right. Note that 9 of the 14 territories do not have unique flags. Those with unique flags share common elements such as the red, white, and blue color scheme, red and white stripes, or the bald eagle.
Dependencies and Territories of The United Kingdom
The dependencies of the U.K. that have unique flags are bolded and shown above, arranged top-to-bottom and left-to-right. It is interesting how many of these flags have the U.K. flag in the canton (top-left corner), with the specific emblem or shield being the only difference. I really like the wavy stripes in the British Indian Ocean Territory flag, the castle in the Gibraltar flag, and the three legs in the Isle of Man flag.
Dependencies and Territories of Other Countries
Here are six flags from the dependencies of other countries. The top row shows the Cook Islands (New Zealand), Norfolk Island (Australia), and French Polynesia (France). The bottom row shows Sint Maarten (Netherlands), Niue (New Zealand), and the Faroe Islands (Denmark). Once again, we see the reach of British colonization: the U.K. flag appears in the canton of the Cook Islands and Niue flags.