Image by Gordon Johnson from Pixabay
Download flag.csv and flag.names to your working directory. Make sure to set your working directory appropriately!
Let’s look at some information about this file. Open flag.names in RStudio by double clicking it in the Files pane (bottom right in the default layout). Read through this file.
Who is the donor of this data? Richard S. Forsyth
Is there any missing data? NO
# fill in your code here
setwd("C:/Users/StarKid/Desktop/Data_Science/Data_101/week_4/flag/flag")
flag_df <- read.csv("flag.csv")
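A hard-coded setwd() path only works on the machine it was written on. As a hedged alternative, the sketch below builds the file path with file.path() and checks that the file exists before reading it. The names data_dir, demo_path and demo_df, and the tiny stand-in file, are made up so the example runs anywhere; in this worksheet data_dir would be the folder that holds flag.csv.

```r
# Hypothetical folder; replace with the directory that holds flag.csv.
data_dir <- tempdir()

# Write a tiny stand-in file so the sketch is runnable on its own.
demo_path <- file.path(data_dir, "flag_demo.csv")
write.csv(data.frame(name = c("A", "B"), area = c(10, 20)),
          demo_path, row.names = FALSE)

# Fail early with a clear error if the file is not where we expect it.
stopifnot(file.exists(demo_path))

demo_df <- read.csv(demo_path)
dim(demo_df)  # 2 rows, 2 columns
```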
# fill in your code here
class(flag_df)
## [1] "data.frame"
is.data.frame(flag_df)
## [1] TRUE
# fill in your code here
head(flag_df,5)
## X name landmass zone area population language religion bars stripes
## 1 1 Afghanistan 5 1 648 16 10 2 0 3
## 2 2 Albania 3 1 29 3 6 6 0 0
## 3 3 Algeria 4 1 2388 20 8 2 2 0
## 4 4 American-Samoa 6 3 0 0 1 1 0 0
## 5 5 Andorra 3 1 0 0 6 0 3 0
## colours red green blue gold white black orange mainhue circles crosses
## 1 5 1 1 0 1 1 1 0 green 0 0
## 2 3 1 0 0 1 0 1 0 red 0 0
## 3 3 1 1 0 0 1 0 0 green 0 0
## 4 5 1 0 1 1 1 0 1 blue 0 0
## 5 3 1 0 1 1 0 0 0 gold 0 0
## saltires quarters sunstars crescent triangle icon animate text topleft
## 1 0 0 1 0 0 1 0 0 black
## 2 0 0 1 0 0 0 1 0 red
## 3 0 0 1 1 0 0 0 0 green
## 4 0 0 0 0 1 1 1 0 blue
## 5 0 0 0 0 0 0 0 0 blue
## botright
## 1 green
## 2 red
## 3 white
## 4 red
## 5 red
tail(flag_df,5)
## X name landmass zone area population language religion bars
## 190 190 Western-Samoa 6 3 3 0 1 1 0
## 191 191 Yugoslavia 3 1 256 22 6 6 0
## 192 192 Zaire 4 2 905 28 10 5 0
## 193 193 Zambia 4 2 753 6 10 5 3
## 194 194 Zimbabwe 4 2 391 8 10 5 0
## stripes colours red green blue gold white black orange mainhue circles
## 190 0 3 1 0 1 0 1 0 0 red 0
## 191 3 4 1 0 1 1 1 0 0 red 0
## 192 0 4 1 1 0 1 0 0 1 green 1
## 193 0 4 1 1 0 0 0 1 1 green 0
## 194 7 5 1 1 0 1 1 1 0 green 0
## crosses saltires quarters sunstars crescent triangle icon animate text
## 190 0 0 1 5 0 0 0 0 0
## 191 0 0 0 1 0 0 0 0 0
## 192 0 0 0 0 0 0 1 1 0
## 193 0 0 0 0 0 0 0 1 0
## 194 0 0 0 1 0 1 1 1 0
## topleft botright
## 190 blue red
## 191 blue red
## 192 green green
## 193 green brown
## 194 green green
# fill in your code here
summary(flag_df)
## X name landmass zone
## Min. : 1.00 Length:194 Min. :1.000 Min. :1.000
## 1st Qu.: 49.25 Class :character 1st Qu.:3.000 1st Qu.:1.000
## Median : 97.50 Mode :character Median :4.000 Median :2.000
## Mean : 97.50 Mean :3.572 Mean :2.211
## 3rd Qu.:145.75 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :194.00 Max. :6.000 Max. :4.000
## area population language religion
## Min. : 0.0 Min. : 0.00 Min. : 1.00 Min. :0.000
## 1st Qu.: 9.0 1st Qu.: 0.00 1st Qu.: 2.00 1st Qu.:1.000
## Median : 111.0 Median : 4.00 Median : 6.00 Median :1.000
## Mean : 700.0 Mean : 23.27 Mean : 5.34 Mean :2.191
## 3rd Qu.: 471.2 3rd Qu.: 14.00 3rd Qu.: 9.00 3rd Qu.:4.000
## Max. :22402.0 Max. :1008.00 Max. :10.00 Max. :7.000
## bars stripes colours red
## Min. :0.0000 Min. : 0.000 Min. :1.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.:3.000 1st Qu.:1.0000
## Median :0.0000 Median : 0.000 Median :3.000 Median :1.0000
## Mean :0.4536 Mean : 1.552 Mean :3.464 Mean :0.7887
## 3rd Qu.:0.0000 3rd Qu.: 3.000 3rd Qu.:4.000 3rd Qu.:1.0000
## Max. :5.0000 Max. :14.000 Max. :8.000 Max. :1.0000
## green blue gold white
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000
## Median :0.0000 Median :1.0000 Median :0.0000 Median :1.0000
## Mean :0.4691 Mean :0.5103 Mean :0.4691 Mean :0.7526
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## black orange mainhue circles
## Min. :0.000 Min. :0.000 Length:194 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.000 Class :character 1st Qu.:0.0000
## Median :0.000 Median :0.000 Mode :character Median :0.0000
## Mean :0.268 Mean :0.134 Mean :0.1701
## 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.0000
## Max. :1.000 Max. :1.000 Max. :4.0000
## crosses saltires quarters sunstars
## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.: 0.000
## Median :0.0000 Median :0.00000 Median :0.0000 Median : 0.000
## Mean :0.1495 Mean :0.09278 Mean :0.1495 Mean : 1.387
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.: 1.000
## Max. :2.0000 Max. :1.00000 Max. :4.0000 Max. :50.000
## crescent triangle icon animate
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.000
## Mean :0.0567 Mean :0.1392 Mean :0.2526 Mean :0.201
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.7500 3rd Qu.:0.000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.000
## text topleft botright
## Min. :0.00000 Length:194 Length:194
## 1st Qu.:0.00000 Class :character Class :character
## Median :0.00000 Mode :character Mode :character
## Mean :0.08247
## 3rd Qu.:0.00000
## Max. :1.00000
# fill in your code here
str(flag_df)
## 'data.frame': 194 obs. of 31 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ name : chr "Afghanistan" "Albania" "Algeria" "American-Samoa" ...
## $ landmass : int 5 3 4 6 3 4 1 1 2 2 ...
## $ zone : int 1 1 1 3 1 2 4 4 3 3 ...
## $ area : int 648 29 2388 0 0 1247 0 0 2777 2777 ...
## $ population: int 16 3 20 0 0 7 0 0 28 28 ...
## $ language : int 10 6 8 1 6 10 1 1 2 2 ...
## $ religion : int 2 6 2 1 0 5 1 1 0 0 ...
## $ bars : int 0 0 2 0 3 0 0 0 0 0 ...
## $ stripes : int 3 0 0 0 0 2 1 1 3 3 ...
## $ colours : int 5 3 3 5 3 3 3 5 2 3 ...
## $ red : int 1 1 1 1 1 1 0 1 0 0 ...
## $ green : int 1 0 1 0 0 0 0 0 0 0 ...
## $ blue : int 0 0 0 1 1 0 1 1 1 1 ...
## $ gold : int 1 1 0 1 1 1 0 1 0 1 ...
## $ white : int 1 0 1 1 0 0 1 1 1 1 ...
## $ black : int 1 1 0 0 0 1 0 1 0 0 ...
## $ orange : int 0 0 0 1 0 0 1 0 0 0 ...
## $ mainhue : chr "green" "red" "green" "blue" ...
## $ circles : int 0 0 0 0 0 0 0 0 0 0 ...
## $ crosses : int 0 0 0 0 0 0 0 0 0 0 ...
## $ saltires : int 0 0 0 0 0 0 0 0 0 0 ...
## $ quarters : int 0 0 0 0 0 0 0 0 0 0 ...
## $ sunstars : int 1 1 1 0 0 1 0 1 0 1 ...
## $ crescent : int 0 0 1 0 0 0 0 0 0 0 ...
## $ triangle : int 0 0 0 1 0 0 0 1 0 0 ...
## $ icon : int 1 0 0 1 0 1 0 0 0 0 ...
## $ animate : int 0 1 0 1 0 0 1 0 0 0 ...
## $ text : int 0 0 0 0 0 0 0 0 0 0 ...
## $ topleft : chr "black" "red" "green" "blue" ...
## $ botright : chr "green" "red" "white" "red" ...
We are going to use the dplyr package.
# fill in your code here
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tibble)
as_tibble(flag_df)
## # A tibble: 194 × 31
## X name landmass zone area population language religion bars stripes
## <int> <chr> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 1 Afghan… 5 1 648 16 10 2 0 3
## 2 2 Albania 3 1 29 3 6 6 0 0
## 3 3 Algeria 4 1 2388 20 8 2 2 0
## 4 4 Americ… 6 3 0 0 1 1 0 0
## 5 5 Andorra 3 1 0 0 6 0 3 0
## 6 6 Angola 4 2 1247 7 10 5 0 2
## 7 7 Anguil… 1 4 0 0 1 1 0 1
## 8 8 Antigu… 1 4 0 0 1 1 0 1
## 9 9 Argent… 2 3 2777 28 2 0 0 3
## 10 10 Argent… 2 3 2777 28 2 0 0 3
## # ℹ 184 more rows
## # ℹ 21 more variables: colours <int>, red <int>, green <int>, blue <int>,
## # gold <int>, white <int>, black <int>, orange <int>, mainhue <chr>,
## # circles <int>, crosses <int>, saltires <int>, quarters <int>,
## # sunstars <int>, crescent <int>, triangle <int>, icon <int>, animate <int>,
## # text <int>, topleft <chr>, botright <chr>
# fill in your code here
colnames(flag_df)
## [1] "X" "name" "landmass" "zone" "area"
## [6] "population" "language" "religion" "bars" "stripes"
## [11] "colours" "red" "green" "blue" "gold"
## [16] "white" "black" "orange" "mainhue" "circles"
## [21] "crosses" "saltires" "quarters" "sunstars" "crescent"
## [26] "triangle" "icon" "animate" "text" "topleft"
## [31] "botright"
Something should look strange about the first column name. Let’s investigate this.
# fill in your code here
colnames(flag_df)[1]
## [1] "X"
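# After read.csv() the first column is named "X" (not "...1", which is the name readr would give it)
view(flag_df$X)
What is in this first column? The row numbers 1 to 194.
Do we really need it? No. It only repeats the row numbers that R already assigns, and the rows are already sorted alphabetically by name, so we can drop it.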
# fill in your code here
flag_df <- flag_df[,-1]
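Where does a column called X come from? read.csv() assigns the name "X" to a column whose header cell is empty, which is what a file looks like when a data frame was written with its row names included. Whether flag.csv was produced that way is an assumption. The sketch below reproduces the effect on a made-up two-row CSV (csv_text is hypothetical) and shows two ways to handle it.

```r
# A made-up CSV whose first header cell is empty.
csv_text <- ",name,zone\n1,Afghanistan,1\n2,Albania,1\n"

with_index <- read.csv(text = csv_text)
colnames(with_index)  # "X" "name" "zone": the blank header becomes "X"

# Option 1: drop the column by name after reading.
dropped <- with_index[, colnames(with_index) != "X"]

# Option 2: treat the first column as row names while reading.
as_rownames <- read.csv(text = csv_text, row.names = 1)

colnames(dropped)      # "name" "zone"
colnames(as_rownames)  # "name" "zone"
```

Dropping by name, as in option 1, is a little safer than flag_df[,-1] because running it a second time does not remove another column.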
# fill in your code here
which(is.na(flag_df))
## integer(0)
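The integer(0) above means which() found no NA positions. A few other base R checks give the same information in a form that is easier to read column by column. The sketch uses a made-up data frame (toy) that does contain missing values, so the results are non-trivial.

```r
# Made-up data with one NA in each column.
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

any(is.na(toy))      # TRUE: at least one value is missing
sum(is.na(toy))      # 2: total number of missing values
colSums(is.na(toy))  # a = 1, b = 1: missing values per column
```

On flag_df, all three would be expected to report zero, matching the statement in flag.names that no values are missing.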
At this point, we know there are no missing values in the dataset so we will use dplyr to make the dataset a bit more readable to us. Look at the flag.names file again. Under “Attribute Information” look at the variables landmass, zone, language, religion.
Instead of encoding these categories using numbers, we would like to just use the categories in the variables. For example, in the zone column, we want our data to be “NE”, “SE”, “SW”, “NW”, instead of 1, 2, 3, 4.
# fill in your code here
flag_df$landmass <- factor(flag_df$landmass, levels = 1:6, labels = c("N.America", "S.America", "Europe", "Africa", "Asia", "Oceania"))
#view(flag_df$landmass)
flag_df$zone <- factor(flag_df$zone, levels = 1:4, labels = c("NE", "SE", "SW", "NW"))
#view(flag_df$zone)
flag_df$language <- factor(flag_df$language, levels = 1:10, labels = c("English", "Spanish", "French", "German", "Slavic", "Indo-European", "Chinese", "Arabic", "Japanese/Turkish/Finnish/Magyar", "Others"))
#view(flag_df$language)
flag_df$religion <-factor(flag_df$religion, levels = 0:7, labels = c("Catholic", "Other Christian", "Muslim", "Buddhist", "Hindu", "Ethnic", "Marxist", "Other"))
#view(flag_df$religion)
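To see what factor() does with levels and labels, here is a small sketch on a made-up vector of zone codes (zone_codes is hypothetical). Each code in levels is replaced by the label in the same position, and any code that is not listed in levels becomes NA, which is a quick way to spot unexpected values.

```r
# Made-up zone codes; 9 is deliberately not a valid code.
zone_codes <- c(1, 4, 2, 3, 1, 9)

zone_fct <- factor(zone_codes,
                   levels = 1:4,
                   labels = c("NE", "SE", "SW", "NW"))

as.character(zone_fct)            # "NE" "NW" "SE" "SW" "NE" NA
table(zone_fct, useNA = "ifany")  # counts per label, plus the NA
```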
Notice from our earlier structure command that the data types for columns red, green, blue, gold, white, black, orange, crescent, triangle, icon, animate, text are all integer. Looking at flag.names these integer variables are really just an encoding for true (1) or false (0). We don’t want to compute with these 1s and 0s (for example find a mean). So we should change these to logicals.
# fill in your code here
class(flag_df$red)
## [1] "integer"
flag_df$red <- as.logical(flag_df$red)
flag_df$green <- as.logical(flag_df$green)
flag_df$blue <- as.logical(flag_df$blue)
flag_df$gold <- as.logical(flag_df$gold)
flag_df$white <- as.logical(flag_df$white)
flag_df$black <- as.logical(flag_df$black)
flag_df$orange <- as.logical(flag_df$orange)
flag_df$crescent <- as.logical(flag_df$crescent)
flag_df$triangle <- as.logical(flag_df$triangle)
flag_df$icon <- as.logical(flag_df$icon)
flag_df$animate <- as.logical(flag_df$animate)
flag_df$text <- as.logical(flag_df$text)
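Writing one as.logical() line per column works, but the same conversion can be done in one step by listing the column names and applying the function to all of them. This is only a sketch on a made-up data frame (toy and flag_cols are hypothetical); for flag_df the vector would hold the twelve column names listed above.

```r
# Made-up data: two 0/1 indicator columns and one genuine count.
toy <- data.frame(name = c("A", "B", "C"),
                  red = c(1L, 0L, 1L),
                  green = c(0L, 0L, 1L),
                  stripes = c(3L, 0L, 2L))

# Columns that encode TRUE/FALSE as 1/0.
flag_cols <- c("red", "green")

# Convert only those columns; the others are left untouched.
toy[flag_cols] <- lapply(toy[flag_cols], as.logical)

sapply(toy, class)  # name: character, red/green: logical, stripes: integer
```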
Now that our data is clean, let’s answer some questions about it!
# fill in your code here
table(flag_df$mainhue)
##
## black blue brown gold green orange red white
## 5 40 2 19 31 4 71 22
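In the table above, red is the most common main hue (71 of the 194 flags). With many categories it can help to sort the counts and turn them into percentages. A minimal sketch on a made-up vector (hues is hypothetical):

```r
# Made-up main hues.
hues <- c("red", "blue", "red", "green", "blue", "red")

# Counts, largest first.
hue_counts <- sort(table(hues), decreasing = TRUE)
hue_counts

names(hue_counts)[1]                    # "red": the most common value
round(100 * prop.table(hue_counts), 1)  # share of each hue in percent
```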
# fill in your code here
# red, white and blue are logical now, so they can be combined directly
red_white_blue <- flag_df[flag_df$red & flag_df$white & flag_df$blue, ]
red_white_blue
## name landmass zone area population language
## 4 American-Samoa Oceania SW 0 0 English
## 8 Antigua-Barbuda N.America NW 0 0 English
## 11 Australia Oceania SE 7690 15 English
## 18 Belize N.America NW 23 0 English
## 20 Bermuda N.America NW 0 0 English
## 25 British-Virgin-Isles N.America NW 0 0 English
## 27 Bulgaria Europe NE 111 9 Slavic
## 29 Burma Asia NE 678 35 Others
## 34 Cayman-Islands N.America NW 0 0 English
## 35 Central-African-Republic Africa NE 623 2 Others
## 37 Chile S.America SW 757 11 Spanish
## 42 Cook-Islands Oceania SW 0 0 English
## 43 Costa-Rica N.America NW 51 2 Spanish
## 44 Cuba N.America NW 115 10 Spanish
## 46 Czechoslovakia Europe NE 128 15 Slavic
## 48 Djibouti Africa NE 22 0 French
## 49 Dominica N.America NW 0 0 English
## 50 Dominican-Republic N.America NW 49 6 Spanish
## 54 Equatorial-Guinea Africa NE 28 0 Others
## 56 Faeroes Europe NW 1 0 Indo-European
## 57 Falklands-Malvinas S.America SW 12 0 English
## 58 Fiji Oceania SE 18 1 English
## 60 France Europe NE 547 54 French
## 61 French-Guiana S.America NW 91 0 French
## 62 French-Polynesia Oceania SW 4 0 French
## 64 Gambia Africa NW 10 1 English
## 72 Guam Oceania NE 0 0 English
## 79 Hong-Kong Asia NE 1 5 Chinese
## 81 Iceland Europe NW 103 0 Indo-European
## 95 Kiribati Oceania NE 0 0 English
## 97 Laos Asia NE 236 3 Others
## 99 Lesotho Africa SE 30 1 Others
## 100 Liberia Africa NW 111 1 Others
## 103 Luxembourg Europe NE 3 0 German
## 106 Malaysia Asia NE 333 13 Others
## 117 Montserrat N.America NW 0 0 English
## 122 Netherlands Europe NE 41 14 Indo-European
## 123 Netherlands-Antilles N.America NW 0 0 Indo-European
## 124 New-Zealand Oceania SE 268 2 English
## 128 Niue Oceania SW 0 0 English
## 129 North-Korea Asia NE 121 18 Others
## 131 Norway Europe NE 324 4 Indo-European
## 134 Panama S.America NW 76 2 Spanish
## 136 Parguay S.America SW 407 3 Spanish
## 138 Philippines Oceania NE 300 48 Others
## 140 Portugal Europe NW 92 10 Indo-European
## 141 Puerto-Rico N.America NW 9 3 Spanish
## 143 Romania Europe NE 237 22 Indo-European
## 154 South-Africa Africa SE 1221 29 Indo-European
## 155 South-Korea Asia NE 99 39 Others
## 156 South-Yemen Asia NE 288 2 Arabic
## 159 St-Helena Africa SW 0 0 English
## 165 Swaziland Africa SE 17 1 Others
## 169 Taiwan Asia NE 36 18 Chinese
## 171 Thailand Asia NE 514 49 Others
## 177 Turks-Cocos-Islands N.America NW 0 0 English
## 178 Tuvalu Oceania SE 0 0 English
## 181 UK Europe NW 245 56 English
## 183 US-Virgin-Isles N.America NW 0 0 English
## 184 USA N.America NW 9363 231 English
## 188 Venezuela S.America NW 912 15 Spanish
## 190 Western-Samoa Oceania SW 3 0 English
## 191 Yugoslavia Europe NE 256 22 Indo-European
## religion bars stripes colours red green blue gold white black
## 4 Other Christian 0 0 5 TRUE FALSE TRUE TRUE TRUE FALSE
## 8 Other Christian 0 1 5 TRUE FALSE TRUE TRUE TRUE TRUE
## 11 Other Christian 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 18 Other Christian 0 2 8 TRUE TRUE TRUE TRUE TRUE TRUE
## 20 Other Christian 0 0 6 TRUE TRUE TRUE TRUE TRUE TRUE
## 25 Other Christian 0 0 6 TRUE TRUE TRUE TRUE TRUE FALSE
## 27 Marxist 0 3 5 TRUE TRUE TRUE TRUE TRUE FALSE
## 29 Buddhist 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 34 Other Christian 0 0 6 TRUE TRUE TRUE TRUE TRUE FALSE
## 35 Ethnic 1 0 5 TRUE TRUE TRUE TRUE TRUE FALSE
## 37 Catholic 0 2 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 42 Other Christian 0 0 4 TRUE FALSE TRUE FALSE TRUE FALSE
## 43 Catholic 0 5 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 44 Marxist 0 5 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 46 Marxist 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 48 Muslim 0 0 4 TRUE TRUE TRUE FALSE TRUE FALSE
## 49 Other Christian 0 0 6 TRUE TRUE TRUE TRUE TRUE TRUE
## 50 Catholic 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 54 Ethnic 0 3 4 TRUE TRUE TRUE FALSE TRUE FALSE
## 56 Other Christian 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 57 Other Christian 0 0 6 TRUE TRUE TRUE TRUE TRUE FALSE
## 58 Other Christian 0 0 7 TRUE TRUE TRUE TRUE TRUE FALSE
## 60 Catholic 3 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 61 Catholic 3 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 62 Catholic 0 3 5 TRUE FALSE TRUE TRUE TRUE TRUE
## 64 Ethnic 0 5 4 TRUE TRUE TRUE FALSE TRUE FALSE
## 72 Other Christian 0 0 7 TRUE TRUE TRUE TRUE TRUE FALSE
## 79 Buddhist 0 0 6 TRUE TRUE TRUE TRUE TRUE FALSE
## 81 Other Christian 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 95 Other Christian 0 0 4 TRUE FALSE TRUE TRUE TRUE FALSE
## 97 Marxist 0 3 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 99 Ethnic 2 0 4 TRUE TRUE TRUE FALSE TRUE FALSE
## 100 Ethnic 0 11 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 103 Catholic 0 3 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 106 Muslim 0 14 4 TRUE FALSE TRUE TRUE TRUE FALSE
## 117 Other Christian 0 0 7 TRUE TRUE TRUE TRUE TRUE TRUE
## 122 Other Christian 0 3 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 123 Other Christian 0 1 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 124 Other Christian 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 128 Other Christian 0 0 4 TRUE FALSE TRUE TRUE TRUE FALSE
## 129 Marxist 0 5 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 131 Other Christian 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 134 Catholic 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 136 Catholic 0 3 6 TRUE TRUE TRUE TRUE TRUE TRUE
## 138 Catholic 0 0 4 TRUE FALSE TRUE TRUE TRUE FALSE
## 140 Catholic 0 0 5 TRUE TRUE TRUE TRUE TRUE FALSE
## 141 Catholic 0 5 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 143 Marxist 3 0 7 TRUE TRUE TRUE TRUE TRUE FALSE
## 154 Other Christian 0 3 5 TRUE TRUE TRUE FALSE TRUE FALSE
## 155 Other 0 0 4 TRUE FALSE TRUE FALSE TRUE TRUE
## 156 Muslim 0 3 4 TRUE FALSE TRUE FALSE TRUE TRUE
## 159 Other Christian 0 0 7 TRUE TRUE TRUE TRUE TRUE FALSE
## 165 Other Christian 0 5 7 TRUE FALSE TRUE TRUE TRUE TRUE
## 169 Buddhist 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 171 Buddhist 0 5 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 177 Other Christian 0 0 6 TRUE TRUE TRUE TRUE TRUE FALSE
## 178 Other Christian 0 0 5 TRUE FALSE TRUE TRUE TRUE FALSE
## 181 Other Christian 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 183 Other Christian 0 0 6 TRUE TRUE TRUE TRUE TRUE FALSE
## 184 Other Christian 0 13 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 188 Catholic 0 3 7 TRUE TRUE TRUE TRUE TRUE TRUE
## 190 Other Christian 0 0 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 191 Marxist 0 3 4 TRUE FALSE TRUE TRUE TRUE FALSE
## orange mainhue circles crosses saltires quarters sunstars crescent triangle
## 4 TRUE blue 0 0 0 0 0 FALSE TRUE
## 8 FALSE red 0 0 0 0 1 FALSE TRUE
## 11 FALSE blue 0 1 1 1 6 FALSE FALSE
## 18 TRUE blue 1 0 0 0 0 FALSE FALSE
## 20 FALSE red 1 1 1 1 0 FALSE FALSE
## 25 TRUE blue 0 1 1 1 0 FALSE FALSE
## 27 FALSE red 0 0 0 0 1 FALSE FALSE
## 29 FALSE red 0 0 0 1 14 FALSE FALSE
## 34 TRUE blue 1 1 1 1 4 FALSE FALSE
## 35 FALSE gold 0 0 0 0 1 FALSE FALSE
## 37 FALSE red 0 0 0 1 1 FALSE FALSE
## 42 FALSE blue 1 1 1 1 15 FALSE FALSE
## 43 FALSE blue 0 0 0 0 0 FALSE FALSE
## 44 FALSE blue 0 0 0 0 1 FALSE TRUE
## 46 FALSE white 0 0 0 0 0 FALSE TRUE
## 48 FALSE blue 0 0 0 0 1 FALSE TRUE
## 49 FALSE green 1 0 0 0 10 FALSE FALSE
## 50 FALSE blue 0 1 0 0 0 FALSE FALSE
## 54 FALSE green 0 0 0 0 0 FALSE TRUE
## 56 FALSE white 0 1 0 0 0 FALSE FALSE
## 57 FALSE blue 1 1 1 1 0 FALSE FALSE
## 58 TRUE blue 0 2 1 1 0 FALSE FALSE
## 60 FALSE white 0 0 0 0 0 FALSE FALSE
## 61 FALSE white 0 0 0 0 0 FALSE FALSE
## 62 FALSE red 1 0 0 0 1 FALSE FALSE
## 64 FALSE red 0 0 0 0 0 FALSE FALSE
## 72 TRUE blue 0 0 0 0 0 FALSE FALSE
## 79 TRUE blue 1 1 1 1 0 FALSE FALSE
## 81 FALSE blue 0 1 0 0 0 FALSE FALSE
## 95 FALSE red 0 0 0 0 1 FALSE FALSE
## 97 FALSE red 1 0 0 0 0 FALSE FALSE
## 99 FALSE blue 0 0 0 0 0 FALSE FALSE
## 100 FALSE red 0 0 0 1 1 FALSE FALSE
## 103 FALSE red 0 0 0 0 0 FALSE FALSE
## 106 FALSE red 0 0 0 1 1 TRUE FALSE
## 117 FALSE blue 0 2 1 1 0 FALSE FALSE
## 122 FALSE red 0 0 0 0 0 FALSE FALSE
## 123 FALSE white 0 0 0 0 6 FALSE FALSE
## 124 FALSE blue 0 1 1 1 4 FALSE FALSE
## 128 FALSE gold 1 1 1 1 5 FALSE FALSE
## 129 FALSE blue 1 0 0 0 1 FALSE FALSE
## 131 FALSE red 0 1 0 0 0 FALSE FALSE
## 134 FALSE red 0 0 0 4 2 FALSE FALSE
## 136 FALSE red 1 0 0 0 1 FALSE FALSE
## 138 FALSE blue 0 0 0 0 4 FALSE TRUE
## 140 FALSE red 1 0 0 0 0 FALSE FALSE
## 141 FALSE red 0 0 0 0 1 FALSE TRUE
## 143 TRUE red 0 0 0 0 2 FALSE FALSE
## 154 TRUE orange 0 1 1 0 0 FALSE FALSE
## 155 FALSE white 1 0 0 0 0 FALSE FALSE
## 156 FALSE red 0 0 0 0 1 FALSE TRUE
## 159 TRUE blue 0 1 1 1 0 FALSE FALSE
## 165 TRUE blue 0 0 0 0 0 FALSE FALSE
## 169 FALSE red 1 0 0 1 1 FALSE FALSE
## 171 FALSE red 0 0 0 0 0 FALSE FALSE
## 177 TRUE blue 0 1 1 1 0 FALSE FALSE
## 178 FALSE blue 0 1 1 1 9 FALSE FALSE
## 181 FALSE red 0 1 1 0 0 FALSE FALSE
## 183 FALSE white 0 0 0 0 0 FALSE FALSE
## 184 FALSE white 0 0 0 1 50 FALSE FALSE
## 188 TRUE red 0 0 0 0 7 FALSE FALSE
## 190 FALSE red 0 0 0 1 5 FALSE FALSE
## 191 FALSE red 0 0 0 0 1 FALSE FALSE
## icon animate text topleft botright
## 4 TRUE TRUE FALSE blue red
## 8 FALSE FALSE FALSE black red
## 11 FALSE FALSE FALSE white blue
## 18 TRUE TRUE TRUE red red
## 20 TRUE TRUE FALSE white red
## 25 TRUE TRUE TRUE white blue
## 27 TRUE TRUE FALSE white red
## 29 TRUE TRUE FALSE blue red
## 34 TRUE TRUE TRUE white blue
## 35 FALSE FALSE FALSE blue gold
## 37 FALSE FALSE FALSE blue red
## 42 FALSE FALSE FALSE white blue
## 43 FALSE FALSE FALSE blue blue
## 44 FALSE FALSE FALSE blue blue
## 46 FALSE FALSE FALSE white red
## 48 FALSE FALSE FALSE white green
## 49 FALSE TRUE FALSE green green
## 50 FALSE FALSE FALSE blue blue
## 54 FALSE FALSE FALSE green red
## 56 FALSE FALSE FALSE white white
## 57 TRUE TRUE TRUE white blue
## 58 TRUE TRUE FALSE white blue
## 60 FALSE FALSE FALSE blue red
## 61 FALSE FALSE FALSE blue red
## 62 TRUE FALSE FALSE red red
## 64 FALSE FALSE FALSE red green
## 72 TRUE TRUE TRUE red red
## 79 TRUE TRUE TRUE white blue
## 81 FALSE FALSE FALSE blue blue
## 95 TRUE TRUE FALSE red blue
## 97 FALSE FALSE FALSE red red
## 99 TRUE FALSE FALSE green blue
## 100 FALSE FALSE FALSE blue red
## 103 FALSE FALSE FALSE red blue
## 106 FALSE FALSE FALSE blue white
## 117 TRUE TRUE FALSE white blue
## 122 FALSE FALSE FALSE red blue
## 123 FALSE FALSE FALSE white white
## 124 FALSE FALSE FALSE white blue
## 128 FALSE FALSE FALSE white gold
## 129 FALSE FALSE FALSE blue blue
## 131 FALSE FALSE FALSE red red
## 134 FALSE FALSE FALSE white white
## 136 TRUE TRUE TRUE red blue
## 138 FALSE FALSE FALSE blue red
## 140 TRUE FALSE FALSE green red
## 141 FALSE FALSE FALSE red red
## 143 TRUE TRUE TRUE blue red
## 154 FALSE FALSE FALSE orange blue
## 155 TRUE FALSE FALSE white white
## 156 FALSE FALSE FALSE red black
## 159 TRUE FALSE FALSE white blue
## 165 TRUE FALSE FALSE blue blue
## 169 FALSE FALSE FALSE blue red
## 171 FALSE FALSE FALSE red red
## 177 TRUE TRUE FALSE white blue
## 178 FALSE FALSE FALSE white blue
## 181 FALSE FALSE FALSE white red
## 183 TRUE TRUE TRUE white white
## 184 FALSE FALSE FALSE blue red
## 188 TRUE TRUE FALSE gold red
## 190 FALSE FALSE FALSE blue red
## 191 FALSE FALSE FALSE blue red
str(red_white_blue)
## 'data.frame': 63 obs. of 30 variables:
## $ name : chr "American-Samoa" "Antigua-Barbuda" "Australia" "Belize" ...
## $ landmass : Factor w/ 6 levels "N.America","S.America",..: 6 1 6 1 1 1 3 5 1 4 ...
## $ zone : Factor w/ 4 levels "NE","SE","SW",..: 3 4 2 4 4 4 1 1 4 1 ...
## $ area : int 0 0 7690 23 0 0 111 678 0 623 ...
## $ population: int 0 0 15 0 0 0 9 35 0 2 ...
## $ language : Factor w/ 10 levels "English","Spanish",..: 1 1 1 1 1 1 5 10 1 10 ...
## $ religion : Factor w/ 8 levels "Catholic","Other Christian",..: 2 2 2 2 2 2 7 4 2 6 ...
## $ bars : int 0 0 0 0 0 0 0 0 0 1 ...
## $ stripes : int 0 1 0 2 0 0 3 0 0 0 ...
## $ colours : int 5 5 3 8 6 6 5 3 6 5 ...
## $ red : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ green : logi FALSE FALSE FALSE TRUE TRUE TRUE ...
## $ blue : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ gold : logi TRUE TRUE FALSE TRUE TRUE TRUE ...
## $ white : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ black : logi FALSE TRUE FALSE TRUE TRUE FALSE ...
## $ orange : logi TRUE FALSE FALSE TRUE FALSE TRUE ...
## $ mainhue : chr "blue" "red" "blue" "blue" ...
## $ circles : int 0 0 0 1 1 0 0 0 1 0 ...
## $ crosses : int 0 0 1 0 1 1 0 0 1 0 ...
## $ saltires : int 0 0 1 0 1 1 0 0 1 0 ...
## $ quarters : int 0 0 1 0 1 1 0 1 1 0 ...
## $ sunstars : int 0 1 6 0 0 0 1 14 4 1 ...
## $ crescent : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ triangle : logi TRUE TRUE FALSE FALSE FALSE FALSE ...
## $ icon : logi TRUE FALSE FALSE TRUE TRUE TRUE ...
## $ animate : logi TRUE FALSE FALSE TRUE TRUE TRUE ...
## $ text : logi FALSE FALSE FALSE TRUE FALSE TRUE ...
## $ topleft : chr "blue" "black" "white" "red" ...
## $ botright : chr "red" "red" "blue" "red" ...
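The str() output above shows that 63 flags contain red, white and blue. Because the colour columns are logical, a filter like this needs no comparison with 1. The sketch below shows the same idea with subset() on a made-up data frame (toy is hypothetical).

```r
# Made-up flags with logical colour columns.
toy <- data.frame(name = c("A", "B", "C", "D"),
                  red = c(TRUE, TRUE, FALSE, TRUE),
                  white = c(TRUE, FALSE, TRUE, TRUE),
                  blue = c(TRUE, TRUE, TRUE, TRUE))

# Keep the rows where all three colours are present.
all_three <- subset(toy, red & white & blue)

all_three$name   # "A" "D"
nrow(all_three)  # 2
```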
# fill in your code here
str(flag_df)
## 'data.frame': 194 obs. of 30 variables:
## $ name : chr "Afghanistan" "Albania" "Algeria" "American-Samoa" ...
## $ landmass : Factor w/ 6 levels "N.America","S.America",..: 5 3 4 6 3 4 1 1 2 2 ...
## $ zone : Factor w/ 4 levels "NE","SE","SW",..: 1 1 1 3 1 2 4 4 3 3 ...
## $ area : int 648 29 2388 0 0 1247 0 0 2777 2777 ...
## $ population: int 16 3 20 0 0 7 0 0 28 28 ...
## $ language : Factor w/ 10 levels "English","Spanish",..: 10 6 8 1 6 10 1 1 2 2 ...
## $ religion : Factor w/ 8 levels "Catholic","Other Christian",..: 3 7 3 2 1 6 2 2 1 1 ...
## $ bars : int 0 0 2 0 3 0 0 0 0 0 ...
## $ stripes : int 3 0 0 0 0 2 1 1 3 3 ...
## $ colours : int 5 3 3 5 3 3 3 5 2 3 ...
## $ red : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ green : logi TRUE FALSE TRUE FALSE FALSE FALSE ...
## $ blue : logi FALSE FALSE FALSE TRUE TRUE FALSE ...
## $ gold : logi TRUE TRUE FALSE TRUE TRUE TRUE ...
## $ white : logi TRUE FALSE TRUE TRUE FALSE FALSE ...
## $ black : logi TRUE TRUE FALSE FALSE FALSE TRUE ...
## $ orange : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ mainhue : chr "green" "red" "green" "blue" ...
## $ circles : int 0 0 0 0 0 0 0 0 0 0 ...
## $ crosses : int 0 0 0 0 0 0 0 0 0 0 ...
## $ saltires : int 0 0 0 0 0 0 0 0 0 0 ...
## $ quarters : int 0 0 0 0 0 0 0 0 0 0 ...
## $ sunstars : int 1 1 1 0 0 1 0 1 0 1 ...
## $ crescent : logi FALSE FALSE TRUE FALSE FALSE FALSE ...
## $ triangle : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ icon : logi TRUE FALSE FALSE TRUE FALSE TRUE ...
## $ animate : logi FALSE TRUE FALSE TRUE FALSE FALSE ...
## $ text : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ topleft : chr "black" "red" "green" "blue" ...
## $ botright : chr "green" "red" "white" "red" ...
order(-flag_df$population)
## [1] 38 82 185 184 83 24 91 15 133 113 66 189 88 127 181 60 171 138
## [19] 52 176 84 155 157 139 29 55 154 9 10 39 192 32 143 191 3 118
## [37] 163 129 169 170 65 94 1 121 11 46 158 188 67 85 122 137 106 180
## [55] 119 37 80 17 44 69 140 168 27 104 130 147 12 31 51 73 166 194
## [73] 6 28 89 108 175 22 50 74 77 93 105 148 167 193 47 53 59 79
## [91] 126 144 153 30 36 78 87 131 2 19 86 97 98 101 125 135 136 141
## [109] 150 151 182 35 41 43 90 92 96 111 116 124 134 156 172 21 23 45
## [127] 58 63 64 75 76 99 100 112 132 165 174 179 4 5 7 8 13 14
## [145] 16 18 20 25 26 33 34 40 42 48 49 54 56 57 61 62 68 70
## [163] 71 72 81 95 102 103 107 109 110 114 115 117 120 123 128 142 145 146
## [181] 149 152 159 160 161 162 164 173 177 178 183 186 187 190
largest_population <- head(flag_df[order(-flag_df$population), ], 10)
largest_population
## name landmass zone area population language
## 38 China Asia NE 9561 1008 Chinese
## 82 India Asia NE 3268 684 Indo-European
## 185 USSR Asia NE 22402 274 Slavic
## 184 USA N.America NW 9363 231 English
## 83 Indonesia Oceania SE 1904 157 Others
## 24 Brazil S.America SW 8512 119 Indo-European
## 91 Japan Asia NE 372 118 Japanese/Turkish/Finnish/Magyar
## 15 Bangladesh Asia NE 143 90 Indo-European
## 133 Pakistan Asia NE 804 84 Indo-European
## 113 Mexico N.America NW 1973 77 Spanish
## religion bars stripes colours red green blue gold white black
## 38 Marxist 0 0 2 TRUE FALSE FALSE TRUE FALSE FALSE
## 82 Hindu 0 3 4 FALSE TRUE TRUE FALSE TRUE FALSE
## 185 Marxist 0 0 2 TRUE FALSE FALSE TRUE FALSE FALSE
## 184 Other Christian 0 13 3 TRUE FALSE TRUE FALSE TRUE FALSE
## 83 Muslim 0 2 2 TRUE FALSE FALSE FALSE TRUE FALSE
## 24 Catholic 0 0 4 FALSE TRUE TRUE TRUE TRUE FALSE
## 91 Other 0 0 2 TRUE FALSE FALSE FALSE TRUE FALSE
## 15 Muslim 0 0 2 TRUE TRUE FALSE FALSE FALSE FALSE
## 133 Muslim 1 0 2 FALSE TRUE FALSE FALSE TRUE FALSE
## 113 Catholic 3 0 4 TRUE TRUE FALSE FALSE TRUE FALSE
## orange mainhue circles crosses saltires quarters sunstars crescent triangle
## 38 FALSE red 0 0 0 0 5 FALSE FALSE
## 82 TRUE orange 1 0 0 0 0 FALSE FALSE
## 185 FALSE red 0 0 0 0 1 FALSE FALSE
## 184 FALSE white 0 0 0 1 50 FALSE FALSE
## 83 FALSE red 0 0 0 0 0 FALSE FALSE
## 24 FALSE green 1 0 0 0 22 FALSE FALSE
## 91 FALSE white 1 0 0 0 1 FALSE FALSE
## 15 FALSE green 1 0 0 0 0 FALSE FALSE
## 133 FALSE green 0 0 0 0 1 TRUE FALSE
## 113 TRUE green 0 0 0 0 0 FALSE FALSE
## icon animate text topleft botright
## 38 FALSE FALSE FALSE red red
## 82 TRUE FALSE FALSE orange green
## 185 TRUE FALSE FALSE red red
## 184 FALSE FALSE FALSE blue red
## 83 FALSE FALSE FALSE red white
## 24 FALSE FALSE TRUE green green
## 91 FALSE FALSE FALSE white white
## 15 FALSE FALSE FALSE green green
## 133 FALSE FALSE FALSE white green
## 113 FALSE TRUE FALSE green red
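The pattern used above is: compute the row order with order(), reorder the data frame, then keep the first rows with head(). Negating the column works for numbers; the decreasing argument does the same and also works for character columns. A small sketch on made-up data (toy is hypothetical):

```r
# Made-up populations.
toy <- data.frame(name = c("A", "B", "C", "D"),
                  population = c(5, 120, 40, 90))

# Row indices from largest to smallest population.
idx <- order(toy$population, decreasing = TRUE)

# Reorder the rows and keep the top two.
top2 <- head(toy[idx, ], 2)
top2$name  # "B" "D"
```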
Let’s see if we can find any patterns in the data.
Your output should be a data frame with each row corresponding to a group. There will be five columns.
Repeat this process except group by zone, language, and religion.
# You may find this function useful (i.e. you should call this function in your code)! It calculates the mode of a factor, so pass it a factor: for a character vector, levels() is NULL and the function returns NULL.
cat_mode <- function(cat_var){
mode_idx <- which.max(table(cat_var))
levels(cat_var)[mode_idx]
}
# fill in your code here
flag_by_landmass <- flag_df %>%
group_by(landmass) %>%
summarise(
mainhue_mode = cat_mode(mainhue),
sunstars_median = median(sunstars),
animate_img = sum(animate),
animate_flag_percent = mean(animate) * 100
)
flag_by_landmass
## # A tibble: 6 × 4
## landmass sunstars_median animate_img animate_flag_percent
## <fct> <dbl> <int> <dbl>
## 1 N.America 0 13 41.9
## 2 S.America 0 3 17.6
## 3 Europe 0 4 11.4
## 4 Africa 0 7 13.5
## 5 Asia 1 6 15.4
## 6 Oceania 2.5 6 30
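The tibble above has four columns rather than the five the exercise asks for: mainhue_mode is missing. A plausible explanation is that mainhue is still a character column (see the str() output earlier), so levels() inside cat_mode() returns NULL, and summarise() silently drops a summary whose value is NULL. The sketch below reproduces this on a made-up vector (hues is hypothetical, not taken from flag.csv) and shows one possible workaround: wrapping the input in factor().

```r
# The helper from the worksheet, unchanged.
cat_mode <- function(cat_var) {
  mode_idx <- which.max(table(cat_var))
  levels(cat_var)[mode_idx]
}

# Made-up main hues stored as plain character strings.
hues <- c("red", "blue", "red", "green")

cat_mode(hues)          # NULL: a character vector has no levels
cat_mode(factor(hues))  # "red"
```

In the pipeline above this would correspond to mainhue_mode = cat_mode(factor(mainhue)). The resulting values are not shown here because the summary has not been re-run.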
Do you see any patterns in flag mainhue, sun or star symbols, and animate images? If so, describe these patterns. (Hint: you should see patterns! Look at the trends when grouping by landmass, zone, language, and religion.) Write a paragraph to answer this question.
FILL IN YOUR ANSWER HERE