First, we will look at the US Armed Forces data frame from Activity 8. The code used to wrangle the data can be found in the code appendix.
We will use two frequency tables of enlisted Army soldiers by rank, one for men and one for women, to see whether the distribution of ranks is connected to gender.
Table 1: Frequency table of male Army soldiers by Rank

| Rank | Army: count (% of male total) |
| --- | --- |
| Corporal OR Specialist | 79,234 (26.44%) |
| First Sergeant OR Master Sergeant | 9,482 (3.16%) |
| Private | 29,767 (9.93%) |
| Private First Class | 43,775 (14.61%) |
| Sergeant | 54,803 (18.29%) |
| Sergeant First Class | 30,264 (10.10%) |
| Sergeant Major OR Command Sergeant Major | 2,865 (0.96%) |
| Staff Sergeant | 49,502 (16.52%) |
| Total | 299,692 (100.00%) |
Fig 1. Frequency tables of men (top) and women (bottom) enlisted in the Army
Table 2: Frequency table of female Army soldiers by Rank

| Rank | Army: count (% of female total) |
| --- | --- |
| Corporal OR Specialist | 15,143 (27.22%) |
| First Sergeant OR Master Sergeant | 1,472 (2.65%) |
| Private | 5,662 (10.18%) |
| Private First Class | 10,229 (18.39%) |
| Sergeant | 10,954 (19.69%) |
| Sergeant First Class | 4,410 (7.93%) |
| Sergeant Major OR Command Sergeant Major | 394 (0.71%) |
| Staff Sergeant | 7,363 (13.24%) |
| Total | 55,627 (100.00%) |
The first table is for men and the second is for women. In each table, the Rank column gives the rank group, and the Army column gives the number of enlisted soldiers at that rank, followed in parentheses by that rank's share of all enlisted Army soldiers of that gender. For example, there are 29,767 male privates, which is 9.93% of all male enlisted Army soldiers. Overall, the proportions are very similar between genders, but at higher ranks such as Staff Sergeant (16.52% of men vs. 13.24% of women) and Sergeant First Class (10.10% vs. 7.93%), men have a slightly higher proportion. Conversely, a higher proportion of women are Private First Class (18.39% vs. 14.61%).
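The percentages can be rechecked directly from the counts. The sketch below is not the appendix code. It retypes the counts from the two tables into base R vectors and recomputes each rank's share within its own gender. The names `pctWithin` and `comparison` are made up for this illustration.

```r
# Counts of enlisted Army soldiers by rank, retyped from the two tables above
ranks <- c(
  "Corporal OR Specialist", "First Sergeant OR Master Sergeant",
  "Private", "Private First Class", "Sergeant", "Sergeant First Class",
  "Sergeant Major OR Command Sergeant Major", "Staff Sergeant"
)
male <- c(79234, 9482, 29767, 43775, 54803, 30264, 2865, 49502)
female <- c(15143, 1472, 5662, 10229, 10954, 4410, 394, 7363)

# Hypothetical helper: percent of a group's total that falls in each rank
pctWithin <- function(counts) {
  round(100 * counts / sum(counts), 2)
}

# Side-by-side percentages, plus the male minus female gap for each rank
comparison <- data.frame(
  Rank = ranks,
  MalePct = pctWithin(male),
  FemalePct = pctWithin(female)
)
comparison$Difference <- round(comparison$MalePct - comparison$FemalePct, 2)

print(comparison)
```

A positive Difference means the rank makes up a larger share of men than of women, which makes gaps like the ones at Staff Sergeant and Private First Class easy to spot in one place.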
Next we will look at a data visualization for baby names across a wide set of years. The code for the wrangling and production of this visualization can be found in the code appendix.
Fig 2. The baby names Mary, Thomas, James, and Anna plotted by year and count
With this visualization, we can see the popularity of four baby names from about 1875 to 2010. The y-axis shows the number of babies given each name in a year, and the x-axis shows the year. The legend on the right shows the color and line type used for each name. The graph shows that Anna was the least popular name overall and that James and Mary competed for the top spot, which makes sense because both are biblical names. All of the names besides Anna peak between 1940 and 1960, during the Baby Boomer generation.
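As a rough illustration of how a plot like this can be built, here is a base R sketch. It is not the code behind Fig 2. It assumes a data frame with columns `name`, `year`, and `count` (one row per name, sex, and year), and the helper names `countByYear` and `plotNames` are hypothetical.

```r
# Hypothetical helper: total babies per year for a chosen set of names.
# Assumes `data` has columns name, year, and count.
countByYear <- function(data, keep) {
  chosen <- data[data$name %in% keep, c("name", "year", "count")]
  # Add up rows that share a name and year (for example, both sexes)
  totals <- aggregate(count ~ name + year, data = chosen, FUN = sum)
  totals[order(totals$name, totals$year), ]
}

# Hypothetical helper: one line per name, with year on x and count on y.
plotNames <- function(totals) {
  nameLevels <- sort(unique(as.character(totals$name)))
  # Empty plot sized to the data, then one line per name
  plot(
    x = range(totals$year), y = range(totals$count), type = "n",
    xlab = "Year", ylab = "Number of babies"
  )
  for (i in seq_along(nameLevels)) {
    one <- totals[totals$name == nameLevels[i], ]
    lines(one$year, one$count, col = i, lty = i)
  }
  # The legend ties each color and line type to a name
  legend(
    "topright", legend = nameLevels,
    col = seq_along(nameLevels), lty = seq_along(nameLevels)
  )
}
```

With a suitable data frame, `plotNames(countByYear(babyData, c("Mary", "Thomas", "James", "Anna")))` would draw the four lines, where `babyData` stands in for whatever data frame holds the name counts.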
Next, we will look at a visualization for a math problem called the Box Problem. The code for this problem and the visualization can be found in the appendix.
Fig 3. The Box Problem When the Sheet of Paper is 36 x 48
From this visualization, we see that the maximum volume is 5239.8 cubic inches, from a cut of length 6.8 inches. In practical terms, we can maximize the volume of our box with a cut of about 7 inches. We can also see that the volume declines much more slowly for cuts longer than 6.8 inches than for cuts shorter than 6.8 inches.
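These numbers can be checked with a short calculation. The sketch below is not the appendix code. It assumes the usual setup of the Box Problem, where a square of side `cut` is removed from each corner of the 36 x 48 sheet and the sides are folded up, so the volume is cut × (36 − 2·cut) × (48 − 2·cut). The function name `boxVolume` and its arguments are made up for this illustration.

```r
# Volume of the open box after cutting a square of side `cut` from each corner
# (assumed setup; sheet dimensions default to the 36 x 48 sheet in Fig 3)
boxVolume <- function(cut, sheetWidth = 36, sheetLength = 48) {
  cut * (sheetWidth - 2 * cut) * (sheetLength - 2 * cut)
}

# The cut must be between 0 and half of the shorter side (18 inches)
best <- optimize(boxVolume, interval = c(0, 18), maximum = TRUE)

print(round(best$maximum, 1))    # best cut length: 6.8
print(round(best$objective, 1))  # maximum volume: 5239.8

# Moving 3 inches either side of the best cut shows the asymmetry:
# the shorter cut loses more volume than the longer cut
print(boxVolume(6.8 - 3))
print(boxVolume(6.8 + 3))
```

The last two lines compare cuts an equal distance below and above the best cut. The volume at the longer cut stays closer to the maximum, which matches the slower decline on the right side of the plot.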
Finally, let's talk about what I have learned in this class. I love sports statistics, and I like playing around with them in my spare time. This class has opened up that world to me so much more. I had some experience with data wrangling, but topics like the join functions never made sense until this class, and now I use them in personal projects. A really cool unit to me was Chartjunk. I knew when I saw a data visualization I liked, but I didn't know WHY I liked it. That unit helped put into words why some visualizations worked for me and some didn't. Now when I make my own, I try to incorporate those techniques as best I can.
Appendix A: Code
This code is for the Data Wrangling Section of the Armed Forces data.
library(googlesheets4)
library(tidyverse)

gs4_deauth()

## set the headers we will use
forcesHeaders <- read_sheet(
  ss = 'https://docs.google.com/spreadsheets/d/19xQnI1cBh6Jkw7eP8YQuuicMlVDF7Gr-nXCb5qbwb_E/edit?gid=597536282#gid=597536282',
  col_names = FALSE,
  n_max = 3
)

## get the unwrangled data
rawForces <- read_sheet(
  ss = 'https://docs.google.com/spreadsheets/d/19xQnI1cBh6Jkw7eP8YQuuicMlVDF7Gr-nXCb5qbwb_E/edit?gid=597536282#gid=597536282',
  col_names = FALSE,
  skip = 3,
  n_max = 28,
  col_types = 'c'
)

## set the branch names we will use
branchNames <- rep(
  x = c("Army", "Navy", "Marine Corps", "Air Force", "Space Force", "Total"),
  each = 3
)
tempHeaders <- paste(c("", branchNames), forcesHeaders[3, ], sep = ".")
names(rawForces) <- tempHeaders

## clean the data by renaming and getting the frequencies by parsing
cleanForces <- rawForces %>%
  rename(Pay.Grade = '.Pay Grade') %>%
  dplyr::select(!contains("Total")) %>%
  pivot_longer(
    cols = !Pay.Grade,
    names_to = "Branch.Sex",
    values_to = "Frequency"
  ) %>%
  separate_wider_delim(
    cols = Branch.Sex,
    delim = ".",
    names = c("Branch", "Sex")
  ) %>%
  mutate(
    Frequency = na_if(Frequency, y = "N/A*"),
    Frequency = parse_number(Frequency)
  )
Now here is the code for the frequency table of enlisted Army Soldiers.
Here is the code for wrangling and creating the visualization for BabyNames data.
Here is the code used for the function and plot of the box problem.