Final Project

Is there a relationship between a person’s rank in the military and their gender?

Introduction

Many people enlist in the military to serve their country, whether in the Army, Marines, Navy, Air Force, or Coast Guard. Once in the military, a person's rank can increase due to several factors, such as achievements, performance, and time served. One factor I wanted to examine is whether a person's gender plays a part in their rank.

The data set I use to answer this question is called military. The data were collected by the Department of Defense on 02-20-2012 and cover the Army, Navy, Air Force, and Marine Corps. The data set has 1,414,593 observations and 6 variables and can be found on OpenIntro at https://www.openintro.org/data/index.php?data=military.

Data Analysis

The variables I will be using are rank and gender. I treat both as categorical variables: gender is categorical, and rank, although stored as a number, is an ordered category.

rank: The person's numeric rank, with higher numbers meaning higher ranks
gender: The person's gender (female or male)

I first check the head and structure of the data set. Everything looks as expected, so I then check whether any of the columns contain NAs. None of them do, so little cleaning is needed.

head(military)

## # A tibble: 6 × 6
##   grade   branch gender race    hisp   rank
##   <chr>   <chr>  <chr>  <chr>   <lgl> <dbl>
## 1 officer army   male   ami/aln TRUE      2
## 2 officer army   male   ami/aln TRUE      2
## 3 officer army   male   ami/aln TRUE      5
## 4 officer army   male   ami/aln TRUE      5
## 5 officer army   male   ami/aln TRUE      5
## 6 officer army   male   ami/aln TRUE      5

str(military)

## spc_tbl_ [1,414,593 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ grade : chr [1:1414593] "officer" "officer" "officer" "officer" ...
##  $ branch: chr [1:1414593] "army" "army" "army" "army" ...
##  $ gender: chr [1:1414593] "male" "male" "male" "male" ...
##  $ race  : chr [1:1414593] "ami/aln" "ami/aln" "ami/aln" "ami/aln" ...
##  $ hisp  : logi [1:1414593] TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ rank  : num [1:1414593] 2 2 5 5 5 5 5 7 10 2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   grade = col_character(),
##   ..   branch = col_character(),
##   ..   gender = col_character(),
##   ..   race = col_character(),
##   ..   hisp = col_logical(),
##   ..   rank = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

colSums(is.na(military))

##  grade branch gender   race   hisp   rank 
##      0      0      0      0      0      0

Here I convert both variables to factors to make plotting easier. I also make a table for each to see the total count in each group.

library(dplyr)  # provides mutate() and case_when()

military <- military |>
  mutate(gender_factor = factor(gender),
         rank_factor = factor(rank))

table(military$gender_factor)

## 
##  female    male 
##  202718 1211875

table(military$rank_factor)

## 
##      1      2      3      4      5      6      7      8      9     10     11 
##      9  76159     38 113016 301070 324590 279320 183252  98525  28093  10521

Here I group the ranks into five groups: "1-3", "4-5", "6-7", "8-9", and "10-11". I do this because ranks 1 and 3 have very few people (9 and 38), and for rank 1 in particular the expected count for women fell below 5. The usual rule of thumb for the chi-squared test is that every expected count should be at least 5, so without grouping the results would be unreliable.

military <- military |>
  mutate(
    rank_recoded = case_when(
      rank_factor %in% c("1","2","3") ~ "1-3",
      rank_factor %in% c("4","5") ~ "4-5",
      rank_factor %in% c("6","7") ~ "6-7",
      rank_factor %in% c("8","9") ~ "8-9",
      rank_factor %in% c("10","11") ~ "10-11"),
    rank_recoded = factor(rank_recoded,
                            levels = c("1-3", "4-5", "6-7", "8-9", "10-11")))

table(military$rank_recoded)

## 
##    1-3    4-5    6-7    8-9  10-11 
##  76206 414086 603910 281777  38614
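The grouping decision can be checked directly from the marginal totals above. The sketch below is a minimal illustration, assuming the gender and rank counts printed earlier are copied in by hand; it computes the expected count for each gender-by-rank cell under independence (row total times column total divided by the number of people), flags cells below 5, and then rebuilds the five groups with base R's cut() as an alternative to case_when(). The object names (gender_totals, rank_totals, and so on) are hypothetical and only used for this example.

```r
# Marginal totals copied from the gender and rank tables above
# (hardcoded so this sketch runs without the military data set).
gender_totals <- c(female = 202718, male = 1211875)
rank_totals <- c(`1` = 9, `2` = 76159, `3` = 38, `4` = 113016,
                 `5` = 301070, `6` = 324590, `7` = 279320,
                 `8` = 183252, `9` = 98525, `10` = 28093, `11` = 10521)
n <- sum(gender_totals)

# Expected count for each gender x rank cell if gender and rank were independent:
# (row total * column total) / n
expected_ungrouped <- outer(gender_totals, rank_totals) / n

# Cells that break the "expected count of at least 5" rule of thumb
which(expected_ungrouped < 5, arr.ind = TRUE)

# The same five groups as the case_when() recoding, built with cut():
# right-closed intervals (0,3], (3,5], (5,7], (7,9], (9,11]
rank_group <- cut(1:11, breaks = c(0, 3, 5, 7, 9, 11),
                  labels = c("1-3", "4-5", "6-7", "8-9", "10-11"))
group_totals <- tapply(rank_totals, rank_group, sum)
group_totals

# Expected counts after grouping; all of them should now be at least 5
expected_grouped <- outer(gender_totals, group_totals) / n
all(expected_grouped >= 5)
```

With these totals the only cell below 5 is women at rank 1 (about 1.29 expected), and after grouping every expected count is well above 5.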

Here I create bar plots showing the number of people of each gender and in each rank. A third bar plot shows the number of each gender within each rank, with pink for women and light blue for men. A fourth shows the number of people in each rank group. I also made one showing the count of each rank within each gender for fun, but it is harder to read because eleven ranks have to be told apart by color.

barplot(table(military$gender_factor),
        main = "Total Count by Gender",
        xlab = "Gender",
        ylab = "Count",
        col = "purple")

barplot(table(military$rank_factor),
        main = "Total Count by Rank",
        xlab = "Rank",
        ylab = "Count",
        col = "violet")

barplot(table(military$gender_factor, military$rank_factor),
        beside = TRUE,
        main = "Count by Gender within Each Rank",
        xlab = "Rank",
        ylab = "Count",
        col = c("pink", "lightblue"),
        legend.text = TRUE)  # legend labels come from the table's row names

barplot(table(military$rank_recoded),
        main = "Total Count by Rank Group",
        xlab = "Rank Group",
        ylab = "Count",
        col = "maroon")

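barplot(table(military$rank_factor, military$gender_factor),
        beside = TRUE,
        main = "Count by Rank within Each Gender",
        xlab = "Gender",
        ylab = "Count",
        # one color per rank: there are 11 ranks, so 11 colors are needed
        col = c("black", "red", "orange", "yellow", "green", "blue",
                "lightblue", "purple", "magenta", "grey", "brown"),
        legend.text = TRUE,
        args.legend = list(x = "topleft", ncol = 2, title = "Rank"))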
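Because there are roughly six times as many men as women in the data, bars of raw counts make it hard to compare the two genders' rank distributions. One option, sketched below, is to plot proportions within each gender using prop.table(). This assumes the gender-by-rank-group counts from the chi-squared section are hardcoded so the sketch runs on its own; the object names are illustrative and not part of the original analysis.

```r
# Observed gender x rank-group counts, copied from the cross-tabulation
# in the Statistical Analysis section.
observed <- matrix(c(12359, 64196, 88155, 34755, 3253,
                     63847, 349890, 515755, 247022, 35361),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(gender = c("female", "male"),
                                   rank = c("1-3", "4-5", "6-7", "8-9", "10-11")))

# margin = 1 divides each row by its row total, so each gender sums to 1
within_gender <- prop.table(observed, margin = 1)
round(within_gender, 3)

barplot(within_gender,
        beside = TRUE,
        main = "Share of Each Gender in Each Rank Group",
        xlab = "Rank Group",
        ylab = "Proportion within gender",
        col = c("pink", "lightblue"),
        legend.text = TRUE)
```

Plotting proportions puts both genders on the same scale, so differences in shape are not hidden by the difference in group size.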

Statistical Analysis

To answer my question, I perform a chi-squared test for association at the 5% significance level.

Hypothesis

\(H_0\): A person’s rank in the military is not associated with their gender

\(H_a\): A person’s rank in the military is associated with their gender

α = 0.05
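For reference, the test compares the observed count \(O_{ij}\) for gender i and rank group j with the count \(E_{ij}\) expected if \(H_0\) were true. Here \(R_i\) and \(C_j\) are the row and column totals, \(n\) is the total number of people, and \(r\) and \(c\) are the numbers of rows and columns; these symbols are introduced only for this illustration:

\[
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (r - 1)(c - 1)
\]

With \(r = 2\) genders and \(c = 5\) rank groups, this gives df = (2 - 1)(5 - 1) = 4.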

observed_dataset <- table(military$gender, military$rank_recoded)
observed_dataset

##         
##             1-3    4-5    6-7    8-9  10-11
##   female  12359  64196  88155  34755   3253
##   male    63847 349890 515755 247022  35361

test <- chisq.test(observed_dataset)

test

## 
##  Pearson's Chi-squared test
## 
## data:  observed_dataset
## X-squared = 2731.7, df = 4, p-value < 2.2e-16

test$expected

##         
##               1-3       4-5       6-7    8-9     10-11
##   female 10920.69  59340.52  86543.22  40380  5533.572
##   male   65285.31 354745.48 517366.78 241397 33080.428

test$statistic

## X-squared 
##  2731.683

All of the expected counts are greater than 5, so the conditions for the chi-squared test are met and its results, such as the p-value, are reliable.
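To see where test$expected and test$statistic come from, the sketch below recomputes them by hand, assuming the observed counts printed above are copied into a matrix. Each expected count is the row total times the column total divided by n, and the statistic sums (observed - expected)^2 / expected over all ten cells. The variable names are hypothetical.

```r
# Observed gender x rank-group counts, copied from the table above
observed <- matrix(c(12359, 64196, 88155, 34755, 3253,
                     63847, 349890, 515755, 247022, 35361),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("female", "male"),
                                   c("1-3", "4-5", "6-7", "8-9", "10-11")))

n <- sum(observed)

# Expected counts under H0: (row total * column total) / n
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson's statistic: sum over all cells of (O - E)^2 / E
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1)(c - 1)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

round(expected, 2)
x2
dof
p_value  # far below 2.2e-16; it may even print as 0

# Should agree with the statistic reported by chisq.test()
all.equal(x2, unname(chisq.test(observed)$statistic))
```

The hand-computed values should match test$expected and test$statistic above, since chisq.test() applies no continuity correction to tables larger than 2 x 2.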

The chi-squared test gives X-squared = 2731.7 with df = (2 - 1)(5 - 1) = 4 and a p-value < 2.2e-16. R prints "< 2.2e-16" whenever a p-value is smaller than machine precision, so the actual p-value is far below α = 0.05 and the result is statistically significant. We therefore reject the null hypothesis: at the 0.05 significance level, we have enough evidence of an association between a person's rank in the military and their gender.
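With 1,414,593 people, even a small difference between groups produces a tiny p-value, so the p-value alone says little about how strong the association is. One common effect-size measure, not part of the original analysis, is Cramér's V, \(\sqrt{X^2 / (n(k - 1))}\), where k is the smaller of the number of rows and columns. A minimal sketch, assuming the observed counts above are copied in by hand:

```r
# Observed gender x rank-group counts, copied from the table above
observed <- matrix(c(12359, 64196, 88155, 34755, 3253,
                     63847, 349890, 515755, 247022, 35361),
                   nrow = 2, byrow = TRUE)

# Pearson chi-squared statistic (no continuity correction is applied
# to tables larger than 2 x 2)
x2 <- unname(chisq.test(observed)$statistic)
n <- sum(observed)

# k is the smaller of the number of rows and columns (here 2)
k <- min(dim(observed))

# Cramer's V = sqrt(X^2 / (n * (k - 1))), between 0 and 1
cramers_v <- sqrt(x2 / (n * (k - 1)))
round(cramers_v, 3)
```

For this table V comes out to about 0.044. Since V ranges from 0 (no association) to 1, this suggests that the association, while statistically significant, is weak; conventions for what counts as a small effect vary by field.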

Conclusion and Future Directions

After performing the chi-squared test for association, I conclude that there is strong evidence that a person's gender is associated with their rank in the military. The test gave a p-value < 2.2e-16, which is enough evidence to say there is an association between the two, although a test of association alone does not show why the difference exists. Looking back at the bar plot of the count of each gender in each rank, the peak for women appears to be around rank 5, while the peak for men is at rank 6. The observed table points the same way: 23.3% of men are in rank groups 8-9 or 10-11, compared with 18.7% of women, while women are more concentrated in rank groups 1-3 and 4-5 (37.8% of women versus 34.1% of men). This suggests that men hold the higher ranks more often than women do.

The data set I used had very few people in rank 1 or rank 3, which is why I had to group the ranks before performing the chi-squared test for association. In the future, if I can find another data set with ranks that are more evenly distributed, I could repeat the test to see whether the results change. We could also test for associations with other variables in this data set, such as a person's race and their rank, which could be interesting.

References

OpenIntro. military data set. https://www.openintro.org/data/index.php?data=military. Data collected by the Department of Defense on 02-20-2012.