1 Welcome to Anthony’s Correspondence Analysis

This Learn By Building (LBB) project was created to fulfill a certification requirement. It also includes an explanation of the data I will use.

1.1 Dataset & R Libraries

In this Learn By Building (LBB) project, I use the graduation.xlsx data. The data come from the National Center for Education Statistics (NCES) and give the percentage of high school students across U.S. states who graduated in the 2019-2020 school year, broken down by student characteristics: ethnicity, economic disadvantage, English-learner status, disability, homelessness, and foster care. The data can be used to understand student graduation rates in the United States and the differences in graduation rates among these groups.

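# Set libraries
# data wrangling
library("tidyverse")

# data analysis
library("FactoMineR")
library("factoextra")

# data visualization
library("ggpubr")      # for balloon plots
library("graphics")    # for mosaicplot (base R)
library("ggplot2")
library("plotly")

# table formatting (used for the contingency tables below)
library("knitr")       # kable()
library("kableExtra")  # kable_styling()

# data reference
library("datasets")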

1.2 Import Data & Description

# Read Dataset
graduation <- readxl::read_xlsx("data/graduation.xlsx")

glimpse(graduation)

#> Rows: 29
#> Columns: 8
#> $ State                        <chr> "Alabama", "Alaska", "Arizona", "Arkansas…
#> $ Black                        <dbl> 88.2, 74.0, 71.7, 84.5, 76.9, 76.6, 80.0,…
#> $ White                        <dbl> 92.2, 84.4, 83.0, 90.9, 87.9, 86.1, 93.4,…
#> $ `Economically disadvantaged` <dbl> 85.5, 72.3, 73.6, 86.2, 81.2, 72.3, 80.6,…
#> $ `English learner`            <dbl> 72.0, 68.0, 55.2, 84.4, 69.1, 70.2, 67.0,…
#> $ `Students with disabilities` <dbl> 68.9, 59.0, 66.2, 84.1, 68.4, 61.8, 68.1,…
#> $ `Homeless enrolled`          <dbl> 74.0, 58.0, 48.6, 78.0, 69.7, 56.7, 65.0,…
#> $ `Foster care`                <dbl> 67.0, 54.0, 45.0, 65.0, 58.2, 31.0, 47.0,…

length(unique(graduation$State))

#> [1] 29

length(unique(graduation$Black))

#> [1] 24

length(unique(graduation$White))

#> [1] 26

max(graduation$Black)

#> [1] 88.2

min(graduation$Black)

#> [1] 69

max(graduation$White)

#> [1] 95

min(graduation$White)

#> [1] 82.8

unique(graduation$Black)

#>  [1] 88.2 74.0 71.7 84.5 76.9 76.6 80.0 87.0 72.9 86.9 69.0 81.0 78.9 83.0 84.7
#> [16] 83.1 70.4 86.1 77.0 75.0 69.5 85.7 75.3 76.5

unique(graduation$White)

#>  [1] 92.2 84.4 83.0 90.9 87.9 86.1 93.4 90.5 93.0 91.9 84.2 92.5 93.8 90.3 87.8
#> [16] 94.1 93.2 85.4 89.9 88.7 86.4 89.4 95.0 90.4 82.8 91.4

Location:

State: The states (plus the District of Columbia) covered by the data (“Alabama,” “Alaska,” “Arizona,” “Arkansas,” “California,” “Colorado,” “Connecticut,” “Delaware,” “District of Columbia,” “Florida,” “Idaho,” “Indiana,” “Iowa,” “Kansas,” “Louisiana,” “Maine,” “Maryland,” “Massachusetts,” “Michigan,” “Mississippi,” “Montana,” “Nebraska,” “Nevada,” “New Hampshire,” “New Jersey,” “New York,” “Oklahoma,” “Pennsylvania,” “Rhode Island”)

Ethnicity:

Black: Graduation rate (%) of Black (African American) students in each state
White: Graduation rate (%) of White (Caucasian) students in each state

Other Factors:

Economically disadvantaged: Graduation rate (%) of economically disadvantaged students in each state
English learner: Graduation rate (%) of students learning English as a second language in each state
Students with disabilities: Graduation rate (%) of students with disabilities in each state
Homeless enrolled: Graduation rate (%) of students enrolled as homeless (living on the street or in public places, experiencing financial difficulties, and lacking access to facilities) in each state
Foster care: Graduation rate (%) of students in foster care in each state

head(graduation,10)

1.3 Important Note

Correspondence Analysis Workflow

Import Data
Data Preprocessing: Contingency Table
EDA: Balloon plot & Mosaic plot (Optional)
Chi-Square Test
Perform CA in R
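To make the workflow concrete, here is a minimal base-R sketch of the contingency-table and chi-square steps. It assumes a small hypothetical table of counts: the region and outcome labels and all counts are invented for illustration and are not taken from graduation.xlsx. It uses only chisq.test() from the stats package; the CA step itself is only indicated in a comment.

```r
# Hypothetical 3 x 3 table of counts (invented values, for illustration only)
counts <- matrix(
  c(30, 10,  5,
    12, 25,  8,
     6,  9, 28),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    region  = c("North", "South", "West"),
    outcome = c("Graduated", "Dropped out", "Still enrolled")
  )
)

# Step 2: inspect the contingency table
print(counts)

# Step 3 (optional EDA): a shaded mosaic plot from base graphics
# mosaicplot(counts, shade = TRUE, main = "Hypothetical table")

# Step 4: chi-square test of independence between rows and columns
chi <- chisq.test(counts)
print(chi)

# Step 5: CA is worth running only if independence is rejected,
# e.g. with FactoMineR::CA(counts, graph = FALSE)
if (chi$p.value < 0.05) {
  message("Rows and columns look associated; proceed with CA.")
}
```

The p-value threshold of 0.05 is a conventional choice, not something the workflow above prescribes.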

Row Components and Column Components

The components grad.ca$row and grad.ca$col of the CA result each contain:

$coord: the coordinates of each row or column point in each dimension (1, 2, etc.), used to create the plots.
$cos2: the quality of representation of each row or column on each dimension.
$contrib: the contribution (in %) of each row or column to the definition of each dimension.
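The three components can be reproduced by hand to see what each one measures. The sketch below is a base-R reconstruction under stated assumptions: it uses a hypothetical 3 x 3 count table, follows the standard SVD-based formulation of CA, and names its objects row_coord, row_contrib, and row_cos2 purely for illustration. Values computed this way should correspond to $coord, $contrib, and $cos2 from FactoMineR's CA(), although the sign of a whole dimension may be flipped, since SVD signs are arbitrary.

```r
# Hypothetical counts (not from graduation.xlsx)
N <- matrix(c(30, 10,  5,
              12, 25,  8,
               6,  9, 28),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("x", "y", "z")))

P  <- N / sum(N)        # correspondence matrix
r  <- rowSums(P)        # row masses
cm <- colSums(P)        # column masses

# Standardized residual matrix: Dr^(-1/2) (P - r cm^T) Dc^(-1/2)
S   <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
dec <- svd(S)

k   <- min(dim(N)) - 1  # number of non-trivial dimensions
d   <- dec$d[1:k]       # singular values
eig <- d^2              # eigenvalues (principal inertias)

# $coord: row principal coordinates F = Dr^(-1/2) U D
row_coord <- diag(1 / sqrt(r)) %*% dec$u[, 1:k, drop = FALSE] %*% diag(d, nrow = k)
dimnames(row_coord) <- list(rownames(N), paste0("Dim ", 1:k))

# $contrib: % of each dimension's inertia contributed by each row
row_contrib <- 100 * sweep(r * row_coord^2, 2, eig, "/")

# $cos2: share of each row's squared distance to the origin captured by each dimension
row_cos2 <- row_coord^2 / rowSums(row_coord^2)

round(row_coord, 3)
round(row_contrib, 1)
round(row_cos2, 3)
```

Each column of row_contrib sums to 100 and each row of row_cos2 sums to 1 when all non-trivial dimensions are kept.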

Interpretation of CA Biplot

The standard plot for Correspondence Analysis is a symmetric biplot where rows (blue points) and columns (red triangles) are represented in the same space using new coordinates.

In CA, a biplot combines two plots: one for the row variables and one for the column variables. The coordinates in the CA biplot represent the profiles of the rows and columns. In R, you can draw a CA biplot with factoextra's fviz_ca_biplot(), for example fviz_ca_biplot(grad.ca, repel = TRUE).

Points to consider when interpreting the CA biplot:

Proximity of category points to the origin indicates the distinctiveness of the category:

Categories close to the origin suggest less differentiation from other categories or lack of distinctiveness.
Categories further away from the origin indicate higher distinctiveness and differentiation from other categories.
Proximity of points to each other within the context of rows or columns indicates similarity, assuming proper data scaling (which is automatically done by the CA() function).
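Proximity in CA is measured with the chi-square distance between profiles. The sketch below, again on a hypothetical count table with illustrative object names (profiles, chi_dist(), row_coord), computes row profiles, the chi-square distance between two rows and from each row to the average profile (the origin), and checks that these match the Euclidean distances between row principal coordinates. The match is exact when all dimensions are kept and approximate on a 2-D plot.

```r
# Hypothetical counts (not from graduation.xlsx)
N <- matrix(c(30, 10,  5,
              12, 25,  8,
               6,  9, 28),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("x", "y", "z")))

P  <- N / sum(N)
r  <- rowSums(P)                 # row masses
cm <- colSums(P)                 # column masses (= average row profile)
profiles <- P / r                # row profiles: each row sums to 1

# Chi-square distance between two profiles, weighting each column by 1 / its mass
chi_dist <- function(a, b, masses) sqrt(sum((a - b)^2 / masses))

d_AB     <- chi_dist(profiles["A", ], profiles["B", ], cm)
d_origin <- apply(profiles, 1, chi_dist, b = cm, masses = cm)  # distance to the centroid

# Row principal coordinates from the SVD of the standardized residuals
S   <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
dec <- svd(S)
row_coord <- diag(1 / sqrt(r)) %*% dec$u %*% diag(dec$d)
rownames(row_coord) <- rownames(N)

# Euclidean distances on the full CA map
map_AB     <- sqrt(sum((row_coord["A", ] - row_coord["B", ])^2))
map_origin <- sqrt(rowSums(row_coord^2))

c(chi_square = d_AB, map = map_AB)
```

Rows with a large d_origin are the "distinctive" categories described above; rows with a small distance between them have similar profiles.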

Specifically for the biplot: the closer a point is to the origin (the center of the plot), the less information it typically carries. Points far from the origin therefore tend to offer more insight.

To understand the relationship between row categories and column categories, look at the angle formed at the origin between a row point and a column point. In fviz_ca_biplot(), setting arrows = c(TRUE, TRUE) draws the rows and columns as arrows from the origin, which makes these angles easier to see.

Interpreting the relationship between row and column categories:

Angle close to 0 degrees: Indicates a positive relationship between row and column categories.
Angle of 90 degrees: Indicates no relationship between row and column categories.
Angle of 180 degrees: Indicates a negative relationship between row and column categories.
The further a point is from the origin, the stronger its relationship with another category, based on the criteria of the angle (positive/negative).
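The angle rule can be checked numerically. The sketch below computes the angle at the origin between a row point and several column points from their 2-D coordinates. The coordinates and the helper name angle_deg() are hypothetical, and in a symmetric map the row-to-column geometry is only approximate, so treat the angle as a guide rather than an exact measure.

```r
# Angle (in degrees) between two points, measured at the origin of the CA map
angle_deg <- function(u, v) {
  cos_theta <- sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
  cos_theta <- max(-1, min(1, cos_theta))   # guard against rounding outside [-1, 1]
  acos(cos_theta) * 180 / pi
}

# Hypothetical 2-D coordinates (Dim 1, Dim 2) of one row point and three column points
row_point  <- c(0.6, 0.2)
col_points <- rbind(
  same_direction = c(0.9, 0.3),
  perpendicular  = c(-0.2, 0.6),
  opposite       = c(-0.6, -0.2)
)

angles <- apply(col_points, 1, angle_deg, u = row_point)
round(angles, 1)
```

The three column points illustrate the three cases in the list: a positive relationship, no relationship, and a negative relationship.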

One of the differences between PCA and CA is that PCA is used for numerical data, while CA is used for categorical data.

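Standardized Residual: (observed - expected) / √expected for each cell. The sum of the squared residuals is the chi-square statistic, so large residuals show which cells drive the association.
Row Marginal: The total frequency per row.
Column Marginal: The total frequency per column.
Singular Value Decomposition (SVD): A matrix decomposition, S = U D V^T, applied to the (scaled) standardized residual matrix. It yields the singular values and singular vectors from which the CA eigenvalues and coordinates are derived.
Singular Value: The diagonal values of the diagonal matrix D obtained from SVD.
Eigenvalue (sv^2): The squared singular value; the inertia (variance) retained by each dimension.
Row Eigenvector (U): The left singular vectors; scaled by the row masses and singular values, they give the row coordinates in CA.
Column Eigenvector (V): The right singular vectors; scaled by the column masses and singular values, they give the column coordinates in CA.
Orthogonal Matrix: A matrix whose inverse equals its transpose (the columns of U and V are orthonormal).
Diagonal Matrix: A matrix whose off-diagonal elements are all zero, so values appear only on its diagonal.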
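The glossary terms fit together in a few lines of base R. The sketch below uses a hypothetical count table and illustrative object names. It computes the marginals, expected counts, standardized residuals, and the SVD, then reads off singular values and eigenvalues. The scaling by sqrt(n) is the standard link between the standardized residuals and the matrix that CA decomposes.

```r
# Hypothetical counts (not from graduation.xlsx)
N <- matrix(c(30, 10,  5,
              12, 25,  8,
               6,  9, 28),
            nrow = 3, byrow = TRUE)

n <- sum(N)
row_marginal <- rowSums(N)                     # total frequency per row
col_marginal <- colSums(N)                     # total frequency per column
E <- outer(row_marginal, col_marginal) / n     # expected counts under independence

Z <- (N - E) / sqrt(E)                         # standardized residuals
chi_sq <- sum(Z^2)                             # chi-square statistic

# CA decomposes the residuals scaled by sqrt(n)
S   <- Z / sqrt(n)
dec <- svd(S)                                  # S = U diag(sv) V^T
sv  <- dec$d                                   # singular values (diagonal of D)
eig <- sv^2                                    # eigenvalues: inertia retained per dimension
U <- dec$u                                     # row singular vectors
V <- dec$v                                     # column singular vectors

round(eig, 4)
round(100 * eig / sum(eig), 1)                 # % of total inertia per dimension
```

The sum of the eigenvalues equals chi_sq / n (the total inertia), and t(U) %*% U is the identity matrix, which is the orthogonality property listed above.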

2 Data Preprocessing

# Data preprocessing 1: top 10 states by graduation rate of Black students
blackgrad <- graduation %>%
  select(State, Black) %>%
  arrange(desc(Black)) %>%
  head(10)

blackgrad

Vblackgrad <- blackgrad %>% 
  top_n(n = 10, wt = Black) %>%
  arrange(-Black)

# Bar chart of the graduation rate (%) of Black students in the top 10 states
ggplot(Vblackgrad, aes(x = Black, y = fct_reorder(State, Black))) +
  geom_col(fill = "blue", width = 0.7) +
  # label each bar with its rate to one decimal place
  geom_text(aes(label = scales::number(Black, accuracy = 0.1)),
            hjust = -0.1, vjust = 0.5, size = 3) +
  scale_x_continuous(expand = c(0, 0)) +
  coord_cartesian(xlim = c(0, max(Vblackgrad$Black) * 1.1)) +
  labs(title = "Top 10 States with the Highest Graduation Rates for Black Students",
       x = "Graduation Rate of Black Students (%)",
       y = "") +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5),
        axis.title = element_text(size = 10, face = "bold"),
        axis.text = element_text(size = 8),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.border = element_blank())

Insights from the top 10 states with the highest graduation rates for Black students:

The state with the highest graduation rate for Black students is Alabama, at 88.2%. This is an encouraging sign that may reflect efforts in the state's education system to help Black students complete high school.
Five of the top 10 states for Black graduation rates are located in the South. This may indicate that states in the region have made strides in improving educational opportunities and outcomes for Black students.
Alabama and Delaware, the two states with the highest graduation rates for Black students, are said to have lower per-pupil education spending than other states. Spending figures are not included in this dataset, so this needs to be verified externally; if it holds, it suggests that how funds are allocated and used to support Black students matters, not just the amount spent.
The gap between the top-performing states and the rest is large (88.2% in Alabama versus a minimum of 69.0%). This highlights the need for continued efforts to improve educational equity so that Black students have equal opportunities to succeed.

# Data preprocessing 2: top 10 states by graduation rate of White students
whitegrad <- graduation %>%
  select(State, White) %>%
  arrange(desc(White)) %>%
  head(10)

whitegrad

Vwhitegrad <- whitegrad %>% 
  top_n(n = 10, wt = White) %>%
  arrange(-White)

# Bar chart of the graduation rate (%) of White students in the top 10 states
ggplot(Vwhitegrad, aes(x = White, y = fct_reorder(State, White))) +
  geom_col(fill = "blue", width = 0.7) +
  # label each bar with its rate to one decimal place
  geom_text(aes(label = scales::number(White, accuracy = 0.1)),
            hjust = -0.1, vjust = 0.5, size = 3) +
  scale_x_continuous(expand = c(0, 0)) +
  coord_cartesian(xlim = c(0, max(Vwhitegrad$White) * 1.1)) +
  labs(title = "Top 10 States with the Highest Graduation Rates for White Students",
       x = "Graduation Rate of White Students (%)",
       y = "") +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, face = "bold", hjust = 0.5),
        axis.title = element_text(size = 10, face = "bold"),
        axis.text = element_text(size = 8),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.border = element_blank())

Insights from the top 10 states with the highest graduation rates for White students:

The state with the highest graduation rate for White students is New Jersey, at 95%. This indicates that New Jersey's education system achieves a very high graduation rate for White students.
Seven of the top 10 states for White graduation rates are located in the Northeast and along the East Coast. This suggests that states in these regions have education systems that are effective at helping White students graduate.
New Jersey and Maryland, two of the states with the highest graduation rates for White students, are said to be among the states with the highest per-student education spending in the United States. Spending figures are not included in this dataset; if this holds, it suggests that investment in education may help raise graduation rates for White students.
The gap between the top states and the rest is substantial (95% versus a minimum of 82.8%). This points to differences in education quality across the United States and a need for efforts to improve education quality evenly across all regions.

2.1 Summary Data

summary(graduation)

#>     State               Black           White       Economically disadvantaged
#>  Length:29          Min.   :69.00   Min.   :82.80   Min.   :62.00             
#>  Class :character   1st Qu.:75.00   1st Qu.:87.80   1st Qu.:75.90             
#>  Mode  :character   Median :78.90   Median :90.30   Median :79.60             
#>                     Mean   :78.98   Mean   :89.53   Mean   :79.43             
#>                     3rd Qu.:84.50   3rd Qu.:92.20   3rd Qu.:85.00             
#>                     Max.   :88.20   Max.   :95.00   Max.   :89.80             
#>  English learner Students with disabilities Homeless enrolled  Foster care   
#>  Min.   :39.00   Min.   :55.40              Min.   :48.60     Min.   :31.00  
#>  1st Qu.:65.00   1st Qu.:63.00              1st Qu.:61.00     1st Qu.:50.00  
#>  Median :69.00   Median :68.60              Median :66.00     Median :56.00  
#>  Mean   :69.04   Mean   :69.95              Mean   :66.62     Mean   :55.39  
#>  3rd Qu.:76.00   3rd Qu.:75.00              3rd Qu.:74.00     3rd Qu.:62.00  
#>  Max.   :89.00   Max.   :88.10              Max.   :88.00     Max.   :74.00

Insights from the data summary:

There is a large gap between Black and White students: the average graduation rate is about 79% for Black students (mean 78.98%) and nearly 90% for White students (mean 89.53%).
Economically disadvantaged students have an average graduation rate of about 79% (mean 79.43%), roughly 10 percentage points below the average for White students. The dataset has no column for students who are not economically disadvantaged, so a direct comparison with more affluent students cannot be made from these data.
Students with disabilities have an average graduation rate of about 70% (mean 69.95%), roughly 20 percentage points below White students. The dataset does not include students without disabilities, so that gap cannot be measured here.
English learners, who are still learning English, have an average graduation rate of only about 69% (mean 69.04%), with the lowest state at 39%, suggesting that they may face additional challenges in completing high school.
Students experiencing homelessness (mean 66.62%) and students in foster care (mean 55.39%) have the lowest average graduation rates in the dataset. Foster care students have the lowest average of all seven groups, with state rates ranging from 31% to 74%.

The “State” column is not itself a variable for correspondence analysis, so it is moved into the row names: each state becomes a row label, and the remaining columns are the numeric variables for CA.

# Move State into the row names and drop it as a column.
# column_to_rownames() returns a data.frame, since tibbles do not keep row names.
graduation <- graduation %>%
  column_to_rownames(var = "State")

head(graduation)

Calculate the row and column marginals

In CA, row and column margins are used to calculate the expected values, which are utilized in the chi-squared statistical calculations. You can calculate the row and column margins using the rowSums() and colSums() functions, respectively.

Row margins: The sum of frequencies per row.
Column margins: The sum of frequencies per column.

# Calculate row marginals
row_marginals <- rowSums(graduation)
row_marginals

#>              Alabama               Alaska              Arizona 
#>                547.8                469.7                443.3 
#>             Arkansas           California             Colorado 
#>                573.1                511.4                454.7 
#>          Connecticut             Delaware District of Columbia 
#>                501.1                555.5                447.9 
#>              Florida                Idaho              Indiana 
#>                571.6                452.0                589.9 
#>                 Iowa               Kansas            Louisiana 
#>                553.7                546.0                484.7 
#>                Maine             Maryland        Massachusetts 
#>                519.7                498.2                522.1 
#>             Michigan          Mississippi              Montana 
#>                460.5                519.3                516.5 
#>             Nebraska               Nevada        New Hampshire 
#>                481.9                501.3                482.4 
#>           New Jersey             New York             Oklahoma 
#>                548.2                460.6                541.1 
#>         Pennsylvania         Rhode Island 
#>                515.3                489.8

# Calculate column marginals
col_marginals <- colSums(graduation)
col_marginals

#>                      Black                      White 
#>                     2290.4                     2596.5 
#> Economically disadvantaged            English learner 
#>                     2303.5                     2002.2 
#> Students with disabilities          Homeless enrolled 
#>                     2028.5                     1932.0 
#>                Foster care 
#>                     1606.2

# Calculate expected values
n <- sum(graduation)
expected <- outer(row_marginals, col_marginals) / n
expected

#>                         Black     White Economically disadvantaged
#> Alabama              85.00953  96.37061                   85.49574
#> Alaska               72.88970  82.63102                   73.30659
#> Arizona              68.79285  77.98666                   69.18631
#> Arkansas             88.93567 100.82146                   89.44434
#> California           79.36085  89.96701                   79.81475
#> Colorado             70.56194  79.99218                   70.96552
#> Connecticut          77.76246  88.15500                   78.20722
#> Delaware             86.20444  97.72521                   86.69749
#> District of Columbia 69.50669  78.79590                   69.90424
#> Florida              88.70290 100.55757                   89.21023
#> Idaho                70.14295  79.51719                   70.54413
#> Indiana              91.54275 103.77696                   92.06633
#> Iowa                 85.92511  97.40855                   86.41656
#> Kansas               84.73020  96.05395                   85.21481
#> Louisiana            75.21745  85.26987                   75.64766
#> Maine                80.64887  91.42717                   81.11014
#> Maryland             77.31243  87.64483                   77.75462
#> Massachusetts        81.02131  91.84939                   81.48471
#> Michigan             71.46201  81.01253                   71.87074
#> Mississippi          80.58680  91.35680                   81.04772
#> Montana              80.15228  90.86422                   80.61072
#> Nebraska             74.78293  84.77728                   75.21066
#> Nevada               77.79349  88.19019                   78.23844
#> New Hampshire        74.86053  84.86524                   75.28869
#> New Jersey           85.07160  96.44098                   85.55817
#> New York             71.47753  81.03012                   71.88634
#> Oklahoma             83.96980  95.19192                   84.45007
#> Pennsylvania         79.96606  90.65311                   80.42343
#> Rhode Island         76.00888  86.16707                   76.44362
#>                      English learner Students with disabilities
#> Alabama                     74.31282                   75.28896
#> Alaska                      63.71802                   64.55499
#> Arizona                     60.13668                   60.92661
#> Arkansas                    77.74494                   78.76616
#> California                  69.37491                   70.28619
#> Colorado                    61.68317                   62.49341
#> Connecticut                 67.97764                   68.87057
#> Delaware                    75.35737                   76.34724
#> District of Columbia        60.76070                   61.55882
#> Florida                     77.54145                   78.56000
#> Idaho                       61.31689                   62.12232
#> Indiana                     80.02397                   81.07513
#> Iowa                        75.11319                   76.09985
#> Kansas                      74.06863                   75.04157
#> Louisiana                   65.75287                   66.61657
#> Maine                       70.50086                   71.42693
#> Maryland                    67.58424                   68.47199
#> Massachusetts               70.82644                   71.75678
#> Michigan                    62.46997                   63.29055
#> Mississippi                 70.44660                   71.37195
#> Montana                     70.06676                   70.98712
#> Nebraska                    65.37303                   66.23174
#> Nevada                      68.00477                   68.89805
#> New Hampshire               65.44086                   66.30046
#> New Jersey                  74.36708                   75.34393
#> New York                    62.48354                   63.30430
#> Oklahoma                    73.40392                   74.36812
#> Pennsylvania                69.90397                   70.82220
#> Rhode Island                66.44472                   67.31751
#>                      Homeless enrolled Foster care
#> Alabama                       71.70730    59.61505
#> Alaska                        61.48397    51.11571
#> Arizona                       58.02820    48.24270
#> Arkansas                      75.01909    62.36835
#> California                    66.94252    55.65377
#> Colorado                      59.52047    49.48332
#> Connecticut                   65.59425    54.53286
#> Delaware                      72.71524    60.45301
#> District of Columbia          58.63034    48.74330
#> Florida                       74.82274    62.20511
#> Idaho                         59.16703    49.18949
#> Indiana                       77.21821    64.19663
#> Iowa                          72.47962    60.25712
#> Kansas                        71.47168    59.41916
#> Louisiana                     63.44748    52.74811
#> Maine                         68.02900    56.55703
#> Maryland                      65.21464    54.21726
#> Massachusetts                 68.34316    56.81821
#> Michigan                      60.27969    50.11451
#> Mississippi                   67.97664    56.51350
#> Montana                       67.61012    56.20878
#> Nebraska                      63.08096    52.44339
#> Nevada                        65.62043    54.55462
#> New Hampshire                 63.14641    52.49781
#> New Jersey                    71.75966    59.65858
#> New York                      60.29278    50.12539
#> Oklahoma                      70.83027    58.88591
#> Pennsylvania                  67.45304    56.07819
#> Rhode Island                  64.11507    53.30312
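The expected values feed directly into the chi-square statistic, the sum over all cells of (observed - expected)^2 / expected; dividing it by the grand total gives the total inertia that CA decomposes. The sketch below applies the same outer()-based formula to a small excerpt: the Black and White rates for Alabama, Alaska, and Arizona as printed by glimpse() above. The object names observed_ex and expected_ex are illustrative and chosen so they do not overwrite expected. Because these values are percentages rather than counts, the statistic is best read descriptively; the p-value of a chi-square test assumes count data.

```r
# Excerpt of the data: Black and White graduation rates (%) for three states
observed_ex <- matrix(c(88.2, 92.2,
                        74.0, 84.4,
                        71.7, 83.0),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("Alabama", "Alaska", "Arizona"),
                                      c("Black", "White")))

n_total     <- sum(observed_ex)
expected_ex <- outer(rowSums(observed_ex), colSums(observed_ex)) / n_total

# Chi-square statistic: sum over cells of (O - E)^2 / E
chi_sq <- sum((observed_ex - expected_ex)^2 / expected_ex)

# Total inertia = chi-square / grand total
total_inertia <- chi_sq / n_total

round(expected_ex, 3)
c(chi_square = chi_sq, total_inertia = total_inertia)
```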

2.2 Contingency Tables for Black & White Students

# Contingency table
blackCT <- table(graduation$Black, graduation$`Economically disadvantaged`)

# Print contingency table using kable
kable(blackCT, format = "html") %>%
  kable_styling(full_width = FALSE, position = "center", font_size = 10)

	62	71.7	72.3	73.6	73.8	75	75.9	76.8	77.2	78.4	78.9	79.1	79.3	79.6	79.7	80.6	81.2	81.3	82	85	85.5	85.9	86.2	87.1	87.2	89.8
69	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
69.5	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
70.4	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
71.7	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
72.9	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
74	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
75	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	1	0
75.3	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
76.5	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
76.6	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
76.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0
77	0	0	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
78.9	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
80	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	1	0	1	0	0	0	0	0	0	0	0
81	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0
83	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
83.1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0
84.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	1
84.7	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0
85.7	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0
86.1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0
86.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0
87	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0
88.2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0

blackCTgroup <- graduation %>% 
  group_by(Black) %>%
  summarise(Total_ED = sum(`Economically disadvantaged`)) %>% 
  arrange(-Total_ED)

blackCTgroup
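Because table() is applied to the raw percentages, nearly every distinct rate becomes its own category, so these tables are almost entirely 0s and 1s. One possible alternative, sketched here, is to bin each rate into bands with cut() before cross-tabulating. The sketch uses the Black and Economically disadvantaged rates of the first seven states as printed by glimpse() above (Alabama through Connecticut); the 80% cut-off and the band labels are hypothetical choices for illustration.

```r
# Black and Economically disadvantaged graduation rates (%) for the first seven states
black <- c(88.2, 74.0, 71.7, 84.5, 76.9, 76.6, 80.0)
econ  <- c(85.5, 72.3, 73.6, 86.2, 81.2, 72.3, 80.6)

# Hypothetical bands: (-Inf, 80] and (80, Inf); cut() closes intervals on the right
breaks <- c(-Inf, 80, Inf)
bands  <- c("<= 80%", "> 80%")

band_table <- table(
  Black = cut(black, breaks = breaks, labels = bands),
  Econ  = cut(econ,  breaks = breaks, labels = bands)
)
band_table
```

With all 29 states and more bands, a table built this way has cells with real counts, which is the kind of input the chi-square test and CA expect.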

# Contingency table
blackCT1 <- table(graduation$Black, graduation$`English learner`)

# Print contingency table using kable
kable(blackCT1, format = "html") %>%
  kable_styling(full_width = FALSE, position = "center", font_size = 10)

	39	50	52	55.2	55.6	56	62	65	67	68	68.3	69	69.1	70.2	72	73.1	73.7	75.3	76	77	81	83.5	84	84.4	85.8	89
69	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
69.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
70.4	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0
71.7	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
72.9	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
74	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
75	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0
75.3	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
76.5	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
76.6	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
76.9	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0
77	0	0	0	0	0	0	0	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
78.9	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
80	0	0	0	0	0	0	0	0	1	0	0	1	0	0	0	0	0	0	0	0	0	1	0	0	0	0
81	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0
83	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0
83.1	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
84.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	1
84.7	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
85.7	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0
86.1	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
86.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0
87	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0
88.2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0

# Contingency table
blackCT2 <- table(graduation$Black, graduation$`Students with disabilities`)

# Print contingency table using kable
kable(blackCT2, format = "html") %>%
  kable_styling(full_width = FALSE, position = "center", font_size = 10)

	55.4	56	59	59.3	60.7	61.8	63	65	66	66.2	68.1	68.4	68.5	68.6	68.9	72.8	73	74	74.9	75	76.4	79.1	80.4	80.9	82.9	84.1	88.1
69	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
69.5	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
70.4	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
71.7	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
72.9	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
74	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
75	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1
75.3	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
76.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0
76.6	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
76.9	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
77	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	1	0	0	0	0	0	0	0
78.9	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0
80	0	0	0	0	0	0	1	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0
81	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0
83	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0
83.1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
84.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	1	0
84.7	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
85.7	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0
86.1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
86.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0
87	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0
88.2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0

# Contingency table
blackCT4 <- table(graduation$Black, graduation$`Foster care`)

# Print contingency table using kable
kable(blackCT4, format = "html") %>%
  kable_styling(full_width = FALSE, position = "center", font_size = 10)

	31	40	43	45	47	50	53	54	55	56	57	58	58.2	62	64	65	67	71	74
69	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
69.5	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0
70.4	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
71.7	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
72.9	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
74	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0
75	0	0	0	0	0	0	0	0	1	0	0	1	0	0	0	0	0	0	0
75.3	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
76.5	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0
76.6	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
76.9	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0
77	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0
78.9	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0
80	0	0	0	0	1	0	0	0	0	0	1	0	0	1	0	0	0	0	0
81	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0
83	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
83.1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0
84.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	1	0	0
84.7	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0
85.7	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0
86.1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0
86.9	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
87	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1
88.2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0

White

# Contingency table
WhiteCT1 <- table(graduation$White, graduation$`English learner`)

# Print contingency table using kable
kable(WhiteCT1, format = "html") %>%
  kable_styling(full_width = FALSE, position = "center", font_size = 10)

	39	50	52	55.2	55.6	56	62	65	67	68	68.3	69	69.1	70.2	72	73.1	73.7	75.3	76	77	81	83.5	84	84.4	85.8	89
82.8	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0
83	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
84.2	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
84.4	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
85.4	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0
86.1	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
86.4	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
87.8	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0
87.9	0	0	0	0	0	0	0	0	0	0	0	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0
88.7	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
89.4	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
89.9	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
90.3	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0
90.4	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
90.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0
90.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0
91.4	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
91.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0
92.2	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0
92.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1
93	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
93.2	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
93.4	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
93.8	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0
94.1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
95	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0

# Contingency table
WhiteCT2 <- table(graduation$White, graduation$`Students with disabilities`)

# Print contingency table using kable
kable(WhiteCT2, format = "markdown") %>%
  kable_styling(full_width = F, position = "center", font_size = 1)

	55.4	56	59	59.3	60.7	61.8	63	65	66	66.2	68.1	68.4	68.5	68.6	68.9	72.8	73	74	74.9	75	76.4	79.1	80.4	80.9	82.9	84.1	88.1
82.8	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1
83	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
84.2	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
84.4	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
85.4	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
86.1	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
86.4	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
87.8	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	1	0	0	0	0	0	0	0	0	0
87.9	0	0	0	0	0	0	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
88.7	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0
89.4	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0
89.9	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
90.3	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0
90.4	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
90.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0
90.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0
91.4	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0
91.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0
92.2	0	0	0	0	0	0	0	1	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
92.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0
93	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
93.2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
93.4	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
93.8	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0
94.1	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
95	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0

# Contingency table (third group: Homeless enrolled)
WhiteCT3 <- table(graduation$White, graduation$`Homeless enrolled`)

# Print contingency table using kable
kable(WhiteCT3, format = "markdown") %>%
  kable_styling(full_width = F, position = "center", font_size = 1)

# Contingency table
WhiteCT4 <- table(graduation$White, graduation$`Foster care`)

# Print contingency table using kable
kable(WhiteCT4, format = "markdown") %>%
  kable_styling(full_width = F, position = "center", font_size = 1)

	31	40	43	45	47	50	53	54	55	56	57	58	58.2	62	64	65	67	71	74
82.8	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0
83	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
84.2	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
84.4	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0
85.4	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
86.1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
86.4	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0
87.8	0	0	0	0	0	0	1	1	0	0	0	0	0	0	0	0	0	0	0
87.9	0	0	0	0	0	0	0	0	0	0	1	0	1	0	0	0	0	0	0
88.7	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0
89.4	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
89.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0
90.3	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0
90.4	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
90.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1
90.9	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0
91.4	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0
91.9	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0
92.2	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	1	0	0
92.5	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0
93	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0
93.2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0
93.4	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
93.8	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0
94.1	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0
95	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0

3 Balloonplot & Mosaicplot

balloon_col <- c("#0D0887FF", "#6A00A8FF", "#B12A90FF", "#E16462FF", "#FCA636FF", "#F0F921FF")
balloonplot <- ggballoonplot(graduation, fill = "value") +
  scale_fill_gradientn(colours = balloon_col) +
  labs(title = "Balloonplot for Graduation Rates in America") +
  theme(plot.title = element_text(hjust = 0.5))

ggplotly(balloonplot, height = 1300, width = 1000)

# Mosaic plot of observed values
mosaicplot(graduation,
           las = 2,
           shade = TRUE,
           off = 25,
           main = "Mosaic plot for graduation rates")

4 Chi-Square Test

# Chi-Square Test
chi_result <- chisq.test(graduation)
chi_result

#> 
#>  Pearson's Chi-squared test
#> 
#> data:  graduation
#> X-squared = 102.98, df = 168, p-value = 1

In the context of using CA, the hypothesis formulation for the chi-square test can be stated as follows:

$H_0$: There is no relationship between the row variable and the column variable in the data distribution.
$H_1$: There is a relationship between the row variable and the column variable in the data distribution.

Note: The null hypothesis ($H_0$) will be rejected if the p-value obtained from the chi-square test is smaller than the predetermined significance level, such as 0.05.

Conclusion: The chi-square test gives X-squared = 102.98 with df = 168 and a p-value of 1 (the CA output in Section 5 reports 0.9999802), far above 0.05. There is therefore not enough evidence to reject $H_0$: the data show no statistically significant relationship between the row variable (states) and the column variable (student groups).
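The decision rule can be reproduced by hand. This is a minimal base-R sketch that uses only the statistic printed above: it recomputes the degrees of freedom as (29 - 1) x (7 - 1) and the upper-tail p-value with `pchisq()`, then compares it with a 0.05 significance level.

```r
# Values copied from the chi-square output above
x_squared <- 102.98
n_rows <- 29  # states (row categories)
n_cols <- 7   # student groups (column categories)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df <- (n_rows - 1) * (n_cols - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Reject H0 only when the p-value falls below the significance level
alpha <- 0.05
reject_h0 <- p_value < alpha

cat("df =", df, "\n")
cat("p-value =", signif(p_value, 4), "\n")
cat("reject H0:", reject_h0, "\n")
```

The expected value of a chi-square variable equals its degrees of freedom (168). Because the observed statistic, 102.98, is well below that, the p-value is essentially 1 and $H_0$ is kept.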

5 Biplot Interpretation

# Processing & Perform Correspondence Analysis
grad.ca <- CA(graduation)

#> Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

# Result Correspondence Analysis
grad.ca

#> **Results of the Correspondence Analysis (CA)**
#> The row variable has  29  categories; the column variable has 7 categories
#> The chi square of independence between the two variables is equal to 102.977 (p-value =  0.9999802 ).
#> *The results are available in the following objects:
#> 
#>    name              description                   
#> 1  "$eig"            "eigenvalues"                 
#> 2  "$col"            "results for the columns"     
#> 3  "$col$coord"      "coord. for the columns"      
#> 4  "$col$cos2"       "cos2 for the columns"        
#> 5  "$col$contrib"    "contributions of the columns"
#> 6  "$row"            "results for the rows"        
#> 7  "$row$coord"      "coord. for the rows"         
#> 8  "$row$cos2"       "cos2 for the rows"           
#> 9  "$row$contrib"    "contributions of the rows"   
#> 10 "$call"           "summary called parameters"   
#> 11 "$call$marge.col" "weights of the columns"      
#> 12 "$call$marge.row" "weights of the rows"

The Row and Column Variables: The row variable has 29 categories (the states), and the column variable has 7 categories (the student groups).
Chi-Square and p-value: The chi-square statistic of independence is 102.977, with a p-value of 0.9999802. Because the p-value is close to 1 and far above 0.05, this confirms the result of Section 4: there is no statistically significant relationship between the row and column variables.
Eigenvalues: The $eig object holds the eigenvalue of each dimension, together with the percentage and cumulative percentage of inertia (variability) that it explains.
Coordinates and Squared Cosines: The $row$coord and $col$coord objects contain the coordinates of the rows and columns on each dimension. The $row$cos2 and $col$cos2 objects contain the squared cosines (cos2), which measure how well each point is represented on a dimension; values close to 1 indicate good representation.
Contributions: The $row$contrib and $col$contrib objects give the percentage contribution of each row and column to the inertia of each dimension. Categories with large contributions are the ones that define that dimension.
Parameters and Weights: The $call object stores the parameters used in the analysis, and $call$marge.row and $call$marge.col store the row and column weights (masses) used in the calculations.

Overall, these objects describe where each state and each student group sits in the CA space, how much each one shapes the dimensions, and how reliably it is represented there.
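The objects above come from a singular value decomposition. The sketch below is a minimal base-R version of the standard CA computation. It runs on a small hypothetical table, not on graduation.xlsx, and FactoMineR's internals may differ in details such as the signs of the axes. It builds the standardized residuals, takes their SVD, derives eigenvalues and principal coordinates, and checks that the eigenvalues sum to the chi-square statistic divided by the grand total. This is why CA eigenvalues on a table of percentages with a large grand total are small numbers.

```r
# Hypothetical toy table (NOT the graduation data): 4 rows x 3 columns
tab <- matrix(c(90, 80, 70,
                85, 75, 60,
                95, 70, 50,
                88, 82, 65),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("row", 1:4), c("A", "B", "C")))

P <- tab / sum(tab)       # correspondence matrix: cells sum to 1
row_mass <- rowSums(P)    # row weights
col_mass <- colSums(P)    # column weights
expected <- outer(row_mass, col_mass)

# Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
S <- (P - expected) / sqrt(expected)

# CA dimensions come from the SVD of S; eigenvalues are squared singular values
sv <- svd(S)
k <- min(dim(tab)) - 1    # number of non-trivial dimensions
d <- sv$d[seq_len(k)]
eig <- d^2

# Principal coordinates: divide by sqrt(mass), then scale by the singular values
row_coord <- sweep(sv$u[, seq_len(k), drop = FALSE], 1, sqrt(row_mass), "/") %*%
  diag(d, nrow = k)
col_coord <- sweep(sv$v[, seq_len(k), drop = FALSE], 1, sqrt(col_mass), "/") %*%
  diag(d, nrow = k)

# Total inertia (sum of eigenvalues) equals chi-square / grand total
chi2 <- unname(chisq.test(tab)$statistic)
print(eig)
print(all.equal(sum(eig), chi2 / sum(tab)))
```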

# biplot ca_row
fviz_ca_row(grad.ca, repel = T)

# biplot ca_col
fviz_ca_col(grad.ca, repel = TRUE)

# biplot CA
fviz_ca_biplot(grad.ca, col.var = "contrib",
               gradient.cols = c("red", "green", "yellow"),
               repel = T,
               geom.label.repel = T,
               geom.label.nudge.x = 0.01,
               geom.label.nudge.y = 0.2,
               ggtheme = theme_minimal(base_size = 6, base_family = ""))

6 Eigenvalues

head(grad.ca$eig)

#>         eigenvalue percentage of variance cumulative percentage of variance
#> dim 1 0.0030475075              43.678769                          43.67877
#> dim 2 0.0019539062              28.004596                          71.68337
#> dim 3 0.0009356291              13.410018                          85.09338
#> dim 4 0.0006744599               9.666778                          94.76016
#> dim 5 0.0002193719               3.144174                          97.90433
#> dim 6 0.0001462165               2.095665                         100.00000

The eigenvalue of a dimension measures how much of the total inertia (variability) in the table that dimension captures. A higher eigenvalue means the dimension explains more of the variability in the data.

In the first dimension, the eigenvalue is 0.0030475075, accounting for 43.68% of the total variability. This is the largest share of any dimension.
In the second dimension, the eigenvalue is 0.0019539062, accounting for 28.00% of the total variability. Although smaller than the first, it still explains a substantial part of the variability, and together the first two dimensions explain 71.68%.
The third, fourth, fifth, and sixth dimensions have progressively smaller eigenvalues, explaining 13.41%, 9.67%, 3.14%, and 2.10% of the variability respectively.
From the cumulative percentage of variance, the first five dimensions together account for 97.90% of the total variability, leaving only 2.10% for the sixth dimension.
Therefore, the first and second dimensions play the most important role in representing the variability within the data. A two-dimensional biplot shows about 71.7% of it, while the later dimensions contribute progressively less.
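The percentage columns in the eigenvalue table are each eigenvalue divided by the sum of all eigenvalues, and the cumulative column is their running total. Here is a short base-R check using the eigenvalues printed above:

```r
# Eigenvalues copied from grad.ca$eig above (dimensions 1 to 6)
eig <- c(0.0030475075, 0.0019539062, 0.0009356291,
         0.0006744599, 0.0002193719, 0.0001462165)

# Share of total inertia carried by each dimension, in percent
pct <- 100 * eig / sum(eig)

# Running total, as in "cumulative percentage of variance"
cum_pct <- cumsum(pct)

round(data.frame(dim = 1:6, pct = pct, cum_pct = cum_pct), 2)
```

This reproduces the 43.68% for dimension 1, the 71.68% cumulative share for dimensions 1 and 2, and the 97.90% cumulative share for dimensions 1 to 5.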

7 Row component & column component

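# Row Component & Column Component
# coord   = coordinates on each dimension
# cos2    = quality of representation on each dimension
# contrib = contribution (%) to the inertia of each dimension
row_component1 <- grad.ca$row$coord
row_component2 <- grad.ca$row$cos2
row_component3 <- grad.ca$row$contrib
column_component1 <- grad.ca$col$coord
column_component2 <- grad.ca$col$cos2
column_component3 <- grad.ca$col$contrib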

head(row_component1)

#>                   Dim 1        Dim 2        Dim 3        Dim 4         Dim 5
#> Alabama    -0.033380129  0.035487857 -0.023100410 -0.013008876  0.0192904200
#> Alaska      0.001180289  0.009298571 -0.011927322 -0.046792397  0.0024739898
#> Arizona    -0.017641217 -0.054904027  0.059999546 -0.004905556  0.0147285732
#> Arkansas    0.039177234  0.048321216  0.008950018  0.015691435 -0.0004491340
#> California -0.006296293  0.020939341 -0.014327178  0.008465513 -0.0007968192
#> Colorado    0.081789722 -0.112793699 -0.019520270 -0.014591585  0.0142964913

head(row_component2)

#>                   Dim 1      Dim 2      Dim 3       Dim 4         Dim 5
#> Alabama    0.3208277048 0.36262303 0.15365093 0.048727638 0.10714679829
#> Alaska     0.0005477674 0.03399787 0.05593777 0.860933906 0.00240666161
#> Arizona    0.0413277017 0.40030644 0.47805747 0.003195661 0.02880749759
#> Arkansas   0.3553169804 0.54053539 0.01854369 0.056999851 0.00004669822
#> California 0.0471891564 0.52191335 0.24433953 0.085305878 0.00075577408
#> Colorado   0.3295693941 0.62678571 0.01877245 0.010489489 0.01006951004
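cos2 is the squared coordinate of a point divided by its squared distance to the origin (the average profile), so a row's cos2 values across all dimensions add up to 1. As a hedged check, this base-R sketch uses Alabama's numbers copied from the coordinate and cos2 outputs above. It recovers the squared distance from Dim 1 and uses it to predict the Dim 2 cos2.

```r
# Alabama's coordinates and cos2 on Dim 1 and Dim 2, copied from the outputs above
coord <- c(dim1 = -0.033380129, dim2 = 0.035487857)
cos2_reported <- c(dim1 = 0.3208277048, dim2 = 0.36262303)

# cos2 = coord^2 / d2, where d2 is the squared distance from the row point
# to the origin. Recover d2 from Dim 1 ...
d2 <- coord[["dim1"]]^2 / cos2_reported[["dim1"]]

# ... and use it to predict cos2 on Dim 2
cos2_dim2 <- coord[["dim2"]]^2 / d2
round(cos2_dim2, 4)
```

The prediction matches the reported Dim 2 value (about 0.3626) up to rounding.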

head(row_component3)

#>                  Dim 1      Dim 2      Dim 3      Dim 4       Dim 5
#> Alabama    1.357023946  2.3922806  2.1168588  0.9312786 6.295913228
#> Alaska     0.001454745  0.1408262  0.4838792 10.3311634 0.088791155
#> Arizona    0.306721525  4.6337934 11.5564424  0.1071648 2.970109247
#> Arkansas   1.955633220  4.6401996  0.3324365  1.4175350 0.003570555
#> California 0.045073366  0.7775302  0.7601732  0.3681674 0.010028446
#> Colorado   6.762571461 20.0597339  1.2546630  0.9725411 2.870365634

head(column_component1)

#>                                  Dim 1        Dim 2        Dim 3       Dim 4
#> Black                      -0.02694753 -0.031162726 -0.001803787 -0.01846495
#> White                      -0.03379917 -0.054668035 -0.001439753 -0.01828960
#> Economically disadvantaged -0.01047897 -0.016735097 -0.004384464  0.01904894
#> English learner             0.11856755  0.023156333 -0.012581688 -0.02912370
#> Students with disabilities  0.03262476  0.001175325  0.061467729  0.03102843
#> Homeless enrolled          -0.00239575  0.021551945 -0.053284101  0.03966462
#>                                    Dim 5
#> Black                       0.0220179065
#> White                      -0.0217767766
#> Economically disadvantaged  0.0171858717
#> English learner            -0.0007318496
#> Students with disabilities -0.0053372382
#> Homeless enrolled          -0.0086446672

head(column_component2)

#>                                  Dim 1        Dim 2       Dim 3      Dim 4
#> Black                      0.257760741 0.3447067724 0.001154914 0.12102505
#> White                      0.230350457 0.6026207115 0.000417978 0.06745053
#> Economically disadvantaged 0.070698512 0.1803141191 0.012376740 0.23362267
#> English learner            0.900620857 0.0343518285 0.010141172 0.05433799
#> Students with disabilities 0.180905188 0.0002347861 0.642171039 0.16363488
#> Homeless enrolled          0.001134624 0.0918209917 0.561260307 0.31101093
#>                                    Dim 5
#> Black                      0.17208023202
#> White                      0.09562342981
#> Economically disadvantaged 0.19015879681
#> English learner            0.00003431263
#> Students with disabilities 0.00484160717
#> Homeless enrolled          0.01477289463

head(column_component3)

#>                                  Dim 1       Dim 2       Dim 3     Dim 4
#> Black                       3.69775916  7.71281211  0.05396503  7.844870
#> White                       6.59461929 26.90827087  0.03897581  8.725186
#> Economically disadvantaged  0.56236030  2.23704747  0.32066510  8.396688
#> English learner            62.57900025  3.72286738  2.29517497 17.059991
#> Students with disabilities  4.80019745  0.00971676 55.50087665 19.618807
#> Homeless enrolled           0.02465356  3.11179228 39.72216486 30.534570
#>                                  Dim 5
#> Black                      34.29388767
#> White                      38.03022162
#> Economically disadvantaged 21.01283687
#> English learner             0.03312108
#> Students with disabilities  1.78468478
#> Homeless enrolled           4.45920072

fviz_ca_biplot(grad.ca, repel = T, arrows = c(T,T))

Reading the biplot, the graduation data for the United States map as follows:

Alabama and Montana lie close to Foster care, so foster-care students account for a relatively larger share of their graduation-rate profiles.
Nevada and Florida lie close to English learner, so students learning English as a second language account for a relatively larger share of their profiles.
In Maryland and Arizona, White students graduate at a higher rate than Black students.
In Massachusetts, disadvantages do not seem to keep students from graduating: Black students rank 9th and White students rank 5th in graduation rate among the 29 states.
California lies closest to Homeless enrolled, meaning homeless students account for a relatively larger share of its graduation-rate profile.
Students with disabilities contributes little to the first two dimensions (about 4.8% and 0.01%) and lies closest to Nevada.
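In CA, closeness between a state and a student group reflects row profiles (each rate as a share of the state's row total), not absolute levels. This small base-R sketch uses the Alabama and Alaska rows printed by `glimpse()` in Section 1.2. A full comparison would use the average profile over all 29 states, which is not printed here, so this two-state comparison is only illustrative.

```r
# Two rows of the raw data, copied from glimpse(graduation) (graduation rates, %)
groups <- c("Black", "White", "Economically disadvantaged", "English learner",
            "Students with disabilities", "Homeless enrolled", "Foster care")
rates <- rbind(
  Alabama = c(88.2, 92.2, 85.5, 72.0, 68.9, 74.0, 67.0),
  Alaska  = c(74.0, 84.4, 72.3, 68.0, 59.0, 58.0, 54.0)
)
colnames(rates) <- groups

# Row profile: each rate divided by the row total, so every row sums to 1
profiles <- prop.table(rates, margin = 1)

round(profiles[, "Foster care"], 4)
```

Alabama's Foster care share (about 0.122) is higher than Alaska's (about 0.115). CA positions encode this kind of difference in shares.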

fviz_ca_row(grad.ca, col.row = "cos2",
            gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
            repel = TRUE,
            title = "Row Points Based on Their Quality (cos2)")

fviz_contrib(grad.ca, choice = "row", axes = 1)

Insight: The row category that contributes the most to dimension 1 is New York.

fviz_contrib(grad.ca, choice = "row", axes = 2)

Insight: The row category that contributes the most to dimension 2 is Colorado.
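The dashed reference line in `fviz_contrib()` marks the contribution each category would have if all contributed equally. The factoextra documentation describes it as the expected average contribution, which for rows is 100 divided by the number of rows. This sketch applies that rule to the Dim 2 contributions of the six states printed in `row_component3`:

```r
# Dim 2 contributions (%) of the first six states, copied from row_component3 above
contrib_dim2 <- c(Alabama = 2.3922806, Alaska = 0.1408262, Arizona = 4.6337934,
                  Arkansas = 4.6401996, California = 0.7775302, Colorado = 20.0597339)

# If all 29 states contributed equally, each would contribute 100 / 29 percent
n_states <- 29
expected <- 100 / n_states

# States whose contribution exceeds the uniform expectation
above <- names(contrib_dim2)[contrib_dim2 > expected]
round(expected, 2)
above
```

Among these six states, Arizona, Arkansas, and Colorado exceed the roughly 3.45% reference. Colorado exceeds it by a wide margin.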

fviz_ca_col(grad.ca, col.col = "cos2",
            gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
            repel = TRUE,
            title = "Column Points Based on Their Quality (cos2)")

fviz_contrib(grad.ca, choice = "col", axes = 1)

Insight: The column category that contributes the most to dimension 1 is English Learner.

fviz_contrib(grad.ca, choice = "col", axes = 2)

Insight: The column category that contributes the most to dimension 2 is Foster Care.
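Column contributions on each dimension also sum to 100%, so the Foster care value that `head()` leaves out of `column_component3` can be recovered as the remainder. Here is a base-R sketch using the printed numbers:

```r
# Column contributions (%) copied from column_component3 above; head() shows
# six of the seven columns, so Foster care is missing from the printout
contrib <- rbind(
  dim1 = c(Black = 3.69775916, White = 6.59461929,
           `Economically disadvantaged` = 0.56236030,
           `English learner` = 62.57900025,
           `Students with disabilities` = 4.80019745,
           `Homeless enrolled` = 0.02465356),
  dim2 = c(Black = 7.71281211, White = 26.90827087,
           `Economically disadvantaged` = 2.23704747,
           `English learner` = 3.72286738,
           `Students with disabilities` = 0.00971676,
           `Homeless enrolled` = 3.11179228)
)

# Contributions on each dimension sum to 100, so Foster care gets the remainder
foster_care <- 100 - rowSums(contrib)
full <- cbind(contrib, `Foster care` = foster_care)

# Largest contributor per dimension and the uniform reference line (100 / 7)
top <- apply(full, 1, function(x) names(x)[which.max(x)])
expected <- 100 / ncol(full)
round(foster_care, 2)
top
```

This gives Foster care about 21.7% on dimension 1 and 56.3% on dimension 2. It is the largest contributor to dimension 2 and second only to English learner on dimension 1, and both values are above the uniform reference of 100/7 ≈ 14.3%.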

8 Conclusion

Correspondence Analysis provides easily interpretable visualizations for exploring relationships between the categories of two categorical variables. Here, CA helps describe the graduation patterns of American students by social status (column variable) across states (row variable) in the United States. Because the chi-square test found no significant association, these patterns should be read as descriptive rather than as statistically confirmed differences.

Correspondence Analysis

Rino Anthony Stiawan

2023-05-17