Here is the final project:

We will be exploring the dataset ResumeNames.

The dataset is a cross-section of resume, call-back and employer information for 4870 fictitious resumes.

Are Emily and Greg more employable than Lakisha and Jamal? In other words, are Caucasian-sounding names more likely to receive a call-back than African American-sounding names?

Let’s begin with a walk-through of the methodology for analyzing the dataset.

library(ggpubr)
## Loading required package: ggplot2
theme_set(theme_pubr())
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(gmodels)
library(ggplot2)
ResumeNames<-"https://raw.githubusercontent.com/lszydziak/LS_CUNY/main/ResumeNames.csv"

Resume<-read.table(file=ResumeNames,header=TRUE, sep=",")


class(Resume)
## [1] "data.frame"
#Take a peek at the first few records
head(Resume)
##   X    name gender ethnicity quality call    city jobs experience honors
## 1 1 Allison female      cauc     low   no chicago    2          6     no
## 2 2 Kristen female      cauc    high   no chicago    3          6     no
## 3 3 Lakisha female      afam     low   no chicago    1          6     no
## 4 4 Latonya female      afam    high   no chicago    4          6     no
## 5 5  Carrie female      cauc    high   no chicago    3         22     no
## 6 6     Jay   male      cauc     low   no chicago    2          6    yes
##   volunteer military holes school email computer special college minimum equal
## 1        no       no   yes     no    no      yes      no     yes       5   yes
## 2       yes      yes    no    yes   yes      yes      no      no       5   yes
## 3        no       no    no    yes    no      yes      no     yes       5   yes
## 4       yes       no   yes     no   yes      yes     yes      no       5   yes
## 5        no       no    no    yes   yes      yes      no      no    some   yes
## 6        no       no    no     no    no       no     yes     yes    none   yes
##       wanted requirements reqexp reqcomm reqeduc reqcomp reqorg
## 1 supervisor          yes    yes      no      no     yes     no
## 2 supervisor          yes    yes      no      no     yes     no
## 3 supervisor          yes    yes      no      no     yes     no
## 4 supervisor          yes    yes      no      no     yes     no
## 5  secretary          yes    yes      no      no     yes    yes
## 6      other           no     no      no      no      no     no
##                           industry
## 1                    manufacturing
## 2                    manufacturing
## 3                    manufacturing
## 4                    manufacturing
## 5 health/education/social services
## 6                            trade
# What is the size of this dataset?
dim(Resume)
## [1] 4870   28
#What types of variables are in the dataset?
str(Resume)
## 'data.frame':    4870 obs. of  28 variables:
##  $ X           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ name        : chr  "Allison" "Kristen" "Lakisha" "Latonya" ...
##  $ gender      : chr  "female" "female" "female" "female" ...
##  $ ethnicity   : chr  "cauc" "cauc" "afam" "afam" ...
##  $ quality     : chr  "low" "high" "low" "high" ...
##  $ call        : chr  "no" "no" "no" "no" ...
##  $ city        : chr  "chicago" "chicago" "chicago" "chicago" ...
##  $ jobs        : int  2 3 1 4 3 2 2 4 3 2 ...
##  $ experience  : int  6 6 6 6 22 6 5 21 3 6 ...
##  $ honors      : chr  "no" "no" "no" "no" ...
##  $ volunteer   : chr  "no" "yes" "no" "yes" ...
##  $ military    : chr  "no" "yes" "no" "no" ...
##  $ holes       : chr  "yes" "no" "no" "yes" ...
##  $ school      : chr  "no" "yes" "yes" "no" ...
##  $ email       : chr  "no" "yes" "no" "yes" ...
##  $ computer    : chr  "yes" "yes" "yes" "yes" ...
##  $ special     : chr  "no" "no" "no" "yes" ...
##  $ college     : chr  "yes" "no" "yes" "no" ...
##  $ minimum     : chr  "5" "5" "5" "5" ...
##  $ equal       : chr  "yes" "yes" "yes" "yes" ...
##  $ wanted      : chr  "supervisor" "supervisor" "supervisor" "supervisor" ...
##  $ requirements: chr  "yes" "yes" "yes" "yes" ...
##  $ reqexp      : chr  "yes" "yes" "yes" "yes" ...
##  $ reqcomm     : chr  "no" "no" "no" "no" ...
##  $ reqeduc     : chr  "no" "no" "no" "no" ...
##  $ reqcomp     : chr  "yes" "yes" "yes" "yes" ...
##  $ reqorg      : chr  "no" "no" "no" "no" ...
##  $ industry    : chr  "manufacturing" "manufacturing" "manufacturing" "manufacturing" ...
#quick summary
summary(Resume)
##        X            name              gender           ethnicity        
##  Min.   :   1   Length:4870        Length:4870        Length:4870       
##  1st Qu.:1218   Class :character   Class :character   Class :character  
##  Median :2436   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2436                                                           
##  3rd Qu.:3653                                                           
##  Max.   :4870                                                           
##    quality              call               city                jobs      
##  Length:4870        Length:4870        Length:4870        Min.   :1.000  
##  Class :character   Class :character   Class :character   1st Qu.:3.000  
##  Mode  :character   Mode  :character   Mode  :character   Median :4.000  
##                                                           Mean   :3.661  
##                                                           3rd Qu.:4.000  
##                                                           Max.   :7.000  
##    experience        honors           volunteer           military        
##  Min.   : 1.000   Length:4870        Length:4870        Length:4870       
##  1st Qu.: 5.000   Class :character   Class :character   Class :character  
##  Median : 6.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 7.843                                                           
##  3rd Qu.: 9.000                                                           
##  Max.   :44.000                                                           
##     holes              school             email             computer        
##  Length:4870        Length:4870        Length:4870        Length:4870       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    special            college            minimum             equal          
##  Length:4870        Length:4870        Length:4870        Length:4870       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     wanted          requirements          reqexp            reqcomm         
##  Length:4870        Length:4870        Length:4870        Length:4870       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    reqeduc            reqcomp             reqorg            industry        
##  Length:4870        Length:4870        Length:4870        Length:4870       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
## 
#What are the variables?
colnames(Resume)
##  [1] "X"            "name"         "gender"       "ethnicity"    "quality"     
##  [6] "call"         "city"         "jobs"         "experience"   "honors"      
## [11] "volunteer"    "military"     "holes"        "school"       "email"       
## [16] "computer"     "special"      "college"      "minimum"      "equal"       
## [21] "wanted"       "requirements" "reqexp"       "reqcomm"      "reqeduc"     
## [26] "reqcomp"      "reqorg"       "industry"
# Let's reduce the number of variables in the dataset

# Columns to keep for the analysis
keep_cols<-c("name", "gender", "ethnicity", "quality", "call", "city", "jobs", "experience",
  "computer", "college", "minimum", "requirements", "reqexp", "reqcomm", "reqeduc", "reqcomp", "reqorg", "industry")

# Selecting by name keeps the original column names, so no renaming is needed
Resume2<-Resume[, keep_cols]


# What is the size of the reduced dataset (records, variables)?
dim(Resume2)
## [1] 4870   18
#Take a peek at the dataset with reduced variables

head(Resume2)
##      name gender ethnicity quality call    city jobs experience computer
## 1 Allison female      cauc     low   no chicago    2          6      yes
## 2 Kristen female      cauc    high   no chicago    3          6      yes
## 3 Lakisha female      afam     low   no chicago    1          6      yes
## 4 Latonya female      afam    high   no chicago    4          6      yes
## 5  Carrie female      cauc    high   no chicago    3         22      yes
## 6     Jay   male      cauc     low   no chicago    2          6       no
##   college minimum requirements reqexp reqcomm reqeduc reqcomp reqorg
## 1     yes       5          yes    yes      no      no     yes     no
## 2      no       5          yes    yes      no      no     yes     no
## 3     yes       5          yes    yes      no      no     yes     no
## 4      no       5          yes    yes      no      no     yes     no
## 5      no    some          yes    yes      no      no     yes    yes
## 6     yes    none           no     no      no      no      no     no
##                           industry
## 1                    manufacturing
## 2                    manufacturing
## 3                    manufacturing
## 4                    manufacturing
## 5 health/education/social services
## 6                            trade

Let’s discuss the variable “requirements”, which records whether the ad mentions some “requirement” for the job.

We assume that if an ad lists some “requirement”, the resume must meet it; otherwise there is no possibility of a call-back.

Conversely, if an ad lists no requirement, anyone is eligible for the job. So, let’s separate the resumes sent to ads without any requirements and store them for later.

table(Resume2$requirements)
## 
##   no  yes 
## 1036 3834

We will store the 1036 records without a requirement for later. Now let’s look at the 3834 records with a “requirement”.

# Recall: if the ad has no requirements, keep these 1036 records.
# If the ad does have requirements, the resume must meet them.
# If the resume does not meet the requirements, assume no call and drop the record.

NoReq<-Resume2[Resume2$requirements=="no",]

dim(NoReq)
## [1] 1036   18
# No requirement, so store these 1036 records for later.

# Do the remaining 3834 meet the requirements?
# Unfortunately, only education, computer and experience
# can be checked explicitly, so keep the records
# that meet those requirements.

So, now the issue is, did the resumes with a requirement meet that requirement?

Here is the rub: the “requirement” variable is simply a Yes/No. 

There are specific “requirement” variables: experience, communication, education, computer and organizational skills. On the other hand, the resume attribute variables are: experience, education, computer and special skills.

We can pair [requirement – attribute] for experience, education and computer.

Unfortunately, we cannot pair communication and organizational skills with the attribute of special skills because it is not explicitly defined. We don’t know what a special skill is, and the two remaining requirements do not have a one-to-one pairing with any attribute.
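
The [requirement – attribute] pairing for the yes/no variables can be sketched as a small helper. This is only an illustration: the function name `meets_req` and the toy rows are invented here and are not part of the original analysis; only the column names come from the dataset.

```r
# Hypothetical helper (not part of the original analysis): a requirement
# counts as satisfied when the ad does not ask for it, or when the resume
# has the matching attribute.
meets_req <- function(attribute, required) {
  required == "no" | attribute == "yes"
}

# Toy rows invented for illustration, using the dataset's column names
toy <- data.frame(
  college  = c("yes", "no",  "no",  "yes"),
  reqeduc  = c("yes", "yes", "no",  "no"),
  computer = c("yes", "yes", "no",  "no"),
  reqcomp  = c("no",  "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Keep a row only when both paired requirements are satisfied
keep <- meets_req(toy$college, toy$reqeduc) &
  meets_req(toy$computer, toy$reqcomp)
toy[keep, ]
```

Row 2 fails the education pairing and row 3 fails the computer pairing, so only rows 1 and 4 are kept.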

Let’s begin with the 3834 records that have a requirement.

Req<-Resume2[Resume2$requirements=="yes",]
#Here is the number of resumes that apply to a job with some "requirement"

dim(Req)
## [1] 3834   18
# Let's start with the education requirement: is it met, or is it not required?
table(Req$college,Req$reqeduc)
##      
##         no  yes
##   no   986   90
##   yes 2328  430
# This cross tab view gives a better description of the breakdown by categories

CrossTable(Req$college, Req$reqeduc)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  3834 
## 
##  
##              | Req$reqeduc 
##  Req$college |        no |       yes | Row Total | 
## -------------|-----------|-----------|-----------|
##           no |       986 |        90 |      1076 | 
##              |     3.364 |    21.440 |           | 
##              |     0.916 |     0.084 |     0.281 | 
##              |     0.298 |     0.173 |           | 
##              |     0.257 |     0.023 |           | 
## -------------|-----------|-----------|-----------|
##          yes |      2328 |       430 |      2758 | 
##              |     1.312 |     8.365 |           | 
##              |     0.844 |     0.156 |     0.719 | 
##              |     0.702 |     0.827 |           | 
##              |     0.607 |     0.112 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |      3314 |       520 |      3834 | 
##              |     0.864 |     0.136 |           | 
## -------------|-----------|-----------|-----------|
## 
## 

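So, 986+2328 records have no education requirement, and we keep these. Another 430 meet the education requirement, so we keep these as well. The remaining 90 have an education requirement but no college degree and are dropped. That leaves 986+2328+430 = 3744 records to check against the next requirement.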

CollegeR<-Req[Req$college=="yes"|(Req$college=="no" & Req$reqeduc=="no"),]

dim(CollegeR)
## [1] 3744   18

These 3744 records meet the education requirement or have none. Now check whether they meet the computer requirement as well.

table(CollegeR$computer, CollegeR$reqcomp)
##      
##         no  yes
##   no   432  110
##   yes 1237 1965

This cross tab view gives a better description of the breakdown by categories

CrossTable(CollegeR$computer, CollegeR$reqcomp)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  3744 
## 
##  
##                   | CollegeR$reqcomp 
## CollegeR$computer |        no |       yes | Row Total | 
## ------------------|-----------|-----------|-----------|
##                no |       432 |       110 |       542 | 
##                   |   150.022 |   120.669 |           | 
##                   |     0.797 |     0.203 |     0.145 | 
##                   |     0.259 |     0.053 |           | 
##                   |     0.115 |     0.029 |           | 
## ------------------|-----------|-----------|-----------|
##               yes |      1237 |      1965 |      3202 | 
##                   |    25.394 |    20.425 |           | 
##                   |     0.386 |     0.614 |     0.855 | 
##                   |     0.741 |     0.947 |           | 
##                   |     0.330 |     0.525 |           | 
## ------------------|-----------|-----------|-----------|
##      Column Total |      1669 |      2075 |      3744 | 
##                   |     0.446 |     0.554 |           | 
## ------------------|-----------|-----------|-----------|
## 
## 

So, 432+1237 records have no computer requirement, and we keep these. Another 1965 meet the computer requirement, so we keep these as well. That leaves 432+1237+1965 = 3634 records to check against the next requirement.
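
As a quick check of this arithmetic, the count can be recomputed from the four cells of the cross tab. The sketch below types the reported counts in by hand instead of reading the dataset; the object names `comp_tab`, `unmet` and `kept` are made up for this example.

```r
# Counts copied from the cross tab above.
# Rows: resume lists computer skills; columns: ad requires computer skills.
comp_tab <- matrix(
  c(432, 1237, 110, 1965),   # filled column by column
  nrow = 2,
  dimnames = list(computer = c("no", "yes"),
                  reqcomp  = c("no", "yes"))
)

# The only cell that fails the check: required by the ad, missing on the resume
unmet <- comp_tab["no", "yes"]

# Everything else is kept
kept <- sum(comp_tab) - unmet
kept
```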

CompR<-CollegeR[CollegeR$computer=="yes"|(CollegeR$computer=="no" & CollegeR$reqcomp=="no"),]

dim(CompR)
## [1] 3634   18

These 3634 records meet the computer and education requirements. Now check experience.

Experience required (the variable “minimum”): none, some, 0, 0.5, 1-8 or 10 years. Actual experience: 1-44 years.

So, we must check whether the actual experience meets the required experience.
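
One possible way to express this check is to convert the required minimum to a number and compare it with the years of experience. This is a sketch under stated assumptions and not the filter used in this walk-through: the helper name `meets_experience` and the toy rows are invented, and treating “none” and “some” as always satisfied is an assumption that mirrors the manual approach.

```r
# Hypothetical helper: TRUE when the resume's years of experience satisfy
# the ad's stated minimum. Non-numeric levels ("none", "some") become NA
# after conversion and are treated as always satisfied (an assumption).
meets_experience <- function(experience, minimum) {
  min_years <- suppressWarnings(as.numeric(minimum))
  is.na(min_years) | experience >= min_years
}

# Toy rows invented for illustration
toy <- data.frame(
  experience = c(1L, 2L, 6L, 9L, 4L),
  minimum    = c("0.5", "3", "none", "10", "some"),
  stringsAsFactors = FALSE
)

toy$meets <- meets_experience(toy$experience, toy$minimum)
toy
```

Because the comparison is numeric, spellings such as "0.5" and ".5" are handled alike, which an exact string comparison would not do.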


table(CompR$experience,CompR$minimum)
##     
##        0 0.5   1  10   2   3   4   5   6   7   8 none some
##   1    0   1   0   0   1   0   0   0   0   0   0   22    4
##   2    1   0  13   1  22  10   0   4   0   0   0   87   93
##   3    0   0   4   1  19   6   0   1   0   0   0   75   45
##   4    0   0  10   0  44  46   1   3   1   0   0  193  110
##   5    0   0   7   2  32  31   2  18   0   0   0  162  111
##   6    1   0  25   2  52  59   1  22   1   0   0  284  166
##   7    0   0  20   1  35  22   0  20   4   3   1  145  118
##   8    1   0  21   1  53  48   1  30   0   2   1  128  153
##   9    0   1   1   0  11   4   0   7   0   2   2   81   26
##   10   0   0   4   4  13   4   0   6   0   1   3   46   27
##   11   0   2   3   6   4   7   0   9   0   1   2  100   20
##   12   0   0   2   0   5   3   0   1   0   0   0   34   17
##   13   0   0   1   0  11   8   0   4   0   0   0   52   33
##   14   0   0   4   0   7  16   0   8   0   1   1   48   30
##   15   0   0   1   0   2   0   0   4   0   0   0   18    5
##   16   0   0   1   0   0   6   1   0   0   0   0   50    7
##   17   0   0   0   0   0   2   0   0   0   0   0    0    0
##   18   0   0   4   0   7  14   0   5   0   1   0    9   18
##   19   0   0   0   0   4   1   0   1   0   0   0   10    7
##   20   0   0   0   0   3   4   0   2   0   0   0    9    6
##   21   0   0   3   0   5   2   1   6   1   0   0   19    9
##   22   0   0   1   0   0   0   1   0   0   0   0    4    2
##   23   0   0   1   0   1   0   0   1   1   0   0    2    1
##   25   0   0   2   0   1   0   0   1   0   0   0    2    1
##   26   0   0   5   0   7  13   0   6   0   1   0   26   18
##   44   0   0   0   0   0   1   0   0   0   0   0    0    0
class(CompR$experience)
## [1] "integer"
class(CompR$minimum)
## [1] "character"
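# Keep a record when its experience meets the stated minimum.
# "none", "0" and "some" are always treated as met.
# Note: the table above lists the half-year level as "0.5", so the ".5"
# comparison below matches no rows and those 4 records are left out of the
# 3601 reported. Using "0.5" would include them and change the later counts.
ExpR<-CompR[(CompR$minimum == "none"| CompR$minimum=="0")
            | (CompR$minimum=="some" & CompR$experience>=0)
            | (CompR$minimum==".5" & CompR$experience>=0.5)
            | (CompR$minimum=="1" & CompR$experience >=1)
            | (CompR$minimum=="2" & CompR$experience >=2)
            | (CompR$minimum=="3" & CompR$experience >=3)
            | (CompR$minimum=="4" & CompR$experience >=4)
            | (CompR$minimum=="5" & CompR$experience >=5)
            | (CompR$minimum=="6" & CompR$experience >=6)
            | (CompR$minimum=="7" & CompR$experience >=7)
            | (CompR$minimum=="8" & CompR$experience >=8)
            | (CompR$minimum=="10" & CompR$experience >=10),]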

dim(ExpR)
## [1] 3601   18

3601 of the 3834 records meet the computer, education and experience requirements.

Recall, the 3834 records were retained because “requirements” = yes. We understand this to mean that the ad lists at least one of the following requirements: education, experience, computer, communication or organization.

We were able to tease out whether the experience, computer and education requirements are met; however, we are unable to distinguish whether the organization and communication requirements are met.

Unfortunately, we need to drop the records whose ads have an organizational skills requirement, a communication requirement or both, because we cannot explicitly check whether the resumes have these attributes.

In other words, we exclude the subset of records with communication or organizational skills “requirements”, since we have no way of determining whether these resumes meet them.

ExpRFinal<-ExpR[ExpR$reqcomm=="no" & ExpR$reqorg=="no",]


dim(ExpRFinal)
## [1] 2851   18

So, 2851 resumes meet all of the “requirements” that we can check.

Now we need to append the 1036 resumes with no requirements to the 2851 that met the education, computer and experience requirements. 1036+2851 = 3887 records to be analyzed.

NewResume<-rbind(NoReq,ExpRFinal)

dim(NewResume)
## [1] 3887   18

Recall we began with 4870 records and kept those that met the checkable requirements (2851) or had no requirements (1036).
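
The filtering steps can be summarized in a small attrition table. The sketch below re-enters the record counts reported throughout this walk-through by hand; the object names `steps` and `final_n` and the step labels are made up for this example.

```r
# Record counts reported at each step of the walk-through
steps <- data.frame(
  step = c("All resumes",
           "Ad lists no requirements",
           "Ad lists some requirement",
           "  education requirement met or absent",
           "  computer requirement met or absent",
           "  experience requirement met or absent",
           "  no communication/organization requirement"),
  n = c(4870, 1036, 3834, 3744, 3634, 3601, 2851)
)

# Analysis set: resumes with no requirements plus resumes that passed every check
final_n <- steps$n[2] + steps$n[7]

steps
final_n
```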

We move to our analysis of 3887 records.

Here are the names in the study, classified by ethnicity

NewResume$ethnicity[NewResume$ethnicity == 'afam'] <- 'AfricanAmer'
NewResume$ethnicity[NewResume$ethnicity == 'cauc'] <- 'Caucasian'

table(NewResume$name,NewResume$ethnicity)
##           
##            AfricanAmer Caucasian
##   Aisha            134         0
##   Allison            0       182
##   Anne               0       187
##   Brad               0        57
##   Brendan            0        56
##   Brett              0        51
##   Carrie             0       132
##   Darnell           38         0
##   Ebony            156         0
##   Emily              0       169
##   Geoffrey           0        53
##   Greg               0        48
##   Hakim             47         0
##   Jamal             54         0
##   Jay                0        60
##   Jermaine          48         0
##   Jill               0       170
##   Kareem            61         0
##   Keisha           145         0
##   Kenya            150         0
##   Kristen            0       153
##   Lakisha          159         0
##   Latonya          186         0
##   Latoya           163         0
##   Laurie             0       151
##   Leroy             57         0
##   Matthew            0        56
##   Meredith           0       140
##   Neil               0        72
##   Rasheed           58         0
##   Sarah              0       146
##   Tamika           200         0
##   Tanisha          160         0
##   Todd               0        59
##   Tremayne          65         0
##   Tyrone            64         0

Recall, we are interested in who got a call back. Let’s look at some visualizations.

#there are few callbacks
ggplot(NewResume) + geom_bar(aes(x = call))

# a larger number of qualified females vs males
ggplot(NewResume) + geom_bar(aes(x = gender))

#Approximately the same number of African American and Caucasian resumes in the analysis, well balanced.

ggplot(NewResume, aes(ethnicity)) +
  geom_bar(fill = "#0073C2FF") +
  theme_pubclean()

# Caucasians appear to have more call backs
ggplot(data = NewResume, aes(x = ethnicity, fill = call)) +
    geom_bar()

# the boxplots suggest similar years experience 
# of the two groups, African American and Caucasian

boxplot(NewResume$experience~NewResume$ethnicity)

# This gives you an idea of popular names and call backs
ggplot(NewResume, aes(x = name, fill = call)) +
  geom_bar() +labs(title = "Call back by Name")+
  theme(axis.text.x = element_text(angle = 90))  

# This gives you an idea of applicants by industry
# Similar number of Caucasian vs African American applicants across industries

# The bars are filled by ethnicity, not by call back, so the title says so
ggplot(NewResume, aes(x = industry, fill = ethnicity)) +
  geom_bar() +labs(title = "Applicants by Industry and Ethnicity")+
  theme(axis.text.x = element_text(angle = 90))

# Similar number of Caucasian vs African American applicants by requirement status

# The bars are filled by ethnicity, not by call back, so the title says so
ggplot(NewResume, aes(x = requirements, fill = ethnicity)) +
  geom_bar() +labs(title = "Applicants by Requirement Status and Ethnicity")+
  theme(axis.text.x = element_text(angle = 90))


Now, let’s get to the root problem.

Is there an association between call-backs and ethnicity?

table1<-table(NewResume$ethnicity,NewResume$call)


table1
##              
##                 no  yes
##   AfricanAmer 1818  127
##   Caucasian   1752  190
chisq.test(table1)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table1
## X-squared = 13.307, df = 1, p-value = 0.0002644
# The chisq test is significant.

There is an association between ethnicity and call backs: 190 of 1942 Caucasian resumes (about 9.8%) received a call back, compared with 127 of 1945 African American resumes (about 6.5%). This conclusion is supported by the significant chisq test (p = 0.0002644). Furthermore, view the mosaic plot below. The mosaic plot gives an overview of the data and makes it possible to recognize relationships between different variables. Notice the larger proportion of call backs for Caucasians as depicted in the plot.

mosaicplot(table1, main = "Mosaic plot:  Call backs for Applicants ", color = TRUE)
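
The association can also be read as call-back rates per group. The sketch below rebuilds the 2x2 table from the counts printed above, so it runs without the dataset; the object names `table1_counts`, `rates` and `res` are made up for this example.

```r
# Counts copied from table1 above (rows: ethnicity, columns: call)
table1_counts <- matrix(
  c(1818, 1752, 127, 190),   # filled column by column
  nrow = 2,
  dimnames = list(ethnicity = c("AfricanAmer", "Caucasian"),
                  call      = c("no", "yes"))
)

# Row proportions: share of each group's resumes with and without a call back
rates <- prop.table(table1_counts, margin = 1)
round(rates[, "yes"], 4)

# Same test as above; for a 2x2 table chisq.test() applies
# Yates' continuity correction by default
res <- chisq.test(table1_counts)
res$statistic
res$p.value
```

The `yes` column of `rates` gives roughly 0.065 for African American names and 0.098 for Caucasian names, which is the gap the mosaic plot shows.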


Now, let’s analyze two groups: resumes with requirements that were met, and resumes with no requirements. Is there an association between ethnicity and call backs in these groups?

We begin with the Group of resumes with no required qualifications.

NewResumeNoReq<-NewResume[NewResume$requirements=="no",]

table3<-table(NewResumeNoReq$ethnicity,NewResumeNoReq$call)
table3
##              
##                no yes
##   AfricanAmer 480  38
##   Caucasian   450  68
chisq.test(table3)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table3
## X-squared = 8.8383, df = 1, p-value = 0.00295
# The chisq test is significant.

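There is an association between ethnicity and call backs for those applicants responding to ads with no requirements: 68 of 518 Caucasian resumes (about 13.1%) received a call back, compared with 38 of 518 African American resumes (about 7.3%).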

mosaicplot(table3, main = "Call backs for Applicants NO requirements needed", color = TRUE)


Now let’s look at the resumes that met the required qualifications, a more skillful pool of prospective employees.

NewResumeReq<-NewResume[NewResume$requirements=="yes",]

table4<-table(NewResumeReq$ethnicity,NewResumeReq$call)
table4
##              
##                 no  yes
##   AfricanAmer 1338   89
##   Caucasian   1302  122
chisq.test(table4)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table4
## X-squared = 5.3139, df = 1, p-value = 0.02116
# The chisq test is significant.



mosaicplot(table4,main = "Call backs for Applicants meeting requirements", color = TRUE)


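The group of resumes that meet requirements also shows an association between ethnicity and call backs: 122 of 1424 Caucasian resumes (about 8.6%) received a call back, compared with 89 of 1427 African American resumes (about 6.2%).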
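
To put the two groups side by side, the same summary can be computed for each 2x2 table. This is a sketch built from the counts printed in table3 and table4; the helper name `callback_summary` and the object names are invented for this example.

```r
# Hypothetical helper: call-back rate per ethnicity and the chi-squared
# p-value for one 2x2 table, given the "no" and "yes" counts in the
# row order AfricanAmer, Caucasian.
callback_summary <- function(no, yes) {
  tab <- matrix(c(no, yes), nrow = 2,
                dimnames = list(ethnicity = c("AfricanAmer", "Caucasian"),
                                call      = c("no", "yes")))
  test <- chisq.test(tab)   # Yates' correction is the default for 2x2
  data.frame(
    ethnicity     = rownames(tab),
    callback_rate = round(tab[, "yes"] / rowSums(tab), 4),
    p_value       = signif(test$p.value, 4),
    row.names     = NULL
  )
}

# Counts copied from table3 (no requirements) and table4 (requirements met)
no_req  <- callback_summary(no = c(480, 450),   yes = c(38, 68))
req_met <- callback_summary(no = c(1338, 1302), yes = c(89, 122))

no_req
req_met
```

In both groups the call-back rate is higher for Caucasian names, and the gap is wider among ads with no requirements.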

FINAL CONCLUSION

After examining the underlying data, we reduced the dataset to the records suitable for our study.

There is an association between ethnicity and call backs, with Caucasians obtaining a higher proportion of call backs than African Americans.

In a further breakdown into resumes with and without requirements, both groups had a significant chisq test, indicating an association between ethnicity and call backs, with Caucasians more likely to get called back.