Project 2 Team Members

Peter Phung, Coffy Andrews-Guo, Chinedu Onyeka, Krutika Patel

Data

Alan Noel, https://raw.githubusercontent.com/alnoel/CUNYSPS-Data607/main/globaldataset_20200414.csv

Infroduction

In this report we will be transforming and analyzing a data set on global human trafficking cases from 2002 to 2019, found at https://raw.githubusercontent.com/Patel-Krutika/Data_607/main/globaldataset_20200414.csv The data set contains 63 variables with information on the manner of control, exploitation and labor, and country in each case. The data will be manipulated to answer the following questions:

1. What means of control is most used now?
2. Which means of control is most used with females versus males?
3. Is there a trend in which means of control is most prevalent over the years?
4. What age groups are exploited the most in some of the types of labor?
library(dplyr)
library(plyr)
library(tidyr)
library(tidyverse)
library(ggplot2)
library(reshape2)

A majority of the columns are of type integer using 1 and 0 to indicate TRUE and FALSE values respectively, and -99 to represent NA. As the data set is being read, all of the -99 value have been changed to NA for easier analysis.

df <- data.frame(read.csv("https://raw.githubusercontent.com/Patel-Krutika/Data_607/main/globaldataset_20200414.csv", header=T, na.strings=c("-99","NA")))
dim(df)
## [1] 48801    64

For the purposes of this report we will only be looking at the manner of trafficking with regards to years and gender. The necessary columns have been selected from the original data frame to create a compact new data frame.

ht <- df %>% select( 2,4,5, starts_with("means"), starts_with("is"), starts_with("type"),
             starts_with("recruiter"), starts_with("Country"))

1. What means of control is most used now?

To see the means of control most used, a temporary data frame is created with the year column and all columns associated with a specific type of control. Over all Psychological Abuse has been the most used form of control, as well as the most used in 2018. The top five forms of control are show for all years combined and year 2018. The means of control for all cases registered in 2019 are not specified, hence the focus is on year 2018.

moc <- ht %>% select(starts_with("means") & !ends_with("Concatenated") &
                       !ends_with("NotSpecified"))
knitr::kable(head(data.frame(Means_Of_Control_Total = colnames(moc), 
                             Count = colSums(moc, na.rm = TRUE))%>% arrange(desc(Count)),5))
Means_Of_Control_Total Count
meansOfControlPsychologicalAbuse 4845
meansOfControlRestrictsMovement 4415
meansOfControlThreats 3972
meansOfControlPhysicalAbuse 3632
meansOfControlTakesEarnings 2776
moc <- ht %>% filter(yearOfRegistration == 2018) %>% select(starts_with("means") & !ends_with("Concatenated") &
                       !ends_with("NotSpecified")) 
knitr::kable(head(data.frame(Means_Of_Control_Total_2018 = colnames(moc), 
                             Count = colSums(moc, na.rm = TRUE))%>% arrange(desc(Count)),5))
Means_Of_Control_Total_2018 Count
meansOfControlPsychologicalAbuse 997
meansOfControlRestrictsMovement 886
meansOfControlThreats 822
meansOfControlPhysicalAbuse 780
meansOfControlPsychoactiveSubstances 734

2. Which means of control is most used with females versus males?

Two separate data frames are created for with the gender filtered to included only female and male cases over all years respectively and all columns associated with a specific form of control. For females “Psychological Abuse” for the most used form of control, while the most used for males was “False Promises”. The top five form of control are shown for each group.

female <- data.frame(ht %>% filter(gender=="Female") %>% select(starts_with("means")
                          & !ends_with("Concatenated") & !ends_with("NotSpecified")))
knitr::kable(head(data.frame(Means_Of_Control_Female = colnames(female), 
                             Count = colSums(female, na.rm = TRUE)) %>% arrange(desc(Count)),5))
Means_Of_Control_Female Count
meansOfControlPsychologicalAbuse 3549
meansOfControlRestrictsMovement 3231
meansOfControlPhysicalAbuse 2982
meansOfControlThreats 2941
meansOfControlPsychoactiveSubstances 2224
male <- data.frame(ht %>% filter(gender=="Male" ) %>% select(starts_with("means") 
                          & !ends_with("Concatenated") & !ends_with("NotSpecified")))
knitr::kable(head(data.frame(Means_Of_Control_Male = colnames(male), 
                             Count = colSums(male, na.rm = TRUE)) %>% arrange(desc(Count)),5))
Means_Of_Control_Male Count
meansOfControlFalsePromises 1403
meansOfControlTakesEarnings 1332
meansOfControlExcessiveWorkingHours 1301
meansOfControlPsychologicalAbuse 1296
meansOfControlRestrictsMovement 1184

3. Is there a trend in which means of control is most prevalent over the years?

In order to see the trend in the means of control over the years, all specific means are categorized into one of six categories:

Money: Debt Bondage, Restricts Financial Access, Takes Earnings
Abuse: Physical Abuse, Psychological Abuse, Sexual Abuse
Physical Dependency: Psychoactive Substances, Restricts Movement, Restricts Medical Care
Labor: Excessive Working Hours
Blackmail: Threats, Uses Children, Threat of Law Enforcement, Withholds Documents, Withholds Necessities
False Info: False Promises

Categorizing the forms of control allows us to better visualize the trends in use of control than if we were to look at each form of control separately.

The “ht” data frame is used to create a new temporary data frame with the year of case registration and each specific means of control. It is then mutate to add a column for each of the above category which holds the sum of values for each mean of control in it. The data frame is grouped by each and the sum of each category is calculated for each year.

moc_year <- ht %>% select(yearOfRegistration, starts_with("means") & !ends_with("Concatenated") & !ends_with("NotSpecified")& !ends_with("Other")) 
c <- colnames(moc_year)
c <- gsub('^.{14}', '', c)
c[1] <- "Year"
colnames(moc_year) <- c

moc_year <- moc_year %>% mutate(Money = DebtBondage + RestrictsFinancialAccess + TakesEarnings,
                                Abuse = PhysicalAbuse + PsychologicalAbuse + SexualAbuse,
                                Physical_Dependency = PsychoactiveSubstances, RestrictsMovement + RestrictsMedicalCare, 
                                Labour = ExcessiveWorkingHours,
                                Blackmail = Threats + UsesChildren + ThreatOfLawEnforcement + WithholdsDocuments + WithholdsNecessities,
                                False_Info = FalsePromises )

moc_year <- moc_year %>% group_by(Year) %>% dplyr::summarise(  Money = sum(Money, na.rm=TRUE),
                                                               Abuse = sum(Abuse, na.rm=TRUE),
                                                               Physical_Dependency = sum(Physical_Dependency, na.rm=TRUE),
                                                               Labour = sum(Labour, na.rm=TRUE),
                                                               Blackmail = sum(Blackmail, na.rm=TRUE),
                                                               False_Info = sum(False_Info, na.rm=TRUE))

knitr::kable(moc_year)
Year Money Abuse Physical_Dependency Labour Blackmail False_Info
2002 0 0 0 0 0 0
2003 0 0 0 0 0 0
2004 0 0 0 0 0 0
2005 0 3 0 0 0 0
2006 0 0 0 1 0 1
2007 0 9 2 0 0 4
2008 0 174 29 102 0 122
2009 0 403 70 154 0 184
2010 0 168 48 174 0 201
2011 0 3 2 4 0 4
2012 0 12 0 77 0 71
2013 0 6 7 197 0 176
2014 0 0 4 90 0 177
2015 6 105 202 552 0 596
2016 15 177 528 476 0 597
2017 18 221 656 253 0 527
2018 0 378 734 77 3 96
2019 0 0 0 0 0 0
plot(moc_year$Year,moc_year$Money,type="l",col="red", ylim = c(0,800), main = "Labor Type", ylab = "Labor Type" )
lines(moc_year$Year,moc_year$Abuse,col="green")
lines(moc_year$Year,moc_year$Physical_Dependency,col="yellow")
lines(moc_year$Year,moc_year$Labour,col="blue")
lines(moc_year$Year,moc_year$Blackmail,col="pink")
lines(moc_year$Year,moc_year$False_Info,col="violet")
legend(x = "topleft",
       legend = c("Money", "Abuse", "Physical_Dependency", "Labour", "Blackmail", "False_Info"),  # Legend texts
       col = c("red","green","yellow", "blue", "pink", "purple"),
       lwd = 2)

Forms of control relating to physical dependency have the highest numbers, followed by false information.

4. What age groups are exploited the most in some of the types of labor?

For the purposes of this question we will focus on the top five specific type of labor.

labour <- data.frame(select(ht, starts_with("typeOfLabour") & !ends_with("Concatenated")
                            & !ends_with("Other") & !ends_with("NotSpecified")))
knitr::kable(slice(data.frame(type = colnames(labour), 
                              count = colSums(labour, na.rm = TRUE)) %>% arrange(desc(count)), 1:5))
type count
typeOfLabourDomesticWork 2744
typeOfLabourConstruction 1254
typeOfLabourManufacturing 453
typeOfLabourAgriculture 152
typeOfLabourBegging 149

To see the age groups most affected by the five types of labor, a temporary data frame will be created for each labor by selecting the age group column and the respective labor column. In order to account for NA values, the data frame has been filtered to not contain any rows with NAs in the age group column. The data is then grouped by age and arranged in descending order. A final data frame is created to contain the top age group for each type of labor.

#Domestic Work
dw <- ht %>% select(typeOfLabourDomesticWork, ageBroad) %>% 
  filter(typeOfLabourDomesticWork == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>% 
  dplyr::summarise(Total = n(),) %>% arrange(desc(Total))
#Construction
c <- ht %>% select(typeOfLabourConstruction, ageBroad) %>% 
  filter(typeOfLabourConstruction == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>% 
  dplyr::summarise(Total = n(),) %>% arrange(desc(Total))
#Manufacturing
m <- ht %>% select(typeOfLabourManufacturing, ageBroad) %>% 
  filter(typeOfLabourManufacturing == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>% 
  dplyr::summarise(Total = n(),) %>% arrange(desc(Total))
#Agriculture
a <- ht %>% select(typeOfLabourAgriculture, ageBroad) %>% 
  filter(typeOfLabourAgriculture == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>% 
  dplyr::summarise(Total = n(),) %>% arrange(desc(Total))
#Begging
b <- ht %>% select(typeOfLabourBegging, ageBroad) %>%
  filter(typeOfLabourBegging == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>% 
  dplyr::summarise(Total = n(),) %>% arrange(desc(Total))
labour_type_age_broad <- data.frame(
  Labour_Type = c("Domestic Work", "Contruction", "Manufacturing", "Agriculture", "Begging"))
labour_type <- data.frame(head(dw,1))
labour_type <- bind_rows(labour_type, head(c,1), head(m,1), head(a,1), head(b,1))
knitr::kable(bind_cols(labour_type_age_broad, labour_type))
Labour_Type ageBroad Total
Domestic Work 30–38 223
Contruction 30–38 580
Manufacturing 30–38 130
Agriculture 30–38 69
Begging 9–17 102