Project 2 Team Members
Peter Phung, Coffy Andrews-Guo, Chinedu Onyeka, Krutika Patel
Data
Alan Noel, https://raw.githubusercontent.com/alnoel/CUNYSPS-Data607/main/globaldataset_20200414.csv
Introduction
In this report we transform and analyze a data set on global human trafficking cases from 2002 to 2019, found at https://raw.githubusercontent.com/Patel-Krutika/Data_607/main/globaldataset_20200414.csv. The data set contains 63 variables with information on the means of control, the type of exploitation and labor, and the countries involved in each case. The data will be manipulated to answer the following questions:
1. What means of control is most used now?
2. Which means of control is most used with females versus males?
3. Is there a trend in which means of control is most prevalent over the years?
4. What age groups are exploited the most in some of the types of labor?
# tidyverse attaches dplyr, tidyr, and ggplot2. plyr is not loaded because, when it is
# attached after dplyr, it masks dplyr's summarise(), mutate(), and arrange().
library(tidyverse)
A majority of the columns are integers that use 1 and 0 to indicate TRUE and FALSE, respectively, and -99 to represent a missing value. As the data set is read, all -99 values are converted to NA for easier analysis.
df <- read.csv("https://raw.githubusercontent.com/Patel-Krutika/Data_607/main/globaldataset_20200414.csv",
header = TRUE, na.strings = c("-99", "NA"))
dim(df)
## [1] 48801    64
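Why `na.strings` matters for 0/1 indicator columns can be seen on a few made-up rows. The sketch below assumes a tiny inline CSV (the `caseId` column and all values are hypothetical) and compares column sums with and without the conversion.

```r
# Made-up CSV text using the report's coding: 1 = TRUE, 0 = FALSE, -99 = missing
csv_text <- "caseId,meansOfControlThreats,meansOfControlDebtBondage
1,1,0
2,0,-99
3,-99,1"

# Without na.strings, -99 is read as an ordinary number and distorts the sums
raw <- read.csv(text = csv_text)
colSums(raw[, -1])  # both columns sum to -98

# With na.strings, -99 becomes NA and can be skipped with na.rm = TRUE
toy <- read.csv(text = csv_text, na.strings = c("-99", "NA"))
colSums(toy[, -1], na.rm = TRUE)  # both columns sum to 1
```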
For the purposes of this report, we only look at the manner of trafficking with regard to year of registration, gender, and age group. The necessary columns are selected from the original data frame to create a more compact data frame.
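The selection below picks columns by name prefix with tidyselect's `starts_with()`, whose `ignore.case` argument defaults to `TRUE`, so a short prefix such as "is" also matches names that begin with "Is". A minimal base-R sketch on made-up column names (`IssueDate` is hypothetical) shows case-sensitive and case-insensitive prefix matching, plus the prefix-and-excluded-suffix pattern used throughout this report.

```r
# Made-up column names; "IssueDate" is hypothetical and only illustrates case handling
nm <- c("yearOfRegistration", "isForcedLabour", "IssueDate",
        "meansOfControlThreats", "meansOfControlConcatenated")

# Case-sensitive prefix match (base R)
startsWith(nm, "is")
# Case-insensitive prefix match, mirroring starts_with()'s default ignore.case = TRUE
grepl("^is", nm, ignore.case = TRUE)

# Prefix match combined with an excluded suffix, like
# starts_with("means") & !ends_with("Concatenated")
nm[startsWith(nm, "means") & !endsWith(nm, "Concatenated")]
```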
ht <- df %>% select(yearOfRegistration, gender, ageBroad,
starts_with("means"), starts_with("is"), starts_with("type"),
starts_with("recruiter"), starts_with("Country"))
1. What means of control is most used now?
To see which means of control are most used, a temporary data frame is created with every column for a specific means of control (the concatenated and not-specified columns are excluded), and each column is summed. Overall, psychological abuse has been the most used form of control, and it was also the most used in 2018. The top five forms of control are shown for all years combined and for 2018. The means of control are not specified for any of the cases registered in 2019, hence the focus on 2018.
moc <- ht %>% select(starts_with("means") & !ends_with("Concatenated") &
!ends_with("NotSpecified"))
knitr::kable(head(data.frame(Means_Of_Control_Total = colnames(moc),
Count = colSums(moc, na.rm = TRUE)) %>% arrange(desc(Count)), 5))
| Means_Of_Control_Total | Count |
|---|---|
| meansOfControlPsychologicalAbuse | 4845 |
| meansOfControlRestrictsMovement | 4415 |
| meansOfControlThreats | 3972 |
| meansOfControlPhysicalAbuse | 3632 |
| meansOfControlTakesEarnings | 2776 |
moc <- ht %>% filter(yearOfRegistration == 2018) %>% select(starts_with("means") & !ends_with("Concatenated") &
!ends_with("NotSpecified"))
knitr::kable(head(data.frame(Means_Of_Control_Total_2018 = colnames(moc),
Count = colSums(moc, na.rm = TRUE)) %>% arrange(desc(Count)), 5))
| Means_Of_Control_Total_2018 | Count |
|---|---|
| meansOfControlPsychologicalAbuse | 997 |
| meansOfControlRestrictsMovement | 886 |
| meansOfControlThreats | 822 |
| meansOfControlPhysicalAbuse | 780 |
| meansOfControlPsychoactiveSubstances | 734 |
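The sum-and-rank steps used for all years and for 2018 can be wrapped in one helper. The sketch below is base R on hypothetical toy data; `top_means()` is a made-up helper name, not part of the original analysis, and it returns a named vector rather than a formatted table.

```r
# Hypothetical toy data using the report's column naming
toy <- data.frame(
  yearOfRegistration = c(2017, 2018, 2018, 2018),
  meansOfControlThreats = c(1, 1, 0, NA),
  meansOfControlPhysicalAbuse = c(0, 1, 1, 1),
  meansOfControlNotSpecified = c(0, 0, 0, 1)
)

# Hypothetical helper: sum each 0/1 means-of-control column (skipping NAs),
# drop the concatenated/not-specified columns, and return the n largest counts
top_means <- function(data, n = 5) {
  cols <- grep("^meansOfControl", names(data), value = TRUE)
  cols <- cols[!grepl("(Concatenated|NotSpecified)$", cols)]
  counts <- colSums(data[cols], na.rm = TRUE)
  head(sort(counts, decreasing = TRUE), n)
}

all_years <- top_means(toy)                                # all years combined
y2018 <- top_means(toy[toy$yearOfRegistration == 2018, ])  # 2018 only
all_years
y2018
```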
2. Which means of control is most used with females versus males?
Two separate data frames are created, one for female and one for male cases over all years, each containing all columns associated with a specific form of control. For females, “Psychological Abuse” was the most used form of control, while for males it was “False Promises”. The top five forms of control are shown for each group.
female <- data.frame(ht %>% filter(gender=="Female") %>% select(starts_with("means")
& !ends_with("Concatenated") & !ends_with("NotSpecified")))
knitr::kable(head(data.frame(Means_Of_Control_Female = colnames(female),
Count = colSums(female, na.rm = TRUE)) %>% arrange(desc(Count)), 5))
| Means_Of_Control_Female | Count |
|---|---|
| meansOfControlPsychologicalAbuse | 3549 |
| meansOfControlRestrictsMovement | 3231 |
| meansOfControlPhysicalAbuse | 2982 |
| meansOfControlThreats | 2941 |
| meansOfControlPsychoactiveSubstances | 2224 |
male <- data.frame(ht %>% filter(gender=="Male" ) %>% select(starts_with("means")
& !ends_with("Concatenated") & !ends_with("NotSpecified")))
knitr::kable(head(data.frame(Means_Of_Control_Male = colnames(male),
Count = colSums(male, na.rm = TRUE)) %>% arrange(desc(Count)), 5))
| Means_Of_Control_Male | Count |
|---|---|
| meansOfControlFalsePromises | 1403 |
| meansOfControlTakesEarnings | 1332 |
| meansOfControlExcessiveWorkingHours | 1301 |
| meansOfControlPsychologicalAbuse | 1296 |
| meansOfControlRestrictsMovement | 1184 |
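Instead of building one data frame per gender, the counts for every gender can also be computed in a single step. This base-R sketch uses `rowsum()` on hypothetical toy data to produce one row per gender and one column per means of control, then picks the top column in each row; ties would resolve to the first column.

```r
# Hypothetical toy data: one row per case
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Male"),
  meansOfControlPsychologicalAbuse = c(1, 1, 0, 1, NA),
  meansOfControlFalsePromises = c(0, 1, 1, 1, 1)
)

cols <- grep("^meansOfControl", names(toy), value = TRUE)

# One row per gender, one column per means of control (NAs skipped)
by_gender <- rowsum(as.matrix(toy[cols]), group = toy$gender, na.rm = TRUE)
by_gender

# Most used means of control within each gender
top <- apply(by_gender, 1, function(counts) names(which.max(counts)))
top
```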
3. Is there a trend in which means of control is most prevalent over the years?
In order to see the trend in the means of control over the years, all specific means are categorized into one of six categories:
Money: Debt Bondage, Restricts Financial Access, Takes Earnings
Abuse: Physical Abuse, Psychological Abuse, Sexual Abuse
Physical Dependency: Psychoactive Substances, Restricts Movement, Restricts Medical Care
Labor: Excessive Working Hours
Blackmail: Threats, Uses Children, Threat of Law Enforcement, Withholds Documents, Withholds Necessities
False Info: False Promises
Categorizing the forms of control allows us to better visualize the trends in use of control than if we were to look at each form of control separately.
The “ht” data frame is used to create a new temporary data frame with the year of case registration and each specific means of control, and the “meansOfControl” prefix is removed from the column names. The data frame is then mutated to add a column for each of the above categories, holding the sum of the values of the means of control in that category. Finally, the data frame is grouped by year and each category is summed for every year.
moc_year <- ht %>% select(yearOfRegistration, starts_with("means") & !ends_with("Concatenated") & !ends_with("NotSpecified")& !ends_with("Other"))
cols <- colnames(moc_year)
cols <- sub("^meansOfControl", "", cols)  # drop the "meansOfControl" prefix
cols[1] <- "Year"
colnames(moc_year) <- cols
moc_year <- moc_year %>% mutate(Money = DebtBondage + RestrictsFinancialAccess + TakesEarnings,
Abuse = PhysicalAbuse + PsychologicalAbuse + SexualAbuse,
Physical_Dependency = PsychoactiveSubstances + RestrictsMovement + RestrictsMedicalCare,
Labour = ExcessiveWorkingHours,
Blackmail = Threats + UsesChildren + ThreatOfLawEnforcement + WithholdsDocuments + WithholdsNecessities,
False_Info = FalsePromises )
moc_year <- moc_year %>% group_by(Year) %>% dplyr::summarise( Money = sum(Money, na.rm=TRUE),
Abuse = sum(Abuse, na.rm=TRUE),
Physical_Dependency = sum(Physical_Dependency, na.rm=TRUE),
Labour = sum(Labour, na.rm=TRUE),
Blackmail = sum(Blackmail, na.rm=TRUE),
False_Info = sum(False_Info, na.rm=TRUE))
knitr::kable(moc_year)
| Year | Money | Abuse | Physical_Dependency | Labour | Blackmail | False_Info |
|---|---|---|---|---|---|---|
| 2002 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2003 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2004 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2005 | 0 | 3 | 0 | 0 | 0 | 0 |
| 2006 | 0 | 0 | 0 | 1 | 0 | 1 |
| 2007 | 0 | 9 | 2 | 0 | 0 | 4 |
| 2008 | 0 | 174 | 29 | 102 | 0 | 122 |
| 2009 | 0 | 403 | 70 | 154 | 0 | 184 |
| 2010 | 0 | 168 | 48 | 174 | 0 | 201 |
| 2011 | 0 | 3 | 2 | 4 | 0 | 4 |
| 2012 | 0 | 12 | 0 | 77 | 0 | 71 |
| 2013 | 0 | 6 | 7 | 197 | 0 | 176 |
| 2014 | 0 | 0 | 4 | 90 | 0 | 177 |
| 2015 | 6 | 105 | 202 | 552 | 0 | 596 |
| 2016 | 15 | 177 | 528 | 476 | 0 | 597 |
| 2017 | 18 | 221 | 656 | 253 | 0 | 527 |
| 2018 | 0 | 378 | 734 | 77 | 3 | 96 |
| 2019 | 0 | 0 | 0 | 0 | 0 | 0 |
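When category columns are built by adding indicator columns with `+`, a single NA among the members makes the whole sum NA, so that case contributes nothing to the later `sum(..., na.rm = TRUE)`. A minimal base-R sketch on hypothetical toy data (only two of the six categories, prefix already removed) contrasts this with `rowSums(..., na.rm = TRUE)`. Whether a missing indicator should count as 0 is an analysis choice, not something the data set settles.

```r
# Hypothetical toy data with the "meansOfControl" prefix already removed
toy <- data.frame(
  Year = c(2017, 2017, 2018),
  DebtBondage = c(NA, 0, 1),
  TakesEarnings = c(1, 1, 0),
  Threats = c(1, 0, 1),
  WithholdsDocuments = c(NA, NA, 0)
)

# Two of the six categories, written as category -> member columns
categories <- list(
  Money = c("DebtBondage", "TakesEarnings"),
  Blackmail = c("Threats", "WithholdsDocuments")
)

# `+` returns NA as soon as one member is NA, so case 1 drops out of 2017
money_plus <- toy$DebtBondage + toy$TakesEarnings  # c(NA, 1, 1)
sum(money_plus[toy$Year == 2017], na.rm = TRUE)    # 1

# rowSums(na.rm = TRUE) counts the members that are present, then rowsum()
# totals each category per year
per_case <- sapply(categories, function(cols) rowSums(toy[cols], na.rm = TRUE))
per_year <- rowsum(per_case, group = toy$Year)
per_year
```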
plot(moc_year$Year, moc_year$Money, type = "l", col = "red", ylim = c(0, 800),
main = "Means of Control by Category", xlab = "Year of Registration", ylab = "Count")
lines(moc_year$Year, moc_year$Abuse, col = "green")
lines(moc_year$Year, moc_year$Physical_Dependency, col = "yellow")
lines(moc_year$Year, moc_year$Labour, col = "blue")
lines(moc_year$Year, moc_year$Blackmail, col = "pink")
lines(moc_year$Year, moc_year$False_Info, col = "violet")
legend(x = "topleft",
legend = c("Money", "Abuse", "Physical_Dependency", "Labour", "Blackmail", "False_Info"), # legend labels
col = c("red", "green", "yellow", "blue", "pink", "violet"), # same colors as the lines above
lwd = 2)
Forms of control relating to physical dependency have the highest counts in the most recent years (2017 and 2018), while false information has the highest total across all years and led in 2010 and from 2014 to 2016.
4. What age groups are exploited the most in some of the types of labor?
For the purposes of this question, we focus on the five most common specific types of labor.
labour <- data.frame(select(ht, starts_with("typeOfLabour") & !ends_with("Concatenated")
& !ends_with("Other") & !ends_with("NotSpecified")))
knitr::kable(slice(data.frame(type = colnames(labour),
count = colSums(labour, na.rm = TRUE)) %>% arrange(desc(count)), 1:5))
| type | count |
|---|---|
| typeOfLabourDomesticWork | 2744 |
| typeOfLabourConstruction | 1254 |
| typeOfLabourManufacturing | 453 |
| typeOfLabourAgriculture | 152 |
| typeOfLabourBegging | 149 |
To see the age groups most affected by these five types of labor, a temporary data frame is created for each type of labor by selecting the age group column and the respective labor column. Only cases flagged with that type of labor are kept, and rows with NA in the age group column are filtered out. The rows are then grouped by age group, counted, and arranged in descending order of count. A final data frame is created containing the top age group for each type of labor.
# Domestic Work
dw <- ht %>% select(typeOfLabourDomesticWork, ageBroad) %>%
filter(typeOfLabourDomesticWork == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>%
dplyr::summarise(Total = n()) %>% arrange(desc(Total))
# Construction (named "con" so it does not shadow base R's c())
con <- ht %>% select(typeOfLabourConstruction, ageBroad) %>%
filter(typeOfLabourConstruction == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>%
dplyr::summarise(Total = n()) %>% arrange(desc(Total))
# Manufacturing
m <- ht %>% select(typeOfLabourManufacturing, ageBroad) %>%
filter(typeOfLabourManufacturing == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>%
dplyr::summarise(Total = n()) %>% arrange(desc(Total))
# Agriculture
a <- ht %>% select(typeOfLabourAgriculture, ageBroad) %>%
filter(typeOfLabourAgriculture == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>%
dplyr::summarise(Total = n()) %>% arrange(desc(Total))
# Begging
b <- ht %>% select(typeOfLabourBegging, ageBroad) %>%
filter(typeOfLabourBegging == 1 & !is.na(ageBroad)) %>% group_by(ageBroad) %>%
dplyr::summarise(Total = n()) %>% arrange(desc(Total))
# Top age group for each type of labor, in the same order as the labels
labour_type_age_broad <- data.frame(
Labour_Type = c("Domestic Work", "Construction", "Manufacturing", "Agriculture", "Begging"))
labour_type <- bind_rows(head(dw, 1), head(con, 1), head(m, 1), head(a, 1), head(b, 1))
knitr::kable(bind_cols(labour_type_age_broad, labour_type))
| Labour_Type | ageBroad | Total |
|---|---|---|
| Domestic Work | 30–38 | 223 |
| Construction | 30–38 | 580 |
| Manufacturing | 30–38 | 130 |
| Agriculture | 30–38 | 69 |
| Begging | 9–17 | 102 |
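The five near-identical pipelines above follow one pattern, so they could be generated from a list of labor columns. The sketch below is base R on hypothetical toy data; `top_age_group()` is a made-up helper and the age labels are invented. When two age groups tie, `which.max()` keeps the first one in sorted label order.

```r
# Hypothetical toy data using the report's column names; age labels are made up
toy <- data.frame(
  ageBroad = c("9-17", "30-38", "30-38", NA, "9-17", "9-17"),
  typeOfLabourDomesticWork = c(1, 1, 1, 1, 0, NA),
  typeOfLabourBegging = c(1, 0, 0, 0, 1, 1)
)

# Hypothetical helper: for each labor column, count cases flagged 1 that have a
# known age group and keep the most frequent age group with its count
top_age_group <- function(data, labour_cols) {
  rows <- lapply(labour_cols, function(col) {
    keep <- !is.na(data[[col]]) & data[[col]] == 1 & !is.na(data$ageBroad)
    counts <- table(data$ageBroad[keep])
    if (length(counts) == 0) return(NULL)  # no usable cases for this labor type
    data.frame(Labour_Type = col,
               ageBroad = names(counts)[which.max(counts)],
               Total = max(counts))
  })
  do.call(rbind, rows)  # NULL entries are skipped by rbind
}

res <- top_age_group(toy, c("typeOfLabourDomesticWork", "typeOfLabourBegging"))
res
```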