---
title: "US Perm Visa Denied"
author: "CT"
date: "2/26/2020"
output: html_document
---

US Visa Application for Labor Certification Dataset

# https://www.foreignlaborcert.doleta.gov/performancedata.cfm

Background: This dataset contains administrative data from employers' Applications for Permanent Employment Certification (ETA Form 9089) and the certification determinations processed by the Department of Labor's (DOL) Office of Foreign Labor Certification, Employment and Training Administration, where the determination was issued on or after October 1, 2018, and on or before September 30, 2019.
The employer, not the employee, files the application. In general, the DOL works to ensure that the admission of foreign workers to work in the U.S. will not adversely affect the job opportunities, wages and working conditions of U.S. workers. Once a permanent labor certification application has been approved by the DOL, the employer needs to seek immigration authorization from the U.S. Citizenship and Immigration Services (USCIS). DOL processes Applications for Permanent Employment Certification, ETA Form 9089, except for Schedule A and sheepherder applications, which are filed under 20 CFR § 656.16. The date the labor certification application is received by the DOL is known as the filing date and is used by USCIS and the Department of State as the priority date. After the labor certification application is certified by DOL, it is valid for 180 days and should be submitted to the appropriate USCIS Service Center with a Form I-140, Immigrant Petition for Alien Worker.

Purpose: The purpose of this project was to practice data visualization techniques in R as a beginner.

The process: Data cleaning, data subset and variable selection

Since the dataset originally contained 154 columns and over 50,000 observations, a subset was selected for the analysis: visa applications with the status "Denied". This step narrowed the dataset down to close to 25,000 observations. From there, some 29 variables were selected to explore key trends in the denied subset. The following variables were subsequently retained; the ones used to produce the graphs include:
case_number, case_status, employer_name, employer_state, pw_soc_code, pw_soc_title, job_info_education, job_info_major, job_info_alt_field, job_info_experience, job_info_foreign_ed, job_info_job_req_normal, country_of_citizenship, recr_info_professional_occ, foreign_worker_info_education, pw_job_title_9089, ri_coll_tch_basic_process, job_info_foreign_lang_req
##Get the data
##Read the data
## [1] "C:/Users/rande/Documents/Data visualization/Spring2020"
##Loading the libraries
##Creating new data frame for the denied cases only
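library(data.table)  # provides fread() and the df[condition] row filter
df <- fread(filename)  # filename holds the path to the data file
den_df <- df[case_status == "Denied"]  # keep the denied cases only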
##Selection of columns for the new data frame
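As a rough illustration of the column selection step, the sketch below keeps a few of the variables listed earlier from a small made-up data frame. The toy rows, the `unused_column` field and the `keep_cols` and `den_sel` names are assumptions for illustration only; the selection code used in this report is not shown.

```r
library(dplyr)

# Toy stand-in for the denied-cases data frame (values are made up)
den_df <- data.frame(
  case_number = c("A-001", "A-002"),
  case_status = c("Denied", "Denied"),
  employer_state = c("TX", "CALIFORNIA"),
  country_of_citizenship = c("INDIA", "MEXICO"),
  unused_column = c(1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical vector of the columns to retain
keep_cols <- c("case_number", "case_status",
               "employer_state", "country_of_citizenship")

# all_of() raises an error if a listed column is missing,
# which helps catch typos in long variable lists
den_sel <- den_df %>% select(all_of(keep_cols))
```

Keeping the column names in one character vector makes it easy to review or extend the list without touching the pipeline.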
##creating new variable as new_state to deal with the
##issue of having abbreviations and state names spelled out
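One possible way to build a variable like `new_state` is sketched below, using R's built-in `state.name` and `state.abb` vectors. The helper `to_state_abb()` and the toy input are hypothetical, and the approach may differ from the one used in this report.

```r
# Toy input mixing abbreviations and spelled-out names (made-up values)
employer_state <- c("TX", "CALIFORNIA", "New York", "ny",
                    "DISTRICT OF COLUMBIA", NA)

# Hypothetical helper: map a state value to its two-letter abbreviation
to_state_abb <- function(x) {
  x_up <- toupper(trimws(x))
  # Look up spelled-out names in the built-in state.name / state.abb vectors
  idx <- match(x_up, toupper(state.name))
  # Values that are not full state names are kept as they are (upper-cased)
  out <- ifelse(is.na(idx), x_up, state.abb[idx])
  # state.name covers the 50 states only, so DC is handled separately
  out[which(x_up == "DISTRICT OF COLUMBIA")] <- "DC"
  out
}

new_state <- to_state_abb(employer_state)
```

Values that are neither a full state name nor "DISTRICT OF COLUMBIA" pass through unchanged, so territories or misspellings would still need a manual check.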
## # A tibble: 30 x 2
## country_of_citizenship count
## <fct> <int>
## 1 INDIA 8546
## 2 SOUTH KOREA 2382
## 3 MEXICO 1777
## 4 CHINA 1487
## 5 PHILIPPINES 1361
## 6 CANADA 805
## 7 UNITED KINGDOM 360
## 8 PAKISTAN 308
## 9 VENEZUELA 261
## 10 JAPAN 256
## # ... with 20 more rows
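A table with this shape (one row per country, plus a `count` column) could be produced along the following lines. This is a minimal sketch on made-up rows, assuming dplyr's `count()`; the object name `country_counts` is hypothetical.

```r
library(dplyr)

# Toy stand-in for the denied-cases data (made-up rows)
den_df <- data.frame(
  country_of_citizenship = c("INDIA", "INDIA", "MEXICO",
                             "CHINA", "INDIA", "MEXICO")
)

# Count rows per country and sort from most to least frequent
country_counts <- den_df %>%
  count(country_of_citizenship, name = "count", sort = TRUE)
```

Setting `sort = TRUE` puts the most frequent country first, which matches the ordering of the table above.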
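# Fill colours for the plots. R names are case-sensitive: fillColor4 here and
# fillcolor4 (defined further down) are two different variables.
fillColor = "#7BB700"
fillColor4 = "#f10fad"
fillcolor2 = "#FFD505"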
## New dataframe for state where employers are located
## # A tibble: 2 x 2
## refile count
## <fct> <int>
## 1 N 7192
## 2 Y 49
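To put the two refile counts in proportion, the shares can be computed directly from the numbers in the table. The sketch below uses base R's `prop.table()`; the names `refile_counts` and `refile_share` are hypothetical.

```r
# Counts copied from the refile table above
refile_counts <- c(N = 7192, Y = 49)

# Share of each category, in percent, rounded to two decimals
refile_share <- round(100 * prop.table(refile_counts), 2)
```

With these counts, refiled applications make up roughly 0.68% of the rows in the table.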
fillcolor4= "#9B111E"
##Libraries
library(dplyr)
library(plotly)
## # A tibble: 19 x 2
## orig_file_date count
## <chr> <int>
## 1 1996 1
## 2 1997 1
## 3 1998 1
## 4 2001 22
## 5 2002 1
## 6 2003 5
## 7 2004 1
## 8 2005 4
## 9 2006 1
## 10 2007 3
## 11 2008 2
## 12 2009 2
## 13 2010 1
## 14 2011 2
## 15 2012 5
## 16 2013 4
## 17 2014 13
## 18 2015 28
## 19 2016 8
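Since `orig_file_date` is shown as a character column holding years, the counts were presumably made after reducing each date to its year. A minimal sketch of that idea follows, assuming ISO-formatted dates (`YYYY-MM-DD`); the date format in the source file is an assumption, as are the toy rows and the name `year_counts`.

```r
library(dplyr)

# Toy stand-in; the real date format in the file is an assumption here
den_df <- data.frame(
  orig_file_date = c("2015-03-02", "2015-11-20", "2016-01-15", "2001-04-30"),
  stringsAsFactors = FALSE
)

year_counts <- den_df %>%
  # Keep only the four-digit year, stored as character
  mutate(orig_file_date = format(as.Date(orig_file_date), "%Y")) %>%
  # One row per year with the number of cases
  count(orig_file_date, name = "count")
```

If the dates in the file use another layout, the `format` argument of `as.Date()` would need to be set to match it.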