Guess Who Is the Lucky Dog?

2011-2016 H-1B Petition Analysis

by Alexa Chenyang Li

Synopsis

H-1B is an employment-based, non-immigrant visa category for temporary foreign workers in the United States. For a foreign national to apply for an H-1B visa, a U.S. employer must offer a job and file an H-1B petition with the U.S. immigration authorities. It is the most common visa status applied for and held by international students once they complete college or higher education (master's, PhD) and work in a full-time position.

As an international student with a STEM major who wants to secure a job in the U.S., I find that asking for H-1B sponsorship is an indispensable step. It is therefore important for international students to understand what is happening with H-1B, especially now that the petition situation is fluctuating.

As a business analytics student whose future job title will probably contain the word “analyst”, I expect that learning how H-1B petitions for analysts have changed will be very useful for job hunting. In this project I focus mainly on descriptive analysis: which worksites have the most H-1B petitions and certified H-1B petitions, which employers file the most H-1B petitions, which kinds of analyst receive the most petitions and certified petitions, and what share of general and certified H-1B petitions are for full-time versus part-time jobs. I also analyze the H-1B petition situation by year and study the overall trend from 2011 to 2016.







Packages Required

library(stringr)  
library(tidyverse) 
library(DT)  
library(doBy) 
library(plyr)  
library(psych)
library(Hmisc) 
library(userfriendlyscience)
library(DataCombine)
library(plotly)

stringr
Simple, Consistent Wrappers for Common String Operations.

tidyverse
Allows data manipulation and works with other packages.

DT
Display the data on the screen in a scrollable format.

doBy
Facilities for groupwise computations of summary statistics.

plyr
Tools for splitting, applying and combining Data.

psych
General purpose toolbox for personality, psychometric theory and experimental psychology.

Hmisc
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power.

userfriendlyscience
Makes R more accessible by adding functions that behave roughly like their SPSS equivalents.

DataCombine
Tools for combining and cleaning data sets, particularly with grouped and time series data.

Data Preparation

Source: H-1B Visa Petitions 2011-2016
First, I used read.csv() to read the CSV data. I imported the original data set and named it h1b_raw.

getwd()
setwd("C:/Users/Alexa Li/Desktop/R_Class")
h1b_raw<-read.csv("C:/Users/Alexa Li/Desktop/R_Class/h1b_kaggle.csv")

Then, I cleaned the missing values in h1b_raw and named the new dataset h1b.

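str(h1b_raw)
# Count the rows that contain at least one missing value
sum(!complete.cases(h1b_raw))
# Drop rows whose CASE_STATUS is the literal string "N/A"
h1b_raw[!h1b_raw$CASE_STATUS == "N/A", ]->h1b
str(h1b)
# Drop any remaining rows with missing values
h1b <- na.omit(h1b)
# Drop a handful of individual rows by position
h1b <- h1b[-c(19797, 74904, 185702, 187388, 188030, 277184, 393684, 396132, 438355, 
              506707, 721439, 874759, 2051359),]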
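
As a rough sketch of the same cleaning idea, the literal "N/A" strings can be turned into missing values at read time with the na.strings argument of read.csv(), after which na.omit() drops every incomplete row. The inline CSV text and the toy_raw and toy_clean names are made-up stand-ins for illustration, not rows from h1b_kaggle.csv.

```r
# Hypothetical three-column extract standing in for the real CSV file
csv_text <- 'X,CASE_STATUS,PREVAILING_WAGE
1,CERTIFIED,60000
2,N/A,55000
3,DENIED,NA
4,CERTIFIED,72000'

# na.strings converts the literal text "N/A" (and "NA") into missing values
toy_raw <- read.csv(text = csv_text, na.strings = c("N/A", "NA"),
                    stringsAsFactors = FALSE)

# na.omit() removes every row that has at least one missing value
toy_clean <- na.omit(toy_raw)
print(toy_clean)
```

Filtering by condition in this way avoids depending on fixed row positions, which shift whenever the input file changes.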





I created several new data sets from h1b, each for a different research question. For example, to study which kind of analyst receives the most H-1B petitions, I kept only the job titles that contain “ANALYST”, counted how often each title occurs, and sorted the counts in descending order. To study the worksites with the most H-1B petitions, I kept only WORKSITE. The original data set contains 11 variables; in some of the new data sets I added the variable Freq to hold the frequency of a given element.
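
The filter, count and sort steps described above can be sketched in base R as follows. The toy job titles and the names toy, analyst_toy and title_counts are illustrative assumptions; the analysis itself applies ddply() to the full data set.

```r
# Hypothetical job titles standing in for h1b$JOB_TITLE
toy <- data.frame(
  JOB_TITLE = c("PROGRAMMER ANALYST", "BUSINESS ANALYST", "SOFTWARE ENGINEER",
                "PROGRAMMER ANALYST", "DATA ANALYST", "BUSINESS ANALYST",
                "PROGRAMMER ANALYST"),
  stringsAsFactors = FALSE
)

# Keep only the titles that contain the word "ANALYST"
analyst_toy <- toy[grepl("ANALYST", toy$JOB_TITLE, fixed = TRUE), , drop = FALSE]

# Count each title; table() produces the counts, as.data.frame() adds a Freq column
title_counts <- as.data.frame(table(JOB_TITLE = analyst_toy$JOB_TITLE),
                              stringsAsFactors = FALSE)

# Sort from most to least frequent
title_counts <- title_counts[order(title_counts$Freq, decreasing = TRUE), ]
print(title_counts)
```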

Data Dictionary

[1] "X": int  Number of each case

[2] "CASE_STATUS":  Factor  Status associated with the last significant event or decision. Valid values include "Certified," "Certified-Withdrawn," "Denied," and "Withdrawn"

[3] "EMPLOYER_NAME":  Factor  Name of employer submitting labor condition application  

[4] "SOC_NAME":  Factor  Occupational name associated with the SOC_CODE. SOC_CODE is the occupational code associated with the job being requested for temporary labor condition, as classified by the Standard Occupational Classification (SOC) System

[5] "JOB_TITLE":  Factor  Title of the job    

[6] "FULL_TIME_POSITION":  Factor  Y = Full Time Position; N = Part Time Position 

[7] "PREVAILING_WAGE":  num  Prevailing Wage for the job being requested for temporary labor condition. The wage is listed at annual scale in USD. The prevailing wage for a job position is defined as the average wage paid to similarly employed workers in the requested occupation in the area of intended employment. The prevailing wage is based on the employer's minimum requirements for the position

[8] "YEAR":  int  Year in which the H-1B visa petition was filed

[9] "WORKSITE":  Factor  City and State information of the foreign worker's intended area of employment

[10] "lon":  num  Longitude of the worksite               

[11] "lat":  num  Latitude of the worksite

[12] "Freq":  num  Occurrence of a given element

Exploratory Data Analysis

Which kind of analyst receives the most H-1B petitions?


# Keep only petitions whose job title contains "ANALYST"
h1b%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%
  filter(str_detect(JOB_TITLE, "ANALYST"))->analyst
orderBy(~JOB_TITLE, analyst)->analyst
# Count petitions per job title and sort from most to least frequent
ddply(analyst, .(JOB_TITLE), nrow)->counts_title_freq
names(counts_title_freq) <- c("JOB_TITLE","Freq")
counts_title_freq[order(counts_title_freq$Freq, decreasing = TRUE), ]->counts_sort_title
head(counts_sort_title, n=30)->title_thir
# Bar chart of the 30 most frequent analyst titles
title_tirty<-ggplot(data=title_thir, aes(JOB_TITLE,Freq, fill=Freq))+
  geom_bar(stat="identity")+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.text.y = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.title.x = element_blank())+
  labs(x=" ",y=" ")
ggplotly(title_tirty, width = 1000, height = 600)





Which kind of analyst receives the most CERTIFIED H-1B petitions?


# Keep only the certified petitions among the analyst petitions
analyst%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%
  filter(CASE_STATUS == "CERTIFIED")->certified
certified<-orderBy(~JOB_TITLE, certified)
# Count certified petitions per job title and sort from most to least frequent
ddply(certified, .(JOB_TITLE), nrow)->counts_certified_analyst
names(counts_certified_analyst) <- c("JOB_TITLE","Freq")
counts_certified_analyst[order(counts_certified_analyst$Freq, decreasing = TRUE), ]->counts_sort_analyst
head(counts_sort_analyst, n=30)->title_cer_thir
# Bar chart of the 30 most frequent certified analyst titles
title_cer_thirty<-ggplot(data=title_cer_thir,aes(JOB_TITLE,Freq,fill=Freq))+
  geom_bar(stat="identity")+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.text.y = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.title.x = element_blank())+
  labs(x=" ",y=" ")
ggplotly(title_cer_thirty, width = 1000, height=600)




Which worksite has the most H-1B petitions?


# Count petitions per worksite and sort from most to least frequent
ddply(h1b, .(WORKSITE), nrow)->counts_worksite
names(counts_worksite) <- c("WORKSITE","Freq")
counts_worksite[order(counts_worksite$Freq, decreasing = TRUE), ]->counts_sort_worksite
head(counts_sort_worksite, n=30)->worksite_thir
# Dot plot of the 30 worksites with the most petitions
worksite_thir_plot<-ggplot(data=worksite_thir, aes(WORKSITE,Freq,color=Freq))+
  geom_point(size=6)+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.text.y = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.title.x = element_blank())+
  labs(x=" ",y=" ")
ggplotly(worksite_thir_plot, width=1000, height=600)





In general, which employers send the highest number of H-1B petitions?


# Count petitions per employer and sort from most to least frequent
h1b%>%
  select(EMPLOYER_NAME)->em.data
ddply(em.data, .(EMPLOYER_NAME), nrow)->num_em
colnames(num_em) <- c("EMPLOYER_NAME", "Freq")
num_em[order(num_em$Freq, decreasing = TRUE), ]->sort_num_em
head(sort_num_em, 50L)->sort_num_em_fifty
# Bar chart of the 50 employers with the most petitions
sort_num_em_fifty_plot<-ggplot(data = sort_num_em_fifty, aes(x=EMPLOYER_NAME,y=Freq, fill=Freq)) + 
  geom_bar(stat = "identity")+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.text.y = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.title.x = element_blank())+
  labs(x=" ",y=" ")
ggplotly(sort_num_em_fifty_plot, width = 1100,height=600)





This graph shows the distribution of H-1B petitions among the 50 companies that file the most applications. INFOSYS LIMITED not only files the most H-1B petitions but also has branches in many U.S. cities, which may indicate that the company operates on a very large scale and has a great many foreign employees.

# Count petitions per employer and worksite pair, then keep the 50 largest pairs
h1b%>%
  select(EMPLOYER_NAME,WORKSITE)->EM_WO
ddply(EM_WO, .(EMPLOYER_NAME, WORKSITE), nrow)->counts_employer
colnames(counts_employer) <- c("EMPLOYER_NAME", "WORKSITE","Freq")
counts_employer[order(counts_employer$Freq, decreasing = TRUE), ]->counts_sort_employer
head(counts_sort_employer, 50L)->counts_sort_em_fifty
# Tile plot showing which worksites each of these employers files for
worksite_distribution<-ggplot(data = counts_sort_em_fifty, aes(x=EMPLOYER_NAME,y=WORKSITE, fill=WORKSITE)) + 
  geom_tile()+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.text.y = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.title.x = element_blank())+
  labs(x=" ",y=" ")
ggplotly(worksite_distribution, width = 1100, height=680)
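
One way to check the impression that an employer files for many cities is to count the distinct worksites per employer. The following is only a minimal sketch: the employer names, the city values and the variable names are hypothetical and are not taken from the real data.

```r
# Hypothetical employer and worksite pairs standing in for h1b
toy <- data.frame(
  EMPLOYER_NAME = c("EMPLOYER A", "EMPLOYER A", "EMPLOYER A",
                    "EMPLOYER B", "EMPLOYER B", "EMPLOYER C"),
  WORKSITE = c("NEW YORK, NEW YORK", "DALLAS, TEXAS", "NEW YORK, NEW YORK",
               "NEW YORK, NEW YORK", "NEW YORK, NEW YORK", "BOSTON, MASSACHUSETTS"),
  stringsAsFactors = FALSE
)

# Split the worksites by employer and count the unique values in each group
sites_per_employer <- sapply(split(toy$WORKSITE, toy$EMPLOYER_NAME),
                             function(w) length(unique(w)))

# Employers with the widest geographic spread come first
sites_per_employer <- sort(sites_per_employer, decreasing = TRUE)
print(sites_per_employer)
```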



By worksite, which employers send the highest number of H-1B applications?


# Dot plot of petition counts for the 50 largest employer and worksite pairs
# (color, not fill, controls the appearance of the default point shape)
employer_different_site<-ggplot(data=counts_sort_em_fifty, aes(x=EMPLOYER_NAME, y=Freq, color=EMPLOYER_NAME))+
  geom_point(size=6)+
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.text.y = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.title.x = element_blank())+
  labs(x=" ",y=" ")
ggplotly(employer_different_site, width = 1100, height=700)





The proportion of full-time and part-time jobs in GENERAL H-1B petitions


Full Time Position

#Full Time Position#
sum(str_count(h1b$FULL_TIME_POSITION, "Y"))/nrow(h1b)
## [1] 0.8582076


Part Time Position

#Part Time Position#
sum(str_count(h1b$FULL_TIME_POSITION, "N"))/nrow(h1b)
## [1] 0.1417924


The proportion of full-time and part-time jobs in CERTIFIED H-1B petitions



Full Time Position

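#Full Time Position#
# Note: `certified` was built from the analyst subset, so these shares cover certified analyst petitions only
sum(str_count(certified$FULL_TIME_POSITION, "Y"))/nrow(certified)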
## [1] 0.8267968


Part Time Position

#Part Time Position#
sum(str_count(certified$FULL_TIME_POSITION, "N"))/nrow(certified)
## [1] 0.1732032
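
The shares above describe how petitions split between full-time and part-time positions. A certification rate per position type is a different quantity: the fraction of petitions of each type that end up certified. The sketch below shows one way it could be computed, using made-up rows and hypothetical names (toy, is_certified, rate_by_type); none of the numbers come from the real data.

```r
# Hypothetical petitions standing in for h1b
toy <- data.frame(
  FULL_TIME_POSITION = c("Y", "Y", "Y", "Y", "N", "N"),
  CASE_STATUS = c("CERTIFIED", "CERTIFIED", "CERTIFIED", "DENIED",
                  "CERTIFIED", "DENIED"),
  stringsAsFactors = FALSE
)

# Share of all petitions that are full-time (same idea as the str_count() ratio)
share_full_time <- mean(toy$FULL_TIME_POSITION == "Y")

# Certification rate within each position type: the mean of a logical is a proportion
toy$is_certified <- toy$CASE_STATUS == "CERTIFIED"
rate_by_type <- aggregate(is_certified ~ FULL_TIME_POSITION, data = toy, FUN = mean)
names(rate_by_type) <- c("FULL_TIME_POSITION", "certified_rate")
print(rate_by_type)
```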


H1b Petition Situation and Trend Analysis from 2011 to 2016

2016 Petition


h1b%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(YEAR == "2016")->sixteen
sixteen%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(CASE_STATUS == "CERTIFIED")->sixteen_certified
Certified Rate in 2016


nrow(sixteen_certified)/nrow(sixteen)->CertifiedRate2016
print(CertifiedRate2016)
## [1] 0.8800812
Descriptive Statistics for Prevailing Wage in 2016
General Petition
descriptives(sixteen$PREVAILING_WAGE)
## ###### Descriptives for sixteen$PREVAILING_WAGE 
## 
## Describing the central tendency:
##   mean median mode         95% CI mean
##  89016  68411   NA [85184.6; 92847.69]
## 
## Describing the spread:
##        var      sd   iqr   se
##  2.405e+12 1550796 27664 1955
## 
## Describing the range:
##  min    q1    q3       max
##    0 57512 85176 329139200
## 
## Describing the distribution shape:
##  skewness kurtosis     dip
##     117.1    15685 0.00653
## 
## Describing the sample size:
##   total NA.  valid
##  629301   0 629301
Certified Petition
descriptives(sixteen_certified$PREVAILING_WAGE)
## ###### Descriptives for sixteen_certified$PREVAILING_WAGE 
## 
## Describing the central tendency:
##   mean median mode          95% CI mean
##  74297  68557   NA [73967.71; 74626.38]
## 
## Describing the spread:
##        var     sd   iqr  se
##  1.564e+10 125048 27123 168
## 
## Describing the range:
##    min    q1    q3      max
##  15080 58053 85197 91199680
## 
## Describing the distribution shape:
##  skewness kurtosis     dip
##     698.7   509180 0.00688
## 
## Describing the sample size:
##   total NA.  valid
##  553836   0 553836


2015 Petition


h1b%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(YEAR == "2015")->fifteen
fifteen%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(CASE_STATUS == "CERTIFIED")->fifteen_certified
Certified Rate in 2015


nrow(fifteen_certified)/nrow(fifteen)->CertifiedRate2015
print(CertifiedRate2015)
## [1] 0.8858145
Descriptive Statistics for Prevailing Wage in 2015
General Petition
descriptives(fifteen$PREVAILING_WAGE)
## ###### Descriptives for fifteen$PREVAILING_WAGE 
## 
## Describing the central tendency:
##   mean median mode          95% CI mean
##  91661  66498   NA [87157.54; 96164.56]
## 
## Describing the spread:
##        var      sd   iqr   se
##  3.168e+12 1780005 25834 2298
## 
## Describing the range:
##  min    q1    q3       max
##    0 56160 82002 306741760
## 
## Describing the distribution shape:
##  skewness kurtosis      dip
##     101.7    11633 0.005736
## 
## Describing the sample size:
##   total NA.  valid
##  600120   0 600120
Certified Petition
descriptives(fifteen_certified$PREVAILING_WAGE)
## ###### Descriptives for fifteen_certified$PREVAILING_WAGE 
## 
## Describing the central tendency:
##   mean median mode          95% CI mean
##  72568  66560   NA [71678.32; 73458.13]
## 
## Describing the spread:
##        var     sd   iqr  se
##  1.096e+11 331042 25438 454
## 
## Describing the range:
##    min    q1    q3       max
##  15080 56430 81869 181232480
## 
## Describing the distribution shape:
##  skewness kurtosis     dip
##     456.5   220155 0.00601
## 
## Describing the sample size:
##   total NA.  valid
##  531595   0 531595


2014 Petition


h1b%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(YEAR == "2014")->fourteen
fourteen%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(CASE_STATUS == "CERTIFIED")->fourteen_certified
Certified Rate in 2014


nrow(fourteen_certified)/nrow(fourteen)->CertifiedRate2014
print(CertifiedRate2014)
## [1] 0.8774288
Descriptive Statistics for Prevailing Wage in 2014
General Petition
descriptives(fourteen$PREVAILING_WAGE)
## ###### Descriptives for fourteen$PREVAILING_WAGE 
## 
## Describing the central tendency:
##    mean median mode            95% CI mean
##  181762  65166   NA [170193.63; 193330.03]
## 
## Describing the spread:
##        var      sd   iqr   se
##  1.735e+13 4165279 26228 5902
## 
## Describing the range:
##  min    q1    q3       max
##    0 55162 81390 820132347
## 
## Describing the distribution shape:
##  skewness kurtosis      dip
##     55.79     5525 0.005368
## 
## Describing the sample size:
##   total NA.  valid
##  498029   0 498029
Certified Petition
descriptives(fourteen_certified$PREVAILING_WAGE)
## ###### Descriptives for fourteen_certified$PREVAILING_WAGE 
## 
## Describing the central tendency:
##   mean median mode          95% CI mean
##  70637  65250   NA [70563.93; 70709.31]
## 
## Describing the spread:
##        var    sd   iqr    se
##  601091548 24517 25480 37.09
## 
## Describing the range:
##    min    q1    q3     max
##  14581 55515 80995 1512072
## 
## Describing the distribution shape:
##  skewness kurtosis      dip
##     2.407     38.4 0.005701
## 
## Describing the sample size:
##   total NA.  valid
##  436985   0 436985


2013 Petition


h1b%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(YEAR == "2013")->thirteen
thirteen%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(CASE_STATUS == "CERTIFIED")->thirteen_certified
Certified Rate in 2013


nrow(thirteen_certified)/nrow(thirteen)->CertifiedRate2013
print(CertifiedRate2013)
## [1] 0.8673309
Descriptive Statistics for Prevailing Wage in 2013
General Petition
descriptives(thirteen$PREVAILING_WAGE)
## ###### Descriptives for thirteen$PREVAILING_WAGE 
## 
## Describing the central tendency:
##    mean median mode            95% CI mean
##  194010  64085   NA [159254.01; 228765.16]
## 
## Describing the spread:
##        var       sd   iqr    se
##  1.328e+14 11525267 26852 17733
## 
## Describing the range:
##   min    q1    q3       max
##  7799 53914 80766 6.998e+09
## 
## Describing the distribution shape:
##  skewness kurtosis      dip
##     532.1   321726 0.006304
## 
## Describing the sample size:
##   total NA.  valid
##  422427   0 422427
Certified Petition
descriptives(thirteen_certified$PREVAILING_WAGE)
## ###### Descriptives for thirteen_certified$PREVAILING_WAGE 
## 
## Describing the central tendency:
##   mean median mode          95% CI mean
##  71863  64189   NA [70127.68; 73597.63]
## 
## Describing the spread:
##        var     sd   iqr    se
##  2.871e+11 535812 26517 885.2
## 
## Describing the range:
##    min    q1    q3       max
##  15080 54184 80702 169507520
## 
## Describing the distribution shape:
##  skewness kurtosis      dip
##     246.8    63764 0.006684
## 
## Describing the sample size:
##   total NA.  valid
##  366384   0 366384


2012 Petition


h1b%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(YEAR == "2012")->twelve
twelve%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(CASE_STATUS == "CERTIFIED")->twelve_certified
Certified Rate in 2012


nrow(twelve_certified)/nrow(twelve)->CertifiedRate2012
print(CertifiedRate2012)
## [1] 0.8513904
Descriptive Statistics for Prevailing Wage in 2012
General Petition
descriptives(twelve$PREVAILING_WAGE)
## ###### Descriptives for twelve$PREVAILING_WAGE 
## 
## Describing the central tendency:
##    mean median mode            95% CI mean
##  176107  62546   NA [163884.67; 188328.44]
## 
## Describing the spread:
##        var      sd   iqr   se
##  1.533e+13 3915475 26541 6236
## 
## Describing the range:
##    min    q1    q3       max
##  15.16 51854 78458 378343680
## 
## Describing the distribution shape:
##  skewness kurtosis     dip
##      41.5     2003 0.00486
## 
## Describing the sample size:
##   total NA.  valid
##  394268   0 394268
Certified Petition
descriptives(twelve_certified$PREVAILING_WAGE)
## ###### Descriptives for twelve_certified$PREVAILING_WAGE 
## 
## Describing the central tendency:
##   mean median mode         95% CI mean
##  71001  62691   NA [68628.6; 73373.44]
## 
## Describing the spread:
##        var     sd   iqr   se
##  4.918e+11 701297 26082 1210
## 
## Describing the range:
##    min    q1    q3       max
##  10504 52395 78478 207277824
## 
## Describing the distribution shape:
##  skewness kurtosis      dip
##       222    51965 0.005121
## 
## Describing the sample size:
##   total NA.  valid
##  335676   0 335676


2011 Petition


h1b%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(YEAR == "2011")->eleven
eleven%>%
  select(X,CASE_STATUS, EMPLOYER_NAME,SOC_NAME,JOB_TITLE,FULL_TIME_POSITION,PREVAILING_WAGE,YEAR,WORKSITE,lon,lat)%>%  
  filter(CASE_STATUS == "CERTIFIED")->eleven_certified
Certified Rate in 2011


nrow(eleven_certified)/nrow(eleven)->CertifiedRate2011
print(CertifiedRate2011)
## [1] 0.8621596
Descriptive Statistics for Prevailing Wage in 2011
General Petition
descriptives(eleven$PREVAILING_WAGE)
## ###### Descriptives for eleven$PREVAILING_WAGE 
## 
## Describing the central tendency:
##    mean median mode            95% CI mean
##  194288  61152   NA [178903.15; 209672.97]
## 
## Describing the spread:
##        var      sd   iqr   se
##  2.056e+13 4533927 27269 7850
## 
## Describing the range:
##  min    q1    q3       max
##    0 50315 77584 1.008e+09
## 
## Describing the distribution shape:
##  skewness kurtosis      dip
##     63.92     8669 0.005833
## 
## Describing the sample size:
##   total NA.  valid
##  333625   0 333625
Certified Petition
descriptives(eleven_certified$PREVAILING_WAGE)
## ###### Descriptives for eleven_certified$PREVAILING_WAGE 
## 
## Describing the central tendency:
##   mean median mode          95% CI mean
##  75415  61464   NA [71151.15; 79678.04]
## 
## Describing the spread:
##        var      sd   iqr   se
##  1.361e+12 1166632 27269 2175
## 
## Describing the range:
##    min    q1    q3      max
##  15070 50565 77896 3.06e+08
## 
## Describing the distribution shape:
##  skewness kurtosis      dip
##     163.6    32560 0.005889
## 
## Describing the sample size:
##   total NA.  valid
##  287638   0 287638


Summary about H1b Certification and Prevailing Wage Trend

Since the mean prevailing wage in each year is strongly influenced by outliers, I chose to study the median prevailing wage and how it has changed.
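
A small worked example shows why the median is the safer summary here. The five base wages are made up for illustration; the single extreme value is the 2016 maximum reported in the descriptives() output.

```r
# Five hypothetical, ordinary-looking annual wages
wages <- c(52000, 58000, 61000, 64000, 70000)

# The same wages plus one extreme entry of the size seen in the 2016 data
wages_with_outlier <- c(wages, 329139200)

# The mean jumps by several orders of magnitude
print(mean(wages))
print(mean(wages_with_outlier))

# The median barely moves
print(median(wages))
print(median(wages_with_outlier))
```

One extreme record is enough to make the mean meaningless as a "typical" wage, while the median shifts only from 61000 to 62500.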

Certification Trend
year<-c("2011","2012","2013","2014","2015","2016")
certified_rate<-c(CertifiedRate2011,CertifiedRate2012,CertifiedRate2013,CertifiedRate2014,CertifiedRate2015,CertifiedRate2016)
certified_year <- data.frame(year, certified_rate)
# color (not fill) controls the appearance of the default point shape
certified_trend<-ggplot(data=certified_year, aes(x=year, y=certified_rate,color=year))+
  geom_point(size=6)+theme(axis.title.x = element_blank())+
  labs(x=" ",y=" ")
ggplotly(certified_trend,width=1000,height = 600)
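
The six per-year blocks repeat the same filter and ratio. As a hedged alternative, the certified rate for every year could be computed in one grouped step. The rows and the names toy, is_certified and certified_year_toy below are hypothetical; only the column names YEAR and CASE_STATUS come from the data dictionary.

```r
# Hypothetical petitions standing in for h1b
toy <- data.frame(
  YEAR = c(2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016),
  CASE_STATUS = c("CERTIFIED", "CERTIFIED", "DENIED", "CERTIFIED",
                  "CERTIFIED", "WITHDRAWN", "CERTIFIED", "CERTIFIED", "CERTIFIED"),
  stringsAsFactors = FALSE
)

# The mean of a logical vector is the proportion of TRUE values,
# so averaging within each year gives that year's certified rate
toy$is_certified <- toy$CASE_STATUS == "CERTIFIED"
certified_year_toy <- aggregate(is_certified ~ YEAR, data = toy, FUN = mean)
names(certified_year_toy) <- c("year", "certified_rate")
print(certified_year_toy)
```
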
Prevailing Wage Trend
YEAR<-c("2016","2016","2015","2015","2014","2014","2013","2013","2012","2012","2011","2011")
# Medians copied from the descriptives() output above; kept numeric so the y-axis is continuous
median_wage<-c(68411,68557,66498,66560,65166,65250,64085,64189,62546,62691,61152,61464)
status<-c("general","certified","general","certified","general","certified","general","certified","general","certified","general","certified")
data.frame(YEAR, median_wage,status)->wage_analysis
Prevailing_Wage_Trend<-ggplot(data = wage_analysis,aes(x=YEAR,y=median_wage,color=status))+
  geom_point(size=6)+
  theme(axis.title.x = element_blank())+
  labs(x=" ",y=" ")
ggplotly(Prevailing_Wage_Trend,width=1000,height = 600)
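
Typing the medians by hand invites transcription slips. A minimal sketch of computing the same table directly from the data is shown below, assuming made-up rows and hypothetical names (toy, general, certified_only, wage_by_year).

```r
# Hypothetical petitions standing in for h1b
toy <- data.frame(
  YEAR = c(2015, 2015, 2015, 2016, 2016, 2016),
  CASE_STATUS = c("CERTIFIED", "DENIED", "CERTIFIED",
                  "CERTIFIED", "CERTIFIED", "DENIED"),
  PREVAILING_WAGE = c(60000, 0, 70000, 65000, 75000, 50000),
  stringsAsFactors = FALSE
)

# Median wage per year over all petitions
general <- aggregate(PREVAILING_WAGE ~ YEAR, data = toy, FUN = median)
general$status <- "general"

# Median wage per year over certified petitions only
certified_only <- aggregate(PREVAILING_WAGE ~ YEAR,
                            data = toy[toy$CASE_STATUS == "CERTIFIED", ],
                            FUN = median)
certified_only$status <- "certified"

# Stack both summaries into one long table ready for plotting
wage_by_year <- rbind(general, certified_only)
names(wage_by_year)[names(wage_by_year) == "PREVAILING_WAGE"] <- "median_wage"
print(wage_by_year)
```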

Summary

  • Job Title
    According to this analysis, programmer analyst has the most H-1B petitions of all analyst titles, which could indicate that demand for international students is highest in programmer analyst positions. I am also glad to see that business analyst ranks third and that the top-ranked analyst titles include other business-analytics-related titles, such as data analyst, financial analyst and market research analyst.

  • Worksite
    In terms of worksite, New York City has the most H-1B petitions. This is interesting because, when I was considering a master's degree in the U.S., many people told me that location can be the most important factor in an international student's job hunt. This analysis suggests that they may be right, so international students at Wake Forest University may want to consider relocating to NYC after graduation. To my surprise, Charlotte is also in the top 30, so international students at Wake Forest University may be able to use this location advantage in their future job hunt.

  • Employer
    The employer that files the most H-1B petitions was not the one I expected. I thought it might be one of the FLAG companies (Facebook, LinkedIn, Amazon/Apple, Google), but it is actually INFOSYS LIMITED, followed by TATA Consultancy Services Limited and WIPRO Limited. I had never heard of these companies, so I did some research and found that all three are Indian information technology services companies and that all three have been reported to have H-1B abuse issues.







Hence the U.S. government should give high priority to H-1B abuse when considering H-1B reform, since the issue not only keeps qualified international students from jobs but also costs Americans jobs, because workers hired through H-1B abuse are usually paid unreasonably low wages.

  • Certified Situation from 2011 to 2016
    Although H-1B abuse keeps some qualified foreign employees from jobs, the chance of being certified is still high. From 2011 to 2016 the certified rate was always above 80%; it rose from about 85% in 2012 to about 89% in 2015 before dipping slightly to 88% in 2016, which may be good news for foreign employees.

  • Full Time VS Part Time
    Comparing full-time and part-time jobs, full-time jobs account for a clearly larger share of both petitions and certified petitions, which suggests that international students should focus on finding full-time jobs if they want to work in the U.S. in the long run.

  • Prevailing Wages
    The lottery system is well known in discussions of H-1B petitions, and it implies that whether a petition gets certified is random. However, the prevailing wages from 2011 to 2016 show that the median wage of certified cases is slightly higher than that of all cases in every year. I also found that the general petitions include cases with a wage of 0 in most years, whereas the certified cases all have minimum wages above $10k. This may indicate that the wage influences the chance of certification and that the system is not completely “lottery-like”.

  • Suggestions

To the U.S. Government:

The U.S. Government should introduce a policy to address H-1B abuse, so that the people who are truly qualified for H-1B can contribute to the development of the U.S. economy, and domestic employees can keep their jobs because they will not be replaced by someone who asks only for the minimum wage.

In addition, once the issue of H-1B abuse is solved, the lottery system may no longer be needed, since the government uses the lottery because the number of people who ask for H-1B far exceeds the number of H-1B visas it plans to issue. By canceling the lottery system, the government may save money and improve efficiency.

To International Students:

  1. When searching for jobs, focus on full-time positions, since they account for far more H-1B petitions than part-time positions do and also make up most of the certified petitions.

  2. International students who plan to seek jobs with the “analyst” title should improve their programming skills, because “programmer analyst” has the most H-1B petitions, which suggests that international students are welcome in that role.

  3. To improve the chance that an H-1B petition is certified, compare the wage of the desired position with the median wage of all H-1B applicants and try to find a position that pays more than that median.

  4. Explore job opportunities in big cities, since these places have more H-1B petitions.

  5. Stay positive, because the certified rate has risen overall since 2012!

  • Limitations
    • Further studies are needed to predict the trend in H-1B petitions
    • The prevailing wage and certified rate for titles containing “analyst” alone still need to be studied
    • There are not enough variables to perform a linear regression analysis