Section 1 - Overview:

In this document we will explore Salary information for High-Skilled Immigrant Employees on work visas, using data analysis in R. Some of the questions we will analyze include:
  1. Highest Paying Employers, by State.
  2. Salary range by job title.
  3. Number of visas by Year.




Section 2 - Data Preparation:

We start by loading the salary dataset given in salary_data.csv file and preparing the workspace.

Let’s take a peek at the data.

saldata[1:10, "EMPLOYER_NAME"]
##  [1] "ADVANCED TECHNOLOGY GROUP USA, INC."               
##  [2] "SAN FRANCISCO STATE UNIVERSITY"                    
##  [3] "CAROUSEL SCHOOL"                                   
##  [4] "HARLINGEN CONSOLIDATED INDEPENDENT SCHOOL DISTRICT"
##  [5] "SIGNAL SCIENCES CORPORATION"                       
##  [6] "CAPGEMINI U.S. LLC"                                
##  [7] "PURE STORAGE, INC."                                
##  [8] "POLMAK, INC."                                      
##  [9] "GOOGLE INC."                                       
## [10] "STLPORT CONSULTING, INC."


Next we look at the type of visas and the salary ranges in our dataset.

table(saldata$VISA_CLASS)
## 
##  E-3 Australian       greencard            H-1B     H-1B1 Chile 
##            1393           11093          154497             147 
## H-1B1 Singapore 
##             148
summary(saldata$PREVAILING_WAGE_SUBMITTED)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       5   54500   69700   71200   89900 6220000


Number of records by job-types.

table(saldata$JOB_TITLE_SUBGROUP)
## 
##   assistant professor              attorney      business analyst 
##                 18866                  1488                 27811 
##          data analyst        data scientist management consultant 
##                  3840                  1227                   770 
##     software engineer               teacher 
##                 99364                 13912


Next, we will remove columns with more than 70% missing values and create some derived variables. (Code not shown)


We will also create a new Date variable in a standard format.

saldata$case_date = as.Date(saldata$CASE_RECEIVED_DATE, "%m/%d/%Y")
saldata$caseyr = as.numeric(format(saldata$case_date, "%Y"))



Section 3 - Graphical Exploration

  1. Let us view the number of high-skilled visa applications per year.


2. Visualize number of visas by year and job title.




Section 4 - Data analysis

  1. let us aggregate information on the employers with highest number of applications, by State.
    We see California leads our list.
setctstate = sqldf("select count(*) as 'count',
              EMPLOYER_NAME as 'employer',
              WORK_STATE2 as 'state',
              avg(PAID_WAGE_PER_YEAR) as 'salary'
              from activesal
              group by EMPLOYER_NAME, WORK_STATE2")
#top_setctstate = setctstate[with(setctstate, order(state, -count, -salary)),]
top_setctstate = setctstate[with(setctstate, order( -count)),]
head(top_setctstate)
##       count                    employer      state salary
## 11321  4195                 GOOGLE INC. california 126006
## 17765  1112       MICROSOFT CORPORATION washington 121732
## 22106   882 QUALCOMM TECHNOLOGIES, INC. california  94285
## 9691    806              FACEBOOK, INC. california 129203
## 13934   732           INTEL CORPORATION california 101627
## 5226    700          CERNER CORPORATION   missouri  69044


2)Let us see which states offer the highest salary.

##            state  salary
## 1             dc 2400000
## 2       new york 2400000
## 3        alabama 1400000
## 4           ohio  600000
## 5     california  582400
## 6          texas  542400
## 7  massachusetts  528000
## 8         kansas  475000
## 9       new york  428333
## 10        nevada  385000


3) Analyze how software engineers fared! A topic (obviously) close to my heart! We create a dataframe called setstatesal for this analysis.

setstatesal = sqldf("select caseyr as 'year', 
                    count(*) as 'count',
                    JOB_TITLE_SUBGROUP as 'title',
                    WORK_STATE2 as 'state',
                    max(PAID_WAGE_PER_YEAR) as 'maxsal',
                    avg(PAID_WAGE_PER_YEAR) as 'avgsal',
                    min(PAID_WAGE_PER_YEAR) as 'minsal'
                    from activesal
                    group by caseyr, JOB_TITLE_SUBGROUP, WORK_STATE2 ")



Section 5 - Data Visualization

  1. Visualize states with highest avg salary for software engineers, using ggplot.


2) Salary by year and avg_salary.
From graph below, we conclude that Attorneys make the highest salaries, which is not very surprising!
Also, 2012 was the best year (salary-wise) for foreign software engineers.


3) Interactive Map using Leaflet. We create an interactive map to show Top employers in every state.
Click on the blue marker to see info. The numbered circles are clickable links that zoom into the map for a better view.


That concludes our analysis for now. Did you like this tutorial? If so, please don’t forget to share!