CUNY SPS DATA607 Tidyverse Extend Vignette

The original Tidyverse assignment was done by Coffy. Code chunks that contain extensions or additions are wrapped in comment lines made of five hash marks (#####).

This dataset contains current job postings available on the City of New York’s official jobs site (http://www.nyc.gov/html/careers/html/search/search.shtml). Both internal postings, available to city employees, and external postings, available to the general public, are included.

Load Packages

##### Added 'message=FALSE' in the r chunk to suppress output from loading the libraries
library("tidyverse")
library("reactable")
#####

Data Set

The readr package, part of the tidyverse, provides the function read_csv(), which imports data from a CSV file. The CSV file was downloaded from a Kaggle.com dataset on Data Analysis Jobs in NYC for October 2021. The data is read into R as a data frame (a tibble) named da.

##### Added 'message=FALSE' in the r chunk to suppress the column specification message from read_csv
da <- read_csv("https://github.com/candrewxs/Vignettes/blob/master/dadata/data_analysis_jobs.csv?raw=true")
#####
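
As a complement to the `message=FALSE` chunk option, readr can silence the column specification message itself through the `show_col_types` argument of `read_csv()`. The sketch below assumes readr 2.0 or later and uses a made-up two-row CSV passed inline with `I()`, so it does not touch the real file. The name `toy` and the CSV contents are illustrative only.

```r
library(readr)

# Hypothetical two-row CSV supplied inline via I(), so no file or network access is needed
csv_text <- I("Job ID,Agency,# Of Positions\n1,AGENCY A,2\n2,AGENCY B,1\n")

# show_col_types = FALSE suppresses the column specification message
toy <- read_csv(csv_text, show_col_types = FALSE)

# Non-syntactic column names such as `Job ID` must be quoted with backticks
toy$`Job ID`
```

The same argument could be passed to the `read_csv()` call that creates `da`, which keeps the message hidden even when the code runs outside a knitted document.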

##### Check the head of the data
head(da)
#> # A tibble: 6 x 23
#>   `Job ID` Agency                        `Posting Type` `# Of Positions` `Business Title`
#>      <dbl> <chr>                         <chr>                     <dbl> <chr>           
#> 1   474330 ADMIN TRIALS AND HEARINGS     Internal                      1 Procurement Ana~
#> 2   493105 HRA/DEPT OF SOCIAL SERVICES   Internal                      1 DEPUTY COMMISSI~
#> 3   478953 DEPARTMENT OF TRANSPORTATION  Internal                      2 PRINCIPAL ADMIN~
#> 4   484908 DEPARTMENT OF CORRECTION      External                     22 Program Support~
#> 5   475012 DEPT OF CITYWIDE ADMIN SVCS   Internal                      1 Director of Dat~
#> 6   454719 DEPT OF HEALTH/MENTAL HYGIENE Internal                      1 Quality Managem~
#> # ... with 18 more variables: Civil Service Title <chr>, Title Code No <chr>,
#> #   Level <chr>, Salary Range From <dbl>, Salary Range To <dbl>,
#> #   Salary Frequency <chr>, Work Location <chr>, Division/Work Unit <chr>,
#> #   Job Description <chr>, Minimum Qual Requirements <chr>,
#> #   Preferred Skills <chr>, Additional Information <chr>, To Apply <chr>,
#> #   Hours/Shift <chr>, Work Location 1 <chr>, Recruitment Contact <lgl>,
#> #   Residency Requirement <chr>, Post Until <chr>
#####

Retrieve the names attribute of the da data set with the names() function.

names(da)
#>  [1] "Job ID"                    "Agency"                   
#>  [3] "Posting Type"              "# Of Positions"           
#>  [5] "Business Title"            "Civil Service Title"      
#>  [7] "Title Code No"             "Level"                    
#>  [9] "Salary Range From"         "Salary Range To"          
#> [11] "Salary Frequency"          "Work Location"            
#> [13] "Division/Work Unit"        "Job Description"          
#> [15] "Minimum Qual Requirements" "Preferred Skills"         
#> [17] "Additional Information"    "To Apply"                 
#> [19] "Hours/Shift"               "Work Location 1"          
#> [21] "Recruitment Contact"       "Residency Requirement"    
#> [23] "Post Until"

##### Check the variables and the type of each one using dplyr's glimpse function
glimpse(da)
#> Rows: 851
#> Columns: 23
#> $ `Job ID`                    <dbl> 474330, 493105, 478953, 484908, 475012, 45~
#> $ Agency                      <chr> "ADMIN TRIALS AND HEARINGS", "HRA/DEPT OF ~
#> $ `Posting Type`              <chr> "Internal", "Internal", "Internal", "Exter~
#> $ `# Of Positions`            <dbl> 1, 1, 2, 22, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,~
#> $ `Business Title`            <chr> "Procurement Analyst", "DEPUTY COMMISSIONE~
#> $ `Civil Service Title`       <chr> "PROCUREMENT ANALYST", "ADMINISTRATIVE STA~
#> $ `Title Code No`             <chr> "12158", "10026", "10124", "56058", "1002E~
#> $ Level                       <chr> "2", "M5", "2", "0", "0", "1", "0", "0", "~
#> $ `Salary Range From`         <dbl> 50972.0, 88936.0, 53057.0, 54100.0, 102292~
#> $ `Salary Range To`           <dbl> 58618.0, 169476.0, 77124.0, 62215.0, 10229~
#> $ `Salary Frequency`          <chr> "Annual", "Annual", "Annual", "Annual", "A~
#> $ `Work Location`             <chr> "100 Church St., N.Y.", "4 World Trade Cen~
#> $ `Division/Work Unit`        <chr> "Admin, GC, PI & Exec", "Data Reporting/An~
#> $ `Job Description`           <chr> "The Procurement Analyst, under overall su~
#> $ `Minimum Qual Requirements` <chr> "1. A baccalaureate degree from an accredi~
#> $ `Preferred Skills`          <chr> "Ã¢Â€Â¢\tExperience in the procurement of ~
#> $ `Additional Information`    <chr> NA, "**LOAN FORGIVENESS  The federal gover~
#> $ `To Apply`                  <chr> "Special Note:  Only candidates who are cu~
#> $ `Hours/Shift`               <chr> NA, "Monday-Friday 9 AM-5 PM", "35/M-F/ 9:~
#> $ `Work Location 1`           <chr> NA, NA, "34-02 Queens Boulevard Long Islan~
#> $ `Recruitment Contact`       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
#> $ `Residency Requirement`     <chr> "New York City residency is generally requ~
#> $ `Post Until`                <chr> NA, "8-Nov-21", "26-Oct-21", NA, "6-Dec-21~
#####

Group by Variables

Dataset Question: How many job postings are Internal and External? What are the Levels and how many are available?

Create a new data frame from the existing data frame (da) using two columns, Level and Posting Type. The count() function combines group_by() and a per-group row count into a single call. Rename the resulting column “n” to “Count” and create an interactive data table.

plot1 <- da %>%
   count(Level, `Posting Type`)  # dplyr package: group and count 

colnames(plot1)[3] <- "Count" 
# use function colnames() to rename column 
# and access individual column names with colnames(df)[index]

reactable(plot1) # interactive data table
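
To make the claim about count() concrete, here is a minimal sketch on made-up data (the tibble `toy` is hypothetical, not taken from `da`). It compares the explicit group_by() plus summarise() form with count(), and shows that the `name` argument of count() can set the column name directly, as one alternative to renaming with colnames() afterwards.

```r
library(dplyr)

# Hypothetical toy data standing in for the Level and Posting Type columns of da
toy <- tibble(
  Level = c("0", "0", "1", "1", "1"),
  `Posting Type` = c("Internal", "External", "Internal", "Internal", "External")
)

# Long form: group the rows, then count the rows in each group
long_form <- toy %>%
  group_by(Level, `Posting Type`) %>%
  summarise(Count = n(), .groups = "drop")

# Short form: count() does both steps; name = sets the count column's name
short_form <- toy %>%
  count(Level, `Posting Type`, name = "Count")

long_form
short_form
```

Both objects should hold the same rows and columns, so either form could feed reactable().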

Visualization

##### Add title and theme to the plot
# discrete visualization with ggplot2
p1 <- ggplot(data = plot1) 
  p1 + geom_col(aes(Level, Count, fill = `Posting Type`)) + labs(title = "Bar Chart of Levels") + theme_bw()

##### Plot Barchart for only External Posting Type
# discrete visualization with ggplot2
p1 <-  plot1 |> filter(`Posting Type` == "External") |> ggplot(aes(Level, Count))
  p1 + geom_col(fill = "brown") + labs(title = "Bar Chart of Levels -  External") + theme_bw()

#####

##### Plot Barchart for only Internal Posting Type
# discrete visualization with ggplot2
p1 <-  plot1 |> filter(`Posting Type` == "Internal") |> ggplot(aes(Level, Count))
  p1 + geom_col(fill = "blue") + labs(title = "Bar Chart of Levels - Internal") + theme_bw()

#####
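
The two filtered charts above could also be drawn side by side with facets. This is only a sketch of that alternative, using invented counts in the same shape as `plot1`; the data frame `toy_counts` and the plot title are assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical counts in the same shape as plot1 (Level, Posting Type, Count)
toy_counts <- data.frame(
  Level = c("0", "0", "1", "1"),
  `Posting Type` = c("External", "Internal", "External", "Internal"),
  Count = c(3, 5, 2, 4),
  check.names = FALSE  # keep the space in `Posting Type`
)

# One panel per posting type instead of a separate filtered plot for each
p_facet <- ggplot(toy_counts, aes(Level, Count, fill = `Posting Type`)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(`Posting Type`)) +
  labs(title = "Bar Chart of Levels by Posting Type") +
  theme_bw()

p_facet
```

Faceting keeps both panels on a shared y-axis by default, which makes the Internal and External counts directly comparable.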

Dataset Question: For which Business Title categories does NYC hire data analysis staff? Which Business Title has the highest demand?

Create a new data frame from the existing data frame (da) by extracting one column, Business Title. Convert the column to upper case so that titles differing only in capitalization are counted together, then build a frequency table of the titles with the table() function. Rename the first column of the resulting data frame (pl2) to “Business Title” and create an interactive data table showing the frequencies.

plot2 <- as.data.frame(da[,5])

plot2$`Business Title` <- toupper(plot2$`Business Title`)

pl2 <- as.data.frame(table(plot2))

colnames(pl2)[1] <- "Business Title"

reactable(pl2)

summary(pl2) # calculates some statistics on the data
#>                                   Business Title      Freq       
#>  .NET DEVELOPER                          :  1    Min.   : 1.000  
#>  .NET DEVELOPER ANALYST                  :  1    1st Qu.: 2.000  
#>  .NET PROGRAMMER/ANALYST (TECHNICAL LEAD):  1    Median : 2.000  
#>  .NET/CRM DEVELOPER                      :  1    Mean   : 2.251  
#>  .NET/JAVASCRIPT DEVELOPER               :  1    3rd Qu.: 2.000  
#>  311 ASSISTANT BUSINESS SUPPORT ASSOCIATE:  1    Max.   :19.000  
#>  (Other)                                 :372

Filter Business Titles with a frequency greater than or equal to 10.

pl_2 <- filter(pl2, Freq >= 10) 

pl_2
#>                          Business Title Freq
#> 1 ADMINISTRATIVE HOUSING SUPERINTENDENT   10
#> 2                      BUSINESS ANALYST   19
#> 3        COMPUTER SPECIALIST (SOFTWARE)   16
#> 4                          DATA ANALYST   11
#> 5                       PROJECT MANAGER   14
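
The upper-casing, tabulating, and filtering steps can also be written as a single dplyr pipeline. The sketch below is one possible equivalent, not the original author's method. It runs on a hypothetical six-row tibble, and the threshold of 2 is chosen only to suit that toy data.

```r
library(dplyr)

# Hypothetical titles with inconsistent capitalization
toy <- tibble(
  `Business Title` = c("Data Analyst", "DATA ANALYST", "data analyst",
                       "Project Manager", "PROJECT MANAGER", "Auditor")
)

title_counts <- toy %>%
  mutate(`Business Title` = toupper(`Business Title`)) %>%  # normalize case first
  count(`Business Title`, name = "Freq", sort = TRUE) %>%   # frequency per title, largest first
  filter(Freq >= 2)                                         # threshold of 2 suits the toy data

title_counts
```

On the real data, the same pipeline with `Freq >= 10` would be expected to select the same titles as the table() approach, with the added convenience that `sort = TRUE` orders them by frequency.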

Visualization

##### Add theme, title, and color to the bar chart
p2 <- ggplot(pl_2, aes(`Business Title`, Freq)) 
  p2 + geom_col(fill = "brown") + coord_flip() + 
    labs(title = "Frequency of each Job title") + xlab(NULL) + ylab("Frequency") + theme_bw()

#####

Chinedu Onyeka

2021-11-10
