Source: Kaggle data - Data Analysis Jobs, Based on NYC Jobs - October 2021
This dataset contains current job postings available on the City of New York’s official jobs site ( http://www.nyc.gov/html/careers/html/search/search.shtml ). Internal postings available to city employees and external postings available to the general public are included.
library("tidyverse")
#> -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
#> v ggplot2 3.3.5 v purrr 0.3.4
#> v tibble 3.1.5 v dplyr 1.0.7
#> v tidyr 1.1.4 v stringr 1.4.0
#> v readr 2.0.2 v forcats 0.5.1
#> -- Conflicts ------------------------------------------ tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
library("reactable")
The readr package, part of the tidyverse, provides the function read_csv(), which imports data from a CSV file. The CSV file was downloaded from the Kaggle.com dataset on Data Analysis Jobs in NYC for October 2021. The imported data is read into R as a data frame named da.
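As a small illustration of how read_csv() handles column types, here is a minimal sketch on a made-up three-column file. The file contents and the object names (tmp, toy) are assumptions for illustration only; they are not part of the NYC jobs data.

```r
library(readr)

# Write a tiny made-up CSV to a temporary file (stand-in for the real data)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Job ID,Agency,Salary Range From",
  "1,DEPT A,50000",
  "2,DEPT B,62000"
), tmp)

# Declaring col_types up front skips type guessing
# and the column-specification message
toy <- read_csv(
  tmp,
  col_types = cols(
    `Job ID` = col_double(),
    Agency = col_character(),
    `Salary Range From` = col_double()
  )
)

names(toy)
```

Column names that contain spaces, such as Job ID, have to be wrapped in backticks whenever they are referenced in code.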
da <- read_csv("https://github.com/candrewxs/Vignettes/blob/master/dadata/data_analysis_jobs.csv?raw=true")
#> Rows: 851 Columns: 23
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (18): Agency, Posting Type, Business Title, Civil Service Title, Title C...
#> dbl (4): Job ID, # Of Positions, Salary Range From, Salary Range To
#> lgl (1): Recruitment Contact
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Retrieve the names attribute of the da data set with the names() function.
names(da)
#> [1] "Job ID" "Agency"
#> [3] "Posting Type" "# Of Positions"
#> [5] "Business Title" "Civil Service Title"
#> [7] "Title Code No" "Level"
#> [9] "Salary Range From" "Salary Range To"
#> [11] "Salary Frequency" "Work Location"
#> [13] "Division/Work Unit" "Job Description"
#> [15] "Minimum Qual Requirements" "Preferred Skills"
#> [17] "Additional Information" "To Apply"
#> [19] "Hours/Shift" "Work Location 1"
#> [21] "Recruitment Contact" "Residency Requirement"
#> [23] "Post Until"
Create a new data frame from the existing data frame (da) by counting the rows for each combination of two columns: Level and Posting Type. The count() function combines group_by() and a per-group row count into a single call. Rename the resulting column “n” to “Count” and create an interactive data table.
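To make the count() shortcut concrete, the sketch below compares it with the longer group_by() and summarise() form on a toy data frame. The values and the object names (toy, by_count, by_group, named) are made up for illustration.

```r
library(dplyr)

# Toy stand-in for the Level and Posting Type columns (values are made up)
toy <- data.frame(
  Level = c("01", "01", "02", "02", "02"),
  `Posting Type` = c("External", "Internal", "External", "External", "Internal"),
  check.names = FALSE  # keep the space in "Posting Type"
)

# count() in a single call
by_count <- toy %>% count(Level, `Posting Type`)

# The longer equivalent: group, then count the rows in each group
by_group <- toy %>%
  group_by(Level, `Posting Type`) %>%
  summarise(n = n(), .groups = "drop")

# The name argument labels the count column directly,
# so no separate renaming step is needed
named <- toy %>% count(Level, `Posting Type`, name = "Count")
```

Both forms return one row per Level and Posting Type combination. Using name = "Count" is an optional alternative to renaming the column afterwards with colnames().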
plot1 <- da %>%
  count(Level, `Posting Type`) # dplyr package: group and count
# use function colnames() to rename column
# and access individual column names with colnames(df)[index]
colnames(plot1)[3] <- "Count"
reactable(plot1) # interactive data table
# discrete visualization with ggplot2
p1 <- ggplot(data = plot1)
p1 + geom_col(aes(Level, Count, fill = `Posting Type`))
Create a new data frame from the existing data frame (da) by extracting one column: Business Title. Convert the column to upper case so that titles differing only in capitalization are counted together, then build a contingency table of the counts for each distinct title with the table() function. Rename the first column of the result (pl2) to “Business Title” and create an interactive data table showing the frequencies.
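The effect of upper-casing before tabulating can be seen in a minimal base R sketch. The titles vector and the freq object are made-up examples, not rows from the dataset.

```r
# Made-up titles: two spellings differ only in capitalization
titles <- c("Data Analyst", "DATA ANALYST", "Project Manager")

# Without toupper(), the two spellings are counted separately
raw_counts <- table(titles)

# With toupper(), they collapse into a single category
freq <- as.data.frame(table(toupper(titles)))

# as.data.frame() on a one-way table yields a category column and Freq;
# rename the first column, as the vignette does
colnames(freq)[1] <- "Business Title"
freq
```

Here raw_counts has three categories, while freq has two rows, with DATA ANALYST counted twice.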
plot2 <- as.data.frame(da[,5])
plot2$`Business Title` <- toupper(plot2$`Business Title`)
pl2 <- as.data.frame(table(plot2))
colnames(pl2)[1] <- "Business Title"
reactable(pl2)
summary(pl2) # calculates some statistics on the data
#> Business Title Freq
#> .NET DEVELOPER : 1 Min. : 1.000
#> .NET DEVELOPER ANALYST : 1 1st Qu.: 2.000
#> .NET PROGRAMMER/ANALYST (TECHNICAL LEAD): 1 Median : 2.000
#> .NET/CRM DEVELOPER : 1 Mean : 2.251
#> .NET/JAVASCRIPT DEVELOPER : 1 3rd Qu.: 2.000
#> 311 ASSISTANT BUSINESS SUPPORT ASSOCIATE: 1 Max. :19.000
#> (Other) :372
Filter Business Titles with a frequency greater than or equal to 10.
pl_2 <- filter(pl2, Freq >= 10)
pl_2
#> Business Title Freq
#> 1 ADMINISTRATIVE HOUSING SUPERINTENDENT 10
#> 2 BUSINESS ANALYST 19
#> 3 COMPUTER SPECIALIST (SOFTWARE) 16
#> 4 DATA ANALYST 11
#> 5 PROJECT MANAGER 14
p2 <- ggplot(pl_2)
p2 + geom_col(aes(`Business Title`, Freq)) +
  coord_flip()
The Kaggle data set Data Analysis Jobs, Based on NYC Jobs was loaded and analyzed using base R functions together with the tidyverse and reactable packages. The tidyverse packages used were readr (loading the data), dplyr (data manipulation), and ggplot2 (graphical representation of the data).