The {vtree} package is a tool for calculating and displaying variable trees, which are diagrams that show counts and percentages for nested subsets of a data frame.
The steps below show how to use the vtree package.
# Set working directory
setwd("C:/MyRData/Exploring data using the vtree package")
# Load the tidyverse packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.4.2 v purrr 1.0.1
## v tibble 3.2.1 v dplyr 1.1.2
## v tidyr 1.3.0 v stringr 1.5.0
## v readr 2.1.3 v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Load the vtree package
library(vtree)
# Import data
df <- read.csv("resume.csv", stringsAsFactors = TRUE, fileEncoding = "UTF-8-BOM")
# Get a glimpse of the data
df %>% glimpse()
## Rows: 4,870
## Columns: 31
## $ rownames <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ~
## $ job_ad_id <int> 384, 384, 384, 384, 385, 386, 386, 385, 386, 38~
## $ job_city <fct> Chicago, Chicago, Chicago, Chicago, Chicago, Ch~
## $ job_industry <fct> manufacturing, manufacturing, manufacturing, ma~
## $ job_type <fct> supervisor, supervisor, supervisor, supervisor,~
## $ job_fed_contractor <fct> , , , , No, No, No, No, No, No, No, No, No, No,~
## $ job_equal_opp_employer <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~
## $ job_ownership <fct> unknown, unknown, unknown, unknown, nonprofit, ~
## $ job_req_any <fct> Yes, Yes, Yes, Yes, Yes, No, No, Yes, No, No, Y~
## $ job_req_communication <fct> No, No, No, No, No, No, No, No, No, No, No, No,~
## $ job_req_education <fct> No, No, No, No, No, No, No, No, No, No, No, No,~
## $ job_req_min_experience <fct> 5, 5, 5, 5, some, , , some, , , some, some, , ,~
## $ job_req_computer <fct> Yes, Yes, Yes, Yes, Yes, No, No, Yes, No, No, Y~
## $ job_req_organization <fct> No, No, No, No, Yes, No, No, Yes, No, No, No, N~
## $ job_req_school <fct> none_listed, none_listed, none_listed, none_lis~
## $ received_callback <fct> No, No, No, No, No, No, No, No, No, No, No, No,~
## $ firstname <fct> Allison, Kristen, Lakisha, Latonya, Carrie, Jay~
## $ race <fct> white, white, black, black, white, white, white~
## $ gender <fct> Female, Female, Female, Female, Female, Male, F~
## $ years_college <int> 4, 3, 4, 3, 3, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 1,~
## $ college_degree <fct> Yes, No, Yes, No, No, Yes, Yes, No, Yes, Yes, Y~
## $ honors <fct> No, No, No, No, No, Yes, No, No, No, No, No, No~
## $ worked_during_school <fct> No, Yes, Yes, No, Yes, No, Yes, No, No, Yes, No~
## $ years_experience <int> 6, 6, 6, 6, 22, 6, 5, 21, 3, 6, 8, 8, 4, 4, 5, ~
## $ computer_skills <fct> Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, No,~
## $ special_skills <fct> No, No, No, Yes, No, Yes, Yes, Yes, Yes, Yes, Y~
## $ volunteer <fct> No, Yes, No, Yes, No, No, Yes, Yes, No, Yes, Ye~
## $ military <fct> No, Yes, No, No, No, No, No, No, No, No, No, No~
## $ employment_holes <fct> Yes, No, No, Yes, No, No, No, Yes, No, No, Yes,~
## $ has_email_address <fct> No, Yes, No, Yes, Yes, No, Yes, Yes, No, Yes, Y~
## $ resume_quality <fct> low, high, low, high, high, low, high, high, lo~
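The glimpse() output above shows blank entries in job_fed_contractor and job_req_min_experience. With stringsAsFactors = TRUE, read.csv() keeps blank text fields as an empty-string factor level rather than NA. The following is a minimal sketch on a small made-up CSV (the csv_text data and the raw and clean names are illustrative and not part of the resume data) showing how the na.strings argument could be used if you would rather treat blanks as missing:

```r
# Hypothetical three-row CSV with one blank value in a text column
csv_text <- "id,contractor\n1,\n2,No\n3,Yes\n"

# Default behaviour: the blank becomes an empty-string factor level
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
levels(raw$contractor)

# Treat both "NA" and blank strings as missing values
clean <- read.csv(text = csv_text, stringsAsFactors = TRUE,
                  na.strings = c("NA", ""))
levels(clean$contractor)
sum(is.na(clean$contractor))
```

Whether blanks should be recoded this way depends on the analysis; the steps in this post keep the data as imported.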
# Create a years-of-experience group variable
df <- df %>%
  mutate(
    # Create categories
    years_experience_group = dplyr::case_when(
      years_experience <= 5 ~ "1-5 years",
      years_experience > 5 & years_experience <= 10 ~ "6-10 years",
      years_experience > 10 & years_experience <= 20 ~ "11-20 years",
      years_experience > 20 ~ "> 20 years"
    ),
    # Convert to a factor with the levels in ascending order
    years_experience_group = factor(
      years_experience_group,
      levels = c("1-5 years", "6-10 years", "11-20 years", "> 20 years")
    )
  )
# Display a variable tree
vtree(df, "race years_experience_group received_callback", horiz = FALSE)
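The grouping step above uses case_when() followed by factor(). As an alternative sketch, base R's cut() can bin a numeric variable and set the factor level order in one call. The years_experience vector below is made-up illustration data, and the breaks are an assumption chosen to mirror the case_when() conditions:

```r
# Hypothetical values covering each boundary of the groups
years_experience <- c(1, 5, 6, 10, 11, 20, 21, 44)

# cut() uses right-closed intervals by default:
# (-Inf, 5], (5, 10], (10, 20], (20, Inf]
years_experience_group <- cut(
  years_experience,
  breaks = c(-Inf, 5, 10, 20, Inf),
  labels = c("1-5 years", "6-10 years", "11-20 years", "> 20 years")
)

# Count how many values fall in each group
table(years_experience_group)
```

Because cut() returns a factor whose levels follow the order of the labels, no separate factor() call is needed.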
# Display summaries
vtree(df, "gender", summary = "years_experience", horiz = FALSE)
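The summary argument adds summary statistics for a numeric variable, such as the mean, to each node of the tree. One way to sanity-check those figures is to compute a grouped statistic directly. The following is a minimal sketch with base R's tapply(); the toy data frame is made-up illustration data, not the resume data:

```r
# Hypothetical data with the same column names as the resume data
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Male"),
  years_experience = c(6, 4, 9, 3, 6)
)

# Mean years of experience within each gender
group_means <- tapply(toy$years_experience, toy$gender, mean)
group_means
```

Running the same tapply() call on df should give means that can be compared with the values shown in the gender nodes.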
# Find patterns. Example 1
vtree(df, "race received_callback", pattern = TRUE)
# Find patterns. Example 2
vtree(df, "race gender received_callback", pattern = TRUE)
# Find patterns. Example 3
vtree(df, "race gender years_experience_group received_callback", pattern = TRUE)
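With pattern = TRUE, vtree shows each combination of values of the listed variables together with how often it occurs. The counts can be cross-checked with a plain frequency table. The sketch below uses base R on a small made-up data frame (toy and patterns are illustrative names, not part of the original analysis):

```r
# Hypothetical data with two of the variables used above
toy <- data.frame(
  race = c("white", "white", "black", "black", "white", "black"),
  received_callback = c("No", "Yes", "No", "No", "No", "No")
)

# Count every combination of the two variables
patterns <- as.data.frame(table(toy), responseName = "n")

# Drop combinations that never occur and sort by frequency
patterns <- patterns[patterns$n > 0, ]
patterns <- patterns[order(-patterns$n), ]
patterns
```

Applying the same steps to the relevant columns of df gives a table of counts to compare against the pattern tree.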
If you are new to the R programming language, don’t give up. Your R skills will improve with time.
Note. The ‘resume.csv’ file can be downloaded from: <https://vincentarelbundock.github.io/Rdatasets/articles/data.html>