Using the vtree package

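The {vtree} package is a tool for calculating and drawing variable trees: diagrams that show the number and percentage of observations in the nested subsets of a data frame defined by one or more categorical variables.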

The steps below show how to use the vtree package.

  1. Set the working directory.
# Set the working directory (replace this path with the folder that contains resume.csv)
setwd("C:/MyRData/Exploring data using the vtree package")
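  2. Load the libraries. The {tidyverse} package is loaded here because it provides the pipe (%>%), glimpse(), mutate() and case_when() used in the steps below.
# Load the tidyverse packages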
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.4.2     v purrr   1.0.1
## v tibble  3.2.1     v dplyr   1.1.2
## v tidyr   1.3.0     v stringr 1.5.0
## v readr   2.1.3     v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Load the vtree package
library(vtree)
  3. Import the data from the working directory and take a quick look at it.
# Import data ("UTF-8-BOM" drops a byte-order mark that would otherwise garble the first column name)
df <- read.csv("resume.csv", stringsAsFactors = TRUE, fileEncoding = "UTF-8-BOM")

# Get a glimpse of the data
df %>% glimpse()
## Rows: 4,870
## Columns: 31
## $ rownames               <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ~
## $ job_ad_id              <int> 384, 384, 384, 384, 385, 386, 386, 385, 386, 38~
## $ job_city               <fct> Chicago, Chicago, Chicago, Chicago, Chicago, Ch~
## $ job_industry           <fct> manufacturing, manufacturing, manufacturing, ma~
## $ job_type               <fct> supervisor, supervisor, supervisor, supervisor,~
## $ job_fed_contractor     <fct> , , , , No, No, No, No, No, No, No, No, No, No,~
## $ job_equal_opp_employer <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye~
## $ job_ownership          <fct> unknown, unknown, unknown, unknown, nonprofit, ~
## $ job_req_any            <fct> Yes, Yes, Yes, Yes, Yes, No, No, Yes, No, No, Y~
## $ job_req_communication  <fct> No, No, No, No, No, No, No, No, No, No, No, No,~
## $ job_req_education      <fct> No, No, No, No, No, No, No, No, No, No, No, No,~
## $ job_req_min_experience <fct> 5, 5, 5, 5, some, , , some, , , some, some, , ,~
## $ job_req_computer       <fct> Yes, Yes, Yes, Yes, Yes, No, No, Yes, No, No, Y~
## $ job_req_organization   <fct> No, No, No, No, Yes, No, No, Yes, No, No, No, N~
## $ job_req_school         <fct> none_listed, none_listed, none_listed, none_lis~
## $ received_callback      <fct> No, No, No, No, No, No, No, No, No, No, No, No,~
## $ firstname              <fct> Allison, Kristen, Lakisha, Latonya, Carrie, Jay~
## $ race                   <fct> white, white, black, black, white, white, white~
## $ gender                 <fct> Female, Female, Female, Female, Female, Male, F~
## $ years_college          <int> 4, 3, 4, 3, 3, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 1,~
## $ college_degree         <fct> Yes, No, Yes, No, No, Yes, Yes, No, Yes, Yes, Y~
## $ honors                 <fct> No, No, No, No, No, Yes, No, No, No, No, No, No~
## $ worked_during_school   <fct> No, Yes, Yes, No, Yes, No, Yes, No, No, Yes, No~
## $ years_experience       <int> 6, 6, 6, 6, 22, 6, 5, 21, 3, 6, 8, 8, 4, 4, 5, ~
## $ computer_skills        <fct> Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, No,~
## $ special_skills         <fct> No, No, No, Yes, No, Yes, Yes, Yes, Yes, Yes, Y~
## $ volunteer              <fct> No, Yes, No, Yes, No, No, Yes, Yes, No, Yes, Ye~
## $ military               <fct> No, Yes, No, No, No, No, No, No, No, No, No, No~
## $ employment_holes       <fct> Yes, No, No, Yes, No, No, No, Yes, No, No, Yes,~
## $ has_email_address      <fct> No, Yes, No, Yes, Yes, No, Yes, Yes, No, Yes, Y~
## $ resume_quality         <fct> low, high, low, high, high, low, high, high, lo~
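The glimpse() output shows blank entries in some factor columns, such as job_fed_contractor and job_req_min_experience. With read.csv()'s default na.strings = "NA", a blank cell in a text column is kept as an empty-string level rather than NA, so a variable tree would presumably show it as an unlabeled category instead of as missing. The sketch below writes a small made-up CSV to a temporary file (it is not resume.csv) and compares the default with na.strings = c("", "NA").

```r
# Write a tiny made-up CSV whose second column has one blank cell
path <- tempfile(fileext = ".csv")
writeLines(c("job_city,job_fed_contractor",
             "Chicago,",
             "Chicago,No",
             "Boston,Yes"), path)

# Default na.strings = "NA": the blank cell becomes an empty-string factor level
with_blank <- read.csv(path, stringsAsFactors = TRUE)
print(levels(with_blank$job_fed_contractor))

# Treat blank cells as missing values instead
with_na <- read.csv(path, stringsAsFactors = TRUE, na.strings = c("", "NA"))
print(levels(with_na$job_fed_contractor))
print(sum(is.na(with_na$job_fed_contractor)))
```

Whether blanks should count as missing depends on what they mean in the data; for job_fed_contractor a blank most likely means the information was not recorded.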
  4. Create a ‘years_experience_group’ variable from the ‘years_experience’ variable.
# Create the years-of-experience group variable
df <- df %>%
  mutate(
    # Assign each resume to an experience category
    years_experience_group = dplyr::case_when(
      years_experience <= 5                          ~ "1-5 years",
      years_experience > 5 & years_experience <= 10  ~ "6-10 years",
      years_experience > 10 & years_experience <= 20 ~ "11-20 years",
      years_experience > 20                          ~ "> 20 years"
    ),
    # Convert to a factor so the groups appear in a logical order
    years_experience_group = factor(years_experience_group,
                                    levels = c("1-5 years", "6-10 years", "11-20 years", "> 20 years"))
  )
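As a cross-check on the group boundaries, the same grouping can be sketched with base R's cut(). This assumes years_experience holds whole years; with right = TRUE each interval includes its upper bound, which matches the <= comparisons in case_when(). The values below are made up for illustration.

```r
# Made-up values standing in for df$years_experience
years_experience <- c(1, 5, 6, 10, 11, 20, 21, 44)

# Intervals (-Inf, 5], (5, 10], (10, 20], (20, Inf], labeled like the case_when() version
years_experience_group <- cut(
  years_experience,
  breaks = c(-Inf, 5, 10, 20, Inf),
  labels = c("1-5 years", "6-10 years", "11-20 years", "> 20 years"),
  right = TRUE
)

print(table(years_experience_group))
```

cut() returns a factor whose levels already follow the order of the breaks, so no separate factor() call is needed.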
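  5. Display a variable tree of race, years-of-experience group, and callback status.
# horiz = FALSE draws the tree from top to bottom instead of left to right
vtree(df, "race years_experience_group received_callback", horiz = FALSE)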
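A variable tree is essentially a set of nested frequency tables. The base-R sketch below, on a made-up toy data frame, computes the kind of numbers shown in a two-level tree: counts and percentages for each race, then callback counts within each race. It assumes vtree reports each node's percentage relative to its parent node, so the second level uses row-wise proportions (margin = 1).

```r
# Made-up toy data (not the resume data)
toy <- data.frame(
  race = c(rep("white", 4), rep("black", 4)),
  received_callback = c("Yes", "No", "Yes", "No", "No", "No", "Yes", "No")
)

# Level 1: count and percentage of all rows for each race
level1 <- table(toy$race)
print(level1)
print(round(100 * prop.table(level1)))

# Level 2: callback counts within each race,
# with percentages relative to the parent (race) node
level2 <- table(toy$race, toy$received_callback)
print(level2)
print(round(100 * prop.table(level2, margin = 1)))
```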

  6. Display summary statistics for years of experience within each gender.
# Add summary statistics of years_experience to each gender node
vtree(df, "gender", summary = "years_experience", horiz = FALSE)
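The summary argument places statistics of a numeric variable inside each node. To cross-check the values for a node, you can compute them directly. The sketch below uses a small made-up data frame rather than the resume data, and the choice of statistics (n, mean, SD, median, min, max) is an assumption about what is useful to compare, not a description of vtree's exact default output.

```r
# Made-up toy data standing in for the resume data
toy <- data.frame(
  gender = c("Female", "Female", "Female", "Male", "Male"),
  years_experience = c(2, 6, 10, 4, 8)
)

# Split years_experience by gender and summarize each group (one row per gender)
exp_stats <- do.call(rbind, lapply(
  split(toy$years_experience, toy$gender),
  function(x) c(n = length(x), mean = mean(x), sd = sd(x),
                median = median(x), min = min(x), max = max(x))
))
print(round(exp_stats, 2))
```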

  7. Find patterns in specific combinations of variables.
# Find patterns. Example 1
vtree(df,"race received_callback", pattern = TRUE)

# Find patterns. Example 2
vtree(df,"race gender received_callback", pattern = TRUE)

# Find patterns. Example 3
vtree(df,"race gender years_experience_group received_callback", pattern = TRUE)
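With pattern = TRUE, vtree draws a pattern tree, in which each observed combination of values gets its own path. Roughly the same information can be tabulated in base R by counting every combination of the chosen variables and dropping the ones that never occur. The toy data frame below is made up for illustration.

```r
# Made-up toy data (not the resume data)
toy <- data.frame(
  race = c("white", "white", "black", "black", "white", "black"),
  gender = c("Female", "Female", "Male", "Female", "Male", "Male"),
  received_callback = c("No", "No", "Yes", "No", "No", "Yes")
)

# Cross-tabulate all three variables, then flatten to one row per combination
counts <- as.data.frame(table(toy), stringsAsFactors = FALSE)

# Keep only the combinations (patterns) that actually occur, most frequent first
patterns <- counts[counts$Freq > 0, ]
patterns <- patterns[order(-patterns$Freq), ]
print(patterns, row.names = FALSE)
```

Adding a variable to the list multiplies the number of possible combinations, which is why the pattern trees in Examples 2 and 3 have more paths than Example 1.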

If you are new to the R programming language, don’t give up. Your R skills will improve with practice.

Note. The ‘resume.csv’ file can be downloaded from: <https://vincentarelbundock.github.io/Rdatasets/articles/data.html>