Substring variables are used when you need to focus on or extract specific portions of a variable for analysis, reporting, or data manipulation.

Here are key scenarios when substring variables are particularly useful:

  1. Data Analysis: To analyze only the relevant parts of a string without modifying the original variable.

  2. Data Cleaning and Standardization: To standardize strings by creating new variables based on consistent patterns.

  3. Feature Engineering for Machine Learning: To create new features that enhance predictive modeling.

  4. Data Preparation for Reporting: To prepare data for visualization or reporting by simplifying or focusing on specific details.

  5. Extracting Information from Identifiers: To separate parts of an identifier for easier analysis or categorization.

  6. Enhancing User Experience: To create user-friendly outputs by extracting and displaying meaningful parts of a string.

  7. Filtering and Subsetting Data: To filter data based on specific criteria derived from substrings.

  8. Debugging and Data Validation: To create temporary variables that help in debugging or validating data integrity.
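
As a minimal sketch of scenarios 5 and 7, the snippet below splits an identifier into parts and then filters on one of them. The ID layout (two-letter region code, four-digit year, serial number) and all values are made up for illustration.

```r
# Hypothetical order IDs: 2-letter region code, 4-digit year, then a serial number
order_id <- c("JK2023-0001", "SB2024-0117", "JK2024-0342")

region <- substr(order_id, 1, 2)               # characters 1-2
year   <- as.numeric(substr(order_id, 3, 6))   # characters 3-6, converted to numeric

# Keep only the IDs whose region code is "JK"
jk_orders <- order_id[region == "JK"]
print(jk_orders)
## [1] "JK2023-0001" "JK2024-0342"
```

The original `order_id` vector is left untouched; the substrings live in their own variables.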

1 String Variable

1.1 Using substr()

text <- "Hello, world!"
substring_text <- substr(text, start = 1, stop = 5)
print(substring_text)
## [1] "Hello"

1.2 Using substring()

text <- "Hello, world!"
substring_text <- substring(text, first = 8)
print(substring_text)
## [1] "world!"
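
The two base R functions look alike but differ in how they treat their position arguments. The sketch below, using the same example string, shows that `substring()` returns one piece per `first`/`last` pair, while `substr()` returns one result per element of its first argument.

```r
text <- "Hello, world!"

# substring() recycles `first` and `last`, so one string can yield several pieces
pieces <- substring(text, first = c(1, 8), last = c(5, 12))
print(pieces)
## [1] "Hello" "world"

# substr() returns a result of the same length as `text` (here: one element),
# so only the first start/stop pair is used
one_piece <- substr(text, start = c(1, 8), stop = c(5, 12))
print(one_piece)
## [1] "Hello"
```

In `substring()` the `last` argument is optional and defaults to a very large value, which is why `substring(text, first = 8)` runs to the end of the string.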

1.3 Using str_sub()

library(stringr)
text <- "Hello, world!"
substring_text <- str_sub(text, start = 8, end = 12)
print(substring_text)
## [1] "world"
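
`str_sub()` also accepts negative positions, which count backwards from the end of the string. A short sketch with the same example string (it assumes the stringr package is installed):

```r
library(stringr)

text <- "Hello, world!"

# start = -6 means "six characters from the end"; end defaults to -1 (last character)
last_six <- str_sub(text, start = -6)
print(last_six)
## [1] "world!"

# end = -2 stops one character before the end, dropping the "!"
no_punct <- str_sub(text, start = -6, end = -2)
print(no_punct)
## [1] "world"
```

This is convenient when the part of interest sits at a fixed distance from the end and the string lengths vary.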

2 Numeric Variable

num_var <- 123456789

# Convert to character
num_char <- as.character(num_var)

substring_part <- substr(num_char, 2, 5)  # Extract digits position 2-5
print(substring_part)  # Output: "2345"
## [1] "2345"
# Convert back to numeric
substring_num <- as.numeric(substring_part)
print(substring_num)  # Output: 2345
## [1] 2345
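
One caveat with this approach: for some doubles, `as.character()` may produce scientific notation (for example `"1e+05"`), which would break position-based extraction. The sketch below shows one possible workaround using `sprintf()`; the value is chosen only for illustration.

```r
num_var <- 100000   # stored as a double

# as.character() may switch to scientific notation for values like this one
print(as.character(num_var))

# sprintf() with "%.0f" always writes plain digits with no decimals
num_char <- sprintf("%.0f", num_var)
print(num_char)
## [1] "100000"

first_three <- as.numeric(substr(num_char, 1, 3))
print(first_three)
## [1] 100
```

Note also that converting back with `as.numeric()` drops leading zeros, so a substring such as `"0123"` becomes `123`. Keep the character version when leading zeros matter.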

3 Application to SPSS Data

library(haven)
library(dplyr)

# Simulated data
data <- read_sav("tabelku.sav")

3.1 String Variable

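# stringr alternative (requires library(stringr)):
# substring_text <- str_sub(data$nama, start = 3, end = 25)

# Inside mutate(), refer to the column by name rather than data$nama
data <- data %>%
  mutate(name_substring = substr(nama, start = 3, stop = 30))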

3.2 Numeric Variable

# Convert the numeric variable to a character, then extract the substring
data <- data %>%
  mutate(
    num_var_str = as.character(kodeSLS),                  # Convert to character
    sub_str = substr(num_var_str, start = 1, stop = 4),   # Extract digits position 1-4
    kabu_num = as.numeric(sub_str)                        # Convert back to numeric
  )
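
Because `tabelku.sav` is not available to every reader, here is a self-contained sketch of the same two steps in base R. The data frame and its values are hypothetical stand-ins; only the column names `nama` and `kodeSLS` come from the code above.

```r
# Hypothetical stand-in for tabelku.sav (values are made up)
data <- data.frame(
  nama    = c("1.Budi Santoso", "2.Siti Aminah"),
  kodeSLS = c(7201010001, 7202020015),
  stringsAsFactors = FALSE
)

# String variable: keep characters 3 to 30 (stop may exceed the string length)
data$name_substring <- substr(data$nama, start = 3, stop = 30)

# Numeric variable: convert to character, take digits 1-4, convert back
data$kabu_num <- as.numeric(substr(as.character(data$kodeSLS), start = 1, stop = 4))

print(data$name_substring)
## [1] "Budi Santoso" "Siti Aminah"
print(data$kabu_num)
## [1] 7201 7202
```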

Export the data:

write_sav(data, "ambil_data.sav")
library(DT)
datatable(data, 
          options = list(pageLength = 10,  # Number of rows per page
                         searching = TRUE) # Adds a search box
         )

4 Substring in SPSS

The goal is to create a 2-digit numeric variable R101 from the variable KECAMATAN, which has 7 digits (for example, 7201010). R101 is taken from the first and second digits of KECAMATAN.

The F7.0 format must match the number of digits in KECAMATAN exactly.



COMPUTE R101 = NUMBER(SUBSTR(STRING(KECAMATAN, F7.0), 1, 2), F2.0).
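
For comparison, the same computation can be sketched in R. This is a rough analogue rather than a translation of SPSS internals: `sprintf("%7.0f", ...)` stands in for `STRING(..., F7.0)`, and the codes are hypothetical. It also illustrates why the width has to match: a field that is too wide is padded with leading blanks, which shifts the digits.

```r
kecamatan <- c(7201010, 7202020, 7301050)   # hypothetical 7-digit codes

# Analogue of STRING(KECAMATAN, F7.0): exactly 7 characters, no decimals
kec_chr <- sprintf("%7.0f", kecamatan)

# Analogue of NUMBER(SUBSTR(..., 1, 2), F2.0): characters 1-2, converted to numeric
R101 <- as.numeric(substr(kec_chr, 1, 2))
print(R101)
## [1] 72 72 73

# With a width of 8 the string becomes " 7201010", so characters 1-2 are " 7"
too_wide <- as.numeric(substr(sprintf("%8.0f", kecamatan), 1, 2))
print(too_wide)
## [1] 7 7 7
```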



Welfare Statistics Directorate, BPS,