Substring variables are used when you need to focus on or extract specific portions of a variable for analysis, reporting, or data manipulation.
Here are key scenarios when substring variables are particularly useful:
Data Analysis: To analyze only relevant parts of a string without modifying the original variable.
Data Cleaning and Standardization: To standardize strings by creating new variables based on consistent patterns.
Feature Engineering for Machine Learning: To create new features that enhance predictive modeling.
Data Preparation for Reporting: To prepare data for visualization or reporting by simplifying or focusing on specific details.
Extracting Information from Identifiers: To separate parts of an identifier for easier analysis or categorization.
Enhancing User Experience: To create user-friendly outputs by extracting and displaying meaningful parts of a string.
Filtering and Subsetting Data: To filter data based on specific criteria derived from substrings.
Debugging and Data Validation: To create temporary variables that help in debugging or validating data integrity.
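A few of the scenarios above can be illustrated with a short base R sketch. The data frame `regions` and its columns `region_id` and `name` are hypothetical, made up for illustration; the sketch assumes the first two characters of each identifier encode a higher-level area code.

```r
# Hypothetical data: 7-character identifiers stored as text
regions <- data.frame(
  region_id = c("7201010", "7201020", "7302010", "7403030"),
  name      = c("Alpha", "Beta", "Gamma", "Delta"),
  stringsAsFactors = FALSE
)

# Extracting information from identifiers: keep the original column
# and add a new variable holding only the first two characters
regions$area_code <- substr(regions$region_id, start = 1, stop = 2)

# Filtering and subsetting: keep rows whose area code is "72"
area_72 <- regions[regions$area_code == "72", ]

# Data validation: flag identifiers that are not exactly 7 characters long
regions$valid_length <- nchar(regions$region_id) == 7

print(regions)
print(area_72)
```

Because the substring is stored in a new column, the original `region_id` stays untouched and can still be used for joins or further checks.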
text <- "Hello, world!"
substring_text <- substr(text, start = 1, stop = 5)
print(substring_text)
## [1] "Hello"
text <- "Hello, world!"
substring_text <- substring(text, first = 8)
print(substring_text)
## [1] "world!"
library(stringr)
text <- "Hello, world!"
substring_text <- str_sub(text, start = 8, end = 12)
print(substring_text)
## [1] "world"
num_var <- 123456789
# Convert to character
num_char <- as.character(num_var)
substring_part <- substr(num_char, 2, 5) # Extract digits position 2-5
print(substring_part) # Output: "2345"
## [1] "2345"
# Convert back to numeric
substring_num <- as.numeric(substring_part)
print(substring_num) # Output: 2345
## [1] 2345
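One caveat with the conversion above: `as.character()` may return scientific notation for some numeric values, and it never keeps leading zeros, so positional extraction can give unexpected results. A minimal sketch, assuming the values are whole numbers, formats them explicitly with `sprintf()` before extracting; the variable names are illustrative only.

```r
big_num <- 100000

# as.character(big_num) may switch to scientific notation,
# so format explicitly as a whole number with no decimals
big_char <- sprintf("%.0f", big_num)
first_three <- substr(big_char, 1, 3)
print(first_three)
## [1] "100"

# Pad to a fixed width of 7 digits so character positions stay stable
short_num <- 12345
padded <- sprintf("%07.0f", short_num)
print(padded)
## [1] "0012345"
```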
library(haven)
library(dplyr)
# Simulated data
data <- read_sav("tabelku.sav")
#substring_text <- str_sub(data$nama, start = 3, end = 25)
# Extract characters 3-30 of the nama column; inside mutate() the column is referenced directly
data <- data %>%
  mutate(name_substring = substr(nama, start = 3, stop = 30))
# Convert the numeric variable to a character, then extract the substring
data <- data %>%
  mutate(
    num_var_str = as.character(kodeSLS),                # Convert to character
    sub_str = substr(num_var_str, start = 1, stop = 4), # Extract digits position 1-4
    kabu_num = as.numeric(sub_str)                      # Convert back to numeric
  )
Export the data:
write_sav(data, "ambil_data.sav")
library(DT)
datatable(data,
          options = list(pageLength = 10,   # Number of rows per page
                         searching = TRUE)  # Adds a search box
)
The goal is to create a 2-digit numeric variable R101 from the variable KECAMATAN, which has 7 digits (for example, 7201010). R101 is taken from the first and second digits of KECAMATAN.
The F7.0 format must exactly match the number of digits in KECAMATAN.
COMPUTE R101 = NUMBER(SUBSTR(STRING(KECAMATAN, F7.0), 1, 2), F2.0).
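For comparison, the same derivation can be sketched in R. This is an illustrative counterpart, not part of the SPSS syntax above; the vector `KECAMATAN` is built here from the example value plus one additional made-up code, and the intermediate name `kec_str` is hypothetical.

```r
# Example district codes (the second value is made up for illustration)
KECAMATAN <- c(7201010, 7302020)

# Format as a 7-digit string (the counterpart of STRING(KECAMATAN, F7.0)),
# take characters 1-2, then convert back to numeric
kec_str <- sprintf("%07.0f", KECAMATAN)
R101 <- as.numeric(substr(kec_str, 1, 2))
print(R101)
## [1] 72 73
```

As with F7.0 in SPSS, the width in `"%07.0f"` should match the number of digits in KECAMATAN so that positions 1 and 2 always refer to the same part of the code.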
Welfare Statistics Directorate, BPS, saptahas@bps.go.id