1. Introduction

  • Department store purchase and customer data
  • Data collection period : 2000.05.01 ~ 2001.04.29
  • Annual purchase amount per customer : about KRW 3.29 million (maximum KRW 114.79 million)
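The sketch below shows one way to derive a per-customer figure like the one above. It sums each customer's transaction amounts, then takes the mean and the maximum.

It makes some assumptions:
- The transaction table has `custid` and `amount` columns, as `HDS_Transactions_MG.tab` appears to.
- The toy values are made up.
- Because the collection period is roughly one year, the per-customer total can be read as an annual amount.

```r
# Toy transaction table (hypothetical values); the real input would be HDS_Transactions_MG.tab
tr_toy <- data.frame(
  custid = c(1, 1, 2, 3, 3, 3),
  amount = c(50000, 150000, 30000, 20000, 20000, 60000)
)

# Total spend per customer over the (roughly one-year) collection period
per_cust <- aggregate(amount ~ custid, data = tr_toy, FUN = sum)

# Average and maximum annual spend per customer
avg_spend <- mean(per_cust$amount)
max_spend <- max(per_cust$amount)
c(avg = avg_spend, max = max_spend)
```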

1.1. Intelligent Targeting

- What is Marketing?

“The process by which companies create value for customers and build strong customer relationships in order to capture value from customers in return.” - Kotler and Armstrong (2010)

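-> In my own view, marketing is a process of value exchange. It is about designing how the value created by the customer and the value created by the company are exchanged, and building that process. Put more simply, marketing is "getting customers to act". The summary is short, but it hides a great deal of process and effort. In this exchange, the company is the actor behind every action. Marketing must of course be carried out from the customer's point of view, but it is the company that takes the lead and does the work.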

- The 4P’s:

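A marketing campaign focuses on customers' needs and their overall satisfaction. Even so, many variables decide whether a campaign succeeds. The main ones to consider when running a campaign are:

  1. Segment of the Population - "Which part of the population is the campaign aimed at, and why?" This matters because it tells us which part of the population is most likely to be receptive to the message.

  2. Distribution channel to reach the customer's place - To get the most out of a campaign, the most effective strategy has to be put in place. Which group of the population are we addressing, and which tools should carry the company's message? (e.g. phone, radio, TV, social media)

  3. Price - What is the best price we can offer potential customers? (For a bank, this question does not arise. Its main interest is getting potential customers to open term deposit accounts so that it can continue its operations.)

  4. Promotional Strategy - How the strategy is carried out, and how potential customers are likely to respond. This should be the last part of the campaign analysis. It needs a thorough review of previous campaigns (where available), both to learn from past mistakes and to work out how to make the campaign much more effective.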

1.2. Aim

1) Data Analysis

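- Analyze customer characteristics from the department store's customer sales data
    - Build customer profiles (understand the current customer base)
    - Make direct-mail advertising more efficient
    - Raise response rates through targeted mailing

2) Data Analytics

- Predict a customer's gender (Male / Female) from attributes such as age, job, residence, marital status and grade
- Predict whether a customer will sign up, based on the customer's history and on data from similar customer groups
- Predict a customer's future purchase grade from attributes such as age, sex, job, residence, marital status and grade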

1.3. Data Analysis Process

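1) Extract a sample customer list for analysis from the existing customer DB
2) Understand the data (analysis and visualization)
3) Run differentiated marketing by gender / customer grade
4) Analyze the campaign results with R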

1.4. Data Description

- Data Files :

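  • HDS_Cards.tab : card information
  • HDS_Customers.tab : customer information
  • HDS_Jobs.tab : job information
  • HDS_Transactions_MG.tab : purchase (transaction) information
  • mic_engzipcode_DB20050215.xlsx : postal codes
  • HDS-테이블소개.xls : table dictionary (description of each table)

- Data Field :

  • age : age
  • sex : sex
  • region : place of residence
  • income : income
  • married : marital status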

2. Setting

2.1. Import Libraries

- Import Libraries :

suppressPackageStartupMessages({
  library(data.table)
  library(tidyr)
  library(lubridate)
  library(DT)
  library(ggplot2)
  library(corrplot)
  library(ggthemes)
  library(sqldf)
  library(readr)        # guess_encoding(), used by read.any() below

  library(randomForest)
  library(caret)
  library(C50)
  library(dplyr)
  library(kknn)
  library(ROCR)

  library(eeptools)
  library(DescTools)
  library(kableExtra)
  library(reshape)      # loaded last, so reshape::rename() masks dplyr::rename()

  #if (!require("kknn")) install.packages("kknn"); library(kknn)
})

rm(list=ls())

fillColor = "#FFA07A"
fillColor2 = "#F1C40F"

- Load Dataset :

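# Read a delimited file with an automatically guessed encoding.
# Useful when files created on Windows show garbled characters on a Mac.
read.any <- function(text, sep = "", ...) {
  # text: path to the file; take the top candidate from readr::guess_encoding()
  encoding <- as.character(guess_encoding(text)[1, 1])

  # Pick the separator from the file extension unless one is given explicitly
  setting <- as.character(tools::file_ext(text))
  if (sep != "" || !(setting %in% c("csv", "txt"))) setting <- "custom"
  separate <- list(csv = ",", txt = "\n", custom = sep)

  read.table(text, sep = separate[[setting]], fileEncoding = encoding, ...)
}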

cs <- read.any('./input/HDS_Customers.tab', sep = "\t", header = TRUE, stringsAsFactors = FALSE)
tr <- read.any('./input/HDS_Transactions_MG.tab', sep = "\t", header = TRUE, stringsAsFactors = FALSE)

jobs <- read.any('./input/HDS_Jobs.tab', sep = "\t", header = TRUE, stringsAsFactors = FALSE)
cards <- read.any('./input/HDS_Cards.tab', sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# read.table() cannot parse .xlsx files; the postal code table would need e.g. readxl::read_excel()
#addr <- read.any('./input/mic_engzipcode_DB20050215.xlsx', sep="\t", header=T, stringsAsFactors = F)
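The separator logic inside `read.any()` can be hard to follow on a first read. The hypothetical helper `pick_sep()` below isolates just that step, using the same rules, so each case can be checked without touching a file:
- An explicit `sep` always wins.
- `.csv` maps to a comma.
- `.txt` maps to a newline.
- Any other extension falls back to `sep`. Its empty default makes `read.table()` split on white space.

```r
# Hypothetical helper reproducing only the separator choice made inside read.any()
pick_sep <- function(path, sep = "") {
  setting <- tools::file_ext(path)
  # An explicit sep, or any extension other than csv/txt, goes to "custom"
  if (sep != "" || !(setting %in% c("csv", "txt"))) setting <- "custom"
  separate <- list(csv = ",", txt = "\n", custom = sep)
  separate[[setting]]
}

pick_sep("HDS_Customers.tab", sep = "\t")  # "\t": explicit separator wins
pick_sep("sales.csv")                      # ","
pick_sep("notes.txt")                      # "\n"
pick_sep("HDS_Jobs.tab")                   # "": read.table() then splits on white space
```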

2.2. Peek into the data

- Overview Data :

  - 49,995 customers in total
datatable(cs, style="bootstrap", class="table-condensed", options = list(dom = 'tp',scrollX = TRUE))
length(unique(tr$custid))
## [1] 49995

2.3. Preprocessing

2.3.1. Sex

  • sex (sex code)
    • 0: invalid value
    • 1: male
    • 2: female
  • feature vector map :
    • 0 : male
    • 1 : female
table(cs['sex'])
## 
##     0     1     2 
##    24 15162 34809
# Drop the invalid sex code 0, then recode 1/2 to 0/1

cs <- subset(cs, sex > 0)
cs$sex <- cs$sex - 1 # male: 0, female: 1

table(cs['sex'])
## 
##     0     1 
## 15162 34809
#task1 <- data_frame(prop.table(table(cs['sex'])))

task1 <- sqldf("

select
    sex
    , round(100. * g_cnt / t_cnt, 2) as prob

from (
    select
        sex
        , count(*) g_cnt
    from cs
    group by 1
    ) as a, (
    select
        count(*) as t_cnt
    from cs
    ) as b

")

task1
##   sex  prob
## 1   0 30.34
## 2   1 69.66
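The sqldf query cross-joins the per-group counts with the total count to turn them into percentages. For a single column, base R's `prop.table()` gives the same numbers. A minimal sketch on made-up 0/1 sex codes:

```r
# Toy sex codes after recoding (0 = male, 1 = female); the values are made up
sex <- c(0, 1, 1, 1, 0, 1, 1, 1, 1, 0)

# Same result as the sqldf cross-join: 100 * group count / total count, rounded to 2 places
share <- round(100 * prop.table(table(sex)), 2)
share
```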

2.3.2. Age

cs$birth_year <- as.integer(substr(cs$birth, 1, 4))
cs$birth_month <- as.integer(substr(cs$birth, 6, 7))

cs['age'] <- 2000 - cs['birth_year'] + 1
summary(cs['age'])
##       age          
##  Min.   :-7959.00  
##  1st Qu.:   28.00  
##  Median :   33.00  
##  Mean   :   36.11  
##  3rd Qu.:   43.00  
##  Max.   :  101.00  
##  NA's   :7835
boxplot(cs['birth_year'])

boxplot(cs['age'])

# Drop non-positive (invalid) ages; subset() also drops rows where age is NA

cs <- subset(cs, age > 0)

summary(cs['age'])
##       age        
##  Min.   :  2.00  
##  1st Qu.: 28.00  
##  Median : 33.00  
##  Mean   : 36.47  
##  3rd Qu.: 43.00  
##  Max.   :101.00
boxplot(cs['birth_year'])

boxplot(cs['age'])

Desc(cs['age'])
## ------------------------------------------------------------------------- 
## Describe cs["age"] (data.frame):
## 
## data.frame:  42133 obs. of  1 variables
## 
##   Nr  ColName  Class    NAs  Levels
##   1   age      numeric  .          
## 
## 
## ------------------------------------------------------------------------- 
## 1 - age (numeric)
## 
##   length       n    NAs  unique     0s   mean  meanCI
##   42'133  42'133      0      87      0  36.47   36.36
##           100.0%   0.0%           0.0%          36.57
##                                                      
##      .05     .10    .25  median    .75    .90     .95
##    24.00   26.00  28.00   33.00  43.00  53.00   58.00
##                                                      
##    range      sd  vcoef     mad    IQR   skew    kurt
##    99.00   10.96   0.30    8.90  15.00   1.00    0.86
##                                                      
## lowest : 2.0 (8), 3.0 (13), 4.0 (28), 5.0 (12), 6.0 (6)
## highest: 88.0, 90.0, 94.0, 95.0, 101.0 (19)
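The formula `2000 - birth_year + 1` appears to compute a Korean-style counting age, with 2000 as the reference year. The sketch below replays that step on hypothetical birth strings. It assumes the `YYYY-MM-DD` layout that the `substr()` calls rely on, and includes one malformed year of the kind that produces the negative minimum above. It then applies the same `age > 0` filter.

```r
# Hypothetical birth strings in the "YYYY-MM-DD" layout assumed by substr() above
birth <- c("1970-03-15", "1985-11-02", "9958-01-01", NA)

birth_year <- as.integer(substr(birth, 1, 4))
# Reference year 2000; the +1 gives Korean-style counting age
age <- 2000 - birth_year + 1

# Keep plausible ages only; NA ages are dropped too, as subset() does
valid <- !is.na(age) & age > 0
age[valid]
```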

2.3.3. Marriage

  • mrg_flg (marital status code)
    • 0 : other
    • 1 : married
    • 2 : single
    • 7 : other
  • feature vector map :
    • 0 : other
    • 1 : married
    • 2 : single
table(cs['mrg_flg'])
## 
##     0     1     2     7 
##  4503 21167 16462     1
#prop.table(table(cs['mrg_flg']))

task2 <- sqldf("

select
    mrg_flg
    , round(100. * m_cnt / t_cnt, 2) as prob

from (
    select
        mrg_flg
        , count(*) m_cnt
    from cs
    group by 1
    ) as a, (
    select
        count(*) as t_cnt
    from cs
    ) as b

")

task2
##   mrg_flg  prob
## 1       0 10.69
## 2       1 50.24
## 3       2 39.07
## 4       7  0.00
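The feature vector map lists only 0, 1 and 2. This suggests that code 7 (also "other") is meant to be folded into 0. The code above does not do this, which is why task2 still shows a separate 0.00% row for 7. A minimal sketch of that remapping on hypothetical codes:

```r
# Hypothetical raw mrg_flg codes: 0/7 = other, 1 = married, 2 = single
mrg_flg <- c(1, 2, 0, 7, 1, 2, 1)

# Collapse code 7 into 0 so that "other" has a single code
mrg_clean <- ifelse(mrg_flg == 7, 0, mrg_flg)

# Percentage share per code after the remap
share <- round(100 * prop.table(table(mrg_clean)), 2)
share
```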

2.3.4. Job

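# sa_cs: sampled customer table with job and residence columns (sa_tr: its transactions); the sampling step is not shown in this section
prop.table(table(sa_cs['job']))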
## 
##   개인사업     건설업   공공기관     공무원   교육기관     그룹사 
##      0.406      0.032      0.032      0.030      0.072      0.044 
##   금융기관   도소매업   언론기관   연구기관       예술 운송서비스 
##      0.038      0.020      0.012      0.012      0.004      0.014 
##   의료기관     전문직 정보서비스     제조업 
##      0.046      0.028      0.032      0.178
# Translate the Korean job labels to English with a named lookup vector
job_map <- c(
  "개인사업" = "Entrepreneur", "건설업" = "Construction", "공공기관" = "Public",
  "공무원" = "Official", "교육기관" = "Education", "그룹사" = "Company",
  "금융기관" = "Finance", "도소매업" = "Retail", "언론기관" = "Press",
  "연구기관" = "Laboratory", "예술" = "Art", "운송서비스" = "Logistic",
  "의료기관" = "Medical", "전문직" = "Profession", "정보서비스" = "IT",
  "제조업" = "Manufacturing"
)

sa_cs$job <- as.character(sa_cs$job)
matched <- sa_cs$job %in% names(job_map)   # labels not in the map are left unchanged
sa_cs$job[matched] <- job_map[sa_cs$job[matched]]
sa_cs$job <- as.factor(sa_cs$job)

2.3.5. Residence

2.3.6. Home type

3. Exploratory Data Analysis

1) Descriptive statistics
2) Visualization

3.1. Sex

- Mean age by sex

r1 <- cs %>%
  group_by(sex) %>%
  summarise(age = round(mean(age)))
r1
## # A tibble: 2 x 2
##     sex   age
##   <dbl> <dbl>
## 1     0    40
## 2     1    35

- Customer share by sex

p1 <- ggplot(data = task1, aes(x = as.factor(sex), y = prob)) +
  geom_bar(stat="identity", fill=fillColor2) +
  ggtitle("Customer Rate by Gender") + xlab("Gender") + ylab("Percent (%)") + 
  geom_text(aes(label=paste0(prob, "%")), vjust=-0.5) + theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
        axis.text.x=element_text(size=12), legend.position="none") +
  scale_x_discrete(breaks=c("0", "1"),
        labels=c("Male", "Female"))

p1

3.2. Age

3.2.1. Before

# View age chart: histogram on the density scale, with a transparent density curve on top
ggplot(cs, aes(x=age)) +
  geom_histogram(aes(y=after_stat(density)), binwidth=.5,
                 colour="black", fill=fillColor2) +
  geom_density(alpha=.2, fill="green") +
  ggtitle("Customer Age Distribution") + xlab("Age") + ylab("Density") + 
  theme_economist()

ggplot(data = cs, aes(age, fill = factor(sex))) +
  geom_bar() + 
  ggtitle("Customer Age by Gender") + xlab("Age") + ylab("Freq") + 
  theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
       axis.text.x=element_text(size=12)) +
  # age is continuous, so the Male/Female labels belong on the fill scale, not on x
  scale_fill_discrete(labels=c("Male", "Female")) +
  guides(fill=guide_legend(title="Sex"))

3.2.2. After

- Customer age distribution

# View age chart (ages 21-69 only)
p2 <- cs %>%
  filter(age > 20 & age < 70) %>%
  ggplot(aes(x=age)) +
  geom_histogram(aes(y=after_stat(density)), binwidth=.5,
                 colour="black", fill=fillColor2) +
  geom_density(alpha=.2, fill="green") +   # overlay a transparent density curve
  ggtitle("Customer Age Distribution") + xlab("Age") + ylab("Density") + 
  theme_economist()

p2

- Customer age distribution by sex

- 0 : male
- 1 : female
p3 <- cs %>%
  filter(70 > age & age > 20) %>%

ggplot(aes(age, fill = factor(sex))) +
  geom_bar() + 
  ggtitle("Customer Age Distribution") + xlab("Age") + ylab("Freq") + 
  theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
        axis.text.x=element_text(size=12)) +
  guides(fill=guide_legend(title="Sex"))
p3

3.3. Marriage

3.3.1. Before

ggplot(data = task2, aes(x = factor(mrg_flg), y = prob)) +
  geom_bar(stat="identity", fill=fillColor2) +
  ggtitle("Customer Rate by Marriage") + xlab("Marriage") + ylab("Percent (%)") + 
  geom_text(aes(label=paste0(prob, "%")), vjust=-0.5) + theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
        axis.text.x=element_text(size=12), legend.position="none") +
  scale_x_discrete(breaks=c("0", "1", "2"),
        labels=c("Unknown", "Married", "Single"))

ggplot(data = cs, aes(age, fill = factor(mrg_flg))) +
  geom_bar() + 
  ggtitle("Customer Age by Marriage") + xlab("Age") + ylab("Freq") + 
  theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
        axis.text.x=element_text(size=12))

cs %>%
  filter(70 > age & age > 20) %>%

ggplot(aes(age, fill = factor(mrg_flg))) +
  geom_bar() + 
  ggtitle("Customer Age by Marriage") + xlab("Age") + ylab("Freq") + 
  theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
        axis.text.x=element_text(size=12))

3.3.2. After

- Customer share by marital status (married and single only)

- 1 : married
- 2 : single
#prop.table(table(cs['mrg_flg']))

task3 <- sqldf("
select
    mrg_flg
    , round(100. * m_cnt / t_cnt, 2) as prob

from (

    select
        mrg_flg
        , count(*) m_cnt
    from cs
    where mrg_flg in ('1', '2')
    group by 1

    ) as a, (

    select
        count(*) as t_cnt
    from cs
    where mrg_flg in ('1', '2')

    ) as b

")

task3
##   mrg_flg  prob
## 1       1 56.25
## 2       2 43.75
p4 <- ggplot(data = task3, aes(x = factor(mrg_flg), y = prob)) +
  geom_bar(stat="identity", fill=fillColor2) +
  ggtitle("Customer Rate by Marriage") + xlab("Marriage") + ylab("Percent (%)") + 
  geom_text(aes(label=paste0(prob, "%")), vjust=-0.5) + theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
        axis.text.x=element_text(size=12), legend.position="none") +
  scale_x_discrete(breaks=c("1", "2"),
        labels=c("Married", "Single"))
p4

p5 <- cs %>%
  filter(70 > age & age > 20) %>%
  filter(mrg_flg == '1' | mrg_flg == '2') %>%

ggplot(aes(age, fill = factor(mrg_flg))) +
  geom_bar() + 
  ggtitle("Customer Age by Marriage") + xlab("Age") + ylab("Freq") + 
  scale_fill_discrete(labels = c("Married", "Single")) +   # legend labels for mrg_flg 1/2
  guides(fill=guide_legend(title="Marriage")) +
  theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
        axis.text.x=element_text(size=12))

p5

3.4. Job

- Customer share by job

#prop.table(table(sa_cs$job))

#require(forcats)
#ggplot(sa_cs, aes(fct_infreq(job))) +
#  geom_bar(fill=fillColor2) +
#  ggtitle("Customer Rate by Job") + xlab("Job") + ylab("Freq") + 
#  theme_economist() + 
#  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
#        axis.text.x=element_text(size=12, angle = 90))
task1 <- sqldf("
select 
    job
    , round(100. * freq/tot_freq, 2) as prob
from (
    select
        job
        , count(*) as freq
    from sa_cs
    group by job
    ) as a, (
    select
        count(*) as tot_freq
    from sa_cs 
    ) as b
")
task1
##              job prob
## 1            Art  0.4
## 2        Company  4.4
## 3   Construction  3.2
## 4      Education  7.2
## 5   Entrepreneur 40.6
## 6        Finance  3.8
## 7             IT  3.2
## 8     Laboratory  1.2
## 9       Logistic  1.4
## 10 Manufacturing 17.8
## 11       Medical  4.6
## 12      Official  3.0
## 13         Press  1.2
## 14    Profession  2.8
## 15        Public  3.2
## 16        Retail  2.0
p6 <- ggplot(data =task1, aes(x = reorder(job, prob), y = prob)) +
  geom_bar(stat = "identity", fill = fillColor2) +
  geom_text(aes(x = job, y = 0.05, label = paste0("(", prob, "%)")),
            hjust=0.5, vjust=.5, size = 2, colour = 'black',
            fontface = 'bold') +
  labs(x = 'Job', 
       y = 'Percent (%)', 
       title = 'Customer Rate by Job') +
  theme_economist() +
  coord_flip()
p6

3.5. Residence

- Customer share by residence

task2 <- sqldf("

select 
    residence
    , round(100. * freq/tot_freq, 2) as prob
from (
    select
        residence
        , count(*) as freq
    from sa_cs
    group by 1
    ) as a, (
    select count(*) as tot_freq
    from sa_cs ) as b
order by 2 desc
")

task2
##          residence prob
## 1       Gangnam-gu 18.0
## 2        Seocho-gu 10.8
## 3          Mapo-gu  9.4
## 4     Seodaemun-gu  9.4
## 5      Gangdong-gu  9.0
## 6        Songpa-gu  6.6
## 7      Gwangjin-gu  5.8
## 8     Seongdong-gu  4.2
## 9       Yongsan-gu  3.8
## 10      Dongjak-gu  3.2
## 11    Eunpyeong-gu  3.2
## 12    Yangcheon-gu  2.8
## 13       Gwanak-gu  1.8
## 14     Jungnang-gu  1.8
## 15         Jung-gu  1.6
## 16       Jongno-gu  1.2
## 17        Nowon-gu  1.2
## 18 Yeongdeungpo-gu  1.2
## 19   Dongdaemun-gu  1.0
## 20       Dobong-gu  0.8
## 21      Gangbuk-gu  0.8
## 22      Gangseo-gu  0.8
## 23    Geumcheon-gu  0.6
## 24     Seongbuk-gu  0.6
## 25         Guro-gu  0.4
p7 <- ggplot(data = task2, aes(x = reorder(residence, prob), y = prob)) +
  geom_bar(stat = "identity", fill = fillColor2) +
  geom_text(aes(x = residence, y = 0.05, label = paste0("(", prob, "%)")),
            hjust = 1, vjust = .5, size = 2, colour = 'black',
            fontface = 'bold') +
  labs(x = "Residence",
       y = "Percent (%)",
       title = "Customer Rate by Residence") +
  theme_economist() +
  coord_flip()
p7

3.6. VIP

- Build a VIP list (top 10 customers by total purchase amount)

sa_data <- inner_join(sa_cs, sa_tr, by='custid')

sa_data %>%
  group_by(custid) %>%
  summarise(amount = sum(amount), freq = n(), avg_amt = round(amount/freq)) %>%
  arrange(desc(amount)) %>%
  head(10)
## # A tibble: 10 x 4
##    custid   amount  freq avg_amt
##     <int>    <int> <int>   <dbl>
##  1  42800 61672778   250  246691
##  2  15968 44478591   228  195082
##  3  13493 35480804   184  192830
##  4  42322 34305296   172  199449
##  5  48278 31569011   230  137257
##  6  27074 29419710    92  319779
##  7  32232 28308279   268  105628
##  8  33829 26050612   148  176018
##  9  30026 23227040    96  241948
## 10  37340 20128817    94  214136
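The dplyr pipeline above ranks customers by their total purchase amount. Below is a base-R sketch of the same idea on made-up transactions. For each customer it computes the total amount, the number of transactions and the average amount per transaction, then keeps the top N. N is 2 here instead of 10, to fit the toy data.

```r
# Toy transactions (hypothetical values); the analysis above uses sa_data
tx <- data.frame(
  custid = c(10, 10, 10, 20, 30, 30),
  amount = c(100000, 250000, 50000, 600000, 80000, 120000)
)

# Total amount and number of transactions per customer (both sorted by custid)
tot  <- aggregate(amount ~ custid, data = tx, FUN = sum)
freq <- aggregate(amount ~ custid, data = tx, FUN = length)
vip  <- data.frame(custid = tot$custid, amount = tot$amount, freq = freq$amount)
vip$avg_amt <- round(vip$amount / vip$freq)

# Rank by total amount and keep the top N
top_n <- 2
vip <- head(vip[order(-vip$amount), ], top_n)
vip
```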

3.7. Age Group

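- Share of transactions by age group (sa_data has one row per transaction, so these are shares of purchases, not of customers)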

sa_data['age_grp'] <- trunc(sa_data['age'] / 10)*10

table(sa_data['age_grp'])
## 
##   20   30   40   50   60 
## 1570 7580 5967 3248  760
prop.table(table(sa_data['age_grp']))
## 
##         20         30         40         50         60 
## 0.08209150 0.39633987 0.31200000 0.16983007 0.03973856
#task3 <- data.frame(prop.table(table(sa_data['age_grp'])))
#task3

task3 <- sqldf("
select
    age_grp
    , round(100. * freq/tot_freq, 2) as prob
from (
    select age_grp
    , count(*) as freq
    from sa_data
    group by 1
    ) as a, (
    select count (*) as tot_freq
    from sa_data 
     ) as b
")
task3
##   age_grp  prob
## 1      20  8.21
## 2      30 39.63
## 3      40 31.20
## 4      50 16.98
## 5      60  3.97
p8 <- ggplot(data = task3, aes(x = factor(age_grp), y = prob)) +  # keep age groups in natural order
  geom_bar(stat="identity", fill=fillColor2) +
  ggtitle("Transaction Share by Age Group") + xlab("Age Group") + ylab("Percent (%)") + 
  #geom_text(aes(label=paste0(round(Freq, 2) * 100, "%")), vjust=-0.5) + theme_economist() + 
  geom_text(aes(label=paste0(prob, "%")), vjust=-0.5) + theme_economist() + 
  theme(plot.title=element_text(hjust=0.5), axis.title=element_text(size=12, face="bold"), 
        axis.text.x=element_text(size=12), legend.position="none")
p8
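The age-group counts above (1,570 + 7,580 + 5,967 + 3,248 + 760 = 19,125) add up to the same total as the `n` columns in section 3.8. So they count transactions, not customers.

The sketch below uses made-up data to contrast the two measures. The decade binning is the same as above. The customer share de-duplicates `custid` first, assuming each customer has a single age.

```r
# Toy joined data: one row per transaction (hypothetical values)
tx <- data.frame(
  custid = c(1, 1, 1, 2, 3),
  age    = c(34, 34, 34, 27, 45)
)

# Same decade binning as above: 34 -> 30, 27 -> 20, 45 -> 40
tx$age_grp <- trunc(tx$age / 10) * 10

# Share of transactions per age group (what prop.table() on sa_data measures)
tx_share <- prop.table(table(tx$age_grp))

# Share of distinct customers per age group: one row per customer first
cust <- unique(tx[, c("custid", "age_grp")])
cust_share <- prop.table(table(cust$age_grp))

tx_share
cust_share
```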

3.8. Customer Transaction

3.8.1. Sex

- Purchase amount per transaction by sex (min, median, max, mean)

sa_data %>%
  group_by(gender) %>%
  summarise(min = min(amount), median = median(amount), max = max(amount),
            mean = mean(amount), n = n()) %>%
  arrange(desc(mean))
## # A tibble: 2 x 6
##   gender   min median     max    mean     n
##   <chr>  <dbl>  <dbl>   <dbl>   <dbl> <int>
## 1 여       840  54354 3930000 110432. 12713
## 2 남       650  51020 8000000 106750.  6412
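Each summary in this section computes the same five statistics per group. In base R, a single `aggregate()` call can return all five if the function returns a named vector. `aggregate()` then stores them as a matrix column, which `do.call(data.frame, ...)` flattens into ordinary columns.

This is a sketch on made-up values. The group labels "F"/"M" are placeholders, not the data's actual codes.

```r
# Toy per-transaction amounts by group (hypothetical values)
tx <- data.frame(
  gender = c("F", "F", "F", "M", "M"),
  amount = c(1000, 3000, 8000, 2000, 4000)
)

# min, median, max, mean and count for one group's amounts
stats <- function(x) c(min = min(x), median = median(x), max = max(x),
                       mean = mean(x), n = length(x))

res <- aggregate(amount ~ gender, data = tx, FUN = stats)
# Flatten the matrix column into amount.min, amount.median, ...
res <- do.call(data.frame, res)
# Sort by mean, largest first, as arrange(desc(mean)) does
res <- res[order(-res$amount.mean), ]
res
```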

3.8.2. Age Group

- Purchase amount per transaction by age group (min, median, max, mean)

sa_data %>%
  group_by(age_grp) %>%
  summarise(min = min(amount), median = median(amount), max = max(amount),
            mean = mean(amount), n = n()) %>%
  arrange(desc(mean))
## # A tibble: 5 x 6
##   age_grp   min median     max    mean     n
##     <dbl> <dbl>  <dbl>   <dbl>   <dbl> <int>
## 1      50  1680 61546. 3500000 126620.  3248
## 2      40   650 55090  7544000 111784.  5967
## 3      30   840 48000  8000000 102553.  7580
## 4      60  2800 56625  2767350 100157.   760
## 5      20  1000 52000  2458000  99780.  1570

3.8.3. Job

- Purchase amount per transaction by job (min, median, max, mean)

sa_data %>%
  group_by(job) %>%
  summarise(min = min(amount), median = median(amount), max = max(amount),
            mean = mean(amount), n = n()) %>%
  arrange(desc(mean))
## # A tibble: 16 x 6
##    job             min median     max    mean     n
##    <fct>         <dbl>  <dbl>   <dbl>   <dbl> <int>
##  1 Press          9300 85644  2933000 198288.   210
##  2 Profession     2500 57418  7544000 134361.   575
##  3 Official       2712 59000  3500000 125971.   341
##  4 Logistic       2350 35432  3930000 116466.   356
##  5 Entrepreneur    650 55000  8000000 112784.  8908
##  6 Construction   5200 62000  1315000 109839.   722
##  7 Company        2000 50000  2539000 109610.  1060
##  8 Medical        1000 56158  2947000 108888.   893
##  9 Retail         2300 54740   900900 105834.   259
## 10 Manufacturing   930 50400  3580000 104917.  2837
## 11 Laboratory     7154 39000  1300000 102630.   171
## 12 Art            1350 55300   468000  92773.    65
## 13 Education       840 50466. 2458000  91845.  1024
## 14 Public         2390 48000  2064000  88796.   562
## 15 Finance        3460 51500  1630000  84618.   621
## 16 IT             2900 39000  1338000  81653.   521

3.8.4. Marriage

- Purchase amount per transaction by marital status (min, median, max, mean)

sa_data %>%
  group_by(marriage) %>%
  summarise(min = min(amount), median = median(amount), max = max(amount),
            mean = mean(amount), n = n()) %>%
  arrange(desc(mean))
## # A tibble: 2 x 6
##   marriage   min median     max    mean     n
##   <chr>    <dbl>  <dbl>   <dbl>   <dbl> <int>
## 1 미혼       840  55170 8000000 116178.  4763
## 2 기혼       650  52373 7544000 106883. 14362

3.8.5. Store

- Purchase amount per transaction by store (min, median, max, mean)

sa_data %>%
  group_by(store) %>%
  summarise(min = min(amount), median = median(amount), max = max(amount),
            mean = mean(amount), n = n()) %>%
  arrange(desc(mean))
## # A tibble: 4 x 6
##   store    min median     max    mean     n
##   <chr>  <dbl>  <dbl>   <dbl>   <dbl> <int>
## 1 본점    1400  59000 7544000 128493.  3722
## 2 무역점  2000  56700 3930000 120541.  4955
## 3 천호점  1400  51550 8000000 103303.  4338
## 4 신촌점   650  49000 2780000  92429.  6110

3.8.6. Brand

- Purchase amount per transaction by brand (min, median, max, mean)

sa_data %>%
  group_by(brand) %>%
  summarise(min = min(amount), median = median(amount), max = max(amount),
            mean = mean(amount), n = n()) %>%
  arrange(desc(n), desc(mean))
## # A tibble: 1,191 x 6
##    brand                min median     max    mean     n
##    <chr>              <dbl>  <dbl>   <dbl>   <dbl> <int>
##  1 식품                 650  25840 1050000  38288.  5267
##  2 지오다노            7800  39800  695000  49828.   301
##  3 랑콤                7000  81000  541000  96482.   251
##  4 크리니크            6000  45000  318000  65926.   188
##  5 샤넬                7000  53000 1039000  83066.   181
##  6 에스티로더          5000  80000  605000 109847.   177
##  7 아모레              5000  70000  374000 102922.   147
##  8 NUK                 3000  14600  120800  24689.   137
##  9 크리스챤디올화장품 15000  64000 1152000 107771.   131
## 10 폴로               15000  98000  473000 133423.   127
## # ... with 1,181 more rows

3.8.7. Corner

- Purchase amount per transaction by corner (min, median, max, mean)

sa_data %>%
  group_by(corner) %>%
  summarise(min = min(amount), median = median(amount), max = max(amount),
            mean = mean(amount), n = n()) %>%
  arrange(desc(mean))
## # A tibble: 26 x 6
##    corner           min median     max    mean     n
##    <chr>          <dbl>  <dbl>   <dbl>   <dbl> <int>
##  1 타운모피       39000 217150 3500000 683760.    52
##  2 디자이너부띠끄 19000 461150 2095000 538516.   160
##  3 가구            3900 241055 2631580 486957.    66
##  4 수입명품       20000 270000 3020000 409793.   353
##  5 가전            6000 158000 3513000 375707.   244
##  6 엘레강스캐주얼 15000 189000 2780000 291944.   397
##  7 침구수예       10000 118000 7544000 277591.   135
##  8 캐릭터캐주얼    1350 169500 3580000 249193.   766
##  9 정장셔츠       15000 101700 1545000 206881.   317
## 10 장신구          3000  75000 3930000 163452.   459
## # ... with 16 more rows
sa_data %>%
  group_by(corner) %>%
  summarise(min = min(amount), median = median(amount), max = max(amount),
            mean = mean(amount), n = n()) %>%
  arrange(desc(n), desc(mean))
## # A tibble: 26 x 6
##    corner         min median     max    mean     n
##    <chr>        <dbl>  <dbl>   <dbl>   <dbl> <int>
##  1 일반식품       650  25840 1050000  38288.  5267
##  2 화장품        3000  64000 1152000  98515.  2158
##  3 유아동복      1180  42000 1053000  67864.  1345
##  4 유니캐주얼    2000  55100  695000  75874.  1138
##  5 스포츠        3500  78000 2200000 119376.  1129
##  6 피혁          8000 120000  807200 119439.   931
##  7 영캐주얼      5000  89600 1524000 128529.   853
##  8 캐릭터캐주얼  1350 169500 3580000 249193.   766
##  9 니트단품      2000  55000  879000  85472    730
## 10 섬유          1500  30000  390000  56192.   676
## # ... with 16 more rows

3.9. Bestsellers

3.9.1. Age Group

- Top corners by age group (ranked by number of transactions)

r2 <- sqldf("
select
    age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2
order by 1, 3 desc
") %>% head(10)


r3 <- sqldf("
select
    age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2
order by 1, 3 desc
") %>% head(10)


r4 <- sqldf("
select
    age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2
order by 1, 3 desc
") %>% head(10)


r5 <- sqldf("
select
    age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2
order by 1, 3 desc
") %>% head(10)
Best Corner by Age Group : 20
age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
20 일반식품 313 14824304 47361.99 1000 32676 375000
20 화장품 178 16473400 92547.19 7000 54000 1000000
20 유아동복 154 6794860 44122.47 2000 29800 274000
20 유니캐주얼 118 9752540 82648.64 9800 59600 695000
20 캐릭터캐주얼 82 17452690 212837.68 27300 168000 1030000
20 니트단품 79 6498800 82263.29 4000 53000 394000
20 영캐주얼 79 11310700 143173.42 22400 98000 809000
20 스포츠 78 7515100 96347.44 12000 66900 420000
20 피혁 76 10057300 132332.89 17000 139250 429000
20 섬유 71 2923020 41169.30 3300 25000 325000


Best Corner by Age Group : 30
age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
30 일반식품 1938 63283777 32654.17 840 24000 760000
30 화장품 941 93226700 99071.94 5000 60000 1152000
30 유아동복 920 61948680 67335.52 1180 39650 1053000
30 스포츠 373 42284240 113362.57 3800 69000 2200000
30 피혁 347 40906850 117887.18 10000 114400 807200
30 캐릭터캐주얼 335 89152950 266128.21 1350 176000 3580000
30 유니캐주얼 324 22456400 69309.88 2000 50000 478000
30 문화완구 312 9974810 31970.54 2300 22500 459000
30 영캐주얼 305 36823020 120731.21 6900 89000 494000
30 섬유 283 16203220 57255.19 3500 28000 390000


Best Corner by Age Group : 40
age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
40 일반식품 1743 74149668 42541.40 650 28131 1050000
40 화장품 570 55518600 97401.05 7000 65000 764000
40 유니캐주얼 504 38059080 75514.05 7400 54500 559920
40 스포츠 426 49990720 117349.11 3500 75500 1400000
40 영캐주얼 298 40340703 135371.49 5000 89050 1524000
40 피혁 298 33273900 111657.38 11000 114500 506000
40 니트단품 212 19594000 92424.53 5000 60000 610000
40 유아동복 209 16370180 78326.22 3800 51000 726000
40 캐릭터캐주얼 207 47670180 230290.72 17500 168000 890000
40 섬유 202 11310180 55990.99 1500 35000 360000


Best Corner by Age Group : 50
age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
50 일반식품 976 37976148 38909.99 1680 25000 468768
50 화장품 325 30649800 94307.08 3000 60000 635000
50 스포츠 206 29581900 143601.46 10000 102750 1550000
50 피혁 183 24039700 131364.48 8000 133200 480000
50 유니캐주얼 175 14618240 83532.80 4000 62400 404500
50 니트단품 153 15241180 99615.56 5500 59000 531200
50 영캐주얼 135 17306200 128194.07 10000 95100 634000
50 캐릭터캐주얼 121 30181800 249436.36 18000 168000 1515000
50 수입명품 110 36179900 328908.18 20000 228750 2853000
50 트래디셔널캐주얼 107 17679900 165232.71 10000 110000 1463000



- Top brands by age group (ranked by number of transactions)

r6 <- sqldf("
select
    age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2
order by 1, 3 desc
") %>% head(10)


r7 <- sqldf("
select
    age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2
order by 1, 3 desc
") %>% head(10)


r8 <- sqldf("
select
    age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2
order by 1, 3 desc
") %>% head(10)


r9 <- sqldf("
select
    age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2
order by 1, 3 desc
") %>% head(10)
Best Brand by Age Group : 20
age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
20 식품 313 14824304 47361.99 1000 32676 375000
20 지오다노 30 2158500 71950.00 9800 50700 695000
20 랑콤 28 2985000 106607.14 24000 77000 315000
20 NUK 24 568560 23690.00 3500 14400 113400
20 비오뗌 23 1579000 68652.17 17000 52000 311000
20 폴로 19 1728600 90978.95 15000 82000 201000
20 샤넬 18 1261000 70055.56 25000 54000 195000
20 메이컵포에버 17 524000 30823.53 12000 24000 93000
20 리바이스 15 1539200 102613.33 20000 105000 200000
20 아가방 15 663150 44210.00 2000 30000 178000


Best Brand by Age Group : 30
age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
30 식품 1938 63283777 32654.17 840 24000 760000
30 NUK 104 2736680 26314.23 3000 16550 120800
30 랑콤 104 10083000 96951.92 12000 79500 541000
30 샤넬 94 9047000 96244.68 7000 57500 1039000
30 크리니크 89 6150000 69101.12 6000 49000 282000
30 지오다노 82 3519100 42915.85 9800 38000 124200
30 에스티로더 76 8427000 110881.58 20000 78000 605000
30 시슬리 63 12073400 191641.27 25000 110000 791000
30 아모레 61 5477500 89795.08 5000 60000 360000
30 블루독 56 4098100 73180.36 4600 55100 255200


Best Brand by Age Group : 40
age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
40 식품 1743 74149668 42541.40 650 28131 1050000
40 지오다노 137 6212800 45348.91 7800 39600 157800
40 랑콤 72 7711000 107097.22 7000 95000 340000
40 크리니크 55 3159000 57436.36 10000 43000 250000
40 에스티로더 52 5533000 106403.85 25000 98000 496000
40 폴로 43 7272500 169127.91 35000 160000 335000
40 샤넬 42 2803000 66738.10 14000 52000 255000
40 아모레 42 5106000 121571.43 20000 100000 374000
40 시슬리 39 7226400 185292.31 60000 154000 496000
40 크리스챤디올화장품 38 3624000 95368.42 29000 57000 259000


Best Brand by Age Group : 50
age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
50 식품 976 37976148 38909.99 1680 25000 468768
50 지오다노 47 2844600 60523.40 9800 57600 184800
50 랑콤 31 1985000 64032.26 25000 50000 226000
50 에스티로더 31 3471000 111967.74 20000 105000 381000
50 아모레 30 3299000 109966.67 15000 87500 361000
50 크리니크 29 2187000 75413.79 20000 40000 318000
50 폴로 28 3434100 122646.43 35000 99000 423500
50 헤레나 26 1731000 66576.92 3000 58000 162000
50 뻬띠앙뜨 23 4568000 198608.70 39000 193500 435000
50 샤넬 22 1456000 66181.82 25000 55000 231000

3.9.2. Sex

- Top corners by sex and age group (ranked by number of transactions)

r10 <- sqldf("
select
    gender
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r11 <- sqldf("
select
    gender
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r12 <- sqldf("
select
    gender
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r13 <- sqldf("
select
    gender
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)
Best Corner by Gender & Age Group : 20
gender age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
20 일반식품 255 12405824 48650.29 1000 33684 375000
20 유아동복 138 5923910 42926.88 2000 29000 274000
20 화장품 137 12201400 89061.31 7000 52000 851000
20 유니캐주얼 91 7814040 85868.57 9800 64100 695000
20 영캐주얼 70 8792700 125610.00 22400 88300 738000
20 스포츠 64 6218200 97159.38 12000 66900 420000
20 문화완구 64 1660120 25939.38 5490 22000 89700
20 니트단품 63 4345600 68977.78 4000 53000 359000
20 캐릭터캐주얼 62 11546900 186240.32 27300 150000 735000
20 일반식품 58 2418480 41697.93 4050 31900 250000


Best Corner by Gender & Age Group : 30
gender age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
30 일반식품 1449 46578180 32145.05 840 23020 760000
30 화장품 677 66675700 98487.00 6000 60000 1039000
30 일반식품 489 16705597 34162.78 2683 25420 312558
30 유아동복 466 30982280 66485.58 1180 40000 712000
30 유아동복 454 30966400 68207.93 3600 39000 1053000
30 화장품 264 26551000 100571.97 5000 55500 1152000
30 캐릭터캐주얼 262 73639150 281065.46 1350 179600 3580000
30 스포츠 250 30976570 123906.28 4000 78500 2200000
30 피혁 243 29710600 122265.84 10000 118000 807200
30 영캐주얼 228 28506680 125029.30 6900 92400 494000


Best Corner by Gender & Age Group : 40
gender age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
40 일반식품 878 36296153 41339.58 1500 29783 500000
40 일반식품 865 37853515 43761.29 650 26740 1050000
40 화장품 396 36416100 91959.85 7000 59000 496000
40 유니캐주얼 312 23593420 75619.94 7800 55200 559920
40 스포츠 276 32745170 118641.92 3500 78700 902000
40 영캐주얼 199 27202900 136697.99 5000 98000 887000
40 피혁 197 22184400 112611.17 15000 116000 358000
40 유니캐주얼 192 14465660 75341.98 7400 49000 433000
40 화장품 174 19102500 109784.48 9000 92000 764000
40 니트단품 151 14767380 97797.22 5000 60000 610000


Best Corner by Gender & Age Group : 50
gender age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
50 일반식품 686 29002908 42278.29 1680 25986 468768
50 일반식품 290 8973240 30942.21 2000 23381 180000
50 화장품 217 21854500 100711.98 3000 60000 635000
50 스포츠 128 17870900 139616.41 10000 100000 1125000
50 피혁 122 16256300 133248.36 8000 134100 480000
50 유니캐주얼 121 10311420 85218.35 9800 59800 404500
50 화장품 108 8795300 81437.96 7400 52500 386000
50 니트단품 93 8399820 90320.65 10000 59000 420000
50 수입명품 88 29715400 337675.00 20000 240000 2853000
50 영캐주얼 79 10712300 135598.73 10000 97300 634000
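As a cross-check of the sqldf queries above, the same per-group statistics (count, sum, mean rounded to 2 decimals, min, median, max) can be sketched in base R, ordered by freq, then tot_amount, then avg_amount (all descending) and cut to the top rows. This is a minimal sketch: the toy data frame is made up and only mimics the sa_data columns these queries use (gender, age_grp, corner, amount), and the helper names group_stats and top_groups are hypothetical.

```r
# Toy stand-in for sa_data (made-up values; only the columns used by the queries)
sa_toy <- data.frame(
  gender  = c("F", "F", "M", "F", "M"),
  age_grp = c("20", "20", "20", "30", "20"),
  corner  = c("food", "food", "sports", "food", "sports"),
  amount  = c(1000, 3000, 5000, 2000, 7000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-group statistics matching the SQL select list
group_stats <- function(df, keys, value = "amount") {
  parts <- split(df, df[keys], drop = TRUE)   # one data frame per key combination
  rows <- lapply(parts, function(p) {
    x <- p[[value]]
    data.frame(p[1, keys, drop = FALSE],
               freq       = length(x),
               tot_amount = sum(x),
               avg_amount = round(mean(x), 2),
               min_amount = min(x),
               med_amount = median(x),
               max_amount = max(x),
               row.names  = NULL)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Hypothetical helper: ORDER BY freq DESC, tot_amount DESC, avg_amount DESC, keep first n rows
top_groups <- function(df, keys, n = 10) {
  s <- group_stats(df, keys)
  s <- s[order(-s$freq, -s$tot_amount, -s$avg_amount), , drop = FALSE]
  rownames(s) <- NULL
  head(s, n)
}

# Counterpart of r10 on the toy data: gender x age_grp x corner within age group 20
r10_toy <- top_groups(subset(sa_toy, age_grp == "20"), c("gender", "age_grp", "corner"))
print(r10_toy)
```

Both toy groups have freq 2, so the tie is broken by tot_amount, which is the same role the second ORDER BY key plays in the SQL.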



- Popular items by gender and age group (brand)

r14 <- sqldf("
select
    gender
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r15 <- sqldf("
select
    gender
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r16 <- sqldf("
select
    gender
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)

r17 <- sqldf("
select
    gender
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)
Best Brand by Gender & Age Group : 20
gender age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
20 식품 255 12405824 48650.29 1000 33684 375000
20 식품 58 2418480 41697.93 4050 31900 250000
20 지오다노 23 1921900 83560.87 9800 59600 695000
20 NUK 23 554560 24111.30 3500 14800 113400
20 랑콤 20 2174000 108700.00 24000 89500 315000
20 비오뗌 20 1347000 67350.00 17000 44500 311000
20 메이컵포에버 15 460000 30666.67 12000 24000 93000
20 샤넬 14 1094000 78142.86 25000 58000 195000
20 아가방 14 485150 34653.57 2000 26875 142200
20 폴로 12 887600 73966.67 15000 80000 156000


Best Brand by Gender & Age Group : 30
gender age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
30 식품 1449 46578180 32145.05 840 23020 760000
30 식품 489 16705597 34162.78 2683 25420 312558
30 랑콤 75 6739000 89853.33 12000 79000 294000
30 NUK 68 1811660 26642.06 3600 15500 107360
30 크리니크 64 4257000 66515.63 20000 49500 213000
30 샤넬 63 6903000 109571.43 24000 59000 1039000
30 에스티로더 57 6636000 116421.05 20000 72000 605000
30 지오다노 51 2174700 42641.18 9800 36400 119000
30 시슬리 49 8836100 180328.57 25000 105000 791000
30 아모레 47 4197000 89297.87 15000 69000 350000


Best Brand by Gender & Age Group : 40
gender age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
40 식품 878 36296153 41339.58 1500 29783 500000
40 식품 865 37853515 43761.29 650 26740 1050000
40 지오다노 88 4393800 49929.55 7800 39800 157800
40 랑콤 54 5489000 101648.15 7000 95000 330000
40 지오다노 49 1819000 37122.45 9800 32800 108000
40 샤넬 36 2288000 63555.56 14000 52000 255000
40 폴로 34 6041500 177691.18 35000 167500 335000
40 아모레 34 4288000 126117.65 20000 100000 374000
40 시슬리 31 5949100 191906.45 60000 154000 496000
40 에스티로더 30 3424000 114133.33 25000 89000 496000


Best Brand by Gender & Age Group : 50
gender age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
50 식품 686 29002908 42278.29 1680 25986 468768
50 식품 290 8973240 30942.21 2000 23381 180000
50 지오다노 32 1991600 62237.50 9800 54100 184800
50 헤레나 26 1731000 66576.92 3000 58000 162000
50 뻬띠앙뜨 21 3954500 188309.52 39000 175000 435000
50 에스티로더 21 2461000 117190.48 20000 105000 381000
50 크리니크 21 1711000 81476.19 20000 43000 318000
50 랑콤 21 1378000 65619.05 25000 45000 226000
50 안지크 19 10030800 527936.84 80000 258000 2780000
50 아가타 19 1094000 57578.95 29000 44000 132000
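The four queries in each family differ only in the age group literal, so one way to avoid the copy-paste is to generate the SQL text from a single template. A sketch under these assumptions: build_top_query is a hypothetical helper, it uses a SQL `limit` clause instead of `%>% head(10)` so that only ten rows come back from SQLite, and the sqldf call itself is left as a comment because it needs sa_data in scope.

```r
# Hypothetical helper: fills the query template used throughout 3.9.x
# dim_col: segment column (gender, marriage, store, job); item_col: corner or brand
build_top_query <- function(dim_col, item_col, age, n = 10L) {
  sprintf("
select
    %s
    , age_grp
    , %s
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '%s'
group by 1, 2, 3
order by 4 desc, 5 desc, 6 desc
limit %d
", dim_col, item_col, age, n)
}

ages <- c("20", "30", "40", "50")

# One query string per age group, named by age group
queries <- setNames(lapply(ages, function(a) build_top_query("gender", "brand", a)), ages)
cat(queries[["30"]])

# With sqldf attached and sa_data defined, the four tables (r14 to r17) could be fetched as:
# results <- lapply(queries, function(q) sqldf::sqldf(q))
```

Swapping "gender" for "marriage", "store", or "job", and "brand" for "corner", gives the query text for the other families in 3.9.x.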

3.9.3. Marriage

- Popular items by marital status and age group (corner)

r18 <- sqldf("
select
    marriage
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r19 <- sqldf("
select
    marriage
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r20 <- sqldf("
select
    marriage
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r21 <- sqldf("
select
    marriage
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)
Best Corner by Marriage & Age Group : 20
marriage age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
미혼 20 일반식품 218 10849125 49766.63 1190 33246.5 375000
미혼 20 화장품 133 12438400 93521.80 7000 65000.0 1000000
미혼 20 유아동복 100 4081860 40818.60 3500 28295.0 190000
미혼 20 유니캐주얼 95 6788800 71461.05 9800 54800.0 323000
기혼 20 일반식품 95 3975179 41843.99 1000 29477.0 160159
미혼 20 영캐주얼 75 11111900 148158.67 22400 100800.0 809000
미혼 20 캐릭터캐주얼 68 14617690 214966.03 27300 161500.0 1030000
미혼 20 니트단품 62 5796200 93487.10 7300 55000.0 394000
미혼 20 스포츠 58 6096800 105117.24 12000 69500.0 420000
미혼 20 피혁 55 7633600 138792.73 17000 145000.0 429000


Best Corner by Marriage & Age Group : 30
marriage age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
기혼 30 일반식품 1295 43128045 33303.51 930 25779 390000
기혼 30 유아동복 714 46545290 65189.48 1180 40000 1053000
미혼 30 일반식품 643 20155732 31346.40 840 20000 760000
기혼 30 화장품 476 46456300 97597.27 5000 61500 1039000
미혼 30 화장품 465 46770400 100581.51 6000 58000 1152000
기혼 30 문화완구 237 7497790 31636.24 2300 21000 459000
기혼 30 스포츠 223 27724490 124325.07 3800 69000 2200000
기혼 30 유니캐주얼 217 14494700 66795.85 6000 50400 396000
미혼 30 캐릭터캐주얼 206 61041900 296319.90 7000 189500 3580000
미혼 30 유아동복 206 15403390 74773.74 3000 39000 1008810


Best Corner by Marriage & Age Group : 40
marriage age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
기혼 40 일반식품 1654 68496913 41412.89 650 28101 1050000
기혼 40 화장품 510 47893600 93909.02 7000 64000 764000
기혼 40 유니캐주얼 477 35592580 74617.57 7400 53000 559920
기혼 40 스포츠 399 45776970 114729.25 3500 74000 1400000
기혼 40 피혁 282 31413900 111396.81 11000 117000 506000
기혼 40 영캐주얼 242 31380203 129670.26 5000 89000 1524000
기혼 40 유아동복 205 16184180 78947.22 3800 51000 726000
기혼 40 니트단품 196 17586400 89726.53 5000 60000 610000
기혼 40 섬유 192 10606680 55243.12 1500 35000 360000
기혼 40 트래디셔널캐주얼 172 24430900 142040.12 10000 95000 1780100


Best Corner by Marriage & Age Group : 50
marriage age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
기혼 50 일반식품 832 33137378 39828.58 1680 25101 468768
기혼 50 화장품 306 27421800 89613.73 3000 58000 635000
기혼 50 스포츠 186 27634300 148571.51 10000 107550 1550000
기혼 50 유니캐주얼 172 14471640 84137.44 4000 62850 404500
기혼 50 피혁 169 22064900 130561.54 8000 133200 480000
기혼 50 니트단품 147 14888180 101280.14 5500 59000 531200
미혼 50 일반식품 144 4838770 33602.57 3140 24885 193098
기혼 50 영캐주얼 121 15186500 125508.26 10000 96000 634000
기혼 50 캐릭터캐주얼 120 30121800 251015.00 18000 172500 1515000
기혼 50 트래디셔널캐주얼 106 17061600 160958.49 10000 109950 1463000



- Popular items by marital status and age group (brand)

r22 <- sqldf("
select
    marriage
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r23 <- sqldf("
select
    marriage
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r24 <- sqldf("
select
    marriage
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)

r25 <- sqldf("
select
    marriage
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)
Best Brand by Marriage & Age Group : 20
marriage age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
미혼 20 식품 218 10849125 49766.63 1190 33246.5 375000
기혼 20 식품 95 3975179 41843.99 1000 29477.0 160159
미혼 20 지오다노 28 1403900 50139.29 9800 43800.0 104400
미혼 20 랑콤 26 2684000 103230.77 24000 77000.0 315000
미혼 20 폴로 19 1728600 90978.95 15000 82000.0 201000
미혼 20 샤넬 17 1100000 64705.88 25000 53000.0 195000
기혼 20 비오뗌 14 999000 71357.14 21000 38500.0 311000
미혼 20 메이컵포에버 14 459000 32785.71 14000 22000.0 93000
미혼 20 시스템 13 2251000 173153.85 22400 119000.0 809000
미혼 20 리바이스 13 1394200 107246.15 20000 105000.0 200000


Best Brand by Marriage & Age Group : 30
marriage age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
기혼 30 식품 1295 43128045 33303.51 930 25779 390000
미혼 30 식품 643 20155732 31346.40 840 20000 760000
기혼 30 NUK 64 1669020 26078.44 4000 16250 120800
기혼 30 지오다노 57 2646700 46433.33 9800 39600 124200
기혼 30 블루독 54 3990100 73890.74 4600 55100 255200
기혼 30 랑콤 53 5700000 107547.17 12000 98000 294000
기혼 30 에스티로더 51 6262000 122784.31 27000 72000 605000
미혼 30 랑콤 51 4383000 85941.18 15000 68000 541000
기혼 30 크리니크 51 4040000 79215.69 20000 46000 282000
기혼 30 샤넬 50 4661000 93220.00 7000 52000 1039000


Best Brand by Marriage & Age Group : 40
marriage age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
기혼 40 식품 1654 68496913 41412.89 650 28101 1050000
기혼 40 지오다노 130 5733200 44101.54 7800 39600 144400
미혼 40 식품 89 5652755 63514.10 3091 30000 1020000
기혼 40 랑콤 66 6963000 105500.00 7000 95000 340000
기혼 40 크리니크 55 3159000 57436.36 10000 43000 250000
기혼 40 에스티로더 48 4905000 102187.50 25000 102000 255000
기혼 40 크리스챤디올화장품 37 3525000 95270.27 29000 57000 259000
기혼 40 폴로 36 5530500 153625.00 35000 133500 335000
기혼 40 아모레 36 3940000 109444.44 20000 95000 374000
기혼 40 시슬리 34 6682100 196532.35 60000 160000 496000


Best Brand by Marriage & Age Group : 50
marriage age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
기혼 50 식품 832 33137378 39828.58 1680 25101 468768
미혼 50 식품 144 4838770 33602.57 3140 24885 193098
기혼 50 지오다노 45 2723800 60528.89 9800 49800 184800
기혼 50 아모레 30 3299000 109966.67 15000 87500 361000
기혼 50 랑콤 30 1900000 63333.33 25000 48000 226000
기혼 50 에스티로더 29 3176000 109517.24 20000 103000 381000
기혼 50 크리니크 29 2187000 75413.79 20000 40000 318000
기혼 50 폴로 28 3434100 122646.43 35000 99000 423500
기혼 50 헤레나 26 1731000 66576.92 3000 58000 162000
기혼 50 뻬띠앙뜨 23 4568000 198608.70 39000 193500 435000

3.9.4. Store

- Popular items by store and age group (corner)

r26 <- sqldf("
select
    store
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r27 <- sqldf("
select
    store
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r28 <- sqldf("
select
    store
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r29 <- sqldf("
select
    store
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)
Best Corner by Store & Age Group : 20
store age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
신촌점 20 일반식품 175 8760646 50060.83 1000 38152 250000
신촌점 20 화장품 76 5498900 72353.95 10000 48500 327000
신촌점 20 유니캐주얼 67 4869540 72679.70 12000 54800 279200
무역점 20 일반식품 61 2195615 35993.69 3000 19352 375000
신촌점 20 유아동복 59 3385260 57377.29 7500 39000 274000
본점 20 일반식품 57 3230948 56683.30 7000 44485 300000
본점 20 화장품 52 5701500 109644.23 14000 77500 1000000
천호점 20 유아동복 46 1531200 33286.96 4500 25800 148000
무역점 20 화장품 37 3517000 95054.05 15000 88000 315000
천호점 20 문화완구 36 795180 22088.33 7600 17900 58000


Best Corner by Store & Age Group : 30
store age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
신촌점 30 일반식품 814 23181935 28479.04 840 21755.5 220000
무역점 30 일반식품 418 12751348 30505.62 2350 21580.0 760000
천호점 30 일반식품 374 12345166 33008.47 2500 24915.0 312558
천호점 30 유아동복 358 24537410 68540.25 3000 37050.0 1053000
신촌점 30 유아동복 339 20134390 59393.48 1180 37000.0 712000
본점 30 일반식품 332 15005328 45196.77 3460 35774.5 390000
신촌점 30 화장품 270 26614100 98570.74 6000 65000.0 1039000
무역점 30 화장품 239 24011000 100464.44 14000 60000.0 740000
본점 30 화장품 218 25380000 116422.02 13000 58500.0 1152000
천호점 30 화장품 214 17221600 80474.77 5000 55500.0 350000


Best Corner by Store & Age Group : 40
store age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
천호점 40 일반식품 529 27565162 52108.06 1400 31609 1020000
무역점 40 일반식품 498 20586674 41338.70 3091 30658 500000
신촌점 40 일반식품 471 14063379 29858.55 650 21800 311598
본점 40 일반식품 245 11934453 48712.05 1400 30000 1050000
신촌점 40 유니캐주얼 178 13480440 75732.81 7400 49350 452000
천호점 40 화장품 166 15501600 93383.13 7000 69000 374000
무역점 40 화장품 163 17635000 108190.18 8000 76000 764000
천호점 40 유니캐주얼 140 9146920 65335.14 7800 45900 339000
무역점 40 유니캐주얼 139 11114600 79961.15 12000 69000 404000
신촌점 40 화장품 139 11107000 79906.47 9000 58000 259000


Best Corner by Store & Age Group : 50
store age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
본점 50 일반식품 351 12933605 36847.88 1680 21001.0 468768
무역점 50 일반식품 332 10988844 33098.93 2000 24595.0 200000
신촌점 50 일반식품 224 9944397 44394.63 3464 34823.5 420565
본점 50 화장품 129 12330500 95585.27 3000 58000.0 607000
무역점 50 화장품 105 9995000 95190.48 5000 64000.0 635000
무역점 50 스포츠 70 11651400 166448.57 21000 115000.0 1550000
천호점 50 일반식품 69 4109302 59555.10 5000 45958.0 236791
무역점 50 피혁 62 7787100 125598.39 8000 124500.0 480000
무역점 50 유니캐주얼 57 5432420 95305.61 13000 69600.0 404500
무역점 50 수입명품 55 23001600 418210.91 35000 315000.0 2853000
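The top-10 lists stack all stores in one long table, which makes side-by-side comparison awkward. A hedged sketch of a wide view: base R's xtabs() can cross-tabulate purchase counts (the freq column) and summed amounts (tot_amount) as a store-by-corner matrix for one age group, and zero cells expose combinations that never reach a top-10 list. The store and corner values in the toy data are placeholders, not values from the dataset.

```r
# Hypothetical toy purchases (column names mirror sa_data; values are made up)
toy <- data.frame(
  store   = c("A", "A", "B", "B", "B"),
  age_grp = c("20", "20", "20", "20", "30"),
  corner  = c("food", "cosmetics", "food", "food", "food"),
  amount  = c(1000, 50000, 2000, 3000, 4000),
  stringsAsFactors = FALSE
)

age20 <- subset(toy, age_grp == "20")

# Purchase counts (freq) laid out as store x corner
freq_tab <- xtabs(~ store + corner, data = age20)

# Total amounts (tot_amount) in the same layout
amt_tab <- xtabs(amount ~ store + corner, data = age20)

print(freq_tab)
print(amt_tab)
```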



- Popular items by store and age group (brand)

r30 <- sqldf("
select
    store
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r31 <- sqldf("
select
    store
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r32 <- sqldf("
select
    store
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)

r33 <- sqldf("
select
    store
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)
Best Brand by Store & Age Group : 20
store age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
신촌점 20 식품 175 8760646 50060.83 1000 38152 250000
무역점 20 식품 61 2195615 35993.69 3000 19352 375000
본점 20 식품 57 3230948 56683.30 7000 44485 300000
천호점 20 식품 20 637095 31854.75 5500 28497 69669
신촌점 20 비오뗌 17 806000 47411.76 17000 39000 105000
신촌점 20 지오다노 14 721200 51514.29 19800 47300 99400
신촌점 20 랑콤 10 1143000 114300.00 31000 82500 301000
무역점 20 랑콤 10 1102000 110200.00 35000 97500 315000
신촌점 20 메이컵포에버 10 273000 27300.00 12000 20000 93000
신촌점 20 샤넬 9 505000 56111.11 29000 34000 161000


Best Brand by Store & Age Group : 30
store age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
신촌점 30 식품 814 23181935 28479.04 840 21755.5 220000
무역점 30 식품 418 12751348 30505.62 2350 21580.0 760000
천호점 30 식품 374 12345166 33008.47 2500 24915.0 312558
본점 30 식품 332 15005328 45196.77 3460 35774.5 390000
천호점 30 NUK 46 1123300 24419.57 3000 17000.0 89000
본점 30 샤넬 41 4547000 110902.44 23000 59000.0 617000
천호점 30 아모레 39 3608500 92525.64 5000 70000.0 350000
신촌점 30 NUK 38 1127360 29667.37 5000 16250.0 120800
신촌점 30 샤넬 35 3572000 102057.14 7000 59000.0 1039000
신촌점 30 랑콤 33 3424000 103757.58 15000 80000.0 294000


Best Brand by Store & Age Group : 40
store age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
천호점 40 식품 529 27565162 52108.06 1400 31609 1020000
무역점 40 식품 498 20586674 41338.70 3091 30658 500000
신촌점 40 식품 471 14063379 29858.55 650 21800 311598
본점 40 식품 245 11934453 48712.05 1400 30000 1050000
천호점 40 지오다노 53 2184700 41220.75 7800 39600 157800
무역점 40 지오다노 36 1946800 54077.78 16800 39800 144400
신촌점 40 랑콤 29 2658000 91655.17 17000 81000 195000
본점 40 폴로 27 4901000 181518.52 35000 175000 325000
천호점 40 아모레 26 3138000 120692.31 20000 100000 374000
신촌점 40 지오다노 26 1021400 39284.62 9800 31300 137200


Best Brand by Store & Age Group : 50
store age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
본점 50 식품 351 12933605 36847.88 1680 21001.0 468768
무역점 50 식품 332 10988844 33098.93 2000 24595.0 200000
신촌점 50 식품 224 9944397 44394.63 3464 34823.5 420565
천호점 50 식품 69 4109302 59555.10 5000 45958.0 236791
무역점 50 헤레나 21 1554000 74000.00 8000 58000.0 162000
신촌점 50 안지크 18 8534800 474155.56 80000 253000.0 2780000
본점 50 뻬띠앙뜨 18 3566500 198138.89 39000 185000.0 435000
무역점 50 지오다노 17 952800 56047.06 21800 48200.0 119400
본점 50 에스티로더 15 1728000 115200.00 20000 91000.0 381000
본점 50 크리니크 15 1437000 95800.00 20000 48000.0 318000

3.9.5. Job

- Popular items by occupation and age group (corner)

r34 <- sqldf("
select
    job
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r35 <- sqldf("
select
    job
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r36 <- sqldf("
select
    job
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r37 <- sqldf("
select
    job
    , age_grp
    , corner
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)
Best Corner by Job & Age Group : 20
job age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
Entrepreneur 20 일반식품 87 4349841 49998.17 5500 41630.0 160159
Manufacturing 20 유아동복 80 2881810 36022.62 4500 27700.0 148000
Manufacturing 20 일반식품 66 2804128 42486.79 3000 24304.5 375000
Education 20 일반식품 55 3404876 61906.84 4000 48869.0 153256
Entrepreneur 20 화장품 52 4923900 94690.38 12000 47000.0 851000
Company 20 일반식품 44 1912306 43461.50 4050 32072.0 250000
Entrepreneur 20 유아동복 38 2254850 59338.16 2000 36720.0 274000
Medical 20 일반식품 38 1438278 37849.42 1000 15960.0 300000
Manufacturing 20 문화완구 37 953080 25758.92 7600 20000.0 89700
Manufacturing 20 유니캐주얼 33 2406700 72930.30 9800 49000.0 301000


Best Corner by Job & Age Group : 30
job age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
Entrepreneur 30 일반식품 759 23575264 31060.95 2600 24020 390000
Entrepreneur 30 유아동복 325 24747920 76147.45 3600 44000 1053000
Entrepreneur 30 화장품 324 31132300 96087.35 6000 59000 1109000
Manufacturing 30 일반식품 285 9867681 34623.44 930 25312 760000
Manufacturing 30 유아동복 278 16836110 60561.55 1180 37300 612090
Manufacturing 30 화장품 198 19136500 96648.99 7000 63000 740000
Entrepreneur 30 스포츠 145 14695020 101344.97 7500 67000 394000
Entrepreneur 30 캐릭터캐주얼 128 37923300 296275.78 9000 198000 1179000
Finance 30 일반식품 125 4749038 37992.30 3460 35051 161400
Logistic 30 일반식품 118 2846662 24124.25 2350 16860 116000


Best Corner by Job & Age Group : 40
job age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
Entrepreneur 40 일반식품 1078 46946882 43549.98 650 27281.0 1050000
Entrepreneur 40 화장품 327 31369800 95932.11 7500 62000.0 496000
Entrepreneur 40 유니캐주얼 310 23574940 76048.19 7400 55000.0 559920
Entrepreneur 40 스포츠 268 31304420 116807.54 5000 78700.0 902000
Entrepreneur 40 피혁 208 22489500 108122.60 11000 91000.0 506000
Manufacturing 40 일반식품 166 4996687 30100.52 3091 26587.5 103088
Medical 40 일반식품 160 7029628 43935.18 3000 36170.0 400000
Entrepreneur 40 영캐주얼 158 18576000 117569.62 9000 91800.0 887000
Entrepreneur 40 트래디셔널캐주얼 140 18475600 131968.57 10000 97300.0 420000
Entrepreneur 40 니트단품 134 12194240 91001.79 5000 53550.0 504000


Best Corner by Job & Age Group : 50
job age_grp corner freq tot_amount avg_amount min_amount med_amount max_amount
Entrepreneur 50 일반식품 381 14490435 38032.64 1680 26089 420565
Entrepreneur 50 화장품 133 13477600 101335.34 5000 62000 635000
Company 50 일반식품 125 2620449 20963.59 2000 17754 64647
Entrepreneur 50 피혁 111 14963600 134807.21 10000 138000 412200
Entrepreneur 50 유니캐주얼 111 9011000 81180.18 4000 59800 380000
Profession 50 일반식품 103 3490670 33890.00 3140 23386 193098
IT 50 일반식품 99 2832834 28614.48 3464 17493 160000
Construction 50 일반식품 95 5154091 54253.59 10000 38293 236791
Entrepreneur 50 스포츠 94 12959500 137867.02 14000 98200 1125000
Entrepreneur 50 캐릭터캐주얼 89 23030100 258765.17 18000 177000 1515000



- Popular items by occupation and age group (brand)

r38 <- sqldf("
select
    job
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '20'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r39 <- sqldf("
select
    job
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '30'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)


r40 <- sqldf("
select
    job
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '40'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)

r41 <- sqldf("
select
    job
    , age_grp
    , brand
    , count(*)              as freq
    , sum(amount)           as tot_amount
    , round(avg(amount), 2) as avg_amount
    , min(amount)           as min_amount
    , median(amount)        as med_amount
    , max(amount)           as max_amount
from sa_data
where age_grp = '50'
group by 1, 2, 3
order by 4 desc , 5 desc, 6 desc
") %>% head(10)
Best Brand by Job & Age Group : 20
job age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
Entrepreneur 20 식품 87 4349841 49998.17 5500 41630.0 160159
Manufacturing 20 식품 66 2804128 42486.79 3000 24304.5 375000
Education 20 식품 55 3404876 61906.84 4000 48869.0 153256
Company 20 식품 44 1912306 43461.50 4050 32072.0 250000
Medical 20 식품 38 1438278 37849.42 1000 15960.0 300000
Finance 20 식품 19 790191 41589.00 8700 30492.0 188220
Medical 20 지오다노 11 683400 62127.27 19800 59600.0 99400
Entrepreneur 20 NUK 11 355740 32340.00 7500 20800.0 113400
Entrepreneur 20 랑콤 9 1169000 129888.89 25000 65000.0 315000
Medical 20 랑콤 9 812000 90222.22 24000 89000.0 202000


Best Brand by Job & Age Group : 30
job age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
Entrepreneur 30 식품 759 23575264 31060.95 2600 24020 390000
Manufacturing 30 식품 285 9867681 34623.44 930 25312 760000
Finance 30 식품 125 4749038 37992.30 3460 35051 161400
Logistic 30 식품 118 2846662 24124.25 2350 16860 116000
Public 30 식품 110 2789175 25356.14 2500 19350 92111
Education 30 식품 93 1792794 19277.35 840 10000 300000
Company 30 식품 71 2290769 32264.35 5000 25420 140000
Profession 30 식품 69 2725934 39506.29 4000 27000 180000
IT 30 식품 67 1657305 24735.90 2900 18233 91235
Medical 30 식품 61 2004601 32862.31 3000 19650 220000


Best Brand by Job & Age Group : 40
job age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
Entrepreneur 40 식품 1078 46946882 43549.98 650 27281.0 1050000
Manufacturing 40 식품 166 4996687 30100.52 3091 26587.5 103088
Medical 40 식품 160 7029628 43935.18 3000 36170.0 400000
Education 40 식품 112 4124437 36825.33 3600 29297.5 160000
Entrepreneur 40 지오다노 84 3998900 47605.95 9800 39800.0 157800
Construction 40 식품 77 5907308 76718.29 6750 43224.0 1020000
Entrepreneur 40 랑콤 53 5998000 113169.81 17000 100000.0 340000
Official 40 식품 38 883056 23238.32 3350 15275.0 80000
Entrepreneur 40 폴로 36 6420500 178347.22 35000 170000.0 335000
Logistic 40 식품 35 497301 14208.60 3800 11902.0 53000


Best Brand by Job & Age Group : 50
job age_grp brand freq tot_amount avg_amount min_amount med_amount max_amount
Entrepreneur 50 식품 381 14490435 38032.64 1680 26089.0 420565
Company 50 식품 125 2620449 20963.59 2000 17754.0 64647
Profession 50 식품 103 3490670 33890.00 3140 23386.0 193098
IT 50 식품 99 2832834 28614.48 3464 17493.0 160000
Construction 50 식품 95 5154091 54253.59 10000 38293.0 236791
Manufacturing 50 식품 55 2681787 48759.76 3600 37761.0 200000
Education 50 식품 26 1356718 52181.46 5900 48384.5 112000
Press 50 식품 25 3793028 151721.12 11419 117509.0 468768
Education 50 헤레나 25 1707000 68280.00 3000 58000.0 162000
Laboratory 50 식품 25 545013 21800.52 7536 16950.0 81236
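The same ranking can also be computed without sqldf. Below is a base-R sketch that makes a few assumptions. `sa` is a small hypothetical stand-in for `sa_data`, with the columns used in the queries above. `best_brands` is an illustrative helper name. min/max are left out for brevity.

```r
# Hypothetical toy data standing in for sa_data
sa <- data.frame(
  job     = c("Medical", "Medical", "Medical", "IT", "IT"),
  age_grp = c("20", "20", "20", "30", "30"),
  brand   = c("food", "food", "cosmetics", "food", "food"),
  amount  = c(1000, 3000, 50000, 2000, 4000)
)

# Illustrative helper: count, total, mean and median per job x brand for one age group,
# ordered like the SQL above (freq desc, then tot_amount desc, then avg_amount desc)
best_brands <- function(df, grp, n = 10) {
  rows <- df[df$age_grp == grp, ]
  stats <- aggregate(amount ~ job + age_grp + brand, data = rows,
                     FUN = function(x) c(freq = length(x), tot_amount = sum(x),
                                         avg_amount = round(mean(x), 2),
                                         med_amount = median(x)))
  stats <- do.call(data.frame, stats)              # flatten the matrix column
  names(stats) <- sub("^amount\\.", "", names(stats))
  stats <- stats[order(-stats$freq, -stats$tot_amount, -stats$avg_amount), ]
  head(stats, n)
}

best_brands(sa, "20")
```

`do.call(data.frame, ...)` flattens the matrix column that `aggregate()` returns when `FUN` yields several statistics.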

4. Feature Engineering

  • 14 derived variables are created

4.1. v1 : Variables for customers' refund behavior (amount, count)

cs <- read.any('./input/HDS_Customers.tab', sep="\t", header=T, stringsAsFactors = F)
tr <- read.any('./input/HDS_Transactions_MG.tab', sep="\t", header=T, stringsAsFactors = F)


cs.v1 = tr %>%
  filter(net_amt < 0) %>%
  group_by(custid) %>%
  summarise(rf_amt = sum(net_amt), rf_cnt = n())

head(cs.v1)
## # A tibble: 6 x 3
##   custid   rf_amt rf_cnt
##    <int>    <int>  <int>
## 1      1  -191920      1
## 2      2  -259350      2
## 3      3 -3849460     15
## 4      6  -221730      4
## 5      8 -1934341     21
## 6      9  -820800      6
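Here is a base-R sketch of the refund aggregation on hypothetical toy data, for checking the logic by hand. As in the dplyr version, a customer with no refund (custid 3 here) does not appear in the result at all. This is also why custid 4 and 5 are absent from the printed output above. A later join would therefore likely need to treat such customers as 0.

```r
# Hypothetical toy transactions: a negative net_amt is a refund
tr <- data.frame(custid  = c(1, 1, 2, 2, 2, 3),
                 net_amt = c(50000, -20000, -10000, -5000, 30000, 8000))

refunds <- tr[tr$net_amt < 0, ]
rf_amt <- tapply(refunds$net_amt, refunds$custid, sum)     # total refunded amount
rf_cnt <- tapply(refunds$net_amt, refunds$custid, length)  # number of refunds
cs_v1 <- data.frame(custid = as.integer(names(rf_amt)),
                    rf_amt = as.vector(rf_amt),
                    rf_cnt = as.vector(rf_cnt))
cs_v1
```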

4.2. v2 : Variable for the diversity of products purchased (number of distinct brands)

cs.v2 = tr %>%
  distinct(custid, brd_nm) %>%
  group_by(custid) %>%
  summarize(buy_brd = n())

head(cs.v2)
## # A tibble: 6 x 2
##   custid buy_brd
##    <int>   <int>
## 1      1      23
## 2      2      16
## 3      3      30
## 4      4       6
## 5      5       4
## 6      6      14
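Counting distinct brands amounts to de-duplicating (custid, brand) pairs and then counting rows per customer. Here is a minimal base-R sketch on hypothetical toy data:

```r
# Hypothetical toy transactions
tr <- data.frame(custid = c(1, 1, 1, 2, 2),
                 brd_nm = c("A", "A", "B", "C", "C"))

pairs   <- unique(tr[, c("custid", "brd_nm")])   # one row per (customer, brand)
buy_brd <- table(pairs$custid)                    # distinct brands per customer
cs_v2 <- data.frame(custid  = as.integer(names(buy_brd)),
                    buy_brd = as.vector(buy_brd))
cs_v2
```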

4.3. v3 : Number of visit days and average purchase interval of each customer

start_date = ymd(ymd_hms(min(tr$sales_date)))
end_date = ymd(ymd_hms(max(tr$sales_date)))

cs.v3 = tr %>%
  distinct(custid, sales_date) %>%
  group_by(custid) %>%
  summarise(visits = n()) %>% 
  mutate(API = as.integer(end_date - start_date) / visits)

head(cs.v3)
## # A tibble: 6 x 3
##   custid visits    API
##    <int>  <int>  <dbl>
## 1      1     41   8.85
## 2      2     11  33   
## 3      3     27  13.4 
## 4      4      4  90.8 
## 5      5      3 121   
## 6      6     15  24.2
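API divides the length of the observation window by the number of visit days. As a worked check, assume the window runs from 2000-05-01 to 2001-04-29 (the collection period given in the introduction). That window is 363 days long, and it reproduces the API column printed above:

```r
start_date <- as.Date("2000-05-01")
end_date   <- as.Date("2001-04-29")
period <- as.integer(end_date - start_date)   # length of the window in days

visits <- c(41, 11, 27, 4, 3, 15)             # visit days of customers 1-6, from the output above
api <- period / visits                        # window length per visit
round(api, 2)
```

Note that this measures window length per visit, not the mean gap between consecutive visits. The two differ for customers whose visits are concentrated in one part of the year.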

4.4. v4 : Number of purchases per visit (NPPV)

tmp = tr %>%
  group_by(custid) %>%
  summarise(n = n())

head(tmp)
## # A tibble: 6 x 2
##   custid     n
##    <int> <int>
## 1      1    77
## 2      2    28
## 3      3    68
## 4      4     6
## 5      5     4
## 6      6    29
cs.v4 = inner_join(cs.v3, tmp, by = "custid") %>%
  mutate(NPPV = n / visits) %>%
  select(custid, NPPV)

head(cs.v4)
## # A tibble: 6 x 2
##   custid  NPPV
##    <int> <dbl>
## 1      1  1.88
## 2      2  2.55
## 3      3  2.52
## 4      4  1.5 
## 5      5  1.33
## 6      6  1.93
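NPPV is a plain ratio of two per-customer counts, computed after an inner join on custid. The same calculation in base R uses the counts of customers 1-3 copied from the outputs above and reproduces the printed NPPV values:

```r
# Visit days (from v3) and transaction counts (tmp) of customers 1-3
v3  <- data.frame(custid = 1:3, visits = c(41, 11, 27))
tmp <- data.frame(custid = 1:3, n = c(77, 28, 68))

cs_v4 <- merge(v3, tmp, by = "custid")   # inner join on custid
cs_v4$NPPV <- cs_v4$n / cs_v4$visits     # purchases per visit
round(cs_v4$NPPV, 2)
```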

4.5. v5 : Variable for the weekday/weekend purchase pattern

cs.v5 = tr %>%
  mutate(wk_amt = ifelse(wday(sales_date) %in% 2:6, net_amt, 0),       # Mon-Fri (wday: 1 = Sunday)
         we_amt = ifelse(wday(sales_date) %in% c(1, 7), net_amt, 0)) %>% # Sun, Sat
  group_by(custid) %>%
  summarise(across(c(wk_amt, we_amt), sum)) %>%
  # "주중형" = weekday type, "주말형" = weekend type, "유형없음" = no clear pattern
  mutate(wk_pat = ifelse(wk_amt >= we_amt * 1.5, "주중형",
                         ifelse(we_amt >= wk_amt * 1.5, "주말형", "유형없음")))

head(cs.v5)
## # A tibble: 6 x 4
##   custid  wk_amt  we_amt wk_pat
##    <int>   <dbl>   <dbl> <chr> 
## 1      1 2590892 1167489 주중형
## 2      2 1735009  326170 주중형
## 3      3 5742729  683320 주중형
## 4      4  254300  127000 주중형
## 5      5  129600   26100 주중형
## 6      6  460900 1556500 주말형
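By default, lubridate's `wday()` numbers days starting from Sunday = 1. So `2:6` selects Monday to Friday, and `c(1, 7)` selects Sunday and Saturday. Here is a base-R sketch of the same split and of the 1.5x rule. The helper names are illustrative, and the English labels stand in for 주중형/주말형/유형없음. The example amounts are customers 1 and 6 from the output above, plus one hypothetical balanced customer.

```r
# TRUE for Saturday/Sunday; as.POSIXlt()$wday counts 0 = Sunday ... 6 = Saturday
is_weekend <- function(d) as.POSIXlt(d)$wday %in% c(0, 6)

# A pattern is assigned only when one side is at least 1.5 times the other
classify_week <- function(wk_amt, we_amt) {
  ifelse(wk_amt >= we_amt * 1.5, "weekday",
         ifelse(we_amt >= wk_amt * 1.5, "weekend", "none"))
}

is_weekend(as.Date(c("2000-05-01", "2000-05-06", "2000-05-07")))  # Mon, Sat, Sun
classify_week(c(2590892, 460900, 100), c(1167489, 1556500, 120))
```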

4.6. v6 : Age and age group at a reference date, computed from the customer's birth date

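# v6: compute age and age group at a reference date from the customer's birth date
# Using the current date (2018-11-10) would not match the data period and would distort
# the age-group classification, so the reference date given in the slides (2001-05-01) is used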

cs.v6 = cs %>%
  mutate(age=year('2001-05-01') - year(ymd_hms(birth))) %>%
  mutate(age=ifelse(age < 10 | age > 100, NA, age)) %>%
  mutate(age=ifelse(is.na(age),round(mean(age,na.rm=T)),age)) %>%
  mutate(agegrp=cut(age, c(0,19,29,39,49,59,69,100), labels=F)*10) %>%
  select(custid, age, agegrp)

head(cs.v6)
##   custid age agegrp
## 1      1  36     30
## 2      2  36     30
## 3      3  36     30
## 4      4  19     10
## 5      5  19     10
## 6      6  20     20
tail(cs.v6)
##       custid age agegrp
## 49990  49995  28     20
## 49991  49996  90     70
## 49992  49997  90     70
## 49993  49998  31     30
## 49994  49999  58     50
## 49995  50000  36     30
## No parse-error handling needed.
sum(is.na(cs.v6))
## [1] 0
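Here is a base-R sketch of the age logic on hypothetical birth dates:

- Age is the plain difference of years, so it can be off by one depending on whether the birthday has passed by 2001-05-01.
- Implausible ages become NA, and NA is then imputed with the rounded mean.
- `cut(..., labels = FALSE) * 10` turns the interval index into the 10/20/.../70 group codes seen above.

```r
ref_year <- 2001                                                   # reference date 2001-05-01
birth <- as.Date(c("1965-03-02", "1982-07-15", NA, "1890-01-01"))  # hypothetical birth dates

age <- ref_year - as.integer(format(birth, "%Y"))   # year difference, as in the v6 code
age[which(age < 10 | age > 100)] <- NA              # implausible ages treated as missing
age[is.na(age)] <- round(mean(age, na.rm = TRUE))   # impute with the rounded mean age

agegrp <- cut(age, c(0, 19, 29, 39, 49, 59, 69, 100), labels = FALSE) * 10
data.frame(age, agegrp)
```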

4.7. v7 : Variables for purchase amount and number of purchases over the most recent 12 months (also 6 and 3 months)

# v7 : variables for purchase amount and number of purchases over the most recent 12 months
end_date = ymd(ymd_hms(max(tr$sales_date)))
start_date = ymd('20010501') - months(12)

cs.v7.12 <- tr %>% 
  filter(start_date<=sales_date & sales_date<=end_date) %>%
  group_by(custid) %>% 
  summarize(amt12=sum(net_amt), nop12=n())

head(cs.v7.12)
## # A tibble: 6 x 3
##   custid   amt12 nop12
##    <int>   <int> <int>
## 1      1 3758381    77
## 2      2 2061179    28
## 3      3 6426049    68
## 4      4  381300     6
## 5      5  155700     4
## 6      6 2017400    29
# Compute and merge purchase amount and count for the last 3, 6 and 12 months
# Last 6 months
start_date = ymd('20010501') - months(6)
cs.v7.06 <- tr %>% 
  filter(start_date<=sales_date & sales_date<=end_date) %>%
  group_by(custid) %>% 
  summarize(amt6=sum(net_amt), nop6=n())
head(cs.v7.06)
## # A tibble: 6 x 3
##   custid    amt6  nop6
##    <int>   <int> <int>
## 1      1 1337108    34
## 2      2  190929     7
## 3      3 4021376    41
## 4      4  381300     6
## 5      5  155700     4
## 6      6  786060    14
# Last 3 months
start_date <- ymd('20010501') - months(3)
cs.v7.03 <- tr %>% 
  filter(start_date<=sales_date & sales_date<=end_date) %>%
  group_by(custid) %>% 
  summarize(amt3=sum(net_amt), nop3=n())
head(cs.v7.03)
## # A tibble: 6 x 3
##   custid    amt3  nop3
##    <int>   <int> <int>
## 1      1  511654    13
## 2      2  190929     7
## 3      3 3227130    18
## 4      4   89700     2
## 5      5  155700     4
## 6      6   18060     3
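# Replace NA with 0 (as a safeguard; indexed per column, since a tibble cannot be indexed by a row-length logical vector)
cs.v7.03$amt3[is.na(cs.v7.03$amt3)] = 0
cs.v7.03$nop3[is.na(cs.v7.03$nop3)] = 0
# Merge the 12-, 6- and 3-month tables into cs.v7
cs.v7 <- left_join(cs.v7.12, cs.v7.06, by = "custid") %>% left_join(cs.v7.03, by = "custid")

# Replace NA with 0: customers without purchases in the last 6 or 3 months get NA from the left join
cs.v7$amt3[is.na(cs.v7$amt3)] = 0
cs.v7$nop3[is.na(cs.v7$nop3)] = 0
cs.v7$amt6[is.na(cs.v7$amt6)] = 0
cs.v7$nop6[is.na(cs.v7$nop6)] = 0
cs.v7$amt12[is.na(cs.v7$amt12)] = 0
cs.v7$nop12[is.na(cs.v7$nop12)] = 0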
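The three windows differ only in their start date. Here is a base-R sketch, under some assumptions. `window_start()` and `summarise_window()` are hypothetical helpers, and the transactions are toy data. Customers without purchases in a window get 0, mirroring the NA => 0 step above.

```r
ref      <- as.Date("2001-05-01")
end_date <- as.Date("2001-04-29")

# Start of an n-month look-back window ending at the reference date
window_start <- function(months_back) {
  seq(ref, by = paste0("-", months_back, " months"), length.out = 2)[2]
}

# Hypothetical toy transactions
tr <- data.frame(
  custid     = c(1, 1, 1, 2),
  sales_date = as.Date(c("2000-06-10", "2000-12-05", "2001-03-20", "2000-07-01")),
  net_amt    = c(10000, 20000, 30000, 5000)
)

# Amount and number of purchases per customer inside [from, to]; customers with no
# purchase in the window keep a row with amt = 0 and nop = 0
summarise_window <- function(tr, from, to) {
  ids  <- sort(unique(tr$custid))
  keep <- tr$sales_date >= from & tr$sales_date <= to
  cust <- factor(tr$custid[keep], levels = ids)
  amt  <- tapply(tr$net_amt[keep], cust, sum)   # NA for customers absent from the window
  amt[is.na(amt)] <- 0
  data.frame(custid = ids, amt = as.vector(amt), nop = as.vector(table(cust)))
}

w12 <- summarise_window(tr, window_start(12), end_date)
w6  <- summarise_window(tr, window_start(6), end_date)
w3  <- summarise_window(tr, window_start(3), end_date)
```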

4.8. v8 : Price preference variable

### All attribute values are stored as integers, not strings.
## Price preference variable (v8) (custsig$pref)
# cmv = customer mean buying price value
# pref = price preference (0 = "low", 1 = "mid", 2 = "high")
cs.v8 = tr %>% filter(net_amt > 0) %>% group_by(custid)

summary(cs.v8$net_amt)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##       50    27000    56050   118554   123300 72000000
medval = as.integer(median(cs.v8$net_amt))        # overall median purchase price

qt3 = as.integer(quantile(cs.v8$net_amt, 0.75))   # overall 3rd quartile (same type 7 quantile as summary())
# Customer mean purchase price above the overall 3rd quartile: high-price preference (value = 2)
# Customer mean purchase price below the overall median: low-price preference (value = 0)
# All other customers: mid-price preference (value = 1)
cs.v8 = tr %>% filter(net_amt > 0) %>% # refunds are excluded
  group_by(custid) %>% summarize(cmv = mean(net_amt)) %>%
  mutate(pref = ifelse(cmv < medval,0,ifelse(cmv > qt3,2,1)))
table(cs.v8$pref)
## 
##     0     1     2 
##  9443 24687 15865
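The v8 rule depends on only two thresholds. The sketch below uses hypothetical prices to show where the boundaries fall. A mean exactly at the median, or exactly at Q3, counts as mid-price (1), because the tests are strict `<` and `>`.

```r
# Classification rule used for v8: below the median -> 0 (low), above Q3 -> 2 (high), else 1 (mid)
price_pref <- function(cmv, medval, qt3) {
  ifelse(cmv < medval, 0, ifelse(cmv > qt3, 2, 1))
}

# Thresholds from a hypothetical set of positive purchase prices
prices <- c(100, 200, 300, 400, 500)
medval <- median(prices)                  # 300
qt3    <- unname(quantile(prices, 0.75))  # 400 (type 7, R's default)
price_pref(c(150, 300, 350, 400, 450), medval, qt3)
```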

4.9. v9 : Seasonal preference variable

# Seasonal preference variable (v9) (custsig$se_pat)
# Spring: Mar-May (value = 1) / Summer: Jun-Aug (value = 2) /
# Autumn: Sep-Nov (value = 3) / Winter: Dec-Feb (value = 4); each season's purchase amount is stored in its own variable
cs.v9 = tr %>%
  mutate(sp_amt = ifelse(month(sales_date) %in% 3:5, net_amt, 0),
         su_amt = ifelse(month(sales_date) %in% 6:8, net_amt, 0),
         fa_amt = ifelse(month(sales_date) %in% 9:11, net_amt, 0),
         wi_amt = ifelse(month(sales_date) %in% c(12,1,2), net_amt, 0)) %>%
  group_by(custid) %>% summarise(across(c(sp_amt, su_amt, fa_amt, wi_amt), sum))
head(cs.v9)
## # A tibble: 6 x 5
##   custid  sp_amt  su_amt  fa_amt wi_amt
##    <int>   <dbl>   <dbl>   <dbl>  <dbl>
## 1      1  714038 1414688  776028 853627
## 2      2  112709 1870250       0  78220
## 3      3 3227130 1690100  714573 794246
## 4      4   89700       0       0 291600
## 5      5  155700       0       0      0
## 6      6   29160  190950 1505490 291800
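# se_pat = index of the season with the largest amount (1 = spring ... 4 = winter).
# custid is excluded from the comparison, and ties go to the earlier season
# (max.col breaks ties at random by default).
cs.v9 = mutate(cs.v9, se_pat = max.col(as.matrix(select(cs.v9, sp_amt:wi_amt)), ties.method = "first"))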
head(cs.v9)
## # A tibble: 6 x 6
##   custid  sp_amt  su_amt  fa_amt wi_amt se_pat
##    <int>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
## 1      1  714038 1414688  776028 853627      2
## 2      2  112709 1870250       0  78220      2
## 3      3 3227130 1690100  714573 794246      1
## 4      4   89700       0       0 291600      4
## 5      5  155700       0       0      0      1
## 6      6   29160  190950 1505490 291800      3
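The season codes and the `max.col()` step can be checked in isolation. Below is a base-R sketch. `season_of()` is an illustrative helper. The first three rows of `amts` are customers 1-3 from the output above. The all-zero row is hypothetical, added to show why `ties.method = "first"` matters.

```r
# Month -> season code used for v9: 1 = spring (3-5), 2 = summer (6-8),
# 3 = autumn (9-11), 4 = winter (12, 1, 2)
season_of <- function(month) c(4, 4, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4)[month]

amts <- matrix(c(714038, 1414688, 776028, 853627,
                 112709, 1870250,      0,  78220,
                 3227130, 1690100, 714573, 794246,
                 0,            0,      0,      0),
               ncol = 4, byrow = TRUE,
               dimnames = list(NULL, c("sp_amt", "su_amt", "fa_amt", "wi_amt")))

# Column index of the largest seasonal amount; "first" makes the all-zero row
# deterministic (the default, "random", can differ between runs)
max.col(amts, ties.method = "first")
```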
## Purchase-trend pattern (v10)
start_date1 = ymd(ymd_hms(min(tr$sales_date)))
end_date1 = ymd("2000-10-29")
# as.integer(end_date1) - as.integer(start_date1) = 181 days
start_date2 = ymd("2000-10-30")
end_date2 = ymd(ymd_hms(max(tr$sales_date)))
# as.integer(end_date2) - as.integer(start_date2) = 181 days

4.10. v10 : Increase or decrease in visit days as an indicator of purchase-trend change (first vs. second half, six months each)

### Split the period into six-month halves (first half / second half) and use the
### increase or decrease in the number of visit days as an indicator of purchase-trend change
cs.v10 = tr %>% distinct(custid, sales_date) %>%
  mutate(fha = ifelse(sales_date <= end_date1, 1, 0),
         sha = ifelse(sales_date >= start_date2, 1, 0)) %>%
  group_by(custid) %>% summarise(across(c(fha, sha), sum))
# "증가" = increase, "변화없음" = no change, "감소" = decrease
cs.v10$purpat = ifelse(cs.v10$fha < cs.v10$sha,"증가",
                       ifelse(cs.v10$fha == cs.v10$sha,"변화없음","감소"))
# fha = first half year // sha = second half year
# purpat = purchase pattern

head(cs.v10)
## # A tibble: 6 x 4
##   custid   fha   sha purpat
##    <int> <dbl> <dbl> <chr> 
## 1      1    21    20 감소  
## 2      2     7     4 감소  
## 3      3    11    16 증가  
## 4      4     0     4 증가  
## 5      5     0     3 증가  
## 6      6     9     6 감소
tail(cs.v10)
## # A tibble: 6 x 4
##   custid   fha   sha purpat
##    <int> <dbl> <dbl> <chr> 
## 1  49995     1    10 증가  
## 2  49996    12    40 증가  
## 3  49997    13     7 감소  
## 4  49998    16     8 감소  
## 5  49999     0    17 증가  
## 6  50000     1    10 증가
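Here is a base-R sketch of the v10 logic on hypothetical visit records:

1. Duplicate visit days are removed first.
2. Each remaining day is counted in the first half (on or before 2000-10-29) or the second half.
3. The two counts are compared.

English labels stand in for 증가/변화없음/감소.

```r
split_date <- as.Date("2000-10-29")   # last day of the first half (end_date1)

# Hypothetical visit records
visits <- data.frame(
  custid     = c(1, 1, 1, 1, 2, 2, 3, 3),
  sales_date = as.Date(c("2000-06-01", "2000-06-01", "2000-09-15", "2001-01-10",
                         "2000-12-01", "2001-02-01", "2000-07-01", "2001-03-01"))
)
visits <- unique(visits)   # one row per customer and visit day

fha <- tapply(visits$sales_date <= split_date, visits$custid, sum)   # first-half visit days
sha <- tapply(visits$sales_date >  split_date, visits$custid, sum)   # second-half visit days
purpat <- ifelse(fha < sha, "increase", ifelse(fha == sha, "no change", "decrease"))
as.vector(purpat)
```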

4.11. v11 : Purchase amount / count / experience variables by product

  • For v11/v12/v13 (per product): goodcd would make the data processing complex, and its categories carry little meaning. The data are therefore processed by product group instead, with buyer_nm as the variable.

## Purchase amount / count / experience by product group (v11)
# puramt = purchase amount (per product group)
# purnum = number of purchases (per product group)
# purexp = purchase experience (whether the product group was bought: 1 if puramt > 0, else 0)
cs.v11.amt = tr %>%  group_by(custid,buyer_nm) %>% summarise(puramt = sum(net_amt))

amt.melt = melt(cs.v11.amt,id.vars = c("custid","buyer_nm"), measure.vars = "puramt" )
cs.v11.amt = cast(amt.melt,custid ~ buyer_nm + variable) 
cs.v11.amt[is.na(cs.v11.amt)] = 0 # NA 처리

cs.v11.num = tr %>%  group_by(custid,buyer_nm ) %>% filter(net_amt > 0) %>% summarise(purnum = n())

num.melt = melt(cs.v11.num,id.vars = c("custid","buyer_nm"), measure.vars = "purnum" )
cs.v11.num = cast(num.melt,custid ~ buyer_nm + variable) 
cs.v11.num[is.na(cs.v11.num)] = 0 # NA 처리

cs.v11.exp = tr %>%  group_by(custid,buyer_nm) %>% summarise(purexp = ifelse(sum(net_amt)>0,1,0))

exp.melt = melt(cs.v11.exp,id.vars = c("custid","buyer_nm"), measure.vars = "purexp" )
cs.v11.exp = cast(exp.melt,custid ~ buyer_nm + variable) 
cs.v11.exp[is.na(cs.v11.exp)] = 0 # NA 처리

cs.v11 = left_join(cs.v11.amt, cs.v11.num) %>% left_join(cs.v11.exp)
cs.v11[is.na(cs.v11)] = 0 # NA 처리

head(cs.v11)
##   custid 가구_puramt 가전_puramt 기타바이어_puramt 니트단품_puramt
## 1      1           0      509000                 0               0
## 2      2           0           0                 0               0
## 3      3      491150     2130000                 0               0
## 4      4           0           0                 0               0
## 5      5           0           0                 0               0
## 6      6           0           0                 0               0
##   도자기크리스탈_puramt 디자이너부띠끄_puramt 문화완구_puramt
## 1                 36640                     0               0
## 2                 14400                     0               0
## 3                     0                     0           86550
## 4                     0                     0               0
## 5                     0                     0               0
## 6                     0                     0               0
##   생활용품_puramt 섬유_puramt 수입명품_puramt 스포츠_puramt
## 1           88250       15000               0        179250
## 2               0       26400               0             0
## 3               0      282000         1740000        146000
## 4               0       20000               0        214200
## 5               0       18000               0             0
## 6               0       29160               0        243350
##   엘레강스캐주얼_puramt 영캐주얼_puramt 유니캐주얼_puramt 유아동복_puramt
## 1                     0          352050            558550          129100
## 2                     0          916100                 0               0
## 3                     0          185050             63600          755360
## 4                     0               0                 0               0
## 5                     0           26100                 0               0
## 6                     0          119000            426090               0
##   일반식품_puramt 장신구_puramt 점외_puramt 정장셔츠_puramt
## 1         1472691         40000           0               0
## 2          135509         96000           0          552300
## 3          113919             0           0               0
## 4            8000             0           0               0
## 5               0             0           0               0
## 6               0             0           0               0
##   조리식품_puramt 조리욕실_puramt 청과곡물_puramt 침구수예_puramt
## 1               0           90100               0               0
## 2               0           44020               0               0
## 3               0          340620               0           50000
## 4               0               0               0               0
## 5               0               0               0               0
## 6               0               0               0               0
##   캐릭터캐주얼_puramt 타운모피_puramt 트래디셔널캐주얼_puramt 피혁A_puramt
## 1                   0               0                       0       208750
## 2                   0               0                       0            0
## 3                   0               0                       0            0
## 4                   0               0                   57400            0
## 5                   0               0                       0            0
## 6                   0               0                  868250       140600
##   피혁B_puramt 행사장(남성)_puramt 행사장(아동스포츠)_puramt
## 1            0                   0                         0
## 2            0                   0                         0
## 3            0                   0                         0
## 4            0                   0                         0
## 5        83700                   0                         0
## 6            0                   0                         0
##   행사장(여성정장)_puramt 행사장(여성캐주얼)_puramt
## 1                       0                         0
## 2                       0                         0
## 3                       0                         0
## 4                       0                         0
## 5                       0                         0
## 6                       0                         0
##   행사장(여성캐쥬)_puramt 행사장(잡화)_puramt 화장품_puramt 가구_purnum
## 1                       0                   0         79000           0
## 2                       0                   0        276450           0
## 3                       0                   0         41800           1
## 4                       0                   0         81700           0
## 5                       0                   0         27900           0
## 6                       0                   0        190950           0
##   가전_purnum 기타바이어_purnum 니트단품_purnum 도자기크리스탈_purnum
## 1           1                 0               0                     3
## 2           0                 0               0                     1
## 3           2                 0               0                     0
## 4           0                 0               0                     0
## 5           0                 0               0                     0
## 6           0                 0               0                     0
##   디자이너부띠끄_purnum 문화완구_purnum 생활용품_purnum 섬유_purnum
## 1                     0               0               5           1
## 2                     0               0               0           2
## 3                     0               2               0           3
## 4                     0               0               0           1
## 5                     0               0               0           1
## 6                     0               0               0           1
##   수입명품_purnum 스포츠_purnum 엘레강스캐주얼_purnum 영캐주얼_purnum
## 1               0             3                     0               2
## 2               0             0                     0               6
## 3               5             9                     0               2
## 4               0             2                     0               0
## 5               0             0                     0               1
## 6               0             4                     0               1
##   유니캐주얼_purnum 유아동복_purnum 일반식품_purnum 장신구_purnum
## 1                13               2              39             2
## 2                 0               0               4             1
## 3                 2              18               4             0
## 4                 0               0               1             0
## 5                 0               0               0             0
## 6                 9               0               0             0
##   점외_purnum 정장셔츠_purnum 조리식품_purnum 조리욕실_purnum
## 1           0               0               0               2
## 2           0               8               0               2
## 3           0               0               0               3
## 4           0               0               0               0
## 5           0               0               0               0
## 6           0               0               0               0
##   청과곡물_purnum 침구수예_purnum 캐릭터캐주얼_purnum 타운모피_purnum
## 1               0               0                   0               0
## 2               0               0                   0               0
## 3               0               1                   0               0
## 4               0               0                   0               0
## 5               0               0                   0               0
## 6               0               0                   0               0
##   트래디셔널캐주얼_purnum 피혁A_purnum 피혁B_purnum 행사장(남성)_purnum
## 1                       0            2            0                   0
## 2                       0            0            0                   0
## 3                       0            0            0                   0
## 4                       1            0            0                   0
## 5                       0            0            1                   0
## 6                       6            1            0                   0
##   행사장(아동스포츠)_purnum 행사장(여성정장)_purnum
## 1                         0                       0
## 2                         0                       0
## 3                         0                       0
## 4                         0                       0
## 5                         0                       0
## 6                         0                       0
##   행사장(여성캐주얼)_purnum 행사장(여성캐쥬)_purnum 행사장(잡화)_purnum
## 1                         0                       0                   0
## 2                         0                       0                   0
## 3                         0                       0                   0
## 4                         0                       0                   0
## 5                         0                       0                   0
## 6                         0                       0                   0
##   화장품_purnum 가구_purexp 가전_purexp 기타바이어_purexp 니트단품_purexp
## 1             1           0           1                 0               0
## 2             2           0           0                 0               0
## 3             1           1           1                 0               0
## 4             1           0           0                 0               0
## 5             1           0           0                 0               0
## 6             3           0           0                 0               0
##   도자기크리스탈_purexp 디자이너부띠끄_purexp 문화완구_purexp
## 1                     1                     0               0
## 2                     1                     0               0
## 3                     0                     0               1
## 4                     0                     0               0
## 5                     0                     0               0
## 6                     0                     0               0
##   생활용품_purexp 섬유_purexp 수입명품_purexp 스포츠_purexp
## 1               1           1               0             1
## 2               0           1               0             0
## 3               0           1               1             1
## 4               0           1               0             1
## 5               0           1               0             0
## 6               0           1               0             1
##   엘레강스캐주얼_purexp 영캐주얼_purexp 유니캐주얼_purexp 유아동복_purexp
## 1                     0               1                 1               1
## 2                     0               1                 0               0
## 3                     0               1                 1               1
## 4                     0               0                 0               0
## 5                     0               1                 0               0
## 6                     0               1                 1               0
##   일반식품_purexp 장신구_purexp 점외_purexp 정장셔츠_purexp
## 1               1             1           0               0
## 2               1             1           0               1
## 3               1             0           0               0
## 4               1             0           0               0
## 5               0             0           0               0
## 6               0             0           0               0
##   조리식품_purexp 조리욕실_purexp 청과곡물_purexp 침구수예_purexp
## 1               0               1               0               0
## 2               0               1               0               0
## 3               0               1               0               1
## 4               0               0               0               0
## 5               0               0               0               0
## 6               0               0               0               0
##   캐릭터캐주얼_purexp 타운모피_purexp 트래디셔널캐주얼_purexp 피혁A_purexp
## 1                   0               0                       0            1
## 2                   0               0                       0            0
## 3                   0               0                       0            0
## 4                   0               0                       1            0
## 5                   0               0                       0            0
## 6                   0               0                       1            1
##   피혁B_purexp 행사장(남성)_purexp 행사장(아동스포츠)_purexp
## 1            0                   0                         0
## 2            0                   0                         0
## 3            0                   0                         0
## 4            0                   0                         0
## 5            1                   0                         0
## 6            0                   0                         0
##   행사장(여성정장)_purexp 행사장(여성캐주얼)_purexp
## 1                       0                         0
## 2                       0                         0
## 3                       0                         0
## 4                       0                         0
## 5                       0                         0
## 6                       0                         0
##   행사장(여성캐쥬)_purexp 행사장(잡화)_purexp 화장품_purexp
## 1                       0                   0             1
## 2                       0                   0             1
## 3                       0                   0             1
## 4                       0                   0             1
## 5                       0                   0             1
## 6                       0                   0             1
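The melt/cast pair above (from the reshape package) turns long per-group sums into one column per product group. `tidyr::pivot_wider()` is the current tidyverse counterpart. The same wide tables can be sketched in base R with `xtabs()`, which fills absent combinations with 0 directly. The data are hypothetical. `buyer_nm` is made a factor so that all three tables share the same columns.

```r
# Hypothetical toy transactions
tr <- data.frame(
  custid   = factor(c(1, 1, 1, 2, 2)),
  buyer_nm = factor(c("food", "food", "cosmetics", "sports", "food")),
  net_amt  = c(10000, 5000, 30000, -2000, 8000)
)

puramt <- xtabs(net_amt ~ custid + buyer_nm, data = tr)              # net amount per group
purnum <- xtabs(~ custid + buyer_nm, data = tr[tr$net_amt > 0, ])    # purchases, refunds excluded
purexp <- (puramt > 0) * 1                                            # 1 if net amount > 0, else 0

puramt
```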

4.12. v12 : Purchase-order variable by product

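## Purchase-order variable by product group (v12)
# Sort by purchase time; then find each product group's first purchase date
# (purmin = minimum purchase date) and rank the groups by it (purord = purchase order)
cs.v12 = tr %>% group_by(custid,buyer_nm)
cs.v12 = cs.v12[order(cs.v12$sales_time),]
cs.v12 = cs.v12 %>% summarise(purmin = min(sales_date)) %>% 
  mutate(purord = rank(purmin,ties.method = "first"))
# Note: summarise() returns its rows sorted by custid and buyer_nm, so the sales_time sort
# above does not carry over; ties.method = "first" breaks same-day ties in buyer_nm order.
cs.v12.melt = melt(cs.v12, id.vars = c("custid","buyer_nm"), measure.vars = "purord")
cs.v12 = cast(cs.v12.melt, custid ~ buyer_nm + variable)
# cs.v12[is.na(cs.v12)] = 0 # product groups the customer never bought keep NA as their purchase order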

head(cs.v12)
##   custid 가구_purord 가전_purord 기타바이어_purord 니트단품_purord
## 1      1          NA           7                NA              NA
## 2      2          NA          NA                NA              NA
## 3      3          13          12                NA              NA
## 4      4          NA          NA                NA              NA
## 5      5          NA          NA                NA              NA
## 6      6          NA          NA                NA              NA
##   도자기크리스탈_purord 디자이너부띠끄_purord 문화완구_purord
## 1                     3                    NA              NA
## 2                     1                    NA              NA
## 3                    NA                    NA              10
## 4                    NA                    NA              NA
## 5                    NA                    NA              NA
## 6                    NA                    NA              NA
##   생활용품_purord 섬유_purord 수입명품_purord 스포츠_purord
## 1              10           8              NA            12
## 2              NA           6              NA            NA
## 3              NA           3               1             2
## 4              NA           1              NA             2
## 5              NA           1              NA            NA
## 6              NA           7              NA             6
##   엘레강스캐주얼_purord 영캐주얼_purord 유니캐주얼_purord 유아동복_purord
## 1                    NA               9                 4               2
## 2                    NA               2                NA              NA
## 3                    NA               4                 8               9
## 4                    NA              NA                NA              NA
## 5                    NA               3                NA              NA
## 6                    NA               5                 3              NA
##   일반식품_purord 장신구_purord 점외_purord 정장셔츠_purord
## 1               1             6          NA              NA
## 2               7             4          NA               3
## 3               5            NA          NA              NA
## 4               5            NA          NA              NA
## 5              NA            NA          NA              NA
## 6              NA            NA          NA              NA
##   조리식품_purord 조리욕실_purord 청과곡물_purord 침구수예_purord
## 1              NA              11              NA              NA
## 2              NA               8              NA              NA
## 3              NA               6              NA               7
## 4              NA              NA              NA              NA
## 5              NA              NA              NA              NA
## 6              NA              NA              NA              NA
##   캐릭터캐주얼_purord 타운모피_purord 트래디셔널캐주얼_purord 피혁A_purord
## 1                  NA              NA                      NA            5
## 2                  NA              NA                      NA           NA
## 3                  NA              NA                      NA           NA
## 4                  NA              NA                       3           NA
## 5                  NA              NA                      NA           NA
## 6                  NA              NA                       4            2
##   피혁B_purord 행사장(남성)_purord 행사장(아동스포츠)_purord
## 1           NA                  NA                        NA
## 2           NA                  NA                        NA
## 3           NA                  NA                        NA
## 4           NA                  NA                        NA
## 5            4                  NA                        NA
## 6           NA                  NA                        NA
##   행사장(여성정장)_purord 행사장(여성캐주얼)_purord
## 1                      NA                        NA
## 2                      NA                        NA
## 3                      NA                        NA
## 4                      NA                        NA
## 5                      NA                        NA
## 6                      NA                        NA
##   행사장(여성캐쥬)_purord 행사장(잡화)_purord 화장품_purord
## 1                      NA                  NA            13
## 2                      NA                  NA             5
## 3                      NA                  NA            11
## 4                      NA                  NA             4
## 5                      NA                  NA             2
## 6                      NA                  NA             1
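Here is a base-R sketch on hypothetical data that makes the same-day tie-break explicit. Groups first bought on the same day are numbered in `buyer_nm` order. Adding a time-of-day key to the `order()` call would break ties by time instead. This assumes such a column exists and sorts chronologically, for example `sales_time`.

```r
# Hypothetical toy transactions (dates as ISO strings, so they sort chronologically)
tr <- data.frame(
  custid     = c(1, 1, 1, 1, 2, 2),
  buyer_nm   = c("food", "sports", "food", "cosmetics", "sports", "food"),
  sales_date = c("2000-05-10", "2000-05-03", "2000-06-01", "2000-05-03",
                 "2000-07-01", "2000-06-15")
)

# First purchase date of each product group per customer (purmin)
first <- aggregate(tr["sales_date"], by = tr[c("custid", "buyer_nm")], FUN = min)
names(first)[3] <- "purmin"

# Sort by customer, first date, then buyer_nm, and number the rows within each
# customer: this is purord, with same-day ties broken by buyer_nm
first <- first[order(first$custid, first$purmin, first$buyer_nm), ]
first$purord <- ave(seq_len(nrow(first)), first$custid, FUN = seq_along)
first
```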

4.13. v13 : Main purchased product variable

## Main purchased product variable (v13) (purpref = purchase preference)
# For each customer, a product-group column gets purpref = 1 if that group is the customer's
# main purchase (largest total amount) and purpref = 0 otherwise
cs.v13.tmp = tr %>%  group_by(custid,buyer_nm) %>% summarise(puramt = sum(net_amt)) %>%
  group_by(custid) %>% filter(puramt == max(puramt)) %>%
  mutate(purpref = 1)   # flag the top product group(s); tied groups are all flagged
cs.v13.melt = melt(cs.v13.tmp, id.vars = c("custid","buyer_nm"), measure.vars = "purpref" )
cs.v13 = cast(cs.v13.melt, custid ~ buyer_nm + variable)
cs.v13[is.na(cs.v13)] = 0   # custid has no NA, so only the flag columns are filled

head(cs.v13)
##   custid 가구_purpref 가전_purpref 기타바이어_purpref 니트단품_purpref
## 1      1            0            0                  0                0
## 2      2            0            0                  0                0
## 3      3            0            1                  0                0
## 4      4            0            0                  0                0
## 5      5            0            0                  0                0
## 6      6            0            0                  0                0
##   도자기크리스탈_purpref 디자이너부띠끄_purpref 문화완구_purpref
## 1                      0                      0                0
## 2                      0                      0                0
## 3                      0                      0                0
## 4                      0                      0                0
## 5                      0                      0                0
## 6                      0                      0                0
##   생활용품_purpref 섬유_purpref 수입명품_purpref 스포츠_purpref
## 1                0            0                0              0
## 2                0            0                0              0
## 3                0            0                0              0
## 4                0            0                0              1
## 5                0            0                0              0
## 6                0            0                0              0
##   엘레강스캐주얼_purpref 영캐주얼_purpref 유니캐주얼_purpref
## 1                      0                0                  0
## 2                      0                1                  0
## 3                      0                0                  0
## 4                      0                0                  0
## 5                      0                0                  0
## 6                      0                0                  0
##   유아동복_purpref 일반식품_purpref 장신구_purpref 정장셔츠_purpref
## 1                0                1              0                0
## 2                0                0              0                0
## 3                0                0              0                0
## 4                0                0              0                0
## 5                0                0              0                0
## 6                0                0              0                0
##   조리욕실_purpref 침구수예_purpref 캐릭터캐주얼_purpref 타운모피_purpref
## 1                0                0                    0                0
## 2                0                0                    0                0
## 3                0                0                    0                0
## 4                0                0                    0                0
## 5                0                0                    0                0
## 6                0                0                    0                0
##   트래디셔널캐주얼_purpref 피혁A_purpref 피혁B_purpref
## 1                        0             0             0
## 2                        0             0             0
## 3                        0             0             0
## 4                        0             0             0
## 5                        0             0             1
## 6                        1             0             0
##   행사장(남성)_purpref 행사장(여성정장)_purpref 행사장(여성캐주얼)_purpref
## 1                    0                        0                          0
## 2                    0                        0                          0
## 3                    0                        0                          0
## 4                    0                        0                          0
## 5                    0                        0                          0
## 6                    0                        0                          0
##   행사장(여성캐쥬)_purpref 화장품_purpref
## 1                        0              0
## 2                        0              0
## 3                        0              0
## 4                        0              0
## 5                        0              0
## 6                        0              0
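In the printed `head(cs.v13)` above, custid is 1 on every row: `cs.v13[cs.v13 > 0] = 1` rewrites every positive cell, the id column included, and those custid values are what the later `left_join` into custsig matches on. The sketch below builds the same kind of main-category flag table with base R (`split`/`lapply` in place of the dplyr + reshape `cast` pipeline) while leaving custid untouched. The customers, amounts, and category names `A`/`B` are made up for illustration.

```r
# Hypothetical per-customer, per-category purchase totals
sales <- data.frame(
  custid   = c(101, 101, 102, 102, 103),
  buyer_nm = c("A", "B", "A", "B", "B"),
  puramt   = c(500, 300, 100, 900, 50),
  stringsAsFactors = FALSE
)

# Keep each customer's top-spending category (ties keep every tied category)
top <- do.call(rbind, lapply(split(sales, sales$custid),
                             function(d) d[d$puramt == max(d$puramt), ]))

# Wide 0/1 table: one row per customer, one flag column per category.
# custid is only used as a key and is never overwritten.
cats  <- sort(unique(sales$buyer_nm))
flags <- data.frame(custid = sort(unique(sales$custid)))
for (nm in cats) {
  flags[[paste0(nm, "_purpref")]] <- as.numeric(flags$custid %in% top$custid[top$buyer_nm == nm])
}
print(flags)  # custid stays 101, 102, 103
```

A customer whose top spend is tied across two categories gets a 1 in both, which mirrors the `filter(puramt == purpref)` step in the original pipeline.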

4.14. v14 : Creating the dormancy/churn-risk variable

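## Dormancy/churn-risk variable (v14)
## If the time elapsed since the most recent visit is longer than the customer's
## average purchase interval (API), the customer is treated as likely dormant/churning. variable = exit
## (dormant/churning : value = 1, otherwise : value = 0)
end_date = as_date(ymd_hms(max(tr$sales_date)))   # last sales date in the data
cs.v14 = tr %>% distinct(custid, sales_date) %>%
  group_by(custid) %>% summarise(maxday = max(sales_date))
days.since = as.integer(end_date - as_date(ymd_hms(cs.v14$maxday)))
# Look up each customer's API by custid instead of relying on identical row order
cs.v14$exit = ifelse(days.since > cs.v3$API[match(cs.v14$custid, cs.v3$custid)], 1, 0)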

head(cs.v14)
## # A tibble: 6 x 3
##   custid maxday               exit
##    <int> <chr>               <dbl>
## 1      1 2001-04-29 00:00:00     0
## 2      2 2001-04-11 00:00:00     0
## 3      3 2001-04-27 00:00:00     0
## 4      4 2001-04-14 00:00:00     0
## 5      5 2001-04-25 00:00:00     0
## 6      6 2001-04-29 00:00:00     0
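The exit rule compares two numbers per customer: the days since the last visit and the average purchase interval API. A minimal base-R sketch of that comparison, using made-up dates and API values rather than the store data:

```r
# Hypothetical inputs: last visit date and average purchase interval (API, in days)
end_date   <- as.Date("2001-04-29")                       # last date in the data
last_visit <- as.Date(c("2001-04-29", "2001-03-01", "2001-04-20"))
api        <- c(8.9, 20, 5)

days_since <- as.integer(end_date - last_visit)           # 0, 59, 9
exit       <- ifelse(days_since > api, 1, 0)              # 1 = likely dormant/churning
print(data.frame(days_since, api, exit))
```

The comparison is strict, so a customer whose gap exactly equals their API is not flagged.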

5. Modeling

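  • To be updated

      1. Select statistically significant variables with logistic regression
      2. Select significant variables with a decision tree
      3. Keep the variables common to both selections (their intersection)
      4. Optimize an ensemble model using cross-validation
      5. Evaluate the model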
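Step 4 relies on k-fold cross-validation. As a hedged sketch of only the cross-validation mechanics, the block below splits simulated data into 5 folds and scores a plain logistic regression (standing in for the planned ensemble) on each held-out fold. The simulated data, the 0.5 probability cutoff, and accuracy as the metric are all assumptions for illustration.

```r
set.seed(1)
n  <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, 1, plogis(0.3 + 1.5 * df$x1 - 1.0 * df$x2))   # simulated 0/1 target

k     <- 5
folds <- sample(rep(seq_len(k), length.out = n))   # random fold label for every row

acc <- numeric(k)
for (f in seq_len(k)) {
  train    <- df[folds != f, ]
  held_out <- df[folds == f, ]
  fit  <- glm(y ~ x1 + x2, data = train, family = binomial("logit"))
  prob <- predict(fit, newdata = held_out, type = "response")
  acc[f] <- mean((prob > 0.5) == held_out$y)       # accuracy on the held-out fold
}
print(acc)
print(mean(acc))                                    # cross-validated accuracy
```

Tuning an ensemble would wrap this loop: each candidate setting is scored by its mean held-out accuracy, and the best one is refit on all the data.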

5.1. Setting

##### Modeling: build the full custsig table #####
custsig <- cs[,1:2]   # custid, sex
custsig = left_join(custsig, cs.v1) # refund behavior
# v1: replace missing values (NA) with 0
custsig$rf_amt[is.na(custsig$rf_amt)] = 0
custsig$rf_cnt[is.na(custsig$rf_cnt)] = 0
custsig = left_join(custsig, cs.v2) # diversity of purchased products
custsig = left_join(custsig, cs.v3) # visit days & purchases per visit
custsig = left_join(custsig, cs.v4) # purchase interval
# custsig = left_join(custsig, cs.v5) # purchase pattern by day of week
# custsig = left_join(custsig, cs.v6) # age group
custsig = left_join(custsig, cs.v7) # purchase amount & count by period
custsig = left_join(custsig, cs.v8) # price preference
# custsig = left_join(custsig, cs.v9) # season preference
# custsig = left_join(custsig, cs.v10) # purchase trend pattern
custsig = left_join(custsig, cs.v11) # purchase amount / count / indicator per category
custsig = left_join(custsig, cs.v12) # purchase order per category
custsig = left_join(custsig, cs.v13) # main purchased category
# custsig = left_join(custsig, cs.v14) # dormancy / churn risk
custsig.made = custsig
### Check the result
nrow(custsig)
## [1] 99989
ncol(custsig)
## [1] 186
custsig[sample(nrow(custsig), 5), c(1,sample(ncol(custsig), 20))]
##       custid 점외_purexp 행사장(아동스포츠)_purexp 디자이너부띠끄_purnum
## 52190   2197           0                         0                     2
## 30807      1           0                         0                     0
## 72824  22835           0                         0                     0
## 41384      1           0                         0                     0
## 45872      1           0                         0                     0
##       수입명품_purpref 도자기크리스탈_purpref 피혁B_purnum 타운모피_purexp
## 52190               NA                     NA            0               0
## 30807                0                      0            0               0
## 72824               NA                     NA            0               0
## 41384                0                      0            0               0
## 45872                0                      0            0               0
##         amt12 침구수예_puramt 행사장(여성정장)_purexp 생활용품_puramt
## 52190 3136209               0                       0               0
## 30807 3758381               0                       0           88250
## 72824 2152070               0                       0               0
## 41384 3758381               0                       0           88250
## 45872 3758381               0                       0           88250
##       디자이너부띠끄_puramt 유니캐주얼_puramt 피혁A_purpref       API
## 52190                532100            291500            NA 10.371429
## 30807                     0            558550             0  8.853659
## 72824                     0             56050            NA 33.000000
## 41384                     0            558550             0  8.853659
## 45872                     0            558550             0  8.853659
##       행사장(남성)_purpref 행사장(잡화)_puramt 청과곡물_purord
## 52190                   NA                   0              NA
## 30807                    0                   0              NA
## 72824                   NA                   0              NA
## 41384                    0                   0              NA
## 45872                    0                   0              NA
##       유아동복_purnum 점외_puramt
## 52190               0           0
## 30807               2           0
## 72824               0           0
## 41384               2           0
## 45872               2           0
str(custsig)
## 'data.frame':    99989 obs. of  186 variables:
##  $ custid                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ sex                       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ rf_amt                    : num  -191920 -191920 -191920 -191920 -191920 ...
##  $ rf_cnt                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ buy_brd                   : int  23 23 23 23 23 23 23 23 23 23 ...
##  $ visits                    : int  41 41 41 41 41 41 41 41 41 41 ...
##  $ API                       : num  8.85 8.85 8.85 8.85 8.85 ...
##  $ NPPV                      : num  1.88 1.88 1.88 1.88 1.88 ...
##  $ amt12                     : num  3758381 3758381 3758381 3758381 3758381 ...
##  $ nop12                     : num  77 77 77 77 77 77 77 77 77 77 ...
##  $ amt6                      : num  1337108 1337108 1337108 1337108 1337108 ...
##  $ nop6                      : num  34 34 34 34 34 34 34 34 34 34 ...
##  $ amt3                      : num  511654 511654 511654 511654 511654 ...
##  $ nop3                      : num  13 13 13 13 13 13 13 13 13 13 ...
##  $ cmv                       : num  51978 51978 51978 51978 51978 ...
##  $ pref                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 가구_puramt               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 가전_puramt               : num  509000 509000 509000 509000 509000 509000 509000 509000 509000 509000 ...
##  $ 기타바이어_puramt         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 니트단품_puramt           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 도자기크리스탈_puramt     : num  36640 36640 36640 36640 36640 ...
##  $ 디자이너부띠끄_puramt     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 문화완구_puramt           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 생활용품_puramt           : num  88250 88250 88250 88250 88250 ...
##  $ 섬유_puramt               : num  15000 15000 15000 15000 15000 15000 15000 15000 15000 15000 ...
##  $ 수입명품_puramt           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 스포츠_puramt             : num  179250 179250 179250 179250 179250 ...
##  $ 엘레강스캐주얼_puramt     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 영캐주얼_puramt           : num  352050 352050 352050 352050 352050 ...
##  $ 유니캐주얼_puramt         : num  558550 558550 558550 558550 558550 ...
##  $ 유아동복_puramt           : num  129100 129100 129100 129100 129100 ...
##  $ 일반식품_puramt           : num  1472691 1472691 1472691 1472691 1472691 ...
##  $ 장신구_puramt             : num  40000 40000 40000 40000 40000 40000 40000 40000 40000 40000 ...
##  $ 점외_puramt               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 정장셔츠_puramt           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 조리식품_puramt           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 조리욕실_puramt           : num  90100 90100 90100 90100 90100 90100 90100 90100 90100 90100 ...
##  $ 청과곡물_puramt           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 침구수예_puramt           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 캐릭터캐주얼_puramt       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 타운모피_puramt           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 트래디셔널캐주얼_puramt   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 피혁A_puramt              : num  208750 208750 208750 208750 208750 ...
##  $ 피혁B_puramt              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(남성)_puramt       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(아동스포츠)_puramt : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성정장)_puramt   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성캐주얼)_puramt : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성캐쥬)_puramt   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(잡화)_puramt       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 화장품_puramt             : num  79000 79000 79000 79000 79000 79000 79000 79000 79000 79000 ...
##  $ 가구_purnum               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 가전_purnum               : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 기타바이어_purnum         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 니트단품_purnum           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 도자기크리스탈_purnum     : num  3 3 3 3 3 3 3 3 3 3 ...
##  $ 디자이너부띠끄_purnum     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 문화완구_purnum           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 생활용품_purnum           : num  5 5 5 5 5 5 5 5 5 5 ...
##  $ 섬유_purnum               : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 수입명품_purnum           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 스포츠_purnum             : num  3 3 3 3 3 3 3 3 3 3 ...
##  $ 엘레강스캐주얼_purnum     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 영캐주얼_purnum           : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ 유니캐주얼_purnum         : num  13 13 13 13 13 13 13 13 13 13 ...
##  $ 유아동복_purnum           : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ 일반식품_purnum           : num  39 39 39 39 39 39 39 39 39 39 ...
##  $ 장신구_purnum             : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ 점외_purnum               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 정장셔츠_purnum           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 조리식품_purnum           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 조리욕실_purnum           : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ 청과곡물_purnum           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 침구수예_purnum           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 캐릭터캐주얼_purnum       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 타운모피_purnum           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 트래디셔널캐주얼_purnum   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 피혁A_purnum              : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ 피혁B_purnum              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(남성)_purnum       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(아동스포츠)_purnum : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성정장)_purnum   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성캐주얼)_purnum : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성캐쥬)_purnum   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(잡화)_purnum       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 화장품_purnum             : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 가구_purexp               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 가전_purexp               : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 기타바이어_purexp         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 니트단품_purexp           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 도자기크리스탈_purexp     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 디자이너부띠끄_purexp     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 문화완구_purexp           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 생활용품_purexp           : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 섬유_purexp               : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 수입명품_purexp           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 스포츠_purexp             : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 엘레강스캐주얼_purexp     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 영캐주얼_purexp           : num  1 1 1 1 1 1 1 1 1 1 ...
##   [list output truncated]
custsig = subset(custsig, custsig$sex > 0)   # keep rows with sex 1 (male) or 2 (female)
custsig = custsig[,-1]                       # drop custid
custsig$sex = custsig$sex - 1 # male : 0, female : 1
custsig[is.na(custsig)] = 0                  # replace remaining NAs with 0
sum(is.na(custsig))
## [1] 0
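A left join returns one row for every matching key in the right-hand table, so a feature table that repeats a custid multiplies the rows of custsig; the 99989-row result, with custid 1 repeated down the `str()` output above, is consistent with that. One way to catch this early is a wrapper that refuses duplicate keys and checks the row count after each join. The sketch uses base `merge()` on hypothetical tables; `safe_left_join` is an illustrative helper, not part of dplyr.

```r
# Hypothetical base table and feature tables keyed by custid
base <- data.frame(custid = 1:3, sex = c(1, 2, 1))
f1   <- data.frame(custid = 1:3, visits = c(10, 4, 7))
f2   <- data.frame(custid = c(1, 1, 2), flag = c(1, 1, 0))   # custid 1 repeated

safe_left_join <- function(x, y, by = "custid") {
  # A repeated key in y would duplicate rows of x
  if (anyDuplicated(y[[by]])) stop("duplicate keys in feature table: ", by)
  out <- merge(x, y, by = by, all.x = TRUE, sort = FALSE)
  stopifnot(nrow(out) == nrow(x))   # a left join must keep the row count
  out
}

ok  <- safe_left_join(base, f1)
res <- tryCatch(safe_left_join(base, f2), error = function(e) conditionMessage(e))
print(nrow(ok))
print(res)
```

The same two checks (no duplicated keys in the feature table, unchanged `nrow()` afterwards) apply unchanged around `dplyr::left_join`.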

5.2. Logistic Regression 1

lm_model1 = glm(sex ~ ., data = custsig, family = binomial("logit"))
summary(lm_model1)
## 
## Call:
## glm(formula = sex ~ ., family = binomial("logit"), data = custsig)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4636  -1.2279   0.6889   0.8421   2.1905  
## 
## Coefficients: (35 not defined because of singularities)
##                                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   2.201e+00  3.965e-01   5.550 2.86e-08 ***
## rf_amt                       -5.291e-09  9.563e-09  -0.553 0.580099    
## rf_cnt                       -9.067e-03  6.716e-03  -1.350 0.177048    
## buy_brd                       8.079e-03  3.607e-03   2.240 0.025084 *  
## visits                       -8.737e-04  2.246e-03  -0.389 0.697278    
## API                          -6.295e-04  2.203e-04  -2.858 0.004267 ** 
## NPPV                         -4.190e-02  2.199e-02  -1.905 0.056790 .  
## amt12                        -5.354e-09  3.660e-08  -0.146 0.883710    
## nop12                         4.084e-03  4.672e-03   0.874 0.382015    
## amt6                         -3.125e-09  1.444e-08  -0.216 0.828695    
## nop6                          9.205e-03  2.465e-03   3.734 0.000189 ***
## amt3                          4.855e-09  1.704e-08   0.285 0.775641    
## nop3                         -1.174e-03  2.972e-03  -0.395 0.692856    
## cmv                          -5.618e-07  1.668e-07  -3.369 0.000754 ***
## pref                         -1.496e-01  2.177e-02  -6.872 6.35e-12 ***
## 가구_puramt                   1.934e-08  5.559e-08   0.348 0.727876    
## 가전_puramt                  -1.493e-08  4.273e-08  -0.349 0.726816    
## 기타바이어_puramt            -3.010e-08  6.696e-08  -0.449 0.653086    
## 니트단품_puramt              -6.320e-08  7.210e-08  -0.877 0.380707    
## 도자기크리스탈_puramt        -3.302e-08  8.698e-08  -0.380 0.704244    
## 디자이너부띠끄_puramt         1.731e-08  4.056e-08   0.427 0.669459    
## 문화완구_puramt              -9.792e-08  1.148e-07  -0.853 0.393708    
## 생활용품_puramt               1.551e-06  1.285e-06   1.207 0.227334    
## 섬유_puramt                  -8.648e-08  1.134e-07  -0.763 0.445703    
## 수입명품_puramt              -2.396e-09  3.774e-08  -0.063 0.949373    
## 스포츠_puramt                -5.905e-08  4.733e-08  -1.248 0.212171    
## 엘레강스캐주얼_puramt         9.086e-09  5.353e-08   0.170 0.865220    
## 영캐주얼_puramt               8.771e-08  6.722e-08   1.305 0.191976    
## 유니캐주얼_puramt             1.040e-07  8.969e-08   1.159 0.246327    
## 유아동복_puramt               9.014e-08  6.017e-08   1.498 0.134123    
## 일반식품_puramt              -7.016e-08  4.311e-08  -1.627 0.103691    
## 장신구_puramt                 1.204e-08  4.409e-08   0.273 0.784752    
## 점외_puramt                   1.408e-04  2.439e-03   0.058 0.953970    
## 정장셔츠_puramt              -5.206e-08  5.101e-08  -1.021 0.307463    
## 조리식품_puramt               1.531e-05  3.454e-05   0.443 0.657702    
## 조리욕실_puramt               2.339e-07  1.350e-07   1.732 0.083208 .  
## 청과곡물_puramt               1.317e-04  2.614e-02   0.005 0.995981    
## 침구수예_puramt               2.922e-08  6.602e-08   0.443 0.658050    
## 캐릭터캐주얼_puramt           6.061e-08  4.825e-08   1.256 0.209088    
## 타운모피_puramt              -1.189e-08  4.539e-08  -0.262 0.793287    
## 트래디셔널캐주얼_puramt       1.262e-07  9.015e-08   1.400 0.161439    
## 피혁A_puramt                 -3.810e-08  1.251e-07  -0.305 0.760659    
## 피혁B_puramt                 -2.995e-07  1.311e-07  -2.285 0.022312 *  
## `행사장(남성)_puramt`        -3.001e-06  2.512e-06  -1.195 0.232181    
## `행사장(아동스포츠)_puramt`   4.609e-06  4.710e-05   0.098 0.922033    
## `행사장(여성정장)_puramt`     7.295e-06  6.062e-06   1.203 0.228803    
## `행사장(여성캐주얼)_puramt`  -5.794e-07  2.977e-06  -0.195 0.845675    
## `행사장(여성캐쥬)_puramt`    -1.263e-07  2.110e-06  -0.060 0.952266    
## `행사장(잡화)_puramt`         1.049e-06  1.209e-05   0.087 0.930897    
## 화장품_puramt                        NA         NA      NA       NA    
## 가구_purnum                   1.042e-02  3.869e-02   0.269 0.787638    
## 가전_purnum                  -1.374e-02  1.841e-02  -0.746 0.455536    
## 기타바이어_purnum            -1.452e-04  1.779e-02  -0.008 0.993490    
## 니트단품_purnum               1.137e-02  9.864e-03   1.153 0.248965    
## 도자기크리스탈_purnum        -3.214e-02  1.768e-02  -1.818 0.069134 .  
## 디자이너부띠끄_purnum        -4.776e-03  1.205e-02  -0.396 0.691790    
## 문화완구_purnum              -2.014e-02  8.594e-03  -2.344 0.019088 *  
## 생활용품_purnum              -4.973e-02  3.464e-02  -1.435 0.151166    
## 섬유_purnum                   2.334e-02  1.053e-02   2.217 0.026607 *  
## 수입명품_purnum               1.012e-03  7.948e-03   0.127 0.898644    
## 스포츠_purnum                -1.921e-02  7.021e-03  -2.736 0.006211 ** 
## 엘레강스캐주얼_purnum        -7.223e-04  1.248e-02  -0.058 0.953843    
## 영캐주얼_purnum              -2.370e-02  9.146e-03  -2.592 0.009544 ** 
## 유니캐주얼_purnum            -2.910e-02  7.928e-03  -3.670 0.000243 ***
## 유아동복_purnum              -2.979e-02  5.936e-03  -5.019 5.19e-07 ***
## 일반식품_purnum              -9.725e-03  4.545e-03  -2.140 0.032390 *  
## 장신구_purnum                -1.161e-02  9.707e-03  -1.196 0.231564    
## 점외_purnum                          NA         NA      NA       NA    
## 정장셔츠_purnum              -3.390e-02  1.318e-02  -2.572 0.010125 *  
## 조리식품_purnum              -3.392e-01  3.702e-01  -0.916 0.359578    
## 조리욕실_purnum               1.274e-02  1.290e-02   0.988 0.323394    
## 청과곡물_purnum              -2.132e+01  2.786e+02  -0.077 0.939000    
## 침구수예_purnum               1.884e-02  2.474e-02   0.761 0.446371    
## 캐릭터캐주얼_purnum          -2.882e-02  9.736e-03  -2.961 0.003070 ** 
## 타운모피_purnum              -3.155e-02  2.144e-02  -1.472 0.141135    
## 트래디셔널캐주얼_purnum      -6.337e-02  1.278e-02  -4.958 7.11e-07 ***
## 피혁A_purnum                  2.398e-03  1.832e-02   0.131 0.895858    
## 피혁B_purnum                  5.167e-02  1.755e-02   2.944 0.003244 ** 
## `행사장(남성)_purnum`        -8.684e-01  5.555e-01  -1.563 0.118016    
## `행사장(아동스포츠)_purnum`  -7.330e-01  1.460e+00  -0.502 0.615587    
## `행사장(여성정장)_purnum`    -1.360e+00  1.243e+00  -1.093 0.274199    
## `행사장(여성캐주얼)_purnum`  -4.897e-02  1.567e-01  -0.313 0.754656    
## `행사장(여성캐쥬)_purnum`     4.576e-01  3.796e-01   1.205 0.228025    
## `행사장(잡화)_purnum`         1.145e+00  7.416e-01   1.544 0.122624    
## 화장품_purnum                        NA         NA      NA       NA    
## 가구_purexp                   2.430e-02  9.646e-02   0.252 0.801070    
## 가전_purexp                   4.267e-02  5.092e-02   0.838 0.402010    
## 기타바이어_purexp             1.565e-01  5.911e-02   2.647 0.008129 ** 
## 니트단품_purexp               1.514e-01  3.615e-02   4.189 2.81e-05 ***
## 도자기크리스탈_purexp        -3.938e-02  5.735e-02  -0.687 0.492316    
## 디자이너부띠끄_purexp        -4.889e-02  5.623e-02  -0.870 0.384549    
## 문화완구_purexp              -2.654e-01  4.284e-02  -6.195 5.82e-10 ***
## 생활용품_purexp              -3.395e-01  1.974e-01  -1.720 0.085498 .  
## 섬유_purexp                   1.163e-01  3.721e-02   3.126 0.001771 ** 
## 수입명품_purexp              -1.410e-01  4.419e-02  -3.190 0.001421 ** 
## 스포츠_purexp                -2.464e-01  3.354e-02  -7.346 2.04e-13 ***
## 엘레강스캐주얼_purexp         1.089e-01  4.379e-02   2.488 0.012847 *  
## 영캐주얼_purexp               3.849e-01  3.607e-02  10.670  < 2e-16 ***
## 유니캐주얼_purexp             4.502e-02  3.568e-02   1.262 0.207017    
## 유아동복_purexp              -1.458e-01  3.780e-02  -3.857 0.000115 ***
## 일반식품_purexp              -1.893e-01  3.364e-02  -5.626 1.84e-08 ***
## 장신구_purexp                 8.092e-02  4.091e-02   1.978 0.047950 *  
## 점외_purexp                          NA         NA      NA       NA    
## 정장셔츠_purexp              -5.522e-01  4.355e-02 -12.679  < 2e-16 ***
## 조리식품_purexp               2.381e-01  7.345e-01   0.324 0.745840    
## 조리욕실_purexp              -1.562e-01  4.793e-02  -3.260 0.001115 ** 
## 청과곡물_purexp               2.091e+00  2.448e+02   0.009 0.993183    
## 침구수예_purexp              -2.310e-02  7.008e-02  -0.330 0.741704    
## 캐릭터캐주얼_purexp           8.349e-02  3.732e-02   2.237 0.025266 *  
## 타운모피_purexp               2.167e-01  8.033e-02   2.697 0.006998 ** 
## 트래디셔널캐주얼_purexp      -2.669e-01  3.996e-02  -6.680 2.38e-11 ***
## 피혁A_purexp                  2.876e-02  3.855e-02   0.746 0.455604    
## 피혁B_purexp                  1.242e-01  4.320e-02   2.875 0.004037 ** 
## `행사장(남성)_purexp`         1.240e+00  7.236e-01   1.714 0.086601 .  
## `행사장(아동스포츠)_purexp`   1.378e+00  1.728e+00   0.797 0.425259    
## `행사장(여성정장)_purexp`     7.991e-01  3.088e+00   0.259 0.795774    
## `행사장(여성캐주얼)_purexp`  -5.178e-01  2.990e-01  -1.732 0.083289 .  
## `행사장(여성캐쥬)_purexp`    -7.401e-01  5.205e-01  -1.422 0.155019    
## `행사장(잡화)_purexp`        -1.289e+00  7.600e-01  -1.695 0.089982 .  
## 화장품_purexp                -7.111e-01  3.920e-01  -1.814 0.069646 .  
## 가구_purord                  -7.024e-03  7.616e-03  -0.922 0.356400    
## 가전_purord                  -9.937e-03  4.649e-03  -2.137 0.032578 *  
## 기타바이어_purord            -2.681e-03  5.679e-03  -0.472 0.636921    
## 니트단품_purord              -1.054e-02  4.061e-03  -2.594 0.009483 ** 
## 도자기크리스탈_purord        -3.267e-03  5.155e-03  -0.634 0.526217    
## 디자이너부띠끄_purord        -8.285e-03  5.404e-03  -1.533 0.125256    
## 문화완구_purord               9.449e-03  4.495e-03   2.102 0.035553 *  
## 생활용품_purord               2.607e-02  1.711e-02   1.524 0.127496    
## 섬유_purord                   1.732e-03  3.868e-03   0.448 0.654268    
## 수입명품_purord              -1.139e-02  4.565e-03  -2.496 0.012577 *  
## 스포츠_purord                 4.433e-03  4.073e-03   1.088 0.276477    
## 엘레강스캐주얼_purord        -7.205e-03  4.257e-03  -1.693 0.090523 .  
## 영캐주얼_purord              -2.185e-02  4.032e-03  -5.419 5.99e-08 ***
## 유니캐주얼_purord            -3.922e-03  4.122e-03  -0.951 0.341435    
## 유아동복_purord               2.827e-03  4.387e-03   0.644 0.519385    
## 일반식품_purord               1.569e-02  4.362e-03   3.598 0.000321 ***
## 장신구_purord                -3.787e-03  4.201e-03  -0.901 0.367323    
## 점외_purord                          NA         NA      NA       NA    
## 정장셔츠_purord               2.615e-02  4.412e-03   5.928 3.06e-09 ***
## 조리식품_purord               1.742e-02  6.116e-02   0.285 0.775760    
## 조리욕실_purord               2.604e-03  4.693e-03   0.555 0.578977    
## 청과곡물_purord               3.426e+00  2.014e+01   0.170 0.864919    
## 침구수예_purord              -6.913e-04  5.621e-03  -0.123 0.902114    
## 캐릭터캐주얼_purord          -8.022e-03  4.060e-03  -1.976 0.048173 *  
## 타운모피_purord              -6.919e-03  6.869e-03  -1.007 0.313827    
## 트래디셔널캐주얼_purord       2.220e-03  4.185e-03   0.531 0.595756    
## 피혁A_purord                  7.597e-04  4.055e-03   0.187 0.851402    
## 피혁B_purord                 -4.526e-03  4.150e-03  -1.090 0.275536    
## `행사장(남성)_purord`        -1.664e-02  4.402e-02  -0.378 0.705404    
## `행사장(아동스포츠)_purord`  -1.198e-01  1.066e-01  -1.124 0.260852    
## `행사장(여성정장)_purord`    -4.118e-02  1.208e-01  -0.341 0.733157    
## `행사장(여성캐주얼)_purord`   5.130e-02  2.540e-02   2.020 0.043390 *  
## `행사장(여성캐쥬)_purord`     1.668e-02  3.278e-02   0.509 0.610798    
## `행사장(잡화)_purord`        -2.282e-02  4.053e-02  -0.563 0.573368    
## 화장품_purord                -3.881e-03  4.226e-03  -0.918 0.358434    
## 가구_purpref                         NA         NA      NA       NA    
## 가전_purpref                         NA         NA      NA       NA    
## 기타바이어_purpref                   NA         NA      NA       NA    
## 니트단품_purpref                     NA         NA      NA       NA    
## 도자기크리스탈_purpref               NA         NA      NA       NA    
## 디자이너부띠끄_purpref               NA         NA      NA       NA    
## 문화완구_purpref                     NA         NA      NA       NA    
## 생활용품_purpref                     NA         NA      NA       NA    
## 섬유_purpref                         NA         NA      NA       NA    
## 수입명품_purpref                     NA         NA      NA       NA    
## 스포츠_purpref                       NA         NA      NA       NA    
## 엘레강스캐주얼_purpref               NA         NA      NA       NA    
## 영캐주얼_purpref                     NA         NA      NA       NA    
## 유니캐주얼_purpref                   NA         NA      NA       NA    
## 유아동복_purpref                     NA         NA      NA       NA    
## 일반식품_purpref                     NA         NA      NA       NA    
## 장신구_purpref                       NA         NA      NA       NA    
## 정장셔츠_purpref                     NA         NA      NA       NA    
## 조리욕실_purpref                     NA         NA      NA       NA    
## 침구수예_purpref                     NA         NA      NA       NA    
## 캐릭터캐주얼_purpref                 NA         NA      NA       NA    
## 타운모피_purpref                     NA         NA      NA       NA    
## 트래디셔널캐주얼_purpref             NA         NA      NA       NA    
## 피혁A_purpref                        NA         NA      NA       NA    
## 피혁B_purpref                        NA         NA      NA       NA    
## `행사장(남성)_purpref`               NA         NA      NA       NA    
## `행사장(여성정장)_purpref`           NA         NA      NA       NA    
## `행사장(여성캐주얼)_purpref`         NA         NA      NA       NA    
## `행사장(여성캐쥬)_purpref`           NA         NA      NA       NA    
## 화장품_purpref                       NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 61337  on 49970  degrees of freedom
## Residual deviance: 57867  on 49821  degrees of freedom
## AIC: 58167
## 
## Number of Fisher Scoring iterations: 10
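The variable list var1 was assembled by hand from the significance stars. As a sketch of doing the same selection programmatically, on simulated data: `coef(summary(fit))` holds only the estimable coefficients, so terms reported as "not defined because of singularities" (the NA rows) drop out automatically and can be listed separately from `coef(fit)`. Coefficient names for non-syntactic columns are printed with backticks, hence the `gsub()`. The data and variable names here are hypothetical.

```r
set.seed(42)
n  <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$x4 <- df$x1 + df$x2                     # exact linear combination -> aliased (NA) coefficient
df$y  <- rbinom(n, 1, plogis(-0.5 + 1.2 * df$x1))   # only x1 truly drives the outcome

fit <- glm(y ~ ., data = df, family = binomial("logit"))

tab   <- coef(summary(fit))                # non-aliased coefficients only
pvals <- tab[, "Pr(>|z|)"]
pvals <- pvals[names(pvals) != "(Intercept)"]
selected <- gsub("`", "", names(pvals)[pvals < 0.05])   # predictors with p < 0.05
aliased  <- names(coef(fit))[is.na(coef(fit))]          # dropped for singularity

print(selected)
print(aliased)
```

Applied to `lm_model1`, the same lines should give the p < 0.05 predictors that var1 lists by hand (without the response `sex`).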
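# Predictors with p < 0.05 in lm_model1, grouped by significance level (*, **, ***), plus the response sex
var1 <- list('sex', 'buy_brd', '피혁B_puramt', '문화완구_purnum', '섬유_purnum', '일반식품_purnum', '정장셔츠_purnum',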
             '엘레강스캐주얼_purexp', '장신구_purexp', '캐릭터캐주얼_purexp', '가전_purord', '문화완구_purord',
             '수입명품_purord', '캐릭터캐주얼_purord', '`행사장(여성캐주얼)_purord`',
             
             'API', '스포츠_purnum', '영캐주얼_purnum', '캐릭터캐주얼_purnum', '피혁B_purnum', '기타바이어_purexp',
             '섬유_purexp', '수입명품_purexp', '조리욕실_purexp', '타운모피_purexp', '피혁B_purexp','니트단품_purord',
             
             'nop6', 'cmv', 'pref', '유니캐주얼_purnum', '유아동복_purnum', '트래디셔널캐주얼_purnum',
             '니트단품_purexp', '문화완구_purexp', '스포츠_purexp', '영캐주얼_purexp', '유아동복_purexp', '일반식품_purexp',
             '정장셔츠_purexp', '트래디셔널캐주얼_purexp', '영캐주얼_purord', '일반식품_purord', '정장셔츠_purord'
)

var1
## [[1]]
## [1] "sex"
## 
## [[2]]
## [1] "buy_brd"
## 
## [[3]]
## [1] "피혁B_puramt"
## 
## [[4]]
## [1] "문화완구_purnum"
## 
## [[5]]
## [1] "섬유_purnum"
## 
## [[6]]
## [1] "일반식품_purnum"
## 
## [[7]]
## [1] "정장셔츠_purnum"
## 
## [[8]]
## [1] "엘레강스캐주얼_purexp"
## 
## [[9]]
## [1] "장신구_purexp"
## 
## [[10]]
## [1] "캐릭터캐주얼_purexp"
## 
## [[11]]
## [1] "가전_purord"
## 
## [[12]]
## [1] "문화완구_purord"
## 
## [[13]]
## [1] "수입명품_purord"
## 
## [[14]]
## [1] "캐릭터캐주얼_purord"
## 
## [[15]]
## [1] "`행사장(여성캐주얼)_purord`"
## 
## [[16]]
## [1] "API"
## 
## [[17]]
## [1] "스포츠_purnum"
## 
## [[18]]
## [1] "영캐주얼_purnum"
## 
## [[19]]
## [1] "캐릭터캐주얼_purnum"
## 
## [[20]]
## [1] "피혁B_purnum"
## 
## [[21]]
## [1] "기타바이어_purexp"
## 
## [[22]]
## [1] "섬유_purexp"
## 
## [[23]]
## [1] "수입명품_purexp"
## 
## [[24]]
## [1] "조리욕실_purexp"
## 
## [[25]]
## [1] "타운모피_purexp"
## 
## [[26]]
## [1] "피혁B_purexp"
## 
## [[27]]
## [1] "니트단품_purord"
## 
## [[28]]
## [1] "nop6"
## 
## [[29]]
## [1] "cmv"
## 
## [[30]]
## [1] "pref"
## 
## [[31]]
## [1] "유니캐주얼_purnum"
## 
## [[32]]
## [1] "유아동복_purnum"
## 
## [[33]]
## [1] "트래디셔널캐주얼_purnum"
## 
## [[34]]
## [1] "니트단품_purexp"
## 
## [[35]]
## [1] "문화완구_purexp"
## 
## [[36]]
## [1] "스포츠_purexp"
## 
## [[37]]
## [1] "영캐주얼_purexp"
## 
## [[38]]
## [1] "유아동복_purexp"
## 
## [[39]]
## [1] "일반식품_purexp"
## 
## [[40]]
## [1] "정장셔츠_purexp"
## 
## [[41]]
## [1] "트래디셔널캐주얼_purexp"
## 
## [[42]]
## [1] "영캐주얼_purord"
## 
## [[43]]
## [1] "일반식품_purord"
## 
## [[44]]
## [1] "정장셔츠_purord"
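# Column positions of the selected variables, matched by exact name.
# grep() would treat each name as a regular expression and also match partial
# names (e.g. 'pref' matches every *_purpref column). The backticks copied from
# the summary() output are removed before matching.
var1_idx <- match(gsub("`", "", unlist(var1)), colnames(custsig))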

str(var1_idx)
##  int [1:73] 1 4 43 57 59 66 69 97 102 109 ...
var1_idx
##  [1]   1   4  43  57  59  66  69  97 102 109 122 127 130 144   6  61  63
## [18]  74  78  88  94  95 106 110 113 124  11  14  15 156 157 158 159 160
## [35] 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177
## [52] 178 179 180 181 182 183 184 185  64  65  76  89  92  96  98 100 101
## [69] 104 111 133 136 139
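The printed `var1_idx` has 73 entries although var1 holds 44 names: `grep()` treats each name as a regular expression that may match anywhere in a column name, so 'pref' also picks up the 30 `*_purpref` columns (indices 156 to 185, which then appear in `str(custsig.fe[...])`), while the backtick-quoted '`행사장(여성캐주얼)_purord`' matches nothing. A toy comparison with made-up column names:

```r
cols   <- c("sex", "pref", "A_purpref", "B_purord", "event(W)_purord")
wanted <- c("pref", "`event(W)_purord`")

# Regex, partial matching: 'pref' also hits 'A_purpref'; the backticked name hits nothing
idx_grep <- unlist(lapply(wanted, function(w) grep(w, cols)))

# Exact matching after stripping the backticks that summary() prints
idx_exact <- match(gsub("`", "", wanted), cols)

print(idx_grep)   # 2 3
print(idx_exact)  # 2 5
```

`grep(..., fixed = TRUE)` would stop the parentheses from acting as a regex group, but it still matches substrings, so exact `match()` (or `%in%`) is the safer lookup for column names.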
custsig.fe = custsig
str(custsig.fe[c(var1_idx)])
## 'data.frame':    49971 obs. of  73 variables:
##  $ sex                       : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ buy_brd                   : int  6 4 14 10 67 13 16 8 3 8 ...
##  $ 피혁B_puramt              : num  0 83700 0 250400 1421730 ...
##  $ 문화완구_purnum           : num  0 0 0 0 3 0 0 0 0 0 ...
##  $ 섬유_purnum               : num  1 1 1 0 16 2 1 0 0 0 ...
##  $ 일반식품_purnum           : num  1 0 0 0 78 2 20 0 0 0 ...
##  $ 정장셔츠_purnum           : num  0 0 0 0 3 0 0 0 0 1 ...
##  $ 엘레강스캐주얼_purexp     : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ 장신구_purexp             : num  0 0 0 1 1 0 0 1 0 0 ...
##  $ 캐릭터캐주얼_purexp       : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 가전_purord               : num  0 0 0 5 0 0 8 0 0 0 ...
##  $ 문화완구_purord           : num  0 0 0 0 11 0 0 0 0 0 ...
##  $ 수입명품_purord           : num  0 0 0 0 0 8 0 0 0 0 ...
##  $ 캐릭터캐주얼_purord       : num  0 0 0 0 10 0 0 0 0 4 ...
##  $ API                       : num  90.75 121 24.2 51.86 3.18 ...
##  $ 스포츠_purnum             : num  2 0 4 0 5 0 0 1 0 0 ...
##  $ 영캐주얼_purnum           : num  0 1 1 2 6 0 0 4 0 2 ...
##  $ 캐릭터캐주얼_purnum       : num  0 0 0 0 7 0 0 0 0 2 ...
##  $ 피혁B_purnum              : num  0 1 0 1 11 0 0 1 0 0 ...
##  $ 기타바이어_purexp         : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ 섬유_purexp               : num  1 1 1 0 1 1 1 0 0 0 ...
##  $ 수입명품_purexp           : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ 조리욕실_purexp           : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ 타운모피_purexp           : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ 피혁B_purexp              : num  0 1 0 1 1 0 0 1 0 0 ...
##  $ 니트단품_purord           : num  0 0 0 6 13 0 12 0 0 0 ...
##  $ nop6                      : num  6 4 14 11 114 28 41 12 4 11 ...
##  $ cmv                       : num  63550 38925 89565 242727 55486 ...
##  $ pref                      : num  1 0 1 2 0 1 0 1 0 2 ...
##  $ 가구_purpref              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 가전_purpref              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 기타바이어_purpref        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 니트단품_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 도자기크리스탈_purpref    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 디자이너부띠끄_purpref    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 문화완구_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 생활용품_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 섬유_purpref              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 수입명품_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 스포츠_purpref            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 엘레강스캐주얼_purpref    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 영캐주얼_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 유니캐주얼_purpref        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 유아동복_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 일반식품_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 장신구_purpref            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 정장셔츠_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 조리욕실_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 침구수예_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 캐릭터캐주얼_purpref      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 타운모피_purpref          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 트래디셔널캐주얼_purpref  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 피혁A_purpref             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 피혁B_purpref             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(남성)_purpref      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성정장)_purpref  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성캐주얼)_purpref: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 행사장(여성캐쥬)_purpref  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 화장품_purpref            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 유니캐주얼_purnum         : num  0 0 9 0 28 5 3 1 2 0 ...
##  $ 유아동복_purnum           : num  0 0 0 0 0 2 2 0 0 1 ...
##  $ 트래디셔널캐주얼_purnum   : num  1 0 6 0 7 2 0 0 0 0 ...
##  $ 니트단품_purexp           : num  0 0 0 1 1 0 1 0 0 0 ...
##  $ 문화완구_purexp           : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ 스포츠_purexp             : num  1 0 1 0 1 0 0 0 0 0 ...
##  $ 영캐주얼_purexp           : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ 유아동복_purexp           : num  0 0 0 0 0 1 1 0 0 1 ...
##  $ 일반식품_purexp           : num  1 0 0 0 1 1 1 0 0 0 ...
##  $ 정장셔츠_purexp           : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 트래디셔널캐주얼_purexp   : num  1 0 1 0 1 1 0 0 0 0 ...
##  $ 영캐주얼_purord           : num  0 3 5 2 7 0 0 2 0 2 ...
##  $ 일반식품_purord           : num  5 0 0 0 1 7 2 0 0 0 ...
##  $ 정장셔츠_purord           : num  0 0 0 0 16 0 0 0 0 1 ...
custsig = custsig.fe[c(var1_idx)]

dim(custsig)
## [1] 49971    73
custsig_backup1 = custsig

5.3. Logistic Regression 2

lm_model_2 = glm(sex ~ ., data = custsig, family = binomial("logit"))
summary(lm_model_2)
## 
## Call:
## glm(formula = sex ~ ., family = binomial("logit"), data = custsig)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3343  -1.2340   0.6925   0.8445   2.0746  
## 
## Coefficients: (30 not defined because of singularities)
##                                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   1.484e+00  4.416e-02  33.597  < 2e-16 ***
## buy_brd                       6.645e-03  2.512e-03   2.645 0.008170 ** 
## 피혁B_puramt                 -2.851e-07  1.221e-07  -2.335 0.019547 *  
## 문화완구_purnum              -1.836e-02  6.393e-03  -2.872 0.004073 ** 
## 섬유_purnum                   2.232e-02  7.556e-03   2.953 0.003145 ** 
## 일반식품_purnum              -9.291e-03  9.647e-04  -9.630  < 2e-16 ***
## 정장셔츠_purnum              -4.680e-02  9.541e-03  -4.905 9.32e-07 ***
## 엘레강스캐주얼_purexp         6.161e-02  2.567e-02   2.400 0.016382 *  
## 장신구_purexp                 4.316e-02  2.430e-02   1.776 0.075759 .  
## 캐릭터캐주얼_purexp           9.266e-02  3.606e-02   2.570 0.010176 *  
## 가전_purord                  -8.266e-03  2.873e-03  -2.877 0.004009 ** 
## 문화완구_purord               9.863e-03  4.412e-03   2.236 0.025378 *  
## 수입명품_purord              -1.185e-02  4.338e-03  -2.731 0.006311 ** 
## 캐릭터캐주얼_purord          -9.324e-03  3.983e-03  -2.341 0.019244 *  
## API                          -7.851e-04  1.848e-04  -4.248 2.16e-05 ***
## 스포츠_purnum                -2.664e-02  3.754e-03  -7.098 1.26e-12 ***
## 영캐주얼_purnum              -1.140e-02  4.708e-03  -2.421 0.015493 *  
## 캐릭터캐주얼_purnum          -1.214e-02  5.192e-03  -2.337 0.019428 *  
## 피혁B_purnum                  5.949e-02  1.663e-02   3.578 0.000346 ***
## 기타바이어_purexp             1.311e-01  3.047e-02   4.302 1.69e-05 ***
## 섬유_purexp                   1.244e-01  2.708e-02   4.594 4.35e-06 ***
## 수입명품_purexp              -1.477e-01  3.874e-02  -3.813 0.000137 ***
## 조리욕실_purexp              -1.142e-01  2.591e-02  -4.406 1.05e-05 ***
## 타운모피_purexp               9.115e-02  3.998e-02   2.280 0.022628 *  
## 피혁B_purexp                  9.079e-02  3.099e-02   2.929 0.003398 ** 
## 니트단품_purord              -1.241e-02  3.834e-03  -3.237 0.001209 ** 
## nop6                          8.730e-03  1.254e-03   6.961 3.37e-12 ***
## cmv                          -6.682e-07  1.343e-07  -4.975 6.52e-07 ***
## pref                         -1.567e-01  2.022e-02  -7.751 9.12e-15 ***
## 가구_purpref                         NA         NA      NA       NA    
## 가전_purpref                         NA         NA      NA       NA    
## 기타바이어_purpref                   NA         NA      NA       NA    
## 니트단품_purpref                     NA         NA      NA       NA    
## 도자기크리스탈_purpref               NA         NA      NA       NA    
## 디자이너부띠끄_purpref               NA         NA      NA       NA    
## 문화완구_purpref                     NA         NA      NA       NA    
## 생활용품_purpref                     NA         NA      NA       NA    
## 섬유_purpref                         NA         NA      NA       NA    
## 수입명품_purpref                     NA         NA      NA       NA    
## 스포츠_purpref                       NA         NA      NA       NA    
## 엘레강스캐주얼_purpref               NA         NA      NA       NA    
## 영캐주얼_purpref                     NA         NA      NA       NA    
## 유니캐주얼_purpref                   NA         NA      NA       NA    
## 유아동복_purpref                     NA         NA      NA       NA    
## 일반식품_purpref                     NA         NA      NA       NA    
## 장신구_purpref                       NA         NA      NA       NA    
## 정장셔츠_purpref                     NA         NA      NA       NA    
## 조리욕실_purpref                     NA         NA      NA       NA    
## 침구수예_purpref                     NA         NA      NA       NA    
## 캐릭터캐주얼_purpref                 NA         NA      NA       NA    
## 타운모피_purpref                     NA         NA      NA       NA    
## 트래디셔널캐주얼_purpref             NA         NA      NA       NA    
## 피혁A_purpref                        NA         NA      NA       NA    
## 피혁B_purpref                        NA         NA      NA       NA    
## `행사장(남성)_purpref`               NA         NA      NA       NA    
## `행사장(여성정장)_purpref`           NA         NA      NA       NA    
## `행사장(여성캐주얼)_purpref`         NA         NA      NA       NA    
## `행사장(여성캐쥬)_purpref`           NA         NA      NA       NA    
## 화장품_purpref                       NA         NA      NA       NA    
## 유니캐주얼_purnum            -1.625e-02  3.637e-03  -4.469 7.87e-06 ***
## 유아동복_purnum              -2.158e-02  2.799e-03  -7.709 1.27e-14 ***
## 트래디셔널캐주얼_purnum      -4.826e-02  7.190e-03  -6.712 1.92e-11 ***
## 니트단품_purexp               1.754e-01  3.032e-02   5.787 7.15e-09 ***
## 문화완구_purexp              -2.726e-01  4.138e-02  -6.587 4.48e-11 ***
## 스포츠_purexp                -2.286e-01  2.526e-02  -9.049  < 2e-16 ***
## 영캐주얼_purexp               3.911e-01  3.473e-02  11.264  < 2e-16 ***
## 유아동복_purexp              -1.369e-01  2.528e-02  -5.418 6.04e-08 ***
## 일반식품_purexp              -2.179e-01  3.173e-02  -6.868 6.51e-12 ***
## 정장셔츠_purexp              -5.495e-01  4.242e-02 -12.955  < 2e-16 ***
## 트래디셔널캐주얼_purexp      -2.525e-01  2.789e-02  -9.054  < 2e-16 ***
## 영캐주얼_purord              -2.240e-02  3.965e-03  -5.649 1.61e-08 ***
## 일반식품_purord               1.766e-02  4.195e-03   4.210 2.56e-05 ***
## 정장셔츠_purord               2.553e-02  4.337e-03   5.887 3.94e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 61337  on 49970  degrees of freedom
## Residual deviance: 58035  on 49928  degrees of freedom
## AIC: 58121
## 
## Number of Fisher Scoring iterations: 4
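The 30 `*_purpref` rows above are `NA` because `glm()` could not estimate them separately from the remaining columns. This happens, for instance, when a column is constant or is a linear combination of other columns. Aliased terms, and terms above a p-value cutoff, can be extracted from a fitted model instead of being read off by eye. The sketch below uses simulated data with the hypothetical columns `x_signal` and `x_const`. The 0.05 cutoff is an assumption for illustration:

```r
set.seed(42)
n <- 2000
sim <- data.frame(
  x_signal = rnorm(n),  # truly related to the outcome
  x_const  = 0          # constant column: aliased, so its coefficient is NA
)
sim$y <- rbinom(n, 1, plogis(0.5 + 1.5 * sim$x_signal))

fit <- glm(y ~ ., data = sim, family = binomial("logit"))

# 1) Aliased terms: coef() reports NA for columns glm() cannot estimate
aliased <- names(which(is.na(coef(fit))))

# 2) summary()$coefficients holds only the estimable terms;
#    keep predictors whose p-value is below the assumed 0.05 cutoff
coefs <- summary(fit)$coefficients
keep <- setdiff(rownames(coefs)[coefs[, "Pr(>|z|)"] < 0.05], "(Intercept)")

print(aliased)  # "x_const"
print(keep)     # "x_signal"
```

`coef(lm_model_2)` and `summary(lm_model_2)$coefficients` work the same way on the real fit. However, non-syntactic names such as `행사장(남성)_purpref` come back wrapped in backticks. The backticks must be stripped before these names are matched against `colnames(custsig)`.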
var2 <- list('sex', '피혁B_puramt', '엘레강스캐주얼_purexp', '캐릭터캐주얼_purexp', '문화완구_purord',
             '캐릭터캐주얼_purord', '영캐주얼_purnum', '캐릭터캐주얼_purnum','타운모피_purexp',
             
             '문화완구_purnum', '섬유_purnum', '가전_purord', '수입명품_purord', '피혁B_purexp', '니트단품_purord',
             
             'buy_brd', '일반식품_purnum', '정장셔츠_purnum', 'API', '스포츠_purnum', '피혁B_purnum', '기타바이어_purexp',
             # '섬유_purexp', '수입명품_purexp', '조리욕실_purexp', 'nop6', 'cmv', 'pref', '유니캐주얼_purnum',
             '섬유_purexp', '수입명품_purexp', '조리욕실_purexp', 'nop6', 'cmv', '유니캐주얼_purnum',
             '유아동복_purnum', '트래디셔널캐주얼_purnum', '니트단품_purexp', '문화완구_purexp', '스포츠_purexp',
             '영캐주얼_purexp', '유아동복_purexp', '일반식품_purexp', '정장셔츠_purexp', '트래디셔널캐주얼_purexp',
             '영캐주얼_purord', '일반식품_purord', '정장셔츠_purord')

var2
## [[1]]
## [1] "sex"
## 
## [[2]]
## [1] "피혁B_puramt"
## 
## [[3]]
## [1] "엘레강스캐주얼_purexp"
## 
## [[4]]
## [1] "캐릭터캐주얼_purexp"
## 
## [[5]]
## [1] "문화완구_purord"
## 
## [[6]]
## [1] "캐릭터캐주얼_purord"
## 
## [[7]]
## [1] "영캐주얼_purnum"
## 
## [[8]]
## [1] "캐릭터캐주얼_purnum"
## 
## [[9]]
## [1] "타운모피_purexp"
## 
## [[10]]
## [1] "문화완구_purnum"
## 
## [[11]]
## [1] "섬유_purnum"
## 
## [[12]]
## [1] "가전_purord"
## 
## [[13]]
## [1] "수입명품_purord"
## 
## [[14]]
## [1] "피혁B_purexp"
## 
## [[15]]
## [1] "니트단품_purord"
## 
## [[16]]
## [1] "buy_brd"
## 
## [[17]]
## [1] "일반식품_purnum"
## 
## [[18]]
## [1] "정장셔츠_purnum"
## 
## [[19]]
## [1] "API"
## 
## [[20]]
## [1] "스포츠_purnum"
## 
## [[21]]
## [1] "피혁B_purnum"
## 
## [[22]]
## [1] "기타바이어_purexp"
## 
## [[23]]
## [1] "섬유_purexp"
## 
## [[24]]
## [1] "수입명품_purexp"
## 
## [[25]]
## [1] "조리욕실_purexp"
## 
## [[26]]
## [1] "nop6"
## 
## [[27]]
## [1] "cmv"
## 
## [[28]]
## [1] "유니캐주얼_purnum"
## 
## [[29]]
## [1] "유아동복_purnum"
## 
## [[30]]
## [1] "트래디셔널캐주얼_purnum"
## 
## [[31]]
## [1] "니트단품_purexp"
## 
## [[32]]
## [1] "문화완구_purexp"
## 
## [[33]]
## [1] "스포츠_purexp"
## 
## [[34]]
## [1] "영캐주얼_purexp"
## 
## [[35]]
## [1] "유아동복_purexp"
## 
## [[36]]
## [1] "일반식품_purexp"
## 
## [[37]]
## [1] "정장셔츠_purexp"
## 
## [[38]]
## [1] "트래디셔널캐주얼_purexp"
## 
## [[39]]
## [1] "영캐주얼_purord"
## 
## [[40]]
## [1] "일반식품_purord"
## 
## [[41]]
## [1] "정장셔츠_purord"
var2_idx <- c()

for (i in var2){
  gg <- grep(i, colnames(custsig))
  var2_idx <- c(var2_idx, gg)
}

str(var2_idx)
##  int [1:41] 1 3 8 10 12 14 17 18 24 4 ...
var2_idx
##  [1]  1  3  8 10 12 14 17 18 24  4  5 11 13 25 26  2  6  7 15 16 19 20 21
## [24] 22 23 27 28 60 61 62 63 64 65 66 67 68 69 70 71 72 73
custsig.fe = custsig
str(custsig.fe[c(var2_idx)])
## 'data.frame':    49971 obs. of  41 variables:
##  $ sex                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 피혁B_puramt           : num  0 83700 0 250400 1421730 ...
##  $ 엘레강스캐주얼_purexp  : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ 캐릭터캐주얼_purexp    : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 문화완구_purord        : num  0 0 0 0 11 0 0 0 0 0 ...
##  $ 캐릭터캐주얼_purord    : num  0 0 0 0 10 0 0 0 0 4 ...
##  $ 영캐주얼_purnum        : num  0 1 1 2 6 0 0 4 0 2 ...
##  $ 캐릭터캐주얼_purnum    : num  0 0 0 0 7 0 0 0 0 2 ...
##  $ 타운모피_purexp        : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ 문화완구_purnum        : num  0 0 0 0 3 0 0 0 0 0 ...
##  $ 섬유_purnum            : num  1 1 1 0 16 2 1 0 0 0 ...
##  $ 가전_purord            : num  0 0 0 5 0 0 8 0 0 0 ...
##  $ 수입명품_purord        : num  0 0 0 0 0 8 0 0 0 0 ...
##  $ 피혁B_purexp           : num  0 1 0 1 1 0 0 1 0 0 ...
##  $ 니트단품_purord        : num  0 0 0 6 13 0 12 0 0 0 ...
##  $ buy_brd                : int  6 4 14 10 67 13 16 8 3 8 ...
##  $ 일반식품_purnum        : num  1 0 0 0 78 2 20 0 0 0 ...
##  $ 정장셔츠_purnum        : num  0 0 0 0 3 0 0 0 0 1 ...
##  $ API                    : num  90.75 121 24.2 51.86 3.18 ...
##  $ 스포츠_purnum          : num  2 0 4 0 5 0 0 1 0 0 ...
##  $ 피혁B_purnum           : num  0 1 0 1 11 0 0 1 0 0 ...
##  $ 기타바이어_purexp      : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ 섬유_purexp            : num  1 1 1 0 1 1 1 0 0 0 ...
##  $ 수입명품_purexp        : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ 조리욕실_purexp        : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ nop6                   : num  6 4 14 11 114 28 41 12 4 11 ...
##  $ cmv                    : num  63550 38925 89565 242727 55486 ...
##  $ 유니캐주얼_purnum      : num  0 0 9 0 28 5 3 1 2 0 ...
##  $ 유아동복_purnum        : num  0 0 0 0 0 2 2 0 0 1 ...
##  $ 트래디셔널캐주얼_purnum: num  1 0 6 0 7 2 0 0 0 0 ...
##  $ 니트단품_purexp        : num  0 0 0 1 1 0 1 0 0 0 ...
##  $ 문화완구_purexp        : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ 스포츠_purexp          : num  1 0 1 0 1 0 0 0 0 0 ...
##  $ 영캐주얼_purexp        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ 유아동복_purexp        : num  0 0 0 0 0 1 1 0 0 1 ...
##  $ 일반식품_purexp        : num  1 0 0 0 1 1 1 0 0 0 ...
##  $ 정장셔츠_purexp        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 트래디셔널캐주얼_purexp: num  1 0 1 0 1 1 0 0 0 0 ...
##  $ 영캐주얼_purord        : num  0 3 5 2 7 0 0 2 0 2 ...
##  $ 일반식품_purord        : num  5 0 0 0 1 7 2 0 0 0 ...
##  $ 정장셔츠_purord        : num  0 0 0 0 16 0 0 0 0 1 ...
custsig = custsig.fe[c(var2_idx)]

dim(custsig)
## [1] 49971    41
custsig_backup2 = custsig

5.4. Logistic Regression 3

lm_model_3 = glm(sex ~ ., data = custsig, family = binomial("logit"))
summary(lm_model_3)
## 
## Call:
## glm(formula = sex ~ ., family = binomial("logit"), data = custsig)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3453  -1.2367   0.6975   0.8397   2.0770  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              1.388e+00  4.185e-02  33.159  < 2e-16 ***
## 피혁B_puramt            -3.563e-07  1.221e-07  -2.919  0.00351 ** 
## 엘레강스캐주얼_purexp    4.790e-02  2.561e-02   1.870  0.06146 .  
## 캐릭터캐주얼_purexp      6.893e-02  3.593e-02   1.919  0.05504 .  
## 문화완구_purord          9.563e-03  4.405e-03   2.171  0.02994 *  
## 캐릭터캐주얼_purord     -8.186e-03  3.980e-03  -2.057  0.03969 *  
## 영캐주얼_purnum         -1.189e-02  4.706e-03  -2.526  0.01154 *  
## 캐릭터캐주얼_purnum     -1.307e-02  5.190e-03  -2.519  0.01176 *  
## 타운모피_purexp          7.783e-02  3.996e-02   1.948  0.05146 .  
## 문화완구_purnum         -1.875e-02  6.379e-03  -2.938  0.00330 ** 
## 섬유_purnum              2.389e-02  7.561e-03   3.160  0.00158 ** 
## 가전_purord             -9.208e-03  2.868e-03  -3.211  0.00132 ** 
## 수입명품_purord         -1.134e-02  4.338e-03  -2.615  0.00893 ** 
## 피혁B_purexp             9.075e-02  3.098e-02   2.930  0.00339 ** 
## 니트단품_purord         -1.241e-02  3.827e-03  -3.244  0.00118 ** 
## buy_brd                  7.224e-03  2.458e-03   2.939  0.00329 ** 
## 일반식품_purnum         -8.357e-03  9.551e-04  -8.750  < 2e-16 ***
## 정장셔츠_purnum         -4.958e-02  9.532e-03  -5.202 1.97e-07 ***
## API                     -7.472e-04  1.839e-04  -4.064 4.83e-05 ***
## 스포츠_purnum           -2.819e-02  3.733e-03  -7.553 4.26e-14 ***
## 피혁B_purnum             6.585e-02  1.662e-02   3.961 7.45e-05 ***
## 기타바이어_purexp        1.333e-01  3.040e-02   4.383 1.17e-05 ***
## 섬유_purexp              1.278e-01  2.707e-02   4.722 2.33e-06 ***
## 수입명품_purexp         -1.768e-01  3.861e-02  -4.578 4.69e-06 ***
## 조리욕실_purexp         -1.149e-01  2.587e-02  -4.441 8.94e-06 ***
## nop6                     8.376e-03  1.252e-03   6.691 2.22e-11 ***
## cmv                     -1.253e-06  1.145e-07 -10.949  < 2e-16 ***
## 유니캐주얼_purnum       -1.493e-02  3.625e-03  -4.119 3.81e-05 ***
## 유아동복_purnum         -2.110e-02  2.782e-03  -7.584 3.35e-14 ***
## 트래디셔널캐주얼_purnum -4.881e-02  7.183e-03  -6.795 1.08e-11 ***
## 니트단품_purexp          1.766e-01  3.026e-02   5.836 5.33e-09 ***
## 문화완구_purexp         -2.627e-01  4.130e-02  -6.360 2.01e-10 ***
## 스포츠_purexp           -2.345e-01  2.524e-02  -9.288  < 2e-16 ***
## 영캐주얼_purexp          3.871e-01  3.471e-02  11.154  < 2e-16 ***
## 유아동복_purexp         -1.322e-01  2.524e-02  -5.237 1.63e-07 ***
## 일반식품_purexp         -1.972e-01  3.156e-02  -6.248 4.15e-10 ***
## 정장셔츠_purexp         -5.618e-01  4.238e-02 -13.256  < 2e-16 ***
## 트래디셔널캐주얼_purexp -2.579e-01  2.786e-02  -9.257  < 2e-16 ***
## 영캐주얼_purord         -2.206e-02  3.961e-03  -5.569 2.56e-08 ***
## 일반식품_purord          1.660e-02  4.186e-03   3.965 7.35e-05 ***
## 정장셔츠_purord          2.580e-02  4.334e-03   5.953 2.63e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 61337  on 49970  degrees of freedom
## Residual deviance: 58098  on 49930  degrees of freedom
## AIC: 58180
## 
## Number of Fisher Scoring iterations: 4
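Model 3 drops two predictors that model 2 estimated: `pref` and `장신구_purexp`. The two fits are therefore nested, and they can be compared with a likelihood-ratio test on their residual deviances. The following is a minimal sketch that uses only the rounded figures printed in the two summaries:

```r
# Figures copied from the two summary() outputs above
dev_m2 <- 58035; resid_df_m2 <- 49928   # Logistic Regression 2
dev_m3 <- 58098; resid_df_m3 <- 49930   # Logistic Regression 3
n_obs  <- 49971

# Likelihood-ratio test: model 3's predictors are a subset of model 2's,
# so under H0 the deviance increase is approximately chi-squared
lr_stat <- dev_m3 - dev_m2              # 63
lr_df   <- resid_df_m3 - resid_df_m2    # 2
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)  # roughly 2e-14

# For a 0/1 binomial response, AIC = residual deviance + 2 * (number of coefficients)
k_m2 <- n_obs - resid_df_m2             # 43 estimated coefficients
k_m3 <- n_obs - resid_df_m3             # 41 estimated coefficients
aic_m2 <- dev_m2 + 2 * k_m2             # 58121
aic_m3 <- dev_m3 + 2 * k_m3             # 58180
```

Both checks point the same way. The deviance rises by about 63 on 2 degrees of freedom, and AIC rises from 58121 to 58180. This suggests that the two dropped terms carried information the other predictors do not. With the fitted objects in memory, `anova(lm_model_3, lm_model_2, test = "Chisq")` runs the same test on the unrounded deviances.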
var3 <- list('sex', '피혁B_puramt', '문화완구_purord',
             '캐릭터캐주얼_purord', '영캐주얼_purnum', '캐릭터캐주얼_purnum',
             
             '문화완구_purnum', '섬유_purnum', '가전_purord', '수입명품_purord', '피혁B_purexp', '니트단품_purord',
             
             'buy_brd', '일반식품_purnum', '정장셔츠_purnum', 'API', '스포츠_purnum', '피혁B_purnum', '기타바이어_purexp',
             '섬유_purexp', '수입명품_purexp', '조리욕실_purexp', 'nop6', 'cmv', '유니캐주얼_purnum',
             '유아동복_purnum', '트래디셔널캐주얼_purnum', '니트단품_purexp', '문화완구_purexp', '스포츠_purexp',
             '영캐주얼_purexp', '유아동복_purexp', '일반식품_purexp', '정장셔츠_purexp', '트래디셔널캐주얼_purexp',
             '영캐주얼_purord', '일반식품_purord', '정장셔츠_purord')

var3
## [[1]]
## [1] "sex"
## 
## [[2]]
## [1] "피혁B_puramt"
## 
## [[3]]
## [1] "문화완구_purord"
## 
## [[4]]
## [1] "캐릭터캐주얼_purord"
## 
## [[5]]
## [1] "영캐주얼_purnum"
## 
## [[6]]
## [1] "캐릭터캐주얼_purnum"
## 
## [[7]]
## [1] "문화완구_purnum"
## 
## [[8]]
## [1] "섬유_purnum"
## 
## [[9]]
## [1] "가전_purord"
## 
## [[10]]
## [1] "수입명품_purord"
## 
## [[11]]
## [1] "피혁B_purexp"
## 
## [[12]]
## [1] "니트단품_purord"
## 
## [[13]]
## [1] "buy_brd"
## 
## [[14]]
## [1] "일반식품_purnum"
## 
## [[15]]
## [1] "정장셔츠_purnum"
## 
## [[16]]
## [1] "API"
## 
## [[17]]
## [1] "스포츠_purnum"
## 
## [[18]]
## [1] "피혁B_purnum"
## 
## [[19]]
## [1] "기타바이어_purexp"
## 
## [[20]]
## [1] "섬유_purexp"
## 
## [[21]]
## [1] "수입명품_purexp"
## 
## [[22]]
## [1] "조리욕실_purexp"
## 
## [[23]]
## [1] "nop6"
## 
## [[24]]
## [1] "cmv"
## 
## [[25]]
## [1] "유니캐주얼_purnum"
## 
## [[26]]
## [1] "유아동복_purnum"
## 
## [[27]]
## [1] "트래디셔널캐주얼_purnum"
## 
## [[28]]
## [1] "니트단품_purexp"
## 
## [[29]]
## [1] "문화완구_purexp"
## 
## [[30]]
## [1] "스포츠_purexp"
## 
## [[31]]
## [1] "영캐주얼_purexp"
## 
## [[32]]
## [1] "유아동복_purexp"
## 
## [[33]]
## [1] "일반식품_purexp"
## 
## [[34]]
## [1] "정장셔츠_purexp"
## 
## [[35]]
## [1] "트래디셔널캐주얼_purexp"
## 
## [[36]]
## [1] "영캐주얼_purord"
## 
## [[37]]
## [1] "일반식품_purord"
## 
## [[38]]
## [1] "정장셔츠_purord"
var3_idx <- c()

for (i in var3){
  gg <- grep(i, colnames(custsig))
  var3_idx <- c(var3_idx, gg)
}

str(var3_idx)
##  int [1:38] 1 2 5 6 7 8 10 11 12 13 ...
var3_idx
##  [1]  1  2  5  6  7  8 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## [24] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
custsig.fe = custsig
str(custsig.fe[c(var3_idx)])
## 'data.frame':    49971 obs. of  38 variables:
##  $ sex                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 피혁B_puramt           : num  0 83700 0 250400 1421730 ...
##  $ 문화완구_purord        : num  0 0 0 0 11 0 0 0 0 0 ...
##  $ 캐릭터캐주얼_purord    : num  0 0 0 0 10 0 0 0 0 4 ...
##  $ 영캐주얼_purnum        : num  0 1 1 2 6 0 0 4 0 2 ...
##  $ 캐릭터캐주얼_purnum    : num  0 0 0 0 7 0 0 0 0 2 ...
##  $ 문화완구_purnum        : num  0 0 0 0 3 0 0 0 0 0 ...
##  $ 섬유_purnum            : num  1 1 1 0 16 2 1 0 0 0 ...
##  $ 가전_purord            : num  0 0 0 5 0 0 8 0 0 0 ...
##  $ 수입명품_purord        : num  0 0 0 0 0 8 0 0 0 0 ...
##  $ 피혁B_purexp           : num  0 1 0 1 1 0 0 1 0 0 ...
##  $ 니트단품_purord        : num  0 0 0 6 13 0 12 0 0 0 ...
##  $ buy_brd                : int  6 4 14 10 67 13 16 8 3 8 ...
##  $ 일반식품_purnum        : num  1 0 0 0 78 2 20 0 0 0 ...
##  $ 정장셔츠_purnum        : num  0 0 0 0 3 0 0 0 0 1 ...
##  $ API                    : num  90.75 121 24.2 51.86 3.18 ...
##  $ 스포츠_purnum          : num  2 0 4 0 5 0 0 1 0 0 ...
##  $ 피혁B_purnum           : num  0 1 0 1 11 0 0 1 0 0 ...
##  $ 기타바이어_purexp      : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ 섬유_purexp            : num  1 1 1 0 1 1 1 0 0 0 ...
##  $ 수입명품_purexp        : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ 조리욕실_purexp        : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ nop6                   : num  6 4 14 11 114 28 41 12 4 11 ...
##  $ cmv                    : num  63550 38925 89565 242727 55486 ...
##  $ 유니캐주얼_purnum      : num  0 0 9 0 28 5 3 1 2 0 ...
##  $ 유아동복_purnum        : num  0 0 0 0 0 2 2 0 0 1 ...
##  $ 트래디셔널캐주얼_purnum: num  1 0 6 0 7 2 0 0 0 0 ...
##  $ 니트단품_purexp        : num  0 0 0 1 1 0 1 0 0 0 ...
##  $ 문화완구_purexp        : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ 스포츠_purexp          : num  1 0 1 0 1 0 0 0 0 0 ...
##  $ 영캐주얼_purexp        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ 유아동복_purexp        : num  0 0 0 0 0 1 1 0 0 1 ...
##  $ 일반식품_purexp        : num  1 0 0 0 1 1 1 0 0 0 ...
##  $ 정장셔츠_purexp        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 트래디셔널캐주얼_purexp: num  1 0 1 0 1 1 0 0 0 0 ...
##  $ 영캐주얼_purord        : num  0 3 5 2 7 0 0 2 0 2 ...
##  $ 일반식품_purord        : num  5 0 0 0 1 7 2 0 0 0 ...
##  $ 정장셔츠_purord        : num  0 0 0 0 16 0 0 0 0 1 ...
custsig = custsig.fe[c(var3_idx)]


dim(custsig)
## [1] 49971    38
custsig_backup3 = custsig

5.5. Decision Tree : C5.0

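# C5.0 decision tree for sex on all 73 columns of custsig_backup1
# winnow = FALSE: no built-in predictor winnowing (feature selection)
# noGlobalPruning = FALSE: the final global pruning step is kept
c5_options <- C5.0Control(winnow = FALSE, noGlobalPruning = FALSE)
# rules = FALSE: fit a tree rather than a rule set
c5_model <- C5.0(as.factor(sex) ~ ., data=custsig_backup1, control=c5_options, rules=FALSE)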

summary(c5_model)
## 
## Call:
## C5.0.formula(formula = as.factor(sex) ~ ., data = custsig_backup1,
##  control = c5_options, rules = FALSE)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Sun Dec  9 13:53:44 2018
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 49971 cases (73 attributes) from undefined.data
## 
## Decision tree:
## 
## 정장셔츠_purexp <= 0:
## :...유아동복_purnum > 2:
## :   :...유아동복_purnum <= 8: 1 (3833/1286)
## :   :   유아동복_purnum > 8:
## :   :   :...문화완구_purnum <= 11: 1 (1569/649)
## :   :       문화완구_purnum > 11: 0 (140/57)
## :   유아동복_purnum <= 2:
## :   :...트래디셔널캐주얼_purnum > 3:
## :       :...니트단품_purord <= 12: 1 (1030/382)
## :       :   니트단품_purord > 12: 0 (66/25)
## :       트래디셔널캐주얼_purnum <= 3:
## :       :...buy_brd <= 56:
## :           :...문화완구_purnum <= 3: 1 (28550/6670)
## :           :   문화완구_purnum > 3:
## :           :   :...스포츠_purnum <= 6: 1 (320/107)
## :           :       스포츠_purnum > 6: 0 (22/4)
## :           buy_brd > 56:
## :           :...일반식품_purnum > 49: 0 (15/2)
## :               일반식품_purnum <= 49:
## :               :...유니캐주얼_purnum <= 17: 1 (32/5)
## :                   유니캐주얼_purnum > 17: 0 (11/2)
## 정장셔츠_purexp > 0:
## :...수입명품_purexp <= 0:
##     :...정장셔츠_purnum > 1:
##     :   :...스포츠_purnum > 5:
##     :   :   :...기타바이어_purexp <= 0: 0 (610/274)
##     :   :   :   기타바이어_purexp > 0: 1 (372/146)
##     :   :   스포츠_purnum <= 5:
##     :   :   :...정장셔츠_purord > 9: 1 (642/200)
##     :   :       정장셔츠_purord <= 9:
##     :   :       :...장신구_purexp <= 0:
##     :   :           :...영캐주얼_purexp <= 0:
##     :   :           :   :...pref <= 1: 1 (818/360)
##     :   :           :   :   pref > 1: 0 (487/233)
##     :   :           :   영캐주얼_purexp > 0:
##     :   :           :   :...API <= 5.950819: 0 (30/8)
##     :   :           :       API > 5.950819: 1 (794/304)
##     :   :           장신구_purexp > 0:
##     :   :           :...정장셔츠_purnum <= 5: 1 (936/324)
##     :   :               정장셔츠_purnum > 5:
##     :   :               :...정장셔츠_purord <= 4: 1 (53/19)
##     :   :                   정장셔츠_purord > 4: 0 (34/8)
##     :   정장셔츠_purnum <= 1:
##     :   :...pref <= 1: 1 (3479/1116)
##     :       pref > 1:
##     :       :...영캐주얼_purexp > 0: 1 (546/192)
##     :           영캐주얼_purexp <= 0:
##     :           :...니트단품_purexp > 0: 1 (301/113)
##     :               니트단품_purexp <= 0:
##     :               :...피혁B_purnum > 1: 1 (52/16)
##     :                   피혁B_purnum <= 1:
##     :                   :...섬유_purexp <= 0: 1 (275/131)
##     :                       섬유_purexp > 0:
##     :                       :...유니캐주얼_purnum <= 2: 0 (108/37)
##     :                           유니캐주얼_purnum > 2: 1 (14/3)
##     수입명품_purexp > 0:
##     :...유아동복_purnum > 11: 0 (548/234)
##         유아동복_purnum <= 11:
##         :...유니캐주얼_purnum > 7: 0 (486/218)
##             유니캐주얼_purnum <= 7:
##             :...문화완구_purnum > 12: 0 (25/5)
##                 문화완구_purnum <= 12:
##                 :...섬유_purnum > 3: 1 (940/359)
##                     섬유_purnum <= 3:
##                     :...정장셔츠_purnum > 5: 0 (189/74)
##                         정장셔츠_purnum <= 5:
##                         :...피혁B_purnum > 1: 1 (413/149)
##                             피혁B_purnum <= 1:
##                             :...스포츠_purexp <= 0:
##                                 :...문화완구_purnum <= 2: 1 (595/236)
##                                 :   문화완구_purnum > 2: 0 (47/18)
##                                 스포츠_purexp > 0:
##                                 :...니트단품_purord <= 8: 1 (1222/566)
##                                     니트단품_purord > 8:
##                                     :...기타바이어_purexp <= 0: 0 (299/119)
##                                         기타바이어_purexp > 0: 1 (68/30)
## 
## 
## Evaluation on training data (49971 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##      39 14681(29.4%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##    1799 13363    (a): class 0
##    1318 33491    (b): class 1
## 
## 
##  Attribute usage:
## 
##  100.00% 정장셔츠_purexp
##   80.89% 유아동복_purnum
##   68.84% 문화완구_purnum
##   60.13% 트래디셔널캐주얼_purnum
##   57.93% buy_brd
##   28.78% 수입명품_purexp
##   24.78% 정장셔츠_purnum
##   12.17% pref
##   10.24% 스포츠_purnum
##    8.90% 유니캐주얼_purnum
##    7.59% 정장셔츠_purord
##    7.55% 섬유_purnum
##    6.85% 영캐주얼_purexp
##    6.31% 장신구_purexp
##    6.19% 피혁B_purnum
##    5.37% 니트단품_purord
##    4.46% 스포츠_purexp
##    2.70% 기타바이어_purexp
##    1.65% API
##    1.50% 니트단품_purexp
##    0.79% 섬유_purexp
##    0.12% 일반식품_purnum
## 
## 
## Time: 3.5 secs
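The evaluation above is on the training data, so it is likely to be optimistic. In the confusion matrix, rows are the actual class and columns are the predicted class. A few lines of arithmetic put the 29.4% error in context against a baseline model that always predicts class 1:

```r
# Training-set confusion matrix copied from the C5.0 summary above
# rows = actual class, columns = predicted class
cm <- matrix(c(1799, 13363,
               1318, 33491),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

accuracy <- sum(diag(cm)) / sum(cm)          # about 0.706 (error about 0.294)
baseline <- max(rowSums(cm)) / sum(cm)       # always predict class 1: about 0.697
recall_0 <- cm["0", "0"] / sum(cm["0", ])    # about 0.119
recall_1 <- cm["1", "1"] / sum(cm["1", ])    # about 0.962
```

On these figures, the tree is only about one percentage point more accurate than always predicting class 1, and it recovers roughly 12% of class 0 customers. Accuracy alone therefore overstates how well the tree separates the two classes.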
var_dt <- list('sex', '정장셔츠_purexp','유아동복_purnum','문화완구_purnum','트래디셔널캐주얼_purnum',
               # 'buy_brd','수입명품_purexp','정장셔츠_purnum','pref','스포츠_purnum','유니캐주얼_purnum',
               'buy_brd','수입명품_purexp','정장셔츠_purnum','스포츠_purnum','유니캐주얼_purnum',
               '정장셔츠_purord','섬유_purnum','영캐주얼_purexp','장신구_purexp','피혁B_purnum','니트단품_purord',
               '스포츠_purexp','기타바이어_purexp','API','니트단품_purexp','섬유_purexp','일반식품_purnum')

var_dt
## [[1]]
## [1] "sex"
## 
## [[2]]
## [1] "정장셔츠_purexp"
## 
## [[3]]
## [1] "유아동복_purnum"
## 
## [[4]]
## [1] "문화완구_purnum"
## 
## [[5]]
## [1] "트래디셔널캐주얼_purnum"
## 
## [[6]]
## [1] "buy_brd"
## 
## [[7]]
## [1] "수입명품_purexp"
## 
## [[8]]
## [1] "정장셔츠_purnum"
## 
## [[9]]
## [1] "스포츠_purnum"
## 
## [[10]]
## [1] "유니캐주얼_purnum"
## 
## [[11]]
## [1] "정장셔츠_purord"
## 
## [[12]]
## [1] "섬유_purnum"
## 
## [[13]]
## [1] "영캐주얼_purexp"
## 
## [[14]]
## [1] "장신구_purexp"
## 
## [[15]]
## [1] "피혁B_purnum"
## 
## [[16]]
## [1] "니트단품_purord"
## 
## [[17]]
## [1] "스포츠_purexp"
## 
## [[18]]
## [1] "기타바이어_purexp"
## 
## [[19]]
## [1] "API"
## 
## [[20]]
## [1] "니트단품_purexp"
## 
## [[21]]
## [1] "섬유_purexp"
## 
## [[22]]
## [1] "일반식품_purnum"
var_idx_dt <- c()

for (i in var_dt){
  gg <- grep(i, colnames(custsig_backup1))
  var_idx_dt <- c(var_idx_dt, gg)
}

str(var_idx_dt)
##  int [1:22] 1 69 61 4 62 2 22 7 16 60 ...
var_idx_dt
##  [1]  1 69 61  4 62  2 22  7 16 60 73  5 66  9 19 26 65 20 15 63 21  6
# colnames(custsig)
custsig.fe = custsig_backup1
str(custsig.fe[c(var_idx_dt)])
## 'data.frame':    49971 obs. of  22 variables:
##  $ sex                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 정장셔츠_purexp        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 유아동복_purnum        : num  0 0 0 0 0 2 2 0 0 1 ...
##  $ 문화완구_purnum        : num  0 0 0 0 3 0 0 0 0 0 ...
##  $ 트래디셔널캐주얼_purnum: num  1 0 6 0 7 2 0 0 0 0 ...
##  $ buy_brd                : int  6 4 14 10 67 13 16 8 3 8 ...
##  $ 수입명품_purexp        : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ 정장셔츠_purnum        : num  0 0 0 0 3 0 0 0 0 1 ...
##  $ 스포츠_purnum          : num  2 0 4 0 5 0 0 1 0 0 ...
##  $ 유니캐주얼_purnum      : num  0 0 9 0 28 5 3 1 2 0 ...
##  $ 정장셔츠_purord        : num  0 0 0 0 16 0 0 0 0 1 ...
##  $ 섬유_purnum            : num  1 1 1 0 16 2 1 0 0 0 ...
##  $ 영캐주얼_purexp        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ 장신구_purexp          : num  0 0 0 1 1 0 0 1 0 0 ...
##  $ 피혁B_purnum           : num  0 1 0 1 11 0 0 1 0 0 ...
##  $ 니트단품_purord        : num  0 0 0 6 13 0 12 0 0 0 ...
##  $ 스포츠_purexp          : num  1 0 1 0 1 0 0 0 0 0 ...
##  $ 기타바이어_purexp      : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ API                    : num  90.75 121 24.2 51.86 3.18 ...
##  $ 니트단품_purexp        : num  0 0 0 1 1 0 1 0 0 0 ...
##  $ 섬유_purexp            : num  1 1 1 0 1 1 1 0 0 0 ...
##  $ 일반식품_purnum        : num  1 0 0 0 78 2 20 0 0 0 ...
custsig = custsig.fe[c(var_idx_dt)]


dim(custsig)
## [1] 49971    22
custsig_backup4 = custsig
plot(c5_model)

5.6. Reshape

str(custsig_backup3) # lm3
## 'data.frame':    49971 obs. of  38 variables:
##  $ sex                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 피혁B_puramt           : num  0 83700 0 250400 1421730 ...
##  $ 문화완구_purord        : num  0 0 0 0 11 0 0 0 0 0 ...
##  $ 캐릭터캐주얼_purord    : num  0 0 0 0 10 0 0 0 0 4 ...
##  $ 영캐주얼_purnum        : num  0 1 1 2 6 0 0 4 0 2 ...
##  $ 캐릭터캐주얼_purnum    : num  0 0 0 0 7 0 0 0 0 2 ...
##  $ 문화완구_purnum        : num  0 0 0 0 3 0 0 0 0 0 ...
##  $ 섬유_purnum            : num  1 1 1 0 16 2 1 0 0 0 ...
##  $ 가전_purord            : num  0 0 0 5 0 0 8 0 0 0 ...
##  $ 수입명품_purord        : num  0 0 0 0 0 8 0 0 0 0 ...
##  $ 피혁B_purexp           : num  0 1 0 1 1 0 0 1 0 0 ...
##  $ 니트단품_purord        : num  0 0 0 6 13 0 12 0 0 0 ...
##  $ buy_brd                : int  6 4 14 10 67 13 16 8 3 8 ...
##  $ 일반식품_purnum        : num  1 0 0 0 78 2 20 0 0 0 ...
##  $ 정장셔츠_purnum        : num  0 0 0 0 3 0 0 0 0 1 ...
##  $ API                    : num  90.75 121 24.2 51.86 3.18 ...
##  $ 스포츠_purnum          : num  2 0 4 0 5 0 0 1 0 0 ...
##  $ 피혁B_purnum           : num  0 1 0 1 11 0 0 1 0 0 ...
##  $ 기타바이어_purexp      : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ 섬유_purexp            : num  1 1 1 0 1 1 1 0 0 0 ...
##  $ 수입명품_purexp        : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ 조리욕실_purexp        : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ nop6                   : num  6 4 14 11 114 28 41 12 4 11 ...
##  $ cmv                    : num  63550 38925 89565 242727 55486 ...
##  $ 유니캐주얼_purnum      : num  0 0 9 0 28 5 3 1 2 0 ...
##  $ 유아동복_purnum        : num  0 0 0 0 0 2 2 0 0 1 ...
##  $ 트래디셔널캐주얼_purnum: num  1 0 6 0 7 2 0 0 0 0 ...
##  $ 니트단품_purexp        : num  0 0 0 1 1 0 1 0 0 0 ...
##  $ 문화완구_purexp        : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ 스포츠_purexp          : num  1 0 1 0 1 0 0 0 0 0 ...
##  $ 영캐주얼_purexp        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ 유아동복_purexp        : num  0 0 0 0 0 1 1 0 0 1 ...
##  $ 일반식품_purexp        : num  1 0 0 0 1 1 1 0 0 0 ...
##  $ 정장셔츠_purexp        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 트래디셔널캐주얼_purexp: num  1 0 1 0 1 1 0 0 0 0 ...
##  $ 영캐주얼_purord        : num  0 3 5 2 7 0 0 2 0 2 ...
##  $ 일반식품_purord        : num  5 0 0 0 1 7 2 0 0 0 ...
##  $ 정장셔츠_purord        : num  0 0 0 0 16 0 0 0 0 1 ...
str(custsig_backup4) # C5.0
## 'data.frame':    49971 obs. of  22 variables:
##  $ sex                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 정장셔츠_purexp        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 유아동복_purnum        : num  0 0 0 0 0 2 2 0 0 1 ...
##  $ 문화완구_purnum        : num  0 0 0 0 3 0 0 0 0 0 ...
##  $ 트래디셔널캐주얼_purnum: num  1 0 6 0 7 2 0 0 0 0 ...
##  $ buy_brd                : int  6 4 14 10 67 13 16 8 3 8 ...
##  $ 수입명품_purexp        : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ 정장셔츠_purnum        : num  0 0 0 0 3 0 0 0 0 1 ...
##  $ 스포츠_purnum          : num  2 0 4 0 5 0 0 1 0 0 ...
##  $ 유니캐주얼_purnum      : num  0 0 9 0 28 5 3 1 2 0 ...
##  $ 정장셔츠_purord        : num  0 0 0 0 16 0 0 0 0 1 ...
##  $ 섬유_purnum            : num  1 1 1 0 16 2 1 0 0 0 ...
##  $ 영캐주얼_purexp        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ 장신구_purexp          : num  0 0 0 1 1 0 0 1 0 0 ...
##  $ 피혁B_purnum           : num  0 1 0 1 11 0 0 1 0 0 ...
##  $ 니트단품_purord        : num  0 0 0 6 13 0 12 0 0 0 ...
##  $ 스포츠_purexp          : num  1 0 1 0 1 0 0 0 0 0 ...
##  $ 기타바이어_purexp      : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ API                    : num  90.75 121 24.2 51.86 3.18 ...
##  $ 니트단품_purexp        : num  0 0 0 1 1 0 1 0 0 0 ...
##  $ 섬유_purexp            : num  1 1 1 0 1 1 1 0 0 0 ...
##  $ 일반식품_purnum        : num  1 0 0 0 78 2 20 0 0 0 ...
colnames(custsig_backup3)
##  [1] "sex"                     "피혁B_puramt"           
##  [3] "문화완구_purord"         "캐릭터캐주얼_purord"    
##  [5] "영캐주얼_purnum"         "캐릭터캐주얼_purnum"    
##  [7] "문화완구_purnum"         "섬유_purnum"            
##  [9] "가전_purord"             "수입명품_purord"        
## [11] "피혁B_purexp"            "니트단품_purord"        
## [13] "buy_brd"                 "일반식품_purnum"        
## [15] "정장셔츠_purnum"         "API"                    
## [17] "스포츠_purnum"           "피혁B_purnum"           
## [19] "기타바이어_purexp"       "섬유_purexp"            
## [21] "수입명품_purexp"         "조리욕실_purexp"        
## [23] "nop6"                    "cmv"                    
## [25] "유니캐주얼_purnum"       "유아동복_purnum"        
## [27] "트래디셔널캐주얼_purnum" "니트단품_purexp"        
## [29] "문화완구_purexp"         "스포츠_purexp"          
## [31] "영캐주얼_purexp"         "유아동복_purexp"        
## [33] "일반식품_purexp"         "정장셔츠_purexp"        
## [35] "트래디셔널캐주얼_purexp" "영캐주얼_purord"        
## [37] "일반식품_purord"         "정장셔츠_purord"
colnames(custsig_backup4)
##  [1] "sex"                     "정장셔츠_purexp"        
##  [3] "유아동복_purnum"         "문화완구_purnum"        
##  [5] "트래디셔널캐주얼_purnum" "buy_brd"                
##  [7] "수입명품_purexp"         "정장셔츠_purnum"        
##  [9] "스포츠_purnum"           "유니캐주얼_purnum"      
## [11] "정장셔츠_purord"         "섬유_purnum"            
## [13] "영캐주얼_purexp"         "장신구_purexp"          
## [15] "피혁B_purnum"            "니트단품_purord"        
## [17] "스포츠_purexp"           "기타바이어_purexp"      
## [19] "API"                     "니트단품_purexp"        
## [21] "섬유_purexp"             "일반식품_purnum"
# compare column names, not column contents
names(custsig_backup3) %in% names(custsig_backup4)
##  [1]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
## [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
## [23] FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE
## [34]  TRUE FALSE FALSE FALSE  TRUE
col_nm <- intersect(names(custsig_backup3), names(custsig_backup4))
custsig <- custsig_backup3[,col_nm]
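When `%in%` is applied to two data frames, it matches whole columns by their contents rather than by their names. On this data, a content-based comparison selects the same 21 shared columns as a name-based one. However, two differently named columns that hold identical values would also count as a match. The toy illustration below uses base R, and the data frames `a` and `b` are hypothetical:

```r
# Toy data frames (hypothetical) that share only the column name "x"
a <- data.frame(x = c(1, 2), y = c(0, 0))
b <- data.frame(x = c(1, 2), z = c(0, 0))

# Content-based: each column of `a` is looked up among the columns of `b`,
# so "y" matches "z" because both hold c(0, 0)
by_content <- a %in% b

# Name-based: only the column names are compared
by_name <- names(a) %in% names(b)

# Selecting shared columns by name keeps the column order of `a`
shared <- intersect(names(a), names(b))

print(by_content)  # TRUE TRUE
print(by_name)     # TRUE FALSE
print(shared)      # "x"
```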

str(custsig)
## 'data.frame':    49971 obs. of  21 variables:
##  $ sex                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 문화완구_purnum        : num  0 0 0 0 3 0 0 0 0 0 ...
##  $ 섬유_purnum            : num  1 1 1 0 16 2 1 0 0 0 ...
##  $ 니트단품_purord        : num  0 0 0 6 13 0 12 0 0 0 ...
##  $ buy_brd                : int  6 4 14 10 67 13 16 8 3 8 ...
##  $ 일반식품_purnum        : num  1 0 0 0 78 2 20 0 0 0 ...
##  $ 정장셔츠_purnum        : num  0 0 0 0 3 0 0 0 0 1 ...
##  $ API                    : num  90.75 121 24.2 51.86 3.18 ...
##  $ 스포츠_purnum          : num  2 0 4 0 5 0 0 1 0 0 ...
##  $ 피혁B_purnum           : num  0 1 0 1 11 0 0 1 0 0 ...
##  $ 기타바이어_purexp      : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ 섬유_purexp            : num  1 1 1 0 1 1 1 0 0 0 ...
##  $ 수입명품_purexp        : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ 유니캐주얼_purnum      : num  0 0 9 0 28 5 3 1 2 0 ...
##  $ 유아동복_purnum        : num  0 0 0 0 0 2 2 0 0 1 ...
##  $ 트래디셔널캐주얼_purnum: num  1 0 6 0 7 2 0 0 0 0 ...
##  $ 니트단품_purexp        : num  0 0 0 1 1 0 1 0 0 0 ...
##  $ 스포츠_purexp          : num  1 0 1 0 1 0 0 0 0 0 ...
##  $ 영캐주얼_purexp        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ 정장셔츠_purexp        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 정장셔츠_purord        : num  0 0 0 0 16 0 0 0 0 1 ...
names(custsig[c(5, 8)])
## [1] "buy_brd" "API"
custsig <- custsig[setdiff(names(custsig), c("buy_brd", "API"))] # drop by name rather than by position

str(custsig)
## 'data.frame':    49971 obs. of  19 variables:
##  $ sex                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 문화완구_purnum        : num  0 0 0 0 3 0 0 0 0 0 ...
##  $ 섬유_purnum            : num  1 1 1 0 16 2 1 0 0 0 ...
##  $ 니트단품_purord        : num  0 0 0 6 13 0 12 0 0 0 ...
##  $ 일반식품_purnum        : num  1 0 0 0 78 2 20 0 0 0 ...
##  $ 정장셔츠_purnum        : num  0 0 0 0 3 0 0 0 0 1 ...
##  $ 스포츠_purnum          : num  2 0 4 0 5 0 0 1 0 0 ...
##  $ 피혁B_purnum           : num  0 1 0 1 11 0 0 1 0 0 ...
##  $ 기타바이어_purexp      : num  0 0 0 0 1 0 1 0 0 0 ...
##  $ 섬유_purexp            : num  1 1 1 0 1 1 1 0 0 0 ...
##  $ 수입명품_purexp        : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ 유니캐주얼_purnum      : num  0 0 9 0 28 5 3 1 2 0 ...
##  $ 유아동복_purnum        : num  0 0 0 0 0 2 2 0 0 1 ...
##  $ 트래디셔널캐주얼_purnum: num  1 0 6 0 7 2 0 0 0 0 ...
##  $ 니트단품_purexp        : num  0 0 0 1 1 0 1 0 0 0 ...
##  $ 스포츠_purexp          : num  1 0 1 0 1 0 0 0 0 0 ...
##  $ 영캐주얼_purexp        : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ 정장셔츠_purexp        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ 정장셔츠_purord        : num  0 0 0 0 16 0 0 0 0 1 ...
custsig$sex = factor(custsig$sex)

# hold out 20% of customers as the test set, stratified by sex
inTrain = createDataPartition(y = custsig$sex, p = 0.8, list = FALSE)

custsig.test = custsig[-inTrain,]
custsig = custsig[inTrain,]

sampling = createDataPartition(y = custsig$sex, p = 0.8, list = FALSE) # keep 80% of the training split (31,983 rows)

custsig.train = custsig[sampling,]
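When `createDataPartition()` receives a factor `y`, it samples within each class, so the train and test sets keep roughly the original sex ratio. The nested call above keeps 80% of that 80%, which is about 64% of all rows (31,983 of 49,971) used for fitting. The remaining ~16% is not used by the models in this section. The sketch below shows a per-class split in base R. It uses a hypothetical helper `stratified_index()` and does not reproduce caret's exact rounding rule:

```r
# Hypothetical helper: row indices covering roughly a share p of each class
stratified_index <- function(y, p) {
  by_class <- split(seq_along(y), y)            # row numbers grouped by label
  picked <- lapply(by_class, function(rows) {
    k <- max(1, round(p * length(rows)))        # rows to keep from this class
    rows[sample.int(length(rows), k)]           # sample without replacement
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
y <- factor(rep(c("0", "1"), times = c(300, 700)))  # toy labels, 30% class "0"

train_idx <- stratified_index(y, 0.8)              # outer 80/20 split
inner_idx <- stratified_index(y[train_idx], 0.8)   # 80% of the 80% (64% overall)

length(train_idx)                  # 800
length(inner_idx)                  # 640
prop.table(table(y[-train_idx]))   # held-out class shares: 0.3 and 0.7
```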

5.6. C5.0

# c5_options <- C5.0Control(winnow = FALSE, noGlobalPruning = FALSE)
# c5_model <- C5.0(sex ~ ., data=custsig.train, control=c5_options, rules=FALSE)
# summary(c5_model)
# plot(c5_model)
# custsig.test$c5_pred <- predict(c5_model, custsig.test, type="class")
# custsig.test$c5_pred_prob <- predict(c5_model, custsig.test, type="prob")

fitControl <- trainControl(## 3-fold CV
  method = "repeatedcv",
  number = 3,
  ## repeated once
  repeats = 1)
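With `number = 3` and `repeats = 1`, each fit trains on two of the three folds. This explains the resampling summary `21322, 21322, 21322` (31,983 × 2/3). The sketch below assigns folds in base R without stratification, using a hypothetical helper `kfold_train_sizes()`. caret stratifies its folds by class, which is likely why its 5-fold sizes later vary slightly (25,586 to 25,588):

```r
# Hypothetical helper: spread n rows over k folds as evenly as possible and
# return the size of each fold's training set (all rows outside that fold)
kfold_train_sizes <- function(n, k) {
  fold_id <- sample(rep_len(seq_len(k), n))   # shuffled fold labels 1..k
  fold_size <- tabulate(fold_id, nbins = k)   # rows held out per fold
  n - fold_size
}

set.seed(42)
kfold_train_sizes(31983, 3)  # 21322 21322 21322
kfold_train_sizes(31983, 5)  # 25586 or 25587 per fold
```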

model_c5 <- caret::train(sex ~ ., 
                         data = custsig.train, 
                         method = "C5.0", 
                         trControl = fitControl)

custsig.test$c5_pred_train <- predict(model_c5, custsig.test, type="raw")
custsig.test$c5_pred_prob_train <- predict(model_c5, custsig.test, type="prob")

#confusionMatrix(custsig.test$c5_pred, custsig.test$sex)
confusionMatrix(custsig.test$c5_pred_train, custsig.test$sex)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0    0    0
##          1 3032 6961
##                                           
##                Accuracy : 0.6966          
##                  95% CI : (0.6875, 0.7056)
##     No Information Rate : 0.6966          
##     P-Value [Acc > NIR] : 0.5049          
##                                           
##                   Kappa : 0               
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.0000          
##             Specificity : 1.0000          
##          Pos Pred Value :    NaN          
##          Neg Pred Value : 0.6966          
##              Prevalence : 0.3034          
##          Detection Rate : 0.0000          
##    Detection Prevalence : 0.0000          
##       Balanced Accuracy : 0.5000          
##                                           
##        'Positive' Class : 0               
## 
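The C5.0 model predicts class `1` for every test customer. Its accuracy therefore equals the share of class `1` in the test set (the No Information Rate), and its Kappa is 0: it has learned nothing beyond the majority class. caret reports `P-Value [Acc > NIR]` as a one-sided binomial test of accuracy against that rate. Using the counts above, base R's `binom.test()` should reproduce the 0.5049 printed there:

```r
# Test-set counts from the C5.0 confusion matrix above
n_test    <- 3032 + 6961          # all test customers
n_correct <- 6961                 # every "1" prediction that was right
nir       <- 6961 / n_test        # share of the majority class "1"

accuracy <- n_correct / n_test    # equals nir: the model always predicts "1"

# One-sided test: is accuracy significantly above the no-information rate?
acc_test <- binom.test(n_correct, n_test, p = nir, alternative = "greater")

round(c(accuracy = accuracy, nir = nir, p_value = acc_test$p.value), 4)
```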

5.7. k-Nearest Neighbors

model_kknn <- caret::train(sex  ~ .,
                           data = custsig.train,
                           method = "kknn",
                           preProcess = NULL,
                           trControl = fitControl)
model_kknn
## k-Nearest Neighbors 
## 
## 31983 samples
##    18 predictor
##     2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 1 times) 
## Summary of sample sizes: 21322, 21322, 21322 
## Resampling results across tuning parameters:
## 
##   kmax  Accuracy   Kappa     
##   5     0.6464372  0.08493933
##   7     0.6555983  0.08981885
##   9     0.6605071  0.09149299
## 
## Tuning parameter 'distance' was held constant at a value of 2
## 
## Tuning parameter 'kernel' was held constant at a value of optimal
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were kmax = 9, distance = 2 and
##  kernel = optimal.
custsig.test$kknn_pred_train = predict(model_kknn, custsig.test)
confusionMatrix(custsig.test$kknn_pred_train, custsig.test$sex)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0  756 1036
##          1 2276 5925
##                                           
##                Accuracy : 0.6686          
##                  95% CI : (0.6592, 0.6778)
##     No Information Rate : 0.6966          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.1136          
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.24934         
##             Specificity : 0.85117         
##          Pos Pred Value : 0.42188         
##          Neg Pred Value : 0.72247         
##              Prevalence : 0.30341         
##          Detection Rate : 0.07565         
##    Detection Prevalence : 0.17933         
##       Balanced Accuracy : 0.55026         
##                                           
##        'Positive' Class : 0               
## 
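Setting `preProcess = NULL` means distances are computed on raw values. As a result, count columns such as `일반식품_purnum` (78 for one of the customers listed earlier) can outweigh the 0/1 `_purexp` flags. The sketch below implements plain majority-vote kNN in base R with Euclidean distance (the `distance = 2` case) and optional z-score scaling. It is not kknn's kernel-weighted method, and `knn_predict()` is a hypothetical helper:

```r
# Hypothetical helper: unweighted k-nearest-neighbour vote, Euclidean distance
knn_predict <- function(train_x, train_y, test_x, k = 5, scale_x = TRUE) {
  train_x <- as.matrix(train_x)
  test_x  <- as.matrix(test_x)
  if (scale_x) {
    # standardise both sets with training-set statistics only
    mu <- colMeans(train_x)
    s  <- apply(train_x, 2, sd)
    s[s == 0] <- 1                                  # guard constant columns
    train_x <- sweep(sweep(train_x, 2, mu), 2, s, "/")
    test_x  <- sweep(sweep(test_x, 2, mu), 2, s, "/")
  }
  train_y <- factor(train_y)
  pred <- apply(test_x, 1, function(row) {
    d  <- sqrt(colSums((t(train_x) - row)^2))       # distance to every training row
    nn <- order(d)[seq_len(k)]                      # indices of the k closest rows
    names(which.max(table(train_y[nn])))            # majority label (ties: first level)
  })
  factor(pred, levels = levels(train_y))
}

# Toy usage: one wide-range count column and one 0/1 flag
train_x <- data.frame(count = c(0, 1, 2, 60, 70, 80), flag = c(0, 0, 0, 1, 1, 1))
train_y <- c("0", "0", "0", "1", "1", "1")
pred <- knn_predict(train_x, train_y, data.frame(count = c(1, 75), flag = c(0, 1)), k = 3)
pred
```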

5.8. Ensemble : Random Forest

rf <- randomForest(as.factor(sex)~., data=custsig.train, ntree=1000, importance=TRUE)

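# type = 1: permutation importance (mean decrease in accuracy)
imp <- importance(rf, type=1)
featureImportance <- data.frame(Feature=row.names(imp), Importance=imp[,1])
# without newdata, predict() returns the out-of-bag predictions for the training rows
pred_tr_rf <- predict(rf, type = "class")
pred_te_rf <- predict(rf, custsig.test, type = "class")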
confusionMatrix(pred_te_rf, custsig.test$sex)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0  426  440
##          1 2606 6521
##                                           
##                Accuracy : 0.6952          
##                  95% CI : (0.6861, 0.7042)
##     No Information Rate : 0.6966          
##     P-Value [Acc > NIR] : 0.6243          
##                                           
##                   Kappa : 0.0968          
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.14050         
##             Specificity : 0.93679         
##          Pos Pred Value : 0.49192         
##          Neg Pred Value : 0.71447         
##              Prevalence : 0.30341         
##          Detection Rate : 0.04263         
##    Detection Prevalence : 0.08666         
##       Balanced Accuracy : 0.53865         
##                                           
##        'Positive' Class : 0               
## 
rf
## 
## Call:
##  randomForest(formula = as.factor(sex) ~ ., data = custsig.train,      ntree = 1000, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 1000
## No. of variables tried at each split: 4
## 
##         OOB estimate of  error rate: 30.79%
## Confusion matrix:
##      0     1 class.error
## 0 1349  8355  0.86098516
## 1 1494 20785  0.06705867
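The OOB summary is computed from the out-of-bag confusion matrix. Each `class.error` is the share of that true class predicted as the other class, and the overall error pools both classes. Reproducing these figures from the printed counts shows that almost all of the 30.79% error comes from class `0` being absorbed into the majority class `1`:

```r
# Out-of-bag confusion matrix printed above (rows = true class, cols = predicted)
oob <- matrix(c(1349,  8355,
                1494, 20785),
              nrow = 2, byrow = TRUE,
              dimnames = list(true = c("0", "1"), predicted = c("0", "1")))

class_error <- 1 - diag(oob) / rowSums(oob)   # per-class misclassification rate
oob_error   <- 1 - sum(diag(oob)) / sum(oob)  # overall OOB error rate

round(class_error, 4)      # 0.8610 0.0671
round(100 * oob_error, 2)  # 30.79
```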
##### simple use of Cross Validation
# prepare training scheme
control <- trainControl(method = "repeatedcv", number = 5, repeats = 5) # 5-fold CV repeated 5 times (each fit trains on 4/5 of the rows, i.e. train:validation = 8:2)

# if (!require("randomForest")) install.packages("randomForest"); library(randomForest)
model_rf <- caret::train(sex  ~ .,
                         data = custsig.train,
                         method = "rf",
                         preProcess = NULL,
                         trControl = control)
model_rf
## Random Forest 
## 
## 31983 samples
##    18 predictor
##     2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 25586, 25586, 25586, 25586, 25588, 25587, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##    2    0.6988400  0.05355415
##   10    0.6772286  0.09700679
##   18    0.6709378  0.09597880
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
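`mtry` is the number of predictors randomly offered at each split. The stand-alone `randomForest()` call used its classification default, `floor(sqrt(p))`, which gives the 4 reported earlier for p = 18 predictors. caret's three-value grid (2, 10, 18) matches evenly spaced values from 2 up to p. That is an observation from the output above, not a documented guarantee:

```r
p <- 18                                        # predictors after dropping buy_brd and API
default_mtry <- floor(sqrt(p))                 # randomForest default for classification
grid_mtry    <- floor(seq(2, p, length.out = 3))

default_mtry  # 4
grid_mtry     # 2 10 18
```

The mtry = 2 setting wins on accuracy (0.6988) but has the lowest Kappa (0.054). With roughly 70% of customers in class `1`, selecting on accuracy tends to favour models that lean on the majority class.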
custsig.test$rf_pred_train = predict(model_rf, custsig.test)
confusionMatrix(custsig.test$rf_pred_train, custsig.test$sex)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0  165  172
##          1 2867 6789
##                                           
##                Accuracy : 0.6959          
##                  95% CI : (0.6868, 0.7049)
##     No Information Rate : 0.6966          
##     P-Value [Acc > NIR] : 0.5654          
##                                           
##                   Kappa : 0.0397          
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.05442         
##             Specificity : 0.97529         
##          Pos Pred Value : 0.48961         
##          Neg Pred Value : 0.70309         
##              Prevalence : 0.30341         
##          Detection Rate : 0.01651         
##    Detection Prevalence : 0.03372         
##       Balanced Accuracy : 0.51486         
##                                           
##        'Positive' Class : 0               
## 

5.9. Evaluation

#### Evaluation
### Comparing accuracy of models 1
# Create a list of models
models <- list(C5 = model_c5, rf = model_rf, kknn = model_kknn)
models
## $C5
## C5.0 
## 
## 31983 samples
##    18 predictor
##     2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 1 times) 
## Summary of sample sizes: 21322, 21322, 21322 
## Resampling results across tuning parameters:
## 
##   model  winnow  trials  Accuracy   Kappa     
##   rules  FALSE    1      0.6973392  0.03685282
##   rules  FALSE   10      0.6949629  0.10640038
##   rules  FALSE   20      0.6968389  0.07628433
##   rules   TRUE    1      0.6984335  0.04147847
##   rules   TRUE   10      0.6948691  0.05894243
##   rules   TRUE   20      0.6959635  0.05617548
##   tree   FALSE    1      0.6974643  0.09091232
##   tree   FALSE   10      0.6940875  0.11108102
##   tree   FALSE   20      0.6947753  0.09895512
##   tree    TRUE    1      0.6978082  0.04877661
##   tree    TRUE   10      0.6949629  0.05589899
##   tree    TRUE   20      0.6946503  0.06380408
## 
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were trials = 1, model = rules
##  and winnow = TRUE.
## 
## $rf
## Random Forest 
## 
## 31983 samples
##    18 predictor
##     2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 25586, 25586, 25586, 25586, 25588, 25587, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa     
##    2    0.6988400  0.05355415
##   10    0.6772286  0.09700679
##   18    0.6709378  0.09597880
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
## 
## $kknn
## k-Nearest Neighbors 
## 
## 31983 samples
##    18 predictor
##     2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 1 times) 
## Summary of sample sizes: 21322, 21322, 21322 
## Resampling results across tuning parameters:
## 
##   kmax  Accuracy   Kappa     
##   5     0.6464372  0.08493933
##   7     0.6555983  0.08981885
##   9     0.6605071  0.09149299
## 
## Tuning parameter 'distance' was held constant at a value of 2
## 
## Tuning parameter 'kernel' was held constant at a value of optimal
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were kmax = 9, distance = 2 and
##  kernel = optimal.
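# Resample the models
# resamples() expects every model to share the same resampling scheme;
# model_rf used 5-fold CV repeated 5 times while model_c5 and model_kknn
# used 3-fold CV, so this call would fail as written
#resample_results <- resamples(models)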

# Generate a summary
#summary(resample_results, metric = c("Accuracy", "Kappa"))
#bwplot(resample_results, metric = c("Accuracy", "Kappa"))
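Because `resamples()` is disabled, the three caret models can still be compared on the held-out test set. The base-R sketch below recomputes Accuracy, Kappa and Balanced Accuracy, with `0` as the positive class, from the test-set confusion matrices printed above for C5.0, kNN and the caret random forest. `cm_metrics()` is a hypothetical helper:

```r
# Hypothetical helper: metrics from a 2x2 table (rows = prediction, cols = reference)
cm_metrics <- function(tab, positive = 1) {
  n     <- sum(tab)
  acc   <- sum(diag(tab)) / n
  pe    <- sum(rowSums(tab) * colSums(tab)) / n^2        # chance agreement
  kappa <- (acc - pe) / (1 - pe)                         # Cohen's kappa
  sens  <- tab[positive, positive] / sum(tab[, positive])
  spec  <- tab[-positive, -positive] / sum(tab[, -positive])
  c(Accuracy = acc, Kappa = kappa, BalancedAccuracy = (sens + spec) / 2)
}

# Test-set counts, filled column by column (reference 0, then reference 1)
cms <- list(
  C5   = matrix(c(0,   3032, 0,    6961), 2),
  kknn = matrix(c(756, 2276, 1036, 5925), 2),
  rf   = matrix(c(165, 2867, 172,  6789), 2)
)

res <- t(sapply(cms, cm_metrics))
round(res, 4)
```

On these counts, kNN has the highest Kappa (about 0.11) and balanced accuracy (about 0.55) of the three, even though its accuracy is the lowest. C5.0 collapses to the majority-class baseline.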