Distribution of people who appear only in the August 23 class roll

id group
20163304 Red
20173426 Red
20175257 Red
20182202 Red
20182716 Red
20182998 Red
20192329 Black
20192429 Black
20192535 Red
20192737 Black
20192918 Red
20193606 Red
20193611 Black
20193624 Red
20193632 Red
20193645 Red
20193966 Red
20195170 Red
20196218 Red
20196506 Black
20202725 Red
20203604 Black
20203924 Black
20205120 Red
20212426 Red
20212603 Red
20216272 Black
20216277 Red
20216733 Red
20221084 Black
20222422 Black
20222583 Black
20222627 Red
20222975 Red
20223849 Red
20226104 Black
20226175 Black
20226238 Black
20226707 Black
Red Black
24 15
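
A split like the one above (people who appear only in the earlier roll) can be obtained by comparing ID columns with `%in%`. The sketch below uses a made-up four-person roster; `old_roll`, `new_ids`, `left`, `stayed`, and `joined` are hypothetical names, not objects from this analysis.

```r
# Hypothetical earlier roll with group assignments already made
old_roll <- data.frame(
  id = c("1001", "1002", "1003", "1004"),
  group = factor(c("Red", "Black", "Red", "Black"), levels = c("Red", "Black")),
  stringsAsFactors = FALSE
)
# Hypothetical IDs on the updated roll
new_ids <- c("1002", "1003", "1005")

left   <- old_roll[!(old_roll$id %in% new_ids), ]  # only on the earlier roll
stayed <- old_roll[old_roll$id %in% new_ids, ]     # on both rolls
joined <- setdiff(new_ids, old_roll$id)            # only on the updated roll

# Group distribution of those who left
table(left$group)
```

Here `left` holds IDs 1001 and 1004, `stayed` holds 1002 and 1003, and `joined` is "1005".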

Data structure of people who also appear in the August 23 class roll

## 'data.frame':    974 obs. of  7 variables:
##  $ dept   : chr  "์Šค๋งˆํŠธIoT์ „๊ณต" "์ฒ ํ•™์ „๊ณต" "์‚ฌํšŒํ•™๊ณผ" "๋””์ง€ํ„ธ๋ฏธ๋””์–ด์ฝ˜ํ…์ธ ์ „๊ณต" ...
##  $ id     : chr  "20095324" "20141321" "20142239" "20152552" ...
##  $ name   : chr  "๊น€ํƒœ์–ธ" "์˜ค์žฌ์„" "์ตœ์ข…์›" "์–‘์„ฑ์ผ" ...
##  $ status : chr  "ํ•™์ƒ" "ํ•™์ƒ" "ํ•™์ƒ" "ํ•™์ƒ" ...
##  $ email  : chr  "youngble@kakao.com" "mintohjs@gmail.com" "cjw950712@hanmail.net" "kerect@naver.com" ...
##  $ cell_no: chr  "01020556431" "01071709869" "01025038265" "01041712254" ...
##  $ group  : Factor w/ 2 levels "Red","Black": 1 2 2 2 2 2 1 2 1 1 ...

Data structure of people who left after course registration changes

## tibble [39 × 7] (S3: tbl_df/tbl/data.frame)
##  $ dept   : chr [1:39] "반도체전공" "화학과" "빅데이터전공" "사회학과" ...
##  $ id     : chr [1:39] "20163304" "20173426" "20175257" "20182202" ...
##  $ name   : chr [1:39] "강윤구" "이준영" "조우형" "곽민수" ...
##  $ status : chr [1:39] "휴학" "학생" "학생" "학생" ...
##  $ email  : chr [1:39] "jnh04136@daum.net" "ak5566@naver.com" "uh9222959@gmail.com" "kmen000@gmail.com" ...
##  $ cell_no: chr [1:39] "01027037496" "01085076590" "01027202959" "01056581614" ...
##  $ group  : Factor w/ 2 levels "Red","Black": 1 1 1 1 1 1 2 2 1 2 ...

Data structure of people who newly joined after course registration changes

## 'data.frame':    108 obs. of  7 variables:
##  $ dept   : chr  "사회복지학전공" "광고홍보학과" "법학과" "경영학과" ...
##  $ id     : chr  "20172304" "20172627" "20172741" "20172877" ...
##  $ name   : chr  "김나영" "이종명" "이준환" "이상훈" ...
##  $ status : chr  "학생" "학생" "학생" "학생" ...
##  $ email  : chr  "na09320932@naver.com" "mlele@naver.com" "jhl012248@gmail.com" "dkrlenf1001@naver.com" ...
##  $ cell_no: chr  "01072230932" "01055797134" "01048511088" "01055917376" ...
##  $ group  : Factor w/ 2 levels "Red","Black": NA NA NA NA NA NA NA NA NA NA ...

Those who stayed + those who newly joined

## 'data.frame':    1082 obs. of  7 variables:
##  $ dept   : chr  "์Šค๋งˆํŠธIoT์ „๊ณต" "์ฒ ํ•™์ „๊ณต" "์‚ฌํšŒํ•™๊ณผ" "๋””์ง€ํ„ธ๋ฏธ๋””์–ด์ฝ˜ํ…์ธ ์ „๊ณต" ...
##  $ id     : chr  "20095324" "20141321" "20142239" "20152552" ...
##  $ name   : chr  "๊น€ํƒœ์–ธ" "์˜ค์žฌ์„" "์ตœ์ข…์›" "์–‘์„ฑ์ผ" ...
##  $ status : chr  "ํ•™์ƒ" "ํ•™์ƒ" "ํ•™์ƒ" "ํ•™์ƒ" ...
##  $ email  : chr  "youngble@kakao.com" "mintohjs@gmail.com" "cjw950712@hanmail.net" "kerect@naver.com" ...
##  $ cell_no: chr  "01020556431" "01071709869" "01025038265" "01041712254" ...
##  $ group  : Factor w/ 2 levels "Red","Black": 1 2 2 2 2 2 1 2 1 1 ...

Randomization applied only to newcomers

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.79    3.63    4.40    4.49    5.26   11.81
## [1] 1.2
## 4010368 4071302 4347866 4384724 4616274 4634838 4640294 4737156 4860036 4936595 
##    0.97    0.90    0.79    0.91    0.99    0.90    0.79    0.99    0.89    0.98
## [1] "4347866"
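
The output above is consistent with a seed search: an imbalance statistic is computed for many candidate seeds, and the seed with the smallest value is kept. The following is a minimal sketch of that idea on toy data. The names `odd_id`, `toy_stat`, `seeds`, `best_seed`, and `best_stat` are hypothetical, the candidate seeds are an arbitrary choice, and a single chi-square statistic stands in for whatever score the actual analysis uses.

```r
# Toy covariate: 40 hypothetical students, half with odd IDs
odd_id <- rep(c(TRUE, FALSE), times = 20)

# Randomize into two equal groups with a given seed and
# return the chi-square statistic of group vs. covariate
toy_stat <- function(seed) {
  set.seed(seed)
  red <- sample(seq_along(odd_id), size = length(odd_id) / 2)
  group <- factor(ifelse(seq_along(odd_id) %in% red, "Red", "Black"),
                  levels = c("Red", "Black"))
  unname(chisq.test(table(group, odd_id), correct = FALSE)$statistic)
}

seeds <- 1:200                        # candidate seeds (arbitrary)
stats <- sapply(seeds, toy_stat)
names(stats) <- seeds

summary(stats)                        # distribution of the statistic over seeds
best_seed <- names(which.min(stats))  # a character string, as in the output above
best_stat <- min(stats)
```

Re-running `toy_stat(as.integer(best_seed))` reproduces `best_stat`, which is what makes the chosen assignment reproducible.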

Randomization

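# Reuse the selected seed Xmin so that the assignment is reproducible
set.seed(Xmin)
# Draw which of the N_new newcomers go to Red; the rest go to Black
id_red <- sample(1:N_new, size = red_new)
class_roll[class_roll$id %in% id_new, "group"] <- 
  factor(ifelse(1:N_new %in% id_red, "Red", "Black"), levels = c("Red", "Black")) 
# red_and_black() is defined elsewhere; its value for this seed equals
# the sum of the six chi-square statistics at the end of this section
red_and_black(Xmin)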
## [1] 0.7872892
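
The source does not show how `red_new` is computed. One plausible approach, sketched here under that assumption, is to give newcomers exactly as many Red slots as are needed to make the two groups equal in size. All data and names (`roll`, `is_new`, `red_target`, `red_new_toy`, `pick`) are toy values for illustration only.

```r
# Toy roll: 6 students who stayed (already assigned) and 4 newcomers (group is NA)
roll <- data.frame(
  id = as.character(1:10),
  group = factor(c("Red", "Red", "Black", "Black", "Black", "Black", rep(NA, 4)),
                 levels = c("Red", "Black")),
  stringsAsFactors = FALSE
)
is_new <- is.na(roll$group)
n_new <- sum(is_new)

# Red slots needed so that Red ends up with half of the combined roll (rounded up)
red_target <- ceiling(nrow(roll) / 2)
red_new_toy <- red_target - sum(roll$group == "Red", na.rm = TRUE)

set.seed(1)  # arbitrary seed for the toy example
pick <- sample(seq_len(n_new), size = red_new_toy)
roll$group[is_new] <- factor(ifelse(seq_len(n_new) %in% pick, "Red", "Black"),
                             levels = c("Red", "Black"))

table(roll$group)  # 5 Red, 5 Black regardless of the seed
```

Only the rows with a missing group are touched, so students who stayed keep their original assignment.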

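Student ID (admission year)

# id is a character vector, so `. <= 2016` is a string comparison against "2016";
# IDs that sort before "2016" (admitted before 2016) are recoded to "2016"
class_roll$id_2 <-
  class_roll$id %>%
  ifelse(. <= 2016, "2016", .)
# Cross-tabulate group by admission year (first four digits of the ID)
tbl1 <- class_roll %$%
  table(.$group, .$id_2 %>% substr(1, 4)) %>%
  `colnames<-`(c("2016 이전", 2017:2022))   # "2016 이전" = "2016 or earlier"
tbl1 %>%
  pander
        2016 이전   2017   2018   2019   2020   2021   2022
Red            14     33     64     63     50     73    244
Black          15     31     62     64     51     73    245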
class_roll <- class_roll[, names(class_roll0823)]
X1min <- tbl1 %>%
  chisq.test(simulate.p.value = TRUE) %>%
  `[[`(1)
X1min
## X-squared 
## 0.1485488

Student ID parity (odd/even)

# Classify each ID as odd ("홀") or even ("짝") by its remainder modulo 2
tbl2 <- class_roll$id %>%
  as.numeric %>%
  `%%`(2) %>%
  factor(levels = c(1, 0), labels = c("홀", "짝")) %>%
  table(class_roll$group, .) 
tbl2 %>%
  pander
          홀    짝
Red      286   255
Black    284   257
X2min <- tbl2 %>%
  chisq.test(simulate.p.value = TRUE) %>%
  `[[`(1)
X2min
##  X-squared 
## 0.01483004
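
As a check on the value above, the Pearson chi-square statistic can be computed by hand from the printed counts. This sketch assumes no continuity correction, which is what the printed value matches; `obs`, `expected`, and `x2` are names introduced here for illustration.

```r
# Counts copied from the odd/even table above
obs <- matrix(c(286, 255,
                284, 257),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Red", "Black"), c("odd", "even")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic, without continuity correction
x2 <- sum((obs - expected)^2 / expected)
round(x2, 8)
# [1] 0.01483004
```

Each cell deviates from its expected count (285 or 256) by exactly 1, so the statistic reduces to 2/285 + 2/256.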

Enrollment status

# Cross-tabulate group by status: "학생" = enrolled student, "휴학" = on leave of absence
tbl3 <- class_roll$status %>%
  table(class_roll$group, .) 
tbl3 %>%
  pander
        학생   휴학
Red      532      9
Black    533      8
X3min <- tbl3 %>%
  chisq.test(simulate.p.value = TRUE) %>%
  `[[`(1)
X3min
## X-squared 
## 0.0597625

E-mail service provider

# Extract the domain after "@" and classify it as "네이버" (Naver, naver.com)
# or "기타서비스" (other services)
tbl4 <- class_roll$email %>%
  strsplit("@", fixed = TRUE) %>%
  sapply("[", 2) %>%
  `==`("naver.com") %>%
  ifelse("네이버", "기타서비스") %>%
  factor(levels = c("네이버", "기타서비스")) %>%
  table(class_roll$group, .) 
tbl4 %>%
  pander
        네이버   기타서비스
Red        430          111
Black      433          108
X4min <- tbl4 %>%
  chisq.test(simulate.p.value = TRUE) %>%
  `[[`(1)
X4min
##  X-squared 
## 0.05152463

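Distribution of phone numbers

# Bin labels "0000~0999", ..., "9000~9999" for the last four digits
cut_label <- paste(paste0(0:9, "000"), paste0(0:9, "999"), 
                   sep = "~")
# Take the last four digits of each 11-digit number and bin them in steps of 1000.
# Note: cut() defaults to right = TRUE, so the bins are (0, 1000], ..., (9000, 10000];
# pass right = FALSE if the bins must match the labels exactly.
tbl5 <- class_roll$cell_no %>%
  substr(start = 8, stop = 11) %>%
  as.numeric %>%
  cut(labels = cut_label, 
      breaks = seq(0, 10000, by = 1000)) %>%
  table(class_roll$group, .) 
tbl5 %>%
  pander
        0000~0999  1000~1999  2000~2999  3000~3999  4000~4999  5000~5999  6000~6999  7000~7999  8000~8999  9000~9999
Red            50         50         47         57         62         53         61         59         48         54
Black          47         49         50         58         62         56         59         59         47         54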
X5min <- tbl5 %>%
  chisq.test(simulate.p.value = TRUE) %>%
  `[[`(1)
X5min
## X-squared 
## 0.3307921

Surname distribution

# The first character of the name is the surname
f_name <- class_roll$name %>%
  substring(first = 1, last = 1) 
# Keep the three most common surnames (김 Kim, 이 Lee, 박 Park);
# everything else becomes "기타" (other)
tbl6 <- f_name %>%
  `%in%`(c("김", "이", "박")) %>%
  ifelse(f_name, "기타") %>%
  factor(levels = c("김", "이", "박", "기타")) %>%
  table(class_roll$group, .) 
tbl6 %>%
  pander
          김    이    박   기타
Red      127    78    38    298
Black    124    80    41    296
X6min <- tbl6 %>%
  chisq.test(simulate.p.value = TRUE) %>%
  `[[`(1)
X6min
## X-squared 
## 0.1818311

Sum of Chi_Squares

Xsum_min <- X1min + X2min + X3min + X4min + X5min + X6min
Xsum_min
## X-squared 
## 0.7872892
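
The total above can be cross-checked from the six printed tables alone. The sketch below re-enters the counts by hand and uses `chisq.test()` with `correct = FALSE` to get the uncorrected Pearson statistic for each table; the list name `tables` and its element names are labels chosen here, not objects from the analysis.

```r
# Counts copied from the six tables in this section (rows: Red, Black)
tables <- list(
  year    = rbind(c(14, 33, 64, 63, 50, 73, 244),
                  c(15, 31, 62, 64, 51, 73, 245)),
  parity  = rbind(c(286, 255), c(284, 257)),
  status  = rbind(c(532, 9), c(533, 8)),
  email   = rbind(c(430, 111), c(433, 108)),
  phone   = rbind(c(50, 50, 47, 57, 62, 53, 61, 59, 48, 54),
                  c(47, 49, 50, 58, 62, 56, 59, 59, 47, 54)),
  surname = rbind(c(127, 78, 38, 298), c(124, 80, 41, 296))
)

# Pearson statistic per table; correct = FALSE disables the 2 x 2 continuity correction
stats <- sapply(tables, function(m) unname(chisq.test(m, correct = FALSE)$statistic))

round(stats, 7)  # one statistic per table
sum(stats)       # should agree with Xsum_min up to rounding
```

Because both groups have 541 members, each column contributes d^2 / (column total) to its statistic, where d is the Red minus Black difference, which is why near-identical columns give such small values.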