Statistics with R

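R Lecture Summary

This is a summary of the AI Bigdata Statistics with R lectures held from April 27 to May 10, 2024.

The summary is divided into four parts.

Basic theory
Programming exercises (finding prime numbers, multiplication table, ANOVA program)
R base & ggplot graphs
data.table

1. Types of Operators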

x = 10
y = 20

# Arithmetic operators
x+y

## [1] 30

x-y

## [1] -10

x*y

## [1] 200

x/y

## [1] 0.5

x**2        # x squared (** is parsed as ^)

## [1] 100

x^2         # x squared

## [1] 100

x^0.5       # square root of x

## [1] 3.162

sqrt(x)     # square root of x

## [1] 3.162

10 %/% 3    # integer quotient

## [1] 3

10 %% 3     # remainder

## [1] 1

# Comparison and logical operators
x < 3

## [1] FALSE

x <= 3

## [1] FALSE

x > 3

## [1] TRUE

x >= 3

## [1] TRUE

x == y

## [1] FALSE

x != y

## [1] TRUE

x > 3 | x < 8

## [1] TRUE

x > 3 & x < 8

## [1] FALSE
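The operators `|` and `&` above work element by element, which matters once the operands are vectors rather than single numbers. The following is a minimal sketch (the vector `v` and the name `in_range` are illustrative, not from the lecture) contrasting them with the scalar forms `||` and `&&`, which are intended for `if()` conditions.

```r
# Element-wise operators return one result per element
v <- c(2, 5, 10)
v > 3 & v < 8      # FALSE  TRUE FALSE
v > 3 | v < 8      # TRUE TRUE TRUE

# Scalar operators return a single TRUE/FALSE and short-circuit
x <- 10
in_range <- (x > 3) && (x < 8)
in_range           # FALSE
```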

2. Creating and Selecting from a Vector

c(1, 3, 5, 7, 9)      # create a vector

## [1] 1 3 5 7 9

1:6                   # create a vector

## [1] 1 2 3 4 5 6

c(1, 3, 5, 1:3)       # combine vectors: c(1,3,5) and 1:3

## [1] 1 3 5 1 2 3

seq(1, 3, by=0.5)     # numbers from 1 to 3 in steps of 0.5

## [1] 1.0 1.5 2.0 2.5 3.0

rep(1:3, times=2)     # repeat the whole of 1:3 twice

## [1] 1 2 3 1 2 3

rep(1:3, each=2)      # repeat each element of 1:3 twice

## [1] 1 1 2 2 3 3

a = c(1, 3, 5, 7, 9)  # store the vector in a
a[1]                  # first element of a

## [1] 1

a[3]                  # third element of a

## [1] 5

a[2:4]                # second through fourth elements of a

## [1] 3 5 7

a[c(1,3)]             # first and third elements of a

## [1] 1 5

a[-3]                 # all elements of a except the third

## [1] 1 3 7 9

a[a == 5]             # elements of a equal to 5

## [1] 5

a[a != 5]             # elements of a not equal to 5

## [1] 1 3 7 9

a[a == 5] = 0         # replace 5 with 0 in a
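The final assignment prints nothing, so its effect is easy to miss. As a small sketch, the vector can be printed after the replacement; `which()` and `which.max()` are base R helpers added here for illustration and were not part of the lecture code.

```r
a <- c(1, 3, 5, 7, 9)
a[a == 5] <- 0        # replace every 5 with 0
a                     # 1 3 0 7 9

which(a > 3)          # positions of the elements greater than 3: 4 5
a[which.max(a)]       # the largest element: 9
```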

3. Creating and Selecting from a Matrix

x = matrix(1:12, 3, 4)
y = matrix(1, 5, 1)
z = c(1, 4, 3, 2, 5, 4)
z = matrix(z, 3, 2)
print(x)

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

print(y)

##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1
## [4,]    1
## [5,]    1

print(z)

##      [,1] [,2]
## [1,]    1    2
## [2,]    4    5
## [3,]    3    4

print(t(z))

##      [,1] [,2] [,3]
## [1,]    1    4    3
## [2,]    2    5    4

4. Creating and Selecting from an Array

X1=array(1:24,c(3,4,2)) # two 3x4 matrices filled with the numbers 1 to 24
Y1=array(1:24,c(6,4))   # a 6x4 array filled with the numbers 1 to 24

X1[,,1]                 # first 3x4 matrix

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

X1[,,2]                 # second 3x4 matrix

##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24

X1[1,1,1]               # element [1,1] of the first matrix

## [1] 1

Y1[5,3]                 # row 5, column 3

## [1] 17

t(Y1)                   # transpose of Y1

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    3    4    5    6
## [2,]    7    8    9   10   11   12
## [3,]   13   14   15   16   17   18
## [4,]   19   20   21   22   23   24

t(Y1)%*%Y1              # Y1' * Y1

##      [,1] [,2] [,3] [,4]
## [1,]   91  217  343  469
## [2,]  217  559  901 1243
## [3,]  343  901 1459 2017
## [4,]  469 1243 2017 2791

if (det(t(Y1)%*%Y1) == 0) {
  print("There is no inverse matrix because it is singular.")
} else {
  solve(t(Y1)%*%Y1)     # reached only when the matrix is nonsingular, so the inverse exists
}

## [1] "There is no inverse matrix because it is singular."
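Comparing a floating-point determinant with exactly zero can be fragile, because rounding may leave a tiny nonzero value. One alternative, sketched here as an assumption rather than as the lecture's method, is to compare the numerical rank from `qr()` with the number of columns. Every column of `Y1` equals the first column plus a constant, so its rank is at most 2 and the 4x4 product cannot have full rank.

```r
Y1 <- array(1:24, c(6, 4))
A  <- t(Y1) %*% Y1

r <- qr(A)$rank                 # numerical rank of A
is_singular <- r < ncol(A)      # TRUE when A has no inverse

if (is_singular) {
  print("A is singular, so solve(A) would fail.")
} else {
  solve(A)
}
```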

5. Creating and Selecting from a Data Frame

library(openxlsx)
library(knitr)

## Warning: package 'knitr' was built under R version 4.3.3

library(kableExtra)

## Warning: package 'kableExtra' was built under R version 4.3.3

df  =read.xlsx('regress.xlsx')
df[,1]

##  [1] 88 98 88 89 89 78 78 67 78 89 88 67 45 43 45 56 45 67 56 56 45 45 56 45 56
## [26] 45 34 45 56 65

df[[1]]

##  [1] 88 98 88 89 89 78 78 67 78 89 88 67 45 43 45 56 45 67 56 56 45 45 56 45 56
## [26] 45 34 45 56 65

df$근무만족도

##  [1] 88 98 88 89 89 78 78 67 78 89 88 67 45 43 45 56 45 67 56 56 45 45 56 45 56
## [26] 45 34 45 56 65

head(df)

##   근무만족도 대인관계 자아개념 근무평정 SES점수
## 1         88       34       78       88      88
## 2         98       23       98       78      56
## 3         88       34       78       98      78
## 4         89       23       88       77      78
## 5         89       34       88       89      67
## 6         78       45       87       89      78

df[,2:3]

##    대인관계 자아개념
## 1        34       78
## 2        23       98
## 3        34       78
## 4        23       88
## 5        34       88
## 6        45       87
## 7        34       89
## 8        34       67
## 9        45       56
## 10       34       78
## 11       78       67
## 12       65       34
## 13       56       45
## 14       67       34
## 15       78       45
## 16       76       56
## 17       78       45
## 18       89       34
## 19       78       45
## 20       67       34
## 21       56       23
## 22       34       34
## 23       23       56
## 24       34       45
## 25       23       34
## 26       34       45
## 27       45       34
## 28       34       23
## 29       45       34
## 30       56       56

df[1:3,c(1,5)]

##   근무만족도 SES점수
## 1         88      88
## 2         98      56
## 3         88      78
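Besides selecting by position, rows are often selected by a condition. The sketch below uses a small made-up data frame `scores` (the column names and values are hypothetical, since `regress.xlsx` is not reproduced here) to show logical row indexing and `subset()`.

```r
scores <- data.frame(id           = 1:5,
                     satisfaction = c(88, 98, 45, 67, 56),
                     ses          = c(88, 56, 45, 67, 34))

# Rows where satisfaction is at least 60 (all columns kept)
high <- scores[scores$satisfaction >= 60, ]
high$id                                   # 1 2 4

# subset() combines a row condition with a column selection
both <- subset(scores, satisfaction >= 60 & ses >= 60, select = c(id, ses))
both$id                                   # 1 4
```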

6. Creating and Selecting from a List

x=1:3                                   # vector
t=3.88                                  # number (scalar)
z=matrix(1:6,2,3)                       # 2x3 matrix
df=data.frame(x=1:3,                    # data frame with columns x, y, z
              y=c('kim','lee','park'),
              z=c(TRUE,TRUE,FALSE))

output=list(x=x,t=t,df=df,z=z)          # list containing x, t, df, z
output                                  # print the list

## $x
## [1] 1 2 3
## 
## $t
## [1] 3.88
## 
## $df
##   x    y     z
## 1 1  kim  TRUE
## 2 2  lee  TRUE
## 3 3 park FALSE
## 
## $z
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

output$df                               # print df stored in the list

##   x    y     z
## 1 1  kim  TRUE
## 2 2  lee  TRUE
## 3 3 park FALSE

z[,1]                                   # first column of matrix z

## [1] 1 2

output$df[3,3]                          # print df[3,3] from the list

## [1] FALSE
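List elements can also be reached with brackets, and the two bracket forms behave differently. A minimal sketch with a hypothetical list `lst`: single brackets return a shorter list, while double brackets (like `$`) return the element itself.

```r
lst <- list(x = 1:3, t = 3.88)

class(lst["x"])       # "list": single brackets keep the list wrapper
class(lst[["x"]])     # "integer": double brackets extract the element
lst[["x"]][2]         # second value of x: 2

lst$u <- "new"        # assigning to a new name appends an element
names(lst)            # "x" "t" "u"
```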

7. Basics for Programming (print)

options(digits=5)
cputime = 5.144623                           # printed with 5 significant digits in total
print(cputime)                               # print a real number

## [1] 5.1446

print(round(cputime,3))                      # round to 3 decimal places

## [1] 5.145

print('This is the result.')                 # print a string

## [1] "This is the result."

print(paste('CPUtime:',cputime,'seconds'))   # paste() example

## [1] "CPUtime: 5.144623 seconds"

print(paste0('CPUtime: ',cputime,'seconds')) # paste0() example

## [1] "CPUtime: 5.144623seconds"

cat('This is the result')                    # cat

## This is the result

cat('CPU    time:',cputime,'seconds','\n')

## CPU  time: 5.1446 seconds

a   <- 'hello'
b   <-  12345
c   <-  12345.1287  
sprintf('string: %s, integer: %d, real: %.3f',a,b,c)      # sprintf example

## [1] "string: hello, integer: 12345, real: 12345.129"

v=c('gender','math score')
sprintf('There will be a difference in %s by %s.',v[2],v[1])    # sprintf example

## [1] "There will be a difference in math score by gender."
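`sprintf()` format specifiers also accept a width, which helps when lining up columns of printed output. The following is a brief sketch with illustrative values (the names `s1`, `s2`, `s3` are not from the lecture).

```r
s1 <- sprintf("%5d|", 42L)        # right-aligned in 5 characters
s2 <- sprintf("%-8s|", "kim")     # left-aligned in 8 characters
s3 <- sprintf("%08.3f", 3.14159)  # zero-padded to 8 characters, 3 decimals

s1   # "   42|"
s2   # "kim     |"
s3   # "0003.142"
```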

8. Built-in Functions

library(knitr)
df = read.xlsx('regress.xlsx')

head(df)          # print the first 6 rows

##   근무만족도 대인관계 자아개념 근무평정 SES점수
## 1         88       34       78       88      88
## 2         98       23       98       78      56
## 3         88       34       78       98      78
## 4         89       23       88       77      78
## 5         89       34       88       89      67
## 6         78       45       87       89      78

tail(df)          # print the last 6 rows

##    근무만족도 대인관계 자아개념 근무평정 SES점수
## 25         56       23       34       89      34
## 26         45       34       45       78      45
## 27         34       45       34       67      34
## 28         45       34       23       56      45
## 29         56       45       34       67      56
## 30         65       56       56       78      66

str(df)           # print the structure of the data

## 'data.frame':    30 obs. of  5 variables:
##  $ 근무만족도: num  88 98 88 89 89 78 78 67 78 89 ...
##  $ 대인관계  : num  34 23 34 23 34 45 34 34 45 34 ...
##  $ 자아개념  : num  78 98 78 88 88 87 89 67 56 78 ...
##  $ 근무평정  : num  88 78 98 77 89 89 98 67 78 67 ...
##  $ SES점수   : num  88 56 78 78 67 78 77 56 77 65 ...

class(df)         # class of the object

## [1] "data.frame"

summary(df)       # print summary statistics

##    근무만족도      대인관계       자아개념       근무평정       SES점수    
##  Min.   :34.0   Min.   :23.0   Min.   :23.0   Min.   :56.0   Min.   :34.0  
##  1st Qu.:45.0   1st Qu.:34.0   1st Qu.:34.0   1st Qu.:67.0   1st Qu.:45.0  
##  Median :56.0   Median :45.0   Median :45.0   Median :78.0   Median :56.0  
##  Mean   :63.4   Mean   :48.5   Mean   :54.3   Mean   :76.3   Mean   :57.1  
##  3rd Qu.:78.0   3rd Qu.:66.5   3rd Qu.:75.2   3rd Qu.:85.5   3rd Qu.:66.8  
##  Max.   :98.0   Max.   :89.0   Max.   :98.0   Max.   :98.0   Max.   :88.0

dim(df)           # number of rows and columns

## [1] 30  5

nrow(df)          # number of rows

## [1] 30

ncol(df)          # number of columns

## [1] 5

x = c(1, 3, 5, 7, 9, 6, 3, 5, 2, 1)
cputime = 1.236648

mean(x)                   # mean of vector x

## [1] 4.2

sd(x)                     # standard deviation of x

## [1] 2.6583

sum(x)                    # sum of x

## [1] 42

median(x)                 # median of x (50th percentile)

## [1] 4

table(x)                  # frequency table of x

## x
## 1 2 3 5 6 7 9 
## 2 1 2 2 1 1 1

round(cputime,3)          # round cputime to 3 decimal places

## [1] 1.237

rnorm(100)                # generate 100 standard normal random numbers

##   [1] -0.8062899 -1.0973818 -0.8277390  0.5042571  0.3471299  0.2556781
##   [7] -0.7277648  1.3614901 -1.3533104 -0.3729283  1.1483214  0.1850582
##  [13] -0.9720168 -1.6114221 -1.1021664 -0.4777951 -0.0915187 -1.0909899
##  [19]  0.3440682  1.9805342  0.1487838  0.5152881 -1.6082548 -1.1757024
##  [25]  0.9458490  2.9648426 -0.1020485 -0.5568637 -0.7971295 -0.4718033
##  [31]  0.1732033  1.3247653 -0.4922776 -1.5808338 -0.1250612 -0.4314272
##  [37]  1.1481644 -0.0379028 -1.5131668 -0.7175106  0.1579532  1.2933634
##  [43]  0.4574107  0.0583636  0.5997591 -0.6778376  0.0528283  1.7686372
##  [49] -1.1477371  1.2152226 -2.0744018 -1.0780262  0.1897434  1.4799311
##  [55] -0.7859045  0.6457989  0.2814963  1.4354522 -0.0093698  0.5344324
##  [61]  1.0332453 -0.0544037 -0.9646329 -0.3042311 -0.2457586 -0.2435776
##  [67]  0.7054742 -1.2240795  0.3870255 -0.3462118 -0.1857398  0.8541510
##  [73] -0.0631331 -0.4762070  0.8898569 -0.9138903  1.8138317  0.5439800
##  [79]  1.3256203 -0.4646191  0.0990272  0.6765738  0.4811449  0.2558923
##  [85]  0.2881863  1.5519616  1.6389326 -0.3493246 -0.3216736  0.2071126
##  [91] -1.9031895  0.0131968 -0.8258077 -0.3302246  0.0989575 -1.9516861
##  [97]  0.2312368  1.0210516 -2.0970142 -0.1919916

sample(50,100,replace=T)  # draw 100 numbers from 1 to 50 with replacement

##   [1] 47  8 29  9 26 20 41 13 50 40  1 16 26 18 37 14 17  5 23 32 22  7 45 48 28
##  [26] 16 45 37  3 44 15 20  5 24  6 44 48 44 35 46  8 43 22  6  3 24  1 41 16 40
##  [51]  3 44 49  6  5 28 44 35  4 22  3 25 20 31 49  3 50 49 18  9 30 44  9 48 48
##  [76] 22 41 11 46 38 50 41 50  4 40 36 38 28 32  6 28 25 33 38 20 19  6 32 33  3

factorial(10)             # compute 10 factorial

## [1] 3628800

colMeans(df)              # column means

## 근무만족도   대인관계   자아개념   근무평정    SES점수 
##     63.400     48.533     54.333     76.267     57.133

colSums(df)               # column sums

## 근무만족도   대인관계   자아개념   근무평정    SES점수 
##       1902       1456       1630       2288       1714

apply(df,2,sd)            # column standard deviations

## 근무만족도   대인관계   자아개념   근무평정    SES점수 
##     18.548     20.134     22.166     12.086     15.167

rowMeans(df)              # row means

##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 75.2 70.6 75.2 71.0 73.4 75.4 75.2 58.2 66.8 66.6 70.6 59.6 51.6 55.6 62.6 62.4 
##   17   18   19   20   21   22   23   24   25   26   27   28   29   30 
## 58.2 62.6 62.6 53.8 51.2 42.8 51.6 49.4 47.2 49.4 42.8 40.6 51.6 64.2

rowSums(df)               # row sums

##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
## 376 353 376 355 367 377 376 291 334 333 353 298 258 278 313 312 291 313 313 269 
##  21  22  23  24  25  26  27  28  29  30 
## 256 214 258 247 236 247 214 203 258 321

apply(df,1,sd)            # row standard deviations

##       1       2       3       4       5       6       7       8       9      10 
## 23.4350 31.7616 24.4786 27.3953 23.9437 17.7285 24.5906 14.3422 15.3851 20.5985 
##      11      12      13      14      15      16      17      18      19      20 
## 12.5220 14.3457 16.6823 22.2666 16.6823  8.9051 18.0748 19.9825 14.7580 14.3422 
##      21      22      23      24      25      26      27      28      29      30 
## 28.9085  9.2033 16.6823 16.6823 26.2621 16.6823 14.3422 12.5419 12.5419  9.0664

cbind(df[,1],df[,4])      # bind columns 1 and 4 of df side by side

##       [,1] [,2]
##  [1,]   88   88
##  [2,]   98   78
##  [3,]   88   98
##  [4,]   89   77
##  [5,]   89   89
##  [6,]   78   89
##  [7,]   78   98
##  [8,]   67   67
##  [9,]   78   78
## [10,]   89   67
## [11,]   88   56
## [12,]   67   67
## [13,]   45   78
## [14,]   43   89
## [15,]   45   78
## [16,]   56   67
## [17,]   45   78
## [18,]   67   67
## [19,]   56   78
## [20,]   56   67
## [21,]   45   98
## [22,]   45   56
## [23,]   56   67
## [24,]   45   78
## [25,]   56   89
## [26,]   45   78
## [27,]   34   67
## [28,]   45   56
## [29,]   56   67
## [30,]   65   78

rbind(df[1,],df[3,])      # stack rows 1 and 3 of df

##   근무만족도 대인관계 자아개념 근무평정 SES점수
## 1         88       34       78       88      88
## 3         88       34       78       98      78

9. Programming Basics

for (i in (1:10)) {    # print the numbers 1 to 10
  print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

hap = 0
for (i in 1:10) {           # sum of 1 to 10 (hap means sum)
  hap = hap + i
}
print(hap)

## [1] 55

gop=1                       # product of 1 to 10 (gop means product)
for (i in 1:10) {
  gop = gop * i
}
print(gop)

## [1] 3628800

i = j = 10                     # runs while i is greater than 5 and j is less than 15
while ((i > 5) && (j < 15)) {
  cat("i =",i,"\tj =",j,"\n")
  i = i -1
  j = j -1
}

## i = 10   j = 10 
## i = 9    j = 9 
## i = 8    j = 8 
## i = 7    j = 7 
## i = 6    j = 6

for (i in 1:10) {             # label each number from 1 to 10 as odd or even
  if (i %% 2 == 0) {
    print(paste(i,"is even number"))
  } else {  
    print(paste(i,"is odd number"))
  }
}

## [1] "1 is odd number"
## [1] "2 is even number"
## [1] "3 is odd number"
## [1] "4 is even number"
## [1] "5 is odd number"
## [1] "6 is even number"
## [1] "7 is odd number"
## [1] "8 is even number"
## [1] "9 is odd number"
## [1] "10 is even number"

hap = function(n) {         # sum of the integers from 1 to n
  hap = 0
  for (i in 1:n) {
    hap = hap + i
  }
  print(hap)
}

hap (10)

## [1] 55

hap (100)

## [1] 5050

hap (1000)

## [1] 500500
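The function above prints its result, so the sum cannot be reused in later calculations. A small variant, sketched here under the hypothetical name `sum_to`, returns the value instead and uses `seq_len(n)` so that `n = 0` yields 0 rather than looping over `1:0`. The closed form n(n+1)/2 gives a quick check.

```r
sum_to <- function(n) {
  total <- 0
  for (i in seq_len(n)) {   # seq_len(0) is empty, so the loop is skipped for n = 0
    total <- total + i
  }
  total                     # the last expression is the return value
}

sum_to(100)                 # 5050
sum_to(0)                   # 0
sum_to(1000) == 1000 * 1001 / 2   # TRUE
```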

10. Exercise 1: Finding Prime Numbers

# Prime Number Function: Sieve of Eratosthenes

prime <- function(n) {
  old <- options(scipen=999)        # print n without scientific notation
  on.exit(options(old))             # restore the caller's options on exit
  start <- Sys.time()
  out <- c(1:n)
  out[1] <- 0                       # 1 is not a prime number
  m <- floor(sqrt(n))
  odd <- if (m >= 3) seq(3,m,2) else NULL   # avoids a seq() error when n < 9
  for (i in c(2,odd)) {
    if (i <= m && out[i] != 0) {
      out[seq(i+i,n,i)] <- 0        # cross out the multiples of i
    }
  }
  out = out[out != 0]
  np = length(out)
  # elapsed wall-clock time, always expressed in seconds
  cputime <- as.numeric(difftime(Sys.time(), start, units="secs"))
  
  cat("==============================================","\n")
  cat(" Finding prime numbers between 1 and",n,"\n")
  cat("----------------------------------------------","\n")
  cat(" Number of prime numbers:",np, "\n")
  cat(" CPU Running Time:",round(cputime,3),"seconds. ","\n")
  cat("==============================================","\n")
  
}

prime(100000000)

## ============================================== 
##  Finding prime numbers between 1 and 100000000 
## ---------------------------------------------- 
##  Number of prime numbers: 5761455 
##  CPU Running Time: 4.193 seconds.  
## ==============================================
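To check the sieve on a scale that can be verified by hand, a compact variant can return the primes themselves. This is a sketch under the hypothetical name `sieve`, not the lecture's function: it keeps a logical vector instead of overwriting numbers with 0 and starts crossing out at `i * i`, since smaller multiples have already been removed by smaller factors.

```r
sieve <- function(n) {
  if (n < 2) return(integer(0))          # no primes below 2
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE                   # 1 is not prime
  i <- 2
  while (i * i <= n) {
    if (is_prime[i]) {
      is_prime[seq(i * i, n, by = i)] <- FALSE   # cross out multiples of i
    }
    i <- i + 1
  }
  which(is_prime)                        # indices still marked TRUE are the primes
}

sieve(30)            # 2 3 5 7 11 13 17 19 23 29
length(sieve(100))   # 25
```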

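11. Exercise 2: Printing the Multiplication Table

#*******************************<Code for 9 tables>*****************************
# Print the times tables up to 9, three tables per row (단 in the output labels each table)
# The key step is using lineGap and colGap to find the starting position of each table.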

out = matrix("",(9+1)*3,3)
pos = matrix(0,9,2)
lineGap=9
colGap =3

for (i in 1:9) {
    pos[i,1] = ((i-1) %/% colGap) * lineGap + ((i+(colGap-1)) %/% colGap)
    pos[i,2] = ((i-1) %%  colGap) + 1
}

for (i in 1:9)   {out[pos[i,1], pos[i,2]]  = paste0("[",i,"]단")
  for (j in 1:9) {out[pos[i,1]+j,pos[i,2]] = paste0(i, " x ", j, " = ",i*j)
  }
}

print(out, quote=F)

##       [,1]       [,2]       [,3]      
##  [1,] [1]단      [2]단      [3]단     
##  [2,] 1 x 1 = 1  2 x 1 = 2  3 x 1 = 3 
##  [3,] 1 x 2 = 2  2 x 2 = 4  3 x 2 = 6 
##  [4,] 1 x 3 = 3  2 x 3 = 6  3 x 3 = 9 
##  [5,] 1 x 4 = 4  2 x 4 = 8  3 x 4 = 12
##  [6,] 1 x 5 = 5  2 x 5 = 10 3 x 5 = 15
##  [7,] 1 x 6 = 6  2 x 6 = 12 3 x 6 = 18
##  [8,] 1 x 7 = 7  2 x 7 = 14 3 x 7 = 21
##  [9,] 1 x 8 = 8  2 x 8 = 16 3 x 8 = 24
## [10,] 1 x 9 = 9  2 x 9 = 18 3 x 9 = 27
## [11,] [4]단      [5]단      [6]단     
## [12,] 4 x 1 = 4  5 x 1 = 5  6 x 1 = 6 
## [13,] 4 x 2 = 8  5 x 2 = 10 6 x 2 = 12
## [14,] 4 x 3 = 12 5 x 3 = 15 6 x 3 = 18
## [15,] 4 x 4 = 16 5 x 4 = 20 6 x 4 = 24
## [16,] 4 x 5 = 20 5 x 5 = 25 6 x 5 = 30
## [17,] 4 x 6 = 24 5 x 6 = 30 6 x 6 = 36
## [18,] 4 x 7 = 28 5 x 7 = 35 6 x 7 = 42
## [19,] 4 x 8 = 32 5 x 8 = 40 6 x 8 = 48
## [20,] 4 x 9 = 36 5 x 9 = 45 6 x 9 = 54
## [21,] [7]단      [8]단      [9]단     
## [22,] 7 x 1 = 7  8 x 1 = 8  9 x 1 = 9 
## [23,] 7 x 2 = 14 8 x 2 = 16 9 x 2 = 18
## [24,] 7 x 3 = 21 8 x 3 = 24 9 x 3 = 27
## [25,] 7 x 4 = 28 8 x 4 = 32 9 x 4 = 36
## [26,] 7 x 5 = 35 8 x 5 = 40 9 x 5 = 45
## [27,] 7 x 6 = 42 8 x 6 = 48 9 x 6 = 54
## [28,] 7 x 7 = 49 8 x 7 = 56 9 x 7 = 63
## [29,] 7 x 8 = 56 8 x 8 = 64 9 x 8 = 72
## [30,] 7 x 9 = 63 8 x 9 = 72 9 x 9 = 81

#*******************************<General code>*****************************
# Default: times tables up to 9, three tables per row

gugudan <- function(dan=9,col=3) {
  out = matrix("",(dan+1)*ceiling(dan/col),col)
  pos = matrix(0,dan,2)
  
  for (i in 1:dan) {
    pos[i,1] = ((i-1) %/% col) * dan + ((i+col-1) %/% col)
    pos[i,2] = ((i-1) %%  col) + 1
  }
  for (i in 1:dan) {out[pos[i,1], pos[i,2]]  = paste0("[",i,"]단")
    for (j in 1:dan) {out[pos[i,1]+j,pos[i,2]] = paste0(i, " x ", j, " = ",i*j)}
  }

  require('readr')
  library(readr)
  print(out,quote=F)
  write_excel_csv(as.data.frame(out),"gugudan.csv")
}

gugudan()     # up to the 9 times table, 3 tables per row

## Loading required package: readr

## Warning: package 'readr' was built under R version 4.3.3

##       [,1]       [,2]       [,3]      
##  [1,] [1]단      [2]단      [3]단     
##  [2,] 1 x 1 = 1  2 x 1 = 2  3 x 1 = 3 
##  [3,] 1 x 2 = 2  2 x 2 = 4  3 x 2 = 6 
##  [4,] 1 x 3 = 3  2 x 3 = 6  3 x 3 = 9 
##  [5,] 1 x 4 = 4  2 x 4 = 8  3 x 4 = 12
##  [6,] 1 x 5 = 5  2 x 5 = 10 3 x 5 = 15
##  [7,] 1 x 6 = 6  2 x 6 = 12 3 x 6 = 18
##  [8,] 1 x 7 = 7  2 x 7 = 14 3 x 7 = 21
##  [9,] 1 x 8 = 8  2 x 8 = 16 3 x 8 = 24
## [10,] 1 x 9 = 9  2 x 9 = 18 3 x 9 = 27
## [11,] [4]단      [5]단      [6]단     
## [12,] 4 x 1 = 4  5 x 1 = 5  6 x 1 = 6 
## [13,] 4 x 2 = 8  5 x 2 = 10 6 x 2 = 12
## [14,] 4 x 3 = 12 5 x 3 = 15 6 x 3 = 18
## [15,] 4 x 4 = 16 5 x 4 = 20 6 x 4 = 24
## [16,] 4 x 5 = 20 5 x 5 = 25 6 x 5 = 30
## [17,] 4 x 6 = 24 5 x 6 = 30 6 x 6 = 36
## [18,] 4 x 7 = 28 5 x 7 = 35 6 x 7 = 42
## [19,] 4 x 8 = 32 5 x 8 = 40 6 x 8 = 48
## [20,] 4 x 9 = 36 5 x 9 = 45 6 x 9 = 54
## [21,] [7]단      [8]단      [9]단     
## [22,] 7 x 1 = 7  8 x 1 = 8  9 x 1 = 9 
## [23,] 7 x 2 = 14 8 x 2 = 16 9 x 2 = 18
## [24,] 7 x 3 = 21 8 x 3 = 24 9 x 3 = 27
## [25,] 7 x 4 = 28 8 x 4 = 32 9 x 4 = 36
## [26,] 7 x 5 = 35 8 x 5 = 40 9 x 5 = 45
## [27,] 7 x 6 = 42 8 x 6 = 48 9 x 6 = 54
## [28,] 7 x 7 = 49 8 x 7 = 56 9 x 7 = 63
## [29,] 7 x 8 = 56 8 x 8 = 64 9 x 8 = 72
## [30,] 7 x 9 = 63 8 x 9 = 72 9 x 9 = 81

gugudan(19,5) # up to the 19 times table, 5 tables per row

##       [,1]          [,2]          [,3]          [,4]          [,5]         
##  [1,] [1]단         [2]단         [3]단         [4]단         [5]단        
##  [2,] 1 x 1 = 1     2 x 1 = 2     3 x 1 = 3     4 x 1 = 4     5 x 1 = 5    
##  [3,] 1 x 2 = 2     2 x 2 = 4     3 x 2 = 6     4 x 2 = 8     5 x 2 = 10   
##  [4,] 1 x 3 = 3     2 x 3 = 6     3 x 3 = 9     4 x 3 = 12    5 x 3 = 15   
##  [5,] 1 x 4 = 4     2 x 4 = 8     3 x 4 = 12    4 x 4 = 16    5 x 4 = 20   
##  [6,] 1 x 5 = 5     2 x 5 = 10    3 x 5 = 15    4 x 5 = 20    5 x 5 = 25   
##  [7,] 1 x 6 = 6     2 x 6 = 12    3 x 6 = 18    4 x 6 = 24    5 x 6 = 30   
##  [8,] 1 x 7 = 7     2 x 7 = 14    3 x 7 = 21    4 x 7 = 28    5 x 7 = 35   
##  [9,] 1 x 8 = 8     2 x 8 = 16    3 x 8 = 24    4 x 8 = 32    5 x 8 = 40   
## [10,] 1 x 9 = 9     2 x 9 = 18    3 x 9 = 27    4 x 9 = 36    5 x 9 = 45   
## [11,] 1 x 10 = 10   2 x 10 = 20   3 x 10 = 30   4 x 10 = 40   5 x 10 = 50  
## [12,] 1 x 11 = 11   2 x 11 = 22   3 x 11 = 33   4 x 11 = 44   5 x 11 = 55  
## [13,] 1 x 12 = 12   2 x 12 = 24   3 x 12 = 36   4 x 12 = 48   5 x 12 = 60  
## [14,] 1 x 13 = 13   2 x 13 = 26   3 x 13 = 39   4 x 13 = 52   5 x 13 = 65  
## [15,] 1 x 14 = 14   2 x 14 = 28   3 x 14 = 42   4 x 14 = 56   5 x 14 = 70  
## [16,] 1 x 15 = 15   2 x 15 = 30   3 x 15 = 45   4 x 15 = 60   5 x 15 = 75  
## [17,] 1 x 16 = 16   2 x 16 = 32   3 x 16 = 48   4 x 16 = 64   5 x 16 = 80  
## [18,] 1 x 17 = 17   2 x 17 = 34   3 x 17 = 51   4 x 17 = 68   5 x 17 = 85  
## [19,] 1 x 18 = 18   2 x 18 = 36   3 x 18 = 54   4 x 18 = 72   5 x 18 = 90  
## [20,] 1 x 19 = 19   2 x 19 = 38   3 x 19 = 57   4 x 19 = 76   5 x 19 = 95  
## [21,] [6]단         [7]단         [8]단         [9]단         [10]단       
## [22,] 6 x 1 = 6     7 x 1 = 7     8 x 1 = 8     9 x 1 = 9     10 x 1 = 10  
## [23,] 6 x 2 = 12    7 x 2 = 14    8 x 2 = 16    9 x 2 = 18    10 x 2 = 20  
## [24,] 6 x 3 = 18    7 x 3 = 21    8 x 3 = 24    9 x 3 = 27    10 x 3 = 30  
## [25,] 6 x 4 = 24    7 x 4 = 28    8 x 4 = 32    9 x 4 = 36    10 x 4 = 40  
## [26,] 6 x 5 = 30    7 x 5 = 35    8 x 5 = 40    9 x 5 = 45    10 x 5 = 50  
## [27,] 6 x 6 = 36    7 x 6 = 42    8 x 6 = 48    9 x 6 = 54    10 x 6 = 60  
## [28,] 6 x 7 = 42    7 x 7 = 49    8 x 7 = 56    9 x 7 = 63    10 x 7 = 70  
## [29,] 6 x 8 = 48    7 x 8 = 56    8 x 8 = 64    9 x 8 = 72    10 x 8 = 80  
## [30,] 6 x 9 = 54    7 x 9 = 63    8 x 9 = 72    9 x 9 = 81    10 x 9 = 90  
## [31,] 6 x 10 = 60   7 x 10 = 70   8 x 10 = 80   9 x 10 = 90   10 x 10 = 100
## [32,] 6 x 11 = 66   7 x 11 = 77   8 x 11 = 88   9 x 11 = 99   10 x 11 = 110
## [33,] 6 x 12 = 72   7 x 12 = 84   8 x 12 = 96   9 x 12 = 108  10 x 12 = 120
## [34,] 6 x 13 = 78   7 x 13 = 91   8 x 13 = 104  9 x 13 = 117  10 x 13 = 130
## [35,] 6 x 14 = 84   7 x 14 = 98   8 x 14 = 112  9 x 14 = 126  10 x 14 = 140
## [36,] 6 x 15 = 90   7 x 15 = 105  8 x 15 = 120  9 x 15 = 135  10 x 15 = 150
## [37,] 6 x 16 = 96   7 x 16 = 112  8 x 16 = 128  9 x 16 = 144  10 x 16 = 160
## [38,] 6 x 17 = 102  7 x 17 = 119  8 x 17 = 136  9 x 17 = 153  10 x 17 = 170
## [39,] 6 x 18 = 108  7 x 18 = 126  8 x 18 = 144  9 x 18 = 162  10 x 18 = 180
## [40,] 6 x 19 = 114  7 x 19 = 133  8 x 19 = 152  9 x 19 = 171  10 x 19 = 190
## [41,] [11]단        [12]단        [13]단        [14]단        [15]단       
## [42,] 11 x 1 = 11   12 x 1 = 12   13 x 1 = 13   14 x 1 = 14   15 x 1 = 15  
## [43,] 11 x 2 = 22   12 x 2 = 24   13 x 2 = 26   14 x 2 = 28   15 x 2 = 30  
## [44,] 11 x 3 = 33   12 x 3 = 36   13 x 3 = 39   14 x 3 = 42   15 x 3 = 45  
## [45,] 11 x 4 = 44   12 x 4 = 48   13 x 4 = 52   14 x 4 = 56   15 x 4 = 60  
## [46,] 11 x 5 = 55   12 x 5 = 60   13 x 5 = 65   14 x 5 = 70   15 x 5 = 75  
## [47,] 11 x 6 = 66   12 x 6 = 72   13 x 6 = 78   14 x 6 = 84   15 x 6 = 90  
## [48,] 11 x 7 = 77   12 x 7 = 84   13 x 7 = 91   14 x 7 = 98   15 x 7 = 105 
## [49,] 11 x 8 = 88   12 x 8 = 96   13 x 8 = 104  14 x 8 = 112  15 x 8 = 120 
## [50,] 11 x 9 = 99   12 x 9 = 108  13 x 9 = 117  14 x 9 = 126  15 x 9 = 135 
## [51,] 11 x 10 = 110 12 x 10 = 120 13 x 10 = 130 14 x 10 = 140 15 x 10 = 150
## [52,] 11 x 11 = 121 12 x 11 = 132 13 x 11 = 143 14 x 11 = 154 15 x 11 = 165
## [53,] 11 x 12 = 132 12 x 12 = 144 13 x 12 = 156 14 x 12 = 168 15 x 12 = 180
## [54,] 11 x 13 = 143 12 x 13 = 156 13 x 13 = 169 14 x 13 = 182 15 x 13 = 195
## [55,] 11 x 14 = 154 12 x 14 = 168 13 x 14 = 182 14 x 14 = 196 15 x 14 = 210
## [56,] 11 x 15 = 165 12 x 15 = 180 13 x 15 = 195 14 x 15 = 210 15 x 15 = 225
## [57,] 11 x 16 = 176 12 x 16 = 192 13 x 16 = 208 14 x 16 = 224 15 x 16 = 240
## [58,] 11 x 17 = 187 12 x 17 = 204 13 x 17 = 221 14 x 17 = 238 15 x 17 = 255
## [59,] 11 x 18 = 198 12 x 18 = 216 13 x 18 = 234 14 x 18 = 252 15 x 18 = 270
## [60,] 11 x 19 = 209 12 x 19 = 228 13 x 19 = 247 14 x 19 = 266 15 x 19 = 285
## [61,] [16]단        [17]단        [18]단        [19]단                     
## [62,] 16 x 1 = 16   17 x 1 = 17   18 x 1 = 18   19 x 1 = 19                
## [63,] 16 x 2 = 32   17 x 2 = 34   18 x 2 = 36   19 x 2 = 38                
## [64,] 16 x 3 = 48   17 x 3 = 51   18 x 3 = 54   19 x 3 = 57                
## [65,] 16 x 4 = 64   17 x 4 = 68   18 x 4 = 72   19 x 4 = 76                
## [66,] 16 x 5 = 80   17 x 5 = 85   18 x 5 = 90   19 x 5 = 95                
## [67,] 16 x 6 = 96   17 x 6 = 102  18 x 6 = 108  19 x 6 = 114               
## [68,] 16 x 7 = 112  17 x 7 = 119  18 x 7 = 126  19 x 7 = 133               
## [69,] 16 x 8 = 128  17 x 8 = 136  18 x 8 = 144  19 x 8 = 152               
## [70,] 16 x 9 = 144  17 x 9 = 153  18 x 9 = 162  19 x 9 = 171               
## [71,] 16 x 10 = 160 17 x 10 = 170 18 x 10 = 180 19 x 10 = 190              
## [72,] 16 x 11 = 176 17 x 11 = 187 18 x 11 = 198 19 x 11 = 209              
## [73,] 16 x 12 = 192 17 x 12 = 204 18 x 12 = 216 19 x 12 = 228              
## [74,] 16 x 13 = 208 17 x 13 = 221 18 x 13 = 234 19 x 13 = 247              
## [75,] 16 x 14 = 224 17 x 14 = 238 18 x 14 = 252 19 x 14 = 266              
## [76,] 16 x 15 = 240 17 x 15 = 255 18 x 15 = 270 19 x 15 = 285              
## [77,] 16 x 16 = 256 17 x 16 = 272 18 x 16 = 288 19 x 16 = 304              
## [78,] 16 x 17 = 272 17 x 17 = 289 18 x 17 = 306 19 x 17 = 323              
## [79,] 16 x 18 = 288 17 x 18 = 306 18 x 18 = 324 19 x 18 = 342              
## [80,] 16 x 19 = 304 17 x 19 = 323 18 x 19 = 342 19 x 19 = 361
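When only the products are needed rather than the formatted layout, base R's `outer()` builds the whole table in one call. A minimal sketch, with `tbl` as an illustrative name:

```r
tbl <- outer(1:9, 1:9)             # tbl[i, j] holds i * j
dimnames(tbl) <- list(1:9, 1:9)    # label rows and columns 1 to 9

tbl[7, 8]                          # 56
tbl[9, 9]                          # 81
```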

12. Exercise 3: ANOVA Program

library(openxlsx)
library(knitr)
library(kableExtra)
options(knitr.kable.NA = '')
#**************************************************

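# Append the Korean topic particle: "은" after a final consonant, otherwise "는"
josa <- function(word) {
  k  <- substr(word,nchar(word),nchar(word))
  cp <- utf8ToInt(k)                       # code point of the last character
  # Hangul syllables span U+AC00 ("가") to U+D7A3 ("힣");
  # (cp - U+AC00) %% 28 > 0 means the syllable ends in a consonant
  if ((cp >= 0xAC00) && (cp <= 0xD7A3) && (((cp - 0xAC00) %% 28) > 0)) {
    return (paste0(word,"은 "))
  } else {
    return (paste0(word,"는 "))
  }
}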

#**************************************************
ANOVA  = function(data) {
  
  vname=colnames(data)
  x = as.matrix(data[,1])               # grouping variable
  y = as.matrix(data[,2])               # dependent variable

  m = aggregate(y ~ x, 'FUN'= mean)     # group means
  s = aggregate(y ~ x, 'FUN'= sd)       # group standard deviations
  n = aggregate(y ~ x, 'FUN'= length)   # group sizes
  tmean= mean(y)                        # grand mean

  ssb = sum((m[,2] - tmean)^2 * n[,2])  # between-group sum of squares
  sst = sum((y - tmean)^2)              # total sum of squares
  ssw = sst - ssb                       # within-group sum of squares

  df1 = nrow(m) - 1
  df2 = nrow(y) - nrow(m)
  dft = nrow(y) - 1

  f = (ssb/df1) / (ssw/df2)
  p = pf(f,df1,df2,lower.tail = FALSE)

  table = matrix(NaN, 3, 5)
  table[1,1] = ssb
  table[2,1] = ssw
  table[3,1] = sst
  table[1,2] = df1
  table[2,2] = df2
  table[3,2] = dft
  table[1,3] = ssb/df1
  table[2,3] = ssw/df2
  table[1,4] = f
  table[1,5] = p

  colnames(table) = c("Sum of Squares", "df", "Mean Square","F","p")
  rownames(table) = c("Between Groups", "Within Groups","Total")

#*******************************<Practice 1: building the table of means>***********************

  stat = matrix(NaN, nrow(m)+1, 5)
  temp = cbind(m[,2],s[,2],n[,2])
  stat[1:nrow(m),1:3] = temp            # one row per group, for any number of groups
  stat[nrow(m)+1,1] = tmean
  stat[nrow(m)+1,2] = sd(y)
  stat[nrow(m)+1,3] = nrow(y)
  stat[1,4] = f
  stat[1,5] = p

  colnames(stat) = c("Mean","SD","N","F","p")
  rownames(stat) = c(m[,1],"Total")

  # Korean sentence: "Hypothesis: <dependent variable> will differ statistically by <factor>."
  hypo = paste0("가설: ",josa(vname[2]),vname[1],"에 따라 통계적으로 차이가 있을 것이다.")

  if (p < 0.001) {
    level = 'p < 0.001'
  } else if (p < 0.01) {
    level = 'p < 0.01'
  } else if (p < 0.05) {
    level = 'p < 0.05'
  }

  if (p >= 0.05) {
    # "<dependent variable> shows no statistically significant difference by <factor>."
    res = paste0(josa(vname[2]), vname[1],"에 따라 통계적으로 의미있는 차이가 없다.")
  } else {
    # "<dependent variable> differs significantly by <factor> at the <level> level."
    res = paste0(josa(vname[2]),vname[1],"에 따라 통계적으로 ",level,"수준에서 의미있는 차이가 있다. (F = ",f,", df1 = ",df1,", df2 = ",df2,", p = ",round(p,4),')')
  }
  # build the result for both branches so that a non-significant test also returns
  result = list(hypo=hypo,table=table,stat=stat,res=res)
  return(result)
}

data = read.csv("anova.csv")

out=ANOVA(data)
# 근무만족도 = job satisfaction, 상사의유형 = supervisor type
print(out$hypo,quote=F)

## [1] 가설: 근무만족도는 상사의유형에 따라 통계적으로 차이가 있을 것이다.

out$table %>%
  kbl(caption = "ANOVA Table") %>%
  kable_classic_2(full_width = F, html_font = "D2Coding", position='left')

ANOVA Table
	Sum of Squares	df	Mean Square	F	p
Between Groups	40.444	2	20.2222	4.0625	0.03891
Within Groups	74.667	15	4.9778
Total	115.111	17

out$stat %>%
  kbl(caption = "MEAN Table") %>%
  kable_classic_2(full_width = F, html_font = "D2Coding", position='left')

MEAN Table
	Mean	SD	N	F	p
민주형	11.3333	3.5590	6	4.0625	0.03891
자유방임형	8.3333	1.0328	6
전제형	8.0000	1.0955	6
Total	9.2222	2.6022	18

print(out$res,quote=F)

## [1] 근무만족도는 상사의유형에 따라 통계적으로 p < 0.05수준에서 의미있는 차이가 있다. (F = 4.0625, df1 = 2, df2 = 15, p = 0.0389)
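A hand-written ANOVA can be cross-checked against R's built-in `aov()`. The sketch below uses a small made-up data set (three groups of four values, not the lecture's `anova.csv`) and recomputes F and p from the sums of squares; all variable names are illustrative.

```r
g <- factor(rep(c("A", "B", "C"), each = 4))
y <- c(5, 6, 7, 6,   8, 9, 7, 8,   4, 5, 3, 4)

# Manual one-way ANOVA
grand <- mean(y)
means <- tapply(y, g, mean)              # group means
n     <- tapply(y, g, length)            # group sizes
ssb   <- sum(n * (means - grand)^2)      # between-group sum of squares
ssw   <- sum((y - means[g])^2)           # within-group sum of squares
df1   <- nlevels(g) - 1
df2   <- length(y) - nlevels(g)
f_manual <- (ssb / df1) / (ssw / df2)
p_manual <- pf(f_manual, df1, df2, lower.tail = FALSE)

# Built-in ANOVA for comparison
fit   <- summary(aov(y ~ g))[[1]]
f_aov <- fit[["F value"]][1]
p_aov <- fit[["Pr(>F)"]][1]

all.equal(f_manual, f_aov)               # TRUE
all.equal(p_manual, p_aov)               # TRUE
```

With these values the group means are 6, 8 and 4, so the between-group sum of squares is 32, the within-group sum of squares is 6, and F = (32/2) / (6/9) = 24.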

13. Exercise 4: Regression Coefficient Program (b and \(\beta\))

library(openxlsx)
library(knitr)
library(kableExtra)
options(knitr.kable.NA = '')
reg=function(df){

  y=as.matrix(df[1])                 # dependent variable: first column
  x=as.matrix(df[-1])                # predictors: remaining columns
  x=cbind(1,x)                       # prepend a column of ones for the intercept (base R has no ones())

  # standardize every column to z-scores
  zdf=as.matrix(df)
  zdf <- as.matrix((zdf-matrix(1,nrow(zdf),1)%*%colMeans(zdf))/
                     matrix(1,nrow(zdf),1)%*%apply(zdf,2,sd))
  xx=t(x)%*%x
  xy=t(x)%*%y
  b<-solve(xx)%*%xy                  # unstandardized coefficients: (X'X)^-1 X'y

  zy=zdf[,1]
  zx=zdf[,-1]
  zxx = t(zx)%*% zx
  zxy = t(zx)%*% zy
  beta = solve(zxx) %*% zxy          # standardized coefficients from the z-scores
  rownames(b)[1]<-"(Constant)"
  result = list(b=b, beta=beta)
  return(result)
}

library(openxlsx)
df=read.xlsx("regress.xlsx")
out = reg(df)

out$b %>%
  kbl(caption = "Unstandardized regression coefficients (b)") %>%
  kable_classic_2(full_width = F, html_font = "D2Coding", position='left')

Unstandardized regression coefficients (b)
	근무만족도
(Constant)	22.04299
대인관계	-0.00812
자아개념	0.52797
근무평정	-0.13881
SES점수	0.41397

out$beta %>%
  kbl(caption = "Standardized regression coefficients (beta)") %>%
  kable_classic_2(full_width = F, html_font = "D2Coding", position='left')

Standardized regression coefficients (beta)
대인관계	-0.00882
자아개념	0.63094
근무평정	-0.09044
SES점수	0.33851
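The matrix formulas can be checked against `lm()`. The following sketch uses simulated data (the seed, the coefficients 2, 1.5 and -0.5, and all names are arbitrary assumptions, not the lecture's `regress.xlsx`). It also uses the identity that each standardized coefficient equals the unstandardized one multiplied by sd(x) / sd(y).

```r
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- 2 + 1.5 * d$x1 - 0.5 * d$x2 + rnorm(20)

# Unstandardized coefficients from the normal equations
X <- cbind(1, as.matrix(d[, c("x1", "x2")]))
b <- solve(t(X) %*% X, t(X) %*% d$y)

fit  <- lm(y ~ x1 + x2, data = d)
ok_b <- all.equal(as.vector(b), unname(coef(fit)))

# Standardized coefficients: rescale b, then compare with a fit on z-scores
beta    <- b[-1] * sapply(d[, c("x1", "x2")], sd) / sd(d$y)
z       <- as.data.frame(scale(d))
ok_beta <- all.equal(unname(beta), unname(coef(lm(y ~ x1 + x2, data = z))[-1]))

ok_b      # TRUE
ok_beta   # TRUE
```

Passing the right-hand side to `solve()` directly avoids forming the explicit inverse, which is usually the more numerically stable choice.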

14. R Base Graphs

library(openxlsx)
df1 = read.xlsx("twoway.xlsx")
df3 = read.xlsx("regress.xlsx")
df4 = read.xlsx("ClassExample.xlsx")

# Bar Chart
barplot(table(df4$sliptime))

# Bar Plot with titles
barplot(table(df4$sliptime),
        main = "Sleep Time Frequency",
        xlab = "Sleep Time Level",
        ylab = "Frequency",
        col  = 'red'
        )

# Box Plot 
boxplot(df1$근무만족도 ~ df1$상사의유형)

# Box Plot with titles
boxplot(df1$근무만족도 ~ df1$상사의유형,
        main = 'Job Satisfaction by Supervisor Type',
        xlab = "Supervisor Type",
        ylab = "Job Satisfaction",
        col = 'yellow')

# Scatter Plot
#plot(df3$근무만족도,df3$근무평정,
#     main = 'Job Satisfaction vs. Performance Rating',
#     xlab = 'Job Satisfaction',
#     ylab = 'Performance Rating',
#     col = "red",
#     pch = 20)

# Multiple Scatter Plots
pairs(df3,
      main = 'Multi Plots',
      pch = 20,
      col = "red")

# Pie Chart
pie(table(df4$sliptime))

# Pie Chart with titles
df6 = table(df4$sliptime)
df6 = data.frame(freq=df6,lab=c("1-3hrs","3-5hrs","6~7hrs",
                                "7~8hrs","8~9hrs","10~hrs"))
pie(df6[,2],labels=df6[,3])

df7 = aggregate(df4$weight ~df4$sliptime, df4, mean)

# Plot with point
plot(df7[,1],df7[,2])

# Plot with line and titles
plot(df7[,1],df7[,2],
     main = "Weight by Sleeptime",
     xlab = "Sleep Time", ylab="Weight", type='o')

15. ggplot Graph

library(ggplot2)

month = c(1,2,3,4,5,6)
rain  = c(55,50,45,50,60,70)

df = data.frame(month=month, rain=rain)   # column names match the aes() mappings below

# Bar Chart
ggplot(df,aes(x=month, y=rain)) +
  geom_bar(stat = "identity",
           width = 0.7,
           fill = 'steelblue')

# Bar Chart flipped
ggplot(df,aes(x=month, y=rain)) +
  geom_bar(stat = "identity",
           width = 0.7,
           fill = 'steelblue') +
  coord_flip()

# Bar Chart flipped with title
ggplot(df, aes(x=month, y=rain)) +
  geom_bar(stat = "identity",
           width=0.7,
           fill="steelblue") +
  ggtitle("Monthly Rainfall") +
  theme(plot.title=element_text(size=25,face="bold",color="steelblue")) +
  labs(x='Month',y="Rainfall") +
  coord_flip()

# Histogram
ggplot(iris, aes(x=Petal.Length)) +
  geom_histogram(binwidth=0.5, fill='red', color='black')

# Histogram by grouping variable
ggplot(iris,aes(x=Sepal.Width, fill=Species, color=Species)) +
  geom_histogram(binwidth =0.5, position='dodge', color='black') +
  theme(legend.position = "top")

# Plot
ggplot(iris, aes(x=Petal.Length, y=Petal.Width)) +
  geom_point()

# Plot by grouping variable
ggplot(iris, aes(x=Petal.Length, y=Petal.Width, color=Species)) +
  geom_point(size=3) +
  ggtitle("Petal Length and Width") +
  theme(plot.title=element_text(size=25,face="bold", color='steelblue'))

# Box Plot
ggplot(iris, aes(y=Petal.Length, fill=Species)) +
  geom_boxplot()

# Line chart
year = 1937:1960
cnt = as.vector(airmiles)
df  = data.frame(year,cnt)
ggplot(df, aes(x=year, y=cnt)) +
  geom_line(col='red')

# Plot with colored grouping
ggplot() +
  geom_point(mapping=aes(x=displ, y=hwy,color=class), data=mpg)

# Plot with a continuous color gradient (color mapped to the numeric variable displ)
ggplot() +
  geom_point(mapping=aes(x=cty, y=hwy, color=displ), data=mpg)

# Plot with different shapes by grouping variable
ggplot() +
  geom_point(mapping=aes(x=displ, y=hwy, color=class, shape=drv), data=mpg)

# Scatter plot of the Orange data
ggplot() +
  geom_point(mapping=aes(x=age, y=circumference), data=Orange)

# Line chart (no grouping, so all five trees are joined into one zigzag line)
ggplot() +
  geom_line(mapping=aes(x=age, y=circumference), data=Orange)

# Multiple line charts: one line per Tree, distinguished by linetype
ggplot() +
  geom_line(mapping=aes(x=age, y=circumference, linetype=Tree), data=Orange)
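
The two Orange line charts above differ only in whether the series is split by Tree. A minimal sketch of the same idea using the group aesthetic, which separates the lines without changing their appearance. The use of ggplot_build() to count the lines is only an illustration and was not part of the lecture.

```r
library(ggplot2)

# One line per tree: group = Tree splits the series without changing its look
p = ggplot(Orange, aes(x = age, y = circumference, group = Tree)) +
  geom_line()
print(p)

# Number of separate lines that ggplot2 will draw (Orange has five trees)
n_lines = length(unique(ggplot_build(p)$data[[1]]$group))
print(n_lines)
```

Mapping linetype, color, or shape to a factor sets the grouping implicitly, which is why linetype=Tree also produces one line per tree.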

# Scatter plot with a smooth line
ggplot() +
  geom_point(mapping=aes(x=displ, y=hwy), data=mpg) +
  geom_smooth(mapping=aes(x=displ, y=hwy), data=mpg)

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Plot with multiple smooth lines
ggplot() +
  geom_point(mapping=aes(x=displ, y=hwy), data=mpg) +
  geom_smooth(mapping=aes(x=displ, y=hwy), data=mpg) +
  geom_point(mapping=aes(x=displ, y=cty), data=mpg, col='red', shape=1) +
  geom_smooth(mapping=aes(x=displ, y=cty), data=mpg, linetype=2, col='red')

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# Colored Bar Chart by grouping variable
ggplot(mpg) + geom_bar(aes(class, fill=drv))

# Density curves by grouping variable
ggplot(mpg, aes(hwy, color=drv)) + geom_density()

# Proportional (fill) bar chart in polar coordinates
ggplot(mpg, aes(class, fill=drv)) +
  geom_bar(position='fill') + coord_polar()

### 16. Data Preprocessing (data.table)

Basic data.table syntax

filename[(1), (2), (3)][(4)]

(1): row (case) selection; leave blank to select all cases
(2): column (variable) selection and summaries
(3): grouping variable (by); using keyby instead of by also sorts the result in ascending key order
(4): sorting by key variables, e.g. [order(k1, -k2)] sorts ascending by k1 and descending by k2
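
The four slots can be tried out on a small table before running them on the lecture files. The sketch below uses a hypothetical toy table (the names dt, k1, k2 and the values are assumptions for illustration only).

```r
library(data.table)

# Hypothetical toy table (assumption: not one of the lecture data files)
dt = data.table(k1 = c(1, 1, 2, 2, 2),
                k2 = c(1, 2, 1, 2, 2),
                y  = c(10, 20, 30, 40, 50))

# (1) rows: y > 10, (2) summary: mean and count, (3) grouping: k1, k2
res = dt[y > 10, .(Mean = mean(y), N = .N), by = .(k1, k2)]

# (4) sort: k1 ascending, k2 descending
res = res[order(k1, -k2)]
print(res)
```

Three groups survive the row filter, and within k1 = 2 the k2 = 2 group is listed before k2 = 1 because of the descending sort.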

library(data.table)
library(openxlsx)
library(knitr)
library(kableExtra)

df1 = fread("anova.csv")
df2 = fread("tempbig.csv")
df3 = as.data.table(read.xlsx("twoway1.xlsx"))

# For all cases in df2, compute the overall mean / SD / count of y using non-missing values only
df2[, .(Mean=mean(y, na.rm=TRUE), SD=sd(y, na.rm=TRUE), N=sum(!is.na(y)))]

##      Mean     SD        N
##     <num>  <num>    <int>
## 1: 61.215 27.514 10000000
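
The summary above counts cases with sum(!is.na(y)) rather than with .N. A minimal sketch of the difference, on a hypothetical toy table with one missing value (the names dt, g and the values are assumptions for illustration only):

```r
library(data.table)

# Hypothetical toy table with one missing y (assumption: not the lecture data)
dt = data.table(g = c("a", "a", "b"),
                y = c(1, NA, 3))

res = dt[, .(Mean    = mean(y, na.rm = TRUE),   # mean of non-missing values
             N_valid = sum(!is.na(y)),          # count of non-missing values
             N_rows  = .N),                     # count of all rows in the group
         keyby = g]
print(res)
```

For group a the two counts differ (1 valid value out of 2 rows), so N_valid is the count that matches a mean computed with na.rm = TRUE.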

# For all cases in df2, group by x1 and x2, compute each group's mean / SD / count from
# non-missing values, and sort the result in ascending order of x1, x2 (keyby)
out = df2[, .(Mean=mean(y, na.rm=TRUE), SD=sd(y, na.rm=TRUE), N=sum(!is.na(y))), keyby = .(x1,x2)]

# Render the result (out) as a formatted table with kableExtra
out %>%
  kbl(caption = "df2 Output") %>%
  kable_classic(full_width = FALSE, html_font = "NanumGothicCoding", position = 'left')

df2 Output

x1  x2    Mean      SD        N
 1   1  60.223  29.101  1021910
 1   2  62.809  27.611  1395690
 1   3  60.827  28.032  1246580
 1   4  54.759  26.603   789890
 1   5  51.329  26.428   489860
 1   6  47.394  25.760   354670
 2   1  63.309  26.892  1843800
 2   2  66.919  26.112  1671400
 2   3  66.280  25.149   677430
 2   4  58.400  26.930   275400
 2   5  53.890  28.434   145660
 2   6  51.272  24.851    87710

# For all cases in df2, group by x1 and x2, compute each group's mean / SD / count from
# non-missing values, then sort ascending by x1 and descending by x2
df2[, .(Mean=mean(y, na.rm=TRUE), SD=sd(y, na.rm=TRUE), N=sum(!is.na(y))), by=.(x1,x2)][order(x1,-x2)]

##        x1    x2   Mean     SD       N
##     <int> <int>  <num>  <num>   <int>
##  1:     1     6 47.394 25.760  354670
##  2:     1     5 51.329 26.428  489860
##  3:     1     4 54.759 26.603  789890
##  4:     1     3 60.827 28.032 1246580
##  5:     1     2 62.809 27.611 1395690
##  6:     1     1 60.223 29.101 1021910
##  7:     2     6 51.272 24.851   87710
##  8:     2     5 53.890 28.434  145660
##  9:     2     4 58.400 26.930  275400
## 10:     2     3 66.280 25.149  677430
## 11:     2     2 66.919 26.112 1671400
## 12:     2     1 63.309 26.892 1843800
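
The keyby call and the by + order() chain above give differently ordered results. A minimal sketch of how by and keyby differ on their own, using a hypothetical toy table (the names dt, g and the values are assumptions for illustration only):

```r
library(data.table)

# Hypothetical toy table (assumption: not the lecture data)
dt = data.table(g = c("b", "a", "b", "a"),
                y = c(1, 2, 3, 4))

# by: groups appear in order of first occurrence, and no key is set
r_by = dt[, .(Mean = mean(y)), by = g]
print(r_by)
print(key(r_by))

# keyby: groups are sorted ascending, and the result is keyed by g
r_keyby = dt[, .(Mean = mean(y)), keyby = g]
print(r_keyby)
print(key(r_keyby))
```

Because keyby only sorts in ascending order, a mixed ordering such as ascending x1 with descending x2 needs by followed by [order(x1, -x2)].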

# For cases with x2 <= 4, compute the mean / SD / count and sort by x1, x2
df2[x2 <= 4, .(Mean=mean(y), SD=sd(y), N=sum(!is.na(y))),  keyby = .(x1,x2)]

## Key: <x1, x2>
##       x1    x2   Mean     SD       N
##    <int> <int>  <num>  <num>   <int>
## 1:     1     1 60.223 29.101 1021910
## 2:     1     2 62.809 27.611 1395690
## 3:     1     3 60.827 28.032 1246580
## 4:     1     4 54.759 26.603  789890
## 5:     2     1 63.309 26.892 1843800
## 6:     2     2 66.919 26.112 1671400
## 7:     2     3 66.280 25.149  677430
## 8:     2     4 58.400 26.930  275400

# For cases with x2 <= 4 and x1 == 1, compute the mean / SD / count and sort by x1, x2
df2[x2 <= 4 & x1 == 1, .(Mean=mean(y), SD=sd(y), N=sum(!is.na(y))), keyby = .(x1,x2)]

## Key: <x1, x2>
##       x1    x2   Mean     SD       N
##    <int> <int>  <num>  <num>   <int>
## 1:     1     1 60.223 29.101 1021910
## 2:     1     2 62.809 27.611 1395690
## 3:     1     3 60.827 28.032 1246580
## 4:     1     4 54.759 26.603  789890

# Using df1: (1) use all cases, (2) compute the mean / SD / count of job satisfaction (근무만족도),
#            (3) group by supervisor type (상사의유형), sorted via keyby
df1[, .(Mean=mean(근무만족도), SD=sd(근무만족도), N=sum(!is.na(근무만족도))),
           keyby = .(상사의유형)]

## Key: <상사의유형>
##    상사의유형    Mean     SD     N
##        <char>   <num>  <num> <int>
## 1:     민주형 11.3333 3.5590     6
## 2: 자유방임형  8.3333 1.0328     6
## 3:     전제형  8.0000 1.0954     6

Statistics with R

김경성

2024-4-27

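R Lecture Summary

1. Types of Operators

2. Creating and Selecting Vectors

3. Creating and Selecting Matrices

4. Creating and Selecting Arrays

5. Creating and Selecting Data Frames

6. Creating and Selecting Lists

7. Basics for Programming (print)

8. Built-in Functions

9. Programming Basics

10. Exercise 1: Finding Prime Numbers

11. Exercise 2: Printing the Multiplication Table

12. Exercise 3: ANOVA Program

13. Exercise 4: Regression Coefficient Program (b and \(\beta\))

14. R base Graph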