[Author]
@ JungHwan Yun
@ Master's Student in Data Science
@ Seoul National University of Science & Technology(SeoulTech)
@ E-mail : junghwan.yun@seoultech.ac.kr
[Contents]
@ Topic : CRM : Data Mining Practice : PART2
@ Class : Customer Relationship Management
@ Version : 2.0
@ Version date : 2017-05-11
@ Summary : Handling data and data input/output in R
(d <- data.frame(NAME=c('YUN','GEUM','MOON','KANG','KIM'),
MATH=c(2,4,6,8,10),
ENGLISH=c(3,6,9,12,15),
KOREAN=c(4,8,12,16,20),
TOTAL_GRADE = c('D','C','B','B','A')))
## NAME MATH ENGLISH KOREAN TOTAL_GRADE
## 1 YUN 2 3 4 D
## 2 GEUM 4 6 8 C
## 3 MOON 6 9 12 B
## 4 KANG 8 12 16 B
## 5 KIM 10 15 20 A
str(d)
## 'data.frame': 5 obs. of 5 variables:
## $ NAME : Factor w/ 5 levels "GEUM","KANG",..: 5 1 4 2 3
## $ MATH : num 2 4 6 8 10
## $ ENGLISH : num 3 6 9 12 15
## $ KOREAN : num 4 8 12 16 20
## $ TOTAL_GRADE: Factor w/ 4 levels "A","B","C","D": 4 3 2 2 1
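The Factor columns in the str(d) output above reflect the data.frame() default in R versions before 4.0.0, where stringsAsFactors was TRUE. Since R 4.0.0 the default is FALSE, so the same call yields chr columns instead. A minimal sketch that makes the choice explicit (d_chr and d_fct are illustrative names, not part of the original material):

```r
# Same input, two explicit settings of stringsAsFactors
d_chr <- data.frame(NAME = c("YUN", "GEUM"), stringsAsFactors = FALSE)
d_fct <- data.frame(NAME = c("YUN", "GEUM"), stringsAsFactors = TRUE)

chr_class  <- class(d_chr$NAME)   # character vector
fct_class  <- class(d_fct$NAME)   # factor
fct_levels <- levels(d_fct$NAME)  # levels are sorted, not in input order
```

Passing stringsAsFactors explicitly keeps the result the same across R versions.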
d[1,]
## NAME MATH ENGLISH KOREAN TOTAL_GRADE
## 1 YUN 2 3 4 D
d[,1]
## [1] YUN GEUM MOON KANG KIM
## Levels: GEUM KANG KIM MOON YUN
d[1:2,1:3]
## NAME MATH ENGLISH
## 1 YUN 2 3
## 2 GEUM 4 6
d$NAME
## [1] YUN GEUM MOON KANG KIM
## Levels: GEUM KANG KIM MOON YUN
d$KOREAN
## [1] 4 8 12 16 20
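The positional and $ indexing shown above can also be combined with a logical condition to select rows. The sketch below rebuilds a small two-column version of the table under the hypothetical name scores, so that it runs on its own:

```r
# A reduced copy of the example table (scores is an illustrative name)
scores <- data.frame(NAME = c("YUN", "GEUM", "MOON", "KANG", "KIM"),
                     MATH = c(2, 4, 6, 8, 10),
                     stringsAsFactors = FALSE)

high       <- scores[scores$MATH >= 6, ]         # rows where MATH is at least 6
high_names <- scores[scores$MATH >= 6, "NAME"]   # a single column comes back as a vector
math_df    <- scores[, "MATH", drop = FALSE]     # drop = FALSE keeps a data frame
```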
str(d)
## 'data.frame': 5 obs. of 5 variables:
## $ NAME : Factor w/ 5 levels "GEUM","KANG",..: 5 1 4 2 3
## $ MATH : num 2 4 6 8 10
## $ ENGLISH : num 3 6 9 12 15
## $ KOREAN : num 4 8 12 16 20
## $ TOTAL_GRADE: Factor w/ 4 levels "A","B","C","D": 4 3 2 2 1
ex1) Convert a numeric column to character
(d$ENGLISH)
## [1] 3 6 9 12 15
d$ENGLISH <- as.character(d$ENGLISH)
(d$ENGLISH)
## [1] "3" "6" "9" "12" "15"
ex2) Convert it back to numeric
(d$ENGLISH)
## [1] "3" "6" "9" "12" "15"
d$ENGLISH <- as.numeric(d$ENGLISH)
(d$ENGLISH)
## [1] 3 6 9 12 15
= Commonly used type-conversion functions =
as.numeric : converts to numeric
as.character : converts to character
as.factor : converts to factor
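One pitfall with these functions is worth a short sketch: as.numeric() applied directly to a factor returns the internal level codes, not the numbers written in the labels, and strings that are not numbers become NA. The vectors below are made-up illustration values, not data from the original:

```r
f <- factor(c("10", "20", "30"))

codes  <- as.numeric(f)                 # level codes of the factor
values <- as.numeric(as.character(f))   # go through character to recover the labels

# Non-numeric text becomes NA (R also emits a warning, silenced here)
bad <- suppressWarnings(as.numeric(c("3", "six", "9")))
```

Converting through as.character() first is the usual way to turn a factor of number-like labels back into numbers.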
# setwd("C:/Users/revin/Documents")
# getwd()
# setwd("D:/Google Drive/5_CODE_Library/R_MARKDOWN/2_BasicGrammer_DataHandle")
# getwd()
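=> getwd() returns the current working directory; calling it before and after setwd() confirms both the old and the newly set location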
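Because the paths in the commented lines exist only on the author's machine, the following sketch uses tempdir() as a stand-in directory that exists in every R session. It shows the same getwd()/setwd() pattern and restores the original location afterwards:

```r
old_wd <- getwd()     # remember where we started
setwd(tempdir())      # switch to a directory that is guaranteed to exist
new_wd <- getwd()     # confirm the new location
setwd(old_wd)         # go back so later relative paths still work
```

Files such as "iris.csv" that are given without a directory are looked up relative to whatever getwd() reports at that moment.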
data_iris <- read.csv("iris.csv", stringsAsFactors = FALSE)
Use a few functions to inspect the basic properties of the loaded data.
data_iris : a data frame with 150 observations (rows) and 6 variables (columns)
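The row and column counts can also be checked directly with dim(), nrow(), ncol() and names(). Since iris.csv is not distributed with this text, the sketch below builds a stand-in called iris_demo (an illustrative name) from R's built-in iris dataset plus an index column X, assuming that this matches the layout of the file:

```r
# Stand-in for the CSV: built-in iris plus a running index column X
iris_demo <- data.frame(X = seq_len(nrow(iris)), iris)
iris_demo$Species <- as.character(iris_demo$Species)  # mimic stringsAsFactors = FALSE

shape     <- dim(iris_demo)     # rows, then columns
n_rows    <- nrow(iris_demo)
n_cols    <- ncol(iris_demo)
col_names <- names(iris_demo)
```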
str(data_iris)
## 'data.frame': 150 obs. of 6 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
summary(data_iris)
## X Sepal.Length Sepal.Width Petal.Length
## Min. : 1.00 Min. :4.300 Min. :2.000 Min. :1.000
## 1st Qu.: 38.25 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600
## Median : 75.50 Median :5.800 Median :3.000 Median :4.350
## Mean : 75.50 Mean :5.843 Mean :3.057 Mean :3.758
## 3rd Qu.:112.75 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100
## Max. :150.00 Max. :7.900 Max. :4.400 Max. :6.900
## Petal.Width Species
## Min. :0.100 Length:150
## 1st Qu.:0.300 Class :character
## Median :1.300 Mode :character
## Mean :1.199
## 3rd Qu.:1.800
## Max. :2.500
head(data_iris,10)
## X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1 5.1 3.5 1.4 0.2 setosa
## 2 2 4.9 3.0 1.4 0.2 setosa
## 3 3 4.7 3.2 1.3 0.2 setosa
## 4 4 4.6 3.1 1.5 0.2 setosa
## 5 5 5.0 3.6 1.4 0.2 setosa
## 6 6 5.4 3.9 1.7 0.4 setosa
## 7 7 4.6 3.4 1.4 0.3 setosa
## 8 8 5.0 3.4 1.5 0.2 setosa
## 9 9 4.4 2.9 1.4 0.2 setosa
## 10 10 4.9 3.1 1.5 0.1 setosa
tail(data_iris)
## X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 145 145 6.7 3.3 5.7 2.5 virginica
## 146 146 6.7 3.0 5.2 2.3 virginica
## 147 147 6.3 2.5 5.0 1.9 virginica
## 148 148 6.5 3.0 5.2 2.0 virginica
## 149 149 6.2 3.4 5.4 2.3 virginica
## 150 150 5.9 3.0 5.1 1.8 virginica
(subset_iris <- data_iris[1:30,])
## X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 1 5.1 3.5 1.4 0.2 setosa
## 2 2 4.9 3.0 1.4 0.2 setosa
## 3 3 4.7 3.2 1.3 0.2 setosa
## 4 4 4.6 3.1 1.5 0.2 setosa
## 5 5 5.0 3.6 1.4 0.2 setosa
## 6 6 5.4 3.9 1.7 0.4 setosa
## 7 7 4.6 3.4 1.4 0.3 setosa
## 8 8 5.0 3.4 1.5 0.2 setosa
## 9 9 4.4 2.9 1.4 0.2 setosa
## 10 10 4.9 3.1 1.5 0.1 setosa
## 11 11 5.4 3.7 1.5 0.2 setosa
## 12 12 4.8 3.4 1.6 0.2 setosa
## 13 13 4.8 3.0 1.4 0.1 setosa
## 14 14 4.3 3.0 1.1 0.1 setosa
## 15 15 5.8 4.0 1.2 0.2 setosa
## 16 16 5.7 4.4 1.5 0.4 setosa
## 17 17 5.4 3.9 1.3 0.4 setosa
## 18 18 5.1 3.5 1.4 0.3 setosa
## 19 19 5.7 3.8 1.7 0.3 setosa
## 20 20 5.1 3.8 1.5 0.3 setosa
## 21 21 5.4 3.4 1.7 0.2 setosa
## 22 22 5.1 3.7 1.5 0.4 setosa
## 23 23 4.6 3.6 1.0 0.2 setosa
## 24 24 5.1 3.3 1.7 0.5 setosa
## 25 25 4.8 3.4 1.9 0.2 setosa
## 26 26 5.0 3.0 1.6 0.2 setosa
## 27 27 5.0 3.4 1.6 0.4 setosa
## 28 28 5.2 3.5 1.5 0.2 setosa
## 29 29 5.2 3.4 1.4 0.2 setosa
## 30 30 4.7 3.2 1.6 0.2 setosa
write.csv(subset_iris,"subset.csv")
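By default write.csv() also writes the row names as an unnamed first column, which read.csv() then reads back as a column named X. The X column in data_iris looks like the result of such a round trip, although the source does not say how iris.csv was produced. A minimal sketch of both behaviours, using the built-in iris data and temporary files (subset_demo and the file variables are illustrative names):

```r
subset_demo <- head(iris, 30)

# Default: row names are written as an extra, unnamed first column
with_rn_file <- tempfile(fileext = ".csv")
write.csv(subset_demo, with_rn_file)
with_rn <- read.csv(with_rn_file, stringsAsFactors = FALSE)

# row.names = FALSE: only the data columns are written
no_rn_file <- tempfile(fileext = ".csv")
write.csv(subset_demo, no_rn_file, row.names = FALSE)
no_rn <- read.csv(no_rn_file, stringsAsFactors = FALSE)
```

Writing with row.names = FALSE avoids accumulating an extra index column each time a file is saved and reloaded.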