1.1 Morphological analysis
Loading data (1)
- Using the news sample data
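library(KoNLP)  # extractNoun() is provided by the KoNLP package
# Read the article one line at a time, then extract the nouns of each line
txt <- readLines("sample_news.txt")
noun <- lapply(txt, extractNoun)
noun
## [[1]]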
## [1] "정부" "내수" "회복"
## [4] "하반기" "경제운용" "방향"
## [7] "핵심" "키워드" "마땅"
## [10] "한" "해법이" "고심"
## [13] "소비" "투자" "등"
## [16] "내수" "회복" "속도"
## [19] "기대" "치" "충족"
## [22] "데" "세월" "호"
## [25] "참사" "경제" "심리"
## [28] "가운데" "내수" "부진"
## [31] "물줄기" "만" "수"
## [34] "것" "현실" "전문가"
## [37] "들" "추가경정예산(추경)" "편성"
## [40] "유일" "효과" "수"
## [43] "대책" "시기" "것"
## [46] "중론"
##
## [[2]]
## [1] "9" "관계" "부처" "정부" "이달" "말"
## [7] "발표" "한" "하반기" "경제운용" "방향" "소비"
## [13] "투자" "방안" "방침" "그간" "정부" "세월"
## [19] "호" "참사" "이후" "소비" "심리" "악화"
## [25] "됨" "재정" "당초" "계획" "집행" "공무원"
## [31] "복지" "포인트" "사용" "독려" "등" "대책"
## [37] "발표" "한" "바" "근본" "적" "대책"
## [43] "거리" "정부" "표현" "한" "대" "‘원"
## [49] "포인트’" "수준"
##
## [[3]]
## [1] "문제" "카드" "마땅" "치" "점" "서비스업"
## [7] "규제" "완화" "자영업자" "대책" "등" "정부"
## [13] "마련" "수" "내수" "진작" "책" "대책"
## [19] "급격" "한" "내수" "회복" "지적"
##
## [[4]]
## [1] "추가경정예산(추경)을" "편성" "것"
## [4] "주장" "일각" "실행"
## [7] "정부" "재정상황" "뿐"
## [10] "추경" "편성" "만"
## [13] "상황" "지" "확신"
## [16] "하기"
##
## [[5]]
## [1] "신민" "영" "LG" "경제연구원"
## [5] "경제" "연구" "부문장" "“내수"
## [9] "효과" "수" "방법" "사실상"
## [13] "추경" "추경" "편성" "시기"
## [17] "아니다”라며" "“세월호" "분위기" "브라"
## [21] "질" "월드컵" "등" "영향"
## [25] "소비" "심리" "가능성" "만큼"
## [29] "추경" "금리인하" "빠르다”고" "지적"
##
## [[6]]
## [1] "이" "세월" "호" "여파" "일상"
## [6] "적" "경제" "활동" "전개" "노력"
## [11] "우선" "적" "필요" "진단" "강명헌"
## [16] "단국대" "경제학과" "교수" "“일단" "세월"
## [21] "호" "분위기" "정책" "효과" "수"
## [26] "것”이라며" "“추경" "등" "상황" "추후"
## [31] "검토" "수" "것”이라고" "말"
##
## [[7]]
## [1] "현오석" "부총리" "겸" "기획" "재정"
## [6] "부" "장관" "기업" "들" "세월"
## [11] "호" "여파" "중단" "마케팅" "활동"
## [16] "속개" "투자" "고용" "계획" "집행"
## [21] "해" "줄" "것" "당부" "등"
## [26] "정부" "‘일상으로" "복귀’를"
##
## [[8]]
## [1] "근본" "적" "기업" "가계" "사이" "소득"
## [7] "차" "줄" "노력" "서민" "들" "가처분"
## [13] "소득" "수" "방안" "모색" "지적" "가계"
## [19] "소득" "주거" "사교육비" "지출" "노후" "불안"
## [25] "한" "한국" "사회" "구조" "개선" "노력"
## [31] "필요" "것"
##
## [[9]]
## [1] "조원희" "국민" "대" "경제학과"
## [5] "교수" "“정부가" "규제" "완화"
## [9] "적극" "나" "규제" "투자"
## [13] "활성화" "가설" "않다”며" "“박근혜정부가"
## [17] "출범" "당시" "강조" "한"
## [21] "경제" "민주화" "복지" "등"
## [25] "분배" "신경" "한다”고" "주장"
Converting the list to a vector
unlist(noun)
## [1] "정부" "내수" "회복"
## [4] "하반기" "경제운용" "방향"
## [7] "핵심" "키워드" "마땅"
## [10] "한" "해법이" "고심"
## [13] "소비" "투자" "등"
## [16] "내수" "회복" "속도"
## [19] "기대" "치" "충족"
## [22] "데" "세월" "호"
## [25] "참사" "경제" "심리"
## [28] "가운데" "내수" "부진"
## [31] "물줄기" "만" "수"
## [34] "것" "현실" "전문가"
## [37] "들" "추가경정예산(추경)" "편성"
## [40] "유일" "효과" "수"
## [43] "대책" "시기" "것"
## [46] "중론" "9" "관계"
## [49] "부처" "정부" "이달"
## [52] "말" "발표" "한"
## [55] "하반기" "경제운용" "방향"
## [58] "소비" "투자" "방안"
## [61] "방침" "그간" "정부"
## [64] "세월" "호" "참사"
## [67] "이후" "소비" "심리"
## [70] "악화" "됨" "재정"
## [73] "당초" "계획" "집행"
## [76] "공무원" "복지" "포인트"
## [79] "사용" "독려" "등"
## [82] "대책" "발표" "한"
## [85] "바" "근본" "적"
## [88] "대책" "거리" "정부"
## [91] "표현" "한" "대"
## [94] "‘원" "포인트’" "수준"
## [97] "문제" "카드" "마땅"
## [100] "치" "점" "서비스업"
## [103] "규제" "완화" "자영업자"
## [106] "대책" "등" "정부"
## [109] "마련" "수" "내수"
## [112] "진작" "책" "대책"
## [115] "급격" "한" "내수"
## [118] "회복" "지적" "추가경정예산(추경)을"
## [121] "편성" "것" "주장"
## [124] "일각" "실행" "정부"
## [127] "재정상황" "뿐" "추경"
## [130] "편성" "만" "상황"
## [133] "지" "확신" "하기"
## [136] "신민" "영" "LG"
## [139] "경제연구원" "경제" "연구"
## [142] "부문장" "“내수" "효과"
## [145] "수" "방법" "사실상"
## [148] "추경" "추경" "편성"
## [151] "시기" "아니다”라며" "“세월호"
## [154] "분위기" "브라" "질"
## [157] "월드컵" "등" "영향"
## [160] "소비" "심리" "가능성"
## [163] "만큼" "추경" "금리인하"
## [166] "빠르다”고" "지적" "이"
## [169] "세월" "호" "여파"
## [172] "일상" "적" "경제"
## [175] "활동" "전개" "노력"
## [178] "우선" "적" "필요"
## [181] "진단" "강명헌" "단국대"
## [184] "경제학과" "교수" "“일단"
## [187] "세월" "호" "분위기"
## [190] "정책" "효과" "수"
## [193] "것”이라며" "“추경" "등"
## [196] "상황" "추후" "검토"
## [199] "수" "것”이라고" "말"
## [202] "현오석" "부총리" "겸"
## [205] "기획" "재정" "부"
## [208] "장관" "기업" "들"
## [211] "세월" "호" "여파"
## [214] "중단" "마케팅" "활동"
## [217] "속개" "투자" "고용"
## [220] "계획" "집행" "해"
## [223] "줄" "것" "당부"
## [226] "등" "정부" "‘일상으로"
## [229] "복귀’를" "근본" "적"
## [232] "기업" "가계" "사이"
## [235] "소득" "차" "줄"
## [238] "노력" "서민" "들"
## [241] "가처분" "소득" "수"
## [244] "방안" "모색" "지적"
## [247] "가계" "소득" "주거"
## [250] "사교육비" "지출" "노후"
## [253] "불안" "한" "한국"
## [256] "사회" "구조" "개선"
## [259] "노력" "필요" "것"
## [262] "조원희" "국민" "대"
## [265] "경제학과" "교수" "“정부가"
## [268] "규제" "완화" "적극"
## [271] "나" "규제" "투자"
## [274] "활성화" "가설" "않다”며"
## [277] "“박근혜정부가" "출범" "당시"
## [280] "강조" "한" "경제"
## [283] "민주화" "복지" "등"
## [286] "분배" "신경" "한다”고"
## [289] "주장"
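The effect of unlist() is easier to see on a small input. The following is a minimal sketch with made-up English tokens (the name noun_list and its contents are assumptions, not the news sample): a list with one character vector per line is flattened into a single character vector, and empty elements contribute nothing.

```r
# Toy stand-in for the per-line noun lists (hypothetical data)
noun_list <- list(c("government", "demand"),
                  c("government", "budget", "plan"),
                  character(0))

# Flatten the list into one character vector
noun_vec <- unlist(noun_list)

length(noun_list)    # number of list elements, one per input line
length(noun_vec)     # total number of tokens across all lines
lengths(noun_list)   # number of tokens contributed by each line
```

After flattening, the information about which line a token came from is gone, so keep the list if per-line results are still needed later.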
Loading data (2)
# Read the CSV file and extract nouns from its CONTENTS column
txt <- read.csv("sample_voc.csv", stringsAsFactors = FALSE)
noun <- lapply(txt$CONTENTS, extractNoun)
getMorph("우리의 소원은 통일입니다")
## [1] "우리" "소원" "통일"
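The CSV route can be tried without the sample file. The sketch below is only an illustration under stated assumptions: the inline CSV text and the ID column are made up, and a plain whitespace split (split_words, a hypothetical helper) stands in for the morphological analyzer so that the example runs in base R alone.

```r
# Inline CSV standing in for sample_voc.csv (hypothetical rows)
csv_text <- "ID,CONTENTS\n1,delivery was late\n2,late refund request"
voc <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With stringsAsFactors = FALSE the text column stays character
is.character(voc$CONTENTS)

# Stand-in tokenizer: splits on single spaces, NOT a morphological analysis
split_words <- function(s) strsplit(s, " ", fixed = TRUE)[[1]]

# One character vector of tokens per row, same shape as the noun list
tokens <- lapply(voc$CONTENTS, split_words)
tokens
```

Swapping split_words for the real noun extractor gives the same list structure, one element per row of the CSV.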
1.2 Frequency analysis of unstructured text
1.2.1 Extracting high-frequency words
table()
x <- c("a", "a", "c", "a")
table(x)
## x
## a c
## 3 1
sort()
y <- c(5, 8, 3, 1, 2)
sort(y)
## [1] 1 2 3 5 8
sort(y, decreasing = TRUE)
## [1] 8 5 3 2 1
- The result of table() can also be sorted by frequency.
sort(table(x)); sort(table(x), decreasing = TRUE)
## x
## c a
## 1 3
## x
## a c
## 3 1
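1.2.2 Building the frequency table
txt <- readLines("sample_news.txt")
noun <- lapply(txt, getMorph, "noun")  # one vector of nouns per line
nounVec <- unlist(noun)                # flatten into a single vector
nounFreq <- table(nounVec)             # count how often each noun occurs
1.2.3 Showing counts together with the words
head(sort(nounFreq, decreasing = TRUE), 20)
## nounVec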
## 정부 추경 경제 내수 세월호 대책 소비 규제 노력 당장
## 9 7 6 6 6 5 4 3 3 3
## 상황 소득 심리 투자 회복 효과 가계 경정 경제학 계획
## 3 3 3 3 3 3 2 2 2 2
1.2.4 Showing only the high-frequency words
names(head(sort(nounFreq, decreasing = TRUE), 20))
## [1] "정부" "추경" "경제" "내수" "세월호" "대책" "소비"
## [8] "규제" "노력" "당장" "상황" "소득" "심리" "투자"
## [15] "회복" "효과" "가계" "경정" "경제학" "계획"
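Steps 1.2.2 to 1.2.4 can be wrapped in one small function. This is a sketch only: top_words is a hypothetical helper name, and the tokens are made-up English words rather than the nouns from the news sample.

```r
# Hypothetical helper: count tokens and keep the n most frequent ones
top_words <- function(tokens, n = 20) {
  freq <- sort(table(tokens), decreasing = TRUE)
  head(freq, n)
}

tokens <- c("government", "budget", "government", "demand",
            "budget", "government")
top <- top_words(tokens, n = 2)

names(top)       # the words, most frequent first
as.vector(top)   # the matching counts, in the same order
```

Keeping the result as a table until the last moment means the words and their counts cannot drift out of order; names() and as.vector() then pull out whichever half is needed.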
1.2.5 Drawing a bar chart
# Take the 20 most frequent nouns once and reuse the result
top20 <- head(sort(nounFreq, decreasing = TRUE), 20)
freq <- as.vector(top20)
word <- names(top20)
total <- sum(nounFreq)  # named total so that base::sum() is not masked
percent <- round((freq / total) * 100, digits = 2)
mainTxt <- "High-frequency words"
bp <- barplot(percent, main = mainTxt, las = 2, ylim = c(0, 5), ylab = "%", names.arg = word, col = "black")
# Print the raw count just above each bar
text(x = bp, y = percent + 0.3, labels = freq, col = "black", cex = 0.8)
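The chart above uses a fixed upper limit of 5 on the y-axis, which only works while no word exceeds 5% of all tokens. As a hedged variation, the sketch below derives the limit from the data. The counts and the total of 50 tokens are made-up values, not taken from the news sample.

```r
# Toy counts standing in for the top rows of the frequency table (hypothetical)
freq <- c(government = 9, budget = 7, economy = 6)
total <- 50   # assumed total number of tokens

# Share of each word among all tokens, in percent
percent <- round(freq / total * 100, digits = 2)

# Leave 20% headroom above the tallest bar for the count labels
bp <- barplot(percent, las = 2, ylab = "%",
              ylim = c(0, max(percent) * 1.2))

# barplot() returns the bar midpoints; pos = 3 places each label above its bar
text(x = bp, y = percent, labels = freq, pos = 3, cex = 0.8)
```

Because freq is a named vector, barplot() picks up the bar labels from the names, so names.arg is not needed here.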