E-commerce has grown very rapidly in recent years. The government recorded an e-commerce transaction value of Rp 108.54 trillion in the first quarter of 2022, up 23 percent from the same period a year earlier. This growth has been driven by many factors, including:
Growth in the number of internet users in Indonesia, which rose by 21 million in a short time. This increase was recorded during the pandemic alone, from 2020 to the first half of 2021. In its latest report, “Profil Internet Indonesia 2022”, APJII identifies nine main reasons people use the internet; one of the most common, cited by 79 percent, is making online transactions.
Improved features and infrastructure for online purchase payments. Sellers are also increasingly familiar with digital technology; 98 percent already use digital payment methods.
The pandemic has accelerated the adoption of digital technology in both industry and society. This is consistent with the statement by Direktorat Ekonomi Digital Ditjen Aptika (2021) that the development of telecommunications infrastructure in Indonesia had previously been driven only by technological disruption, whereas the COVID-19 pandemic accelerated digital transformation across all sectors.
Rising smartphone penetration. APJII's “Profil Internet Indonesia 2022” report finds that the majority of Indonesian internet users, 89.03 percent, access the internet from a phone or tablet, while only 0.73 percent access it from a computer or laptop.
Foreign investment inflows.
E-commerce in Indonesia is expected to keep growing. Direktorat Jenderal Aplikasi Informatika Kementerian Kominfo (2022) states that the value of e-commerce trade in Indonesia grew by 78 percent, the highest rate in the world. Katadata reports that Indonesia is expected to become a main growth contributor in Asia Pacific. According to RedSeer's analysis, Indonesia's e-commerce market is projected to reach USD 137.5 billion in 2025, a compound annual growth rate (CAGR) of 25.3% from USD 44.6 billion in 2020. RedSeer's projections for the intervening years are USD 67.4 billion in 2021, USD 86 billion in 2022, USD 104 billion in 2023, and USD 121 billion in 2024. At the 2025 estimate, Indonesia's e-commerce transaction value would be the largest in Asia Pacific, accounting for 59% of the regional total of USD 231 billion.
One advantage of shopping online over offline is the customer review feature. Several studies of the effect of customer reviews on purchase intention in marketplaces conclude that reviews do influence purchase intention. This is supported by an article on sosiakita.com stating that almost 89% of customers consult online reviews when deciding what to buy, and that 80% of customers change their mind about a purchase after seeing a bad review of the product or service.
Given the importance of customer reviews on online shopping platforms, every seller on an e-commerce platform should be able to use customer reviews as key data when shaping business strategy. However, the large number of customers who shop and leave reviews means sellers need to process review data well and quickly. Reviews also vary and grow in number over time. Sellers therefore need to identify the topics in customer reviews to find out what is most often discussed about the products sold in their stores. Unfortunately, not every seller on an e-commerce platform has the capability to process customer review data, so an analysis tool that can be put to use quickly is needed.
Regarding the products traded on e-commerce platforms, Katadata Insight Center (KIC), in collaboration with Kredivo, released a study titled Perilaku Konsumen E-Commerce Indonesia. The report shows that fashion products and accessories ranked second by share of transactions at 17.3%, behind phone credit and vouchers at 23.4% of all e-commerce transactions in the previous year. This is in line with MarkPlus research on e-commerce in Indonesia during the COVID-19 pandemic, which reports the following percentages for fashion or clothing, among the categories consumers buy most often, on each platform: Shopee 59 percent, Tokopedia 33 percent, Bukalapak 26 percent, Lazada 40 percent, JD.ID 31 percent, and Blibli 28 percent. One fashion product with large potential in 2022 is clothing, also known as ready-to-wear apparel. This product took off when many young Indonesians rushed to start clothing lines. The trend continues today and has expanded its sales channels from offline outlets, known as distros, to online sales, including through e-commerce. This motivates a project that not only analyzes customer review data but also focuses on one of the most widely traded products on e-commerce platforms: ready-to-wear clothing.
This project gives businesses, namely fashion stores of all sizes on e-commerce platforms, the opportunity to analyze their own customer review data. Review analysis is planned per product, so users can run as many analyses as they have products. By using and analyzing customer review data, businesses are expected to gain direct benefits, namely:
Understanding customer needs.
Evaluating and developing products.
With these benefits, businesses can then build strategies to increase customer trust and strengthen their credibility. Activities that focus on customer perceptions and opinions help online businesses win more customers and keep them loyal.
The target users of this project are owners of fashion stores on e-commerce platforms.
The benefit of this project is easier processing of customer review data linked to product type. Users also get insights and data visualizations that are easy to understand and use for analysis.
The implementation can be extended to companies in other industries (FMCG, retail, financial services, technology, media, and so on) that have customer review data for their products, whether from e-commerce platforms or from other platforms with a customer review feature.
This project aims to deliver:
A processing pipeline for customer review data.
A suitable machine learning model.
A visualization dashboard with data input and output features.
library(dplyr)  # provides %>%, glimpse(), select(), and filter() used below
clothing <- read.csv("Womens_Ecommerce_Clothing_Reviews/Womens_Ecommerce_Clothing.csv")
dim(clothing)
## [1] 23486 11
The dataset has 23,486 observations and 11 columns. Next, we inspect the columns and the data type of each.
glimpse(clothing)
## Rows: 23,486
## Columns: 11
## $ X <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, …
## $ Clothing.ID <int> 767, 1080, 1077, 1049, 847, 1080, 858, 858, 10…
## $ Age <int> 33, 34, 60, 50, 47, 49, 39, 39, 24, 34, 53, 39…
## $ Title <chr> "", "", "Some major design flaws", "My favorit…
## $ Review.Text <chr> "Absolutely wonderful - silky and sexy and com…
## $ Rating <int> 4, 5, 3, 5, 5, 2, 5, 4, 5, 5, 3, 5, 5, 5, 3, 4…
## $ Recommended.IND <int> 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1…
## $ Positive.Feedback.Count <int> 0, 4, 0, 0, 6, 4, 1, 4, 0, 0, 14, 2, 2, 0, 1, …
## $ Division.Name <chr> "Initmates", "General", "General", "General Pe…
## $ Department.Name <chr> "Intimate", "Dresses", "Dresses", "Bottoms", "…
## $ Class.Name <chr> "Intimates", "Dresses", "Dresses", "Pants", "B…
Excluding the row index X, the dataset contains 10 variables:
Clothing ID: identifier of the specific item being reviewed.
Age: age of the customer who wrote the review.
Title: title of the review.
Review Text: body of the review.
Rating: product score given by the customer, from 1 (worst) to 5 (best).
Recommended IND: whether the customer recommends the product, where 1 means recommended and 0 means not recommended.
Positive Feedback Count: number of other customers who found the review helpful.
Division Name: name of the division of the reviewed product.
Department Name: name of the department of the reviewed product.
Class Name: name of the class of the reviewed product.
The data comes from the Kaggle dataset “Women’s E-Commerce Clothing Reviews”, https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews. It is commercial data obtained by web scraping; it has been anonymized, and the company name in the review text has been replaced with “retailer”.
The data fits the business need because its contents closely resemble the customer review data typically found on e-commerce platforms. However, if this project is applied to Indonesian-language text, the data processing will need to be adapted.
Topic modelling in this project uses Latent Dirichlet Allocation (LDA). LDA is an unsupervised model for discovering the topics contained in a collection of documents. It is used here because it can detect the topics present in a document collection along with the proportion of each topic, both across the collection and within a particular document. LDA can also associate the words in the documents and the collection with particular topics. Some further background on topic modelling and LDA:
The basic idea of LDA is that each document is made up of several topics. LDA is a statistical model of a document collection that tries to represent this idea. It is a generative model: it assumes, through an imaginary random process, that each document is generated from a mixture of topics, and each topic is a distribution over words.
LDA can be used to summarize, cluster, link, and process very large datasets because it produces a weighted list of topics for each document. LDA relies on the bag-of-words assumption, which ignores the order in which words appear in a document. A text, whether a sentence or a document, is represented as a bag (multiset) of the words it contains, disregarding word order and grammar but retaining how many times each word occurs.
LDA clusters by looking at word counts in the bag of words, given a chosen number of clusters (topics) and a number of iterations. It first assigns each word to a topic in a semi-random fashion, then at each iteration recomputes the probability of topics within each document and the probability of words within each topic.
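The bag-of-words assumption can be illustrated with a minimal base R sketch. The two sentences are made up for illustration and are not taken from the dataset; they contain the same words in a different order, so their bags are the same.

```r
# Count word occurrences while ignoring word order (bag of words).
bag_of_words <- function(text) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  table(tokens)
}

# Two made-up sentences with the same words in a different order
bag_a <- bag_of_words("the dress is too small too tight")
bag_b <- bag_of_words("too tight too small is the dress")

print(bag_a)

# Same words with the same counts, even though the order differs
all(names(bag_a) == names(bag_b)) && all(bag_a == bag_b)
```

Because only the counts survive, any model built on this representation, LDA included, cannot distinguish the two sentences.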
LDA outputs:
An estimate of how much each topic contributes to each document.
An estimate of how much each word contributes to each topic.
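These two outputs can be pictured as matrices. The sketch below uses hand-written toy numbers, not values from a fitted model: theta has one row per document, phi has one row per topic, each row is a probability distribution, and their product gives the word distribution the model implies for each document.

```r
# Toy theta: P(topic | document), 2 documents x 2 topics (made-up numbers)
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc_1", "doc_2"), c("t_1", "t_2")))

# Toy phi: P(word | topic), 2 topics x 3 words (made-up numbers)
phi <- matrix(c(0.7, 0.2, 0.1,
                0.1, 0.3, 0.6),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("t_1", "t_2"), c("size", "fit", "fabric")))

rowSums(theta)  # each document's topic proportions sum to 1
rowSums(phi)    # each topic's word probabilities sum to 1

# Implied word distribution per document: P(word | document)
doc_word <- theta %*% phi
doc_word

# Dominant topic for each document
dominant <- colnames(theta)[apply(theta, 1, which.max)]
dominant
```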
Input for adding new data. Dashboard users, who are business owners on e-commerce platforms, are expected to be able to upload their own customer review data. An important consideration here is an element that clearly explains how to use the feature, so that users understand the required data structure and the complete input steps, which helps avoid errors in the output.
Trend topic analysis: a feature for viewing the highest-probability topics and the words that represent them. Labels, layout, and colors will be made as attractive and clear as possible so that the dashboard is intuitive even for non-technical users.
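One way to catch input errors early is to validate an uploaded table before it reaches the model. The following is a minimal sketch, assuming the dashboard expects the three columns used in this project (Rating, Class.Name, Review.Text); the function name validate_reviews and the specific checks are hypothetical.

```r
# Hypothetical helper: check that uploaded review data has the expected structure.
validate_reviews <- function(df) {
  required <- c("Rating", "Class.Name", "Review.Text")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!all(df$Rating %in% 1:5)) {
    stop("Rating must be a whole number from 1 to 5")
  }
  if (!is.character(df$Review.Text)) {
    stop("Review.Text must be text")
  }
  invisible(TRUE)
}

# Made-up example input with the expected structure
ok <- data.frame(Rating = c(1, 5),
                 Class.Name = c("Dresses", "Pants"),
                 Review.Text = c("too small", "love it"),
                 stringsAsFactors = FALSE)
validate_reviews(ok)

# The same input without Class.Name is rejected with a readable message
bad <- ok[, c("Rating", "Review.Text")]
tryCatch(validate_reviews(bad), error = function(e) conditionMessage(e))
```

Returning a readable message rather than failing inside the model step is what lets the dashboard show users which part of their file needs fixing.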
This project uses:
Text mining, a data analysis method focused on finding information and patterns in unstructured data, here text data used as the predictor variable.
Unsupervised learning, which has no target variable.
clothing_1 <- clothing %>% select(Rating, Class.Name, Review.Text)
head(clothing_1)
## Rating Class.Name
## 1 4 Intimates
## 2 5 Dresses
## 3 3 Dresses
## 4 5 Pants
## 5 5 Blouses
## 6 2 Dresses
## Review.Text
## 1 Absolutely wonderful - silky and sexy and comfortable
## 2 Love this dress! it's sooo pretty. i happened to find it in a store, and i'm glad i did bc i never would have ordered it online bc it's petite. i bought a petite and am 5'8". i love the length on me- hits just a little below the knee. would definitely be a true midi on someone who is truly petite.
## 3 I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to be outrageously small. so small in fact that i could not zip it up! i reordered it in petite medium, which was just ok. overall, the top half was comfortable and fit nicely, but the bottom half had a very tight under layer and several somewhat cheap (net) over layers. imo, a major design flaw was the net over layer sewn directly into the zipper - it c
## 4 I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great compliments!
## 5 This shirt is very flattering to all due to the adjustable front tie. it is the perfect length to wear with leggings and it is sleeveless so it pairs well with any cardigan. love this shirt!!!
## 6 I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.
Of the available columns, Rating, Class.Name, and Review.Text are used. These three variables are sufficient for the processing needed, and they are commonly accessible on e-commerce platforms, which matters if the results of this project are later used by other businesses.
# Number of unique values in Rating and Class.Name
nrow(unique(as.data.frame(clothing_1$Rating)))
## [1] 5
nrow(unique(as.data.frame(clothing_1$Class.Name)))
## [1] 21
Next, check for missing values. Note that is.na() does not count empty strings, which is how read.csv() represents blank text fields by default.
# Number of missing values
sum(is.na(clothing_1))
## [1] 0
The analysis in this project uses text from low-rated reviews, that is, ratings of 1 and 2, for the product category with the most comments (assumed to be the main product), Dresses.
clothing_rating_low <- clothing_1 %>%
  filter(Rating == 1 | Rating == 2) %>%
  filter(Class.Name == "Dresses")
# Number of observations
nrow(clothing_rating_low)
## [1] 689
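The choice of Dresses can be checked by counting reviews per class before filtering. This is a sketch on a made-up data frame with the same column names as clothing_1; the rows are invented, and on the real data the same steps would be applied to clothing_1 itself.

```r
# Made-up stand-in for clothing_1 (same column names, invented rows)
reviews <- data.frame(
  Rating = c(1, 2, 2, 5, 1, 4, 2),
  Class.Name = c("Dresses", "Dresses", "Knits", "Dresses",
                 "Pants", "Knits", "Dresses"),
  stringsAsFactors = FALSE
)

# Count reviews per class, largest first
counts <- sort(table(reviews$Class.Name), decreasing = TRUE)
counts
top_class <- names(counts)[1]

# Keep only low ratings (1 and 2) for the most-reviewed class
low <- reviews[reviews$Rating %in% c(1, 2) & reviews$Class.Name == top_class, ]
nrow(low)
```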
After obtaining the dataset of low-rated reviews, only the text column is kept and converted into a corpus.
library(tm)  # provides Corpus(), VectorSource(), tm_map(), and DocumentTermMatrix()
# Take text data only
clothing_rating_low_text <- clothing_rating_low[, 3]
# Create a corpus
clothing_rating_low_text_corpus <- Corpus(VectorSource(clothing_rating_low_text))
head(clothing_rating_low_text_corpus$content, 3)
## [1] "I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress."
## [2] "First of all, this is not pullover styling. there is a side zipper. i wouldn't have purchased it if i knew there was a side zipper because i have a large bust and side zippers are next to impossible for me.\n\nsecond of all, the tulle feels and looks cheap and the slip has an awkward tight shape underneath.\n\nnot at all what is looks like or is described as. sadly will be returning, but i'm sure i will find something to exchange it for!"
## [3] "The design/shape of the dress are quite flattering, flirty and feminine. but.... there is no way that the dress i received is new. the color is a faded washed out red and there are black stains all over the belt area. there is no tag... the fabric looks droopy and laundered and is not crisp, stiff or new. i am very disappointed by the quality of the item that i received. undoubtedly this one is going back.\n\ndear retailer - please make sure that you do not send pre-owend clothing articles to"
The data cannot be used directly for modelling; it must first be prepared. The preparation steps are as follows:
Case-folding. Text processing in R is case sensitive, while the raw data contains a mix of lowercase and uppercase letters. To make every word uniform and avoid errors in processing and analysis, all uppercase letters are converted to lowercase. Once every letter is in the same case, words can be compared reliably in later text processing. The conversion is done with the tolower function.
Remove punctuation. Punctuation is not analyzed in topic modelling, where words are the main unit of analysis. Punctuation can also introduce errors into the code. All characters not used in text processing are therefore removed. The punctuation marks to remove are ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~. This is done with the removePunctuation function.
Remove numbers. Like punctuation, digits are not analyzed in topic modelling, so all numbers in the data are removed. This is done with the removeNumbers function.
Stemming & lemmatization. Inflected words are harder to interpret and are treated as different units from their base form. This step maps the various forms of a word to a single representation. Stemming finds the stem of each word and can be done with the stemDocument function. Stemming only truncates words; for more accurate results, lemmatization looks up the actual dictionary form of a word according to the rules of the language (for example, receiving -> receive). The trade-off is that lemmatization is computationally slower than stemming. See the references for how to use lemmatization. Applying stemming to the first 6 documents showed that some words became non-standard, because stemming only truncates a word without mapping it to its proper base form. This step therefore uses lemmatization to obtain more suitable terms.
Remove stopwords. Stopwords are words ignored in topic modelling and are usually kept in stop lists. A stop list contains common words that serve a grammatical function but carry little meaning. The main goal of removing stopwords is to reduce the number of words in a document, which improves speed and performance. Stopwords are typically chosen for their high frequency of occurrence. The English stopword list is used here.
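The effect of these steps on a single string can be traced with base R equivalents. This is a minimal sketch, not the tm pipeline itself: the sentence and the three-word stop list are made up for illustration, and lemmatization is left out because base R has no built-in lemmatizer.

```r
# Made-up example sentence
text <- "I LOVED the Dress, but 2 zippers broke!"

step_lower <- tolower(text)                      # case-folding
step_punct <- gsub("[[:punct:]]", "", step_lower) # remove punctuation
step_digit <- gsub("[[:digit:]]", "", step_punct) # remove numbers

# Split on whitespace; runs of spaces left by removed digits collapse here
tokens <- strsplit(step_digit, "\\s+")[[1]]

# Tiny illustrative stop list (an assumption, far shorter than a real one)
stop_list <- c("i", "the", "but")
tokens <- tokens[!tokens %in% stop_list]

tokens
```

Printing step_lower, step_punct, and step_digit in turn shows what each stage removes before the stopword filter runs.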
# Preprocessing data
library(textstem)  # provides lemmatize_strings()
clothing_rating_low_text_corpus <- clothing_rating_low_text_corpus %>%
  tm_map(content_transformer(tolower)) %>%            # Case-folding
  tm_map(content_transformer(removePunctuation)) %>%  # Remove punctuation
  tm_map(content_transformer(removeNumbers)) %>%      # Remove numbers
  tm_map(content_transformer(lemmatize_strings)) %>%  # Lemmatize
  tm_map(removeWords, stopwords("english"))           # Remove stopwords
# Convert the corpus to a document-term matrix
clothing_rating_low_text_corpus_dtm <- clothing_rating_low_text_corpus %>% DocumentTermMatrix()
inspect(clothing_rating_low_text_corpus_dtm)
## <<DocumentTermMatrix (documents: 689, terms: 2355)>>
## Non-/sparse entries: 17973/1604622
## Sparsity : 99%
## Maximal term length: 18
## Weighting : term frequency (tf)
## Sample :
## Terms
## Docs dress fabric fit good just like look love much size
## 131 2 0 1 0 0 0 1 1 1 4
## 145 1 3 0 0 0 3 1 1 1 0
## 153 0 0 0 0 0 3 3 0 1 0
## 155 3 2 1 0 1 0 1 0 1 1
## 51 3 1 1 2 0 0 0 0 0 5
## 561 3 0 0 1 0 2 0 0 0 0
## 586 0 0 1 2 2 1 2 0 1 0
## 662 0 1 0 1 1 1 0 0 0 0
## 681 4 1 0 2 0 0 1 0 4 2
## 75 3 3 0 0 3 1 0 0 0 0
The LDA model is built with k = 5 topics. This value is chosen subjectively for now; the optimal k is sought after the model has been built.
The model is fitted by Gibbs sampling with 1000 sampling iterations and 500 burn-in iterations. Burn-in means that only samples from iteration 500 onward are used, because the early iterations are not yet stable and may not reflect the actual data distribution.
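The idea of burn-in can be shown on a toy Markov chain. This sketch is not LDA's Gibbs sampler: it is a simple autoregressive chain, chosen as an assumption for illustration, that starts from a deliberately poor value and drifts toward its stable region, which is why the early samples are discarded.

```r
set.seed(123)
iterations <- 1000
burnin <- 500

# Toy chain whose long-run mean is 0, started far away at 50
chain <- numeric(iterations)
chain[1] <- 50
for (i in 2:iterations) {
  chain[i] <- 0.9 * chain[i - 1] + rnorm(1)
}

# Discard the burn-in iterations and keep the rest
kept <- chain[(burnin + 1):iterations]

length(kept)
mean(chain)  # pulled toward the poor starting value
mean(kept)   # based only on post-burn-in samples
```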
library(textmineR)  # provides FitLdaModel() and GetTopTerms()
rating_low_dtm_lda <- Matrix::Matrix(as.matrix(clothing_rating_low_text_corpus_dtm), sparse = TRUE)
set.seed(123)
rating_low_lda <- FitLdaModel(rating_low_dtm_lda,
                              k = 5,
                              iterations = 1000,
                              burnin = 500,
                              calc_coherence = TRUE
)
glimpse(rating_low_lda)
## List of 7
## $ phi : num [1:5, 1:2355] 2.44e-04 2.03e-05 1.41e-05 1.43e-05 1.23e-05 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
## .. ..$ : chr [1:2355] "alteration" "away" "brand" "color" ...
## $ theta : num [1:689, 1:5] 0.4582 0.2928 0.0272 0.1945 0.0328 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:689] "1" "2" "3" "4" ...
## .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
## $ gamma : num [1:5, 1:2355] 0.8354 0.0427 0.0409 0.041 0.0401 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
## .. ..$ : chr [1:2355] "alteration" "away" "brand" "color" ...
## $ data :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## .. ..@ i : int [1:17973] 0 638 0 3 93 151 242 286 514 548 ...
## .. ..@ p : int [1:2356] 0 2 13 26 185 750 751 760 775 793 ...
## .. ..@ Dim : int [1:2] 689 2355
## .. ..@ Dimnames:List of 2
## .. ..@ x : num [1:17973] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..@ factors : list()
## $ alpha : Named num [1:5] 0.1 0.1 0.1 0.1 0.1
## ..- attr(*, "names")= chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
## $ beta : Named num [1:2355] 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
## ..- attr(*, "names")= chr [1:2355] "alteration" "away" "brand" "color" ...
## $ coherence: Named num [1:5] 0.0749 0.0477 0.042 0.0347 0.032
## ..- attr(*, "names")= chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
## - attr(*, "class")= chr "lda_topic_model"
The LDA model returns several attributes, including:
phi: the per-topic word probabilities, P(word | topic).
theta: the per-document topic probabilities, P(topic | document).
coherence: the coherence score of each topic.
Topic coherence scores the set of top words a topic model produces for a topic by how coherent they are, that is, how easily a human can interpret them. It measures the degree of semantic similarity between the words in the topic. This helps distinguish topics that are semantically interpretable from topics that are merely statistically related. Topic coherence is used here to evaluate the topic model: the higher the coherence score of the topics, the better the model.
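One probabilistic way to score coherence compares how often a top word appears in documents that contain a higher-ranked top word with how often it appears overall. The function below is a simplified sketch of that idea on a made-up document-term matrix; it is an assumption for illustration, and the exact computation used by textmineR may differ.

```r
# Simplified probabilistic coherence: for each ordered pair of top terms (a, b),
# average P(b | a) - P(b), using document presence rather than raw counts.
prob_coherence <- function(dtm, top_terms) {
  present <- dtm[, top_terms, drop = FALSE] > 0
  diffs <- c()
  for (i in seq_len(length(top_terms) - 1)) {
    for (j in (i + 1):length(top_terms)) {
      p_b <- mean(present[, j])
      p_b_given_a <- sum(present[, i] & present[, j]) / sum(present[, i])
      diffs <- c(diffs, p_b_given_a - p_b)
    }
  }
  mean(diffs)
}

# Made-up document-term matrix: 4 documents x 3 terms
toy_dtm <- matrix(c(1, 1, 0,
                    1, 1, 0,
                    1, 0, 1,
                    0, 0, 1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("doc_", 1:4),
                                  c("dress", "fit", "zipper")))

coh_related <- prob_coherence(toy_dtm, c("dress", "fit"))
coh_unrelated <- prob_coherence(toy_dtm, c("fit", "zipper"))
coh_related
coh_unrelated
```

In this toy matrix "dress" and "fit" tend to occur together while "fit" and "zipper" never do, so the first pair scores higher than the second.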
We look for the optimal number of topics (k) based on the mean coherence score across several candidate values, from k = 5 to k = 30 in steps of 5. To speed up computation, 100 sampling iterations and 50 burn-in iterations are used.
# rating_low_dtm_lda <- Matrix::Matrix(as.matrix(clothing_rating_low_text_corpus_dtm), sparse = T)
#
# k_list <- seq(10, 30, by = 10)
#
# model_list <- TmParallelApply(X = k_list, FUN = function(k){
#
# m <- FitLdaModel(dtm = rating_low_dtm_lda,
# k = k,
# iterations = 10,
# burnin = 5,
# calc_coherence = TRUE)
#
# m <- mean(m$coherence)
#
# return(m)
# },
# cpus = 4
# )
# To get the top terms for each topic, we can use the GetTopTerms function.
rating_low_lda_word_topic <- GetTopTerms(rating_low_lda$phi, 50) %>%
  as.data.frame()
# %>% set_names(paste("Topic", 1:5))
rating_low_lda_word_topic
## t_1 t_2 t_3 t_4 t_5
## 1 dress dress look dress dress
## 2 size get dress fabric fit
## 3 fit one like look fabric
## 4 order wear just like waist
## 5 small try make color much
## 6 large zipper back much just
## 7 love see good cheap top
## 8 try back fabric material good
## 9 petite can much picture really
## 10 just retailer love feel love
## 11 wear wash think quality hip
## 12 big time model make make
## 13 return button material price beautiful
## 14 think first shape see small
## 15 like take really back look
## 16 get buy feel love size
## 17 tight disappoint flatter disappoint way
## 18 way side heavy good color
## 19 arm even sack receive want
## 20 run review cut online work
## 21 much cute cute photo line
## 22 work great side slip cut
## 23 huge come order return bust
## 24 color two also expect chest
## 25 retailer sew body nice can
## 26 even find wear thin around
## 27 look fall right really design
## 28 good bad try also also
## 29 one seam great can high
## 30 bust will soft arrive skirt
## 31 usually sale return blue wear
## 32 purchase look tall retailer short
## 33 store put may just bottom
## 34 review hand didnt line material
## 35 medium sure say show bite
## 36 however quality nice fit like
## 37 style didnt super isnt flatter
## 38 perfect year color review unflattering
## 39 short wouldnt unfortunately worth run
## 40 will pull arm order area
## 41 hole now print come unfortunately
## 42 pretty although maybe design think
## 43 couldnt keep send style detail
## 44 really wed someone great long
## 45 model immediately huge white shoulder
## 46 buy spin-dry curvy high didnt
## 47 didnt every pretty want wasnt
## 48 beautiful either big unflattering strange
## 49 person part want say tight
## 50 cant completely know excite reviewer
library(tidyr); library(tibble)         # pivot_longer(), rownames_to_column()
library(ggplot2); library(ggwordcloud)  # ggplot(), geom_text_wordcloud()
z <- rating_low_lda_word_topic %>%
  rownames_to_column("id") %>%
  mutate(id = as.numeric(id)) %>%
  pivot_longer(-id, names_to = "topic", values_to = "term") %>%
  ggplot(aes(label = term, size = rev(id), alpha = rev(id))) +
  geom_text_wordcloud(seed = 123) +
  facet_wrap(~topic, scales = "free") +
  scale_alpha_continuous(range = c(0.4, 1)) +
  theme_minimal() +
  theme(strip.background = element_rect(fill = "blue"),
        strip.text.x = element_text(colour = "white"))
z
rating_low_lda_word_topic %>%
  select(t_1) %>%
  rownames_to_column("id") %>%
  mutate(id = as.numeric(id)) %>%  # numeric rank so size and alpha scale continuously
  pivot_longer(-id, names_to = "topic", values_to = "term") %>%
  ggplot(aes(label = term, size = rev(id), alpha = rev(id))) +
  geom_text_wordcloud(seed = 123) +
  theme_minimal()
A high phi value for a term means that the term has a high probability of being generated by that topic, which indicates a strong association between the term and the topic. Topics are interpreted not only from their most frequent words but also from the context of the reviews, using the top 10 reviews for each topic ranked by their theta value.
# Store theta per topic
rating_low_theta <- rating_low_lda$theta %>%
  as.data.frame() %>%
  # set_names(paste("Topic", 1:5)) %>%
  rownames_to_column("document")
Next, the theta values are combined with the original data to recover the full review text.
rating_low_theta_review <- rating_low_theta %>% cbind(clothing_rating_low, deparse.level = 0)
head(rating_low_theta_review, 3)
## document t_1 t_2 t_3 t_4 t_5 Rating
## 1 1 0.45822785 0.002531646 0.07848101 0.002531646 0.458227848 2
## 2 2 0.29275362 0.350724638 0.11884058 0.234782609 0.002898551 2
## 3 3 0.02716049 0.397530864 0.05185185 0.446913580 0.076543210 2
## Class.Name
## 1 Dresses
## 2 Dresses
## 3 Dresses
## Review.Text
## 1 I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.
## 2 First of all, this is not pullover styling. there is a side zipper. i wouldn't have purchased it if i knew there was a side zipper because i have a large bust and side zippers are next to impossible for me.\n\nsecond of all, the tulle feels and looks cheap and the slip has an awkward tight shape underneath.\n\nnot at all what is looks like or is described as. sadly will be returning, but i'm sure i will find something to exchange it for!
## 3 The design/shape of the dress are quite flattering, flirty and feminine. but.... there is no way that the dress i received is new. the color is a faded washed out red and there are black stains all over the belt area. there is no tag... the fabric looks droopy and laundered and is not crisp, stiff or new. i am very disappointed by the quality of the item that i received. undoubtedly this one is going back.\n\ndear retailer - please make sure that you do not send pre-owend clothing articles to
# Top 50 terms for topic 1
rating_low_lda_word_topic[, 1]
## [1] "dress" "size" "fit" "order" "small" "large"
## [7] "love" "try" "petite" "just" "wear" "big"
## [13] "return" "think" "like" "get" "tight" "way"
## [19] "arm" "run" "much" "work" "huge" "color"
## [25] "retailer" "even" "look" "good" "one" "bust"
## [31] "usually" "purchase" "store" "review" "medium" "however"
## [37] "style" "perfect" "short" "will" "hole" "pretty"
## [43] "couldnt" "really" "model" "buy" "didnt" "beautiful"
## [49] "person" "cant"
# # Review
# rating_low_theta_review %>%
# arrange(desc(`Topic 1`)) %>%
# select(`Topic 1`, Class.Name, Review.Text) %>%
# head(10)
Review Highlight
After reading every review, i was certain i knew exactly what i was getting with this dress. wrong. i sized up from a small to a medium and the dress was still uncomfortably tight on my rib cage. the top is very exposing of the bust, and i’m small busted. i can’t stand wearing camis, so there no way i'd be able to wear this dress even if i sized up again to a large.
I purchased this and another eva franco dress during retailer’s recent 20% off sale. i was looking for dresses that were work appropriate, but that would also transition well to happy hour or date night. they both seemed to be just what i was looking for. i ordered a 4 regular and a 6 regular, as i am usually in between sizes. the 4 was definitely too small. the 6 fit, technically, but was very ill fitting. not only is the dress itself short, but it is very short-waisted.
This dress is huge. i am normally a l-xl regular in retailer brands. i love a loose, flowy dress. after trying one on in the store (just a random returned petite xl) i ordered the petite medium. and it was still huge and not flattering. so sad. i wanted to love this- loved the material and colors, but way off.
Don't buy this dress unless you are normally a medium or larger. order it one or two sizes smaller than your normal size. i ordered an xs and it’s more like a medium or large.
I love this dress but i will return it because the sleeves are too tight. i have never had this problem before. i ordered a petite 0 and petite 2. for reference i am 5’ 4” tall and 117 lbs. the length is perfect and i really like the style. if the sleeves had not been so tight i would have kept the petite 0. the petite 2 sleeves were no looser. sorry that this did not work for me.
The dress is beautiful on the model. in person it’s sheer, but i could have dealt with that if the fit had been great. i am 5'4" and the petite was too short and the regular too long. both are very boxy and seem to run big. even the size small in both petite and regular were too big on me and usually in cloth and stone size small is perfect. i really wanted to love this dress but ended up returning several sizes in my attempt to find the perfect size.
Interpretation: Sizing is not standard
# Top terms for Topic 2
rating_low_lda_word_topic[, 2]
## [1] "dress" "get" "one" "wear" "try"
## [6] "zipper" "see" "back" "can" "retailer"
## [11] "wash" "time" "button" "first" "take"
## [16] "buy" "disappoint" "side" "even" "review"
## [21] "cute" "great" "come" "two" "sew"
## [26] "find" "fall" "bad" "seam" "will"
## [31] "sale" "look" "put" "hand" "sure"
## [36] "quality" "didnt" "year" "wouldnt" "pull"
## [41] "now" "although" "keep" "wed" "immediately"
## [46] "spin-dry" "every" "either" "part" "completely"
# # Review
# rating_low_theta_review %>%
# arrange(desc(`Topic 2`)) %>%
# select(`Topic 2`, Class.Name, Review.Text) %>%
# head(10)
Review Highlight
As another reviewer pointed out, although the tag inside the dress says that you can machine wash in cold water, the color around the popsicles will bleed! this is extremely disappointing. i’m hoping it can either be fixed with dye remover, or that retailer will take this item back. would only dry clean this item.
After the second wash, the stitching in the armpits completely fell apart. after sewing it back up, wearing it, and washing it a third time, the stitching in the other armpit completely fell apart as well. not sure if this was just a fluke, but it was very disappointing. now that both armpits are sewn and all other stitches are intact, i must say that this is a great dress. i would have returned the dress to retailer, but i’ve already washed and worn it, so again…very disappointing.
Retailer has incredible service, and so they let me return this dress even after having worn it. however, this dress cannot be cleaned. i saw a stain on it after the first use, so i first tried hand washing it. however, the black part of the dress came off on the white part, making the stain worse. i then tried dry cleaning it (the care tag allows both) and that produced 3 or 4 new stains where the black rubbed off on the white. it’s a beautiful, striking dress (that would be great matern
I love maeve dresses and have always gotten a lot of wear out of them while still looking great. this dress does not meet those standards. on the 2nd wear of this dress the zipper completely broke. i had to take it to a dry cleaner to get replaced at a pricey cost for a pricey dress, i’m dissapointed.
According to the label, this dress can be hand washed or dry cleaned. i hand washed and line dried per the instructions on the label, and all the seams have completely shrunk and gathered. it looks like there’s a drawstring running through every seam. so disappointed in the quality and the labeling. awful! i gave it one extra star because it’s cute.
Unfortunately the dress is pulling apart after the first wash ( gentle hand wash cycle). also, there are no bra-straps holders, and my bra straps did show up a lot…
Love this dress! wore it last night for a musical event… came home and put it on the hand wash setting in the washer with cold water...just as the label states. pulled the dress out and guess what? the design is completely faded!!! it looks 5 years old :( so sad. i’m going to call retailer and ask for a refund (i live two hours away)
Interpretation: Material quality; the garment is damaged after washing
# Top terms for Topic 3
rating_low_lda_word_topic[, 3]
## [1] "look" "dress" "like" "just"
## [5] "make" "back" "good" "fabric"
## [9] "much" "love" "think" "model"
## [13] "material" "shape" "really" "feel"
## [17] "flatter" "heavy" "sack" "cut"
## [21] "cute" "side" "order" "also"
## [25] "body" "wear" "right" "try"
## [29] "great" "soft" "return" "tall"
## [33] "may" "didnt" "say" "nice"
## [37] "super" "color" "unfortunately" "arm"
## [41] "print" "maybe" "send" "someone"
## [45] "huge" "curvy" "pretty" "big"
## [49] "want" "know"
# # Review
# rating_low_theta_review %>%
# arrange(desc(`Topic 3`)) %>%
# select(`Topic 3`, Class.Name, Review.Text) %>%
# head(10)
Review Highlight
This looked fabulous on the model, of course, and i love sweater dresses. but even in an xs, i was drowning in fabric and looked ludicrous. concept is good but execution went terribly wrong somewhere. a definite return.
The pleats on the bib make this look like something from chloe sevigny’s wardrobe on the set of big love. and the shoulders are cut for an offensive lineman.
Unfortunately, this dress is shaped like a sack and has no shape to speak of. i have sent it back for a refund :( too big, too shapeless.
Too full of a dress for me. i tried a belt and it is then too chunky. if you are younger, tall and slim, this would look great
If you have any curves, avoid! it is not flattering.the striped side panels look odd with the flow of the dress.
I wanted to like this but it just didn't have much of a shape, not flattering at all. maybe if i was taller?
Frumpy - looked like a nightgown on me; maybe better on someone younger …
There were just too many pleats to this thing. it was not flattering. it looks fantastic on this model but not on me. oh well.
Didn’t like the dress. look pregnant in it. this dress is going back.
This dress looks lovely on the model, but it looks just awful on me! i am 5'4" and curvy. the shoulders are cut w/ a weird shape. instead of a cap sleeve, it is like a pointy cap on the side and stuck out on my shoulders. it looks like i am ready to get beamed into space.material is nice and the sequins are pretty, but the cut of the dress is unflattering.
Interpretation: The shape is unflattering when worn and looks different from how it appears on the model (display)
# Top terms for Topic 4
rating_low_lda_word_topic[, 4]
## [1] "dress" "fabric" "look" "like" "color"
## [6] "much" "cheap" "material" "picture" "feel"
## [11] "quality" "make" "price" "see" "back"
## [16] "love" "disappoint" "good" "receive" "online"
## [21] "photo" "slip" "return" "expect" "nice"
## [26] "thin" "really" "also" "can" "arrive"
## [31] "blue" "retailer" "just" "line" "show"
## [36] "fit" "isnt" "review" "worth" "order"
## [41] "come" "design" "style" "great" "white"
## [46] "high" "want" "unflattering" "say" "excite"
# # Review
# rating_low_theta_review %>%
# arrange(desc(`Topic 4`)) %>%
# select(`Topic 4`, Class.Name, Review.Text) %>%
# head(10)
Review Highlight
The fabric of this looks and feels so cheap it’s hard to believe this is anna sui. i would say this is unwearable. there is zero stretch and it is made of itchy textured poly fabric that looks like novelty fabric at best. going back asap.
Looks much nicer in the photo. i expected a much higher quality fabric. this fabric truly felt cheap. i expected a nicer dress for the price. the fit was unflattering on me. sent it back
I returned this dress because the material is so flimsy and thin. i was expecting higher quality, especially for the price. i was disappointed. it is a pretty dress and fit great, but i felt like i was wearing tissue paper.
There is no way this is worth the price. i was deeply disappointed when it arrived. the material is thin and feels cheap. i love the design, and anna sui, but this is just so overpriced.
I was very surprised to see such dark blue sequence on this dress. it did not appear as it did on the photo. i was pretty disappointed by the color. the photo of the dress is much more impressive.
The dress is very pretty, but the sequins are dark blue! i imagined they would be silver/gold, given the photo, but the dress is essentially light pink and blue.
the dress, but the fabric makes it look cheap. for an expensive dress, i had expected better quality.
The dress is not the beautiful color in the picture…it is a very yellowish cream. the fabric is thick and does not hang pretty. very disappointed. had to return it.
I was disappointed with the style and quality of this dress. it poofs out funny in the waste and it just looks very unflattering. i’m sending it back.
returned the dress because the color was not as pictured in the online photo. i did not like the color.
Interpretation: Fabric quality does not match the price
# Top terms for Topic 5
rating_low_lda_word_topic[, 5]
## [1] "dress" "fit" "fabric" "waist"
## [5] "much" "just" "top" "good"
## [9] "really" "love" "hip" "make"
## [13] "beautiful" "small" "look" "size"
## [17] "way" "color" "want" "work"
## [21] "line" "cut" "bust" "chest"
## [25] "can" "around" "design" "also"
## [29] "high" "skirt" "wear" "short"
## [33] "bottom" "material" "bite" "like"
## [37] "flatter" "unflattering" "run" "area"
## [41] "unfortunately" "think" "detail" "long"
## [45] "shoulder" "didnt" "wasnt" "strange"
## [49] "tight" "reviewer"
# # Review
# rating_low_theta_review %>%
# arrange(desc(`Topic 5`)) %>%
# select(`Topic 5`, Class.Name, Review.Text) %>%
# head(10)
Review Highlight
This dress had an odd fit. very loose on top with pointy darts, very tight on bottom. fabric was stiff so it didn’t move well with me
The fabric and colors of this dress are beautiful but the fit is terrible. i had to go up a size to get a fit in the waist but it was then incredibly loose in the shoulders. what a pity. had to return.
I had high hopes for this dress, but unfortunately was disappointed and returned the dress. it flares out with too much material under the gathered waist. it's so odd.
I loved the design and material of the test, but unfortunately it just did not fit me right. it was tight in the hips and did not have the relaxed look i was hoping for also the pattern did not look good on me. unfortunately i had to return.
The fabric and its texture didn’t meet my expectations unfortunately. there were too much fray and cast-off of edge yarns of a fabric… however this will fit if you wear over the swimming wears.
My v-shape figure (broad shoulder, narrow hips, waist not well defined) looked completely square in this. not flattering at all. perhaps better for other figures.
This ran a bit small; i’m normally a 2, this was a 4 and ‘just’ fit. the pattern was very ‘digi,’ the length was shorter than i hoped (i’m 5’1”), and the fabric, while structured, was just cheap and acrylic-feeling. pass.
So disappointed! beautiful dress in the photographs but the cut was incredibly strange. loose and baggy through the top and mid-section but tight around the buttock and thighs.
This is a pretty dress, but the cut is quite boxy. i had thought that the dress would cut in a bit along the body and then "flare out" more around the waist down to the hem; however, the dress is baggy from the sleeves all the way down, which lends to a rather unflattering look.
I was thoroughly underwhelmed. it did not hang well, and the fit seemed off. the tie waist just sort of hung there. i could have sized down, but i didn’t even want to bother. it was pretty boring and flimsy for $138.
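Interpretation: Size/shape is not proportional
# Wrap the low-rating review text in a one-column tibble
# (tibble() replaces the deprecated data_frame())
df <- tibble(Text = clothing_rating_low_text)
head(df, n = 20)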
## # A tibble: 20 × 1
## Text
## <chr>
## 1 "I love tracy reese dresses, but this one is not for the very petite. i am j…
## 2 "First of all, this is not pullover styling. there is a side zipper. i would…
## 3 "The design/shape of the dress are quite flattering, flirty and feminine. bu…
## 4 "The colors are vivid and perfectly autumnal but the fit is a mess. it was o…
## 5 "I don't normally review my purchases, but i was so amazed at how poorly thi…
## 6 "I love byron lars dresses, and this design is on-point. the ruffle at the n…
## 7 "Three strikes and retailer is out for me! i am so disappointed. i really li…
## 8 "The fun colors drew me to this but it sure fit weird. the top was fine but …
## 9 ""
## 10 "I loved this dress when i saw it. however the fit was way off. i am 5'7\" 1…
## 11 "I don't typically write bad reviews, but this dress is so bad and i want to…
## 12 "Don't buy this dress unless you are normally a medium or larger. order it o…
## 13 "The overall styling was great, and the dress is super-cute, if a little thi…
## 14 "I love retailer and fell in love as soon as i saw this dress online. being …
## 15 "I am floored by the amount of positive reviews on this dress! when i receiv…
## 16 "This didn't work for me. im normally a m (8/10). got this in xs. that was t…
## 17 "I just received this in the mail today. first of all there was no slip incl…
## 18 "Ditto what the first reviewer said, unfortunately. i was so looking forward…
## 19 "I loved the photo of this dress. upon examination of the dress (and trying …
## 20 "The dress arrived with a few snags and the fabric already pilling. the fabr…
# The unnest_tokens() function from the tidytext package picks out the individual words and places them as rows.
clothing_rating_low_words <- df %>%
  unnest_tokens(output = word, input = Text)
# An anti_join() is used to remove stop words from clothing_rating_low_words.
clothing_rating_low_words <- clothing_rating_low_words %>%
  anti_join(stop_words)
## Joining, by = "word"
# The count() function with the %>% pipe operator from the dplyr package is used to obtain counts of the words.
clothing_rating_low_wordcounts <- clothing_rating_low_words %>%
  count(word, sort = TRUE)
head(clothing_rating_low_wordcounts)
## # A tibble: 6 × 2
## word n
## <chr> <int>
## 1 dress 1065
## 2 fabric 254
## 3 size 209
## 4 fit 208
## 5 love 152
## 6 material 139
The data now has a column for words and a second column for the word counts. A bar graph can be prepared with the ggplot2 function ggplot().
# ggplot2 plot: words that occur more than 70 times
clothing_rating_low_wordcounts %>%
  filter(n > 70) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Word \n", y = "\n Count ", title = "Frequent Words Clothing Review \n") +
  geom_text(aes(label = n), hjust = 1.2, colour = "white", fontface = "bold") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.x = element_text(face = "bold", colour = "darkblue", size = 12),
        axis.title.y = element_text(face = "bold", colour = "darkblue", size = 12))
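Experimenting with term frequency
# Keep reviews whose probability for topic 2 (t_2) exceeds 0.5.
# Note: despite the "t1" in the object names below, the filter is on t_2.
words_t1 <- rating_low_theta_review %>%
  arrange(desc(t_2)) %>%
  filter(t_2 > 0.5) %>%
  select(t_2, Review.Text)
# words_text <-
#   words_t1[,2]
# tibble() replaces the deprecated data_frame(); the second column is Review.Text
df_t1 <- tibble(Text = words_t1$Review.Text)
# head(df_t1, n = 20)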
# The unnest_tokens() function from the tidytext package picks out the individual words and places them as rows.
df_t1_words <- df_t1 %>%
  unnest_tokens(output = word, input = Text)
# An anti_join() is used to remove stop words from df_t1_words.
df_t1_words <- df_t1_words %>%
  anti_join(stop_words)
## Joining, by = "word"
# The count() function with the %>% pipe operator from the dplyr package is used to obtain counts of the words.
df_t1_wordcounts <- df_t1_words %>%
  count(word, sort = TRUE) %>%
  as.data.frame()
#df_t1_wordcounts
# Plot the 10 most frequent words
df_t1_wordcounts %>%
  top_n(10) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Word \n", y = "\n Count ", title = "Frequent Words Clothing Review \n") +
  geom_text(aes(label = n), hjust = 1.2, colour = "white", fontface = "bold") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.x = element_text(face = "bold", colour = "darkblue", size = 12),
        axis.title.y = element_text(face = "bold", colour = "darkblue", size = 12))
## Selecting by n
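# Lookup table of topic interpretations and solutions
topic <- c("t_1", "t_2", "t_3", "t_4", "t_5")
interpretasi <- c("Ukuran tidak standar",                                      # sizing is not standard
                  "Kualitas material, mengalami kerusakan setelah pencucian",  # material quality, damaged after washing
                  "Bentuk saat digunakan tidak bagus, berbeda dengan display", # unflattering when worn, differs from display
                  "Kualitas bahan tidak sesuai dengan harga",                  # fabric quality does not match the price
                  "Ukuran atau bentuk tidak proporsional")                     # size or shape is not proportional
solusi <- c("solusi_1", "solusi_2", "solusi_3", "solusi_4", "solusi_5")        # placeholder values
data_solusi <- as.data.frame(cbind(topic, interpretasi, solusi))
data_solusi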
## topic interpretasi solusi
## 1 t_1 Ukuran tidak standar solusi_1
## 2 t_2 Kualitas material, mengalami kerusakan setelah pencucian solusi_2
## 3 t_3 Bentuk saat digunakan tidak bagus, berbeda dengan display solusi_3
## 4 t_4 Kualitas bahan tidak sesuai dengan harga solusi_4
## 5 t_5 Ukuran atau bentuk tidak proporsional solusi_5
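A lookup table like this could be attached to each review by its dominant topic. The following is only a minimal sketch on toy data: `toy_theta`, `toy_lookup`, and `toy_labeled` are hypothetical names, the probabilities are made up, and it assumes dplyr (1.0.0 or later, for `slice_max()`) and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the per-review topic probabilities (made-up values)
toy_theta <- tibble(
  Review.Text = c("runs two sizes too big", "seams fell apart after washing"),
  t_1 = c(0.7, 0.2),
  t_2 = c(0.3, 0.8)
)

# Toy stand-in for the interpretation lookup table
toy_lookup <- tibble(
  topic = c("t_1", "t_2"),
  interpretasi = c("Ukuran tidak standar",
                   "Kualitas material, mengalami kerusakan setelah pencucian")
)

# Keep the highest-probability topic per review, then attach its interpretation
toy_labeled <- toy_theta %>%
  pivot_longer(cols = c(t_1, t_2), names_to = "topic", values_to = "prob") %>%
  group_by(Review.Text) %>%
  slice_max(prob, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  left_join(toy_lookup, by = "topic")

toy_labeled
```

Picking the single most probable topic is one possible design choice; a fixed probability threshold would instead leave mixed-topic reviews unlabeled.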
# Reshape the dataset into long format (one row per review-topic pair)
# and keep only pairs with topic probability above 0.5
df_wordcounts_longer <- rating_low_theta_review %>%
  select(t_1, t_2, t_3, t_4, t_5, Review.Text) %>%
  pivot_longer(cols = c(t_1, t_2, t_3, t_4, t_5), names_to = "topic") %>%
  filter(value > 0.5) %>%
  as.data.frame()
#df_wordcounts_longer
# Reviews dominated by topic 1
words_longer_t1 <- df_wordcounts_longer %>%
  filter(topic == "t_1")
# tibble() replaces the deprecated data_frame(); the first column is Review.Text
df_longer_t1 <- tibble(Text = words_longer_t1$Review.Text)
# head(df_t1, n = 20)
# The unnest_tokens() function from the tidytext package picks out the individual words and places them as rows.
df_longer_t1_words <- df_longer_t1 %>%
  mutate_all(as.character) %>%
  unnest_tokens(output = word, input = Text)
# An anti_join() is used to remove stop words from df_longer_t1_words.
df_longer_t1_words <- df_longer_t1_words %>%
  anti_join(stop_words)
## Joining, by = "word"
# The count() function with the %>% pipe operator from the dplyr package is used to obtain counts of the words.
df_longer_t1_words <- df_longer_t1_words %>%
  count(word, sort = TRUE) %>%
  as.data.frame()
#df_longer_t1_words
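The steps above count words for a single topic. As a rough sketch only, the same idea could be applied to every topic at once by tokenizing the long-format data directly and counting by topic and word. The data below is a made-up toy stand-in (`toy_theta_review` and `toy_topic_wordcounts` are hypothetical names), and the sketch assumes dplyr, tidyr, and tidytext are installed.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy stand-in for rating_low_theta_review (made-up text and probabilities)
toy_theta_review <- tibble(
  Review.Text = c("the dress runs small and tight",
                  "zipper broke after the first wash",
                  "size was huge, ordered a small"),
  t_1 = c(0.8, 0.1, 0.7),
  t_2 = c(0.2, 0.9, 0.3)
)

toy_topic_wordcounts <- toy_theta_review %>%
  # One row per review-topic pair; probabilities go to the default "value" column
  pivot_longer(cols = c(t_1, t_2), names_to = "topic") %>%
  # Keep only pairs where the topic clearly dominates
  filter(value > 0.5) %>%
  mutate(Review.Text = as.character(Review.Text)) %>%
  # One row per word; the topic column is carried along
  unnest_tokens(output = word, input = Review.Text) %>%
  # Drop stop words, naming the join column explicitly
  anti_join(stop_words, by = "word") %>%
  # Word frequency within each topic
  count(topic, word, sort = TRUE)

toy_topic_wordcounts
```

Counting by `topic` and `word` together avoids repeating the filter, tokenize, and count steps once per topic.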
https://www.tidytextmining.com/topicmodeling.html
https://towardsdatascience.com/beginners-guide-to-lda-topic-modelling-with-r-e57a5a8e7a25
https://rpubs.com/Argaadya/topic_lda
https://ladal.edu.au/topicmodels.html
https://www.kaggle.com/code/rtatman/nlp-in-r-topic-modelling
https://rdrr.io/cran/textmineR/man/FitLdaModel.html
https://content-analysis-with-r.com/6-topic_models.html
https://rstudio-pubs-static.s3.amazonaws.com/266565_171416f6c4be464fb11f7d8200c0b8f7.html
https://rpubs.com/salomo/detikcom-topic-modelling
https://cran.r-project.org/web/packages/tidytext/vignettes/tf_idf.html
https://dk81.github.io/dkmathstats_site/rtext-freq-words.html