1 . Background and Problem Understanding

1.1 Background

E-commerce has grown very rapidly in recent years. The government recorded an e-commerce transaction value of Rp 108.54 trillion in the first quarter of 2022, up 23 percent from the same period a year earlier. Many factors drive this growth, including:

Growth in Indonesian internet users, which rose by 21 million in a short time. This growth was recorded during the pandemic alone, from 2020 through the first half of 2021. In its latest report, “Profil Internet Indonesia 2022”, APJII lists nine main reasons people use the internet; one of the highest, at 79 percent, is making online transactions.
Better features and infrastructure for online purchase payments. Sellers are also increasingly comfortable with digital technology, and 98 percent already use digital payment methods.
The pandemic accelerated the adoption of digital technology in both industry and society. This matches the statement by Direktorat Ekonomi Digital Ditjen Aptika (2021) that the build-out of telecommunications infrastructure in Indonesia was previously driven only by technological disruption, whereas the COVID-19 pandemic turned out to accelerate digital transformation in every sector.
Rising smartphone penetration. APJII's “Profil Internet Indonesia 2022” report states that the majority of Indonesian internet users, 89.03 percent, access the internet from a mobile phone or tablet, while only 0.73 percent do so from a desktop computer or laptop.
Inflows of foreign investment.

Indonesian e-commerce is expected to keep growing. Direktorat Jenderal Aplikasi Informatika Kementerian Kominfo (2022) stated that the value of e-commerce trade in Indonesia grew by 78 percent, the highest rate in the world. Katadata likewise reports that Indonesia is expected to become a main contributor to growth in Asia Pacific. According to RedSeer's analysis, the Indonesian e-commerce market is projected to reach 137.5 billion USD in 2025, which corresponds to a compound annual growth rate (CAGR) of 25.3% from the 44.6 billion USD recorded in 2020. RedSeer's projections for the intervening years are 67.4 billion USD in 2021, 86 billion USD in 2022, 104 billion USD in 2023, and 121 billion USD in 2024. The 2025 estimate would also make Indonesia's e-commerce transaction value the largest in Asia Pacific, at 59% of the regional total of 231 billion USD.
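The growth figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers cited in the paragraph; the variable names are illustrative and not part of the original analysis.

```r
# Figures quoted above (RedSeer projection, in billion USD)
value_2020 <- 44.6
value_2025 <- 137.5
years <- 2025 - 2020

# CAGR = (end / start)^(1 / years) - 1
cagr <- (value_2025 / value_2020)^(1 / years) - 1
round(cagr * 100, 1)

# Indonesia's share of the quoted 231 billion USD regional total
share <- value_2025 / 231
round(share * 100, 1)
```

The computed CAGR agrees with the quoted 25.3%, and the share comes out at roughly 59.5%, in line with the 59% stated in the text.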

One advantage of shopping online over shopping offline is the customer review feature. Several studies of the effect of customer reviews on purchase intention in marketplaces conclude that reviews do influence purchase intention. An article published on sosiakita.com supports this, stating that almost 89% of customers consult online reviews when deciding what to buy, and that 80% of customers change their mind about a purchase after seeing bad reviews of the product or service.

Given how important customer reviews are on online shopping platforms, every business selling through e-commerce should treat review data as a key input to its strategy. However, the large number of customers who shop and leave reviews means businesses need to process that data well and promptly, and reviews also vary in content and grow in number over time. This calls for the ability to identify topics in customer reviews, in order to find what is most often discussed about the products sold in a seller's e-commerce account. Unfortunately, not every business selling on an e-commerce platform has the capability to process customer review data, so an analysis tool that can be put to use quickly is needed.

Regarding the products traded in e-commerce, Katadata Insight Center (KIC), in collaboration with Kredivo, released the study Perilaku Konsumen E-Commerce Indonesia. The report shows that fashion products and accessories ranked second by share of transaction count, at 17.3% of all e-commerce transactions in the previous year, behind mobile credit and vouchers at 23.4%. This is consistent with MarkPlus research on Indonesian e-commerce during the COVID-19 pandemic, which reports how often fashion or clothing appears among the categories consumers buy most on each platform: Shopee 59 percent, Tokopedia 33 percent, Bukalapak 26 percent, Lazada 40 percent, JD.ID 31 percent, and Blibli 28 percent. One fashion product with large potential in 2022 is clothing, also known as ready-to-wear apparel. The product first took off when many young Indonesians rushed to open clothing lines. The trend continues today and its sales channels have expanded: no longer only offline outlets known as distros, but also online sales, including through e-commerce. This is why a project is needed that not only digs into customer review data but also focuses on one of the most traded product types on e-commerce platforms, namely ready-to-wear clothing.

1.2 Business Impact of the Project

This project gives businesses, namely fashion stores of any size on e-commerce platforms, the opportunity to analyze their own customer review data. Reviews are planned to be analyzed per product, so users of the project can run the analysis for as many products as they have. By using and analyzing customer review data, businesses are expected to gain direct benefits:

Obtaining information about customer needs.
Evaluating and developing products.

With these benefits, businesses can then design strategies to increase customer trust and build the credibility of their business. Activities that focus on customer perception and opinion help online businesses win more customers and keep them loyal.

1.3 Target Users and Benefits

The target users of this project are owners of fashion stores on e-commerce platforms.

The benefit of this project is easier processing of customer review data linked to product type. Users also get insights and data visualizations that are easy to understand and use for analysis.

1.4 Similar Business Implementations

The implementation can be extended to companies in many industries (FMCG, retail, financial, technology, media, and so on) that hold customer review data for the products they market, whether through e-commerce platforms or through other platforms that also offer customer reviews.

1.5 Project Goals and Outputs

This project aims to deliver:

Processed customer review data.
A suitable machine learning model.
A visualization dashboard with data input and output features.

2 . Data Collection

2.1 Brief Data Overview

# Assumes dplyr, tidyr, tibble, tm, textstem, and textmineR are already attached with library()
clothing <- read.csv("Womens_Ecommerce_Clothing_Reviews/Womens_Ecommerce_Clothing.csv")

dim(clothing)

## [1] 23486    11

The dataset has 23,486 observations and 11 columns. Next, we look at the columns and the data type of each.

glimpse(clothing)

## Rows: 23,486
## Columns: 11
## $ X                       <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, …
## $ Clothing.ID             <int> 767, 1080, 1077, 1049, 847, 1080, 858, 858, 10…
## $ Age                     <int> 33, 34, 60, 50, 47, 49, 39, 39, 24, 34, 53, 39…
## $ Title                   <chr> "", "", "Some major design flaws", "My favorit…
## $ Review.Text             <chr> "Absolutely wonderful - silky and sexy and com…
## $ Rating                  <int> 4, 5, 3, 5, 5, 2, 5, 4, 5, 5, 3, 5, 5, 5, 3, 4…
## $ Recommended.IND         <int> 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1…
## $ Positive.Feedback.Count <int> 0, 4, 0, 0, 6, 4, 1, 4, 0, 0, 14, 2, 2, 0, 1, …
## $ Division.Name           <chr> "Initmates", "General", "General", "General Pe…
## $ Department.Name         <chr> "Intimate", "Dresses", "Dresses", "Bottoms", "…
## $ Class.Name              <chr> "Intimates", "Dresses", "Dresses", "Pants", "B…

Apart from the row index X, the dataset contains 10 variables:

Clothing ID: identifier of the specific item being reviewed.
Age: age of the customer who wrote the review.
Title: title of the review.
Review Text: body of the review.
Rating: product score given by the customer, from 1 (worst) to 5 (best).
Recommended IND: product recommendation, where 1 means recommended and 0 means not recommended.
Positive Feedback Count: number of other customers who marked the review as helpful.
Division Name: name of the division of the reviewed product.
Department Name: name of the department of the reviewed product.
Class Name: name of the class of the reviewed product.

2.2 Data Source and Collection Technique

The data comes from the Kaggle dataset “Women’s E-Commerce Clothing Reviews”, https://www.kaggle.com/datasets/nicapotato/womens-ecommerce-clothing-reviews. It is commercial data collected by web scraping; it has been anonymized, and references to the company name in the review text have been replaced with “retailer”.

2.3 Fit Between the Data and the Business Need

The data fits the business need because its contents give a fairly complete picture of the customer review data typically found on e-commerce platforms. Note, however, that if this project is applied to Indonesian-language text, the data processing steps need to be adjusted.

3 . Product Design

3.1 Machine Learning Algorithm

Topic modelling in this project uses Latent Dirichlet Allocation (LDA). LDA is an unsupervised model for discovering the topics contained in a collection of documents. It is used here because it can detect the topics present in a collection along with the proportion of each topic, both across the collection and within a given document. LDA can also associate the words in the documents and the collection with particular topics. Some further background on topic modelling and LDA:

The goal of topic modelling is to determine topics automatically from a set of documents. The documents are assumed to have a hidden structure consisting of the topics, the distribution of topics per document, and the topic assignment of each word in each document.

The basic idea of LDA is that a document is made up of several topics. LDA is a statistical model of a document collection that tries to capture this idea. It is generative: the model assumes an imaginary random process in which each document is generated from a mixture of topics, and each topic is a distribution over words.
LDA can be used to summarize, cluster, link, and process very large data sets because it produces a weighted list of topics for each document. LDA relies on the bag-of-words assumption, which ignores the order in which words appear in a document. A text, whether a sentence or a document, is represented as a bag (multiset) of the words it contains, disregarding word order and grammar but keeping multiplicity.
LDA clusters by looking at word counts in the bag of words; the number of clusters (topics) and the number of iterations are set in advance. LDA first assigns each word to a topic semi-randomly, then in each iteration recomputes the probability of topics within each document and the probability of words within each topic.
LDA outputs:

An estimate of how much each topic contributes to each document.
An estimate of how much each word contributes to each topic.
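To make the bag-of-words assumption concrete, the sketch below builds word counts for two toy sentences in base R. The sentences and object names are made up for illustration and are not taken from the dataset.

```r
# Two toy "documents" containing the same words in a different order
docs <- c("the dress is too small", "too small is the dress")

# Tokenise on single spaces and build the shared vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary word per document (one row per document)
counts <- lapply(tokens, function(tok) as.integer(table(factor(tok, levels = vocab))))
bow <- do.call(rbind, counts)
dimnames(bow) <- list(c("doc1", "doc2"), vocab)
bow

# Word order is ignored, so both documents get identical count vectors
identical(bow["doc1", ], bow["doc2", ])
```

Because only counts survive, LDA cannot tell these two sentences apart; this is the information the document-term matrix carries into the model.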

3.2 Dashboard Features

Input for adding new data: dashboard users, who are business owners on e-commerce platforms, should be able to upload their own customer review data. This part of the dashboard needs an element that clearly explains how to use it, so that users understand the required data structure and the complete input steps and avoid errors in the output.
Trend topic analysis: a feature for viewing the highest probabilities in each topic and the words that represent it. Labels, layout, and colors will be made as attractive and clear as possible so that the dashboard is intuitive even for non-expert users.

4 . Data Preparation & Exploration

4.1 Target/Predictor Variables

This project uses:

Text mining, a data analysis method that focuses on finding information and patterns in unstructured data; here the text data serves as the predictor variable.
Unsupervised learning, which has no target variable.

4.2 Selecting Columns and Inspecting the Data Structure

clothing_1 <- clothing %>%
  select(Rating, Class.Name, Review.Text)

head(clothing_1)

##   Rating Class.Name
## 1      4  Intimates
## 2      5    Dresses
## 3      3    Dresses
## 4      5      Pants
## 5      5    Blouses
## 6      2    Dresses
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Review.Text
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                Absolutely wonderful - silky and sexy and comfortable
## 2                                                                                                                                                                                                      Love this dress!  it's sooo pretty.  i happened to find it in a store, and i'm glad i did bc i never would have ordered it online bc it's petite.  i bought a petite and am 5'8".  i love the length on me- hits just a little below the knee.  would definitely be a true midi on someone who is truly petite.
## 3 I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my usual size) but i found this to be outrageously small. so small in fact that i could not zip it up! i reordered it in petite medium, which was just ok. overall, the top half was comfortable and fit nicely, but the bottom half had a very tight under layer and several somewhat cheap (net) over layers. imo, a major design flaw was the net over layer sewn directly into the zipper - it c
## 4                                                                                                                                                                                                                                                                                                                                                                                         I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great compliments!
## 5                                                                                                                                                                                                                                                                                                                     This shirt is very flattering to all due to the adjustable front tie. it is the perfect length to wear with leggings and it is sleeveless so it pairs well with any cardigan. love this shirt!!!
## 6             I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.

From the dataset, the columns used are Rating, Class.Name, and Review.Text. These three variables are sufficient for the analysis, and they are commonly available on e-commerce platforms, which matters if the results of this project are later used by other businesses.

# Number of unique values in Rating and Class.Name
length(unique(clothing_1$Rating))

## [1] 5

length(unique(clothing_1$Class.Name))

## [1] 21

Next, check for missing values.

# Number of missing values
sum(is.na(clothing_1))

## [1] 0
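A zero NA count does not rule out blank reviews: `read.csv` reads empty fields in character columns as `""` rather than `NA`, and the `glimpse` output above already shows empty strings in `Title`. The sketch below shows one way to count and drop blank text; it uses a small made-up data frame standing in for `clothing_1`, so the counts are illustrative only.

```r
# Toy data frame standing in for clothing_1 (values are made up)
reviews <- data.frame(
  Rating = c(5L, 2L, 1L),
  Review.Text = c("Love it", "", "   "),
  stringsAsFactors = FALSE
)

# Blank strings are not NA, so this check alone misses them
na_before <- sum(is.na(reviews))

# Count reviews that are empty or whitespace-only
is_blank <- trimws(reviews$Review.Text) == ""
n_blank <- sum(is_blank)

# Treat blank reviews as missing and drop them
reviews$Review.Text[is_blank] <- NA
reviews_clean <- reviews[!is.na(reviews$Review.Text), , drop = FALSE]
nrow(reviews_clean)
```

Whether the real data needs this step depends on how many blank reviews remain after filtering, which the check above would reveal.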

4.3 Selecting Low-Rated Reviews

The analysis uses review text with low ratings, that is ratings 1 and 2, restricted to the product category with the most reviews (assumed to be the main product), Dresses.

clothing_rating_low <- clothing_1 %>%
  filter(Rating %in% c(1, 2), Class.Name == "Dresses")

# Number of observations
nrow(clothing_rating_low)

## [1] 689

4.4 Extracting the Text and Converting It to a Corpus

After filtering down to low-rated reviews, only the text column is kept and converted into a corpus.

# Take text data only
clothing_rating_low_text <- clothing_rating_low$Review.Text

# Create a corpus  
clothing_rating_low_text_corpus <- Corpus(VectorSource(clothing_rating_low_text))

head(clothing_rating_low_text_corpus$content, 3)

## [1] "I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress."         
## [2] "First of all, this is not pullover styling. there is a side zipper. i wouldn't have purchased it if i knew there was a side zipper because i have a large bust and side zippers are next to impossible for me.\n\nsecond of all, the tulle feels and looks cheap and the slip has an awkward tight shape underneath.\n\nnot at all what is looks like or is described as. sadly will be returning, but i'm sure i will find something to exchange it for!"                                                        
## [3] "The design/shape of the dress are quite flattering, flirty and feminine. but.... there is no way that the dress i received is new. the color is a faded washed out red and there are black stains all over the belt area. there is no tag... the fabric looks droopy and laundered and is not crisp, stiff or new. i am very disappointed by the quality of the item that i received. undoubtedly this one is going back.\n\ndear retailer - please make sure that you do not send pre-owend clothing articles to"

4.5 Pre-processing the Data

The text in the dataset cannot be used for modelling as-is; it first has to be prepared. The steps are as follows:

Case-folding: Text processing in R is case sensitive, while the raw reviews mix lowercase and uppercase letters. To make all words uniform and avoid errors in processing and analysis, every uppercase letter is converted to lowercase. Once all letters are in the same case, words can be compared reliably. This conversion is done with the tolower function.

Remove punctuation: Punctuation is not analyzed in topic modelling, where words are the element that matters most, and it can also cause errors in the code. All characters not needed for text processing are therefore removed. The punctuation characters removed are ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~. This is done with the removePunctuation function.

Remove numbers: Like punctuation, digits are not analyzed in topic modelling, so all numbers in the data are removed. This is done with the removeNumbers function.

Stemming & lemmatization: Inflected words are harder to interpret and are treated as units distinct from their base form. This step maps the different forms of a word to a single representation. Stemming finds the stem of each word and can be done with the stemDocument function. Because stemming only truncates words, lemmatization (looking up the actual base word according to the rules of the language, for example receiving -> receive) is more accurate, at the cost of longer computation. Please see the references for how to use lemmatization. Stemming the first 6 reviews produced several non-standard words, because stemming only cuts words without mapping them to a correct base form. This step therefore uses lemmatization, through the lemmatize_strings function, to obtain more suitable terms.

Remove stopwords: Stopwords are words ignored in topic modelling, usually stored in stop lists. A stop list contains common words that serve a grammatical function but carry little meaning. Removing them reduces the number of words in each document, which improves speed and performance. Stopwords are typically chosen because they occur very frequently. The English stop list is used here.
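As a rough illustration of the steps described above, the sketch below applies case folding, punctuation removal, number removal, and stopword removal to a single made-up sentence using base R only. The sentence, the whitespace-collapsing step, and the tiny `stop_list` are assumptions for this example; lemmatization is left out because it needs an external dictionary.

```r
text <- "I ordered a Petite SMALL, but it ran 2 sizes too big!"

step1 <- tolower(text)                      # case folding
step2 <- gsub("[[:punct:]]", "", step1)     # remove punctuation
step3 <- gsub("[[:digit:]]", "", step2)     # remove numbers
step4 <- gsub("\\s+", " ", trimws(step3))   # collapse leftover spaces (added for this sketch)

# Remove a tiny, hypothetical stop list (a real stop list is much longer)
stop_list <- c("i", "a", "but", "it", "too")
words <- strsplit(step4, " ", fixed = TRUE)[[1]]
kept <- words[!words %in% stop_list]
kept
```

Only content-bearing words such as "petite", "small", and "sizes" remain, which is the kind of input the topic model works with.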

# Preprocessing Data
clothing_rating_low_text_corpus <-
  clothing_rating_low_text_corpus %>%
  tm_map(content_transformer(tolower)) %>% # Case-folding
  tm_map(content_transformer(removePunctuation)) %>% # Remove punctuation
  tm_map(content_transformer(removeNumbers)) %>% # Remove numbers
  tm_map(content_transformer(lemmatize_strings)) %>% # Lemmatization (textstem)
  tm_map(removeWords, stopwords("english")) # Remove stopwords

# Convert the corpus to a document-term matrix
clothing_rating_low_text_corpus_dtm <- clothing_rating_low_text_corpus %>% DocumentTermMatrix()
inspect(clothing_rating_low_text_corpus_dtm)

## <<DocumentTermMatrix (documents: 689, terms: 2355)>>
## Non-/sparse entries: 17973/1604622
## Sparsity           : 99%
## Maximal term length: 18
## Weighting          : term frequency (tf)
## Sample             :
##      Terms
## Docs  dress fabric fit good just like look love much size
##   131     2      0   1    0    0    0    1    1    1    4
##   145     1      3   0    0    0    3    1    1    1    0
##   153     0      0   0    0    0    3    3    0    1    0
##   155     3      2   1    0    1    0    1    0    1    1
##   51      3      1   1    2    0    0    0    0    0    5
##   561     3      0   0    1    0    2    0    0    0    0
##   586     0      0   1    2    2    1    2    0    1    0
##   662     0      1   0    1    1    1    0    0    0    0
##   681     4      1   0    2    0    0    1    0    4    2
##   75      3      3   0    0    3    1    0    0    0    0
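One side effect of the step order in the pipeline above is worth noting. Punctuation is removed before stopwords, so a contraction such as "didn't" becomes "didnt" and no longer matches a stop list that stores it with the apostrophe. This would explain tokens such as didnt, wouldnt, and cant in the top-terms table in Section 6.2. The base R sketch below reproduces the effect with a tiny, hypothetical stop list; the helper functions are illustrative and are not part of tm.

```r
stop_list <- c("didn't", "wouldn't", "i", "it")   # tiny illustrative stop list
text <- "i didn't like it"

# Drop every word that appears in the stop list
drop_stopwords <- function(x, stops) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  paste(words[!words %in% stops], collapse = " ")
}
strip_punct <- function(x) gsub("[[:punct:]]", "", x)

# Order used in the pipeline: punctuation first, stopwords second
punct_first <- drop_stopwords(strip_punct(text), stop_list)

# Alternative order: stopwords first, punctuation second
stops_first <- strip_punct(drop_stopwords(text, stop_list))

punct_first   # "didnt like"
stops_first   # "like"
```

If those contraction tokens are unwanted, one option is to remove stopwords before punctuation; the outputs shown in this report were produced with the original order.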

5 . Building the LDA Model

The LDA model is built with k = 5 topics. This value is chosen subjectively for now; the optimal k is searched for after the model is built.

The model is fitted with Gibbs sampling using 1000 sampling iterations and 500 burn-in iterations. Burn-in means that only samples from iteration 500 onward are used, because the early iterations are not yet stable and may not reflect the actual distribution of the data.

# Convert the document-term matrix to a sparse dgCMatrix for FitLdaModel
rating_low_dtm_lda <- Matrix::Matrix(as.matrix(clothing_rating_low_text_corpus_dtm), sparse = TRUE)

set.seed(123)
rating_low_lda <- FitLdaModel(rating_low_dtm_lda, 
                        k = 5, 
                        iterations = 1000,
                        burnin = 500, 
                        calc_coherence = TRUE
                        )

glimpse(rating_low_lda)

## List of 7
##  $ phi      : num [1:5, 1:2355] 2.44e-04 2.03e-05 1.41e-05 1.43e-05 1.23e-05 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
##   .. ..$ : chr [1:2355] "alteration" "away" "brand" "color" ...
##  $ theta    : num [1:689, 1:5] 0.4582 0.2928 0.0272 0.1945 0.0328 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:689] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
##  $ gamma    : num [1:5, 1:2355] 0.8354 0.0427 0.0409 0.041 0.0401 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
##   .. ..$ : chr [1:2355] "alteration" "away" "brand" "color" ...
##  $ data     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. ..@ i       : int [1:17973] 0 638 0 3 93 151 242 286 514 548 ...
##   .. ..@ p       : int [1:2356] 0 2 13 26 185 750 751 760 775 793 ...
##   .. ..@ Dim     : int [1:2] 689 2355
##   .. ..@ Dimnames:List of 2
##   .. ..@ x       : num [1:17973] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..@ factors : list()
##  $ alpha    : Named num [1:5] 0.1 0.1 0.1 0.1 0.1
##   ..- attr(*, "names")= chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
##  $ beta     : Named num [1:2355] 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
##   ..- attr(*, "names")= chr [1:2355] "alteration" "away" "brand" "color" ...
##  $ coherence: Named num [1:5] 0.0749 0.0477 0.042 0.0347 0.032
##   ..- attr(*, "names")= chr [1:5] "t_1" "t_2" "t_3" "t_4" ...
##  - attr(*, "class")= chr "lda_topic_model"

The LDA model returns several attributes, including:

phi: the per-topic, per-word probabilities.
theta: the per-document, per-topic probabilities.
coherence: the coherence score of each topic.
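To show how these matrices are read, the sketch below builds two toy matrices with the same layout as `phi` (topics by terms) and `theta` (documents by topics). The values are made up; only the layout follows the model output above.

```r
# Toy matrices with the same layout as the model output (values are made up)
phi <- matrix(c(0.6, 0.3, 0.1,
                0.1, 0.2, 0.7),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("t_1", "t_2"), c("size", "fit", "fabric")))
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("1", "2"), c("t_1", "t_2")))

rowSums(phi)     # each topic is a probability distribution over terms
rowSums(theta)   # each document is a probability distribution over topics

# Most probable term per topic and dominant topic per document
top_term <- colnames(phi)[apply(phi, 1, which.max)]
main_topic <- colnames(theta)[apply(theta, 1, which.max)]
top_term
main_topic
```

In this toy example topic t_1 is dominated by "size" and t_2 by "fabric", while document 1 is mostly about t_1 and document 2 mostly about t_2.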

6 . Model Evaluation

6.1 Finding the Optimal K

Topic coherence scores the set of words produced for a topic by how coherent they are, that is, how easily a human can interpret them. It measures a topic by the degree of semantic similarity among its words. This helps distinguish topics that are semantically interpretable from topics that are only statistically associated. Topic coherence is the metric used here to evaluate the topic model: the higher the coherence score of its topics, the better the model.

We look for the optimal number of topics (k) based on the mean probabilistic coherence for several values of k, from k = 5 to k = 30 in steps of 5. To speed up computation, 100 sampling iterations and 50 burn-in iterations are used.

# rating_low_dtm_lda <- Matrix::Matrix(as.matrix(clothing_rating_low_text_corpus_dtm), sparse = T)
# 
# k_list <- seq(5, 30, by = 5)
# 
# model_list <- TmParallelApply(X = k_list, FUN = function(k){
# 
#   m <- FitLdaModel(dtm = rating_low_dtm_lda,
#                    k = k,
#                    iterations = 100,
#                    burnin = 50,
#                    calc_coherence = TRUE)
# 
#   m <- mean(m$coherence)
# 
#   return(m)
#   },
# cpus = 4
# )
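Once the search has run, the best k is the one with the highest mean coherence. The sketch below shows that selection step in base R. The coherence values are placeholders chosen for illustration, NOT results from this dataset; with a real run they would come from `unlist(model_list)`.

```r
# Candidate numbers of topics, as described in the text
k_list <- seq(5, 30, by = 5)

# Placeholder coherence values, NOT results from this dataset.
# With the real run: mean_coherence <- unlist(model_list)
mean_coherence <- c(0.05, 0.07, 0.06, 0.04, 0.04, 0.03)

coherence_by_k <- data.frame(k = k_list, coherence = mean_coherence)

# Pick the k with the highest mean coherence
best_k <- coherence_by_k$k[which.max(coherence_by_k$coherence)]
best_k
```

Plotting `coherence_by_k` as a line chart is a common way to check whether the maximum is a clear peak or just noise between neighbouring values of k.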

6.2 Top Terms in Each Topic

# To get the top terms for each topic, we can use the GetTopTerms function.
rating_low_lda_word_topic <- GetTopTerms(rating_low_lda$phi, 50) %>%
  as.data.frame()
  # %>% set_names(paste("Topic", 1:5))

rating_low_lda_word_topic

##          t_1         t_2           t_3          t_4           t_5
## 1      dress       dress          look        dress         dress
## 2       size         get         dress       fabric           fit
## 3        fit         one          like         look        fabric
## 4      order        wear          just         like         waist
## 5      small         try          make        color          much
## 6      large      zipper          back         much          just
## 7       love         see          good        cheap           top
## 8        try        back        fabric     material          good
## 9     petite         can          much      picture        really
## 10      just    retailer          love         feel          love
## 11      wear        wash         think      quality           hip
## 12       big        time         model         make          make
## 13    return      button      material        price     beautiful
## 14     think       first         shape          see         small
## 15      like        take        really         back          look
## 16       get         buy          feel         love          size
## 17     tight  disappoint       flatter   disappoint           way
## 18       way        side         heavy         good         color
## 19       arm        even          sack      receive          want
## 20       run      review           cut       online          work
## 21      much        cute          cute        photo          line
## 22      work       great          side         slip           cut
## 23      huge        come         order       return          bust
## 24     color         two          also       expect         chest
## 25  retailer         sew          body         nice           can
## 26      even        find          wear         thin        around
## 27      look        fall         right       really        design
## 28      good         bad           try         also          also
## 29       one        seam         great          can          high
## 30      bust        will          soft       arrive         skirt
## 31   usually        sale        return         blue          wear
## 32  purchase        look          tall     retailer         short
## 33     store         put           may         just        bottom
## 34    review        hand         didnt         line      material
## 35    medium        sure           say         show          bite
## 36   however     quality          nice          fit          like
## 37     style       didnt         super         isnt       flatter
## 38   perfect        year         color       review  unflattering
## 39     short     wouldnt unfortunately        worth           run
## 40      will        pull           arm        order          area
## 41      hole         now         print         come unfortunately
## 42    pretty    although         maybe       design         think
## 43   couldnt        keep          send        style        detail
## 44    really         wed       someone        great          long
## 45     model immediately          huge        white      shoulder
## 46       buy    spin-dry         curvy         high         didnt
## 47     didnt       every        pretty         want         wasnt
## 48 beautiful      either           big unflattering       strange
## 49    person        part          want          say         tight
## 50      cant  completely          know       excite      reviewer
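The ranking shown above can be illustrated on a toy matrix. The sketch below is base R only and assumes that phi is a topic-by-term probability matrix (rows are topics, columns are terms); the values and the helper name `top_terms` are made up for illustration and are not part of textmineR.

```r
# Toy topic-term matrix (phi): rows are topics, columns are terms.
# The probabilities are invented; each row sums to 1.
phi <- matrix(
  c(0.50, 0.30, 0.15, 0.05,
    0.10, 0.20, 0.30, 0.40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("t_1", "t_2"), c("dress", "size", "fabric", "cheap"))
)

# Hypothetical helper: for every topic, return the n most probable terms,
# one column per topic, most probable term first.
top_terms <- function(phi, n) {
  apply(phi, 1, function(p) names(sort(p, decreasing = TRUE))[seq_len(n)])
}

top_terms_toy <- top_terms(phi, 2)
print(top_terms_toy)
```

Each column lists one topic's terms in descending order of probability, which is why the row index in the table above can be read as a rank.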

z <- 
rating_low_lda_word_topic %>% 
   rownames_to_column("id") %>%
   mutate(id = as.numeric(id)) %>%
   pivot_longer(-id, names_to = "topic", values_to = "term") %>%
   ggplot(aes(label = term, size = rev(id), alpha = rev(id))) +
   geom_text_wordcloud (seed = 123) +
   facet_wrap(~topic, scales = "free") +
   scale_alpha_continuous(range = c(0.4, 1)) +
   theme_minimal() +
   theme(strip.background = element_rect(fill = "blue"),
         strip.text.x = element_text(colour = "white"))

z

rating_low_lda_word_topic %>%
  select(t_1) %>% 
  rownames_to_column("id") %>%
  mutate(id = as.numeric(id)) %>%  # rank must be numeric to map onto size and alpha
  pivot_longer(-id, names_to = "topic", values_to = "term") %>% 
   ggplot(aes(label = term, size = rev(id), alpha = rev(id))) +
   geom_text_wordcloud (seed = 123) +
   theme_minimal()

6.3 Topic Interpretation Based on Document-Topic Probabilities (Theta)

A high theta value for a document indicates that the document has a high probability of being generated by that topic, i.e. the review is strongly associated with it. (Term-topic associations are given by phi, which was used for the top terms in 6.2.) Topics are interpreted not only from their most probable terms but also from the context of the reviews, using the 10 reviews with the highest theta for each topic.

# Store theta for each topic
rating_low_theta <- 
rating_low_lda$theta %>%
   as.data.frame() %>% 
   # set_names(paste("Topic", 1:5)) %>% 
   rownames_to_column("document")

Next, the theta values are combined with the original data to obtain the full review text.

rating_low_theta_review <- 
rating_low_theta %>% cbind(clothing_rating_low, deparse.level = 0)

head(rating_low_theta_review, 3)

##   document        t_1         t_2        t_3         t_4         t_5 Rating
## 1        1 0.45822785 0.002531646 0.07848101 0.002531646 0.458227848      2
## 2        2 0.29275362 0.350724638 0.11884058 0.234782609 0.002898551      2
## 3        3 0.02716049 0.397530864 0.05185185 0.446913580 0.076543210      2
##   Class.Name
## 1    Dresses
## 2    Dresses
## 3    Dresses
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Review.Text
## 1          I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p in this brand. this dress was very pretty out of the package but its a lot of dress. the skirt is long and very full so it overwhelmed my small frame. not a stranger to alterations, shortening and narrowing the skirt would take away from the embellishment of the garment. i love the color and the idea of the style but it just did not work on me. i returned this dress.
## 2                                                         First of all, this is not pullover styling. there is a side zipper. i wouldn't have purchased it if i knew there was a side zipper because i have a large bust and side zippers are next to impossible for me.\n\nsecond of all, the tulle feels and looks cheap and the slip has an awkward tight shape underneath.\n\nnot at all what is looks like or is described as. sadly will be returning, but i'm sure i will find something to exchange it for!
## 3 The design/shape of the dress are quite flattering, flirty and feminine. but.... there is no way that the dress i received is new. the color is a faded washed out red and there are black stains all over the belt area. there is no tag... the fabric looks droopy and laundered and is not crisp, stiff or new. i am very disappointed by the quality of the item that i received. undoubtedly this one is going back.\n\ndear retailer - please make sure that you do not send pre-owend clothing articles to
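Selecting the reviews that best represent a topic amounts to sorting the combined table by that topic's theta and keeping the first rows. The following is a minimal base R sketch on invented data; the helper `top_reviews` and the toy values are assumptions for illustration, and only the column names `t_1`, `t_2` and `Review.Text` mirror the table above.

```r
# Toy version of the theta + review table (values are invented).
theta_toy <- data.frame(
  document = c("1", "2", "3", "4"),
  t_1 = c(0.70, 0.10, 0.45, 0.05),
  t_2 = c(0.30, 0.90, 0.55, 0.95),
  Review.Text = c("runs small", "zipper broke",
                  "tight and faded", "seams fell apart"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n reviews with the highest theta for one topic.
top_reviews <- function(df, topic, n) {
  ranked <- df[order(df[[topic]], decreasing = TRUE), ]
  head(ranked[, c(topic, "Review.Text")], n)
}

top_t2 <- top_reviews(theta_toy, "t_2", 2)
print(top_t2)
```

With the real data the same idea would be applied to `rating_low_theta_review` with `n = 10`, once per topic column.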

6.3.1 Topic 1

# Top 50 Terms
rating_low_lda_word_topic[,1]

##  [1] "dress"     "size"      "fit"       "order"     "small"     "large"    
##  [7] "love"      "try"       "petite"    "just"      "wear"      "big"      
## [13] "return"    "think"     "like"      "get"       "tight"     "way"      
## [19] "arm"       "run"       "much"      "work"      "huge"      "color"    
## [25] "retailer"  "even"      "look"      "good"      "one"       "bust"     
## [31] "usually"   "purchase"  "store"     "review"    "medium"    "however"  
## [37] "style"     "perfect"   "short"     "will"      "hole"      "pretty"   
## [43] "couldnt"   "really"    "model"     "buy"       "didnt"     "beautiful"
## [49] "person"    "cant"

# # Review
# rating_low_theta_review %>%
#   arrange(desc(t_1)) %>% 
#   select(t_1, Class.Name, Review.Text) %>%
#   head(10)

Review Highlight

After reading every review, i was certain i knew exactly what i was getting with this dress. wrong. i sized up from a small to a medium and the dress was still uncomfortably tight on my rib cage. the top is very exposing of the bust, and i’m small busted. i can’t stand wearing camis, so there no way i'd be able to wear this dress even if i sized up again to a large.
I purchased this and another eva franco dress during retailer’s recent 20% off sale. i was looking for dresses that were work appropriate, but that would also transition well to happy hour or date night. they both seemed to be just what i was looking for. i ordered a 4 regular and a 6 regular, as i am usually in between sizes. the 4 was definitely too small. the 6 fit, technically, but was very ill fitting. not only is the dress itself short, but it is very short-waisted.
This dress is huge. i am normally a l-xl regular in retailer brands. i love a loose, flowy dress. after trying one on in the store (just a random returned petite xl) i ordered the petite medium. and it was still huge and not flattering. so sad. i wanted to love this- loved the material and colors, but way off.
Don't buy this dress unless you are normally a medium or larger. order it one or two sizes smaller than your normal size. i ordered an xs and it’s more like a medium or large.
I love this dress but i will return it because the sleeves are too tight. i have never had this problem before. i ordered a petite 0 and petite 2. for reference i am 5’ 4” tall and 117 lbs. the length is perfect and i really like the style. if the sleeves had not been so tight i would have kept the petite 0. the petite 2 sleeves were no looser. sorry that this did not work for me.
The dress is beautiful on the model. in person it’s sheer, but i could have dealt with that if the fit had been great. i am 5'4" and the petite was too short and the regular too long. both are very boxy and seem to run big. even the size small in both petite and regular were too big on me and usually in cloth and stone size small is perfect. i really wanted to love this dress but ended up returning several sizes in my attempt to find the perfect size.

Interpretation: Non-standard sizing

6.3.2 Topic 2

# Top 50 Terms
rating_low_lda_word_topic[,2]

##  [1] "dress"       "get"         "one"         "wear"        "try"        
##  [6] "zipper"      "see"         "back"        "can"         "retailer"   
## [11] "wash"        "time"        "button"      "first"       "take"       
## [16] "buy"         "disappoint"  "side"        "even"        "review"     
## [21] "cute"        "great"       "come"        "two"         "sew"        
## [26] "find"        "fall"        "bad"         "seam"        "will"       
## [31] "sale"        "look"        "put"         "hand"        "sure"       
## [36] "quality"     "didnt"       "year"        "wouldnt"     "pull"       
## [41] "now"         "although"    "keep"        "wed"         "immediately"
## [46] "spin-dry"    "every"       "either"      "part"        "completely"

# # Review
# rating_low_theta_review %>%
#   arrange(desc(t_2)) %>% 
#   select(t_2, Class.Name, Review.Text) %>%
#   head(10)

Review Highlight

As another reviewer pointed out, although the tag inside the dress says that you can machine wash in cold water, the color around the popsicles will bleed! this is extremely disappointing. i’m hoping it can either be fixed with dye remover, or that retailer will take this item back. would only dry clean this item.
After the second wash, the stitching in the armpits completely fell apart. after sewing it back up, wearing it, and washing it a third time, the stitching in the other armpit completely fell apart as well. not sure if this was just a fluke, but it was very disappointing. now that both armpits are sewn and all other stitches are intact, i must say that this is a great dress. i would have returned the dress to retailer, but i’ve already washed and worn it, so again…very disappointing.
Retailer has incredible service, and so they let me return this dress even after having worn it. however, this dress cannot be cleaned. i saw a stain on it after the first use, so i first tried hand washing it. however, the black part of the dress came off on the white part, making the stain worse. i then tried dry cleaning it (the care tag allows both) and that produced 3 or 4 new stains where the black rubbed off on the white. it’s a beautiful, striking dress (that would be great matern
I love maeve dresses and have always gotten a lot of wear out of them while still looking great. this dress does not meet those standards. on the 2nd wear of this dress the zipper completely broke. i had to take it to a dry cleaner to get replaced at a pricey cost for a pricey dress, i’m dissapointed.
According to the label, this dress can be hand washed or dry cleaned. i hand washed and line dried per the instructions on the label, and all the seams have completely shrunk and gathered. it looks like there’s a drawstring running through every seam. so disappointed in the quality and the labeling. awful! i gave it one extra star because it’s cute.
Unfortunately the dress is pulling apart after the first wash ( gentle hand wash cycle). also, there are no bra-straps holders, and my bra straps did show up a lot…
Love this dress! wore it last night for a musical event… came home and put it on the hand wash setting in the washer with cold water...just as the label states. pulled the dress out and guess what? the design is completely faded!!! it looks 5 years old :( so sad. i’m going to call retailer and ask for a refund (i live two hours away)

Interpretation: Material quality; the garment is damaged after washing

6.3.3 Topic 3

# Top 50 Terms
rating_low_lda_word_topic[,3]

##  [1] "look"          "dress"         "like"          "just"         
##  [5] "make"          "back"          "good"          "fabric"       
##  [9] "much"          "love"          "think"         "model"        
## [13] "material"      "shape"         "really"        "feel"         
## [17] "flatter"       "heavy"         "sack"          "cut"          
## [21] "cute"          "side"          "order"         "also"         
## [25] "body"          "wear"          "right"         "try"          
## [29] "great"         "soft"          "return"        "tall"         
## [33] "may"           "didnt"         "say"           "nice"         
## [37] "super"         "color"         "unfortunately" "arm"          
## [41] "print"         "maybe"         "send"          "someone"      
## [45] "huge"          "curvy"         "pretty"        "big"          
## [49] "want"          "know"

# # Review
# rating_low_theta_review %>%
#   arrange(desc(t_3)) %>% 
#   select(t_3, Class.Name, Review.Text) %>%
#   head(10)

Review Highlight

This looked fabulous on the model, of course, and i love sweater dresses. but even in an xs, i was drowning in fabric and looked ludicrous. concept is good but execution went terribly wrong somewhere. a definite return.
The pleats on the bib make this look like something from chloe sevigny’s wardrobe on the set of big love. and the shoulders are cut for an offensive lineman.
Unfortunately, this dress is shaped like a sack and has no shape to speak of. i have sent it back for a refund :( too big, too shapeless.
Too full of a dress for me. i tried a belt and it is then too chunky. if you are younger, tall and slim, this would look great
If you have any curves, avoid! it is not flattering.the striped side panels look odd with the flow of the dress.
I wanted to like this but it just didn't have much of a shape, not flattering at all. maybe if i was taller?
Frumpy - looked like a nightgown on me; maybe better on someone younger …
There were just too many pleats to this thing. it was not flattering. it looks fantastic on this model but not on me. oh well.
Didn’t like the dress. look pregnant in it. this dress is going back.
This dress looks lovely on the model, but it looks just awful on me!\ni am 5'4" and curvy. the shoulders are cut w/ a weird shape. instead of a cap sleeve, it is like a pointy cap on the side and stuck out on my shoulders. it looks like i am ready to get beamed into space.material is nice and the sequins are pretty, but the cut of the dress is unflattering.

Interpretation: The shape is unflattering when worn and looks different from how it appears on the model (display)

6.3.4 Topic 4

# Top 50 Terms
rating_low_lda_word_topic[,4]

##  [1] "dress"        "fabric"       "look"         "like"         "color"       
##  [6] "much"         "cheap"        "material"     "picture"      "feel"        
## [11] "quality"      "make"         "price"        "see"          "back"        
## [16] "love"         "disappoint"   "good"         "receive"      "online"      
## [21] "photo"        "slip"         "return"       "expect"       "nice"        
## [26] "thin"         "really"       "also"         "can"          "arrive"      
## [31] "blue"         "retailer"     "just"         "line"         "show"        
## [36] "fit"          "isnt"         "review"       "worth"        "order"       
## [41] "come"         "design"       "style"        "great"        "white"       
## [46] "high"         "want"         "unflattering" "say"          "excite"

# # Review
# rating_low_theta_review %>%
#   arrange(desc(t_4)) %>% 
#   select(t_4, Class.Name, Review.Text) %>%
#   head(10)

Review Highlight

The fabric of this looks and feels so cheap it’s hard to believe this is anna sui. i would say this is unwearable. there is zero stretch and it is made of itchy textured poly fabric that looks like novelty fabric at best. going back asap.
Looks much nicer in the photo. i expected a much higher quality fabric. this fabric truly felt cheap. i expected a nicer dress for the price. the fit was unflattering on me. sent it back
I returned this dress because the material is so flimsy and thin. i was expecting higher quality, especially for the price. i was disappointed. it is a pretty dress and fit great, but i felt like i was wearing tissue paper.
There is no way this is worth the price. i was deeply disappointed when it arrived. the material is thin and feels cheap. i love the design, and anna sui, but this is just so overpriced.
I was very surprised to see such dark blue sequence on this dress. it did not appear as it did on the photo. i was pretty disappointed by the color. the photo of the dress is much more impressive.
The dress is very pretty, but the sequins are dark blue! i imagined they would be silver/gold, given the photo, but the dress is essentially light pink and blue.
the dress, but the fabric makes it look cheap. for an expensive dress, i had expected better quality.
The dress is not the beautiful color in the picture…it is a very yellowish cream. the fabric is thick and does not\nhang pretty. very disappointed. had to return it.
I was disappointed with the style and quality of this dress. it poofs out funny in the waste and it just looks very unflattering. i’m sending it back.
returned the dress because the color was not as pictured in the online photo. i did not like the color.

Interpretation: Fabric quality does not match the price

6.3.5 Topic 5

# Top 50 Terms
rating_low_lda_word_topic[,5]

##  [1] "dress"         "fit"           "fabric"        "waist"        
##  [5] "much"          "just"          "top"           "good"         
##  [9] "really"        "love"          "hip"           "make"         
## [13] "beautiful"     "small"         "look"          "size"         
## [17] "way"           "color"         "want"          "work"         
## [21] "line"          "cut"           "bust"          "chest"        
## [25] "can"           "around"        "design"        "also"         
## [29] "high"          "skirt"         "wear"          "short"        
## [33] "bottom"        "material"      "bite"          "like"         
## [37] "flatter"       "unflattering"  "run"           "area"         
## [41] "unfortunately" "think"         "detail"        "long"         
## [45] "shoulder"      "didnt"         "wasnt"         "strange"      
## [49] "tight"         "reviewer"

# # Review
# rating_low_theta_review %>%
#   arrange(desc(t_5)) %>% 
#   select(t_5, Class.Name, Review.Text) %>%
#   head(10)

Review Highlight

This dress had an odd fit. very loose on top with pointy darts, very tight on bottom. fabric was stiff so it didn’t move well with me
The fabric and colors of this dress are beautiful but the fit is terrible. i had to go up a size to get a fit in the waist but it was then incredibly loose in the shoulders. what a pity. had to return.
I had high hopes for this dress, but unfortunately was disappointed and returned the dress. it flares out with too much material under the gathered waist. it's so odd.
I loved the design and material of the test, but unfortunately it just did not fit me right. it was tight in the hips and did not have the relaxed look i was hoping for also the pattern did not look good on me. unfortunately i had to return.
The fabric and its texture didn’t meet my expectations unfortunately. there were too much fray and cast-off of edge yarns of a fabric… however this will fit if you wear over the swimming wears.
My v-shape figure (broad shoulder, narrow hips, waist not well defined) looked completely square in this. not flattering at all. perhaps better for other figures.
This ran a bit small; i’m normally a 2, this was a 4 and ‘just’ fit. the pattern was very ‘digi,’ the length was shorter than i hoped (i’m 5’1”), and the fabric, while structured, was just cheap and acrylic-feeling. pass.
So disappointed! beautiful dress in the photographs but the cut was incredibly strange. loose and baggy through the top and mid-section but tight around the buttock and thighs.
This is a pretty dress, but the cut is quite boxy. i had thought that the dress would cut in a bit along the body and then "flare out" more around the waist down to the hem; however, the dress is baggy from the sleeves all the way down, which lends to a rather unflattering look.
I was thoroughly underwhelmed. it did not hang well, and the fit seemed off. the tie waist just sort of hung there. i could have sized down, but i didn’t even want to bother. it was pretty boring and flimsy for $138.

Interpretation: Disproportionate size or shape

7 . Computing Term Frequency

df <- tibble(Text = clothing_rating_low_text)  # tibble() replaces the deprecated data_frame()

head(df, n = 20)

## # A tibble: 20 × 1
##    Text                                                                         
##    <chr>                                                                        
##  1 "I love tracy reese dresses, but this one is not for the very petite. i am j…
##  2 "First of all, this is not pullover styling. there is a side zipper. i would…
##  3 "The design/shape of the dress are quite flattering, flirty and feminine. bu…
##  4 "The colors are vivid and perfectly autumnal but the fit is a mess. it was o…
##  5 "I don't normally review my purchases, but i was so amazed at how poorly thi…
##  6 "I love byron lars dresses, and this design is on-point. the ruffle at the n…
##  7 "Three strikes and retailer is out for me! i am so disappointed. i really li…
##  8 "The fun colors drew me to this but it sure fit weird. the top was fine but …
##  9 ""                                                                           
## 10 "I loved this dress when i saw it. however the fit was way off. i am 5'7\" 1…
## 11 "I don't typically write bad reviews, but this dress is so bad and i want to…
## 12 "Don't buy this dress unless you are normally a medium or larger. order it o…
## 13 "The overall styling was great, and the dress is super-cute, if a little thi…
## 14 "I love retailer and fell in love as soon as i saw this dress online. being …
## 15 "I am floored by the amount of positive reviews on this dress! when i receiv…
## 16 "This didn't work for me. im normally a m (8/10). got this in xs. that was t…
## 17 "I just received this in the mail today. first of all there was no slip incl…
## 18 "Ditto what the first reviewer said, unfortunately. i was so looking forward…
## 19 "I loved the photo of this dress. upon examination of the dress (and trying …
## 20 "The dress arrived with a few snags and the fabric already pilling. the fabr…

#The unnest_tokens() function from the tidytext package picks out the individual words and places them as rows.
clothing_rating_low_words <- df %>% 
                  unnest_tokens(output = word, input = Text)

# anti_join() removes the stop words (tidytext's stop_words) from clothing_rating_low_words.
clothing_rating_low_words <- clothing_rating_low_words %>%
                   anti_join(stop_words)

## Joining, by = "word"

#The count() function with the %>% pipe operator from the dplyr package is used to obtain counts of the words.
clothing_rating_low_wordcounts <- clothing_rating_low_words %>% count(word, sort = TRUE)

head(clothing_rating_low_wordcounts)

## # A tibble: 6 × 2
##   word         n
##   <chr>    <int>
## 1 dress     1065
## 2 fabric     254
## 3 size       209
## 4 fit        208
## 5 love       152
## 6 material   139

The data now has a column for words and a second column for the word counts. A bar graph can be prepared with the ggplot2 function ggplot().

# ggplot2 Plot:
clothing_rating_low_wordcounts %>% 
  filter(n > 70) %>% 
  mutate(word = reorder(word, n)) %>% 
    ggplot(aes(word, n)) + 
    geom_col() +
    coord_flip() +
    labs(x = "Word \n", y = "\n Count ", title = "Frequent Words Clothing Review \n") +
    geom_text(aes(label = n), hjust = 1.2, colour = "white", fontface = "bold") +
    theme(plot.title = element_text(hjust = 0.5), 
        axis.title.x = element_text(face="bold", colour="darkblue", size = 12),
        axis.title.y = element_text(face="bold", colour="darkblue", size = 12))
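
The tokenize, filter and count steps used in this section can be sketched without tidytext. This is a simplified base R illustration on two invented sentences with a tiny made-up stop word list (`stop_list`); it is not the tokenizer that `unnest_tokens()` uses, so counts on real data may differ slightly.

```r
# Two invented reviews and a tiny, made-up stop word list.
reviews <- c("The dress is too small", "I love the fabric of the dress")
stop_list <- c("the", "is", "too", "i", "of")

# 1. Tokenize: lower-case, then split on anything that is not a letter or apostrophe.
tokens <- unlist(strsplit(tolower(reviews), "[^a-z']+"))

# 2. Remove empty strings and stop words.
tokens <- tokens[tokens != "" & !(tokens %in% stop_list)]

# 3. Count and sort by frequency, most frequent first.
word_counts <- sort(table(tokens), decreasing = TRUE)
print(word_counts)
```

The result plays the same role as `clothing_rating_low_wordcounts`: one entry per word, ordered by how often it occurs.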

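Experimenting with term frequency

# Note: despite the "_t1" suffix, this selects the reviews dominated by topic 2 (t_2 > 0.5).
words_t1 <- 
rating_low_theta_review %>% 
  arrange(desc(t_2)) %>%
  filter(t_2 > 0.5) %>% 
  select(t_2, Review.Text)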

# words_text <- 
# words_t1[,2]

df_t1 <- tibble(Text = words_t1[,2])  # tibble() replaces the deprecated data_frame()

# head(df_t1, n = 20)

#The unnest_tokens() function from the tidytext package picks out the individual words and places them as rows.
df_t1_words <- df_t1 %>%
                  unnest_tokens(output = word, input = Text)

# anti_join() removes the stop words (tidytext's stop_words) from df_t1_words.
df_t1_words <- df_t1_words %>%
                   anti_join(stop_words)

## Joining, by = "word"

#The count() function with the %>% pipe operator from the dplyr package is used to obtain counts of the words.
df_t1_wordcounts <- df_t1_words %>% count(word, sort = TRUE) %>% as.data.frame()

#df_t1_wordcounts

df_t1_wordcounts %>% 
  top_n(10) %>% 
  mutate(word = reorder(word, n)) %>% 
    ggplot(aes(word, n)) + 
    geom_col() +
    coord_flip() +
    labs(x = "Word \n", y = "\n Count ", title = "Frequent Words Clothing Review \n") +
    geom_text(aes(label = n), hjust = 1.2, colour = "white", fontface = "bold") +
    theme(plot.title = element_text(hjust = 0.5), 
        axis.title.x = element_text(face="bold", colour="darkblue", size = 12),
        axis.title.y = element_text(face="bold", colour="darkblue", size = 12))

## Selecting by n

# Table of topic interpretations and solutions

topic = c("t_1","t_2","t_3","t_4","t_5")
interpretasi = c("Non-standard sizing",
                 "Material quality: damaged after washing",
                 "Poor shape when worn, unlike the display",
                 "Fabric quality does not match the price",
                 "Disproportionate size or shape")
solusi = c("solusi_1", "solusi_2", "solusi_3", "solusi_4", "solusi_5")


data_solusi <- as.data.frame(cbind(topic, interpretasi, solusi))
data_solusi

##   topic                             interpretasi   solusi
## 1   t_1                      Non-standard sizing solusi_1
## 2   t_2  Material quality: damaged after washing solusi_2
## 3   t_3 Poor shape when worn, unlike the display solusi_3
## 4   t_4  Fabric quality does not match the price solusi_4
## 5   t_5           Disproportionate size or shape solusi_5

# Reshape the theta/review table into long format and keep only each review's dominant topic (theta > 0.5)

df_wordcounts_longer <- 
rating_low_theta_review %>%
  select(t_1, t_2, t_3, t_4, t_5, Review.Text) %>% 
  pivot_longer(cols= c(t_1, t_2, t_3, t_4, t_5), names_to = "topic") %>%
  filter(value > 0.5) %>%  as.data.frame()

#df_wordcounts_longer
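
The reshape-and-filter step above can be illustrated on toy data. The base R sketch below uses invented theta values with three topics and builds the long table by hand instead of with `pivot_longer()`; only the column names `Review.Text`, `topic` and `value` follow the code above.

```r
# Toy theta table: each row sums to 1 across the topic columns.
theta_toy <- data.frame(
  Review.Text = c("runs small", "zipper broke", "mixed complaint"),
  t_1 = c(0.80, 0.10, 0.40),
  t_2 = c(0.10, 0.80, 0.30),
  t_3 = c(0.10, 0.10, 0.30),
  stringsAsFactors = FALSE
)
topic_cols <- c("t_1", "t_2", "t_3")

# Long format: one row per (review, topic) pair.
long <- data.frame(
  Review.Text = rep(theta_toy$Review.Text, times = length(topic_cols)),
  topic = rep(topic_cols, each = nrow(theta_toy)),
  value = unlist(theta_toy[topic_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only pairs where the topic accounts for more than half of the review.
dominant <- long[long$value > 0.5, ]
print(dominant)
```

Because the theta values of a review sum to 1, at most one topic can exceed 0.5, so every review appears at most once after the filter. Reviews with no clearly dominant topic, such as "mixed complaint" here, are dropped.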

8 . Term Frequency in Long Format

words_longer_t1 <- 
df_wordcounts_longer %>% filter(topic=="t_1")

df_longer_t1 <- tibble(Text = words_longer_t1[,1])  # tibble() replaces the deprecated data_frame()

# head(df_t1, n = 20)

#The unnest_tokens() function from the tidytext package picks out the individual words and places them as rows.
df_longer_t1_words <- df_longer_t1 %>%
  mutate_all(as.character) %>%
  unnest_tokens(output = word, input = Text)

# anti_join() removes the stop words (tidytext's stop_words) from df_longer_t1_words.
df_longer_t1_words <- df_longer_t1_words %>%
                   anti_join(stop_words)

## Joining, by = "word"

#The count() function with the %>% pipe operator from the dplyr package is used to obtain counts of the words.
df_longer_t1_words <- df_longer_t1_words %>% count(word, sort = TRUE) %>% as.data.frame()

#df_longer_t1_words