Data Retail merupakan data tahun 2014-2017 yang berasal
dari sebuah perusahaan retail di mana datanya berfokus pada 3 kategori
produk yaitu furnitur, peralatan kantor, dan teknologi. Berikut
penjelasan masing-masing kolomnya:
Order.ID ID transaksiOrder.Date Tanggal transaksiShip.Date Tanggal pengirimanShip.Mode Jenis pengirimanCustomer.ID ID pelangganSegment Segmen pelangganProduct.ID ID produkCategory Kategori produk (furnitur, peralatan kantor,
teknologi)Sub-Category Jenis dari tiap kategory produkProduct.Name Nama produkSales Revenue dari transaksiQuantity Jumlah produk terjualDiscount Diskon tiap produk tiap transaksiProfit Keuntungan perusahaan tiap transaksi per
produklibrary(readr)
library(ggplot2)## Warning: package 'ggplot2' was built under R version 4.2.2
library(scales)##
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
##
## col_factor
library(dplyr)##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(skimr)
library(tidyr)
library(purrr)##
## Attaching package: 'purrr'
## The following object is masked from 'package:scales':
##
## discard
library(tibble)
library(stringr)retail <- read.csv("retail.csv")
#menampilkan 6 data teratas (6 data adalah default "head" di R)
head(retail)## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 1 1 CA-2016-152156 11/8/16 11/11/16 Second Class CG-12520
## 2 2 CA-2016-152156 11/8/16 11/11/16 Second Class CG-12520
## 3 3 CA-2016-138688 6/12/16 6/16/16 Second Class DV-13045
## 4 4 US-2015-108966 10/11/15 10/18/15 Standard Class SO-20335
## 5 5 US-2015-108966 10/11/15 10/18/15 Standard Class SO-20335
## 6 6 CA-2014-115812 6/9/14 6/14/14 Standard Class BH-11710
## Segment Product.ID Category Sub.Category
## 1 Consumer FUR-BO-10001798 Furniture Bookcases
## 2 Consumer FUR-CH-10000454 Furniture Chairs
## 3 Corporate OFF-LA-10000240 Office Supplies Labels
## 4 Consumer FUR-TA-10000577 Furniture Tables
## 5 Consumer OFF-ST-10000760 Office Supplies Storage
## 6 Consumer FUR-FU-10001487 Furniture Furnishings
## Product.Name Sales
## 1 Bush Somerset Collection Bookcase 261.9600
## 2 Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back 731.9400
## 3 Self-Adhesive Address Labels for Typewriters by Universal 14.6200
## 4 Bretford CR4500 Series Slim Rectangular Table 957.5775
## 5 Eldon Fold 'N Roll Cart System 22.3680
## 6 Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood 48.8600
## Quantity Discount Profit
## 1 2 0.00 41.9136
## 2 3 0.00 219.5820
## 3 2 0.00 6.8714
## 4 5 0.45 -383.0310
## 5 2 0.20 2.5164
## 6 7 0.00 14.1694
str(retail)## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
Mengubah tipe data untuk kolom Order.Date,
Ship.Date, month, dan year
menjadi data “datetime”, sementara Ship.Mode,
Segment, Category, dan
Sub.Category menjadi data “factor”.
retail$Order.Date <- mdy(retail$Order.Date)
retail$Ship.Date <- mdy(retail$Ship.Date)
retail$Ship.Mode <- as.factor(retail$Ship.Mode)
retail$Segment <- as.factor(retail$Segment)
retail$Category <- as.factor(retail$Category)
retail$Sub.Category <- as.factor(retail$Sub.Category)
retail$month <- month(retail$Order.Date, label = T, abbr = T)
retail$year <- year(retail$Order.Date)Mengecek apakah tipe data sudah berubah
str(retail)## 'data.frame': 9994 obs. of 17 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : Date, format: "2016-11-08" "2016-11-08" ...
## $ Ship.Date : Date, format: "2016-11-11" "2016-11-11" ...
## $ Ship.Mode : Factor w/ 4 levels "First Class",..: 3 3 3 4 4 4 4 4 4 4 ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : Factor w/ 3 levels "Consumer","Corporate",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : Factor w/ 3 levels "Furniture","Office Supplies",..: 1 1 2 1 2 1 2 3 2 2 ...
## $ Sub.Category: Factor w/ 17 levels "Accessories",..: 5 6 11 17 15 10 3 14 4 2 ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
## $ month : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 11 11 6 10 10 6 6 6 6 6 ...
## $ year : num 2016 2016 2016 2015 2015 ...
retail_clean <- retail %>%
arrange(retail$Order.Date)
head(retail_clean)## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 1 7981 CA-2014-103800 2014-01-03 2014-01-07 Standard Class DP-13000
## 2 740 CA-2014-112326 2014-01-04 2014-01-08 Standard Class PO-19195
## 3 741 CA-2014-112326 2014-01-04 2014-01-08 Standard Class PO-19195
## 4 742 CA-2014-112326 2014-01-04 2014-01-08 Standard Class PO-19195
## 5 1760 CA-2014-141817 2014-01-05 2014-01-12 Standard Class MB-18085
## 6 5328 CA-2014-130813 2014-01-06 2014-01-08 Second Class LS-17230
## Segment Product.ID Category Sub.Category
## 1 Consumer OFF-PA-10000174 Office Supplies Paper
## 2 Home Office OFF-LA-10003223 Office Supplies Labels
## 3 Home Office OFF-ST-10002743 Office Supplies Storage
## 4 Home Office OFF-BI-10004094 Office Supplies Binders
## 5 Consumer OFF-AR-10003478 Office Supplies Art
## 6 Consumer OFF-PA-10002005 Office Supplies Paper
## Product.Name
## 1 Message Book, Wirebound, Four 5 1/2" X 4" Forms/Pg., 200 Dupl. Sets/Book
## 2 Avery 508
## 3 SAFCO Boltless Steel Shelving
## 4 GBC Standard Plastic Binding Systems Combs
## 5 Avery Hi-Liter EverBold Pen Style Fluorescent Highlighters, 4/Pack
## 6 Xerox 225
## Sales Quantity Discount Profit month year
## 1 16.448 2 0.2 5.5512 Jan 2014
## 2 11.784 3 0.2 4.2717 Jan 2014
## 3 272.736 3 0.2 -64.7748 Jan 2014
## 4 3.540 2 0.8 -5.4870 Jan 2014
## 5 19.536 3 0.2 4.8840 Jan 2014
## 6 19.440 3 0.0 9.3312 Jan 2014
tail(retail_clean)## Row.ID Order.ID Order.Date Ship.Date Ship.Mode Customer.ID
## 9989 907 CA-2017-143259 2017-12-30 2018-01-03 Standard Class PO-18865
## 9990 908 CA-2017-143259 2017-12-30 2018-01-03 Standard Class PO-18865
## 9991 909 CA-2017-143259 2017-12-30 2018-01-03 Standard Class PO-18865
## 9992 1297 CA-2017-115427 2017-12-30 2018-01-03 Standard Class EB-13975
## 9993 1298 CA-2017-115427 2017-12-30 2018-01-03 Standard Class EB-13975
## 9994 5092 CA-2017-156720 2017-12-30 2018-01-03 Standard Class JM-15580
## Segment Product.ID Category Sub.Category
## 9989 Consumer FUR-BO-10003441 Furniture Bookcases
## 9990 Consumer TEC-PH-10004774 Technology Phones
## 9991 Consumer OFF-BI-10003684 Office Supplies Binders
## 9992 Corporate OFF-BI-10002103 Office Supplies Binders
## 9993 Corporate OFF-BI-10004632 Office Supplies Binders
## 9994 Consumer OFF-FA-10003472 Office Supplies Fasteners
## Product.Name Sales Quantity
## 9989 Bush Westfield Collection Bookcases, Fully Assembled 323.136 4
## 9990 Gear Head AU3700S Headset 90.930 7
## 9991 Wilson Jones Legal Size Ring Binders 52.776 3
## 9992 Cardinal Slant-D Ring Binder, Heavy Gauge Vinyl 13.904 2
## 9993 GBC Binding covers 20.720 2
## 9994 Bagged Rubber Bands 3.024 3
## Discount Profit month year
## 9989 0.2 12.1176 Dec 2017
## 9990 0.0 2.7279 Dec 2017
## 9991 0.2 19.7910 Dec 2017
## 9992 0.2 4.5188 Dec 2017
## 9993 0.2 6.4750 Dec 2017
## 9994 0.2 -0.6048 Dec 2017
Ada beberapa hal yang akan kita analisis dari data
retail, yaitu: - Kategori dengan penjualan terbesar pada
tahun 2017 - Pada bulan apa kategori tersebut memiliki penjualan
terbesar di tahun 2017 - Sub kategori dari kategori tersebur dengan
penjualan terbesar di tahun 2017
category_2017 <-
retail_clean %>%
select(Category, Sales, year) %>%
filter(year == 2017)
ggplot(category_2017, aes(Category,Sales)) + geom_col() + theme_minimal() +
labs(title = "Penjualan Terbesar Tahun 2017 per Kategori")
Pada tahun 2017, kategori teknologi memiliki nilai penjualan
terbesar.
tech_2017 <-
retail_clean %>%
select(Category, Sales, month, year) %>%
filter(Category == "Technology" & year == 2017)
tech_month <- aggregate( Sales ~ month , tech_2017 , sum )
ggplot(tech_2017, aes(month,Sales)) +
geom_col() +
geom_hline(yintercept = mean(tech_month$Sales, na.rm=TRUE), color='red', lty='dashed', lwd=2) +
theme_minimal() +
labs(title = "Penjualan Teknologi Selama 2017")## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
Penjualan teknologi terbesar pada tahun 2017 terjadi di Bulan November
dengan penjualan hampir $50,000. Pada tahun 2017, rata-rata penjualan
produk teknologi berada di sekitar $22,500.
tech <-
retail_clean %>%
select(Category, Sales, month, year, Sub.Category) %>%
filter(Category == "Technology" & year == 2017)
ggplot(tech, aes(Sub.Category,Sales)) +
geom_col() +
theme_minimal() +
labs(title = "Penjualan Sub-Kategori Teknologi Tahun 2017")Sub-kategori teknologi yang paling besar menyumbang penjualan adalah
Phones.