class: center, middle, inverse, title-slide

.title[
# SUPERSTORE SALES PREDICTION BY USING MACHINE LEARNING MODEL
]
.subtitle[
## WQD7004 PROGRAMMING DATA SCIENCE PROJECT
]
.author[
### DANIEL (S2115750), VIRGIL (S2136594), YUEN HERN (S2121801), SYAKIRAH (S2132021)
]
.institute[
### UNIVERSITI MALAYA
]

---

## Project Background

A superstore sells goods and services to consumers. Many superstore companies operate around the world, for example Tesco, Target and Walmart. This project works on a dataset from one such superstore and uses a machine learning model to predict and forecast its sales.

<img src="data:image/png;base64,#img/pic1.jpg" width="450" height="300" style="display: block; margin: 0 auto" />

---

## Problem Statement

With growing demand and cut-throat competition in the market, a company needs to understand what works best for it. A good strategy is also key to staying relevant against the other large superstore companies. Hence, superstore attributes such as products, regions, categories and customer segments are among the factors studied in this project.

## Project Objective

- To explore the shopping attributes of the superstore and find patterns in them.
- To develop a machine learning model that predicts the sales of the superstore.
- To evaluate the machine learning model on predicting the sales of the superstore.

---

## Project Methodology

This project consists of 5 steps: Data Collection, Data Pre-processing, Data Exploration, Modelling and Evaluation.

<img src="data:image/png;base64,#img/pic6.jpg" width="800" height="300" style="display: block; margin: 0 auto" />

---

## Data Collection

The data is collected from the Kaggle website. It consists of 21 columns and 9,994 rows. The XLS file contains 3 sheets:

- Orders: List of transactions
- Returns: List of items returned
- People: List of salespersons for West, East, Central and South

All the sheets are linked to one another.
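How sheets that share a key can be merged is sketched here in plain Python. The slides do not show the merge code, so the join key (`Order ID`), the helper `left_join` and the three-row toy tables are assumptions made purely for illustration; this is not the project's own implementation.

```python
# Hypothetical miniature versions of the Orders and Returns sheets.
orders = [
    {"Order ID": "CA-2016-152156", "Sales": 261.96},
    {"Order ID": "CA-2014-143336", "Sales": 8.56},
    {"Order ID": "CA-2014-143336", "Sales": 213.48},
]
returns = [{"Order ID": "CA-2014-143336", "Returned": "Yes"}]


def left_join(left, right, key, column):
    """Attach `column` from `right` to every row of `left` that shares `key`."""
    lookup = {row[key]: row[column] for row in right}
    # Rows without a match get None, which plays the role of NA.
    return [{**row, column: lookup.get(row[key])} for row in left]


# Assumed join key: "Order ID".
df = left_join(orders, returns, key="Order ID", column="Returned")
for row in df:
    print(row)
```

With a left join, orders that have no matching return keep a missing `Returned` value, so every transaction is retained.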
Hence, the sheets are combined into a single data frame called `df`. A sample of the data is shown on the next slide.
<br/><br/>The dataset is available on Kaggle at:
<br/> https://www.kaggle.com/datasets/vivek468/superstore-dataset-final?select=Sample+-+Superstore.csv

---

## Sample of the Dataset

These are 50 sample rows from the original dataset.

<div data-pagedtable="false">
<script data-pagedtable-source type="application/json">
{"columns":[{"label":["Row ID"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Order ID"],"name":[2],"type":["chr"],"align":["left"]},{"label":["Order Date"],"name":[3],"type":["date"],"align":["right"]},{"label":["Ship Date"],"name":[4],"type":["date"],"align":["right"]},{"label":["Ship Mode"],"name":[5],"type":["chr"],"align":["left"]},{"label":["Customer ID"],"name":[6],"type":["chr"],"align":["left"]},{"label":["Customer Name"],"name":[7],"type":["chr"],"align":["left"]},{"label":["Segment"],"name":[8],"type":["chr"],"align":["left"]},{"label":["Country"],"name":[9],"type":["chr"],"align":["left"]},{"label":["City"],"name":[10],"type":["chr"],"align":["left"]},{"label":["State"],"name":[11],"type":["chr"],"align":["left"]},{"label":["Postal Code"],"name":[12],"type":["dbl"],"align":["right"]},{"label":["Region"],"name":[13],"type":["chr"],"align":["left"]},{"label":["Product ID"],"name":[14],"type":["chr"],"align":["left"]},{"label":["Category"],"name":[15],"type":["chr"],"align":["left"]},{"label":["Sub-Category"],"name":[16],"type":["chr"],"align":["left"]},{"label":["Product 
Name"],"name":[17],"type":["chr"],"align":["left"]},{"label":["Sales"],"name":[18],"type":["dbl"],"align":["right"]},{"label":["Quantity"],"name":[19],"type":["dbl"],"align":["right"]},{"label":["Discount"],"name":[20],"type":["dbl"],"align":["right"]},{"label":["Profit"],"name":[21],"type":["dbl"],"align":["right"]},{"label":["Returned"],"name":[22],"type":["chr"],"align":["left"]}],"data":[{"1":"1","2":"CA-2016-152156","3":"2016-11-08","4":"2016-11-11","5":"Second Class","6":"CG-12520","7":"Claire Gute","8":"Consumer","9":"United States","10":"Henderson","11":"KENTUCKY","12":"42420","13":"South","14":"FUR-BO-10001798","15":"Furniture","16":"Bookcases","17":"Bush Somerset Collection Bookcase","18":"261.9600","19":"2","20":"0.00","21":"41.9136","22":"NA"},{"1":"2","2":"CA-2016-152156","3":"2016-11-08","4":"2016-11-11","5":"Second Class","6":"CG-12520","7":"Claire Gute","8":"Consumer","9":"United States","10":"Henderson","11":"kentucky","12":"42420","13":"South","14":"FUR-CH-10000454","15":"Furniture","16":"Chairs","17":"Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back","18":"731.9400","19":"-3","20":"0.00","21":"219.5820","22":"NA"},{"1":"3","2":"CA-2016-138688","3":"2016-06-12","4":"2016-06-16","5":"Second Class","6":"DV-13045","7":"Darrin Van Huff","8":"Corporate","9":"United States","10":"Los Angeles","11":"California","12":"90036","13":"West","14":"OFF-LA-10000240","15":"Office Supplies","16":"Labels","17":"Self-Adhesive Address Labels for Typewriters by Universal","18":"14.6200","19":"2","20":"0.00","21":"6.8714","22":"NA"},{"1":"4","2":"US-2015-108966","3":"2015-10-11","4":"2015-10-18","5":"NA","6":"SO-20335","7":"Sean O'Donnell","8":"Consumer","9":"United States","10":"Fort Lauderdale","11":"florida","12":"33311","13":"South","14":"FUR-TA-10000577","15":"Furniture","16":"Tables","17":"Bretford CR4500 Series Slim Rectangular 
Table","18":"957.5775","19":"5","20":"0.45","21":"-383.0310","22":"NA"},{"1":"5","2":"US-2015-108966","3":"2015-10-11","4":"2015-10-18","5":"Standard Class","6":"SO-20335","7":"Sean O'Donnell","8":"Consumer","9":"United States","10":"Fort Lauderdale","11":"Florida","12":"33311","13":"South","14":"OFF-ST-10000760","15":"Office Supplies","16":"Storage","17":"Eldon Fold 'N Roll Cart System","18":"22.3680","19":"2","20":"0.20","21":"2.5164","22":"NA"},{"1":"6","2":"CA-2014-115812","3":"2014-06-09","4":"2014-06-14","5":"Standard Class","6":"BH-11710","7":"Brosina Hoffman","8":"Consumer","9":"United States","10":"Los Angeles","11":"California","12":"90032","13":"West","14":"FUR-FU-10001487","15":"Furniture","16":"Furnishings","17":"Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood","18":"48.8600","19":"-7","20":"0.00","21":"14.1694","22":"NA"},{"1":"7","2":"CA-2014-115812","3":"2014-06-09","4":"2014-06-14","5":"NA","6":"BH-11710","7":"Brosina Hoffman","8":"Consumer","9":"United States","10":"Los Angeles","11":"CALIFORNIA","12":"90032","13":"West","14":"OFF-AR-10002833","15":"Office Supplies","16":"Art","17":"Newell 322","18":"7.2800","19":"4","20":"0.00","21":"1.9656","22":"NA"},{"1":"8","2":"CA-2014-115812","3":"2014-06-09","4":"2014-06-14","5":"Standard Class","6":"BH-11710","7":"Brosina Hoffman","8":"Consumer","9":"United States","10":"Los Angeles","11":"California","12":"90032","13":"West","14":"TEC-PH-10002275","15":"Technology","16":"Phones","17":"Mitel 5320 IP Phone VoIP phone","18":"907.1520","19":"6","20":"0.20","21":"90.7152","22":"NA"},{"1":"9","2":"CA-2014-115812","3":"2014-06-09","4":"2014-06-14","5":"Standard Class","6":"BH-11710","7":"Brosina Hoffman","8":"Consumer","9":"United States","10":"Los Angeles","11":"California","12":"90032","13":"West","14":"OFF-BI-10003910","15":"Office Supplies","16":"Binders","17":"DXL Angle-View Binders with Locking Rings by 
Samsill","18":"18.5040","19":"3","20":"0.20","21":"5.7825","22":"NA"},{"1":"10","2":"CA-2014-115812","3":"2014-06-09","4":"2014-06-14","5":"NA","6":"BH-11710","7":"Brosina Hoffman","8":"Consumer","9":"United States","10":"Los Angeles","11":"California","12":"90032","13":"West","14":"OFF-AP-10002892","15":"Office Supplies","16":"Appliances","17":"Belkin F5C206VTEL 6 Outlet Surge","18":"114.9000","19":"5","20":"0.00","21":"34.4700","22":"NA"},{"1":"11","2":"CA-2014-115812","3":"2014-06-09","4":"2014-06-14","5":"Standard Class","6":"BH-11710","7":"Brosina Hoffman","8":"Consumer","9":"United States","10":"Los Angeles","11":"California","12":"90032","13":"West","14":"FUR-TA-10001539","15":"Furniture","16":"Tables","17":"Chromcraft Rectangular Conference Tables","18":"1706.1840","19":"9","20":"0.20","21":"85.3092","22":"NA"},{"1":"12","2":"CA-2014-115812","3":"2014-06-09","4":"2014-06-14","5":"Standard Class","6":"BH-11710","7":"Brosina Hoffman","8":"Consumer","9":"United States","10":"Los Angeles","11":"California","12":"90032","13":"West","14":"TEC-PH-10002033","15":"Technology","16":"Phones","17":"Konftel 250 Conference phone - Charcoal black","18":"911.4240","19":"4","20":"0.20","21":"68.3568","22":"NA"},{"1":"13","2":"CA-2017-114412","3":"2017-04-15","4":"2017-04-20","5":"Standard Class","6":"AA-10480","7":"Andrew Allen","8":"Consumer","9":"United States","10":"Concord","11":"North Carolina","12":"28027","13":"South","14":"OFF-PA-10002365","15":"Office Supplies","16":"Paper","17":"Xerox 1967","18":"15.5520","19":"3","20":"0.20","21":"5.4432","22":"NA"},{"1":"14","2":"CA-2016-161389","3":"2016-12-05","4":"2016-12-10","5":"Standard Class","6":"IM-15070","7":"Irene Maddox","8":"Consumer","9":"United States","10":"Seattle","11":"Washington","12":"98103","13":"West","14":"OFF-BI-10003656","15":"Office Supplies","16":"Binders","17":"Fellowes PB200 Plastic Comb Binding 
Machine","18":"407.9760","19":"3","20":"0.20","21":"132.5922","22":"NA"},{"1":"15","2":"US-2015-118983","3":"2015-11-22","4":"2015-11-26","5":"NA","6":"HP-14815","7":"Harold Pawlan","8":"Home Office","9":"United States","10":"Fort Worth","11":"Texas","12":"76106","13":"Central","14":"OFF-AP-10002311","15":"Office Supplies","16":"Appliances","17":"Holmes Replacement Filter for HEPA Air Cleaner, Very Large Room, HEPA Filter","18":"68.8100","19":"-5","20":"0.80","21":"-123.8580","22":"NA"},{"1":"16","2":"US-2015-118983","3":"2015-11-22","4":"2015-11-26","5":"Standard Class","6":"HP-14815","7":"Harold Pawlan","8":"Home Office","9":"United States","10":"Fort Worth","11":"Texas","12":"76106","13":"Central","14":"OFF-BI-10000756","15":"Office Supplies","16":"Binders","17":"Storex DuraTech Recycled Plastic Frosted Binders","18":"2.5440","19":"3","20":"0.80","21":"-3.8160","22":"NA"},{"1":"17","2":"CA-2014-105893","3":"2014-11-11","4":"2014-11-18","5":"Standard Class","6":"PK-19075","7":"Pete Kriz","8":"Consumer","9":"United States","10":"Madison","11":"Wisconsin","12":"53711","13":"Central","14":"OFF-ST-10004186","15":"Office Supplies","16":"Storage","17":"Stur-D-Stor Shelving, Vertical 5-Shelf: 72\"H x 36\"W x 18 1/2\"D","18":"665.8800","19":"6","20":"0.00","21":"13.3176","22":"NA"},{"1":"18","2":"CA-2014-167164","3":"2014-05-13","4":"2014-05-15","5":"Second Class","6":"AG-10270","7":"Alejandro Grove","8":"Consumer","9":"United States","10":"West Jordan","11":"Utah","12":"84084","13":"West","14":"OFF-ST-10000107","15":"Office Supplies","16":"Storage","17":"Fellowes Super Stor/Drawer","18":"55.5000","19":"2","20":"0.00","21":"9.9900","22":"NA"},{"1":"19","2":"CA-2014-143336","3":"2014-08-27","4":"2014-09-01","5":"Second Class","6":"ZD-21925","7":"Zuschuss Donatelli","8":"Consumer","9":"United States","10":"San Francisco","11":"California","12":"94109","13":"West","14":"OFF-AR-10003056","15":"Office Supplies","16":"Art","17":"Newell 
341","18":"8.5600","19":"2","20":"0.00","21":"2.4824","22":"Yes"},{"1":"20","2":"CA-2014-143336","3":"2014-08-27","4":"2014-09-01","5":"Second Class","6":"ZD-21925","7":"Zuschuss Donatelli","8":"Consumer","9":"United States","10":"San Francisco","11":"California","12":"94109","13":"West","14":"TEC-PH-10001949","15":"Technology","16":"Phones","17":"Cisco SPA 501G IP Phone","18":"213.4800","19":"3","20":"0.20","21":"16.0110","22":"Yes"},{"1":"21","2":"CA-2014-143336","3":"2014-08-27","4":"2014-09-01","5":"Second Class","6":"ZD-21925","7":"Zuschuss Donatelli","8":"Consumer","9":"United States","10":"San Francisco","11":"California","12":"94109","13":"West","14":"OFF-BI-10002215","15":"Office Supplies","16":"Binders","17":"Wilson Jones Hanging View Binder, White, 1\"","18":"22.7200","19":"-4","20":"0.20","21":"7.3840","22":"Yes"},{"1":"22","2":"CA-2016-137330","3":"2016-12-09","4":"2016-12-13","5":"Standard Class","6":"KB-16585","7":"Ken Black","8":"Corporate","9":"United States","10":"Fremont","11":"Nebraska","12":"68025","13":"Central","14":"OFF-AR-10000246","15":"Office Supplies","16":"Art","17":"Newell 318","18":"19.4600","19":"7","20":"NA","21":"5.0596","22":"NA"},{"1":"23","2":"CA-2016-137330","3":"2016-12-09","4":"2016-12-13","5":"Standard Class","6":"KB-16585","7":"Ken Black","8":"Corporate","9":"United States","10":"Fremont","11":"Nebraska","12":"68025","13":"Central","14":"OFF-AP-10001492","15":"Office Supplies","16":"Appliances","17":"Acco Six-Outlet Power Strip, 4' Cord Length","18":"60.3400","19":"7","20":"0.00","21":"15.6884","22":"NA"},{"1":"24","2":"US-2017-156909","3":"2017-07-16","4":"2017-07-18","5":"Second Class","6":"SF-20065","7":"Sandra Flanagan","8":"Consumer","9":"United States","10":"Philadelphia","11":"Pennsylvania","12":"19140","13":"East","14":"FUR-CH-10002774","15":"Furniture","16":"Chairs","17":"Global Deluxe Stacking Chair, 
Gray","18":"71.3720","19":"2","20":"0.30","21":"-1.0196","22":"NA"},{"1":"25","2":"CA-2015-106320","3":"2015-09-25","4":"2015-09-30","5":"Standard Class","6":"EB-13870","7":"Emily Burns","8":"Consumer","9":"United States","10":"Orem","11":"UTAH","12":"84057","13":"West","14":"FUR-TA-10000577","15":"Furniture","16":"Tables","17":"Bretford CR4500 Series Slim Rectangular Table","18":"1044.6300","19":"3","20":"0.00","21":"240.2649","22":"NA"},{"1":"26","2":"CA-2016-121755","3":"2016-01-16","4":"2016-01-20","5":"Second Class","6":"EH-13945","7":"Eric Hoffmann","8":"Consumer","9":"United States","10":"Los Angeles","11":"California","12":"90049","13":"West","14":"OFF-BI-10001634","15":"Office Supplies","16":"Binders","17":"Wilson Jones Active Use Binders","18":"11.6480","19":"2","20":"0.20","21":"4.2224","22":"NA"},{"1":"27","2":"CA-2016-121755","3":"2016-01-16","4":"2016-01-20","5":"Second Class","6":"EH-13945","7":"Eric Hoffmann","8":"Consumer","9":"United States","10":"Los Angeles","11":"California","12":"90049","13":"West","14":"TEC-AC-10003027","15":"Technology","16":"Accessories","17":"Imation 8GB Mini TravelDrive USB 2.0 Flash Drive","18":"90.5700","19":"3","20":"0.00","21":"11.7741","22":"NA"},{"1":"28","2":"US-2015-150630","3":"2015-09-17","4":"2015-09-21","5":"Standard Class","6":"TB-21520","7":"Tracy Blumstein","8":"Consumer","9":"United States","10":"Philadelphia","11":"Pennsylvania","12":"19140","13":"East","14":"FUR-BO-10004834","15":"Furniture","16":"Bookcases","17":"Riverside Palais Royal Lawyers Bookcase, Royale Cherry Finish","18":"3083.4300","19":"7","20":"0.50","21":"-1665.0522","22":"NA"},{"1":"29","2":"US-2015-150630","3":"2015-09-17","4":"2015-09-21","5":"Standard Class","6":"TB-21520","7":"Tracy Blumstein","8":"Consumer","9":"United States","10":"Philadelphia","11":"Pennsylvania","12":"19140","13":"East","14":"OFF-BI-10000474","15":"Office Supplies","16":"Binders","17":"Avery Recycled Flexi-View Covers for Binding 
Systems","18":"9.6180","19":"2","20":"0.70","21":"-7.0532","22":"NA"},{"1":"30","2":"US-2015-150630","3":"2015-09-17","4":"2015-09-21","5":"NA","6":"TB-21520","7":"Tracy Blumstein","8":"Consumer","9":"United States","10":"Philadelphia","11":"Pennsylvania","12":"19140","13":"East","14":"FUR-FU-10004848","15":"Furniture","16":"Furnishings","17":"Howard Miller 13-3/4\" Diameter Brushed Chrome Round Wall Clock","18":"124.2000","19":"3","20":"0.20","21":"15.5250","22":"NA"},{"1":"31","2":"US-2015-150630","3":"2015-09-17","4":"2015-09-21","5":"Standard Class","6":"TB-21520","7":"Tracy Blumstein","8":"Consumer","9":"United States","10":"Philadelphia","11":"Pennsylvania","12":"19140","13":"East","14":"OFF-EN-10001509","15":"Office Supplies","16":"Envelopes","17":"Poly String Tie Envelopes","18":"3.2640","19":"2","20":"0.20","21":"1.1016","22":"NA"},{"1":"32","2":"US-2015-150630","3":"2015-09-17","4":"2015-09-21","5":"NA","6":"TB-21520","7":"Tracy Blumstein","8":"Consumer","9":"United States","10":"Philadelphia","11":"Pennsylvania","12":"19140","13":"East","14":"OFF-AR-10004042","15":"Office Supplies","16":"Art","17":"BOSTON Model 1800 Electric Pencil Sharpeners, Putty/Woodgrain","18":"86.3040","19":"6","20":"0.20","21":"9.7092","22":"NA"},{"1":"33","2":"US-2015-150630","3":"2015-09-17","4":"2015-09-21","5":"NA","6":"TB-21520","7":"Tracy Blumstein","8":"Consumer","9":"United States","10":"Philadelphia","11":"Pennsylvania","12":"19140","13":"East","14":"OFF-BI-10001525","15":"Office Supplies","16":"Binders","17":"Acco Pressboard Covers with Storage Hooks, 14 7/8\" x 11\", Executive Red","18":"6.8580","19":"6","20":"0.70","21":"-5.7150","22":"NA"},{"1":"34","2":"US-2015-150630","3":"2015-09-17","4":"2015-09-21","5":"Standard Class","6":"TB-21520","7":"Tracy Blumstein","8":"Consumer","9":"United States","10":"Philadelphia","11":"Pennsylvania","12":"19140","13":"East","14":"OFF-AR-10001683","15":"Office Supplies","16":"Art","17":"Lumber 
Crayons","18":"15.7600","19":"2","20":"0.20","21":"3.5460","22":"NA"},{"1":"35","2":"CA-2017-107727","3":"2017-10-19","4":"2017-10-23","5":"Second Class","6":"MA-17560","7":"Matt Abelman","8":"Home Office","9":"United States","10":"Houston","11":"Texas","12":"77095","13":"Central","14":"OFF-PA-10000249","15":"Office Supplies","16":"Paper","17":"Easy-staple paper","18":"29.4720","19":"3","20":"0.20","21":"9.9468","22":"NA"},{"1":"36","2":"CA-2016-117590","3":"2016-12-08","4":"2016-12-10","5":"First Class","6":"GH-14485","7":"Gene Hale","8":"Corporate","9":"United States","10":"Richardson","11":"Texas","12":"75080","13":"Central","14":"TEC-PH-10004977","15":"Technology","16":"Phones","17":"GE 30524EE4","18":"1097.5440","19":"7","20":"0.20","21":"123.4737","22":"NA"},{"1":"37","2":"CA-2016-117590","3":"2016-12-08","4":"2016-12-10","5":"First Class","6":"GH-14485","7":"Gene Hale","8":"Corporate","9":"United States","10":"Richardson","11":"Texas","12":"75080","13":"Central","14":"FUR-FU-10003664","15":"Furniture","16":"Furnishings","17":"Electrix Architect's Clamp-On Swing Arm Lamp, Black","18":"190.9200","19":"5","20":"0.60","21":"-147.9630","22":"NA"},{"1":"38","2":"CA-2015-117415","3":"2015-12-27","4":"2015-12-31","5":"Standard Class","6":"SN-20710","7":"Steve Nguyen","8":"Home Office","9":"United States","10":"Houston","11":"TEXAS","12":"77041","13":"Central","14":"OFF-EN-10002986","15":"Office Supplies","16":"Envelopes","17":"#10-4 1/8\" x 9 1/2\" Premium Diagonal Seam Envelopes","18":"113.3280","19":"9","20":"0.20","21":"35.4150","22":"NA"},{"1":"39","2":"CA-2015-117415","3":"2015-12-27","4":"2015-12-31","5":"Standard Class","6":"SN-20710","7":"Steve Nguyen","8":"Home Office","9":"United States","10":"Houston","11":"Texas","12":"77041","13":"Central","14":"FUR-BO-10002545","15":"Furniture","16":"Bookcases","17":"Atlantic Metals Mobile 3-Shelf Bookcases, Custom 
Colors","18":"532.3992","19":"3","20":"0.32","21":"-46.9764","22":"NA"},{"1":"40","2":"CA-2015-117415","3":"2015-12-27","4":"2015-12-31","5":"NA","6":"SN-20710","7":"Steve Nguyen","8":"Home Office","9":"United States","10":"Houston","11":"Texas","12":"77041","13":"Central","14":"FUR-CH-10004218","15":"Furniture","16":"Chairs","17":"Global Fabric Manager's Chair, Dark Gray","18":"212.0580","19":"3","20":"0.30","21":"-15.1470","22":"NA"},{"1":"41","2":"CA-2015-117415","3":"2015-12-27","4":"2015-12-31","5":"Standard Class","6":"SN-20710","7":"Steve Nguyen","8":"Home Office","9":"United States","10":"Houston","11":"Texas","12":"77041","13":"Central","14":"TEC-PH-10000486","15":"Technology","16":"Phones","17":"Plantronics HL10 Handset Lifter","18":"371.1680","19":"4","20":"0.20","21":"41.7564","22":"NA"},{"1":"42","2":"CA-2017-120999","3":"2017-09-10","4":"2017-09-15","5":"Standard Class","6":"LC-16930","7":"Linda Cazamias","8":"Corporate","9":"United States","10":"Naperville","11":"Illinois","12":"60540","13":"Central","14":"TEC-PH-10004093","15":"Technology","16":"Phones","17":"Panasonic Kx-TS550","18":"147.1680","19":"4","20":"0.20","21":"16.5564","22":"NA"},{"1":"43","2":"CA-2016-101343","3":"2016-07-17","4":"2016-07-22","5":"Standard Class","6":"RA-19885","7":"Ruben Ausman","8":"Corporate","9":"United States","10":"Los Angeles","11":"California","12":"90049","13":"West","14":"OFF-ST-10003479","15":"Office Supplies","16":"Storage","17":"Eldon Base for stackable storage shelf, platinum","18":"77.8800","19":"2","20":"0.00","21":"3.8940","22":"NA"},{"1":"44","2":"CA-2017-139619","3":"2017-09-19","4":"2017-09-23","5":"Standard Class","6":"ES-14080","7":"Erin Smith","8":"Corporate","9":"United States","10":"Melbourne","11":"Florida","12":"32935","13":"South","14":"OFF-ST-10003282","15":"Office Supplies","16":"Storage","17":"Advantus 10-Drawer Portable Organizer, Chrome Metal Frame, Smoke 
Drawers","18":"95.6160","19":"2","20":"0.20","21":"9.5616","22":"NA"},{"1":"45","2":"CA-2016-118255","3":"2016-03-11","4":"2016-03-13","5":"First Class","6":"ON-18715","7":"Odella Nelson","8":"Corporate","9":"United States","10":"Eagan","11":"Minnesota","12":"55122","13":"Central","14":"TEC-AC-10000171","15":"Technology","16":"Accessories","17":"Verbatim 25 GB 6x Blu-ray Single Layer Recordable Disc, 25/Pack","18":"45.9800","19":"2","20":"0.00","21":"19.7714","22":"NA"},{"1":"46","2":"CA-2016-118255","3":"2016-03-11","4":"2016-03-13","5":"First Class","6":"ON-18715","7":"Odella Nelson","8":"Corporate","9":"United States","10":"Eagan","11":"Minnesota","12":"55122","13":"Central","14":"OFF-BI-10003291","15":"Office Supplies","16":"Binders","17":"Wilson Jones Leather-Like Binders with DublLock Round Rings","18":"17.4600","19":"2","20":"0.00","21":"8.2062","22":"NA"},{"1":"47","2":"CA-2014-146703","3":"2014-10-20","4":"2014-10-25","5":"Second Class","6":"PO-18865","7":"Patrick O'Donnell","8":"Consumer","9":"United States","10":"Westland","11":"Michigan","12":"48185","13":"Central","14":"OFF-ST-10001713","15":"Office Supplies","16":"Storage","17":"Gould Plastics 9-Pocket Panel Bin, 18-3/8w x 5-1/4d x 20-1/2h, Black","18":"211.9600","19":"4","20":"0.00","21":"8.4784","22":"NA"},{"1":"48","2":"CA-2016-169194","3":"2016-06-20","4":"2016-06-25","5":"Standard Class","6":"LH-16900","7":"Lena Hernandez","8":"Consumer","9":"United States","10":"Dover","11":"Delaware","12":"19901","13":"East","14":"TEC-AC-10002167","15":"Technology","16":"Accessories","17":"Imation 8gb Micro Traveldrive Usb 2.0 Flash Drive","18":"45.0000","19":"3","20":"0.00","21":"4.9500","22":"NA"},{"1":"49","2":"CA-2016-169194","3":"2016-06-20","4":"2016-06-25","5":"Standard Class","6":"LH-16900","7":"Lena Hernandez","8":"Consumer","9":"United States","10":"Dover","11":"Delaware","12":"19901","13":"East","14":"TEC-PH-10003988","15":"Technology","16":"Phones","17":"LF Elite 3D Dazzle Designer Hard Case Cover, 
Lf Stylus Pen and Wiper For Apple Iphone 5c Mini Lite","18":"21.8000","19":"-2","20":"0.00","21":"6.1040","22":"NA"},{"1":"50","2":"CA-2015-115742","3":"2015-04-18","4":"2015-04-22","5":"Standard Class","6":"DP-13000","7":"Darren Powers","8":"Consumer","9":"United States","10":"New Albany","11":"Indiana","12":"47150","13":"Central","14":"OFF-BI-10004410","15":"Office Supplies","16":"Binders","17":"C-Line Peel & Stick Add-On Filing Pockets, 8-3/4 x 5-1/8, 10/Pack","18":"38.2200","19":"6","20":"0.00","21":"17.9634","22":"NA"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
</script>
</div>

---

## Drop Column

Columns that do not contribute to the analysis are dropped, for example Row ID, Customer ID, Order ID, Customer Name, Product Name, Postal Code, Product ID and Country. Country is dropped because it holds only a single value, United States.

<div data-pagedtable="false">
<script data-pagedtable-source type="application/json">
{"columns":[{"label":["Ship Mode"],"name":[1],"type":["chr"],"align":["left"]},{"label":["Segment"],"name":[2],"type":["chr"],"align":["left"]},{"label":["City"],"name":[3],"type":["chr"],"align":["left"]},{"label":["State"],"name":[4],"type":["chr"],"align":["left"]},{"label":["Region"],"name":[5],"type":["chr"],"align":["left"]},{"label":["Category"],"name":[6],"type":["chr"],"align":["left"]},{"label":["Sub-Category"],"name":[7],"type":["chr"],"align":["left"]},{"label":["Sales"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["Quantity"],"name":[9],"type":["dbl"],"align":["right"]},{"label":["Discount"],"name":[10],"type":["dbl"],"align":["right"]},{"label":["Profit"],"name":[11],"type":["dbl"],"align":["right"]},{"label":["Returned"],"name":[12],"type":["chr"],"align":["left"]}],"data":[{"1":"Second 
Class","2":"Consumer","3":"Henderson","4":"KENTUCKY","5":"South","6":"Furniture","7":"Bookcases","8":"261.9600","9":"2","10":"0.00","11":"41.9136","12":"NA"},{"1":"Second Class","2":"Consumer","3":"Henderson","4":"kentucky","5":"South","6":"Furniture","7":"Chairs","8":"731.9400","9":"-3","10":"0.00","11":"219.5820","12":"NA"},{"1":"Second Class","2":"Corporate","3":"Los Angeles","4":"California","5":"West","6":"Office Supplies","7":"Labels","8":"14.6200","9":"2","10":"0.00","11":"6.8714","12":"NA"},{"1":"NA","2":"Consumer","3":"Fort Lauderdale","4":"florida","5":"South","6":"Furniture","7":"Tables","8":"957.5775","9":"5","10":"0.45","11":"-383.0310","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Fort Lauderdale","4":"Florida","5":"South","6":"Office Supplies","7":"Storage","8":"22.3680","9":"2","10":"0.20","11":"2.5164","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Los Angeles","4":"California","5":"West","6":"Furniture","7":"Furnishings","8":"48.8600","9":"-7","10":"0.00","11":"14.1694","12":"NA"},{"1":"NA","2":"Consumer","3":"Los Angeles","4":"CALIFORNIA","5":"West","6":"Office Supplies","7":"Art","8":"7.2800","9":"4","10":"0.00","11":"1.9656","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Los Angeles","4":"California","5":"West","6":"Technology","7":"Phones","8":"907.1520","9":"6","10":"0.20","11":"90.7152","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Los Angeles","4":"California","5":"West","6":"Office Supplies","7":"Binders","8":"18.5040","9":"3","10":"0.20","11":"5.7825","12":"NA"},{"1":"NA","2":"Consumer","3":"Los Angeles","4":"California","5":"West","6":"Office Supplies","7":"Appliances","8":"114.9000","9":"5","10":"0.00","11":"34.4700","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Los Angeles","4":"California","5":"West","6":"Furniture","7":"Tables","8":"1706.1840","9":"9","10":"0.20","11":"85.3092","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Los 
Angeles","4":"California","5":"West","6":"Technology","7":"Phones","8":"911.4240","9":"4","10":"0.20","11":"68.3568","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Concord","4":"North Carolina","5":"South","6":"Office Supplies","7":"Paper","8":"15.5520","9":"3","10":"0.20","11":"5.4432","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Seattle","4":"Washington","5":"West","6":"Office Supplies","7":"Binders","8":"407.9760","9":"3","10":"0.20","11":"132.5922","12":"NA"},{"1":"NA","2":"Home Office","3":"Fort Worth","4":"Texas","5":"Central","6":"Office Supplies","7":"Appliances","8":"68.8100","9":"-5","10":"0.80","11":"-123.8580","12":"NA"},{"1":"Standard Class","2":"Home Office","3":"Fort Worth","4":"Texas","5":"Central","6":"Office Supplies","7":"Binders","8":"2.5440","9":"3","10":"0.80","11":"-3.8160","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Madison","4":"Wisconsin","5":"Central","6":"Office Supplies","7":"Storage","8":"665.8800","9":"6","10":"0.00","11":"13.3176","12":"NA"},{"1":"Second Class","2":"Consumer","3":"West Jordan","4":"Utah","5":"West","6":"Office Supplies","7":"Storage","8":"55.5000","9":"2","10":"0.00","11":"9.9900","12":"NA"},{"1":"Second Class","2":"Consumer","3":"San Francisco","4":"California","5":"West","6":"Office Supplies","7":"Art","8":"8.5600","9":"2","10":"0.00","11":"2.4824","12":"Yes"},{"1":"Second Class","2":"Consumer","3":"San Francisco","4":"California","5":"West","6":"Technology","7":"Phones","8":"213.4800","9":"3","10":"0.20","11":"16.0110","12":"Yes"},{"1":"Second Class","2":"Consumer","3":"San Francisco","4":"California","5":"West","6":"Office Supplies","7":"Binders","8":"22.7200","9":"-4","10":"0.20","11":"7.3840","12":"Yes"},{"1":"Standard Class","2":"Corporate","3":"Fremont","4":"Nebraska","5":"Central","6":"Office Supplies","7":"Art","8":"19.4600","9":"7","10":"NA","11":"5.0596","12":"NA"},{"1":"Standard Class","2":"Corporate","3":"Fremont","4":"Nebraska","5":"Central","6":"Office 
Supplies","7":"Appliances","8":"60.3400","9":"7","10":"0.00","11":"15.6884","12":"NA"},{"1":"Second Class","2":"Consumer","3":"Philadelphia","4":"Pennsylvania","5":"East","6":"Furniture","7":"Chairs","8":"71.3720","9":"2","10":"0.30","11":"-1.0196","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Orem","4":"UTAH","5":"West","6":"Furniture","7":"Tables","8":"1044.6300","9":"3","10":"0.00","11":"240.2649","12":"NA"},{"1":"Second Class","2":"Consumer","3":"Los Angeles","4":"California","5":"West","6":"Office Supplies","7":"Binders","8":"11.6480","9":"2","10":"0.20","11":"4.2224","12":"NA"},{"1":"Second Class","2":"Consumer","3":"Los Angeles","4":"California","5":"West","6":"Technology","7":"Accessories","8":"90.5700","9":"3","10":"0.00","11":"11.7741","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Philadelphia","4":"Pennsylvania","5":"East","6":"Furniture","7":"Bookcases","8":"3083.4300","9":"7","10":"0.50","11":"-1665.0522","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Philadelphia","4":"Pennsylvania","5":"East","6":"Office Supplies","7":"Binders","8":"9.6180","9":"2","10":"0.70","11":"-7.0532","12":"NA"},{"1":"NA","2":"Consumer","3":"Philadelphia","4":"Pennsylvania","5":"East","6":"Furniture","7":"Furnishings","8":"124.2000","9":"3","10":"0.20","11":"15.5250","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Philadelphia","4":"Pennsylvania","5":"East","6":"Office Supplies","7":"Envelopes","8":"3.2640","9":"2","10":"0.20","11":"1.1016","12":"NA"},{"1":"NA","2":"Consumer","3":"Philadelphia","4":"Pennsylvania","5":"East","6":"Office Supplies","7":"Art","8":"86.3040","9":"6","10":"0.20","11":"9.7092","12":"NA"},{"1":"NA","2":"Consumer","3":"Philadelphia","4":"Pennsylvania","5":"East","6":"Office Supplies","7":"Binders","8":"6.8580","9":"6","10":"0.70","11":"-5.7150","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Philadelphia","4":"Pennsylvania","5":"East","6":"Office 
Supplies","7":"Art","8":"15.7600","9":"2","10":"0.20","11":"3.5460","12":"NA"},{"1":"Second Class","2":"Home Office","3":"Houston","4":"Texas","5":"Central","6":"Office Supplies","7":"Paper","8":"29.4720","9":"3","10":"0.20","11":"9.9468","12":"NA"},{"1":"First Class","2":"Corporate","3":"Richardson","4":"Texas","5":"Central","6":"Technology","7":"Phones","8":"1097.5440","9":"7","10":"0.20","11":"123.4737","12":"NA"},{"1":"First Class","2":"Corporate","3":"Richardson","4":"Texas","5":"Central","6":"Furniture","7":"Furnishings","8":"190.9200","9":"5","10":"0.60","11":"-147.9630","12":"NA"},{"1":"Standard Class","2":"Home Office","3":"Houston","4":"TEXAS","5":"Central","6":"Office Supplies","7":"Envelopes","8":"113.3280","9":"9","10":"0.20","11":"35.4150","12":"NA"},{"1":"Standard Class","2":"Home Office","3":"Houston","4":"Texas","5":"Central","6":"Furniture","7":"Bookcases","8":"532.3992","9":"3","10":"0.32","11":"-46.9764","12":"NA"},{"1":"NA","2":"Home Office","3":"Houston","4":"Texas","5":"Central","6":"Furniture","7":"Chairs","8":"212.0580","9":"3","10":"0.30","11":"-15.1470","12":"NA"},{"1":"Standard Class","2":"Home Office","3":"Houston","4":"Texas","5":"Central","6":"Technology","7":"Phones","8":"371.1680","9":"4","10":"0.20","11":"41.7564","12":"NA"},{"1":"Standard Class","2":"Corporate","3":"Naperville","4":"Illinois","5":"Central","6":"Technology","7":"Phones","8":"147.1680","9":"4","10":"0.20","11":"16.5564","12":"NA"},{"1":"Standard Class","2":"Corporate","3":"Los Angeles","4":"California","5":"West","6":"Office Supplies","7":"Storage","8":"77.8800","9":"2","10":"0.00","11":"3.8940","12":"NA"},{"1":"Standard Class","2":"Corporate","3":"Melbourne","4":"Florida","5":"South","6":"Office Supplies","7":"Storage","8":"95.6160","9":"2","10":"0.20","11":"9.5616","12":"NA"},{"1":"First Class","2":"Corporate","3":"Eagan","4":"Minnesota","5":"Central","6":"Technology","7":"Accessories","8":"45.9800","9":"2","10":"0.00","11":"19.7714","12":"NA"},{"1":"First 
Class","2":"Corporate","3":"Eagan","4":"Minnesota","5":"Central","6":"Office Supplies","7":"Binders","8":"17.4600","9":"2","10":"0.00","11":"8.2062","12":"NA"},{"1":"Second Class","2":"Consumer","3":"Westland","4":"Michigan","5":"Central","6":"Office Supplies","7":"Storage","8":"211.9600","9":"4","10":"0.00","11":"8.4784","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Dover","4":"Delaware","5":"East","6":"Technology","7":"Accessories","8":"45.0000","9":"3","10":"0.00","11":"4.9500","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"Dover","4":"Delaware","5":"East","6":"Technology","7":"Phones","8":"21.8000","9":"-2","10":"0.00","11":"6.1040","12":"NA"},{"1":"Standard Class","2":"Consumer","3":"New Albany","4":"Indiana","5":"Central","6":"Office Supplies","7":"Binders","8":"38.2200","9":"6","10":"0.00","11":"17.9634","12":"NA"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div>

---

## Missing Data

Next, we check the data set for missing (NA) values, because missing values can distort the analysis.

<img src="data:image/png;base64,#img/pic8.jpg" width="600" height="150" style="display: block; margin: 0 auto" />

The results show that three variables contain NA values: Ship Mode (33 missing), Discount (42 missing) and Returned (9,194 missing). Each is handled according to its type.

---
class: middle

**Imputation (Using mode & median)** <br />

Ship Mode is a nominal variable, so its missing values are imputed with the mode. Discount is numerical, so its missing values are imputed with the median. For Returned, only 'Yes' is recorded in the dataset, so a missing value means the item was not returned; those rows are filled with 'No'.
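As a rough illustration of the three rules (mode for Ship Mode, median for Discount, 'No' for Returned), here is a minimal Python sketch using only the standard library. It is not the project's own code: the helper name `impute` and the toy rows are assumptions made for this example, with `None` standing in for NA.

```python
from statistics import median, mode


def impute(rows):
    """Fill missing values (None) in place and return the rows.

    Hypothetical helper: mode for the nominal column, median for the
    numerical column, and 'No' for the Returned flag.
    """
    ship_modes = [r["Ship Mode"] for r in rows if r["Ship Mode"] is not None]
    discounts = [r["Discount"] for r in rows if r["Discount"] is not None]
    ship_fill = mode(ship_modes)      # most frequent category
    discount_fill = median(discounts)  # middle value, robust to outliers

    for r in rows:
        if r["Ship Mode"] is None:
            r["Ship Mode"] = ship_fill
        if r["Discount"] is None:
            r["Discount"] = discount_fill
        if r["Returned"] is None:
            r["Returned"] = "No"
    return rows


# Toy rows for illustration only (not taken from the dataset).
rows = [
    {"Ship Mode": "Standard Class", "Discount": 0.2, "Returned": None},
    {"Ship Mode": None, "Discount": 0.0, "Returned": "Yes"},
    {"Ship Mode": "Standard Class", "Discount": None, "Returned": None},
    {"Ship Mode": "Second Class", "Discount": 0.7, "Returned": None},
]
cleaned = impute(rows)
```

The median is used for Discount rather than the mean because it is less sensitive to a few unusually large discounts.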
<img src="data:image/png;base64,#img/pic7.jpg" width="600" height="150" style="display: block; margin: 0 auto" />

The results show 0 NA values in every column, which confirms that all missing values have been handled. With the missing values handled, the patterns in the data are studied next.

---

## Noisy Data

**(Numerical data)** <br />

The distribution of each numerical variable is examined to detect any implausible values. To do so, histograms are plotted.

<img src="data:image/png;base64,#img/pic2.jpg" width="450" height="300" style="display: block; margin: 0 auto" />

The histogram reveals negative values in Quantity. Quantity cannot be negative because it records the number of products sold. Since very few rows are affected, removing them is a reasonable fix.

---

**The rows of the error data are removed:**

<!-- -->

<br />

The results show that the rows containing erroneous values have been removed successfully.

---

## Inconsistent Data

**(Categorical data)** <br />

The categorical variables in this dataset are Ship Mode, Segment, Country, City, State, Region, Category and Sub-Category. Inspecting the values of each variable reveals any inconsistent naming or redundant values.

<img src="data:image/png;base64,#img/pic3.jpg" width="450" height="300" style="display: block; margin: 0 auto" />

The results show that State contains redundant values: the same state appears in different letter cases.

---
class: middle

**Standardizing the case of the letters in State:** <br />

The state names are converted to a single, consistent letter case. <br />

<img src="data:image/png;base64,#img/pic4.jpg" width="450" height="300" style="display: block; margin: 0 auto" />

The results show that the redundant values have been resolved.

---

## Data Exploration

<br />

In this section, we drill deeper into the data for more insights. But first, we need to know what problems we want to solve and what questions to ask.
Taking the point of view of the Superstore's owner: <br />

- Overview - Increase Revenue
    - Which product category has the highest sales?
    - Which customer segment contributes the highest sales?
    - Which region, state and city contribute the highest sales and profit?
- Overview - Reduce Loss
    - Which product category and sub-category have the most returned items?
- Correlation
    - How do the factors influence each other?

---

### Overview - Increase Revenue<br />

**1. Which product category has the highest sales?** <br />

<span style="text-decoration:underline"> Bar Chart for Category by Sales Breakdown</span>

<img src="data:image/png;base64,#img/pic10.jpg" width="430" height="250" style="display: block; margin: 0 auto" />

The bar chart shows that products in the 'Technology' category generate the highest sales in this superstore. Given this pattern, expanding the range of technology products could be a good way to improve the company's sales.

---

**2. Which customer segment contributes the highest sales?** <br /><br />

<span style="text-decoration:underline"> Pie Chart for Customer Segment Sales Contribution</span><br /><br />

<img src="data:image/png;base64,#img/pic11.jpg" width="400" height="350" style="display: block; margin: 0 auto" /><br />

The pie chart shows that the Consumer segment contributes the highest sales in this superstore, at about 51% of the total. Corporate follows with 31% and Home Office with 19% (figures are rounded).

---

**3. Which region, state and city contribute the highest sales?**

<span style="text-decoration:underline"> Bar Chart for Region by Sales</span><br /><br />

<img src="data:image/png;base64,#img/pic12.jpg" width="450" height="300" style="display: block; margin: 0 auto" /><br />

Next, we look at the sales breakdown by region.
The West region recorded the highest sales, at around $720,000 (31.6%), followed by East at $680,000 (29.6%), Central at $500,000 (21.8%) and finally South at $390,000 (17%). Given these numbers, more promotion in the South region could help boost its sales.

---

<span style="text-decoration:underline"> Horizontal Bar Chart for Top 10 State by Sales</span><br /><br />

<img src="data:image/png;base64,#img/pic13.jpg" width="450" height="300" style="display: block; margin: 0 auto" /><br />

According to the chart above, customers from California contribute the highest sales to the Superstore. This may be because the Superstore is located near California. To increase sales further, it is crucial to reach the states that contribute less, which may be lagging because they are hard to reach from the superstore. A delivery service to the targeted states might help boost sales there.

---

<span style="text-decoration:underline"> Horizontal Bar Chart for Top 10 City by Sales</span><br /><br />

<img src="data:image/png;base64,#img/pic14.jpg" width="450" height="300" style="display: block; margin: 0 auto" /><br />

New York City has the highest sales of any city. As with the states, this may be due to a strategic location: put simply, people can reach the superstore easily. Cities that contribute less to sales may be far from the superstore. Hence, a delivery service would be a good way to address this, since it connects the superstore with customers who live far away.

---

### Overview - Reduce Loss<br />

**1. Which product category and sub-category have the most returned items?**

<img src="data:image/png;base64,#img/pic20.jpg" width="450" height="300" style="display: block; margin: 0 auto" /><br />

Office Supplies is the product category with the most returned items. This may be related to the quality of the office supplies. Hence, selling better-quality office supplies might reduce the likelihood of returns and, in turn, reduce loss.

---

<span style="text-decoration:underline"> Sub-Category Having Highest Returns</span><br />

<img src="data:image/png;base64,#img/pic21.jpg" width="450" height="300" style="display: block; margin: 0 auto" /><br />

At the sub-category level, Binders are returned most often, followed by Paper. These returns may again be related to product quality: the better the quality, the more likely customers are to be satisfied with their purchase. If the number of returned items decreases, the loss faced by the superstore can be reduced.
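A ranking of returns per category or sub-category comes down to a grouped count of the rows marked as returned. The following is a minimal Python sketch of that idea, not the project's own code; the helper name `count_returns` and the toy rows are assumptions made for illustration.

```python
from collections import Counter


def count_returns(rows, key):
    """Count rows with Returned == 'Yes', grouped by the given column."""
    return Counter(r[key] for r in rows if r["Returned"] == "Yes")


# Toy rows for illustration only (not taken from the dataset).
rows = [
    {"Category": "Office Supplies", "Sub-Category": "Binders", "Returned": "Yes"},
    {"Category": "Office Supplies", "Sub-Category": "Paper", "Returned": "Yes"},
    {"Category": "Office Supplies", "Sub-Category": "Binders", "Returned": "Yes"},
    {"Category": "Technology", "Sub-Category": "Phones", "Returned": "No"},
    {"Category": "Furniture", "Sub-Category": "Chairs", "Returned": "Yes"},
]

by_category = count_returns(rows, "Category")
by_sub_category = count_returns(rows, "Sub-Category")

# The most frequently returned sub-category and its count.
top_sub_category = by_sub_category.most_common(1)[0]
```

Raw counts favour sub-categories that simply sell more items, so comparing a return rate (returned rows divided by all rows in the group) may give a fairer picture.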
--- ## Label Encoder <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Ship Mode"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Segment"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["City"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["State"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Region"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["Category"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["Sub-Category"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["Sales"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["Quantity"],"name":[9],"type":["dbl"],"align":["right"]},{"label":["Discount"],"name":[10],"type":["dbl"],"align":["right"]},{"label":["Profit"],"name":[11],"type":["dbl"],"align":["right"]},{"label":["Returned"],"name":[12],"type":["dbl"],"align":["right"]}],"data":[{"1":"3","2":"1","3":"195","4":"16","5":"3","6":"1","7":"5","8":"261.9600","9":"2","10":"0.00","11":"41.9136","12":"1","_rn_":"1"},{"1":"3","2":"2","3":"267","4":"4","5":"4","6":"2","7":"11","8":"14.6200","9":"2","10":"0.00","11":"6.8714","12":"1","_rn_":"3"},{"1":"4","2":"1","3":"154","4":"9","5":"3","6":"1","7":"17","8":"957.5775","9":"5","10":"0.45","11":"-383.0310","12":"1","_rn_":"4"},{"1":"4","2":"1","3":"154","4":"9","5":"3","6":"2","7":"15","8":"22.3680","9":"2","10":"0.20","11":"2.5164","12":"1","_rn_":"5"},{"1":"4","2":"1","3":"267","4":"4","5":"4","6":"2","7":"3","8":"7.2800","9":"4","10":"0.00","11":"1.9656","12":"1","_rn_":"7"},{"1":"4","2":"1","3":"267","4":"4","5":"4","6":"3","7":"14","8":"907.1520","9":"6","10":"0.20","11":"90.7152","12":"1","_rn_":"8"},{"1":"4","2":"1","3":"267","4":"4","5":"4","6":"2","7":"4","8":"18.5040","9":"3","10":"0.20","11":"5.7825","12":"1","_rn_":"9"},{"1":"4","2":"1","3":"267","4":"4","5":"4","6":"2","7":"2","8":"114.9000","9":"5","10":"0.00","11":"34.4700
","12":"1","_rn_":"10"},{"1":"4","2":"1","3":"267","4":"4","5":"4","6":"1","7":"17","8":"1706.1840","9":"9","10":"0.20","11":"85.3092","12":"1","_rn_":"11"},{"1":"4","2":"1","3":"267","4":"4","5":"4","6":"3","7":"14","8":"911.4240","9":"4","10":"0.20","11":"68.3568","12":"1","_rn_":"12"},{"1":"4","2":"1","3":"97","4":"32","5":"3","6":"2","7":"13","8":"15.5520","9":"3","10":"0.20","11":"5.4432","12":"1","_rn_":"13"},{"1":"4","2":"1","3":"453","4":"46","5":"4","6":"2","7":"4","8":"407.9760","9":"3","10":"0.20","11":"132.5922","12":"1","_rn_":"14"},{"1":"4","2":"3","3":"155","4":"42","5":"1","6":"2","7":"4","8":"2.5440","9":"3","10":"0.80","11":"-3.8160","12":"1","_rn_":"16"},{"1":"4","2":"1","3":"273","4":"48","5":"1","6":"2","7":"15","8":"665.8800","9":"6","10":"0.00","11":"13.3176","12":"1","_rn_":"17"},{"1":"3","2":"1","3":"514","4":"43","5":"4","6":"2","7":"15","8":"55.5000","9":"2","10":"0.00","11":"9.9900","12":"1","_rn_":"18"},{"1":"3","2":"1","3":"439","4":"4","5":"4","6":"2","7":"3","8":"8.5600","9":"2","10":"0.00","11":"2.4824","12":"1","_rn_":"19"},{"1":"3","2":"1","3":"439","4":"4","5":"4","6":"3","7":"14","8":"213.4800","9":"3","10":"0.20","11":"16.0110","12":"1","_rn_":"20"},{"1":"4","2":"2","3":"159","4":"26","5":"1","6":"2","7":"3","8":"19.4600","9":"7","10":"0.20","11":"5.0596","12":"1","_rn_":"22"},{"1":"4","2":"2","3":"159","4":"26","5":"1","6":"2","7":"2","8":"60.3400","9":"7","10":"0.00","11":"15.6884","12":"1","_rn_":"23"},{"1":"3","2":"1","3":"375","4":"37","5":"2","6":"1","7":"6","8":"71.3720","9":"2","10":"0.30","11":"-1.0196","12":"1","_rn_":"24"},{"1":"4","2":"1","3":"352","4":"43","5":"4","6":"1","7":"17","8":"1044.6300","9":"3","10":"0.00","11":"240.2649","12":"1","_rn_":"25"},{"1":"3","2":"1","3":"267","4":"4","5":"4","6":"2","7":"4","8":"11.6480","9":"2","10":"0.20","11":"4.2224","12":"1","_rn_":"26"},{"1":"3","2":"1","3":"267","4":"4","5":"4","6":"3","7":"1","8":"90.5700","9":"3","10":"0.00","11":"11.7741","12":"1","_rn_":"27"},{"1":"4"
,"2":"1","3":"375","4":"37","5":"2","6":"1","7":"5","8":"3083.4300","9":"7","10":"0.50","11":"-1665.0522","12":"1","_rn_":"28"},{"1":"4","2":"1","3":"375","4":"37","5":"2","6":"2","7":"4","8":"9.6180","9":"2","10":"0.70","11":"-7.0532","12":"1","_rn_":"29"},{"1":"4","2":"1","3":"375","4":"37","5":"2","6":"1","7":"10","8":"124.2000","9":"3","10":"0.20","11":"15.5250","12":"1","_rn_":"30"},{"1":"4","2":"1","3":"375","4":"37","5":"2","6":"2","7":"8","8":"3.2640","9":"2","10":"0.20","11":"1.1016","12":"1","_rn_":"31"},{"1":"4","2":"1","3":"375","4":"37","5":"2","6":"2","7":"3","8":"86.3040","9":"6","10":"0.20","11":"9.7092","12":"1","_rn_":"32"},{"1":"4","2":"1","3":"375","4":"37","5":"2","6":"2","7":"4","8":"6.8580","9":"6","10":"0.70","11":"-5.7150","12":"1","_rn_":"33"},{"1":"4","2":"1","3":"375","4":"37","5":"2","6":"2","7":"3","8":"15.7600","9":"2","10":"0.20","11":"3.5460","12":"1","_rn_":"34"},{"1":"3","2":"3","3":"208","4":"42","5":"1","6":"2","7":"13","8":"29.4720","9":"3","10":"0.20","11":"9.9468","12":"1","_rn_":"35"},{"1":"1","2":"2","3":"407","4":"42","5":"1","6":"3","7":"14","8":"1097.5440","9":"7","10":"0.20","11":"123.4737","12":"1","_rn_":"36"},{"1":"1","2":"2","3":"407","4":"42","5":"1","6":"1","7":"10","8":"190.9200","9":"5","10":"0.60","11":"-147.9630","12":"1","_rn_":"37"},{"1":"4","2":"3","3":"208","4":"42","5":"1","6":"2","7":"8","8":"113.3280","9":"9","10":"0.20","11":"35.4150","12":"1","_rn_":"38"},{"1":"4","2":"3","3":"208","4":"42","5":"1","6":"1","7":"5","8":"532.3992","9":"3","10":"0.32","11":"-46.9764","12":"1","_rn_":"39"},{"1":"4","2":"3","3":"208","4":"42","5":"1","6":"1","7":"6","8":"212.0580","9":"3","10":"0.30","11":"-15.1470","12":"1","_rn_":"40"},{"1":"4","2":"3","3":"208","4":"42","5":"1","6":"3","7":"14","8":"371.1680","9":"4","10":"0.20","11":"41.7564","12":"1","_rn_":"41"},{"1":"4","2":"2","3":"322","4":"12","5":"1","6":"3","7":"14","8":"147.1680","9":"4","10":"0.20","11":"16.5564","12":"1","_rn_":"42"},{"1":"4","2":"2","3":"267
","4":"4","5":"4","6":"2","7":"15","8":"77.8800","9":"2","10":"0.00","11":"3.8940","12":"1","_rn_":"43"},{"1":"4","2":"2","3":"289","4":"9","5":"3","6":"2","7":"15","8":"95.6160","9":"2","10":"0.20","11":"9.5616","12":"1","_rn_":"44"},{"1":"1","2":"2","3":"130","4":"22","5":"1","6":"3","7":"1","8":"45.9800","9":"2","10":"0.00","11":"19.7714","12":"1","_rn_":"45"},{"1":"1","2":"2","3":"130","4":"22","5":"1","6":"2","7":"4","8":"17.4600","9":"2","10":"0.00","11":"8.2062","12":"1","_rn_":"46"},{"1":"3","2":"1","3":"517","4":"21","5":"1","6":"2","7":"15","8":"211.9600","9":"4","10":"0.00","11":"8.4784","12":"1","_rn_":"47"},{"1":"4","2":"1","3":"125","4":"7","5":"2","6":"3","7":"1","8":"45.0000","9":"3","10":"0.00","11":"4.9500","12":"1","_rn_":"48"},{"1":"4","2":"1","3":"325","4":"13","5":"1","6":"2","7":"4","8":"38.2200","9":"6","10":"0.00","11":"17.9634","12":"1","_rn_":"50"},{"1":"4","2":"1","3":"325","4":"13","5":"1","6":"2","7":"11","8":"75.1800","9":"6","10":"0.00","11":"35.3346","12":"1","_rn_":"51"},{"1":"4","2":"1","3":"325","4":"13","5":"1","6":"1","7":"10","8":"6.1600","9":"2","10":"0.00","11":"2.9568","12":"1","_rn_":"52"},{"1":"4","2":"1","3":"325","4":"13","5":"1","6":"1","7":"6","8":"89.9900","9":"1","10":"0.00","11":"17.0981","12":"1","_rn_":"53"},{"1":"4","2":"2","3":"330","4":"31","5":"2","6":"2","7":"9","8":"15.2600","9":"7","10":"0.00","11":"6.2566","12":"1","_rn_":"54"},{"1":"4","2":"2","3":"330","4":"31","5":"2","6":"3","7":"14","8":"1029.9500","9":"5","10":"0.00","11":"298.6855","12":"1","_rn_":"55"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> --- ## Heatmap Correlation <!-- --> --- ## Correlation Matrix <br /> <img src="data:image/png;base64,#img/pic9.jpg" width="450" height="300" style="display: block; margin: 0 auto" /><br /> Profit and Ship Mode will be our focus since these 2 variables will be the targeted output for the case of classification and regression activity during 
machine learning modelling. The results show that some variables are only weakly correlated with the outputs, while others are more strongly correlated. Hence, a few of the weakly correlated variables will be dropped before machine learning prediction.

---

## Machine Learning & Assessment

<span style="text-decoration:underline"> a - Linear Regression</span><br /><br />

Based on the correlation analysis, “Sales” has a high positive correlation with the dependent variable “Profit”.

<img src="data:image/png;base64,#img/pic15.jpg" width="450" height="300" style="display: block; margin: 0 auto" /><br />

---

Whether we can use our model to make predictions depends on:<br />

- whether we can reject the null hypothesis that there is no relationship between our variables;
- whether the model is a good fit for our data.

<img src="data:image/png;base64,#img/pic16.jpg" width="450" height="300" style="display: block; margin: 0 auto" /><br />

Based on the result summary shown above, is the hypothesis supported?

- Since the p-value is smaller than the 0.05 significance cutoff, we reject the null hypothesis (H0) and conclude that there is a relationship between Sales and Profit.

---

<span style="text-decoration:underline"> Prediction</span><br /><br />

The predicted values are computed so that they can be compared with the actual values.

<img src="data:image/png;base64,#img/pic17.5.jpg" width="550" height="500" style="display: block; margin: 0 auto" /><br />

---
class: middle

Let’s now compare the predicted and actual values.<br />
The output of the above command is shown below as a graph of the predicted Profit.<br /><br />

<img src="data:image/png;base64,#img/pic17.jpg" width="500" height="350" style="display: block; margin: 0 auto" />

---

<span style="text-decoration:underline"> Model Accuracy</span><br />

The accuracy of the model is measured using the Root Mean Square Error (RMSE).
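RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as Profit, and lower is better. The sketch below is a minimal Python illustration rather than the project's own code: it fits a one-variable least-squares line and computes RMSE on made-up numbers. The helper names `fit_line` and `rmse` and the toy values are assumptions made for this example.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(actual, predicted):
    """Root mean square error: sqrt(mean((actual - predicted)^2))."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only (not taken from the dataset).
sales = [10.0, 20.0, 30.0, 40.0]
profit = [1.0, 3.0, 2.0, 6.0]

intercept, slope = fit_line(sales, profit)
predicted = [intercept + slope * s for s in sales]
error = rmse(profit, predicted)
```

Here the error is computed on the same rows used for fitting to keep the example short; in practice RMSE is usually reported on a held-out test set.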
<br />

<img src="data:image/png;base64,#img/pic17.6.jpg" width="450" height="400" style="display: block; margin: 0 auto" />

---

<span style="text-decoration:underline"> b - Classification For Ship Mode Using Decision Tree</span><br />

Ship Mode is classified using the decision tree algorithm. The steps are shown below: splitting the data, fitting the model and, finally, evaluating performance.<br />

<img src="data:image/png;base64,#img/pic18.jpg" width="450" height="400" style="display: block; margin: 0 auto" />

---

<span style="text-decoration:underline"> Decision Tree</span><br /><br />

The accuracy of the decision tree model is presented in the figure below.

<img src="data:image/png;base64,#img/pic19.jpg" width="550" height="500" style="display: block; margin: 0 auto" />

---

<span style="text-decoration:underline"> Random Forest</span><br /><br />

The accuracy of the random forest model is presented in the figure below.

<img src="data:image/png;base64,#img/rf.jpg" width="550" height="500" style="display: block; margin: 0 auto" />

---

<span style="text-decoration:underline"> Support Vector Machine (SVM)</span><br /><br />

<img src="data:image/png;base64,#img/SVM1.jpg" width="550" height="500" style="display: block; margin: 0 auto" />

---

The accuracy of the SVM model is presented in the figure below.

<img src="data:image/png;base64,#img/SVM2.jpg" width="550" height="500" style="display: block; margin: 0 auto" />

---

<span style="text-decoration:underline"> Naive Bayes (NB)</span><br /><br />

The accuracy of the Naive Bayes model is presented in the figure below.

<img src="data:image/png;base64,#img/NV.jpg" width="550" height="500" style="display: block; margin: 0 auto" />

---

## Conclusion

To reiterate, the Superstore owners have the following concerns:

- They would like to understand which products, regions, categories and customer segments they should target or avoid.
- They also want a regression model to predict Sales or Profit.

*A data scientist's recommendations to the Superstore owners are as follows:*

- In terms of Customer Segment, it is suggested to target the Home Office segment, as working from home is a growing trend accelerated by Covid-19. The Corporate segment is too competitive because it is a very established market segment.
- In terms of Sales by Region, it is suggested to take a deep dive into the Superstore's market presence in the South region, to understand why sales there are much lower than in the East and West.
- In terms of Returns, the Office Supplies category records the most returned items. A deeper look shows that these come mainly from the Paper and Binders sub-categories. A survey should be conducted to collect feedback on why these two sub-categories have such a high number of returns.

---
class: middle

*In regard to predictive analytics (regression and classification) on the dataset:*

- Correlation: found only among a few variables, namely Profit, Sales, Quantity, Discount and Sub-Category.
- Regression: a model that predicts Profit from Sales was built and achieved an RMSE of 1.8. The plot suggests that some instances have large errors and need to be rechecked.
- Classification: this dataset is not suitable for classification because most of the variables are not correlated with each other. Further data collection on new features is needed.
- Future work will include a deep dive into the instances that trigger large errors in the regression model, and identifying the features to collect for the classification problem.

---
class: center, middle

<img src="data:image/png;base64,#img/thankyou.jpg" width="550" height="500" style="display: block; margin: 0 auto" />