Objective of the Task: The broad objective of this task was to understand how specific product types perform against each other. The outcomes will help the sales team see how product type might affect sales across the enterprise.
To do this, historical sales data were analysed in order to predict sales volume for a list of new products. The focus was on predicting sales of four product types (PCs, Laptops, Netbooks and Smartphones) and then assessing the impact that service reviews and customer reviews have on the sales of different product types.
Deliverables: The key deliverables for this task were to:
1. Identify the algorithms tested
2. Select an algorithm and state the reason why it was selected
3. Produce charts that show the impact of customer and service reviews on sales volume
4. Export the predicted findings to an Excel sheet
5. Predict the sales of four specific product types: PCs, Laptops, Netbooks and Smartphones
Methodology: Regression was used to build machine learning models for this analysis, using three popular algorithms. Predictions were made with all three algorithms, and the best one for the provided dataset was identified. The detailed steps taken and the code used are presented below.
#import the existing products dataset
existing_products <- read.csv("C:/Users/gebruiker/Desktop/Ubiqum_1/existingproductattributes2017.csv")
#import the new products dataset
new_products <- read.csv("C:/Users/gebruiker/Desktop/Ubiqum_1/newproductattributes2017.csv")
#load the caret package - it has to be loaded in every new R session before its functions can be used
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
#dummify the data - typical datasets don't contain only numeric values. Most data will contain a mixture of numeric and nominal data. Dummifying makes it possible to use both (numeric and nominal data) when developing regression models and making predictions.
#How to dummify the data - convert categorical variables (factor and character variables) to binary variables using the process below
#dummify the data - step 1 - build a dummy-variable encoder from the existing products data (dummyVars returns an encoding object, not a data frame)
newDataFrame <- dummyVars(" ~ .", data = existing_products)
#step 2 - apply the encoder to the existing products data and store the result as a data frame called readyData
readyData <- data.frame(predict(newDataFrame, newdata = existing_products))
#cross-check to ensure there are no nominal variables-check the structure
str(readyData)
## 'data.frame': 80 obs. of 29 variables:
## $ ProductType.Accessories : num 0 0 0 0 0 1 1 1 1 1 ...
## $ ProductType.Display : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.ExtendedWarranty: num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.GameConsole : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Laptop : num 0 0 0 1 1 0 0 0 0 0 ...
## $ ProductType.Netbook : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.PC : num 1 1 1 0 0 0 0 0 0 0 ...
## $ ProductType.Printer : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.PrinterSupplies : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Smartphone : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Software : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Tablet : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductNum : num 101 102 103 104 105 106 107 108 109 110 ...
## $ Price : num 949 2250 399 410 1080 ...
## $ x5StarReviews : num 3 2 3 49 58 83 11 33 16 10 ...
## $ x4StarReviews : num 3 1 0 19 31 30 3 19 9 1 ...
## $ x3StarReviews : num 2 0 0 8 11 10 0 12 2 1 ...
## $ x2StarReviews : num 0 0 0 3 7 9 0 5 0 0 ...
## $ x1StarReviews : num 0 0 0 9 36 40 1 9 2 0 ...
## $ PositiveServiceReview : num 2 1 1 7 7 12 3 5 2 2 ...
## $ NegativeServiceReview : num 0 0 0 8 20 5 0 3 1 0 ...
## $ Recommendproduct : num 0.9 0.9 0.9 0.8 0.7 0.3 0.9 0.7 0.8 0.9 ...
## $ BestSellersRank : num 1967 4806 12076 109 268 ...
## $ ShippingWeight : num 25.8 50 17.4 5.7 7 1.6 7.3 12 1.8 0.75 ...
## $ ProductDepth : num 23.9 35 10.5 15 12.9 ...
## $ ProductWidth : num 6.62 31.75 8.3 9.9 0.3 ...
## $ ProductHeight : num 16.9 19 10.2 1.3 8.9 ...
## $ ProfitMargin : num 0.15 0.25 0.08 0.08 0.09 0.05 0.05 0.05 0.05 0.05 ...
## $ Volume : num 12 8 12 196 232 332 44 132 64 40 ...
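The dummy-variable step above can be illustrated on a small scale. The sketch below is not the caret call used in this report: it swaps in base R's model.matrix and uses a made-up data frame (toy and its values are assumptions for illustration only). The resulting columns are named ProductTypePC rather than ProductType.PC, but the idea is the same: one 0/1 column per level of the nominal variable.

```r
# Toy data frame (hypothetical values, not the project data)
toy <- data.frame(
  ProductType = factor(c("PC", "Laptop", "PC", "Netbook")),
  Price       = c(949, 410, 399, 299)
)

# "~ . - 1" removes the intercept, so the factor gets one indicator
# column for every level instead of dropping a reference level
toy_dummies <- as.data.frame(model.matrix(~ . - 1, data = toy))

# Every column is now numeric: three indicator columns plus Price
str(toy_dummies)
```

Each row has exactly one indicator set to 1, which is why one of the product-type columns is always redundant in a linear model that also contains an intercept.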
#Check for missing data - all the columns/sections with NA’s
summary(readyData)
## ProductType.Accessories ProductType.Display ProductType.ExtendedWarranty
## Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.000 Median :0.0000 Median :0.000
## Mean :0.325 Mean :0.0625 Mean :0.125
## 3rd Qu.:1.000 3rd Qu.:0.0000 3rd Qu.:0.000
## Max. :1.000 Max. :1.0000 Max. :1.000
##
## ProductType.GameConsole ProductType.Laptop ProductType.Netbook
## Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.000 Median :0.0000 Median :0.000
## Mean :0.025 Mean :0.0375 Mean :0.025
## 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.000
## Max. :1.000 Max. :1.0000 Max. :1.000
##
## ProductType.PC ProductType.Printer ProductType.PrinterSupplies
## Min. :0.00 Min. :0.00 Min. :0.0000
## 1st Qu.:0.00 1st Qu.:0.00 1st Qu.:0.0000
## Median :0.00 Median :0.00 Median :0.0000
## Mean :0.05 Mean :0.15 Mean :0.0375
## 3rd Qu.:0.00 3rd Qu.:0.00 3rd Qu.:0.0000
## Max. :1.00 Max. :1.00 Max. :1.0000
##
## ProductType.Smartphone ProductType.Software ProductType.Tablet
## Min. :0.00 Min. :0.000 Min. :0.0000
## 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.0000
## Median :0.00 Median :0.000 Median :0.0000
## Mean :0.05 Mean :0.075 Mean :0.0375
## 3rd Qu.:0.00 3rd Qu.:0.000 3rd Qu.:0.0000
## Max. :1.00 Max. :1.000 Max. :1.0000
##
## ProductNum Price x5StarReviews x4StarReviews
## Min. :101.0 Min. : 3.60 Min. : 0.0 Min. : 0.00
## 1st Qu.:120.8 1st Qu.: 52.66 1st Qu.: 10.0 1st Qu.: 2.75
## Median :140.5 Median : 132.72 Median : 50.0 Median : 22.00
## Mean :142.6 Mean : 247.25 Mean : 176.2 Mean : 40.20
## 3rd Qu.:160.2 3rd Qu.: 352.49 3rd Qu.: 306.5 3rd Qu.: 33.00
## Max. :200.0 Max. :2249.99 Max. :2801.0 Max. :431.00
##
## x3StarReviews x2StarReviews x1StarReviews PositiveServiceReview
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.: 2.00
## Median : 7.00 Median : 3.00 Median : 8.50 Median : 5.50
## Mean : 14.79 Mean : 13.79 Mean : 37.67 Mean : 51.75
## 3rd Qu.: 11.25 3rd Qu.: 7.00 3rd Qu.: 15.25 3rd Qu.: 42.00
## Max. :162.00 Max. :370.00 Max. :1654.00 Max. :536.00
##
## NegativeServiceReview Recommendproduct BestSellersRank ShippingWeight
## Min. : 0.000 Min. :0.100 Min. : 1 Min. : 0.0100
## 1st Qu.: 1.000 1st Qu.:0.700 1st Qu.: 7 1st Qu.: 0.5125
## Median : 3.000 Median :0.800 Median : 27 Median : 2.1000
## Mean : 6.225 Mean :0.745 Mean : 1126 Mean : 9.6681
## 3rd Qu.: 6.250 3rd Qu.:0.900 3rd Qu.: 281 3rd Qu.:11.2050
## Max. :112.000 Max. :1.000 Max. :17502 Max. :63.0000
## NA's :15
## ProductDepth ProductWidth ProductHeight ProfitMargin
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.0500
## 1st Qu.: 4.775 1st Qu.: 1.750 1st Qu.: 0.400 1st Qu.:0.0500
## Median : 7.950 Median : 6.800 Median : 3.950 Median :0.1200
## Mean : 14.425 Mean : 7.819 Mean : 6.259 Mean :0.1545
## 3rd Qu.: 15.025 3rd Qu.:11.275 3rd Qu.:10.300 3rd Qu.:0.2000
## Max. :300.000 Max. :31.750 Max. :25.800 Max. :0.4000
##
## Volume
## Min. : 0
## 1st Qu.: 40
## Median : 200
## Mean : 705
## 3rd Qu.: 1226
## Max. :11204
##
#delete BestSellersRank, which contains the missing values (15 NA's); ProductHeight is also removed
readyData$ProductHeight <- NULL
readyData$BestSellersRank <- NULL
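Instead of reading the NA counts off the summary by eye, the columns with missing values can be located programmatically. The following is a minimal sketch on a made-up data frame (toy and its values are assumptions, not the project data); it only shows the pattern and does not reproduce the exact columns dropped above.

```r
# Toy data frame with one incomplete column (hypothetical values)
toy <- data.frame(
  Price           = c(949, 410, 399, 299),
  BestSellersRank = c(1967, NA, 12076, NA),
  Volume          = c(12, 196, 12, 80)
)

# Count the missing values in every column
na_counts <- colSums(is.na(toy))
na_counts

# Keep only the columns that have no missing values
toy_complete <- toy[, na_counts == 0, drop = FALSE]
names(toy_complete)
```

Dropping a whole column is the simplest option when, as here, the data set is small; imputing the missing values would be an alternative that this report does not explore.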
#Find the correlation between the relevant independent variables and the dependent variable
corrData <- cor(readyData)
#print the correlation matrix
corrData
## ProductType.Accessories ProductType.Display
## ProductType.Accessories 1.000000000 -0.17916128
## ProductType.Display -0.179161283 1.00000000
## ProductType.ExtendedWarranty -0.262265264 -0.09759001
## ProductType.GameConsole -0.111111111 -0.04134491
## ProductType.Laptop -0.136963567 -0.05096472
## ProductType.Netbook -0.111111111 -0.04134491
## ProductType.PC -0.159188978 -0.05923489
## ProductType.Printer -0.291491544 -0.10846523
## ProductType.PrinterSupplies -0.136963567 -0.05096472
## ProductType.Smartphone -0.159188978 -0.05923489
## ProductType.Software -0.197582993 -0.07352146
## ProductType.Tablet -0.136963567 -0.05096472
## ProductNum -0.338862490 0.08407390
## Price -0.384906124 0.23172981
## x5StarReviews 0.127803771 -0.03758386
## x4StarReviews 0.156715126 -0.00293832
## x3StarReviews 0.110608918 -0.03849540
## x2StarReviews 0.033055555 -0.02708636
## x1StarReviews -0.041647041 -0.03628464
## PositiveServiceReview 0.002699224 -0.09438421
## NegativeServiceReview -0.148034357 -0.01861755
## Recommendproduct 0.058505351 0.07239820
## ShippingWeight -0.341367875 0.10374059
## ProductDepth 0.191398963 0.01528395
## ProductWidth -0.154462467 0.28447123
## ProfitMargin -0.626935212 0.03906690
## Volume 0.127803771 -0.03758386
## ProductType.ExtendedWarranty
## ProductType.Accessories -0.26226526
## ProductType.Display -0.09759001
## ProductType.ExtendedWarranty 1.00000000
## ProductType.GameConsole -0.06052275
## ProductType.Laptop -0.07460471
## ProductType.Netbook -0.06052275
## ProductType.PC -0.08671100
## ProductType.Printer -0.15877684
## ProductType.PrinterSupplies -0.07460471
## ProductType.Smartphone -0.08671100
## ProductType.Software -0.10762440
## ProductType.Tablet -0.07460471
## ProductNum -0.08607897
## Price -0.09780278
## x5StarReviews 0.07086528
## x4StarReviews -0.09946665
## x3StarReviews -0.09934446
## x2StarReviews -0.09348376
## x1StarReviews -0.05189306
## PositiveServiceReview 0.62710951
## NegativeServiceReview 0.01528844
## Recommendproduct 0.14451833
## ShippingWeight -0.23680262
## ProductDepth -0.15707124
## ProductWidth -0.43640441
## ProfitMargin 0.80226723
## Volume 0.07086528
## ProductType.GameConsole ProductType.Laptop
## ProductType.Accessories -0.111111111 -0.136963567
## ProductType.Display -0.041344912 -0.050964719
## ProductType.ExtendedWarranty -0.060522753 -0.074604710
## ProductType.GameConsole 1.000000000 -0.031606977
## ProductType.Laptop -0.031606977 1.000000000
## ProductType.Netbook -0.025641026 -0.031606977
## ProductType.PC -0.036735918 -0.045283341
## ProductType.Printer -0.067267279 -0.082918499
## ProductType.PrinterSupplies -0.031606977 -0.038961039
## ProductType.Smartphone -0.036735918 -0.045283341
## ProductType.Software -0.045596075 -0.056205010
## ProductType.Tablet -0.031606977 -0.038961039
## ProductNum 0.340268975 -0.187367237
## Price -0.015543759 0.296140664
## x5StarReviews 0.388298241 -0.069799582
## x4StarReviews 0.344636607 -0.052974299
## x3StarReviews 0.258709076 -0.045679827
## x2StarReviews 0.074429824 -0.038007390
## x1StarReviews 0.003300983 -0.021994035
## PositiveServiceReview -0.014267327 -0.085716506
## NegativeServiceReview 0.081949267 0.052417536
## Recommendproduct 0.126534828 -0.011740126
## ShippingWeight -0.006072795 -0.055573162
## ProductDepth -0.019260720 -0.005033888
## ProductWidth 0.022014271 -0.042331972
## ProfitMargin -0.006230125 -0.081632386
## Volume 0.388298241 -0.069799582
## ProductType.Netbook ProductType.PC
## ProductType.Accessories -0.11111111 -0.15918898
## ProductType.Display -0.04134491 -0.05923489
## ProductType.ExtendedWarranty -0.06052275 -0.08671100
## ProductType.GameConsole -0.02564103 -0.03673592
## ProductType.Laptop -0.03160698 -0.04528334
## ProductType.Netbook 1.00000000 -0.03673592
## ProductType.PC -0.03673592 1.00000000
## ProductType.Printer -0.06726728 -0.09637388
## ProductType.PrinterSupplies -0.03160698 -0.04528334
## ProductType.Smartphone -0.03673592 -0.05263158
## ProductType.Software -0.04559608 -0.06532553
## ProductType.Tablet -0.03160698 -0.04528334
## ProductNum 0.22272699 -0.26383058
## Price 0.05587061 0.54711260
## x5StarReviews -0.07001054 -0.10289168
## x4StarReviews -0.08017983 -0.12221649
## x3StarReviews -0.05874134 -0.10093459
## x2StarReviews -0.04311403 -0.06931004
## x1StarReviews -0.02819859 -0.04287299
## PositiveServiceReview -0.07750629 -0.10938596
## NegativeServiceReview -0.04759253 -0.08835918
## Recommendproduct -0.36327741 0.09356725
## ShippingWeight -0.06005908 0.31738315
## ProductDepth -0.03192360 0.05401162
## ProductWidth 0.06221219 0.20211260
## ProfitMargin -0.06160902 -0.02380242
## Volume -0.07001054 -0.10289168
## ProductType.Printer
## ProductType.Accessories -0.291491544
## ProductType.Display -0.108465229
## ProductType.ExtendedWarranty -0.158776837
## ProductType.GameConsole -0.067267279
## ProductType.Laptop -0.082918499
## ProductType.Netbook -0.067267279
## ProductType.PC -0.096373885
## ProductType.Printer 1.000000000
## ProductType.PrinterSupplies -0.082918499
## ProductType.Smartphone -0.096373885
## ProductType.Software -0.119617833
## ProductType.Tablet -0.082918499
## ProductNum 0.249589095
## Price -0.037212288
## x5StarReviews -0.149200679
## x4StarReviews -0.133159178
## x3StarReviews -0.121109706
## x2StarReviews -0.087025567
## x1StarReviews -0.060204072
## PositiveServiceReview -0.184785826
## NegativeServiceReview 0.008126681
## Recommendproduct -0.149914932
## ShippingWeight 0.757676417
## ProductDepth 0.029243566
## ProductWidth 0.555981505
## ProfitMargin -0.055691552
## Volume -0.149200679
## ProductType.PrinterSupplies
## ProductType.Accessories -0.13696357
## ProductType.Display -0.05096472
## ProductType.ExtendedWarranty -0.07460471
## ProductType.GameConsole -0.03160698
## ProductType.Laptop -0.03896104
## ProductType.Netbook -0.03160698
## ProductType.PC -0.04528334
## ProductType.Printer -0.08291850
## ProductType.PrinterSupplies 1.00000000
## ProductType.Smartphone -0.04528334
## ProductType.Software -0.05620501
## ProductType.Tablet -0.03896104
## ProductNum -0.09325018
## Price -0.11477363
## x5StarReviews -0.09040334
## x4StarReviews -0.11100268
## x3StarReviews -0.09486115
## x2StarReviews -0.05819149
## x1StarReviews -0.03866021
## PositiveServiceReview -0.09649048
## NegativeServiceReview -0.08180838
## Recommendproduct -0.07882656
## ShippingWeight -0.07272702
## ProductDepth -0.06686404
## ProductWidth -0.18418343
## ProfitMargin 0.27675370
## Volume -0.09040334
## ProductType.Smartphone ProductType.Software
## ProductType.Accessories -0.159188978 -0.197582993
## ProductType.Display -0.059234888 -0.073521462
## ProductType.ExtendedWarranty -0.086710997 -0.107624401
## ProductType.GameConsole -0.036735918 -0.045596075
## ProductType.Laptop -0.045283341 -0.056205010
## ProductType.Netbook -0.036735918 -0.045596075
## ProductType.PC -0.052631579 -0.065325533
## ProductType.Printer -0.096373885 -0.119617833
## ProductType.PrinterSupplies -0.045283341 -0.056205010
## ProductType.Smartphone 1.000000000 -0.065325533
## ProductType.Software -0.065325533 1.000000000
## ProductType.Tablet -0.045283341 -0.056205010
## ProductNum 0.431369468 -0.214914072
## Price 0.001358954 -0.058780827
## x5StarReviews -0.038508275 0.001196472
## x4StarReviews -0.073264624 0.154461169
## x3StarReviews -0.054335058 0.234863478
## x2StarReviews -0.037891166 0.350735807
## x1StarReviews -0.031436074 0.393364234
## PositiveServiceReview -0.093917237 -0.041827585
## NegativeServiceReview -0.056081857 0.406129939
## Recommendproduct -0.023391813 -0.065325533
## ShippingWeight -0.133107177 -0.167879959
## ProductDepth -0.074189350 -0.071848214
## ProductWidth -0.110745236 -0.183007483
## ProfitMargin -0.053555434 0.091501867
## Volume -0.038508275 0.001196472
## ProductType.Tablet ProductNum Price
## ProductType.Accessories -0.136963567 -0.338862490 -0.384906124
## ProductType.Display -0.050964719 0.084073899 0.231729810
## ProductType.ExtendedWarranty -0.074604710 -0.086078971 -0.097802784
## ProductType.GameConsole -0.031606977 0.340268975 -0.015543759
## ProductType.Laptop -0.038961039 -0.187367237 0.296140664
## ProductType.Netbook -0.031606977 0.222726991 0.055870610
## ProductType.PC -0.045283341 -0.263830575 0.547112596
## ProductType.Printer -0.082918499 0.249589095 -0.037212288
## ProductType.PrinterSupplies -0.038961039 -0.093250185 -0.114773627
## ProductType.Smartphone -0.045283341 0.431369468 0.001358954
## ProductType.Software -0.056205010 -0.214914072 -0.058780827
## ProductType.Tablet 1.000000000 0.332753315 0.131659520
## ProductNum 0.332753315 1.000000000 -0.039748728
## Price 0.131659520 -0.039748728 1.000000000
## x5StarReviews -0.050941908 0.166120763 -0.142343990
## x4StarReviews -0.002433448 0.119400607 -0.165283699
## x3StarReviews 0.005639815 0.090200642 -0.150537613
## x2StarReviews -0.013498120 -0.004533099 -0.110681189
## x1StarReviews -0.026603829 -0.063063850 -0.083957332
## PositiveServiceReview -0.081913925 -0.057748062 -0.142143291
## NegativeServiceReview -0.049409024 -0.019427155 -0.060790373
## Recommendproduct 0.088889522 0.003886211 0.068930357
## ShippingWeight -0.098414265 0.081238782 0.416777401
## ProductDepth -0.036157465 0.036187970 0.010967649
## ProductWidth 0.039281194 0.126793427 0.382397533
## ProfitMargin 0.026452306 0.039715141 0.099669405
## Volume -0.050941908 0.166120763 -0.142343990
## x5StarReviews x4StarReviews x3StarReviews
## ProductType.Accessories 0.127803771 0.1567151258 0.110608918
## ProductType.Display -0.037583856 -0.0029383203 -0.038495398
## ProductType.ExtendedWarranty 0.070865276 -0.0994666496 -0.099344457
## ProductType.GameConsole 0.388298241 0.3446366067 0.258709076
## ProductType.Laptop -0.069799582 -0.0529742995 -0.045679827
## ProductType.Netbook -0.070010545 -0.0801798318 -0.058741337
## ProductType.PC -0.102891676 -0.1222164888 -0.100934593
## ProductType.Printer -0.149200679 -0.1331591777 -0.121109706
## ProductType.PrinterSupplies -0.090403335 -0.1110026840 -0.094861150
## ProductType.Smartphone -0.038508275 -0.0732646241 -0.054335058
## ProductType.Software 0.001196472 0.1544611686 0.234863478
## ProductType.Tablet -0.050941908 -0.0024334484 0.005639815
## ProductNum 0.166120763 0.1194006067 0.090200642
## Price -0.142343990 -0.1652836990 -0.150537613
## x5StarReviews 1.000000000 0.8790063940 0.763373189
## x4StarReviews 0.879006394 1.0000000000 0.937214175
## x3StarReviews 0.763373189 0.9372141751 1.000000000
## x2StarReviews 0.487279328 0.6790056214 0.861480050
## x1StarReviews 0.255023904 0.4449417168 0.679276158
## PositiveServiceReview 0.622260219 0.4834212832 0.418517393
## NegativeServiceReview 0.309418989 0.5332221777 0.684096619
## Recommendproduct 0.169541264 0.0714153315 -0.056613257
## ShippingWeight -0.188023980 -0.1949140938 -0.171842042
## ProductDepth 0.066105249 -0.0317207111 -0.049376503
## ProductWidth -0.143436609 -0.0006476125 -0.018838926
## ProfitMargin -0.013448603 -0.1466538020 -0.128706922
## Volume 1.000000000 0.8790063940 0.763373189
## x2StarReviews x1StarReviews
## ProductType.Accessories 0.033055555 -0.041647041
## ProductType.Display -0.027086357 -0.036284641
## ProductType.ExtendedWarranty -0.093483762 -0.051893064
## ProductType.GameConsole 0.074429824 0.003300983
## ProductType.Laptop -0.038007390 -0.021994035
## ProductType.Netbook -0.043114035 -0.028198592
## ProductType.PC -0.069310042 -0.042872993
## ProductType.Printer -0.087025567 -0.060204072
## ProductType.PrinterSupplies -0.058191494 -0.038660213
## ProductType.Smartphone -0.037891166 -0.031436074
## ProductType.Software 0.350735807 0.393364234
## ProductType.Tablet -0.013498120 -0.026603829
## ProductNum -0.004533099 -0.063063850
## Price -0.110681189 -0.083957332
## x5StarReviews 0.487279328 0.255023904
## x4StarReviews 0.679005621 0.444941717
## x3StarReviews 0.861480050 0.679276158
## x2StarReviews 1.000000000 0.951912978
## x1StarReviews 0.951912978 1.000000000
## PositiveServiceReview 0.308901370 0.200035288
## NegativeServiceReview 0.864754808 0.884728323
## Recommendproduct -0.197917979 -0.246092974
## ShippingWeight -0.128685586 -0.095656192
## ProductDepth -0.042636007 -0.034639801
## ProductWidth -0.065799979 -0.101139826
## ProfitMargin -0.090093715 -0.031227760
## Volume 0.487279328 0.255023904
## PositiveServiceReview NegativeServiceReview
## ProductType.Accessories 0.002699224 -0.148034357
## ProductType.Display -0.094384206 -0.018617554
## ProductType.ExtendedWarranty 0.627109511 0.015288441
## ProductType.GameConsole -0.014267327 0.081949267
## ProductType.Laptop -0.085716506 0.052417536
## ProductType.Netbook -0.077506288 -0.047592529
## ProductType.PC -0.109385958 -0.088359185
## ProductType.Printer -0.184785826 0.008126681
## ProductType.PrinterSupplies -0.096490485 -0.081808383
## ProductType.Smartphone -0.093917237 -0.056081857
## ProductType.Software -0.041827585 0.406129939
## ProductType.Tablet -0.081913925 -0.049409024
## ProductNum -0.057748062 -0.019427155
## Price -0.142143291 -0.060790373
## x5StarReviews 0.622260219 0.309418989
## x4StarReviews 0.483421283 0.533222178
## x3StarReviews 0.418517393 0.684096619
## x2StarReviews 0.308901370 0.864754808
## x1StarReviews 0.200035288 0.884728323
## PositiveServiceReview 1.000000000 0.265549747
## NegativeServiceReview 0.265549747 1.000000000
## Recommendproduct 0.232828810 -0.188329242
## ShippingWeight -0.270738543 -0.111793874
## ProductDepth -0.050526592 -0.067410452
## ProductWidth -0.339093728 -0.097207127
## ProfitMargin 0.423591716 0.042035630
## Volume 0.622260219 0.309418989
## Recommendproduct ShippingWeight ProductDepth
## ProductType.Accessories 0.058505351 -0.341367875 0.191398963
## ProductType.Display 0.072398196 0.103740595 0.015283953
## ProductType.ExtendedWarranty 0.144518328 -0.236802620 -0.157071240
## ProductType.GameConsole 0.126534828 -0.006072795 -0.019260720
## ProductType.Laptop -0.011740126 -0.055573162 -0.005033888
## ProductType.Netbook -0.363277411 -0.060059077 -0.031923597
## ProductType.PC 0.093567251 0.317383148 0.054011618
## ProductType.Printer -0.149914932 0.757676417 0.029243566
## ProductType.PrinterSupplies -0.078826557 -0.072727018 -0.066864039
## ProductType.Smartphone -0.023391813 -0.133107177 -0.074189350
## ProductType.Software -0.065325533 -0.167879959 -0.071848214
## ProductType.Tablet 0.088889522 -0.098414265 -0.036157465
## ProductNum 0.003886211 0.081238782 0.036187970
## Price 0.068930357 0.416777401 0.010967649
## x5StarReviews 0.169541264 -0.188023980 0.066105249
## x4StarReviews 0.071415331 -0.194914094 -0.031720711
## x3StarReviews -0.056613257 -0.171842042 -0.049376503
## x2StarReviews -0.197917979 -0.128685586 -0.042636007
## x1StarReviews -0.246092974 -0.095656192 -0.034639801
## PositiveServiceReview 0.232828810 -0.270738543 -0.050526592
## NegativeServiceReview -0.188329242 -0.111793874 -0.067410452
## Recommendproduct 1.000000000 -0.126043887 0.090358266
## ShippingWeight -0.126043887 1.000000000 0.065596924
## ProductDepth 0.090358266 0.065596924 1.000000000
## ProductWidth 0.011091086 0.692473518 -0.006008512
## ProfitMargin 0.095760642 -0.079215379 -0.207176026
## Volume 0.169541264 -0.188023980 0.066105249
## ProductWidth ProfitMargin Volume
## ProductType.Accessories -0.1544624673 -0.626935212 0.127803771
## ProductType.Display 0.2844712255 0.039066904 -0.037583856
## ProductType.ExtendedWarranty -0.4364044058 0.802267233 0.070865276
## ProductType.GameConsole 0.0220142711 -0.006230125 0.388298241
## ProductType.Laptop -0.0423319723 -0.081632386 -0.069799582
## ProductType.Netbook 0.0622121883 -0.061609018 -0.070010545
## ProductType.PC 0.2021125967 -0.023802415 -0.102891676
## ProductType.Printer 0.5559815049 -0.055691552 -0.149200679
## ProductType.PrinterSupplies -0.1841834287 0.276753698 -0.090403335
## ProductType.Smartphone -0.1107452361 -0.053555434 -0.038508275
## ProductType.Software -0.1830074830 0.091501867 0.001196472
## ProductType.Tablet 0.0392811944 0.026452306 -0.050941908
## ProductNum 0.1267934273 0.039715141 0.166120763
## Price 0.3823975328 0.099669405 -0.142343990
## x5StarReviews -0.1434366092 -0.013448603 1.000000000
## x4StarReviews -0.0006476125 -0.146653802 0.879006394
## x3StarReviews -0.0188389256 -0.128706922 0.763373189
## x2StarReviews -0.0657999794 -0.090093715 0.487279328
## x1StarReviews -0.1011398264 -0.031227760 0.255023904
## PositiveServiceReview -0.3390937285 0.423591716 0.622260219
## NegativeServiceReview -0.0972071272 0.042035630 0.309418989
## Recommendproduct 0.0110910859 0.095760642 0.169541264
## ShippingWeight 0.6924735181 -0.079215379 -0.188023980
## ProductDepth -0.0060085117 -0.207176026 0.066105249
## ProductWidth 1.0000000000 -0.291436397 -0.143436609
## ProfitMargin -0.2914363968 1.000000000 -0.013448603
## Volume -0.1434366092 -0.013448603 1.000000000
#note: Correlation values fall between -1 and 1. Variables with a strong positive relationship have correlation values close to 1, and variables with a strong negative relationship have values close to -1.
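A full correlation matrix is hard to scan, so it can help to pull out only the column for the dependent variable and flag predictors that are almost perfectly correlated with it. The sketch below uses a made-up data frame in which Volume is constructed as an exact multiple of x5StarReviews; that construction, the values and the 0.95 threshold are assumptions chosen for illustration.

```r
# Toy data (hypothetical values); Volume is built as 4 * x5StarReviews
toy <- data.frame(
  x5StarReviews = c(3, 2, 49, 58, 83, 11, 33, 16),
  Price         = c(949, 2250, 410, 1080, 120, 65, 300, 45)
)
toy$Volume <- 4 * toy$x5StarReviews

corr_toy <- cor(toy)

# Correlation of each predictor with the target, largest absolute value first
target_corr <- corr_toy[setdiff(rownames(corr_toy), "Volume"), "Volume"]
target_corr[order(abs(target_corr), decreasing = TRUE)]

# Predictors whose correlation with the target is suspiciously close to +/-1
suspicious <- names(target_corr)[abs(target_corr) > 0.95]
suspicious
```

In the matrix printed above, the Volume column is identical to the x5StarReviews column, which is the kind of pattern this check is meant to surface.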
#visualize the correlation matrix using a heat map
install.packages("corrplot")
#load corrplot - a package that draws correlation matrix heat maps
library(corrplot)
## corrplot 0.84 loaded
#call the corrplot matrix for the data
corrplot(corrData)
#blue (cooler) colors show a positive relationship and red (warmer) colors indicate more negative relationships
#create training and test sets after setting the seed so that the random numbers are reproducible
set.seed(123)
#assign names and calculate the training size and test size
trainSize <- round(nrow(readyData)*0.7)
testSize <- round(nrow(readyData)- trainSize)
#check training and test size
trainSize
## [1] 56
testSize
## [1] 24
#randomly sample the row indices for the training set
training_indices <- sample(seq_len(nrow(readyData)), size = trainSize)
#assign the training and test data to trainSet and testSet
trainSet<-readyData[training_indices,]
testSet<-readyData[-training_indices,]
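The 70/30 split above can be checked with two quick sanity tests: the two sets should add up to the full data, and no row should appear in both. The sketch below repeats the same pattern on a made-up data frame of 80 rows (toy, its id column and its random values are assumptions for illustration).

```r
set.seed(123)

# Toy data with 80 rows and an id column to track rows (hypothetical)
toy <- data.frame(id = 1:80, Volume = rnorm(80))

train_size <- round(nrow(toy) * 0.7)   # 56 rows
train_idx  <- sample(seq_len(nrow(toy)), size = train_size)

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Sanity checks: the sizes add up and the sets do not overlap
nrow(toy_train) + nrow(toy_test) == nrow(toy)
length(intersect(toy_train$id, toy_test$id)) == 0
```

Setting the seed before sampling is what makes the split repeatable; running the same lines with the same seed gives the same rows in each set.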
#run linear regression model
readydata_LM<-lm(Volume ~ ., trainSet)
#check the outcome of the linear regression model
readydata_LM
##
## Call:
## lm(formula = Volume ~ ., data = trainSet)
##
## Coefficients:
## (Intercept) ProductType.Accessories
## -1.377e-12 7.275e-13
## ProductType.Display ProductType.ExtendedWarranty
## 4.620e-13 -3.395e-13
## ProductType.GameConsole ProductType.Laptop
## 1.053e-12 5.793e-13
## ProductType.Netbook ProductType.PC
## -6.979e-14 5.547e-13
## ProductType.Printer ProductType.PrinterSupplies
## 1.100e-13 -4.381e-13
## ProductType.Smartphone ProductType.Software
## -3.830e-14 3.388e-13
## ProductType.Tablet ProductNum
## NA 8.616e-15
## Price x5StarReviews
## -3.079e-16 4.000e+00
## x4StarReviews x3StarReviews
## 6.992e-17 9.073e-15
## x2StarReviews x1StarReviews
## 8.772e-16 -1.159e-15
## PositiveServiceReview NegativeServiceReview
## 1.256e-15 3.137e-17
## Recommendproduct ShippingWeight
## -2.794e-13 5.957e-15
## ProductDepth ProductWidth
## 4.160e-16 -2.053e-14
## ProfitMargin
## 2.073e-12
#get a summary of the fitted linear model
summary(readydata_LM)
## Warning in summary.lm(readydata_LM): essentially perfect fit: summary may
## be unreliable
##
## Call:
## lm(formula = Volume ~ ., data = trainSet)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.331e-12 -1.152e-13 2.310e-14 1.943e-13 1.396e-12
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.377e-12 2.002e-12 -6.880e-01 0.497
## ProductType.Accessories 7.275e-13 9.713e-13 7.490e-01 0.460
## ProductType.Display 4.620e-13 7.612e-13 6.070e-01 0.548
## ProductType.ExtendedWarranty -3.395e-13 1.549e-12 -2.190e-01 0.828
## ProductType.GameConsole 1.053e-12 9.956e-13 1.058e+00 0.299
## ProductType.Laptop 5.793e-13 9.961e-13 5.820e-01 0.565
## ProductType.Netbook -6.979e-14 9.880e-13 -7.100e-02 0.944
## ProductType.PC 5.547e-13 9.691e-13 5.720e-01 0.571
## ProductType.Printer 1.100e-13 1.186e-12 9.300e-02 0.927
## ProductType.PrinterSupplies -4.381e-13 1.318e-12 -3.320e-01 0.742
## ProductType.Smartphone -3.830e-14 7.363e-13 -5.200e-02 0.959
## ProductType.Software 3.388e-13 1.160e-12 2.920e-01 0.772
## ProductType.Tablet NA NA NA NA
## ProductNum 8.616e-15 8.942e-15 9.640e-01 0.343
## Price -3.080e-16 8.915e-16 -3.450e-01 0.732
## x5StarReviews 4.000e+00 1.387e-15 2.885e+15 <2e-16 ***
## x4StarReviews 6.992e-17 1.175e-14 6.000e-03 0.995
## x3StarReviews 9.073e-15 3.695e-14 2.460e-01 0.808
## x2StarReviews 8.772e-16 4.077e-14 2.200e-02 0.983
## x1StarReviews -1.159e-15 8.343e-15 -1.390e-01 0.890
## PositiveServiceReview 1.256e-15 2.245e-15 5.600e-01 0.580
## NegativeServiceReview 3.137e-17 2.693e-14 1.000e-03 0.999
## Recommendproduct -2.794e-13 6.903e-13 -4.050e-01 0.688
## ShippingWeight 5.957e-15 2.197e-14 2.710e-01 0.788
## ProductDepth 4.160e-16 3.175e-15 1.310e-01 0.897
## ProductWidth -2.053e-14 4.488e-14 -4.570e-01 0.651
## ProfitMargin 2.073e-12 4.011e-12 5.170e-01 0.609
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.273e-13 on 30 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 1.291e+31 on 25 and 30 DF, p-value: < 2.2e-16
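The "essentially perfect fit" warning, the coefficient of 4 on x5StarReviews and the residuals of the order of 1e-12 suggest that, in the training data, Volume is an exact multiple of x5StarReviews. The sketch below reproduces that situation on made-up data (toy, its values and the factor of 4 are assumptions for illustration) and shows one possible response, which is to leave the offending predictor out of the formula.

```r
set.seed(123)
n <- 40

# Toy data (hypothetical); Volume is constructed as 4 * x5StarReviews
toy <- data.frame(
  x5StarReviews = rpois(n, 50),
  Price         = runif(n, 10, 1000)
)
toy$Volume <- 4 * toy$x5StarReviews

# With the leaking predictor included, its coefficient is 4 up to rounding
fit_all <- lm(Volume ~ ., data = toy)
coef(fit_all)

# Without it, the model has to rely on the remaining predictors
fit_reduced <- lm(Volume ~ . - x5StarReviews, data = toy)
summary(fit_reduced)$r.squared
```

A model that simply recovers such a relationship says little about how the other attributes relate to sales, so whether to keep a predictor like this is a judgment call for the analysis.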
#non-parametric machine learning models - support vector machine, random forest and k-NN
#we will work with two of these models
#Random Forest
#set the seed so that the random split is reproducible
set.seed(123)
#assign names and calculate the training size and test size
trainSize <- round(nrow(readyData)*0.7)
testSize <- round(nrow(readyData)- trainSize)
#check training and test size
trainSize
## [1] 56
testSize
## [1] 24
#randomly sample the row indices for the training set
training_indices <- sample(seq_len(nrow(readyData)), size = trainSize)
#assign the training and test data to trainSet and testSet
trainSet<-readyData[training_indices,]
testSet<-readyData[-training_indices,]
#10 fold cross validation
fit_Control <- trainControl(method = "repeatedcv", number = 10, repeats = 1)
#train a Random Forest regression model with tuneLength = 1 (caret trains with a single mtry value)
readydata_rf <- train(Volume ~ ., data = trainSet, method = "rf", trControl = fit_Control, tuneLength = 1, importance = TRUE)
#training results
readydata_rf
## Random Forest
##
## 56 samples
## 26 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 51, 50, 50, 50, 52, 49, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 744.0774 0.9094836 365.2133
##
## Tuning parameter 'mtry' was held constant at a value of 5
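To make explicit what 10-fold cross-validation does, the sketch below carries it out by hand in base R. It is an illustration only: it uses made-up data and a simple linear model instead of the random forest trained above (toy, x, y and the noise level are all assumptions). Each fold is held out once, a model is fitted on the other nine folds, and the error on the held-out fold is recorded.

```r
set.seed(123)
n <- 56

# Toy data (hypothetical): a noisy linear relationship
toy <- data.frame(x = runif(n, 0, 100))
toy$y <- 3 * toy$x + rnorm(n, sd = 10)

k <- 10
# Assign every row to one of k folds at random
fold_id <- sample(rep(seq_len(k), length.out = n))

# Fit on k - 1 folds, predict the held-out fold, record its RMSE
fold_rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(y ~ x, data = toy[fold_id != i, ])
  pred <- predict(fit, newdata = toy[fold_id == i, ])
  sqrt(mean((toy$y[fold_id == i] - pred)^2))
})

# Average over the folds to get the cross-validated RMSE
mean(fold_rmse)
```

With 56 training rows, each fold holds only five or six observations, so the per-fold errors can vary a lot; the averaged figure is the more stable one.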
#see a summary of the model created
summary(readydata_rf)
## Length Class Mode
## call 5 -none- call
## type 1 -none- character
## predicted 56 -none- numeric
## mse 500 -none- numeric
## rsq 500 -none- numeric
## oob.times 56 -none- numeric
## importance 52 -none- numeric
## importanceSD 26 -none- numeric
## localImportance 0 -none- NULL
## proximity 0 -none- NULL
## ntree 1 -none- numeric
## mtry 1 -none- numeric
## forest 11 -none- list
## coefs 0 -none- NULL
## y 56 -none- numeric
## test 0 -none- NULL
## inbag 0 -none- NULL
## xNames 26 -none- character
## problemType 1 -none- character
## tuneValue 1 data.frame list
## obsLevels 1 -none- logical
## param 1 -none- list
#predict the sales volume for the test set
readydata_rf_predict<-predict(object = readydata_rf, newdata=testSet, na.action = na.pass)
#to see all the predictions
readydata_rf_predict
## 2 3 4 11 16 20
## 51.45960 30.86053 237.20227 175.33970 68.04251 104.56920
## 22 24 28 30 33 35
## 2416.66453 132.83141 66.16314 38.20800 48.83653 1228.60438
## 37 44 46 47 48 52
## 1228.60438 460.19996 1230.46023 867.05756 4179.61370 342.12703
## 59 60 61 63 70 76
## 89.79021 312.41008 345.58541 33.01707 22.49693 403.67973
#this tells the most important independent variables
varImp(readydata_rf)
## rf variable importance
##
## only 20 most important variables shown (out of 26)
##
## Overall
## x5StarReviews 100.00
## PositiveServiceReview 98.02
## x4StarReviews 72.82
## x3StarReviews 70.14
## x1StarReviews 63.92
## x2StarReviews 62.28
## ProductNum 51.37
## Recommendproduct 50.21
## NegativeServiceReview 43.74
## ShippingWeight 35.70
## Price 29.18
## ProfitMargin 28.66
## ProductType.Printer 25.57
## ProductType.PrinterSupplies 24.23
## ProductType.Display 20.56
## ProductType.Tablet 17.83
## ProductWidth 17.58
## ProductType.GameConsole 14.95
## ProductType.Netbook 14.48
## ProductType.Accessories 11.16
#Error check 1 - compare the test set predictions with the observed test set volumes
postResample(pred = readydata_rf_predict, obs = testSet$Volume)
## RMSE Rsquared MAE
## 456.600407 0.873047 167.923326
#Error check 2 - test of trainSet result
#predictions made on trainSet (56 rows) have to be compared with trainSet$Volume; comparing them with testSet$Volume (24 rows) recycles the shorter vector, raises "longer object length" warnings and returns NA for Rsquared
readydata_rf_predict2 <- predict(object = readydata_rf, newdata = trainSet)
postResample(pred = readydata_rf_predict2, obs = trainSet$Volume)
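postResample() pairs each prediction with one observed value, so both vectors need the same length and the same row order. The three metrics it reports can be sketched in base R as follows. `regression_metrics` is a hypothetical helper, not a caret function, and the numbers are toy values chosen only for illustration.

```r
# Hypothetical helper (not part of caret): RMSE, R-squared and MAE for two
# equal-length numeric vectors, roughly what postResample() reports for regression.
regression_metrics <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))  # refuse vectors that would be recycled
  c(RMSE = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE = mean(abs(pred - obs)))
}

# Toy values, for illustration only
obs  <- c(10, 20, 30, 40)
pred <- c(12, 18, 33, 39)
metrics <- regression_metrics(pred, obs)
print(metrics)
```

The length check stops with an error instead of recycling the shorter vector, which is the situation that produces "longer object length" warnings and an NA R-squared.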
#Error check 3 - confusion matrix - not working, because confusionMatrix() is meant for class labels while Volume is a continuous value
#confusionMatrix(table(testSet$Volume, readydata_rf_predict))
#because calling the confusion matrix directly gives an error, use the union function to build a shared set of levels
U <- union(readydata_rf_predict, testSet$Volume)
#create a table that treats both vectors as factors with the shared levels from U
#readydata_conf_matrix <- table(factor(readydata_rf_predict, U), factor(testSet$Volume, U))
#run the confusion matrix
#confusionMatrix(readydata_conf_matrix)
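A confusion matrix counts exact matches between predicted and observed classes, so it is a poor fit for a continuous target such as Volume. One possible alternative, sketched here under stated assumptions, is to group both vectors into volume bands first and tabulate the bands. The cut points (100 and 1000) and the band names are hypothetical, and the numbers are toy values.

```r
# Toy actual and predicted volumes, for illustration only
actual    <- c(12, 88, 308, 1232, 1896, 20)
predicted <- c(51.5, 130.0, 284.0, 1278.4, 1439.2, 44.0)

# Hypothetical volume bands; the cut points are an assumption
breaks <- c(-Inf, 100, 1000, Inf)
labels <- c("low", "medium", "high")

actual_band    <- cut(actual, breaks = breaks, labels = labels)
predicted_band <- cut(predicted, breaks = breaks, labels = labels)

# Both factors share the same levels, so table() returns a square matrix
band_table <- table(predicted = predicted_band, actual = actual_band)
print(band_table)

# Share of products whose predicted band matches the actual band
band_accuracy <- sum(diag(band_table)) / sum(band_table)
print(band_accuracy)
```

Because both factors are built with the same `labels`, no `union()` step is needed to align the levels.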
#KNN
#set the seed so that the random split is reproducible
#note: set.seed has to be called with a seed value; typing set.seed without parentheses only prints the function definition and leaves the split unseeded, which is how the KNN results below were produced
set.seed(123)
#calculate the training size and test size
trainSize <- round(nrow(readyData)*0.7)
testSize <- round(nrow(readyData)- trainSize)
#check training and test size
trainSize
## [1] 56
testSize
## [1] 24
#randomly sample the row indices for the training set
training_indices <- sample(seq_len(nrow(readyData)), size = trainSize)
#assign the training and test rows to trainSet and testSet
trainSet<-readyData[training_indices,]
testSet<-readyData[-training_indices,]
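The 70/30 split only gives the same rows on every run when the seed is set immediately before `sample()`. A minimal base-R sketch of that idea follows. `split_data` and `toy_data` are hypothetical names, and the toy data frame merely stands in for readyData with the same 80 rows.

```r
# Toy data frame standing in for readyData (80 rows, matching 56 + 24 above)
toy_data <- data.frame(id = 1:80, Volume = seq(10, 800, by = 10))

# Hypothetical helper: seeded train/test split
split_data <- function(data, train_fraction = 0.7, seed = 123) {
  set.seed(seed)  # the call needs parentheses and a seed value
  train_size <- round(nrow(data) * train_fraction)
  train_rows <- sample(seq_len(nrow(data)), size = train_size)
  list(train = data[train_rows, ], test = data[-train_rows, ])
}

first  <- split_data(toy_data)
second <- split_data(toy_data)

nrow(first$train)                            # 56
nrow(first$test)                             # 24
identical(first$train$id, second$train$id)   # TRUE: same seed, same rows
```

Using one seeded split for all three algorithms would also mean that their test-set metrics are computed on the same 24 products.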
#10 fold cross validation
fit_Control <- trainControl(method = "repeatedcv", number = 10, repeats = 1)
#train a KNN regression model with tuneLength = 1 (trains with a single value of k)
readydata_KNN <- train(Volume ~ ., data = trainSet, method = "knn", trControl=fit_Control, tuneLength = 1)
#training results
readydata_KNN
## k-Nearest Neighbors
##
## 56 samples
## 26 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 52, 50, 51, 52, 51, 49, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 409.2678 0.9129648 237.8632
##
## Tuning parameter 'k' was held constant at a value of 5
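With k held at 5, each KNN prediction is the mean Volume of the five closest training products. The idea can be sketched in base R as below. `knn_predict_one` is a hypothetical helper, not the caret implementation; it uses Euclidean distance on unscaled predictors and toy data.

```r
# Hypothetical helper: predict one new point as the mean target of its k nearest
# training rows (Euclidean distance on the raw, unscaled predictors).
knn_predict_one <- function(train_x, train_y, new_x, k = 5) {
  diffs <- sweep(as.matrix(train_x), 2, as.numeric(new_x))  # subtract new_x from every row
  distances <- sqrt(rowSums(diffs^2))
  nearest <- order(distances)[seq_len(k)]
  mean(train_y[nearest])
}

# Toy data, for illustration only
train_x <- data.frame(reviews = c(1, 2, 3, 10, 11, 12),
                      price   = c(5, 5, 6, 50, 52, 51))
train_y <- c(10, 20, 30, 100, 110, 120)

pred_low  <- knn_predict_one(train_x, train_y, new_x = c(2, 5), k = 3)
pred_high <- knn_predict_one(train_x, train_y, new_x = c(11, 51), k = 3)
print(c(pred_low, pred_high))
```

Since the distance is computed on raw values, predictors with large ranges dominate the neighbour search. The training output reports "No pre-processing", so centring and scaling the predictors may be worth testing.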
#see a summary of the fitted model object
summary(readydata_KNN)
## Length Class Mode
## learn 2 -none- list
## k 1 -none- numeric
## theDots 0 -none- list
## xNames 26 -none- character
## problemType 1 -none- character
## tuneValue 1 data.frame list
## obsLevels 1 -none- logical
## param 0 -none- list
#predict the findings
readydata_KNN_predict<-predict(object = readydata_KNN, newdata=testSet, na.action = na.pass)
#to see all the predictions
readydata_KNN_predict
## [1] 231.2 90.4 44.0 1227.2 1150.4 44.8 284.0 94.4 40.8 44.0
## [11] 1278.4 1278.4 1278.4 1322.4 240.8 2794.4 1439.2 129.6 46.4 46.4
## [21] 97.6 85.6 90.4 90.4
#this tells the most important independent variables
varImp(readydata_KNN)
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 26)
##
## Overall
## x5StarReviews 100.0000
## x4StarReviews 81.7153
## x3StarReviews 64.5191
## ProductType.GameConsole 46.6173
## x2StarReviews 38.8412
## NegativeServiceReview 34.5710
## PositiveServiceReview 32.0117
## ProductNum 31.1917
## ShippingWeight 8.9524
## ProductDepth 7.8388
## Recommendproduct 4.4946
## Price 3.9370
## ProfitMargin 3.3200
## ProductWidth 3.2377
## ProductType.Printer 1.9986
## ProductType.PC 1.8028
## ProductType.Netbook 0.6405
## ProductType.PrinterSupplies 0.6001
## ProductType.Accessories 0.5932
## ProductType.ExtendedWarranty 0.4183
#Error check 1 - test of testSet result (predictions first, observed values second)
postResample(pred = readydata_KNN_predict, obs = testSet$Volume)
## RMSE Rsquared MAE
## 1723.3695754 0.6703293 440.4333333
#Error check 2 - test of trainSet result
#predictions made on trainSet (56 rows) have to be compared with trainSet$Volume; comparing them with testSet$Volume (24 rows) recycles the shorter vector, raises "longer object length" warnings and returns NA for Rsquared
readydata_KNN_predict2 <- predict(object = readydata_KNN, newdata = trainSet)
postResample(pred = readydata_KNN_predict2, obs = trainSet$Volume)
#Error check 3 - confusion matrix - not yet working
#confusionMatrix(table(testSet$Volume, readydata_KNN_predict))
#because calling the confusion matrix directly gives an error, use the union function to build a shared set of levels
U <- union(readydata_KNN_predict, testSet$Volume)
#create a table that treats both vectors as factors with the shared levels from U
readydata_conf_matrix <- table(factor(readydata_KNN_predict, U), factor(testSet$Volume, U))
#run the confusion matrix (accuracy is 0 because a continuous prediction never exactly equals an observed volume, so this check is not informative for regression)
confusionMatrix(readydata_conf_matrix)
## Confusion Matrix and Statistics
##
##
## 231.2 90.4 44 1227.2 1150.4 44.8 284 94.4 40.8 1278.4 1322.4
## 231.2 0 0 0 0 0 0 0 0 0 0 0
## 90.4 0 0 0 0 0 0 0 0 0 0 0
## 44 0 0 0 0 0 0 0 0 0 0 0
## 1227.2 0 0 0 0 0 0 0 0 0 0 0
## 1150.4 0 0 0 0 0 0 0 0 0 0 0
## 44.8 0 0 0 0 0 0 0 0 0 0 0
## 284 0 0 0 0 0 0 0 0 0 0 0
## 94.4 0 0 0 0 0 0 0 0 0 0 0
## 40.8 0 0 0 0 0 0 0 0 0 0 0
## 1278.4 0 0 0 0 0 0 0 0 0 0 0
## 1322.4 0 0 0 0 0 0 0 0 0 0 0
## 240.8 0 0 0 0 0 0 0 0 0 0 0
## 2794.4 0 0 0 0 0 0 0 0 0 0 0
## 1439.2 0 0 0 0 0 0 0 0 0 0 0
## 129.6 0 0 0 0 0 0 0 0 0 0 0
## 46.4 0 0 0 0 0 0 0 0 0 0 0
## 97.6 0 0 0 0 0 0 0 0 0 0 0
## 85.6 0 0 0 0 0 0 0 0 0 0 0
## 12 0 0 0 0 0 0 0 0 0 0 0
## 196 0 0 0 0 0 0 0 0 0 0 0
## 64 0 0 0 0 0 0 0 0 0 0 0
## 1252 0 0 0 0 0 0 0 0 0 0 0
## 680 0 0 0 0 0 0 0 0 0 0 0
## 60 0 0 0 0 0 0 0 0 0 0 0
## 308 0 0 0 0 0 0 0 0 0 0 0
## 88 0 0 0 0 0 0 0 0 0 0 0
## 0 0 0 0 0 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 0 0 0 0 0
## 1232 0 0 0 0 0 0 0 0 0 0 0
## 11204 0 0 0 0 0 0 0 0 0 0 0
## 1896 0 0 0 0 0 0 0 0 0 0 0
## 232 0 0 0 0 0 0 0 0 0 0 0
## 32 0 0 0 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0 0
## 16 0 0 0 0 0 0 0 0 0 0 0
##
## 240.8 2794.4 1439.2 129.6 46.4 97.6 85.6 12 196 64 1252 680 60
## 231.2 0 0 0 0 0 0 0 1 0 0 0 0 0
## 90.4 0 0 0 0 0 0 0 1 1 0 0 0 0
## 44 0 0 0 0 0 0 0 0 0 1 0 0 0
## 1227.2 0 0 0 0 0 0 0 0 0 0 1 0 0
## 1150.4 0 0 0 0 0 0 0 0 0 0 0 1 0
## 44.8 0 0 0 0 0 0 0 0 0 0 0 0 1
## 284 0 0 0 0 0 0 0 0 0 0 0 0 0
## 94.4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 40.8 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1278.4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1322.4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 240.8 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2794.4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1439.2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 129.6 0 0 0 0 0 0 0 0 0 0 0 0 0
## 46.4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 97.6 0 0 0 0 0 0 0 0 0 0 0 0 0
## 85.6 0 0 0 0 0 0 0 0 0 0 0 0 0
## 12 0 0 0 0 0 0 0 0 0 0 0 0 0
## 196 0 0 0 0 0 0 0 0 0 0 0 0 0
## 64 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1252 0 0 0 0 0 0 0 0 0 0 0 0 0
## 680 0 0 0 0 0 0 0 0 0 0 0 0 0
## 60 0 0 0 0 0 0 0 0 0 0 0 0 0
## 308 0 0 0 0 0 0 0 0 0 0 0 0 0
## 88 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1232 0 0 0 0 0 0 0 0 0 0 0 0 0
## 11204 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1896 0 0 0 0 0 0 0 0 0 0 0 0 0
## 232 0 0 0 0 0 0 0 0 0 0 0 0 0
## 32 0 0 0 0 0 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0 0 0 0
## 16 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## 308 88 0 20 1232 11204 1896 232 32 8 16
## 231.2 0 0 0 0 0 0 0 0 0 0 0
## 90.4 0 1 0 0 0 0 0 0 0 0 0
## 44 0 0 0 1 0 0 0 0 0 0 0
## 1227.2 0 0 0 0 0 0 0 0 0 0 0
## 1150.4 0 0 0 0 0 0 0 0 0 0 0
## 44.8 0 0 0 0 0 0 0 0 0 0 0
## 284 1 0 0 0 0 0 0 0 0 0 0
## 94.4 0 1 0 0 0 0 0 0 0 0 0
## 40.8 0 0 1 0 0 0 0 0 0 0 0
## 1278.4 0 0 0 0 3 0 0 0 0 0 0
## 1322.4 0 0 0 0 1 0 0 0 0 0 0
## 240.8 0 1 0 0 0 0 0 0 0 0 0
## 2794.4 0 0 0 0 0 1 0 0 0 0 0
## 1439.2 0 0 0 0 0 0 1 0 0 0 0
## 129.6 0 0 0 0 0 0 0 1 0 0 0
## 46.4 0 0 0 0 0 0 0 0 1 1 0
## 97.6 0 0 0 0 0 0 0 0 1 0 0
## 85.6 0 0 0 0 0 0 0 0 0 0 1
## 12 0 0 0 0 0 0 0 0 0 0 0
## 196 0 0 0 0 0 0 0 0 0 0 0
## 64 0 0 0 0 0 0 0 0 0 0 0
## 1252 0 0 0 0 0 0 0 0 0 0 0
## 680 0 0 0 0 0 0 0 0 0 0 0
## 60 0 0 0 0 0 0 0 0 0 0 0
## 308 0 0 0 0 0 0 0 0 0 0 0
## 88 0 0 0 0 0 0 0 0 0 0 0
## 0 0 0 0 0 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 0 0 0 0 0
## 1232 0 0 0 0 0 0 0 0 0 0 0
## 11204 0 0 0 0 0 0 0 0 0 0 0
## 1896 0 0 0 0 0 0 0 0 0 0 0
## 232 0 0 0 0 0 0 0 0 0 0 0
## 32 0 0 0 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0 0
## 16 0 0 0 0 0 0 0 0 0 0 0
##
## Overall Statistics
##
## Accuracy : 0
## 95% CI : (0, 0.1425)
## No Information Rate : 0.1667
## P-Value [Acc > NIR] : 1
##
## Kappa : 0
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 231.2 Class: 90.4 Class: 44 Class: 1227.2
## Sensitivity NA NA NA NA
## Specificity 0.95833 0.875 0.91667 0.95833
## Pos Pred Value NA NA NA NA
## Neg Pred Value NA NA NA NA
## Prevalence 0.00000 0.000 0.00000 0.00000
## Detection Rate 0.00000 0.000 0.00000 0.00000
## Detection Prevalence 0.04167 0.125 0.08333 0.04167
## Balanced Accuracy NA NA NA NA
## Class: 1150.4 Class: 44.8 Class: 284 Class: 94.4
## Sensitivity NA NA NA NA
## Specificity 0.95833 0.95833 0.95833 0.95833
## Pos Pred Value NA NA NA NA
## Neg Pred Value NA NA NA NA
## Prevalence 0.00000 0.00000 0.00000 0.00000
## Detection Rate 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167 0.04167 0.04167
## Balanced Accuracy NA NA NA NA
## Class: 40.8 Class: 1278.4 Class: 1322.4 Class: 240.8
## Sensitivity NA NA NA NA
## Specificity 0.95833 0.875 0.95833 0.95833
## Pos Pred Value NA NA NA NA
## Neg Pred Value NA NA NA NA
## Prevalence 0.00000 0.000 0.00000 0.00000
## Detection Rate 0.00000 0.000 0.00000 0.00000
## Detection Prevalence 0.04167 0.125 0.04167 0.04167
## Balanced Accuracy NA NA NA NA
## Class: 2794.4 Class: 1439.2 Class: 129.6 Class: 46.4
## Sensitivity NA NA NA NA
## Specificity 0.95833 0.95833 0.95833 0.91667
## Pos Pred Value NA NA NA NA
## Neg Pred Value NA NA NA NA
## Prevalence 0.00000 0.00000 0.00000 0.00000
## Detection Rate 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167 0.04167 0.08333
## Balanced Accuracy NA NA NA NA
## Class: 97.6 Class: 85.6 Class: 12 Class: 196
## Sensitivity NA NA 0.00000 0.00000
## Specificity 0.95833 0.95833 1.00000 1.00000
## Pos Pred Value NA NA NaN NaN
## Neg Pred Value NA NA 0.91667 0.95833
## Prevalence 0.00000 0.00000 0.08333 0.04167
## Detection Rate 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167 0.00000 0.00000
## Balanced Accuracy NA NA 0.50000 0.50000
## Class: 64 Class: 1252 Class: 680 Class: 60 Class: 308
## Sensitivity 0.00000 0.00000 0.00000 0.00000 0.00000
## Specificity 1.00000 1.00000 1.00000 1.00000 1.00000
## Pos Pred Value NaN NaN NaN NaN NaN
## Neg Pred Value 0.95833 0.95833 0.95833 0.95833 0.95833
## Prevalence 0.04167 0.04167 0.04167 0.04167 0.04167
## Detection Rate 0.00000 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.00000 0.00000 0.00000 0.00000 0.00000
## Balanced Accuracy 0.50000 0.50000 0.50000 0.50000 0.50000
## Class: 88 Class: 0 Class: 20 Class: 1232 Class: 11204
## Sensitivity 0.000 0.00000 0.00000 0.0000 0.00000
## Specificity 1.000 1.00000 1.00000 1.0000 1.00000
## Pos Pred Value NaN NaN NaN NaN NaN
## Neg Pred Value 0.875 0.95833 0.95833 0.8333 0.95833
## Prevalence 0.125 0.04167 0.04167 0.1667 0.04167
## Detection Rate 0.000 0.00000 0.00000 0.0000 0.00000
## Detection Prevalence 0.000 0.00000 0.00000 0.0000 0.00000
## Balanced Accuracy 0.500 0.50000 0.50000 0.5000 0.50000
## Class: 1896 Class: 232 Class: 32 Class: 8 Class: 16
## Sensitivity 0.00000 0.00000 0.00000 0.00000 0.00000
## Specificity 1.00000 1.00000 1.00000 1.00000 1.00000
## Pos Pred Value NaN NaN NaN NaN NaN
## Neg Pred Value 0.95833 0.95833 0.91667 0.95833 0.95833
## Prevalence 0.04167 0.04167 0.08333 0.04167 0.04167
## Detection Rate 0.00000 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.00000 0.00000 0.00000 0.00000 0.00000
## Balanced Accuracy 0.50000 0.50000 0.50000 0.50000 0.50000
#svm
#set the seed so that the random split is reproducible
set.seed(123)
#calculate the training size and test size
trainSize <- round(nrow(readyData)*0.7)
testSize <- round(nrow(readyData)- trainSize)
#check training and test size
trainSize
## [1] 56
testSize
## [1] 24
#randomly sample the row indices for the training set
training_indices <- sample(seq_len(nrow(readyData)), size = trainSize)
#assign the training and test rows to trainSet and testSet
trainSet<-readyData[training_indices,]
testSet<-readyData[-training_indices,]
#10 fold cross validation
fit_Control <- trainControl(method = "repeatedcv", number = 10, repeats = 1)
#train a linear-kernel SVM regression model (no tuneLength is given, so the cost parameter C stays at its default of 1)
readydata_svm <- caret::train(Volume ~ .,
data = trainSet,
method = 'svmLinear',
trControl=fit_Control)
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
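The "constant. Cannot scale data" warning suggests that at least one predictor holds a single value in the rows being fitted, most likely a dummy column for a product type that is absent from those rows. A base-R sketch for finding such columns follows. `constant_columns` is a hypothetical helper and the data frame is a toy example.

```r
# Hypothetical helper: names of columns that hold a single distinct value
constant_columns <- function(data) {
  is_constant <- vapply(data, function(column) length(unique(column)) <= 1, logical(1))
  names(data)[is_constant]
}

# Toy training set, for illustration only: one dummy column is all zeros
toy_train <- data.frame(
  Price = c(100, 250, 80, 40),
  ProductType.Tablet = c(0, 1, 0, 0),
  ProductType.Software = c(0, 0, 0, 0)
)

dropped <- constant_columns(toy_train)
print(dropped)

# Keep only the columns that vary
toy_train_clean <- toy_train[, setdiff(names(toy_train), dropped), drop = FALSE]
```

With 10-fold cross-validation on 56 rows, a column can also be constant inside a single fold while varying across the full training set, so the check may need to be repeated per resample.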
#training results - run the findings
readydata_svm
## Support Vector Machines with Linear Kernel
##
## 56 samples
## 26 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 51, 50, 50, 50, 52, 49, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 297.188 0.9449585 203.8192
##
## Tuning parameter 'C' was held constant at a value of 1
summary(readydata_svm)
## Length Class Mode
## 1 ksvm S4
#predict the findings
readydata_svm_predict<-predict(object = readydata_svm, newdata=testSet, na.action = na.pass)
#to see all the predictions
readydata_svm_predict
## 2 3 4 11 16 20
## -111.34558 -139.76812 69.13923 -139.55534 -77.01048 -102.91078
## 22 24 28 30 33 35
## 2050.59371 13.55979 74.91169 -18.07959 172.93467 1058.55421
## 37 44 46 47 48 52
## 1064.13237 363.76145 1433.87597 1244.44290 3850.96163 326.11444
## 59 60 61 63 70 76
## 105.29633 415.54477 424.60908 106.74523 -41.21731 446.69317
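Several of the SVM predictions are negative, which is not a possible sales volume. One simple post-processing option, offered here as an assumption rather than a step taken in this analysis, is to floor the predictions at zero with `pmax()`.

```r
# A few of the SVM predictions shown above
svm_predictions <- c(-111.34558, -139.76812, 69.13923, 2050.59371)

# Replace negative volumes with zero; non-negative values pass through unchanged
clamped <- pmax(svm_predictions, 0)
print(clamped)
```

Flooring changes the error metrics, so postResample() would have to be rerun on the clamped vector before comparing models.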
#this tells the most important independent variables
varImp(readydata_svm)
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 26)
##
## Overall
## x5StarReviews 100.0000
## x4StarReviews 94.4107
## PositiveServiceReview 81.5216
## x3StarReviews 71.0288
## x1StarReviews 61.2867
## x2StarReviews 47.6658
## NegativeServiceReview 32.1437
## ProductType.GameConsole 15.2493
## ProductNum 14.1920
## ProductWidth 7.7526
## ShippingWeight 7.6467
## ProductDepth 6.4082
## Price 3.8756
## Recommendproduct 3.5714
## ProductType.Printer 2.2693
## ProfitMargin 2.2596
## ProductType.Accessories 1.1128
## ProductType.PrinterSupplies 0.7925
## ProductType.PC 0.7197
## ProductType.Laptop 0.5254
#Error check 1 - test of testSet result (predictions first, observed values second)
postResample(pred = readydata_svm_predict, obs = testSet$Volume)
## RMSE Rsquared MAE
## 391.4194368 0.8898941 206.9953504
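The test-set results of the three algorithms are easier to compare side by side. The sketch below collects the postResample() figures reported in this document into one data frame and ranks them. The KNN row index labels differ from those of the other two models, so its figures appear to come from a different train/test split and the comparison is indicative only.

```r
# Test-set metrics copied from the postResample() outputs in this report
test_metrics <- data.frame(
  model = c("rf", "knn", "svmLinear"),
  RMSE = c(456.600407, 1723.3695754, 391.4194368),
  Rsquared = c(0.873047, 0.6703293, 0.8898941),
  MAE = c(167.923326, 440.4333333, 206.9953504),
  stringsAsFactors = FALSE
)

# Sort by RMSE, lowest first
ranked <- test_metrics[order(test_metrics$RMSE), ]
print(ranked)

# The best model depends on the metric chosen
best_by_rmse <- ranked$model[1]
best_by_mae  <- test_metrics$model[which.min(test_metrics$MAE)]
```

On these figures the linear SVM has the lowest RMSE and the highest R-squared, while the random forest has the lowest MAE.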
#Error check 2 - test of trainSet result
#predictions made on trainSet (56 rows) have to be compared with trainSet$Volume; comparing them with testSet$Volume (24 rows) recycles the shorter vector, raises "longer object length" warnings and returns NA for Rsquared
readydata_svm_predict2 <- predict(object = readydata_svm, newdata = trainSet)
postResample(pred = readydata_svm_predict2, obs = trainSet$Volume)
#Error check 3 - confusion matrix - not yet working
#round(prop.table(confusionMatrix(testSet$Volume, readydata_svm_predict)$table))
#confusionMatrix(table(testSet$Volume, readydata_svm_predict))
#because calling the confusion matrix directly gives an error, use the union function to build a shared set of levels
U <- union(readydata_svm_predict, testSet$Volume)
#create a table that treats both vectors as factors with the shared levels from U
readydata_conf_matrix <- table(factor(readydata_svm_predict, U), factor(testSet$Volume, U))
#run the confusion matrix (accuracy is 0 because a continuous prediction never exactly equals an observed volume, so this check is not informative for regression)
confusionMatrix(readydata_conf_matrix)
## Confusion Matrix and Statistics
##
##
## -111.345580871333 -139.768119082221 69.1392320836945
## -111.345580871333 0 0 0
## -139.768119082221 0 0 0
## 69.1392320836945 0 0 0
## -139.555340084582 0 0 0
## -77.0104840810835 0 0 0
## -102.910781300902 0 0 0
## 2050.5937087161 0 0 0
## 13.5597926256926 0 0 0
## 74.9116912054817 0 0 0
## -18.0795931178567 0 0 0
## 172.934673095712 0 0 0
## 1058.5542072308 0 0 0
## 1064.13237335324 0 0 0
## 363.76145495496 0 0 0
## 1433.87597119037 0 0 0
## 1244.44290005606 0 0 0
## 3850.96162627287 0 0 0
## 326.114439752832 0 0 0
## 105.296328373813 0 0 0
## 415.544773264371 0 0 0
## 424.609077423724 0 0 0
## 106.745231243858 0 0 0
## -41.2173078999225 0 0 0
## 446.693167676331 0 0 0
## 8 0 0 0
## 12 0 0 0
## 196 0 0 0
## 84 0 0 0
## 32 0 0 0
## 80 0 0 0
## 1576 0 0 0
## 116 0 0 0
## 88 0 0 0
## 24 0 0 0
## 20 0 0 0
## 1232 0 0 0
## 368 0 0 0
## 1464 0 0 0
## 836 0 0 0
## 2140 0 0 0
## 204 0 0 0
## 296 0 0 0
## 232 0 0 0
## 4 0 0 0
## 248 0 0 0
##
## -139.555340084582 -77.0104840810835 -102.910781300902
## -111.345580871333 0 0 0
## -139.768119082221 0 0 0
## 69.1392320836945 0 0 0
## -139.555340084582 0 0 0
## -77.0104840810835 0 0 0
## -102.910781300902 0 0 0
## 2050.5937087161 0 0 0
## 13.5597926256926 0 0 0
## 74.9116912054817 0 0 0
## -18.0795931178567 0 0 0
## 172.934673095712 0 0 0
## 1058.5542072308 0 0 0
## 1064.13237335324 0 0 0
## 363.76145495496 0 0 0
## 1433.87597119037 0 0 0
## 1244.44290005606 0 0 0
## 3850.96162627287 0 0 0
## 326.114439752832 0 0 0
## 105.296328373813 0 0 0
## 415.544773264371 0 0 0
## 424.609077423724 0 0 0
## 106.745231243858 0 0 0
## -41.2173078999225 0 0 0
## 446.693167676331 0 0 0
## 8 0 0 0
## 12 0 0 0
## 196 0 0 0
## 84 0 0 0
## 32 0 0 0
## 80 0 0 0
## 1576 0 0 0
## 116 0 0 0
## 88 0 0 0
## 24 0 0 0
## 20 0 0 0
## 1232 0 0 0
## 368 0 0 0
## 1464 0 0 0
## 836 0 0 0
## 2140 0 0 0
## 204 0 0 0
## 296 0 0 0
## 232 0 0 0
## 4 0 0 0
## 248 0 0 0
##
## 2050.5937087161 13.5597926256926 74.9116912054817
## -111.345580871333 0 0 0
## -139.768119082221 0 0 0
## 69.1392320836945 0 0 0
## -139.555340084582 0 0 0
## -77.0104840810835 0 0 0
## -102.910781300902 0 0 0
## 2050.5937087161 0 0 0
## 13.5597926256926 0 0 0
## 74.9116912054817 0 0 0
## -18.0795931178567 0 0 0
## 172.934673095712 0 0 0
## 1058.5542072308 0 0 0
## 1064.13237335324 0 0 0
## 363.76145495496 0 0 0
## 1433.87597119037 0 0 0
## 1244.44290005606 0 0 0
## 3850.96162627287 0 0 0
## 326.114439752832 0 0 0
## 105.296328373813 0 0 0
## 415.544773264371 0 0 0
## 424.609077423724 0 0 0
## 106.745231243858 0 0 0
## -41.2173078999225 0 0 0
## 446.693167676331 0 0 0
## 8 0 0 0
## 12 0 0 0
## 196 0 0 0
## 84 0 0 0
## 32 0 0 0
## 80 0 0 0
## 1576 0 0 0
## 116 0 0 0
## 88 0 0 0
## 24 0 0 0
## 20 0 0 0
## 1232 0 0 0
## 368 0 0 0
## 1464 0 0 0
## 836 0 0 0
## 2140 0 0 0
## 204 0 0 0
## 296 0 0 0
## 232 0 0 0
## 4 0 0 0
## 248 0 0 0
##
## -18.0795931178567 172.934673095712 1058.5542072308
## -111.345580871333 0 0 0
## -139.768119082221 0 0 0
## 69.1392320836945 0 0 0
## -139.555340084582 0 0 0
## -77.0104840810835 0 0 0
## -102.910781300902 0 0 0
## 2050.5937087161 0 0 0
## 13.5597926256926 0 0 0
## 74.9116912054817 0 0 0
## -18.0795931178567 0 0 0
## 172.934673095712 0 0 0
## 1058.5542072308 0 0 0
## 1064.13237335324 0 0 0
## 363.76145495496 0 0 0
## 1433.87597119037 0 0 0
## 1244.44290005606 0 0 0
## 3850.96162627287 0 0 0
## 326.114439752832 0 0 0
## 105.296328373813 0 0 0
## 415.544773264371 0 0 0
## 424.609077423724 0 0 0
## 106.745231243858 0 0 0
## -41.2173078999225 0 0 0
## 446.693167676331 0 0 0
## 8 0 0 0
## 12 0 0 0
## 196 0 0 0
## 84 0 0 0
## 32 0 0 0
## 80 0 0 0
## 1576 0 0 0
## 116 0 0 0
## 88 0 0 0
## 24 0 0 0
## 20 0 0 0
## 1232 0 0 0
## 368 0 0 0
## 1464 0 0 0
## 836 0 0 0
## 2140 0 0 0
## 204 0 0 0
## 296 0 0 0
## 232 0 0 0
## 4 0 0 0
## 248 0 0 0
##
## 1064.13237335324 363.76145495496 1433.87597119037
## -111.345580871333 0 0 0
## -139.768119082221 0 0 0
## 69.1392320836945 0 0 0
## -139.555340084582 0 0 0
## -77.0104840810835 0 0 0
## -102.910781300902 0 0 0
## 2050.5937087161 0 0 0
## 13.5597926256926 0 0 0
## 74.9116912054817 0 0 0
## -18.0795931178567 0 0 0
## 172.934673095712 0 0 0
## 1058.5542072308 0 0 0
## 1064.13237335324 0 0 0
## 363.76145495496 0 0 0
## 1433.87597119037 0 0 0
## 1244.44290005606 0 0 0
## 3850.96162627287 0 0 0
## 326.114439752832 0 0 0
## 105.296328373813 0 0 0
## 415.544773264371 0 0 0
## 424.609077423724 0 0 0
## 106.745231243858 0 0 0
## -41.2173078999225 0 0 0
## 446.693167676331 0 0 0
## 8 0 0 0
## 12 0 0 0
## 196 0 0 0
## 84 0 0 0
## 32 0 0 0
## 80 0 0 0
## 1576 0 0 0
## 116 0 0 0
## 88 0 0 0
## 24 0 0 0
## 20 0 0 0
## 1232 0 0 0
## 368 0 0 0
## 1464 0 0 0
## 836 0 0 0
## 2140 0 0 0
## 204 0 0 0
## 296 0 0 0
## 232 0 0 0
## 4 0 0 0
## 248 0 0 0
##
## 1244.44290005606 3850.96162627287 326.114439752832
## -111.345580871333 0 0 0
## -139.768119082221 0 0 0
## 69.1392320836945 0 0 0
## -139.555340084582 0 0 0
## -77.0104840810835 0 0 0
## -102.910781300902 0 0 0
## 2050.5937087161 0 0 0
## 13.5597926256926 0 0 0
## 74.9116912054817 0 0 0
## -18.0795931178567 0 0 0
## 172.934673095712 0 0 0
## 1058.5542072308 0 0 0
## 1064.13237335324 0 0 0
## 363.76145495496 0 0 0
## 1433.87597119037 0 0 0
## 1244.44290005606 0 0 0
## 3850.96162627287 0 0 0
## 326.114439752832 0 0 0
## 105.296328373813 0 0 0
## 415.544773264371 0 0 0
## 424.609077423724 0 0 0
## 106.745231243858 0 0 0
## -41.2173078999225 0 0 0
## 446.693167676331 0 0 0
## 8 0 0 0
## 12 0 0 0
## 196 0 0 0
## 84 0 0 0
## 32 0 0 0
## 80 0 0 0
## 1576 0 0 0
## 116 0 0 0
## 88 0 0 0
## 24 0 0 0
## 20 0 0 0
## 1232 0 0 0
## 368 0 0 0
## 1464 0 0 0
## 836 0 0 0
## 2140 0 0 0
## 204 0 0 0
## 296 0 0 0
## 232 0 0 0
## 4 0 0 0
## 248 0 0 0
##
## 105.296328373813 415.544773264371 424.609077423724
## -111.345580871333 0 0 0
## -139.768119082221 0 0 0
## 69.1392320836945 0 0 0
## -139.555340084582 0 0 0
## -77.0104840810835 0 0 0
## -102.910781300902 0 0 0
## 2050.5937087161 0 0 0
## 13.5597926256926 0 0 0
## 74.9116912054817 0 0 0
## -18.0795931178567 0 0 0
## 172.934673095712 0 0 0
## 1058.5542072308 0 0 0
## 1064.13237335324 0 0 0
## 363.76145495496 0 0 0
## 1433.87597119037 0 0 0
## 1244.44290005606 0 0 0
## 3850.96162627287 0 0 0
## 326.114439752832 0 0 0
## 105.296328373813 0 0 0
## 415.544773264371 0 0 0
## 424.609077423724 0 0 0
## 106.745231243858 0 0 0
## -41.2173078999225 0 0 0
## 446.693167676331 0 0 0
## 8 0 0 0
## 12 0 0 0
## 196 0 0 0
## 84 0 0 0
## 32 0 0 0
## 80 0 0 0
## 1576 0 0 0
## 116 0 0 0
## 88 0 0 0
## 24 0 0 0
## 20 0 0 0
## 1232 0 0 0
## 368 0 0 0
## 1464 0 0 0
## 836 0 0 0
## 2140 0 0 0
## 204 0 0 0
## 296 0 0 0
## 232 0 0 0
## 4 0 0 0
## 248 0 0 0
##
## 106.745231243858 -41.2173078999225 446.693167676331 8
## -111.345580871333 0 0 0 1
## -139.768119082221 0 0 0 0
## 69.1392320836945 0 0 0 0
## -139.555340084582 0 0 0 0
## -77.0104840810835 0 0 0 0
## -102.910781300902 0 0 0 0
## 2050.5937087161 0 0 0 0
## 13.5597926256926 0 0 0 0
## 74.9116912054817 0 0 0 0
## -18.0795931178567 0 0 0 0
## 172.934673095712 0 0 0 0
## 1058.5542072308 0 0 0 0
## 1064.13237335324 0 0 0 0
## 363.76145495496 0 0 0 0
## 1433.87597119037 0 0 0 0
## 1244.44290005606 0 0 0 0
## 3850.96162627287 0 0 0 0
## 326.114439752832 0 0 0 0
## 105.296328373813 0 0 0 0
## 415.544773264371 0 0 0 0
## 424.609077423724 0 0 0 0
## 106.745231243858 0 0 0 0
## -41.2173078999225 0 0 0 0
## 446.693167676331 0 0 0 0
## 8 0 0 0 0
## 12 0 0 0 0
## 196 0 0 0 0
## 84 0 0 0 0
## 32 0 0 0 0
## 80 0 0 0 0
## 1576 0 0 0 0
## 116 0 0 0 0
## 88 0 0 0 0
## 24 0 0 0 0
## 20 0 0 0 0
## 1232 0 0 0 0
## 368 0 0 0 0
## 1464 0 0 0 0
## 836 0 0 0 0
## 2140 0 0 0 0
## 204 0 0 0 0
## 296 0 0 0 0
## 232 0 0 0 0
## 4 0 0 0 0
## 248 0 0 0 0
##
## 12 196 84 32 80 1576 116 88 24 20 1232 368 1464 836
## -111.345580871333 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## -139.768119082221 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 69.1392320836945 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## -139.555340084582 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## -77.0104840810835 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## -102.910781300902 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## 2050.5937087161 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## 13.5597926256926 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## 74.9116912054817 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## -18.0795931178567 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## 172.934673095712 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## 1058.5542072308 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## 1064.13237335324 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## 363.76145495496 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## 1433.87597119037 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## 1244.44290005606 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## 3850.96162627287 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 326.114439752832 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 105.296328373813 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## 415.544773264371 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 424.609077423724 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 106.745231243858 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## -41.2173078999225 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 446.693167676331 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 196 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 84 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1576 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 116 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1232 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 368 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1464 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 836 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2140 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 204 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 296 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 232 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 248 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## 2140 204 296 232 4 248
## -111.345580871333 0 0 0 0 0 0
## -139.768119082221 0 0 0 0 0 0
## 69.1392320836945 0 0 0 0 0 0
## -139.555340084582 0 0 0 0 0 0
## -77.0104840810835 0 0 0 0 0 0
## -102.910781300902 0 0 0 0 0 0
## 2050.5937087161 0 0 0 0 0 0
## 13.5597926256926 0 0 0 0 0 0
## 74.9116912054817 0 0 0 0 0 0
## -18.0795931178567 0 0 0 0 0 0
## 172.934673095712 0 0 0 0 0 0
## 1058.5542072308 0 0 0 0 0 0
## 1064.13237335324 0 0 0 0 0 0
## 363.76145495496 0 0 0 0 0 0
## 1433.87597119037 0 0 0 0 0 0
## 1244.44290005606 0 0 0 0 0 0
## 3850.96162627287 1 0 0 0 0 0
## 326.114439752832 0 1 0 0 0 0
## 105.296328373813 0 0 0 0 0 0
## 415.544773264371 0 0 1 0 0 0
## 424.609077423724 0 0 0 1 0 0
## 106.745231243858 0 0 0 0 0 0
## -41.2173078999225 0 0 0 0 1 0
## 446.693167676331 0 0 0 0 0 1
## 8 0 0 0 0 0 0
## 12 0 0 0 0 0 0
## 196 0 0 0 0 0 0
## 84 0 0 0 0 0 0
## 32 0 0 0 0 0 0
## 80 0 0 0 0 0 0
## 1576 0 0 0 0 0 0
## 116 0 0 0 0 0 0
## 88 0 0 0 0 0 0
## 24 0 0 0 0 0 0
## 20 0 0 0 0 0 0
## 1232 0 0 0 0 0 0
## 368 0 0 0 0 0 0
## 1464 0 0 0 0 0 0
## 836 0 0 0 0 0 0
## 2140 0 0 0 0 0 0
## 204 0 0 0 0 0 0
## 296 0 0 0 0 0 0
## 232 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 248 0 0 0 0 0 0
##
## Overall Statistics
##
## Accuracy : 0
## 95% CI : (0, 0.1425)
## No Information Rate : 0.0833
## P-Value [Acc > NIR] : 1
##
## Kappa : 0
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: -111.345580871333 Class: -139.768119082221
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 69.1392320836945 Class: -139.555340084582
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: -77.0104840810835 Class: -102.910781300902
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 2050.5937087161 Class: 13.5597926256926
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 74.9116912054817 Class: -18.0795931178567
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 172.934673095712 Class: 1058.5542072308
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 1064.13237335324 Class: 363.76145495496
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 1433.87597119037 Class: 1244.44290005606
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 3850.96162627287 Class: 326.114439752832
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 105.296328373813 Class: 415.544773264371
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 424.609077423724 Class: 106.745231243858
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: -41.2173078999225 Class: 446.693167676331
## Sensitivity NA NA
## Specificity 0.95833 0.95833
## Pos Pred Value NA NA
## Neg Pred Value NA NA
## Prevalence 0.00000 0.00000
## Detection Rate 0.00000 0.00000
## Detection Prevalence 0.04167 0.04167
## Balanced Accuracy NA NA
## Class: 8 Class: 12 Class: 196 Class: 84 Class: 32
## Sensitivity 0.00000 0.00000 0.00000 0.00000 0.00000
## Specificity 1.00000 1.00000 1.00000 1.00000 1.00000
## Pos Pred Value NaN NaN NaN NaN NaN
## Neg Pred Value 0.95833 0.95833 0.95833 0.91667 0.91667
## Prevalence 0.04167 0.04167 0.04167 0.08333 0.08333
## Detection Rate 0.00000 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.00000 0.00000 0.00000 0.00000 0.00000
## Balanced Accuracy 0.50000 0.50000 0.50000 0.50000 0.50000
## Class: 80 Class: 1576 Class: 116 Class: 88 Class: 24
## Sensitivity 0.00000 0.00000 0.00000 0.00000 0.00000
## Specificity 1.00000 1.00000 1.00000 1.00000 1.00000
## Pos Pred Value NaN NaN NaN NaN NaN
## Neg Pred Value 0.95833 0.95833 0.95833 0.95833 0.95833
## Prevalence 0.04167 0.04167 0.04167 0.04167 0.04167
## Detection Rate 0.00000 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.00000 0.00000 0.00000 0.00000 0.00000
## Balanced Accuracy 0.50000 0.50000 0.50000 0.50000 0.50000
## Class: 20 Class: 1232 Class: 368 Class: 1464
## Sensitivity 0.00000 0.00000 0.00000 0.00000
## Specificity 1.00000 1.00000 1.00000 1.00000
## Pos Pred Value NaN NaN NaN NaN
## Neg Pred Value 0.95833 0.91667 0.95833 0.95833
## Prevalence 0.04167 0.08333 0.04167 0.04167
## Detection Rate 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.00000 0.00000 0.00000 0.00000
## Balanced Accuracy 0.50000 0.50000 0.50000 0.50000
## Class: 836 Class: 2140 Class: 204 Class: 296
## Sensitivity 0.00000 0.00000 0.00000 0.00000
## Specificity 1.00000 1.00000 1.00000 1.00000
## Pos Pred Value NaN NaN NaN NaN
## Neg Pred Value 0.95833 0.95833 0.95833 0.95833
## Prevalence 0.04167 0.04167 0.04167 0.04167
## Detection Rate 0.00000 0.00000 0.00000 0.00000
## Detection Prevalence 0.00000 0.00000 0.00000 0.00000
## Balanced Accuracy 0.50000 0.50000 0.50000 0.50000
## Class: 232 Class: 4 Class: 248
## Sensitivity 0.00000 0.00000 0.00000
## Specificity 1.00000 1.00000 1.00000
## Pos Pred Value NaN NaN NaN
## Neg Pred Value 0.95833 0.95833 0.95833
## Prevalence 0.04167 0.04167 0.04167
## Detection Rate 0.00000 0.00000 0.00000
## Detection Prevalence 0.00000 0.00000 0.00000
## Balanced Accuracy 0.50000 0.50000 0.50000
#Error check 4 - ggplot (this can be used to compare all the final results - this is done at the end of the process)
#the trained models are then applied to the new products dataset
new_products <- read.csv("C:/Users/gebruiker/Desktop/Ubiqum_1/newproductattributes2017.csv")
#load the caret package
library(caret)
#dummify the data - typical datasets don’t contain only numeric values. Most data will contain a mixture of numeric and nominal data. Dummifying helps to incorporate both for developing regression models and making predictions.
#How to dummify the data - convert categorical variables (factor and character variables) to binary variables using the process below
#dummify the data - step 1 - create a dummy-variable encoder from the new products data
newDataFrame <- dummyVars(" ~ .", data = new_products)
#step 2 - apply the encoder to the new products data and store the result in a new dataframe called readyData_newprod
readyData_newprod <- data.frame(predict(newDataFrame, newdata = new_products))
#check the structure to confirm that no nominal variables remain
str(readyData_newprod)
## 'data.frame': 24 obs. of 29 variables:
## $ ProductType.Accessories : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Display : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.ExtendedWarranty: num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.GameConsole : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Laptop : num 0 0 1 1 1 0 0 0 0 0 ...
## $ ProductType.Netbook : num 0 0 0 0 0 1 1 1 1 0 ...
## $ ProductType.PC : num 1 1 0 0 0 0 0 0 0 0 ...
## $ ProductType.Printer : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.PrinterSupplies : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Smartphone : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Software : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductType.Tablet : num 0 0 0 0 0 0 0 0 0 1 ...
## $ ProductNum : num 171 172 173 175 176 178 180 181 183 186 ...
## $ Price : num 699 860 1199 1199 1999 ...
## $ x5StarReviews : num 96 51 74 7 1 19 312 23 3 296 ...
## $ x4StarReviews : num 26 11 10 2 1 8 112 18 4 66 ...
## $ x3StarReviews : num 14 10 3 1 1 4 28 7 0 30 ...
## $ x2StarReviews : num 14 10 3 1 3 1 31 22 1 21 ...
## $ x1StarReviews : num 25 21 11 1 0 10 47 18 0 36 ...
## $ PositiveServiceReview : num 12 7 11 2 0 2 28 5 1 28 ...
## $ NegativeServiceReview : num 3 5 5 1 1 4 16 16 0 9 ...
## $ Recommendproduct : num 0.7 0.6 0.8 0.6 0.3 0.6 0.7 0.4 0.7 0.8 ...
## $ BestSellersRank : num 2498 490 111 4446 2820 ...
## $ ShippingWeight : num 19.9 27 6.6 13 11.6 5.8 4.6 4.8 4.3 3 ...
## $ ProductDepth : num 20.63 21.89 8.94 16.3 16.81 ...
## $ ProductWidth : num 19.2 27 12.8 10.8 10.9 ...
## $ ProductHeight : num 8.39 9.13 0.68 1.4 0.88 1.2 0.95 1.5 0.97 0.37 ...
## $ ProfitMargin : num 0.25 0.2 0.1 0.15 0.23 0.08 0.09 0.11 0.09 0.1 ...
## $ Volume : num 0 0 0 0 0 0 0 0 0 0 ...
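Because the dummy variables are built separately for the existing and the new products, the two data frames only share the same columns if both files contain the same product types. The sketch below is one way to check this before predicting. It uses small made-up data frames (`train_cols` and `new_cols` are hypothetical names, not objects from this analysis) and adds any absent dummy column as all zeros.

```r
# Toy stand-ins for the dummified training data and new-product data
# (hypothetical names and values, for illustration only)
train_cols <- data.frame(ProductType.PC = c(1, 0),
                         ProductType.Laptop = c(0, 1),
                         Price = c(699, 1199),
                         Volume = c(12, 8))
new_cols <- data.frame(ProductType.PC = c(1, 0),
                       Price = c(860, 1999),
                       Volume = c(0, 0))

# Columns the model was trained on that are absent from the new data
missing_in_new <- setdiff(names(train_cols), names(new_cols))
print(missing_in_new)

# Add any absent dummy columns as all-zero so both frames share one layout
for (col in missing_in_new) {
  new_cols[[col]] <- 0
}
new_cols <- new_cols[, names(train_cols)]
```

If `missing_in_new` is empty, as it should be when both files list the same product types, no adjustment is needed.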
#Check for missing data
summary(readyData_newprod)
## ProductType.Accessories ProductType.Display ProductType.ExtendedWarranty
## Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.08333 Mean :0.04167 Mean :0.04167
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000
## ProductType.GameConsole ProductType.Laptop ProductType.Netbook
## Min. :0.00000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.000 1st Qu.:0.0000
## Median :0.00000 Median :0.000 Median :0.0000
## Mean :0.08333 Mean :0.125 Mean :0.1667
## 3rd Qu.:0.00000 3rd Qu.:0.000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.000 Max. :1.0000
## ProductType.PC ProductType.Printer ProductType.PrinterSupplies
## Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.08333 Mean :0.04167 Mean :0.04167
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000
## ProductType.Smartphone ProductType.Software ProductType.Tablet
## Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.1667 Mean :0.04167 Mean :0.08333
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.00000 Max. :1.00000
## ProductNum Price x5StarReviews x4StarReviews
## Min. :171.0 Min. : 8.5 Min. : 0.00 Min. : 0.00
## 1st Qu.:179.5 1st Qu.: 130.0 1st Qu.: 16.00 1st Qu.: 2.00
## Median :193.5 Median : 275.0 Median : 46.00 Median : 10.50
## Mean :219.5 Mean : 425.6 Mean : 178.50 Mean : 48.04
## 3rd Qu.:301.2 3rd Qu.: 486.5 3rd Qu.: 99.25 3rd Qu.: 26.00
## Max. :307.0 Max. :1999.0 Max. :1525.00 Max. :437.00
## x3StarReviews x2StarReviews x1StarReviews PositiveServiceReview
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.75 1st Qu.: 1.00 1st Qu.: 1.75 1st Qu.: 2.00
## Median : 4.50 Median : 4.00 Median : 13.00 Median : 5.00
## Mean : 21.92 Mean : 17.50 Mean : 27.58 Mean :13.46
## 3rd Qu.: 16.75 3rd Qu.: 20.25 3rd Qu.: 35.25 3rd Qu.:12.50
## Max. :224.00 Max. :160.00 Max. :247.00 Max. :90.00
## NegativeServiceReview Recommendproduct BestSellersRank
## Min. : 0.000 Min. :0.3000 Min. : 1.00
## 1st Qu.: 1.000 1st Qu.:0.6000 1st Qu.: 93.25
## Median : 3.500 Median :0.7000 Median : 750.50
## Mean : 5.667 Mean :0.6708 Mean : 3957.62
## 3rd Qu.: 7.500 3rd Qu.:0.8000 3rd Qu.: 3150.00
## Max. :23.000 Max. :1.0000 Max. :44465.00
## ShippingWeight ProductDepth ProductWidth ProductHeight
## Min. : 0.200 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.900 1st Qu.: 5.225 1st Qu.: 5.832 1st Qu.: 0.400
## Median : 4.450 Median : 8.000 Median : 9.950 Median : 0.985
## Mean : 7.802 Mean : 9.094 Mean :10.408 Mean : 3.541
## 3rd Qu.: 9.575 3rd Qu.:11.425 3rd Qu.:12.875 3rd Qu.: 2.888
## Max. :42.000 Max. :21.890 Max. :27.010 Max. :25.800
## ProfitMargin Volume
## Min. :0.0500 Min. :0
## 1st Qu.:0.0975 1st Qu.:0
## Median :0.1150 Median :0
## Mean :0.1817 Mean :0
## 3rd Qu.:0.2000 3rd Qu.:0
## Max. :0.9000 Max. :0
#replace any missing values with 0 (note: this does not delete rows, it fills the gaps)
readyData_newprod[is.na(readyData_newprod)] <- 0
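Before replacing missing values it helps to know how many there are and where. `summary()` adds an NA's row for any column with missing values, and `colSums(is.na(...))` gives the same information as plain counts. The sketch below uses a made-up data frame (`toy` is a hypothetical name) and shows two options: filling with 0, as done here, or dropping the affected columns.

```r
# Toy data frame with some missing values (illustrative values only)
toy <- data.frame(Price = c(699, NA, 1199),
                  BestSellersRank = c(2498, 490, NA),
                  Volume = c(0, 0, 0))

# Count missing values per column before changing anything
na_counts <- colSums(is.na(toy))
print(na_counts)

# Option 1: replace NA with 0, as done above
toy_zero <- toy
toy_zero[is.na(toy_zero)] <- 0

# Option 2: drop columns that contain any NA instead
toy_dropped <- toy[, na_counts == 0, drop = FALSE]
```

Filling with 0 keeps every column but treats "unknown" as a real value of zero, so which option is better depends on the variable.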
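#Find the correlation between the independent variables and the dependent variable. Note: Volume is 0 for every new product, so its correlations are NA and R warns that the standard deviation is zero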
corrData <- cor(readyData_newprod)
## Warning in cor(readyData_newprod): the standard deviation is zero
#print the correlation matrix
corrData
## ProductType.Accessories ProductType.Display
## ProductType.Accessories 1.000000000 -0.06286946
## ProductType.Display -0.062869461 1.00000000
## ProductType.ExtendedWarranty -0.062869461 -0.04347826
## ProductType.GameConsole -0.090909091 -0.06286946
## ProductType.Laptop -0.113960576 -0.07881104
## ProductType.Netbook -0.134839972 -0.09325048
## ProductType.PC -0.090909091 -0.06286946
## ProductType.Printer -0.062869461 -0.04347826
## ProductType.PrinterSupplies -0.062869461 -0.04347826
## ProductType.Smartphone -0.134839972 -0.09325048
## ProductType.Software -0.062869461 -0.04347826
## ProductType.Tablet -0.090909091 -0.06286946
## ProductNum 0.450781996 -0.07033263
## Price -0.265412932 -0.12759805
## x5StarReviews -0.130951120 -0.10465503
## x4StarReviews -0.143700957 -0.10258130
## x3StarReviews -0.118301893 -0.09736158
## x2StarReviews -0.142932035 -0.11160107
## x1StarReviews -0.123548795 -0.10884094
## PositiveServiceReview -0.164968075 -0.12404259
## NegativeServiceReview -0.257361129 -0.16075767
## Recommendproduct 0.050438089 0.03488117
## BestSellersRank -0.129158661 -0.08259333
## ShippingWeight -0.207289431 0.02272106
## ProductDepth -0.004891727 0.16260679
## ProductWidth -0.017290702 0.24028011
## ProductHeight -0.127310054 0.13555836
## ProfitMargin -0.188703553 -0.16108668
## Volume NA NA
## ProductType.ExtendedWarranty
## ProductType.Accessories -0.06286946
## ProductType.Display -0.04347826
## ProductType.ExtendedWarranty 1.00000000
## ProductType.GameConsole -0.06286946
## ProductType.Laptop -0.07881104
## ProductType.Netbook -0.09325048
## ProductType.PC -0.06286946
## ProductType.Printer -0.04347826
## ProductType.PrinterSupplies -0.04347826
## ProductType.Smartphone -0.09325048
## ProductType.Software -0.04347826
## ProductType.Tablet -0.06286946
## ProductNum 0.32885257
## Price -0.14547070
## x5StarReviews -0.10705400
## x4StarReviews -0.10044605
## x3StarReviews -0.09291922
## x2StarReviews -0.10522386
## x1StarReviews -0.11309531
## PositiveServiceReview -0.13399919
## NegativeServiceReview -0.09186153
## Recommendproduct -0.32389658
## BestSellersRank -0.08984429
## ShippingWeight -0.15732285
## ProductDepth -0.32814546
## ProductWidth -0.34771780
## ProductHeight -0.12771428
## ProfitMargin 0.26711841
## Volume NA
## ProductType.GameConsole ProductType.Laptop
## ProductType.Accessories -0.09090909 -0.11396058
## ProductType.Display -0.06286946 -0.07881104
## ProductType.ExtendedWarranty -0.06286946 -0.07881104
## ProductType.GameConsole 1.00000000 -0.11396058
## ProductType.Laptop -0.11396058 1.00000000
## ProductType.Netbook -0.13483997 -0.16903085
## ProductType.PC -0.09090909 -0.11396058
## ProductType.Printer -0.06286946 -0.07881104
## ProductType.PrinterSupplies -0.06286946 -0.07881104
## ProductType.Smartphone -0.13483997 -0.16903085
## ProductType.Software -0.06286946 -0.07881104
## ProductType.Tablet -0.09090909 -0.11396058
## ProductNum 0.18416094 -0.30895915
## Price -0.05693774 0.84212919
## x5StarReviews 0.70678916 -0.16433710
## x4StarReviews 0.39044978 -0.16917232
## x3StarReviews 0.25748059 -0.16306209
## x2StarReviews 0.17520701 -0.17532157
## x1StarReviews 0.14713072 -0.18186756
## PositiveServiceReview 0.46131073 -0.16468675
## NegativeServiceReview 0.34038085 -0.20814145
## Recommendproduct 0.30983398 -0.22581247
## BestSellersRank -0.12465575 -0.06174655
## ShippingWeight 0.19145439 0.09745391
## ProductDepth -0.09098613 0.32200363
## ProductWidth -0.03902988 0.06613537
## ProductHeight 0.20514324 -0.16700053
## ProfitMargin -0.08255780 -0.04804971
## Volume NA NA
## ProductType.Netbook ProductType.PC
## ProductType.Accessories -0.13483997 -0.09090909
## ProductType.Display -0.09325048 -0.06286946
## ProductType.ExtendedWarranty -0.09325048 -0.06286946
## ProductType.GameConsole -0.13483997 -0.09090909
## ProductType.Laptop -0.16903085 -0.11396058
## ProductType.Netbook 1.00000000 -0.13483997
## ProductType.PC -0.13483997 1.00000000
## ProductType.Printer -0.09325048 -0.06286946
## ProductType.PrinterSupplies -0.09325048 -0.06286946
## ProductType.Smartphone -0.20000000 -0.13483997
## ProductType.Software -0.09325048 -0.06286946
## ProductType.Tablet -0.13483997 -0.09090909
## ProductNum -0.31800113 -0.26387239
## Price -0.04900114 0.22856832
## x5StarReviews -0.11480263 -0.09105873
## x4StarReviews -0.05743602 -0.09121216
## x3StarReviews -0.11592140 -0.06370102
## x2StarReviews -0.05129092 -0.05071782
## x1StarReviews -0.08060067 -0.02819578
## PositiveServiceReview -0.09520556 -0.05698897
## NegativeServiceReview 0.24627629 -0.08301972
## Recommendproduct -0.18168574 -0.03602721
## BestSellersRank -0.02631946 -0.08097427
## ShippingWeight -0.12991915 0.46825592
## ProductDepth -0.04595216 0.63481575
## ProductWidth -0.01489812 0.61459487
## ProductHeight -0.18457706 0.27215539
## ProfitMargin -0.23397272 0.07666082
## Volume NA NA
## ProductType.Printer
## ProductType.Accessories -0.06286946
## ProductType.Display -0.04347826
## ProductType.ExtendedWarranty -0.04347826
## ProductType.GameConsole -0.06286946
## ProductType.Laptop -0.07881104
## ProductType.Netbook -0.09325048
## ProductType.PC -0.06286946
## ProductType.Printer 1.00000000
## ProductType.PrinterSupplies -0.04347826
## ProductType.Smartphone -0.09325048
## ProductType.Software -0.04347826
## ProductType.Tablet -0.06286946
## ProductNum 0.32124904
## Price -0.10080023
## x5StarReviews -0.05427668
## x4StarReviews -0.08549925
## x3StarReviews -0.08403452
## x2StarReviews -0.10522386
## x1StarReviews -0.10458657
## PositiveServiceReview -0.08421621
## NegativeServiceReview -0.16075767
## Recommendproduct 0.15447375
## BestSellersRank -0.08904873
## ShippingWeight 0.70771569
## ProductDepth 0.29612026
## ProductWidth 0.43739305
## ProductHeight 0.80275615
## ProfitMargin 0.87883997
## Volume NA
## ProductType.PrinterSupplies
## ProductType.Accessories -0.06286946
## ProductType.Display -0.04347826
## ProductType.ExtendedWarranty -0.04347826
## ProductType.GameConsole -0.06286946
## ProductType.Laptop -0.07881104
## ProductType.Netbook -0.09325048
## ProductType.PC -0.06286946
## ProductType.Printer -0.04347826
## ProductType.PrinterSupplies 1.00000000
## ProductType.Smartphone -0.09325048
## ProductType.Software -0.04347826
## ProductType.Tablet -0.06286946
## ProductNum 0.32505081
## Price -0.18076038
## x5StarReviews -0.10405529
## x4StarReviews -0.10258130
## x3StarReviews -0.09736158
## x2StarReviews -0.11160107
## x1StarReviews -0.11734967
## PositiveServiceReview -0.12404259
## NegativeServiceReview -0.19520575
## Recommendproduct 0.39365892
## BestSellersRank -0.06697762
## ShippingWeight -0.14076709
## ProductDepth -0.15854725
## ProductWidth -0.25083178
## ProductHeight 0.09949362
## ProfitMargin 0.14477410
## Volume NA
## ProductType.Smartphone ProductType.Software
## ProductType.Accessories -0.134839972 -0.06286946
## ProductType.Display -0.093250481 -0.04347826
## ProductType.ExtendedWarranty -0.093250481 -0.04347826
## ProductType.GameConsole -0.134839972 -0.06286946
## ProductType.Laptop -0.169030851 -0.07881104
## ProductType.Netbook -0.200000000 -0.09325048
## ProductType.PC -0.134839972 -0.06286946
## ProductType.Printer -0.093250481 -0.04347826
## ProductType.PrinterSupplies -0.093250481 -0.04347826
## ProductType.Smartphone 1.000000000 -0.09325048
## ProductType.Software -0.093250481 1.00000000
## ProductType.Tablet -0.134839972 -0.06286946
## ProductNum -0.203846876 0.31744728
## Price -0.240853254 -0.15842514
## x5StarReviews -0.136026645 -0.08966148
## x4StarReviews -0.129564971 -0.06414668
## x3StarReviews -0.051608842 -0.08403452
## x2StarReviews 0.010258184 -0.10522386
## x1StarReviews 0.008364221 -0.08331472
## PositiveServiceReview -0.121898713 -0.09417280
## NegativeServiceReview -0.049255257 -0.12630960
## Recommendproduct -0.245810121 0.15447375
## BestSellersRank 0.648309435 -0.08718484
## ShippingWeight -0.309679173 -0.15732285
## ProductDepth -0.497347475 -0.03946767
## ProductWidth -0.372632025 -0.11385499
## ProductHeight -0.243556636 -0.09164953
## ProfitMargin -0.155252927 0.02242979
## Volume NA NA
## ProductType.Tablet ProductNum Price
## ProductType.Accessories -0.090909091 0.45078200 -0.265412932
## ProductType.Display -0.062869461 -0.07033263 -0.127598046
## ProductType.ExtendedWarranty -0.062869461 0.32885257 -0.145470703
## ProductType.GameConsole -0.090909091 0.18416094 -0.056937736
## ProductType.Laptop -0.113960576 -0.30895915 0.842129187
## ProductType.Netbook -0.134839972 -0.31800113 -0.049001142
## ProductType.PC -0.090909091 -0.26387239 0.228568319
## ProductType.Printer -0.062869461 0.32124904 -0.100800228
## ProductType.PrinterSupplies -0.062869461 0.32505081 -0.180760378
## ProductType.Smartphone -0.134839972 -0.20384688 -0.240853254
## ProductType.Software -0.062869461 0.31744728 -0.158425140
## ProductType.Tablet 1.000000000 -0.18141227 -0.007520556
## ProductNum -0.181412267 1.00000000 -0.502843414
## Price -0.007520556 -0.50284341 1.000000000
## x5StarReviews 0.382446649 0.14369493 -0.073827536
## x4StarReviews 0.628193172 -0.02964380 -0.110872859
## x3StarReviews 0.675016683 -0.05761888 -0.120417139
## x2StarReviews 0.673163776 -0.14449492 -0.120403166
## x1StarReviews 0.700793372 -0.19989038 -0.145942983
## PositiveServiceReview 0.655673113 -0.06973952 -0.087276184
## NegativeServiceReview 0.514722258 -0.27624009 -0.075491928
## Recommendproduct 0.223368680 0.32962103 -0.335467972
## BestSellersRank -0.129503774 -0.21143500 -0.064065637
## ShippingWeight -0.175120605 0.04608903 0.305381720
## ProductDepth -0.142903660 -0.20489592 0.554406040
## ProductWidth -0.089754643 -0.24980215 0.299762108
## ProductHeight -0.164596972 0.19876714 -0.083221972
## ProfitMargin -0.056021367 0.44070709 -0.039665042
## Volume NA NA NA
## x5StarReviews x4StarReviews x3StarReviews
## ProductType.Accessories -0.13095112 -0.14370096 -0.11830189
## ProductType.Display -0.10465503 -0.10258130 -0.09736158
## ProductType.ExtendedWarranty -0.10705400 -0.10044605 -0.09291922
## ProductType.GameConsole 0.70678916 0.39044978 0.25748059
## ProductType.Laptop -0.16433710 -0.16917232 -0.16306209
## ProductType.Netbook -0.11480263 -0.05743602 -0.11592140
## ProductType.PC -0.09105873 -0.09121216 -0.06370102
## ProductType.Printer -0.05427668 -0.08549925 -0.08403452
## ProductType.PrinterSupplies -0.10405529 -0.10258130 -0.09736158
## ProductType.Smartphone -0.13602664 -0.12956497 -0.05160884
## ProductType.Software -0.08966148 -0.06414668 -0.08403452
## ProductType.Tablet 0.38244665 0.62819317 0.67501668
## ProductNum 0.14369493 -0.02964380 -0.05761888
## Price -0.07382754 -0.11087286 -0.12041714
## x5StarReviews 1.00000000 0.86156376 0.78144956
## x4StarReviews 0.86156376 1.00000000 0.97642080
## x3StarReviews 0.78144956 0.97642080 1.00000000
## x2StarReviews 0.71083333 0.95261395 0.98478962
## x1StarReviews 0.61377988 0.91396774 0.95019130
## PositiveServiceReview 0.88571578 0.98200009 0.94939773
## NegativeServiceReview 0.66453818 0.80016317 0.75033197
## Recommendproduct 0.38626309 0.31062215 0.26292639
## BestSellersRank -0.14310283 -0.12340182 -0.06364895
## ShippingWeight 0.14592069 -0.02366589 -0.05810772
## ProductDepth -0.11059532 -0.15395982 -0.16836843
## ProductWidth -0.15108379 -0.16341943 -0.15862034
## ProductHeight -0.03516902 -0.10534722 -0.13693604
## ProfitMargin -0.03045462 -0.04346523 -0.03102536
## Volume NA NA NA
## x2StarReviews x1StarReviews
## ProductType.Accessories -0.142932035 -0.123548795
## ProductType.Display -0.111601068 -0.108840937
## ProductType.ExtendedWarranty -0.105223864 -0.113095306
## ProductType.GameConsole 0.175207010 0.147130723
## ProductType.Laptop -0.175321566 -0.181867555
## ProductType.Netbook -0.051290920 -0.080600675
## ProductType.PC -0.050717819 -0.028195783
## ProductType.Printer -0.105223864 -0.104586568
## ProductType.PrinterSupplies -0.111601068 -0.117349675
## ProductType.Smartphone 0.010258184 0.008364221
## ProductType.Software -0.105223864 -0.083314724
## ProductType.Tablet 0.673163776 0.700793372
## ProductNum -0.144494923 -0.199890381
## Price -0.120403166 -0.145942983
## x5StarReviews 0.710833335 0.613779876
## x4StarReviews 0.952613948 0.913967738
## x3StarReviews 0.984789621 0.950191299
## x2StarReviews 1.000000000 0.971015430
## x1StarReviews 0.971015430 1.000000000
## PositiveServiceReview 0.922510464 0.893580388
## NegativeServiceReview 0.806746949 0.779582027
## Recommendproduct 0.172124249 0.171672785
## BestSellersRank 0.004361332 -0.039696208
## ShippingWeight -0.104655234 -0.142093238
## ProductDepth -0.190425489 -0.195358134
## ProductWidth -0.169516140 -0.123405315
## ProductHeight -0.165810554 -0.106701931
## ProfitMargin -0.060938012 -0.077581647
## Volume NA NA
## PositiveServiceReview NegativeServiceReview
## ProductType.Accessories -0.164968075 -0.2573611292
## ProductType.Display -0.124042590 -0.1607576747
## ProductType.ExtendedWarranty -0.133999186 -0.0918615284
## ProductType.GameConsole 0.461310727 0.3403808484
## ProductType.Laptop -0.164686747 -0.2081414510
## ProductType.Netbook -0.095205564 0.2462762861
## ProductType.PC -0.056988971 -0.0830197191
## ProductType.Printer -0.084216207 -0.1607576747
## ProductType.PrinterSupplies -0.124042590 -0.1952057478
## ProductType.Smartphone -0.121898713 -0.0492552572
## ProductType.Software -0.094172803 -0.1263096015
## ProductType.Tablet 0.655673113 0.5147222585
## ProductNum -0.069739521 -0.2762400896
## Price -0.087276184 -0.0754919276
## x5StarReviews 0.885715784 0.6645381774
## x4StarReviews 0.982000086 0.8001631738
## x3StarReviews 0.949397729 0.7503319690
## x2StarReviews 0.922510464 0.8067469492
## x1StarReviews 0.893580388 0.7795820270
## PositiveServiceReview 1.000000000 0.8196544641
## NegativeServiceReview 0.819654464 1.0000000000
## Recommendproduct 0.360831992 0.0539570729
## BestSellersRank -0.134995058 -0.0009529939
## ShippingWeight 0.007449719 -0.1073175650
## ProductDepth -0.141295323 -0.2155576801
## ProductWidth -0.123905533 -0.1382365188
## ProductHeight -0.067617496 -0.1463361480
## ProfitMargin -0.065353674 -0.1892907418
## Volume NA NA
## Recommendproduct BestSellersRank
## ProductType.Accessories 0.05043809 -0.1291586611
## ProductType.Display 0.03488117 -0.0825933261
## ProductType.ExtendedWarranty -0.32389658 -0.0898442865
## ProductType.GameConsole 0.30983398 -0.1246557539
## ProductType.Laptop -0.22581247 -0.0617465535
## ProductType.Netbook -0.18168574 -0.0263194606
## ProductType.PC -0.03602721 -0.0809742676
## ProductType.Printer 0.15447375 -0.0890487266
## ProductType.PrinterSupplies 0.39365892 -0.0669776214
## ProductType.Smartphone -0.24581012 0.6483094350
## ProductType.Software 0.15447375 -0.0871848433
## ProductType.Tablet 0.22336868 -0.1295037744
## ProductNum 0.32962103 -0.2114349960
## Price -0.33546797 -0.0640656369
## x5StarReviews 0.38626309 -0.1431028328
## x4StarReviews 0.31062215 -0.1234018174
## x3StarReviews 0.26292639 -0.0636489532
## x2StarReviews 0.17212425 0.0043613320
## x1StarReviews 0.17167279 -0.0396962079
## PositiveServiceReview 0.36083199 -0.1349950580
## NegativeServiceReview 0.05395707 -0.0009529939
## Recommendproduct 1.00000000 -0.1542109832
## BestSellersRank -0.15421098 1.0000000000
## ShippingWeight 0.11079815 -0.2033580728
## ProductDepth 0.01437657 -0.2917761162
## ProductWidth 0.09762164 -0.2473962751
## ProductHeight 0.26833286 -0.1949204105
## ProfitMargin 0.08156030 -0.1473475029
## Volume NA NA
## ShippingWeight ProductDepth ProductWidth
## ProductType.Accessories -0.207289431 -0.004891727 -0.01729070
## ProductType.Display 0.022721058 0.162606786 0.24028011
## ProductType.ExtendedWarranty -0.157322848 -0.328145456 -0.34771780
## ProductType.GameConsole 0.191454389 -0.090986127 -0.03902988
## ProductType.Laptop 0.097453914 0.322003629 0.06613537
## ProductType.Netbook -0.129919151 -0.045952159 -0.01489812
## ProductType.PC 0.468255916 0.634815755 0.61459487
## ProductType.Printer 0.707715687 0.296120264 0.43739305
## ProductType.PrinterSupplies -0.140767086 -0.158547255 -0.25083178
## ProductType.Smartphone -0.309679173 -0.497347475 -0.37263202
## ProductType.Software -0.157322848 -0.039467667 -0.11385499
## ProductType.Tablet -0.175120605 -0.142903660 -0.08975464
## ProductNum 0.046089027 -0.204895921 -0.24980215
## Price 0.305381720 0.554406040 0.29976211
## x5StarReviews 0.145920693 -0.110595322 -0.15108379
## x4StarReviews -0.023665887 -0.153959818 -0.16341943
## x3StarReviews -0.058107720 -0.168368426 -0.15862034
## x2StarReviews -0.104655234 -0.190425489 -0.16951614
## x1StarReviews -0.142093238 -0.195358134 -0.12340531
## PositiveServiceReview 0.007449719 -0.141295323 -0.12390553
## NegativeServiceReview -0.107317565 -0.215557680 -0.13823652
## Recommendproduct 0.110798147 0.014376568 0.09762164
## BestSellersRank -0.203358073 -0.291776116 -0.24739628
## ShippingWeight 1.000000000 0.756791718 0.77092781
## ProductDepth 0.756791718 1.000000000 0.85162710
## ProductWidth 0.770927807 0.851627104 1.00000000
## ProductHeight 0.795171973 0.486154560 0.67409332
## ProfitMargin 0.666513549 0.252162759 0.27358454
## Volume NA NA NA
## ProductHeight ProfitMargin Volume
## ProductType.Accessories -0.12731005 -0.18870355 NA
## ProductType.Display 0.13555836 -0.16108668 NA
## ProductType.ExtendedWarranty -0.12771428 0.26711841 NA
## ProductType.GameConsole 0.20514324 -0.08255780 NA
## ProductType.Laptop -0.16700053 -0.04804971 NA
## ProductType.Netbook -0.18457706 -0.23397272 NA
## ProductType.PC 0.27215539 0.07666082 NA
## ProductType.Printer 0.80275615 0.87883997 NA
## ProductType.PrinterSupplies 0.09949362 0.14477410 NA
## ProductType.Smartphone -0.24355664 -0.15525293 NA
## ProductType.Software -0.09164953 0.02242979 NA
## ProductType.Tablet -0.16459697 -0.05602137 NA
## ProductNum 0.19876714 0.44070709 NA
## Price -0.08322197 -0.03966504 NA
## x5StarReviews -0.03516902 -0.03045462 NA
## x4StarReviews -0.10534722 -0.04346523 NA
## x3StarReviews -0.13693604 -0.03102536 NA
## x2StarReviews -0.16581055 -0.06093801 NA
## x1StarReviews -0.10670193 -0.07758165 NA
## PositiveServiceReview -0.06761750 -0.06535367 NA
## NegativeServiceReview -0.14633615 -0.18929074 NA
## Recommendproduct 0.26833286 0.08156030 NA
## BestSellersRank -0.19492041 -0.14734750 NA
## ShippingWeight 0.79517197 0.66651355 NA
## ProductDepth 0.48615456 0.25216276 NA
## ProductWidth 0.67409332 0.27358454 NA
## ProductHeight 1.00000000 0.72202181 NA
## ProfitMargin 0.72202181 1.00000000 NA
## Volume NA NA 1
#note: Correlation values fall between -1 and 1. Variables with a strong positive relationship have values close to 1, and variables with a strong negative relationship have values close to -1.
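The NA entries for Volume in the matrix above come from the column being constant (all zeros), which is also what triggers the "standard deviation is zero" warning. A minimal sketch of one way to handle this is to find constant columns first and correlate only the columns that vary. The `toy` data frame is a hypothetical stand-in that reuses the first four review counts from the `str()` output.

```r
# Toy data: Volume is constant, like the all-zero Volume column above
toy <- data.frame(x5StarReviews = c(96, 51, 74, 7),
                  x4StarReviews = c(26, 11, 10, 2),
                  Volume = c(0, 0, 0, 0))

# Standard deviation of each column; a value of 0 means the column is constant
col_sd <- sapply(toy, sd)
constant_cols <- names(col_sd)[col_sd == 0]
print(constant_cols)

# Correlate only the columns that actually vary
corr_varying <- cor(toy[, col_sd > 0, drop = FALSE])
print(round(corr_varying, 3))
```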
#correlation matrix using a heat map
install.packages("corrplot")
library(corrplot)
#plot the correlation matrix
corrplot(corrData)
#blue (cooler) colors show a positive relationship and red (warmer) colors indicate more negative relationships
#predict sales volume for the new products with the random forest (rf) model - pass the trained model as object and the dataset as newdata
readydata_rf_predict_newprod<-predict(object = readydata_rf, newdata=readyData_newprod, na.action = na.pass)
#to see all the predictions
readydata_rf_predict_newprod
## 1 2 3 4 5 6
## 418.09453 242.73587 257.22520 50.01307 67.44080 83.58013
## 7 8 9 10 11 12
## 1363.87747 295.09227 28.71653 1096.55453 4773.67787 454.93600
## 13 14 15 16 17 18
## 902.34453 190.33154 509.13293 2042.42013 445.60173 492.42451
## 19 20 21 22 23 24
## 586.58076 516.15794 531.28488 475.64000 468.25987 4724.95827
#predict sales volume for the new products with the KNN model - pass the trained model as object and the dataset as newdata
readydata_KNN_predict_newprod<-predict(object = readydata_KNN, newdata=readyData_newprod, na.action = na.pass)
#to see all the predictions
readydata_KNN_predict_newprod
## [1] 240.8 168.8 231.2 231.2 216.0 90.4 1120.0 87.2 90.4 514.4
## [11] 1652.0 168.8 275.2 107.2 210.4 1484.0 44.8 136.0 220.0 60.8
## [21] 156.0 62.4 36.8 2781.6
#predict sales volume for the new products with the SVM model - pass the trained model as object and the dataset as newdata
readydata_svm_predict_newprod<-predict(object = readydata_svm, newdata=readyData_newprod, na.action = na.pass)
#to see all the predictions
readydata_svm_predict_newprod
## 1 2 3 4 5 6
## 538.73341 248.94260 257.90869 74.24331 28.49271 78.93764
## 7 8 9 10 11 12
## 1549.46347 61.99109 53.95642 1139.30168 6651.44163 484.42319
## 13 14 15 16 17 18
## 806.98151 261.57497 371.18396 1695.71787 62.87584 508.63044
## 19 20 21 22 23 24
## 427.34255 525.66379 880.88309 482.80142 601.98731 6586.05181
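The three sets of predictions are easier to compare when they sit side by side. The sketch below does this for the first three new products only, using the values printed above. The names `pred_rf`, `pred_knn`, `pred_svm` and `comparison` are hypothetical and introduced just for this illustration.

```r
# First three predictions from each model, copied from the output above
pred_rf  <- c(418.09453, 242.73587, 257.22520)
pred_knn <- c(240.8, 168.8, 231.2)
pred_svm <- c(538.73341, 248.94260, 257.90869)

comparison <- data.frame(rf = pred_rf, knn = pred_knn, svm = pred_svm)

# Spread between the highest and lowest prediction for each product
comparison$spread <- apply(comparison[, c("rf", "knn", "svm")], 1,
                           function(row) max(row) - min(row))
print(comparison)
```

A large spread for a product suggests the models disagree about it, which is worth keeping in mind when reading the final predictions.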
#steps to add the predictions to the dataset. How: the new products data is copied into a new dataframe named newprod_pluspredictions
#first we do this for random forest
newprod_pluspredictionsrf <- readyData_newprod
#Then add the final predictions from the model (rf)
newprod_pluspredictionsrf$pred <- predict(object = readydata_rf, newdata = readyData_newprod)
#here we store the rf predictions computed earlier in a second column called predictions (it holds the same values as pred)
newprod_pluspredictionsrf$predictions <- readydata_rf_predict_newprod
#Create a csv file and write it to your hard drive. Note: the file is saved in the current working directory, which getwd() will show
write.csv(newprod_pluspredictionsrf, file="newprod_pluspredictionsrf.csv", row.names = TRUE)
#steps to add the predictions to the dataset. How: the new products data is copied into a new dataframe named newprod_pluspredictions
#next we do the same for KNN
newprod_pluspredictionsKNN <- readyData_newprod
#Then add the final predictions from the model (KNN)
newprod_pluspredictionsKNN$pred <- predict(object = readydata_KNN, newdata = readyData_newprod)
#here we store the KNN predictions computed earlier in a second column called predictions (it holds the same values as pred)
newprod_pluspredictionsKNN$predictions <- readydata_KNN_predict_newprod
#Create a csv file and write it to your hard drive. Note: the file is saved in the current working directory, which getwd() will show
write.csv(newprod_pluspredictionsKNN, file="newprod_pluspredictionsKNN.csv", row.names = TRUE)
#steps to add the predictions to the dataset. How: the new products data is copied into a new dataframe named newprod_pluspredictions
#next we do the same for svm
newprod_pluspredictionssvm <- readyData_newprod
#Then add the final predictions from the model (svm)
newprod_pluspredictionssvm$pred <- predict(object = readydata_svm, newdata = readyData_newprod)
#here we store the svm predictions computed earlier in a second column called predictions (it holds the same values as pred)
newprod_pluspredictionssvm$predictions <- readydata_svm_predict_newprod
#Create a csv file and write it to your hard drive. Note: the file is saved in the current working directory, which getwd() will show
write.csv(newprod_pluspredictionssvm, file="newprod_pluspredictionssvm.csv", row.names = TRUE)
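The exported files contain every new product, while the task focuses on four product types: PC, Laptop, Netbook and Smartphone. The sketch below shows one possible way to keep only those rows by using the dummy columns, and to write a shorter file. The `toy` data frame and its values are made up for illustration; only the column names follow the `str()` output above.

```r
# Toy version of the dummified new-product data with a predictions column
# (rows and values are illustrative only)
toy <- data.frame(ProductType.PC = c(1, 0, 0, 0, 0),
                  ProductType.Laptop = c(0, 1, 0, 0, 0),
                  ProductType.Netbook = c(0, 0, 1, 0, 0),
                  ProductType.Smartphone = c(0, 0, 0, 1, 0),
                  ProductType.Tablet = c(0, 0, 0, 0, 1),
                  ProductNum = c(1, 2, 3, 4, 5),
                  predictions = c(100, 200, 300, 400, 500))

target_cols <- c("ProductType.PC", "ProductType.Laptop",
                 "ProductType.Netbook", "ProductType.Smartphone")

# Keep rows where any of the four target dummy columns equals 1
is_target <- rowSums(toy[, target_cols]) > 0
four_types <- toy[is_target, ]

# Recover a readable product-type label from the dummy columns
type_index <- max.col(four_types[, target_cols], ties.method = "first")
four_types$ProductType <- sub("ProductType.", "", target_cols[type_index],
                              fixed = TRUE)

# Write to a temporary file here; swap in a real path for actual use
out_file <- tempfile(fileext = ".csv")
write.csv(four_types[, c("ProductNum", "ProductType", "predictions")],
          file = out_file, row.names = FALSE)
```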
#load dplyr
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(magrittr)
#since the product type was dummified, we use the product number to recover each product's type so the points in the graphs can be coloured by product type and outliers identified. Using dplyr, we join ProductNum and ProductType from the existing products data onto the test set. Note: run this join only once; repeating it creates ProductType.x and ProductType.y columns instead of ProductType.
testSet <- testSet %>%
  left_join(existing_products %>%
              select(ProductNum, ProductType),
            by = "ProductNum")
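The plots that follow pair each row of the test set with a prediction by position, so the lookup of the product type must not reorder the rows. As a base R alternative to the join above (a swapped-in technique, not the method used in this analysis), `match()` does the same lookup and leaves the row order untouched. The data frames `existing_toy` and `test_toy` are hypothetical stand-ins with made-up values.

```r
# Toy stand-ins (hypothetical values) for the original and test data frames
existing_toy <- data.frame(ProductNum = c(101, 102, 103, 104),
                           ProductType = c("PC", "Laptop", "Netbook", "Smartphone"))
test_toy <- data.frame(ProductNum = c(103, 101),
                       Volume = c(20, 12))

# Look up each test row's product type by ProductNum; row order is unchanged
idx <- match(test_toy$ProductNum, existing_toy$ProductNum)
test_toy$ProductType <- existing_toy$ProductType[idx]
print(test_toy)
```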
#here we use ggplot to visualize the results on the testSet. The focus is on the test set because we want to see the size of the errors. geom_abline draws the reference line y = x (slope 1, intercept 0); points on this line are perfect predictions and points far from it are large errors. color is used to identify the product types. This is for rf (random forest)
rf_plot <- ggplot(data = testSet) +
  geom_point(mapping = aes(x = readydata_rf_predict, y = Volume, col = ProductType)) +
  geom_abline(slope = 1, intercept = 0) +
  labs(title = "rf model error with all variables") +
  theme(legend.position="bottom", legend.title = element_blank())
#display the plot
rf_plot
#the same plot of predicted against actual volume on the testSet, with the reference line y = x and color identifying the product types. This is for KNN
KNN_plot <- ggplot(data = testSet) +
  geom_point(mapping = aes(x = readydata_KNN_predict, y = Volume, col = ProductType)) +
  geom_abline(slope = 1, intercept = 0) +
  labs(title = "KNN model error with all variables", x = "Predicted volume", y = "Actual volume") +
  theme(legend.position = "bottom", legend.title = element_blank())
#print the plot
KNN_plot
#the same plot of predicted against actual volume on the testSet, with the reference line y = x and color identifying the product types. This is for svm
svm_plot <- ggplot(data = testSet) +
  geom_point(mapping = aes(x = readydata_svm_predict, y = Volume, col = ProductType)) +
  geom_abline(slope = 1, intercept = 0) +
  labs(title = "svm model error with all variables", x = "Predicted volume", y = "Actual volume") +
  theme(legend.position = "bottom", legend.title = element_blank())
#print the plot
svm_plot
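#The plots above show the errors visually. As a rough numeric companion, caret's postResample function summarises how far predictions are from observed values. The sketch below uses hypothetical toy vectors (actual_toy, predicted_toy), not the real test set or model output.

```r
library(caret)

# Hypothetical toy vectors standing in for testSet$Volume and a model's predictions
actual_toy    <- c(10, 20, 30, 40)
predicted_toy <- c(12, 18, 33, 39)

# postResample returns RMSE, Rsquared and MAE for numeric predictions
metrics_toy <- postResample(pred = predicted_toy, obs = actual_toy)
print(metrics_toy)
```

#In the analysis itself, the same call could be made with one of the prediction vectors (for example readydata_svm_predict) as pred and testSet$Volume as obs.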
#Reviews vs volume of sales. Each plot puts one review variable from the testSet on the x axis and the volume predicted by the svm model on the y axis, with color identifying the product types
Review_Vol_of_sale <- ggplot(data = testSet) +
  geom_point(mapping = aes(x = PositiveServiceReview, y = readydata_svm_predict, col = ProductType)) +
  labs(title = "Positive Service Reviews VS Volume of Sale", y = "Predicted volume (svm)") +
  theme(legend.position = "bottom", legend.title = element_blank())
Review_Vol_of_sale
NReview_Vol_of_sale <- ggplot(data = testSet) +
  geom_point(mapping = aes(x = NegativeServiceReview, y = readydata_svm_predict, col = ProductType)) +
  labs(title = "Negative Service Reviews VS Volume of Sale", y = "Predicted volume (svm)") +
  theme(legend.position = "bottom", legend.title = element_blank())
NReview_Vol_of_sale
x5star_Review_Vol_of_sale <- ggplot(data = testSet) +
  geom_point(mapping = aes(x = x5StarReviews, y = readydata_svm_predict, col = ProductType)) +
  labs(title = "5-Star Reviews VS Volume of Sale", y = "Predicted volume (svm)") +
  theme(legend.position = "bottom", legend.title = element_blank())
x5star_Review_Vol_of_sale
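#The scatter plots give a visual impression only. One possible way to put a number on the relationship is a correlation coefficient. The sketch below uses hypothetical toy values (reviews_toy, volume_toy); in the analysis the inputs would be a review column of the testSet and the volume. A high correlation shows association and does not by itself prove that reviews drive sales.

```r
# Hypothetical toy values; the real inputs would be columns of testSet
reviews_toy <- c(2, 5, 9, 14, 20)
volume_toy  <- c(30, 80, 150, 260, 400)

# Pearson correlation: values near 1 indicate a strong positive linear association
r_toy <- cor(reviews_toy, volume_toy)
print(r_toy)

# cor.test reports a confidence interval and p-value for the same coefficient
print(cor.test(reviews_toy, volume_toy))
```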
Findings: In line with the deliverables, the algorithms tested were linear regression, random forest (RF), KNN and SVM. The linear regression model was not used because the R-squared value obtained was equal to 1. Although this may seem good, it is likely a sign of overfitting, so the other models (RF, KNN and SVM) were tested. Following testing of the models, the KNN model gave an error rating of 0.67, while the RF and SVM models gave values of 0.87 and 0.88. In addition, the graphs generated from the error metrics showed a closer convergence between the predicted data and the test set volume.
The last set of charts loosely shows the impact of service reviews on the sales volume. There you can observe that the higher the number of reviews, the higher the sales volume tends to be.
Finally, the four key products of focus were PCs, laptops, netbooks and smartphones. Based on predictions from the SVM model, smartphones are likely to produce the highest sales volume at about 1924, followed by netbooks at 1744, PCs at 787 and laptops at 360. This ranking matched that of the random forest model, which predicted smartphones: 2056, netbooks: 1771, PCs: 660 and laptops: 374.
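The agreement between the two rankings can be checked with a few lines of base R. This is a minimal sketch that re-enters the predicted volumes reported above by hand; the data frame name predicted and its column names are illustrative and do not come from the analysis objects.

```r
# Predicted volumes as reported in the findings (SVM and random forest)
predicted <- data.frame(
  ProductType = c("Smartphone", "Netbook", "PC", "Laptop"),
  svm = c(1924, 1744, 787, 360),
  rf  = c(2056, 1771, 660, 374),
  stringsAsFactors = FALSE
)

# Product types ordered from highest to lowest predicted volume per model
rank_svm <- predicted$ProductType[order(predicted$svm, decreasing = TRUE)]
rank_rf  <- predicted$ProductType[order(predicted$rf,  decreasing = TRUE)]

# TRUE when both models agree on the ordering
same_order <- identical(rank_svm, rank_rf)
print(same_order)
```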