Using a Regression Model on Laptop Sales Prices
Introduction
Business Goal
In this scenario, we want to know which factors influence the selling price of laptops worldwide. Specifically, we want to know:
- which variables are important in forecasting laptop pricing
- how well those variables describe a laptop's price
Based on this business question, we gathered data from flipkart.com on the selling prices of laptops from major brands throughout the world.
Using the supplied independent variables, we will model the price of a laptop. The results can guide our next moves and company plans, and the model should help management understand market pricing.
DataSet Overview
The Laptop Dataset contains 23 features/attributes and 896 records. Let's explore the features of the dataset.
- brand: Manufacturer of Laptop.
- model: Model name of the Laptop.
- processor_brand: Manufacturer of the Processor.
- processor_name: An integral part of the laptop, the processor will determine how powerful your computer is.
- processor_gnrtn: The processor generation. A generation is a group of processors launched in a particular period with significant improvements over earlier ones.
- ram_gb: The size of the RAM in GB (gigabytes).
- ram_type: The type of RAM used. Values include DDR4, LPDDR4X, LPDDR4, LPDDR3, DDR3, and DDR5.
- ssd: The storage size of the SSD.
- hdd: The storage size of the hard disk.
- os: Operating system, such as Windows, DOS, or Mac.
- os_bit: OS architecture, either 64-bit or 32-bit.
- graphic_card_gb: The memory size of the graphics card in the laptop.
- weight: The type of laptop based on weight. Values include Casual, ThinNlight, and Gaming.
- display_size: Display size in inches.
- warranty: Warranty in years.
- Touchscreen: Whether the laptop has a touchscreen.
- msoffice: Whether the laptop has MS Office installed.
- old_price: Price of the laptop when it was released.
- discount: Discount percentage available on the laptop's price.
- star_rating: User rating for the laptop, with a maximum of 5.0.
- ratings: Total number of ratings received for the laptop model.
- reviews: Number of reviews received for the laptop model.
This dataset is available on Kaggle.
1. Data Preparation
Load the required packages
library(lubridate) # Provides functions for working with dates and times in R
library(dplyr) # Package for data manipulation
library(MASS) # Collection of functions and datasets for applied statistics
library(tidyverse) # Collection of packages for data manipulation and visualization
library(caret) # Comprehensive toolkit for machine learning
library(plotly) # Interactive plotting library
library(data.table) # Enhanced data frame for efficient data manipulation
library(GGally) # Extension to ggplot2 for exploratory data analysis
library(tidymodels) # Framework for modeling and machine learning
library(car) # Tools for applied regression analysis
library(scales) # Functions for scaling and formatting plot axes
library(lmtest) # Diagnostic tests and specification tests for linear regression models
Load DataSet
# read the laptop data
laptopSales <- read.csv("data_input/Laptop_data.csv")
rmarkdown::paged_table(laptopSales)
2. Data Cleansing
Checking Data Structure
This step checks whether the data type of each column/variable in our data is appropriate.
glimpse(laptopSales)
#> Rows: 896
#> Columns: 23
#> $ brand <chr> "Lenovo", "Lenovo", "Avita", "Avita", "Avita", "Avita"…
#> $ model <chr> "A6-9225", "Ideapad", "PURA", "PURA", "PURA", "PURA", …
#> $ processor_brand <chr> "AMD", "AMD", "AMD", "AMD", "AMD", "AMD", "AMD", "AMD"…
#> $ processor_name <chr> "A6-9225 Processor", "APU Dual", "APU Dual", "APU Dual…
#> $ processor_gnrtn <chr> "10th", "10th", "10th", "10th", "10th", "10th", "10th"…
#> $ ram_gb <chr> "4 GB GB", "4 GB GB", "4 GB GB", "4 GB GB", "4 GB GB",…
#> $ ram_type <chr> "DDR4", "DDR4", "DDR4", "DDR4", "DDR4", "DDR4", "DDR4"…
#> $ ssd <chr> "0 GB", "0 GB", "128 GB", "128 GB", "256 GB", "256 GB"…
#> $ hdd <chr> "1024 GB", "512 GB", "0 GB", "0 GB", "0 GB", "0 GB", "…
#> $ os <chr> "Windows", "Windows", "Windows", "Windows", "Windows",…
#> $ os_bit <chr> "64-bit", "64-bit", "64-bit", "64-bit", "64-bit", "64-…
#> $ graphic_card_gb <int> 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 2, …
#> $ weight <chr> "ThinNlight", "Casual", "ThinNlight", "ThinNlight", "T…
#> $ display_size <dbl> NA, NA, NA, NA, NA, 14.0, 14.0, NA, 14.0, NA, 15.6, NA…
#> $ warranty <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ Touchscreen <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ msoffice <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ latest_price <int> 24990, 19590, 19990, 21490, 24990, 24990, 20900, 21896…
#> $ old_price <int> 32790, 21325, 27990, 27990, 33490, 33490, 22825, 0, 27…
#> $ discount <int> 23, 8, 28, 23, 25, 25, 8, 0, 2, 13, 17, 8, 9, 0, 6, 17…
#> $ star_rating <dbl> 3.7, 3.6, 3.7, 3.7, 3.7, 3.7, 3.9, 3.9, 0.0, 4.2, 2.3,…
#> $ ratings <int> 63, 1894, 1153, 1153, 1657, 1657, 1185, 219, 0, 76, 3,…
#> $ reviews <int> 12, 256, 159, 159, 234, 234, 141, 18, 0, 13, 0, 5, 1, …
💡 What do we learn from the inspection above?
- The data has 896 rows and 23 variables/columns.
- Variables with a numeric data type: latest_price, old_price, discount, star_rating, ratings, reviews.
- Variables to treat as categorical (factor): processor_brand, processor_gnrtn, ram_gb, ram_type, ssd, hdd, os, os_bit, weight, display_size, warranty, Touchscreen, msoffice.
Data Type Adjustment
Converting features to a numeric data type: some features stored as text actually hold numeric values, namely ram_gb, ssd, hdd, and os_bit.
Instead of encoding them, we can convert these columns to a numeric type directly and keep the values they contain for modelling.
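The conversion can be checked on a handful of sample strings before it is applied to the whole data frame. The sketch below is a minimal illustration: `strip_unit` is a hypothetical helper that is not part of the original analysis, and the sample values are assumed, mimicking the "4 GB GB" pattern seen in the raw data.

```r
# Hypothetical helper: remove a unit label and convert what is left to numeric
strip_unit <- function(x, unit) {
  as.numeric(gsub(unit, "", x, fixed = TRUE))
}

ram_raw <- c("4 GB GB", "8 GB GB", "16 GB GB")  # sample values, assumed for illustration
bit_raw <- c("64-bit", "32-bit")

ram_num <- strip_unit(ram_raw, " GB")   # gsub() removes every " GB" occurrence
bit_num <- strip_unit(bit_raw, "-bit")

print(ram_num)
print(bit_num)
```

Because `gsub()` replaces every match, the duplicated " GB GB" suffix is removed in a single pass.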
laptopSales_clean <- laptopSales %>%
mutate(ram_gb = as.numeric(gsub(" GB", "", ram_gb)),
ssd = as.numeric(gsub(" GB", "", ssd)),
hdd = as.numeric(gsub(" GB", "", hdd)),
os_bit = as.numeric(gsub("-bit", "", os_bit))) %>%
mutate_at(vars(brand, model, processor_name, display_size), as.factor)
Missing Value Check & Handling
This step checks whether the data contains missing values. We can use is.na() for this.
Checking by Count
colSums(is.na(laptopSales_clean))
#> brand model processor_brand processor_name processor_gnrtn
#> 0 95 0 0 239
#> ram_gb ram_type ssd hdd os
#> 0 0 0 0 0
#> os_bit graphic_card_gb weight display_size warranty
#> 0 0 0 332 0
#> Touchscreen msoffice latest_price old_price discount
#> 0 0 0 0 0
#> star_rating ratings reviews
#> 0 0 0
Checking in percentage
# Calculate the percentage of missing values in each variable
missing_percent <- colMeans(is.na(laptopSales_clean)) * 100
# Create a data frame for visualization
missing_data <- data.frame(variable = names(missing_percent),
missing_percent = missing_percent)
# Filter variables with missing values
missing_data <- missing_data[missing_data$missing_percent > 0, ]
# Create the interactive bar plot with tooltips
p <- plot_ly(data = missing_data, x = ~variable, y = ~missing_percent, type = "bar",
text = ~paste0(variable, ": ", round(missing_percent, 2), "%"),
hoverinfo = "text", marker = list(color = "steelblue"))
# Set plot labels and title
p <- layout(p, xaxis = list(title = "Variable"), yaxis = list(title = "Percentage of Missing Values"),
title = "Percentage of Missing Values in Variables")
p
A variable with a high percentage of missing values, such as display_size at 37%, raises concerns about the overall quality and reliability of the data. A large share of its values is unknown or unrecorded, which could introduce bias or inaccuracy into the analysis, so I decided to drop the column from the data frame.
laptopSales_clean <- laptopSales_clean %>%
  dplyr::select(-display_size) # the dplyr:: prefix avoids a clash with MASS::select
How the remaining missing values are handled in the regression model depends on the nature of the missing data and the requirements of the analysis.
We cannot simply delete every row that has a missing value, because deleting rows:
- may erase a significant portion of the data, including useful information
- can bias the dataset if a large share of one kind of observation is removed
- leaves the production model with no way to handle missing data when it appears
We therefore need other methods to deal with the missing values.
Data Imputation Techniques
Filling in missing data with an arbitrary value is a useful imputation approach because it handles both categorical and numerical variables. With this method, we group the missing values in a column and assign them a new value that is well outside the column's range. Sometimes the fact that a value is missing is informative in itself, so imputing the most prevalent class would be inappropriate. In that situation, a value like Unknown can be used instead.
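One detail worth checking when imputing with `ifelse()`: applied to a factor, it returns the underlying integer level codes rather than the labels. The sketch below shows that behaviour on a toy vector, along with one possible factor-safe alternative. `fill_na_label` and `fill_na_value` are hypothetical helpers introduced only for illustration, and the toy values are assumed.

```r
toy <- factor(c("Intel", NA, "AMD"))

# ifelse() drops the factor attributes: non-missing entries become level codes
codes <- ifelse(is.na(toy), "Unknown", toy)
print(codes)

# Hypothetical helper: go through character so the labels are preserved
fill_na_label <- function(x, label = "Unknown") {
  x <- as.character(x)
  x[is.na(x)] <- label
  factor(x)
}

# Hypothetical helper: replace missing numerics with a sentinel value
fill_na_value <- function(x, value = -999) {
  x[is.na(x)] <- value
  x
}

filled_fct <- fill_na_label(toy)
filled_num <- fill_na_value(c(14, NA, 15.6))
print(filled_fct)
print(filled_num)
```

Whether a sentinel such as -999 is appropriate depends on the model: in a linear regression it is treated as a real measurement, so it should be used with care.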
# Create holdingDataFrame with the 'brand' and 'model' variables
holdingDataFrame <- data.frame(brand = laptopSales_clean$brand, model = laptopSales_clean$model)
head(holdingDataFrame)
#> brand model
#> 1 Lenovo A6-9225
#> 2 Lenovo Ideapad
#> 3 Avita PURA
#> 4 Avita PURA
#> 5 Avita PURA
#> 6 Avita PURA
# Replace missing values in character columns with "Unknown"
laptopSales_clean <- laptopSales_clean %>%
  mutate(across(where(is.character), ~ifelse(is.na(.), "Unknown", .)))
# Replace missing values in factor columns with "Unknown"
# Note: ifelse() drops the factor attributes, so factor columns come back as
# their integer level codes (see processor_name in the output below)
laptopSales_clean <- laptopSales_clean %>%
  mutate(across(where(is.factor), ~ifelse(is.na(.), "Unknown", .)))
# Replace missing values in numeric columns with an arbitrary value
arbitrary_value <- -999 # You can choose any value that is outside the range of your numeric variables
laptopSales_clean <- laptopSales_clean %>%
  mutate(across(where(is.numeric), ~ifelse(is.na(.), arbitrary_value, .)))
# Drop the 'brand' and 'model' variables from laptopSales_clean
laptopSales_clean$brand <- NULL
laptopSales_clean$model <- NULL
# Add the original 'brand' and 'model' factors back to laptopSales_clean
laptopSales_clean <- cbind(holdingDataFrame, laptopSales_clean)
# Print the updated data frame
head(laptopSales_clean)
#> brand model processor_brand processor_name processor_gnrtn ram_gb ram_type
#> 1 Lenovo A6-9225 AMD 1 10th 4 DDR4
#> 2 Lenovo Ideapad AMD 2 10th 4 DDR4
#> 3 Avita PURA AMD 2 10th 4 DDR4
#> 4 Avita PURA AMD 2 10th 4 DDR4
#> 5 Avita PURA AMD 2 10th 4 DDR4
#> 6 Avita PURA AMD 2 10th 8 DDR4
#> ssd hdd os os_bit graphic_card_gb weight warranty Touchscreen
#> 1 0 1024 Windows 64 0 ThinNlight 0 No
#> 2 0 512 Windows 64 0 Casual 0 No
#> 3 128 0 Windows 64 0 ThinNlight 0 No
#> 4 128 0 Windows 64 0 ThinNlight 0 No
#> 5 256 0 Windows 64 0 ThinNlight 0 No
#> 6 256 0 Windows 64 0 ThinNlight 0 No
#> msoffice latest_price old_price discount star_rating ratings reviews
#> 1 No 24990 32790 23 3.7 63 12
#> 2 No 19590 21325 8 3.6 1894 256
#> 3 No 19990 27990 28 3.7 1153 159
#> 4 No 21490 27990 23 3.7 1153 159
#> 5 No 24990 33490 25 3.7 1657 234
#> 6 No 24990 33490 25 3.7 1657 234
glimpse(laptopSales_clean)
#> Rows: 896
#> Columns: 22
#> $ brand <fct> Lenovo, Lenovo, Avita, Avita, Avita, Avita, HP, Lenovo…
#> $ model <fct> A6-9225, Ideapad, PURA, PURA, PURA, PURA, APU, APU, At…
#> $ processor_brand <chr> "AMD", "AMD", "AMD", "AMD", "AMD", "AMD", "AMD", "AMD"…
#> $ processor_name <int> 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 6, 6, 6, 7, 4, 4, 4, 7, …
#> $ processor_gnrtn <chr> "10th", "10th", "10th", "10th", "10th", "10th", "10th"…
#> $ ram_gb <dbl> 4, 4, 4, 4, 4, 8, 4, 4, 32, 4, 4, 4, 4, 8, 4, 4, 4, 8,…
#> $ ram_type <chr> "DDR4", "DDR4", "DDR4", "DDR4", "DDR4", "DDR4", "DDR4"…
#> $ ssd <dbl> 0, 0, 128, 128, 256, 256, 0, 0, 32, 256, 0, 0, 0, 512,…
#> $ hdd <dbl> 1024, 512, 0, 0, 0, 0, 1024, 1024, 0, 0, 1024, 1024, 1…
#> $ os <chr> "Windows", "Windows", "Windows", "Windows", "Windows",…
#> $ os_bit <dbl> 64, 64, 64, 64, 64, 64, 32, 64, 32, 64, 64, 64, 64, 32…
#> $ graphic_card_gb <int> 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 2, …
#> $ weight <chr> "ThinNlight", "Casual", "ThinNlight", "ThinNlight", "T…
#> $ warranty <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ Touchscreen <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ msoffice <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ latest_price <int> 24990, 19590, 19990, 21490, 24990, 24990, 20900, 21896…
#> $ old_price <int> 32790, 21325, 27990, 27990, 33490, 33490, 22825, 0, 27…
#> $ discount <int> 23, 8, 28, 23, 25, 25, 8, 0, 2, 13, 17, 8, 9, 0, 6, 17…
#> $ star_rating <dbl> 3.7, 3.6, 3.7, 3.7, 3.7, 3.7, 3.9, 3.9, 0.0, 4.2, 2.3,…
#> $ ratings <int> 63, 1894, 1153, 1153, 1657, 1657, 1185, 219, 0, 76, 3,…
#> $ reviews <int> 12, 256, 159, 159, 234, 234, 141, 18, 0, 13, 0, 5, 1, …
Categorical Data Encoding
Categorical data encoding is an essential step in regression analysis because most regression algorithms cannot handle categorical variables directly. Categorical variables represent qualitative data divided into distinct categories or groups, such as processor_gnrtn, ram_type, Touchscreen, msoffice, os, weight, and processor_brand.
Encoding transforms these qualitative variables into numerical representations that regression algorithms can process. This lets us include them in the regression model and uncover any relationship they have with the dependent variable.
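A named vector works as a simple lookup table for this kind of ordinal encoding. The sketch below is one way to do it under stated assumptions: `encode_levels` is a hypothetical helper, matching is made case-insensitive so that labels such as "DDR4" and "Ddr4" map to the same code, and labels without a mapping become NA instead of being left as text. The sample labels are assumed for illustration.

```r
# Hypothetical helper: map labels to numeric codes via a named vector
encode_levels <- function(x, mapping) {
  names(mapping) <- toupper(names(mapping))  # normalise the keys
  unname(mapping[toupper(as.character(x))])  # unmatched labels give NA
}

ram_map <- c(Ddr3 = 0, Lpddr3 = 1, Ddr4 = 2, Lpddr4 = 3, Lpddr4x = 4, Ddr5 = 5)
ram_sample <- c("DDR4", "LPDDR4X", "DDR3", "SDRAM")  # sample labels, assumed

ram_codes <- encode_levels(ram_sample, ram_map)
print(ram_codes)
```

Returning NA for unmapped labels makes gaps in the mapping easy to spot with `anyNA()` before modelling.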
library(dplyr)
# Define the custom mapping
custom_mapping <- list(
processor_gnrtn = c('4th' = 0, '7th' = 1, '8th' = 2, 'Unknown' = 3, '9th' = 4, '10th' = 5, '11th' = 6, '12th' = 7),
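# Note: the data stores ram_type in upper case (e.g. "DDR4"), so these title-case
# keys do not match and ram_type is left unchanged, as the later output shows
ram_type = c('Ddr3' = 0, 'Lpddr3' = 1, 'Ddr4' = 2, 'Lpddr4' = 3, 'Lpddr4x' = 4, 'Ddr5' = 5),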
Touchscreen = c('No' = 0, 'Yes' = 1),
msoffice = c('No' = 0, 'Yes' = 1),
os = c('Windows' = 0, 'Mac' = 1, 'DOS' = 2),
weight = c('Casual' = 0, 'Gaming' = 1, 'ThinNlight' = 2),
processor_brand = c('AMD' = 0, 'Intel' = 1, 'M1' = 2, 'MediaTek' = 3, 'Qualcomm' = 4)
)
# Apply the custom mapping using mutate() and ifelse()
laptopSales_clean <- laptopSales_clean %>%
mutate(processor_gnrtn = ifelse(processor_gnrtn %in% names(custom_mapping$processor_gnrtn),
custom_mapping$processor_gnrtn[processor_gnrtn],
processor_gnrtn),
ram_type = ifelse(ram_type %in% names(custom_mapping$ram_type),
custom_mapping$ram_type[ram_type],
ram_type),
Touchscreen = ifelse(Touchscreen %in% names(custom_mapping$Touchscreen),
custom_mapping$Touchscreen[Touchscreen],
Touchscreen),
msoffice = ifelse(msoffice %in% names(custom_mapping$msoffice),
custom_mapping$msoffice[msoffice],
msoffice),
os = ifelse(os %in% names(custom_mapping$os),
custom_mapping$os[os],
os),
weight = ifelse(weight %in% names(custom_mapping$weight),
custom_mapping$weight[weight],
weight),
processor_brand = ifelse(processor_brand %in% names(custom_mapping$processor_brand),
custom_mapping$processor_brand[processor_brand],
processor_brand)
)
3. Exploratory Data Analysis (EDA)
Exploratory data analysis is the phase where we explore the variables and look for patterns that may indicate correlation between them.
Find the Pearson correlation between features.
ggcorr(laptopSales_clean, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)
In the correlation chart:
- only a few variables are positively correlated with latest_price, and old_price has the highest positive correlation of all
- several variables are moderately correlated with latest_price (correlation >= 0.5), such as graphic_card_gb, ssd, and ram_gb
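The same information can be read off numerically by computing the Pearson correlation of each numeric column with the target. This is a minimal sketch on toy data: `cor_with_target` is a hypothetical helper and the toy values are made up for illustration.

```r
# Hypothetical helper: Pearson correlation of every numeric column with a target
cor_with_target <- function(df, target) {
  num <- df[sapply(df, is.numeric)]
  others <- setdiff(names(num), target)
  cors <- sapply(num[others], function(col) {
    cor(col, num[[target]], use = "complete.obs")
  })
  sort(cors, decreasing = TRUE)
}

toy <- data.frame(
  price = c(10, 20, 30, 40, 50),
  ram   = c(2, 4, 6, 8, 10),     # perfectly correlated with price
  age   = c(5, 4, 3, 2, 1),      # perfectly anti-correlated with price
  brand = c("a", "b", "a", "b", "a")
)

toy_cors <- cor_with_target(toy, "price")
print(toy_cors)
```

A sorted vector like this makes it easy to apply a threshold such as 0.5 or 0.6 when shortlisting predictors.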
4. Modelling
Feature Selection
For feature selection, I will choose predictor variables for the target using several methods and then compare the resulting models:
head(laptopSales_clean)
#> brand model processor_brand processor_name processor_gnrtn ram_gb ram_type
#> 1 Lenovo A6-9225 0 1 5 4 DDR4
#> 2 Lenovo Ideapad 0 2 5 4 DDR4
#> 3 Avita PURA 0 2 5 4 DDR4
#> 4 Avita PURA 0 2 5 4 DDR4
#> 5 Avita PURA 0 2 5 4 DDR4
#> 6 Avita PURA 0 2 5 8 DDR4
#> ssd hdd os os_bit graphic_card_gb weight warranty Touchscreen msoffice
#> 1 0 1024 0 64 0 2 0 0 0
#> 2 0 512 0 64 0 0 0 0 0
#> 3 128 0 0 64 0 2 0 0 0
#> 4 128 0 0 64 0 2 0 0 0
#> 5 256 0 0 64 0 2 0 0 0
#> 6 256 0 0 64 0 2 0 0 0
#> latest_price old_price discount star_rating ratings reviews
#> 1 24990 32790 23 3.7 63 12
#> 2 19590 21325 8 3.6 1894 256
#> 3 19990 27990 28 3.7 1153 159
#> 4 21490 27990 23 3.7 1153 159
#> 5 24990 33490 25 3.7 1657 234
#> 6 24990 33490 25 3.7 1657 234
Model None
# model without predictors for `latest_price`
model_none <- lm(formula = latest_price ~ 1,
data = laptopSales_clean)
Model with all predictors
The formula for a regression model with all predictor variables is:
model_all <- lm(formula = latest_price ~ .,
data = laptopSales_clean)
summary(model_all)
#>
#> Call:
#> lm(formula = latest_price ~ ., data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -59530 -6404 0 5486 154406
#>
#> Coefficients: (12 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 67657.06641 22676.77917 2.984 0.002955 **
#> brandAPPLE 20857.43225 22276.19263 0.936 0.349458
#> brandASUS -1517.39740 12305.44877 -0.123 0.901899
#> brandAvita -18634.75488 22534.35821 -0.827 0.408567
#> brandDELL -13556.46677 14458.04240 -0.938 0.348774
#> brandHP -13960.93080 12614.79504 -1.107 0.268825
#> brandiball -11378.06995 28175.81766 -0.404 0.686474
#> brandInfinix -47712.72966 26965.50961 -1.769 0.077293 .
#> brandlenovo 8713.51511 17858.80818 0.488 0.625776
#> brandLenovo -2254.64758 13051.45745 -0.173 0.862901
#> brandLG 12507.26666 22179.04578 0.564 0.573000
#> brandMi -23678.24983 23945.94354 -0.989 0.323116
#> brandMICROSOFT 24235.39734 23246.30371 1.043 0.297542
#> brandMSI 9614.89660 27153.58592 0.354 0.723383
#> brandNokia -6476.70848 22551.41763 -0.287 0.774052
#> brandrealme -15140.66787 24359.62035 -0.622 0.534456
#> brandRedmiBook -41951.90792 19211.15083 -2.184 0.029336 *
#> brandSAMSUNG 15382.07953 30623.03464 0.502 0.615622
#> brandSmartron -25462.28515 24181.17227 -1.053 0.292739
#> brandVaio -10075.80053 26698.35016 -0.377 0.706003
#> model14s 192.79509 18301.19526 0.011 0.991598
#> model15 24097.85256 21331.79767 1.130 0.259030
#> model15-ec1105AX -19765.80844 23517.22699 -0.840 0.400945
#> model15q -6720.48194 19025.69913 -0.353 0.724028
#> model15s -11280.94463 17504.77836 -0.644 0.519510
#> model250 -7585.90395 23531.16795 -0.322 0.747270
#> model250-G6 -8156.00671 23431.70644 -0.348 0.727895
#> model3000 1094.43466 24985.01364 0.044 0.965074
#> model3511 -10111.36172 24915.02239 -0.406 0.684997
#> model430 1842.87174 23607.53165 0.078 0.937802
#> modelA6-9225 -20981.08957 24831.28072 -0.845 0.398450
#> modelAlpha -58253.94690 23823.73724 -2.445 0.014739 *
#> modelAMD 119.51677 23971.81365 0.005 0.996024
#> modelAPU -46179.04470 19884.66840 -2.322 0.020520 *
#> modelAspire -16950.57068 21133.51599 -0.802 0.422803
#> modelAsus -18410.05267 24041.50775 -0.766 0.444094
#> modelASUS -17708.88614 18115.51515 -0.978 0.328656
#> modelAthlon -71605.00706 24464.62781 -2.927 0.003543 **
#> modelB50-70 -11123.97309 25483.95915 -0.437 0.662611
#> modelBook -3622.58717 17490.11001 -0.207 0.835979
#> modelBook(Slim) NA NA NA NA
#> modelBravo -32714.62029 23926.30146 -1.367 0.171998
#> modelCeleron -13413.51830 19643.27247 -0.683 0.494940
#> modelChromebook -16127.64724 17707.31743 -0.911 0.362741
#> modelCommercial -20607.83581 24028.84499 -0.858 0.391411
#> modelCompBook NA NA NA NA
#> modelConceptD -13605.78623 26556.47266 -0.512 0.608590
#> modelCosmos 4844.21003 18958.99429 0.256 0.798410
#> modelCreator -2505.33074 23745.40805 -0.106 0.916005
#> modelDA -3811.72797 23281.02839 -0.164 0.869997
#> modelDELL -12186.27995 22001.91207 -0.554 0.579854
#> modelDelta 8532.73035 24015.85442 0.355 0.722482
#> modelDual -35102.59949 26522.74085 -1.323 0.186134
#> modelE -5838.82754 18678.22083 -0.313 0.754683
#> modelEeeBook -17455.81161 19115.23668 -0.913 0.361479
#> modelEnvy 5653.15121 18068.23538 0.313 0.754473
#> modelExpertBook -3691.05193 18025.46780 -0.205 0.837816
#> modelExtensa -20976.51161 26815.90097 -0.782 0.434355
#> modelF17 -36324.31181 24028.49661 -1.512 0.131088
#> modelG15 -21241.67505 21253.33011 -0.999 0.317945
#> modelG3 -15242.80080 24893.99382 -0.612 0.540546
#> modelG5 -13834.24680 22272.83386 -0.621 0.534732
#> modelG7 26620.37540 25959.75659 1.025 0.305530
#> modelGalaxy NA NA NA NA
#> modelGAMING -18801.16342 24925.52434 -0.754 0.450944
#> modelGE76 82907.66296 24383.83636 3.400 0.000714 ***
#> modelGF63 -32660.78088 18050.79401 -1.809 0.070850 .
#> modelGF65 -49733.45176 20530.11489 -2.422 0.015686 *
#> modelGP65 17839.22983 24353.93435 0.732 0.464126
#> modelGP76 -2794.33781 24236.05426 -0.115 0.908245
#> modelGram NA NA NA NA
#> modelGS -20417.01897 24975.20716 -0.817 0.413945
#> modelGS66 9772.21867 24240.72863 0.403 0.686983
#> modelHP -8328.56825 18097.79860 -0.460 0.645527
#> modelIdeapad -9490.09394 18511.76857 -0.513 0.608368
#> modelIdeaPad -8261.96979 18506.92500 -0.446 0.655437
#> modelIDEAPAD -27694.33080 24851.46810 -1.114 0.265519
#> modelINBook 43462.57580 19556.49311 2.222 0.026596 *
#> modelInpiron -3248.22309 24879.11773 -0.131 0.896163
#> modelInspiron -1082.02244 18835.54547 -0.057 0.954208
#> modelINSPIRON -2297.95132 21968.41401 -0.105 0.916723
#> modelInsprion -6823.87475 24999.26495 -0.273 0.784968
#> modelIntel 7557.87177 20163.29294 0.375 0.707906
#> modelKatana -36046.76369 18685.05003 -1.929 0.054141 .
#> modelLegion -10279.34083 18970.94270 -0.542 0.588108
#> modelLenovo -3409.97116 24771.53672 -0.138 0.890554
#> modelLiber 5223.77045 10804.61869 0.483 0.628920
#> modelMacBook NA NA NA NA
#> modelModern -27873.79796 17996.47029 -1.549 0.121901
#> modelNitro -2740.33058 22529.69491 -0.122 0.903228
#> modelNotebook -8679.14290 23524.91475 -0.369 0.712296
#> modelOmen -9137.66245 20714.61814 -0.441 0.659271
#> modelOMEN -9493.33349 18567.43496 -0.511 0.609321
#> modelPavilion -4451.83204 16936.59628 -0.263 0.792747
#> modelPentium -8357.98578 18777.33878 -0.445 0.656387
#> modelPredator -21419.53855 21914.72342 -0.977 0.328730
#> modelPrestige -13694.28024 18489.22922 -0.741 0.459163
#> modelPro 7270.55427 19441.14358 0.374 0.708542
#> modelPulse -37389.86396 19664.78730 -1.901 0.057693 .
#> modelPURA NA NA NA NA
#> modelPureBook NA NA NA NA
#> modelRog 6201.99937 24121.19874 0.257 0.797168
#> modelROG -7311.70883 17888.18196 -0.409 0.682860
#> modelRyzen -8266.15721 17286.38328 -0.478 0.632675
#> modelSE NA NA NA NA
#> modelSpectre 35620.31681 17488.24282 2.037 0.042070 *
#> modelSpin 36748.95547 27401.51635 1.341 0.180344
#> modelStealth -9213.45777 20775.51321 -0.443 0.657567
#> modelSummit -35808.58616 23555.60621 -1.520 0.128950
#> modelSurface NA NA NA NA
#> modelSwift -10707.53910 21928.57789 -0.488 0.625507
#> modelSword -34241.89252 20788.07859 -1.647 0.099999 .
#> modelt.book NA NA NA NA
#> modelThinkbook -1640.55435 20492.00492 -0.080 0.936215
#> modelThinkBook -2699.63481 20267.76615 -0.133 0.894077
#> modelThinkpad -51593.20398 25614.99920 -2.014 0.044399 *
#> modelThinkPad 19267.96161 19252.41994 1.001 0.317290
#> modelThinpad -10686.85099 24542.08401 -0.435 0.663379
#> modelTravelmate -17847.06068 23016.98810 -0.775 0.438391
#> modelTUF -25770.35401 18298.35503 -1.408 0.159504
#> modelv15 -16942.47009 24428.33439 -0.694 0.488205
#> modelV15 -17083.33707 24669.40340 -0.692 0.488875
#> modelVivo -14447.92878 23902.67216 -0.604 0.545755
#> modelVivoBook -12956.17719 17475.92680 -0.741 0.458733
#> modelVivoBook14 -8953.63224 24054.53499 -0.372 0.709848
#> modelVostro -5245.59184 19081.74915 -0.275 0.783479
#> modelWF65 NA NA NA NA
#> modelX1 NA NA NA NA
#> modelx360 -3733.32671 20358.31725 -0.183 0.854556
#> modelX390 9489.82381 24530.60899 0.387 0.698988
#> modelXPS 26596.75274 20586.29599 1.292 0.196825
#> modelYoga -4480.19759 19073.28931 -0.235 0.814365
#> modelZenbook -3724.22436 18713.69951 -0.199 0.842316
#> modelZenBook 144.07041 18006.70785 0.008 0.993619
#> modelZephyrus 49038.22746 19309.07108 2.540 0.011326 *
#> processor_brand -5603.88832 3333.42429 -1.681 0.093216 .
#> processor_name -302.57815 199.62706 -1.516 0.130073
#> processor_gnrtn -89.19753 880.65195 -0.101 0.919355
#> ram_gb 1022.45171 180.88930 5.652 0.0000000236302574 ***
#> ram_typeDDR4 -5799.66385 7831.00949 -0.741 0.459200
#> ram_typeDDR5 17196.23435 10756.43666 1.599 0.110371
#> ram_typeLPDDR3 18696.10910 9071.72573 2.061 0.039705 *
#> ram_typeLPDDR4 -14529.36400 8818.50173 -1.648 0.099914 .
#> ram_typeLPDDR4X -12694.99083 7907.73032 -1.605 0.108890
#> ssd 50.75616 3.49021 14.542 < 0.0000000000000002 ***
#> hdd 13.41026 2.09281 6.408 0.0000000002822356 ***
#> os 22683.05295 2931.22877 7.738 0.0000000000000382 ***
#> os_bit -155.54614 85.40778 -1.821 0.069030 .
#> graphic_card_gb 4241.29732 556.03883 7.628 0.0000000000000845 ***
#> weight 625.64006 940.97392 0.665 0.506358
#> warranty 2685.30535 1523.67560 1.762 0.078470 .
#> Touchscreen 10505.74460 2580.38097 4.071 0.0000524482570815 ***
#> msoffice -1730.07157 1829.33162 -0.946 0.344630
#> old_price 0.28283 0.02102 13.457 < 0.0000000000000002 ***
#> discount -1097.23683 86.04487 -12.752 < 0.0000000000000002 ***
#> star_rating -1684.90027 377.03575 -4.469 0.0000092668907647 ***
#> ratings -9.13735 3.59625 -2.541 0.011290 *
#> reviews 64.23836 30.75408 2.089 0.037113 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 16370 on 655 degrees of freedom
#> (95 observations deleted due to missingness)
#> Multiple R-squared: 0.887, Adjusted R-squared: 0.862
#> F-statistic: 35.45 on 145 and 655 DF, p-value: < 0.00000000000000022
We obtain many more coefficients than we have variables because the categorical variables are converted into dummy variables. Using contrasts, for example, the ram_type variable is expanded into several dummy variables (ram_typeDDR4, ram_typeDDR5, and so on).
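To see the dummy-variable expansion in isolation, `model.matrix()` can be applied to a small factor. The toy values below are assumed for illustration. With R's default treatment contrasts, a factor with k levels produces k - 1 dummy columns plus the intercept.

```r
toy <- data.frame(
  ram_type = factor(c("DDR3", "DDR4", "DDR5", "DDR4"))
)

# Default treatment contrasts: the first level (DDR3) is the reference
dummies <- model.matrix(~ ram_type, data = toy)
print(dummies)
```

The reference level gets no column of its own, which is why one level of each factor never appears among the coefficients of a fitted model.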
- Interpretation of the coefficients for numerical predictors:
  - old_price = 0.28283, meaning that latest_price increases by 0.28283 for each one-unit increase in old_price, provided that the values of the other predictor variables are fixed.
  - discount = -1097.23683, meaning that latest_price decreases by 1097.23683 for each one-point increase in discount, provided that the values of the other predictor variables are fixed.
- Predictor significance:
  - predictor variables that significantly influence latest_price (p < 0.05) are brandRedmiBook, modelAlpha, modelAPU, modelAthlon, modelGE76, modelGF65, modelINBook, modelSpectre, modelThinkpad, modelZephyrus, ram_gb, ram_typeLPDDR3, ssd, hdd, os, graphic_card_gb, Touchscreen, old_price, discount, star_rating, ratings, and reviews
- Adjusted R-squared:
  - 0.862, meaning that the model explains about 86.2% of the variance in latest_price after adjusting for the number of predictors
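The adjusted R-squared can be reproduced from the figures in the summary. The sketch below plugs in the rounded multiple R-squared (0.887), the 145 model degrees of freedom, and the 655 residual degrees of freedom reported above, so the result is only accurate to about three decimals.

```r
r_squared   <- 0.887  # Multiple R-squared from the summary (rounded)
df_model    <- 145    # numerator degrees of freedom of the F-statistic
df_residual <- 655    # residual degrees of freedom

n <- df_model + df_residual + 1  # observations used in the fit (801)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_residual
print(round(adj_r_squared, 3))
```

The 801 observations correspond to the 896 rows minus the 95 dropped for missingness.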
Model with correlation value (strong)
To select a variable that has the potential to be used as a predictor in the regression model, a predictor variable that has a strong correlation with the target variable will be selected.
# check correlation
ggcorr(laptopSales_clean, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)
💡 Insight: only old_price has a strong correlation (> 0.6) with latest_price.
# Models with columns that have a fairly strong correlation
model_selection <- lm(formula = latest_price ~ old_price,
data = laptopSales_clean)
summary(model_selection)
#>
#> Call:
#> lm(formula = latest_price ~ old_price, data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -51799 -13547 -8447 1917 420858
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 21132.14002 1934.73437 10.92 <0.0000000000000002 ***
#> old_price 0.62607 0.01856 33.74 <0.0000000000000002 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 30930 on 894 degrees of freedom
#> Multiple R-squared: 0.5601, Adjusted R-squared: 0.5596
#> F-statistic: 1138 on 1 and 894 DF, p-value: < 0.00000000000000022
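With a single predictor, the fitted model is just a straight line, so a prediction can be worked out by hand. The sketch below uses the coefficients from the summary above. The old_price value of 32790 is taken from the first row of the data and serves only as an example input, and `predict_latest` is a hypothetical helper.

```r
intercept <- 21132.14002  # (Intercept) estimate from the summary
slope     <- 0.62607      # old_price estimate from the summary

# Hypothetical helper: predicted latest_price for a given old_price
predict_latest <- function(old_price) {
  intercept + slope * old_price
}

pred <- predict_latest(32790)
print(pred)
```

The first laptop's actual latest_price is 24990, so the model overestimates it by roughly 16,700, which is within the residual standard error of 30930.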
Model with Stepwise Regression (backward)
# Perform stepwise model selection using AIC
model_backward <- stepAIC(model_all,
direction = "backward",
trace = FALSE)
# Print the summary of the simplified model
summary(model_backward)
#>
#> Call:
#> lm(formula = latest_price ~ brand + model + processor_brand +
#> processor_name + ram_gb + ram_type + ssd + hdd + os + os_bit +
#> graphic_card_gb + warranty + Touchscreen + old_price + discount +
#> star_rating + ratings + reviews, data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -58883 -6584 0 5311 154896
#>
#> Coefficients: (12 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 67806.03866 22632.07445 2.996 0.002838 **
#> brandAPPLE 20953.04832 22172.80842 0.945 0.345012
#> brandASUS -1997.95099 12266.47860 -0.163 0.870664
#> brandAvita -18160.19145 22345.42127 -0.813 0.416682
#> brandDELL -13865.08518 14372.99519 -0.965 0.335068
#> brandHP -14458.06590 12584.19477 -1.149 0.251012
#> brandiball -12371.59967 27839.25253 -0.444 0.656904
#> brandInfinix -47702.06029 26864.11590 -1.776 0.076248 .
#> brandlenovo 7963.03635 17704.43392 0.450 0.653020
#> brandLenovo -3183.27170 12948.38845 -0.246 0.805880
#> brandLG 12516.61507 22033.17585 0.568 0.570174
#> brandMi -25183.06463 23784.59603 -1.059 0.290081
#> brandMICROSOFT 24259.23809 23108.44245 1.050 0.294196
#> brandMSI 11065.30557 26989.69853 0.410 0.681952
#> brandNokia -5863.16494 22407.81243 -0.262 0.793668
#> brandrealme -16955.97004 24154.84702 -0.702 0.482946
#> brandRedmiBook -41477.07046 18907.91790 -2.194 0.028611 *
#> brandSAMSUNG 15129.77772 30555.84508 0.495 0.620658
#> brandSmartron -25438.26273 24100.21780 -1.056 0.291575
#> brandVaio -9759.33333 26601.52838 -0.367 0.713833
#> model14s 326.58570 18018.05240 0.018 0.985544
#> model15 25010.09178 21272.33276 1.176 0.240136
#> model15-ec1105AX -19076.77360 23382.20096 -0.816 0.414871
#> model15q -6306.35390 18995.74333 -0.332 0.740004
#> model15s -11217.45588 17189.38280 -0.653 0.514255
#> model250 -8509.49091 23388.19252 -0.364 0.716097
#> model250-G6 -7447.92017 23368.37002 -0.319 0.750042
#> model3000 1090.98555 24775.64341 0.044 0.964890
#> model3511 -9904.17253 24712.29756 -0.401 0.688713
#> model430 1874.94673 23428.40504 0.080 0.936239
#> modelA6-9225 -19501.76120 24510.89299 -0.796 0.426530
#> modelAlpha -59045.93937 23759.42904 -2.485 0.013197 *
#> modelAMD 301.28654 23864.48520 0.013 0.989931
#> modelAPU -46115.72378 19668.17063 -2.345 0.019339 *
#> modelAspire -16693.12886 21002.77652 -0.795 0.427013
#> modelAsus -18717.01731 23969.92036 -0.781 0.435169
#> modelASUS -16874.09030 17923.88681 -0.941 0.346830
#> modelAthlon -71356.13370 24322.42210 -2.934 0.003465 **
#> modelB50-70 -10741.63618 25445.77519 -0.422 0.673062
#> modelBook -2332.60760 17342.25166 -0.135 0.893045
#> modelBook(Slim) NA NA NA NA
#> modelBravo -34513.45598 23833.25635 -1.448 0.148059
#> modelCeleron -12515.60319 19574.01204 -0.639 0.522786
#> modelChromebook -16036.59028 17666.20472 -0.908 0.364340
#> modelCommercial -20636.91522 23925.98730 -0.863 0.388709
#> modelCompBook NA NA NA NA
#> modelConceptD -13197.18411 26510.25918 -0.498 0.618781
#> modelCosmos 3868.64321 18708.01404 0.207 0.836237
#> modelCreator -4081.35577 23652.66591 -0.173 0.863055
#> modelDA -3430.64774 23215.18377 -0.148 0.882565
#> modelDELL -12946.50653 21827.09900 -0.593 0.553292
#> modelDelta 6396.71477 23906.05509 0.268 0.789109
#> modelDual -35165.47085 26443.48992 -1.330 0.184034
#> modelE -6219.52003 18394.31076 -0.338 0.735379
#> modelEeeBook -17210.20587 19074.59861 -0.902 0.367250
#> modelEnvy 4922.31372 17914.21273 0.275 0.783578
#> modelExpertBook -3756.84414 17864.99041 -0.210 0.833506
#> modelExtensa -21026.37592 26726.00875 -0.787 0.431718
#> modelF17 -36122.58689 23890.91616 -1.512 0.131019
#> modelG15 -20994.60145 21091.25850 -0.995 0.319899
#> modelG3 -15110.17732 24734.61163 -0.611 0.541482
#> modelG5 -13711.31830 22054.03635 -0.622 0.534345
#> modelG7 27232.97220 25727.39511 1.059 0.290207
#> modelGalaxy NA NA NA NA
#> modelGAMING -18181.71385 24791.04701 -0.733 0.463577
#> modelGE76 81892.64679 24302.69180 3.370 0.000796 ***
#> modelGF63 -33214.41541 18013.82313 -1.844 0.065657 .
#> modelGF65 -50064.36488 20469.32266 -2.446 0.014713 *
#> modelGP65 16245.70083 24244.73482 0.670 0.503047
#> modelGP76 -3900.76356 24136.66990 -0.162 0.871661
#> modelGram NA NA NA NA
#> modelGS -21296.62246 24718.67363 -0.862 0.389243
#> modelGS66 8496.18112 24121.02548 0.352 0.724778
#> modelHP -8547.78914 17899.08256 -0.478 0.633126
#> modelIdeapad -9667.28710 18325.03017 -0.528 0.597993
#> modelIdeaPad -7288.76931 18253.59578 -0.399 0.689797
#> modelIDEAPAD -26554.72599 24675.44681 -1.076 0.282250
#> modelINBook 44078.78723 19507.88632 2.260 0.024176 *
#> modelInpiron -2832.02241 24770.84449 -0.114 0.909012
#> modelInspiron -1070.46578 18573.07821 -0.058 0.954057
#> modelINSPIRON -2304.33060 21791.67925 -0.106 0.915818
#> modelInsprion -6845.91170 24778.24145 -0.276 0.782414
#> modelIntel 7607.31872 19940.20922 0.382 0.702951
#> modelKatana -37034.26537 18636.25253 -1.987 0.047313 *
#> modelLegion -10068.07968 18797.42637 -0.536 0.592409
#> modelLenovo -3911.19762 24597.96025 -0.159 0.873714
#> modelLiber 5527.72316 10402.71995 0.531 0.595340
#> modelMacBook NA NA NA NA
#> modelModern -29637.06619 17908.23882 -1.655 0.098413 .
#> modelNitro -2755.73145 22237.73513 -0.124 0.901415
#> modelNotebook -7057.91968 23205.71487 -0.304 0.761113
#> modelOmen -9609.74489 20546.19727 -0.468 0.640144
#> modelOMEN -9508.81833 18396.89082 -0.517 0.605420
#> modelPavilion -4722.40097 16732.17346 -0.282 0.777852
#> modelPentium -8141.37873 18686.95825 -0.436 0.663218
#> modelPredator -20946.26558 21789.81158 -0.961 0.336761
#> modelPrestige -15141.43210 18425.63339 -0.822 0.411512
#> modelPro 8407.70773 19316.80149 0.435 0.663521
#> modelPulse -38425.95861 19592.46262 -1.961 0.050270 .
#> modelPURA NA NA NA NA
#> modelPureBook NA NA NA NA
#> modelRog 6517.75715 24030.16006 0.271 0.786297
#> modelROG -7092.32998 17771.44135 -0.399 0.689959
#> modelRyzen -7880.46305 17078.40676 -0.461 0.644644
#> modelSE NA NA NA NA
#> modelSpectre 36141.42688 17378.75021 2.080 0.037946 *
#> modelSpin 37220.54151 27318.66816 1.362 0.173519
#> modelStealth -10111.38319 20715.98962 -0.488 0.625645
#> modelSummit -35622.67400 23518.76190 -1.515 0.130341
#> modelSurface NA NA NA NA
#> modelSwift -10264.08388 21861.21608 -0.470 0.638860
#> modelSword -36238.36542 20670.51738 -1.753 0.080043 .
#> modelt.book NA NA NA NA
#> modelThinkbook -841.34026 20288.13874 -0.041 0.966934
#> modelThinkBook -3301.01601 20053.40131 -0.165 0.869301
#> modelThinkpad -50421.24815 25293.66128 -1.993 0.046626 *
#> modelThinkPad 20571.66678 18957.74027 1.085 0.278260
#> modelThinpad -10439.11119 24345.30467 -0.429 0.668214
#> modelTravelmate -18495.91163 22827.78217 -0.810 0.418097
#> modelTUF -25327.45428 18104.70648 -1.399 0.162301
#> modelv15 -16653.78412 24289.61241 -0.686 0.493185
#> modelV15 -15848.00420 24320.60383 -0.652 0.514868
#> modelVivo -14531.33374 23783.68697 -0.611 0.541424
#> modelVivoBook -12900.05952 17327.98941 -0.744 0.456862
#> modelVivoBook14 -7604.50777 23794.69774 -0.320 0.749382
#> modelVostro -5467.23643 18687.85849 -0.293 0.769954
#> modelWF65 NA NA NA NA
#> modelX1 NA NA NA NA
#> modelx360 -3246.85193 20160.13618 -0.161 0.872101
#> modelX390 9722.52867 24421.41044 0.398 0.690675
#> modelXPS 26702.06998 20260.76332 1.318 0.187989
#> modelYoga -3936.56025 18831.46025 -0.209 0.834480
#> modelZenbook -3743.52604 18564.75392 -0.202 0.840255
#> modelZenBook 896.05591 17833.89463 0.050 0.959943
#> modelZephyrus 49860.14719 19172.83953 2.601 0.009516 **
#> processor_brand -5697.55200 3264.14668 -1.745 0.081365 .
#> processor_name -292.64779 180.23915 -1.624 0.104927
#> ram_gb 1020.65575 179.69537 5.680 0.00000002023641693 ***
#> ram_typeDDR4 -6127.10247 7352.74709 -0.833 0.404973
#> ram_typeDDR5 16969.90914 10336.51322 1.642 0.101121
#> ram_typeLPDDR3 18104.16847 8953.02393 2.022 0.043567 *
#> ram_typeLPDDR4 -14707.21090 8597.28133 -1.711 0.087611 .
#> ram_typeLPDDR4X -13022.64794 7456.08282 -1.747 0.081177 .
#> ssd 50.67682 3.40783 14.871 < 0.0000000000000002 ***
#> hdd 13.33086 2.08507 6.393 0.00000000030736938 ***
#> os 22751.68108 2863.88065 7.944 0.00000000000000848 ***
#> os_bit -144.35363 84.46219 -1.709 0.087905 .
#> graphic_card_gb 4166.09241 538.41993 7.738 0.00000000000003821 ***
#> warranty 2124.31800 1412.35192 1.504 0.133036
#> Touchscreen 9834.80543 2484.18944 3.959 0.00008347721279763 ***
#> old_price 0.28100 0.02093 13.428 < 0.0000000000000002 ***
#> discount -1095.16380 84.10894 -13.021 < 0.0000000000000002 ***
#> star_rating -1736.40019 370.99756 -4.680 0.00000347854288514 ***
#> ratings -9.07415 3.59120 -2.527 0.011745 *
#> reviews 63.74358 30.70616 2.076 0.038289 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 16350 on 658 degrees of freedom
#> (95 observations deleted due to missingness)
#> Multiple R-squared: 0.8867, Adjusted R-squared: 0.8623
#> F-statistic: 36.28 on 142 and 658 DF, p-value: < 0.00000000000000022
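Model Stepwise Regression forward
# Perform stepwise model selection using AIC.
# Note: no `scope` is given, so forward selection starting from model_all
# has no terms left to add and returns model_all unchanged.
model_forward <- stepAIC(model_all,
                         direction = "forward",
                         trace = FALSE)
# Print the summary of the selected model
summary(model_forward)
#>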
#> Call:
#> lm(formula = latest_price ~ brand + model + processor_brand +
#> processor_name + processor_gnrtn + ram_gb + ram_type + ssd +
#> hdd + os + os_bit + graphic_card_gb + weight + warranty +
#> Touchscreen + msoffice + old_price + discount + star_rating +
#> ratings + reviews, data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -59530 -6404 0 5486 154406
#>
#> Coefficients: (12 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 67657.06641 22676.77917 2.984 0.002955 **
#> brandAPPLE 20857.43225 22276.19263 0.936 0.349458
#> brandASUS -1517.39740 12305.44877 -0.123 0.901899
#> brandAvita -18634.75488 22534.35821 -0.827 0.408567
#> brandDELL -13556.46677 14458.04240 -0.938 0.348774
#> brandHP -13960.93080 12614.79504 -1.107 0.268825
#> brandiball -11378.06995 28175.81766 -0.404 0.686474
#> brandInfinix -47712.72966 26965.50961 -1.769 0.077293 .
#> brandlenovo 8713.51511 17858.80818 0.488 0.625776
#> brandLenovo -2254.64758 13051.45745 -0.173 0.862901
#> brandLG 12507.26666 22179.04578 0.564 0.573000
#> brandMi -23678.24983 23945.94354 -0.989 0.323116
#> brandMICROSOFT 24235.39734 23246.30371 1.043 0.297542
#> brandMSI 9614.89660 27153.58592 0.354 0.723383
#> brandNokia -6476.70848 22551.41763 -0.287 0.774052
#> brandrealme -15140.66787 24359.62035 -0.622 0.534456
#> brandRedmiBook -41951.90792 19211.15083 -2.184 0.029336 *
#> brandSAMSUNG 15382.07953 30623.03464 0.502 0.615622
#> brandSmartron -25462.28515 24181.17227 -1.053 0.292739
#> brandVaio -10075.80053 26698.35016 -0.377 0.706003
#> model14s 192.79509 18301.19526 0.011 0.991598
#> model15 24097.85256 21331.79767 1.130 0.259030
#> model15-ec1105AX -19765.80844 23517.22699 -0.840 0.400945
#> model15q -6720.48194 19025.69913 -0.353 0.724028
#> model15s -11280.94463 17504.77836 -0.644 0.519510
#> model250 -7585.90395 23531.16795 -0.322 0.747270
#> model250-G6 -8156.00671 23431.70644 -0.348 0.727895
#> model3000 1094.43466 24985.01364 0.044 0.965074
#> model3511 -10111.36172 24915.02239 -0.406 0.684997
#> model430 1842.87174 23607.53165 0.078 0.937802
#> modelA6-9225 -20981.08957 24831.28072 -0.845 0.398450
#> modelAlpha -58253.94690 23823.73724 -2.445 0.014739 *
#> modelAMD 119.51677 23971.81365 0.005 0.996024
#> modelAPU -46179.04470 19884.66840 -2.322 0.020520 *
#> modelAspire -16950.57068 21133.51599 -0.802 0.422803
#> modelAsus -18410.05267 24041.50775 -0.766 0.444094
#> modelASUS -17708.88614 18115.51515 -0.978 0.328656
#> modelAthlon -71605.00706 24464.62781 -2.927 0.003543 **
#> modelB50-70 -11123.97309 25483.95915 -0.437 0.662611
#> modelBook -3622.58717 17490.11001 -0.207 0.835979
#> modelBook(Slim) NA NA NA NA
#> modelBravo -32714.62029 23926.30146 -1.367 0.171998
#> modelCeleron -13413.51830 19643.27247 -0.683 0.494940
#> modelChromebook -16127.64724 17707.31743 -0.911 0.362741
#> modelCommercial -20607.83581 24028.84499 -0.858 0.391411
#> modelCompBook NA NA NA NA
#> modelConceptD -13605.78623 26556.47266 -0.512 0.608590
#> modelCosmos 4844.21003 18958.99429 0.256 0.798410
#> modelCreator -2505.33074 23745.40805 -0.106 0.916005
#> modelDA -3811.72797 23281.02839 -0.164 0.869997
#> modelDELL -12186.27995 22001.91207 -0.554 0.579854
#> modelDelta 8532.73035 24015.85442 0.355 0.722482
#> modelDual -35102.59949 26522.74085 -1.323 0.186134
#> modelE -5838.82754 18678.22083 -0.313 0.754683
#> modelEeeBook -17455.81161 19115.23668 -0.913 0.361479
#> modelEnvy 5653.15121 18068.23538 0.313 0.754473
#> modelExpertBook -3691.05193 18025.46780 -0.205 0.837816
#> modelExtensa -20976.51161 26815.90097 -0.782 0.434355
#> modelF17 -36324.31181 24028.49661 -1.512 0.131088
#> modelG15 -21241.67505 21253.33011 -0.999 0.317945
#> modelG3 -15242.80080 24893.99382 -0.612 0.540546
#> modelG5 -13834.24680 22272.83386 -0.621 0.534732
#> modelG7 26620.37540 25959.75659 1.025 0.305530
#> modelGalaxy NA NA NA NA
#> modelGAMING -18801.16342 24925.52434 -0.754 0.450944
#> modelGE76 82907.66296 24383.83636 3.400 0.000714 ***
#> modelGF63 -32660.78088 18050.79401 -1.809 0.070850 .
#> modelGF65 -49733.45176 20530.11489 -2.422 0.015686 *
#> modelGP65 17839.22983 24353.93435 0.732 0.464126
#> modelGP76 -2794.33781 24236.05426 -0.115 0.908245
#> modelGram NA NA NA NA
#> modelGS -20417.01897 24975.20716 -0.817 0.413945
#> modelGS66 9772.21867 24240.72863 0.403 0.686983
#> modelHP -8328.56825 18097.79860 -0.460 0.645527
#> modelIdeapad -9490.09394 18511.76857 -0.513 0.608368
#> modelIdeaPad -8261.96979 18506.92500 -0.446 0.655437
#> modelIDEAPAD -27694.33080 24851.46810 -1.114 0.265519
#> modelINBook 43462.57580 19556.49311 2.222 0.026596 *
#> modelInpiron -3248.22309 24879.11773 -0.131 0.896163
#> modelInspiron -1082.02244 18835.54547 -0.057 0.954208
#> modelINSPIRON -2297.95132 21968.41401 -0.105 0.916723
#> modelInsprion -6823.87475 24999.26495 -0.273 0.784968
#> modelIntel 7557.87177 20163.29294 0.375 0.707906
#> modelKatana -36046.76369 18685.05003 -1.929 0.054141 .
#> modelLegion -10279.34083 18970.94270 -0.542 0.588108
#> modelLenovo -3409.97116 24771.53672 -0.138 0.890554
#> modelLiber 5223.77045 10804.61869 0.483 0.628920
#> modelMacBook NA NA NA NA
#> modelModern -27873.79796 17996.47029 -1.549 0.121901
#> modelNitro -2740.33058 22529.69491 -0.122 0.903228
#> modelNotebook -8679.14290 23524.91475 -0.369 0.712296
#> modelOmen -9137.66245 20714.61814 -0.441 0.659271
#> modelOMEN -9493.33349 18567.43496 -0.511 0.609321
#> modelPavilion -4451.83204 16936.59628 -0.263 0.792747
#> modelPentium -8357.98578 18777.33878 -0.445 0.656387
#> modelPredator -21419.53855 21914.72342 -0.977 0.328730
#> modelPrestige -13694.28024 18489.22922 -0.741 0.459163
#> modelPro 7270.55427 19441.14358 0.374 0.708542
#> modelPulse -37389.86396 19664.78730 -1.901 0.057693 .
#> modelPURA NA NA NA NA
#> modelPureBook NA NA NA NA
#> modelRog 6201.99937 24121.19874 0.257 0.797168
#> modelROG -7311.70883 17888.18196 -0.409 0.682860
#> modelRyzen -8266.15721 17286.38328 -0.478 0.632675
#> modelSE NA NA NA NA
#> modelSpectre 35620.31681 17488.24282 2.037 0.042070 *
#> modelSpin 36748.95547 27401.51635 1.341 0.180344
#> modelStealth -9213.45777 20775.51321 -0.443 0.657567
#> modelSummit -35808.58616 23555.60621 -1.520 0.128950
#> modelSurface NA NA NA NA
#> modelSwift -10707.53910 21928.57789 -0.488 0.625507
#> modelSword -34241.89252 20788.07859 -1.647 0.099999 .
#> modelt.book NA NA NA NA
#> modelThinkbook -1640.55435 20492.00492 -0.080 0.936215
#> modelThinkBook -2699.63481 20267.76615 -0.133 0.894077
#> modelThinkpad -51593.20398 25614.99920 -2.014 0.044399 *
#> modelThinkPad 19267.96161 19252.41994 1.001 0.317290
#> modelThinpad -10686.85099 24542.08401 -0.435 0.663379
#> modelTravelmate -17847.06068 23016.98810 -0.775 0.438391
#> modelTUF -25770.35401 18298.35503 -1.408 0.159504
#> modelv15 -16942.47009 24428.33439 -0.694 0.488205
#> modelV15 -17083.33707 24669.40340 -0.692 0.488875
#> modelVivo -14447.92878 23902.67216 -0.604 0.545755
#> modelVivoBook -12956.17719 17475.92680 -0.741 0.458733
#> modelVivoBook14 -8953.63224 24054.53499 -0.372 0.709848
#> modelVostro -5245.59184 19081.74915 -0.275 0.783479
#> modelWF65 NA NA NA NA
#> modelX1 NA NA NA NA
#> modelx360 -3733.32671 20358.31725 -0.183 0.854556
#> modelX390 9489.82381 24530.60899 0.387 0.698988
#> modelXPS 26596.75274 20586.29599 1.292 0.196825
#> modelYoga -4480.19759 19073.28931 -0.235 0.814365
#> modelZenbook -3724.22436 18713.69951 -0.199 0.842316
#> modelZenBook 144.07041 18006.70785 0.008 0.993619
#> modelZephyrus 49038.22746 19309.07108 2.540 0.011326 *
#> processor_brand -5603.88832 3333.42429 -1.681 0.093216 .
#> processor_name -302.57815 199.62706 -1.516 0.130073
#> processor_gnrtn -89.19753 880.65195 -0.101 0.919355
#> ram_gb 1022.45171 180.88930 5.652 0.0000000236302574 ***
#> ram_typeDDR4 -5799.66385 7831.00949 -0.741 0.459200
#> ram_typeDDR5 17196.23435 10756.43666 1.599 0.110371
#> ram_typeLPDDR3 18696.10910 9071.72573 2.061 0.039705 *
#> ram_typeLPDDR4 -14529.36400 8818.50173 -1.648 0.099914 .
#> ram_typeLPDDR4X -12694.99083 7907.73032 -1.605 0.108890
#> ssd 50.75616 3.49021 14.542 < 0.0000000000000002 ***
#> hdd 13.41026 2.09281 6.408 0.0000000002822356 ***
#> os 22683.05295 2931.22877 7.738 0.0000000000000382 ***
#> os_bit -155.54614 85.40778 -1.821 0.069030 .
#> graphic_card_gb 4241.29732 556.03883 7.628 0.0000000000000845 ***
#> weight 625.64006 940.97392 0.665 0.506358
#> warranty 2685.30535 1523.67560 1.762 0.078470 .
#> Touchscreen 10505.74460 2580.38097 4.071 0.0000524482570815 ***
#> msoffice -1730.07157 1829.33162 -0.946 0.344630
#> old_price 0.28283 0.02102 13.457 < 0.0000000000000002 ***
#> discount -1097.23683 86.04487 -12.752 < 0.0000000000000002 ***
#> star_rating -1684.90027 377.03575 -4.469 0.0000092668907647 ***
#> ratings -9.13735 3.59625 -2.541 0.011290 *
#> reviews 64.23836 30.75408 2.089 0.037113 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 16370 on 655 degrees of freedom
#> (95 observations deleted due to missingness)
#> Multiple R-squared: 0.887, Adjusted R-squared: 0.862
#> F-statistic: 35.45 on 145 and 655 DF, p-value: < 0.00000000000000022
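Forward selection is usually started from an intercept-only model and given a `scope` that names the largest model it may grow into. The sketch below illustrates this with base R's `step()`, which accepts the same `scope` and `direction` arguments as `stepAIC()`. It runs on the built-in `mtcars` data, the object names are hypothetical, and it is not a re-run of the laptop analysis.

```r
# Illustration on the built-in mtcars data, not the laptop dataset.
null_model <- lm(mpg ~ 1, data = mtcars)   # intercept only
full_model <- lm(mpg ~ ., data = mtcars)   # every available predictor

# Forward selection: start small and let AIC decide which terms to add,
# up to the terms of the full model.
fwd_from_null <- step(null_model,
                      scope = list(lower = ~1, upper = formula(full_model)),
                      direction = "forward",
                      trace = 0)

# Forward selection started from the full model without a scope:
# there is nothing left to add, so the full model comes back unchanged.
fwd_from_full <- step(full_model, direction = "forward", trace = 0)

length(coef(fwd_from_null))  # number of coefficients kept by the search
length(coef(fwd_from_full))  # same as length(coef(full_model))
```

Started this way, the forward search can end on a smaller model than the full one, which makes it a meaningful counterpart to backward elimination.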
Model Stepwise Regression both
# Perform stepwise model selection using AIC
model_both <- stepAIC(model_all,
                      direction = "both",
                      trace = FALSE)
# Print the summary of the simplified model
summary(model_both)
#>
#> Call:
#> lm(formula = latest_price ~ brand + model + processor_brand +
#> processor_name + ram_gb + ram_type + ssd + hdd + os + os_bit +
#> graphic_card_gb + warranty + Touchscreen + old_price + discount +
#> star_rating + ratings + reviews, data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -58883 -6584 0 5311 154896
#>
#> Coefficients: (12 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 67806.03866 22632.07445 2.996 0.002838 **
#> brandAPPLE 20953.04832 22172.80842 0.945 0.345012
#> brandASUS -1997.95099 12266.47860 -0.163 0.870664
#> brandAvita -18160.19145 22345.42127 -0.813 0.416682
#> brandDELL -13865.08518 14372.99519 -0.965 0.335068
#> brandHP -14458.06590 12584.19477 -1.149 0.251012
#> brandiball -12371.59967 27839.25253 -0.444 0.656904
#> brandInfinix -47702.06029 26864.11590 -1.776 0.076248 .
#> brandlenovo 7963.03635 17704.43392 0.450 0.653020
#> brandLenovo -3183.27170 12948.38845 -0.246 0.805880
#> brandLG 12516.61507 22033.17585 0.568 0.570174
#> brandMi -25183.06463 23784.59603 -1.059 0.290081
#> brandMICROSOFT 24259.23809 23108.44245 1.050 0.294196
#> brandMSI 11065.30557 26989.69853 0.410 0.681952
#> brandNokia -5863.16494 22407.81243 -0.262 0.793668
#> brandrealme -16955.97004 24154.84702 -0.702 0.482946
#> brandRedmiBook -41477.07046 18907.91790 -2.194 0.028611 *
#> brandSAMSUNG 15129.77772 30555.84508 0.495 0.620658
#> brandSmartron -25438.26273 24100.21780 -1.056 0.291575
#> brandVaio -9759.33333 26601.52838 -0.367 0.713833
#> model14s 326.58570 18018.05240 0.018 0.985544
#> model15 25010.09178 21272.33276 1.176 0.240136
#> model15-ec1105AX -19076.77360 23382.20096 -0.816 0.414871
#> model15q -6306.35390 18995.74333 -0.332 0.740004
#> model15s -11217.45588 17189.38280 -0.653 0.514255
#> model250 -8509.49091 23388.19252 -0.364 0.716097
#> model250-G6 -7447.92017 23368.37002 -0.319 0.750042
#> model3000 1090.98555 24775.64341 0.044 0.964890
#> model3511 -9904.17253 24712.29756 -0.401 0.688713
#> model430 1874.94673 23428.40504 0.080 0.936239
#> modelA6-9225 -19501.76120 24510.89299 -0.796 0.426530
#> modelAlpha -59045.93937 23759.42904 -2.485 0.013197 *
#> modelAMD 301.28654 23864.48520 0.013 0.989931
#> modelAPU -46115.72378 19668.17063 -2.345 0.019339 *
#> modelAspire -16693.12886 21002.77652 -0.795 0.427013
#> modelAsus -18717.01731 23969.92036 -0.781 0.435169
#> modelASUS -16874.09030 17923.88681 -0.941 0.346830
#> modelAthlon -71356.13370 24322.42210 -2.934 0.003465 **
#> modelB50-70 -10741.63618 25445.77519 -0.422 0.673062
#> modelBook -2332.60760 17342.25166 -0.135 0.893045
#> modelBook(Slim) NA NA NA NA
#> modelBravo -34513.45598 23833.25635 -1.448 0.148059
#> modelCeleron -12515.60319 19574.01204 -0.639 0.522786
#> modelChromebook -16036.59028 17666.20472 -0.908 0.364340
#> modelCommercial -20636.91522 23925.98730 -0.863 0.388709
#> modelCompBook NA NA NA NA
#> modelConceptD -13197.18411 26510.25918 -0.498 0.618781
#> modelCosmos 3868.64321 18708.01404 0.207 0.836237
#> modelCreator -4081.35577 23652.66591 -0.173 0.863055
#> modelDA -3430.64774 23215.18377 -0.148 0.882565
#> modelDELL -12946.50653 21827.09900 -0.593 0.553292
#> modelDelta 6396.71477 23906.05509 0.268 0.789109
#> modelDual -35165.47085 26443.48992 -1.330 0.184034
#> modelE -6219.52003 18394.31076 -0.338 0.735379
#> modelEeeBook -17210.20587 19074.59861 -0.902 0.367250
#> modelEnvy 4922.31372 17914.21273 0.275 0.783578
#> modelExpertBook -3756.84414 17864.99041 -0.210 0.833506
#> modelExtensa -21026.37592 26726.00875 -0.787 0.431718
#> modelF17 -36122.58689 23890.91616 -1.512 0.131019
#> modelG15 -20994.60145 21091.25850 -0.995 0.319899
#> modelG3 -15110.17732 24734.61163 -0.611 0.541482
#> modelG5 -13711.31830 22054.03635 -0.622 0.534345
#> modelG7 27232.97220 25727.39511 1.059 0.290207
#> modelGalaxy NA NA NA NA
#> modelGAMING -18181.71385 24791.04701 -0.733 0.463577
#> modelGE76 81892.64679 24302.69180 3.370 0.000796 ***
#> modelGF63 -33214.41541 18013.82313 -1.844 0.065657 .
#> modelGF65 -50064.36488 20469.32266 -2.446 0.014713 *
#> modelGP65 16245.70083 24244.73482 0.670 0.503047
#> modelGP76 -3900.76356 24136.66990 -0.162 0.871661
#> modelGram NA NA NA NA
#> modelGS -21296.62246 24718.67363 -0.862 0.389243
#> modelGS66 8496.18112 24121.02548 0.352 0.724778
#> modelHP -8547.78914 17899.08256 -0.478 0.633126
#> modelIdeapad -9667.28710 18325.03017 -0.528 0.597993
#> modelIdeaPad -7288.76931 18253.59578 -0.399 0.689797
#> modelIDEAPAD -26554.72599 24675.44681 -1.076 0.282250
#> modelINBook 44078.78723 19507.88632 2.260 0.024176 *
#> modelInpiron -2832.02241 24770.84449 -0.114 0.909012
#> modelInspiron -1070.46578 18573.07821 -0.058 0.954057
#> modelINSPIRON -2304.33060 21791.67925 -0.106 0.915818
#> modelInsprion -6845.91170 24778.24145 -0.276 0.782414
#> modelIntel 7607.31872 19940.20922 0.382 0.702951
#> modelKatana -37034.26537 18636.25253 -1.987 0.047313 *
#> modelLegion -10068.07968 18797.42637 -0.536 0.592409
#> modelLenovo -3911.19762 24597.96025 -0.159 0.873714
#> modelLiber 5527.72316 10402.71995 0.531 0.595340
#> modelMacBook NA NA NA NA
#> modelModern -29637.06619 17908.23882 -1.655 0.098413 .
#> modelNitro -2755.73145 22237.73513 -0.124 0.901415
#> modelNotebook -7057.91968 23205.71487 -0.304 0.761113
#> modelOmen -9609.74489 20546.19727 -0.468 0.640144
#> modelOMEN -9508.81833 18396.89082 -0.517 0.605420
#> modelPavilion -4722.40097 16732.17346 -0.282 0.777852
#> modelPentium -8141.37873 18686.95825 -0.436 0.663218
#> modelPredator -20946.26558 21789.81158 -0.961 0.336761
#> modelPrestige -15141.43210 18425.63339 -0.822 0.411512
#> modelPro 8407.70773 19316.80149 0.435 0.663521
#> modelPulse -38425.95861 19592.46262 -1.961 0.050270 .
#> modelPURA NA NA NA NA
#> modelPureBook NA NA NA NA
#> modelRog 6517.75715 24030.16006 0.271 0.786297
#> modelROG -7092.32998 17771.44135 -0.399 0.689959
#> modelRyzen -7880.46305 17078.40676 -0.461 0.644644
#> modelSE NA NA NA NA
#> modelSpectre 36141.42688 17378.75021 2.080 0.037946 *
#> modelSpin 37220.54151 27318.66816 1.362 0.173519
#> modelStealth -10111.38319 20715.98962 -0.488 0.625645
#> modelSummit -35622.67400 23518.76190 -1.515 0.130341
#> modelSurface NA NA NA NA
#> modelSwift -10264.08388 21861.21608 -0.470 0.638860
#> modelSword -36238.36542 20670.51738 -1.753 0.080043 .
#> modelt.book NA NA NA NA
#> modelThinkbook -841.34026 20288.13874 -0.041 0.966934
#> modelThinkBook -3301.01601 20053.40131 -0.165 0.869301
#> modelThinkpad -50421.24815 25293.66128 -1.993 0.046626 *
#> modelThinkPad 20571.66678 18957.74027 1.085 0.278260
#> modelThinpad -10439.11119 24345.30467 -0.429 0.668214
#> modelTravelmate -18495.91163 22827.78217 -0.810 0.418097
#> modelTUF -25327.45428 18104.70648 -1.399 0.162301
#> modelv15 -16653.78412 24289.61241 -0.686 0.493185
#> modelV15 -15848.00420 24320.60383 -0.652 0.514868
#> modelVivo -14531.33374 23783.68697 -0.611 0.541424
#> modelVivoBook -12900.05952 17327.98941 -0.744 0.456862
#> modelVivoBook14 -7604.50777 23794.69774 -0.320 0.749382
#> modelVostro -5467.23643 18687.85849 -0.293 0.769954
#> modelWF65 NA NA NA NA
#> modelX1 NA NA NA NA
#> modelx360 -3246.85193 20160.13618 -0.161 0.872101
#> modelX390 9722.52867 24421.41044 0.398 0.690675
#> modelXPS 26702.06998 20260.76332 1.318 0.187989
#> modelYoga -3936.56025 18831.46025 -0.209 0.834480
#> modelZenbook -3743.52604 18564.75392 -0.202 0.840255
#> modelZenBook 896.05591 17833.89463 0.050 0.959943
#> modelZephyrus 49860.14719 19172.83953 2.601 0.009516 **
#> processor_brand -5697.55200 3264.14668 -1.745 0.081365 .
#> processor_name -292.64779 180.23915 -1.624 0.104927
#> ram_gb 1020.65575 179.69537 5.680 0.00000002023641693 ***
#> ram_typeDDR4 -6127.10247 7352.74709 -0.833 0.404973
#> ram_typeDDR5 16969.90914 10336.51322 1.642 0.101121
#> ram_typeLPDDR3 18104.16847 8953.02393 2.022 0.043567 *
#> ram_typeLPDDR4 -14707.21090 8597.28133 -1.711 0.087611 .
#> ram_typeLPDDR4X -13022.64794 7456.08282 -1.747 0.081177 .
#> ssd 50.67682 3.40783 14.871 < 0.0000000000000002 ***
#> hdd 13.33086 2.08507 6.393 0.00000000030736938 ***
#> os 22751.68108 2863.88065 7.944 0.00000000000000848 ***
#> os_bit -144.35363 84.46219 -1.709 0.087905 .
#> graphic_card_gb 4166.09241 538.41993 7.738 0.00000000000003821 ***
#> warranty 2124.31800 1412.35192 1.504 0.133036
#> Touchscreen 9834.80543 2484.18944 3.959 0.00008347721279763 ***
#> old_price 0.28100 0.02093 13.428 < 0.0000000000000002 ***
#> discount -1095.16380 84.10894 -13.021 < 0.0000000000000002 ***
#> star_rating -1736.40019 370.99756 -4.680 0.00000347854288514 ***
#> ratings -9.07415 3.59120 -2.527 0.011745 *
#> reviews 63.74358 30.70616 2.076 0.038289 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 16350 on 658 degrees of freedom
#> (95 observations deleted due to missingness)
#> Multiple R-squared: 0.8867, Adjusted R-squared: 0.8623
#> F-statistic: 36.28 on 142 and 658 DF, p-value: < 0.00000000000000022
Model Comparison (Goodness of fit)
Objective: to get the best model for predicting the target variable.
For regression models with more than one predictor (multiple linear regression), goodness of fit is judged with the adjusted R-squared.
Multiple R-squared never decreases as predictors are added, regardless of whether those predictors actually influence the target, so it cannot fairly compare models with different numbers of predictors.
Adjusted R-squared formula:
\[Adjusted\ R^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1}\]
where:
- n: Number of observations
- k: Number of predictor variables
- \(R^2\): Multiple R-squared of the model
Purpose: to determine how well the model explains the variance of the target variable.
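As a quick check of the formula, the sketch below recomputes the adjusted R-squared of the backward/both model by hand. The helper name `adjusted_r2` is hypothetical. The inputs are read from the output in this report: R-squared 0.8867428 (from the comparison table in the Evaluation section) and an F-statistic on 142 and 658 degrees of freedom, which gives k = 142 and n = 658 + 142 + 1 = 801.

```r
# Adjusted R-squared from R-squared, sample size n, and predictor count k.
adjusted_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# k = 142 estimated slopes, n = 801 rows actually used in the fit.
adj <- adjusted_r2(r2 = 0.8867428, n = 801, k = 142)
round(adj, 4)  # about 0.8623, in line with the reported adjusted R-squared
```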
Compare the adjusted R-squared values of the fitted models:
# check the adjusted R-squared value for each model
summary(model_all)$adj.r.squared
#> [1] 0.8619639
summary(model_selection)$adj.r.squared
#> [1] 0.5595685
summary(model_backward)$adj.r.squared
#> [1] 0.8623013
summary(model_forward)$adj.r.squared
#> [1] 0.8619639
summary(model_both)$adj.r.squared
#> [1] 0.8623013
💡 Conclusion: the models that explain the target variable best are model_both and model_backward, which share the highest adjusted R-squared, 0.8623013 (about 86.2%).
5. Evaluation
Purpose: to find out whether the model is good enough by checking which model's predictions produce the smallest error.
Model Performance
Root Mean Squared Error (RMSE): RMSE is the square root of MSE. Because the square root is taken, it is expressed in the same units as the target, so its interpretation is more or less the same as MAE. RMSE is preferred when we are more concerned with very large errors, since squaring penalizes them more heavily. In R, it can be computed with the RMSE() function from the MLmetrics package.
\[RMSE = \sqrt{\frac{1}{n} \sum (\hat y - y)^2}\]
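The formula is short enough to write out directly. The sketch below defines hypothetical helpers `rmse()` and `mae()` in base R (they are not the MLmetrics implementations) and applies them to a small model on the built-in `mtcars` data, purely for illustration.

```r
# Hypothetical helpers written from the formulas, not taken from MLmetrics.
rmse <- function(actual, predicted) sqrt(mean((predicted - actual)^2))
mae  <- function(actual, predicted) mean(abs(predicted - actual))

# Illustration on the built-in mtcars data, not the laptop dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

rmse_fit <- rmse(mtcars$mpg, fitted(fit))
mae_fit  <- mae(mtcars$mpg, fitted(fit))

# RMSE is never smaller than MAE; the gap widens when a few errors are large.
c(RMSE = rmse_fit, MAE = mae_fit)

# RMSE divides the squared error by n, while the residual standard error
# divides by the residual degrees of freedom, so the two differ by a known factor.
sigma(fit) * sqrt(df.residual(fit) / nrow(mtcars))
```

The last line reproduces `rmse_fit`, which is why a fitted model's RMSE is always slightly below its residual standard error.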
To speed things up, I used the compare_performance() function from the performance package. Here is the RMSE for each model:
library(performance)
comparison <- compare_performance(model_all, model_selection, model_backward, model_forward, model_both)
as.data.frame(comparison)
#> Name Model AIC AIC_wt AICc AICc_wt BIC
#> 1 model_all lm 17950.69 0.0521678 18017.33 0.0128193 18639.51
#> 2 model_selection lm 21075.37 0.0000000 21075.40 0.0000000 21089.77
#> 3 model_backward lm 17946.39 0.4478322 18010.05 0.4871807 18621.16
#> 4 model_forward lm 17950.69 0.0521678 18017.33 0.0128193 18639.51
#> 5 model_both lm 17946.39 0.4478322 18010.05 0.4871807 18621.16
#> BIC_wt R2 R2_adjusted RMSE Sigma
#> 1 0.00005159968 0.8869829 0.8619639 14804.94 16372.02
#> 2 0.00000000000 0.5600606 0.5595685 30900.41 30934.96
#> 3 0.49994840032 0.8867428 0.8623013 14820.66 16352.00
#> 4 0.00005159968 0.8869829 0.8619639 14804.94 16372.02
#> 5 0.49994840032 0.8867428 0.8623013 14820.66 16352.00
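💡 Conclusion: the model that gives the smallest error in predicting latest_price is model_all (identical to model_forward), with an RMSE of 14804.94. model_backward and model_both follow closely at 14820.66 while using fewer terms and having a lower AIC.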
Assumption
Linear regression is a statistical model with strict assumptions. The following assumptions must be checked so that the fitted model can be regarded as a Best Linear Unbiased Estimator (BLUE), that is, a model that predicts new data consistently.
Assumptions of linear regression models:
1. Linearity
Linearity denotes a linear or straight line relationship between the target variable and its predictors.
A plot of residuals against fitted values can be used to assess the assumption of linearity of a regression model (multiple linear regression). This is a scatter plot with the fitted values (predicted results of the target variable) on the x-axis and the residual/error values generated by the model on the y-axis.
plot(model_both, # model tested
which = 1) # residual vs fitted
💡 Conclusion: most residuals are scattered randomly around zero, roughly between -5000 and 5000, so we take the linearity assumption to be met.
2. Normality of Residuals
# histogram of residuals
hist(model_both$residuals)
plot(model_both, which = 2)
# shapiro test
shapiro.test(model_both$residuals)
#>
#> Shapiro-Wilk normality test
#>
#> data: model_both$residuals
#> W = 0.82289, p-value < 0.00000000000000022
💡 Conclusion: p-value < 0.00000000000000022, which is below 0.05, so we reject the null hypothesis of normality: the residuals are not normally distributed.
3. Homoscedasticity of Residuals
plot(x = model_both$fitted.values,
y = model_both$residuals)
abline(h = 0, col = "red")
The model's errors are expected to spread randomly with constant variance, showing no pattern when plotted. This condition is called homoscedasticity. In the plot above, however, the residuals form a fan-shaped pattern, which points to heteroscedasticity.
To make sure, we run a statistical test with bptest() from the lmtest package.
Breusch-Pagan hypothesis test:
- H0: the error variance is constant (homoscedasticity)
- H1: the error variance is NOT constant (heteroscedasticity)
Expected condition: H0
Reject H0 if the p-value < 0.05 (alpha).
# bptest of models
library(lmtest)
bptest(model_both)
#>
#> studentized Breusch-Pagan test
#>
#> data: model_both
#> BP = 360.73, df = 142, p-value < 0.00000000000000022
💡 Conclusion: p-value < 0.00000000000000022, which is below 0.05, so we reject H0: the error variance is not constant (heteroscedasticity).
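To make the test less of a black box, the sketch below computes a studentized Breusch-Pagan statistic by hand in base R. It uses simulated data whose error spread grows with the predictor (an assumption made only for illustration, not the laptop data): the squared residuals are regressed on the predictors, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution.

```r
# Simulated data (not the laptop data): the error spread grows with x.
set.seed(42)
n <- 200
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, sd = x)

fit <- lm(y ~ x)

# Auxiliary regression: do the predictors explain the squared residuals?
aux <- lm(resid(fit)^2 ~ x)

# Studentized Breusch-Pagan statistic: n times the auxiliary R-squared,
# compared with a chi-squared distribution with one df per predictor.
bp_stat <- n * summary(aux)$r.squared
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(BP = bp_stat, p_value = bp_pval)
```

A small p-value here has the same reading as in the test above: the predictors help explain the size of the errors, so the error variance is not constant.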
4. No Multicollinearity
💡 Conclusion: all predictor variables meet the no multicollinearity assumption
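A common way to check this assumption is the variance inflation factor (VIF). The sketch below computes it from its definition in base R on the built-in `mtcars` data; the helper `manual_vif` and the chosen predictors are hypothetical and serve only to show the mechanics.

```r
# Illustration on the built-in mtcars data, not the laptop dataset.
# VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing predictor j
# on all the other predictors.
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vif_values <- manual_vif(mtcars, c("wt", "hp", "qsec", "drat"))
vif_values
# A common rule of thumb treats values above 10 as a sign of multicollinearity.
```

A predictor that is an exact linear combination of the others has an auxiliary R-squared of 1 and therefore an infinite VIF. That is the situation the "not defined because of singularities" message in the model summaries above refers to.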
6. Model Improvement
Tuning
We have already seen that the model violates several assumptions, notably normality of residuals and homoscedasticity. We will now try to fix them. One approach is to drop the variables that have a correlation coefficient above 0.7. I also apply `sqrt` to every numeric variable, including the target `latest_price`, so the tuned model below works on the square-root scale.
```r
# transform variable
laptopSales_clean <- laptopSales_clean %>%
select(-c(reviews, old_price)) %>%
mutate_if(~is.numeric(.), sqrt)
head(laptopSales_clean)
#> brand model processor_brand processor_name processor_gnrtn ram_gb
#> 1 Lenovo A6-9225 0 1.000000 2.236068 2.000000
#> 2 Lenovo Ideapad 0 1.414214 2.236068 2.000000
#> 3 Avita PURA 0 1.414214 2.236068 2.000000
#> 4 Avita PURA 0 1.414214 2.236068 2.000000
#> 5 Avita PURA 0 1.414214 2.236068 2.000000
#> 6 Avita PURA 0 1.414214 2.236068 2.828427
#> ram_type ssd hdd os os_bit graphic_card_gb weight warranty
#> 1 DDR4 0.00000 32.00000 0 8 0 1.414214 0
#> 2 DDR4 0.00000 22.62742 0 8 0 0.000000 0
#> 3 DDR4 11.31371 0.00000 0 8 0 1.414214 0
#> 4 DDR4 11.31371 0.00000 0 8 0 1.414214 0
#> 5 DDR4 16.00000 0.00000 0 8 0 1.414214 0
#> 6 DDR4 16.00000 0.00000 0 8 0 1.414214 0
#> Touchscreen msoffice latest_price discount star_rating ratings
#> 1 0 0 158.0823 4.795832 1.923538 7.937254
#> 2 0 0 139.9643 2.828427 1.897367 43.520110
#> 3 0 0 141.3860 5.291503 1.923538 33.955854
#> 4 0 0 146.5947 4.795832 1.923538 33.955854
#> 5 0 0 158.0823 5.000000 1.923538 40.706265
#> 6 0 0 158.0823 5.000000 1.923538 40.706265
```
Remove all outliers using the interquartile range (IQR) method
The IQR method calculates the IQR of a variable and removes the data points that fall outside the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR, where Q1 is the 25th percentile, Q3 is the 75th percentile, and IQR = Q3 - Q1.
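Before applying the rule to every column, it helps to see it on a toy vector. The values below are hypothetical, and the bounds are the same ones used by the functions in this section: Q1 minus 1.5 times the IQR, and Q3 plus 1.5 times the IQR.

```r
# Toy vector (hypothetical values) with one obvious outlier.
x <- c(10, 11, 12, 12, 13, 100)

q1  <- quantile(x, 0.25)   # 11.25
q3  <- quantile(x, 0.75)   # 12.75
iqr <- q3 - q1             # 1.5

lower_bound <- q1 - 1.5 * iqr   # 9
upper_bound <- q3 + 1.5 * iqr   # 15

# Keep only the values inside [lower_bound, upper_bound].
kept <- x[x >= lower_bound & x <= upper_bound]
kept  # 10 11 12 12 13
```

Only the value 100 falls outside the fences, so it is the only point dropped.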
# Calculate IQR and define the threshold
calculate_iqr <- function(x) {
  q1 <- quantile(x, 0.25)
  q3 <- quantile(x, 0.75)
  iqr <- q3 - q1
  threshold <- 1.5 * iqr
  return(list(q1 = q1, q3 = q3, iqr = iqr, threshold = threshold))
}
# Function to remove outliers
remove_outliers <- function(data, column) {
  iqr_values <- calculate_iqr(data[[column]])
  lower_bound <- iqr_values$q1 - iqr_values$threshold
  upper_bound <- iqr_values$q3 + iqr_values$threshold
  data <- data[data[[column]] >= lower_bound & data[[column]] <= upper_bound, ]
  return(data)
}
# Applying the function to each numeric column
numeric_columns <- sapply(laptopSales_clean, is.numeric)
laptopSales_clean <- Reduce(remove_outliers, names(laptopSales_clean)[numeric_columns], init = laptopSales_clean)
Performance
model_allTuning <- lm(formula = latest_price ~ .,
                      data = laptopSales_clean)
summary(model_allTuning)
#>
#> Call:
#> lm(formula = latest_price ~ ., data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -15.419 -4.566 0.000 2.386 29.256
#>
#> Coefficients: (16 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -161.53865 59.91917 -2.696 0.008897 **
#> brandASUS 38.24889 18.47446 2.070 0.042334 *
#> brandDELL -4.85797 15.19335 -0.320 0.750173
#> brandHP 0.75717 17.12698 0.044 0.964871
#> brandInfinix -18.93953 12.13192 -1.561 0.123275
#> brandLenovo 7.49561 14.32174 0.523 0.602468
#> brandLG 42.08832 17.75733 2.370 0.020710 *
#> brandMi -26.21575 17.69882 -1.481 0.143308
#> brandMSI 29.58811 14.66149 2.018 0.047651 *
#> brandNokia -1.01243 17.10427 -0.059 0.952978
#> modelAspire 0.48341 17.07611 0.028 0.977501
#> modelASUS -13.55734 13.30977 -1.019 0.312112
#> modelEnvy 69.19190 13.37664 5.173 0.00000233683067 ***
#> modelF17 -12.64937 15.17152 -0.834 0.407425
#> modelGAMING 17.96033 12.06157 1.489 0.141236
#> modelGF63 -30.74499 16.85820 -1.824 0.072719 .
#> modelGram NA NA NA NA
#> modelIdeapad -13.83130 15.99896 -0.865 0.390438
#> modelIdeaPad -4.74803 16.16594 -0.294 0.769904
#> modelINBook NA NA NA NA
#> modelInspiron 18.33226 5.12381 3.578 0.000655 ***
#> modelIntel NA NA NA NA
#> modelLegion -2.42014 16.63331 -0.145 0.884760
#> modelModern -35.17965 16.16134 -2.177 0.033079 *
#> modelNotebook NA NA NA NA
#> modelPavilion 7.56497 8.60453 0.879 0.382491
#> modelPrestige NA NA NA NA
#> modelPureBook NA NA NA NA
#> modelROG -6.75657 12.83465 -0.526 0.600353
#> modelThinkbook 4.60508 18.89732 0.244 0.808227
#> modelThinkBook 0.59571 17.81181 0.033 0.973421
#> modelThinkPad -0.29711 17.50943 -0.017 0.986513
#> modelTUF -26.77617 13.65186 -1.961 0.054058 .
#> modelVivoBook -45.78300 10.30042 -4.445 0.00003445731408 ***
#> modelVostro NA NA NA NA
#> modelYoga NA NA NA NA
#> modelZenBook NA NA NA NA
#> processor_brand NA NA NA NA
#> processor_name 132.47803 15.34587 8.633 0.00000000000196 ***
#> processor_gnrtn 24.23635 11.67709 2.076 0.041836 *
#> ram_gb NA NA NA NA
#> ram_typeLPDDR3 NA NA NA NA
#> ram_typeLPDDR4X 8.41653 11.53153 0.730 0.468051
#> ssd 1.09634 0.40255 2.723 0.008258 **
#> hdd NA NA NA NA
#> os NA NA NA NA
#> os_bit NA NA NA NA
#> graphic_card_gb 7.09128 2.61252 2.714 0.008465 **
#> weight -3.50159 2.64432 -1.324 0.190005
#> warranty 2.12188 3.92034 0.541 0.590159
#> Touchscreen NA NA NA NA
#> msoffice 7.22968 3.77502 1.915 0.059810 .
#> discount -4.77728 1.44059 -3.316 0.001486 **
#> star_rating -5.58194 14.09798 -0.396 0.693427
#> ratings -0.06325 0.11401 -0.555 0.580939
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 9.467 on 66 degrees of freedom
#> (20 observations deleted due to missingness)
#> Multiple R-squared: 0.9346, Adjusted R-squared: 0.897
#> F-statistic: 24.83 on 38 and 66 DF, p-value: < 0.00000000000000022
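The NA rows in the coefficient table ("not defined because of singularities") appear when a predictor is an exact linear combination of other predictors, so `lm()` cannot estimate a separate coefficient for it. A minimal sketch on made-up data (the data frame `d` and its columns are hypothetical) shows how such terms can be listed:

```r
# Hypothetical data: x2 is an exact multiple of x1, so it carries no new information
d <- data.frame(x1 = c(1, 2, 3, 4, 5, 6, 7, 8))
d$x2 <- 2 * d$x1
d$y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1)

fit <- lm(y ~ x1 + x2, data = d)

# Coefficients that lm() could not estimate are reported as NA
aliased_terms <- names(coef(fit))[is.na(coef(fit))]
print(aliased_terms)
```

The same check, `names(coef(model_allTuning))[is.na(coef(model_allTuning))]`, should list the 16 undefined terms of the laptop model. One possible cause is a `model` level that only occurs within a single `brand`, although that is an assumption that would need to be verified against the data.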
# check correlation
ggcorr(laptopSales_clean, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)
💡 Insight: Strong correlation (> 0.6): old_price
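The correlation plot can be complemented by a numeric check of which predictors pass the 0.6 threshold. The sketch below uses base R's `cor()` on a made-up data frame (`toy` and its columns `price`, `gpu`, and `noise` are hypothetical stand-ins for the numeric laptop columns):

```r
# Hypothetical toy data standing in for the numeric laptop columns
toy <- data.frame(
  price = 1:10,
  gpu   = c(1, 2, 2, 4, 5, 5, 7, 8, 9, 9),
  noise = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
)

# Pearson correlation of every column with the target
cor_with_price <- cor(toy)[, "price"]

# Drop the target itself and keep predictors above the 0.6 threshold
predictors <- cor_with_price[names(cor_with_price) != "price"]
strong <- names(predictors)[abs(predictors) > 0.6]
print(round(predictors, 3))
print(strong)
```

On the laptop data, the same pattern would use `latest_price` as the target column after restricting the data frame to its numeric columns.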
# Models with columns that have a fairly strong correlation
model_selection <- lm(formula = latest_price ~ graphic_card_gb,
data = laptopSales_clean)
summary(model_selection)
#>
#> Call:
#> lm(formula = latest_price ~ graphic_card_gb, data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -45.395 -19.825 1.744 13.784 59.541
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 227.027 2.469 91.951 < 0.0000000000000002 ***
#> graphic_card_gb 19.880 2.681 7.415 0.0000000000172 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 24.02 on 123 degrees of freedom
#> Multiple R-squared: 0.3089, Adjusted R-squared: 0.3033
#> F-statistic: 54.98 on 1 and 123 DF, p-value: 0.0000000000172
Model Stepwise Regression backward
# Perform stepwise model selection using AIC
model_backward <- stepAIC(model_allTuning,
direction = "backward",
trace = FALSE)
# Print the summary of the simplified model
summary(model_backward)
#>
#> Call:
#> lm(formula = latest_price ~ model + processor_name + processor_gnrtn +
#> ssd + graphic_card_gb + weight + msoffice + discount, data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -15.451 -5.365 0.000 2.365 28.982
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -185.3185 42.1272 -4.399 0.00003804387984601 ***
#> modelAspire 1.3644 9.5800 0.142 0.887156
#> modelASUS 24.0496 10.2740 2.341 0.022096 *
#> modelEnvy 71.8139 12.6564 5.674 0.00000029154314923 ***
#> modelF17 24.7945 13.0277 1.903 0.061125 .
#> modelGAMING 16.4437 12.5667 1.309 0.194978
#> modelGF63 -0.8614 10.9117 -0.079 0.937302
#> modelGram 40.8222 10.5906 3.855 0.000254 ***
#> modelIdeapad -5.1018 8.1063 -0.629 0.531157
#> modelIdeaPad 3.7114 7.7551 0.479 0.633733
#> modelINBook -12.0161 8.9743 -1.339 0.184921
#> modelInspiron 15.0390 7.2856 2.064 0.042705 *
#> modelIntel 9.1623 11.7078 0.783 0.436513
#> modelLegion 6.1580 10.8985 0.565 0.573858
#> modelModern -6.6307 9.3473 -0.709 0.480453
#> modelNotebook -25.2192 10.2190 -2.468 0.016039 *
#> modelPavilion 9.2791 8.0502 1.153 0.252975
#> modelPrestige 38.0203 11.9681 3.177 0.002216 **
#> modelPureBook -2.8073 9.0932 -0.309 0.758444
#> modelROG 29.7220 10.4791 2.836 0.005963 **
#> modelThinkbook 12.5684 12.0683 1.041 0.301257
#> modelThinkBook 8.5806 10.3670 0.828 0.410662
#> modelThinkPad 7.8123 8.3575 0.935 0.353122
#> modelTUF 13.1599 10.6349 1.237 0.220065
#> modelVivoBook -7.1481 7.6145 -0.939 0.351092
#> modelVostro -3.2908 7.9912 -0.412 0.681739
#> modelYoga 17.8446 12.3907 1.440 0.154279
#> modelZenBook 38.6399 11.9727 3.227 0.001902 **
#> processor_name 135.7023 13.6819 9.918 0.00000000000000561 ***
#> processor_gnrtn 24.5405 11.3559 2.161 0.034115 *
#> ssd 1.2354 0.3551 3.479 0.000870 ***
#> graphic_card_gb 6.1557 2.3930 2.572 0.012223 *
#> weight -4.0548 2.3502 -1.725 0.088877 .
#> msoffice 7.4183 2.9136 2.546 0.013097 *
#> discount -4.5371 1.2834 -3.535 0.000727 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 9.273 on 70 degrees of freedom
#> (20 observations deleted due to missingness)
#> Multiple R-squared: 0.9335, Adjusted R-squared: 0.9012
#> F-statistic: 28.89 on 34 and 70 DF, p-value: < 0.00000000000000022
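Model Stepwise Regression forward
# Perform stepwise model selection using AIC
# Note: no `scope` is supplied, so the starting model (already the full model) is also
# the upper limit; forward selection has no terms to add and returns it unchanged
model_forward <- stepAIC(model_allTuning,
direction = "forward",
trace = FALSE)
# Print the summary of the resulting model
summary(model_forward)
#>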
#> Call:
#> lm(formula = latest_price ~ brand + model + processor_brand +
#> processor_name + processor_gnrtn + ram_gb + ram_type + ssd +
#> hdd + os + os_bit + graphic_card_gb + weight + warranty +
#> Touchscreen + msoffice + discount + star_rating + ratings,
#> data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -15.419 -4.566 0.000 2.386 29.256
#>
#> Coefficients: (16 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -161.53865 59.91917 -2.696 0.008897 **
#> brandASUS 38.24889 18.47446 2.070 0.042334 *
#> brandDELL -4.85797 15.19335 -0.320 0.750173
#> brandHP 0.75717 17.12698 0.044 0.964871
#> brandInfinix -18.93953 12.13192 -1.561 0.123275
#> brandLenovo 7.49561 14.32174 0.523 0.602468
#> brandLG 42.08832 17.75733 2.370 0.020710 *
#> brandMi -26.21575 17.69882 -1.481 0.143308
#> brandMSI 29.58811 14.66149 2.018 0.047651 *
#> brandNokia -1.01243 17.10427 -0.059 0.952978
#> modelAspire 0.48341 17.07611 0.028 0.977501
#> modelASUS -13.55734 13.30977 -1.019 0.312112
#> modelEnvy 69.19190 13.37664 5.173 0.00000233683067 ***
#> modelF17 -12.64937 15.17152 -0.834 0.407425
#> modelGAMING 17.96033 12.06157 1.489 0.141236
#> modelGF63 -30.74499 16.85820 -1.824 0.072719 .
#> modelGram NA NA NA NA
#> modelIdeapad -13.83130 15.99896 -0.865 0.390438
#> modelIdeaPad -4.74803 16.16594 -0.294 0.769904
#> modelINBook NA NA NA NA
#> modelInspiron 18.33226 5.12381 3.578 0.000655 ***
#> modelIntel NA NA NA NA
#> modelLegion -2.42014 16.63331 -0.145 0.884760
#> modelModern -35.17965 16.16134 -2.177 0.033079 *
#> modelNotebook NA NA NA NA
#> modelPavilion 7.56497 8.60453 0.879 0.382491
#> modelPrestige NA NA NA NA
#> modelPureBook NA NA NA NA
#> modelROG -6.75657 12.83465 -0.526 0.600353
#> modelThinkbook 4.60508 18.89732 0.244 0.808227
#> modelThinkBook 0.59571 17.81181 0.033 0.973421
#> modelThinkPad -0.29711 17.50943 -0.017 0.986513
#> modelTUF -26.77617 13.65186 -1.961 0.054058 .
#> modelVivoBook -45.78300 10.30042 -4.445 0.00003445731408 ***
#> modelVostro NA NA NA NA
#> modelYoga NA NA NA NA
#> modelZenBook NA NA NA NA
#> processor_brand NA NA NA NA
#> processor_name 132.47803 15.34587 8.633 0.00000000000196 ***
#> processor_gnrtn 24.23635 11.67709 2.076 0.041836 *
#> ram_gb NA NA NA NA
#> ram_typeLPDDR3 NA NA NA NA
#> ram_typeLPDDR4X 8.41653 11.53153 0.730 0.468051
#> ssd 1.09634 0.40255 2.723 0.008258 **
#> hdd NA NA NA NA
#> os NA NA NA NA
#> os_bit NA NA NA NA
#> graphic_card_gb 7.09128 2.61252 2.714 0.008465 **
#> weight -3.50159 2.64432 -1.324 0.190005
#> warranty 2.12188 3.92034 0.541 0.590159
#> Touchscreen NA NA NA NA
#> msoffice 7.22968 3.77502 1.915 0.059810 .
#> discount -4.77728 1.44059 -3.316 0.001486 **
#> star_rating -5.58194 14.09798 -0.396 0.693427
#> ratings -0.06325 0.11401 -0.555 0.580939
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 9.467 on 66 degrees of freedom
#> (20 observations deleted due to missingness)
#> Multiple R-squared: 0.9346, Adjusted R-squared: 0.897
#> F-statistic: 24.83 on 38 and 66 DF, p-value: < 0.00000000000000022
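The forward result above has the same coefficients and fit statistics as the full model, which is what forward selection returns when it starts from a model that already contains every candidate term. A minimal sketch of forward selection that starts from an intercept-only model is shown below. It is an illustration under assumptions, not the report's method: it uses the built-in `mtcars` data and base R's `step()`, which accepts the same `scope` and `direction` arguments as `stepAIC()`.

```r
# Built-in mtcars data; mpg plays the role of the target
null_model <- lm(mpg ~ 1, data = mtcars)                      # intercept only
full_model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # candidate terms

# Forward selection: start small and let AIC decide which terms to add
forward_fit <- step(null_model,
                    scope = list(lower = formula(null_model),
                                 upper = formula(full_model)),
                    direction = "forward",
                    trace = 0)

# Without a scope, forward selection from the full model has nothing to add
forward_from_full <- step(full_model, direction = "forward", trace = 0)

print(formula(forward_fit))
print(formula(forward_from_full))
```

Applied to the laptop data, the starting point would be `lm(latest_price ~ 1, data = laptopSales_clean)` with the full model formula as the upper scope.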
Model Stepwise Regression both
# Perform stepwise model selection using AIC
model_both <- stepAIC(model_allTuning,
direction = "both",
trace = FALSE)
# Print the summary of the simplified model
summary(model_both)
#>
#> Call:
#> lm(formula = latest_price ~ model + processor_name + processor_gnrtn +
#> ssd + graphic_card_gb + weight + msoffice + discount, data = laptopSales_clean)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -15.451 -5.365 0.000 2.365 28.982
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -185.3185 42.1272 -4.399 0.00003804387984601 ***
#> modelAspire 1.3644 9.5800 0.142 0.887156
#> modelASUS 24.0496 10.2740 2.341 0.022096 *
#> modelEnvy 71.8139 12.6564 5.674 0.00000029154314923 ***
#> modelF17 24.7945 13.0277 1.903 0.061125 .
#> modelGAMING 16.4437 12.5667 1.309 0.194978
#> modelGF63 -0.8614 10.9117 -0.079 0.937302
#> modelGram 40.8222 10.5906 3.855 0.000254 ***
#> modelIdeapad -5.1018 8.1063 -0.629 0.531157
#> modelIdeaPad 3.7114 7.7551 0.479 0.633733
#> modelINBook -12.0161 8.9743 -1.339 0.184921
#> modelInspiron 15.0390 7.2856 2.064 0.042705 *
#> modelIntel 9.1623 11.7078 0.783 0.436513
#> modelLegion 6.1580 10.8985 0.565 0.573858
#> modelModern -6.6307 9.3473 -0.709 0.480453
#> modelNotebook -25.2192 10.2190 -2.468 0.016039 *
#> modelPavilion 9.2791 8.0502 1.153 0.252975
#> modelPrestige 38.0203 11.9681 3.177 0.002216 **
#> modelPureBook -2.8073 9.0932 -0.309 0.758444
#> modelROG 29.7220 10.4791 2.836 0.005963 **
#> modelThinkbook 12.5684 12.0683 1.041 0.301257
#> modelThinkBook 8.5806 10.3670 0.828 0.410662
#> modelThinkPad 7.8123 8.3575 0.935 0.353122
#> modelTUF 13.1599 10.6349 1.237 0.220065
#> modelVivoBook -7.1481 7.6145 -0.939 0.351092
#> modelVostro -3.2908 7.9912 -0.412 0.681739
#> modelYoga 17.8446 12.3907 1.440 0.154279
#> modelZenBook 38.6399 11.9727 3.227 0.001902 **
#> processor_name 135.7023 13.6819 9.918 0.00000000000000561 ***
#> processor_gnrtn 24.5405 11.3559 2.161 0.034115 *
#> ssd 1.2354 0.3551 3.479 0.000870 ***
#> graphic_card_gb 6.1557 2.3930 2.572 0.012223 *
#> weight -4.0548 2.3502 -1.725 0.088877 .
#> msoffice 7.4183 2.9136 2.546 0.013097 *
#> discount -4.5371 1.2834 -3.535 0.000727 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 9.273 on 70 degrees of freedom
#> (20 observations deleted due to missingness)
#> Multiple R-squared: 0.9335, Adjusted R-squared: 0.9012
#> F-statistic: 28.89 on 34 and 70 DF, p-value: < 0.00000000000000022
Model Comparison (Goodness of fit)
Compare the adjusted R-squared values of the models that have been built
# check the adjusted R-squared value for each model
summary(model_all)$adj.r.squared
#> [1] 0.8619639
summary(model_selection)$adj.r.squared
#> [1] 0.3032816
summary(model_backward)$adj.r.squared
#> [1] 0.9011722
summary(model_forward)$adj.r.squared
#> [1] 0.8969916
summary(model_both)$adj.r.squared
#> [1] 0.9011722
💡 Conclusion: the models that explain the target variable best are model_both and model_backward, which select the same predictors and share an adjusted R-squared of 0.9011722, or about 90.1%
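The adjusted R-squared penalizes R-squared for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below recomputes it from values reported in this document. The function name `adjusted_r2` is introduced here for illustration, and n = 105 is derived as residual degrees of freedom plus estimated coefficients.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values read from the model outputs in this report:
# n = residual df + estimated coefficients (70 + 35 and 66 + 39, both 105)
adj_both    <- adjusted_r2(r2 = 0.9334813, n = 105, p = 34)
adj_forward <- adjusted_r2(r2 = 0.9346293, n = 105, p = 38)

print(round(c(model_both = adj_both, model_forward = adj_forward), 4))
```

This shows why model_forward has the slightly higher R-squared but the lower adjusted R-squared: its four extra coefficients cost more than the fit they add.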
Evaluation
Model Performance
comparison <- compare_performance(model_all, model_selection, model_backward, model_forward, model_both)
as.data.frame(comparison)
#>              Name Model        AIC       AIC_wt       AICc      AICc_wt        BIC       BIC_wt
#> 1       model_all    lm 17950.6912 0.000000e+00 18017.3252 0.000000e+00 18639.5127 0.000000e+00
#> 2 model_selection    lm  1153.4333 7.523405e-79  1153.6316 2.239706e-70  1161.9182 6.173026e-60
#> 3  model_backward    lm   795.0923 4.888343e-01   834.2688 4.999727e-01   890.6349 4.999434e-01
#> 4   model_forward    lm   801.2644 2.233134e-02   852.5144 5.457148e-05   907.4228 1.131028e-04
#> 5      model_both    lm   795.0923 4.888343e-01   834.2688 4.999727e-01   890.6349 4.999434e-01
#> R2 R2_adjusted RMSE Sigma
#> 1 0.8869829 0.8619639 14804.943021 16372.023085
#> 2 0.3089003 0.3032816 23.826408 24.019337
#> 3 0.9334813 0.9011722 7.571250 9.272850
#> 4 0.9346293 0.8969916 7.505633 9.466948
#> 5 0.9334813 0.9011722 7.571250 9.272850
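💡 Conclusion: the model with the smallest error in predicting the laptop price is model_forward, with an RMSE of 7.505633. The gap to model_backward and model_both (RMSE 7.571250) is small, and those two models have the lower AIC, AICc, and BIC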
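The RMSE and Sigma columns measure the same residuals with different denominators: RMSE divides the sum of squared residuals by n, while Sigma (the residual standard error) divides by the residual degrees of freedom. A minimal sketch of that relationship, using the built-in `mtcars` data as a stand-in for the laptop data:

```r
# Built-in mtcars data as a stand-in for the laptop data
fit <- lm(mpg ~ wt + hp, data = mtcars)

res <- residuals(fit)
n <- length(res)
df_res <- df.residual(fit)

rmse <- sqrt(mean(res^2))               # divides by n
sigma_hat <- sqrt(sum(res^2) / df_res)  # divides by residual degrees of freedom

# The two are linked by RMSE = Sigma * sqrt(df / n)
print(c(rmse = rmse, sigma = sigma_hat,
        rmse_from_sigma = sigma_hat * sqrt(df_res / n)))

# Same relation with the reported model_both values (Sigma 9.272850, df 70, n 105)
rmse_both <- 9.272850 * sqrt(70 / 105)
print(rmse_both)
```

Because the in-sample RMSE ignores the number of coefficients, a larger model can never have a higher RMSE than a model nested inside it, which is one reason to read it alongside AIC, BIC, or the adjusted R-squared.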
Linear Regression Assumptions
1. Linearity
plot(model_both, # model to be tested
which = 1) # residuals vs fitted
💡 Conclusion: The residuals are scattered randomly around zero, mostly between -10 and 10, without a visible pattern, so the model meets the linearity assumption
2. Normality of Residuals
# histogram residual
hist(model_both$residuals)
plot(model_both, which = 2)
# shapiro test
shapiro.test(model_both$residuals)
#>
#> Shapiro-Wilk normality test
#>
#> data: model_both$residuals
#> W = 0.93301, p-value = 0.00004905
- H0: errors are normally distributed
- H1: errors are NOT normally distributed
H0 is rejected if the p-value < 0.05 (alpha)
Expected condition: H0
💡 Conclusion: p-value = 0.00004905 < 0.05, so H0 is rejected, meaning that the residuals are not normally distributed
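The decision rule above can be tried on data with a known shape. The sketch below is an illustration on made-up "residuals" (evenly spaced quantiles of a normal and of an exponential distribution), not on the laptop model:

```r
# Deterministic toy "residuals": evenly spaced quantiles of two distributions
normal_like <- qnorm(ppoints(50))  # follows a normal shape closely
skewed      <- qexp(ppoints(50))   # strongly right-skewed

p_normal <- shapiro.test(normal_like)$p.value
p_skewed <- shapiro.test(skewed)$p.value

alpha <- 0.05
print(c(normal_like = p_normal, skewed = p_skewed))

# TRUE means H0 (normality) is rejected at the chosen alpha
print(c(normal_like = p_normal < alpha, skewed = p_skewed < alpha))
```

A right-skewed residual distribution, like the second vector, is consistent with the residual quartiles of model_both, where the maximum (28.982) lies much further from zero than the minimum (-15.451).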
3. Homoscedasticity of Residuals
plot(x = model_both$fitted.values,
y = model_both$residuals)
abline(h = 0, col = "red")
Breusch-Pagan hypothesis test:
- H0: the error variance is constant (homoscedasticity)
- H1: the error variance is NOT constant (heteroscedasticity)
Expected condition: H0
Reject H0 if the p-value < 0.05 (alpha)
# bptest of models
library(lmtest)
bptest(model_both)
#>
#> studentized Breusch-Pagan test
#>
#> data: model_both
#> BP = 41.316, df = 34, p-value = 0.1814
💡 Conclusion: p-value = 0.1814 > 0.05, so H0 is not rejected, meaning that the error variance is constant (homoscedasticity)
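To make the test less of a black box, the sketch below computes a studentized Breusch-Pagan statistic by hand in base R: regress the squared residuals on the predictors and use n times the R-squared of that auxiliary regression, compared against a chi-squared distribution. This should correspond to the default studentized version reported by `bptest()`, but it is shown here on made-up data with deliberately growing error spread, not on the laptop model.

```r
# Deterministic toy data whose error size grows with x (heteroscedastic by construction)
x <- 1:100
e <- rep(c(-1, 1), 50) * x / 10
y <- 2 + 0.5 * x + e
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the same predictor
aux <- lm(residuals(fit)^2 ~ x)

# Studentized (Koenker) Breusch-Pagan statistic: n * R-squared of the auxiliary fit
bp_stat <- length(x) * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(BP = bp_stat, df = bp_df, p_value = bp_p))
```

For this toy data the p-value falls far below 0.05, the opposite of the result for model_both, where H0 is not rejected.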
Conclusion
Our final model (model_both) satisfies the linearity and homoscedasticity assumptions, but not the normality of residuals. Its adjusted R-squared is high: the predictors explain about 90.1% of the variance in the laptop price.
During our model tuning and improvement process, we initially applied the interquartile range (IQR) method to handle outliers in the data. By removing the data points that fell outside the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR, we aimed to reduce the impact of extreme values on our model’s performance.
However, despite this preprocessing step, the normality of residuals remains an issue: the Shapiro-Wilk test rejects normality (p-value = 0.00004905), so the errors are not normally distributed.
The homoscedasticity check gave a better result. The Breusch-Pagan test does not reject constant error variance (p-value = 0.1814), so there is no evidence of heteroscedasticity in the final model.
We also found encouraging results in terms of linearity. The residuals were scattered randomly around zero, mostly between -10 and 10, which indicates that our model fulfils the linearity assumption and adequately captures the relationship between the predictor variables and the response variable.
Given the non-normal residuals, we recognize the need for further model refinement. Future steps could include exploring alternative transformations of the target, such as logarithmic or power transformations, or robust regression techniques that are less sensitive to extreme residuals. If heteroscedasticity appears in a refined model, weighted least squares regression or modeling frameworks specifically designed to handle non-constant variance could also be considered.
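Two of the options mentioned above, a logarithmic transformation of the target and weighted least squares, can be sketched in base R as follows. The data frame `toy` is hypothetical, and the weights `1 / y^2` assume that the error standard deviation is proportional to the target, which would need to be checked before using them on the laptop data.

```r
# Hypothetical data: y grows exponentially in x with a small multiplicative error
x <- 1:30
y <- exp(0.5 + 0.1 * x) * (1 + 0.05 * rep(c(-1, 1), 15))
toy <- data.frame(x = x, y = y)

# Option 1: model the logarithm of the target instead of the raw target
log_fit <- lm(log(y) ~ x, data = toy)

# Option 2: weighted least squares, down-weighting observations with larger variance
# (weights of 1 / y^2 assume the error standard deviation is proportional to y)
wls_fit <- lm(y ~ x, data = toy, weights = 1 / y^2)

print(coef(log_fit))
print(coef(wls_fit))
```

After either change, the Shapiro-Wilk and Breusch-Pagan checks would need to be rerun on the refitted model, since neither option is guaranteed to fix the residual distribution.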
It’s important to note that while linearity and homoscedasticity are crucial assumptions for regression analysis, normality of the residuals also matters, because the t-tests and p-values in the model summaries rely on it. By recognizing and addressing these challenges, we can continue working towards improving the overall performance and accuracy of our model.