Using a Regression Model on Laptop Sales Prices

Introduction

Business Goal

In this scenario, we want to understand which factors influence the selling price of laptops worldwide. Specifically, we want to know:

  • which variables are important in predicting laptop prices
  • how well those variables explain a laptop's price

Based on this business question, we gathered data from flipkart.com on the selling prices of laptops from major brands around the world.

Using the supplied independent variables, we will model the price of a laptop. The results can inform our next steps and business plans, and the model should help management understand market pricing.

DataSet Overview

The laptop dataset contains 23 features (attributes) and 896 records. Let's explore the features of the dataset.

  • brand: Manufacturer of Laptop.
  • model: Model name of the Laptop.
  • processor_brand: Manufacturer of the Processor.
  • processor_name: Name of the processor. As an integral part of the laptop, the processor determines how powerful the computer is.
  • processor_gnrtn: Processor generation. A generation is a group of processors launched within a particular period that brought significant improvements over earlier processors.
  • ram_gb: Size of the RAM in GB (gigabytes).
  • ram_type: Type of RAM used. This feature contains values such as DDR4, LPDDR4X, LPDDR4, LPDDR3, DDR3, DDR5.
  • ssd: Storage size of the SSD.
  • hdd: Storage size of the hard disk.
  • os: Operating system, such as Windows, DOS, Mac.
  • os_bit: OS bit width, with values 64-bit and 32-bit.
  • graphic_card_gb: Size of the graphics card memory in the laptop.
  • weight: Type of laptop based on weight. This feature contains values such as Casual, ThinNlight, Gaming.
  • display_size: Display size in inches.
  • warranty: Warranty in number of years.
  • Touchscreen: Whether the laptop has a touchscreen.
  • msoffice: Whether the laptop has MS Office installed.
  • old_price: Price of the laptop when it was released.
  • discount: Discount percentage available for the laptop.
  • star_rating: User rating for the laptop, with a maximum rating of 5.0.
  • ratings: Total number of ratings received for the laptop model.
  • reviews: Number of reviews received for the laptop model.

This dataset is available on Kaggle.

1. Data Preparation

Load the required packages

library(lubridate)    # Provides functions for working with dates and times in R
library(dplyr)       # Package for data manipulation
library(MASS)        # Collection of functions and datasets for applied statistics
library(tidyverse)   # Collection of packages for data manipulation and visualization
library(caret)       # Comprehensive toolkit for machine learning
library(plotly)      # Interactive plotting library
library(data.table)  # Enhanced data frame for efficient data manipulation
library(GGally)      # Extension to ggplot2 for exploratory data analysis
library(tidymodels)  # Framework for modeling and machine learning
library(car)         # Tools for applied regression analysis
library(scales)      # Functions for scaling and formatting plot axes
library(lmtest)      # Diagnostic tests and specification tests for linear regression models

Load DataSet

# read the laptop data
laptopSales <- read.csv("data_input/Laptop_data.csv")
rmarkdown::paged_table(laptopSales)

2. Data Cleansing

Checking Data Structure

This step checks whether the data type of each column/variable is suitable.

glimpse(laptopSales)
#> Rows: 896
#> Columns: 23
#> $ brand           <chr> "Lenovo", "Lenovo", "Avita", "Avita", "Avita", "Avita"…
#> $ model           <chr> "A6-9225", "Ideapad", "PURA", "PURA", "PURA", "PURA", …
#> $ processor_brand <chr> "AMD", "AMD", "AMD", "AMD", "AMD", "AMD", "AMD", "AMD"…
#> $ processor_name  <chr> "A6-9225 Processor", "APU Dual", "APU Dual", "APU Dual…
#> $ processor_gnrtn <chr> "10th", "10th", "10th", "10th", "10th", "10th", "10th"…
#> $ ram_gb          <chr> "4 GB GB", "4 GB GB", "4 GB GB", "4 GB GB", "4 GB GB",…
#> $ ram_type        <chr> "DDR4", "DDR4", "DDR4", "DDR4", "DDR4", "DDR4", "DDR4"…
#> $ ssd             <chr> "0 GB", "0 GB", "128 GB", "128 GB", "256 GB", "256 GB"…
#> $ hdd             <chr> "1024 GB", "512 GB", "0 GB", "0 GB", "0 GB", "0 GB", "…
#> $ os              <chr> "Windows", "Windows", "Windows", "Windows", "Windows",…
#> $ os_bit          <chr> "64-bit", "64-bit", "64-bit", "64-bit", "64-bit", "64-…
#> $ graphic_card_gb <int> 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 2, …
#> $ weight          <chr> "ThinNlight", "Casual", "ThinNlight", "ThinNlight", "T…
#> $ display_size    <dbl> NA, NA, NA, NA, NA, 14.0, 14.0, NA, 14.0, NA, 15.6, NA…
#> $ warranty        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ Touchscreen     <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ msoffice        <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ latest_price    <int> 24990, 19590, 19990, 21490, 24990, 24990, 20900, 21896…
#> $ old_price       <int> 32790, 21325, 27990, 27990, 33490, 33490, 22825, 0, 27…
#> $ discount        <int> 23, 8, 28, 23, 25, 25, 8, 0, 2, 13, 17, 8, 9, 0, 6, 17…
#> $ star_rating     <dbl> 3.7, 3.6, 3.7, 3.7, 3.7, 3.7, 3.9, 3.9, 0.0, 4.2, 2.3,…
#> $ ratings         <int> 63, 1894, 1153, 1153, 1657, 1657, 1185, 219, 0, 76, 3,…
#> $ reviews         <int> 12, 256, 159, 159, 234, 234, 141, 18, 0, 13, 0, 5, 1, …

💡 From the results of the inspection above, what information do we get?

  • The data has 896 rows and 23 variables/columns.
  • Variables already stored as numeric (integer or double): graphic_card_gb, display_size, warranty, latest_price, old_price, discount, star_rating, ratings, reviews
  • Variables stored as character, most of which are categorical: brand, model, processor_brand, processor_name, processor_gnrtn, ram_gb, ram_type, ssd, hdd, os, os_bit, weight, Touchscreen, msoffice

Data Type Adjustment

Some features stored as text actually contain numeric values: ram_gb, ssd, hdd, and os_bit.

So instead of encoding them, we can convert these columns directly to numeric and use the values they contain for modelling.

laptopSales_clean <- laptopSales %>%
  # Strip the unit suffixes and convert the remaining digits to numbers
  mutate(ram_gb = as.numeric(gsub(" GB", "", ram_gb)),
         ssd = as.numeric(gsub(" GB", "", ssd)),
         hdd = as.numeric(gsub(" GB", "", hdd)),
         os_bit = as.numeric(gsub("-bit", "", os_bit))) %>%
  # Convert these columns to factors
  mutate(across(c(brand, model, processor_name, display_size), as.factor))
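
As a quick check of the string clean-up above, the sketch below applies the same gsub() patterns to a few sample values. The vectors raw_ram and raw_bit are invented for illustration and only modeled on the values shown in the glimpse() output. Because gsub() replaces every match, the doubled suffix in "4 GB GB" is removed as well.

```r
# Hypothetical sample values, modeled on the ram_gb and os_bit columns
raw_ram <- c("4 GB GB", "8 GB GB", "16 GB GB")
raw_bit <- c("64-bit", "32-bit")

# gsub() removes every occurrence of the pattern, so "4 GB GB" becomes "4"
ram_num <- as.numeric(gsub(" GB", "", raw_ram))
bit_num <- as.numeric(gsub("-bit", "", raw_bit))

print(ram_num)
print(bit_num)

# A value that does not parse becomes NA (with a warning),
# so checking for new NAs is a cheap sanity test
stopifnot(!anyNA(ram_num), !anyNA(bit_num))
```

Running the same NA check on the real columns after conversion would reveal any value that does not follow the expected "<number> GB" pattern.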

Missing Value Check & Handling

This step checks whether the data contains missing values. We can use is.na() for this.

Checking by count

colSums(is.na(laptopSales_clean))
#>           brand           model processor_brand  processor_name processor_gnrtn 
#>               0              95               0               0             239 
#>          ram_gb        ram_type             ssd             hdd              os 
#>               0               0               0               0               0 
#>          os_bit graphic_card_gb          weight    display_size        warranty 
#>               0               0               0             332               0 
#>     Touchscreen        msoffice    latest_price       old_price        discount 
#>               0               0               0               0               0 
#>     star_rating         ratings         reviews 
#>               0               0               0

Checking by percentage

# Calculate the percentage of missing values in each variable
missing_percent <- colMeans(is.na(laptopSales_clean)) * 100

# Create a data frame for visualization
missing_data <- data.frame(variable = names(missing_percent),
                           missing_percent = missing_percent)

# Filter variables with missing values
missing_data <- missing_data[missing_data$missing_percent > 0, ]

# Create the interactive bar plot with tooltips
p <- plot_ly(data = missing_data, x = ~variable, y = ~missing_percent, type = "bar",
             text = ~paste0(variable, ": ", round(missing_percent, 2), "%"),
             hoverinfo = "text", marker = list(color = "steelblue"))

# Set plot labels and title
p <- layout(p, xaxis = list(title = "Variable"), yaxis = list(title = "Percentage of Missing Values"),
            title = "Percentage of Missing Values in Variables")
p

A variable with a high percentage of missing values, such as display_size at 37%, raises concerns about the overall quality and reliability of the data. A significant portion of the variable's values is unknown or unrecorded, which could introduce bias or inaccuracies into the analysis. I therefore decided to drop this column from the data frame.

# MASS is loaded after dplyr and also exports select(), so call dplyr's version explicitly
laptopSales_clean <- laptopSales_clean %>%
  dplyr::select(-display_size)
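
The same decision can be expressed as a rule rather than a hand-picked column. The sketch below is an illustration on a made-up data frame (toy), and the 30% cut-off is an assumption chosen for the example, not a threshold used in this analysis.

```r
# Toy data frame (hypothetical) with different amounts of missingness
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),            # no missing values
  b = c(NA, NA, 3, 4, 5),          # 40% missing
  c = c("x", NA, "z", "x", "y")    # 20% missing
)

# Share of missing values per column, in percent
miss_pct <- colMeans(is.na(toy)) * 100

# Assumed cut-off: drop columns with more than 30% missing values
threshold <- 30
keep <- names(miss_pct)[miss_pct <= threshold]
toy_reduced <- toy[, keep, drop = FALSE]

print(miss_pct)
print(names(toy_reduced))
```

A rule like this is easy to audit, but the cut-off itself remains a judgment call that depends on how important the column is for the model.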

How to handle the remaining missing values in the regression model depends on the nature of the missing data and the requirements of the analysis.

We cannot simply delete every row that has a missing value, because the deleted rows may carry useful information. Removing them:

  • may erase a significant portion of the data;
  • can bias the dataset if a significant portion of a particular kind of observation is removed;
  • leaves the production model unable to handle inputs with missing values.

Therefore, we need additional methods to deal with the missing information.

Data Imputation Techniques

Filling missing data with an arbitrary value is a useful imputation approach because it can handle both categorical and numerical variables. With this method, we group the missing values in a column and assign them a new value that lies well outside the column's range. Sometimes missingness is informative in itself, so imputing the most frequent class would be inappropriate. In that situation, a value like Unknown can be used instead.

# Create holdingDataFrame with only the 'brand' and 'model' variables
holdingDataFrame <- data.frame(brand = laptopSales_clean$brand, model = laptopSales_clean$model)
head(holdingDataFrame)
#>    brand   model
#> 1 Lenovo A6-9225
#> 2 Lenovo Ideapad
#> 3  Avita    PURA
#> 4  Avita    PURA
#> 5  Avita    PURA
#> 6  Avita    PURA
# Replace missing values in character columns with "Unknown"
laptopSales_clean <- laptopSales_clean %>%
  mutate(across(where(is.character), ~ifelse(is.na(.), "Unknown", .)))

# Replace missing values in factor columns with "Unknown"
# Note: ifelse() drops the factor class, so factor columns without missing
# values (e.g. processor_name) come back as integer level codes
laptopSales_clean <- laptopSales_clean %>%
  mutate(across(where(is.factor), ~ifelse(is.na(.), "Unknown", .)))

# Replace missing values in numeric columns with an arbitrary value
arbitrary_value <- -999  # Choose any value outside the range of the numeric variables
laptopSales_clean <- laptopSales_clean %>%
  mutate(across(where(is.numeric), ~ifelse(is.na(.), arbitrary_value, .)))
# Drop the 'brand' and 'model' variables from laptopSales_clean
laptopSales_clean$brand <- NULL
laptopSales_clean$model <- NULL

# Add the original 'brand' and 'model' factors back to laptopSales_clean
# (holdingDataFrame was saved before imputation, so any NAs in them are preserved)
laptopSales_clean <- cbind(holdingDataFrame, laptopSales_clean)

# Print the updated data frame
head(laptopSales_clean)
#>    brand   model processor_brand processor_name processor_gnrtn ram_gb ram_type
#> 1 Lenovo A6-9225             AMD              1            10th      4     DDR4
#> 2 Lenovo Ideapad             AMD              2            10th      4     DDR4
#> 3  Avita    PURA             AMD              2            10th      4     DDR4
#> 4  Avita    PURA             AMD              2            10th      4     DDR4
#> 5  Avita    PURA             AMD              2            10th      4     DDR4
#> 6  Avita    PURA             AMD              2            10th      8     DDR4
#>   ssd  hdd      os os_bit graphic_card_gb     weight warranty Touchscreen
#> 1   0 1024 Windows     64               0 ThinNlight        0          No
#> 2   0  512 Windows     64               0     Casual        0          No
#> 3 128    0 Windows     64               0 ThinNlight        0          No
#> 4 128    0 Windows     64               0 ThinNlight        0          No
#> 5 256    0 Windows     64               0 ThinNlight        0          No
#> 6 256    0 Windows     64               0 ThinNlight        0          No
#>   msoffice latest_price old_price discount star_rating ratings reviews
#> 1       No        24990     32790       23         3.7      63      12
#> 2       No        19590     21325        8         3.6    1894     256
#> 3       No        19990     27990       28         3.7    1153     159
#> 4       No        21490     27990       23         3.7    1153     159
#> 5       No        24990     33490       25         3.7    1657     234
#> 6       No        24990     33490       25         3.7    1657     234
glimpse(laptopSales_clean)
#> Rows: 896
#> Columns: 22
#> $ brand           <fct> Lenovo, Lenovo, Avita, Avita, Avita, Avita, HP, Lenovo…
#> $ model           <fct> A6-9225, Ideapad, PURA, PURA, PURA, PURA, APU, APU, At…
#> $ processor_brand <chr> "AMD", "AMD", "AMD", "AMD", "AMD", "AMD", "AMD", "AMD"…
#> $ processor_name  <int> 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 6, 6, 6, 7, 4, 4, 4, 7, …
#> $ processor_gnrtn <chr> "10th", "10th", "10th", "10th", "10th", "10th", "10th"…
#> $ ram_gb          <dbl> 4, 4, 4, 4, 4, 8, 4, 4, 32, 4, 4, 4, 4, 8, 4, 4, 4, 8,…
#> $ ram_type        <chr> "DDR4", "DDR4", "DDR4", "DDR4", "DDR4", "DDR4", "DDR4"…
#> $ ssd             <dbl> 0, 0, 128, 128, 256, 256, 0, 0, 32, 256, 0, 0, 0, 512,…
#> $ hdd             <dbl> 1024, 512, 0, 0, 0, 0, 1024, 1024, 0, 0, 1024, 1024, 1…
#> $ os              <chr> "Windows", "Windows", "Windows", "Windows", "Windows",…
#> $ os_bit          <dbl> 64, 64, 64, 64, 64, 64, 32, 64, 32, 64, 64, 64, 64, 32…
#> $ graphic_card_gb <int> 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 2, …
#> $ weight          <chr> "ThinNlight", "Casual", "ThinNlight", "ThinNlight", "T…
#> $ warranty        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ Touchscreen     <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ msoffice        <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
#> $ latest_price    <int> 24990, 19590, 19990, 21490, 24990, 24990, 20900, 21896…
#> $ old_price       <int> 32790, 21325, 27990, 27990, 33490, 33490, 22825, 0, 27…
#> $ discount        <int> 23, 8, 28, 23, 25, 25, 8, 0, 2, 13, 17, 8, 9, 0, 6, 17…
#> $ star_rating     <dbl> 3.7, 3.6, 3.7, 3.7, 3.7, 3.7, 3.9, 3.9, 0.0, 4.2, 2.3,…
#> $ ratings         <int> 63, 1894, 1153, 1153, 1657, 1657, 1185, 219, 0, 76, 3,…
#> $ reviews         <int> 12, 256, 159, 159, 234, 234, 141, 18, 0, 13, 0, 5, 1, …
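
In the glimpse() output above, processor_name appears as integers rather than processor names. This is a side effect of calling ifelse() on a factor, which returns a plain vector without the factor class. The sketch below shows one possible factor-preserving alternative in base R. The vector f is a made-up example, and this is an illustration rather than the approach used in the rest of this analysis.

```r
# Hypothetical factor with one missing value
f <- factor(c("Core i5", NA, "Ryzen 5", "Core i5"))

# ifelse() returns a plain vector, so the factor class is lost
lost <- ifelse(is.na(f), "Unknown", f)

# Alternative: add "Unknown" as a level, then assign it to the missing entries
kept <- f
levels(kept) <- c(levels(kept), "Unknown")
kept[is.na(kept)] <- "Unknown"

print(class(lost))
print(kept)
```

With the second approach the original labels survive, so a later model would see processor names as categories instead of arbitrary integer codes.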

Categorical Data Encoding

Categorical data encoding is an essential step in regression analysis because most regression algorithms cannot directly handle categorical variables. Categorical variables represent qualitative data that are divided into distinct categories or groups, such as processor_gnrtn, ram_type, Touchscreen, msoffice, os, weight, and processor_brand.

Encoding transforms these qualitative variables into numerical representations that regression algorithms can process. This lets us include categorical variables in the regression model and uncover any relationships or patterns they may have with the dependent variable.

library(dplyr)

# Define the custom mapping
custom_mapping <- list(
  processor_gnrtn = c('4th' = 0, '7th' = 1, '8th' = 2, 'Unknown' = 3, '9th' = 4, '10th' = 5, '11th' = 6, '12th' = 7),
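  # Note: these keys are mixed case, while the data holds upper-case values
  # ("DDR4", "LPDDR4X", ...), so ram_type is left unchanged by this mapping
  ram_type = c('Ddr3' = 0, 'Lpddr3' = 1, 'Ddr4' = 2, 'Lpddr4' = 3, 'Lpddr4x' = 4, 'Ddr5' = 5),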
  Touchscreen = c('No' = 0, 'Yes' = 1),
  msoffice = c('No' = 0, 'Yes' = 1),
  os = c('Windows' = 0, 'Mac' = 1, 'DOS' = 2),
  weight = c('Casual' = 0, 'Gaming' = 1, 'ThinNlight' = 2),
  processor_brand = c('AMD' = 0, 'Intel' = 1, 'M1' = 2, 'MediaTek' = 3, 'Qualcomm' = 4)
)

# Apply the custom mapping using mutate() and ifelse()
laptopSales_clean <- laptopSales_clean %>%
  mutate(processor_gnrtn = ifelse(processor_gnrtn %in% names(custom_mapping$processor_gnrtn),
                                 custom_mapping$processor_gnrtn[processor_gnrtn],
                                 processor_gnrtn),
         ram_type = ifelse(ram_type %in% names(custom_mapping$ram_type),
                           custom_mapping$ram_type[ram_type],
                           ram_type),
         Touchscreen = ifelse(Touchscreen %in% names(custom_mapping$Touchscreen),
                              custom_mapping$Touchscreen[Touchscreen],
                              Touchscreen),
         msoffice = ifelse(msoffice %in% names(custom_mapping$msoffice),
                           custom_mapping$msoffice[msoffice],
                           msoffice),
         os = ifelse(os %in% names(custom_mapping$os),
                     custom_mapping$os[os],
                     os),
         weight = ifelse(weight %in% names(custom_mapping$weight),
                         custom_mapping$weight[weight],
                         weight),
         processor_brand = ifelse(processor_brand %in% names(custom_mapping$processor_brand),
                                  custom_mapping$processor_brand[processor_brand],
                                  processor_brand)
  )
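
The lookup in the code above relies on indexing a named vector by the column values, which only works when the keys match the data exactly. The sketch below shows that mechanism on a small made-up vector. ram_map is a hypothetical mapping with upper-case keys, written for illustration and not taken from the analysis.

```r
# Values as they appear in the data (upper case)
ram_type <- c("DDR4", "LPDDR4X", "DDR3", "DDR5")

# Hypothetical mapping whose keys match the data exactly
ram_map <- c(DDR3 = 0, LPDDR3 = 1, DDR4 = 2, LPDDR4 = 3, LPDDR4X = 4, DDR5 = 5)

# Named-vector lookup; unname() drops the labels from the result
ram_code <- unname(ram_map[ram_type])

# Fail early if any value had no matching key
stopifnot(!anyNA(ram_code))
print(ram_code)
```

Integer codes imply an ordering between categories. That can be reasonable for something like processor generation, but for nominal variables such as os or processor_brand the order is arbitrary, so dummy variables may be a safer choice.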

3. Exploratory Data Analysis (EDA)

Exploratory data analysis is the phase where we explore the variables and look for patterns that may indicate correlation between them.

Find the Pearson correlation between features.

ggcorr(laptopSales_clean, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)

In the correlation chart:

  • only a few variables are positively correlated with latest_price, and old_price has the highest positive correlation of all;
  • several variables are moderately correlated with latest_price (correlation >= 0.5), such as graphic_card_gb, ssd, and ram_gb.
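
To read exact values rather than colors from a chart, the correlations with the target can also be computed and ranked directly. The sketch below uses the built-in mtcars data purely as a stand-in, with mpg playing the role of latest_price. The column choice is an assumption made for the example.

```r
# Built-in mtcars data, used here only as a stand-in for the laptop data
num_cols <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Pearson correlation matrix of the numeric columns
cor_mat <- cor(num_cols, method = "pearson")

# Correlations with the target (here mpg), sorted by absolute strength
target_cor <- cor_mat[, "mpg"]
target_cor <- target_cor[names(target_cor) != "mpg"]
ranked <- target_cor[order(abs(target_cor), decreasing = TRUE)]

print(round(ranked, 2))
```

Sorting by absolute value keeps strong negative correlations near the top, which a ranking on raw values would push to the bottom.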

4. Modelling

Feature Selection

For feature selection, I will choose predictor variables for the target using several methods and then compare the resulting models:

head(laptopSales_clean)
#>    brand   model processor_brand processor_name processor_gnrtn ram_gb ram_type
#> 1 Lenovo A6-9225               0              1               5      4     DDR4
#> 2 Lenovo Ideapad               0              2               5      4     DDR4
#> 3  Avita    PURA               0              2               5      4     DDR4
#> 4  Avita    PURA               0              2               5      4     DDR4
#> 5  Avita    PURA               0              2               5      4     DDR4
#> 6  Avita    PURA               0              2               5      8     DDR4
#>   ssd  hdd os os_bit graphic_card_gb weight warranty Touchscreen msoffice
#> 1   0 1024  0     64               0      2        0           0        0
#> 2   0  512  0     64               0      0        0           0        0
#> 3 128    0  0     64               0      2        0           0        0
#> 4 128    0  0     64               0      2        0           0        0
#> 5 256    0  0     64               0      2        0           0        0
#> 6 256    0  0     64               0      2        0           0        0
#>   latest_price old_price discount star_rating ratings reviews
#> 1        24990     32790       23         3.7      63      12
#> 2        19590     21325        8         3.6    1894     256
#> 3        19990     27990       28         3.7    1153     159
#> 4        21490     27990       23         3.7    1153     159
#> 5        24990     33490       25         3.7    1657     234
#> 6        24990     33490       25         3.7    1657     234

Model None

# model without predictors for `latest_price`
model_none <- lm(formula = latest_price ~ 1,
                 data = laptopSales_clean)
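
A model with no predictors estimates a single number, the intercept, which equals the mean of the response. It serves as a baseline for comparison with larger models. The sketch below shows this on the built-in mtcars data, used only as a stand-in for the laptop data.

```r
# Intercept-only model on the built-in mtcars data (stand-in for the laptop data)
null_fit <- lm(mpg ~ 1, data = mtcars)

# The single coefficient is the sample mean of the response
intercept <- unname(coef(null_fit))
print(intercept)
print(mean(mtcars$mpg))

# With no predictors, the model explains none of the variance
print(summary(null_fit)$r.squared)
```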

Model with all predictors

The regression model with all predictor variables is:

model_all <- lm(formula = latest_price ~ .,
                data = laptopSales_clean)

summary(model_all) 
#> 
#> Call:
#> lm(formula = latest_price ~ ., data = laptopSales_clean)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -59530  -6404      0   5486 154406 
#> 
#> Coefficients: (12 not defined because of singularities)
#>                      Estimate   Std. Error t value             Pr(>|t|)    
#> (Intercept)       67657.06641  22676.77917   2.984             0.002955 ** 
#> brandAPPLE        20857.43225  22276.19263   0.936             0.349458    
#> brandASUS         -1517.39740  12305.44877  -0.123             0.901899    
#> brandAvita       -18634.75488  22534.35821  -0.827             0.408567    
#> brandDELL        -13556.46677  14458.04240  -0.938             0.348774    
#> brandHP          -13960.93080  12614.79504  -1.107             0.268825    
#> brandiball       -11378.06995  28175.81766  -0.404             0.686474    
#> brandInfinix     -47712.72966  26965.50961  -1.769             0.077293 .  
#> brandlenovo        8713.51511  17858.80818   0.488             0.625776    
#> brandLenovo       -2254.64758  13051.45745  -0.173             0.862901    
#> brandLG           12507.26666  22179.04578   0.564             0.573000    
#> brandMi          -23678.24983  23945.94354  -0.989             0.323116    
#> brandMICROSOFT    24235.39734  23246.30371   1.043             0.297542    
#> brandMSI           9614.89660  27153.58592   0.354             0.723383    
#> brandNokia        -6476.70848  22551.41763  -0.287             0.774052    
#> brandrealme      -15140.66787  24359.62035  -0.622             0.534456    
#> brandRedmiBook   -41951.90792  19211.15083  -2.184             0.029336 *  
#> brandSAMSUNG      15382.07953  30623.03464   0.502             0.615622    
#> brandSmartron    -25462.28515  24181.17227  -1.053             0.292739    
#> brandVaio        -10075.80053  26698.35016  -0.377             0.706003    
#> model14s            192.79509  18301.19526   0.011             0.991598    
#> model15           24097.85256  21331.79767   1.130             0.259030    
#> model15-ec1105AX -19765.80844  23517.22699  -0.840             0.400945    
#> model15q          -6720.48194  19025.69913  -0.353             0.724028    
#> model15s         -11280.94463  17504.77836  -0.644             0.519510    
#> model250          -7585.90395  23531.16795  -0.322             0.747270    
#> model250-G6       -8156.00671  23431.70644  -0.348             0.727895    
#> model3000          1094.43466  24985.01364   0.044             0.965074    
#> model3511        -10111.36172  24915.02239  -0.406             0.684997    
#> model430           1842.87174  23607.53165   0.078             0.937802    
#> modelA6-9225     -20981.08957  24831.28072  -0.845             0.398450    
#> modelAlpha       -58253.94690  23823.73724  -2.445             0.014739 *  
#> modelAMD            119.51677  23971.81365   0.005             0.996024    
#> modelAPU         -46179.04470  19884.66840  -2.322             0.020520 *  
#> modelAspire      -16950.57068  21133.51599  -0.802             0.422803    
#> modelAsus        -18410.05267  24041.50775  -0.766             0.444094    
#> modelASUS        -17708.88614  18115.51515  -0.978             0.328656    
#> modelAthlon      -71605.00706  24464.62781  -2.927             0.003543 ** 
#> modelB50-70      -11123.97309  25483.95915  -0.437             0.662611    
#> modelBook         -3622.58717  17490.11001  -0.207             0.835979    
#> modelBook(Slim)            NA           NA      NA                   NA    
#> modelBravo       -32714.62029  23926.30146  -1.367             0.171998    
#> modelCeleron     -13413.51830  19643.27247  -0.683             0.494940    
#> modelChromebook  -16127.64724  17707.31743  -0.911             0.362741    
#> modelCommercial  -20607.83581  24028.84499  -0.858             0.391411    
#> modelCompBook              NA           NA      NA                   NA    
#> modelConceptD    -13605.78623  26556.47266  -0.512             0.608590    
#> modelCosmos        4844.21003  18958.99429   0.256             0.798410    
#> modelCreator      -2505.33074  23745.40805  -0.106             0.916005    
#> modelDA           -3811.72797  23281.02839  -0.164             0.869997    
#> modelDELL        -12186.27995  22001.91207  -0.554             0.579854    
#> modelDelta         8532.73035  24015.85442   0.355             0.722482    
#> modelDual        -35102.59949  26522.74085  -1.323             0.186134    
#> modelE            -5838.82754  18678.22083  -0.313             0.754683    
#> modelEeeBook     -17455.81161  19115.23668  -0.913             0.361479    
#> modelEnvy          5653.15121  18068.23538   0.313             0.754473    
#> modelExpertBook   -3691.05193  18025.46780  -0.205             0.837816    
#> modelExtensa     -20976.51161  26815.90097  -0.782             0.434355    
#> modelF17         -36324.31181  24028.49661  -1.512             0.131088    
#> modelG15         -21241.67505  21253.33011  -0.999             0.317945    
#> modelG3          -15242.80080  24893.99382  -0.612             0.540546    
#> modelG5          -13834.24680  22272.83386  -0.621             0.534732    
#> modelG7           26620.37540  25959.75659   1.025             0.305530    
#> modelGalaxy                NA           NA      NA                   NA    
#> modelGAMING      -18801.16342  24925.52434  -0.754             0.450944    
#> modelGE76         82907.66296  24383.83636   3.400             0.000714 ***
#> modelGF63        -32660.78088  18050.79401  -1.809             0.070850 .  
#> modelGF65        -49733.45176  20530.11489  -2.422             0.015686 *  
#> modelGP65         17839.22983  24353.93435   0.732             0.464126    
#> modelGP76         -2794.33781  24236.05426  -0.115             0.908245    
#> modelGram                  NA           NA      NA                   NA    
#> modelGS          -20417.01897  24975.20716  -0.817             0.413945    
#> modelGS66          9772.21867  24240.72863   0.403             0.686983    
#> modelHP           -8328.56825  18097.79860  -0.460             0.645527    
#> modelIdeapad      -9490.09394  18511.76857  -0.513             0.608368    
#> modelIdeaPad      -8261.96979  18506.92500  -0.446             0.655437    
#> modelIDEAPAD     -27694.33080  24851.46810  -1.114             0.265519    
#> modelINBook       43462.57580  19556.49311   2.222             0.026596 *  
#> modelInpiron      -3248.22309  24879.11773  -0.131             0.896163    
#> modelInspiron     -1082.02244  18835.54547  -0.057             0.954208    
#> modelINSPIRON     -2297.95132  21968.41401  -0.105             0.916723    
#> modelInsprion     -6823.87475  24999.26495  -0.273             0.784968    
#> modelIntel         7557.87177  20163.29294   0.375             0.707906    
#> modelKatana      -36046.76369  18685.05003  -1.929             0.054141 .  
#> modelLegion      -10279.34083  18970.94270  -0.542             0.588108    
#> modelLenovo       -3409.97116  24771.53672  -0.138             0.890554    
#> modelLiber         5223.77045  10804.61869   0.483             0.628920    
#> modelMacBook               NA           NA      NA                   NA    
#> modelModern      -27873.79796  17996.47029  -1.549             0.121901    
#> modelNitro        -2740.33058  22529.69491  -0.122             0.903228    
#> modelNotebook     -8679.14290  23524.91475  -0.369             0.712296    
#> modelOmen         -9137.66245  20714.61814  -0.441             0.659271    
#> modelOMEN         -9493.33349  18567.43496  -0.511             0.609321    
#> modelPavilion     -4451.83204  16936.59628  -0.263             0.792747    
#> modelPentium      -8357.98578  18777.33878  -0.445             0.656387    
#> modelPredator    -21419.53855  21914.72342  -0.977             0.328730    
#> modelPrestige    -13694.28024  18489.22922  -0.741             0.459163    
#> modelPro           7270.55427  19441.14358   0.374             0.708542    
#> modelPulse       -37389.86396  19664.78730  -1.901             0.057693 .  
#> modelPURA                  NA           NA      NA                   NA    
#> modelPureBook              NA           NA      NA                   NA    
#> modelRog           6201.99937  24121.19874   0.257             0.797168    
#> modelROG          -7311.70883  17888.18196  -0.409             0.682860    
#> modelRyzen        -8266.15721  17286.38328  -0.478             0.632675    
#> modelSE                    NA           NA      NA                   NA    
#> modelSpectre      35620.31681  17488.24282   2.037             0.042070 *  
#> modelSpin         36748.95547  27401.51635   1.341             0.180344    
#> modelStealth      -9213.45777  20775.51321  -0.443             0.657567    
#> modelSummit      -35808.58616  23555.60621  -1.520             0.128950    
#> modelSurface               NA           NA      NA                   NA    
#> modelSwift       -10707.53910  21928.57789  -0.488             0.625507    
#> modelSword       -34241.89252  20788.07859  -1.647             0.099999 .  
#> modelt.book                NA           NA      NA                   NA    
#> modelThinkbook    -1640.55435  20492.00492  -0.080             0.936215    
#> modelThinkBook    -2699.63481  20267.76615  -0.133             0.894077    
#> modelThinkpad    -51593.20398  25614.99920  -2.014             0.044399 *  
#> modelThinkPad     19267.96161  19252.41994   1.001             0.317290    
#> modelThinpad     -10686.85099  24542.08401  -0.435             0.663379    
#> modelTravelmate  -17847.06068  23016.98810  -0.775             0.438391    
#> modelTUF         -25770.35401  18298.35503  -1.408             0.159504    
#> modelv15         -16942.47009  24428.33439  -0.694             0.488205    
#> modelV15         -17083.33707  24669.40340  -0.692             0.488875    
#> modelVivo        -14447.92878  23902.67216  -0.604             0.545755    
#> modelVivoBook    -12956.17719  17475.92680  -0.741             0.458733    
#> modelVivoBook14   -8953.63224  24054.53499  -0.372             0.709848    
#> modelVostro       -5245.59184  19081.74915  -0.275             0.783479    
#> modelWF65                  NA           NA      NA                   NA    
#> modelX1                    NA           NA      NA                   NA    
#> modelx360         -3733.32671  20358.31725  -0.183             0.854556    
#> modelX390          9489.82381  24530.60899   0.387             0.698988    
#> modelXPS          26596.75274  20586.29599   1.292             0.196825    
#> modelYoga         -4480.19759  19073.28931  -0.235             0.814365    
#> modelZenbook      -3724.22436  18713.69951  -0.199             0.842316    
#> modelZenBook        144.07041  18006.70785   0.008             0.993619    
#> modelZephyrus     49038.22746  19309.07108   2.540             0.011326 *  
#> processor_brand   -5603.88832   3333.42429  -1.681             0.093216 .  
#> processor_name     -302.57815    199.62706  -1.516             0.130073    
#> processor_gnrtn     -89.19753    880.65195  -0.101             0.919355    
#> ram_gb             1022.45171    180.88930   5.652   0.0000000236302574 ***
#> ram_typeDDR4      -5799.66385   7831.00949  -0.741             0.459200    
#> ram_typeDDR5      17196.23435  10756.43666   1.599             0.110371    
#> ram_typeLPDDR3    18696.10910   9071.72573   2.061             0.039705 *  
#> ram_typeLPDDR4   -14529.36400   8818.50173  -1.648             0.099914 .  
#> ram_typeLPDDR4X  -12694.99083   7907.73032  -1.605             0.108890    
#> ssd                  50.75616      3.49021  14.542 < 0.0000000000000002 ***
#> hdd                  13.41026      2.09281   6.408   0.0000000002822356 ***
#> os                22683.05295   2931.22877   7.738   0.0000000000000382 ***
#> os_bit             -155.54614     85.40778  -1.821             0.069030 .  
#> graphic_card_gb    4241.29732    556.03883   7.628   0.0000000000000845 ***
#> weight              625.64006    940.97392   0.665             0.506358    
#> warranty           2685.30535   1523.67560   1.762             0.078470 .  
#> Touchscreen       10505.74460   2580.38097   4.071   0.0000524482570815 ***
#> msoffice          -1730.07157   1829.33162  -0.946             0.344630    
#> old_price             0.28283      0.02102  13.457 < 0.0000000000000002 ***
#> discount          -1097.23683     86.04487 -12.752 < 0.0000000000000002 ***
#> star_rating       -1684.90027    377.03575  -4.469   0.0000092668907647 ***
#> ratings              -9.13735      3.59625  -2.541             0.011290 *  
#> reviews              64.23836     30.75408   2.089             0.037113 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 16370 on 655 degrees of freedom
#>   (95 observations deleted due to missingness)
#> Multiple R-squared:  0.887,  Adjusted R-squared:  0.862 
#> F-statistic: 35.45 on 145 and 655 DF,  p-value: < 0.00000000000000022

We obtain far more coefficients than variables because each categorical variable is expanded into dummy variables. Using contrasts, for example, the ram_type variable is converted into five dummy variables (ram_typeDDR4 through ram_typeLPDDR4X), each measured against the baseline level DDR3.
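The dummy-variable expansion described above can be seen on a small example. This is a minimal sketch on made-up data: the `toy` data frame and its values are assumptions for illustration, not rows from the laptop dataset.

```r
# Toy data (hypothetical values): a three-level factor standing in for ram_type
toy <- data.frame(
  ram_type = factor(c("DDR3", "DDR4", "DDR5", "DDR4")),
  ram_gb   = c(4, 8, 16, 8)
)

# Default treatment contrasts: the first level (DDR3) is the baseline
contrasts(toy$ram_type)

# The design matrix that lm() would build from this formula:
# one column per non-baseline level, plus the numeric column as-is
design <- model.matrix(~ ram_type + ram_gb, data = toy)
colnames(design)
```

A factor with k levels contributes k - 1 columns, which is why factors with many levels, such as brand and model, account for most of the rows in the coefficient table.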

  1. Interpretation of the coefficients for numerical predictors:
  • old_price = 0.28283, meaning that latest_price increases by 0.28283 for every one-unit increase in old_price, provided that the values of the other predictor variables are fixed.
  • discount = -1097.23683, meaning that latest_price decreases by 1097.23683 for every one-unit increase in discount, provided that the values of the other predictor variables are fixed.
  2. Predictor significance:
  • predictor variables that significantly influence latest_price (p-value < 0.05) are brandRedmiBook, modelAlpha, modelAPU, modelAthlon, modelGE76, modelGF65, modelINBook, modelSpectre, modelThinkpad, modelZephyrus, ram_gb, ram_typeLPDDR3, ssd, hdd, os, graphic_card_gb, Touchscreen, old_price, discount, star_rating, ratings and reviews
  3. Adjusted R-squared:
  • 0.862, meaning that the model explains about 86.2% of the variance in latest_price
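The three readings above (coefficient size, significance, adjusted R-squared) can also be pulled out of a fitted model programmatically instead of being read off the printed table. The sketch below uses the built-in mtcars data as a stand-in, so the variable names and the 5% threshold are assumptions for illustration only.

```r
# Illustrative model on built-in data (not the laptop data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))
p_values   <- coef_table[, "Pr(>|t|)"]

# Terms significant at the 5% level, leaving out the intercept
significant <- setdiff(names(p_values)[p_values < 0.05], "(Intercept)")
significant

# Adjusted R-squared: share of variance explained, penalised for model size
adj_r2 <- summary(fit)$adj.r.squared
adj_r2
```

The same pattern applied to the full laptop model would list the significant dummy columns without scanning the table by eye.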

Model with correlation value (strong)

To choose predictors for this regression model, we select the variables that are strongly correlated with the target variable, latest_price.

# check correlations
ggcorr(laptopSales_clean, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)

💡 Insight: old_price has a strong correlation (> 0.6) with latest_price.

# Models with columns that have a fairly strong correlation
model_selection <- lm(formula = latest_price ~ old_price,
                      data = laptopSales_clean)

summary(model_selection)
#> 
#> Call:
#> lm(formula = latest_price ~ old_price, data = laptopSales_clean)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -51799 -13547  -8447   1917 420858 
#> 
#> Coefficients:
#>                Estimate  Std. Error t value            Pr(>|t|)    
#> (Intercept) 21132.14002  1934.73437   10.92 <0.0000000000000002 ***
#> old_price       0.62607     0.01856   33.74 <0.0000000000000002 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 30930 on 894 degrees of freedom
#> Multiple R-squared:  0.5601, Adjusted R-squared:  0.5596 
#> F-statistic:  1138 on 1 and 894 DF,  p-value: < 0.00000000000000022
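The screening step used for this model can also be done numerically rather than by reading the correlation plot. The sketch below is a minimal example on the built-in mtcars data, with mpg as a stand-in target; the 0.6 cutoff mirrors the one used above, and everything else is an assumption for illustration. It also checks a general property of one-predictor regression: R-squared equals the squared correlation between predictor and target, so the 0.5601 reported above corresponds to a correlation of roughly 0.75.

```r
# Illustrative only: built-in mtcars, with mpg as the stand-in target
num_data <- mtcars[sapply(mtcars, is.numeric)]

# Correlation of every numeric column with the target
cors <- cor(num_data, use = "pairwise.complete.obs")[, "mpg"]
cors <- cors[names(cors) != "mpg"]

# Keep predictors whose absolute correlation exceeds the cutoff
strong <- names(cors)[abs(cors) > 0.6]
strong

# In a one-predictor model, R-squared is the squared correlation
simple_fit <- lm(mpg ~ wt, data = mtcars)
r2_matches <- isTRUE(all.equal(summary(simple_fit)$r.squared,
                               unname(cors["wt"])^2))
r2_matches
```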

Model Stepwise Regression backward

# Perform stepwise model selection using AIC
model_backward <- stepAIC(model_all,
                          direction = "backward",
                          trace = FALSE)

# Print the summary of the simplified model
summary(model_backward)
#> 
#> Call:
#> lm(formula = latest_price ~ brand + model + processor_brand + 
#>     processor_name + ram_gb + ram_type + ssd + hdd + os + os_bit + 
#>     graphic_card_gb + warranty + Touchscreen + old_price + discount + 
#>     star_rating + ratings + reviews, data = laptopSales_clean)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -58883  -6584      0   5311 154896 
#> 
#> Coefficients: (12 not defined because of singularities)
#>                      Estimate   Std. Error t value             Pr(>|t|)    
#> (Intercept)       67806.03866  22632.07445   2.996             0.002838 ** 
#> brandAPPLE        20953.04832  22172.80842   0.945             0.345012    
#> brandASUS         -1997.95099  12266.47860  -0.163             0.870664    
#> brandAvita       -18160.19145  22345.42127  -0.813             0.416682    
#> brandDELL        -13865.08518  14372.99519  -0.965             0.335068    
#> brandHP          -14458.06590  12584.19477  -1.149             0.251012    
#> brandiball       -12371.59967  27839.25253  -0.444             0.656904    
#> brandInfinix     -47702.06029  26864.11590  -1.776             0.076248 .  
#> brandlenovo        7963.03635  17704.43392   0.450             0.653020    
#> brandLenovo       -3183.27170  12948.38845  -0.246             0.805880    
#> brandLG           12516.61507  22033.17585   0.568             0.570174    
#> brandMi          -25183.06463  23784.59603  -1.059             0.290081    
#> brandMICROSOFT    24259.23809  23108.44245   1.050             0.294196    
#> brandMSI          11065.30557  26989.69853   0.410             0.681952    
#> brandNokia        -5863.16494  22407.81243  -0.262             0.793668    
#> brandrealme      -16955.97004  24154.84702  -0.702             0.482946    
#> brandRedmiBook   -41477.07046  18907.91790  -2.194             0.028611 *  
#> brandSAMSUNG      15129.77772  30555.84508   0.495             0.620658    
#> brandSmartron    -25438.26273  24100.21780  -1.056             0.291575    
#> brandVaio         -9759.33333  26601.52838  -0.367             0.713833    
#> model14s            326.58570  18018.05240   0.018             0.985544    
#> model15           25010.09178  21272.33276   1.176             0.240136    
#> model15-ec1105AX -19076.77360  23382.20096  -0.816             0.414871    
#> model15q          -6306.35390  18995.74333  -0.332             0.740004    
#> model15s         -11217.45588  17189.38280  -0.653             0.514255    
#> model250          -8509.49091  23388.19252  -0.364             0.716097    
#> model250-G6       -7447.92017  23368.37002  -0.319             0.750042    
#> model3000          1090.98555  24775.64341   0.044             0.964890    
#> model3511         -9904.17253  24712.29756  -0.401             0.688713    
#> model430           1874.94673  23428.40504   0.080             0.936239    
#> modelA6-9225     -19501.76120  24510.89299  -0.796             0.426530    
#> modelAlpha       -59045.93937  23759.42904  -2.485             0.013197 *  
#> modelAMD            301.28654  23864.48520   0.013             0.989931    
#> modelAPU         -46115.72378  19668.17063  -2.345             0.019339 *  
#> modelAspire      -16693.12886  21002.77652  -0.795             0.427013    
#> modelAsus        -18717.01731  23969.92036  -0.781             0.435169    
#> modelASUS        -16874.09030  17923.88681  -0.941             0.346830    
#> modelAthlon      -71356.13370  24322.42210  -2.934             0.003465 ** 
#> modelB50-70      -10741.63618  25445.77519  -0.422             0.673062    
#> modelBook         -2332.60760  17342.25166  -0.135             0.893045    
#> modelBook(Slim)            NA           NA      NA                   NA    
#> modelBravo       -34513.45598  23833.25635  -1.448             0.148059    
#> modelCeleron     -12515.60319  19574.01204  -0.639             0.522786    
#> modelChromebook  -16036.59028  17666.20472  -0.908             0.364340    
#> modelCommercial  -20636.91522  23925.98730  -0.863             0.388709    
#> modelCompBook              NA           NA      NA                   NA    
#> modelConceptD    -13197.18411  26510.25918  -0.498             0.618781    
#> modelCosmos        3868.64321  18708.01404   0.207             0.836237    
#> modelCreator      -4081.35577  23652.66591  -0.173             0.863055    
#> modelDA           -3430.64774  23215.18377  -0.148             0.882565    
#> modelDELL        -12946.50653  21827.09900  -0.593             0.553292    
#> modelDelta         6396.71477  23906.05509   0.268             0.789109    
#> modelDual        -35165.47085  26443.48992  -1.330             0.184034    
#> modelE            -6219.52003  18394.31076  -0.338             0.735379    
#> modelEeeBook     -17210.20587  19074.59861  -0.902             0.367250    
#> modelEnvy          4922.31372  17914.21273   0.275             0.783578    
#> modelExpertBook   -3756.84414  17864.99041  -0.210             0.833506    
#> modelExtensa     -21026.37592  26726.00875  -0.787             0.431718    
#> modelF17         -36122.58689  23890.91616  -1.512             0.131019    
#> modelG15         -20994.60145  21091.25850  -0.995             0.319899    
#> modelG3          -15110.17732  24734.61163  -0.611             0.541482    
#> modelG5          -13711.31830  22054.03635  -0.622             0.534345    
#> modelG7           27232.97220  25727.39511   1.059             0.290207    
#> modelGalaxy                NA           NA      NA                   NA    
#> modelGAMING      -18181.71385  24791.04701  -0.733             0.463577    
#> modelGE76         81892.64679  24302.69180   3.370             0.000796 ***
#> modelGF63        -33214.41541  18013.82313  -1.844             0.065657 .  
#> modelGF65        -50064.36488  20469.32266  -2.446             0.014713 *  
#> modelGP65         16245.70083  24244.73482   0.670             0.503047    
#> modelGP76         -3900.76356  24136.66990  -0.162             0.871661    
#> modelGram                  NA           NA      NA                   NA    
#> modelGS          -21296.62246  24718.67363  -0.862             0.389243    
#> modelGS66          8496.18112  24121.02548   0.352             0.724778    
#> modelHP           -8547.78914  17899.08256  -0.478             0.633126    
#> modelIdeapad      -9667.28710  18325.03017  -0.528             0.597993    
#> modelIdeaPad      -7288.76931  18253.59578  -0.399             0.689797    
#> modelIDEAPAD     -26554.72599  24675.44681  -1.076             0.282250    
#> modelINBook       44078.78723  19507.88632   2.260             0.024176 *  
#> modelInpiron      -2832.02241  24770.84449  -0.114             0.909012    
#> modelInspiron     -1070.46578  18573.07821  -0.058             0.954057    
#> modelINSPIRON     -2304.33060  21791.67925  -0.106             0.915818    
#> modelInsprion     -6845.91170  24778.24145  -0.276             0.782414    
#> modelIntel         7607.31872  19940.20922   0.382             0.702951    
#> modelKatana      -37034.26537  18636.25253  -1.987             0.047313 *  
#> modelLegion      -10068.07968  18797.42637  -0.536             0.592409    
#> modelLenovo       -3911.19762  24597.96025  -0.159             0.873714    
#> modelLiber         5527.72316  10402.71995   0.531             0.595340    
#> modelMacBook               NA           NA      NA                   NA    
#> modelModern      -29637.06619  17908.23882  -1.655             0.098413 .  
#> modelNitro        -2755.73145  22237.73513  -0.124             0.901415    
#> modelNotebook     -7057.91968  23205.71487  -0.304             0.761113    
#> modelOmen         -9609.74489  20546.19727  -0.468             0.640144    
#> modelOMEN         -9508.81833  18396.89082  -0.517             0.605420    
#> modelPavilion     -4722.40097  16732.17346  -0.282             0.777852    
#> modelPentium      -8141.37873  18686.95825  -0.436             0.663218    
#> modelPredator    -20946.26558  21789.81158  -0.961             0.336761    
#> modelPrestige    -15141.43210  18425.63339  -0.822             0.411512    
#> modelPro           8407.70773  19316.80149   0.435             0.663521    
#> modelPulse       -38425.95861  19592.46262  -1.961             0.050270 .  
#> modelPURA                  NA           NA      NA                   NA    
#> modelPureBook              NA           NA      NA                   NA    
#> modelRog           6517.75715  24030.16006   0.271             0.786297    
#> modelROG          -7092.32998  17771.44135  -0.399             0.689959    
#> modelRyzen        -7880.46305  17078.40676  -0.461             0.644644    
#> modelSE                    NA           NA      NA                   NA    
#> modelSpectre      36141.42688  17378.75021   2.080             0.037946 *  
#> modelSpin         37220.54151  27318.66816   1.362             0.173519    
#> modelStealth     -10111.38319  20715.98962  -0.488             0.625645    
#> modelSummit      -35622.67400  23518.76190  -1.515             0.130341    
#> modelSurface               NA           NA      NA                   NA    
#> modelSwift       -10264.08388  21861.21608  -0.470             0.638860    
#> modelSword       -36238.36542  20670.51738  -1.753             0.080043 .  
#> modelt.book                NA           NA      NA                   NA    
#> modelThinkbook     -841.34026  20288.13874  -0.041             0.966934    
#> modelThinkBook    -3301.01601  20053.40131  -0.165             0.869301    
#> modelThinkpad    -50421.24815  25293.66128  -1.993             0.046626 *  
#> modelThinkPad     20571.66678  18957.74027   1.085             0.278260    
#> modelThinpad     -10439.11119  24345.30467  -0.429             0.668214    
#> modelTravelmate  -18495.91163  22827.78217  -0.810             0.418097    
#> modelTUF         -25327.45428  18104.70648  -1.399             0.162301    
#> modelv15         -16653.78412  24289.61241  -0.686             0.493185    
#> modelV15         -15848.00420  24320.60383  -0.652             0.514868    
#> modelVivo        -14531.33374  23783.68697  -0.611             0.541424    
#> modelVivoBook    -12900.05952  17327.98941  -0.744             0.456862    
#> modelVivoBook14   -7604.50777  23794.69774  -0.320             0.749382    
#> modelVostro       -5467.23643  18687.85849  -0.293             0.769954    
#> modelWF65                  NA           NA      NA                   NA    
#> modelX1                    NA           NA      NA                   NA    
#> modelx360         -3246.85193  20160.13618  -0.161             0.872101    
#> modelX390          9722.52867  24421.41044   0.398             0.690675    
#> modelXPS          26702.06998  20260.76332   1.318             0.187989    
#> modelYoga         -3936.56025  18831.46025  -0.209             0.834480    
#> modelZenbook      -3743.52604  18564.75392  -0.202             0.840255    
#> modelZenBook        896.05591  17833.89463   0.050             0.959943    
#> modelZephyrus     49860.14719  19172.83953   2.601             0.009516 ** 
#> processor_brand   -5697.55200   3264.14668  -1.745             0.081365 .  
#> processor_name     -292.64779    180.23915  -1.624             0.104927    
#> ram_gb             1020.65575    179.69537   5.680  0.00000002023641693 ***
#> ram_typeDDR4      -6127.10247   7352.74709  -0.833             0.404973    
#> ram_typeDDR5      16969.90914  10336.51322   1.642             0.101121    
#> ram_typeLPDDR3    18104.16847   8953.02393   2.022             0.043567 *  
#> ram_typeLPDDR4   -14707.21090   8597.28133  -1.711             0.087611 .  
#> ram_typeLPDDR4X  -13022.64794   7456.08282  -1.747             0.081177 .  
#> ssd                  50.67682      3.40783  14.871 < 0.0000000000000002 ***
#> hdd                  13.33086      2.08507   6.393  0.00000000030736938 ***
#> os                22751.68108   2863.88065   7.944  0.00000000000000848 ***
#> os_bit             -144.35363     84.46219  -1.709             0.087905 .  
#> graphic_card_gb    4166.09241    538.41993   7.738  0.00000000000003821 ***
#> warranty           2124.31800   1412.35192   1.504             0.133036    
#> Touchscreen        9834.80543   2484.18944   3.959  0.00008347721279763 ***
#> old_price             0.28100      0.02093  13.428 < 0.0000000000000002 ***
#> discount          -1095.16380     84.10894 -13.021 < 0.0000000000000002 ***
#> star_rating       -1736.40019    370.99756  -4.680  0.00000347854288514 ***
#> ratings              -9.07415      3.59120  -2.527             0.011745 *  
#> reviews              63.74358     30.70616   2.076             0.038289 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 16350 on 658 degrees of freedom
#>   (95 observations deleted due to missingness)
#> Multiple R-squared:  0.8867, Adjusted R-squared:  0.8623 
#> F-statistic: 36.28 on 142 and 658 DF,  p-value: < 0.00000000000000022
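Comparing the formula in the Call above with the full model shows that backward elimination removed processor_gnrtn, weight and msoffice. Rather than comparing formulas by eye, the dropped terms can be listed directly. This is a sketch on the built-in mtcars data, assuming the MASS package is installed; the object names `full` and `reduced` are hypothetical.

```r
library(MASS)  # provides stepAIC()

# Illustrative full model on built-in data (not the laptop data)
full    <- lm(mpg ~ ., data = mtcars)
reduced <- stepAIC(full, direction = "backward", trace = FALSE)

# Terms present in the full model but absent from the reduced one
dropped <- setdiff(attr(terms(full), "term.labels"),
                   attr(terms(reduced), "term.labels"))
dropped

# Backward elimination only accepts steps that lower the AIC
c(full = AIC(full), reduced = AIC(reduced))
```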

Model Stepwise Regression forward

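# Perform stepwise model selection using AIC
# Note: no `scope` is given, so the starting model (model_all) is also the upper limit;
# a forward search then has nothing to add and returns model_all unchanged
model_forward <- stepAIC(model_all,
                         direction = "forward",
                         trace = FALSE)

# Print the summary of the resulting model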
summary(model_forward)
#> 
#> Call:
#> lm(formula = latest_price ~ brand + model + processor_brand + 
#>     processor_name + processor_gnrtn + ram_gb + ram_type + ssd + 
#>     hdd + os + os_bit + graphic_card_gb + weight + warranty + 
#>     Touchscreen + msoffice + old_price + discount + star_rating + 
#>     ratings + reviews, data = laptopSales_clean)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -59530  -6404      0   5486 154406 
#> 
#> Coefficients: (12 not defined because of singularities)
#>                      Estimate   Std. Error t value             Pr(>|t|)    
#> (Intercept)       67657.06641  22676.77917   2.984             0.002955 ** 
#> brandAPPLE        20857.43225  22276.19263   0.936             0.349458    
#> brandASUS         -1517.39740  12305.44877  -0.123             0.901899    
#> brandAvita       -18634.75488  22534.35821  -0.827             0.408567    
#> brandDELL        -13556.46677  14458.04240  -0.938             0.348774    
#> brandHP          -13960.93080  12614.79504  -1.107             0.268825    
#> brandiball       -11378.06995  28175.81766  -0.404             0.686474    
#> brandInfinix     -47712.72966  26965.50961  -1.769             0.077293 .  
#> brandlenovo        8713.51511  17858.80818   0.488             0.625776    
#> brandLenovo       -2254.64758  13051.45745  -0.173             0.862901    
#> brandLG           12507.26666  22179.04578   0.564             0.573000    
#> brandMi          -23678.24983  23945.94354  -0.989             0.323116    
#> brandMICROSOFT    24235.39734  23246.30371   1.043             0.297542    
#> brandMSI           9614.89660  27153.58592   0.354             0.723383    
#> brandNokia        -6476.70848  22551.41763  -0.287             0.774052    
#> brandrealme      -15140.66787  24359.62035  -0.622             0.534456    
#> brandRedmiBook   -41951.90792  19211.15083  -2.184             0.029336 *  
#> brandSAMSUNG      15382.07953  30623.03464   0.502             0.615622    
#> brandSmartron    -25462.28515  24181.17227  -1.053             0.292739    
#> brandVaio        -10075.80053  26698.35016  -0.377             0.706003    
#> model14s            192.79509  18301.19526   0.011             0.991598    
#> model15           24097.85256  21331.79767   1.130             0.259030    
#> model15-ec1105AX -19765.80844  23517.22699  -0.840             0.400945    
#> model15q          -6720.48194  19025.69913  -0.353             0.724028    
#> model15s         -11280.94463  17504.77836  -0.644             0.519510    
#> model250          -7585.90395  23531.16795  -0.322             0.747270    
#> model250-G6       -8156.00671  23431.70644  -0.348             0.727895    
#> model3000          1094.43466  24985.01364   0.044             0.965074    
#> model3511        -10111.36172  24915.02239  -0.406             0.684997    
#> model430           1842.87174  23607.53165   0.078             0.937802    
#> modelA6-9225     -20981.08957  24831.28072  -0.845             0.398450    
#> modelAlpha       -58253.94690  23823.73724  -2.445             0.014739 *  
#> modelAMD            119.51677  23971.81365   0.005             0.996024    
#> modelAPU         -46179.04470  19884.66840  -2.322             0.020520 *  
#> modelAspire      -16950.57068  21133.51599  -0.802             0.422803    
#> modelAsus        -18410.05267  24041.50775  -0.766             0.444094    
#> modelASUS        -17708.88614  18115.51515  -0.978             0.328656    
#> modelAthlon      -71605.00706  24464.62781  -2.927             0.003543 ** 
#> modelB50-70      -11123.97309  25483.95915  -0.437             0.662611    
#> modelBook         -3622.58717  17490.11001  -0.207             0.835979    
#> modelBook(Slim)            NA           NA      NA                   NA    
#> modelBravo       -32714.62029  23926.30146  -1.367             0.171998    
#> modelCeleron     -13413.51830  19643.27247  -0.683             0.494940    
#> modelChromebook  -16127.64724  17707.31743  -0.911             0.362741    
#> modelCommercial  -20607.83581  24028.84499  -0.858             0.391411    
#> modelCompBook              NA           NA      NA                   NA    
#> modelConceptD    -13605.78623  26556.47266  -0.512             0.608590    
#> modelCosmos        4844.21003  18958.99429   0.256             0.798410    
#> modelCreator      -2505.33074  23745.40805  -0.106             0.916005    
#> modelDA           -3811.72797  23281.02839  -0.164             0.869997    
#> modelDELL        -12186.27995  22001.91207  -0.554             0.579854    
#> modelDelta         8532.73035  24015.85442   0.355             0.722482    
#> modelDual        -35102.59949  26522.74085  -1.323             0.186134    
#> modelE            -5838.82754  18678.22083  -0.313             0.754683    
#> modelEeeBook     -17455.81161  19115.23668  -0.913             0.361479    
#> modelEnvy          5653.15121  18068.23538   0.313             0.754473    
#> modelExpertBook   -3691.05193  18025.46780  -0.205             0.837816    
#> modelExtensa     -20976.51161  26815.90097  -0.782             0.434355    
#> modelF17         -36324.31181  24028.49661  -1.512             0.131088    
#> modelG15         -21241.67505  21253.33011  -0.999             0.317945    
#> modelG3          -15242.80080  24893.99382  -0.612             0.540546    
#> modelG5          -13834.24680  22272.83386  -0.621             0.534732    
#> modelG7           26620.37540  25959.75659   1.025             0.305530    
#> modelGalaxy                NA           NA      NA                   NA    
#> modelGAMING      -18801.16342  24925.52434  -0.754             0.450944    
#> modelGE76         82907.66296  24383.83636   3.400             0.000714 ***
#> modelGF63        -32660.78088  18050.79401  -1.809             0.070850 .  
#> modelGF65        -49733.45176  20530.11489  -2.422             0.015686 *  
#> modelGP65         17839.22983  24353.93435   0.732             0.464126    
#> modelGP76         -2794.33781  24236.05426  -0.115             0.908245    
#> modelGram                  NA           NA      NA                   NA    
#> modelGS          -20417.01897  24975.20716  -0.817             0.413945    
#> modelGS66          9772.21867  24240.72863   0.403             0.686983    
#> modelHP           -8328.56825  18097.79860  -0.460             0.645527    
#> modelIdeapad      -9490.09394  18511.76857  -0.513             0.608368    
#> modelIdeaPad      -8261.96979  18506.92500  -0.446             0.655437    
#> modelIDEAPAD     -27694.33080  24851.46810  -1.114             0.265519    
#> modelINBook       43462.57580  19556.49311   2.222             0.026596 *  
#> modelInpiron      -3248.22309  24879.11773  -0.131             0.896163    
#> modelInspiron     -1082.02244  18835.54547  -0.057             0.954208    
#> modelINSPIRON     -2297.95132  21968.41401  -0.105             0.916723    
#> modelInsprion     -6823.87475  24999.26495  -0.273             0.784968    
#> modelIntel         7557.87177  20163.29294   0.375             0.707906    
#> modelKatana      -36046.76369  18685.05003  -1.929             0.054141 .  
#> modelLegion      -10279.34083  18970.94270  -0.542             0.588108    
#> modelLenovo       -3409.97116  24771.53672  -0.138             0.890554    
#> modelLiber         5223.77045  10804.61869   0.483             0.628920    
#> modelMacBook               NA           NA      NA                   NA    
#> modelModern      -27873.79796  17996.47029  -1.549             0.121901    
#> modelNitro        -2740.33058  22529.69491  -0.122             0.903228    
#> modelNotebook     -8679.14290  23524.91475  -0.369             0.712296    
#> modelOmen         -9137.66245  20714.61814  -0.441             0.659271    
#> modelOMEN         -9493.33349  18567.43496  -0.511             0.609321    
#> modelPavilion     -4451.83204  16936.59628  -0.263             0.792747    
#> modelPentium      -8357.98578  18777.33878  -0.445             0.656387    
#> modelPredator    -21419.53855  21914.72342  -0.977             0.328730    
#> modelPrestige    -13694.28024  18489.22922  -0.741             0.459163    
#> modelPro           7270.55427  19441.14358   0.374             0.708542    
#> modelPulse       -37389.86396  19664.78730  -1.901             0.057693 .  
#> modelPURA                  NA           NA      NA                   NA    
#> modelPureBook              NA           NA      NA                   NA    
#> modelRog           6201.99937  24121.19874   0.257             0.797168    
#> modelROG          -7311.70883  17888.18196  -0.409             0.682860    
#> modelRyzen        -8266.15721  17286.38328  -0.478             0.632675    
#> modelSE                    NA           NA      NA                   NA    
#> modelSpectre      35620.31681  17488.24282   2.037             0.042070 *  
#> modelSpin         36748.95547  27401.51635   1.341             0.180344    
#> modelStealth      -9213.45777  20775.51321  -0.443             0.657567    
#> modelSummit      -35808.58616  23555.60621  -1.520             0.128950    
#> modelSurface               NA           NA      NA                   NA    
#> modelSwift       -10707.53910  21928.57789  -0.488             0.625507    
#> modelSword       -34241.89252  20788.07859  -1.647             0.099999 .  
#> modelt.book                NA           NA      NA                   NA    
#> modelThinkbook    -1640.55435  20492.00492  -0.080             0.936215    
#> modelThinkBook    -2699.63481  20267.76615  -0.133             0.894077    
#> modelThinkpad    -51593.20398  25614.99920  -2.014             0.044399 *  
#> modelThinkPad     19267.96161  19252.41994   1.001             0.317290    
#> modelThinpad     -10686.85099  24542.08401  -0.435             0.663379    
#> modelTravelmate  -17847.06068  23016.98810  -0.775             0.438391    
#> modelTUF         -25770.35401  18298.35503  -1.408             0.159504    
#> modelv15         -16942.47009  24428.33439  -0.694             0.488205    
#> modelV15         -17083.33707  24669.40340  -0.692             0.488875    
#> modelVivo        -14447.92878  23902.67216  -0.604             0.545755    
#> modelVivoBook    -12956.17719  17475.92680  -0.741             0.458733    
#> modelVivoBook14   -8953.63224  24054.53499  -0.372             0.709848    
#> modelVostro       -5245.59184  19081.74915  -0.275             0.783479    
#> modelWF65                  NA           NA      NA                   NA    
#> modelX1                    NA           NA      NA                   NA    
#> modelx360         -3733.32671  20358.31725  -0.183             0.854556    
#> modelX390          9489.82381  24530.60899   0.387             0.698988    
#> modelXPS          26596.75274  20586.29599   1.292             0.196825    
#> modelYoga         -4480.19759  19073.28931  -0.235             0.814365    
#> modelZenbook      -3724.22436  18713.69951  -0.199             0.842316    
#> modelZenBook        144.07041  18006.70785   0.008             0.993619    
#> modelZephyrus     49038.22746  19309.07108   2.540             0.011326 *  
#> processor_brand   -5603.88832   3333.42429  -1.681             0.093216 .  
#> processor_name     -302.57815    199.62706  -1.516             0.130073    
#> processor_gnrtn     -89.19753    880.65195  -0.101             0.919355    
#> ram_gb             1022.45171    180.88930   5.652   0.0000000236302574 ***
#> ram_typeDDR4      -5799.66385   7831.00949  -0.741             0.459200    
#> ram_typeDDR5      17196.23435  10756.43666   1.599             0.110371    
#> ram_typeLPDDR3    18696.10910   9071.72573   2.061             0.039705 *  
#> ram_typeLPDDR4   -14529.36400   8818.50173  -1.648             0.099914 .  
#> ram_typeLPDDR4X  -12694.99083   7907.73032  -1.605             0.108890    
#> ssd                  50.75616      3.49021  14.542 < 0.0000000000000002 ***
#> hdd                  13.41026      2.09281   6.408   0.0000000002822356 ***
#> os                22683.05295   2931.22877   7.738   0.0000000000000382 ***
#> os_bit             -155.54614     85.40778  -1.821             0.069030 .  
#> graphic_card_gb    4241.29732    556.03883   7.628   0.0000000000000845 ***
#> weight              625.64006    940.97392   0.665             0.506358    
#> warranty           2685.30535   1523.67560   1.762             0.078470 .  
#> Touchscreen       10505.74460   2580.38097   4.071   0.0000524482570815 ***
#> msoffice          -1730.07157   1829.33162  -0.946             0.344630    
#> old_price             0.28283      0.02102  13.457 < 0.0000000000000002 ***
#> discount          -1097.23683     86.04487 -12.752 < 0.0000000000000002 ***
#> star_rating       -1684.90027    377.03575  -4.469   0.0000092668907647 ***
#> ratings              -9.13735      3.59625  -2.541             0.011290 *  
#> reviews              64.23836     30.75408   2.089             0.037113 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 16370 on 655 degrees of freedom
#>   (95 observations deleted due to missingness)
#> Multiple R-squared:  0.887,  Adjusted R-squared:  0.862 
#> F-statistic: 35.45 on 145 and 655 DF,  p-value: < 0.00000000000000022
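The forward result above keeps every term of model_all. That is expected: when stepAIC is started from the full model with direction = "forward" and no scope, the starting model is also the largest model allowed, so there is nothing left to add. A forward search is usually started from an intercept-only model, with the full formula given as the upper scope. The sketch below shows both behaviours on the built-in mtcars data; it assumes the MASS package is installed, and the object names are hypothetical.

```r
library(MASS)  # provides stepAIC()

# Illustrative models on built-in data (not the laptop data)
null_model <- lm(mpg ~ 1, data = mtcars)   # intercept only
full_model <- lm(mpg ~ ., data = mtcars)   # all predictors

# Forward from the full model without a scope: nothing can be added
same_as_full <- stepAIC(full_model, direction = "forward", trace = FALSE)
identical(names(coef(same_as_full)), names(coef(full_model)))

# Forward from the null model, allowed to grow up to the full formula
forward <- stepAIC(null_model,
                   scope = list(lower = ~ 1, upper = formula(full_model)),
                   direction = "forward",
                   trace = FALSE)
formula(forward)
```

Applied to the laptop data, the same idea would mean starting from lm(latest_price ~ 1, ...). Rows with missing values would likely need to be removed first so that every candidate model is fitted to the same observations.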

Model Stepwise Regression both

# Perform stepwise model selection using AIC
model_both <- stepAIC(model_all,
                      direction = "both",
                      trace = FALSE)

# Print the summary of the simplified model
summary(model_both)
#> 
#> Call:
#> lm(formula = latest_price ~ brand + model + processor_brand + 
#>     processor_name + ram_gb + ram_type + ssd + hdd + os + os_bit + 
#>     graphic_card_gb + warranty + Touchscreen + old_price + discount + 
#>     star_rating + ratings + reviews, data = laptopSales_clean)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -58883  -6584      0   5311 154896 
#> 
#> Coefficients: (12 not defined because of singularities)
#>                      Estimate   Std. Error t value             Pr(>|t|)    
#> (Intercept)       67806.03866  22632.07445   2.996             0.002838 ** 
#> brandAPPLE        20953.04832  22172.80842   0.945             0.345012    
#> brandASUS         -1997.95099  12266.47860  -0.163             0.870664    
#> brandAvita       -18160.19145  22345.42127  -0.813             0.416682    
#> brandDELL        -13865.08518  14372.99519  -0.965             0.335068    
#> brandHP          -14458.06590  12584.19477  -1.149             0.251012    
#> brandiball       -12371.59967  27839.25253  -0.444             0.656904    
#> brandInfinix     -47702.06029  26864.11590  -1.776             0.076248 .  
#> brandlenovo        7963.03635  17704.43392   0.450             0.653020    
#> brandLenovo       -3183.27170  12948.38845  -0.246             0.805880    
#> brandLG           12516.61507  22033.17585   0.568             0.570174    
#> brandMi          -25183.06463  23784.59603  -1.059             0.290081    
#> brandMICROSOFT    24259.23809  23108.44245   1.050             0.294196    
#> brandMSI          11065.30557  26989.69853   0.410             0.681952    
#> brandNokia        -5863.16494  22407.81243  -0.262             0.793668    
#> brandrealme      -16955.97004  24154.84702  -0.702             0.482946    
#> brandRedmiBook   -41477.07046  18907.91790  -2.194             0.028611 *  
#> brandSAMSUNG      15129.77772  30555.84508   0.495             0.620658    
#> brandSmartron    -25438.26273  24100.21780  -1.056             0.291575    
#> brandVaio         -9759.33333  26601.52838  -0.367             0.713833    
#> model14s            326.58570  18018.05240   0.018             0.985544    
#> model15           25010.09178  21272.33276   1.176             0.240136    
#> model15-ec1105AX -19076.77360  23382.20096  -0.816             0.414871    
#> model15q          -6306.35390  18995.74333  -0.332             0.740004    
#> model15s         -11217.45588  17189.38280  -0.653             0.514255    
#> model250          -8509.49091  23388.19252  -0.364             0.716097    
#> model250-G6       -7447.92017  23368.37002  -0.319             0.750042    
#> model3000          1090.98555  24775.64341   0.044             0.964890    
#> model3511         -9904.17253  24712.29756  -0.401             0.688713    
#> model430           1874.94673  23428.40504   0.080             0.936239    
#> modelA6-9225     -19501.76120  24510.89299  -0.796             0.426530    
#> modelAlpha       -59045.93937  23759.42904  -2.485             0.013197 *  
#> modelAMD            301.28654  23864.48520   0.013             0.989931    
#> modelAPU         -46115.72378  19668.17063  -2.345             0.019339 *  
#> modelAspire      -16693.12886  21002.77652  -0.795             0.427013    
#> modelAsus        -18717.01731  23969.92036  -0.781             0.435169    
#> modelASUS        -16874.09030  17923.88681  -0.941             0.346830    
#> modelAthlon      -71356.13370  24322.42210  -2.934             0.003465 ** 
#> modelB50-70      -10741.63618  25445.77519  -0.422             0.673062    
#> modelBook         -2332.60760  17342.25166  -0.135             0.893045    
#> modelBook(Slim)            NA           NA      NA                   NA    
#> modelBravo       -34513.45598  23833.25635  -1.448             0.148059    
#> modelCeleron     -12515.60319  19574.01204  -0.639             0.522786    
#> modelChromebook  -16036.59028  17666.20472  -0.908             0.364340    
#> modelCommercial  -20636.91522  23925.98730  -0.863             0.388709    
#> modelCompBook              NA           NA      NA                   NA    
#> modelConceptD    -13197.18411  26510.25918  -0.498             0.618781    
#> modelCosmos        3868.64321  18708.01404   0.207             0.836237    
#> modelCreator      -4081.35577  23652.66591  -0.173             0.863055    
#> modelDA           -3430.64774  23215.18377  -0.148             0.882565    
#> modelDELL        -12946.50653  21827.09900  -0.593             0.553292    
#> modelDelta         6396.71477  23906.05509   0.268             0.789109    
#> modelDual        -35165.47085  26443.48992  -1.330             0.184034    
#> modelE            -6219.52003  18394.31076  -0.338             0.735379    
#> modelEeeBook     -17210.20587  19074.59861  -0.902             0.367250    
#> modelEnvy          4922.31372  17914.21273   0.275             0.783578    
#> modelExpertBook   -3756.84414  17864.99041  -0.210             0.833506    
#> modelExtensa     -21026.37592  26726.00875  -0.787             0.431718    
#> modelF17         -36122.58689  23890.91616  -1.512             0.131019    
#> modelG15         -20994.60145  21091.25850  -0.995             0.319899    
#> modelG3          -15110.17732  24734.61163  -0.611             0.541482    
#> modelG5          -13711.31830  22054.03635  -0.622             0.534345    
#> modelG7           27232.97220  25727.39511   1.059             0.290207    
#> modelGalaxy                NA           NA      NA                   NA    
#> modelGAMING      -18181.71385  24791.04701  -0.733             0.463577    
#> modelGE76         81892.64679  24302.69180   3.370             0.000796 ***
#> modelGF63        -33214.41541  18013.82313  -1.844             0.065657 .  
#> modelGF65        -50064.36488  20469.32266  -2.446             0.014713 *  
#> modelGP65         16245.70083  24244.73482   0.670             0.503047    
#> modelGP76         -3900.76356  24136.66990  -0.162             0.871661    
#> modelGram                  NA           NA      NA                   NA    
#> modelGS          -21296.62246  24718.67363  -0.862             0.389243    
#> modelGS66          8496.18112  24121.02548   0.352             0.724778    
#> modelHP           -8547.78914  17899.08256  -0.478             0.633126    
#> modelIdeapad      -9667.28710  18325.03017  -0.528             0.597993    
#> modelIdeaPad      -7288.76931  18253.59578  -0.399             0.689797    
#> modelIDEAPAD     -26554.72599  24675.44681  -1.076             0.282250    
#> modelINBook       44078.78723  19507.88632   2.260             0.024176 *  
#> modelInpiron      -2832.02241  24770.84449  -0.114             0.909012    
#> modelInspiron     -1070.46578  18573.07821  -0.058             0.954057    
#> modelINSPIRON     -2304.33060  21791.67925  -0.106             0.915818    
#> modelInsprion     -6845.91170  24778.24145  -0.276             0.782414    
#> modelIntel         7607.31872  19940.20922   0.382             0.702951    
#> modelKatana      -37034.26537  18636.25253  -1.987             0.047313 *  
#> modelLegion      -10068.07968  18797.42637  -0.536             0.592409    
#> modelLenovo       -3911.19762  24597.96025  -0.159             0.873714    
#> modelLiber         5527.72316  10402.71995   0.531             0.595340    
#> modelMacBook               NA           NA      NA                   NA    
#> modelModern      -29637.06619  17908.23882  -1.655             0.098413 .  
#> modelNitro        -2755.73145  22237.73513  -0.124             0.901415    
#> modelNotebook     -7057.91968  23205.71487  -0.304             0.761113    
#> modelOmen         -9609.74489  20546.19727  -0.468             0.640144    
#> modelOMEN         -9508.81833  18396.89082  -0.517             0.605420    
#> modelPavilion     -4722.40097  16732.17346  -0.282             0.777852    
#> modelPentium      -8141.37873  18686.95825  -0.436             0.663218    
#> modelPredator    -20946.26558  21789.81158  -0.961             0.336761    
#> modelPrestige    -15141.43210  18425.63339  -0.822             0.411512    
#> modelPro           8407.70773  19316.80149   0.435             0.663521    
#> modelPulse       -38425.95861  19592.46262  -1.961             0.050270 .  
#> modelPURA                  NA           NA      NA                   NA    
#> modelPureBook              NA           NA      NA                   NA    
#> modelRog           6517.75715  24030.16006   0.271             0.786297    
#> modelROG          -7092.32998  17771.44135  -0.399             0.689959    
#> modelRyzen        -7880.46305  17078.40676  -0.461             0.644644    
#> modelSE                    NA           NA      NA                   NA    
#> modelSpectre      36141.42688  17378.75021   2.080             0.037946 *  
#> modelSpin         37220.54151  27318.66816   1.362             0.173519    
#> modelStealth     -10111.38319  20715.98962  -0.488             0.625645    
#> modelSummit      -35622.67400  23518.76190  -1.515             0.130341    
#> modelSurface               NA           NA      NA                   NA    
#> modelSwift       -10264.08388  21861.21608  -0.470             0.638860    
#> modelSword       -36238.36542  20670.51738  -1.753             0.080043 .  
#> modelt.book                NA           NA      NA                   NA    
#> modelThinkbook     -841.34026  20288.13874  -0.041             0.966934    
#> modelThinkBook    -3301.01601  20053.40131  -0.165             0.869301    
#> modelThinkpad    -50421.24815  25293.66128  -1.993             0.046626 *  
#> modelThinkPad     20571.66678  18957.74027   1.085             0.278260    
#> modelThinpad     -10439.11119  24345.30467  -0.429             0.668214    
#> modelTravelmate  -18495.91163  22827.78217  -0.810             0.418097    
#> modelTUF         -25327.45428  18104.70648  -1.399             0.162301    
#> modelv15         -16653.78412  24289.61241  -0.686             0.493185    
#> modelV15         -15848.00420  24320.60383  -0.652             0.514868    
#> modelVivo        -14531.33374  23783.68697  -0.611             0.541424    
#> modelVivoBook    -12900.05952  17327.98941  -0.744             0.456862    
#> modelVivoBook14   -7604.50777  23794.69774  -0.320             0.749382    
#> modelVostro       -5467.23643  18687.85849  -0.293             0.769954    
#> modelWF65                  NA           NA      NA                   NA    
#> modelX1                    NA           NA      NA                   NA    
#> modelx360         -3246.85193  20160.13618  -0.161             0.872101    
#> modelX390          9722.52867  24421.41044   0.398             0.690675    
#> modelXPS          26702.06998  20260.76332   1.318             0.187989    
#> modelYoga         -3936.56025  18831.46025  -0.209             0.834480    
#> modelZenbook      -3743.52604  18564.75392  -0.202             0.840255    
#> modelZenBook        896.05591  17833.89463   0.050             0.959943    
#> modelZephyrus     49860.14719  19172.83953   2.601             0.009516 ** 
#> processor_brand   -5697.55200   3264.14668  -1.745             0.081365 .  
#> processor_name     -292.64779    180.23915  -1.624             0.104927    
#> ram_gb             1020.65575    179.69537   5.680  0.00000002023641693 ***
#> ram_typeDDR4      -6127.10247   7352.74709  -0.833             0.404973    
#> ram_typeDDR5      16969.90914  10336.51322   1.642             0.101121    
#> ram_typeLPDDR3    18104.16847   8953.02393   2.022             0.043567 *  
#> ram_typeLPDDR4   -14707.21090   8597.28133  -1.711             0.087611 .  
#> ram_typeLPDDR4X  -13022.64794   7456.08282  -1.747             0.081177 .  
#> ssd                  50.67682      3.40783  14.871 < 0.0000000000000002 ***
#> hdd                  13.33086      2.08507   6.393  0.00000000030736938 ***
#> os                22751.68108   2863.88065   7.944  0.00000000000000848 ***
#> os_bit             -144.35363     84.46219  -1.709             0.087905 .  
#> graphic_card_gb    4166.09241    538.41993   7.738  0.00000000000003821 ***
#> warranty           2124.31800   1412.35192   1.504             0.133036    
#> Touchscreen        9834.80543   2484.18944   3.959  0.00008347721279763 ***
#> old_price             0.28100      0.02093  13.428 < 0.0000000000000002 ***
#> discount          -1095.16380     84.10894 -13.021 < 0.0000000000000002 ***
#> star_rating       -1736.40019    370.99756  -4.680  0.00000347854288514 ***
#> ratings              -9.07415      3.59120  -2.527             0.011745 *  
#> reviews              63.74358     30.70616   2.076             0.038289 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 16350 on 658 degrees of freedom
#>   (95 observations deleted due to missingness)
#> Multiple R-squared:  0.8867, Adjusted R-squared:  0.8623 
#> F-statistic: 36.28 on 142 and 658 DF,  p-value: < 0.00000000000000022

Model Comparison (Goodness of fit)

Objective: to choose the best model for predicting the target variable.

Goodness of fit for regression models with more than one predictor (multiple linear regression)

Multiple R-squared never decreases when predictors are added, whether or not they actually improve the model. Adjusted R-squared penalizes the number of predictors, so it is the fairer measure when comparing models of different sizes.

Adjusted R-squared formula:

\[Adjusted\ R^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1}\]

where:

  • n: number of observations
  • k: number of predictor variables
  • \(R^2\): Multiple R-squared of the model
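
As a quick check of the formula, the sketch below plugs in the values reported for model_both: a Multiple R-squared of 0.8867428 (from the comparison table further down), 142 estimated coefficients besides the intercept, and 658 residual degrees of freedom, which implies n = 658 + 142 + 1 = 801. The helper name adjusted_r2 is ours and purely illustrative.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors k
adjusted_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# Values taken from the model_both output in this report
adj_both <- adjusted_r2(r2 = 0.8867428, n = 801, k = 142)
print(round(adj_both, 4))
```

Up to rounding, the result agrees with the Adjusted R-squared that summary() reports for model_both.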

Purpose: to determine how well the model explains the variance of the target variable.

Compare the adjusted R-squared values of the fitted models:

# check the adjusted R-squared value of each model
summary(model_all)$adj.r.squared
#> [1] 0.8619639
summary(model_selection)$adj.r.squared
#> [1] 0.5595685
summary(model_backward)$adj.r.squared
#> [1] 0.8623013
summary(model_forward)$adj.r.squared
#> [1] 0.8619639
summary(model_both)$adj.r.squared
#> [1] 0.8623013

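💡 Conclusion: model_backward and model_both explain the target variable best. Both have an adjusted R-squared of 0.8623013, i.e. they explain about 86.2% of the variance in latest_price. The two share identical fit statistics, so we continue with model_both.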

5. Evaluation

Purpose: to find out whether the model is good enough by checking which model produces the smallest prediction error.

Model Performance

Root Mean Squared Error (RMSE): RMSE is the square root of the mean squared error (MSE). Taking the root brings it back to the units of the target variable, so it is read much like MAE, but because the errors are squared first, very large errors are penalized more heavily. In R, it can be computed with the RMSE() function from the `MLmetrics` package.

\[RMSE = \sqrt{\frac{1}{n} \sum (\hat y - y)^2}\]
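
The formula translates directly into R. The sketch below is a minimal illustration on the built-in mtcars data (not the laptop data), and the rmse() helper is our own. It also shows how RMSE relates to the residual standard error (Sigma) that summary() prints: RMSE divides the squared errors by n, while Sigma divides them by the residual degrees of freedom.

```r
# Root mean squared error of predictions against observed values
rmse <- function(actual, predicted) {
  sqrt(mean((predicted - actual)^2))
}

# Illustrative model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

rmse_fit  <- rmse(mtcars$mpg, fitted(fit))
sigma_fit <- summary(fit)$sigma

# Sigma rescaled by sqrt(df / n) gives the RMSE
rmse_from_sigma <- sigma_fit * sqrt(df.residual(fit) / nrow(mtcars))
c(rmse = rmse_fit, sigma = sigma_fit, rmse_from_sigma = rmse_from_sigma)
```

This is why the RMSE column in the comparison table is always somewhat smaller than the Sigma column for the same model.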

To speed things up, I used the compare_performance() function from the performance package, which reports the RMSE of every model along with other fit statistics:

library(performance)
comparison <- compare_performance(model_all, model_selection, model_backward, model_forward, model_both)

as.data.frame(comparison)
#>              Name Model      AIC    AIC_wt     AICc   AICc_wt      BIC
#> 1       model_all    lm 17950.69 0.0521678 18017.33 0.0128193 18639.51
#> 2 model_selection    lm 21075.37 0.0000000 21075.40 0.0000000 21089.77
#> 3  model_backward    lm 17946.39 0.4478322 18010.05 0.4871807 18621.16
#> 4   model_forward    lm 17950.69 0.0521678 18017.33 0.0128193 18639.51
#> 5      model_both    lm 17946.39 0.4478322 18010.05 0.4871807 18621.16
#>          BIC_wt        R2 R2_adjusted     RMSE    Sigma
#> 1 0.00005159968 0.8869829   0.8619639 14804.94 16372.02
#> 2 0.00000000000 0.5600606   0.5595685 30900.41 30934.96
#> 3 0.49994840032 0.8867428   0.8623013 14820.66 16352.00
#> 4 0.00005159968 0.8869829   0.8619639 14804.94 16372.02
#> 5 0.49994840032 0.8867428   0.8623013 14820.66 16352.00

💡 Conclusion: the models with the smallest error in predicting latest_price are model_all and model_forward, with an RMSE of 14804.94. model_backward and model_both are very close behind (14820.66) while using fewer terms and having lower AIC and BIC.
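
model_all and model_forward report identical values in every column of the table. A likely reason, assuming model_forward was produced by calling stepAIC() on the full model with direction = "forward" and no scope (as the tuned version later in this report does), is that forward selection can only add terms, and the full model has none left to add. The sketch below, on the built-in mtcars data with illustrative object names, contrasts that call with a forward search that starts from an intercept-only model and is given an upper scope.

```r
# Toy data: built-in mtcars, not the laptop data
null_model <- lm(mpg ~ 1, data = mtcars)
full_model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Forward selection started from the full model: nothing can be added
stuck <- MASS::stepAIC(full_model, direction = "forward", trace = FALSE)

# Forward selection started from the intercept-only model, with a scope
forward_fit <- MASS::stepAIC(
  null_model,
  scope = list(lower = ~ 1, upper = ~ wt + hp + qsec + drat),
  direction = "forward",
  trace = FALSE
)

formula(stuck)        # same terms as full_model
formula(forward_fit)  # only the terms that lowered AIC
```

MASS::stepAIC() is called with the package prefix here so that MASS is not attached, which would otherwise mask dplyr's select().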

Assumption

As a statistical model, linear regression comes with strict assumptions. They must be checked before the model's estimates can be treated as the Best Linear Unbiased Estimator (BLUE), that is, unbiased coefficient estimates with the smallest variance among linear unbiased estimators, and before its p-values can be trusted.

Assumptions of linear regression models:

1. Linearity

Linearity denotes a linear or straight line relationship between the target variable and its predictors.

A plot of residuals against fitted values can be used to assess the assumption of linearity of a regression model (multiple linear regression). This is a scatter plot with the fitted values (predicted results of the target variable) on the x-axis and the residual/error values generated by the model on the y-axis.

plot(model_both, # model  tested
     which = 1) # residual vs fitted

💡 Conclusion: the bulk of the residuals scatter randomly around 0 (half of them lie between -6584 and 5311 according to the model summary), so we treat the linearity assumption as met. A few residuals are far larger, up to 154896.

2. Normality of Residuals

# histogram residual
hist(model_both$residuals)

plot(model_both, which = 2)

# shapiro test
shapiro.test(model_both$residuals)
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  model_both$residuals
#> W = 0.82289, p-value < 0.00000000000000022

💡 Conclusion: p-value < 0.00000000000000022, which is below 0.05, so we reject the hypothesis of normality: the residuals are not normally distributed.
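
For reference, the Shapiro-Wilk test has H0: the data come from a normal distribution, so a small p-value is evidence against normality. The simulated sketch below (made-up data, not the laptop residuals) contrasts a normal sample with a right-skewed one.

```r
set.seed(42)

# Simulated "residuals": one normal sample, one right-skewed sample
normal_resid <- rnorm(200)
skewed_resid <- rexp(200) - 1

p_normal <- shapiro.test(normal_resid)$p.value
p_skewed <- shapiro.test(skewed_resid)$p.value

# A p-value below 0.05 leads us to reject normality
c(normal = p_normal, skewed = p_skewed)
```

A long right tail like the one in the skewed sample is consistent with the residual summary of model_both, where the maximum (154896) is far larger in magnitude than the minimum (-58883).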

3. Homoscedasticity of Residuals

plot(x = model_both$fitted.values, 
     y = model_both$residuals) 
abline(h = 0, col = "red") 

The plot shows heteroscedasticity (a fan-shaped pattern).

The model's errors are expected to spread randomly with constant variance, so that the residual plot shows no pattern. This property is called homoscedasticity.

To confirm, we use a statistical test: bptest() from the lmtest package.

Breusch-Pagan hypothesis test:

  • H0: the error variance is constant (homoscedasticity)
  • H1: the error variance is NOT constant (heteroscedasticity)

Expected condition: H0

Reject H0 if the p-value < 0.05 (alpha)
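
To make the test less of a black box, here is a minimal sketch of what the studentized (Koenker) version of the Breusch-Pagan test computes, assuming the default settings of bptest(): regress the squared residuals on the same predictors and use n times the R-squared of that auxiliary regression as a chi-squared statistic. It runs on the built-in mtcars data, not the laptop data, and all object names are illustrative.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux_data <- transform(mtcars, sq_resid = residuals(fit)^2)
aux <- lm(sq_resid ~ wt + hp, data = aux_data)

bp_stat <- nrow(aux_data) * summary(aux)$r.squared  # n * R^2
bp_df   <- length(coef(aux)) - 1                    # number of predictors
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p_value = bp_p)
```

The degrees of freedom equal the number of estimated slope coefficients, which is why the test on model_both below reports df = 142.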

# bptest of models
library(lmtest)
bptest(model_both)
#> 
#>  studentized Breusch-Pagan test
#> 
#> data:  model_both
#> BP = 360.73, df = 142, p-value < 0.00000000000000022

💡 Conclusion: p-value < 0.00000000000000022, which is below 0.05, so we reject H0: the error variance is not constant (heteroscedasticity).


💡 Conclusion: all predictor variables meet the no multicollinearity assumption

6. Model Improvement

Tuning

We have already seen that several assumptions are violated by our model, notably normality of residuals and homoscedasticity. We will now try to fix them. One possible approach is to drop the variables whose correlation coefficient exceeds 0.7. I also apply `sqrt` to every numeric variable.



# drop reviews and old_price, then take the square root of every
# numeric column (this includes the target, latest_price)
laptopSales_clean <- laptopSales_clean %>% 
  select(-c(reviews, old_price)) %>%
  mutate_if(is.numeric, sqrt)

head(laptopSales_clean)
#>    brand   model processor_brand processor_name processor_gnrtn   ram_gb
#> 1 Lenovo A6-9225               0       1.000000        2.236068 2.000000
#> 2 Lenovo Ideapad               0       1.414214        2.236068 2.000000
#> 3  Avita    PURA               0       1.414214        2.236068 2.000000
#> 4  Avita    PURA               0       1.414214        2.236068 2.000000
#> 5  Avita    PURA               0       1.414214        2.236068 2.000000
#> 6  Avita    PURA               0       1.414214        2.236068 2.828427
#>   ram_type      ssd      hdd os os_bit graphic_card_gb   weight warranty
#> 1     DDR4  0.00000 32.00000  0      8               0 1.414214        0
#> 2     DDR4  0.00000 22.62742  0      8               0 0.000000        0
#> 3     DDR4 11.31371  0.00000  0      8               0 1.414214        0
#> 4     DDR4 11.31371  0.00000  0      8               0 1.414214        0
#> 5     DDR4 16.00000  0.00000  0      8               0 1.414214        0
#> 6     DDR4 16.00000  0.00000  0      8               0 1.414214        0
#>   Touchscreen msoffice latest_price discount star_rating   ratings
#> 1           0        0     158.0823 4.795832    1.923538  7.937254
#> 2           0        0     139.9643 2.828427    1.897367 43.520110
#> 3           0        0     141.3860 5.291503    1.923538 33.955854
#> 4           0        0     146.5947 4.795832    1.923538 33.955854
#> 5           0        0     158.0823 5.000000    1.923538 40.706265
#> 6           0        0     158.0823 5.000000    1.923538 40.706265
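
Because latest_price is now stored on a square-root scale, any model fitted on this data predicts the square root of the price, so its residual standard error and RMSE are not directly comparable with those of the earlier models. Squaring the predictions brings them back to the original scale. A minimal sketch on the built-in mtcars data (an assumption for illustration; the variable names are not from the laptop data):

```r
# Fit on the square-root scale of the target
fit_sqrt <- lm(sqrt(mpg) ~ wt + hp, data = mtcars)

# Predictions on the transformed scale, then back-transformed by squaring
pred_sqrt     <- predict(fit_sqrt, newdata = mtcars)
pred_original <- pred_sqrt^2

# RMSE on each scale: only the second is in the units of the raw target
rmse_sqrt_scale <- sqrt(mean((sqrt(mtcars$mpg) - pred_sqrt)^2))
rmse_original   <- sqrt(mean((mtcars$mpg - pred_original)^2))

c(sqrt_scale = rmse_sqrt_scale, original_scale = rmse_original)
```

The same idea applies to the tuned laptop models: square their predictions before comparing errors with the untransformed models.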

Remove all outliers using the interquartile range (IQR) method

The IQR method computes the IQR of a variable (IQR = Q3 - Q1) and removes the data points that fall outside the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR, where Q1 is the 25th percentile and Q3 is the 75th percentile.

# Calculate IQR and define the threshold
calculate_iqr <- function(x) {
  q1 <- quantile(x, 0.25)
  q3 <- quantile(x, 0.75)
  iqr <- q3 - q1
  threshold <- 1.5 * iqr
  return(list(q1 = q1, q3 = q3, iqr = iqr, threshold = threshold))
}

# Function to remove outliers
remove_outliers <- function(data, column) {
  iqr_values <- calculate_iqr(data[[column]])
  lower_bound <- iqr_values$q1 - iqr_values$threshold
  upper_bound <- iqr_values$q3 + iqr_values$threshold
  data <- data[data[[column]] >= lower_bound & data[[column]] <= upper_bound, ]
  return(data)
}

# Apply the filter to each numeric column in turn; every step works on the
# rows that survived the previous one
numeric_columns <- sapply(laptopSales_clean, is.numeric)
laptopSales_clean <- Reduce(remove_outliers, names(laptopSales_clean)[numeric_columns], init = laptopSales_clean)
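
Note that the filter is applied column by column, so a row is dropped as soon as any single numeric column flags it, and the quartiles of later columns are computed on already-reduced data. This can shrink the data a lot: the single-predictor model fitted further below reports 123 residual degrees of freedom, i.e. 125 rows. The toy sketch below (made-up numbers, hypothetical helper iqr_keep) tracks how many rows survive each step.

```r
# Logical mask: TRUE for values inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
# (unlike the functions above, this helper also drops NA values)
iqr_keep <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  !is.na(x) & x >= q[1] - fence & x <= q[2] + fence
}

# Made-up data with one extreme value per column
toy <- data.frame(
  a = c(1, 2, 3, 4, 100, 2, 3),
  b = c(10, 11, 12, 13, 12, 50, 11)
)

# Filter one column after another and record the remaining row count
rows_left <- nrow(toy)
for (col in names(toy)) {
  toy <- toy[iqr_keep(toy[[col]]), , drop = FALSE]
  rows_left <- c(rows_left, nrow(toy))
}
print(rows_left)
```

Running the same bookkeeping on laptopSales_clean would show which columns are responsible for most of the removed rows.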

Performance

model_allTuning <- lm(formula = latest_price ~ .,
                data = laptopSales_clean)

summary(model_allTuning) 
#> 
#> Call:
#> lm(formula = latest_price ~ ., data = laptopSales_clean)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -15.419  -4.566   0.000   2.386  29.256 
#> 
#> Coefficients: (16 not defined because of singularities)
#>                   Estimate Std. Error t value         Pr(>|t|)    
#> (Intercept)     -161.53865   59.91917  -2.696         0.008897 ** 
#> brandASUS         38.24889   18.47446   2.070         0.042334 *  
#> brandDELL         -4.85797   15.19335  -0.320         0.750173    
#> brandHP            0.75717   17.12698   0.044         0.964871    
#> brandInfinix     -18.93953   12.13192  -1.561         0.123275    
#> brandLenovo        7.49561   14.32174   0.523         0.602468    
#> brandLG           42.08832   17.75733   2.370         0.020710 *  
#> brandMi          -26.21575   17.69882  -1.481         0.143308    
#> brandMSI          29.58811   14.66149   2.018         0.047651 *  
#> brandNokia        -1.01243   17.10427  -0.059         0.952978    
#> modelAspire        0.48341   17.07611   0.028         0.977501    
#> modelASUS        -13.55734   13.30977  -1.019         0.312112    
#> modelEnvy         69.19190   13.37664   5.173 0.00000233683067 ***
#> modelF17         -12.64937   15.17152  -0.834         0.407425    
#> modelGAMING       17.96033   12.06157   1.489         0.141236    
#> modelGF63        -30.74499   16.85820  -1.824         0.072719 .  
#> modelGram               NA         NA      NA               NA    
#> modelIdeapad     -13.83130   15.99896  -0.865         0.390438    
#> modelIdeaPad      -4.74803   16.16594  -0.294         0.769904    
#> modelINBook             NA         NA      NA               NA    
#> modelInspiron     18.33226    5.12381   3.578         0.000655 ***
#> modelIntel              NA         NA      NA               NA    
#> modelLegion       -2.42014   16.63331  -0.145         0.884760    
#> modelModern      -35.17965   16.16134  -2.177         0.033079 *  
#> modelNotebook           NA         NA      NA               NA    
#> modelPavilion      7.56497    8.60453   0.879         0.382491    
#> modelPrestige           NA         NA      NA               NA    
#> modelPureBook           NA         NA      NA               NA    
#> modelROG          -6.75657   12.83465  -0.526         0.600353    
#> modelThinkbook     4.60508   18.89732   0.244         0.808227    
#> modelThinkBook     0.59571   17.81181   0.033         0.973421    
#> modelThinkPad     -0.29711   17.50943  -0.017         0.986513    
#> modelTUF         -26.77617   13.65186  -1.961         0.054058 .  
#> modelVivoBook    -45.78300   10.30042  -4.445 0.00003445731408 ***
#> modelVostro             NA         NA      NA               NA    
#> modelYoga               NA         NA      NA               NA    
#> modelZenBook            NA         NA      NA               NA    
#> processor_brand         NA         NA      NA               NA    
#> processor_name   132.47803   15.34587   8.633 0.00000000000196 ***
#> processor_gnrtn   24.23635   11.67709   2.076         0.041836 *  
#> ram_gb                  NA         NA      NA               NA    
#> ram_typeLPDDR3          NA         NA      NA               NA    
#> ram_typeLPDDR4X    8.41653   11.53153   0.730         0.468051    
#> ssd                1.09634    0.40255   2.723         0.008258 ** 
#> hdd                     NA         NA      NA               NA    
#> os                      NA         NA      NA               NA    
#> os_bit                  NA         NA      NA               NA    
#> graphic_card_gb    7.09128    2.61252   2.714         0.008465 ** 
#> weight            -3.50159    2.64432  -1.324         0.190005    
#> warranty           2.12188    3.92034   0.541         0.590159    
#> Touchscreen             NA         NA      NA               NA    
#> msoffice           7.22968    3.77502   1.915         0.059810 .  
#> discount          -4.77728    1.44059  -3.316         0.001486 ** 
#> star_rating       -5.58194   14.09798  -0.396         0.693427    
#> ratings           -0.06325    0.11401  -0.555         0.580939    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 9.467 on 66 degrees of freedom
#>   (20 observations deleted due to missingness)
#> Multiple R-squared:  0.9346, Adjusted R-squared:  0.897 
#> F-statistic: 24.83 on 38 and 66 DF,  p-value: < 0.00000000000000022
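
The line "16 not defined because of singularities" means that some predictors are exact linear combinations of others, so lm() cannot estimate them and reports NA. With so few rows left, dummy columns of model that coincide with a brand, or columns that have become constant, are likely candidates, though that is our reading rather than something the output states. A toy sketch with made-up data shows the mechanism and how to list the affected terms:

```r
# Made-up data where x3 is an exact linear combination of x1 and x2
toy <- data.frame(
  y  = c(1.2, 2.1, 2.9, 4.2, 5.1, 5.8),
  x1 = 1:6,
  x2 = c(2, 1, 4, 3, 6, 5)
)
toy$x3 <- toy$x1 + toy$x2

fit <- lm(y ~ x1 + x2 + x3, data = toy)

# Coefficients that could not be estimated are returned as NA
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)
```

Applying the same is.na(coef(...)) check to model_allTuning lists the terms that carry no independent information in the reduced data.
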
# check correlation
ggcorr(laptopSales_clean, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)

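💡 Insight: old_price was dropped during tuning, so the single-predictor model below uses graphic_card_gb, which still shows a fairly strong correlation with latest_price.

# Model with the column that has a fairly strong correlation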
model_selection <- lm(formula = latest_price ~ graphic_card_gb,
                      data = laptopSales_clean)

summary(model_selection)
#> 
#> Call:
#> lm(formula = latest_price ~ graphic_card_gb, data = laptopSales_clean)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -45.395 -19.825   1.744  13.784  59.541 
#> 
#> Coefficients:
#>                 Estimate Std. Error t value             Pr(>|t|)    
#> (Intercept)      227.027      2.469  91.951 < 0.0000000000000002 ***
#> graphic_card_gb   19.880      2.681   7.415      0.0000000000172 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 24.02 on 123 degrees of freedom
#> Multiple R-squared:  0.3089, Adjusted R-squared:  0.3033 
#> F-statistic: 54.98 on 1 and 123 DF,  p-value: 0.0000000000172

Model Stepwise Regression backward

# Perform stepwise model selection using AIC
model_backward <- stepAIC(model_allTuning, 
                            direction = "backward", 
                            trace = FALSE)

# Print the summary of the simplified model
summary(model_backward)
#> 
#> Call:
#> lm(formula = latest_price ~ model + processor_name + processor_gnrtn + 
#>     ssd + graphic_card_gb + weight + msoffice + discount, data = laptopSales_clean)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -15.451  -5.365   0.000   2.365  28.982 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value            Pr(>|t|)    
#> (Intercept)     -185.3185    42.1272  -4.399 0.00003804387984601 ***
#> modelAspire        1.3644     9.5800   0.142            0.887156    
#> modelASUS         24.0496    10.2740   2.341            0.022096 *  
#> modelEnvy         71.8139    12.6564   5.674 0.00000029154314923 ***
#> modelF17          24.7945    13.0277   1.903            0.061125 .  
#> modelGAMING       16.4437    12.5667   1.309            0.194978    
#> modelGF63         -0.8614    10.9117  -0.079            0.937302    
#> modelGram         40.8222    10.5906   3.855            0.000254 ***
#> modelIdeapad      -5.1018     8.1063  -0.629            0.531157    
#> modelIdeaPad       3.7114     7.7551   0.479            0.633733    
#> modelINBook      -12.0161     8.9743  -1.339            0.184921    
#> modelInspiron     15.0390     7.2856   2.064            0.042705 *  
#> modelIntel         9.1623    11.7078   0.783            0.436513    
#> modelLegion        6.1580    10.8985   0.565            0.573858    
#> modelModern       -6.6307     9.3473  -0.709            0.480453    
#> modelNotebook    -25.2192    10.2190  -2.468            0.016039 *  
#> modelPavilion      9.2791     8.0502   1.153            0.252975    
#> modelPrestige     38.0203    11.9681   3.177            0.002216 ** 
#> modelPureBook     -2.8073     9.0932  -0.309            0.758444    
#> modelROG          29.7220    10.4791   2.836            0.005963 ** 
#> modelThinkbook    12.5684    12.0683   1.041            0.301257    
#> modelThinkBook     8.5806    10.3670   0.828            0.410662    
#> modelThinkPad      7.8123     8.3575   0.935            0.353122    
#> modelTUF          13.1599    10.6349   1.237            0.220065    
#> modelVivoBook     -7.1481     7.6145  -0.939            0.351092    
#> modelVostro       -3.2908     7.9912  -0.412            0.681739    
#> modelYoga         17.8446    12.3907   1.440            0.154279    
#> modelZenBook      38.6399    11.9727   3.227            0.001902 ** 
#> processor_name   135.7023    13.6819   9.918 0.00000000000000561 ***
#> processor_gnrtn   24.5405    11.3559   2.161            0.034115 *  
#> ssd                1.2354     0.3551   3.479            0.000870 ***
#> graphic_card_gb    6.1557     2.3930   2.572            0.012223 *  
#> weight            -4.0548     2.3502  -1.725            0.088877 .  
#> msoffice           7.4183     2.9136   2.546            0.013097 *  
#> discount          -4.5371     1.2834  -3.535            0.000727 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 9.273 on 70 degrees of freedom
#>   (20 observations deleted due to missingness)
#> Multiple R-squared:  0.9335, Adjusted R-squared:  0.9012 
#> F-statistic: 28.89 on 34 and 70 DF,  p-value: < 0.00000000000000022

Model Stepwise Regression forward

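# Perform stepwise model selection using AIC.
# Note: the starting model is already the full model and no scope is given,
# so there is nothing left to add and the result equals model_allTuning.
model_forward <- stepAIC(model_allTuning, 
                            direction = "forward", 
                            trace = FALSE)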

# Print the summary of the simplified model
summary(model_forward)
#> 
#> Call:
#> lm(formula = latest_price ~ brand + model + processor_brand + 
#>     processor_name + processor_gnrtn + ram_gb + ram_type + ssd + 
#>     hdd + os + os_bit + graphic_card_gb + weight + warranty + 
#>     Touchscreen + msoffice + discount + star_rating + ratings, 
#>     data = laptopSales_clean)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -15.419  -4.566   0.000   2.386  29.256 
#> 
#> Coefficients: (16 not defined because of singularities)
#>                   Estimate Std. Error t value         Pr(>|t|)    
#> (Intercept)     -161.53865   59.91917  -2.696         0.008897 ** 
#> brandASUS         38.24889   18.47446   2.070         0.042334 *  
#> brandDELL         -4.85797   15.19335  -0.320         0.750173    
#> brandHP            0.75717   17.12698   0.044         0.964871    
#> brandInfinix     -18.93953   12.13192  -1.561         0.123275    
#> brandLenovo        7.49561   14.32174   0.523         0.602468    
#> brandLG           42.08832   17.75733   2.370         0.020710 *  
#> brandMi          -26.21575   17.69882  -1.481         0.143308    
#> brandMSI          29.58811   14.66149   2.018         0.047651 *  
#> brandNokia        -1.01243   17.10427  -0.059         0.952978    
#> modelAspire        0.48341   17.07611   0.028         0.977501    
#> modelASUS        -13.55734   13.30977  -1.019         0.312112    
#> modelEnvy         69.19190   13.37664   5.173 0.00000233683067 ***
#> modelF17         -12.64937   15.17152  -0.834         0.407425    
#> modelGAMING       17.96033   12.06157   1.489         0.141236    
#> modelGF63        -30.74499   16.85820  -1.824         0.072719 .  
#> modelGram               NA         NA      NA               NA    
#> modelIdeapad     -13.83130   15.99896  -0.865         0.390438    
#> modelIdeaPad      -4.74803   16.16594  -0.294         0.769904    
#> modelINBook             NA         NA      NA               NA    
#> modelInspiron     18.33226    5.12381   3.578         0.000655 ***
#> modelIntel              NA         NA      NA               NA    
#> modelLegion       -2.42014   16.63331  -0.145         0.884760    
#> modelModern      -35.17965   16.16134  -2.177         0.033079 *  
#> modelNotebook           NA         NA      NA               NA    
#> modelPavilion      7.56497    8.60453   0.879         0.382491    
#> modelPrestige           NA         NA      NA               NA    
#> modelPureBook           NA         NA      NA               NA    
#> modelROG          -6.75657   12.83465  -0.526         0.600353    
#> modelThinkbook     4.60508   18.89732   0.244         0.808227    
#> modelThinkBook     0.59571   17.81181   0.033         0.973421    
#> modelThinkPad     -0.29711   17.50943  -0.017         0.986513    
#> modelTUF         -26.77617   13.65186  -1.961         0.054058 .  
#> modelVivoBook    -45.78300   10.30042  -4.445 0.00003445731408 ***
#> modelVostro             NA         NA      NA               NA    
#> modelYoga               NA         NA      NA               NA    
#> modelZenBook            NA         NA      NA               NA    
#> processor_brand         NA         NA      NA               NA    
#> processor_name   132.47803   15.34587   8.633 0.00000000000196 ***
#> processor_gnrtn   24.23635   11.67709   2.076         0.041836 *  
#> ram_gb                  NA         NA      NA               NA    
#> ram_typeLPDDR3          NA         NA      NA               NA    
#> ram_typeLPDDR4X    8.41653   11.53153   0.730         0.468051    
#> ssd                1.09634    0.40255   2.723         0.008258 ** 
#> hdd                     NA         NA      NA               NA    
#> os                      NA         NA      NA               NA    
#> os_bit                  NA         NA      NA               NA    
#> graphic_card_gb    7.09128    2.61252   2.714         0.008465 ** 
#> weight            -3.50159    2.64432  -1.324         0.190005    
#> warranty           2.12188    3.92034   0.541         0.590159    
#> Touchscreen             NA         NA      NA               NA    
#> msoffice           7.22968    3.77502   1.915         0.059810 .  
#> discount          -4.77728    1.44059  -3.316         0.001486 ** 
#> star_rating       -5.58194   14.09798  -0.396         0.693427    
#> ratings           -0.06325    0.11401  -0.555         0.580939    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 9.467 on 66 degrees of freedom
#>   (20 observations deleted due to missingness)
#> Multiple R-squared:  0.9346, Adjusted R-squared:  0.897 
#> F-statistic: 24.83 on 38 and 66 DF,  p-value: < 0.00000000000000022

Model Stepwise Regression both

# Perform stepwise model selection using AIC
model_both <- stepAIC(model_allTuning, 
                            direction = "both", 
                            trace = FALSE)

# Print the summary of the simplified model
summary(model_both)
#> 
#> Call:
#> lm(formula = latest_price ~ model + processor_name + processor_gnrtn + 
#>     ssd + graphic_card_gb + weight + msoffice + discount, data = laptopSales_clean)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -15.451  -5.365   0.000   2.365  28.982 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value            Pr(>|t|)    
#> (Intercept)     -185.3185    42.1272  -4.399 0.00003804387984601 ***
#> modelAspire        1.3644     9.5800   0.142            0.887156    
#> modelASUS         24.0496    10.2740   2.341            0.022096 *  
#> modelEnvy         71.8139    12.6564   5.674 0.00000029154314923 ***
#> modelF17          24.7945    13.0277   1.903            0.061125 .  
#> modelGAMING       16.4437    12.5667   1.309            0.194978    
#> modelGF63         -0.8614    10.9117  -0.079            0.937302    
#> modelGram         40.8222    10.5906   3.855            0.000254 ***
#> modelIdeapad      -5.1018     8.1063  -0.629            0.531157    
#> modelIdeaPad       3.7114     7.7551   0.479            0.633733    
#> modelINBook      -12.0161     8.9743  -1.339            0.184921    
#> modelInspiron     15.0390     7.2856   2.064            0.042705 *  
#> modelIntel         9.1623    11.7078   0.783            0.436513    
#> modelLegion        6.1580    10.8985   0.565            0.573858    
#> modelModern       -6.6307     9.3473  -0.709            0.480453    
#> modelNotebook    -25.2192    10.2190  -2.468            0.016039 *  
#> modelPavilion      9.2791     8.0502   1.153            0.252975    
#> modelPrestige     38.0203    11.9681   3.177            0.002216 ** 
#> modelPureBook     -2.8073     9.0932  -0.309            0.758444    
#> modelROG          29.7220    10.4791   2.836            0.005963 ** 
#> modelThinkbook    12.5684    12.0683   1.041            0.301257    
#> modelThinkBook     8.5806    10.3670   0.828            0.410662    
#> modelThinkPad      7.8123     8.3575   0.935            0.353122    
#> modelTUF          13.1599    10.6349   1.237            0.220065    
#> modelVivoBook     -7.1481     7.6145  -0.939            0.351092    
#> modelVostro       -3.2908     7.9912  -0.412            0.681739    
#> modelYoga         17.8446    12.3907   1.440            0.154279    
#> modelZenBook      38.6399    11.9727   3.227            0.001902 ** 
#> processor_name   135.7023    13.6819   9.918 0.00000000000000561 ***
#> processor_gnrtn   24.5405    11.3559   2.161            0.034115 *  
#> ssd                1.2354     0.3551   3.479            0.000870 ***
#> graphic_card_gb    6.1557     2.3930   2.572            0.012223 *  
#> weight            -4.0548     2.3502  -1.725            0.088877 .  
#> msoffice           7.4183     2.9136   2.546            0.013097 *  
#> discount          -4.5371     1.2834  -3.535            0.000727 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 9.273 on 70 degrees of freedom
#>   (20 observations deleted due to missingness)
#> Multiple R-squared:  0.9335, Adjusted R-squared:  0.9012 
#> F-statistic: 28.89 on 34 and 70 DF,  p-value: < 0.00000000000000022

Model Comparison (Goodness of fit)

Compare the adjusted R-squared values of the models built so far

# check the adjusted R-squared value for each model
summary(model_all)$adj.r.squared
#> [1] 0.8619639
summary(model_selection)$adj.r.squared
#> [1] 0.3032816
summary(model_backward)$adj.r.squared
#> [1] 0.9011722
summary(model_forward)$adj.r.squared
#> [1] 0.8969916
summary(model_both)$adj.r.squared
#> [1] 0.9011722

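💡 Conclusion: the models that explain the target variable best are model_both and model_backward, each with an adjusted R-squared of 0.9011722, or about 90.1%. The two values are identical, which suggests that both procedures converged on the same set of predictors.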
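As a rough check on how the adjusted R-squared relates to the plain R-squared, the sketch below recomputes it from the figures printed in summary(model_both). The helper name adj_r2 is hypothetical and not part of the original analysis; the sample size of 105 is inferred from the 70 residual degrees of freedom plus 35 estimated coefficients.

```r
# Hypothetical helper (not part of the original analysis):
# adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values read from summary(model_both): R2 = 0.9334813,
# 34 model degrees of freedom and 70 residual degrees of freedom
p <- 34
n <- 70 + p + 1  # 105 observations used in the fit (inferred)
result <- adj_r2(0.9334813, n, p)
print(round(result, 4))
#> [1] 0.9012
```

The penalty term (n - 1) / (n - p - 1) grows with the number of predictors, which is why model_forward, with four more coefficients, ends up with a lower adjusted value despite a slightly higher R-squared.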

Evaluation

Model Performance

comparison <- compare_performance(model_all, model_selection, model_backward, model_forward, model_both)

as.data.frame(comparison)
#>              Name Model        AIC   AIC_wt       AICc  AICc_wt        BIC   BIC_wt
#> 1       model_all    lm 17950.6912 0.00e+00 18017.3252 0.00e+00 18639.5127 0.00e+00
#> 2 model_selection    lm  1153.4333 7.52e-79  1153.6316 2.24e-70  1161.9182 6.17e-60
#> 3  model_backward    lm   795.0923 4.89e-01   834.2688 5.00e-01   890.6349 5.00e-01
#> 4   model_forward    lm   801.2644 2.23e-02   852.5144 5.46e-05   907.4228 1.13e-04
#> 5      model_both    lm   795.0923 4.89e-01   834.2688 5.00e-01   890.6349 5.00e-01
#>          R2 R2_adjusted         RMSE        Sigma
#> 1 0.8869829   0.8619639 14804.943021 16372.023085
#> 2 0.3089003   0.3032816    23.826408    24.019337
#> 3 0.9334813   0.9011722     7.571250     9.272850
#> 4 0.9346293   0.8969916     7.505633     9.466948
#> 5 0.9334813   0.9011722     7.571250     9.272850

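💡 Conclusion: the model with the smallest in-sample prediction error is model_forward, with an RMSE of 7.505633. The gap to model_both and model_backward (RMSE 7.571250) is small, and those two models have the lower AIC, AICc, and BIC. The RMSE of model_all (about 14,805) is on a very different scale, which suggests it was fitted to prices on a different scale, so its AIC and RMSE are not directly comparable with the others.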
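To see where the RMSE, Sigma, and weight columns come from, here is a minimal sketch that recomputes them from the printed values for the three stepwise models. It assumes 105 observations in each fit (residual degrees of freedom plus estimated coefficients) and restricts the Akaike weights to these three models, whereas the table computes them over all five.

```r
# Values copied from the comparison table for the three stepwise models
sigma_hat <- c(backward = 9.272850, forward = 9.466948, both = 9.272850)
df_resid  <- c(backward = 70, forward = 66, both = 70)
n_obs     <- 105  # assumption: residual df + number of estimated coefficients

# RMSE divides the residual sum of squares by n,
# Sigma divides it by the residual degrees of freedom
rmse <- sigma_hat * sqrt(df_resid / n_obs)
print(round(rmse, 3))

# Akaike weights: relative support for each model given its AIC
aic    <- c(backward = 795.0923, forward = 801.2644, both = 795.0923)
delta  <- aic - min(aic)
aic_wt <- exp(-delta / 2) / sum(exp(-delta / 2))
print(round(aic_wt, 3))
```

Because RMSE ignores the number of estimated coefficients, the larger model_forward can show a smaller RMSE while still being penalized by Sigma and by the information criteria.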

Linear Regression Assumptions

1. Linearity

plot(model_both, # model to be tested
     which = 1) # residuals vs fitted

💡 Conclusion: the residuals scatter randomly around 0 without a clear pattern, mostly between -10 and 10 (the model summary reports a full range of about -15.5 to 29.0), so the linearity assumption appears to be met

2. Normality of Residuals

# histogram of the residuals
hist(model_both$residuals)

# normal Q-Q plot of the residuals
plot(model_both, which = 2)

# Shapiro-Wilk normality test
shapiro.test(model_both$residuals)
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  model_both$residuals
#> W = 0.93301, p-value = 0.00004905
  • H0: errors are normally distributed
  • H1: errors are NOT normally distributed

H0 is rejected if the p-value < 0.05 (alpha)

Expected condition: H0 is not rejected

💡 Conclusion: p-value = 0.00004905 < 0.05, so H0 is rejected, meaning that the residuals are not normally distributed

3. Homoscedasticity of Residuals

plot(x = model_both$fitted.values, 
     y = model_both$residuals) 
abline(h = 0, col = "red") 

Breusch-Pagan hypothesis test:

  • H0: the error variance is constant (homoscedasticity)
  • H1: the error variance is NOT constant (heteroscedasticity)

Expected condition: H0 is not rejected

H0 is rejected if the p-value < 0.05 (alpha)

# Breusch-Pagan test on the model
library(lmtest)
bptest(model_both)
#> 
#>  studentized Breusch-Pagan test
#> 
#> data:  model_both
#> BP = 41.316, df = 34, p-value = 0.1814

💡 Conclusion: p-value = 0.1814 > 0.05, so H0 is not rejected, meaning that the error variance is constant (homoscedasticity)
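For readers curious about what the studentized Breusch-Pagan test computes, the following sketch reproduces the statistic by hand in base R. It uses the built-in mtcars data and an illustrative model rather than the laptop data, and the variable names are hypothetical. The idea is to regress the squared residuals on the same predictors and compare n times the R-squared of that auxiliary regression with a chi-squared distribution.

```r
# Illustrative model on a built-in data set (not the laptop data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux_data <- transform(mtcars, sq_resid = residuals(fit)^2)
aux <- lm(sq_resid ~ wt + hp, data = aux_data)

# Studentized BP statistic: n * R-squared of the auxiliary regression
bp_stat <- nrow(aux_data) * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1  # number of predictors, intercept excluded
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(BP = bp_stat, df = bp_df, p.value = bp_p))
```

If lmtest is installed, bptest(fit) should report the same statistic, since its default is the studentized version. A large p-value, as with model_both, means the squared residuals are not well explained by the predictors.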

Conclusion

Our final model (model_both) satisfies the linearity and homoscedasticity assumptions, but not the normality of residuals. Its adjusted R-squared is high: the predictors explain about 90.1% of the variance in the laptop price.

During our model tuning and improvement process, we initially applied the interquartile range (IQR) method to handle outliers in the data. By removing the data points that fell outside the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR, we aimed to reduce the impact of extreme values on our model’s performance.
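The IQR rule described above can be sketched as follows. The helper iqr_keep and the toy vector are hypothetical and only illustrate the rule; they are not the code used to build laptopSales_clean.

```r
# Hypothetical helper: TRUE for values inside [Q1 - k * IQR, Q3 + k * IQR]
iqr_keep <- function(x, k = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  spread <- q[2] - q[1]          # interquartile range
  lower  <- q[1] - k * spread    # lower fence
  upper  <- q[2] + k * spread    # upper fence
  !is.na(x) & x >= lower & x <= upper
}

prices <- c(1, 2, 3, 4, 5, 100)  # toy values, not from the laptop data
kept <- prices[iqr_keep(prices)]
print(kept)
```

On a data frame, the same logical vector could be used to subset rows, for example by applying iqr_keep to the price column before fitting the model.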

However, despite this preprocessing step, we encountered an issue with the Normality of Residuals. The Shapiro-Wilk test rejected normality (W = 0.93301, p-value = 0.00004905), so the residuals of the final model are not normally distributed. This mainly affects the reliability of the t-tests, p-values, and confidence intervals reported for the coefficients, especially since only 105 observations were used in the fit.

The Homoscedasticity of Residuals, in contrast, held. The Breusch-Pagan test gave a p-value of 0.1814 > 0.05, so we found no evidence that the spread of the errors changes across the range of fitted values.

We also found encouraging results in terms of Linearity. The residuals scattered randomly around zero, mostly between -10 and 10, which indicates that the relationship between the predictor variables and the response variable is adequately captured by our model.

Given the non-normal residuals, we recognize the need for further model refinement. Future steps could include transforming the response, for example with a logarithmic or power transformation, or using robust regression techniques that are less sensitive to heavy-tailed errors. Weighted least squares would become relevant only if heteroscedasticity appeared after refitting.

It’s important to note that linearity and homoscedasticity alone do not guarantee valid inference; the normality of the residuals also matters for the tests and intervals we rely on. By recognizing and addressing this challenge, we can continue working towards improving the overall performance and accuracy of our model.
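As an illustration of the logarithmic transformation mentioned among the future steps, the sketch below uses simulated right-skewed data, not the laptop data, and compares the Shapiro-Wilk p-value of the residuals before and after taking the log of the response. All names and parameter values are assumptions chosen for the demonstration; whether a log transform helps on the actual prices would need to be checked.

```r
set.seed(42)  # simulated data, not the laptop data
n <- 200
x <- runif(n, min = 0, max = 4)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.5))  # right-skewed, multiplicative noise

fit_raw <- lm(y ~ x)       # response on its original scale
fit_log <- lm(log(y) ~ x)  # log-transformed response

# Shapiro-Wilk p-values for the residuals of each fit
p_raw <- shapiro.test(residuals(fit_raw))$p.value
p_log <- shapiro.test(residuals(fit_log))$p.value
print(c(raw = p_raw, log = p_log))
```

In this simulation the noise is multiplicative by construction, so the log model is the correctly specified one. On real data the comparison only indicates whether the transformation moves the residuals closer to normality, and coefficients of the log model must then be interpreted on the log-price scale.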