We continue our notes on linear regression. In this article, we’re going to do multiple linear regression with R, which we did with Python last time. We covered the theory in the first four articles of Notes on Linear Regression, so I won’t go into it here beyond a short definition. To make the example easier to follow, I’d also like to talk a bit about the dataset we have, what we’re trying to solve with the model we’re going to build, and which attributes correspond to which variables.

Multiple Linear Regression Definition

Multiple linear regression is a statistical technique used to model the relationship between one dependent variable and two or more independent (predictor) variables. The method extends simple linear regression, which deals with one independent variable, to cases where the outcome depends on multiple factors.

The general formula for a multiple linear regression model is:

Y = β0 + β1X1 + β2X2 + … + βnXn + ϵ

Where:

- Y is the dependent variable,

- X1, X2, …, Xn are the independent variables,

- β0 is the intercept,

- β1, β2, …, βn are the coefficients for the independent variables, each representing the change in Y for a one-unit change in the corresponding X while the other independent variables are held constant,

- ϵ is the error term, accounting for variability not explained by the model.

The goal of multiple linear regression is to find the values of β0, β1, β2, …, βn that minimize the differences between the observed and predicted values of Y. The usual method is ordinary least squares, which minimizes the sum of the squared differences.
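To make the least squares idea concrete, here is a minimal sketch on synthetic data. The variable names and the "true" coefficients (2, 3, -1.5) are made up for illustration and have nothing to do with the company dataset. It solves the normal equations by hand and compares the result with what lm() returns.

```r
# Synthetic data: the coefficients 2, 3 and -1.5 are invented for illustration.
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)

# Design matrix with a leading column of ones for the intercept.
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Normal equations: solve (X'X) beta = X'y for beta.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() fits the same least squares model.
fit <- lm(y ~ x1 + x2)

# The two columns should agree up to rounding error.
print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))
```

Both columns should match, which is a quick way to see that lm() is doing ordinary least squares under the hood.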

library(readr)
Sirket_Harcama_Kar_Bilgileri <- read_csv("Sirket_Harcama_Kar_Bilgileri.csv")
## Rows: 50 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Sehir
## dbl (4): ArgeHarcamasi, YonetimGiderleri, PazarlamaHarcamasi, Kar
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(Sirket_Harcama_Kar_Bilgileri)
regressor = lm(formula = Kar ~ ArgeHarcamasi + YonetimGiderleri + PazarlamaHarcamasi,
               data = Sirket_Harcama_Kar_Bilgileri)
summary(regressor)
## 
## Call:
## lm(formula = Kar ~ ArgeHarcamasi + YonetimGiderleri + PazarlamaHarcamasi, 
##     data = Sirket_Harcama_Kar_Bilgileri)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33534  -4795     63   6606  17275 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         5.012e+04  6.572e+03   7.626 1.06e-09 ***
## ArgeHarcamasi       8.057e-01  4.515e-02  17.846  < 2e-16 ***
## YonetimGiderleri   -2.682e-02  5.103e-02  -0.526    0.602    
## PazarlamaHarcamasi  2.723e-02  1.645e-02   1.655    0.105    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9232 on 46 degrees of freedom
## Multiple R-squared:  0.9507, Adjusted R-squared:  0.9475 
## F-statistic:   296 on 3 and 46 DF,  p-value: < 2.2e-16

Interpretation

We have 3 independent variables: ArgeHarcamasi, YonetimGiderleri and PazarlamaHarcamasi.

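R&D Expenses (ArgeHarcamasi) is significant: P < 0.05, so the null hypothesis H0 (that its coefficient is zero) is rejected. The coefficient is positive, so there is a positive relationship between profit and R&D expenses when the other variables are held constant.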

Management Expenses (YonetimGiderleri) is not significant: P = 0.602 > 0.05, so we fail to reject the null hypothesis H0. Given the other variables in the model, the data show no evidence of a linear relationship between profit and management expenses.

Marketing Expenditure (PazarlamaHarcamasi) is not significant: P = 0.105 > 0.05, so we fail to reject the null hypothesis H0. Given the other variables in the model, the data show no evidence of a linear relationship between profit and marketing expenditure.

R-squared (R2): 0.9507. The independent variables explain approximately 95.07% of the variability in the dependent variable (Kar).

Adjusted R2: 0.9475. After adjusting for the number of independent variables, 94.75% of the variation in the dependent variable is explained by the model. The rest comes from factors that are not in the model and from random error.

F-statistic: 296 on 3 and 46 degrees of freedom (tests the validity of the model as a whole).

P-value: < 2.2e-16 (P < 0.05, indicating that at least one independent variable has a non-zero coefficient). The model is significant.
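The figures interpreted above can also be pulled out of the summary object instead of being read off the printout. The sketch below assumes a synthetic stand-in for the CSV: the column names match the article, but every value is randomly generated, so the numbers it prints will not match the output above.

```r
# Synthetic stand-in for the CSV (values are invented, not the article's data).
set.seed(1)
n <- 50
df <- data.frame(
  ArgeHarcamasi      = runif(n, 0, 160000),
  YonetimGiderleri   = runif(n, 50000, 180000),
  PazarlamaHarcamasi = runif(n, 0, 450000)
)
df$Kar <- 50000 + 0.8 * df$ArgeHarcamasi + rnorm(n, sd = 9000)

fit <- lm(Kar ~ ArgeHarcamasi + YonetimGiderleri + PazarlamaHarcamasi, data = df)
s   <- summary(fit)

# Per-coefficient p-values: the "Pr(>|t|)" column of the coefficient table.
p_values <- coef(s)[, "Pr(>|t|)"]
print(p_values < 0.05)

# R-squared and adjusted R-squared.
print(c(r2 = s$r.squared, adj_r2 = s$adj.r.squared))

# Overall F-test: the statistic, its degrees of freedom, and the p-value.
f   <- s$fstatistic
f_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(f_p)
```

With the real data, the same calls on summary(regressor) should reproduce the values discussed in this section.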

library(ggplot2)
library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
9232/mean(Sirket_Harcama_Kar_Bilgileri$Kar)
## [1] 0.08241927
regressor$coefficients
##        (Intercept)      ArgeHarcamasi   YonetimGiderleri PazarlamaHarcamasi 
##       5.012219e+04       8.057150e-01      -2.681597e-02       2.722806e-02
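The ratio 9232/mean(Kar) computed above expresses the residual standard error as a fraction of the mean profit, about 8.2% here. Rather than typing 9232 by hand, the value can be taken from the fitted model with sigma(). The sketch below shows this on a small synthetic model; the data and variable names are invented for illustration.

```r
# Synthetic one-predictor example (invented data).
set.seed(2)
x <- runif(50, 0, 100)
y <- 10 + 2 * x + rnorm(50, sd = 5)
fit <- lm(y ~ x)

# Residual standard error as reported by summary(fit).
rse <- sigma(fit)

# The same quantity by hand: sqrt(RSS / residual degrees of freedom).
rse_manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# Residual standard error relative to the mean of the response.
rel_rse <- rse / mean(y)
print(c(rse = rse, rse_manual = rse_manual, rel_rse = rel_rse))
```

For the model in this article, the equivalent call would be sigma(regressor) / mean(Sirket_Harcama_Kar_Bilgileri$Kar).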

Our multiple linear regression model

y = 50122.19 + 0.8057 × ArgeHarcamasi − 0.02682 × YonetimGiderleri + 0.02723 × PazarlamaHarcamasi

A one-unit increase in the independent variable X1 (ArgeHarcamasi) is estimated to increase the dependent variable Y by 0.8057 units, holding the other variables constant.

If X2 (YonetimGiderleri) increases by one unit, Y is expected to decrease by 0.02682 units, although this coefficient is not statistically significant.

A one-unit increase in the independent variable X3 (PazarlamaHarcamasi) is estimated to increase the dependent variable Y by 0.02723 units, though this coefficient is not significant at the 0.05 level.
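As a worked example of reading the coefficients, the sketch below plugs a hypothetical company into the fitted equation. The coefficients are copied from the regressor$coefficients output; the three spending figures are assumptions chosen only to illustrate the arithmetic.

```r
# Coefficients copied from the regressor$coefficients output above.
b <- c("(Intercept)"      = 5.012219e+04,
       ArgeHarcamasi      = 8.057150e-01,
       YonetimGiderleri   = -2.681597e-02,
       PazarlamaHarcamasi = 2.722806e-02)

# Hypothetical company (these figures are made up).
new_company <- c(ArgeHarcamasi      = 100000,
                 YonetimGiderleri   = 120000,
                 PazarlamaHarcamasi = 200000)

# Predicted profit: intercept plus each coefficient times its variable.
predicted_kar <- b[["(Intercept)"]] + sum(b[names(new_company)] * new_company)
print(predicted_kar)

# Raise R&D spending by 1000 units and keep everything else fixed.
bumped <- new_company
bumped[["ArgeHarcamasi"]] <- bumped[["ArgeHarcamasi"]] + 1000
bumped_kar <- b[["(Intercept)"]] + sum(b[names(bumped)] * bumped)

# The difference equals 1000 times the ArgeHarcamasi coefficient.
delta <- bumped_kar - predicted_kar
print(delta)
```

With the fitted model object at hand, predict(regressor, newdata = ...) with a one-row data frame should give the same prediction without copying coefficients by hand.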

Understanding Data

In the table above we see our attributes:

ArgeHarcamasi (R&D Expenditures): R&D expenditures made by the company during the survey period. Attribute type: numeric.

YonetimGiderleri (Management Expenses): Administrative expenses incurred for the management of the company during the survey period. Attribute type: numeric.

PazarlamaHarcamasi (Marketing Expenditures): Expenses incurred by the company for marketing activities during the survey period. Attribute type: numeric.

Sehir (City): The city where the company operates. Attribute type: categorical, unordered.

Kar (Profit): Profit earned by the company during the survey period. Attribute type: numeric.

To summarize, we have a total of 5 attributes and 50 observations (companies) in our dataset.
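The attribute types listed above can be checked directly in R. The sketch below uses a tiny synthetic data frame with the same column names; the values and city names are invented placeholders, not rows from the CSV. It also stores Sehir as a factor, which is how R represents an unordered categorical attribute.

```r
# Tiny synthetic frame with the article's column names (all values invented).
df <- data.frame(
  ArgeHarcamasi      = c(100000, 80000, 60000, 40000),
  YonetimGiderleri   = c(120000, 110000, 130000, 90000),
  PazarlamaHarcamasi = c(200000, 150000, 100000, 50000),
  Sehir              = c("CityA", "CityB", "CityC", "CityA"),
  Kar                = c(130000, 115000, 98000, 82000),
  stringsAsFactors   = FALSE
)

# Store the unordered categorical attribute as a factor.
df$Sehir <- factor(df$Sehir)

# Number of observations and attributes.
print(dim(df))

# Class of each column: four numeric, one factor.
col_classes <- sapply(df, class)
print(col_classes)

# Distinct categories of Sehir.
print(levels(df$Sehir))
```

On the real dataset, dim() should report 50 rows and 5 columns, matching the summary above.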