
We continue our notes on linear regression. In this article, we’re
going to do multiple linear regression with R, which we did with
Python last time. We covered the theory in the first four articles of
Notes on Linear Regression, so I won’t go into it here. First, to make
the problem clear, I’d like to talk a bit about the dataset we have,
what we’re trying to solve with the model we’re going to build, and
which attributes correspond to which variables.
Multiple Linear Regression Definition
Multiple linear regression is a statistical technique used to model
the relationship between one dependent variable and two or more
independent (predictor) variables. The method extends simple linear
regression, which deals with one independent variable, to cases where
the outcome depends on multiple factors.
Y = β0 + β1X1 + β2X2 + … + βnXn + ϵ
Where:
- Y is the dependent variable,
- X1, X2, …, Xn are the independent variables,
- β0 is the intercept,
- β1, β2, …, βn are the coefficients for the independent variables,
each representing the change in 𝑌 for a one-unit change in the
corresponding 𝑋 while the other variables are held constant,
- ϵ is the error term, accounting for variability not explained by
the model.
The goal of multiple linear regression is to find the values of β0,
β1, …, βn that minimize the differences between the observed and
predicted values of 𝑌. The usual method, ordinary least squares,
minimizes the sum of the squared differences.
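To make the least squares idea concrete, here is a minimal sketch that estimates the coefficients by solving the normal equations and compares them with `lm()`. The data are simulated for illustration (the variable names `x1`, `x2`, and the true coefficients are assumptions, not part of the company dataset).

```r
# Simulated data (hypothetical, not the company dataset)
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + 2 * x1 - 1 * x2 + rnorm(n, sd = 0.5)

# Design matrix with a leading column of ones for the intercept
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Ordinary least squares: solve the normal equations (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() fits the same model (it uses a QR decomposition internally)
fit <- lm(y ~ x1 + x2)

print(drop(beta_hat))
print(coef(fit))
```

Both printouts should agree up to floating-point error. In practice `lm()` is preferable, since solving the normal equations directly is less stable numerically.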
library(readr)  # provides read_csv()
Sirket_Harcama_Kar_Bilgileri <- read_csv("Sirket_Harcama_Kar_Bilgileri.csv")
## Rows: 50 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Sehir
## dbl (4): ArgeHarcamasi, YonetimGiderleri, PazarlamaHarcamasi, Kar
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(Sirket_Harcama_Kar_Bilgileri)
regressor <- lm(formula = Kar ~ ArgeHarcamasi + YonetimGiderleri + PazarlamaHarcamasi,
data = Sirket_Harcama_Kar_Bilgileri)
summary(regressor)
##
## Call:
## lm(formula = Kar ~ ArgeHarcamasi + YonetimGiderleri + PazarlamaHarcamasi,
## data = Sirket_Harcama_Kar_Bilgileri)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33534 -4795 63 6606 17275
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.012e+04 6.572e+03 7.626 1.06e-09 ***
## ArgeHarcamasi 8.057e-01 4.515e-02 17.846 < 2e-16 ***
## YonetimGiderleri -2.682e-02 5.103e-02 -0.526 0.602
## PazarlamaHarcamasi 2.723e-02 1.645e-02 1.655 0.105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9232 on 46 degrees of freedom
## Multiple R-squared: 0.9507, Adjusted R-squared: 0.9475
## F-statistic: 296 on 3 and 46 DF, p-value: < 2.2e-16
Interpretation
We have 3 independent variables.
R&D Expenses (ArgeHarcamasi) is significant as P<0.05, so the null
hypothesis H0 (that its coefficient is zero) is rejected. There is a
positive relationship between profit and R&D expenses.
Management Expenses (YonetimGiderleri) is insignificant as P>0.05, so
we fail to reject the null hypothesis H0. With the other variables in
the model, there is no statistically significant evidence of a
relationship between profit and management expenses.
Marketing Expenditure (PazarlamaHarcamasi) is insignificant as P >
0.05, so we fail to reject the null hypothesis H0. With the other
variables in the model, there is no statistically significant evidence
of a relationship between profit and marketing expenditure.
R-squared (R2): 0.9507 indicates that the independent variables
explain approximately 95.07% of the variability in the dependent
variable (profit).
Adjusted R2: after adjusting for the number of predictors, 94.75% of
the variation in the dependent variable is explained by the independent
variables. The rest is not explained by the model.
F-statistic: 296 on 3 and 46 degrees of freedom (tests the validity
of the model in general terms).
P-value: < 2.2e-16 (P<0.05, indicating that at least one independent
variable has a non-zero coefficient). The model is significant.
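The figures in this interpretation can be checked with a few lines of arithmetic. The sketch below uses only numbers copied from the summary output earlier in the article. For ArgeHarcamasi the output reports only "< 2e-16", so 2e-16 is used as an upper bound, which is an assumption made for illustration.

```r
# Values copied from the summary() output
r2     <- 0.9507  # Multiple R-squared
n      <- 50      # number of observations
p      <- 3       # number of predictors
f_stat <- 296     # F-statistic on 3 and 46 DF

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))

# Overall p-value of the model: upper tail of the F distribution
f_p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)
print(f_p_value < 0.05)

# Coefficient p-values (2e-16 is an upper bound, see the note above)
p_values <- c(ArgeHarcamasi = 2e-16,
              YonetimGiderleri = 0.602,
              PazarlamaHarcamasi = 0.105)
significant <- p_values < 0.05
print(significant)
```

The adjusted R-squared computed this way rounds to the 0.9475 reported by R, and only ArgeHarcamasi passes the 0.05 threshold.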
library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
# Residual standard error as a share of mean profit
9232/mean(Sirket_Harcama_Kar_Bilgileri$Kar)
## [1] 0.08241927
coef(regressor)
## (Intercept) ArgeHarcamasi YonetimGiderleri PazarlamaHarcamasi
## 5.012219e+04 8.057150e-01 -2.681597e-02 2.722806e-02
Our multiple linear regression model:
y = 50122.19 + 0.8057 × ArgeHarcamasi - 0.02682 × YonetimGiderleri + 0.02723 × PazarlamaHarcamasi
A one-unit increase in the independent variable X1 (ArgeHarcamasi) is
estimated to increase the dependent variable Y by 0.8057 units, holding
the other variables constant.
A one-unit increase in X2 (YonetimGiderleri) is estimated to decrease
Y by 0.02682 units, although this coefficient is not statistically
significant.
A one-unit increase in X3 (PazarlamaHarcamasi) is estimated to
increase Y by 0.02723 units, although this coefficient is not
statistically significant either.
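A coefficient is the change in predicted Kar for a one-unit increase in one predictor while the others stay fixed. The sketch below illustrates this with the coefficient estimates printed earlier. The helper `predict_kar` and the spending figures for the example company are hypothetical, invented only for illustration.

```r
# Coefficients copied from the coefficient output
b <- c(Intercept          = 5.012219e+04,
       ArgeHarcamasi      = 8.057150e-01,
       YonetimGiderleri   = -2.681597e-02,
       PazarlamaHarcamasi = 2.722806e-02)

# Hypothetical helper: predicted profit from the fitted equation
predict_kar <- function(arge, yonetim, pazarlama) {
  unname(b["Intercept"] +
         b["ArgeHarcamasi"] * arge +
         b["YonetimGiderleri"] * yonetim +
         b["PazarlamaHarcamasi"] * pazarlama)
}

# Invented spending figures for an example company
base     <- predict_kar(100000, 120000, 200000)
plus_one <- predict_kar(100001, 120000, 200000)

# One extra unit of R&D spending, everything else unchanged
print(plus_one - base)
```

The printed difference equals the ArgeHarcamasi coefficient (about 0.8057) up to floating-point error. With the fitted `regressor` object at hand, `predict(regressor, newdata = ...)` does the same job.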
Understanding Data
The dataset contains the following attributes:
ArgeHarcamasi (R&D Expenditures): R&D expenditures made by the
company during the survey period. Attribute type numeric.
YonetimGiderleri (Management Expenses): Administrative expenses
incurred for the management of the company during the survey period.
Attribute type numeric.
PazarlamaHarcamasi (Marketing Expenditures): Expenses incurred by
the company for marketing activities during the survey period.
Attribute type numeric.
Sehir (City): The city where the company operates. Attribute type
categorical, unordered.
Kar (Profit): Profit earned by the company during the survey period.
Attribute type numeric.
To summarize, we have a total of 5 attributes and 50 observations
(companies) in our dataset.
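The attribute types described above can be checked in R. The sketch below uses a toy data frame with the same column names. All values, including the city names, are invented for illustration and are not taken from the CSV file.

```r
# Toy data frame with the same columns (all values are invented)
toy <- data.frame(
  ArgeHarcamasi      = c(1000, 2000, 1500),
  YonetimGiderleri   = c(500, 700, 600),
  PazarlamaHarcamasi = c(300, 900, 400),
  Sehir              = c("Ankara", "Istanbul", "Izmir"),
  Kar                = c(5000, 9000, 7000),
  stringsAsFactors   = FALSE
)

# Store the unordered categorical attribute as a factor
toy$Sehir <- factor(toy$Sehir)

# Class of each column, then number of rows and columns
col_types <- sapply(toy, class)
print(col_types)
print(dim(toy))
print(levels(toy$Sehir))
```

On the real dataset, the same calls applied to Sirket_Harcama_Kar_Bilgileri should report 50 rows and 5 columns, with Sehir as the only non-numeric column.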