Lecturer : Prof. Dr. Suhartono, M.Kom
Course : Linear Algebra
Study Program : Teknik Informatika (Informatics Engineering)
Institution : Universitas Islam Negeri Maulana Malik Ibrahim Malang
Multiple linear regression is a linear regression model that involves more than one independent variable. It has the form Y = a + b1 X1 + b2 X2 + ... + bn Xn, where Y is the dependent variable, X1, ..., Xn are the independent variables, a is the constant (intercept), and b1, ..., bn are the regression coefficients of the respective independent variables. The analysis below uses the Covid-19 Self Isolation data and the Google Mobility Index for August 2020.
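As a minimal sketch of the model form above, the following fits a multiple linear regression on simulated data. The names x1, x2, y, sim, fit_multi and the true coefficients are assumptions chosen for illustration; this is not the Self Isolation data.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# True model: a = 2, b1 = 3, b2 = -1.5, plus random noise
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)
sim <- data.frame(y, x1, x2)

# Multiple linear regression: one response, two predictors
fit_multi <- lm(y ~ x1 + x2, data = sim)

# Estimates of a, b1 and b2, which should land near 2, 3 and -1.5
est <- coef(fit_multi)
print(est)
```

With the mobility data, the same pattern would place the six mobility columns on the right-hand side of the formula, separated by `+`.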
library(readxl)
DataSelfIsolation <- read_excel(path = "Data Self Isolation Agustus 2020.xlsx")
DataSelfIsolation
library(ggplot2)
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.1.3
x <- DataSelfIsolation$`Self Isolation`
retail <- DataSelfIsolation$retail_and_recreation_percent_change_from_baseline
grocery <- DataSelfIsolation$grocery_and_pharmacy_percent_change_from_baseline
park <- DataSelfIsolation$parks_percent_change_from_baseline
station <- DataSelfIsolation$transit_stations_percent_change_from_baseline
workplace <- DataSelfIsolation$workplaces_percent_change_from_baseline
residential <- DataSelfIsolation$residential_percent_change_from_baseline
df <- data.frame(x, retail, grocery, park, station, workplace, residential)
# melt the data to a long format
df2 <- melt(data = df, id.vars = "x")
# plot, using the aesthetics argument 'colour'
ggplot(data = df2, aes(x = x, y = value, colour = variable)) +
  geom_point() +
  geom_line() +
  theme(legend.justification = "top") +
  labs(title = "Google Mobility Index",
       subtitle = "DKI Jakarta Province, Indonesia, August 2020",
       y = "Mobility", x = "Data Self Isolation") +
  theme(axis.text.x = element_text(angle = -90))
summary(DataSelfIsolation)
## Tanggal Self Isolation
## Min. :2020-08-01 00:00:00 Min. :4128
## 1st Qu.:2020-08-08 12:00:00 1st Qu.:5087
## Median :2020-08-16 00:00:00 Median :6040
## Mean :2020-08-16 00:00:00 Mean :5670
## 3rd Qu.:2020-08-23 12:00:00 3rd Qu.:6307
## Max. :2020-08-31 00:00:00 Max. :6602
## retail_and_recreation_percent_change_from_baseline
## Min. :-37.00
## 1st Qu.:-32.00
## Median :-27.00
## Mean :-28.32
## 3rd Qu.:-26.00
## Max. :-21.00
## grocery_and_pharmacy_percent_change_from_baseline
## Min. :-19.000
## 1st Qu.:-14.000
## Median : -9.000
## Mean : -9.871
## 3rd Qu.: -7.000
## Max. : -2.000
## parks_percent_change_from_baseline
## Min. :-71.00
## 1st Qu.:-65.50
## Median :-62.00
## Mean :-62.06
## 3rd Qu.:-59.00
## Max. :-49.00
## transit_stations_percent_change_from_baseline
## Min. :-57.00
## 1st Qu.:-42.00
## Median :-41.00
## Mean :-40.84
## 3rd Qu.:-38.50
## Max. :-32.00
## workplaces_percent_change_from_baseline
## Min. :-70.0
## 1st Qu.:-33.0
## Median :-31.0
## Mean :-30.1
## 3rd Qu.:-20.0
## Max. :-12.0
## residential_percent_change_from_baseline
## Min. : 7.00
## 1st Qu.: 9.00
## Median :14.00
## Mean :12.58
## 3rd Qu.:14.00
## Max. :21.00
pairs(DataSelfIsolation)
pairs(DataSelfIsolation, lower.panel=NULL)
plot(`Self Isolation` ~ Tanggal, data = DataSelfIsolation)
Visualizing the data with Self Isolation as the Y variable and the Google Mobility Index as the X variable:
# Sum the six mobility indices into one X value per day; plot() takes no `data` argument for x/y vectors
mobility_total <- with(DataSelfIsolation, retail_and_recreation_percent_change_from_baseline + grocery_and_pharmacy_percent_change_from_baseline + parks_percent_change_from_baseline + transit_stations_percent_change_from_baseline + workplaces_percent_change_from_baseline + residential_percent_change_from_baseline)
plot(mobility_total, DataSelfIsolation$`Self Isolation`, xlab = "Sum of Google Mobility Index changes", ylab = "Self Isolation")
cor(DataSelfIsolation$`Self Isolation`,DataSelfIsolation$retail_and_recreation_percent_change_from_baseline)
## [1] -0.2427305
cor(DataSelfIsolation$`Self Isolation`,DataSelfIsolation$grocery_and_pharmacy_percent_change_from_baseline)
## [1] -0.4488862
cor(DataSelfIsolation$`Self Isolation`,DataSelfIsolation$parks_percent_change_from_baseline)
## [1] 0.2001587
cor(DataSelfIsolation$`Self Isolation`,DataSelfIsolation$transit_stations_percent_change_from_baseline)
## [1] -0.2089803
cor(DataSelfIsolation$`Self Isolation`,DataSelfIsolation$workplaces_percent_change_from_baseline)
## [1] -0.2600343
cor(DataSelfIsolation$`Self Isolation`,DataSelfIsolation$residential_percent_change_from_baseline)
## [1] 0.2144458
model <- lm(DataSelfIsolation$`Self Isolation` ~ DataSelfIsolation$Tanggal, data = DataSelfIsolation)
summary(model)
##
## Call:
## lm(formula = DataSelfIsolation$`Self Isolation` ~ DataSelfIsolation$Tanggal,
## data = DataSelfIsolation)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1373.5 -587.4 322.2 625.0 959.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.658e+05 2.777e+05 0.957 0.346
## DataSelfIsolation$Tanggal -1.628e-04 1.738e-04 -0.937 0.357
##
## Residual standard error: 747.9 on 29 degrees of freedom
## Multiple R-squared: 0.02937, Adjusted R-squared: -0.004098
## F-statistic: 0.8776 on 1 and 29 DF, p-value: 0.3566
The output above summarizes the fitted model.
At the top is the lm formula: DataSelfIsolation$`Self Isolation` ~ DataSelfIsolation$Tanggal, data = DataSelfIsolation.
Below it are five residual statistics. A residual is the difference between the actual value and the predicted value, ei = Yi - (a + b Xi). If an observation lies exactly on the regression line, its residual is 0, so the smaller the residuals are in absolute value, the better the model fits. The residual statistics produced are:
Minimum = -1373.5
Maximum = 959.9
Median = 322.2
1st quartile = -587.4
3rd quartile = 625.0
These five values are the minimum, maximum, median, first quartile, and third quartile of the residuals. None of them is close to 0: the residuals run into the hundreds and beyond, while Self Isolation itself ranges only from 4128 to 6602. Together with the Multiple R-squared of 0.02937 and the F-test p-value of 0.3566, this indicates that the model does not fit the data well.
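The residual formula ei = Yi - (a + b Xi) can be checked by hand. The sketch below uses simulated data (x_sim, y_sim and fit_sim are hypothetical names, not the Self Isolation data) and compares the manual residuals with those returned by resid().

```r
# Simulated simple regression (hypothetical data, for illustration only)
set.seed(2)
x_sim <- 1:20
y_sim <- 5 + 2 * x_sim + rnorm(20)
fit_sim <- lm(y_sim ~ x_sim)

# Extract a (intercept) and b (slope) from the fitted model
a <- coef(fit_sim)[["(Intercept)"]]
b <- coef(fit_sim)[["x_sim"]]

# Residual = actual value minus predicted value
e_manual  <- y_sim - (a + b * x_sim)
e_builtin <- unname(resid(fit_sim))

# The two vectors agree up to floating-point error
print(all.equal(e_manual, e_builtin))

# Minimum, quartiles and maximum, as reported in the Residuals block of summary()
print(quantile(e_manual))
```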
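Below the residuals is the coefficient table. For this model it contains the intercept (2.658e+05) and the coefficient of DataSelfIsolation$Tanggal (-1.628e-04), each with its standard error, t value, and p-value (0.346 and 0.357). The mobility variables (retail_and_recreation, grocery_and_pharmacy, parks, transit_stations, workplaces, and residential) are not in the formula, so they do not appear in the table.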
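The very small Tanggal coefficient is easier to read per day than per second. Assuming Tanggal was read as a date-time (POSIXct), as the 00:00:00 timestamps in the summary suggest, the slope is expressed per second, and -1.628398e-04 * 86400 is about -14.07 per day, which matches the step between consecutive predicted values (5881.331 - 5867.261). The sketch below reproduces this unit conversion on a hypothetical daily series; tanggal_sim, y_daily and fit_time are illustrative names.

```r
# Hypothetical daily series: 31 days, decreasing by exactly 14 units per day
tanggal_sim <- seq(as.POSIXct("2020-08-01", tz = "UTC"), by = "day", length.out = 31)
y_daily <- 6000 - 14 * (0:30)

# lm() converts the date-time to seconds since 1970-01-01, so the slope is per second
fit_time <- lm(y_daily ~ tanggal_sim)
slope_per_second <- coef(fit_time)[[2]]

# One day has 86400 seconds
slope_per_day <- slope_per_second * 86400
print(slope_per_day)
```

Converting the column with as.Date() before fitting would be one way to obtain a per-day slope directly.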
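The ANOVA (Analysis of Variance) test compares population means to determine whether two or more groups of data differ significantly. Applied to a fitted lm model, anova() returns the analysis of variance table, which splits the variation in the response into the part explained by the predictor and the residual part and reports the corresponding F-test.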
anova(model)
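To make the contents of the ANOVA table concrete, the sketch below fits a simple regression on simulated data (x_aov, y_aov and fit_aov are hypothetical names) and verifies that the regression and residual sums of squares add up to the total sum of squares, and that the F value is the ratio of the two mean squares.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(3)
x_aov <- runif(30, 0, 10)
y_aov <- 1 + 0.8 * x_aov + rnorm(30)
fit_aov <- lm(y_aov ~ x_aov)

# Row 1 is the predictor, row 2 is the residuals
tab <- anova(fit_aov)
ss_reg <- tab[["Sum Sq"]][1]
ss_res <- tab[["Sum Sq"]][2]

# Total variation of the response around its mean
ss_tot <- sum((y_aov - mean(y_aov))^2)

# F = (regression mean square) / (residual mean square)
f_manual <- (ss_reg / tab[["Df"]][1]) / (ss_res / tab[["Df"]][2])

print(tab)
print(c(explained_plus_residual = ss_reg + ss_res, total = ss_tot))
```

The ratio ss_reg / ss_tot equals the Multiple R-squared reported by summary().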
plot(`Self Isolation` ~ Tanggal, data = DataSelfIsolation, col = "red", pch = 20, cex = 1.5, main = "Self Isolation Covid-19 DKI Jakarta, August 2020")
abline(model)
The red points in the plot are the actual data, and the black line is the model's prediction.
plot(cooks.distance(model), pch = 16, col = "red") #Plot the Cooks Distances.
plot(model)
AIC(model)
## [1] 502.1802
BIC(model)
## [1] 506.4822
head(predict(model), n = 11)
## 1 2 3 4 5 6 7 8
## 5881.331 5867.261 5853.192 5839.123 5825.053 5810.984 5796.915 5782.845
## 9 10 11
## 5768.776 5754.706 5740.637
plot(head(predict(model), n = 10))
head(resid(model), n = 11)
## 1 2 3 4 5 6
## -1198.330645 -951.261290 -769.191935 -623.122581 -697.053226 -413.983871
## 7 8 9 10 11
## 6.085484 327.154839 322.224194 497.293548 495.362903
coef(model)
## (Intercept) DataSelfIsolation$Tanggal
## 2.658127e+05 -1.628398e-04
DataSelfIsolation$residuals <- model$residuals
DataSelfIsolation$predicted <- model$fitted.values
DataSelfIsolation
scatter.smooth(x=DataSelfIsolation$Tanggal, y=DataSelfIsolation$`Self Isolation`, main="Tanggal - Self Isolation")
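# Report any outliers in the subtitle; an extra unnamed vector is treated as a second data group
boxplot(DataSelfIsolation$`Self Isolation`, main="Data Self Isolation", sub=paste("Outliers:", paste(boxplot.stats(DataSelfIsolation$`Self Isolation`)$out, collapse=", ")))
# density() estimates a probability density, so the y-axis is labelled "Density"
plot(density(DataSelfIsolation$`Self Isolation`), main="Density Plot: Data Self Isolation", ylab="Density")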
coefs <- coef(model)
plot(`Self Isolation` ~ Tanggal, data = DataSelfIsolation)
abline(coefs)
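# Place the label inside the plotting region; signif() keeps the tiny per-second slope from rounding to 0
text(x = mean(DataSelfIsolation$Tanggal), y = max(DataSelfIsolation$`Self Isolation`),
     labels = paste("Self Isolation =", signif(coefs[1], 4), "+", signif(coefs[2], 4), "* Tanggal"))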
Correlation between variables can be examined visually with a scatterplot and numerically with the Pearson method (parametric) or with the Spearman and Kendall rank methods (non-parametric).
cor.test(DataSelfIsolation$retail_and_recreation_percent_change_from_baseline, DataSelfIsolation$`Self Isolation`)
##
## Pearson's product-moment correlation
##
## data: DataSelfIsolation$retail_and_recreation_percent_change_from_baseline and DataSelfIsolation$`Self Isolation`
## t = -1.3474, df = 29, p-value = 0.1883
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5497841 0.1221125
## sample estimates:
## cor
## -0.2427305
cor.test(DataSelfIsolation$grocery_and_pharmacy_percent_change_from_baseline, DataSelfIsolation$`Self Isolation`)
##
## Pearson's product-moment correlation
##
## data: DataSelfIsolation$grocery_and_pharmacy_percent_change_from_baseline and DataSelfIsolation$`Self Isolation`
## t = -2.7052, df = 29, p-value = 0.01131
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6929990 -0.1124289
## sample estimates:
## cor
## -0.4488862
cor.test(DataSelfIsolation$parks_percent_change_from_baseline, DataSelfIsolation$`Self Isolation`)
##
## Pearson's product-moment correlation
##
## data: DataSelfIsolation$parks_percent_change_from_baseline and DataSelfIsolation$`Self Isolation`
## t = 1.1002, df = 29, p-value = 0.2803
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1659514 0.5177759
## sample estimates:
## cor
## 0.2001587
cor.test(DataSelfIsolation$transit_stations_percent_change_from_baseline, DataSelfIsolation$`Self Isolation`)
##
## Pearson's product-moment correlation
##
## data: DataSelfIsolation$transit_stations_percent_change_from_baseline and DataSelfIsolation$`Self Isolation`
## t = -1.1508, df = 29, p-value = 0.2592
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5244824 0.1569846
## sample estimates:
## cor
## -0.2089803
cor.test(DataSelfIsolation$workplaces_percent_change_from_baseline, DataSelfIsolation$`Self Isolation`)
##
## Pearson's product-moment correlation
##
## data: DataSelfIsolation$workplaces_percent_change_from_baseline and DataSelfIsolation$`Self Isolation`
## t = -1.4502, df = 29, p-value = 0.1577
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5625415 0.1038771
## sample estimates:
## cor
## -0.2600343
cor.test(DataSelfIsolation$residential_percent_change_from_baseline, DataSelfIsolation$`Self Isolation`)
##
## Pearson's product-moment correlation
##
## data: DataSelfIsolation$residential_percent_change_from_baseline and DataSelfIsolation$`Self Isolation`
## t = 1.1823, df = 29, p-value = 0.2467
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1513986 0.5286179
## sample estimates:
## cor
## 0.2144458
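The tests above all use the default Pearson method. As a sketch of the rank-based alternatives mentioned earlier, the following runs cor.test() with each of the three methods on simulated data; u and v are hypothetical variables, not columns of DataSelfIsolation.

```r
# Simulated pair of correlated variables (hypothetical, for illustration only)
set.seed(4)
u <- rnorm(31)
v <- 0.5 * u + rnorm(31)

# Parametric: Pearson product-moment correlation
pearson  <- cor.test(u, v, method = "pearson")
# Non-parametric: Spearman's rho and Kendall's tau work on ranks
spearman <- cor.test(u, v, method = "spearman")
kendall  <- cor.test(u, v, method = "kendall")

# Collect the three estimates and their p-values side by side
results <- data.frame(
  method   = c("pearson", "spearman", "kendall"),
  estimate = c(unname(pearson$estimate), unname(spearman$estimate), unname(kendall$estimate)),
  p_value  = c(pearson$p.value, spearman$p.value, kendall$p.value)
)
print(results)
```

On the real data, the same calls would take a mobility column and DataSelfIsolation$`Self Isolation` in place of u and v; tied values in the mobility columns may cause cor.test() to warn that exact p-values cannot be computed for the rank methods.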