Email   : ferdinand.widjaya@student.matanauniversity.ac.id
RPubs   : https://rpubs.com/ferdnw/
Address : ARA Center, Matana University Tower, Jl. CBD Barat Kav, RT.1, Curug Sangereng, Kelapa Dua, Tangerang, Banten 15810.
library(ggplot2)
library(dplyr)
library(broom)
library(ggpubr)
heart <- read.csv("heart.data.csv")
summary(heart)
##        X             biking          smoking        heart.disease
##  Min.   :  1.0   Min.   : 1.119   Min.   : 0.5259   Min.   : 0.5519
##  1st Qu.:125.2   1st Qu.:20.205   1st Qu.: 8.2798   1st Qu.: 6.5137
##  Median :249.5   Median :35.824   Median :15.8146   Median :10.3853
##  Mean   :249.5   Mean   :37.788   Mean   :15.4350   Mean   :10.1745
##  3rd Qu.:373.8   3rd Qu.:57.853   3rd Qu.:22.5689   3rd Qu.:13.7240
##  Max.   :498.0   Max.   :74.907   Max.   :29.9467   Max.   :20.4535
Dependent variable (Y): heart.disease
Independent variables:
- X1 = biking
- X2 = smoking
cor(heart$biking, heart$smoking)
## [1] 0.01513618
Because the correlation between the two independent variables is very low (r ≈ 0.015, i.e. about 1.5%, not 15%), multicollinearity is not a concern and the regression coefficients should not be distorted by it.
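With only two predictors, the variance inflation factor (VIF) of each one reduces to 1 / (1 - r^2), where r is their correlation. A minimal sketch that plugs in the correlation printed above; the helper name `vif_two_predictors` is hypothetical, and for models with more predictors the `vif()` function from the car package is the usual tool.

```r
# Hypothetical helper: VIF for a model with exactly two predictors.
# Regressing one predictor on the other gives R^2 = r^2, so VIF = 1 / (1 - r^2).
vif_two_predictors <- function(r) {
  1 / (1 - r^2)
}

r_biking_smoking <- 0.01513618   # value of cor(heart$biking, heart$smoking) above
vif <- vif_two_predictors(r_biking_smoking)
print(round(vif, 4))
# [1] 1.0002
```

A VIF this close to 1 means the coefficient standard errors are essentially not inflated by correlation between biking and smoking; common rules of thumb only start to worry above 5 or 10.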
Testing whether the data are normally distributed
hist(heart$heart.disease)
shapiro.test(heart$heart.disease)
##
## Shapiro-Wilk normality test
##
## data: heart$heart.disease
## W = 0.98047, p-value = 3.158e-06
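The Shapiro-Wilk p-value (3.158e-06) is below 0.05, so strictly speaking the test rejects the null hypothesis of normality. With n = 498, however, the test flags even small departures from normality, and the histogram shows a roughly bell-shaped distribution, so the data are treated as approximately normal. (For linear regression, the normality assumption strictly applies to the residuals rather than to Y itself.)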
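The null hypothesis of `shapiro.test()` is normality, so a small p-value is evidence *against* a normal distribution. A minimal sketch of that decision rule, using a hypothetical helper `interpret_shapiro` and a deliberately skewed simulated sample (not the heart data); in a regression setting the same call would normally be applied to `residuals()` of the fitted model.

```r
# Hypothetical helper: Shapiro-Wilk decision at significance level alpha.
# H0: the sample comes from a normal distribution.
interpret_shapiro <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p < alpha) {
    "reject H0: evidence of non-normality"
  } else {
    "fail to reject H0: no evidence against normality"
  }
}

set.seed(1)
skewed_sample <- rexp(498)   # simulated, strongly right-skewed data (same n as the heart data)
interpret_shapiro(skewed_sample)
```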
The dependent and independent variables must have a clear linear relationship.
plot(heart.disease ~ biking, data=heart)
The plot shows a strong linear relationship between heart.disease and biking: the more people bike, the lower the heart disease rate.
plot(heart.disease ~ smoking, data=heart)
The relationship between smoking and heart.disease looks only faintly linear and is less pronounced than for biking, but the trend is still visible: the more people smoke, the higher the heart disease rate.
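The visual impression from the two scatter plots can be backed by a number: Pearson's correlation of each predictor with the outcome. Since heart.data.csv is not reproduced here, the sketch below simulates stand-in data; the predictor ranges are assumed from summary(heart) and the true coefficients are set to the estimates in the model summary, purely for illustration.

```r
set.seed(123)
n <- 498
# Simulated stand-in for heart.data.csv (assumed ranges and coefficients)
biking  <- runif(n, min = 1, max = 75)
smoking <- runif(n, min = 0.5, max = 30)
heart.disease <- 14.98 - 0.20 * biking + 0.178 * smoking + rnorm(n, sd = 0.65)

# Strength and direction of each linear relationship with the outcome
cor_biking  <- cor(biking, heart.disease)
cor_smoking <- cor(smoking, heart.disease)
round(c(biking = cor_biking, smoking = cor_smoking), 3)
```

In this simulation biking shows a strong negative correlation and smoking a weaker positive one. Correlation only measures linear association, so a curved pattern still has to be spotted in the scatter plot itself.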
Homogeneity of variance (homoscedasticity) will be checked after the model is fitted, to confirm that the spread of the prediction errors stays roughly constant, i.e. that some predictions do not miss by much more than others.
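One common formal check of homoscedasticity is the Breusch-Pagan test. The sketch below computes Koenker's studentized version by hand, so it needs only base R: regress the squared residuals on the predictors and compare LM = n * R^2 with a chi-square distribution whose degrees of freedom equal the number of predictors. The helper name `bp_test` and the simulated data are assumptions; `bptest()` in the lmtest package implements the same test (studentized by default).

```r
# Hypothetical helper: Koenker's studentized Breusch-Pagan test, base R only.
# H0: constant residual variance (homoscedasticity).
bp_test <- function(fit) {
  e2  <- residuals(fit)^2
  X   <- model.matrix(fit)                  # design matrix including the intercept
  aux <- lm.fit(X, e2)                      # auxiliary regression of e^2 on the predictors
  r2  <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  k   <- ncol(X) - 1                        # number of predictors (excluding intercept)
  stat <- length(e2) * r2
  list(statistic = stat, df = k,
       p.value = pchisq(stat, df = k, lower.tail = FALSE))
}

set.seed(42)
n <- 498
biking  <- runif(n, 1, 75)
smoking <- runif(n, 0.5, 30)
mu <- 14.98 - 0.20 * biking + 0.178 * smoking

y_const  <- mu + rnorm(n, sd = 0.65)          # constant error variance
y_funnel <- mu + rnorm(n, sd = 0.02 * biking)  # error spread grows with biking

bp_const  <- bp_test(lm(y_const ~ biking + smoking))
bp_funnel <- bp_test(lm(y_funnel ~ biking + smoking))
c(constant = bp_const$p.value, funnel = bp_funnel$p.value)
```

A small p-value, as for the funnel-shaped errors, signals heteroscedasticity. For the real model the call would be `bp_test(lmheart)`, read alongside the Scale-Location panel of `plot(lmheart)`.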
lmheart <- lm(heart.disease ~ biking + smoking, data = heart)
summary(lmheart)
##
## Call:
## lm(formula = heart.disease ~ biking + smoking, data = heart)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.1789 -0.4463  0.0362  0.4422  1.9331
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.984658   0.080137  186.99   <2e-16 ***
## biking      -0.200133   0.001366 -146.53   <2e-16 ***
## smoking      0.178334   0.003539   50.39   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.654 on 495 degrees of freedom
## Multiple R-squared: 0.9796, Adjusted R-squared: 0.9795
## F-statistic: 1.19e+04 on 2 and 495 DF, p-value: < 2.2e-16
Based on these coefficients, the fitted regression equation is:
\[ \hat{Y} = 14.985 - 0.200 X_1 + 0.178 X_2 \]
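To make the equation concrete, here is the arithmetic for a hypothetical area where 30% of people bike to work and 10% smoke, using the unrounded estimates from the model summary. The function name `predict_heart` is illustrative only.

```r
# Unrounded coefficient estimates copied from summary(lmheart)
b0 <- 14.984658   # intercept
b1 <- -0.200133   # biking (X1)
b2 <-  0.178334   # smoking (X2)

# Fitted equation: Y-hat = b0 + b1 * X1 + b2 * X2
predict_heart <- function(biking, smoking) b0 + b1 * biking + b2 * smoking

predict_heart(biking = 30, smoking = 10)
# [1] 10.76401
```

With the real model object, `predict(lmheart, newdata = data.frame(biking = 30, smoking = 10))` gives the same value up to the rounding of the printed coefficients.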
Because the model's p-value is below 0.05 (F-test), the model is statistically significant, and its R² of 0.9796 means that biking and smoking together explain about 98% of the variance in heart.disease.
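R² is the share of the outcome's variance explained by the model, 1 - SS_res / SS_tot, and adjusted R² additionally penalizes the number of predictors k. A minimal sketch that recomputes both by hand on simulated stand-in data (ranges and coefficients assumed from summary(heart) and the model output) and compares them with `summary()`:

```r
set.seed(123)
n <- 498
biking  <- runif(n, 1, 75)
smoking <- runif(n, 0.5, 30)
heart.disease <- 14.98 - 0.20 * biking + 0.178 * smoking + rnorm(n, sd = 0.65)
fit <- lm(heart.disease ~ biking + smoking)

ss_res <- sum(residuals(fit)^2)                          # unexplained variation
ss_tot <- sum((heart.disease - mean(heart.disease))^2)   # total variation
k <- 2                                                   # number of predictors
r2     <- 1 - ss_res / ss_tot
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

c(manual = r2, lm = summary(fit)$r.squared)
c(manual = adj_r2, lm = summary(fit)$adj.r.squared)
```

R² describes explained variance in the sample rather than accuracy in the classification sense; the residual standard error (0.654 percentage points for lmheart) is a more direct measure of typical prediction error.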
The negative coefficient for biking shows that more biking is associated with a lower heart disease rate, while the positive coefficient for smoking shows that more smoking is associated with a higher heart disease rate.
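In a model without interaction terms, each coefficient is exactly the change in the prediction when that predictor rises by one unit while the other is held fixed. A minimal check on simulated stand-in data (ranges and coefficients assumed from summary(heart) and the model output); the two comparison areas are hypothetical.

```r
set.seed(123)
n <- 498
biking  <- runif(n, 1, 75)
smoking <- runif(n, 0.5, 30)
heart.disease <- 14.98 - 0.20 * biking + 0.178 * smoking + rnorm(n, sd = 0.65)
fit <- lm(heart.disease ~ biking + smoking)

# Two hypothetical areas that differ only by one extra point of biking
base   <- data.frame(biking = 20, smoking = 10)
plus_1 <- data.frame(biking = 21, smoking = 10)
effect <- unname(predict(fit, plus_1) - predict(fit, base))

c(effect = effect, coefficient = coef(fit)[["biking"]])
```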
par(mfrow=c(2,2))
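plot(lmheart)
par(mfrow=c(1,1))
These four diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage) are used to check the homoscedasticity and residual normality assumptions.
To visualize the model, use the function expand.grid() to create a data frame from the parameters you supply. Within this function we will:
- Create a sequence from the lowest to the highest value of the observed biking data;
- Choose the minimum, mean, and maximum values of smoking, to make 3 levels of smoking over which to predict rates of heart disease.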
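To see what `expand.grid()` produces, here is a toy version with three biking values and three smoking levels (numbers chosen only for illustration). Every combination of the supplied vectors becomes one row, with the first variable varying fastest.

```r
# Toy grid: 3 biking values crossed with 3 smoking levels (illustrative numbers)
grid <- expand.grid(
  biking  = seq(1, 75, length.out = 3),
  smoking = c(0.5, 15, 30)
)
nrow(grid)   # 9 rows: every combination of the two vectors
grid
```

With `length.out = 30` for biking and three smoking levels, as in the real code, the grid has 30 × 3 = 90 rows.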
plotting.data <- expand.grid(
  biking = seq(min(heart$biking), max(heart$biking), length.out = 30),
  smoking = c(min(heart$smoking), mean(heart$smoking), max(heart$smoking)))
Next, we save the predicted y values as a new column in the data frame we just created.
plotting.data$predicted.y <- predict.lm(lmheart, newdata=plotting.data)
This allows us to plot the relationship between biking and heart disease at each of the three levels of smoking we chose. Because the model contains no interaction term, the three predicted lines will be parallel.
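Because lmheart has no biking × smoking interaction term, predictions at two smoking levels differ by the same amount at every biking value (the smoking coefficient times the difference in smoking). A minimal check on simulated stand-in data, with ranges and coefficients assumed from summary(heart) and the model output:

```r
set.seed(123)
n <- 498
biking  <- runif(n, 1, 75)
smoking <- runif(n, 0.5, 30)
heart.disease <- 14.98 - 0.20 * biking + 0.178 * smoking + rnorm(n, sd = 0.65)
fit <- lm(heart.disease ~ biking + smoking)

grid <- expand.grid(biking  = seq(min(biking), max(biking), length.out = 30),
                    smoking = c(min(smoking), mean(smoking), max(smoking)))
grid$predicted.y <- predict(fit, newdata = grid)

# Gap between the highest and lowest smoking lines at every biking value
lines_by_level <- split(grid$predicted.y, grid$smoking)   # levels sorted ascending
gap <- lines_by_level[[3]] - lines_by_level[[1]]
range(gap)   # both ends equal: the lines are parallel
```

Lines with different slopes would require an interaction term, e.g. `lm(heart.disease ~ biking * smoking, data = heart)`; whether that is warranted would need its own test.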
plotting.data$smoking <- as.factor(plotting.data$smoking)
heart.plot <- ggplot(heart, aes(x=biking, y=heart.disease)) +
  geom_point()
heart.plot
Next, add one predicted line per smoking level (think of the three levels as rare, moderate, and very frequent smoking):
heart.plot <- heart.plot +
  geom_line(data=plotting.data, aes(x=biking, y=predicted.y, color=smoking), linewidth=1.25)
heart.plot
heart.plot <- heart.plot +
theme_bw() +
labs(title = "Rates of heart disease (% of population) \n as a function of biking to work and smoking",
x = "Biking to work (% of population)",
y = "Heart disease (% of population)",
color = "Smoking \n (% of population)")
heart.plot
From the graph, together with the very small p-values of both coefficients, we can conclude that there is a significant relationship between the heart disease rate and the levels of biking and smoking.
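Based on the model, every 1 percentage-point increase in biking to work is associated with a 0.200 percentage-point decrease in the heart disease rate (holding smoking constant), and every 1 percentage-point increase in smoking is associated with a 0.178 percentage-point increase (holding biking constant).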
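Because 0.200 and 0.178 are estimates, it helps to attach a range to them. A 95% confidence interval is estimate ± t(0.975, df) × standard error; the sketch below plugs in the values printed in the model summary. On the original model object, `confint(lmheart)` computes the same intervals directly.

```r
# Estimates and standard errors copied from summary(lmheart)
est <- c(biking = -0.200133, smoking = 0.178334)
se  <- c(biking =  0.001366, smoking = 0.003539)
df_resid <- 495                       # residual degrees of freedom from the summary

t_crit <- qt(0.975, df = df_resid)    # two-sided 95% critical value
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 4)
```

Both intervals are narrow and exclude zero, consistent with the very small p-values in the summary.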
Moral of the story: Don't smoke, keep biking.