library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.1 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(readr)
my_data <- read_csv("C:/Users/HP/Desktop/R Data Science/EDA-MODELING AND PRESENTATION/activity1_data.csv")
## Rows: 71 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): MONTH, 91TBILL
## dbl (6): PCOMP, TEL, GLO, PHP, PHIVTA, PHMS
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
my_data
The result above is our dataset, with 71 observations and 8 variables.
names(my_data)
## [1] "MONTH" "PCOMP" "TEL" "GLO" "PHP" "PHIVTA" "PHMS"
## [8] "91TBILL"
The result above lists the names of the 8 variables in the dataset.
sum(is.na(my_data))
## [1] 0
So there are no missing or null values in the dataset.
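A per-column check (a quick sketch using base R's colSums()) can confirm that no single variable hides missing values:
colSums(is.na(my_data))  # count of NAs in each of the 8 columns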
We saw in the preview above that the “91TBILL” variable has a character data type, so we need to make some changes to it. Before we create our models, we should convert “91TBILL” to a double data type.
data_model <- my_data %>%
  select(TEL, PCOMP, GLO, PHP, PHIVTA, PHMS, `91TBILL`) %>%
  mutate(`91TBILL` = gsub('.{1}$', '', `91TBILL`)) %>%   # drop the trailing non-numeric character
  mutate(`91TBILL` = as.double(`91TBILL`))
head(data_model)
The result above shows the first 6 observations of the new dataset. Note that the “MONTH” variable is no longer included, since it is of no use for building the model. The “91TBILL” variable is now a double, so we can use it in our models. Lastly, “TEL” sits in the first column because it will serve as the response variable for our models.
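As an aside, an equivalent construction could use readr's parse_number(), which strips any non-numeric suffix (the gsub() above assumes the suffix is exactly one character); the name data_model_alt is only illustrative:
data_model_alt <- my_data %>%
  select(TEL, PCOMP, GLO, PHP, PHIVTA, PHMS, `91TBILL`) %>%
  mutate(`91TBILL` = parse_number(`91TBILL`))  # drops non-numeric characters such as "%"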
plot(data_model, col = "blue", main = "SCATTERPLOT MATRIX")
The chart above is a scatter plot matrix showing the relationship of every pair of variables in our dataset. Since we are only interested in the relationship of “TEL” to the remaining variables, in the next chunks we make two-variable scatter plots for easier visualization.
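Since the psych package is already attached, pairs.panels() offers an annotated alternative sketch: scatter plots below the diagonal, histograms on it, and correlation coefficients above it.
pairs.panels(data_model)  # scatterplot matrix annotated with correlations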
corr.test(x = data_model)
## Call:corr.test(x = data_model)
## Correlation matrix
## TEL PCOMP GLO PHP PHIVTA PHMS 91TBILL
## TEL 1.00 0.89 -0.40 -0.80 0.51 -0.69 0.69
## PCOMP 0.89 1.00 -0.44 -0.87 0.50 -0.68 0.55
## GLO -0.40 -0.44 1.00 0.65 -0.21 0.80 -0.73
## PHP -0.80 -0.87 0.65 1.00 -0.45 0.86 -0.65
## PHIVTA 0.51 0.50 -0.21 -0.45 1.00 -0.21 0.23
## PHMS -0.69 -0.68 0.80 0.86 -0.21 1.00 -0.82
## 91TBILL 0.69 0.55 -0.73 -0.65 0.23 -0.82 1.00
## Sample Size
## [1] 71
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## TEL PCOMP GLO PHP PHIVTA PHMS 91TBILL
## TEL 0 0 0.00 0 0.00 0.00 0.00
## PCOMP 0 0 0.00 0 0.00 0.00 0.00
## GLO 0 0 0.00 0 0.16 0.00 0.00
## PHP 0 0 0.00 0 0.00 0.00 0.00
## PHIVTA 0 0 0.08 0 0.00 0.16 0.16
## PHMS 0 0 0.00 0 0.08 0.00 0.00
## 91TBILL 0 0 0.00 0 0.05 0.00 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
From the result above, the correlations of the dependent variable “TEL” with the predictors are: PCOMP 0.89, GLO -0.40, PHP -0.80, PHIVTA 0.51, PHMS -0.69, and 91TBILL 0.69.
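The same numbers can be read straight off the TEL row of base R's correlation matrix, which may be handier than scanning the full corr.test() printout:
round(cor(data_model)["TEL", ], 2)  # correlations of every variable with TEL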
data_model %>%
ggplot(aes(x = PCOMP, y = TEL, colour = PCOMP)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "THE PHILIPPINE COMPOSITE INDEX",
y = "TEL",
title = "Relationship Between TEL and PCOMP"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a strong positive relationship, since the correlation of 0.89 is fairly high.
data_model %>%
ggplot(aes(x = GLO, y = TEL, colour = GLO)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "STOCK PRICE OF GLOBE TELECOMS",
y = "TEL",
title = "Relationship Between TEL and GLO"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a negative relationship, since the correlation of -0.40 is moderately low.
data_model %>%
ggplot(aes(x = PHP, y = TEL, colour = PHP)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "US DOLLAR EXCHANGE RATE",
y = "TEL",
title = "Relationship Between TEL and PHP"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a strong negative relationship, since the correlation of -0.80 is fairly high in magnitude.
data_model %>%
ggplot(aes(x = PHIVTA, y = TEL, colour = PHIVTA)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "PHILIPPINE VISITOR TRAVEL ARRIVAL",
y = "TEL",
title = "Relationship Between TEL and PHIVTA"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a moderate positive relationship, since the correlation of 0.51 is neither very high nor very low. It may be considered low partly because of the many outliers that pull down the r value.
data_model %>%
ggplot(aes(x = PHMS, y = TEL, colour = PHMS)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "PHILIPPINE MONEY SUPLLY(MI)",
y = "TEL",
title = "Relationship Between TEL and PHMS"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a negative relationship, since the correlation of -0.69 is moderately high in magnitude.
data_model %>%
ggplot(aes(x = `91TBILL`, y = TEL, colour = `91TBILL`)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "PHILIPPINE 91-DAY TREASURYT BILL RATES",
y = "TEL",
title = "Relationship Between TEL and 91TBILL"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The figure above shows a positive relationship, since the correlation of 0.69 is moderately high.
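As a more compact alternative to the six separate chunks above, all pairwise charts can be drawn in one faceted plot (a sketch using pivot_longer() to stack the predictors):
data_model %>%
  pivot_longer(-TEL, names_to = "predictor", values_to = "value") %>%
  ggplot(aes(x = value, y = TEL)) +
  geom_point(size = 1.5, colour = "steelblue") +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ predictor, scales = "free_x") +  # one panel per predictor
  labs(title = "Relationship Between TEL and Each Predictor") +
  theme_bw()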
positive_model1 <- lm(data_model$TEL ~ data_model$PCOMP + data_model$PHIVTA + data_model$`91TBILL`)
positive_model1
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$PHIVTA +
## data_model$`91TBILL`)
##
## Coefficients:
## (Intercept) data_model$PCOMP data_model$PHIVTA
## -3.832e+02 4.309e-01 1.276e-03
## data_model$`91TBILL`
## 2.384e+01
summary(positive_model1)
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$PHIVTA +
## data_model$`91TBILL`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -164.567 -82.298 1.316 62.207 278.569
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.832e+02 9.174e+01 -4.177 8.72e-05 ***
## data_model$PCOMP 4.309e-01 4.090e-02 10.536 7.42e-16 ***
## data_model$PHIVTA 1.276e-03 6.251e-04 2.041 0.0452 *
## data_model$`91TBILL` 2.384e+01 4.510e+00 5.287 1.47e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 104.6 on 67 degrees of freedom
## Multiple R-squared: 0.8535, Adjusted R-squared: 0.8469
## F-statistic: 130.1 on 3 and 67 DF, p-value: < 2.2e-16
The value of r-squared in model 1 is 0.8535; that is, the three predictor variables PCOMP, PHIVTA, and 91TBILL explain 85.35% of the variability in TEL, our response variable. An r-squared of 0.8535 (85.35%) is fairly high, so we have a model that should predict reasonably well.
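These statistics can also be pulled out of the summary object programmatically instead of being read off the printout:
s1 <- summary(positive_model1)
s1$r.squared      # multiple R-squared, 0.8535
s1$adj.r.squared  # adjusted R-squared, 0.8469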
positive_model1[["coefficients"]][["(Intercept)"]]
## [1] -383.2146
positive_model1[["coefficients"]][["data_model$PCOMP"]]
## [1] 0.4309425
positive_model1[["coefficients"]][["data_model$PHIVTA"]]
## [1] 0.001275861
positive_model1[["coefficients"]][["data_model$`91TBILL`"]]
## [1] 23.84147
predicted1 <- -383.2146 + (0.4309425 * data_model$PCOMP + 0.001275861 * data_model$PHIVTA +
23.84147 * data_model$`91TBILL`)
predicted1
## [1] 691.4312 554.5983 568.4467 482.4263 462.3039 488.1147 443.7543
## [8] 376.2159 409.9886 412.2814 398.7273 419.8613 425.3258 379.4038
## [15] 386.6462 392.2392 415.5464 432.5177 431.5487 504.2543 520.7387
## [22] 596.0019 589.8882 589.7076 544.0040 481.9450 418.7096 540.8264
## [29] 629.1610 633.2518 615.9432 636.9414 673.2192 708.7840 753.7647
## [36] 847.9549 804.4305 798.8409 583.2013 615.6338 694.4076 651.6648
## [43] 694.1776 667.4043 750.4575 797.7611 763.2542 927.1257 1023.6837
## [50] 932.3325 922.2898 937.8204 971.0146 1046.1560 1124.0014 1113.0560
## [57] 1161.8881 1027.1227 991.3715 1016.7286 1062.4082 1044.3615 917.9368
## [64] 701.8009 682.9923 883.6274 915.2217 1031.4517 1144.0442 1216.7908
## [71] 1239.5674
residual1 <- data_model$TEL - predicted1
residual1
## [1] 278.568789 200.401673 196.553301 157.573745 72.696122 41.885262
## [7] 121.245676 56.284124 -29.988616 -97.281367 -106.227324 -124.861347
## [13] -155.325826 -89.403811 -156.646202 -112.239220 -130.546443 -117.517676
## [19] -61.548669 -66.754348 -60.738658 -61.001855 -84.888161 -79.707570
## [25] -126.503998 -54.444960 -38.709590 -60.826392 -144.161010 41.748248
## [31] 109.056844 13.058642 16.780790 6.215963 41.235347 67.045123
## [37] 60.569536 6.159107 191.798675 194.366240 70.592388 18.335188
## [43] 80.822444 97.595717 -10.457463 102.238893 126.745817 22.874264
## [49] 1.316343 -92.332477 -87.289815 -52.820394 -31.014578 63.844000
## [55] 35.998550 1.943978 68.111860 -7.122699 -86.371512 -116.728615
## [61] -62.408223 -14.361465 47.063199 218.199142 42.007722 41.372571
## [67] 34.778336 -31.451652 -69.044222 -161.790809 -164.567401
Data1 <- cbind(data_model, predicted1, residual1)
Data1
sum(residual1)
## [1] -0.000749914
The sum of the residuals in model 1 is effectively zero. This is expected for any least-squares fit with an intercept, so it confirms the fitted values were computed correctly rather than measuring predictive accuracy on its own.
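Rather than retyping rounded coefficients by hand, the same predictions and residuals come directly from the lm object; as a quick cross-check:
max(abs(fitted(positive_model1) - predicted1))    # near zero; differs only by coefficient rounding
max(abs(residuals(positive_model1) - residual1))  # likewise near zero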
negative_model2 <- lm(data_model$TEL ~ data_model$GLO + data_model$PHP + data_model$PHMS)
negative_model2
##
## Call:
## lm(formula = data_model$TEL ~ data_model$GLO + data_model$PHP +
## data_model$PHMS)
##
## Coefficients:
## (Intercept) data_model$GLO data_model$PHP data_model$PHMS
## 2.416e+03 5.726e-01 -3.234e-02 -1.279e-03
summary(negative_model2)
##
## Call:
## lm(formula = data_model$TEL ~ data_model$GLO + data_model$PHP +
## data_model$PHMS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -238.85 -121.40 -38.47 91.83 514.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.416e+03 1.545e+02 15.633 < 2e-16 ***
## data_model$GLO 5.726e-01 1.958e-01 2.924 0.00471 **
## data_model$PHP -3.234e-02 5.915e-03 -5.468 7.26e-07 ***
## data_model$PHMS -1.279e-03 6.653e-04 -1.923 0.05871 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 155 on 67 degrees of freedom
## Multiple R-squared: 0.6785, Adjusted R-squared: 0.6641
## F-statistic: 47.14 on 3 and 67 DF, p-value: < 2.2e-16
The value of r-squared in model 2 is 0.6785; that is, the three predictor variables GLO, PHP, and PHMS explain 67.85% of the variability in TEL, our response variable. An r-squared of 0.6785 (67.85%) is not very high, so this model predicts only moderately well.
negative_model2[["coefficients"]][["(Intercept)"]]
## [1] 2415.641
negative_model2[["coefficients"]][["data_model$GLO"]]
## [1] 0.5725913
negative_model2[["coefficients"]][["data_model$PHMS"]]
## [1] -0.001279403
negative_model2[["coefficients"]][["data_model$PHP"]]
## [1] -0.03234167
n_predicted2 <- 2415.641 + 0.5725913 * data_model$GLO - 0.001279403 * data_model$PHMS -
  0.03234167 * data_model$PHP
n_predicted2
## [1] 455.0582 423.4010 499.3698 468.3476 424.0964 462.2850 462.5904
## [8] 470.9702 465.5918 438.7892 387.3375 407.4289 347.3204 376.8979
## [15] 394.1481 460.6403 499.1987 533.0851 563.9351 655.3897 624.6503
## [22] 633.2957 645.7771 633.0680 557.0550 543.2442 534.5210 588.8434
## [29] 604.7312 516.3169 555.5606 610.0422 572.5319 647.1779 724.5687
## [36] 697.6885 623.7806 650.1253 583.9358 850.8673 906.3764 874.7472
## [43] 935.2562 919.2418 978.8488 939.5954 951.1574 905.0265 856.4070
## [50] 909.4427 981.5351 950.9895 1009.5491 1068.8366 1083.2918 1075.8630
## [57] 1041.1627 938.7595 931.5091 979.7389 929.0070 970.8717 909.5233
## [64] 786.5273 793.3849 880.7361 877.4868 949.8623 913.7529 983.2905
## [71] 887.1030
n_residual2<- data_model$TEL - n_predicted2
n_residual2
## [1] 514.94181 331.59899 265.63023 171.65244 110.90359 67.71498
## [7] 102.40964 -38.47020 -85.59176 -123.78916 -94.83750 -112.42888
## [13] -77.32037 -86.89788 -164.14811 -180.64034 -214.19871 -218.08509
## [19] -193.93507 -217.88966 -164.65034 -98.29567 -140.77711 -123.06802
## [25] -139.55495 -115.74418 -154.52101 -108.84338 -119.73121 158.68310
## [31] 169.43941 39.95783 117.46810 67.82212 70.43128 217.31154
## [37] 241.21944 154.87468 191.06421 -40.86734 -141.37645 -204.74723
## [43] -160.25619 -154.24176 -238.84877 -39.59544 -61.15737 44.97355
## [49] 168.59298 -69.44266 -146.53515 -65.98953 -69.54913 41.16342
## [55] 76.70822 39.13698 188.83727 81.24045 -26.50912 -79.73893
## [61] 70.99296 59.12835 55.47674 133.47270 -68.38490 44.26389
## [67] 72.51320 50.13772 161.24713 71.70948 187.89699
n_data2 <- cbind(data_model, n_predicted2, n_residual2)
n_data2
sum(n_residual2)
## [1] -0.04315892
The sum of the residuals in model 2 is likewise essentially zero, as expected for a least-squares fit.
pn_model3 <- lm(data_model$TEL ~ data_model$PCOMP + data_model$GLO + data_model$PHP +
data_model$PHIVTA + data_model$PHMS + data_model$`91TBILL`)
pn_model3
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$GLO +
## data_model$PHP + data_model$PHIVTA + data_model$PHMS + data_model$`91TBILL`)
##
## Coefficients:
## (Intercept) data_model$PCOMP data_model$GLO
## -5.174e+02 3.914e-01 5.713e-01
## data_model$PHP data_model$PHIVTA data_model$PHMS
## -1.860e-03 1.549e-03 -4.326e-04
## data_model$`91TBILL`
## 3.652e+01
summary(pn_model3)
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$GLO +
## data_model$PHP + data_model$PHIVTA + data_model$PHMS + data_model$`91TBILL`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -165.40 -65.37 -17.86 56.62 268.78
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.174e+02 2.952e+02 -1.753 0.0844 .
## data_model$PCOMP 3.914e-01 5.997e-02 6.527 1.25e-08 ***
## data_model$GLO 5.713e-01 1.233e-01 4.633 1.82e-05 ***
## data_model$PHP -1.860e-03 6.001e-03 -0.310 0.7576
## data_model$PHIVTA 1.549e-03 6.027e-04 2.570 0.0125 *
## data_model$PHMS -4.326e-04 5.219e-04 -0.829 0.4102
## data_model$`91TBILL` 3.652e+01 6.317e+00 5.781 2.39e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 91.16 on 64 degrees of freedom
## Multiple R-squared: 0.8938, Adjusted R-squared: 0.8838
## F-statistic: 89.73 on 6 and 64 DF, p-value: < 2.2e-16
The value of r-squared in model 3 is 0.8938; that is, the six predictor variables PCOMP, GLO, PHP, PHIVTA, PHMS, and 91TBILL explain 89.38% of the variability in TEL, our response variable. An r-squared of 0.8938 (89.38%) is quite high, so we have a model that should predict well.
pn_fitted3 <- -517.402 + 0.3913958 * data_model$PCOMP + 0.5713458 * data_model$GLO -
  0.001859649 * data_model$PHP + 0.001548627 * data_model$PHIVTA - 0.0004326304 * data_model$PHMS +
  36.5221 * data_model$`91TBILL`
pn_fitted3
## [1] 810.1342 600.8200 649.0391 505.0524 451.6652 504.0369 421.1852
## [8] 368.7576 385.6347 388.8232 347.6912 367.6268 323.1700 271.2767
## [15] 272.5182 301.5678 348.6981 382.2264 382.7592 536.1589 537.5004
## [22] 636.7784 639.8043 631.3326 568.2978 467.6272 397.8645 547.0332
## [29] 650.3994 642.7001 633.8391 663.0484 710.9546 741.8850 816.0734
## [36] 923.6892 890.0263 901.8112 584.1258 706.7502 810.3011 732.1423
## [43] 781.9903 722.4373 830.5059 825.2548 760.0200 894.7038 981.3213
## [50] 910.9889 922.2460 917.6878 958.3382 1052.0567 1131.0787 1132.6033
## [57] 1148.6842 965.6618 936.7953 985.9763 1024.6945 1027.3843 864.7756
## [64] 651.2206 641.2508 869.9392 881.4480 977.6987 1096.8813 1185.6442
## [71] 1180.3823
pn_rsd3 <- data_model$TEL - pn_fitted3
pn_rsd3
## [1] 159.865761 154.180000 115.960883 134.947602 83.334823 25.963134
## [7] 143.814826 63.742388 -5.634700 -73.823190 -55.191170 -72.626823
## [13] -53.169974 18.723301 -42.518182 -21.567771 -63.698113 -67.226403
## [19] -12.759206 -98.658873 -77.500374 -101.778413 -134.804268 -121.332649
## [25] -150.797765 -40.127241 -17.864532 -67.033201 -165.399443 32.299922
## [31] 91.160902 -13.048406 -20.954624 -26.884993 -21.073434 -8.689236
## [37] -25.026331 -96.811250 190.874215 103.249773 -45.301128 -62.142324
## [43] -6.990295 42.562742 -90.505899 74.745192 129.980015 55.296212
## [49] 43.678700 -70.988882 -87.245964 -32.687829 -18.338209 57.943337
## [55] 28.921342 -17.603288 81.315803 54.338184 -31.795253 -85.976285
## [61] -24.694488 2.615708 100.224369 268.779437 83.749190 55.060751
## [67] 68.551979 22.301263 -21.881254 -130.644218 -105.382254
pn_data3 <- cbind(data_model , pn_fitted3, pn_rsd3)
pn_data3
sum(pn_rsd3)
## [1] 0.003618887
The sum of the residuals in model 3 is again essentially zero.
pn_model4 <- lm(data_model$TEL ~ data_model$PCOMP + data_model$GLO + data_model$PHIVTA +
data_model$`91TBILL`)
pn_model4
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$GLO +
## data_model$PHIVTA + data_model$`91TBILL`)
##
## Coefficients:
## (Intercept) data_model$PCOMP data_model$GLO
## -7.828e+02 4.378e-01 4.704e-01
## data_model$PHIVTA data_model$`91TBILL`
## 1.378e-03 3.966e+01
summary(pn_model4)
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$GLO +
## data_model$PHIVTA + data_model$`91TBILL`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -164.80 -63.46 -20.39 67.09 292.74
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.828e+02 1.163e+02 -6.730 4.88e-09 ***
## data_model$PCOMP 4.378e-01 3.565e-02 12.281 < 2e-16 ***
## data_model$GLO 4.704e-01 9.952e-02 4.727 1.24e-05 ***
## data_model$PHIVTA 1.378e-03 5.448e-04 2.528 0.0139 *
## data_model$`91TBILL` 3.966e+01 5.160e+00 7.686 9.65e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 91.12 on 66 degrees of freedom
## Multiple R-squared: 0.8905, Adjusted R-squared: 0.8839
## F-statistic: 134.2 on 4 and 66 DF, p-value: < 2.2e-16
The value of r-squared in model 4 is 0.8905; that is, the four predictor variables PCOMP, GLO, PHIVTA, and 91TBILL explain 89.05% of the variability in TEL, our response variable. An r-squared of 0.8905 (89.05%) is quite high, so we have a model that should predict well.
pn_model4[["coefficients"]][["(Intercept)"]]
## [1] -782.8472
pn_model4[["coefficients"]][["data_model$PCOMP"]]
## [1] 0.437831
pn_model4[["coefficients"]][["data_model$GLO"]]
## [1] 0.4703947
pn_model4[["coefficients"]][["data_model$PHIVTA"]]
## [1] 0.001377586
pn_model4[["coefficients"]][["data_model$`91TBILL`"]]
## [1] 39.66295
pn_fitted4 <- -782.8472 + (0.437831 * data_model$PCOMP + 0.4703947 * data_model$GLO +
0.001377586 * data_model$PHIVTA + 39.66295 * data_model$`91TBILL`)
pn_fitted4
## [1] 829.4008 621.9778 656.6593 517.2916 461.7663 509.6347 438.7308
## [8] 386.0370 405.2896 399.1766 359.3245 376.6481 346.1844 289.9296
## [15] 285.9536 308.5448 343.8319 373.3701 375.0045 519.7080 524.8927
## [22] 625.6109 628.4716 620.2825 565.7205 467.7336 394.9414 547.6876
## [29] 649.7960 649.7916 640.9357 670.1462 718.3513 739.8814 820.9865
## [36] 942.8010 911.1754 919.4129 578.1231 683.9633 780.3376 701.7430
## [43] 753.8440 696.8376 802.0328 805.1807 739.3996 898.7751 1008.2216
## [50] 909.3759 911.6164 912.6211 948.2494 1043.9730 1130.9181 1135.3897
## [57] 1156.3278 969.8743 943.1025 979.7618 1025.6818 1017.5833 854.3763
## [64] 627.2587 612.6658 854.1851 876.6633 985.4936 1111.9301 1200.8707
## [71] 1213.0392
pn_rsd4 <- data_model$TEL - pn_fitted4
pn_rsd4
## [1] 140.59923585 133.02222532 108.34065405 122.70843791 73.23367723
## [6] 20.36527011 126.26920854 46.46300955 -25.28956107 -84.17655787
## [11] -66.82451807 -81.64814400 -76.18438515 0.07037975 -55.95359648
## [16] -28.54484533 -58.83194814 -58.37005491 -5.00453179 -82.20803975
## [21] -64.89271858 -90.61090820 -123.47164610 -110.28247962 -148.22048357
## [26] -40.23362631 -14.94137428 -67.68757670 -164.79602517 25.20838015
## [31] 84.06427439 -20.14620219 -28.35129712 -24.88142098 -25.98653509
## [36] -27.80095082 -46.17536489 -114.41294358 196.87685207 126.03674098
## [41] -15.33756705 -31.74297131 21.15596255 68.16241283 -62.03275861
## [46] 94.81926457 150.60039597 51.22493200 16.77836774 -69.37587342
## [51] -76.61644866 -27.62108192 -8.24940282 66.02698956 29.08190599
## [56] -20.38970594 73.67224433 50.12566168 -38.10247084 -79.76181748
## [61] -25.68176035 12.41674709 110.62372586 292.74133635 112.33416628
## [66] 70.81490763 73.33670275 14.50639918 -36.93007770 -145.87071262
## [71] -138.03922743
pn_data4 <- cbind(data_model , pn_fitted4, pn_rsd4)
pn_data4
sum(pn_rsd4)
## [1] 0.000856366
The sum of the residuals in model 4 is essentially zero as well.
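Before settling on a final model, it is also worth glancing at the standard residual diagnostics for this candidate (a sketch using base R's plot method for lm objects):
par(mfrow = c(2, 2))  # 2x2 grid: residuals vs fitted, Q-Q, scale-location, leverage
plot(pn_model4)
par(mfrow = c(1, 1))  # restore the plotting layout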
# creating an index
data_index <- as.integer(rownames(data_model))
my_color <- c("predicted Value" = "green", "Dependent Value" = "blue",
"Residual Value" = "red")
Data1 %>%
ggplot(aes(data_index)) +
geom_line(aes(y = predicted1, color = "predicted Value")) +
geom_line(aes( y = data_model$TEL, color ="Dependent Value")) +
geom_line(aes( y = residual1, color ="Residual Value")) +
labs(x = "Index",
y = "Value",
title = "Positive Correlation Model 1",
color = "Legend") +
theme_bw() +
scale_color_manual(values = my_color)
### Negative Correlation Model 2
n_data2 %>%
ggplot(aes(data_index)) +
geom_line(aes(y = n_predicted2, color = "predicted Value")) +
geom_line(aes( y = data_model$TEL, color ="Dependent Value")) +
geom_line(aes( y = n_residual2, color ="Residual Value")) +
labs(x = "Index",
y = "Value",
title = "Negative Correlation Model 2",
color = "Legend") +
theme_bw() +
scale_color_manual(values = my_color)
pn_data3 %>%
ggplot(aes(data_index)) +
geom_line(aes(y = pn_fitted3, color = "predicted Value")) +
geom_line(aes( y = data_model$TEL, color ="Dependent Value")) +
geom_line(aes( y =pn_rsd3, color ="Residual Value")) +
labs(x = "Index",
y = "Value",
title = "Positive and Negative Correlation Model 3",
color = "Legend") +
theme_bw() +
scale_color_manual(values = my_color)
pn_data4 %>%
ggplot(aes(data_index)) +
geom_line(aes(y = pn_fitted4, color = "predicted Value")) +
geom_line(aes( y = data_model$TEL, color ="Dependent Value")) +
geom_line(aes( y = pn_rsd4, color ="Residual Value")) +
labs(x = "Index",
y = "Value",
title = "Positive and Negative Correlation Model 4",
color = "Legend") +
theme_bw() +
scale_color_manual(values = my_color)
The r-squared, fitted values, sum of residuals, and p-values are used to select the best model. The sum of residuals is essentially zero in every model, which is expected of a least-squares fit and confirms the fitted values were computed correctly. The r-squared values, by contrast, differ widely: some are quite high while others are only acceptable. We also check the p-values of the predictor variables in each model to see whether they are statistically significant. In model 2, one variable is not statistically significant: PHMS, whose p-value of 0.05871 is greater than 0.05. In model 3 there are two such variables, PHP with a p-value of 0.7576 and PHMS with a p-value of 0.4102, so neither PHP nor PHMS is a meaningful predictor in that regression model.
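To put those comparisons side by side, the fit statistics of all four models can be collected in one table (a sketch; AIC is an extra criterion not used above, where lower is better):
models <- list(model1 = positive_model1, model2 = negative_model2,
               model3 = pn_model3, model4 = pn_model4)
data.frame(r.squared     = sapply(models, function(m) summary(m)$r.squared),
           adj.r.squared = sapply(models, function(m) summary(m)$adj.r.squared),
           AIC           = sapply(models, AIC))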
I now conclude that the best among the four is model 4, since all of its predictor variables have statistically significant p-values and it has the higher r-squared, 0.8905 or 89.05%, of the two models that pass that test (model 1, the only other model in which every predictor is significant, reaches 0.8535 or 85.35%). The line graph for model 4 above also shows fitted values that track the dependent values very closely, with low residuals.
Model 4 has the response variable TEL (stock price of the Philippine Long Distance Telephone Company) and the predictor variables PCOMP (the Philippine Composite Index), GLO (stock price of Globe Telecom), PHIVTA (Philippine visitor travel arrivals), and 91TBILL (Philippine 91-day Treasury bill rates).
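If model 4 is refit with a data argument rather than data_model$ terms, predict() can then score new observations; the input values below are purely illustrative, not taken from the dataset:
model4 <- lm(TEL ~ PCOMP + GLO + PHIVTA + `91TBILL`, data = data_model)  # refit for predict()
new_obs <- tibble(PCOMP = 2000, GLO = 600, PHIVTA = 170000, `91TBILL` = 8.5)  # hypothetical month
predict(model4, newdata = new_obs)  # predicted TEL stock price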
data_model %>%
select(everything()) %>%
skimr::skim()
| Name | Piped data |
| Number of rows | 71 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TEL | 0 | 1 | 714.26 | 267.42 | 230.00 | 482.50 | 765.00 | 922.50 | 1230.00 | ▅▅▇▇▂ |
| PCOMP | 0 | 1 | 1547.47 | 413.54 | 993.35 | 1231.61 | 1410.07 | 1959.60 | 2486.86 | ▇▇▃▅▂ |
| GLO | 0 | 1 | 481.75 | 160.31 | 124.00 | 431.00 | 510.00 | 588.00 | 860.00 | ▃▁▇▃▁ |
| PHP | 0 | 1 | 47036.17 | 6102.25 | 37975.00 | 40491.50 | 49600.00 | 52202.50 | 55725.00 | ▇▂▂▆▆ |
| PHIVTA | 0 | 1 | 165641.89 | 23126.95 | 109803.00 | 155365.50 | 166265.00 | 175787.50 | 238316.00 | ▂▅▇▂▁ |
| PHMS | 0 | 1 | 356415.80 | 69291.73 | 235853.00 | 302385.50 | 358852.00 | 404984.00 | 514214.00 | ▆▇▇▇▁ |
| 91TBILL | 0 | 1 | 9.20 | 3.34 | 4.69 | 6.31 | 8.90 | 10.75 | 17.79 | ▆▇▂▃▁ |