library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.1 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(readr)
my_data <- read_csv("C:/Users/HP/Desktop/R Data Science/EDA-MODELING AND PRESENTATION/activity1_data.csv")
## Rows: 71 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): MONTH, 91TBILL
## dbl (6): PCOMP, TEL, GLO, PHP, PHIVTA, PHMS
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
my_data
The result above is our dataset, with 71 observations and 8 variables.
names(my_data)
## [1] "MONTH" "PCOMP" "TEL" "GLO" "PHP" "PHIVTA" "PHMS"
## [8] "91TBILL"
The result above lists the names of the 8 variables in the dataset.
sum(is.na(my_data))
## [1] 0
So there are no missing or null values in the dataset.
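A per-column check (a quick sketch using base R's colSums()) can confirm that no single variable hides missing values:
colSums(is.na(my_data))  # count of NAs in each of the 8 columns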
We saw in the preview above that the “91TBILL” variable has a character data type, so we need to make some changes to it. Before we create our models, we should convert “91TBILL” to a double data type.
data_model <- my_data %>%
  select(TEL, PCOMP, GLO, PHP, PHIVTA, PHMS, `91TBILL`) %>%
  mutate(`91TBILL` = gsub('.{1}$', '', `91TBILL`)) %>%   # drop the trailing non-numeric character
  mutate(`91TBILL` = as.double(`91TBILL`))
head(data_model)
The result above shows the first 6 observations of the new dataset. Note that the “MONTH” variable is no longer included, since it is of no use for building the model. The “91TBILL” variable is now a double, so we can use it in our models. Lastly, “TEL” sits in the first column because it will serve as the response variable for our models.
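As an aside, an equivalent construction could use readr's parse_number(), which strips any non-numeric suffix (the gsub() above assumes the suffix is exactly one character); the name data_model_alt is only illustrative:
data_model_alt <- my_data %>%
  select(TEL, PCOMP, GLO, PHP, PHIVTA, PHMS, `91TBILL`) %>%
  mutate(`91TBILL` = parse_number(`91TBILL`))  # drops non-numeric characters such as "%"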
plot(data_model, col = "blue", main = "SCATTERPLOT MATRIX")
The chart above is a scatter plot matrix showing the relationship of every pair of variables in our dataset. Since we are only interested in the relationship of “TEL” to the remaining variables, in the next chunks we make two-variable scatter plots for easier visualization.
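Since the psych package is already attached, pairs.panels() offers an annotated alternative sketch: scatter plots below the diagonal, histograms on it, and correlation coefficients above it.
pairs.panels(data_model)  # scatterplot matrix annotated with correlations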
corr.test(x = data_model)
## Call:corr.test(x = data_model)
## Correlation matrix
## TEL PCOMP GLO PHP PHIVTA PHMS 91TBILL
## TEL 1.00 0.89 -0.40 -0.80 0.51 -0.69 0.69
## PCOMP 0.89 1.00 -0.44 -0.87 0.50 -0.68 0.55
## GLO -0.40 -0.44 1.00 0.65 -0.21 0.80 -0.73
## PHP -0.80 -0.87 0.65 1.00 -0.45 0.86 -0.65
## PHIVTA 0.51 0.50 -0.21 -0.45 1.00 -0.21 0.23
## PHMS -0.69 -0.68 0.80 0.86 -0.21 1.00 -0.82
## 91TBILL 0.69 0.55 -0.73 -0.65 0.23 -0.82 1.00
## Sample Size
## [1] 71
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## TEL PCOMP GLO PHP PHIVTA PHMS 91TBILL
## TEL 0 0 0.00 0 0.00 0.00 0.00
## PCOMP 0 0 0.00 0 0.00 0.00 0.00
## GLO 0 0 0.00 0 0.16 0.00 0.00
## PHP 0 0 0.00 0 0.00 0.00 0.00
## PHIVTA 0 0 0.08 0 0.00 0.16 0.16
## PHMS 0 0 0.00 0 0.08 0.00 0.00
## 91TBILL 0 0 0.00 0 0.05 0.00 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
From the result above, the correlations of the dependent variable “TEL” with the predictors are: PCOMP 0.89, GLO -0.40, PHP -0.80, PHIVTA 0.51, PHMS -0.69, and 91TBILL 0.69.
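The same numbers can be read straight off the TEL row of base R's correlation matrix, which may be handier than scanning the full corr.test() printout:
round(cor(data_model)["TEL", ], 2)  # correlations of every variable with TEL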
data_model %>%
ggplot(aes(x = PCOMP, y = TEL, colour = PCOMP)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "THE PHILIPPINE COMPOSITE INDEX",
y = "TEL",
title = "Relationship Between TEL and PCOMP"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a strong positive relationship, since the correlation of 0.89 is fairly high.
data_model %>%
ggplot(aes(x = GLO, y = TEL, colour = GLO)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "STOCK PRICE OF GLOBE TELECOMS",
y = "TEL",
title = "Relationship Between TEL and GLO"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a negative relationship, since the correlation of -0.40 is moderately low.
data_model %>%
ggplot(aes(x = PHP, y = TEL, colour = PHP)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "US DOLLAR EXCHANGE RATE",
y = "TEL",
title = "Relationship Between TEL and PHP"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a strong negative relationship, since the correlation of -0.80 is fairly high in magnitude.
data_model %>%
ggplot(aes(x = PHIVTA, y = TEL, colour = PHIVTA)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "PHILIPPINE VISITOR TRAVEL ARRIVAL",
y = "TEL",
title = "Relationship Between TEL and PHIVTA"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a moderate positive relationship, since the correlation of 0.51 is neither very high nor very low. It may be considered low partly because of the many outliers that pull down the r value.
data_model %>%
ggplot(aes(x = PHMS, y = TEL, colour = PHMS)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "PHILIPPINE MONEY SUPLLY(MI)",
y = "TEL",
title = "Relationship Between TEL and PHMS"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The chart above shows a negative relationship, since the correlation of -0.69 is moderately high in magnitude.
data_model %>%
ggplot(aes(x = `91TBILL`, y = TEL, colour = `91TBILL`)) +
geom_point(shape = "circle", size = 1.5) +
geom_smooth(method = lm, se = F) +
scale_color_gradient() +
labs(
x = "PHILIPPINE 91-DAY TREASURYT BILL RATES",
y = "TEL",
title = "Relationship Between TEL and 91TBILL"
) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
The figure above shows a positive relationship, since the correlation of 0.69 is moderately high.
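As a more compact alternative to the six separate chunks above, all pairwise charts can be drawn in one faceted plot (a sketch using pivot_longer() to stack the predictors):
data_model %>%
  pivot_longer(-TEL, names_to = "predictor", values_to = "value") %>%
  ggplot(aes(x = value, y = TEL)) +
  geom_point(size = 1.5, colour = "steelblue") +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ predictor, scales = "free_x") +  # one panel per predictor
  labs(title = "Relationship Between TEL and Each Predictor") +
  theme_bw()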
positive_model1 <- lm(data_model$TEL ~ data_model$PCOMP + data_model$PHIVTA + data_model$`91TBILL`)
positive_model1
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$PHIVTA +
## data_model$`91TBILL`)
##
## Coefficients:
## (Intercept) data_model$PCOMP data_model$PHIVTA
## -3.832e+02 4.309e-01 1.276e-03
## data_model$`91TBILL`
## 2.384e+01
summary(positive_model1)
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$PHIVTA +
## data_model$`91TBILL`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -164.567 -82.298 1.316 62.207 278.569
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.832e+02 9.174e+01 -4.177 8.72e-05 ***
## data_model$PCOMP 4.309e-01 4.090e-02 10.536 7.42e-16 ***
## data_model$PHIVTA 1.276e-03 6.251e-04 2.041 0.0452 *
## data_model$`91TBILL` 2.384e+01 4.510e+00 5.287 1.47e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 104.6 on 67 degrees of freedom
## Multiple R-squared: 0.8535, Adjusted R-squared: 0.8469
## F-statistic: 130.1 on 3 and 67 DF, p-value: < 2.2e-16
The value of r-squared in model 1 is 0.8535; that is, the three predictor variables PCOMP, PHIVTA, and 91TBILL explain 85.35% of the variability in TEL, our response variable. An r-squared of 0.8535 (85.35%) is fairly high, so we have a model that should predict reasonably well.
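These statistics can also be pulled out of the summary object programmatically instead of being read off the printout:
s1 <- summary(positive_model1)
s1$r.squared      # multiple R-squared, 0.8535
s1$adj.r.squared  # adjusted R-squared, 0.8469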
positive_model1[["coefficients"]][["(Intercept)"]]
## [1] -383.2146
positive_model1[["coefficients"]][["data_model$PCOMP"]]
## [1] 0.4309425
positive_model1[["coefficients"]][["data_model$PHIVTA"]]
## [1] 0.001275861
positive_model1[["coefficients"]][["data_model$`91TBILL`"]]
## [1] 23.84147
predicted1 <- -383.2146 + (0.4309425 * data_model$PCOMP + 0.001275861 * data_model$PHIVTA +
23.84147 * data_model$`91TBILL`)
predicted1
## [1] 691.4312 554.5983 568.4467 482.4263 462.3039 488.1147 443.7543
## [8] 376.2159 409.9886 412.2814 398.7273 419.8613 425.3258 379.4038
## [15] 386.6462 392.2392 415.5464 432.5177 431.5487 504.2543 520.7387
## [22] 596.0019 589.8882 589.7076 544.0040 481.9450 418.7096 540.8264
## [29] 629.1610 633.2518 615.9432 636.9414 673.2192 708.7840 753.7647
## [36] 847.9549 804.4305 798.8409 583.2013 615.6338 694.4076 651.6648
## [43] 694.1776 667.4043 750.4575 797.7611 763.2542 927.1257 1023.6837
## [50] 932.3325 922.2898 937.8204 971.0146 1046.1560 1124.0014 1113.0560
## [57] 1161.8881 1027.1227 991.3715 1016.7286 1062.4082 1044.3615 917.9368
## [64] 701.8009 682.9923 883.6274 915.2217 1031.4517 1144.0442 1216.7908
## [71] 1239.5674
residual1 <- data_model$TEL - predicted1
residual1
## [1] 278.568789 200.401673 196.553301 157.573745 72.696122 41.885262
## [7] 121.245676 56.284124 -29.988616 -97.281367 -106.227324 -124.861347
## [13] -155.325826 -89.403811 -156.646202 -112.239220 -130.546443 -117.517676
## [19] -61.548669 -66.754348 -60.738658 -61.001855 -84.888161 -79.707570
## [25] -126.503998 -54.444960 -38.709590 -60.826392 -144.161010 41.748248
## [31] 109.056844 13.058642 16.780790 6.215963 41.235347 67.045123
## [37] 60.569536 6.159107 191.798675 194.366240 70.592388 18.335188
## [43] 80.822444 97.595717 -10.457463 102.238893 126.745817 22.874264
## [49] 1.316343 -92.332477 -87.289815 -52.820394 -31.014578 63.844000
## [55] 35.998550 1.943978 68.111860 -7.122699 -86.371512 -116.728615
## [61] -62.408223 -14.361465 47.063199 218.199142 42.007722 41.372571
## [67] 34.778336 -31.451652 -69.044222 -161.790809 -164.567401
Data1 <- cbind(data_model, predicted1, residual1)
Data1
sum(residual1)
## [1] -0.000749914
The sum of the residuals in model 1 is effectively zero. This is expected for any least-squares fit with an intercept, so it confirms the fitted values were computed correctly rather than measuring predictive accuracy on its own.
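Rather than retyping rounded coefficients by hand, the same predictions and residuals come directly from the lm object; as a quick cross-check:
max(abs(fitted(positive_model1) - predicted1))    # near zero; differs only by coefficient rounding
max(abs(residuals(positive_model1) - residual1))  # likewise near zero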
negative_model2 <- lm(data_model$TEL ~ data_model$GLO + data_model$PHP + data_model$PHMS)
negative_model2
##
## Call:
## lm(formula = data_model$TEL ~ data_model$GLO + data_model$PHP +
## data_model$PHMS)
##
## Coefficients:
## (Intercept) data_model$GLO data_model$PHP data_model$PHMS
## 2.416e+03 5.726e-01 -3.234e-02 -1.279e-03
summary(negative_model2)
##
## Call:
## lm(formula = data_model$TEL ~ data_model$GLO + data_model$PHP +
## data_model$PHMS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -238.85 -121.40 -38.47 91.83 514.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.416e+03 1.545e+02 15.633 < 2e-16 ***
## data_model$GLO 5.726e-01 1.958e-01 2.924 0.00471 **
## data_model$PHP -3.234e-02 5.915e-03 -5.468 7.26e-07 ***
## data_model$PHMS -1.279e-03 6.653e-04 -1.923 0.05871 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 155 on 67 degrees of freedom
## Multiple R-squared: 0.6785, Adjusted R-squared: 0.6641
## F-statistic: 47.14 on 3 and 67 DF, p-value: < 2.2e-16
The value of r-squared in model 2 is 0.6785; that is, the three predictor variables GLO, PHP, and PHMS explain 67.85% of the variability in TEL, our response variable. An r-squared of 0.6785 (67.85%) is not very high, so this model predicts only moderately well.
negative_model2[["coefficients"]][["(Intercept)"]]
## [1] 2415.641
negative_model2[["coefficients"]][["data_model$GLO"]]
## [1] 0.5725913
negative_model2[["coefficients"]][["data_model$PHMS"]]
## [1] -0.001279403
negative_model2[["coefficients"]][["data_model$PHP"]]
## [1] -0.03234167
n_predicted2 <- 2415.641 + 0.5725913 * data_model$GLO - 0.001279403 * data_model$PHMS -
  0.03234167 * data_model$PHP
n_predicted2
## [1] 455.0582 423.4010 499.3698 468.3476 424.0964 462.2850 462.5904
## [8] 470.9702 465.5918 438.7892 387.3375 407.4289 347.3204 376.8979
## [15] 394.1481 460.6403 499.1987 533.0851 563.9351 655.3897 624.6503
## [22] 633.2957 645.7771 633.0680 557.0550 543.2442 534.5210 588.8434
## [29] 604.7312 516.3169 555.5606 610.0422 572.5319 647.1779 724.5687
## [36] 697.6885 623.7806 650.1253 583.9358 850.8673 906.3764 874.7472
## [43] 935.2562 919.2418 978.8488 939.5954 951.1574 905.0265 856.4070
## [50] 909.4427 981.5351 950.9895 1009.5491 1068.8366 1083.2918 1075.8630
## [57] 1041.1627 938.7595 931.5091 979.7389 929.0070 970.8717 909.5233
## [64] 786.5273 793.3849 880.7361 877.4868 949.8623 913.7529 983.2905
## [71] 887.1030
n_residual2<- data_model$TEL - n_predicted2
n_residual2
## [1] 514.94181 331.59899 265.63023 171.65244 110.90359 67.71498
## [7] 102.40964 -38.47020 -85.59176 -123.78916 -94.83750 -112.42888
## [13] -77.32037 -86.89788 -164.14811 -180.64034 -214.19871 -218.08509
## [19] -193.93507 -217.88966 -164.65034 -98.29567 -140.77711 -123.06802
## [25] -139.55495 -115.74418 -154.52101 -108.84338 -119.73121 158.68310
## [31] 169.43941 39.95783 117.46810 67.82212 70.43128 217.31154
## [37] 241.21944 154.87468 191.06421 -40.86734 -141.37645 -204.74723
## [43] -160.25619 -154.24176 -238.84877 -39.59544 -61.15737 44.97355
## [49] 168.59298 -69.44266 -146.53515 -65.98953 -69.54913 41.16342
## [55] 76.70822 39.13698 188.83727 81.24045 -26.50912 -79.73893
## [61] 70.99296 59.12835 55.47674 133.47270 -68.38490 44.26389
## [67] 72.51320 50.13772 161.24713 71.70948 187.89699
n_data2 <- cbind(data_model, n_predicted2, n_residual2)
n_data2
sum(n_residual2)
## [1] -0.04315892
The sum of the residuals in model 2 is likewise essentially zero, as expected for a least-squares fit.
pn_model3 <- lm(data_model$TEL ~ data_model$PCOMP + data_model$GLO + data_model$PHP +
data_model$PHIVTA + data_model$PHMS + data_model$`91TBILL`)
pn_model3
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$GLO +
## data_model$PHP + data_model$PHIVTA + data_model$PHMS + data_model$`91TBILL`)
##
## Coefficients:
## (Intercept) data_model$PCOMP data_model$GLO
## -5.174e+02 3.914e-01 5.713e-01
## data_model$PHP data_model$PHIVTA data_model$PHMS
## -1.860e-03 1.549e-03 -4.326e-04
## data_model$`91TBILL`
## 3.652e+01
summary(pn_model3)
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$GLO +
## data_model$PHP + data_model$PHIVTA + data_model$PHMS + data_model$`91TBILL`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -165.40 -65.37 -17.86 56.62 268.78
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.174e+02 2.952e+02 -1.753 0.0844 .
## data_model$PCOMP 3.914e-01 5.997e-02 6.527 1.25e-08 ***
## data_model$GLO 5.713e-01 1.233e-01 4.633 1.82e-05 ***
## data_model$PHP -1.860e-03 6.001e-03 -0.310 0.7576
## data_model$PHIVTA 1.549e-03 6.027e-04 2.570 0.0125 *
## data_model$PHMS -4.326e-04 5.219e-04 -0.829 0.4102
## data_model$`91TBILL` 3.652e+01 6.317e+00 5.781 2.39e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 91.16 on 64 degrees of freedom
## Multiple R-squared: 0.8938, Adjusted R-squared: 0.8838
## F-statistic: 89.73 on 6 and 64 DF, p-value: < 2.2e-16
The value of r-squared in model 3 is 0.8938; that is, the six predictor variables PCOMP, GLO, PHP, PHIVTA, PHMS, and 91TBILL explain 89.38% of the variability in TEL, our response variable. An r-squared of 0.8938 (89.38%) is quite high, so we have a model that should predict well.
pn_fitted3 <- -517.402 + 0.3913958 * data_model$PCOMP + 0.5713458 * data_model$GLO -
  0.001859649 * data_model$PHP + 0.001548627 * data_model$PHIVTA - 0.0004326304 * data_model$PHMS +
  36.5221 * data_model$`91TBILL`
pn_fitted3
## [1] 810.1342 600.8200 649.0391 505.0524 451.6652 504.0369 421.1852
## [8] 368.7576 385.6347 388.8232 347.6912 367.6268 323.1700 271.2767
## [15] 272.5182 301.5678 348.6981 382.2264 382.7592 536.1589 537.5004
## [22] 636.7784 639.8043 631.3326 568.2978 467.6272 397.8645 547.0332
## [29] 650.3994 642.7001 633.8391 663.0484 710.9546 741.8850 816.0734
## [36] 923.6892 890.0263 901.8112 584.1258 706.7502 810.3011 732.1423
## [43] 781.9903 722.4373 830.5059 825.2548 760.0200 894.7038 981.3213
## [50] 910.9889 922.2460 917.6878 958.3382 1052.0567 1131.0787 1132.6033
## [57] 1148.6842 965.6618 936.7953 985.9763 1024.6945 1027.3843 864.7756
## [64] 651.2206 641.2508 869.9392 881.4480 977.6987 1096.8813 1185.6442
## [71] 1180.3823
pn_rsd3 <- data_model$TEL - pn_fitted3
pn_rsd3
## [1] 159.865761 154.180000 115.960883 134.947602 83.334823 25.963134
## [7] 143.814826 63.742388 -5.634700 -73.823190 -55.191170 -72.626823
## [13] -53.169974 18.723301 -42.518182 -21.567771 -63.698113 -67.226403
## [19] -12.759206 -98.658873 -77.500374 -101.778413 -134.804268 -121.332649
## [25] -150.797765 -40.127241 -17.864532 -67.033201 -165.399443 32.299922
## [31] 91.160902 -13.048406 -20.954624 -26.884993 -21.073434 -8.689236
## [37] -25.026331 -96.811250 190.874215 103.249773 -45.301128 -62.142324
## [43] -6.990295 42.562742 -90.505899 74.745192 129.980015 55.296212
## [49] 43.678700 -70.988882 -87.245964 -32.687829 -18.338209 57.943337
## [55] 28.921342 -17.603288 81.315803 54.338184 -31.795253 -85.976285
## [61] -24.694488 2.615708 100.224369 268.779437 83.749190 55.060751
## [67] 68.551979 22.301263 -21.881254 -130.644218 -105.382254
pn_data3 <- cbind(data_model , pn_fitted3, pn_rsd3)
pn_data3
sum(pn_rsd3)
## [1] 0.003618887
The sum of the residuals in model 3 is again essentially zero.
pn_model4 <- lm(data_model$TEL ~ data_model$PCOMP + data_model$GLO + data_model$PHIVTA +
data_model$`91TBILL`)
pn_model4
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$GLO +
## data_model$PHIVTA + data_model$`91TBILL`)
##
## Coefficients:
## (Intercept) data_model$PCOMP data_model$GLO
## -7.828e+02 4.378e-01 4.704e-01
## data_model$PHIVTA data_model$`91TBILL`
## 1.378e-03 3.966e+01
summary(pn_model4)
##
## Call:
## lm(formula = data_model$TEL ~ data_model$PCOMP + data_model$GLO +
## data_model$PHIVTA + data_model$`91TBILL`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -164.80 -63.46 -20.39 67.09 292.74
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.828e+02 1.163e+02 -6.730 4.88e-09 ***
## data_model$PCOMP 4.378e-01 3.565e-02 12.281 < 2e-16 ***
## data_model$GLO 4.704e-01 9.952e-02 4.727 1.24e-05 ***
## data_model$PHIVTA 1.378e-03 5.448e-04 2.528 0.0139 *
## data_model$`91TBILL` 3.966e+01 5.160e+00 7.686 9.65e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 91.12 on 66 degrees of freedom
## Multiple R-squared: 0.8905, Adjusted R-squared: 0.8839
## F-statistic: 134.2 on 4 and 66 DF, p-value: < 2.2e-16
The value of r-squared in model 4 is 0.8905; that is, the four predictor variables PCOMP, GLO, PHIVTA, and 91TBILL explain 89.05% of the variability in TEL, our response variable. An r-squared of 0.8905 (89.05%) is quite high, so we have a model that should predict well.
pn_model4[["coefficients"]][["(Intercept)"]]
## [1] -782.8472
pn_model4[["coefficients"]][["data_model$PCOMP"]]
## [1] 0.437831
pn_model4[["coefficients"]][["data_model$GLO"]]
## [1] 0.4703947
pn_model4[["coefficients"]][["data_model$PHIVTA"]]
## [1] 0.001377586
pn_model4[["coefficients"]][["data_model$`91TBILL`"]]
## [1] 39.66295
pn_fitted4 <- -782.8472 + (0.437831 * data_model$PCOMP + 0.4703947 * data_model$GLO +
0.001377586 * data_model$PHIVTA + 39.66295 * data_model$`91TBILL`)
pn_fitted4
## [1] 829.4008 621.9778 656.6593 517.2916 461.7663 509.6347 438.7308
## [8] 386.0370 405.2896 399.1766 359.3245 376.6481 346.1844 289.9296
## [15] 285.9536 308.5448 343.8319 373.3701 375.0045 519.7080 524.8927
## [22] 625.6109 628.4716 620.2825 565.7205 467.7336 394.9414 547.6876
## [29] 649.7960 649.7916 640.9357 670.1462 718.3513 739.8814 820.9865
## [36] 942.8010 911.1754 919.4129 578.1231 683.9633 780.3376 701.7430
## [43] 753.8440 696.8376 802.0328 805.1807 739.3996 898.7751 1008.2216
## [50] 909.3759 911.6164 912.6211 948.2494 1043.9730 1130.9181 1135.3897
## [57] 1156.3278 969.8743 943.1025 979.7618 1025.6818 1017.5833 854.3763
## [64] 627.2587 612.6658 854.1851 876.6633 985.4936 1111.9301 1200.8707
## [71] 1213.0392
pn_rsd4 <- data_model$TEL - pn_fitted4
pn_rsd4
## [1] 140.59923585 133.02222532 108.34065405 122.70843791 73.23367723
## [6] 20.36527011 126.26920854 46.46300955 -25.28956107 -84.17655787
## [11] -66.82451807 -81.64814400 -76.18438515 0.07037975 -55.95359648
## [16] -28.54484533 -58.83194814 -58.37005491 -5.00453179 -82.20803975
## [21] -64.89271858 -90.61090820 -123.47164610 -110.28247962 -148.22048357
## [26] -40.23362631 -14.94137428 -67.68757670 -164.79602517 25.20838015
## [31] 84.06427439 -20.14620219 -28.35129712 -24.88142098 -25.98653509
## [36] -27.80095082 -46.17536489 -114.41294358 196.87685207 126.03674098
## [41] -15.33756705 -31.74297131 21.15596255 68.16241283 -62.03275861
## [46] 94.81926457 150.60039597 51.22493200 16.77836774 -69.37587342
## [51] -76.61644866 -27.62108192 -8.24940282 66.02698956 29.08190599
## [56] -20.38970594 73.67224433 50.12566168 -38.10247084 -79.76181748
## [61] -25.68176035 12.41674709 110.62372586 292.74133635 112.33416628
## [66] 70.81490763 73.33670275 14.50639918 -36.93007770 -145.87071262
## [71] -138.03922743
pn_data4 <- cbind(data_model , pn_fitted4, pn_rsd4)
pn_data4
sum(pn_rsd4)
## [1] 0.000856366
The sum of the residuals in model 4 is essentially zero as well.
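Before settling on a final model, it is also worth glancing at the standard residual diagnostics for this candidate (a sketch using base R's plot method for lm objects):
par(mfrow = c(2, 2))  # 2x2 grid: residuals vs fitted, Q-Q, scale-location, leverage
plot(pn_model4)
par(mfrow = c(1, 1))  # restore the plotting layout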
# creating an index
data_index <- as.integer(rownames(data_model))
my_color <- c("predicted Value" = "green", "Dependent Value" = "blue",
"Residual Value" = "red")
Data1 %>%
ggplot(aes(data_index)) +
geom_line(aes(y = predicted1, color = "predicted Value")) +
geom_line(aes( y = data_model$TEL, color ="Dependent Value")) +
geom_line(aes( y = residual1, color ="Residual Value")) +
labs(x = "Index",
y = "Value",
title = "Positive Correlation Model 1",
color = "Legend") +
theme_bw() +
scale_color_manual(values = my_color)
### Negative Correlation Model 2
n_data2 %>%
ggplot(aes(data_index)) +
geom_line(aes(y = n_predicted2, color = "predicted Value")) +
geom_line(aes( y = data_model$TEL, color ="Dependent Value")) +
geom_line(aes( y = n_residual2, color ="Residual Value")) +
labs(x = "Index",
y = "Value",
title = "Negative Correlation Model 2",
color = "Legend") +
theme_bw() +
scale_color_manual(values = my_color)
pn_data3 %>%
ggplot(aes(data_index)) +
geom_line(aes(y = pn_fitted3, color = "predicted Value")) +
geom_line(aes( y = data_model$TEL, color ="Dependent Value")) +
geom_line(aes( y =pn_rsd3, color ="Residual Value")) +
labs(x = "Index",
y = "Value",
title = "Positive and Negative Correlation Model 3",
color = "Legend") +
theme_bw() +
scale_color_manual(values = my_color)
pn_data4 %>%
ggplot(aes(data_index)) +
geom_line(aes(y = pn_fitted4, color = "predicted Value")) +
geom_line(aes( y = data_model$TEL, color ="Dependent Value")) +
geom_line(aes( y = pn_rsd4, color ="Residual Value")) +
labs(x = "Index",
y = "Value",
title = "Positive and Negative Correlation Model 4",
color = "Legend") +
theme_bw() +
scale_color_manual(values = my_color)
The r-squared, fitted values, sum of residuals, and p-values are used to select the best model. The sum of residuals is essentially zero in every model, which is expected of a least-squares fit and confirms the fitted values were computed correctly. The r-squared values, by contrast, differ widely: some are quite high while others are only acceptable. We also check the p-values of the predictor variables in each model to see whether they are statistically significant. In model 2, one variable is not statistically significant: PHMS, whose p-value of 0.05871 is greater than 0.05. In model 3 there are two such variables, PHP with a p-value of 0.7576 and PHMS with a p-value of 0.4102, so neither PHP nor PHMS is a meaningful predictor in that regression model.
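To put those comparisons side by side, the fit statistics of all four models can be collected in one table (a sketch; AIC is an extra criterion not used above, where lower is better):
models <- list(model1 = positive_model1, model2 = negative_model2,
               model3 = pn_model3, model4 = pn_model4)
data.frame(r.squared     = sapply(models, function(m) summary(m)$r.squared),
           adj.r.squared = sapply(models, function(m) summary(m)$adj.r.squared),
           AIC           = sapply(models, AIC))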
I now conclude that the best among the four is model 4, since all of its predictor variables have statistically significant p-values and it has the higher r-squared, 0.8905 or 89.05%, of the two models that pass that test (model 1, the only other model in which every predictor is significant, reaches 0.8535 or 85.35%). The line graph for model 4 above also shows fitted values that track the dependent values very closely, with low residuals.
Model 4 has the response variable TEL (stock price of the Philippine Long Distance Telephone Company) and the predictor variables PCOMP (the Philippine Composite Index), GLO (stock price of Globe Telecom), PHIVTA (Philippine visitor travel arrivals), and 91TBILL (Philippine 91-day Treasury bill rates).
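If model 4 is refit with a data argument rather than data_model$ terms, predict() can then score new observations; the input values below are purely illustrative, not taken from the dataset:
model4 <- lm(TEL ~ PCOMP + GLO + PHIVTA + `91TBILL`, data = data_model)  # refit for predict()
new_obs <- tibble(PCOMP = 2000, GLO = 600, PHIVTA = 170000, `91TBILL` = 8.5)  # hypothetical month
predict(model4, newdata = new_obs)  # predicted TEL stock price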
data_model %>%
select(everything()) %>%
skimr::skim()
| Name | Piped data |
| Number of rows | 71 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TEL | 0 | 1 | 714.26 | 267.42 | 230.00 | 482.50 | 765.00 | 922.50 | 1230.00 | ▅▅▇▇▂ |
| PCOMP | 0 | 1 | 1547.47 | 413.54 | 993.35 | 1231.61 | 1410.07 | 1959.60 | 2486.86 | ▇▇▃▅▂ |
| GLO | 0 | 1 | 481.75 | 160.31 | 124.00 | 431.00 | 510.00 | 588.00 | 860.00 | ▃▁▇▃▁ |
| PHP | 0 | 1 | 47036.17 | 6102.25 | 37975.00 | 40491.50 | 49600.00 | 52202.50 | 55725.00 | ▇▂▂▆▆ |
| PHIVTA | 0 | 1 | 165641.89 | 23126.95 | 109803.00 | 155365.50 | 166265.00 | 175787.50 | 238316.00 | ▂▅▇▂▁ |
| PHMS | 0 | 1 | 356415.80 | 69291.73 | 235853.00 | 302385.50 | 358852.00 | 404984.00 | 514214.00 | ▆▇▇▇▁ |
| 91TBILL | 0 | 1 | 9.20 | 3.34 | 4.69 | 6.31 | 8.90 | 10.75 | 17.79 | ▆▇▂▃▁ |