DATA PRE-PROCESSING

library(devtools)
library(datarium)
marketing

    youtube facebook newspaper sales
1    276.12    45.36     83.04 26.52
2     53.40    47.16     54.12 12.48
3     20.64    55.08     83.16 11.16
4    181.80    49.56     70.20 22.20
5    216.96    12.96     70.08 15.48
6     10.44    58.68     90.00  8.64
7     69.00    39.36     28.20 14.16
8    144.24    23.52     13.92 15.84
9     10.32     2.52      1.20  5.76
10   239.76     3.12     25.44 12.72
11    79.32     6.96     29.04 10.32
12   257.64    28.80      4.80 20.88
13    28.56    42.12     79.08 11.04
14   117.00     9.12      8.64 11.64
15   244.92    39.48     55.20 22.80
16   234.48    57.24     63.48 26.88
17    81.36    43.92    136.80 15.00
18   337.68    47.52     66.96 29.28
19    83.04    24.60     21.96 13.56
20   176.76    28.68     22.92 17.52
21   262.08    33.24     64.08 21.60
22   284.88     6.12     28.20 15.00
23    15.84    19.08     59.52  6.72
24   273.96    20.28     31.44 18.60
25    74.76    15.12     21.96 11.64
26   315.48     4.20     23.40 14.40
27   171.48    35.16     15.12 18.00
28   288.12    20.04     27.48 19.08
29   298.56    32.52     27.48 22.68
30    84.72    19.20     48.96 12.60
31   351.48    33.96     51.84 25.68
32   135.48    20.88     46.32 14.28
33   116.64     1.80     36.00 11.52
34   318.72    24.00      0.36 20.88
35   114.84     1.68      8.88 11.40
36   348.84     4.92     10.20 15.36
37   320.28    52.56      6.00 30.48
38    89.64    59.28     54.84 17.64
39    51.72    32.04     42.12 12.12
40   273.60    45.24     38.40 25.80
41   243.00    26.76     37.92 19.92
42   212.40    40.08     46.44 20.52
43   352.32    33.24      2.16 24.84
44   248.28    10.08     31.68 15.48
45    30.12    30.84     51.96 10.20
46   210.12    27.00     37.80 17.88
47   107.64    11.88     42.84 12.72
48   287.88    49.80     22.20 27.84
49   272.64    18.96     59.88 17.76
50    80.28    14.04     44.16 11.64
51   239.76     3.72     41.52 13.68
52   120.48    11.52      4.32 12.84
53   259.68    50.04     47.52 27.12
54   219.12    55.44     70.44 25.44
55   315.24    34.56     19.08 24.24
56   238.68    59.28     72.00 28.44
57     8.76    33.72     49.68  6.60
58   163.44    23.04     19.92 15.84
59   252.96    59.52     45.24 28.56
60   252.84    35.40     11.16 22.08
61    64.20     2.40     25.68  9.72
62   313.56    51.24     65.64 29.04
63   287.16    18.60     32.76 18.84
64   123.24    35.52     10.08 16.80
65   157.32    51.36     34.68 21.60
66    82.80    11.16      1.08 11.16
67    37.80    29.52      2.64 11.40
68   167.16    17.40     12.24 16.08
69   284.88    33.00     13.20 22.68
70   260.16    52.68     32.64 26.76
71   238.92    36.72     46.44 21.96
72   131.76    17.16     38.04 14.88
73    32.16    39.60     23.16 10.56
74   155.28     6.84     37.56 13.20
75   256.08    29.52     15.72 20.40
76    20.28    52.44    107.28 10.44
77    33.00     1.92     24.84  8.28
78   144.60    34.20     17.04 17.04
79     6.48    35.88     11.28  6.36
80   139.20     9.24     27.72 13.20
81    91.68    32.04     26.76 14.16
82   287.76     4.92     44.28 14.76
83    90.36    24.36     39.00 13.56
84    82.08    53.40     42.72 16.32
85   256.20    51.60     40.56 26.04
86   231.84    22.08     78.84 18.24
87    91.56    33.00     19.20 14.40
88   132.84    48.72     75.84 19.20
89   105.96    30.60     88.08 15.48
90   131.76    57.36     61.68 20.04
91   161.16     5.88     11.16 13.44
92    34.32     1.80     39.60  8.76
93   261.24    40.20     70.80 23.28
94   301.08    43.80     86.76 26.64
95   128.88    16.80     13.08 13.80
96   195.96    37.92     63.48 20.28
97   237.12     4.20      7.08 14.04
98   221.88    25.20     26.40 18.60
99   347.64    50.76     61.44 30.48
100  162.24    50.04     55.08 20.64
101  266.88     5.16     59.76 14.04
102  355.68    43.56    121.08 28.56
103  336.24    12.12     25.68 17.76
104  225.48    20.64     21.48 17.64
105  285.84    41.16      6.36 24.84
106  165.48    55.68     70.80 23.04
107   30.00    13.20     35.64  8.64
108  108.48     0.36     27.84 10.44
109   15.72     0.48     30.72  6.36
110  306.48    32.28      6.60 23.76
111  270.96     9.84     67.80 16.08
112  290.04    45.60     27.84 26.16
113  210.84    18.48      2.88 16.92
114  251.52    24.72     12.84 19.08
115   93.84    56.16     41.40 17.52
116   90.12    42.00     63.24 15.12
117  167.04    17.16     30.72 14.64
118   91.68     0.96     17.76 11.28
119  150.84    44.28     95.04 19.08
120   23.28    19.20     26.76  7.92
121  169.56    32.16     55.44 18.60
122   22.56    26.04     60.48  8.40
123  268.80     2.88     18.72 13.92
124  147.72    41.52     14.88 18.24
125  275.40    38.76     89.04 23.64
126  104.64    14.16     31.08 12.72
127    9.36    46.68     60.72  7.92
128   96.24     0.00     11.04 10.56
129  264.36    58.80      3.84 29.64
130   71.52    14.40     51.72 11.64
131    0.84    47.52     10.44  1.92
132  318.24     3.48     51.60 15.24
133   10.08    32.64      2.52  6.84
134  263.76    40.20     54.12 23.52
135   44.28    46.32     78.72 12.96
136   57.96    56.40     10.20 13.92
137   30.72    46.80     11.16 11.40
138  328.44    34.68     71.64 24.96
139   51.60    31.08     24.60 11.52
140  221.88    52.68      2.04 24.84
141   88.08    20.40     15.48 13.08
142  232.44    42.48     90.72 23.04
143  264.60    39.84     45.48 24.12
144  125.52     6.84     41.28 12.48
145  115.44    17.76     46.68 13.68
146  168.36     2.28     10.80 12.36
147  288.12     8.76     10.44 15.84
148  291.84    58.80     53.16 30.48
149   45.60    48.36     14.28 13.08
150   53.64    30.96     24.72 12.12
151  336.84    16.68     44.40 19.32
152  145.20    10.08     58.44 13.92
153  237.12    27.96     17.04 19.92
154  205.56    47.64     45.24 22.80
155  225.36    25.32     11.40 18.72
156    4.92    13.92      6.84  3.84
157  112.68    52.20     60.60 18.36
158  179.76     1.56     29.16 12.12
159   14.04    44.28     54.24  8.76
160  158.04    22.08     41.52 15.48
161  207.00    21.72     36.84 17.28
162  102.84    42.96     59.16 15.96
163  226.08    21.72     30.72 17.88
164  196.20    44.16      8.88 21.60
165  140.64    17.64      6.48 14.28
166  281.40     4.08    101.76 14.28
167   21.48    45.12     25.92  9.60
168  248.16     6.24     23.28 14.64
169  258.48    28.32     69.12 20.52
170  341.16    12.72      7.68 18.00
171   60.00    13.92     22.08 10.08
172  197.40    25.08     56.88 17.40
173   23.52    24.12     20.40  9.12
174  202.08     8.52     15.36 14.04
175  266.88     4.08     15.72 13.80
176  332.28    58.68     50.16 32.40
177  298.08    36.24     24.36 24.24
178  204.24     9.36     42.24 14.04
179  332.04     2.76     28.44 14.16
180  198.72    12.00     21.12 15.12
181  187.92     3.12      9.96 12.60
182  262.20     6.48     32.88 14.64
183   67.44     6.84     35.64 10.44
184  345.12    51.60     86.16 31.44
185  304.56    25.56     36.00 21.12
186  246.00    54.12     23.52 27.12
187  167.40     2.52     31.92 12.36
188  229.32    34.44     21.84 20.76
189  343.20    16.68      4.44 19.08
190   22.44    14.52     28.08  8.04
191   47.40    49.32      6.96 12.96
192   90.60    12.96      7.20 11.88
193   20.64     4.92     37.92  7.08
194  200.16    50.40      4.32 23.52
195  179.64    42.72      7.20 20.76
196   45.84     4.44     16.56  9.12
197  113.04     5.88      9.72 11.64
198  212.40    11.16      7.68 15.36
199  340.32    50.40     79.44 30.60
200  278.52    10.32     10.44 16.08

 Here, the amount of sales depends on the advertising done on YouTube, Facebook and newspapers (i.e. sales is the outcome/dependent variable, and youtube, facebook and newspaper are the predictor/explanatory/independent variables).
 So, let's check the impact of each predictor variable on the outcome variable.

YOUTUBE

Checking the impact of youtube on sales

youtubeonsales<-marketing[,c("youtube","sales")]
View(youtubeonsales)

SELECTING THE MODEL

 As we need to predict a continuous outcome variable, we use a regression technique from supervised learning.
 Here, we will check whether the data are linearly or non-linearly related.

library(ggplot2)
ggplot(marketing,aes(youtube,sales))+geom_point()+geom_smooth()

`geom_smooth()` using method = 'loess'

 The plot of youtube vs sales above indicates that youtube and sales are approximately linearly related. Since we also have one predictor variable and one outcome variable, we will use simple linear regression to build our model.

BUILDING THE MODEL

model<-lm(sales~youtube,data = marketing)
model


Call:
lm(formula = sales ~ youtube, data = marketing)

Coefficients:
(Intercept)      youtube  
    8.43911      0.04754

  Here, the equation is obtained by estimating the intercept and slope,
  i.e.  y = 0.04754 * x + 8.43911       where y indicates sales
                                              x indicates youtube
  and hence the model is built.
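As a quick check of the equation, the fitted line can be evaluated by hand. The sketch below plugs one value into the rounded coefficients printed above; the names b0, b1 and predict_sales are illustrative and not part of the original analysis.

```r
# Rounded coefficients copied from the lm() output above (sales ~ youtube)
b0 <- 8.43911   # intercept
b1 <- 0.04754   # slope for youtube

# Hypothetical helper: evaluates y = b1 * x + b0
predict_sales <- function(youtube) b1 * youtube + b0

# Predicted sales for a youtube value of 123.8 (about 14.32)
predict_sales(123.8)
```

Because the coefficients are rounded, the result can differ from predict(model, ...) in the later decimal places.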

   After building/training the model we need to test it to know how well it performs. Note that every check below uses the same 200 observations the model was fitted on, not a separate test set.

TESTING THE MODEL

Checking for coefficient significance and R-squared

summary(model)


Call:
lm(formula = sales ~ youtube, data = marketing)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.0632  -2.3454  -0.2295   2.4805   8.6548 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 8.439112   0.549412   15.36   <2e-16 ***
youtube     0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.91 on 198 degrees of freedom
Multiple R-squared:  0.6119,    Adjusted R-squared:  0.6099 
F-statistic: 312.1 on 1 and 198 DF,  p-value: < 2.2e-16

 Here, as the p-values of both the intercept and the slope (youtube) are less than 0.05, we reject the null hypothesis that they are zero and conclude that the intercept (8.439112) and slope (0.047537) are not equal to zero.
 Therefore, the regression coefficients (i.e. slope and intercept) are significant.

 As R-squared is 0.6119, youtube explains 61.19% of the variance in sales.
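To make the meaning of R-squared concrete, it can be recomputed as one minus the ratio of residual to total variation. The sketch below is only an illustration on the first eight rows of the table printed above, so its value will differ from the 0.6119 of the full model; the names d, fit and r2_manual are assumptions of this sketch.

```r
# First eight rows of the marketing data printed above (a small subset)
d <- data.frame(
  youtube = c(276.12, 53.40, 20.64, 181.80, 216.96, 10.44, 69.00, 144.24),
  sales   = c(26.52, 12.48, 11.16, 22.20, 15.48, 8.64, 14.16, 15.84)
)
fit <- lm(sales ~ youtube, data = d)

rss <- sum(residuals(fit)^2)               # variation left unexplained
tss <- sum((d$sales - mean(d$sales))^2)    # total variation in sales
r2_manual <- 1 - rss / tss

# All three values agree; in simple regression R-squared equals cor(x, y)^2
c(manual  = r2_manual,
  summary = summary(fit)$r.squared,
  cor_sq  = cor(d$youtube, d$sales)^2)
```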

Checking the accuracy of a model by regression evaluation

library(DMwR)

Loading required package: lattice

Loading required package: grid

regr.eval(marketing$sales,model$fitted.values)

      mae       mse      rmse      mape 
 3.059767 15.138220  3.890787  0.205766

  The mean absolute error (mae), mean squared error (mse) and root mean squared error (rmse) are small compared with the range of sales (1.92 to 32.40), and the mean absolute percentage error (mape) is 20.58% (i.e. loosely about 80% accuracy), so we can say that the model is good.
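In case DMwR is not available on your system, the same four metrics can be computed in base R. This is a minimal sketch; reg_metrics is a hypothetical helper written for illustration, not the DMwR implementation.

```r
# Hypothetical helper mirroring the four metrics printed by regr.eval()
reg_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(mae  = mean(abs(err)),
    mse  = mean(err^2),
    rmse = sqrt(mean(err^2)),
    mape = mean(abs(err / actual)))   # undefined if any actual value is 0
}

# Usage on the first five rows printed above, predicted with the fitted line
youtube <- c(276.12, 53.40, 20.64, 181.80, 216.96)
sales   <- c(26.52, 12.48, 11.16, 22.20, 15.48)
reg_metrics(sales, 8.43911 + 0.04754 * youtube)
```

With the full data, reg_metrics(marketing$sales, fitted(model)) should give the values shown above.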

Checking the regression model assumptions on residuals

linearity

normality

homoscedasticity

independence

plot(model)

Here, in the residuals vs fitted plot, the red line lies close to the zero-residual line and is almost horizontal, and the residuals are scattered around it without any systematic pattern.

Therefore, LINEARITY IS MET on residuals.

In the normal Q-Q plot, the residuals lie almost on a straight line (but let's check normality further using other tests).

 In the scale-location plot, the residuals are scattered (i.e. the points are not clustered at one spot).

Therefore, HOMOSCEDASTICITY IS MET on residuals.

The residuals vs leverage plot shows the influential observations, which we will discuss in more detail below.

Checking for normality on residuals

  The normality of residuals can be checked by plotting a histogram, a Q-Q plot or a density plot, by comparing the mean and median, by using skewness and kurtosis, and by statistical tests (Shapiro-Wilk test, Anderson-Darling test, Kolmogorov-Smirnov test).
  Statistical tests are the most recommended.
  Here we will check using the Shapiro-Wilk test, the Anderson-Darling test, and skewness and kurtosis.

shapiro.test(model$residuals)


    Shapiro-Wilk normality test

data:  model$residuals
W = 0.99053, p-value = 0.2133

library(nortest)
ad.test(model$residuals)


    Anderson-Darling normality test

data:  model$residuals
A = 0.49121, p-value = 0.217

library(moments)
skewness(model$residuals)

[1] -0.08863202

kurtosis(model$residuals)

[1] 2.779015

 Here, the p-values of both the Shapiro-Wilk test and the Anderson-Darling test are greater than 0.05, hence we fail to reject the null hypothesis that the residuals are normally distributed.
 We also have skewness close to zero (-0.09) and kurtosis close to 3 (2.78), which is consistent with normally distributed residuals.

Therefore, NORMALITY IS MET on residuals.
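The benchmark values used above (skewness 0, kurtosis 3) refer to the moment-based definitions, where kurtosis is not reported as excess kurtosis. A minimal base-R sketch of those definitions follows; moment_shape is an illustrative helper name, and the input is simulated normal data rather than the model residuals.

```r
# Moment-based skewness and (non-excess) kurtosis
moment_shape <- function(x) {
  d  <- x - mean(x)
  m2 <- mean(d^2)   # second central moment
  m3 <- mean(d^3)   # third central moment
  m4 <- mean(d^4)   # fourth central moment
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2)
}

# Simulated normal sample: skewness should be near 0 and kurtosis near 3
set.seed(1)
moment_shape(rnorm(1000))
```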

Checking for independence of residuals

Here, we check whether the residuals are correlated (dependent) or uncorrelated (independent) using the Durbin-Watson test.

library(car)
durbinWatsonTest(model)

 lag Autocorrelation D-W Statistic p-value
   1      0.02342385      1.934689   0.642
 Alternative hypothesis: rho != 0

  Here, the p-value is greater than 0.05, so we fail to reject the null hypothesis that there is no autocorrelation among the residuals (i.e. the residuals are independent).

Therefore, INDEPENDENCE IS MET on residuals.
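The D-W statistic reported above can be reproduced from the residuals: it is the sum of squared successive differences divided by the sum of squared residuals, and values near 2 indicate little first-order autocorrelation. A minimal base-R sketch follows; dw_statistic is an illustrative helper name, and the data are only the first eight rows of the table printed above, so the value will not match 1.93.

```r
# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the sum of squared residuals
dw_statistic <- function(e) sum(diff(e)^2) / sum(e^2)

# Illustration on the first eight rows of the marketing data
d <- data.frame(
  youtube = c(276.12, 53.40, 20.64, 181.80, 216.96, 10.44, 69.00, 144.24),
  sales   = c(26.52, 12.48, 11.16, 22.20, 15.48, 8.64, 14.16, 15.84)
)
dw_statistic(residuals(lm(sales ~ youtube, data = d)))
```

With the full model, dw_statistic(residuals(model)) should match the D-W Statistic column printed by durbinWatsonTest().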

We also run additional checks: an outlier test and a check for influential observations.

Outlier test

Here, we check for the presence of outliers.

outlierTest(model)


No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
     rstudent unadjusted p-value Bonferonni p
179 -2.633499          0.0091219           NA

boxplot(model$residuals)

 In the outlier test we see 179, which indicates that the 179th observation has the largest studentized residual; as no Bonferroni p-value is below 0.05, it is not flagged as a significant outlier. In the boxplot we do see outliers. We could remove them to reduce the error and increase the accuracy of the model, but we will not do so.

Influential observations

 Here, we check for influential observations using Cook's distance.
 Any observation whose Cook's distance is much larger than the rest is referred to as an influential observation. These observations can distort the fitted model.

plot(model,4)

 Here, the 179th and 36th observations have the largest Cook's distances and are influential observations.
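Besides reading the plot, the Cook's distances can be listed numerically. The sketch below flags observations above 4/n, a common rule of thumb that is an assumption of this sketch and not a cut-off used in the original analysis. It runs on the first ten rows of the table printed above, so it will not reproduce the 179th and 36th observations.

```r
# First ten rows of the marketing data printed above (a small subset)
d <- data.frame(
  youtube = c(276.12, 53.40, 20.64, 181.80, 216.96,
              10.44, 69.00, 144.24, 10.32, 239.76),
  sales   = c(26.52, 12.48, 11.16, 22.20, 15.48,
              8.64, 14.16, 15.84, 5.76, 12.72)
)
fit <- lm(sales ~ youtube, data = d)

cd <- cooks.distance(fit)      # one distance per observation
threshold <- 4 / nrow(d)       # rule-of-thumb cut-off (assumption)

# Observations above the cut-off, largest first
sort(cd[cd > threshold], decreasing = TRUE)
```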

 Hence, the model is ready to deploy.

DEPLOY THE MODEL & PREDICT THE OUTCOMES

  Let's predict the amount of sales for the following youtube values.

you_tube<-data.frame(youtube=c(123.8,67,239,598,787.12))
you_tube

  youtube
1  123.80
2   67.00
3  239.00
4  598.00
5  787.12

pred_sales<-predict(model,you_tube)
you_tube$sales<-pred_sales
you_tube

  youtube    sales
1  123.80 14.32415
2   67.00 11.62407
3  239.00 19.80037
4  598.00 36.86602
5  787.12 45.85615

  These are the outcomes (sales) given by the model we developed for the given predictor values (youtube).
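The last two inputs above (598 and 787.12) are larger than any youtube value in the printed data (maximum 355.68), so those two predictions are extrapolations beyond the range the line was fitted on. A minimal sketch for flagging such inputs follows; the names obs_range, new_youtube and outside are illustrative.

```r
# Observed range of youtube in the data printed above
obs_range <- c(min = 0.84, max = 355.68)

# The five values used for prediction
new_youtube <- c(123.8, 67, 239, 598, 787.12)

# TRUE where the input lies outside the observed range
outside <- new_youtube < obs_range["min"] | new_youtube > obs_range["max"]
data.frame(youtube = new_youtube, extrapolated = outside)
```

With the data loaded, range(marketing$youtube) gives the same limits without typing them in.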

In a similar manner we can build models of sales on facebook and sales on newspaper.

FACEBOOK

Checking the impact of facebook on sales

fbonsales<-marketing[,c("facebook","sales")]
View(fbonsales)

SELECTING THE MODEL

 As we need to predict a continuous outcome variable, we use a regression technique from supervised learning.
 Here, we will check whether the data are linearly related or not.

library(ggplot2)
ggplot(marketing,aes(facebook,sales))+geom_point()+geom_smooth()

`geom_smooth()` using method = 'loess'

 The plot of facebook vs sales above indicates that facebook and sales are approximately linearly related. Since we also have one predictor variable and one outcome variable, we will use simple linear regression to build our model.

BUILDING THE MODEL

model<-lm(sales~facebook,data = marketing)
model


Call:
lm(formula = sales ~ facebook, data = marketing)

Coefficients:
(Intercept)     facebook  
    11.1740       0.2025

  Here, the equation is obtained by estimating the intercept and slope,
  i.e.  y = 0.2025 * x + 11.1740        where y indicates sales
                                              x indicates facebook
  and hence the model is built.

   After building/training the model we need to test it to know how well it performs. As before, the checks below use the same 200 observations the model was fitted on.

TESTING THE MODEL

Checking for coefficient significance and R-squared

summary(model)


Call:
lm(formula = sales ~ facebook, data = marketing)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.8766  -2.5589   0.9248   3.3330   9.8173 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.17397    0.67548  16.542   <2e-16 ***
facebook     0.20250    0.02041   9.921   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.13 on 198 degrees of freedom
Multiple R-squared:  0.332, Adjusted R-squared:  0.3287 
F-statistic: 98.42 on 1 and 198 DF,  p-value: < 2.2e-16

 Here, as the p-values of both the intercept and the slope are less than 0.05, we reject the null hypothesis that they are zero and conclude that the intercept (11.17397) and slope (0.2025) are not equal to zero.
 Therefore, the regression coefficients (i.e. slope and intercept) are significant.

 As R-squared is 0.332, facebook explains 33.2% of the variance in sales.

Checking the accuracy of a model by regression evaluation

library(DMwR)
regr.eval(marketing$sales,model$fitted.values)

       mae        mse       rmse       mape 
 3.9842626 26.0530528  5.1042191  0.3381669

  The mean absolute error (mae), mean squared error (mse) and root mean squared error (rmse) are all higher than for the youtube model, and the mean absolute percentage error (mape) is 33.82% (i.e. loosely about 66% accuracy), so the model is acceptable but fits less well than the youtube model.

Checking the regression model assumptions on residuals

linearity

normality

homoscedasticity

independence

plot(model)

Here, in the residuals vs fitted plot, the red line lies close to the zero-residual line and is almost horizontal, and the residuals are scattered around it without any systematic pattern.

Therefore, LINEARITY IS MET on residuals.

In the normal Q-Q plot, the residuals lie almost on a straight line (but let's check normality further using other tests).

 In the scale-location plot, the residuals are scattered (i.e. the points are not clustered at one spot).

Therefore, HOMOSCEDASTICITY IS MET on residuals.

The residuals vs leverage plot shows the influential observations, which we will discuss in more detail below.

Checking for normality on residuals

shapiro.test(model$residuals)


    Shapiro-Wilk normality test

data:  model$residuals
W = 0.96072, p-value = 2.367e-05

library(nortest)
ad.test(model$residuals)


    Anderson-Darling normality test

data:  model$residuals
A = 2.439, p-value = 3.467e-06

library(moments)
skewness(model$residuals)

[1] -0.7636953

kurtosis(model$residuals)

[1] 3.544281

plot(density(model$residuals))

qqnorm(model$residuals)

 Here, the p-values of both the Shapiro-Wilk test and the Anderson-Darling test are less than 0.05, hence we reject the null hypothesis and conclude that the residuals are not normally distributed.
 The skewness (-0.76) is clearly below zero and the kurtosis (3.54) is somewhat above 3, which also points to a left-skewed distribution rather than a normal one.
When we observe the density plot and Q-Q plot, the residuals look only roughly normal.

Therefore, NORMALITY IS HARDLY MET on residuals.

Checking for independence of residuals

library(car)
durbinWatsonTest(model)

 lag Autocorrelation D-W Statistic p-value
   1      0.02274019      1.945713   0.704
 Alternative hypothesis: rho != 0

  Here, the p-value is greater than 0.05, so we fail to reject the null hypothesis that there is no autocorrelation among the residuals (i.e. the residuals are independent).

Therefore, INDEPENDENCE IS MET on residuals.

Additional tests

Outlier test

outlierTest(model)

     rstudent unadjusted p-value Bonferonni p
131 -3.825537          0.0001751     0.035019

boxplot(model$residuals)

 In the outlier test we see 131, which indicates that the 131st observation has the largest studentized residual; its Bonferroni p-value (0.035) is below 0.05, so it is a significant outlier. The boxplot also shows outliers. We could remove them to reduce the error and increase the accuracy of the model, but we will not do so.

Influential observations

plot(model,4)

 Here, the 131st and 6th observations have the largest Cook's distances and are influential observations.

 Hence, the model is ready to deploy.

DEPLOY THE MODEL & PREDICT THE OUTCOMES

  Let's predict the amount of sales for the following facebook values.

fb<-data.frame(facebook=c(123.8,67,239,598,787.12))
fb

pred_sales<-predict(model,fb)
fb$sales<-pred_sales
fb

  facebook     sales
1   123.80  36.24294
2    67.00  24.74118
3   239.00  59.57046
4   598.00 132.26644
5   787.12 170.56245

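  These are the outcomes (sales) given by the model we developed for the given predictor values (facebook). Note that all five input values are larger than the largest facebook value in the data (59.52), so these predictions are extrapolations and should be read with caution.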

NEWSPAPER

Checking the impact of newspaper on sales

newsonsales<-marketing[,c("newspaper","sales")]
View(newsonsales)

SELECTING THE MODEL

 As we need to predict a continuous outcome variable, we use a regression technique from supervised learning.
 Here, we will check whether the data are linearly related or not.

library(ggplot2)
ggplot(marketing,aes(newspaper,sales))+geom_point()+geom_smooth()

`geom_smooth()` using method = 'loess'

 The plot of newspaper vs sales above indicates that newspaper and sales are not linearly related. However, since we have one predictor variable and one outcome variable, we will still use simple linear regression to build our model.

BUILDING THE MODEL

model<-lm(sales~newspaper,data = marketing)
model


Call:
lm(formula = sales ~ newspaper, data = marketing)

Coefficients:
(Intercept)    newspaper  
   14.82169      0.05469

  Here, the equation is obtained by estimating the intercept and slope,
  i.e.  y = 0.05469 * x + 14.82169      where y indicates sales
                                              x indicates newspaper
  and hence the model is built.

TESTING THE MODEL

Checking for coefficient significance and R-squared

summary(model)


Call:
lm(formula = sales ~ newspaper, data = marketing)

Residuals:
    Min      1Q  Median      3Q     Max 
-13.473  -4.065  -1.007   4.207  15.330 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 14.82169    0.74570   19.88  < 2e-16 ***
newspaper    0.05469    0.01658    3.30  0.00115 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.111 on 198 degrees of freedom
Multiple R-squared:  0.05212,   Adjusted R-squared:  0.04733 
F-statistic: 10.89 on 1 and 198 DF,  p-value: 0.001148

 Here, as the p-values of both the intercept and the slope are less than 0.05, we reject the null hypothesis that they are zero and conclude that the intercept (14.82169) and slope (0.05469) are not equal to zero.
 Therefore, the regression coefficients (i.e. slope and intercept) are significant.

 As R-squared is 0.05212, newspaper explains only about 5% of the variance in sales.

Checking the accuracy of a model by regression evaluation

library(DMwR)
regr.eval(marketing$sales,model$fitted.values)

       mae        mse       rmse       mape 
 4.9758717 36.9705927  6.0803448  0.3860048

  Here the mean absolute error (mae), mean squared error (mse), root mean squared error (rmse) and mean absolute percentage error (mape, 38.6%, i.e. loosely about 62% accuracy) are all higher than for the youtube and facebook models, so the newspaper model fits the data least well of the three.

Checking the regression model assumptions on residuals

linearity

normality

homoscedasticity

independence

plot(model)

Here, in the residuals vs fitted plot, the red line is roughly horizontal and the residuals are scattered around it.

Therefore, LINEARITY IS NEARLY MET on residuals.

In the normal Q-Q plot, the residuals lie almost on a straight line (but let's check normality further using other tests).

 In the scale-location plot, the residuals are scattered (i.e. the points are not clustered at one spot).

Therefore, HOMOSCEDASTICITY IS MET on residuals.

The residuals vs leverage plot shows the influential observations, which we will discuss in more detail below.

Checking for normality on residuals

shapiro.test(model$residuals)


    Shapiro-Wilk normality test

data:  model$residuals
W = 0.98197, p-value = 0.0114

library(nortest)
ad.test(model$residuals)


    Anderson-Darling normality test

data:  model$residuals
A = 1.1601, p-value = 0.004848

library(moments)
skewness(model$residuals)

[1] 0.3295549

kurtosis(model$residuals)

[1] 2.527205

plot(density(model$residuals))

qqnorm(model$residuals)

 Here, the p-values of both the Shapiro-Wilk test and the Anderson-Darling test are less than 0.05, hence we reject the null hypothesis and conclude that the residuals are not normally distributed.
 The skewness (0.33) is moderately above zero and the kurtosis (2.53) is below 3, so these measures show only a mild departure from normality.
When we observe the density plot and Q-Q plot, the residuals look roughly normal.

Therefore, NORMALITY IS HARDLY MET on residuals.

Checking for independence of residuals

library(car)
durbinWatsonTest(model)

 lag Autocorrelation D-W Statistic p-value
   1     0.004787825      1.983434   0.914
 Alternative hypothesis: rho != 0

  Here, the p-value is greater than 0.05, so we fail to reject the null hypothesis that there is no autocorrelation among the residuals (i.e. the residuals are independent).

Therefore, INDEPENDENCE IS MET on residuals.

Additional tests

Outlier test

outlierTest(model)


No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
   rstudent unadjusted p-value Bonferonni p
37 2.558821           0.011254           NA

boxplot(model$residuals)

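 In the outlier test we see 37, which indicates that the 37th observation has the largest studentized residual; as no Bonferroni p-value is below 0.05, it is not flagged as a significant outlier. The boxplot does not show any outliers either.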

Influential observations

plot(model,4)

 Here, the 17th and 76th observations have the largest Cook's distances and are influential observations.

 Hence, the model is ready to deploy.

DEPLOY THE MODEL & PREDICT THE OUTCOMES

  Let's predict the amount of sales for the following newspaper values.

newsp<-data.frame(newspaper=c(123.8,67,239,598,787.12))
newsp

  newspaper
1    123.80
2     67.00
3    239.00
4    598.00
5    787.12

pred_sales<-predict(model,newsp)
newsp$sales<-pred_sales
newsp

  newspaper    sales
1    123.80 21.59269
2     67.00 18.48613
3    239.00 27.89334
4    598.00 47.52816
5    787.12 57.87172

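  These are the outcomes (sales) given by the model we developed for the given predictor values (newspaper). Note that three of the five input values (239, 598 and 787.12) are larger than the largest newspaper value in the data (136.80), so those predictions are extrapolations.

Comparing the outcomes of sales from youtube, facebook and newspaper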

market<-data.frame(c(you_tube,fb,newsp))
market

  youtube    sales facebook   sales.1 newspaper  sales.2
1  123.80 14.32415   123.80  36.24294    123.80 21.59269
2   67.00 11.62407    67.00  24.74118     67.00 18.48613
3  239.00 19.80037   239.00  59.57046    239.00 27.89334
4  598.00 36.86602   598.00 132.26644    598.00 47.52816
5  787.12 45.85615   787.12 170.56245    787.12 57.87172

mean(market[,"sales"])

[1] 25.69415

mean(market[,"sales.1"])

[1] 84.6767

mean(market[,"sales.2"])

[1] 34.67441

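  Here, the average predicted sales from youtube is 25.69,
        the average predicted sales from facebook is 84.68 and
        the average predicted sales from newspaper is 34.67, so facebook shows higher predicted sales than youtube and newspaper. But this is not enough to compare the models: the same five input values were used for every channel, and most of them lie far outside the observed facebook and newspaper ranges. Many other factors (like the bias-variance trade-off) still have to be taken into consideration for an accurate analysis.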
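One complementary way to compare the three models is to put their fit statistics side by side instead of their mean predictions. The sketch below only collects the R-squared and rmse values already reported in the sections above; the data frame name fit_summary is illustrative.

```r
# Fit statistics copied from the summary() and regr.eval() outputs above
fit_summary <- data.frame(
  channel   = c("youtube", "facebook", "newspaper"),
  r_squared = c(0.6119, 0.332, 0.05212),
  rmse      = c(3.890787, 5.1042191, 6.0803448)
)

# Order from best to worst fit (highest R-squared first)
fit_summary[order(-fit_summary$r_squared), ]
```

On these numbers youtube has the highest R-squared and the lowest rmse, and newspaper the lowest R-squared and the highest rmse.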

MODEL BUILDING ON MARKETING

shekar

6 June 2018
