library(devtools)
library(datarium)
marketing
youtube facebook newspaper sales
1 276.12 45.36 83.04 26.52
2 53.40 47.16 54.12 12.48
3 20.64 55.08 83.16 11.16
4 181.80 49.56 70.20 22.20
5 216.96 12.96 70.08 15.48
6 10.44 58.68 90.00 8.64
7 69.00 39.36 28.20 14.16
8 144.24 23.52 13.92 15.84
9 10.32 2.52 1.20 5.76
10 239.76 3.12 25.44 12.72
11 79.32 6.96 29.04 10.32
12 257.64 28.80 4.80 20.88
13 28.56 42.12 79.08 11.04
14 117.00 9.12 8.64 11.64
15 244.92 39.48 55.20 22.80
16 234.48 57.24 63.48 26.88
17 81.36 43.92 136.80 15.00
18 337.68 47.52 66.96 29.28
19 83.04 24.60 21.96 13.56
20 176.76 28.68 22.92 17.52
21 262.08 33.24 64.08 21.60
22 284.88 6.12 28.20 15.00
23 15.84 19.08 59.52 6.72
24 273.96 20.28 31.44 18.60
25 74.76 15.12 21.96 11.64
26 315.48 4.20 23.40 14.40
27 171.48 35.16 15.12 18.00
28 288.12 20.04 27.48 19.08
29 298.56 32.52 27.48 22.68
30 84.72 19.20 48.96 12.60
31 351.48 33.96 51.84 25.68
32 135.48 20.88 46.32 14.28
33 116.64 1.80 36.00 11.52
34 318.72 24.00 0.36 20.88
35 114.84 1.68 8.88 11.40
36 348.84 4.92 10.20 15.36
37 320.28 52.56 6.00 30.48
38 89.64 59.28 54.84 17.64
39 51.72 32.04 42.12 12.12
40 273.60 45.24 38.40 25.80
41 243.00 26.76 37.92 19.92
42 212.40 40.08 46.44 20.52
43 352.32 33.24 2.16 24.84
44 248.28 10.08 31.68 15.48
45 30.12 30.84 51.96 10.20
46 210.12 27.00 37.80 17.88
47 107.64 11.88 42.84 12.72
48 287.88 49.80 22.20 27.84
49 272.64 18.96 59.88 17.76
50 80.28 14.04 44.16 11.64
51 239.76 3.72 41.52 13.68
52 120.48 11.52 4.32 12.84
53 259.68 50.04 47.52 27.12
54 219.12 55.44 70.44 25.44
55 315.24 34.56 19.08 24.24
56 238.68 59.28 72.00 28.44
57 8.76 33.72 49.68 6.60
58 163.44 23.04 19.92 15.84
59 252.96 59.52 45.24 28.56
60 252.84 35.40 11.16 22.08
61 64.20 2.40 25.68 9.72
62 313.56 51.24 65.64 29.04
63 287.16 18.60 32.76 18.84
64 123.24 35.52 10.08 16.80
65 157.32 51.36 34.68 21.60
66 82.80 11.16 1.08 11.16
67 37.80 29.52 2.64 11.40
68 167.16 17.40 12.24 16.08
69 284.88 33.00 13.20 22.68
70 260.16 52.68 32.64 26.76
71 238.92 36.72 46.44 21.96
72 131.76 17.16 38.04 14.88
73 32.16 39.60 23.16 10.56
74 155.28 6.84 37.56 13.20
75 256.08 29.52 15.72 20.40
76 20.28 52.44 107.28 10.44
77 33.00 1.92 24.84 8.28
78 144.60 34.20 17.04 17.04
79 6.48 35.88 11.28 6.36
80 139.20 9.24 27.72 13.20
81 91.68 32.04 26.76 14.16
82 287.76 4.92 44.28 14.76
83 90.36 24.36 39.00 13.56
84 82.08 53.40 42.72 16.32
85 256.20 51.60 40.56 26.04
86 231.84 22.08 78.84 18.24
87 91.56 33.00 19.20 14.40
88 132.84 48.72 75.84 19.20
89 105.96 30.60 88.08 15.48
90 131.76 57.36 61.68 20.04
91 161.16 5.88 11.16 13.44
92 34.32 1.80 39.60 8.76
93 261.24 40.20 70.80 23.28
94 301.08 43.80 86.76 26.64
95 128.88 16.80 13.08 13.80
96 195.96 37.92 63.48 20.28
97 237.12 4.20 7.08 14.04
98 221.88 25.20 26.40 18.60
99 347.64 50.76 61.44 30.48
100 162.24 50.04 55.08 20.64
101 266.88 5.16 59.76 14.04
102 355.68 43.56 121.08 28.56
103 336.24 12.12 25.68 17.76
104 225.48 20.64 21.48 17.64
105 285.84 41.16 6.36 24.84
106 165.48 55.68 70.80 23.04
107 30.00 13.20 35.64 8.64
108 108.48 0.36 27.84 10.44
109 15.72 0.48 30.72 6.36
110 306.48 32.28 6.60 23.76
111 270.96 9.84 67.80 16.08
112 290.04 45.60 27.84 26.16
113 210.84 18.48 2.88 16.92
114 251.52 24.72 12.84 19.08
115 93.84 56.16 41.40 17.52
116 90.12 42.00 63.24 15.12
117 167.04 17.16 30.72 14.64
118 91.68 0.96 17.76 11.28
119 150.84 44.28 95.04 19.08
120 23.28 19.20 26.76 7.92
121 169.56 32.16 55.44 18.60
122 22.56 26.04 60.48 8.40
123 268.80 2.88 18.72 13.92
124 147.72 41.52 14.88 18.24
125 275.40 38.76 89.04 23.64
126 104.64 14.16 31.08 12.72
127 9.36 46.68 60.72 7.92
128 96.24 0.00 11.04 10.56
129 264.36 58.80 3.84 29.64
130 71.52 14.40 51.72 11.64
131 0.84 47.52 10.44 1.92
132 318.24 3.48 51.60 15.24
133 10.08 32.64 2.52 6.84
134 263.76 40.20 54.12 23.52
135 44.28 46.32 78.72 12.96
136 57.96 56.40 10.20 13.92
137 30.72 46.80 11.16 11.40
138 328.44 34.68 71.64 24.96
139 51.60 31.08 24.60 11.52
140 221.88 52.68 2.04 24.84
141 88.08 20.40 15.48 13.08
142 232.44 42.48 90.72 23.04
143 264.60 39.84 45.48 24.12
144 125.52 6.84 41.28 12.48
145 115.44 17.76 46.68 13.68
146 168.36 2.28 10.80 12.36
147 288.12 8.76 10.44 15.84
148 291.84 58.80 53.16 30.48
149 45.60 48.36 14.28 13.08
150 53.64 30.96 24.72 12.12
151 336.84 16.68 44.40 19.32
152 145.20 10.08 58.44 13.92
153 237.12 27.96 17.04 19.92
154 205.56 47.64 45.24 22.80
155 225.36 25.32 11.40 18.72
156 4.92 13.92 6.84 3.84
157 112.68 52.20 60.60 18.36
158 179.76 1.56 29.16 12.12
159 14.04 44.28 54.24 8.76
160 158.04 22.08 41.52 15.48
161 207.00 21.72 36.84 17.28
162 102.84 42.96 59.16 15.96
163 226.08 21.72 30.72 17.88
164 196.20 44.16 8.88 21.60
165 140.64 17.64 6.48 14.28
166 281.40 4.08 101.76 14.28
167 21.48 45.12 25.92 9.60
168 248.16 6.24 23.28 14.64
169 258.48 28.32 69.12 20.52
170 341.16 12.72 7.68 18.00
171 60.00 13.92 22.08 10.08
172 197.40 25.08 56.88 17.40
173 23.52 24.12 20.40 9.12
174 202.08 8.52 15.36 14.04
175 266.88 4.08 15.72 13.80
176 332.28 58.68 50.16 32.40
177 298.08 36.24 24.36 24.24
178 204.24 9.36 42.24 14.04
179 332.04 2.76 28.44 14.16
180 198.72 12.00 21.12 15.12
181 187.92 3.12 9.96 12.60
182 262.20 6.48 32.88 14.64
183 67.44 6.84 35.64 10.44
184 345.12 51.60 86.16 31.44
185 304.56 25.56 36.00 21.12
186 246.00 54.12 23.52 27.12
187 167.40 2.52 31.92 12.36
188 229.32 34.44 21.84 20.76
189 343.20 16.68 4.44 19.08
190 22.44 14.52 28.08 8.04
191 47.40 49.32 6.96 12.96
192 90.60 12.96 7.20 11.88
193 20.64 4.92 37.92 7.08
194 200.16 50.40 4.32 23.52
195 179.64 42.72 7.20 20.76
196 45.84 4.44 16.56 9.12
197 113.04 5.88 9.72 11.64
198 212.40 11.16 7.68 15.36
199 340.32 50.40 79.44 30.60
200 278.52 10.32 10.44 16.08
Here, sales depends on the advertising budgets spent on YouTube, Facebook and newspaper (i.e., sales is the outcome/dependent variable, and youtube, facebook and newspaper are the predictor/explanatory/independent variables).
So, let's check the impact of each predictor variable on the outcome variable separately.
youtubeonsales<-marketing[,c("youtube","sales")]
View(youtubeonsales)
Since we need to predict a continuous outcome variable, we use regression, a supervised learning technique.
Here, we check whether the relationship is linear or non-linear.
library(ggplot2)
ggplot(marketing,aes(youtube,sales))+geom_point()+geom_smooth()
`geom_smooth()` using method = 'loess'
The plot of youtube vs sales indicates that youtube and sales are approximately linearly related. Since we also have one predictor variable and one outcome variable, we use simple linear regression to build our model.
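As a numeric companion to the scatter plot, the Pearson correlation between each channel and sales summarizes the strength of the linear association. This is a minimal sketch, assuming the `marketing` data from datarium; if datarium is not installed, it falls back to the first ten rows of the table printed above, which is far too small for real conclusions and only keeps the sketch runnable.

```r
# Pearson correlation of each predictor with sales, as a numeric check
# of the linear trend seen in the scatter plot.
if (requireNamespace("datarium", quietly = TRUE)) {
  data("marketing", package = "datarium")
} else {
  # Fallback (assumption): first ten rows of the table printed above.
  marketing <- data.frame(
    youtube   = c(276.12, 53.40, 20.64, 181.80, 216.96, 10.44, 69.00, 144.24, 10.32, 239.76),
    facebook  = c(45.36, 47.16, 55.08, 49.56, 12.96, 58.68, 39.36, 23.52, 2.52, 3.12),
    newspaper = c(83.04, 54.12, 83.16, 70.20, 70.08, 90.00, 28.20, 13.92, 1.20, 25.44),
    sales     = c(26.52, 12.48, 11.16, 22.20, 15.48, 8.64, 14.16, 15.84, 5.76, 12.72)
  )
}

predictors <- c("youtube", "facebook", "newspaper")
# r lies in [-1, 1]; values near +1 mean a strong increasing linear trend.
r <- sapply(predictors, function(p) cor(marketing[[p]], marketing$sales))
print(round(r, 3))
```

For a simple regression, the square of r equals the model's R-squared, so this also previews how much of the variation in sales each channel can explain on its own.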
model<-lm(sales~youtube,data = marketing)
model
Call:
lm(formula = sales ~ youtube, data = marketing)
Coefficients:
(Intercept) youtube
8.43911 0.04754
Here, the fitted equation consists of the intercept and the slope:
sales = 8.43911 + 0.04754 * youtube
i.e., each additional unit of YouTube budget is associated with an increase of about 0.0475 units of sales.
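For a single predictor, the least-squares slope and intercept have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below checks this against lm() on simulated data (made-up numbers, not the marketing data; the helper name `ols_simple` is ours) and then plugs one YouTube budget into the rounded equation above.

```r
# Closed-form least-squares estimates for simple linear regression.
ols_simple <- function(x, y) {
  slope <- cov(x, y) / var(x)
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

set.seed(42)
x <- runif(50, 0, 300)                 # hypothetical budgets
y <- 8 + 0.05 * x + rnorm(50, sd = 4)  # hypothetical sales

manual <- ols_simple(x, y)
fit <- lm(y ~ x)
print(manual)
print(coef(fit))                       # same values up to floating-point error

# Rounded YouTube equation from above, evaluated at a budget of 123.8:
8.43911 + 0.04754 * 123.8              # about 14.32; predict() gives 14.32415 with unrounded coefficients
```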
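After building (training) the model, we evaluate how well it fits. Note that the checks below use the same data the model was trained on, so they measure in-sample fit rather than performance on unseen test data.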
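To estimate performance on data the model has not seen, one option is a random hold-out split. This is a minimal sketch, assuming an 80/20 split and a fixed seed (both arbitrary choices) and using simulated data that loosely mimics the youtube fit, so it runs without datarium; the helper name `holdout_rmse` is ours. With the real data, pass `marketing` instead of `dat`.

```r
# Fit on a random training subset and report RMSE on the held-out rows.
holdout_rmse <- function(dat, formula, train_frac = 0.8, seed = 1) {
  set.seed(seed)                              # makes the split reproducible
  n <- nrow(dat)
  train_idx <- sample(n, size = floor(train_frac * n))
  fit <- lm(formula, data = dat[train_idx, ])
  test <- dat[-train_idx, ]
  pred <- predict(fit, newdata = test)
  actual <- test[[all.vars(formula)[1]]]      # outcome = variable left of ~
  sqrt(mean((actual - pred)^2))
}

# Simulated stand-in for the marketing data (illustration only).
set.seed(123)
dat <- data.frame(youtube = runif(200, 0, 350))
dat$sales <- 8.4 + 0.048 * dat$youtube + rnorm(200, sd = 3.9)

holdout_rmse(dat, sales ~ youtube)
```

A single split is noisy; repeating it with different seeds, or using k-fold cross-validation, gives a more stable estimate.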
summary(model)
Call:
lm(formula = sales ~ youtube, data = marketing)
Residuals:
Min 1Q Median 3Q Max
-10.0632 -2.3454 -0.2295 2.4805 8.6548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.439112 0.549412 15.36 <2e-16 ***
youtube 0.047537 0.002691 17.67 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.91 on 198 degrees of freedom
Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
Here, the p-values of both the intercept and the slope (youtube) are below 0.05, so we reject the null hypotheses that the intercept (8.439112) and the slope (0.047537) are equal to zero.
Therefore, the regression coefficients (i.e., slope and intercept) are statistically significant.
The R-squared of 0.6119 means that youtube explains about 61.19% of the variance in sales.
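R-squared is the share of the total variation in sales accounted for by the fitted line, 1 - SS_res / SS_tot, and for a single predictor it equals the squared Pearson correlation. A short check on simulated data (illustrative values, not the marketing data):

```r
set.seed(7)
x <- runif(100, 0, 300)
y <- 8 + 0.05 * x + rnorm(100, sd = 4)
fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)   # variation left unexplained by the line
ss_tot <- sum((y - mean(y))^2)    # total variation around the mean
r2_manual <- 1 - ss_res / ss_tot

# All three agree for a simple (one-predictor) regression.
c(manual = r2_manual, lm = summary(fit)$r.squared, cor_sq = cor(x, y)^2)
```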
library(DMwR)
Loading required package: lattice
Loading required package: grid
regr.eval(marketing$sales,model$fitted.values)
mae mse rmse mape
3.059767 15.138220 3.890787 0.205766
The errors are moderate relative to the scale of sales: MAE 3.06, MSE 15.14, RMSE 3.89, and MAPE 0.2058, i.e., predictions are off by about 20.6% of the actual sales on average (loosely, about 80% "accuracy"). So we consider the model reasonably good.
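If DMwR cannot be installed in your setup, the same four measures are easy to compute in base R. This sketch uses the standard definitions, which we assume match regr.eval() (which reports MAPE as a fraction, so 0.2058 means 20.58%); the helper name `regression_metrics` is ours.

```r
# Base-R versions of the error measures printed by regr.eval().
regression_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(mae  = mean(abs(err)),
    mse  = mean(err^2),
    rmse = sqrt(mean(err^2)),
    mape = mean(abs(err / actual)))   # undefined if any actual value is 0
}

regression_metrics(c(10, 20), c(12, 18))
# mae = 2, mse = 4, rmse = 2, mape = 0.15
```

With the model above, the call would be `regression_metrics(marketing$sales, fitted(model))`.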
plot(model)
Here, in the residuals vs fitted plot, the red line lies close to a residual value of 0 and is almost horizontal, and the residuals are scattered around it without any systematic pattern.
Therefore, LINEARITY IS MET.
In the normal Q-Q plot, the residuals lie almost on a straight line (but let's check normality further using formal tests).
In the scale-location plot, the residuals are spread across the range of fitted values (i.e., none of the points are clustered at one spot).
Therefore, HOMOSCEDASTICITY IS MET on residuals.
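The scale-location plot can be complemented by a formal test of constant variance. This sketch implements the studentized (Koenker) Breusch-Pagan test by hand: regress the squared residuals on the predictor and compare n * R-squared with a chi-squared distribution on 1 degree of freedom. Packaged versions exist, e.g. lmtest::bptest(), and car::ncvTest() implements a closely related score test. The helper name `bp_test` is ours, and the data are simulated with variance deliberately growing in x.

```r
# Studentized Breusch-Pagan test for a simple regression y ~ x.
bp_test <- function(fit, x) {
  e2 <- residuals(fit)^2
  aux <- lm(e2 ~ x)                        # does the squared residual depend on x?
  stat <- length(e2) * summary(aux)$r.squared
  p <- pchisq(stat, df = 1, lower.tail = FALSE)
  c(statistic = stat, p.value = p)
}

set.seed(11)
x <- runif(500, 0, 10)
y <- 1 + 2 * x + rnorm(500, sd = 0.5 + x)  # spread grows with x: heteroscedastic
fit <- lm(y ~ x)
bp_test(fit, x)   # a small p-value is evidence against constant variance
```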
The residuals vs leverage plot helps identify influential observations, which we discuss further below.
Normality of the residuals can be checked with a histogram, a Q-Q plot or a density plot, by comparing the mean and median, with skewness and kurtosis, and with statistical tests (Shapiro-Wilk, Anderson-Darling, Kolmogorov-Smirnov).
Statistical tests are usually recommended because they give an objective decision rule.
Here, we use the Shapiro-Wilk test, the Anderson-Darling test, and skewness and kurtosis.
shapiro.test(model$residuals)
Shapiro-Wilk normality test
data: model$residuals
W = 0.99053, p-value = 0.2133
library(nortest)
ad.test(model$residuals)
Anderson-Darling normality test
data: model$residuals
A = 0.49121, p-value = 0.217
library(moments)
skewness(model$residuals)
[1] -0.08863202
kurtosis(model$residuals)
[1] 2.779015
Here, the p-values of both the Shapiro-Wilk test (0.2133) and the Anderson-Darling test (0.217) are greater than 0.05, so we fail to reject the null hypothesis that the residuals are normally distributed.
The skewness (-0.089) is also close to 0 and the kurtosis (2.78) is close to 3, the values expected for a normal distribution.
Therefore, NORMALITY IS MET on residuals.
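For reference, these are the moment-based definitions that, to our understanding, the moments package uses. Under this convention a normal distribution has skewness 0 and kurtosis 3 (not excess kurtosis, which subtracts 3). The function names `skew` and `kurt` are ours.

```r
# Moment-based skewness and (non-excess) kurtosis.
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3/2)
}
kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2   # about 3 for normal data
}

skew(c(-1, 0, 1))   # 0: perfectly symmetric
kurt(c(-1, 0, 1))   # 1.5

set.seed(3)
z <- rnorm(1e5)
c(skew = skew(z), kurt = kurt(z))   # close to 0 and 3
```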
Here, we check whether the residuals are correlated (dependent) or uncorrelated (independent) using the Durbin-Watson test.
library(car)
durbinWatsonTest(model)
lag Autocorrelation D-W Statistic p-value
1 0.02342385 1.934689 0.642
Alternative hypothesis: rho != 0
Here, the p-value (0.642) is greater than 0.05, so we fail to reject the null hypothesis of no autocorrelation among the residuals (i.e., the residuals appear independent).
Therefore, INDEPENDENCE IS MET on residuals.
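The D-W statistic itself is simple: the sum of squared differences of consecutive residuals divided by the sum of squared residuals, roughly 2 * (1 - lag-1 autocorrelation). Values near 2 indicate no first-order autocorrelation. It depends on the row order, so it is only meaningful when the rows have a natural ordering such as time. As far as we know, car's durbinWatsonTest() obtains its p-value by bootstrap simulation by default, so the p-value may vary slightly between runs. The function name `dw_stat` is ours.

```r
# Durbin-Watson statistic from a vector of residuals (in row order).
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

dw_stat(c(1, -1, 1, -1))   # 3: alternating signs (negative autocorrelation)

set.seed(5)
e <- rnorm(200)            # independent residuals
dw_stat(e)                 # close to 2
```

With the model above, `dw_stat(residuals(model))` should match the D-W Statistic column printed by durbinWatsonTest().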
Here, we check for the presence of outliers.
outlierTest(model)
No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferonni p
179 -2.633499 0.0091219 NA
boxplot(model$residuals)
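The outlier test reports observation 179 as having the largest studentized residual (-2.63), but its Bonferroni-adjusted p-value is not below 0.05 (NA means 200 * p exceeds 1), so it is not a statistically significant outlier. The boxplot of residuals does show a few outlying points. Removing such points could reduce the error and increase the apparent accuracy of the model, but we will not do so.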
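outlierTest() takes the largest absolute studentized residual, computes its two-sided t p-value, and multiplies it by the number of observations (Bonferroni correction); when the product exceeds 1, car prints NA. The adjustment can be checked with base R's p.adjust(). The helper `largest_rstudent` is ours and assumes the usual t reference with residual df - 1; it is demonstrated on simulated data.

```r
# Bonferroni adjustment for n = 200 observations, capped at 1.
p.adjust(0.0091219, method = "bonferroni", n = 200)   # 1: observation 179 is not significant

# Largest |studentized residual| with unadjusted and Bonferroni p-values.
largest_rstudent <- function(fit) {
  r <- rstudent(fit)
  i <- which.max(abs(r))
  p <- 2 * pt(abs(r[[i]]), df = df.residual(fit) - 1, lower.tail = FALSE)
  c(obs = i[[1]], rstudent = r[[i]], p = p,
    bonferroni = min(1, length(r) * p))
}

# Simulated example (illustration only).
set.seed(9)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100)
largest_rstudent(lm(y ~ x))
```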
Here, we check for influential observations using Cook's distance.
Cook's distance measures how much the fitted values change when an observation is removed. Observations with a large Cook's distance are called influential, because they have a disproportionate effect on the estimated coefficients.
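Base R's cooks.distance() returns the values drawn by plot(model, 4). A commonly used, but arbitrary, rule of thumb flags observations with a distance above 4/n. A sketch on simulated data with one deliberately influential point appended:

```r
set.seed(2)
x <- c(runif(30, 0, 10), 25)            # last point lies far from the others in x...
y <- c(1 + 2 * x[1:30] + rnorm(30), 5)  # ...and far below the line: high leverage + large residual
fit <- lm(y ~ x)

d <- cooks.distance(fit)
threshold <- 4 / length(d)              # rule of thumb, not a formal test
which(d > threshold)                    # includes observation 31
which.max(d)                            # 31
```

With the model above, `which(cooks.distance(model) > 4 / nrow(marketing))` lists the observations that exceed the same rule of thumb.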
plot(model,4)
Here, observations 179 and 36 have the largest Cook's distances, so they are the most influential observations.
Hence, the model is ready to be used for prediction.
Let's predict sales for the following YouTube budgets.
you_tube<-data.frame(youtube=c(123.8,67,239,598,787.12))
you_tube
youtube
1 123.80
2 67.00
3 239.00
4 598.00
5 787.12
pred_sales<-predict(model,you_tube)
you_tube$sales<-pred_sales
you_tube
youtube sales
1 123.80 14.32415
2 67.00 11.62407
3 239.00 19.80037
4 598.00 36.86602
5 787.12 45.85615
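These are the sales predicted by the model for the given YouTube budgets. Note that 598 and 787.12 lie well outside the observed range of youtube (0.84 to 355.68), so these two predictions are extrapolations and should be treated with caution.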
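A simple safeguard is to flag new inputs that fall outside the range seen during fitting and to report prediction intervals rather than bare point predictions. In this sketch the YouTube range (0.84 to 355.68) is read off the table above, the helper `outside_range` is ours, and the interval example uses simulated training data; `interval = "prediction"` is a standard argument of predict() for lm fits.

```r
# Flag new predictor values outside the training range.
outside_range <- function(new_x, train_x) {
  new_x < min(train_x) | new_x > max(train_x)
}

new_youtube <- c(123.8, 67, 239, 598, 787.12)
outside_range(new_youtube, c(0.84, 355.68))
# FALSE FALSE FALSE TRUE TRUE

# 95% prediction intervals (simulated data, not the marketing data).
set.seed(4)
train <- data.frame(youtube = runif(200, 0, 350))
train$sales <- 8.4 + 0.048 * train$youtube + rnorm(200, sd = 3.9)
fit <- lm(sales ~ youtube, data = train)
predict(fit, newdata = data.frame(youtube = new_youtube), interval = "prediction")
```

With the real model, `predict(model, you_tube, interval = "prediction")` would add lower and upper bounds to each prediction.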
In a similar manner, we can build the models for facebook vs sales and newspaper vs sales.
fbonsales<-marketing[,c("facebook","sales")]
View(fbonsales)
As before, we need to predict a continuous outcome variable, so we use regression.
Here, we check whether the relationship is linear or not.
library(ggplot2)
ggplot(marketing,aes(facebook,sales))+geom_point()+geom_smooth()
`geom_smooth()` using method = 'loess'
The plot of facebook vs sales indicates that facebook and sales are approximately linearly related. Since we have one predictor variable and one outcome variable, we use simple linear regression to build our model.
model<-lm(sales~facebook,data = marketing)
model
Call:
lm(formula = sales ~ facebook, data = marketing)
Coefficients:
(Intercept) facebook
11.1740 0.2025
Here, the fitted equation consists of the intercept and the slope:
sales = 11.1740 + 0.2025 * facebook
i.e., each additional unit of Facebook budget is associated with an increase of about 0.2025 units of sales.
As before, we evaluate the fit on the same data the model was trained on.
summary(model)
Call:
lm(formula = sales ~ facebook, data = marketing)
Residuals:
Min 1Q Median 3Q Max
-18.8766 -2.5589 0.9248 3.3330 9.8173
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.17397 0.67548 16.542 <2e-16 ***
facebook 0.20250 0.02041 9.921 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.13 on 198 degrees of freedom
Multiple R-squared: 0.332, Adjusted R-squared: 0.3287
F-statistic: 98.42 on 1 and 198 DF, p-value: < 2.2e-16
Here, the p-values of both the intercept and the slope are below 0.05, so we reject the null hypotheses that the intercept (11.17397) and the slope (0.20250) are equal to zero.
Therefore, the regression coefficients (i.e., slope and intercept) are statistically significant.
The R-squared of 0.332 means that facebook explains about 33.2% of the variance in sales.
library(DMwR)
regr.eval(marketing$sales,model$fitted.values)
mae mse rmse mape
3.9842626 26.0530528 5.1042191 0.3381669
The errors are larger than for the YouTube model: MAE 3.98, MSE 26.05, RMSE 5.10, and MAPE 0.3382, i.e., predictions are off by about 33.8% on average (loosely, about 66% "accuracy"). The model is usable, but clearly weaker than the YouTube model.
plot(model)
Here, in the residuals vs fitted plot, the red line lies close to a residual value of 0 and is almost horizontal, and the residuals are scattered around it without any systematic pattern.
Therefore, LINEARITY IS MET.
In the normal Q-Q plot, the residuals lie almost on a straight line (but let's check normality further using formal tests).
In the scale-location plot, the residuals are spread across the range of fitted values (i.e., none of the points are clustered at one spot).
Therefore, HOMOSCEDASTICITY IS MET on residuals.
The residuals vs leverage plot helps identify influential observations, which we discuss further below.
shapiro.test(model$residuals)
Shapiro-Wilk normality test
data: model$residuals
W = 0.96072, p-value = 2.367e-05
library(nortest)
ad.test(model$residuals)
Anderson-Darling normality test
data: model$residuals
A = 2.439, p-value = 3.467e-06
library(moments)
skewness(model$residuals)
[1] -0.7636953
kurtosis(model$residuals)
[1] 3.544281
plot(density(model$residuals))
qqnorm(model$residuals)
Here, the p-values of both the Shapiro-Wilk test (2.367e-05) and the Anderson-Darling test (3.467e-06) are below 0.05, so we reject the null hypothesis: the residuals are not normally distributed.
The skewness (-0.76) indicates a moderate left skew and the kurtosis (3.54) is somewhat above 3, which also points to a departure from normality.
However, the density plot and the Q-Q plot look roughly normal.
Therefore, NORMALITY IS ONLY APPROXIMATELY MET on residuals.
library(car)
durbinWatsonTest(model)
lag Autocorrelation D-W Statistic p-value
1 0.02274019 1.945713 0.704
Alternative hypothesis: rho != 0
Here, the p-value (0.704) is greater than 0.05, so we fail to reject the null hypothesis of no autocorrelation among the residuals (i.e., the residuals appear independent).
Therefore, INDEPENDENCE IS MET on residuals.
outlierTest(model)
rstudent unadjusted p-value Bonferonni p
131 -3.825537 0.0001751 0.035019
boxplot(model$residuals)
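Here, the outlier test flags observation 131: its Bonferroni-adjusted p-value (0.035) is below 0.05, so it is a statistically significant outlier. Row 131 has facebook = 47.52 but sales of only 1.92, the smallest in the data, which gives the largest residual (-18.88). The boxplot also shows outliers. Removing such points could reduce the error and increase the apparent accuracy of the model, but we will not do so.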
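Instead of deleting a flagged observation, a common compromise is a sensitivity check: refit without it and see how much the coefficients move. This is a sketch on simulated data with one low outlier planted at row 50; the helper name `refit_without` is ours.

```r
# Hypothetical helper: coefficients with and without the given rows.
refit_without <- function(fit, data, rows) {
  refit <- lm(formula(fit), data = data[-rows, , drop = FALSE])
  rbind(all_rows = coef(fit), without = coef(refit))
}

set.seed(8)
dat <- data.frame(facebook = runif(50, 0, 60))
dat$sales <- 11 + 0.2 * dat$facebook + rnorm(50, sd = 2)
dat$sales[50] <- dat$sales[50] - 15   # one low outlier, similar in spirit to row 131
fit <- lm(sales ~ facebook, data = dat)
refit_without(fit, dat, 50)

# With the real data (assuming `model` is the facebook model and `marketing` is loaded):
# refit_without(model, marketing, 131)
```

If the two rows of coefficients are close, the conclusions do not hinge on that single observation.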
plot(model,4)
Here, observations 131 and 6 have the largest Cook's distances, so they are the most influential observations.
Hence, the model is ready to be used for prediction.
Let's predict sales for the following Facebook budgets.
fb<-data.frame(facebook=c(123.8,67,239,598,787.12))
fb
facebook
1 123.80
2 67.00
3 239.00
4 598.00
5 787.12
pred_sales<-predict(model,fb)
fb$sales<-pred_sales
fb
facebook sales
1 123.80 36.24294
2 67.00 24.74118
3 239.00 59.57046
4 598.00 132.26644
5 787.12 170.56245
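These are the sales predicted by the model for the given Facebook budgets. Note that all five values exceed the largest facebook budget in the data (59.52), so every one of these predictions is an extrapolation and should be treated with caution.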
newsonsales<-marketing[,c("newspaper","sales")]
View(newsonsales)
As before, we need to predict a continuous outcome variable, so we use regression.
Here, we check whether the relationship is linear or not.
library(ggplot2)
ggplot(marketing,aes(newspaper,sales))+geom_point()+geom_smooth()
`geom_smooth()` using method = 'loess'
The plot of newspaper vs sales shows no clear linear relationship between newspaper and sales. Nevertheless, since we have one predictor variable and one outcome variable, we fit a simple linear regression model for comparison.
model<-lm(sales~newspaper,data = marketing)
model
Call:
lm(formula = sales ~ newspaper, data = marketing)
Coefficients:
(Intercept) newspaper
14.82169 0.05469
Here, the fitted equation consists of the intercept and the slope:
sales = 14.82169 + 0.05469 * newspaper
i.e., each additional unit of newspaper budget is associated with an increase of about 0.0547 units of sales.
summary(model)
Call:
lm(formula = sales ~ newspaper, data = marketing)
Residuals:
Min 1Q Median 3Q Max
-13.473 -4.065 -1.007 4.207 15.330
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.82169 0.74570 19.88 < 2e-16 ***
newspaper 0.05469 0.01658 3.30 0.00115 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.111 on 198 degrees of freedom
Multiple R-squared: 0.05212, Adjusted R-squared: 0.04733
F-statistic: 10.89 on 1 and 198 DF, p-value: 0.001148
Here, the p-values of both the intercept and the slope are below 0.05 (the slope's p-value is 0.00115), so we reject the null hypotheses that the intercept (14.82169) and the slope (0.05469) are equal to zero.
Therefore, the regression coefficients (i.e., slope and intercept) are statistically significant.
However, the R-squared of 0.05212 means that newspaper explains only about 5.2% of the variance in sales.
library(DMwR)
regr.eval(marketing$sales,model$fitted.values)
mae mse rmse mape
4.9758717 36.9705927 6.0803448 0.3860048
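All error measures are larger than for the YouTube and Facebook models: MAE 4.98, MSE 36.97, RMSE 6.08, and MAPE 0.386 (predictions are off by about 38.6% on average, loosely 62% "accuracy"). Together with the very low R-squared, this indicates that newspaper alone is a weak predictor of sales, even though its slope is statistically significant.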
plot(model)
Here, in the residuals vs fitted plot, the red line is roughly horizontal and the residuals are scattered around it.
Therefore, LINEARITY IS NEARLY MET.
In the normal Q-Q plot, the residuals lie almost on a straight line (but let's check normality further using formal tests).
In the scale-location plot, the residuals are spread across the range of fitted values (i.e., none of the points are clustered at one spot).
Therefore, HOMOSCEDASTICITY IS MET on residuals.
The residuals vs leverage plot helps identify influential observations, which we discuss further below.
shapiro.test(model$residuals)
Shapiro-Wilk normality test
data: model$residuals
W = 0.98197, p-value = 0.0114
library(nortest)
ad.test(model$residuals)
Anderson-Darling normality test
data: model$residuals
A = 1.1601, p-value = 0.004848
library(moments)
skewness(model$residuals)
[1] 0.3295549
kurtosis(model$residuals)
[1] 2.527205
plot(density(model$residuals))
qqnorm(model$residuals)
Here, the p-values of both the Shapiro-Wilk test (0.0114) and the Anderson-Darling test (0.004848) are below 0.05, so we reject the null hypothesis: the residuals are not normally distributed.
The skewness (0.33) shows a mild right skew and the kurtosis (2.53) is somewhat below 3, so the departure from normality is mild.
However, the density plot and the Q-Q plot look roughly normal.
Therefore, NORMALITY IS ONLY APPROXIMATELY MET on residuals.
library(car)
durbinWatsonTest(model)
lag Autocorrelation D-W Statistic p-value
1 0.004787825 1.983434 0.914
Alternative hypothesis: rho != 0
Here, the p-value (0.914) is greater than 0.05, so we fail to reject the null hypothesis of no autocorrelation among the residuals (i.e., the residuals appear independent).
Therefore, INDEPENDENCE IS MET on residuals.
outlierTest(model)
No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferonni p
37 2.558821 0.011254 NA
boxplot(model$residuals)
Here, observation 37 has the largest studentized residual (2.56), but its Bonferroni-adjusted p-value is not significant (NA means 200 * p exceeds 1), and the boxplot does not show any outliers.
plot(model,4)
Here, observations 17 and 76 have the largest Cook's distances, so they are the most influential observations; both have very large newspaper budgets (136.80 and 107.28).
Hence, the model is ready to be used for prediction.
Let's predict sales for the following newspaper budgets.
newsp<-data.frame(newspaper=c(123.8,67,239,598,787.12))
newsp
newspaper
1 123.80
2 67.00
3 239.00
4 598.00
5 787.12
pred_sales<-predict(model,newsp)
newsp$sales<-pred_sales
newsp
newspaper sales
1 123.80 21.59269
2 67.00 18.48613
3 239.00 27.89334
4 598.00 47.52816
5 787.12 57.87172
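These are the sales predicted by the model for the given newspaper budgets. Note that 239, 598 and 787.12 exceed the largest newspaper budget in the data (136.80), so those predictions are extrapolations and should be treated with caution.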
market<-data.frame(c(you_tube,fb,newsp))
market
youtube sales facebook sales.1 newspaper sales.2
1 123.80 14.32415 123.80 36.24294 123.80 21.59269
2 67.00 11.62407 67.00 24.74118 67.00 18.48613
3 239.00 19.80037 239.00 59.57046 239.00 27.89334
4 598.00 36.86602 598.00 132.26644 598.00 47.52816
5 787.12 45.85615 787.12 170.56245 787.12 57.87172
mean(market[,"sales"])
[1] 25.69415
mean(market[,"sales.1"])
[1] 84.6767
mean(market[,"sales.2"])
[1] 34.67441
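Here, the average predicted sales from the YouTube model is 25.69,
from the Facebook model 84.68,
and from the newspaper model 34.67. For these hypothetical budgets Facebook gives the highest predicted sales, mainly because its slope (0.2025) is the largest. However, this comparison is not enough to judge the channels or the models: most of these budgets lie far outside the observed ranges (especially for facebook, whose largest observed budget is 59.52), and the YouTube model fits the data much better (R-squared 0.6119 vs 0.332 for facebook and 0.05212 for newspaper). Many other factors (like the bias-variance trade-off) must also be taken into account for an accurate analysis.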
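A more direct way to compare the three channels is to compare how well each simple model fits the data, for example by R-squared and residual standard error, rather than averaging predictions at budgets that mostly lie outside the observed ranges. This sketch loads the data from datarium when available and otherwise falls back to simulated data with the same column names (illustration only); the helper name `compare_simple_models` is ours.

```r
# Fit one simple regression per predictor and tabulate fit statistics.
compare_simple_models <- function(data, outcome, predictors) {
  rows <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = outcome), data = data)
    s <- summary(fit)
    data.frame(predictor = p,
               slope = unname(coef(fit)[2]),
               r_squared = s$r.squared,
               sigma = s$sigma)   # residual standard error
  })
  do.call(rbind, rows)
}

if (requireNamespace("datarium", quietly = TRUE)) {
  data("marketing", package = "datarium")
} else {
  # Simulated stand-in (illustration only) so the sketch still runs.
  set.seed(10)
  marketing <- data.frame(youtube = runif(200, 0, 350),
                          facebook = runif(200, 0, 60),
                          newspaper = runif(200, 0, 130))
  marketing$sales <- 3 + 0.045 * marketing$youtube +
    0.19 * marketing$facebook + rnorm(200, sd = 2)
}

cmp <- compare_simple_models(marketing, "sales", c("youtube", "facebook", "newspaper"))
cmp[order(-cmp$r_squared), ]   # best-fitting channel first
```

With the real data, the r_squared column should reproduce the values from the summary() outputs above (0.6119, 0.332 and 0.05212).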