For your assignment you may be using a different dataset than the one included here.
Always read the instructions on Sakai carefully.
Tasks and questions to be completed are highlighted in a larger bold font and numbered according to their section.
In a given year, more rain may lead to an increase in crop production, because more water may lead to more plants.
This is a direct relationship: the number of fruits may be predictable from the amount of rainfall in a certain year.
This example represents simple linear regression, an extremely useful concept that allows us to predict the values of one variable based on another variable.
This lab will explore simple linear regression, multiple linear regression, and Watson Analytics.
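As a minimal sketch of the rainfall example, the code below fits a simple linear regression to a tiny made-up dataset. The data frame `rain_data` and its values are hypothetical and serve only to show the mechanics of `lm()` and `predict()`; they are not part of the lab data.

```r
# Hypothetical yearly data: rainfall (cm) and crop yield (tonnes).
# These numbers are invented for illustration only.
rain_data <- data.frame(
  rainfall = c(50, 60, 70, 80, 90),
  yield    = c(20, 24, 29, 31, 36)
)

# Fit yield as a linear function of rainfall.
rain_model <- lm(yield ~ rainfall, data = rain_data)
coef(rain_model)

# Predict the yield for a year with 75 cm of rain.
predicted_yield <- predict(rain_model, newdata = data.frame(rainfall = 75))
predicted_yield
```

The slope reported by `coef()` is the estimated change in yield for each additional centimetre of rain.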
We are going to use tidyverse, a collection of R packages designed for data science.
## Loading required package: tidyverse
## -- Attaching packages ----------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.7.2 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.2.0
## -- Conflicts -------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: plotly
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Name your dataset ‘mydata’ so it is easy to work with.
Commands: read_csv() rename() head()
mydata = read.csv(file="Advertising.csv")
head(mydata)
## X TV radio newspaper sales
## 1 1 230.1 37.8 69.2 22.1
## 2 2 44.5 39.3 45.1 10.4
## 3 3 17.2 45.9 69.3 9.3
## 4 4 151.5 41.3 58.5 18.5
## 5 5 180.8 10.8 58.4 12.9
## 6 6 8.7 48.9 75.0 7.2
sales <- mydata$sales
radio <- mydata$radio
newspaper <- mydata$newspaper
TV <- mydata$TV
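The `X` column in the output above looks like a row index rather than a measurement. A minimal sketch of dropping it and inspecting the rest is shown below. It assumes only the six rows printed by `head(mydata)`, rebuilt by hand under the hypothetical name `mydata_head` so the example runs without the CSV file.

```r
# First six rows of Advertising.csv, copied from the head() output above.
mydata_head <- data.frame(
  X         = 1:6,
  TV        = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7),
  radio     = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9),
  newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75.0),
  sales     = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2)
)

# Drop the row-index column X, which carries no information for modelling.
mydata_head <- mydata_head[, names(mydata_head) != "X"]

# Inspect column types and basic summaries before fitting any model.
str(mydata_head)
summary(mydata_head)
```

The same subsetting can be applied to the full `mydata` once it is loaded.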
#corr = cor( MYDATA )
#corr
corr = cor(mydata[ -c(1) ])
corr
## TV radio newspaper sales
## TV 1.00000000 0.05480866 0.05664787 0.7822244
## radio 0.05480866 1.00000000 0.35410375 0.5762226
## newspaper 0.05664787 0.35410375 1.00000000 0.2282990
## sales 0.78222442 0.57622257 0.22829903 1.0000000
In each column, one variable has a perfect correlation with itself: the correlation between TV and TV will always be 1. Of the remaining pairs, TV and sales (0.78) and radio and sales (0.58) have the strongest correlations, that is, the correlations closest to one.
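To read the ranking straight off the matrix, one option is to pull out the `sales` row and sort it. The sketch below re-enters the printed correlation values by hand under the hypothetical name `corr_printed`; with the real data loaded, the `corr` object computed above could be used in its place.

```r
# Correlation matrix re-entered from the printed output above.
vars <- c("TV", "radio", "newspaper", "sales")
corr_printed <- matrix(
  c(1.00000000, 0.05480866, 0.05664787, 0.78222442,
    0.05480866, 1.00000000, 0.35410375, 0.57622257,
    0.05664787, 0.35410375, 1.00000000, 0.22829903,
    0.78222442, 0.57622257, 0.22829903, 1.00000000),
  nrow = 4, byrow = TRUE,
  dimnames = list(vars, vars)
)

# Keep the correlations of each predictor with sales,
# dropping the trivial sales-with-sales entry.
sales_corr <- corr_printed["sales", vars != "sales"]

# Rank the predictors from strongest to weakest correlation with sales.
sales_ranked <- sort(sales_corr, decreasing = TRUE)
sales_ranked
```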
qplot( x =radio, y = sales, data = mydata)
Sales and radio are positively correlated: the points on the graph trend up and to the right. The correlation coefficient of radio and sales is 0.57622257, which can be seen in the correlation matrix above.
#Simple Linear Regression Model
#reg <- lm( DEPENDENT_VARIABLE ~ INDEPENDENT_VARIABLE )
reg <- lm( sales ~ radio )
#Summary of Simple Linear Regression Model
#summary(MODEL)
summary(reg)
##
## Call:
## lm(formula = sales ~ radio)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.7305 -2.1324 0.7707 2.7775 8.1810
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.31164 0.56290 16.542 <2e-16 ***
## radio 0.20250 0.02041 9.921 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.275 on 198 degrees of freedom
## Multiple R-squared: 0.332, Adjusted R-squared: 0.3287
## F-statistic: 98.42 on 1 and 198 DF, p-value: < 2.2e-16
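The R-squared is 0.332 and the adjusted R-squared is 0.3287, so radio alone explains only about a third of the variation in sales. Using the rule of thumb that anything below 0.5 indicates a poor fit, this model is a bad fit for the data.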
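In a simple linear regression with one predictor, R-squared equals the square of the correlation between the two variables, so the values in the summary can be cross-checked against the correlation matrix. The sketch below does this arithmetic with the printed numbers. The sample size of 200 is inferred from the 198 residual degrees of freedom plus the two estimated coefficients.

```r
# Correlation between radio and sales, taken from the correlation matrix.
r_radio_sales <- 0.57622257

# Sample size: 198 residual degrees of freedom + 2 estimated coefficients.
n <- 200
# Number of predictors in the model.
p <- 1

# R-squared for a one-predictor model is the squared correlation.
r_squared <- r_radio_sales^2

# Adjusted R-squared penalises the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 4)
```

Both values agree with the `summary(reg)` output to the precision shown there.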
The general form of the fitted equation is: Y_predicted = intercept estimate + slope estimate * (X). For this model: sales = 9.3116 + 0.20250 * (radio). To switch off scientific notation, run options(scipen = 9999), which prints complete numbers rather than e notation.
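The coefficients reported by `summary(reg)` give the fitted line sales = 9.31164 + 0.20250 * radio. As a small sketch, the line can be wrapped in a helper function; `predict_sales` is a hypothetical name introduced here, and the coefficients are typed in from the summary output rather than read from the model object.

```r
# Coefficients copied from the summary(reg) output.
intercept   <- 9.31164
slope_radio <- 0.20250

# Hypothetical helper: predicted sales for a given radio budget.
predict_sales <- function(radio_budget) {
  intercept + slope_radio * radio_budget
}

# Predicted sales for radio budgets of 0, 10 and 37.8.
predict_sales(c(0, 10, 37.8))

# Temporarily switch off scientific notation, print a small number,
# then restore the previous setting.
old_options <- options(scipen = 9999)
print(0.00001234)
options(old_options)
```

With the model object available, `coef(reg)` returns the same two coefficients without retyping them.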
#p <- qplot( x = INDEPENDENT_VARIABLE, y = DEPENDENT_VARIABLE, data = mydata) + geom_point()
p <- qplot( x = radio, y = sales, data = mydata) + geom_point()
p
#Add a trend line to the plot using a linear model
#p + geom_smooth(method = "lm", formula = y ~ x)
p + geom_smooth(method = "lm", formula = y ~ x)
y = 2.92110 + 0.04575*(mydata$TV) + 0.18799*(mydata$radio)
y
## [1] 20.554197 12.344982 12.336741 17.616212 13.222992 12.511836 11.717797
## [8] 12.104854 3.709329 12.550724 7.035517 17.255385 10.608399 8.810449
## [15] 18.443546 20.827773 12.903384 23.239554 9.940795 14.153036 18.120223
## [22] 14.740899 6.514041 16.542856 8.139999 15.606740 14.966882 17.045108
## [29] 19.398229 9.158890 21.641392 11.357301 7.649985 18.832100 7.562561
## [36] 16.991384 23.365737 15.625331 9.912258 20.439323 16.377652 17.297716
## [43] 21.560623 13.965891 8.900768 15.161700 8.885976 21.698110 16.285742
## [50] 8.181258 12.644719 9.319104 20.660583 19.960188 20.353737 21.307481
## [57] 8.537594 12.761658 21.889504 18.106330 5.744705 22.902748 16.782920
## [64] 13.184129 16.964897 7.826157 8.986779 12.019930 18.951875 21.092461
## [71] 17.782419 10.632707 10.350870 9.912693 17.308704 11.909438 4.480009
## [78] 13.791690 8.789051 9.675623 11.435733 14.662709 10.182272 14.415955
## [85] 20.772295 15.219016 11.581550 15.618019 11.754570 16.930372 9.986476
## [92] 4.511535 19.178540 21.261410 10.466510 16.332559 12.619265 15.328065
## [99] 24.126852 16.945683 13.904257 23.305437 17.638949 14.750953 20.266807
## [106] 17.952761 6.132740 7.113297 3.595621 19.662581 14.792968 21.122495
## [113] 13.854421 16.382894 15.296682 12.936575 11.977757 6.566792 15.608706
## [120] 6.816490 14.423707 7.860583 13.620276 15.057379 19.492802 9.128782
## [127] 10.590761 6.590250 22.211335 7.903680 10.397529 15.599171 8.418728
## [134] 19.274615 11.865689 13.966355 11.423910 20.875786 9.757291 19.633036
## [141] 9.474980 18.437721 19.250243 8.778093 10.104502 9.697006 15.278002
## [148] 23.259010 12.235597 9.816267 18.376186 10.035966 16.341467 18.221278
## [155] 15.479539 5.289359 15.394590 10.018837 10.393206 12.405391 14.215594
## [162] 13.571917 14.943019 17.319257 11.046453 14.288641 10.808449 13.359748
## [169] 17.212214 17.920519 7.389284 14.375966 7.596399 11.960129 13.735066
## [176] 24.781986 19.962698 12.174072 16.012502 12.377200 10.574324 13.932621
## [183] 6.563793 24.162370 18.536637 20.778199 9.698004 17.059238 18.618661
## [190] 6.051304 12.454614 8.405517 4.478759 18.447780 16.462319 5.364313
## [197] 8.151901 12.767157 23.791380 15.156389
The multiple linear regression model is the better one because its R-squared is higher, meaning it explains more of the variation in sales.
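One way to make this comparison explicit is to fit both models and read R-squared and adjusted R-squared from `summary()`. The sketch below uses only the first six rows of the dataset (copied from the `head()` output, under the hypothetical name `mydata_head`), so its numbers will differ from those of the full data. It illustrates the mechanics, not the lab's results.

```r
# First six rows of Advertising.csv, copied from the head() output.
mydata_head <- data.frame(
  TV        = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7),
  radio     = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9),
  newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75.0),
  sales     = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2)
)

# Simple model: one predictor. Multiple model: two predictors.
fit_simple   <- lm(sales ~ radio, data = mydata_head)
fit_multiple <- lm(sales ~ TV + radio, data = mydata_head)

# Collect both fit measures side by side.
comparison <- data.frame(
  model         = c("sales ~ radio", "sales ~ TV + radio"),
  r_squared     = c(summary(fit_simple)$r.squared,
                    summary(fit_multiple)$r.squared),
  adj_r_squared = c(summary(fit_simple)$adj.r.squared,
                    summary(fit_multiple)$adj.r.squared)
)
comparison
```

Adding a predictor can never lower plain R-squared, so adjusted R-squared is usually the fairer measure when the two models have different numbers of predictors.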
MODEL 1
reg <- lm(mydata$sales ~ mydata$radio)
reg
##
## Call:
## lm(formula = mydata$sales ~ mydata$radio)
##
## Coefficients:
## (Intercept) mydata$radio
## 9.3116 0.2025
y = 0.2025*(69) + 9.3116
y
## [1] 23.2841
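The hand calculation above can also be done with `predict()`, which avoids retyping coefficients. This works cleanly only when the model is fitted with a `data =` argument. With a formula such as `mydata$sales ~ mydata$radio`, `predict()` cannot match the columns of `newdata` to the predictor. The sketch below uses the first six rows only (hypothetical name `mydata_head`), so its prediction will not equal 23.2841.

```r
# First six rows of Advertising.csv, copied from the head() output.
mydata_head <- data.frame(
  radio = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9),
  sales = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2)
)

# Fit with bare column names and data = so predict() can use newdata.
fit <- lm(sales ~ radio, data = mydata_head)

# Prediction for a radio budget of 69 via predict().
by_predict <- predict(fit, newdata = data.frame(radio = 69))

# The same prediction by hand from the fitted coefficients.
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["radio"]] * 69

c(by_predict = unname(by_predict), by_hand = by_hand)
```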
MODEL 2
reg <- lm(mydata$sales ~ mydata$newspaper)
reg
##
## Call:
## lm(formula = mydata$sales ~ mydata$newspaper)
##
## Coefficients:
## (Intercept) mydata$newspaper
## 12.35141 0.05469
y = 12.35141 + 0.05469*75
y
## [1] 16.45316
MODEL 3
mlrl <- lm(mydata$radio ~ mydata$TV + mydata$newspaper)
mlrl
##
## Call:
## lm(formula = mydata$radio ~ mydata$TV + mydata$newspaper)
##
## Coefficients:
## (Intercept) mydata$TV mydata$newspaper
## 15.043008 0.006029 0.240052
y = 0.240052*75+0.006029*255+15.043008
y
## [1] 34.5843
To complete the last task, follow the directions found below. Make sure to screenshot and attach any pictures of the results obtained or any questions asked.
knitr::include_graphics('SalesPredictors.png')
TV and radio combined are the strongest predictors of sales.
knitr::include_graphics('SalesPredictors2.png')
See above for analysis.
### 3B) Note the predictive power strength of reported variables. Consider the one field predictive model only, describe your findings and add a screenshot
knitr::include_graphics('SalesPredictors3.png')
It’s interesting that TV is a better predictor of sales than radio. It should also be noted that single-variable models are a lot weaker predictors than multiple-variable models.
The Watson results reconcile with my findings from the R regression analysis in task 2 because the numbers are fairly similar. TV has the highest correlation with sales, followed by radio. In general, radio and TV are important and correlate strongly with sales.