About

In a year with more rain, we may see an increase in crop production, because more water may lead to more plants. This is a direct relationship: the number of fruits may be predictable from the amount of rainfall in a given year. This example illustrates simple linear regression, a useful technique that lets us predict the values of one variable from another variable.

This lab explores simple linear regression, multiple linear regression, and Watson Analytics.

Setup

Download the ‘bsad_lab5’ zip folder and extract it. Next, set the extracted folder as the working directory: open RStudio, go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Now, follow the directions to complete the lab.


Task 1

First, read in the marketing data that was used in the previous lab. It can be found in the downloaded folder bsad_lab5. Make sure to view the file to ensure it was read in correctly.

#Read data correctly
mydata = read.csv(file="data/marketing.csv")
head(mydata)

Next, apply the cor() function to the data to see the correlation between every pair of variables at once.

Why is the value “1.0” down the diagonal? Which pairs seem to have the strongest correlations? Answer and list the pairs below the matrix.

#Correlation Matrix
cor(mydata)
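cor() only accepts numeric input, so it stops with an error if a data frame contains a text column. The sketch below uses a small made-up data frame (toy_df and its columns are hypothetical, not the marketing data) to show one way to keep only the numeric columns before calling cor().

```r
# Toy data, made up for illustration only (not the marketing data)
toy_df <- data.frame(
  region = c("N", "S", "E", "W"),  # text column that cor() cannot use
  a = c(1, 2, 3, 4),
  b = c(2, 4, 5, 9)
)

# Keep only the columns that hold numbers
numeric_cols <- toy_df[sapply(toy_df, is.numeric)]

# Correlation matrix of the remaining columns
cm <- cor(numeric_cols)
cm
```

If every column of the data is already numeric, this extra step is not needed.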

From the matrix, it's clear that Sales and Radio have the strongest correlation. So, create a scatterplot of the two to visualize the relationship. Make sure to extract the columns first.

#Extract all variables
pos = mydata$pos
paper = mydata$paper
tv = mydata$tv
sales = mydata$sales
radio = mydata$radio

#Plot of Radio and Sales using plot command from Worksheet 4

From this plot, the points appear to follow a roughly linear pattern. So, we will try to fit a simple linear regression model to the data.

The lm() function fits a linear model. It is called as lm(y ~ x), where x, the independent variable, predicts values of y, the dependent variable.
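As a minimal sketch of the lm(y ~ x) pattern, the example below fits a line to a few made-up points. The names ads, revenue, and toy_fit are hypothetical and the numbers are invented for illustration; they are not the marketing data.

```r
# Toy data, made up for illustration only (not the marketing data)
ads <- c(1, 2, 3, 4, 5)
revenue <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit revenue (dependent) as a linear function of ads (independent)
toy_fit <- lm(revenue ~ ads)

# Intercept and slope of the fitted line
coef(toy_fit)
```

The slope reported for ads is the estimated change in revenue for each one-unit increase in ads.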

In the regression below, we are using radio ads to predict sales. We print out a summary to view the quantitative facts about the linear model.

#Simple Linear Regression
reg <- lm(sales ~ radio)

#Summary of Model
summary(reg)

Report and interpret the R-Squared value below.

Because the R-Squared value is so high, it indicates that the model is a good fit, but not a perfect one. We will overlay a trend line on the original plot. This shows how far the predictions are from the actual values. The vertical distance between an actual value and its predicted value is the residual.

#Plot Radio and Sales 
plot(radio,sales)

#Add a trend line plot using the linear model we created above
abline(reg,col="blue",lwd=2) 
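To make the idea of a residual concrete, here is a small sketch on made-up data (toy, toy_fit, and the numbers are hypothetical, not the marketing data). It computes each residual by hand as actual minus predicted and checks the result against R's residuals() function.

```r
# Toy data, made up for illustration only (not the marketing data)
toy <- data.frame(
  ads = c(1, 2, 3, 4, 5),
  revenue = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
toy_fit <- lm(revenue ~ ads, data = toy)

# Predicted values on the fitted line, one per observation
predicted <- fitted(toy_fit)

# Residual = actual value minus predicted value
manual_resid <- toy$revenue - predicted

# residuals() returns the same numbers
all.equal(unname(manual_resid), unname(residuals(toy_fit)))
```

On a plot, each residual is the vertical gap between a point and the trend line; base R's segments() function could be used to draw those gaps if you want to see them.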

List some observations from this plot.


Task 2

Sometimes, one variable is very good at predicting another variable. But most of the time, more than one factor affects the variable being predicted. While increased rainfall is a good predictor of increased crop supply, fewer herbivores can also result in an increase in crops. This idea is a loose metaphor for multiple linear regression.

In R, multiple linear regression takes the form lm(y ~ x0 + x1 + x2 + ... ), where y is the variable being predicted, or the dependent variable, and the x variables are the predictors, or the independent variables.

Let's create a multiple linear regression model that predicts sales using radio and tv.

#Multiple Linear Regression Model
mlr1 <- lm(sales ~ radio + tv)

#Summary of Multiple Linear Regression Model
summary(mlr1)

For mlr1, the R-Squared value is 0.9577 and the Adj R-Squared value is 0.9527.
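Both values can also be pulled out of the summary object instead of being read off the printout. The sketch below does this on made-up data (toy, toy_mlr, x1, x2, and y are hypothetical, not the marketing data) and recomputes the adjusted value by hand, assuming the usual formula for a model with an intercept.

```r
# Toy data, made up for illustration only (not the marketing data)
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5),
  y  = c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9)
)
toy_mlr <- lm(y ~ x1 + x2, data = toy)

# R-Squared and Adj R-Squared as stored in the summary
s <- summary(toy_mlr)
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared

# Adj R-Squared penalizes extra predictors:
# 1 - (1 - R^2) * (n - 1) / (n - p - 1), with n rows and p predictors
n <- nrow(toy)
p <- 2
manual_adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
all.equal(adj_r2, manual_adj)
```

Adding a predictor never lowers R-Squared, but it can lower Adj R-Squared, which is why the two are reported together.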

Create a Multiple Linear Regression Model for each of the following, display the summary statistics, and write the values for R-Squared and Adj R-Squared:

#mlr2 = Sales predicted by radio, tv, and pos


#mlr3 = Sales predicted by radio, tv, pos, and paper

Based purely on the values for R-Squared and Adj R-Squared, which linear regression model is best at predicting sales? Explain why.

After deciding which model predicts sales best, we will check whether it truly is the best model by predicting sales for given values of the independent variables.

Given that Radio = 69, TV = 255, POS = 1.5, and Paper = 75, calculate the predicted sales value for each of the three models above.
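One way to get a predicted value from a fitted model is the predict() function with a newdata data frame. The sketch below uses made-up data and hypothetical names (toy, toy_mlr, new_obs), not the marketing data, and confirms that predict() gives the same answer as plugging the values into the fitted equation by hand.

```r
# Toy data, made up for illustration only (not the marketing data)
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5),
  y  = c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9)
)
toy_mlr <- lm(y ~ x1 + x2, data = toy)

# New observation; column names must match the names used in the formula
new_obs <- data.frame(x1 = 7, x2 = 8)
pred <- predict(toy_mlr, newdata = new_obs)

# Same calculation by hand: intercept + b1 * x1 + b2 * x2
b <- coef(toy_mlr)
manual <- b[["(Intercept)"]] + b[["x1"]] * 7 + b[["x2"]] * 8
all.equal(unname(pred), manual)
```

For the models in this lab, the columns of the new data frame would need to use the same lowercase names that appear in each formula, such as radio and tv.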


Task 3

To complete the last task, follow the directions below. Make sure to take screenshots of the results you obtain and attach them along with your answers to the questions asked.

  1. Log on to your Watson Analytics account at watsonanalytics.com
  2. Upload the file marketing.csv unless it is already in your folder
  3. Use the Predictive module to analyze the data
  4. Note the predictive power strength of the reported variables. Consider the one-field predictive model only.
  5. How do the Watson results reconcile with your findings from the R regression analysis in Task 2? Explain how.