1. Why are we concerned with multicollinearity?
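Multicollinearity is a phenomenon in which one independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation. Because correlated predictors carry overlapping information, the model cannot cleanly separate their individual contributions to the explained variation. The standard errors of the affected coefficients are inflated, the estimates become unstable (their size or sign can change with small changes in the data), and individual predictors can look non-significant even when the R-squared is high. A high R-squared can therefore give a false signal that the model is well specified.
Intuitively, it is similar to overfitting: the coefficients are fitted to redundant information, so the interpretability of the model is compromised, and so is its predictive ability on new data in which the predictors are not correlated in the same way.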
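One common way to check for this is the variance inflation factor (VIF). The sketch below computes it by hand in base R on simulated data. The variables `x1`, `x2`, `x3` and the helper `vif_manual` are hypothetical and are not part of the in-class data set. A VIF above roughly 5 to 10 is a common rule of thumb for a problem, not a strict cutoff.

```r
# Simulated data (an assumption for illustration, not the in-class data)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # x2 is almost a copy of x1
x3 <- rnorm(n)                  # x3 is unrelated to x1 and x2
y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)
sim <- data.frame(y, x1, x2, x3)

# VIF for one predictor: regress it on the other predictors,
# then compute 1 / (1 - R^2) from that auxiliary regression.
vif_manual <- function(target, others, data) {
  aux <- lm(reformulate(others, response = target), data = data)
  1 / (1 - summary(aux)$r.squared)
}

vifs <- c(
  x1 = vif_manual("x1", c("x2", "x3"), sim),
  x2 = vif_manual("x2", c("x1", "x3"), sim),
  x3 = vif_manual("x3", c("x1", "x2"), sim)
)
print(vifs)

# Compare the standard error of x1 with and without the redundant x2
fit_collinear <- lm(y ~ x1 + x2 + x3, data = sim)
fit_reduced   <- lm(y ~ x1 + x3, data = sim)
se_collinear  <- coef(summary(fit_collinear))["x1", "Std. Error"]
se_reduced    <- coef(summary(fit_reduced))["x1", "Std. Error"]
print(c(collinear = se_collinear, reduced = se_reduced))
```

In this simulation the two near-duplicate predictors get large VIFs and the standard error of `x1` grows once `x2` is added, while the R-squared of the two fits stays almost the same.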
2. If we run an anova(model1, model2) and the p value is greater than .05, what does this mean?
It means that the two models do not differ significantly in how well they fit the data. If the two models are nested, it means that the more complicated model does not significantly improve on the simpler one, so the simpler model is usually preferred.
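As a minimal sketch of what this comparison does, the example below fits two nested models on simulated data and reproduces the F test from `anova()` by hand. The data and the names `fit_small` and `fit_large` are assumptions for illustration only. `x2` is simulated with no real effect on `y`, so a large p value is the typical outcome here.

```r
# Simulated data (hypothetical): x2 has no real effect on y
set.seed(42)
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + 1.5 * x1 + rnorm(n)
sim <- data.frame(y, x1, x2)

fit_small <- lm(y ~ x1, data = sim)        # nested inside fit_large
fit_large <- lm(y ~ x1 + x2, data = sim)

# F test for the extra term(s) in the larger model
cmp <- anova(fit_small, fit_large)
print(cmp)

# The same F statistic computed from the residual sums of squares
rss_small <- sum(resid(fit_small)^2)
rss_large <- sum(resid(fit_large)^2)
df_diff   <- df.residual(fit_small) - df.residual(fit_large)
f_manual  <- ((rss_small - rss_large) / df_diff) /
             (rss_large / df.residual(fit_large))
p_manual  <- pf(f_manual, df_diff, df.residual(fit_large), lower.tail = FALSE)
print(c(F = f_manual, p = p_manual))
```

The test asks whether the drop in residual sum of squares from adding the extra terms is larger than would be expected by chance, which is why it is only meaningful when one model is nested inside the other.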
3. Use the in-class practice data to write a summary.
A linear regression model was fitted to predict ratings from the audience's gender, age, and the number of ads run. The categorical variable was dummy coded: male is one and female is zero. A significant regression equation was found (F(2,500) = 1885, p < .001), with an R-squared of .92, which suggests that 92% of the variance in ratings can be explained by the model. All of the predictors are significant. The results suggest that more ads and a younger audience are associated with significantly higher ratings. Female audiences are associated with higher ratings than male audiences.
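The dummy coding described above relies on how R orders factor levels. The sketch below shows the mapping on a small hypothetical vector. The labels "Male" and "Female" are an assumption, since the actual labels in the spreadsheet may differ.

```r
# Hypothetical labels; the real data may use different ones
sex <- c("Male", "Female", "Female", "Male")

# as.factor() sorts the levels alphabetically: "Female" is level 1, "Male" is level 2
sex_factor <- as.factor(sex)
print(levels(sex_factor))

# Subtracting 1 from the integer codes gives Female = 0, Male = 1
sex_dummy <- as.numeric(sex_factor) - 1
print(sex_dummy)

# A more explicit alternative that does not depend on level order
sex_dummy_explicit <- as.numeric(sex == "Male")
```

The explicit comparison is less compact but makes the reference category visible in the code, which helps when interpreting the sign of the coefficient.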
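# Load packages for reading Excel files, data wrangling, and descriptive statistics
library(readxl)
library(dplyr)
library("moments")
library("pastecs")

# Read the in-class practice data
df = read_excel('/Users/jingx/Downloads/HU/510/RegressionExample.xlsx')

# Dummy-code Sex as 0/1 (factor levels are numbered alphabetically, then shifted to start at 0)
df = df %>% mutate(Sex = Sex %>% as.factor() %>% as.numeric() - 1)
df

# Correlation matrix and a density plot for each variable
cor(df)
for (i in (df %>% colnames())) {
  plot(density(df[[i]]), main = i)
}

# Rating against Age, with Sex as point shape and Ad as color
library(ggplot2)
ggplot(df, aes(x = Age, y = Rating,
               shape = (Sex %>% as.factor()),
               color = Ad %>% as.factor())) +
  geom_point()

# Candidate models, from simplest to most complex
m1 = lm(Rating ~ Age, data = df)
m2 = lm(Rating ~ Age + Sex, data = df)
m3 = lm(Rating ~ Age + Ad, data = df)
m4 = lm(Rating ~ ., data = df)              # all predictors
m5 = lm(Rating ~ . + Age * Ad, data = df)   # all predictors plus the Age-by-Ad interaction

summary(m1)
summary(m2)
summary(m3)
summary(m4)
summary(m5)

# Test whether the interaction term improves on the main-effects model
anova(m4, m5)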