#Question1
x=c(9,5,10,7,15)
y=c(39,17,39,27,58)
plot(x,y)
linmod=lm(y~x)
abline(linmod, col="red")
linmod
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## -1.092 4.032
#Predicting Y at X=3
newdata = data.frame(x=c(3))
head(newdata)
sprintf('predicted value Y at X=3 is %f',predict(linmod,newdata))
## [1] "predicted value Y at X=3 is 11.003521"
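As a cross-check on the fitted line, the slope and intercept can be recomputed from the closed-form least squares formulas and compared with `lm()`. This is a minimal sketch; the variable names `slope`, `intercept` and `pred_at_3` are illustrative and do not appear in the original answer.

```r
# Same data as Question 1
x <- c(9, 5, 10, 7, 15)
y <- c(39, 17, 39, 27, 58)

# Closed-form least squares estimates for y = intercept + slope * x
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# Prediction at x = 3, computed by hand from the two estimates
pred_at_3 <- intercept + slope * 3

# Compare with the values that lm() and predict() report
linmod <- lm(y ~ x)
print(c(intercept = intercept, slope = slope))
print(pred_at_3)
print(unname(predict(linmod, data.frame(x = 3))))
```

The hand-computed prediction should agree with the `predict()` result printed above to within rounding.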
#Question2
A linear regression model was fitted to predict a city’s per-capita gross metropolitan product (pcgmp, in dollars per person per year) on an urban economies dataset. Five predictors were used: the population of each city (pop) and the fraction (not percentage) of each city’s economy devoted to four industries: finance, “professional and technical” services (prof.tech), information and communications technologies (ICT), and management services (management). Given the summary of the linear regression model in Figure 1, answer the following questions.
#a) Estimated model, with pop=x1, finance=x2, prof.tech=x3, ict=x4, management=x5:
pcgmp = 2.17e+04 + 2.24e-03*x1 + 2.42e+04*x2 + 3.09e+04*x3 + 6.42e+04*x4 + 1.95e+05*x5
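#b) RSS = RSE^2 * df, where df is the residual degrees of freedom (n - p - 1 for p predictors). With RSE = 7250 and df = 127: RSS = 7250^2 * 127 = 6,675,437,500
#c) R^2 measures the proportion of the variance in the response that the model explains: R^2 = 0.433, i.e. 100 * 0.433 = 43.3%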
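The arithmetic in part b) follows from the definition RSE = sqrt(RSS / df), rearranged to RSS = RSE^2 * df. A short sketch, taking 7250 and 127 from the answer above as the residual standard error and the residual degrees of freedom (the names `rse`, `resid_df` and `rss` are illustrative):

```r
# Values used in part b)
rse <- 7250        # residual standard error
resid_df <- 127    # residual degrees of freedom, as used in the answer

# RSE = sqrt(RSS / df)  =>  RSS = RSE^2 * df
rss <- rse^2 * resid_df

# Print with thousands separators for readability
print(formatC(rss, format = "f", digits = 0, big.mark = ","))
```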
#d) Is there a relationship between the city’s per-capita gross metropolitan product and the five variables? Yes. The F-statistic is greater than 1 and its p-value is less than 0.05, so we reject the null hypothesis that all five coefficients are zero and conclude that pcgmp is related to at least one of the five predictors.
#Question2 e)
x1=2361000
x2=0.2018
x3=0.0777
x4=0.03434
x5=0.02946
pcgmp=2.17e+04+2.24e-03*x1+2.42e+04*x2+3.09e+04*x3+6.42e+04*x4+1.95e+05*x5
sprintf("the per capita gross for model is %f",pcgmp)
## [1] "the per capita gross for model is 42222.458000"
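The same prediction can be written as a sum of coefficient-times-value products over named vectors, which makes it harder to pair a coefficient with the wrong predictor. This is only a sketch: the vector names `coefs` and `city` are illustrative, and the leading `intercept = 1` entry is a bookkeeping device rather than part of the original answer.

```r
# Estimated coefficients from part a)
coefs <- c(intercept = 2.17e+04, pop = 2.24e-03, finance = 2.42e+04,
           prof.tech = 3.09e+04, ict = 6.42e+04, management = 1.95e+05)

# Predictor values from part e); the intercept is multiplied by 1
city <- c(intercept = 1, pop = 2361000, finance = 0.2018,
          prof.tech = 0.0777, ict = 0.03434, management = 0.02946)

# Match by name, multiply element-wise, and add up
pcgmp_hat <- sum(coefs * city[names(coefs)])
print(pcgmp_hat)
```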
#Question3
#Given: X1 = GPA, X2 = IQ, X3 = Level (1 for college and 0 for high school), X4 = interaction between GPA and IQ, X5 = interaction between GPA and Level
#The fitted multiple regression equation is Y = 50 + 20*x1 + 0.07*x2 + 35*x3 + 0.01*x4 - 10*x5
#For high school (x3 = 0, so x5 = x1*x3 = 0): Yhs = 50 + 20*x1 + 0.07*x2 + 35*[0] + 0.01*x4 - 10*[0] = 50 + 20*x1 + 0.07*x2 + 0.01*x4
#For college (x3 = 1, so x5 = x1): Yc = 50 + 20*x1 + 0.07*x2 + 35*[1] + 0.01*x4 - 10*x5 = 85 + 20*x1 + 0.07*x2 + 0.01*x4 - 10*x5
#At the same IQ and GPA, Yc - Yhs = 35 - 10*x1, which is negative once the GPA exceeds 3.5.
#Option (iii): for a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough.
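Option (iii) rests on the difference between the college and high school equations at the same GPA and IQ, which reduces to 35 - 10*GPA. The sketch below tabulates that gap for a few GPA values; the function name `college_gap` and the grid of GPAs are illustrative choices.

```r
# College minus high school salary (in thousands of dollars)
# at the same GPA and IQ: the IQ terms cancel, leaving 35 - 10 * GPA
college_gap <- function(gpa) 35 - 10 * gpa

gpa_grid <- c(2.0, 3.0, 3.5, 4.0)
print(data.frame(gpa = gpa_grid,
                 college_minus_highschool = college_gap(gpa_grid)))
```

A negative entry means high school graduates are predicted to earn more at that GPA; the sign flips at a GPA of 3.5.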
#b) Predicted salary of a college graduate with an IQ of 110 and a GPA of 4.0:
#x1 = GPA = 4, x2 = IQ = 110, x4 = 110*4 = 440, x5 = 4*1 = 4
#Substituting into the college equation: Yc = 85 + 20*(4) + 0.07*(110) + 0.01*(110*4) - 10*(4*1) = 137.1
#Since salary is in units of 1000 dollars, the predicted salary is $137,100.
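The substitution in part b) can be reproduced with a small function that builds the two interaction terms from GPA, IQ and Level. A sketch under the coefficients given in the question; the function name `predict_salary` is hypothetical.

```r
# Salary model from Question 3, in thousands of dollars.
# level is 1 for college and 0 for high school.
predict_salary <- function(gpa, iq, level) {
  50 + 20 * gpa + 0.07 * iq + 35 * level +
    0.01 * (gpa * iq) -      # X4: GPA x IQ interaction
    10 * (gpa * level)       # X5: GPA x Level interaction
}

college_salary <- predict_salary(gpa = 4.0, iq = 110, level = 1)
print(college_salary)          # thousands of dollars
print(college_salary * 1000)   # dollars
```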
#c) The statement is false. A small coefficient on the GPA x IQ interaction term is not, by itself, evidence that the interaction effect is small. The size of the coefficient depends on the units of GPA and IQ (their product can run into the hundreds), so whether an interaction is present has to be judged from the statistical significance of the term, i.e. its standard error and p-value, not from its magnitude.
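The point in part c) can be illustrated with simulated data: rescaling a predictor changes the size of the interaction coefficient but leaves its t-statistic untouched. Everything in this sketch is an assumption made for illustration (the simulated `gpa`, `iq` and `salary` values, the noise level, and the rescaled variable `iq100`); none of it comes from the question's data.

```r
set.seed(1)

# Synthetic data loosely shaped like the Question 3 setup (illustration only)
n <- 200
gpa <- runif(n, 2, 4)
iq <- rnorm(n, mean = 100, sd = 15)
salary <- 50 + 20 * gpa + 0.07 * iq + 0.01 * gpa * iq + rnorm(n, sd = 5)

# Fit once with IQ in raw units, once with IQ divided by 100
fit_raw <- lm(salary ~ gpa * iq)
iq100 <- iq / 100
fit_scaled <- lm(salary ~ gpa * iq100)

raw <- summary(fit_raw)$coefficients["gpa:iq", ]
scaled <- summary(fit_scaled)$coefficients["gpa:iq100", ]

# The estimate changes by a factor of 100; the t value and p-value do not
print(rbind(raw = raw, scaled = scaled))
```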
#Question 4
library(ISLR)
#Dataset Auto is Loaded
data(Auto)
#lm() function to perform a simple linear regression with mpg and HP
fit = lm(mpg ~ horsepower, data = Auto)
summary(fit)
##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
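Several numbers in this summary are tied together, which gives a quick consistency check. In a simple regression the t value is the estimate divided by its standard error, the F-statistic is the square of the slope's t value, and F can also be recovered from R-squared. The sketch below uses only the rounded values printed above, so small discrepancies are expected; the variable names are illustrative.

```r
# Rounded values copied from the summary(fit) output
estimate  <- -0.157845
std_error <- 0.006446
r_squared <- 0.6059
resid_df  <- 390

# t value for the horsepower slope
t_value <- estimate / std_error

# With a single predictor, F equals t^2 ...
f_from_t <- t_value^2

# ... and also equals (R^2 / (1 - R^2)) * residual df
f_from_r2 <- (r_squared / (1 - r_squared)) * resid_df

print(c(t_value = t_value, f_from_t = f_from_t, f_from_r2 = f_from_r2))
```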
#predicted mpg associated with a horsepower of 98
predict(fit, data.frame(horsepower = 98), interval = "predict",level=0.95)
## fit lwr upr
## 1 24.46708 14.8094 34.12476
predict(fit, data.frame(horsepower = 98), interval = "confidence",level=0.95)
## fit lwr upr
## 1 24.46708 23.97308 24.96108
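The prediction interval is wider than the confidence interval because it adds the residual variance to the uncertainty of the fitted mean. As a rough sketch, the prediction interval can be rebuilt from the printed confidence interval and the residual standard error; since only rounded printed values are used, the result matches to about two decimal places. The names `se_fit`, `pi_half` and `pred_int` are illustrative.

```r
# Rounded values copied from the output above
fit_98   <- 24.46708   # predicted mpg at horsepower = 98
ci_upr   <- 24.96108   # upper end of the 95% confidence interval
sigma    <- 4.906      # residual standard error
resid_df <- 390

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = resid_df)

# Standard error of the fitted mean, backed out of the confidence interval
se_fit <- (ci_upr - fit_98) / t_crit

# Prediction interval: add the residual variance under the square root
pi_half <- t_crit * sqrt(se_fit^2 + sigma^2)
pred_int <- c(lwr = fit_98 - pi_half, upr = fit_98 + pi_half)
print(pred_int)
```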
#plot() of mpg against horsepower; abline() adds the least squares regression line
plot(Auto$horsepower, Auto$mpg, main = "Scatterplot", xlab = "horsepower", ylab = "mpg")
abline(fit, col = "blue")
#produced diagnostic plots of the least squares regression fit
plot(fit)
#4 i) There is a relationship between the predictor and the response: the horsepower coefficient has a p-value below 2e-16 and the F-statistic is 599.7.
# ii) The relationship is fairly strong: R-squared is 0.6059, so horsepower explains about 60.6% of the variance in mpg, and the scatterplot shows mpg falling steadily as horsepower increases.
# iii) The relationship is negative: the slope is -0.157845, so cars with more horsepower get fewer miles per gallon, i.e. lower fuel efficiency.
# iv) The predicted mpg associated with a horsepower of 98 is 24.46708.
# The 95% prediction interval is (14.8094, 34.12476).
# The 95% confidence interval is (23.97308, 24.96108).
#Question5
library(ISLR)
data(Auto)
#Scatterplot for all variables
pairs(Auto)
#correlation matrix of all variables except name
cor(Auto[1:(length(Auto)-1)])
## mpg cylinders displacement horsepower weight
## mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
## cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
## displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
## horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
## weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
## acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
## year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
## origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
## acceleration year origin
## mpg 0.4233285 0.5805410 0.5652088
## cylinders -0.5046834 -0.3456474 -0.5689316
## displacement -0.5438005 -0.3698552 -0.6145351
## horsepower -0.6891955 -0.4163615 -0.4551715
## weight -0.4168392 -0.3091199 -0.5850054
## acceleration 1.0000000 0.2903161 0.2127458
## year 0.2903161 1.0000000 0.1815277
## origin 0.2127458 0.1815277 1.0000000
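To read the matrix with mpg as the response in mind, it helps to rank the predictors by the absolute size of their correlation with mpg. The sketch below hard-codes the mpg row printed above so that it runs without the ISLR package; the names `cor_with_mpg` and `ranked` are illustrative.

```r
# Correlations with mpg, copied from the first row of the matrix above
cor_with_mpg <- c(cylinders = -0.7776175, displacement = -0.8051269,
                  horsepower = -0.7784268, weight = -0.8322442,
                  acceleration = 0.4233285, year = 0.5805410,
                  origin = 0.5652088)

# Sort by absolute correlation, strongest first
ranked <- cor_with_mpg[order(abs(cor_with_mpg), decreasing = TRUE)]
print(round(ranked, 3))
```

With the data frame loaded, the same row is available as `cor(Auto[1:8])["mpg", -1]`.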
#lm() with mpg as the response and all other variables except name as predictors
fit2=lm(mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = Auto)
summary(fit2)
##
## Call:
## lm(formula = mpg ~ cylinders + displacement + horsepower + weight +
## acceleration + year + origin, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5903 -2.1565 -0.1169 1.8690 13.0604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.218435 4.644294 -3.707 0.00024 ***
## cylinders -0.493376 0.323282 -1.526 0.12780
## displacement 0.019896 0.007515 2.647 0.00844 **
## horsepower -0.016951 0.013787 -1.230 0.21963
## weight -0.006474 0.000652 -9.929 < 2e-16 ***
## acceleration 0.080576 0.098845 0.815 0.41548
## year 0.750773 0.050973 14.729 < 2e-16 ***
## origin 1.426141 0.278136 5.127 4.67e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
## F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
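The adjusted R-squared and the F-statistic in this summary can both be recovered from the multiple R-squared, the number of predictors and the residual degrees of freedom. A sketch using the rounded printed values, so agreement is only to the printed precision; the variable names are illustrative.

```r
# Rounded values copied from the summary(fit2) output
r_squared <- 0.8215
p         <- 7      # number of predictors
resid_df  <- 384    # residual degrees of freedom
n         <- resid_df + p + 1

# Adjusted R^2 penalises the number of predictors
adj_r2 <- 1 - (1 - r_squared) * (n - 1) / resid_df

# Overall F-statistic for H0: all slope coefficients are zero
f_stat <- (r_squared / p) / ((1 - r_squared) / resid_df)

print(c(n = n, adj_r2 = adj_r2, f_stat = f_stat))
```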
#plot() function to produce diagnostic plots of the linear regression fit
plot(fit2)
#use the * and : operators to fit linear regression models with interaction effects
fit3=lm(mpg ~ (horsepower* cylinders)+(acceleration * horsepower), data = Auto[1:8])
summary(fit3)
##
## Call:
## lm(formula = mpg ~ (horsepower * cylinders) + (acceleration *
## horsepower), data = Auto[1:8])
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5284 -2.2888 -0.3582 1.6502 15.2858
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 78.214448 6.589083 11.870 < 2e-16 ***
## horsepower -0.396913 0.066457 -5.972 5.31e-09 ***
## cylinders -5.949145 0.645117 -9.222 < 2e-16 ***
## acceleration -0.282888 0.238244 -1.187 0.236
## horsepower:cylinders 0.044425 0.006010 7.392 9.05e-13 ***
## horsepower:acceleration -0.002710 0.002371 -1.143 0.254
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.963 on 386 degrees of freedom
## Multiple R-squared: 0.7454, Adjusted R-squared: 0.7421
## F-statistic: 226.1 on 5 and 386 DF, p-value: < 2.2e-16
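The coefficient table has five terms because `a * b` in a formula is shorthand for `a + b + a:b`. This can be confirmed without any data by expanding the formula with `terms()`; the object names `f_star` and `f_long` are illustrative.

```r
# Formula as written in the fit3 call, using *
f_star <- mpg ~ (horsepower * cylinders) + (acceleration * horsepower)

# The same model spelled out with main effects and : interactions
f_long <- mpg ~ horsepower + cylinders + acceleration +
  horsepower:cylinders + horsepower:acceleration

# terms() expands a formula into its individual term labels
labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

print(labels_star)
print(identical(labels_star, labels_long))
```

Note that horsepower appears in both products but enters the model only once as a main effect.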
plot(fit3)
#different transformations of the variables, such as log(X), √X, X^2.
plot(log(Auto$horsepower), Auto$mpg)
plot(sqrt(Auto$acceleration), Auto$mpg)
plot((Auto$horsepower)^2, Auto$mpg)
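Beyond plotting, one way to compare transformations is to fit a model with each and look at R-squared. The sketch below deliberately uses R's built-in `mtcars` data (columns `mpg` and `hp`) as a stand-in so that it runs without the ISLR package; it is not the Auto data, and its numbers say nothing about the results above. With `Auto` loaded, the same pattern applies with `horsepower` in place of `hp`.

```r
# Stand-in data: built-in mtcars, NOT the Auto data set used above
lin_fit  <- lm(mpg ~ hp, data = mtcars)            # untransformed
log_fit  <- lm(mpg ~ log(hp), data = mtcars)       # log(X)
sqrt_fit <- lm(mpg ~ sqrt(hp), data = mtcars)      # sqrt(X)
quad_fit <- lm(mpg ~ hp + I(hp^2), data = mtcars)  # X plus X^2

# Collect R^2 for a side-by-side comparison
r2 <- c(linear    = summary(lin_fit)$r.squared,
        log       = summary(log_fit)$r.squared,
        sqrt      = summary(sqrt_fit)$r.squared,
        quadratic = summary(quad_fit)$r.squared)
print(round(r2, 3))
```

`I(hp^2)` is needed because `^` has a special meaning inside a formula; wrapping it in `I()` makes R treat it as ordinary arithmetic.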
#5 c) i) There is a relationship between the predictors and the response: the F-statistic is 252.4 on 7 and 384 degrees of freedom, with a p-value below 2.2e-16.
# ii) The significance codes show that displacement (**), weight, year and origin (all ***) have a statistically significant relationship to the response; cylinders, horsepower and acceleration do not.
# iii) The year coefficient is positive (0.750773): with the other predictors held fixed, mpg rises by about 0.75 for each additional model year.
#d) The residuals vs fitted plot flags three unusually large residuals (observations 323, 326, 327). The residuals vs leverage plot flags observations 327 and 394.
#e) Of the two interactions fitted with *, horsepower:cylinders is statistically significant (p = 9.05e-13), while horsepower:acceleration is not (p = 0.254).
#f) Plotted mpg against transformed predictors: log(horsepower), sqrt(acceleration) and horsepower^2.
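The significance reading in 5 c) ii) can be made mechanical by filtering the printed p-values at the 0.05 level. This sketch hard-codes the p-values from the `summary(fit2)` table; for weight and year the output only reports `< 2e-16`, so 2e-16 is used here as an upper bound. The names `p_values` and `significant` are illustrative.

```r
# p-values copied from the summary(fit2) coefficient table
# (weight and year are printed as "< 2e-16"; 2e-16 is an upper bound)
p_values <- c(cylinders = 0.12780, displacement = 0.00844,
              horsepower = 0.21963, weight = 2e-16,
              acceleration = 0.41548, year = 2e-16,
              origin = 4.67e-07)

# Predictors significant at the 5% level
significant <- names(p_values)[p_values < 0.05]
print(significant)
```

With the fitted object available, the same vector comes from `summary(fit2)$coefficients[-1, "Pr(>|t|)"]`.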