Homework1


#Question1
x=c(9,5,10,7,15)
y=c(39,17,39,27,58)
plot(x,y)
linmod=lm(y~x)
abline(linmod, col="red")

linmod

## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      -1.092        4.032

#Predicting Y at X=3
newdata = data.frame(x=c(3))
head(newdata)

sprintf('predicted value Y at X=3 is %f',predict(linmod,newdata))

## [1] "predicted value Y at X=3 is 11.003521"
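The prediction can also be checked by hand from the fitted coefficients. A minimal sketch, reusing the Question 1 data; `b`, `manual` and `from_predict` are names introduced here for illustration:

```r
# Question 1 data
x <- c(9, 5, 10, 7, 15)
y <- c(39, 17, 39, 27, 58)
linmod <- lm(y ~ x)

# Fitted line: y_hat = intercept + slope * x
b <- coef(linmod)
manual <- unname(b["(Intercept)"] + b["x"] * 3)

# Should agree with predict() at x = 3
from_predict <- unname(predict(linmod, data.frame(x = 3)))
print(c(manual = manual, predict = from_predict))
```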

#Question2
A linear regression model was fitted to predict the city’s per-capita gross metropolitan product (pcgmp: in dollars per person per year) on an urban economies dataset. Five variables were used: the population of each city (pop) and the fraction (not percentage) of each city’s economy devoted to four industries: finance, “professional and technical” services (prof.tech), information and communications technologies (ICT), and management services (management). Given the summary of the linear regression model in Figure 1, answer the following questions.

#a) Estimated model, with pop=x1, finance=x2, prof.tech=x3, ict=x4, management=x5:
pcgmp = 2.17e+04 + 2.24e-03*x1 + 2.42e+04*x2 + 3.09e+04*x3 + 6.42e+04*x4 + 1.95e+05*x5

#b) RSS = RSE^2 x (residual degrees of freedom) = 7250^2 x 127 = 6,675,437,500
#c) R^2 measures the proportion of variance in the response that the model explains: R^2 = 0.433, i.e. 100 x 0.433 = 43.3%
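The identity RSS = RSE^2 x (residual degrees of freedom) behind part b) can be checked on any fitted model. A sketch, using the small Question 1 data as a stand-in because the Figure 1 model is not reproduced here; 7250 and 127 are the values used in part b), and `m`, `rss_direct`, `rss_from_rse` and `rss_q2` are illustrative names:

```r
# Stand-in model (Question 1 data); the Figure 1 fit is not available here
x <- c(9, 5, 10, 7, 15)
y <- c(39, 17, 39, 27, 58)
m <- lm(y ~ x)

# RSS computed directly, and recovered as RSE^2 * residual degrees of freedom
rss_direct <- sum(residuals(m)^2)
rss_from_rse <- sigma(m)^2 * df.residual(m)
print(all.equal(rss_direct, rss_from_rse))

# Same identity with the numbers used in part b)
rss_q2 <- 7250^2 * 127
print(format(rss_q2, big.mark = ","))
```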

#d) Is there a relationship between the city’s per-capita gross metropolitan product and the five variables? Yes. The F-statistic is greater than 1 and its p-value is less than 0.05, so we reject the null hypothesis that all five coefficients are zero and conclude that pcgmp is related to at least one of the five predictors.

#Question2 e)
x1=2361000 
x2=0.2018
x3=0.0777
x4=0.03434
x5=0.02946
pcgmp=2.17e+04+2.24e-03*x1+2.42e+04*x2+3.09e+04*x3+6.42e+04*x4+1.95e+05*x5
sprintf("the per capita gross for model is %f",pcgmp)

## [1] "the per capita gross for model is 42222.458000"
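The same prediction can be written as a single vector product, which avoids retyping the equation term by term. A minimal sketch; `beta`, `x_new` and `pcgmp_vec` are names introduced here:

```r
# Coefficients from part a): intercept, pop, finance, prof.tech, ict, management
beta <- c(2.17e+04, 2.24e-03, 2.42e+04, 3.09e+04, 6.42e+04, 1.95e+05)

# Leading 1 multiplies the intercept; the rest are x1..x5 from part e)
x_new <- c(1, 2361000, 0.2018, 0.0777, 0.03434, 0.02946)

pcgmp_vec <- sum(beta * x_new)
print(pcgmp_vec)
```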

#Question3
#Given:
#X1 = GPA
#X2 = IQ
#X3 = Level (1 for college and 0 for high school)
#X4 = Interaction between GPA and IQ
#X5 = Interaction between GPA and Level
#The fitted multiple regression equation is
#Y = 50 + 20*x1 + 0.07*x2 + 35*x3 + 0.01*x4 - 10*x5
#For high school (Level = 0, so x3 = 0 and x5 = 0):
#Yhs = 50 + 20*x1 + 0.07*x2 + 35*[0] + 0.01*x4 - 10*[0]
#Yhs = 50 + 20*x1 + 0.07*x2 + 0.01*x4

#For college (Level = 1, so x3 = 1 and x5 = x1):
#Yc = 50 + 20*x1 + 0.07*x2 + 35*[1] + 0.01*x4 - 10*x5 = 85 + 20*x1 + 0.07*x2 + 0.01*x4 - 10*x5

#Option (iii): for a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough. The difference Yc - Yhs = 35 - 10*x1 is negative once GPA exceeds 3.5.

#b) Predicted salary of a college graduate with an IQ of 110 and a GPA of 4.0:
#Yc = 85 + 20*x1 + 0.07*x2 + 0.01*x4 - 10*x5
#GPA = 4 = x1
#IQ = 110 = x2
#X4 = 110*4 = 440
#X5 = 4*1 = 4
#Substituting into the Yc equation:
#Yc = 85 + 20(4) + 0.07(110) + 0.01(440) - 10(4) = 137.1
#Since salary is in units of 1000 dollars, the predicted salary is $137,100.
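Parts a) and b) can be checked by coding the fitted equation as a function. A minimal sketch; the function name `salary` and its argument names are introduced here for illustration:

```r
# Fitted equation from Question 3; salary is in thousands of dollars.
# level: 1 for college, 0 for high school
salary <- function(gpa, iq, level) {
  50 + 20 * gpa + 0.07 * iq + 35 * level +
    0.01 * (gpa * iq) - 10 * (gpa * level)
}

# Part b): college graduate with IQ 110 and GPA 4.0, and the
# high school graduate with the same IQ and GPA for comparison
college <- salary(gpa = 4, iq = 110, level = 1)
high_school <- salary(gpa = 4, iq = 110, level = 0)
print(c(college = college, high_school = high_school))

# College minus high school equals 35 - 10 * GPA, so it is zero at GPA 3.5
gap <- salary(3.5, 110, 1) - salary(3.5, 110, 0)
print(gap)
```

At a GPA of 4.0 the high school prediction is the larger of the two, which is the situation option (iii) describes.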

#c) The statement is false. A small coefficient on the GPA x IQ interaction term does not by itself show that the interaction effect is weak.
#The size of a coefficient depends on the scale of its variable, and GPA x IQ takes large values. Evidence for or against the interaction comes from its statistical significance (the t-statistic and p-value of the coefficient), not from its magnitude.
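To illustrate why coefficient size alone is not evidence about an interaction, the sketch below rescales one predictor and refits. It uses R's built-in `mtcars` data purely as a stand-in (an assumption, since the salary data for this question is not available), and `hp_k` is a column name introduced here:

```r
# Stand-in data: built-in mtcars, not the salary data from the question
d <- mtcars
d$hp_k <- d$hp / 1000   # same variable, different units

fit_raw    <- lm(mpg ~ wt * hp,   data = d)
fit_scaled <- lm(mpg ~ wt * hp_k, data = d)

# The interaction coefficient changes by a factor of 1000 ...
b_raw    <- unname(coef(fit_raw)["wt:hp"])
b_scaled <- unname(coef(fit_scaled)["wt:hp_k"])
print(c(raw = b_raw, scaled = b_scaled))

# ... but its t-statistic, which carries the evidence, does not change
t_raw    <- summary(fit_raw)$coefficients["wt:hp", "t value"]
t_scaled <- summary(fit_scaled)$coefficients["wt:hp_k", "t value"]
print(all.equal(t_raw, t_scaled))
```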

#Question 4
library(ISLR)
#Dataset Auto is Loaded
data(Auto)
#lm() function to perform a simple linear regression with mpg and HP
fit = lm(mpg ~ horsepower, data = Auto)
summary(fit)

## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

#predicted mpg associated with a horsepower of 98
predict(fit, data.frame(horsepower = 98), interval = "predict",level=0.95)

##        fit     lwr      upr
## 1 24.46708 14.8094 34.12476

predict(fit, data.frame(horsepower = 98), interval = "confidence",level=0.95)

##        fit      lwr      upr
## 1 24.46708 23.97308 24.96108
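The two intervals differ only in their standard error: the prediction interval adds the variance of a single new observation to the uncertainty in the fitted mean. A sketch that rebuilds both by hand, assuming the `ISLR` package is installed; all variable names other than `fit` are introduced here:

```r
library(ISLR)
data(Auto)
fit <- lm(mpg ~ horsepower, data = Auto)

x0    <- 98
n     <- nrow(Auto)
xbar  <- mean(Auto$horsepower)
sxx   <- sum((Auto$horsepower - xbar)^2)
s     <- sigma(fit)                          # residual standard error
tcrit <- qt(0.975, df = df.residual(fit))    # two-sided 95%
yhat  <- sum(coef(fit) * c(1, x0))

# Standard error of the mean response vs. of a single new observation
se_mean <- s * sqrt(1 / n + (x0 - xbar)^2 / sxx)
se_pred <- s * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)

conf_int <- yhat + c(-1, 1) * tcrit * se_mean
pred_int <- yhat + c(-1, 1) * tcrit * se_pred
print(rbind(confidence = conf_int, prediction = pred_int))
```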

#plot() of horsepower vs mpg; abline() adds the least squares regression line
plot(Auto$horsepower, Auto$mpg, main = "Scatterplot", xlab = "horsepower", ylab = "mpg")
abline(fit, col = "blue")

#produced diagnostic plots of the least squares regression fit
plot(fit)

#4 i) There is a relationship between the predictor and the response: the p-value for horsepower is below 2e-16, so the relationship between mpg and horsepower is highly significant.
# ii) The relationship is fairly strong: R-squared is 0.6059, so horsepower explains about 60.6% of the variance in mpg, and the scatterplot shows mpg falling steadily as horsepower rises.
# iii) The relationship is negative: the horsepower coefficient is -0.157845, so cars with more horsepower get fewer miles per gallon, i.e. lower fuel efficiency.
# iv) The predicted mpg associated with a horsepower of 98 is 24.46708.
# The 95% prediction interval is 14.8094 to 34.12476.
# The 95% confidence interval is 23.97308 to 24.96108.
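For a simple linear regression, the strength and direction of the relationship can both be read from the correlation: its sign gives the direction and its square equals R-squared. A short check, assuming the `ISLR` package is installed; `r` and `r_squared` are names introduced here:

```r
library(ISLR)
data(Auto)
fit <- lm(mpg ~ horsepower, data = Auto)

# Sample correlation between response and predictor
r <- cor(Auto$mpg, Auto$horsepower)

# R-squared reported by summary(); for one predictor it equals r^2
r_squared <- summary(fit)$r.squared
print(c(correlation = r, r_squared = r_squared, cor_squared = r^2))
```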

#Question5 
library(ISLR)
data(Auto)
#Scatterplot for all variables
pairs(Auto)

#correlation matrix of all variables except name
cor(Auto[1:(length(Auto)-1)])

##                     mpg  cylinders displacement horsepower     weight
## mpg           1.0000000 -0.7776175   -0.8051269 -0.7784268 -0.8322442
## cylinders    -0.7776175  1.0000000    0.9508233  0.8429834  0.8975273
## displacement -0.8051269  0.9508233    1.0000000  0.8972570  0.9329944
## horsepower   -0.7784268  0.8429834    0.8972570  1.0000000  0.8645377
## weight       -0.8322442  0.8975273    0.9329944  0.8645377  1.0000000
## acceleration  0.4233285 -0.5046834   -0.5438005 -0.6891955 -0.4168392
## year          0.5805410 -0.3456474   -0.3698552 -0.4163615 -0.3091199
## origin        0.5652088 -0.5689316   -0.6145351 -0.4551715 -0.5850054
##              acceleration       year     origin
## mpg             0.4233285  0.5805410  0.5652088
## cylinders      -0.5046834 -0.3456474 -0.5689316
## displacement   -0.5438005 -0.3698552 -0.6145351
## horsepower     -0.6891955 -0.4163615 -0.4551715
## weight         -0.4168392 -0.3091199 -0.5850054
## acceleration    1.0000000  0.2903161  0.2127458
## year            0.2903161  1.0000000  0.1815277
## origin          0.2127458  0.1815277  1.0000000

#lm() with mpg as the response and all other variables except name as predictors
fit2=lm(mpg ~ cylinders + displacement + horsepower + weight + acceleration + year + origin, data = Auto)
summary(fit2)

## 
## Call:
## lm(formula = mpg ~ cylinders + displacement + horsepower + weight + 
##     acceleration + year + origin, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

#plot() function to produce diagnostic plots of the linear regression fit
plot(fit2)

#use the * and : symbols to fit linear regression models with interaction effects
fit3=lm(mpg ~ (horsepower* cylinders)+(acceleration * horsepower), data = Auto[1:8])
summary(fit3)

## 
## Call:
## lm(formula = mpg ~ (horsepower * cylinders) + (acceleration * 
##     horsepower), data = Auto[1:8])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5284  -2.2888  -0.3582   1.6502  15.2858 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             78.214448   6.589083  11.870  < 2e-16 ***
## horsepower              -0.396913   0.066457  -5.972 5.31e-09 ***
## cylinders               -5.949145   0.645117  -9.222  < 2e-16 ***
## acceleration            -0.282888   0.238244  -1.187    0.236    
## horsepower:cylinders     0.044425   0.006010   7.392 9.05e-13 ***
## horsepower:acceleration -0.002710   0.002371  -1.143    0.254    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.963 on 386 degrees of freedom
## Multiple R-squared:  0.7454, Adjusted R-squared:  0.7421 
## F-statistic: 226.1 on 5 and 386 DF,  p-value: < 2.2e-16

plot(fit3)

#different transformations of the variables, such as log(X), √X, X^2.
plot(log(Auto$horsepower), Auto$mpg)

plot(sqrt(Auto$acceleration), Auto$mpg)

plot((Auto$horsepower)^2, Auto$mpg)

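#5 c) i) There is a relationship between the predictors and the response: the F-statistic is 252.4 on 7 and 384 degrees of freedom, with a p-value below 2.2e-16.
# ii) The significance stars mark the predictors with a statistically significant relationship to the response: displacement (**), weight, year and origin (***). cylinders, horsepower and acceleration are not significant at the 0.05 level.
# iii) The year coefficient, 0.750773, is positive: holding the other predictors fixed, mpg increases by about 0.75 for each later model year.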
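The overall F-test in c) i) can be pulled out of the summary object and its p-value recomputed with `pf()`. A sketch, assuming the `ISLR` package is installed; `f` and `p_value` are names introduced here:

```r
library(ISLR)
data(Auto)
fit2 <- lm(mpg ~ cylinders + displacement + horsepower + weight +
             acceleration + year + origin, data = Auto)

# Named vector with elements value, numdf and dendf
f <- summary(fit2)$fstatistic
print(f)

# Upper-tail probability of the F distribution: the p-value of the
# test that all seven slope coefficients are zero
p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
print(p_value)
```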

#d) The residuals vs fitted plot flags 3 unusually large outliers (observations 323, 326, 327).
# The residuals vs leverage plot flags observations 327 and 394 as unusual.
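The points labelled in the diagnostic plots can also be listed numerically. A sketch, assuming the `ISLR` package is installed; `stud` and `lev` are names introduced here, and the printed labels are the row names of `Auto`, the same labels the plots use:

```r
library(ISLR)
data(Auto)
fit2 <- lm(mpg ~ cylinders + displacement + horsepower + weight +
             acceleration + year + origin, data = Auto)

stud <- rstudent(fit2)    # studentized residuals
lev  <- hatvalues(fit2)   # leverage of each observation

# Observations with the largest absolute studentized residuals
print(head(sort(abs(stud), decreasing = TRUE), 3))

# Observations with the highest leverage, next to the average leverage,
# which is (number of coefficients) / n
print(head(sort(lev, decreasing = TRUE), 3))
print(mean(lev))
```

A common rule of thumb treats studentized residuals beyond about 3 in absolute value, or leverage far above the average, as worth a closer look.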

#e) In the model with interactions, horsepower:cylinders is statistically significant (p = 9.05e-13) while horsepower:acceleration is not (p = 0.254).
#f) Tried different transformations of the variables: log(horsepower), sqrt(acceleration) and horsepower^2.
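The transformations in f) were only plotted. One way to compare them is to fit each one and look at R-squared. The sketch below is an illustration rather than part of the original answer: it assumes the `ISLR` package is installed and applies every transformation to horsepower so the fits are comparable. The names `fits` and `r2` are introduced here:

```r
library(ISLR)
data(Auto)

# One model per transformation of horsepower; I() protects ^ inside a formula
fits <- list(
  linear    = lm(mpg ~ horsepower, data = Auto),
  log       = lm(mpg ~ log(horsepower), data = Auto),
  sqrt      = lm(mpg ~ sqrt(horsepower), data = Auto),
  quadratic = lm(mpg ~ horsepower + I(horsepower^2), data = Auto)
)

# R-squared of each fit
r2 <- sapply(fits, function(m) summary(m)$r.squared)
print(round(r2, 4))
```

The quadratic model contains the linear one, so its R-squared cannot be lower. The diagnostic plots from `plot()` are still needed to judge whether a transformation removes the curvature in the residuals.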

Homework1_PDA

2023-02-20
