This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
my_locale <- Sys.getlocale("LC_ALL")
Sys.setlocale("LC_ALL", my_locale)
OS reports request to set locale to "LC_COLLATE=English_India.1252;LC_CTYPE=English_India.1252;LC_MONETARY=English_India.1252;LC_NUMERIC=C;LC_TIME=English_India.1252" cannot be honored
[1] ""
library(readxl)
Multilinear_regression_salary <- read_excel("C:/Users/DELL/Desktop/Imarticus/Assignments excel/Regression_BATCH2.xlsx")
View(Multilinear_regression_salary)
This dataset is used to predict salary from the other variables. It contains five variables: Student, School_Ranking, GPA, Experience, and Salary.
str(Multilinear_regression_salary)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 5 variables:
$ Student : num 1 2 3 4 5 6 7 8 9 10 ...
$ School_Ranking: num 78 56 23 67 56 78 68 89 37 67 ...
$ GPA : num 2.92 3.84 3.04 3.2 3.61 2.99 3.78 3.2 3.42 3.05 ...
$ Experience : num 3 9 6 6 7 5 8 5 7 5 ...
$ Salary : num 73590 87000 76970 79320 79530 ...
The View function opens the dataset in a spreadsheet-style viewer.
summary(Multilinear_regression_salary)
Student School_Ranking GPA Experience
Min. : 1.00 Min. :15.00 Min. :2.760 Min. :2.000
1st Qu.:10.75 1st Qu.:45.75 1st Qu.:3.033 1st Qu.:5.000
Median :20.50 Median :67.00 Median :3.155 Median :6.000
Mean :20.50 Mean :59.88 Mean :3.233 Mean :5.975
3rd Qu.:30.25 3rd Qu.:76.50 3rd Qu.:3.350 3rd Qu.:7.000
Max. :40.00 Max. :89.00 Max. :3.850 Max. :9.000
Salary
Min. :71040
1st Qu.:76913
Median :78670
Mean :78721
3rd Qu.:80600
Max. :87000
The summary of the dataset gives the minimum, maximum, quartiles, mean, and median of each variable. This gives us a basic understanding of the dataset.
str(Multilinear_regression_salary)
str displays the structure of the dataset, which shows which columns are character and which are numeric. Here all five columns are numeric.
plot gives a scatter plot matrix of the dataset, with one panel for every pair of variables.
scatter.smooth draws the same kind of scatter plot with a smoothed curve added, but here we have used it for only one variable.
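The two plotting calls described above can be sketched as follows. This is a minimal sketch on synthetic stand-in data: `salary_data` and its generated values are assumptions for illustration, not the contents of the original Excel file, and the choice of GPA as the x variable is also an assumption.

```r
# Synthetic stand-in data with the same column names as the notebook's
# dataset (the values are made up for illustration only).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

# plot() on a data frame with several columns draws a scatter plot matrix.
plot(salary_data)

# scatter.smooth() draws one scatter plot and overlays a loess curve.
scatter.smooth(salary_data$GPA, salary_data$Salary,
               xlab = "GPA", ylab = "Salary", main = "Salary vs GPA")
```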
This boxplot is for the Student variable of the dataset.
This boxplot is for the School_Ranking variable of the dataset.
This boxplot is for the GPA variable of the dataset.
This boxplot is for the Experience variable of the dataset.
This boxplot is for the Salary variable of the dataset.
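The five boxplots above could be produced along these lines. This is a sketch only: `salary_data` is a synthetic stand-in (an assumption, not the original data), and the loop is one possible way to avoid repeating the call for each column.

```r
# Synthetic stand-in data (values are illustrative assumptions).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

# One boxplot per variable, each titled with the column name.
for (v in names(salary_data)) {
  boxplot(salary_data[[v]], main = v, col = "lightblue")
}

# boxplot.stats() returns the five-number summary used by the plot
# and any points that would be drawn as outliers.
salary_stats <- boxplot.stats(salary_data$Salary)
print(salary_stats$stats)
print(salary_stats$out)
```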
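This barplot shows Salary as the bar heights, with the Student variable setting the bar widths.
This barplot shows Salary as the bar heights, with School_Ranking setting the bar widths.
This barplot shows Salary as the bar heights, with GPA setting the bar widths.
This barplot shows Salary as the bar heights, with Experience setting the bar widths.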
# Arrange the next four plots in 1 row and 4 columns.
par(mfrow = c(1, 4))
# The second argument of barplot() is the bar width, so it is named explicitly here.
barplot(Multilinear_regression_salary$Salary, width = Multilinear_regression_salary$Student, col = c('blue', 'green'), main = 'Salary vs Student')
barplot(Multilinear_regression_salary$Salary, width = Multilinear_regression_salary$School_Ranking, col = c('green', 'red'), main = 'Salary vs School ranking')
barplot(Multilinear_regression_salary$Salary, width = Multilinear_regression_salary$GPA, col = c('blue', 'red'), main = 'Salary vs GPA')
barplot(Multilinear_regression_salary$Salary, width = Multilinear_regression_salary$Experience, col = c('red', 'yellow'), main = 'Salary vs Experience')
If we want to see all four graphs on a single screen, we use the par function. We give the number of rows and the number of columns in mfrow, and the graphs are then drawn together in that grid.
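As a variation on the grid layout, the sketch below draws a 2 x 2 grid and then restores the previous layout. Because the second positional argument of barplot is the bar width, scatter plots are used here to show Salary against each variable directly; that substitution, and the synthetic `salary_data`, are assumptions for illustration.

```r
# Synthetic stand-in data (values are illustrative assumptions).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

# par() returns the previous settings, so they can be restored later.
old_par <- par(mfrow = c(2, 2))

predictors <- c("Student", "School_Ranking", "GPA", "Experience")
for (v in predictors) {
  plot(salary_data[[v]], salary_data$Salary,
       xlab = v, ylab = "Salary", main = paste("Salary vs", v))
}

# Go back to one plot per screen.
par(old_par)
```

Saving and restoring the old settings keeps later plots from being squeezed into the same grid.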
cor(Multilinear_regression_salary)
                   Student School_Ranking        GPA Experience      Salary
Student         1.00000000      0.0582101 -0.1262402 0.04395596 -0.00261919
School_Ranking  0.05821010      1.0000000  0.2051312 0.20250931  0.23429048
GPA            -0.12624017      0.2051312  1.0000000 0.65904413  0.73788910
Experience      0.04395596      0.2025093  0.6590441 1.00000000  0.78580114
Salary         -0.00261919      0.2342905  0.7378891 0.78580114  1.00000000
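Now we check the correlation between all the variables to determine the strength of their relationships. Experience (0.79) and GPA (0.74) are the most strongly correlated with Salary, while School_Ranking (0.23) and Student (-0.003) are only weakly correlated.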
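To read the strength of each relationship with Salary without scanning the whole matrix, the Salary column can be pulled out and sorted. This is a sketch on synthetic stand-in data (`salary_data` is an assumption), so the printed numbers will differ from the matrix above.

```r
# Synthetic stand-in data (values are illustrative assumptions).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

# Pearson correlation matrix of all numeric columns.
cor_matrix <- cor(salary_data)

# Keep only the correlations with Salary, drop Salary itself,
# and sort from strongest positive to weakest.
salary_cor <- cor_matrix[, "Salary"]
salary_cor <- salary_cor[names(salary_cor) != "Salary"]
print(sort(salary_cor, decreasing = TRUE))
```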
To plot the correlation coefficients we use the corrplot function from the corrplot library. We store the correlation matrix in a variable and pass that variable to corrplot.
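A minimal sketch of the correlation plot, assuming the corrplot package is installed. The data (`salary_data`) are synthetic stand-ins, and `method = "number"` is only one of the display styles the package offers; the notebook's original call is not shown, so this is not necessarily what was run.

```r
# Synthetic stand-in data (values are illustrative assumptions).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

# Store the correlation matrix in a variable, then plot it.
cor_matrix <- cor(salary_data)

# Guarded so the script still runs when corrplot is not installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_matrix, method = "number")
} else {
  message("Package 'corrplot' is not installed; run install.packages('corrplot').")
}
```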
myreg
Call:
lm(formula = Salary ~ Student + School_Ranking + GPA + Experience,
data = Multilinear_regression_salary)
Coefficients:
(Intercept) Student School_Ranking GPA
52751.870 7.064 9.442 5534.006
Experience
1232.513
After checking the correlations we move on to regression. To perform the regression we call the lm function with a formula and the dataset, and store the fitted model in a variable (here myreg).
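The model-fitting step can be sketched as below, using the same formula as the Call shown above. The data frame `salary_data` is a synthetic stand-in (an assumption), so the coefficients will not match the notebook's output.

```r
# Synthetic stand-in data (values are illustrative assumptions).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

# Response on the left of ~, predictors joined with + on the right.
myreg <- lm(Salary ~ Student + School_Ranking + GPA + Experience,
            data = salary_data)

# Printing the model shows the call and the estimated coefficients.
print(myreg)
```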
summary(myreg)
Call:
lm(formula = Salary ~ Student + School_Ranking + GPA + Experience,
data = Multilinear_regression_salary)
Residuals:
Min 1Q Median 3Q Max
-6359.4 -736.0 306.7 1392.8 4440.1
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    52751.870   4928.281  10.704 1.39e-12 ***
Student            7.064     31.929   0.221 0.826188
School_Ranking     9.442     18.450   0.512 0.612031
GPA             5534.006   1785.345   3.100 0.003811 **
Experience      1232.513    294.651   4.183 0.000183 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2273 on 35 degrees of freedom
Multiple R-squared: 0.7058, Adjusted R-squared: 0.6722
F-statistic: 20.99 on 4 and 35 DF, p-value: 6.703e-09
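The summary of the fitted model gives the p-value of each coefficient, which we use to decide which variables to keep in the regression equation. Here only GPA (p = 0.0038) and Experience (p = 0.00018) are significant at the 0.05 level; Student and School_Ranking are not.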
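Rather than reading the p-values off the printed summary, they can be extracted from the coefficient table. This is a sketch on synthetic stand-in data (`salary_data` is an assumption), and the 0.05 cutoff is the conventional threshold, used here for illustration.

```r
# Synthetic stand-in data (values are illustrative assumptions).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

myreg <- lm(Salary ~ Student + School_Ranking + GPA + Experience,
            data = salary_data)

# coef(summary(...)) is a matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(myreg))
p_values <- coef_table[, "Pr(>|t|)"]

# Names of predictors (intercept excluded) with p-value below 0.05.
is_predictor <- names(p_values) != "(Intercept)"
significant <- names(p_values)[is_predictor & p_values < 0.05]
print(round(p_values, 4))
print(significant)
```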
myreg1
Call:
lm(formula = Salary ~ GPA + Experience, data = Multilinear_regression_salary)
Coefficients:
(Intercept) GPA Experience
53295 5540 1257
After checking the p-values, only two variables remain, GPA and Experience, so we form a new equation with those two variables: Salary = 53295 + 5540 × GPA + 1257 × Experience.
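Refitting with only the two retained variables, and checking that dropping the others costs little, might look like this. The data are synthetic stand-ins (`salary_data` is an assumption), and the anova comparison is an additional check that the original notebook does not show.

```r
# Synthetic stand-in data (values are illustrative assumptions).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

# Full model and reduced model with GPA and Experience only.
myreg  <- lm(Salary ~ Student + School_Ranking + GPA + Experience,
             data = salary_data)
myreg1 <- lm(Salary ~ GPA + Experience, data = salary_data)
print(myreg1)

# Adjusted R-squared of each model, for a quick side-by-side look.
adj_r2 <- c(full    = summary(myreg)$adj.r.squared,
            reduced = summary(myreg1)$adj.r.squared)
print(adj_r2)

# F-test comparing the nested models.
model_cmp <- anova(myreg1, myreg)
print(model_cmp)
```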
my_prediction_result
1
86620.66
Finally, with the fitted regression equation we can predict the salary for any given values of GPA and Experience.
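The prediction step can be sketched with predict. The input values (GPA 3.5, Experience 6) and the data frame `salary_data` are hypothetical; the inputs behind the 86620.66 result above are not shown in the notebook.

```r
# Synthetic stand-in data (values are illustrative assumptions).
set.seed(42)
n <- 40
salary_data <- data.frame(
  Student        = 1:n,
  School_Ranking = sample(15:89, n, replace = TRUE),
  GPA            = round(runif(n, 2.7, 3.9), 2),
  Experience     = sample(2:9, n, replace = TRUE)
)
salary_data$Salary <- 53000 + 5500 * salary_data$GPA +
  1250 * salary_data$Experience + rnorm(n, sd = 2000)

myreg1 <- lm(Salary ~ GPA + Experience, data = salary_data)

# New observation to predict for (hypothetical values); the column
# names must match the predictor names used in the formula.
new_student <- data.frame(GPA = 3.5, Experience = 6)
my_prediction_result <- predict(myreg1, newdata = new_student)
print(my_prediction_result)

# The same number computed by hand from the regression equation:
# intercept + b_GPA * GPA + b_Experience * Experience.
manual_prediction <- sum(coef(myreg1) * c(1, 3.5, 6))
print(manual_prediction)

# A 95% prediction interval conveys the uncertainty of the estimate.
print(predict(myreg1, newdata = new_student, interval = "prediction"))
```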