WE ALL HOPE TO BUILD OR BUY A PROPERTY OF OUR OWN. THIS IS JUST A PUBLICATION ON PREDICTING THE FUTURE VALUE OF OUR PROPERTY.

This is published by C.AKKIL ANAND.

THANKS TO MR. THULSIDASS.

THANKS TO KAGGLE.COM (FOR REFERENCE DATA).

Linear Regression

Introduction

• A data model explicitly describes a relationship between predictor and response variables.

• Linear regression fits a data model that is linear in the model coefficients.

• The most common type of linear regression is a least-squares fit, which can fit both lines and polynomials, among other linear models.

• Before you model the relationship between pairs of quantities, it is a good idea to perform correlation analysis to establish if a linear relationship exists between these quantities.

• Be aware that variables can have nonlinear relationships, which correlation analysis cannot detect.
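The last point can be illustrated with a minimal sketch. The data here are synthetic (an assumption for illustration, not the House_Price data): one response is exactly linear in x, the other is exactly quadratic, and the Pearson correlation only picks up the first.

```r
# Synthetic data, assumed for illustration only (not the House_Price data)
x <- seq(-5, 5, by = 0.5)
y_linear    <- 2 * x + 1   # exactly linear in x
y_quadratic <- x^2         # fully determined by x, but not linearly

# Pearson correlation measures linear association only
cor_linear    <- cor(x, y_linear)
cor_quadratic <- cor(x, y_quadratic)

print(c(linear = cor_linear, quadratic = cor_quadratic))
```

The quadratic case gives a correlation of essentially zero even though y depends entirely on x, which is why a plot of the data is worth checking alongside the correlation coefficient.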

Simple Linear Regression

• Linear regression models the relation between a dependent, or response, variable y and one or more independent, or predictor, variables x1, ..., xn.

• Simple linear regression considers only one independent variable using the relation

            y = β0 + β1x + ϵ,

where β0 is the y-intercept, β1 is the slope (or regression coefficient), and ϵ is the error term.
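As a rough sketch of how the least-squares estimates of β0 and β1 are obtained, the code below computes them from the standard closed-form formulas and compares them with lm(). The data are synthetic, and the variable names and numbers are assumptions chosen only to resemble a living-area and price setting.

```r
# Synthetic data, assumed for illustration only
set.seed(1)
x <- runif(50, 500, 4000)                       # hypothetical living area (sq. ft)
y <- 50000 + 200 * x + rnorm(50, sd = 30000)    # hypothetical price with noise

# Closed-form least-squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

# The same fit using lm()
fit <- lm(y ~ x)

print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both approaches should print the same intercept and slope, up to rounding.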

REFERENCE DATA:

This is the data set that our prediction is based on.

View(House_Price)

SUMMARY OF THE DATA:

• This sample data consists of bedrooms, bathrooms, living area, floors, overall area, and price.

• These are some of the essential data needed to predict the future value of the property.

summary(House_Price)

STRUCTURE OF OUR DATA:

This shows the structure of the data, that is, the type and format of each variable.

It is included just for reference, for research purposes.

str(House_Price)

GRAPHICAL REPRESENTATION OF SAMPLE DATA:

1.BARPLOT OF SAMPLE DATA:

table(House_Price$bedrooms)
barplot(table(House_Price$bedrooms),xlab = 'no. of bedrooms', ylab = 'frequency',main ='MOST PREFERRED NO. OF BEDROOMS',col=c('blue','green','red','orange','yellow'))

This bar plot reveals that the most preferred numbers of bedrooms are 3 and 4.

table(House_Price$sqft_living)
plot(table(House_Price$sqft_living),xlab='living area',ylab = 'FREQUENCY',main = 'MOST PREFERRED LIVING AREA',col=c('light green'))

This plot shows that the most preferred living area is between 1000 and 2500 sq. ft.

Box Plot

• A box plot is used to spot any outlier observations in a variable.

• Having outliers in your predictor can drastically affect the predictions as they can affect the direction/slope of the line of best fit.

boxplot(House_Price$price,main="Price Box Plot",col = "red")

• Using BoxPlot To Check For Outliers.
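The points a box plot draws as outliers can also be listed numerically. The sketch below uses boxplot.stats(), which applies the same 1.5 times box-length rule as boxplot(). The price vector is a made-up example, not taken from House_Price.

```r
# Hypothetical prices, assumed for illustration only
price <- c(220000, 250000, 310000, 340000, 360000,
           410000, 450000, 480000, 2500000)

# boxplot.stats() returns the five-number summary and the outlying values
stats <- boxplot.stats(price)
outliers <- stats$out

# Keep only the observations that are not flagged as outliers
price_clean <- price[!price %in% outliers]

print(outliers)
print(price_clean)
```

Whether flagged observations should be removed, capped, or kept is a judgment call that depends on the data; the sketch only shows how to find them.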

Scatter Plot:

• Using Scatter Plot To Visualise The Relationship.

• Scatter plots can help visualise linear relationships between the response and predictor variables.

scatter.smooth(x=House_Price$sqft_living,y=House_Price$price,main="HOUSE PRICE ANALYSIS",col="blue")

• The scatter plot, along with the smoothing line above, suggests a linear and positive relationship between the dependent and independent variables.

CORRELATION OF THE SAMPLE DATA:

This shows which variables have a strong relationship with the price.

These variables are used in the process of prediction, which may be helpful.

cor(House_Price)

CORRELATION GRAPH FOR THE RELATED VALUES:

• Correlation analysis studies the strength of relationship between two continuous variables.

• It involves computing the correlation coefficient between the two variables.

par(mfrow=c(1,1))
library(corrplot)
al=cor(House_Price)
corrplot(al)

As price is the main target of our prediction, we take its strongest correlation, which is the correlation between sq. ft of living area and price.

Another notable correlation is between bathrooms and sq. ft of living area.
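Reading the strongest correlation off a plot can be cross-checked numerically. The sketch below ranks predictors by the absolute value of their correlation with price. It runs on a synthetic data frame named toy (an assumption for illustration; the column names mimic House_Price but the values are generated).

```r
# Synthetic data, assumed for illustration only
set.seed(42)
n <- 200
sqft_living <- runif(n, 500, 4000)
bathrooms   <- round(1 + sqft_living / 1500 + rnorm(n, sd = 0.5))
floors      <- sample(1:3, n, replace = TRUE)
price       <- 50000 + 250 * sqft_living + rnorm(n, sd = 60000)
toy <- data.frame(price, sqft_living, bathrooms, floors)

# Correlation matrix, then the row for price without its self-correlation
cor_matrix <- cor(toy)
price_cor  <- cor_matrix["price", colnames(cor_matrix) != "price"]

# Rank predictors by the strength (absolute value) of their correlation with price
ranked <- sort(abs(price_cor), decreasing = TRUE)
print(ranked)
```

With the real data, the same two steps applied to cor(House_Price) would give the ranking for the actual variables.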

REGRESSION FOR THE SAMPLE DATA:

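• p-values are very important because they indicate whether the fitted relationships are statistically significant.

• We can consider a linear model to be statistically significant only when both the p-value of the overall model (from the F-statistic) and the p-values of the individual predictor variables are less than the pre-determined statistical significance level of 0.05.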

myre=lm(price~bedrooms+bathrooms+sqft_living+sqft_lot+floors,data=House_Price)
summary(myre)
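The p-values printed by summary() can also be pulled out and checked against the 0.05 level in code. This is a sketch on synthetic data (the variables y, x1 and x2 are assumptions for illustration): x1 truly affects y, while x2 does not.

```r
# Synthetic data, assumed for illustration only
set.seed(7)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + 2 * x1 + rnorm(n)   # x2 has no real effect on y
toy <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = toy)
s   <- summary(fit)

# p-values of the individual coefficients
coef_p <- coef(s)[, "Pr(>|t|)"]

# p-value of the overall model, computed from the F-statistic
f <- s$fstatistic
model_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

print(coef_p)
print(model_p)
print(coef_p < 0.05)   # TRUE where a coefficient is significant at the 0.05 level
```

The same extraction should work on summary(myre), since it relies only on the standard structure returned by summary() for an lm fit.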

• To compare the efficiency of two different regression models, it is good practice to compare their AIC and to check their prediction errors on a validation sample.

• Besides AIC, other evaluation metrics like mean absolute percentage error (MAPE), Mean Squared Error (MSE) and Mean Absolute Error (MAE) can also be used.

• AIC measures the goodness of fit of the linear regression model while penalising the number of parameters. A lower AIC is better, and it can also be used for model selection.
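The comparison described above can be sketched as follows. The data frame toy, the 240/60 split, and the two candidate models are all assumptions for illustration; MAPE is reported as a fraction rather than a percentage.

```r
# Synthetic data, assumed for illustration only
set.seed(123)
n <- 300
sqft_living <- runif(n, 500, 4000)
bedrooms    <- sample(1:5, n, replace = TRUE)
price       <- 50000 + 250 * sqft_living + rnorm(n, sd = 40000)
toy <- data.frame(price, sqft_living, bedrooms)

# Split into a training sample (240 rows) and a validation sample (60 rows)
train_idx <- sample(n, size = 240)
train <- toy[train_idx, ]
valid <- toy[-train_idx, ]

# Two candidate models fitted on the training sample
model_a <- lm(price ~ sqft_living, data = train)
model_b <- lm(price ~ bedrooms, data = train)

# AIC of each model (lower is better)
aic <- c(a = AIC(model_a), b = AIC(model_b))

# Error metrics computed on the validation sample
errors <- function(model, newdata) {
  pred   <- predict(model, newdata)
  actual <- newdata$price
  c(MAE  = mean(abs(actual - pred)),
    MSE  = mean((actual - pred)^2),
    MAPE = mean(abs((actual - pred) / actual)))
}
metrics <- rbind(a = errors(model_a, valid), b = errors(model_b, valid))

print(aic)
print(metrics)
```

Note that AIC values are only comparable between models fitted to the same response on the same rows, which is the case here.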

STATISTIC CRITERION

• R-Squared - Higher the better

• Adj R-Squared - Higher the better

• F-Statistic - Higher the better

• Std. Error - Closer to zero the better
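Each of these criteria can be read from the object returned by summary(). The sketch below uses R's built-in mtcars data as a stand-in (an assumption, so that the code runs without the House_Price file); the same fields exist on summary(myre).

```r
# mtcars is a data set that ships with R, used here only as a stand-in
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

criteria <- list(
  r_squared          = s$r.squared,                   # higher is better
  adj_r_squared      = s$adj.r.squared,               # higher is better
  f_statistic        = unname(s$fstatistic["value"]), # higher is better
  coef_std_error     = coef(s)[, "Std. Error"],       # closer to zero is better
  residual_std_error = s$sigma                        # closer to zero is better
)

print(criteria)
```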

CALCULATING THE COEFFICIENTS FOR THE MANUAL CALCULATION

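cor(House_Price$sqft_living,House_Price$price)
# Standard diagnostic plots of the fitted model, arranged in a 2 x 2 grid
par(mfrow=c(2,2))
plot(myre,col="light blue")
par(mfrow=c(1,1))
# Fitted (predicted) price for every row of the data
myre$fitted.values
# Actual price against fitted price; points on the line y = x are predicted exactly
plot(House_Price$price,myre$fitted.values,col="yellow")
abline(a=0,b=1)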

COEFFICIENTS:

coef(myre)

From these coefficients we calculate the future value by substituting them into the linear regression equation.
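The substitution can be sketched on a small example. The data frame toy and the input values in new_house are assumptions for illustration; the point is only that multiplying each coefficient by its input and adding the intercept reproduces what predict() returns.

```r
# Synthetic data, assumed for illustration only
set.seed(10)
n <- 100
bathrooms <- sample(1:4, n, replace = TRUE)
floors    <- sample(1:3, n, replace = TRUE)
price     <- 100000 + 80000 * bathrooms + 30000 * floors + rnorm(n, sd = 20000)
toy <- data.frame(price, bathrooms, floors)

fit <- lm(price ~ bathrooms + floors, data = toy)
b   <- coef(fit)

# Hypothetical house to predict
new_house <- data.frame(bathrooms = 2, floors = 2)

# Manual calculation: intercept + coefficient * value for each predictor
manual <- unname(b["(Intercept)"] +
                 b["bathrooms"] * new_house$bathrooms +
                 b["floors"]    * new_house$floors)

# The same prediction using predict()
automatic <- unname(predict(fit, new_house))

print(c(manual = manual, predict = automatic))
```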

Predictions

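# predict() needs a value for every predictor in the model, and it only uses new data
# when the model was fitted on column names with data=House_Price.
# bedrooms, sqft_living and sqft_lot below are illustrative values, not from the original.
my_predict=data.frame(bedrooms=3,bathrooms=2,sqft_living=2000,sqft_lot=5000,floors=4)
my_predict_result=predict(myre,my_predict)
my_predict_result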
