Linear Regression
Machine Learning With Python: Linear Regression With Three Variables
Problem Statement
Given the data below, build a machine learning model that can predict home prices based on area (in square feet), number of bedrooms, and age.
| area | bedrooms | age | price |
|---|---|---|---|
| 2600 | 3 | 20 | 550000 |
| 3000 | 4 | 15 | 565000 |
| 3200 |  | 18 | 610000 |
| 3600 | 3 | 30 | 595000 |
| 4000 | 5 | 8 | 760000 |
| 4100 | 6 | 8 | 810000 |
Mean Squared Error (MSE)
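Many lines could be drawn through these data points. We choose the one for which the total squared error between the line and the actual prices is smallest, which is the same as minimizing the mean squared error (MSE).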
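The line-selection idea can be checked numerically. This is a minimal sketch that assumes NumPy is available and uses only the area and price columns. MSE is the mean of the squared differences between actual and predicted prices. `np.polyfit(..., 1)` returns the least-squares line, so any other line should score a higher MSE. The nudged slope and intercept below are arbitrary values chosen for illustration.

```python
import numpy as np

# Area and price columns from the table above (neither has missing values).
area = np.array([2600, 3000, 3200, 3600, 4000, 4100], dtype=float)
price = np.array([550000, 565000, 610000, 595000, 760000, 810000], dtype=float)

def mse(m: float, b: float) -> float:
    """Mean squared error of the line price = m * area + b."""
    errors = price - (m * area + b)
    return float(np.mean(errors ** 2))

# Least-squares fit: with deg=1, np.polyfit returns [slope, intercept].
m_best, b_best = np.polyfit(area, price, 1)

# An arbitrary alternative line, nudged away from the best fit.
m_other, b_other = m_best + 10, b_best - 20000

print(f"best line:  MSE = {mse(m_best, b_best):,.0f}")
print(f"other line: MSE = {mse(m_other, b_other):,.0f}")
```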
You might remember the linear equation from high school math class. Home prices can be represented by the following equation:
home price = m * (area) + b
Intercept and Slope
The generic form of the same equation is y = m * x + b, where m is the slope and b is the intercept.
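With three input variables, the single-variable equation gains one slope per feature. The form below is a sketch of the model this tutorial fits. It assumes that m_1, m_2 and m_3 follow the feature order area, bedrooms, age, which matches the order of the three printed coefficients later on.

```latex
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
```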
Download CSV file
[Google Sheets link](https://docs.google.com/spreadsheets/d/1C0FC0UnnH8WXzb85RTAaDKYaoxuZ1cWdkc8n2DJ3CDA/edit?usp=sharing)
Load the data
## area bedrooms age price
## 0 2600 3.0 20 550000
## 1 3000 4.0 15 565000
## 2 3200 NaN 18 610000
## 3 3600 3.0 30 595000
## 4 4000 5.0 8 760000
## 5 4100 6.0 8 810000
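The loading code itself is not shown. A minimal sketch with pandas would read the downloaded CSV; the file name `homeprices.csv` is a hypothetical placeholder. To stay self-contained, the block builds the same rows inline, with the missing bedrooms value stored as `NaN`. That is why pandas prints the bedrooms column as floats (3.0, 4.0, ...).

```python
import numpy as np
import pandas as pd

# The tutorial reads a CSV downloaded from the link above; the file name here
# is a hypothetical placeholder:
# df = pd.read_csv("homeprices.csv")

# Self-contained stand-in: the same rows as the table, with the missing
# bedrooms value as NaN.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, np.nan, 3, 5, 6],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})
print(df)
```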
Draw a chart of area against price
![Scatter plot of area vs. price](scatterplot.png)
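A chart like this could be produced with matplotlib. The following is a sketch, not the tutorial's own plotting code. The marker style, colour and output file name are assumptions, and the non-interactive `Agg` backend is selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

# One point per house: area on the x-axis, price on the y-axis.
fig, ax = plt.subplots()
ax.scatter(df["area"], df["price"], color="red", marker="+")
ax.set_xlabel("area (sq ft)")
ax.set_ylabel("price")
fig.savefig("scatterplot.png")
```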
Handle missing data
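The fill strategy is not shown here. A common choice, assumed in this sketch, is to replace the missing bedrooms value with the column median. The result is assigned back to the column rather than calling `fillna(..., inplace=True)` on `df["bedrooms"]`. The chained in-place form is the pattern that pandas warns will stop working in pandas 3.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, np.nan, 3, 5, 6],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

# median() skips NaN: the median of 3, 3, 4, 5, 6 is 4.0.
median_bedrooms = df["bedrooms"].median()

# Assign the filled column back instead of filling the column in place.
df["bedrooms"] = df["bedrooms"].fillna(median_bedrooms)
print(df)
```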
Features and target
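Splitting the table into inputs and output might look like the sketch below. It assumes the missing bedrooms value has already been filled with the median (4). The features `X` are the three input columns, in the order area, bedrooms, age. The target `y` is the price.

```python
import pandas as pd

df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, 4, 3, 5, 6],  # missing value filled with the median (4)
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

X = df[["area", "bedrooms", "age"]]  # features: one column per input variable
y = df["price"]                      # target: the value to predict
print(X.shape, y.shape)              # (6, 3) (6,)
```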
Split the dataset into training and testing sets
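A sketch of the split using scikit-learn's `train_test_split`. The tutorial's actual `test_size` and `random_state` are not shown. The values below are assumptions, so the resulting coefficients will not necessarily match the ones printed later. With only six rows, a 20% test size holds out two rows and leaves four for training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, 4, 3, 5, 6],  # missing value filled with the median (4)
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})
X = df[["area", "bedrooms", "age"]]
y = df["price"]

# Assumed settings: hold out 20% of rows, fixed seed for a repeatable split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 4 2
```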
Create and train the Linear Regression model
LinearRegression()
Predict on test set
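Predicting on the held-out rows could be sketched as follows. The same assumed split is used, and each actual test price is printed next to the model's estimate. With only two test rows, this comparison is a rough sanity check rather than a reliable accuracy measure.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, 4, 3, 5, 6],  # missing value filled with the median (4)
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})
X = df[["area", "bedrooms", "age"]]
y = df["price"]
# Assumed split settings (the tutorial's own values are not shown).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# X_test is a DataFrame, so predict() sees the same feature names used in fit().
y_pred = model.predict(X_test)
for actual, predicted in zip(y_test, y_pred):
    print(f"actual={float(actual):,.0f}  predicted={float(predicted):,.0f}")
```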
Print the model’s coefficients and intercept
## Coefficients: [ 115.67164179 38432.8358209 -1902.98507463]
## Intercept: 120373.13432834996
Predict price for new data (example: area=3200, bedrooms=3, age=18)
## Predicted Price: 571567.1641791033
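The predicted price is the linear equation evaluated with the printed coefficients and intercept. The hand calculation below reproduces it in plain Python. The coefficients were printed with limited precision, so the result agrees with the model's output only to about the cent.

```python
# Coefficients and intercept copied from the printed output above
# (order: area, bedrooms, age).
coefficients = [115.67164179, 38432.8358209, -1902.98507463]
intercept = 120373.13432834996

new_house = [3200, 3, 18]  # area, bedrooms, age

# price = m1*area + m2*bedrooms + m3*age + b
price = sum(m * x for m, x in zip(coefficients, new_house)) + intercept
print(f"{price:,.2f}")  # 571,567.16
```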
Exercise: predict prices for the data in the spreadsheet below and generate a list of predictions.
[Google Sheets link](https://docs.google.com/spreadsheets/d/1jDsPOTB5co7rcW66AVcRsQYrQRgYNXxC44XI-rSI7s4/edit?usp=sharing)
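One way to approach the exercise is sketched below. For brevity it fits on all six rows rather than a split. The `new_data` rows are hypothetical placeholders, not the contents of the linked sheet; replace them with the real rows, for example after exporting the sheet to CSV and loading it with `pd.read_csv`. Passing a DataFrame with the same column names used during `fit` avoids the "X does not have valid feature names" warning that appears when `predict` receives a plain list or array.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

train = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, 4, 3, 5, 6],  # missing value filled with the median (4)
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})
features = ["area", "bedrooms", "age"]
model = LinearRegression().fit(train[features], train["price"])

# Hypothetical stand-in for the spreadsheet rows (placeholders only).
new_data = pd.DataFrame({
    "area": [3200, 3000],
    "bedrooms": [3, 4],
    "age": [18, 15],
})

# Same column names as in fit(), so no feature-name warning; tolist() gives a plain list.
predictions = model.predict(new_data[features]).tolist()
print(predictions)
```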