.row[
.col-7[
.title[
# Linear Regression
]
.subtitle[
## Linear regression with multiple features
]
.author[
### Laxmikant Soni <br> [Web-Site](https://laxmikants.github.io) <br> [<i class="fab fa-github"></i>](https://github.com/laxmiaknts) [<i class="fab fa-twitter"></i>](https://twitter.com/laxmikantsoni09)
]
.affiliation[
]
]
.col-5[
.logo[
<img src="figures/rmarkdown.png" width="480" />
]
]
]

---

class: very-large-body

# Multiple features

.pull-top[
Linear regression with multiple input variables is also known as "multiple linear regression". (Machine learning courses often call it "multivariate linear regression", although in statistics that name usually refers to predicting several output variables at once.) We now introduce notation for equations with any number of input variables.

`\(\begin{align*}x_j^{(i)} &= \text{value of feature } j \text{ in the }i^{th}\text{ training example} \newline x^{(i)}& = \text{the column vector of all the feature inputs of the }i^{th}\text{ training example} \newline m &= \text{the number of training examples} \newline n &= \left| x^{(i)} \right| ; \text{(the number of features)} \end{align*}\)`
]

---

class: large-body

# Hypothesis function

.pull-top[
`\(h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n\)`

To build intuition for this function, think of `\(\theta_0\)` as the base price of a house, `\(\theta_1\)` as the price per square meter, `\(\theta_2\)` as the price per floor, and so on. Then `\(x_1\)` is the number of square meters in the house, `\(x_2\)` the number of floors, etc.

Using the definition of matrix multiplication, and setting `\(x_0 = 1\)`, the multivariate hypothesis function can be written concisely as:

`\(h_\theta (x) = \left [ \matrix { \theta_0 & \theta_1 & \cdots & \theta_n } \right ] \left [ \matrix { x_0 \\ x_1 \\ \vdots \\ x_n} \right ] = \theta^T x\)`
]

---

class: large-body

# Hypothesis function

.pull-top[
This is a vectorization of the hypothesis function for a single training example; see the lessons on vectorization to learn more.

`\(x_0^{(i)} = 1 \ \text{for} \ i \in \{1, \dots, m\}\)`

`\(h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n\)`

[Note: So that we can do matrix operations with `\(\theta\)` and `\(x\)`, we set `\(x_0^{(i)} = 1\)` for all values of `\(i\)`. This makes the two vectors `\(\theta\)` and `\(x^{(i)}\)` match element-wise (that is, have the same number of elements: `\(n+1\)`).]

The training examples are stored in `\(X\)` row-wise. For example, with one feature and three training examples:

`\(X = \left[\matrix{x_0^{(1)} & x_1^{(1)} \\ x_0^{(2)} & x_1^{(2)} \\ x_0^{(3)} & x_1^{(3)}} \right] , \quad \theta = \left[\matrix {\theta_0 \\ \theta_1}\right]\)`

You can then compute the hypothesis for all training examples at once, as a column vector of size `\((m \times 1)\)`, with:

`\(h_\theta (X) = X\theta\)`
]

---

class: large-body

# Cost function

.pull-top[
For the parameter vector `\(\theta\)`, the cost function is

`\(J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta (x^{(i)}) - y^{(i)}\right)^2\)`

The vectorized version is:

`\(J(\theta) = \frac{1}{2m} (X \theta - \vec{y})^T (X \theta - \vec{y})\)`

where `\(\vec{y}\)` denotes the vector of all `\(y\)` values.
]

---

class: large-body

# Gradient Descent for Multiple Variables

.pull-top[
The gradient descent update has the same form as in the single-feature case; we just repeat it for each parameter:

repeat until convergence: {

`\(\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_0^{(i)}\)`

`\(\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_1^{(i)}\)`

`\(\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_2^{(i)}\)`

`\(\cdots\)`

}

In other words:

repeat until convergence: {

`\(\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i = 1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \quad \text{for } j := 0, \dots, n\)`

}

with all `\(\theta_j\)` updated simultaneously.
]

---

class: large-body

# Feature normalization

.pull-top[
Gradient descent tends to converge faster when all input variables are on roughly the same scale. Mean normalization involves subtracting the average value of an input variable from each of its values, so that the new average of that input variable is zero.
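
A minimal NumPy sketch of mean normalization, using the six `area` values from the housing data loaded later in this deck (the variable names are illustrative): subtracting the column average shifts the feature so that its new average is zero.

```python
import numpy as np

# Illustrative feature column: the six `area` values from homeprices.csv
area = np.array([2600, 3000, 3200, 3600, 4000, 4100], dtype=float)

# Mean normalization: subtract the feature's average from every value
area_centered = area - area.mean()

print(area.mean())           # original average of the feature
print(area_centered.mean())  # new average: zero, up to floating-point rounding
```
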
Feature scaling, a related technique, divides the values of an input variable by its range (the maximum value minus the minimum value), resulting in a new range of 1. To implement both techniques together, adjust your input values with this formula:

`\(x_i := \frac{x_i - \mu_i}{s_i}\)`

where `\(\mu_i\)` is the average of all the values for feature `\(i\)` and `\(s_i\)` is either the range of values (max - min) or the standard deviation.
]

---

class: large-body

# Features and polynomial regression

.pull-top[
We can improve our features and the form of our hypothesis function in a couple of different ways.

Our hypothesis function need not be linear (a straight line) if that does not fit the data well. We can change its behavior or curve by making it a quadratic, cubic or square root function (or any other form). For example, a cubic function of `\(x_1\)`:

`\(h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3\)`

To make it a square root function, we could use:

`\(h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 \sqrt{x_1}\)`

Features created this way can have very different ranges (if `\(x_1\)` ranges from 1 to 1000, then `\(x_1^2\)` ranges from 1 to `\(10^6\)`), so feature scaling becomes especially important.
]

---

class: large-body

# Normal Equation

.pull-top[
The "Normal Equation" is a method of finding the optimal `\(\theta\)` analytically, without iteration:

`\(\theta = (X^T X)^{-1} X^T y\)`

There is no need to do feature scaling with the normal equation. The trade-off is that inverting `\(X^T X\)` costs roughly `\(O(n^3)\)`, so gradient descent is usually preferred when the number of features `\(n\)` is very large.
]

---

class: large-body

# Machine Learning With Python: Linear Regression With Three Variables

.pull-left[
### Import Libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
```
]

--

.pull-right[
- NumPy is a library for numerical computation in Python, providing large multi-dimensional arrays and matrices along with a wide range of mathematical functions.
- Pandas is a data manipulation library built around two data structures, Series (1D) and DataFrame (2D). It provides functions for reading, processing, and analyzing structured, usually tabular, data.
- Matplotlib is a plotting library for creating static, animated, and interactive visualizations in Python. Its `pyplot` module provides a MATLAB-like interface for creating various types of plots (e.g., line plots, bar charts, histograms).
- `LinearRegression` is scikit-learn's ordinary least squares estimator: it models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to the observed data, which can then be used for prediction.
- `train_test_split` splits the data into random training and test subsets, and `mean_squared_error` measures the average squared difference between true and predicted values.
]

<hr>

---

class: large-body

# Machine Learning With Python: Linear Regression With Three Variables

.pull-left[
### Download CSV File
]

--

.pull-right[
[CSV File](https://docs.google.com/spreadsheets/d/1C0FC0UnnH8WXzb85RTAaDKYaoxuZ1cWdkc8n2DJ3CDA/edit?usp=sharing)
]

---

class: large-body

# Machine Learning With Python: Linear Regression With Three Variables

.pull-left[
### Load CSV File

```python
import pandas as pd

df = pd.read_csv('homeprices.csv')
df
```

```
##    area  bedrooms  age   price
## 0  2600       3.0   20  550000
## 1  3000       4.0   15  565000
## 2  3200       NaN   18  610000
## 3  3600       3.0   30  595000
## 4  4000       5.0    8  760000
## 5  4100       6.0    8  810000
```
]

--

.pull-right[
- **`import pandas as pd`**: Imports the Pandas library, which is used for data manipulation and analysis, under the alias `pd`.
- **`df = pd.read_csv('homeprices.csv')`**: Reads the CSV file **`homeprices.csv`** into a Pandas **DataFrame** called `df`. This relative path assumes the file is in the current working directory (usually the folder of your script or notebook); otherwise pass the full path, e.g. `'~/homeprices.csv'` for your home directory.
- **`df`**: Displays the contents of the DataFrame `df`, showing the dataset loaded from the CSV file.
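]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Alternative: build the data inline

If the spreadsheet link is unavailable, one option (a convenience sketch, not part of the original workflow) is to rebuild the same six rows directly with pandas. The values are copied from the `df` output on the previous slide, and `np.nan` stands in for the missing `bedrooms` entry.

```python
import numpy as np
import pandas as pd

# Same columns and values as the homeprices.csv output shown earlier
df = pd.DataFrame({
    'area': [2600, 3000, 3200, 3600, 4000, 4100],
    'bedrooms': [3.0, 4.0, np.nan, 3.0, 5.0, 6.0],
    'age': [20, 15, 18, 30, 8, 8],
    'price': [550000, 565000, 610000, 595000, 760000, 810000],
})

print(df)
```

The remaining steps work the same way on this `df`.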
]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Handle missing data

Row 2 has no `bedrooms` value (`NaN`), so fill it with the median number of bedrooms (4.0 for this data):

```python
df['bedrooms'] = df['bedrooms'].fillna(df['bedrooms'].median())
```

Assigning the result back to the column is more reliable than calling `fillna(..., inplace=True)` on `df['bedrooms']`, which recent pandas versions flag as chained assignment.
]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Features and target

```python
# Features and target
X = df[['area', 'bedrooms', 'age']]  # Independent variables
y = df['price']                      # Dependent variable (target)
```
]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Split the dataset into training and testing sets

```python
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

With only six rows, this leaves 4 rows for training and 2 for testing. That is barely enough data to determine four parameters (three coefficients plus the intercept), so treat the results below as an illustration of the workflow rather than a reliable model.
]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Create and train the Linear Regression model

```python
model = LinearRegression()
model.fit(X_train, y_train)
```

```
## LinearRegression()
```
]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Predict on test set

```python
y_pred = model.predict(X_test)
```
]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Evaluate the model (optional: print Mean Squared Error)

```python
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
```

```
## Mean Squared Error: 1713617314.5467577
```

The MSE is measured in squared price units; its square root (roughly 41,400) is on the same scale as the prices.
]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Print the model's coefficients and intercept

```python
print(f"Coefficients: {model.coef_}")
```

```
## Coefficients: [ 115.67164179 38432.8358209 -1902.98507463]
```

```python
print(f"Intercept: {model.intercept_}")
```

```
## Intercept: 120373.13432834996
```

The coefficients follow the column order of `X` (`area`, `bedrooms`, `age`), i.e. `\(\theta_1, \theta_2, \theta_3\)`, and the intercept is `\(\theta_0\)`.
]

---

# Machine Learning With Python: Linear Regression With Three Variables

.pull-top[
## Predict price for new data (example: area=3200, bedrooms=3, age=18)

```python
# Pass a DataFrame with the training column names; a plain NumPy array
# makes scikit-learn warn that X does not have valid feature names
new_data = pd.DataFrame([[3200, 3, 18]], columns=['area', 'bedrooms', 'age'])
predicted_price = model.predict(new_data)
```

```python
print(f"Predicted Price: {predicted_price[0]}")
```

```
## Predicted Price: 571567.1641791033
```
]

---

class: inverse, center, middle

# Thanks

---