class: title-slide

.row[
.col-7[
.title[
# Correlation and Regression
]
.subtitle[
## Correlation and Regression
]
.author[
### Laxmikant Soni <br> [blog](https://laxmikants.github.io) <br> [<i class="fab fa-github"></i>](https://github.com/laxmiaknts) [<i class="fab fa-twitter"></i>](https://twitter.com/laxmikantsoni09)
]
.affiliation[
]
]
.col-5[
.logo[
<img src="figures/rmarkdown.png" width="480" />
]
]
]

---

# Correlation and Regression

.pull-left[

## Correlation

* **Definition**: Correlation measures the strength and direction of the linear relationship between two variables.

* **Value Range**: Ranges from -1 to +1:
  - **+1**: Perfect positive correlation (the variables increase together).
  - **-1**: Perfect negative correlation (one increases as the other decreases).
  - **0**: No linear relationship.

* **Example**: A correlation of 0.8 between height and weight suggests that as height increases, weight tends to increase as well.

]

--

.pull-right[

* **Types of Correlation**:
  - **Positive**: Both variables increase together.
  - **Negative**: One variable increases as the other decreases.
  - **No Correlation**: No consistent relationship between the variables.

]

---

# Correlation and Regression

.pull-left[

## Regression

* **Definition**: Regression identifies the relationship between a dependent variable (outcome) and one or more independent variables (predictors).

* **Types of Regression**:
  - **Simple Linear Regression**: Uses one predictor.
  - **Multiple Regression**: Uses several predictors.

* **Example**: Predicting house price from house size. Here, house size is the independent variable and house price is the dependent variable.

]

--

.pull-right[

* **Formula for Simple Linear Regression**:

<p align = 'center'><img src = 'figures/Linear_model_representation.jpg' width = '400' height = '400'></p>

]

---

# Correlation and Regression

.pull-top[

## Key Differences

- **Correlation**: Quantifies the strength and direction of a relationship but does not imply causation.

- **Regression**: Models the relationship and enables prediction.

* **Example**: If study hours and exam scores are strongly correlated, regression can model that relationship to predict a score from study hours.

]

---

# Statistical Methods for Correlation and Regression

.pull-left[

## Pearson Correlation Coefficient (r)

* **Definition**: Measures the strength and direction of the linear relationship between two continuous variables.

]

--

.pull-right[

* **Formula**:

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \sqrt{\sum (y_i - \bar{y})^2}} $$

  - Values range from -1 (perfect negative correlation) to +1 (perfect positive correlation).

]

---

# Statistical Methods for Correlation and Regression

.pull-left[

## Simple Linear Regression

* **Definition**: Estimates the relationship between a dependent variable and one independent variable.

]

--

.pull-right[

* **Formula**:

$$ y = \theta_0 + \theta_1 x $$

  - The parameters are chosen to minimize a cost function (typically the sum of squared errors), fitting a straight line to the data.

]

---

# Statistical Methods for Correlation and Regression

.pull-left[

## Multiple Linear Regression

* **Definition**: Extends simple linear regression to include multiple predictors.

]

--

.pull-right[

* **Formula**:

$$ y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n $$

  - Finds the best-fit hyperplane in a multi-dimensional space.
  - A short R sketch of these methods follows on the next slide.

]
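---

# Correlation and Regression in R

A minimal sketch of the previous slides in base R, assuming the built-in `mtcars` dataset purely for illustration: `cor()` computes the Pearson correlation coefficient, and `lm()` fits simple and multiple linear regressions.

```r
# Minimal sketch, assuming the built-in mtcars dataset for illustration
data(mtcars)

# Pearson correlation between car weight and fuel efficiency
r <- cor(mtcars$wt, mtcars$mpg, method = "pearson")
print(r)  # roughly -0.87: a strong negative linear relationship

# Simple linear regression: mpg predicted from weight alone
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: add horsepower as a second predictor
multiple_fit <- lm(mpg ~ wt + hp, data = mtcars)

summary(simple_fit)    # intercept and slope (theta_0, theta_1)
summary(multiple_fit)  # one coefficient per predictor
```

`summary()` reports the fitted coefficients, which play the role of the theta parameters in the formulas on the surrounding slides.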
---

# Statistical Methods for Correlation and Regression

.pull-left[

## Polynomial Regression

* **Definition**: Models non-linear relationships by including polynomial terms.

]

--

.pull-right[

* **Example Formula**:

$$ y = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_n x^n $$

  - This approach fits curves to the data rather than only straight lines.

]

---

# Statistical Methods for Correlation and Regression

.pull-left[

## Logistic Regression

* **Definition**: Used for binary outcomes (classification problems).

]

--

.pull-right[

* **Formula**:

$$ p(y=1) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n)}} $$

  - Predicts the probability of a class or event.
  - A short R sketch of this model appears in the backup slide at the end of the deck.

]

---

# Statistical Methods for Correlation and Regression

.pull-left[

## Choosing the Right Method

- **Correlation**: Use Pearson for linear relationships; use Spearman or Kendall for non-linear (monotonic) relationships or ordinal data.

]

--

.pull-right[

- **Regression**: Use simple or multiple linear regression for linear relationships, polynomial regression for curves, and logistic regression for classification tasks.

]

---

# Choosing Between Correlation and Regression

.pull-left[

## Nature of Relationship

* Use **correlation** to assess the strength and direction of a linear relationship between two variables.

* Use **regression** to predict the value of one variable from another.

]

--

.pull-right[

## Variables

* Correlation examines the relationship between two variables without assuming causation.

* Regression designates a dependent (outcome) variable and one or more independent (predictor) variables.

]

---

# Choosing Between Correlation and Regression

.pull-left[

## Data Types

* Correlation is suitable for **numerical data**.

* Regression can handle both **numerical and categorical** independent variables.

]

--

.pull-right[

## Analysis Goal

* Choose correlation for **exploratory analysis** to understand relationships.

* Choose regression for **modeling and predicting** outcomes based on one or more predictors.

]

---

# Examples of Correlation and Regression

.pull-left[

## Suitable for Correlation

* **Example 1**: Examining the relationship between hours studied and test scores to see whether more study time is associated with higher scores.

* **Example 2**: Analyzing the correlation between temperature and ice cream sales to determine whether higher temperatures are associated with increased sales.

]

--

.pull-right[

## Suitable for Regression

* **Example 1**: Predicting house prices from features such as size, location, and number of bedrooms, where house price is the dependent variable.

* **Example 2**: Estimating a student's future GPA from their high school GPA and extracurricular activities, using multiple independent variables.

]

---

class: inverse, center, middle

# Thanks
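---

# Backup: Polynomial and Logistic Regression in R

A minimal sketch in base R, again assuming the built-in `mtcars` dataset purely for illustration (`am` is a 0/1 transmission indicator): `poly()` adds polynomial terms to `lm()`, and `glm()` with `family = binomial` fits the logistic model shown earlier.

```r
# Minimal sketch, assuming the built-in mtcars dataset for illustration
data(mtcars)

# Polynomial regression: mpg modeled as a quadratic function of weight
poly_fit <- lm(mpg ~ poly(wt, 2), data = mtcars)
summary(poly_fit)

# Logistic regression: probability of a manual transmission (am = 1)
# as a function of weight and horsepower
logit_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(logit_fit)

# Predicted probabilities p(y = 1) for the observed cars
head(predict(logit_fit, type = "response"))
```

`predict(..., type = "response")` returns fitted probabilities rather than log-odds, matching the logistic formula on the earlier slide.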