- Correlation analysis measures the strength and direction of the linear relationship between two variables.
- It is often used to determine whether a change in one variable is associated with a change in another variable.
Conner Zhao
Suppose we want to analyze the correlation between the height and weight of individuals.
## [1] 0.66713
The correlation coefficient is calculated as follows:
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
Where:

- \(x_i, y_i\) are the individual sample points.
- \(\bar{x}, \bar{y}\) are the means of the respective variables.
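As a rough illustration of the formula, the sketch below computes \(r\) term by term and compares it with base R's `cor()`, which uses Pearson's method by default. The `height` and `weight` vectors here are hypothetical values invented for the example; they are not the data behind the coefficient reported above.

```r
# Hypothetical sample data, for illustration only (not the original data set)
height <- c(150, 160, 165, 172, 180, 185)  # cm
weight <- c(52, 58, 63, 70, 74, 85)        # kg

# Deviations of each observation from its variable's mean
dx <- height - mean(height)
dy <- weight - mean(weight)

# Pearson's r: sum of cross-products over the square root of the
# product of the two sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent (method = "pearson" is the default)
r_builtin <- cor(height, weight)

# The two values should agree up to floating-point tolerance
all.equal(r_manual, r_builtin)
```

Writing the computation out by hand is only meant to make the formula concrete; in practice `cor()` is the more convenient choice.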
Here is the code used to create the scatter plot with a regression line:
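```r
library(ggplot2)

# Assumes `df` is a data frame with numeric columns `height` (cm) and `weight` (kg)
p2 <- ggplot(df, aes(x = height, y = weight)) +
  geom_point(color = "blue") +                             # raw observations
  geom_smooth(method = "lm", se = FALSE, color = "red") +  # fitted line, no confidence band
  theme_minimal() +
  labs(title = "Height vs. Weight with Regression Line",
       x = "Height (cm)", y = "Weight (kg)")

p2  # print the plot
```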
If \(r > 0\), there is a positive relationship between height and weight.
If \(r < 0\), there is a negative relationship between height and weight.
In our example, the correlation coefficient is 0.6671, indicating a positive relationship.
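To make the sign rule concrete, here is a minimal sketch on made-up, perfectly linear data. The vectors `x`, `y_up`, and `y_down` are hypothetical and unrelated to the height and weight data; they are chosen only so that the direction of each relationship is obvious.

```r
# Hypothetical data: one increasing and one decreasing linear relationship
x      <- 1:10
y_up   <- 2 * x + 1    # rises as x rises
y_down <- -3 * x + 5   # falls as x rises

r_up   <- cor(x, y_up)    # positive coefficient expected
r_down <- cor(x, y_down)  # negative coefficient expected

# sign() reports the direction: 1 for positive, -1 for negative
sign(r_up)
sign(r_down)
```

Because both relationships are exactly linear, the coefficients sit at the extremes of the range; real data such as height and weight typically fall somewhere in between.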
Correlation analysis helps determine the strength and direction of the linear relationship between two variables.
In our example, there is a positive correlation between height and weight.
This means that as height increases, weight also tends to increase.