In today’s data-driven world, knowing how to uncover hidden relationships in your dataset is key. In this post, I’ll walk you through real code that shows how to use the cor() function in R. Whether you’re a student or a researcher in Europe, these easy-to-follow examples will help you calculate correlation coefficients, build correlation matrices, visualize them, and test their significance. The examples use the classic mtcars dataset to keep learning practical and fun. Read on for clear code examples, helpful explanations, and tips to improve your data analysis skills.
cor Function in R | Calculate Correlation Coefficients in R
The simplest way to start is by calculating the correlation between two variables. For example, load the built-in mtcars dataset and run cor() on two of its columns:
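A minimal version of that call, assuming the default Pearson method, could look like this (the variable name r_mpg_hp is only for illustration):

```r
# mtcars ships with base R (datasets package), so nothing needs installing
data(mtcars)

# Pearson correlation between miles per gallon and horsepower
r_mpg_hp <- cor(mtcars$mpg, mtcars$hp)
r_mpg_hp
```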
## [1] -0.7761684
This code computes the Pearson correlation coefficient between miles per gallon (mpg) and horsepower (hp). A value close to 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship. Here, -0.78 means that cars with more horsepower tend to get fewer miles per gallon. A single coefficient like this is a quick first check of whether two variables move together, and it helps you decide which relationships deserve further analysis.
To compute correlations for all numeric columns in your dataset at once, the dplyr package is very useful. Select the numeric columns and pass them to cor():
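One way to write this, as a sketch that assumes dplyr 1.0.0 or later for the where() helper (the name cor_matrix is illustrative):

```r
library(dplyr)

data(mtcars)

# Keep only numeric columns, then compute all pairwise correlations
cor_matrix <- mtcars %>%
  select(where(is.numeric)) %>%
  cor()

cor_matrix
```

Every column of mtcars is already numeric, so the select() step changes nothing here. It matters once your own data also contains character or factor columns, which cor() would reject.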
## mpg cyl disp hp drat wt
## mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.68117191 -0.8676594
## cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.69993811 0.7824958
## disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.71021393 0.8879799
## hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.44875912 0.6587479
## drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.00000000 -0.7124406
## wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065 1.0000000
## qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476 -0.1747159
## vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846 -0.5549157
## am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113 -0.6924953
## gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013 -0.5832870
## carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980 0.4276059
## qsec vs am gear carb
## mpg 0.41868403 0.6640389 0.59983243 0.4802848 -0.55092507
## cyl -0.59124207 -0.8108118 -0.52260705 -0.4926866 0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692 0.39497686
## hp -0.70822339 -0.7230967 -0.24320426 -0.1257043 0.74981247
## drat 0.09120476 0.4402785 0.71271113 0.6996101 -0.09078980
## wt -0.17471588 -0.5549157 -0.69249526 -0.5832870 0.42760594
## qsec 1.00000000 0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs 0.74453544 1.0000000 0.16834512 0.2060233 -0.56960714
## am -0.22986086 0.1683451 1.00000000 0.7940588 0.05753435
## gear -0.21268223 0.2060233 0.79405876 1.0000000 0.27407284
## carb -0.65624923 -0.5696071 0.05753435 0.2740728 1.00000000
The result is a full correlation matrix. It displays the correlation coefficient for every pair of variables, so you can spot relationships in one view. The matrix is symmetric, and the diagonal is always 1 because every variable correlates perfectly with itself. This approach is especially handy for datasets with many variables, since a single call replaces dozens of pairwise calculations. It is a simple method that even beginners can use to get a quick snapshot of their data.
For deeper insights, the Hmisc package is a powerful tool. The following code snippet installs and loads Hmisc, then defines a custom function to format the correlation matrix:
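A sketch of such a function is shown here. It relies on Hmisc's rcorr(), which returns the coefficients (r), the sample sizes (n), and the p-values (P). The helper name cor_with_stars, the star thresholds (0.01 and 0.05), and the exact cell layout are assumptions made for illustration, so the printed table may differ in detail from the output that follows.

```r
# install.packages("Hmisc")  # uncomment if the package is not installed yet
library(Hmisc)

data(mtcars)

# Hypothetical helper: correlation table with significance stars
cor_with_stars <- function(df, digits = 3) {
  res <- rcorr(as.matrix(df), type = "pearson")  # list with r, n and P
  r <- res$r
  p <- res$P                                     # diagonal of P is NA

  # Assumed thresholds: ** for p < 0.01, * for p < 0.05, blank otherwise
  stars <- ifelse(is.na(p), "  ",
           ifelse(p < 0.01, "**",
           ifelse(p < 0.05, "* ", "  ")))

  # Combine rounded coefficient, stars and a column separator per cell
  cells <- paste0(formatC(r, format = "f", digits = digits), stars, "|")
  out <- matrix(cells, nrow = nrow(r), dimnames = dimnames(r))
  as.data.frame(out)
}

star_table <- cor_with_stars(mtcars)
star_table
```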
## mpg| cyl| disp| hp| drat| wt| qsec|
## mpg 1.000 | -0.852**| -0.848**| -0.776**| 0.681**| -0.868**| 0.419* |
## cyl -0.852**| 1.000 | 0.902**| 0.832**| -0.700**| 0.782**| -0.591**|
## disp -0.848**| 0.902**| 1.000 | 0.791**| -0.710**| 0.888**| -0.434* |
## hp -0.776**| 0.832**| 0.791**| 1.000 | -0.449**| 0.659**| -0.708**|
## drat 0.681**| -0.700**| -0.710**| -0.449**| 1.000 | -0.712**| 0.091 |
## wt -0.868**| 0.782**| 0.888**| 0.659**| -0.712**| 1.000 | -0.175 |
## qsec 0.419* | -0.591**| -0.434* | -0.708**| 0.091 | -0.175 | 1.000 |
## vs 0.664**| -0.811**| -0.710**| -0.723**| 0.440* | -0.555**| 0.745**|
## am 0.600**| -0.523**| -0.591**| -0.243 | 0.713**| -0.692**| -0.230 |
## gear 0.480**| -0.493**| -0.556**| -0.126 | 0.700**| -0.583**| -0.213 |
## carb -0.551**| 0.527**| 0.395* | 0.750**| -0.091 | 0.428* | -0.656**|
## vs| am| gear| carb|
## mpg 0.664**| 0.600**| 0.480**| -0.551**|
## cyl -0.811**| -0.523**| -0.493**| 0.527**|
## disp -0.710**| -0.591**| -0.556**| 0.395* |
## hp -0.723**| -0.243 | -0.126 | 0.750**|
## drat 0.440* | 0.713**| 0.700**| -0.091 |
## wt -0.555**| -0.692**| -0.583**| 0.428* |
## qsec 0.745**| -0.230 | -0.213 | -0.656**|
## vs 1.000 | 0.168 | 0.206 | -0.570**|
## am 0.168 | 1.000 | 0.794**| 0.058 |
## gear 0.206 | 0.794**| 1.000 | 0.274 |
## carb -0.570**| 0.058 | 0.274 | 1.000 |
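This function converts your data into a matrix, computes both the correlation coefficients and their p-values, and then adds significance markers (stars) based on the p-values. In the output above, two stars mark the most significant correlations, one star marks weaker but still significant ones, and values without a star are not statistically significant. This formatted output makes it easier to read the strength and the significance of each relationship at a glance. It’s a good example of how custom code can tailor an analysis to your needs.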
Real-world data often contains missing values. To manage this, set the “use” argument of the cor() function:
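The sketch below first shows why the argument matters, using a copy of mtcars with one value deliberately removed (cars_na is a made-up name for this illustration), and then applies use = "complete.obs" to the full dataset:

```r
data(mtcars)

# Hypothetical illustration: copy mtcars and blank out a single value
cars_na <- mtcars
cars_na$hp[1] <- NA

# Default (use = "everything"): the missing value propagates to the result
r_default <- cor(cars_na$mpg, cars_na$hp)

# Complete cases only: rows containing an NA are dropped before computing
r_complete <- cor(cars_na$mpg, cars_na$hp, use = "complete.obs")

# The same argument applied to the whole dataset
cor_complete <- cor(mtcars, use = "complete.obs")
cor_complete
```

Here r_default is NA while r_complete is a regular number. Another documented option is use = "pairwise.complete.obs", which drops missing values separately for each pair of variables instead of removing whole rows.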
## mpg cyl disp hp drat wt
## mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.68117191 -0.8676594
## cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.69993811 0.7824958
## disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.71021393 0.8879799
## hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.44875912 0.6587479
## drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.00000000 -0.7124406
## wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065 1.0000000
## qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476 -0.1747159
## vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846 -0.5549157
## am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113 -0.6924953
## gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013 -0.5832870
## carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980 0.4276059
## qsec vs am gear carb
## mpg 0.41868403 0.6640389 0.59983243 0.4802848 -0.55092507
## cyl -0.59124207 -0.8108118 -0.52260705 -0.4926866 0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692 0.39497686
## hp -0.70822339 -0.7230967 -0.24320426 -0.1257043 0.74981247
## drat 0.09120476 0.4402785 0.71271113 0.6996101 -0.09078980
## wt -0.17471588 -0.5549157 -0.69249526 -0.5832870 0.42760594
## qsec 1.00000000 0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs 0.74453544 1.0000000 0.16834512 0.2060233 -0.56960714
## am -0.22986086 0.1683451 1.00000000 0.7940588 0.05753435
## gear -0.21268223 0.2060233 0.79405876 1.0000000 0.27407284
## carb -0.65624923 -0.5696071 0.05753435 0.2740728 1.00000000
This restricts the calculation to complete cases, so a single missing value no longer turns the result into NA. Because mtcars has no missing values, the matrix is identical to the previous one. You can also choose the method for computing correlations. By default, Pearson correlation is used, but you can switch to Spearman’s or Kendall’s method for different types of data:
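The three methods can be compared on the same pair of variables through the method argument of cor(). A minimal sketch, with illustrative variable names:

```r
data(mtcars)

# Same pair of variables, three different correlation methods
r_pearson  <- cor(mtcars$mpg, mtcars$hp, method = "pearson")
r_spearman <- cor(mtcars$mpg, mtcars$hp, method = "spearman")
r_kendall  <- cor(mtcars$mpg, mtcars$hp, method = "kendall")

r_pearson
r_spearman
r_kendall
```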
## [1] -0.7761684
## [1] -0.8946646
## [1] -0.7428125
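Each method has its strengths. Pearson measures linear association and suits continuous, roughly normally distributed data. Spearman and Kendall work on ranks, so they capture any monotonic relationship and are less sensitive to outliers, and Kendall is often preferred for small samples or data with many ties. Choose the one that best suits your data’s nature. This flexibility makes the cor() function a versatile tool for many analysis needs.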
Visual tools help transform numbers into clear insights. With the corrplot package, you can turn a correlation matrix into an engaging graphic:
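A minimal sketch of such a plot, assuming the circle method and corrplot’s default color palette:

```r
# install.packages("corrplot")  # uncomment if the package is not installed yet
library(corrplot)

data(mtcars)
cor_matrix <- cor(mtcars)

# One circle per variable pair, sized and colored by the coefficient
corrplot(cor_matrix, method = "circle")
```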
This code creates a colorful chart where each circle’s size and color indicate the strength and direction of the correlation. With corrplot’s default palette, blue indicates a positive correlation and red a negative one, and larger, darker circles mark stronger relationships. Visualization helps you quickly spot trends and compare many variables at once. It simplifies complex data and makes presentations more appealing, which makes it a great tool for sharing findings with colleagues or in academic settings.
Another way to visualize correlations is by creating a heatmap using ggplot2 and reshape2. First, compute and melt the correlation matrix:
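A sketch of this step, relying on reshape2’s melt(), which for a matrix returns the columns Var1, Var2, and value (the name cor_long is illustrative):

```r
library(reshape2)

data(mtcars)
cor_matrix <- cor(mtcars)

# melt() reshapes the 11 x 11 matrix into one row per variable pair
cor_long <- melt(cor_matrix)
head(cor_long)
```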
Then, build the heatmap:
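One possible version of the heatmap is sketched here. The blue-white-red color scale, the rotated axis labels, and the object names are assumptions for illustration. The block recomputes the melted matrix so that it runs on its own.

```r
library(ggplot2)
library(reshape2)

data(mtcars)
cor_long <- melt(cor(mtcars))   # long format with Var1, Var2, value

heatmap_plot <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +                       # one tile per pair
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1),
                       name = "Correlation") +       # diverging scale around 0
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = NULL, y = NULL)

heatmap_plot
```

Fixing the scale limits at -1 and 1 keeps the colors comparable between datasets, because the midpoint always corresponds to zero correlation.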
This code creates a heatmap that uses a color gradient to show correlations. It is a compact way to see which variables are closely related, and it helps you present data in a clear, engaging format.
Testing the significance of your correlations adds trust to your findings. Use the cor.test() function to see if a correlation is statistically significant:
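A minimal call, using the same two variables as before and the default two-sided Pearson test (test_result is an illustrative name):

```r
data(mtcars)

# Test whether the mpg-hp correlation differs from zero
test_result <- cor.test(mtcars$mpg, mtcars$hp)
test_result
```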
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$hp
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8852686 -0.5860994
## sample estimates:
## cor
## -0.7761684
This function returns not only the correlation coefficient but also a p-value and a confidence interval. Here the p-value (1.788e-07) is far below the usual 0.05 threshold, and the 95 percent confidence interval (-0.89 to -0.59) does not include zero, so the negative relationship between mpg and hp is very unlikely to be due to chance alone. This step is important for researchers who need solid evidence to back up their analysis, because it shows how much confidence the data supports rather than just how strong the correlation looks.
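If you need these numbers for a report or a later calculation, you can pull them out of the object that cor.test() returns. A short sketch, with variable names chosen only for illustration and the common 5% threshold as an assumption:

```r
data(mtcars)
test_result <- cor.test(mtcars$mpg, mtcars$hp)

r_value   <- unname(test_result$estimate)      # correlation coefficient
p_value   <- test_result$p.value               # two-sided p-value
ci_bounds <- as.numeric(test_result$conf.int)  # lower and upper 95% bounds

# Conventional (but arbitrary) cut-off: significant at the 5% level
is_significant <- p_value < 0.05
is_significant
```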
This post has shown you how to use the cor() function in R in many ways: basic correlation calculations with the mtcars dataset, a custom function that adds significance stars, handling missing data, choosing a correlation method, visualizing correlation matrices, and testing significance with cor.test(). Each step is designed to give you clear insights and make your data analysis work easier. Now is the time to take your data skills to the next level. Book a free call with our experts at RStudioDatalab and enjoy a free exploratory data analysis along with a 10% discount on our services this month. If you’re not satisfied, we offer a full refund. Start transforming your data today and join the community of smart, data-driven students and researchers!